Speaker Representation Learning via Contrastive Loss with Maximal Speaker Separability

Zhe Li, Man Wai Mak

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

7 Citations (Scopus)

Abstract

A great challenge in speaker representation learning using deep models is to design learning objectives that can enhance the discrimination of unseen speakers under unseen domains. This work proposes a supervised contrastive learning objective to learn a speaker embedding space by effectively leveraging the label information in the training data. In such a space, utterance pairs spoken by the same or similar speakers will stay close, while utterance pairs spoken by different speakers will be far apart. For each training speaker, we perform random data augmentation on their utterances to form positive pairs, and utterances from different speakers form negative pairs. To maxi-mize speaker separability in the embedding space, we incorporate the additive angular-margin loss into the contrastive learning objective. Experimental results on CN-Celeb show that this new learning objective can cause ECAPA-TDNN to find an embedding space that exhibits great speaker discrimination. The contrastive learning objective is easy to implement, and we provide PyTorch code at https://github.com/shanmon110/AAMSupCon.

Original languageEnglish
Title of host publicationProceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages962-967
Number of pages6
ISBN (Electronic)9786165904773
DOIs
Publication statusPublished - Nov 2022
Event2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022 - Chiang Mai, Thailand
Duration: 7 Nov 202210 Nov 2022

Publication series

NameProceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022

Conference

Conference2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Country/TerritoryThailand
CityChiang Mai
Period7/11/2210/11/22

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint

Dive into the research topics of 'Speaker Representation Learning via Contrastive Loss with Maximal Speaker Separability'. Together they form a unique fingerprint.

Cite this