Multi-level Deep Neural Network Adaptation for Speaker Verification using MMD and Consistency Regularization

Weiwei Lin, Man Wai Mak, Na Li, D. Su, D. Yu

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

30 Citations (Scopus)

Abstract

Adapting speaker verification (SV) systems to a new environment is a very challenging task. Current adaptation methods in SV mainly focus on the backend, i.e., adaptation is carried out after the speaker embeddings have been created. In this paper, we present a DNN-based adaptation method using maximum mean discrepancy (MMD). Our method exploits two important aspects neglected by previous research. First, instead of minimizing domain discrepancy at the utterance level alone, our method minimizes domain discrepancy at both the frame level and the utterance level, which we believe makes the adaptation more robust to the duration mismatch between training data and test data. Second, we introduce a consistency regularization for unlabelled target-domain data. The consistency regularization encourages the target speaker embeddings to be robust to adverse perturbations. Experiments on NIST SRE 2016 and 2018 show that our DNN adaptation works significantly better than previously proposed DNN adaptation methods. Moreover, our method works well with backend adaptation: by combining the proposed method with backend adaptation, we achieve a 9% improvement over backend adaptation alone on SRE18.
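To make the two losses described in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation. It assumes an x-vector-style network that exposes frame-level hidden activations and pooled utterance-level embeddings, uses a single-bandwidth RBF kernel for MMD, and uses an MSE term between clean and perturbed target embeddings for consistency; the kernel choice, bandwidth, and loss weights are illustrative assumptions.

```python
# Illustrative sketch of multi-level MMD and consistency regularization.
# Kernel bandwidth, loss weights, and the MSE consistency term are assumptions,
# not the settings reported in the paper.
import torch
import torch.nn.functional as F


def rbf_mmd(x, y, sigma=1.0):
    """Biased squared-MMD estimate between two sample sets with one RBF kernel.

    x: (n, d) source-domain features; y: (m, d) target-domain features.
    """
    def kernel(a, b):
        # Pairwise squared Euclidean distances -> Gaussian kernel values.
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2.0 * sigma ** 2))

    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


def multilevel_mmd(frames_src, frames_tgt, emb_src, emb_tgt,
                   w_frame=1.0, w_utt=1.0):
    """Domain-discrepancy loss applied at both frame level and utterance level.

    frames_*: (num_frames, d_frame) frame-level hidden activations.
    emb_*:    (num_utts, d_emb) utterance-level embeddings after pooling.
    """
    return (w_frame * rbf_mmd(frames_src, frames_tgt)
            + w_utt * rbf_mmd(emb_src, emb_tgt))


def consistency_loss(emb_clean, emb_perturbed):
    """Encourage target-domain embeddings to stay close when the same
    utterance is perturbed (e.g., by added noise or reverberation)."""
    return F.mse_loss(emb_perturbed, emb_clean)
```

A total adaptation objective could then combine a source-domain speaker classification loss with these terms, e.g. L = L_cls + lambda_mmd * multilevel_mmd(...) + lambda_cons * consistency_loss(...), where the lambda weights are tuning hyperparameters assumed here for illustration.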
Original language: English
Title of host publication: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Pages: 6839-6843
ISBN (Electronic): 9781509066315
DOIs
Publication status: Published - May 2020
Event: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
Duration: 4 May 2020 - 8 May 2020

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Conference

Conference: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country/Territory: Spain
City: Barcelona
Period: 4/05/20 - 8/05/20

Keywords

  • data augmentation
  • domain adaptation
  • maximum mean discrepancy
  • speaker verification
  • transfer learning
