Variational domain adversarial learning for speaker verification

Youzhi Tu, Man Wai Mak, Jen Tzung Chien

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

18 Citations (Scopus)

Abstract

Domain mismatch refers to the problem in which the distribution of training data differs from that of the test data. This paper proposes a variational domain adversarial neural network (VDANN), which consists of a variational autoencoder (VAE) and a domain adversarial neural network (DANN), to reduce domain mismatch. The DANN part aims to retain speaker identity information and learn a feature space that is robust against domain mismatch, while the VAE part is to impose variational regularization on the learned features so that they follow a Gaussian distribution. Thus, the representation produced by VDANN is not only speaker discriminative and domain-invariant but also Gaussian distributed, which is essential for the standard PLDA backend. Experiments on both SRE16 and SRE18-CMN2 show that VDANN outperforms the Kaldi baseline and the standard DANN. The results also suggest that VAE regularization is effective for domain adaptation.

Original languageEnglish
Title of host publicationProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages4315-4319
Number of pages5
Volume2019-September
DOIs
Publication statusPublished - 15 Sep 2019
Event20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration: 15 Sep 201919 Sep 2019

Publication series

NameProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print)2308-457X

Conference

Conference20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
Country/TerritoryAustria
CityGraz
Period15/09/1919/09/19

Keywords

  • Domain adaptation
  • Domain adversarial training
  • Speaker verification
  • Variational autoencoder

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this