TY - JOUR
T1 - Variational Domain Adversarial Learning With Mutual Information Maximization for Speaker Verification
AU - Tu, Youzhi
AU - Mak, Man-Wai
AU - Chien, Jen-Tzung
N1 - Funding Information:
Manuscript received January 7, 2020; revised May 15, 2020; accepted June 7, 2020. Date of publication June 24, 2020; date of current version July 9, 2020. This work was supported in part by the RGC of Hong Kong SAR Grant PolyU 152137/17E and in part by Taiwan MOST Grant 109-2634-F-009-024. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Yu Tsao. (Corresponding author: Man-Wai Mak.) Youzhi Tu and Man-Wai Mak are with the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR (e-mail: [email protected]; [email protected]).
Publisher Copyright:
© 2014 IEEE.
PY - 2020/6
Y1 - 2020/6
N2 - Domain mismatch is a common problem in speaker verification (SV) and often causes performance degradation. For systems relying on a Gaussian PLDA backend to suppress channel variability, performance is further limited if there is no Gaussianity constraint on the learned embeddings. This paper proposes an information-maximized variational domain adversarial neural network (InfoVDANN) that incorporates an InfoVAE into domain adversarial training (DAT) to reduce domain mismatch and simultaneously meet the Gaussianity requirement of the PLDA backend. Specifically, DAT is applied to produce speaker-discriminative and domain-invariant features, while the InfoVAE performs variational regularization on the embedded features so that they follow a Gaussian distribution. Another benefit of the InfoVAE is that it avoids posterior collapse in VAEs by preserving the mutual information between the embedded features and the training set, so that extra speaker information can be retained in the features. Experiments on both SRE16 and SRE18-CMN2 show that the InfoVDANN outperforms the recent VDANN, which suggests that increasing the mutual information between the embedded features and the input features enables the InfoVDANN to extract extra speaker information that would otherwise be lost.
AB - Domain mismatch is a common problem in speaker verification (SV) and often causes performance degradation. For systems relying on a Gaussian PLDA backend to suppress channel variability, performance is further limited if there is no Gaussianity constraint on the learned embeddings. This paper proposes an information-maximized variational domain adversarial neural network (InfoVDANN) that incorporates an InfoVAE into domain adversarial training (DAT) to reduce domain mismatch and simultaneously meet the Gaussianity requirement of the PLDA backend. Specifically, DAT is applied to produce speaker-discriminative and domain-invariant features, while the InfoVAE performs variational regularization on the embedded features so that they follow a Gaussian distribution. Another benefit of the InfoVAE is that it avoids posterior collapse in VAEs by preserving the mutual information between the embedded features and the training set, so that extra speaker information can be retained in the features. Experiments on both SRE16 and SRE18-CMN2 show that the InfoVDANN outperforms the recent VDANN, which suggests that increasing the mutual information between the embedded features and the input features enables the InfoVDANN to extract extra speaker information that would otherwise be lost.
KW - Speaker verification (SV)
KW - domain adaptation
KW - domain adversarial training
KW - mutual information
KW - variational autoencoder
UR - http://www.scopus.com/inward/record.url?scp=85089194439&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2020.3004760
DO - 10.1109/TASLP.2020.3004760
M3 - Journal article
SN - 2329-9290
VL - 28
SP - 2013
EP - 2024
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
M1 - 9124672
ER -