TY - GEN
T1 - Information Maximized Variational Domain Adversarial Learning for Speaker Verification
AU - Tu, Youzhi
AU - Mak, Man Wai
AU - Chien, Jen-Tzung
PY - 2020/5
Y1 - 2020/5
AB - Domain mismatch is a common problem in speaker verification. This paper proposes an information-maximized variational domain adversarial neural network (InfoVDANN) to reduce domain mismatch by incorporating an InfoVAE into domain adversarial training (DAT). DAT aims to produce speaker-discriminative and domain-invariant features. The InfoVAE has two roles. First, it performs variational regularization on the learned features so that they follow a Gaussian distribution, which is essential for the standard PLDA backend. Second, it preserves mutual information between the features and the training set to extract extra speaker-discriminative information. Experiments on both SRE16 and SRE18-CMN2 show that the InfoVDANN outperforms the recent VDANN, which suggests that increasing the mutual information between the latent features and input features enables the InfoVDANN to extract extra speaker information that could not otherwise be obtained.
KW - Speaker verification
KW - adversarial training
KW - domain adaptation
KW - mutual information
KW - variational autoencoder
UR - http://www.scopus.com/inward/record.url?scp=85089228360&partnerID=8YFLogxK
U2 - 10.1109/ICASSP40776.2020.9053735
DO - 10.1109/ICASSP40776.2020.9053735
M3 - Conference article published in proceeding or book
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6449
EP - 6453
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -