TY - GEN
T1 - Unsupervised Domain Adaptation for Gender-Aware PLDA Mixture Models
AU - Li, Longxin
AU - Mak, Man Wai
PY - 2018/4/15
Y1 - 2018/4/15
N2 - Probabilistic linear discriminant analysis (PLDA) is a state-of-the-art back-end for i-vector-based speaker verification. However, this back-end is still problematic when (1) the model is deployed to a new environment (in-domain) that is very different from the training one (out-of-domain) and (2) there are insufficient labeled data from the new environment. To address these problems, this paper proposes using out-of-domain training data to pre-train a PLDA mixture model and applying the mixture model to the in-domain training data to compute a pairwise score matrix for spectral clustering. The hypothesized speaker labels produced by spectral clustering are then used for re-training the mixture model to fit the new environment. To refine the mixture model, the spectral clustering and re-training processes are repeated a number of times. To make the mixture model amenable to both genders, a deep neural network (DNN) is trained to produce gender posteriors given an i-vector. The gender posteriors then replace the posterior probabilities of the indicator variables in the PLDA mixture model. Evaluations based on NIST 2016 SRE suggest that at the end of the iterative re-training, the PLDA mixture model becomes fully adapted to the new domain. Results also show that the PLDA scores can be readily incorporated into spectral clustering, resulting in high-quality speaker clusters that could not possibly be achieved by agglomerative hierarchical clustering.
AB - Probabilistic linear discriminant analysis (PLDA) is a state-of-the-art back-end for i-vector-based speaker verification. However, this back-end is still problematic when (1) the model is deployed to a new environment (in-domain) that is very different from the training one (out-of-domain) and (2) there are insufficient labeled data from the new environment. To address these problems, this paper proposes using out-of-domain training data to pre-train a PLDA mixture model and applying the mixture model to the in-domain training data to compute a pairwise score matrix for spectral clustering. The hypothesized speaker labels produced by spectral clustering are then used for re-training the mixture model to fit the new environment. To refine the mixture model, the spectral clustering and re-training processes are repeated a number of times. To make the mixture model amenable to both genders, a deep neural network (DNN) is trained to produce gender posteriors given an i-vector. The gender posteriors then replace the posterior probabilities of the indicator variables in the PLDA mixture model. Evaluations based on NIST 2016 SRE suggest that at the end of the iterative re-training, the PLDA mixture model becomes fully adapted to the new domain. Results also show that the PLDA scores can be readily incorporated into spectral clustering, resulting in high-quality speaker clusters that could not possibly be achieved by agglomerative hierarchical clustering.
KW - DNN-driven mixture of PLDA
KW - Domain adaptation
KW - I-vectors
KW - Speaker verification
KW - Spectral clustering
UR - http://www.scopus.com/inward/record.url?scp=85054288121&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2018.8461943
DO - 10.1109/ICASSP.2018.8461943
M3 - Conference article published in proceeding or book
AN - SCOPUS:85054288121
SN - 9781538646588
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5269
EP - 5273
BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Y2 - 15 April 2018 through 20 April 2018
ER -