Abstract
In speaker recognition, mismatch between enrollment and test utterances caused by noise at different signal-to-noise ratios (SNRs) is a major challenge. Based on the observation that noise-level variability causes i-vectors to form heterogeneous clusters, this paper proposes using an SNR-aware deep neural network (DNN) to guide the training of mixture models of probabilistic linear discriminant analysis (PLDA). Specifically, given an i-vector, the SNR posterior probabilities produced by the DNN are used as the posteriors of the mixture model's indicator variables. As a result, the proposed model provides a more reasonable soft division of the i-vector space than the conventional mixture of PLDA. During verification, given a test trial, the marginal likelihoods from the individual PLDA models are linearly combined, weighted by the posterior probabilities of the SNR levels computed by the DNN. Experimental results on SNR mismatch tasks based on NIST 2012 SRE suggest that the proposed model is more effective than PLDA and the conventional mixture of PLDA at handling heterogeneous corpora.
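The linear combination step described in the abstract can be illustrated with a minimal sketch. It assumes that the log marginal likelihood of a trial under each PLDA component and the DNN's SNR posteriors are already available. The function name `combine_log_likelihoods` and its inputs are hypothetical, and this covers only the weighted sum, not the paper's full training or scoring procedure.

```python
import math


def combine_log_likelihoods(log_liks, snr_posteriors):
    """Return log(sum_k w_k * exp(log_liks[k])) in a numerically stable way.

    log_liks       -- log marginal likelihood of the trial under each PLDA
                      component (hypothetical inputs, one per SNR group)
    snr_posteriors -- DNN posterior of each SNR group, assumed to sum to 1
    """
    if len(log_liks) != len(snr_posteriors):
        raise ValueError("need one posterior per mixture component")

    # Work in the log domain: log(w_k) + log p_k, skipping zero weights.
    terms = [ll + math.log(w)
             for ll, w in zip(log_liks, snr_posteriors) if w > 0.0]
    if not terms:
        raise ValueError("all posteriors are zero")

    # Log-sum-exp trick: factor out the largest term to avoid underflow.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Example call with made-up values for three SNR groups.
combined = combine_log_likelihoods([-10.0, -12.0, -15.0], [0.7, 0.2, 0.1])
```

Working in the log domain is a design choice of this sketch: likelihoods of high-dimensional i-vectors are typically far too small to exponentiate directly.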
Original language | English |
---|---|
Title of host publication | 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings |
Publisher | IEEE |
Pages | 186-191 |
Number of pages | 6 |
ISBN (Electronic) | 9781509049035 |
DOIs | |
Publication status | Published - 7 Feb 2017 |
Event | 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - San Diego, United States; Duration: 13 Dec 2016 → 16 Dec 2016 |
Conference
Conference | 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 |
---|---|
Country/Territory | United States |
City | San Diego |
Period | 13/12/16 → 16/12/16 |
Keywords
- Deep neural networks
- I-vector
- Mixture of PLDA
- SNR mismatch
- Speaker verification
ASJC Scopus subject areas
- Human-Computer Interaction
- Artificial Intelligence
- Language and Linguistics
- Computer Vision and Pattern Recognition
- Computer Science Applications