Abstract
Recent research has shown that using senone posteriors for i-vector extraction can achieve outstanding performance. In this paper, we extend this idea to robust speaker verification by constructing a deep neural network (DNN) comprising a deep belief network (DBN) stacked on top of a denoising autoencoder (DAE). The proposed method addresses noise robustness from two perspectives: (1) denoising the MFCC vectors through the DAE and (2) extracting noise-robust bottleneck (BN) features and senone posteriors from the DBN for total-variability matrix training and i-vector extraction. The DAE comprises several layers of restricted Boltzmann machines (RBMs) and is trained to minimize the mean squared error between the denoised and clean MFCCs. After the DAE is trained, three layers of RBMs are put on top of it to form the DNN. The whole network is then fine-tuned by backpropagation to minimize the cross-entropy between the senone labels and the network outputs. This architecture allows us to extract BN features and estimate senone posteriors given noisy MFCCs as input, resulting in robust BN-based senone i-vectors. Results on NIST 2012 SRE show that these senone i-vectors outperform both the conventional i-vectors and the BN-based i-vectors in which the posteriors are obtained from a GMM.
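The data flow described in the abstract can be illustrated with a minimal, untrained sketch. It is not the authors' implementation: the layer sizes, the random initialisation, the choice of the third stacked layer as a linear bottleneck, and all function names (`forward`, `baum_welch_stats`, and so on) are assumptions made for illustration. The sketch shows a noisy MFCC frame passing through the DAE and then the stacked layers, the two training objectives (mean squared error and cross-entropy), and the standard zeroth- and first-order sufficient statistics that i-vector extraction accumulates from frame posteriors.

```python
import math
import random

# Toy sizes chosen only for illustration; not the paper's configuration.
MFCC_DIM, HID_DIM, BN_DIM, N_SENONES = 6, 8, 4, 5

def init_layer(n_in, n_out, rng):
    """One fully connected layer: small random weights, zero biases."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def build(sizes, rng):
    """Stack of layers whose widths follow `sizes`."""
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def affine(layer, x):
    W, b = layer
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [z / s for z in e]

def forward(dae, dbn, noisy):
    """Return (denoised MFCC, BN features, senone posteriors) for one frame."""
    h = noisy
    for layer in dae[:-1]:
        h = sigmoid(affine(layer, h))
    denoised = affine(dae[-1], h)  # linear output, regressed onto the clean MFCC
    h = denoised
    for layer in dbn[:-2]:
        h = sigmoid(affine(layer, h))
    bn = affine(dbn[-2], h)  # assumption: linear activations of the narrow layer
    posteriors = softmax(affine(dbn[-1], sigmoid(bn)))
    return denoised, bn, posteriors

def mse(pred, target):
    """DAE objective: mean squared error against the clean frame."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(posteriors, label):
    """Fine-tuning objective for one frame with an integer senone label."""
    return -math.log(posteriors[label])

def baum_welch_stats(bn_frames, post_frames):
    """Zeroth-order (N) and first-order (F) statistics per senone."""
    N = [0.0] * len(post_frames[0])
    F = [[0.0] * len(bn_frames[0]) for _ in N]
    for x, gamma in zip(bn_frames, post_frames):
        for c, g in enumerate(gamma):
            N[c] += g
            for d, xd in enumerate(x):
                F[c][d] += g * xd
    return N, F

rng = random.Random(0)
dae = build([MFCC_DIM, HID_DIM, MFCC_DIM], rng)
# Three stacked hidden layers (the last one narrow) plus a softmax output layer.
dbn = build([MFCC_DIM, HID_DIM, HID_DIM, BN_DIM, N_SENONES], rng)

clean = [[rng.gauss(0.0, 1.0) for _ in range(MFCC_DIM)] for _ in range(10)]
noisy = [[x + rng.gauss(0.0, 0.3) for x in frame] for frame in clean]

outputs = [forward(dae, dbn, frame) for frame in noisy]
bn_frames = [o[1] for o in outputs]
post_frames = [o[2] for o in outputs]
N, F = baum_welch_stats(bn_frames, post_frames)
dae_loss = sum(mse(o[0], c) for o, c in zip(outputs, clean)) / len(clean)
```

In this sketch the posteriors of each frame sum to one, so the entries of `N` sum to the number of frames. In a real system the weights would come from RBM pre-training and backpropagation rather than random initialisation, and `N` and `F` would feed total-variability matrix training and i-vector extraction.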
Original language | English |
---|---|
Title of host publication | Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016 |
Publisher | IEEE |
ISBN (Electronic) | 9781509042937 |
Publication status | Published - 2 May 2017 |
Event | 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016 - Tianjin, China; Duration: 17 Oct 2016 → 20 Oct 2016 |
Conference
Conference | 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016 |
---|---|
Country/Territory | China |
City | Tianjin |
Period | 17/10/16 → 20/10/16 |
Keywords
- Deep learning
- Denoising autoencoders
- I-vectors
- Senone posteriors
- Speaker verification
ASJC Scopus subject areas
- Signal Processing
- Computer Vision and Pattern Recognition
- Linguistics and Language