Senone I-vectors for robust speaker verification

Zhili Tan, Yingke Zhu, Man Wai Mak, Brian Kan Wing Mak

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

4 Citations (Scopus)

Abstract

Recent research has shown that using senone posteriors for i-vector extraction can achieve outstanding performance. In this paper, we extend this idea to robust speaker verification by constructing a deep neural network (DNN) comprising a deep belief network (DBN) stacked on top of a denoising autoencoder (DAE). The proposed method addresses noise robustness from two perspectives: (1) denoising the MFCC vectors through the DAE and (2) extracting noise-robust bottleneck (BN) features and senone posteriors from the DBN for total-variability matrix training and i-vector extraction. The DAE comprises several layers of restricted Boltzmann machines (RBMs), which are trained to minimize the mean squared error between the denoised and clean MFCCs. After training the DAE, three layers of RBMs are put on top of it to form the DNN. The whole network is fine-tuned by backpropagation to minimize the cross-entropy between the senone labels and the network outputs. This architecture allows us to extract BN features and estimate senone posteriors given noisy MFCCs as input, resulting in robust BN-based senone i-vectors. Results on NIST 2012 SRE show that these senone i-vectors outperform both the conventional i-vectors and the BN-based i-vectors in which the posteriors are obtained from a GMM.
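The data flow described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the layer sizes, the position of the bottleneck layer, the sigmoid activations, and all function names (`forward`, `baum_welch_stats`, and so on) are assumptions made for illustration, and the weights are random rather than trained. The sketch passes noisy MFCC frames through a DAE and then through the stacked layers, reads off BN features and senone posteriors, and accumulates the zeroth- and first-order statistics that an i-vector extractor would typically consume, with senone posteriors playing the role usually taken by GMM posteriors.

```python
import math
import random


def dense(x, W, b):
    """Affine layer; W holds one row of weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def build_network(sizes, rng):
    """Random (untrained) weights; a real system would pre-train and fine-tune."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers


def forward(frame, dae, dbn, bn_index):
    """Return (BN features, senone posteriors) for one noisy MFCC frame."""
    h = frame
    # DAE: sigmoid hidden layers, linear output approximating clean MFCCs.
    for i, (W, b) in enumerate(dae):
        h = dense(h, W, b)
        if i < len(dae) - 1:
            h = sigmoid(h)
    # Stacked hidden layers; the layer at bn_index is taken as the bottleneck
    # (which layer serves as the bottleneck is an assumption here).
    bn = None
    for i, (W, b) in enumerate(dbn[:-1]):
        h = sigmoid(dense(h, W, b))
        if i == bn_index:
            bn = h
    W, b = dbn[-1]
    return bn, softmax(dense(h, W, b))  # softmax output = senone posteriors


def baum_welch_stats(features, posteriors):
    """Zeroth-order (N) and first-order (F) statistics per senone."""
    n_senones, dim = len(posteriors[0]), len(features[0])
    N = [0.0] * n_senones
    F = [[0.0] * dim for _ in range(n_senones)]
    for x, gamma in zip(features, posteriors):
        for c, g in enumerate(gamma):
            N[c] += g
            for d, v in enumerate(x):
                F[c][d] += g * v
    return N, F


# Toy dimensions (hypothetical): 6-dim MFCCs, 3-dim bottleneck, 5 senones.
rng = random.Random(0)
dae = build_network([6, 8, 6], rng)
dbn = build_network([6, 8, 8, 3, 5], rng)
noisy_frames = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(20)]

outputs = [forward(x, dae, dbn, bn_index=2) for x in noisy_frames]
bn_feats = [bn for bn, _ in outputs]
posts = [p for _, p in outputs]
N, F = baum_welch_stats(bn_feats, posts)
```

Because each frame's posteriors sum to one, the zeroth-order statistics `N` sum to the number of frames. The sketch uses plain Python lists to stay dependency-free; a practical system would use a numerical library and far larger layers.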
Original language: English
Title of host publication: Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
Publisher: IEEE
ISBN (Electronic): 9781509042937
DOIs
Publication status: Published - 2 May 2017
Event: 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016 - Tianjin, China
Duration: 17 Oct 2016 → 20 Oct 2016

Conference

Conference: 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
Country: China
City: Tianjin
Period: 17/10/16 → 20/10/16

Keywords

  • Deep learning
  • Denoising autoencoders
  • I-vectors
  • Senone posteriors
  • Speaker verification

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Linguistics and Language
