Denoised Senone I-Vectors for Robust Speaker Verification

Zhili Tan, Man Wai Mak, Brian Kan Wing Mak, Yingke Zhu

Research output: Journal article publicationJournal articleAcademic researchpeer-review

7 Citations (Scopus)

Abstract

Recently, it has been shown that senone i-vectors, whose posteriors are produced by senone deep neural networks (DNNs), outperform the conventional Gaussian mixture model (GMM) i-vectors in both speaker and language recognition tasks. The success of senone i-vectors relies on the capability of the DNN to incorporate phonetic information into the i-vector extraction process. In this paper, we argue that to apply senone i-vectors in noisy environments, it is important to robustify the phonetically discriminative acoustic features and senone posteriors estimated by the DNN. To this end, we propose a deep architecture formed by stacking a deep belief network on top of a denoising autoencoder (DAE). After backpropagation fine-tuning, the network, referred to as denoising autoencoder-deep neural network (DAE-DNN), facilitates the extraction of robust phonetically-discriminitive bottleneck (BN) features and senone posteriors for i-vector extraction. We refer to the resulting i-vectors as denoised BN-based senone i-vectors. Results on NIST 2012 SRE show that senone i-vectors outperform the conventional GMM i-vectors. More interestingly, the BN features are not only phonetically discriminative, results suggest that they also contain sufficient speaker information to produce BN-based senone i-vectors that outperform the conventional senone i-vectors. This work also shows that DAE training is more beneficial to BN feature extraction than senone posterior estimation.
Original languageEnglish
Article number8269399
Pages (from-to)820-830
Number of pages11
JournalIEEE/ACM Transactions on Audio Speech and Language Processing
Volume26
Issue number4
DOIs
Publication statusPublished - 1 Apr 2018

Keywords

  • deep learning
  • denoising autoencoders
  • i-vectors
  • noise robustness
  • phonetically discriminative features
  • senone posteriors
  • Speaker verification

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering

Cite this