Deep neural network driven mixture of PLDA for robust i-vector speaker verification

Na Li, Man Wai Mak, Jen Tzung Chien

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

7 Citations (Scopus)

Abstract

In speaker recognition, mismatch between enrollment and test utterances caused by noise at different signal-to-noise ratios (SNRs) is a major challenge. Based on the observation that noise-level variability causes i-vectors to form heterogeneous clusters, this paper proposes using an SNR-aware deep neural network (DNN) to guide the training of mixtures of probabilistic linear discriminant analysis (PLDA) models. Specifically, given an i-vector, the SNR posterior probabilities produced by the DNN are used as the posteriors of the mixture model's indicator variables. As a result, the proposed model provides a more reasonable soft division of the i-vector space than the conventional mixture of PLDA. During verification, given a test trial, the marginal likelihoods from the individual PLDA models are linearly combined, weighted by the posterior probabilities of the SNR levels computed by the DNN. Experimental results on SNR-mismatch tasks based on NIST 2012 SRE suggest that the proposed model is more effective than PLDA and the conventional mixture of PLDA for handling heterogeneous corpora.
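
The verification step described in the abstract, a posterior-weighted linear combination of per-component marginal likelihoods, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the use of log-domain arithmetic, and the choice to apply one set of SNR posteriors to both the same-speaker and different-speaker hypotheses are all assumptions made for illustration; the abstract does not give the exact scoring formula.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def combine_log_likelihoods(snr_posteriors, component_log_likelihoods):
    """Return log(sum_k posterior_k * likelihood_k).

    snr_posteriors: posterior probability of each SNR level
        (assumed here to come from an SNR-aware DNN).
    component_log_likelihoods: log marginal likelihood of the trial
        under each PLDA mixture component.
    """
    if len(snr_posteriors) != len(component_log_likelihoods):
        raise ValueError("need one posterior per mixture component")
    # Components with zero posterior contribute nothing to the sum.
    terms = [
        math.log(p) + ll
        for p, ll in zip(snr_posteriors, component_log_likelihoods)
        if p > 0.0
    ]
    if not terms:
        raise ValueError("all posteriors are zero")
    return log_sum_exp(terms)


def mixture_llr(snr_posteriors, log_lik_same, log_lik_diff):
    """Hypothetical trial score: log-likelihood ratio of the two hypotheses.

    Assumption: the same posteriors weight both hypotheses.
    """
    same = combine_log_likelihoods(snr_posteriors, log_lik_same)
    diff = combine_log_likelihoods(snr_posteriors, log_lik_diff)
    return same - diff


# Illustrative values only: three SNR levels, made-up log-likelihoods.
posteriors = [0.7, 0.2, 0.1]
score = mixture_llr(posteriors, [-10.0, -12.0, -15.0], [-13.0, -12.5, -14.0])
print(score)
```

Working in the log domain avoids underflow when the marginal likelihoods are very small, which is typical for high-dimensional i-vectors. When the posteriors are one-hot, the combination reduces to scoring with a single PLDA model.
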
Original language: English
Title of host publication: 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings
Publisher: IEEE
Pages: 186-191
Number of pages: 6
ISBN (Electronic): 9781509049035
Publication status: Published - 7 Feb 2017
Event: 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - San Diego, United States
Duration: 13 Dec 2016 to 16 Dec 2016

Conference

Conference: 2016 IEEE Workshop on Spoken Language Technology, SLT 2016
Country/Territory: United States
City: San Diego
Period: 13/12/16 to 16/12/16

Keywords

  • Deep neural networks
  • I-vector
  • Mixture of PLDA
  • SNR mismatch
  • Speaker verification

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Artificial Intelligence
  • Language and Linguistics
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
