DNN-Driven Mixture of PLDA for Robust Speaker Verification

Na Li, Man Wai Mak, Jen Tzung Chien

Research output: Journal article publicationJournal articleAcademic researchpeer-review

19 Citations (Scopus)

Abstract

The mismatch between enrollment and test utterances due to different types of variabilities is a great challenge in speaker verification. Based on the observation that the SNR-level variability or channel-type variability causes heterogeneous clusters in i-vector space, this paper proposes to apply supervised learning to drive or guide the learning of probabilistic linear discriminant analysis (PLDA) mixture models. Specifically, a deep neural network (DNN) is trained to produce the posterior probabilities of different SNR levels or channel types given i-vectors as input. These posteriors then replace the posterior probabilities of indicator variables in the mixture of PLDA. The discriminative training causes the mixture model to perform more reasonable soft divisions of the i-vector space as compared to the conventional mixture of PLDA. During verification, given a test i-vector and a target-speaker's i-vector, the marginal likelihood for the same-speaker hypothesis is obtained by summing the component likelihoods weighted by the component posteriors produced by the DNN, and likewise for the different-speaker hypothesis. Results based on NIST 2012 SRE demonstrate that the proposed scheme leads to better performance under more realistic situations where both training and test utterances cover a wide range of SNRs and different channel types. Unlike the previous SNR-dependent mixture of PLDA which only focuses on SNR mismatch, the proposed model is more general and is potentially applicable to addressing different types of variability in speech.
Original languageEnglish
Pages (from-to)1371-1383
Number of pages13
JournalIEEE/ACM Transactions on Audio Speech and Language Processing
Volume25
Issue number6
DOIs
Publication statusPublished - 1 Jun 2017

Keywords

  • Deep neural networks
  • i-vectors
  • mixture of PLDA
  • speaker verification

ASJC Scopus subject areas

  • Signal Processing
  • Media Technology
  • Instrumentation
  • Acoustics and Ultrasonics
  • Linguistics and Language
  • Electrical and Electronic Engineering
  • Speech and Hearing

Fingerprint

Dive into the research topics of 'DNN-Driven Mixture of PLDA for Robust Speaker Verification'. Together they form a unique fingerprint.

Cite this