Discriminative subspace modeling of SNR and duration variabilities for robust speaker verification

Na Li, Man Wai Mak, Wei Wei Lin, Jen Tzung Chien

Research output: Journal article publication › Journal article › Academic research › peer-review

6 Citations (Scopus)

Abstract

This paper aims to improve the robustness of i-vector based speaker verification systems by compensating for the utterance-length variability and noise-level variability. Inspired by the recent findings that noise-level variability can be modeled by a signal-to-noise ratio (SNR) subspace and that duration variability can be modeled as additive noise in the i-vector space, we propose to add an SNR factor and a duration factor to the PLDA model. In this framework, we assume that i-vectors derived from utterances with comparable durations share similar duration-specific information and that i-vectors extracted from utterances within a narrow SNR range have similar SNR-specific information. Based on these assumptions, an i-vector can be represented as a linear combination of four components: speaker, SNR, duration, and channel. A variational Bayes algorithm is developed to infer this latent variable model via a discriminative subspace training procedure. In the testing stage, different variabilities are compensated for when computing the likelihood ratio. Experiments on Common Conditions 1 and 4 in NIST 2012 SRE show that the proposed model outperforms the conventional PLDA and SNR-invariant PLDA. Results also show that the proposed model performs better than the uncertainty-propagation PLDA (UP-PLDA) for long test utterances.
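
The four-component decomposition described in the abstract can be written out as a sketch. The abstract gives no notation, so every symbol below is an assumption chosen to follow common PLDA conventions, and the global mean term is likewise assumed rather than stated.

```latex
% Assumed notation (not given in the abstract):
%   x        : i-vector of one utterance
%   m        : global mean of the i-vectors (assumed, as in conventional PLDA)
%   V h      : speaker component (loading matrix V, speaker factor h)
%   U w      : SNR component, shared by utterances within a narrow SNR range
%   R y      : duration component, shared by utterances of comparable duration
%   \epsilon : residual term absorbing channel variability
\begin{equation}
  \mathbf{x} = \mathbf{m} + \mathbf{V}\mathbf{h} + \mathbf{U}\mathbf{w}
             + \mathbf{R}\mathbf{y} + \boldsymbol{\epsilon}
\end{equation}
```

Under this reading, dropping the duration term would leave a speaker-plus-SNR model, and dropping both the SNR and duration terms would leave a conventional speaker-plus-residual PLDA model.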
Original language: English
Pages (from-to): 83-103
Number of pages: 21
Journal: Computer Speech and Language
Volume: 45
Publication status: Published - 1 Sept 2017

Keywords

  • Duration variation
  • I-vector
  • PLDA
  • SNR mismatch
  • Speaker verification
  • Variational Bayes

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction
