Noise robust speaker verification via the fusion of SNR-independent and SNR-dependent PLDA

Xiaomin Pang, Man Wai Mak

Research output: Journal article publication › Journal article › Academic research › peer-review

Abstract

While i-vectors with probabilistic linear discriminant analysis (PLDA) can achieve state-of-the-art performance in speaker verification, the mismatch caused by acoustic noise remains a key factor affecting system performance. In this paper, a fusion system that combines a multi-condition signal-to-noise ratio (SNR)-independent PLDA model and a mixture of SNR-dependent PLDA models is proposed to make speaker verification systems more noise robust. First, the whole range of SNR in which a verification system is expected to operate is divided into several narrow ranges. Then, a set of SNR-dependent PLDA models, one for each narrow SNR range, is trained. During verification, the SNR of the test utterance is used to determine which of the SNR-dependent PLDA models is used for scoring. To further enhance performance, the SNR-dependent and SNR-independent models are fused using linear and logistic regression fusion. The performance of the fusion system and the SNR-dependent system is evaluated on the NIST 2012 speaker recognition evaluation for both noisy and clean conditions. Results show that a mixture of SNR-dependent PLDA models performs better in both clean and noisy conditions. It was also found that the fusion system is more robust than conventional i-vector/PLDA systems under noisy conditions.
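The selection-and-fusion procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The SNR boundaries, the fusion weights, and all function names are hypothetical, and the scorer is a scaled dot product standing in for a trained PLDA model. In logistic regression fusion the weights and bias would be fitted on development scores rather than fixed by hand.

```python
from bisect import bisect_right
from typing import Callable, Sequence

Scorer = Callable[[Sequence[float], Sequence[float]], float]

# Hypothetical SNR boundaries in dB. They split the operating range into
# three narrow ranges: (-inf, 5), [5, 15), [15, +inf).
SNR_EDGES_DB = [5.0, 15.0]


def make_placeholder_scorer(scale: float) -> Scorer:
    """Stand-in for a trained PLDA model: a scaled dot product, NOT real PLDA."""
    def score(enroll: Sequence[float], test: Sequence[float]) -> float:
        return scale * sum(e * t for e, t in zip(enroll, test))
    return score


def select_range(snr_db: float, edges: Sequence[float] = SNR_EDGES_DB) -> int:
    """Return the index of the narrow SNR range that contains snr_db."""
    return bisect_right(edges, snr_db)


def fuse(score_indep: float, score_dep: float,
         w_indep: float = 0.5, w_dep: float = 0.5, bias: float = 0.0) -> float:
    """Linear score fusion. The default weights are illustrative, not fitted."""
    return w_indep * score_indep + w_dep * score_dep + bias


def verify(enroll: Sequence[float], test: Sequence[float], test_snr_db: float,
           indep_model: Scorer, dep_models: Sequence[Scorer]) -> float:
    """Score a trial with the SNR-independent model and the SNR-dependent
    model matched to the test utterance's SNR, then fuse the two scores."""
    dep_model = dep_models[select_range(test_snr_db)]
    return fuse(indep_model(enroll, test), dep_model(enroll, test))


# Toy usage: one SNR-independent scorer and one scorer per SNR range.
indep = make_placeholder_scorer(1.0)
dep = [make_placeholder_scorer(s) for s in (0.5, 1.0, 2.0)]
fused_score = verify([1.0, 0.0], [0.5, 0.5], test_snr_db=7.0,
                     indep_model=indep, dep_models=dep)
```

In this toy setup a test utterance at 7 dB falls into the middle range, so the second SNR-dependent scorer is the one fused with the SNR-independent score.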
Original language: English
Pages (from-to): 633-648
Number of pages: 16
Journal: International Journal of Speech Technology
Volume: 18
Issue number: 4
Publication status: Published - 1 Dec 2015

Keywords

  • Fusion
  • i-Vectors
  • NIST 2012 SRE
  • Noise robustness
  • Probabilistic LDA
  • Speaker verification

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Human-Computer Interaction
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
