I-Vector DNN scoring and calibration for noise robust speaker verification

Zhili Tan, Man Wai Mak

Research output: Journal article publicationConference articleAcademic researchpeer-review

3 Citations (Scopus)

Abstract

This paper proposes applying multi-task learning to train deep neural networks (DNNs) for calibrating the PLDA scores of speaker verification systems under noisy environments. To facilitate the DNNs to learn the main task (calibration), several auxiliary tasks were introduced, including the prediction of SNR and duration from i-vectors and classifying whether an i-vector pair belongs to the same speaker or not. The possibility of replacing the PLDA model by a DNN during the scoring stage is also explored. Evaluations on noise contaminated speech suggest that the auxiliary tasks are important for the DNNs to learn the main calibration task and that the uncalibrated PLDA scores are an essential input to the DNNs. Without this input, the DNNs can only predict the score shifts accurately, suggesting that the PLDA model is indispensable.
Original languageEnglish
Pages (from-to)1562-1566
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume2017-August
DOIs
Publication statusPublished - 1 Jan 2017
Event18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: 20 Aug 201724 Aug 2017

Keywords

  • Deep learning
  • Multi-task learning
  • Noise robustness
  • Score calibration
  • Speaker verification

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this