Direct Optimization of the Detection Cost for I-Vector-Based Spoken Language Recognition

Aleksandr Sizov, Kong Aik Lee, Tomi Kinnunen

Research output: Journal article publication › Journal article › Academic research › peer-review

8 Citations (Scopus)


We explore a method to boost the discriminative capabilities of the probabilistic linear discriminant analysis (PLDA) model without losing its generative advantages. We present a sequence of projection and training steps that leads to a classifier which operates in the original i-vector space but is discriminatively trained in a low-dimensional PLDA latent subspace. We use the extended Baum-Welch technique to optimize the model with respect to two objective functions for discriminative training. One is the well-known maximum mutual information objective; the other is a new objective that we propose to approximate the language detection cost. We evaluate performance on the NIST Language Recognition Evaluation (LRE) 2015 and on our development dataset, which comprises utterances from previous LREs. We improve the detection cost by 10% and 6% relative compared with our fine-tuned generative and discriminative baselines, respectively, and by 10% over the best of our previously reported results. The proposed cost-function approximation and PLDA subspace training are applicable to a broad range of tasks.
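To make the idea of approximating a detection cost concrete, the following is a minimal sketch and not the authors' formulation. It assumes a simplified closed-set average detection cost of the kind used in earlier NIST LREs (equal miss and false-alarm costs, target prior 0.5, a fixed decision threshold) and, as one common smoothing choice, replaces the hard accept/reject count with a sigmoid so that the cost becomes differentiable in the scores. The function name `avg_detection_cost`, the slope parameter `alpha`, and the toy data are all hypothetical; the exact LRE 2015 metric and the approximation proposed in the paper are not given in this record.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def avg_detection_cost(scores, labels, threshold=0.0, p_target=0.5,
                       c_miss=1.0, c_fa=1.0, alpha=None):
    """Simplified closed-set average detection cost (illustrative only).

    scores[i][t] is the detection score of trial i for language t,
    labels[i] is the true language index of trial i.
    alpha=None uses hard accept/reject counts; a float alpha uses a
    sigmoid of that slope as a smooth surrogate (an assumption here).
    """
    n_lang = len(scores[0])
    if alpha is None:
        accept = lambda s: 1.0 if s >= threshold else 0.0
    else:
        accept = lambda s: sigmoid(alpha * (s - threshold))

    # Non-target prior is spread evenly over the other languages.
    p_nontarget = (1.0 - p_target) / (n_lang - 1)
    total = 0.0
    for t in range(n_lang):          # hypothesised (target) language
        for k in range(n_lang):      # true language of the trials
            accepts = [accept(s[t]) for s, l in zip(scores, labels) if l == k]
            if not accepts:
                continue             # no trials of language k
            rate = sum(accepts) / len(accepts)
            if k == t:
                total += c_miss * p_target * (1.0 - rate)   # miss rate
            else:
                total += c_fa * p_nontarget * rate          # false-alarm rate
    return total / n_lang


# Toy data: two languages, four trials (hypothetical scores).
toy_scores = [[2.0, -2.0], [1.0, -1.0], [-3.0, 3.0], [0.5, -0.5]]
toy_labels = [0, 0, 1, 1]

hard_cost = avg_detection_cost(toy_scores, toy_labels)               # 0.25
smooth_cost = avg_detection_cost(toy_scores, toy_labels, alpha=2.0)  # differentiable surrogate
```

With a large `alpha` the surrogate approaches the hard cost, while a smaller `alpha` gives smoother gradients at the price of a looser approximation; how the paper balances this trade-off is not stated here.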

Original language: English
Pages (from-to): 588-597
Number of pages: 10
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Issue number: 3
Publication status: Published - Mar 2017
Externally published: Yes


Keywords

  • Discriminative training
  • factor analysis
  • language detection
  • language identification
  • PLDA

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering

