Twin model G-PLDA for duration mismatch compensation in text-independent speaker verification

Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong Aik Lee

Research output: Journal article publication, Conference article, Academic research, peer-review

8 Citations (Scopus)

Abstract

Short-duration speaker verification is a challenging problem, partly because of utterance duration mismatch. This paper proposes a novel method that modifies the standard Gaussian probabilistic linear discriminant analysis (G-PLDA) to use two separate, jointly trained generative models for i-vectors from long and short utterances. The proposed twin model G-PLDA employs distinct models for i-vectors of different durations from the same speaker, but the two models share the same latent variables. Unlike the standard G-PLDA, the twin model G-PLDA takes the differences between utterances of varying durations into account. Hyper-parameter estimation and scoring formulae for the twin model G-PLDA are presented. Experimental results obtained using NIST 2010 data show that the proposed technique leads to relative improvements of 8.5% and 15.6% when tested on utterances of 5-second and 3-second durations, respectively.
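
As a reading aid, the structure described in the abstract can be sketched as follows. This is a minimal sketch that assumes the commonly used simplified G-PLDA form; the symbols (m, Φ, h, ε, Σ and the subscripts l and s for long and short utterances) are illustrative assumptions and are not taken from the abstract, so the paper's exact parameterization may differ.

```latex
% Standard (simplified) G-PLDA: one model for all i-vectors of speaker i
\mathbf{x}_{ij} = \mathbf{m} + \boldsymbol{\Phi}\,\mathbf{h}_i + \boldsymbol{\epsilon}_{ij},
\qquad \mathbf{h}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
\qquad \boldsymbol{\epsilon}_{ij} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})

% Twin model (assumed form): separate parameters for long (l) and short (s)
% utterances, with the same speaker latent variable h_i shared by both
\mathbf{x}^{(l)}_{ij} = \mathbf{m}_l + \boldsymbol{\Phi}_l\,\mathbf{h}_i + \boldsymbol{\epsilon}^{(l)}_{ij},
\qquad \boldsymbol{\epsilon}^{(l)}_{ij} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_l)

\mathbf{x}^{(s)}_{ik} = \mathbf{m}_s + \boldsymbol{\Phi}_s\,\mathbf{h}_i + \boldsymbol{\epsilon}^{(s)}_{ik},
\qquad \boldsymbol{\epsilon}^{(s)}_{ik} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_s)
```

Under this reading, the shared h_i ties long and short i-vectors of the same speaker together, while the duration-specific parameters absorb the systematic differences between the two conditions.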

Original language: English
Pages (from-to): 1853-1857
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 08-12-September-2016
Publication status: Published - Sept 2016
Externally published: Yes
Event: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: 8 Sept 2016 to 16 Sept 2016

Keywords

  • Automatic speaker verification
  • G-PLDA
  • i-vector
  • Short duration speaker verification
  • Twin model G-PLDA

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
