Boosting the performance of I-vector based speaker verification via utterance partitioning

Wei Rao, Man Wai Mak

Research output: Journal article publication › Journal article › Academic research › peer-review

47 Citations (Scopus)

Abstract

The success of the recent i-vector approach to speaker verification relies on the capability of i-vectors to capture speaker characteristics and on the subsequent channel compensation methods to suppress channel variability. Typically, given an utterance, an i-vector is determined from the utterance regardless of its length. This paper investigates how utterance length affects the discriminative power of i-vectors and demonstrates that this discriminative power reaches a plateau quickly as the utterance length increases. This observation suggests that it is possible to make the best use of a long conversation by partitioning it into a number of sub-utterances so that more i-vectors can be produced for each conversation. To increase the number of sub-utterances without sacrificing the representation power of the corresponding i-vectors, repeated applications of frame-index randomization and utterance partitioning are performed. Results on the NIST 2010 speaker recognition evaluation (SRE) suggest that (1) using more i-vectors per conversation can help to find more robust linear discriminant analysis (LDA) and within-class covariance normalization (WCCN) transformation matrices, especially when the number of conversations per training speaker is limited; and (2) increasing the number of i-vectors per target speaker helps the i-vector based support vector machines (SVM) to find better decision boundaries, thus making SVM scoring outperform cosine distance scoring by 19% and 9% in terms of minimum normalized DCF and EER, respectively.
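
The repeated frame-index randomization and utterance partitioning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `up_avr_partitions`, its parameters, and the toy feature matrix are all hypothetical, and the sketch assumes an utterance is given as a NumPy array of acoustic frames (one row per frame). Each returned sub-utterance would then be passed to an i-vector extractor, which is not shown.

```python
import numpy as np

def up_avr_partitions(frames, num_partitions=4, num_resamplings=2, seed=0):
    """Split one utterance into num_resamplings * num_partitions sub-utterances.

    frames: array of shape (num_frames, feature_dim), one acoustic vector per row.
    All names and default values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    sub_utterances = []
    for _ in range(num_resamplings):
        # Frame-index randomization: shuffle the order of the frame indices.
        order = rng.permutation(frames.shape[0])
        # Utterance partitioning: cut the shuffled indices into equal-sized groups.
        for idx in np.array_split(order, num_partitions):
            sub_utterances.append(frames[idx])
    return sub_utterances

# Toy utterance: 20 frames of 3-dimensional features.
frames = np.arange(20 * 3, dtype=float).reshape(20, 3)
subs = up_avr_partitions(frames, num_partitions=4, num_resamplings=2)
print(len(subs), subs[0].shape)
```

Because every resampling reshuffles the frames before partitioning, each pass yields a different set of sub-utterances from the same conversation, which is how the number of i-vectors per conversation can grow beyond the number of partitions.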
Original language: English
Article number: 6423258
Pages (from-to): 1012-1022
Number of pages: 11
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 21
Issue number: 5
Publication status: Published - 20 Feb 2013

Keywords

  • I-vectors
  • linear discriminant analysis
  • speaker verification
  • support vector machines
  • utterance partitioning with acoustic vector resampling (UP-AVR)

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering
