Co-whitening of i-vectors for short and long duration speaker verification

Longting Xu, Kong Aik Lee, Haizhou Li, Zhen Yang

Research output: Journal article publicationConference articleAcademic researchpeer-review

1 Citation (Scopus)

Abstract

An i-vector is a fixed-length and low-rank representation of a speech utterance. It has been used extensively in text-independent speaker verification. Ideally, speech utterances from the same speaker would map to an unique i-vector. However, this is not the case due to some intrinsic and extrinsic factors like physical condition of the speaker, channel difference, noise and notably the duration of speech utterances. In particular, we found that i-vectors extracted from short utterances exhibit larger variance than that of long utterances. To address the problem, we propose a co-whitening approach, taking into account the duration, while maximizing the correlation between the i-vectors of short and long duration. The proposed co-whitening method was derived based on canonical correlation analysis (CCA). Experimental results on NIST SRE 2010 show that co-whitening method is effective in compensating the duration mismatch, leading to a reduction of up to 13.07% in equal error rate (EER).

Original languageEnglish
Pages (from-to)1066-1070
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume2018-September
DOIs
Publication statusPublished - Sept 2018
Externally publishedYes
Event19th Annual Conference of the International Speech Communication, INTERSPEECH 2018 - Hyderabad, India
Duration: 2 Sept 20186 Sept 2018

Keywords

  • Canonical correlation analysis
  • Co-whitening
  • I-vector
  • Short duration
  • Speaker recognition
  • Text-independent

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Fingerprint

Dive into the research topics of 'Co-whitening of i-vectors for short and long duration speaker verification'. Together they form a unique fingerprint.

Cite this