Abstract
An i-vector is a fixed-length and low-rank representation of a speech utterance. It has been used extensively in text-independent speaker verification. Ideally, speech utterances from the same speaker would map to an unique i-vector. However, this is not the case due to some intrinsic and extrinsic factors like physical condition of the speaker, channel difference, noise and notably the duration of speech utterances. In particular, we found that i-vectors extracted from short utterances exhibit larger variance than that of long utterances. To address the problem, we propose a co-whitening approach, taking into account the duration, while maximizing the correlation between the i-vectors of short and long duration. The proposed co-whitening method was derived based on canonical correlation analysis (CCA). Experimental results on NIST SRE 2010 show that co-whitening method is effective in compensating the duration mismatch, leading to a reduction of up to 13.07% in equal error rate (EER).
Original language | English |
---|---|
Pages (from-to) | 1066-1070 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2018-September |
DOIs | |
Publication status | Published - Sept 2018 |
Externally published | Yes |
Event | 19th Annual Conference of the International Speech Communication, INTERSPEECH 2018 - Hyderabad, India Duration: 2 Sept 2018 → 6 Sept 2018 |
Keywords
- Canonical correlation analysis
- Co-whitening
- I-vector
- Short duration
- Speaker recognition
- Text-independent
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation