Abstract
The i-vector is widely described as a compact and effective representation of speech utterances for speaker recognition. Standard i-vector extraction can be expensive for applications where computing resources are limited, for instance, on handheld devices. Fast approximate inference of i-vectors aims to reduce the computational cost of i-vector extraction where the run-time requirement is critical. Most fast approaches hinge on certain assumptions to approximate the i-vector inference formulae with little loss of accuracy. In this paper, we analyze the uniform assumption that we proposed earlier. We show that the assumption generally holds for long utterances but is inadequate for utterances of short duration. We then propose to compensate for the negative effects by applying a simple gain factor to the i-vectors estimated from short utterances. The assertion is confirmed through analysis and experiments conducted on the NIST SRE'08 and SRE'10 datasets.
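For context, the following is a minimal sketch of the standard (exact) i-vector posterior-mean computation that fast methods approximate. It assumes the usual total-variability model with diagonal UBM covariances. The function name `extract_ivector` and all variable names are illustrative and do not come from the paper. Neither the uniform assumption nor the gain factor is implemented here, since the abstract does not give their form.

```python
import numpy as np

def extract_ivector(T, Sigma, N, F):
    """Posterior-mean i-vector under the standard total-variability model.

    T     : (C, D, R) per-component total-variability matrices T_c
    Sigma : (C, D)    diagonal UBM covariances (assumed diagonal here)
    N     : (C,)      zero-order (occupancy) statistics N_c
    F     : (C, D)    centered first-order statistics F_c
    Returns the (R,) vector
        (I + sum_c N_c T_c^T Sigma_c^{-1} T_c)^{-1} sum_c T_c^T Sigma_c^{-1} F_c
    """
    C, D, R = T.shape
    precision = np.eye(R)   # posterior precision, starts from the prior I
    linear = np.zeros(R)    # right-hand side of the linear system
    for c in range(C):
        Tc_scaled = T[c] / Sigma[c][:, None]        # Sigma_c^{-1} T_c, shape (D, R)
        precision += N[c] * (T[c].T @ Tc_scaled)    # N_c T_c^T Sigma_c^{-1} T_c
        linear += Tc_scaled.T @ F[c]                # T_c^T Sigma_c^{-1} F_c
    # Solve the R x R system rather than forming an explicit inverse.
    return np.linalg.solve(precision, linear)
```

In this sketch, the per-utterance cost is dominated by accumulating the R x R precision matrix over all C components and then solving the resulting system. This is the kind of computation that fast approximate inference aims to avoid or simplify.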
Original language | English |
---|---|
Pages (from-to) | 1527-1531 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2017-August |
Publication status | Published - Aug 2017 |
Externally published | Yes |
Event | 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden (20 Aug 2017 → 24 Aug 2017) |
Keywords
- Factor analysis
- Speaker recognition
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation