Abstract
In text-independent speaker verification, it has been shown effective to represent variable-length, information-rich speech utterances with fixed-dimensional vectors, for instance, in the form of i-vectors. An i-vector is a low-dimensional vector in the so-called total variability space, which is represented by a tall and thin rectangular matrix. Taking each row of the total variability matrix as a random vector, we examine the redundancy in representing the total variability space. We show that the total variability matrix is compressible and that this characteristic can be exploited to reduce the memory and computational requirements of i-vector extraction. We also show that existing sparse coding and dictionary learning techniques can be readily adapted for this purpose. Experiments on the NIST SRE'10 dataset confirm that the total variability matrix can be represented by a smaller matrix without affecting performance.
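As a rough illustration of the idea described in the abstract, and not the paper's actual procedure, the sketch below treats each row of a total-variability-like matrix as a signal, learns a dictionary over those rows, and encodes each row sparsely, so that the dictionary plus sparse codes stand in for the dense matrix. The matrix sizes, sparsity level, random placeholder matrix, and the use of scikit-learn's DictionaryLearning with OMP coding are all assumptions made for illustration.

```python
# Minimal sketch: compress the rows of a total-variability-like matrix T
# with off-the-shelf dictionary learning. Sizes and library choices are
# illustrative assumptions, not taken from the paper.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)

# Placeholder for T (rows x i-vector dimension); a real T estimated from
# speech data is far more compressible than this random stand-in.
T = rng.standard_normal((1000, 400))

# Learn a dictionary over the rows of T and encode each row with a sparse
# code; storing (dictionary, sparse codes) replaces the dense matrix.
dl = DictionaryLearning(n_components=200, transform_algorithm='omp',
                        transform_n_nonzero_coefs=20, max_iter=10,
                        random_state=0)
codes = dl.fit_transform(T)   # one sparse code vector per row of T
D = dl.components_            # learned dictionary, shape (200, 400)

# Approximate reconstruction of T from the compressed representation.
T_hat = codes @ D
rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The memory saving comes from keeping only the dictionary and the sparse codes (values plus indices) instead of the full dense matrix; how aggressively one can compress depends on how redundant the rows of the actual total variability matrix are.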
| Original language | English |
| --- | --- |
| Pages (from-to) | 1022-1026 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2015-January |
| Publication status | Published - Sept 2015 |
| Externally published | Yes |
| Event | 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015, Dresden, Germany, 6-10 Sept 2015 |
Keywords
- I-vector
- Sparse coding
- Speaker verification
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation