Abstract
In this paper, we advocate the use of the uncompressed form of the i-vector and rely on subspace modeling using probabilistic linear discriminant analysis (PLDA) to handle speaker and session (or channel) variability. An i-vector is a low-dimensional vector containing both speaker and channel information acquired from a speech segment. When PLDA is used on an i-vector, dimension reduction is performed twice: first in the i-vector extraction process and second in the PLDA model. We refer to the uncompressed i-vector as the i-supervector. Keeping the full dimensionality of the i-vector in the i-supervector space for PLDA modeling and scoring avoids unnecessary loss of information. The drawback of using the i-supervector with PLDA is the inversion of large matrices in the estimation of the full posterior distribution, which we show can be solved rather efficiently by partitioning the large matrices into smaller blocks. We also introduce the Gaussianized rank-norm, as an alternative to whitening, for feature normalization prior to PLDA modeling. We found that the i-supervector performs better during normalization. Better performance is obtained by combining the i-supervector and i-vector at the score level. We also analyze the computational complexity of the i-supervector system, compared with that of the i-vector, at four stages: loading matrix estimation, posterior extraction, PLDA modeling, and PLDA scoring.
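The abstract names the Gaussianized rank-norm without defining it. As a rough illustration only, the sketch below shows one common form of rank-based Gaussianization: each feature dimension is replaced by its rank among a set of background values, and that rank is mapped through the inverse standard normal CDF. The function names, the `(rank + 0.5) / (N + 1)` probability mapping, and the per-dimension treatment are assumptions made for this example and may differ from the paper's formulation.

```python
from bisect import bisect_left
from statistics import NormalDist


def gaussianized_rank_norm(value, background):
    """Map one scalar feature to a standard-normal quantile via its rank
    among background values (hypothetical helper, not taken from the paper)."""
    ordered = sorted(background)
    # Number of background values strictly below `value`.
    rank = bisect_left(ordered, value)
    # Assumed mapping: offset by 0.5 and divide by N + 1 so that the
    # probability stays strictly inside (0, 1) and inv_cdf is defined.
    p = (rank + 0.5) / (len(ordered) + 1)
    return NormalDist().inv_cdf(p)


def normalize_vector(vector, background_vectors):
    """Apply the rank mapping independently to each dimension."""
    columns = list(zip(*background_vectors))
    return [gaussianized_rank_norm(v, col) for v, col in zip(vector, columns)]
```

Unlike whitening, a mapping of this kind depends only on the ordering of values in each dimension, so it is insensitive to outliers in the background data.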
Original language | English |
---|---|
Article number | 29 |
Pages (from-to) | 1-13 |
Number of pages | 13 |
Journal | Eurasip Journal on Audio, Speech, and Music Processing |
Volume | 2014 |
Issue number | 1 |
Publication status | Published - 1 Dec 2014 |
Externally published | Yes |
Keywords
- I-supervector
- I-vector
- Probabilistic linear discriminant analysis
- Speaker verification
ASJC Scopus subject areas
- Acoustics and Ultrasonics
- Electrical and Electronic Engineering