Abstract
The total variability model has been shown to be effective for text-independent speaker verification. It provides a tractable way to estimate the so-called i-vector, which describes the speaker and session variability across a whole utterance. To extract the local session variability that an i-vector neglects, local variability models have been proposed, including the Gaussian-oriented and the dimension-oriented local variability models. This paper presents a consolidated study of the total and local variability models and compares them in full under the same framework. In addition, new extensions are proposed for the existing local variability models. The total variability model and the local variability models are compared in experiments on the NIST SRE’08 and SRE’10 datasets. In these experiments, the dimension-oriented local variability models capture session variability that is complementary to that estimated by the total variability model.
Original language | English |
---|---|
Pages (from-to) | 217-228 |
Number of pages | 12 |
Journal | Journal of Signal Processing Systems |
Volume | 82 |
Issue number | 2 |
Publication status | Published - 1 Feb 2016 |
Keywords
- Factor analysis
- Session variability
- Speaker recognition
ASJC Scopus subject areas
- Control and Systems Engineering
- Theoretical Computer Science
- Signal Processing
- Information Systems
- Modelling and Simulation
- Hardware and Architecture