Abstract
The total variability model (TVM) was recently proposed for compressing speech utterances into low-dimensional vectors (the so-called identity vectors, or i-vectors). Unlike speech utterances, which vary in length, i-vectors have a fixed length and can therefore be used with simple classifiers for the text-independent speaker verification task. This paper proposes the local variability model (LVM), whose central idea is to capture the local variability associated with individual Gaussians in the acoustic space, which is absent from the i-vector representation. We analyze the latent structure of both the total and local variability models and show that tying the latent variable across frames and mixtures leads to powerful methods for extracting information from variable-length sequences. Experimental results on the NIST SRE'08 and SRE'10 datasets show that the proposed LVM is effective for speaker verification.
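For orientation, the total variability model referred to above is usually written in the following form in the i-vector literature. This is a sketch in commonly used notation, which is an assumption here because the abstract itself defines no symbols: $\mathbf{M}$ is the utterance-dependent GMM mean supervector, $\mathbf{m}$ the universal background model supervector, $\mathbf{T}$ the low-rank total variability matrix, and $\mathbf{w}$ the latent variable whose posterior mean is taken as the i-vector.

```latex
% Standard total variability (i-vector) model; notation assumed, not taken from the abstract.
\begin{align}
  \mathbf{M}   &= \mathbf{m} + \mathbf{T}\mathbf{w},
                 \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \\
  % Per-component view: the same w is shared by every Gaussian c and every frame.
  \mathbf{M}_c &= \mathbf{m}_c + \mathbf{T}_c \mathbf{w},
                 \qquad c = 1, \dots, C
\end{align}
```

The second line restates the first for each of the $C$ mixture components and makes the tying explicit: a single $\mathbf{w}$ is shared across all components and frames of an utterance. The exact form of the LVM is not given in the abstract, so it is not reproduced here.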
Original language | English |
---|---|
Pages | 54-59 |
Number of pages | 6 |
Publication status | Published - Jun 2014 |
Externally published | Yes |
Event | Speaker and Language Recognition Workshop, Odyssey 2014 - Joensuu, Finland; Duration: 16 Jun 2014 → 19 Jun 2014 |
Conference
Conference | Speaker and Language Recognition Workshop, Odyssey 2014 |
---|---|
Country/Territory | Finland |
City | Joensuu |
Period | 16/06/14 → 19/06/14 |
Keywords
- Factor analysis
- Session variability
- Speaker recognition
ASJC Scopus subject areas
- Signal Processing
- Software
- Human-Computer Interaction