Abstract
GMM-SVM has become a promising approach to text-independent speaker verification. However, a problematic issue of this approach is the extremely serious imbalance between the numbers of speaker-class and impostor-class utterances available for training the speaker-dependent SVMs. This data-imbalance problem can be addressed by (1) creating more speaker-class supervectors for SVM training through utterance partitioning with acoustic vector resampling (UP-AVR) and (2) avoiding the SVM training so that speaker scores are formulated as an inner product discriminant function (IPDF) between the target-speaker's supervector and test supervector. This paper highlights the differences between these two approaches and compares the effect of using different kernels - including the KL divergence kernel, GMM-UBM mean interval (GUMI) kernel and geometric-mean-comparison kernel - on their performance. Experiments on the NIST 2010 Speaker Recognition Evaluation suggest that GMM-SVM with UP-AVR is superior to speaker comparison and that the GUMI kernel is slightly better than the KL kernel in speaker comparison.
Original language | English |
---|---|
Pages (from-to) | 2717-2720 |
Number of pages | 4 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publication status | Published - 1 Dec 2011 |
Event | 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011 - Florence, Italy Duration: 27 Aug 2011 → 31 Aug 2011 |
Keywords
- Data imbalance
- GMM-SVM
- NIST SRE
- Speaker comparison
- Speaker verification
- Utterance partitioning
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation