Abstract
Recent research has demonstrated the merit of combining Gaussian mixture models (GMMs) and support vector machines (SVMs) for text-independent speaker verification. However, one unaddressed issue in this GMM-SVM approach is the imbalance between the numbers of speaker-class and impostor-class utterances available for training a speaker-dependent SVM. This paper proposes a resampling technique, namely utterance partitioning with acoustic vector resampling (UP-AVR), to mitigate the data imbalance problem. Briefly, the sequence order of the acoustic vectors in an enrollment utterance is first randomized, and the randomized sequence is then partitioned into a number of segments. Each of these segments is used to produce a GMM supervector via MAP adaptation and mean vector concatenation. The randomization and partitioning processes are repeated several times to produce a sufficient number of speaker-class supervectors for training an SVM. Experimental evaluations based on the NIST 2002 and 2004 SRE suggest that UP-AVR can reduce the error rate of GMM-SVM systems.
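The randomize, partition, and adapt steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `up_avr_segments` and `map_supervector`, the equal-length segments, the dropping of leftover frames, the mean-only MAP update with a relevance factor of 16, and the toy two-component background model are all hypothetical choices made for the example.

```python
import math
import random


def up_avr_segments(frames, num_partitions, num_resamplings, seed=0):
    """Shuffle the frame order, split into equal-length segments, and repeat."""
    rng = random.Random(seed)
    seg_len = len(frames) // num_partitions
    if seg_len == 0:
        raise ValueError("more partitions than frames")
    segments = []
    for _ in range(num_resamplings):
        shuffled = list(frames)
        rng.shuffle(shuffled)  # randomize the sequence order of the vectors
        for p in range(num_partitions):
            # Assumption: frames left over after an uneven split are dropped.
            segments.append(shuffled[p * seg_len:(p + 1) * seg_len])
    return segments


def map_supervector(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal GMM, then mean concatenation."""
    num_comp, dim = len(means), len(means[0])
    occupancy = [0.0] * num_comp
    first_order = [[0.0] * dim for _ in range(num_comp)]
    for x in frames:
        # Log of weight times diagonal Gaussian density, per component.
        log_p = []
        for c in range(num_comp):
            ll = math.log(weights[c])
            for d in range(dim):
                ll -= 0.5 * (math.log(2 * math.pi * variances[c][d])
                             + (x[d] - means[c][d]) ** 2 / variances[c][d])
            log_p.append(ll)
        top = max(log_p)  # subtract the maximum for numerical stability
        post = [math.exp(v - top) for v in log_p]
        norm = sum(post)
        for c in range(num_comp):
            g = post[c] / norm  # posterior probability of component c
            occupancy[c] += g
            for d in range(dim):
                first_order[c][d] += g * x[d]
    supervector = []
    for c in range(num_comp):
        alpha = occupancy[c] / (occupancy[c] + relevance)
        for d in range(dim):
            if occupancy[c] > 0:
                data_mean = first_order[c][d] / occupancy[c]
            else:
                data_mean = means[c][d]
            # Interpolate between the data mean and the prior (background) mean.
            supervector.append(alpha * data_mean + (1 - alpha) * means[c][d])
    return supervector


# Toy data: 200 two-dimensional "acoustic vectors" and a two-component model.
data_rng = random.Random(1)
utterance = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
ubm = {
    "weights": [0.5, 0.5],
    "means": [[-1.0, 0.0], [1.0, 0.0]],
    "variances": [[1.0, 1.0], [1.0, 1.0]],
}

segments = up_avr_segments(utterance, num_partitions=4, num_resamplings=5)
supervectors = [map_supervector(seg, **ubm) for seg in segments]
print(len(supervectors), len(supervectors[0]))
```

With 4 partitions and 5 resamplings, one enrollment utterance yields 20 speaker-class supervectors instead of one, which is the sense in which the technique addresses the imbalance. In a real system the frames would be acoustic feature vectors and the background model would have many more components; the resulting supervectors would then be passed to an SVM trainer, which is not shown here.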
Original language | English |
---|---|
Pages (from-to) | 119-130 |
Number of pages | 12 |
Journal | Speech Communication |
Volume | 53 |
Issue number | 1 |
Publication status | Published - 1 Jan 2011 |
Keywords
- Data imbalance
- GMM-supervectors (GSV)
- GMM-SVM
- Random resampling
- Speaker verification
- Support vector machine
- Utterance partitioning
ASJC Scopus subject areas
- Software
- Modelling and Simulation
- Communication
- Language and Linguistics
- Linguistics and Language
- Computer Vision and Pattern Recognition
- Computer Science Applications