Addressing the data-imbalance problem in kernel-based speaker verification via utterance partitioning and speaker comparison

Wei Rao, Man Wai Mak

Research output: Journal article publication, Conference article, Academic research, peer-review

7 Citations (Scopus)

Abstract

GMM-SVM has become a promising approach to text-independent speaker verification. However, one problem with this approach is the severe imbalance between the number of speaker-class and impostor-class utterances available for training the speaker-dependent SVMs. This data-imbalance problem can be addressed by (1) creating more speaker-class supervectors for SVM training through utterance partitioning with acoustic vector resampling (UP-AVR), and (2) avoiding SVM training altogether, so that speaker scores are formulated as an inner product discriminant function (IPDF) between the target speaker's supervector and the test supervector. This paper highlights the differences between these two approaches and compares how different kernels (the KL-divergence kernel, the GMM-UBM mean interval (GUMI) kernel and the geometric-mean-comparison kernel) affect their performance. Experiments on the NIST 2010 Speaker Recognition Evaluation suggest that GMM-SVM with UP-AVR outperforms speaker comparison, and that the GUMI kernel performs slightly better than the KL kernel in speaker comparison.
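The two remedies described in the abstract can be illustrated with a short NumPy sketch. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a diagonal-covariance UBM, MAP adaptation of the means only (with a hypothetical relevance factor of 16), and the mean-centred KL-divergence supervector. The function names `up_avr`, `map_adapt_means` and `kl_supervector`, along with the toy UBM and random "utterances", are hypothetical.

In this sketch, UP-AVR shuffles the enrollment frames, splits them into N partitions, repeats this R times and also keeps the full utterance. The result is RN + 1 speaker-class supervectors for SVM training. Speaker comparison skips the SVM and scores a trial as the inner product of the target and test supervectors.

```python
import numpy as np


def up_avr(frames, n_partitions, n_resamples, rng):
    """Sketch of utterance partitioning with acoustic vector resampling.

    Keeps the full utterance, then n_resamples times: randomly reorder the
    frames and split them into n_partitions near-equal segments.
    Returns n_partitions * n_resamples + 1 frame sets.
    """
    segments = [frames]  # the full utterance is kept as one example
    for _ in range(n_resamples):
        shuffled = frames[rng.permutation(len(frames))]
        segments.extend(np.array_split(shuffled, n_partitions))
    return segments


def map_adapt_means(frames, ubm_w, ubm_mu, ubm_var, relevance=16.0):
    """MAP-adapt the UBM means to `frames` (diagonal covariances assumed)."""
    diff = frames[:, None, :] - ubm_mu[None, :, :]                 # (T, C, D)
    log_g = -0.5 * (np.sum(diff ** 2 / ubm_var, axis=2)
                    + np.sum(np.log(2 * np.pi * ubm_var), axis=1))  # (T, C)
    log_p = np.log(ubm_w) + log_g
    log_p -= log_p.max(axis=1, keepdims=True)       # stabilise exp()
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)         # component posteriors
    n = post.sum(axis=0)                            # zeroth-order stats (C,)
    first = post.T @ frames                         # first-order stats (C, D)
    return (first + relevance * ubm_mu) / (n + relevance)[:, None]


def kl_supervector(adapted_mu, ubm_w, ubm_mu, ubm_var):
    """Scale mean offsets by sqrt(w_j) * Sigma_j^(-1/2) so that a plain dot
    product equals the (mean-centred) KL-divergence kernel."""
    scaled = np.sqrt(ubm_w)[:, None] * (adapted_mu - ubm_mu) / np.sqrt(ubm_var)
    return scaled.ravel()


# Toy setup: hypothetical 8-component UBM on 4-dimensional features.
rng = np.random.default_rng(0)
C, D = 8, 4
ubm_w = np.full(C, 1.0 / C)
ubm_mu = rng.normal(size=(C, D))
ubm_var = np.ones((C, D))
enrol = rng.normal(loc=0.5, size=(1000, D))   # toy target-speaker utterance
test = rng.normal(loc=0.5, size=(600, D))     # toy test utterance

# (1) UP-AVR: many speaker-class supervectors from one enrollment utterance.
segments = up_avr(enrol, n_partitions=4, n_resamples=4, rng=rng)
svs = np.stack([kl_supervector(map_adapt_means(s, ubm_w, ubm_mu, ubm_var),
                               ubm_w, ubm_mu, ubm_var) for s in segments])
print(svs.shape)  # (17, 32)

# (2) Speaker comparison: no SVM, the score is an inner product (IPDF).
target_mu = map_adapt_means(enrol, ubm_w, ubm_mu, ubm_var)
test_mu = map_adapt_means(test, ubm_w, ubm_mu, ubm_var)
target_sv = kl_supervector(target_mu, ubm_w, ubm_mu, ubm_var)
test_sv = kl_supervector(test_mu, ubm_w, ubm_mu, ubm_var)
score = float(target_sv @ test_sv)
```

In the GMM-SVM route, the 17 rows of `svs` would act as the speaker-class examples, trained against a pool of impostor supervectors. The GUMI or geometric-mean-comparison kernels would replace `kl_supervector` with a different per-mixture normalisation, which this sketch does not attempt to reproduce.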
Original language: English
Pages (from-to): 2717-2720
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published, 1 Dec 2011
Event: 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011, Florence, Italy
Duration: 27 Aug 2011 to 31 Aug 2011

Keywords

  • Data imbalance
  • GMM-SVM
  • NIST SRE
  • Speaker comparison
  • Speaker verification
  • Utterance partitioning

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
