Characterizing speech utterances for speaker verification with sequence kernel SVM

Kong Aik Lee, Changhuai You, Haizhou Li, Tomi Kinnunen, Donglai Zhu

Research output: Journal article publication, Conference article, Academic research, peer-review

10 Citations (Scopus)

Abstract

Support vector machines (SVMs) equipped with sequence kernels have proven to be a powerful technique for speaker verification. A number of sequence kernels have recently been proposed, each motivated from a different perspective and with a different mathematical derivation, which makes analytical comparison of the kernels difficult. To facilitate such comparisons, we propose a generic structure showing how different levels of cues conveyed by speech utterances, ranging from low-level acoustic features to high-level speaker cues, are characterized within a sequence kernel. We then identify the similarities and differences between the popular generalized linear discriminant sequence (GLDS) and GMM supervector kernels, as well as our own probabilistic sequence kernel (PSK). Furthermore, we enhance the PSK in terms of accuracy and computational complexity. The enhanced PSK gives accuracy competitive with the other two kernels. Fusing all three kernels yields an equal error rate (EER) of 4.83% on the 2006 NIST SRE core test.

Original language: English
Pages (from-to): 1397-1400
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2008
Externally published: Yes
Event: INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia
Duration: 22 Sept 2008 - 26 Sept 2008

Keywords

  • Characteristic vector
  • Sequence kernel
  • Speaker verification
  • Support vector machine

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Sensory Systems
