High-level feature-based speaker verification via articulatory phonetic-class pronunciation modeling

Shi Xiong Zhang, Man Wai Mak, Helen Meng

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

Abstract

Although articulatory feature-based conditional pronunciation models (AFCPMs) can capture the pronunciation characteristics of speakers, they require one discrete density function for each phoneme, which may lead to inaccurate models when the amount of training data is limited. This paper proposes a phonetic-class based AFCPM in which the density functions in speaker models are conditioned on phonetic classes instead of phonemes. Phonemes are mapped to phonetic classes by (1) vector quantizing the phoneme-dependent universal background models, (2) grouping phonemes according to the classical phoneme tree, or (3) a combination of (1) and (2). A new scoring method that uses an SVM to combine the scores of phonetic-class models is also proposed. Evaluations based on the 2000 NIST SRE show that the proposed approach can effectively solve the data sparseness problem encountered in conventional AFCPM.
Original language: English
Title of host publication: International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Pages: 621-624
Number of pages: 4
Volume: 1
Publication status: Published - 1 Dec 2007
Event: 8th Annual Conference of the International Speech Communication Association, Interspeech 2007 - Antwerp, Belgium
Duration: 27 Aug 2007 to 31 Aug 2007

Conference

Conference: 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Country/Territory: Belgium
City: Antwerp
Period: 27/08/07 to 31/08/07

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Modelling and Simulation
  • Linguistics and Language
  • Communication
