Speaker verification via high-level feature-based phonetic-class pronunciation modeling

Shi Xiong Zhang, Man Wai Mak, Helen M. Meng

Research output: Journal article publication, Journal article, Academic research, peer-review

12 Citations (Scopus)

Abstract

It has recently been shown that the pronunciation characteristics of speakers can be represented by articulatory feature-based conditional pronunciation models (AFCPMs). However, the pronunciation models are phoneme dependent, which may lead to speaker models with low discriminative power when the amount of enrollment data is limited. This paper proposes mitigating this problem by grouping similar phonemes into phonetic classes and representing background and speaker models as phonetic-class-dependent density functions. Phonemes are grouped by 1) vector quantizing the discrete densities in the phoneme-dependent universal background models, 2) using the phone properties specified in the classical phoneme tree, or 3) combining vector quantization and phone properties. Evaluations based on the 2000 NIST SRE show that this phonetic-class approach effectively alleviates the data sparseness problem encountered in conventional AFCPM, which results in better performance when fused with acoustic features.
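The first grouping strategy named in the abstract, vector quantizing the phoneme-dependent discrete densities, can be illustrated with a small sketch. The abstract does not state the clustering algorithm or distance measure, so the k-means procedure with squared Euclidean distance used here is an assumption, as are the function name `quantize_phonemes` and the toy densities, which are invented for illustration and are not data from the paper.

```python
import random


def quantize_phonemes(densities, num_classes, iters=20, seed=0):
    """Group phonemes by k-means over their discrete densities (illustrative).

    densities: dict mapping a phoneme label to a list of probabilities,
               all lists having the same length.
    Returns a dict mapping each phoneme label to a class index.
    """
    rng = random.Random(seed)
    labels = sorted(densities)
    # Initialise the codebook with randomly chosen phoneme densities.
    codebook = [list(densities[p]) for p in rng.sample(labels, num_classes)]
    assignment = {}
    for _ in range(iters):
        # Assignment step: nearest codeword by squared Euclidean distance.
        new_assignment = {
            p: min(
                range(num_classes),
                key=lambda k: sum(
                    (a - b) ** 2 for a, b in zip(densities[p], codebook[k])
                ),
            )
            for p in labels
        }
        if new_assignment == assignment:
            break  # converged: no phoneme changed class
        assignment = new_assignment
        # Update step: each codeword becomes the mean of its member densities.
        for k in range(num_classes):
            members = [densities[p] for p in labels if assignment[p] == k]
            if members:  # keep the old codeword if a class is empty
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical toy densities over three articulatory-feature outcomes.
toy = {
    "p": [0.7, 0.2, 0.1],
    "b": [0.6, 0.3, 0.1],
    "s": [0.1, 0.2, 0.7],
    "z": [0.1, 0.3, 0.6],
}
classes = quantize_phonemes(toy, num_classes=2)
```

In this toy case the two stops and the two fricatives end up in separate classes. Pooling enrollment data within each class is what would give every class-level density more training samples than a single phoneme provides.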
Original language: English
Pages (from-to): 1189-1198
Number of pages: 10
Journal: IEEE Transactions on Computers
Volume: 56
Issue number: 9
Publication status: Published - 1 Sep 2007

Keywords

  • Articulatory features
  • NIST speaker recognition evaluation
  • Phonetic classes
  • Pronunciation modeling
  • Speaker verification

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics
