Abstract
It has recently been shown that the pronunciation characteristics of speakers can be represented by articulatory feature-based conditional pronunciation models (AFCPMs). However, these pronunciation models are phoneme dependent, which may lead to speaker models with low discriminative power when the amount of enrollment data is limited. This paper proposes mitigating this problem by grouping similar phonemes into phonetic classes and representing background and speaker models as phonetic-class-dependent density functions. Phonemes are grouped by 1) vector quantizing the discrete densities in the phoneme-dependent universal background models, 2) using the phone properties specified in the classical phoneme tree, or 3) combining vector quantization and phone properties. Evaluations based on the 2000 NIST SRE show that this phonetic-class approach effectively alleviates the data sparseness problem encountered in conventional AFCPM, which results in better performance when fused with acoustic features.
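As a rough illustration of the first grouping strategy (vector quantizing phoneme-dependent discrete densities), the sketch below clusters phonemes with a plain k-means procedure. It is not the authors' implementation: the function names `quantize_phonemes` and `squared_distance`, the squared Euclidean distance, the farthest-point initialization, and the toy densities are all assumptions made for illustration only.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two discrete densities (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def quantize_phonemes(densities, num_classes, iterations=20):
    """Group phonemes into classes by k-means over their discrete densities.

    densities: dict mapping phoneme label -> list of probabilities (equal lengths).
    Returns a dict mapping phoneme label -> class index.
    """
    names = sorted(densities)
    # Deterministic farthest-point initialization (an assumption, chosen so the
    # sketch gives the same grouping on every run).
    centroids = [list(densities[names[0]])]
    while len(centroids) < num_classes:
        farthest = max(
            names,
            key=lambda n: min(squared_distance(densities[n], c) for c in centroids),
        )
        centroids.append(list(densities[farthest]))

    assignment = {}
    for _ in range(iterations):
        # Assign each phoneme to its nearest centroid.
        new_assignment = {
            n: min(
                range(num_classes),
                key=lambda k: squared_distance(densities[n], centroids[k]),
            )
            for n in names
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Recompute each centroid as the mean density of its members.
        for k in range(num_classes):
            members = [densities[n] for n in names if assignment[n] == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical toy densities over three articulatory-feature classes.
toy_densities = {
    "aa": [0.7, 0.2, 0.1],
    "ae": [0.6, 0.3, 0.1],
    "s": [0.1, 0.2, 0.7],
    "z": [0.1, 0.3, 0.6],
}
classes = quantize_phonemes(toy_densities, num_classes=2)
```

In this toy case the two vowel-like densities end up in one class and the two fricative-like densities in the other, so each class can pool the enrollment data of its member phonemes.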
Original language | English |
---|---|
Pages (from-to) | 1189-1198 |
Number of pages | 10 |
Journal | IEEE Transactions on Computers |
Volume | 56 |
Issue number | 9 |
Publication status | Published - 1 Sept 2007 |
Keywords
- Articulatory features
- NIST speaker recognition evaluation
- Phonetic classes
- Pronunciation modeling
- Speaker verification
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics