Abstract
This paper proposes an articulatory feature-based conditional pronunciation modeling (AFCPM) technique for speaker verification. The technique captures the pronunciation characteristics of speakers by modeling the linkage between the actual phones produced by the speakers and the states of articulation during speech production. The speaker models, which consist of conditional probabilities of two articulatory classes, are adapted from a set of universal background models (UBMs) via MAP adaptation. This creates a direct coupling between the speaker and background models, which prevents over-fitting the speaker models when the amount of speaker data is limited. Experimental results demonstrate that MAP adaptation not only enhances the discriminative power of the speaker models but also improves their robustness against handset mismatches. Results also show that fusing the scores derived from an AFCPM-based system and a conventional spectral-based system achieves an error rate significantly lower than that achieved by either system alone. This suggests that AFCPM and spectral features are complementary.
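The core mechanism the abstract describes, MAP-adapting discrete conditional probabilities from a UBM so that sparsely observed phones fall back on the background model, can be illustrated with a short sketch. The Python snippet below is a minimal illustration assuming simple count-based multinomial models of P(articulatory class | phone); the relevance factor `r`, the function names, and the toy data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def train_ubm(counts):
    """Universal background model: P(class | phone) from counts
    pooled over many background speakers.

    counts: array of shape (n_phones, n_classes).
    """
    totals = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(totals, 1)  # row-normalised probabilities

def map_adapt(ubm, speaker_counts, r=16.0):
    """MAP-adapt the UBM toward one speaker's counts (assumed scheme).

    For each phone p with n_p speaker observations:
        alpha_p   = n_p / (n_p + r)
        P_spk(c|p) = alpha_p * (n_{p,c} / n_p) + (1 - alpha_p) * P_ubm(c|p)
    With little speaker data (small n_p) the adapted model stays close
    to the UBM, which is what prevents over-fitting.
    """
    n_p = speaker_counts.sum(axis=1, keepdims=True)
    alpha = n_p / (n_p + r)
    empirical = speaker_counts / np.maximum(n_p, 1)
    return alpha * empirical + (1.0 - alpha) * ubm

def llr_score(test_counts, spk_model, ubm, eps=1e-10):
    """Log-likelihood ratio of a test utterance: speaker model vs. UBM."""
    return np.sum(test_counts * (np.log(spk_model + eps) - np.log(ubm + eps)))

# Toy example with 3 phones and 2 articulatory classes:
bg = np.array([[40, 60], [70, 30], [50, 50]], dtype=float)
spk = np.array([[2, 8], [1, 0], [0, 0]], dtype=float)
ubm = train_ubm(bg)
model = map_adapt(ubm, spk)   # phones with no speaker data stay at the UBM
test = np.array([[1, 4], [1, 1], [0, 2]], dtype=float)
print(llr_score(test, model, ubm))
```

The score fusion mentioned in the abstract's final result could then be as simple as a weighted sum, e.g. `w * afcpm_score + (1 - w) * spectral_score` with `w` a tuning parameter; the exact fusion rule used in the paper is not stated in the abstract.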
Original language | English |
---|---|
Title of host publication | 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing / Multimedia Signal Processing |
Publisher | IEEE |
Volume | I |
ISBN (Print) | 0780388747, 9780780388741 |
Publication status | Published - 1 Jan 2005 |
Event | 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Philadelphia, PA, United States. Duration: 18 Mar 2005 → 23 Mar 2005 |
Conference
Conference | 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 |
---|---|
Country/Territory | United States |
City | Philadelphia, PA |
Period | 18/03/05 → 23/03/05 |
ASJC Scopus subject areas
- Software
- Signal Processing
- Electrical and Electronic Engineering