Because of the differences in education background, accents, and so on, different persons have different ways of pronunciation. Therefore, the pronunciation patterns of individuals can be used as features for discriminating speakers. This paper exploits the pronunciation characteristics of speakers and proposes a new conditional pronunciation modeling (CPM) technique for speaker verification. The proposed technique establishes a link between articulatory properties (e.g., manners and places of articulation) and phoneme sequences produced by a speaker. This is achieved by aligning two articulatory feature (AF) streams with a phoneme sequence determined by a phoneme recognizer, which is followed by formulating the probabilities of articulatory classes conditioned on the phonemes as speaker-dependent discrete probabilistic models. The scores obtained from the AF-based pronunciation models are then fused with those obtained from spectral-based acoustic models. A frame-weighted fusion approach is introduced to weight the frame-based fused scores based on the confidence of observing the articulatory classes. The effectiveness of AF-based CPM and the frame-weighted approach is demonstrated in a speaker verification task.
ASJC Scopus subject areas
- Language and Linguistics
- Modelling and Simulation
- Linguistics and Language
- Computer Vision and Pattern Recognition
- Computer Science Applications