Abstract
In speaker verification over public telephone networks, utterances may come from different types of handsets, and each handset may introduce a different degree of distortion into the speech signal. This paper combines a handset selector with (1) handset-specific transformations and (2) handset-dependent speaker models to reduce the effect of this acoustic distortion. Specifically, a number of Gaussian mixture models are independently trained to identify the most likely handset given a test utterance. During recognition, the speaker model and background model are then either transformed by an MLLR-based handset-specific transformation or replaced, respectively, by a handset-dependent speaker model and a handset-dependent background model whose parameters were adapted by reinforced learning to fit the new environment. Experimental results based on 150 speakers of the HTIMIT corpus show that environment adaptation based on either MLLR or reinforced learning outperforms the classical CMS, Hnorm and Tnorm approaches, with MLLR adaptation achieving the best performance.
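The handset-selection step described in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs, an average per-frame log-likelihood score, and a maximum-likelihood decision. The abstract does not specify these details, so they are assumptions. The handset labels, function names, and all model parameters below are hypothetical toy values, not taken from the paper.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one GMM.

    gmm is a list of (weight, mean, var) tuples, one per mixture component.
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over mixture components for numerical stability.
        logs = [math.log(w) + log_gaussian_diag(x, mu, var) for w, mu, var in gmm]
        peak = max(logs)
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total / len(frames)

def select_handset(frames, handset_gmms):
    """Return the label of the handset GMM that scores the utterance highest."""
    return max(handset_gmms, key=lambda h: gmm_log_likelihood(frames, handset_gmms[h]))

# Toy 2-D models with made-up parameters (assumption: one GMM per handset type).
handset_gmms = {
    "handset_A": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "handset_B": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Toy "utterance": a short sequence of 2-D feature frames.
utterance = [[5.2, 4.9], [5.8, 6.1], [5.5, 5.4]]
selected = select_handset(utterance, handset_gmms)
```

In a system of the kind the abstract describes, the selected label would then determine which handset-specific transformation or handset-dependent model pair is applied during verification; that step is not shown here.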
Original language | English |
---|---|
Title of host publication | EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology |
Publisher | International Speech Communication Association |
Pages | 2973-2976 |
Number of pages | 4 |
Publication status | Published - 1 Jan 2003 |
Event | 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland; Duration: 1 Sept 2003 → 4 Sept 2003 |
Conference
Conference | 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 |
---|---|
Country/Territory | Switzerland |
City | Geneva |
Period | 1/09/03 → 4/09/03 |
ASJC Scopus subject areas
- Computer Science Applications
- Software
- Linguistics and Language
- Communication