Abstract
Feature transformation plays an important role in robust speaker verification over telephone networks. This paper compares several feature transformation techniques and evaluates their verification performance and computation time under the 2000 NIST speaker recognition evaluation protocol. Techniques compared include feature mapping (FM), stochastic feature transformation (SFT), and blind stochastic feature transformation (BSFT). The paper proposes a probabilistic feature mapping (PFM) in which the mapped features depend not only on the top-1 decoded Gaussian but also on the posterior probabilities of other Gaussians in the root model. The paper also proposes speeding up the computation of PFM and BSFT parameters by considering the top few Gaussians only. Results show that PFM performs slightly better than FM and that the fast approach can reduce computation time substantially. Among the approaches investigated, the fast BSFT is found to have the highest potential for robust speaker verification over telephone networks because it can achieve good performance without any a priori knowledge of the communication channel. It was also found that fusion of the scores derived from systems using BSFT and PFM can reduce the error rate further.
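The difference between top-1 feature mapping and the posterior-weighted variant described above can be illustrated with a small sketch. The abstract gives no equations, so everything below is an assumption made for illustration: diagonal-covariance GMMs, a per-Gaussian mapping that shifts and rescales a frame from a channel-dependent Gaussian to its root-model counterpart, posteriors computed under the channel-dependent model, and the hypothetical names `posteriors`, `map_one`, and `map_feature`. Setting `top_k=1` gives hard top-1 mapping; a larger `top_k` weights the mapped frames by renormalised posteriors, in the spirit of the fast approach that considers only the top few Gaussians.

```python
import math

def log_gauss_diag(x, mean, var):
    # Log-density of a diagonal-covariance Gaussian at frame x.
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, weights, means, variances):
    # Posterior probability of each mixture component given x (log-sum-exp for stability).
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(logs)
    exps = [math.exp(l - peak) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_one(x, cd_mean, cd_var, root_mean, root_var):
    # Assumed per-Gaussian mapping: shift and rescale from the
    # channel-dependent Gaussian to the matching root-model Gaussian.
    return [
        (xi - cm) * math.sqrt(rv / cv) + rm
        for xi, cm, cv, rm, rv in zip(x, cd_mean, cd_var, root_mean, root_var)
    ]

def map_feature(x, weights, cd_means, cd_vars, root_means, root_vars, top_k=1):
    # top_k=1: hard top-1 mapping. top_k>1: posterior-weighted mapping over
    # the top_k Gaussians, with posteriors renormalised over that subset.
    post = posteriors(x, weights, cd_means, cd_vars)
    top = sorted(range(len(post)), key=lambda i: post[i], reverse=True)[:top_k]
    norm = sum(post[i] for i in top)
    y = [0.0] * len(x)
    for i in top:
        mapped = map_one(x, cd_means[i], cd_vars[i], root_means[i], root_vars[i])
        for d, value in enumerate(mapped):
            y[d] += (post[i] / norm) * value
    return y
```

Restricting the sum to the top few Gaussians avoids evaluating the mapping for components whose posterior is negligible, which is where the computational saving would come from under these assumptions.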
Original language | English |
---|---|
Title of host publication | Proceedings of the 2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, MLSP 2006 |
Pages | 433-438 |
Number of pages | 6 |
Publication status | Published - 1 Dec 2007 |
Event | 2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, MLSP 2006 - Maynooth, Ireland; Duration: 6 Sept 2006 → 8 Sept 2006 |
Conference
Conference | 2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, MLSP 2006 |
---|---|
Country/Territory | Ireland |
City | Maynooth |
Period | 6/09/06 → 8/09/06 |
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Signal Processing