Abstract
Text-independent speaker verification can reach high accuracy provided that a sufficient amount of training and test speech is available. The Gaussian mixture model - universal background model (GMM-UBM), joint factor analysis (JFA), and identity vector (i-vector) are the dominant techniques in this area owing to their superior performance. However, their accuracy drops significantly when the duration of the speech utterances is heavily constrained. In many realistic voice biometric applications, the speech is required to be quite short, which leads to low accuracy. One solution is to use pass-phrases in place of unconstrained content. In contrast with text-independent systems, this kind of text-dependent speaker verification can achieve higher accuracy even when the speech is short. In this paper, we study pass-phrase based speaker modeling and recognition where the speech signal is obtained through a VHF (Very High Frequency) communication channel. We evaluate the effectiveness of the GMM-UBM, JFA, and i-vector methods and their fusion on this text-dependent speaker verification platform. Our primary target is to achieve an equal error rate (EER) of 10-15% under adverse conditions using about 3 seconds of speech.
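For readers unfamiliar with the EER target quoted above, the following is a minimal sketch of how an equal error rate can be approximated from verification scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the threshold sweep, and the toy scores are all assumptions made purely for illustration.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a genuine (target) trial scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scoring at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores invented for illustration only (not data from the paper).
targets = [2.1, 1.7, 0.9, 1.4, 0.2]
impostors = [-0.5, 0.3, -1.2, 1.0, -0.1]
eer = equal_error_rate(targets, impostors)
print(f"EER = {eer:.1%}")
```

On these toy scores the false acceptance and false rejection rates meet at 20%. Evaluation toolkits usually interpolate between thresholds rather than picking the nearest observed score, so this sketch is only a coarse approximation on small trial lists.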
Original language | English |
---|---|
Pages | 216-223 |
Number of pages | 8 |
Publication status | Published - Jun 2014 |
Externally published | Yes |
Event | Speaker and Language Recognition Workshop, Odyssey 2014 - Joensuu, Finland Duration: 16 Jun 2014 → 19 Jun 2014 |
Conference
Conference | Speaker and Language Recognition Workshop, Odyssey 2014 |
---|---|
Country/Territory | Finland |
City | Joensuu |
Period | 16/06/14 → 19/06/14 |
ASJC Scopus subject areas
- Signal Processing
- Software
- Human-Computer Interaction