A new adaptation approach to high-level speaker-model creation in speaker verification

Shi Xiong Zhang, Man Wai Mak

Research output: Journal article publication, Journal article, Academic research, peer-review

2 Citations (Scopus)

Abstract

Research has shown that speaker verification based on high-level speaker features requires long enrollment utterances to guarantee a low error rate during verification. In practice, however, speakers are commonly modeled from a limited amount of enrollment data, which makes the speaker models unreliable. This paper proposes a new adaptation method for creating high-level speaker models to alleviate this undesirable effect. Unlike conventional methods, in which only the phoneme-dependent background model is adapted, the proposed adaptation method also adapts the phoneme-independent speaker model to fully utilize all the information available in the training data. A proportional factor, derived from the ratio between the phoneme-dependent background model and the phoneme-independent background model, is used to adjust the phoneme-independent speaker models during adaptation. The proposed method was evaluated under the NIST 2000 and NIST 2002 SRE frameworks. Experimental results show that the proposed adaptation method alleviates the data-sparseness problem effectively and achieves better performance than traditional MAP adaptation.
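The abstract does not give the adaptation equations, so the following is only a rough sketch of the idea under stated assumptions, not the authors' formulation. It assumes that each model is a discrete probability distribution stored as a NumPy array, that "traditional MAP adaptation" is the usual relevance-factor interpolation between a maximum-likelihood estimate and a prior, and that the proportional factor is an element-wise ratio applied to the phoneme-independent speaker model followed by renormalization. The function names `map_adapt` and `scaled_prior` and the relevance value of 16 are hypothetical.

```python
import numpy as np


def map_adapt(counts, prior, relevance=16.0):
    """Relevance-factor MAP estimate of a discrete distribution.

    Interpolates between the maximum-likelihood estimate from `counts`
    and `prior`; with few observations the result stays close to `prior`.
    (Assumed form of "traditional MAP adaptation".)
    """
    counts = np.asarray(counts, dtype=float)
    prior = np.asarray(prior, dtype=float)
    n = counts.sum()
    if n == 0:
        return prior.copy()          # no data: fall back to the prior
    alpha = n / (n + relevance)      # data-dependent interpolation weight
    return alpha * (counts / n) + (1.0 - alpha) * prior


def scaled_prior(pi_speaker, pd_background, pi_background, eps=1e-12):
    """Hypothetical use of the proportional factor.

    Scales the phoneme-independent (PI) speaker model by the ratio of the
    phoneme-dependent (PD) background model to the PI background model,
    then renormalizes so the result is a valid distribution.
    """
    pi_speaker = np.asarray(pi_speaker, dtype=float)
    ratio = np.asarray(pd_background, dtype=float) / (
        np.asarray(pi_background, dtype=float) + eps
    )
    scaled = pi_speaker * ratio
    return scaled / scaled.sum()


# Illustrative numbers only (not taken from the paper).
pi_bg = np.array([0.5, 0.5])   # PI background model
pd_bg = np.array([0.2, 0.8])   # PD background model for one phoneme
pi_spk = np.array([0.6, 0.4])  # PI speaker model
prior = scaled_prior(pi_spk, pd_bg, pi_bg)
pd_spk = map_adapt([3, 1], prior)  # sparse phoneme-specific counts
```

In this sketch, a phoneme with very few enrollment observations still receives a speaker-specific prior rather than the shared background model, which is one way to read the abstract's claim about data sparseness.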
Original language: English
Pages (from-to): 534-550
Number of pages: 17
Journal: Speech Communication
Volume: 51
Issue number: 6
DOIs
Publication status: Published - 1 Jun 2009

Keywords

  • High-level features
  • Maximum a posteriori (MAP) adaptation
  • Model adaptation
  • Speaker verification

ASJC Scopus subject areas

  • Software
  • Modelling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
