A two-stage scoring method combining world and cohort models for speaker verification

W. D. Zhang, Man Wai Mak, M. X. He

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

9 Citations (Scopus)

Abstract

The cohort and world models are commonly used for score normalization in speaker verification. Because these models represent different regions of the feature space, better performance could be obtained by integrating them into a single framework. In this paper, we embed the two models in elliptical basis function networks and propose a two-stage decision procedure for improving verification performance. In the first stage, the score of an unknown utterance is normalized by a world model. If the difference between the resulting normalized score and a world threshold is sufficiently large, the claimant is accepted or rejected immediately. Otherwise, the score is normalized by a cohort model and compared with a cohort threshold to make the final accept/reject decision. Experimental evaluations on the YOHO corpus suggest that the two-stage method achieves a lower error rate than using only one background model.
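The two-stage decision rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' elliptical basis function network implementation. It assumes that the speaker, world, and cohort scores are log-domain values (e.g. log-likelihoods), so normalization is a subtraction; the class name `TwoStageVerifier`, the method `decide`, and the `margin` parameter that defines "sufficiently large" are all hypothetical, since the abstract does not specify them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TwoStageVerifier:
    """Hypothetical sketch of a two-stage world/cohort decision rule."""

    world_threshold: float   # decision threshold for world-normalized scores
    cohort_threshold: float  # decision threshold for cohort-normalized scores
    margin: float            # assumption: distance from the world threshold
                             # that counts as "sufficiently large"

    def decide(self, speaker_score: float, world_score: float,
               cohort_score: float) -> tuple[bool, int]:
        """Return (accept, stage), where stage is 1 or 2.

        All scores are assumed to be log-domain, so normalization is a
        difference. cohort_score stands for the cohort model's score
        (how it is formed from the cohort speakers is an assumption here).
        """
        # Stage 1: normalize the claimant's score by the world model.
        s_world = speaker_score - world_score
        if abs(s_world - self.world_threshold) > self.margin:
            # Far enough from the world threshold: decide immediately.
            return s_world > self.world_threshold, 1

        # Stage 2: ambiguous region, so normalize by the cohort model
        # and compare against the cohort threshold for the final decision.
        s_cohort = speaker_score - cohort_score
        return s_cohort > self.cohort_threshold, 2


if __name__ == "__main__":
    v = TwoStageVerifier(world_threshold=0.0, cohort_threshold=0.5, margin=1.0)
    print(v.decide(-10.0, -13.0, -11.0))  # clear accept in stage 1
    print(v.decide(-10.0, -10.5, -11.0))  # deferred to stage 2, accepted
```

The margin trades off how many trials are settled by the world model alone against how many fall through to the cohort model; the abstract does not say how the authors chose it.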
Original language: English
Title of host publication: Signal Processing Theory and Methods II; Audio and Electroacoustics; Speech Processing I
Publisher: IEEE
Pages: 1193–1196
Number of pages: 4
Volume: 2
ISBN (Electronic): 0780362934
Publication status: Published - 1 Jan 2000
Event: 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000 - Hilton Hotel and Convention Center, Istanbul, Turkey
Duration: 5 Jun 2000 – 9 Jun 2000

Conference

Conference: 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000
Country/Territory: Turkey
City: Istanbul
Period: 5/06/00 – 9/06/00

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
