MAP estimation of subspace transform for speaker recognition

Donglai Zhu, Bin Ma, Kong Aik Lee, Cheung Chi Leung, Haizhou Li

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

We propose maximum a posteriori (MAP) estimation of a subspace transform for speaker recognition. The linear transform is defined on the mean vectors of a Gaussian mixture model (GMM), where transform matrices and bias vectors are associated with separate regression classes so that both can be estimated with sufficient statistics given limited training data. Each transform matrix is further defined as a linear combination of a set of basis transforms, so that the combination weights are the parameters to be estimated. We characterize speakers by their transform parameters and model them using a support vector machine (SVM). Experiments on the 2008 NIST SRE task illustrate the effectiveness of the method.
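The parameterization described in the abstract can be sketched in a few lines of NumPy. This is only an illustration of the forward transform, not the authors' implementation: the MAP estimation of the weights and biases is not shown, and every name here (`combine_basis`, `transform_means`, `speaker_vector`, `matrix_class`, `bias_class`) is hypothetical. It also assumes one reading of "separate regression classes", namely that each Gaussian mean has one class index for its transform matrix and an independent class index for its bias vector.

```python
import numpy as np

def combine_basis(weights, bases):
    """Build a transform matrix A = sum_k weights[k] * bases[k].

    weights: shape (K,); bases: shape (K, D, D). Returns shape (D, D).
    """
    return np.tensordot(np.asarray(weights, dtype=float),
                        np.asarray(bases, dtype=float), axes=1)

def transform_means(means, matrix_class, bias_class, weights, biases, bases):
    """Apply mu' = A_c mu + b_r to every GMM mean vector.

    Assumption: c = matrix_class[m] selects the basis weights and
    r = bias_class[m] selects the bias, independently of each other.
    """
    means = np.asarray(means, dtype=float)
    weights = np.asarray(weights, dtype=float)
    biases = np.asarray(biases, dtype=float)
    adapted = np.empty_like(means)
    for m, mu in enumerate(means):
        A = combine_basis(weights[matrix_class[m]], bases)
        adapted[m] = A @ mu + biases[bias_class[m]]
    return adapted

def speaker_vector(weights, biases):
    """Stack the transform parameters into one vector (e.g. as SVM input)."""
    return np.concatenate([np.asarray(weights, dtype=float).ravel(),
                           np.asarray(biases, dtype=float).ravel()])

# Toy example: 2-dimensional features, 2 basis transforms,
# 1 matrix class and 2 bias classes (all values are made up).
bases = np.array([np.eye(2), [[0.0, 1.0], [1.0, 0.0]]])
means = np.array([[1.0, 2.0], [3.0, 4.0]])
matrix_class = [0, 0]
bias_class = [0, 1]
weights = np.array([[0.0, 1.0]])            # selects the "swap" basis
biases = np.array([[1.0, 1.0], [0.0, 0.0]])

adapted = transform_means(means, matrix_class, bias_class,
                          weights, biases, bases)
features = speaker_vector(weights, biases)
```

The point of the design is that the number of free parameters per speaker is the number of basis weights plus the bias entries, rather than a full matrix per regression class, which is what makes estimation feasible from limited data.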

Original language: English
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Publisher: International Speech Communication Association
Pages: 1465-1468
Number of pages: 4
Publication status: Published - Sept 2010
Externally published: Yes

Publication series

Name: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

Keywords

  • Maximum a posteriori
  • Speaker recognition
  • Subspace transform

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
