Generalizing I-Vector Estimation for Rapid Speaker Recognition

Longting Xu, Kong Aik Lee, Haizhou Li, Zhen Yang

Research output: Journal article publication › Journal article › Academic research › peer-review

24 Citations (Scopus)

Abstract

An i-vector is a compact representation that captures both the speaker and session variabilities rendered in a spoken utterance. Over the past years, it has prevailed over other techniques and is now the de facto representation for text-independent speaker recognition. Standard i-vector extraction, however, requires intensive computation at run-time, and reducing this cost would allow effective use of i-vectors in more applications. The cost arises from evaluating the posterior covariance matrix when estimating the i-vector. Earlier studies on simplifying the computation of the posterior covariance matrix have met with modest success. In this paper, we propose a novel approach to i-vector extraction that does not need to evaluate the full posterior covariance, thereby speeding up the run-time extraction process. This is achieved by generalizing the i-vector estimation in two ways. First, we introduce occupancy reweighting in conjunction with whitening of the Baum-Welch statistics as part of the preprocessing step. Second, we introduce the so-called subspace-orthogonalizing prior (SOP) to replace the standard Gaussian prior in the i-vector formulation. Experiments conducted on the extended-core task of NIST SRE'10 show that the proposed rapid SOP approach achieves considerable speed-up over the standard i-vector with comparable equal error rates.
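For orientation, the following is a minimal sketch of the standard i-vector extraction that the abstract takes as its baseline: the posterior mean under the total variability model with a standard Gaussian prior. It is not the proposed SOP method, whose details are not given here. The function name, argument names, and array shapes are illustrative assumptions, and diagonal UBM covariances are assumed.

```python
import numpy as np

def extract_ivector(N, F, T, Sigma):
    """Posterior mean of the i-vector under a standard Gaussian prior.

    Hypothetical argument layout (an assumption for this sketch):
      N     : (C,)      zero-order Baum-Welch statistics (occupancy counts)
      F     : (C, D)    centered first-order Baum-Welch statistics
      T     : (C, D, R) total variability matrix, one (D, R) block per component
      Sigma : (C, D)    diagonal covariances of the UBM components
    """
    C, D, R = T.shape
    # T_c^T Sigma_c^{-1} for every component, shape (C, R, D)
    TtSinv = np.transpose(T, (0, 2, 1)) / Sigma[:, None, :]
    # Posterior precision: I + sum_c N_c T_c^T Sigma_c^{-1} T_c, shape (R, R).
    # Building this matrix and solving with it for every utterance is the
    # costly step associated with the posterior covariance.
    precision = np.eye(R) + np.einsum("c,crd,cds->rs", N, TtSinv, T)
    # Linear term: sum_c T_c^T Sigma_c^{-1} F_c, shape (R,)
    b = np.einsum("crd,cd->r", TtSinv, F)
    # Solve precision @ w = b instead of forming the inverse explicitly
    return np.linalg.solve(precision, b)

if __name__ == "__main__":
    # Toy dimensions only; real systems use far larger C, D, and R.
    rng = np.random.default_rng(0)
    C, D, R = 8, 5, 3
    w = extract_ivector(
        rng.uniform(1.0, 5.0, C),
        rng.normal(size=(C, D)),
        rng.normal(size=(C, D, R)),
        rng.uniform(0.5, 2.0, (C, D)),
    )
    print(w.shape)
```

The R x R posterior precision depends on the occupancy counts of each utterance, so it cannot be inverted once and reused; this per-utterance dependence is the run-time cost that the proposed approach aims to avoid.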

Original language: English
Article number: 8259230
Pages (from-to): 749-759
Number of pages: 11
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 26
Issue number: 4
DOIs
Publication status: Published - Apr 2018
Externally published: Yes

Keywords

  • rapid computation
  • Speaker verification
  • total variability model

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering
