Xi-Vector Embedding for Speaker Recognition

Kong Aik Lee, Qiongqiong Wang, Takafumi Koshinaka

Research output: Journal article publication › Journal article › Academic research › peer-review

19 Citations (Scopus)

Abstract

We present a Bayesian formulation for deep speaker embedding, wherein the xi-vector is the Bayesian counterpart of the x-vector, taking uncertainty estimates into account. On the technology front, we offer a simple and straightforward extension to the now widely used x-vector. It consists of an auxiliary neural net predicting the frame-wise uncertainty of the input sequence. We show that the proposed extension leads to substantial improvement across all operating points, with a significant reduction in error rates and detection cost. On the theoretical front, our proposal integrates the Bayesian formulation of the linear Gaussian model into speaker-embedding neural networks via the pooling layer. In one sense, our proposal integrates the Bayesian formulation of the i-vector into that of the x-vector. Hence, we refer to the embedding as the xi-vector, pronounced /zai/ vector. Experimental results on the SITW evaluation set show a consistent improvement of over 17.5% in equal error rate and 10.9% in minimum detection cost.
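The pooling idea sketched in the abstract can be illustrated with a short example. Below is a minimal PyTorch sketch of posterior-inference pooling under a diagonal linear Gaussian model: an auxiliary head predicts frame-wise log-precisions (the uncertainty estimates), and the utterance-level embedding is taken as the posterior mean that combines the precision-weighted frames with a learnable Gaussian prior. The class and parameter names (XiVectorPooling, log_prec_head, prior_mean, etc.) and the layer sizes are illustrative assumptions, not the authors' reference implementation.

import torch
import torch.nn as nn

class XiVectorPooling(nn.Module):
    """Gaussian posterior-inference pooling (illustrative sketch of the xi-vector idea).

    Each frame t contributes a point estimate z_t and a predicted log-precision;
    the pooled embedding is the posterior mean of a diagonal linear Gaussian model,
    i.e. a precision-weighted average of the frames combined with a learnable prior.
    """

    def __init__(self, feat_dim: int):
        super().__init__()
        # Auxiliary head predicting frame-wise log-precision (uncertainty).
        self.log_prec_head = nn.Linear(feat_dim, feat_dim)
        # Learnable prior mean and log-precision (assumed diagonal).
        self.prior_mean = nn.Parameter(torch.zeros(feat_dim))
        self.prior_log_prec = nn.Parameter(torch.zeros(feat_dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, frames, feat_dim) frame-wise point estimates.
        prec = self.log_prec_head(z).exp()        # frame-wise precisions, (batch, frames, feat_dim)
        prior_prec = self.prior_log_prec.exp()    # prior precision, (feat_dim,)

        # Posterior precision and posterior mean of the linear Gaussian model.
        post_prec = prior_prec + prec.sum(dim=1)                         # (batch, feat_dim)
        weighted_sum = prior_prec * self.prior_mean + (prec * z).sum(dim=1)
        post_mean = weighted_sum / post_prec                             # pooled embedding
        return post_mean

if __name__ == "__main__":
    pool = XiVectorPooling(feat_dim=512)
    frames = torch.randn(2, 200, 512)   # 2 utterances, 200 frames each
    print(pool(frames).shape)           # torch.Size([2, 512])

In this sketch, frames with low predicted precision (high uncertainty) contribute less to the pooled mean, which is the intuition behind the reported robustness gains; in a full system this pooling layer would replace the usual statistics pooling inside an x-vector-style network.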

Original language: English
Article number: 9463712
Pages (from-to): 1385-1389
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 28
DOIs
Publication status: Published - Jun 2021
Externally published: Yes

Keywords

  • neural embedding
  • Speaker verification
  • uncertainty

ASJC Scopus subject areas

  • Signal Processing
  • Applied Mathematics
  • Electrical and Electronic Engineering
