Meta-learning for cross-channel speaker verification

Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang, Hui Chen

Research output: Journal article publication › Conference article › Academic research › peer-review

10 Citations (Scopus)

Abstract

Automatic speaker verification (ASV) has been successfully deployed for identity recognition. As ASV technology sees increasing use in real-world applications, channel mismatch caused by recording devices and environments severely degrades its performance, especially in the case of unseen channels. To this end, we propose a meta speaker embedding network (MSEN) trained via meta-learning to generate channel-invariant utterance embeddings. Specifically, we optimize the differences between the embeddings of a support set and a query set in order to learn a channel-invariant embedding space for utterances. Furthermore, we incorporate distribution optimization (DO) to stabilize the performance of MSEN. To quantitatively measure the effect of MSEN on unseen channels, we design the generalized cross-channel (GCC) evaluation. Experimental results on the HI-MIA corpus demonstrate that the proposed MSEN considerably reduces the impact of channel mismatch while significantly outperforming other state-of-the-art methods.
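The abstract only sketches the training objective; the exact MSEN loss is not given here. As a rough, hypothetical illustration of how one might "optimize the differences between the embeddings of a support set and a query set", the following is a minimal prototypical-network-style episodic loss in PyTorch. The function name, the 192-dimensional embeddings, and the episode layout are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def episodic_channel_loss(support_emb, support_spk, query_emb, query_spk):
    # Hypothetical prototypical-style episode loss; NOT the paper's exact
    # MSEN objective, just one common meta-learning formulation.
    # support_emb: (Ns, D) embeddings of support utterances
    # support_spk: (Ns,)   integer speaker labels for the support set
    # query_emb:   (Nq, D) embeddings of query utterances
    # query_spk:   (Nq,)   integer speaker labels for the query set

    # L2-normalize so dot products are cosine similarities.
    support_emb = F.normalize(support_emb, dim=-1)
    query_emb = F.normalize(query_emb, dim=-1)

    speakers = torch.unique(support_spk)  # sorted unique speaker ids
    # One prototype per speaker: mean of its support embeddings.
    prototypes = torch.stack(
        [support_emb[support_spk == s].mean(dim=0) for s in speakers]
    )

    # Similarity of each query to each prototype -> (Nq, num_speakers).
    logits = query_emb @ prototypes.t()

    # Map query speaker labels onto prototype row indices
    # (valid because `speakers` is sorted).
    targets = torch.searchsorted(speakers, query_spk)

    # Queries are classified against the support prototypes.
    return F.cross_entropy(logits, targets)

# Toy episode: 3 speakers, 2 support and 1 query utterance each, with
# random 192-dim "embeddings" standing in for an encoder's output.
torch.manual_seed(0)
support = torch.randn(6, 192)
support_labels = torch.tensor([0, 0, 1, 1, 2, 2])
query = torch.randn(3, 192)
query_labels = torch.tensor([0, 1, 2])
print(episodic_channel_loss(support, support_labels, query, query_labels))
```

In an actual cross-channel setup of this kind, the support and query utterances of each speaker would be drawn from different recording channels, so minimizing the loss encourages embeddings that agree across channels.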

Original language: English
Pages (from-to): 5839-5843
Number of pages: 5
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2021-June
DOIs
Publication status: Published - 2021
Externally published: Yes
Event: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada
Duration: 6 Jun 2021 - 11 Jun 2021

Keywords

  • Cross channel
  • Meta speaker embedding network
  • Meta-learning
  • Speaker verification

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
