Meta-Generalization for Domain-Invariant Speaker Verification

Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang, Helen Meng

Research output: Journal article publication > Journal article > Academic research > peer-review

7 Citations (Scopus)

Abstract

Automatic speaker verification (ASV) performs poorly under domain mismatch caused by intrinsic and extrinsic factors, such as the variations in speaking style and recording device encountered in real-world applications. To ensure robust performance under unseen conditions, domain generalization has been explored. However, model discrimination and domain generalization inherently conflict: discrimination ability may decline while the model learns to generalize. In this paper, to extract discriminative yet domain-invariant representations, we propose meta-generalized speaker verification (MGSV), a meta-learning approach. Specifically, we propose a metric-based distribution optimization and a gradient-based meta-optimization that simultaneously supervise the spatial relationship between embeddings and improve the generalization ability of the model on unseen domains. In addition, building on the single-domain (SD) and single-single (SS) strategies, we design multiple-single (MS) and simulated speaker verification (SSV) sampling strategies that simulate the train/test domain mismatch more faithfully, thereby mining transferable speaker-related knowledge. SSV is chosen as the most effective strategy, as it substantially improves domain generalization by ensuring that the model has learned to discriminate efficiently. To reflect model performance on unseen domains directly, the proposed method is validated on cross-genre, cross-device, and cross-dataset tasks. The experimental results demonstrate that the proposed method handles domain mismatch in speaker verification effectively.
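
The abstract does not give the details of the gradient-based meta-optimization or of the SD, SS, MS, and SSV sampling strategies, so the following is only a minimal sketch of the general idea it describes: split the training domains into meta-train and meta-test sets in each episode, then update the model so that a step taken on the meta-train domains also lowers the loss on the held-out domain. Everything here is an assumption made for illustration: the scalar "model" `w`, the toy per-domain loss `0.5 * (w - center)**2`, the hold-one-domain-out sampler, and the names `sample_episode`, `meta_step`, `alpha`, `beta`, and `lr`. It is not the MGSV method itself.

```python
import random


def domain_grad(w, center):
    """Gradient of the toy per-domain loss 0.5 * (w - center)**2 (assumed loss)."""
    return w - center


def sample_episode(centers, rng):
    """Hold out one domain as meta-test; the remaining domains form meta-train.

    This is a generic hold-one-out sampler, not the paper's SD/SS/MS/SSV strategies.
    """
    names = sorted(centers)
    held_out = rng.choice(names)
    meta_train = [d for d in names if d != held_out]
    return meta_train, held_out


def meta_step(w, centers, meta_train, meta_test, alpha=0.1, beta=1.0, lr=0.1):
    """One gradient-based meta-update on the scalar parameter w."""
    # Average gradient over the meta-train domains.
    g_train = sum(domain_grad(w, centers[d]) for d in meta_train) / len(meta_train)
    # Virtual inner update using only the meta-train domains.
    w_virtual = w - alpha * g_train
    # Meta-test gradient at the virtual parameter, chained back to w.
    # For this quadratic loss, d(w_virtual)/dw is exactly (1 - alpha).
    g_test = domain_grad(w_virtual, centers[meta_test]) * (1.0 - alpha)
    # Combined update: fit meta-train while also improving on meta-test.
    return w - lr * (g_train + beta * g_test)


def train(centers, steps=500, seed=0):
    """Run episodic meta-training from w = 0 with a seeded sampler."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        meta_train, meta_test = sample_episode(centers, rng)
        w = meta_step(w, centers, meta_train, meta_test)
    return w
```

Calling `train({"a": 1.0, "b": 2.0, "c": 3.0})` drives `w` from 0 toward a value between the domain centers. In a real speaker-verification system the scalar would be replaced by network parameters and the analytic chain-rule factor by automatic differentiation.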

Original language: English
Article number: 10053562
Pages (from-to): 1024-1036
Number of pages: 13
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 31
Publication status: Published - Feb 2023
Externally published: Yes

Keywords

  • Domain mismatch
  • Meta-generalized speaker verification
  • Meta-learning
  • Speaker verification

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering
