W-GVKT: Within-Global-View Knowledge Transfer for Speaker Verification

Zezhong Jin, Youzhi Tu, Man Wai Mak

Research output: Conference article › Academic research › peer-review

1 Citation (Scopus)

Abstract

Contrastive self-supervised learning has played an important role in speaker verification (SV). However, such approaches suffer from the false-negative problem. To address this problem, we enhance the non-contrastive DINO framework by enabling knowledge transfer from the teacher network to the student network through diversified versions of the global views, and we call the method Within-Global-View Knowledge Transfer (W-GVKT) DINO. We discovered that, given the global view of the entire utterance, creating discrepancies in the student's output by applying spectral augmentation and feature diversification to the global view can facilitate the transfer of knowledge from the teacher to the student. With a negligible increase in computational resources, W-GVKT achieves an impressive EER of 4.11% on VoxCeleb1 without utilizing speaker labels. When combined with the RDINO framework, W-GVKT achieves an EER of 2.89%.
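The within-global-view transfer described above can be illustrated with a toy sketch: the teacher scores the clean global view, the student scores spectrally augmented copies of the same view, and a cross-entropy loss pulls the student's outputs toward the teacher's sharpened distribution. This is a minimal NumPy illustration under assumed placeholders (a linear "encoder", 16 prototype outputs, arbitrary temperatures and mask widths), not the paper's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def spec_augment(spec, max_f=8, max_t=10):
    """SpecAugment-style masking: zero out one frequency band and one time span."""
    spec = spec.copy()
    f0 = rng.integers(0, spec.shape[0] - max_f)
    t0 = rng.integers(0, spec.shape[1] - max_t)
    spec[f0:f0 + max_f, :] = 0.0
    spec[:, t0:t0 + max_t] = 0.0
    return spec

def softmax(logits, temp):
    """Temperature-scaled softmax; a low teacher temperature sharpens the targets."""
    z = (logits - logits.max()) / temp
    e = np.exp(z)
    return e / e.sum()

def embed(spec, W):
    """Toy encoder stand-in: linear projection of the mean-pooled spectrogram."""
    return W @ spec.mean(axis=1)

# Toy global view: an 80-bin "mel spectrogram" of 200 frames.
spec = rng.standard_normal((80, 200))
W_teacher = rng.standard_normal((16, 80))
# Student weights start near the teacher's (as in momentum-teacher setups).
W_student = W_teacher + 0.05 * rng.standard_normal((16, 80))

# Teacher sees the clean global view with a sharpening temperature.
p_teacher = softmax(embed(spec, W_teacher), temp=0.04)

# Student sees two diversified (augmented) copies of the same global view;
# the loss is the cross-entropy between teacher targets and student outputs.
losses = []
for _ in range(2):
    p_student = softmax(embed(spec_augment(spec), W_student), temp=0.1)
    losses.append(-np.sum(p_teacher * np.log(p_student + 1e-9)))
loss = float(np.mean(losses))
```

In the actual method the student would be trained by gradient descent on this loss while the teacher is updated as an exponential moving average of the student; here the single loss value just shows that augmenting the shared global view creates the teacher–student discrepancy the transfer relies on.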

Original language: English
Pages (from-to): 3779-3783
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - Sept 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sept 2024 - 5 Sept 2024

Keywords

  • DINO
  • knowledge transfer
  • self-supervised learning
  • speaker verification

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
