W-GVKT: Within-Global-View Knowledge Transfer for Speaker Verification

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

Abstract

Contrastive self-supervised learning has played an important role in speaker verification (SV). However, such approaches suffer from false negatives: negative pairs drawn from different utterances may in fact belong to the same speaker. To address this problem, we enhance the non-contrastive DINO framework by enabling knowledge transfer from the teacher network to the student network through diversified versions of global views, and call the method Within-Global-View Knowledge Transfer (W-GVKT) DINO. We found that, given a global view of the entire utterance, creating discrepancies in the student's output by applying spectral augmentation and feature diversification to that global view facilitates the transfer of knowledge from the teacher to the student. With a negligible increase in computational cost, W-GVKT achieves an EER of 4.11% on VoxCeleb1 without using speaker labels. When combined with the RDINO framework, W-GVKT achieves an EER of 2.89%.
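As a rough illustration of the within-global-view idea described in the abstract, the NumPy sketch below computes a DINO-style cross-entropy between a teacher that sees a clean global view and a student that sees a spectrally masked copy of the same view, together with the standard DINO teacher (EMA) and center updates. Everything specific here is an assumption for illustration only: the toy mean-pool-plus-linear network, the hypothetical function names (`spec_augment`, `toy_network`, `dino_loss`, `ema_update`, `update_center`), the mask sizes, momenta, and temperatures are not taken from the paper, and the paper's feature diversification step is not modeled.

```python
import numpy as np


def softmax(x, temperature):
    """Softmax over the last axis with a temperature, numerically stabilised."""
    z = x / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def spec_augment(feats, rng, max_t=10, max_f=8):
    """Hypothetical SpecAugment-style masking: zero one random band of
    frames and one random band of frequency bins in a (frames, bins) array.
    Returns a new array; the input is left unchanged."""
    out = feats.copy()
    n_t, n_f = out.shape
    t = rng.integers(0, max_t + 1)          # time-mask width in frames
    t0 = rng.integers(0, n_t - t + 1)       # time-mask start
    out[t0:t0 + t, :] = 0.0
    f = rng.integers(0, max_f + 1)          # frequency-mask width in bins
    f0 = rng.integers(0, n_f - f + 1)       # frequency-mask start
    out[:, f0:f0 + f] = 0.0
    return out


def toy_network(feats, weights):
    """Hypothetical stand-in for encoder + DINO head: mean-pool over time,
    then project linearly to prototype logits."""
    return feats.mean(axis=0) @ weights


def dino_loss(teacher_logits, student_logits, center, tau_t=0.04, tau_s=0.1):
    """DINO cross-entropy H(p_t, p_s): the teacher output is centred and
    sharpened with a lower temperature and used as a fixed target."""
    p_t = softmax(teacher_logits - center, tau_t)
    log_p_s = np.log(softmax(student_logits, tau_s) + 1e-12)
    return float(-(p_t * log_p_s).sum())


def ema_update(teacher_w, student_w, momentum=0.996):
    """DINO teacher update: exponential moving average of student weights."""
    return momentum * teacher_w + (1.0 - momentum) * student_w


def update_center(center, teacher_logits, momentum=0.9):
    """Running average of teacher outputs, used to centre future targets."""
    return momentum * center + (1.0 - momentum) * teacher_logits


# Usage: one forward pass of the within-global-view term on random data.
rng = np.random.default_rng(0)
n_frames, n_bins, n_protos = 200, 80, 16
global_view = rng.standard_normal((n_frames, n_bins))   # random stand-in for log-mel features
student_w = rng.standard_normal((n_bins, n_protos))
teacher_w = student_w.copy()                              # teacher starts as a copy of the student
center = np.zeros(n_protos)

t_out = toy_network(global_view, teacher_w)                      # teacher: clean global view
s_out = toy_network(spec_augment(global_view, rng), student_w)   # student: masked global view
loss = dino_loss(t_out, s_out, center)

# After a (not shown) gradient step on the student, update teacher and center.
teacher_w = ema_update(teacher_w, student_w)
center = update_center(center, t_out)
```

The design choice this sketch mirrors is that the teacher and student receive the same global view, so the only source of disagreement between their outputs is the perturbation applied on the student side; in real training the student weights would then be updated by gradient descent on this loss.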

Original language: English
Title of host publication: English
Pages: 3779-3783
Number of pages: 5
DOIs
Publication status: Published - Sept 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sept 2024 - 5 Sept 2024

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
ISSN (Print): 2308-457X

Conference

Conference: 25th Interspeech Conference 2024
Country/Territory: Greece
City: Kos Island
Period: 1/09/24 - 5/09/24

Keywords

  • DINO
  • knowledge transfer
  • self-supervised learning
  • speaker verification

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
