TY - GEN
T1 - W-GVKT: Within-Global-View Knowledge Transfer for Speaker Verification
AU - Jin, Zezhong
AU - Tu, Youzhi
AU - Mak, Man Wai
N1 - Publisher Copyright:
© 2024 International Speech Communication Association. All rights reserved.
PY - 2024/9
Y1 - 2024/9
N2 - Contrastive self-supervised learning has played an important role in speaker verification (SV). However, such approaches suffer from false-negative issues. To address this problem, we enhance the non-contrastive DINO framework by enabling knowledge transfer from the teacher network to the student network through diversified versions of global views, and we call the method Within-Global-View Knowledge Transfer (W-GVKT) DINO. We discovered that, given the global view of the entire utterance, creating discrepancies in the student's output by applying spectral augmentation and feature diversification to the global view can facilitate the transfer of knowledge from the teacher to the student. With a negligible increase in computational resources, W-GVKT achieves an EER of 4.11% on VoxCeleb1 without using speaker labels. When combined with the RDINO framework, W-GVKT achieves an EER of 2.89%.
KW - DINO
KW - knowledge transfer
KW - self-supervised learning
KW - speaker verification
UR - https://www.scopus.com/pages/publications/85214800257
U2 - 10.21437/Interspeech.2024-354
DO - 10.21437/Interspeech.2024-354
M3 - Conference article published in proceeding or book
AN - SCOPUS:85214800257
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 3779
EP - 3783
LA - English
T2 - 25th Interspeech Conference 2024
Y2 - 1 September 2024 through 5 September 2024
ER -