TY - GEN
T1 - Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification
AU - Truong, Duc Tuan
AU - Tao, Ruijie
AU - Yip, Jia Qi
AU - Lee, Kong Aik
AU - Chng, Eng Siong
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024/4/14
Y1 - 2024/4/14
AB - Knowledge distillation (KD) is used to enhance automatic speaker verification performance by ensuring consistency between large teacher networks and lightweight student networks at the embedding level or the label level. However, conventional label-level KD overlooks the significant knowledge carried by non-target speakers, particularly their classification probabilities, which can be crucial for automatic speaker verification. In this paper, we first demonstrate that training with a larger number of non-target speakers improves the performance of automatic speaker verification models. Motivated by this finding, we modify conventional label-level KD by disentangling and emphasizing the classification probabilities of non-target speakers during knowledge distillation. The proposed method is applied to three different student model architectures and achieves an average improvement of 13.67% in EER on the VoxCeleb dataset compared to embedding-level and conventional label-level KD methods.
KW - automatic speaker verification
KW - knowledge distillation
KW - label-level knowledge distillation
UR - http://www.scopus.com/inward/record.url?scp=85195410440&partnerID=8YFLogxK
U2 - 10.1109/ICASSP48485.2024.10447160
DO - 10.1109/ICASSP48485.2024.10447160
M3 - Conference article published in proceedings or book
AN - SCOPUS:85195410440
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 10336
EP - 10340
BT - 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Y2 - 14 April 2024 through 19 April 2024
ER -