TY - GEN
T1 - Self-Supervised Speaker Recognition with Loss-Gated Learning
AU - Tao, Ruijie
AU - Lee, Kong Aik
AU - Das, Rohan Kumar
AU - Hautamäki, Ville
AU - Li, Haizhou
N1 - Publisher Copyright: © 2022 IEEE
PY - 2022/5
Y1 - 2022/5
N2 - In self-supervised learning for speaker recognition, pseudo labels are useful as the supervision signals. It is a known fact that a speaker recognition model doesn't always benefit from pseudo labels due to their unreliability. In this work, we observe that a speaker recognition network tends to model the data with reliable labels faster than those with unreliable labels. This motivates us to study a loss-gated learning (LGL) strategy, which extracts the reliable labels through the fitting ability of the neural network during training. With the proposed LGL, our speaker recognition model obtains a 46.3% performance gain over the system without it. Further, the proposed self-supervised speaker recognition with LGL trained on the VoxCeleb2 dataset without any labels achieves an equal error rate of 1.66% on the VoxCeleb1 original test set.
AB - In self-supervised learning for speaker recognition, pseudo labels are useful as the supervision signals. It is a known fact that a speaker recognition model doesn't always benefit from pseudo labels due to their unreliability. In this work, we observe that a speaker recognition network tends to model the data with reliable labels faster than those with unreliable labels. This motivates us to study a loss-gated learning (LGL) strategy, which extracts the reliable labels through the fitting ability of the neural network during training. With the proposed LGL, our speaker recognition model obtains a 46.3% performance gain over the system without it. Further, the proposed self-supervised speaker recognition with LGL trained on the VoxCeleb2 dataset without any labels achieves an equal error rate of 1.66% on the VoxCeleb1 original test set.
KW - loss-gated learning
KW - pseudo label selection
KW - self-supervised speaker recognition
UR - http://www.scopus.com/inward/record.url?scp=85131252382&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9747162
DO - 10.1109/ICASSP43922.2022.9747162
M3 - Conference article published in proceeding or book
AN - SCOPUS:85131252382
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6142
EP - 6146
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Y2 - 23 May 2022 through 27 May 2022
ER -