TY - GEN
T1 - Learning Domain-Invariant Transformation for Speaker Verification
AU - Zhang, Hanyi
AU - Wang, Longbiao
AU - Lee, Kong Aik
AU - Liu, Meng
AU - Dang, Jianwu
AU - Chen, Hui
N1 - Publisher Copyright:
© 2022 IEEE
PY - 2022/5
Y1 - 2022/5
N2 - Automatic speaker verification (ASV) faces domain shift caused by mismatched intrinsic and extrinsic factors, such as recording device and speaking style, in real-world applications, which leads to unsatisfactory performance. To this end, we propose the meta generalized transformation via meta-learning to build a domain-invariant embedding space. Specifically, the transformation module is motivated to learn domain generalization knowledge by executing meta-optimization on the meta-train and meta-test sets, which are designed to simulate domain shift. Furthermore, distribution optimization is incorporated to supervise the metric structure of the embeddings. In terms of the transformation module, we investigate various instantiations and observe that the multilayer perceptron with gating (gMLP) is the most effective, given its extrapolation capability. Experimental results on cross-genre and cross-dataset settings demonstrate that the meta generalized transformation dramatically improves the robustness of ASV systems to domain shift, while outperforming state-of-the-art methods.
AB - Automatic speaker verification (ASV) faces domain shift caused by mismatched intrinsic and extrinsic factors, such as recording device and speaking style, in real-world applications, which leads to unsatisfactory performance. To this end, we propose the meta generalized transformation via meta-learning to build a domain-invariant embedding space. Specifically, the transformation module is motivated to learn domain generalization knowledge by executing meta-optimization on the meta-train and meta-test sets, which are designed to simulate domain shift. Furthermore, distribution optimization is incorporated to supervise the metric structure of the embeddings. In terms of the transformation module, we investigate various instantiations and observe that the multilayer perceptron with gating (gMLP) is the most effective, given its extrapolation capability. Experimental results on cross-genre and cross-dataset settings demonstrate that the meta generalized transformation dramatically improves the robustness of ASV systems to domain shift, while outperforming state-of-the-art methods.
KW - domain-invariant
KW - meta generalized transformation
KW - meta-learning
KW - speaker verification
UR - http://www.scopus.com/inward/record.url?scp=85131243013&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9747514
DO - 10.1109/ICASSP43922.2022.9747514
M3 - Conference article published in proceeding or book
AN - SCOPUS:85131243013
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 7177
EP - 7181
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Y2 - 23 May 2022 through 27 May 2022
ER -