TY - GEN
T1 - CpAug: Refining Copy-Paste Augmentation for Speech Anti-Spoofing
AU - Zhang, Linjuan
AU - Lee, Kong Aik
AU - Zhang, Lin
AU - Wang, Longbiao
AU - Niu, Baoning
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024/4/14
Y1 - 2024/4/14
N2 - Conventional copy-paste augmentation generates new training instances by concatenating existing utterances, increasing the amount of data available for neural network training. However, applying copy-paste augmentation directly to anti-spoofing is problematic. This paper refines copy-paste augmentation for speech anti-spoofing in a method dubbed CpAug, which generates more training data with rich intra-class diversity. CpAug employs two policies: concatenation, which merges utterances with identical labels, and substitution, which replaces segments in an anchor utterance. In addition, considering the impact of speakers and spoofing attack types, we craft four blending strategies for CpAug. Furthermore, we explore how CpAug complements the RawBoost augmentation method. Experimental results show that the proposed CpAug significantly improves speech anti-spoofing performance. In particular, CpAug with the substitution policy yields relative improvements of 43% and 38% on ASVspoof 2019 LA and ASVspoof 2021 LA, respectively. Notably, CpAug and RawBoost combine effectively, achieving an EER of 2.91% on ASVspoof 2021 LA.
AB - Conventional copy-paste augmentation generates new training instances by concatenating existing utterances, increasing the amount of data available for neural network training. However, applying copy-paste augmentation directly to anti-spoofing is problematic. This paper refines copy-paste augmentation for speech anti-spoofing in a method dubbed CpAug, which generates more training data with rich intra-class diversity. CpAug employs two policies: concatenation, which merges utterances with identical labels, and substitution, which replaces segments in an anchor utterance. In addition, considering the impact of speakers and spoofing attack types, we craft four blending strategies for CpAug. Furthermore, we explore how CpAug complements the RawBoost augmentation method. Experimental results show that the proposed CpAug significantly improves speech anti-spoofing performance. In particular, CpAug with the substitution policy yields relative improvements of 43% and 38% on ASVspoof 2019 LA and ASVspoof 2021 LA, respectively. Notably, CpAug and RawBoost combine effectively, achieving an EER of 2.91% on ASVspoof 2021 LA.
KW - blending strategies
KW - concatenation
KW - data augmentation
KW - speech anti-spoofing
KW - substitution
UR - https://www.scopus.com/pages/publications/85195421773
U2 - 10.1109/ICASSP48485.2024.10446438
DO - 10.1109/ICASSP48485.2024.10446438
M3 - Conference article published in proceeding or book
AN - SCOPUS:85195421773
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 10996
EP - 11000
BT - 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Y2 - 14 April 2024 through 19 April 2024
ER -