Abstract
The generalization ability of speech anti-spoofing (SAS) models is critical for their effective deployment in security-critical applications. Recently, data augmentation has gained significant attention as an effective method to enhance model generalization. However, most existing augmentation techniques are either tailored to specific test sets or overlook spoof-related characteristics, which limits their generalization. The goal of this research is to develop a novel data augmentation framework that maximizes the use of existing data and enhances model performance in unseen domains without complex design. We propose a copy-based augmentation framework, inspired by the theoretical error bound of domain generalization, which mitigates the distribution discrepancy between training and testing domains by applying transformations to existing speech signals or features. Within this framework, directly concatenating existing speech signals is problematic for SAS. To address this, we propose an optimized copy-paste augmentation (CpAug) method based on concatenation and substitution policies, which avoids introducing forgery traces and enhances intra-class diversity. To further enhance generalization, we also propose mixed copy-paste augmentation (mCpAug), which integrates signal-level, generation-level, and lexical-level perturbations to better align the training data with unseen domains. Extensive cross-dataset evaluations demonstrate that the proposed methods outperform most augmentation strategies, exhibiting superior generalization across various datasets.
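The concatenation and substitution policies named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how CpAug selects segments or how it avoids splice artifacts, so everything below is an assumption made for illustration: the function names (`concat_policy`, `substitute_policy`, `copy_paste_augment`), the 50/50 policy choice, the default segment length, and the rule that a donor utterance is always drawn from the same class so the label is unchanged. It is not the authors' implementation.

```python
import numpy as np


def concat_policy(wave_a, wave_b, rng):
    """Join two same-class waveforms end to end, in random order (assumed policy)."""
    pair = [wave_a, wave_b]
    if rng.random() < 0.5:
        pair.reverse()
    return np.concatenate(pair)


def substitute_policy(target, donor, seg_len, rng):
    """Overwrite a random segment of `target` with an equally long segment of `donor`."""
    seg_len = min(seg_len, len(target), len(donor))
    t0 = rng.integers(0, len(target) - seg_len + 1)  # start index in target
    d0 = rng.integers(0, len(donor) - seg_len + 1)   # start index in donor
    out = target.copy()                              # leave the input untouched
    out[t0:t0 + seg_len] = donor[d0:d0 + seg_len]
    return out


def copy_paste_augment(wave, label, pool, rng, seg_len=16000):
    """Apply one of the two policies using a donor of the same label.

    `pool` is a hypothetical dict mapping each label to a list of waveforms.
    Same-class pairing is an assumption meant to keep the label valid.
    """
    candidates = pool[label]
    donor = candidates[rng.integers(len(candidates))]
    if rng.random() < 0.5:
        return concat_policy(wave, donor, rng), label
    return substitute_policy(wave, donor, seg_len, rng), label
```

In a training pipeline, a function like `copy_paste_augment` would typically be called per example inside the data loader, with `pool` built once from the training set.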
| Original language | English |
|---|---|
| Article number | 130799 |
| Pages (from-to) | 1-10 |
| Journal | Neurocomputing |
| Volume | 649 |
| DOIs | |
| Publication status | Published - Jun 2025 |
Keywords
- Data augmentation
- Generalization
- Speech anti-spoofing
ASJC Scopus subject areas
- Computer Science Applications
- Cognitive Neuroscience
- Artificial Intelligence
Fingerprint
Dive into the research topics of 'Make full use of your data: On copy-based augmentation in speech anti-spoofing'. Together they form a unique fingerprint.