Abstract
Automatic speaker verification (ASV) suffers from performance degradation in noisy environments. To address this problem, we propose noise-disentanglement metric learning, which reduces speaker-irrelevant noisy components and builds a noise-invariant embedding space. Specifically, a disentanglement module, comprising a speaker encoder and a reconstruction module, is dedicated to decoupling speech signals. The speaker encoder disentangles speaker-related components, and the reconstruction module strengthens the model's ability to constrain noise information by reconstructing the signal. In addition, distribution optimization is introduced to supervise the spatial structure of speaker embeddings in noisy environments. Experiments on VoxCeleb1 indicate that the proposed method improves the performance of the speaker verification system in both clean and noisy conditions.
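The abstract does not give the loss functions, so the following is only a minimal illustrative sketch of how a reconstruction term and an embedding-level term could be combined into one objective. Everything here is an assumption made for illustration, not the paper's formulation: the function names (`reconstruction_loss`, `invariance_loss`, `total_loss`), the use of mean squared error for reconstruction, the use of cosine distance between paired clean and noisy embeddings as a stand-in for the distribution optimization, and the weights `alpha` and `beta`.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def reconstruction_loss(reconstructed, target):
    """Assumed reconstruction term: mean squared error between features."""
    return float(np.mean((reconstructed - target) ** 2))


def invariance_loss(clean_emb, noisy_emb):
    """Assumed embedding term: mean cosine distance between each clean
    embedding and the embedding of its noisy counterpart (row-wise pairs)."""
    c = l2_normalize(clean_emb)
    n = l2_normalize(noisy_emb)
    cosine = np.sum(c * n, axis=-1)
    return float(np.mean(1.0 - cosine))


def total_loss(clean_emb, noisy_emb, reconstructed, target,
               alpha=1.0, beta=1.0):
    """Weighted sum of the two assumed terms; alpha and beta are
    hypothetical hyperparameters, not values from the paper."""
    return (alpha * reconstruction_loss(reconstructed, target)
            + beta * invariance_loss(clean_emb, noisy_emb))
```

In this sketch the embedding term is zero when a noisy utterance maps to the same direction as its clean version, which is one simple way to express the goal of a noise-invariant embedding space; the actual method may supervise the embedding distribution differently.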
Original language | English |
---|---|
Article number | 10096848 |
Pages (from-to) | 1 |
Number of pages | 5 |
Journal | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
Publication status | Published - 5 May 2023 |
Event | 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece. Duration: 4 Jun 2023 → 10 Jun 2023 |
Keywords
- disentangled representation learning
- metric learning
- noise robustness
- speaker verification
ASJC Scopus subject areas
- Software
- Signal Processing
- Electrical and Electronic Engineering