Abstract
Typically, speaker verification systems are highly optimized on speech collected by close-talking microphones. However, these systems perform poorly when users speak into far-field microphones during verification. In this paper, we propose an adversarial separation and adaptation network (ADSAN) to extract speaker-discriminative and domain-invariant features through adversarial learning. The idea is based on the notion that a speaker embedding comprises domain-specific and domain-shared components, and that the two can be disentangled by the interplay between the separation network and the adaptation network in the ADSAN. We also propose incorporating a mutual information neural estimator into the domain adaptation network to retain speaker-discriminative information. Experiments on the VOiCES Challenge 2019 demonstrate that the proposed approaches produce more domain-invariant and speaker-discriminative representations, which could help reduce the domain shift caused by different types of microphones and reverberant environments.
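The mutual information neural estimator mentioned in the abstract maximizes the Donsker-Varadhan lower bound on I(X; Z). Below is a minimal NumPy sketch of that bound on toy data; the correlated-Gaussian samples and the fixed quadratic critic are illustrative assumptions only (MINE trains a small neural network as the critic), not the architecture used in the paper.

```python
import numpy as np

def dv_bound(x, z, critic, rng):
    """Donsker-Varadhan lower bound on mutual information I(X; Z):
    E_{p(x,z)}[T(x,z)] - log E_{p(x)p(z)}[exp(T(x,z))]."""
    joint_term = critic(x, z).mean()
    z_shuffled = rng.permutation(z)  # break the pairing -> product of marginals
    marginal_term = np.log(np.mean(np.exp(critic(x, z_shuffled))))
    return joint_term - marginal_term

rng = np.random.default_rng(0)
n = 50000
x = rng.standard_normal(n)
# z_dep shares information with x; z_ind is independent of x
z_dep = 0.9 * x + np.sqrt(1.0 - 0.81) * rng.standard_normal(n)
z_ind = rng.standard_normal(n)

# Fixed, hand-picked quadratic critic; MINE learns T with gradient ascent instead.
critic = lambda a, b: 0.3 * a * b

print(dv_bound(x, z_dep, critic, rng))  # clearly positive: X and Z share information
print(dv_bound(x, z_ind, critic, rng))  # close to zero: no shared information
```

Maximizing this bound with respect to the critic pushes the estimate toward the true mutual information, which is how the estimator can encourage the adapted embedding to keep speaker information.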
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Pages | 4298-4302 |
| Number of pages | 5 |
| Volume | 2020-October |
| DOIs | |
| Publication status | Published - Oct 2020 |
| Event | 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China Duration: 25 Oct 2020 → 29 Oct 2020 |
Conference
| Conference | 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 |
|---|---|
| Country/Territory | China |
| City | Shanghai |
| Period | 25/10/20 → 29/10/20 |
Keywords
- Adversarial learning
- Domain adaptation
- Domain mismatch
- Far field speaker verification
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation