Abstract
How to overcome the mismatch between training and test data in speaker verification systems has recently been a focus of research. In this paper, we propose a semi-supervised nuisance attribute network (SNAN) to reduce the domain mismatch in i-vectors and x-vectors. SNANs build on the idea of nuisance attribute removal in inter-dataset variability compensation (IDVC). However, instead of measuring domain variability through the dataset means, SNANs use the maximum mean discrepancy (MMD) as part of their loss function, which enables the network to find nuisance directions in which domain variability is measured up to infinitely high-order moments rather than through the means alone. The SNAN architecture also allows us to incorporate out-of-domain speaker labels into the semi-supervised training process through the center loss and the triplet loss. Using SNANs as a preprocessing step for PLDA training, we achieve a relative improvement of 11.8% in EER on NIST SRE 2016 compared to PLDA without adaptation. We also find that the semi-supervised approach further improves SNAN performance.
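The contrast between IDVC (dataset means) and an MMD criterion can be made concrete with a small NumPy sketch. It is not the authors' implementation: the abstract does not specify the kernel, bandwidths, or network, so the Gaussian kernels, the bandwidths in `sigmas`, and the function names `gaussian_kernel`, `mmd2`, and `idvc_projection` are hypothetical choices for illustration. A Gaussian kernel is characteristic, so the squared MMD it induces is sensitive to differences in all moments of the two embedding distributions, whereas the IDVC-style projection only removes the directions spanned by differences between dataset means.

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    # Pairwise squared Euclidean distances between rows of x and rows of y.
    d2 = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def mmd2(x, y, sigmas=(1.0, 2.0, 4.0)):
    # Biased estimate of squared MMD using a sum of Gaussian kernels
    # (bandwidths are assumed, not taken from the paper).
    total = 0.0
    for s in sigmas:
        kxx = gaussian_kernel(x, x, s).mean()
        kyy = gaussian_kernel(y, y, s).mean()
        kxy = gaussian_kernel(x, y, s).mean()
        total += kxx + kyy - 2.0 * kxy
    return total

def idvc_projection(dataset_means, k):
    # IDVC-style nuisance removal: the nuisance directions are the top-k
    # principal directions of the (centered) per-dataset mean vectors.
    centered = dataset_means - dataset_means.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    w = vt[:k].T                      # (dim, k), orthonormal columns
    return np.eye(dataset_means.shape[1]) - w @ w.T  # projector removing them

rng = np.random.default_rng(0)
src = rng.normal(size=(200, 10))      # stand-in "out-of-domain" embeddings
tgt = rng.normal(size=(200, 10))
tgt[:, 0] += 2.0                      # simulated domain shift along one axis
same = rng.normal(size=(200, 10))     # second sample from the source distribution

p = idvc_projection(np.stack([src.mean(axis=0), tgt.mean(axis=0)]), k=1)
print("MMD^2 same domain:      ", mmd2(src, same))
print("MMD^2 shifted domain:   ", mmd2(src, tgt))
print("MMD^2 after projection: ", mmd2(src @ p, tgt @ p))
```

In SNAN the MMD term is a training loss for a network rather than a fixed statistic, and it is combined with the center and triplet losses on labeled out-of-domain data; the sketch only shows what the MMD quantity measures compared with a mean-based projection.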
Original language | English |
---|---|
Title of host publication | IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Pages | 6236-6240 |
Number of pages | 5 |
Publication status | Published - May 2019 |