Abstract
The series of speaker recognition evaluations (SREs) organized by the National Institute of Standards and Technology (NIST) is widely accepted as the de facto benchmark for speaker recognition technology. This paper describes the NEC-TT speaker verification system developed for the recent SRE'19 CTS Challenge. Our system is based on an x-vector embedding front-end followed by a thin scoring back-end. We trained a very deep neural network for x-vector extraction by incorporating residual connections, squeeze-and-excitation networks, and an angular-margin softmax at the output layer. We enhanced the back-end with a tandem approach that leverages the benefits of both supervised and unsupervised domain adaptation. The front-end and back-end enhancements each yielded a relative error-rate reduction of over 30%.
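The abstract does not say which angular-margin softmax variant was used or with what hyperparameters. The sketch below illustrates one common variant, the additive angular margin (ArcFace-style) loss, for a single training sample. The function names and the defaults `scale=30.0` and `margin=0.2` are illustrative assumptions, not the authors' settings or implementation. The key idea is that the margin is added to the angle between the L2-normalized embedding and the target-class weight vector. This lowers the target logit and forces the network to learn more tightly clustered speaker embeddings.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in v]


def aam_softmax_loss(
    embedding: Sequence[float],
    class_weights: Sequence[Sequence[float]],
    label: int,
    scale: float = 30.0,   # assumed value, for illustration only
    margin: float = 0.2,   # assumed angular margin in radians
) -> float:
    """Additive angular margin softmax loss for one sample (hypothetical sketch).

    Logits are scale * cos(theta_j) for non-target classes and
    scale * cos(theta_y + margin) for the target class y.
    Practical implementations also guard against theta_y + margin > pi,
    where cos() stops decreasing; that guard is omitted here.
    """
    x = _normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        w_hat = _normalize(w)
        # Cosine similarity, clamped so acos() never sees rounding overshoot.
        cos_theta = max(-1.0, min(1.0, sum(a * b for a, b in zip(x, w_hat))))
        if j == label:
            theta = math.acos(cos_theta)
            logits.append(scale * math.cos(theta + margin))
        else:
            logits.append(scale * cos_theta)
    # Cross-entropy computed with a numerically stable log-sum-exp.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_denominator - logits[label]


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]
    plain = aam_softmax_loss([0.9, 0.1], weights, label=0, margin=0.0)
    with_margin = aam_softmax_loss([0.9, 0.1], weights, label=0, margin=0.2)
    print(plain, with_margin)  # the margin makes the same sample cost more
```

Setting `margin=0.0` reduces the loss to an ordinary softmax cross-entropy over scaled cosine similarities. Any positive margin yields a larger loss for the same input, which is the source of the extra inter-speaker separation.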
Original language | English |
---|---|
Pages (from-to) | 2227-2231 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2020-October |
DOIs | |
Publication status | Published - Oct 2020 |
Externally published | Yes |
Event | 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, Shanghai, China (25 Oct 2020 to 29 Oct 2020) |
Keywords
- Benchmark evaluation
- Speaker recognition
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation