Abstract
We propose a dynamic-margin softmax loss for training deep speaker embedding neural networks. Our proposal is inspired by the additive-margin softmax (AM-Softmax) loss reported earlier. In the AM-Softmax loss, a constant margin is used for all training samples. However, the angle between the feature vector and the ground-truth class center is rarely the same across samples, and it also changes during training. Thus, it is more reasonable to set a dynamic margin for each training sample. In this paper, we propose to set the margin of each training sample dynamically, commensurate with the cosine of the angle between that sample and its class center; hence the name dynamic-additive-margin softmax (DAM-Softmax) loss. More specifically, the smaller the cosine is, the larger the margin between the training sample and the corresponding class in the feature space should be, so as to promote intra-class compactness. Experimental results show that the proposed DAM-Softmax loss achieves state-of-the-art performance on the VoxCeleb dataset, with an equal error rate (EER) of 1.94%. In addition, our method also outperforms the AM-Softmax loss when evaluated on the Speakers in the Wild (SITW) corpus.
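To make the idea concrete, the sketch below implements an AM-Softmax-style classification head in which the additive margin is set per sample and grows as the cosine to the ground-truth class center shrinks. The abstract does not specify the exact mapping from cosine to margin, so the linear form used here, as well as the class name `DAMSoftmaxHead` and the hyper-parameters `scale` and `base_margin`, are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DAMSoftmaxHead(nn.Module):
    """AM-Softmax-style head with a per-sample (dynamic) additive margin.

    Sketch only: the cosine-to-margin mapping is an assumption, not the
    formula from the paper.
    """

    def __init__(self, embed_dim, num_classes, scale=30.0, base_margin=0.2):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.scale = scale              # s: cosine-logit scaling, as in AM-Softmax
        self.base_margin = base_margin  # reference margin (assumed hyper-parameter)

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalised embeddings and class centres.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        target_cos = cosine.gather(1, labels.unsqueeze(1)).squeeze(1)

        # Dynamic margin: the smaller the cosine to the ground-truth centre,
        # the larger the margin. The linear mapping below is an assumption.
        margin = self.base_margin * (1.0 + (1.0 - target_cos).detach())

        # Subtract the per-sample margin from the target-class cosine only.
        logits = cosine.clone()
        logits[torch.arange(labels.size(0)), labels] = target_cos - margin
        return F.cross_entropy(self.scale * logits, labels)


# Toy usage: 8 embeddings of dimension 256, 1,211 speakers (VoxCeleb1-sized).
head = DAMSoftmaxHead(embed_dim=256, num_classes=1211)
loss = head(torch.randn(8, 256), torch.randint(0, 1211, (8,)))
```

Detaching the margin term keeps the gradient path the same as in AM-Softmax while still penalising hard samples (low cosine) more heavily; this detail is a design choice of the sketch, not something stated in the abstract.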
| Original language | English |
|---|---|
| Pages (from-to) | 3800-3804 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2020-October |
| DOIs | |
| Publication status | Published - Oct 2020 |
| Externally published | Yes |
| Event | 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China. Duration: 25 Oct 2020 → 29 Oct 2020 |
Keywords
- Intra-class compactness
- Large-margin loss
- Speaker verification
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation