Abstract
This paper proposes and investigates several deep neural network (DNN) based score compensation, transformation, and calibration algorithms for enhancing the noise robustness of i-vector speaker verification systems. Unlike conventional calibration methods where the required score shift is a linear function of SNR or log-duration, the DNN approach learns the complex relationship between the score shifts and the combination of i-vector pairs and uncalibrated scores. Furthermore, with the flexibility of DNNs, it is possible to explicitly train a DNN to recover the clean scores without having to estimate the score shifts. To alleviate the overfitting problem, multitask learning is applied to incorporate auxiliary information such as SNRs and speaker IDs of training utterances into the DNN. Experiments on NIST 2012 SRE show that score calibration derived from multitask DNNs can improve the performance of the conventional score-shift approach significantly, especially under noisy conditions.
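To make the baseline that the abstract contrasts against more concrete, the following is a minimal sketch of a conventional score-shift compensation in which the shift is modeled as a linear function of SNR. It is not the paper's DNN method and does not reproduce its experiments: the data are synthetic, and the names `fit_linear_shift`, `compensate`, `TRUE_A`, and `TRUE_B` are hypothetical, introduced only for illustration.

```python
import random


def fit_linear_shift(snrs, shifts):
    """Closed-form least-squares fit of shift ~= a * snr + b (single feature)."""
    n = len(snrs)
    mean_x = sum(snrs) / n
    mean_y = sum(shifts) / n
    sxx = sum((x - mean_x) ** 2 for x in snrs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(snrs, shifts))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def compensate(noisy_score, snr, a, b):
    """Add the SNR-dependent shift to a noisy score to estimate the clean score."""
    return noisy_score + (a * snr + b)


# Synthetic data (an assumption for illustration, not taken from the paper):
# noise lowers the score, and the required shift shrinks linearly as SNR rises.
rng = random.Random(0)
TRUE_A, TRUE_B = -0.5, 10.0  # hypothetical ground-truth shift parameters
snrs = [rng.uniform(0.0, 20.0) for _ in range(200)]   # SNR in dB
clean = [rng.gauss(0.0, 5.0) for _ in snrs]           # clean-condition scores
noisy = [
    c - (TRUE_A * s + TRUE_B) + rng.gauss(0.0, 0.1)   # degraded scores
    for c, s in zip(clean, snrs)
]

# The training target is the shift that maps each noisy score back to its clean score.
shifts = [c - n for c, n in zip(clean, noisy)]
a, b = fit_linear_shift(snrs, shifts)

# Apply the fitted linear shift to recover approximate clean scores.
recovered = [compensate(n, s, a, b) for n, s in zip(noisy, snrs)]
mean_abs_err = sum(abs(r - c) for r, c in zip(recovered, clean)) / len(clean)
```

A model of this kind depends on SNR alone. The DNN approaches described in the abstract instead take the i-vector pair and the uncalibrated score as input, so the learned shift (or the directly predicted clean score) need not be linear in SNR.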
Original language | English |
---|---|
Article number | 8249870 |
Pages (from-to) | 700-712 |
Number of pages | 13 |
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Volume | 26 |
Issue number | 4 |
Publication status | Published - 1 Apr 2018 |
Keywords
- Deep learning
- multi-task learning
- noise robustness
- score calibration
- speaker verification
ASJC Scopus subject areas
- Signal Processing
- Media Technology
- Instrumentation
- Acoustics and Ultrasonics
- Linguistics and Language
- Electrical and Electronic Engineering
- Speech and Hearing