TY - GEN
T1 - Speaker-Phonetic Vector Estimation for Short Duration Speaker Verification
AU - Ma, Jianbo
AU - Sethu, Vidhyasaharan
AU - Ambikairajah, Eliathamby
AU - Lee, Kong Aik
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/9/10
Y1 - 2018/9/10
AB - Phonetic variability is one of the primary challenges in short duration speaker verification. This paper proposes a novel method that modifies the total variability model by replacing its standard normal prior with a mixture of Gaussians. The proposed speaker-phonetic vectors are then estimated from the posterior probability of the latent variables, and each vector has a phonetic meaning. Unlike the standard total variability model, the proposed method can incorporate a phoneme classifier to perform soft content matching, which has the potential to solve the phonetic variability problem. Parameter estimation and scoring formulae for the speaker-phonetic vector method are presented. Experimental results obtained using NIST 2010 data show that the proposed technique leads to relative improvements of more than 30% when fused with the total variability model and tested on 3-second test files.
KW - Automatic speaker verification
KW - I-vector
KW - Phonetic variability
KW - Short duration speaker verification
KW - Speaker-phonetic vector
UR - http://www.scopus.com/inward/record.url?scp=85054202854&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2018.8461978
DO - 10.1109/ICASSP.2018.8461978
M3 - Conference article published in proceeding or book
AN - SCOPUS:85054202854
SN - 9781538646588
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5264
EP - 5268
BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Y2 - 15 April 2018 through 20 April 2018
ER -