TY - JOUR
T1 - Duration compensation of i-vectors for short duration speaker verification
AU - Ma, Jianbo
AU - Sethu, Vidhyasaharan
AU - Ambikairajah, Eliathamby
AU - Lee, Kong Aik
N1 - Publisher Copyright: © The Institution of Engineering and Technology 2017.
PY - 2017/3/16
Y1 - 2017/3/16
N2 - The standard i-vector/Gaussian probabilistic linear discriminant analysis (G-PLDA) system does not compensate for duration mismatch, which is a significant confounding factor in short duration speaker verification. A novel duration compensation technique to normalise the distribution mismatch caused by duration variation in the i-vector space is proposed. The proposed technique involves the use of two factor analysers that are tied together to share latent variables for a given speaker as the underlying generative model of the i-vector space. This leads to a transform which maps the original i-vectors onto a latent subspace that is expected to be duration invariant. The proposed method has the advantage that it normalises distribution mismatch while taking into consideration both inter- and intra-speaker variability. Experiments conducted on the NIST SRE 2010 database show that the proposed method leads to 18.54, 15.48 and 8.77% relative improvements when tested on utterances of 10, 5 and 3 s durations, respectively, compared with the best results obtained by either standard i-vector/G-PLDA or the previously proposed twin model G-PLDA.
AB - The standard i-vector/Gaussian probabilistic linear discriminant analysis (G-PLDA) system does not compensate for duration mismatch, which is a significant confounding factor in short duration speaker verification. A novel duration compensation technique to normalise the distribution mismatch caused by duration variation in the i-vector space is proposed. The proposed technique involves the use of two factor analysers that are tied together to share latent variables for a given speaker as the underlying generative model of the i-vector space. This leads to a transform which maps the original i-vectors onto a latent subspace that is expected to be duration invariant. The proposed method has the advantage that it normalises distribution mismatch while taking into consideration both inter- and intra-speaker variability. Experiments conducted on the NIST SRE 2010 database show that the proposed method leads to 18.54, 15.48 and 8.77% relative improvements when tested on utterances of 10, 5 and 3 s durations, respectively, compared with the best results obtained by either standard i-vector/G-PLDA or the previously proposed twin model G-PLDA.
UR - http://www.scopus.com/inward/record.url?scp=85015769162&partnerID=8YFLogxK
U2 - 10.1049/el.2016.4629
DO - 10.1049/el.2016.4629
M3 - Journal article
AN - SCOPUS:85015769162
SN - 0013-5194
VL - 53
SP - 405
EP - 407
JO - Electronics Letters
JF - Electronics Letters
IS - 6
ER -