TY - GEN
T1 - On the Importance of Analytic Phase of Speech Signals in Spoken Language Recognition
AU - Vijayan, Karthika
AU - Li, Haizhou
AU - Sun, Hanwu
AU - Lee, Kong Aik
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/9/10
Y1 - 2018/9/10
N2 - In this paper, we study the role of the long-time analytic phase of speech signals in spoken language recognition (SLR) and employ a set of features termed instantaneous frequency cepstral coefficients (IFCC). We extract IFCC from the long-time analytic phase in an effort to capture long-range acoustic features from speech signals. These features are used in combination with the traditional shifted delta cepstral coefficients (SDCC) for SLR. Since SDCC are extracted from the spectral magnitude and IFCC from the analytic phase, they characterize long-time information of speech in different ways. Experiments conducted on the NIST LRE 2017 task reveal that IFCC features are complementary to SDCC and deep bottleneck (DBN) features. The fusion of IFCC with SDCC/DBN features delivered relative improvements of 23.23% and 16.78% in average equal error rate over the SDCC and DBN features, respectively, indicating the benefits of information from the analytic phase in SLR.
AB - In this paper, we study the role of the long-time analytic phase of speech signals in spoken language recognition (SLR) and employ a set of features termed instantaneous frequency cepstral coefficients (IFCC). We extract IFCC from the long-time analytic phase in an effort to capture long-range acoustic features from speech signals. These features are used in combination with the traditional shifted delta cepstral coefficients (SDCC) for SLR. Since SDCC are extracted from the spectral magnitude and IFCC from the analytic phase, they characterize long-time information of speech in different ways. Experiments conducted on the NIST LRE 2017 task reveal that IFCC features are complementary to SDCC and deep bottleneck (DBN) features. The fusion of IFCC with SDCC/DBN features delivered relative improvements of 23.23% and 16.78% in average equal error rate over the SDCC and DBN features, respectively, indicating the benefits of information from the analytic phase in SLR.
KW - Analytic phase
KW - Fusion
KW - Instantaneous frequency
KW - Long-time features
KW - Spoken language recognition
UR - http://www.scopus.com/inward/record.url?scp=85054210633&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2018.8461501
DO - 10.1109/ICASSP.2018.8461501
M3 - Conference article published in proceeding or book
AN - SCOPUS:85054210633
SN - 9781538646588
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5194
EP - 5198
BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Y2 - 15 April 2018 through 20 April 2018
ER -