TY - GEN
T1 - Modeling Suprasegmental Information Using Finite Difference Network for End-to-End Speaker Verification
AU - Li, Jin
AU - Mak, Man Wai
AU - Yan, Nan
AU - Wang, Lan
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023/11
Y1 - 2023/11
N2 - In recent years, using raw waveforms as input to deep networks has been widely explored for speaker verification systems that process speech signals at the segmental level. A critical issue with such an approach is that a front-end network with a small kernel fails to capture suprasegmental information, such as intonation patterns and prosody that span longer than one second. This paper proposes a novel framework that can capture both segmental and suprasegmental information after the first convolutional layer. Concretely, suprasegmental information is obtained from the first-order finite difference of two consecutive suprasegmental envelopes estimated by Hilbert transforms. Experimental evaluations on the VoxCeleb dataset show that combining segmental and suprasegmental features can reduce the EER of an end-to-end system by 27%. To the best of our knowledge, this is the first attempt to incorporate suprasegmental information for end-to-end speaker verification.
AB - In recent years, using raw waveforms as input to deep networks has been widely explored for speaker verification systems that process speech signals at the segmental level. A critical issue with such an approach is that a front-end network with a small kernel fails to capture suprasegmental information, such as intonation patterns and prosody that span longer than one second. This paper proposes a novel framework that can capture both segmental and suprasegmental information after the first convolutional layer. Concretely, suprasegmental information is obtained from the first-order finite difference of two consecutive suprasegmental envelopes estimated by Hilbert transforms. Experimental evaluations on the VoxCeleb dataset show that combining segmental and suprasegmental features can reduce the EER of an end-to-end system by 27%. To the best of our knowledge, this is the first attempt to incorporate suprasegmental information for end-to-end speaker verification.
UR - http://www.scopus.com/inward/record.url?scp=85180004372&partnerID=8YFLogxK
U2 - 10.1109/APSIPAASC58517.2023.10317476
DO - 10.1109/APSIPAASC58517.2023.10317476
M3 - Conference article published in proceeding or book
AN - SCOPUS:85180004372
T3 - 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
SP - 119
EP - 124
BT - 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Y2 - 31 October 2023 through 3 November 2023
ER -