Modeling Suprasegmental Information Using Finite Difference Network for End-to-End Speaker Verification

Jin Li, Man Wai Mak, Nan Yan, Lan Wang

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

In recent years, raw waveforms have been widely explored as inputs to deep networks in speaker verification systems that process speech signals at the segmental level. A critical issue with this approach is that a front-end network with small kernels fails to capture suprasegmental information, such as intonation patterns and prosody, which spans longer than one second. This paper proposes a novel framework that captures both segmental and suprasegmental information after the first convolutional layer. Concretely, suprasegmental information is obtained from the first-order finite difference of two consecutive suprasegmental envelopes estimated by Hilbert transforms. Experimental evaluations on the VoxCeleb dataset show that combining segmental and suprasegmental features reduces the equal error rate (EER) of an end-to-end system by 27%. To the best of our knowledge, this is the first attempt to incorporate suprasegmental information into end-to-end speaker verification.
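The abstract describes the suprasegmental branch only at a high level. The following is a minimal NumPy sketch written under our own assumptions; it is not the authors' implementation. It assumes the input is one channel, or a stack of channels, of the first convolutional layer's output. The input is split into consecutive non-overlapping segments of a hypothetical length `seg_len`. Each segment's envelope e_k is the magnitude of its FFT-based analytic signal, which is the standard discrete Hilbert-transform construction. The feature is then the first-order difference d_k = e_{k+1} - e_k between consecutive segment envelopes. The names `hilbert_envelope`, `envelope_difference` and `seg_len` are hypothetical.

```python
# Illustrative sketch under our own assumptions, NOT the authors' implementation.
import numpy as np


def hilbert_envelope(x: np.ndarray) -> np.ndarray:
    """Amplitude envelope |x + j*H{x}| along the last axis (FFT-based analytic signal)."""
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    # Frequency weights: keep DC (and Nyquist for even n), double positive
    # frequencies, zero negative frequencies.
    weights = np.zeros(n)
    if n % 2 == 0:
        weights[0] = weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[0] = 1.0
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * weights, axis=-1)
    return np.abs(analytic)


def envelope_difference(x: np.ndarray, seg_len: int) -> np.ndarray:
    """Hypothetical suprasegmental feature d_k = e_{k+1} - e_k.

    x: array of shape (..., T), e.g. channels of a first-layer conv output (assumed).
    seg_len: hypothetical segment length in samples.
    Returns an array of shape (..., n_seg - 1, seg_len).
    """
    n_seg = x.shape[-1] // seg_len
    if n_seg < 2:
        raise ValueError("need at least two full segments of length seg_len")
    # Drop any trailing partial segment, then split into consecutive segments.
    segments = x[..., : n_seg * seg_len].reshape(*x.shape[:-1], n_seg, seg_len)
    # One Hilbert envelope per segment, each computed independently.
    envelopes = hilbert_envelope(segments)
    # First-order finite difference between consecutive segment envelopes.
    return np.diff(envelopes, n=1, axis=-2)


if __name__ == "__main__":
    seg = 64
    t = np.arange(4 * seg)
    carrier = np.cos(2 * np.pi * 8 * t / seg)      # 8 whole cycles per segment
    gains = np.repeat([1.0, 2.0, 3.0, 4.0], seg)   # loudness rises segment by segment
    d = envelope_difference(gains * carrier, seg)
    print(d.shape, np.round(d.mean(axis=-1), 6))
```

In the demo, the segment gain rises by 1 from each segment to the next, so each entry of the difference is approximately 1. For 16 kHz audio, a `seg_len` of 16000 samples would match the roughly one-second span that the abstract associates with suprasegmental cues. That value is an illustrative choice, not one reported by the authors.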

Original language: English
Title of host publication: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 119-124
Number of pages: 6
ISBN (Electronic): 9798350300673
Publication status: Published - Nov 2023
Event: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023 - Taipei, Taiwan
Duration: 31 Oct 2023 to 3 Nov 2023

Publication series

Name: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023

Conference

Conference: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Country/Territory: Taiwan
City: Taipei
Period: 31/10/23 to 3/11/23

ASJC Scopus subject areas

  • Hardware and Architecture
  • Signal Processing
  • Artificial Intelligence
  • Computer Science Applications
