Joseph: phonetic-aware speaker embedding for far-field speaker verification

Zezhong Jin, Youzhi Tu, Man Wai Mak

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; Peer-review

Abstract

Performing speaker verification (SV) at a distance from the sound source is challenging because of noise and reverberation. In such conditions, incorporating phonetic information into speaker embeddings can help reduce the adverse effects of noise and reverberation. Motivated by this observation, we propose a Jointly optimized speaker-embedding and phonetic-matching (Joseph) framework that exploits phonetic content for far-field SV. The framework encourages the speaker embeddings to preserve phonetic information by matching the frame-level feature maps of a speaker embedding network with wav2vec feature vectors. The intuition is that phonetic information can preserve low-level acoustic dynamics together with speaker information and can therefore partly compensate for the degradation caused by noise and reverberation. Results show that the proposed framework outperforms a standard speaker embedding on the VOiCES Challenge 2019 evaluation set and the VoxCeleb1 test set, indicating that leveraging phonetic information under far-field conditions is effective for learning robust speaker representations.
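The joint objective described in the abstract can be sketched as a weighted sum of a speaker-classification loss and a frame-level matching loss. This is a minimal, hypothetical illustration and not the authors' implementation. The following are all assumptions made here for illustration: the softmax cross-entropy speaker loss, the linear projection `weight`, the mean-squared-error matching term, the weighting factor `lam`, and the premise that both sequences are already aligned to the same frame rate. Extracting real wav2vec features is outside the sketch, so `w2v` is simply a list of vectors.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one utterance (assumed speaker loss)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def project(frames: Matrix, weight: Matrix) -> Matrix:
    """Map each D_s-dim frame to D_w dims; `weight` is a hypothetical D_s x D_w matrix."""
    out_dim = len(weight[0])
    return [
        [sum(x * weight[i][j] for i, x in enumerate(frame)) for j in range(out_dim)]
        for frame in frames
    ]


def matching_loss(frames: Matrix, w2v: Matrix, weight: Matrix) -> float:
    """Mean squared error between projected frames and wav2vec vectors (assumed form)."""
    proj = project(frames, weight)
    if len(proj) != len(w2v):
        # Hypothetical requirement: frame rates must be aligned beforehand.
        raise ValueError("sequences must be aligned to the same number of frames")
    sq_errors = [
        (p - w) ** 2
        for p_row, w_row in zip(proj, w2v)
        for p, w in zip(p_row, w_row)
    ]
    return sum(sq_errors) / len(sq_errors)


def joint_loss(logits, target, frames, w2v, weight, lam: float = 0.1) -> float:
    """Speaker loss plus lam times the phonetic-matching loss (lam is hypothetical)."""
    return cross_entropy(logits, target) + lam * matching_loss(frames, w2v, weight)


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0]]  # toy frame-level feature map (2 frames, D_s = 2)
    w2v = [[0.0, 0.0], [0.0, 0.0]]     # toy stand-in for wav2vec vectors (D_w = 2)
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(matching_loss(frames, w2v, identity))  # 0.5
    print(joint_loss([2.0, 0.0], 0, frames, w2v, identity))
```

In practice the inputs would be tensors in a deep-learning framework so that gradients from both terms flow back into the speaker embedding network; plain Python lists are used here only to keep the sketch self-contained.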

Original language: English
Title of host publication: APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350367331
Publication status: Published - Dec 2024
Event: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024 - Macau, China
Duration: 3 Dec 2024 to 6 Dec 2024

Publication series

Name: APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024

Conference

Conference: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Country/Territory: China
City: Macau
Period: 3/12/24 to 6/12/24

Keywords

  • Far-field speaker verification
  • multi-task learning
  • phonetic content
  • wav2vec

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Hardware and Architecture
  • Signal Processing
