UNet-DenseNet for Robust Far-Field Speaker Verification

Zhenke Gao, Man Wai Mak, Weiwei Lin

Research output: Journal article publicationConference articleAcademic researchpeer-review

10 Citations (Scopus)

Abstract

Far-field speaker verification (SV) has always been critical but challenging. Data augmentation is commonly used to overcome the problems arising from far-field microphones, such as high background noise levels and reverberation effects. On top of data augmentation, this paper tackles these problems by introducing a UNet-based speech enhancement (SE) module as a front-end processor for the speaker embedding module. To prevent the SE module from distorting speaker information, we propose two improvements to the speech enhancement-speaker embedding pipeline. (1) A UNet-DenseNet joint training scheme in which the UNet is optimized by both the MSE and speaker classification losses. (2) A semi-joint training scheme that stops the UNet training but continues the DenseNet training when overfitting of the UNet is detected. Extensive experiments on noise-contaminated Voxceleb1 and the VOiCES Challenge 2019 demonstrate the effectiveness of the two training schemes.

Original languageEnglish
Pages (from-to)3714-3718
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume2022-September
DOIs
Publication statusPublished - Sept 2022
Event23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
Duration: 18 Sept 202222 Sept 2022

Keywords

  • DenseNet
  • Far-field speaker verification
  • speaker embedding
  • speech enhancement
  • UNet

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Fingerprint

Dive into the research topics of 'UNet-DenseNet for Robust Far-Field Speaker Verification'. Together they form a unique fingerprint.

Cite this