Channel Interdependence Enhanced Speaker Embeddings for Far-Field Speaker Verification

Ling Jun Zhao, Man Wai Mak

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

Abstract

Recognizing speakers from a distance using far-field microphones is difficult because of environmental noise and reverberation. In this work, we tackle these problems by strengthening the frame-level processing and feature aggregation of x-vector networks. Specifically, we restructure the dilated convolutional layers into Res2Net blocks to generate multi-scale frame-level features. To exploit the relationships between channels, we introduce squeeze-and-excitation (SE) units that rescale the channels' activations, and we investigate where in the Res2Net blocks these SE units are best placed. Based on the hypothesis that layers at different depths contain speaker information at different granularity levels, we introduce multi-block feature aggregation to propagate and aggregate the features from various depths. To optimally weight the channels and frames during feature aggregation, we propose a channel-dependent attention mechanism. Combining all of these enhancements leads to a network architecture called channel-interdependence enhanced Res2Net (CE-Res2Net). Results show that the proposed network achieves a relative improvement of about 16% in EER and 17% in minDCF on the evaluation set of the VOiCES 2019 Challenge.
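
The channel rescaling performed by an SE unit, as described in the abstract, can be illustrated with a minimal sketch. This is a generic squeeze-and-excitation gate written in plain Python, not the authors' implementation: the function name `se_rescale`, the weight names, the bottleneck size, and the toy values are all assumptions made for illustration, and the abstract does not specify where the unit sits inside a Res2Net block.

```python
import math


def se_rescale(features, w1, b1, w2, b2):
    """Rescale the channels of a (C x T) feature map with an SE-style gate.

    features: list of C channels, each a list of T frame-level activations.
    w1, b1:   bottleneck layer, shapes (R x C) and (R,)  [hypothetical names]
    w2, b2:   expansion layer, shapes (C x R) and (C,)   [hypothetical names]
    Returns (rescaled_features, gates).
    """
    # Squeeze: average each channel over time -> one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in features]

    # Excitation, step 1: bottleneck projection followed by ReLU.
    h = [max(0.0, sum(w * v for w, v in zip(row, z)) + b)
         for row, b in zip(w1, b1)]

    # Excitation, step 2: project back to C channels, sigmoid gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, h)) + b)))
             for row, b in zip(w2, b2)]

    # Scale: multiply every frame of channel c by that channel's gate.
    rescaled = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return rescaled, gates


# Toy example (assumed values): C = 3 channels, T = 4 frames, bottleneck R = 2.
features = [[1.0, 2.0, 3.0, 4.0],
            [0.5, 0.5, 0.5, 0.5],
            [-1.0, 0.0, 1.0, 0.0]]
w1 = [[0.2, -0.1, 0.4], [0.3, 0.1, -0.2]]
b1 = [0.0, 0.0]
w2 = [[0.5, -0.5], [0.1, 0.2], [-0.3, 0.4]]
b2 = [0.0, 0.0, 0.0]

rescaled, gates = se_rescale(features, w1, b1, w2, b2)
print(gates)
```

Because each gate depends on the time-averaged descriptors of all channels, the scaling applied to one channel reflects the activity of the others, which is the channel interdependence the abstract refers to. In a trained network the weights would be learned rather than fixed by hand as they are here.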

Original language: English
Title of host publication: 2021 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021
Place of Publication: Hong Kong
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728169941
Publication status: Published - 24 Jan 2021
Event: 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021 - Hong Kong, Hong Kong
Duration: 24 Jan 2021 to 27 Jan 2021

Publication series

Name: 2021 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021

Conference

Conference: 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021
Country/Territory: Hong Kong
City: Hong Kong
Period: 24/01/21 to 27/01/21

Keywords

  • channel-dependent attention
  • Far-field speaker verification
  • Res2Net
  • speaker embedding
  • Squeeze-and-excitation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Linguistics and Language
