Speaker augmentation and bandwidth extension for deep speaker embedding

Hitoshi Yamamoto, Kong Aik Lee, Koji Okabe, Takafumi Koshinaka

Research output: Journal article publication › Conference article › Academic research › peer-review

34 Citations (Scopus)

Abstract

This paper investigates a novel data augmentation approach to train deep neural networks (DNNs) used for speaker embedding, i.e., to extract a representation that allows easy comparison between speaker voices with a simple geometric operation. Data augmentation is used to create new examples from an existing training set, thereby increasing the quantity of training data and improving the robustness of the model. We attempt to increase the number of speakers in the training set by generating new speakers via voice conversion. This speaker augmentation expands the coverage of speakers in the embedding space, in contrast to conventional audio augmentation methods, which focus on within-speaker variability. With an increased number of speakers in the training set, the DNN is trained to produce a better speaker-discriminative embedding. We also advocate using bandwidth extension to augment narrowband speech for a wideband application. Text-independent speaker recognition experiments on Speakers in the Wild (SITW) demonstrate a 17.9% reduction in minimum detection cost with speaker augmentation. The combined use of the two techniques provides further improvement.
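
As an illustration of what comparing speaker voices "with a simple geometric operation" can look like, the sketch below scores two embedding vectors by cosine similarity. This is only an assumed example: the abstract does not name the operation used in the paper, and the function names, toy vectors, and the 0.5 threshold are hypothetical.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(enroll, test, threshold=0.5):
    """Accept the trial if the score reaches the (hypothetical) threshold."""
    return cosine_score(enroll, test) >= threshold


# Toy 3-dimensional vectors standing in for real embeddings,
# which typically have hundreds of dimensions.
enroll = [1.0, 0.0, 1.0]
test_same = [2.0, 0.0, 2.0]   # same direction as enroll, different length
test_diff = [0.0, 1.0, 0.0]   # orthogonal to enroll
```

Because cosine similarity ignores vector length, `test_same` scores as a match for `enroll` even though its magnitude differs, while the orthogonal `test_diff` is rejected.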

Original language: English
Pages (from-to): 406-410
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2019-September
Publication status: Published - Sept 2019
Externally published: Yes
Event: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration: 15 Sept 2019 to 19 Sept 2019

Keywords

  • Bandwidth extension
  • Data augmentation
  • Speaker embedding
  • Speaker recognition

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
