Bottleneck features from SNR-adaptive denoising deep classifier for speaker identification

Zhili Tan, Man Wai Mak

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

6 Citations (Scopus)

Abstract

In this paper, we explore the potential of using deep learning to extract speaker-dependent features for noise-robust speaker identification. More specifically, an SNR-adaptive denoising classifier is constructed by stacking two layers of restricted Boltzmann machines (RBMs) on top of a denoising deep autoencoder. The top RBM layer is connected to a softmax output layer that outputs the posterior probabilities of speakers, and the top RBM layer itself outputs speaker-dependent bottleneck features. Both the deep autoencoder and the RBMs are trained by contrastive divergence, followed by backpropagation fine-tuning. The autoencoder aims to reconstruct the clean spectra of a noisy test utterance using the spectra of the noisy test utterance and its SNR as input. With this denoising capability, the output from the bottleneck layer of the classifier can be considered a low-dimensional representation of denoised utterances. These frame-based bottleneck features are then used to train an iVector extractor and a PLDA model for speaker identification. Experimental results based on a noisy YOHO corpus show that the bottleneck features slightly outperform the conventional MFCC under low SNR conditions and that fusion of the two features leads to further performance gain, suggesting that the two features are complementary to each other.
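The data flow described in the abstract can be sketched as a single forward pass. This is only an illustrative sketch, not the authors' implementation: the layer sizes, the single-hidden-layer autoencoder, the helper names (`dense`, `init_layer`, `forward`) and the random weights are all assumptions, with the random weights standing in for contrastive-divergence pre-training and backpropagation fine-tuning.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(x, weights, bias, activation=None):
    """Affine layer: one output per weight row, optionally passed through an activation."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(o) for o in out] if activation else out


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_layer(n_out, n_in, rng):
    """Small random weights; a placeholder for trained parameters (assumption)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


# Hypothetical sizes chosen only for illustration; they are not taken from the paper.
N_SPEC, N_HID, N_BOTTLENECK, N_SPEAKERS = 8, 16, 4, 3

rng = random.Random(0)
layers = {
    "dae_enc": init_layer(N_HID, N_SPEC + 1, rng),  # +1 input for the utterance SNR
    "dae_dec": init_layer(N_SPEC, N_HID, rng),      # reconstructs the clean spectrum
    "rbm1": init_layer(N_HID, N_SPEC, rng),         # first RBM layer on top of the autoencoder
    "rbm2": init_layer(N_BOTTLENECK, N_HID, rng),   # top RBM layer = bottleneck
    "softmax": init_layer(N_SPEAKERS, N_BOTTLENECK, rng),
}


def forward(noisy_spectrum, snr_db):
    """Return (bottleneck features, speaker posteriors) for one frame."""
    x = list(noisy_spectrum) + [snr_db]              # noisy spectrum plus its SNR
    h = dense(x, *layers["dae_enc"], activation=sigmoid)
    denoised = dense(h, *layers["dae_dec"])          # estimate of the clean spectrum
    h1 = dense(denoised, *layers["rbm1"], activation=sigmoid)
    bottleneck = dense(h1, *layers["rbm2"], activation=sigmoid)
    posteriors = softmax(dense(bottleneck, *layers["softmax"]))
    return bottleneck, posteriors


bottleneck, posteriors = forward([0.5] * N_SPEC, snr_db=10.0)
print(len(bottleneck), len(posteriors))
```

In a pipeline like the one the abstract describes, the per-frame `bottleneck` vectors, rather than the posteriors, would be collected and passed on to the iVector extractor and PLDA back end.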
Original language: English
Title of host publication: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
Publisher: IEEE
Pages: 1035-1040
Number of pages: 6
ISBN (Electronic): 9789881476807
Publication status: Published - 19 Feb 2016
Event: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015 - Hong Kong, Hong Kong
Duration: 16 Dec 2015 to 19 Dec 2015

Conference

Conference: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
Country/Territory: Hong Kong
City: Hong Kong
Period: 16/12/15 to 19/12/15

Keywords

  • Bottleneck features
  • deep belief networks
  • Deep learning
  • denoising autoencoder
  • speaker identification

ASJC Scopus subject areas

  • Artificial Intelligence
  • Modelling and Simulation
  • Signal Processing
