Avoiding dominance of speaker features in speech-based depression detection

Lishi Zuo, Man Wai Mak

Research output: Journal article publication › Journal article › Academic research › peer-review

6 Citations (Scopus)

Abstract

The performance of speech-based depression detectors is limited by the scarcity and imbalance of depression data. We found that depression detectors can be strongly biased toward speaker features when the number of training speakers is insufficient. To address this issue, we propose a speaker-invariant depression detector (SIDD) that minimizes speaker information in the latent space. The SIDD consists of an autoencoder, a depression classifier, and a speaker-embedding projector. Incorporating speaker-embedding vectors into the autoencoder's latent vectors effectively eliminates speaker information from the representation seen by the depression classifier. Experimental results show that minimizing speaker information yields significant improvements, and the proposed method generally outperforms previous approaches to depression detection on the DAIC-WOZ dataset.
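The three components named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not specify layer types, how the projected speaker embedding is combined with the latent vector, or the loss. The additive combination, the single linear layers, the squared-error plus cross-entropy objective, and all names (`init_params`, `sidd_forward`, `sidd_loss`, `weight`) are assumptions made for illustration only.

```python
import math
import random


def linear(n_in, n_out, rng):
    """Random weight matrix of shape (n_out, n_in); a stand-in for a trained layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def init_params(feat_dim, spk_dim, latent_dim, seed=0):
    rng = random.Random(seed)
    return {
        "enc": linear(feat_dim, latent_dim, rng),   # autoencoder encoder
        "dec": linear(latent_dim, feat_dim, rng),   # autoencoder decoder
        "proj": linear(spk_dim, latent_dim, rng),   # speaker-embedding projector
        "clf": linear(latent_dim, 1, rng),          # depression classifier
    }


def sidd_forward(x, spk_emb, params):
    # Encoder: acoustic features -> latent vector z.
    z = [math.tanh(a) for a in matvec(params["enc"], x)]
    # Projector: map the speaker embedding into the latent space.
    s = matvec(params["proj"], spk_emb)
    # ASSUMPTION: the decoder reconstructs from z + s, so speaker identity
    # can be supplied by s rather than being encoded in z.
    x_hat = matvec(params["dec"], [zi + si for zi, si in zip(z, s)])
    # The classifier sees z only, never the speaker embedding.
    logit = matvec(params["clf"], z)[0]
    p_dep = 1.0 / (1.0 + math.exp(-logit))
    return z, x_hat, p_dep


def sidd_loss(x, x_hat, p_dep, label, weight=1.0):
    # ASSUMPTION: mean squared reconstruction error plus weighted binary cross-entropy.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    eps = 1e-12
    bce = -(label * math.log(p_dep + eps) + (1 - label) * math.log(1 - p_dep + eps))
    return recon + weight * bce


params = init_params(feat_dim=8, spk_dim=4, latent_dim=3)
x = [0.1 * i for i in range(8)]          # toy acoustic feature vector
spk = [1.0, 0.0, -1.0, 0.5]              # toy speaker embedding
z, x_hat, p_dep = sidd_forward(x, spk, params)
loss = sidd_loss(x, x_hat, p_dep, label=1)
```

In this sketch, changing `spk` alters the reconstruction `x_hat` but leaves `p_dep` unchanged, because the classifier reads only `z`. Whether training actually removes speaker information from `z` depends on details the abstract does not give.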

Original language: English
Pages (from-to): 50-56
Number of pages: 7
Journal: Pattern Recognition Letters
Volume: 173
Publication status: Published - Sept 2023

Keywords

  • Depression detection
  • Feature disentanglement
  • Speaker embedding
  • Speaker invariance

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
