Promoting Independence of Depression and Speaker Features for Speaker Disentanglement in Speech-Based Depression Detection

Lishi Zuo, Man-Wai Mak, Youzhi Tu

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

Recent studies have demonstrated the effectiveness of speaker disentanglement in mitigating the interference caused by speaker features in speech-based depression detection. However, the inherent entanglement between depression features and speaker features poses challenges to depression detection. In this study, we propose a mutual information-based speaker-invariant depression detector (MI-SIDD) that aims to promote independence between depression and speaker features to facilitate speaker disentanglement. Specifically, we disentangle the speaker features using a vanilla autoencoder with a well-tuned bottleneck layer and minimize the mutual information between depression and speaker features using a conditional mutual information constraint. Experimental results demonstrate the effectiveness of speaker disentanglement and the promotion of independence between depression and speaker features. Our MI-SIDD model achieves competitive performance compared to state-of-the-art methods on the DAIC-WOZ dataset.
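The abstract describes two ingredients: a bottlenecked autoencoder that separates out speaker features, and a mutual-information constraint pushing the depression and speaker embeddings toward independence. As a rough, hypothetical illustration only (not the paper's conditional mutual information estimator), a batch-level cross-correlation penalty can stand in for such an independence term in a training loss:

```python
import numpy as np

def cross_correlation_penalty(dep_feats, spk_feats):
    """Illustrative independence penalty: squared Frobenius norm of the
    cross-correlation matrix between two feature batches. A value near
    zero means the two representations are (linearly) decorrelated."""
    # Standardize each feature dimension across the batch
    d = (dep_feats - dep_feats.mean(0)) / (dep_feats.std(0) + 1e-8)
    s = (spk_feats - spk_feats.mean(0)) / (spk_feats.std(0) + 1e-8)
    n = dep_feats.shape[0]
    corr = d.T @ s / n  # cross-correlation matrix between the two spaces
    return float((corr ** 2).sum())

# Toy check: independent batches score lower than coupled ones
rng = np.random.default_rng(0)
dep = rng.standard_normal((256, 8))
spk = rng.standard_normal((256, 8))                    # independent of dep
coupled = dep + 0.1 * rng.standard_normal((256, 8))    # correlated with dep
assert cross_correlation_penalty(dep, spk) < cross_correlation_penalty(dep, coupled)
```

In practice, MI-based methods replace this linear proxy with a learned (often variational) estimator that captures nonlinear dependence as well; the sketch only conveys the shape of the objective, i.e., a scalar penalty that shrinks as the two feature sets decouple.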
Original language: English
Title of host publication: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: IEEE
Pages: 10191-10195
Number of pages: 5
ISBN (Electronic): 979-8-3503-4485-1
ISBN (Print): 979-8-3503-4486-8
DOIs
Publication status: Published - Apr 2024
