Abstract
Recent studies have demonstrated the effectiveness of speaker disentanglement in mitigating the interference caused by speaker features in speech-based depression detection. However, the inherent entanglement between depression features and speaker features makes such disentanglement difficult. In this study, we propose a mutual information-based speaker-invariant depression detector (MI-SIDD) that promotes independence between depression and speaker features to facilitate speaker disentanglement. Specifically, we disentangle the speaker features using a vanilla autoencoder with a well-tuned bottleneck layer and minimize the mutual information between depression and speaker features via a conditional mutual information constraint. Experimental results confirm that MI-SIDD effectively disentangles speaker features and promotes independence between depression and speaker features, achieving competitive performance compared to state-of-the-art methods on the DAIC-WOZ dataset.
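The core idea of the mutual information constraint can be illustrated with a toy example. The paper trains a conditional mutual information bound end to end; the sketch below instead uses the closed-form MI of jointly Gaussian 1-D variables, I(X;Y) = -0.5 log(1 - rho^2), and a simple linear residualization as a crude stand-in for the learned constraint. All variable names (`dep`, `spk`) and the decorrelation step are illustrative assumptions, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical stand-ins for two entangled embeddings: a shared latent factor z
# leaks into both the "depression" feature and the "speaker" feature.
z = rng.normal(size=n)
dep = z + 0.1 * rng.normal(size=n)          # depression feature (assumed)
spk = 0.8 * z + 0.6 * rng.normal(size=n)    # speaker feature, correlated with dep

def gaussian_mi(x, y):
    """Closed-form MI for jointly Gaussian 1-D variables: -0.5 * log(1 - rho^2)."""
    rho = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

mi_before = gaussian_mi(dep, spk)

# Linearly remove the component of spk predictable from dep -- a crude proxy
# for minimizing MI between the two feature streams during training.
spk_indep = spk - (np.cov(dep, spk)[0, 1] / np.var(dep)) * dep
mi_after = gaussian_mi(dep, spk_indep)

print(f"MI before: {mi_before:.3f}, MI after: {mi_after:.5f}")
```

After residualization the estimated MI collapses toward zero, mirroring the goal of making the depression representation carry no speaker information.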
| Original language | English |
|---|---|
| Title of host publication | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
| Publisher | IEEE |
| Pages | 10191-10195 |
| Number of pages | 5 |
| ISBN (Electronic) | 979-8-3503-4485-1 |
| ISBN (Print) | 979-8-3503-4486-8 |
| DOIs | |
| Publication status | Published - Apr 2024 |