Segmental and Suprasegmental Speech Foundation Models for Classifying Cognitive Risk Factors: Evaluating Out-of-the-Box Performance

  • Si Ioi Ng
  • Lingfeng Xu
  • Kimberly D. Mueller
  • Julie Liss
  • Visar Berisha

Research output: Journal article publication, Conference article, Academic research, peer-review

Abstract

Speech foundation models are remarkably successful in various consumer applications, prompting their extension to clinical use-cases. This is challenged by small clinical datasets, which preclude effective fine-tuning. We tested the efficacy of two models to classify participants by segmental (Wav2Vec2.0, W2V2) and suprasegmental (Trillsson) speech analysis windows. Analysis at both time scales has shown differences in the context of cognitive decline. Speakers were classified as healthy controls (HC), Amyloid-β+ (Aβ+), mild cognitive impairment (MCI), or dementia. A subset of W2V2 and Trillsson representations showed large effect sizes between HC and each risk factor. Cross-validation showed that W2V2 consistently outperformed Trillsson. Mean macro-F1 scores of 54.1%, 63.5%, and 72.0% were found for classifying Aβ+, MCI, and dementia from HC, respectively. Repeatability of Trillsson and W2V2 showed intraclass correlations of 0.30 and 0.41, respectively. Reliability of such models must be enhanced for clinical speech analysis and longitudinal tracking.
Original language: English
Pages (from-to): 917-921
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2024
Publication status: Published - Sept 2024
Externally published: Yes
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sept 2024 to 5 Sept 2024

Keywords

  • Dementia
  • cognitive risk factors
  • clinical speech analytics
  • speech foundation models
  • feature reliability
