TY - JOUR
T1 - Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders
AU - Meng, Helen
AU - Mak, Brian
AU - Mak, Man Wai
AU - Fung, Helene
AU - Gong, Xianmin
AU - Kwok, Timothy
AU - Liu, Xunying
AU - Mok, Vincent
AU - Wong, Patrick
AU - Woo, Jean
AU - Wu, Xixin
AU - Wong, Ka Ho
AU - Xu, Sean Shensheng
AU - Zheng, Naijun
AU - Huang, Ranzo
AU - Kang, Jiawen
AU - Ke, Xiaoquan
AU - Li, Junan
AU - Li, Jinchao
AU - Wang, Yi
N1 - Author affiliations: 1: Dept. of Systems Engineering & Engineering Management, 2: Stanley Ho Big Data Decision Analytics Research Centre, 3: Centre for Perceptual & Interactive Intelligence, 4: Dept. of Computer Science & Engineering, 5: Dept. of Electronic & Information Engineering, 6: Dept. of Psychology, 7: Dept. of Medicine & Therapeutics, 8: Jockey Club Centre for Osteoporosis Care & Control, 9: Jockey Club Institute of Aging, 10: Division of Neurology, Dept. of Medicine & Therapeutics, 11: Margaret K. L. Cheung Research Centre for Management of Parkinsonism, 12: Li Ka Shing Institute of Health Sciences, 13: Gerald Choa Neuroscience Institute, 14: Dept. of Linguistics & Modern Languages, 15: Brain & Mind Institute, 16: School of Biomedical Engineering.
Funding Information: This project is partially supported by the HKSAR Research Grants Council (Project No. T45-407/19N).
Publisher Copyright:
© 2023 International Speech Communication Association. All rights reserved.
PY - 2023/8
Y1 - 2023/8
N2 - This paper presents an enhanced pipeline system for automated screening of neurocognitive disorders, e.g. Alzheimer's Disease (AD), using spoken language technologies. To ensure local relevance, the pipeline is applied to two-way interactions between clinical assessors and older adult participants in spoken Cantonese, the predominant language used in Hong Kong. The pipeline includes: (i) Speaker diarization using speaker-turn-aware scoring to capture the temporal structure of conversations. (ii) ASR using XLS-R wav2vec 2.0 models further pre-trained on Cantonese speech data and fine-tuned. (iii) Language modelling using RoBERTa with further fine-tuning. (iv) AD screening with neural network classification. A reference benchmark is obtained using the ADReSS corpus where no diarization is needed, and the partial pipeline attained a competitive detection accuracy of 87.5%.
KW - dementia
KW - diarization
KW - NCD detection
KW - neurocognitive disorder
KW - speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85171578221&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2023-2249
DO - 10.21437/Interspeech.2023-2249
M3 - Conference article
AN - SCOPUS:85171578221
SN - 2308-457X
VL - 2023-August
SP - 1713
EP - 1717
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 24th Annual Conference of the International Speech Communication Association, Interspeech 2023
Y2 - 20 August 2023 through 24 August 2023
ER -