Age-Invariant Speaker Embedding for Diarization of Cognitive Assessments

Sean Shensheng Xu, Man Wai Mak, Ka Ho Wong, Helen Meng, Timothy C.Y. Kwok

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

5 Citations (Scopus)

Abstract

This paper investigates an age-invariant speaker embedding approach to speaker diarization, an essential step towards automatic cognitive assessment from speech. Studies have shown that incorporating speaker traits (e.g., age and gender) can improve speaker diarization performance. However, we found that age information in the speaker embeddings is detrimental to speaker diarization when there is a severe mismatch between the age distributions of the training data and the test data. To minimize this detrimental effect, an adversarial training strategy is introduced to remove age variability from the utterance-level speaker embeddings. Evaluations on an interactive dialog dataset for the Montreal Cognitive Assessment (MoCA) show that the adversarial training strategy can produce age-invariant embeddings and reduce the diarization error rate (DER) by 4.33%. The approach also outperforms the conventional method even with less training data.
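
For readers unfamiliar with the DER metric quoted in the abstract, the following is a minimal sketch of how a diarization error rate is commonly defined: missed speech, false-alarm speech, and speaker confusion, summed and divided by the total reference speech. It is not the authors' scoring setup, which this record does not describe. The function name `frame_der`, the frame-level label format (one speaker label per frame, `None` for non-speech), and the brute-force speaker mapping are all assumptions made for illustration; standard scoring tools work on timed segments and usually handle overlapping speech and forgiveness collars, which this sketch ignores.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER sketch (hypothetical helper, not the paper's scorer).

    Each argument is a list with one entry per frame: a speaker label
    (string) or None for non-speech. At most one speaker per frame.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    pairs = list(zip(reference, hypothesis))
    total_speech = sum(r is not None for r, _ in pairs)
    if total_speech == 0:
        raise ValueError("reference contains no speech frames")

    # Speech in the reference that the system labelled as non-speech.
    missed = sum(r is not None and h is None for r, h in pairs)
    # Non-speech in the reference that the system labelled as speech.
    false_alarm = sum(r is None and h is not None for r, h in pairs)

    # Frames where both sides contain speech: candidates for confusion.
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})

    # Hypothesis labels are arbitrary, so search for the one-to-one mapping
    # onto reference speakers that maximises agreement. Brute force is only
    # practical for a handful of speakers. Padding with None lets surplus
    # hypothesis speakers stay unmatched.
    padded = ref_speakers + [None] * max(0, len(hyp_speakers) - len(ref_speakers))
    best_correct = 0
    for perm in permutations(padded, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_speech


if __name__ == "__main__":
    ref = ["A", "A", "A", "B", "B", None, None, "B"]
    hyp = ["x", "x", "y", "y", "y", None, "y", None]
    print(f"DER = {frame_der(ref, hyp):.2%}")
```

In the toy example, one frame is missed, one is a false alarm, and one is attributed to the wrong speaker, against six frames of reference speech.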

Original language: English
Title of host publication: 2021 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021
Place of Publication: Hong Kong
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728169941
Publication status: Published - 24 Jan 2021
Event: 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021 - Hong Kong, Hong Kong
Duration: 24 Jan 2021 to 27 Jan 2021

Publication series

Name: 2021 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021

Conference

Conference: 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021
Country/Territory: Hong Kong
City: Hong Kong
Period: 24/01/21 to 27/01/21

Keywords

  • age-invariant speaker embedding
  • deep neural networks
  • Montreal cognitive assessments
  • speaker diarization

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Linguistics and Language
