Predicting gender and age categories in English conversations using lexical, non-lexical, and turn-taking features

Andreas Maria Liesenfeld, Gabor Parti, Yu-yin Hsu, Chu-ren Huang

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

Abstract

This paper examines gender and age salience and (stereo)typicality in British English talk with the aim of predicting gender and age categories based on lexical, phrasal and turn-taking features. We examine the SpokenBNC, a corpus of around 11.4 million words of British English conversations, and identify behavioural differences between speakers that are labelled for gender and age categories. We explore differences in language use and turn-taking dynamics and identify a range of characteristics that set the categories apart. We find that female speakers tend to produce more and slightly longer turns, while turns by male speakers feature a higher type-token ratio and a distinct range of minimal particles such as “eh”, “uh” and “em”. Across age groups, we observe, for instance, that swear words and laughter characterize young speakers’ talk, while old speakers tend to produce more truncated words. We then use the observed characteristics to predict gender and age labels of speakers per conversation and per turn as a classification task, showing that non-lexical utterances such as minimal particles, which are usually left out of dialog data, can contribute to setting the categories apart.
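
The per-speaker measures named in the abstract (number of turns, turn length, type-token ratio, minimal particles) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `tokenize` and `speaker_features`, the whitespace tokenization, and the `(speaker, text)` input format are all assumptions made for illustration, and the particle set contains only the three examples the abstract mentions.

```python
import string
from collections import defaultdict

# Only the three example particles named in the abstract; the paper's
# actual inventory is not given here, so this set is illustrative.
PARTICLES = {"eh", "uh", "em"}


def tokenize(text):
    """Assumed tokenizer: lowercase, split on whitespace, strip punctuation."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in words if w]


def speaker_features(turns):
    """Compute simple per-speaker features from (speaker, text) turns."""
    tokens_by_speaker = defaultdict(list)
    turn_counts = defaultdict(int)
    for speaker, text in turns:
        tokens_by_speaker[speaker].extend(tokenize(text))
        turn_counts[speaker] += 1

    features = {}
    for speaker, tokens in tokens_by_speaker.items():
        n = len(tokens)
        features[speaker] = {
            "turns": turn_counts[speaker],
            # Mean number of tokens per turn.
            "mean_turn_length": n / turn_counts[speaker],
            # Type-token ratio: distinct word forms divided by total tokens.
            "type_token_ratio": len(set(tokens)) / n if n else 0.0,
            # Count of tokens that are minimal particles.
            "particles": sum(t in PARTICLES for t in tokens),
        }
    return features
```

Note that the type-token ratio falls as a sample grows, so comparing speakers with very different amounts of talk would need some length normalization; how the paper handles this is not stated in the abstract.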
Original language: English
Title of host publication: Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation
Editors: Minh Le Nguyen, Mai Chi Luong, Sanghoun Song
Publisher: Association for Computational Linguistics (ACL)
Pages: 157–166
Publication status: Published - Oct 2020
Event: The 34th Pacific Asia Conference on Language, Information and Computation (PACLIC-34) - Vietnam National University, Hanoi, Viet Nam
Duration: 24 Oct 2020 to 26 Oct 2020

Conference

Conference: The 34th Pacific Asia Conference on Language, Information and Computation (PACLIC-34)
Country/Territory: Viet Nam
City: Hanoi
Period: 24/10/20 to 26/10/20
