Abstract
Both humans and large language models (LLMs) are sensitive to the dialectal background of their interlocutor in language comprehension (e.g., Cai et al., 2017, 2023). For example, when hearing or reading cross-dialectally ambiguous words (e.g., flat) produced by an American English (AE) or British English (BE) interlocutor, both people and LLMs tend to access the dialect-specific meaning (i.e., level for AE interlocutors vs. an apartment for BE interlocutors). However, the dialectal background of an interlocutor can be inferred not only from their accent but also from their lexical use, spelling, or cultural references. In this study, we test whether humans and LLMs are also sensitive to non-spoken dialectal cues in word meaning access.
In Experiment 1 (with 36 items), 42 human participants played a dialogue game with an interlocutor (in reality, a set of pre-scripted responses), where the interlocutor typed in a word according to a given definition and the participants typed the first word they thought of upon reading their partner’s word. In target trials, the interlocutor typed a cross-dialectally ambiguous word (e.g., flat), and we examined the participant’s associate (e.g., level vs. housing for AE and BE respectively) to determine whether participants accessed the meaning appropriate to the interlocutor’s supposed dialect. The interlocutor’s dialectal background (AE vs. BE) was either explicitly mentioned to the participant (e.g., I’m from America/England; used as a baseline), implied via culture-specific references in the self-introduction (e.g., I like to watch Saturday Night Live/EastEnders), implied via lexical use in the typed words (e.g., vacation/holiday), or implied via spellings in the typed words (e.g., theater/theatre). In none of the conditions did we observe a tendency for participants to access word meanings appropriate to the interlocutor’s explicitly mentioned or implied dialectal background (see Fig. 1). These results suggest that human participants were not sensitive to these non-spoken dialectal cues in word meaning access.
Experiment 2 was similar to Experiment 1, except that we tested ChatGPT as the participant. A Python script simulated a human interlocutor doing the experimental task with ChatGPT for 1,000 runs (i.e., equivalent to 1,000 participants), with an introduction given to ChatGPT that varied either the cultural references, lexical items, or spelling of AE or BE interlocutors. The results indicate that ChatGPT is sensitive to the dialectal background of its interlocutors in meaning access (see Fig. 2), producing more AE associates when interacting with AE interlocutors (50.7%) than with BE interlocutors (34.8%); this effect was present in all conditions (ps < .001).
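The simulation loop described for Experiment 2 might look roughly like the sketch below. It is an illustration under stated assumptions, not the script used in the study: the prompt wording, the names `INTRODUCTIONS`, `ASSOCIATE_CODING`, `build_prompt`, `run_condition`, and `ae_proportion`, and the `ask_model` callable standing in for a chat-model call are all hypothetical. Only the cue examples (Saturday Night Live/EastEnders) and the associates for flat (level vs. housing) come from the abstract.

```python
from collections import Counter
from typing import Callable

# Hypothetical self-introductions for the culture-cue condition; the show
# names are the examples quoted in the abstract, the wording is assumed.
INTRODUCTIONS = {
    "AE": "I like to watch Saturday Night Live.",
    "BE": "I like to watch EastEnders.",
}

# Hypothetical coding scheme mapping an associate to the dialect-specific
# meaning it reflects, using the abstract's example for "flat".
ASSOCIATE_CODING = {
    "flat": {"level": "AE", "housing": "BE"},
}


def build_prompt(dialect: str, word: str) -> str:
    """Compose one trial: the interlocutor's introduction plus the typed word."""
    intro = INTRODUCTIONS[dialect]
    return f"{intro} My word is: {word}. Reply with the first word you think of."


def run_condition(
    ask_model: Callable[[str], str], dialect: str, word: str, n_runs: int
) -> Counter:
    """Run n_runs independent trials and tally how each associate is coded."""
    counts: Counter = Counter()
    for _ in range(n_runs):
        reply = ask_model(build_prompt(dialect, word)).strip().lower()
        # Associates outside the coding scheme are counted as "other".
        counts[ASSOCIATE_CODING[word].get(reply, "other")] += 1
    return counts


def ae_proportion(counts: Counter) -> float:
    """Share of all responses coded as reflecting the AE meaning."""
    total = sum(counts.values())
    return counts["AE"] / total if total else 0.0
```

In this sketch, `ask_model` would wrap whichever chat client is used, so no particular API is assumed; comparing `ae_proportion` across the AE and BE conditions gives the kind of contrast reported above.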
Overall, the results of Experiments 1 and 2 indicate that humans and LLMs are not fully equivalent linguistic comprehenders. LLMs are more sensitive to, and make use of, subtle textual cues to implied dialectal background in word meaning access. Therefore, while LLMs have lesser linguistic abilities than humans in certain areas (e.g., Dentella et al., 2023), LLMs seem to be better than humans at modelling certain characteristics of interlocutors and/or applying these interlocutor models in language comprehension.
| Original language | English |
|---|---|
| Publication status | Not published / presented only - Dec 2024 |
| Event | Architectures and Mechanisms for Language Processing Asia 2024, Singapore. Duration: 5 Dec 2024 → 7 Dec 2024 |
Conference
| Conference | Architectures and Mechanisms for Language Processing Asia 2024 |
|---|---|
| Abbreviated title | AMLaP Asia 2024 |
| Country/Territory | Singapore |
| Period | 5/12/24 → 7/12/24 |