Abstract
With the advancement of large language models (LLMs) such as ChatGPT (OpenAI Team, 2022), artificial intelligence is set to drastically impact language education (Kasneci et al., 2023). LLMs can give second language learners unlimited feedback and one-on-one conversational interaction, allowing universal access to personalized language teaching. However, for LLMs to be suitable language instructors, they need accurate knowledge of linguistics and of language-specific content. Because LLMs are both new and varied, there is little research detailing how accurate widely used LLMs are in this domain. One study found that ChatGPT correctly answered questions on English ambiguity detection only 61% of the time (Ortega-Martín et al., 2023), suggesting that at least some LLMs may not be effective language learning tools.
To examine how effective LLMs are at language teaching, the current study tested the accuracy of a variety of popular LLMs on questions about the English language across a range of domains, including phonology, vocabulary, syntax, and pragmatics. For example, the LLMs were asked, “What does ‘it’ refer to in the following sentence: The restaurant was a bit pricey, but he still went to it often?” The LLMs were very accurate on questions about syntax and vocabulary and less accurate on phonology and pragmatics. Accuracy also varied considerably between LLMs. Overall, LLMs have the potential to be effective language instructors, but additional fine-tuning (i.e., supplying LLMs with additional customized data) is needed for them to be consistently accurate across all facets of language education.
| Original language | English |
|---|---|
| Publication status | Not published / presented only - Jul 2025 |
| Event | International Summit on the Use of AI in Learning and Teaching Languages and Other Subjects - PolyU, Hong Kong. Duration: 4 Jul 2025 → 7 Jul 2025 |
Conference
| Conference | International Summit on the Use of AI in Learning and Teaching Languages and Other Subjects |
|---|---|
| Country/Territory | Hong Kong |
| Period | 4/07/25 → 7/07/25 |