Abstract
An open question in language comprehension studies is whether non-compositional multiword expressions like idioms and compositional-but-frequent word sequences are processed differently. Are the latter constructed online, or are they instead retrieved directly from the lexicon, with a degree of entrenchment that depends on their frequency? In this paper, we address this question with two different methodologies. First, we set up a self-paced reading experiment comparing human reading times for idioms and for both high-frequency and low-frequency compositional word sequences. Then, we ran the same experiment using Surprisal metrics computed with Neural Language Models (NLMs). Our results provide evidence that idiomatic and high-frequency compositional expressions are processed similarly by both humans and NLMs. Additional experiments were run to test the factors that could affect the NLMs’ performance.
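For readers unfamiliar with the Surprisal metric mentioned in the abstract, the following is a minimal sketch of its standard definition: the negative log probability of a word given its preceding context. The probabilities below are hypothetical illustrative values, not results from the paper, and summing over the words of an expression is an assumption here; the paper may aggregate differently. The function names are likewise illustrative.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits of one word: -log2 P(word | context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def sequence_surprisal(probs) -> float:
    """Total surprisal of an expression (assumed aggregation: sum over words)."""
    return sum(surprisal(p) for p in probs)


# Hypothetical per-word conditional probabilities, as a language model
# might assign them; these are NOT values reported in the paper.
idiom_probs = [0.5, 0.25, 0.5]          # highly predictable continuation
low_freq_probs = [0.5, 0.125, 0.0625]   # less predictable continuation

print(sequence_surprisal(idiom_probs))
print(sequence_surprisal(low_freq_probs))
```

Under this definition, a more predictable sequence receives lower total surprisal, which is the quantity compared against human reading times.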
Original language | English |
---|---|
Title of host publication | Proceedings of the EACL Workshop on Multiword Expressions (MWE 2023) |
Editors | Archna Bhatia, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Shiva Taslimipoor |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 87–98 |
ISBN (Print) | 978-1-959429-59-3 |
Publication status | Published - May 2023 |
Event | Workshop on Multiword Expressions - Valamar Lacroma Dubrovnik Hotel, Dubrovnik, Croatia. Duration: 6 May 2023 → 6 May 2023 |
Conference
Conference | Workshop on Multiword Expressions |
---|---|
Abbreviated title | MWE 2023 |
Country/Territory | Croatia |
City | Dubrovnik |
Period | 6/05/23 → 6/05/23 |