Are Frequent Phrases Directly Retrieved like Idioms? An Investigation with Self-paced Reading and Language Models

Giulia Rambelli, Emmanuele Chersoni, Marco Senaldi, Philippe Blache, Alessandro Lenci

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

Abstract

An open question in language comprehension studies is whether non-compositional multiword expressions like idioms and compositional-but-frequent word sequences are processed differently. Are the latter constructed online, or are they instead directly retrieved from the lexicon, with a degree of entrenchment depending on their frequency? In this paper, we address this question with two different methodologies. First, we set up a self-paced reading experiment comparing human reading times for idioms and for both high-frequency and low-frequency compositional word sequences. Then, we ran the same experiment using Surprisal metrics computed with Neural Language Models (NLMs). Our results provide evidence that idiomatic and high-frequency compositional expressions are processed similarly by both humans and NLMs. Additional experiments were run to test the possible factors that could affect the NLMs’ performance.
Original language: English
Title of host publication: Proceedings of the EACL Workshop on Multiword Expressions (MWE 2023)
Editors: Archna Bhatia, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Shiva Taslimipoor
Publisher: Association for Computational Linguistics (ACL)
Pages: 87–98
ISBN (Print): 978-1-959429-59-3
Publication status: Published - May 2023
Event: Workshop on Multiword Expressions - Valamar Lacroma Dubrovnik Hotel, Dubrovnik, Croatia
Duration: 6 May 2023 – 6 May 2023

Conference

Conference: Workshop on Multiword Expressions
Abbreviated title: MWE 2023
Country/Territory: Croatia
City: Dubrovnik
Period: 6/05/23 – 6/05/23
