ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models

  • Martina Miliani
  • Serena Auriemma
  • Alessandro Bondielli
  • Emmanuele Chersoni
  • Lucia Passaro
  • Irene Sucameli
  • Alessandro Lenci

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

Large Language Models (LLMs) are increasingly used in tasks requiring interpretive and inferential accuracy. In this paper, we introduce ExpliCa, a new dataset for evaluating LLMs in explicit causal reasoning. ExpliCa uniquely integrates both causal and temporal relations presented in different linguistic orders and explicitly expressed by linguistic connectives. The dataset is enriched with crowdsourced human acceptability ratings. We tested LLMs on ExpliCa through prompting and perplexity-based metrics. We assessed seven commercial and open-source LLMs, revealing that even top models struggle to reach 0.80 accuracy. Interestingly, models tend to confound temporal relations with causal ones, and their performance is also strongly influenced by the linguistic order of the events. Finally, perplexity-based scores and prompting performance are differently affected by model size.
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics: ACL 2025
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Publisher: Association for Computational Linguistics
Pages: 17335–17355
ISBN (Electronic): 9798891762565
DOIs
Publication status: Published - Jul 2025
Event: The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) - Vienna, Austria
Duration: 27 Jul 2025 – 1 Aug 2025

Conference

Conference: The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)
Country/Territory: Austria
City: Vienna
Period: 27/07/25 – 1/08/25
