Comparing Static and Contextual Distributional Semantic Models on Intrinsic Tasks: An Evaluation on Mandarin Chinese Datasets

Pranav A, Yan Cong, Emmanuele Chersoni, Yu Yin Hsu, Alessandro Lenci

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

The field of Distributional Semantics has recently undergone important changes, with the contextual representations produced by Transformers taking the place of static word embedding models. Notably, previous studies comparing the two types of vectors have focused only on the English language and a limited number of models. In our study, we present a comparative evaluation of static and contextualized distributional models for Mandarin Chinese, focusing on a range of intrinsic tasks. Our results reveal that static models remain stronger for some of the classical tasks that consider word meaning independently of context, while contextualized models excel at identifying semantic relations between word pairs and at categorizing words into abstract semantic classes. The code and datasets are available at https://github.com/pranav-ust/chinese-dsm.
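To illustrate the kind of intrinsic comparison the abstract describes, the following is a minimal sketch (not the authors' code) of scoring a Mandarin word pair with a static embedding model versus a contextual Transformer model. The static-vector file path is a placeholder, and the choice of "bert-base-chinese" as the contextual model is an assumption for illustration only; see the linked repository for the actual setup.

```python
# Hedged sketch: compare word-pair similarity under a static vs. a contextual model.
# "zh_static_vectors.txt" is a hypothetical placeholder file; "bert-base-chinese"
# is an illustrative checkpoint choice, not necessarily the one used in the paper.
import numpy as np
import torch
from gensim.models import KeyedVectors
from transformers import AutoTokenizer, AutoModel

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# --- Static model: look up fixed, context-independent word vectors ---
static_vectors = KeyedVectors.load_word2vec_format("zh_static_vectors.txt")
static_score = cosine(static_vectors["医生"], static_vectors["护士"])

# --- Contextual model: mean-pool last-layer subword representations ---
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")

def contextual_vector(word: str) -> np.ndarray:
    """Encode the word in isolation and average its subword token vectors."""
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, dim)
    # Drop the [CLS] and [SEP] positions before pooling.
    return hidden[0, 1:-1].mean(dim=0).numpy()

contextual_score = cosine(contextual_vector("医生"), contextual_vector("护士"))
print(f"static: {static_score:.3f}  contextual: {contextual_score:.3f}")
```

The pair similarity scores produced this way can then be correlated with human ratings, which is the usual protocol for intrinsic word-similarity evaluation.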

Original language: English
Title of host publication: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Publisher: European Language Resources Association (ELRA)
Pages: 3610-3627
Number of pages: 18
ISBN (Electronic): 9782493814104
Publication status: Published - May 2024
Event: Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 - Hybrid, Torino, Italy
Duration: 20 May 2024 - 25 May 2024

Publication series

Name: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings

Conference

Conference: Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
Country/Territory: Italy
City: Hybrid, Torino
Period: 20/05/24 - 25/05/24

Keywords

  • Distributional Semantic Models
  • Mandarin Chinese
  • Semantic Similarity
  • Transformers

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computational Theory and Mathematics
  • Computer Science Applications
