Mandarin Relata: A Dataset of Word Relations and Their Semantic Types

Hongchao Liu, Chu Ren Huang, Ren Kui Hou

Research output: Chapter in book / Conference proceeding, Conference article published in proceeding or book, Academic research, peer-review

Abstract

Training and evaluating distributional semantic models requires language datasets that both provide detailed word-level descriptors and are readily intuitive to human judgment. This paper introduces a Mandarin Chinese dataset built by combining word relation pairs from two distinct sources: corpus extraction and human elicitation. Our results show that although corpus extraction yielded more word relation pairs, human-elicited semantic neighbors were almost twice as likely to be judged acceptable by human raters. These methods produced 4,091 word relation pairs spanning hypernymy, hyponymy, synonymy, antonymy, and meronymy, annotated with semantic type information. To date, this is the largest collection of human-rated word relation pairs in Mandarin Chinese.
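To make the source comparison in the abstract concrete, the following minimal Python sketch shows one hypothetical way to store relation pairs together with their relation type, semantic type, and source, and to compute a per-source rater agreement rate. The record fields, the semantic type labels, the toy word pairs, and their rater judgments are all assumptions made for illustration; they are not taken from the Mandarin Relata data or from the paper's annotation scheme.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # The five relation types named in the abstract.
    HYPERNYMY = "hypernymy"
    HYPONYMY = "hyponymy"
    SYNONYMY = "synonymy"
    ANTONYMY = "antonymy"
    MERONYMY = "meronymy"


@dataclass(frozen=True)
class RelationPair:
    # Hypothetical record layout, not the dataset's actual schema.
    word: str
    relatum: str
    relation: Relation
    semantic_type: str   # hypothetical semantic type label
    source: str          # "corpus" or "elicited"
    rater_agrees: bool   # whether human raters accepted the pair


def agreement_by_source(pairs):
    """Return the fraction of rater-accepted pairs for each source."""
    totals = defaultdict(int)
    agreed = defaultdict(int)
    for pair in pairs:
        totals[pair.source] += 1
        agreed[pair.source] += int(pair.rater_agrees)
    return {source: agreed[source] / totals[source] for source in totals}


# Toy pairs and judgments, invented for illustration only.
pairs = [
    RelationPair("狗", "动物", Relation.HYPERNYMY, "animal", "elicited", True),
    RelationPair("大", "小", Relation.ANTONYMY, "attribute", "elicited", True),
    RelationPair("车", "轮子", Relation.MERONYMY, "artifact", "corpus", True),
    RelationPair("车", "路", Relation.MERONYMY, "artifact", "corpus", False),
]

rates = agreement_by_source(pairs)
print(rates)  # {'elicited': 1.0, 'corpus': 0.5}
```

Comparing the resulting rates across sources is the kind of comparison the abstract reports; the toy values here say nothing about the paper's actual figures.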

Original language: English
Title of host publication: Chinese Lexical Semantics - 18th Workshop, CLSW 2017, Revised Selected Papers
Editors: Yunfang Wu, Qi Su, Jia-Fei Hong
Publisher: Springer-Verlag
Pages: 336-340
Number of pages: 5
ISBN (Print): 9783319735726
DOIs
Publication status: Published, 1 Jan 2018
Event: 18th Chinese Lexical Semantics Workshop, CLSW 2017, Leshan, China
Duration: 18 May 2017 to 20 May 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10709 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th Chinese Lexical Semantics Workshop, CLSW 2017
Country/Territory: China
City: Leshan
Period: 18/05/17 to 20/05/17

Keywords

  • Dataset
  • DSM
  • Semantic types
  • Word relation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
