Using corpus-based linguistic approaches in sense prediction study

Jia Fei Hong, Sue Jin Ker, Chu-ren Huang, Kathleen Virginia Ahrens

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

Abstract

In this study, we propose two corpus-based linguistic approaches to sense prediction. We concentrate on the character similarity clustering approach and the concept similarity clustering approach to predict the senses of non-assigned words, using corpora and tools such as the Chinese Gigaword Corpus and HowNet. We then evaluate the predictions against the sense divisions of Chinese Wordnet and Xiandai Hanyu Cidian. Using these resources, we determine the clusters of our four target words, chi1 "eat", wan2 "play", huan4 "change" and shao1 "burn", in order to predict all of their possible senses and evaluate them. This evaluation will demonstrate the viability of the corpus-based approaches.
Original language: English
Title of host publication: PACLIC 24 - Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation
Pages: 399-407
Number of pages: 9
Publication status: Published - 1 Dec 2010
Event: 24th Pacific Asia Conference on Language, Information and Computation, PACLIC 24 - Sendai, Japan
Duration: 4 Nov 2010 to 7 Nov 2010

Conference

Conference: 24th Pacific Asia Conference on Language, Information and Computation, PACLIC 24
Country/Territory: Japan
City: Sendai
Period: 4/11/10 to 7/11/10

Keywords

  • Character similarity clustering
  • Concept similarity clustering
  • Corpus-based approach
  • Evaluation
  • Lexical ambiguity
  • Sense prediction

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)
