Abstract
In this study, we propose using two corpus-based linguistic approaches for sense prediction. We will concentrate on the character similarity clustering approach and the concept similarity clustering approach to predict the senses of non-assigned words, using corpora and tools such as the Chinese Gigaword Corpus and HowNet. We will then evaluate their predictions against the sense divisions of Chinese Wordnet and Xiandai Hanyu Cidian. Using these resources, we will determine the clusters of our four target words, chi1 "eat", wan2 "play", huan4 "change" and shao1 "burn", in order to predict all of their possible senses and evaluate the predictions. This evaluation will demonstrate the viability of the corpus-based approaches.
Original language | English |
---|---|
Title of host publication | PACLIC 24 - Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation |
Pages | 399-407 |
Number of pages | 9 |
Publication status | Published - 1 Dec 2010 |
Event | 24th Pacific Asia Conference on Language, Information and Computation, PACLIC 24 - Sendai, Japan (Duration: 4 Nov 2010 → 7 Nov 2010) |
Conference
Conference | 24th Pacific Asia Conference on Language, Information and Computation, PACLIC 24 |
---|---|
Country/Territory | Japan |
City | Sendai |
Period | 4/11/10 → 7/11/10 |
Keywords
- Character similarity clustering
- Concept similarity clustering
- Corpus-based approach
- Evaluation
- Lexical ambiguity
- Sense prediction
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science (miscellaneous)