Abstract
In this study, we propose two corpus-driven linguistic approaches to sense prediction: the character similarity clustering approach and the concept similarity clustering approach. Both will be used to predict the senses of non-assigned words with corpora and tools such as the Chinese Gigaword Corpus and HowNet. We will evaluate the sense predictions against the sense divisions of Chinese Wordnet (CWN) and Xiandai Hanyu Cidian (Xian Han). Using these resources, we will determine the clusters of our four target words (chi1 "eat", wan2 "play", huan4 "change", and shao1 "burn") in order to predict all possible senses and then evaluate them. This process will demonstrate the viability of the corpus-based approaches.
Original language | English |
---|---|
Pages (from-to) | 229-241 |
Number of pages | 13 |
Journal | International Journal of Computer Processing of Languages |
Volume | 23 |
Issue number | 3 |
Publication status | Published - 2011 |
Keywords
- Lexical ambiguity
- Sense prediction
- Corpus-based approach
- Character similarity clustering approach
- Concept similarity clustering approach
- Evaluation