Abstract
This paper introduces a new technology for collocation extraction in Chinese. Sketch Engine (Kilgarriff et al., 2004) has proven to be a very effective tool for the automatic description of lexical information, including collocation extraction, based on large-scale corpora. The original Sketch Engine work was based on the BNC. We extend Sketch Engine to Chinese using the Gigaword corpus from the LDC. We discuss the available functions of the prototype Chinese Sketch Engine (CSE) as well as the robustness of the language-independent adaptation of Sketch Engine. We conclude by discussing how Chinese-specific linguistic information can be incorporated to improve the CSE prototype.
Original language | English |
---|---|
Title of host publication | 4th SIGHAN Workshop on Chinese Language Processing, Proceedings of the Workshop |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 48-55 |
Number of pages | 8 |
Publication status | Published - 2005 |
Externally published | Yes |
Event | 4th SIGHAN Workshop on Chinese Language Processing at the 2nd International Joint Conference on Natural Language Processing, SIGHAN@IJCNLP 2005 - Jeju Island, Korea, Republic of. Duration: 14 Oct 2005 → 15 Oct 2005 |
Conference
Conference | 4th SIGHAN Workshop on Chinese Language Processing at the 2nd International Joint Conference on Natural Language Processing, SIGHAN@IJCNLP 2005 |
---|---|
Country/Territory | Korea, Republic of |
City | Jeju Island |
Period | 14/10/05 → 15/10/05 |
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language