Abstract
Chinese WordNet (CWN) is designed to express word senses and the relations between them precisely within a complete knowledge system, while also supporting language technology applications. Distinguishing Chinese word senses and representing their semantic relations precisely must rest on linguistic theory, especially lexical semantics, whereas the content of senses and relations must be discovered and verified from actual corpus data. Our method therefore combines linguistic analysis with corpus evidence: beyond verification and exemplification, senses are tagged in parallel over a large corpus, and the tagging results are fed back to re-examine the accuracy of the analysis. All word sense examples and lexical semantic relations in CWN are attested with corpus data. A complete and robust knowledge system must balance the formal integrity of an ontology with the complete knowledge internal to the human language system. We adopt the Suggested Upper Merged Ontology (SUMO) to provide this formal representation of knowledge.
Original language | Chinese (Simplified) |
---|---|
Pages (from-to) | 14-23 |
Number of pages | 10 |
Journal | 中文信息学报 (Journal of Chinese Information Processing) |
Volume | 24 |
Issue number | 2 |
Publication status | Published - Mar 2010 |
Keywords
- Computer application
- Chinese information processing
- Chinese WordNet
- Global WordNet Grid
- Ontology
- Multi-language processing
- Cross-lingual integration