Sinica BOW: Integrating bilingual WordNet and SUMO Ontology

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

5 Citations (Scopus)

Abstract

Starting Point: The Lexicon

Lexicons can perform the bridging function between documents and conceptual categorisation (Calzolari, this panel). This position is motivated by both language engineering concerns and psychological felicity. Language engineering adopts linguistic models where a word is the unified atom for both form and meaning. Psycho- and neuro-linguistics, on the other hand, assume the paradigm where word forms are concrete units that are manipulated for conceptual access. In addition, when the issues and needs of multilinguality are taken into consideration, it becomes obvious that the lexicon is the only level where generalizations as well as variations across different languages can be captured efficiently and comprehensively. In this talk, we report our preliminary work on integrating a lexical structure such that the linguistic-to-conceptual representation and language-to-language gaps can be bridged simultaneously.

Sinica BOW

The Sinica BOW (Academia Sinica Bilingual Ontological Wordnet) is intended as a linguistic infrastructure for knowledge representation and knowledge engineering. It is built upon the relation-based structure of WordNet. On the one hand, a bilingual wordnet is constructed with the crucial design feature of treating bilingual translation correspondences as lexical semantic relations [1]. On the other hand, SUMO (Suggested Upper Merged Ontology) is adopted as the shared system of conceptual categorization [2]. SUMO is also one of the first conceptual categorization systems to be mapped to an English lexicon [3]. Since SUMO is mapped to WordNet 1.6, the English WordNet has become the cornerstone for linking across languages and between a language and its conceptual system. In addition, domain tags are assigned to lemmas when necessary in order to ensure domain inter-operability.
By combining ontology and wordnet, we hope that Sinica BOW will 1) give each linguistic form a rigorous conceptual location, 2) clarify the relation between conceptual classification and linguistic instantiation, and 3) facilitate genuine cross-lingual access to knowledge. The Sinica BOW allows lexical searches in either language to return ontological information (in either language). Searches on Sinica BOW can return the following information: sense-based English-Chinese translation equivalency, English word-sense-based ontology and inference, Chinese word-based ontology and inference, and word-sense-based domain specification (under construction). In addition to the integration of wordnet and ontology, it is also an important goal of Sinica BOW to integrate lexical resources. Sinica BOW's design is lemma-driven. A lexical database of word forms is first compiled by integrating multiple lexical resources. This becomes the central database for lexical management for Sinica BOW. Making use of this lexical database, a lexical search may link to either the main BOW knowledge base or any of the corresponding entries in an online lexicon.

The Multilingual and Cross-Domain Properties of (Semantic) Relations

In addition to relying on lemmas as retrieval keys, a crucial step in establishing synergy between language and knowledge resources is to identify the conceptual atoms that apply equally effectively to knowledge and language resources. Lexical semantic relations are exactly such a set of atoms. Sinica BOW implements this idea by encoding the lexical semantic relations between English-Chinese translation equivalent pairs. In addition to describing the relationship between two translation equivalents more precisely, this also allows better cross-lingual inferences. Explicitly allowing lexical semantic relations to be coded cross-lingually will also facilitate transferring a structured set of tree relations from one language to the other.
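The lemma-driven design described in the abstract can be illustrated with a minimal Python sketch. It is not the Sinica BOW implementation: the class names (`Sense`, `Translation`, `ToyBow`), the sense identifiers, the SUMO labels, and the sample entries are all hypothetical placeholders chosen for illustration. The sketch shows only the idea that each translation pair carries a lexical semantic relation, and that a lemma in either language retrieves the same sense record with its ontology node.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Translation:
    chinese_lemma: str
    # Lexical semantic relation of the Chinese lemma to the English sense
    # (e.g. "synonym", "hyponym"); the label set here is an assumption.
    relation: str


@dataclass
class Sense:
    sense_id: str            # hypothetical identifier, not a real WordNet key
    english_lemmas: tuple
    sumo_concept: str        # placeholder label, not taken from the real mapping
    translations: list = field(default_factory=list)


class ToyBow:
    """Toy lemma index: one table keyed by word form in either language."""

    def __init__(self, senses):
        self._by_lemma = defaultdict(list)
        for sense in senses:
            for lemma in sense.english_lemmas:
                self._by_lemma[lemma].append(sense)
            for tr in sense.translations:
                self._by_lemma[tr.chinese_lemma].append(sense)

    def lookup(self, lemma):
        """Return one record per sense reachable from the given lemma."""
        return [
            {
                "sense": s.sense_id,
                "sumo": s.sumo_concept,
                "english": list(s.english_lemmas),
                "chinese": [(t.chinese_lemma, t.relation) for t in s.translations],
            }
            for s in self._by_lemma.get(lemma, [])
        ]


# Illustrative entries only; relations and concept labels are invented.
SENSES = [
    Sense("dog.n.1", ("dog",), "PLACEHOLDER_Canine",
          [Translation("狗", "synonym")]),
    Sense("brother.n.1", ("brother",), "PLACEHOLDER_KinConcept",
          [Translation("哥哥", "hyponym"), Translation("弟弟", "hyponym")]),
]

bow = ToyBow(SENSES)
```

Under these assumptions, `bow.lookup("哥哥")` and `bow.lookup("brother")` both reach the same sense record. The record marks 哥哥 as a hyponym of the English sense, not as an exact equivalent. This is the kind of relation-labelled correspondence the abstract describes.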
Original language: English
Title of host publication: NLP-KE 2003 - 2003 International Conference on Natural Language Processing and Knowledge Engineering, Proceedings
Publisher: IEEE
Pages: 825-826
Number of pages: 2
ISBN (Electronic): 0780379020, 9780780379022
DOIs
Publication status: Published - 1 Jan 2003
Externally published: Yes
Event: International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2003 - Beijing Media Center, Beijing, China
Duration: 26 Oct 2003 to 29 Oct 2003

Conference

Conference: International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2003
Country/Territory: China
City: Beijing
Period: 26/10/03 to 29/10/03

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Software
