Linking basic lexicon to shared ontology for endangered languages: A linked data approach toward Formosan languages

Chu Ren Huang, Shu Kai Hsieh, Laurent Prévot, Pei Yi Hsiao, Henry Y. Chang

Research output: Journal article publication, Journal article, Academic research, peer-review

4 Citations (Scopus)

Abstract

This paper proposes an innovative approach to link basic lexicon (e.g. Swadesh list) to upper ontology as the foundation of OntoLex interface to address the challenge of building language resources for endangered languages in the linked data paradigm. A linked data approach to language resources requires existing, and preferably sizable, language resources. For endangered and other less-resourced languages, however, the scarcity of existing resources limits the possibilities and potential benefits of linking. The challenges are then: how can construction of language resources for endangered languages continue to thrive in the linked data paradigm, and how can the linked data approach benefit language resources for endangered languages? Our proposal requires the bare minimum of available data, and we show with examples from Formosan languages (Austronesian or aboriginal languages of Taiwan (Blust 2013, 20)) that 1) this approach is applicable to endangered languages, and that 2) in spite of the restrictions imposed by scarcity of resources, the linked linguistic data consisting of basic lexicon + upper ontology generate important new information. Comparing Swadesh lists from different languages allowed us to build a small shared ontology that reflects direct human experience and can serve as the cross-lingual conceptual core. In addition, these micro-ontologized lexicons can be used as seeds for developing a fully-grown and more comprehensive documentation of linguistically motivated ontology for each language.
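
As a rough illustration of the linking idea described in the abstract, the following Python sketch maps basic-lexicon glosses to ontology concepts and then intersects the concepts across languages to obtain a small shared core. It is not the authors' implementation: the language names, word forms, and concept labels are all hypothetical placeholders, and the concept labels have not been checked against SUMO or any other ontology release.

```python
# Hypothetical gloss-to-concept mapping. The concept labels are illustrative
# only and are not verified against SUMO or any other ontology.
GLOSS_TO_CONCEPT = {
    "water": "Water",
    "fire": "Fire",
    "eye": "Eye",
}

# Placeholder lexicons: language names and word forms are invented stand-ins,
# not real Formosan data.
LEXICONS = {
    "lang_a": {"water": "a_form_1", "fire": "a_form_2", "eye": "a_form_3"},
    "lang_b": {"water": "b_form_1", "eye": "b_form_2"},
}


def link_lexicon(language, lexicon):
    """Return (language, form, concept) links for glosses with a known concept."""
    return [
        (language, form, GLOSS_TO_CONCEPT[gloss])
        for gloss, form in sorted(lexicon.items())
        if gloss in GLOSS_TO_CONCEPT
    ]


def shared_core(lexicons):
    """Return the concepts that are lexicalized in every lexicon."""
    concept_sets = [
        {concept for _, _, concept in link_lexicon(lang, lex)}
        for lang, lex in lexicons.items()
    ]
    return set.intersection(*concept_sets) if concept_sets else set()


# All word-to-concept links, plus the cross-lingual conceptual core.
links = [link for lang, lex in LEXICONS.items() for link in link_lexicon(lang, lex)]
core = shared_core(LEXICONS)
```

In this toy setup, a concept enters the shared core only if every lexicon has a word for it, which is one simple way to read "cross-lingual conceptual core"; a real resource would more likely publish the links as RDF rather than Python tuples.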

Original language: English
Pages (from-to): 227-268
Number of pages: 42
Journal: Journal of Chinese Linguistics
Volume: 46
Issue number: 2
Publication status: Published - 1 Jun 2018

Keywords

  • Endangered languages
  • Formosan languages (Austronesian languages in Taiwan)
  • Linked Data
  • Swadesh list
  • Ontology
  • SUMO

ASJC Scopus subject areas

  • Arts and Humanities (miscellaneous)
  • Linguistics and Language
