A Grammar-informed Corpus-based Sentence database for linguistic and computational studies

Hongzhi Xu, Helen Kaiyun Chen, Chu Ren Huang, Qin Lu, Tin Shing Chiu, Dingxu Shi

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

1 Citation (Scopus)

Abstract

We adopt a corpus-informed approach to selecting example sentences for the construction of a reference grammar. In the process, we construct a database of sentences carefully selected by linguistic experts, covering the full range of linguistic facts described in an authoritative Chinese Reference Grammar and structured according to that grammar. A search engine is developed to help users find the most typical examples they need to study a linguistic problem or prove their hypotheses. The database can also be used by computational linguists as a training corpus for Chinese word segmentation, POS tagging, and sentence parsing models.
Original language: English
Title of host publication: Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012
Publisher: European Language Resources Association (ELRA)
Pages: 3140-3144
Number of pages: 5
ISBN (Electronic): 9782951740877
Publication status: Published - 1 Jan 2012
Event: 8th International Conference on Language Resources and Evaluation, LREC 2012 - Istanbul Lütfi Kırdar Convention and Exhibition Centre, Istanbul, Turkey
Duration: 21 May 2012 → 27 May 2012

Conference

Conference: 8th International Conference on Language Resources and Evaluation, LREC 2012
Country: Turkey
City: Istanbul
Period: 21/05/12 → 27/05/12

Keywords

  • Chinese Reference Grammar
  • Linguistic Study
  • Sentence Database

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Education
  • Library and Information Sciences
