PoS tagging for Classical Chinese text

Tin Shing Chiu, Qin Lu, Jian Xu, Dan Xiong, Fengju Lo

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

1 Citation (Scopus)

Abstract

The Chinese language has evolved over the centuries. To study changes in the Chinese language using computational methods, segmentation and PoS tagging of Chinese text are essential. However, segmentation and PoS tagging methods developed for Modern Standard Chinese do not perform well on Classical Chinese, and the cost of manual segmentation and annotation is high. In this work, we present a conditional random field (CRF) based method for PoS tagging of Classical Chinese text from the Ming and Qing dynasties. One of the key issues is the preparation of the training data for the CRF. Our initial experiment shows that PoS tagging based on Modern Standard Chinese text can achieve a precision of 83%, and that by adding as little as 12,000 words of annotated Classical Chinese text, we were able to improve the precision to over 90%.
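The training-data idea in the abstract can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that the annotated Classical Chinese sample is simply pooled with the Modern Standard Chinese corpus, that each token is described by a small context window of features, and that precision is measured per token. The function names, the feature set, the toy sentences, and the tag labels are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# A sentence is a list of (token, PoS tag) pairs. The tag names are placeholders.
Sentence = List[Tuple[str, str]]


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Context-window features for token i (an assumed, illustrative feature set)."""
    tok = tokens[i]
    return {
        "bias": "1",
        "tok": tok,
        "first_char": tok[0],
        "last_char": tok[-1],
        "length": str(len(tok)),
        "prev_tok": tokens[i - 1] if i > 0 else "<BOS>",
        "next_tok": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


def build_training_set(modern: List[Sentence], classical: List[Sentence]):
    """Pool the modern corpus with the small classical sample (assumed strategy)."""
    X, y = [], []
    for sent in modern + classical:
        tokens = [tok for tok, _ in sent]
        X.append([token_features(tokens, i) for i in range(len(tokens))])
        y.append([tag for _, tag in sent])
    return X, y


def tagging_precision(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    total = correct = 0
    for gold_sent, pred_sent in zip(gold, pred):
        for g, p in zip(gold_sent, pred_sent):
            total += 1
            correct += int(g == p)
    return correct / total if total else 0.0


# Toy data for illustration only.
modern_corpus = [[("我", "PRON"), ("喜歡", "VERB"), ("書", "NOUN")]]
classical_sample = [[("吾", "PRON"), ("讀", "VERB"), ("書", "NOUN")]]
X_train, y_train = build_training_set(modern_corpus, classical_sample)
```

The resulting `X_train` and `y_train` are lists of per-sentence feature dictionaries and tag sequences, a shape that CRF toolkits accepting per-token feature dictionaries can typically consume. The trained tagger's output would then be scored with `tagging_precision` against held-out gold tags.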
Original language: English
Title of host publication: Chinese Lexical Semantics - 16th Workshop, CLSW 2015, Revised Selected Papers
Publisher: Springer Verlag
Pages: 448-456
Number of pages: 9
ISBN (Print): 9783319271934
Publication status: Published - 1 Jan 2015
Event: 16th Workshop on Chinese Lexical Semantics Workshop, CLSW 2015 - Beijing, China
Duration: 9 May 2015 to 11 May 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9332
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th Workshop on Chinese Lexical Semantics Workshop, CLSW 2015
Country: China
City: Beijing
Period: 9/05/15 to 11/05/15

Keywords

  • Ancient Chinese
  • Chinese classics
  • Novels in the Ming and Qing dynasties
  • Part-of-speech tagging

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
