Quality assurance for segmentation and tagging of Chinese novels in the Ming and Qing dynasties

Dan Xiong, Qin Lu, Fengju Lo, Dingxu Shi, Tin Shing Chiu

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

1 Citation (Scopus)

Abstract

This paper presents a word segmentation and named entity tagging project that annotates Chinese novels of the Ming and Qing dynasties. Computer-aided tools are used to assist the annotation. The paper focuses on the quality assurance measures taken to ensure precision and consistency. The specification for word segmentation and named entity tagging is formulated from the standards for modern Chinese segmentation commonly used in Mainland China and in Taiwan, together with an analysis of the differences between Chinese classics and modern Chinese. The specification is established through iterative refinement. This refinement process offers insight into the quality control of computer-aided processing of Chinese literary works of the Ming and Qing dynasties, and it can be applied to works from even earlier periods. The finalized corpus, built by computer-aided annotation with manual review in accordance with the specification, can be used for research in literature, linguistics, and information technology, and for the teaching of Chinese.
Original language: English
Title of host publication: Proceedings - 2012 International Conference on Asian Language Processing, IALP 2012
Publisher: IEEE Computer Society
Pages: 77-80
Number of pages: 4
Publication status: Published - 1 Jan 2012
Event: 2012 International Conference on Asian Language Processing, IALP 2012 - Hanoi, Viet Nam
Duration: 13 Nov 2012 to 15 Nov 2012

Conference

Conference: 2012 International Conference on Asian Language Processing, IALP 2012
Country/Territory: Viet Nam
City: Hanoi
Period: 13/11/12 to 15/11/12

Keywords

  • Named entities
  • Novels in the Ming and Qing dynasties
  • Quality assurance
  • Tagging
  • Word segmentation

ASJC Scopus subject areas

  • Software
