Recognition and extraction of honorifics in Chinese diachronic corpora

Dan Xiong, Jian Xu, Qin Lu, Fengju Lo

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

5 Citations (Scopus)


Honorifics can be found in various written records from different periods and have great historical significance. This paper introduces a machine learning system that recognizes honorifics in diachronic corpora. A tagged corpus of four classic novels written in the Ming and Qing dynasties is used to train the system. The system is then used to automatically recognize and extract honorifics in pre-Qin classics, Tang-dynasty poems, and modern Chinese news. Experimental results show that the system achieves relatively good results in recognizing honorifics in the pre-Qin classics and Tang-dynasty poems. This work is an attempt to improve the performance of automatic honorific recognition in diachronic corpora. The system can be a helpful tool for studies of the evolution of honorifics throughout Chinese history.
Original language: English
Title of host publication: Chinese Lexical Semantics - 15th Workshop, CLSW 2014, Revised Selected Papers
Publisher: Springer Verlag
Number of pages: 12
ISBN (Electronic): 9783319143309
Publication status: Published - 1 Jan 2014
Event: 15th Workshop on Chinese Lexical Semantics, CLSW 2014 - Macao, China
Duration: 9 Jun 2014 to 12 Jun 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 15th Workshop on Chinese Lexical Semantics, CLSW 2014


Keywords

  • Chinese diachronic corpora
  • Honorifics
  • Machine learning algorithm

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

