Linguistic and behavioural studies of Chinese chat language

K.F. Wong, Y. Xia, Wenjie Li

Research output: Journal article publication › Journal article › Academic research › peer-review

Abstract

Real-time communication platforms such as ICQ, MSN and online chat rooms are popular on the Internet today. The language used on these platforms differs significantly from standard natural language. By comparison, this language, referred to as chat language, is informal, anomalous and dynamic. This paper presents the NIL corpus, a Chinese chat language text collection constructed to facilitate language study. The linguistic characteristics and ecological behaviour of the language are also studied. We show that chat language is anomalous and dynamic in nature, which leads to out-of-vocabulary, ambiguity and sparse data problems. To tackle these problems, a novel chat language model is proposed. The model is based on phonetic mapping between chat terms and their standard natural language counterparts.
Original language: English
Pages (from-to): 133-152
Number of pages: 20
Journal: International journal of computer processing of languages
Volume: 19
Issue number: 2
Publication status: Published - 2006

Keywords

  • Chinese chat language
  • Language resources
  • Computational linguistics
