Abstract
Real-time communication platforms such as ICQ, MSN and online chat rooms are now popular on the Internet. The language used on these platforms, referred to as chat language, differs significantly from standard natural language: it is informal, anomalous and dynamic. This paper presents the NIL corpus, a collection of Chinese chat language text constructed to facilitate study of the language. We also study the linguistic characteristics and ecological behaviour of chat language. We show that its anomalous and dynamic nature leads to out-of-vocabulary, ambiguity and sparse data problems. To tackle these problems, we propose a novel chat language model based on phonetic mapping between chat terms and their standard natural language counterparts.
Original language | English |
---|---|
Pages (from-to) | 133-152 |
Number of pages | 20 |
Journal | International journal of computer processing of languages |
Volume | 19 |
Issue number | 2 |
Publication status | Published - 2006 |
Keywords
- Chinese chat language
- Language resources
- Computational linguistics