Abstract
Informal short texts on the web are rich in emotions, as they often reflect unfiltered, immediate reactions to breaking news events. This emotion density, however, stands in contrast to the poverty of linguistic context and features that such texts offer for emotion classification. This paper tackles that challenge by proposing orthographic features based on orthographic code mixing and code-switching for both non-ML and ML approaches. Our results show that, as expected, orthographic features routinely outperform grammatical features for emotion classification of short texts in all approaches. Orthographic features were also shown to make more significant contributions, especially in terms of precision and in formal texts, when state-of-the-art deep learning algorithms are applied. This result confirms the effectiveness of the orthographic change feature for the task of emotion classification. These results are argued to be applicable to all languages because of the common code-shifting in languages with non-Latin orthographies and the use of non-letter symbols in all languages.
Original language | English |
---|---|
Pages (from-to) | 329–352 |
Journal | Language Resources and Evaluation |
Volume | 55 |
Early online date | 23 Nov 2020 |
Publication status | Published - Jun 2021 |
Keywords
- Orthography
- Emotion classification
- Orthographic code mixing
- Code-switching
- Short text
- Orthographic features
- Morpho-syntactic features