Abstract
We propose a new approach to stylometric analysis that combines lexical and textual information without annotation or other pre-processing. In particular, our study makes use of Chinese tone motifs and word-length motifs automatically extracted from unannotated texts. The approach is linked-data-based in nature, as tone and word-length information is extracted from a lexicon and mapped onto the text. Support vector machines and random forests were used to build the classification models for author differentiation. Based on a comparative study of the classification results of the different models, we conclude that the combination of word-final tone motifs, segment-final motifs and word-length motifs gives the best outcome and is hence the best model.
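The lexicon-to-text mapping and motif extraction described above can be sketched as follows. This is a minimal illustration under assumptions that the abstract does not state. `LEXICON` is a hypothetical toy lexicon mapping words to syllable tones (1 to 4, with 0 for the neutral tone). Words are matched against it by forward maximum matching. A motif is taken to be a maximal run of non-decreasing values, the usual definition in quantitative linguistics, and is applied here to both word lengths and word-final tones. The study's exact motif definitions may differ, and segment-final motifs are not covered.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> tone of each syllable (1-4; 0 = neutral tone).
LEXICON = {
    "我们": (3, 0),
    "去": (4,),
    "图书馆": (2, 1, 3),
    "看": (4,),
    "书": (1,),
}


def segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position take the longest lexicon word.

    Characters not covered by any lexicon entry are emitted as one-character words.
    """
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in lexicon or n == 1:
                words.append(candidate)
                i += n
                break
    return words


def motifs(values):
    """Split a sequence into maximal runs of non-decreasing values."""
    result, current = [], []
    for v in values:
        if current and v < current[-1]:
            result.append(tuple(current))
            current = []
        current.append(v)
    if current:
        result.append(tuple(current))
    return result


def motif_profile(found):
    """Relative frequency of each motif type in a text."""
    counts = Counter(found)
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}


def extract(text, lexicon):
    """Return (word-length motifs, word-final tone motifs) for one text."""
    words = segment(text, lexicon)
    lengths = [len(w) for w in words]
    # Words missing from the lexicon carry no tone information and are skipped.
    final_tones = [lexicon[w][-1] for w in words if w in lexicon]
    return motifs(lengths), motifs(final_tones)


if __name__ == "__main__":
    length_motifs, tone_motifs = extract("我们去图书馆看书", LEXICON)
    print(length_motifs)  # [(2,), (1, 3), (1, 1)]
    print(tone_motifs)    # [(0, 4), (3, 4), (1,)]
```

Per-text profiles from `motif_profile` could serve as feature vectors for classifiers such as scikit-learn's `SVC` or `RandomForestClassifier`. How the original study vectorised the motifs is not specified here.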
Original language | English |
---|---|
Pages | 56-63 |
Number of pages | 8 |
Publication status | Published - 1 Jan 2019 |
Event | 31st Pacific Asia Conference on Language, Information and Computation, PACLIC 2017, Cebu City, Philippines, 16 Nov 2017 → 18 Nov 2017 |
Conference
Conference | 31st Pacific Asia Conference on Language, Information and Computation, PACLIC 2017 |
---|---|
Country/Territory | Philippines |
City | Cebu City |
Period | 16/11/17 → 18/11/17 |
Keywords
- Chinese prose
- Stylometric analysis
- Tone motifs
- Word length motif
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science (miscellaneous)