Stylometric studies based on tone and word length motifs

Hou Renkui, Huang Chu-Ren

Research output: Unpublished conference presentation (presented paper, abstract, poster); Conference presentation (not published in journal/proceeding/book); Academic research; peer-review

Abstract

We propose a new approach to stylometric analysis that combines lexical and textual information without annotation or other pre-processing. In particular, our study makes use of Chinese tone motifs and word length motifs automatically extracted from unannotated texts. The proposed approach is by nature based on linked data, as tone and word-length information is extracted from a lexicon and mapped to the text. Support vector machines and random forests were used to build the classification models for author differentiation. Based on a comparative study of the classification results of the different models, we conclude that the combination of word-final tone motifs, segment-final motifs and word length motifs provides the best outcome and hence is the best model.
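The abstract does not define a motif, so the following is only a minimal sketch of how lexicon-based motif extraction might look. It assumes the definition common in quantitative linguistics, where a motif is a maximal run of equal or increasing values, and it assumes the text is already segmented into words. The toy lexicon, the function names, and the example sentence are hypothetical and are not taken from the paper; segment-final motifs are not covered.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical toy lexicon: word -> tone number of each syllable (1-4).
TOY_LEXICON: Dict[str, List[int]] = {
    "中国": [1, 2],
    "文学": [2, 2],
    "很": [3],
    "美": [3],
    "我": [3],
    "爱": [4],
    "读": [2],
}


def motifs(values: Sequence[int]) -> List[List[int]]:
    """Split a sequence into maximal non-decreasing runs (assumed motif definition)."""
    result: List[List[int]] = []
    for value in values:
        if result and value >= result[-1][-1]:
            result[-1].append(value)   # run continues: value is equal or larger
        else:
            result.append([value])     # value dropped (or first item): start a new motif
    return result


def word_length_motifs(words: Sequence[str], lexicon: Dict[str, List[int]]) -> List[List[int]]:
    """Word length is taken as the number of syllables listed in the lexicon."""
    return motifs([len(lexicon[w]) for w in words])


def word_final_tone_motifs(words: Sequence[str], lexicon: Dict[str, List[int]]) -> List[List[int]]:
    """Use the tone of each word's last syllable, looked up in the lexicon."""
    return motifs([lexicon[w][-1] for w in words])


# Pre-segmented example text (segmentation itself is outside this sketch).
words = ["中国", "文学", "很", "美", "我", "爱", "读"]

length_motifs = word_length_motifs(words, TOY_LEXICON)
tone_motifs = word_final_tone_motifs(words, TOY_LEXICON)

# Motif frequencies could serve as features for a classifier.
features = Counter(tuple(m) for m in length_motifs + tone_motifs)
print(length_motifs)
print(tone_motifs)
```

Under these assumptions, the per-text motif frequency counts would then be turned into feature vectors for a classifier such as a support vector machine or a random forest, which are the two model types the abstract names.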

Original language: English
Pages: 56-63
Number of pages: 8
Publication status: Published - 1 Jan 2019
Event: 31st Pacific Asia Conference on Language, Information and Computation, PACLIC 2017 - Cebu City, Philippines
Duration: 16 Nov 2017 - 18 Nov 2017

Conference

Conference: 31st Pacific Asia Conference on Language, Information and Computation, PACLIC 2017
Country/Territory: Philippines
City: Cebu City
Period: 16/11/17 - 18/11/17

Keywords

  • Chinese prose
  • Stylometric analysis
  • Tones motifs
  • Word length motif

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)
