Robust stylometric analysis and author attribution based on tones and rimes

Renkui Hou, Chu Ren Huang

Research output: Journal article publication > Journal article > Academic research > peer-review

3 Citations (Scopus)

Abstract

In this article, we propose an innovative and robust approach to stylometric analysis that requires no annotation and leverages both lexical and sub-lexical information. In particular, we propose to leverage the phonological information of tones and rimes in Mandarin Chinese, extracted automatically from unannotated texts. Texts from different authors were represented by tones, tone motifs, and word-length motifs, as well as rimes and rime motifs. Support vector machines and random forests were used to build the text classification models for authorship attribution. From the experimental results, we conclude that the combination of rime bigrams, word-final rimes, and segment-final rimes discriminates effectively between texts from different authors when random forests are used to build the classification model. This robust approach can in principle be applied to other languages with an established phonological inventory of onsets and rimes.
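To make the feature idea in the abstract concrete, here is a minimal sketch of how rimes, tones, and rime bigrams might be derived from a text. It is not the authors' pipeline. It assumes the text has already been converted to tone-numbered pinyin syllables (for example by an external transliteration tool), and the names `ONSETS`, `split_syllable`, and `rime_bigram_features` are hypothetical. Treating the letters y and w as onsets, and mapping unmarked syllables to tone 5, are orthographic simplifications made for illustration only.

```python
from collections import Counter

# Pinyin initials; two-letter initials come first so "zh" matches before "z".
# Assumption: y and w are treated as onsets, which is an orthographic shortcut.
ONSETS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
          "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable):
    """Split a tone-numbered pinyin syllable such as 'zhong1' into (rime, tone)."""
    # Assumption: a syllable without a trailing digit carries the neutral tone "5".
    tone = syllable[-1] if syllable[-1].isdigit() else "5"
    body = syllable.rstrip("0123456789")
    for onset in ONSETS:
        # Strip the onset only if something remains to serve as the rime.
        if body.startswith(onset) and len(body) > len(onset):
            return body[len(onset):], tone
    return body, tone  # syllable with no onset, e.g. "an4"


def rime_bigram_features(syllables):
    """Return relative frequencies of adjacent rime pairs in a syllable sequence."""
    rimes = [split_syllable(s)[0] for s in syllables]
    bigrams = Counter(zip(rimes, rimes[1:]))
    total = sum(bigrams.values())
    return {pair: count / total for pair, count in bigrams.items()} if total else {}


# Illustrative input only, not data from the article.
sample = ["wo3", "men5", "xue2", "xi2", "zhong1", "wen2"]
tones = [split_syllable(s)[1] for s in sample]
features = rime_bigram_features(sample)
```

Feature dictionaries of this kind, one per text, could then be vectorized and passed to a random forest or SVM classifier; the word-final and segment-final rime features named in the abstract would additionally require word and segment boundaries, which this sketch does not model.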

Original language: English
Pages (from-to): 49-71
Number of pages: 23
Journal: Natural Language Engineering
Volume: 26
Issue number: 1
Publication status: Published - 1 Jan 2020

Keywords

  • Author identification
  • Quantitative stylistics
  • Random forest
  • Stylometrics
  • SVM
  • Tone and rime motifs

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Linguistics and Language
  • Artificial Intelligence
