Review Authorship Attribution in a Similarity Space

T.-Y. Qian, B. Liu, Qing Li, J. Si

Research output: Journal article publicationJournal articleAcademic researchpeer-review

6 Citations (Scopus)

Abstract

© 2015, Springer Science+Business Media New York. Authorship attribution, also known as authorship classification, is the problem of identifying the authors (reviewers) of a set of documents (reviews). The common approach is to build a classifier using supervised learning. This approach has several issues which hurts its applicability. First, supervised learning needs a large set of documents from each author to serve as the training data. This can be difficult in practice. For example, in the online review domain, most reviewers (authors) only write a few reviews, which are not enough to serve as the training data. Second, the learned classifier cannot be applied to authors whose documents have not been used in training. In this article, we propose a novel solution to deal with the two problems. The core idea is that instead of learning in the original document space, we transform it to a similarity space. In the similarity space, the learning is able to naturally tackle the issues. Our experiment results based on online reviews and reviewers show that the proposed method outperforms the state-of-the-art supervised and unsupervised baseline methods significantly.
Original languageEnglish
Pages (from-to)200-213
Number of pages14
JournalJournal of Computer Science and Technology
Volume30
Issue number1
DOIs
Publication statusPublished - 1 Jan 2015
Externally publishedYes

Keywords

  • authorship attribution
  • similarity space
  • supervised learning

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this