A subspace decision cluster classifier for text classification

Yan Li, Edward Hung, Fu Lai Korris Chung

Research output: Journal article publication, Journal article, Academic research, peer-review

14 Citations (Scopus)

Abstract

This paper proposes a new classification method, subspace decision cluster classification (SDCC), for high-dimensional text data with multiple classes. An SDCC model consists of a set of disjoint subspace decision clusters, each labeled with a dominant class that determines the class of new objects falling in the cluster. A cluster tree is first generated from a training data set by recursively calling a subspace clustering algorithm, the Entropy Weighting k-Means algorithm. The SDCC model is then extracted from this subspace decision cluster tree. Various tests, including the Anderson-Darling test, are used to determine the stopping condition for tree growth. A series of experiments on real text data sets has been conducted. The results show that SDCC outperforms existing methods such as decision trees and SVM. SDCC is particularly suitable for large, high-dimensional, sparse text data with many classes.
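The pipeline the abstract describes (grow a cluster tree recursively, label each leaf cluster with its dominant class, classify new objects by the cluster they fall in) can be illustrated with a minimal sketch. This is not the authors' implementation: plain 2-means stands in for Entropy Weighting k-Means (so no per-cluster feature weights are learned), a purity threshold stands in for the Anderson-Darling and other stopping tests, and "falling in a cluster" is approximated by the nearest leaf centroid. All function names, parameters, and the toy data are hypothetical.

```python
from collections import Counter

def dominant(labels):
    """Return (most common label, its fraction) for a non-empty label list."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def two_means(points, iters=10):
    """Plain 2-means, an assumed stand-in for Entropy Weighting k-Means.
    Seeding is deterministic: the first point and the point farthest from it."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    centers = [list(c0), list(c1)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centers[k] = centroid(members)
    return assign

def grow(points, labels, min_purity=0.9, min_size=2):
    """Recursively split a cluster; return leaves as (centroid, dominant label).
    The purity/size rule is an assumed substitute for the paper's stopping tests."""
    label, purity = dominant(labels)
    if purity >= min_purity or len(points) <= min_size:
        return [(centroid(points), label)]
    assign = two_means(points)
    leaves = []
    for k in (0, 1):
        sub_p = [p for p, a in zip(points, assign) if a == k]
        sub_l = [l for l, a in zip(labels, assign) if a == k]
        if not sub_p or len(sub_p) == len(points):
            # Degenerate split: keep the whole cluster as a single leaf.
            return [(centroid(points), label)]
        leaves += grow(sub_p, sub_l, min_purity, min_size)
    return leaves

def classify(model, x):
    """Assign x the dominant class of the nearest leaf cluster."""
    return min(model, key=lambda leaf: sq_dist(leaf[0], x))[1]

# Toy term-count vectors (hypothetical data, three terms, three classes).
X = [[3, 0, 0], [2, 1, 0], [0, 3, 1], [0, 2, 2], [0, 0, 3], [1, 0, 3]]
y = ["sports", "sports", "tech", "tech", "arts", "arts"]

model = grow(X, y)
print(classify(model, [0, 4, 1]))
```

On this toy data the tree stops at three leaves, one per class. A faithful version would replace `two_means` with a subspace clustering step that learns feature weights per cluster and would measure distance in each cluster's weighted subspace.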
Original language: English
Pages (from-to): 12475-12482
Number of pages: 8
Journal: Expert Systems with Applications
Volume: 38
Issue number: 10
DOIs
Publication status: Published - 15 Sep 2011

Keywords

  • Classification
  • Subspace decision cluster

ASJC Scopus subject areas

  • Engineering(all)
  • Computer Science Applications
  • Artificial Intelligence
