A rough set-based CBR approach for feature and document reduction in text categorization

Yan Li, Chi Keung Simon Shiu, Sankar Kumar Pal, James Nga Kwok Liu

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

3 Citations (Scopus)

Abstract

In this paper, a novel rough set-based case-based reasoning (CBR) approach is proposed to tackle the task of text categorization (TC). Initial work on integrating both feature and document reduction/selection in TC using rough sets and CBR properties is presented. Rough set theory is incorporated to reduce the number of feature terms by generating reducts. In addition, two CBR concepts, case coverage and case reachability, are used to select representative documents. The main contribution of this paper is that both the number of features and the number of documents are reduced with minimal loss of useful information. Experiments are conducted on the Reuters-21578 text dataset. The experimental results show that, although the numbers of feature terms and documents are greatly reduced, problem-solving quality in terms of classification accuracy is preserved.
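
The abstract names case coverage and case reachability without defining them. The sketch below illustrates one common reading of the two concepts: the coverage of a case is the set of cases it can solve, and the reachability of a case is the set of cases that can solve it. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy documents, the Jaccard similarity, the 0.5 threshold, the "same label and similar enough" solving criterion, and the greedy selection in `select_representatives`.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A case is (set of feature terms, category label). Toy data, not from the paper.
Case = Tuple[FrozenSet[str], str]

cases: List[Case] = [
    (frozenset({"oil", "price", "barrel"}), "crude"),
    (frozenset({"oil", "price", "opec"}), "crude"),
    (frozenset({"oil", "barrel", "opec"}), "crude"),
    (frozenset({"wheat", "crop", "export"}), "grain"),
    (frozenset({"wheat", "crop", "harvest"}), "grain"),
]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two term sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def solves(c: Case, t: Case, threshold: float) -> bool:
    """Assumed criterion: c solves t if labels match and similarity >= threshold."""
    return c[1] == t[1] and jaccard(c[0], t[0]) >= threshold


def coverage(cases: List[Case], threshold: float) -> Dict[int, Set[int]]:
    """coverage[i] = indices of the cases that case i can solve."""
    return {
        i: {j for j, t in enumerate(cases) if solves(c, t, threshold)}
        for i, c in enumerate(cases)
    }


def reachability(cov: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """reachability[j] = indices of the cases that can solve case j."""
    reach: Dict[int, Set[int]] = {i: set() for i in cov}
    for i, covered in cov.items():
        for j in covered:
            reach[j].add(i)
    return reach


def select_representatives(cov: Dict[int, Set[int]]) -> List[int]:
    """Greedy sketch: keep the case covering the most still-uncovered cases.

    Terminates for thresholds <= 1 because every case covers itself.
    Ties are broken in favour of the smallest index.
    """
    uncovered = set(cov)
    kept: List[int] = []
    while uncovered:
        best = max(sorted(cov), key=lambda i: len(cov[i] & uncovered))
        kept.append(best)
        uncovered -= cov[best]
    return kept


cov = coverage(cases, threshold=0.5)
reach = reachability(cov)
kept = select_representatives(cov)
```

On this toy data `kept` is `[0, 3]`: one "crude" document and one "grain" document together cover all five cases, which is the kind of document reduction the abstract describes. The paper's actual selection procedure may differ.
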
Original language: English
Title of host publication: Proceedings of 2004 International Conference on Machine Learning and Cybernetics
Pages: 2438-2443
Number of pages: 6
Volume: 4
Publication status: Published - 2 Nov 2004
Event: Proceedings of 2004 International Conference on Machine Learning and Cybernetics - Shanghai, China
Duration: 26 Aug 2004 to 29 Aug 2004

Conference

Conference: Proceedings of 2004 International Conference on Machine Learning and Cybernetics
Country/Territory: China
City: Shanghai
Period: 26/08/04 to 29/08/04

Keywords

  • Case Coverage
  • Case Reachability
  • Case-based Reasoning (CBR)
  • Rough Set
  • Text Categorization (TC)

ASJC Scopus subject areas

  • Engineering(all)
