A rough set-based CBR approach for feature and document reduction in text categorization

Y. Li, Chi Keung Simon Shiu, S.K. Pal, J.N.K. Liu

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

Abstract

A rough set-based case-based reasoning (CBR) approach is proposed for text categorization (TC). This initial work integrates feature reduction and document reduction/selection in TC using rough sets and CBR properties. Rough set theory is used to reduce the number of feature terms by generating reducts, while two CBR concepts, case coverage and case reachability, are used to select representative documents. The main contribution of this paper is that both the number of features and the number of documents are reduced with minimal loss of useful information. Experiments are conducted on text datasets from Reuters-21578. The results show that, although the numbers of feature terms and documents are greatly reduced, problem-solving quality in terms of classification accuracy is preserved.
Original language: English
Title of host publication: Proceedings of 2004 International Conference on Machine Learning and Cybernetics, 2004, 26-29 August 2004
Publisher: IEEE
Pages: 2438-2443
Number of pages: 6
ISBN (Print): 0780384032
Publication status: Published - 2004
Event: International Conference on Machine Learning and Cybernetics
Duration: 1 Jan 2004 → …

Conference

Conference: International Conference on Machine Learning and Cybernetics
Period: 1/01/04 → …

Keywords

  • Case-based reasoning
  • Natural languages
  • Rough set theory
  • Word processing
