Text categorization using hybrid (mined) terms

C. K.P. Wong, Wing Pong Robert Luk, K. F. Wong, K. L. Kwok

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

2 Citations (Scopus)

Abstract

© 2000 ACM. This paper evaluated text categorization using characters, bigrams, words and hybrid terms. These terms were also augmented with mined terms. Classifiers using hybrid terms did not achieve better classification performance. The use of data mining techniques to add new terms to the dictionary improves the performance of character-based classifiers. Our naive comparison between the Pat-tree classifier and our best classifier shows that the Pat-tree classifier has the best precision (77%) and our best classifier has the best recall (72%) and the lowest storage requirement (13%).
Original language: English
Title of host publication: Proceedings of the 5th International Workshop on Information Retrieval with Asian Languages, IRAL 2000
Publisher: Association for Computing Machinery, Inc
Pages: 217-218
Number of pages: 2
ISBN (Electronic): 1581133006, 9781581133004
Publication status: Published - 1 Nov 2000
Event: 5th International Workshop on Information Retrieval with Asian Languages, IRAL 2000 - Hong Kong, Hong Kong
Duration: 30 Sep 2000 to 1 Oct 2000

Conference

Conference: 5th International Workshop on Information Retrieval with Asian Languages, IRAL 2000
Country/Territory: Hong Kong
City: Hong Kong
Period: 30/09/00 to 1/10/00

Keywords

  • Data mining
  • Evaluation
  • Text categorization

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
