Abstract
© 2000 ACM. This paper evaluated text categorization using characters, bigrams, words and hybrid terms. These terms were also augmented with mined terms. Classifiers using hybrid terms did not achieve better classification performance. The use of data mining techniques to add new terms to the dictionary improved the performance of character-based classifiers. Our naive comparison between the Pat-tree classifier and our best classifier shows that the Pat-tree classifier has the best precision (77%), while our best classifier has the best recall (72%) and the lowest storage requirement (13%).
Original language | English |
---|---|
Title of host publication | Proceedings of the 5th International Workshop on Information Retrieval with Asian Languages, IRAL 2000 |
Publisher | Association for Computing Machinery, Inc |
Pages | 217-218 |
Number of pages | 2 |
ISBN (Electronic) | 1581133006, 9781581133004 |
Publication status | Published - 1 Nov 2000 |
Event | 5th International Workshop on Information Retrieval with Asian Languages, IRAL 2000 - Hong Kong, Hong Kong. Duration: 30 Sept 2000 → 1 Oct 2000 |
Conference
Conference | 5th International Workshop on Information Retrieval with Asian Languages, IRAL 2000 |
---|---|
Country/Territory | Hong Kong |
City | Hong Kong |
Period | 30/09/00 → 1/10/00 |
Keywords
- Data mining
- Evaluation
- Text categorization
ASJC Scopus subject areas
- Computer Science Applications
- Information Systems