Text categorization based on subtopic clusters

Research output: Journal article publication, Conference article, Academic research, peer-review

2 Citations (Scopus)

Abstract

The number of documents per topic class is typically highly skewed. This leads to good micro-average performance but noticeably weaker macro-average performance. Viewing topics as clusters in a high-dimensional space, we propose using clustering to identify subtopic clusters within large topic classes, on the assumption that a large topic cluster is in general a mixture of several subtopic clusters. We used Reuters news articles and support vector machines to evaluate whether using subtopic clusters can lead to better macro-average performance.
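The abstract gives neither formulas nor the clustering algorithm, so the following Python sketch is an illustrative assumption, not the authors' procedure. It shows two things with made-up numbers: (1) why a skewed class distribution yields a high micro-averaged F1 but a much lower macro-averaged F1, and (2) one possible way to relabel a large class as subtopic clusters and map subtopic predictions back to the parent topic. Plain k-means, the `"<class>#<cluster>"` label convention, and the names `micro_macro_f1`, `split_large_classes` and `to_parent` are all hypothetical choices made for illustration. The classifier itself (the paper uses support vector machines) is not shown; it would simply be trained on the relabeled targets.

```python
import math
import random
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts: dict[str, tuple[int, int, int]]) -> tuple[float, float]:
    """counts maps class -> (TP, FP, FN).
    Micro: pool counts over all classes, so large classes dominate.
    Macro: average per-class F1, so every class weighs the same."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    return micro, macro


def kmeans(points: list[list[float]], k: int, iters: int = 20, seed: int = 0) -> list[int]:
    """Plain Lloyd's k-means; returns a cluster index per point.
    ASSUMPTION: the abstract does not name the clustering method."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        assign = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members (keep it if empty).
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return assign


def split_large_classes(X: list[list[float]], y: list[str], min_size: int, k: int) -> list[str]:
    """Hypothetical helper: relabel every class with at least `min_size`
    documents as k subtopic labels "<class>#<cluster>"; smaller classes
    keep their original label."""
    new_y = list(y)
    for label, size in Counter(y).items():
        if size < min_size:
            continue
        idx = [i for i, lab in enumerate(y) if lab == label]
        clusters = kmeans([X[i] for i in idx], k)
        for i, c in zip(idx, clusters):
            new_y[i] = f"{label}#{c}"
    return new_y


def to_parent(label: str) -> str:
    """Map a subtopic prediction such as "large#1" back to its topic "large"."""
    return label.split("#", 1)[0]


if __name__ == "__main__":
    # Made-up counts: one large, well-classified class and one small, poorly classified one.
    micro, macro = micro_macro_f1({"large": (90, 10, 10), "small": (1, 2, 4)})
    print(f"micro-F1={micro:.3f} macro-F1={macro:.3f}")  # micro-F1=0.875 macro-F1=0.575

    # Toy 2-D "documents": the large class contains two well-separated subtopics.
    X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]]
    y = ["large", "large", "large", "large", "small"]
    sub = split_large_classes(X, y, min_size=3, k=2)
    print(sub)
    print([to_parent(s) for s in sub] == y)  # True
```

In this toy setup the large class is split into two subtopic labels (which cluster gets index 0 depends on the random initialization), while the small class is left untouched; mapping predictions back with `to_parent` recovers the original topic labels for evaluation.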
Original language: English
Pages (from-to): 203-214
Number of pages: 12
Journal: Lecture Notes in Computer Science
Volume: 3513
Publication status: Published - 30 Sep 2005
Event: 10th International Conference on Applications of Natural Language to Information Systems, NLDB 2005: Natural Language Processing and Information Systems - Alicante, Spain
Duration: 15 Jun 2005 to 17 Jun 2005

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
