Text categorization based on subtopic clusters

Francis C.Y. Chik, Wing Pong Robert Luk, Fu Lai Korris Chung

Research output: Journal article publication, Conference article, Academic research, peer-review

2 Citations (Scopus)

Abstract

The distribution of the number of documents across topic classes is typically highly skewed. This skew leads to good micro-average performance but less desirable macro-average performance. Viewing topics as clusters in a high-dimensional space, we propose using clustering to determine subtopic clusters for large topic classes, on the assumption that a large topic cluster is in general a mixture of several subtopic clusters. We used Reuters news articles and support vector machines to evaluate whether using subtopic clusters can lead to better macro-average performance.
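
The gap between micro-average and macro-average performance under skewed class sizes can be illustrated with a small sketch. This is not the authors' evaluation code: the class names, document counts, and predictions below are hypothetical, and F1 is assumed as the performance measure only for illustration.

```python
def class_counts(y_true, y_pred, label):
    """Return (tp, fp, fn) for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    counts = [class_counts(y_true, y_pred, c) for c in labels]
    # Macro-average: score each class separately, then take the plain mean,
    # so a small class weighs as much as a large one.
    macro = sum(f1(*c) for c in counts) / len(labels)
    # Micro-average: pool the counts over all classes first, so large
    # classes dominate the result.
    micro = f1(sum(c[0] for c in counts),
               sum(c[1] for c in counts),
               sum(c[2] for c in counts))
    return micro, macro


# Hypothetical skewed collection: 95 documents in one class, 5 in another.
y_true = ["large_topic"] * 95 + ["small_topic"] * 5
# Hypothetical classifier: perfect on the large class, 1 of 5 on the small one.
y_pred = ["large_topic"] * 95 + ["small_topic"] * 1 + ["large_topic"] * 4

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
# prints "micro-F1 = 0.960, macro-F1 = 0.656"
```

In this toy setting the pooled (micro) score stays high because the large class dominates the counts, while the per-class (macro) mean is pulled down by the poorly recognised small class.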
Original language: English
Pages (from-to): 203-214
Number of pages: 12
Journal: Lecture Notes in Computer Science
Volume: 3513
Publication status: Published - 30 Sept 2005
Event: 10th International Conference on Applications of Natural Language to Information Systems, NLDB 2005: Natural Language Processing and Information Systems - Alicante, Spain
Duration: 15 Jun 2005 → 17 Jun 2005

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
