Abstract
The distribution of documents across topic classes is typically highly skewed. This leads to good micro-average performance but poorer macro-average performance, since micro-averages are dominated by the large classes while macro-averages weight every class equally. Viewing topics as clusters in a high-dimensional space, we propose using clustering to identify subtopic clusters within large topic classes, on the assumption that a large topic cluster is in general a mixture of several subtopic clusters. We used Reuters news articles and support vector machines to evaluate whether using subtopic clusters can lead to better macro-average performance.
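
The gap between micro- and macro-averaged scores under a skewed class distribution can be illustrated with a small sketch. The class names and the true-positive, false-positive and false-negative counts below are hypothetical, invented purely for illustration; they are not results from this paper, and F1 is assumed here as the performance measure.

```python
from typing import Dict, Tuple

# Hypothetical per-class counts: (true positives, false positives, false negatives).
# One large topic class and two small ones; these numbers are made up for illustration.
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "large_topic": (950, 50, 50),
    "small_topic_a": (2, 3, 8),
    "small_topic_b": (1, 2, 9),
}


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Pool the counts over all classes, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Compute F1 per class, then take the unweighted mean over classes."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


if __name__ == "__main__":
    print(f"micro-F1: {micro_f1(COUNTS):.3f}")
    print(f"macro-F1: {macro_f1(COUNTS):.3f}")
```

With these assumed counts the pooled (micro) score stays close to the score of the large class, while the unweighted (macro) mean is pulled down by the two small classes.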
| Original language | English |
|---|---|
| Pages (from-to) | 203-214 |
| Number of pages | 12 |
| Journal | Lecture Notes in Computer Science |
| Volume | 3513 |
| Publication status | Published - 30 Sept 2005 |
| Event | 10th International Conference on Applications of Natural Language to Information Systems, NLDB 2005: Natural Language Processing and Information Systems - Alicante, Spain. Duration: 15 Jun 2005 → 17 Jun 2005 |
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science