Leveraging network structure for incremental document clustering

Tieyun Qian, Jianfeng Si, Qing Li, Qian Yu

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

Abstract

Recent studies have shown that link-based clustering methods can significantly improve the performance of content-based clustering. However, most previous algorithms were developed for fixed data sets and are not applicable to dynamic environments such as data warehouses and online digital libraries. In this paper, we introduce a novel approach that leverages network structure for incremental clustering. Under this framework, both link and content information are incorporated to determine the host cluster of a new document. Combining the two types of information yields promising clustering performance. Furthermore, the status of core members is used to quickly determine whether to split or merge a new cluster. This filtering process eliminates unnecessary and time-consuming checks of textual similarity over the whole corpus, and thus greatly speeds up the entire procedure. We evaluate the proposed approach on several real-world publication data sets and conduct an extensive comparison with both classic content-based and recent link-based algorithms. The experimental results demonstrate the effectiveness and efficiency of our method.
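
As an illustration of how link and content information might be combined to pick a host cluster for a new document, here is a minimal Python sketch. It is not the authors' algorithm: the linear weighting `alpha`, the acceptance `threshold`, the cosine measure, the link-based candidate filter, and every function and variable name are assumptions made for this example. It also does not reproduce the core-member test that the abstract describes for splitting or merging clusters.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def choose_host_cluster(doc_terms, doc_links, clusters, assignment,
                        alpha=0.5, threshold=0.2):
    """Return the id of the best host cluster, or None to start a new one.

    doc_terms:  Counter of terms in the new document
    doc_links:  ids of documents the new document links to (e.g. citations)
    clusters:   dict cluster_id -> Counter (summed term counts of members)
    assignment: dict doc_id -> cluster_id for already clustered documents
    alpha, threshold: hypothetical tuning knobs, not taken from the paper
    """
    # Count how many of the new document's links land in each cluster.
    linked = Counter(assignment[d] for d in doc_links if d in assignment)
    total_links = sum(linked.values())

    best_id, best_score = None, threshold
    # Assumed filter: only clusters reached through a link are compared on
    # text; fall back to all clusters when no link resolves to a cluster.
    candidates = linked if total_links else clusters
    for cid in candidates:
        link_score = linked[cid] / total_links if total_links else 0.0
        content_score = cosine(doc_terms, clusters[cid])
        score = alpha * link_score + (1 - alpha) * content_score
        if score > best_score:
            best_id, best_score = cid, score
    return best_id


# Toy data, invented for this example.
clusters = {
    "db": Counter({"query": 3, "index": 2}),
    "ml": Counter({"neural": 4, "training": 2}),
}
assignment = {"p1": "db", "p2": "db", "p3": "ml"}

host = choose_host_cluster(Counter({"query": 1, "index": 1}),
                           ["p1", "p3"], clusters, assignment)
orphan = choose_host_cluster(Counter({"poetry": 1}), [], clusters, assignment)
```

In this toy run the new document links once into each cluster, so the link scores tie and the textual overlap with the `db` cluster decides the assignment. The second document has no links and no shared terms, so no cluster clears the threshold and the function returns `None`, which a caller could treat as a signal to open a new cluster.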

Original language: English
Title of host publication: Web Technologies and Applications - 14th Asia-Pacific Web Conference, APWeb 2012, Proceedings
Pages: 342-353
Number of pages: 12
Publication status: Published - 18 Apr 2012
Externally published: Yes
Event: 14th Asia Pacific Web Technology Conference, APWeb 2012 - Kunming, China
Duration: 11 Apr 2012 to 13 Apr 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7235 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th Asia Pacific Web Technology Conference, APWeb 2012
Country/Territory: China
City: Kunming
Period: 11/04/12 to 13/04/12

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
