Ranking through clustering: An integrated approach to multi-document summarization

Xiaoyan Cai, Wenjie Li

Research output: Journal article publication › Journal article › Academic research › peer-review

40 Citations (Scopus)

Abstract

Multi-document summarization aims to create a condensed summary while retaining the main characteristics of the original set of documents. Against this background, sentence ranking has so far been the issue of greatest concern. Since documents often cover a number of topic themes, each represented by a cluster of highly related sentences, sentence clustering has been explored in the literature in order to provide more informative summaries. For each topic theme, the rank of terms conditional on that theme should be very distinct and quite different from the rank of terms in other topic themes. Existing cluster-based summarization approaches apply clustering and ranking in isolation, which leads to incomplete, or sometimes rather biased, analytical results. A newly emerged framework uses sentence clustering results to improve or refine the sentence ranking results. Under this framework, we propose a novel approach that directly generates clusters integrated with ranking. The basic idea of the approach is that the ranking distributions of sentences in different clusters should be quite different from one another; these distributions can serve as features of the clusters, from which new clustering measures of sentences can be calculated. Meanwhile, better clustering results lead to better ranking results. As a result, ranking and clustering mutually and simultaneously update each other, so that the performance of both can be improved. The effectiveness of the proposed approach is demonstrated by both the cluster quality analysis and the summarization evaluation conducted on the DUC 2004-2007 datasets.
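The mutual-update idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the model, so everything here is an assumption made for illustration, including the bag-of-words vectors, the cosine similarity, the centrality-style ranking, the argmax reassignment rule, and the hypothetical names `cosine` and `rank_and_cluster`. It only shows the general shape of alternating a cluster-conditional ranking step with a re-clustering step until the assignment stops changing.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_and_cluster(sentences, init_labels, k, iterations=10):
    """Alternate cluster-conditional ranking and re-clustering.

    Illustrative assumption, not the published method: a sentence's rank
    in a cluster is its summed similarity to that cluster's members, and
    each sentence is reassigned to the cluster whose ranking distribution
    it agrees with most.
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    labels = list(init_labels)
    ranks = []
    for _ in range(iterations):
        # Ranking step: one distribution over all sentences per cluster.
        ranks = []
        for c in range(k):
            members = [j for j in range(n) if labels[j] == c]
            raw = [sum(sim[i][j] for j in members) for i in range(n)]
            total = sum(raw)
            # Fall back to a uniform distribution for an empty cluster.
            ranks.append([r / total if total else 1.0 / n for r in raw])
        # Clustering step: compare each sentence with every cluster's
        # ranking distribution and pick the best-matching cluster.
        new_labels = []
        for i in range(n):
            affinity = [
                sum(sim[i][j] * ranks[c][j] for j in range(n))
                for c in range(k)
            ]
            new_labels.append(affinity.index(max(affinity)))
        if new_labels == labels:
            break  # Assignment is stable, so ranks match the final labels.
        labels = new_labels
    return labels, ranks


demo_sentences = [
    "stock market prices fell",
    "market prices fell sharply",
    "investors sold stock when prices fell",
    "the team won the match",
    "team won the final match",
    "fans cheered because the team won",
]
# Deliberately imperfect starting assignment: two sentences are misplaced.
labels, ranks = rank_and_cluster(demo_sentences, [0, 0, 1, 1, 1, 0], k=2)
print(labels)
```

In this toy input the two topics share no vocabulary, so the loop moves the two misplaced sentences to the cluster of their own topic and then stops. Real document sets are far noisier, and the choice of similarity measure and ranking function would matter much more than it does here.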
Original language: English
Article number: 6480794
Pages (from-to): 1424-1433
Number of pages: 10
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 21
Issue number: 7
Publication status: Published - 10 Apr 2013

Keywords

  • Document summarization
  • sentence clustering
  • sentence ranking

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering
