Exploiting surface, content and relevance features for learning-based extractive summarization

Mingli Wu, Wenjie Li, Furu Wei, Qin Lu, Kam Fai Wong

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

1 Citation (Scopus)

Abstract

Extractive summarization decides whether each sentence should be selected for inclusion in the summary, so it can be cast as a classification task. In this paper, we explore various features under a learning-based classification framework, including basic surface features, features describing the content a sentence represents, and features indicating the relevance among sentences. While surface and content features capture extrinsic and intrinsic aspects of a sentence itself, relevance features describe the strength of relatedness between sentences. Sentences processed by the classifiers are then fed to a re-ranking algorithm, and those with higher priority are included in the summary. Experiments show that the proposed framework and the integrated features achieve competitive results on the DUC 2001 document sets when evaluated by ROUGE. We find that relevance features noticeably improve summarization performance.
Original language: English
Title of host publication: IEEE NLP-KE 2007 - Proceedings of International Conference on Natural Language Processing and Knowledge Engineering
Pages: 234-241
Number of pages: 8
Publication status: Published - 1 Dec 2007
Event: International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE 2007 - Beijing, China
Duration: 30 Aug 2007 - 1 Sept 2007

Conference

Conference: International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE 2007
Country/Territory: China
City: Beijing
Period: 30/08/07 - 1/09/07

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
