Summarizing Long-Form Document with Rich Discourse Information

Tianyu Zhu, Wen Hua, Jianfeng Qu, Xiaofang Zhou

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

4 Citations (Scopus)

Abstract

The development of existing extractive summarization models for long-form document summarization is hindered by two factors: 1) the computation of the summarization model will dramatically increase due to the sheer size of the input long document; 2) the discourse structural information in the long-form document has not been fully exploited. To address the two deficiencies, we propose HEROES, a novel extractive summarization model for summarizing long-form documents with rich discourse structural information. In particular, the HEROES model consists of two modules: 1) a content ranking module that ranks and selects salient sections and sentences to compose a short digest that empowers complex summarization models and serves as its input; 2) an extractive summarization module based on a heterogeneous graph with nodes from different discourse levels and elaborately designed edge connections to reflect the discourse hierarchy of the document and restrain the semantic drifts across section boundaries. Experimental results on benchmark datasets show that HEROES can achieve significantly better performance compared with various strong baselines.

Original languageEnglish
Title of host publicationCIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management
PublisherAssociation for Computing Machinery
Pages2770-2779
Number of pages10
ISBN (Electronic)9781450384469
DOIs
Publication statusPublished - 26 Oct 2021
Externally publishedYes
Event30th ACM International Conference on Information and Knowledge Management, CIKM 2021 - Virtual, Online, Australia
Duration: 1 Nov 20215 Nov 2021

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings

Conference

Conference30th ACM International Conference on Information and Knowledge Management, CIKM 2021
Country/TerritoryAustralia
CityVirtual, Online
Period1/11/215/11/21

Keywords

  • content ranking
  • discourse information
  • extractive summarization
  • heterogeneous graph
  • long-form document

ASJC Scopus subject areas

  • Information Systems

Fingerprint

Dive into the research topics of 'Summarizing Long-Form Document with Rich Discourse Information'. Together they form a unique fingerprint.

Cite this