TY - GEN
T1 - Summarizing Long-Form Document with Rich Discourse Information
AU - Zhu, Tianyu
AU - Hua, Wen
AU - Qu, Jianfeng
AU - Zhou, Xiaofang
N1 - Funding Information:
This work was partially supported by the Australian Research Council under Grant No. DE210100160.
Publisher Copyright:
© 2021 ACM.
PY - 2021/10/26
Y1 - 2021/10/26
N2 - The development of existing extractive summarization models for long-form document summarization is hindered by two factors: 1) the computation of the summarization model will dramatically increase due to the sheer size of the input long document; 2) the discourse structural information in the long-form document has not been fully exploited. To address the two deficiencies, we propose HEROES, a novel extractive summarization model for summarizing long-form documents with rich discourse structural information. In particular, the HEROES model consists of two modules: 1) a content ranking module that ranks and selects salient sections and sentences to compose a short digest that empowers complex summarization models and serves as its input; 2) an extractive summarization module based on a heterogeneous graph with nodes from different discourse levels and elaborately designed edge connections to reflect the discourse hierarchy of the document and restrain the semantic drifts across section boundaries. Experimental results on benchmark datasets show that HEROES can achieve significantly better performance compared with various strong baselines.
AB - The development of existing extractive summarization models for long-form document summarization is hindered by two factors: 1) the computation of the summarization model will dramatically increase due to the sheer size of the input long document; 2) the discourse structural information in the long-form document has not been fully exploited. To address the two deficiencies, we propose HEROES, a novel extractive summarization model for summarizing long-form documents with rich discourse structural information. In particular, the HEROES model consists of two modules: 1) a content ranking module that ranks and selects salient sections and sentences to compose a short digest that empowers complex summarization models and serves as its input; 2) an extractive summarization module based on a heterogeneous graph with nodes from different discourse levels and elaborately designed edge connections to reflect the discourse hierarchy of the document and restrain the semantic drifts across section boundaries. Experimental results on benchmark datasets show that HEROES can achieve significantly better performance compared with various strong baselines.
KW - content ranking
KW - discourse information
KW - extractive summarization
KW - heterogeneous graph
KW - long-form document
UR - http://www.scopus.com/inward/record.url?scp=85119181880&partnerID=8YFLogxK
U2 - 10.1145/3459637.3482396
DO - 10.1145/3459637.3482396
M3 - Conference article published in proceeding or book
AN - SCOPUS:85119181880
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 2770
EP - 2779
BT - CIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 30th ACM International Conference on Information and Knowledge Management, CIKM 2021
Y2 - 1 November 2021 through 5 November 2021
ER -