Multi-Task Learning for Abstractive and Extractive Summarization

Yangbin Chen, Yun Ma, Xudong Mao, Qing Li

Research output: Journal article publication › Journal article › Academic research › peer-review

41 Citations (Scopus)

Abstract

Abstractive and extractive methods are the two main approaches to automatic document summarization. In this paper, to fully integrate the relatedness and advantages of both approaches, we propose a general unified framework for abstractive summarization that incorporates extractive summarization as an auxiliary task. In particular, our framework is composed of a shared hierarchical document encoder, a decoder based on a hierarchical attention mechanism, and an extractor. We adopt a multi-task learning method to train these two tasks jointly, which enables the shared encoder to better capture the semantics of the document. Moreover, as our main task is abstractive summarization, we constrain the attention learned in the abstractive task with the labels of the extractive task to strengthen the consistency between the two tasks. Experiments on the CNN/DailyMail dataset demonstrate that both the auxiliary task and the attention constraint contribute significantly to improving performance, and that our model is comparable to state-of-the-art abstractive models. In addition, we halve the number of labels for the extractive task, pretrain the extractor, and jointly train the two tasks using the estimated sentence salience of the extractive task to constrain the attention of the abstractive task. Performance decreases only slightly compared with using fully labeled data for the auxiliary task.
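The joint training objective described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the loss formulas, so the function names, the weights `lam` and `gamma`, and the specific form of the attention constraint (penalizing average sentence-level attention that falls on sentences not marked as salient) are all hypothetical choices made here for illustration.

```python
import math

EPS = 1e-12  # numerical floor to keep log() finite


def abstractive_nll(token_probs):
    """Mean negative log-likelihood of the reference summary tokens."""
    return -sum(math.log(max(p, EPS)) for p in token_probs) / len(token_probs)


def extractive_bce(salience, labels):
    """Mean binary cross-entropy between predicted salience and 0/1 labels."""
    total = 0.0
    for s, y in zip(salience, labels):
        s = min(max(s, EPS), 1.0 - EPS)  # clamp away from 0 and 1
        total -= y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return total / len(labels)


def attention_constraint(sentence_attention, targets):
    """ASSUMED form of the constraint: attention mass on non-salient sentences.

    sentence_attention: one distribution over sentences per decoding step.
    targets: per-sentence values in [0, 1] (hard labels or estimated salience).
    """
    n_steps = len(sentence_attention)
    mean_attn = [
        sum(step[i] for step in sentence_attention) / n_steps
        for i in range(len(targets))
    ]
    return sum(a * (1.0 - t) for a, t in zip(mean_attn, targets))


def joint_loss(token_probs, salience, labels, sentence_attention,
               lam=0.5, gamma=0.5):
    """Weighted sum of the main task, auxiliary task, and constraint terms.

    lam and gamma are hypothetical weights, not values from the paper.
    """
    return (abstractive_nll(token_probs)
            + lam * extractive_bce(salience, labels)
            + gamma * attention_constraint(sentence_attention, labels))
```

Under the same assumptions, the reduced-label setting mentioned at the end of the abstract would correspond to passing the pretrained extractor's estimated salience scores as `targets` to `attention_constraint` for documents that lack extractive labels.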

Original language: English
Pages (from-to): 14-23
Number of pages: 10
Journal: Data Science and Engineering
Volume: 4
Issue number: 1
Publication status: Published - 1 Mar 2019

Keywords

  • Attention mechanism
  • Automatic document summarization
  • Multi-task learning

ASJC Scopus subject areas

  • Computational Mechanics
  • Computer Science Applications
