Key Phrase Aware Transformer for Abstractive Summarization

Shuaiqi Liu, Jiannong Cao, Ruosong Yang, Zhiyuan Wen

Research output: Journal article (academic research, peer-reviewed)

20 Citations (Scopus)


Abstractive summarization aims to generate a concise summary covering the salient content of single or multiple text documents. Many recent abstractive summarization methods are built on the transformer model to capture long-range dependencies in the input text and to achieve parallelization. In the transformer encoder, calculating attention weights is a crucial step for encoding input documents. Input documents usually contain key phrases conveying salient information, and it is important to encode these phrases completely. However, existing transformer-based summarization works do not consider key phrases in the input when determining attention weights. Consequently, some tokens within key phrases receive only small attention weights, which is not conducive to encoding the semantic information of input documents. In this paper, we introduce prior knowledge of key phrases into the transformer-based summarization model and guide the model to encode key phrases. For the contextual representation of each token in a key phrase, we assume that tokens within the same key phrase make larger contributions than other tokens in the input sequence. Based on this assumption, we propose the Key Phrase Aware Transformer (KPAT), a model with a highlighting mechanism in the encoder that assigns greater attention weights to tokens within key phrases. Specifically, we first extract key phrases from the input document and score each phrase's importance. Then we build a block-diagonal highlighting matrix to indicate these phrases' importance scores and positions. To combine self-attention weights with the key phrases' importance scores, we design two highlighting attention structures: a per-head highlighting attention and a multi-head highlighting attention. Experimental results on two datasets (Multi-News and PubMed) from different summarization tasks and domains show that our KPAT model significantly outperforms advanced summarization baselines. We conduct further experiments to analyze the impact of each component of our model on summarization performance and to verify the effectiveness of the proposed highlighting mechanism.
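The abstract's core idea — a block-diagonal highlighting matrix whose blocks carry key-phrase importance scores, combined with the self-attention weights — can be sketched as follows. This is a minimal single-head illustration, not the paper's exact formulation: the phrase spans, scores, and the choice to add the highlighting scores to the attention logits before the softmax are all assumptions for demonstration (the paper defines two combination structures).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def build_highlighting_matrix(seq_len, phrases):
    """Block-diagonal highlighting matrix: H[i, j] = phrase importance score
    when tokens i and j fall inside the same key phrase, 0 otherwise.
    `phrases` is a list of (start, end_exclusive, score) spans."""
    H = np.zeros((seq_len, seq_len))
    for start, end, score in phrases:
        H[start:end, start:end] = score
    return H

def highlighting_attention(Q, K, V, H):
    """Single-head scaled dot-product attention with highlighting scores
    added to the logits (one plausible way to combine them)."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d) + H
    return softmax(logits) @ V

# Toy example: 8 tokens, two hypothetical key phrases with importance scores.
rng = np.random.default_rng(0)
L, d = 8, 4
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
H = build_highlighting_matrix(L, [(1, 3, 2.0), (5, 7, 1.0)])
out = highlighting_attention(Q, K, V, H)
```

Because the highlighting scores are non-negative and only raise the logits between tokens of the same phrase, each in-phrase token attends more strongly to its phrase-mates than it would under plain self-attention.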

Original language: English
Article number: 102913
Pages (from-to): 1-17
Number of pages: 17
Journal: Information Processing & Management
Issue number: 3
Publication status: Published - May 2022


Keywords
  • Text summarization
  • Abstractive summarization
  • Key phrase extraction
  • Deep learning

