Abstract
Capturing the compositional process from words to documents is a key challenge in natural language processing and information retrieval. Extractive query-oriented multi-document summarization generates a summary by extracting a suitable set of sentences from multiple documents based on a given query. This paper proposes a novel document summarization framework based on a deep learning model, which has shown outstanding extraction ability in many real-world applications. The framework consists of three parts: concept extraction, summary generation, and reconstruction validation. A new query-oriented extraction technique is proposed to extract information distributed across multiple documents. Then, the whole deep architecture is fine-tuned by minimizing the information loss in reconstruction validation. Based on the concepts extracted from the deep architecture layer by layer, dynamic programming is used to seek the most informative set of sentences for the summary. Experiments on three benchmark datasets (DUC 2005, 2006, and 2007) assess and confirm the effectiveness of the proposed framework and algorithms. Experimental results show that the proposed method outperforms state-of-the-art extractive summarization approaches. Moreover, we provide a statistical analysis of query words based on Amazon's Mechanical Turk (MTurk) crowdsourcing platform. There exist underlying relationships between topic words and content that can contribute to the summarization task.
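The abstract does not specify how the dynamic programming step is formulated. One common way to frame extractive selection is as a 0/1 knapsack problem: each sentence has an importance score and a length, and the summary must fit a length budget. The sketch below illustrates that general technique only; the function name `select_sentences`, the per-sentence `scores`, and the word `budget` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def select_sentences(
    scores: List[float], lengths: List[int], budget: int
) -> Tuple[float, List[int]]:
    """Pick the subset of sentences with maximum total score whose
    combined length does not exceed `budget` (0/1 knapsack).

    Illustrative assumption: scores are given per sentence and add up;
    the paper's actual objective may differ.
    """
    n = len(scores)
    # best[i][b] = best total score using the first i sentences
    # with at most b words available.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip sentence i-1
            if lengths[i - 1] <= b:
                # option 2: take sentence i-1 if it fits
                candidate = best[i - 1][b - lengths[i - 1]] + scores[i - 1]
                if candidate > best[i][b]:
                    best[i][b] = candidate

    # Walk the table backwards to recover which sentences were taken.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    chosen.reverse()
    return best[n][budget], chosen


if __name__ == "__main__":
    total, picked = select_sentences([3.0, 4.0, 5.0], [2, 3, 4], budget=5)
    print(total, picked)
```

With three candidate sentences and a five-word budget, the sketch prefers the two shorter sentences together over the single highest-scoring one, since their combined score is larger and they still fit.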
Original language | English |
---|---|
Article number | 10053 |
Pages (from-to) | 8146-8155 |
Number of pages | 10 |
Journal | Expert Systems with Applications |
Volume | 42 |
Issue number | 21 |
Publication status | Published - 18 Jul 2015 |
Keywords
- Deep learning
- Multi-document
- Neocortex simulation
- Query-oriented summarization
ASJC Scopus subject areas
- General Engineering
- Computer Science Applications
- Artificial Intelligence