A weakly-supervised extractive framework for sentiment-preserving document summarization

Y. Ma, Qing Li

Research output: Journal article publicationJournal articleAcademic researchpeer-review

2 Citations (Scopus)

Abstract

© 2018 Springer Science+Business Media, LLC, part of Springer NatureThe popularity of social media sites provides new ways for people to share their experiences and convey their opinions, leading to an explosive growth of user-generated content. Text data, owing to the amazing expressiveness of natural language, is of great value for people to explore various kinds of knowledge. However, considerable user-generated text contents are longer than what a reader expects, making automatic document summarization a necessity to facilitate knowledge digestion. In this paper, we focus on the reviews-like sentiment-oriented textual data. We propose the concept of Sentiment-preserving Document Summarization (SDS), aiming at summarizing a long textual document to a shorter version while preserving its main sentiments and not sacrificing readability. To tackle this problem, using deep neural network-based models, we devise an end-to-end weakly-supervised extractive framework, consisting of a hierarchical document encoder, a sentence extractor, a sentiment classifier, and a discriminator to distinguish the extracted summaries from the natural short reviews. The framework is weakly-supervised in that no ground-truth summaries are used for training, while the sentiment labels are available to supervise the generated summary to preserve the sentiments of the original document. In particular, the sentence extractor is trained to generate summaries i) making the sentiment classifier predict the same sentiment category as the original longer documents, and ii) fooling the discriminator into recognizing them as human-written short reviews. Experimental results on two public datasets validate the effectiveness of our framework.
Original languageEnglish
Pages (from-to)1-25
Number of pages25
JournalWorld Wide Web
DOIs
Publication statusAccepted/In press - 30 May 2018
Externally publishedYes

Keywords

  • document summarization
  • end-to-end neural network
  • reinforcement learning
  • sentiment-preserving

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this