One story, one flow: Hidden Markov story models for multilingual multidocument summarization

Pascale Fung, Grace Ngai

Research output: Journal article, peer-reviewed

33 Citations (Scopus)

Abstract

This article presents a multidocument, multilingual, theme-based summarization system based on modeling text cohesion (story flow). Conventional extractive summarization systems which pick out salient sentences to include in a summary often disregard any flow or sequence that might exist between these sentences. We argue that such inherent text cohesion exists and is (1) specific to a particular story and (2) specific to a particular language. Documents within the same story, and in the same language, share a common story flow, and this flow differs across stories, and across languages. We propose using Hidden Markov Models (HMMs) as story models. An unsupervised segmental K-means method is used to iteratively cluster multiple documents into different topics (stories) and learn the parameters of parallel Hidden Markov Story Models (HMSM), one for each story. We compare story models within and across stories and within and across languages (English and Chinese). The experimental results support our “one story, one flow” and “one language, one flow” hypotheses. We also propose a Naïve Bayes classifier for document summarization. The performance of our summarizer is superior to conventional methods that do not incorporate text cohesion information. Our HMSM method also provides a simple way to compile a single metasummary for multiple documents from individual summaries via state labeled sentences.
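The unsupervised segmental K-means procedure described above alternates between Viterbi-decoding each document against the current model and re-estimating the model parameters from the resulting state-labeled segments. The following is a minimal sketch of that idea for a single discrete-observation HMM; the function names, smoothing constant, and toy data are illustrative assumptions, not the paper's actual implementation (which clusters multiple documents into parallel story models).

```python
import numpy as np

def viterbi(obs, log_A, log_B, log_pi):
    """Most likely state path for one observation sequence (log domain)."""
    T, N = len(obs), log_pi.shape[0]
    delta = np.empty((T, N))          # best path score ending in each state
    psi = np.zeros((T, N), dtype=int) # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # scores[from_state, to_state]
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

def segmental_kmeans(sequences, n_states, n_symbols, n_iter=20, seed=0):
    """Viterbi (segmental K-means) training of a discrete HMM."""
    rng = np.random.default_rng(seed)
    A = rng.random((n_states, n_states)); A /= A.sum(1, keepdims=True)
    B = rng.random((n_states, n_symbols)); B /= B.sum(1, keepdims=True)
    pi = np.full(n_states, 1.0 / n_states)
    for _ in range(n_iter):
        # Small additive smoothing so no count is ever zero.
        A_c = np.full_like(A, 1e-3)
        B_c = np.full_like(B, 1e-3)
        pi_c = np.full_like(pi, 1e-3)
        for obs in sequences:
            # Segmentation step: label each observation with its Viterbi state.
            path = viterbi(obs, np.log(A), np.log(B), np.log(pi))
            # Estimation step: accumulate transition and emission counts.
            pi_c[path[0]] += 1
            for t in range(len(obs) - 1):
                A_c[path[t], path[t + 1]] += 1
            for t, o in enumerate(obs):
                B_c[path[t], o] += 1
        A = A_c / A_c.sum(1, keepdims=True)
        B = B_c / B_c.sum(1, keepdims=True)
        pi = pi_c / pi_c.sum()
    return A, B, pi
```

On toy sequences whose symbols follow a fixed order (e.g. several copies of `[0, 0, 0, 1, 1, 1]`), the trained model's transition matrix captures that "flow"; the paper's full method runs many such models in parallel, one per story, and assigns each document to the model that fits it best.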

Original language: English
Pages (from-to): 1-16
Number of pages: 16
Journal: ACM Transactions on Speech and Language Processing
Volume: 3
Issue number: 2
DOIs
Publication status: Published - Jul 2006

Keywords

  • Hidden Markov models
  • Multilingual document summarization

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computational Mathematics
