Abstract
Most state-of-the-art, well-performing topic-based summarization systems are built on the extractive framework. They score sentences from their associated features, with the feature weights either assigned manually or tuned experimentally. In this paper, we discuss learning strategies that obtain the optimal feature weights automatically, so that a sentence characterized by a set of features can be assigned a sound score. The two fundamental issues are the training data and the learning model. To avoid the time and effort of costly manual annotation, we construct the training data by labeling each sentence with a "true" score calculated from human summaries. A Support Vector Regression (SVR) model is then used to learn how the "true" score of a sentence relates to its features. Once this relation has been modeled mathematically, SVR can predict an "estimated" score for any given sentence. Evaluations with the ROUGE-2 criterion on the DUC 2006 and DUC 2005 document sets demonstrate the competitiveness and adaptability of the proposed approaches.
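The abstract does not give the exact formula for the "true" score or the SVR configuration, so the following Python sketch only illustrates the two-step pipeline under stated assumptions. The "true" score is assumed to be a ROUGE-2-style bigram recall of the sentence against each human summary, averaged over the summaries. The SVR is a linear epsilon-insensitive model trained by plain subgradient descent, a simplified stand-in for a kernel SVR solver. The helper names (`true_score`, `train_linear_svr`, `predict`) and the toy feature vectors are hypothetical.

```python
from collections import Counter
import math


def bigrams(tokens):
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def true_score(sentence, human_summaries):
    """ASSUMED labeling rule (not taken from the paper): ROUGE-2-style bigram
    recall of the sentence against each human summary, averaged over summaries."""
    sent = bigrams(sentence.lower().split())
    recalls = []
    for summary in human_summaries:
        ref = bigrams(summary.lower().split())
        total = sum(ref.values())
        if total == 0:
            continue  # a summary with fewer than two tokens has no bigrams
        overlap = sum(min(count, sent[bg]) for bg, count in ref.items())
        recalls.append(overlap / total)
    return sum(recalls) / len(recalls) if recalls else 0.0


def predict(w, b, x):
    """Estimated score of a feature vector x: w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(X, y, epsilon=0.01, C=10.0, lr=0.01, epochs=2000):
    """Linear epsilon-insensitive SVR (a simplified stand-in), minimizing
        0.5 * ||w||^2 + C * sum_i max(0, |w . x_i + b - y_i| - epsilon)
    by full-batch subgradient descent with a 1/sqrt(t) step size."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for t in range(1, epochs + 1):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            residual = predict(w, b, xi) - yi
            if abs(residual) > epsilon:  # only points outside the tube contribute
                sign = 1.0 if residual > 0 else -1.0
                for j in range(n_features):
                    grad_w[j] += C * sign * xi[j]
                grad_b += C * sign
        step = lr / math.sqrt(t)
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


if __name__ == "__main__":
    # Toy data: two hypothetical features per sentence (e.g. position, length).
    human_summaries = ["the cat sat on the mat", "a cat sat on a mat"]
    sentences = ["the cat sat on the mat today", "a dog barked", "the cat sat quietly"]
    features = [[1.0, 0.6], [0.5, 0.3], [0.2, 0.4]]
    labels = [true_score(s, human_summaries) for s in sentences]
    w, b = train_linear_svr(features, labels)
    for s, x in zip(sentences, features):
        print(f"{predict(w, b, x):.3f}  {s}")
```

Labels are derived automatically from the human summaries, so no manual sentence annotation is needed; only the regression step would change if a kernel SVR (for example scikit-learn's `sklearn.svm.SVR`) were swapped in for `train_linear_svr` and `predict`.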
Original language | English |
---|---|
Title of host publication | CIKM 2007 - Proceedings of the 16th ACM Conference on Information and Knowledge Management |
Pages | 79-86 |
Number of pages | 8 |
Publication status | Published - 1 Dec 2007 |
Event | 16th ACM Conference on Information and Knowledge Management, CIKM 2007 - Lisboa, Portugal. Duration: 6 Nov 2007 → 9 Nov 2007 |
Conference
Conference | 16th ACM Conference on Information and Knowledge Management, CIKM 2007 |
---|---|
Country/Territory | Portugal |
City | Lisboa |
Period | 6/11/07 → 9/11/07 |
Keywords
- Document summarization
- Support vector regression
ASJC Scopus subject areas
- General Business, Management and Accounting
- General Decision Sciences