TY - GEN
T1 - Learning similarity functions in graph-based document summarization
AU - Ouyang, You
AU - Li, Wenjie
AU - Wei, Furu
AU - Lu, Qin
PY - 2009/11/9
Y1 - 2009/11/9
AB - Graph-based models have been extensively explored in document summarization in recent years. Compared with traditional feature-based models, graph-based models incorporate interrelated information into the ranking process. Thus, potentially they can do a better job in retrieving the important contents from documents. In this paper, we investigate the problem of how to measure sentence similarity which is a crucial issue in graph-based summarization models but in our belief has not been well defined in the past. We propose a supervised learning approach that brings together multiple similarity measures and makes use of human-generated summaries to guide the combination process. Therefore, it can be expected to provide more accurate estimation than a single cosine similarity measure. Experiments conducted on the DUC2005 and DUC2006 data sets show that the proposed learning approach is successful in measuring similarity. Its competitiveness and adaptability are also demonstrated.
KW - Document summarization
KW - Graph-based ranking
KW - Sentence similarity calculation
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=70350647630&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-00831-3_18
DO - 10.1007/978-3-642-00831-3_18
M3 - Conference article published in proceeding or book
SN - 3642008305
SN - 9783642008306
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 189
EP - 200
BT - Computer Processing of Oriental Languages
T2 - 22nd International Conference on Computer Processing of Oriental Languages, ICCPOL 2009
Y2 - 26 March 2009 through 27 March 2009
ER -