TY - GEN
T1 - Self-Supervised Learning Approach for Extracting Citation Information on the Web
AU - Huynh, Dat T.
AU - Hua, Wen
PY - 2012
Y1 - 2012
N2 - In this paper, we propose a framework for automatically training a model to extract citation information on the web. Constructing manually labeled training data to learn an extraction model is tedious, time-consuming, and difficult to apply to several citation styles with different types of entities. To eliminate the requirement for manually labeled training data, we exploit a knowledge base of the citation domain and web search to derive labeled training data automatically. Our experiments show that the combination of a knowledge base, heuristics, and statistical methods can automate the extraction process and achieve good performance.
AB - In this paper, we propose a framework for automatically training a model to extract citation information on the web. Constructing manually labeled training data to learn an extraction model is tedious, time-consuming, and difficult to apply to several citation styles with different types of entities. To eliminate the requirement for manually labeled training data, we exploit a knowledge base of the citation domain and web search to derive labeled training data automatically. Our experiments show that the combination of a knowledge base, heuristics, and statistical methods can automate the extraction process and achieve good performance.
UR - http://www.scopus.com/inward/record.url?scp=84859711052&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-29253-8_69
DO - 10.1007/978-3-642-29253-8_69
M3 - Conference article published in proceeding or book
AN - SCOPUS:84859711052
SN - 9783642292521
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 719
EP - 726
BT - Web Technologies and Applications - 14th Asia-Pacific Web Conference, APWeb 2012, Proceedings
T2 - 14th Asia Pacific Web Technology Conference, APWeb 2012
Y2 - 11 April 2012 through 13 April 2012
ER -