Self-Supervised Learning Approach for Extracting Citation Information on the Web

Dat T. Huynh, Wen Hua

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

3 Citations (Scopus)

Abstract

In this paper, we propose a framework for automatically training a model to extract citation information on the web. Manually constructing labeled training data to learn an extraction model is tedious and time-consuming, and it is difficult to apply across the many citation styles, each with different types of entities. To eliminate the need for manually labeled training data, we exploit a knowledge base of the citation domain, together with web search, to derive labeled training data automatically. Our experiments show that combining the knowledge base, heuristics and statistical methods can automate the extraction process and achieve good performance.
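The abstract does not spell out how knowledge-base records become training labels. One common way to do this, often called distant supervision, is to match each known field value against the tokens of a citation string and emit BIO tags for the matched spans. The sketch below assumes a record has already been retrieved from the knowledge base (the web-search step is not shown). The function `label_citation`, the field names (`title`, `venue`, `year`) and the exact, case-insensitive matching rule are hypothetical illustrations, not the authors' method.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Split text into word tokens and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def label_citation(citation: str, record: Dict[str, str]) -> List[Tuple[str, str]]:
    """Hypothetical distant-supervision labeler: tag tokens of `citation`
    that exactly match (case-insensitively) a field value in `record`."""
    tokens = tokenize(citation)
    norm = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)  # "O" = outside any known field
    for field, value in record.items():
        target = [t.lower() for t in tokenize(value)]
        n = len(target)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            span_free = all(l == "O" for l in labels[start:start + n])
            # Label only the first match that no earlier field has claimed.
            if norm[start:start + n] == target and span_free:
                labels[start] = f"B-{field}"
                for i in range(start + 1, start + n):
                    labels[i] = f"I-{field}"
                break
    return list(zip(tokens, labels))


# Assumed example: a citation string and a matching (hypothetical) KB record.
citation = ("Huynh, D. T., Hua, W.: Self-Supervised Learning Approach for "
            "Extracting Citation Information on the Web. In: APWeb 2012, pp. 719-726")
record = {
    "title": "Self-Supervised Learning Approach for Extracting "
             "Citation Information on the Web",
    "venue": "APWeb",
    "year": "2012",
}
tagged = label_citation(citation, record)
for token, label in tagged:
    print(f"{token}\t{label}")
```

Tokens left as `O` here (for example the author names and page range) would simply stay unlabeled unless the record carries those fields. Tagged sequences of this kind could then train a statistical sequence model, such as a CRF. Exact matching is a deliberate simplification, since real citations abbreviate names and venues.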

Original language: English
Title of host publication: Web Technologies and Applications - 14th Asia-Pacific Web Conference, APWeb 2012, Proceedings
Pages: 719-726
Number of pages: 8
Publication status: Published, 2012
Externally published: Yes
Event: 14th Asia Pacific Web Technology Conference, APWeb 2012, Kunming, China
Duration: 11 Apr 2012 to 13 Apr 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7235 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th Asia Pacific Web Technology Conference, APWeb 2012
Country/Territory: China
City: Kunming
Period: 11/04/12 to 13/04/12

ASJC Scopus subject areas

  • Information Systems
