Title extraction from Loosely Structured Data Records

Yi Pu Wu, Xue Jie Zhang, Qing Li, Jing Chen

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

1 Citation (Scopus)

Abstract

In this paper, we present a novel method for extracting titles from Loosely Structured Data Records (LSDRs). We first automatically identify the format of the titles and then extract them accordingly. For a Web page whose title occurs in all the Data Records, we take as the accurate title the candidate title with the largest length of "same content". For a Web page whose title occurs before the first Data Record, the candidate title with the largest length of "different content" is taken as the accurate title. Our experiment on two databases collected from the Internet demonstrates that the automatic algorithm is robust and effective.
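
The two selection rules in the abstract can be illustrated with a minimal Python sketch. The abstract does not define "same content" or "different content", so the scoring used here is an assumption made only for illustration: "same content" is read as the longest contiguous text a candidate shares with every data record, and "different content" as the part of a candidate not covered by its best overlap with any record. The function names and sample strings are hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher


def shared_length(a: str, b: str) -> int:
    """Length of the longest contiguous block of text common to a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def pick_title_in_records(candidates, records):
    """Case 1 (assumed reading): the title is repeated in every data record.

    Score each candidate by its smallest overlap with any record, so that
    only text shared with all records counts, and return the best candidate.
    """
    return max(candidates, key=lambda c: min(shared_length(c, r) for r in records))


def pick_title_before_records(candidates, records):
    """Case 2 (assumed reading): the title appears before the first record.

    Score each candidate by how much of it is not covered by its best
    overlap with any record, and return the best candidate.
    """
    return max(
        candidates,
        key=lambda c: len(c) - max(shared_length(c, r) for r in records),
    )


# Hypothetical forum thread where every post repeats the thread title.
posts = [
    "Re: How to parse HTML tables posted by alice",
    "Re: How to parse HTML tables posted by bob",
]
title_a = pick_title_in_records(["How to parse HTML tables", "posted by", "alice"], posts)

# Hypothetical page where the title is shown once, above the posts.
replies = ["alice: try BeautifulSoup", "bob: try lxml instead"]
title_b = pick_title_before_records(["Forum home", "How to parse HTML tables"], replies)

print(title_a)
print(title_b)
```

A longest-common-substring score is only one possible stand-in for the paper's content measures; token-level or DOM-based comparisons would fit the same two-case structure.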

Original language: English
Title of host publication: Proceedings of the 7th International Conference on Machine Learning and Cybernetics, ICMLC
Pages: 2623-2628
Number of pages: 6
Publication status: Published - 25 Dec 2008
Externally published: Yes
Event: 7th International Conference on Machine Learning and Cybernetics, ICMLC - Kunming, China
Duration: 12 Jul 2008 to 15 Jul 2008

Publication series

Name: Proceedings of the 7th International Conference on Machine Learning and Cybernetics, ICMLC
Volume: 5

Conference

Conference: 7th International Conference on Machine Learning and Cybernetics, ICMLC
Country/Territory: China
City: Kunming
Period: 12/07/08 to 15/07/08

Keywords

  • Forum data
  • Loosely structured data records
  • Structured data records
  • Title extraction

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Control and Systems Engineering
