Algorithm for extracting loosely structured data records through digging strict patterns

Qing Li, J. Chen, Y. Wu

Research output: Journal article publication, Journal article, Academic research, peer-review

1 Citation (Scopus)

Abstract

Extracting loosely structured data records (LSDRs) has wide applications in many domains, such as forum pattern recognition, Weblog data analysis, and book and news review analysis. Yet existing methods only work well for strongly structured data records (SDRs). In this paper, we address the problem of extracting LSDRs by mining strict patterns. Our method uses both content features and tag-tree features to recognize LSDRs, and we propose a new algorithm that extracts the data records (DRs) automatically. The experimental results demonstrate that our algorithm extracts LSDRs effectively, with higher precision and recall. © 2009 Springer Science+Business Media, LLC.
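The abstract does not say how tag-tree features are compared, and the authors' algorithm is not reproduced here. As an illustrative sketch only, assuming candidate records are compared by the similarity of their tag trees (the keywords mention tree edit distance), the Python below swaps in Simple Tree Matching (Yang, 1991), a restricted top-down, order-preserving form of tree comparison. The content feature is not modelled. `Node`, `tree_similarity`, `group_similar_siblings`, and the 0.8 threshold are hypothetical names and values chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tag-tree node: an HTML tag name plus ordered children."""
    tag: str
    children: list[Node] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Simple Tree Matching: size of the maximum top-down,
    order-preserving matching between two tag trees."""
    if a.tag != b.tag:
        return 0  # roots differ, so nothing below them can be matched
    m, n = len(a.children), len(b.children)
    # M[i][j]: best matching using the first i children of a and first j of b
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                M[i - 1][j],
                M[i][j - 1],
                M[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return M[m][n] + 1  # +1 for the matched roots


def tree_similarity(a: Node, b: Node) -> float:
    """Normalized matching score in [0, 1]; 1.0 means identical tag trees."""
    return 2 * simple_tree_matching(a, b) / (size(a) + size(b))


def group_similar_siblings(siblings: list[Node], threshold: float = 0.8) -> list[list[Node]]:
    """Hypothetical helper: split consecutive sibling subtrees into runs whose
    neighbours are structurally similar, i.e. candidate data records."""
    groups: list[list[Node]] = []
    current: list[Node] = []
    for node in siblings:
        if current and tree_similarity(current[-1], node) < threshold:
            groups.append(current)
            current = []
        current.append(node)
    if current:
        groups.append(current)
    return groups


post1 = Node("div", [Node("span"), Node("p"), Node("a")])
post2 = Node("div", [Node("span"), Node("p")])
footer = Node("table")
print(round(tree_similarity(post1, post2), 3))  # 2 * 3 / (4 + 3)
print([len(g) for g in group_similar_siblings([post1, post2, footer])])
```

Here two forum-style posts whose tag trees differ by one optional link still score about 0.857 and fall into one group, while the unrelated `table` starts a new group. A fixed threshold is the simplest design choice; loosely structured records, which are this paper's focus, are exactly where a purely structural score like this one tends to need help from content features.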
Original language: English
Pages (from-to): 263-284
Number of pages: 22
Journal: World Wide Web
Volume: 12
Issue number: 3
Publication status: Published, 1 Aug 2009
Externally published: Yes

Keywords

  • Content feature
  • Data extraction
  • Loosely structured data record
  • Semi-structured data
  • Tree edit distance

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
