Extracting loosely structured data records through mining strict patterns

Yipu Wu, Jing Chen, Qing Li

Research output: Chapter in book / Conference proceeding > Conference article published in proceeding or book > Academic research > peer-review

7 Citations (Scopus)

Abstract

Extracting loosely structured data records (DRs) has wide applications in many domains, such as forum pattern recognition, blog data analysis, and book and news review analysis. Existing methods work well only for strongly structured DRs. In this paper, we address the problem of extracting loosely structured DRs by mining strict patterns. Our method uses both content features and tag-tree features to recognize loosely structured DRs, and we propose a new approach that extracts the DRs automatically. An experimental study demonstrates that the method is both effective and robust in practice.
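
The abstract does not spell out which content and tag-tree features the method uses, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes well-formed HTML, treats "sibling elements sharing the same tag" as a stand-in tag-tree feature, and treats "contains at least min_chars of text" as a stand-in content feature. All names here (Node, TreeBuilder, find_records, min_repeats, min_chars) are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """Element node, or a text node when tag == '#text'."""

    def __init__(self, tag, parent=None, text=""):
        self.tag = tag
        self.parent = parent
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a minimal tag tree; assumes well-formed HTML (an assumption)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(Node("#text", self.current, data.strip()))


def all_text(node):
    """Concatenate the text under a node in document order."""
    if node.tag == "#text":
        return node.text
    return " ".join(t for t in (all_text(c) for c in node.children) if t)


def find_records(html, min_repeats=2, min_chars=1):
    """Return groups of sibling subtrees that look like repeated records.

    Stand-in tag-tree feature: siblings share the same tag, while their
    inner structure may differ (hence "loosely structured").
    Stand-in content feature: every candidate holds at least min_chars of text.
    """
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    results = []
    stack = [builder.root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        groups = {}
        for child in node.children:
            if child.tag != "#text":
                groups.setdefault(child.tag, []).append(child)
        for group in groups.values():
            texts = [all_text(c) for c in group]
            if len(group) >= min_repeats and all(len(t) >= min_chars for t in texts):
                results.append(texts)
    return results
```

For a forum-like page such as `<div><h1>Forum</h1><div><b>Ann</b>: hello</div><div>Bob <i>says</i> hi</div><div>Cy: yo</div></div>`, this sketch groups the three inner div elements as one candidate record set even though their internal markup differs. A real extractor would need far stricter pattern mining than this simple tag comparison.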

Original language: English
Title of host publication: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Pages: 1322-1324
Number of pages: 3
Publication status: Published - 1 Oct 2008
Externally published: Yes
Event: 2008 IEEE 24th International Conference on Data Engineering, ICDE'08 - Cancun, Mexico
Duration: 7 Apr 2008 - 12 Apr 2008

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Conference

Conference: 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Country/Territory: Mexico
City: Cancun
Period: 7/04/08 - 12/04/08

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software
