Automatic template detection for structured web pages

Lawrence Lo, Vincent To Yee Ng, Patrick Ng, Stephen C.F. Chan

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

5 Citations (Scopus)

Abstract

Similar pages within a website, such as the book information pages on Amazon.com, are usually encoded from an underlying structured source and generated dynamically from a predefined template. Given a set of web pages from a common website, the template can be extracted by analyzing the patterns the pages share. In our work, we developed CF-EXALG (Collaborative Finer-EXALG), a system based on EXALG, to decompose web pages and find their common structures. In our system, the templates used to generate web pages can be discovered automatically and stored in XML format. As a result, data encoded in web pages can be extracted easily, and the template can be stored for future manipulation. In our preliminary experiments, CF-EXALG proved more accurate and efficient than other similar systems.
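EXALG, on which CF-EXALG is based, groups tokens that have the same occurrence vector (the count of the token in each input page) into equivalence classes, and treats large classes that appear in every page as template candidates. The following is a minimal, simplified sketch of that idea ending with an XML dump of the template. It is not CF-EXALG itself: the tokenizer, the thresholds `min_support` and `min_size`, and the XML element names `template`, `class`, and `token` are hypothetical choices for illustration, and EXALG's token differentiation and nesting analysis are omitted.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical tokenizer: a token is either a whole tag or a
# whitespace-delimited run of text that contains no '<'.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+")


def tokenize(html: str) -> list[str]:
    return TOKEN_RE.findall(html)


def occurrence_vectors(pages: list[str]) -> dict[str, tuple[int, ...]]:
    """Map each token to its per-page occurrence counts."""
    counts = [Counter(tokenize(p)) for p in pages]
    vocab = set().union(*counts)
    # Counter returns 0 for tokens missing from a page.
    return {tok: tuple(c[tok] for c in counts) for tok in vocab}


def equivalence_classes(pages, min_support=None, min_size=2):
    """Group tokens sharing an occurrence vector, keeping only classes that
    appear in at least `min_support` pages (default: all pages) and contain
    at least `min_size` tokens. Both thresholds are illustrative."""
    if min_support is None:
        min_support = len(pages)
    groups = defaultdict(set)
    for tok, vec in occurrence_vectors(pages).items():
        groups[vec].add(tok)
    return {
        vec: toks
        for vec, toks in groups.items()
        if sum(1 for v in vec if v > 0) >= min_support and len(toks) >= min_size
    }


def template_to_xml(classes) -> str:
    """Serialize the candidate template classes to XML (hypothetical schema).
    Tokens are sorted for deterministic output, not in template order."""
    root = ET.Element("template")
    for vec, toks in sorted(classes.items()):
        cls = ET.SubElement(root, "class", vector=",".join(map(str, vec)))
        for tok in sorted(toks):
            ET.SubElement(cls, "token").text = tok
    return ET.tostring(root, encoding="unicode")
```

For three book pages such as `<html><b>Title:</b> Dune <b>Price:</b> 9.99</html>`, the tokens `<html>`, `Title:`, `Price:` and `</html>` share the vector `(1, 1, 1)` and `<b>`, `</b>` share `(2, 2, 2)`, so both classes are kept as template structure, while the per-page values (titles and prices) fall into small classes that do not occur in every page and are treated as data.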
Original language: English
Title of host publication: Proceedings - 2006 10th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2006
Pages: 708-713
Number of pages: 6
Publication status: Published - 1 Dec 2006
Event: 2006 10th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2006 - Nanjing, China
Duration: 3 May 2006 to 5 May 2006

Conference

Conference: 2006 10th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2006
Country: China
City: Nanjing
Period: 3/05/06 to 5/05/06

Keywords

  • Collaborative system
  • Webpage template construction
  • XML

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Software
