A new sequential mining approach to XML document similarity computation

Ho Pong Leung, Fu Lai Korris Chung, Stephen Chi Fai Chan

Research output: Chapter in book / Conference proceeding, Conference article published in proceeding or book, Academic research, peer-review

5 Citations (Scopus)


Several methods exist for measuring the structural similarity among XML documents. The data mining approach appears to be a novel, interesting and promising one. Because ignoring hierarchical information when encoding paths for mining leads to deficiencies, we propose a new sequential pattern mining scheme for XML document similarity computation. It makes use of the hierarchical information to compute the structural similarity of documents. In addition, it includes a post-processing step that reuses the mined patterns to estimate the similarity of unmatched elements, so that another metric to qualify the similarity between XML documents can be introduced. Encouraging experimental results were obtained and are reported.
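As a rough illustration of why hierarchical information matters when encoding paths, the following Python sketch tags each element on a root-to-leaf path with its depth and compares two documents by the overlap of those paths. The abstract does not specify the encoding or the mining algorithm, so everything here is an assumption: the function names `level_tagged_paths` and `path_overlap` are hypothetical, and a plain Jaccard overlap stands in for the sequential pattern mining and post-processing steps of the paper.

```python
import xml.etree.ElementTree as ET


def level_tagged_paths(xml_text):
    """Return the set of root-to-leaf paths; each step is a (tag, depth) pair.

    Assumed encoding for illustration only, not the encoding used in the paper.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, depth, prefix):
        current = prefix + ((node.tag, depth),)
        children = list(node)
        if not children:
            # Leaf element: record the complete path from the root.
            paths.add(current)
        for child in children:
            walk(child, depth + 1, current)

    walk(root, 0, ())
    return paths


def path_overlap(doc_a, doc_b):
    """Jaccard overlap of level-tagged paths (a stand-in similarity measure)."""
    a = level_tagged_paths(doc_a)
    b = level_tagged_paths(doc_b)
    return len(a & b) / len(a | b)
```

Because the depth is part of each step, two documents that use the same tag names at different levels of the hierarchy do not count as sharing a path, which is the kind of distinction a level-blind encoding would miss.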
Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining
Publisher: Springer Verlag
Number of pages: 7
ISBN (Electronic): 3540047603, 9783540047605
Publication status: Published - 1 Jan 2003
Event: 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003 - Seoul, Korea, Republic of
Duration: 30 Apr 2003 to 2 May 2003


Conference: 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003
Country/Territory: Korea, Republic of

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
