Abstract
Several methods exist for measuring the structural similarity among XML documents. The data mining approach seems to be a novel, interesting and promising one. In view of the deficiencies that arise when hierarchical information is ignored in encoding the paths for mining, we propose a new sequential pattern mining scheme for XML document similarity computation. It makes use of the hierarchical information to compute the structural similarity of documents. In addition, it includes a post-processing step that reuses the mined patterns to estimate the similarity of unmatched elements, so that another metric to qualify the similarity between XML documents can be introduced. Encouraging experimental results were obtained and are reported.
Original language | English |
---|---|
Title of host publication | Advances in Knowledge Discovery and Data Mining |
Publisher | Springer Verlag |
Pages | 356-362 |
Number of pages | 7 |
Volume | 2637 |
ISBN (Electronic) | 3540047603, 9783540047605 |
Publication status | Published - 1 Jan 2003 |
Event | 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003 - Seoul, Korea, Republic of; Duration: 30 Apr 2003 → 2 May 2003 |
Conference
Conference | 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003 |
---|---|
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 30/04/03 → 2/05/03 |
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science (all)