Indexing high-dimensional data in dual distance spaces: A symmetrical encoding approach

Yi Zhuang, Yueting Zhuang, Qing Li, Lei Chen, Yi Yu

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

12 Citations (Scopus)

Abstract

Due to the well-known dimensionality curse problem, search in a high-dimensional space is considered as a "hard" problem. In this paper, a novel symmetrical encoding-based index structure, which is called EHD-Tree (for symmetrical Encoding-based Hybrid Distance Tree), is proposed to support fast k-Nearest-Neighbor (k-NN) search in high-dimensional spaces. In an EHD-Tree, all data points are first grouped into clusters by a k-Means clustering algorithm. Then the uniform ID number of each data point is obtained by a dual-distance-driven encoding scheme in which each cluster sphere is partitioned twice according to the dual distances of start- and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined to get a uniform index key, the latter is then indexed through a partition-based B + -tree. Thus, given a query point, its k-NN search in high-dimensional spaces can be transformed into search in a single dimensional space with the aid of the EHD-Tree index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of our proposed scheme, and the results demonstrate that this method outperforms the state-of-the-art high dimensional search techniques such as the X-Tree, VA-file, !Distance and NB-Tree, especially when the query radius is not very large.

Original languageEnglish
Title of host publicationAdvances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings
Pages241-251
Number of pages11
DOIs
Publication statusPublished - 16 May 2008
Externally publishedYes
Event11th International Conference on Extending Database Technology, EDBT 2008 - Nantes, France
Duration: 25 Mar 200829 Mar 2008

Publication series

NameAdvances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings

Conference

Conference11th International Conference on Extending Database Technology, EDBT 2008
CountryFrance
CityNantes
Period25/03/0829/03/08

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems
  • Software

Cite this