TY - GEN
T1 - Indexing high-dimensional data in dual distance spaces
T2 - 11th International Conference on Extending Database Technology, EDBT 2008
AU - Zhuang, Yi
AU - Zhuang, Yueting
AU - Li, Qing
AU - Chen, Lei
AU - Yu, Yi
PY - 2008/5/16
Y1 - 2008/5/16
N2 -
Due to the well-known dimensionality curse problem, search in a high-dimensional space is considered as a "hard" problem. In this paper, a novel symmetrical encoding-based index structure, which is called EHD-Tree (for symmetrical Encoding-based Hybrid Distance Tree), is proposed to support fast k-Nearest-Neighbor (k-NN) search in high-dimensional spaces. In an EHD-Tree, all data points are first grouped into clusters by a k-Means clustering algorithm. Then the uniform ID number of each data point is obtained by a dual-distance-driven encoding scheme in which each cluster sphere is partitioned twice according to the dual distances of start- and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined to get a uniform index key, the latter is then indexed through a partition-based B
+
-tree. Thus, given a query point, its k-NN search in high-dimensional spaces can be transformed into search in a single dimensional space with the aid of the EHD-Tree index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of our proposed scheme, and the results demonstrate that this method outperforms the state-of-the-art high dimensional search techniques such as the X-Tree, VA-file, !Distance and NB-Tree, especially when the query radius is not very large.
AB -
Due to the well-known dimensionality curse problem, search in a high-dimensional space is considered as a "hard" problem. In this paper, a novel symmetrical encoding-based index structure, which is called EHD-Tree (for symmetrical Encoding-based Hybrid Distance Tree), is proposed to support fast k-Nearest-Neighbor (k-NN) search in high-dimensional spaces. In an EHD-Tree, all data points are first grouped into clusters by a k-Means clustering algorithm. Then the uniform ID number of each data point is obtained by a dual-distance-driven encoding scheme in which each cluster sphere is partitioned twice according to the dual distances of start- and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined to get a uniform index key, the latter is then indexed through a partition-based B
+
-tree. Thus, given a query point, its k-NN search in high-dimensional spaces can be transformed into search in a single dimensional space with the aid of the EHD-Tree index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of our proposed scheme, and the results demonstrate that this method outperforms the state-of-the-art high dimensional search techniques such as the X-Tree, VA-file, !Distance and NB-Tree, especially when the query radius is not very large.
UR - http://www.scopus.com/inward/record.url?scp=43349099618&partnerID=8YFLogxK
U2 - 10.1145/1353343.1353375
DO - 10.1145/1353343.1353375
M3 - Conference article published in proceeding or book
AN - SCOPUS:43349099618
SN - 9781595939265
T3 - Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings
SP - 241
EP - 251
BT - Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings
Y2 - 25 March 2008 through 29 March 2008
ER -