TY - GEN
T1 - Multi-query optimization for distributed similarity query processing
AU - Zhuang, Yi
AU - Li, Qing
AU - Chen, Lei
PY - 2008/9/22
Y1 - 2008/9/22
N2 - This paper considers a multi-query optimization issue for distributed similarity query processing, which attempts to exploit the dependencies in the derivation of a query evaluation plan. To the best of our knowledge, this is the first work investigating a multi-query optimization technique for distributed similarity query processing (MDSQ). Four steps are incorporated in our MDSQ algorithm. First, when a number of query requests (i.e., m query vectors and m radii) are simultaneously submitted by users, a cost-based dynamic query scheduling (DQS) procedure is invoked to quickly and effectively identify the correlation among the query spheres (requests). After that, an index-based vector set reduction is performed at the data node level in parallel. Finally, a refinement process of the candidate vectors is conducted to get the answer set. The proposed method includes cost-based dynamic query scheduling, a Start-Distance (SD)-based load balancing scheme, and an index-based vector set reduction algorithm. The experimental results validate the efficiency and effectiveness of the algorithm in minimizing the response time and increasing the parallelism of I/O and CPU.
AB - This paper considers a multi-query optimization issue for distributed similarity query processing, which attempts to exploit the dependencies in the derivation of a query evaluation plan. To the best of our knowledge, this is the first work investigating a multi-query optimization technique for distributed similarity query processing (MDSQ). Four steps are incorporated in our MDSQ algorithm. First, when a number of query requests (i.e., m query vectors and m radii) are simultaneously submitted by users, a cost-based dynamic query scheduling (DQS) procedure is invoked to quickly and effectively identify the correlation among the query spheres (requests). After that, an index-based vector set reduction is performed at the data node level in parallel. Finally, a refinement process of the candidate vectors is conducted to get the answer set. The proposed method includes cost-based dynamic query scheduling, a Start-Distance (SD)-based load balancing scheme, and an index-based vector set reduction algorithm. The experimental results validate the efficiency and effectiveness of the algorithm in minimizing the response time and increasing the parallelism of I/O and CPU.
UR - http://www.scopus.com/inward/record.url?scp=51849168957&partnerID=8YFLogxK
U2 - 10.1109/ICDCS.2008.58
DO - 10.1109/ICDCS.2008.58
M3 - Conference article published in proceeding or book
AN - SCOPUS:51849168957
SN - 9780769531724
T3 - Proceedings - The 28th International Conference on Distributed Computing Systems, ICDCS 2008
SP - 639
EP - 646
BT - Proceedings - The 28th International Conference on Distributed Computing Systems, ICDCS 2008
T2 - 28th International Conference on Distributed Computing Systems, ICDCS 2008
Y2 - 17 July 2008 through 20 July 2008
ER -