TY - GEN
T1 - Venn sampling
T2 - 21st International Conference on Data Engineering, ICDE 2005
AU - Tao, Yufei
AU - Zhai, Jian
AU - Papadias, Dimitris
AU - Li, Qing
PY - 2005/12/12
Y1 - 2005/12/12
N2 - Given a region qR and a future timestamp qT, a "range aggregate" query estimates the number of objects expected to appear in qR at time qT. Currently the only methods for processing such queries are based on spatio-temporal histograms, which have several serious problems. First, they consume considerable space in order to provide accurate estimation. Second, they incur high evaluation cost. Third, their efficiency continuously deteriorates with time. Fourth, their maintenance requires significant update overhead. Motivated by this, we develop Venn sampling (VS), a novel estimation method optimized for a set of "pivot queries" that reflect the distribution of actual ones. In particular, given m pivot queries, VS achieves perfect estimation with only O(m) samples, as opposed to O(2^m) required by the current state of the art in workload-aware sampling. Compared with histograms, our technique is much more accurate (given the same space), produces estimates with negligible cost, and does not deteriorate with time. Furthermore, it permits the development of a novel "query-driven" update policy, which reduces the update cost of conventional policies significantly.
AB - Given a region qR and a future timestamp qT, a "range aggregate" query estimates the number of objects expected to appear in qR at time qT. Currently the only methods for processing such queries are based on spatio-temporal histograms, which have several serious problems. First, they consume considerable space in order to provide accurate estimation. Second, they incur high evaluation cost. Third, their efficiency continuously deteriorates with time. Fourth, their maintenance requires significant update overhead. Motivated by this, we develop Venn sampling (VS), a novel estimation method optimized for a set of "pivot queries" that reflect the distribution of actual ones. In particular, given m pivot queries, VS achieves perfect estimation with only O(m) samples, as opposed to O(2^m) required by the current state of the art in workload-aware sampling. Compared with histograms, our technique is much more accurate (given the same space), produces estimates with negligible cost, and does not deteriorate with time. Furthermore, it permits the development of a novel "query-driven" update policy, which reduces the update cost of conventional policies significantly.
UR - http://www.scopus.com/inward/record.url?scp=28444488874&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2005.151
DO - 10.1109/ICDE.2005.151
M3 - Conference article published in proceeding or book
AN - SCOPUS:28444488874
SN - 0769522858
T3 - Proceedings - International Conference on Data Engineering
SP - 680
EP - 691
BT - Proceedings - 21st International Conference on Data Engineering, ICDE 2005
Y2 - 5 April 2005 through 8 April 2005
ER -