Abstract
In this work we study the distance distribution computation problem. It has been widely used in many real-world applications, e.g., human genome clustering, cosmological model analysis, and parameter tuning. The straightforward solution for the exact distance distribution computation problem is unacceptably slow due to (i) massive data size, and (ii) expensive distance computation. In this paper, we propose a novel method to compute approximate distance distributions with error bound guarantees. Furthermore, our method is generic to different distance measures. We conduct extensive experimental studies on three widely used distance measures with real-world datasets. The experimental results demonstrate that our proposed method outperforms sampling-based solution (without error guarantees) by up to three orders of magnitude.
Original language | English |
---|---|
Pages (from-to) | 5364-5377 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 34 |
Issue number | 11 |
Publication status | Published - Nov 2022 |
Keywords
- Analytical models
- Bioinformatics
- Computational modeling
- cumulative distance distribution
- error-bound guaranteed approximation
- Genomics
- Histograms
- lower and upper bounds
- Time series analysis
- Trajectory
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics