Fast Error-Bounded Distance Distribution Computation

Jiahao Zhang, Man Lung Yiu, Bo Tang, Qing Li

Research output: Journal article publicationJournal articleAcademic researchpeer-review

Abstract

In this work we study the distance distribution computation problem. It has been widely used in many real-world applications, e.g., human genome clustering, cosmological model analysis, and parameter tuning. The straightforward solution for the exact distance distribution computation problem is unacceptably slow due to (i) massive data size, and (ii) expensive distance computation. In this paper, we propose a novel method to compute approximate distance distributions with error bound guarantees. Furthermore, our method is generic to different distance measures. We conduct extensive experimental studies on three widely used distance measures with real-world datasets. The experimental results demonstrate that our proposed method outperforms sampling-based solution (without error guarantees) by up to three orders of magnitude.

Original languageEnglish
Pages (from-to)1-14
JournalIEEE Transactions on Knowledge and Data Engineering
DOIs
Publication statusAccepted/In press - 2021

Keywords

  • Analytical models
  • Bioinformatics
  • Computational modeling
  • cumulative distance distribution
  • error-bound guaranteed approximation
  • Genomics
  • Histograms
  • lower and upper bounds
  • Time series analysis
  • Trajectory

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this