Fast Error-Bounded Distance Distribution Computation

Jiahao Zhang, Man Lung Yiu, Bo Tang, Qing Li

Research output: Journal article publicationJournal articleAcademic researchpeer-review

Abstract

In this work we study the distance distribution computation problem. It has been widely used in many real-world applications, e.g., human genome clustering, cosmological model analysis, and parameter tuning. The straightforward solution for the exact distance distribution computation problem is unacceptably slow due to (i) massive data size, and (ii) expensive distance computation. In this paper, we propose a novel method to compute approximate distance distributions with error bound guarantees. Furthermore, our method is generic to different distance measures. We conduct extensive experimental studies on three widely used distance measures with real-world datasets. The experimental results demonstrate that our proposed method outperforms sampling-based solution (without error guarantees) by up to three orders of magnitude.

Original languageEnglish
Pages (from-to)5364-5377
JournalIEEE Transactions on Knowledge and Data Engineering
Volume34
Issue number11
Publication statusPublished - Nov 2022

Keywords

  • Analytical models
  • Bioinformatics
  • Computational modeling
  • cumulative distance distribution
  • error-bound guaranteed approximation
  • Genomics
  • Histograms
  • lower and upper bounds
  • Time series analysis
  • Trajectory

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

Fingerprint

Dive into the research topics of 'Fast Error-Bounded Distance Distribution Computation'. Together they form a unique fingerprint.

Cite this