Optimizing Quality for Probabilistic Skyline Computation and Probabilistic Similarity Search

X. Miao, Y. Gao, L. Zhou, W. Wang, Qing Li

Research output: Journal article publication › Journal article › Academic research › peer-review

3 Citations (Scopus)

Abstract

© 1989-2012 IEEE. Probabilistic queries have been extensively explored to provide answers with confidence, in order to support real-life applications that struggle with uncertain data, such as sensor networks and data integration. However, the uncertainty of data may propagate, so the results returned by probabilistic queries contain much noise, which significantly degrades query quality. In this paper, we propose an efficient optimization framework, termed QueryClean, for both probabilistic skyline computation and probabilistic similarity search. The goal of QueryClean is to optimize query quality by selecting a group of uncertain objects to clean under a limited resource budget, using a joint-entropy-based quality function. We develop an efficient structure called ASI to index the possible result sets of probabilistic queries, which helps to avoid many types of probabilistic query evaluations over a large number of possible worlds during quality computation. Moreover, we present exact and approximate algorithms for the optimization problem, using two newly presented heuristics. Extensive experiments on both real and synthetic data sets demonstrate the efficiency and scalability of the proposed QueryClean framework.
Original language: English
Article number: 8291012
Pages (from-to): 1741-1755
Number of pages: 15
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 30
Issue number: 9
Publication status: Published - 1 Sep 2018
Externally published: Yes

Keywords

  • optimization algorithms
  • probabilistic similarity query
  • probabilistic skyline query
  • query quality

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
