Answering Skyline Queries over Incomplete Data with Crowdsourcing

Xiaoye Miao, Yunjun Gao, Su Guo, Lu Chen, Jianwei Yin, Qing Li

Research output: Journal article publicationJournal articleAcademic researchpeer-review

12 Citations (Scopus)


Due to the pervasiveness of incomplete data, incomplete data queries are vital in a large number of real-life scenarios. Current models and approaches for incomplete data queries mainly rely on the machine power. In this paper, we study the problem of skyline queries over incomplete data with crowdsourcing. We propose a novel query framework, termed as ${\sf BayesCrowd}$BayesCrowd, which takes into account the data correlation using the Bayesian network. We leverage the typical c-Table model on incomplete data to represent objects. Considering budget and latency constraints, we present a suite of effective task selection strategies. Moreover, we introduce a marginal utility function to measure the benefit of crowdsourcing one task. In particular, the probability computation of each object being an answer object is at least as hard as #SAT problem. To this end, we propose an adaptive DPLL (i.e., Davis-Putnam-Logemann-Loveland) algorithm to speed up the computation. Extensive experiments using both real and synthetic data sets confirm the superiority of ${\sf BayesCrowd}$BayesCrowd to the state-of-The-Art method, in terms of execution time, monetary cost, and latency minimization.

Original languageEnglish
Article number8865657
Pages (from-to)1360-1374
Number of pages15
JournalIEEE Transactions on Knowledge and Data Engineering
Issue number4
Publication statusPublished - 1 Apr 2021


  • crowdsourcing
  • incomplete data
  • Query processing
  • skyline query

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics


Dive into the research topics of 'Answering Skyline Queries over Incomplete Data with Crowdsourcing'. Together they form a unique fingerprint.

Cite this