TY - JOUR
T1 - Answering Skyline Queries over Incomplete Data with Crowdsourcing
AU - Miao, Xiaoye
AU - Gao, Yunjun
AU - Guo, Su
AU - Chen, Lu
AU - Yin, Jianwei
AU - Li, Qing
N1 - Funding Information:
This work was supported in part by the National Key Research and Development Program of China under Grants No. 2018YFB1004003 and 2017YFB1400603, NSFC Grants No. 61972338, 61902343, 61825205, 61772459, and U1609217, National Science and Technology Major Project of China under Grant No. 50-D36B02-9002-16/19, the ZJU-Hikvision Joint Project, and the Fundamental Research Funds for the Central Universities. Yunjun Gao is the corresponding author of the work.
Publisher Copyright:
© 1989-2012 IEEE.
PY - 2021/4/1
Y1 - 2021/4/1
N2 - Due to the pervasiveness of incomplete data, incomplete data queries are vital in a large number of real-life scenarios. Current models and approaches for incomplete data queries mainly rely on machine power. In this paper, we study the problem of skyline queries over incomplete data with crowdsourcing. We propose a novel query framework, termed BayesCrowd, which takes into account the data correlation using a Bayesian network. We leverage the typical c-Table model on incomplete data to represent objects. Considering budget and latency constraints, we present a suite of effective task selection strategies. Moreover, we introduce a marginal utility function to measure the benefit of crowdsourcing one task. In particular, computing the probability that an object is an answer object is at least as hard as the #SAT problem. To this end, we propose an adaptive DPLL (i.e., Davis-Putnam-Logemann-Loveland) algorithm to speed up the computation. Extensive experiments using both real and synthetic data sets confirm the superiority of BayesCrowd over the state-of-the-art method, in terms of execution time, monetary cost, and latency minimization.
AB - Due to the pervasiveness of incomplete data, incomplete data queries are vital in a large number of real-life scenarios. Current models and approaches for incomplete data queries mainly rely on machine power. In this paper, we study the problem of skyline queries over incomplete data with crowdsourcing. We propose a novel query framework, termed BayesCrowd, which takes into account the data correlation using a Bayesian network. We leverage the typical c-Table model on incomplete data to represent objects. Considering budget and latency constraints, we present a suite of effective task selection strategies. Moreover, we introduce a marginal utility function to measure the benefit of crowdsourcing one task. In particular, computing the probability that an object is an answer object is at least as hard as the #SAT problem. To this end, we propose an adaptive DPLL (i.e., Davis-Putnam-Logemann-Loveland) algorithm to speed up the computation. Extensive experiments using both real and synthetic data sets confirm the superiority of BayesCrowd over the state-of-the-art method, in terms of execution time, monetary cost, and latency minimization.
KW - crowdsourcing
KW - incomplete data
KW - query processing
KW - skyline query
UR - http://www.scopus.com/inward/record.url?scp=85102900060&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2019.2946798
DO - 10.1109/TKDE.2019.2946798
M3 - Journal article
AN - SCOPUS:85102900060
SN - 1041-4347
VL - 33
SP - 1360
EP - 1374
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 4
M1 - 8865657
ER -