Abstract
Probabilistic top-k ranking is an important and well-studied query operator in uncertain databases. However, the quality of top-k results might be heavily affected by the ambiguity and uncertainty of the underlying data. Uncertainty reduction techniques have been proposed to improve the quality of top-k results by cleaning the original data. Unfortunately, most data cleaning models aim to probe the exact values of the objects individually and therefore do not work well for subjective data types, such as user ratings, which are inherently probabilistic. In this paper, we propose a novel pairwise crowdsourcing model to reduce the uncertainty of top-k ranking using a crowd of domain experts. Given a crowdsourcing task of limited budget, we propose efficient algorithms to select the best object pairs for crowdsourcing that will bring in the highest quality improvement. Extensive experiments show that our proposed solutions outperform a random selection method by up to 30 times in terms of quality improvement of probabilistic top-k ranking queries. In terms of efficiency, our proposed solutions can reduce the elapsed time of a brute-force algorithm from several days to one minute.
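To make the idea of pair selection concrete, the following is a minimal sketch of a brute-force baseline on a toy dataset, not the algorithms proposed in the paper. It rests on several assumptions that the abstract does not state: each object has an independent discrete score distribution, the quality of a top-k result is measured by the Shannon entropy of the distribution over possible top-k lists, and the crowd always answers a pairwise comparison correctly. All function names and the example data are hypothetical.

```python
from itertools import combinations, product
from math import log2


def possible_worlds(objects):
    """Return a list of (scores, probability) over all joint score assignments."""
    names = sorted(objects)
    worlds = []
    for combo in product(*(objects[n].items() for n in names)):
        scores, prob = {}, 1.0
        for name, (score, p) in zip(names, combo):
            scores[name] = score
            prob *= p  # assumption: object scores are independent
        worlds.append((scores, prob))
    return worlds


def rank_key(scores, name):
    """Sort key: higher score first, ties broken by object name."""
    return (-scores[name], name)


def topk_distribution(worlds, k):
    """Aggregate world probabilities by the top-k list each world produces."""
    dist = {}
    for scores, prob in worlds:
        result = tuple(sorted(scores, key=lambda n: rank_key(scores, n))[:k])
        dist[result] = dist.get(result, 0.0) + prob
    return dist


def entropy(dist):
    """Shannon entropy (bits) of a possibly unnormalized distribution."""
    total = sum(dist.values())
    return -sum((p / total) * log2(p / total) for p in dist.values() if p > 0)


def expected_entropy_after(worlds, k, a, b):
    """Expected top-k entropy once the crowd says whether a ranks above b."""
    yes = [(s, p) for s, p in worlds if rank_key(s, a) < rank_key(s, b)]
    no = [(s, p) for s, p in worlds if rank_key(s, a) > rank_key(s, b)]
    expected = 0.0
    for part in (yes, no):
        mass = sum(p for _, p in part)
        if mass > 0:
            expected += mass * entropy(topk_distribution(part, k))
    return expected


def best_pair(objects, k):
    """Brute force: try every pair and keep the largest expected entropy drop."""
    worlds = possible_worlds(objects)
    base = entropy(topk_distribution(worlds, k))
    gains = {
        (a, b): base - expected_entropy_after(worlds, k, a, b)
        for a, b in combinations(sorted(objects), 2)
    }
    pair = max(gains, key=gains.get)
    return pair, gains[pair], base


# Hypothetical toy data: object -> {score: probability}
objects = {
    "A": {9: 0.5, 3: 0.5},
    "B": {6: 1.0},
    "C": {5: 0.5, 1: 0.5},
}
pair, gain, base = best_pair(objects, k=1)
print(pair, gain, base)
```

In this toy instance the top-1 result is either A or B with equal probability, so comparing A with B removes all of the uncertainty, whereas comparing B with C removes none. Enumerating possible worlds is exponential in the number of objects, which is consistent with the abstract's remark that a brute-force approach can take days on realistic inputs.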
Original language | English |
---|---|
Article number | 7954652 |
Pages (from-to) | 2290-2303 |
Number of pages | 14 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 29 |
Issue number | 10 |
Publication status | Published - 1 Oct 2017 |
Keywords
- Crowdsourcing
- Top-k ranking
- Uncertain data management
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics