Efficient Data Placement and Replication for QoS-Aware Approximate Query Evaluation of Big Data Analytics

Qiufen Xia, Zichuan Xu, Weifa Liang, Shui Yu, Song Guo, Albert Y. Zomaya

Research output: Journal article publicationJournal articleAcademic researchpeer-review

7 Citations (Scopus)

Abstract

Enterprise users at different geographic locations generate large-volume data that is stored at different geographic datacenters. These users may also perform big data analytics on the stored data to identify valuable information in order to make strategic decisions. However, it is well known that performing big data analytics on data in geographical-located datacenters usually is time-consuming and costly. In some delay-sensitive applications, the query result may become useless if answering a query takes too long time. Instead, sometimes users may only be interested in timely approximate rather than exact query results. When such approximate query evaluation is the case, applications must sacrifice timeliness to get more accurate evaluation results or tolerate evaluation result with a guaranteed error bound obtained from analyzing the samples of the data to meet their stringent timeline. In this paper, we study quality-of-service (QoS)-aware data replication and placement for approximate query evaluation of big data analytics in a distributed cloud, where the original (source) data of a query is distributed at different geo-distributed datacenters. We focus on the problems of placing data samples of the source data at some strategic datacenters to meet stringent query delay requirements of users, by exploring a non-trivial trade-off between the cost of query evaluation and the error bound of the evaluation result. We first propose an approximation algorithm with a provable approximation ratio for a single approximate query. We then develop an efficient heuristic algorithm for evaluating a set of approximate queries with the aim to minimize the evaluation cost while meeting the delay requirements of these queries. We finally demonstrate the effectiveness and efficiency of the proposed algorithms through both experimental simulations and implementations in a real test-bed, real datasets are employed. Experimental results show that the proposed algorithms are promising.

Original languageEnglish
Article number8732398
Pages (from-to)2677-2691
Number of pages15
JournalIEEE Transactions on Parallel and Distributed Systems
Volume30
Issue number12
DOIs
Publication statusPublished - 1 Dec 2019

Keywords

  • algorithm analysis
  • approximate query evaluation
  • approximation algorithms
  • big data analytics
  • Data replication and placement

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics

Cite this