Traffic-aware geo-distributed Big data analytics with predictable job completion time

Peng Li, Song Guo, Toshiaki Miyazaki, Xiaofei Liao, Hai Jin, Albert Y. Zomaya, Kun Wang

Research output: Journal article publication, Journal article, Academic research, peer-review

40 Citations (Scopus)

Abstract

Big data analytics has attracted close attention from both industry and academia because of its great benefits in cost reduction and better decision making. With the fast growth of various global services, there is an increasing need for big data analytics across multiple data centers (DCs) located in different countries or regions. This calls for a cross-DC data processing platform optimized for the geo-distributed computing environment. Although some recent efforts have been made toward geo-distributed big data analytics, they cannot guarantee predictable job completion time and would incur excessive traffic over the inter-DC network, a scarce resource shared by many applications. In this paper, we study how to minimize the inter-DC traffic generated by MapReduce jobs targeting geo-distributed big data while providing predictable job completion time. To achieve this goal, we formulate an optimization problem by jointly considering input data movement and task placement. Furthermore, we guarantee predictable job completion time by applying the chance-constrained optimization technique, such that a MapReduce job can finish within a predefined job completion time with high probability. To evaluate the performance of our proposal, we conduct extensive simulations using real traces generated by a set of queries on Hive. The results show that our proposal can reduce inter-DC traffic by 55 percent compared with centralized processing, which aggregates all data at a single data center.
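As a rough illustration of the chance-constrained idea mentioned in the abstract, the sketch below picks, among candidate placements, the one with the least inter-DC traffic whose completion time meets a deadline with probability at least 1 - epsilon. It is not the paper's formulation: the normal-distribution assumption, the function names, and the candidate numbers are all hypothetical and chosen only to keep the example small.

```python
from statistics import NormalDist


def satisfies_chance_constraint(mean_s, std_s, deadline_s, epsilon):
    """Check P(T <= deadline_s) >= 1 - epsilon.

    Assumption (not from the paper): completion time T is normally
    distributed with the given mean and standard deviation, in seconds.
    """
    # For a normal T the probabilistic constraint is equivalent to the
    # deterministic condition mean + z * std <= deadline, where z is the
    # (1 - epsilon) quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - epsilon)
    return mean_s + z * std_s <= deadline_s


def pick_placement(candidates, deadline_s, epsilon):
    """Return the feasible candidate with the least inter-DC traffic.

    candidates maps a hypothetical placement name to a tuple
    (traffic_gb, mean_s, std_s). Returns None if no candidate is feasible.
    """
    feasible = {
        name: traffic
        for name, (traffic, mean_s, std_s) in candidates.items()
        if satisfies_chance_constraint(mean_s, std_s, deadline_s, epsilon)
    }
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


# Hypothetical candidates: low traffic but variable, or more traffic but steady.
candidates = {
    "dc_a": (10.0, 100.0, 20.0),
    "dc_b": (25.0, 90.0, 5.0),
}
choice = pick_placement(candidates, deadline_s=120.0, epsilon=0.05)
```

A stricter epsilon rules out the low-traffic but high-variance candidate, which is the trade-off between traffic and predictability that the abstract describes.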

Original language: English
Article number: 7738559
Pages (from-to): 1785-1796
Number of pages: 12
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 28
Issue number: 6
Publication status: Published - Jun 2017

Keywords

  • Big data
  • Geo-distributed
  • MapReduce
  • Traffic-aware

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
