Cross-cloud MapReduce for Big Data

Peng Li, Song Guo, Shui Yu, Weihua Zhuang

Research output: Journal article publicationJournal articleAcademic researchpeer-review

1 Citation (Scopus)

Abstract

MapReduce plays a critical role as a leading framework for big data analytics. In this paper, we consider a geo-distributed cloud architecture that provides MapReduce services based on the big data collected from end users all over the world. Existing work handles MapReduce jobs by a traditional computation-centric approach that all input data distributed in multiple clouds are aggregated to a virtual cluster that resides in a single cloud. Its poor efficiency and high cost for big data support motivate us to propose a novel data-centric architecture with three key techniques, namely, cross-cloud virtual cluster, data-centric job placement, and network coding based traffic routing. Our design leads to an optimization framework with the objective of minimizing both computation and transmission cost for running a set of MapReduce jobs in geo-distributed clouds. We further design a parallel algorithm by decomposing the original large-scale problem into several distributively solvable subproblems that are coordinated by a high-level master problem. Finally, we conduct real-world experiments and extensive simulations to show that our proposal significantly outperforms the existing works.

Original languageEnglish
Article number7229313
Pages (from-to)375-386
Number of pages12
JournalIEEE Transactions on Cloud Computing
Volume8
Issue number2
DOIs
Publication statusPublished - 1 Apr 2020
Externally publishedYes

Keywords

  • Big data
  • Cloud
  • MapReduce
  • Optimization
  • Parallel algorithm

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Computer Science Applications
  • Computer Networks and Communications

Cite this