On Traffic-Aware Partition and Aggregation in MapReduce for Big Data Applications

Huan Ke, Peng Li, Song Guo, Minyi Guo

Research output: Journal article publication › Journal article › Academic research › peer-review

37 Citations (Scopus)

Abstract

The MapReduce programming model simplifies large-scale data processing on commodity clusters by exploiting parallel map tasks and reduce tasks. Although many efforts have been made to improve the performance of MapReduce jobs, they ignore the network traffic generated in the shuffle phase, which plays a critical role in performance enhancement. Traditionally, a hash function is used to partition intermediate data among reduce tasks; this is not traffic-efficient, because neither the network topology nor the data size associated with each key is taken into consideration. In this paper, we study how to reduce the network traffic cost of a MapReduce job by designing a novel intermediate data partition scheme. Furthermore, we jointly consider the aggregator placement problem, where each aggregator can reduce the merged traffic from multiple map tasks. A decomposition-based distributed algorithm is proposed to deal with the large-scale optimization problem for big data applications, and an online algorithm is also designed to adjust data partition and aggregation in a dynamic manner. Finally, extensive simulation results demonstrate that our proposals can significantly reduce network traffic cost in both offline and online cases.
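The contrast the abstract draws between hash partitioning and traffic-aware partitioning can be illustrated with a minimal sketch. This is not the paper's decomposition-based algorithm: it ignores aggregators and reducer load balance, and simply assigns each key to the reducer that minimizes data size times distance. The node names, data sizes, and hop distances below are hypothetical toy values, and `traffic_cost`, `hash_partition`, and `greedy_partition` are illustrative helper names.

```python
from zlib import crc32

# Hypothetical toy input: size of intermediate data per key on each map node.
data = {
    "m1": {"a": 10, "b": 1},
    "m2": {"a": 1, "b": 10},
}
# Hypothetical hop distances between map nodes and reduce nodes.
dist = {
    ("m1", "r1"): 1, ("m1", "r2"): 3,
    ("m2", "r1"): 3, ("m2", "r2"): 1,
}
reducers = ["r1", "r2"]
keys = ["a", "b"]


def traffic_cost(assign):
    """Total cost: data size times distance, summed over map nodes and keys."""
    return sum(
        size * dist[(m, assign[k])]
        for m, per_key in data.items()
        for k, size in per_key.items()
    )


def hash_partition(keys):
    """Topology-oblivious baseline: reducer chosen by a hash of the key."""
    return {k: reducers[crc32(k.encode()) % len(reducers)] for k in keys}


def greedy_partition(keys):
    """Assumed simplification: pick, per key, the reducer with the lowest cost."""
    return {
        k: min(
            reducers,
            key=lambda r: sum(data[m].get(k, 0) * dist[(m, r)] for m in data),
        )
        for k in keys
    }


hash_cost = traffic_cost(hash_partition(keys))
greedy_cost = traffic_cost(greedy_partition(keys))
print("hash:", hash_cost, "traffic-aware:", greedy_cost)
```

In this toy setting each key is sent to the reducer closest to where most of its data resides, so the traffic-aware cost is 26 and can never exceed the hash-based cost. Without a load-balance constraint the per-key choice is independent, which is why a greedy rule suffices here; the joint problem with aggregator placement described in the abstract is harder and is what motivates the decomposition approach.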
Original language: English
Article number: 7079380
Pages (from-to): 818-828
Number of pages: 11
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 27
Issue number: 3
Publication status: Published - 1 Mar 2016
Externally published: Yes

Keywords

  • aggregation
  • big data
  • Lagrangian decomposition
  • MapReduce
  • partition

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
