Aggregation on the fly: Reducing traffic for big data in the cloud

Huan Ke, Peng Li, Song Guo, Ivan Stojmenovic

Research output: Journal article publication › Journal article › Academic research › peer-review

10 Citations (Scopus)

Abstract

As a leading framework for processing and analyzing big data, MapReduce is used by many enterprises to parallelize data processing on distributed computing systems. Unfortunately, the all-to-all data forwarding from map tasks to reduce tasks in the traditional MapReduce framework generates a large amount of network traffic. In many applications, the intermediate data generated by map tasks can be combined, which reduces traffic significantly; this motivates us to propose a data aggregation scheme for MapReduce jobs in the cloud. Specifically, we design an aggregation architecture under the existing MapReduce framework with the objective of minimizing data traffic during the shuffle phase, in which aggregators can reside anywhere in the cloud. Experimental results show that our proposal outperforms existing work by significantly reducing network traffic.
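
The traffic argument in the abstract can be illustrated with a toy word count. The sketch below is not the paper's architecture (which allows aggregators to reside anywhere in the cloud); it assumes the simplest case, where each map task's output is aggregated before being forwarded, and it uses the number of forwarded records as a rough stand-in for network traffic. All function names are hypothetical.

```python
from collections import Counter


def map_task(text):
    """Emit one (word, 1) pair per word, as a word-count map task would."""
    return [(word, 1) for word in text.split()]


def aggregate(pairs):
    """Merge pairs that share a key by summing their values (assumed aggregator)."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def shuffle_records(map_outputs):
    """Count the records forwarded to reduce tasks (a proxy for shuffle traffic)."""
    return sum(len(output) for output in map_outputs)


def reduce_all(map_outputs):
    """Sum values per key across all map outputs, as the reduce tasks would."""
    totals = Counter()
    for output in map_outputs:
        for key, value in output:
            totals[key] += value
    return dict(totals)


# Two hypothetical input splits, one per map task.
splits = ["a b a a", "b b a c"]

raw = [map_task(split) for split in splits]
combined = [aggregate(pairs) for pairs in raw]

raw_traffic = shuffle_records(raw)
aggregated_traffic = shuffle_records(combined)

print(raw_traffic, aggregated_traffic)
print(reduce_all(raw) == reduce_all(combined))
```

Aggregation only preserves the result when the reduce function is associative and commutative, as summation is here; the saving grows with the number of repeated keys in each map task's output.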
Original language: English
Article number: 7293300
Pages (from-to): 17-23
Number of pages: 7
Journal: IEEE Network
Volume: 29
Issue number: 5
Publication status: Published - 1 Sep 2015
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Computer Networks and Communications
