Joint scheduling of MapReduce jobs with servers: Performance bounds and experiments

Yi Yuan, Dan Wang, Jiangchuan Liu

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

18 Citations (Scopus)

Abstract

MapReduce has achieved tremendous success for large-scale data processing in data centers. A key feature distinguishing MapReduce from previous parallel models is that it interleaves parallel and sequential computation. Past schemes, and especially their theoretical bounds, on general parallel models are therefore, unlikely to be applied to MapReduce directly. There are many recent studies on MapReduce job and task scheduling. These studies assume that the servers are assigned in advance. In current data centers, multiple MapReduce jobs of different importance levels run together. In this paper, we investigate a schedule problem for MapReduce taking server assignment into consideration as well. We formulate a MapReduce server-job organizer problem (MSJO) and show that it is NP-complete. We develop a 3-approximation algorithm and a fast heuristic. We evaluate our algorithms through both simulations and experiments on Amazon EC2 with an implementation in Hadoop. The results confirm the advantage of our algorithms.
Original languageEnglish
Title of host publicationIEEE INFOCOM 2014 - IEEE Conference on Computer Communications
PublisherIEEE
Pages2175-2183
Number of pages9
ISBN (Print)9781479933600
DOIs
Publication statusPublished - 1 Jan 2014
Event33rd IEEE Conference on Computer Communications, IEEE INFOCOM 2014 - Toronto, ON, Canada
Duration: 27 Apr 20142 May 2014

Conference

Conference33rd IEEE Conference on Computer Communications, IEEE INFOCOM 2014
CountryCanada
CityToronto, ON
Period27/04/142/05/14

ASJC Scopus subject areas

  • Computer Science(all)
  • Electrical and Electronic Engineering

Cite this