SAMR: A self-adaptive MapReduce scheduling algorithm in heterogeneous environment

Quan Chen, Daqiang Zhang, Minyi Guo, Qianni Deng, Song Guo

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

147 Citations (Scopus)

Abstract

Hadoop is seriously limited by its MapReduce scheduler, which does not scale well in heterogeneous environments. A heterogeneous environment is characterized by devices that vary greatly in computation and communication capacity, architecture, memory, and power. As an important extension of Hadoop, the LATE MapReduce scheduling algorithm takes heterogeneous environments into consideration. However, it falls short of solving a crucial problem: poor performance caused by the static manner in which it computes the progress of tasks. Consequently, neither the Hadoop scheduler nor the LATE scheduler is desirable in heterogeneous environments. To this end, we propose SAMR, a Self-Adaptive MapReduce scheduling algorithm, which calculates the progress of tasks dynamically and adapts automatically to the continuously varying environment. When a job is committed, SAMR splits it into many fine-grained map and reduce tasks and assigns them to a series of nodes. Meanwhile, it reads historical information, which is stored on every node and updated after every execution. SAMR then adjusts the time weight of each stage of the map and reduce tasks according to this historical information. In this way, it computes the progress of each task accurately and determines which tasks need backup tasks. Moreover, it identifies slow nodes and dynamically classifies them into sets of slow nodes. Using the information about these slow nodes, SAMR does not launch backup tasks on them, which ensures that the backup tasks do not themselves become slow tasks. For each fine-grained task, SAMR takes the final result from whichever finishes first, the slow task or its backup task. The proposed algorithm is evaluated through extensive experiments in various heterogeneous environments. Experimental results show that SAMR reduces execution time by up to 25% compared with Hadoop's scheduler and by up to 14% compared with the LATE scheduler.
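The abstract describes SAMR's pipeline only in prose: per-node historical stage weights, a weighted progress score, detection of slow tasks, and refusal to place backups on slow nodes. The Python sketch below illustrates that general idea under stated assumptions. The stage split (map = map + merge, reduce = copy + sort + reduce), the exponential blending of historical weights, the "below (1 - cap) times the mean rate" slowness test, and every name and constant (`NodeHistory`, `choose_backup`, `HISTORY_WEIGHT`, `SLOW_TASK_CAP`, `SLOW_NODE_CAP`) are hypothetical illustration choices, not the paper's published formulas and not Hadoop APIs.

```python
from dataclasses import dataclass, field

# Hypothetical constants for illustration only (not values from the SAMR paper).
HISTORY_WEIGHT = 0.2   # share given to the newest run when updating stored weights
SLOW_TASK_CAP = 0.3    # a task is "slow" if its rate < (1 - cap) * mean task rate
SLOW_NODE_CAP = 0.3    # a node is "slow" if its rate < (1 - cap) * mean node rate


@dataclass
class NodeHistory:
    """Per-node stage weights, assumed to be stored on the node and refreshed after each run.

    Assumed stage layout: map = [map, merge], reduce = [copy, sort, reduce].
    """
    weights: dict = field(default_factory=lambda: {
        "map": [0.5, 0.5],
        "reduce": [1 / 3, 1 / 3, 1 / 3],
    })

    def update(self, kind, stage_durations):
        """Blend the observed time share of each stage into the stored weights.

        Both the old weights and the observed shares sum to 1, so the blend does too.
        """
        total = sum(stage_durations)
        if total <= 0:
            return  # nothing observed; keep the stored weights
        old = self.weights[kind]
        self.weights[kind] = [
            (1 - HISTORY_WEIGHT) * w + HISTORY_WEIGHT * (d / total)
            for w, d in zip(old, stage_durations)
        ]


def progress_score(weights, stage, sub_progress):
    """Sum of finished-stage weights plus the weighted progress of the current stage."""
    return sum(weights[:stage]) + weights[stage] * sub_progress


def time_to_end(score, elapsed):
    """Remaining time, assuming the task keeps its average progress rate so far."""
    if score <= 0 or elapsed <= 0:
        return float("inf")  # no progress observed yet
    rate = score / elapsed
    return (1 - score) / rate


def below_mean(rates, cap):
    """Return the keys whose rate is below (1 - cap) times the mean rate."""
    if not rates:
        return set()
    mean = sum(rates.values()) / len(rates)
    return {k for k, r in rates.items() if r < (1 - cap) * mean}


def choose_backup(tasks, node, node_rates):
    """Pick a running task to back up on `node`, or None.

    tasks: {task_id: (progress_score, elapsed_seconds)}
    node_rates: {node_id: average progress rate of tasks recently run there}
    """
    if node in below_mean(node_rates, SLOW_NODE_CAP):
        return None  # never place a backup on a node classified as slow
    rates = {t: (s / e if e > 0 else 0.0) for t, (s, e) in tasks.items()}
    candidates = below_mean(rates, SLOW_TASK_CAP)
    if not candidates:
        return None
    # Back up the slow task expected to finish last.
    return max(candidates, key=lambda t: time_to_end(*tasks[t]))


if __name__ == "__main__":
    history = NodeHistory()
    history.update("reduce", [60, 10, 30])  # a past reduce run that spent most time copying
    print(history.weights["reduce"])
    print(progress_score(history.weights["reduce"], stage=1, sub_progress=0.5))
```

The point of the historical weights is visible in the usage lines: after a run that spent most of its time copying, a reduce task halfway through its sort stage reports more progress than Hadoop's fixed one-third-per-stage split would give it, so progress estimates track the node's actual behaviour rather than a static assumption.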
Original language: English
Title of host publication: Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010
Pages: 2736-2743
Number of pages: 8
Publication status: Published - 19 Nov 2010
Externally published: Yes
Event: 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, 10th IEEE Int. Conf. Scalable Computing and Communications, ScalCom-2010 - Bradford, United Kingdom
Duration: 29 Jun 2010 to 1 Jul 2010

Conference

Conference: 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, 10th IEEE Int. Conf. Scalable Computing and Communications, ScalCom-2010
Country: United Kingdom
City: Bradford
Period: 29/06/10 to 1/07/10

Keywords

  • Heterogeneous environment
  • MapReduce
  • Scheduling algorithm
  • Self-adaptive

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Software
