More convenient more overhead: The performance evaluation of Hadoop streaming

Mengwei Ding, Long Zheng, Yanchao Lu, Li Li, Song Guo, Minyi Guo

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

29 Citations (Scopus)

Abstract

Hadoop is a popular implementation of the MapReduce programming model, which has made programming on distributed systems much easier. In computing, convenience usually comes at the cost of performance. Compared with MPI, Hadoop simplifies programming but degrades performance. In this work, we focus on the comparison between Hadoop and Hadoop Streaming. Hadoop Streaming is widely used because it frees programmers from the Java language, making the power of Hadoop easier to use. However, Hadoop Streaming also brings a performance penalty. Through a deep analysis of the Hadoop Streaming mechanism, we find that the pipe is the major bottleneck. In our experiments, we evaluate the performance of Hadoop Streaming with 6 benchmarks. The experimental results show that Hadoop Streaming substantially degrades performance only for data-intensive jobs; for computation-intensive jobs, Hadoop Streaming may even perform better because it can use a more efficient language than Java.
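
As an illustration of the mechanism the abstract refers to, the following is a minimal sketch of a word-count mapper and reducer of the kind Hadoop Streaming can run. It is not taken from the paper. It assumes the default Streaming convention of tab-separated key/value lines exchanged over standard input and standard output, and that the reducer receives its input sorted by key. The function names and the "map"/"reduce" command-line argument are hypothetical choices made for this example.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Sum the counts for each key; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention for this sketch: the stage ("map" or "reduce")
# is passed as the first command-line argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    # Every record enters and leaves this process as text over a pipe.
    for record in step(sys.stdin):
        print(record)
```

Each record crosses a pipe between the Java task and the external process on the way in and again on the way out, which is consistent with the abstract's observation that the penalty shows up mainly for data-intensive jobs.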
Original language: English
Title of host publication: Proceedings of the 2011 ACM Research in Applied Computation Symposium, RACS 2011
Pages: 307-313
Number of pages: 7
Publication status: Published - 1 Dec 2011
Externally published: Yes
Event: 2011 ACM Research in Applied Computation Symposium, RACS 2011 - Miami, FL, United States
Duration: 2 Nov 2011 to 5 Nov 2011

Conference

Conference: 2011 ACM Research in Applied Computation Symposium, RACS 2011
Country/Territory: United States
City: Miami, FL
Period: 2/11/11 to 5/11/11

Keywords

  • Hadoop
  • Hadoop streaming
  • Linux kernel
  • MapReduce

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Applied Mathematics
