Abstract
Hadoop is a popular implementation of the MapReduce programming model that has made programming distributed systems much easier. In computing, convenience always comes at the cost of performance: compared with MPI, Hadoop simplifies programming but degrades performance. In this work, we focus on the comparison between Hadoop and Hadoop Streaming. Hadoop Streaming is widely used because it frees programmers from the Java language and lets them harness the power of Hadoop more easily, but it also brings a performance penalty. Through a detailed analysis of the Hadoop Streaming mechanism, we find that the pipe is the major bottleneck. We evaluate the performance of Hadoop Streaming with 6 benchmarks. The experimental results show that Hadoop Streaming degrades performance substantially only for data-intensive jobs; for computation-intensive jobs, Hadoop Streaming may even perform better, because it allows the use of a more efficient language than Java.
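To make the mechanism concrete, the sketch below shows roughly what a Hadoop Streaming job looks like from the programmer's side. It assumes the default text conventions (one record per line, key and value separated by a tab) and uses word count purely as an illustration; it is not one of the paper's 6 benchmarks. The names `mapper`, `reducer`, `run_stage` and `local_pipeline` are hypothetical. Every record is written to and read from a pipe as text, which is the path the abstract identifies as the main source of overhead for data-intensive jobs.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, TextIO


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per word: the line-oriented text format
    a streaming mapper writes to its stdout pipe."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each key. The framework feeds the reducer's stdin
    with records already sorted by key, so equal keys arrive adjacent."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


def run_stage(role: str, stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Hypothetical entry point: stream stdin to stdout through one stage,
    the way the framework connects a streaming task through pipes."""
    stage = mapper if role == "map" else reducer
    for record in stage(stdin):
        stdout.write(record + "\n")


def local_pipeline(text: str) -> list[str]:
    """Local stand-in for 'cat input | mapper | sort | reducer'; in a real
    job the framework performs the shuffle and sort between the pipes."""
    mapped = sorted(mapper(text.splitlines()))
    return list(reducer(mapped))
```

In a real job, a small script would call `run_stage("map")` or `run_stage("reduce")` from a `__main__` guard, and the two scripts would be passed to the streaming jar through its `-mapper` and `-reducer` options. Note that the Java task still has to serialize each record to text, push it through the pipe, and parse the output back, which costs little when per-record computation dominates but adds up when the job mostly moves data.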
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2011 ACM Research in Applied Computation Symposium, RACS 2011 |
| Pages | 307-313 |
| Number of pages | 7 |
| DOIs | |
| Publication status | Published - 1 Dec 2011 |
| Externally published | Yes |
| Event | 2011 ACM Research in Applied Computation Symposium, RACS 2011, Miami, FL, United States (2 Nov 2011 → 5 Nov 2011) |
Conference
| Conference | 2011 ACM Research in Applied Computation Symposium, RACS 2011 |
|---|---|
| Country/Territory | United States |
| City | Miami, FL |
| Period | 2 Nov 2011 → 5 Nov 2011 |
Keywords
- Hadoop
- Hadoop streaming
- Linux kernel
- MapReduce
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Applied Mathematics
Title
- More convenient more overhead: The performance evaluation of Hadoop streaming