From Micro-benchmarks to Machine Learning: Unveiling the Efficiency and Scalability of Hadoop and Spark

DOI:

https://doi.org/10.3991/ijim.v18i17.44555

Keywords:

Big Data, Hadoop, Apache Spark, MapReduce, HiBench benchmark, Machine Learning, Memory Resource Limitations, Data Workloads

Abstract


With the exponential growth of data, demand for efficient and scalable data processing has become paramount. This study presents a comprehensive performance analysis of Hadoop and Spark, two pivotal components of the open-source Big Data landscape, in virtualized environments. The benchmarks span a spectrum of workloads: micro-benchmarks (Sort, WordCount, and TeraSort), web search tasks (PageRank), and machine learning workloads (Naive Bayes and K-means). The central focus was to gauge performance, efficiency, and resource utilization. The findings underscore the benefits of Spark’s in-memory processing, which outperforms Hadoop in various scenarios. Spark excels in machine learning and web search applications, particularly when handling smaller inputs; its efficient memory management and support for multiple iterations make it a strong choice for these workloads. In resource-constrained environments, or when large input files must be processed with limited memory, Hadoop may still hold an edge. The design and implementation of data processing solutions in virtualized environments should therefore carefully consider the specific demands of each framework. Beyond the benchmark comparison, the study emphasizes the implications for designing and deploying data processing solutions in virtualized settings, and it aims to support informed decision-making and the development of optimized algorithms and techniques for big data processing.

Published

2024-09-11

How to Cite

Hebabaze, S. E., EL Ghmary, M., El bouabidi, H., Maftah, S., & Amnai, M. (2024). From Micro-benchmarks to Machine Learning: Unveiling the Efficiency and Scalability of Hadoop and Spark. International Journal of Interactive Mobile Technologies (iJIM), 18(17), pp. 46–60. https://doi.org/10.3991/ijim.v18i17.44555

Section

Papers