MapReduce Solutions Classification by Their Implementation

Kamila Orynbekova; Andrey Bogdanchikov; Selcuk Cankurt; Abzatdin Adamov; Shirali Kadyrov

doi:10.3991/ijep.v13i5.38867

Authors

Kamila Orynbekova Suleyman Demirel University https://orcid.org/0000-0002-2182-2914
Andrey Bogdanchikov Suleyman Demirel University https://orcid.org/0000-0001-9693-7487
Selcuk Cankurt Vistula University https://orcid.org/0000-0003-0581-1913
Abzatdin Adamov ADA University
Shirali Kadyrov Suleyman Demirel University https://orcid.org/0000-0002-8352-2597


Keywords:

MapReduce, Big Data, Apache Hadoop, Apache Spark, problems classification, solutions categorization, course design

Abstract

Distributed systems are widely used in industrial projects and scientific research. The Apache Hadoop environment, which is based on the MapReduce paradigm, has lost popularity as newer tools have been developed. For example, Apache Spark is preferred in some cases because it keeps intermediate results in RAM, which makes it faster, and it is also easier to use. To take full advantage of such tools, however, users must understand the MapReduce concept. In this paper, conventional solutions and MapReduce solutions to ten problems were compared by their pseudocode and categorized into five groups. From these group descriptions and the pseudocode, readers can grasp the MapReduce concept without taking specialized courses. The paper thus proposes a five-category classification methodology to help distributed-system users learn the MapReduce paradigm quickly, and illustrates it with ten tasks. Furthermore, a statistical analysis is carried out to test whether the proposed classification methodology affects learner performance. The results indicate that the proposed model outperforms the traditional approach with statistical significance (p < 0.05). The policy implication is that educational institutions and organizations could adopt the proposed classification methodology to help learners and employees acquire the knowledge and skills needed to use distributed systems effectively.
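To make the contrast between a conventional solution and a MapReduce solution concrete, the following minimal Python sketch solves word count both ways. Word count is a standard MapReduce teaching example chosen here for illustration; it is not necessarily one of the paper's ten problems, and the function names are hypothetical. The map, shuffle, and reduce steps are simulated in a single process; in Hadoop or Spark the framework would distribute the map and reduce calls across machines and perform the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def word_count_sequential(lines):
    """Conventional solution: one loop updates a single shared dictionary."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word, with no shared state."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (the framework does this in Hadoop/Spark)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key, independently of other keys."""
    return key, sum(values)


def word_count_mapreduce(lines):
    """MapReduce solution: map each line, shuffle by key, then reduce each group."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    data = ["big data big ideas", "data flows"]
    print(word_count_sequential(data))
    print(word_count_mapreduce(data))
```

Both functions return the same counts. The key design difference is that each mapper call and each reducer call touches only its own input, so they can run in parallel, whereas the sequential loop depends on one mutable dictionary.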

Author Biographies

Kamila Orynbekova, Suleyman Demirel University

Kamila Orynbekova is a Senior Lecturer in the Computer Sciences Department, Faculty of Engineering and Natural Sciences, and Head of the Distributed Systems and Computing Laboratory at Suleyman Demirel University, Kaskelen, Almaty, Kazakhstan. She is also a PhD student in the Computer Sciences educational program. (email: kamila.orynbekova@sdu.edu.kz)

Andrey Bogdanchikov, Suleyman Demirel University

Andrey Bogdanchikov is an Associate Professor in the Department of Information Systems, Faculty of Engineering and Natural Sciences, Suleyman Demirel University, Abylai khan street 1/1, Kaskelen, Kazakhstan, where he also serves as Vice-Rector for Academic Affairs. He received his PhD degree from Suleyman Demirel University, Kazakhstan, in 2014. His areas of expertise include distributed systems, parallel computing, and programming languages. (email: andrey.bogdanchikov@sdu.edu.kz).

Selcuk Cankurt, Vistula University

Selcuk Cankurt is an Assistant Professor in the Department of Computer Engineering at Vistula University, Warsaw, Poland. He graduated from the University of Marmara, Istanbul, Turkey, in 1997. He received his M.S. and Ph.D. degrees in Information Technologies from International Burch University, Sarajevo, Bosnia and Herzegovina, in 2011 and 2015, respectively. He has worked in the areas of database systems, data warehouses, data cubes, data mining, business intelligence, artificial intelligence, machine learning, and fuzzy systems. His present research interests are data science, data lakes, big data, big data analytics, deep learning, and natural language processing. (email: s.cankurt@vistula.edu.pl).

Abzatdin Adamov, ADA University

Abzatdin Adamov is the director of the Center for Data Analytics Research and a faculty member at the School of Information Technology and Engineering, ADA University. He is an adjunct professor in the Computer Science Department at George Washington University. He is a Senior Member of the IEEE and the Founding General Chair of the IEEE International Conference on Application of Information and Communication Technologies. (email: aadamov@ada.edu.az).

Shirali Kadyrov, Suleyman Demirel University

Shirali Kadyrov is a Professor in the Department of Mathematics and Natural Sciences, Faculty of Engineering and Natural Sciences, Suleyman Demirel University, Kaskelen, Kazakhstan. He received his PhD degree from Ohio State University, Columbus, USA, in 2010. His areas of expertise include dynamical systems, mathematics education, and data science. (email: shirali.kadyrov@sdu.edu.kz).