Heterogeneous Ensemble with Combined Dimensionality Reduction for Social Spam Detection

Authors

  • Abdulfatai Ganiyu Oladepo, Department of Computer Science, University of Ilorin, Ilorin, Nigeria
  • Amos Orenyi Bajeh, Department of Computer Science, University of Ilorin, Ilorin, Nigeria
  • Abdullateef Oluwagbemiga Balogun, Department of Computer Science, University of Ilorin, Ilorin, Nigeria, https://orcid.org/0000-0001-7411-3639
  • Hammed Adeleye Mojeed, Department of Computer Science, University of Ilorin, Ilorin, Nigeria
  • Abdulsalam Abiodun Salman, Department of Library and Information Science, University of Ilorin, Ilorin, Nigeria
  • Abdullateef Iyanda Bako, Department of Urban and Regional Planning, University of Ilorin, Ilorin, Nigeria

DOI:

https://doi.org/10.3991/ijim.v15i17.19915

Keywords:

High dimensionality, Ensemble, Spam detection

Abstract


This study presents a novel framework based on a heterogeneous ensemble method and a hybrid dimensionality reduction technique for spam detection in micro-blogging social networks. A hybrid of Information Gain (IG) and Principal Component Analysis (PCA) was implemented as the dimensionality reduction step to select important features, and a heterogeneous ensemble of Naïve Bayes (NB), K-Nearest Neighbor (KNN), Logistic Regression (LR), and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) classifiers, combined using the Average of Probabilities (AOP) rule, was used for spam detection. The proposed framework was applied to the MPI_SWS and SAC’13 Tip spam datasets, and the developed models were evaluated on accuracy, precision, recall, F-measure, and area under the curve (AUC). In the experimental results, the proposed framework (that is, Ensemble + IG + PCA) outperformed the other methods evaluated on the studied spam datasets. Specifically, the proposed method had an average accuracy value of 87.5%, an average precision score of 0.877, an average recall value of 0.845, an average F-measure value of 0.872, and an average AUC value of 0.943. The proposed method also performed better than some existing methods. Consequently, this study shows that addressing high dimensionality in spam datasets, in this case with a hybrid of IG and PCA, together with a heterogeneous ensemble method, can produce a more effective method for detecting spam content.
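
As an illustration of the Average of Probabilities (AOP) combination rule named in the abstract, the following minimal Python sketch averages the per-class probabilities produced by several base classifiers and picks the class with the highest mean. It is not the authors' implementation: the function names, the class labels, and the probability values are hypothetical, and the four rows merely stand in for the outputs that NB, KNN, LR, and RIPPER might give for a single message.

```python
from typing import List, Sequence, Tuple


def average_of_probabilities(prob_rows: Sequence[Sequence[float]]) -> List[float]:
    """Return the mean probability per class across base classifiers.

    Each row holds one classifier's class probabilities for the same instance.
    """
    if not prob_rows:
        raise ValueError("need at least one base classifier")
    n_classes = len(prob_rows[0])
    if any(len(row) != n_classes for row in prob_rows):
        raise ValueError("all classifiers must report the same number of classes")
    n_models = len(prob_rows)
    return [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]


def predict_label(
    prob_rows: Sequence[Sequence[float]], labels: Sequence[str]
) -> Tuple[str, List[float]]:
    """Pick the label whose averaged probability is highest."""
    averaged = average_of_probabilities(prob_rows)
    best_index = max(range(len(averaged)), key=averaged.__getitem__)
    return labels[best_index], averaged


# Hypothetical outputs for one message, ordered as [P(ham), P(spam)].
example_probs = [
    [0.30, 0.70],  # stand-in for Naive Bayes
    [0.60, 0.40],  # stand-in for K-Nearest Neighbor
    [0.20, 0.80],  # stand-in for Logistic Regression
    [0.45, 0.55],  # stand-in for RIPPER
]
label, averaged = predict_label(example_probs, ["ham", "spam"])
print(label, averaged)
```

In this made-up example one base classifier leans towards "ham", but the averaged probabilities still favour "spam"; averaging lets confident classifiers outweigh a single dissenting one, which is the general motivation for soft-voting ensembles.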

Author Biographies

Abdulfatai Ganiyu Oladepo, Department of Computer Science, University of Ilorin, Ilorin, Nigeria

Abdulfatai Ganiyu Oladepo is an IT Service Management practitioner with a keen interest in Data Science, Machine Learning, and IT Project Management.

Amos Orenyi Bajeh, Department of Computer Science, University of Ilorin, Ilorin, Nigeria

Amos Orenyi Bajeh holds BSc and MSc degrees in Computer Science from the University of Ilorin, where he is currently a Senior Lecturer in the Department of Computer Science. He has a Ph.D. in Information Technology from Universiti Teknologi PETRONAS. His research interests are software measurement, software maintenance, machine learning, and fuzzy inference systems.

Abdullateef Oluwagbemiga Balogun, Department of Computer Science, University of Ilorin, Ilorin, Nigeria

Abdullateef Oluwagbemiga Balogun received his B.Sc. and M.Sc. degrees in Computer Science from the University of Ilorin, Nigeria. He is currently pursuing his Ph.D. in Information Technology at the Universiti Teknologi PETRONAS, Perak, Malaysia. He is a member of the academic staff in the Department of Computer Science, Faculty of Communication and Information Sciences, University of Ilorin, Nigeria. His research interests include Search-Based Software Engineering, Software Quality Assurance, Machine Learning, and Data Science.

Hammed Adeleye Mojeed, Department of Computer Science, University of Ilorin, Ilorin, Nigeria

Hammed Adeleye Mojeed is a Lecturer in the Department of Computer Science, University of Ilorin, Ilorin, Nigeria. He received a Master of Science in Computer Science with distinction from the University of Ilorin, Ilorin, Nigeria in 2019, a Diploma in Computer Networking from SIIT Global, New Delhi, India in 2014, and a Bachelor of Science in Computer Science with First Class Honors from the University of Ilorin, Ilorin, Nigeria in 2013. His research interests are in Empirical Search-Based Software Engineering, Software Project Planning and Management, Machine Learning, Optimization, and Text Mining. He has authored or co-authored over 20 publications in reputable outlets. He is a member of the IEEE Nigeria Computer Chapter and a Graduate Member of Computer Professionals of Nigeria (GMCPN).

Abdulsalam Abiodun Salman, Department of Library and Information Science, University of Ilorin, Ilorin, Nigeria

Abdulsalam Abiodun Salman is an Associate Professor and Head of the Department of Library and Information Science, Faculty of Communication and Information Sciences, University of Ilorin, Ilorin, Nigeria.

Abdullateef Iyanda Bako, Department of Urban and Regional Planning, University of Ilorin, Ilorin, Nigeria

Abdullateef Iyanda Bako is an Associate Professor and Dean of the Faculty of Environmental Sciences, University of Ilorin, Ilorin, Nigeria.

Published

2021-09-06

How to Cite

Oladepo, A. G., Bajeh, A. O., Balogun, A. O., Mojeed, H. A., Salman, A. A., & Bako, A. I. (2021). Heterogeneous Ensemble with Combined Dimensionality Reduction for Social Spam Detection. International Journal of Interactive Mobile Technologies (iJIM), 15(17), pp. 84–103. https://doi.org/10.3991/ijim.v15i17.19915

Section

Papers