Malware Detection Using Ensemble N-gram Opcode Sequences

Authors

  • Paul Ntim Yeboah, Kofi Annan Centre of Excellence in ICT
  • Stephen Kweku Amuquandoh, Kwame Nkrumah University of Science and Technology
  • Haruna Balle Baz Musah, Kofi Annan Centre of Excellence in ICT

DOI:

https://doi.org/10.3991/ijim.v15i24.25401

Keywords:

Malware Detection, N-Gram, Opcode, Machine Learning, Ensemble, Grid Search

Abstract

Conventional approaches to tackling malware attacks have proven ineffective at detecting never-before-seen (zero-day) malware. Research has shown, however, that zero-day malicious files are mostly semantics-preserving variants of existing malware, generated via obfuscation methods. In this paper we propose and evaluate a machine-learning-based malware detection model using an ensemble approach. In our ensemble strategy, multiple feature sets are generated from opcode sequences using different n-gram sizes, and a single classifier type is trained separately on each feature set. The predictions of the resulting models are weighted and averaged to reach a final verdict on whether a binary file is malicious or benign. To obtain the optimal weight combination for the ensemble, we applied a grid search over a set of predefined weights in the range 0 to 1. On a balanced dataset of 2,000 samples, an ensemble of 1-gram and 2-gram opcode sequences with weights 0.3 and 0.7, respectively, yielded the best detection accuracy, 98.1%, using a random forest (RF) classifier. The ensemble of 2-gram and 3-gram opcode sequences achieved the best precision, 99.7%, with a weight of 0.5 for both models.
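The n-gram feature extraction, weighted-average ensemble, and weight grid search described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The helpers `opcode_ngrams`, `weighted_average`, `accuracy`, and `grid_search_weights` are hypothetical. Each model's output is assumed to be a per-file probability of being malicious (in the paper these would come from a classifier such as RF trained on each n-gram feature set). The following are also assumptions: the two weights sum to 1, they are drawn from a 0.1-step grid, the decision threshold is 0.5, and accuracy is the selection metric.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Count the opcode n-grams (space-joined) in one disassembled file."""
    return Counter(" ".join(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def weighted_average(prob_lists, weights):
    """Weighted average of per-model malware probabilities, sample by sample."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, sample_probs)) / total
            for sample_probs in zip(*prob_lists)]


def accuracy(probs, labels, threshold=0.5):
    """Fraction of files classified correctly (1 = malicious, 0 = benign).

    The 0.5 threshold is an assumption, not taken from the paper.
    """
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def grid_search_weights(prob_a, prob_b, labels, step=0.1):
    """Search weight pairs (w, 1 - w) for w on a grid over [0, 1].

    Assumption: the two weights sum to 1 and the grid step is 0.1.
    Returns ((w_a, w_b), best_accuracy); ties keep the first pair found.
    """
    best = None
    for k in range(int(round(1 / step)) + 1):
        w = round(k * step, 10)  # rounding avoids values like 0.30000000000000004
        score = accuracy(weighted_average([prob_a, prob_b], [w, 1 - w]), labels)
        if best is None or score > best[1]:
            best = ((w, round(1 - w, 10)), score)
    return best
```

For example, given validation-set probabilities `p1` from a 1-gram model and `p2` from a 2-gram model, `grid_search_weights(p1, p2, labels)` returns the best `(w1, w2)` pair and its accuracy. Because ties keep the first pair found, the search favors a smaller weight on the first model when several pairs score equally.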

Author Biographies

Paul Ntim Yeboah, Kofi Annan Centre of Excellence in ICT

Lecturer

Stephen Kweku Amuquandoh, Kwame Nkrumah University of Science and Technology

Systems Administrator

Haruna Balle Baz Musah, Kofi Annan Centre of Excellence in ICT

Lecturer

Published

2021-12-21

How to Cite

Yeboah, P. N., Amuquandoh, S. K., & Musah, H. B. B. (2021). Malware Detection Using Ensemble N-gram Opcode Sequences. International Journal of Interactive Mobile Technologies (iJIM), 15(24), pp. 19–31. https://doi.org/10.3991/ijim.v15i24.25401

Section

Papers