Optimizing Cheating Detection in Online Exams with K-Shingling, MinHashing, and LSH: A Comparative Analysis with TF-IDF and BoW

Authors

  • Nabila El Rhezzali School of Information Sciences (ESI), Rabat, Morocco https://orcid.org/0009-0001-4928-7474
  • Imane Hilal School of Information Sciences (ESI), Rabat, Morocco
  • Meriem Hnida School of Information Sciences (ESI), Rabat, Morocco

DOI:

https://doi.org/10.3991/ijep.v15i4.54419

Keywords:

Cheating Detection, Online Exams, String-based Similarity, TF-IDF, Bag of Words, K-Shingling, MinHashing, Locality Sensitive Hashing (LSH), Logistic Regression, Random Forest, SVM.

Abstract


Detecting cheating in online exams is a major challenge, particularly when the goal is to guarantee the originality and independence of answers. This paper presents a comparative analysis of three feature extraction methods for similarity-based cheating detection: term frequency-inverse document frequency (TF-IDF), Bag of Words (BoW), and a new approach combining K-Shingling, MinHashing, and Locality Sensitive Hashing (LSH). We evaluate these methods in terms of their ability to identify similarities between student responses accurately and efficiently. Experimental results show that the K-Shingling, MinHashing, and LSH pipeline consistently outperforms or matches the traditional approaches. Logistic regression and random forest classifiers with MinHashing + LSH achieve perfect scores of 1.00 in precision, recall, F1 score, and accuracy, demonstrating the robustness and effectiveness of the method. In comparison, TF-IDF and BoW show mixed performance across classifiers, with notable limitations in scalability and sensitivity to text variations. This study highlights the scalability and computational efficiency of the K-Shingling, MinHashing, and LSH approach, making it particularly suitable for large-scale online examination environments. By offering a detailed performance comparison, we demonstrate that K-Shingling, MinHashing, and LSH provide a more reliable and efficient solution for detecting cheating in online exams, paving the way for greater academic integrity in digital education.
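
As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch chains character-level K-Shingling, MinHash signatures, and banded LSH to surface candidate pairs of similar answers. It is not the authors' implementation: the function names, the shingle length (k = 5), the number of hash functions (64), the number of bands (16), and the sample answers are all assumptions chosen for illustration, and the classification stage (logistic regression, random forest, SVM) is left out.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    """Return the set of character k-shingles of a lowercased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    if len(s) < k:
        return {s}  # very short answers become a single shingle
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def _hash(seed, shingle):
    """Seeded 64-bit hash; each seed stands in for one random permutation."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_perm=64):
    """MinHash signature: the minimum hash value of the set under each seeded hash."""
    return [min(_hash(seed, sh) for sh in shingle_set) for seed in range(num_perm)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def lsh_candidate_pairs(signatures, bands=16):
    """Split each signature into bands; answers sharing any identical band become candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical student answers, invented for this example.
answers = {
    "s1": "Photosynthesis converts light energy into chemical energy.",
    "s2": "Photosynthesis converts light energy into chemical energy.",
    "s3": "zzzz qqqq xxxx jjjj",
}
signatures = {sid: minhash_signature(shingles(text)) for sid, text in answers.items()}
candidates = lsh_candidate_pairs(signatures)
```

Only the pairs returned by `lsh_candidate_pairs` need a full comparison, which is what keeps this family of methods tractable as the number of submissions grows. In a setup like the one the abstract outlines, similarity features derived from such pairs could then be passed to a classifier.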

Published

2025-05-21

How to Cite

El Rhezzali, N., Hilal, I., & Hnida, M. (2025). Optimizing Cheating Detection in Online Exams with K-Shingling, MinHashing, and LSH: A Comparative Analysis with TF-IDF and BoW. International Journal of Engineering Pedagogy (iJEP), 15(4), pp. 40–56. https://doi.org/10.3991/ijep.v15i4.54419

Section

Papers