Optimizing Cheating Detection in Online Exams with K-Shingling, MinHashing, and LSH: A Comparative Analysis with TF-IDF and BoW
DOI:
https://doi.org/10.3991/ijep.v15i4.54419
Keywords:
Cheating detection, Online Exams, String-based Similarity, TF-IDF, Bag of Words, K-shingling, MinHashing, Locality Sensitive Hashing (LSH), Logistic Regression, Random Forest, SVM
Abstract
Detecting cheating in online exams is a major challenge, particularly when the goal is to guarantee the originality and independence of answers. This paper presents a comparative analysis of three feature extraction methods for cheating detection based on similarity detection: term frequency-inverse document frequency (TF-IDF), Bag of Words (BoW), and a new approach combining K-shingling, MinHashing, and Locality Sensitive Hashing (LSH). We evaluate these methods in terms of their ability to identify similarities between student responses accurately and efficiently. Experimental results show that the K-shingling, MinHashing, and LSH pipeline consistently matches or outperforms the traditional approaches. Logistic regression and random forest classifiers with MinHashing + LSH achieve perfect scores of 1.00 in precision, recall, F1 score, and accuracy, demonstrating the robustness and effectiveness of the method. In comparison, TF-IDF and BoW show mixed performance across classifiers, with notable limitations in scalability and sensitivity to text variations. This study highlights the scalability and computational efficiency of the K-shingling, MinHashing, and LSH approach, making it particularly suitable for large-scale online examination environments. By offering a detailed performance comparison, we demonstrate that K-shingling, MinHashing, and LSH provide a more reliable and efficient solution for detecting cheating in online exams, paving the way for greater academic integrity in digital education.
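To make the similarity stage named in the abstract concrete, the following is a minimal sketch of K-shingling, MinHashing, and LSH banding in plain Python. It is not the authors' implementation. The shingle length (k=5), the number of hash functions (64), the number of bands (16), the use of character shingles, every function name, and the three sample answers are assumptions chosen for illustration. The classifier stage (logistic regression, random forest, SVM) is left out.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    """Return the set of character k-shingles of a whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < k:
        return {normalized}
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def _hash(shingle, seed):
    """Deterministic 64-bit hash of a shingle under a given seed (assumed scheme)."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over the set."""
    return [min(_hash(s, seed) for s in shingle_set) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def lsh_candidate_pairs(signatures, bands=16):
    """Split signatures into bands; items sharing an identical band are candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    candidates = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for name, sig in signatures.items():
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(name)
        for members in buckets.values():
            candidates.update(combinations(sorted(members), 2))
    return candidates


# Invented sample answers, for illustration only.
answers = {
    "student_a": "The mitochondria is the powerhouse of the cell and produces ATP.",
    "student_b": "The mitochondria is the powerhouse of the cell and it produces ATP.",
    "student_c": "Photosynthesis converts light energy into chemical energy in plants.",
}
signatures = {name: minhash_signature(shingles(text)) for name, text in answers.items()}
pairs = lsh_candidate_pairs(signatures)
```

In this sketch, `pairs` holds only the answer pairs that collide in at least one band, so full pairwise comparison is avoided. `estimated_jaccard` can then score each candidate pair, and such scores could serve as features for a downstream classifier. With more bands and fewer rows per band, lower-similarity pairs become more likely to surface as candidates.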
License
Copyright (c) 2025 Nabila El Rhezzali, Imane Hilal, Meriem Hnida

This work is licensed under a Creative Commons Attribution 4.0 International License.
