Forecasting of Post-Graduate Students’ Late Dropout Based on the Optimal Probability Threshold Adjustment Technique for Imbalanced Data

Authors

  • Carmen Lili Rodríguez Velasco: Universidad Europea del Atlántico; Universidad Internacional Iberoamericana UNINI-MX; Universidade Internacional do Cuanza UNIC. https://orcid.org/0000-0002-9609-4026
  • Eduardo García Villena: Universidad Europea del Atlántico; Universidad Internacional Iberoamericana UNIB (PR-USA). https://orcid.org/0000-0001-7549-3733
  • Julien Brito Ballester: Universidad Europea del Atlántico; Universidad Internacional Iberoamericana UNINI-MX. https://orcid.org/0000-0001-6436-0214
  • Frigdiano Álvaro Durántez Prados: Universidad Europea del Atlántico; Universidad Internacional Iberoamericana UNINI-MX; Universidade Internacional do Cuanza UNIC
  • Eduardo Silva Alvarado: Universidad Europea del Atlántico; Universidad Internacional Iberoamericana UNINI-MX; Universidad Internacional Iberoamericana UNIB (PR-USA)
  • Jorge Crespo Álvarez: Universidad Europea del Atlántico; Universidad Internacional Iberoamericana UNIB (PR-USA). https://orcid.org/0000-0001-7589-5337

DOI:

https://doi.org/10.3991/ijet.v18i04.34825

Keywords:

optimal likelihood threshold, imbalanced data, student dropout prediction, resample techniques, distance learning courses

Abstract


This research article compares the benefits of the optimal probability threshold adjustment technique with those of other techniques for processing imbalanced data, as applied to predicting post-graduate students’ late dropout from distance learning courses at two universities in the Ibero-American region. In this context, optimizing the Logistic Regression, Random Forest, and Neural Network classifiers, together with different techniques, attributes, and algorithms (hyperparameters, SMOTE, SMOTE_SVM, and ADASYN), produced a set of metrics for decision-making that prioritizes the reduction of false negatives. The best model was the Neural Network in combination with SMOTE_SVM, which obtained a recall of 0.75 and an F1-score of 0.60. Likewise, the Random Forest classifier proved robust to imbalanced data: with an optimal threshold of 0.427, it achieved metrics very similar to those obtained by the consensus of the three best models found. This demonstrates that, for Random Forest, the optimal prediction probability threshold is an excellent alternative to resampling techniques with different optimal thresholds. Finally, it is hoped that this paper will help promote the application of this simple but powerful technique, which is highly underrated relative to data resampling techniques for imbalanced data.
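The idea behind probability threshold adjustment can be illustrated with a minimal sketch. It is not the authors' implementation: the selection criterion (maximum F1, with ties broken toward the lower threshold to reduce false negatives), the function names, and the toy scores below are all assumptions made for illustration. The abstract does not state which criterion produced the 0.427 threshold.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, FN when 'dropout' (1) is predicted for prob >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    return tp, fp, fn


def recall_at(y_true, y_prob, threshold):
    """Recall = TP / (TP + FN); 0.0 if there are no positive cases."""
    tp, _, fn = confusion_counts(y_true, y_prob, threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_at(y_true, y_prob, threshold):
    """F1 = 2TP / (2TP + FP + FN); 0.0 if the denominator is zero."""
    tp, fp, fn = confusion_counts(y_true, y_prob, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob):
    """Pick the candidate threshold with the highest F1.

    Candidates are the distinct predicted probabilities. Ties go to the
    lower threshold, which favours recall (assumed criterion).
    """
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: (f1_at(y_true, y_prob, t), -t))


# Hypothetical imbalanced validation set: 7 non-dropouts (0), 3 dropouts (1),
# with made-up scores standing in for any classifier's predicted probabilities.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.40, 0.55, 0.70]

threshold = best_threshold(y_true, y_prob)
print("default 0.5 -> recall", recall_at(y_true, y_prob, 0.5),
      "F1", f1_at(y_true, y_prob, 0.5))
print("tuned", threshold, "-> recall", recall_at(y_true, y_prob, threshold),
      "F1", f1_at(y_true, y_prob, threshold))
```

On this toy data, lowering the cut-off from the default 0.5 recovers the dropout case scored at 0.40 at the cost of one extra false positive. The training data are left untouched, which is what distinguishes threshold adjustment from resampling methods such as SMOTE or ADASYN. In practice the threshold would be chosen on a validation split rather than on the data used for the final evaluation.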

Published

2023-02-23

How to Cite

Rodríguez Velasco, C. L., García Villena, E., Brito Ballester, J., Durántez Prados, F. Álvaro, Silva Alvarado, E., & Crespo Álvarez, J. (2023). Forecasting of Post-Graduate Students’ Late Dropout Based on the Optimal Probability Threshold Adjustment Technique for Imbalanced Data. International Journal of Emerging Technologies in Learning (iJET), 18(04), pp. 120–155. https://doi.org/10.3991/ijet.v18i04.34825

Section

Papers