Bachelor Thesis Analytics: Using Machine Learning to Predict Dropout and Identify Performance Factors


  • Jalal Nouri Stockholm University, Sweden
  • Ken Larsson
  • Mohammed Saqr



thesis, bachelor, completion, machine learning, retention, performance, learning analytics, prediction, dropout


The bachelor thesis is commonly the necessary last step towards a first degree in higher education, and it is key to both further studies and employment that requires a higher education degree. Completion of the thesis is therefore a desirable outcome for individual students, academic institutions and society, and non-completion carries a significant cost. Unfortunately, many academic institutions around the world find that many thesis projects are not completed and that students struggle with the thesis process. This paper addresses this issue with two aims: first, to identify and explain why thesis projects are completed or not, and second, to predict completion and non-completion of thesis projects using machine learning algorithms. The sample consisted of bachelor students’ thesis projects (n=2436) that were started between 2010 and 2017. Thesis project data, including variables related to both students and supervisors, were extracted from two different systems used to record data about thesis projects. Traditional statistical analysis (correlation tests, t-tests and factor analysis) was conducted to identify factors that influence completion and non-completion, and several machine learning algorithms were applied to create a model that predicts completion and non-completion. Taken together, these analyses support the confident conclusion that supervisors’ ability and experience play a significant role in determining the success of thesis projects, which corroborates previous research. The study also extends previous research by pointing out additional specific factors, such as the time supervisors take to complete thesis projects and their ratio of previously unfinished thesis projects.
It can also be concluded that the academic title of the supervisor, which was one of the variables studied, was not a factor in the completion of thesis projects. One of the more novel contributions of this study is the application of machine learning algorithms to predict thesis completion and non-completion with reasonable accuracy. Such predictive models offer the opportunity to support better matching of students and supervisors.
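
As an illustration of one supervisor-level factor named above, the ratio of previously unfinished thesis projects, the following is a minimal sketch of how such a feature might be derived from project records. It is not the authors' implementation: the record layout, the supervisor labels and the toy data are all hypothetical, and the sketch makes the simplifying assumption that the outcome of every earlier project is already known when a later one starts.

```python
from collections import defaultdict

# Toy records, invented for illustration only (not data from the study).
# Each tuple: (project_id, supervisor, start_year, completed)
projects = [
    ("p1", "sup_a", 2010, True),
    ("p2", "sup_a", 2011, False),
    ("p3", "sup_a", 2012, True),
    ("p4", "sup_b", 2011, False),
    ("p5", "sup_b", 2013, False),
    ("p6", "sup_b", 2014, True),
]


def prior_unfinished_ratio(records):
    """For each project, return the share of the supervisor's earlier
    projects that were not completed (None if there is no history)."""
    history = defaultdict(lambda: [0, 0])  # supervisor -> [unfinished, total]
    ratios = {}
    # Process projects in chronological order so that only earlier
    # projects contribute to a supervisor's history.
    for project_id, supervisor, _year, completed in sorted(
        records, key=lambda r: r[2]
    ):
        unfinished, total = history[supervisor]
        ratios[project_id] = unfinished / total if total else None
        # Update the history only after the feature has been computed.
        history[supervisor][0] += 0 if completed else 1
        history[supervisor][1] += 1
    return ratios


ratios = prior_unfinished_ratio(projects)
```

A feature computed this way uses only information available before a project starts, which is what a model intended to support student-supervisor matching would need.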

Author Biography

Jalal Nouri, Stockholm University, Sweden

Associate professor




How to Cite

Nouri, J., Larsson, K., & Saqr, M. (2019). Bachelor Thesis Analytics: Using Machine Learning to Predict Dropout and Identify Performance Factors. International Journal of Learning Analytics and Artificial Intelligence for Education (iJAI), 1(1), pp. 116–131.