Bachelor Thesis Analytics: Using Machine Learning to Predict Dropout and Identify Performance Factors
Keywords: thesis, bachelor, completion, machine learning, retention, performance, learning analytics, prediction, dropout
The bachelor thesis is commonly the necessary last step towards a first degree in higher education, and it is key both to further studies and to employment that requires a higher education degree. Completion of the thesis is therefore a desirable outcome for individual students, academic institutions and society, and non-completion carries a significant cost. Unfortunately, academic institutions around the world report that many thesis projects are not completed and that students struggle with the thesis process. This paper addresses the issue with two aims: to identify and explain why thesis projects are completed or not, and to predict completion and non-completion of thesis projects using machine learning algorithms. The sample consisted of bachelor students’ thesis projects (n=2436) started between 2010 and 2017. Data were extracted from two different systems used to record thesis projects and included variables related to both students and supervisors. Traditional statistical analyses (correlation tests, t-tests and factor analysis) were conducted to identify factors that influence completion and non-completion, and several machine learning algorithms were applied to create a model that predicts completion and non-completion. Taking all of these analyses into account, it can be concluded with confidence that supervisors’ ability and experience play a significant role in determining the success of thesis projects, which corroborates previous research. The study also extends previous research by pointing out additional specific factors, such as the time supervisors take to complete thesis projects and the ratio of previously unfinished thesis projects.
It can also be concluded that the academic title of the supervisor, which was one of the variables studied, was not a factor in the completion of thesis projects. One of the more novel contributions of this study is the application of machine learning algorithms to predict thesis completion and non-completion with reasonable accuracy. Such predictive models offer the opportunity to support better matching of students and supervisors.
The submitting author warrants that the submission is original and that she/he is the author of the submission together with the named co-authors; to the extent the submission incorporates text passages, figures, data or other material from the work of others, the submitting author has obtained any necessary permission.
Articles in this journal are published under the Creative Commons Attribution Licence (CC-BY). This provides more legal certainty about what readers can do with published articles, and thus wider dissemination and archiving, which in turn makes publishing with this journal more valuable for you, the authors.
By submitting an article, the author grants this journal the non-exclusive right to publish it. The author retains the copyright and the publishing rights for the article without any restrictions.