Exploring Student Academic Performance Using Data Mining Tools

Abstract: Most educational institutes nowadays benefit from the hidden knowledge extracted from the datasets of their students, instructors and educational settings. The education system has gone through a paradigm shift from the traditional system to smart learning environments, and from a teacher-centric system to a context-aware, anytime-anywhere, student-centric approach. In this changing scenario, we have undertaken a study to investigate the results, grades and patterns of the students of North Lakhimpur College. The paper aims to evaluate the quality of learning on the basis of 19249 grades received from 758 students in 511 courses, included in the curriculum of 3 study programmes.


Introduction
Educational Data Mining (EDM) is an emerging discipline concerned with the application of data mining, statistical methods and machine learning to explore the distinctive and increasingly large-scale data produced by educational organizations, in order to better understand students and educational settings. EDM techniques can be applied to obtain useful information that helps educationalists design or amend the structure of courses. EDM is useful in many areas, including identifying at-risk students, prioritizing the learning needs of diverse clusters of students, increasing graduation rates, effectively evaluating institutional performance, optimizing curriculum renewal and maximizing the use of campus resources [1]. By extracting (mining) valuable information from educational data that may have an effect on students' performance, we can improve the quality of the higher education system.
A student's academic performance depends on diverse factors such as psychological, personal, demographic, educational background, academic progress and other environmental variables. These variables are often related in complicated nonlinear ways, and how their interrelationships contribute to the complex and multifaceted nature of academic performance is not clearly understood [3].
In recent years, the extraction and analysis of data generated during the learning process have become increasingly important. Many educational institutions worldwide have already used Learning Analytics to improve the quality of learning [4], student success and retention [5,6,7], and immediate feedback [8].
The paper aims to evaluate the quality of learning in North Lakhimpur College on the basis of 19249 grades received from 758 students in 511 courses, included in the curriculum of 3 study programmes.

Literature Review
Various studies in the scientific literature aim to dig out hidden patterns from the sea of student data held by different educational organizations. The extracted knowledge may be used by the authorities of the institutes, academicians and educators for the betterment of the students, to enhance the performance of the learners and to prevent student dropout. An assortment of machine learning tools and data mining techniques is employed to discover such knowledge.
Learning Analytics combines approaches, methods and results from different scientific fields such as intelligent data analysis and business intelligence, predictive modelling, etc.
Krpan and Stankov [9] applied a data mining technique to group students with similar characteristics in e-learning systems.
Nagy et al. [10], in their research on a "Student Advisory Framework", used classification and clustering techniques to build an intelligent system. This system can be used to decrease the high rate of academic failure among students by advising first-year university students on which education track to pursue. They demonstrated this with a real case study at the Cairo Higher Institute for Engineering, Computer Science and Management, on a dataset collected from 2000 to 2012.
Ariouat et al. [11] used a two-step approach to improve educational process mining. In the first step they created clusters based on employability indicators, and in the second step they obtained clusters using the AXOR algorithm. They tested their results using the ProM Framework and found that their model optimizes both performance/stability and comprehensibility/size simultaneously.
Ahuja et al. [12] compared various clustering and classification algorithms by applying them to the same dataset. They highlighted design challenges such as goal and functionality, precision, and overheads when the dataset is extremely large. They also discussed graph-based clustering, centroid-based clustering, and various supervised classification algorithms that can be applied to educational data mining.
Hussain et al. [13] tried to find association rules in a dataset that contained 666 instances with 11 attributes. They used data mining tools such as Orange, Weka and RStudio to study and compare various clustering and classification methods. Neural networks are generally believed to perform well on large datasets, yet the authors found that a neural network was the best classifier even on this relatively small dataset, with 90.84% accuracy. The authors also found that PAM and K-means clustering perform better than hierarchical clustering.
Educational data mining techniques can make a difference to an educational institution by identifying academically weak and at-risk students. The final grades of a student can be predicted using internal assessment marks. Hussain et al. [14] collected the internal assessment marks and final grades from three different colleges in Assam, India to devise a model for such prediction using deep learning methodologies. The sequential neural deep learning model with Adam optimization outperformed the AdaBoost and Artificial Immune Recognition System v2.0 classifiers. The statistical parameters confirmed its efficacy.

Dataset Description
The dataset was collected for students who took admission in three different programmes, namely BA, BSc and BCA, at North Lakhimpur College, Assam, India. It contains 19249 grades received from 758 students in 511 courses. A student has to earn 120 credits over six semesters to complete any of these three programmes. In each course of the UG programmes, 25% of the marks are for Internal Assessment and 75% are for the End-Semester Examination in every semester.
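As a rough illustration of the structure described above, the sketch below models one grade record and summarizes a collection of such records. The class name `GradeRecord`, its field names, the student identifiers and the sample marks are hypothetical, and the assumption that each course is marked out of 100 (25 internal plus 75 end-semester) is ours; the source does not specify the actual schema of the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradeRecord:
    """One grade of one student in one course (hypothetical schema)."""
    student_id: str
    programme: str        # "BA", "BSc" or "BCA"
    course_code: str
    internal: float       # assumed scale: 0-25 (Internal Assessment)
    end_semester: float   # assumed scale: 0-75 (End-Semester Examination)

    @property
    def total(self) -> float:
        # Assumption: the course total is the plain sum of both components.
        return self.internal + self.end_semester


def summarize(records):
    """Count grades, distinct students, distinct courses and programmes."""
    return {
        "grades": len(records),
        "students": len({r.student_id for r in records}),
        "courses": len({r.course_code for r in records}),
        "programmes": len({r.programme for r in records}),
    }


# Made-up marks; only the course codes appear in the text.
records = [
    GradeRecord("001", "BCA", "CT-4-BCA-103", 12, 41),
    GradeRecord("002", "BCA", "CT-4-BCA-103", 9, 30),
    GradeRecord("101", "BSc", "CT-3-ELE-301", 15, 50),
]
summary = summarize(records)
print(summary)
```

Run on the full dataset, a summary of this kind would be expected to report 19249 grades, 758 students, 511 courses and 3 programmes.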

Data Visualization
We used data mining tools to visualize the data. The following figure (

Experiments and Results
The quality of the training at North Lakhimpur College, Assam, India (Autonomous) was evaluated on the basis of the grades received by the students in different courses. The quality of education is evaluated in two aspects: the feedback of teachers to students during the training, and students' success in their subjects.

Feedback during training
The feedback of the teachers to the students during the training in each course is evaluated by investigating the relation between the intermediate and final grades of the students. For this purpose, the t-test has been applied. The following null hypothesis was set for each course in which an intermediate assessment was conducted (391 courses): H0: Any differences between intermediate and final grades are due to chance.
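The test of this hypothesis for a single course can be sketched as follows. The source does not state which variant of the t-test was used; the sketch assumes a paired t-test on each student's intermediate and final scores, with both scores on the same scale, and a critical value looked up in a t-table. The function names and the sample scores are hypothetical.

```python
import math
import statistics


def paired_t_value(intermediate, final):
    """Return (t, degrees of freedom) for a paired t-test on score pairs."""
    if len(intermediate) != len(final):
        raise ValueError("each student needs an intermediate and a final score")
    diffs = [f - i for i, f in zip(intermediate, final)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two students")
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


def reject_h0(t_value, t_table):
    """Reject H0 when the calculated t-value exceeds the table value."""
    return abs(t_value) > t_table


# Hypothetical scores of four students in one course.
intermediate = [40, 35, 50, 45]
final = [55, 45, 60, 65]

t, df = paired_t_value(intermediate, final)
# 3.182 is the two-tailed t-table value for alpha = .05 and 3 degrees of freedom.
rejected = reject_h0(t, t_table=3.182)
print(f"t = {t:.3f}, df = {df}, reject H0: {rejected}")  # t is about 5.745
```

Repeating this per course and counting the rejections per programme would give percentages of the kind reported for the BA, BCA and BSc courses.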
To accept or reject the null hypothesis, the values required for the t-test are calculated. According to this statistical method, the null hypothesis can be rejected when the calculated t-value is greater than the t-table value. The calculated t-value is greater than the t-table value at an alpha level of .05 for grades obtained in 93% of BA courses, 58% of BCA courses, and 78% of BSc courses. In all of these courses, the difference between intermediate and final grades is not due to chance. We attribute this difference to the measures taken by teachers to improve student success and to their timely feedback. Fig. 4 represents the difference between the calculated t-values and table values for each subject. Table 2 represents only the values for the first three courses from each study programme, for which the difference in values is the largest.
Table 3 presents the differences between students' intermediate and final grades, in points, for each study programme. Most students in all programmes increased their final grade by 10-20 points, and the smallest number of grades increased by 50-60 points. The timely intervention of teachers did not help to increase 228 grades in the BA programme, 20 grades in the BCA programme and 115 grades in the BSc programme.
For each course, the differences between the intermediate and final grades of each student were examined. Table 4 presents the data for the course CT-4-BCA-103 PROGRAMMING AND PROBLEM SOLVING, studied in the BCA programme. During the intermediate assessment, all 16 students received low grades. The data in the table show that 8 students failed to complete the course successfully, although 6 of them significantly increased the number of points obtained in the final assessment, on the basis of which the final grade was calculated. The other 8 students completed the course successfully: 4 of them increased their grades by two units (from F to C) and 4 by one unit (from F to D).
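The point differences summarized in Table 3 can be tabulated with a sketch like the one below. The 10-point bin width follows the intervals quoted in the text (10-20, 50-60); the half-open bin boundaries, the "not increased" label, the function name and the sample scores are assumptions for illustration.

```python
from collections import Counter


def bin_improvements(score_pairs, width=10):
    """Count grade increases per point interval.

    score_pairs holds (intermediate, final) scores on the same scale.
    A difference d > 0 falls in the bin [low, low + width); differences
    of zero or less are counted as "not increased".
    """
    counts = Counter()
    for intermediate, final in score_pairs:
        diff = final - intermediate
        if diff <= 0:
            counts["not increased"] += 1
        else:
            low = int(diff // width) * width
            counts[f"{low}-{low + width}"] += 1
    return counts


# Hypothetical (intermediate, final) scores of five students.
pairs = [(30, 45), (40, 40), (20, 75), (50, 62), (60, 55)]
bins = bin_improvements(pairs)
print(dict(bins))
```

Applying the function separately to the records of each programme would yield one column of a table like Table 3.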

Student performance
The quality of courses in the three study programmes was evaluated on the basis of student performance.
The number of A+, A, B+, B, C+, C, D and F grades is calculated for each course. The analysis aims to check whether there are courses in which most students have received high grades and courses in which most students have received poor grades. Table 5 presents a summary of the courses in the BCA programme.
41% of all courses included in the syllabus of BA study programme. More than 80% of students did not successfully complete the training in the other 10 courses (3.52% of all courses). These results indicate that teachers need to look for the reasons for the poor results and take measures to improve the quality of training and students' performance. The number of poor grades obtained is below 10% in 110 courses (38.73% of all courses), which is a sign of high student performance in these courses. Fig. 6 presents the percentage of A+, A, B+, B, C+, C, D and F grades in each course in the BA study programme.
The results of the analyses show that students enrolled in the BSc study programme have the highest performance. In only one course, CT-3-ELE-301 (0.36% of all courses), did all students receive an F grade and fail to complete the course. More than 80% of the students did not successfully complete the training in the other 17 courses (6.25% of all courses). These results indicate that teachers need to look for the reasons for the poor results and take measures to improve the quality of training and students' performance. The number of F grades is below 10% in 125 courses (45.96% of all courses), which is a sign of high student achievement in these courses. Fig. 7 presents the percentage of A+, A, B+, B, C+, C, D and F grades in each course in the BSc study programme.

Fig. 7. Grades in BSc study programme
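The course-level analysis in this subsection can be sketched as follows: for each course, compute the share of each grade and label the course by its share of F grades. The 80% and 10% thresholds come from the text; the function names, the labels, the course names and the sample records are hypothetical, and treating F as the only failing grade is an assumption.

```python
from collections import Counter, defaultdict

GRADES = ["A+", "A", "B+", "B", "C+", "C", "D", "F"]


def grade_shares(records):
    """Map each course to the percentage of each grade among its grades."""
    per_course = defaultdict(Counter)
    for course, grade in records:
        per_course[course][grade] += 1
    shares = {}
    for course, counts in per_course.items():
        total = sum(counts.values())
        shares[course] = {g: 100 * counts[g] / total for g in GRADES}
    return shares


def classify(f_percent):
    """Label a course by its percentage of F grades."""
    if f_percent == 100:
        return "all failed"
    if f_percent > 80:
        return "over 80% failed"
    if f_percent < 10:
        return "high performance"
    return "intermediate"


# Hypothetical (course, grade) records for three courses.
records = (
    [("COURSE-X", "F")] * 9 + [("COURSE-X", "D")]
    + [("COURSE-Y", "A")] * 6 + [("COURSE-Y", "B+")] * 4
    + [("COURSE-Z", "F")] * 3
)
shares = grade_shares(records)
labels = {course: classify(s["F"]) for course, s in shares.items()}
print(labels)
```

The per-course shares correspond to what the grade figures for each programme plot, and counting the labels gives summary percentages of the kind quoted above.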
An in-depth analysis has been made to evaluate the quality of the courses according to the students' results. It aims to determine whether students have low grades only in certain courses or in all courses. Table 6 presents the number of A+, A, B+, B, C+, C, D and F grades received by each student in the courses included in the curriculum of the BCA programme. For example, student 001 has 13 low grades (F) and 18 grades in the interval from A+ to D, of which 2 are A+ and 1 is A. This indicates that the quality of teaching in the courses in which the student has F grades is probably not at an adequate level.

Student  A+  A   B+  B   C+  C   D   F
001      2   1   3   0   3   7   2   13
002      0   1   3   2   1   3   1   20
003      0   0   1   1   0   0   0   9
004      0   0   1   1   1   0   1   13
005      0   2   2   2   1   6   5   13
006      0   1   0   0   0   0   0   5
007      2   2   2   1   2   5   3   14
008      1   0   1   1   0   0   0   14
009      0   1   2   3   0   3   4   18
010      1   1   1   2   1   3   4   18
011      0   0   4   1   1   1   1   23
013      1   0   2   3   1   4   4

Table 7 presents the number of grades only for those students who have received a high percentage of A+ and A grades. The results of the analysis show that the quality of education in some courses is likely to be poor. For example, a student who has received 33% high grades has not completed 13% of the courses. This indicates a high probability that the quality of teaching in these courses is not at a high level and that the poor grades are not entirely due to the student's lack of preparation. Data for all students can be seen in Fig. 9, which presents the percentage of A+, A, B+, B, C+, C, D and F grades for each student.
Table 8 presents the number of grades only for those students who have a high percentage of high marks (A+ and A). The results of the analysis show that the quality of teaching in some courses is likely to be poor. For example, a student who has 56% excellent grades (A+ and A) has not completed 22% of the courses.
This indicates a high probability that the quality of teaching in these courses is not good and that the poor grades are not entirely due to the student's lack of preparation. On the other hand, some strong students do not have F grades in any of the courses (see Table 8 and Fig. 10), which requires further analysis of the causes.
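The per-student reading of Table 6 can be reproduced with a short sketch. The counts for student 001 are taken from Table 6, and counting A+ and A as high grades follows the text; the function name and the keys of the returned dictionary are hypothetical.

```python
GRADES = ["A+", "A", "B+", "B", "C+", "C", "D", "F"]


def student_profile(counts):
    """Summarize one student's grade counts, given in the order of GRADES."""
    by_grade = dict(zip(GRADES, counts))
    total = sum(counts)
    high = by_grade["A+"] + by_grade["A"]
    return {
        "total": total,
        "passed": total - by_grade["F"],  # grades from A+ to D
        "high_percent": round(100 * high / total, 1),
        "f_percent": round(100 * by_grade["F"] / total, 1),
    }


# Row of student 001 in Table 6: A+, A, B+, B, C+, C, D, F.
student_001 = [2, 1, 3, 0, 3, 7, 2, 13]
profile = student_profile(student_001)
print(profile)
```

For student 001 this gives 31 grades in total, 18 of them passing, which matches the 18 grades from A+ to D quoted in the text. Comparing the share of high grades with the share of F grades for each student is the comparison that Tables 7 and 8 rely on.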

Conclusion
The quality of the training at North Lakhimpur College, Assam, India (Autonomous) was evaluated on the basis of 19249 grades received from 758 students in 511 courses, included in the curriculum of three study programmes. The evaluation of the teachers' feedback to students during the training shows that the final grades in 93% of BA courses, 58% of BCA courses, and 78% of BSc courses improved after the measures taken by teachers to improve student success and their timely feedback. There are courses in which all students have low grades or in which few students have passed with C+, C and D grades. These results can be interpreted as a sign of poor quality of teaching in these courses and of the need to study the reasons for the poor results and take measures to improve students' performance. An in-depth analysis was also made to determine whether students have low grades only in certain courses or in all courses.
During the next study year, all analyses will be conducted again and the results will be compared with the current results. The comparison will show whether the measures taken have produced results and whether the quality of the courses has improved.