Learning Analytics: Analyzing Various Aspects of Learners’ Performance in Blended Courses. The Case of Kabul Polytechnic University, Afghanistan

Learning performance is crucial in students' academic lives because it opens opportunities for future professional development. However, conventional educational practices do not provide all the skills that university instructors and students need to succeed in today's educational context. In addition, due to poor information resources, ineffective ICT tool utilization and prevailing teaching methodologies in developing countries, particularly Afghanistan, a large gap exists between curriculum plans and instructor practices. Learning analytics, as a new educational instrument, has made it possible for higher education actors to reshape the educational environment to be more effective and consistent. In this study, we applied multiple research approaches and analyzed various aspects of learners' behavior to address the aforementioned issues. The research methods were predominantly quantitative-cum-qualitative. Primary (quantitative) data were collected based on learners' explicit actions, such as completing assignments and taking exams, and implicit actions, such as interacting and posting on discussion forums. Meanwhile, secondary (qualitative) data collection was conducted on-site at Kabul Polytechnic University (KPU); both blended and traditional class samples were included. The results of this study offer insight into the aspects of learners' behaviors that lead to their success and indicate the analytical model/s that provide the highest prediction accuracy. Furthermore, the results could help educational organizations adopt learning analytics to conduct early assessments, evaluate the quality of teaching and learning and improve learners' performance.

Keywords—Learning Analytics, Blended Learning, Higher Education, Teaching, Learning, Quantitative-cum-Qualitative, KPU


Introduction
Good performance of learners is a widely shared, primary demand and hope in the educational environment, and it supports self-realization and well-being in learners' academic lives. Currently, information and communication technology (ICT) is used in a variety of fields around the world, particularly in education. Blended learning (BL) is one of the most effective forms of e-learning and has been implemented and studied in many educational settings [1][2][3][4][5]. However, even with the many cited benefits of using e-learning platforms in higher education institutions, there are still factors that should be considered to ensure their effective deployment.
In developing countries, particularly Afghanistan, despite ICT usage and sufficient ICT awareness in educational sectors, there is still a gap in individual learners' knowledge of ICT. Due to poor information resources, low motivation, and ineffective ICT tool utilization in the educational environment, it is very difficult to track learners with regard to effective and efficient methods of learning and teaching and to analyze the details of their online learning activities [6]. This difficulty makes the educational environment much more uniform and creates a large gap between curriculum plans and instructor practices. In addition, in conventional learning, due to the large number of students and the unreliability of educational data, it is very difficult to monitor students' activities and provide possible intervention mechanisms for students at academic risk. This scenario increases the number of failing students and prevents the quality of teaching and learning from improving or being predictable [7][8][9].
Learning analytics (LA), as a new tool, not only plays a key role in learners' successful performance, the prediction of possible outcomes of learning activities, and student retention rates but also enables academic institutions to evaluate the effectiveness of the learning environment with respect to learners' behaviors and to ensure learners' improvement and enrichment [10][11][12][13], [29][30][31]. LA provides opportunities to evaluate learners' performance in real time and determine the effectiveness of learning. This paper aims to extend previous research on determining students' performance through LA by correlating student activities with their final grades. In addition, this study investigates the major predictors of the successful completion of courses based on various learner aspects (interactive and noncognitive). Furthermore, this study aims to identify the analytical model/s that offer the highest prediction accuracy and then investigate the relationship between analytical and descriptive statistics.

Related Work
The past decade has witnessed a global revolution in LA that has contributed to success in higher education, the improvement of BL and the promotion of e-learning. These fields are shaped by the intersection of information and communication technologies, teaching and learning [14]. At the service and technology levels, particularly in higher education, this interest in ICTs has resulted in the creation of important analytics tools that enable educational actors to make informed teaching- and learning-related decisions. Recently, LA has had broader institutional impacts on learning, teaching, administration and learner support [15]. The use of LA in education systems has become a promising alternative and replacement for traditional learning and teaching.
A review of LA in relation to the academic performance of students in a university in South Korea was conducted [16]. The authors analyzed the factors affecting students' academic performance and outcomes using Moodle log data of 84 students. The authors claimed that factors such as total studying time, interaction with peers, regularity of learning intervals, and number of downloads had a significant impact on learners' performance. In addition, [10] used a vector space model algorithm to aggregate logs and quantify data using a single numeric value that could be used to generate visualizations of students' levels of performance. The results demonstrated that determining a single value could help instructors make early assessments of students' performance.
Furthermore, [18] used principal component regression to predict students' final academic performance in a BL environment. The researchers combined online and face-to-face (F2F) variables and achieved optimal predictive performance. Similarly, [19] used a variety of data-mining methods, such as visualization, decision trees, class association rules, and clustering, to determine the perceptions of 337 students in a blended learning environment. The authors concluded that failure in the course was associated with negative attitudes and that excellent grades were associated with increased use of Moodle.
In addition, [20] collected data from 84 undergraduate female students who attended both F2F and online learning environments within one semester (16 weeks). The authors developed a model through multiple linear regression analysis to predict students' performance based on six different variables. They found that the model predicted learners' outcomes well and allowed students to increase their final grades by 33.5%.
Finally, [21] developed four innovative types of mathematical models, namely, a multiple linear regression model, a multilayer perceptron network model, a radial basis function network model, and a support vector machine model, for the prediction of students' achievement in an engineering dynamics course. The researchers claimed that the results revealed only a slight effect on the average prediction accuracy and the percentage of accurate predictions.
These studies and others have revealed the role and effect of LA in educational environments, showing that LA enables academic institutions to reach specific learning goals. However, current measures differ in their approaches and are also somewhat limited in their ability to measure all aspects of learners' performance and outcomes and to effectively meet the needs of students and instructors, particularly in regions with fewer resources. Most recent studies have considered system log data (quantitative approach) and have determined learners' future performance after the conclusion of courses, which is impractical in real situations and makes the effective determination of significant factors difficult [6].
In addition, current studies mostly generate their results from the examination of implicit actions in a single data set (fully online activities or classroom activities), which is ineffective and inefficient when the results are applied to blended courses [6]. Meanwhile, recent studies have not investigated improvements in students' noncognitive aspects, such as attitudes, collaboration, motivation, capability, perception, and pedagogy related to blended activities, or how these aspects influence their performance and therefore the prediction process [9][10][11][12], [22][23], [27][28]. In turn, this gap in the research limits our ability to broaden knowledge of this topic and identify appropriate strategies for analyzing various learner aspects to determine their performance and academic achievements.

Research goal
The intended purpose of this research was to assess various aspects of learners' academic performance and determine the factors that influence it. We aimed to achieve the following objectives:
- To analyze learners' activities in relation to various features of selected courses in BL.
- To identify students' behaviors in the online environment that are correlated with their final grades.
- To determine the most significant factors that affect learners' academic performance in both blended and traditional learning environments.
- To determine the consistency between analytical and descriptive statistics.
- To identify the best analytical model/s that provide the most accurate metrics for evaluating students' performance.

Research approaches
To achieve the research objectives, primary data were collected from log data stored in a learning management system (LMS) database. Hundreds of activity logs for each student were collected, classified and analyzed using the proposed methods. The overall LA process was conducted in four main steps. In the first step, the raw data captured from the data warehouse were cleaned and converted to a validated dataset. The dataset extracted from Moodle had missing values and redundant information, which were processed in the second step. In this step, the extracted data underwent normalization, in which the actions logged by instructors and administrators were removed from the dataset, and the data were filtered by department, user identification and action. In the third step, the data were evaluated with the proposed statistics [8]. Finally, the experimental results were summarized as recommendations and suggestions for academic organizations. Figure 1 shows the visualization of the LA process.
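As a rough illustration of the cleaning and normalization step, the following R sketch filters a Moodle log export; the data frame raw_logs and its columns (userid, role, department, action, timestamp) are hypothetical stand-ins, and the actual field names of the original tool may differ.

library(dplyr)

clean_logs <- function(logs) {
  logs %>%
    distinct() %>%                                 # drop redundant (duplicate) records
    filter(!is.na(userid), !is.na(action)) %>%     # remove rows with missing key values
    filter(!role %in% c("teacher", "admin")) %>%   # remove actions logged by instructors and administrators
    select(department, userid, action, timestamp)  # keep the fields used for filtering and analysis
}

validated <- clean_logs(raw_logs)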

Data and study context
This study used a quantitative-cum-qualitative approach. The primary (quantitative) data were collected based on learners' explicit actions, including completion of assignments and exams, and implicit actions, such as interactions, posts on discussion forums, and other activities, recorded in data from two different blended courses for the 2018 academic year:
- Compulsory sophomore-level class
- 17,416 LMS log files
A total of 171 students were enrolled in the OS course: 114 students from the Computing Information Science (CIS) and Computer Engineering (CE) departments were in the blended course, and 57 students from the Information Technology (IT) department were in the traditional course. For the SAD course, 70 students from the IT department were in the blended course, while 68 students from the CIS department were in the traditional course. Table 1 presents the characteristics of the two courses for the 2018 academic year at Kabul Polytechnic University (KPU). As shown in Table 1, of the students enrolled in the OS course, only 106 validated students remained after cleaning the dataset, and 62 remained for the SAD course. The semester lasted 15 weeks and included 3 online and 4 F2F learning activities, as shown in Figure 2. During the semester, students were granted unlimited access to course materials, but quizzes and assignments were open for a limited duration of approximately one to two weeks. For quizzes, the higher score of two attempts during the specified duration was used as the final score, and for assignments, the best score after the in-class debriefing of the results was used as the final score. Figure 2 shows a diagram of the blended course activities.

Fig. 2. Blended course activities diagram
As shown in Figure 2, the students registered themselves before the classes through Moodle. During the courses, they had mixed activities: (i) online activities, in which they took quizzes, submitted assignments, participated in discussion forums and used course content, and (ii) in-class traditional activities, in which they attended weekly (F2F) sessions and took mid-term and final exams. To assess and enhance participants' knowledge, instructors provided additional weekly sessions for the evaluation and debriefing of weekly assignments.
The secondary (qualitative) study was carried out on-site in March and April 2019 at KPU; junior (OS) and sophomore (SAD) samples from both blended and traditional classes were included. We collected the required data at each stage of the study in phases that included (i) semistructured key informant interviews with educational actors, including students, lecturers, technicians and decision makers, and (ii) a mini-survey using structured questionnaires to capture participants' beliefs, perceptions, motivations and behaviors and to inform productive and collaborative conversations. Table 2 shows the characteristics of the collected data. The survey mainly collected qualitative information to explore participants' perceptions of, feedback on and agreement with the preliminary research results and to compare the results of the previous stages of the study with those of the current stage. Participants indicated their responses on a Likert scale (LS) ranging from Strongly Agree (5) to Strongly Disagree (1).

Data analysis
The collected data consisted of two main parts: (i) subjective data (see Table 3) and (ii) objective data (see Table 4). In Table 3, the first seven variables (S1-S7) and S9 related only to blended courses, variables S16-S18 related to traditional courses, and the remaining variables (S8 and S10-S15) related to both blended and traditional courses. As shown in Table 4, variables O1-O9 were extracted from the log files of students' profiles; they include, among others, attendance of lab-based work/class case studies, the mean score of assignments (ratio, 0-100), the mean score of quizzes (O4), the total access number (ratio, 0->500), the quiz attempt duration in minutes (O7; ratio, 0-20) and the number of quiz activities (O8; ratio, 0->270). This combination of interactive and subjective variables was used not only to examine various aspects of students' performance but also to identify the significant factors influencing students' academic performance.

Results
For the experiment in this study, we used an exploratory research design with both descriptive and analytical statistics to describe the associations among variables, demonstrate cause-and-effect relationships between variables and identify the ability of one variable to predict another [4]. The analysis tool was implemented in the R programming environment, which enabled us to extract any type of analytic and statistical information as required.

4.1 Analytical statistics

Logistic regression model (LRM): performance assessment: Logistic regression (LR) is a type of predictive analysis that is used to determine the relationship between dependent and independent variables [11][12], [29]. To ensure the quality of the analysis and the credibility of the data, we used LR to model, estimate, predict and, finally, classify the collected data. In this experiment, we considered five interactive variables. Due to differences in course setup, we treated each course separately and divided the dataset for each course into six equal subsets (k = 6 folds), with a random split used to form the training and test sets. Each time, one of the k subsets was used as the test set, and the other k-1 subsets together formed the training set. Finally, the error was averaged across all k trials to obtain the average accuracy metric over the 6 test sets.
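As a minimal sketch of this 6-fold cross-validation, the R code below fits a logistic regression and averages the accuracy metrics over the test folds; the data frame course_data, its predictor columns (named O4-O8 after Table 4) and the binary performance column are hypothetical stand-ins, not the original analysis tool.

set.seed(1)
k <- 6
folds <- sample(rep(1:k, length.out = nrow(course_data)))  # random split into k equal subsets

results <- matrix(NA, nrow = k, ncol = 3,
                  dimnames = list(NULL, c("accuracy", "sensitivity", "specificity")))

for (i in 1:k) {
  train <- course_data[folds != i, ]   # k-1 subsets form the training set
  test  <- course_data[folds == i, ]   # the remaining subset is the test set

  fit  <- glm(performance ~ O4 + O5 + O6 + O7 + O8, data = train, family = binomial)
  prob <- predict(fit, newdata = test, type = "response")
  pred <- ifelse(prob >= 0.5, 1, 0)    # classify at the 0.5 decision boundary

  tp <- sum(pred == 1 & test$performance == 1)
  tn <- sum(pred == 0 & test$performance == 0)
  fp <- sum(pred == 1 & test$performance == 0)
  fn <- sum(pred == 0 & test$performance == 1)

  results[i, ] <- c((tp + tn) / (tp + tn + fp + fn),  # accuracy
                    tp / (tp + fn),                   # sensitivity: high performers found
                    tn / (tn + fp))                   # specificity: low performers found
}

colMeans(results)   # average metrics over the six test folds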
For each period of the OS and SAD classes, the collected data were first accumulated on a week-by-week basis and then classified in a series of 8 steps to identify the weekly prediction success. The first three weeks were introductory and did not include any online activities, so analysis of the training and testing datasets started from the fourth week. For the first step, we considered only variables extracted from the first week's log data. The second step covered the accumulated data from the log data of the first and second weeks. Thus, the last step (week 14) included data from the first week until the end of the 14th week of the semester. The experiment was designed based on the following criteria.

a) Target or dependent variables

The decision boundary for the determination of students' performance was based on Equation (1), which uses the following quantities:
- I: total number of samples after cleaning the dataset (106 for OS and 62 for SAD)
- N: number of weeks (15 weeks in a semester)
- Ot: threshold for the number of quiz attempts and submitted assignments
- the cumulative weekly selection based on the number of online activities for the i-th student
- the predictor variable for the i-th student in the n-th week
- the score of the i-th student in the n-th week
- the access log frequency of the i-th student in the n-th week
- At: access log threshold
- Qt: quiz and assignment score threshold

b) Predictors or independent variables
- O4-O8

As mentioned above, the target or dependent variable in this study was performance and took two values: "0" for low performance and "1" for high performance. For low performance, the target condition S was satisfied when Equation (1) fell below the arbitrarily defined thresholds; in that case, the function assigned the student inactive (low performer) status, and active (high performer) status otherwise. The statistical moving thresholds were calculated on aggregated week-by-week data to model the log patterns and online activities of students. For each week, the average access log count was calculated, and 30% of the weekly access was defined as the minimum threshold for each student because students accessed online materials on campus. Meanwhile, the mean quiz and assignment score threshold Qt was arbitrarily set to 0.45 as the minimum score for high performers, based on the high academic pressure on students, the volume of students' online and practical activities during the semester and the availability of essential content before mid-semester to pass the courses. In addition, the number of submitted assignments and quiz attempts was accumulated on a weekly basis, and falling below 70% of the number of submitted assignments and quiz attempts was established as an early indicator of a low performer for each week.
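To make the labeling rule concrete, the following R sketch shows one possible reading of the weekly thresholds described above; the data frame weekly and its columns (week, access_logs, mean_score, n_submitted, n_expected) are hypothetical, while the 30%, 0.45 and 70% values follow the thresholds stated in the text.

library(dplyr)

label_weekly <- function(weekly, Qt = 0.45) {
  weekly %>%
    group_by(week) %>%
    mutate(At = 0.30 * mean(access_logs)) %>%        # access threshold: 30% of the average weekly access
    ungroup() %>%
    mutate(performance = ifelse(access_logs >= At &
                                  mean_score >= Qt &                 # quiz/assignment score threshold
                                  n_submitted >= 0.70 * n_expected,  # 70% of the open quizzes/assignments
                                1, 0))                               # 1 = high performer, 0 = low performer
}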
Meanwhile, the target variable was combined with the interactive variables (O4-O8). To assess the accuracy and predictive ability of the models, we applied several accuracy metrics: accuracy, sensitivity, and specificity. These metrics were calculated from the confusion matrix of predicted versus actual labels (Eqs. 2-4): accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP), where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively. Figure 4 illustrates the average sensitivity metric of the 6-fold test prediction for both courses.

Fig. 4. Prediction sensitivity for both courses
As shown in Figure 4, the prediction sensitivity for both courses was promising, particularly for the OS course. For the first week of online activities (4th week), 86% of the high-performing students were correctly predicted for the OS class, and 85% for the SAD class. The sensitivity score for the OS course had 10% variation, indicating higher sensitivity stability for the OS model than for the SAD model, which had 21% variation. The sensitivity of prediction for the SAD class reached 93% just before the mid-term exam (7th week) and 96% at the end of the semester, which was high enough to provide early information on students' engagement in online activities and grade prediction. The sensitivity measures for the OS class were slightly lower than those for the SAD class, which may have been because of differences in attitudes, beliefs and motivation between students from the two departments who were enrolled in the OS class; the contribution of students to online activities; and the contribution of the online activity score to the final grades and course setup. For the OS class, 84% of high-performing students were correctly identified before the mid-term exam, and the highest percentage of 85% was reached at the end of the semester.

c) Specificity
Specificity indicates the accuracy of the prediction of low-performing students, as calculated in Equation 3.
The average prediction specificity for both courses is illustrated in Figure 5.

Fig. 5. Prediction specificity for both courses
Conversely, the prediction specificity for both classes was relatively similar. For the OS class, 96% of the low-performing students were correctly predicted to be at academic risk in the first week, which was the highest prediction rate, and 91% and 92% were predicted to be at academic risk before the mid-term (week 7) and at the end of the semester, respectively. For the SAD class, 83% of the low-performing students were correctly identified in the first week, and this percentage reached 88% before the mid-term. The specificity score for both courses had 7% variation. This prediction specificity could be used by higher education institutions for the early identification of students' success and could give institutions the opportunity to intervene in a timely manner and assist low-performing students during the semester to prevent failure or dropout and increase course retention rates. Figure 6 and Table 5 summarize the accuracy, sensitivity and specificity results for both courses.

In conclusion, the overall results for both courses were promising. However, the OS class data showed better results than those of the SAD class; in particular, the values on the vertical axis increased almost constantly from W5-W12, whereas for the SAD class, the values fluctuated. In addition, there were significant differences in the sensitivity and specificity metrics between the two courses: the SAD course showed the highest sensitivity, while the OS course showed the highest specificity. Based on the findings, the major reasons behind this result could be differences in course setup, followed by the contribution of students to online activities, particularly the number of times they accessed the LMS, and the contribution of the online activity score to the final grade. We also found statistically significant differences (p<0.001) in the course completion rates and overall final results between the SAD and OS courses. In addition, we found that significantly more junior OS students were employed (had a part-time job) than sophomore SAD students (p<0.01). Furthermore, there was a significant difference (p<0.01) in internet usage between the SAD and OS classes. Therefore, we determined that sophomore students performed better than junior students and that online activities have the potential to influence learners' attitudes toward learning, thereby enhancing or hindering their ultimate achievements in the learning context. Hence, the models of each metric provided the greatest prediction accuracy with the highest stability, which will make it possible to apply the models in real time in the future. The results obtained were comparable to and consistent with those of previous studies that achieved early prediction of at-risk students and offered interventions before the mid-term exam or end of the semester [17][18], [22], [27][28][29], which is too late.
K-means clustering results: at-risk cluster determination: In the second stage of the analytical statistical analysis, we applied the k-means clustering algorithm to the data of any week from the fourth week onward to track students' behavior and determine an "at-risk" cluster in the unlabeled data. The clustering algorithm was applied to the interactive variables (O1-O8). We divided the 'O8' attribute into three values and set k=3 for the k-means clustering of both courses. The results allowed us to categorize similar data into groups and assign candidates to a cluster based on their efforts, distinguishing them from the "at-risk" cluster (O8~F) for each subsequent week. The clustering results are summarized in Table 6. The results obtained using k-means were comparable to those of similar studies, except that in our study, a maximum of 42% of students in the OS class and 26% of students in the SAD class were predicted to be at risk, whereas in similar studies, 10% to 23% of students were identified as being at academic risk [27]. In addition, the prediction specificity of the at-risk cluster in our study was promising. The results showed that in each subsequent week, a group of students was at academic risk, which reflected the students' performance from the third week onwards. This result is in accordance with the findings of previous studies that achieved early prediction of at-risk students after one-fourth of the semester period [27] and before the midterm exam [18]. Table 7 depicts the comparison of cluster means for online activities in week 14. In Table 7, "Cluster 1" represents mainly high-performing students with high grades (A, B), "Cluster 2" represents mostly average-performing students with grades of C or D, and "Cluster 3" includes mostly failing or dropout students (F).
The clustering results for both courses were relatively effective. As shown in Table 7, the candidates were divided into three clusters according to their online activities, and the online activities had a direct impact on their performance. For instance, Cluster 3 appeared comparable to the two other clusters in the clustering itself, but all clusters were dissimilar in terms of online activities. Therefore, the results revealed that online-related activities were solid indicators of students' performance. In addition, students who accessed the system more frequently had higher grades. This result contrasts with that of [10], who found that longer time spent on Moodle may not result in course achievement. Thus, we conclude that there are similarities in the way that students at academic risk in educational programs are ranked in the at-risk cluster, such that students with similar access counts and online activity levels, as well as low GPAs, belong to the same cluster. Such a conclusion can have an impact on improving student performance in future periods and motivate students to use online activities. In addition, this conclusion indicates the importance of lecturers providing students with the necessary feedback on their performance and outcomes.
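For reference, a minimal sketch of this clustering step in R is shown below; the data frame course_data and its columns O1-O8 are hypothetical names mirroring Table 4, and the exact preprocessing of the original study is not reproduced.

set.seed(1)
X <- scale(course_data[, paste0("O", 1:8)])   # standardize the interactive variables O1-O8
km <- kmeans(X, centers = 3, nstart = 25)     # k = 3: high, average and at-risk groups

course_data$cluster <- km$cluster
aggregate(course_data[, paste0("O", 1:8)],
          by = list(cluster = course_data$cluster),
          FUN = mean)                          # compare cluster means of online activities (cf. Table 7)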
The major aim of the overall approaches (supervised and unsupervised) was to measure the data accurately and determine which of the algorithms would provide the best accuracy and stability for detecting students' performance. Among the different types of analytics, we achieved the most stable and balanced results with the LRM, which yielded the highest sensitivity and specificity metrics for both courses, with 7% variation compared to the k-means model variation of 17-23%. Therefore, the results revealed that the LRM was the best model, with higher stability and accuracy metrics than k-means, and it predicted failure-prone and high-performing students as early as the second week of the semester.

Descriptive statistics
To accurately predict learners' performance, determine the most influential factors affecting their actual performance and examine factors beyond the interactive aspects (noncognitive aspects), descriptive statistics were used in this study. The use of descriptive statistics not only helped us identify the noncognitive factors that had a major impact on student performance but also allowed us to check the consistency of these factors with the analytical results. In this analysis, two different approaches were used to achieve the expected results. In the first approach, we considered the perceptions of passing and failing students in BL courses. In the second approach, we also considered the perceptions of students in traditional courses and drew conclusions by comparing the perceptions of traditional and blended learners.
A questionnaire consisting of 46 items in 3 sections (demographics, technical items and suggestions/comments) was administered to both the BL and traditional classes. In the questionnaires for both classes, a Likert response scale was used to measure respondents' attitudes toward the questions. In this analysis, three BL (n=169) and two traditional (n=111) class samples were included. Among the BL students, 37% (n=63) were female, and 63% (n=106) were male. Among the traditional students, 35% (n=39) were female, and 65% (n=72) were male.
For the analysis of this study, descriptive statistics (mean, M; standard deviation, sd) were used to summarize each item. In addition, Welch's t-test was used to assess whether there was a difference in the average responses of passing and failing students, as well as in BL and traditional students' perceptions. A P-value <.05 was considered statistically significant. Table 9 presents the descriptive statistics for both groups of students. In Table 9, the percentage of positive responses (Strongly Agree, SA, and Agree, A), mean (M), standard deviation (sd), and p-value are provided for each item.
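For illustration, the item-level comparison could be run in R roughly as follows; the data frame responses, the item column item12 and the grouping factor group (with levels "pass" and "fail") are hypothetical names, not the original instrument.

res <- t.test(item12 ~ group, data = responses, var.equal = FALSE)  # Welch's t-test for one item
res$p.value < 0.05                                                  # flag significance at the 5% level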
Welch's t-test yielded a p-value <.01 for the first three items and items 6-8 and 14-16, indicating significance at the 1% level. In addition, the results yielded a p-value <.05 for items 5, 10, 12, and 13, indicating significance at the 5% level, whereas the remaining items (4, 9, and 11) showed no statistically significant differences between passing and failing students' perceptions.
Based on the findings, both passing and failing students perceived that the majority of online activities could influence learners' academic performance; for instance, online materials, quizzes, and assignments were strong indicators of students' performance. In addition, in-class interaction (Item 10) and in-class debriefing assignments (Item 8) were much more associated with students' interest than online forum discussion (Item 9). This result is in contrast with the findings of [23], which revealed that discussion posts and peer interaction influenced students' academic performance in blended learning. In addition, both groups agreed with item 12, "I am willing to make time to use online activities that affect my academic performance in my learning" (M=4.5, sd=0.9 for passing students and M=4.1, sd=1.1 for failing students).
Conversely, students' perceptions of class size were not positive; approximately 80% of respondents disagreed or strongly disagreed with item 11 (M=2.3, sd=1.3 for passing students and M=2.4, sd=1.2 for failing students). This result is in accordance with those of [25], who found that having a large number of students in one class is considered a disadvantage for improving academic achievement. Meanwhile, item 14, "student part-time employment," as well as item 16, "less attendance of lab activities," had clearly negative effects on students' academic performance.
Furthermore, there was a significant difference between the two groups (p<.05) for item 13. The results revealed that passing students' attitudes and feelings (M=0.7, sd=0.5) were much more strongly associated with their final achievement than were failing students' attitudes and feelings (M=0.6, sd=0.5). Therefore, the results revealed that failing students may not be as confident as passing students in their ability to successfully complete a course. Similarly, for the second approach, significant differences were observed between BL and traditional learners' responses. Table 9 presents the descriptive analysis of some of the items with significant differences between BL and traditional learners' perspectives. The study results indicated that there was a significant difference between the two groups for all considered items, with p<0.05.
In conclusion, in both approaches to the descriptive analysis, the responses indicated that there was a direct link between students' performance and their perceptions, with students' success in the course depending largely on the educational environment and their perceptions. Therefore, it is argued that students' perceptions, as well as the interactive variables, act as strong predictors of students' success.

Relationship mining: comparison of subjective and objective variables
Relationship mining is one of the most robust analytical approaches and is used to determine relationships between defined variables [9]. In this study, we applied correlation analysis to describe the relationships among the online activities initiated by students, students' perceptions and students' course achievement (final grade). For this purpose, we observed fourteen variables and found substantial correlations between the majority of them. Table 10 presents the correlation analysis results for the OS and SAD classes. As shown in Table 10, the results indicated sufficiently strong, significant positive correlations (p<0.001) for most of the variables in both the OS and SAD courses. The correlation analysis showed significant positive correlations of the variability scores (interactive variables) and perceptions (noncognitive aspects) with students' final results. For instance, there was a strong correlation between students' final results and the mean quiz score (O4) for the SAD class (r=0.7, p<0.001), and a moderately positive association existed for the OS class (r=0.35, p<0.001), which was strong enough to indicate that students who attempted more quizzes obtained better scores.
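As an illustration of this relationship-mining step, the short R sketch below computes the kind of correlations reported in Table 10; the data frame students and the columns O4, S15 and final_grade are hypothetical stand-ins for the observed variables.

cor.test(students$O4, students$final_grade)        # e.g., mean quiz score vs. final result
round(cor(students[, c("O4", "S15", "final_grade")],
          use = "pairwise.complete.obs"), 2)       # pairwise correlation matrix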
Similarly, there were sufficiently strong positive correlations between students' perceptions of activities (subjective variables), particularly online activities (S1-S3, S6-S8), and their final results, which were sufficient to indicate students' behaviors and attitudes. This result is consistent with the findings of [10], [19], [26], who argued that the regularity of the learning interval in LMSs influences students' academic performance and achievement.
Another area in which very strong positive correlations existed was between "student part-time employment" (O1) and its related item (S15) for the SAD class (r=0.85, p=0.001) and the OS class (r=0.58, p=0.001), which clearly indicates that part-time jobs have a negative effect on students' performance and interfere with their studies. Conversely, there was a high negative correlation between S13 and S14 for the OS class (r=-0.82, p<0.001) and the SAD class (r=-0.72, p<0.001). These results reflect the negative feelings and attitudes of students toward their educational experiences. The major factors mentioned by students were psychological factors such as depression, frustration, and fear of examinations. In addition, they mentioned other factors, such as the professionalism of lecturers and lack of motivation.
In conclusion, we identified six subjective variables (S1-S3 and S6-S8) and seven interactive variables (O2-O8) as critical variables that affected students' final results. Additionally, having a part-time job and having negative feelings had a strong negative impact on students' performance and their outcomes. Therefore, we assume that there is a direct link between students' achievement and online activities so that students who are more interested in online activities are less exposed to academic risk, even if they have a part-time job and negative feelings. Such findings could be used to allow lecturers to respond with proactive early intervention whenever students lag behind based on the observed correlations.
In addition, an interesting finding of our study was that the descriptive and analytic statistics were consistent. The results obtained from both types of statistics showed that the same variables were significantly correlated with students' final grades. Specifically, the sensitivity results of the LRM for high-performing students from both courses were consistent with the descriptive results. However, the specificity results of the k-means algorithm for low-performing students were even more consistent with the descriptive statistics.

Discussion and Conclusion
In this study, different statistics (descriptive and analytical) were used to investigate various aspects (interactive and noncognitive) of student learning to determine how these aspects influenced students' performance and whether they were effective in predicting performance in real time, a topic that previous studies have not explored [9][10][11], [22][23]. The major goal of using these types of statistics was to provide appropriate measures for our sample and data, as well as to determine the relationships between variables to enable predictions and inferences.
Using analytical statistics, we applied different data-mining methods (supervised and unsupervised) to track the details of learners' online activities and behaviors in a variety of ways. LRM models of the predictive variables were developed with a variety of accuracy metrics. The models provided accurate and balanced results for both courses and effectively predicted potential student performance during the semester, which can support early interventions for low-performing students and improve their chances of academic success. In addition, the models make it possible to provide early information on students' engagement and grade predictions before the end of the semester.
Importantly, k-means clustering allowed good determination of students' online behavior. This technique provided adequate information about students' academic characteristics for each subsequent week and grouped them with the nearest mean based on their efforts and abilities. Such information could be used to identify learners with different expected performances, thus detecting needs for pedagogical improvement and measuring the results. Furthermore, the model had the highest specificity metric for the prediction of failure-prone students for each subsequent week, which was consistent with the results of the supervised (LRM) model. In conclusion, the experimental results demonstrate that compared to k-means, the LRM had higher stability and more balanced results, with 7% variation, which indicates that the model has superior accuracy metrics for detecting the students' performance.
For descriptive statistics, two approaches were used (analysis of the perceptions of passing and failing students and of the perceptions of BL and traditional students), which yielded similar results. The results for both approaches revealed that students were willing to engage in online activities; however, the BL students' perceptions indicate that the majority of online activities could have more of an impact on their final results than traditional activities. Similar results were obtained from the analysis of passing and failing students' perceptions. Thus, this information could be valuable for educational actors in estimating actual data parameters as well as examining the variables that influence students' performance. Furthermore, this finding could indicate a great opportunity for students to motivate themselves, as online activities not only do not require much effort to use but also increase students' levels of interaction.
In addition, through the relationship mining method, major factors that could play a key role in learners' performance and final results were identified. Among the interactive and subjective variables, six subjective variables (S1-S3, S6-S8) and seven interactive variables (O2-O8) were identified as critical indicators of students' success and retention. This finding also reflected the strong consistency between the descriptive and analytic statistics. Conversely, this study identified that students' part-time jobs and negative feelings toward teaching and learning had a strongly negative influence on their final performance. Such information could have a major impact on students' ability, competitiveness, and performance and could be used to make accurate predictions and decisions in the future.
In conclusion, the overall findings revealed that the university must focus more on online activities, provide enough facilities, and encourage lecturers and students to use e-learning in their teaching and studying. In addition, there is a need for lecturers to improve course materials and increase the number of times students access online activities by incorporating online quizzes, assignments, forum discussions, alternative course references, and lab activities. Meanwhile, it is necessary for lecturers to maintain positive behavior in their interactions with students and implement teaching processes in a beneficial manner to help students overcome fear and anxiety. It is also vital for students to be dedicated and responsible in their studies. Finally, it is strongly recommended that the Ministry of Higher Education give appropriate consideration to the course evaluation system and devote a substantial percentage of the overall grade to student activities (place more focus on formative assessments rather than summative assessments).
Considering the results obtained, it could also be possible to minimize the large gaps between curriculum plans and instructors' practices in the educational environment, thus providing opportunities to challenge instructors to arrange course materials and prepare well for classes. In such cases, it would be easy to identify teachers who are performing well and teachers who need assistance with teaching methods. Furthermore, based on the results of this study, universities and higher education institutions can take action to integrate descriptive and analytical statistics into their decision-making processes to improve the quality of teaching and learning.
Overall, the experimental results demonstrated positive outcomes of describing students' performance in relation to various aspects (interactive and noncognitive) and clearly indicated that both interactive aspects of student learning and noncognitive aspects, such as attitudes, collaboration, feelings, and motivation, have strong correlations with students' performance. Furthermore, it is argued that the findings of this study could help educational actors perform early assessments of the quality of teaching and learning and improve learners' knowledge and motivation. A summary of the most common challenges and important recommendations will be provided in the future. In addition, in further research, there is a need to incorporate more data for real-time prediction to move toward achieving better accuracy and realizing the potential of LA to optimize teaching and learning.