Research on Personalized Recommendations for Students’ Learning Paths Based on Big Data

With the development of the Internet, the use of hybrid learning is spreading in colleges and universities across the country. The urgent problem now is how to improve the quality of hybrid learning; specifically, how to improve the learning effect of students under an online learning mode. In this paper, we build an online learning path model by exploring the big data of students' online learning processes. The model can be used to find excellent learning paths. Based on students' learning habits, we recommend to each general student the excellent learning path that is most similar to that student's own path. Experimental results indicate that the proposed methods not only recommend appropriate learning paths that significantly improve learning results in terms of accuracy and efficiency, but also help to improve teaching quality and to promote personalized learning and targeted teaching.

Keywords: Big data, learning path, similarity, personalized recommendation


Introduction
Today's classroom design and large-scale online courses have fundamentally changed the mode and accessibility of both learning and teaching; they have also significantly affected academic research and teaching in higher education. In a traditional face-to-face classroom, the students' dominant position cannot be well reflected. However, learning exclusively online is not conducive to students' systematic management and mastery of knowledge either. In order to integrate the advantages of the traditional classroom and online learning, and to reform the inherent teaching modes and methods, the international education community proposed the idea of hybrid learning at the end of the 20th century. Since then, research on hybrid learning has attracted more and more attention. As institutions of higher education began to adopt the hybrid teaching method, more students began to learn online, and mechanisms were put in place to record online and offline learning log data. Faced with massive amounts of data, researchers use different ways to obtain learners' learning log data and analyze it from different perspectives in order to uncover the best possible learning path. However, different learners have different learning habits and styles. As the number of learners with individual needs grows, a single best learning path discovered by researchers is difficult to adapt to every type of student. Therefore, the recommendation of an adaptive, personalized excellent learning path has become a research hotspot. Such studies are of great reference value for improving students' learning effect and teachers' personalized teaching methods.
To provide learners with personalized learning content and an adaptive learning style, two major problems need to be solved. The first is the mining of both the general learning path and the excellent learning path. The second is to find the shortcomings of the general learning path and to recommend a personalized learning path. To address these problems, this paper uses process mining technology to mine the excellent learning path. Then, the matrix process similarity principle is used to recommend the best learning process and resources for each student. Note that in this paper, "learner" means student.

Related Work

Learning process optimization
In the era of education informatization 2.0, technologies such as digital teaching resources and telecommunications were developed. These technologies promoted mobile learning based on mobile Internet and smartphones, and gradually became the most recent trend of education development [1]. With hybrid teaching currently in full swing, the study of learning process optimization is now being developed from various aspects.
Yang et al. proposed fragmentation learning, a learning method that meets the individual needs of learners and applies to lifelong learning. The method can improve and consolidate the knowledge acquired in the learning process [2]. Zhao et al. hold that educational technology is the theory and practice of designing, developing, utilizing, managing and evaluating the learning process and resources, and that the learning process can be optimized by using changes in educational technology [3]. Xie et al. found a suitable learning path for a group of learners (rather than a single learner) in an e-learning environment, through the framework of a profile-based group learning path [4]. Yang Lin et al. proposed a method based on the knowledge concept network topology to optimize the learning path. This method uses different networks connected by related concepts to represent different domain knowledge, in order to analyze the correlation between concepts in the massive data of the learning process [5]. Lin et al. used relevant data, such as learning time, playing days and learning chapters, to cluster the learning behavior of different groups with a K-means clustering method. By discussing the relationship between learning behavior and learning outcomes, they put forward a proposal for learning process optimization [6]. Zhou et al. used clustering and machine learning techniques to predict learners' learning paths and learning effects, and then put forward suggestions on how to optimize the learning process [7].
Existing research into the learning process mainly elaborates the learning process from the macro perspective of the learning environment, learning groups, and learning behavior. These studies also put forward some general suggestions on how to optimize the learning process. However, due to the differences in students' learning habits, learning styles and learning processes, being able to recommend personalized learning paths for different students is particularly important.

Personalized learning process recommendations
Resnick proposed the concept of personalized recommendations in 1994 [8]. Since then, personalized recommendation technology has become a research hotspot. De-Marcos et al. believed that making a learning resource recommendation is a multiobjective combinatorial optimization problem, and that modeling learners in order to optimize and combine the content and sequence of appropriate learning resources is the most critical technology [9]. Liu Min et al. analyzed the learning style, online learning preferences, the learner's knowledge structure, the learner's online learning behavior, and the results. The study then made personalized recommendation settings for the content, type, recommendation time and frequency of learning resources [10]. Avi et al. proposed a personalized teaching content algorithm which combines a collaborative filtering algorithm with voting methods [11]. Vanitha et al. presented a collaborative optimization algorithm, combining ant colony optimization and a genetic algorithm to provide learners with a personalized learning path [12]. Huang et al. put forward the framework of a personalized learning resources recommendation system. The system is based on a knowledge map, which provides a technical solution for the establishment of a personalized learning resources recommendation system [13]. In order to meet the needs of different learners for learning paths, Liu et al. proposed an intelligent learning path recommendation model based on an ant colony algorithm, which could be used for the personalized customization of learning paths in an intelligent learning environment [14].
The personalized research provided a useful reference point for the personalized path recommendation in this paper, mainly from the aspects of learning resources, learning behavior, teaching content, etc. However, few studies examine personalized recommendations from the aspect of a student's learning path. In this paper, we first mine the learning paths of excellent students by using process mining technology. Then, we use the similarity principle of learning paths to recommend excellent learning paths for students with average or poor grades. Finally, we put forward optimization suggestions to improve students' curriculum performance.

Learning Path Personalized Recommendation Method
The learning path refers to the route and sequence of learning activities. It is the ordering of the learning activities that the learners need to complete, and this must be done according to the learning objectives and learning content, under the guidance of certain learning strategies [15]. It is possible that the study of learning paths will not only provide learners with a clear learning route and improve students' learning efficiency, but such study can also serve as the basis upon which managers can evaluate students' learning.
In this paper, the personalized recommendation method of the learning path was mainly studied from three aspects: data preparation, excellent learning path mining and the personalized recommendation of a learning path. Firstly, we collected and organized data during the data preparation phase. Secondly, process mining technology was used to mine the learning logs of excellent students. The excellent learning path models were formed as the basis for the recommendation of the learning path for ordinary students or students with poor grades. Then, the personalized learning path recommendation was proposed by calculating the similarity of the learning paths. Finally, based on the gap between the excellent path and the general path, we suggest ways to optimize the learning path. The specific research route is shown in Fig. 1.

Data Preparation

Data acquisition
The data in this paper are from the learning website platform of Hebei University of Science and Technology. We took the Electronic Commerce System Analysis and Design course at level 15 and level 16 as an example, in order to analyze the learning path of online students. The platform works as follows: teachers publish online learning tasks in the teacher's back-end system. After class, the students watch videos or download materials to complete the tasks independently. Teachers then enter their own system to check and score the students' task work. The student interface provides a separate discussion area, as well as facilities for video learning, data downloading, job viewing, and job submission. In the discussion area, students can publish questions about topics or problems they do not understand and then wait for responses from their classmates or teachers. In addition, students can directly reply to the published questions in the discussion area, in order to help other students understand and obtain the relevant knowledge. Videos and data can be either viewed online or downloaded. Students can view assignments and submit completed assignments on the job review page.
In the Electronic Commerce System Analysis and Design course, the teacher published seven learning tasks, each of which was graded. For each task, the teacher awarded the students one of five grades, from A to E. After communicating with the administrator, we obtained the learning logs of the two classes. Then, we sorted out the seven learning path logs of the 57 students with a grade A as excellent path mining logs.

Data processing
The collected data were generally messy and disorderly, so they needed to be cleaned and organized. The ID numbers of the data directly obtained from the learning website platform were not continuous, and we needed to organize the learning events by student. As repetitive events were inevitable in the student learning process, we needed to order the events of the same log by time, in order to ensure the reliability of mining. In addition, ProM (the process mining software) requires data in the XES structure. Therefore, it was necessary to convert the log from CSV format into XES format. We used the ProM plug-in that converts CSV into the required XES structure (the plug-in is shown in Fig. 2).
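The preparation steps above (grouping events by student, ordering them by time, and emitting an XES-style log) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names `student_id`, `event` and `timestamp`, the sample rows, and the helper names `load_traces` and `to_xes` are hypothetical, and the paper itself performs the conversion with a ProM plug-in rather than with custom code. A real ProM import may also expect further XES header elements, which are omitted here.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

# Hypothetical raw export; column names and rows are illustrative only.
RAW_CSV = """student_id,event,timestamp
s02,videoviewed,2019-03-01 10:05:00
s01,register,2019-03-01 09:00:00
s02,register,2019-03-01 10:00:00
s01,taskviewed,2019-03-01 09:02:00
"""


def load_traces(csv_text):
    """Group events by student and sort each student's events by time."""
    traces = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        traces[row["student_id"]].append((ts, row["event"]))
    return {sid: sorted(events) for sid, events in sorted(traces.items())}


def to_xes(traces):
    """Serialize the traces as a minimal XES-style XML string."""
    log = ET.Element("log")
    for sid, events in traces.items():
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", key="concept:name", value=sid)
        for ts, name in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", key="concept:name", value=name)
            ET.SubElement(event, "date", key="time:timestamp",
                          value=ts.isoformat())
    return ET.tostring(log, encoding="unicode")


traces = load_traces(RAW_CSV)
xes_text = to_xes(traces)
```

Sorting inside `load_traces` means the input rows may arrive in any order, which matches the observation that the platform's ID numbers were not continuous.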

Excellent Learning Path Mining
Process mining, also called workflow mining, was initially proposed by R. Agrawal in 1998. Process mining refers to those methods that extract the structured process description from the actual execution set. The purpose of process mining is to extract information from log data, establish a clear process model, and ensure that the process model being built is consistent with the actual process.
ProM counted the students' learning paths after the data were transformed into XES. Fig. 4 shows the log summary, which contains 389 learning paths and 4011 events. By selecting the visual items in Fig. 4, ProM displays the learning paths, among which there are five frequent learning paths (shown in Fig. 5). Fig. 5 also shows that the first five learning paths account for more than 60% of the 389 total learning paths. We regarded the first five learning paths as frequent excellent learning paths for analysis purposes. We classified the frequent excellent learning paths into five learning styles: discovery learning, discussive learning, exploratory learning, cooperative learning, and task-based learning.
1. Discovery learning: This learning path can be mapped as follows: start→register→taskviewed→datadownloaded→videoviewed→jobviewed→videoviewed2→dataviewed→commented→interacted→jobfinished→exit→end.
The feature of this learning path is that students download the relevant data and study the relevant videos after entering the website. Then, they do their homework according to the learning content. If there is something that students do not understand while completing the task, they first study independently and then discuss the task with each other, in order to finally complete the job.
2. Exploratory learning: This learning path can be mapped as follows: start→register→videoviewed→dataviewed→commented→interacted→exit→end.
The feature of this learning path is that students do not study for the purpose of finishing the job; rather, they learn and discuss the contents independently. These students are generally more self-motivated.
3. Cooperative learning: This learning path can be mapped as follows: start→register→taskviewed→videoviewed→dataviewed→jobviewed→commented→interacted→jobfinished→exit→end.
The feature of this learning path is that students first seek cooperation. When they do not understand something while completing the job, they finish the task through discussion and cooperation.
4. Discussive learning: This learning path can be mapped as follows: start→register→taskviewed→commented→datadownloaded→videoviewed→commented2→jobviewed→interacted→jobfinished→exit→end.
The feature of this learning path is that the frequency of discussion is obviously higher than the frequency of video learning. As such, students complete the final job through interaction and discussion of the problem.
5. Task-based learning: This learning path can be mapped as follows: start→register→taskviewed→datadownloaded→videoviewed→jobviewed→jobfinished→exit→end.
The feature of this learning path is that the only reason these students enter the website is to submit the job. They usually do not spend much time online; instead, they usually perform better in an offline learning environment.
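The idea of picking the most frequent learning paths and checking how much of the log they cover can be illustrated with a short Python sketch. It is not the ProM procedure used in the paper; the traces below are hypothetical, and the helper name `frequent_variants` is an assumption introduced only for this example.

```python
from collections import Counter

# Hypothetical traces: one tuple of event names per learning path log.
traces = [
    ("start", "register", "taskviewed", "jobfinished", "exit", "end"),
    ("start", "register", "taskviewed", "jobfinished", "exit", "end"),
    ("start", "register", "videoviewed", "commented", "exit", "end"),
    ("start", "register", "taskviewed", "commented", "jobfinished",
     "exit", "end"),
]


def frequent_variants(traces, top_k):
    """Return the top_k most common paths and the share of traces they cover."""
    counts = Counter(traces)
    top = counts.most_common(top_k)
    coverage = sum(n for _, n in top) / len(traces)
    return top, coverage


top, coverage = frequent_variants(traces, top_k=2)
for path, n in top:
    print(n, "→".join(path))
print(f"coverage: {coverage:.0%}")
```

In the paper's setting, `top_k` would be 5 and the coverage would correspond to the share of the 389 learning paths accounted for by the five frequent paths.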
The excellent learning paths detailed above, mined with process mining technology, give a clear and intuitive view of how learners study. They provide a reference for students with general or poor academic performance. In addition, using process mining technology to mine learning paths may not only provide a clear learning path for learners and improve students' learning efficiency, but may also provide a basis for helping managers to evaluate students' learning.

Deviation Analysis About General Path and Personalized Recommendations
Student path recommendations should be based on a student's existing learning habits. The above five learning paths are the most frequent of all excellent learning paths, and they represent five learning behavior habits. When recommending a learning path to a general student, we need to discover the characteristics of that student's own learning path, in order to recommend a similar one. Based on the above analysis, this paper uses the similarity method of the matrix process to recommend the most suitable excellent learning path.

Similarity of matrix processes
The difference between two numbers can be obtained by subtraction. The smaller the difference is, the closer the two values are. Similarly, a similarity value for two processes can be derived by subtracting one from the other. This similarity value can be used to indicate the similarity between two processes [16]. However, processes cannot be subtracted directly, so the existing methods basically measure the similarity of processes by calculating the distance between graphs. In this paper, we calculate the similarity value according to the matrix of the learning process. Specifically, it is assumed that PM and PM' represent the matrices corresponding to an excellent learning process and a general learning process, respectively. The difference matrix (DM) can be obtained by subtracting the corresponding elements of the two matrices. Then, we calculate the absolute value of each element in the DM, and the absolute values are summed to represent the difference between the matrices. The numbers of rows and columns of the two matrices may be unequal. Therefore, if we want to subtract two matrices, we should first standardize them, converting them into matrices with equal numbers of rows and columns. The definition of matrix standardization is as follows:

Definition 1 (Standardization of Matrix): Suppose that NM and NM' represent the standard matrices corresponding to PM and PM', respectively. The standard matrices are constructed according to the following rules:
1. The numbers of rows and columns of the two standard matrices should be equal, and both are equal to the number of events in the set of all events.
2. In the standard matrix, the relationship between events is expressed by a unified subscript. The subscript positions of two events in different sequences are identical in the standard matrix.
3. NM(i, j) and NM'(i, j) represent the elements in Row i and Column j of NM and NM', respectively. Their values are defined through transition, which indicates a mapping relationship between two events: the element is 1 if a transition exists between event i and event j, and 0 otherwise.

According to the definition, the discovery learning path can be transformed into a standard matrix, as shown in Fig. 6, and the discussive learning path can be converted into a standard matrix, as shown in Fig. 7. The standard matrices of the remaining learning paths are omitted.

N0 represents the number of positions at which both NM(i, j)≠0 and NM'(i, j)≠0; N1 represents the number of elements with NM(i, j)≠0, and N2 represents the number of elements with NM'(i, j)≠0. By subtracting the elements at corresponding positions in the two standard matrices NM and NM', we obtain the difference matrix DM = NM - NM'. We take the absolute value of every element in DM and get the absolute value matrix |DM|. The similarity between processes, MDS(PM, PM'), is calculated by Formula (1). The similarity result must be a value between 0 and 1, and the maximum value of MDS(PM, PM') is 1: when PM and PM' represent the same path, the path matrices PM and PM' must be the same, and their standard matrices NM and NM' must be the same, so every element of DM is 0 and MDS(PM, PM') = 1.

The general learning path of one of the students is shown in Fig. 8. Fig. 9 shows the DM between the general learning path and the discovery learning path, and Fig. 10 shows the DM between the general learning path and the discussive learning path. Using the above principle to calculate the similarity, the similarity between this student's path and the discovery learning path is MDS(PM, PM') = 0.35, while the similarity with the discussive learning path is MDS(PM, PM') = 0.73. Therefore, the MDS values indicate that the discussive learning path is more suitable for this student.
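The construction of the standard matrices, the difference matrix and the counts N0, N1 and N2 can be sketched in Python as follows. Several points are assumptions made only for illustration: rows are taken as source events and columns as target events, the student path is hypothetical (the path of Fig. 8 is not reproduced here), and the `similarity` value uses a simple stand-in normalization, 1 - Σ|DM| / (N1 + N2), which is not the paper's Formula (1) and will not reproduce the reported values 0.35 and 0.73. The two excellent paths are the discovery and discussive paths listed earlier.

```python
def standard_matrices(path_a, path_b):
    """Build two 0-1 matrices indexed by the union of both paths' events."""
    events = sorted(set(path_a) | set(path_b))
    index = {name: k for k, name in enumerate(events)}

    def build(path):
        matrix = [[0] * len(events) for _ in events]
        # Assumption: row = source event, column = target event.
        for src, dst in zip(path, path[1:]):
            matrix[index[src]][index[dst]] = 1
        return matrix

    return events, build(path_a), build(path_b)


def compare(path_a, path_b):
    """Return sum(|DM|), N0, N1, N2 and a stand-in similarity score."""
    events, nm_a, nm_b = standard_matrices(path_a, path_b)
    size = len(events)
    cells = [(nm_a[i][j], nm_b[i][j]) for i in range(size) for j in range(size)]
    abs_dm_sum = sum(abs(a - b) for a, b in cells)
    n0 = sum(1 for a, b in cells if a and b)
    n1 = sum(1 for a, _ in cells if a)
    n2 = sum(1 for _, b in cells if b)
    # Stand-in normalization (an assumption, NOT the paper's Formula (1)).
    similarity = 1 - abs_dm_sum / (n1 + n2) if n1 + n2 else 1.0
    return {"abs_dm_sum": abs_dm_sum, "n0": n0, "n1": n1, "n2": n2,
            "similarity": similarity}


excellent = {
    "discovery": ["start", "register", "taskviewed", "datadownloaded",
                  "videoviewed", "jobviewed", "videoviewed2", "dataviewed",
                  "commented", "interacted", "jobfinished", "exit", "end"],
    "discussive": ["start", "register", "taskviewed", "commented",
                   "datadownloaded", "videoviewed", "commented2", "jobviewed",
                   "interacted", "jobfinished", "exit", "end"],
}
# Hypothetical general student path, for illustration only.
student = ["start", "register", "taskviewed", "jobviewed", "interacted",
           "jobfinished", "exit", "end"]

best = max(excellent, key=lambda name: compare(student, excellent[name])["similarity"])
```

Building both matrices over the union of events is what makes the element-wise subtraction well defined, which is the purpose of the standardization step.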
The differences between the learning paths show that this student is more suited to the discussive learning path. However, compared with the discussive learning path, the student lacks elements such as data download, video learning and problem publication. The learning path shows that the student only paid attention to the discussion of the problem in the process of learning, but neglected the self-improvement element of learning. The advice for the student is to post questions to the website and discuss the possible answers with classmates after viewing the assignments. However, before solving the problem with other students, the student should first learn to solve the problems that he or she can solve independently. Self-learning methods include video learning and data downloading. In the end, the results of the discussion can be seen in the comments section. Of course, students can also publish the problems they have solved and share resources to achieve self-learning.

When only a small number of candidate paths need to be recommended, the similarity of matrix processes method is appropriate for this paper. This method can not only calculate the gap between the paths of ordinary students and excellent students, but also directly reveal the differences between their learning paths and expose the shortcomings of general students and students with poor performance. Therefore, this method can offer a proposal regarding learning path optimization.
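The deviation analysis described above amounts to listing what the recommended excellent path contains that the student's own path does not. A minimal Python sketch follows; the student path is hypothetical and the helper name `missing_steps` is an assumption introduced for this example, while the discussive path is the one listed earlier.

```python
def missing_steps(student_path, excellent_path):
    """List events and transitions of the excellent path the student lacks."""
    seen_events = set(student_path)
    missing_events = [e for e in excellent_path if e not in seen_events]
    student_moves = set(zip(student_path, student_path[1:]))
    missing_moves = [move for move in zip(excellent_path, excellent_path[1:])
                     if move not in student_moves]
    return missing_events, missing_moves


discussive = ["start", "register", "taskviewed", "commented",
              "datadownloaded", "videoviewed", "commented2", "jobviewed",
              "interacted", "jobfinished", "exit", "end"]
# Hypothetical general student path, for illustration only.
student = ["start", "register", "taskviewed", "jobviewed", "interacted",
           "jobfinished", "exit", "end"]

missing_events, missing_moves = missing_steps(student, discussive)
print("missing events:", missing_events)
print("missing transitions:", ["→".join(m) for m in missing_moves])
```

The missing events correspond to the activities a teacher might suggest adding, and the missing transitions indicate where in the sequence they belong.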

The Analysis of Learning Effect

Analysis of experimental results
Based on the above research methods, this paper conducted an experimental study of the personalized recommendation of learning paths. The study involved 60 students who had average or poor academic performance. Of that group, 30 students used personalized recommendations to learn; we call these students the experimental group. The other 30 students continued to learn in their own way, and they are referred to as the free group. The experimental results are shown in Figs. 11 and 12 (the abscissa represents students and the ordinate represents scores). Figs. 11 and 12 show that the scores of the students in the experimental group improved significantly, while the scores of the students in the free group did not change to any significant degree. Table 1 confirms this: the grades of the students in the experimental group improved significantly, whereas the grades of the students in the free group are not significantly different from their previous grades. These results show that the use of process mining technology and the path similarity method is very helpful in improving students' learning effect.

Discussion
Many studies have examined learning paths and personalized learning recommendations. The research results in this paper are consistent with those of Lin Qilin et al. [6], which were based on the clustering of student groups. All such studies show that learners' behavior has a great impact on learning effectiveness. In addition to the study of group characteristics, Dwivedi et al. [12] studied learners' learning style and knowledge level by using a variable length genetic algorithm. That study also holds that personalized learning style research and learning resource recommendations are of great help to students trying to improve their academic performance. In general, the experimental results show that both the study of group learning paths and the study of learning path recommendations for a single student can improve the learning effect of students and the management level of teachers. In addition, the academic performance and learning styles of the experimental participants in this paper differed. This is consistent with the personalized recommendation theory of different learning styles of Huang Huasheng [13]. It indicates that the recommendation of personalized learning paths is also of great significance to the improvement of students' academic performance. All the above studies support the significance of this study.

iJET - Vol. 15, No. 8, 2020

Conclusion
In this paper, a new personalized learning path recommendation method based on students' learning style is proposed, built on the in-depth mining of online students' learning logs. Firstly, the method applies a process mining technique to mine students' learning paths, obtaining both the excellent learning paths and the general learning paths. Secondly, the principle of process similarity is applied in order to recommend excellent learning paths for students with general or poor grades, based on their individual learning styles.
This study proposes the personalized recommendation of an excellent learning path at the theoretical level. It lays a theoretical foundation for subsequent products that provide accurate personalized learning path recommendations. At the same time, in the field of people-oriented personalized education, teaching students in accordance with their aptitudes under the proposed concept provides a fresh idea for improving the learning effect of students and optimizing the management of teachers.
A limitation of this paper is its use of the 0-1 matrix: an event that occurs is recorded as 1 and an event that does not occur is recorded as 0, so the duration of the event is not taken into account. In addition, the path recommendation of the computer system is not yet complete, so it can only recommend a path for a single student at a time. Future research will conduct empirical studies based on the above theories, so as to iteratively refine the personalized recommendation model of the learning path and form a complete and effective computer recommendation system.