Score Prediction Model of MOOCs Learners Based on Neural Network

By analyzing the behavior data of MOOCs learners, this paper constructs a score prediction model for MOOCs learners based on a clustering algorithm and a neural network. The model can uncover neglected information and hidden learning patterns in the MOOCs learning process. It can provide personalized guidance that helps each learner form a personalized learning strategy and improve learning efficiency, and it can alert learners with low grades or at risk of dropping out.


Introduction
Since 2012, massive open online courses (MOOCs) have emerged and spread widely among universities around the world, with an important impact on higher education teaching [1][2]. Three platforms, Coursera, edX and Udacity, have been adopted by many world-renowned universities to open up high-quality online educational resources and services to users worldwide [3].
The most distinctive characteristic of MOOCs is the large number and variety of learners. At the same time, learners' knowledge backgrounds and learning motivations vary. In the MOOCs environment, a learner's behavior is recorded in a variety of data. Therefore, it is necessary to study and analyze the learning behavior data of MOOCs learners [4].
Although MOOCs have many advantages, they also have disadvantages, such as a lack of interaction between learners and teachers and a high dropout rate. To address these problems, some scholars analyzed MOOCs learners' behavior data [5][6][7] to find ways to overcome the limitations of MOOCs, and others used the results of learning behavior analysis to improve learning outcomes and methods in MOOCs [8][9][10].
In research on the emergence and development of MOOCs, Rai et al. proposed applying social software to MOOC platforms from a learner-centered perspective to achieve the horizontal expansion and gradient development of MOOCs [11]. Fu et al. used visualization techniques to study keywords in the MOOCs literature. Through word analysis and social network analysis, they revealed the hot spots and development trends of MOOCs [12].
In the analysis of the characteristics of MOOCs, Kennedy analyzed their typical characteristics from four aspects: scale, openness, networking and innovation [13]. Hughes et al. selected seven typical MOOCs and analyzed them from six aspects, including platform positioning, curriculum organization and teaching methods [14].
On the innovation of MOOCs and their impact on traditional teaching, Zeng et al. elaborated on the challenge that MOOCs pose to traditional education [15]. Lai et al. proposed a blended teaching mode based on MOOCs that is suitable for university teaching [16].
On the analysis of MOOCs curricula and platforms, Kim applied evaluation standards for web-based teaching platforms and elaborated on the usefulness of MOOCs in exploring the learner group's conscious learning [17]. After analyzing the background of MOOCs, Liu et al. compared curriculum resources, learning activities and learning evaluation based on the characteristics of curriculum design on MOOCs platforms [18].
In the analysis of MOOCs data, Jiang et al. proposed a factor analysis model based on intrinsic motivation, basic psychological need factors and the design factors of MOOCs [19]. Zhuo et al. took a course on UOOC as an example, collected all the data of the course, and studied users' learning behavior based on big data analysis [20]. Mou et al. took six MOOC courses as examples to analyze learners' behavior and explored learning behavior at the course level [21].

Analysis of Learner Behavior in MOOCs
The most important problem in learner behavior analysis is obtaining accurate, reliable and comprehensive learner data. A large amount of structured and unstructured data is generated through interaction between learners and instructors, or between learners and learning resources.
There are two kinds of data sources: learner behavior data collected through questionnaires, and open datasets published by international organizations. In this paper, the Canvas dataset was chosen as the source dataset for analyzing MOOCs learner behavior. The data in the Canvas open dataset relate to MOOC courses and learners' behavior. Learners' behavior in a course may help researchers better describe and understand the learning situation and learning behavior in MOOCs. The Canvas dataset has a large number of attributes that record a learner's behavior in a particular course. The specific feature attributes are shown in Table 1.
There are 325198 records in the Canvas dataset, of which 89213 are missing part of the information. These incomplete records are removed before analysis.
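The cleaning step can be sketched as follows. This is a minimal illustration, assuming each record is a dictionary and that a missing value appears as None or an empty string; the field names are hypothetical and are not taken from the Canvas dataset schema.

```python
def is_complete(record, required_fields):
    """Return True if every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in required_fields)


def clean_records(records, required_fields):
    """Keep only the records that have a value for every required field."""
    return [r for r in records if is_complete(r, required_fields)]


# Hypothetical field names, used only for illustration.
REQUIRED = ["learner_id", "grade", "n_events", "n_days_active"]

# Made-up sample records; two of the three are incomplete.
sample = [
    {"learner_id": 1, "grade": 0.82, "n_events": 120, "n_days_active": 14},
    {"learner_id": 2, "grade": None, "n_events": 35, "n_days_active": 6},
    {"learner_id": 3, "grade": 0.40, "n_events": "", "n_days_active": 3},
]
cleaned = clean_records(sample, REQUIRED)

# Record counts reported in the text: total minus incomplete records.
remaining = 325198 - 89213
```

With the counts reported in the text, 325198 - 89213 = 235985 records would remain after cleaning.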
The age distribution of learners is shown in Table 2. As shown in Table 2, learners aged 18 to 30 form the largest group, and learners aged over 45 form the smallest. This suggests that MOOCs learners tend to be younger and more familiar with Internet platforms, with a strong thirst for knowledge and clear learning goals.
The distribution of educational level among MOOCs learners can explain some characteristics of the MOOC user groups. As shown in Figure 1, MOOCs learners mainly hold bachelor's or master's degrees. This shows that the learners attracted by MOOCs have mostly received a good higher education.
Most MOOCs learners cannot devote much time to learning. For this reason, the Canvas platform surveyed how much time learners expect to spend on learning each week. Figure 2 shows that 39% of learners expect to spend 2 to 4 hours per week on learning, and 33% expect to spend 1 to 2 hours per week. From the data, learners expect to spend relatively little time on the MOOCs platform, which means that courses should be kept as short as possible to meet learners' needs.
In this paper, we assume that learners are divided into three categories: active learners, passive learners, and negative learners. Active learners take an active part in learning activities, post in the forum, finish their homework and study more than 50% of the course content. Passive learners use traditional learning methods: they only watch videos, browse courseware and finish homework. Negative learners learn passively, show few learning activities or behaviors, and lack the ability to learn autonomously. According to the analysis results, the Canvas dataset contains only 9856 active learners and 79827 passive learners; the rest are negative learners.
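The three categories can be expressed as a simple rule-based classifier. The sketch below is only one possible reading of the definitions above: the 50% content threshold comes from the text, while the argument names and the `min_events` cut-off for "few learning activities" are assumptions introduced for illustration.

```python
def classify_learner(content_ratio, forum_posts, homework_done, n_events,
                     min_events=10):
    """Assign a learner to one of the three categories described in the text.

    content_ratio : fraction of course content studied, in [0, 1]
    forum_posts   : number of posts the learner made in the forum
    homework_done : True if the learner finished the homework
    n_events      : total number of recorded learning events
    min_events    : hypothetical cut-off for "few learning activities"
    """
    # Active: posts in the forum, finishes homework, studies over 50% of content.
    if content_ratio > 0.5 and forum_posts > 0 and homework_done:
        return "active"
    # Passive: finishes homework and shows regular activity, but not all of the above.
    if homework_done and n_events >= min_events:
        return "passive"
    # Negative: everything else (few activities, homework not finished).
    return "negative"


labels = [
    classify_learner(0.8, 5, True, 200),
    classify_learner(0.3, 0, True, 50),
    classify_learner(0.1, 0, False, 3),
]
```

A rule set like this keeps the categories mutually exclusive, since each learner falls through the conditions in order.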

Fig. 3. Learner classification
Analyzing the factors that affect learners' grades is an important part of MOOCs analysis. To analyze the factors that affect MOOCs performance more intuitively, the following typical feature attributes are selected for analysis: course interaction times, number of interactive days in the course, number of course chapters, number of posts in the forum and length of the course. The relationship between each of the five features and the score is analyzed, and a scatter plot of the score against each feature is drawn.
As shown in Figure 4, course interaction times are generally low, apart from a few high values. Moreover, course interaction times differ little across grades.
Figure 5 shows that the score increases with the number of interactive days in the course.
The number of course chapters ranges from 0 to 100, and the points are dense at both ends. Figure 6 shows a clear linear relationship.
As shown in Figure 7, there is a clear positive correlation between the number of posts in the forum and the score. The length of the course shows no obvious trend. However, unlike the number of chapters, the length of the course is evenly distributed, with no polarization.
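The visual trends read from the scatter plots can also be quantified. A minimal sketch, assuming the Pearson correlation coefficient is an acceptable measure of the linear relationship between a feature and the score (the source does not state which measure, if any, was computed); the sample values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only; not taken from the Canvas dataset.
forum_posts = [0, 1, 2, 4, 6, 9]
scores = [0.20, 0.35, 0.40, 0.60, 0.75, 0.90]
r = pearson_r(forum_posts, scores)
```

A value of r close to 1 would correspond to the positive trend described for forum posts, while a value near 0 would correspond to the absence of a trend described for course length.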

MOOCs Score Prediction Model Based on Neural Network
To make use of the various learning behaviors and records produced during MOOCs learning, a score prediction model is constructed to give learners strong predictive data support. Learner behavior changes in uncertain ways during the learning process. Therefore, in this paper we choose a radial basis function (RBF) neural network to construct the score prediction model.
The model takes learner behavior data as training data, uses the behavior features of the pre-processed learners as input, and outputs the predicted scores of MOOCs learners. To construct a score prediction model, we need to qualitatively analyze the independent variables of the function according to the learners' behavior features, and construct a one-to-one functional relation.
In this paper, five independent variables are selected as the input variables of the score prediction model, and the learner's score is taken as the output variable. The input-output relation of the score prediction model can then be defined as follows:
$$y = f(x_1, x_2, x_3, x_4, x_5)$$
where $y$ represents the score, $x_1$ represents the course completion ratio, $x_2$ represents the number of posts in the forum, $x_3$ represents the homework completion ratio, $x_4$ represents the course interaction times, and $x_5$ represents the number of interactive days in the course.
In this paper, an RBF neural network is used to construct the learner score prediction model. An RBF neural network is a feed-forward network with a single hidden layer, and it is divided into three layers. The first is the input layer, whose function is to pass the n inputs to the m hidden-layer nodes; the number of inputs equals the number of input variables in the prediction model. The second is the hidden layer, whose transformation function is the radial basis function. The third is the output layer, which adjusts the linear weights in response to the input pattern.
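The mathematical model of the RBF neural network with threshold is defined as follows:
$$y_j(x_i) = \sum_{k=1}^{m} w_{kj}\, h(\lVert x_i - c_k \rVert) + \theta_j$$
where $h(\lVert x_i - c_k \rVert)$ represents the radial basis function, $\lVert \cdot \rVert$ is the Euclidean norm, which gives the distance between $x_i$ and $c_k$, $x_i \in R^n$ is the i-th input of the neural network, $c_k$ is the center of the k-th node of the hidden layer, $w_{kj}$ is the weight from the k-th hidden node to the j-th output node, $m$ is the number of hidden nodes, and $\theta_j$ is the threshold of the j-th output node.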
The structure of the MOOCs score prediction model based on the RBF neural network is shown in Figure 9. In this paper, the Gaussian function is chosen as the radial basis function of the RBF neural network. The Gaussian function is defined as follows:
$$h(r) = \exp\left(-\frac{r^2}{2\sigma^2}\right)$$
where $r$ is the distance to the center and $\sigma$ is the width of the function. With the Gaussian function, and with the hidden-layer nodes indexed by $j$, the output of the j-th node of the hidden layer is:
$$h_j = \exp\left(-\frac{\lVert x - c_j \rVert^2}{2\sigma_j^2}\right), \quad j = 1, 2, \ldots, k$$
The output layer realizes the linear mapping $h_j \rightarrow y$, where $y$ represents the output of the output layer node. The output of the output layer node is then:
$$y = \sum_{j=1}^{k} w_j h_j + \theta$$
where $k$ is the number of hidden layer nodes, $w_j$ is the weight from the j-th hidden layer node to the output layer node, $h_j$ is the output of the j-th hidden layer node, and $\theta$ is the threshold of the output layer node.
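The forward pass of such a network can be sketched in a few lines. This is a minimal illustration, assuming the common Gaussian form exp(-r^2 / (2 sigma^2)) and a single output node; the width parameter `sigma`, the function names and all numeric values are assumptions chosen for the example, not values from the paper.

```python
import math


def gaussian(r, sigma):
    """Gaussian radial basis function of a distance r with width sigma."""
    return math.exp(-(r ** 2) / (2.0 * sigma ** 2))


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def rbf_predict(x, centers, sigmas, weights, theta):
    """Forward pass: weighted sum of hidden-node outputs plus a threshold."""
    hidden = [gaussian(euclidean(x, c), s) for c, s in zip(centers, sigmas)]
    return sum(w * h for w, h in zip(weights, hidden)) + theta


# Toy network with two hidden nodes and five inputs (arbitrary values).
centers = [[0.0] * 5, [1.0] * 5]
sigmas = [1.0, 1.0]
weights = [0.2, 0.7]
theta = 0.1
y_at_first_center = rbf_predict([0.0] * 5, centers, sigmas, weights, theta)
```

When the input coincides with a center, that hidden node outputs exactly 1, so the prediction is dominated by the weight of the nearest center; this locality is what makes the choice of centers matter.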

Simulation Experiment and Result Analysis
Based on the analysis and feature attributes of the Canvas dataset, the score prediction model based on K-means feature selection is used to predict learners' scores in the simulation experiment. In the experiment, we extract 3100 groups of pre-processed data to construct the prediction model: 3000 groups serve as the training dataset and 100 groups as the test dataset.
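The source does not spell out how K-means enters the model. The sketch below assumes that K-means is used to choose the hidden-layer centers of the RBF network, which is what the comparison with random centers suggests. It uses plain Lloyd's iterations with the first k points as initial centers (a simplification made here to keep the example deterministic), and it splits 3100 samples into 3000 for training and 100 for testing.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_centers(points, k, n_iter=100):
    """Plain Lloyd's K-means; returns k cluster centers."""
    centers = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return centers


def split_train_test(samples, n_train):
    """First n_train samples for training, the remainder for testing."""
    return samples[:n_train], samples[n_train:]


# Toy 2-D data with two obvious groups (illustrative values only).
data = [[0.0, 0.0], [5.0, 5.0], [0.0, 1.0], [1.0, 0.0], [5.0, 6.0], [6.0, 5.0]]
centers = kmeans_centers(data, k=2)
train, test = split_train_test(list(range(3100)), 3000)
```

The resulting centers would then be used as the hidden-node centers of the RBF network in place of randomly drawn ones.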
The simulation results of the RBF neural network score prediction model based on K-means feature selection are shown in Figure 10.
The simulation results of the RBF neural network score prediction model based on random centers are shown in Figure 11.
Comparing the time efficiency of the two prediction models shows that the score prediction model based on K-means feature selection is more time-efficient. The results also show that the convergence accuracy of the score prediction model proposed in this paper is better than that of the RBF neural network prediction model based on random centers.
To further verify the validity of the constructed score prediction model, a comparative experiment on the 100 groups of test data compares learners' actual scores with the predicted results. The results show that the model proposed in this paper has a high accuracy in predicting the scores of MOOCs learners.
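A comparison between actual and predicted scores is usually summarized with an error measure. The source does not state which measure was used, so the sketch below assumes two common choices, mean absolute error (MAE) and root mean squared error (RMSE); the score values are made up for illustration and are not experimental results.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted scores."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up scores for illustration only; not results from the paper.
actual = [0.80, 0.55, 0.30, 0.95]
predicted = [0.75, 0.60, 0.40, 0.90]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of how the predictions deviate from the actual scores.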

Fig. 4. A scatter plot of score and course interaction times

Fig. 5. A scatter plot of score and number of interactive days in the course

Fig. 7. A scatter plot of score and number of posts in the forum

Fig. 9. The structure of score prediction model

Table 1. Feature attributes of Canvas Dataset

Table 2. Age Distribution of Learners