Collaborative Filtering Recommendation of Online Learning Resources Based on Knowledge Association Model

yanglifen110@126.com

Abstract— Online learning platforms are prone to information overload, as they contain a huge number of diverse resources. To solve the problem, domestic and foreign scholars have focused their attention on personalized recommendation of learning resources. However, the existing studies perform poorly in the prediction of online learning paths, failing to clarify the overall knowledge system of students and the associations of resource knowledge. Therefore, this paper explores the collaborative filtering recommendation (CFR) of online learning resources (OLRs) based on knowledge association model. Firstly, the knowledge units were extracted from the semantic information of OLRs, and a knowledge association model was established for OLR recommendation. Next, a CFR algorithm was designed to couple semantic adjacency with learning interest, and used to quantify the semantic similarity of OLRs. The proposed algorithm was proved effective through experiments.


Introduction
With the development of technology and the growth of education big data, students increasingly depend on querying online learning resources (OLRs). OLR queries point the direction for students' personalized learning [1][2][3][4][5]. Online learning platforms are prone to information overload, as they contain a huge number of diverse resources [6][7][8]. To solve the problem, domestic and foreign scholars have focused their attention on personalized recommendation of learning resources [8][9][10][11][12][13][14]. Under new learning theories, the relationship between students and learning resources has become more complex. The data sparsity and cold start problems of learning resource recommendation systems remain difficult to solve, calling for deeper research [15][16][17][18][19].
Diao et al. [20] proposed a personalized learning resource recommendation framework based on course ontology and learners' cognitive ability. Taking a C-language course as an example, a course ontology was established and used to define semantic reasoning rules. According to the test results, the learner's cognitive ability was dynamically evaluated with maximum likelihood estimation (MLE) and joint probability, and a learner model was constructed based on learning preferences and cognitive ability. Wang and Fu [21] presented a personalized learning resource recommendation method based on the dynamic collaborative filtering algorithm. To recommend personalized learning resources, the Pearson correlation coefficient was adopted to compute the data similarity between learners or project resources in the network; the personalized recommendation of resources was improved by the stage-evolutionary two-way self-equilibrium mechanism; the optimal series recommendation was realized with the fuzzy adaptive binary particle swarm optimization (PSO) based on the judgement of the evolutionary state. To improve online learning efficiency, Dai and Xu [22] designed an OLR recommendation algorithm based on an improved backpropagation (BP) neural network, and demonstrated the high promotion value of the algorithm, providing a reference for the development of personalized recommendation algorithms for online resources.
The traditional recommendation methods for English learning resources cannot meet the needs of students for in-depth learning. To solve the problem, Zhou [23] designed a resource recommendation algorithm for online English learning systems, based on learning ability evaluation, introduced the workflow of the algorithm, and developed a four-layer test system for evaluating English learning ability. Their results provide a reference for resource recommendation of other online learning systems. With the rapid development of Internet and information technology, personalized learning has attracted extensive attention. From the angle of user portraits in education big data, Wan et al. [24] constructed the framework for personalized learning resource recommendation services, modeled personalized learners and personalized learning resources, and proposed a collaborative filtering algorithm based on personalized labels, which realizes effective recommendation of personalized learning resources. Hao and Liu [25] put forward a big data-based recommendation model of personalized learning resources. The model consists of data storage, data analysis, data matching, and resource recommendation. Experimental results show that the personalized resource recommendation platform indeed promotes the learning effect.
The existing studies on OLR recommendation perform poorly in the prediction of online learning paths, failing to clarify the overall knowledge system of students and the associations of resource knowledge. With the progression of learning, the traditional recommendation systems cannot guarantee the coherence and systematicity of online learning, leading to problems like poor learning effect and low learning interest in the long run. Therefore, this paper explores the collaborative filtering recommendation (CFR) of OLRs based on knowledge association model. The main contents are as follows: (1) extracting the knowledge units from the semantic information of OLRs, and building a knowledge association model for OLR recommendation; (2) detailing a CFR algorithm that couples semantic adjacency with learning interest; (3) quantifying the semantic similarity of OLRs. The proposed algorithm was proved effective through experiments.

Knowledge association model
During the construction of OLR recommendation systems, the recommendation effect will be undermined by the sparsity of sample data, a result of the limited historical learning behaviors. The cold start problem may occur when a CFR algorithm is selected. Since the collaborative filtering algorithm does not comprehensively consider the semantic information of OLRs, this paper treats knowledge association model as the semantic assistance tool of OLRs, and applies the model to optimize the collaborative filtering algorithm. Figure 1 shows the architecture of the knowledge association model. Besides, an interest function was added to the improved algorithm to characterize the variation of learning interest, aiming to enhance the performance of OLR recommendation system.

Extraction of knowledge units from semantic information
This paper firstly mines the knowledge associations of the OLRs in official databases, or those retrievable by search engines, i.e., predicts the presence of knowledge associations according to the topology features of the knowledge association network.
Semantic information is a direct representation of knowledge in OLRs. The semantic information of OLRs can be extracted by the latent Dirichlet allocation (LDA) document generation model, and used to characterize knowledge subjects. To determine the number of semantics to be extracted, the perplexity PER was adopted to estimate the processing ability of the model for OLR texts. Let Etest be the test set of OLR texts; qe be the word series in file e; Me be the number of words in file e. The perplexity PER can be calculated by:

$$\mathrm{PER}(E_{test}) = \exp\left(-\frac{\sum_{e \in E_{test}} \log p(q_e)}{\sum_{e \in E_{test}} M_e}\right) \tag{1}$$

where p(qe) is the probability that the model assigns to the word series qe. The smaller the PER, the stronger the predictive ability of the model for texts, i.e., the greater the clustering ability of the LDA model for OLR semantics.
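The selection of the number of semantics by perplexity can be illustrated with a minimal sketch. It assumes the standard LDA perplexity definition (the exponential of the negative total log-likelihood divided by the total word count); the function names and the candidate perplexity values are hypothetical, and the per-document log-likelihoods are taken as given rather than computed by a trained model.

```python
import math


def perplexity(doc_log_likelihoods, doc_word_counts):
    """Perplexity of a test set: exp(-sum_e log p(q_e) / sum_e M_e)."""
    if len(doc_log_likelihoods) != len(doc_word_counts):
        raise ValueError("one log-likelihood and one word count per document")
    total_words = sum(doc_word_counts)
    if total_words <= 0:
        raise ValueError("the test set must contain at least one word")
    return math.exp(-sum(doc_log_likelihoods) / total_words)


def pick_topic_count(perplexity_by_count):
    """Return the candidate number of semantics with the lowest perplexity."""
    return min(perplexity_by_count, key=perplexity_by_count.get)


# Hypothetical perplexities measured for three candidate counts.
best = pick_topic_count({100: 3.2, 200: 2.9, 300: 3.1})
```

As a sanity check, a model that spreads probability uniformly over a vocabulary of V words has a perplexity of exactly V, which is why lower values indicate stronger predictive ability.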
This paper gathers the statistics of all OLR sources, including authors, websites, and professional databases. A total of over 40,000 pieces of information was obtained about these sources. But some of the information is repetitive. Therefore, the authors carefully identified the full name of every source, compared the names of websites and professional databases, and checked author information against the information of their employers. Let Mmax be the maximum number of professional papers written by authors in a professional field. Based on Price's law, the minimum number Mmin of professional papers written by relatively prolific authors can be calculated by:

$$M_{min} = 0.749\sqrt{M_{max}} \tag{2}$$
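As a small illustration, Price's law is commonly stated as a threshold of 0.749 times the square root of the highest paper count. The sketch below applies that form; the function names and the sample counts are hypothetical, and the input dictionary is assumed to be non-empty.

```python
import math


def prolific_author_threshold(max_papers):
    """Minimum paper count of a prolific author under Price's law."""
    return 0.749 * math.sqrt(max_papers)


def prolific_authors(paper_counts):
    """Return the authors whose paper count reaches the threshold."""
    threshold = prolific_author_threshold(max(paper_counts.values()))
    return sorted(a for a, n in paper_counts.items() if n >= threshold)


# Hypothetical counts: the most productive author wrote 100 papers.
selected = prolific_authors({"author_a": 100, "author_b": 8, "author_c": 7})
```

With a maximum of 100 papers, the threshold is about 7.49, so an author with 8 papers is retained and an author with 7 is not.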

Knowledge association model
Traditionally, OLRs are directly associated, indirectly associated, or coupled. To realize stereo association between OLRs, this paper optimizes the standard weighted direct association principle in the field of literature citation associations. Let ERXY be the direct association strength between OLRs X and Y; BRXY be the normalized indirect association strength between OLRs X and Y; CSXY be the normalized coupling strength between OLRs X and Y. Then, the standard weighted direct association strength between OLRs X and Y can be quantified by: Let N be the number of indirect associations between OLRs X and Y via other resources; nZ be the number of other resources associated with OLR Z; M be the number of OLRs associated with both OLR X and Y; mI be the association frequency between OLR I, which is associated with both OLR X and Y, and other resources. Then, the value of ERXY can be calculated by: The value of BRXY can be calculated by: The value of CSXY can be calculated by: If OLRs X and Y are not directly associated, but indirectly associated or coupled with each other, then SWERXY=SWERYX.
The associations between OLRs are mostly established based on the correlations between resource sources. The principle of stereo association between OLRs can be extended to the layers of authors, websites, and professional databases. That is, the standard weighted direct association theory can be extended to the resource sources. For OLR recommendation, the weighted harmonic algorithm below can be adopted to reasonably allocate the contribution of resource sources: where, m is the total number of OLR sources. If the number is odd, E=0. Before analyzing the semantics of the OLRs of all authors, the set of same-source OLRs should contain all the OLRs from that source. Then, the SWER value of OLRs X and Y can be derived from the SWER value of the same-source OLR set of X quoting that of Y. Let S be the set of all the direct association mappings from the same-source OLR set of X to that of Y. Then, S={s|s=Xi→Yj, Xi∈ same-source OLR set of X, Yj∈ same-source OLR set of Y}. Let Xs be OLR X in the OLR association s; Ys be OLR Y in the OLR association s; qXs be the contribution of X to Xs; qYs be the contribution of Y to Ys. Then, the SWER of Xs to Ys can be described by SWERXsYs. On the layer of resource sources, the standard weighted direct association RSSWER between X and Y can be quantified by:

CFR algorithm
This section describes how to vectorize OLR knowledge and establish OLR associations in the knowledge association model, so as to map the entities and associations in OLR semantics from a high-dimensional semantic matrix to a low-dimensional semantic matrix. Next, this paper will fuse the semantic matrix of the knowledge association model with the student-OLR behavior matrix of collaborative filtering, and assign weights to the resulting similarity matrices.
To overcome the data insufficiency of the collaborative filtering algorithm, the OLR knowledge must be vectorized, i.e., the OLR semantics must be vectorized. The vectorization aims to map the entities and associations in OLR semantics from a high-dimensional semantic matrix to a low-dimensional semantic matrix. Let Rli be the value of the embedded vector of OLR Pi on the l-th dimension. Then, the mapping process can be expressed as: The distance between OLRs can be calculated by: The distance solved by formula (10) is negatively correlated with the similarity between two resources. For convenience, this paper normalizes the similarity between OLRs to the interval of (0, 1]. For the embedded vectors Pi and Pj of two OLR semantics, the normalized similarity can be quantified by: Based on formula (11), it is possible to obtain a semantic similarity matrix of OLRs. The closer the similarity is to 1, the more similar the two OLRs are; the closer it is to 0, the less similar they are. In the latter case, the OLR will not be recommended.
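A minimal sketch of the semantic similarity matrix is given below. It rests on two assumptions that the text does not state: the distance is taken to be Euclidean, and the normalization is taken to be 1/(1 + distance), one simple mapping that decreases with distance and lands in (0, 1]. All function names are hypothetical.

```python
import math


def euclidean_distance(p_i, p_j):
    """Distance between two embedded OLR vectors (assumed Euclidean)."""
    if len(p_i) != len(p_j):
        raise ValueError("embedded vectors must share the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_i, p_j)))


def normalized_similarity(p_i, p_j):
    """Map a distance in [0, inf) to a similarity in (0, 1] (assumed form)."""
    return 1.0 / (1.0 + euclidean_distance(p_i, p_j))


def similarity_matrix(embeddings):
    """Pairwise semantic similarity matrix of all embedded OLRs."""
    n = len(embeddings)
    return [
        [normalized_similarity(embeddings[i], embeddings[j]) for j in range(n)]
        for i in range(n)
    ]


# Two hypothetical 2-dimensional OLR embeddings at distance 5.
matrix = similarity_matrix([[0.0, 0.0], [3.0, 4.0]])
```

Under these assumptions, identical vectors have a similarity of 1, and the similarity tends to 0 as the vectors move apart, matching the behavior described above.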
For OLR recommendation, the traditional collaborative filtering algorithms all recommend resources based on the historical online learning behaviors, without fully considering the attenuation of learning interest over time. The longer the time, the greater the probability for the learning interest to be fixed. To optimize the previous collaborative filtering algorithms, this paper introduces a time function, i.e., the Ebbinghaus forgetting curve of the forgetting law of learners, to the existing OLR recommendation algorithm.
Human memory follows a forgetting law. With the elapse of time, learning interest and learning preferences will change. In the OLR recommendation system, the time interval is correlated with the semantic similarity between the OLR in the current interval and that in the previous interval, as well as the cooling coefficient. Let T(φ) be the temperature at the current time φ; T(φ0) be the temperature at the previous time φ0; l be the cooling coefficient; φ-φ0 be the time interval between two measurements. Then, the Newton's law of cooling can be expressed as:

$$T(\varphi) = T(\varphi_0)\, e^{-l(\varphi - \varphi_0)} \tag{12}$$

Let u0 and uφ be the quantified semantic similarities of the previous and current time intervals, respectively; φ0 and φ be the durations of the quantified semantic similarities of the previous and current time intervals, respectively; l be the attenuation weight of learning interest. Similar to the Newton's law of cooling (12), the OLR semantic similarity can be quantified by:

$$u_{\varphi} = u_0\, e^{-l(\varphi - \varphi_0)} \tag{13}$$

If an OLR has a small mean time interval and frequent visits, then the OLR must be highly preferred by students, and should be assigned a large weight; otherwise, the OLR should be assigned a small weight. The weight function can be given by:

Let φ0 and φ be the earliest historical duration, and the current duration of quantified semantic similarity of OLR i in the training set, respectively (unit: day); μ be the time weight coefficient that determines the value of weight function; 0≤ω≤1 be the weight of time in the OLR recommendation system.
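The attenuation of learning interest can be sketched as an exponential decay, assuming the quantified semantic similarity follows the same form as Newton's law of cooling with the ambient term dropped. The function name and the sample values are hypothetical, and the weight function itself is not modelled here.

```python
import math


def decayed_similarity(u0, phi0, phi, l):
    """Similarity after cooling from time phi0 to phi with decay weight l."""
    if phi < phi0:
        raise ValueError("the current time must not precede the previous time")
    return u0 * math.exp(-l * (phi - phi0))


# Hypothetical case: the decay weight is chosen so that interest halves
# every 10 days, so a similarity of 0.8 drops to 0.4 after 10 days.
half_life_weight = math.log(2) / 10
after_ten_days = decayed_similarity(0.8, 0.0, 10.0, half_life_weight)
```

A larger attenuation weight l makes older learning behaviors fade faster, while l = 0 leaves the similarity unchanged over time.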
Considering the different time weights of OLRs, this paper characterizes the time weight coefficient μ with a factor that adjusts time attenuation, i.e., replaces μ with the mean duration of quantified semantic similarities of OLR. Let Ui be the time set of all quantified semantic similarities of OLR i; mT be the number of all time points for quantified semantic similarities; φ0 be the earliest time point for quantified semantic similarities of the target OLR; φui be any other time point for quantified semantic similarities of the target OLR. Then, the time weight coefficient can be calculated by: If the most recent time for quantified semantic similarities of the target OLR has a small gap with the mean time interval, then the target OLR is highly preferred by students; if the gap is large, then the target OLR is not very preferred by students.
Through the above analysis, the attenuation of learning interest can be expressed by a function. In general, when two students perform the same learning behaviors in different time intervals, the semantic similarities of the corresponding OLRs will have different weights. The greater the time interval, the smaller the weight. To reflect the time effect of OLR recommendation, this paper introduces a time coefficient to calculate the students' preference for OLRs. Let SIM(Pi, Pj) be the OLR similarity coupling semantic adjacency. Then, the semantic similarity formula after introducing the time function can be given by:

The basic idea of our OLR recommendation is as follows: Firstly, obtain the semantic similarity matrix from the rich semantics of the knowledge association model, and combine the collaborative filtering algorithm with the obtained similarity matrix. Next, introduce the learning interest function of OLRs, and quantify the semantic similarity of the OLRs not yet visited. Finally, rank the OLRs in descending order by the quantified semantic similarity, and generate the Top-N OLR recommendation list. Let FSuvi be the semantic similarity of OLR i quantified by student v; ui* be the mean quantified semantic similarity of OLR i; U(P, l) be the set of the l OLRs with the highest semantic similarities with OLR i; O(v) be the set of OLR visits of student v. Then, the quantified semantic similarity of OLRs can be predicted by:

The specific steps of prediction are as follows:
─ Step 1. Compute the semantic similarity set of the OLRs based on the knowledge association model.
─ Step 2. Obtain the OLR similarity set based on the online learning behaviors.
─ Step 3. Integrate OLR semantic information, merge OLR semantic similarities with the visited OLR similarities under proper weights, and compute the composite OLR similarity.
─ Step 4. Introduce time weight to quantify OLR semantic similarity, and predict the quantified semantic similarities of OLRs.
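The four steps can be illustrated with a rough sketch. It is not the prediction formula of this paper: it assumes a linear blend of the two similarity matrices with a hypothetical weight alpha, a per-OLR time weight supplied by the caller, and a standard mean-centred item-based prediction. All names and sample values are hypothetical.

```python
def blend_similarity(semantic_sim, behavior_sim, alpha):
    """Step 3: merge two OLR similarity matrices with weight alpha."""
    n = len(semantic_sim)
    return [
        [alpha * semantic_sim[i][j] + (1 - alpha) * behavior_sim[i][j]
         for j in range(n)]
        for i in range(n)
    ]


def predict(scores, item_means, sim, time_weight, target, l):
    """Step 4: predict the score of one unvisited OLR for one student.

    scores maps the visited OLR indices of the student to quantified values.
    Only the l OLRs most similar to the target are used, and among those
    only the ones the student has visited.
    """
    others = [j for j in range(len(item_means)) if j != target]
    top_l = sorted(others, key=lambda j: sim[target][j], reverse=True)[:l]
    neighbours = [j for j in top_l if j in scores]
    num = sum(sim[target][j] * time_weight[j] * (scores[j] - item_means[j])
              for j in neighbours)
    den = sum(abs(sim[target][j] * time_weight[j]) for j in neighbours)
    if den == 0:
        return item_means[target]  # no usable neighbour: fall back to the mean
    return item_means[target] + num / den


def top_n(scores, item_means, sim, time_weight, l, n):
    """Rank unvisited OLRs by predicted score and keep the first n."""
    candidates = [i for i in range(len(item_means)) if i not in scores]
    ranked = sorted(
        candidates,
        key=lambda i: predict(scores, item_means, sim, time_weight, i, l),
        reverse=True,
    )
    return ranked[:n]
```

In this sketch, Steps 1 and 2 are assumed to have produced `semantic_sim` and `behavior_sim` already; the blend weight, the neighbourhood size l, and the list length n would all need tuning on real data.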

Experiments and results analysis
To mine resource semantics in the OLR recommendation field in a rigorous and standardized way, this paper extracts the titles, abstracts, and keywords of learning resources from different sources, compiles them into the sample set for experiments, and preprocesses the sample data. A word segmentation tool was adopted to regularize the texts for natural language processing, which yields the corpus of OLR texts for our experiments. Furthermore, the semantic information was extracted from OLR texts by the LDA model. Figure 3 shows the relationship between perplexity and the number of semantic information items. It can be seen that the perplexity reached its minimum when the number of semantic information items arrived at 200, and gradually increased as the number grew further. Therefore, 200 semantic information items were selected to study the knowledge units of semantic information.

Fig. 3. Relationship between perplexity and number of semantic information
Because the knowledge association model is a weighted directed network, this paper measures the importance of semantic information with in-degree and out-degree. Table 1 lists the in-degrees and out-degrees of the knowledge association model. In time interval 1, topics 53, 45, and 8 were relatively important semantic information; in time interval 2, topics 46, 6, and 72 attracted the most attention from students, and became the most favored and researched semantic information in the target professional field. This paper compares four supervised learning prediction algorithms on the same sample set: our algorithm (1), random forest (RF) (2), naïve Bayes (3), and gradient ascent (4). The four algorithms were separately adopted as the classifier and trained to predict the quantified OLR semantic similarity. Figure 4 compares the AUC and precision of the four algorithms. Our algorithm achieved better AUC (0.8155) and precision (0.75) than the other algorithms, a sign of its effectiveness. The next step is to verify the effectiveness of the collaborative filtering algorithm coupled with the interest function. This paper solves for the set of OLRs that interest the students, and adjusts the number of time intervals of target resources with a step length of 10. The performance of our algorithm was compared with that of our algorithm without the interest function, the user-based collaborative filtering algorithm, and the item-based collaborative filtering algorithm. The precisions, recalls, and composite evaluations of the four algorithms are compared in Figures 5-7, respectively.
When there were 10 time intervals, our algorithm achieved a precision of 21.66% and a recall of 6.58% before introducing the interest function; the precision and recall rose to 30.97% and 9.37%, respectively, after introducing that function. Both metrics were always above those of the user-based collaborative filtering algorithm and the item-based collaborative filtering algorithm. With the growing number of time intervals, the precision and recall of all four algorithms were on the rise, and reached their highest levels at 30 time intervals. At this point, our algorithm (with interest function) achieved a precision of 38.04% and a recall of 10.84%. Further increase in the number of time intervals suppressed the precision and recall of the four algorithms. As shown in Figure 7, the composite evaluation F-score of our algorithm was higher than that of the user-based collaborative filtering algorithm and the item-based collaborative filtering algorithm under different numbers of time intervals. The highest F-score belonged to our algorithm (with interest function). The results demonstrate the superiority of our algorithm in recommendation.
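For reference, the three evaluation metrics can be computed for a single Top-N list as in the sketch below. It assumes the composite F-score is the balanced F1 (the harmonic mean of precision and recall), which the text does not specify; the function name and the sample lists are hypothetical.

```python
def precision_recall_f(recommended, relevant):
    """Precision, recall, and F1 of one Top-N recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical lists: 4 recommended OLRs, 5 OLRs the student actually visited.
p, r, f = precision_recall_f([1, 2, 3, 4], [2, 4, 6, 8, 10])
```

Here two of the four recommended OLRs are relevant, giving a precision of 0.5 and a recall of 0.4. In an experiment, these values would typically be averaged over all students in the test set.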

Conclusions
This paper explores the CFR of OLRs based on knowledge association model. Firstly, the authors detailed the extraction of knowledge units from the semantic information of OLRs, and constructed a knowledge association model for OLR recommendation. Next, a CFR algorithm was designed to couple semantic adjacency with learning interest, and used to quantify the semantic similarity of OLRs. Through experiments, the relationship between perplexity and number of OLR semantic information was obtained, the in-and out-degrees of the knowledge association model were summarized, and the AUC and precision of our prediction algorithm were evaluated. The results show that our algorithm outperformed the other methods in AUC and precision. Finally, the effectiveness of our algorithm was demonstrated through a comparative experiment against the traditional user-based collaborative filtering algorithm and item-based collaborative filtering algorithm.