A Personalized English Learning Material Recommendation System Based on Knowledge Graph

The world has entered the information age. As the lingua franca, English shapes the global landscape of information transmission and exchange, and mastering English means possessing an important tool for acquiring valuable information. Improving English teaching is therefore essential. This paper analyzes the problems in traditional classroom teaching and online learning of English, and discusses how to keep students from getting lost in online learning, improve their English learning efficiency, and satisfy their personalized demands. Specifically, the relevant data were characterized by the knowledge points of English teaching and used to construct a knowledge graph. Then, the related knowledge points were labeled in the online learning data of learning platforms. After that, user portraits were created by analyzing data on students' daily learning behaviors. Finally, collaborative filtering was combined with content-based recommendation to push English learning resources that meet students' personalized demands.
Keywords—Knowledge graph, personalized English learning, learning resources recommendation, collaborative filtering, recommendation system


Introduction
As the digital economy and artificial intelligence develop rapidly, the world has entered the information age. As the lingua franca, English shapes the global landscape of information transmission and exchange, and mastering English means possessing an important tool for acquiring valuable information [1]. English is now an important channel for quickly acquiring cutting-edge technologies and professional knowledge. In education, English is taught from elementary school through college and university. Nowadays, English is no longer just a language for everyday communication, but also a means of acquiring knowledge and information from various media [2]. The quality and effectiveness of English teaching are therefore of crucial significance. Traditional English classroom teaching is limited in time and space: teachers cannot provide students with sufficient learning materials or opportunities to use the language, let alone one-on-one tutoring, and the demands of students at different levels cannot be met. Online platforms and mobile APPs offer massive learning resources, but online English learning commonly suffers from two problems: (1) without professional guidance from teachers, students are often confused by the sheer volume of learning materials; they may lose interest, get distracted, or feel bewildered by what they have learnt; (2) students differ in learning and comprehension ability; even within the same class there are different learning levels, and different students have different knowledge demands.
Existing online learning platforms mostly divide knowledge content by grade and organize it uniformly without considering students' individual demands. As a result, good students lack sufficient learning materials, while average and weaker students fail to master the content [3,4]. Given these two issues, how to achieve personalized recommendation of learning resources based on the characteristics of English knowledge and the individual demands of learners is a question worthy of careful consideration.
To address the first issue, this paper took the English course of a particular grade as an example, analyzed the characteristics of English as a linguistic tool and its knowledge attributes, sorted out the structure of the knowledge system, and built a knowledge graph of the course to describe its knowledge points. With this knowledge graph, students can learn the knowledge structure of the course without the professional guidance of teachers, which prevents them from getting lost in massive learning resources. To address the second issue, this paper analyzed the user portraits of students in that grade, their learning behavior paths, and the specific situation of online teaching platforms. Then, with the help of the knowledge graph, it labeled the related knowledge points in the learning data of online learning platforms, constructed a personalized English learning material recommendation system, and studied key aspects such as the recommendation flow, the user portrait, and the recommendation algorithm.

Construction of Knowledge Graph
With the advent of the big data era, data accumulate continuously and hardware is upgraded and iterated at an astonishing speed [5]. On this basis, schools and educational institutions have developed a series of online English learning platforms and APPs whose resource libraries hold massive amounts of educational data (such as listening audio files, teaching videos, courseware, exercises, test papers, and interactive games). However, these resources are generally unstructured or semi-structured and diverse in form, so traditional data mining and machine learning methods cannot use them directly [6]; how these educational data are processed determines the effectiveness of the personalized recommendation system built on them. In English teaching, content is usually organized around the knowledge points that students of a given grade should master. Knowledge points are therefore the focus of teaching and testing, and they should be taken as the main input of intelligent recommendation algorithms. For this reason, this paper constructed a knowledge graph based on the knowledge points of a certain grade and applied knowledge graph and network technologies to describe the nodes of the graph. The advantages are as follows [7]: (1) a knowledge graph reflects the structure of a subject well and visualizes the internal structure of knowledge and the connections between knowledge nodes; (2) a knowledge graph has good scalability and transferability, and adapts well to later expansion of the knowledge system and improvement of the algorithm.
The knowledge graph is an important concept developed in recent years from the use of mind maps [8]. It mainly consists of two parts: (1) "Entity-Relation-Entity" triples; (2) entities and their attributes, usually expressed as attribute-value pairs similar to key/value pairs in programming languages. Taking "Topic 1-Nouns" of the junior high school English course as an example, the knowledge graph of the topic can be drawn as a network-shaped knowledge base, as shown in Figure 1 [9]. This graph is then converted into a structured form that computers can process. For instance, entity-relation triples in English teaching can be <teacher, lecture, course>, <student, grade, first grade of junior high school>, and <course resource, include, knowledge point label>. This paper also took the knowledge points of the course as entities of the knowledge graph; for the specific knowledge point "noun", triples can be expressed as <noun, classify, proper noun>, <noun, classify, common noun>, and so on. Constructing a knowledge graph helps students expand and transfer course knowledge and cultivate independent learning ability; it also addresses the problem of getting "lost in online resources" and facilitates personalized recommendation. Based on the constructed knowledge graph, the resources of online platforms are then characterized with knowledge points as the main content: the related knowledge points are used to normalize the specific content of network resources and turn it into features that machines and algorithms can process (such as categories, labels, and knowledge bases).
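To make the two parts of a knowledge graph concrete, the following Python sketch stores "Entity-Relation-Entity" triples alongside per-entity attribute key/value pairs and queries the "noun" example above. It is an illustrative in-memory structure, not the storage the system uses (the next section places the graph in Neo4j). The class name `KnowledgeGraph`, the extra "countable noun" level, and the "topic" attribute are hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """In-memory knowledge graph with the two parts described above:
    entity-relation-entity triples and per-entity attribute key/value pairs."""

    def __init__(self):
        self.triples = set()                 # {(head, relation, tail)}
        self.attributes = defaultdict(dict)  # entity -> {key: value}

    def add_triple(self, head, relation, tail):
        self.triples.add((head, relation, tail))

    def set_attribute(self, entity, key, value):
        self.attributes[entity][key] = value

    def objects(self, head, relation):
        """Tails linked to `head` by `relation`, sorted for stable output."""
        return sorted(t for h, r, t in self.triples if h == head and r == relation)

    def entities(self):
        """Every entity that appears in a triple."""
        return {h for h, _, _ in self.triples} | {t for _, _, t in self.triples}


kg = KnowledgeGraph()
kg.add_triple("noun", "classify", "proper noun")
kg.add_triple("noun", "classify", "common noun")
kg.add_triple("common noun", "classify", "countable noun")  # illustrative extra level
kg.set_attribute("noun", "topic", "Topic 1-Nouns")          # illustrative attribute
print(kg.objects("noun", "classify"))  # ['common noun', 'proper noun']
```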
Labels are concise feature identifiers obtained by analyzing resource information. Multimedia content in learning resources, such as images, video, and audio, would require recognition and processing before a machine could understand and label it directly, which is difficult. For resources in these formats, a manual labeling mechanism can therefore be established in which users fill in the labels. For example, when resources are uploaded to a teaching platform or to the back-end database of an APP, the system can prompt teachers to label the related knowledge points. Taking "Lesson 55 The Sawyer family" in the textbook New Concept English Volume 1 as an example, the knowledge points involved include phrases with the verb "go", the grammar of the simple present tense, and third-person singular verb forms. After uploading DOC files, PPT files, audio, video, exercises, and other related learning materials, teachers can label the related knowledge points in the software interface. The names of these knowledge points are specific entities in the knowledge graph, so teachers do not need to type them in but can select them in the graphical interface, as shown in Figure 2 below.
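The following is a minimal sketch of the labeling mechanism. It assumes that the knowledge points selectable in the interface are exactly the entity names of the knowledge graph. The function `label_resource`, the entity spellings, and the resource file names are hypothetical. Labels are validated against the graph and then indexed, so resources can later be retrieved by knowledge point.

```python
# Illustrative subset of knowledge point entities from the knowledge graph;
# the spellings are hypothetical, not taken from the paper's figures.
KNOWLEDGE_POINTS = {
    "phrases with 'go'",
    "simple present tense",
    "third-person singular verb forms",
    "noun",
}


def label_resource(index, resource_id, selected):
    """Attach teacher-selected knowledge points to an uploaded resource.

    Only entities that exist in the knowledge graph are accepted, mirroring an
    interface where teachers pick labels from the graph instead of typing them.
    `index` maps each knowledge point to the set of resource IDs labeled with it.
    """
    unknown = set(selected) - KNOWLEDGE_POINTS
    if unknown:
        raise ValueError(f"not knowledge graph entities: {sorted(unknown)}")
    for point in selected:
        index.setdefault(point, set()).add(resource_id)
    return index


# Lesson 55 materials labeled with the knowledge points named in the text.
index = {}
lesson55 = ["phrases with 'go'", "simple present tense", "third-person singular verb forms"]
label_resource(index, "NCE1-L55.ppt", lesson55)
label_resource(index, "NCE1-L55-listening.mp3", lesson55[1:])
print(sorted(index["simple present tense"]))
# ['NCE1-L55-listening.mp3', 'NCE1-L55.ppt']
```

Rejecting unknown labels keeps resource labels and graph entities in one vocabulary, which the recommendation algorithms later rely on.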

Technical Framework of Knowledge Graph
After the knowledge graph is constructed and the related knowledge points of the resources are labeled, the unstructured data resources have been preprocessed into structured data that computers can easily recognize [10]. The next step is to store them in a database. Relational databases and graph databases are the most commonly used types, and graph databases are comparatively more flexible. As a knowledge graph grows and queries traverse more levels of relationships, a graph database can be hundreds or even thousands of times more efficient than a relational database. For example, the commonly used Neo4j [11] is a high-performance NoSQL graph database. Instead of storing structured data in tables, it stores the data as a network that forms a graph. Neo4j has a high-performance graph engine [12] and the important features of a mature database, such as transactions and indexes. A graph contains two basic data types, Nodes and Relationships, and both carry attributes in key/value form; Nodes are linked by Relationships to form a relational network [13]. Take the knowledge graph triples mentioned above, <teacher, lecture, course>, <teacher, lecture, student>, and <course resource, include, knowledge point label>. Using a student as an example, the nodes and relationships of teachers, students, and courses are created in Neo4j as follows:
1. Create a student node, specifying the node variable, label, attribute names, and attribute values. For example, if a student's ID is 200610201, his name is Alex, he is 13 years old, and he is male, the student node can be created with:
CREATE (s:Student {id: 200610201, name: "Alex", age: 13, sex: 1})
2. Create relationships: create the teacher relationship for the student. For example, an English teacher named Tom, aged 35, is created with:
CREATE (t:Teacher {id: 2000000121, name: "Tom", age: 35, sex: 1, teach: "English"}) RETURN t
Assume that the class taught by Tom has 3 students: Alex, Jane, and Lucky. The relationships between the teacher and the 3 students then need to be created, starting with the relationship between Tom and Alex. The resulting teacher-student relationships are stored in Neo4j as described in Figure 3 below.
The same steps are then used to create nodes and relationships for the courses taught by teachers, the courses learnt by students, and the knowledge point labels of course resources, finally forming a knowledge graph-based database network. A knowledge graph is not built all at once; it accumulates gradually through constant repetition, iteration, and updating [14]. As shown in Figure 4 below, generating a complete knowledge graph mainly involves 4 stages:
(1) Data collection: the resources on the teaching platform are mainly unstructured data (such as images, audio, video, and text), plus some semi-structured data (such as XML, JSON, and encyclopedia pages). Computers cannot recognize these two kinds of data well, so they must be converted into structured data for subsequent algorithmic calculations;
(2) Information extraction: extract entities, relationships, and attributes from the data sources, and form knowledge expressions on this basis;
(3) Knowledge fusion: process the new knowledge expressions with anaphora resolution and entity disambiguation to eliminate contradictions and ambiguities among knowledge points.
For example, the same English knowledge point may have several expressions, and one term may refer to different entities;
(4) Knowledge processing: the system or professional English teachers evaluate the quality of the fused knowledge, and only the qualified parts are added to the knowledge base, which keeps the generated knowledge graph of high quality.
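The four stages can be sketched as a chain of functions. This Python example is a deliberately simplified assumption. Real information extraction from text, audio, and video, and real entity disambiguation, are replaced by trivial stand-ins. Knowledge fusion is reduced to mapping aliases (such as "general present tense" and "present simple") onto one canonical entity. The alias table, the relation name `used_with`, and the approval rule are hypothetical.

```python
# Hypothetical alias table for knowledge fusion: several surface forms that
# should resolve to one canonical knowledge point entity.
ALIASES = {
    "general present tense": "simple present tense",
    "present simple": "simple present tense",
}
ALLOWED_RELATIONS = {"classify", "used_with"}  # hypothetical relation vocabulary


def collect(raw_records):
    """Stage 1 stand-in: keep only records already converted to structured fields."""
    return [r for r in raw_records if all(key in r for key in ("head", "relation", "tail"))]


def extract(records):
    """Stage 2 stand-in: read (entity, relation, entity) triples from the records."""
    return [(r["head"].strip().lower(), r["relation"], r["tail"].strip().lower())
            for r in records]


def fuse(triples):
    """Stage 3: resolve aliases to canonical entities and merge duplicates."""
    return sorted({(ALIASES.get(h, h), rel, ALIASES.get(t, t)) for h, rel, t in triples})


def process(triples, approve):
    """Stage 4: keep only triples that pass a quality check (system or teacher)."""
    return [t for t in triples if approve(t)]


def build(raw_records, knowledge_base, approve):
    """Run the four stages and add the approved triples to the knowledge base."""
    knowledge_base.update(process(fuse(extract(collect(raw_records))), approve))
    return knowledge_base


raw = [
    {"head": "General present tense", "relation": "used_with",
     "tail": "third-person singular verb forms"},
    {"head": "present simple", "relation": "used_with",
     "tail": "Third-person singular verb forms"},
    {"head": "noun", "relation": "classify"},  # incomplete record: dropped in stage 1
]
kb = build(raw, set(), approve=lambda t: t[1] in ALLOWED_RELATIONS)
print(kb)  # {('simple present tense', 'used_with', 'third-person singular verb forms')}
```

Because fusion runs before the quality check, teachers review one canonical triple instead of every variant spelling.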

Creation of User Portrait
An accurate, real-time personalized learning recommendation system needs to help the web or APP front end of the platform create accurate user portraits, analyze students' online and offline learning behaviors, capture the right learning scenarios, and find the right "person". Only in this way can the system make effective personalized recommendations [15]. In this paper, user portrait technology refers to modeling the student users of the English teaching platform based on student data. First, the system collects data such as students' school attributes, learning habits, and learning behaviors. Then, using knowledge graph-based user portrait technology, deeper information related to students' learning requirements is extracted from the existing student data, and the different data are abstracted into a labeled student learning model. Just as learning resources are labeled with related knowledge points, each student user of the platform also needs to be labeled, so that recommendation algorithms can later classify student users and extract information. A student's user portrait can be created as shown in Figure 5 below. The corresponding knowledge graph triples can be expressed as <student A, classmate, student B>, <student A, gender, male>, and so on. In this way, the knowledge graph structure of each user portrait is created and stored in the Neo4j graph database.
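The sketch below shows, under stated assumptions, how a portrait might be derived and stored. Tracking events are abstracted into interest labels, the portrait is written as triples such as <Alex, classmate, Jane> and <Alex, gender, male>, and each triple becomes a parameterized Cypher statement. Entity-to-entity relations become relationships, and the rest become node properties. The event weights, the `KnowledgePoint` label, the relationship names, and matching students by name rather than by student ID are all hypothetical simplifications. Each (query, parameters) pair could be passed to a Neo4j driver session, for example `session.run(query, params)` in the official Python driver. The sketch itself only builds the statements and does not connect to a database.

```python
from collections import Counter

# Hypothetical weights for front-end tracking events (click < favorite < share).
EVENT_WEIGHTS = {"click": 1.0, "collect": 2.0, "share": 3.0}
# Relations that link two entities; all other triples become node properties.
RELATION_TARGETS = {"classmate": "Student", "interested_in": "KnowledgePoint"}


def interest_labels(events, top_n=2):
    """Abstract raw behaviour events (kind, knowledge_point) into interest labels."""
    scores = Counter()
    for kind, knowledge_point in events:
        scores[knowledge_point] += EVENT_WEIGHTS.get(kind, 0.0)
    return [kp for kp, _ in scores.most_common(top_n)]


def portrait_triples(student, events):
    """Express a student's portrait as knowledge graph triples."""
    name = student["name"]
    triples = [(name, "gender", student["gender"]), (name, "grade", student["grade"])]
    triples += [(name, "classmate", c) for c in student["classmates"]]
    triples += [(name, "interested_in", kp) for kp in interest_labels(events)]
    return triples


def to_cypher(triples):
    """Turn triples into (query, parameters) pairs for Neo4j.

    Values are passed as parameters. Relationship types and property keys cannot
    be parameterized in Cypher, so they must come from trusted code, never from user input."""
    statements = []
    for head, rel, tail in triples:
        if rel in RELATION_TARGETS:
            query = (f"MERGE (a:Student {{name: $head}}) "
                     f"MERGE (b:{RELATION_TARGETS[rel]} {{name: $tail}}) "
                     f"MERGE (a)-[:{rel.upper()}]->(b)")
        else:
            query = f"MERGE (a:Student {{name: $head}}) SET a.{rel} = $tail"
        statements.append((query, {"head": head, "tail": tail}))
    return statements


alex = {"name": "Alex", "gender": "male",
        "grade": "first grade of junior high school", "classmates": ["Jane", "Lucky"]}
events = [("click", "simple present tense"), ("share", "noun"),
          ("click", "simple present tense"), ("collect", "simple present tense"),
          ("click", "phrases with 'go'")]
for query, params in to_cypher(portrait_triples(alex, events)):
    print(query, params)
```

Using MERGE rather than CREATE means that re-running the statements as the portrait is updated does not duplicate nodes or relationships.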

Architecture of the Personalized English Learning Material Recommendation System
Personalized recommendation of English learning resources means discovering a student's English learning interests, or similar learning content, from his/her past learning behaviors or records, and then providing learning resources that meet his/her individual needs [16]. The architecture of the recommendation system keeps the entire recommendation process running automatically and in real time. It receives student requests; collects, processes, and stores student data; runs the recommendation algorithms; and returns the recommendation results. The architecture of the personalized recommendation system built in this study is shown in Figure 6 below.
1. Content source: to build a comprehensive personalized recommendation system, the data collection sources should be broad and deep enough. First, tracking points are embedded ("buried") in the web pages and APPs of the platform as invisible probes to collect users' learning behavior data and platform operation logs; the server then writes the collected log information to the storage device.
2. Content processing: structured data can be processed directly by programs. Unstructured and semi-structured data, as discussed above, need to be sorted into specific knowledge points, organized by the knowledge graph, marked with labels, and stored in the Neo4j graph database for efficient reading and writing.
3. User mining: this step mainly includes collecting, transmitting, mining, and storing user behavior logs. Behavior logs can again be collected through front-end tracking points that report users' clicks, shares, favorites, and other behaviors.
Transmission must deliver and update the logs reliably, so that a user's action is quickly reflected in the next recommendation [16]. Mining turns user data into the desired features, namely the "user portrait". At this stage, attribute labels, interest labels, behavior labels, and custom labels are added to users.
4. Recommendation algorithms: after the first three steps, the system has both content data and user data, and algorithms can be used to match the two. One of the most classic recommendation algorithms is user-based collaborative filtering [17][18][19][20]. Its purpose is to recommend courses or knowledge points that match student users' interests, and it has two steps: (1) the courses or knowledge points learnt by each student are taken as that learner's features to calculate a learner similarity matrix; (2) from the similarity matrix, the K students most similar to the target student are obtained. Then, using similarity as a weight, the top N courses/knowledge points among those learnt by these similar students are recommended to the target student. The biggest flaw of collaborative filtering, however, is the cold start problem. When a new user has just joined the platform, there is no history for that user, so the system cannot characterize the user and it is difficult to make personalized recommendations. Another classic algorithm is content-based recommendation. It builds profiles for users and for learning materials. A user's profile is created by analyzing the content the user has browsed, and the learning materials most similar to that profile are recommended. The English materials of learning platforms are mostly text and multimedia data.
As described above, knowledge graphs are used to describe course content and the knowledge system and to create user portraits, and the resulting labels are stored in the Neo4j graph database. Content-based recommendation over these labels can therefore compensate for the weaknesses of user-based collaborative filtering and solve the user cold start problem. Consequently, the mixed recommendation technology combining user-based collaborative filtering and content-based recommendation can improve the accuracy of the recommendation system.
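The two steps of user-based collaborative filtering can be sketched in Python as follows. The sketch assumes binary learner features (a course or knowledge point is either learnt or not) and cosine similarity; the paper does not name a similarity measure, so cosine is an assumption. Names such as Mia, "plural nouns", and "past tense" are hypothetical sample data.

```python
import math


def cosine(a, b):
    """Cosine similarity of two learners, each a set of learnt courses/knowledge points."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similarity_matrix(learnt):
    """Step 1: learner-by-learner similarity matrix as a nested dict."""
    return {u: {v: cosine(learnt[u], learnt[v]) for v in learnt if v != u} for u in learnt}


def recommend_cf(learnt, target, k=2, n=2):
    """Step 2: top-N unseen items from the K most similar learners, weighted by similarity."""
    sims = similarity_matrix(learnt)[target]
    neighbours = [v for v in sorted(sims, key=lambda v: (-sims[v], v)) if sims[v] > 0][:k]
    scores = {}
    for v in neighbours:
        for item in learnt[v] - learnt[target]:  # skip what the target already learnt
            scores[item] = scores.get(item, 0.0) + sims[v]
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


learnt = {
    "Alex": {"noun", "simple present tense"},
    "Jane": {"noun", "simple present tense", "phrases with 'go'"},
    "Lucky": {"noun", "plural nouns"},
    "Mia": {"past tense"},
}
print(recommend_cf(learnt, "Alex"))  # ["phrases with 'go'", 'plural nouns']
```

A learner with an empty history has zero similarity to everyone, so `recommend_cf` returns an empty list for them. This is the cold start problem described above. Rebuilding the full matrix on every call keeps the sketch short; a real system would precompute it offline.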

Mixed Recommendation Technology
In the system architecture above, the mixed recommendation technology adopted in the algorithm layer is the core of the entire system. The accuracy and real-time performance of the system are closely related to it, and an effective mixed recommendation technology is important for improving the performance and effect of the recommendation system. In the current state of the field, user-based collaborative filtering is a common method with certain advantages in the novelty of its results. However, the relevance of its results is relatively weak: popular learning resources are recommended more often, which undermines personalized learning, and, as mentioned above, it also suffers from the cold start problem. Content-based recommendation is the simplest and most intuitive algorithm. It does not suffer from the cold start problem and decides whether to recommend an item based on similarity. Its accuracy therefore depends on the descriptions of the learning resources, namely the labels produced during content mining, so it is limited by how deeply the content of texts, images, audio, and video can be analyzed. The two methods have their respective pros and cons, and both have limitations. Applying only one technique cannot guarantee high-quality recommendations; therefore, this study proposed combining the two methods so that each compensates for the other's shortcomings.
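Content-based recommendation over knowledge point labels can be sketched as below. Users and resources are represented as label-frequency vectors compared by cosine similarity; both the vector representation and the similarity measure are assumptions, and the resource IDs are hypothetical. A new user's profile can be seeded with portrait labels, which lets this method respond before any history exists.

```python
import math
from collections import Counter


def profile(label_lists):
    """User profile: label frequencies over the resources a user has browsed,
    or over portrait labels when the user has no history yet."""
    return Counter(label for labels in label_lists for label in labels)


def cosine(p, q):
    """Cosine similarity between two label-frequency vectors."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def recommend_content(user_profile, resources, seen=(), n=2):
    """Rank unseen resources by similarity between their labels and the user profile."""
    scores = {}
    for rid, labels in resources.items():
        if rid in seen:
            continue
        score = cosine(user_profile, Counter(labels))
        if score > 0:
            scores[rid] = score
    return sorted(scores, key=lambda r: (-scores[r], r))[:n]


resources = {
    "video-present": ["simple present tense", "third-person singular verb forms"],
    "quiz-nouns": ["noun", "plural nouns"],
    "audio-go": ["phrases with 'go'", "simple present tense"],
}
# A brand-new user described only by a portrait label still gets recommendations.
new_user = profile([["simple present tense"]])
print(recommend_content(new_user, resources))  # ['audio-go', 'video-present']
```

The quality of the result depends entirely on the labels, which matches the limitation noted above. A resource that was labeled carelessly is either never recommended or recommended to the wrong learners.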
Two measures are proposed for combining the two methods so as to maximize their strengths and avoid their weaknesses. First, the mixed recommendation framework must balance the algorithm's computation cycle, online response time, and resource consumption against the accuracy of the recommendation results. Many practical applications have shown that a three-layer Online-Nearline-Offline mechanism achieves a relatively good balance between the two. In this mechanism, the Online module quickly captures the behavior of students or platform users, such as which courses they have studied and which videos they have clicked and rated, and it requires a cache system to process and respond to API requests. The Nearline module consumes user events and uses stream computing to obtain intermediate results. These intermediate results are sent to the Online part to update the recommendation model in real time, and they are also stored. The Offline module mines long-term, massive user behavior logs, runs heavy and complex algorithms, and periodically takes data from the data warehouse for batch computation and model updates. It therefore consumes substantial resources and requires a long mining cycle, but its results are usually of the highest quality, and they are passed online through the Nearline system. The three layers must work together, sharing heavy and lightweight algorithms so as to respond quickly to user behavior. This ensures high reliability and high concurrency of the recommendation system and high accuracy of its results.
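The following toy Python sketch illustrates only the data flow of the Online-Nearline-Offline mechanism. An Offline batch job computes scores from the full log; Nearline forwards those scores and each new event to Online; and Online answers requests from in-memory state. Item popularity stands in for the real offline model (in this paper, the mixed recommendation algorithm). No actual cache server, stream processor, or data warehouse is involved, and all class and function names are hypothetical.

```python
from collections import Counter


class Online:
    """Answers recommendation requests from in-memory state only (a stand-in for a cache)."""

    def __init__(self):
        self.model_scores = {}  # latest model output, delivered through Nearline
        self.recent = {}        # user -> recently studied items, delivered through Nearline

    def recommend(self, user, n=2):
        seen = set(self.recent.get(user, []))
        candidates = [i for i in self.model_scores if i not in seen]
        return sorted(candidates, key=lambda i: (-self.model_scores[i], i))[:n]


class Nearline:
    """Handles the event stream incrementally and forwards results to Online."""

    def __init__(self, online):
        self.online = online
        self.stored = []  # intermediate results are also persisted

    def on_event(self, user, item):
        self.online.recent.setdefault(user, []).append(item)  # real-time update
        self.stored.append((user, item))

    def publish(self, scores):
        self.online.model_scores = dict(scores)  # Offline output reaches Online here


def offline_batch(log):
    """Periodic heavy job over the full behaviour log; popularity stands in for the model."""
    return Counter(item for _user, item in log)


log = [("Jane", "noun"), ("Lucky", "noun"), ("Mia", "noun"),
       ("Jane", "simple present tense"), ("Mia", "simple present tense"),
       ("Lucky", "plural nouns")]
online = Online()
nearline = Nearline(online)
nearline.publish(offline_batch(log))
print(online.recommend("Alex"))    # ['noun', 'simple present tense']
nearline.on_event("Alex", "noun")  # Alex studies nouns; the next request reflects it
print(online.recommend("Alex"))    # ['simple present tense', 'plural nouns']
```

The expensive scoring runs only in `offline_batch`, while `recommend` does a cheap lookup and sort. This split is the one the mechanism relies on to keep response times low.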
Second, at the algorithm level, weighted mixed recommendation can yield better results. The two recommendation algorithms are run separately to generate candidate results, which are then weighted to obtain the final ranking and recommendations. A simple weighting method gives the collaborative filtering results and the content-based results the same weight. However, fixed weights ignore that the quality of each algorithm varies with learners and learning scenarios. Therefore, training samples should be set up to check whether users' evaluations of the recommendation results agree with the system's predictions. The system then builds a weighting model from the training results and adjusts the weights dynamically to generate recommendations. The flow of the algorithm is shown in Figure 7.
http://www.i-jet.org
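The following is a minimal sketch of weighted mixing, assuming that both algorithms output scores in [0, 1]. The final score is w * CF score + (1 - w) * content-based score, and w = 0.5 reproduces the fixed equal-weight scheme. The paper does not specify how the weighting model is trained. Choosing w by a grid search that minimizes squared error against users' ratings on training samples, fitted separately for each learner group, is an assumption for illustration.

```python
def combine(cf_scores, cb_scores, w):
    """Weighted mixing: w * CF score + (1 - w) * content-based score.
    An item missing from one algorithm's candidates gets 0 from that algorithm."""
    items = cf_scores.keys() | cb_scores.keys()
    return {i: w * cf_scores.get(i, 0.0) + (1 - w) * cb_scores.get(i, 0.0) for i in items}


def rank(scores, n=3):
    """Final recommendation ranks, highest mixed score first."""
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


def fit_weight(samples, grid=tuple(i / 10 for i in range(11))):
    """Choose the w in `grid` whose mixed scores best match users' ratings.

    Each sample is (cf_scores, cb_scores, ratings) with ratings in [0, 1]; the error
    is the mean squared difference between mixed score and rating over rated items."""
    def error(w):
        diffs = []
        for cf, cb, ratings in samples:
            mixed = combine(cf, cb, w)
            diffs += [(mixed.get(item, 0.0) - rating) ** 2 for item, rating in ratings.items()]
        return sum(diffs) / len(diffs) if diffs else 0.0
    return min(grid, key=error)


# Learners with rich histories: their ratings follow the CF scores more closely.
active = [({"a": 0.9, "b": 0.2}, {"a": 0.1, "b": 0.8}, {"a": 1.0, "b": 0.0})]
# New learners: CF has no candidates, so only content-based scores can match ratings.
newcomers = [({}, {"a": 0.9, "b": 0.1}, {"a": 1.0, "b": 0.0})]
w_active, w_new = fit_weight(active), fit_weight(newcomers)
print(w_active, w_new)  # 1.0 0.0
print(rank(combine({}, {"audio-go": 0.7, "quiz-nouns": 0.2}, w_new)))  # ['audio-go', 'quiz-nouns']
```

Refitting w per learner group (or per scenario) as new ratings arrive is one way to realize the dynamic adjustment described above. It could run in the Offline layer and be pushed online through Nearline.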

Conclusion
Based on the characteristics of English knowledge, this paper analyzed the problem that students often get lost in massive online learning materials and built a personalized English learning material recommendation system aimed at efficient and accurate recommendation. The system uses the knowledge points of the English course to characterize data and build a knowledge graph, and its key technologies were introduced in detail. To achieve personalized, customized recommendations, learners' behavior data were used to create complete user portraits and the corresponding knowledge graph structure. User label data were then combined with the mixed recommendation technology (integrating user-based collaborative filtering and content-based recommendation) to make recommendations. The system can help learners save much of the time spent searching for materials and can improve the efficiency of their English learning.