Personalized Distance Learning System based on Sequence Analysis Algorithm

Abstract: Personalized learning systems can use intelligent recommendation models and algorithms to provide users with the learning resources that are most valuable to them. This paper studies classical sequence analysis algorithms and validates the PrefixSpan algorithm on data from a distance learning platform. When the minimum support threshold is between 0.003% and 0.004%, the test data show that the algorithm's accuracy is relatively stable and the recommendation effect is satisfactory.

The distance learning platform for farmers was set up by the Beijing Academy of Agriculture and Forestry Sciences. It consists of a front-end broadcast platform, learning sites, a learning resource library and a learning management system, and it offers video-on-demand, live broadcasts and expert lectures. The platform lets farmers learn at low cost and obtain agricultural technology knowledge quickly. At present, there are more than 40,000 registered users and more than 9,000 teaching videos. However, it is very difficult for farmers to find the learning resources that interest them on the platform. A personalized learning system was therefore developed to solve this problem: it analyzes the behavior of individual users and then provides them with useful information. This paper studies a personalized learning system built on the massive user behavior data of the distance learning platform, focusing on sequence analysis algorithms for personalizing the distance learning system.
Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples whose values are delivered in a sequence. The values are usually presumed to be discrete; time series mining is closely related but is usually considered a different activity. Sequential pattern mining is a special case of structured data mining.
Several key traditional computational problems are addressed in this field. They include building efficient databases and indexes for sequence information, extracting frequently occurring patterns, comparing sequences for similarity, and recovering missing sequence members. In general, sequence mining problems can be classified as string mining, which is typically based on string processing algorithms, and itemset mining, which is typically based on association rule learning.
III. PREFIXSPAN ALGORITHMS
1. Scan the sequence database and generate all sequential patterns of length 1.
2. For each length-1 sequential pattern, form the corresponding projected database.
3. Repeat the above steps on each projected database until no sequential pattern of length 1 can be produced from it.
4. The process is repeated for the different projected databases until no new length-1 sequential pattern is found.
The basic principle of the PrefixSpan algorithm is shown in Figure 1.
2. Records with the same user ID are combined, ignoring the specific time at which each video was watched, to generate a sequence database.
3. Call the PrefixSpan algorithm to process the sequence database into users' maximal frequent sequences, and obtain the credibility and support of each frequent sequence.
4. The length of a user sequence is obtained from the average number of videos a user watches per day. The user sequence database is obtained by truncating each user's sequence to this length.
5. Select user sequences by credibility. Using the user sequence obtained above, a fuzzy query retrieves the table of frequent sequences that contain the current user's sequence. Sequences are then selected from these frequent sequences by credibility value, yielding N sequences, where N is the number of recommended videos.
6. Frequent sequences are filtered according to credibility. Videos from the sequences with the highest credibility and support are given priority in the recommendation.
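To make the PrefixSpan steps described earlier concrete (scan for length-1 patterns, build a projected database per pattern, recurse), here is a minimal Python sketch. It is not the authors' implementation. It assumes each element of a user sequence is a single video ID, so only sequence extension is needed (no itemset extension), and that the minimum support is an absolute count of sequences; a relative threshold such as 0.004% would first be converted to a count, for example by rounding up threshold × number of sequences. The name `prefixspan` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def prefixspan(db: Sequence[Sequence[str]], min_count: int) -> Dict[Pattern, int]:
    """Return every frequent sequential pattern with its support count.

    Assumption: each sequence element is one item (one video ID).
    """
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # projected holds (sequence index, start of the suffix after the prefix)
        counts: Dict[str, int] = defaultdict(int)
        for sid, start in projected:
            # an item counts at most once per sequence (support = #sequences)
            for item in set(db[sid][start:]):
                counts[item] += 1
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # projected database of the new prefix: suffixes after the
            # first occurrence of `item` in each current suffix
            new_projected: List[Tuple[int, int]] = []
            for sid, start in projected:
                seq = db[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(sid, 0) for sid in range(len(db))])
    return results
```

For example, `prefixspan([["a", "b", "c"], ["a", "c"], ["b", "c"]], 2)` finds `("a",)`, `("b",)`, `("a", "c")` and `("b", "c")` with support 2, and `("c",)` with support 3. Projecting on the suffix after the first occurrence is what keeps each recursive call small.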

V. THE TEST PROGRAMS AND PROCESSES
A. Test data selection
The users' learning records on the distance learning platform are used as test data. There are about 440,000 records, and records with a learning time of less than 10 minutes are filtered out.
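The data selection step, combined with the earlier step of merging records with the same user ID into a sequence database, can be sketched as follows. The `Record` fields, the `build_sequence_db` name and the use of a timestamp to keep viewing order are assumptions for illustration; the paper does not describe the platform's record schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Record(NamedTuple):
    # Hypothetical record layout; the platform's real schema is not given.
    user_id: str
    video_id: str
    timestamp: int   # assumed ordering key, e.g. Unix seconds
    minutes: float   # learning time for this record


def build_sequence_db(records: Iterable[Record],
                      min_minutes: float = 10.0) -> Dict[str, List[str]]:
    """Drop short learning records, then build one video sequence per user."""
    by_user: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        if rec.minutes >= min_minutes:  # filter records under the time limit
            by_user[rec.user_id].append(rec)
    # Keep only the viewing order; the exact times are not needed afterwards.
    return {
        uid: [r.video_id for r in sorted(recs, key=lambda x: x.timestamp)]
        for uid, recs in by_user.items()
    }
```

Users whose records are all shorter than the limit disappear from the sequence database entirely, which is one reason the filtered record count is smaller than the raw total.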
B. The test process
(1) Select the training and test sets. The data set is divided into a training set and a test set. The training set contains about two-thirds of the records and is used to generate frequent sequences; the remaining records form the test set.
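A minimal sketch of the two-thirds split follows. The paper does not say whether records are split randomly or chronologically, so this sketch assumes a seeded random shuffle; `split_train_test` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(records: Sequence[T], train_ratio: float = 2 / 3,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Assign about train_ratio of the records to the training set.

    Assumption: a random split; a fixed seed makes it reproducible.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Using a private `random.Random(seed)` instance keeps the split reproducible without touching the global random state.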
(2) Divide the test set. The test data set can be divided in two ways, as follows.
1) In the test set, each user's daily video browsing records are grouped by user ID and arranged in ascending order of time. These records are divided into two portions, t1 and t2: t1 accounts for 2/3 of the browsed video records and is used to generate video recommendations, and t2 accounts for the remaining 1/3 and is used to evaluate the recommendation results.
2) In the test set, each user's daily video browsing records are identified by user ID, and the videos, separated by commas, make up the user sequence. These records are divided into two portions, t1 and t2: t1 accounts for 2/3 of the browsed video records and is used to generate video recommendations, and t2 accounts for the remaining 1/3 and is used to evaluate the recommendation results.
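Both division methods end with the same 2/3 to 1/3 cut of each user's time-ordered videos. The sketch below assumes that t1 takes the first two-thirds (rounded up) and t2 the rest, and that the user sequence of the second method is a comma-separated string of video IDs; the helper names are hypothetical.

```python
from typing import List, Tuple


def split_t1_t2(videos: List[str]) -> Tuple[List[str], List[str]]:
    """Split one user's time-ordered videos: first ~2/3 -> t1, rest -> t2."""
    # Integer ceiling of 2n/3 avoids floating-point rounding surprises.
    cut = (2 * len(videos) + 2) // 3
    return videos[:cut], videos[cut:]


def parse_comma_sequence(line: str) -> List[str]:
    """Second method (assumed format): 'v1,v2,v3' -> ['v1', 'v2', 'v3']."""
    return [v.strip() for v in line.split(",") if v.strip()]
```

For example, a four-video sequence `"a,b,c,d"` gives t1 = `["a", "b", "c"]` and t2 = `["d"]`. Rounding up keeps t1 non-empty for users with a single record, at the cost of an empty t2 for them.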
C. Generating recommended videos
(1) The sequence recommendation algorithm is run on the training data set to obtain the frequent sequences of the training set.
(2) The sequence recommendation algorithm is applied to the t1 data set, using the frequent sequences obtained from the training set, to produce recommendations.
(3) The set of recommended videos obtained from the training set is denoted by RS.
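The recommendation step can be sketched as follows. This is one possible reading of steps 4 to 6 of the recommendation procedure, not the authors' code. It assumes that the user's current sequence is the last `window` videos of t1, that the "fuzzy query" matches frequent sequences consisting of that current sequence followed by one more video, and that credibility is the confidence sup(Q + v) / sup(Q). The function name `recommend` and the back-off to shorter suffixes are illustrative choices.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def recommend(frequent: Dict[Pattern, int], history: Sequence[str],
              n: int, window: int) -> List[str]:
    """Recommend up to n unseen videos that frequently follow recent videos.

    Assumptions: current sequence = last `window` videos of history;
    credibility of Q + (v,) = sup(Q + (v,)) / sup(Q); if no frequent
    extension exists for Q, shorter suffixes of Q are tried.
    """
    seen = set(history)
    recent = tuple(history[-window:]) if window > 0 else ()
    scored: Dict[str, Tuple[float, int]] = {}
    for start in range(len(recent)):
        query = recent[start:]
        base = frequent.get(query)
        if not base:
            continue  # this context is not frequent; try a shorter one
        for pattern, support in frequent.items():
            if len(pattern) == len(query) + 1 and pattern[:-1] == query:
                video = pattern[-1]
                if video in seen:
                    continue  # do not recommend videos already watched
                score = (support / base, support)  # (credibility, support)
                if score > scored.get(video, (0.0, 0)):
                    scored[video] = score
        if scored:
            break  # use only the longest context that yields candidates
    # Highest credibility first, then highest support, then video ID.
    ranked = sorted(scored, key=lambda v: (-scored[v][0], -scored[v][1], v))
    return ranked[:n]
```

For example, with `frequent = {("a",): 3, ("a", "b"): 2, ("a", "c"): 3}`, `recommend(frequent, ["a"], n=2, window=2)` returns `["c", "b"]`, because `c` follows `a` with credibility 3/3 and `b` with 2/3.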

A. Test results and analysis under different minimum support thresholds (fixed number of recommended videos)
(1) Test results
After the data set is divided, the training set contains 189,733 records. With the first method there are 60,989 records and 4,988 users; with the second method there are 61,626 records and 5,757 users. Table 1 shows the test results when the number of recommended videos is 5. For this case, the precision curve is shown in Figure 2 and the coverage curve in Figure 3.
(2) Analysis of test results
With the number of recommendations held constant and the minimum support threshold varied, method 1 outperforms method 2 in both accuracy and coverage.
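The excerpt does not define precision and coverage. A common choice in usage-based recommendation, assumed here, compares each user's recommendation set R (the RS above) with the videos actually watched in t2 (T): precision = Σ|R ∩ T| / Σ|R| and coverage = Σ|R ∩ T| / Σ|T|, summed over users. The function name `precision_coverage` is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def precision_coverage(recommended: Dict[str, Iterable[str]],
                       actual: Dict[str, Iterable[str]]) -> Tuple[float, float]:
    """Micro-averaged precision and coverage over all test users.

    Assumed definitions: precision = hits / #recommended,
    coverage = hits / #actually watched (in t2).
    Users present only in `recommended` are ignored.
    """
    hits = n_rec = n_true = 0
    for user, truth in actual.items():
        t: Set[str] = set(truth)
        r: Set[str] = set(recommended.get(user, ()))
        hits += len(r & t)
        n_rec += len(r)
        n_true += len(t)
    precision = hits / n_rec if n_rec else 0.0
    coverage = hits / n_true if n_true else 0.0
    return precision, coverage
```

Under these definitions, raising the number of recommended videos tends to lower precision and raise coverage, which is the trade-off the following experiments vary.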
B. Test results and analysis under different numbers of recommended videos (fixed minimum support threshold)
(1) Test results
After the data set is divided, the training set contains 189,733 records. With the first method there are 60,989 records and 4,988 users; with the second method there are 61,626 records and 5,757 users.
The minimum support threshold is set to 0.004%, and the number of recommended videos is varied over 1, 3, 5, 8 and 10. The resulting precision and coverage are shown in Table 2. For this threshold, the precision curve is shown in Figure 4 and the coverage curve in Figure 5.
(2) Analysis of test results
With the minimum support threshold held constant and the number of recommendations varied, method 2 outperforms method 1 in both accuracy and coverage.

VII. CONCLUSION
The accuracy of the PrefixSpan algorithm improves as the minimum support threshold increases, but the number of frequent sequences decreases sharply at the same time. The minimum support threshold should therefore not be set too high, so that coverage and a sufficient number of frequent sequences are maintained. When the minimum support threshold is between 0.003% and 0.004%, experimental results show that the algorithm's accuracy is relatively stable and the recommendation effect is satisfactory.