Learning Motivations and Learning Behaviors of Sports Majors Based on Big Data

Abstract: The subjective factors of sports majors play a critical role in improving their cultural quality. Data mining can extract valuable information about learning motivations and learning behaviors from massive campus data. This paper therefore explores the learning motivations and learning behaviors of sports majors based on big data. First, it analyzes the features of the learning behaviors of sports majors and measures the complexity of these behaviors with information entropy, approximate entropy, and the change-complexity function. Next, a dataset is built from the students' use of the campus access network and online learning platforms. Then, a time domain convolutional capsule network model with multiple semantic features is established to recognize and classify the learning motivations of sports majors. Experiments show that the proposed model is effective.


Introduction
The cultivation of sports majors should pay adequate attention to both sports training and cultural education, so that the students can become aspiring, virtuous, cultivated, self-disciplined, and qualified members of future sports teams. Educators in China must therefore be fully aware of the importance of the cultural education of sports majors [1][2][3][4][5]. To improve the cultural quality of sports majors, we must start by identifying the subjective factors that lie in the students themselves. The learning motivations and learning behaviors of sports majors are critical factors affecting their learning, so it is necessary to study and analyze them in detail [6][7][8]. As campus informatization accelerates in Chinese schools, massive data recording how students learn and live on campus has been generated [9][10][11][12]. If these data are processed well with data mining technology, valuable information about the learning motivations and learning behaviors of students can be obtained.
In recent years, the physical quality of college students has been declining, and the reform of higher education has become an urgent problem for colleges and universities in China. Scholars have approached this problem in various ways. Yang [13] analyzed the important role of quality education in college PE classes, pointed out several problems in PE class reform, and proposed corresponding suggestions and solutions. Xin and Fang [14] argued that, in the context of accurate big data analysis, situational teaching and quality training in college PE classes deserve the attention of sports educators; through research on big data and precision sports, they analyzed the questions that might arise during situational teaching in college PE classes and the possible requirements of college students, and summarized methods of thinking design and practical application. Gao [15] held that PE is a basic way for college students to promote intellectual development, improve learning efficiency, and cultivate ideological and aesthetic qualities. Zhang [16] employed the principles and methods of quality education to explain the theoretical basis of physical quality education and to analyze the difference between quality education and physical quality education.
Existing campus big data is mostly used to extract students' learning features and daily behavior patterns, so that early warnings can be issued when there are risks in their learning status, either overall or in specific subjects. However, few studies have analyzed the learning motivations and learning behaviors of sports majors based on big data. To fill this gap, this paper explores the learning motivations and learning behaviors of sports majors based on big data. The content is arranged as follows: 1) analyze the features of the learning behaviors of sports majors, and measure the complexity of these behaviors with information entropy, approximate entropy, and the change-complexity function; 2) build a dataset based on the students' use of the campus access network and online learning platforms, and construct a time domain convolutional capsule network model with multiple semantic features to recognize and classify the learning motivations of sports majors; 3) verify the effectiveness of the constructed model with experimental results.

Feature analysis of the learning behaviors of sports majors
Only when sports majors fully realize the importance of cultural classes can the joint development of sports training and cultural education be achieved. They can then grow into sports talents with excellent cultural quality who meet the requirements of the country, and they will have a great advantage in employment after graduation.
The learning motivations of sports majors can be divided into external motivations (inspired by external environmental conditions and incentives) and internal motivations (determined by learning and psychological factors). Figure 1 lists the factors affecting the external motivations of sports majors' learning behaviors. If a student's senses of autonomy, competence, and belonging are not met, he/she will lose the motivation to learn in cultural classes. Sports majors generally train in a free and relaxed environment, so randomness and diversity are two common features of the behaviors of most of them. In such cases, simple mathematical statistics are no longer adequate for studying the complexity of the learning behaviors of sports majors, and more effective quantitative indicators are needed. This paper therefore uses information entropy, approximate entropy, and the change-complexity function in the ACSS toolkit to measure the complexity of the learning behaviors of sports majors.
Let m denote the total number of distinct elements in the learning behavior data of students, and let ti denote the probability of occurrence of a single element. To measure the orderliness of the learning behaviors of students, Formula 1 gives the information entropy H:

$$H = -\sum_{i=1}^{m} t_i \log t_i \tag{1}$$

As the most basic indicator of information quantification, information entropy quantifies the uncertainty of the learning behavior information of students. However, it considers only the occurrence probability of single elements and ignores the correlation between elements, so it evaluates the complexity of student learning behaviors poorly.
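The limitation described above can be illustrated with a minimal sketch of the standard Shannon entropy. The function name `information_entropy`, the base-2 logarithm, and the location labels are assumptions made for illustration; they are not taken from the paper's dataset.

```python
import math
from collections import Counter


def information_entropy(events, base=2.0):
    """Shannon entropy of a sequence of discrete behavior events."""
    if not events:
        raise ValueError("events must be non-empty")
    counts = Counter(events)
    total = len(events)
    entropy = 0.0
    for count in counts.values():
        t_i = count / total          # occurrence probability of one element
        entropy -= t_i * math.log(t_i, base)
    return entropy


# Hypothetical access-control records: one location label per card swipe.
alternating = ["dorm", "library"] * 4            # strictly alternating pattern
blocked = ["dorm"] * 4 + ["library"] * 4         # one long block of each
single = ["dorm"] * 8                            # no variation at all

entropy_alternating = information_entropy(alternating)
entropy_blocked = information_entropy(blocked)
entropy_single = information_entropy(single)
```

The alternating and blocked sequences have the same element frequencies and therefore the same entropy, even though their temporal structure differs. This is the sense in which the indicator ignores the correlation between elements.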
Approximate entropy measures the occurrence probability of behavior patterns in the time series of student learning behavior data. This indicator is often used to identify data changes in complex systems, and a larger approximate entropy indicates greater data complexity. Let {v(1), v(2), ..., v(M)} denote the time series of the learning behaviors of students, where M is the length of the series, and let it be divided into M-n segments, where n is the data volume of each time series segment. With n as the period, the divided time series segments can be represented by {a(1), ...}, and Formula 2 gives the approximate entropy. Let s denote the tolerance of similarity among sub-segments. In Formula 2, D^n_i = (the number of segments whose distance between the i-th time series segment and the other M-n time series segments is less than s)/(M-n+1), and D^(n+1)_i = (the number of segments whose distance between the i-th time series segment and the other M-n-1 time series segments is less than s)/(M-n). According to Formula 2, approximate entropy describes the probability that new behavior patterns are generated when the dimensionality of the time series of student learning behavior data changes, and it measures the structural complexity of a time series well.
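As a rough sketch of how such an indicator can be computed, the code below follows the commonly used definition of approximate entropy (the difference between the mean log match ratios at segment lengths n and n+1). Whether the paper's Formula 2 matches this form exactly is an assumption. The function name, the Chebyshev distance, and the example sequences are illustrative choices, and the strict comparison `< s` follows the wording "less than s" in the text.

```python
import math


def approximate_entropy(series, n=2, s=0.5):
    """Approximate entropy ApEn(n, s) of a numeric time series."""
    M = len(series)
    if M < n + 2:
        raise ValueError("series is too short for the chosen n")
    if s <= 0:
        raise ValueError("tolerance s must be positive")

    def phi(width):
        # All overlapping segments of the given width.
        segments = [series[i:i + width] for i in range(M - width + 1)]
        log_sum = 0.0
        for a in segments:
            # Chebyshev distance; a segment always matches itself,
            # so the ratio passed to log() is never zero.
            similar = sum(
                1 for b in segments
                if max(abs(x - y) for x, y in zip(a, b)) < s
            )
            log_sum += math.log(similar / len(segments))
        return log_sum / len(segments)

    return phi(n) - phi(n + 1)


# Hypothetical 0/1 daily records (1 = entered the study room that day).
periodic = [0, 1] * 8
irregular = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1]

apen_periodic = approximate_entropy(periodic, n=2, s=0.5)
apen_irregular = approximate_entropy(irregular, n=2, s=0.5)
```

The strictly periodic record yields a value close to zero, while the irregular record yields a markedly larger one, in line with the statement that a larger approximate entropy indicates greater complexity.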
To further study the complexity of behavior patterns, this paper also uses the change-complexity function, which measures the complexity of the data structure based on the degree of change between data. Since the change-complexity function requires a 0-1 time series, the student learning behavior data must be converted into a 0-1 sequence, denoted {a1, a2, ..., ai, ..., am}; Formula 3 gives the calculation principle of ai. Let K denote the total length of the time series; ε the time interval of segment division, satisfying 2<ε≤K; CNε the number of changes when the time interval is ε; and K+j-1 the number of time series segments that can be divided. The complexity is then calculated by Formula 4. According to Formula 4, the final complexity is the mean number of changes over the sub-sequences of the student learning behavior time series.
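The idea of averaging the number of changes over sub-sequences can be sketched as follows. This is only an illustration of that idea, not a reproduction of Formulas 3 and 4: the thresholding rule in `binarize`, the sliding-window segmentation, and all names and data are assumptions.

```python
def binarize(values, threshold=0):
    """Assumed conversion rule: 1 if the value exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


def count_changes(bits):
    """Number of positions where a bit differs from its predecessor."""
    return sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)


def mean_window_changes(bits, eps):
    """Mean number of changes over all sliding windows of length eps."""
    K = len(bits)
    if not 2 < eps <= K:
        raise ValueError("eps must satisfy 2 < eps <= K")
    windows = [bits[i:i + eps] for i in range(K - eps + 1)]
    return sum(count_changes(w) for w in windows) / len(windows)


# Hypothetical daily counts of study-room visits for two students.
erratic_counts = [0, 2, 0, 3, 0, 1, 0, 2]
steady_counts = [1, 1, 2, 1, 0, 0, 0, 0]

erratic_score = mean_window_changes(binarize(erratic_counts), eps=4)
steady_score = mean_window_changes(binarize(steady_counts), eps=4)
```

Under these assumptions the erratic record changes state in every step of every window, while the steady record changes only once, so the first score is much higher than the second.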
After the features of student learning behavior data are quantified, the learning behavior features of students can be obtained. To explore the learning behavior patterns of students, this paper performed feature visualization and Pearson correlation analysis; Formula 5 gives Pearson's correlation coefficient (PCC). The absolute value of the PCC indicates how strong or weak the correlation between learning behavior features is. However, discussing the correlation between two variables based on the PCC presupposes an ideal T value (namely the degree of confidence): only when the T value is ideal is the calculated PCC credible.
Common synchronous data collection technology collects data from two sides: the server end and the client end. Since the data of the campus network access control system and the online learning platform is complex and large in volume, it can be collected with a webpage log collection method that integrates the count method, the time collection method, and the server/client end method (Figure 2).
Correlation analysis was performed on the access control data of the dormitory buildings, libraries, and self-study rooms used by sports majors and on their learning data from online learning platforms. For the data of dormitory access control systems and library access control systems, the quantification effect of information entropy was better, even better than that of the other newly proposed indicators. For the learning behaviors mapped from the access control data of self-study rooms, the quantification effect of information entropy was worse, and the other indicators had obvious advantages and reflected a strong correlation with the learning performance in cultural classes. Further analysis of the sample data suggested that the data volume of library access control, self-study room access control, and the online learning platform was much smaller than that of dormitory building access control, and the data complexity was lower. This means that for low-complexity data such as library access control, self-study room access control, and the online learning platform, information entropy can be used to quantify the complexity of student learning behaviors, while for high-complexity data such as dormitory building access control, the two newly proposed indicators, namely approximate entropy and the change-complexity function, can be adopted with a more significant effect.
The above is the correlation analysis of single learning behaviors of sports majors. To evaluate indicator performance more comprehensively, the indicators corresponding to different learning behaviors of students were weighted and integrated into one comprehensive indicator, CB. Let l denote any indicator; j the serial number of a student; N the total number of students in the sample data, with 0≤j<N; i any single learning behavior indicator in the comprehensive indicator; M the total number of behavior features of students, with 0≤i<M; and δi the correlation coefficient between the i-th learning behavior indicator and the learning performance in cultural classes. For the j-th sports major, all behavior data in the indicator can then be expressed as {BDi 0, BDi 1, ..., BDi M-1}. Formula 6 gives the calculation method of CB. After the CB value (the indicator of the complexity of student learning behaviors) was calculated by Formula 6, the probability density curves of the approximate entropy and change-complexity comprehensive indicators could be plotted for sports majors in different cultural class performance score intervals.
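A minimal sketch of this check is given below, using the standard sample Pearson coefficient. Reading the "T value" as the usual t statistic for testing whether the correlation is zero is an assumption, as are the function names and the example data.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical data: a behavior feature and a cultural class score.
feature = [1, 2, 3, 4, 5]
score = [2, 1, 4, 3, 5]

r = pearson_r(feature, score)
t = t_statistic(r, len(feature))
```

With only five samples, a coefficient of 0.8 still gives a modest t statistic, which illustrates why a large PCC alone is not enough to conclude that two features are credibly correlated.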
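One plausible reading of such a weighted integration is sketched below. Since Formula 6 is not reproduced here, the exact weighting is an assumption: the sketch weights each indicator by the absolute value of its correlation coefficient δi and normalizes the weights to sum to one. The function name and the numbers are hypothetical.

```python
def composite_indicator(indicator_values, correlations):
    """Weighted average of indicators, weights = |correlation| (assumed form)."""
    if len(indicator_values) != len(correlations):
        raise ValueError("one correlation coefficient is needed per indicator")
    weight_sum = sum(abs(d) for d in correlations)
    if weight_sum == 0:
        raise ValueError("at least one correlation must be non-zero")
    weighted = sum(abs(d) * v for d, v in zip(correlations, indicator_values))
    return weighted / weight_sum


# Hypothetical complexity values of three behaviors for one student
# (e.g. dormitory, library, online platform) and their correlations
# with cultural class performance.
student_indicators = [0.2, 0.6, 0.9]
delta = [0.1, 0.3, 0.6]

cb_value = composite_indicator(student_indicators, delta)
```

Under this assumed form, behaviors whose indicators correlate more strongly with performance contribute more to CB, which matches the stated purpose of the weighting.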

Prediction of the learning motivations of sports majors
Based on the "three-level theory" of the Japanese scholar Kitao Tomohiko, this paper divides the factors influencing the learning motivations of sports majors into three aspects: direct factors (teacher factors such as teaching ability and professional ethics), internal factors (personal factors such as academic needs, competition needs, and esteem needs), and environmental factors (family factors, social factors, etc.), as shown in Table 1. The reasons affecting the learning motivations of sports majors are complex and diverse, and they act on the learning motivations jointly; that is, there is a chain reaction among them. For example, if a student is late for class because of oversleeping, this will inevitably have a negative influence on his/her learning in the cultural class that day. In the collected data samples, such cause-and-effect correlation is reflected as an influence on the relative positions of data elements.
To give the constructed model a better ability to sense the distribution of data, this paper constructed a time feature table for each type of semantic label of the collected learning behavior data samples. The table is used to capture the relationship between different time series segments within a feature layer and to connect different feature layers. In other words, the time feature table captures the different patterns in the living and learning habits of sports majors, which assists the division of students into groups with different learning motivations.
To correctly identify and classify the learning motivations of sports majors, this paper focused on the dataset created from students' behaviors when they use the campus access control network and the online learning platform, and used the constructed time domain convolutional capsule network model with multiple semantic features for identification and classification. The model uses attitude matrices to maintain the hierarchical attitude relationship between entities; that is, it can capture the results of the interaction of data features on different time scales.
The capsule network can effectively recognize relative position patterns because its structure contains vector neurons, which are very different from traditional scalar neurons. Figure 3 compares the information propagation of a traditional scalar neuron and a vector neuron. For the vector neuron, each input vector from the previous layer is multiplied by the corresponding attitude matrix before the weighted sum is taken, which preserves the relative position relationship between entities.
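The difference between the two kinds of neuron can be sketched in a few lines. This is a simplified illustration rather than the paper's implementation: the function names, the fixed coupling weights, and the small 2-D example are assumptions, and the nonlinearity applied to the vector neuron's output is left out.

```python
def scalar_neuron(inputs, weights, bias):
    """Traditional neuron: weighted sum of scalars followed by ReLU."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, z)


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def vector_neuron_preactivation(input_vectors, attitude_matrices, coupling):
    """Vector neuron: transform each input vector by its attitude matrix,
    then take the weighted sum of the transformed vectors."""
    transformed = [mat_vec(W, u) for W, u in zip(attitude_matrices, input_vectors)]
    dim = len(transformed[0])
    return [sum(c * t[k] for c, t in zip(coupling, transformed)) for k in range(dim)]


# Scalar case: the output is a single number.
scalar_out = scalar_neuron([1.0, 2.0], [0.5, -1.0], bias=0.25)

# Vector case: two 2-D inputs, an identity matrix and a 90-degree rotation.
inputs = [[1.0, 0.0], [0.0, 1.0]]
matrices = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [-1.0, 0.0]],
]
vector_out = vector_neuron_preactivation(inputs, matrices, coupling=[0.5, 0.5])
```

In the vector case the two inputs point in different directions, but their attitude matrices map both onto the same direction, so the weighted sum keeps full length. A scalar neuron has no way to represent this kind of agreement between relative orientations.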

Fig. 3. Comparison of different neuron information propagation processes
The input of the network model is the constructed multi-dimensional time feature table G of students' learning behaviors. Let CPC-n denote the first-layer output feature map generated by convolution kernel n; Nn all the combinations of the n-th convolution kernel and the input learning behavior features; ωm the weight corresponding to the m-th combination; rm the bias corresponding to the n-th convolution kernel; and g the nonlinear activation function, for which this paper adopted the ReLU function. Formula 7 gives the first convolution operation on the time feature table. The output CPC of the first-layer convolution is fed to the initial capsule layer, which is essentially a convolution operation. Let CPPC,h denote the output feature map generated by convolution kernel h; Nh all the combinations of the h-th convolution kernel and the input features; ωw the weight of the w-th combination; and rh the bias corresponding to the h-th convolution kernel. The calculation of the initial capsule layer can then be written equivalently as Formula 8. The initial capsule layer divided the 256 output feature layers into 32 vector neuron layers. Each vector neuron layer had 8 dimensions and was passed as input, denoted VNi, to the routing capsule layer. For the routing capsule layer, this paper calculated the norm of the vector for each category of student learning motivation in the final classification, and the category with the longest norm was the predicted classification output.
Let dij denote the routing weight; Formula 9 gives the specific calculation. SQ in Formula 9 denotes the squash function, which is given by Formula 10. For the output uj of the squash function, Formula 10 not only guarantees that the length of uj does not exceed K, but also normalizes the input vector ej, so that the vectors uj and ej keep the same direction. The probability that the learning motivation represented by a vector neuron exists corresponds to the norm of the whole vector, and the different attributes of the learning motivations correspond to the direction of the vector.
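The two bookkeeping steps described above, splitting 256 feature channels into 32 capsules of 8 dimensions and choosing the class whose output vector is longest, can be sketched as follows. Grouping contiguous channels into one capsule is an assumption about the layout, and the function names and numbers are hypothetical.

```python
import math


def group_into_capsules(channel_values, num_capsules=32, capsule_dim=8):
    """Split a flat list of channel activations into capsule vectors."""
    if len(channel_values) != num_capsules * capsule_dim:
        raise ValueError("expected num_capsules * capsule_dim channel values")
    return [
        channel_values[i * capsule_dim:(i + 1) * capsule_dim]
        for i in range(num_capsules)
    ]


def predict_by_longest_norm(class_vectors):
    """Return the index of the class vector with the largest Euclidean norm,
    together with all norms."""
    norms = [math.sqrt(sum(v * v for v in vec)) for vec in class_vectors]
    best = max(range(len(norms)), key=norms.__getitem__)
    return best, norms


# 256 hypothetical channel activations at one position of the feature map.
channels = [float(k) for k in range(256)]
capsules = group_into_capsules(channels)

# Hypothetical output vectors for three learning motivation categories.
class_vectors = [[0.1, 0.2], [0.6, 0.3], [0.2, 0.2]]
predicted_class, class_norms = predict_by_longest_norm(class_vectors)
```

Here the second category has the longest output vector and is therefore returned as the prediction.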
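For orientation, the squash function widely used in capsule networks can be sketched as below. Whether Formula 10 has exactly this form is an assumption. In this common form the output length is always below 1, whereas the text above states the bound as K. The small `eps` term is an added safeguard against division by zero.

```python
import math


def squash(e, eps=1e-9):
    """Common capsule squash: keeps the direction of e, maps its length
    to ||e||^2 / (1 + ||e||^2), which is always below 1."""
    sq_norm = sum(x * x for x in e)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in e]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


long_input = [3.0, 4.0]      # length 5
short_input = [0.03, 0.04]   # length 0.05

long_output = squash(long_input)
short_output = squash(short_input)
```

A long input vector is mapped to a length close to 1 and a short one to a length close to 0, while the ratio between components, and hence the direction, is unchanged. This is what allows the vector length to be read as an existence probability.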

Analysis of experimental results
The student learning behavior features studied in this paper included indicators such as diligence degree and behavior regularity. The diligence degree indicator contains features such as the frequency of getting up early, the frequency of entering study rooms and libraries, the frequency of logging on to the online learning platform, and the frequency of borrowing books. The behavior regularity indicator contains features such as the change in the slope of learning behaviors, the change in the mean value of indicators, and the measure of behavior complexity. Figure 4 shows the probability density distribution of the comprehensive indicator of different learning behaviors calculated by the change-complexity function, for sports majors in the evaluation intervals of 0-6, 6-7, 7-8, and 8-10 points. In the figure, the vertical axis describes how likely values near a given point on the horizontal axis are, and the integral of the probability density curve over an interval of the horizontal axis is the probability of that interval. The curves show that the indicator values of sports majors in each evaluation interval were approximately normally distributed, and that the distribution peaks differed between evaluation intervals. In other words, the comprehensive indicator calculated by the change-complexity function differs markedly between evaluation intervals, so students at different learning behavior evaluation levels can be distinguished, and the evaluation of the learning behaviors of sports majors is feasible.
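How a density curve of this kind relates to probabilities can be illustrated with a small Gaussian kernel density estimate. The estimator, the fixed bandwidth, and the sample values are assumptions chosen for illustration; the paper does not state how the curves in Figure 4 were produced.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the probability density of the samples."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def integrate(f, lo, hi, steps=2000):
    """Trapezoidal rule: approximate probability of the interval [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h


# Hypothetical comprehensive indicator values for two evaluation intervals.
low_group = [0.20, 0.25, 0.22, 0.30, 0.27]
high_group = [0.55, 0.60, 0.58, 0.65, 0.62]

density_low = gaussian_kde(low_group, bandwidth=0.05)
density_high = gaussian_kde(high_group, bandwidth=0.05)

total_area = integrate(density_low, -1.0, 2.0)
prob_low_below_04 = integrate(density_low, -1.0, 0.4)
```

The area under each curve is 1, and the area over a sub-interval is the estimated probability that a student's indicator falls in it. Separated peaks, as in this toy example, are what make the groups distinguishable.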

Fig. 4. Probability density distribution of comprehensive indicator
In addition, this paper conducted correlation analysis on each CB value calculated by information entropy, approximate entropy, and the change-complexity function; the results are shown in Table 2. The correlation coefficients between the new indicators calculated by approximate entropy and the change-complexity function and the cultural class performance of sports majors were relatively large, all above 0.34; that is, the proposed indicators quantified the complexity of the learning behaviors of students well.
The prediction performance of six models and the proposed model was compared, and the results are given in Table 3. The proposed model obtained good scores in terms of precision, recall, and F1. The input of Model 6 was the same as that of the proposed model, but the proposed model scored higher. Models 4 and 5 both performed well, but not as well as the proposed model. In terms of F1 on negative samples, the proposed model performed best with a score of 0.933, followed by Model 6 with 0.931. The proposed model also performed well in terms of precision. In summary, the proposed model performed better on the entire dataset of the learning behaviors of sports majors.
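For reference, the three scores used in this comparison can be computed per class as in the sketch below. The labels and predictions are made-up toy data, not results from Table 3, and treating label 0 as the "negative samples" class is an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels: 1 = positive sample, 0 = negative sample.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

positive_scores = precision_recall_f1(y_true, y_pred, positive=1)
negative_scores = precision_recall_f1(y_true, y_pred, positive=0)
```

Computing the scores separately for each class, as done here, is what allows a statement such as "F1 on negative samples" to be made independently of the performance on positive samples.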

Conclusion
This paper studied the learning motivations of sports majors based on big data and analyzed the features of their learning behaviors. It used information entropy, approximate entropy, and the change-complexity function to measure the complexity of the learning behaviors of sports majors. Next, a dataset was built from the students' use of the campus access network and online learning platforms, and a time domain convolutional capsule network model with multiple semantic features was established to recognize and classify the learning motivations of sports majors. The experimental results gave the probability density distribution of the comprehensive indicator of different learning behaviors calculated by the change-complexity function, together with a table of the correlation coefficients between cultural class performance and CB values, indicating that the proposed indicators quantified the complexity of students' learning behaviors well. In addition, a line chart of the determination coefficient RS of different models was plotted and the indicators of different feature combinations were compared, which verified that the proposed quantitative indicators contributed well to the construction of an accurate learning motivation prediction model. Finally, this paper compared the prediction performance of six commonly used models with that of the proposed model, and the results showed that the proposed model performed better on the entire dataset of the learning behaviors of sports majors.