Machine Learning-Based Student Emotion Recognition for Business English Class

Abstract: The traditional English teaching model neglects student emotions, leaving many students tired of learning. Machine learning supports end-to-end recognition of learning emotions, so that a recognition system can adaptively adjust the learning difficulty in the English classroom. With the help of machine learning, this paper presents a method to extract the facial expression features of students in business English class, and establishes a student emotion recognition model, which consists of such modules as emotion mechanism, signal acquisition, analysis and recognition, emotion understanding, emotion expression, and wearable equipment. The results show that the proposed emotion recognition model monitors the emotional state of each student in real time during English learning; upon detecting frustration or boredom, the system promptly switches to content that interests the student or is easier to learn, keeping the student active in learning. The research provides an end-to-end student emotion recognition system to assist with classroom teaching and enhance the positive emotions of students in English learning.


Introduction
Emotions are the psychological states and emotional responses of a person to things, based on the person's subjective experience in a certain environment. They are the person's physiological and psychological responses to the good or bad information in the environment he/she is experiencing [1]. With the improvement of computing power and the advancement of technologies such as neural networks, machine translation, and behavior recognition, artificial intelligence has made steady progress: starting from simple seeing and hearing, it has now entered the stage of human-computer interaction [2]. Machine learning is an important field of artificial intelligence. Only by truly understanding human emotions can machines respond correctly to the environment using human patterns of thinking [3,4]. Human emotions are extremely rich in states, and machines need to understand the various signals corresponding to each emotion to truly understand these emotions [5].
Emotion recognition technology is a field related to artificial intelligence; it helps computers recognize human emotions intelligently [6]. With the continuous development of emotion recognition technology, research on emotion recognition has grown steadily and has come to occupy an important position in application fields such as human-computer interaction [7]. The primary objective of emotion recognition systems is to interpret input signals of different patterns [8,9]. Human expressions and actions often carry more emotional information than language. Most human emotion identification methods are based on the analysis of facial expressions. However, in many cases people tend to conceal their true emotions, although they can hardly hide their body language [10].
Emotional Intelligence (EI) plays an important role in the learning of business English. Students with high EI can quickly relieve negative emotions such as dislike and depression during the learning process by regulating their own emotions [11,12]. High-EI students also know how to motivate themselves to have positive emotions in learning and to maintain a good learning status [13]. The learning of English is a very complex psychological process; it involves three stages, namely language recognition, language comprehension, and language communication, which correspond respectively to the processing, storage, and extraction of language information [14]. In a machine learning environment, students of business English can learn and practice in a purposeful, planned, and organized way, develop their knowledge and ability in English recognition, understanding, and communication, and express their business English learning through emotions during this process [15,16]. Based on machine learning and the business English class, this paper proposes a method for extracting students' facial expression features and constructs a student emotion recognition model to obtain the emotional states of learners in business English class.

Facial expression recognition
Facial expressions are an important way for humans to express their emotional states, and the emotional states of learners can be recognized by analyzing and processing their facial expressions [17]. Figure 1 shows the two-dimensional theory of emotions. In the figure, the arrow direction indicates increasing arousal; when pleasure increases, positive emotion increases, and when pleasure decreases, negative emotion appears. According to this theory, surprise and excitement correspond to positive emotions, while anger, sadness, hate, and fear correspond to negative emotions [18,19]. Positive emotions promote cognitive activities, while negative emotions hinder the cognitive process [20,21].
According to the features of different expressions, facial expression image features can be divided into static image features and dynamic sequential image features. Expression recognition based on static images extracts, classifies, and identifies facial expression feature information, while dynamic sequential facial expression images contain the dynamic information of continuous expression changes and can reflect the changing process of the facial expressions [22]. Figure 2 shows the facial expression feature extraction methods. The static facial expression extraction methods fall into three types: the global information method, the geometric feature method, and the mixed feature method; the global information method in turn contains the principal component analysis method, the linear discriminant method, and the Gabor wavelet transform method. The dynamic facial expression extraction methods include the optical flow method, the feature point tracking method, and the differential imaging method.
To select key features that can summarize the characteristics of facial expression changes, the facial movement units should first be analyzed, the data on the multiple geometric shapes between facial feature points in the sequential images should be calculated, and these data should be mapped to figures and tables for comparative analysis [23,24]. Then, according to how the data change across the images, the features with the most obvious changes should be selected. Finally, the changes of the facial feature points should be extracted to reflect the changes in facial features, so as to improve the expression recognition rate [25].
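As an illustration only, the selection step described above can be sketched in Python as follows. The sketch assumes that the "geometric shapes" are represented by Euclidean distances between pairs of tracked feature points and that "more obvious changes" means a larger range of a distance over the sequence; the array layout and the function names `pairwise_distances` and `select_most_changing` are hypothetical and are not taken from this paper.

```python
import numpy as np
from itertools import combinations


def pairwise_distances(landmarks):
    """Distances between every pair of feature points in every frame.

    landmarks: array of shape (frames, points, 2) holding (x, y) positions.
    Returns (distances, pairs), where distances has shape (frames, n_pairs).
    """
    pairs = list(combinations(range(landmarks.shape[1]), 2))
    i, j = np.array(pairs).T
    distances = np.linalg.norm(landmarks[:, i, :] - landmarks[:, j, :], axis=-1)
    return distances, pairs


def select_most_changing(landmarks, k):
    """Keep the k point pairs whose distance varies most over the sequence."""
    distances, pairs = pairwise_distances(landmarks)
    # Assumed change measure: range (max minus min) of each distance over time.
    change = distances.max(axis=0) - distances.min(axis=0)
    top = np.argsort(change)[::-1][:k]
    return [pairs[t] for t in top], distances[:, top]


if __name__ == "__main__":
    # Toy sequence: points 0 and 1 stay fixed, point 2 moves upward.
    sequence = np.array(
        [
            [[0, 0], [1, 0], [0, 1]],
            [[0, 0], [1, 0], [0, 2]],
            [[0, 0], [1, 0], [0, 3]],
        ],
        dtype=float,
    )
    selected_pairs, features = select_most_changing(sequence, k=1)
    print(selected_pairs)  # the pair (0, 2) changes most in this toy sequence
```

Other change measures (for example the variance over time, or the difference between the neutral frame and the peak frame) could be substituted without altering the structure of the sketch.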

Analysis of emotion recognition algorithm
According to the research of domestic and foreign scholars, the basic process of facial expression recognition is as follows: after the facial expression images are obtained, the face images are preprocessed; facial expression features are then extracted from the processed images; and finally a suitable classifier is selected to classify the facial expressions, as shown in Figure 3. A facial expression recognition system generally includes two parts: a system workbench and an inference engine. The system workbench forms a framework for mixed facial feature detection in which multiple feature detection technologies are applied in parallel; the redundant information is used to define the parts in the image that do not contain the geometrical structure of human faces, and it also contains missing or highly inaccurate data [26,27]. The inference engine converts low-level facial geometric shapes into high-level facial behaviors, and finally into high-level weighted emotion tags [28].

Feature extraction method of facial expression sequential images based on fused features
Facial feature extraction is the focus of image recognition. Extracting key features that can summarize facial feature changes improves the recognition performance of the classifier; feature extraction is therefore of great significance to the entire facial expression recognition process [29]. To address problems such as excessively high feature dimensions, heavy memory consumption, and information redundancy, this paper adopts a feature extraction method based on the peak frame of the expression images, which can directly use the expression peak images provided by a static database or expression images manually selected from a dynamic database. The changing process of facial expressions is reflected by extracting the facial feature points that contribute most to the expression changes. One method of dynamic feature extraction is to track the facial feature points as the expression changes; during tracking, background information that is irrelevant to the expression can be ignored. Although the feature extraction method based on the fused feature sequence can use the position changes of feature points to represent complete facial expressions, it has difficulty summarizing the specific shape changes of each facial organ.

Facial expression classification method based on deep multi-kernel learning
The core of expression recognition is the extraction of features. A suitable classification algorithm should be chosen to process the labeled training set, and the classifier model obtained is then used to classify and identify the remaining data, whose expression categories are unknown. Multi-kernel learning obtains the best kernel by conducting combinatorial learning on a set of predefined basic kernels. Deep learning is a machine learning method that interprets and analyzes images by simulating the mechanism of the human brain. This method has a very strong learning ability, and it shows great advantages in image classification compared with shallow learning. Deep multi-kernel learning is composed of multiple layers of kernel functions and neural networks. The kernel function is the key component of deep multi-kernel learning; the architecture is a multi-layer network in which each layer has a set of kernels. The facial expression recognition is tested on an expression database; for the deep multi-kernel learning method, expression images such as those in Figure 4 are selected as examples. First, expression images are extracted from each frame of the face pictures, and the features of geometric shape changes in local areas of the face are described. Then, the Gabor features describing the textures are extracted from the peak frame of the facial expression images, and the two categories of extracted features are concatenated in series and input into the deep multi-kernel learning model for training. Finally, the trained model is applied to classify the expressions. Figure 5 shows the model recognition rate based on deep multi-kernel learning. It can be clearly seen from the figure that, compared with the classification method based on geometric features, the classification method based on deep multi-kernel learning and fused features achieves a higher recognition rate.
For the expressions given in Figure 4, the recognition rate of the expression "surprise" was 100%, and the expression "happy" had the second highest recognition rate.
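To make the idea of combining basic kernels more concrete, a minimal single-layer sketch is given below. It is not the deep multi-kernel model used in this paper: it fuses the two feature categories by concatenation, builds several RBF base kernels, and weights them with a simple kernel-target alignment heuristic, which is an assumption swapped in here in place of learned weights. The function names, the bandwidth values, and the random toy data are all hypothetical.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """RBF base kernel exp(-gamma * ||x - y||^2) between rows of X and Y."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def alignment(K, labels):
    """Kernel-target alignment with the ideal kernel (1 if same class, else 0)."""
    target = (labels[:, None] == labels[None, :]).astype(float)
    return (K * target).sum() / (np.linalg.norm(K) * np.linalg.norm(target))


def combine_kernels(X, labels, gammas):
    """Convex combination of RBF base kernels, weighted by alignment (heuristic)."""
    kernels = [rbf_kernel(X, X, g) for g in gammas]
    scores = np.array([alignment(K, labels) for K in kernels])
    weights = scores / scores.sum()
    combined = sum(w * K for w, K in zip(weights, kernels))
    return combined, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    geometric = rng.normal(size=(6, 4))   # toy geometric features per sample
    gabor = rng.normal(size=(6, 8))       # toy Gabor texture features per sample
    labels = np.array([0, 0, 1, 1, 2, 2])  # toy expression classes

    fused = np.hstack([geometric, gabor])  # features "mixed in series"
    K, weights = combine_kernels(fused, labels, gammas=[0.01, 0.1, 1.0])
    print(K.shape, weights)
```

Because the weights are non-negative and sum to one, the combined matrix remains a valid kernel and could be passed to any classifier that accepts a precomputed kernel; a deep variant would stack such combinations over several layers.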

Human-computer interaction control method based on emotion recognition
Machine learning is a general method of artificial intelligence. The use of machine learning does not require assumptions about unknown or undefined mechanisms. A typical machine learning workflow includes data normalization, representation learning, model fitting, and evaluation [30]. The emotion recognition system monitors the emotional states of students during English learning in real time. When it detects that a student is frustrated or bored, the system promptly switches to content that interests the student or is easier to learn, keeping the student active in learning. The emotion recognition system consists of four layers, namely the hardware layer, the software layer, the information layer, and the service layer. The hardware layer includes the physiological signal acquisition device and the rehabilitation robot system; the software layer processes the collected physiological signals; the information layer is responsible for the analysis of personal emotion information files and emotion performance; and the service layer adjusts the learning content after receiving the emotion results identified by the software layer. Figure 6 shows the emotional state regulation mechanism. After the difficulty level is initialized, if frustration is detected and the system is not at the lowest difficulty level, the difficulty level of English learning is decreased; if boredom is detected and the system is not at the highest difficulty level, the difficulty level of English learning is increased.
Figure 7 shows a diagram of the convolution method of a 3D convolutional neural network. The 3D convolutional neural network can handle temporal and spatial information at the same time. One convolution kernel convolves the images of multiple adjacent frames at the same time, and each value in a feature map is the convolution over the same position in each of those images in the previous layer.
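The 3D convolution operation described above can be illustrated with a deliberately naive NumPy sketch. It assumes a single-channel clip, one kernel, stride 1, and no padding, and, as is usual in convolutional neural network frameworks, it computes a cross-correlation (the kernel is not flipped). The function name `conv3d_valid` is hypothetical; a practical model would use an optimized library layer instead.

```python
import numpy as np


def conv3d_valid(clip, kernel):
    """Slide one 3D kernel over a clip of shape (frames, height, width).

    Each output value sums the element-wise product of the kernel with the
    same spatial window taken from several adjacent frames.
    """
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                window = clip[t:t + kt, i:i + kh, j:j + kw]
                out[t, i, j] = np.sum(window * kernel)
    return out


if __name__ == "__main__":
    clip = np.ones((5, 6, 6))       # 5 frames of 6x6 pixels
    kernel = np.ones((3, 3, 3))     # spans 3 adjacent frames
    feature_map = conv3d_valid(clip, kernel)
    print(feature_map.shape)        # (3, 4, 4)
```

Setting the temporal extent of the kernel to 1 reduces this to an ordinary 2D convolution applied to each frame separately.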
Apart from convolving across adjacent frames, the principles of the 3D convolutional neural network are the same as those of the 2D convolutional neural network [31]. During English learning, the expressions captured by the machine are single-frame static expressions, which can be analyzed by the convolutional neural network. Figures 8 and 9 show the accuracy and loss of the convolutional neural network model. Comparing the training and testing results shows that the two agree well.
Figure 10 shows the overall framework of the emotion recognition model, including emotion mechanism, signal acquisition, analysis and recognition, emotion understanding, emotion expression, and wearable equipment. Emotion mechanism is the reaction of emotional states and physiological responses/behavioral characteristics; emotion understanding is the expression of emotional states corresponding to the extracted emotion signals; emotion expression is the study of giving computers human emotional expressions. Figure 11 shows the emotion recognition model. Users log in to the system by providing user information; the system acquires images through cameras during machine learning, performs face detection on the original surveillance images, preprocesses the images, and recognizes expressions from the preprocessed face images; finally, the system classifies the emotions in the emotion recognition results.
Figure 12 shows the modules of the emotion recognition system. The system includes two main parts: the core modules and the assistant modules. The core modules include image preprocessing, face detection, face recognition, expression recognition, and expression classification; the assistant modules include the database, teaching suggestions, database visualization, image acquisition, data management, user management, and learning emotion statistics.
Figure 13 shows the flow of function modules of the emotion recognition system. When a learner acquires permissions, the image acquisition module, face detection module, image preprocessing module, face recognition module, and expression recognition module are activated one after another; the emotion recognition module analyzes the learning emotions, and the learning emotion statistics module provides suggestions for the teaching suggestion module, the visualization module, and the data management module. To improve the emotion recognition accuracy of the system, the images captured by the image acquisition device should be preprocessed, including adjustment of their contrast, brightness, and gray level.
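The regulation mechanism of Figure 6, described earlier in this section, amounts to a small decision rule, which can be sketched as follows. The integer difficulty scale from 1 to 5, the `Emotion` enumeration, and the function name `adjust_difficulty` are assumptions made for illustration; the paper does not specify how difficulty levels are encoded.

```python
from enum import Enum


class Emotion(Enum):
    FRUSTRATION = "frustration"
    BOREDOM = "boredom"
    OTHER = "other"


def adjust_difficulty(level, emotion, lowest=1, highest=5):
    """Return the next difficulty level given the detected emotion.

    Frustration lowers the level unless it is already the lowest;
    boredom raises it unless it is already the highest;
    any other emotion leaves the level unchanged.
    """
    if emotion is Emotion.FRUSTRATION and level > lowest:
        return level - 1
    if emotion is Emotion.BOREDOM and level < highest:
        return level + 1
    return level


if __name__ == "__main__":
    level = 3  # assumed initial difficulty
    for detected in [Emotion.FRUSTRATION, Emotion.FRUSTRATION,
                     Emotion.FRUSTRATION, Emotion.BOREDOM]:
        level = adjust_difficulty(level, detected)
        print(detected.value, "->", level)
```

In the four-layer system described above, a rule of this kind would sit in the service layer and be called each time the software layer reports a recognized emotion.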

Conclusion
Based on machine learning and the business English class, this paper proposed a method for extracting students' facial expression features, established a student emotion recognition model, and obtained the emotional states of learners in business English class. The conclusions drawn in this research are as follows:
1. The static facial expression extraction methods fall into three types: the global information method, the geometric feature method, and the mixed feature method; the dynamic facial expression extraction methods include the optical flow method, the feature point tracking method, and the differential imaging method.
2. One method of dynamic feature extraction is to track the feature points as the facial expression changes; during tracking, background information that is irrelevant to the expression can be ignored.
3. The emotion recognition system monitors the emotional states of students during English learning in real time. When it detects that a student is frustrated or bored, the system promptly switches to content that interests the student or is easier to learn, keeping the student active in learning. The emotion recognition system consists of four layers, namely the hardware layer, the software layer, the information layer, and the service layer.
4. The overall framework of the emotion recognition model includes emotion mechanism, signal acquisition, analysis and recognition, emotion understanding, emotion expression, and wearable equipment.