A Comparative Study of Machine Learning Methods for Automatic Classification of Academic and Vocational Guidance Questions

Abstract: Academic and vocational guidance is a particularly important issue today, as it strongly determines the chances of successful integration into an increasingly difficult labor market. Families understand this and follow their children's orientation closely, often with concern. In this context, it is very important to consider the interests, trades, skills, and personality of each student in order to make the right decision and build a strong career path. This paper addresses the problem of educational and vocational guidance by providing a comparative study of the results of four machine learning algorithms. We use these algorithms for the automatic classification of school orientation questions into four categories based on John L. Holland's RIASEC typology. The results show that neural networks perform better than the other three algorithms at classifying these questions automatically. In this sense, our model allows us to automatically generate questions in this domain. Because the algorithms give good results, this model can serve practitioners and researchers in E-Orientation as a basis for further research.


Introduction
The classification of questions is a problem that has already been studied by several researchers in this field, but most of the work is domain-specific or limited to a high-level classification.
Anbuselvan and R. Ahmed [1] proposed an SVM-based method for the same task. The question is first parsed and tokenized, parts of speech are tagged, stop words are removed, words are stemmed, and many features are extracted. Feature selection is performed before the data is passed to a support vector machine for training. The same processing is applied to test questions, so obtaining results in real time can take a long time.
Marco Pota [2] proposed a feature-based method in which features related to a subset of questions, such as keywords, words like "how", "all", and "some", leading verbs, and various other such features, were extracted from the texts and passed to a classifier.
Convolutional neural networks (CNNs) have already been used in some Natural Language Processing (NLP) work. Collobert and J. Weston [3] first proposed the idea of a convolutional neural network architecture, which includes lookup tables and hard hyperbolic tangent activations. Kalchbrenner and P. Blunsom [4] proposed a simplified version of Collobert's network, which was used to classify Twitter questions and opinions. They used the concept of k-max pooling. Yoon Kim [5] extended Kalchbrenner's work by adding various machine learning strategies, such as regularization, to improve network performance.
To date, question classification (QC) has mainly been studied in the context of open-domain TREC (Text REtrieval Conference) questions [6], with smaller recent datasets available in the biomedical [7], [8] and education [9] domains. The open-domain TREC question corpus is a set of questions associated with a taxonomy developed by Li and Roth [10] that includes 6 coarse answer types (such as entities, locations, and numbers) and 50 fine-grained types (for example, specific types of entities, such as animals or vehicles). While a wide variety of syntactic, semantic, and other features and classification methods have been applied to this task, culminating in almost perfect classification performance [11], recent work has shown that QC methods developed on TREC questions usually fail to transfer to datasets with more complex questions, such as those in the biomedical field [7]. This is probably due in part to the simplicity and syntactic regularity of the TREC questions, which allow simpler term frequency models to achieve near-ceiling performance [12].
Throughout the world, the educational and guidance system of each country seeks to help students and graduates of higher education institutions and vocational training institutes to make their choice.
According to the analysis of Ali Boulahcen [13], there is no real process of educational guidance in Morocco; there is only a summary procedure in which the fate of the pupil is decided within a few seconds, based solely on his academic performance as expressed by a numerical grade.
This means that the Moroccan school system relies on selection criteria rather than on guidance [13]. In this context, our goal is to set up an E-Orientation system that automates the orientation task, taking advantage of the evolution of information technologies. Building this electronic guidance system requires classifying, then modeling and integrating, user preferences. In this paper, we used the Multiclass Neural Network algorithm to classify the different questions according to John L. Holland's RIASEC typology.
This document is organized as follows: Section 2 is devoted to a review of the literature of different theories of educational and vocational guidance, including the theory of John L. Holland. Section 3 is devoted to the various algorithms for the automatic classification of text that we will use in our model. Section 4 deals with the experimental evaluation of each classification algorithm with the results obtained. Finally, section 5 covers the conclusion with research perspectives.

Related Work
The guidance approach is based on theories and studies related to career choice and career development. These include Hoyt's concept of career education, Gardner's theory of multiple intelligences, and Holland's typology of professional interests [14]. Holland's theory of vocational choice (1997) [15] is the result of the work of the American psychologist and researcher John Holland (1919-2008). His research argues that workers' skills, interests, and personality determine their association with a given type of career.
Some activities would be better suited to one type of person than to another. Holland's theory constitutes the theoretical anchoring of our classification model and serves as a basis for many psychometric tools, including the Hexa 3D professional interests questionnaire. Although it dates from the mid-1960s, this theory is still widely used [16] and has been the subject of numerous studies [17], [18].
To briefly explain his theory, Holland (1997) [15] formulates several hypotheses according to which professional interests are a mode of expression of personality. He therefore considers orientation choices to be a mode of expression of this personality and distinguishes six personality types (RIASEC) according to aptitudes, personality traits, values, and beliefs.
Of all the models related to career development, the Holland model has been the subject of the greatest number of analyses and studies [19]. Among those conducted on the structure of interests across gender and ethnic populations, a number demonstrate the consistency of the arrangement of types and their proximity on a hexagonal and spherical model [18], [20], [21]. The debate focuses more on the geometric regularity of the hexagon and on the distances between the different types. Vrignaud and Bernaud (1994) also validated the structure of the Holland model in France [22].
Professional activities, as well as work environments, tend to bring together people who share common interests to a certain extent. The choice of a profession or trade is a form of expression of an individual's personality; this is the theory of vocational interests. Likewise, matching the person to the work environment is the most widely used method in the world of educational and vocational guidance.
The theory of vocational choice distinguishes six categories of professional interest (realistic, investigative, artistic, social, enterprising, and conventional) corresponding to different personality profiles. Holland represents them with a hexagonal model, illustrated in Fig. 1 [23]. Holland's theory and previous research confirm that the profession or trade chosen by a person is a form of expression of his personality and is therefore related to the type to which he belongs.
The affiliation of a worker to one of the six types would be determined by his aptitudes, by certain traits of his personality, and by his interests. So, according to Holland, people of the same type would be attracted to the same kind of work. Why? Because these people are similar in their personality, in that they pursue similar objectives and have the same physical or psychological dispositions with regard to their work. All persons can be divided into six professional types.
The typology of a person is established by measuring his degree of affinity with each of the six types and ranking the types in order of importance, starting with the one that corresponds most closely to him. For most people, it is mainly the first two or three types of this personal ranking that determine their way of being and acting in their personal and professional lives. For example, a person whose dominant type is "Investigator" and who has affinities with the "Realistic" type is said to have an "IR" profile. To further characterize this person's typology, it is possible to consider the third type that he most closely resembles; if it is the "Social" type, this person has an "IRS" profile.
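The ranking step described above can be sketched in a few lines of Python. This is only an illustration: the function name `holland_code`, the numeric scores, and the tie-breaking rule (conventional R-I-A-S-E-C order) are assumptions introduced here, not part of the RIASEC test itself.

```python
# Hypothetical helper: derive a Holland profile from per-type affinity scores.
RIASEC_TYPES = ("R", "I", "A", "S", "E", "C")


def holland_code(scores, length=3):
    """Return the letters of the `length` highest-scoring types, strongest first."""
    missing = set(RIASEC_TYPES) - set(scores)
    if missing:
        raise ValueError(f"missing scores for types: {sorted(missing)}")
    # sorted() is stable, so ties keep the conventional R-I-A-S-E-C order
    # (an assumption of this sketch).
    ranked = sorted(RIASEC_TYPES, key=lambda t: -scores[t])
    return "".join(ranked[:length])


# Made-up affinity scores for one person, for illustration only.
example_scores = {"R": 31, "I": 42, "A": 12, "S": 25, "E": 9, "C": 17}
profile_two = holland_code(example_scores, length=2)
profile_three = holland_code(example_scores, length=3)
print(profile_two, profile_three)
```

With these illustrative scores, the two-letter and three-letter profiles correspond to the "IR" and "IRS" cases discussed in the text.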
These types can be combined in all sorts of ways and their combination determines the personality.
• The Realistic type: People of this type take pleasure in carrying out concrete tasks. Skilled with their hands, they know how to coordinate their actions. They are happy to use tools and are adept with appliances, machines, and vehicles. They have no trouble tinkering with or repairing what is broken. Realists often have a sense of mechanics and precision. Many practice their profession outdoors rather than indoors. Their work often requires good physical stamina and even athletic abilities.
• The Investigator type: Most investigators are not afraid of "theory"; on the contrary. They like to collect data, make assumptions, and look for solutions to problems, as in mathematics. Investigators take time to observe; they tend to be reflective rather than impulsive, preferring to analyze before acting. They like to be absorbed in their thoughts and to play with ideas. At work, their intellectual rigor and sense of method are appreciated, but in a team their character may seem a bit cold and distant.
• The Artistic type: Artistic profiles are interested in creative work, be it visual art, literature, music, advertising, or theater. Independent and non-conformist, they are comfortable in situations that are out of the ordinary. They are endowed with great sensitivity and imagination. Although they are discouraged by methodical and routine tasks, they are nevertheless able to work with discipline to perfect their artistic talent and to carry out long-term work.
• The Social type: People of this type like to be in contact with others in order to help them, inform them, educate them, entertain them, treat them, or promote their growth. They are interested in human behavior and are concerned about the quality of their relationships with others. They use their knowledge, feelings, and emotions to act and interact.
• The Entrepreneurial type: People of this type like to influence their surroundings. Their decision-making ability, sense of organization, and particular ability to communicate their enthusiasm support them in their goals. They know how to sell ideas as much as material goods. They have a sense of organization, planning, and initiative, and know how to carry out their projects. They know how to be bold and efficient.
• The Conventional type: People of this type have a preference for specific, methodical activities that focus on a predictable outcome. They are concerned about order and the good material organization of their environment. They prefer to conform to well-established conventions and clear instructions rather than to improvise. They like to calculate, classify, and maintain registers or folders. They are effective in any job that requires accuracy and ease in routine tasks [24].

Materials and Methods
The best-performing question classification systems tend to use customized rule-based pattern matching [25], [11], or a combination of rule-based and machine learning approaches [26], at the expense of increased model construction time.
Recent research on learned methods has shown that a large number of CNN [27] and LSTM [12] variants achieve similar accuracy on TREC question classification, with these models showing at best small gains over simple term frequency models. These recent developments echo the observations of Roberts and M. Fiszman [7], who showed that existing methods beyond term frequency models fail to generalize to questions in the medical field.
In the education sector, Godea and Nielsen [9] collected 1,155 classroom questions and classified them into 16 categories, which allows a detailed study of the classification of questions in the scientific field.
Classifying a text collection consists of labeling each text with one or more predefined classes (categories). In this process, an algorithm is first designed and then trained on a set of specific features, for example word occurrences or topic distributions in a document. Once trained, the algorithm is used to label new texts that differ from the texts used during training. The algorithm is evaluated on the number of classification errors obtained during the learning phase and during the test phase.
When training the classification algorithm, the feature extraction phase is crucial for learning. The features extracted from the texts are typically derived from a large vector space, which is constructed by vector modeling of words using distributional semantics [28].
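As a minimal sketch of the simplest feature type mentioned above (word occurrences), the following Python code maps each text to a vector of word counts. The tokenizer, the function names, and the two example sentences are assumptions made for illustration; they do not come from the dataset used in this study, and richer distributional representations would replace the count vector in practice.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(texts):
    """Map every distinct token in the training texts to a column index."""
    vocab = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(vocab)}


def to_count_vector(text, vocabulary):
    """Represent a text by the number of occurrences of each vocabulary word."""
    vector = [0] * len(vocabulary)
    for tok, n in Counter(tokenize(text)).items():
        idx = vocabulary.get(tok)
        if idx is not None:  # words unseen during training are ignored
            vector[idx] = n
    return vector


# Made-up example statements, for illustration only.
questions = ["I like to repair machines", "I like to help people learn"]
vocabulary = build_vocabulary(questions)
vectors = [to_count_vector(q, vocabulary) for q in questions]
print(vectors)
```

Each position in a vector corresponds to one vocabulary word, so the dimension of the feature space grows with the vocabulary.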
Data science or statistical algorithms are further classified into several machine learning categories:
• Supervised learning algorithms (label and output known).
• Unsupervised learning algorithms (label and output not known).
• Semi-supervised learning algorithms (mix of supervised and unsupervised).
These categories, in turn, contain multiple sub-algorithms and types (see Table 1). For example, some algorithms are parametric, whereas others are non-parametric. In parametric algorithms, information about the population is assumed to be completely known, which is not the case with non-parametric algorithms. Typically, parametric models deal with a finite number of parameters, whereas non-parametric learning models are capable of dealing with an infinite number of parameters. Therefore, as the training data grows, the complexity of non-parametric models increases. Linear regression, logistic regression, and support vector machines are examples of parametric algorithms. K-nearest neighbor and decision trees are non-parametric learning algorithms. Parametric algorithms are computationally faster than their non-parametric counterparts. As Table 1 depicts, the machine learning algorithms are large in number [29]. In this section, we describe the different classification algorithms used in our research.

Multiclass decision forest
The decision forest algorithm is an ensemble learning method for classification. The algorithm works by building several decision trees and then voting on the most popular output class. Voting is a form of aggregation, in which each tree in a classification decision forest outputs a non-normalized frequency histogram of labels. The aggregation process sums these histograms and normalizes the result to obtain the "probabilities" for each label. Trees that have high prediction confidence have a greater weight in the final decision of the ensemble.
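The aggregation step described above can be sketched as follows. This is a simplified illustration rather than the Azure Machine Learning Studio implementation: the function names and the per-tree histograms are made up, and each histogram is assumed to hold the label counts of the leaf reached in one tree.

```python
def aggregate_histograms(tree_histograms):
    """Sum per-tree label histograms and normalize them to label "probabilities"."""
    totals = {}
    for histogram in tree_histograms:
        for label, count in histogram.items():
            totals[label] = totals.get(label, 0) + count
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("all histograms are empty")
    return {label: count / grand_total for label, count in totals.items()}


def predict(tree_histograms):
    """Return the most probable label together with the full distribution."""
    probabilities = aggregate_histograms(tree_histograms)
    return max(probabilities, key=probabilities.get), probabilities


# Made-up non-normalized histograms produced by three trees for one question.
histograms = [
    {"Realistic": 8, "Social": 2},  # a confident tree contributes more mass
    {"Realistic": 3, "Social": 3},
    {"Social": 4},
]
label, probs = predict(histograms)
print(label, probs)
```

Because the histograms are summed before normalization, a tree whose leaf is large and nearly pure influences the final distribution more than a tree whose leaf is small or mixed.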
Decision trees, in general, are non-parametric models, which means that they support data with varied distributions. In each tree, a sequence of simple tests is run for each class, descending through the levels of the tree structure until a leaf node (decision) is reached.
Decision trees have many advantages: they can represent non-linear decision boundaries; they are efficient in computation and memory usage during training and prediction; they perform integrated feature selection and classification; and they are resilient in the presence of noisy features.
The decision forest classifier in Azure Machine Learning Studio (Classic) consists of a set of decision trees. In general, ensemble models provide better coverage and accuracy than single decision trees.

Multiclass decision jungle
Decision jungles are a recent extension of decision forests. A decision jungle consists of a set of decision directed acyclic graphs (DAGs). Decision jungles have the following advantages. By allowing tree branches to merge, a decision DAG generally has a smaller memory footprint and better generalization performance than a decision tree, at the cost of a slightly longer training time. Additionally, decision jungles are non-parametric models, which can represent non-linear decision boundaries. Finally, they perform integrated feature selection and classification and are resilient to noisy features.
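To illustrate what merging branches means, the sketch below evaluates a single hand-written decision DAG in which two internal nodes share one child. The node layout, thresholds, and class names are hypothetical and chosen only for illustration; learning such a graph from data is not shown.

```python
# Internal node: (feature_index, threshold, left_child_id, right_child_id).
# Leaf node: ("leaf", label). All ids, thresholds, and labels are hypothetical.
DECISION_DAG = {
    0: (0, 0.5, 1, 2),
    1: (1, 0.5, 3, 4),
    2: (1, 0.5, 4, 5),  # the left branch of node 2 merges into node 4
    3: ("leaf", "class_a"),
    4: ("leaf", "class_b"),
    5: ("leaf", "class_c"),
}


def classify(dag, features, root=0):
    """Follow the tests from the root until a leaf is reached."""
    node = dag[root]
    while node[0] != "leaf":
        feature_index, threshold, left_id, right_id = node
        next_id = left_id if features[feature_index] <= threshold else right_id
        node = dag[next_id]
    return node[1]


# Two different paths end in the same shared leaf (node 4).
print(classify(DECISION_DAG, [0.0, 1.0]), classify(DECISION_DAG, [1.0, 0.0]))
```

An equivalent binary tree would need four leaves for the same tests, whereas the DAG stores three, which is the source of the smaller memory footprint mentioned above.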

Multiclass logistic regression
Logistic regression classification is a supervised learning method and therefore requires a labeled dataset. The model is trained by providing the model and the labeled dataset as input to a module such as Train Model or Tune Model Hyperparameters. The trained model can then be used to predict the values of new input examples.
Logistic regression is a well-known method in statistics that is used to predict the probability of a result and is particularly popular for classification tasks. The algorithm predicts the probability of occurrence of an event by fitting the data to a logistic function. In multi-class logistic regression, the classifier can be used to predict multiple outcomes.
Multinomial logistic regression is a form of logistic regression used to predict a target variable that has more than two classes. It is a modification of logistic regression that uses the softmax function instead of the sigmoid function, together with the cross-entropy loss function. The softmax function squashes all values to the range [0, 1], and the sum of the elements is one.
For a vector z of C scores, the softmax function is:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}} \qquad (1)$$

Cross entropy is a measure of how different two probability distributions are. If p and q are discrete, we have:

$$H(p, q) = -\sum_{x} p(x) \log q(x) \qquad (2)$$

This function takes values in [0, ∞). When p is a one-hot distribution, as is the case for the labels used below, it equals 0 when q = p, and it grows without bound when q assigns a very small probability to the outcome that p considers certain.

For an example x, the class scores are given by the vector z = Wx + b, where W is a C×M matrix and b is a length-C vector of biases. We define the label y as a one-hot vector equal to 1 for the correct class c and 0 everywhere else. The loss for a training example x with predicted class distribution ŷ and correct class c is:

$$\hat{y} = \mathrm{softmax}(z) = \mathrm{softmax}(Wx + b) \qquad (3)$$

$$L(x, c) = H(y, \hat{y}) = -\sum_{i=1}^{C} y_i \log \hat{y}_i = -\log \hat{y}_c \qquad (4)$$

As in the binary case, the loss value is exactly the negative log probability of a single example x having true class label c. Thus, minimizing the sum of the loss over our training examples is equivalent to maximizing the log-likelihood. We can learn the model parameters W and b by performing gradient descent on the loss function with respect to these parameters.

There are two common methods to perform multi-class classification using the binary logistic regression algorithm: one-vs-all and one-vs-one. In one-vs-all, we train C separate binary classifiers, one for each class, run all of them on any new example x that we want to predict, and take the class with the maximum score. In one-vs-one, we train C(C-1)/2 classifiers (C choose 2), one for each possible pair of classes, and choose the class with the maximum number of votes when predicting a new example.
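The softmax function, the cross-entropy loss for a one-hot label, and the gradient descent update described above can be sketched in plain Python as follows. This is a minimal illustration, not the module used in this study: the function names, the learning rate, the number of epochs, and the three-point toy dataset are all assumptions. The update uses the standard gradient of the loss with respect to the scores, ŷ minus the one-hot label.

```python
import math


def softmax(z):
    """Softmax of a list of scores; the maximum is subtracted for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def scores(W, b, x):
    """Class scores z = Wx + b for a C x M weight matrix and length-C bias."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def loss(W, b, x, c):
    """Cross-entropy loss for one example: minus the log probability of class c."""
    return -math.log(softmax(scores(W, b, x))[c])


def sgd_step(W, b, x, c, lr):
    """One gradient descent step; d(loss)/dz_i = y_hat_i - y_i."""
    y_hat = softmax(scores(W, b, x))
    for i in range(len(W)):
        grad_z = y_hat[i] - (1.0 if i == c else 0.0)
        for j in range(len(x)):
            W[i][j] -= lr * grad_z * x[j]
        b[i] -= lr * grad_z


def train(data, num_classes, num_features, epochs=200, lr=0.5):
    W = [[0.0] * num_features for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, c in data:
            sgd_step(W, b, x, c, lr)
    return W, b


def predict(W, b, x):
    z = scores(W, b, x)
    return max(range(len(z)), key=z.__getitem__)


# Made-up toy data: three 2-dimensional examples, one per class.
toy_data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 2)]
W, b = train(toy_data, num_classes=3, num_features=2)
mean_loss = sum(loss(W, b, x, c) for x, c in toy_data) / len(toy_data)
print([predict(W, b, x) for x, _ in toy_data], mean_loss)
```

Before training, all weights are zero, so every class receives probability 1/3 and the loss per example is log 3; training should reduce the mean loss below that starting value.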

Multiclass neural network
A neural network is a set of interconnected layers. The inputs form the first layer and are connected to an output layer by an acyclic graph composed of weighted edges and nodes.
Multiple hidden layers can be inserted between the input and output layers. Most predictive tasks can be accomplished easily with only one or a few hidden layers. However, deep neural networks (DNNs) [30], [31] with many layers can be very effective in complex tasks such as image or speech recognition. The successive layers are used to model increasing levels of semantic depth.
The relationship between inputs and outputs is learned by training the neural network on the input data. The direction of the graph proceeds from the inputs through the hidden layers to the output layer. All nodes in a layer are connected by weighted edges to the nodes in the next layer.
To compute the output of the network for a particular input, a value is calculated at each node in the hidden layers and in the output layer. The value is obtained by calculating the weighted sum of the values of the nodes in the previous layer. An activation function is then applied to that weighted sum.
We use a multiclass neural network module to predict a multi-valued target. Neural networks of this type can be used in complex computer vision tasks, such as the recognition of numbers or letters, as well as for the classification of documents and text (questions) and for pattern recognition. Classification using neural networks is a supervised learning method and therefore requires a labeled dataset that includes a label column. The model is trained by providing the model and the labeled dataset as input to Train Model or Tune Model Hyperparameters. The trained model can then be used to predict the values of new input examples.
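The forward computation described in this section (a weighted sum at each node followed by an activation function) can be sketched as follows. The choice of a sigmoid activation for hidden layers and a softmax output, the layer sizes, and all weight values are assumptions made for illustration; they are not the settings of the module used in our experiments.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sums(weights, biases, inputs):
    """Weighted sum of the previous layer's values for every node of a layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(network, inputs):
    """Propagate inputs through the hidden layers, then the output layer."""
    *hidden_layers, output_layer = network
    activations = inputs
    for weights, biases in hidden_layers:
        activations = [sigmoid(s) for s in weighted_sums(weights, biases, activations)]
    weights, biases = output_layer
    return softmax(weighted_sums(weights, biases, activations))


# Hypothetical network: 3 inputs -> 2 hidden nodes -> 4 output classes.
network = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5], [0.0, 0.2]], [0.0, 0.0, 0.0, 0.0]),
]
probs = forward(network, [1.0, 0.0, 2.0])
print(probs)
```

The output is one probability per class; the predicted class is the index of the largest value. Training, which adjusts the weights from labeled examples, is not shown here.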

Proposed Method
Our proposed system is based on the four algorithms described in the previous section, all of which follow supervised learning. The goal is to discover an underlying structure of the data. These algorithms require a labeled dataset. The E-Orientation dataset is divided into two sets: training data and test data. The classification performed by the algorithm used in our model is based on the knowledge acquired from the training data during the learning process.
Our dataset was collected from the RIASEC test based on Holland's theory [32], [33], [34]. It contains two columns:
• Question: questions and statements that measure the occupations, activities, abilities, or personality of the users.
• Categories: the label column, which takes one of four classes.
In our research work on guidance classification, we used Azure Machine Learning Studio [35], a collaborative drag-and-drop tool that we can use to create, test, and deploy predictive analytics solutions on our data. Machine Learning Studio publishes models as web services that can be easily consumed by custom applications. Machine Learning Studio is the meeting place of data science, predictive analytics, cloud resources, and our data.

Experiment and Results
The experimental steps are illustrated in Fig. 2 and explained below:
a) Importing the dataset: We import our dataset, entitled "E-Orientation Data", which we collected from several websites, from our local disk into Azure ML Studio for use in the experiment. The category names are used as the class label (the attribute to predict).
b) Preprocessing and preparing the dataset: The dummy column headers are replaced by meaningful column names using the metadata editor. In addition, missing values are handled by deleting every row that contains a missing value.
c) Feature engineering: After preprocessing, we use the feature hashing module to convert the raw text of the questions into integers, and use these integer values as input features of the model. Figure 3 represents our model.
d) Splitting the data and setting the parameters: We split the "E-Orientation Data" into 70% for training and 30% for testing. The Multiclass Neural Network algorithm is applied with the default settings for model training. The parameters are set using the "Tune Model Hyperparameters" module.
e) The model: In each run, we use one of the four algorithms.
f) Scoring and evaluating the model: The Evaluate Model module visualizes the results through the confusion matrix.
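Steps c) and d) above can be approximated outside Azure ML Studio with a short Python sketch. It is not the Studio implementation: the MD5-based hash, the number of buckets, the whitespace tokenizer, the random seed, and the placeholder rows are all assumptions introduced here for illustration.

```python
import hashlib
import random


def hash_features(text, num_buckets=1024):
    """Map each token to a bucket via a stable hash and count occurrences."""
    vector = [0] * num_buckets
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % num_buckets
        vector[bucket] += 1
    return vector


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows reproducibly and cut them into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows standing in for (question, category) pairs.
rows = [(f"placeholder question {i}", f"class_{i % 4}") for i in range(10)]
train_rows, test_rows = train_test_split(rows)
features = hash_features("I like to repair machines")
print(len(train_rows), len(test_rows), sum(features))
```

Feature hashing gives every question a fixed-length integer vector without storing a vocabulary; different words may collide in the same bucket, which is the usual trade-off of the technique.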
The schema of our model is summarized in the following figure; for each algorithm we keep the same steps shown in the figure and change only the algorithm used. According to the results shown in the last table, the results obtained by the Multiclass neural network algorithm are the best, followed by those of the Multiclass logistic regression algorithm, while the other two algorithms give the same results. This shows that the best algorithm to use is the Multiclass neural network algorithm [36].
The confusion matrix obtained for the Multiclass decision forest algorithm is shown in Fig. 2.
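For readers who want to reproduce this kind of evaluation outside Azure ML Studio, a confusion matrix and the overall accuracy can be computed as in the sketch below. The class names and the actual and predicted labels are made up for illustration and are not the results of this study.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of examples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Made-up labels for six test questions.
labels = ["class_0", "class_1", "class_2", "class_3"]
actual = ["class_0", "class_0", "class_1", "class_2", "class_3", "class_3"]
predicted = ["class_0", "class_1", "class_1", "class_2", "class_3", "class_0"]
matrix = confusion_matrix(actual, predicted, labels)
print(matrix, accuracy(matrix))
```

Off-diagonal cells show which classes are confused with each other, which is more informative than the accuracy figure alone when comparing the four algorithms.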

Conclusion
In this article, we described and applied four machine learning algorithms used for text classification. We conclude that multiclass neural networks work better than the other three machine learning algorithms.
The Multiclass Neural Network algorithm used in our classification model for academic and vocational guidance questions is implemented using Azure Machine Learning Studio. We found that the supervised method gives very good precision. This method can also be used to automatically generate academic and vocational orientation questionnaires, since the class of each proposed new question is known in advance; we regard this research question as a perspective for future work. This automatic classification model using machine learning algorithms can also help E-guidance researchers in the development process in this area.
As future work, we will focus on the use of social network analysis, for example using Twitter sentiment analysis as a feature to determine the class of questions and the interests of students and faculties of educational institutions. BERT [37] (Bidirectional Encoder Representations from Transformers), a language model developed by Google in 2018, has significantly improved natural language processing algorithms. We plan to apply it in our next work in order to compare its results with those obtained by the four algorithms used in this research. Our goal is to develop an E-orientation system, given that online services (evaluation, learning) have shown great effectiveness according to several researchers [38].