SAFE: A Sentiment Analysis Framework for E-Learning

The spread of social networks allows people to share opinions on different aspects of life, and millions of messages appear on the web every day. This textual information can be a rich source of data for opinion mining and sentiment analysis: the computational study of opinions, sentiments and emotions expressed in a text. Its main aim is the identification of agreement or disagreement statements that convey positive or negative feelings in comments or reviews. In this paper, we investigate the adoption, in the field of e-learning, of a probabilistic approach based on Latent Dirichlet Allocation (LDA) as a sentiment grabber. With this approach, a graph, the Mixed Graph of Terms, can be automatically extracted from a set of documents belonging to the same knowledge domain. The paper shows how this graph contains a set of weighted word pairs, which are discriminative for sentiment classification. In this way, the system can detect the feelings of students about some topics, and the teacher can better tune his/her teaching approach. The proposed method has been tested on datasets coming from e-learning platforms. A preliminary experimental campaign shows that the proposed approach is effective and satisfactory.


I. INTRODUCTION
E-learning is an effective answer to the continuous demand for life-long learning. Many institutions are adopting it both to improve their traditional courses and to enlarge their potential audience, since it generally allows more flexibility and quality. However, e-classrooms often include students who are inattentive or who appear bored and puzzled. So the main questions for the teacher become: why am I not able to reach these students and catch their attention? Why are they not excited about the material, despite my efforts to present it in an organized and coherent manner? This sense of frustration increases when the teacher faces students' poor performance on tests [28] [29].
Recent studies have shown that emotions can affect the e-learning experience [1]. What are emotions? A general definition is the following: emotions are complex psychophysical processes that evoke positive or negative psychological responses (or both) and physical expressions, often involuntary. Emotions are often related to feelings, perceptions or beliefs about elements, objects or relations between them, in reality or in the imagination. They typically arise spontaneously, rather than through conscious effort. An emotion (reaction or state) is often differentiated from a feeling (sensation or impression), although the word "feeling" is used as a synonym for "emotion" in some contexts. In fact, emotion has to do with how one feels. This feeling, if positive, is believed to have a productive effect on the individual; otherwise it seems to have a negative impact on the individual's learning experience. Obviously, the topic of emotions goes far beyond this simple definition, and emotions are especially hard to detect in an e-learning environment. In a face-to-face class instructors can observe the facial expressions of students but, in an online environment, students need to establish an online presence and the instructors need to be able to pick up on it [2].
In this scenario, a promising approach is sentiment analysis, the computational study of opinions, sentiments and emotions expressed in a text [3]. Its main aim is the identification of agreement or disagreement statements in order to capture positive or negative feelings in comments or reviews. In this paper, an approach for detecting the emotions of students in an e-learning environment by means of sentiment analysis is proposed. Opinion-driven content management has several important applications, such as determining critics' opinions on a given product by analyzing online product reviews, or tracking the shifting attitudes of the public toward a political candidate by mining online forums [4] [5]. Many researchers are investigating the adoption of sentiment analysis in the e-learning field [6]. An introduction to an opinion mining framework that can be adapted to work in an e-learning system was presented in [7]. A promising approach uses Conditional Random Fields for identifying and extracting opinions; it takes negative sentences and degree adverbs into account in sentiment processing [8]. The experiments showed that this approach achieves high precision and accuracy in opinion extraction, and that sentiment analysis is helpful to the e-learning system. Another interesting approach is in [9], where a hybrid HMM and SVM-based sentiment classification algorithm has been introduced to classify learners' opinions about the e-learning system service in order to improve its performance. In [10] different possibilities for automatically extracting emotions from texts have been explored: twelve essays written by a first-year student during her first semester in college are analysed and investigated. The results support the idea of using non-intrusive emotion detection for providing feedback to students.
In this paper, we investigate the adoption of an approach to sentiment analysis based on Latent Dirichlet Allocation (LDA). In LDA, each document may be viewed as a mixture of various topics. This is similar to probabilistic latent semantic analysis (pLSA), except that in LDA the topic distribution is assumed to have a Dirichlet prior. By applying the LDA approach to a set of documents belonging to the same knowledge domain, a Mixed Graph of Terms can be automatically extracted [11] [12]. Such a graph contains a set of weighted word pairs, which we demonstrate to be discriminative for sentiment classification. The proposed approach has been applied to a real case: the blended course of Software Technologies for the Web held in the University of Salerno's Computer Science School.
The organization of this paper is the following: in section 2 related works on sentiment analysis are discussed; section 3 briefly discusses the extraction of a Mixed Graph of Terms from a document corpus collected from a case study, and its discriminative power. Section 4 introduces the proposed approach, while section 5 discusses experimental results.

II. RELATED WORKS
In the literature, there are many approaches to sentiment analysis [25] [24] [23]. In particular, some approaches attempt to classify the sentiment at the document level. In [22] the authors introduce an approach to document classification based on the algebraic sum of the orientation (positive or negative) of the terms. Starting from this approach other techniques have been developed [21]. Baroni [20] proposed ranking a large list of adjectives according to a subjectivity score, by employing a small set of manually selected adjectives and computing the mutual information of pairs of them using frequency and co-occurrence frequency counts on the web. Starting from this approach many researchers developed sentiment lexicons. The work of Turney [19] proposes an approach to measure the semantic orientation of a given word based on the strength of its association with a set of context-insensitive positive words minus the strength of its association with a set of negative words. With this approach a sentiment lexicon can be built and a sentiment polarity score can be assigned to each word [18] [17]. Artificial intelligence and probabilistic approaches have also been adopted for sentiment mining. In [16] three machine learning approaches (Naive Bayes, Maximum Entropy and Support Vector Machines) have been adopted to label the polarity of movie reviews. A promising approach has been developed in [15], where a novel methodology has been obtained by combining rule-based classification, supervised learning and machine learning. Another interesting approach is in [14], where a probabilistic model, the Sentiment Probabilistic Latent Semantic Analysis (S-PLSA), has been adopted [13]. The S-PLSA is an extension of PLSA in which it is assumed that there is a set of hidden semantic factors or aspects in the documents, related to each other according to a probabilistic framework.
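The semantic orientation idea attributed to Turney [19] above can be illustrated with a minimal sketch. It assumes pointwise mutual information as the association measure, which is one common choice and is not stated in this section; the seed words, the counts and all function names are hypothetical and serve only as an illustration.

```python
import math

# Hypothetical co-occurrence counts (word, seed) -> number of shared text windows.
# Every number here is invented for illustration only.
COOCCURRENCE = {
    ("clear", "good"): 40, ("clear", "bad"): 5,
    ("boring", "good"): 4, ("boring", "bad"): 36,
}
WORD_COUNT = {"clear": 100, "boring": 90, "good": 500, "bad": 450}
TOTAL_WINDOWS = 10_000  # hypothetical corpus size, in text windows

POSITIVE_SEEDS = ["good"]  # assumed context-insensitive positive words
NEGATIVE_SEEDS = ["bad"]   # assumed negative words


def pmi(word, seed):
    """Pointwise mutual information between a word and a seed word."""
    joint = COOCCURRENCE.get((word, seed), 0) + 0.5  # smoothing avoids log(0)
    p_joint = joint / TOTAL_WINDOWS
    p_word = WORD_COUNT[word] / TOTAL_WINDOWS
    p_seed = WORD_COUNT[seed] / TOTAL_WINDOWS
    return math.log2(p_joint / (p_word * p_seed))


def semantic_orientation(word):
    """Association with the positive seeds minus association with the negative seeds."""
    positive = sum(pmi(word, seed) for seed in POSITIVE_SEEDS)
    negative = sum(pmi(word, seed) for seed in NEGATIVE_SEEDS)
    return positive - negative
```

With these toy counts, `semantic_orientation("clear")` is positive and `semantic_orientation("boring")` is negative, so the sign of the score can serve as a polarity entry in a sentiment lexicon.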

III. EXTRACTING A MIXED GRAPH OF TERMS
In this section we explain how a Mixed Graph of Terms can be extracted from a corpus of documents. The Feature Extraction module (FE) is represented in Fig. 1. The input of the system is the set of documents. After the pre-processing phase, which involves tokenization, stop-word filtering and stemming, a Term-Document Matrix is built to feed the Latent Dirichlet Allocation (LDA) [27] module. The LDA algorithm, assuming that each document is a mixture of a small number of latent topics and that each word's creation is attributable to one of the document's topics, provides as output two matrices, which express the probabilistic relations between topics and documents and between words and topics, respectively.
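The pre-processing steps named above (tokenization, stop-word filtering, stemming) and the construction of the Term-Document Matrix can be sketched as follows. This is only a minimal illustration: the stop-word list is a tiny sample, `naive_stem` is a crude stand-in for a real stemmer, and all names are hypothetical rather than taken from the system described here.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "is", "a", "of", "and", "are", "to", "in", "very"}


def naive_stem(token):
    """Crude suffix stripping, a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def term_document_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][d] counts term i in document d."""
    processed = [Counter(preprocess(doc)) for doc in documents]
    vocabulary = sorted(set().union(*processed))
    matrix = [[counts[term] for counts in processed] for term in vocabulary]
    return vocabulary, matrix


docs = ["The lectures are very clear", "Exercises clarified the lectures"]
vocabulary, matrix = term_document_matrix(docs)
```

The resulting matrix is what an LDA implementation would take as input; the LDA step itself is left out of this sketch.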
Under particular assumptions [26], the results of the LDA module can be used to determine: the probability for each word v_i to occur in the corpus (W_A); the conditional probability between word pairs (W_C); and the joint probability between word pairs (W_J). Details on LDA and on the probability computation can be found in [26]. Defining aggregate roots (AR) as the words whose occurrence is most implied by the occurrence of the other words of the corpus, a set of H aggregate roots r = (r_1, …, r_H) can be determined from W_C. This phase is referred to as Root Selection (RS) in Fig. 1. A weight can be defined as the degree of probabilistic correlation between AR pairs. We define an aggregate as a word v_s having a high probabilistic dependency on an aggregate root r_i. Such a dependency can be expressed through a probabilistic weight. Therefore, for each aggregate root, a set of aggregates can be selected according to the highest weight values. As a result of the Root-Word Level selection (RWL), an initial mGT structure, composed of H aggregate roots R_l linked to all possible aggregates W_l, is obtained. An optimization phase allows weakly related pairs to be neglected according to a fitness function [26]. In particular, our algorithm, given the number of aggregate roots H and the desired maximum number of pairs as constraints, chooses the best settings for two parameters, defined as follows:
• the threshold that establishes the number of aggregate root/aggregate root pairs. A relationship between two aggregate roots is relevant if its weight reaches this threshold.
• the threshold that establishes, for each aggregate root, the number of aggregate root/word pairs. A relationship between a word and an aggregate root is relevant if its weight reaches this threshold.
A mixed graph of terms is then built from several clusters, each containing a set of words (aggregates) related to an aggregate root, the centroid of the cluster. Some aggregate roots are also linked together, building a centroid subgraph.
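The selection steps described above can be sketched in a few lines, under assumptions that are not taken from [26]: words are treated as conditionally independent given a topic, aggregate roots are ranked by the mean conditional probability of a word given every other word, root/root pairs are weighted by joint probability and root/word pairs by conditional probability. The names `phi`, `topic_prior`, `tau` and `mu`, and all numeric values, are hypothetical.

```python
def word_probabilities(phi, topic_prior):
    """Derive W_A, W_C and W_J from word-topic probabilities.

    phi[k][i] is assumed to be P(word i | topic k); topic_prior[k] is P(topic k).
    """
    n_words = len(phi[0])
    topics = range(len(phi))
    w_a = [sum(phi[k][i] * topic_prior[k] for k in topics) for i in range(n_words)]
    w_j = [[sum(phi[k][i] * phi[k][j] * topic_prior[k] for k in topics)
            for j in range(n_words)] for i in range(n_words)]
    # w_c[i][j] approximates P(word i | word j)
    w_c = [[w_j[i][j] / w_a[j] for j in range(n_words)] for i in range(n_words)]
    return w_a, w_c, w_j


def build_mgt(vocab, w_c, w_j, n_roots, tau, mu):
    """Select aggregate roots, then keep only the pairs that reach the thresholds."""
    n = len(vocab)
    # Score each word by how strongly the other words imply it (assumed criterion).
    score = [sum(w_c[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    roots = sorted(range(n), key=lambda i: score[i], reverse=True)[:n_roots]
    # Root/root pairs kept when their joint probability reaches tau.
    root_pairs = {(vocab[a], vocab[b]): w_j[a][b]
                  for pos, a in enumerate(roots) for b in roots[pos + 1:]
                  if w_j[a][b] >= tau}
    # Root/word pairs kept when P(root | word) reaches mu.
    aggregates = {vocab[r]: {vocab[s]: w_c[r][s] for s in range(n)
                             if s not in roots and w_c[r][s] >= mu}
                  for r in roots}
    return root_pairs, aggregates


# Toy LDA output: two topics over four words (values invented for illustration).
vocab = ["exam", "hard", "php", "fun"]
phi = [[0.5, 0.3, 0.1, 0.1],
       [0.2, 0.1, 0.4, 0.3]]
topic_prior = [0.5, 0.5]

w_a, w_c, w_j = word_probabilities(phi, topic_prior)
root_pairs, aggregates = build_mgt(vocab, w_c, w_j, n_roots=2, tau=0.05, mu=0.3)
```

On this toy input the two selected roots are "exam" and "php", linked to each other, with "hard" attached to "exam" and "fun" attached to "php". The optimization phase that tunes the thresholds against a fitness function is not part of this sketch; here they are fixed by hand.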

IV. SEARCHING THE SENTIMENT BY THE USE OF THE MIXED GRAPH OF TERMS
As described in the previous section, a Mixed Graph of Terms gives a compact representation of a set of documents related to a well-defined knowledge domain. In this way the obtained graph can be considered as a filter to be employed in document classification problems. The main aim of this paper is to show how the mGT can be effectively applied to sentiment mining from texts: the proposed method can be used to build a sentiment detector able to label a document according to its sentiment. Our system is composed of the following modules:
• Mixed Graph of Terms building module: this module builds a mixed graph of terms starting from a set of documents belonging to a well-defined knowledge domain and previously labeled according to the sentiment expressed in them. In this way the obtained mixed graph of terms contains information about the words and their co-occurrences, and so represents a certain sentiment in a well-defined knowledge domain. As described in section 3, thanks to the LDA approach such a graph can be obtained from a small set of documents. Figure 2 depicts the module architecture and its main functional steps. The output of this module is a mixed graph of terms representing the documents and their sentiment. By feeding this module with positive or negative training sets, it is possible to build mixed graphs of terms for documents that express positive or negative sentiment in a well-defined domain.
• Sentiment Mining Module: this module extracts the sentiment from a document by using the Mixed Graph of Terms as a sentiment filter. The inputs of this module are a generic document and the mixed graphs of terms representing positive and negative sentiment in a knowledge domain; the output is the sentiment detected in the input document. The proposed algorithm requires the use of an annotated lexicon, such as WordNet or ItalWordNet, for the retrieval of synonyms of the words contained in the document D and not included in the reference mGT. The retrieved synonyms are added to the vector W and analyzed according to the classification strategy. The proposed approach is effective for asynchronous sentiment classification, but can also work in a synchronous way. Figure 3 depicts the architecture for synchronous, real-time sentiment classification. For real-time operation two new modules have been introduced:

Determining the Sentiment
• Document Grabber. This module collects documents from web sources (social networks, blogs and so on). These documents can be collected both for updating the training set and for classifying them according to their sentiment. The training set update is an important feature of the proposed approach. In this way, in fact, the various mGTs can be continuously updated and can improve their discriminating power by introducing new words and relations and deleting inconsistent ones.
• Document Sentiment Classification. The new documents inserted into the training set have to be classified with the support of an expert. The aim of this module is to provide a user-friendly environment for classifying the retrieved documents according to their sentiment.
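The comparison performed by the Sentiment Mining Module, including the synonym expansion step, might look like the following sketch. It is not the algorithm of this paper: the scoring rule (summing the weights of the word pairs found in the document), the flattened graph representation, the synonym table standing in for WordNet, and every name and value are assumptions made for illustration.

```python
# Hypothetical synonym table standing in for a lexicon such as WordNet.
SYNONYMS = {"great": ["good"], "dull": ["boring"]}

# Hypothetical reference graphs, flattened to weighted word pairs.
POSITIVE_MGT = {("lesson", "good"): 0.6, ("exercise", "useful"): 0.4}
NEGATIVE_MGT = {("lesson", "boring"): 0.7, ("exam", "hard"): 0.5}


def expand(tokens, mgt_vocab):
    """Add synonyms for the document words that are missing from the reference graphs."""
    words = set(tokens)
    for token in tokens:
        if token not in mgt_vocab:
            words.update(s for s in SYNONYMS.get(token, []) if s in mgt_vocab)
    return words


def graph_score(words, mgt):
    """Sum the weights of the graph pairs whose two words both occur in the document."""
    return sum(weight for (a, b), weight in mgt.items() if a in words and b in words)


def classify(text):
    """Label a document by comparing its scores on the two reference graphs."""
    tokens = text.lower().split()
    vocab = {w for pair in list(POSITIVE_MGT) + list(NEGATIVE_MGT) for w in pair}
    words = expand(tokens, vocab)
    positive = graph_score(words, POSITIVE_MGT)
    negative = graph_score(words, NEGATIVE_MGT)
    if positive == negative:
        return "neutral"
    return "positive" if positive > negative else "negative"
```

For example, `classify("the lesson was great")` matches the pair (lesson, good) through the synonym of "great" and returns "positive", while a post that matches neither graph is returned as "neutral", a label this sketch adds for ties.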

VI. EXPERIMENTAL RESULTS
The evaluation of the proposed method has been conducted in two steps. Firstly, the proposed approach has been applied to a standard dataset: the Movie Reviews Dataset [16]. The main aim of this experiment was to evaluate the method's performance and to compare it with other approaches well known in the literature. The experiment has been conducted using 25% of the dataset as the training set and the remaining 75% as the test set. The obtained results and their comparison with other approaches are reported in Table 1. From Table 1 it can be observed that the proposed approach shows the best results in terms of accuracy.
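The evaluation protocol described above, a 25%/75% split followed by an accuracy measurement, can be sketched as follows. The random shuffling, the fixed seed and the function names are assumptions; this section does not state how the split was drawn.

```python
import random


def split_dataset(samples, train_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and split it into training and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, expected):
    """Fraction of documents whose predicted label matches the expected one."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


train, test = split_dataset(list(range(8)))
```

With eight samples, the split yields two training items and six test items; accuracy is then computed only on the test portion.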
[Table 1. Columns: Reference paper, Methodology, Accuracy. Rows include [14] and the mGT approach.]
The second experimental phase has been carried out using a real dataset. The experimental scenario involved the analysis of posts collected from the popular e-learning platform Moodle. In particular, the course of Software Technology has been held using a blended approach. The course has been organized into topics, and for each topic a final test has been submitted to the students. The traditional lectures have been supported by additional learning contents distributed through Moodle. Chat and forum enhanced the collaborative approach of the course. About 75 students attended the lectures and used Moodle to share comments with each other. The contents exchanged through the forum and the chat were not visible to the teacher, and the students were aware of this policy. A Sentiment Analysis Module has been used for grabbing the mood of the students during the lectures related to the various topics. The real-time analysis of the comments provided a sort of thermometer of the mood of the classroom regarding the various topics. Table 2 reports the number of posts collected from the chat and the forum for each topic. The retrieved sentiment is also reported in terms of positive and negative percentages. The observation period expresses the length of the course section dedicated to a certain topic.
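The per-topic figures described above (number of posts and positive/negative percentages) amount to a simple aggregation over classified posts, sketched below. The posts, their labels and the topic name "HTML" are invented for illustration and are not data from the course.

```python
from collections import defaultdict

# Hypothetical classified posts: (topic, sentiment label). Not real course data.
posts = [
    ("HTML", "positive"), ("HTML", "negative"), ("HTML", "positive"),
    ("PHP", "negative"), ("PHP", "negative"), ("PHP", "positive"), ("PHP", "negative"),
]


def sentiment_by_topic(labelled_posts):
    """Count posts per topic and express each sentiment as a percentage."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for topic, label in labelled_posts:
        counts[topic][label] += 1
    summary = {}
    for topic, c in counts.items():
        total = c["positive"] + c["negative"]
        summary[topic] = {
            "posts": total,
            "positive_pct": 100.0 * c["positive"] / total,
            "negative_pct": 100.0 * c["negative"] / total,
        }
    return summary


summary = sentiment_by_topic(posts)
```

Each entry of `summary` corresponds to one row of a table such as Table 2: the topic, the number of posts, and the two percentages.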
Table 2 reports the average sentiment of the classroom. The sentiment analyzer module provided the sentiment of the classroom in real time to the teacher, who could then tune the teaching strategy. Of particular interest is the trend of the sentiment during the observation period (Figure 4).
It is interesting to notice the upward trend of the positive sentiment during the observation time. The reason for this trend is fairly clear: at the beginning of each topic students showed a natural disorientation, which is greater for topics related to the programming sections of the course. After these first phases the teacher adapted his teaching style to the sentiment of the students, giving them more contents or introducing more examples or exercises. For example, in the case of the PHP language, after the 12th day the teacher introduced a series of worked exercises, and this support had a positive effect on the students. In general, the teacher appreciated the Sentiment Grabber tool above all for the opportunity to monitor the mood of the class without the filter of the teacher-student relationship.
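One simple way to quantify such a trend is the least-squares slope of the daily positive percentage against the day index, sketched below. The choice of a linear fit and the sample values are assumptions made for illustration; they are not the analysis behind Figure 4.

```python
def trend_slope(daily_positive_share):
    """Least-squares slope of the positive share against the day index (needs 2+ days)."""
    n = len(daily_positive_share)
    mean_x = (n - 1) / 2
    mean_y = sum(daily_positive_share) / n
    numerator = sum((x - mean_x) * (y - mean_y)
                    for x, y in enumerate(daily_positive_share))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical daily percentages of positive posts for one topic.
rising = trend_slope([40.0, 45.0, 50.0, 55.0])
falling = trend_slope([60.0, 50.0, 45.0])
```

A positive slope indicates that the share of positive posts grows over the observation period, and a negative slope indicates the opposite.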

VII. CONCLUSIONS
This paper proposes the use of the mixed graph of terms, obtained through the Latent Dirichlet Allocation approach, as a tool for the sentiment classification of documents. The method relies on building the reference mGTs from documents labeled according to their sentiment. A document can then be classified by using the reference mGTs. The proposed method has been applied in the e-learning field to measure the mood of a classroom towards some topics. Further development of this approach will include the introduction of an annotated lexicon, such as SentiWordNet, for a better sentiment evaluation of the words and of the sentence structures.

Figure 2. Sentiment Analysis System Architecture
The sentiment extraction is obtained by a comparison between the document and the mixed graph of terms according to the following algorithm:

Figure 3. System Architecture for Synchronous Classification

Figure 4. Trend of the positive sentiment during the observation period

TABLE I. THE ACCURACY OBTAINED BY THE VARIOUS METHODS ON THE CON-

TABLE II. AVERAGE SENTIMENT OF THE CLASSROOM