Learning Analytics Through

Learning analytics is an emerging discipline focused on the measurement, collection, analysis and reporting of learner interaction data gathered from e-learning content. Serious games are a potential source of relevant educational user data; they offer an interactive environment for training and can support an effective learning process. This paper presents educational data mining methods, namely EM and K-means clustering, to discuss learning analytics through serious games. We then analyze the player experience data collected from the educational game “ELISA”, which is used to teach biology students the immunological technique for the determination of anti-HIV antibodies. Finally, we critically evaluate our results, discuss the limitations of our study, and make suggestions for future research linking learning analytics and serious gaming.


Introduction
Learning analytics (LA) focuses on the collection, analysis and visualization of large amounts of data related to learning processes. It aims to understand and promote learning effectiveness. LA harnesses the power of big data and data-mining techniques to improve learning assessment [1]. Serious games are highly interactive software that can produce massive amounts of user data. This is an advantage for feeding LA systems and for providing professors with a learning dashboard.
Serious games provide an additional means to increase learning interest and to coach and evaluate user performance. For instance, serious games can be designed to solve complex problems collaboratively, make the learning process more efficient, achieve predictive modeling and real-time visualization, and attain better knowledge retention than traditional methods [2,3]. In this paper we propose a serious game for immunological techniques. This kind of medical training can benefit from games that help bachelor students in biology, especially those interested in studying the human system, to deepen their knowledge and understand the ELISA (Enzyme-Linked Immunosorbent Assay) method for HIV screening. Immunological techniques are used to determine or measure an immune response, and to detect antigens using antibodies; among them, ELISA is one of the most widely used biochemical methods in analysis and diagnostics laboratories. Several analytes such as peptides, proteins, antibodies and hormones can be detected selectively and quantified at low concentrations among a multitude of other substances. ELISA combines the specificity of antibodies with high-turnover catalysis by enzymes to provide specificity and sensitivity [4].
This paper presents clustering methods and data visualization of learner performance in the ELISA game. We first present the data mining algorithms EM and k-means used to discuss the students' results. We then present the serious game ELISA, designed to teach the immunological technique and to support learning effectiveness for students in the field of biology. Finally, we report the performance levels of students obtained from the clustering and data visualization methods.

Learning analytics and data mining
The field of learning analytics (LA) refers to the collection, analysis and visualization of large amounts of data related to educational processes. At its heart, LA aims to harness the power of big data and data-mining techniques to improve the assessment of learning processes [1].
Data mining is the process of analyzing data from different perspectives and summarizing the results as useful information. It has been defined as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [5].
Among the best-known branches of data mining is educational data mining (EDM), a research field concerned with the application of data mining, machine learning algorithms and statistical tools to information generated in educational settings.
Another definition presents educational data mining as a tool for mining in the educational environment, concerned with developing new methods to discover knowledge from educational databases [6]. It is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students and the settings in which they learn [7]. The educational data mining process involves several methods for learning analytics, such as prediction, structure discovery, relationship mining, and discovery with models [8].

Clustering
Clustering techniques provide structure discovery in data: they aim to find similar points in heterogeneous data resources and group items into a set of clusters [9]. They make it possible to identify dense and sparse regions in item space and the correlations among data attributes, and they can be used as a tool to differentiate groups or classes of objects. Clustering is particularly useful when the most common categories within the data set are not known in advance. It can be applied in several communities: for example, students in schools can be clustered together to investigate similarities and differences among them, and student actions can be clustered together to investigate patterns of behavior.
Clustering algorithms typically split into two categories (see Figure 1: Clustering categories). Hierarchical agglomerative clustering (HAC) works "bottom up" by grouping small clusters into larger ones. Non-hierarchical approaches such as k-means, EM-based clustering and spectral clustering work "top down" by splitting the data directly into separate clusters; this paper refers to them as divisive clustering. Whereas agglomerative clustering assumes that clusters themselves cluster together, the non-hierarchical approaches assume that clusters are separate from each other [8]. In our case study, we apply divisive clustering in order to split the players' data into separate clusters and propose effective learning analytics through serious games.

The K-Means algorithm
The k-means algorithm [10] randomly selects k objects, each of which initially represents a cluster mean or center. Each remaining object is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The algorithm then computes the new mean for each cluster. This process iterates until the criterion function converges and the cluster centers no longer change. The flowchart of the k-means algorithm is shown in Figure 2. The k-means algorithm requires the number of clusters as input; it produces exactly k different clusters that are as distinct as possible. The best number of clusters, the one leading to the greatest distinction, must be computed from the data [11].
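The steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the WEKA implementation used later in the study: the function names (`sq_dist`, `centroid`, `kmeans`), the squared Euclidean distance and the sample (score, phase) tuples are assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means: returns (centers, clusters)."""
    rng = random.Random(seed)
    # Step 1: pick k objects at random as the initial cluster centers.
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each center as the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [centroid(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        # Step 4: stop when the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical (score, phase) records, for illustration only.
sample = [(800, 1), (820, 1), (780, 1), (7600, 12), (7400, 12), (7500, 12)]
centers, clusters = kmeans(sample, k=2)
```

Because the score values are much larger than the phase values, the distance here is dominated by the score; rescaling the attributes before clustering would be a reasonable design choice in practice.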

The Expectation Maximization (EM) Algorithm
The Expectation-Maximization (EM) algorithm extends clustering to both continuous and categorical variables. It assumes a probability model whose parameters describe the probability that an instance belongs to a certain cluster [11]. EM starts by initializing the model parameters. The expectation step then computes the probability that an instance x belongs to cluster i, and the maximization step recalculates the cluster means according to the probabilities that all points belong to cluster i. The expectation and maximization steps are repeated until the model parameters converge (see Figure 3).

Fig. 3. Flowchart of the EM algorithm
The final clusters of the EM algorithm result from maximizing the overall probability, or likelihood, of the data. Whereas EM clustering can handle both continuous and categorical variables, the classical K-means algorithm focuses on continuous variables. K-means can also be modified to deal with categorical variables [11].
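The expectation and maximization steps can be illustrated with a small sketch for one continuous attribute modeled as a mixture of Gaussians. It is a simplified example under stated assumptions, not the WEKA EM implementation: the function names, the initial parameters, the `min_std` floor and the sample scores are all hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def em_gmm_1d(xs, means, stds, weights, n_iter=20, min_std=1e-3):
    """EM for a one-dimensional Gaussian mixture with len(means) clusters."""
    k = len(means)
    means, stds, weights = list(means), list(stds), list(weights)
    resp = []
    for _ in range(n_iter):
        # Expectation step: probability that each instance x belongs to cluster i.
        resp = []
        for x in xs:
            dens = [weights[i] * gaussian_pdf(x, means[i], stds[i]) for i in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # Maximization step: re-estimate mean, standard deviation and weight
        # of each cluster from the membership probabilities.
        for i in range(k):
            n_i = sum(r[i] for r in resp)
            means[i] = sum(r[i] * x for r, x in zip(resp, xs)) / n_i
            var = sum(r[i] * (x - means[i]) ** 2 for r, x in zip(resp, xs)) / n_i
            stds[i] = max(math.sqrt(var), min_std)  # floor avoids a zero std
            weights[i] = n_i / len(xs)
    return means, stds, weights, resp

# Hypothetical scores, for illustration only.
scores = [800, 820, 780, 7600, 7400, 7500]
means, stds, weights, resp = em_gmm_1d(
    scores, means=[1000, 7000], stds=[500, 500], weights=[0.5, 0.5]
)
```

Unlike k-means, each instance keeps a membership probability for every cluster (`resp`), and a hard assignment is obtained only by taking the most probable cluster. This sketch runs a fixed number of iterations instead of testing convergence of the parameters.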

Serious game for immunology techniques

ELISA technique
Simply defined, the enzyme-linked immunosorbent assay (ELISA) is a test that uses antibodies and color change to identify a substance. The procedure employs the specificity of both antibodies and enzymes to achieve a highly sensitive and precise test. It is one of the procedures used for the detection of antibodies to the HIV virus. There are three types of ELISA (see Figure 4): direct, sandwich and competitive; each of them has specific steps.
The ELISA test process consists, in general, of three main steps: preparation of samples, execution, and evaluation of the obtained results. Each step involves different manipulations that the learner must understand and practice to obtain an accurate result at the end of the test [12].
In this paper we focus on the sandwich ELISA procedure. This procedure requires the use of matched antibody pairs, where each antibody is specific for a different, non-overlapping part of the antigen molecule. A first antibody is coated onto the wells. The sample solution is then added to the well. A second antibody is added after this step in order to measure the concentration of the sample. The procedure is described in Figure 5.

ELISA Game
ELISA Game provides a virtual laboratory in which biology students develop their practice of immunological techniques by playing. It also offers an assessment of the student's actions in the game and a knowledge data source for learning analytics.
The main pedagogical objectives of ELISA game are:
• Understand the principle of ELISA.
• Highlight the interest of antibodies as a diagnostic tool.
• Understand the principle of the ELISA method in the detection of AIDS.
• Describe the different steps of the method.
• Identify the main characteristics of an antibody.
• Know the structure of HIV.
The player must succeed in the 12 steps of the game to complete the process of the ELISA method (Table 1). ELISA game has been developed following the multilayer methodology [13], and Figure 6 presents a screenshot of the laboratory during the ELISA experiment.
Table 1 (steps of the ELISA method):
Using the serum from patient A, prepare three dilutions as follows:
- Take 1 ml of serum from patient A and add 1 ml of phosphate-buffered saline (PBS) solution. This is a 1:2 dilution.
- Take 1 ml of serum from patient A and add 9 ml of PBS. This is a 1:10 dilution.
- Take 0.1 ml of serum from patient A and add 9.9 ml of PBS. This is a 1:100 dilution.
3. Prepare an ELISA plate with 0.1 ml of the different dilutions of patient serum using a pipette. *Note: The ELISA plate has been pretreated to bind SLE antigen to each well.
4. Add to the ELISA plate 0.1 ml dilutions for each titer of anti-DNA primary antibody (positive control) and a buffer (negative control).
5. Incubate the ELISA plate at 37 °C for 15 minutes.
6. Remove the fluid from each well with the pipette and wash with 0.1 ml of PBS. Often, in a real laboratory, these washes are repeated 3-6 times prior to adding the next substance. In order to speed this example along, we have given just one rinse/wash example per step.
7. Add 0.1 ml of buffered solution containing a secondary antibody that recognizes antibodies made in humans. Note this secondary antibody is made in a rabbit and has an attached enzyme (HRP) that will interact with the substrate in the next step.
The ELISA game environment includes all the equipment required for the immunological technique: centrifuge, samples, tubes, pipettes, ELISA plate, solutions (phosphate-buffered saline (PBS), SLE anti-DNA primary antibody, rabbit anti-Human, HRP substrate), incubator and timer. The game displays in a table a description of the required actions for each phase of the ELISA technique. The game engine increments the score by 200 points for each correct action in the ELISA process and decrements the power bar by 0.8% for each wrong action. It also records the following player data set in the database (see Table 2).
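The scoring rule of the game engine (200 points per correct action, 0.8% of the power bar lost per wrong action) can be written as a short sketch. The function names, the starting values (score 0, power bar at 100%) and the reading of "0.8%" as 0.8 percentage points are assumptions made for illustration; the actual game engine code is not given in this paper.

```python
SCORE_PER_CORRECT_ACTION = 200   # points, as stated in the text
POWER_LOSS_PER_WRONG_ACTION = 0.8  # assumed to be percentage points of the bar

def apply_action(score, power_bar, correct):
    """Return the updated (score, power_bar) after one player action."""
    if correct:
        return score + SCORE_PER_CORRECT_ACTION, power_bar
    return score, power_bar - POWER_LOSS_PER_WRONG_ACTION

def replay(actions, score=0, power_bar=100.0):
    """Apply a sequence of actions (True = correct, False = wrong)."""
    for correct in actions:
        score, power_bar = apply_action(score, power_bar, correct)
    return score, power_bar

# Two correct actions followed by one wrong action (hypothetical session).
final_score, final_power = replay([True, True, False])
```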

Method of study
Learning analytics is a tool for understanding learning behavior. It supports effective learning through serious games and provides corrective feedback to their designers. In this perspective, we conducted an experimental study using ELISA game [14] to teach and to evaluate the learning outcomes. One hundred and two (102) biology students from the Faculty of Sciences and Technologies in Tangier (FSTT-Morocco) played the ELISA game to learn the immunological techniques.
The ELISA database collects data from the play experiences during the debriefing class. We then use the data mining tool WEKA to analyze the results and find learning clusters with the K-means and EM algorithms. Finally, we analyze the resulting clusters using data visualization to provide corrective feedback (see Figure 7).

Debriefing
We collected more than 450 play experiences from 102 biology students: 18 boys and 84 girls aged between 19 and 25 years.
The debriefing was conducted in two main rounds: first, students played ELISA game without assistance; then they played with assistance and were challenged to complete the ELISA steps with the highest score.
Students provided the following feedback after their experiences:
• Learning through ELISA game is easier and more interactive than in the laboratory.
• The laboratory experiment requires all the materials and more than 1 hour to complete the ELISA, whereas in the game all steps can be completed in 15 minutes and the experiment can be replayed several times.
• While playing, there is no risk of being infected by the solutions and samples required for ELISA.
• Playing with assistance gives good results: the majority of students complete the game.

Clustering
The ELISA database stores all the attributes listed in Table 2. It helps to analyze player performance by feeding the EM and K-means algorithms, which group students into clusters according to their performance.
Preprocessing and analysis of attributes. The goal of preprocessing is to clean data that are incomplete (lacking attribute values or certain attributes of interest), noisy (containing errors, or outlier values that deviate from the expected), or inconsistent (containing discrepancies in codes or names).
Figure 8 displays the label distribution of each attribute. The behavioral attributes nbrUsePipettor, nbrClicks, nbrFailsClicks, healthbarre, and time take too many labels, each with few instances, whereas score and phase show a significant distribution and reflect performance. We therefore focus our clustering on score and phase to obtain student performance clusters.
http://www.i-jet.org
EM clustering. The EM algorithm is a probabilistic clustering algorithm. Each cluster is defined by the probabilities for instances to have certain values for their attributes, and a probability for instances to reside in the cluster. For numerical attributes, a cluster consists of a mean value and a standard deviation for each attribute, and for discrete attributes, it consists of a probability for each attribute value.
Using the WEKA tool, we fixed the number of iterations to 100 without specifying the number of clusters; the algorithm itself finds the number of clusters and the cluster specification (see Figure 9: EM inputs). As shown in Table 3 and Table 4, there are five clusters. Cluster 0 represents the medium students "level C": these students have a score between 5600 and 6200, and the maximum phase they reached is 3 or 5. Cluster 1 represents the excellent students "level A", with scores that vary between 7000 and 7600 and a maximum reached phase of 12. Cluster 2 "level E" contains the weakest students, who suffer from several problems: their score is limited to 800 and they reached only the first phase. The same remark applies to the students of cluster 3 "level D": the majority of them have a score of 2800 and their maximum reached phase is 2. Finally, cluster 4 refers to the good students "level B", with scores between 6600 and 7000 and a maximum reached phase of 8.
EM clustering produced the five clusters described above: the cluster of medium students (cluster 0) represents 23% of the students concerned, cluster 1 represents 27%, clusters 2 and 3 represent 25% and 17% respectively, and cluster 4 represents 8%.
K-Means clustering. K-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
To select the best K (number of clusters) for our case, we started with K=2 and increased it by 1 at each step, calculating the clusters and the cost that comes with the training. Figure 10 displays the K-means parameters used in our study; starting from K=2, we found significant results for K=5.
K-means is the second algorithm used for the clustering. According to Table 5 and Table 6, it generates 5 clusters. Cluster 0 represents students with difficulties "level E": their scores do not exceed 800 points and they do not reach phase 2. The performance of these students is poor, with problems encountered in the first phase of the game. The students that belong to cluster 1 "Level B" are good: their score does not exceed 7400 points but they have reached the twelfth phase. Cluster 2 "Level A+" contains excellent students: the majority of their scores reached 7600 points and all of them reached the twelfth phase. Cluster 3 "Level C" contains medium students, with a score of 6200, who reach the fifth phase. The last cluster, number 4, represents the students of "level D", whose weak scores do not exceed 2800 and who do not reach phase 3 during the play experience.
Like EM clustering, the K-means algorithm produced the five clusters described above. The distribution of the clustered instances is as follows: 39% of the students have major problems, 22% are good, 3% are excellent, 16% are weak but can do well with assistance, and 19% are medium.
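The procedure used to choose K (start at K=2, increase by 1, and compare the cost of each clustering) can be sketched as follows. This is an illustrative sketch and not the WEKA SimpleKMeans run used in the study: the cost is assumed to be the within-cluster sum of squared errors, the deterministic farthest-first initialization replaces random seeding so that the example is reproducible, and the (score, phase) records are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic initialization: repeatedly add the point that is
    farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans_cost(points, k, max_iter=100):
    """Run k-means and return the within-cluster sum of squared errors."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def cost_by_k(points, k_values):
    """Cost of the clustering for each candidate number of clusters."""
    return {k: kmeans_cost(points, k) for k in k_values}

# Hypothetical (score, phase) records forming three groups.
sample = [(800, 1), (820, 1), (780, 1),
          (2800, 2), (2780, 2), (2820, 2),
          (7600, 12), (7400, 12), (7500, 12)]
costs = cost_by_k(sample, range(2, 5))
```

The cost always tends to decrease as K grows, so the useful signal is the value of K after which the decrease becomes small relative to the earlier steps.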
The two algorithms used in this part have clustered students into several categories. The excellent students, level "A", do not need any assistance. The good students, level "B", have mastered the basics of the ELISA process steps but still have minor problems; with more practice they can overcome them.
The medium students, level "C", make a lot of mistakes and reach only the middle phases; they need more assistance to achieve the main objective of the proposed game. The weak students, those who have major problems with the ELISA process implemented in this game, form categories D and E; these students need intensive assistance from an expert, with a detailed explanation of the process of the experiment (see Table 7).

Behavioral analysis and data visualization:
This section provides a deeper analysis of learner behaviours in the ELISA game. We proceed by analyzing the data visualization of each variable: "Temptatives" (number of attempts), "Max phase", "Max number of clicks" and "Max use of pipettes". The main objective of data visualization is to describe and analyze the players' behaviours. To this end, we apply the statistical average to the players' data. Table 8 provides the average of each behavioural attribute per performance level.
Figure 9 shows the distribution of players' attempts according to the performance levels: the average number of play experiences per student increases from level E to level A. Students at level A have the highest number of attempts, whereas students at level E have the lowest. The average for level A varies between 2 and 3 attempts, while the averages of the other levels vary between 1 and 2.
Figure 10 shows the players' achievements by level: players at levels E, D and C achieved between 1 and 2 phases of ELISA, whereas players at level B reached phase 5 and players at level A completed all phases of the game.
Player clicks capture the behavioral interaction of the student with the serious game. Figure 11 displays an increasing number of clicks from level E to level A: students at levels E and D do not make more than 100 clicks, whereas students at levels C, B and A interact with the game with more than 300 clicks. Figure 12 shows the same behavior as the number of player clicks: the values increase with the student level from E to A.
The learning analytics techniques used in this paper have given a global view of the learning level of the students according to their outcomes.
These techniques are efficient tools that instructors can use both to evaluate learning progress and to find learning problems. With the charts and the clusters generated by the clustering algorithms, instructors can easily determine the sources of problems and the weak points of the students. In our case, these are the students that belong to level E and level D: they reached at most the fifth phase, they made only one attempt, and they used the pipette fewer than 20 times. This may lead to the conclusion that these students did not understand the basics of the ELISA procedure, or did not know how to manipulate the game; they must therefore read more about ELISA, with the instructor's assistance, and replay the game several times to improve their skills. The students of cluster C are at a medium level: they made an effort, but they still show some gaps when carrying out the ELISA procedure.
The proposed procedure can be improved by using other variables and parameters and other learning analytics techniques, in order to define in more detail and more precisely the learning deficiencies of students in a learning process that can rely on several tools, such as e-learning platforms and serious games.
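The per-level averages used in this behavioral analysis (Table 8) amount to grouping the player records by performance level and averaging each attribute. A minimal sketch is given below; the record layout, the `level` and `attempts` keys and the sample values are hypothetical, and only `nbrClicks` is an attribute name taken from the data set described above.

```python
from collections import defaultdict
from statistics import mean

def average_by_level(records, attributes):
    """Average each behavioural attribute over the records of each level."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["level"]].append(rec)
    return {
        level: {attr: mean(r[attr] for r in recs) for attr in attributes}
        for level, recs in grouped.items()
    }

# Hypothetical player records, for illustration only.
records = [
    {"level": "A", "attempts": 3, "nbrClicks": 320},
    {"level": "A", "attempts": 2, "nbrClicks": 340},
    {"level": "E", "attempts": 1, "nbrClicks": 80},
]
averages = average_by_level(records, ["attempts", "nbrClicks"])
```

The resulting dictionary maps each level to its attribute averages and can be passed directly to a plotting tool to produce bar charts per level.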

Conclusion
ELISA game provided an effective environment for medical training and offered a new learning tool for biology students at the Faculty of Sciences and Technologies (Tangier). It helped students to master the immunological techniques and evaluated their performance in play.
Following the learning analytics approach and using clustering methods, we identified five performance levels (A-B-C-D-E) among the biology students. Students at levels D and E need intensive assistance from an expert, with a detailed explanation of the process of the experiment, whereas students at levels C, B and A can complete the game by playing it more times, with the challenge of completing all phases of ELISA game with a high score.
In future work, we will focus on increasing the assistance within ELISA game to improve student performance, and on providing the tutor with a learning dashboard that presents learning analytics of student performance. Our goal is to make learning through the serious game ELISA as easy and successful as possible for all students.