State of Art of Data Mining and Learning Analytics Tools in Higher Education

In this decade, the use of learning management systems (LMS) has continued to increase, making them one of the most widely adopted approaches in the learning process. Learners' online activities generate a huge amount of data that goes unused because traditional analysis methods are unable to process it. In this regard, a large collection of applications and tools has emerged to support research in educational data mining (EDM) and/or learning analytics (LA). This study looks into recent applications of Big Data technologies in education and presents some of the most widely used, accessible, and powerful tools in this field of research. The majority of these tools are intended for researchers conducting studies in educational data mining and learning analytics.
Keywords: Learning Analytics, LMS, Big Data, Educational Data Mining, Text Mining, Modeling


Introduction
In recent years, community learning environments have multiplied and have opened new directions for educational research and improvement. Recent learning methods such as the Flipped Classroom [1] depend largely on learners' online activities (discussion forums, online chats, instant messaging, etc.) carried out through an LMS (Moodle, Claroline, Google Classroom, etc.). These activities usually generate a huge amount of data. The data can be used to develop the learning environment, to help learners in their learning, and to improve the overall learning experience. To this end, several frameworks and models [2] have been proposed for online learning management systems. However, the data created by learners' activities in educational institutions are so voluminous [3] that traditional processing techniques cannot handle them. The limitations of traditional data processing applications have led the educational data mining (EDM) and learning analytics (LA) communities to emerge as an alternative to basic approaches for working with educational data [4] [5].

Big Data in Learning

Big data
The term "Big Data" refers to any set of data [6] that is too large and moves too fast, and is therefore too difficult and too complex to be dealt with by traditional data-processing application software. Indeed, processing the data produced by learning environments has become a real challenge, which has made it necessary to use Big Data technologies and tools to handle them.
Generally, Big Data is identified by several fundamental characteristics. Key among them are [7]:

Table 1. Properties of Big Data

Volume
The size of the data plays a crucial role in determining the value that can be drawn from it.

Velocity
How fast the data is generated and processed to meet demand determines the real potential of the data.

Variability
The inconsistency that the data can show at times, which hampers the ability to handle and manage the data effectively.

Variety
Data come in diverse formats, both structured and unstructured.

Big data in higher education
Big Data techniques and tools can be used in a variety of ways in learning analytics, as listed in Table 2 [8]:

Table 2. Different axes of applying Big Data in Learning Analytics

By analyzing the learner's interaction in a learning environment with other learners and tutors.

Attrition Risk Detection
By analyzing the learner's behavior using measures implemented at the beginning of the learning process to detect the risk of students dropping out.

Data Visualization
By using visualization techniques to easily identify relations and trends in the data simply by looking at the visual reports.

Intelligent Feedback
By providing, through the LMS, intelligent and immediate feedback to students in response to their inputs in order to improve their interactions and performance.

Course Recommendation
By recommending the best courses to students based on their interests and activities. This helps ensure that students are not misguided in choosing fields.

Student skill estimation
By analysing interaction of the learner with the system or in the message boards or discussion forums.

Behavior Detection
By analyzing the learner's behavior in community-based activities or games, which helps in developing a student model.

... will allow us to draw a guideline that could be used afterwards to perform an analysis with one of these tools or to explore a research question. Among the major challenges in the field of EDM is the transformation of raw and incomplete data streams into meaningful variables. This transformation remains a complex process, since data usually come in different forms that are not ready for analysis; the data need to be cleaned in order to remove cases and values that are actively incorrect, so that they can be converted into a more meaningful format [9]. As a first step, we list some tools well suited for the manipulation, cleaning, and formatting of data, as well as for feature engineering and data creation. We also discuss the role of programming and querying languages in this task of data manipulation and formatting. In the second part, once data have been cleaned, transformed into a meaningful format, and structured appropriately for analysis, the problem for an EDM or LA researcher is how to analyse these data, what models can be constructed, and what relationships can be mapped and explored. Several tools and packages that are well suited for testing, analysing, and modelling data are discussed later. In the last part of our discussion, after the analysis has been conducted and the model has been validated by researchers, the issue is to visualize the information in a meaningful way. We discuss some tools and packages that allow data scientists to create informative graphs, diagrams, models, networks, charts, and other forms of visualized information.
In the next section, we discuss in detail the tools that are relevant to these types of specialized data, especially those frequently used in EDM because of their relevance and their popularity among researchers and practitioners.

Data manipulation and feature engineering
The process of data mining can begin only once datasets have been cleaned and prepared from their raw state. This is a recurring problem, and it becomes more complex when data miners have to work with log data or learning management system (LMS) data recorded in forms that are not directly amenable to analysis. Educational data are generally known to be messy: they are sometimes incomplete, some parts may have to be merged, and they often come in different and unusual formats. For example, if a tutor is interested in identifying off-task learners [10] [11], part of the information can be found in the system's log files as raw time stamps. In this case, Baker [12] and Veeramachaneni et al. [13] recommend a feature engineering process to create new variables in order to conduct the desired analyses. In what follows, we present tools that can be used for cleaning, organizing, and creating data. For each tool, we discuss its advantages and its utility for restructuring large data sets and for creating and managing new and more useful variables from existing ones.
EDM Workbench: It is a tool that aims to address the limitations of Excel and Google Sheets for specific tasks such as the generation of complex sequential features and data labelling [14]. EDM Workbench allows the user to define the set of features by which the data should be grouped into subsets of learner-tutor transactions (referred to as ''clips''). Feature creation in the EDM Workbench is based on XML and supports the extraction of several features used in the existing literature and in intelligent tutoring systems (ITS). Examples of such features include the time the learner spent on the problem, and the number and proportion of correct, wrong, or help actions for the current skill over the last n steps, for the skill, or for the learner. In addition, it allows data labelling by creating text replays, which are printed segments of human behaviour [15]. These are coded by researchers or other domain experts in terms of categories of behaviour or other labels of interest. Finally, EDM Workbench supports sampling, reliability checking, and the synchronization and organization of extracted features and labels.
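To make the "last n steps" features mentioned above more concrete, the following is a minimal sketch in plain Python. It does not use EDM Workbench or its XML feature definitions; the record fields (`learner`, `skill`, `outcome`) and the function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque

def last_n_proportions(transactions, n=3):
    """For each transaction, compute the share of correct / wrong / help
    actions among the learner's previous n actions on the same skill.
    Transactions are assumed to be sorted chronologically."""
    # (learner, skill) -> outcomes of the most recent n actions
    history = defaultdict(lambda: deque(maxlen=n))
    rows = []
    for t in transactions:
        recent = history[(t["learner"], t["skill"])]
        total = len(recent)
        row = dict(t)
        for outcome in ("correct", "wrong", "help"):
            hits = sum(1 for o in recent if o == outcome)
            row[f"prop_{outcome}_last_n"] = hits / total if total else 0.0
        rows.append(row)
        # Update the history only after the features were computed,
        # so the current action never leaks into its own features.
        recent.append(t["outcome"])
    return rows

# Hypothetical learner-tutor transactions (illustrative data only).
transactions = [
    {"learner": "s1", "skill": "fractions", "outcome": "wrong"},
    {"learner": "s1", "skill": "fractions", "outcome": "help"},
    {"learner": "s1", "skill": "fractions", "outcome": "correct"},
    {"learner": "s1", "skill": "fractions", "outcome": "correct"},
]
features = last_n_proportions(transactions, n=2)
```

Each output row keeps the original fields and adds three proportion columns, which is the kind of sequential feature that is tedious to produce in a spreadsheet.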
Python and Jupyter Notebook: Several programming languages allow the manipulation of data and the engineering of features. One of these is Python, which is considered particularly well suited for these purposes, especially for engineering context-dependent or temporal features, compared to Excel and Google Sheets. Jupyter Notebook [16] is a web-based interactive computational environment that allows users to create and share documents covering data cleaning and transformation, numerical simulation, data visualization, statistical modeling, machine learning, etc. Nonetheless, visual inspection of data and created features is easier in Excel or Google Sheets than in Jupyter. Among the difficulties encountered by data scientists is the time required to identify missing data, duplicates, or unusual values, and to deal with unusual data formats such as the JSON (JavaScript Object Notation) files produced by several online learning platforms and Massive Open Online Courses (MOOCs). Such formats can be handled with Python. Although Python is more powerful than the spreadsheet tools mentioned above in accommodating larger data sets and tasks involving nested loops, it still faces data size limitations and becomes slower when processing large data.
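As an illustration of the kind of cleaning and temporal feature engineering described above, the sketch below parses a small nested JSON log using only the Python standard library. The log structure, field names, and helper functions are assumptions made for this example and do not correspond to the export format of any particular platform.

```python
import json
from datetime import datetime

# Hypothetical nested JSON log with one duplicate and one missing time stamp.
raw_log = """
[
  {"user": {"id": "s1"}, "event": "problem_open",   "time": "2020-03-01T10:00:00"},
  {"user": {"id": "s1"}, "event": "problem_submit", "time": "2020-03-01T10:02:30"},
  {"user": {"id": "s1"}, "event": "problem_submit", "time": "2020-03-01T10:02:30"},
  {"user": {"id": "s2"}, "event": "problem_open",   "time": null},
  {"user": {"id": "s1"}, "event": "problem_open",   "time": "2020-03-01T10:20:00"}
]
"""

def clean_events(text):
    """Flatten nested records, drop rows without a time stamp, remove duplicates."""
    seen, cleaned = set(), []
    for rec in json.loads(text):
        if rec.get("time") is None:
            continue  # missing value
        key = (rec["user"]["id"], rec["event"], rec["time"])
        if key in seen:
            continue  # exact duplicate
        seen.add(key)
        cleaned.append({
            "learner": rec["user"]["id"],
            "event": rec["event"],
            "time": datetime.fromisoformat(rec["time"]),
        })
    return sorted(cleaned, key=lambda r: (r["learner"], r["time"]))

def add_seconds_since_previous(events):
    """Engineer a temporal feature: seconds elapsed since the learner's previous event."""
    previous = {}
    for ev in events:
        last = previous.get(ev["learner"])
        ev["seconds_since_prev"] = (
            (ev["time"] - last).total_seconds() if last is not None else None
        )
        previous[ev["learner"]] = ev["time"]
    return events

events = add_seconds_since_previous(clean_events(raw_log))
```

A long gap between consecutive events could then serve as one input to an off-task detector, although the threshold for "long" would have to be chosen by the researcher.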
Structured Query Language (SQL): SQL is a language designed to manage data in some (but not all) databases. It is mostly useful for extracting exactly the desired data, sometimes by integrating (joining) multiple database tables. SQL, as well as distributed processing frameworks such as Hadoop [17] or Spark [18], allows significantly faster processing of basic tasks (such as selecting a specific subset of learners or obtaining data from a specific date range) than any of the aforementioned tools. In addition, SQL can work effectively in combination with Excel and Python for sorting and filtering tasks.
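The two basic tasks mentioned above (selecting a subset of learners and restricting to a date range, with a join across tables) can be sketched with SQL run from Python through the standard `sqlite3` module. The table names, column names, and data are hypothetical and serve only as an illustration.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE learners (learner_id TEXT PRIMARY KEY, course TEXT);
CREATE TABLE events (learner_id TEXT, action TEXT, event_date TEXT);
INSERT INTO learners VALUES ('s1', 'math101'), ('s2', 'math101'), ('s3', 'bio200');
INSERT INTO events VALUES
  ('s1', 'forum_post',  '2020-03-02'),
  ('s1', 'quiz_submit', '2020-03-15'),
  ('s2', 'forum_post',  '2020-04-20'),
  ('s3', 'forum_post',  '2020-03-05');
""")

# Join the two tables, keep one course and one date range,
# and count events per learner.
query = """
SELECT l.learner_id, COUNT(*) AS n_events
FROM learners AS l
JOIN events AS e ON e.learner_id = l.learner_id
WHERE l.course = ?
  AND e.event_date BETWEEN ? AND ?
GROUP BY l.learner_id
ORDER BY l.learner_id;
"""
rows = conn.execute(query, ("math101", "2020-03-01", "2020-03-31")).fetchall()
conn.close()
```

Dates are stored here as ISO-formatted text so that string comparison in `BETWEEN` matches chronological order; a production schema might use a dedicated date type instead.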

Algorithmic analysis
This step consists of analysing and modelling datasets and validating the resulting models, once features have been engineered, outcome variables have been labelled, and data have been sampled and appropriately structured. A large collection of modelling frameworks and algorithms is detailed in this section. All these tools are used to model and predict processes and relationships in pedagogical data.
RapidMiner: RapidMiner is a data science software platform that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics, with an extensive set of classification and regression algorithms as well as algorithms for clustering, association rule mining, and other applications [19]. Among the various data mining tools that exist, RapidMiner's graphical programming language is among the most powerful, since it allows, for example, cross-validation to be conducted at multiple levels (such as learner-level or course-level cross-validation) using the BatchCrossValidation operator, which is an advantage over the graphical languages of other data mining packages. To help the user evaluate the goodness of a model, RapidMiner provides a large range of metrics. Models can be built from mathematical models based on RapidMiner code or XML files [20]. The integration of an application programming interface (API) into RapidMiner's graphical programming language makes it possible to carry out additional and different tasks. Moreover, it integrates all of the algorithms available in WEKA, detailed below. RapidMiner is offered as free open-source software and is free for academic use; it is also a commercial product, with licenses available through the publisher, Rapid-I. A wide set of tutorials on how to use the graphical programming language is available on the official website.
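The idea behind learner-level cross-validation, as mentioned above, is that all rows belonging to one learner fall in the same fold, so a model is never tested on a learner it has already seen during training. The following plain-Python sketch illustrates that idea only; it is not RapidMiner's BatchCrossValidation operator, and the function name and data are hypothetical.

```python
def learner_level_folds(learner_ids, k=3):
    """Split row indices into k folds so that every learner's rows
    end up together, either all in training or all in testing."""
    learners = sorted(set(learner_ids))
    # Assign each learner (not each row) to a fold, round-robin.
    fold_of = {learner: i % k for i, learner in enumerate(learners)}
    folds = []
    for fold in range(k):
        test_idx = [i for i, l in enumerate(learner_ids) if fold_of[l] == fold]
        train_idx = [i for i, l in enumerate(learner_ids) if fold_of[l] != fold]
        folds.append((train_idx, test_idx))
    return folds

# One entry per data row, giving the learner who produced that row.
rows_learner = ["s1", "s1", "s2", "s3", "s3", "s4"]
folds = learner_level_folds(rows_learner, k=2)

for train_idx, test_idx in folds:
    train_learners = {rows_learner[i] for i in train_idx}
    test_learners = {rows_learner[i] for i in test_idx}
    # No learner contributes rows to both sides of a split.
    assert train_learners.isdisjoint(test_learners)
```

The same function gives course-level cross-validation if course identifiers are passed instead of learner identifiers.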
WEKA: WEKA is free and open-source software comprising a set of machine learning algorithms. It offers tools for data mining tasks such as regression, classification, association rule mining, clustering, and visualization [21]. Users can invoke data mining algorithms through a graphical user interface (GUI), from the command line, or through a Java API. The GUI does not give users access to all of the advanced functions available through the command line interface and the API. Support for PMML (Predictive Modeling Markup Language) files has been integrated into the Weka scoring plugin, and a new PMML classifier scoring plugin for the Weka KnowledgeFlow has been completed. From Weka 3.6.0, PMML models can be run from the Classify panel in Weka's Explorer user interface and from the command line. Learning to use WEKA is supported by a book by Witten, Frank, Pal and Hall [22], now in its fourth edition. The WEKA website also hosts an active mailing list, tutorials, wikis, and bug reports.
SPSS: SPSS is a widely used program for statistical analysis in social science, including a large set of regression frameworks, statistical tests, factor analyses, and correlations. SPSS Modeler was created by IBM in order to build predictive models and conduct other new analytic tasks [23] [24]. From the outset, one of its main goals was to eliminate the unnecessary complexity of data transformations and to make complex predictive models very easy to use. In addition, functionality has been added for using the target class in feature selection, which is not available in many other packages. Although SPSS is considered a complete statistical analysis tool, it faces modelling limitations compared to the other tools in this section. SPSS remains less flexible in terms of customization and is also less well documented. SPSS is available commercially from the official IBM website.
KNIME: KNIME is a free and open-source data analysis and reporting platform, generally similar to WEKA and RapidMiner [25] [26]. It integrates many components for data mining and incorporates all of WEKA's algorithms. In addition, KNIME offers several algorithms in different areas such as social network analysis (SNA) and sentiment analysis. One of the biggest advantages of KNIME is its capacity to incorporate data from multiple sources (e.g., a database of learners, a Word document of text responses, and a CSV file of engineered features). Finally, several KNIME extensions allow interfacing with programming languages such as Python, R, SQL, and Java.
Orange: Orange is an open-source data visualization, machine learning, and data mining toolkit [27]. It contains fewer algorithms than the other tools mentioned before, but offers many commonly used ones, such as random forests, kNN, and naive Bayes. Orange's interface is comparatively easy to understand: it uses color-coded widgets to distinguish between data input and cleaning, visualization, regression, and clustering. Orange also offers the possibility of customizing visualization modules in order to present model results in the best way. Compared to the other tools cited in this section, Orange is limited in the scale of data it can handle and may be better suited to smaller research projects.
KEEL: KEEL (Knowledge Extraction based on Evolutionary Learning) is an open-source Java software tool that can be used by EDM researchers for a large number of different knowledge discovery tasks [28]. KEEL provides a simple GUI based on data flow to design experiments with different datasets and computational intelligence algorithms (paying special attention to evolutionary algorithms) in order to assess the behaviour of the algorithms [29]. It contains a wide variety of classical knowledge extraction algorithms, pre-processing techniques (training set selection, feature selection, discretization, imputation methods for missing values, among others), computational intelligence-based learning algorithms, hybrid models, statistical methodologies for contrasting experiments, and so forth. It makes it possible to perform a complete analysis of new computational intelligence proposals in comparison with existing ones. KEEL has relatively less support for new users than most other data mining packages, though there are help features and a user manual. KEEL is open source and free for use under a GNU license. Moreover, KEEL has been designed with a two-fold goal: research and education.
Spark MLLib: Apache Spark is a framework for large-scale processing of data across multiple computer processors in a distributed fashion [30]; MLLib is its machine learning library. Spark can connect with several programming languages, including Java, Python, and SQL, through an API, allowing these languages to be used for distributed processing [31]. Although MLLib's functionality is still somewhat limited, and it is essentially a programmatic tool (which reduces its usability for non-programmers), its distributed nature makes it an efficient and rapid choice.
Rattle (R Analytical Tool To Learn Easily): Rattle is a popular, free, and open-source GUI-based data mining tool from Togaware that uses the Gnome graphical interface [32]. Rattle supports unsupervised and supervised data mining and machine learning models. It allows the dataset to be partitioned into training, validation, and testing sets. Data in Rattle can also be summarized visually.
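The partitioning of a dataset into training, validation, and testing sets mentioned above can be sketched in a few lines of plain Python. This is not Rattle's implementation; the 70/15/15 proportions, the seed, and the function name are assumptions chosen for illustration.

```python
import random

def partition(rows, train=0.70, validate=0.15, seed=42):
    """Shuffle the rows reproducibly and cut them into three disjoint parts.
    Whatever remains after the training and validation parts becomes the test set."""
    shuffled = rows[:]                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a repeatable split
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validate)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )

dataset = list(range(100))  # stand-in for 100 learner records
train_set, valid_set, test_set = partition(dataset)
```

The validation set is typically used for tuning a model and the test set only for the final, unbiased performance estimate.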

Visualizations
After the data extraction and analysis phases, the visualization phase supports both analysts and practitioners in deriving meaning from data [33] [34] [35] [36]. In this section, we introduce some general tools that are relevant to data analysis, especially tools and methods that enable visual analytics. These tools make it possible to build interactive visual interfaces in order to gain knowledge and insight from data, as well as to communicate important implications for learning to learners and tutors.
Tableau: Tableau offers many products for interactive data analysis and visualization [37]. These products have been widely applied in learning environments to analyse learners' data, provide actionable information, and improve pedagogical reports and tutoring practices. Using Tableau requires no programming knowledge to analyse enormous amounts of data from various sources [38]. This advantage makes a range of visualizations accessible to a larger community. The visualizations are displayed to users dynamically and in real time, through rich and interactive dashboards. Another advantage of this tool is its ability to import data from different standard storage formats (e.g., data warehouses, databases, log files, etc.). One of the limitations of Tableau is its inability to support relational data mining or predictive analytics; furthermore, it is a commercial tool, which does not allow extensions or integration with other software platforms.
D3.js: D3.js is a JavaScript library for manipulating documents based on data; it helps researchers build dynamic, interactive data visualizations in modern web browsers [39]. Even though D3.js is free and open source, does not require installation, supports code reuse, and can build a wide range of data visualizations, adopting it for educational research purposes remains a big challenge. D3.js faces compatibility problems with some browsers (e.g., Internet Explorer) as well as some performance limitations for larger datasets [40]. In addition, to guarantee privacy and data security, data pre-processing is required in order to hide data from the users of visualizations. In parallel, several other data visualization tools exist that provide different ways of presenting data visually and building interactive dashboards, such as JavaScript InfoVis Toolkit, Raw, jpGraph, Chart.js, and Google Visualization API. We do not detail these tools in this section because they have been used less frequently by EDM and LA researchers than D3.js.

Specialized EDM & LA applications
Until now, we have listed and discussed general-purpose tools and applications for EDM modeling and analysis. Sometimes, analysis goals and types of data require more specialized algorithms that are not offered by these general-purpose tools. In this section, we cover some of the most popular specialized tools used by practitioners and researchers for these purposes.
SNA: Social network analysis (SNA) is a sociological approach based on network theory applied to social networks. It represents social relations by means of nodes and links. Nodes are usually the social actors or institutions, and links are the relationships and interactions between these nodes. SNA is commonly used to analyze collaborative social networks (e.g., learner interaction within MOOCs, in social media, or in online courses within an LMS).
Gephi: It is a popular interactive software package written in Java for real-time graph and network analysis and visualization [41]. It provides easy and broad access to network data and allows for spatialization, filtering, navigation, manipulation, and clustering. Gephi also offers a Java API to manipulate social network graphs, to calculate several measures (e.g., average path length, density, and degree centrality), and to execute algorithms usually used in SNA (e.g., graph clustering and giant connected component extraction). Gephi is available on several operating systems and is commonly applied in LA research. It is distributed under the GPL license.
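Two of the measures named above, density and degree centrality, can be illustrated on a small reply network with plain Python. The sketch does not use Gephi or its Java API; the learner names and the reply list are invented for the example, and the graph is treated as undirected.

```python
from collections import defaultdict

def build_graph(replies):
    """Build an undirected graph (node -> set of neighbours) from reply pairs."""
    graph = defaultdict(set)
    for a, b in replies:
        if a != b:            # ignore self-replies
            graph[a].add(b)
            graph[b].add(a)
    return graph

def density(graph):
    """Share of all possible links that are actually present."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(neighbours) for neighbours in graph.values()) / 2
    return 2 * edges / (n * (n - 1))

def degree_centrality(graph):
    """Number of neighbours of each node, divided by the maximum possible (n - 1)."""
    n = len(graph)
    if n < 2:
        return {}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}

# Hypothetical forum replies: (author of the reply, author of the original post).
replies = [("ana", "ben"), ("ben", "carl"), ("carl", "ana"), ("dina", "ana")]
graph = build_graph(replies)
centrality = degree_centrality(graph)
```

In this toy network, `ana` is linked to every other learner, while `dina` is linked to only one, which is the kind of contrast between central and peripheral participants that SNA tools make visible.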
EgoNet: It is a free SNA program that helps the user to create surveys, to collect and analyse egocentric network data (all social network data of a website on the Internet), and to produce data matrices and general network measures that can be used for further analysis by other programs [42]. Furthermore, since members provide information about the network structure from their own perspective (hence ''ego'' in the name), EgoNet offers a collection of analysis tools to better understand the overall network structure, with the possibility of asking a member of the network further questions.
NodeXL: It is a free package for Microsoft Excel that facilitates the exploration of network graphs from a large variety of input data formats [43]. The basic version of NodeXL provides a set of tools for filtering and visualizing the data. NodeXL Pro offers additional features that extend NodeXL Basic, allowing easy access to various social media network data streams (e.g., Twitter, YouTube, and Flickr) and offering advanced network metrics and powerful report generation.
Pajek: It is a free desktop tool for complex analysis of a large variety of huge networks, including networks of social interactions [44]. It is widely used in LA research and academia for SNA. Pajek supports network partitioning, information flow analysis, and community detection. A recent version, called Pajek-XXL, is designed to work with extremely large networks (with millions of nodes and more). Pajek is available on Windows, Mac OS, and Unix.
Sonia: Social Network Image Animator (SoNIA) is an open-source platform specialized for longitudinal analysis of networks [45]. It can make use of information about the time at which relationships occurred, or at least the order in which relationships developed between members. This allows a better visualization of network changes over time. The final result is an animated image of structural changes over time, which can be exported into the QuickTime video format. The Sonia project is based on the Java programming language, was developed at Stanford University, and can be used on all major operating systems.
NetworkX: It is a Python package for creating, manipulating, and analyzing the structure, dynamics, and processes of complex networks [46]. NetworkX is widely used in academic research and offers a wide range of advanced functionalities for manipulating networked data, such as graph clustering, graph reduction, community detection, network triad analysis, link prediction (finding missing links, e.g., a missing Facebook connection between two friends), and others.
SNAPP (Social Networks Adapting Pedagogical Practice): It is a software tool that performs real-time social network analysis and visualization of discussion forum activity within a Learning Management System (e.g., Desire2Learn, Blackboard, and Moodle) [47]. Data on learners' posting and replying interactions, taken from the HTML pages of LMS discussions, can be exported for further analysis or visualized within SNAPP. Visualization and analysis can be carried out using several graph layout algorithms. In addition, SNAPP can explore the evolution of learner social networks, identify structural holes, analyze highly active and inactive users, and compare analyses of several discussion forums.
R packages (network, sna, igraph, statnet, and ergm): The network package [48] can build and modify network objects, extract network metrics, and visualize network graphs. There are several packages for SNA [49] in the R programming language that are used with the network package to obtain functionalities commonly needed for SNA, such as network regression, graph generation, calculation of node and network metrics, and others. Another package usually used for SNA is igraph [50], which is written in the C programming language with additional language bindings for R and Python. It can be used for building and modifying social networks from a large variety of input formats (e.g., Gephi, Pajek, GraphML), visualizing graphs, and calculating node and network properties. In addition, it supports different network analyses such as graph clustering, block modeling, community detection, and others. Another option for SNA is statnet [51], a collection of packages for network analysis that incorporates the latest improvements in the statistical modeling of networks. Statnet provides a set of tools for the visualization, representation, simulation, and analysis of various forms of network data. This extensive functionality relies on a central Markov chain Monte Carlo (MCMC) algorithm. Finally, the ergm package [52] can provide the same functionalities as statnet and can also be used for statistical modeling of social networks using Exponential Random Graph Models (ERGMs).
Cytoscape: It is an open-source tool developed on the Java platform that allows the visualization of complex networks and their integration with any type of attribute data [53]. It can be applied in various problem domains (e.g., bioinformatics, the semantic web, and SNA). The Cytoscape core distribution offers a basic set of features for data analysis, visualization, and integration, which can then be extended using several user-supplied modules (formerly called plugins). Cytoscape runs on several operating systems.
Text mining: Text mining is a rapidly expanding field that consists of deriving high-quality information from text. Various applications and APIs are available for the tagging, processing, and identification of textual data. Text analysis tools can process parts of speech, sentence structure, and semantic word meaning. Some tools can also detect representational relationships between different sentences and words. Below, we cite a set of popular tools for processing and analysing text.
Linguistic Inquiry and Word Count (LIWC): It is a graphical and easy-to-use computerized text analysis program that calculates the degree to which various categories of words are used in a text [54]. LIWC organizes words into dozens of linguistic and psychological groups that tap social, cognitive, and affective processes. LIWC has been broadly used and validated in a variety of empirical studies.
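The principle of dictionary-based word counting described above can be sketched as follows. The three tiny word lists are invented for illustration and are not the LIWC dictionaries, which are far larger and distributed with the tool; the function name is likewise hypothetical.

```python
import re
from collections import Counter

# Toy category dictionaries (assumptions for this example, not LIWC's own).
CATEGORIES = {
    "affect": {"happy", "sad", "glad", "worried"},
    "cognitive": {"think", "because", "know", "understand"},
    "social": {"we", "friend", "team", "they"},
}

def category_rates(text):
    """Return, for each category, the percentage of words in the text
    that belong to that category's word list."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, vocabulary in CATEGORIES.items():
            if word in vocabulary:
                counts[category] += 1
    total = len(words)
    return {
        category: (100 * counts[category] / total if total else 0.0)
        for category in CATEGORIES
    }

post = "We think the quiz was hard because we did not understand the task"
rates = category_rates(post)
```

Expressing the result as a percentage of all words, rather than a raw count, makes posts of different lengths comparable.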
Coh-Metrix: It is a system for calculating cohesion and coherence metrics using indices of the linguistic and discourse representations of a text [55] [56]. Coh-Metrix calculates the coherence of texts on more than 100 measures divided into 11 categories. With its multiple measures, Coh-Metrix makes it possible to assess deep text cohesion, for example through measures of referential cohesion or narrativity. Coh-Metrix is particularly well suited to the analysis of text features and relationships in the data.
ConceptNet: It is a free semantic network, based on a multilingual knowledge base, that aims to help computers understand the meanings of the words people use [57]. The main objective of ConceptNet is to develop a very large graph of ''commonsense'' knowledge (e.g., ''a keyboard is computer hardware''), which can then be used for understanding and processing natural text. With its wide knowledge base, ConceptNet can categorize textual documents according to corpus topics, analyze sentiment (e.g., detecting emotions in the text), and summarize text, among other uses.
AlchemyAPI: It is an IBM-owned tool based on machine learning (more precisely, deep learning) for natural language processing (in particular, semantic text analysis, including sentiment analysis) [58]. It allows content to be processed in the form of standard text documents or web resources (i.e., accessible through a URL) and supports response formats such as JSON, XML, and RDF. AlchemyAPI is a commercial platform that is paid per API call, but it provides free access for up to 1,000 calls per day. According to studies of AlchemyAPI's performance [59] [60] [61], the best results are obtained when the tool is used for the semantic analysis of long texts, such as research articles or blogs.
TAGME: It is a powerful tool that can identify meaningful short phrases (called ''spots'') in unstructured text and link them to a pertinent Wikipedia page in a fast and effective way [62]. That is, TAGME assigns (if possible) a Wikipedia concept to each of the term sequences in the analyzed text. According to studies comparing TAGME's performance with that of other solutions [63], it achieves its best results on short text segments and comparable precision/recall on longer text. TAGME offers an API for integration with other applications.
Apache Stanbol: It is an open-source tool and a reusable collection of components for semantic content management, with the goal of bringing semantic technologies into existing content management systems (CMS) and supporting text mining [64]. Apache Stanbol is easy to set up and run on a small set of instances, and it makes it possible to incorporate a domain-specific ontology in the annotation process. This is extremely useful when working with locally defined concepts specific to a given educational context. In addition, Apache Stanbol supports integration with different CMS and text annotation in various languages.
Natural language processing (NLP) toolkits (Apache OpenNLP, Python NLTK, and Stanford CoreNLP): These represent an important part of the text mining toolset and are naturally used in the preprocessing stage of the analysis, for (i) splitting paragraphs into individual sentences or words; (ii) extracting syntactic dependencies between words; (iii) assigning categories to each word; (iv) reducing derived words to their root word; (v) extracting named entities (i.e., names of people, places, institutions, monetary amounts, dates); and (vi) resolving coreference (resolution of pronouns to their target nouns). Among the most popular NLP tools is the Apache OpenNLP toolkit [65], a Java-based NLP toolkit that supports most of the common NLP tasks listed above. Python NLTK [66] is an NLP library for the Python programming language with very similar abilities. Another useful toolkit is Stanford CoreNLP [67], which, aside from providing a Java API, also provides a stand-alone command line interface and a set of ''wrappers'' for other programming languages (e.g., Python, R, C#, Ruby, JavaScript, and Scala).
LightSIDE: It is an open-source platform based on the WEKA toolkit to support text mining. It allows the creation of a set of features commonly used for educational text, in particular variables for individual words, punctuation, line length, bigrams (pairs of adjacent words), and word stems [68]. LightSIDE also offers a simplified interface for error analysis that can help researchers iteratively improve their text mining solution.
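Some of the preprocessing steps and text features described above (sentence splitting, word tokenization, reduction of words to a root form, and bigrams of adjacent words) can be illustrated with a deliberately simplified sketch using only Python's standard library. It does not call OpenNLP, NLTK, CoreNLP, or LightSIDE, and the suffix-stripping rule is a crude stand-in for a real stemmer; all function names are hypothetical.

```python
import re

def split_sentences(text):
    """Split after sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lower-case the sentence and extract word tokens."""
    return re.findall(r"[A-Za-z0-9']+", sentence.lower())

def naive_stem(word):
    """Strip a few common English suffixes (a toy rule, not a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def bigrams(tokens):
    """Pairs of adjacent tokens."""
    return list(zip(tokens, tokens[1:]))

text = "Students posted answers. The tutor replied quickly!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
stems = [[naive_stem(t) for t in sentence_tokens] for sentence_tokens in tokens]
first_bigrams = bigrams(tokens[0])
```

The toy stemmer turns "replied" into "repli", which shows why dedicated toolkits with linguistically informed stemmers or lemmatizers are preferred in practice.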
Process and sequence mining: In addition to more classical educational data analyses, such as predicting course persistence or learning outcomes, researchers also aim to track sequences of learner activities in order to understand learning approaches and processes [69] [70]. In this section, we present two tools for this type of application that are commonly used to support EDM and LA research. They are generally used to perform analysis, but they also allow for some level of data preprocessing.
ProM: It is a platform-independent framework developed in Java that supports a wide range of process mining techniques [71]. It supports running process mining in a distributed setting or through batch processing, and it provides a clear specification of the expected inputs and outputs for each of the supported implementations. Furthermore, new plugins can be added at run-time and integrated easily into the analysis process. Another advantage of ProM is that integration with existing information systems does not require any programming. At the time of writing, the most recent version is ProM 6.9.
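To make the idea of process mining on learner activity more concrete, the following minimal sketch counts how often one activity is directly followed by another within each learner's trace, a common first step before a process model is discovered. It is written in plain Python and is not ProM's API; the event log, its field layout and the function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (learner_id, timestamp, activity).
EVENT_LOG = [
    ("s1", 1, "view_lecture"), ("s1", 2, "post_forum"), ("s1", 3, "take_quiz"),
    ("s2", 1, "view_lecture"), ("s2", 2, "take_quiz"),
    ("s3", 1, "view_lecture"), ("s3", 2, "post_forum"), ("s3", 3, "take_quiz"),
]


def build_traces(event_log):
    """Group events by learner and order each learner's activities by time."""
    traces = defaultdict(list)
    for learner, _timestamp, activity in sorted(event_log, key=lambda e: (e[0], e[1])):
        traces[learner].append(activity)
    return dict(traces)


def directly_follows(traces):
    """Count (activity, next_activity) pairs over all traces."""
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts


if __name__ == "__main__":
    for (source, target), n in directly_follows(build_traces(EVENT_LOG)).most_common():
        print(f"{source} -> {target}: {n}")
```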
TraMineR: It is an R package that allows mining, description and visualization of state or event sequence data [72]. Some of the primary features of TraMineR for the analysis and visualization of state sequence data include (i) handling of longitudinal data and conversion between several sequence formats, (ii) plotting sequences (frequency plot, density plot, and more), (iii) sequence transversal characteristics by age point (transversal state distribution, transversal entropy), and (iv) individual longitudinal characteristics of sequences (length, time in each state, longitudinal entropy, complexity and more).
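Some of the sequence characteristics named above can be illustrated with a short sketch: time spent in each state, the Shannon entropy of the state distribution within one sequence (longitudinal entropy), and the distribution of states across learners at one position (transversal state distribution). The sketch is plain Python rather than TraMineR's R functions; the sequences and function names are assumptions, and TraMineR's own measures may be normalized differently.

```python
import math
from collections import Counter

# Hypothetical state sequences: one state per time point for each learner.
SEQUENCES = {
    "s1": ["video", "video", "quiz", "forum"],
    "s2": ["video", "video", "video", "video"],
    "s3": ["quiz", "forum", "quiz", "forum"],
}


def time_in_each_state(sequence):
    """Number of time points a learner spends in each state."""
    return Counter(sequence)


def longitudinal_entropy(sequence):
    """Shannon entropy (in bits) of the state distribution within one sequence."""
    total = len(sequence)
    shares = [count / total for count in Counter(sequence).values()]
    return sum(p * math.log2(1 / p) for p in shares)


def transversal_distribution(sequences, position):
    """Share of learners in each state at a given position (time point)."""
    states = [seq[position] for seq in sequences.values() if position < len(seq)]
    return {state: n / len(states) for state, n in Counter(states).items()}


if __name__ == "__main__":
    for learner, sequence in SEQUENCES.items():
        print(learner, dict(time_in_each_state(sequence)), longitudinal_entropy(sequence))
    print(transversal_distribution(SEQUENCES, 0))
```

A learner who stays in a single state has an entropy of zero, while a learner who divides time evenly between two states has an entropy of one bit.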
PSLC DataShop: It is a free, multifunctional web application that offers a secure place to store and access research data and supports various kinds of research [73]. DataShop focuses on learner-tutor interaction data: its learning curves and error reports provide summary as well as low- and high-level views of learner performance (e.g., hint use, latent knowledge, response times, and other variables of interest). Additionally, its performance profiler aggregates data across various levels of granularity (problem, dataset, knowledge component, etc.).
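A learning curve of the kind mentioned above can be approximated by computing the error rate at each successive practice opportunity on a knowledge component. The following sketch is plain Python and is not DataShop's implementation or export format; the transaction tuples and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical transactions in chronological order:
# (learner_id, knowledge_component, answered_correctly).
TRANSACTIONS = [
    ("s1", "fractions", False), ("s1", "fractions", True), ("s1", "fractions", True),
    ("s2", "fractions", False), ("s2", "fractions", False), ("s2", "fractions", True),
]


def learning_curve(transactions):
    """Return {opportunity number: error rate}, averaged over learners."""
    seen = defaultdict(int)      # (learner, component) -> opportunities so far
    attempts = defaultdict(int)  # opportunity number -> number of attempts
    errors = defaultdict(int)    # opportunity number -> number of errors
    for learner, component, correct in transactions:
        seen[(learner, component)] += 1
        opportunity = seen[(learner, component)]
        attempts[opportunity] += 1
        if not correct:
            errors[opportunity] += 1
    return {opp: errors[opp] / attempts[opp] for opp in sorted(attempts)}


if __name__ == "__main__":
    for opportunity, error_rate in learning_curve(TRANSACTIONS).items():
        print(opportunity, error_rate)
```

An error rate that falls as the opportunity number grows is usually read as evidence that learners are acquiring the knowledge component.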

Conclusion
EDM and LA form a rapidly changing area, and new tools are constantly emerging. In this article, we have reviewed about 40 tools commonly used for data mining and analytics in education. The main objective of this state of the art is to help researchers interested in learning about these emerging methods, in terms of both theory and practical application.
This state of the art shows that no single tool is ideally suited to conducting the entire process of analyzing most data sets from start to finish. Researchers and practitioners using EDM and LA therefore have to combine different tools suited to different tasks. For instance, data generated by a popular MOOC can easily reach more than 50 system transactions. A researcher may use SQL to select only the data of a particular semester, then use Excel to refine this dataset in order to calculate the total learner time in the system. Next, the researcher may use RapidMiner to fit a predictive model and NodeXL to analyze the relationship between forum posts and replies, then use CohMetrix to obtain an overall measure of the textual quality of each learner's posts and replies. Finally, the researcher may use Gephi to visualize the most interesting clusters of learners found within the social network data.
In this paper, we have presented a collection of tools that researchers in the fields of EDM and LA currently use and that, in aggregate, are representative of the different groups of scientists working in this field. As mentioned above, each tool represents a different approach to a different problem, with its own particular strengths and weaknesses. Combining these tools can lead to useful discoveries and is often the best way to perform complex analyses.

Author
Mohammed Salihoun obtained his doctorate in computer science in 2018 at EMI (Ecole Mohammadia des Ingénieurs, Mohammadia School of Engineers) of Mohammed V University (UM5) in Rabat, Morocco. He has been teaching computer science since 2012. His areas of interest are e-learning and Big Data.