Using Learning Analytics to Explore Responses from Student Conversations with Chatbot for Education

Chatbots simulate human conversations through computer programs using natural language. They are developed for a variety of reasons and purposes, as virtual characters and animators or as an interactive game component. Today, Chatbots for education are widely used to allow students to engage with learning content on an ongoing basis. In this study, students use the Chatbot to query web programming learning content such as code description, coding and problem-solving. However, the Chatbot's successful responses to students' questions are unknown. The teachers also do not know the desired learning content of the students using the Chatbot. Thus, the objectives of this study are to explore the likelihood of a student getting successful responses in each conversation and to identify the learning content desired by the students. The learning analytics method is used to analyze the learning data from the Chatbot. The data analyses performed are descriptive analysis and a binomial probability test. The results of the study showed that the value of successful responses of the Chatbot is high. The learning content most desired by the students is related to three categories of web programming content, namely hypertext preprocessor (PHP), database and structured query language (SQL) and hypertext markup language (HTML). The Chatbot is updated based on the proposed actions to provide more efficient responses.


Introduction
Chatbot is a computer program that can mimic human speech. Chatbot users can communicate through text or voice input on a computer screen with text or audio output. They are created for a wide range of reasons and purposes [1]. The integration of information and communication technologies to support online learning is very important today [2][3][4][5]. As part of the technological development of digital learning, Chatbot for education is now widely used [6].
Web programming is the most difficult programming subject for college students to understand [7][8][9][10]. Students need learning content support to be able to learn outside the classroom. Using Chatbot for education is one of the identified methods that can help students get learning content at any time. However, teachers are not aware of the learning content desired by students who perform self-study using the Chatbot. It is also likely that the Chatbot provides answers that do not coincide with the students' desired learning content. Results from previous studies suggest that learning analytics can be used to explore student learning using online learning applications [11][12][13][14][15][16]. Learning analytics is a field of research that has its roots in various fields related to learning. Learning analytics research integrates several existing techniques such as action research, academic analytics, educational data mining (EDM), recommender systems and personal adaptive learning. Learning analytics is considered an umbrella and global term to describe the field of technology-enhanced learning (TEL). Learning analytics research focuses on the development of methods for analyzing and tracking trends in learning data obtained in educational settings, as well as the use of these methods to support learning experiences [17][18][19][20][21].
This study uses Learning Analytics to explore the responses of student conversations with Chatbot for Education. The results of the analysis are used to improve the learning content and functionality of the Chatbot. Exploring the learning content desired by students is important for them to get sufficient, accurate, and meaningful information [22]. The following objectives are identified to achieve the purpose of the study:
─ Study Chatbot and Learning Analytics.
─ Explore the probability of a student getting successful responses in each conversation.
─ Identify the learning content desired by the students.
The structure of this article is as follows. Section 2 presents a review of the literature on the use of Chatbots and Learning Analytics. Section 3 presents the methodological aspects of the study. Section 4 presents the main results and discusses the relevance of the proposed approach.

Literature review

Chatbots
Chatbots are generally developed to automatically provide specific information in conversations on a topic, such as frequently asked questions, website guides, virtual support agents, virtual sales agents, survey takers, quiz hosts, education and chat room hosts [1,23]. A Chatbot simulates the language of human conversation with a text-based dialogue system in a computer program. Chatbots were initially developed using a simple keyword matching technique to find user input matches [24]. Subsequently, Chatbots were developed using different pattern matching algorithms to simulate fictional or real personalities [25].
Chatbots are currently developed to generate feedback based on a machine learning model or several heuristic search techniques to select feedback based on a predefined set of responses. Each module in Chatbot contains a set of keywords from users and feedback from the system database. Modules in Chatbot are referred to as intentions or intents that describe keywords for information retrieval from users. Each built-in module describes the scope of content for a particular user conversation scenario interacting with Chatbot.
There are two models of Chatbot architecture commonly used for Chatbot development, namely the generative and retrieval-based models, as shown in Figures 1 and 2 [26]. A Chatbot that uses a generative model is not limited to responses that the Chatbot developer has determined. The generative model can generate new dialogues through the collection and processing of large conversational training data. The Chatbot needs many examples and samples of conversations as a training set for the deep learning model in order to produce quality conversations automatically.
The retrieval-based model Chatbot is trained to provide the best feedback from the feedback database that has been developed. The feedback provided is based on existing information. Chatbot development that uses this model is easier than the generative model. This model can provide more predictable feedback. The Chatbot of this kind uses an information database such as frequently asked questions to provide relevant feedback to the current conversation.
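The retrieval-based idea described above can be illustrated with a minimal sketch. It assumes a simple keyword-overlap score and a small, hypothetical response library (`INTENTS`, `FALLBACK` and `respond` are illustrative names, and the stored answers are invented for the example); it is not the matching method used by the Chatbot in this study.

```python
import re

# Hypothetical response library: each intent has trigger keywords and one stored response.
INTENTS = {
    "php_basics": {
        "keywords": {"php", "echo", "variable"},
        "response": "PHP is a server-side scripting language.",
    },
    "sql_select": {
        "keywords": {"sql", "select", "database", "query"},
        "response": "A SELECT statement retrieves rows from a table.",
    },
    "html_tags": {
        "keywords": {"html", "tag", "element"},
        "response": "HTML elements are written with tags such as <p>.",
    },
}

# Returned when no intent matches, mirroring a request for more information.
FALLBACK = "Sorry, I did not understand. Could you give more detail?"


def tokenize(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def respond(query):
    """Return (intent_name, response); intent_name is None when nothing matches."""
    tokens = tokenize(query)
    best_intent, best_score = None, 0
    for name, intent in INTENTS.items():
        # Score an intent by how many of its keywords appear in the query.
        score = len(tokens & intent["keywords"])
        if score > best_score:
            best_intent, best_score = name, score
    if best_intent is None:
        return None, FALLBACK
    return best_intent, INTENTS[best_intent]["response"]


intent, answer = respond("How do I write a SELECT query in SQL?")
```

Because every answer comes from the stored library, the output is predictable, and a query with no matching keywords falls through to the fallback response.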
Chatbot for education is usually developed according to the learning content of a subject. It can understand the keywords and questions related to the learning content that has been set. The Chatbot uses natural language processing to interpret queries and respond to users. If a question is not understood, the Chatbot will ask the user to provide more information before the actual response is given [27]. However, the answer is limited to the responses that have been defined in the library. When a suitable response is missing from the library, the answer may be inaccurate and may not meet users' needs [14].

Learning analytics
Learning analytics is described as the collection, analysis, measurement, and reporting of data about users and their contexts with the goal of understanding and optimising learning and its environment [13,25]. Another definition of learning analytics is the process of developing actionable insights through the definition of a problem and the use of statistical models and analysis of existing data and/or future data simulations [28]. Learning analytics is more concerned with making judgments and actions, which contrasts with EDM that focuses on developing methods to explore unique data types from educational settings. Although the techniques used in both areas are similar, EDM focuses more specifically on reduction analysis [29]. Future analytical techniques and tools developed from the two fields are expected to overlap [30]. Analytics in education can also exist at various levels, ranging from individual classrooms, departments, universities, regions, states and international. Shum, Knight & Littleton [31] classified this level of organisation as micro-analytic, meso-analytic and macro-analytic layers. Each level gains access to a different dataset of quantity, variety, and context.
The field of learning analytics encompasses three main components: tools, techniques, and application [29,32]. Tools refer to the apparatuses used for learning analytics. In contrast, techniques refer to the methods used to perform learning analytics. Application refers to the use of these techniques to improve teaching and learning. According to Baker and Yacef [33], the five main areas of learning analytics and data mining are:
─ Prediction requires developing a model that can infer one aspect of the data (predicted variable) from some combination of other aspects of the data (predictor variables).
─ Clustering refers to finding data points that naturally group together and can be used to divide an entire data set into categories.
─ Relational mining includes discovering associations between variables in a data set and encoding them as rules for later use.
─ Distillation for human judgment is a technique that presents data in a way that allows humans to identify or classify the characteristics of the data quickly.
─ Discovery with models is a technique that applies a validated model of a phenomenon (developed through prediction, clustering or manual knowledge engineering) as a component in further study.
The differences between the techniques and applications of learning analytics indicate the difficulty of researchers in explaining the definition and taxonomy of learning analytics [34]. The use of learning analytics to predict and model student activities allows early intervention to be performed to prevent dropouts from occurring [16].
Studies on learning analytics related to Chatbots include prediction and personalised learning [1][35][36][37]. Predictive study in learning analytics is concerned with exploring the conversation data of Chatbot users. The data obtained are then analysed to explore Chatbot responses in the future. Meanwhile, a personalised learning study is about analysing the data explored to enhance the learning experience. The learning rates and teaching approaches can be optimised for the needs of each user through personalised learning. Based on the previous studies, the learning analytics approach is suitable to explore the responses of students' conversations with Chatbot for education.

Chatbot development
The Chatbot for education is developed based on the retrieval-based model for the subject of Web Programming. Chatbot development uses Android Studio (an Integrated Development Environment (IDE) for Android application development), the Dart programming language, the Flutter framework and the Dialogflow platform. The use of the Dialogflow platform allows every conversation between the student and the Chatbot to be recorded using the built-in analytics and history functions. Figure 3 shows the conversational Chatbot architecture used. The descriptions of the figure are as follows:
• Student types a query in the chat client (mobile app).
• Chatbot reads and sends the query to the Dialogflow.
• Dialogflow extracts the student's intent and entities from the given phrase.
• The decision engine in Dialogflow will find the right intent for the response.
• Chatbot forwards the response to the chat client.
The Chatbot will respond to each conversation with the student covering the learning content of the subject. The responses given are in the form of text, links to the web or multimedia content. The resulting responses are based on a machine learning model and heuristic techniques that select responses from a predefined response library. Therefore, students will get the best available response from the Chatbot. Machine learning allows intent classification algorithms to be trained. Dialogflow offers a web interface for building and testing conversation scenarios.
In this study, the Chatbot was made available on the Android platform as a mobile app and through the Telegram Bot app. The Chatbot mobile app for the Android platform is shown in Figure 4, while the Chatbot for the Telegram Bot app is shown in Figure 5. Students need only type a query and the Chatbot responds immediately.

Sample and procedure
This study focused on exploring the learning content desired by the students who use Chatbot for education (Android mobile app and Telegram bot) in a public university. The study data included 513 total responses from 47 students' conversations with Chatbot over 90 days. This study did not consider students' demographics and experience using the Chat app since they all had similar backgrounds. All students are involved in using Chatbot as their self-learning assistant.
Siemens's learning analytics model was used as a reference to explore the learning content desired by students who use Chatbot for education [34]. Figure 6 shows the model of learning analytics used. The five main components of this model of learning analytics are:
• Data collection
• Data storage
• Data cleaning and filtering
• Data analysis
• Action

Fig. 6. Model of Learning Analytics
Data collection. Dialogflow in Chatbot development allowed students' learning data to be tracked automatically. The analytical data tracked were conversation sessions, intents and session flow. Conversation sessions cover the frequency of conversation sessions and the frequency of queries for each session by day. The frequency of queries can also be divided into periods of 7 days and 30 days. Meanwhile, the session flow is a view of the route commonly taken by the students, including the exit point, based on 30 days of session data. The history menu of Dialogflow stores all records of students' conversations with the Chatbot. Recorded conversations included all platforms that use the Chatbot. All of them are tracked, including conversations whose intents do not match any entry in the response library.
Data Storage. Learning data resulting from student conversations using Chatbot for education was also stored on the Dialogflow platform. The data consisted of student query data, Chatbot responses and conversation sessions. Dialogflow also stores data of successful and unsuccessful responses in each conversation session between students and Chatbot.
Data Cleaning and Filtering. The process of cleaning and filtering is done on the learning data collected. In this process, unstructured data were first structured before being cleaned and filtered. The learning data that need to be cleared are overlapping data, including data that have the same identification at the same time. This prevents the same data from being analysed more than once.
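The removal of overlapping records can be sketched as follows. This is a minimal illustration that assumes each conversation record is a dictionary with hypothetical `session_id` and `timestamp` fields standing in for "the same identification at the same time"; the actual export format of the learning data is not described here, and the sample records are invented.

```python
def remove_duplicates(records):
    """Keep only the first record for each (session_id, timestamp) pair."""
    seen = set()
    cleaned = []
    for record in records:
        # Hypothetical field names; adapt to the real export format.
        key = (record["session_id"], record["timestamp"])
        if key in seen:
            continue  # same identification at the same time: drop the repeat
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Invented sample data for illustration only.
records = [
    {"session_id": "s1", "timestamp": "2021-03-01T10:00:00", "query": "what is php"},
    {"session_id": "s1", "timestamp": "2021-03-01T10:00:00", "query": "what is php"},
    {"session_id": "s2", "timestamp": "2021-03-01T10:05:00", "query": "sql select"},
]
cleaned = remove_duplicates(records)
```

Keeping the first occurrence preserves the original order of the conversation log, which matters if session flow is analysed afterwards.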
Data Analysis. Analysis of conversation data was done to identify the content desired by students. The data analysis performed included descriptive analysis and binomial probability testing. The description for each analysis is as follows:
─ Descriptive analysis: The analysis was performed based on successful and unsuccessful responses.
─ Binomial test: This test was performed to determine the probability of a successful conversational response [38]. The test features are as follows:
(a) This test involves repeated trials.
(b) Each trial has only two possible outcomes: successful or unsuccessful.
(c) The probability that a particular outcome will occur is constant across trials.
(d) All trials in the test are independent.
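The descriptive analysis of response status can be sketched as a simple frequency count. The totals used below (434 successful and 79 unsuccessful responses out of 513) are the figures reported in the results of this study; the per-response list and the `summarize` helper are reconstructed for illustration and are not the original analysis script.

```python
from collections import Counter


def summarize(statuses):
    """Return {status: (count, proportion)} for a list of response statuses."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: (n, n / total) for status, n in counts.items()}


# Response statuses rebuilt from the reported totals (illustrative only).
statuses = ["successful"] * 434 + ["unsuccessful"] * 79
summary = summarize(statuses)

for status, (count, proportion) in summary.items():
    print(f"{status}: {count} responses ({proportion:.1%})")
```

Each response is treated as one trial with two possible outcomes, which is the same framing the binomial test relies on.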
Action plan. Actions proposed on students and Chatbot for education were based on results from data analysis. Previous researchers have suggested actions such as intervening, finding the best solutions, signals and warnings, guiding and assisting, and design improvements [34][39][40].

Results and discussion
An analysis was performed by using the recorded learning data that included each conversation session's successful or unsuccessful responses. Successful response refers to a Chatbot response that coincides with a student query. On the other hand, an unsuccessful response refers to a Chatbot response that does not coincide with the student's query. In addition, the unsuccessful response also refers to the absence of an appropriate response in the Chatbot database when a conversation session occurs. Table 1 displays a partial descriptive analysis of response status from the 513 total number of responses recorded. The binomial test was performed to determine the probability of a successful response in each conversation [38]. Equation (1) shows the calculations for the binomial probability.
$$b(x; n, P) = {}_{n}C_{x} \cdot P^{x} \cdot (1 - P)^{n - x} \qquad (1)$$

where:
─ x: the number of successes in the binomial test.
─ n: the number of trials in the binomial test.
─ P: the probability of success on an individual trial.
─ nCx: the number of different combinations of x successes selected from a set of n trials.
─ b(x; n, P): the binomial probability, that is, the probability that a binomial test of n trials produces exactly x successes when P is the success probability on an individual trial.

The success probability in this study is 0.85 [41].
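Equation (1) and its cumulative form can be evaluated directly. The sketch below is an illustration using only the Python standard library; the success probability P = 0.85 is the value stated in this study, while the trial count n = 10 and success count x = 8 are hypothetical values chosen only to show the calculation.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials, Equation (1)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """Cumulative probability of at most x successes in n trials."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


P = 0.85  # success probability used in this study
n = 10    # hypothetical number of responses in one conversation
x = 8     # hypothetical number of successful responses

exactly_x = binomial_pmf(x, n, P)
at_most_x = binomial_cdf(x, n, P)
at_least_x = 1 - binomial_cdf(x - 1, n, P)

print(f"P(X = {x}) = {exactly_x:.4f}")
print(f"P(X <= {x}) = {at_most_x:.4f}")
print(f"P(X >= {x}) = {at_least_x:.4f}")
```

Summing the point probabilities gives the cumulative probability, so the same helper covers both the binomial and the cumulative binomial calculations.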
Binomial and cumulative binomial probability calculations are done as follows:

Based on the binomial test, the value for successful responses is higher than that for unsuccessful responses. This means that the Chatbot has met most of the learning content desired by students. The bar chart in Figure 7 shows the categories of the 434 successful responses. The three categories of learning content most desired by students are hypertext preprocessor (PHP), followed by database & structured query language (SQL), and hypertext markup language (HTML). The others category refers to general student responses to the subject of Web Programming.

Fig. 7. Successful response categories
The bar chart in Figure 8 shows the categories of the 79 unsuccessful responses. The three categories of learning content most desired by students are PHP, HTML, and database & SQL. These are the same three categories as in the successful responses, which indicates that they are important for students' knowledge. The others category refers to general student responses to the subject of Web Programming.

The categories PHP, database & SQL, and HTML are important components in website development for the Web Programming subject. PHP is used as a back-end (server-side) language and is integrated with HTML to produce the front-end pages. SQL is used for adding, accessing, and processing data in a database. When PHP receives a request from the user, it makes a call to the database using SQL, obtains the requested information from the database and presents it to the user. The importance of learning content from the categories of PHP, database & SQL, and HTML is supported by evidence from the binomial tests. The analysis results also showed that the value of successful responses is higher than that of unsuccessful responses. Even so, the learning content still needs to be updated based on changes in the version of the Web Programming language and the latest technology requirements. Table 2 shows the proposed actions for Chatbot improvements based on the analysis performed. These proposed actions are important to generate higher values of successful responses in the future.

Conclusions
Chatbot for education is an innovative solution to bridge the gap between technology and education. Chatbots can provide immediate answers in conversational sessions with students who have questions about learning content. However, the Chatbot's response does not always coincide with the students' desired learning content. Therefore, learning analytics is used to explore the responses of the students' conversations with the Chatbot. The analysis performed identifies the learning content desired by the students. This study successfully explored the probability of a student getting successful responses in each conversation and identified the desired learning content of the students. Chatbot learning content is updated for more accurate responses. Interactive features are added to ensure the Chatbot is more student friendly. The current study is limited to one batch of web programming students. The study will be extended to other batches of students in the future to evaluate the effectiveness of the proposed actions.

Azliza Yacob is a lecturer and researcher at University College TATI (UCTATI), Malaysia. Her research interests include computer programming, quality control, education, and the computer industry. Her main research concentrates on knowledge management systems. (email: azliza@tatiuc.edu.my).