Automated Data-Driven Hint Generation in Intelligent Tutoring Systems for Code-Writing: On the Road of Future Research

Introductory programming is an essential part of the curriculum in any university engineering discipline. However, many beginning students find it very difficult to learn. In particular, these students often get stuck and frustrated when attempting to solve programming exercises. One way to help beginning programmers overcome these difficulties is to use intelligent tutoring systems (ITSs) for programming, which can provide students with personalized hints on their solving process in programming exercises. Currently, most of these systems rely on manually constructed domain models, which take much time to build, especially for exercises with very large solution spaces. One of the major challenges in building ITSs for programming comes from the diversity of possible code solutions that a student can write. The use of data-driven approaches to develop these ITSs is just starting to be explored. Given that this is still a relatively new research field, many challenges remain unsolved. Our goal in this paper is to review and classify the analysis techniques that are required to generate data-driven hints in ITSs for programming. This work also aims to identify possible future directions in this research field.


Introduction
Programming skills are becoming a core competency for almost every profession, and thus computer science education is being integrated into the curriculum of almost every field of study [1]. However, many students have great difficulty learning to program, and this becomes a barrier to their further studies in computer science and other disciplines. This difficulty is in large part due to students' inability to solve their programming exercises, which may discourage them from progressing further when help cannot be obtained immediately. To address this problem, various approaches have been proposed to help students learn to solve programming exercises. Traditionally, face-to-face, one-to-one human tutoring has been the best option. However, human tutors are not always available, which is why computer-based tutoring has been developed as an alternative form of support. An Intelligent Tutoring System (ITS) is an example of computer-based tutoring that is developed to emulate the human tutor [2]. As shown by VanLehn [3], an ITS that is designed to understand the student's code and to give advice at a fine level of granularity can be just as effective as a human tutor. ITSs for programming are particularly useful for first-year computer science students and non-major students [4]. A current trend in ITSs for programming is to use data-driven techniques to give hints to their users [5,6,7,8,9,10,11,1,12,13,14,15,16,17,18,19,20,21,22]. According to [22], ITSs can provide personalized feedback to students automatically, but they can take large amounts of time and expert knowledge to build, especially when determining how to give students hints. Data-driven approaches can be used to provide personalized next-step hints automatically and at scale by mining previous students' solutions. Instead of spending much time on modeling domain knowledge, the data-driven approach uses a large body of correct student programs.
The data-driven approach uses correct student solutions to construct a solution space that contains all solution states students have created in the past (e.g., in previous semesters of a programming course). The solution states form many possible paths to correct solutions [1]. The primary contributions of this paper are 1) a classification of ITSs for programming, 2) a review of current data-driven hint generation approaches in ITSs for code-writing, and 3) a discussion of the challenges that need to be addressed before we can expect to generate hints for data-driven ITSs for code-writing.

Intelligent tutoring systems
As stated above, face-to-face, one-to-one human tutoring is the best form of tutoring. However, it is extremely expensive in terms of both physical and human resources. ITSs are a natural solution to this problem, as they are developed to give personalized feedback and help to students who are working on problems. ITSs draw on three fields, Computer Science, Psychology, and Education, as illustrated in Figure 1: (i) Artificial Intelligence (AI), within Computer Science, addresses how to reason about intelligence and thus learning; (ii) Psychology (Cognitive Science) addresses how people think and learn; and (iii) Education focuses on how best to support teaching and learning [23].
According to [24], an Intelligent Tutoring System (ITS) is a computer system that provides immediate and customized instruction or feedback to learners. The classical architecture of an ITS includes the following four components (Figure 2) [25,26,27,28,65].
• A knowledge domain model that stores the learning content that is taught to students.
• A student model that stores information about the student's knowledge level, abilities, preferences and needs.
• A tutoring (pedagogical) model, which diagnoses the student, controls the tutoring process, and makes appropriate instructional decisions based on the information provided by the other components of the ITS.
• A user interface that allows the system to interact with the learner.
This traditional view of ITSs is still widely accepted by the ITS community. However, recent studies stress functionality over structure [29,30,25,7,31], describing ITSs as having two main loops [29]: 1) the inner loop and 2) the outer loop (Figure 3) [25]. The inner loop is responsible for providing personalized feedback, hints on student steps, and direct problem-solving assistance to students. It also assesses students' competence and records it in the student model. Using the information obtained about the student, the outer loop performs task selection: its main task is to select an appropriate programming exercise for the student. Here, we focus on the inner loop. We do not consider the outer loop, which can create an overall student model and intelligently choose which programming exercises to show to the student.
According to [32], research on ITSs has accelerated over the last decade, and scholarly interest in such systems has never been greater. ITSs have been developed for a wide range of subject domains (e.g., mathematics, physics, biology, medicine, reading, languages, philosophy, information technology and computer science) and for students at the primary, secondary and postsecondary levels of education.
Founded on several decades of research on human cognition and intelligence, ITS is now a fast growing area in academia and industry. We now turn our attention to some cutting-edge research on ITS in a specific learning domain: programming [33].

Intelligent tutoring systems in the programming domain
In the past four decades, a variety of ITSs for programming have been built to provide tutoring services for programming exercises. In terms of functionality, ITSs for programming can generally be classified into five types: 1) curriculum sequencing, which constructs an individual learning path for each student, including an individual selection of topics to learn, examples, and exercises; 2) intelligent analysis of students' solutions, which focuses on debugging and error diagnosis for complete student programs; 3) program debugging support, which helps students learn to analyze programs; 4) interactive code-writing problem solving support, which provides students with personalized assistance at each step of solving a code-writing problem; and 5) example-based code-writing problem solving support, which suggests the most relevant cases or examples to students. For brevity, we will use the term "ITSs for code-writing" to refer to ITSs for programming that provide interactive code-writing problem solving support.

Automated hint generation in ITSs for code-writing
As described in [1], non-data-driven techniques for hint generation include plan libraries, program transformation, constraint-based models, and strategy-based models. Several recent studies deal with the problem of helping students learn programming, in particular by giving them useful hints in real time while they are coding.
According to [34], ITSs for code-writing that focus on the process of solving an exercise are still rare or have limitations: some target declarative programming [35,6], some are less flexible because they do not support exercises that can be solved by multiple algorithms [36,37], and some only support a static, pre-defined process [38]. Furthermore, it often requires substantial work to add new exercises [39], and tutors can be difficult for a teacher to adapt. ASK-ELLE [11] is an ITS for code-writing for learning the higher-order, strongly typed functional programming language Haskell. ASK-ELLE models alternative solution strategies through several model programs (i.e., model solutions). The system supports the stepwise development of Haskell programs by verifying the correctness of incomplete programs and by providing hints. Programming exercises are added to ASK-ELLE by providing a task description for the exercise, one or more model solutions, and properties that a solution should satisfy. The properties and model solutions can be annotated with feedback messages, and the amount of flexibility that is allowed in student solutions can be adjusted. The disadvantage of this strategy-based approach is that the tutor relies on model solutions provided by instructors/teachers, who are experts in their field and whose solutions serve as examples for students. However, variations on these model solutions are boundless. Programming exercises are characterized by huge and expanding solution spaces, which cannot be covered by manually designed hints.
According to [33], this is a vastly challenging problem, mainly because even for very simple programming tasks there is a multitude of different solution approaches, both syntactically and semantically. Even if we restrict the semantic aspect (i.e., the underlying algorithm) to a single one, the syntactic variations in implementing the algorithm present a daunting task for hint generation. For such programming exercises, it is still possible for ITSs for code-writing to collect implicit data in the form of solutions given by students or teachers/experts.
The data-driven approach is particularly useful when it is hard to come up with a more or less complete set of model solutions. It is worth noting that a range of non-data-driven techniques can also be used to generate feedback and hints for programming exercises automatically [22].
As mentioned by [7], data-driven ITSs are a subfield of ITSs in which decision-making is based on previous students' work instead of a knowledge base built by experts or an author-mapped graph of all possible paths. Successful solutions from the past can be used to provide feedback and hints for students in the present, which circumvents the need to create an expert model. A data-driven tutoring system can be bootstrapped by experts providing missing data. The data-driven approach has proven to work well in combination with artificial intelligence and machine-learning techniques for learning an expert model by demonstration [40].

Automated data-driven hint generation approaches in ITSs for code-writing
New research efforts to tackle broader programming exercises are at a nascent stage and use previous students' solutions to a programming exercise to generate hints for a new student who is working on the same exercise. In recent years, two types of data-driven hint generation have emerged in ITSs for code-writing: hint generation for code correctness and hint generation for code style [41,42]. In this work, we focus on hint generation for code correctness.

Program synthesis approach
In [39], the author used error models and program sketches to find a mapping from the student's current program to a model solution. Rather than relying on a predefined set of solutions, the author used program synthesis to generate a new solution from the student's current program.
However, according to [43], this system requires experts/teachers to define an error model specific to each programming exercise, and it only supports a subset of Python. In [6], the authors relied on analyzing the single-line edits made by students between submissions and then used those edits to attempt to find a correct solution for the Prolog program. Those edits could then be used as a source of hints to be supplied to a new student. However, their technique requires a set of test cases to evaluate the generated programs [12].
Perelman et al. [44] used all common expressions that occurred in students' code to create a database of source code that was then used for hint generation. As mentioned by [7], these techniques have great potential for supporting new and obscure solutions, but they also have the drawback of only working on solutions that are already close to correct; they all tend to fail when the code has many different errors.
Rolim et al. [20] take an example-based approach to learn code fixes as abstract syntax tree transformations from pairs of incorrect and correct student submissions. However, while this approach requires far less engineering effort, it may fail to generate hints, especially when a student's program is not close to a correct solution [17].
Head et al. [19] introduce a mixed-initiative approach which combines teacher expertise with data-driven program synthesis techniques. Their work has demonstrated how program analysis and synthesis can be used as an aid for a teacher to scale feedback grounded in their deep domain knowledge. While scaling up teacher effort, these systems still require teachers to manually review and write hints for incorrect student work [45].
Suzuki et al. [45] explore a design space of hints that can be automatically generated from code transformations learned by program synthesis. The authors' ultimate goal is to adapt the strategies that a human teacher employs to automated hints driven by program synthesis. They identified five types of teacher hints that can also be generated by program synthesis. These hints describe transformations, locations, data, behavior, and examples. Their hints rely on the capabilities of program synthesis techniques to discover code transformations that fix incorrect code. As noted by the authors, such techniques have so far been demonstrated on short assignments in introductory programming classrooms, but in the future it may be possible to learn generalizable fixes for larger, more complex programs.
In [17], the authors present a robust hint generation system that extends the coverage of the program synthesis based approach using two complementary techniques. A syntax checker detects common syntax misconception errors in individual subexpressions to guide students to partial solutions that can be evaluated for the semantic correctness. A program synthesis based approach is then used to generate hints for almost-correct programs. If the program synthesis-based approach fails, a case analyzer detects missing program branches to guide students to partial solutions with reasonable structures. According to the authors, their experience suggests several ways that the system could be improved further.

Cluster based techniques
Gross et al. [46,47] used clustering to infer clusters of computer programs and select the most similar sample solution for hint generation. When the student requires a hint on how to change her/his code to get closer to a correct solution, it can be compared to a similar example from the cluster, and the dissimilarities between her/his code and the example code can be contrasted or highlighted in order to help the student to improve her/his own solution. As noted by the authors, the challenge with this approach is the derivation of solution steps from sample complete solutions in order to reduce the effort for modeling examples.
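The "compare and contrast" step described above can be illustrated with a minimal Python sketch. It is not the method of [46,47]: it skips clustering entirely and assumes plain character-level similarity from the standard `difflib` module as a stand-in for a real program dissimilarity measure. The function names `closest_example` and `contrast` and the sample programs are hypothetical.

```python
import difflib


def closest_example(student_code, examples):
    """Return the stored example most similar to the student's code."""
    def similarity(example):
        return difflib.SequenceMatcher(None, student_code, example).ratio()
    return max(examples, key=similarity)


def contrast(student_code, example):
    """Return the differing lines: '-' marks student lines, '+' example lines."""
    diff = list(difflib.unified_diff(
        student_code.splitlines(), example.splitlines(),
        fromfile="student", tofile="example", lineterm="", n=0))
    # Skip the two file-header lines, then keep only removed/added lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


examples = [
    "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s",
    "def total(xs):\n    return sum(xs)",
]
student = "def total(xs):\n    s = 0\n    for x in xs:\n        s = x\n    return s"

best = closest_example(student, examples)
for line in contrast(student, best):
    print(line)
```

In this toy case the loop-based example is selected, and the highlighted difference is the single assignment line inside the loop, which is the kind of contrast a tutor could show to the student.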
In [13], the authors introduced an alternative representation of computer programs for classification and error detection in ITSs, namely execution traces. The trace representation can be applied to identify erroneous programs, enabling an ITS to detect whether a student has finished a task or still needs to continue. However, they concluded that a syntactic representation is necessary when a program does not yet compile or crashes and wherever the high level of abstraction applied by a program trace is not helpful (e.g. when teaching certain syntactic constructs).
Kaleeswaran et al. [48] propose a semi-supervised technique for feedback generation. This technique clusters the solutions according to the strategy used to solve the problem. Instructors then manually label one correct submission in each cluster, and the incorrect solutions are formally validated against the correct one. However, as noted by the authors, there are many possible directions for improving clustering and verification by designing more sophisticated algorithms.
Gulwani et al. [49] present a novel technique for clustering and repairing introductory programming assignments. They cluster correct submissions using variable traces computed for different inputs. Then, a representative submission from each cluster is selected as a reference solution. An incorrect submission is compared to each reference solution using variable traces, and repairs are computed. The technique provides personalized feedback using the reference solution that requires the fewest repairs. However, a problem with this technique is that it requires inputs that trigger all possible errors, and such inputs are not easy to provide. Variable traces are compared as a whole, so the technique needs a reference solution for every possible variation of a given assignment. Furthermore, the technique is not able to deal with infinite loops or with submissions that contain multiple methods [50].

Recommendation approach
In [51], the authors present a framework that can help students in their coding process by recommending specific code edits relevant to their code. They use a pq-Gram tree edit distance algorithm to match a student's program to its closest counterpart in a database of correct solutions, as well as to identify the set of insertions, deletions and relabelings that will directly transform the student's abstract syntax tree (AST) into this solution. According to the authors, the limitations of this method concern the following three aspects: AST-based program analysis, semantic similarity of programs, and usability testing. Following the example-based learning (EBL) strategy, Chaturvedi [52] presents a framework called Example Recommendation System (ERS) that is built upon EBL and that uses state-of-the-art mining algorithms to recommend a focused, organized and customized list of worked-out examples, with the overall objective of increasing the likelihood of student success in the ITS's domain. However, as noted by the author, a limitation of the algorithms used in this system is that regular expressions (RE) must be constructed manually by experts.
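As a rough illustration of the pq-gram matching idea mentioned above, the following Python sketch builds a bag of pq-grams over a program's AST and uses the resulting distance to pick the closest stored solution. It is a simplified sketch, not the implementation from [51]: node labels are assumed to be AST node type names only (so identifiers and constants are ignored), the parameters `p=2, q=3` are arbitrary defaults, and the names `pq_profile`, `pq_distance`, and the sample programs are hypothetical. It does not compute the edit script itself.

```python
import ast
from collections import Counter


def pq_profile(source, p=2, q=3):
    """Bag of pq-grams of the AST of `source`, labelled by node type name."""
    profile = Counter()

    def visit(node, stem):
        # Stem: the node and its p-1 nearest ancestors ("*" pads missing ones).
        stem = stem[1:] + (type(node).__name__,)
        base = ("*",) * q
        children = list(ast.iter_child_nodes(node))
        if not children:
            profile[stem + base] += 1
            return
        # Base: a sliding window of q consecutive children, padded with "*".
        for child in children:
            base = base[1:] + (type(child).__name__,)
            profile[stem + base] += 1
            visit(child, stem)
        for _ in range(q - 1):
            base = base[1:] + ("*",)
            profile[stem + base] += 1

    visit(ast.parse(source), ("*",) * p)
    return profile


def pq_distance(a, b, p=2, q=3):
    """pq-gram distance in [0, 1]; 0 means identical profiles."""
    pa, pb = pq_profile(a, p, q), pq_profile(b, p, q)
    shared = sum((pa & pb).values())          # bag intersection
    total = sum(pa.values()) + sum(pb.values())
    return 1 - 2 * shared / total


student = "def f(x):\n    return x * 2"
solutions = [
    "def f(x):\n    return x + x",
    "def f(x):\n    y = x\n    for i in range(1):\n        y = y + x\n    return y",
]
closest = min(solutions, key=lambda s: pq_distance(student, s))
print(closest)
```

The pq-gram distance is attractive here because it approximates tree edit distance while being cheap to compute, which matters when a student program must be compared against a large database of correct solutions.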

Case-based reasoning approach
Freeman et al. [53] use a case-based reasoning (CBR) approach, which they call Abstract Syntax Tree Retrieval (ASTR), to mine prior solutions contained in a large dataset. This system requires no prior knowledge of the problem being solved. It uses CBR and the grammar of the programming language to retrieve a prior solution with high similarity to a struggling student's failing submission. The results achieved by their system are encouraging. However, as noted by the authors, the system contains no information about the programming problem prior to observing successful submissions. Additionally, their system has no understanding of Python syntax.

Hint Factory based approaches
In general, the basic technique in this new line of work is to first represent the previous student-tutor interactions in the form of a graph. When a new student asks for a hint, that student's interaction pattern is matched with some part of the graph and the student is directed to an appropriate next step that ultimately leads to a solution. It is not hard to imagine the potential impact of such work on any ITS that teaches programming [33].
In [54], the authors designed the Hint Factory to use student problem-solving data for automatic hint generation in a propositional logic tutor. This approach uses student data to build a Markov decision process of student problem-solving strategies that serves as a domain model for automatic hint generation. The Hint Factory operates on a directed-graph representation of a problem, in which each node represents a student's state at some point in the problem-solving process and each edge represents a student's action that alters that state. A solution is represented as a path from the initial state to a goal state. A student requesting a hint is matched to a previously observed state and directed along a path to a goal state. The Hint Factory approach has been extended to work in other domains more closely related to programming.
Fossati et al. [55,56] implemented the Hint Factory in the iList tutor, which helps students learn linked lists, a demanding topic in information technology and computer science education. In [56], the authors also concluded that their tutor produced learning gains equivalent to those of a human tutor.
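To make the graph-based procedure concrete, the following minimal Python sketch assigns a value to each state by value iteration and directs a student to the best-valued observed successor. It is an illustration under simplifying assumptions, not the Hint Factory implementation from [54]: every observed edge is treated as a deterministic action (transition frequencies are ignored), and the reward, penalty, step cost, and discount values, the function names `state_values` and `next_step_hint`, and the toy graph are all hypothetical.

```python
def state_values(edges, goals, goal_reward=100.0, dead_end_penalty=10.0,
                 step_cost=1.0, discount=0.9, sweeps=50):
    """edges maps each observed state to the states students moved to next."""
    states = set(edges) | {t for succ in edges.values() for t in succ}
    values = {}
    for s in states:
        if s in goals:
            values[s] = goal_reward          # correct solutions
        elif not edges.get(s):
            values[s] = -dead_end_penalty    # states nobody left successfully
        else:
            values[s] = 0.0
    for _ in range(sweeps):
        for s in states:
            if s in goals or not edges.get(s):
                continue
            # Deterministic-action simplification: follow the best successor.
            values[s] = -step_cost + discount * max(values[t] for t in edges[s])
    return values


def next_step_hint(state, edges, values):
    """Best observed successor of `state`, or None for an unseen/terminal state."""
    successors = edges.get(state)
    if not successors:
        return None
    return max(successors, key=lambda t: values[t])


edges = {"start": ["a", "b"], "a": ["goal"], "b": ["c", "dead_end"], "c": ["goal"]}
values = state_values(edges, goals={"goal"})
print(next_step_hint("start", edges, values))  # a
print(next_step_hint("b", edges, values))      # c
```

Note that `next_step_hint` returns `None` for a state that was never observed, which reflects the limitation of matching only against previously seen states.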
Using the Hint Factory approach, Jin et al. [57] use linkage graphs to represent program states. A linkage graph is an acyclic graph consisting of nodes that represent code statements and directed edges that represent the order of the statements, as determined by which variables are read and assigned in each statement. However, in [58], the author points out that this approach requires multiple existing student solutions to be available, with the risk that a specific alternative way of solving the exercise might not be recognized. In addition, as noted by the authors, a remaining challenge with this method is determining strategies for hint presentation.
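The linkage-graph idea can be sketched in a few lines of Python. This is only an approximation of the representation in [57]: it assumes straight-line code made of top-level statements with simple assignments (no augmented assignments, branches, or loops), uses Python's `ast` module (with `ast.unparse`, available from Python 3.9), and the names `linkage_graph` and `dependency_pairs` are hypothetical.

```python
import ast


def linkage_graph(source):
    """Return (statements, edges); edge (i, j) means statement j reads a
    variable whose most recent assignment is in statement i."""
    tree = ast.parse(source)
    statements = [ast.unparse(stmt) for stmt in tree.body]
    last_writer = {}
    edges = set()
    for j, stmt in enumerate(tree.body):
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Reads are resolved before this statement's own writes are recorded.
        for n in names:
            if isinstance(n.ctx, ast.Load) and n.id in last_writer:
                edges.add((last_writer[n.id], j))
        for n in names:
            if isinstance(n.ctx, ast.Store):
                last_writer[n.id] = j
    return statements, sorted(edges)


def dependency_pairs(source):
    """Edges expressed as (producer text, consumer text) pairs."""
    statements, edges = linkage_graph(source)
    return {(statements[i], statements[j]) for i, j in edges}


program = "a = 2\nb = 3\narea = a * b\nprint(area)"
reordered = "b = 3\na = 2\narea = a * b\nprint(area)"
print(linkage_graph(program)[1])  # [(0, 2), (1, 2), (2, 3)]
print(dependency_pairs(program) == dependency_pairs(reordered))  # True
```

The second comparison shows why such a representation is useful as a program state: two programs that differ only in the order of independent statements yield the same dependency structure.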
In [7], the authors propose a data-driven approach to create a solution space consisting of all possible paths from the problem statement to a correct solution. This approach borrows heavily from the Hint Factory, but also extends it by enhancing the solution space, creating new edges for states that are disconnected instead of relying on student-generated paths.
As demonstrated in [7], ITAP (Intelligent Teaching Assistant for Programming) makes it possible to generate hints for never-seen-before states, which the original Hint Factory could not do. ITAP combines algorithms for state abstraction (the process of reducing syntactic variability in code states), path construction (determining which steps a student should take to improve their solution), and state reification (reindividualizing the resulting edits into personalized hint messages) to fully automate the process of hint generation. However, as noted by the authors, the path construction algorithm could be modified to further improve performance. In addition, according to [51], one major pitfall of AST representations of source code is the loss of behavioral information.
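One simple instance of state abstraction is renaming variables to canonical names, so that solutions differing only in identifier choice collapse into the same state. The Python sketch below illustrates this single transformation only; it is not ITAP's canonicalization from [7], which reduces syntactic variability in further ways. The function name `canonicalize` and the naming scheme `v0, v1, ...` are assumptions, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


def canonicalize(source):
    """Rename parameters and assigned variables to v0, v1, ... in order of
    first appearance; builtins and function names are left untouched."""
    tree = ast.parse(source)
    # Only names the program itself binds are renamed (so `range` stays).
    bound = {n.arg for n in ast.walk(tree) if isinstance(n, ast.arg)}
    bound |= {n.id for n in ast.walk(tree)
              if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
    mapping = {}

    class Renamer(ast.NodeTransformer):
        def visit_arg(self, node):
            node.arg = mapping.setdefault(node.arg, f"v{len(mapping)}")
            return node

        def visit_Name(self, node):
            if node.id in bound:
                node.id = mapping.setdefault(node.id, f"v{len(mapping)}")
            return node

    return ast.unparse(Renamer().visit(tree))


first = "def total(xs):\n    s = 0\n    for x in xs:\n        s = s + x\n    return s"
second = ("def total(numbers):\n    acc = 0\n    for n in numbers:\n"
          "        acc = acc + n\n    return acc")
print(canonicalize(first))
print(canonicalize(first) == canonicalize(second))  # True
```

State reification would then need the inverse of `mapping` to translate an edit on the canonical form back into the student's own variable names.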
Price et al. [12] present a new data-driven algorithm (CTD: Contextual Tree Decomposition), based on the Hint Factory, to generate hints for these broader programming exercises. As noted by the authors, a major limitation of this work is the reliance on a single programming exercise for evaluation.
More recently, Price et al. [59] present iSnap, an extension to the Snap programming environment which adds some key features of ITSs, including detailed logging and automatically generated hints. They share results from a pilot study of iSnap, indicating that students are generally willing to use hints and that hints can create positive outcomes. Hints in iSnap are generated using the CTD algorithm. As noted by the authors, the study revealed several remaining challenges for the CTD algorithm and the presentation of iSnap hints.

Summary
In summary, two broad lines of research have been proposed for data-driven hint generation in ITSs for code-writing: program synthesis based and Hint Factory based. However, according to [60], there are two major drawbacks of program synthesis based approaches. First, an instructor must manually provide error models for each problem. Second, scalability is a big issue, especially with larger programs. In terms of expert knowledge, the Hint Factory based approaches are well suited to generating hints in ITSs for code-writing. These approaches require only two pieces of expert knowledge to run independently, so the expert effort is kept to a minimum. The needed data are: (1) at least one reference solution to the problem (e.g., a model solution) and (2) a test method that can automatically score code (e.g., pairs of expected input and output). Both model solutions and test methods are already commonly created by experts/teachers in the process of preparing programming exercises, so the burden of knowledge generation is not too large.
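The two pieces of expert knowledge listed above can be sketched as follows. This is a hypothetical example rather than the scoring method of any surveyed system: the exercise (summing a list), the function names `reference`, `attempt`, and `score`, and the choice of "fraction of tests passed" as the score are all assumptions made for illustration.

```python
def score(candidate, test_cases):
    """Fraction of (args, expected) pairs on which `candidate` is correct."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed test
    return passed / len(test_cases)


def reference(xs):
    """(1) Teacher-provided model solution."""
    return sum(xs)


def attempt(xs):
    """A buggy student attempt that skips the first element."""
    total = 0
    for x in xs[1:]:
        total += x
    return total


# (2) Test method: input/output pairs, here derived from the model solution.
inputs = [([1, 2, 3],), ([0, 4],), ([],)]
test_cases = [(args, reference(*args)) for args in inputs]

print(score(reference, test_cases))  # 1.0
print(score(attempt, test_cases))    # two of the three tests pass
```

A score of this kind lets a data-driven system decide automatically which collected student states count as goal states and which are still incomplete.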

Conclusion and future research
This study surveys the existing ITSs for code-writing that are solely based on data-driven hint generation and concludes that they differ from each other in at least the following ways: 1) representation of the student's current code (snapshot of source code, a set of features, the actual code of the program); 2) intermediate representation of computer programs (AST, source code); 3) extraction of distinct solutions of a programming exercise (preprocessing); 4) granularity of the code state used; 5) automatic modeling of solution steps; and 6) programming language. Despite the research efforts of recent years, generating data-driven hints in ITSs for code-writing still faces several open problems. The gaps we identified, which provide the motivation for future research, are listed below.
1. Representation of the student's current code. In the context of Hint Factory based approaches to generating data-driven hints in ITSs for code-writing, a student's state corresponds to a snapshot of the student's current code. However, according to [61], snapshots are captured every time students compile or save their code, which is not an accurate representation of a unit of work (e.g., a line of code or a statement).
2. Automatically modeling solution steps from correct solutions. In the literature reviewed here, none of the works automatically models solution steps from correct solutions of a programming exercise. How to automatically model solution steps from a large number of correct solutions of a programming exercise is an unresolved problem.
3. Semantic similarity. At the heart of data-driven ITSs for code-writing is the notion of program similarity. Measuring the similarities and dissimilarities between programs plays a crucial role in data-driven ITSs. Edit distances have been used as a measure of the similarity of programs. Most existing systems represent programs as abstract syntax trees (ASTs); however, it is known that the tree edit distance problem is NP-hard in the general (unordered) case. How to extract distinct solutions efficiently and precisely from a large dataset consisting of learners' solution attempts and a sample solution created by teachers/experts is an unresolved problem [62].
4. Programming exercises supported by data-driven ITSs for code-writing. It is important that a data-driven ITS for code-writing provides a collection of programming exercises covering an introductory programming course syllabus. Nevertheless, these programming exercises are generally stored in proprietary systems for their own use. According to [63], two issues can hinder the proliferation of ITSs for code-writing: the lack of content standards for describing programming exercises and the lack of standards for communicating with other ITSs for code-writing.
5. Programming language. In the context of data-driven ITSs for code-writing, although ITSs covering many domains have been developed previously, none of them teaches C/C++ programming.
6. Integrating data-driven ITSs for code-writing into the curriculum. As noted by Rivers [64], data-driven ITSs for programming have been expanding as a subfield of ITSs over the past few years, with many different researchers creating new techniques to automatically generate hints. However, most of the systems (including theirs) have only been evaluated on collected student problem-solving traces, and the ones that are being tested on real students are implemented in online learning environments such as MOOCs (massive open online courses), not in individual classrooms. With respect to curricula and real classrooms, this indicates that there is significant room for improvement in the field of data-driven ITSs for code-writing.