A Software Tool to Visualize Verbal Protocols to Enhance Strategic and Metacognitive Abilities in Basic Programming

Abstract: Learning to program is difficult for many first-year undergraduate students. Traditional programming courses tend to focus on syntactic issues and follow the presentation-examples-practice formula, assigning practice exercises and showing the teacher's verbal and visual explanation of the "step by step" process of writing a computer program. The cognitive literature on the mental processes involved in programming suggests that explicitly teaching certain aspects, such as mental models, strategic knowledge and metacognitive abilities, is critical to learning how to write and assemble the pieces of a computer program. Verbal protocols are often used in software engineering as a technique to record the short-term cognitive processes of a user or expert in evaluation or problem-solving scenarios. We argue that verbal protocols can be used as a mechanism to explicitly show an instructor's strategic and metacognitive process when writing a program. In this paper we present an Information System Prototype developed to store and visualize worked examples derived from verbal protocols transcribed during the writing of introductory-level programs. Empirical data comparing the grades obtained by two groups of novice programming students, analyzed with ANOVA, indicate a statistically significant difference in favor of the group using the tool, although, given the limitations reported in this study, these results cannot yet be extrapolated to the general population.


A. The programming context
The difficulties many undergraduate students face when learning to program are still a common topic in the cognitive, educational and technological research literature. The problem has been approached from many angles, such as the study of the cognitive behavior of novices and experts [1][2][3], creative pedagogical strategies [4][5][6], the cultural environment of the student [7], [8], and, of course, the use of software tools [9][10][11][12].
In developing countries such as Mexico, programming skills are relevant for undergraduate students, given the growing tendency of developed economies to outsource programming, information technology and software-related jobs [13][14][15].
But the most visible aspect of the problem is the almost universal pattern of high failure rates among first-year computer science programming students. Depending on the source, reported failure rates range from 30% to 60% [16][17][18][19].

B. The traditional teaching model
Teaching programming is often based on the pedagogical pattern of 1) presenting the topic, 2) showing a few examples, and 3) assigning practice exercises; that is, the presentation-examples-practice formula [20]. Accordingly, a traditional programming course is mainly based on theoretical lecture sessions and practical work in computer laboratories, where most of the content focuses on the characteristics of the programming language being taught [21]. Refs. [22][23][24] agree that most introductory programming courses do a reasonably good job of emphasizing the syntactic comprehension of programs, but that they do not reinforce the strategic knowledge required to write programs.
Programming educators commonly assume that this strategic knowledge will develop on its own as a byproduct of curricular design [25], whereas the literature suggests that a more effective approach is to teach this knowledge explicitly [21], [26].

C. Software tools
During the last four decades, researchers and designers have tried to make programming more appealing to students and to the public, developing a wide variety of software applications to make programming skills easier to acquire. From Logo [27] to Alice [11], a diversity of learning goals has been pursued: developing problem-solving abilities, developing logical thinking through games, and easing the transition to general-purpose programming languages through alternative, easier-to-use interfaces, among others. Ref. [28] surveyed approximately 80 software tools designed to teach programming or to foster interest in programming through games, animations or puzzles.
Other recent types of software tools designed to aid in teaching programming are: a) program visualization, which uses graphics to let the student visualize the behavior of algorithms and data structures [29]; b) learning objects, small instructional components that can be reused in several contexts [10]; c) concept maps, which work like large knowledge "scaffolds" to represent the main concepts of programming and to combine them with other teaching strategies and tools [30], [31]; and d) cognitive tutors, which use declarative and procedural knowledge in the form of rules to give guided feedback to the student [32], [33]. All these types of tools have reported positive results and have had various degrees of success in teaching programming. However, we argue that, while some of them have been adopted by programming educators, most of them focus on only a limited subset of the cognitive aspects that the cognitive literature reports as critical (e.g. the transfer of mental models through graphics or interactive feedback).

D. Critical cognitive aspects
The act of programming is essentially a cognitive process of problem solving that involves writing abstract structures of an algorithmic process. In other words, programming is a way to mentally create a solution to a problem while combining a limited, predefined set of syntactic structures and statements of a computer language.
The cognitive literature on the acquisition of programming skills is vast and complex. The subject was of special interest in the 1980s [34][35][36], and it is still being explored, so that other facets of the problem have been identified and further alternative solutions proposed [25], [37], [38]. The most recurrent topics in the relevant cognitive literature are: a) comparative studies of the mental models of novices and experts, b) the development of programming strategies (also called plans, schemas or clichés) for common types of problems, and, more recently, c) the cognitive process called Metacognition.

1) Mental models
Ref. [3] defines a mental model as an internal representation of a system or complex task, whose construction enables the learner to comprehend and predict the behavior of that system or task.
Ref. [39] also notes that a mental model develops and is refined over time, as a result of the interaction between the subject and the target system, and that this mental model does not have to be very precise, as long as it is "functional".
In programming, a mental model refers to the image the programmer has of the invisible processing that occurs inside the computer in the interval between an input and an output [35]. Ref. [40] clarifies that, to write a program, a person needs many and very diverse mental models, for example of how a loop, a data structure or a decision structure behaves. Ref. [41] notes that having a wide range of valid mental models is critical for the novice to acquire the ability to write programs, and that if these mental models are not explicitly taught, the student will create his or her own anyway, of dubious quality and effectiveness.
2) Strategies
It has been found that, even when a student is able to acquire valid mental models and knows the correct syntax of a programming language, a key cognitive element is still necessary for him or her to write effective programs. This component is called "strategy" [42] (also known as schema, plan or cliché). Strategies are predefined solutions to stereotyped kinds of problems. Lacking a minimum set of these strategies restricts the student's ability to recognize certain types of problems, and therefore to solve them. Ref. [43] indicates that an important aspect of strategies is that they cannot be deduced from the final form of the program. This means that a novice can study the final shape of a program but, unless explicitly taught by a teacher, cannot see the process and strategies involved in writing it. The final form of a program can give the student information about the concepts and syntactic structures used, but not about the strategies and decisions applied during the writing process. These strategies are much more difficult to teach in classrooms and laboratories, but Ref. [44] notes that "in programming, there is considerable empirical evidence that suggests that strategies are the main basic cognitive component used in design and program comprehension." Finally, Ref. [45] argues that the process of writing a program need not be understood as a "literal transcription" of a previously stored, typified solution, but rather as an iterative, exploratory and incremental process made up of minor problem-solving episodes and constant re-evaluation of the effectiveness of the applied strategies. That is, the programmer constantly monitors and evaluates the effectiveness of a set of strategies while writing the program.
This finding leads to another important aspect of programming (and of problem solving in general) called Metacognition.
3) Metacognition
Ref. [46] described Metacognition as "awareness of a person's own cognitive process". While strategies allow programmers to solve problems, Metacognition allows them to monitor their progress, apply their knowledge to new situations, and identify their own limitations. Ref. [47] indicates that, through Metacognition, a student can define the nature of a problem or task, select a useful mental representation, use the most pertinent strategy to implement it, and attend to feedback on how he or she is progressing toward the solution.
In this context, favorable results have been reported for instructional strategies such as "pair programming" [48] (two novice programmers monitor each other's progress, giving constant feedback) and "think-alouds" [49] (students are instructed to verbalize their thought process while writing a program, which makes them explicitly aware of the decisions and problem-solving strategies they are applying). These are clear examples of Metacognition in the programming context.
Given that empirical evidence in the cognitive literature suggests that these three cognitive components (valid Mental Models, Strategies and Metacognition) are critical to acquiring the ability to program, we argue that a software tool designed to help students learn to program has to include some form of these elements.

II. DEVELOPMENT OF A VERBAL PROTOCOL VISUALIZER TOOL.
A. Verbal protocols
Verbal protocols, as a method of representing and analyzing a person's thought processes, have a solid tradition in cognitive psychology [50][51][52]. As a technique, verbal protocols were initially developed to study a person's short-term memory processes (what does he or she pay attention to, and in what order, when given a certain task?), but over time they have been used extensively in other disciplines such as software engineering (e.g. usability studies [53], [54] and software task analysis [55]), and even in programming teaching [49].
iJIM - Volume 5, Issue 3, July 2011
To develop our tool, we selected the method of verbal protocols as a way to elicit (and later show explicitly to students) an expert programmer's series of decisions when given a basic programming problem. That is, a verbal protocol can show a student which elements of a problem the expert pays attention to, how and in what form the programmer applies the basic programming structures (loops, decisions, data structures), and how the programmer identifies that he or she has made a mistake and has to backtrack and correct it.
To analyze a verbal protocol, a researcher has to rely on some kind of recording device (in the old days, a tape recorder) to be able to transcribe the verbalizations of a given set of subjects, apply a coding scheme to them and compare them. We opted for a different recording method, video capture software, so that we could record not only the verbal data but also the visual behavior of the expert programmer.
For example, in our study a recording session would consist of asking an expert programmer to write a program that solves a simple programming problem, such as the following, while verbalizing his or her thought process: Write a program in C Language that, when given a quantity N of integers, gives the sum of all even numbers and the average of all odd numbers.
We then used the video capture software to record the audio and video activity taking place on the computer. The resulting product was a video file with visual and audio information that was later transcribed, edited and stored in a database (Figure 1). It should be noted that, in all cases, video editing was needed to re-record the video segments of the protocols, because the programmer's verbalizations and the visual information (the actual writing of the code) were very rarely synchronized (Figure 2). For our experimental test, four edited protocols were produced, representing four types of problems with different levels of difficulty (Figure 3).
Once the tool was in its final form, students could access and visualize the protocols through a web-based interface by typing a keyword, a specific phrase or an author's (programmer's) name (Figure 4).

B. Dual coding
Dual coding theory (DCT) holds that, to process sensory stimuli from the environment, the human mind has two independent but connected memory subsystems: one for visual and one for verbal information. The visual subsystem handles concrete images and sounds; the verbal subsystem records language and abstract information. According to the theory, both systems function independently but are intimately connected: when a verbal representation is created in response to a visual image, or when an image is created as a result of seeing or hearing a word, a referential connection is said to have been made, and the information is thus dually coded [56][57][58].
Empirical data from DCT studies [56], [57], [59] show that the brain retrieves information better when it is dually coded. In our study, we tried to apply this principle to the design of the tool's user interface by allowing the student to browse both the verbal and the visual information of the protocols (Figure 5). The protocols were divided into segments that students could study and analyze by reading the verbalizations and watching the video segment corresponding to the writing of the code.
The interface went through several tests to further refine its usability. For example, a feature was added to allow a student to "jump" directly to a specific step of the protocol (Figure 6).

C. Experimental conditions
To measure the tool's capability to transfer strategic knowledge to novice undergraduate programmers, we designed a standard test of three basic programming problems to assess the students' skills. The test was written to evaluate the following specific abilities: a) recognizing types of problems that involve combined repetition (loop) and selection (if) structures, b) writing repetition and decision structures effectively, c) recognizing problems that involve counters and applying counters effectively, and d) making calculations involving exponents.
It should be noted that the test was designed using previously applied questions and problems taken from our internal programming academy's quiz repository, which dates back to 2008. The three test questions were randomly selected for the measurement instrument, while taking into account how closely they matched the specific abilities to be measured. Historical results of 15 undergraduate students, from both computer science and electronic engineering, were selected and analyzed to verify whether the instrument's results were normally distributed and free of significant bias. We used a grading scale of 0 to 10.
Descriptive statistics (Table I), normality tests (Table II), the corresponding histogram of the instrument data (Figure 7), and a Q-Q plot (Figure 8) are shown.
Given the small sample size, we relied on the Shapiro-Wilk test to check the normality assumption. In this case, the Sig. value is greater than 0.05, which means that the hypothesis of normality cannot be rejected, so the data can be treated as normally distributed.
Also, Figure 8 shows that the data obtained with the instrument (that is, the grades obtained) are consistent with a normal distribution, except for one observed value.
To test the effectiveness of our tool, a quasi-experimental setting was designed, using two groups of programming students: one of 2nd-semester Computer Science students and another of 2nd-semester Electronic Engineering students, both from the Autonomous University of Aguascalientes (UAA), México. Selection of participants was not random: complete groups were invited, given that the experiment was conducted during a period of normal classes.
The Electronic Engineering group (n=20) was selected as the control group (TRAD), and the Computer Science group (n=18) served as the experimental group (EXP). In the TRAD group, 55% of the students had previous programming experience from high school, compared with 41% in the EXP group.
Two days before the test was applied, four exercises were given to both groups to study and practice. These exercises were the same ones loaded in the Protocol Visualizer Tool.
The independent variable in our research model was the "teaching method": the control group (TRAD) had a teacher giving a traditional lecture, using the blackboard and laboratory computers, to explain the solving procedure of the practice exercises, while the experimental group (EXP) used the Protocol Visualizer Tool to study the solving procedure of the same exercises. Our dependent variable was "performance", that is, the grade obtained on the instrument.
Controlled conditions for both groups were:
- Explanatory lecture sessions lasted one hour for both groups.
- The time given to answer the instrument was limited to one hour for both groups.
- A "motivation" factor was introduced for both groups in the form of "extra points".
- The lab computers used by both groups had the same characteristics.
- Previous programming experience was similar in both groups (55% for TRAD and 41% for EXP).
- At the time of the experiment, the instructional content given to both groups during normal classes was the same: both groups were studying basic data structures in C Language.
Uncontrolled conditions were:
- The teachers of the two groups were different, but they came from the same programming academy, taught the same content, and had the same teaching experience.
- It was not possible to record each participant's individual answering time for the test.
- Selection of participants was not random: in both cases, complete groups were invited to participate in the study.

A. Descriptive statistics
The test was graded by professors from our internal programming academy staff. Table III shows the descriptive statistics of both groups. The mean grade was 6.33 for the EXP group and 3.79 for the TRAD group; the median was 7.5 for the EXP group and 3.7 for the TRAD group; and the mode was 3.30 for the EXP group and 2.30 for the TRAD group. The means indicate that the experimental group scored about 2.5 points higher than the control group on the 0 to 10 scale (roughly 25% of the scale), but the modes suggest that both groups included poor performers. The standard deviation of the EXP group was 2.99, indicating a greater dispersion of the data than in the TRAD group (2.35). This suggests that the TRAD group performed "uniformly bad", while the EXP group had more "better than average" performers.
Comparative histograms of both groups' results are shown in Figures 8 and 9. The EXP group's distribution is negatively skewed (its longer tail points to the left), meaning that most of its frequencies are grouped toward the upper values of the scale. The TRAD group frequencies, shown in Figure 9, show the opposite behavior.

B. ANOVA
Given the descriptive statistics, the EXP group appears to have performed better than the TRAD group. To test whether this difference was statistically significant, we ran an ANOVA, with the results shown in Table IV.
These results indicate a statistically significant difference in performance between the two groups (p ≤ .006).

C. Correlations
The programming literature suggests a positive correlation between previous experience and performance in the first undergraduate programming course [37], [60], [61]. We ran a correlation test to see whether the participants' previous experience was correlated with the results obtained in the test. As mentioned earlier, 55% of the TRAD participants had previous programming experience, and 41% of the EXP participants had studied programming in high school. Pearson correlation results are shown in Tables V and VI. They show that, for both groups, there is no significant correlation between previous programming experience and performance in the study.

A. Explicit strategy learning
These positive results are promising in the sense that, under the conditions of the study, a significant improvement in performance was obtained (a difference between group means of about 25% of the grading scale).
One interpretation is that when students were using the tool and explicitly studied (by reading and watching) the problem-solving procedure, they learned a small set of strategies that were effective for those kinds of problems.
Also, the verbalization feature of the protocols allowed the students to understand why the expert programmer was using a specific kind of programming structure.
In the protocols, a metacognitive element was implicitly present during some "backtracking" episodes. For example, in one case a programmer noticed that she had omitted the declaration of a variable, which was causing a compilation error in the program. In another case (in the final steps of the protocol), the programmer noted that she needed a counter variable to obtain the average of the odd numbers. These incidents showed the students that the process of writing a program is not linear, but incremental and constantly monitored.
Our results also appear to support previous studies of instructional strategies in the context of cognitive load theory [62], which indicate that showing a "worked example" [63][64][65] can be an effective instructional strategy, given the reduced "memory load" imposed on the student.

B. Benefits
We argue that the tool can help decrease high failure rates if the number of protocols loaded in it is large enough to cover a significant range of problem categories. That is, if students are given a wider range of problem-solving strategies, the transition from the initial stages of learning [66] toward the automation of strategies can be made more efficiently.
Also, programming teachers can share their knowledge with a wider range of students, who, in turn, can constructively compare different solutions to the same type of problem.
It is important to note that the tool was designed as an aid and complement to programming teachers, not as a substitute for them.

C. Limitations
Given the uncontrolled conditions reported in previous sections of this paper, the positive results obtained by the EXP group using the Protocol Visualizer Tool cannot be generalized to the whole population of first-year undergraduate programming students.
An uncontrolled variable was teaching style: that is, even though both teachers were part of the same academy and had similar background and experience, the two groups having different teachers could have had an unmeasured effect on the final results.
Also, the study itself lasted only three days, given the limited availability of the students who voluntarily participated.

D. Future studies
Future studies need to be longitudinal in nature, so that the effect of a longer exposure of the students to the tool can be measured.
A randomly selected sample of participants is desirable, but this kind of scenario is not always possible (or practical) given the nature of everyday lectures.
It is also possible to load the tool with protocols involving problems in other programming languages, such as Java or C#, or in other paradigms.
Also, the graphical user interface can still be improved through usability testing, so that it can be used on other Web-enabled platforms (such as mobile devices).
Lastly, we plan to extend the functionality of the tool by adding pedagogical features, such as completion problems in selected segments of the protocols, in accordance with the suggestions of Refs. [67], [68].