Assessing Collaborative Problem Solving Skills in Technology-Enhanced Learning Environments – The PISA Framework and Modes of Communication

As has been highlighted by many, for instance by PISA, Collaborative Problem Solving (CPS) is a critical and necessary 21st century skill across educational settings. While many initiatives have been launched to investigate the nature of these skills, fewer attempts have been made to understand how they should be assessed. However, in 2015, the PISA organization presented a framework for assessing CPS skills. This paper reports on an exploratory study investigating the predictive validity of the PISA assessment framework and whether and how modes of communication influence the assessment of 24 students' collaborative problem solving activities when using a computer-based assessment task system. The findings demonstrate that the PISA CPS assessment framework has weak predictive validity and does not account for quality or productivity in communication, and that the mode of communication indeed influences CPS processes and in turn what is possible to assess.
Keywords: collaborative problem solving, assessment, technology, 21st century skills, PISA


Introduction
and universities. These skills are often described in different ways and with different items linked to them [1] [2] [3] [4]. However, the 21st century skills commonly include various Life and Career skills, Learning and Innovation skills, and Information, Media, and Technology skills. Life skills are in turn often described as including Flexibility, Initiative, Social Skills, Productivity and Leadership; Learning Skills as Critical Thinking, Creative Thinking, Collaborating and Communicating; and finally Literacy Skills as Information Literacy, Media Literacy and Technology Literacy (https://k12.thoughtfullearning.com/FAQ/what-are-21st-century-skills). Recently, the PISA tests have also been adjusted to test some of these skills. Many initiatives have been formed to improve both teaching and assessment of these skills, including the Partnership for 21st Century Skills (www.21stcenturyskills.org) and the Cisco/Intel/Microsoft Assessment and Teaching of 21st Century Skills project (www.atc21s.org), as well as other initiatives [5].
One of the most commonly discussed 21st century skills is collaborative problem solving (CPS), which is usually clearly linked to computer-assisted problem solving tasks [6]. There have been numerous attempts to create innovative assessment methods for such tasks, e.g. by the Australian Curriculum, Assessment and Reporting Authority [7], but also by other research groups [3] [8] [9].
However, even though such assessment frameworks have been formed, there seems to be a lack of consensus on how to score and grade such tests. Assessing CPS skills is challenging in numerous ways. For example, CPS is a complex interactive activity where individual performance is hard to "fixate" and measure when individuals become entangled in process-oriented activities. Another reason for the lack of assessment frameworks is that collaboration has foremost been seen as a method for learning and not as a skill to assess [10]. This calls for a shift from measuring individual cognitive skills towards measuring social and interactive process-oriented activities. For example, the PISA framework with its 12 perspectives [11] [12] seems to assess different issues than the ACARA framework with its 6 strands and 5 levels [13]. Other assessment models seem to lack published details of what they assess and how they assess collaborative problem solving skills. Furthermore, there seem to be few studies on how these different assessment frameworks work in real-life situations and on whether and how external factors might influence the test results.
Therefore, we see a need for empirical investigations of these types of assessment frameworks and of how their scoring matrices work in relation to the basic ideas behind 21st century skills, especially collaborative problem solving skills. One of the most used assessment frameworks is the PISA test on collaborative problem solving, for which details of the assessment rubrics have also been published [11] [12]. However, we have not been able to find empirical studies on how these rubrics work when used as a tool for assessing CPS.
The aim of this exploratory study was thus to investigate how one of the most common assessment frameworks for collaborative problem solving skills, the PISA matrix, can be applied to a typical computer-based assessment task system (http://janison.com/). An additional aim was to investigate whether different means of communication (text chat vs. audio chat) influence the results of the tests, and whether the assessment matrix can detect such possible influences.

Assessment of collaborative problem solving: the PISA framework
Since the foundation of the field of computer supported collaborative learning (CSCL), a substantial body of research has provided evidence of the positive effects of introducing technology into collaborative learning and problem solving tasks. Several large meta-analyses indicate that participants who collaborate using information technology show greater increases in motivation, elaboration, dialogue and debate, higher-order thinking, self-regulation, meta-cognitive processes, and divergent thinking [14]. However, as [15] also note, while most CSCL research focuses on exploring the important impact of collaboration on learning, research on how to assess collaborative problem solving skills in technology-enhanced environments is overlooked, even though the domain of educational assessment has repeatedly called for it [16] [17].
CPS assessment can be done through a number of different approaches, depending on the types of measures used to determine the quality of student performance. [18] lists measures such as the quality of solutions and objects generated through collaboration; analyses of intermediate results, paths to solutions, team processes and structure of interactions; and finally the quality and type of collaborative communication. A key challenge here, as put by [18], is to ensure that performance can be accurately quantified and captured by the assessment approach. Lately, the PISA 2015 collaborative problem solving framework has been proposed as a potentially accurate framework for assessing CPS skills in technology-enhanced environments. In 2015 PISA decided that its assessment would include CPS skills as one of the key competencies to measure, and presented the PISA 2015 Draft Collaborative Problem Solving Framework to guide the assessment [11] of CPS in technology-enhanced environments. The framework (see Figure 1) rests upon two types of skills, namely collaboration skills and problem solving skills. The collaboration skills are: 1) Establishing and maintaining a shared understanding; 2) Taking appropriate action to solve a problem; and 3) Establishing and maintaining team organization. The problem solving skills, on the other hand, are: A) Exploring and understanding; B) Representing and formulating; C) Planning and executing; and D) Monitoring and reflecting.
While the PISA framework has been referred to as a valued departure point for the assessment of CPS by other researchers [19] [15], the research literature lacks papers reporting on empirical uses and evaluations of the framework. Therefore, we currently lack an understanding, on the one hand, of how the PISA assessment framework can be utilized and, on the other hand, of the actual merits of the framework in assessing students' collaborative problem solving skills in technology-enhanced assessment environments. It is against this background that this paper reports on a study that evaluates the PISA assessment framework for CPS. A particular interest has been in investigating the modes of communication, text and audio, and how they may influence CPS and CPS assessment. Mode of communication has not so far been raised as an aspect that may have an influence when CPS assessment has been discussed [9].

Methodology
In order to evaluate the PISA collaborative problem solving assessment framework, a study was conducted during 2016 with the participation of 24 students aged 13 and 14 (11 females, 13 males). The participants were from five different schools in the Stockholm area and took part in a 21st Century Skills Assessment project. In order to investigate the validity of the framework, a research design was employed that valued variety in participants as well as in the mode of communication provided. Recruitment of participating schools and students was achieved through collaboration with local educational organizations based on the following criteria: a) the school is actively involved in the Swedish overall 21st Century Skills project (a part of the Australian, Irish and Swedish collaboration on the 21st century skills assessment project, atc21s.org); b) the students should be in the 8th grade; and c) the students should be proficient in English. The students were randomly divided into groups of two, and the assessment tasks were performed with two different modes of communication. Thus, each collaborative problem-solving task was performed in collaboration between two students who were physically located in different rooms. The assessment tasks were performed in small and quiet rooms in the school environments. Data were collected through video screen recordings of the activities in the assessment task system. In total, 12 test groups performed the activities in the assessment task system. For each test group, the activities lasted no longer than 20 minutes, a limit set by the research team.

The computer-based assessment tasks
In this study we used a computer-based assessment task system developed by Janison (http://janison.com/), as it was part of a 21st Century Skills Assessment project. Two tasks were developed by teachers and researchers, with support from Janison regarding technical issues. The first was a balancing scale problem task (see Figure 2) that comprised 5 subtasks in which a pair of interdependent students had different information on their respective screens that needed to be used in order to successfully solve the problems presented. In this particular task, communication took place through text chat within the computer-based assessment task system. The system was designed to support communication through written text only.
The other task concerned an environmental problem, more specifically the carbon cycle (see Figure 3). This task, which was developed in Swedish, also comprised 5 subtasks and was constructed in a similar way to the first task. The students needed to communicate with each other in order to solve the task. In this particular task, the research team allowed the students to communicate through audio, which was facilitated through Skype. The complexity and level of difficulty were similar in the Balance scale and Environmental problem tasks. Thus, the fundamental difference between the two tasks was the mode of communication.
The reason for using two different modes of communication was that we hypothesized that collaborative problem solving would unfold significantly differently when communicating through different modalities, which in turn would affect what can be assessed and how it can be assessed. In general, the attempt to evaluate the validity of the PISA assessment framework depends to a high extent on the framework being tried out in various ways, which is another reason for employing a research design in which two different tasks are performed using two different means of communication.

Scoring student performance
As no documented uses of the PISA 2015 CPS framework could be found, and PISA's own description of the framework (OECD, 2013) lacks explanations of how the framework should be employed and how scoring should be done, we had to develop our own scoring procedure. The procedure was as follows.
We started out by collectively analysing and scoring one group of students' collaborative problem solving activities, in order to reach consensus on how different communicative acts were to be interpreted through the lens of the framework and to better understand how the framework could be utilized. That collective activity resulted in the following prescription. First of all, we defined the unit of analysis as communicative acts (verbal sentences and utterances). We also took as our point of departure that the CPS framework comprises 12 codes (A1 to A3, B1 to B3, C1 to C3, and D1 to D3, see Figure 1). Scoring then started by viewing the screen recordings of the students' collaborative problem-solving activities and mapping communicative acts onto the CPS framework codes. For each occurrence of a certain code, one point was recorded in an Excel sheet. For instance, the communicative act "What do you see on your screen right now?" was scored as code A1, Discovering abilities and perspective of team members. Each time a similar event occurred, a further point was added to A1. Thus, the final result of the scoring was a quantitative frequency description of the occurrence of the events represented by the 12 different codes. For each student group activity, we also registered the number of accomplished tasks, the total number of utterances, the total time of activities, and the time per accomplished task. These measures were defined as the overall performance indicators.

Results

Performance of the two groups: the role of modes of communication
We began by exploring the performance of students in the two different groups. Performance was here indicated by four variables, namely total number of utterances, total time of activities, number of accomplished tasks and time per accomplished task. The descriptive statistics are presented in Table 2.
As can be noted from Table 2, few groups of students managed to complete all tasks before the time limit of 20 minutes had passed. Two out of six of the student groups that communicated through text chat managed to complete all tasks, while only one of the groups that communicated through audio completed all tasks. On average, the text chat groups spent 10.63 minutes to accomplish a task, compared to 8.85 minutes for the audio chat groups. In terms of number of utterances, the text chat groups exchanged on average 39.3 utterances, while the audio chat groups exchanged on average 65 utterances.
To further examine differences between the groups with regard to the different performance indicators, independent-samples t-tests were conducted. No significant differences could be noted, except that the students who communicated through audio produced more utterances (M=65.00, SD=11.22) than the students who communicated through text chat (M=39.33, SD=21.16), t(10)=-2.62, p<0.05. Thus, although students who communicated through audio exchanged significantly more utterances and spent slightly less time on the tasks that were accomplished (a non-significant difference, p>0.05), they did not solve more tasks.
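The reported test statistic for the number of utterances can be approximately reconstructed from the summary statistics alone. The following is a minimal sketch, not the analysis script used in the study: it assumes a pooled-variance (Student's) independent-samples t-test and six groups per condition, which is inferred from the 12 test groups and the reported 10 degrees of freedom. The function name `pooled_t_from_stats` is hypothetical.

```python
import math


def pooled_t_from_stats(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's independent-samples t statistic, assuming equal variances."""
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    # Standard error of the difference between the two group means.
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df


# Reported means and standard deviations of utterances (text chat first,
# audio second). n = 6 per condition is an assumption of this sketch.
t_stat, df = pooled_t_from_stats(39.33, 21.16, 6, 65.00, 11.22, 6)
print(f"t({df}) = {t_stat:.2f}")
```

Because the inputs are rounded to two decimals, the result can differ from the reported value in the last digit; under these assumptions it lands close to the reported t(10)=-2.62, with the negative sign reflecting that the text chat mean is subtracted first.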

Examining differences between the two modes of communication in terms of coding based on the assessment framework
Independent-samples t-tests were also conducted to examine whether any differences in coding frequency of the assessment framework categories could be revealed between the two conditions. Significant differences could be noted between the two conditions with regard to four out of 12 coding categories, namely A1: discovered perspectives and abilities of team members, B1: built and negotiated a shared representation of the problem, C1: communicated about actions to be performed and D3: monitored, provided feedback and adapted the team organization.
Hence, the results of the t-tests show that communicating through audio during the problem-solving activities seems to encourage students to demonstrate the skills represented by the coding categories A1, B1, C1 and D3. These differences may also partly explain what the audio groups' additional utterances were about. The implication is that the chosen communication mode significantly affects how the collaborative problem-solving activities unfold and which skills are demonstrated and, as a consequence, how the collaborative problem-solving skills are assessed.
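The per-group coding frequencies compared in this subsection come from the tally described under Scoring student performance, where each communicative act adds one point to its framework code. A minimal sketch of such a tally is given below. It is an illustration only: the scoring in the study was recorded manually in a spreadsheet, the function name `tally_codes` is hypothetical, and apart from the first utterance and its code A1 the example session is invented.

```python
from collections import Counter

# The 12 framework codes: problem solving processes A-D crossed with
# collaboration skills 1-3.
CODES = [f"{process}{skill}" for process in "ABCD" for skill in "123"]


def tally_codes(coded_acts):
    """Count how often each framework code was assigned in one group session.

    coded_acts is a list of (utterance, code) pairs produced by a human rater.
    """
    counts = Counter(code for _, code in coded_acts)
    unknown = set(counts) - set(CODES)
    if unknown:
        raise ValueError(f"unknown codes: {sorted(unknown)}")
    # Include codes that never occurred, so every group has all 12 entries.
    return {code: counts.get(code, 0) for code in CODES}


# Hypothetical coded session; only the first pair is taken from the paper.
session = [
    ("What do you see on your screen right now?", "A1"),
    ("I have three weights on my side.", "A1"),
    ("Put the heavy one on the left.", "C1"),
]
freq = tally_codes(session)
print(freq["A1"], freq["C1"], freq["D3"])
```

Returning all 12 codes for every group, including zeros, keeps the frequency tables aligned across groups, which is what a per-category comparison between the text and audio conditions requires.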

Examining the predictive validity of the assessment framework
To assess predictive validity, we examined the extent to which the coding categories of the assessment framework, used in the two different conditions, correlated with the performance measures number of accomplished tasks and average time spent per accomplished task. The results of the correlation analysis for the group of students communicating through text chat are presented in Table 3.
As can be noted, all categories, except A2 and B2, correlated to different extents with the two performance measures, slightly more strongly with number of accomplished tasks than with time spent per accomplished task. Mostly weak but some modest to strong correlations were demonstrated by categories A1, B1, C1, C2, C3, D2 and D3. However, no statistically significant correlations were found between the categories and the performance measures for the text group.
With regard to the audio group, the correlation analysis revealed results similar to those for the text group (see Table 4). For this group all categories, except A3 and B2, correlated with the two performance measures to different extents, slightly more strongly with number of accomplished tasks than with time spent per accomplished task. Mostly weak but some modest to strong correlations were demonstrated by categories A1, B1, C1, C2, C3, D2 and D3. Significant strong correlations were found for A3 and B2.
In total, taking both conditions into account, the categories in the assessment framework only explain 22% of the variance in number of accomplished tasks and 16% of the variance in the time students spend per accomplished task for the text group. Thus, a large portion of the variance is explained by factors other than the categories in the assessment framework.
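For a single coding category, the link between a correlation and a share of explained variance can be illustrated with the Pearson coefficient r and its square. The sketch below is an assumption-laden illustration: the text does not state how the 22% and 16% figures were computed, the function name `pearson_r` is hypothetical, and the six data points are invented and do not reproduce any value from Tables 3 or 4.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for six groups: frequency of one coding category and
# number of accomplished tasks. These values are NOT from the study.
category_counts = [4, 7, 2, 9, 5, 6]
tasks_accomplished = [2, 3, 2, 5, 3, 3]

r = pearson_r(category_counts, tasks_accomplished)
# For a single predictor, r squared is the share of variance in the
# performance measure that is linearly associated with the category.
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

With only six groups per condition, even a fairly large r can fail to reach statistical significance, which is consistent with the pattern of modest to strong but non-significant correlations reported for the text group.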

Discussion
Our understanding of teaching, learning and assessment is changing in many ways, both in terms of digitized, global media and in terms of mental frames and new ways of conceptualizing learning, as with the 21st Century Skills (Griffin et al., 2013). Two important aspects of learning nowadays are social collaboration and the ability to communicate and negotiate to solve problems. It is therefore also important to investigate ways to assess and analyze social interaction and communication, which have traditionally been seen as a method for learning and not as a skill to assess.
In this study we followed 12 groups of students (n=24) engaged in collaborative communication activities to solve problems in digital environments. From this we draw a number of conclusions. The first concerns communication qua communication (utterances, turn-taking, negotiation etc.). We notice a difference in communicative patterns depending on the way communication is carried out: as written text or as verbal talk through Skype. Through verbal talk, the tempo of the communication is much more fluent, and the communication concerns the ability to quickly find out which kind of information the other participant has access to and how the students together could combine information to solve the problem, although our results show that the text groups completed more tasks. We also notice that communicating through verbal talk produces significantly more utterances than communicating through text chat, and that some categories of the PISA framework are more prominent than others depending on the mode of communication. Relating this to an evaluation of the PISA framework, we conclude that the mode of communication indeed affects what is made available for assessment and what is finally assessed.
The second conclusion highlights the content aspect. We have noticed that intense communication can lead either closer to solving the problem or away from it. Thus, a high frequency of coded categories in the PISA framework does not entail productive collaboration or problem-solving performance per se. A related finding is that some students communicate intensely, but mainly on the basis of a trial-and-error strategy. When they manage to get further (to the next level), they do not always understand why this happens. This trial-and-error strategy highlights, on the one hand, a weakness in the design of the system, which allows for the use of this strategy, and, on the other hand, a weakness in the assessment framework, which does not take such behaviour into account. This could also be linked to reading strategies that change when text moves from print to the screen. Previous research has shown [20] that readers of screen-based text material tend to explore the content in a non-linear way. The screen provides several entry points, which might lead to different reading paths than the designer of the system had accounted for [21] [22]. If the point of departure is a linear path that the students should follow, the screen affordances might be a problem that leads to a trial-and-error strategy.
Nevertheless, the general conclusion is that an analytical grid that only focuses on the first, communicative aspect cannot help us to understand communication as a problem solving activity. Our analysis also showed that variance in performance is explained by factors other than the categories in the PISA framework, and that many of the framework categories have weak and non-significant correlations with the performance measures. Thus, one can question the predictive validity of the PISA CPS assessment framework.
As a final remark, the quality of communication must be related to the content aspect. Our argument is that the kind of analytic framework we have used here, i.e. the PISA CPS assessment framework, does not account for quality or productivity in communication. If we only look at the quality of the communication qua communication, in terms of intense or mutual communication, we cannot relate such findings to the content aspect. Our data have helped us to highlight the importance of this aspect, and we will therefore continue to develop a communicative and content-related framework.