Interaction with a Problem Solving Multi Video Lecture: Observing Students from Distance and Traditional Learning Courses

The interactive nature of our novel multi video object offers students several alternatives for watching the lecture. When the multi video is used by students of traditional and distance learning courses, it is opportune to investigate whether there are differences in how students watch and interact with the multi video.
We captured a problem solving lecture on Database Design using our system. The resulting interactive multi video object was offered to two groups of students as extra learning material, in preparation for exams. One of the groups attended a traditional, classroom-based course, and the other group attended a distance learning course. In this paper we first give a brief overview of our system, and then we present observations of how both groups of students interacted with the multi video multimedia learning object. We could observe, for instance, that students from the traditional course used the alternative views allowed by multi video more than students from the distance learning course, while students from the distance learning course spent more time watching and interacted more with the interactive multi video.

Index Terms-Student-multimedia interaction, Interactive Multimedia, E-learning, Ubiquitous Capture and Access.

I. INTRODUCTION
An increasing number of video-based lectures have been made available via Web-based platforms. In some cases the lectures are generated by capturing a live lecture delivered to students in traditional classrooms, as supported by platforms such as Matterhorn [1] (opencast.org/matterhorn/overview), virtPresenter [2], Video Lectures, Echo360 and Eya [3]. Some systems also allow instructors to deliver their lecture in a studio without students, which is a demand for massive online courses such as those deployed on the Coursera and edX platforms. Yet another demand is the generation of lectures via (whiteboard) screen casting, a case in point being the Khan Academy [4]. Given that in many scenarios the lecture material is produced using more than one video source (one video for the instructor and another for the slides or the whiteboard, for instance), recent systems such as Matterhorn and Echo360 allow users to review the contents using more than a single video stream. This is also the case with videoconferencing-based systems used in synchronous learning, such as those based on BigBlueButton and other platforms [5].
The generation of the web-based lectures achieved by capturing live lectures exploits the fact that the classroom can be viewed as a rich multimedia environment where audiovisual information is combined with annotating activities to produce complex multimedia objects, as proposed in the late '90s [6] [7].
As a growing number of web-based lectures is made available, challenging tasks include being able to extract semantics [8] to support search [9] and mobile devices [10], and to predict when students are likely to abandon a course [11]. One theme of recognized importance is the ability to analyze how students watch the lectures and learn from them [12]. As a matter of fact, the literature reports several efforts involving the comparison of traditional and distance learning courses in several aspects [13] [14] [15] [16]. We built a system prototype that allows the recording of several video streams associated with a lecture delivered in an instrumented classroom. The video streams are captured both (a) from cameras focused on the instructor, on projected slides, or on (traditional or electronic) whiteboards, and (b) from the computer screen used by the instructor while interacting with a piece of software or presenting some previously recorded video, for instance.
SPECIAL FOCUS PAPER INTERACTION WITH A PROBLEM SOLVING MULTI VIDEO LECTURE: OBSERVING STUDENTS FROM DISTANCE AND…
Given the several sources of information available in the classroom that are captured by our system, students must be given a broad range of interaction alternatives when reviewing the lecture. Our system produces an interactive multi video object which not only combines the several video streams with contextual and control information but also offers navigation options in the form of points of interest such as slide transitions and the position of the lecturer in the classroom.
Given that the interactive nature of our novel multi video object offers students several alternatives for watching its contents, it is opportune to investigate how students interact with multi videos. After all, students not only can choose which video stream to watch, and watch the chosen video using the conventional video playback controls, but also have available several points of interest to which they may navigate. Because it is important to be able to analyze how students watch the lectures (e.g. [8] [12]), our system logs the students' interactions with the multi video object.
When the multi video is used by students of traditional and distance learning courses, it is opportune to investigate whether there are differences in how students watch and interact with multi videos. As an initial step towards comparing how students from traditional and distance courses interact with our interactive multi video object, we captured a problem solving lecture and offered the resulting interactive multi video to two groups of students as extra learning material they could use while preparing for exams. One of the groups attended a traditional, classroom-based course, and the other group attended a distance learning course.
For our study, a 38-minute problem solving lecture on Database Design was recorded using our system. The lecture was captured in 12 separate sessions, each session using three sources of video: one video captured the slides with the specification of the exercises shown on the instructor's computer; one video captured the instructor while presenting the solution using a set of (animated and projected) slides; one video captured the slides with the solution directly from the computer in charge of the projection (in full resolution). As a result, the problem solving lecture is composed of 12 short multi videos, each one composed of synchronized videos, audio, context information and navigation alternatives. Moreover, the 12 multi videos can be played back in sequence as a single interactive multi video object.
In this paper we present observations of how the two groups of students watched the multi video multimedia object generated with our system. We could observe, for instance, that students from the traditional course used the alternative views allowed by the multi video more than students from the distance learning course, while students from the distance learning course spent more time watching and interacted more with the interactive multi video.
This paper is organized as follows. In Section II we briefly introduce our system. In Section III we detail the captured lecture and the corresponding multi video object. In Section IV we report our study with respect to how the two groups of students interacted with the same interactive multi video object. In Section V we present our final remarks.

II. FROM LECTURE CAPTURE TO INTERACTIVE MULTI VIDEO
We have instrumented a classroom with cameras, electronic whiteboards and computers (Figure 1), and built a prototype system with three main modules (details of the corresponding software architecture are given elsewhere [17]):
• classrec, to capture several information streams from a lecture;
• classgen, to generate an interactive multi video by orchestrating the information captured by the classrec module; and
• a player, which allows the playback of the interactive multi video on HTML5-compliant browsers.
The capture infrastructure has options that allow the capture of a lecture in lecture modules which may last a few minutes or up to two hours. This provides flexibility for the capture of traditional lectures delivered to students in the classroom, and for the capture of a lecture split into several modules. The latter allows, for instance, the recording of one or more modules of the same lecture several times.
The player (Figure 2) is designed so that the multi video object corresponding to the lecture may be reconstituted and explored in dimensions not achievable in the classroom. The student is able, for example, to obtain multiple synchronized audiovisual content that includes the slide presentation (Figure 2(1)), the whiteboard content (Figure 2(2)), video streams with focus on the instructor presenting the slide (Figure 2(3)) or the whole classroom (Figure 2(4)), or the lecturer's web browsing, among others.
Moreover, at any time the student may click on one of the small video windows (Figure 2(2) to Figure 2(4)) to have that particular video presented in the main window at the top, which also causes the video stream that was presented in the main window to be presented in the small one.
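The window exchange just described can be illustrated with a minimal Python sketch. The class and method names (`PlayerLayout`, `promote`) and the stream labels are hypothetical and chosen only for illustration; the actual player is an HTML5 application whose internals are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class PlayerLayout:
    """Hypothetical model of the player: one main window plus small windows."""
    main: str
    small: list = field(default_factory=list)

    def promote(self, stream: str) -> None:
        """Swap the clicked small-window stream with the main-window stream."""
        idx = self.small.index(stream)  # raises ValueError if not a small window
        # The clicked stream goes to the main window; the former main stream
        # takes over the small window that was clicked.
        self.small[idx], self.main = self.main, stream


layout = PlayerLayout(main="slides", small=["whiteboard", "instructor", "classroom"])
layout.promote("instructor")
print(layout.main, layout.small)
```

Keeping the small windows in a fixed order means a click only exchanges two positions, so the student does not lose track of where each remaining view sits.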
The student may use the control for navigation by points of interest (Figure 2(5)) to perform semantic browsing via next/previous module, slide transition, or change of view of the instructor (e.g. a close-up, in front of the whiteboard or the whole classroom).
Given that only the videos needed at one time are downloaded via streaming, users with a bandwidth of 1 Mbps at home are able to watch and interact with the multi video objects. This is because the size of each captured video is about 3.3 MB per minute: the lecture discussed in the next section, for example, lasts 38 minutes and contains three videos totaling 360 MB; the problem solving lecture detailed elsewhere lasts 125 minutes and contains three videos which total 1285 MB [18].
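The arithmetic behind these figures can be checked with a short Python sketch. It assumes decimal units (1 MB = 8 megabits), reads the stated home bandwidth as 1 megabit per second, and uses hypothetical function names; it is a back-of-the-envelope check, not a measurement of the system.

```python
MB_PER_MINUTE = 3.3  # approximate size of one captured video, as stated in the text


def lecture_size_mb(minutes: float, n_videos: int = 3) -> float:
    """Estimated total size of a lecture recorded with n_videos streams."""
    return minutes * n_videos * MB_PER_MINUTE


def stream_bitrate_mbps(mb_per_minute: float = MB_PER_MINUTE) -> float:
    """Average bitrate of one stream, assuming 1 MB = 8 megabits."""
    return mb_per_minute * 8 / 60


print(round(lecture_size_mb(38), 1))    # estimate for the 38-minute lecture
print(round(lecture_size_mb(125), 1))   # estimate for the 125-minute lecture
print(round(stream_bitrate_mbps(), 2))  # megabits per second for one stream
```

Under these assumptions the estimates (roughly 376 MB and 1238 MB) are close to the reported 360 MB and 1285 MB, and a single stream needs well under 1 megabit per second. Three streams downloaded at once would need about 1.3 megabits per second, which is consistent with the remark that only the videos needed at one time are streamed.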
It is important to observe that all video streams (Figure 2(1) to Figure 2(4)) play/pause/stop synchronously. It is also important to observe that, when a lecture is split into several modules, each module is presented in sequence, each module showing its own interactive timeline (Figure 2(6)).

III. ONE LECTURE, 12 MODULES, ONE INTERACTIVE MULTI VIDEO OBJECT
Using the capture tool of the system prototype, one instructor captured one lecture without students in the classroom.
The lecture was a problem solving session for a Database Design course in which the instructor solved a large problem: an entity-relationship diagram was gradually designed from a set of requirements.
The instructor planned the lecture in 12 modules, totaling 38 minutes of content. All modules had a short duration, with an average duration of 3 minutes and 10 seconds. The first module presented an introduction to the problem, and the last module summarized the solution. The solution was developed from the 2nd to the 11th modules.
The lecture was captured in 12 separate sessions, each session using three sources of video: one video captured the slides with the specification of the problem shown on the instructor's computer; one video captured the instructor in front of the whiteboard, while presenting the solution using a set of (animated and projected) slides; one video captured the set of animated slides corresponding to the solution, which can be reviewed in a resolution higher than in the video capturing the instructor in front of the slides. Due to the modular organization of the sessions, the instructor could repeat the recording of any session as many times as necessary.
Figure 3 depicts the multi video object automatically generated by the system, and presented using our presentation engine. The three video streams are presented in separate windows: the video stream with the instructor in front of the slides is shown in Figure 3(1); the video stream with the set of animated slides corresponding to the solution is shown in Figure 3(2); the video stream with the slides presenting the specification of the problem is shown in Figure 3(3). The student may click on one small window (Figure 3(2) and (3)) to cause the corresponding video to be exchanged with the one presented in the large window (Figure 3(1)).
It is opportune to observe that the system offers an option to generate a multi video object that performs the automatic orchestration of which video is presented in the top window. In this case study we did not use this option, since we aimed at observing which interactions the students performed themselves.

IV. OBSERVING STUDENT INTERACTIONS
The aim of the study we present in this paper was to observe how students from different modalities (traditional and distance learning courses) interact with the same multi video object. As interaction options, students could (a) select which video stream to present in the top window, (b) select a point on the timeline corresponding to the current module, and (c) use the navigation control to browse to the next or previous point of interest. For this lecture, points of interest were moments in which the instructor (a) changed slide, (b) used the laptop computer, or (c) used the whiteboard.
The interactive multi video corresponding to the problem solving lecture in Database Design was offered to two groups of students as extra learning material, in preparation for exams. One of the groups attended a traditional, classroom-based course, and the other group attended a distance learning course.

A. Students background
The distance learning group had 92 students from a Bachelor in Information Systems (BSI) course. The traditional classroom group had 25 students from a Bachelor of Electrical Engineering (BEE) course. BSI students take the Database Design class earlier in their courses (BSI in the 3rd semester, BEE in the 7th semester).
The BSI course is a distance learning course, and students take their classes in this modality. Therefore they are used to watching web lectures as well as using other resources available via the learning management system (LMS), such as discussions in forums. The BEE course is a traditional classroom-based course, and students take their classes in classrooms where lectures are delivered by instructors. Therefore they are used to having their instructors discuss material presented on slides, on whiteboards, on the web or in other software, etc.
The backgrounds of the two groups are quite different. BEE students are younger (23 years old on average) than BSI students (more than 40% are over 30 years old). BEE students are full-time students, while BSI students are all part-time students with full-time jobs. Another important difference is that 46% of BSI students have previous higher education degrees, mostly in science and math or information technology. This background suggests that BSI students are more mature than BEE students.

B. Interactions
The multi video object was made available to the students on the Web. We logged all the interactions carried out by the students, such as when and where the users clicked and to which point they jumped in the presentation timeline. Our infrastructure includes Python scripts to extract information on how the students interacted with the multi video.
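As an illustration of the kind of processing such scripts may perform, the following minimal sketch counts interaction types per student. The log format (comma-separated student id, wall-clock second, action, module, position in the module), the action names and the function `count_actions` are assumptions made for this example; they do not reproduce the system's actual schema or scripts.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical log format (an assumption, not the system's real schema):
# student_id, wall_clock_seconds, action, module, position_seconds
SAMPLE_LOG = """\
s01,0,play,1,0
s01,42,select_main_video,1,42
s01,60,timeline_jump,1,120
s02,5,play,2,0
s02,30,pause,2,25
"""


def count_actions(log_text: str) -> dict:
    """Return {student_id: Counter(action -> count)} from CSV log text."""
    per_student = defaultdict(Counter)
    for student, _time, action, _module, _pos in csv.reader(io.StringIO(log_text)):
        per_student[student][action] += 1
    return dict(per_student)


counts = count_actions(SAMPLE_LOG)
for student, actions in sorted(counts.items()):
    print(student, dict(actions))
```

Summing such per-student counters by group would yield the share of each interaction type per group.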
A total of 25 students from the distance learning course and 15 students from the traditional classroom-based course interacted with the multi video object for more than 4 minutes (we did not consider students who watched less than 4 minutes). From now on, we use:
• DLC students to refer to the group of 25 students from the distance learning course, and
• TLC students to refer to the group of 15 students from the traditional course.
Table I summarizes data corresponding to interaction time and number of interactions, which suggests that, overall, DLC students tend to spend more time watching the video than TLC students, while TLC students tend to use the interaction alternatives more than DLC students. However, these differences are not statistically significant.
The charts in Figure 4 show that students from both courses used the main video selection more than other navigation options (51% for DLC and 66% for TLC). Moreover, while the play/pause interaction options were used more often by students from the TLC group, navigation via the timeline was used more often by students from the DLC group. DLC students also tend to use the temporal navigation alternatives more than TLC students. Interestingly, TLC students did not use the navigation by points of interest at all; this may be related to the fact that they are not used to video-based lectures for their classes.
Figure 5 depicts the total time spent and the number of interactions performed on each module by DLC students (Figure 5(a)) and TLC students (Figure 5(b)). The data is normalized by the duration of the module. The vertical scales refer to time (left) and number of interactions (right). Overall, the graphs show that even though both groups had similar interaction behavior along the modules, DLC students interacted more. For instance, module 2 was the one watched for the most time by both groups but, while for DLC students the ratio time/module duration is 40, for TLC students the ratio is 18. Similarly, for the same module the ratio number of interactions/module duration is 31 for DLC students and 18 for TLC students.
Figure 6 summarizes the watching behavior for modules 2, 4 and 8. The horizontal axis refers to the time duration (in seconds) of each module (Presentation Space). As modules always start from instant 0, it is natural that attendance is highest in the first seconds. Overall, more DLC students tend to interact, and on more occasions, than TLC students. Moreover, even though the actual behavior varied among the modules, the two groups had similar behavior: this can be observed by comparing lines 1 and 2 for modules 4 and 8. This information can be useful for instructors to find out which portions of a lecture were more useful or important for the students or, conversely, which portions students had more difficulty in understanding.
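The four-minute inclusion threshold and the normalization by module duration can be sketched as follows. The function names are hypothetical, the numbers are invented purely for illustration (they are not data from the study), and treating exactly four minutes as eligible is an assumption, since the text does not settle that boundary case.

```python
MIN_WATCH_SECONDS = 4 * 60  # inclusion threshold stated in the text


def eligible_students(watch_seconds: dict) -> list:
    """Keep students who watched for at least four minutes (boundary is an assumption)."""
    return sorted(s for s, secs in watch_seconds.items() if secs >= MIN_WATCH_SECONDS)


def normalize_by_duration(totals: dict, durations: dict) -> dict:
    """Divide each module's total (watch time or interaction count) by its duration."""
    return {module: totals[module] / durations[module] for module in totals}


# Invented numbers for illustration only; they are not data from the study.
durations = {"module_a": 100, "module_b": 200}      # seconds
watch_totals = {"module_a": 500, "module_b": 3000}  # seconds, summed over students
ratios = normalize_by_duration(watch_totals, durations)
print(ratios)
print(eligible_students({"s1": 100, "s2": 240, "s3": 900}))
```

Dividing by the module duration makes short and long modules comparable: a ratio of 15 means the students of a group together watched the equivalent of 15 full passes through that module.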
Given that the multimedia object has more than one video stream and that the students can choose which stream they wish to see as the main stream, the information of which stream is most selected as the main stream at each moment is useful not only for lecturers but also for students, who can check the most attended versions of the lecture. Figure 7, Figure 8 and Figure 9 summarize which streams were most selected as the main stream at each moment for some modules by DLC and TLC students. Each line represents how many times a stream was watched as the main video at a specific moment. In all figures,
• line Instructor refers to the video capturing the instructor in front of the whiteboard (Figure 3(1));
• line Specification refers to the video capturing the slides with the specification of the problem (Figure 3(3)); and
• line Solution refers to the video capturing the slides corresponding to the solution (Figure 3(2)).
Considering the two specific groups of students, we could observe that:
• Overall, TLC and DLC students gave a similar amount of attention, and of interaction, to the different modules;
• Overall, DLC students interacted with the multi video object for more time and using more interaction options. This is expected since watching video lectures is part of the everyday activities of DLC students;
• Overall, DLC students selected the slide captured view to be presented in the main window more often than TLC students, who preferred the view containing the instructor presenting the slide. This is probably related to the fact that TLC students are used to having the professor in the classroom presenting the slides, which is not the case with DLC students;
• Overall, TLC students were more active in selecting videos to be presented in the main window. This may be related to the fact that, while TLC students are used to participating in lectures in the classroom environment, where they have live access to the different views of the lecture as provided by the several video streams, DLC students are used to watching web lectures in which the change of views, if any, is made by the video editor prior to the publication of the video;
• Overall, TLC students did not make use of the navigation by points of interest. This may be related to the fact that they were more active selecting each video to show in the main window;
• Overall, all students made use of the navigation using the timeline. This may be related to the fact that all students are used to timeline-based video navigation;
• Overall, all students made use of module-based navigation but did not use other points of interest to navigate. This may be related to the fact that the modules had short duration.
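The per-moment curves summarized in Figures 7 to 9 can be derived from viewing intervals along the following lines. This is a minimal sketch under stated assumptions: the input is a hypothetical list of (stream, start second, end second) intervals during which a stream occupied the main window, with the end second excluded, and the function name `main_stream_counts` is invented for this example.

```python
def main_stream_counts(intervals, module_seconds: int) -> dict:
    """Count, for each second of a module, how often each stream was the main video.

    intervals: iterable of (stream, start_s, end_s), with end_s exclusive.
    Returns {stream: [count at second 0, count at second 1, ...]}.
    """
    counts = {}
    for stream, start, end in intervals:
        row = counts.setdefault(stream, [0] * module_seconds)
        # Clamp the interval to the module boundaries before counting.
        for t in range(max(start, 0), min(end, module_seconds)):
            row[t] += 1
    return counts


# Invented intervals for illustration only; not data from the study.
sample = [("Instructor", 0, 3), ("Solution", 3, 5), ("Instructor", 1, 4)]
curves = main_stream_counts(sample, module_seconds=5)
print(curves)
```

Each list in the result corresponds to one line of such a chart; computing the curves separately for the two groups allows them to be compared side by side.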

V. FINAL REMARKS
Because video has become "a premier media for learning" (sites.google.com/a/ionio.gr/wave/about), video-based teaching and learning technologies demand studies involving several dimensions, including the analysis of student behavior and the provision of novel interaction opportunities.
We have built a prototype system that allows capturing several video sources along with context information, so that a multi video is automatically generated. Analyzing the data from student interactions has been shown to be useful for instructors to identify points in which the lectures can be improved [18].
In this paper we report results from observing two groups of students, one group attending a distance learning course and the other a traditional course, who were offered a multi video generated from capturing a problem solving lecture as additional learning material to prepare for exams. Observing the data from the student interactions, we could identify some aspects that are similar and others that are distinct for the two groups. We plan to use these results to offer customized features for the groups. In other words, the difference in navigation behavior can be considered as a type of context information that influences the services offered by the system. As one example, automatic video selection may be the default option for students from traditional courses because they are usually more passive in terms of navigation-based interactions. As another example, the default view for DLC students may be set to be the slide view in the main window, while the default view for the TLC students is the one in which the instructor is in front of the slide.
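One way to express such group-dependent defaults is a small lookup table, sketched below. The keys, view names and the function `player_defaults` are hypothetical. The text only proposes automatic video selection as a possible default for traditional-course students, so leaving it disabled for distance learning students is an assumption of this sketch.

```python
# Hypothetical defaults derived from the proposals above; not an implemented feature.
DEFAULTS_BY_MODALITY = {
    "distance": {
        "main_view": "solution_slides",
        "auto_video_selection": False,  # assumption: not stated in the text
    },
    "traditional": {
        "main_view": "instructor",
        "auto_video_selection": True,
    },
}


def player_defaults(modality: str) -> dict:
    """Return a copy of the default player settings for a course modality."""
    return dict(DEFAULTS_BY_MODALITY[modality])


print(player_defaults("traditional"))
```

Returning a copy lets a student's own choices override the defaults without changing the table shared by the whole group.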
In the short term, we plan to study the performance of students who make use of our system. We are also investigating novel facilities to be provided to instructors who demand other capture infrastructures.
Although we do not provide instructors with visualization interfaces which they can use to analyze the watching behavior of the students, this is an important new requirement as stated by the instructor of the lecture discussed in this paper as well as by other instructors who used our system [18] [8]. This is in fact a need recognized by researchers involved in video-based learning (e.g. [8] [12]).
Our plans for future work also include capturing more contextual information during the presentation toward providing novel navigation facilities, and the development of visualization tools for the instructors to analyze the students' multi video object interaction.