A Framework for the Use of Immersive Virtual Reality in Learning Environments

Immersive Virtual Reality (iVR) technologies can enrich teaching and learning environments, but their use is often technology-driven and instructional concepts are missing. The design of iVR-supported learning environments should be based on both an evidence-based educational model and features specific to iVR. Therefore, this article provides a framework for the use of iVR in learning environments based on the Cognitive Theory of Multimedia Learning (CTML). It outlines how iVR learning environments could and should be designed based on current knowledge from research on multimedia learning.
Keywords: virtual reality, instructional design, immersive virtual reality, pedagogical framework


Introduction
Virtual reality (VR) technology is increasingly promoted as a promising educational tool in various training settings [1,2], such as health care [3-6] or engineering [7-9]. While the educational use of VR is growing, little is known about the learning processes occurring in VR environments [10]. By asking "Where is the pedagogy?", Fowler [11] calls for models that explain learning in VR environments. Using VR technology in training should be a balanced decision that considers both the positive and the restrictive attributes of VR. Several educational benefits of implementing VR have been reported in the literature: it is highly motivating, increases student engagement, provides high-quality visualizations, and creates the feeling of being present [1-3,12-14]. However, the additional educational value of VR differs in intensity between (1) immersive VR (iVR) environments and (2) 3D environments that are presented via a 2D display. The literature typically distinguishes between low immersive VR, which is based on traditional devices like mouse and keyboard, and high immersive VR, which generally involves a head-mounted display (HMD) [15,16]. This paper focuses on how to design iVR learning environments to support meaningful learning. However, iVR technology is subject to certain restrictions. Disadvantages of using iVR relate to the time and costs necessary for developing hardware and software, possible health and safety effects, the discomfort of wearing HMDs, possible reluctance to use the technology, and its integration into learning scenarios [17]. Additionally, immersive VR environments in particular are more likely to distract and overload users and to result in lower levels of learning. Depending on the instructional goals, integrating iVR may reduce working-memory capacity and thereby interfere with learning processes. Thus, the use of iVR technology should provide an added value for learning results [18].
Following instructional guidelines, the opportunities of iVR should be exploited and the challenges of using iVR should be addressed. Developing a powerful learning environment requires consideration of features specific to iVR technology. So far, research on iVR applications is often technology-driven and focuses on anecdotes, case studies and demonstrations of technical prototypes. Learning processes in iVR are not addressed, nor do instructional methods form the basis of training applications [1,2,11,19]. As a result, in some cases iVR seems to be uneconomical, ineffective, or exaggerated (i.e., too complex or inappropriate to fulfil a training goal), and other learning media (e.g., simulations, pictures) might have been a better choice regarding costs and benefits [20]. Consequently, instructional designers and computer scientists should work closely together to develop iVR learning environments that are based on educational decisions. This paper suggests that iVR technology used in educational settings should be designed according to multimedia design principles in order to benefit from its promising characteristics. Hence, an approach is needed that offers hands-on instructional guidelines on how to design iVR learning environments to take full advantage of the technology and overcome obstacles.

Theoretical Background
The development of our framework is based on two assumptions. First, we understand learning in iVR as multimedia learning, because in virtual worlds images and texts are presented in combination. Second, learning is an active process that goes beyond the mere repetition and reproduction of information or central concepts. Therefore, we follow the distinction between rote and meaningful learning postulated by Mayer [21]. Whilst rote learning leads to superior performance in retention processes, meaningful learning supports transfer processes as well. Meaningful learning refers to the application, i.e., the transfer, of knowledge to solve problem-based tasks. This is illustrated by the revised taxonomy for educational objectives [22], according to which meaningful learning addresses the goals of understanding, applying, analyzing, evaluating and creating [21]. To enable meaningful learning, Mayer suggests designing instructional media according to the principles stated in the cognitive theory of multimedia learning (CTML) [21,23].
These two assumptions guide the development of our framework. Therefore, we first introduce the CTML, how multimedia learning works, and the consequences for instructional design that help learners learn with multimedia instruction. Subsequently, we describe the key features of iVR technology. In a final step, we synthesize CTML and the features specific to iVR. As a result, we present the meaningful iVR learning (M-iVR-L) framework. Based on this framework, we identify design guidelines to enable meaningful iVR learning.

Multimedia learning
The combined presentation of words (spoken or written) and pictures (static or animated) for the purpose of learning is known as multimedia learning [24]. This term, which originates from the empirical work of Mayer and colleagues, is widespread and influences research on various instructional media like computer games, simulations, video and also iVR [25-27]. The CTML is a theoretical framework of how people learn with instructional media [24,28]. It rests on three principles [29]. The first principle has its roots in dual coding theory [30] and states that people process information within two different channels: one channel is responsible for processing verbal information, the other for processing visual information. The second principle, the limited capacity of each channel, builds upon the findings of cognitive load theory (CLT) [31,32]. This instructional theory has shown that human working memory capacity is limited and that instruction therefore has to avoid inappropriate instructional approaches. This is achieved by reducing unnecessary strain on learning, i.e., extraneous cognitive load, as much as possible [33]. These two rather cognitivist principles are supplemented by a third, constructivist one, namely learning as a generative activity [29]. According to the generative learning theory of Wittrock [34,35], learning is an interplay between already stored information and new stimuli and is effective when learners' active cognitive processing is stimulated. In CTML, active cognitive processing is stimulated by engaging learners in selecting the relevant material, organizing it into a coherent structure, and integrating it with prior knowledge [29,36]. Learning with instructional media according to the principles stated in CTML is then what Mayer calls meaningful learning, in which learners acquire knowledge and skills for the purpose of effective problem-solving [21,23].
Over the last 30 years, empirical research has repeatedly confirmed the assumptions made in CTML. From these results three major instructional design goals emerged that should be considered when designing multimedia learning environments [29].

Instructional design goals
Instructional design goals are based on the scientific study of how to help people learn, i.e., the science of instruction. The current assumption here is that hands-on activities by themselves cannot foster meaningful learning, but cognitively guided active processing can do so [29]. With the principles of CTML in mind, three instructional design goals are essential to help learners learn with instructional media. The first design goal is the reduction of extraneous processing. This means dispensing with distracting aspects within the multimedia learning environment, like background music (coherence principle) or on-screen text presented during narration (redundancy principle) [37]. Another way is to display related information close together in space and time. The positive effect of the temporal and spatial contiguity principles on learning outcomes is also confirmed in meta-analyses [38]. The same applies to the signaling principle, where symbols and colors are used to guide learners' attention to relevant material. The positive effect of signaling on learning outcomes is well documented and robust [39].
The second design goal refers to scaffolding, which helps learners to manage essential processing and to avoid cognitive overload. Principles that help are the modality principle, the segmenting principle and the pretraining principle. The modality principle states that it is better to present images with spoken instead of written text. This presentation format leads to better learning, at least for less complex content [40,41]. Research on the segmenting principle recommends dividing complex material into smaller learning units. When segmenting is applied, learning time increases, but cognitive load is reduced and a positive effect on both memory and transfer tasks occurs [42]. For learners with little prior knowledge of the to-be-learned content, pretraining is an effective principle. Here, learners study the basic concepts of a lesson before interacting with the multimedia instruction, which in turn frees up working memory capacity for essential processing [43].
Making sense of the material through generative processing is the third instructional design goal. Here, the use of social cues and generative learning strategies is recommended. Social cues include the use of conversational language during narration (personalization principle), the delivery of information or instructions in a friendly, human voice (voice principle), and the application of human-like gestures for animated content (embodiment principle) [44]. Generative learning strategies in multimedia learning are the self-explanation [45] and drawing [46] principles. Other strategies like self-testing, summarizing, mapping and teaching are already well investigated for traditional media like textbooks and are now gradually finding their way into the design and research of more emergent instructional media (for an overview see [47]). The use of learning strategies is based on the aforementioned generative learning theory [48] and the idea that learning is an active construction of knowledge. The positive effect of such strategies has already been shown for instructional videos. With the help of such strategies, learning with video becomes an active engagement with the content instead of purely passive consumption [26].
The aim of the outlined instructional goals is to help learners gain skills and knowledge that are applicable to new problems and tasks. This claim regarding the transfer of learning outcomes is the difference between retention-based or rote learning and meaningful learning as it is understood by Mayer and colleagues [21].

Key features of VR technology
The application of multimedia principles and instructional goals within iVR learning environments requires a profound understanding of the medium itself and of the factors that affect individual perceptions of iVR technology. VR can be described as "the sum of the hardware and software systems that seek to perfect an all-inclusive, sensory illusion of being present in another environment" [49]. This distinguishes VR from other reality-enhancing technologies such as augmented reality (AR) and augmented virtuality (AV). These are placed on the reality-virtuality continuum of Milgram and Kishino [50] between the real environment and the entirely computer-simulated environment (Fig. 1).
Fig. 1. Reality-virtuality continuum [50]
VR learning environments, especially the more immersive ones, allow the realistic visualization of three-dimensional (3D) data and support an exciting real-time learning experience. They can improve performance outcomes, enable high interactivity with objects and persons, present a virtual environment that resembles the real world, offer feedback from the simulation to the learner, and foster conceptual understanding by providing an effective and unique way to learn and to motivate learners [51]. Learning environments that build on this technology offer authentic learning activities that other media (e.g., video) cannot provide appropriately (e.g., turning and rotating elements of mechanical installations that are not available in the real world). There are different ideas about the key characteristics that distinguish VR from other educational media [52-56]. Burdea and Coiffet [52] define VR as "I3" (Immersion-Interaction-Imagination).
Immersion: Immersion is one factor that contributes to the capabilities and impact of VR, as it can bridge the technical features of a 3D environment, the experience of presence and the educational affordances of a task. Immersion can be classified into:

1) Mental immersion
2) Physical immersion
Immersion plays an important part in creating a successful personal experience within a VR environment. When the user moves, the visual, auditory, or haptic devices that establish physical immersion in the scene change in response. A user can interpret cues to gather information while navigating and controlling objects. Naturally, the more sensory inputs are present in a virtual environment, the easier it is for the user to visualize the world and feel incorporated into it [57]. Mental immersion refers to the state of being deeply engaged within a VR environment [58]. Hence, immersive environments can offer learners rich and complex content-based learning while also helping learners to improve their technical, creative, and problem-solving skills [51]. Slater and Wilbur [59] identify five characteristics to describe immersion: inclusiveness (diversion of focus from the real world), extensiveness (extent of sensory input), surroundingness (extent of panoramic display), vividness (richness of features) and proprioceptive matching (alignment of perceptual means with the virtual interface).
Interaction: Another feature that contributes to the success of learning in 3D environments is interaction or interactivity [56,60,61]. That is, a VR system can detect an input (e.g., a user's gesture) via multiple sensory channels (e.g., haptic, visual) and respond to the new activity in real time. At the same time, users can see the activity on the screen change based on their commands as captured in the simulation [51]. Interactivity includes the ability to move around freely in a virtual environment, to experience it "first-hand" and from multiple points of view, to modify its elements, to control parameters, or to respond to perceived affordances, environment cues, and system feedback. Interaction has also often been linked to immersion, indicating that user control over the environment is important for the experience of being present in VR [62]. VR learning environments enable several kinds of interaction (e.g., navigation, selection, manipulation). When using an HMD, users can navigate freely as long as they do not leave the range of the tracker, are not hindered by cables and do not hit a wall of the real room. A selection can be made by a simple touch with the input device. The position of the input device in the virtual world is represented by a 3D cursor, for example in the shape of a human hand. If objects are too far away, techniques such as laser pointers or crosshairs can be used to point to distant objects. The manipulation of objects in the real world is manifold (touching, lifting, rotating, turning on, etc.). In VR, it is usually precisely defined which objects allow which interactions, and special 3D widgets are required (e.g., spotlight manipulator, Through-the-Lens Camera Control) [63-65].
Imagination: A further construct that is specific to VR is imagination [52]. It refers to the human mind's capacity to perceive non-existent things. VR supports users in elaborating on their thoughts and engaging in meaningful learning. This requires users to wilfully put themselves into a suitable frame of mind. It takes active attention as well as active mental modelling of what one is perceiving [66]. For Jonassen [67], VR technologies can activate cognitive tools that help learners to elaborate on their thoughts and to engage in meaningful learning. Therefore, a VR environment triggers the human mind's capacity to imagine, in a creative sense, non-existent things. Hence, VR technologies are well suited to convey abstract concepts (e.g., the inside of a machine) due to their visualization abilities [51]. To stimulate imaginations of direct experiences, sensory information should be balanced with prior knowledge to avoid under- or overstimulation, which would impede imagination [68].
Depending on their particular instructional goals, many educators regard it as unnecessary to deploy all three features. Hence, numerous virtual learning applications integrate interaction and immersion, whereas imagination seems to be underrepresented [69]. However, the focus of current research in the field still often seems technology-driven, whereas it is crucial to explore how to design an iVR learning environment to accomplish learning objectives and to enable meaningful learning. In the following section, we therefore aim to develop a framework that provides guidelines for the instructional design of iVR learning environments and encourages stakeholders to implement iVR for their own learning scenarios based on this framework.

Meaningful iVR Learning (M-iVR-L) Framework
In this section, we bring together the principles of CTML, the instructional design goals to support learning and the key features of iVR technology described in section 2. As a result, we postulate the meaningful iVR learning framework (see Fig. 2), which considers the design of iVR learning environments as a process. The principles found in CTML influence the instructional design goals [29], and these must be taken into account with respect to the technical features of iVR technology to enable meaningful learning.
Consequently, we propose six evidence-based recommendations within our M-iVR-L framework that should be considered when designing iVR learning environments.

Learning first, immersion second
With the rise of iVR technology, the key feature of immersion was claimed to be supportive for learning, e.g., because it can provide situated learning through authentic contexts and tasks [70]. Recent studies comparing learning scenarios in low immersive VR media (like desktop computer games) with iVR media draw a contradictory picture. On the one hand, the study in [12] found that an iVR simulation leads to a higher feeling of being present in a virtual lab but to less learning compared to the low immersive desktop condition. This was also found in [71]. On the other hand, the studies in [72-74] found evidence for a positive influence of the feeling of immersion on learning outcomes. In line with the instructional goal of reducing extraneous processing, we recommend thinking carefully about the degree of immersion necessary. If a higher degree of immersion is not relevant to achieving the learning objective, less is more.

Provide learning relevant interactions
Learning-relevant physical activities can positively impact declarative knowledge acquisition and are unavoidable if procedural knowledge, i.e., skills, is to be obtained. This learning strategy is known as enactment and can foster generative processing [47]. However, it should be noted that enacting is only beneficial if the movements performed are relevant for a certain learning task [75]. This is also true for iVR learning, where, for example, the use of controllers lets learners manipulate objects with virtual representations of their hands. In [76], high levels of interactivity were found to be helpful for learning, whereas [77] found no advantage for iVR in procedural knowledge gain or transfer performance compared to a video condition without generative processing. To optimize iVR learning in terms of interaction, we postulate two recommendations: First, avoid unneeded and learning-irrelevant interactions. Second, give learners pretraining, not only in terms of basic concepts, but also on how to use the iVR interaction tools.

Segment complex tasks in smaller units
Content in iVR learning environments is an extremely complex form of multimedia instruction with a high risk of overwhelming learners. The influence of this possible distraction was tested in two studies, including EEG measurements of cognitive load. In both studies, the iVR group's cognitive load was higher and, at the same time, its scores in a retention and transfer test were lower compared to a slide show presentation group [25,78]. Similar results for cognitive load levels were found in [12] and [79]. The authors of the mentioned studies point out that iVR can increase extraneous load, which is the type of load that hinders learning [70]. Providing scaffolds to manage essential processing is one way to overcome this issue. For example, in [25] an iVR simulation on the human body was divided into six smaller segments with a summarizing phase after each segment. As a result, the iVR group with segmented lessons outperformed an iVR condition without segmenting and reached performance levels similar to those of the slide show group. We conclude that breaking down complex tasks into small segments is also effective for managing essential processing in iVR.
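The segmenting recommendation can be illustrated with a minimal sketch. It assumes a lesson is available as an ordered list of steps and splits it into a chosen number of segments, each followed by a summarizing phase, loosely mirroring the six-segment design reported in [25]. The function name `build_segmented_plan`, the `SUMMARY` marker and the even-split rule are hypothetical choices made for illustration and are not taken from the cited studies.

```python
from typing import List

SUMMARY = "SUMMARY"  # hypothetical marker for a summarizing phase


def build_segmented_plan(steps: List[str], n_segments: int) -> List[List[str]]:
    """Split an ordered lesson into n_segments, each ending with a summary phase.

    The even split is an illustrative assumption; in practice, segment
    boundaries would follow the structure of the learning content.
    """
    if n_segments < 1 or n_segments > len(steps):
        raise ValueError("n_segments must be between 1 and len(steps)")
    base, extra = divmod(len(steps), n_segments)
    plan: List[List[str]] = []
    start = 0
    for i in range(n_segments):
        # The first `extra` segments receive one additional step.
        size = base + (1 if i < extra else 0)
        plan.append(steps[start:start + size] + [SUMMARY])
        start += size
    return plan


if __name__ == "__main__":
    lesson = [f"step {i}" for i in range(1, 13)]
    for segment in build_segmented_plan(lesson, 6):
        print(segment)
```

Whether the summary takes place inside or outside the HMD is left open in this sketch; in [25] it was written outside of the HMD.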

Guide immersive learning
The role of guidance is still a debated topic in educational psychology and beyond. Even if there seems to be at least some agreement that completely unguided discovery learning is not useful due to cognitive overload, the debate is now about the timing and form of guidance for effective learning (for an overview see [80]). As mentioned in section 3.1, iVR itself increases cognitive load, so it is the responsibility of instructional designers to provide appropriate guidance. Otherwise, novices in particular will feel overloaded and thus not learn [32]. Evidence for that claim was found in [81], where elementary students reported high levels of presence but not of perceived learning during an iVR field trip. Here, highlighting essential material (signaling principle) as well as the use of pedagogical agents designed according to the personalization and voice principles can guide learners through the iVR learning environment. Guidance can also foster generative processing through just-in-time information that fades away once the learner has built higher levels of knowledge and skills to solve the next learning task [82]. For example, in vocational education, novices practice car painting in an iVR simulation that provides hints and information during the process. After learners reach a certain skill level, the hints fade away, while learners still have the chance to call for help if needed [83].
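The idea of fading guidance can be sketched in a few lines. The example below assumes that learner skill is available as a score between 0 and 1 and that hints fade linearly between two thresholds; the class `FadingGuidance`, its thresholds and the linear fade are hypothetical illustrations, not a description of the system in [83].

```python
from dataclasses import dataclass


@dataclass
class FadingGuidance:
    """Hypothetical scheduler that fades hints as learner skill grows."""

    fade_start: float = 0.4  # assumed skill level at which hints begin to fade
    fade_end: float = 0.8    # assumed skill level at which hints are fully faded

    def hint_level(self, skill: float) -> float:
        """Return hint intensity in [0, 1]: 1 means full hints, 0 means none."""
        if skill <= self.fade_start:
            return 1.0
        if skill >= self.fade_end:
            return 0.0
        return (self.fade_end - skill) / (self.fade_end - self.fade_start)

    def show_hint(self, skill: float, help_requested: bool = False) -> bool:
        """Show a hint if the learner asks for help or guidance has not fully faded."""
        return help_requested or self.hint_level(skill) > 0.0


if __name__ == "__main__":
    guidance = FadingGuidance()
    for skill in (0.2, 0.6, 0.9):
        print(skill, guidance.hint_level(skill), guidance.show_hint(skill))
```

The `help_requested` flag reflects the point that learners can still call for help after the hints have faded.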

Build on existing knowledge
To foster learning activities, new information should be balanced with prior knowledge to avoid under- or overstimulation [51,68]. Worked examples and tutorials may help learners with a low level of prior knowledge but hinder learners with a high level of prior knowledge. This phenomenon is called the expertise reversal effect [84] and also holds in iVR learning scenarios [85]. We recommend determining learners' current level of knowledge in order to adjust the difficulty as well as the amount of support. This needs to be an ongoing process as learning progresses. Depending on their current level of knowledge, learners need preparation inside and/or outside iVR (pretraining principle), which frees up working memory capacity for essential processing within the iVR learning task. Supportive information helps to keep the cognitive load low, especially for learners with little prior knowledge [43]. This principle has already been tested within iVR. In a comparison of video and iVR instruction, each with and without pretraining, the iVR group with pretraining achieved the greatest learning success in both memory and transfer [79].
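As a minimal sketch of this recommendation, the rule below maps a prior-knowledge estimate to a support plan: learners below a cutoff receive pretraining and worked examples, while learners above it do not, in line with the expertise reversal effect. The function `plan_support`, the 0-to-1 pretest scale and the cutoff value are assumptions made for illustration and are not taken from the cited studies.

```python
from typing import Dict


def plan_support(pretest_score: float, novice_cutoff: float = 0.5) -> Dict[str, bool]:
    """Choose support for one learner from a prior-knowledge estimate in [0, 1].

    Both the score scale and the cutoff are hypothetical; a real system
    would calibrate them for the specific learning content.
    """
    if not 0.0 <= pretest_score <= 1.0:
        raise ValueError("pretest_score must lie in [0, 1]")
    is_novice = pretest_score < novice_cutoff
    return {
        # Pretraining on basic concepts (and iVR controls) before the lesson.
        "pretraining": is_novice,
        # Worked examples may help novices but hinder advanced learners.
        "worked_examples": is_novice,
    }


if __name__ == "__main__":
    print(plan_support(0.3))
    print(plan_support(0.8))
```

Because the level of knowledge changes during learning, such a rule would be re-evaluated with an updated estimate as the lesson proceeds.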

Provide constructive learning activities
Today there is broad consensus that learning is an active process which engages learners in knowledge construction. Some construction processes are visible, like hands-on activities, which often result in a self-designed artifact or product [86]. Others are not visible, like linking prior knowledge with newly acquired information, which is based on human cognitive architecture [32,87]. What they have in common is the assumption that learning takes place through learning activities. Several learning activities were found to be effective in iVR learning. In [72], learners used the strategy of memory palaces in an HMD and outperformed a desktop-based control group. In [25], the generative learning strategy of summarizing was used to foster processing. Here, learners wrote the summary outside of the HMD after each segment of an iVR simulation on the human body. The same applies to the study in [77] for the learning strategy of enactment. Here, learners used a virtual lab in iVR and afterwards enacted the procedure with physical objects on a table that represented the same laboratory tools manipulated before in iVR. Of interest in these last two studies is that learners' enjoyment during the iVR lesson was not diminished by adding generative learning strategies. This means that iVR has the potential to be effective for learning and at the same time to make learning more enjoyable than traditional media like slide show presentations. We conclude with the words of David Merrill, who stated that "information alone is not instruction" [88,89]. Learning is an active process of knowledge construction, and even the most impressive, immersive, and realistic iVR environment will not promote learning if learners do not engage in learning activities. Therefore, we recommend providing constructive learning activities that enable learners' knowledge construction and its application to new problem-based tasks inside or outside of iVR.
http://www.i-jet.org

Conclusion and Future Research
iVR offers new learning experiences based on a vivid and lifelike learning environment [90-92]. So far, there are only a few examples that demonstrate the usefulness of iVR in learning applications. Therefore, we have worked out an evidence-based framework grounded in the widespread and well-supported theory of multimedia learning (CTML) and its consequences for instructional design goals, and we have additionally taken into account the key features of iVR that make this technology unique.
Our framework of six recommendations is not to be understood as final; it has been developed based on current empirical findings on learning with iVR. The key findings are that effective and enjoyable learning does not need high degrees of immersion in most cases, but it profits from guidance and from breaking down iVR lessons into smaller units. Interactions must meet the learning objectives; if they do not, they can distract and therefore hinder learning. As in every learning environment, prior knowledge must be taken into account in iVR learning to enable learners' efficient knowledge construction. Learner preparation inside as well as outside iVR is recommended. Because learning is more than just consuming information, constructive learning activities must be integrated, inside or outside the virtually designed world, if meaningful learning with iVR is to happen. Teachers and instructional designers need not fear that iVR learning will no longer be perceived as joyful if they use our recommendations. The current findings show that learning strategies do not diminish learners' positive affective states towards learning with iVR.
However, even if our framework is based on an analysis of current literature findings, we need to point out that the proposal is still mostly centered on assumptions. For example, we have not incorporated the social dimension of learning in iVR [93] or other aspects like gamification and game-based learning mechanisms [94].
To learn more about these aspects, more carefully planned and rigorously designed research is necessary, both in real classrooms and in laboratory settings [95-97].
For example, not all principles of CTML have been tested in iVR learning, nor have the learning strategies proposed in generative learning theory [47]. It is also noticeable that the added learning strategies used in the outlined studies were always applied outside the iVR environment. Research studying their usage inside an immersive virtual world is completely lacking. For instance, it would be interesting to know whether self-explaining or teaching others (humans or avatars) is affected by the features of iVR and hence impacts learning outcomes.
The creation of an empirical basis on how learning happens in iVR should also consider replication studies like the one in [78]. Here, the authors found no beneficial effect of adding the learning strategy of practice testing to an iVR simulation, compared to a traditional slide show presentation, for either retention or transfer.
Further open questions concern the already mentioned social interaction possibilities in iVR. It may be possible to reduce extraneous cognitive load in a collaborative iVR learning environment based on the claims made in collaborative cognitive load theory [98,99]. Thus, design elements found to be distracting in other studies would lose this negative significance, and the design of collaborative iVR learning environments would have to be thought about differently.