Design of Immersive Virtual Reality System to Improve Communication Skills in Individuals with Autism

Individuals with autism spectrum disorder (ASD) regularly experience situations in which they need to give answers but do not know how to respond; for example, questions related to everyday life activities that are asked by strangers. Research geared at utilizing technology to mend social and communication impairments in children with autism is actively underway. Immersive virtual reality (VR) is a relatively recent technology that has the potential of being an effective therapeutic tool for developing various skills in autistic children. This paper presents an interactive scenario-based VR system developed to improve the communication skills of autistic children. The system utilizes speech recognition to provide natural interaction, and role-play and turn-taking to evaluate and verify the effectiveness of the immersive environment on the social performance of autistic children. In the experiments conducted, participants performed better with a computer augmented virtual environment (CAVE) than with a head mounted display (HMD) or a normal desktop. The results indicate that immersive VR could be more satisfactory and motivational than desktop for children with ASD.


Introduction
Because the behaviorally defined syndrome of autism is a spectrum disorder, it does not have a universally accepted treatment. Despite this, creating generalized situations as part of an intervention program can potentially act as an individualized virtual environment solution to enhance the communication skills of the targeted children at their own pace. VR environments can induce controlled stimuli (verbal or nonverbal) and also facilitate monitoring of the behavior of the child within the virtual environment (VE) [3]. Because autistic children are unable to establish human-to-human interaction, the VE uses explanatory environments to replicate real-life scenarios in the form of human-to-avatar interaction [11]. Therefore, VR-based systems can prove to be effective as a medium for educating children with ASD in controlled interactive environments [3] [11] [13] [14].
Role-play and turn-taking approaches are efficient in establishing conversations in virtual scenarios that are similar to reality. In a controlled environment, VEs provide the advantage of enabling the design of specific social situations in which users can mimic avatar(s) by receiving instructions from them.
Halabi et al. [15] presented a preliminary study that confirmed the effectiveness of immersive environment on the social performance of an autistic child. That usability study showed promising results; however, no intensive evaluation was conducted on autistic subjects. In this study, the VR system was improved and intensive testing carried out on the effectiveness of the VR solution targeting communication skills in high-functioning children with autism. Therefore, this study was conducted with two objectives: (1) design and development of an immersive VR-based system with verbal/nonverbal input interaction based on the time required for the user to respond in the VE task, (2) investigation of the impact of immersion level on the performance of high-functioning autistic users. The subjects in this study experienced VR via different installations: (1) a four walls projection room computer augmented virtual environment (CAVE) that provides a high level of immersion and interactivity, (2) head mounted display (HMD), and (3) non-VR normal desktop screen. Even though our system was not designed as an intervention platform, a preliminary feasibility study was designed as a proof-of-concept application.

System design
The VR-based system comprises two main modules: 1) system architecture framework; and 2) social communication task module.

System architecture
Fig. 1 gives an overview of the system architecture. As can be seen in the figure, it comprises various integrated modules to provide VR-based interactive simulation: display and trackers, auto-navigation, speech recognition, and gesture recognition.
The display and trackers module provides an immersive surround-screen display system comprising four rear-projected stereoscopic screens: three form the front, left, and right walls, and the fourth forms the floor. The user wears polarized glasses with two markers on the rim to enable tracking (by PPT-E tracking cameras from WorldViz) [16], from which the position of the head is calculated. The second display used is an Oculus Rift HMD. The third display is a normal 20-inch LCD display. These displays are used in the automatic navigation module to guide the user through the virtual environment based on the user's view. Parts of the display module also present different levels of immersion in the virtual environment.
The speech recognition module recognizes the user's speech in response to the virtual character(s) in the VE scenario, which supports the major project objective of improving primary social communication skills in children with autism. A wearable microphone is used to allow the system to receive the user's voice. Microsoft speech SDK is used to detect the recommended word within the scenario. However, a major problem is that the engine can only deal with the English language, whereas support is needed for the Arabic language, as most of the patients in the Gulf region may not be able to communicate in English.
The gesture recognition module is included in the system to enable detection of hand gestures, such as waving hello, from the user. This contributes toward nonverbal communication and facilitates tracking of any kind of (verbal/nonverbal) response from the user. A LEAP Motion device is used in the system to obtain the position of the hand and its skeleton information for mapping of the user's hand into the system.
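As a rough illustration of how the outputs of the speech and gesture modules might be combined into a single verbal/nonverbal response decision, consider the following minimal Python sketch. It is not the system's implementation: it assumes the recognizers have already produced a text transcript and a gesture label, and the function name `detect_response`, the word list, and the gesture label "wave" are hypothetical names introduced only for this example.

```python
import re
from typing import FrozenSet, Optional

# Hypothetical expected inputs; the paper does not list the exact words or labels.
EXPECTED_WORDS: FrozenSet[str] = frozenset({"hello", "hi"})
EXPECTED_GESTURES: FrozenSet[str] = frozenset({"wave"})


def detect_response(
    transcript: Optional[str],
    gesture: Optional[str],
    expected_words: FrozenSet[str] = EXPECTED_WORDS,
    expected_gestures: FrozenSet[str] = EXPECTED_GESTURES,
) -> Optional[str]:
    """Classify a response as "verbal", "nonverbal", or None (no response).

    `transcript` stands in for text already produced by a speech recognizer,
    and `gesture` for a label already produced by a hand tracker.
    """
    if transcript:
        # Lowercase and split into word tokens, ignoring punctuation.
        tokens = set(re.findall(r"[a-z']+", transcript.lower()))
        if tokens & expected_words:
            return "verbal"
    if gesture and gesture.lower() in expected_gestures:
        return "nonverbal"
    return None


print(detect_response("Hello, teacher!", None))
print(detect_response(None, "wave"))
```

Checking the verbal channel first is an arbitrary choice made for this sketch; either ordering would let a spoken greeting or a wave count as a response.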
The virtual environment and avatars were developed using Vizard from WorldViz [16], with the aim of eliciting verbal and/or nonverbal responses from users with ASD. The software comes with a limited set of resources including avatars, virtual objects, and scenes. Therefore, we designed customized avatars and virtual scenes to create specific social communication environments for our study. The virtual world and characters were designed and rendered in 3ds Max. Each avatar, generated using Autodesk's Character Generator and rigged and animated using Autodesk's 3ds Max and MotionBuilder software, was built with a purpose that contributes to the scenario.
The environment selected was a classroom, which is commonplace and should be familiar to most of the subjects. Fig. 2 shows the school virtual environment and the avatars created for the simulation.
The auto-navigation module uses the tracking sensors as input and starts the automatic navigation through the virtual environment based on the user's view.
All the modules manipulate the virtual environment and characters in the system, which results in view and sound output changes in accordance with input changes. Users experience changes in the output devices based on these modules and the input provided.

Social communication task module
In order to improve communication skills in high-functioning children with autism, it is essential that the environment developed efficiently triggers the expected response from the user. We developed a classroom (school) "greeting" scenario involving communication between a virtual character and the user.
Our communication task module comprises (1) the "greeting" scenario, and (2) the 3D environment and characters to implement the scenario.
Scenario: We developed the "greeting" scenario in consultation with experts and practitioners in the local center of education for the autistic, who handle autistic children on a daily basis. The scenario was also inspired by the novel "Social Stories" techniques in use by a number of autism-related VR solutions, which further justifies our approach [17]. The "greeting" scenario, a general situation that an average individual faces multiple times per day, can help to track the user's level of social interaction over multiple uses of the VR system.
We start the scenario by virtually auto-navigating from outside the school environment into the school and entering the classroom through the corridor. The classroom environment is the main setting where the conversation development task takes place. A virtual teacher avatar in the classroom stands with two other virtual student avatars in front of the chalkboard. The teacher initiates conversation and instructs all the students, including the subject, to perform the greeting task, see Fig. 3-Fig. 8.
The virtual avatar of the teacher has a look that is similar to that of actual instructors/therapists in schools during their sessions with autistic students. Both the virtual avatar of the students in the virtual environment and the virtual environment itself (school and classroom environment) try to follow a similar presentation conduct to provide high relatability to the participant. Keeping in mind the constraints related to the curriculum being followed at these traditionally practicing institutions, and the inherent limitations of high-functioning autistic children in general, the scenario is developed to cope with the barriers in initiating or responding to conversations with strangers.
Role-play: Because our scenario is shaped around obtaining a response from the autistic child, all the tasks and conversations in the scenario coordinate to raise the potential of the user responding to the task given to him/her by the virtual teacher (i.e., to greet her back) by showing the user (through the other virtual students playing their character in the scenario) how to reply to a greeting. User response is triggered/instigated (but not forced) using role-playing by the student avatars in the VE.
Turn-taking: In order to ensure a structured form within the task, each virtual student character is asked to perform the assigned "greeting" task on a turn-by-turn basis. After the teacher instructs the students about the task, each student is greeted and asked to reply to the greeting. The subject is then assigned the same task at the end of the scenario. This teaches the user to respond when spoken to and wait for his/her turn when in groups.
The Scenario in The Virtual Environment: We designed the virtual world to display and integrate the scenario outlined above in the following order:
1. Auto-Navigation: The subject automatically navigates through the virtual environment (from outside the school to inside the classroom through the corridor), Fig. 3.
2. Welcome and Introduction: The virtual teacher character introduces itself and the virtual student avatars, Fig. 4.
3. The virtual teacher greets one student avatar in the VE first, and receives a response, Fig. 5.
4. The teacher then greets the next virtual student in the VE and gets similar verbal and nonverbal responses (i.e., "waving animation"), Fig. 6.
5. The subject in test gets the final turn in which the virtual teacher avatar greets him/her (customized using the name of the user in the teacher's speech), and the teacher waits for a response from the subject.
6. Voice and Action Monitoring: The subject's voice and physical motion are continually tracked to record a response and the corresponding response time, Fig. 7.
7. Task Completion: The virtual environment displays text labels that show the amount of time taken by the user to respond to the task assigned (i.e., the subject has returned the greeting), as shown in Fig. 8.
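The ordering of the scenario steps listed above, with role-play by the student avatars followed by the subject's final turn, can be sketched as a simple ordered sequence. The Python below is only an illustration under stated assumptions: the `Step` class, the `build_greeting_scenario` function, and the step and actor labels are hypothetical and do not come from the system's Vizard implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str                     # hypothetical label for a scenario step
    actor: str                    # who acts in this step (hypothetical labels)
    waits_for_user: bool = False  # True only while the system awaits the subject


def build_greeting_scenario(student_avatars: List[str]) -> List[Step]:
    """Return the ordered steps of the greeting scenario."""
    steps = [
        Step("auto_navigation", actor="system"),
        Step("welcome_and_introduction", actor="teacher"),
    ]
    # Role-play: each virtual student is greeted and replies in turn,
    # demonstrating the expected behavior to the subject.
    for avatar in student_avatars:
        steps.append(Step(f"greet_{avatar}", actor="teacher"))
        steps.append(Step(f"reply_{avatar}", actor=avatar))
    # Turn-taking: the subject always receives the final turn.
    steps.append(Step("greet_subject", actor="teacher"))
    steps.append(Step("monitor_response", actor="system", waits_for_user=True))
    steps.append(Step("task_completion", actor="system"))
    return steps


scenario = build_greeting_scenario(["student_1", "student_2"])
for step in scenario:
    print(step.name, "-", step.actor)
```

Building the sequence from a list of avatars keeps the subject's turn last regardless of how many virtual students take part, which mirrors the turn-by-turn structure described in the text.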

Experimental Setup
Experiments were conducted using the VR environment described above. The participants' view orientation was tracked by the Positional Tracker in the Oculus Rift HMD and/or the PPT Tracking Camera System for CAVE display experiments.
The scenario was presented on three interactive displays with different levels of immersion: (1) desktop computer with no immersion factor, (2) Oculus Rift HMD, and (3) CAVE immersive display. Fig. 9 shows screenshots of the subjects during the experiments in the three different display types. The usability of the system was evaluated using a group of three children with ASD in the age range 4-6 years and a group of seven typically developing (TD) children in the age range 9-12 years. According to Happé [18], these two groups can be considered equivalent as it has been found that normal children at a verbal mental age of four years are equivalent to high-functioning autistic children at a verbal mental age of more than nine years because they both have a 50% chance of passing the theory of mind test.
These two categories were selected for the following two reasons: (1) to fine-tune and refine our system before starting with the target population and (2) to understand the similarities and differences in the interaction with our system by ASD and TD participants.
The system was designed to administer social interaction in VR involving computer-based bidirectional conversation. To evaluate the task performance, the start time and each participant's response time were recorded and displayed at the end of successful completion of the assigned task. The response time was logged when the system detected that the participant's speech was recognized by the voice input device. The task progress could be observed during the sessions by therapists and for tracking nonverbal response. The system also has the ability to receive input from a monitor to record response time in cases of nonverbal response, such as waving.
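The response-time logging described above can be sketched as a small timer that is started when the teacher's greeting ends and stopped by the first detected response, whether it comes from the speech recognizer or from an observer marking a nonverbal response. This is a minimal illustration, not the system's code: the `ResponseTimer` class, its method names, and the modality labels are assumptions made for this example.

```python
import time
from typing import Callable, Optional


class ResponseTimer:
    """Records the elapsed time between a prompt and the first response."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so the timer can be tested
        self._start: Optional[float] = None
        self.response_time: Optional[float] = None
        self.modality: Optional[str] = None

    def start(self) -> None:
        """Mark the moment the subject is prompted to respond."""
        self._start = self._clock()
        self.response_time = None
        self.modality = None

    def record(self, modality: str) -> float:
        """Log a response; only the first one after start() is kept."""
        if self._start is None:
            raise RuntimeError("start() must be called before record()")
        if self.response_time is None:
            self.response_time = self._clock() - self._start
            self.modality = modality  # e.g. "verbal" or "nonverbal"
        return self.response_time


timer = ResponseTimer()
timer.start()
elapsed = timer.record("verbal")  # called when speech is recognized
print(f"Response time: {elapsed:.3f} s ({timer.modality})")
```

A monotonic clock is used here because it is not affected by system clock adjustments during a session; that is a design choice of the sketch rather than something the paper specifies.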

Tasks and procedures
We designed a usability study as proof-of-concept to investigate the feasibility of the designed system with the three different interactive displays. The participants were required to commit to a total of two sessions for the study (Sessions 1 and 2) on separate days for a duration of approximately 20 minutes per session plus their feedback. We first conducted a learnability session in which the team briefed the participants and their caregivers about the session and had the participants familiarize themselves with the immersive display setup(s) (CAVE and HMD) for genuine results. The participants were also told that they could choose to withdraw at any point during the session for reasons such as feeling uncomfortable with the system, or dizziness due to wearing the HMD. The participants were introduced to each system one by one. The session started with the scenario being presented to each participant on the desktop computer first, where the participant was equipped with only a microphone (for voice input) that was used for all three interaction sessions.
Second, each participant was presented with the same scenario and assigned the same task on the HMD (Oculus Rift) and the response time recorded. Finally, the scenario was replayed on the CAVE immersive display with the trackers and 3D glasses and the response time for each session similarly recorded. The order of presentation of the different displays was randomized among the participants. On the following test day, Session 2 was performed with the participants going through the same procedures as in Session 1 but without any guidance or pre-testing. An exit interview was then conducted at the end of the experiment to acquire feedback via a questionnaire. The questionnaire was designed to address questions and statements related to (1) Usability Satisfaction of the system in each interactive display, and (2) Impact of the level of immersion on the response of the participant. We extracted information about the satisfaction and impact of immersion level on the user (participant) based on their feedback.

Results and comparative study
All participants completed the sessions despite being given the option of withdrawing from the experiment at any time. The exit interview revealed that all the participants liked interacting with the system. When asked about any take-home lesson that they had from the conversation between them and the virtual teacher and classmates, most participants said that they learned that they should introduce themselves first when speaking to a new friend for the first time. These findings suggest that our system has the potential to be accepted by the target population of children with ASD.

System usability and acceptability
In the usability study, we investigated the effectiveness of our VR-based interactive system by analyzing the "Usability Satisfaction" feedback from the participants on every interactive device in each session. Our participants' exit survey feedback at the end of the experiment revealed a general trend among them to favor the CAVE environment while being satisfied with all the different interactive displays and virtual environments. They had a certain (acceptable) level of understanding of the scenario presented to them in the virtual environment. They were also able to recognize the system's virtual classroom, school, classmates, and teacher. They faced very minimal problems trying to feel at-ease in the virtual environment, according to our observation. Moreover, they remained focused on the VR system, in an attentive manner.
We found that the learning curve stabilized around an average response time of 30-50 seconds. For the first training session the participants had to get acquainted with our VR-based interactive system in the CAVE. In summary, the system proved to be an easy-to-use platform with which interaction would benefit our participants.

Quantitative analysis of participants' performance in multi-session interactions
Satisfaction: Analysis of the feedback from the subjects, as well as their improved performance evidenced by decreased response time, indicates that the subjects were, in general, satisfied with the virtual environment in our VR-based system. A decreased response time indicates that the subjects understood the scenario, and it also reflects the number of sessions they needed before they could respond promptly and correctly to the system.
In our comparative study, the three immersive interfaces displayed the same scenario to every participant in each session. The subject satisfaction for each level of immersion for each session for both the TD and ASD groups is shown in Fig. 10. The figure represents the five satisfaction statistics (minimum, first quartile, median, third quartile, and maximum). The median satisfaction for CAVE is 85% for TD and 82.5% for ASD; for HMD, it is 82% for TD and 78.5% for ASD; and for desktop, it is 57.5% for TD and 60% for ASD. The results clearly show that TD evaluated CAVE and HMD higher than did ASD, which means that they appreciated the immersive technology more than ASD did. In contrast, ASD evaluated desktop higher than did TD (60% to 57.5%); however, the value is still less than that for CAVE and HMD. The range for HMD can intersect the range for CAVE values, but it can be seen that the CAVE boxplot is particularly skewed to the top, which means that the extreme minimum is an outlier. Nevertheless, the median satisfaction for HMD is still less than the first quartile for CAVE, which strongly validates the differences in both TD and ASD participants.
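For readers unfamiliar with boxplots, the five statistics plotted in Fig. 10 (minimum, first quartile, median, third quartile, and maximum) can be computed as in the following Python sketch. The quartile convention used by the authors is not stated, so the "inclusive" method here is an assumption, and the sample values are arbitrary placeholders rather than data from the study.

```python
import statistics
from typing import Dict, Sequence


def five_number_summary(scores: Sequence[float]) -> Dict[str, float]:
    """Return min, Q1, median, Q3, and max for a set of scores.

    Quartiles use the "inclusive" method of statistics.quantiles, which
    treats the data as the whole population; this is an assumed convention.
    """
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(scores),
    }


# Arbitrary placeholder values, NOT satisfaction scores from the study.
summary = five_number_summary([5, 1, 4, 2, 3])
print(summary)
```

With small groups such as those in this study, different quartile conventions can give noticeably different box edges, so the convention should be reported alongside the plot.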

Impact of immersiveness on participants' performance:
The scenario was tested on each subject with the three different displays in each session to determine the impact of the level of immersion of the VR system on the subjects' performance.
From the response times shown in Fig. 11(a) and Fig. 11(b), it is clear that CAVE has a much higher general impact and is more effective in obtaining the desired response from the subjects. The trend is the same for both the TD and ASD groups. The median response time for CAVE is 63 msec for TD and 65 msec for ASD; for Desktop it is 65 msec for TD and 68.5 msec for ASD; and the slowest response time was for HMD with 66 msec for TD and 81 msec for ASD. Thus, it can be seen that the performance of both TD and ASD participants improved when exposed to the immersive CAVE VR interface. Although HMD is also immersive, it is cumbersome and isolates the subjects, which may account for the large response time for the ASD group; the time was significantly lower for the TD participants, as they apparently enjoyed HMD more than the ASD participants. CAVE provides the highest and most intuitive level of immersion in VR; this proves the general effectiveness of an immersive interface in improving communication skills in children with autism using a realistic scenario.

Conclusion
In this study, we developed a VR-based (verbal/nonverbal) interactive system for improving the communication skills of children with autism based on a predefined and planned greeting scenario. The results of tests conducted verify that the developed system is an effective tool for improving the communication skills of autistic children. Although the results indicate that the impact on virtually all the participants was positive, there are certain limitations when considering these results in general for all high-functioning children with ASD. Thus, there is a need for more work to understand the ultimate potential of VE platforms for integrating physiological processes in order to target and treat features of ASD.
The mechanism used for bidirectional conversation between the participants and the avatar was natural and replicates actual real-life conversation skills as we used speech recognition, which makes our system more intuitive than many previous systems. However, one major limitation of the current study is the fact that English language had to be used because there is no workable Arabic language speech recognition engine. Although we used simple greeting words, it is still considered a limitation as most of the ASD children in our region do not know English.
Another limitation is that our study had a limited sample size of children with ASD. The task designed for this study was employed as a first pilot step in evaluation of the benefits of such a technological system with ASD intervention. However, the present study was designed as a proof-of-concept study and not as an intervention study. This is also essential for verifying how our VR system can be generalized to all ASD populations. A consistent barrier to the success of VR programs is the difficulty that children with autism have in generalizing newly acquired skills to new environments. Therefore, the generalization of skill improvement in real life remains an open question. Hence, questions about the practicality, efficacy, and benefits of the use of this and similar technological tools for demonstrating clinically significant improvements in terms of ASD impairment remain.