A Fire Drill Training System Based on VR and Kinect Somatosensory Technologies

A fire drill is a method of practicing how a building would be evacuated in the event of a fire or other emergency. Many jurisdictions require that fire drills be conducted at certain intervals, but the high cost and low efficiency of traditional fire drills call for innovative solutions. To improve the efficiency of mandatory fire drills and to train personnel for a variety of fire scenes, a new fire drill platform combining a somatosensory camera (Kinect) and virtual reality (VR) is proposed. The platform provides somatosensory interaction between humans and 3D objects in the virtual simulation environment. The main constituents of the proposed platform are discussed, namely 3D modeling, scene building with the Unity 3D game development platform, human skeleton identification by the Kinect sensor, and a Euclidean distance matching algorithm for recognizing body movements. The proposed technique was evaluated through a survey of 20 trainees. The results suggest that the system has practical significance and is useful in fire drills, fire safety training, and other applications.
Keywords: fire drill, VR, Kinect, somatosensory interaction, action recognition


Introduction
A fire drill (also termed a trial evacuation) is a method of practicing how a building would be evacuated in the event of a fire or other emergencies. In most cases, the building's existing fire alarm system is activated and the building is evacuated as if the emergency had occurred. Generally, the evacuation is timed to ensure that it is fast enough, and problems with the emergency system or evacuation procedures are identified to be remedied.
The large-scale industrial adoption of virtual reality (VR) in 2016 is regarded as an important milestone, or even the start of the "virtual reality era". Available VR technologies immerse users in a virtual world and let them interact with computer-generated 3D dynamic scenes and entity behavior built by fusing information from multiple sources. VR is a product of many advanced technologies, such as computer graphics, human-computer interfaces, simulation, multimedia, sensors, voice, pattern recognition, and artificial intelligence.
VR has three primary characteristics, namely immersion, interactivity, and imagination. Since VR development affects multiple domains, more and more technology companies worldwide are becoming involved in research on VR hardware and software. In the domain of virtual simulation exercises, Tao Yi et al. [1] introduced a calculation method for simulating three kinds of chemical disaster situations and established a simulation system for chemical disaster emergency drills with virtual simulation control. However, since that simulation relied on integrated modeling with a fuzzy rough method, it was difficult to apply to quantifying the control of emergency rescue drills. Wang et al. [2] developed an earthquake rescue drill system with realistic seismic simulation and fast terrain rendering, and used 3D visualization and virtual simulation technology to analyze the difficulty, cost, and effect of practical exercises. Feng et al. [3] designed the scene simulation training function of an integrated firefighting system for mine safety accidents, which featured intelligent virtual interactive control and approximated actual conditions. Liu et al. [4] used the analytic hierarchy process to establish an alternative comprehensive evaluation index system, which no longer relied on the maximum degree of fuzzy membership but combined the entropy weight method, the Delphi method, and fuzzy synthetic evaluation to provide a quantitative multi-level evaluation. Luo et al. [5] achieved a higher degree of realism in flame simulation and firefighting interaction and succeeded in simulating the fire scene and fire training. Nevertheless, current virtual simulation studies applied to fire drills still exhibit multiple problems that need to be mitigated, such as low simulation fidelity, poor interaction, poor immersion, and high cost.
One of the intrinsic features of VR is interactivity in the virtual environment, which involves both data collected by sensor devices and traditional input and output devices, such as the PC keyboard and mouse. The appearance of somatosensory devices was a breakthrough for VR development and industrial adoption, since it offered users a more natural way to immerse themselves in the virtual environment and a more natural form of human-computer interaction. An outstanding representative of somatosensory devices is Microsoft's Kinect, released in November 2010, which is essentially a somatosensory depth camera: it acquires real-time depth and RGB information of the scene within its effective range, identifies human skeleton information, and captures voice information with a built-in microphone array [6,7]. Although Kinect was primarily designed for natural interaction in a computer game environment, its excellent performance has found numerous applications, including mapping and 3D modeling. Moreover, the joint application of VR and Kinect technologies is promising for developing virtual fire drills. The realistic reconstruction of some scenes (e.g., earthquakes, oil and grain fires) is quite problematic: owing to safety or cost considerations, conventional drill scene layouts often fail to reflect truly hazardous situations, which reduces the commitment of trainees and the effect of the exercise, while more realistic layouts carry the risk of accidents and waste manpower and material resources. Given this, a combination of VR and Kinect technologies appears very promising for large, medium, and small safety drills, since it reconstructs various scenes by computer simulation, recognizes human actions through the Kinect equipment and software, and assesses the final training results.
A system based on these technologies and an appropriate utility model offers high fidelity and openness, strong anti-interference capability and autonomy, and a high level of safety. These features support the feasibility of the proposed system in training applications.

Development platform and general framework

Development platform and tools
The 3D scene models were built in 3ds Max and assembled into scenes on the Unity 3D game development platform (Unity Technologies, US), which acquires the Kinect skeleton information in real time and implements the interaction between recognized human actions and 3D virtual objects in C#. The overall data flow is shown in Fig. 1.

General framework
The system manages the fire drill simulation through the procedure, time, and scene of the whole plan. Moreover, both the fire drill as a whole and the behavior of each individual are evaluated, which forms a complete system of training and evaluation. The system can be personalized in a self-service manner during scene building according to the exercise at hand, which makes the scenes more varied and improves the trainees' experience. The system not only supports self-help exercises in the fire drill but also tests the emergency fire command and the emergency plan in simulation, which improves overall fire drill efficiency in an interactive, realistic environment. The system flowchart is shown in Fig. 2.

Key technology
Object-oriented, component-based development is used to build a component library, GIToolkit, which contains numerous functions for image analysis and processing. The platform manages multiple classifiers and allows the user to configure them dynamically; a visual user interface can define high-level gesture interaction semantics for different applications; and the library shields the technical details of image processing and machine learning, which simplifies interface development.

Overall requirements for VR in the firefighting system
The first requirement is to give users a strong sense of immersion; the second is to provide interaction between the virtual simulation scene and the characters. Better immersion requires more authentic models of the emergency fire drill scenarios, the overall environment, and firefighting tools (such as fire engines, fire hydrants, fire extinguishers, and fire signs). In addition, the 3D model objects can be customized, including their placement and usage status. These basic models are used for flexible customization of fire drill scenes.

Kinect somatosensory technology
Kinect is a tool that can collect the RGB and depth information in real time around the scene. It consists of an infrared laser emitter, an infrared camera, and an RGB camera. The laser source emits a single beam, which is split into multiple beams by a diffraction grating to create a constant pattern of speckles projected onto the scene. This pattern is captured by the infrared camera and is correlated against a reference pattern. The latter is obtained by capturing a plane at a known distance from the sensor and is stored in the sensor memory while the depth measurement is performed as a triangulation process. Noteworthy is that the Kinect sensor captures depth and color images simultaneously at a frame rate of up to 30 fps. The integration of depth and color data results in a colored point cloud that contains about 300,000 points in every frame.
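The quoted point count is consistent with one 3D point per pixel of a 640 × 480 depth frame (307,200 pixels). As a rough illustration of how a colored point cloud is assembled from depth data, the sketch below back-projects a depth image with a pinhole camera model. The resolution and the intrinsic parameters `FX`, `FY`, `CX`, `CY` are assumptions chosen for illustration, not calibrated values from this study, and `depth_to_points` is a hypothetical helper name.

```python
# Illustrative only: resolution and intrinsics are assumed, not taken from the paper.
WIDTH, HEIGHT = 640, 480
FX = FY = 580.0                      # assumed focal length in pixels
CX, CY = WIDTH / 2, HEIGHT / 2       # assumed principal point


def depth_to_points(depth, fx=FX, fy=FY, cx=CX, cy=CY):
    """Back-project a depth image (rows of metres, 0 = no reading) to 3D points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:               # skip pixels without a valid depth reading
                continue
            x = (u - cx) * z / fx    # pinhole model: horizontal offset scales with depth
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Synthetic example: a flat wall 2 m in front of the sensor.
flat_wall = [[2.0] * WIDTH for _ in range(HEIGHT)]
cloud = depth_to_points(flat_wall)
print(len(cloud))                    # one point per valid pixel
```

Attaching the RGB value of the corresponding color pixel to each point would give the colored cloud; that step requires the depth-to-color alignment and is left out here.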
With the Microsoft SDK, 20 skeletal key points of a human body can be directly identified and tracked, and body language and image content can be interpreted by corresponding algorithms. In a fire drill, the most important task is to keep real hazards out of the exercise, and the essence is to let the user practice, through body movements, how to respond to virtual danger. Numerous researchers have worked on such recognition. Yang et al. [8] designed a human motion detection and tracking algorithm based on Kinect depth information to address the low robustness of human tracking algorithms, which become unstable in complex scenes under illumination changes and similar-looking backgrounds. Luo et al. [9] used the Kinect depth information to separate a person's hands from the background and added three expressions based on the Hu moments, which gave the invariant moments more detailed features and enabled gesture recognition for intelligent wheelchair motion control that was robust to background interference. Given this, applying Kinect to the fire drill is quite feasible. The key technological challenges of the whole system are the reliable identification and separation of the human body from the background in the complex environment of a fire drill, the rapid identification of human body movements, and the formation of interaction in the virtual scene. Because body movements in a fire drill must be identified quickly and accurately, a simpler and more direct method is adopted in this study.
The unified coordinate system. The collected human action data must be expressed in suitable coordinates that link the Kinect coordinate system with the 3D virtual space, so that the tracked character can interact with virtual objects in that space.
Unity uses four kinds of coordinates, and the Unity Inspector window describes the position of a 3D object in the global coordinate system. Of the left- and right-handed coordinate systems, Unity uses the former [10,11]. The global coordinates of an object in the scene can be obtained from its transform position. Let $X_C = (x_c, y_c, z_c)^T$ denote the coordinates of a scene point in the Kinect coordinate system, let $C$ denote the non-homogeneous coordinates of the Kinect center in the global system, and let $X = (x, y, z)^T$ denote the coordinates of the same point in the global coordinate system. The two sets of coordinates are related by
$$X_C = R\,(X - C),$$
where $R$ is the rotation matrix describing the orientation of the Kinect in the global system. Given this, the projection matrix of the general camera is
$$P = K\,[R \mid t],$$
where $K$ is the camera calibration matrix and $t = -RC$.
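As a small numerical check, the sketch below assumes the usual rigid-body relation $X_C = R(X - C)$ between global and sensor coordinates and confirms that it agrees with $X_C = RX + t$ when $t = -RC$. The rotation about the y axis, the sensor position, and the helper names are all hypothetical values chosen for illustration.

```python
import math


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def rotation_y(angle_rad):
    """Rotation about the y axis, a stand-in for the sensor orientation R."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def world_to_kinect(x_world, r, c):
    """X_C = R (X - C): express a global point in the Kinect frame."""
    return mat_vec(r, [x_world[i] - c[i] for i in range(3)])


def translation(r, c):
    """t = -R C, so that X_C = R X + t."""
    return [-value for value in mat_vec(r, c)]


R = rotation_y(math.radians(30))     # assumed sensor orientation
C = [1.0, 1.5, -2.0]                 # assumed sensor centre in global coordinates
X = [0.5, 1.0, 3.0]                  # an arbitrary global point

direct = world_to_kinect(X, R, C)
via_t = [p + q for p, q in zip(mat_vec(R, X), translation(R, C))]
print(direct, via_t)                 # the two forms agree up to rounding
```

Any axis flip needed to reconcile the handedness of the two coordinate systems is not modelled in this sketch.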
iJOE - Vol. 14, No. 4, 2018
Smoothing the skeletal data in Kinect. Since the RGB and depth image data originate from the RGB and infrared cameras, respectively, there is a certain mismatch between their coordinate systems, with corresponding errors. To correct these, the Kinect for Windows SDK provides the mapping function "Map Depth to Color Image Point". This alignment can be applied in the Unity scene so that the figures in the RGB and depth images overlap and the skeleton key points can be related to the figure in both images. However, unstable performance of the Kinect hardware and incoherent operator actions may cause large differences in the relative positions of skeletal joints from frame to frame. Moreover, outliers in the skeleton data sequence may distort movement in the virtual scene or make it look unnatural, which harms immersion. Therefore, the outliers should first be identified and eliminated, and the skeletal data then need to be denoised and smoothed.
The principle of the moving average trajectory smoothing algorithm is (i) to take the average of the current position and the limb positions in the preceding N-1 sampling periods as the standard position, (ii) to process the positions collected by Kinect one by one in time order with step N, and (iii) to obtain the smoothed position values for the whole body movement. The algorithm can be used to filter out the effect of periodic changes on the smoothness of the position curve. The process of trajectory smoothing is shown in Fig. 3.
Fig. 3. The trajectory smoothing process
The moving average trajectory smoothing algorithm can eliminate body tremor and other random disturbances while extracting the desired operation information. To retain the position intended by the operator, the weight of the current position in the smoothed position is adjusted in the moving smoothing algorithm. In the trajectory smoothing algorithm, N is the assumed time shifting step, and i is the weight of the current position information in the planned position. If k ≥ N, the planned position at time k is computed over the full window of N samples; if k < N, it is computed over the samples available up to time k. Besides random outliers, incoherent limb movement and fluctuations in the position information may also occur. Since this study focuses on a position-increment strategy, an incremental smoothing algorithm for adjacent limb positions is applied directly to the collected body position information, instead of the trajectory smoothing algorithm. In this way, the moving average algorithm can be used to filter out irregular variation in the position information. It was verified that the moving average smoothing algorithm based on the position increment mitigates the interference of major jitter and outliers.
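The moving average described above can be sketched as follows. The exact weighting used in the study is not available here, so this version makes an assumption: the smoothed position is `weight` times the current sample plus `1 - weight` times the mean of the preceding samples in the window, and a plain equal-weight average is used when no weight is given. The function name and the sample values are hypothetical.

```python
def smooth_positions(samples, n, weight=None):
    """Smooth a sequence of 3D joint positions with a moving average of window n.

    weight is the share given to the current sample (an assumed scheme);
    None means a plain equal-weight average over the window.
    """
    smoothed = []
    for k, current in enumerate(samples):
        start = max(0, k - n + 1)        # fewer than n samples are available early on
        previous = samples[start:k]
        if not previous:                 # first sample: nothing to average with
            smoothed.append(tuple(current))
            continue
        w = 1.0 / (len(previous) + 1) if weight is None else weight
        mean_prev = [sum(p[axis] for p in previous) / len(previous) for axis in range(3)]
        smoothed.append(tuple(w * current[axis] + (1.0 - w) * mean_prev[axis]
                              for axis in range(3)))
    return smoothed


# A hand trajectory along x with one outlier in the last frame.
track = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(smooth_positions(track, n=3))
print(smooth_positions(track, n=3, weight=0.5))
```

The same function can be applied to frame-to-frame position increments instead of absolute positions, which corresponds to the increment-based variant mentioned in the text.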
Recognition of body movement based on Kinect. Body movements in fire drill simulations are simpler to recognize than those in sign language, physical rehabilitation, and similar applications, so the key requirements are recognition accuracy and speed. Recognized body movements are mapped to mouse and keyboard input in the VR system. There are two categories of body movements: the first is the selection of a path and a shelter location in the virtual environment, such as which escape exit to head for and whether to take the elevator or the stairs; the second is a single fine action, such as which kind of fire extinguisher to take, whether to cover one's mouth and nose with a wet towel, and whether to crawl through a smoke-filled area. Both categories are recognized from the Kinect human skeleton key points, as follows. The Kinect system can track and identify two skeletons, with 20 skeletal key points for standing persons and 10 for seated ones. Up to six persons can be detected in a scene. In fire drills, the system initializes the persons in front of the sensor and assigns everyone an ID and name; the initialization records the initial state of all persons, including the coordinates of the key points and quantities calculated from them, such as body height, spine height, and the skeletal key points near the shoulders.
For the first category of movements, the critical points of the human spine are tracked in real time, and a threshold is set to determine whether the body has reached a region of the scene and interacts with that region and its 3D objects. To detect collisions between 3D objects and characters in the virtual scene, a Rigidbody and a Collider envelope can be set up on the 3D model for event monitoring, as shown in Fig. 5. The outer edge of the 3D character is the Collider, through which the whole 3D virtual scene model can interact with the human body. The technology is relatively mature, and tracking and recognition of the first category of movements by this method is user-friendly and accurate.
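The threshold test for the first category of movements can be illustrated with a minimal sketch. It assumes that each selectable area (an exit, a stairwell, an elevator) is an axis-aligned box in scene coordinates and that a tolerance `margin` plays the role of the threshold; the `Region` class, the region names, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Axis-aligned trigger volume in scene coordinates (hypothetical layout)."""
    name: str
    lo: tuple
    hi: tuple

    def contains(self, point, margin=0.0):
        """True if the point lies inside the box, enlarged by margin on every side."""
        return all(low - margin <= p <= high + margin
                   for p, low, high in zip(point, self.lo, self.hi))


def entered_regions(spine, regions, margin=0.1):
    """Return the names of all regions the tracked spine joint currently lies in."""
    return [r.name for r in regions if r.contains(spine, margin)]


regions = [
    Region("stairs", (0.0, 0.0, 0.0), (1.0, 2.0, 1.0)),
    Region("elevator", (3.0, 0.0, 0.0), (4.0, 2.0, 1.0)),
]
print(entered_regions((0.5, 1.0, 0.5), regions))   # spine joint inside the stairs box
```

In Unity this kind of test is typically delegated to Collider trigger callbacks such as `OnTriggerEnter`; whether the system described here uses that particular callback is not stated.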
The recognition of the second category of limb movements is more complicated, since it is hard to detect whether a body movement in the fire drill has been completed as required. Because the number of standard actions in a fire drill is limited, this study uses a recognition method based on template matching, as shown in Fig. 6.
Fig. 6. Complex motion recognition framework
First, a feature database of the standard actions is established, and the skeletal action features acquired in real time are compared with it by the Euclidean distance. Since Euclidean distance matching requires data of the same dimensions, the dimensionality of the different matrices is reduced via LDA processing [12], which also improves the efficiency of the system.
The action library is built for the specific setting, and its actions have to meet general user needs across applications that involve various firefighting actions. Features are extracted with the quaternion method, in which the 3D coordinates and rotation angle of each skeletal point are represented by four values. The 15 extracted skeletal points are representative, and their quaternion features form a matrix of 60 columns, while the number of rows depends on the action, so that the matrices for different actions have the same number of columns but different numbers of rows. Since feature matrices of different sizes are inconvenient to match, the action feature data are reduced in dimensionality so that the action sequence length is consistent, which yields 12×60 matrices.
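Once every action is represented by a matrix of the same size, template matching reduces to a nearest-neighbor search under the Euclidean distance. The sketch below illustrates this for 12 × 60 matrices; the action labels, the constant-valued templates, and the optional rejection threshold `max_distance` are assumptions made for illustration, and the LDA reduction step is not reproduced.

```python
import math

ROWS, COLS = 12, 60   # sequence length x (15 skeletal points * 4 values)


def matrix_distance(a, b):
    """Euclidean distance between two equally sized feature matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature matrices must have identical shape")
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)))


def classify(sample, templates, max_distance=None):
    """Return (label, distance) of the nearest template, or (None, distance) if too far."""
    distances = {name: matrix_distance(sample, t) for name, t in templates.items()}
    label = min(distances, key=distances.get)
    best = distances[label]
    if max_distance is not None and best > max_distance:
        return None, best
    return label, best


# Toy templates with constant entries; real templates would hold quaternion features.
templates = {
    "cover_mouth": [[0.0] * COLS for _ in range(ROWS)],
    "protect_head": [[1.0] * COLS for _ in range(ROWS)],
}
sample = [[0.9] * COLS for _ in range(ROWS)]
print(classify(sample, templates))
```

A rejection threshold is useful when the trainee performs none of the standard actions, since a pure nearest-neighbor rule always returns some label.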
In this study, a simple and efficient Euclidean distance matching method was adopted, which uses formula (7) to calculate the distance between any two of the 20 key points in 3D space depicted in Fig. 5:
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2} \qquad (7)$$
where $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ are the coordinates of the two key points; in particular, $d_{h\_left}$ and $d_{h\_right}$ are the spatial distances from the head key point to the left- and right-hand key points, respectively. Different scenes and fire drill scenarios require particular actions, such as protecting one's head with both hands, or crawling forward while covering one's mouth with one hand. Euclidean distance feature matching can determine which of these actions the person is performing. The time complexity of this algorithm is low, so it supports real-time monitoring and recognition; the recognition accuracy and time for fire drill actions are shown in Table 1. The experimental results in Table 1 show that the recognition time of the Euclidean distance algorithm is less than 0.1 s and differs only slightly between actions. The recognition time can be kept this low because the data dimensionality is reduced after the first Euclidean distance matching. The characteristic data are simple and still reflect the particular action characteristics, so the process is less time-consuming, which considerably improves the running speed of the system.
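The head-to-hand distances can be turned into a simple rule, sketched below. The joint names, the 0.25 m threshold, and the three output labels are assumptions for illustration; the thresholds actually used in the study are not given.

```python
import math


def joint_distance(p, q):
    """Euclidean distance between two 3D joints, as in formula (7)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hands_at_head(joints, threshold=0.25):
    """Classify a pose from head-to-hand distances (threshold in metres is assumed)."""
    d_left = joint_distance(joints["head"], joints["hand_left"])
    d_right = joint_distance(joints["head"], joints["hand_right"])
    near_left, near_right = d_left < threshold, d_right < threshold
    if near_left and near_right:
        return "both_hands_at_head"      # e.g. protecting the head with both hands
    if near_left or near_right:
        return "one_hand_at_head"        # e.g. covering the mouth with one hand
    return "no_hand_at_head"


pose = {
    "head": (0.0, 1.6, 2.0),
    "hand_left": (0.1, 1.6, 2.0),
    "hand_right": (0.3, 1.0, 2.0),
}
print(hands_at_head(pose))
```

Because only two distances are computed per frame, the cost per frame is constant, which is consistent with the low recognition time reported in the text.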
Motion assessment and verification. For the second category of movements in fire drills, compliance of the actions with the standard was evaluated. The characteristic motion vector consists of the angle data of four joints from 10 testers; each test was repeated ten times, giving 100 test samples in total, which were used to fit nonlinear regression curves. The least squares method was applied to select one of the curves as the standard template curve. Finally, the dynamic time warping (DTW) algorithm was used to evaluate the action. The comparison chart, i.e., the difference between the DTW curves obtained by dynamic programming, is depicted in Fig. 7, where red curves represent the current tester's angle data and blue ones correspond to the template library data. The curves are modeled using regression equations, and each standard joint angle curve was measured over 800 frames. In the cross-validation, the data of each individual in turn were selected from the 100 sets as test samples, while the data of the remaining nine individuals were used to build the templates. The resulting distribution of DTW differences for the four joint angles is shown in Fig. 8. According to the statistical analysis, the DTW differences for the upper limb are sparsely distributed over a large range, whereas those for the lower limb are densely distributed over a small range. This can be attributed to the fact that the upper limbs move faster than the lower ones, so their joint angles vary more.
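The DTW comparison between a tester's joint angle curve and a template curve can be sketched with the classic dynamic-programming recurrence. The absolute-difference local cost, the function name, and the short angle sequences are assumptions for illustration; the curves in the study span 800 frames.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1D sequences.

    The local cost is the absolute difference between samples (an assumed choice).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],       # a advances, b repeats
                                 cost[i][j - 1],       # b advances, a repeats
                                 cost[i - 1][j - 1])   # both advance
    return cost[n][m]


template = [10.0, 20.0, 35.0, 50.0, 35.0, 20.0]          # hypothetical elbow angles (degrees)
tester = [10.0, 20.0, 20.0, 35.0, 50.0, 35.0, 20.0]      # same motion, performed more slowly
print(dtw_distance(tester, template))
```

A small DTW distance means the tester's motion follows the template even if it is performed at a different speed, which is why DTW suits action scoring better than a frame-by-frame difference.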

System evaluation
Fire drills require all units and participants to follow various emergency plans, which are carried out and whose results are then qualitatively evaluated. However, most evaluations of the overall fire drill results are made by team leaders or experts, which makes them somewhat subjective. Therefore, alternative unbiased and comprehensive evaluations are needed. Given the difference between the virtual simulation system and a true field exercise, the proposed experimental system was further modified in nine aspects, which are listed in Fig. 9. These include subjective and objective evaluations, combined with the characteristics and records obtained from the virtual simulation training platform, and the evaluation method builds on Wang et al. [13] and Chen et al. [14]. Since the evaluation content is comprehensive and objective, the results obtained are considered quite satisfactory.

Application examples and analysis
To verify the effectiveness and robustness of the key technologies of the fire drill system based on VR and Kinect, several indoor virtual fire drill scenes were implemented. The platform simulates the outbreak of fire, self-rescue, and extinguishing the fire, and finally gives a complete evaluation. A comparative study was made between the virtual fire drill and the traditional one. The results are shown in Table 2, where the first number is the absolute number of respondents and the number in brackets is their percentage; the total number of survey participants was n = 20. The statistical analysis of the limited data in Table 2 indicates that VR and Kinect-based fire drills use resources more efficiently, improve the trainees' sense of purpose and interest, and score notably higher in other aspects than the traditional practice-based method. However, there are still some gaps between the simulation and real scenes.

Conclusion
Based on VR and Kinect technologies, an innovative fire drill platform is proposed, in which the main control can be customized for the drill scene and the entire fire drill simulation exercise can be carried out. For a given local scene, the applied algorithm recognizes human body movements from the skeletons of users tracked by Kinect, tracks the overall completion of the fire drill process, and provides a quantitative evaluation of the exercise results. The proposed fire drill simulation system is shown to bring the training process closer to reality and make it more immersive, and it has high application value and room for development. On the other hand, the system can be further improved by using multiple networked Kinect sensors and by accumulating library templates. In follow-up studies, the authors plan to increase the number of recognized actions and reduce the space limitations.