Safety in Interactive Hybrid Online Labs

Based on the complex grid infrastructure of the Ilmenau Interactive Hybrid Online Lab, this paper discusses how malfunctions can occur and which mechanisms avoid them, illustrated by several examples. One implementation challenge is to protect the physical systems in the lab against wrong control algorithms or malicious attempts to sabotage or destroy the system, without imposing too many design constraints. A high level of safety must therefore be guaranteed for such an autonomously working system. The first step is the verification of sensor and actuator constellations to guarantee safety in online labs. Building on this, an efficient way to validate the student's design is shown as a subsequent step.


I. INTRODUCTION
The Integrated Communication Systems Group at the Ilmenau University of Technology is an expert in the field of Internet-supported teaching of digital system design and has more than 10 years of experience with integrated hardware and software systems. Students have to pass hands-on examinations in a lab to achieve the learning outcomes through their own experience. For all students, hands-on experience is important to deepen their knowledge of the topics covered in lectures. With our interactive hybrid online lab, called GOLDi (Grid of Online Lab Devices Ilmenau), we want to offer students a working environment that is as close as possible to a real-world laboratory. Under real laboratory conditions, disturbances can appear and lead to failures of the control algorithm that cannot be detected under virtual lab conditions. Hybrid online labs provide permanent online access for students and supervisors and make it easy to check different parts of a design. Besides the advantages for students, this also reduces the costs of academic teaching and improves the overall quality by offering more practical training options.
This gives students the chance to realize correct designs, makes their self-study more efficient, allows supervisors to monitor students' work, and broadens the ways of communication in research work with companies. The solution is intended for teaching the design of digital control systems and embedded systems, from the basics up to complex design tasks, and for use within the newly established Tempus projects "ICo-op - Industrial Cooperation and Creative Engineering Education based on Remote Engineering and Virtual Instrumentation" [1] and "DesIRE - Development of Embedded System courses with implementation of Innovative virtual approaches for integration of Education, Research and Production in UA, GE, AM" [2].
The detailed concept, including the possibilities and limitations of such an approach, as well as the infrastructure and many different fields of applications of the Ilmenau Interactive Hybrid Online Lab GOLDi (see Fig. 1) were presented in previous papers [3,4,5,6].

II. SECURITY AND SAFETY IN ONLINE LABS
One implementation challenge is to protect the physical systems (the electromechanical hardware models, e.g. elevator, water-level control, high-storage warehouse, 3-axis portal) in the lab against wrong control algorithms of unskilled students or malicious attempts to sabotage or destroy the system, without imposing too many design constraints. A high level of safety must therefore be guaranteed for such an autonomously working system. In general, two types of "security/safety" must be considered for systems with Internet access such as our online labs:
(1) Information security (security) defines procedures and requirements concerning the protection of information processing as well as policies to avoid unauthorized data manipulation [7] (see Fig. 2). This topic primarily concerns the remote lab server and the experiment booking system and is not the focus of this article.
(2) Operational safety (safety) defines the functional safety of electrical, electronic, and programmable electronic safety-related systems. It defines safety integrity level (SIL) requirements for each safety function and provides a risk-based conceptual framework [8] (see Fig. 3).
Several protection strategies already exist in industry; however, they are designed for a specific application and require special software or impose design constraints. More flexibility is required for online labs with constantly changing tasks, especially when students are free in their design decisions and should be encouraged to develop their own creative solutions for a given task or problem.
To guarantee safety within the GOLDi infrastructure, two main components are necessary (see Fig. 1): the bus protection unit (BPU), which interfaces the various control units on which the student's design task runs (e.g. FPGA, microcontroller, PLC, …) to the internal remote lab bus and protects the bus from misuse and damage, and the physical system protection unit (PSPU), which protects the physical systems against deliberate damage or accidentally wrong control commands and offers different access and control mechanisms.
The bus protection unit receives commands from a control unit and simply checks them for bus validity. This is done using the specific GOLDi transmission protocol [9]. The content of the transmitted data and addresses is not checked, because it depends on the selected physical system and is therefore checked by the specific protection unit of each physical system. The function of the bus protection unit is to prevent a control unit from blocking the bus and thereby affecting other units. The bus protection unit is based on the same hardware as the protection units for the physical hardware models in the remote lab but uses different add-on boards and different firmware. This simplifies production and maintenance.
The physical system protection unit is necessary when students execute their algorithms on the selected control unit and want to be completely free in their choice of design tools. All protection mechanisms discussed within this paper are executed inside an FPGA on this PSPU.
In the approach used so far, there was no way to check whether the executed commands are safe for the physical system, so invalid commands could cause damage. The first task of the protection unit (the verification process) is to check command safety by filtering all commands. Only commands that will not cause a malfunction are transferred to the physical system; all others are discarded. A validation as a second step can be used as an indicator of the quality of the student's design. The feedback from the verification and validation modules is reported to a learning management system (LMS) to inform students immediately about the occurrence of a fault and about the quality of the realized design task (see Fig. 4). Using such universal protection units gives students the largest degree of freedom for their design, because they do not have to take any precautions. Therefore, no additional security framework (workbench) is required within the software and hardware control design to prevent malfunctions of the physical system. The complete design flow is carried out on the students' side, giving them a more authentic view of a real-world project design flow.
A normal, fault-free communication during a running experiment is shown in Fig. 5. The physical system (e.g. an elevator) generates different sensor signals (x_sensor). Based on these sensor input signals, the control unit generates the corresponding actuator output signals (y_actuator) according to the student's implemented control algorithm [10]. Fig. 6 shows an example sequence diagram for a 3-floor elevator that drives upwards from the 2nd floor to destination floor no. 3. The sensor signal x_1 [cabin position 2nd floor] indicates that the cabin is still on floor no. 2. This signal is transmitted via the PSPU and the BPU to the connected control unit. The control unit then activates the actuator signal y_0 [drive upwards], which is transmitted via the BPU and the PSPU back to the physical system and finally moves the cabin upwards. When the cabin reaches the 3rd floor, the sensor signal x_2 [cabin position 3rd floor] is activated and the whole communication cycle starts again: the control unit now deactivates the actuator signal y_0 to stop the cabin motion.
In both directions the PSPU checks the signals for validity: only correct actuator signals are transferred to the physical system. The detailed verification and validation concept is described in the next sections.
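The communication cycle of Fig. 6 can be sketched as a single control step. The following Python sketch is illustrative only: the function name `control_step`, the dictionary layout and the fixed destination (floor 3) are assumptions and not part of GOLDi; only the signal names x_1, x_2 and y_0 follow the text.

```python
from typing import Dict


def control_step(sensors: Dict[str, bool], y0_drive_up: bool) -> bool:
    """Return the new value of actuator y0 (drive upwards), destination floor 3.

    sensors maps the sensor names used in the text to their current state:
    x1 = cabin position 2nd floor, x2 = cabin position 3rd floor.
    """
    if sensors.get("x2", False):
        return False       # destination reached: deactivate y0
    if sensors.get("x1", False):
        return True        # cabin still on the 2nd floor: activate y0
    return y0_drive_up     # between floors: keep the current command


# One pass through the cycle described above.
y0 = False
trace = []
for sensors in ({"x1": True, "x2": False},    # cabin on the 2nd floor
                {"x1": False, "x2": False},   # cabin between floors
                {"x1": False, "x2": True}):   # cabin reaches the 3rd floor
    y0 = control_step(sensors, y0)
    trace.append(y0)
```

In the real infrastructure each of these values would pass through the BPU and the PSPU before reaching the physical system; the sketch leaves that path out.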

III. OPERATIONAL SAFETY THROUGH VERIFICATION
Verification of the control signals is used to guarantee the operational safety of the remote lab infrastructure. This means that the electromechanical hardware models (physical systems) must be protected against destruction. Malfunctions in the control can have multiple causes. They are defined as control signals (actuator signals) that damage or destroy the physical system given the current state of the input signals (sensor signals).
For online labs we have to distinguish between two causes of faults: faults caused by the user ("user-based faults") and faults due to the remote lab infrastructure and the communication within this infrastructure ("infrastructure-based faults"). The whole fault diagnosis and handling is managed by an FPGA in the PSPU. In any case the user must be informed that a fault has occurred. In addition, these faults are logged to give feedback to a connected LMS or to the responsible tutor.
These two types of faults as well as the error detection mechanisms will be described in the following.

A. User-based Faults
These faults are caused by the users, e.g. by wrong control algorithms or malicious attempts to sabotage or destroy the system. Fig. 7 shows an example of a faulty elevator control algorithm. The cabin is already located on the top (3rd

B. Infrastructure-based Faults
The GOLDi infrastructure itself and the communication between all remote lab components can also cause fault situations.

1) Invalid sensor value constellation
The simplest case is a faulty electrical connection between a sensor and the PSPU, or a defective sensor. This leads to invalid sensor values. Fig. 8 gives an example of two simultaneously active sensor signals on the 2nd floor. A defective sensor that generates no signal at all is harder to detect. Such a floor sensor can only be detected if there are additional sensors above and below it. Defective floor sensors on the lowest and highest floor cannot be detected.
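Both cases can be illustrated with a short Python sketch. The function names and the error code are hypothetical. In particular, the way a silent sensor is noticed here (the cabin is seen at two floors without the sensor in between ever firing) is an assumption about how such a check could work, not a description of the PSPU implementation.

```python
from typing import List, Optional, Sequence


def check_floor_sensors(floor_sensors: Sequence[bool]) -> Optional[str]:
    """Return an error code if more than one floor sensor is active, else None."""
    if sum(floor_sensors) > 1:
        return "E_MULTIPLE_FLOOR_SENSORS"   # hypothetical error code
    return None


def skipped_sensors(prev_floor: int, new_floor: int) -> List[int]:
    """Return the indices of floor sensors that were passed without firing.

    Only sensors strictly between two observed positions can show up here,
    so the lowest and the highest floor sensor are never reported.
    """
    low, high = sorted((prev_floor, new_floor))
    return list(range(low + 1, high))
```

For a 3-floor elevator with floors indexed 0 to 2, `skipped_sensors(0, 2)` reports the middle sensor, while a silent sensor on floor 0 or floor 2 never appears in the result, which matches the limitation stated above.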

2) Infrastructure-based delay (without timeout)
The quality of the Internet connection leads to a more or less large time delay between the activated sensor signal (e.g. x_3 in Fig. 3) and the corresponding calculated actuator signal (e.g. y_0 in Fig. 3), especially if the control unit selected for the experiment is not located in the remote lab but outside, on the student's client PC or tablet. This means that the cabin would continue to move upward for a certain time (see Fig. 9) and could damage the physical system. The PSPU must therefore stop the cabin motion immediately. Owing to the internal PSPU architecture, this takes only a few nanoseconds. The physical system is then temporarily in a fail-safe state. The effect is the same as for the detection of an invalid sensor value constellation described above, but in this case the PSPU waits a certain time for a response from the control unit. Normally the control unit detects the situation (activated sensor x_02) and reacts within this time with the corresponding actuator signal (deactivated y_00) to stop the motion. If the response time is within the given time slot, the PSPU does not generate any error message, as Fig. 9 demonstrates. The time slot is adjustable; for the GOLDi infrastructure it is set to 2 seconds.

3) Infrastructure-based error (timeout)
The situation just described assumes that the control unit generates an actuator signal (in response to a sensor change) within a defined time slot.
If the reaction time is greater than the specified time slot, the PSPU must generate a timeout and cancel the experiment to bring the physical system into a stable state (see Fig. 10). The user is informed about this. In general it makes no sense to continue the experiment if the Internet connection is too slow. A timeout is also generated if something goes wrong within the control unit, e.g. if the control unit no longer sends actuator signals.
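The distinction between a tolerated delay and a timeout can be summarized in a few lines of Python. This is a minimal sketch under stated assumptions: the function name, the return values and the use of plain timestamps in seconds are hypothetical; only the 2-second time slot is taken from the text.

```python
from typing import Optional

TIME_SLOT_S = 2.0  # adjustable; the text gives 2 seconds for GOLDi


def classify_response(t_sensor: float,
                      t_response: Optional[float],
                      time_slot: float = TIME_SLOT_S) -> str:
    """Classify the control unit's reaction to a sensor event.

    t_sensor   -- time at which the PSPU saw the sensor change and entered
                  the fail-safe state
    t_response -- time at which the matching actuator signal arrived,
                  or None if the control unit never answered
    """
    if t_response is not None and t_response - t_sensor <= time_slot:
        return "OK"        # delay within the time slot: no error message
    return "TIMEOUT"       # too late or no answer: cancel the experiment
```

A response after 0.8 s is classified as `"OK"`, whereas a response after 2.5 s or a missing response yields `"TIMEOUT"`. In the PSPU this decision is made in FPGA hardware, not in software.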

C. Error Detection Mechanisms
Having described the possible fault situations during a running experiment (user-based and infrastructure-based), we now describe the mechanisms for detecting these errors.
As mentioned before, the whole fault diagnosis and handling is managed by an FPGA in the PSPU. Three diagnosis modules within the PSPU analyze the signals for this purpose. Because the error handling is done by an FPGA, all three modules can be executed in parallel. If one of the modules detects an error, a corresponding error code is generated and transferred to the main experiment flow control module (see Fig. 11). This flow control module finally decides whether an intervention in the ongoing experiment is necessary.
Fig. 11. Error Detection Mechanism
The heart of each of these three diagnosis modules is a simple matrix structure that registers all forbidden constellations. Fig. 12 gives an impression of such a matrix for the third module (sensor/actuator constellations). This means that the entire error detection functionality can be reduced to combinational equations implemented in VHDL on the FPGA of the PSPU.
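To illustrate the matrix idea for the sensor/actuator module, the following Python sketch models a forbidden-constellation matrix for a 3-floor elevator. It does not reproduce the VHDL equations of the PSPU. The signal names, the matrix contents, the error code format and the choice to switch all actuators off when a command is discarded are assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical signals of a 3-floor elevator (names are assumptions).
SENSORS = ("x_floor1", "x_floor2", "x_floor3")
ACTUATORS = ("y_up", "y_down")

# FORBIDDEN[i][j] is True if sensor i active together with actuator j
# is a forbidden constellation.
FORBIDDEN = (
    (False, True),    # cabin on floor 1: driving down is forbidden
    (False, False),   # cabin on floor 2: both directions are allowed
    (True, False),    # cabin on floor 3: driving up is forbidden
)


def check_constellation(sensors: Sequence[bool],
                        actuators: Sequence[bool]) -> Optional[str]:
    """Return an error code for the first forbidden constellation, else None.

    Logically this is an OR over all terms (sensor AND actuator AND forbidden),
    i.e. a purely combinational function.
    """
    for i, sensor_on in enumerate(sensors):
        for j, actuator_on in enumerate(actuators):
            if sensor_on and actuator_on and FORBIDDEN[i][j]:
                return "E_FORBIDDEN_%s_%s" % (SENSORS[i], ACTUATORS[j])
    return None


def filter_command(sensors: Sequence[bool],
                   actuators: Sequence[bool]
                   ) -> Tuple[Tuple[bool, ...], Optional[str]]:
    """Forward the actuator command only if no forbidden constellation exists."""
    error = check_constellation(sensors, actuators)
    if error is not None:
        # Assumption: a discarded command leaves all actuators switched off.
        return tuple(False for _ in actuators), error
    return tuple(actuators), None
```

With the cabin on floor 3, `filter_command((False, False, True), (True, False))` discards the "drive up" command and returns an error code, while the same command on floor 2 is forwarded unchanged.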

IV. EVALUATION THROUGH VALIDATION
In addition to the verification of the student's design, which is necessary to guarantee safety inside the online lab (described in the previous section), users want fast feedback on their realized control task. Therefore, an automatic validation of the user's solutions was implemented within the GOLDi infrastructure as well (see Fig. 4), so that the quality of the developed solution can be assessed immediately. This requires a reference design and a method to check the student's design against this reference. The reference design should be independent of the control unit and the development tools used and should only specify certain objectives of the given task.
The main advantage of the preceding verification is that only valid sensor/actuator constellations (those that passed the verification module) serve as inputs for the reference automaton. These may not be efficient with respect to the given task, but they will not cause faults on the physical system. Changes in sensor signals as well as changes in actuator signals act as "events" that trigger the reference automaton (one event = one automaton execution step).
To evaluate the student's design, the given task is divided into several subtasks. The most efficient path through the automaton graph (best quality of the realized design) is a vertical processing from state to state. To cover deviations from the ideal design, each subtask contains several "co-states" for which the tutor can define a "weight". Strong deviations from the ideal design receive large weights; small deviations receive small weights. For the overall quality of the student's design this means: the fewer deviation weight points earned, the better the design. Designing the reference automaton, allowing several deviations from the optimal design in each subtask, and specifying the weight for each deviation is a very demanding task for the tutor and requires a lot of experience. For this reason only an extract of a very simple design task is used to illustrate the validation mechanism. The reference design for this simple task can be divided into several subtasks. The first subtask, "move cabin to 1st floor", is fulfilled when sensor x_0 (cabin position 1st floor) is activated. For the next subtask (main state), "move cabin to the 3rd floor", the following two deviations (co-states) are allowed: "stop cabin motion" (with a weight of 2) and "move cabin downward" (with a weight of 6, because this is a stronger deviation from the given task). This subtask can be fulfilled from any of these three states (the main state or the two co-states) by activating the sensor signal x_2 (cabin position 3rd floor). Starting from the main state (see Fig. 13), the following three sequences illustrate how the overall weight is calculated for a performed design task that deviates from the ideal design:
(1) The motion starts upwards according to the given task. Then the motion is stopped. After the motion continues upwards, the task is fulfilled by reaching the 3rd floor. The student earns 2 deviation points in total.
(2) The motion starts upwards according to the given task. Then the motion changes to downward, which is a strong deviation. After the motion continues upwards, the task is fulfilled by reaching the 3rd floor. The student earns 6 deviation points in total.
(3) The motion starts upwards according to the given task. Then the motion is stopped. After that the motion changes to downward. Finally, after the motion continues upwards, the task is fulfilled by reaching the 3rd floor. The student earns 10 deviation points in total.
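The weighting for the subtask "move cabin to the 3rd floor" can be sketched as a small event-driven automaton. The Python code below is a simplified illustration: the state and event names are hypothetical, and it assumes that a weight is charged each time a co-state is entered and that returning to the main state costs nothing. Under these assumptions it reproduces the totals of sequences (1) and (2). It does not attempt to reproduce the total reported for sequence (3), which depends on transition details of Fig. 13 that are not spelled out in the text.

```python
from typing import Iterable

# Weights for entering a co-state, as given in the text.
CO_STATE_WEIGHTS = {"stop": 2, "down": 6}


def deviation_points(events: Iterable[str]) -> int:
    """Sum the deviation weights for a sequence of events.

    events -- "up", "stop" and "down" are actuator changes; "floor3" stands
              for sensor x_2 (cabin position 3rd floor) and ends the subtask.
    """
    points = 0
    state = "up"                       # main state: cabin moves to floor 3
    for event in events:               # one event = one automaton step
        if event == "floor3":          # subtask fulfilled from any state
            break
        if event != state:
            # Entering a co-state adds its weight; returning to "up" adds 0.
            points += CO_STATE_WEIGHTS.get(event, 0)
            state = event
    return points
```

For example, `deviation_points(["up", "stop", "up", "floor3"])` yields 2 and `deviation_points(["up", "down", "up", "floor3"])` yields 6.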

V. CONCLUSION
The main focus of this paper is a complex verification system that protects the electromechanical hardware models (physical systems) in a remote lab against wrong control algorithms of (unskilled) students or malicious attempts to sabotage or destroy the system, without imposing too many design constraints. For online labs we have to distinguish between two causes of faults: faults caused by the user ("user-based faults") and faults due to the remote lab infrastructure and the communication within this infrastructure ("infrastructure-based faults"). The verification is executed within the physical system protection unit (PSPU) by filtering all commands. Only commands that will not cause a malfunction are transferred to the physical system.
In addition, users want fast feedback on their realized solutions. Therefore, an automatic validation of the user's design was implemented within the GOLDi infrastructure as well, so that the quality of the developed solution can be assessed immediately. For efficient use of the validation system, the step-by-step check of the student's design against the reference design is traced by a learning management system (LMS) such as Moodle, which analyzes the user's experimental results and generates feedback to improve the given design.