Calibration of Metal Oxide Semiconductor Gas Sensors by High School Students

A wide range of pollutants cannot be perceived by human senses, which is why gas sensors are indispensable for an objective assessment of air quality. Since many pollutants are both odorless and colorless, awareness of them is lacking, in particular among students. The project SUSmobil (funded by DBU – Deutsche Bundesstiftung Umwelt) aims to change this. In three modules on gas sensors and air quality, the students (a) learn how a metal oxide semiconductor (MOS) gas sensor works, (b) perform a calibration and (c) carry out environmental measurements with calibrated sensors. Based on these introductory experiments, the students are encouraged to develop their own environmental research questions. This paper discusses the student experiment on calibrating a MOS gas sensor for ethanol. The experiment, designed as an HTML-based learning course, addresses both theoretical and practical aspects of a typical sensor calibration process, consisting of data acquisition, feature extraction and model generation. In this example, machine learning is used to generate the evaluation model, as existing physical models are not sufficiently exact.


Introduction and motivation
Air pollution is the single largest environmental health risk in Europe, with over 400,000 deaths in 2018 [1]. According to the World Health Organization (WHO), air pollution is a major cause of heart disease and stroke, as well as lung disease and even Alzheimer's disease [2].
Awareness of air pollutants has increased in recent years, especially as a result of the Fridays for Future movement, but young people in particular often have only a vague idea of them. For example, there is a widespread misconception about carbon dioxide (CO2) and its role as a pollutant [3]. Although CO2 is harmful to the environment as a greenhouse gas and, together with gases like methane (CH4), contributes significantly to climate change, it is dangerous to humans only at relatively high concentrations (>1 % continuous exposure over more than three weeks), causing symptoms such as increased respiratory rate, dizziness, confusion and dyspnea [4]. Such high concentrations are typically never reached indoors. Nevertheless, CO2 can serve as an indicator of poor air quality, because other pollutants such as volatile organic compounds (VOCs) correlate with the CO2 concentration when human emissions are the only source of VOCs [5]. This has led to the widely accepted Pettenkofer value of 1000 ppm CO2, above which increased ventilation is recommended [6].
To create awareness of air quality, the outreach project "SUSmobil" (German: Schüler-Umwelt-Studien mit mobilen Messgeräten; English: Environmental Studies by Students with Mobile Measuring Devices) aims to teach students aged 12 to 18 about air quality and how it is determined with low-cost sensors [7]-[9]. In three learning modules, the students learn about the operating principle of metal oxide semiconductor (MOS) gas sensors (module 1), the calibration process required to quantify target gas concentrations (module 2), and finally perform practical measurements of indoor air quality (IAQ, module 3). These modules form the theoretical basis for students to develop their own environmental studies in the form of citizen science projects [10], [11]. Examples of student environmental studies include the investigation of particulate matter emissions near a school, the influence of plants on bedroom air quality, and the air composition in beehives together with the bees' reaction to increased CO2 levels [12].
In the first module, students qualitatively investigate the sensor behavior at different sensor temperatures in the presence of different substances. Based on these observations, a simplified, student-adequate sensor model is developed [13]. This paper focuses on module 2: the quantitative calibration of a MOS gas sensor, here for different ethanol concentrations. In this context, the simplified sensor model is reused and extended by concepts of temperature-cycled operation (TCO) [14]. A commercial low-cost gas sensor module is operated dynamically, so that it produces a characteristic response pattern that can be interpreted by pattern recognition and machine learning (ML) techniques [15]. Starting with a conceptualization of the term "calibration", an HTML-based learning course covers the recording of training data, feature extraction and model building using an Artificial Neural Network (ANN).

Technical concept
To implement the student experiment on the calibration of a MOS gas sensor, hardware and software components were developed that are freely available (open source). In the interest of sustainability, this enables an easy transfer to other institutions, especially student labs. The components are briefly introduced below.

Hardware
In this experiment, the SnO2-based MOS gas sensor module BME680 from Bosch is used, which also includes temperature, humidity and pressure sensors for environmental monitoring [16]. It is mounted on an Adafruit sensor board and can be programmed via the Arduino IDE using freely available libraries. The sensor is controlled and read out via the I2C interface by an ESP32 microcontroller [17].
The calibration uses two gas chambers (Fig. 1): (a) a gas reservoir ("storage chamber") with a known and constant ethanol concentration, and (b) a closed measuring chamber, which contains the sensor and in which the ethanol concentration is varied. The frame of the storage chamber as well as the base of the measuring chamber are built from aluminum profiles. The panes of the storage chamber and the cube on top of the measuring chamber's base are made of acrylic glass. The known ethanol concentration inside the storage chamber is generated by evaporating liquid ethanol on a hot plate (heater) inside the chamber. Two fans mix the air, ensuring a uniform distribution of the evaporated ethanol in the chamber. A resealable opening (a septum) on top allows a defined volume of the ethanol-air mixture to be extracted with a syringe and transferred into the measuring chamber through another septum on its top. The storage chamber and the measuring chamber measure 25 cm x 50 cm x 50 cm and 10 cm x 10 cm x 10 cm, respectively. Since the syringe transfers 1 ml into the 1 l (1000 ml) measuring chamber, the ethanol concentration in the measuring chamber is diluted by a factor of 1,000 compared to the gas reservoir. Data are transferred via a micro USB cable that leads into the measuring chamber.
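The text states only that the storage concentration is "known". As a rough sketch of how such a value could be estimated, the following Python function applies the ideal gas law to the storage chamber volume (25 cm x 50 cm x 50 cm = 62.5 l). It assumes complete evaporation well below ethanol's saturation vapor pressure, 20 °C, standard atmospheric pressure, a liquid density of 0.789 g/cm³ and a molar mass of 46.07 g/mol. The function name and the 10,000 ppm target are hypothetical, not values from the experiment.

```python
# Rough estimate of the liquid ethanol volume needed for a target vapor
# concentration in the storage chamber (ideal gas law; illustrative only).
GAS_CONSTANT = 8.314462618      # J/(mol*K)
ETHANOL_MOLAR_MASS = 46.07      # g/mol
ETHANOL_DENSITY = 0.789         # g/cm^3, liquid at about 20 °C (assumption)


def liquid_ethanol_ml(target_ppm: float, chamber_volume_l: float,
                      temperature_c: float = 20.0,
                      pressure_pa: float = 101_325.0) -> float:
    """Return the liquid ethanol volume (ml) that, fully evaporated, gives
    target_ppm (mole fraction x 1e6) in a chamber of the given volume.

    Assumes ideal gases, complete evaporation and a uniform mixture; the small
    amount of gas added by the evaporated ethanol itself is neglected.
    """
    volume_m3 = chamber_volume_l / 1000.0
    n_gas = pressure_pa * volume_m3 / (GAS_CONSTANT * (temperature_c + 273.15))
    n_ethanol = target_ppm * 1e-6 * n_gas                      # mol of vapor
    return n_ethanol * ETHANOL_MOLAR_MASS / ETHANOL_DENSITY    # g -> cm^3 (= ml)


# Storage chamber from the setup: 25 cm x 50 cm x 50 cm = 62.5 l
storage_volume_l = 25 * 50 * 50 / 1000
print(f"{liquid_ethanol_ml(10_000, storage_volume_l):.2f} ml")  # hypothetical target
```

Under these assumptions, about 1.5 ml of liquid ethanol would be needed for 10,000 ppm; the result scales linearly with the target concentration.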

Software
The experiment is based on an HTML-based learning course that accompanies the learning process. It contains information, animations, simulations and work assignments (Fig. 2, left). Students work together in small groups of two to three at a computer and complete the learning course at their own pace. Besides reducing personnel requirements (only one or two tutors are needed per class), this kind of learning course has the advantage of using modern learning media. Graphics and videos as well as running simulations and viewing animations help to achieve better learning success [18]. Working on the PC also increases the students' motivation [19]. Learning success is ensured by the fact that students can only progress once they have successfully completed certain tasks in the self-learning course. In addition to the learning course, calibration software was developed (Fig. 2, right), which records, visualizes and processes training data. Visualizing the training data in real time promotes a better understanding of the processes happening on the sensor surface, and the effects of different concentrations can be perceived directly. In addition, an integrated tutorial introduces the students to the most important functions of the measurement software, which is specially adapted to the needs of high school students and deliberately designed to be clear and concise so as not to overburden them. The measurement software and the learning course are open source and freely available [20]. An open source library from Adafruit is used to control and read out the sensor [21].

Student experiment: calibration of a gas sensor

Overview of the calibration experiment
The aim of the experiment is to convey the basics of a calibration process. Starting with an intuitive introduction of the term and the importance of calibration for any kind of sensor, the basics of semiconductor gas sensors are explained using a student-adequate model. This model is then extended by introducing temperature-cycled operation with a dynamic component. Next, the term feature extraction and the terminology for different orders of magnitude of concentration are highlighted. After recording training data, the students perform feature extraction using simple features such as maximum, minimum and mean values as well as the slope in a given range; these features characterize specific properties of the sensor response (resistance-time curve) at each concentration so that the concentrations can be discriminated. To establish a mathematical relation between features and gas concentrations, the students use an Artificial Neural Network (ANN). The basic concept of model building is explained using everyday examples and a weighted sum model. Finally, the students train the ANN for 10,000 iterations, which introduces them to the fundamentals of machine learning in an easy-to-understand example. By comparing their models, they are asked to examine which features are more suitable for differentiating the patterns, i.e. for predicting the gas concentration.

Introduction: calibration of a virtual thermometer
Monitoring limit values for pollutants, measuring speeds in radar checks or simply filling beverage bottles all require calibration of the respective measuring device. In the introduction, the students learn why a thorough sensor calibration is important and perform one themselves for a virtual liquid-in-glass thermometer.
Conceptually, the temperature T, which cannot be determined accurately by human senses, is converted into the height h of a liquid column by thermal expansion. A value of h is determined for various known values of T, and these calibration points are transferred into a suitable mathematical model that allows T to be determined for any value of h. In this case, a simple linear equation is used, and two reference temperatures, melting ice at 0 °C and boiling water at 100 °C, determine the model parameters, i.e. slope and offset (Fig. 3, left). This process of determining the desired value, here the temperature T, from a measured value, here the column height h, is called calibration. The calibration of a virtual thermometer offers an intuitive introduction to the subject, and parallels can be drawn to the calibration of a gas sensor (Fig. 3, right). In each case, a variable that cannot be determined directly (temperature / gas concentration) is first converted into a measurable variable (column height / sensor resistance). The subsequent model-based conversion (height of the liquid column to temperature / sensor resistance to gas concentration) then constitutes the calibration.
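The two-point linear calibration described above can be sketched in Python as follows. The column heights of 20 mm at 0 °C and 220 mm at 100 °C are hypothetical values chosen for illustration; the function returns a model that converts any measured height h into a temperature T.

```python
from typing import Callable


def two_point_calibration(h_low: float, h_high: float,
                          t_low: float = 0.0,
                          t_high: float = 100.0) -> Callable[[float], float]:
    """Build a linear model T(h) = slope * h + offset from two reference points."""
    if h_high == h_low:
        raise ValueError("reference heights must differ")
    slope = (t_high - t_low) / (h_high - h_low)   # °C per mm
    offset = t_low - slope * h_low                # °C
    return lambda h: slope * h + offset


# Hypothetical calibration points: 20 mm at 0 °C, 220 mm at 100 °C
temperature_of = two_point_calibration(20.0, 220.0)
print(temperature_of(120.0))  # -> 50.0
```

The linear model is only valid as long as the liquid expands linearly between the two reference points; a gas sensor, as discussed below, does not offer such a simple relation.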

Simplified physical model of metal oxide semiconductor gas sensors
In the first module of the project, the students have already learned the operating principle of MOS gas sensors. A detailed qualitative and semi-quantitative mathematical description of the simplified sensor model can be found in the paper on module 1 [13]. This student-adequate sensor model describes the sensor response to a target gas at different sensor temperatures as the result of three competing effects: (a) "faster electrons" at higher temperatures, leading to a decrease in resistance, (b) enhanced adsorption of oxygen, reducing the number of free electrons and thus increasing the resistance, and (c) an increased reaction rate between reducing gases and adsorbed oxygen, freeing captured electrons and thus decreasing the resistance (Fig. 4).
The most important insight from this model is that the sensor response, i.e. the change in electrical resistance, depends on several factors: 1. Type of gas: chemically, there are reducing and oxidizing gases, which decrease or increase the sensor resistance, respectively. 2. Gas concentration: the higher the gas concentration, the stronger the change in resistance. 3. Sensor temperature: the reaction rate of the respective gas with the sensor surface changes with the sensor temperature. This can be used to increase sensitivity and selectivity by optimizing the sensor temperature for the respective target gas. Although the model greatly simplifies the real processes on the surface [22], [23], it provides a vivid picture and is able to explain the observations the students make in these experiments. This simple model is now revisited and extended by temperature-cycled operation to increase sensitivity and selectivity [24].
A cyclic variation of the sensor temperature results in a characteristic response pattern of the sensor signal. After an abrupt change of the sensor temperature, the equilibrium surface coverage is reached only very slowly; depending on the temperature, this takes from a few seconds to several hours. In a typical temperature cycle, the sensor is therefore permanently in a state of non-equilibrium. This behavior is dominated by the grain-boundary effect: surface charges on the metal oxide lead to band bending, resulting in an energy barrier between grains in the sensor layer [22]. Since the surface charge of SnO2 is mainly determined by ionosorbed oxygen, the observed relaxation can be attributed to a change in the coverage with reactive oxygen (Fig. 5) [25]. In this part of the course, it is important to demonstrate to the students that more information about the type and concentration of the surrounding gas can be extracted by cyclically varying the sensor temperature. Fig. 6 shows the temperature cycle used with the BME680 gas sensor and the resulting response pattern in air. The cycle consists of 50 data points, and each measuring cycle takes about 6 seconds.
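Why the sensor never reaches equilibrium within a cycle can be illustrated with a deliberately simplified model. The following Python sketch assumes that the surface oxygen coverage theta relaxes exponentially (first order, time constant tau_s) toward a temperature-dependent equilibrium value, in a hypothetical cycle of two temperature plateaus sampled at 50 points in 6 s. The equilibrium values and time constants are illustrative assumptions, not BME680 parameters.

```python
import math
from typing import List


def simulate_cycle(theta0: float, theta_eq_high: float, theta_eq_low: float,
                   tau_s: float, n_points: int = 50,
                   cycle_s: float = 6.0) -> List[float]:
    """Oxygen coverage sampled over one hypothetical two-step temperature cycle.

    First half of the cycle: high-temperature plateau; second half: low plateau.
    The coverage relaxes toward the plateau's equilibrium value with a single
    time constant tau_s (a simplifying assumption).
    """
    dt = cycle_s / n_points            # 0.12 s between samples by default
    decay = math.exp(-dt / tau_s)      # exact first-order relaxation over dt
    theta = theta0
    trace = []
    for i in range(n_points):
        theta_eq = theta_eq_high if i < n_points // 2 else theta_eq_low
        theta = theta_eq + (theta - theta_eq) * decay
        trace.append(theta)
    return trace


slow = simulate_cycle(0.5, 0.8, 0.2, tau_s=60.0)   # relaxation much slower than the cycle
fast = simulate_cycle(0.5, 0.8, 0.2, tau_s=0.1)    # relaxation much faster than the cycle
print(f"end of high plateau: slow {slow[24]:.3f}, fast {fast[24]:.3f} (equilibrium 0.8)")
```

With a time constant far longer than the 6 s cycle, the coverage barely moves toward either equilibrium value, so the shape of the signal within the cycle carries information about the surface state rather than an equilibrium value.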

Feature extraction for pattern recognition using machine learning
The sensor resistance pattern within a temperature cycle is characteristic of the gas type and concentration. The main goal of feature extraction is to extract relevant information from the raw data and to remove redundant information, which could otherwise contribute to overfitting. Overfitting means that noise, which by definition contains no information, is interpreted by the model as supposedly "real" information. In the subsequent machine learning (ML) step, this often results in a model that matches the training data almost perfectly. However, the structure learned in this way does not reflect an underlying effect but only random fluctuations. This reduces the model's ability to generalize and produces unsatisfactory results when it is applied to new data. To concentrate as much information as possible in as few parameters as possible, so-called features are calculated. For a simple time-series signal, these can be maximum, minimum or mean values as well as gradients within a certain time range. To get a feeling for feature extraction, the students are asked to determine these features from a given resistance curve and interval (Fig. 7).
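The four simple features can be computed for one interval as sketched below in Python. The default interval length of 6 points mirrors the intervals used later in the calibration software; defining the slope as a least-squares fit over the interval is an assumption, since the text does not specify how the slope is computed, and the resistance values are made up.

```python
from typing import Dict, Sequence


def interval_features(signal: Sequence[float], start: int,
                      length: int = 6) -> Dict[str, float]:
    """Max, min, mean and least-squares slope of signal[start:start+length].

    The slope is given per data point (sample index as x-axis); using seconds
    instead would only rescale it by the sampling interval.
    """
    window = list(signal[start:start + length])
    if length < 2 or len(window) != length:
        raise ValueError("interval must lie inside the signal and span >= 2 points")
    x_mean = (length - 1) / 2
    y_mean = sum(window) / length
    sxx = sum((i - x_mean) ** 2 for i in range(length))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(window))
    return {"max": max(window), "min": min(window),
            "mean": y_mean, "slope": sxy / sxx}


# Hypothetical resistance values (kOhm) from part of a temperature cycle
resistance = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 19.0, 17.5]
print(interval_features(resistance, start=0))
# -> {'max': 20.0, 'min': 10.0, 'mean': 15.0, 'slope': 2.0}
```

Six resistance values are thus condensed into four numbers, and only those that change clearly with concentration are worth keeping.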

Concentration: what is that?
In this experiment, the sensor is calibrated for ethanol in the concentration range between 0 and 40 ppm. Typically, students only know concentration units such as percent, e.g. from alcoholic beverages in the liquid phase or breath alcohol concentration in the gas phase. To refresh the students' knowledge, the term "concentration" is introduced as a particle ratio (Fig. 8). Supported by guided calculations, the students are introduced to the concentration units "ppm" (parts per million, 10⁻⁶) and "ppb" (parts per billion, 10⁻⁹) [26], in which pollutant and greenhouse gas concentrations are often given [27]. Note that the correct SI expressions for ppm and ppb would be µmol/mol and nmol/mol, respectively [26].
http://www.i-joe.org
Fig. 8. Explanation of the concentration units "parts per million" and "parts per billion".
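The guided calculations can be mirrored in a few lines of Python: a concentration is a dimensionless particle ratio, and percent, ppm and ppb are merely different scalings of it. The function names are illustrative.

```python
def particle_ratio(n_target: int, n_total: int) -> float:
    """Concentration as a particle (mole) ratio: target particles / all particles."""
    if n_total <= 0 or not 0 <= n_target <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_target <= n_total")
    return n_target / n_total


def in_units(ratio: float) -> dict:
    """Express a dimensionless ratio in percent (1e-2), ppm (1e-6) and ppb (1e-9)."""
    return {"percent": ratio * 1e2, "ppm": ratio * 1e6, "ppb": ratio * 1e9}


# 40 ethanol molecules among one million gas particles (upper end of the calibration range)
print(in_units(particle_ratio(40, 1_000_000)))
```

The upper calibration point of 40 ppm thus corresponds to 0.004 % or 40,000 ppb, which shows why percent is an impractical unit at these levels.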

An important insight from this part of the learning course is that concentrations are independent of the volume considered. With this knowledge, the students can now calculate the concentration increase inside the measuring chamber after injecting 1 ml of ethanol-air mixture with a given concentration from the storage chamber.
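The concentration steps in the measuring chamber can be estimated with a simple mass balance, sketched below in Python. It assumes a rigid, leak-free 1000 ml chamber, ideal and instantly mixed gases at one temperature, and no ethanol lost to the walls or consumed by the sensor; the storage concentration of 10,000 ppm is a hypothetical value. Because each injection also adds 1 ml of gas, the exact result differs slightly from the simple dilution factor of 1,000.

```python
from typing import List


def chamber_concentrations(c_storage_ppm: float, n_injections: int,
                           v_injection_ml: float = 1.0,
                           v_chamber_ml: float = 1000.0) -> List[float]:
    """Ethanol concentration (ppm) in the measuring chamber before and after
    each syringe injection from the storage chamber.

    Assumptions: rigid, leak-free chamber initially filled with clean air;
    ideal gases at one temperature; instant mixing; no ethanol adsorbed on
    the walls or consumed by the sensor. Gas amounts are tracked as volumes
    at ambient conditions, which are proportional to amounts of substance.
    """
    total_ml = v_chamber_ml      # all gas in the chamber
    ethanol_ml = 0.0             # ethanol part of it
    concentrations = [0.0]       # baseline: clean air
    for _ in range(n_injections):
        total_ml += v_injection_ml
        ethanol_ml += c_storage_ppm * 1e-6 * v_injection_ml
        concentrations.append(ethanol_ml / total_ml * 1e6)
    return concentrations


# Hypothetical storage concentration of 10,000 ppm, four 1 ml injections
steps = chamber_concentrations(10_000, 4)
print([round(c, 2) for c in steps])  # -> [0.0, 9.99, 19.96, 29.91, 39.84]
simple_step_ppm = 10_000 * 1.0 / 1000.0   # dilution factor of 1,000 from the text
```

For four injections the deviation from the simple estimate (10 ppm per injection) stays below 0.5 %, so the dilution factor of 1,000 is a good approximation in this range.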

Experimental setup
Following the theoretical part, the students record calibration data themselves and determine two characteristic features from these data to train their ML model. The sequence of this practical part is depicted in Fig. 9. Liquid ethanol is evaporated in the storage chamber, producing a known ethanol concentration. By transferring gas from the storage chamber to the measuring chamber, the concentration within the measuring chamber is increased step by step. Ten temperature cycles are recorded for each concentration, and once all cycles for a concentration have been recorded, the average cycle pattern is displayed. Five different concentrations between 0 and 40 ppm are recorded, and the students can decide which of the three intermediate concentrations they wish to use for further evaluation, together with the lowest and the highest. After collecting all raw data, the students are asked to select two characteristic features which, in their opinion, allow a good calibration of the sensor, i.e. which change significantly with the gas concentration. For this purpose, two intervals of 6 data points each can be moved freely within the cycle, and for each interval the maximum, minimum, mean or slope can be determined. Previews of the calculated features as a function of concentration are displayed in diagrams below the raw data (Fig. 10). When the students are satisfied with their choice, they can save the features together with the respective concentrations in a JavaScript file, which is then used to train a simple Artificial Neural Network (ANN).
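The bookkeeping of this step (averaging the ten recorded cycles per concentration, then computing one feature in each of two freely placed 6-point intervals) could look as follows in Python. The data layout, the function names and the synthetic "recorded" cycles are assumptions for illustration; the actual calibration software stores its results in a JavaScript file whose format is not described here.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, str]  # (start index, feature name); hypothetical format


def average_cycle(cycles: Sequence[Sequence[float]]) -> List[float]:
    """Point-wise mean of several recorded temperature cycles of equal length."""
    n = len(cycles[0])
    if any(len(c) != n for c in cycles):
        raise ValueError("all cycles must have the same number of points")
    return [sum(c[i] for c in cycles) / len(cycles) for i in range(n)]


def feature(pattern: Sequence[float], interval: Interval, length: int = 6) -> float:
    """Max, min, mean or least-squares slope of one 6-point interval."""
    start, kind = interval
    w = list(pattern[start:start + length])
    if len(w) != length:
        raise ValueError("interval must lie inside the cycle")
    if kind == "max":
        return max(w)
    if kind == "min":
        return min(w)
    mean = sum(w) / length
    if kind == "mean":
        return mean
    if kind == "slope":
        xm = (length - 1) / 2
        return (sum((i - xm) * (y - mean) for i, y in enumerate(w))
                / sum((i - xm) ** 2 for i in range(length)))
    raise ValueError(f"unknown feature {kind!r}")


def training_table(recordings: Dict[float, Sequence[Sequence[float]]],
                   first: Interval,
                   second: Interval) -> List[Tuple[float, float, float]]:
    """One row (feature 1, feature 2, concentration in ppm) per concentration."""
    rows = []
    for ppm in sorted(recordings):
        pattern = average_cycle(recordings[ppm])
        rows.append((feature(pattern, first), feature(pattern, second), ppm))
    return rows


def fake_cycle(ppm: float, k: int) -> List[float]:
    """Synthetic 50-point cycle with small alternating 'noise' (illustration only)."""
    return [100.0 / (1 + ppm / 10) + 0.5 * i + (0.1 if k % 2 else -0.1)
            for i in range(50)]


recordings = {ppm: [fake_cycle(ppm, k) for k in range(10)]
              for ppm in (0.0, 10.0, 20.0, 30.0, 40.0)}
for row in training_table(recordings, (0, "mean"), (20, "slope")):
    print(row)
```

In this synthetic example the mean feature drops with concentration while the slope stays constant, so a student would keep the first interval and move the second one.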

Model building and the "least wrong solution"
The features calculated from the sensor data are automatically loaded into the HTML learning course together with the correct concentrations. Then, a mathematical model is built to predict the "target" (concentration) from these features. The principle of model building is explained to the students with an intuitive example: is it possible to infer a person's weight (output) only from their height and waist size (input) (Fig. 11)? For this purpose, the students receive a hypothetical data set of five values each for weight $m$, height $h$ and waist size $u$ and are asked to investigate the functional relation between these values. A simple relation is the (linear) weighted sum of both parameters, multiplied by the weighting factors $\alpha$ and $\beta$:

$$m = \alpha \cdot h + \beta \cdot u \tag{1}$$
The resulting system of five linear equations with two unknowns (Fig. 12) is overdetermined and, in general, has no exact solution. However, it is possible to calculate the root mean square error (RMSE) for each possible combination of the weighting factors $\alpha$ and $\beta$ and thus determine the combination with the smallest deviation. This combination can be called the "least incorrect solution". In an applet integrated into the learning course, the students search for this solution by varying the weights and receive direct graphical feedback on the model's predictions for the corresponding factors (Fig. 13). Any combination of weights resulting in a mean deviation of less than 5 kg is accepted and allows the students to continue with the learning course.
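The search for the "least incorrect solution" can be reproduced in Python. The sketch below uses a made-up data set of five persons, scans a grid of weighting factors α and β in the way students vary the applet's weights, and compares the result with the exact least-squares minimum of the RMSE obtained from the normal equations. The data, grid ranges and step size are assumptions, not the course's values.

```python
import math
from typing import Tuple

# Hypothetical training data: height h (m), waist size u (m), weight m (kg)
H = [1.60, 1.70, 1.75, 1.80, 1.90]
U = [0.70, 0.80, 0.85, 0.95, 0.90]
M = [75.0, 81.0, 87.0, 92.0, 93.0]


def rmse(alpha: float, beta: float) -> float:
    """Root mean square error of the model m = alpha*h + beta*u on the data."""
    return math.sqrt(sum((alpha * h + beta * u - m) ** 2
                         for h, u, m in zip(H, U, M)) / len(M))


def grid_search(step: float = 0.5, a_max: float = 60.0,
                b_max: float = 120.0) -> Tuple[float, float, float]:
    """Try every (alpha, beta) on a grid, as students do by moving sliders."""
    best = (0.0, 0.0, rmse(0.0, 0.0))
    for i in range(int(a_max / step) + 1):
        for j in range(int(b_max / step) + 1):
            a, b = i * step, j * step
            e = rmse(a, b)
            if e < best[2]:
                best = (a, b, e)
    return best


def least_squares() -> Tuple[float, float]:
    """Exact RMSE minimizer via the 2x2 normal equations (Cramer's rule)."""
    shh = sum(h * h for h in H)
    shu = sum(h * u for h, u in zip(H, U))
    suu = sum(u * u for u in U)
    shm = sum(h * m for h, m in zip(H, M))
    sum_um = sum(u * m for u, m in zip(U, M))
    det = shh * suu - shu * shu
    return (shm * suu - shu * sum_um) / det, (shh * sum_um - shu * shm) / det


a_g, b_g, e_g = grid_search()
a_ls, b_ls = least_squares()
print(f"grid:          alpha={a_g:.1f}, beta={b_g:.1f}, RMSE={e_g:.2f} kg")
print(f"least squares: alpha={a_ls:.1f}, beta={b_ls:.1f}, RMSE={rmse(a_ls, b_ls):.2f} kg")
```

Because height and waist size are strongly correlated in such data, quite different (α, β) pairs reach nearly the same RMSE, which helps explain why several student solutions can all pass the 5 kg threshold.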

Model validation
Finally, the students are asked to "validate" the resulting model on themselves using a measuring tape and a scale. Typically, the model provides only unsatisfactory predictions, so the students are asked to suggest ways of optimizing the model. Possible answers are:
─ Increase the size of the training data set
─ Collect more representative training data
─ Use other (possibly more appropriate) features
─ Build a more complex and/or theory-based model
An example of a more complex model is an Artificial Neural Network (ANN), which can be validated via k-fold cross-validation. It is then used to build the gas sensor calibration model for ethanol. Due to the complexity of an ANN's operating principle, the exact theory is not discussed in the course. However, analogies are drawn between the training process of a neural network and the search for the optimal combination of weighting factors in the body weight example. In both cases, weights are varied and adjusted until a nearly optimal combination is found.
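k-fold cross-validation splits the data into k parts, trains on k-1 of them and tests on the remaining one, rotating until every part has served as test data once. The Python sketch below illustrates the procedure; to stay short, it swaps the ANN for a straight-line fit of one feature against concentration, and the example feature values are hypothetical.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def k_fold_splits(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds (no shuffling)."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)   # spread the remainder
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares straight line y = a*x + b (stand-in for the ANN)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("training fold needs at least two distinct x values")
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def cross_validated_rmse(xs: Sequence[float], ys: Sequence[float], k: int) -> float:
    """RMSE over all held-out predictions of k-fold cross-validation."""
    squared_errors = []
    for train, test in k_fold_splits(len(xs), k):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared_errors += [(a * xs[i] + b - ys[i]) ** 2 for i in test]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical mean-resistance feature (kOhm) for five calibration concentrations (ppm)
feature_values = [100.0, 52.0, 35.0, 27.0, 21.0]
ppm = [0.0, 10.0, 20.0, 30.0, 40.0]
print(f"leave-one-out RMSE: {cross_validated_rmse(feature_values, ppm, k=5):.1f} ppm")
```

With only five calibration points, k = 5 amounts to leave-one-out cross-validation. A cross-validation error much larger than the training error points to overfitting or to features that do not capture the concentration well.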
Students can follow the learning process of the neural network on their training data in real time. The training, based on the standard backpropagation algorithm, automatically stops after 10,000 iterations. Owing to the small amount of training data, the neural network typically finds a nearly optimal model eventually. For unsuitable features, however, this process tends to take longer (Fig. 14). Thus, students who have identified more suitable features obtain a better model. Finally, the student groups can compare their training data, selected features and resulting models, and can record the features of an unknown concentration. These features can be entered into the trained model to predict the unknown gas concentration. At the end, all students receive a complete summary of the theoretical fundamentals and experimental results.
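The network used in the course is not specified in detail. As a hedged illustration of backpropagation with a fixed budget of 10,000 iterations, the following pure-Python sketch trains a tiny network (two input features, four tanh hidden units, one linear output) by full-batch gradient descent on the mean squared error. The architecture, learning rate, initialization and the feature values for five hypothetical concentrations are all assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def train_ann(samples: Sequence[Sequence[float]], targets: Sequence[float],
              hidden: int = 4, lr: float = 0.05, iterations: int = 10_000,
              seed: int = 0) -> Tuple[Callable[[Sequence[float]], float], List[float]]:
    """Train a 2-layer network with backpropagation; return (predict, loss history)."""
    rng = random.Random(seed)
    n_in, n = len(samples[0]), len(samples)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [0.0] * hidden   # zero output weights: initial prediction is exactly 0
    b2 = 0.0

    def forward(x: Sequence[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2

    losses = []
    for _ in range(iterations):
        g_w1 = [[0.0] * n_in for _ in range(hidden)]
        g_b1, g_w2, g_b2, loss = [0.0] * hidden, [0.0] * hidden, 0.0, 0.0
        for x, y in zip(samples, targets):
            h, out = forward(x)
            err = out - y
            loss += err * err / n
            d_out = 2 * err / n                           # dL/d(out)
            g_b2 += d_out
            for j in range(hidden):
                g_w2[j] += d_out * h[j]
                d_h = d_out * w2[j] * (1 - h[j] ** 2)     # backpropagate through tanh
                g_b1[j] += d_h
                for i in range(n_in):
                    g_w1[j][i] += d_h * x[i]
        losses.append(loss)
        for j in range(hidden):                           # gradient descent step
            w2[j] -= lr * g_w2[j]
            b1[j] -= lr * g_b1[j]
            for i in range(n_in):
                w1[j][i] -= lr * g_w1[j][i]
        b2 -= lr * g_b2
    return (lambda x: forward(x)[1]), losses


concentrations = [0.0, 10.0, 20.0, 30.0, 40.0]                         # ppm
features = [[1.00, 0.10], [0.62, 0.20], [0.45, 0.30], [0.36, 0.40], [0.30, 0.50]]  # hypothetical, scaled
targets = [c / 40.0 for c in concentrations]                            # scale to 0..1
predict, losses = train_ann(features, targets)
print(f"MSE first / last iteration: {losses[0]:.4f} / {losses[-1]:.6f}")
print(f"prediction for 20 ppm features: {40.0 * predict([0.45, 0.30]):.1f} ppm")
```

Plotting `losses` over the iterations gives the kind of learning curve students watch in real time; with less informative features the curve would flatten out at a higher error.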

Conclusion and outlook
The experiment developed for calibrating a MOS gas sensor highlights the importance of calibration for any measurement. The students acquire insights into the principle of calibration as well as modern techniques for data acquisition, feature extraction and modelling using machine learning. Starting with the intuitive example of the calibration of a thermometer and presenting analogies to gas sensors, the term calibration becomes tangible for the students. The age-appropriate model of the surface processes of a MOS gas sensor that influence its resistance helps to explain the sensor response at different sensor temperatures, gas types and concentrations. The complex process of temperature-cycled operation, used to increase the sensitivity of the sensor, is also well explained by the developed model. The students learn about different concentration magnitudes such as parts per million (ppm) and parts per billion (ppb), which are often unknown to them. In the practical part of the experiment, the students generate different ethanol concentrations in the measuring chamber, record sensor response data and perform a feature extraction to predict the gas concentration. The subsequent modelling by means of an ANN is motivated using the example of a weighted sum of two features. The "least incorrect solution" is identified systematically by adjusting the weights and receiving direct visual feedback in the learning course. This also simulates the learning process of an ANN (e.g. the backpropagation algorithm searching for a better solution). The final comparison of the models based on the selected features additionally motivates the students to engage with model building and adds a playful element.
The presented HTML-based learning course has already been used as part of a graded MINT (STEM) practical course in the 9th grade of a German high school ("Gymnasium") and is a well-established part of the course program in the student lab SinnTec at Saarland University and in the student research center Saarlouis, Germany. A more advanced version of the self-learning course was developed to introduce gas sensor and measurement science fundamentals to engineering students (2nd-semester bachelor's program) as part of the fundamental hands-on training at Saarland University. Both