A New Covid-19 Tracing Approach using Machine Learning and Drones Enabled Wireless Network

The continuous advancements in wireless network systems have reshaped healthcare systems toward using emerging communication technologies at different levels. This paper makes two major contributions. First, a new wireless monitoring and tracking system is developed to address the spread of COVID-19. Unmanned aerial vehicles (UAVs), i.e., drones, are used as base stations as well as data collection points for Internet of Things (IoT) devices on the ground. These UAVs are also able to exchange data with other UAVs and cloud servers. Second, this paper introduces a new reinforcement learning (RL) framework for learning the optimal signal-aware UAV trajectories under quality of service constraints. The proposed RL algorithm is instrumental in making the UAV movement decisions that maximize the signal power at the receiver and the data collected from the ground agents. Simulation experiments confirm that the system outperforms conventional wireless monitoring systems and demonstrates efficiency, especially in terms of flexible continuous connectivity, line-of-sight visibility, and collision avoidance. The results show that the proposed healthcare system is feasible and innovative. Furthermore, the system shows good performance under different conditions.

Keywords: contact tracing, UAVs, Covid-19, wireless monitoring system, wireless mesh networks, reinforcement learning


Introduction
The most important step in the fight against the COVID-19 pandemic is to prevent the spread of the disease [1]. Wireless communication technology has been instrumental in the healthcare system at different levels [3][4][5]. Wireless technology enables real-time monitoring of patients and connects different units in the healthcare system in unprecedented ways. In our work, the aim of the proposed wireless monitoring system (WMS) is to monitor persons who are infected with COVID-19 in order to prevent transmission of the disease. Contact tracing (CT) is defined as the process of identifying the persons (contacts) who come close to an infected person [1,2]. Owing to the possibility of transmission, contacts should be tested for infection. To reduce the likelihood of transmission, each contact should be isolated even if the test result is negative [1,2], and should be treated and isolated if the test result is positive [1,2].
infected persons. After that, it sends a list of patients to all users. The system broadcasts alert messages telling users to leave the place if an infected person is detected. The attributes of each close physical contact (CPC) are recorded in the database if the physical distance between persons is less than a certain threshold (i.e., 1.5 m). These attributes include the telephone number of each person, the time of the CPC, the date of the CPC, and the distance between the persons. Furthermore, the database contains appropriate actions, such as phoning an emergency number and informing infected persons to quarantine themselves. Each action taken is preferably responsive to the particular individual uniquely identified by the identifying signal.
The major contributions of this paper are summarized as follows:
1. A new architecture is proposed for the WMS, in which an intelligent, effective wireless mesh technology and UAVs are integrated to build a wide-range wireless network for CT over a wide area. The proposed WMS is an IoT network that consists of a massive number of heterogeneous end-user devices. These devices include smart phones, sensors, electronic gadgets and wearables, household appliances, and many more. The system offers efficient and reliable connectivity that enables decision-makers to take preventative action to break the chains of COVID-19 transmission by exchanging warning messages that inform the contacts of infected cases about the risk of infection as quickly as possible, so that the right steps can be taken in a timely manner.
2. An automated and reliable method is developed to select the values of the configuration parameters, in which a new intelligent RL-based algorithm is suggested for the optimal configuration of UAVs. The main concern of the proposed algorithm is enabling UAVs to provide wireless coverage for mobile users by maximizing the total received power while satisfying the WMS constraints.
3. The performance of the RL configuration model is analyzed under different network conditions.
The rest of the paper is organized as follows: Section 2 briefly surveys the relevant work; our assumptions and work environment are shown in Section 3; Section 4 describes our scheme for contact tracing and configuring the monitoring wireless network. The performance results of our scheme are evaluated in Section 5. Finally, the paper is concluded and future research directions are given.

Related work
Due to the lack of a surveillance system for controlling the Covid-19 pandemic, many patients are dying. A system that provides continuous health monitoring and emergency reporting is required to assist them. WMS is a reliable technology for monitoring and ensuring safety, especially when dealing with the COVID-19 pandemic. Besides being accurate, rapid, and low-cost, WMS supports real-time data gathering from multiple sensors. Drones have been used in WMS to guarantee connectivity and to support long-distance peer-to-peer communication. They are routinely used in healthcare monitoring systems in many countries. In [9], a new system for monitoring patients at home was proposed to reduce the cost of monitoring patients and to allow a patient to get a full range of services at home. IoT networks were used to monitor patients remotely. First, the network collects healthcare data from patients using sensors and actuators. After that, it sends the data to a cloud-based Hospital Information System (HIS) for processing. In [9], a new remote health monitoring system based on IoT was proposed. Several technologies have been integrated in this system to provide up-to-date health monitoring in a smart environment. These technologies include IoT, cloud computing, smart environments, and virtual machines.
In [10], a new healthcare system was proposed to overcome the delay of traditional WMS. A model for monitoring and processing the received data was used. Furthermore, the patients' health data are analyzed to identify anomalies. In [11], telecommunication techniques were used to develop a new device that collects the patients' health data and transfers the data wirelessly to a remote device. These data include a patient's body temperature, heart rate, and electrocardiography. In [12], a new healthcare system was proposed to collect patients' data such as weight, height, and pulse rate. The data are transferred to a device that analyzes them to generate a final report on the patient's health status. Many wireless devices are used in this system, including a microcontroller, a height sensor, a weight sensor, and a pulse sensor.
Three sensors were used in the system proposed in [13]. These sensors include a temperature sensor and a heartbeat sensor. The temperature of the patient is measured using the temperature sensor. The system sends an alert to the processor if the heartbeat of the patient deviates from the normal rate. The authors of [14] proposed a new framework to collect patients' data in real time. Furthermore, the proposed system makes medical suggestions whenever needed. The framework integrates different technologies such as cloud technology, sensors, and mobile devices. The collected data are stored in the cloud to make them available to physicians, paramedics, or any other authorized entity.
A new framework was proposed in [15] for developing a health monitoring system. The main concern of the system is providing extensible and usable services for pervasive patient care. Sensors are used to collect patients' physiological parameters in this system. Besides analyzing patients' data, smart phones are used to send the results to a medical center system. A home-based wireless monitoring system was considered in [16] for monitoring patients in their own homes. Zigbee technology was used to gather data. Such systems can continuously collect patients' physiological parameters and offer further analysis and interpretation. The authors of [17] proposed a novel, IoT-aware, smart architecture for an automatic healthcare monitoring system. The system can track patients anywhere and anytime. Besides collecting patients' physiological parameters in real time, the system can gather environmental conditions. The collected data are fed into a control center for processing.
In [18], a new scheme was proposed to optimize the transmission rate, the transmission power, and the allocated time slots for each sensor in the healthcare monitoring system. Furthermore, the authors proposed a new sub-optimal resource allocation to reduce the time complexity of the resource allocation. To support active and assisted living healthcare services and applications, the authors of [19] proposed a new technique that integrates IoT and non-interoperable IoT platforms for monitoring patients. The main concern of the proposed technique is detecting and correcting wrong lifestyles or critical situations quickly. A new IoT-enabled wearable device was designed in [26] to send notifications when any of the monitored parameters falls outside the typical range. In [27], the authors proposed a new application-based temperature monitoring system that displays real-time temperature data. In [28], the authors proposed a new nonprescription drugs mobile health application (NMMHA) for helping patients with their initial medication.
However, most of these studies assumed that patients are located indoors. More specifically, all previous studies neglect monitoring patients in outdoor environments. Therefore, in our work, we study integrating UAVs with a wireless mesh network for the COVID-19 remote patient monitoring service, and we enable an alerting system that informs at-risk people (i.e., contacts) about what to do after a contact. We propose using UAVs to enhance the coverage, reliability, and connectivity of the monitoring system. The problem is formulated as the maximization of the signal power at the receiver so as to cover the entire zone.

System model
We present our assumptions in this section. Each mesh router (MR) is responsible for health and safety in a certain subarea in the context of COVID-19. The dimension of each subarea is 250 m × 250 m. The network consists of three types of nodes: MRs, mesh clients (MCs), and UAVs. While MRs have fixed locations, MCs and UAVs move and change their places arbitrarily. On the ground, MCs (i.e., IoT devices) send the collected CT data to MRs. Each node is equipped with a single IEEE 802.11b based transceiver. The spectrum is partitioned into non-overlapping channels (16 channels with 5 MHz spacing, with transmission and power mask restrictions similar to the ISM band). An MC can be any integration of sensors, actuators, smart phones, vehicles, or other wireless systems that can gather data and feed it to an MR. The main principle behind MCs in our work is that objects connected via the Internet can collect data with the help of existing technologies, and the collected data are sent to MRs. Each UAV searches for an MR in its range to form the WMS. UAVs can move dynamically to collect data from the MRs using uplink communication links.
Basically, UAVs are used as base stations. Assume $(x_i, y_i, z_i)$ is the 3D location of the $i$-th UAV. We assume that MRs are distributed in 3D space and $(x_j, y_j, z_j)$ is the location of the $j$-th MR. The distance between the $i$-th UAV and the $j$-th MR is computed as follows:

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2} \qquad (1)$$

The path loss model in [18] is used in our work and it is computed as follows:

$$P = P_F + P_B + P_I \qquad (2)$$

where $P_F$ is the free space path loss, $P_B$ is the building penetration loss, and $P_I$ is the indoor loss. Figure 1 illustrates the architecture of the proposed WMS. We assume that each MR has a unique ID, which can be the MAC address of the node.

Covid-19 drone-based wireless monitoring system
In this section, we propose a drone-based wireless monitoring system. In this system, drones form a wireless mesh network to enable monitoring CT over a wide area. The proposed system is shown in Figure 1, where a set of drones is distributed over regular zones to establish a wireless network, with drones serving as wireless relays that improve the connectivity and coverage of ground IoT devices.
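As a minimal sketch of the distance and path-loss definitions in the system model, the following Python snippet computes the UAV-to-MR distance and sums the three loss terms in dB. The free-space term uses the standard free-space formula; the 2.4 GHz carrier, the node positions, and the building and indoor loss values are illustrative assumptions rather than values taken from this paper or from [18].

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def distance_3d(uav, mr):
    """Euclidean distance between a UAV and an MR given as (x, y, z) in metres."""
    return math.dist(uav, mr)


def free_space_loss_db(d_m, freq_hz):
    """Standard free-space path loss in dB for distance d_m and carrier freq_hz."""
    return 20 * math.log10(4 * math.pi * d_m * freq_hz / SPEED_OF_LIGHT)


def total_path_loss_db(d_m, freq_hz, building_db, indoor_db):
    """P = P_F + P_B + P_I in dB; building_db and indoor_db are assumed constants."""
    return free_space_loss_db(d_m, freq_hz) + building_db + indoor_db


uav = (0.0, 0.0, 100.0)  # hypothetical UAV hovering at 100 m
mr = (0.0, 0.0, 0.0)     # hypothetical MR directly below it
d = distance_3d(uav, mr)
loss = total_path_loss_db(d, 2.4e9, building_db=15.0, indoor_db=5.0)
print(f"d = {d:.1f} m, path loss = {loss:.1f} dB")
```

Treating the building and indoor terms as constants keeps the sketch short; a real deployment would look them up per link.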

Signaling protocol for CT
In this section, we introduce the signaling protocol for WCT. Contact tracing is conducted at two levels: the local (zone) level and the global (whole-area) level. In our scheme, tracing is managed as follows:
Step 1: Every MC (i.e., IoT device) collects physical distancing measures and users' data.
Step 2: All MCs send their distancing measures and data to the MR.
Step 3: The MR combines the results from all MCs and generates a final contact tracing status for its zone.
Step 4: MRs exchange their zone statuses using UAVs, and a final status of the area is then extracted at each MR.
Step 5: A new status for each zone is broadcast to all MRs using UAVs.
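The five steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Report` fields, the zone names, the phone numbers, and the flat 2D coordinates are hypothetical, and the UAV relay of Steps 4 and 5 is reduced to a plain merge of per-zone results. The 1.5 m threshold is the close-physical-contact distance quoted in the introduction.

```python
import math
from dataclasses import dataclass
from itertools import combinations

CPC_THRESHOLD_M = 1.5  # close-physical-contact distance from the introduction


@dataclass(frozen=True)
class Report:
    """Step 1: data collected by one MC (field names are hypothetical)."""
    phone: str
    x: float
    y: float


def zone_status(reports, threshold=CPC_THRESHOLD_M):
    """Steps 2-3: the MR combines MC reports into a list of close contacts."""
    contacts = []
    for a, b in combinations(reports, 2):
        d = math.hypot(a.x - b.x, a.y - b.y)
        if d < threshold:
            contacts.append((a.phone, b.phone, round(d, 2)))
    return contacts


def area_status(zone_statuses):
    """Steps 4-5: merge the zone statuses relayed by UAVs into one area view."""
    merged = []
    for zone_id, contacts in sorted(zone_statuses.items()):
        merged.extend((zone_id, *c) for c in contacts)
    return merged


zones = {
    "zone-A": [Report("555-0001", 0.0, 0.0), Report("555-0002", 1.0, 0.0)],
    "zone-B": [Report("555-0003", 0.0, 0.0), Report("555-0004", 5.0, 0.0)],
}
local = {zone: zone_status(reports) for zone, reports in zones.items()}
print(area_status(local))
```

Only the pair in the first zone is closer than the threshold, so it is the only contact that reaches the area-wide status.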
In our system, a Markov decision process (MDP) is used to extract the optimal policy for controlling the UAV's location. The problem can be stated as follows: find the UAV location that maximizes the signal power at the receiver. The received signal power is defined as follows [20]:

$$S_r^d = S_t - \alpha P \qquad (3)$$

where $S_r^d$ is the received power at distance $d$, $\alpha$ is the cost of increment of the path loss $P$, and $S_t$ is the transmitted power. The main concern of our policy is determining the optimal location of the UAV such that the transmitted signal can be detected clearly by the MR. The problem is therefore to maximize $S_r^d$ subject to two constraints: the received power must reach the signal detection threshold $\omega$, and the interference must not exceed the interference threshold $\rho$. To maximize the received signal at distance $d$, the derivative of the received power with respect to the configuration $C_t$ of the UAV at time $t$ should be equal to zero:

$$\frac{\partial S_r^d}{\partial C_t} = 0 \qquad (5)$$

Let $\theta$ be the current orientation of the UAV in the sky, where $\theta \in [-\pi, \pi]$. The configuration of the UAV at time $t$ can be defined as follows:

$$C_t = (L_t, \theta) \qquad (6)$$

where $L_t$ is the current location of the UAV. The condition for the optimal configuration of the $H$ UAVs in the system can be formulated as the requirement that the gradient of the system signal power with respect to the UAVs' configurations equals the zero vector:

$$\nabla S(H) = \mathbf{0} \qquad (7)$$

In our work, we assume that, for typical CT, the direct sensitivity of the UAV's signal power to the UAV's configuration, $\partial S_t / \partial C_t$, is much more significant than the sensitivity to the path loss, $\alpha \, \partial P / \partial C_t$. Then, the condition in Eq. (5) can be written as follows:

$$\frac{\partial S_t}{\partial C_t} = 0$$
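To make the placement problem concrete, the sketch below evaluates a received power of the form "transmit power minus α times path loss" at a few candidate hover points and keeps the best one that satisfies the detection threshold ω. Several parts are assumptions made only for illustration: the free-space loss model, the 20 dBm transmit power, the MR and candidate coordinates, the choice to score a location by its weakest MR link, and the omission of the interference constraint (no interference model is given in the text).

```python
import math


def path_loss_db(d_m, freq_hz=2.4e9):
    """Assumed free-space path loss in dB (the paper's model has more terms)."""
    return 20 * math.log10(4 * math.pi * d_m * freq_hz / 299_792_458.0)


def received_power_dbm(uav, mr, s_t_dbm=20.0, alpha=1.0):
    """S_r = S_t - alpha * P for one UAV position and one MR."""
    return s_t_dbm - alpha * path_loss_db(math.dist(uav, mr))


def best_location(candidates, mrs, omega_dbm):
    """Return the candidate maximizing the weakest link, subject to S_r >= omega."""
    best, best_score = None, -math.inf
    for uav in candidates:
        worst = min(received_power_dbm(uav, mr) for mr in mrs)
        if worst >= omega_dbm and worst > best_score:
            best, best_score = uav, worst
    return best, best_score


mrs = [(0.0, 0.0, 0.0), (200.0, 0.0, 0.0)]                # hypothetical MRs
candidates = [(x, 0.0, 50.0) for x in range(0, 201, 50)]  # hypothetical hover points
location, power = best_location(candidates, mrs, omega_dbm=-80.0)
print(location, round(power, 1))
```

With two MRs on a line, the weakest-link score is highest at the midpoint; if no candidate meets ω, the function returns `None` for the location.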

UAVs' configuration adaptation model
In the configuration adaptation model, the necessary condition for signal power optimality, Eq. (7), can be reached using an iterative gradient minimization approach [21], in which successive projections of the signal gradient are performed so that $\nabla S(H)$ converges to 0. At each iteration, a step-size factor $\tau$ scales the projected configuration changes $\Delta H = (\Delta C_1, \Delta C_2, \ldots, \Delta C_v)$ to improve the convergence. Newton's method is used to find $\delta$, in an update that involves the received signal power of the $i$-th UAV at iteration $n+1$. In our work, an MDP is used to model the UAVs' interaction with the environment. The MDP is defined as follows:
1. State space Z: the set of states the system may be in. In our work, the state of the system consists of the position of the UAV and $S_r^d$, so the state at time $t$ is $Z_t = (L_t, S_r^d)$.
2. Action space A: a finite set of actions that change the state of the system.
3. Reward function: the agent gets a reward (a numerical score) after executing an action. Eq. (3) is used to compute the reward for each action.
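The iterative adaptation described at the start of this section can be illustrated with a plain fixed-step gradient ascent on a toy signal-power function. This is a simplified stand-in, not the projected-gradient and Newton scheme of [21]: the log-distance power model, the single MR at (120, 80), the 50 m altitude, the finite-difference gradient, and the values of τ and the tolerance are all assumptions chosen for the example.

```python
import math


def signal_power(config, mr=(120.0, 80.0), altitude=50.0, s_t=20.0, alpha=1.0):
    """Toy S(C): transmit power minus alpha times an assumed log-distance loss."""
    d = math.hypot(config[0] - mr[0], config[1] - mr[1], altitude)
    return s_t - alpha * 20 * math.log10(d)


def gradient(f, config, h=1e-3):
    """Central-difference estimate of the gradient of f at config."""
    grad = []
    for k in range(len(config)):
        up = list(config)
        up[k] += h
        down = list(config)
        down[k] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def adapt(config, tau=100.0, tol=1e-6, max_iter=5000):
    """Repeat C <- C + tau * grad S(C) until the gradient norm drops below tol."""
    config = list(config)
    for n in range(max_iter):
        g = gradient(signal_power, config)
        if math.hypot(*g) < tol:
            break
        config = [c + tau * gk for c, gk in zip(config, g)]
    return config, n


final_config, iterations = adapt((0.0, 0.0))
print([round(c, 2) for c in final_config], iterations)
```

The loop stops when the gradient is numerically zero, which for this toy model happens when the UAV is directly above the MR.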
The main goal of our intelligent scheme is to select a sequence of actions that maximizes the total reward. The location of the UAV changes after an action is selected. To decompose the environment, we use a 3D projection idea in which a 2D graph is extracted from the 3D graph using the projection technique [22,23]. After that, the environment is divided into cells. A cell is considered free if there is no object inside it, and a UAV can move only to a free cell. After choosing an action $a_t$, the system transits to the state $Z_{t+1}$ with reward $S_r^d$.
Theorem 1: The average reward of the UAV agent is sensitive to the configuration of the UAV, and this sensitivity can be calculated.
Proof: After taking the action $a_t$ at state $Z_t$, the gain for the UAV's configuration under policy $\pi$ is expressed in terms of $Z_{t+\Delta}$, the new state of the system after changing the configuration of the UAV, and $r(Z_t)$, the reward of state $Z_t$. The right-hand side of Eq. (9) can be rewritten following [21]. The optimal policy $\pi$ neglects any action with zero reward. The rate reward of policy $\pi$ is calculated from the probability of each state $Z_t$. By using Eq. (9), it can be shown that Eq. (10) has an equivalent form. An analogous proof holds for any change in the UAVs' configuration. This analysis helps a UAV agent decide the next configuration of the UAV based on the sensitivity of the reward to the configuration.
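As an illustration of the cell-based MDP described above, the sketch below trains a tabular Q-learning agent on a small grid in which the UAV may enter only free cells and the reward is the received power in the cell it lands in. Tabular Q-learning is used here as a generic stand-in, since the text does not specify the RL algorithm at this level of detail. The grid size, blocked cells, MR position, log-distance power model, and hyperparameters are all hypothetical.

```python
import math
import random

SIZE, CELL_M, ALT_M = 5, 50.0, 50.0
MR_CELL = (4, 4)                      # hypothetical MR location
BLOCKED = {(1, 1), (2, 3)}            # hypothetical occupied (non-free) cells
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # hover plus four moves


def reward(cell):
    """Received power in a cell: transmit power minus an assumed log-distance loss."""
    dx = (cell[0] - MR_CELL[0]) * CELL_M
    dy = (cell[1] - MR_CELL[1]) * CELL_M
    return 20.0 - 20.0 * math.log10(math.hypot(dx, dy, ALT_M))


def step(cell, action):
    """Move only into free cells inside the grid; otherwise stay in place."""
    nxt = (cell[0] + action[0], cell[1] + action[1])
    if not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE) or nxt in BLOCKED:
        nxt = cell
    return nxt, reward(nxt)


def train(episodes=3000, horizon=25, lr=0.5, gamma=0.9, seed=0):
    """Off-policy Q-learning with random starts and a uniform random behaviour policy."""
    rng = random.Random(seed)
    free = [(i, j) for i in range(SIZE) for j in range(SIZE) if (i, j) not in BLOCKED]
    q = {(s, a): 0.0 for s in free for a in range(len(ACTIONS))}
    for _ in range(episodes):
        s = rng.choice(free)
        for _ in range(horizon):
            a = rng.randrange(len(ACTIONS))
            s2, r = step(s, ACTIONS[a])
            target = r + gamma * max(q[(s2, b)] for b in range(len(ACTIONS)))
            q[(s, a)] += lr * (target - q[(s, a)])
            s = s2
    return q


def greedy_path(q, start, steps=20):
    """Follow the learned greedy policy from start for a fixed number of steps."""
    path = [start]
    for _ in range(steps):
        s = path[-1]
        a = max(range(len(ACTIONS)), key=lambda b: q[(s, b)])
        path.append(step(s, ACTIONS[a])[0])
    return path


q_table = train()
path = greedy_path(q_table, (0, 0))
print(path[-1])
```

A purely random behaviour policy is acceptable here because Q-learning is off-policy; in a larger state space an ε-greedy policy would be the more usual choice.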
Theorem 2: Given a set of potential configurations, the optimal policy $\pi$ maximizes the long-run average reward calculated over a finite time horizon while satisfying the system requirements.
Proof: The policy $\pi$ is optimal at time $t = 0$ since no localization decision has been made before. Policy $\pi$ is extracted using RL [8], which generates the optimal final configuration by iteratively finding the optimal configuration at each time instant for an increasing number of configurations in the system. Its main idea is to assign to the UAV at time $t$ the next possible configuration $C_t^i$ that would maximize the long-run average reward (Eq. (18)). By contradiction, assume there is an action $f_t$ that gives $C_t^j$ such that mapping $C_t^j$ to the UAV generates more reward. This would lead to an inequality between nested maxima (Eq. (19)). Using the non-decreasing property of $G(Z_t, a_t)$, the associative property of the maximum operator, and the definition of $C_t^i$ in Eq. (18), Eq. (19) can be rewritten.

iJIM - Vol. 15, No. 22, 2021

(1) Throughput, which is the average rate of successful message delivery over a communication channel [24][25].
(2) Bit Error Rate (BER), which is the number of bits received in error relative to the total number of bits received in a transmission during a studied time interval [24].
The network parameters chosen for evaluating the algorithm and the methodology of the simulation are shown in Table 1. Two metrics are used to evaluate the performance of the RL scheme:
- The convergence metric, which measures the speed at which the signal power adapts to reach the optimal signal power following a load change.
- The stability metric, which evaluates the variation of the signal power during periods of stationary load once convergence is achieved.
The convergence of the scheme is defined in terms of $t_c$, the time of the signal power change, and $t_o$, the time when the signal power at distance $d$ converges to its optimal value. The convergence deviation $C_d$ is defined in Eq. (32). Figure 2 illustrates the tradeoff between convergence and stability.
Clearly, improving the convergence of the RL scheme results in reduced stability. Figure 3 shows the signal power at the receiver for different locations of the UAV. It can be seen that our scheme extracts the optimal location for the UAV in our system. The extracted location maximizes the throughput of the network. In Figure 4, we evaluate the scalability of our scheme for large-scale networks. The figure illustrates that the throughput increases as the number of MRs increases; however, beyond a certain number of MRs the throughput levels off. As the number of MRs increases, spectrum utilization increases significantly, which improves the throughput. On the other hand, interference also increases with the number of MRs, which may degrade spectrum utilization. To study the effect of the load (i.e., the packet arrival rate λ) on the performance of the WCT, we measure the packet loss probability under various values of λ. From Figure 5, we notice that as the value of λ (i.e., the network load) increases, the packet loss probability increases.
To study the effect of area density (i.e., the number of people standing in an area) on the performance of the WCT, we measure the bit error rate (BER) for different density scenarios. From Figure 6, we notice that as the density increases, the BER increases. From these experiments, we find that using WCT for contact tracing becomes less efficient when the performance of the wireless network is degraded by network congestion.
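The two network metrics used in this evaluation follow directly from their definitions, as the short Python sketch below shows. The function names and the sample bit sequences are hypothetical and are not taken from the simulation setup.

```python
def throughput_bps(delivered_bits, interval_s):
    """Average rate of successfully delivered bits over the studied interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return delivered_bits / interval_s


def bit_error_rate(sent_bits, received_bits):
    """Fraction of received bits that differ from the bits that were sent."""
    if len(sent_bits) != len(received_bits) or not sent_bits:
        raise ValueError("bit sequences must be non-empty and of equal length")
    errors = sum(s != r for s, r in zip(sent_bits, received_bits))
    return errors / len(sent_bits)


sent = [1, 0, 1, 1, 0, 0, 1, 0]
received = [1, 0, 0, 1, 0, 1, 1, 0]   # two bits flipped in transit
print(throughput_bps(8000, 2.0), bit_error_rate(sent, received))
```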

Conclusion
In this paper, a new IoT-based healthcare networked system has been developed to monitor the spread of COVID-19. The proposed system gathers contact tracing data using IoT devices and UAVs in the targeted areas. The system can adapt the location of UAVs in the targeted zones in order to maximize network throughput while providing the required connection stability. The throughput maximization is achieved by a new RL model from which we extract the dynamic configuration of the UAVs. The proposed RL model allows the integration of signal power maximization and connection stability adaptation. The UAVs provide efficient wireless connectivity for the WMS. More specifically, we present the concept of a connected WMS that exploits reconfigurable drones and a wireless mesh network to cover a large area within a short period of time.
The results demonstrate that the proposed architecture is scalable and useful in remote, highly congested, and pandemic-affected areas, where connectivity is a major issue for ensuring efficient contact tracing operations. The proposed system converts the collected data into appropriate actions. In the future, we plan to consider balancing the load between UAVs and MRs. Moreover, channel allocation for MRs and UAVs will be considered in future work. This is anticipated to create promising research directions.