Mobile Application to Detect Covid-19 Pandemic by Using Classification Techniques: Proposed System

Various mobile applications, such as Mobile Health (mHealth) applications, have been developed and spread across the world and have played an important role in mitigating the coronavirus (COVID-19) pandemic. As the COVID-19 pandemic spreads, many people have drawn parallels to influenza. While both viruses cause respiratory infections, they propagate in very different ways, which has a major impact on the public health measures that can be used to fight each virus. The two viruses are nevertheless similar in the disease they cause: both cause respiratory disease that can present in several ways, ranging from asymptomatic to severe and deadly. This paper presents a proposal that uses two algorithms to identify and classify these pandemics: the Back Propagation (BP) classification algorithm and the Fuzzy C-Mean (FCM) clustering algorithm. The proposed system has two stages. In the first stage, the FCM algorithm is used to determine the type of virus; this algorithm is capable of handling ambiguous features of viruses. In the second stage, a BP neural network is used as a classifier to detect the pandemic class. The proposed system was trained and tested using a well-known dataset (COVID-19 vs. influenza). Information Gain (IG) is used to select the relevant features that affect the classification process, which improves speed and accuracy. The proposed mobile application helps users detect COVID-19 infection easily by entering medical test results as significant features into the proposed system. The proposed system's accuracy is up to 89%. The framework was created using the Matlab programming environment, with Android Studio used for designing the mobile application.
Keywords: COVID-19, Fuzzy C-Mean (FCM), Back Propagation (BP) classification, Information Gain (IG), mobile application
http://www.i-jim.org


Introduction
The number of cases of the new coronavirus (COVID-19) is increasing every day in different countries. COVID-19 is an infectious pandemic that has frightened the world and continues to put billions of people's lives in danger. The identification of the coronavirus has recently become a vital activity for medical practitioners. Unfortunately, COVID-19 has spread so quickly that it reached millions of people all over the world in just a few months. It is important to classify infected people rapidly and reliably to avoid the spread of the pandemic [1,2]. Assessing policies and strategies requires analyzing the distributions of the virus's spread and studying the relationships between features expressed as ratios. In this analysis, the distributions of COVID-19 and influenza virus spread were correlated and clustered using the fuzzy clustering technique. Despite the use of several diagnostic methods to identify infected individuals, reliable diagnosis has yet to be achieved. In this paper, the inputs are combined using the FCM clustering algorithm. According to recent studies, the main role of clustering is to split a dataset into clusters such that similar objects fall in the same cluster and dissimilar objects fall in different clusters [3]. Back Propagation (BP) artificial neural networks can address several problems that other current intrusion detection techniques cannot. There are three advantages of using a neural network to detect intrusions. First, neural networks provide flexibility in the intrusion detection process, because the network can analyze data that is either entirely or only partially correct. Second, a neural network can perform nonlinear data analysis. Third, neural networks can interpret data from a variety of sources [4][5][6]. Contact tracing was one of the first COVID-19 technologies to be industrialized and widely publicized.
It was created to alert individuals when they had crossed paths with another person infected with the coronavirus. In Singapore, the first national smartphone application for contact tracing was developed using Bluetooth technology. In response to COVID-19, mobile applications for symptom tracking have also appeared. These apps usually gather information about a user's health by presenting a list of questions about symptoms, from which a diagnosis is derived. Other sophisticated methods, such as the automated collection of the user's health data (such as temperature and pulse rate) from wearables like wristbands, have also been used. In the event of a suspected COVID-19 infection, the user is informed and directed to a nearby clinic for a checkup [7][8][9][10][11].

Related work
This section presents some of the related works on which the proposed COVID-19 detection approach builds.
1. In [12], the research aims to create a fuzzy-supported method to evaluate the safety of dental treatment with respect to a variety of patient and environmental factors. For dental care providers, a fuzzy framework may be used as a diagnostic tool to make diagnoses, identify and assess risks, and improve interventions. As a result of this research, the fuzzy method can measure and classify the risk of dental treatment based on the patient's health status and dental hygiene conditions.
2. In [13], the work introduced an artificial intelligence paradigm that seeks to solve new cases by using a vast database of COVID-19 cases. The researchers used an improved case-based reasoning (CBR) model for the classification of suspected COVID-19 cases. The results showed that the suggested method correctly categorized suspicious cases with 94.54 percent accuracy.
3. In [14], new prediction models are presented that estimate the number of confirmed COVID-19 cases over the next ten days, based on previously reported COVID-19 cases identified in China. The suggested model combines an enhanced flower pollination algorithm (FPA) and an advanced adaptive neuro-fuzzy inference system (ANFIS) based on the salp swarm algorithm (SSA).
4. In [15], the growth range of confirmed COVID-19 cases was estimated using an SVM with fuzzy granulation. The study shows that the Elman neural network and the SVM can forecast the progression of combined reported cases, deaths, and recovered cases, while the LSTM is best suited for cumulative confirmed case prediction. The SVM with fuzzy granulation can effectively forecast the growth range of newly reported cases and newly recovered cases, although the average predicted values are marginally high.
5. In [16], a Dual Diagnostic Strategy (HDS) is described in detail. The primary objective of HDS is to classify COVID-19 cases quickly and reliably. Early diagnosis of COVID-19 cases allows for prompt care and isolation of patients, which slows the transmission of the pandemic. HDS is given a training set consisting of laboratory findings from COVID-19 and non-COVID-19 individuals. Once the model has learned to recognize the patterns, new cases (including their laboratory findings) can be presented to it, and HDS determines whether or not the input case has a COVID-19 infection.
6. In [17], a review was published that used a systematic search strategy to identify the free mobile applications linked to the COVID-19 pandemic in the App Store and Google Play store. Various applications have been created for functions such as contact tracing, knowledge building, appointment booking, and online consultation. Only a few applications have incorporated functions and features like self-assessment, consultation, assistance, and information access, which points to the need for designing and developing an integrated mobile health application that incorporates the majority of these features and functionalities.
In this work, a symptom-monitoring mobile application is developed as a web-based COVID-19 screening tool. The proposed system starts with the FCM algorithm, which is used to cluster diseases. Because of its intrinsic processing speed and its ease of representing nonlinear relationships between input and output, the BP algorithm is used to determine the class of pandemic. A neural network is capable of analyzing data from a network and can distinguish and differentiate network behavior based on limited, imperfect data.

Data mining techniques
Various data mining techniques are available, such as clustering, classification, and Information Gain, which is used for feature selection. The two data mining techniques used in the proposed study are described in this section.

Fuzzy C Mean (FCM) algorithm
The fuzzy clustering algorithm splits data into separate clusters based on degree-of-membership values, so that the data within a cluster are more similar to one another than to the data in other clusters. The fuzzy c-means algorithm performs soft clustering: a data point may belong to more than one cluster, each with some degree of membership. The data is organized into clusters using a fuzzy membership function [18,19]. The algorithm is based on minimizing the objective function J given in equation (1), where m is any real number (1≤m<∞), k is the number of clusters, n is the number of data samples, u_ij is the degree of membership indicating the likelihood that data sample x_i belongs to the j-th cluster, and c_j is the center of that cluster.
Fuzzy clustering is achieved by repeatedly updating the cluster centers c_j and the fuzzy memberships u_ij using equations (2) and (3).
Here u_ij denotes the degree of membership of data sample x_i in cluster j and satisfies the criteria in equations (4) and (5) [20,21].
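For reference, the standard textbook form of the FCM objective, update rules, and membership constraints can be written with the symbols defined above as follows. The numbering follows the equation references in the text; the exact form used by the authors is an assumption, the index l is introduced here only for the inner sum, and equation (3) requires m > 1.

```latex
\begin{align}
J_m &= \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2},
      \qquad 1 \le m < \infty \tag{1} \\
c_j &= \frac{\sum_{i=1}^{n} u_{ij}^{m} \, x_i}{\sum_{i=1}^{n} u_{ij}^{m}} \tag{2} \\
u_{ij} &= \frac{1}{\sum_{l=1}^{k}
      \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_l \rVert} \right)^{2/(m-1)}} \tag{3} \\
0 &\le u_{ij} \le 1 \tag{4} \\
\sum_{j=1}^{k} u_{ij} &= 1 \quad \text{for every sample } i \tag{5}
\end{align}
```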

Back propagation NN
Backpropagation is a supervised classification method in which the network is trained on pairs of input samples and target outputs. It uses a multilayer feed-forward neural network made up of layers of neurons, with one output layer and one or more hidden layers. Backpropagation learning has two stages: the forward step, in which the input is applied and propagated to the output layer, and the backward step, in which the error is propagated back toward the input layer. In these two stages the error is measured and the weights are modified to reduce it, allowing the ANN to learn the data [22,23]. In the forward step, the input of each input-layer neuron is multiplied by the weights between the input and hidden layers. Using equation (6), each hidden neuron j obtains its net input z_j, the weighted sum of the inputs x_i with the weights w_ij.
The hidden layer's net input is passed through the binary sigmoid activation function of equation (7), and equation (8) gives the hidden layer output. The mean square error E measures how well the network has learned; if the MSE is less than 0.1, the network has learned its training set. E is calculated with equation (9), where p is the number of samples, Y_pk is the actual output, and T_pk is the target output. In the backward step, if the network output differs from the target output, the output error is measured and then propagated back toward the input layer, changing the weights of the layers' neurons. Equation (10) calculates the error between the output and hidden layers, and equation (11) calculates the error between the hidden and input layers. To reduce the error, the weights are adjusted using equations (12) and (13), where Δw_jk is the weight change between the hidden and output layers, Δw_ij is the weight change between the input and hidden layers, η is the learning rate, and α is the momentum coefficient (MC) [24,25].
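The equations referenced in this section can be sketched in their standard textbook form as follows. This is an assumed reconstruction, not necessarily the authors' exact notation: h_j (hidden output), δ_k and δ_j (error terms), P (number of samples, called p in the text), and the iteration index t are symbols introduced here for illustration, and the normalization constant in equation (9) may differ.

```latex
\begin{align}
z_j &= \sum_{i} x_i \, w_{ij} \tag{6} \\
f(z) &= \frac{1}{1 + e^{-z}} \tag{7} \\
h_j &= f(z_j), \qquad Y_k = f\Big(\sum_{j} h_j \, w_{jk}\Big) \tag{8} \\
E &= \frac{1}{P} \sum_{p=1}^{P} \sum_{k} \left( T_{pk} - Y_{pk} \right)^{2} \tag{9} \\
\delta_k &= (T_k - Y_k) \, Y_k \, (1 - Y_k) \tag{10} \\
\delta_j &= h_j \, (1 - h_j) \sum_{k} \delta_k \, w_{jk} \tag{11} \\
\Delta w_{jk}(t+1) &= \eta \, \delta_k \, h_j + \alpha \, \Delta w_{jk}(t) \tag{12} \\
\Delta w_{ij}(t+1) &= \eta \, \delta_j \, x_i + \alpha \, \Delta w_{ij}(t) \tag{13}
\end{align}
```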

Information gain (IG)
IG assesses features by measuring how much information they provide about the class. Let C be a set of c data samples divided into m distinct classes, and let c_i be the number of samples of class i in the training dataset C. The expected information needed to classify a sample is computed from these class proportions [26][27][28].
Let feature F take the values f_j, and let C_j be the subset of C whose samples have the value f_j for feature F, with c_ij the number of samples of class i in C_j. The entropy of F is the expected information of the subsets C_j, weighted by their sizes.
The information gain of F is then estimated as the expected information of C minus the entropy of F.
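In their standard form, the three quantities described in this section (the expected information, the entropy of feature F, and the gain) can be written as below. This is a sketch under assumptions: c_i is taken to be the number of samples of class i, c_ij the number of samples of class i in subset C_j, and v (the number of distinct values of F) is a symbol introduced here. The equations are left unnumbered because the original numbering is not recoverable from the text.

```latex
\begin{align*}
I(c_1, \dots, c_m) &= - \sum_{i=1}^{m} \frac{c_i}{c} \, \log_2 \frac{c_i}{c} \\
E(F) &= \sum_{j=1}^{v} \frac{c_{1j} + \dots + c_{mj}}{c} \; I(c_{1j}, \dots, c_{mj}) \\
\mathrm{Gain}(F) &= I(c_1, \dots, c_m) - E(F)
\end{align*}
```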

COVID-19 pandemic detection
The coronavirus disease (COVID-19) is a highly infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which first appeared in Wuhan, China, and then spread across the world [29]. Figure 1 presents the structure and symptoms of this disease.
According to the World Health Organization, COVID-19 diagnostic testing is important for monitoring the infection, understanding its epidemiology, informing case management, and stopping transmission. To identify the COVID-19 virus, diagnostic testing is carried out using a variety of in-house and commercial assays. Samples may be taken from the nose or the back of the throat. A sample from the lower respiratory tract can provide the best results for hospitalized patients. Antigen testing reveals whether someone is currently infected with COVID-19 and can therefore spread it to others. Antibody tests, on the other hand, use blood samples to determine the immunity induced by a previous infection. Antibody test kits use proteins from the virus as "glue" to trap antibodies in the blood [30][31][32].

Mobile application COVID-19 detection: proposed system
The following sections present the details of the proposed system; the primary approaches are a general description and pre-processing of the dataset, feature selection, clustering using fuzzy c-means, and classification using backpropagation neural networks [33,34]. Before the detection proposal is described, the structure used for collecting the symptoms is described, as shown in Figure 2. In our model, there are eight sensor nodes on the human body. These sensor nodes all start with the same amount of resources, computation, and storage. The sink node is located in the middle of the network to offset the network's energy usage [35].
1. First step: the dataset is selected. After dataset selection, a pre-processing operation is conducted on the dataset to transform it into an appropriate form for further processing.
2. Second step: the Information Gain (IG) algorithm is used to separate the relevant features from the irrelevant ones. The COVID-19 dataset consists of 52 features, which increases the computational cost and the time of training and testing; this step reduces the computational cost and increases the speed of the proposed system, making it lightweight.
3. Third step: the Fuzzy C-Mean (FCM) algorithm is used to create two clusters based on the type of symptom (COVID-19 or influenza).
4. Fourth step: in this final step, the backpropagation algorithm is used to build a classifier that classifies the class of pandemic.
The block diagram in Figure 3 presents the details of the proposed system through these four steps.

General description of the dataset
The dataset has been standardized to make it suitable for use by the proposed algorithms. The dataset's features are divided into numeric and symbolic features, which fall into the following groups: Disease group, WBC, Neutrophil, Lymphocyte, Monocyte, C-reactive protein, Severity, Number of UMI, Number of Gene, Percentage of mitochondrial gene, and the class Disease condition. The phases of the standardization process are as follows: A. Convert each symbolic feature's value to a sequential integer value, since the fuzzy c-mean algorithm and the backpropagation algorithm operate on numerical values. The symbolic features and their integer values are shown in Table 1.
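The conversion of symbolic values to sequential integers can be illustrated with a minimal Python sketch. The function name `encode_symbolic` and the example values are hypothetical and are not taken from Table 1; the sketch assumes integers are assigned in order of first appearance, starting at 1.

```python
def encode_symbolic(values):
    """Map each distinct symbolic value to a sequential integer (1, 2, ...),
    assigned in order of first appearance. Returns (codes, mapping)."""
    mapping = {}
    codes = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1  # next unused integer
        codes.append(mapping[value])
    return codes, mapping


# Hypothetical symbolic column (illustrative values only).
severity = ["mild", "severe", "mild", "healthy"]
codes, mapping = encode_symbolic(severity)
print(codes)    # [1, 2, 1, 3]
print(mapping)  # {'mild': 1, 'severe': 2, 'healthy': 3}
```

Keeping the returned mapping makes it possible to apply the same encoding to the test samples later.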

Feature selection
The feature selection algorithm used in this work is Information Gain, which is a filter selection method. In the IG algorithm, features are weighted by their information content: the information of the target class is determined first, and the entropy of each feature is then calculated. The information gain of each feature is obtained by subtracting the feature's entropy from the target class's entropy. Figure 6 shows the feature selection criteria. See the algorithm for a detailed explanation of the IG equations (18) and (19).
The information gain for each feature is then calculated using equation (20).
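The ranking procedure described in this section can be sketched in Python as follows. This is a minimal illustration, assuming discrete (or already discretized) feature values; continuous measurements such as WBC would need binning first. The function names, the toy data, and the `top_k` cut-off are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Expected information (in bits) needed to classify a sample."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Class entropy minus the size-weighted entropy of the feature's subsets."""
    n = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    feature_entropy = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - feature_entropy


def select_features(columns, labels, top_k):
    """Rank features by information gain and keep the top_k best."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    ranked = sorted(gains, key=gains.get, reverse=True)
    return ranked[:top_k], gains


# Toy data (hypothetical): one feature separates the classes, one does not.
labels = ["covid-19", "covid-19", "influenza", "influenza"]
columns = {
    "f_informative": [1, 1, 0, 0],
    "f_noise": [0, 1, 0, 1],
}
selected, gains = select_features(columns, labels, top_k=1)
print(selected, gains)
```

In the toy data the informative feature receives a gain of one bit and the noise feature a gain of zero, so only the former is kept.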

Training and testing
The proposed FCMNN system consists of two phases:

A. Training phase
The proposed system's training process begins with a standardized training dataset as input data. The first stage then uses the FCM algorithm to separate COVID-19 from the influenza virus. FCM is a clustering algorithm that uses membership to decide the degree to which each data point belongs to a cluster. Its central concept is to find the cluster center of each group by dividing n vectors x_i (i = 1, 2, ..., n) into c fuzzy groups. Fuzzy c-means employs fuzzy partitioning; that is, a membership value ranging from zero to one decides the degree to which each data point belongs to a group. The algorithm begins with initialized cluster centers. The proposed system's second stage is based on the BP algorithm. The BP algorithm starts with random weights, input data specified by the COVID-19 dataset attributes, the number of hidden units, and two outputs, one for each class of the COVID-19 dataset (COVID-19, influenza). Figure 7 shows the steps of the proposed system.
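The first training stage can be sketched in plain Python as follows. This is a minimal illustration of standard fuzzy c-means, not the authors' Matlab implementation: the function names, the toy points, the `init_centers` parameter, and the choice of drawing random initial centers from the data points are all assumptions. The defaults m = 2 and 9 iterations follow the settings reported in the experimental section.

```python
import math
import random


def memberships(points, centers, m=2.0):
    """u[i][j]: degree to which points[i] belongs to cluster j (rows sum to 1)."""
    exponent = 2.0 / (m - 1.0)  # requires m > 1
    u = []
    for x in points:
        # Clamp distances so a point lying exactly on a center cannot divide by zero.
        d = [max(math.dist(x, c), 1e-12) for c in centers]
        u.append([1.0 / sum((dj / dl) ** exponent for dl in d) for dj in d])
    return u


def update_centers(points, u, m=2.0):
    """Each center is the mean of all points weighted by u_ij ** m."""
    k, dim = len(u[0]), len(points[0])
    centers = []
    for j in range(k):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append([sum(wi * x[t] for wi, x in zip(w, points)) / total
                        for t in range(dim)])
    return centers


def fcm(points, k=2, m=2.0, iterations=9, init_centers=None, seed=0):
    """Run a fixed number of FCM iterations; returns (centers, memberships)."""
    if init_centers is None:
        # Assumption: random initialization by picking k distinct data points.
        init_centers = random.Random(seed).sample(points, k)
    centers = [list(c) for c in init_centers]
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)


# Toy two-dimensional samples (hypothetical, two well-separated groups).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, u = fcm(points)
hard_labels = [max(range(len(row)), key=row.__getitem__) for row in u]
print(centers, hard_labels)
```

The hard label of a sample is simply the cluster with the largest membership value; the full membership row is kept so that ambiguous samples remain visible.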

B. Test phase
The proposed system's testing process receives the testing samples from the COVID-19 dataset, the optimal cluster centers resulting from the first stage of training, and the optimal weight vector resulting from the second stage of training. In the first stage, the membership matrix for each patient sample is calculated. After that, each sample is assigned to a cluster class, which separates the COVID-19 class from the influenza class. The second stage receives the weight vector from the training phase and the pandemic cluster from the clustering stage, and the forward phase of the BP algorithm then detects the pandemic type.
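The two test-phase stages can be sketched for a single sample as follows. This is a minimal Python illustration under assumptions: the trained cluster centers and weights are passed in as plain lists, the first entry of each weight row is the weight on a constant bias input of 1, and the cluster index is simply reported alongside the network's label, since the text does not specify how the cluster result is fed into the BP stage. All names and numeric values are hypothetical.

```python
import math


def membership(sample, centers, m=2.0):
    """Membership of one sample in each cluster, given trained centers."""
    d = [max(math.dist(sample, c), 1e-12) for c in centers]
    return [1.0 / sum((dj / dl) ** (2.0 / (m - 1.0)) for dl in d) for dj in d]


def sigmoid(z):
    """Numerically stable binary sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layer(weights, inputs):
    """Each row holds [bias weight, w_1, ..., w_n] for one unit."""
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs)))
            for w in weights]


def classify(sample, centers, w_hidden, w_output,
             class_names=("covid-19", "influenza")):
    """Stage 1: assign the sample to a cluster. Stage 2: BP forward pass."""
    u = membership(sample, centers)
    cluster = max(range(len(u)), key=u.__getitem__)
    outputs = layer(w_output, layer(w_hidden, sample))
    label = class_names[max(range(len(outputs)), key=outputs.__getitem__)]
    return cluster, u, label


# Hypothetical trained parameters for a two-feature toy problem.
centers = [(0.0, 0.0), (1.0, 1.0)]
w_hidden = [[0.0, 1.0, 1.0]]            # one hidden unit
w_output = [[-1.0, 2.0], [1.0, -2.0]]   # two output units
cluster, u, label = classify((0.9, 1.0), centers, w_hidden, w_output)
print(cluster, u, label)
```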

Discussion and experimental results
The proposed system's first stage yields four potential outcomes, which are summarized in the confusion matrix in Table 2. True negative (TN) indicates that influenza was correctly predicted, true positive (TP) indicates that COVID-19 was correctly predicted, false positive (FP) indicates that influenza was incorrectly predicted as COVID-19, and false negative (FN) indicates that COVID-19 was incorrectly predicted as influenza. To assess the proposed system's performance, two potential outcomes can be obtained, which are referred to as the pandemic confusion matrix (see Table 3).
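As an illustration of how such a confusion matrix leads to performance figures, the following Python sketch counts the four outcomes and derives accuracy and detection rate using their standard definitions. COVID-19 is assumed to be the positive class, and the labels below are made-up toy values, not results from the paper.

```python
def confusion_counts(actual, predicted, positive="covid-19"):
    """Count TP, FP, TN, FN, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def detection_rate(tp, fn):
    """Share of actual positive samples that were detected (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels (hypothetical).
actual = ["covid-19"] * 3 + ["influenza"] * 5
predicted = ["covid-19", "covid-19", "influenza",
             "influenza", "influenza", "influenza", "covid-19", "influenza"]
tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn, accuracy(tp, fp, tn, fn), detection_rate(tp, fn))
```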
To evaluate the proposed system, Table 4 reports the accuracy of the FCM and BP methods on the COVID-19 dataset of 37557 patient samples with 10 features. The proposed system has two stages. In the first stage, the FCM algorithm was used to separate COVID-19 from influenza. FCM's performance is influenced by several factors. The following parameters were set experimentally:
A. The value of the fuzziness parameter m has an impact on FCM's efficiency. Four different values of m were tested during the training phase: 1.5, 1.9, 2, and 2.5. The best results are obtained with m = 2.
B. The cluster centers' initial values are set to random values.
C. After much trial and error, the number of iterations used as the FCM stopping criterion was set to 9.
In the second stage, BP was used to detect the type of pandemic. BP's performance is influenced by several factors. The experiments used the following conditions:
A. The bias value, which affects BP's learning process, is set to 1.
B. BP was tested with four hidden-layer sizes: 5, 10, 20, and 40 units. The best result is obtained with 20 hidden units.
C. The number of input units equals 10, which is the number of features in the COVID-19 dataset used.
D. The output layer has two units, corresponding to the two classes of pandemic in the COVID-19 dataset.
E. The learning rate and the momentum coefficient affect how BP's output converges toward the target output. Various values were tested during the training process. The best results were obtained when the learning rate was equal to one and the momentum coefficient was equal to one.
F. Weight initialization: in this work, the weights are set to random values at run time.
G. The mean square error was used as the BP stopping criterion. The mean square error in this work was 0.001.
H. The maximum number of iterations in this work is 40.
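The second-stage training loop with these settings can be sketched in plain Python as follows. This is a minimal illustration, not the authors' Matlab code: it uses the structural settings listed above (10 inputs, 20 hidden units, 2 outputs, a bias input of 1, random initial weights, an MSE goal of 0.001, at most 40 iterations), while the learning rate and momentum defaults, the weight range, and the synthetic samples are illustrative placeholders rather than the reported values.

```python
import math
import random


def sigmoid(z):
    """Numerically stable binary sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(n_units, n_inputs, rng):
    # One row per unit; index 0 is the weight on the constant bias input of 1.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_units)]


def layer(weights, inputs):
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs)))
            for w in weights]


def apply_updates(weights, prev, deltas, inputs, lr, momentum):
    """Weight change = lr * delta * input + momentum * previous change."""
    for unit, delta in enumerate(deltas):
        for i, value in enumerate(inputs):
            prev[unit][i] = lr * delta * value + momentum * prev[unit][i]
            weights[unit][i] += prev[unit][i]


def train_bp(samples, targets, n_hidden=20, lr=0.5, momentum=0.5,
             mse_goal=0.001, max_epochs=40, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(samples[0]), len(targets[0])
    w_ih = init_layer(n_hidden, n_in, rng)
    w_ho = init_layer(n_out, n_hidden, rng)
    d_ih = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    d_ho = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
    mse = float("inf")
    for _ in range(max_epochs):
        squared_error = 0.0
        for x, t in zip(samples, targets):
            h = layer(w_ih, x)   # forward step
            y = layer(w_ho, h)
            squared_error += sum((tk - yk) ** 2 for tk, yk in zip(t, y))
            # Backward step: output-layer errors, then hidden-layer errors.
            delta_o = [(tk - yk) * yk * (1 - yk) for tk, yk in zip(t, y)]
            delta_h = [hj * (1 - hj) *
                       sum(delta_o[k] * w_ho[k][j + 1] for k in range(n_out))
                       for j, hj in enumerate(h)]
            apply_updates(w_ho, d_ho, delta_o, [1.0] + h, lr, momentum)
            apply_updates(w_ih, d_ih, delta_h, [1.0] + list(x), lr, momentum)
        mse = squared_error / len(samples)
        if mse < mse_goal:       # stopping criterion
            break
    return w_ih, w_ho, mse


# Synthetic 10-feature samples (hypothetical); targets are one unit per class.
rng = random.Random(1)
samples, targets = [], []
for _ in range(20):
    samples.append([rng.gauss(0.8, 0.05) for _ in range(10)])
    targets.append([1, 0])   # COVID-19
    samples.append([rng.gauss(0.2, 0.05) for _ in range(10)])
    targets.append([0, 1])   # influenza
w_ih, w_ho, mse = train_bp(samples, targets)
print(mse)
```

Training stops at whichever comes first, the MSE goal or the iteration limit, so the returned error shows which criterion ended the run.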

Conclusions
This research has identified a mobile application that may be useful in mitigating the COVID-19 pandemic. In this work, a model is proposed to detect the type of disease (COVID-19 or influenza) with high accuracy and detection rate. The model was also able to detect the class of the pandemic. The experiments showed that samples incorrectly classified as COVID-19 in the first stage may be recognized as unknown classes in the second stage of the model. In addition, this work can be extended to detect pandemic subtypes, for example by dividing the COVID-19 class into COVID-19 (mild) and COVID-19 (severe) subclasses and the influenza class into flu and asymptomatic subclasses. Furthermore, smartwatches and smart bands are used for symptom monitoring because they have become more popular and more integrated into people's everyday lives, and they can potentially assist in the critical monitoring of vulnerable populations' health statuses. In the future, the automated and rapid detection of suspected infections could become more effective using a convolutional neural network (CNN). The diagnosis accuracy can also be increased by improving the symptom management algorithm and adapting it to the pandemic. Mobile doctors would be the way to go in situations like this, where self-isolation is needed; however, as technology advances, "digital humans" could be a viable option for reducing the burden on healthcare workers in future pandemics.