EANNMHO – A Novel Ensemble Based Technique for Liver Cirrhosis Detection

Liver Cirrhosis is a disease of substantial significance at both the national and the international level. A primary interest of medical science is to develop a constructive method for predicting Liver Cirrhosis at an early stage. The extremely heterogeneous nature of the disease, along with non-standardized treatment, makes its management a complex issue. Although medical modalities assess the disease, patients' responses create variation in them. Machine Learning (ML) techniques have been used in medical prognosis because they help physicians assess the disease faster. Taking this hint, and considering the difficulties physicians face in diagnosing Liver Cirrhosis, we propose a novel technique called EANNMHO. EANNMHO is a hybrid technique combining EANN (Ensemble Artificial Neural Network) and MHO (Modified Harris Hawk Optimization); missing values are first imputed using K-Nearest Neighbor (KNN). When evaluated against other ML techniques, the proposed model produces conclusive results.


Introduction
Cirrhosis of the liver represents a grave threat that is escalating across the globe and drawing greater interest [1]. Liver Cirrhosis is a steadily progressive disease which slowly replaces healthy liver tissue with scar tissue. This prevents the liver from functioning properly. The damaged tissue blocks blood flow through the liver and slows the processing of hormones, nutrients and drugs. Overconsumption of alcohol over a very long duration is one of the leading causes of liver cirrhosis. Other causes of liver disease include destruction of the bile ducts and genetic digestive disorders. Liver cirrhosis evolves in two phases: the first is asymptomatic, and as the disease progresses to the second phase the liver fails to function. Signs and symptoms include swelling of the feet, legs or ankles, loss of weight, fatigue, bleeding, and yellow discoloration of the skin and eyes. The use of Data Mining (DM) techniques is a critical step in knowledge discovery, which is referred to as a nontrivial process of identifying well-grounded, novel, fruitful and ultimately comprehensible patterns. Studies using DM techniques have inferred relationships in medical records. DM analysis can indicate frequent combinations of patients' symptoms in the case of cirrhosis, thus giving physicians an opportunity to treat patients or alert them early. The aim of this work is to help physicians in Cirrhosis diagnosis by automating the detection of its presence from medical data. The first stage of liver disease is normally asymptomatic and causes simple fatty change. If the disease is detected early, the survival rate is high, and if it is treated in time, further liver damage can be controlled. Changes in liver tissue may also be halted, which increases the chance of patients getting better or recovering from the disease and raises the likelihood of successful liver transplantation. It is therefore crucial to diagnose the disease early.
With proper diagnosis and early treatment or surgery, the liver may heal itself over time and the risk to patients' lives may be reduced.
Section 2 surveys related work on Liver Cirrhosis, Section 3 discusses the proposed method, Section 4 presents the experimental results, and Section 5 concludes.

Related Work
Computers in healthcare can significantly help disease prediction based on recorded symptoms. Cirrhosis of the liver was detected from ultrasound images in [2]. The study selected Regions of Interest (ROI) with radiologist endorsement and detected cirrhosis using Otsu's method and a modified local binary pattern. Categorization of cirrhosis is of great importance to medical diagnostics. Deep Learning (DL) techniques were used in [3] for classifying cirrhosis, where the model used correlation-based feature selection. The study in [4] developed a novel non-invasive model for diagnosing Cirrhosis using ANN. Cirrhotic liver was discriminated from normal liver in ultrasound images in [5]. The authors acquired radiologist-marked ROIs in ultrasound liver images, identified differences in neighborhood pixel intensity within cirrhotic regions, and trained a classifier on the identified regions for detection. Neural Networks (NN) support conventional image processing operations and help achieve objectives such as discovering potential disease biomarkers. The study [6] applied Neural Networks and texture analysis to CT (Computed Tomography) images: features were extracted and images with Cirrhosis were classified using PNN (Probabilistic Neural Network), LVQNN (Learning Vector Quantization Neural Network) and BPN (Back Propagation Network). Articles on cirrhosis proteome serum profiles versus healthy controls were investigated in [7]. A CAD (Computer Aided Diagnostic) system characterized cirrhotic livers using multiresolution texture descriptors in [8]. The scheme segmented 120 ROIs from 31 B-mode ultrasound liver images and derived five multi-resolution texture descriptors. Their mean and SD (Standard Deviation) were analyzed, followed by a comprehensive criterion search for selecting features, which were then classified. The model was compared with SVM. Thus, many of the methods discussed have been effective in detecting cirrhosis of the liver.

Proposed Method
Data is collected through oral interaction with patients and from the outcomes of tests such as liver function tests, blood tests and scanning reports. The proposed method is executed on a total of 1,351 records, of which seventy percent (945 records) is taken as the training data set and the remaining thirty percent (406 records) as the testing data set. A total of forty-one attributes are used in our study, such as duration of alcohol consumption, quantity of alcohol consumption, type of alcohol, diabetes, obesity, blood pressure, Hepatitis B infection, protein level, haemoglobin level and platelet count.

Missing value imputation
Liver datasets often contain many missing values. During analysis, when the ratio of missing values is small, they are discarded or nullified. When the ratio is higher, however, discarding the attribute is not a good idea. In this work missing values are imputed with KNN, which calculates distances between the training samples and the sample being imputed. Dissimilarities between samples are computed using the Euclidean distance measure.
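
The imputation step can be illustrated with a minimal sketch. The paper does not state the value of k, how the neighbours' values are combined, or whether attributes are scaled first, so the choices below (k = 2, mean of the neighbours, unscaled attributes, and the function name `knn_impute`) are assumptions made only for illustration.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of the k nearest complete rows.

    Assumption: neighbours are taken from rows without missing values, and
    distance is Euclidean over the attributes the incomplete row does have.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("at least one complete row is required")
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        def dist(other, row=row, observed=observed):
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

        neighbours = sorted(complete, key=dist)[:k]
        filled = list(row)
        for j, v in enumerate(row):
            if v is None:
                filled[j] = sum(n[j] for n in neighbours) / len(neighbours)
        imputed.append(filled)
    return imputed


# Toy records (not from the study): the last one lacks its third attribute.
records = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [5.0, 6.0, 50.0],
    [1.05, 2.05, None],
]
print(knn_impute(records, k=2)[3])
```

In practice the attributes would normally be brought to a common scale before distances are computed, since Euclidean distance is dominated by attributes with large numeric ranges.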

Ensemble ANN with MHO
In [9] an attempt was made to estimate the number of hidden neurons in NNs, and it concluded that the choice was like guessing, implying that there is no definite method for finding an adequate number of hidden neurons. A successful method should therefore reduce the variance of NN models by training multiple models and combining their results instead of using a single model's output. Multi-learning, called ensemble learning, effectively reduces prediction variance. An ANN [10], shown in Figure 1, is a collection of interconnected processors called neurons; the neuron is the fundamental information processing unit of a NN. Each neuron includes a weight and an activation function. The weight is an important parameter of a NN, while the activation function non-linearly transforms the input so that complex tasks can be performed. The neuron structure of an ANN determines its architecture [11]. The first architecture is the single-layer feed forward network, which has no hidden layers. The second is the multi-layer feed forward network, which has one or more hidden layers. The third is the RNN (Recurrent Neural Network), which has at least one feedback loop. The proposed scheme uses an MFNN (Multi-layer Feed Forward Neural Network) with one hidden layer, as it can approximate non-linear functions [12]. The link weights between neurons are computed with back propagation training [13], as errors in hidden layer neurons can be judged. Back propagation optimization in training is based on algorithms such as conjugate gradient, Newton's method and steepest descent. Amongst back propagation methods, the LM (Levenberg-Marquardt) algorithm is fast and takes less time to converge than steepest descent or conjugate gradient methods [14]. We have used the LM back propagation method. The use of ensemble techniques in environmental science and hydrology has enhanced forecasts.
Ensemble modeling is based on the idea that the performance of a single generic predictor can be improved by combining the outputs of many individual predictors [15]. Ensemble techniques can be divided into two parts: creating the individual members, and combining their outputs to produce the most appropriate output [16]. In this work we apply ensemble modeling so that the initial weights do not influence the results of the model; its architecture is shown in Fig. 1. Networks with 1 to n hidden neurons are each generated with 100 random initial sets of weights (networks 1-1 to 1-100 through n-1 to n-100) and are trained simultaneously on the training data. Fig. 2 shows the architecture of the hidden layer. After training, each ensemble model (Networks 1 to n) is tested to find the model with the best performance using the R-squared measure given in Equations (1), (2) and (3), which identifies the EANN model that best fits the test data set. Here $SS_{tot}$ is the total sum of squares, $SS_{res}$ is the sum of squared residuals, $y_i$ is the $i$-th observed value, $\bar{y}$ is the mean of $y_i$ over the observed data set, and $\hat{y}_i$ is the ensemble mean for the $i$-th data set in the network. RMSE (Root-Mean-Square Error), given in Equation (4), and IQR (Inter Quartile Range), given in Equation (5) as the distance between the twenty-fifth and seventy-fifth percentiles, are used to measure the bias between observed values and ensemble means.
Here $Q_{25}(y_i)$ and $Q_{75}(y_i)$ are the twenty-fifth and seventy-fifth percentile EANN results for the $i$-th data set.
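
As an illustration of how these measures could be computed for one ensemble, the sketch below assumes the standard textbook definitions (R-squared as one minus the ratio of the residual to the total sum of squares, RMSE as the square root of the mean squared residual, and IQR as the seventy-fifth minus the twenty-fifth percentile of the member outputs). The function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math
import statistics


def ensemble_mean(predictions):
    """predictions[m][i] is the output of ensemble member m for record i."""
    return [statistics.fmean(col) for col in zip(*predictions)]


def r_squared(observed, predicted):
    """1 - SS_res / SS_tot (assumed standard definition)."""
    mean_obs = statistics.fmean(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root of the mean squared difference between observed and predicted."""
    squared = [(y - p) ** 2 for y, p in zip(observed, predicted)]
    return math.sqrt(statistics.fmean(squared))


def iqr_per_record(predictions):
    """Spread between the 25th and 75th percentile of member outputs."""
    spreads = []
    for col in zip(*predictions):
        q25, _, q75 = statistics.quantiles(col, n=4, method="inclusive")
        spreads.append(q75 - q25)
    return spreads


# Toy example: two ensemble members, four test records.
observed = [1.0, 2.0, 3.0, 4.0]
members = [[0.0, 2.0, 3.0, 4.0], [2.0, 2.0, 3.0, 4.0]]
mean_pred = ensemble_mean(members)
print(r_squared(observed, mean_pred), rmse(observed, mean_pred))
print(iqr_per_record(members))
```

Selecting the best ensemble would then amount to repeating this for each hidden-neuron count and keeping the one with the highest R-squared on the test data.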

Harris Hawks Optimization (HHO)
Harris hawks show teamwork in tracing, encircling, approaching and attacking potential prey. Their surprise pounce is carried out effectively on escaping prey. Hawk team members attack from different directions before converging. An escaping prey is followed by another team member which, while the prey is trying to avoid the team, comes in front of it. Like Harris hawks, the HHO algorithm explores and exploits. The algorithm has two stages, namely Seeking the Prey and Hunting, which are explained mathematically. Seeking the Prey is the search or exploration stage of HHO, in which two location updates are deployed. It simulates the wait, monitor, observe and scout actions of the hawks before the prey appears, as stated in Equation (6):

$$Cp(t+1) = \begin{cases} Cp_{rand}(t) - r_1 \left| Cp_{rand}(t) - 2 r_2 \, Cp(t) \right| & q \ge 0.5 \\ \left( Pp(t) - Pv(t) \right) - r_3 \left( Lb + r_4 (Ub - Lb) \right) & q < 0.5 \end{cases} \qquad (6)$$

where $t$ is the current iteration number, $Cp(t)$ is the hawk's current position, $Cp(t+1)$ is the hawk's position vector in the next iteration, $Cp_{rand}(t)$ is a randomly selected individual hawk in the current iteration, $Pv(t)$ is the average position vector of the present population, and $Pp(t)$ is the position of the prey, taken as the position with the best fitness value in each iteration. $Lb$ and $Ub$ are the lower and upper bounds of the problem's decision variables, and $q$, $r_1$, $r_2$, $r_3$ and $r_4$ are drawn from a uniform distribution between 0 and 1. In the Transitional Phase, the chance of the prey appearing increases with time, and the hawks change from search mode to capture mode. To switch from exploration to exploitation, HHO takes into account $E$, the prey's escape energy. The energy keeps depleting before termination and its value is updated using Equations (7) and (8).
where $E_0$ is the prey's initial energy in each iteration (uniformly distributed between -1 and 1) and $T$ is the maximum number of iterations. When $|E| \ge 1$ the search is global (exploration), while $|E| < 1$ implies the exploitation phase of HHO, in which the hawks chase and eventually kill the prey. Though simple at the outset, this is a complicated and unpredictable process, as it depends on the prey's escape modes and on changes in the hawks' behavior. HHO imitates the situation with $r$ (uniformly distributed between 0 and 1), the chance of the prey escaping before a surprise pounce: $r \ge 0.5$ means the prey cannot escape, and $r < 0.5$ indicates that it will not get caught. The energy variable indicates the style of siege selected by the predator. A Soft Besiege occurs if $r \ge 0.5$ and $|E| \ge 0.5$, implying that the prey has high energy which has to be exhausted; a surprise pounce is carried out after encircling the prey. It is expressed mathematically in Equations (9) and (10), where $J = (1 - r_5)$ is the prey's jumping strength and $r_5$ is drawn from a uniform distribution between 0 and 1. A Hard Besiege occurs if $r \ge 0.5$ and $|E| < 0.5$, in which case the prey has lower escape energy or is exhausted. Mathematically it is given in Equation (11).
A Soft Besiege with Progressive Rapid Dives occurs if $r < 0.5$ and $|E| \ge 0.5$. At this stage the prey has sufficient energy to escape, and the position to which the hawk should move next is predicted. This is expressed mathematically in Equation (12).
Hawks dive when X is more competitive. HHO uses Levy flight (LF) to mimic the abrupt, rapid and irregular movements of hunting hawks, and the position update with LF is given in Equation (13).
where $S$ is a random vector of size $1 \times d$ and $d$ is the number of decision variables. LF is the Levy flight function given in Equation (14), where $\beta$ is a constant equal to 1.5, $u$ and $v$ are random values between 0 and 1, and $\Gamma$ is the gamma function. The Harris hawks' location update strategy in HHO can be defined using Equation (15).
where $f$ is the fitness function. A Hard Besiege with Progressive Rapid Dives occurs if $r < 0.5$ and $|E| < 0.5$, in which case the prey is captured and killed. HHO achieves this by reducing the average distance to the prey, as depicted in Equation (16).
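
The control flow described in this section can be sketched as follows. Only the exploration update of Equation (6) and the thresholds on $r$ and $|E|$ are taken directly from the text; the linear energy decay and the Levy flight formula are the forms commonly given for standard HHO and are assumed here, since the corresponding equations are not legible in this text. All function names are hypothetical.

```python
import math
import random


def escape_energy(t, T, rng=random):
    """Assumed form: E = 2 * E0 * (1 - t / T), with E0 uniform in (-1, 1)."""
    e0 = rng.uniform(-1.0, 1.0)  # prey's initial energy, redrawn each iteration
    return 2.0 * e0 * (1.0 - t / T)


def select_strategy(energy, r):
    """Map |E| and r to the phase named in the text."""
    if abs(energy) >= 1.0:
        return "exploration"
    if r >= 0.5:
        return "soft besiege" if abs(energy) >= 0.5 else "hard besiege"
    if abs(energy) >= 0.5:
        return "soft besiege with progressive rapid dives"
    return "hard besiege with progressive rapid dives"


def exploration_step(cp, cp_rand, prey, mean_pos, lb, ub, rng=random):
    """Equation (6) for one hawk; positions are lists of floats."""
    q, r1, r2, r3, r4 = (rng.random() for _ in range(5))
    if q >= 0.5:
        return [xr - r1 * abs(xr - 2.0 * r2 * x) for x, xr in zip(cp, cp_rand)]
    return [(p - m) - r3 * (lb + r4 * (ub - lb)) for p, m in zip(prey, mean_pos)]


def levy_flight(d, beta=1.5, rng=random):
    """Assumed standard form: LF = 0.01 * u * sigma / |v| ** (1 / beta)."""
    sigma = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    steps = []
    for _ in range(d):
        u = rng.random()
        v = 1.0 - rng.random()  # lies in (0, 1], which avoids division by zero
        steps.append(0.01 * u * sigma / v ** (1 / beta))
    return steps


rng = random.Random(0)
energy = escape_energy(t=10, T=100, rng=rng)
print(select_strategy(energy, rng.random()))
print(levy_flight(3, rng=rng))
```

Here $u$ and $v$ are drawn uniformly, following the description in the text; some HHO implementations draw them from a normal distribution instead.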

Proposed HHO
Though HHO can outperform most techniques in optimization, as in [18], in this work we propose a modified HHO (MHO) to speed up its operation. The energy of the prey is a significant variable and determines the search. MHO uses a modified updating strategy to enhance exploitation. MHO implements metaheuristic exploration (global search) and exploitation (local search); balancing these conflicting operations was a challenge. The HHO updating strategy is $E = 2E_0 \times E_1$, and in MHO six updates for $E_1$ are proposed in Equations (17) to (22).
where $E_0$ is a random number in the interval (-1, 1), sin/cos are the sine/cosine functions, $t$ is the current iteration number, $T$ is the maximum number of iterations and $e$ is the base of the exponential function, with value 2.71828. The first update is a linear decrement of energy, as in Equation (17), and is applied as in conventional HHO. The second and third updates are power-based non-linear decrements in energy: Equation (18) is a power function with a downward bend, while Equation (19) has an upward bend. The fourth update is a convex-concave sine function, given in Equation (20). The fifth is a concave-convex sine function update, as in Equation (21). The sixth update is an exponential decrement, as in Equation (22). The final value of $E_1$ under the sixth update is non-zero, on the assumption that the prey may escape even towards the end of the iterations, which enhances performance. The energy value can never be greater than one in the first, second and fourth updates. Update type four has high exploration capability, followed by types one and two. The MHO pseudo code is given in Table 1.
The proposed EANNMHO method is implemented on the collected data set of liver patients and is benchmarked against SVM [17] and ANN using the performance metrics Precision, Recall, F-measure and Accuracy. Experimental results are presented in Table 2 and Fig. 3. From Table 2 and Fig. 3 it is apparent that the proposed EANNMHO method has higher Precision, Recall, F-measure and Accuracy than SVM and ANN. Thus, it provides satisfactory results for diagnosing Liver Cirrhosis.
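
For reference, the four performance metrics named above can be computed from predicted and true labels as in the sketch below. It assumes a binary labelling in which 1 marks a cirrhosis case, uses the usual confusion-matrix definitions, and runs on made-up labels that are not results from this study; the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, Recall, F-measure and Accuracy from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / len(pairs)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Illustrative labels only: 1 = cirrhosis, 0 = no cirrhosis.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```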

Conclusion and Future Work
Liver disorders have been rising in India, mainly due to overconsumption of alcohol, harmful gases, contaminated food, pickles and drugs. Testing for the presence of cirrhosis in the liver is a critical diagnostic procedure, and non-invasive tools are needed. The impetus for this study is that current predictive abilities are found to be mostly inconclusive. Hence, this study proposes a non-invasive method for detecting liver disorders from liver data. In the proposed system, EANNMHO, missing values are imputed using KNN and optimization is performed by Modified Harris Hawk Optimization. This work has proposed, implemented and demonstrated a non-invasive hybrid ML technique for Cirrhosis detection. Moreover, it has been validated against other ML techniques with known performance measures. The results show that the proposed system EANNMHO performs better than ANN and SVM. It can be concluded that the proposed system may be used by physicians to reduce their workload and increase efficiency in diagnosing Cirrhosis of the liver. In future we plan to apply the proposed model to a larger number of records and to compare it with various other machine learning algorithms.