A Biologically Inspired ELM-based Framework for Classification of Brain MRIs

The use of medical images for clinical analysis of critical diseases has become increasingly predominant in modern health care systems. Machine learning techniques are a potential solution in this context because they provide faster output with high diagnostic accuracy. In this work, we propose an Extreme Learning Machine (ELM) based classifier, SFLA-ELM, for detecting normal and pathological brain conditions from brain Magnetic Resonance Images (MRIs). ELM is known for its speed and accuracy; the proposed method uses a swarm-based evolutionary technique, the Shuffled Frog Leaping Algorithm (SFLA), together with 10-fold cross validation to optimally determine the network parameters of the ELM for better classification performance. The proposed model is evaluated on three brain MRI datasets covering three different brain diseases. To obtain better approximation accuracy and generalization ability for the base ELM classifier, a suitable activation function and an appropriate number of hidden layer nodes are chosen. The proposed framework is validated under two network conditions, i.e. fixed network structure and varying network structure, by comparing its performance with two standard hybridized ELM classifiers, namely PSO-ELM and ABC-ELM. The comparative analysis suggests that the proposed SFLA-ELM gives better classification performance in diagnosing the diseases in terms of accuracy, sensitivity, specificity, F-score and area under the ROC curve (AUC). Furthermore, SFLA-ELM is also found to offer better generalization ability and better stability with a more compact network structure.

Keywords: Classification, Brain Image, Extreme Learning Machine, ELM, Shuffled Frog Leaping Algorithm, PSO-ELM, ABC-ELM, SFLA-ELM.


Introduction
Detecting normal and diseased brain conditions from Magnetic Resonance Imaging (MRI) with high precision and accuracy has long been a challenge for health care professionals. High quality brain MRIs carry a large amount of detailed information about brain tissue anatomy and the condition of the patient's brain. Despite the knowledge and experience of clinical experts, the limits of the human visual system restrict manual interpretation and analysis of MR images, because the large volume of information contained in an MR image is hard to interpret by eye [1][2]. For this reason, automated image analysis methods based on machine learning and image processing techniques have come into wide use in MR image processing in recent years. These computer-assisted diagnosis (CAD) techniques not only reduce the burden on radiologists and neurologists but also improve the accuracy and objectivity of diagnosis [3][4][5][6][7].
Extreme Learning Machine (ELM) [8][9][10][11][12][13], proposed by Huang et al. [13], is an efficient and effective learning algorithm for Single hidden Layer Feedforward Networks (SLFNs) that is used extensively for classification tasks. In ELM, the input weights and the biases of the hidden layer neurons are selected randomly, and the output weights are then calculated analytically using a simple generalized inverse operation on the hidden layer output matrix. Compared with other traditional neural networks used for classification, ELM exhibits higher generalization ability, much faster training speed and the smallest norm of weights [8]. Compared with traditional neural networks trained using gradient descent methods, ELM avoids several issues such as local minima, the stopping criterion, the choice of a proper learning rate and the number of iterations. The literature reports that ELM shows better generalization and faster learning ability than contemporary classification techniques like Support Vector Machines (SVM) and deep learning [14].
However, ELM faces a challenge with respect to the selection of its input parameters, i.e. the input weights and hidden biases, which are not optimal [15][16]. While calculating the output weights, ELM tries to minimize the training error and to find the minimum norm of the output weights. Because the input weights and hidden biases are selected randomly, the hidden layer output matrix may sometimes not have full rank, which leads to ill-conditioning of the network and may result in non-optimal solutions [17]. Hence, random selection of these parameters can ultimately affect the performance of ELM. Optimal selection of these parameters is therefore crucial for the best performance of ELM, and this poses a complex optimization problem [18]. Research on enhancing the performance of ELM has established that evolutionary algorithms such as DE [19][20], PSO [21][22] and ABC [23] can be used to optimally determine the input weights and hidden biases of the SLFN by exploring promising areas of the solution space. Such optimization techniques can also increase the generalization ability and stability of ELM while yielding a more compact network.
The Shuffled Frog Leaping Algorithm (SFLA), developed by Eusuff et al. [24], is a nature-inspired, population-based meta-heuristic. SFLA belongs to the class of memetic algorithms, as it mimics the memetic evolution of a frog population searching for the location of good food sources available on discrete stones in a swamp. SFLA has been applied effectively to a variety of optimization problems in engineering [25][26][27][28][29][30][31]. This work uses SFLA to train ELM by optimally determining the network parameters.
Selecting an appropriate activation function is of great significance in ELM, as it can affect aspects of classification performance such as learning accuracy and convergence speed. The literature on ELM reveals that ELM may produce different approximation accuracy and generalization ability for the same problem or training data when different activation functions are used [32][33][34]. The best activation function depends on the dataset at hand, but no provable relationship between the two is known. In essence, a suitable activation function can be chosen either by trial or by tuning [34]. The network structure of the ELM, defined in terms of the number of hidden layer nodes, also largely influences the classification performance of the ELM classifier [32]. In this work, the classification performance of ELM has been evaluated using ten commonly used activation functions at varying numbers of hidden layer nodes. Based on the accuracy obtained, a suitable activation function and number of hidden layer nodes are chosen for the ELM classifier.

Aim of the work
The aim of this work is to design an efficient classifier model that distinguishes normal from pathological brain MRI samples with higher accuracy and ranks both the positive and negative instances with a better success rate. To this end, a hybrid ELM-based classifier called SFLA-ELM is proposed for classifying brain MRI data as normal or pathological. SFLA-ELM applies SFLA to enhance the performance of the basic ELM classifier by optimally determining the input weights and hidden biases of the SLFN. The proposed model is evaluated on three brain MRI datasets covering three different brain diseases. To obtain better approximation accuracy and generalization ability for the base ELM classifier, a suitable activation function and an appropriate number of hidden layer nodes are chosen by evaluating the classification accuracy of ELM with ten standard activation functions. The classification performance of SFLA-ELM is validated by comparing it with two standard hybridized ELM classifiers, namely PSO-ELM and ABC-ELM. The validation is done under two network conditions, i.e. fixed network structure and varying network structure. In all experiments, 10-fold cross validation is used and averaged estimates are reported to reduce the variance in performance estimation.

Organization of the paper
The rest of the paper is arranged as follows. Section 2 describes the methodologies adopted for experimentation; the experimental design is discussed in Section 3; the experimentation and result analysis are given in Section 4. Section 5 discusses the findings of this study, and finally Section 6 concludes the work and highlights future scope.

2 Methodologies Adopted for the Experiment

Extreme Learning Machine (ELM)
ELM is a very fast, simple and efficient learning algorithm for training SLFNs. Unlike traditional learning methods for SLFNs, ELM calculates the output weight matrix of the network mathematically, in a non-iterative way, from the randomly chosen input weights and hidden biases using the Moore-Penrose generalized inverse [8]. Training is carried out in batch mode: all the training data are presented to the SLFN before the weights are computed in a single iteration. ELM exhibits several striking features that make it superior to other learning methods for SLFNs, such as extremely fast learning speed with better learning performance, better generalization ability, freedom from getting stuck in local minima, no network tuning, no control parameter setup, a simple and fixed network structure and ease of implementation [14]. While gradient-based learning algorithms strive to reach the minimum training error without considering the magnitude of the weights, ELM tries to reach the smallest training error together with the smallest norm of weights. The learning steps of the ELM algorithm can be summarized in the following three steps [8]. Given a training set $\aleph = \{(x_i, t_i) \mid x_i \in R^n, t_i \in R^m, i = 1, 2, \ldots, N\}$, hidden node number $\tilde{N}$ and activation function $g(x)$:
Step 1: Assign the input weight $w_i$ and bias $b_i$ ($i = 1, 2, \ldots, \tilde{N}$) randomly.
Step 2: Calculate the hidden layer output matrix $H$ using equation (1).
$H = [\, g(w_j \cdot x_i + b_j) \,]_{N \times \tilde{N}}, \quad i = 1, \ldots, N, \; j = 1, \ldots, \tilde{N}$ (1)
Step 3: Calculate the output weight $\beta = H^{\dagger} T$, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$ and $T = [t_1, t_2, \ldots, t_N]^T$.
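The ELM learning steps described in this section can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the implementation used in this work: the function names `elm_train` and `elm_predict`, the sigmoid activation, the uniform range [-1, 1] for the random parameters and the toy data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, rng, activation=sigmoid):
    """Train an SLFN the ELM way. X has shape (N, n), T has shape (N, m)."""
    n_features = X.shape[1]
    # Random input weights and hidden biases (the range [-1, 1] is an assumption).
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Hidden layer output matrix H, shape (N, n_hidden).
    H = activation(X @ W + b)
    # Output weights from the Moore-Penrose generalized inverse of H.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta, activation=sigmoid):
    return activation(X @ W + b) @ beta

# Toy data, made up for the example only.
rng = np.random.default_rng(0)
X = rng.random((40, 5))
T = (X[:, :1] > 0.5).astype(float)  # binary target, shape (40, 1)
W, b, beta = elm_train(X, T, n_hidden=10, rng=rng)
train_mse = float(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
```

Because the output weights are a least-squares solution, the training error can never exceed the error of the all-zero output, whatever random weights are drawn; how much lower it gets depends on those random draws, which is the motivation for optimizing them.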

Shuffled Frog Leaping Algorithm (SFLA)
SFLA [24] is a population-based meta-heuristic for solving hard combinatorial optimization problems. SFLA belongs to the class of memetic algorithms, as it mimics the memetic evolution of a frog population searching for the location of food sources available on discrete stones in a swamp. The algorithm comprises three major steps, namely partitioning, local search and shuffling, apart from initialization of the population. While the local searches enable the evolution of the intellect possessed by frogs (memes) within each partition independently, shuffling allows a global interchange of intellect among frogs in the entire population. The process of partitioning, local exploration and global shuffling continues until the optimal solution is achieved or the stopping criteria are met. A detailed explanation of SFLA can be found in [24][25].
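The partition, local search and shuffle cycle can be illustrated with a small, generic SFLA minimizer in plain Python. The sketch below follows the commonly stated form of the SFLA update, in which the worst frog of a memeplex leaps toward a better frog by a random fraction of their difference, limited by a maximum step size. The function name `sfla_minimize`, all parameter names and default values, and the sphere test function are assumptions chosen for illustration; they are not the settings used in this work.

```python
import random

def sfla_minimize(f, dim, n_frogs=20, n_memeplexes=4, n_iters=50,
                  local_steps=5, lo=-1.0, hi=1.0, d_max=0.5, seed=0):
    rnd = random.Random(seed)

    def new_frog():
        return [rnd.uniform(lo, hi) for _ in range(dim)]

    def leap(worst, target):
        # Step = rand * (target - worst), limited to +/- d_max, kept inside [lo, hi].
        r = rnd.random()
        moved = []
        for w, t in zip(worst, target):
            step = max(-d_max, min(d_max, r * (t - w)))
            moved.append(max(lo, min(hi, w + step)))
        return moved

    frogs = [new_frog() for _ in range(n_frogs)]
    for _ in range(n_iters):
        frogs.sort(key=f)          # fittest (lowest f) first
        x_g = frogs[0]             # global best frog
        # Partitioning: deal the sorted frogs round-robin into memeplexes.
        memeplexes = [frogs[i::n_memeplexes] for i in range(n_memeplexes)]
        for mem in memeplexes:     # local search inside each memeplex
            for _ in range(local_steps):
                mem.sort(key=f)
                x_b, x_w = mem[0], mem[-1]
                cand = leap(x_w, x_b)          # try the memeplex best first
                if f(cand) >= f(x_w):
                    cand = leap(x_w, x_g)      # then the global best
                if f(cand) >= f(x_w):
                    cand = new_frog()          # otherwise a random restart
                mem[-1] = cand
        # Shuffling: merge all memeplexes back into one population.
        frogs = [frog for mem in memeplexes for frog in mem]
    return min(frogs, key=f)

def sphere(x):
    return sum(v * v for v in x)

best = sfla_minimize(sphere, dim=3)
```

Only the worst frog of a memeplex is ever replaced, so the best solution found so far is never lost between iterations.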

Working Principle of Proposed SFLA-ELM Classifier
In this section, a novel learning framework for SLFNs, called SFLA-ELM, is proposed. The method uses SFLA and the k-fold cross validation scheme to find the optimal input weights and hidden biases; the Moore-Penrose generalized inverse is then applied to calculate the output weights analytically. The fitness function used by SFLA is the mean square error (MSE), which is to be minimized. The optimization problem is thus based on minimizing the function stated in equation (5).
$MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$ (5)
Here, $N$ = number of samples, $y$ = desired output and $\hat{y}$ = obtained output. The algorithm of SFLA-ELM for classification of the MRI dataset is as follows:
Algorithm: Brain Image Classification
Input: Brain image dataset; population size ($P$); size of hidden layer ($\tilde{N}$); number of memeplexes ($m$); number of folds for n-fold cross validation ($n$).
Step 1: Divide the dataset into train_set (train_input, train_output) and test_set (test_input, test_output) as per n-fold cross validation.
Step 2: For $j = 1$ to $n$, the $j$-th set is used as test_set and the remaining $n - 1$ sets are used as train_set.
Step 3: Set $K$ random weight populations, each of size $1 \times D$: $W_k = \{w_1, w_2, w_3, \ldots, w_D\}$ for $k = 1, 2, 3, \ldots, K$.
Step 4: For each population, find the error value in ELM using Step 5.
Step 5: For each train_input, find the ELM output and the corresponding MSE.
Step 6: Sort the populations in ascending order of MSE, so that the population with maximum fitness (minimum MSE) comes first.
Step 7: Find $X_g$, where $X_g$ is the $W_k$ with minimum MSE value.
Step 8: Partition the populations into $m$ memeplexes. The sorted populations are distributed in such a way that the first population is assigned to the first memeplex, the second population to the second memeplex, and so on up to the $m$-th memeplex. The $(m+1)$-th population is then assigned to the first memeplex, and so on until all populations are distributed.
Step 9: Within each memeplex, determine $X_b$ and $X_w$ through the objective function, repeating Step 5 with respect to the fitness value.
Step 10: Update the position of $X_w$ with respect to $X_b$ using eq. (2) and eq. (3).
Step 11: If the fitness of $X_w(new)$ is better than that of $X_w$, then replace $X_w$ with $X_w(new)$. Else update $X_w$ with respect to $X_g$ using equation (4) and equation (3).
a. If the fitness of $X_w(new)$ is better than that of $X_w$, then replace $X_w$ with $X_w(new)$; else generate a new position of $X_w$ randomly.
Step 12: Merge the memeplexes and find the fitness value of the new population using Step 5.
Step 13: Repeat Step 6 to Step 12 until the termination condition is satisfied.
Step 14: $X_g$ is considered as the final weight; the output weight $\beta$ is generated accordingly and used for the classification of the test data.
Step 15: Repeat the process for all values of $j$ and find the classification accuracy at each fold to generate the average accuracy.
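Two pieces of the algorithm above lend themselves to a short sketch: the fitness evaluation of one candidate weight vector (Steps 4 and 5) and the round-robin distribution of the sorted population over the memeplexes (Steps 6 to 8). The Python/NumPy code below is a sketch under stated assumptions, not the authors' implementation: the helper names `decode`, `elm_fitness` and `partition`, the layout of a candidate as input weights followed by hidden biases, the sigmoid activation and the toy data are all hypothetical choices for the example.

```python
import numpy as np

def decode(candidate, n_features, n_hidden):
    """Split a flat candidate vector into input weights W and hidden biases b."""
    candidate = np.asarray(candidate, dtype=float)
    W = candidate[: n_features * n_hidden].reshape(n_features, n_hidden)
    b = candidate[n_features * n_hidden:]
    return W, b

def elm_fitness(candidate, X, T, n_hidden):
    """Training MSE of an ELM whose input weights and biases come from `candidate`."""
    W, b = decode(candidate, X.shape[1], n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid used only for illustration
    beta = np.linalg.pinv(H) @ T            # output weights via generalized inverse
    return float(np.mean((H @ beta - T) ** 2))

def partition(sorted_population, m):
    """Round-robin split: the k-th ranked candidate goes to memeplex k mod m."""
    return [sorted_population[i::m] for i in range(m)]

# Toy data and sizes, made up for the example only.
rng = np.random.default_rng(1)
X = rng.random((30, 4))
T = (X.sum(axis=1, keepdims=True) > 2.0).astype(float)
n_hidden = 6
dim = X.shape[1] * n_hidden + n_hidden      # weights plus biases per candidate
population = [rng.uniform(-1.0, 1.0, dim) for _ in range(20)]
population.sort(key=lambda c: elm_fitness(c, X, T, n_hidden))  # minimum MSE first
memeplexes = partition(population, m=4)
x_g = population[0]                         # candidate with minimum MSE
```

In a full run, the worst candidate of each memeplex would then be moved toward the memeplex best or the global best, the memeplexes merged, and the whole cycle repeated inside each cross-validation fold.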

Experiment Design
Three ELM based classification frameworks, namely, SFLA-ELM, ABC-ELM and PSO-ELM were implemented for classification of the brain MRI datasets.

Experimental setup
All experiments were conducted using MATLAB (R2016) on a personal computer with a 3.30 GHz Core-i5 processor and 4 GB RAM running the Windows 10 operating system. Throughout the experiments, all data were scaled to the range [0, 1]. The proposed method for brain image classification employs ELM as the base classifier. In all experiments, 10-fold cross validation was used for training the ELM to obtain bias-free and reliable performance estimates by reducing the influence of any particular split into training and testing sets on the classification outcomes. In addition, for each fold, 20 trials were carried out to average out the effect of the random inputs to the network, and the average values of the results are considered. For the optimization techniques used in this work (SFLA, ABC and PSO), the maximum number of iterations was 100 and the population size was 20.
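The two preprocessing conventions of this setup, scaling to [0, 1] and 10-fold cross validation, can be illustrated with a short plain-Python sketch. It is not the MATLAB code used for the experiments; the helper names `min_max_scale` and `k_fold_indices`, the per-column scaling, the shuffling seed and the toy values are assumptions made for the example.

```python
import random

def min_max_scale(rows):
    """Scale each column of a list of rows to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - low) / span if span else 0.0
             for v, low, span in zip(row, lows, spans)]
            for row in rows]

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for j in range(k):
        test = folds[j]
        train = [i for f, fold in enumerate(folds) if f != j for i in fold]
        yield train, test

# Toy values, made up for the example only.
scaled = min_max_scale([[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]])
splits = list(k_fold_indices(25, k=10))
```

Repeating the training 20 times inside each fold and averaging the scores would then follow the trial scheme described above.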

Datasets used for experimentation
The proposed classifier model is evaluated on three brain image datasets collected from [36], supported and maintained by the Medical School of Harvard University. These datasets contain T2-weighted brain MR images in GIF format with size 256×256. All brain images are in the transaxial plane. For our purpose we collected three brain image datasets for three different brain diseases, namely Alzheimer, Glioma and Multiple Sclerosis. Feature extraction is the first step: it represents each image in a compact, high-level form and so prepares the image dataset for classification. The datasets used in this work are prepared by applying the Discrete Wavelet Transformation (DWT) with a bi-orthogonal wavelet function for image decomposition and feature extraction from the MRI images. The images are decomposed using the bi-orthogonal 1.3 wavelet function up to level 3, thereby reducing the image size to 32×32. A detailed description of the datasets is given in Table 1. Each of these datasets comprises both normal and diseased instances of brain images pertaining to that particular disease.

Experimentation, Result Analysis and Validation
This section presents the details of the three separate sets of experiments that were conducted, the analysis of the results obtained and the validation of the proposed classification method. The first set of experiments was conducted to choose a suitable activation function and to decide the optimal number of nodes for the base ELM classifier for each dataset. In the second set of experiments, the classification of the three MRI datasets was carried out using SFLA-ELM, PSO-ELM and ABC-ELM at a fixed network structure. The third set of experiments examined the classification performance at varying network structure.

Experiment I: Determining the activation function and network structure
The network structure of the SLFN, i.e. the activation function and the number of hidden layer nodes, has a large influence on the learning efficiency and classification accuracy of ELM. To decide the most suitable activation function and the optimal number of hidden nodes for the base ELM classifier, a comprehensive study considering 10 commonly used activation functions was performed. The activation functions used are sigmoid, tanh, sine, bipolar, hardlimit, Gaussian, tribas, radbas, multi-quadratic and inverse multi-quadratic. A series of simulations was carried out for each dataset to analyze the performance of ELM at different numbers of nodes for each activation function, with 10-fold cross validation used for bias-free assessment. In addition, for each fold, 20 trials were run with the same number of nodes and activation function on a dataset to average out the effect of the random inputs to the network. Finally, the average values of the results are taken as the performance measure. Table 2 shows the highest average training accuracy achieved for each activation function. The training accuracies obtained at different numbers of hidden layer nodes in these experiments are shown in Fig. 1 to Fig. 3. All experiments are conducted by varying the number of nodes in increments of 5, up to 60 nodes. The activation function that shows the best accuracy at a smaller number of nodes than the other activation functions tested is chosen as the most suitable one. Inspection of the accuracy results obtained with the different activation functions shows that the multi-quadratic activation function gives better classification accuracies for all three datasets. The maximum number of nodes is limited to 60 in this comparative study because, in all three datasets, the training accuracy reaches its maximum by 60 nodes.
After choosing the activation function with the best performance, the number of nodes that led to this best performance was taken as the optimal number of nodes for the SLFN. The network structure chosen for the ELM classifier, which gives the best classification performance in all three datasets, is given in Table 3.
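The selection rule used in this experiment (highest average accuracy, with the smaller number of nodes preferred when accuracies tie) can be written as a one-line search over the grid of results. The sketch below is a hypothetical helper, `select_structure`, and the accuracy values in it are invented purely to exercise the function; they are not the values of Table 2 or Table 3.

```python
def select_structure(results):
    """results maps (activation, n_nodes) -> average accuracy.

    Returns the pair with the highest accuracy; on ties, fewer nodes wins.
    """
    return min(results, key=lambda key: (-results[key], key[1]))

# Made-up numbers for illustration only; NOT results from this paper.
toy_results = {
    ("sigmoid", 20): 0.80,
    ("sigmoid", 40): 0.85,
    ("multi-quadratic", 25): 0.90,
    ("multi-quadratic", 45): 0.90,
    ("tanh", 30): 0.88,
}
best_activation, best_nodes = select_structure(toy_results)
```

With these toy values the two multi-quadratic entries tie on accuracy, so the configuration with 25 nodes is returned.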

Experiment II: Classification and performance analysis at fixed network structure
To study the classification performance and effectiveness of the SFLA-ELM method, it was applied to three MRI datasets, namely Alzheimer, Glioma and Multiple Sclerosis. Two established hybrid ELM models, ABC-ELM and PSO-ELM, were also implemented for classification of the MRI datasets to allow a comparative study of the classification performances and to validate the effectiveness of the model. For the classification of each dataset, the network structure of the ELM and the activation function used in the experiments for all three methods were taken from Table 3. Classification accuracy and the Receiver Operating Characteristic (ROC) curve are the two measures considered primarily to evaluate the performance of the classifiers. In addition, sensitivity, specificity and AUC were evaluated to compare the effectiveness of the classifiers. The average classification accuracy, sensitivity and specificity values obtained for SFLA-ELM, PSO-ELM and ABC-ELM on all three datasets are presented in Table 4. SFLA-ELM achieves the highest accuracy, namely 0.947, 0.944 and 0.988 for the Alzheimer, Glioma and Multiple Sclerosis datasets respectively. SFLA-ELM also scores better on specificity and sensitivity than PSO-ELM and ABC-ELM on all three datasets, except for the Multiple Sclerosis dataset, where PSO-ELM scores better than SFLA-ELM in terms of sensitivity. Figure 4 shows the box plots of the distribution of the classification accuracy values. Each box in the plot corresponds to the range of accuracies obtained over the 10-fold cross validation performed for classification using the given algorithm. SFLA-ELM generates more compact boxes than PSO-ELM and ABC-ELM in all three datasets, which indicates that SFLA-ELM gives more stable classification results than the other two methods.
Figure 5 to Figure 7 present the ROC curves for the classification methods on the three datasets. A close observation of the ROC curves shows that SFLA-ELM outperforms PSO-ELM and ABC-ELM on all three datasets.
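For reference, the performance measures named in this experiment can be computed from predicted labels and scores as in the following plain-Python sketch. It uses the standard definitions of accuracy, sensitivity, specificity and F-score, and computes AUC as the fraction of positive/negative pairs that are ranked correctly (ties counted as one half). The function names, the 0.5 decision threshold and the toy labels and scores are assumptions for illustration; the code also assumes both classes are present and at least one sample is predicted positive.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F-score for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
    }

def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and scores, made up for the example only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = binary_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
```

A higher AUC means the classifier ranks diseased samples above normal ones more consistently, independent of the particular decision threshold.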

Experiment III: Classification and Performance Analysis at Varying Network Structure
Experiments were conducted to investigate the influence of varying the number of nodes of the SLFN on the classification performance of ELM, PSO-ELM, ABC-ELM and SFLA-ELM, with the objective of comparing the methods and determining the most compact network that yields the best classification accuracy on each dataset. The number of nodes is varied from 5 to 80 in increments of 5. 10-fold cross validation is used, and for each fold 20 trials were carried out. The plots showing the variation in classification accuracy with increasing number of hidden nodes are given in Figure 9. Table 5 reports the highest average testing accuracy reached by each method and the number of nodes at which that accuracy is reached. The results in Table 5 show that SFLA-ELM achieves the highest accuracy at the minimum number of nodes compared with ELM, PSO-ELM and ABC-ELM on all three datasets. This experimental result establishes that SFLA-ELM shows better classification accuracy with a relatively compact and robust network.

Findings
The analysis of the experimental results suggests that the proposed SFLA-ELM classifier exhibits better overall performance in discriminating diseased from normal brain MRI samples, with better accuracy than the existing hybridized ELM methods PSO-ELM and ABC-ELM. Some of the important findings of these experimental studies are as follows.
a) The proposed SFLA-ELM method achieves better accuracy on all three MRI datasets than the other two hybrid ELM-based methods, i.e. ABC-ELM and PSO-ELM.
b) Analysis of the box plots of accuracy further indicates that SFLA-ELM gives more stable classification results than the other two methods.
c) The ROC curve analysis of the three hybrid methods shows that the curve for SFLA-ELM is superior to those of ABC-ELM and PSO-ELM. The higher AUC values obtained for SFLA-ELM on all three datasets further indicate that SFLA-ELM is more successful in ranking both the positive and negative instances.
d) SFLA-ELM achieves better sensitivity and specificity than the other two hybrid methods (except for sensitivity on the Multiple Sclerosis dataset, where PSO-ELM scores higher), which suggests that the proposed SFLA-ELM method is more appropriate for diagnosis of brain diseases from MRI data.
e) When the performance of the classification methods is evaluated over a varying number of hidden layer nodes, SFLA-ELM achieves the highest accuracy at the minimum number of nodes compared with ELM, PSO-ELM and ABC-ELM on all three datasets. This experimental result establishes that SFLA-ELM shows better classification accuracy with a relatively compact network.

Conclusion and Future Scope
This paper presents a novel hybrid classification method, SFLA-ELM, to classify MR brain image data for diagnosis of pathological brain conditions. SFLA-ELM applies SFLA to enhance the performance of the basic ELM classifier by optimally determining the input weights and hidden biases of the SLFN. The performance validation was done under both fixed and varying network structures of ELM. Experimental results reveal that the SFLA-ELM method outperforms the other two hybrid methods, PSO-ELM and ABC-ELM, in terms of performance measures such as accuracy, ROC, AUC, sensitivity and specificity. Furthermore, SFLA-ELM is found to be more successful in ranking both the positive and negative instances, as shown by its higher AUC values, and it also shows better classification accuracy and better stability with a relatively compact network. This suggests that the proposed SFLA-ELM method can be more appropriate for diagnosis of brain diseases from MRI data. Future work could focus on applying this model to multi-class classification of brain MR images and to other types of medical images for diagnosis of other diseases. Larger image datasets could also be used to enhance the training process of the classifier.