Efficient CNN-Based Models to Classify White Blood Cell Subtypes

Abstract — Blood is essential to life. The number of blood cells plays a significant role in monitoring an individual's health status: a lower or higher blood cell count than normal may be a sign of various diseases. It is therefore important to classify and count blood cells precisely in order to diagnose different health conditions. In this paper, we focus on classifying the subtypes of white blood cells (WBCs), which are the basic components of the immune system. Classification of WBC subtypes is very useful for diagnosing diseases, infections, and disorders, and deep learning technologies have the potential to enhance both the process and the results of WBC classification. This study presents two fine-tuned CNN models and four hybrid CNN-based models to classify WBCs. VGG-16 and MobileNet are the CNN architectures used for both feature extraction and classification in the fine-tuned models. The same CNN architectures are used for feature extraction in the hybrid models, while Support Vector Machines (SVM) and Quadratic Discriminant Analysis (QDA) are used for classification. Among all models, the fine-tuned VGG-16 performs best, with a classification accuracy of 99.81%. Our hybrid models are efficient in detecting WBCs as well: the VGG-16+SVM model achieves a classification accuracy of 98.44%, and the MobileNet+SVM model achieves 98.19%.


Introduction
Blood is the life-sustaining fluid that courses through the whole body. The basic components of human blood are red blood cells (RBCs), white blood cells (WBCs), platelets, and plasma. Blood cells are produced in the bone marrow. Individuals may be affected by a wide range of blood conditions and blood cancers. Common blood issues include anemia, bleeding disorders such as hemophilia, blood clots (thrombosis), and blood cancers such as leukemia, lymphoma, and myeloma [1].
White blood cells play a significant role in the immune system of the human body; a WBC is also referred to as an immune cell. WBCs defend the body against infectious diseases and foreign invaders. There are different types of WBCs: eosinophils, lymphocytes, monocytes, and neutrophils, as shown in Figure 1 [2]. Each type of WBC plays a part in the immune response.
A blood cell count is routinely used in health examinations and in the diagnosis of particular patient conditions. Depending on which WBCs are depleted, the patient is at risk for different sorts of infection. As a result, fast and accurate classification of WBC subtypes is critical for disease diagnosis.
WBC classification has already been discussed in various studies, and one of the most promising approaches to detecting and classifying WBCs is deep learning. To the best of our knowledge, however, more powerful WBC classification approaches are still needed. The aim of this study is to present efficient deep learning-based models to classify WBC subtypes. We have built six different models: two are fine-tuned CNN models, while the rest are hybrid CNN-based models.
The rest of this paper is organized as follows: Section 2 presents a review of relevant studies. Section 3 discusses the methodologies used in detail. Section 4 presents the results of our models. Section 5 discusses the results of the proposed models. Section 6 concludes the paper by summarizing the most important points. Finally, Section 7 proposes some ideas that could be implemented in the future.

Related work
Many researchers have proposed different machine learning-based models for blood cell classification tasks. These models differ in the algorithms used for feature extraction and classification. Some models are standalone CNN-based models, in which the same CNN architecture is used for both feature extraction and classification. Tiwari et al. proposed a standalone CNN-based model to classify white blood cells. The accuracy of their approach is acceptable for binary classification but not for multi-class classification: the binary classification accuracy (polynuclear vs. mononuclear) is 94%, while the multi-class accuracy (eosinophil, lymphocyte, neutrophil, and monocyte) is 78% [3]. To improve multi-class accuracy, Daouda Diouf et al. employed a deep convolutional neural network (CNN) model to classify the four cell subtypes, achieving 95.3% accuracy [4].
In [5], a fine-tuned model was proposed to classify blood images as healthy or unhealthy. This model relies on the AlexNet architecture for both feature extraction and classification and achieves 100% classification accuracy. Furthermore, six CNN models were presented in [6] to classify malaria images as healthy or parasitized. The convolutional neural network (CNN) architectures used to develop these models were ResNet50, AlexNet, GoogleNet, DenseNet201, VGG19, and InceptionV3. Among them, the best classification accuracy, 97.83%, was obtained by the DenseNet201-based model. Additionally, Yusuf et al. used capsule networks to classify WBCs into five categories, with a model accuracy of 96.86% [7]. Table 1 summarizes the CNN models that have been presented for classifying blood cells.
The other type of classification model presented in related studies is the hybrid machine learning model, in which the algorithm used for classification differs from that used for feature extraction. In [5], several different hybrid models were proposed to classify blood images as healthy or unhealthy; the researchers used AlexNet for feature extraction and SVM, LDA, Decision Tree, and KNN for classification. The best classification accuracy, 99.79%, was obtained when SVM was used for classification [8].
Another technique for building hybrid models is to use CNN architectures for classification and different algorithms for feature extraction. One paper that applies this method is [9], in which the authors used Region of Interest (ROI) extraction for features and a softmax classifier to classify blood images into six classes, with a classification accuracy of 99%. Some authors have presented non-CNN hybrid models to classify blood images. In [10], a blood image classification model was developed using SVM for feature extraction and KNN for classification, with an accuracy of 93%. Another model was presented in [11]; the authors of this study used Pseudo-Zernike (PZ) moments for textural feature extraction from the images and SVM for classification, achieving a classification accuracy of 97%. Table 2 summarizes the hybrid models that have been presented for classifying blood cells.
Tables 3 and 4 give details of the models developed using the same dataset that we used to develop our WBC classification models. In [15], researchers presented several standalone CNN models and hybrid machine learning models to classify white blood cells. In the standalone models, the AlexNet, LeNet, and VGG-16 architectures were used for both feature extraction and classification, while in the hybrid models, AlexNet was used for feature extraction and several conventional machine learning algorithms were used for classification. Among their models, the hybrid model consisting of AlexNet and the quadratic discriminant analysis (QDA) classifier achieved the best accuracy, 97.78%.
Table 4 summarizes the white blood cell classification models presented in [18] that were developed using the same dataset used in this study [2]. In [18], ten pre-trained deep learning architectures were used to extract features for building the classification models: VGG-16, VGG-19, ResNet-50, DenseNet-121, DenseNet-169, Inception-V3, Inception-ResNet-V2, Xception, MobileNet-224, and Mobile NASNet-A. For classification, six machine learning classifiers were used: Logistic Regression, Decision Tree, Random Forest, Naive Bayes, KNN, and LDA. The best model consists of MobileNet-224 and the logistic regression classifier, with an accuracy of 97.03% [18]. In [2], the authors used a new method, the PatternNet-fused Ensemble of Convolutional Neural Networks (PECNN), for feature extraction and softmax for classifying white blood cells; the accuracy of their method is 99.90%.
As a hybrid method, researchers in [17] used a canonical correlation analysis-based deep learning architecture to extract features from blood cell images. For classification, they combined a CNN and an LSTM; the classification accuracy of their method is 95.89%. One of the promising hybrid models for classifying white blood cells was presented in [16]. This model consists of AlexNet, GoogLeNet, and ResNet-50 as feature extractors and quadratic discriminant analysis as the classifier; its overall accuracy is 97.95%.
Among the models built using the same dataset that we used, the best classification accuracy, 97.78%, was achieved by the model that uses AlexNet as a feature extractor and QDA as a classifier [15], while the worst accuracy (26.43%) was obtained by using VGG19 for feature extraction and LDA for classification [18].

Methodology
In this study, we have used a WBC dataset and different machine learning approaches to build several WBC classification models. This section discusses our classification models and the dataset used in detail.
The first four detection models are hybrid CNN-based models, and the other two are fine-tuned CNN models. Figure 2 illustrates a diagram of the hybrid CNN-based models for classifying WBCs, and Figure 3 shows a diagram of the standalone fine-tuned CNN-based models for classifying WBC subtypes. The main steps used to implement our models consist of pre-processing and augmenting the dataset images, extracting features from the images, training the model using the extracted features, and testing the model's performance on unseen data. Even though the steps are almost the same in both approaches, the algorithms, techniques, and strategies used are different. The rest of this section discusses these in detail.

Dataset description
In this study, we have used a WBC dataset that contains 9,975 augmented images of blood cells [19,20]. There are approximately 2,500 images for each of four different cell types, grouped into four folders according to cell type. The cell types are eosinophil, lymphocyte, monocyte, and neutrophil. Each image has a 320 x 240 pixel resolution and a depth of 24 bits. Table 5 shows the image count for each white blood cell subtype.
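As a concrete illustration of this folder-per-class layout, the sketch below counts images per subtype folder. The helper function and the demo directory are hypothetical, not part of the released dataset tooling; the real dataset has EOSINOPHIL, LYMPHOCYTE, MONOCYTE, and NEUTROPHIL folders of roughly 2,500 JPEGs each.

```python
from pathlib import Path
import tempfile

def count_images_per_class(root):
    """Return {class_name: image_count} for a folder-per-class dataset."""
    root = Path(root)
    return {
        d.name: sum(1 for f in d.iterdir() if f.suffix.lower() in {".jpg", ".jpeg", ".png"})
        for d in sorted(root.iterdir()) if d.is_dir()
    }

# Demo on a throwaway two-class layout with dummy files.
demo = Path(tempfile.mkdtemp())
for cls, n in [("EOSINOPHIL", 3), ("LYMPHOCYTE", 2)]:
    (demo / cls).mkdir()
    for i in range(n):
        (demo / cls / f"cell_{i}.jpg").touch()
print(count_images_per_class(demo))  # {'EOSINOPHIL': 3, 'LYMPHOCYTE': 2}
```

A quick check like this catches class imbalance or missing folders before training begins.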

Hybrid CNN models
We have implemented our hybrid CNN models using MobileNet and VGG-16 as feature extraction methods, and SVM and QDA as classifiers. Figure 2 shows a simplified diagram of the hybrid CNN-based models for classifying WBC subtypes.
Feature extraction. In order to train and develop our hybrid models, we extracted 500 features using two different CNN architectures. The main reason for choosing CNNs for feature extraction is that they are non-linear and can therefore learn non-linear features. The second reason is that image feature vectors are otherwise very large and high-dimensional.
To extract the features for our hybrid CNN models, we used the MobileNet and VGG-16 architectures. VGG-16 has 16 layers and uses 3x3 convolutional kernels. MobileNet has 28 layers and is considered a lightweight deep neural network with almost the same efficiency as VGG-16, while being about 32 times smaller. VGG-16 is about 553 megabytes in size and has 138 million parameters, whereas MobileNet is about 17 megabytes in size and has 4.2 million parameters. Compared to VGG-16, it achieves almost the same efficiency with about 27 times less computation.
Both architectures were designed, in their original versions, to classify 1,000 image categories. Their lower layers learn general (problem-independent) features, whereas their higher layers contain label-specific (problem-dependent) features. We froze the pre-trained base layers before training our models to prevent their weights from being updated during training. We then extended these models by adding Dense, Dropout, and Batch Normalization layers on top: the Dense layers are fully connected to the neurons of the previous layer, and the Dropout and Batch Normalization layers are placed between the fully connected layers. The models were run on the input images, which were resized to 224 x 224.
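A minimal Keras sketch of this construction is shown below, assuming a 500-unit feature layer and a 0.5 dropout rate (the exact layer sizes are illustrative). `weights=None` is used here so the sketch runs offline; the actual models start from ImageNet-pretrained weights.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Frozen convolutional base (the paper loads ImageNet weights instead of None).
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # prevent base weights from updating during training

# Extend the base with Dense, Batch Normalization, and Dropout layers.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(500, activation="relu"),   # 500 extracted features
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(4, activation="softmax"),  # four WBC subtypes
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Swapping `VGG16` for `MobileNet` yields the second feature extractor with the same surrounding structure.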
Classification. Multi-class classification is the task of classifying instances into more than two classes; each instance is assigned to one of several predefined classes. In our hybrid models, we used Quadratic Discriminant Analysis (QDA) and Support Vector Machines (SVM) as the classifiers, and GridSearchCV as the hyper-parameter tuning method.
The QDA classifier uses a non-linear (quadratic) combination of predictor variables to classify categorical responses; it was chosen based on our experiments and because it is a powerful classifier capable of capturing non-linear features. SVM is a supervised machine learning method for classification. GridSearchCV finds the best candidate parameters by looping through predefined hyper-parameters and fitting the model to the training data; multiple parameter combinations are evaluated by cross-validation, and the best parameters are extracted and applied to the predictive model. This method was chosen because setting optimal hyper-parameter values can significantly improve a model's performance.
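The classification stage can be sketched as follows with scikit-learn. Synthetic 500-dimensional vectors stand in for the real CNN-extracted features, and the parameter grid is illustrative, not the grid used in the paper.

```python
from sklearn.svm import SVC
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.datasets import make_classification

# Stand-in for the 500 CNN-extracted features over four WBC classes.
X, y = make_classification(n_samples=400, n_features=500, n_informative=20,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# SVM with cross-validated hyper-parameter search (illustrative grid).
grid = GridSearchCV(SVC(), {"C": [1, 10], "kernel": ["linear", "rbf"]}, cv=3)
grid.fit(X_tr, y_tr)
svm_acc = grid.score(X_te, y_te)

# QDA fits a quadratic decision boundary per class.
qda = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr)
qda_acc = qda.score(X_te, y_te)
print(f"SVM: {svm_acc:.2f}  QDA: {qda_acc:.2f}")
```

In the real pipeline, `X` would be the feature matrix produced by the frozen CNN extractor rather than random data.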
We trained our models many times to reach the best model. Training started with 10 epochs, and we continued the training process until we obtained better results. We then trained the models again with 20 epochs and 212 steps per epoch, which gave us the best-trained models.
Model optimization. The Adam optimizer was used as the optimization method in our proposed hybrid models. Based on several experiments, Adam was the ideal optimizer for our classification task: it updates network weights efficiently using adaptive estimates of first-order and second-order moments, with the advantage of low memory consumption.

Fine-tuned models
To develop our fine-tuned models, we used the same CNN architectures, MobileNet and VGG-16, for both feature extraction and classification. Figure 3 illustrates a simplified diagram of the standalone CNN-based models for classifying WBC subtypes.
Feature extraction and classification. For the fine-tuned models, a transfer learning approach was used to adapt existing models to the new problem for both the feature extraction and classification tasks. We froze the weights of all the base layers in the pre-trained models, so all output from the base models is sent directly to the added layers. We used the Keras Sequential model, which stacks layers; this allowed us to insert the frozen model as a layer and expand the model by adding fully connected layers. Only the weights of the dense layers are trained, using the 500 extracted features. After this initial training, we fine-tuned each model by unfreezing some layers in the base network and jointly training both the unfrozen layers and the newly added parts. Figures 4 and 5 show the frozen, unfrozen, and new layers when training the models before and after fine-tuning for the MobileNet and VGG-16 architectures, respectively.
In our experiments, we trained the models many times until we obtained satisfactory results. First, we froze the convolutional base, expanded the model with additional layers, and trained for 20 epochs. We then unfroze some layers and fine-tuned the model by training it again for 30 epochs, reducing the learning rate with RMSprop (lr=1e-5). Extending the models in this way allowed us to improve their performance. Furthermore, to enhance classification accuracy, we used the ImageDataGenerator class from Keras to apply data augmentation; it generates augmented images dynamically during model training.
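The two-stage schedule above can be sketched as follows (MobileNet variant). The number of unfrozen layers, the augmentation transforms, and `weights=None` (used to keep the sketch offline) are assumptions; the paper uses ImageNet weights and real image generators, and the `model.fit` calls are elided as comments.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.preprocessing.image import ImageDataGenerator

base = MobileNet(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # stage 1: train only the new top layers

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(500, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=2e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# ... model.fit(train_gen, epochs=20) ...

# Stage 2: unfreeze the top of the base network and retrain at a lower rate,
# since large updates could damage the pre-trained representations.
base.trainable = True
for layer in base.layers[:-10]:  # keep all but the last 10 layers frozen
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# ... model.fit(train_gen, epochs=30) ...

# Data augmentation via ImageDataGenerator (illustrative transforms).
aug = ImageDataGenerator(rotation_range=15, zoom_range=0.1,
                         horizontal_flip=True, rescale=1.0 / 255)
```

Recompiling after changing layer trainability is required for the new freeze pattern to take effect in training.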
Model optimization. For the fine-tuned models, we first used RMSprop with a learning rate of 2e-5 to train the model with our added layers. Once we had this result, we fine-tuned the model and trained it again with a learning rate of 1e-5, a very low learning rate, since large updates could damage the representations of the layers being fine-tuned. Based on several experiments, this optimization scheme gave us better results than other optimizers.

Experimental environment
To conduct our experiments, we used Keras (version 2.4.3), a Python-based deep learning library, on Google Colab (a Jupyter-based notebook cloud environment), which provides free access to an NVIDIA Tesla P100 GPU.

Test results
The expected outcome of our models is to predict eosinophil, lymphocyte, monocyte, and neutrophil images efficiently. After building our models, we evaluated their performance. Figure 6 and Table 6 show the prediction results of our four hybrid models, and Tables 7 and 8 show the prediction results of our standalone models before and after fine-tuning, respectively. Among all our models, the fine-tuned VGG-16 achieved the best performance, with an accuracy of 99.81%. Among the hybrid models, VGG-16+SVM is the best, with a classification accuracy of 98.44%. As shown in Table 7, the results were not good before fine-tuning was applied: the best results were 87% for VGG-16 and 85.80% for MobileNet. By training the models again with fine-tuning, we obtained very good results, as shown in Table 8: the best result is the 99.81% classification accuracy of the VGG-16 model, while the accuracy of the fine-tuned MobileNet is 94.38%.
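The per-model accuracies above can be computed from held-out predictions as in the sketch below; the label and prediction arrays here are made-up examples, not results from the paper.

```python
from sklearn.metrics import accuracy_score, classification_report

classes = ["eosinophil", "lymphocyte", "monocyte", "neutrophil"]
y_true = [0, 1, 2, 3, 0, 1, 2, 3]   # ground-truth subtype indices
y_pred = [0, 1, 2, 3, 0, 1, 2, 1]   # one neutrophil misclassified

acc = accuracy_score(y_true, y_pred)
print(f"accuracy = {acc:.2%}")       # accuracy = 87.50%

# Per-class precision/recall gives a fuller picture than accuracy alone.
print(classification_report(y_true, y_pred, target_names=classes))
```

The per-class report is useful here because a high overall accuracy can still hide one poorly detected subtype.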

Comparison of WBC classification models
This subsection compares the detection accuracy of our models with WBC detection models in the literature. Our fine-tuned VGG-16 outperforms the best WBC model proposed in related studies: the accuracy of our VGG-16 is 99.81%, while the best accuracy of related models is 99.09%. Additionally, the VGG-16+SVM and MobileNet+SVM models rank third and fourth among all detection models. Table 9 shows the accuracy of the five best WBC detection models proposed in related work alongside the accuracy of our proposed models.

Comparison of VGG-16 and mobilenet based models
This subsection compares the detection performance of our models with models that were developed using VGG-16 and MobileNet for feature extraction. The best result in the related studies was obtained by the MobileNet + Logistic Regression model, with a classification accuracy of 97.03%. Four of the six models proposed in this study outperform that model: our VGG-16, VGG-16+SVM, MobileNet+SVM, and MobileNet+QDA models achieve 99.81%, 98.44%, 98.19%, and 97.39%, respectively. Table 10 illustrates the detection performance of the VGG-16-based models, and Table 11 shows the detection accuracy of the MobileNet-based models.
From the classifier's perspective, using a suitable classifier to build hybrid models leads to enhanced detection performance. Our results show that the SVM classifier outperforms QDA when used for classification in all hybrid models; see Tables 10 and 11.
In general, our findings illustrate that five of our proposed models are effective in classifying WBC subtypes. The lowest classification accuracy obtained was 74%, the accuracy of the VGG-16+QDA model.

Conclusion
In this study, we proposed six efficient CNN-based models to classify white blood cell subtypes. Two of our models are fine-tuned CNN models, while the rest are hybrid CNN-based models. In the fine-tuned models, we used the pre-trained VGG-16 and MobileNet architectures for both feature extraction and classification. To develop the hybrid models, we used the VGG-16 and MobileNet architectures for feature extraction and the SVM and QDA classifiers for classification. The best of our models is the fine-tuned VGG-16, with an accuracy of 99.81%. The best hybrid model is VGG-16+SVM, whose 98.44% accuracy is even better than that of the second fine-tuned model. The proposed models could help clinical laboratories classify white blood cell subtypes efficiently. To conclude, using transfer learning to solve new problems [20][21] instead of building models from scratch is an efficient solution: in addition to its high performance, it reduces models' training time.

Future work
This study could be extended by using additional machine learning classifiers and CNN architectures to develop models. Furthermore, additional pre-processing steps (segmentation, object detection, etc.) could be applied in the future to enhance classification performance. Finally, it would be helpful to deploy the best WBC classification models.