Cross Validation Analysis of Convolutional Neural Network Variants with Various White Blood Cells Datasets for the Classification Task

— White Blood Cell (WBC) analysis is an important procedure for detecting diseases closely related to the human immune system. Manual WBC analysis is laborious, and a computer-aided diagnosis (CAD) system is therefore a better option to alleviate this shortcoming. Since the conventional segmentation-classification approach is tedious to configure, Convolutional Neural Networks (CNNs) have become the recent trend for WBC classification. Many works have previously been proposed for WBC identification; however, which models generalise well across various datasets remains unclear. In this paper, various CNN models, namely the simple Alexnet, the embedded-friendly Mobilenet, the inception-based Googlenet, the systematically structured VGG-16 and the skip-connection-based Resnet and Densenet, are tested on three major WBC datasets (Kaggle, LISC and IDB-2). From the rigorous experiments, it can be concluded that the simple Alexnet model performs well across all three datasets, with 98.08% accuracy on Kaggle, 96.34% on IDB-2 and 84.52% on LISC. This outcome can serve as a basis for improving CNN classification models that generalise across various WBC datasets.


Introduction
White Blood Cells (WBCs) are, together with Red Blood Cells (RBCs) and platelets, among the important elements of the human body. They are closely related to the immune system that keeps the body healthy, helping to fight viruses and bacteria and thereby preventing serious diseases. However, an abnormal WBC count can be harmful, as it can also lead to several conditions such as Leukemia, cancer and other blood-related diseases [1]. This is where WBC analysis is important, so that early prevention can be made and the risk reduced. WBC analysis can also help in monitoring a patient's health condition and the effectiveness of cancer treatment.
There are 5 types of WBC: Eosinophil, Neutrophil, Basophil, Lymphocyte and Monocyte [2]. These cells differ from each other in shape, number of lobes and the sizes of their nucleus and cytoplasm [3]. The differences can be seen in Figure 1, and their nuclear stain also differs [4]. As can be seen, the shapes of the Eosinophil and Neutrophil are more rounded than that of the Monocyte, which shows irregular edges. The cytoplasm colour also differs: the Monocyte, Lymphocyte and Basophil show a deeper purple stain, while the Eosinophil and Neutrophil are lighter and more pinkish. The number of nuclear lobes and their shapes can also be observed to differentiate between these cells. In addition, small cavities in the cytoplasm, called vacuoles, are seen in the Eosinophil, Neutrophil and Basophil. The normality and count of each cell type must be monitored to ensure that a patient is in a healthy condition.

Fig. 1. 5 types of WBC
Traditionally, WBC analysis is done manually by a pathologist and is very time-consuming. It also becomes more challenging as the number of samples increases [5], and it is highly dependent on the pathologist's skill, which can lead to confusion and inaccurate results [2]. Industry has produced automated haematology machines that are fast and accurate, but these machines are very expensive and not portable [6]. In addition, some researchers have classified WBCs using the conventional method, which includes processes such as preprocessing, segmentation and feature extraction. The main problem with the conventional method is that blood smear images can be affected by different conditions, light distribution and variation in staining intensities, which can influence the segmentation process and reduce the segmentation rate [7].
These are the reasons that motivate this paper to classify WBC types using a deep learning technique, namely the Convolutional Neural Network (CNN). One of its advantages is that deep learning can automatically learn and extract high-level features and perform classification at the same time [8]. Feature extraction is critically important, as selecting the wrong features will reduce classification accuracy.
It is also less complex, requires little tuning and involves a less complicated process than the conventional method [9]. CNNs are specifically developed to tackle the feature extraction issue, including variation in image rotation.
A CNN automatically learns features through backpropagation using multiple building blocks such as convolution layers, pooling layers and fully connected layers [10]. Over the last few years, CNNs have demonstrated impressive performance in various tasks such as image classification, object detection, action recognition and many more [11]. Their ability to be both translation and rotation invariant has helped many researchers complete various tasks [12]. Deep learning is also widely used for brain tumour classification [13,14], Korean character recognition [15] and leaf classification for Chinese herbal medicine [16]. There is also research on offline signature verification using deep CNNs [17], and some works used Googlenet for lung cancer detection [18,19]. This paper, however, focuses on image classification of the 5 WBC types, as CNNs work wonderfully and provide high performance on image classification [20]. There are many studies on WBC classification using CNNs, but most are limited to a single dataset and little comparison has been made. Previous works that used CNNs to classify WBCs include a classification based on the Region-based Convolutional Neural Network (R-CNN) [1], in which four models were trained (Alexnet, VGG-16, Googlenet and Resnet50); Resnet50 showed the highest training performance of 100%, with testing results of 99.52%, 98.40%, 98.48%, 96.16% and 95.04% for Lymphocyte, Monocyte, Basophil, Eosinophil and Neutrophil respectively. In addition, [7] and [8] developed their own models, achieving average accuracies of 98.61% and 96.60% respectively. Next, WBC classification and counting was done by [2], in which Alexnet outperformed Googlenet and Resnet-101. A five-layer CNN model has also been proposed, with three layers for feature extraction and the other two for classification [21]. Lastly, WBC detection and identification using a modified LeNet-5 is proposed in [6].
In summary, this paper discusses WBC type classification using deep learning, namely CNNs. A comparison between several pre-trained models (Alexnet, Googlenet, VGG-16, Mobilenet, Resnet and Densenet) is made to classify WBC types in blood smear images, and their training and testing performance is compared. The models are also tested on three different datasets (Kaggle, IDB-2 and LISC) to verify that the best model is valid for most WBC datasets. Matlab R2020a with the Deep Network Designer toolbox is the platform used.

System overview

Platform
The platform used in this paper is Matlab R2020a, a high-level programming language that can calculate, represent, visualize and complete many other tasks. The main toolboxes used are deep learning, image processing and computer vision. To use the Deep Network Designer toolbox, a pre-trained model is imported and modified accordingly. After the modification is done, the model is exported to the Matlab workspace and executed for the training process.

Datasets
There are three databases experimented with in this project: Kaggle, IDB-2 and LISC. These databases consist of images of various types of WBC.

CNN models
There are several key operations in a CNN: convolution, pooling and fully connected layers. The convolution layer contains a set of filters that produce feature maps and learn feature representations of the inputs. Pooling is a down-sampling layer that reduces the feature dimensions, which helps prevent overfitting; it is also equivalent to fuzzy filtering, which increases the robustness of feature extraction. Lastly, the fully connected layer acts as the classifier that computes the final scores. Six models are involved in this experiment: Alexnet, Googlenet, VGG-16, Resnet, Mobilenet and Densenet. These models are tested because of their distinctive features, such as the number of layers and the presence of inception modules, skip connections, depthwise convolution and dense blocks. The batch size and number of epochs are fixed to 64 and 50 respectively.
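These building blocks can be illustrated with a minimal NumPy sketch. This is purely illustrative: a real CNN learns many multi-channel filters, whereas the fixed averaging kernel and the image values below are assumptions made for the example.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation) of a single-channel image."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: down-samples the feature map."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
kernel = np.ones((3, 3)) / 9.0                    # fixed averaging filter
fmap = conv2d(image, kernel)                      # 4x4 feature map
pooled = max_pool(fmap)                           # 2x2 after pooling
print(fmap.shape, pooled.shape)                   # (4, 4) (2, 2)
```

The pooling step shows the dimensionality reduction described above: a 6×6 input shrinks to a 2×2 map after one convolution and one pooling stage.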

Alexnet
Alexnet contains 8 layers of convolutional and fully connected layers, as shown in Figure 5. The model also includes local response normalization and a max pooling layer before the fully connected layers. The input image must be of 227×227 resolution. A first convolution mask of 11×11 is applied to the input image, followed by a 5×5 convolution and three 3×3 convolutions.
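As a quick worked example, the spatial output size of a convolution follows (W - F + 2P)/S + 1. Assuming Alexnet's standard first-layer settings of stride 4 and no padding (these values are not stated in the text above), the 11×11 mask on a 227×227 input yields a 55×55 feature map:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer:
    (W - F + 2P) / S + 1."""
    return (size - kernel + 2 * pad) // stride + 1

# Alexnet's first convolution: 227x227 input, 11x11 kernel, stride 4, no padding
print(conv_out(227, 11, stride=4))  # 55
```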

VGG-16
There are 16 layers of convolutional, max pooling and fully connected layers in the VGG-16 model, and the input image resolution must be 224×224. The architecture of VGG-16 can be seen in Figure 7. This model improves its classification accuracy by replacing large convolution filters with small ones [22]. Max pooling is used to overcome overfitting in this model and to reduce the number of learned parameters, which eventually reduces the computational cost.

Mobilenet
Mobilenet has 27 layers in total, including 13 depthwise convolution layers, one 3×3 convolution layer and 13 1×1 convolution layers. Its architecture is depicted in Figure 8, which also shows an average pooling layer, a fully connected layer and a softmax layer. The pooling layer performs the down-sampling operation to reduce the feature dimensions and prevent overfitting [23]. Mobilenet is based on the depthwise separable convolution, which can be divided into two parts: depthwise convolution and pointwise convolution. Depthwise convolution applies a single filter per input channel, while pointwise convolution is a simple 1×1 convolution used to create a linear combination of the depthwise outputs. Depthwise convolution filters the input without creating new features, while pointwise convolution generates new features. Depthwise separable convolution helps reduce both computational time and model size.
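The parameter saving of depthwise separable convolution can be checked with a small calculation. The 3×3, 32-to-64-channel layer below is an illustrative assumption, not a layer taken from the Mobilenet used in this paper:

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution over c_in channels."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise (one k x k filter per input channel) plus
    pointwise (1x1 convolution mixing channels)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

std = standard_conv_params(3, 32, 64)   # 18432 weights
sep = separable_conv_params(3, 32, 64)  # 2336 weights
print(std, sep, round(std / sep, 1))    # roughly an 8x reduction
```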

Resnet
Resnet, or Residual Network, is based on the residual block with a skip connection, which adds the output of an earlier layer to a later layer of the model. The purpose of the skip connection is to reduce the training error. In this paper, a 34-layer Resnet is used, and its building block is depicted in Figure 9.
The 34-layer Resnet contains one max pooling layer and one average pooling layer at the end of the model.

Fig. 9. Resnet building block
Another advantage of the skip connection is that any layer that hurts the model's performance can effectively be ignored, as it can be skipped via regularization. In addition, problems involving vanishing or exploding gradients can be prevented.
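A minimal sketch of the residual computation y = relu(F(x) + x) illustrates this. Small random dense weights are used here as a stand-in for the convolutional layers (an assumption for the example): when the residual branch F(x) is near zero, the block behaves almost like the identity, which is why adding layers cannot easily hurt training.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x): the skip connection adds the input back in."""
    out = relu(x @ w1)   # first weight layer of the residual branch
    out = out @ w2       # second weight layer (pre-activation)
    return relu(out + x) # identity shortcut

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
# near-zero weights make the residual branch negligible
w1 = 1e-3 * rng.standard_normal((8, 8))
w2 = 1e-3 * rng.standard_normal((8, 8))
y = residual_block(x, w1, w2)
print(np.allclose(y, relu(x), atol=1e-2))  # True: block ~ identity
```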

Densenet
Densenet uses dense connections between layers through its dense block, as shown in Figure 10. It is basically a feed-forward connection in which the feature maps of all preceding layers are used as input. The importance of this model is that it encourages feature reuse and can reduce the number of parameters.
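The dense connectivity can be sketched as repeated concatenation of feature maps. The averaging "layer" below is a toy stand-in for a real convolution, and the channel counts are illustrative assumptions, not values from the Densenet used in this paper:

```python
import numpy as np

def dense_block(x, n_layers, growth_rate):
    """Toy dense block: each 'layer' produces growth_rate feature maps
    from the concatenation of all preceding feature maps."""
    features = x  # shape: (channels, H, W)
    for _ in range(n_layers):
        # stand-in for a convolution: average all maps into growth_rate new maps
        new = np.stack([features.mean(axis=0)] * growth_rate)
        # dense connection: concatenate new maps onto everything before them
        features = np.concatenate([features, new], axis=0)
    return features

x = np.ones((64, 8, 8))                              # 64 input feature maps
out = dense_block(x, n_layers=6, growth_rate=32)
print(out.shape[0])                                  # 64 + 6*32 = 256
```

The channel count grows linearly (input + layers × growth rate) because outputs are reused rather than recomputed, which is the parameter saving described above.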

Results and analysis
In this section, the results obtained on the three databases are explained. In this paper, CNNs are used to classify different types of WBC. The same pre-trained models (Alexnet, Googlenet, VGG-16, Mobilenet, Resnet and Densenet) are tested on the three databases. 70% of the images are used for training and the remaining 30% for testing. In the training process, the number of epochs and the batch size are fixed to 50 and 64 respectively.
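The 70/30 partition described above can be sketched as follows. The file names and the fixed shuffle seed are illustrative assumptions; the paper's actual split was performed in Matlab:

```python
import random

def split_dataset(items, train_frac=0.7, seed=42):
    """Shuffle and split items into training / testing subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

# hypothetical file names, sized like the 6000-image Kaggle set
images = [f"cell_{i:04d}.png" for i in range(6000)]
train, test = split_dataset(images)
print(len(train), len(test))  # 4200 1800
```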

Kaggle
A total of 6000 images were tested using the six pre-trained models, and the results are tabulated in Table 1. It can be seen from Table 1 that the highest training accuracy is achieved by Resnet at 97.13%, followed by Alexnet and Mobilenet. However, Alexnet obtained the highest testing accuracy at 99.10%. Since Alexnet performed best in testing, its testing result for each cell type is shown in Table 2. All cell types achieved high testing accuracy of more than 98%. The number of misclassified images is highest in the Eosinophil category at 26, meaning that 26 Eosinophil images were misclassified as another cell type, namely Neutrophil. This is followed by Neutrophil, for which 21 images were misclassified as Eosinophil. These two cell types are most often confused because their shapes, morphological features and nuclei are very similar, making it hard to differentiate between them. For Lymphocyte and Monocyte, the differences are very clear, and it is easier to classify these cells into their own classes. The breakdown of misclassified images is shown in Table 3: 26 Eosinophil images are misclassified as Neutrophil and 20 Neutrophil images as Eosinophil, while Lymphocyte and Monocyte have low misclassification counts of 3 and 4 respectively. Most misclassified Neutrophils are mistaken for Eosinophils and vice versa, due to their similar morphological features, nuclei and shapes, whereas Lymphocyte and Monocyte have clearly distinct morphological features, shapes and patterns. The average performance for each model is calculated and tabulated in Table 4.
Alexnet achieved the highest average performance at 98.08%, followed by Resnet with an average of 75.69%. The lowest is Googlenet at 70.92%. Alexnet showed outstanding performance on the Kaggle dataset, as the other models' performance lies in the range of 70%-76% while Alexnet's is 98.08%.

IDB-2
The IDB-2 database has two classes of data, Lymphoblast and Non-lymphoblast, commonly used to detect acute lymphoblastic leukaemia (ALL). Each class contains 130 images, giving a total of 260 images in this dataset. These images are trained and tested using five different pre-trained models: Alexnet, Googlenet, VGG-16, Mobilenet and Resnet.
The training and testing accuracies obtained are shown in Table 5. From the table, it can be said that Alexnet is the best model to classify the two WBC types in the IDB-2 database, as both its training and testing accuracies are high. The average performance comparison shown in Table 6 strengthens this result: Alexnet again performed well, achieving the highest average performance accuracy of 96.34%. The lowest, 83.59%, was obtained by Resnet; although Resnet achieved the highest training accuracy, its testing performance is the lowest. As Alexnet is the best model for classification in the IDB-2 database, its detailed testing performance is given in Table 7. The testing accuracy for both the Lymphoblast and Non-lymphoblast classes is satisfactory at more than 95%. There are no misclassified images in the Lymphoblast class, while the Non-lymphoblast class has only 4 misclassified images. These misclassified images need to be identified, as they have a large impact on the overall performance. Figure 11 shows some of the misclassified images that are often mistaken for the other cell type; these Lymphoblast and Non-lymphoblast images were misclassified by more than one CNN model, probably for reasons such as extra noise or a confusing shape or colour. Some images, as pictured in Table 11, were misclassified by all pre-trained models, most probably due to their acquisition conditions, different lighting and coloration; the colour intensity of these images is also not standardized.

Conclusion and future works
In this paper, several CNN pre-trained models were tested on three databases. The purpose of this project is to classify WBCs, as these databases contain different WBC types. The platform used is Matlab R2020a with the deep learning toolbox. The epoch value and batch size are fixed for all models at 50 and 64 respectively.
Firstly, the pre-trained models Alexnet, Googlenet, VGG-16, Mobilenet, Resnet and Densenet were applied to the Kaggle dataset to classify 4 types of WBC: Eosinophil, Neutrophil, Lymphocyte and Monocyte. There are 1500 images in each class, totalling 6000 images in the Kaggle dataset. These images were trained and tested, and the comparison between the models was observed and recorded. In the training process, Resnet achieved the highest accuracy of 97.13%, followed by Alexnet with 97.06%. However, for testing accuracy, Alexnet obtained the highest at 99.10%. Based on the calculated average performance, Alexnet is the best model to classify WBC types in the Kaggle dataset, achieving 98.08% performance accuracy, clearly higher than the other models.
Next, Alexnet, Googlenet, VGG-16, Mobilenet and Resnet were used to classify the two cell types in the IDB-2 database, Lymphoblast and Non-lymphoblast. Each class contains 130 images, for a total of 260 images. Comparing the performance, the training accuracy of Resnet is the highest at 97.18%, with Alexnet second at 96.15%. For testing, the highest accuracy of 96.52% is achieved by Alexnet, followed by VGG-16 with 95.69%. To find the best model for the IDB-2 database, the average performance was computed, and the highest, 96.34%, is achieved by Alexnet. Moreover, Alexnet classified all Lymphoblast images correctly.
Lastly, the LISC dataset has 5 classes of data, Eosinophil, Neutrophil, Basophil, Monocyte and Lymphocyte, consisting of 39, 50, 53, 52 and 48 images respectively, for a total of 242 images. Resnet achieved the highest training accuracy of 97.71%, but its testing accuracy of 53.70% is the lowest among the models. The highest testing accuracy, 88.21%, was obtained by Alexnet. The average performance of each model was calculated, and Alexnet is the best model to classify WBC types in the LISC dataset with 84.52% accuracy; Googlenet's average performance is the lowest at 74.72%. The number of images misclassified by Alexnet is 16 out of 242, which is considered low.
Overall, as discussed in the previous sections, Alexnet is the best model to classify WBC types across all three databases of Kaggle, IDB-2 and LISC, despite containing the lowest number of layers among the models tested.
In future, the project's findings are expected to be improved by comparing several optimizers such as Adam, RMSprop and Stochastic Gradient Descent with Momentum (SGDM). Fine-tuning and developing our own basic model are also planned to improve the results. Lastly, the dataset size should be increased to strengthen the findings.

Table 1. Kaggle result example table

Table 2. Testing performance of each cell type (Alexnet)

Table 3. Breakdown of misclassified images

Table 4. Average performance for Kaggle dataset

Table 5. Training and testing accuracy
The highest training accuracy is again achieved by Resnet at 97.18%; Mobilenet and Alexnet achieved 96.96% and 96.15% respectively, and the lowest training accuracy belongs to Googlenet. For testing accuracy, the highest is Alexnet at 96.52%, followed by VGG-16 at 95.69%.

Table 7. Accuracy of testing for Lymphoblast and Non-lymphoblast (Alexnet)