Breast Cancer Image Multi-Classification Using Random Patch Aggregation and Depth-Wise Convolution based Deep-Net Model

Adapting deep convolutional neural network models to large-image classification can yield network architectures with a large number of learnable parameters, and tuning those varied parameters considerably increases the complexity of the model. To address this problem, a convolutional Deep-Net Model based on the extraction of random patches and depth-wise convolutions is proposed for training and classification of the widely known benchmark Breast Cancer histopathology images. The classification results of these randomly extracted patches (of size 50×50×3) are aggregated by majority vote casting to decide the final image class. When compared with classification over VGG-Net (16-layer) learned features, the proposed Deep-Net model performs better, achieving accuracy of up to 89.6% on multi-class classification of 40× magnified images. The results further indicate that a model trained on images of one optical magnification factor (e.g. 40×) might not classify images captured at other magnifications (100×, 200×, and 400×) with similar accuracy. Thus, different classifiers are required at different magnifications.


Introduction
A rapid increase has been observed in the occurrence of breast cancer, especially in Asian nations such as China, India, and Malaysia [1][2]. Recognizing breast cancer remains one of the most beneficial yet intricate challenges for AI. The vital symptom of breast cancer is normally a lump or tumor that feels different from the rest of the breast tissue. However, it is not always easy to distinguish a malignant tumor from a benign one because of their structural similarities. To understand these structural variations, physicians must carefully review a patient's medical records and perform clinical examinations such as mammography or ultrasound. A precise and accurate analysis of a breast tumor, however, can only be acquired through some form of biopsy, in which a small sample of cells or tissue is removed and stained (using Hematoxylin and Eosin stains) for examination [3]. Monitoring these images is a fatiguing task that requires expertise in the field [4]. When visually searching for signs of cancer, there is a possibility of missing important indicators and returning a false negative.
Light microscopes are used to enlarge images of small matter and examine fine details. Enlarging an image is known as magnification, and the amount of fine detail that can be resolved is called resolution. H&E-stained tissue images are often captured at several optical magnification levels, each of which conveys specific information: the lowest magnification captures a larger region of tissue, while higher magnifications capture zoomed-in views of the tissue. This explains the use of different magnifications, which can yield varying discriminative information.
Deep learning methods have been used extensively to extract relevant information from raw images and use it for classification tasks [5][6][7]. Many contributions have been proposed to enhance generalization capacity on heavily used benchmark cancer datasets by making use of deep convolutional networks. Most of these works rely on complex model architectures, from AlexNet [8] to VGG-16 [9], ResNet [10], Inception-V3 [11], and DenseNet [12], with functional refinements such as dropout regularization, batch normalization, transfer learning [13], and zero-shot training [14] developed to scale deep learning to image datasets.
Tables 1 and 2 summarize the methods employed for binary and multi-class classification in histopathology breast cancer image analysis.
Though deep convolutional neural networks have confirmed their effectiveness on numerous image classification tasks, capturing the comprehensive detail present in biomedical images is difficult, and the task becomes harder still when we encounter images at various magnification levels. Training Deep-Net works on large input dimensions requires longer training time, a notably larger network with more hidden layers, and more hardware memory. To ease the training of Deep-Net works on large input images, earlier work [15][16] represented each image by one randomly cropped patch, labeled with the label of the original image. But this approach leads to ambiguity in the training examples, as one patch may not be a good representative of the entire image. To deal with this problem, we propose and evaluate a Deep-Net structure using SeparableConv2D to enforce depth-wise convolutions, in which an input image is represented by a small set of patches (tiles) cropped from it, each tile carrying the corresponding image label. We also compare the proposed system with a widely used pre-trained CNN model for global feature extraction and classification using hard and soft voting techniques.

Dataset Used
Spanhol et al. [38] published a public dataset (BreakHis) of 82 breast cancer patients with diagnoses spanning eight forms of breast tumor, along with a framework for evaluating classification strategies over the 8 tumor types. Baseline accuracy was obtained using six feature extractors and four classifiers. The dataset includes 7,909 BC images at four magnification factors; each image is 460 pixels high and 700 pixels wide.

Convolutional Neural Network (CNN)
CNNs are a significant tool for image classification, retrieval, and detection tasks today. There are four predominant operations in a CNN: convolution, non-linearity (activation), pooling/sub-sampling, and the fully connected (FC) classification layer, as shown in Fig. 1. CNNs generally require plenty of training data, and have the capability to handle big, high-resolution images, transforming them without losing significant characteristics.

Experiment 1-Global Feature Extraction using Transfer Learning
Here an effort is made to construct accurate models for the BC image classification problem through transfer learning, in a quicker and better way. Transfer learning helps prevent over-fitting, since a ConvNet contains generic features (e.g. horizontal/vertical edge detectors or color detectors) in its initial layers, while later layers encode more problem-specific information about the classes contained in the original dataset. Typically, just the weights of the trained convolutional layers are copied, rather than the entire network structure. This is very effective, as almost all datasets share low-level spatial traits that are better learned from big datasets.
A pre-trained VGG-16 CNN model, trained on the ImageNet benchmark dataset, was imported, considering the high computational cost of training such models from scratch.
VGG-16 is chosen for its strong performance on image recognition tasks [9], its usefulness in real-time applications, and the feasibility of transfer learning on constrained datasets. A typical CNN has two essential components: first, the convolutional base, composed of an assembly of convolutional and pooling layers, whose main goal is to generate low-level features from the image; second, a traditional machine learning stage that categorizes images from the extracted features. We can compare and improve classification accuracy by training various classification models, including a linear Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Multilayer Perceptron (MLP), on the features extracted by the convolutional base, using k-fold cross-validation to estimate the error of each classifier. For data augmentation, images are usually transformed using affine transforms (horizontal and vertical flipping, contrast enhancement, rotation, zooming, mirroring, fill mode = nearest, etc.) to avoid class imbalance. Fig. 2 describes the global feature extraction method for BC image classification.
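The feature-then-classify pipeline above can be sketched as follows. To keep the example self-contained, the VGG-16 convolutional-base extraction step (e.g. `tensorflow.keras.applications.VGG16(include_top=False)`) is replaced by random stand-in features; the array shapes, classifier settings, and feature dimension are assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

# Hypothetical stand-in for VGG-16 conv-base output: in the real pipeline each
# row would be the flattened feature map of one histopathology image.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 128))    # 200 images, 128-dim features (assumed)
y = rng.integers(0, 8, size=200)   # 8 tumor sub-classes

classifiers = {
    "SVM": LinearSVC(dual=False),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=100),
}

# k-fold cross-validation (k=5) estimates each classifier's error on the
# extracted features, as described in the text.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in classifiers.items()}
```

With real VGG-16 features in place of the random matrix, the same loop directly reproduces the comparison between SVM, KNN, and MLP heads.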
In addition, voting classifier strategies, popularly called weight-adjusted vote casting methods, are also used and compared. The idea of ensemble classifiers [49] is to combine the predictions made by more than one classifier, which can be more expressive than a single classifier's prediction; the outcome is also less dependent on the peculiarities of the training set. The varieties of voting used are hard and soft voting, over the following classifiers: Decision Tree, K-Nearest Neighbor, Naive Bayes, Random Forest, Quadratic Discriminant Analysis, and AdaBoost.

Hard voting
Here the final prediction is the class that appears most often among the predictions of the individual models, even when those models differ; it is also known as majority vote casting. The final prediction ŷ is obtained by taking the mode of the independent classifiers' outputs:

ŷ = mode{C₁(x), C₂(x), …, Cₘ(x)}

where Cⱼ(x) is the class predicted by the j-th of the m classifiers for input x.

Soft voting
Here the final prediction ŷ relies on the predicted probabilities of the individual classifiers:

ŷ = argmax_i Σⱼ₌₁..ₘ wⱼ pᵢⱼ

where pᵢⱼ is the probability of class i predicted by the j-th classifier, wⱼ is the weight assigned to the j-th classifier, and m is the total number of classifiers used.
The experimental results demonstrate that the average accuracy of the soft voting classifier was better than that of the conventional classifiers and hard ensemble voting.
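Both voting schemes can be illustrated with scikit-learn's `VotingClassifier`. The synthetic data and the subset of base estimators below are assumptions for the sake of a runnable sketch (QDA from the list above is omitted for brevity); the real pipeline would use the features extracted from the BC images.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the extracted image features (assumed shapes).
X, y = make_classification(n_samples=400, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

estimators = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
    ("nb", GaussianNB()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
]

# Hard voting takes the mode of the predicted class labels; soft voting
# averages the (optionally weighted) predicted probabilities, then argmaxes.
hard = VotingClassifier(estimators, voting="hard").fit(X_tr, y_tr)
soft = VotingClassifier(estimators, voting="soft").fit(X_tr, y_tr)
```

Note that soft voting requires every base estimator to expose `predict_proba`, which all five used here do.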

Proposed Deep-NET Model
Adapting prevailing deep neural models to large images can result in more complicated architectures with massive sets of parameters, and fine-tuning this large number of parameters appreciably increases the complexity of the model. We thus propose a Deep-Net Model to resolve these issues. We also use various parameter tuning and regularization methods adopted in the literature for speeding up Deep-Net training, as follows.

Batch size and epochs
Batch size is the number of samples from the training set used to estimate the error gradient before the model weights are updated. The more training examples used per estimate, the more accurately the weights are adjusted toward improving the accuracy of the model. The commonly used variants are batch gradient descent, where the batch size equals the total number of training samples; stochastic gradient descent, where the batch size is 1; and mini-batch gradient descent, where the batch size is greater than 1 but less than the number of samples in the dataset.
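The relationship between batch size and the number of weight updates per epoch can be made concrete with a small helper (the patch count of 16,176 comes from the dataset described later; the batch size of 64 matches the training setup):

```python
import math

def minibatch_updates(n_samples, batch_size):
    """Number of weight updates per epoch for a given batch size."""
    return math.ceil(n_samples / batch_size)

# With batch_size == n_samples we recover batch gradient descent (1 update
# per epoch); with batch_size == 1, stochastic gradient descent (n updates);
# anything in between is mini-batch gradient descent.
n = 16176  # number of 50x50 patches in the training set
print(minibatch_updates(n, n))    # batch GD   -> 1
print(minibatch_updates(n, 1))    # SGD        -> 16176
print(minibatch_updates(n, 64))   # mini-batch -> 253
```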

Dropout
Dropout is a technique in which randomly selected neurons are ignored during training by zeroing their activation values. Dropout is applied to the hidden-layer neurons of the Deep-Net model. It acts as regularization by adding noise to the output feature maps of each layer, making the model more robust to variations in the test image.

Batch normalization
Batch normalization is another regularization technique that adds some noise to each hidden layer by normalizing the set of activations in that layer. Normalization works by subtracting the batch mean from each calculated activation and dividing by the batch standard deviation. This allows each layer of the network to learn somewhat independently of the other layers.
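The two regularizers described above reduce to a few lines of NumPy. This is a minimal sketch of the normalization and masking steps only; a full BatchNorm layer would also learn a scale and shift, and frameworks apply dropout only at training time.

```python
import numpy as np

def batch_normalize(acts, eps=1e-5):
    # Subtract the batch mean and divide by the batch standard deviation,
    # per feature, exactly the normalization step described in the text.
    mu = acts.mean(axis=0)
    sigma = acts.std(axis=0)
    return (acts - mu) / (sigma + eps)

def dropout(acts, rate, rng):
    # Randomly zero a fraction `rate` of activations; dividing by the keep
    # probability ("inverted dropout") preserves the expected magnitude.
    mask = rng.random(acts.shape) >= rate
    return acts * mask / (1.0 - rate)

rng = np.random.default_rng(1)
acts = rng.normal(loc=5.0, scale=3.0, size=(64, 10))  # one batch of activations
normed = batch_normalize(acts)
dropped = dropout(normed, rate=0.25, rng=rng)
```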
To further speed up training on large images, which is the major problem considered in this paper, the proposed Deep-Net Model is based on the extraction of random patches and separable convolutions for training; the aggregate result over these patches then decides the final image class. Separable convolutions first perform a depth-wise spatial convolution on each input channel separately, followed by a point-wise convolution that mixes the channels, as shown in Fig. 3. The steps implemented in sequence to build the Deep-Net Model for 40× magnified images are explained as follows.
Step 1: Patch extraction. Here we extract multiple 50×50 patches from images of both the positive and negative classes. This yields 16,176 patches of size 50×50 extracted from 2,022 histopathology images of breast cancer specimens scanned at 40×. Of these, 10,960 patches are malignant and 5,216 are benign. The source images are available in the public domain as the BreakHis breast cancer dataset. In addition, a text file is created containing the list of paths to the image patches.
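Random patch extraction from one image can be sketched as follows. The number of patches per image and the function name are illustrative assumptions; only the image dimensions (460×700×3) and patch size (50×50×3) come from the text.

```python
import numpy as np

def extract_random_patches(image, n_patches, size=50, rng=None):
    """Crop n_patches random size x size x 3 tiles from one H&E image.

    Each patch inherits the label of its source image; the aggregation step
    later corrects for unrepresentative patches via majority voting.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = image.shape
    patches = []
    for _ in range(n_patches):
        top = rng.integers(0, h - size + 1)
        left = rng.integers(0, w - size + 1)
        patches.append(image[top:top + size, left:left + size, :])
    return np.stack(patches)

# BreakHis images are 460 pixels high and 700 pixels wide, with 3 channels.
image = np.zeros((460, 700, 3), dtype=np.uint8)
patches = extract_random_patches(image, n_patches=8)
```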
Step 2: Database creation. Using the patches, we create a mean file holding the mean value of each pixel across all patches in the learning database. This mean is subtracted from each pixel to roughly zero-center the data, improving the efficiency of the deep learning algorithm. Then, using the text file created in the previous step, we compute an index by multiplying the length of the path list by 0.8 and slice the list into a training sub-list; the remaining 20% of the list is kept for testing. The datasets are then lists of tuples (holding the patch path, the base path of the training, validation, or testing set, and the class label of each image). We then build the actual training, validation, and testing image sets from these tuples in separately named folders, creating each base directory if it does not already exist.
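The list-slicing split described above is only a few lines; the placeholder file names below are assumptions standing in for the paths read from the Step 1 text file:

```python
# Placeholder patch paths; in the real pipeline these come from the text
# file of patch paths created in Step 1.
paths = [f"patch_{i:05d}.png" for i in range(16176)]

split = int(len(paths) * 0.8)   # index at 80% of the list
train_paths = paths[:split]     # 80% of the patch paths for training
test_paths = paths[split:]      # remaining 20% held out for testing
```

The validation set is then carved out of the training portion in the same way, giving the 59/21/20 split described in the next step.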
Step 3: Build model. As a next step we propose a Deep-Net structure using SeparableConv2D to enforce depth-wise convolutions. The purpose of convolution is to extract useful features from the input: in a convolutional neural network, numerous features are extracted via convolution using filters whose weights are automatically learned during training, and the extracted features are then combined to make decisions. Convolution also takes the spatial relationships of pixels into account. Deep learning has produced many styles of convolution (e.g. 2D, 3D, 1×1, transposed, dilated, spatially separable, depth-wise separable, flattened, grouped, shuffled). In this work, depth-wise separable convolution is used. Common in deep architectures such as MobileNet and Xception, it is applied to a single channel at a time rather than to all channels at once, which requires fewer learnable parameters, reduces computation, and mitigates overfitting. A depth-wise separable convolution comprises two steps: a depth-wise convolution followed by a 1×1 (point-wise) convolution.
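The parameter saving of the two-step factorization can be checked by direct arithmetic (bias terms omitted; the 64-to-128-channel example is illustrative):

```python
def conv_params(k, c_in, c_out):
    # Weights of a standard k x k convolution: one k x k x c_in kernel
    # per output channel.
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    # Depth-wise step: one k x k filter per input channel, followed by a
    # 1 x 1 point-wise convolution that mixes the channels.
    return k * k * c_in + c_in * c_out

# e.g. a 3x3 convolution taking 64 channels to 128:
print(conv_params(3, 64, 128))            # 73728
print(separable_conv_params(3, 64, 128))  # 8768
```

For this layer the separable form needs roughly 8.4× fewer weights, which is where the reduced computation and overfitting resistance come from.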
We will build a classifier trained on 80% of the original breast cancer histopathology image dataset. Of this, we keep 21% of the data for a validation/dev set, thus splitting the dataset into training, validation, and testing sets in the ratio 59% (training) : 21% (validation) : 20% (testing). Using Keras, we define a CNN, calling it BCNet, and train it on our images. We then derive metrics such as the confusion matrix, accuracy, F-score, recall, and precision to analyze the performance of the model. We use ImageDataGenerator from Keras for image data augmentation and to extract batches of images, avoiding loading the entire dataset into memory at once.
The Deep-Net model performs the following operations: use 3×3 convolutional filters, stack these filters on top of each other, perform max-pooling, and use SeparableConv2D for depth-wise separable convolution. The Deep-Net class has a static method build() that takes four parameters: the width and height of the image, its depth (the number of color channels), and the number of classes the network predicts, which for us is 8 (labels 0 through 7). In this method we initialize the model and input shape, using channels-first ordering when setting the shape and the channel dimension. We then define three DEPTHWISE_CONV => RELU => POOL blocks, each with deeper stacking and a greater number of filters. A softmax classifier outputs prediction probabilities for each class, and the model is returned.
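A minimal Keras sketch of such a build() routine is shown below. The filter counts, dropout rates, and dense-layer width are illustrative assumptions, not the authors' exact configuration (Table 4 in the paper gives the real architecture), and channels-last ordering is used here for simplicity where the text uses channels-first.

```python
from tensorflow.keras import layers, models

def build_bcnet(width=50, height=50, depth=3, classes=8):
    """Sketch of the proposed BCNet: stacked separable-conv blocks."""
    model = models.Sequential([
        layers.Input(shape=(height, width, depth)),
        # Block 1: depth-wise separable 3x3 convolution, then pool.
        layers.SeparableConv2D(32, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        # Block 2: two stacked separable convolutions, more filters.
        layers.SeparableConv2D(64, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.SeparableConv2D(64, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        # Block 3: three stacked separable convolutions.
        layers.SeparableConv2D(128, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.SeparableConv2D(128, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.SeparableConv2D(128, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        # Classifier head: softmax over the 8 tumor sub-classes.
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(classes, activation="softmax"),
    ])
    return model
```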
Step 4: Train model. Here we train and evaluate our model, importing from keras, sklearn, the Deep-Net module, config, imutils, matplotlib, numpy, and os. First, we set initial values: the number of epochs to 200-400, the learning rate to 1e-2, and the batch size to 64. We get the number of image paths in the three directories for training, validation, and testing, then adjust class weights to deal with the class imbalance problem. We initialize the model with the Adagrad optimizer and compile it with a categorical cross-entropy loss function; the model is then trained on the given dataset.
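The class-weight adjustment can be done with scikit-learn's balanced weighting. For illustration only, the sketch below uses the binary patch counts quoted in Step 1 (10,960 malignant vs 5,216 benign); the real model weights all 8 sub-classes the same way.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Labels reconstructed from the Step 1 patch counts (assumed encoding:
# 1 = malignant, 0 = benign).
y = np.array([1] * 10960 + [0] * 5216)

# "balanced" weights each class by n_samples / (n_classes * class_count),
# so the minority (benign) class receives the larger weight.
weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)
class_weight = dict(enumerate(weights))  # Keras expects a {label: weight} dict
```

The resulting dictionary is what `model.fit(..., class_weight=class_weight)` consumes, so under-represented classes contribute proportionally more to the loss.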
Step 5: Image labeling. After the patch classification results are computed, the image-wise classification is obtained by majority voting: the most frequent patch label is selected as the image label.
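This aggregation step is a simple mode over the patch-level predictions (the function name and example labels are illustrative):

```python
from collections import Counter

def image_label(patch_labels):
    # The most frequent patch-level prediction becomes the image label.
    return Counter(patch_labels).most_common(1)[0][0]

# e.g. 8 patch predictions from one image; class 2 wins with 5 of 8 votes.
print(image_label([2, 2, 5, 2, 0, 2, 5, 2]))  # -> 2
```

This is exactly the hard-voting rule from the ensemble section, applied across patches of one image rather than across classifiers.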
Step 6: Evaluation. In this section we present an extensive experimental evaluation, shown in Table 5, of the proposed model (architecture shown in Table 4) on the BreakHis dataset, in order to showcase its better performance over the global feature extraction method using metrics such as accuracy, F-score, precision, and recall. The steps above were defined for 40×, i.e., we trained the model at 40× magnification; as this tends to be the most informative magnification factor, we then tested the model on images of the other magnifications, i.e., 100×, 200×, and 400×.
However, it would seem that a model trained at one magnification may not handle images at other magnifications with similar accuracy, and different classifiers are required at different magnifications. Moreover, in cases where large variation in patient scores exists, a decision may not be reliable when only one magnification level is considered.

Comparison with State-of-The-Art
To endorse the viability of the proposed deep architecture for BC sub-classification, we compare our results with state-of-the-art work. In Table 7 we assemble the best outcomes obtained in this work alongside other CNN-based approaches presented in [38], [42], and [50], applied to the same breast cancer dataset as used in this paper. The results show that the proposed strategy beats many contemporary strategies, except in some cases. The objective of this work is to comprehensively analyze the sub-class classification performance of the proposed model across all optical magnification levels.

Conclusion
The work presented here proposed a Deep-Net Model based on the extraction of random patches and depth-wise convolutions, an enhancement over the traditional way of using Deep-Net models for image classification. An exhaustive comparison of the proposed model on 8 sub-class classification of histopathology breast cancer images is presented against a transfer-learning-based global feature extraction method using ensemble voting classifiers. Results show better performance of the proposed Deep-Net model over the global feature extraction method. The examination also reveals that a model trained at one magnification may not handle images at other magnifications, raising the need for a magnification-independent deep learning model to classify benign and malignant cases. The study opens some questions about the scale-invariance properties of feature-classifier combinations and the role of ensemble classification, considering that a magnification-specific model requires relatively less training than a magnification-independent one. This research is a foundation for our future work on the integration of deep learning and blockchain technology.

Declaration
We as authors declare that there is no conflict of interest.

Authors
Prof. Vandana Kate received the BE degree in Computer Engineering from Jabalpur Engineering College in 2002, the ME degree in Computer Engineering from the Institute of Engineering & Technology affiliated to DAVV University, Indore, in 2010, and is currently pursuing a PhD in Computer Engineering from the same university. She is working as an Associate Professor at the Acropolis Institute of Technology and Research and has more than 15 years of teaching experience. Her research interests include pattern recognition, machine learning, medical image analysis, and computer vision. She has many publications in leading international journals and peer-reviewed conferences.