Oral Malignancy Detection Using Color Features from Digital True Color Images

One of the most prevalent forms of cancer worldwide is oral cancer, which has a high mortality rate. Diagnosing and treating oral premalignant lesions at an early stage reduces the death rate. The objective of this work is to detect malignancies by analyzing color features of digital true color oral images. A dataset of 433 oral lesion images, covering benign, premalignant and malignant lesions, has been created, and the proposed method was evaluated on this dataset. Different classifiers have been trained using various color features. The neural network classifier detects abnormalities with an accuracy of 94.82%. The results indicate that color features have strong potential for distinguishing benign from malignant oral lesions.


Introduction
Worldwide, cancer has a high fatality rate and is considered a serious health disorder. Oral cancer is one of the most prevalent cancers in India. It is rampant in North India and the northern part of Karnataka, mainly due to excessive consumption of carcinogenic products such as tobacco and alcohol [1]. Early diagnosis of oral malignancies can considerably reduce oral cancer mortality.
Any wound, injury or ulcer in any part of the oral cavity is referred to as an oral lesion. Oral lesions may be benign, premalignant or malignant. Benign lesions are non-cancerous, harmless lesions; examples include aphthous ulcers, cysts and papillomas. Premalignant lesions tend to transform into cancerous lesions if they are not treated at an early stage. Leukoplakia, erythroplakia, lichen planus and submucous fibrosis are examples of potentially premalignant lesions [2]. Malignant lesions are life-threatening cancerous lesions such as oral squamous cell carcinoma, verrucous carcinoma and adenocarcinoma. Many lives can be saved if premalignant and malignant lesions are detected and treated early.
However, most premalignant lesions are asymptomatic, i.e., painless and not interfering with routine functions, which makes early lesions difficult to identify and treat. Biopsy is the gold standard for diagnosing oral lesions as benign or malignant. Because it is invasive, expensive and time-consuming, patients are often unwilling to undergo it. Moreover, oral malignancies are predominant in low socio-economic groups and rural areas, where investigative procedures such as biopsy may not be readily available.
To address these issues, new techniques are needed that are affordable, accessible and available in a timely manner to people from all sections of society. A Computer Aided Diagnosis (CAD) system is a technological advancement that can help doctors diagnose oral malignancies. In this direction, the current work focuses on developing a CAD system that captures oral images of patients, analyzes them using different techniques and generates a preliminary report on any malignancies found.
Researchers have used various image features such as color, texture [3] and shape to detect malignancies in the oral cavity. Chodorowski et al. [4] used color features to differentiate oral lichenoid reactions from leukoplakia. A semi-automatic method is proposed in [5], in which Red-Green-Blue (RGB) images are analyzed by converting them into single-band images in color spaces such as normalized RGB and Hue-Saturation-Intensity (HSI). Higher-order spectra features and Local Binary Patterns [6] extracted from 158 microscopic images have been used with a Support Vector Machine (SVM) classifier to classify benign and malignant lesions. The authors of [7] used histogram features and Gray Level Co-occurrence Matrix (GLCM) features with a linear SVM classifier to classify biopsy images of oral lesions as normal or malignant. GLCM, Gray Level Run Length (GLRL) and first-order intensity-based features have been used in [8] with a neural network classifier to classify color oral images into different classes.
The literature reviewed above shows that color is one of the most important features for differentiating oral malignancies from benign lesions. The present work therefore uses color features to analyze color oral images and identify malignancies. The rest of the paper is organized as follows. Section 2 explains data collection and dataset creation. Section 3 describes the techniques used in this work. Section 4 presents the results, and Section 5 concludes the paper.

Image Dataset Creation
An appropriate dataset is essential for any research work, but no benchmark dataset of oral images is currently available for researchers to evaluate their work. To overcome this limitation, a standard dataset has been created by collecting images of oral lesions from various medical colleges and hospitals across Karnataka over a period of three years. These include BLDE Medical College, Vijayapura; Al-Ameen Medical College, Vijayapura; District Hospital, Vijayapura; Hasanamba Dental College, Hassan; and a few private hospitals in Hassan. The images were captured using digital cameras of different resolutions, so these true color digital images varied considerably. Appropriate preprocessing techniques were therefore applied to create a standard dataset of oral cavity images. The dataset consists of 433 oral images, of which 346 show malignant lesions and 87 show benign lesions. An image may contain more than one non-contiguous lesion area, and even a single contiguous lesion area can vary considerably in appearance. To increase the size of the dataset, multiple patches were therefore created from every image, giving a total of 1642 patches. Fig. 1 shows a few images of benign and malignant oral lesions from the dataset.
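The patch-based enlargement of the dataset can be illustrated with a short NumPy sketch. The sketch assumes a binary mask that marks the region of interest, and it tiles the image on a regular grid. The tile size, stride and minimum mask coverage (`patch_size`, `stride`, `min_coverage`) are hypothetical values chosen for illustration. The paper does not report how its 1642 patches were sized or placed.

```python
import numpy as np


def extract_patches(image, mask, patch_size=32, stride=32, min_coverage=0.9):
    """Tile `image` on a regular grid and keep tiles lying mostly inside `mask`.

    patch_size, stride and min_coverage are hypothetical defaults; the paper
    does not report the values it used.
    """
    h, w = mask.shape
    patches, origins = [], []
    for top in range(0, h - patch_size + 1, stride):
        for left in range(0, w - patch_size + 1, stride):
            window = mask[top:top + patch_size, left:left + patch_size]
            # Fraction of pixels in this tile that belong to the region.
            if window.mean() >= min_coverage:
                patches.append(image[top:top + patch_size, left:left + patch_size])
                origins.append((top, left))
    return patches, origins


# Example: a 64x64 image whose upper half is marked as lesion.
image = np.zeros((64, 64, 3), dtype=np.uint8)
lesion_mask = np.zeros((64, 64), dtype=bool)
lesion_mask[:32, :] = True
patches, origins = extract_patches(image, lesion_mask)
print(origins)  # [(0, 0), (0, 32)]
```

Applying the same routine to a mask of normal mucosa would yield the normal-area patches used later for comparison with lesion patches.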

Proposed Method
The present work aims to develop a Computer Aided Diagnosis system that analyzes oral lesion images and identifies them as benign or malignant. The steps involved are shown in Fig. 2. Color oral images are captured with a good-resolution digital camera. Since the acquired images differ in size and resolution, they are preprocessed to a uniform dimension. Histogram equalization is then performed to improve image contrast. Next, segmentation extracts the region required for analysis. From the segmented region, patches are derived from the lesion area as well as from normal areas, and the most appropriate color features are computed from these patches. Finally, these features are analyzed with different classifiers to distinguish malignant from non-malignant lesions.
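The preprocessing steps above (resizing to a uniform dimension, then histogram equalization) can be sketched in NumPy as follows. Several choices here are assumptions, since the paper specifies none of them: the 256 × 256 target size, nearest-neighbour interpolation, and equalizing each RGB channel independently.

```python
import numpy as np


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize (the interpolation method is an assumption)."""
    h, w = image.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return image[rows[:, None], cols[None, :]]


def equalize_channel(channel):
    """Histogram-equalize one uint8 channel through its cumulative histogram."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    total = cdf[-1]
    if total == cdf_min:  # constant channel: nothing to stretch
        return channel.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    return np.clip(lut, 0, 255).astype(np.uint8)[channel]


def preprocess(rgb, size=(256, 256)):
    """Resize to a uniform size, then equalize each RGB channel independently."""
    resized = resize_nearest(rgb, *size)
    return np.stack([equalize_channel(resized[..., c]) for c in range(3)], axis=-1)


gray = np.array([[0, 0], [100, 200]], dtype=np.uint8)
print(equalize_channel(gray).tolist())  # [[0, 0], [128, 255]]
```

Equalizing the channels independently can shift hue, which matters when color is the feature of interest. A common alternative is to equalize only a luminance channel (for example, Y in YCbCr) and convert back, which leaves the chroma components unchanged.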

Lesion area segmentation
Images for this work have been captured using mobile phone cameras. The captured images include unwanted regions such as teeth, lips and other areas of the oral cavity. These must be excluded so that only the region of interest, i.e., the lesion area, is retained. Earlier works have applied many segmentation techniques to select the lesion portion. Region of interest selection from dental panoramic images using active contours [9] is performed through contour initialization and appropriate parameter setting. A hybrid fuzzy C-means (FCM) segmentation technique has been proposed in [10] for region of interest segmentation from dental images. The authors of [11] used a Gabor-based texture method followed by watershed segmentation to segment the epithelial layer from oral histological images. A marker-controlled watershed segmentation approach for dental radiographs has been proposed in [12]. In the present work, a threshold-based segmentation technique is used to segment the lesion region from the input image.
The RGB images are converted into different color models, namely Hue-Saturation-Value (HSV), YCbCr (luma Y, blue-difference chroma Cb and red-difference chroma Cr) and L*a*b*. The individual color bands are then extracted from these color spaces. A color thresholder application was used to determine threshold values for removing the teeth, lips and other unwanted regions from the images. The YCbCr color space gave better results than the other color spaces in segmenting the lesion portion from the oral cavity image: by setting the thresholds Cbmax and Crmax to appropriate values, it was possible to obtain the masked image.
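The YCbCr thresholding step can be sketched as below. The conversion uses the full-range ITU-R BT.601 coefficients (the JPEG convention). The mask keeps pixels whose Cb and Cr values do not exceed `cb_max` and `cr_max`. Both the direction of the comparison and the default values (120 and 170) are hypothetical, since the paper reports neither its thresholds nor its conversion convention.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Full-range ITU-R BT.601 (JPEG-style) conversion for 8-bit RGB input."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


def ycbcr_threshold_mask(rgb, cb_max=120.0, cr_max=170.0):
    """Keep pixels with Cb <= cb_max and Cr <= cr_max (hypothetical thresholds)."""
    ycc = rgb_to_ycbcr(rgb)
    return (ycc[..., 1] <= cb_max) & (ycc[..., 2] <= cr_max)


def apply_mask(rgb, mask):
    """Zero out every pixel outside the mask, producing the 'masked image'."""
    out = rgb.copy()
    out[~mask] = 0
    return out


# A white (tooth-like) pixel next to a pink (mucosa-like) pixel.
pixels = np.array([[[255, 255, 255], [220, 150, 150]]], dtype=np.uint8)
mask = ycbcr_threshold_mask(pixels)
print(mask.tolist())  # [[False, True]]
```

Thresholds picked interactively in one convention do not carry over to another. For instance, MATLAB's `rgb2ycbcr` maps 8-bit input to the studio-swing range (Y in [16, 235], Cb and Cr in [16, 240]).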
The algorithm for lesion area segmentation is as follows:
1. Take as input the RGB image captured with the mobile phone camera. This image contains all areas of the oral cavity, such as the teeth, lips, tongue and buccal mucosa, as well as other unwanted areas.
2. Convert the image to YCbCr and eliminate the unwanted areas by applying the threshold values Cbmax and Crmax.
3. Select the pixels whose red and blue channels are both non-zero as the initialization mask.
4. Apply active contour-based segmentation with the initialization mask to obtain the lesion area from the image.
After the unwanted areas are eliminated, the most appropriate lesion portion must be extracted for further analysis. For this purpose, the region-based active contour method is used. In this method, an image is divided into regions, and the pixels within a region are assumed to have similar gray-level values [13]. An active contour requires an initial region, which is grown to include the lesion area; the initialization mask from step 3 serves this purpose. Fig. 3 illustrates the result of threshold-based segmentation.
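The initialization and region-growing steps can be sketched as follows. `initialization_mask` follows the text directly: it selects pixels whose red and blue channels are both non-zero. `region_active_contour` is a simplified stand-in for a region-based active contour, not the authors' implementation. It applies the data-term update of the morphological Chan-Vese method, in which a boundary pixel joins the region when it is closer to the mean intensity inside than to the mean outside. It replaces that method's curvature smoothing operators with a 3 × 3 majority vote. The grayscale input, the iteration count and the weights `lambda1` and `lambda2` are assumptions.

```python
import numpy as np


def initialization_mask(masked_rgb):
    """Pixels whose red and blue channels are both non-zero (per the text)."""
    return (masked_rgb[..., 0] > 0) & (masked_rgb[..., 2] > 0)


def majority_smooth(u):
    """3x3 majority vote on a binary mask (a crude stand-in for curvature smoothing)."""
    padded = np.pad(u.astype(np.int32), 1, mode="edge")
    h, w = u.shape
    votes = sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
    return votes >= 5


def region_active_contour(gray, init, num_iter=100, lambda1=1.0, lambda2=1.0):
    """Simplified morphological Chan-Vese-style evolution of a binary region."""
    u = init.astype(bool).copy()
    img = gray.astype(np.float64)
    for _ in range(num_iter):
        if u.all() or not u.any():
            break
        c_in, c_out = img[u].mean(), img[~u].mean()
        # Negative where a pixel fits the inside mean better than the outside mean.
        data = lambda1 * (img - c_in) ** 2 - lambda2 * (img - c_out) ** 2
        # Update only a thin band around the current boundary (|grad u| > 0).
        gy, gx = np.gradient(u.astype(np.float64))
        band = (gx != 0) | (gy != 0)
        new_u = u.copy()
        new_u[band & (data < 0)] = True
        new_u[band & (data > 0)] = False
        new_u = majority_smooth(new_u)
        if np.array_equal(new_u, u):
            break
        u = new_u
    return u


# Synthetic example: a bright 20x20 "lesion" on a dark background.
gray = np.full((40, 40), 20.0)
gray[10:30, 10:30] = 200.0
rgb = np.zeros((40, 40, 3), dtype=np.uint8)
rgb[15:25, 15:25] = (180, 90, 90)  # pixels that survived thresholding
seg = region_active_contour(gray, initialization_mask(rgb))
```

In practice, a maintained implementation such as `skimage.segmentation.morphological_chan_vese` would be more robust than this sketch. It accepts an initial level set, so the same initialization mask could be passed to it.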

Color features extraction
After the lesion area has been segmented, patches are extracted from the lesion area as well as from normal areas. Color features are then computed from these patches and analyzed to distinguish cancerous from non-cancerous lesions. Color features have been used by other researchers for lesion analysis [14] [15]. To analyze color, individual bands from four color spaces are used: Red-Green-Blue (RGB), Hue-Saturation-Value (HSV), YCbCr and L*a*b*. The color features used are the mean, standard deviation, variance and skewness. These features are computed for every individual color band, which gives (4 color spaces) × (3 color bands per color space) × (4 color moments) = 48 features. In addition, 7 features computed from the differences in the mean values of red, green, blue, hue and saturation between lesion areas and normal areas are used as discriminative features. Altogether, 55 features are used to analyze oral lesions.
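The 48 color-moment features can be computed with the NumPy sketch below. The conversion formulas are standard: HSV with hue scaled to [0, 1), full-range BT.601 YCbCr, and sRGB to CIE L*a*b* under a D65 white point. However, the choice of these particular conventions, the value scales and the feature ordering are assumptions, since the paper does not state them. Skewness is taken as the third central moment divided by the cube of the standard deviation, and it is set to 0 for a constant band. The seven lesion-versus-normal mean-difference features are left out because the text does not fully specify how they are formed.

```python
import numpy as np


def rgb_to_hsv(rgb01):
    """RGB in [0, 1] -> HSV with hue scaled to [0, 1)."""
    r, g, b = rgb01[..., 0], rgb01[..., 1], rgb01[..., 2]
    maxc, minc = rgb01.max(axis=-1), rgb01.min(axis=-1)
    delta = maxc - minc
    d = np.where(delta > 0, delta, 1.0)  # avoid division by zero for grays
    h = np.where(maxc == r, ((g - b) / d) % 6.0,
                 np.where(maxc == g, (b - r) / d + 2.0, (r - g) / d + 4.0))
    h = np.where(delta > 0, h / 6.0, 0.0)
    s = np.where(maxc > 0, delta / np.where(maxc > 0, maxc, 1.0), 0.0)
    return np.stack([h, s, maxc], axis=-1)


def rgb_to_ycbcr(rgb255):
    """Full-range BT.601 YCbCr on the 0-255 scale."""
    r, g, b = rgb255[..., 0], rgb255[..., 1], rgb255[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


def rgb_to_lab(rgb01):
    """sRGB in [0, 1] -> CIE L*a*b* (D65 white point)."""
    lin = np.where(rgb01 <= 0.04045, rgb01 / 12.92, ((rgb01 + 0.055) / 1.055) ** 2.4)
    m = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = lin @ m.T / np.array([0.95047, 1.0, 1.08883])
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def color_moments(band):
    """Mean, standard deviation, variance and skewness of one color band."""
    x = np.asarray(band, dtype=np.float64).ravel()
    mu, var = x.mean(), x.var()
    sd = np.sqrt(var)
    skew = ((x - mu) ** 3).mean() / sd ** 3 if sd > 0 else 0.0
    return [mu, sd, var, skew]


def color_moment_features(rgb_patch):
    """48 features: 4 color spaces x 3 bands x 4 moments, in that nesting order."""
    rgb255 = rgb_patch.astype(np.float64)
    rgb01 = rgb255 / 255.0
    spaces = [rgb255, rgb_to_hsv(rgb01), rgb_to_ycbcr(rgb255), rgb_to_lab(rgb01)]
    feats = []
    for space in spaces:
        for c in range(3):
            feats.extend(color_moments(space[..., c]))
    return np.array(feats)


patch = np.full((8, 8, 3), 128, dtype=np.uint8)
print(color_moment_features(patch).shape)  # (48,)
```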

Fig. 3. Result of threshold-based segmentation (panels: original image; image after thresholding; RoI selection after applying active contour)

Experimental Results
The experiments were conducted on the 433 images in the dataset, and all 55 features were derived from these images. Earlier studies [16] [17] report the use of different classifiers for lesion classification. In the present work, four classifiers have been used: K-Nearest Neighbor (K-NN), Support Vector Machine (SVM), Naive Bayes and Artificial Neural Network (ANN). The extracted color features were used to train these classifiers to classify the images as benign or malignant. The K-NN classifier uses k = 5 neighbors and the Minkowski distance metric with power parameter p = 2, i.e., the Euclidean distance. A linear SVM classifier and a Gaussian Naive Bayes classifier were also trained on the extracted features. The neural network classifier has three layers: one input layer, one hidden layer and one output layer. It uses the ReLU activation function and the Adam optimizer, with mean squared error as the loss function.
To analyze classification performance, experiments were carried out with training and testing splits in the ratios 80:20, 70:30 and 60:40. For every ratio, the experiments were repeated with randomly chosen training and testing sets. Accuracy, sensitivity and specificity are used to evaluate classification performance. Table 2 presents the performance of the different classifiers. With 80% of the samples used for training, the neural network classifier achieves the highest performance in classifying lesions as benign or malignant, followed by the K-NN, SVM and Naive Bayes classifiers. The ROC curves for the different classifiers are given in Fig. 4, and Fig. 5 depicts the classification accuracy of the different classifiers.
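The evaluation protocol can be sketched in NumPy for the K-NN classifier. The sketch uses k = 5 neighbours with the Minkowski distance at p = 2 and repeated random train/test splits. Accuracy, sensitivity and specificity are computed from the confusion counts, with malignant as the positive class (label 1). The number of repetitions, the random seed and the z-score feature scaling are assumptions not stated in the paper.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=5, p=2):
    """Majority vote among the k nearest training samples (Minkowski-p distance)."""
    diff = np.abs(X_test[:, None, :] - X_train[None, :, :])
    dist = (diff ** p).sum(axis=2) ** (1.0 / p)  # p = 2 is the Euclidean distance
    nearest = np.argsort(dist, axis=1)[:, :k]
    # Labels: 0 = benign, 1 = malignant; an odd k avoids ties.
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity with malignant (1) as positive."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


def evaluate_knn(X, y, train_fraction, repeats=10, seed=0):
    """Mean (accuracy, sensitivity, specificity) over repeated random splits."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_fraction * len(y)))
    scores = []
    for _ in range(repeats):
        idx = rng.permutation(len(y))
        tr, te = idx[:n_train], idx[n_train:]
        # Assumption: z-score features using training-set statistics only.
        mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0) + 1e-12
        pred = knn_predict((X[tr] - mu) / sd, y[tr], (X[te] - mu) / sd)
        scores.append(binary_metrics(y[te], pred))
    return np.nanmean(np.array(scores, dtype=float), axis=0)


# Synthetic, well-separated data standing in for the 55-feature matrix.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(10.0, 1.0, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
for frac in (0.8, 0.7, 0.6):
    print(frac, evaluate_knn(X, y, frac))
```

The loop over 0.8, 0.7 and 0.6 mirrors the three split ratios. In scikit-learn, the corresponding models would be `KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)`, `SVC(kernel="linear")` and `GaussianNB()`. Note that `MLPClassifier(activation="relu", solver="adam")` optimizes log-loss rather than the mean squared error described above.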

Conclusion
In this paper, color features have been explored for characterizing benign and malignant oral lesions. A dataset of 433 color oral lesion images has been created. Experiments were conducted on the images in this dataset, and the effect of the various features was studied using four different classifiers. The experimental results show that color features are effective in differentiating malignant lesions from benign lesions. The proposed features may also be applicable to multiclass classification of malignant lesions.

Fig. 1. Images of benign and malignant lesions from the dataset

Fig. 4. Performance Analysis of different classifiers
B.R. is currently serving as Assistant Professor in the Department of Information Science and Engineering at Malnad College of Engineering, Hassan, Karnataka, India. She has 17 years of teaching experience and is currently pursuing her PhD in image processing and computer vision. She has published four research articles in international journals and conferences. She is a member of professional societies such as CSI and ISTE. Email: brn@mcehassan.ac.in
Dr. Geetha Kiran A. is currently serving as Professor & Head, Department of Computer Science and Engineering at Malnad College of Engineering, Hassan, Karnataka, India. She has about 23 years of teaching experience and received her Ph.D. in Image Processing from Mysore University. She is an active researcher and academician whose research areas include image processing, computer vision, machine learning and applications of Python programming. Her research work entitled "Fruit Crop Yield Estimation Using Machine Vision Techniques" received the "TEQIP Competitive Research Grant". In April 2020, she was awarded a "Certificate of Appreciation" by the Deputy Superintendent of Police, Hassan, Karnataka State Police, for developing a web-based application for monitoring COVID-19 quarantined persons. In July 2020, she received the "Most Influential Educational Leadership Award" from the Golden AIM awards for her excellence and leadership in education.
iJOE, Vol. 16, No. 14, 2020

Table 2. Classification performance of different classifiers for different ratios of training and testing samples