An Efficient Hybrid Classifier for Cancer Detection

The early detection of cancer in both healthy and high-risk populations offers an increased opportunity for treatment with curative intent. In this paper, we propose a hybrid classifier that produces an efficient classification system for cancer detection in cell datasets. The first part of this work investigates the performance of artificial neural networks (ANN), namely the Self-Organizing Feature Map (SOM) and Learning Vector Quantization (LVQ); the second part investigates the performance of the Decision Tree (DT) and its pruning model. In the third part, we present our proposal for a new hybrid classifier based on the Random Forest (RF) and the combination of the LVQ and DT. Experimental results indicate that the proposed hybrid classifier effectively avoids the drawbacks of the individual classifiers and has high anti-noise performance.

Keywords—Self-Organizing Map, Learning Vector Quantization, Decision Tree, Cancer Detection, Hybrid Classifier, Bootstrap Sampling


Introduction
The use of artificial intelligence in biomedical engineering includes three phases: sensor input, signal processing, and classification. A few classifiers, such as the Decision Tree (DT) and Learning Vector Quantization (LVQ), have near-optimal performance across different databases [4,17,28]. However, only a few classifiers are used in biomedical databases, where their performance depends on the database and the particular operating conditions [3,6,18]. On the other hand, new methods have been proposed to improve the performance of individual classifiers, such as the Genetic Algorithm [15], the Hybrid Genetic Algorithm [15], and Swarm Optimization [26,27], to list a few. However, the improved classifiers still inherit flaws from their basic algorithms and introduce complicated coding issues, in many cases without any discussion of classification time. The Self-Organizing Feature Map (SOM) represents a simple neural-network-based classifier, built on the Kohonen model, that utilizes the principle of competition between neurons [2,3]. The SOM was developed based on the function of human brain neurons [1]. Unsupervised learning classifiers such as the SOM have evolved into supervised learning classifiers such as the LVQ [23].
Under the microscope, images of tumor cells disclose important information related to the possibility of having cancer. The database from the University of Wisconsin includes 357 benign cases and 212 malignant cases [19]. Features of a typical cell from the database include radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension [20]. These ten features are extracted from a digitized image of a fine needle aspirate of a breast mass and describe the characteristics of each cell nucleus found in the tumor image. In addition, the average, standard deviation, and worst value of the ten characteristics are calculated.
For each case, the database has thirty extracted features: the average values of the ten features form the 1st to 10th features, their standard deviations form the 11th to 20th features, and their worst values form the 21st to 30th features. Table 1 illustrates the equations used to calculate the ten characteristics of cell images, where μ = (p_1 + p_2 + ... + p_N)/N, p_i is the gray-scale value, N is the number of pixels in the moving window, E is the ratio of the number of pixels to the real scale of the cell, q is the number of pixels in the covered area of the tumor cell image, h is the number of points on the perimeter, i is the point serial number, θ_2 is the centroid of the tumor cell image, P⃗_i is the coordinate of one point on the perimeter of the tumor cell image, P_1, P_2, and P_3 are the initial point, the point with the highest curvature, and the final point, respectively, V⃗ is the direction vector of concavity, P_i is the point with the highest curvature in the tumor cell image, and θ_1 is the centroid of the symmetry lines from the standard non-tumor image.
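The construction of the thirty features per case can be sketched in a few lines of pure Python (the nucleus values below are hypothetical toy numbers, not from the database): for each of the ten characteristics, the mean over all nuclei gives features 1-10, the standard deviation gives features 11-20, and the worst (largest) value gives features 21-30.

```python
import math

def case_features(per_nucleus):
    """per_nucleus: list of 10-element characteristic vectors, one per nucleus."""
    n = len(per_nucleus)
    means, stds, worsts = [], [], []
    for j in range(10):
        col = [row[j] for row in per_nucleus]
        mu = sum(col) / n
        means.append(mu)
        stds.append(math.sqrt(sum((v - mu) ** 2 for v in col) / n))
        worsts.append(max(col))
    return means + stds + worsts  # 30 features in the database's order

# Toy example: 3 nuclei, 10 characteristics each (hypothetical values).
nuclei = [[1.0] * 10, [2.0] * 10, [3.0] * 10]
feats = case_features(nuclei)
print(len(feats))            # 30
print(feats[0], feats[20])   # mean of first characteristic 2.0, worst value 3.0
```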

Artificial Neuron Network with Competition Layer
This section discusses the competition layer ANN as a mathematical model, theoretical basis, and simulation steps. The first part of this section discusses the unsupervised learning competition SOM, while the second part of this section discusses the supervised learning classifier LVQ.

Self-Organizing Map (SOM)
The classification idea is based on the principle that any external stimulus causes changes to a neuron's internal parameters rather than to its position [3]. This change generates specialized tissue as a reaction to the external stimulus. As the external stimuli keep arriving, the change in the internal parameters continues to grow, which generates a clustering function. For the interaction of near, local neurons in the self-organizing cluster, some studies express this interaction as neural lateral interaction [3]. Within a circle centered on the neuron that sends messages, the adjacent neurons are excited while the far neurons remain in a passive state.
For the mathematical model of the SOM [1], the j-th neural input in the competition layer is expressed by (1), where x represents the external signal and w_j represents the weight parameters between the j-th competition-layer neuron and the input layer. The inner product of x and w_j is shown in (2).
The j-th output of the competition layer can be described as shown in (3),
where N_j represents the group of near neurons, c_jk represents the weighted parameters between near neurons, and f(·) represents some nonlinear loss. If the j-th neuron wins the specific stimulus, its output y_j will be one, and the competition layer acts as the output layer. Because of the interaction, the weighted parameters change with each stimulus, as shown in (4), and thus there must be a group of neurons that respond to the stimulus trigger.
where w_j represents the weighted vector, x represents the input vector, and α represents the adjustment number. If neuron j belongs to the winning group, its output y_j should be one and the adjustment will be α, which drives w_j toward the same stimulus, while the weighted parameters decrease in value. If w_j does not change in this group no matter how many times the stimulus triggers, w_j is considered settled beyond doubt. Also, if neuron j is not in the winning group, y_j and the adjustment will be 0, indicating that its parameters do not change, as shown in (5). This procedure is captured in the learning process of the SOM.
At the end of the learning process, α represents the study rate, which is a gradually decreasing function of a single variable, the study time. Therefore, α may be represented by a function such as α = 1/t, where t is the study time. MATLAB offers a basic programming mode for setting up the competition network [10]. In this article, for the first simulation, the SOM is used to completely classify 100 different patients randomly extracted from the cell tumor (CT) database, which deeply probes the performance of an ANN with a competition layer. Since there are 100 different patients, the number of neurons is set to 100. For the second simulation, the SOM is used to classify two cell categories: malignant and benign.
One can state that three key parameters affect the performance of an ANN with a competition layer: the learning/iteration time, the number of neurons in the competition layer, and the study rate, as indicated in Ref. [5]. In the first simulation, with 600 training samples and 168 testing samples, the learning/iteration time is changed gradually through 10, 20, 50, 100, 250, 500, and 1500 to find a suitable iteration count, while the study rate is set to 0.01 to obtain fast and accurate classification [1,13]. In the second simulation, the suitable iteration count is kept fixed to find a suitable study rate, and the number of neurons is set to 2 so the SOM classifies the two cell categories. The testing database for the second simulation consists of 569 cases from the CT Database. Figure 1 shows the flowchart of designing the SOM. There are two tests in the whole process: in the first, the SOM should classify 100 different patients; in the second, the SOM should classify 569 patients into two categories.
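The competitive learning loop described above can be sketched compactly (a pure-Python toy, with hypothetical data; the neighborhood interaction of (3)-(4) is reduced to a winner-take-all update for brevity): the winning neuron is the one whose weight vector is closest to the input, and only the winner moves toward the stimulus with a study rate α that decays as 1/t.

```python
import random

def train_som(inputs, n_neurons, iterations, seed=0):
    rng = random.Random(seed)
    dim = len(inputs[0])
    W = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    for t in range(1, iterations + 1):
        alpha = 1.0 / t                      # decreasing study rate, alpha = 1/t
        for x in inputs:
            # competition: the smallest Euclidean distance wins
            win = min(range(n_neurons),
                      key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, W[j])))
            # only the winner's weights change; losing neurons stay put, as in (5)
            W[win] = [wi + alpha * (xi - wi) for xi, wi in zip(x, W[win])]
    return W

def classify(x, W):
    return min(range(len(W)),
               key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, W[j])))

# Two well-separated toy clusters map onto two different neurons.
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
W = train_som(data, n_neurons=2, iterations=50)
print(classify([0.05, 0.0], W) != classify([0.95, 1.0], W))  # True
```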

Learning Vector Quantization (LVQ)
The LVQ is developed from the Kohonen competition function and follows the same principle as the SOM [23]. The main difference between the two networks is that the LVQ uses supervised learning. The LVQ network is composed of three layers (input, competition, and output), so the LVQ can automatically adjust the weighted parameters to obtain a correct classification at the output layer, and it sets the weighted parameters between the competition layer and the output layer. The values of these weighted parameters are always one or zero: a winning neuron is set to one when it is associated with a particular class of input, while zero indicates a failing neuron. The winning neuron in the competition layer is decided by (2), and the process of updating the parameters is similar to (5). In addition, we consider the runner-up neuron in the case of a wrongly winning neuron and a decision restricted to one neuron [13]. By comparing the output-layer result with the actual classification, the LVQ can update the weighted parameters without many iterations and ensure better performance by setting a suitable mean square error (MSE) for training [14].
To find a suitable MSE and iteration count for the proposed Advanced LVQ (ALVQ) classifier, the MSE is set to the maximum value from five individual Fast-Good LVQ classifiers with the largest iteration count. The requirements for a Fast-Good LVQ are: a classification time of less than 10 seconds, and accuracy, sensitivity, and specificity all larger than 85%. Figure 2 shows the flowchart of designing the ALVQ classifier. First, this paper manually finds valid ranges of MSE, iteration count, and number of neurons for Fast-Good LVQs. Second, the five LVQs take 150% of the maximum iteration time with the minimum number of neurons from the previous results. Third, the ALVQ utilizes the maximum MSE with the associated iteration count from the five LVQs. The number of neurons of the ALVQ is set to the maximum number among the Fast-Good LVQs. In this section, the same CT Database of 569 cases, split into 519 training cases and 50 testing cases, is used. To evaluate the performance of the classifiers, the database is randomly realigned ten times before each simulation.
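The core supervised step that distinguishes the LVQ from the SOM can be sketched as a minimal LVQ1 loop (pure Python, hypothetical toy data; the MSE-based stopping rule and the five-LVQ parameter search of the ALVQ are omitted for brevity): the label of the winning neuron is compared against the true class, and the winner moves toward a correctly classified input and away from a misclassified one.

```python
def train_lvq(X, y, prototypes, labels, alpha=0.01, epochs=10):
    for _ in range(epochs):
        for x, t in zip(X, y):
            # competition: the closest prototype wins, as in (2)
            win = min(range(len(prototypes)),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(x, prototypes[j])))
            sign = 1.0 if labels[win] == t else -1.0   # attract if correct, repel if wrong
            prototypes[win] = [w + sign * alpha * (a - w)
                               for a, w in zip(x, prototypes[win])]
    return prototypes

def predict(x, prototypes, labels):
    win = min(range(len(prototypes)),
              key=lambda j: sum((a - b) ** 2 for a, b in zip(x, prototypes[j])))
    return labels[win]

# Toy two-class problem: benign near the origin, malignant near (1, 1).
X = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
y = ["benign", "benign", "malignant", "malignant"]
protos = train_lvq(X, y, [[0.2, 0.2], [0.8, 0.8]], ["benign", "malignant"])
print(predict([0.05, 0.05], protos, ["benign", "malignant"]))  # benign
```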

Fig. 2. Flowchart of the proposed ALVQ Classifier
To evaluate the anti-noise classification performance, this paper uses the ALVQ to conduct another three tests. The first test injects random Gaussian noise into 50% of the training set. The second test adds unknown noise into a quarter of the training set through three random distributions with random parameters. The third test changes the data of 20 patients in the training set to a different value. This process uses the Colon Cancer Database. The whole simulation is conducted ten times due to the random parameters of the noise.
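The first anti-noise test can be sketched as follows (a pure-Python sketch with a hypothetical noise standard deviation, since the paper does not state one): a randomly chosen 50% of the training cases are corrupted with zero-mean Gaussian noise, while the labels stay untouched.

```python
import random

def add_gaussian_noise(X, fraction=0.5, sigma=0.1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    noisy_idx = set(rng.sample(range(n), int(fraction * n)))
    out = []
    for i, row in enumerate(X):
        if i in noisy_idx:
            out.append([v + rng.gauss(0.0, sigma) for v in row])  # corrupt this case
        else:
            out.append(list(row))                                  # keep it clean
    return out

X_train = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
X_noisy = add_gaussian_noise(X_train)
changed = sum(a != b for a, b in zip(X_train, X_noisy))
print(changed)  # 2 -- half of the 4 training rows were perturbed
```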
The influence of the iteration count and the number of neurons in the competition layer is analyzed in the SOM section. Since the SOM does not have an output layer separate from its competition layer, the SOM classifier's performance is not affected by supervised teaching. We can therefore easily find the effects of the iteration count and the number of neurons, with a fixed study rate, on the results without setting the MSE of the LVQ. The results indicate a classification improvement in the learning function and study rate in comparison with other results such as [13,25]. This paper uses a 0.01 study rate, which is enough to make an accurate classification for many small databases within ten iterations, provided there is no significant noise in the training set and the training process is sufficient [13].

Decision Tree
The DT is considered a statistical technique of information reaction that is based on the entropy and information gain from the ID3 algorithm [7]. The entropy Ent(S) of information is used to describe the purity of group information, as shown in (6). If all samples are in the same group, Ent(S) will be zero; in other words, if a system contains mostly the same information, its entropy will be minimal. The DT is constructed with each non-terminal (non-leaf) node representing the selected attribute on which the data is split, while terminal (leaf) nodes represent the class label of the classification decision after computing all attributes [24]. In ID3, the entropy always determines the non-leaf node: the attribute with the smallest entropy, or largest information gain, becomes the non-leaf node in every iteration. The larger the number of distinct values of a single attribute in the data set, the higher the probability of this attribute becoming a non-leaf node. To avoid this drawback of ID3, which tends to adopt more samples in the classifier where many tree branches offer useless details, the C4.5 algorithm is used, in which the gain is replaced by the gain ratio, as shown in (6) [8],
where Gain(S, A) represents the information gain, a represents a distinct value of the group feature A, and p(a) represents the proportion of samples in group a.
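The quantities behind ID3 and C4.5 can be sketched directly (standard definitions, since the paper's equation (6) is referenced but not reproduced here): the entropy of a label list, the information gain of a split, and the C4.5 gain ratio that divides the gain by the split information.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, partitions):
    """partitions: list of label sublists produced by splitting on an attribute."""
    n = len(labels)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in partitions)
    split_info = sum(-(len(p) / n) * math.log2(len(p) / n) for p in partitions if p)
    return gain / split_info if split_info else 0.0

pure = ["benign"] * 8
print(entropy(pure))                              # 0.0 -- all samples in one group
mixed = ["benign"] * 4 + ["malignant"] * 4
print(entropy(mixed))                             # 1.0 -- maximally impure
print(gain_ratio(mixed, [mixed[:4], mixed[4:]]))  # 1.0 -- a perfect split
```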
Furthermore, due to the binary categories of the CT database, Classification and Regression Trees (CART) is a better choice [12]. The CART implementation is similar to C4.5; the only difference is that CART is based on the Gini index, a standard impurity measure [22]. CART can also generate a regression tree, but this work focuses on classification with the CART method. To avoid the overfitting issue, this work uses the pessimistic pruning algorithm [9], which replaces a previous non-leaf node with a leaf node. To maintain classification accuracy and efficiency, the pessimistic pruning algorithm bases its judgment on the classification error. For a leaf node covering N samples with E errors, the ratio of classification error is (E + 0.5)/N. If an inner non-leaf node has L following nodes, the ratio of classification error can be found as shown in (7),
where the constant 0.5 represents the penalty applied when calculating the error ratio of a specific node. In the case of two-group classification, the error follows the Bernoulli distribution. The standard deviation of the error of a sub-node inside a non-leaf node can be found using (8).
Assuming the final leaf node will be cut and its classification error count is J, the ratio of error should be (J + 0.5)/N. If the difference between the non-leaf node error ratio and the sub-node error ratio is larger than one standard deviation, the tree branch will be cut. Figure 3 shows the flowchart of the DT used in this paper. The DT uses the same CT database used by the LVQ. To ensure the correct performance is found, the database is randomly realigned ten times before each classification. Also, the mathematical logic procedure and the confusion matrix are used to compare the performance of the pruning tree from CART against the normal DT from CART. Finally, the same anti-noise tests applied to the LVQ are applied to the pruning CART.
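The pruning test above can be sketched as a single function (a standard Quinlan-style formulation; the exact inequality of the paper's (7)-(8) is paraphrased here under that assumption): a subtree with L leaves covering N samples is replaced by one leaf when the leaf's penalized error rate stays within one standard deviation of the subtree's penalized error rate.

```python
import math

def should_prune(N, subtree_errors, L, leaf_errors):
    e_subtree = (subtree_errors + 0.5 * L) / N          # eq. (7): 0.5 penalty per leaf
    se = math.sqrt(e_subtree * (1.0 - e_subtree) / N)   # Bernoulli std dev, eq. (8)
    e_leaf = (leaf_errors + 0.5) / N                    # penalized error of one leaf
    return e_leaf <= e_subtree + se

# Hypothetical node: 100 samples, a 5-leaf subtree making 8 errors versus a
# single leaf making 10 errors -- the subtree's extra precision does not
# justify its complexity, so the branch is cut.
print(should_prune(N=100, subtree_errors=8, L=5, leaf_errors=10))  # True
```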

Hybrid Classifier
A hybrid classifier is proposed based on the random forest (RF) and the combination of LVQ and DT. The purpose of the hybrid classifier is to overcome the drawbacks of each classifier based on the analyzed performance of individual classifiers.
From the mathematical inference of the ANN with a competition layer, the LVQ must take enough iterations to adjust its weighted parameters. However, not all neurons in the competition layer can react accurately to different inputs: in the case of too large or too small Euclidean distances between some input vectors, the competitive learning process and the random initial weighted parameters limit the classification accuracy. In this case, the LVQ turns this drawback to its advantage by avoiding a badly noisy training set. The DT, by contrast, is more sensitive to perturbations in the training data, making it hard to avoid wrong classifications for noisy training data. Therefore, the LVQ can make up for the drawbacks of the DT, and the DT can make up for the deficient performance of the neurons. When considering the multicollinearity drawback, the DT greedily chooses the most significant group feature while the LVQ chooses the group with the small Euclidean distance; ensemble methods such as RFs can negate this issue.
The RF is considered a multiple-DT-based classifier where the output category is determined by the output mode of the individual trees [10]. When data progresses inside the RF, each DT categorizes the data, and the RF makes the final decision based on the majority vote of the individual DTs. The Bootstrap Sampling method is used to sample the data into training groups equal in number to the DTs used inside the RF [21]. If a group S contains n samples x_1, x_2, ..., x_n, then sampling the group n times with replacement creates a new group S'. For the new group S', the probability of not including a given sample is (1 − 1/n)^n, which approaches 1/e ≈ 0.368 for large n. Thus, we can get a new group with the same number of samples but different from the previous one; the new group S' serves as the training group for a specific DT [11]. The C4.5 method is fast enough to run one hundred Decision Tree models. In summary, we used 100 new groups as 100 training sets for 100 Decision Tree models with the C4.5 method, and the classification result is based on the majority vote of the DT results. Meanwhile, the inherent problems of RFs are that they are hard to interpret and hardly support incremental classification. The DT and the LVQ can provide insight into the classified features, which should efficiently make up for these flaws of RFs.
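The Bootstrap Sampling step can be sketched directly: drawing n samples with replacement from a group of n builds one training set per tree, and the probability that any given sample is left out, (1 − 1/n)^n, is close to 1/e ≈ 0.368 for large n (the group of integers below is a toy stand-in for the training cases).

```python
import math
import random

def bootstrap(samples, seed=0):
    # draw len(samples) items with replacement to form S'
    rng = random.Random(seed)
    return [rng.choice(samples) for _ in samples]

n = 1000
print(round((1 - 1 / n) ** n, 3))   # 0.368 -- probability a sample is excluded
print(round(1 / math.e, 3))         # 0.368 -- the limiting value

S = list(range(n))
S_prime = bootstrap(S)
left_out = len(set(S) - set(S_prime)) / n
print(0.3 < left_out < 0.45)        # True -- roughly a third is left out
```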
The proposed hybrid classifier consists of two sub-hybrid classifiers, and its framework is presented in Figure 4. The first sub-hybrid classifier bases its classification decision on two classifiers: the LVQ-based classifier uses the ALVQ with the suitable training MSE and iteration count from the five LVQs, and the DT-based classifier uses the pruning CART method. The other sub-hybrid classifier employs the RF, where one hundred C4.5 DTs, based on the Bootstrap Sampling method, are used to create one hundred new training sets from the 519 training cases. To reduce the classification time, the three classifiers run in parallel and vote for the final classification result. The hybrid classifier's performance is also tested against noisy cases in the same test setup as the LVQ and DT individual classifiers.

Experimental Work and Performance Analysis
This section presents the simulation results of three individual classifiers and the proposed hybrid classifier.

Classifier performance evaluation
The accuracy, sensitivity, and specificity are used to evaluate the classifiers' performance, as shown in (9). Figure 5 shows the classification results of the SOM. At the beginning of the simulation, each neuron in the network is given a serial number so the program can easily check which neuron is related to a specific category of the database. The columns of Figure 5 represent the patient number, while the rows indicate the iteration count (10, 20, 50, 100, 250, 500, 1500). The intersection of patient number and iteration count indicates the neuron's serial number.
The SOM system assigns every neuron a random serial number in each iteration, which represents one input category. For example, in Figure 5, for the first patient at 250 iterations, the serial number of the classifying neuron is 71. As Figure 5 indicates, using more iterations provides the neurons with better weighted parameters, much like human memory, which requires longer stimulation to produce better results. However, the simulation results indicate that a very large iteration count no longer efficiently enhances classification performance; on the contrary, it degrades it, because of the long classification time with almost unimproved accuracy. The simulation results also indicate that using 250 iterations yields acceptable accuracy and classification time.
With fewer than 250 iterations, the SOM correctly identifies fewer than seventy patients. On the other hand, using more than 250 iterations results in little-improved accuracy for the SOM network, but at the price of a longer classification time.
The results also show that using 500 iterations allows the SOM to avoid an error made at 250 iterations, such as the change between the 31st and 32nd columns, but takes a longer classification time, close to ten seconds. Moreover, using 500 iterations introduces a new error, such as the change between the 16th and 52nd columns. Figure 5 also indicates that no matter how the iteration count changes, the SOM cannot correctly classify five patients: numbers 81, 39, 51, 99, and 58. These results are due to the limited learning ability of the neurons, or to disabled neurons being unable to effectively update their weighted parameters along with the iteration count and study rate. In summary, 250 iterations give the best classification performance.
If there were more than 100 neurons to classify the 100 patients in the SOM competition layer, the SOM should produce better classification performance. The SOM is also used for cancer detection; the previous discussion recommends using 250 iterations. The results, shown in Table 2, indicate the SOM's performance in cancer classification, and Table 2 also indicates that the number of neurons used affects the classification time. Figure 6 shows the reaction time of the neurons to the input data. These two experiments indicate that the iteration count and the number of neurons largely affect an ANN with a competition layer: more iterations can enhance classification performance but increase the classification time, while more neurons can avoid the overfitting problem and efficiently enhance the classification performance. Comparing the SOM with the traditional K-Means method, the SOM overcomes many of the K-Means drawbacks [16]; for example, for an unknown database, the K-Means classifier requires the training group to contain more cases than the predicted number of classified groups.
Figure 7 shows an example of the LVQ training confusion matrix with 20 neurons, a 0.1 error limit, and 250 iterations. The true positive set has 322 patients, which occupy 62% of the whole training set; the false negative set has 42 patients, the false positive set has 7 patients, and the true negative set has 148 patients. The results indicate 88.5% sensitivity, 95.5% specificity, 97.9% precision, 77.9% negative predictive value, and 90.4% accuracy.
Fast-Good LVQ results: To find a suitable MSE range, this paper designs an LVQ with a very small MSE and a sufficient number of iterations. Figure 8 shows a typical example of the MSE curve versus epochs for a training MSE of 0.01 and 250 iterations; the MSE cannot get below 0.0829 after 9 iterations.
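The measures in (9) can be computed directly from the confusion-matrix counts; a minimal sketch using the LVQ training values quoted above (322 true positives, 42 false negatives, 7 false positives, 148 true negatives):

```python
def evaluate(tp, fn, fp, tn):
    # standard confusion-matrix measures from (9)
    return {
        "accuracy":    (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

m = evaluate(tp=322, fn=42, fp=7, tn=148)
print(round(m["sensitivity"], 3))  # 0.885
print(round(m["specificity"], 3))  # 0.955
```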
Since the supervised LVQ can efficiently adjust the weighted parameters of the neurons from the actual output, the LVQ takes the shortest iteration time to achieve the best performance. The LVQ cannot reach 0 MSE because of the disabled neurons and the drawback of competitive learning processing already indicated in the SOM section. After several simulations with random training and testing data, Table 3 shows the MSE training-goal range needed to reach a Fast-Good LVQ. Table 3 indicates that the LVQ can achieve its best performance within 10 iterations and 10 seconds for the realigned CT Database. Figure 8 indicates that the performance is distorted when the LVQ cannot reach 0.0829 MSE and continues to classify until the 250th iteration ends. In this case, the number of neurons should be chosen from 20 to 25, because the simulation time with 30 neurons is close to 10 seconds, and a smaller number of neurons can inherit the overfitting problem discussed in the SOM section. So, a Fast-Good LVQ should have an MSE less than or equal to 0.1, 15 to 30 neurons, and 10 iterations.
The ALVQ results: From the analysis of the previous section, the parameters of the five LVQs of the ALVQ should be 0.01 MSE, 15 iterations, and 10 neurons. The MSE and iteration count of the ALVQ are found from the five LVQs. This accurate classifier can automatically find the suitable weighted parameters from the five LVQs, and it can adapt to other databases of a similar scale. Table 4 shows the ALVQ's classification performance in identifying cancer. As Table 4 shows, after ten simulations with realigned training and testing sets of the CT Database, the ALVQ has average accuracy and specificity above 90%. However, the sensitivity is around 82%, which means the ALVQ suffers from the overfitting problem for some malignant inputs. Table 5 shows the simulation results of the anti-noise performance of the ALVQ.
From the simulation results, the LVQ has high anti-noise classification performance in the case of a large SNR and in the case of noise with a different distribution function. However, if the data of 20 patients are directly changed to 2000, the LVQ crashes. Figure 9 indicates that the LVQ needs more classification time to handle significantly different input data; therefore, for the ALVQ, a short iteration time will not be enough in the presence of considerable noise. Table 6 shows the average classification performance of the DT for the realigned training and testing sets from the CT Database. As Table 6 indicates, the DT can be a competent classifier for the early detection of cancer.

Decision tree classification analysis
As Figure 10 shows, the DT can easily detect cancerous cells and expose the logical mathematical classification, where x indicates a feature of the input data. For example, Figure 10 shows x23, the worst value of the cell perimeter, and x27, the worst value of the cell concavity, becoming the tree's decision nodes. If a cell has a worst perimeter larger than 114.65 and a worst concavity less than 0.1907, the specific person has a 90.4% probability of not having cancer.
From Figure 10, the mathematical function expressed by the DT has the form Y = f(X), like a piecewise function. For example, Y = 2 when X23 > 114.65, X27 > 0.1907, and X23 > 15.665. However, the DT cannot build a joint relationship between two attributes of the form X_m + X_n < constant, where X is the input and (m, n) are the serial numbers of input features, because every X-boundary judgment is parallel to an axis. So, the DT offers straightforward judgments, but it is hard to render the mathematical relationship between attributes with one mathematical equation.
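The piecewise form Y = f(X) can be read off as a chain of axis-parallel threshold tests on single features, sketched below with the two thresholds quoted above (the class labels attached to the branches are illustrative, since Figure 10 is not reproduced here). Note that no such chain can encode a joint condition like x_m + x_n < constant.

```python
def tree_decision(x23, x27):
    # x23: worst perimeter, x27: worst concavity; thresholds from Figure 10.
    # Each test compares a single feature against a constant (axis-parallel).
    if x23 > 114.65:
        if x27 > 0.1907:
            return "malignant"   # illustrative label for the Y = 2 path
        return "benign"          # large perimeter but low concavity
    return "benign"

print(tree_decision(x23=120.0, x27=0.3))   # malignant
print(tree_decision(x23=120.0, x27=0.05))  # benign
```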
Figure 10 also shows a degradation in the classification performance when the classifier executes the step on X22: when the group size is small, the judgment becomes useless due to the overfitting issue. Therefore, it is essential to stop the tree from growing too deep. One solution is the pessimistic pruning method.
The pessimistic-pruning-based DT is shown in Figure 11. The simulation results (a best level of 3, a re-substitution error of 0.0289 after pruning, and 0.0193 before pruning) indicate that the best level is three, which is the smallest value among the subtrees, because the DT model is based on the minimal-cost calculation. The re-substitution error is found by re-inputting the data and checking the classification error; for example, for a 30-leaf DT with 1000 samples and 10 error samples, the re-substitution error is 10/1000 = 1%. After applying the pruning process, the re-substitution error increases from 0.019 to 0.029. However, as shown in Table 7, the performance on the testing data is better than the previous DT results because the pruning method effectively handles the overfitting problem.
Fig. 11. Pruning the Decision Tree
Table 8 shows the simulation results of the anti-noise performance of the pruning DT. From the simulation results of the first and second noisy training sets, the DT shows its worst performance when noise is embedded into the training set; compared with the LVQ's anti-noise performance, the pruning DT is easily affected by noise. When the data of 20 patients are directly changed to 200, the pruning DT has 100% classification accuracy for the testing group. Though the DT is pruned, the pruning DT still has the overfitting problem, and classification becomes harder when small noise results in highly correlated data. The results of the third noise test also indicate that adding suitable noise to the input data makes the pruned DT model show better results.

Proposed hybrid classifier performance analysis
The hybrid classifier uses the CT database of 569 realigned cases split into 519 training cases and 50 testing cases. The previous analysis and simulations highlight the drawbacks and merits of each classifier; therefore, a hybrid classifier is proposed to avoid the drawbacks of the individual classifiers. As Figure 12 shows, the ALVQ classifier (3rd row) adopts the suitable training MSE and iteration count from the five LVQs, the DT (1st row) adopts the pruning CART, and the RF classifier (2nd row) uses 100 decision trees. As Figure 12 shows, the 2nd and 9th columns make up for the flaws of disabled neurons and/or insufficient iterations, the 28th column makes up for the random and/or repeated choice of uninformative input variables by the RF, and the 40th column makes up for the overfitting problem of the pruning Decision Tree.
As Table 9 shows, the hybrid classifier has the highest anti-noise performance. The LVQ and the RF make up for the low anti-noise performance of the pruning DT under the first noise, and the DT and RF make up for the flaws of the LVQ under the third noise. However, the hybrid classifier does not perform well under the second noise because of wrong votes from the LVQ and the Decision Tree; this drawback can also be found in Tables 5 and 8. The low noise results in correlated variables in the CT database, so the LVQ has a low ability to recognize them and the DT is easily overfitted. The DT in the hybrid classifier shows the important variables/features of the input data that affect the classification and can help in viewing the logical mathematical classification. The hybrid classifier can be efficiently coded to show the vectors of the training input data with the LVQ weighted parameters. However, the ability of the LVQ is limited by the other two DT-based classifiers under the same weighted vote: if the DTs get wrong results while the LVQ makes the correct classification, the hybrid classifier leans toward the two DTs. Furthermore, the whole process increases the classification time by automatically finding the best training MSE.
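The final decision step described above reduces to an equally weighted majority vote among the ALVQ, the pruning-CART DT, and the 100-tree RF; a minimal sketch:

```python
from collections import Counter

def hybrid_vote(lvq_pred, dt_pred, rf_pred):
    # each classifier casts one equally weighted vote; ties cannot occur
    # with three voters and two classes
    votes = Counter([lvq_pred, dt_pred, rf_pred])
    return votes.most_common(1)[0][0]

# If the DT is misled by noise but the other two agree, the hybrid
# classifier still answers correctly:
print(hybrid_vote("benign", "malignant", "benign"))  # benign
```

This also illustrates the limitation noted above: when both tree-based voters are wrong, a correct LVQ vote is outvoted.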

Conclusion
This paper focuses on enhancing the classification performance for cancer in cell databases. Through the analysis of three different classifiers (pruning DT, RF, and LVQ), this paper develops a hybrid classifier that has high classification performance for different small binary databases. The experimental results for the ANN with a competition layer indicate that the SOM and LVQ are affected by the iteration count, which recursively finds the best weighted parameters between the competition layer and the input layer with a suitable study rate. A suitable training MSE and a suitable iteration count are the most critical parameters for an efficient LVQ. Furthermore, the ANN with a competition layer also has high anti-noise performance and a stable classification time. The main drawback of the ANN with a competition layer is the limitation of its learning process: it is hard to classify input vectors that have a small (or large) Euclidean distance between each of them and the initial weighted-parameter vectors. The pessimistic pruning DT requires less computational complexity than the ANN, since it needs no model training, but has the worst anti-noise performance because it demands independently and identically distributed input data; therefore, normalization is a possible pre-processing solution for the DT. The proposed hybrid classifier can be developed based on a combination of other machine learning methods depending on the targeted database. For example, if image pixels are input instead of extracted features, then the LVQ requires large computations and should be replaced by convolutional neural networks.