A Holistic Model for Recognition of Handwritten Arabic Text Based on the Local Binary Pattern Technique

In this paper, we introduce a multi-stage offline holistic handwritten Arabic text recognition model using the Local Binary Pattern (LBP) technique and two machine-learning approaches: Support Vector Machines (SVM) and Artificial Neural Networks (ANN). In this model, the LBP method is utilized to extract global text features without text segmentation. The suggested model was tested on version 2.0 of the IFN/ENIT database using the polynomial, linear, and Gaussian SVM classifiers and the ANN classifier. Performance of the ANN was assessed using the Levenberg-Marquardt (LM), Bayesian Regularization (BR), and Scaled Conjugate Gradient (SCG) training methods. The classification outputs of the suggested model were compared and verified against the results of two benchmark Arabic text recognition systems (ATRSs) based on the Discrete Cosine Transform (DCT) and Principal Component Analysis (PCA) methods, using various normalization sizes of Arabic text images. The classification outcomes of the suggested model are promising and better than those of the examined benchmark models. Its best classification accuracies (97.46% and 94.92%) are obtained using the polynomial SVM classifier and the BR ANN training method, respectively.

Keywords—Handwritten Arabic Text, Holistic Recognition, Local Binary Pattern, Support Vector Machines, Artificial Neural Network.


Introduction
Simulating human understanding of text processing, such that a given computer system can read, understand, and process text in a way similar to the human mind, is the ultimate goal of any Handwritten Arabic Text Recognition System (HATRS) [1]. To the best of our knowledge, handwritten text recognition is considered one of the most complicated problems in the domain of pattern recognition employing Artificial Intelligence (AI) approaches [1]. To implement an HATRS, we can employ either offline or online recognition. Using the offline recognition approach, the input image is scanned first and then processed by the system, whereas online approaches recognize the text as it is being written.

Related Works
Many frameworks have been developed to investigate offline holistic approaches for recognizing handwritten Arabic text. For example, El-Hajj et al. [8] introduced an offline recognition model for handwritten Arabic text using the Hidden Markov Model (HMM) classifier. In their model, they extracted text features from sliding windows using foreground pixel densities and their concavities. This model achieved 87.20% recognition accuracy when evaluated on the IFN/ENIT database. Pechwitz and Margner [9] advocated another offline handwritten Arabic text recognition system using HMMs. In their work, they compared the efficiencies of the extracted features for two categories: pixel values and skeleton directions. They evaluated their model on the IFN/ENIT database and reached an 89.1% recognition rate.
AlKhateeb [10] suggested a multi-class classification model for handwritten Arabic words based on the Dynamic Bayesian Network (DBN). In this model, the features were extracted from a sliding window moving across mirrored text images. The model was tested on version 2.0 of the IFN/ENIT dataset and the test results were quite promising. Alkhateeb [11] studied the DCT features using various classifiers (the HMM with re-ranking, kNN, and ANN). Performance assessment showed that the recognition accuracies produced by these three classifiers were 95.15%, 78.67%, and 80.75%, respectively.
Alshekmubarak et al. [12] put forward an offline holistic model for recognition of handwritten Arabic words using grid feature extraction and the normalized poly-kernel SVM classifier. In their approach, they derived the features by means of the uniform grid feature technique, which splits the word image into a set of regions and then sums the black pixels in each region. They assessed the performance of the model on the IFN/ENIT database and achieved 92.34% and 95.27% recognition accuracies using subsets of 24 and 56 classes, respectively.
El Qacimy et al. [4] proposed an offline, word-based system for recognition of handwritten Arabic text using the SVM classifier and DCT features, enhanced by a reject option. Performance of this model was assessed on 2,000 word images randomly chosen from the IFN/ENIT database and then compared with the performance of benchmark systems that use DCT features for classifying handwritten Arabic text.
Hassan and Alawi [13] designed a holistic offline HATRS based on the Discrete Wavelet Transform (DWT) and an SVM with a Gaussian kernel. Their model was developed using four levels of the DWT via segmentation of the wavelet space into 16x16 segments; the standard deviation was then computed for each block. The performance of this model was assessed on a collected database using the SVM linear, Gaussian, and polynomial kernel classifiers. The assessment results indicated that this model had recognition accuracies of 89.17%, 90.00%, and 90.65% with these three classifiers, respectively.
Aloun [14] developed a holistic offline HATRS based on the LBP and the Gaussian SVM classifier. In this model, both the unwanted pixels and the noise were removed and the text edges were extracted. Thereafter, the text skeleton was extracted and diacritics were ignored. To assess the performance of this model on the IFN/ENIT database, the author used the Gaussian SVM classifier. Based on the performance evaluation, this study concluded that the 125x125 word normalization size was the optimum size for the suggested model, with a highest classification accuracy of 96.57%.
Al-Saqqar et al. [6] introduced an offline model for holistic HATRS based on the PCA and SVM classifiers. In this study, the universal features were extracted using the PCA technique from text skeleton images of various sizes. To classify the extracted features, the authors used the linear, Gaussian, and polynomial SVM classifiers on version 2.0 of the IFN/ENIT database. This study reached recognition rates of 77.80% and 89.96% using the Gaussian SVM classifier and the 125x125 image normalization on sub-sets (e) and (d) of the IFN/ENIT database, respectively.

The Proposed Model
This study proposes a holistic multi-stage HATRS based on the LBP method and two machine-learning classifiers (SVM and ANN). The proposed HATRS has four processes: skeleton extraction, text image normalization, feature extraction using LBP, and classification using the ANN and the Gaussian, linear, and polynomial SVM classifiers. The ANN is assessed using three training methods: LM, BR, and SCG. The proposed holistic multi-stage HATRS is presented in Fig 1, and its stages are described in the subsequent sub-sections.

The skeleton extraction
The skeletonization-based morphological operation technique is applied at this stage to thin the handwritten Arabic text, owing to its effectiveness in thinning cursive handwritten Arabic text [15]. The thinning process refines the text shape by removing unwanted data, which reduces the amount of data that needs handling during feature extraction [15]. Fig 2 shows an example of a handwritten Arabic text skeleton produced by the skeletonization-based morphological technique.
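As an illustration of morphological thinning, the classical Zhang-Suen algorithm below peels border pixels from a binary image until a one-pixel-wide skeleton remains. This is a generic sketch of the thinning idea, not the paper's exact MATLAB implementation:

```python
import numpy as np

def zhang_suen_thin(img):
    """Zhang-Suen thinning: iteratively delete border pixels of a
    binary (0/1) image until a one-pixel-wide skeleton remains."""
    img = np.pad(np.asarray(img, dtype=np.uint8), 1)
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, img.shape[0] - 1):
                for c in range(1, img.shape[1] - 1):
                    if img[r, c] != 1:
                        continue
                    # Neighbours P2..P9, clockwise starting from north.
                    p = [img[r-1, c], img[r-1, c+1], img[r, c+1],
                         img[r+1, c+1], img[r+1, c], img[r+1, c-1],
                         img[r, c-1], img[r-1, c-1]]
                    b = sum(p)  # number of non-zero neighbours
                    a = sum((p[i] == 0 and p[(i + 1) % 8] == 1)
                            for i in range(8))  # 0 -> 1 transitions
                    if step == 0:
                        cond = p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0
                    else:
                        cond = p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            for r, c in to_delete:
                img[r, c] = 0
                changed = True
    return img[1:-1, 1:-1]

# Tiny synthetic example: a 3-pixel-thick horizontal stroke.
stroke = np.zeros((7, 15), dtype=np.uint8)
stroke[2:5, 2:13] = 1
skeleton = zhang_suen_thin(stroke)
```

The two sub-iterations delete north/east and south/west border pixels alternately, which preserves the connectivity of cursive strokes, the property that matters for Arabic script.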

Normalization
Text image normalization is a highly important step in the process of text recognition because writing styles vary from one person to another [11]. Normalization is a critical step in recognition systems that are sensitive to variations in size and position. Its purpose is to produce a uniform text image with little variation, at the levels of words and characters, among the various writers of the same text [11]. In light of the importance of this process and its influence on the recognition results of the model suggested herein, the recognition performance of this model was evaluated on text images of different sizes. Those sizes and their impacts on the performance of the proposed model are addressed in the results and discussion section.
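Size normalization amounts to resampling each word image to a fixed resolution. The sketch below uses simple nearest-neighbour resampling in NumPy, targeting 100x125, one of the six sizes examined later; this is illustrative only and not necessarily the interpolation the authors used:

```python
import numpy as np

def normalize_size(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D image to (out_h, out_w)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows][:, cols]

# A word image of arbitrary original size, normalized to 100x125.
word = np.random.default_rng(0).random((60, 90))
norm = normalize_size(word, 100, 125)
```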

Feature extraction using LBP
The ultimate goal of feature extraction is to provide a proper representation of the whole text image via a set of features [7]. This study uses the LBP method of Ojala et al. [16] to extract the global features of the handwritten Arabic text. The LBP is extended into 16x16-pixel cells in order to extract high-level global Arabic text features according to the following steps [17]:
1. The text image is segmented into regions of 16x16-pixel cells.
2. For each region, a circle with radius (R) is drawn from the central pixel (xc, yc) of the region, and the neighbors (xp, yp) of the central pixel are computed using Equations 1 and 2.
3. The texture (T) of the local pixel neighborhood (xp, yp) is defined using Equation 3.
5. The LBP value for each region is computed by comparing each surrounding pixel with the central one, using Equations 5 and 6.
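The steps above can be sketched in code. The version below uses the basic 8-neighbour (3x3) LBP operator rather than the circular radius-R neighbourhood of the paper, but it shows the same pipeline: compute a per-pixel LBP code, then concatenate histograms over 16x16-pixel cells:

```python
import numpy as np

def lbp_image(gray):
    """Basic 8-neighbour LBP: each pixel becomes an 8-bit code in which
    bit k is set when the k-th neighbour is >= the centre pixel."""
    # Offsets of the 8 neighbours, clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = gray.shape
    centre = gray[1:h-1, 1:w-1]
    codes = np.zeros_like(centre, dtype=np.uint8)
    for bit, (dr, dc) in enumerate(offsets):
        neigh = gray[1+dr:h-1+dr, 1+dc:w-1+dc]
        codes |= (neigh >= centre).astype(np.uint8) << bit
    return codes

def lbp_histogram(gray, cell=16):
    """Concatenate 256-bin LBP histograms over cell x cell regions,
    mirroring the paper's 16x16-pixel cells."""
    codes = lbp_image(gray)
    feats = []
    for r in range(0, codes.shape[0] - cell + 1, cell):
        for c in range(0, codes.shape[1] - cell + 1, cell):
            block = codes[r:r+cell, c:c+cell]
            feats.append(np.bincount(block.ravel(), minlength=256))
    return np.concatenate(feats)

# A synthetic 100x125 normalized grayscale word image.
gray = (np.random.default_rng(0).random((100, 125)) * 255).astype(np.uint8)
features = lbp_histogram(gray)
```

For a 100x125 image the interior code map is 98x123, giving 6 x 7 = 42 complete cells and a 42 x 256 = 10,752-dimensional feature vector.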

Text classification
In this study, the proposed HATRS has been tested using two machine-learning classifiers (SVM and ANN), which are explained in the following sub-sections.
Support Vector Machine (SVM): The SVM classifier normally uses kernels to decide the best decision boundary and to separate the likely classes in the high-dimensional feature space [18]. Several kernel functions can transform a non-linearly separable problem into a linearly separable one by projecting the data into a clarified feature space [4]. Thereafter, the SVM can determine the hyperplane with the best separation [13]. In this paper, multi-class SVM classifiers were employed for text image classification using the following SVM kernel functions [19]:
• The Gaussian kernel: K(xi, xj) = exp(−γ‖xi − xj‖²),
• The polynomial kernel: K(xi, xj) = (γ xi·xj + coef)^d, and
• The linear kernel: K(xi, xj) = xi·xj.
Artificial Neural Network (ANN): An Artificial Neural Network (ANN) is a computing system designed to mimic the way the human brain analyzes and processes information. An ANN model can be seen as a set of interconnected nodes that communicate with each other and with the outside world through connections called synapses [20]. In this paper, a Multi-Layer Perceptron (MLP) ANN is used for holistic classification of the handwritten Arabic text. The MLP is composed of three layers: the input, hidden, and output layers. In the input layer, data is transferred through synapses to the hidden layer via the input neurons. After that, in the hidden layer, data is transferred to the output layer through further synapses. These synapses carry weights, which serve as the inputs and outputs of those layers [21]. In this study, the performance of the ANN was assessed using three training methods: the Levenberg-Marquardt (LM), Bayesian Regularization (BR), and Scaled Conjugate Gradient (SCG) methods.
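The three SVM kernel functions listed above can be written out directly. The NumPy sketch below is illustrative only (not the authors' MATLAB implementation); note that with γ = 1, coef = 0, and d = 1 the polynomial kernel reduces to the linear one:

```python
import numpy as np

def gaussian_kernel(xi, xj, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||xi - xj||^2)."""
    return np.exp(-gamma * np.sum((np.asarray(xi) - np.asarray(xj)) ** 2))

def polynomial_kernel(xi, xj, gamma=0.5, coef=1.0, d=3):
    """Polynomial kernel: (gamma * <xi, xj> + coef)^d."""
    return (gamma * np.dot(xi, xj) + coef) ** d

def linear_kernel(xi, xj):
    """Linear kernel: plain inner product <xi, xj>."""
    return np.dot(xi, xj)

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
lin = linear_kernel(x, y)          # 1*3 + 2*4 = 11.0
gau = gaussian_kernel(x, x)        # identical inputs -> exp(0) = 1.0
```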
An explanation of these training methods follows [22].

Levenberg-Marquardt (LM) training method:
The LM method is an iterative algorithm that finds the minimum of a multivariate function expressed as a sum of squares of nonlinear, real-valued functions [22][23][24]. The LM method has become the standard approach for non-linear least-squares problems [25] and is widely adopted across a broad range of disciplines. It can be regarded as a combination of the Gauss-Newton and steepest-descent methods [22].
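The blend of Gauss-Newton and steepest descent is controlled by a damping term λ added to the normal equations: small λ gives Gauss-Newton steps, large λ gives small gradient-like steps. A minimal NumPy sketch on a toy least-squares problem (fitting y = a·x + b), illustrative rather than MATLAB's trainlm routine:

```python
import numpy as np

def lm_fit(x, y, theta, lam=1e-2, iters=50):
    """Levenberg-Marquardt for the residuals r = y - (a*x + b)."""
    for _ in range(iters):
        a, b = theta
        r = y - (a * x + b)                            # residual vector
        J = np.stack([-x, -np.ones_like(x)], axis=1)   # Jacobian of r
        H = J.T @ J + lam * np.eye(2)                  # damped GN matrix
        step = np.linalg.solve(H, J.T @ r)
        new_theta = theta - step
        # Accept the step only if it reduces the squared error;
        # otherwise increase the damping (move toward steepest descent).
        if np.sum((y - (new_theta[0]*x + new_theta[1]))**2) < np.sum(r**2):
            theta, lam = new_theta, lam * 0.5
        else:
            lam *= 2.0
    return theta

x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0                       # exact data: a = 2, b = 1
theta = lm_fit(x, y, np.array([0.0, 0.0]))
```

On noiseless linear data the damped step is nearly the exact Gauss-Newton solution, so θ converges to (2, 1) within a few iterations.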
Bayesian Regularization (BR) training method: MacKay [26] proposed the Bayesian Regularization (BR) method. BR automatically sets the optimum performance function to achieve good generalization, based on Bayesian inference. Bayesian optimization of the regularization parameters depends on computing the Hessian matrix at the minimum point [22].
Scaled Conjugate Gradient (SCG) training method: Moller [27] developed the Scaled Conjugate Gradient (SCG) training method. SCG is a variant of the Conjugate Gradient method that uses the LM approach to scale the step size, thereby avoiding the time-consuming line search at each learning iteration [27]. Algorithm 1 illustrates the steps of the proposed HATRS.

Experimental Results and Discussion
We implemented the suggested and the benchmark HATRSs in MATLAB 2017a installed on a personal computer with an i3 processor running at 1.90 GHz and 6 GB of memory. The recognition levels of the studied ATRSs were evaluated on version 2.0 of the IFN/ENIT database of handwritten Arabic texts, which consists of 32,492 images of handwritten Arabic names of Tunisian towns and villages. Those names are classified into five sub-sets, abbreviated as a, b, c, d, and e [28,29]. Sub-sets (a), (b), (c), and (d) were used for system training while sub-sets (d) and (e) were used for system testing. The five sub-sets are supplied with ground-truth data, which were used in labeling the recognition results. To verify the performance of the suggested HATRS, its recognition outcomes were compared with the results of two benchmark ATRSs: one developed by [6] based on PCA, and another based on DCT, which was used by [4] for holistic classification of Arabic texts with a reject option based on sub-word segmentation.
The proposed HATRS and the benchmark ATRSs were assessed on the (e) and (d) sub-sets of the IFN/ENIT database using (i) the polynomial, linear, and Gaussian SVM classifiers; (ii) the SCG, LM, and BR ANN training methods; and (iii) six normalized sizes of word images: 75x75, 80x100, 100x100, 100x125, 125x125, and 150x150. The classification accuracies of all systems when examined on sub-sets (d) and (e) of the IFN/ENIT database using the linear, polynomial, and Gaussian SVM classifiers are given in Tables 1 and 2, respectively. Furthermore, the classification accuracies of the suggested HATRS when assessed on sub-sets (d) and (e) using the six normalized image sizes and the SVM classifiers are presented in Figs 3 and 4, respectively. Moreover, the classification accuracies of these systems when tested on sub-sets (d) and (e) using the ANN training methods are shown in Tables 3 and 4, respectively, while the corresponding accuracies of the suggested HATRS with the six normalized image sizes and the ANN training methods are presented in Figs 5 and 6, respectively. Table 1 shows that the suggested HATRS produces higher classification accuracies than the benchmark ATRSs when using the six normalized sizes of text images on sub-set (d) of the IFN/ENIT database with the polynomial, linear, and Gaussian SVM classifiers. The optimum classification accuracy (97.46%) generated by the suggested system was associated with the polynomial SVM classifier and the 100x125 normalized image size. Meanwhile, the optimal classification accuracies (89.96% and 79.14%) generated by the two benchmark ATRSs (PCA and DCT) were obtained with the 125x125 and 75x75 normalized image sizes, respectively, using the Gaussian SVM classifier.
Moreover, when the proposed HATRS was assessed on sub-set (d) of the IFN/ENIT database, the polynomial SVM classifier produced the best classification result (97.46%), while the Gaussian and linear SVM classifiers produced classification accuracies of 97.13% and 94.15%, respectively. Table 2 shows that the suggested HATRS gives higher classification accuracies than the benchmark ATRSs with the foregoing normalized image sizes on the (e) sub-set of the IFN/ENIT database using the polynomial, linear, and Gaussian SVM classifiers. The optimal classification accuracy (83.34%) generated by the suggested system was associated with the 100x125 normalized image size and the Gaussian SVM classifier. However, the optimum classification accuracies (78.27% and 69.83%) produced by the PCA and DCT benchmark ATRSs were obtained with the 100x125 and 80x100 normalized image sizes, respectively, and the polynomial SVM classifier. Fig 4 shows that, when the proposed HATRS was assessed on sub-set (e) of the IFN/ENIT database, the Gaussian SVM classifier produced the best classification result (83.34%), whereas the linear and polynomial classifiers produced classification accuracies of 79.00% and 83.02%, respectively. Table 3 reveals that the suggested HATRS gives higher classification accuracies than the benchmark ATRSs with the six image size normalizations on sub-set (d) of the IFN/ENIT database and the SCG, LM, and BR ANN training methods. The highest classification accuracy (94.92%) generated by the suggested system was associated with the 150x150 image size normalization and the BR ANN training method. Meanwhile, the highest classification accuracies (87.96% and 75.49%) produced by the PCA and DCT ATRSs were associated with the 150x150 and 100x100 image size normalizations, respectively, using the SCG ANN training method.
Table 4 illustrates that the suggested HATRS gives higher classification accuracies than the benchmark ATRSs on sub-set (e) of the IFN/ENIT database with the six image size normalizations and the SCG, LM, and BR ANN training methods. The highest classification accuracy (80.93%) generated by the suggested system was associated with the 80x100 normalized image size and the LM ANN training method. Meanwhile, the highest classification accuracies (74.97% and 67.49%) produced by the PCA and DCT ATRSs were associated with the 150x150 and 80x100 image size normalizations and the BR and SCG ANN training methods, respectively. Finally, Fig 6 shows that, when the suggested system is assessed on sub-set (e) of the IFN/ENIT database, the LM ANN training method produces the best classification result (80.93%), whilst the BR and SCG ANN training methods produce classification accuracies of 80.61% and 80.37%, respectively. The above outcomes support the effectiveness of the suggested HATRS in the holistic recognition of handwritten Arabic texts using the SVM and ANN classifiers.

Conclusions and Future Directions
This paper presented a multi-stage holistic HATRS based on the LBP feature extraction technique and two machine-learning classifiers (SVM and ANN). The proposed HATRS has four processes: skeleton extraction, text image normalization, feature extraction using LBP, and classification using the linear, Gaussian, and polynomial SVM classifiers and the ANN classifier. The ANN was assessed using three training methods (LM, BR, and SCG). To validate the proposed model, we compared its recognition results with those of two benchmark ATRSs based on the PCA and DCT methods. The recognition accuracies of the proposed HATRS and the benchmark ATRSs were assessed on the (d) and (e) sub-sets of the IFN/ENIT database using (i) the polynomial, linear, and Gaussian SVM classifiers; (ii) the LM, BR, and SCG ANN training methods; and (iii) six image size normalizations, namely 75x75, 80x100, 100x100, 100x125, 125x125, and 150x150. This paper concludes that the overall recognition results produced by the proposed HATRS using the SVM classifiers are better than those obtained using the ANN classifier.
The HATRS proposed herein gave higher classification accuracies than the benchmark ATRSs when applied with the six image size normalizations on the (e) and (d) sub-sets of the IFN/ENIT database using the polynomial, linear, and Gaussian SVM classifiers. The optimum classification accuracies (97.46% and 83.34%) were produced by the suggested system when it was applied on the (d) and (e) sub-sets of the IFN/ENIT database, respectively. Meanwhile, when using the SCG, LM, and BR ANN training methods, the optimum classification accuracies achieved were 94.92% and 80.93%, respectively. The outcomes of this research support the effectiveness of the suggested HATRS in the holistic recognition of handwritten Arabic texts: this system gave higher classification accuracies than the two examined benchmark ATRSs. For related future studies, we recommend training the HATRS presented here using a combination of statistical and structural text features.