Real-Time Detection and Recognition of Road Traffic Signs using MSER and Random Forests

Real-time detection and recognition of road traffic signs plays an important role in advanced driving assistance systems. Typically, the region of interest (ROI) method is effective for feature extraction but performs poorly because it is sensitive to illumination changes. In this paper, we propose a maximally stable extremal regions (MSER) method with image enhancement to substantially improve ROI extraction. First, we apply the gray world algorithm to the original images. Then, potential traffic sign areas are obtained by increasing the image contrast and extracting image-enhanced MSERs. Next, the geometric characteristics of traffic signs are extracted from characteristic variables and geometric moment invariants to obtain the ROIs. Finally, an HSV-HOG-LBP feature is constructed and the random forests algorithm is used to recognize the traffic signs. Experiments with actual road images and the German traffic sign detection benchmark (GTSDB) data set show that the proposed method is robust to illumination conditions and to rotation and scale changes, and achieves good performance.


Introduction
Traffic Sign Detection and Recognition (TSDR) is an important component of Advanced Driving Assistance Systems (ADAS) [1]. In ADAS, environmental sensors perceive traffic information such as traffic signs and obstacles in real time. Possible traffic accidents or violations can thus be anticipated and the driver warned; furthermore, emergency actions can be taken automatically. For example, when there is a speed limit sign ahead of the vehicle, the TSDR system can recognize it and alert the driver to avoid speeding. ADAS can effectively reduce the traffic accident rate and plays a very important role in improving driving safety and comfort.
ADAS must automatically collect traffic information, such as road signs and traffic signals, in real-world road scenes, and its accuracy and real-time performance determine the performance of the detection system. However, because of the complex influencing factors in actual road scenes, traffic sign detection and recognition with both real-time performance and high accuracy remains a great challenge [2]. Automatic Traffic Sign Detection and Recognition (ATSDR) with image processing is therefore considered a crucial technology for various intelligent vehicle systems [3].
In real-world road scenes, the efficiency of traffic sign detection and recognition is influenced by many factors, mainly the following:
• Color fading: after long exposure to sun and rain, signs fade seriously and become unclear.
• Similarity and standardization of signs: different countries have different standardized sets of signs, and different categories of signs can be similar to each other.
• Weather conditions: the clarity of sign images is affected by weather such as fog, clouds, rain and snow.
• Similar objects: objects such as buildings, billboards or vehicles may resemble signs in color and/or shape.
• Vehicle motion: because sign images are taken by an on-board camera, vehicle motion causes camera jitter and blurs the image.
• Illumination variations: the color of captured images is very sensitive to sunshine in the daytime and to car lights at night. Light reflection produces highlights, and depending on the lighting conditions, shadows may fall on the whole or part of a sign.
With the development of Intelligent Transportation Systems (ITS), traffic sign detection and recognition based on image processing has attracted much attention from researchers. However, because of the complex factors above, ATSDR technology still faces great challenges.
To address the problems that these factors, especially illumination variations, cause for TSDR, while taking both speed and accuracy into account, this paper proposes a real-time traffic sign detection and recognition method using MSER and random forests. The method has two stages: detection and recognition. In the detection stage, illumination is corrected with the Gray World algorithm to reduce the impact of illumination variation. Then the ROIs of traffic signs are obtained using MSER combined with CLAHE image enhancement, a method named IE-MSER. Geometric characteristic variables combined with geometric moment invariants of traffic signs are designed to locate the signs more accurately. In the recognition stage, we build a new feature descriptor named HSV-HOG-LBP, and an improved random forests algorithm with this descriptor is used for classification and recognition. Experiments with actual traffic scene images and the GTSDB data set demonstrate the validity and advantages of our method.
The main contributions of this work are presented as follows.
• An improved traffic sign detection method is presented. Illumination correction and MSER with image enhancement maximize the number of sign ROIs and reduce the impact of lighting and shadows, and geometric characteristic variables combined with geometric moment invariants of traffic signs are designed to achieve a high detection rate.
• Fast and accurate recognition is achieved by an improved random forests algorithm with the new HSV-HOG-LBP descriptor, which suppresses noise effectively and meets the requirements of high accuracy and real-time performance.
The overall organization of this work is as follows: Section 1 introduces the background of this topic and the main contributions of this work; Section 2 reviews related work and clarifies the differences between the authors' work and existing works; Section 3 presents the improved traffic sign detection method; Section 4 presents the improved traffic sign recognition method based on random forests with the HSV-HOG-LBP feature descriptor; Section 5 presents and discusses the experiments.

Related Works
A traffic sign recognition system can generally be divided into two parts: detection and recognition. Traffic signs have distinctive colors and specific shapes so that drivers can easily notice them, and traffic sign detection is usually based on these inherent characteristics (color and shape). Color is the more obvious characteristic; traffic signs are mainly red, yellow, and blue. In [4], a color enhancement method extracts red, yellow and blue blobs by emphasizing pixels in which a given color dominates the other two channels of the RGB color space, but its performance degrades under illumination changes. In [5], the Lab and HSI color spaces are used to extract candidate signs; in addition, detecting white signs helps to decompose achromatic regions, discard uninteresting areas, connect fragmented signs, and separate signs at the same location.
Detection methods based on color characteristics have low computational cost and good robustness [6][7][8], and can improve detection performance to a certain extent, but they depend on the chosen thresholds.
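To make the threshold dependence concrete, the sketch below scores how strongly one RGB channel dominates the other two and keeps pixels whose score exceeds a threshold. The measure `max(0, min(c - o1, c - o2)) / (R + G + B)`, the function names and the threshold values are illustrative assumptions, not necessarily the formulation used in [4].

```python
def dominance(pixel, channel):
    """Score how much `channel` (0=R, 1=G, 2=B) dominates the other two channels,
    normalized by total intensity (an assumed measure, for illustration only)."""
    total = sum(pixel)
    if total == 0:
        return 0.0
    c = pixel[channel]
    others = [pixel[k] for k in range(3) if k != channel]
    return max(0, min(c - others[0], c - others[1])) / total


def color_mask(image, channel, threshold):
    """Binary mask (rows of booleans) of pixels whose dominance score exceeds `threshold`.

    `image` is a list of rows of (R, G, B) tuples with values in 0..255.
    """
    return [[dominance(px, channel) > threshold for px in row] for row in image]
```

With a threshold of 0.3, a bright red pixel (200, 30, 30) is kept but a dim red pixel (60, 30, 30), as might occur under weak light, is rejected (its score is 0.25); lowering the threshold to 0.2 keeps both, which is the threshold sensitivity noted above.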
Shape-based detection methods exploit the specific shapes of traffic signs, mainly triangles, circles and rectangles, and have a certain degree of robustness. The Hough transform is often used to detect circular or triangular traffic signs, but it is relatively time-consuming. Fusion detectors combining color and shape characteristics work well [9][10][11][12], but for various reasons they still miss some signs. Gonzalez-Reyna [13] proposed a method based on directional gradient maps and the Karhunen-Loeve transform, whose classification accuracy reaches 95.9% on the German traffic sign detection benchmark (GTSDB) data set. In recent years, many traffic sign detection methods based on machine learning have been proposed [14], such as the Aggregate Channel Features (ACF) detector and the Integrated Channel Feature fusion detector [15]. In [16], a traffic sign recognition method is proposed using evolutionary AdaBoost detection and Forest-ECOC classification. Salti [17] designed a wave-based detector combined with MSER. Ruta [18] presented a traffic sign detection, tracking and recognition system for an in-vehicle camera using clustering and the AdaBoost algorithm.
Machine learning and statistical learning methods, such as Random Forests [19], Convolutional Neural Networks (CNN) [20,21] and Support Vector Machines [22], are considered more effective for traffic sign recognition. Huang [23] proposed a variant of HOG (HOGv) that achieves a good balance between recognition accuracy and computational speed. In [24], a three-stage framework was proposed for traffic sign recognition: first, the classical Hough transform determines the approximate positions of signs; then, a rotation-invariant binary descriptor is used for robust detection; finally, a neural network is adopted to reduce recognition time and obtain a high recognition rate. In [25], an ROI extraction method based on contrast, a split cascade tree detector and a closed robust sparse classifier are proposed, which detect and recognize many kinds of traffic signs well.
CNN-based methods can achieve high recognition accuracy, but they are more complex, their weights are difficult to select, and their training time is long; for instance, the training time in [20] is over 50 hours. In [19], K-d trees and Random Forests are used to attain high classification accuracy, but the choice of descriptor has a great impact on classifier performance.
In this paper, considering the influencing factors mentioned above, the balance between real-time performance and accuracy, and the need for lower complexity than methods such as neural-network-based ones, we propose an improved traffic sign detection and recognition method based on maximally stable extremal regions (MSER) and random forests with a new descriptor. Candidate regions are extracted by MSER after processing the image with the Gray World method and image enhancement. The ROIs are then obtained from the constructed characteristic variables, such as aspect ratio and regional area ratio, and from geometric moment invariants. A new feature descriptor named HSV-HOG-LBP is constructed and used in the random forests algorithm, which recognizes traffic signs effectively.

Traffic sign detection
The traffic sign detection procedure can be divided into three steps: image pretreatment, ROI extraction and geometric feature detection.

Image pretreatment with Gray World Algorithm
Traffic sign images are usually captured outdoors, where varying light intensity makes the captured image differ from the real scene. To reduce this effect as much as possible, the gray world algorithm is used to correct the image.
The Gray World algorithm [26] is based on the gray world hypothesis, which assumes that for an image with many color variations, the averages of the R, G, and B components tend toward the same gray level K. K can be determined in two ways:
(1) Set manually to a fixed value, such as half of the maximum value of each RGB channel, i.e., 127 or 128.
(2) Determined by the averages of the three RGB components of the image:
$$K = \frac{R_a + G_a + B_a}{3}, \qquad R' = \frac{K}{R_a} R, \quad G' = \frac{K}{G_a} G, \quad B' = \frac{K}{B_a} B$$
where $R_a$, $G_a$ and $B_a$ are the mean values of the R, G and B channels, and $R'$, $G'$ and $B'$ are the corrected channel values. Pixels that overflow (exceed 255) are set to 255; this may make the whole image slightly whitish, but experiments show that it has little effect on the results. This paper uses the second method to determine K. Figure 1 shows an original captured image and the image processed with the gray world algorithm.
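A minimal sketch of both ways of choosing K, assuming an image stored as rows of (R, G, B) tuples with values in 0..255; the function name `gray_world` and the `fixed_k` parameter are hypothetical. Each channel is scaled by K over its channel mean and clipped at 255, as described in the text.

```python
def gray_world(image, fixed_k=None):
    """Gray world correction of an RGB image given as rows of (R, G, B) tuples.

    If fixed_k is given, K is that value (method 1, e.g. 128); otherwise K is the
    mean of the three channel averages (method 2). Each channel is scaled by
    K / channel_mean, rounded, and clipped to 255 on overflow.
    """
    pixels = [px for row in image for px in row]
    n = len(pixels)
    means = [sum(px[c] for px in pixels) / n for c in range(3)]  # R_a, G_a, B_a
    k = fixed_k if fixed_k is not None else sum(means) / 3
    gains = [k / m if m > 0 else 1.0 for m in means]  # leave an all-zero channel unchanged
    return [
        [tuple(min(255, round(px[c] * gains[c])) for c in range(3)) for px in row]
        for row in image
    ]
```

For example, a yellowish two-pixel image `[[(200, 180, 100), (100, 90, 50)]]` has channel means 150, 135 and 75, so K = 120 and the corrected pixels become neutral gray: `(160, 160, 160)` and `(80, 80, 80)`.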

ROI extraction method based on image enhancement MSER
Traffic signs are usually extracted according to their color and shape. In [27], RGB images are converted to normalized red/blue images for subsequent MSER detection, which facilitates the extraction of red and blue candidate regions. In [28], the original image is transformed into a probability map, and MSER is used to extract the ROIs. In [29], a color enhancement algorithm enhances red and blue before MSER extracts the ROIs, which overcomes drawbacks of color threshold segmentation such as poor generality and strong sensitivity to color and illumination changes.
We apply image enhancement before MSER-based ROI extraction to improve image contrast and extraction accuracy. Contrast Limited Adaptive Histogram Equalization (CLAHE) [30] is adopted to enhance the image. CLAHE computes local histograms of the image and redistributes brightness, increasing local contrast to reveal more detail. Compared with adaptive histogram equalization (AHE), CLAHE effectively suppresses noise amplification by limiting contrast, and speeds up computation by interpolating between blocks.
The CLAHE algorithm is as follows:
1. Divide the image into blocks; for each block, compute the histogram, clip it, and equalize the block.
2. Traverse the image blocks and apply linear interpolation between adjacent blocks.
3. Blend the result with the original image (layer color filter mixing).
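Step 1 can be illustrated for a single tile. The sketch below, with hypothetical names, builds the tile histogram, clips each bin at `clip_limit`, spreads the clipped excess evenly over all bins, and maps gray levels through the normalized cumulative histogram. It assumes an absolute clip limit expressed in pixel counts and omits the inter-block interpolation of step 2 and the blending of step 3.

```python
def clahe_tile_mapping(tile, clip_limit, levels=256):
    """Return a gray-level lookup table for one tile.

    tile: list of rows of integer gray levels in 0..levels-1.
    clip_limit: maximum count allowed per histogram bin (absolute count, an assumption).
    """
    hist = [0] * levels
    for row in tile:
        for v in row:
            hist[v] += 1
    # Clip the histogram and collect the excess counts
    excess = sum(max(0, h - clip_limit) for h in hist)
    hist = [min(h, clip_limit) for h in hist]
    # Redistribute the excess uniformly; the remainder goes to the lowest bins
    share, rem = divmod(excess, levels)
    hist = [h + share + (1 if i < rem else 0) for i, h in enumerate(hist)]
    # Equalization: map each level through the normalized cumulative histogram
    total = sum(hist)
    lut, cum = [], 0
    for h in hist:
        cum += h
        lut.append(round((levels - 1) * cum / total))
    return lut
```

On a 4×4 tile where 12 pixels have level 2 and 4 have level 5 (8 gray levels), plain equalization (a clip limit that never triggers) maps level 2 to 5, while a clip limit of 4 maps it only to 3. Limiting how far a dominant level is stretched is how CLAHE restrains noise amplification in flat regions.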
As shown in Fig. 2, the contrast between the traffic sign in shadow and other areas is low in the original image, and the situation improves after image enhancement with CLAHE.
Fig. 2. Effect of image enhancement with CLAHE
MSER was first proposed by Matas [31] for the robust wide-baseline stereo problem. MSER is similar to the watershed transform: as a grayscale image is thresholded at levels going from full black to full white, some connected regions change little as the threshold rises, and these regions are the maximally stable extremal regions.
Image I is a mapping $I: D \subset \mathbb{Z}^2 \to S$. Extremal regions are well defined on images if:
1. S is totally ordered, i.e., a reflexive, antisymmetric and transitive binary relation $\le$ exists. In this paper, $S = \{0, 1, \ldots, 255\}$ is considered, but extremal regions can be defined on, e.g., real-valued images ($S = \mathbb{R}$).
2. An adjacency (neighbourhood) relation $A \subset D \times D$ is defined.

Region Q is a contiguous subset of D, i.e., for each $p, q \in Q$ there is a sequence $p, a_1, a_2, \ldots, a_n, q$ with $pAa_1$, $a_iAa_{i+1}$ and $a_nAq$.
Region boundary: $\partial Q = \{q \in D \setminus Q : \exists p \in Q : qAp\}$, i.e., the boundary $\partial Q$ of Q is the set of pixels adjacent to at least one pixel of Q but not belonging to Q.
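These definitions translate directly into code. The sketch below computes the outer boundary ∂Q of a pixel set under an assumed 4-neighbourhood, and tests the extremal-region condition of [31]: every pixel of Q is strictly brighter than every boundary pixel (a maximum-intensity region) or strictly darker (a minimum-intensity region). Names are hypothetical, and the contiguity of Q is not checked.

```python
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # assumed 4-neighbourhood as relation A


def boundary(region, shape):
    """Outer boundary of `region` (a set of (row, col) pixels) in a domain of size (h, w)."""
    h, w = shape
    result = set()
    for r, c in region:
        for dr, dc in NEIGHBOURS:
            q = (r + dr, c + dc)
            if 0 <= q[0] < h and 0 <= q[1] < w and q not in region:
                result.add(q)
    return result


def is_extremal(image, region):
    """True if Q is a maximum- or minimum-intensity extremal region (contiguity not checked)."""
    border = boundary(region, (len(image), len(image[0])))
    inside = [image[r][c] for r, c in region]
    values = [image[r][c] for r, c in border]
    if not values:  # Q covers the whole domain: the condition holds vacuously
        return True
    return min(inside) > max(values) or max(inside) < min(values)
```

For a 3×3 image with a single bright centre pixel, the centre alone is an extremal region, but the centre together with one dark neighbour is not.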
MSER detects high-contrast regions of uniform gray level, so it can be used to detect traffic signs, which contain such uniform regions. MSER binarizes each frame with a range of thresholds and analyzes the resulting connected regions. Fig. 3 shows ROIs extracted by the MSER algorithm with different thresholds: the larger the threshold, the fewer the candidate regions, but if it is too large, the target region may be discarded. Images with traffic signs may be blurred by weather, equipment, environment and other factors, which makes image details indistinct. Image enhancement brings out these details and thus improves the accuracy of MSER extraction. Fig. 4 shows the difference before and after image enhancement: the original image is affected by light, the whole image is yellowish, and the contrast between the traffic sign area and its surroundings is low, so the sign does not stand out and MSER has difficulty detecting the ROI. After image enhancement, more regions are detected, the traffic signs are clearly displayed, and the possibility of missed detection is reduced.
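The threshold sweep can be sketched for a single seed pixel: grow the connected dark region Q(i) containing the seed at each threshold i, and score its stability with the criterion of [31], q(i) = |Q(i+Δ) \ Q(i−Δ)| / |Q(i)|, where a local minimum of q marks a maximally stable region. This brute-force version is for illustration only; the seed-based formulation, the 4-neighbourhood, Δ and the function names are assumptions, and practical MSER implementations use far more efficient algorithms.

```python
from collections import deque


def component(image, seed, t):
    """Connected region (4-adjacency) containing `seed` among pixels with value <= t."""
    h, w = len(image), len(image[0])
    if image[seed[0]][seed[1]] > t:
        return set()
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in region and image[nr][nc] <= t:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def stability(image, seed, delta, levels=256):
    """Map each threshold i to q(i) = |Q(i+delta) minus Q(i-delta)| / |Q(i)|,
    where Q(i) is the dark region grown from `seed`; empty Q(i) are skipped."""
    scores = {}
    for i in range(delta, levels - delta):
        q_i = component(image, seed, i)
        if not q_i:
            continue
        grown = component(image, seed, i + delta) - component(image, seed, i - delta)
        scores[i] = len(grown) / len(q_i)
    return scores
```

For a dark 3×3 square (level 20) on a bright background (level 200) with Δ = 10, q(i) is 0 for every threshold from 30 to 189, so the square is reported as maximally stable over that whole range; near 200 the region suddenly merges with the background and q(i) jumps.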

Geometric feature extraction of traffic signs
Preliminary screening for ROI. The potential regions extracted with MSER include many traffic sign regions as well as non-traffic-sign regions. To reduce the number of non-traffic-sign regions, a preliminary screening of ROIs is conducted according to the characteristic variables of traffic signs. Define H as the aspect ratio of a region and A as the ratio between the area of the potential region and the area of its minimum enclosing rectangle. An ROI passes the preliminary screening only if H and A fall within particular ranges:

1. $H_{\min} < H < H_{\max}$
2. $A > A_{\min}$
where the minimum value $H_{\min}$ and maximum value $H_{\max}$ of H, and the minimum value $A_{\min}$ of A, are determined by traffic sign design specifications. For traffic signs in China such as warning, prohibition, and indication signs, $H_{\min} = 0.5$, $H_{\max} = 2$ and $A_{\min} = 0.5$.
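A minimal sketch of this screening, assuming a region represented as a set of (row, col) pixels and using the axis-aligned bounding box as the minimum enclosing rectangle (a rotated minimum-area rectangle would be an alternative). H is taken as width over height, the defaults are the values for Chinese signs given above, and the function name is hypothetical.

```python
def screen_region(pixels, h_min=0.5, h_max=2.0, a_min=0.5):
    """Preliminary ROI screening for a region given as a set of (row, col) pixels.

    H = width / height of the axis-aligned bounding box (assumed enclosing rectangle),
    A = region area / bounding-box area. Returns True if the region is kept.
    """
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    H = width / height
    A = len(pixels) / (width * height)
    return h_min < H < h_max and A > a_min
```

A filled square (H = 1, A = 1) and a filled right triangle of 55 pixels in a 10×10 box (A = 0.55) pass, while a 2×10 bar (H = 5) and a 10-pixel diagonal line (A = 0.1) are rejected.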

Fig. 5. Preliminary screening for ROI
Secondary screening. After preliminary screening, the number of ROIs is greatly reduced. To further reduce non-target regions, the ROIs are screened again with geometric moment invariants. Traffic signs are mainly triangular, rectangular and circular. Hu moment invariants [32] recognize simple shapes quickly and accurately, and because they are invariant to rotation, scaling and translation, they are unaffected by these transformations.
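For a continuous image function f(x, y), the geometric moment of order p + q is
$$m_{pq} = \iint x^p y^q f(x, y)\,dx\,dy, \qquad p, q = 0, 1, 2, \ldots$$
and the central moment is
$$\mu_{pq} = \iint (x - \bar{x})^p (y - \bar{y})^q f(x, y)\,dx\,dy,$$
where $\bar{x}$ and $\bar{y}$ are the coordinates of the image's center of gravity, $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$. Using the second- and third-order normalized central moments $\eta_{pq} = \mu_{pq}/\mu_{00}^{(p+q)/2+1}$, seven moment invariants (Hu moment invariants), $M_1, M_2, \ldots, M_7$, can be constructed. This paper uses $M_1$ and $M_2$, which remain well invariant:
$$M_1 = \eta_{20} + \eta_{02}, \qquad M_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$$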
The moment invariants of each ROI are calculated and compared with those of known sign shapes; only ROIs close to a target shape are considered valid shape classes, and ROIs whose metric falls outside the threshold are discarded.
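The sketch below computes M1 and M2 for a binary region (f(x, y) = 1 on the region, with sums replacing the integrals) and keeps an ROI only if its (M1, M2) pair lies within a distance threshold of some reference shape. The Euclidean distance, the threshold of 0.01 and the reference values (those of ideal continuous circles, squares and equilateral triangles: M1 = 1/(2π), 1/6 and √3/9, each with M2 = 0) are assumptions for illustration, not values from the paper; the names are hypothetical.

```python
import math


def hu_m1_m2(pixels):
    """First two Hu invariants of a binary region given as a set of (x, y) pixels."""
    m00 = len(pixels)
    xc = sum(x for x, _ in pixels) / m00
    yc = sum(y for _, y in pixels) / m00

    def eta(p, q):
        # Normalized central moment: mu_pq / mu_00^((p+q)/2 + 1), with mu_00 = m00 here
        mu = sum((x - xc) ** p * (y - yc) ** q for x, y in pixels)
        return mu / m00 ** ((p + q) / 2 + 1)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return e20 + e02, (e20 - e02) ** 2 + 4 * e11 ** 2


# Hypothetical reference values for ideal continuous shapes (not taken from the paper)
TEMPLATES = {
    "circle": (1 / (2 * math.pi), 0.0),
    "square": (1 / 6, 0.0),
    "triangle": (math.sqrt(3) / 9, 0.0),  # equilateral
}


def match_shape(pixels, templates=TEMPLATES, max_dist=0.01):
    """Name of the closest template in (M1, M2) space, or None (ROI discarded)."""
    m1, m2 = hu_m1_m2(pixels)
    best, best_d = None, float("inf")
    for name, (r1, r2) in templates.items():
        d = math.hypot(m1 - r1, m2 - r2)
        if d < best_d:
            best, best_d = name, d
    return best if best_d <= max_dist else None
```

Because the invariants are normalized by area, a 10×10 pixel square already gives M1 = 0.165, close to the ideal 1/6, while a thin diagonal line yields M1 above 3 and is discarded.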
In Fig. 6, regions that do not match the circular shape are discarded, and the detected traffic sign is shown in the rectangular box.

Traffic sign recognition based on random forests with HSV-HOG-LBP feature
In this paper, an improved random forests method is used to recognize traffic signs. Random forests is an ensemble learning method composed of a number of simple decision trees. When generating the decision trees, randomness is introduced in both the row direction and the column direction, and the optimal split is then selected. In the row direction, the training data are obtained by sampling with replacement; in the column direction, feature subsets are obtained by random sampling without replacement. The decision trees are independent of each other, and the final result is decided by their vote. Random forests are insensitive to multicollinearity, robust to missing and unbalanced data, and can handle up to several thousand explanatory variables well. The random forests algorithm is as follows:
Step 1. Generate a single decision tree:
1. Given N training samples, each decision tree randomly draws n samples from the N as its training samples.
2. The number of input features of the training examples is M, and m is much smaller than M. At each node of each decision tree, m input features are randomly selected from the M, and the best of these m is chosen for the split; m remains unchanged during tree construction.
3. Each tree is split until all training samples at a node belong to the same category. Because the two random sampling processes above ensure randomness, overfitting does not occur even though the trees are not pruned.
Step 2. Generate t decision trees: repeat Step 1 to generate t decision trees that make up the forest.
Step 3. Classify with the random forest.
For each new test sample, the results of the individual decision trees are combined into the result of the random forest:
1. If the target is numeric, the average of the outputs of the t decision trees is taken as the result.
2. If the target is categorical, majority voting is used: the category predicted by the most trees is the final classification result of the random forest.
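The algorithm above can be sketched as a plain random forest in pure Python. This is not the authors' improved variant: the Gini split criterion, midpoint thresholds, m = √M, n = N and the default of 25 trees are assumptions (the paper uses 600 trees), and only the categorical (majority-vote) case of Step 3 is shown. Function names are hypothetical; class labels must not be tuples, because tuples encode internal nodes.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def build_tree(X, y, m, rng):
    """Step 1: grow an unpruned tree; a node becomes a leaf once all its samples share a class."""
    if len(set(y)) == 1:
        return y[0]
    best = None
    # Column direction: draw m of the M features without replacement at this node
    for f in rng.sample(range(len(X[0])), m):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between observed values
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # none of the sampled features can split this node: majority leaf
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], m, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], m, rng))


def predict_tree(node, x):
    """Follow the splits until a leaf (a class label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, t=25, m=None, seed=0):
    """Step 2: grow t trees, each on a bootstrap sample (row direction, with replacement)."""
    rng = random.Random(seed)
    m = m or max(1, int(len(X[0]) ** 0.5))  # m much smaller than M; sqrt(M) is a common choice
    forest = []
    for _ in range(t):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # n = N here
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, x):
    """Step 3 (categorical target): majority vote over the trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]
```

In practice a library implementation would replace this sketch; the point here is that the two random sampling steps (bootstrap rows, per-node feature subsets) and the final vote are all that distinguish a random forest from a single decision tree.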
The random forests algorithm needs an appropriate feature as the basis for recognition. HOG features perform very well in traffic sign recognition [19]. The basic idea of HOG is that the appearance and shape of a local object can be characterized rather well by the distribution of local intensity gradients or edge directions, even without precise knowledge of the corresponding gradient or edge positions. Ellahyani [33] combined HOG features with local self-similarity (LSS) features for random forests.
In this paper, we construct a new feature named HSV-HOG-LBP. The local binary pattern (LBP) is a local texture descriptor with strong discriminative ability and good robustness to gray-level changes caused by illumination, and it is simple to compute. HOG performs poorly when the background is cluttered with noisy edge points; concatenating HOG and LBP (HOG+LBP) reduces the influence of noise on the recognition results. We therefore create a new feature vector, HSV-HOG-LBP, in which HOG is combined with LBP in the HSV color space.
The steps of feature fusion for HSV-HOG-LBP are as follows:
1. Transform the original image into HSV space and separate it into the three channels H, S, and V.
2. In the H channel, compute the HOG and LBP feature vectors of the image and concatenate them to obtain the H-HOG-LBP feature vector.
3. Similarly, obtain the S-HOG-LBP and V-HOG-LBP feature vectors by repeating step 2 in the S and V channels.
4. Concatenate the three feature vectors above to build the HSV-HOG-LBP feature vector.
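A minimal sketch of the four fusion steps, using the standard-library `colorsys` module for the HSV conversion. The HOG here is deliberately simplified (per-cell, magnitude-weighted histograms of unsigned gradient orientation with a single global L2 normalization and no overlapping-block normalization), the LBP is the basic 8-neighbour, 256-bin variant, and the cell size, bin count and function names are assumptions; the paper does not state the parameters it used.

```python
import colorsys
import math


def hsv_planes(rgb):
    """Step 1: split an RGB image (rows of (r, g, b) tuples, 0..255) into H, S, V planes in [0, 1]."""
    hsv = [[colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for (r, g, b) in row] for row in rgb]
    return [[[px[k] for px in row] for row in hsv] for k in range(3)]


def lbp_histogram(plane):
    """Basic 3x3 LBP: an 8-bit code per interior pixel, returned as a normalized 256-bin histogram."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, len(plane) - 1):
        for x in range(1, len(plane[0]) - 1):
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if plane[y + dy][x + dx] >= plane[y][x]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1
    return [v / total for v in hist]


def hog_cells(plane, cell=4, bins=9):
    """Simplified HOG: per-cell orientation histograms (0-180 degrees), L2-normalized once."""
    h, w = len(plane), len(plane[0])
    feats = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # Central differences, clamped at the image border
                    gx = plane[y][min(x + 1, w - 1)] - plane[y][max(x - 1, 0)]
                    gy = plane[min(y + 1, h - 1)][x] - plane[max(y - 1, 0)][x]
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    hist[int(angle // (180.0 / bins)) % bins] += math.hypot(gx, gy)
            feats.extend(hist)
    norm = math.sqrt(sum(v * v for v in feats)) or 1.0
    return [v / norm for v in feats]


def hsv_hog_lbp(rgb, cell=4, bins=9):
    """Steps 2-4: per-channel HOG and LBP, concatenated as H-, S- and V-HOG-LBP."""
    descriptor = []
    for plane in hsv_planes(rgb):  # hue is treated as linear here (its wrap-around is ignored)
        descriptor += hog_cells(plane, cell, bins) + lbp_histogram(plane)
    return descriptor
```

For an 8×8 patch with 4×4 cells, each channel contributes 4 × 9 HOG values plus 256 LBP bins, giving a 3 × 292 = 876-dimensional descriptor.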

Experimental results and discussion
This paper uses two kinds of experimental data: images captured on real-world roads in China, and the German traffic sign detection benchmark (GTSDB) data set. The traffic signs are mainly triangular, rectangular and round, and are classified into four categories: class a (speed limit signs), class b (prohibition signs), class c (mandatory signs) and class d (danger signs). The experiments run on an Intel i5-4200M CPU with 8 GB of memory.

Experiments and analysis in detection stage
In the detection stage, the captured images are first processed by the gray world method and CLAHE image enhancement. Then the ROIs of traffic signs are extracted using image-enhanced MSER (IE-MSER) and the Hu moment invariants method. Fig. 7 shows an example. The recall ratio is used to measure detection performance:
$$\text{Recall} = \frac{N_{\text{det}}}{N_{\text{sum}}}$$
where $N_{\text{det}}$ is the number of detected signs and $N_{\text{sum}}$ is the total number of actual signs to be detected.
Experiments are conducted under normal and weak light conditions, and the IE-MSER method of this paper is compared with recent advanced methods: HOG+SVM [34], FCN (Fully Convolutional Network) [35], RGB_MSER [36] and YCbCr_DtBs [37]. The detection time and recall are shown in Table 1, and Figs. 8 and 9 show detection examples for actual images and GTSDB. The FCN method performs well under weak light, but, as a neural-network-based method, it needs a large number of training samples and an extremely long training time.
The RGB_MSER method processes signs of three colors, red (R), green (G) and blue (B), and localizes signs precisely with a multi-task CNN. Under normal light, the positions of regions with these colors can be extracted through RGB color normalization. Under weak light (at night, for example), however, color characteristics weaken, so color processing becomes ineffective and detection performance drops.
The IE-MSER method highlights the sign ROIs with image enhancement, and the constructed characteristic variables and Hu moment invariants locate signs more accurately. Therefore, it achieves the best performance under both normal and weak light conditions.

Experiments and analysis in recognition stage
Classification accuracy increases with the number of trees and becomes constant once the number reaches 600, as shown in Fig. 10. Therefore, this work uses a random forest classifier with 600 trees. Table 2 shows the Correct Classification Rates (CCR) on the GTSDB data set of the random forest classifier with three different features. The results indicate that color information improves classification performance (e.g., the HSV-HOG feature outperforms HOG), and that the HSV-HOG-LBP feature improves the recognition results further.
The HSV-HOG-LBP feature built in this paper achieves higher correct classification rates, mainly because it suppresses noise effectively: HOG is easily affected by noise, so we compute HOG in HSV space and combine it with LBP, which reduces the influence of noise. Table 3 shows the CCR and recognition time of different methods. With their good fault tolerance, adaptivity and strong self-learning ability, neural-network-based methods such as FCN, multi-task CNN and ANN achieve good recognition results, but they require long training: for instance, training takes less than 1 hour for the method of this paper but more than 20 hours for the CNN method, which leads to higher running costs.

Overall results and discussion
Table 4 shows the overall results obtained by combining the detection and recognition results. The overall time is the sum of the detection and recognition times, and the overall accuracy is the product of the detection and recognition accuracies. The overall time of the proposed method, from the original image to the completion of detection and recognition, is 469 ms, which meets the real-time requirement.
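The combination rule is simple arithmetic; as a sketch (the function name is hypothetical, and accuracies are taken as fractions in [0, 1]):

```python
def combine_stages(detect_ms, recog_ms, detect_acc, recog_acc):
    """Overall time = detection time + recognition time (ms);
    overall accuracy = detection accuracy x recognition accuracy."""
    return detect_ms + recog_ms, detect_acc * recog_acc
```

For example, illustrative stage accuracies of 0.9 and 0.95 (not values from Table 4) combine to an overall accuracy of 0.855, so any error in either stage directly lowers the end-to-end result.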
For the HOG+SVM method in [34], HOG features alone can hardly reduce the effect of noise, and the SVM kernel function parameters are difficult to tune optimally, which leads to poor performance.
CNN-based methods have some advantages in time and accuracy; however, many dynamic parameters need to be configured, especially during model training.
The proposed method performs well under both normal and weak light. In the detection stage, the impact of illumination variation is reduced with the Gray World algorithm, the ROIs are highlighted with image-enhanced MSER, and signs are located and extracted more accurately through the constructed characteristic variables and Hu moment invariants. In the recognition stage, the HSV-HOG-LBP feature effectively reduces the effect of noise, and the complexity of the random forests method is lower than that of neural-network-based methods. Overall, the performance of this method is close to, and sometimes exceeds, that of neural-network-based methods.

Conclusions
Aiming at the problem of traffic sign detection and recognition, this paper presents a method based on image-enhanced MSER and random forests. The illumination intensity is corrected using the gray world algorithm. To improve the accuracy of ROI extraction, CLAHE is used to enhance image contrast, and the ROIs of traffic signs are extracted by the image-enhanced MSER method. Circles, triangles and rectangles are identified according to the ROI area and the geometric moment invariants. In the recognition stage, a new feature descriptor, HSV-HOG-LBP, is constructed, and traffic signs are recognized by random forests with the HSV-HOG-LBP feature. Experiments are carried out with two kinds of data: traffic sign photographs taken on real roads and the GTSDB data set. The performance of each stage and the overall performance are compared with recent advanced methods. The experiments show that the proposed method is robust to illumination conditions and to rotation and scale changes, and performs well under both normal and weak light. As a whole, this method meets the real-time and accuracy requirements of traffic sign detection and recognition.