Fault Diagnosis of Rolling Bearing Based on Tunable Q-Factor Wavelet Transform and Convolutional Neural Network

Abstract: The rolling bearing is used extensively in rotary machines and industrial processes. Effective fault diagnosis for rolling bearings directly affects the service life of the device and the safety of its operators. In this paper, a fault diagnosis method based on a tunable-Q wavelet transform (TQWT) and a convolutional neural network (CNN) is proposed to reduce the influence of noise on the bearing vibration signal and to reduce the dependence on human experience in traditional diagnosis methods. TQWT is used to decompose and denoise the vibration signal, while the CNN extracts fault features and performs fault classification. Seven motor operating conditions are used to evaluate the proposed approach: normal, drive end rolling ball failure (DE-B), drive end inner raceway failure (DE-IR), drive end outer raceway failure (DE-OR), fan end rolling ball failure (FE-B), fan end inner raceway fault (FE-IR), and fan end outer raceway fault (FE-OR). The experimental results indicate that the fault diagnosis accuracy of the proposed method reaches 99.8%.


Introduction
Rolling bearings are widely employed in many machines. Their operational condition directly influences the lifetime of the machine and the safety of its operators. Effective condition monitoring and fault diagnosis of rolling bearings can prevent unexpected device faults and enhance device operational efficiency [1][2]. Currently, there are two main categories of rolling bearing fault diagnosis techniques.
The first category uses traditional signal processing methods to extract fault features and then applies pattern recognition methods for fault diagnosis. For example, Inturi et al. [3] proposed an integrated condition monitoring scheme for the bearing on a wind turbine gearbox using the discrete wavelet transform (DWT) together with vibration, acoustic, and lubrication oil analysis techniques. Their experimental results show that the integrated scheme achieves better classification accuracy than a single condition monitoring approach. Abdelkader et al. [4] explored an improved empirical mode decomposition method to denoise the noisy vibration signals of rolling bearings. Their experimental results indicate that the approach is more suitable for incipient fault detection than traditional signal denoising methods. Manjurul et al. [5] presented a bearing fault diagnosis approach using bearing acoustic emission signals and a Bayesian inference-based multi-class support vector machine, which improves the classification accuracy. Tiwari et al. [6] proposed a bearing fault diagnosis method using multi-scale permutation entropy and an adaptive neuro-fuzzy classifier, and their experimental results verify the potential of the method for early bearing fault diagnosis.
In summary, the first category of bearing fault diagnosis methods is able to obtain acceptable fault diagnosis results. However, it depends mainly on previous experience to extract surface fault features, and it cannot deeply mine the correlation between the vibration signal and the bearing fault types. Moreover, in most applications in this category the signal processing methods and pattern recognition methods are combined in an ad hoc manner. These drawbacks limit the broader application of these techniques.
The second category uses deep learning to extract fault features and perform fault classification. Since Hinton put forward the concept of deep learning [7], it has been successfully applied in many fields such as data transmission in IoT [8], image classification [9], and language understanding [10]. Several deep learning approaches have also been developed and reported for fault diagnosis. Xu et al. [11] proposed a bearing fault diagnosis method using a deep belief network with multiple hidden layers and affinity propagation. This method eliminates the dependence of the training process on fault labels and improves the efficiency of training. Compared with traditional empirical mode decomposition methods, it has higher fault diagnosis accuracy. Sohaib et al. [12] combined a hybrid feature pool and a deep neural network based on sparse stacked autoencoders (SSAEs) to identify different bearing faults and their severities. The experimental results show that, compared with support vector machines and back propagation neural networks, the presented method can better classify bearing defects with various severities. Sun et al. [13] proposed an intelligent bearing fault diagnosis approach based on compressed sensing and an SSAE-based deep neural network to reduce the computing workload of processing massive fault data. The effectiveness of the method is verified using rolling bearing data sets from the Case Western Reserve University Bearing Data Center.
The convolutional neural network (CNN) [14] is another promising deep learning method for bearing fault diagnosis because it has high robustness and fault tolerance and is easy to train and optimize. CNNs have received significant research interest in recent years and have been successfully applied in fault diagnosis. Verstraete et al. [15] proposed a bearing fault diagnosis approach that uses a CNN to process the image representation obtained by time-frequency analysis of the raw vibration signal. The feasibility of the presented system has been verified by experiments. Eren et al. [16] proposed a generic intelligent bearing fault diagnosis method based on a compact adaptive 1D CNN fault classifier. The experimental results on two commonly used benchmark vibration datasets indicate that this method can obtain competitive fault diagnosis results using the raw vibration signal and a 1D CNN with a simple configuration. Islam et al. [17] proposed a fault diagnosis system using an adaptive deep convolutional neural network and the wavelet packet transform to automate and better generalize the fault feature extraction and diagnosis process. Experimental results show that the presented method outperforms the existing multi-fault classification algorithms and has good classification accuracy.
The wavelet is a well-known tool for signal and data processing. Recently, researchers have begun to combine wavelets with CNNs for various applications. The discrete wavelet transform (DWT) and CNN were successfully used to identify gearbox faults [18], facial expressions [19], and brain tumors [20], while the continuous wavelet transform (CWT) and CNN were explored to detect feeder faults in resonant grounding distribution systems [21]. In addition, wavelet packet decomposition (WPD) and CNN were successfully employed to build a wind speed prediction model [22].
The tunable-Q wavelet transform (TQWT) was proposed by Selesnick in 2011. Compared with the wavelet transform and Fourier transform, the TQWT is more suitable for analyzing nonstationary signals [23]. Although there have been several successful applications of TQWT to bearing fault diagnosis [24], bearing fault diagnosis by combining TQWT and CNN is a relatively unexplored area.
In contrast to the above bearing fault diagnosis methods, this paper proposes a novel fault diagnosis method based on TQWT and CNN, in which TQWT is used to eliminate the influence of noise on the original bearing vibration signal, while the CNN is adopted for fault diagnosis.
The remainder of this paper is organized as follows. The principles of TQWT and CNN are briefly introduced in Section II. Section III describes the implementation methodology and gives the experimental results. Finally, Section IV presents the overall conclusions.

Theoretical Background

TQWT principle
TQWT is a fully discrete wavelet transform implemented by iterating two-channel filter banks and a discrete Fourier transform. Unlike the traditional wavelet transform, TQWT is based on the oscillatory behavior of the measured signal rather than its frequency behavior, and it can adaptively adjust the Q value of the wavelet basis function according to the signal characteristics. Therefore, the degrees of oscillation in the wavelet basis function and in the extracted components of the measured signal can be better matched by using TQWT [23][24][25][26].
Q-factor ($Q$): The Q-factor (denoted as $Q$) of the TQWT is defined as
$$Q = \frac{f_c}{BW}$$
where $f_c$ is the center frequency of the signal and $BW$ is the signal bandwidth. It can be seen from the definition that $Q$ reflects the frequency clustering, time aggregation, and resonance properties of the signal. A relatively high Q-factor suits strongly oscillatory signals, whose bandwidth is narrow relative to their center frequency, while a low Q-factor suits signals with little or no oscillatory behavior, whose bandwidth is wide relative to their center frequency.
Low-pass scaling: The low-pass scaling in the frequency domain preserves the low-frequency content of the signal. The principle of low-pass scaling is shown in Fig. 1, where $\alpha$ is the low-pass scaling parameter. If the sampling frequency of the input signal is $f_s$, the sampling frequency of the output signal will be $\alpha f_s$.

Fig. 1. Block diagram of low-pass scaling
High-pass scaling: The high-frequency content of the signal is preserved by high-pass scaling. The block diagram of the high-pass scaling is illustrated in Fig. 2, where $\beta$ is the high-pass scaling parameter. If the sampling frequency of the input signal is $f_s$, the sampling frequency of the output signal will be $\beta f_s$.

Fig. 2. Block diagram of high-pass scaling
The high-pass and low-pass scaling parameters are defined as
$$\beta = \frac{2}{Q+1}, \qquad \alpha = 1 - \frac{\beta}{\gamma}$$
where $\gamma$ is the redundancy factor, which is usually no less than 3. To prevent excessive redundancy of the wavelet transform and to obtain perfect reconstruction with TQWT, $\alpha$ and $\beta$ should satisfy $0 < \alpha < 1$, $0 < \beta \le 1$, and $\alpha + \beta > 1$.
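As a worked check, assuming the standard TQWT definitions $\beta = 2/(Q+1)$ and $\alpha = 1 - \beta/\gamma$, the parameter values used later in the experiments (a Q-factor of 1 and a redundancy of 3) give:

$$\beta = \frac{2}{1+1} = 1, \qquad \alpha = 1 - \frac{1}{3} = \frac{2}{3}, \qquad \alpha + \beta = \frac{5}{3} > 1$$

Under these values each low-pass stage outputs a signal sampled at two thirds of its input rate, while the high-pass branch keeps the input rate.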
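Signal decomposition and reconstruction: When the measured signal is processed by TQWT, the signal is decomposed and reconstructed by iteratively applying the decomposition filter bank and the reconstruction filter bank. Taking a $J$-layer signal decomposition and reconstruction as an example, the corresponding two-channel filter banks are illustrated in Fig. 3. $H_0(\omega)$ and $H_1(\omega)$ are the frequency response functions of the low-pass filter and the high-pass filter, respectively, while $H_0^*(\omega)$ and $H_1^*(\omega)$ are the complex conjugates of $H_0(\omega)$ and $H_1(\omega)$. $H_0(\omega)$ and $H_1(\omega)$ are generally defined as
$$H_0(\omega) = \begin{cases} 1, & |\omega| \le (1-\beta)\pi \\ \theta\left(\dfrac{\omega + (\beta-1)\pi}{\alpha+\beta-1}\right), & (1-\beta)\pi < |\omega| < \alpha\pi \\ 0, & \alpha\pi \le |\omega| \le \pi \end{cases}$$
$$H_1(\omega) = \begin{cases} 0, & |\omega| \le (1-\beta)\pi \\ \theta\left(\dfrac{\alpha\pi - \omega}{\alpha+\beta-1}\right), & (1-\beta)\pi < |\omega| < \alpha\pi \\ 1, & \alpha\pi \le |\omega| \le \pi \end{cases}$$
$$\theta(\omega) = \frac{1}{2}\,(1+\cos\omega)\sqrt{2-\cos\omega}, \qquad |\omega| \le \pi$$
The length of the measured signal determines the number of TQWT levels, and the theoretical maximum decomposition level of TQWT is given by
$$J_{\max} = \left\lfloor \frac{\log(\beta N / 8)}{\log(1/\alpha)} \right\rfloor \tag{7}$$
where $N$ is the length of the measured signal to be decomposed, and $\lfloor\cdot\rfloor$ denotes rounding down.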
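The filter responses and the maximum decomposition level can be checked numerically. The following is a minimal sketch that assumes the standard TQWT formulation (a transition function $\theta(\omega) = \frac{1}{2}(1+\cos\omega)\sqrt{2-\cos\omega}$, $\beta = 2/(Q+1)$, $\alpha = 1-\beta/\gamma$, and $J_{\max} = \lfloor \log(\beta N/8)/\log(1/\alpha) \rfloor$). The function names are hypothetical, and the signal length of 1024 is an assumption taken from the CNN input length rather than a value stated for the TQWT stage.

```python
import math


def theta(w):
    """Transition function; satisfies theta(w)**2 + theta(pi - w)**2 == 1."""
    return 0.5 * (1.0 + math.cos(w)) * math.sqrt(2.0 - math.cos(w))


def tqwt_filters(w, alpha, beta):
    """Return (H0, H1) at angular frequency w with |w| <= pi (even in w)."""
    a = abs(w)
    band_start = (1.0 - beta) * math.pi  # below this only the low-pass branch passes
    band_end = alpha * math.pi           # above this only the high-pass branch passes
    if a <= band_start:
        return 1.0, 0.0
    if a >= band_end:
        return 0.0, 1.0
    # Map the transition band onto (0, pi).
    u = (a - band_start) / (alpha + beta - 1.0)
    return theta(u), theta(math.pi - u)


def tqwt_max_level(q, redundancy, n_samples):
    """Theoretical maximum number of decomposition levels."""
    beta = 2.0 / (q + 1.0)
    alpha = 1.0 - beta / redundancy
    return math.floor(math.log(beta * n_samples / 8.0) / math.log(1.0 / alpha))


# Q-factor 1 and redundancy 3 give beta = 1 and alpha = 2/3.
alpha, beta = 1.0 - 1.0 / 3.0, 1.0
worst_error = max(
    abs(h0 * h0 + h1 * h1 - 1.0)
    for h0, h1 in (tqwt_filters(math.pi * k / 200, alpha, beta) for k in range(201))
)
print(worst_error)                 # deviation from |H0|^2 + |H1|^2 = 1
print(tqwt_max_level(1, 3, 1024))  # 11
```

The power-complementary check ($|H_0|^2 + |H_1|^2 = 1$ at every frequency) is the property that makes perfect reconstruction possible, and a length of 1024 with a Q-factor of 1 and a redundancy of 3 yields a maximum level of 11.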
TQWT can set its Q-factor according to the actual application without relying on the basis function. All the wavelets based on the same basis function but with different Q-factors and redundancy factors form a library of wavelet functions. Therefore, TQWT is essentially a constant Q-factor wavelet transform with a certain degree of redundancy.

CNN principle
Generally, a CNN consists of an input layer, multiple alternating convolutional layers (i.e., C1…Cn) and pooling (subsampling) layers (i.e., S1…Sn), a fully connected layer, and an output layer. In this research, one convolutional layer and one pooling layer are employed. The architecture of the CNN used in this application is depicted in Fig. 4.
Input layer: In this paper, the input of the CNN is the vibration signal of the rolling bearing operating under various conditions. The size of the input data is 1024×3×1. Every data set consists of 1024 sampled points, and every sampled point includes three vibration measurements obtained from the drive end, the fan end, and the base of the motor.
Convolutional layer: The convolutional layer extracts the features of the original vibration signal. When the vibration signal is input to the CNN, a convolution operation with each receptive field is carried out using
$$x_j = \sum_{i=1}^{M} x'_i * k_{ij} + b_j$$
where $x_j$ is the j-th output feature map of the current layer, $x'_i$ is the i-th output feature map of the previous layer, $k_{ij}$ is the weight matrix connecting $x'_i$ and $x_j$, $b_j$ is the additive bias of the j-th receptive field, and $M$ is the number of receptive fields in the previous layer.
Before training the CNN, the size and number of the receptive fields are set to 5×3 and 32, respectively. The output of the convolution of the vibration signal with the receptive fields is feature signal C1. Its number of channels equals the number of receptive fields, and its length and width are $h'$ and $w'$, respectively. $h'$ and $w'$ are given by
$$h' = \left\lfloor \frac{h - f_h}{s} \right\rfloor + 1, \qquad w' = \left\lfloor \frac{w - f_w}{s} \right\rfloor + 1 \tag{10}$$
where $\lfloor\cdot\rfloor$ denotes rounding down, $h$ and $w$ are the length and width of the input data, and $f_h$, $f_w$, and $s$ are the length, width, and stride length of the receptive field, respectively.
In this example, the ReLU function is used as the activation function. The dimension of feature signal C1 is unchanged by the activation function, so the size of feature signal C1 is 1020×1×32.
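The size of feature signal C1 can be reproduced with a short sketch. It assumes an unpadded convolution with a stride of 1 (the stride is not stated explicitly, but it is the value that yields a length of 1020), and the function names and toy input are illustrative only.

```python
def conv_output_size(h, w, f_h, f_w, stride=1):
    """Length and width of the output of an unpadded convolution."""
    return (h - f_h) // stride + 1, (w - f_w) // stride + 1


def conv2d_valid(x, kernel, bias=0.0):
    """Slide one receptive field over x (a list of rows) with stride 1.

    As is usual in CNNs, the kernel is applied without flipping.
    """
    f_h, f_w = len(kernel), len(kernel[0])
    out_h, out_w = conv_output_size(len(x), len(x[0]), f_h, f_w)
    return [
        [
            bias + sum(x[r + i][c + j] * kernel[i][j]
                       for i in range(f_h) for j in range(f_w))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Element-wise ReLU; the shape of the feature map is unchanged."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Shape of C1 for a 1024x3 input and a 5x3 receptive field.
print(conv_output_size(1024, 3, 5, 3))  # (1020, 1)

# Toy input: 8 sampled points with 3 measurements each (made-up values).
toy_input = [[float(r), -float(r), 1.0] for r in range(8)]
ones_kernel = [[1.0, 1.0, 1.0] for _ in range(5)]
toy_c1 = relu(conv2d_valid(toy_input, ones_kernel))
print(len(toy_c1), len(toy_c1[0]))  # 4 1
```

Because the receptive field spans all three input columns, the width collapses to 1 and each of the 32 receptive fields yields one 1020×1 channel.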
Pooling layer: Pooling layers reduce the dimension of the extracted features to obtain the most essential signal features. Feature signal C1 is the input of the pooling layer, and feature signal S1 is the output of the pooling layer. The calculation formula for the pooling layer is
$$x_j = \lambda_j \,\mathrm{pool}(x'_j) + b_j$$
where $x_j$ is the j-th output feature signal of the current layer, $x'_j$ is the j-th output feature signal of the previous layer, $\mathrm{pool}(\cdot)$ represents the pooling rule, $\lambda_j$ is the j-th multiplicative bias of the current layer, and $b_j$ is the j-th additive bias of the current layer.
In this case, max pooling with a window of 10×1 and a stride length of 1 is adopted. The dimension of feature signal S1 is calculated in a similar way to equation (10). As pooling changes only the length and width of the feature signal and not its number of channels, the dimension of feature signal S1 is 1011×1×32.
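The pooled length can be verified with a minimal sketch of sliding-window max pooling on one channel. The function name and the channel values are made up for illustration, and the multiplicative and additive biases are assumed to be 1 and 0.

```python
def max_pool_1d(signal, window=10, stride=1):
    """Sliding-window maximum without padding."""
    n_out = (len(signal) - window) // stride + 1
    return [max(signal[k * stride:k * stride + window]) for k in range(n_out)]


# One channel of C1 has 1020 values (placeholder data).
channel = [float((7 * k) % 13) for k in range(1020)]
pooled = max_pool_1d(channel)
print(len(pooled))  # 1011

# Small example that is easy to follow by hand.
print(max_pool_1d([1, 3, 2, 5, 4], window=2))  # [3, 3, 5, 5]
```

With a stride of 1 the window overlaps heavily, so the length shrinks only from 1020 to 1011 while each output keeps the largest value in its window.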
Fully connected layer and output layer: By flattening feature signal S1, a 1D feature signal with a dimension of 32352×1 is obtained, which is used as the input signal of the fully connected layer. In this paper, the CNN is used to classify seven motor operating conditions, so the fully connected layer has seven receptive fields and the size of each receptive field is 32352×1. The fully connected layer is calculated by
$$y = \mathbf{k} \cdot \mathbf{x} + b$$
where $\mathbf{k}$ is the receptive field, $\mathbf{x}$ is the input signal, and $b$ is the additive bias. The output $y$ of each receptive field in the fully connected layer is a scalar. Finally, a Softmax function is used as the activation function for the condition classification.
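The last two stages can be sketched as follows. The weights are random placeholders rather than trained values, the ordering of the seven labels is an assumption, and the function names are illustrative.

```python
import math
import random


def fully_connected(x, weights, biases):
    """One scalar output per receptive field: dot(k, x) + b."""
    return [sum(k_i * x_i for k_i, x_i in zip(k, x)) + b
            for k, b in zip(weights, biases)]


def softmax(scores):
    """Numerically stable softmax (subtract the maximum before exponentiating)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


LABELS = ["NOR", "DE-B", "DE-IR", "DE-OR", "FE-B", "FE-IR", "FE-OR"]  # assumed order
n_inputs = 1011 * 1 * 32  # flattened S1
print(n_inputs)  # 32352

rng = random.Random(0)
flat_s1 = [rng.random() for _ in range(n_inputs)]  # placeholder features
weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_inputs)] for _ in LABELS]
biases = [0.0] * len(LABELS)

probabilities = softmax(fully_connected(flat_s1, weights, biases))
predicted = LABELS[probabilities.index(max(probabilities))]
print(predicted, round(sum(probabilities), 6))
```

The seven Softmax outputs sum to 1, and the condition with the largest probability is taken as the diagnosis.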

Experimental Validation
The flowchart of the proposed rolling bearing fault diagnosis approach based on TQWT and CNN is shown in Fig. 5. It consists of a training part and a testing part. The training part comprises the following steps.
First, the Q-factor and the redundancy γ are determined according to the characteristics of the original vibration signal and empirical methods, respectively, and then the optimal number of decomposition layers J is found.
Second, the vibration signal is decomposed and denoised using the TQWT with the parameters selected in the previous step.
Third, the CNN is trained and tuned using the vibration signals processed by TQWT to obtain the essential fault features.
Finally, the mapping relationship between the output of the CNN and the corresponding bearing faults is determined.
In the testing part, the testing data sets are input to the TQWT and CNN models obtained from the training data, and the diagnosis results are obtained.
The algorithms implementing the flow chart of Fig. 5 are coded using MATLAB on a laptop with Intel i3-3240 3.4GHz CPU and 8GB RAM. As this paper mainly explores the feasibility of fault diagnosis using TQWT and CNN, instead of building up a testbed, this paper directly uses the data released by the Case Western Reserve University Bearing Data Center [27]. The experimental setup includes a 2hp electric motor, a torque sensor, a power test meter, a controller, and three accelerometers recording the vibration signals at the base plate, the drive end and the fan end of the motor. In

Signal denoising by TQWT
To reduce the influence of noise on the fault diagnosis result, the original vibration signal is first denoised by TQWT. The choice of the redundancy of TQWT is constrained by the high-pass and low-pass scaling parameters, so the redundancy must be strictly greater than 1. However, when the redundancy is too small (i.e., close to 1), the frequency-domain response bandwidth of the signal becomes narrower, resulting in local degradation of the time-domain response. In this paper, 3 is selected as the redundancy of TQWT since it has been widely used in previous studies [28][29][30][31]. The kurtosis of the vibration signal is used to determine the Q-factor and the optimal decomposition level ($J$) of TQWT. In this experiment, the Q-factor is 1 and $J$ is 11, which equals the maximum decomposition level calculated by Equation (7). The 11th-level decomposition signal obtained by TQWT is shown in Fig. 6. The 11th-level decomposition signal is almost free of noise, and its waveform and period are clearer than those of the original signal.
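Kurtosis is sensitive to the impulses that bearing defects introduce, which is why it is a common indicator for this kind of parameter selection. The sketch below only illustrates the statistic on synthetic signals. The rule of keeping the candidate with the largest kurtosis is an assumption, since the exact selection procedure is not described here, and all names and signals are hypothetical.

```python
import math


def kurtosis(signal):
    """Non-excess kurtosis: fourth central moment divided by squared variance."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((v - mean) ** 2 for v in signal) / n
    m4 = sum((v - mean) ** 4 for v in signal) / n
    return m4 / (m2 * m2)


n = 1024
# A pure sinusoid over whole periods has a kurtosis of 1.5.
smooth = [math.sin(2 * math.pi * 8 * k / n) for k in range(n)]
# The same sinusoid with periodic impulses, loosely mimicking fault impacts.
impulsive = [v + (5.0 if k % 128 == 0 else 0.0) for k, v in enumerate(smooth)]

candidates = {"smooth": smooth, "impulsive": impulsive}
# Assumed selection rule: keep the candidate with the largest kurtosis.
best = max(candidates, key=lambda name: kurtosis(candidates[name]))
print(round(kurtosis(smooth), 6), best)  # 1.5 impulsive
```

In practice the candidates would be the decomposed signals obtained for different Q-factors and decomposition levels rather than synthetic waveforms.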
As can be seen from Fig. 6, the signal after TQWT-based denoising is quite distinguishable. However, the choice of the redundancy and Q-factor of TQWT is based on past experience, and an incorrect selection of these parameters may result in insufficient or excessive processing of the original vibration signal. Therefore, a CNN is a sensible choice of classifier, since it can learn the fault feature information directly from the vibration signals and avoid the influence of human factors on the diagnosis result as much as possible.

CNN model parameter setting
Due to the randomness and uninterpretability of the training process of a CNN, the training parameter settings of the CNN fault diagnosis model directly affect its training time and fault diagnosis accuracy. Considering the computational complexity and reconstruction rate of the model, the proposed CNN model is designed with an input layer, a convolutional layer C1, a pooling layer S1, a fully connected layer, and an output layer. Max pooling is used in the pooling layer. In the training procedure, a dropout layer is added to prevent the model from overfitting. The dropout ratio is 0.5, the learning rate is 0.01, the maximum number of training epochs is 20, and the mini-batch size is 5. During the testing phase, the dropout layer is disabled. The main parameters of the convolutional layer and the subsampling layer are given in Table 1.
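The difference between the training and testing behavior of the dropout layer can be sketched as follows. This assumes the common "inverted dropout" convention, in which surviving values are rescaled during training so that no rescaling is needed at test time. Whether the original MATLAB implementation follows this convention is not stated, and the function name is illustrative.

```python
import random


def dropout(features, rate=0.5, training=True, rng=random):
    """Zero each value with probability `rate` during training.

    Surviving values are divided by (1 - rate); at test time the input
    is returned unchanged.
    """
    if not training:
        return list(features)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in features]


features = [1.0] * 1000
train_out = dropout(features, rate=0.5, training=True, rng=random.Random(0))
test_out = dropout(features, rate=0.5, training=False)

print(sorted(set(train_out)))  # [0.0, 2.0]
print(test_out == features)    # True
```

With a dropout ratio of 0.5, roughly half of the flattened features are zeroed in each training step, which discourages the fully connected layer from relying on any single feature.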

Experiment results
Fault diagnosis accuracy: In this experiment, 70%, 50%, 30%, and 10% of the vibration data sets for every operating condition are randomly selected as the training data for the CNN, while the remaining data sets are used as test data to evaluate the accuracy of the proposed fault diagnosis approach. The fault diagnosis results are given in Table 2. The fault diagnosis accuracy of the proposed method based on TQWT and CNN is high for all four ratios of training data to testing data. The accuracies reach 100% for the ratios of 7:3 and 5:5, and 99.8% for the ratios of 3:7 and 1:9. The training times for the four ratios range from 8 seconds to 48 seconds. When the ratio of training data to testing data is 1:9, the shortest training time (8 seconds) is needed and the fault diagnosis accuracy still reaches 99.8%, which indicates that the training procedure of the proposed method is effective and efficient.
The fault diagnosis accuracy for the seven motor operating conditions under the various ratios of training data to testing data is shown in Fig. 7. The fault diagnosis accuracies for five operating conditions (normal (NOR), drive end rolling ball failure (DE-B), drive end inner raceway failure (DE-IR), drive end outer raceway failure (DE-OR), and fan end inner raceway fault (FE-IR)) are 100% for all four ratios. Although the proposed method does not identify every fan end rolling ball failure (FE-B) under the 3:7 and 1:9 data ratios, or every fan end outer raceway fault (FE-OR) under the 1:9 data ratio, the fault diagnosis accuracies for both conditions still reach 98%.
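The per-condition random split can be sketched as below. The number of data sets per condition (100) and the sample identifiers are placeholders, since the actual counts are not given here, and the function names are hypothetical.

```python
import random


def split_per_condition(samples_by_condition, train_fraction, seed=0):
    """Randomly split every operating condition with the same training fraction."""
    rng = random.Random(seed)
    train, test = [], []
    for condition, samples in samples_by_condition.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        n_train = round(train_fraction * len(shuffled))
        train += [(sample, condition) for sample in shuffled[:n_train]]
        test += [(sample, condition) for sample in shuffled[n_train:]]
    return train, test


def accuracy(predicted, actual):
    """Fraction of test samples whose predicted condition matches the label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


CONDITIONS = ["NOR", "DE-B", "DE-IR", "DE-OR", "FE-B", "FE-IR", "FE-OR"]
data = {c: [f"{c}_{k}" for k in range(100)] for c in CONDITIONS}  # placeholder ids

split_sizes = {}
for fraction in (0.7, 0.5, 0.3, 0.1):
    train, test = split_per_condition(data, fraction)
    split_sizes[fraction] = (len(train), len(test))
print(split_sizes)
```

Splitting within each condition keeps all seven conditions represented in both the training and the testing data, even at the 1:9 ratio.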

Fig. 7. Fault diagnosis accuracies for various motor operating conditions
Model generalization ability: To evaluate the generalization ability of the proposed algorithm, four other motor working conditions associated with the drive end bearing are used. The corresponding information about the motor working status is listed in Table 3. The experimental results indicate that the fault diagnosis accuracies reach 100% when the ratios of training data to testing data are 7:3, 5:5, and 3:7; when the data ratio is 1:9, the accuracy reaches 97.78%. Therefore, the proposed algorithm has good generalization ability.
The effect of TQWT: To verify the effect of the TQWT denoising used in this research, the fault diagnosis accuracy using TQWT and CNN is compared with the results obtained by applying the CNN directly to the original vibration data. The experimental results are given in Table 4. The results indicate that the fault diagnosis accuracy using TQWT+CNN is higher than the accuracy using CNN without TQWT when the ratios of training data to testing data are 5:5, 3:7, and 1:9. In particular, the accuracy is improved by about 3% for the data ratio of 1:9, which indicates that TQWT denoising is more effective for small training data sets.
Comparison with other methods: The fault diagnosis accuracy of the proposed approach is compared with the results reported in [32], which uses the wavelet transform and a support vector machine to classify five operating conditions. Using 590 data sets for training and 145 data sets for testing, [32] obtains an optimal diagnosis accuracy of 99.29%. Using the same data and our algorithm based on TQWT and CNN, the fault diagnosis accuracy reaches 100%.

Conclusion
In this paper, a fault diagnosis method based on TQWT and CNN has been proposed and evaluated using the vibration data sets of seven motor operating conditions released by the Case Western Reserve University Bearing Data Center. The experimental results show:
• The fault diagnosis accuracies of the presented method reach 100% when the ratios of training data to testing data are 7:3 and 5:5, and 99.8% for the ratios of 3:7 and 1:9.
• The training procedure of the proposed method is efficient: when the ratio of training data to testing data is 1:9, only 8 seconds are needed for training.
• The proposed algorithm has good generalization ability, and its fault diagnosis accuracy is higher than the reported accuracy obtained using wavelets and support vector machines.
iJOE - Vol. 16, No. 2, 2020