IWSNs with On-Sensor Data Processing for Energy Efficient Machine Fault Diagnosis

Machine fault diagnosis systems need to collect and transmit dynamic signals, such as vibration and current, at high speed. However, industrial wireless sensor networks (IWSNs) and the Industrial Internet of Things (IIoT) are generally based on low-speed wireless protocols, such as ZigBee and IEEE 802.15.4. Transmitting large amounts of data also increases the energy consumption and shortens the lifetime of energy-constrained IWSN nodes. To address these tensions when implementing machine fault diagnosis applications in IWSNs, this paper proposes an energy-efficient IWSN with on-sensor data processing. On-sensor wavelet transforms using four popular mother wavelets are explored for fault feature extraction, while an on-sensor support vector machine classifier is investigated for fault diagnosis. The effectiveness of the presented approach is evaluated by a set of experiments using motor bearing vibration data. The experimental results show that, compared with raw data transmission, the proposed on-sensor fault diagnosis method can reduce the payload transmission data by 99.95% and the node energy consumption by about 10%, while the fault diagnosis accuracy of the proposed approach reaches 98%.
Keywords: Industrial wireless sensor networks (IWSNs), fault diagnosis, wavelet transform, support vector machine, Industrial Internet of Things (IIoT)


Introduction
In recent decades, many novel machine fault diagnosis approaches have been proposed to prevent unexpected catastrophic machine failures and reduce the related economic loss [1]. Currently, the emergence of the Internet of Things (IoT) and its deployment in industrial settings, namely the Industrial Internet of Things (IIoT), are transforming traditional industries in many areas, including machine fault diagnosis [2][3][4][5][6]. IIoT and its wireless implementation, industrial wireless sensor networks (IWSNs), can sense device information and then transmit this data via a base station and the Internet to powerful cloud servers to enable real-time wireless machine condition monitoring and fault diagnosis [7].
ple of risk minimization [14][15][16]. To date, on-sensor fault diagnosis using SVM remains a relatively unexplored area for IWSNs, although there are many successful SVM-based fault diagnosis applications on wired systems.
This paper explores the feasibility of using IWSNs with on-sensor WT and SVM for fault feature extraction and fault diagnosis, compares the effectiveness of on-sensor fault feature extraction using various mother wavelets, and quantifies the node energy cost of the proposed on-sensor fault diagnosis approach. In this paper, the induction motor and its vibration signals are taken as an example of monitored industrial equipment and signals due to their wide use. Failures of bearings and related components account for more than 40 percent of all motor failures, so this project focuses on motor bearing faults [17,18]. As this paper mainly investigates the feasibility of on-sensor fault diagnosis, instead of building a motor fault diagnosis testbed, this research directly uses the data from a well-known, freely available fault signal database at the Case Western Reserve University (CWRU) Bearing Data Center as the training and testing data for on-sensor fault diagnosis [19]. Validating new machine fault diagnosis technologies using available datasets is a methodology adopted by many researchers, and the CWRU bearing data has been used as a standard data set in many research projects [20][21][22].
This paper significantly extends our group's previous work on wavelet analysis [13] with a broader range of mother wavelets and a more sophisticated classification scheme to give significantly better results. More specifically, the paper makes the following new contributions, compared to our previous work and the work of others in analyzing bearing faults using wavelet analysis.
Firstly, the work focuses on wavelet-based vibration signal feature extraction methods that are suitable for on-sensor computation. To achieve this, four different mother wavelets are investigated, including the Symlet2 wavelet, which requires only integer computations rather than floating point computations. The Symlet2 wavelet is shown to produce similar classification accuracy with reduced computation cost.
Secondly, compared with other wavelet-based classification schemes, this work uses a new, smaller set of features, namely the signal energies at each wavelet decomposition level. This significantly reduces the number of features and therefore the number of inputs to the classifier, again reducing the computational burden on the sensor node.
Thirdly, compared with previous work, this paper explores the use of an SVM classifier. SVM produces higher classification accuracy than a minimum distance classifier and gives deterministic results, unlike ANN classifiers, which can give slightly different results depending on the random initialization of weights. Additionally, the SVM has far fewer hyper-parameters to be chosen than an ANN.
Finally, compared to the majority of other machine condition monitoring solutions, which transmit raw vibration signals to a central server, the solution presented here computes the classification on the sensor node. The paper presents a detailed comparison of the energy consumption for raw data transmission versus on-sensor classification and shows a small benefit for the latter. This is in addition to the other benefits of autonomous condition monitoring, such as an immediate ability to respond to faults, compared to a solution which depends on remote diagnosis with increased latency and dependence on reliable radio communications.
The remainder of this paper is organized as follows. The theoretical background of WT and SVM is introduced in Section II. Section III describes the system architecture and implementation methodology. The experimental evaluation of the proposed system is given in Section IV. Finally, Section V presents the overall conclusions.

Theoretical Background

Wavelet transform theory
Compared with Gabor and short-time Fourier transforms, the wavelet transform is a more sophisticated time-frequency analysis technique. It has strong time localization and multi-resolution analysis abilities and is suitable for processing non-stationary and transient signals, such as machine fault signals. The wavelet transform has two forms, namely, the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). CWT is mainly used to analyze continuous time-domain signals by decomposing different segments of the signal with an adjustable window function. The CWT is defined as
$$\mathrm{CWT}(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt$$
where $a$, $b$, $x(t)$, and $\psi$ are the scale parameter, translation parameter, time-domain signal, and mother wavelet, respectively, and $\psi^{*}$ is the complex conjugate of $\psi$ [13].
The DWT is the implementation of the WT in discrete form. It is represented by
$$\mathrm{DWT}(j, k) = \frac{1}{\sqrt{2^{j}}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - 2^{j}k}{2^{j}}\right) dt$$
where $a = 2^{j}$ and $b = 2^{j}k$ are the scale parameter and translation parameter [13,23]. The DWT decomposes the original time-domain signal, $x(t)$, into two components by passing the signal through a series of high-pass and low-pass filters. Therefore, the signal can be described as follows
$$x(t) = A_{J} + \sum_{j \le J} D_{j}$$
where $A_{J}$ is the low-frequency band signal (approximation) at the final level $J$, while $D_{j}$ represents the high-frequency band (details) at level $j$ [13,24]. In other words, the signal is decomposed into the approximation at the deepest level and the details of the wavelet coefficients at each level up to it.
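The filter-bank decomposition described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this work: it uses the Haar wavelet, which is not one of the four mother wavelets studied here, purely because its filters have only two taps, and the function names `haar_dwt_step` and `haar_dwt` are hypothetical.

```python
import math


def haar_dwt_step(signal):
    """One DWT level with the Haar wavelet (a stand-in, chosen for brevity).

    Returns (approximation, detail), each half the length of the input.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    # Low-pass branch: scaled pairwise sums, downsampled by two.
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    # High-pass branch: scaled pairwise differences, downsampled by two.
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels=2):
    """Multi-level DWT: returns (final approximation, [detail level 1, ...])."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        # Only the approximation is decomposed further at each level.
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return approx, details


# Synthetic 1024-sample test signal (not CWRU data).
x = [math.sin(0.1 * n) + 0.2 * math.sin(2.5 * n) for n in range(1024)]
a2, (d1, d2) = haar_dwt(x, levels=2)
print(len(d1), len(d2), len(a2))  # 512 256 256
```

Because the Haar transform is orthogonal, the summed energy of the three output bands equals the energy of the input, which is the property the energy features rely on.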

Support vector machine theory
An SVM is a statistical machine learning technique that has been widely applied in data classification [14,25,26]. SVM is powerful for linear classification problems, especially for problems with small amounts of training data. It can also be used for nonlinear and high-dimensional data classification by introducing kernel functions. The introduction of kernel functions also reduces the computational complexity and makes SVM suitable for embedding in IWSN nodes. SVM completes the classification process by seeking the optimal hyper-plane with the maximal margin between the separate data classes.
Taking two two-dimensional data sets as an example, the basic principle of the SVM classifier is illustrated in Fig. 1. The dashed line (H) is the optimal hyper-plane, which separates the two-class data points with the maximal margin; that is, the distance between H and the nearest data point in each class is maximal. These nearest data points are called support vectors, while the two solid lines (H1 and H2) parallel to H are known as bounding planes. The distance between H1 and H2 is the classification margin, which is equal to 2/‖w‖. Finding the hyper-plane parameters that give the largest margin can be formulated as a convex quadratic programming problem, which can be solved more easily.
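The margin value quoted above follows from the point-to-plane distance formula. As a brief worked step, assuming the usual normalization in which the bounding planes are written as w·x + b = ±1 (the offset symbol b is introduced here for illustration):

```latex
\begin{aligned}
H_1 &: \mathbf{w}\cdot\mathbf{x} + b = +1, \qquad
H_2 : \mathbf{w}\cdot\mathbf{x} + b = -1, \\
d(H_1, H_2) &= \frac{\lvert (+1) - (-1) \rvert}{\lVert \mathbf{w} \rVert}
             = \frac{2}{\lVert \mathbf{w} \rVert}.
\end{aligned}
```

Maximizing this margin is therefore equivalent to minimizing ‖w‖²/2, which is a convex quadratic objective.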
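For linearly separable training data $(\mathbf{x}_i, y_i)$, $i = 1, \dots, n$, with class labels $y_i \in \{-1, +1\}$, H is found by solving the following optimization problem:
$$\min_{\mathbf{w}, b} \ \frac{1}{2}\lVert \mathbf{w} \rVert^{2} \quad \text{subject to} \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \quad i = 1, \dots, n$$
For data that are not completely separable, the quadratic optimization problem becomes:
$$\min_{\mathbf{w}, b, \xi} \ \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C\sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n$$
where $C$ is the penalty parameter that controls the trade-off between training error and generalization, while $\xi_i$ are the slack variables that measure the degree of misclassification.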
For non-linearly separable data, the data is mapped into a high-dimensional feature space by non-linear mapping functions, called kernel functions. After this data space transformation, the optimal hyper-plane can be built to separate the data linearly [25]. In this paper, the RBF kernel is selected as the kernel function because it is able to fit any curve in any feature space and it has been used successfully in many wired fault diagnosis applications. The RBF kernel is as follows:
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}\right)$$
where $\gamma$ is the kernel width parameter.
In this research, the grid search algorithm is adopted to find the best C and γ parameters. The best value of C is 0.125, while the best γ is 2.8284.
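To make the on-sensor cost of a kernel SVM concrete, the RBF kernel and the resulting binary decision function can be sketched as follows. This is an illustrative sketch only: the support vectors, coefficients, and bias below are hypothetical values rather than parameters from the trained classifiers, and the function names are assumptions. Only the value of γ is taken from the text.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Binary SVM decision value f(x) = sum_i coef_i * K(sv_i, x) + bias.

    dual_coefs[i] is alpha_i * y_i for support vector i, so its sign
    carries the class of that support vector.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


GAMMA = 2.8284  # value reported in the text

# Hypothetical support vectors and coefficients, for illustration only.
svs = [(0.8, 0.1, 0.1), (0.2, 0.5, 0.3)]
coefs = [1.0, -1.0]

f = svm_decision((0.75, 0.15, 0.10), svs, coefs, bias=0.0, gamma=GAMMA)
print("class +1" if f >= 0 else "class -1")
```

Each classification costs one kernel evaluation per support vector, which is why a model with fewer support vectors is cheaper to evaluate on a sensor node.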
The basic SVM is designed to deal with binary classification problems. However, the many multiclass classification tasks in practical applications have encouraged researchers to extend SVM to multiclass problems. Many multiclass classification methods have been proposed, such as one-against-all, one-against-others, one-against-one, and directed acyclic graph support vector machines (DAGSVM). Compared with one-against-all and one-against-others, the one-against-one and DAGSVM methods need a shorter training time [27][28][29]. Although DAGSVM needs the same training time as one-against-one, it has a shorter testing time. Therefore, the DAGSVM method is adopted in this project to identify the various operating states of the motor.

System Architecture and Implementation
The architecture of the proposed machine fault diagnosis system using IIoT and IWSNs with on-sensor WT and multiclass support vector machine (M-SVM) is illustrated in Fig. 2. The system consists of a star topology IWSN with one coordinator and several sensor nodes, a computer working as the gateway, a cloud platform, and a management portal. ZigBee is selected as the communication protocol, and a Jennic JN5139 sensor board and controller board are selected as the hardware platform for the end nodes and the coordinator of the IWSN. The signal acquisition, WT fault feature extraction, and M-SVM fault diagnosis are completed on the IWSN end nodes, and the fault diagnosis results are then collected and transmitted through the coordinator and the gateway to the cloud platform for subsequent access by the management portal. The end nodes can switch to sleep mode between the signal acquisition, fault feature extraction, and fault diagnosis stages to reduce node energy consumption and prolong the lifetime of the IWSN and IIoT. The details of the system are described below.

Machine fault signal
As introduced in Section I, this project uses the vibration data of normal and faulty bearings provided by the Bearing Data Center at CWRU as the training and testing data for the proposed on-sensor fault diagnosis method. The test bed of CWRU is shown in the left part of Fig. 2. It consists of a 2 hp Reliance Electric motor, a torque transducer, and a dynamometer. The motor speed is 1797 rpm. Rolling ball faults, inner race faults, and outer race faults with different fault diameters were separately seeded on the normal bearing using electro-discharge machining, and the vibration signal was collected using accelerometers and a 16-channel DAT recorder with a 12 kHz sampling frequency.
In this paper, five bearing working conditions are selected for the fault diagnosis experiments, namely normal condition bearing (NOR), bearing with inner raceway fault of 0.007 inches in diameter (IR007), bearing with inner raceway fault of 0.021 inches in diameter (IR021), bearing with rolling ball fault of 0.021 inches in diameter (B021), and bearing with outer raceway fault of 0.021 inches in diameter (OR021). Fig. 3 shows examples of the original vibration signal for each of the five conditions. For these experiments, classification is based on a single accelerometer at the fan end of the motor, which is sufficient to identify these bearing faults. Compared with the signal in the normal condition, the signal amplitudes change significantly when a fault occurs in the bearing.

Wavelet transform fault feature extraction
A wavelet transform method with low memory requirements, presented in [30], is selected for the resource-constrained IWSN nodes. The 2-level wavelet transform of the bearing vibration signals is computed with four popular mother wavelets, namely the Db97, Db53, Coiflet1, and Symlet2 wavelets, to verify the feasibility of the proposed on-sensor WT fault feature extraction and to compare the fault feature extraction effectiveness of the various mother wavelets. Daubechies, Coiflet, and Symlet mother wavelets are orthogonal, symmetric or approximately symmetric, and have been used successfully in many fault diagnosis applications [31][32][33]. The four selected mother wavelets are shown in Fig. 4. The filter coefficients of the Db97, Db53, Coiflet1, and Symlet2 wavelets as given in [30,34,35] are used in this research.
After the wavelet transform, the signal energy of the wavelet coefficients at each DWT level is calculated and used as the fault feature. This reduces the size of the fault feature set, because the wavelet coefficients themselves are still too large to be transmitted directly by the IWSN as fault features. The signal energy feature used in this paper is defined as follows:
$$E_j = \int \lvert S_j(t) \rvert^{2}\, dt = \sum_{k=1}^{n} \lvert y_j(k) \rvert^{2} \tag{7}$$
where $S_j(t)$ is the wavelet signal at decomposition level $j$, $y_j(k)$ is the $k$th wavelet coefficient at DWT level $j$, and $n$ is the number of samples at each DWT level. The obtained signal energy of the wavelet coefficients is then used as the input of the M-SVM fault classifier, which is described in the next section.
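The energy feature amounts to a sum of squared coefficients per band. A minimal sketch is given below; the coefficient values are made up for illustration, the function names are hypothetical, and normalizing by the total energy across bands is an assumption about how the features might be scaled rather than a detail confirmed by the text.

```python
def wavelet_energy(coeffs):
    """Signal energy of one coefficient band: sum of squared coefficients."""
    return sum(c * c for c in coeffs)


def normalized_energy_features(bands):
    """Energy of each band divided by the total energy across all bands.

    ASSUMPTION: total-energy normalization is one common choice and is
    used here only for illustration.
    """
    energies = [wavelet_energy(b) for b in bands]
    total = sum(energies)
    if total == 0:
        return [0.0 for _ in energies]
    return [e / total for e in energies]


# Hypothetical coefficients for three bands of a 2-level decomposition.
detail1 = [0.5, -0.5, 0.5, -0.5]
detail2 = [1.0, -1.0]
approx2 = [2.0, 2.0]

print(normalized_energy_features([detail1, detail2, approx2]))
```

With a 2-level decomposition this yields three numbers per data set, which is what keeps the classifier input, and therefore the on-sensor computation, small.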

M-SVM Fault diagnosis
Due to its short training and testing time, DAGSVM is chosen as the multiclass fault classifier in this paper. The principle of a DAG for classifying five machine working conditions is shown in Fig. 5.
As shown in Fig. 5, there are 5 × (5 − 1)/2 = 10 internal nodes and 5 leaf nodes. Each internal node is a binary SVM classifier that has been trained on a distinct pair of classes. To evaluate a test data set, we start at the root node. The binary output of the root node, namely Normal vs. OR021, is calculated first; the node is then exited via the left edge if the result does not indicate OR021, or via the right edge if the result does not indicate Normal. The binary output of the next node (for example, Normal vs. B021 in level 2) is then evaluated. By repeating this calculation and evaluation process at every level, we travel down the DAG and finally reach a leaf node that indicates the predicted machine working condition. For a problem with N classes, N − 1 decision nodes, one in each level, are evaluated to complete the classification procedure. In this research, N is 5. The purple dotted line in Fig. 5 shows one possible evaluation path through the DAG.
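The DAG traversal described above is often implemented as a list-elimination loop: compare the first and last remaining candidates, discard the one the binary classifier rejects, and repeat. The sketch below follows that scheme. The binary classifiers are hypothetical nearest-prototype stand-ins rather than trained SVMs, and `dag_classify` and `make_node` are illustrative names.

```python
def dag_classify(features, classes, binary_classifiers):
    """Walk a decision DAG: compare first vs last candidate, drop the loser.

    binary_classifiers maps (class_a, class_b) to a callable that returns
    the winning label for `features`. For N classes, N-1 nodes are evaluated.
    """
    candidates = list(classes)
    evaluations = 0
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        winner = binary_classifiers[(first, last)](features)
        evaluations += 1
        if winner == first:
            candidates.pop()     # result rules out `last`
        else:
            candidates.pop(0)    # result rules out `first`
    return candidates[0], evaluations


CLASSES = ["NOR", "IR007", "IR021", "B021", "OR021"]

# Hypothetical stand-in for trained SVMs: each node picks whichever of its
# two classes has the nearer prototype value to a single scalar feature.
prototypes = {"NOR": 0.0, "IR007": 1.0, "IR021": 2.0, "B021": 3.0, "OR021": 4.0}


def make_node(a, b):
    return lambda x: a if abs(x - prototypes[a]) <= abs(x - prototypes[b]) else b


# One binary classifier per distinct pair of classes: 5 * 4 / 2 = 10 nodes.
nodes = {(a, b): make_node(a, b)
         for i, a in enumerate(CLASSES) for b in CLASSES[i + 1:]}

label, n_evals = dag_classify(2.1, CLASSES, nodes)
print(label, n_evals, len(nodes))  # IR021 4 10
```

Only 4 of the 10 trained binary classifiers are evaluated per test data set, which is the source of the shorter testing time compared with one-against-one voting.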

Experimental Validation
In this section, a set of experiments is carried out to evaluate the proposed approach. Firstly, the vibration data from the Bearing Data Center at CWRU is stored in the Jennic JN5139 wireless sensor node, which is a typical commercial IWSN node with 192 kB ROM, 96 kB RAM, and a ZigBee radio, and is suitable for on-sensor data processing. Secondly, the 2-level wavelet transforms with four popular mother wavelets are implemented in C and then embedded and executed on the JN5139, to verify the feasibility of the proposed on-sensor WT fault feature extraction and to analyze the fault feature performance of the different mother wavelets. Thirdly, the accuracy of the presented on-sensor M-SVM is evaluated. Finally, the data transmission and energy consumption of the proposed approach are analyzed. The detailed steps and results of these experiments are given below.

WT Fault feature extraction
In this experiment, the feasibility of on-sensor fault feature extraction using WT is explored. The 2-level wavelet transforms with four different mother wavelets, namely the Db97, Db53, Coiflet1, and Symlet2 wavelets, are carried out on an IWSN node to decompose the vibration signals in the five conditions, namely NOR, IR007, IR021, B021, and OR021.
The vibration data used in this step are collected from the sensor nodes installed at the fan end of the motor housing. 1024 samples constitute a data set of one bearing condition, so the total number of samples is 5120. The CWRU Bearing Data was collected from a single machine with different sets of introduced bearing faults. Such faults are indicative of a scenario where a single class of machine is used in a variety of applications, running at similar speeds.
In a real-world scenario, bearing fault data would either be collected by the manufacturer in specific testing, or through the long-term recording of data from many machines in the field. Traces of both normal and faulty operation would be accumulated in a central data store and would be used to regularly update and retrain the classifiers. So, even if a specific machine has never had a fault, the broader database would allow such fault conditions to be trained in the classifier. For operations at different speeds, different classifiers could be constructed. Of course, bearing faults are not the only failure mechanism. Other faults, such as shaft misalignments or out-of-balance faults, could also be included. The range of fault signals could also be increased (for example, vibrations in different axes). This experiment investigates one class of fault (bearing fault) on a specific machine, with the specific goal of analyzing the accuracy and energy efficiency of on-sensor classification.
The original vibration signals and the corresponding wavelet coefficients after the 2-level DWT are shown in Fig. 6, where Detail 1 is the detail coefficients at the 1st level, Detail 2 is the detail coefficients at the 2nd level, and Approx 2 is the approximation coefficients at the 2nd level. Although the vibration signal amplitude rises significantly for a faulty bearing, it is still difficult to determine the bearing working condition from the vibration signal amplitude alone. In contrast, the wavelet coefficients of the faulty bearings show different characteristics from those of the normal condition.
E1, E2, and E3, the energies of the corresponding wavelet coefficients of the testing data sets, are then calculated using equation (7) on the sensor node. Although the sum of the energy of all the wavelet coefficients in the detail and approximation parts is equal to the energy of the original vibration signal, the energy distribution across the frequency bands changes according to the bearing working condition. The normalized wavelet energy signals for vibration signals under five bearing working conditions using four different mother wavelets are shown in Fig. 7. It is easier to distinguish the different bearing working conditions by using the energy signals than by using the vibration amplitude.
Fig. 7. The normalized energy of wavelet coefficients for vibration signals under five bearing working conditions using four different mother wavelets

M-SVM Fault diagnosis
In this section, the feasibility of on-sensor multiclass fault diagnosis using DAGSVM is investigated. The vibration data from the bearing under the above mentioned five working conditions are used.
First, a total of 450 training data sets, 90 for each condition, are used to train the 10 binary SVM classifiers off-line. After training, the obtained M-SVM classifier parameters with different mother wavelets are given in Table 1. It can be seen that the Coiflet1 (Coif1) wavelet needs the least training time, while Symlet2 (Sym2) has the smallest number of support vectors and potentially the shortest calculation time in the on-line fault diagnosis procedure. The training accuracies of the M-SVM classifiers with different mother wavelets are given in Table 2. It can be seen that the total training accuracy of the M-SVM classifiers with the Coiflet1 and Symlet2 wavelets reaches 98%, while the accuracies of Db97 and Db53 are 93% and 95%, respectively.
Second, the obtained parameters of the M-SVM classifiers are embedded in the program on the sensor nodes. Then 140 data sets, 28 for each condition, which were not used for training, are used for on-line testing and verification. The testing accuracy of the M-SVM classifiers with different mother wavelets is given in Table 3. The testing accuracy of the M-SVM classifiers with all four mother wavelets exceeds 90%. The M-SVM classifier using the Symlet2 wavelet gives the highest accuracy, which reaches 99.29%, while the Coiflet1 wavelet has an accuracy of 98.57%.
Third, 590 data sets from another set of vibration data are used to test the performance of the obtained M-SVM classifier models again. The results are given in Table 4. It can be seen that the classification accuracy of the Coiflet1 and Symlet2 wavelets reaches 98.31%, which is better than the results of the Db97 and Db53 wavelets. The classification results of the Coiflet1 and Symlet2 wavelets are also illustrated by the confusion matrices in Fig. 8.
Fourth, 560 data sets are randomly divided into 8 groups. Each group includes 70 data sets, 14 for each condition. These data are used to verify again the overall classification performance of the obtained M-SVM classifiers with different mother wavelets. The results are shown in Fig. 9. Compared with the Db97 and Db53 wavelets, the Coiflet1 and Symlet2 wavelets have higher overall classification accuracy (98.31%) and less fluctuation.
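The accuracy figures above are overall accuracies, that is, the number of correctly classified data sets divided by the total. As a small worked illustration, the sketch below computes this from a confusion matrix. The matrix is hypothetical and is not the one in Table 3 or Fig. 8; it simply shows that one misclassified set out of 140 gives the same value as the 99.29% reported for Symlet2.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy = correctly classified (diagonal) / all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical 5x5 confusion matrix (rows: true class, columns: predicted),
# 28 test sets per class with a single misclassification.
confusion = [
    [28, 0, 0, 0, 0],
    [0, 28, 0, 0, 0],
    [0, 0, 27, 1, 0],
    [0, 0, 0, 28, 0],
    [0, 0, 0, 0, 28],
]

print(f"{100 * accuracy_from_confusion(confusion):.2f}%")  # 99.29%
```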
Finally, the effectiveness of the proposed M-SVM method is compared with that of fault classifiers based on ANN and minimum distance methods. In this experiment, the Coiflet1 wavelet is used for fault feature extraction due to its better performance mentioned above, and the neural network has three inputs, five hidden layer neurons, and five output layer neurons. The experimental result is shown in Fig. 10, which indicates that the fault diagnosis accuracy of the M-SVM method is far superior to that of the neural network and minimum distance methods. The accuracy of the presented on-sensor approach is 15% and 30% higher than that of the ANN and minimum distance methods, respectively.

Payload transmission data and Node energy consumption
In this section, the payload transmission data and node energy consumption of the proposed approach, in which only the result of on-sensor WT fault feature extraction and SVM fault diagnosis is transmitted, are measured and compared with those of raw data transmission in a series of experiments.
Compared with raw data direct transmission, the on-sensor fault diagnosis method using Symlet2 WT and DAGSVM reduces energy consumption from 42 mJ to 37.8 mJ, i.e. a decrease of 4.2 mJ or 10%.
The details of payload data transmission and node energy consumption for raw data transmission and on-sensor fault diagnosis are given in Table 5. It can be seen that the energy consumption of on-sensor fault diagnosis depends on the calculation time and complexity of the selected algorithm. The energy consumption of on-sensor fault diagnosis with Db53 WT and SVM is similar to that of raw data transmission, while the energy consumption of on-sensor fault diagnosis with Coif1 WT or Db97 WT and SVM is higher than that of raw data transmission.
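The reported reductions can be checked with simple arithmetic. The sketch below uses the energy figures from the text directly. The payload sizes, however, are assumptions made only for illustration (2 bytes per sample for one 1024-sample data set, and a 1-byte diagnosis result) and are not stated in the text; under these assumptions the payload reduction comes out at the reported 99.95%.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Energy figures reported in the text (mJ): raw transmission vs Symlet2 + DAGSVM.
energy_saving = percent_reduction(42.0, 37.8)

# ASSUMPTION (not stated in the text): one data set of 1024 samples stored as
# 2 bytes per sample, versus a 1-byte diagnosis result.
raw_payload_bytes = 1024 * 2
result_payload_bytes = 1
payload_saving = percent_reduction(raw_payload_bytes, result_payload_bytes)

print(f"energy: {energy_saving:.1f}%  payload: {payload_saving:.2f}%")
```

The gap between the two percentages reflects the point made above: transmission is nearly eliminated, but the energy spent on on-sensor computation offsets most of that saving.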

Conclusion
In this paper, we proposed a novel machine fault diagnosis method, which uses IIoT and IWSNs with on-sensor fault feature extraction by wavelet transform and on-sensor fault diagnosis by M-SVM to reduce the payload transmission data in the IWSN. Four popular mother wavelets, namely the Db97, Db53, Coiflet1, and Symlet2 wavelets, and DAGSVM are selected and implemented on the IWSN sensor node.
While previous work has demonstrated that wavelet-based features and SVM classification can be used for bearing-fault diagnosis, this paper introduces some novel results. It particularly addresses the practicality of implementation on a typical wireless sensor node with limited energy. It investigates the energy savings from using low-computational complexity wavelets (Db53 and Sym2) on-board the sensor node with only classification results transmitted to the coordinator. This work is the first to identify that the Sym2 wavelet provides both low computation cost and high classification accuracy.
The feasibility and effectiveness of the presented approach have been demonstrated by a set of experiments using the bearing vibration data obtained from the Bearing Data Center at CWRU. Testing results show the following.
• Compared with raw data transmission, the proposed on-sensor fault diagnosis method can reduce the payload transmission data by 99.95%, and reduce the node energy consumption by about 10%;
• The fault diagnosis accuracy of the proposed method with all four mother wavelets exceeds 91%, while the accuracy with the Coiflet1 and Symlet2 wavelets reaches 98%;
• The accuracy of the presented on-sensor approach with the Coiflet1 wavelet is 15% and 30% higher than the accuracy of the ANN and minimum distance methods, respectively.
There is, of course, significant scope to investigate other features such as MEL, other classifiers such as Random Forest classifiers, or even techniques that combine feature extraction and classification such as convolutional neural networks.
The on-sensor calculations just take a few seconds, so the system can operate in real-time. The duty-cycle will be determined by the available energy, but with sufficient energy, monitoring could be done continuously, with the condition monitoring equipment permanently connected to the motor. In such situations, the ruggedness and durability of the system would need to be consistent with the anticipated lifetime of the motor.
The energy consumption results show that small energy savings, of the order of 10%, can be made by using on-sensor computation. However, the relatively small savings suggest that there is still scope for improved performance by reducing the energy cost of on-sensor processing, for example by using more energy-efficient computation architectures.