mRMR-DNN with Transfer Learning for Intelligent Fault Diagnosis of Rotating Machines

Abstract

In recent years, intelligent condition-based monitoring of rotary machinery systems has become a major research focus of machine fault diagnosis. In condition-based monitoring, it is challenging to form a large-scale, well-annotated dataset because data acquisition and annotation are expensive. In addition, the generated data contain a large number of redundant features, which degrade the performance of machine learning models. To overcome this, we combine the advantages of minimum redundancy maximum relevance (mRMR) and transfer learning with a deep learning model. In this work, mRMR is integrated into deep learning and deep transfer learning frameworks to improve fault diagnosis performance in terms of accuracy and computational complexity. mRMR removes redundant information from the data and improves deep learning performance, whereas transfer learning reduces the large data dependency for training the model. Two frameworks, i.e., mRMR with deep learning and mRMR with deep transfer learning, are explored and validated on the CWRU and IMS rolling element bearing datasets. The analysis shows that the proposed frameworks obtain better diagnostic accuracy than existing methods and handle data with a large number of features more quickly.

Keywords: mRMR, Feature Selection, Feature Extraction, Deep Learning, Transfer Learning.

I Introduction

With the recent advancement of technology, intelligent condition monitoring of rotating machines has become an essential tool of machine fault diagnosis to increase reliability and ensure equipment efficiency in industrial processes [c1, c2, c3]. Rotating components, which are essential parts of machines, are widely used in equipment transmission systems, and their failure might result in considerable loss and catastrophic consequences. As practical components of condition-based maintenance, vibration-based fault diagnosis systems have been explored in recent years [c4].

Traditionally, a machine fault diagnosis framework includes three main stages: 1) signal acquisition, 2) feature extraction and selection, and 3) fault identification or classification. The signal acquisition stage involves collecting raw data while the machine is running. Signals such as vibration, temperature, current, sound pressure, and acoustic emission can be studied for health monitoring and fault diagnosis, but the vibration signal is the most extensively explored in the literature because it provides essential information about the faults [c5, c6, c7, patt]. In the second stage, feature extraction is used to extract informative features from the raw data using time-domain, frequency-domain, and time-frequency-domain analysis [c8]. Although these feature extraction methods can identify machine health conditions, they may produce irrelevant and insensitive features that affect fault diagnosis performance. Therefore, feature selection methods such as mutual information criteria (max-dependency, max-relevance, and min-redundancy), principal component analysis (PCA), and Fisher discriminant analysis (FDA) are widely used to select the essential features from the data [c9, c10, c11, t2pca]. In the final stage, the selected features are used for fault classification with various classifiers, e.g., support vector machine (SVM), k-nearest neighbor (k-NN), random forest (RF), and artificial neural network (ANN) [c12]. However, these traditional fault diagnosis methods also have several limitations.

In 2006, Hinton et al. [c13] proposed deep learning techniques, which try to learn high-level representations of data by stacking multiple layers in a hierarchical architecture. In recent years, several studies have focused on deep neural networks (DNNs) for machine fault diagnosis. Tao et al. [c14] suggested a deep neural network framework for bearing fault diagnosis based on a stacked auto-encoder and softmax regression. In [c15, maurya, mauryavs], the authors proposed DNN-based intelligent fault diagnosis methods using auto-encoders for classifying different large-sample datasets from bearing elements and gearboxes. Sun et al. [c16] proposed a sparse auto-encoder-based DNN, with denoising coding and dropout, for induction motor fault diagnosis. Ding et al. [c18] developed a deep convolutional network in which wavelet packet energy is used as input for bearing fault diagnosis. In [c19], an intelligent machine bearing fault diagnosis method is presented by combining compressed data acquisition and deep learning in a single framework.

Although deep learning-based models have achieved great success in machine fault diagnosis applications, several problems remain. As the number of hidden layers and nodes grows, the number of parameters also grows, which increases the computational complexity of the model. Moreover, a large amount of labeled training data is required to train a deep network from scratch, and parameter optimization and hyperparameter tuning greatly affect the performance of deep networks. Transfer learning-based approaches have been used to overcome these problems: instead of training a deep learning model from scratch, a DNN that has been trained on sufficient labeled data from different running conditions is reused and fine-tuned on the target task. In the literature, various case studies have been performed using transfer learning with DNNs. Lu et al. [tr1] proposed a DNN model with domain adaptation for fault diagnosis. First, they utilized domain adaptation to strengthen the representation of the original data, so that high classification accuracy could be obtained on the target domain; second, they proposed various strategies to find the optimal hyperparameters of the model. Long et al. [tr2] presented a deep transfer learning-based model using a sparse auto-encoder (SAE) for fault diagnosis, in which a three-layer SAE extracts abstract features from the raw data and a maximum mean discrepancy term minimizes the discrepancy between the features of the training and testing data. Gao et al. [tr3] presented an intelligent fault diagnosis method for machines with unlabeled data using a deep convolutional transfer learning network. Siyu et al. [tr4] proposed a highly accurate machine fault diagnosis model using deep transfer learning with a convolutional neural network. However, the performance of these models is again reduced by redundant features in the dataset because, in the presence of redundant features, different initial conditions lead to different performance.

In this paper, we address the problem mentioned above by employing the mutual information criteria of max-dependency, max-relevance, and min-redundancy to select a subset of features with minimum redundancy. The selected features are used to train a DNN that extracts a meaningful representation in a lower dimension. Two frameworks of intelligent fault diagnosis are evaluated. In the first framework, the DNN is pre-trained and fine-tuned on the same running condition, and the fine-tuned network is validated on unseen samples of that running condition. In the second framework, the deep neural network is pre-trained on one running condition with unlabeled data, the pre-trained weights are transferred to the target domain, and the target network is fine-tuned on a different running condition with a smaller number of samples; the pre-training time of the target network is thus entirely eliminated. In real applications, as mentioned, it is challenging to form a large-scale well-annotated dataset, and in this scenario the second framework is the more useful one.

The major contributions of the paper are summarized as:

  1. mRMR with deep learning and mRMR with deep transfer learning frameworks are proposed for intelligent machine fault diagnosis, as shown in Fig. 2.

  2. An mRMR-based feature selection step is evaluated to minimize the effect of redundant features in the dataset; redundant features decrease the performance of deep models because, in their presence, different initial conditions lead to different performance.

  3. Deep learning and deep transfer learning based methods are evaluated for better feature representation in a reduced dimension with lower complexity.

  4. t-distributed stochastic neighbor embedding (t-SNE) and confusion matrix charts are used to visualize the reduced features and to describe the performance of the classification model.

  5. Experiments are conducted on the CWRU [dataset] and IMS [ims] datasets to show the efficacy of the proposed approach in comparison with state-of-the-art methods.

The remainder of this paper is organized as follows: Section II briefly introduces the theoretical background of minimum redundancy maximum relevance, deep neural networks, and transfer learning. Section III describes the proposed mRMR-DNN based transfer learning framework for intelligent condition-based monitoring of rotating machinery. Section IV presents the experimental results and analysis of the proposed method in comparison with state-of-the-art methods. Finally, Section V draws the conclusion of the paper.

II Theoretical Background

II-A Minimum Redundancy Maximum Relevance (mRMR) [mRMR]

Mutual information is used to determine the feature set S whose features jointly have the maximum dependency on the target class c. This approach is termed Max-Dependency and is written as

\max_{S} D(S, c), \quad D = I(\{x_i, i = 1, \dots, m\}; c)    (1)

where I(S; c) is the mutual information between the feature subset S and the class c.

The Max-Dependency criterion is difficult to implement in a high-dimensional feature space: 1) the number of samples is often insufficient, and 2) multivariate density estimation often involves computing the inverse of a high-dimensional covariance matrix, which is generally an ill-posed problem. This problem is overcome by the maximal relevance criterion (Max-Relevance). Max-Relevance finds features satisfying (2), which approximates D(S, c) in (1) with the mean of all mutual information values between the individual features x_i and the class c:

\max_{S} D(S, c), \quad D = \frac{1}{|S|}\sum_{x_i \in S} I(x_i; c)    (2)

Features selected using Max-Relevance are likely to have substantial redundancy, i.e., the dependency among these features could be significant. When two features highly depend on each other, the respective class-discriminative power would not change much if one of them were eliminated. Therefore, the following minimal redundancy (Min-Redundancy) condition can be used to select mutually exclusive features:

\min_{S} R(S), \quad R = \frac{1}{|S|^2}\sum_{x_i, x_j \in S} I(x_i; x_j)    (3)

The criterion combining (2) and (3) is called "minimal-redundancy-maximal-relevance" (mRMR). The operator \Phi(D, R) combines D and R; its simplest form, optimizing D and R simultaneously, is

\max_{S} \Phi(D, R), \quad \Phi = D - R    (4)
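To make the greedy selection implied by (1)-(4) concrete, the following Python sketch is illustrative only (not the paper's implementation); scikit-learn's mutual_info_classif and mutual_info_regression stand in for the mutual information terms I(x_i; c) and I(x_i; x_j).

```python
# A minimal greedy mRMR sketch (assumption: X is a samples-by-features NumPy array,
# y holds the class labels; sklearn estimators approximate the MI terms).
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, n_selected):
    """Greedily pick features maximizing relevance I(x;c) minus mean redundancy I(x;x')."""
    n_features = X.shape[1]
    relevance = mutual_info_classif(X, y)        # I(x_i; c) for every feature, Eq. (2)
    selected = [int(np.argmax(relevance))]       # start with the most relevant feature
    while len(selected) < n_selected:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # mean redundancy of candidate j against already-selected features, Eq. (3)
            redundancy = np.mean([
                mutual_info_regression(X[:, [j]], X[:, k])[0] for k in selected
            ])
            score = relevance[j] - redundancy    # mRMR criterion Phi = D - R, Eq. (4)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Usage: idx = mrmr_select(X_train, y_train, n_selected=50); X_red = X_train[:, idx]
```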

II-B Deep Learning (DL) [c13]

Deep learning is a branch of machine learning whose fundamental principle is to learn a hierarchical representation of the data layer by layer [c13]. In the literature, different types of deep learning models have been studied for machine fault diagnosis. In this paper, we use a sparse auto-encoder-based learning approach to form a deep neural network for automatic feature extraction. The auto-encoder is a three-layer feed-forward neural network comprising an input layer, a hidden layer, and an output layer. As shown in Figure 1, the first part, known as the encoder, takes an input x and transforms it into a hidden representation h via a non-linear mapping

h = f(Wx + b)    (5)

where f(\cdot) is a non-linear activation function. The second part, known as the decoder, maps the hidden representation h back to the original representation as

\hat{x} = g(W'h + b')    (6)
Figure 1: Basic architecture of sparse auto-encoder.
Figure 2: Proposed framework: (a) pre-training of the sparse auto-encoder; (b) training of the deep neural network with the labelled data available in the source domain; (c) transferring the pre-trained weights of the source network to the target network, where only a small number of samples are available in the target domain.

The network parameters, i.e., the weights W and biases b, are optimized by minimizing the cost function with back-propagation, computing the gradients with respect to W and b. In back-propagation, for the given training samples (x, y), a forward pass first computes the activation at every node of the network; then, for every node, an error term is computed that measures how much that node was responsible for the error at the output. The sparse auto-encoder cost function is

J_{sparse}(W, b) = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{2}\big\|\hat{x}_i - x_i\big\|^2 + \frac{\lambda}{2}\|W\|^2 + \beta \sum_{j=1}^{s} \mathrm{KL}\big(\rho \,\big\|\, \hat{\rho}_j\big)    (7)

where KL is the Kullback-Leibler divergence function, s is the number of hidden nodes, \lambda is the regularization parameter, \rho is the sparsity parameter, and \beta is the sparsity control parameter. \hat{\rho}_j is the average activation of hidden node j.
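As a concrete illustration of (5)-(7), the following PyTorch sketch implements one sparse auto-encoder with the KL sparsity penalty; the layer sizes and hyperparameter values are assumptions for illustration, and the weight-decay term of (7) would be supplied through the optimizer's weight_decay argument rather than in the loss.

```python
# A hedged sketch of one sparse auto-encoder layer with the KL penalty of Eq. (7);
# sizes (100 -> 50) and rho/beta values are illustrative, not the paper's settings.
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, n_in=100, n_hidden=50):
        super().__init__()
        self.encoder = nn.Linear(n_in, n_hidden)
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = torch.sigmoid(self.encoder(x))        # Eq. (5): h = f(Wx + b)
        x_hat = torch.sigmoid(self.decoder(h))    # Eq. (6): x_hat = g(W'h + b')
        return x_hat, h

def sparse_loss(x, x_hat, h, rho=0.2, beta=3.0):
    recon = ((x_hat - x) ** 2).mean()                      # reconstruction term
    rho_hat = h.mean(dim=0).clamp(1e-6, 1 - 1e-6)          # average activation of each hidden node
    kl = (rho * torch.log(rho / rho_hat)
          + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()  # KL sparsity penalty
    return recon + beta * kl
```

For the lambda term of (7), an optimizer such as torch.optim.Adam(model.parameters(), weight_decay=1e-4) would play the role of the weight regularizer.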

II-C Transfer Learning (TL) [Tl]

Transfer learning is a form of representation learning based on the idea of mastering a new task by reusing knowledge from a previous task, and it is defined as follows:

Given a source domain D_S and learning task T_S, and a target domain D_T and learning task T_T, TL tries to improve the learning of the target predictive function f_T(\cdot) in D_T using the knowledge in D_S and T_S, where D_S \neq D_T or T_S \neq T_T. Based on the source task and target task, it is categorized into three subcategories:

  1. Inductive TL: Given a source domain D_S and a learning task T_S, and a target domain D_T and a learning task T_T, inductive TL tries to improve the learning of the target predictive function f_T(\cdot) in D_T using the knowledge in D_S and T_S, where T_S \neq T_T.

  2. Transductive TL: Given a source domain D_S and a learning task T_S, and a target domain D_T and a learning task T_T, transductive TL tries to improve the learning of the target predictive function f_T(\cdot) in D_T using the knowledge in D_S and T_S, where D_S \neq D_T and T_S = T_T. In addition, some unlabeled target-domain data must be available at training time.

  3. Unsupervised TL: Given a source domain D_S with a learning task T_S, and a target domain D_T with a learning task T_T, unsupervised TL aims to improve the learning of the target predictive function f_T(\cdot) in D_T using the knowledge in D_S and T_S, where T_S \neq T_T and the label spaces Y_S and Y_T are not observable.

In this work, we utilize inductive transfer learning and evaluate its performance on the CWRU and IMS rolling element bearing datasets.
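To make the inductive setting concrete, the sketch below shows the weight-transfer and fine-tuning step in PyTorch under stated assumptions: source_net is any classifier pre-trained on the source domain and target_loader iterates over the few labelled target samples; both names, and the epoch and learning-rate values, are hypothetical.

```python
# Hedged sketch of the inductive-transfer step: reuse a source-trained model's
# parameters as initialization and fine-tune on a small labelled target set.
import copy
import torch
import torch.nn as nn

def fine_tune_on_target(source_net, target_loader, epochs=50, lr=1e-4):
    target_net = copy.deepcopy(source_net)       # transfer: no target pre-training needed
    opt = torch.optim.Adam(target_net.parameters(), lr=lr)  # small LR for fine-tuning
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in target_loader:               # few labelled target samples
            opt.zero_grad()
            loss_fn(target_net(x), y).backward() # back-propagate on the target labels
            opt.step()
    return target_net
```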

III mRMR-DNN based Transfer Learning Framework for Intelligent Fault Diagnosis

Considering the challenges posed by traditional fault diagnosis methods in condition-based monitoring systems, this paper presents an intelligent condition-based monitoring framework that minimizes the redundant features in the data and transfers knowledge from one domain to a different domain. The mRMR-based feature selection method is utilized to eliminate the effect of redundant features from the dataset, as described in Algorithm 1; redundant features decrease the performance of deep learning models because, in their presence, different initial conditions lead to different performance. The data with reduced features are used to pre-train the source network, as shown in Figure 2a. In the case of the deep neural network, the pre-trained model is fine-tuned on the source data, as shown in Figure 2b, and validated on unseen target data with the same machine running condition, as described in Algorithm 2. In the case of deep transfer learning, the DNN with inductive transfer learning is applied, where the target task is different from the source task, regardless of whether the source and target domains are the same. In this inductive transfer learning setting, we evaluate the condition in which plenty of labeled training data are available in the source domain and only a small amount of labeled training data is available in the target domain, as illustrated in Figure 2c.

As shown in Figure 2a, sparse auto-encoders are learned layer by layer in an unsupervised way on the source data. The sparse auto-encoder learned at layer l is given as

J^{(l)}\big(W^{(l)}, b^{(l)}\big) = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{2}\big\|\hat{h}^{(l)}_i - h^{(l)}_i\big\|^2 + \frac{\lambda}{2}\big\|W^{(l)}\big\|^2 + \beta\sum_{j=1}^{s_{l+1}} \mathrm{KL}\big(\rho \,\big\|\, \hat{\rho}_j\big)    (8)

where h^{(l)} and \hat{h}^{(l)} are the hidden input and the estimated hidden output of the l-th sparse auto-encoder, s_l and s_{l+1} are the numbers of nodes in its input and hidden layers, \rho is the sparsity parameter, and KL is the Kullback-Leibler divergence. The average activation \hat{\rho}_j at the activation a_j of hidden unit j and the divergence are defined as follows:

\hat{\rho}_j = \frac{1}{N}\sum_{i=1}^{N} a_j\big(h^{(l)}_i\big)    (9)
\mathrm{KL}\big(\rho \,\big\|\, \hat{\rho}_j\big) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}    (10)

The encoding output h^{(L)} of the last SAE (where L is the number of stacked auto-encoders and h^{(0)} = x) is used as input to the softmax layer for pre-training of the softmax layer and is given as

P\big(y = k \mid h^{(L)}\big) = \frac{\exp\big(w_k^{\top} h^{(L)}\big)}{\sum_{c=1}^{C}\exp\big(w_c^{\top} h^{(L)}\big)}    (11)

The learned sparse auto-encoders are stacked with the softmax layer to form the deep neural network shown in Figure 2b. This deep network is fine-tuned on the source labels to obtain the optimal weight and bias vectors of the network:

\theta^{*}_{S} = \arg\min_{\theta} \frac{1}{N_S}\sum_{i=1}^{N_S} L\big(y^{S}_i, \hat{y}^{S}_i\big)    (12)

The parameters (i.e., weights and biases) of the pre-trained model shown in Figure 2a are transferred to the target network of Figure 2c and serve as the initial parameters for the target domain. The resulting network is trained on the target domain, which has less labeled data, as described in Algorithm 3:

\theta^{*}_{T} = \arg\min_{\theta,\ \theta_{0} = \theta^{*}_{S}} \frac{1}{N_T}\sum_{i=1}^{N_T} L\big(y^{T}_i, \hat{y}^{T}_i\big)    (13)

The model trained on the target domain is validated on unseen test data of the target domain.
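A hedged sketch of how the stacked network of Figure 2b can be assembled follows, assuming the SparseAE class from the Section II-B sketch and the 100-50-40-20 sizes used later in Section IV; the pre-trained encoders are stacked and capped with a linear layer whose softmax is applied by the cross-entropy loss.

```python
# Illustrative assembly of the DNN of Figure 2b (a sketch, not the authors' code).
# sae1..sae3 are SparseAE instances from the Section II-B sketch, assumed to have
# been pre-trained layer-wise as in Algorithm 1.
import torch.nn as nn

sae1, sae2, sae3 = SparseAE(100, 50), SparseAE(50, 40), SparseAE(40, 20)

dnn = nn.Sequential(
    sae1.encoder, nn.Sigmoid(),   # 100 -> 50
    sae2.encoder, nn.Sigmoid(),   # 50 -> 40
    sae3.encoder, nn.Sigmoid(),   # 40 -> 20
    nn.Linear(20, 4),             # softmax layer over the four health classes, Eq. (11)
)
# Fine-tuning on the source labels (Eq. (12)) uses nn.CrossEntropyLoss, which applies
# log-softmax internally; Algorithm 3 then reuses these weights for the target domain.
```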

1: Input: (x, y)                                           # x and y are the input and output vectors
2: Given m, n, and class y                                 # initial number of features and selected number of features
3: for each feature x_i, i = 1, ..., m do
4:   D_i ← I(x_i; y)                # find the relevance between individual features and the class as defined in (2)
5: for each feature pair (x_i, x_j) do
6:   R_i ← mean_j I(x_i; x_j)       # find the redundancy among individual features as defined in (3)
7: Φ_i ← D_i − R_i                                         # compute the difference between relevance and redundancy
8: sort features in descending order of Φ_i     # sort the features on the basis of minimum redundancy maximum relevance
9: x̃ ← top-n features of x                                # input with minimum redundant features in the data
10: SAE_1 ← train sparse auto-encoder on x̃                # learning of sparse auto-encoder (SAE)
11: SAE_l ← train on hidden output of SAE_{l−1}            # similarly learn the subsequent sparse auto-encoders
Algorithm 1 Relevant Feature Selection and Pre-training of Sparse Auto-encoder
1: SSAE ← stack(x̃, SAE_1, ..., SAE_L)          # stack all hidden layers with the input to form the SSAE
2: DNN ← SSAE + softmax layer        # form the DNN with the help of the stacked sparse auto-encoder (SSAE) and softmax layer
3: use labels (y^S) of the source data RC-A and estimated labels (ŷ^S) of the source data
4: fine-tune the whole network on the source labels using back-propagation to find the optimal parameters, i.e., the weight and bias vectors
5: DNN* ← fine-tuned network                               # the fine-tuned network is the trained DNN model
6: test DNN* on unseen source data RC-A
Algorithm 2 Fine-tuning of Deep Neural Network (DNN)
1: SSAE ← pre-trained SSAE on the source data from step 1 of Algorithm 2
2: pre-train the softmax layer on the target labels using the final output of the last SAE
3: DNN_T ← SSAE + softmax layer                 # form the target DNN with the help of the SSAE and softmax layer
4: use labels (y^T) of the target data RC-B and estimated labels (ŷ^T)
5: fine-tune the whole network on the target labels using back-propagation to find the optimal parameters, i.e., the weight and bias vectors
6: DNN_T* ← fine-tuned network                             # the fine-tuned network is the trained DTL model
7: test DNN_T* on unseen target data RC-B
Algorithm 3 Deep Transfer Learning (DTL)
| Class | Case 1 (CWRU, same fault diameter) Source RC-A: Samples | Load | Case 1 Target RC-B: Samples | Load (hp) | Case 2 (different fault diameter) Target RC-B: Samples | Load (hp) | Case 3 (IMS) Target: Samples | Load | Class label |
|---|---|---|---|---|---|---|---|---|---|
| Normal | 1210 | 0 hp | 400, 400, 400 | 1, 2, 3 | 400, 400, 400 | 1, 2, 3 | 400 | 26.6 kN | 1 |
| Inner | 1210 | 0 hp | 400, 400, 400 | 1, 2, 3 | 400, 400, 400 | 1, 2, 3 | 400 | 26.6 kN | 2 |
| Ball | 1210 | 0 hp | 400, 400, 400 | 1, 2, 3 | 400, 400, 400 | 1, 2, 3 | 400 | 26.6 kN | 3 |
| Outer | 1210 | 0 hp | 400, 400, 400 | 1, 2, 3 | 400, 400, 400 | 1, 2, 3 | 400 | 26.6 kN | 4 |
Table I: Dataset Description
| Conditions | Source (RC-A) | Target (RC-B) |
|---|---|---|
| Normal-inner race | | |
| Normal-outer race | | |
| Normal-outer race | | |
| Normal-inner-ball-outer | | |
Table II: RC-A is the source data and RC-B is the target data

IV Experimental Results and Analysis

The proposed frameworks are validated on two different case studies, i.e., the Case Western Reserve University (CWRU) bearing data [dataset] and the Intelligent Maintenance Systems (IMS) bearing data [ims], described as follows:

IV-A Dataset Description

Experimental data are taken from the CWRU and IMS data centers to analyze the performance of the proposed frameworks. The experimental setups of the CWRU and IMS bearing test rigs, from which the multivariate vibration series were generated for validation, are shown in Figures 3 and 5. The CWRU test stand consists of a 2-hp Reliance Electric motor on the left, a torque transducer/encoder in the center, a dynamometer on the right, and control electronics (not shown in the figure). Single-point faults with diameters of 7, 14, and 21 mils (1 mil = 0.001 inch) were seeded at the inner raceway, rolling element, and outer raceway of the test bearing using electro-discharge machining. The vibration data were collected using accelerometers; the drive-end vibration signals, sampled at 12 kHz (12,000 samples per second) under a 2 hp load, are used here. The IMS data were collected at a 20 kHz sampling rate under a 26.6 kN load.

In this analysis, the dataset includes four health conditions: 1) normal condition, 2) outer race fault, 3) inner race fault, and 4) roller fault, with two fault diameters of 7 and 21 mils.

| Dataset | Load (hp) | Condition | PCA [c10] | SVM-RFE [svm-rfe] | mRMR [mRMR] | DNN (no source label) | DNN with mRMR (no source label) | DTL | DTL with mRMR |
|---|---|---|---|---|---|---|---|---|---|
| Binary-Class | 1 | Normal-ball | 82.75 | 78.00 | 79.00 | 99.36 | 99.36 | 98.75 | 99.00 |
| | 2 | Normal-ball | 84.25 | 71.75 | 71.75 | 99.63 | 99.88 | 99.00 | 99.00 |
| | 3 | Normal-ball | 89.00 | 71.75 | 78.50 | 99.38 | 99.00 | 99.36 | 99.75 |
| | 1 | Normal-inner | 68.50 | 57.75 | 61.00 | 99.75 | 99.75 | 99.25 | 99.50 |
| | 2 | Normal-inner | 73.50 | 69.00 | 72.25 | 99.50 | 99.50 | 99.00 | 99.75 |
| | 3 | Normal-inner | 55.50 | 58.00 | 61.75 | 100.00 | 100.00 | 99.63 | 99.00 |
| | 1 | Normal-outer | 81.50 | 71.25 | 79.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 2 | Normal-outer | 57.50 | 61.25 | 66.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 3 | Normal-outer | 79.00 | 69.00 | 72.50 | 100.00 | 100.00 | 100.00 | 100.00 |
| Multi-Class | 1 | Normal-inner-outer-ball | 38.12 | 77.50 | 47.62 | 85.44 | 86.56 | 86.15 | 86.50 |
| | 2 | Normal-inner-outer-ball | 26.38 | 80.63 | 45.13 | 88.88 | 89.88 | 88.69 | 89.56 |
| | 3 | Normal-inner-outer-ball | 30.12 | 80.63 | 45.13 | 92.06 | 91.44 | 90.69 | 90.19 |
| Average performance | | | 63.84 | 70.54 | 64.97 | 96.99 | 97.11 | 96.71 | 96.85 |
Table III: Accuracy (%) with Softmax Classifier on Different Running Conditions with the Same Fault Diameter
| Dataset | Load (hp) | Condition | PCA [c10] | SVM-RFE [svm-rfe] | mRMR [mRMR] | DNN (no source label) | DNN with mRMR (no source label) | DTL | DTL with mRMR |
|---|---|---|---|---|---|---|---|---|---|
| Binary-Class | 1 | Normal-ball | 83.50 | 97.75 | 79.50 | 99.25 | 99.50 | 98.50 | 98.75 |
| | 2 | Normal-ball | 92.50 | 88.25 | 88.50 | 96.63 | 96.88 | 96.75 | 96.88 |
| | 3 | Normal-ball | 90.75 | 84.50 | 88.75 | 96.00 | 96.88 | 96.50 | 96.13 |
| | 1 | Normal-inner | 79.50 | 82.00 | 79.00 | 97.25 | 97.50 | 98.00 | 97.50 |
| | 2 | Normal-inner | 79.75 | 83.00 | 74.25 | 96.38 | 96.38 | 96.75 | 96.75 |
| | 3 | Normal-inner | 87.75 | 81.25 | 78.75 | 98.38 | 98.25 | 97.50 | 98.00 |
| | 1 | Normal-outer | 95.25 | 75.50 | 96.00 | 97.75 | 98.63 | 97.25 | 97.25 |
| | 2 | Normal-outer | 89.50 | 79.75 | 82.25 | 97.75 | 97.75 | 97.50 | 97.50 |
| | 3 | Normal-outer | 89.75 | 84.75 | 86.75 | 97.13 | 98.25 | 94.63 | 95.50 |
| Multi-Class | 1 | Normal-inner-outer-ball | 70.00 | 78.13 | 71.62 | 95.63 | 96.44 | 95.00 | 95.00 |
| | 2 | Normal-inner-outer-ball | 63.12 | 79.37 | 70.00 | 86.19 | 86.44 | 87.63 | 87.69 |
| | 3 | Normal-inner-outer-ball | 74.38 | 81.25 | 71.12 | 91.19 | 89.81 | 90.44 | 92.00 |
| Average performance | | | 82.98 | 82.96 | 80.54 | 96.04 | 96.06 | 95.54 | 95.75 |
Table IV: Accuracy (%) with Softmax Classifier on Different Running Conditions with Different Fault Diameters
| Dataset | Load (kN) | Condition | PCA [c10] | SVM-RFE [svm-rfe] | mRMR [mRMR] | DNN (no source label) | DNN with mRMR (no source label) | DTL | DTL with mRMR |
|---|---|---|---|---|---|---|---|---|---|
| Binary-Class | 26.6 | Normal-ball | 53.00 | 47.00 | 51.75 | 72.34 | 72.72 | 80.59 | 82.67 |
| | 26.6 | Normal-inner | 45.25 | 41.00 | 50.50 | 84.72 | 86.84 | 89.66 | 90.44 |
| | 26.6 | Normal-outer | 98.25 | 99.15 | 99.25 | 99.38 | 99.38 | 99.38 | 99.50 |
| Multi-Class | 26.6 | Normal-inner-outer-ball | 48.38 | 46.25 | 47.25 | 73.14 | 74.72 | 74.81 | 75.08 |
| Average performance | | | 61.22 | 58.35 | 63.69 | 82.40 | 83.42 | 86.11 | 86.92 |
Table V: Accuracy (%) with Softmax Classifier on the Intelligent Maintenance Systems Dataset

IV-B Segmentation

The time series are long (at least 121,000 samples for CWRU and 20,000 samples for IMS); if all the samples were applied directly to the machine learning algorithms, training the network would take ample time. To overcome this, the training samples for all conditions are segmented, with the segment size set to one quarter of the number of sample points per revolution, so that the local characteristics of the signal are learned [segmen]. Three cases are investigated for the analysis of the proposed frameworks, as described below and given in Table I; a minimal windowing sketch follows the case list.

  1. Case 1: Running condition A (RC-A) with a fault diameter of 7 mils is treated as the source data, where each class of data has at least 121,000 samples at a 12 kHz sampling frequency and approximately 1797 RPM motor speed; the number of sample points per revolution is therefore around 400. The segment length is taken as one quarter of this, so the total number of segments for each class is 1210 and the dimension of each segment is 100. Running condition B (RC-B) with the same diameter (7 mils) and different loads (1 hp, 2 hp, 3 hp) is treated as the target data, where each class has at least 40,000 sample points at the same frequency and speed; the total number of segments for each class is 400, and the dimension of each segment is 100.

    Figure 3: Apparatus of the CWRU bearing used for the experiment [dataset].
  2. Case 2: Running condition A (RC-A) with a fault diameter of 7 mils is treated as the source data, as in Case 1. Running condition B (RC-B) is changed here: a fault diameter of 14 mils with different loads (1 hp, 2 hp, 3 hp) is treated as the target data, where each class has at least 40,000 sample points at a 12 kHz frequency and 1797 RPM motor speed. The total number of segments for each class is 400, and the dimension of each segment is 100.

  3. Case 3: Running condition A (RC-A) with a fault diameter of 7 mils is treated as the source data, as in Cases 1 and 2. Running condition B (RC-B) is changed here: IMS data with a 26.6 kN load are treated as the target data, where each class has at least 40,000 sample points (from two data files) at a 20 kHz frequency and 2000 RPM motor speed. The total number of segments for each class is 400, and the dimension of each segment is 100.
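The windowing sketch referenced above, assuming the raw record is a 1-D NumPy array; the window length of 100 and the segment counts follow Cases 1-3.

```python
# A minimal segmentation sketch: slice a long vibration record into fixed-length,
# non-overlapping windows (window length 100, as used in Cases 1-3).
import numpy as np

def segment(signal, window=100, n_segments=None):
    n = len(signal) // window if n_segments is None else n_segments
    return np.stack([signal[i * window:(i + 1) * window] for i in range(n)])

# e.g., a 121,000-point RC-A record yields 1210 segments of dimension 100:
# samples = segment(raw_vibration, window=100, n_segments=1210)
```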

Figure 4: Transfer of knowledge from source data to different target data in case of DTL and DTL with mRMR.
Figure 5: Apparatus of IMS bearing used for the experiment [ims].

IV-C Pre-processing

The data collected from the accelerometers are not well structured, and a network trained on such data performs poorly. The data are therefore pre-processed before training the network. In this paper, max-min normalization is used:

\bar{x} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}    (14)

where x is the un-normalized data, \bar{x} is the normalized data, and x_{\min} and x_{\max} are the minimum and maximum values of the data. The pre-processed data are sampled using 5-fold external cross-validation for better generalization of the model.
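Equation (14) and the subsequent 5-fold split can be sketched as follows; the per-feature scaling and the KFold settings are assumed choices rather than the paper's exact procedure.

```python
# Max-min normalization of Eq. (14) and a 5-fold external cross-validation split
# (a sketch; per-feature scaling and shuffling are assumptions).
import numpy as np
from sklearn.model_selection import KFold

def max_min_normalize(x):
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min + 1e-12)  # small epsilon guards constant features

# folds = KFold(n_splits=5, shuffle=True, random_state=0).split(max_min_normalize(X))
```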

Figure 6: t-SNE visualization of features for CWRU data.

IV-D Analysis

In this subsection, Case 1, Case 2, and Case 3, as described in Table I, are analyzed on binary-class and multi-class classification problems (as listed in Table II) and compared with DNN and DTL without mRMR and with state-of-the-art feature selection/extraction methods.

Comparison with DNN and DTL without mRMR

The performance of the proposed frameworks is compared with deep learning and deep transfer learning without mRMR-based removal of redundant features, on binary and multi-class classification problems. As given in Table I, the source data are taken at load 0 hp: the normal condition, plus the inner, ball, and outer fault conditions with a fault diameter of 7 mils. The target domain is taken at different loads with the same diameter, with different diameters, and on a different machine, as given in Table I and described in Cases 1, 2, and 3. The network architecture chosen for the DNN is 100-50-40-20, where 100 is the input layer, 50 the first hidden layer, 40 the second hidden layer, and 20 the last hidden layer, with regularization parameter λ and sparsity control parameter β. The sparsity parameter is varied over {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}, and the result is reported for the best value. In DNN with mRMR, the features are first selected with minimum redundancy, and a DNN of the same form is then built on the selected features. The transformed features at the last layer of DNN and of DNN with mRMR are applied to the softmax classifier to identify the health state of the rotating machines.

In DTL, the DNN weight parameters pre-trained on the source data are used as the initial weights of the target network. The target network is fine-tuned on the target labels, and the fine-tuned network is validated on the unseen target data. In DTL with mRMR, the weight parameters of the DNN with mRMR pre-trained on the source data are used as the initial weights of the target network. Thus, in the transfer learning models, no pre-training is required, which reduces the training time of the model. The performance of the models, i.e., DNN with mRMR and DTL with mRMR, is compared in terms of accuracy and t-SNE plots. The t-SNE plots of the features extracted by the proposed models show better separability of the data points, as shown in Figure 6, which indicates that the extracted features are more informative than those of the baseline methods. The features extracted by the proposed models are applied to the softmax classifier to interpret the performance in terms of accuracy values. As given in Tables III, IV, and V, the prediction accuracies on the unseen data are better or comparable. Confusion matrix charts of actual versus predicted health states are also presented in Figures 7 and 8 for DNN with mRMR and DTL with mRMR to describe the performance of the classification models.
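The Figure 6 visualization can be reproduced in spirit with scikit-learn's t-SNE; the feature matrix (e.g., the 20-dimensional last-layer output) and the class labels are assumed inputs, and the perplexity value is an illustrative choice.

```python
# Hedged t-SNE sketch for Figure 6: project learned features to 2-D, colored by class.
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_tsne(features, labels):
    """Embed learned features (e.g., the 20-D last-layer output) in 2-D and scatter-plot them."""
    emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap='tab10', s=8)
    plt.title('t-SNE of the learned features')
    plt.show()
```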

Comparison with Traditional State-of-the-art Feature Selection/Extraction Methods

In this subsection, the performance of the proposed frameworks is compared with traditional methods, i.e., PCA [c10], mRMR [mRMR], and support vector machine recursive feature elimination (SVM-RFE) [svm-rfe], on binary and multi-class classification problems. In PCA, features are extracted in order of highest to lowest eigenvalues; in mRMR, features are selected as explained in subsection II-A and described in Algorithm 1; in SVM-RFE, features are selected by ranking, and the top-ranked features are chosen. In all these state-of-the-art methods, the same number of features is chosen as in the proposed frameworks. As shown in Tables III, IV, and V, the performance of the proposed frameworks is better in terms of accuracy values. The t-SNE plots in Figure 6 also reveal that the feature representation of the proposed approach is better than that of the state-of-the-art methods. The execution times of the proposed approaches are also compared with these methods, as shown in Table VI. The execution time of DNN with mRMR is lower than that of DNN, whereas for DTL and DTL with mRMR the execution time is reduced much further, because in the case of DTL no pre-training of the network from scratch is required, as described in the proposed approach.

| S. No. | Method | Execution time (seconds) |
|---|---|---|
| 1 | PCA + Softmax Classifier | |
| 2 | SVM-RFE + Softmax Classifier | |
| 3 | mRMR + Softmax Classifier | |
| 4 | DNN + Softmax Classifier | |
| 5 | DNN-mRMR + Softmax Classifier | |
| 6 | DTL + Softmax Classifier | |
| 7 | DTL-mRMR + Softmax Classifier | |
Table VI: Comparison of execution times
Figure 7: Confusion matrix plots of the predicted results for DNN with mRMR
Figure 8: Confusion matrix plots of the predicted results for DTL with mRMR

V Conclusion

This paper presents a new framework for intelligent machine fault diagnosis. Its major contributions are to minimize the effect of redundant features in the dataset and to transfer knowledge to a different domain: in the presence of redundant features, different initial conditions lead to different performance, while knowledge transfer helps to improve performance on the target domain with a smaller number of target samples. To this end, mRMR feature selection is utilized to reduce the redundant features, the data with reduced features are used to train the DNN, and the model pre-trained on a different running condition is fine-tuned on the target task. The proposed frameworks are validated on the well-known motor bearing datasets from the CWRU and IMS data centers. The results in terms of accuracy values show that the proposed frameworks are an effective tool for machine fault diagnosis; the effectiveness is also illustrated by the t-SNE plots and confusion matrix charts in Figures 6, 7, and 8. In the future, the proposed frameworks can be helpful in application areas such as bioinformatics, where the number of features is very large compared to the number of samples; the proposed approach will help remove non-informative features from such data and improve performance.

References
