mRMR-DNN with Transfer Learning for Intelligent Fault Diagnosis of Rotating Machines
Abstract
In recent years, intelligent condition-based monitoring of rotary machinery systems has become a major research focus of machine fault diagnosis. In condition-based monitoring, it is challenging to form a large-scale, well-annotated dataset because data acquisition and annotation are expensive. In addition, the generated data contain a large number of redundant features, which degrades the performance of machine learning models. To overcome this, we utilize the advantages of minimum redundancy maximum relevance (mRMR) and transfer learning with a deep learning model. In this work, mRMR is combined with deep learning and deep transfer learning frameworks to improve fault diagnosis performance in terms of accuracy and computational complexity. The mRMR removes redundant information from the data and improves deep learning performance, whereas transfer learning reduces the large data dependency for training the model. In the proposed work, two frameworks, i.e., mRMR with deep learning and mRMR with deep transfer learning, have been explored and validated on the CWRU and IMS rolling element bearing datasets. The analysis shows that the proposed frameworks obtain better diagnostic accuracy in comparison with existing methods and also handle data with a large number of features more quickly.
I Introduction
With the recent advancement of technology, intelligent condition monitoring of rotating machines has become an essential tool of machine fault diagnosis to increase reliability and ensure equipment efficiency in industrial processes [c1, c2, c3]. Rotating components, which are essential parts of machines, are widely used in equipment transmission systems, and their failure might result in considerable loss and catastrophic consequences. As practical components of condition-based maintenance, vibration-based fault diagnosis systems have been explored in recent years [c4].
Traditionally, a machine fault diagnosis framework includes three main stages: 1) signal acquisition, 2) feature extraction and selection, and 3) fault identification or classification. The signal acquisition stage involves the collection of raw data while the machine is in running condition. Signals such as vibration, temperature, current, sound pressure, and acoustic emission can be studied for health monitoring and fault diagnosis, but the vibration signal is extensively explored in the literature because it provides essential information about the faults [c5, c6, c7, patt]. In the second stage, feature extraction is used to extract informative features from the raw data using time-domain, frequency-domain, and time-frequency-domain analysis [c8]. Although these feature extraction methods identify the machine health conditions, they may produce irrelevant and insensitive features which affect the fault diagnosis performance. Therefore, feature selection methods such as mutual information (the criteria of max-dependency, max-relevance, and min-redundancy), principal component analysis (PCA), and Fisher discriminant analysis (FDA) are widely used to select the essential features from the data [c9, c10, c11, t2pca]. In the final stage, the selected features are used for fault classification using various classifiers, e.g., support vector machine (SVM), k-nearest neighbor (k-NN), random forest (RF), and artificial neural network (ANN) [c12]. However, these traditional fault diagnosis methods also have several limitations.
In 2006, Hinton et al. [c13] proposed deep learning techniques, which try to learn a high-level representation of data by stacking multiple layers in a hierarchical architecture. In recent years, several studies have focused on the deep neural network (DNN) for machine fault diagnosis. Tao et al. [c14] suggested a deep neural network framework for bearing fault diagnosis based on a stacked auto-encoder and softmax regression. In [c15, maurya, mauryavs], the authors proposed DNN-based intelligent fault diagnosis methods using auto-encoders for the classification of different datasets from bearing elements and gearboxes with large sample sizes. Sun et al. [c16] proposed a sparse auto-encoder-based DNN with denoising coding and the dropout method for induction motor fault diagnosis. Ding et al. [c18] developed a deep convnet in which wavelet packet energy is used as the input for bearing fault diagnosis. In [c19], an intelligent machine bearing fault diagnosis method is presented by combining compressed data acquisition and a deep learning approach in a single framework.
Although deep learning-based models have achieved great success in machine fault diagnosis applications, there are still problems associated with them. As the number of hidden layers and nodes increases, the number of parameters also increases, which raises the computational complexity of the model. Along with that, a large amount of labeled training data is required for training a deep network from scratch. In addition, parameter optimization and hyperparameter tuning greatly affect the performance of deep networks. Transfer learning-based approaches have been used to overcome these problems: instead of training the deep learning model from scratch, a DNN that has been trained on sufficient labelled training data from different running conditions is reused and fine-tuned on the target task. In the literature, various case studies have been performed using transfer learning with the DNN. Lu et al. [tr1] proposed a DNN model with domain adaptation for fault diagnosis. First, they utilize domain adaptation to strengthen the representation of the original data, so that high classification accuracy can be obtained on the target domain; second, they propose various strategies to explore the optimal hyperparameters of the model. Long et al. [tr2] presented a deep transfer learning-based model using a sparse auto-encoder (SAE) for fault diagnosis, in which a three-layer SAE extracts abstract features of the raw data and a maximum mean discrepancy term minimizes the discrepancy between the features of the training and testing data. Gao et al. [tr3] presented an intelligent fault diagnosis of machines with unlabeled data using a deep convolutional transfer learning network. Siyu et al. [tr4] proposed a highly accurate machine fault diagnosis model using deep transfer learning with a convolutional neural network. However, the performance of these models is again reduced by redundant features in the dataset because, in the presence of redundant features, different initial conditions will lead to different performance.
In this paper, we address the problem mentioned above by employing the mutual information criteria of max-dependency, max-relevance, and min-redundancy to select a subset of features with minimum redundancy. The selected features are used to train the DNN to extract a meaningful representation in a lower dimension. In this work, two frameworks for intelligent fault diagnosis are evaluated. In the first framework, the DNN is pre-trained and fine-tuned on the same running condition, and the fine-tuned network is validated on unseen samples of the same running condition. In the second framework, the deep neural network is pre-trained on one running condition with unlabelled data, the pre-trained weights are transferred to the target domain, and the target network is finally fine-tuned on a different running condition with fewer samples. So, in the second framework, the pre-training time of the target network is entirely eliminated. In real applications, as mentioned, it is challenging to form a large-scale, well-annotated dataset; in this scenario, the second framework is more useful.
The major contributions of the paper are summarized as follows:
- mRMR with deep learning and mRMR with deep transfer learning frameworks have been proposed for intelligent machine fault diagnosis, as shown in Fig. 2.
- An mRMR-based machine learning method has been evaluated to minimize the effect of redundant features in the dataset. Redundant features decrease the performance of deep models because, in their presence, different initial conditions will lead to different performance.
- Deep learning and deep transfer learning based methods have been evaluated for better feature representation in a reduced dimension with lower complexity.
- t-Distributed Stochastic Neighbor Embedding (t-SNE) and confusion matrix charts are used for the visualization of the reduced features and to describe the performance of the classification model.
- Experiments have been conducted on the CWRU [dataset] and IMS [ims] datasets to show the efficacy of the proposed approach in comparison with state-of-the-art methods.
The remainder of this paper is organized as follows: Section II briefly introduces the theoretical background of minimum redundancy maximum relevance, the deep neural network, and transfer learning. Section III describes the proposed mRMR-DNN based transfer learning framework for intelligent condition-based monitoring of rotating machinery. Section IV presents the experimental results and an analysis of the proposed method in comparison with state-of-the-art methods. Finally, Section V draws the conclusion of the paper.
II Theoretical Background
II-A Minimum Redundancy Maximum Relevance (mRMR) [mRMR]
The mutual information is used to determine the feature set $S$ with $m$ features $\{x_i\}$ which jointly have the maximum dependency on the target class $c$. This approach is termed Max-Dependency and written as

$$\max D(S, c), \quad D = I(\{x_i, i=1,\dots,m\};\, c) \tag{1}$$
where $I(\{x_i\}; c)$ is the mutual information between the feature subset $S$ and the class $c$.
The Max-Dependency criterion is difficult to implement in a high-dimensional feature space: 1) the number of samples is often insufficient, and 2) multivariate density estimation often involves computing the inverse of a high-dimensional covariance matrix, which is generally an ill-posed problem. This problem is overcome by the maximal relevance criterion (Max-Relevance). Max-Relevance is used to find features satisfying (2), which approximates $D(S, c)$ in (1) with the mean of all mutual information values between the individual features $x_i$ and the class $c$:

$$\max D(S, c), \quad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c) \tag{2}$$
Features selected using Max-Relevance could still have considerable redundancy, i.e., the dependency among these features could be significant. When two features highly depend on each other, the respective class-discriminative power would not change much if one of them were eliminated. Therefore, the following minimal redundancy (Min-Redundancy) condition can be used to select mutually exclusive features:

$$\min R(S), \quad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j) \tag{3}$$

The mRMR criterion combines the two constraints above by optimizing them simultaneously, for example by maximizing the simple difference $\Phi(D, R) = D - R$.
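For concreteness, the greedy selection implied by (2) and (3) can be sketched in a few lines of Python. The snippet below is a minimal illustration under the criterion $\Phi = D - R$, not the implementation used in this paper; it assumes scikit-learn's mutual information estimators, and `select_mrmr` and `discretize` are hypothetical helper names.

```python
# Minimal greedy mRMR sketch (illustrative, not the authors' code).
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def discretize(X, bins=10):
    """Bin each column so pairwise MI can be estimated from counts."""
    Xd = np.empty_like(X, dtype=int)
    for j in range(X.shape[1]):
        edges = np.histogram_bin_edges(X[:, j], bins=bins)
        Xd[:, j] = np.digitize(X[:, j], edges[1:-1])
    return Xd

def select_mrmr(X, y, n_select):
    """Greedily pick features maximizing relevance I(x_i; c) minus the
    mean redundancy I(x_i; x_j) with already-selected features (Phi = D - R)."""
    relevance = mutual_info_classif(X, y)       # D term, one value per feature
    Xd = discretize(X)
    selected = [int(np.argmax(relevance))]      # start from the most relevant
    while len(selected) < n_select:
        best_j, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info_score(Xd[:, j], Xd[:, k])
                                  for k in selected])  # R term
            score = relevance[j] - redundancy          # Phi = D - R
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Usage: idx = select_mrmr(X_train, y_train, n_select=20)
```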
II-B Deep Learning (DL) [c13]
Deep learning is a branch of machine learning whose fundamental principle is to learn a hierarchical representation of the data from layer to layer [c13]. In the literature, different types of deep learning models have been studied for machine fault diagnosis. In this paper, we use a sparse auto-encoder based learning approach to form a deep neural network for automatic feature extraction. The auto-encoder is a three-layer feed-forward neural network comprising an input layer, a hidden layer, and an output layer. As shown in Figure 1, the first part is known as the encoder, which takes the input $\mathbf{x}$ and transforms it into a hidden representation $\mathbf{h}$ via a non-linear mapping as
$$\mathbf{h} = f(W\mathbf{x} + \mathbf{b}) \tag{5}$$
where $f(\cdot)$ is a non-linear activation function. The second part of the figure is known as the decoder, which maps the hidden representation $\mathbf{h}$ back to the original representation $\hat{\mathbf{x}}$ as
$$\hat{\mathbf{x}} = g(W'\mathbf{h} + \mathbf{b}') \tag{6}$$
Figure 1: Structure of the sparse auto-encoder, with the encoder mapping the input to the hidden representation and the decoder reconstructing the input.
The network parameters, i.e., the weights $W$ and biases $\mathbf{b}$, are optimized by minimizing the cost function with back-propagation, computing the gradients with respect to $W$ and $\mathbf{b}$. In back-propagation, for the given training samples $(\mathbf{x}, \mathbf{y})$, a forward pass first computes the activation at every node of the network. Then, for every node, an error term is computed which measures how much that particular node was responsible for the errors at the output. The cost function of the sparse auto-encoder is
$$J_{\mathrm{sparse}}(W, \mathbf{b}) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \left\lVert \hat{\mathbf{x}}_i - \mathbf{x}_i \right\rVert^2 + \frac{\lambda}{2} \lVert W \rVert^2 + \beta \sum_{j=1}^{s} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) \tag{7}$$

where $\mathrm{KL}$ is the Kullback-Leibler divergence function, $s$ is the number of hidden nodes, $\lambda$ is the regularization parameter, $\rho$ is the sparsity parameter, and $\beta$ is the sparsity control parameter. $\hat{\rho}_j$ is the average activation of hidden node $j$.
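The cost in (7) translates almost directly into code. The following is a minimal PyTorch sketch of a sparse auto-encoder with the KL sparsity penalty; the hyperparameter values are placeholders, not the settings used in this paper.

```python
# Sparse auto-encoder sketch matching Eq. (7):
# reconstruction loss + (lambda/2) * weight decay + beta * sum_j KL(rho || rho_hat_j).
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.enc = nn.Linear(n_in, n_hidden)
        self.dec = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))           # Eq. (5): h = f(Wx + b)
        return torch.sigmoid(self.dec(h)), h     # Eq. (6): x_hat = g(W'h + b')

def sparse_ae_loss(x, x_hat, h, model, rho=0.1, beta=3.0, lam=1e-4):
    recon = 0.5 * ((x_hat - x) ** 2).sum(dim=1).mean()
    rho_hat = h.mean(dim=0).clamp(1e-6, 1 - 1e-6)  # average activation per node
    kl = (rho * torch.log(rho / rho_hat)
          + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()
    l2 = sum((w ** 2).sum() for w in (model.enc.weight, model.dec.weight))
    return recon + lam / 2 * l2 + beta * kl
```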
II-C Transfer Learning (TL) [Tl]
Transfer learning is a form of representation learning based on the idea of mastering a new task by reusing knowledge from a previous task, and it is defined as:
Given a source domain $\mathcal{D}_S$ and learning task $\mathcal{T}_S$, and a target domain $\mathcal{D}_T$ and learning task $\mathcal{T}_T$, TL tries to improve the learning performance of the target predictive function $f_T(\cdot)$ in $\mathcal{D}_T$ using the knowledge in $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$. Based on the source task and target task, it is categorized into three subcategories:
- Inductive TL: Given a source domain $\mathcal{D}_S$ and a learning task $\mathcal{T}_S$, and a target domain $\mathcal{D}_T$ and a learning task $\mathcal{T}_T$, inductive TL tries to improve the learning performance of the target predictive function $f_T(\cdot)$ in $\mathcal{D}_T$ using the knowledge in $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{T}_S \neq \mathcal{T}_T$.
- Transductive TL: Given a source domain $\mathcal{D}_S$ and a learning task $\mathcal{T}_S$, and a target domain $\mathcal{D}_T$ and a learning task $\mathcal{T}_T$, transductive TL tries to improve the learning performance of the target function $f_T(\cdot)$ in $\mathcal{D}_T$ using the knowledge in $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{D}_S \neq \mathcal{D}_T$ and $\mathcal{T}_S = \mathcal{T}_T$. In addition, some unlabeled target-domain data must be available at training time.
- Unsupervised TL: Given a source domain $\mathcal{D}_S$ with a learning task $\mathcal{T}_S$, and a target domain $\mathcal{D}_T$ and a learning task $\mathcal{T}_T$, unsupervised TL aims to improve the learning of the target function $f_T(\cdot)$ in $\mathcal{D}_T$ using the knowledge in $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{T}_S \neq \mathcal{T}_T$ and the label spaces $\mathcal{Y}_S$ and $\mathcal{Y}_T$ are not observable.
In this work, we utilize inductive transfer learning and evaluate its performance on the CWRU and IMS rolling element bearing datasets.
III mRMR-DNN based transfer learning framework for Intelligent Fault Diagnosis
Considering the challenges posed by traditional fault diagnosis methods in the condition-based monitoring system, this paper presents an intelligent condition-based monitoring framework that minimizes the redundant features in the data and transfers knowledge from one domain to a different domain. In this work, the mRMR-based feature selection method is utilized to eliminate the effect of redundant features in the dataset, as described in Algorithm 1. Redundant features decrease the performance of deep learning models because, in their presence, different initial conditions will lead to different performance. The data with reduced features are used to pre-train the source network as shown in Figure 2a; in the case of the deep neural network, the pre-trained model is fine-tuned on the source data as shown in Figure 2b and validated on unseen target data with the same machine running condition, as described in Algorithm 2. In the case of deep transfer learning, the DNN with inductive transfer learning is applied, where the target task differs from the source task, no matter whether the source and target domains are the same or different. In the inductive transfer learning setting, we evaluate the condition where a lot of labeled training data are available in the source domain and only a small amount of labeled training data is available in the target domain, as illustrated in Figure 2c.
As shown in Figure 2a, the sparse auto-encoders are learned layer by layer in an unsupervised way on the source data. The sparse auto-encoder learned at layer $l$ is given as

$$J^{(l)}(W, \mathbf{b}) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \left\lVert \hat{\mathbf{h}}^{(l)}_i - \mathbf{h}^{(l)}_i \right\rVert^2 + \frac{\lambda}{2} \lVert W \rVert^2 + \beta \sum_{j=1}^{s_{l+1}} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) \tag{8}$$

where $\mathbf{h}^{(l)}$ and $\hat{\mathbf{h}}^{(l)}$ are the hidden input and the estimated hidden output of the sparse auto-encoder, $s_l$ and $s_{l+1}$ are the numbers of nodes in layers $l$ and $l+1$, $\rho$ is the sparsity parameter, and $\mathrm{KL}$ is the Kullback-Leibler divergence. The average activation $\hat{\rho}_j$ of hidden unit $j$ and the divergence are defined as follows:

$$\hat{\rho}_j = \frac{1}{N} \sum_{i=1}^{N} a_j\!\left(\mathbf{h}^{(l)}_i\right) \tag{9}$$

$$\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} \tag{10}$$
The encoding output $\mathbf{h}^{(L)}$ of the last SAE is used as the input to the softmax layer for pre-training of the softmax layer and is given as

$$P\!\left(y = k \mid \mathbf{h}^{(L)}; \theta\right) = \frac{\exp\!\left(\theta_k^{\top} \mathbf{h}^{(L)}\right)}{\sum_{j=1}^{K} \exp\!\left(\theta_j^{\top} \mathbf{h}^{(L)}\right)} \tag{11}$$
The learned sparse auto-encoders are stacked with the softmax layer to form the deep neural network shown in Figure 2b. This deep network is fine-tuned on the source labels to obtain the optimal weight and bias vectors of the network, as defined below:

$$(W^{*}, \mathbf{b}^{*}) = \arg\min_{W, \mathbf{b}} J\!\left(W, \mathbf{b};\, X_S, Y_S\right) \tag{12}$$
The parameters (i.e., weights and biases) of the pre-trained model shown in Figure 2a are transferred to the target network of Figure 2c and act as the initial parameters for the target domain. The network is then trained on the target domain, which has less labeled data, as described in Algorithm 3:

$$(W_T^{*}, \mathbf{b}_T^{*}) = \arg\min_{W, \mathbf{b}} J\!\left(W, \mathbf{b};\, X_T, Y_T\right), \quad (W, \mathbf{b}) \ \text{initialized with} \ (W^{*}, \mathbf{b}^{*}) \tag{13}$$
The trained model on the target domain is validated on unseen test data of the target domain.
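The transfer step of (13) amounts to copying the source parameters into the target network and fine-tuning on the small labelled target set. The PyTorch sketch below illustrates this under stated assumptions: the 100-50-40-20 layer sizes come from Section IV-D, `target_loader` is an assumed data loader over the labelled target samples, and the training details are illustrative rather than the paper's exact procedure.

```python
# Transfer-and-fine-tune sketch: source-trained weights initialize the
# target network (Eq. 13), which is fine-tuned on few labelled target samples.
import torch
import torch.nn as nn

def build_dnn(sizes=(100, 50, 40, 20), n_classes=4):
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(n_in, n_out), nn.Sigmoid()]
    layers.append(nn.Linear(sizes[-1], n_classes))  # softmax folded into the loss
    return nn.Sequential(*layers)

source_net = build_dnn()
# ... pre-train and fine-tune source_net on source-domain data (Figs. 2a-2b) ...

target_net = build_dnn()
target_net.load_state_dict(source_net.state_dict())  # transfer W, b as init

opt = torch.optim.Adam(target_net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(50):                  # fine-tune; no pre-training needed
    for xb, yb in target_loader:         # assumed small labelled target loader
        opt.zero_grad()
        loss = loss_fn(target_net(xb), yb)
        loss.backward()
        opt.step()
```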
Table I: Description of the source and target data for the three cases (Case 1: CWRU data with the same fault diameter; Case 2: different fault diameter; Case 3: IMS data).

| Condition | Case 1 Source (RC-A): samples / load | Case 1 Target (RC-B): samples / load | Case 2 Target (RC-B): samples / load | Case 3 Target (IMS): samples / load | Label |
|---|---|---|---|---|---|
| Normal | 1210 / 0 hp | 400, 400, 400 / 1, 2, 3 hp | 400, 400, 400 / 1, 2, 3 hp | 400 / 26.6 kN | 1 |
| Inner | 1210 / 0 hp | 400, 400, 400 / 1, 2, 3 hp | 400, 400, 400 / 1, 2, 3 hp | 400 / 26.6 kN | 2 |
| Ball | 1210 / 0 hp | 400, 400, 400 / 1, 2, 3 hp | 400, 400, 400 / 1, 2, 3 hp | 400 / 26.6 kN | 3 |
| Outer | 1210 / 0 hp | 400, 400, 400 / 1, 2, 3 hp | 400, 400, 400 / 1, 2, 3 hp | 400 / 26.6 kN | 4 |
Table II: Binary-class and multi-class classification conditions on the source (RC-A) and target (RC-B) domains.

| Conditions | Source (RC-A) | Target (RC-B) |
|---|---|---|
| Normal-ball | | |
| Normal-inner race | | |
| Normal-outer race | | |
| Normal-inner-ball-outer | | |
IV Experimental Results and Analysis
The proposed frameworks are validated on two different case studies, i.e., Case Western Reserve University (CWRU) Bearing Data [dataset] and Intelligent Maintenance Systems (IMS) Bearing Data [ims]. They are described as follows:
IV-A Dataset Description
Experimental data are taken from the CWRU and IMS data centers to analyze the performance of the proposed frameworks. The experimental setups of the CWRU and IMS bearing test rigs are shown in Figures 3 and 5, with which the multivariate vibration series were generated for validation. The CWRU test stand consists of a 2-hp Reliance Electric motor on the left, a torque transducer/encoder in the center, and a dynamometer on the right; the control electronics are not shown in the figure. Single-point faults with diameters of 7, 14, and 21 mils (1 mil = 0.001 inch) were seeded at the inner raceway, rolling element, and outer raceway of the test bearing using electro-discharge machining. The vibration data were collected using accelerometers, and the drive-end vibration signal, sampled at 12 kHz (12,000 samples per second) under a 2-hp load, is used. The IMS data, in contrast, were collected at a 20 kHz sampling rate under a 26.6 kN load.
In this analysis, the dataset includes four health conditions: 1) normal condition, 2) outer race fault, 3) inner race fault, and 4) roller fault, with two fault diameters of 7 and 21 mils.
Table III: Diagnostic accuracy (%) for Case 1 (CWRU data, same fault diameter).

| Class | Load (hp) | Condition | PCA | SVM-RFE | mRMR | DNN | DNN-mRMR | DTL | DTL-mRMR |
|---|---|---|---|---|---|---|---|---|---|
| Binary-Class | 1 | Normal-ball | 82.75 | 78.00 | 79.00 | 99.36 | 99.36 | 98.75 | 99.00 |
| | 2 | Normal-ball | 84.25 | 71.75 | 71.75 | 99.63 | 99.88 | 99.00 | 99.00 |
| | 3 | Normal-ball | 89.00 | 71.75 | 78.50 | 99.38 | 99.00 | 99.36 | 99.75 |
| | 1 | Normal-inner | 68.50 | 57.75 | 61.00 | 99.75 | 99.75 | 99.25 | 99.50 |
| | 2 | Normal-inner | 73.50 | 69.00 | 72.25 | 99.50 | 99.50 | 99.00 | 99.75 |
| | 3 | Normal-inner | 55.50 | 58.00 | 61.75 | 100.00 | 100.00 | 99.63 | 99.00 |
| | 1 | Normal-outer | 81.50 | 71.25 | 79.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 2 | Normal-outer | 57.50 | 61.25 | 66.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 3 | Normal-outer | 79.00 | 69.00 | 72.50 | 100.00 | 100.00 | 100.00 | 100.00 |
| Multi-Class | 1 | Normal-inner-ball-outer | 38.12 | 77.50 | 47.62 | 85.44 | 86.56 | 86.15 | 86.50 |
| | 2 | Normal-inner-ball-outer | 26.38 | 80.63 | 45.13 | 88.88 | 89.88 | 88.69 | 89.56 |
| | 3 | Normal-inner-ball-outer | 30.12 | 80.63 | 45.13 | 92.06 | 91.44 | 90.69 | 90.19 |
| Average performance | | | 63.84 | 70.54 | 64.97 | 96.99 | 97.11 | 96.71 | 96.85 |
Table IV: Diagnostic accuracy (%) for Case 2 (CWRU data, different fault diameter).

| Class | Load (hp) | Condition | PCA | SVM-RFE | mRMR | DNN | DNN-mRMR | DTL | DTL-mRMR |
|---|---|---|---|---|---|---|---|---|---|
| Binary-Class | 1 | Normal-ball | 83.50 | 97.75 | 79.50 | 99.25 | 99.50 | 98.50 | 98.75 |
| | 2 | Normal-ball | 92.50 | 88.25 | 88.50 | 96.63 | 96.88 | 96.75 | 96.88 |
| | 3 | Normal-ball | 90.75 | 84.50 | 88.75 | 96.00 | 96.88 | 96.50 | 96.13 |
| | 1 | Normal-inner | 79.50 | 82.00 | 79.00 | 97.25 | 97.50 | 98.00 | 97.50 |
| | 2 | Normal-inner | 79.75 | 83.00 | 74.25 | 96.38 | 96.38 | 96.75 | 96.75 |
| | 3 | Normal-inner | 87.75 | 81.25 | 78.75 | 98.38 | 98.25 | 97.50 | 98.00 |
| | 1 | Normal-outer | 95.25 | 75.50 | 96.00 | 97.75 | 98.63 | 97.25 | 97.25 |
| | 2 | Normal-outer | 89.50 | 79.75 | 82.25 | 97.75 | 97.75 | 97.50 | 97.50 |
| | 3 | Normal-outer | 89.75 | 84.75 | 86.75 | 97.13 | 98.25 | 94.63 | 95.50 |
| Multi-Class | 1 | Normal-inner-ball-outer | 70.00 | 78.13 | 71.62 | 95.63 | 96.44 | 95.00 | 95.00 |
| | 2 | Normal-inner-ball-outer | 63.12 | 79.37 | 70.00 | 86.19 | 86.44 | 87.63 | 87.69 |
| | 3 | Normal-inner-ball-outer | 74.38 | 81.25 | 71.12 | 91.19 | 89.81 | 90.44 | 92.00 |
| Average performance | | | 82.98 | 82.96 | 80.54 | 96.04 | 96.06 | 95.54 | 95.75 |
Table V: Diagnostic accuracy (%) for Case 3 (IMS data).

| Class | Load (kN) | Condition | PCA | SVM-RFE | mRMR | DNN | DNN-mRMR | DTL | DTL-mRMR |
|---|---|---|---|---|---|---|---|---|---|
| Binary-Class | 26.6 | Normal-ball | 53.00 | 47.00 | 51.75 | 72.34 | 72.72 | 80.59 | 82.67 |
| | 26.6 | Normal-inner | 45.25 | 41.00 | 50.50 | 84.72 | 86.84 | 89.66 | 90.44 |
| | 26.6 | Normal-outer | 98.25 | 99.15 | 99.25 | 99.38 | 99.38 | 99.38 | 99.50 |
| Multi-Class | 26.6 | Normal-inner-ball-outer | 48.38 | 46.25 | 47.25 | 73.14 | 74.72 | 74.81 | 75.08 |
| Average performance | | | 61.22 | 58.35 | 63.69 | 82.40 | 83.42 | 86.11 | 86.92 |
IV-B Segmentation
The length of the time series data is massive (at least 121,000 samples in CWRU and 20,000 samples in IMS); if all the samples are directly applied to machine learning algorithms, it takes a long time to train the network. To overcome this problem, the training samples for all conditions have been segmented, with a segment length of one quarter of the shaft revolution period, to learn the local characteristics of the signal [segmen]. Three cases have been investigated for the analysis of the proposed frameworks, as described below and given in Table I; a code sketch of the segmentation step follows the case list.
- Case 1: Running condition A (RC-A) with fault diameter 7 mils is treated as the source data, where each type of data has at least 121,000 samples at a 12 kHz sampling rate and approximately 1797 RPM motor speed; the number of sample points per revolution is therefore around 400. The segment length is taken as one quarter of the points per revolution, so the total number of segments for each type of data is 1210, and the dimension of each segment is 100. Running condition B (RC-B) with the same diameter (7 mils) and different loads (1 hp, 2 hp, 3 hp) is treated as the target data, where each type of data has at least 40,000 sample points at the same frequency and speed. The total number of segments for each type of data is 400, and the dimension of each segment is 100.
- Case 2: Running condition A (RC-A) with fault diameter 7 mils is treated as the source data, as in Case 1. However, running condition B (RC-B) is changed: fault diameter 14 mils with different loads (1 hp, 2 hp, 3 hp) is treated as the target data, where each type of data has at least 40,000 sample points at a 12 kHz sampling rate and 1797 RPM motor speed. The total number of segments for each type of data is 400, and the dimension of each segment is 100.
- Case 3: Running condition A (RC-A) with fault diameter 7 mils is treated as the source data, as in Cases 1 and 2. However, running condition B (RC-B) is changed: IMS data with a 26.6 kN load is treated as the target data, where each type of data has at least 40,000 sample points (from two data files) at a 20 kHz sampling rate and 2000 RPM motor speed. The total number of segments for each type of data is 400, and the dimension of each segment is 100.

Figure 3: Apparatus of the CWRU bearing test rig used for the experiment [dataset].
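The segmentation described above is a simple reshape of each long record into non-overlapping windows. A minimal sketch, with a random array standing in for a real vibration record:

```python
# Split a long vibration record into non-overlapping segments of 100 points
# (one quarter of the ~400 points per shaft revolution for CWRU at 12 kHz),
# e.g. 121,000 points -> 1210 segments of dimension 100.
import numpy as np

def segment(signal, seg_len=100):
    n_seg = len(signal) // seg_len
    return signal[:n_seg * seg_len].reshape(n_seg, seg_len)

x = np.random.randn(121_000)   # placeholder for a real drive-end record
segments = segment(x)          # shape (1210, 100)
```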
Figure 5: Apparatus of the IMS bearing test rig used for the experiment [ims].
IV-C Pre-processing
The data collected from the accelerometers are not well structured; if the network is trained on such a dataset, it performs poorly. To make the data well structured, they are pre-processed before training the network. In this paper, max-min normalization is used:
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{14}$$

where $x$ is the un-normalized data, $x'$ is the normalized data, and $\min(x)$ and $\max(x)$ are the minimum and maximum values of the data. The pre-processed data are sampled using 5-fold external cross-validation for better generalization of the model.
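Equation (14) and the 5-fold split are straightforward to express in code. The sketch below applies the scaling per segment, which is one plausible reading of (14) rather than a detail confirmed by the paper:

```python
# Max-min normalization (Eq. 14) followed by 5-fold external cross-validation.
import numpy as np
from sklearn.model_selection import KFold

def max_min_normalize(x):
    # Eq. (14): x' = (x - min(x)) / (max(x) - min(x))
    return (x - x.min()) / (x.max() - x.min())

X = np.random.randn(400, 100)          # placeholder: 400 segments of dimension 100
X_norm = np.apply_along_axis(max_min_normalize, 1, X)  # per-segment scaling

kf = KFold(n_splits=5, shuffle=True, random_state=0)   # 5-fold external CV
for train_idx, test_idx in kf.split(X_norm):
    X_train, X_test = X_norm[train_idx], X_norm[test_idx]
    # ... train on X_train, evaluate on X_test ...
```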
Figure 6: t-SNE visualization of the features extracted using the compared methods.
IV-D Analysis
In this subsection, Case 1, Case 2, and Case 3 as described in Table I are analyzed on binary-class and multi-class classification problems (as described in Table II) and compared with DNN and DTL without mRMR and with state-of-the-art feature selection/extraction methods.
Comparison with DNN and DTL without mRMR
The performance of the proposed frameworks is compared with deep learning and deep transfer learning without mRMR-based reduction of redundant features, on binary and multi-class classification problems. As given in Table I, the source data are taken at load 0 for the normal condition and at load 0 with fault diameter 7 mils for the other conditions (i.e., inner, ball, outer). The target domain is considered at different loads with the same diameter, at different diameters, and on a different machine, as given in Table I and described in Cases 1, 2, and 3. The network architecture chosen for the DNN is 100-50-40-20, where 100 is the input layer, 50 is the first hidden layer, 40 is the second hidden layer, and 20 is the last hidden layer, with regularization parameters $\lambda$ and $\beta$. The sparsity parameter is varied over {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}, and the result is reported for the best sparsity parameter. In DNN with mRMR, the features are first selected with minimum redundancy, and then a DNN is formed on the selected features. The transformed features at the last layer of DNN and DNN with mRMR are applied to the softmax classifier to identify the health state of the rotating machines.
In DTL, the DNN weight parameters pre-trained on the source data are used as the initial weights of the target network. The target network is fine-tuned on the target labels, and the fine-tuned network is validated on the unseen target data. Similarly, in DTL with mRMR, the weight parameters of the DNN with mRMR pre-trained on the source data are used as the initial weights of the target network. So, in the transfer learning model, no pre-training is required, which reduces the training time of the model. The performance of the models, i.e., DNN with mRMR and DTL with mRMR, is compared in terms of accuracy and t-SNE plots. The t-SNE plots of the features extracted using the proposed models show better separability of the data points, as shown in Figure 6, which indicates that the extracted features are more informative in comparison with DNN and DNN with mRMR. The features extracted using the proposed models are applied to the softmax classifier to interpret the performance in terms of accuracy values. As given in Tables III, IV, and V, the prediction accuracy on the unseen data is better or comparable. The confusion matrix charts between the actual and predicted health states are also presented in Figures 7 and 8 for DNN with mRMR and DTL with mRMR to describe the performance of the classification model.
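The t-SNE visualization itself is a few lines of scikit-learn; the sketch below assumes `features` holds the last-hidden-layer activations and `labels` the health-state labels, both of which are placeholders here:

```python
# t-SNE projection of learned features, as used for the separability plots.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.random.randn(400, 20)     # placeholder: last-layer activations
labels = np.random.randint(0, 4, 400)   # placeholder: four health states

emb = TSNE(n_components=2, random_state=0).fit_transform(features)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=8)
plt.title("t-SNE of last-layer features")
plt.show()
```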
Comparison with Traditional State-of-the-art Feature Selection/Extraction Methods
In this subsection, the performance of the proposed frameworks is compared with traditional methods, i.e., PCA [c10], mRMR [mRMR], and support vector machine recursive feature elimination (SVM-RFE) [svm-rfe], on binary and multi-class classification problems. In PCA, features are extracted in order of highest to lowest eigenvalues; in mRMR, features are selected as explained in subsection II-A and described in Algorithm 1; and in SVM-RFE, the features are selected based on a ranking in which the top-ranked features are chosen. In all these state-of-the-art methods, the same number of features is chosen as in the proposed frameworks. As shown in Tables III, IV, and V, the performance of the proposed frameworks is better in terms of accuracy values. The t-SNE plots in Figure 6 also reveal that the feature representation of the proposed approach is better in comparison with the state-of-the-art methods. The execution time of the proposed approaches is also compared with these methods, as shown in Table VI. The execution time of DNN with mRMR is lower than that of DNN, whereas for DTL and DTL with mRMR the execution time is reduced much further because, in the case of DTL, no pre-training of the network is required from scratch, as described in the proposed approach.
Table VI: Execution time comparison of the methods.

| S. No. | Method | Execution time |
|---|---|---|
| 1 | PCA + Softmax Classifier | |
| 2 | SVM-RFE + Softmax Classifier | |
| 3 | mRMR + Softmax Classifier | |
| 4 | DNN + Softmax Classifier | |
| 5 | DNN-mRMR + Softmax Classifier | |
| 6 | DTL + Softmax Classifier | |
| 7 | DTL-mRMR + Softmax Classifier | |
Figure 7: Confusion matrix chart for DNN with mRMR (actual vs. predicted health states).

Figure 8: Confusion matrix chart for DTL with mRMR (actual vs. predicted health states).
V Conclusion
This paper presents a new framework for intelligent machine fault diagnosis. The major contributions of this paper are to minimize the effect of redundant features in the dataset and to transfer knowledge to a different domain: in the presence of redundant features, different initial conditions will lead to different performance, while knowledge transfer helps improve performance in the target domain with fewer target samples. To this end, mRMR feature selection is utilized to remove the redundant features, the data with reduced redundant features are used for training the DNN, and the model pre-trained on one running condition is fine-tuned on the target task. The proposed frameworks are validated on the well-known motor bearing datasets from the CWRU and IMS data centers. The results in terms of accuracy values show that the proposed frameworks are an effective tool for machine fault diagnosis. The effectiveness is also represented in terms of the t-SNE plots and confusion matrix charts, as shown in Figures 6, 7, and 8. In the future, the proposed frameworks could be helpful in application areas such as bioinformatics, where the number of features is very large in comparison with the number of samples; the proposed approach would help reduce the non-informative features in such data and improve performance.
References