Predicting Remaining Useful Life using Time Series Embeddings based on Recurrent Neural Networks
(Copyright 2017 Tata Consultancy Services Ltd.)
Abstract.
We consider the problem of estimating the remaining useful life (RUL) of a system or a machine from sensor data. Many approaches for RUL estimation based on sensor data make assumptions about how machines degrade. Additionally, sensor data from machines is noisy and often suffers from missing values in many practical settings. We propose EmbedRUL: a novel approach for RUL estimation from sensor data that does not rely on any degradation-trend assumptions, is robust to noise, and handles missing values. EmbedRUL utilizes a sequence-to-sequence model based on Recurrent Neural Networks (RNNs) to generate embeddings for multivariate time series subsequences. The embeddings for normal and degraded machines tend to be different, and are therefore found to be useful for RUL estimation. We show that the embeddings capture the overall pattern in the time series while filtering out the noise, so that the embeddings of two machines with similar operational behavior are close to each other, even when their sensor readings have significant and varying levels of noise content. We perform experiments on a publicly available turbofan engine dataset and a proprietary real-world dataset, and demonstrate that EmbedRUL outperforms the previously reported (Malhotra et al., 2016b) state-of-the-art on several metrics.
1. Introduction
It is quite common in the current era of the ‘Industrial Internet of Things’ (Da Xu et al., 2014) for a large number of sensors to be installed for monitoring the operational behavior of machines. Consequently, there is considerable interest in exploiting data from such sensors for health monitoring tasks such as anomaly detection, fault detection, as well as prognostics, i.e., estimating remaining useful life (RUL) of machines in the field.
We highlight some of the practical challenges in using data-driven approaches for health monitoring and RUL estimation, and propose an approach that can handle these challenges:
1) Health degradation trend: In complex machines with several components, it is difficult to build physics-based models for health degradation analysis. Many data-driven approaches assume a degradation trend, e.g., exponential degradation (Croarkin and Tobias, 2006; Saxena et al., 2008b; Ramasso, 2014; Camci et al., 2016; Wang et al., 2008). This is particularly useful in cases where there is no explicit measurable parameter of the health of a machine. Such an assumption may not hold in other scenarios, e.g., when a component in a machine is approaching failure, the symptoms in the sensor data may initially be intermittent and then grow over time in a non-exponential manner.
2) Noisy sensor readings: Sensor readings often suffer from varying levels of environmental noise, which entails the use of denoising techniques. The amount of noise may even vary across sensors.
3) Partial unavailability of sensor data: Sensor data may be partially unavailable due to several reasons such as network communication loss and damaged or faulty sensors.
4) Complex temporal dependencies between sensors: Multiple components interact with each other in a complex way leading to complex dependencies between sensor readings.
For example, a change in one sensor may lead to a change in another sensor after a delay of a few seconds or even hours.
It is desirable to have an approach that can capture the complex operational behavior of machine(s) from sensor readings while accounting for temporal dependencies.
In this paper, we propose EmbedRUL: an approach for RUL estimation using Recurrent Neural Networks (RNNs) to address the above challenges. An RNN is used as an encoder to obtain a fixed-dimensional representation that serves as an embedding for multi-sensor time series data. The health of a machine at any point in time can be estimated by comparing an embedding computed from recent sensor history with representative embeddings computed for periods of normal behavior. Our approach for RUL estimation does not rely on degradation trend assumptions, can handle noise and missing values, and can capture complex temporal dependencies among the sensors. The key contributions of this work are:

We show that time series embeddings or representations obtained using an RNN Encoder are useful for RUL estimation (refer Section 5.2).

We show that embeddings are robust and perform well for the RUL estimation task even under noisy conditions, i.e., when sensor readings are noisy (refer Section 5.3).
The rest of the paper is organized as follows: we review related work in Section 2; Section 3 motivates our approach and briefly introduces existing RNN-based approaches for machine health monitoring and RUL estimation using sensor data; Section 4 explains our proposed approach for RUL estimation; Section 5 provides experimental details and observations; and Section 6 concludes.
2. Related Work
Data-driven approaches for RUL estimation: Several approaches for RUL estimation based on sensor data have been proposed; a review of these approaches can be found in (Si et al., 2011). (Eker et al., 2014; Khelif, Malinowski, Chebel-Morello, and Zerhouni) propose estimating RUL directly by calculating the similarity between the sensors without deriving any health estimates. Similarly, Support Vector Regression (Khelif et al., 2017), RNNs (Heimes, 2008), and Deep Convolutional Neural Networks (Babu et al., 2016) have been proposed to estimate the RUL directly by modeling the relations among the sensors without estimating the health of the machines. However, unlike EmbedRUL, none of these approaches focus on robust RUL estimation, and in particular, on robustness to noise.
Robust RUL Estimation: Wavelet filters have been proposed to handle noise for robust performance degradation assessment in (Qiu et al., 2003). In (Hu et al., 2012), an ensemble of models is used to ensure that predictions are robust. Our proposed approach handles noise in sensor readings by learning robust representations from sensor data via RNN Encoder-Decoder (RNNED) models.
Time series representation learning: Unsupervised representation learning for sequences using RNNs has been proposed for applications in various domains including text, video, speech, and time series (e.g., sensor data). Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) based encoders trained using the encoder-decoder framework have been proposed to learn representations of video sequences (Srivastava et al., 2015). Pre-trained LSTM encoders based on autoencoders have been used to initialize networks for classification tasks and shown to achieve improved performance (Dai and Le, 2015) for text applications. A Gated Recurrent Unit (GRU) (Cho et al., 2014) based encoder named TimeNet (Malhotra et al., 2017) has recently been proposed to obtain embeddings for time series from several domains; the embeddings are shown to be effective for time series classification tasks. Stacked denoising autoencoders have been used to learn hierarchical features from sensor data in (Yan and Yu, 2015), and these features are shown to be useful for anomaly detection. However, to the best of our knowledge, the proposed EmbedRUL is the first attempt at using RNN-based embeddings of multivariate sensor data for machine health monitoring, and more specifically, for RUL estimation.
Other Deep learning models for Machine Health Monitoring: Various architectures based on Restricted Boltzmann Machines, RNNs (discussed in Section 3.2), and Convolutional Neural Networks have been proposed for machine health monitoring in different contexts. Many of these architectures and applications of deep learning to machine health monitoring are surveyed in (Zhao et al., 2016b). An end-to-end convolutional selective autoencoder for early detection and monitoring of combustion instabilities in high-speed flame video frames was proposed in (Akintayo et al., 2016). A combination of deep learning and survival analysis for asset health management has been proposed in (Liao and Ahn, 2016), stacking an LSTM layer, a feed-forward layer, and a survival model layer to arrive at the asset failure probability. Deep belief networks and autoencoders have been used for health monitoring of aerospace and building systems in (Reddy et al., 2016). However, none of these approaches are proposed for RUL estimation. Predicting milling machine tool wear from sensor data using deep LSTM networks has been proposed in (Zhao et al., 2016a). In (Zhao et al., 2017), a convolutional bidirectional LSTM network with fully connected layers on top is shown to predict tool wear; the convolutional layer extracts robust local features while the LSTM layer encodes temporal information. These methods model degradation estimation in a supervised manner, unlike our approach of estimating machine health using embeddings generated by seq2seq models.
3. Background
Many data-driven approaches attempt to estimate the health of a machine from sensor data in terms of a health index (HI) (e.g., (Wang et al., 2008; Ramasso, 2014)). The trend of HI over time, referred to as the HI curve, is then used to estimate the RUL by comparing it with the trends of failed instances. The HI curve for a test instance is compared with the HI curve of a failed (train) instance to estimate the RUL of the test instance, as shown in Figure 1. In general, the HI curve of the test instance is compared with the HI curves of several failed instances, and a weighted average of the RUL estimates obtained from the failed instances is used as the final RUL estimate (refer Section 4.3 for details).
In Section 3.1, we introduce a simple approach for HI estimation that maps the current sensor readings to HI. Next, we introduce existing HI estimation techniques that leverage RNNs to capture the temporal patterns in sensor readings, and provide a motivation for our approach in Section 3.2.
3.1. Degradation trend assumption based HI estimation
Consider an HI curve H = {h_1, h_2, ..., h_T}, where h_t ∈ [0, 1] for t = 1, ..., T. When a machine is healthy, h_t = 1, and when a machine is near failure or about to fail, h_t = 0. The multi-sensor readings x_t at time t can be used to obtain an estimate ĥ_t of the actual HI value h_t. One way of obtaining this mapping is via a linear regression model: ĥ_t = θ^T x_t + θ_0, where θ ∈ R^p and θ_0 ∈ R. The parameters θ and θ_0 are estimated by minimizing Σ_t (h_t − ĥ_t)², where the target HI curve h_t can be assumed to follow an exponential degradation trend (e.g., (Wang et al., 2008)).
Once the mapping is learned, the sensor readings at a time instant can be used to obtain the HI. Such approaches have two shortcomings: i) they rely on an assumption about the degradation trend, and ii) they do not take into account the temporal aspect of the sensor data. We show that the target HI curve for learning such a mapping (i.e., learning the parameters θ and θ_0) can be obtained using RNN models instead of relying on the exponential assumption (refer Section 5 for details).
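As a concrete illustration of this baseline, the following sketch (our own toy example, not part of the original experiments) fits the linear map ĥ_t = θ^T x_t + θ_0 against an assumed exponential degradation target using ordinary least squares on synthetic data; the decay rate 5.0 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: T readings of p sensors from one run-to-failure instance.
T, p = 200, 3
X = rng.normal(size=(T, p))

# Assumed exponential degradation target: health starts near 1, decays toward 0.
t = np.arange(T)
h_target = np.exp(-5.0 * t / T)

# Fit h_t ~ theta^T x_t + theta_0 by least squares (bias via an appended column of ones).
A = np.hstack([X, np.ones((T, 1))])
coef, *_ = np.linalg.lstsq(A, h_target, rcond=None)
theta, theta0 = coef[:p], coef[p]

h_hat = X @ theta + theta0   # estimated HI curve
```

With a bias term included, the fitted curve can do no worse (in squared error) than predicting the mean of the target.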
3.2. RNNs for Machine Health Monitoring
RNNs, especially those based on LSTM units or GRUs, have been successfully used to achieve state-of-the-art results on sequence modeling tasks such as machine translation (Cho et al., 2014) and speech recognition (Graves et al., 2013). Recently, deep RNNs have been shown to be useful for health monitoring from multi-sensor time series data (Malhotra et al., 2015; Malhotra et al., 2016a; Filonov et al., 2016). The key idea behind using RNNs for health monitoring is to learn a temporal model of the system by capturing the complex temporal as well as instantaneous dependencies between sensor readings.
Autoencoders have been used to discover interesting structures in the data by means of regularization such as by adding constraints on the number of hidden units of the autoencoder (Ng, 2011), or by adding noise to the input and training the network to reconstruct a denoised version of the input (Vincent et al., 2008). The key idea behind such autoencoders is that the hidden representation obtained for an input retains the underlying important pattern(s) in the input and ignores the noise component.
RNN autoencoders have been shown to be useful for RUL estimation (Malhotra et al., 2016b): the RNN-based model learns to capture the behavior of a machine by learning to reconstruct multivariate time series corresponding to normal behavior in an unsupervised manner. Since the network is trained only on time series corresponding to normal behavior, it is expected to reconstruct normal behavior well and to perform poorly when reconstructing abnormal behavior. This results in a small reconstruction error for normal time series and a large reconstruction error for abnormal time series. The reconstruction error is then used as a proxy for the health or degree of degradation, and in turn to estimate the RUL of the machine. We refer to this reconstruction error based approach for RUL estimation as ReconRUL.
We propose to learn robust fixed-dimensional representations for multi-sensor time series data via sequence-to-sequence (Sutskever et al., 2014; Bahdanau et al., 2014) autoencoders based on RNNs. Here we briefly introduce multi-layered RNNs based on GRUs that serve as the building blocks of sequence-to-sequence autoencoders (refer Section 4 for details).
3.2.1. Multi-layered RNN with Dropout
We use Gated Recurrent Units (Cho et al., 2014) in the hidden layers of the sequence-to-sequence autoencoder. Dropout is used for regularization (Pham et al., 2014; Srivastava et al., 2014) and is applied only to the non-recurrent connections, ensuring information flow across time-steps. For a multi-layered RNN with L hidden layers, the hidden state h_t^l at time t for hidden layer l is obtained from h_t^{l-1} and h_{t-1}^l as in Equation 1. The time series goes through the following transformations iteratively for t = 1 through T, where T is the length of the time series:

(1)
z_t^l = σ(W_z · [D(h_t^{l-1}); h_{t-1}^l])
r_t^l = σ(W_r · [D(h_t^{l-1}); h_{t-1}^l])
h̃_t^l = tanh(W · [D(h_t^{l-1}); r_t^l ⊙ h_{t-1}^l])
h_t^l = (1 − z_t^l) ⊙ h_{t-1}^l + z_t^l ⊙ h̃_t^l

where ⊙ is the Hadamard product, [a; b] is the concatenation of vectors a and b, D(·) is the dropout operator that randomly sets the dimensions of its argument to zero with probability equal to the dropout rate, and h_t^0 equals the input x_t at time t. W_z, W_r, and W are weight matrices of appropriate dimensions s.t. z_t^l, r_t^l, h̃_t^l, and h_t^l are vectors in R^{c_l}, where c_l is the number of units in layer l. The sigmoid (σ) and tanh activation functions are applied element-wise.
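The GRU update of Equation 1 can be sketched in a few lines of NumPy. This is an illustrative toy implementation (the dimensions, weight initialization, and inverted-dropout scaling are our own choices), not the TensorFlow code used in the paper:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(h_below, h_prev, Wz, Wr, W, drop_rate=0.0, rng=None):
    """One GRU update: h_below is h_t^{l-1} (the layer below, i.e. x_t for l=1),
    h_prev is h_{t-1}^l. Dropout D(.) is applied only to the non-recurrent
    input h_below, never to the recurrent state h_prev."""
    if rng is not None and drop_rate > 0.0:
        mask = (rng.random(h_below.shape) >= drop_rate) / (1.0 - drop_rate)
        h_below = h_below * mask                 # D(h_t^{l-1})
    xz = np.concatenate([h_below, h_prev])       # [D(h_t^{l-1}); h_{t-1}^l]
    z = sigmoid(Wz @ xz)                         # update gate
    r = sigmoid(Wr @ xz)                         # reset gate
    h_tilde = np.tanh(W @ np.concatenate([h_below, r * h_prev]))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde      # new hidden state h_t^l

# Toy dimensions: p = 4 inputs, c = 5 units in the layer.
rng = np.random.default_rng(0)
p, c = 4, 5
Wz = rng.normal(scale=0.1, size=(c, p + c))
Wr = rng.normal(scale=0.1, size=(c, p + c))
W  = rng.normal(scale=0.1, size=(c, p + c))

h = np.zeros(c)
for x in rng.normal(size=(10, p)):               # iterate over a length-10 series
    h = gru_step(x, h, Wz, Wr, W, drop_rate=0.2, rng=rng)
```

Because h_t is a convex combination of h_{t-1} and a tanh output, every hidden-state coordinate stays in (−1, 1).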
4. RUL Estimation using Embeddings
We consider a scenario where sensor readings over the operational life of one or multiple instances of a machine or a component are available. We denote the set of available instances by I. For an instance u, we consider a multi-sensor time series X^{(u)} = {x_1^{(u)}, x_2^{(u)}, ..., x_{T^{(u)}}^{(u)}}, where T^{(u)} is the length of the time series and x_t^{(u)} is a p-dimensional vector corresponding to the readings of the p sensors at time t. For a failed instance u, the length T^{(u)} corresponds to the total operational life (from start to end of life), while for a currently operating instance the length corresponds to the elapsed operational life till the latest available sensor reading.
Typically, if T^{(u)} is large, we divide the time series into windows (subsequences) of fixed length l. We denote a time series window from time t1 to t2 for instance u by Z^{(u)}(t1, t2). A fixed-dimensional representation or embedding for each such window is obtained using an RNN Encoder that is trained in an unsupervised manner using RNNED. We train RNNED using time series subsequences from the entire operational life of machines, including normal as well as faulty operation (unlike the proposed approach, ReconRUL (Malhotra et al., 2016b) uses time series subsequences only from the normal operation of the machine). We use the embedding for a window to estimate the health of the instance at the end of that window. The RNN Encoder is likely to retain the important characteristics of machine behavior in the embeddings, and can therefore discriminate between embeddings of windows corresponding to degraded behavior and those corresponding to normal behavior. We describe how these embeddings are obtained in Section 4.1, and then describe how health index curves and RUL estimates can be obtained using the embeddings in Sections 4.2 and 4.3, respectively. Figure 2 provides an overview of the steps involved in the proposed approach for RUL estimation.
4.1. Obtaining Embeddings using RNN Encoder-Decoder
We briefly introduce RNN Encoder-Decoder (RNNED) networks based on the sequence-to-sequence (seq2seq) learning framework. In general, a seq2seq model consists of a pair of multi-layered RNNs trained together: an encoder RNN and a decoder RNN. Figure 3 shows the workings of the encoder-decoder pair for a sample time series Z^{(u)}(t1, t2). Given an input time series, the encoder RNN iterates through the points in the time series to compute the final hidden state z_E, given by the concatenation of the final hidden state vectors from all the layers in the encoder, s.t. z_E = [h_E^1; h_E^2; ...; h_E^L], where h_E^l is the final hidden state vector of the l-th layer of the encoder. The total number of recurrent units in the encoder is given by c = Σ_{l=1}^{L} c_l, s.t. z_E ∈ R^c (refer Section 3.2.1).
The decoder RNN has the same network structure as the encoder; it uses the final hidden state z_E as its initial hidden state and iteratively (for t2 − t1 + 1 steps) goes through the transformations in Equation 1 (followed by a linear output layer) to reconstruct the input time series. The overall process can be thought of as a non-linear mapping of the input multivariate time series to a fixed-dimensional vector representation (embedding) via an encoder function f_enc, followed by another non-linear mapping of the fixed-dimensional vector to a multivariate time series via a decoder function f_dec:
(2)
z_E = f_enc(Z^{(u)}(t1, t2)),    Z'^{(u)}(t1, t2) = f_dec(z_E)

where Z'^{(u)}(t1, t2) denotes the reconstructed time series. The reconstruction error at any point x_t in Z^{(u)}(t1, t2) is e_t = x_t − x'_t, where x'_t is the corresponding point in Z'^{(u)}(t1, t2). The overall reconstruction error for the input time series window is given by E = Σ_{t=t1}^{t2} ||e_t||_2^2, and the RNNED is trained to minimize the loss function given by this squared reconstruction error summed over all training windows.
Typically, along with the final hidden state, an additional input is given to the decoder RNN at each time step. This input is the output of the decoder RNN at the previous time step, as used in (Malhotra et al., 2016b). We, however, do not give any such additional inputs to the decoder along with the final hidden state of encoder. This ensures that the final hidden state of encoder retains all the information required to reconstruct the time series back via the decoder RNN. This approach of learning robust embeddings or representations for time series has been shown to be useful for time series classification in (Malhotra et al., 2017). Figure 4 shows a typical example of input and output from RNNED, where the smoothed reconstruction suggests that the embeddings capture the necessary pattern in the input and remove noise.
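Putting the pieces together, a minimal single-layer forward-pass sketch of the encoder-decoder is shown below. It is deliberately simplified relative to the paper's setup: one GRU layer instead of several, random untrained weights, the decoder fed only via its initial hidden state (zeros at every step, matching the no-extra-inputs choice described above), and no training loop (in practice the weights would be learned by minimizing the squared reconstruction error with gradient descent). All names and dimensions are our own:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru(x, h, Wz, Wr, W):
    # Single GRU step over concatenated [input; previous hidden state].
    xz = np.concatenate([x, h])
    z, r = sigmoid(Wz @ xz), sigmoid(Wr @ xz)
    h_tilde = np.tanh(W @ np.concatenate([x, r * h]))
    return (1.0 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
p, c, T = 3, 8, 12                       # sensors, hidden units, window length

def init_weights(in_dim):
    return tuple(rng.normal(scale=0.1, size=(c, in_dim + c)) for _ in range(3))

enc_W, dec_W = init_weights(p), init_weights(p)   # encoder / decoder GRU weights
V = rng.normal(scale=0.1, size=(p, c))            # linear output layer of the decoder

def embed(window):
    """f_enc: map a (T, p) window to its fixed-dimensional embedding z_E."""
    h = np.zeros(c)
    for x_t in window:
        h = gru(x_t, h, *enc_W)
    return h                              # final hidden state = embedding

def reconstruct(z_E, steps):
    """f_dec: the decoder starts from z_E and receives no other inputs,
    so we feed zeros at each step and apply the linear output layer."""
    h, out = z_E, []
    for _ in range(steps):
        h = gru(np.zeros(p), h, *dec_W)
        out.append(V @ h)
    return np.array(out)

Z = rng.normal(size=(T, p))               # one multi-sensor window
z_E = embed(Z)
Z_hat = reconstruct(z_E, T)
err = np.sum((Z_hat - Z) ** 2)            # squared reconstruction loss for this window
```

Since the decoder sees nothing but z_E, the embedding must carry all information needed for reconstruction, which is exactly the property the paper relies on.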
4.1.1. Handling missing values
In real-world data, the sensor readings tend to be intermittently missing. We include masking and delta vectors as additional inputs to the RNNED at each time instant t (as in (Che et al., 2016)). The masking vector helps to identify the sensors that are missing at time t, and the delta vector indicates the time elapsed till t from the most recent non-missing values for the sensors in the past. For simplicity, we omit the superscript (u) denoting an instance of the machine from the notation of the masking and delta vectors defined below.

Masking vector (m_t) denotes the missing sensors at time t, with m_t ∈ {0, 1}^p, where p is the number of sensors. The i-th element of vector m_t is given by:

(3)
m_t[i] = 1 if x_t[i] is available, and m_t[i] = 0 otherwise

where i = 1, ..., p, and x_t[i] denotes the i-th element of vector x_t. When m_t[i] = 0, we set x_t[i] to 0 or to the average value for sensor i (we use 0 for the experiments in Section 5).

Delta vector (δ_t) indicates the time elapsed till time t from the most recent non-missing values for the sensors in the past, with δ_t ∈ R^p. The i-th element of vector δ_t is given by:

(4)
δ_t[i] = s_t − s_{t'},  where t' = max{k : k < t, m_k[i] = 1},  and δ_1[i] = 0

where s_t is the time elapsed from the start when reading x_t is available (with t' = 1 if sensor i has not been observed before time t). It is to be noted that the sensor readings may not be available at regular time intervals. Therefore, the sequence of readings is indexed by time t, while the actual timestamps are denoted by s_t.
The masking and delta vectors are given as additional inputs to the RNNED but are not reconstructed, s.t. only the actual sensors are reconstructed. Therefore, the modified input at time t is the concatenation [x_t; m_t; δ_t], while the corresponding target to be reconstructed is x_t alone. The loss function of the RNNED is also modified accordingly, so that the model is not penalized for reconstructing the missing sensors incorrectly. The contribution of a time series subsequence to the loss function is thus given by Σ_t ||(x'_t − x_t) ⊙ m_t||_2^2. In effect, the network focuses on reconstructing the available sensor readings only.
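The masking vector, delta vector, and masked loss described above can be sketched as follows; the helper names and the NaN encoding of missing readings are our own conventions:

```python
import numpy as np

def mask_and_delta(X, timestamps):
    """Build masking vectors m_t and delta vectors d_t for a (T, p) series X
    in which NaN marks a missing reading; timestamps s_t may be irregular."""
    T, p = X.shape
    m = (~np.isnan(X)).astype(float)            # 1 where the sensor is observed
    d = np.zeros((T, p))
    last_seen = np.full(p, timestamps[0])       # most recent observation time per sensor
    for t in range(T):
        d[t] = timestamps[t] - last_seen        # elapsed time since the last reading (0 at t=0)
        last_seen[m[t] == 1] = timestamps[t]
    X_filled = np.where(np.isnan(X), 0.0, X)    # missing readings set to 0 (as in Section 5)
    return X_filled, m, d

def masked_loss(X_hat, X_filled, m):
    # Only observed readings contribute to the reconstruction loss.
    return np.sum(((X_hat - X_filled) * m) ** 2)

X = np.array([[1.0, np.nan],
              [np.nan, 2.0],
              [3.0, np.nan],
              [4.0, 5.0]])
s = np.array([0.0, 1.0, 3.0, 4.0])              # irregular timestamps
Xf, m, d = mask_and_delta(X, s)
```

For sensor 0 (observed at t = 0, 2, 3), the delta at t = 2 is 3.0: the full gap since its last reading at timestamp 0.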
4.2. Obtaining HI Curves using Embeddings
Here we describe how the embeddings of time series subsequences are utilized to estimate the health of machines. Since the RNN Encoder captures the important patterns in the input time series subsequences, the embeddings thus obtained can be used to differentiate between normal and degraded regions in the data. We maintain a set of embeddings E_N corresponding to the time series subsequences from the normal behavior of all the train instances. As a machine operates, its health degrades over time and the corresponding subsequence embeddings tend to be different from those in E_N. So, we estimate the HI h_t for the subsequence with embedding z_t as follows:

(5)
h_t = min_{z ∈ E_N} ||z_t − z||_2
The HI curve for an instance u, obtained from the HI estimates at each time t, is denoted by H^{(u)}. Like the set of normal embeddings E_N, we also maintain a set containing the HI curves of all the failed (train) instances.
It is to be noted that HI values are usually assumed to lie between 0 and 1, where 0 means very poor health and 1 means perfect normal health (as shown in Figure 1). The HI as defined in Equation 5 follows the inverse convention, i.e., it is low for normal health and high for poor health (as shown in Figure 9). This can be easily transformed to adhere to the standard 0-1 range through a suitable normalization/scaling procedure if required, as used in (Malhotra et al., 2016b).
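A minimal sketch of the HI computation in Equation 5, assuming the embeddings are plain NumPy vectors and the normal set E_N is stored as a matrix with one normal-window embedding per row (the data here is synthetic):

```python
import numpy as np

def health_index(z_t, normal_embeddings):
    """HI of a window: Euclidean distance from its embedding z_t to the
    closest embedding in the normal set E_N. Low for healthy windows,
    high for degraded ones (the inverse convention noted in the text)."""
    diffs = normal_embeddings - z_t
    return float(np.min(np.linalg.norm(diffs, axis=1)))

rng = np.random.default_rng(0)
E_N = rng.normal(size=(50, 8))                   # embeddings of 50 normal windows
z_normal = E_N[0] + 0.01 * rng.normal(size=8)    # close to a known normal window
z_degraded = E_N[0] + 5.0                        # drifted, degraded-looking embedding

hi_n = health_index(z_normal, E_N)
hi_d = health_index(z_degraded, E_N)
```

A window whose embedding drifts away from every normal embedding receives a high HI, flagging degradation.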
4.3. RUL Estimation using HI Curves
We use the same approach for estimating RUL from the HI curve as in (Malhotra et al., 2016b), and present it here for the sake of completeness. To estimate the RUL for a test instance u*, its HI curve is compared with the HI curves of the failed (train) instances. The initial health of a train instance and a test instance need not be the same; we therefore allow for a time-lag t when comparing the HI curve of the test instance with that of a train instance.
The similarity between the HI curves of the test instance u* and a train instance u for a time-lag t is given by:

(6)
s(u*, u, t) = exp(−d²(u*, u, t) / λ),   where
d²(u*, u, t) = (1 / T^{(u*)}) Σ_{i=1}^{T^{(u*)}} (h_i^{(u*)} − h_{i+t}^{(u)})²

for t ∈ {1, 2, ..., τ}, i + t ≤ T^{(u)}, and λ > 0. Here, τ is the maximum allowed time-lag, and λ controls the notion of similarity: a small value of λ yields a high similarity s only when the difference between the HI curves is very small. The RUL estimate for u* based on the HI curve of u and time-lag t is given by R̂^{(u,t)}(u*) = T^{(u)} − t − T^{(u*)}.
A weighted average of the RUL estimates obtained using all combinations of u and t is used as the final estimate R̂(u*):

(7)
R̂(u*) = ( Σ_{(u,t)} s(u*, u, t) · R̂^{(u,t)}(u*) ) / ( Σ_{(u,t)} s(u*, u, t) )

where the summation is over only those combinations of u and t which satisfy s(u*, u, t) ≥ α · s_max, with s_max = max_{u,t} s(u*, u, t) and α ∈ [0, 1).
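The matching procedure of Equations 6-7 can be sketched as below. The function and parameter names are our own; we also allow a zero time-lag for simplicity and use the mean of squared HI differences as d², consistent with the normalization above:

```python
import numpy as np

def rul_estimate(h_test, train_curves, train_lives, tau=10, lam=0.001, alpha=0.5):
    """Similarity-weighted RUL: slide the test HI curve over each failed-instance
    HI curve with time-lag t in {0..tau}; similarity s = exp(-d^2/lam); average
    the RUL estimates of sufficiently similar (s >= alpha * s_max) matches."""
    Ts = len(h_test)
    sims, ruls = [], []
    for h_tr, life in zip(train_curves, train_lives):
        for t in range(tau + 1):
            if t + Ts > len(h_tr):
                break
            d2 = np.mean((h_test - h_tr[t:t + Ts]) ** 2)
            sims.append(np.exp(-d2 / lam))
            ruls.append(life - (t + Ts))   # remaining life of train instance past the matched span
    sims, ruls = np.array(sims), np.array(ruls)
    keep = sims >= alpha * sims.max()      # discard weakly similar combinations
    return float(np.sum(sims[keep] * ruls[keep]) / np.sum(sims[keep]))

# Toy check: the test curve is a prefix of a failed instance's HI curve,
# so the true remaining life is 100 - 60 = 40.
h_train = np.linspace(0.0, 1.0, 100)       # one failed instance, life = 100
est = rul_estimate(h_train[:60], [h_train], [100])
```

The lag-0 match is exact (similarity 1); nearby lags contribute slightly smaller estimates, so the weighted average lands a little below the true value of 40.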
5. Experimental Evaluation
We evaluate our proposed approach for RUL estimation on two datasets: i) the publicly available C-MAPSS turbofan engine dataset (Saxena and Goebel, 2008), and ii) a proprietary real-world pump dataset. We use the TensorFlow (Abadi et al., 2016) library for implementing the various RNN models. We present the details of the datasets in Section 5.1. In Section 5.2, we show that the results for embedding distance based approaches for RUL estimation compare favorably to the previously reported results using reconstruction error based approaches (Malhotra et al., 2016b) on the engine dataset, as well as on the real-world pump dataset. Further, we evaluate the robustness of the embedding distance and reconstruction error based approaches by measuring the effect of additive random Gaussian noise in the sensor readings on RUL estimation in Section 5.3.
5.1. Datasets Description
5.1.1. Engine dataset
We use the first of the four simulated turbofan engine datasets from the NASA Ames Prognostics Data Repository (Saxena and Goebel, 2008). This dataset contains time series of readings for 24 sensors for 100 train instances (train_FD001.txt) of a turbofan engine from the beginning of usage till end of life. There are 100 test instances for which the time series are pruned some time prior to failure, s.t. the instances are currently operational and their RUL needs to be estimated (test_FD001.txt). The actual RULs for the test instances are available in RUL_FD001.txt. Noticeably, each engine instance has a different initial degree of wear, such that the initial HI of each instance is likely to be different (implying the potential usefulness of the time-lag introduced in Section 4.3).
We randomly select 80 train instances to train the models. The remaining 20 instances from the train set are used as a validation set to select the parameters. The trajectories for these 20 engines are randomly truncated at five different locations to obtain five different instances from each instance for the RUL estimation task. We use Principal Component Analysis (PCA) (Jolliffe, 2002) to reduce the dimensionality of the data and select the number of principal components to be used based on the validation set.
5.1.2. Pump dataset
This dataset contains hourly sensor readings for 38 pumps that have reached end of life and 24 pumps that are currently operational. The readings span a period of 2.5 years, with 7 sensors installed on each pump. The 38 failed instances are randomly split into training, validation, and test sets with 70%, 15%, and 15% of the instances, respectively. The 24 operational instances are added to the training and validation sets only for obtaining the RNNED model (they are not used for the RUL comparison, as their actual RULs are not known). The data is notably sparse, with over 45% missing values across sensors. Also, for most pumps the sensor readings are available not from the date of installation but only a few months (on average 3.5 months) after it. Depending on the time elapsed, the health degradation level when sensor data first becomes available varies significantly across pumps. The total operational life of the pumps varies from a minimum of 57 days to a maximum of 726 days.
We downsample the time series data from the original one reading per hour to one reading per day. To do this, we use the following four statistics for each sensor over a day as derived sensors: minimum, maximum, average, and standard deviation, such that there are 28 (= 7 × 4) derived sensors for each day. Using the derived sensors also helps take care of missing values, which reduce from 45% for the hourly sampling rate to 33% for the daily sampling rate. We use masking and delta vectors as additional inputs in this case to train the RNNED models as described in Section 4.1.1, s.t. the final input dimension is 42 (28 for the derived sensors, and 7 each for the masking and delta vectors). Unlike the engine dataset, where RUL is estimated only at the last available reading for each test instance, here we estimate RUL on every third day of operation for each test instance.
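A sketch of the daily downsampling step, assuming missing hourly readings are encoded as NaN; the function name and the layout of the 28 derived sensors (four consecutive statistics per original sensor) are our own choices:

```python
import numpy as np

def daily_features(hourly, hours_per_day=24):
    """Downsample an (H, p) hourly series (NaN = missing) to one reading per day,
    using the per-day min, max, mean, and std of each sensor as 4*p derived sensors.
    A sensor-day with no readings at all stays NaN (i.e., missing)."""
    H, p = hourly.shape
    days = H // hours_per_day
    out = np.full((days, 4 * p), np.nan)
    for day in range(days):
        chunk = hourly[day * hours_per_day:(day + 1) * hours_per_day]
        for j in range(p):
            col = chunk[:, j]
            col = col[~np.isnan(col)]               # keep only observed readings
            if col.size:
                out[day, 4 * j:4 * j + 4] = [col.min(), col.max(),
                                             col.mean(), col.std()]
    return out

rng = np.random.default_rng(0)
hourly = rng.normal(size=(48, 7))                   # two days of 7-sensor data
hourly[rng.random(hourly.shape) < 0.45] = np.nan    # ~45% missing, as in the dataset
daily = daily_features(hourly)                      # shape (2, 28)
```

Because a day is missing only when all 24 hourly readings of a sensor are missing, the daily series has a much lower missing-value rate than the hourly one.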
A description of the performance metrics used for evaluation (taken from (Saxena et al., 2008a)) is provided in Appendix A. The hyperparameters of our model to be tuned are: the number of principal components, the number of hidden layers for RNNED, the number of units in a hidden layer (we use the same number of units in each hidden layer), the dropout rate, the window length, the maximum allowed time-lag, the similarity threshold, the maximum predicted RUL, and the similarity parameter. The window length can be tuned as a hyperparameter, but in practice a domain-knowledge based selection of the window length may be effective.
5.2. Embeddings for RUL Estimation
We follow a similar evaluation protocol as used in (Malhotra et al., 2016b). To the best of our knowledge, the reconstruction error based model LR-ED2 reported the best performance for RUL estimation on the engine dataset in terms of the timeliness score (refer Appendix A). We compare variants of the embedding distance based approach and the reconstruction error based approach. We refer to the HI curve obtained using the proposed embedding distance based approach as HI-Embed (refer Section 4.2), and the HI curve obtained using the reconstruction error based approach in (Malhotra et al., 2016b) as HI-Recon. Here, we refer to the reconstruction error based LSTM-ED, LR-ED1, and LR-ED2 models reported in (Malhotra et al., 2016b) as ReconRUL, ReconLR1, and ReconLR2, respectively. We compare the following RNN-based models for the RUL estimation task:

EmbedRUL vs. ReconRUL: We compare the RUL estimation performance of EmbedRUL, which uses HI-Embed curves, and ReconRUL, which uses HI-Recon curves.

Linear Regression models: We learn a linear regression model (as described in Section 3.1) using the normalized HI-Embed curve as target and call it EmbedLR1. EmbedLR2 is obtained using the squared normalized HI-Embed curve as target for the linear regression model. Similarly, ReconLR1 and ReconLR2 are obtained based on HI-Recon.

RNN Regression model: An RNN-based regression model (RNNReg.) is directly used to predict RUL (similar to (Heimes, 2008)).
Table 1. Effect of additive Gaussian noise (standard deviation σ) on RUL estimation (MSE).

Noise (σ)  | Engine Dataset                    | Pump Dataset
           | ReconRUL | EmbedRUL (proposed)    | ReconRUL | EmbedRUL (proposed)
0.0        | 546      | 456                    | 1979     | 1304
0.1        | 548      | 462                    | 2003     | 1298
0.2        | 521      | 478                    | 2040     | 1293
0.3        | 523      | 460                    | 2068     | 1259
0.4        | 484      | 473                    | 2087     | 1280
Mean       | 524      | 466                    | 2035     | 1287
Std. Dev.  | 23       | 8                      | 40       | 16
5.2.1. Performance on Engine dataset
Table 2. Performance on the engine dataset (proposed models: EmbedRUL, EmbedLR1, EmbedLR2).

Metric  | ReconRUL | EmbedRUL | ReconLR1 | EmbedLR1 | ReconLR2 | EmbedLR2 | RNNReg.
S       | 1263     | 810      | 477      | 219      | 256      | 232      | 352
MSE     | 546      | 456      | 288      | 155      | 164      | 167      | 219
A(%)    | 36       | 48       | 65       | 59       | 67       | 62       | 64
MAE     | 18       | 17       | 12       | 10       | 10       | 10       | 11
MAPE*   | 39       | 39       | 20       | 19       | 18       | 19       | 17
FPR(%)  | 34       | 23       | 19       | 14       | 13       | 15       | 22
FNR(%)  | 30       | 29       | 16       | 27       | 20       | 23       | 24

* Referred to as MAPE in (Malhotra et al., 2016b).
Table 3. Performance on the pump dataset (proposed models: EmbedRUL, EmbedLR1, EmbedLR2).

Metric  | ReconRUL | EmbedRUL | ReconLR1 | EmbedLR1 | ReconLR2 | EmbedLR2 | RNNReg.
MSE     | 1979     | 1304     | 2277     | 2288     | 2365     | 2312     | 3422
MAE     | 40       | 33       | 38       | 42       | 38       | 42       | 48
We use a_1 = 13 and a_2 = 10 as proposed in (Saxena et al., 2008b) for this dataset (refer Equations 8-11 in Appendix A). The parameters are obtained using grid search to minimize the timeliness score S (refer Equation 8) on the validation set; among the proposed variants, the best model on this dataset is EmbedLR1.
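For reference, the timeliness score S with a_1 = 13 and a_2 = 10 penalizes late predictions (positive error) more heavily than early ones; a minimal implementation consistent with the scoring function of (Saxena et al., 2008b):

```python
import math

def timeliness_score(rul_pred, rul_true, a1=13.0, a2=10.0):
    """Timeliness score S: an asymmetric exponential penalty on the error
    d = predicted - actual. Early predictions (d < 0) use denominator a1;
    late predictions (d >= 0) use the smaller a2, so they are penalized
    more heavily. Lower is better; a perfect prediction scores 0."""
    S = 0.0
    for pred, true in zip(rul_pred, rul_true):
        d = pred - true
        S += math.exp(-d / a1) - 1.0 if d < 0 else math.exp(d / a2) - 1.0
    return S
```

For example, overestimating RUL by 10 cycles (a late warning) costs about 1.72, while underestimating by 10 cycles costs only about 1.16.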
Table 2 shows the performance in terms of various metrics on this dataset. We observe that each variant of the embedding distance based approach performs better than the corresponding variant of the reconstruction error based approach in terms of the timeliness score S. Figure 8(a) shows the distribution of errors for the EmbedRUL and ReconRUL models, and Figure 8(b) shows the distribution of errors for the best linear regression models of the embedding distance (EmbedLR1) and reconstruction error (ReconLR2) based approaches. The error ranges for the reconstruction error based models are more spread out (e.g., −70 to +50 for ReconRUL) than for the corresponding embedding distance based models (e.g., −60 to +30 for EmbedRUL), suggesting the robustness of the embedding distance based models. Figure 6 shows the actual RULs and the RUL estimates from EmbedLR1 and ReconLR2.
5.2.2. Performance on Pump dataset
The parameters are obtained using grid search to minimize the MSE for RUL estimation on the validation set; the best model on this dataset is EmbedRUL. The MSE and MAE performance metrics for the RUL estimation task are given in Table 3. The embedding distance based EmbedRUL model performs significantly better than any of the other approaches, including the second-best model (ReconRUL). The linear regression (LR) based approaches perform significantly worse than the raw embedding distance or reconstruction error based approaches for HI estimation, indicating that the temporal aspect of the sensor readings is very important in this case. Figure 7 shows the actual and estimated RUL values for the pumps with the best and worst performance in terms of MSE for the EmbedRUL model.
5.2.3. Qualitative Analysis of Embeddings
We analyze the embeddings given by the RNN Encoder for the EmbedRUL models. The original dimensions of the embeddings for EmbedRUL on the engine and pump datasets are 55 and 390, respectively. We use t-SNE (Maaten and Hinton, 2008) to map the embeddings to 2-D space. Figure 5 shows the 2-D scatter plot for the embeddings at the first 25% (normal behavior) and last 25% (degraded behavior) points in the life of all test instances. We observe that the RNN Encoder tends to give different embeddings for windows corresponding to normal and degraded behavior: the scatter plots indicate that normal windows are close to each other and far from degraded windows, and vice versa. Note: since t-SNE performs non-linear dimensionality reduction, the actual distances between normal and degraded windows may not be reflected in these plots.
5.3. Robustness of Embeddings to Noise
We evaluate the robustness of EmbedRUL and ReconRUL for RUL estimation by adding Gaussian noise to the sensor readings. The sensor reading $z_t$ for a test instance at any time $t$ is corrupted with additive Gaussian noise to obtain a noisy version $\tilde{z}_t$ s.t. $\tilde{z}_t = z_t + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma_N^2)$.
Table 1 shows the effect of noise on performance for both the engine and pump datasets. For both datasets, the standard deviation of the MSE values over different noise levels is much lower for EmbedRUL than for ReconRUL, suggesting that embedding-distance based models are more robust to noise than reconstruction-error based models. For the engine dataset, we observe similar behavior in terms of the timeliness score (S).
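The robustness protocol can be sketched as follows; `estimate_rul` stands in for any trained model (EmbedRUL or ReconRUL), and all names are illustrative rather than the actual implementation:

```python
import numpy as np

def mse_spread_under_noise(estimate_rul, readings, rul_true, sigmas, seed=0):
    """Corrupt the (T, p) sensor readings with additive Gaussian noise at
    each level sigma, re-estimate RUL, and report the per-level MSEs and
    their standard deviation across levels (lower spread = more robust)."""
    rng = np.random.default_rng(seed)
    mses = []
    for sigma in sigmas:
        noisy = readings + rng.normal(scale=sigma, size=readings.shape)
        pred = np.asarray(estimate_rul(noisy), dtype=float)
        mses.append(float(np.mean((pred - np.asarray(rul_true, dtype=float)) ** 2)))
    return mses, float(np.std(mses))
```

A model whose MSE list barely changes across `sigmas` (small spread) is the robust one in the sense used in Table 1.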
Figure 9 depicts a sample scenario showing the health index generated from noisy sensor data; the vertical bars correspond to a 1-sigma deviation in the estimate. Both the reconstruction error and the embedding distance increase over time, indicating gradual degradation. However, while the reconstruction-error based HI varies significantly with varying noise levels, the embedding-distance based HI is fairly robust to noise: the distance between embeddings changes little as the noise level changes, leading to robust HI estimates.
6. Discussion
We have proposed an approach for health monitoring via health index (HI) estimation and remaining useful life (RUL) estimation. The proposed approach deals with several practical challenges in data-driven RUL estimation, including noisy sensor readings, missing data, and lack of prior knowledge about degradation trends. The RNN Encoder-Decoder (RNN-ED) is trained in an unsupervised manner to learn fixed-dimensional representations, or embeddings, that capture machine behavior. The health of a machine is then estimated by comparing its recent embedding with the existing set of embeddings corresponding to normal behavior. On the engine dataset, our approach using RNN-ED based embedding distances outperforms the previously best-known approach using RNN-ED based reconstruction error; it also gives better results on the real-world pump dataset. We have further shown that embedding-distance based RUL estimates are robust to noise.
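A minimal sketch of the embedding-comparison step described above; the function name and the choice of nearest-neighbor Euclidean distance are our assumptions for illustration, and the paper's exact HI computation may differ:

```python
import numpy as np

def embedding_distance_hi(current_emb, normal_embs):
    """Health-index sketch: distance from the current window's embedding
    to its nearest embedding among the set of 'normal' windows, so the
    index grows as machine behavior drifts away from normal."""
    dists = np.linalg.norm(np.asarray(normal_embs) - np.asarray(current_emb), axis=1)
    return float(dists.min())
```

A window whose embedding sits inside the normal cluster gets an index near zero, while a degraded window far from all normal embeddings gets a large index.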
References
 Abadi et al. (2016) Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, and others. 2016. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016).
 Akintayo et al. (2016) Adedotun Akintayo, Kin Gwn Lore, Soumalya Sarkar, and Soumik Sarkar. 2016. Early Detection of Combustion Instabilities using Deep Convolutional Selective Autoencoders on Hi-speed Flame Video. CoRR abs/1603.07839 (2016). http://arxiv.org/abs/1603.07839
 Babu et al. (2016) Giduthuri Sateesh Babu, Peilin Zhao, and Xiao-Li Li. 2016. Deep convolutional neural network based regression approach for estimation of remaining useful life. In International Conference on Database Systems for Advanced Applications. Springer, 214–228.
 Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
 Camci et al. (2016) Fatih Camci, Omer Faruk Eker, Saim Başkan, and Savas Konur. 2016. Comparison of sensors and methodologies for effective prognostics on railway turnout systems. Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit 230, 1 (2016), 24–42.
 Che et al. (2016) Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. 2016. Recurrent neural networks for multivariate time series with missing values. arXiv preprint arXiv:1606.01865 (2016).
 Cho et al. (2014) Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).
 Croarkin and Tobias (2006) Carroll Croarkin and Paul Tobias. 2006. NIST/SEMATECH e-Handbook of Statistical Methods. NIST/SEMATECH (2006).
 Da Xu et al. (2014) Li Da Xu, Wu He, and Shancang Li. 2014. Internet of things in industries: A survey. IEEE Transactions on Industrial Informatics 10, 4 (2014), 2233–2243.
 Dai and Le (2015) Andrew M Dai and Quoc V Le. 2015. Semi-supervised sequence learning. In Advances in Neural Information Processing Systems. 3079–3087.
 Eker et al. (2014) Ömer Faruk Eker, Fatih Camci, and Ian K Jennions. 2014. A similarity-based prognostics approach for remaining useful life prediction. (2014).
 Filonov et al. (2016) Pavel Filonov, Andrey Lavrentyev, and Artem Vorontsov. 2016. Multivariate Industrial Time Series with Cyber-Attack Simulation: Fault Detection Using an LSTM-based Predictive Data Model. NIPS Time Series Workshop 2016, arXiv preprint arXiv:1612.06676 (2016).
 Graves et al. (2013) Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 6645–6649.
 Heimes (2008) Felix O Heimes. 2008. Recurrent neural networks for remaining useful life estimation. In Prognostics and Health Management, 2008. PHM 2008. International Conference on. IEEE, 1–6.
 Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
 Hu et al. (2012) Chao Hu, Byeng D Youn, Pingfeng Wang, and Joung Taek Yoon. 2012. Ensemble of data-driven prognostic algorithms for robust prediction of remaining useful life. Reliability Engineering & System Safety 103 (2012), 120–135.
 Jolliffe (2002) Ian Jolliffe. 2002. Principal component analysis. Wiley Online Library.
 Khelif et al. (2017) Racha Khelif, Brigitte Chebel-Morello, Simon Malinowski, Emna Laajili, Farhat Fnaiech, and Noureddine Zerhouni. 2017. Direct Remaining Useful Life Estimation Based on Support Vector Regression. IEEE Transactions on Industrial Electronics 64, 3 (2017), 2276–2285.
 Khelif et al. (2014) Racha Khelif, Simon Malinowski, Brigitte Chebel-Morello, and Noureddine Zerhouni. 2014. RUL prediction based on a new similarity-instance based approach. In IEEE International Symposium on Industrial Electronics (ISIE), 2014.
 Liao and Ahn (2016) Linxia Liao and Hyungil Ahn. 2016. A Framework of Combining Deep Learning and Survival Analysis for Asset Health Management. 1st ACM SIGKDD Workshop on ML for PHM (2016).
 Maaten and Hinton (2008) Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, Nov (2008), 2579–2605.
 Macmann et al. (2016) Owen B Macmann, Timothy M Seitz, Alireza R Behbahani, and Kelly Cohen. 2016. Performing Diagnostics and Prognostics On Simulated Engine Failures Using Neural Networks. In 52nd AIAA/SAE/ASEE Joint Propulsion Conference. 4807.
 Malhotra et al. (2016a) Pankaj Malhotra, Anusha Ramakrishnan, Gaurangi Anand, and others. 2016a. LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection. In Anomaly Detection Workshop at 33rd ICML. arXiv:1607.00148 (2016).
 Malhotra et al. (2016b) Pankaj Malhotra, Vishnu TV, Anusha Ramakrishnan, Gaurangi Anand, Lovekesh Vig, Puneet Agarwal, and Gautam Shroff. 2016b. Multi-Sensor Prognostics using an Unsupervised Health Index based on LSTM Encoder-Decoder. 1st ACM SIGKDD Workshop on ML for PHM. arXiv preprint arXiv:1608.06154 (2016).
 Malhotra et al. (2017) Pankaj Malhotra, Vishnu TV, Lovekesh Vig, Puneet Agarwal, and Gautam Shroff. 2017. TimeNet: Pre-trained deep recurrent neural network for time series classification. In 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning.
 Malhotra et al. (2015) Pankaj Malhotra, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal. 2015. Long Short Term Memory Networks for Anomaly Detection in Time Series. In ESANN, 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. 89–94.
 Mosallam et al. (2014) Ahmed Mosallam, Kamal Medjaher, and Noureddine Zerhouni. 2014. Data-driven prognostic method based on Bayesian approaches for direct remaining useful life prediction. Journal of Intelligent Manufacturing (2014).
 Mosallam et al. (2015) A Mosallam, K Medjaher, and N Zerhouni. 2015. Component based data-driven prognostics for complex systems: Methodology and applications. In Reliability Systems Engineering (ICRSE), 2015 First International Conference on. IEEE, 1–7.
 Ng (2011) Andrew Ng. 2011. Sparse autoencoder. CS294A Lecture notes 72, 2011 (2011), 1–19.
 Ng et al. (2014) Selina SY Ng, Yinjiao Xing, and Kwok L Tsui. 2014. A naive Bayes model for robust remaining useful life prediction of lithium-ion battery. Applied Energy 118 (2014), 114–123.
 Peng et al. (2012) Yu Peng, Hong Wang, Jianmin Wang, Datong Liu, and Xiyuan Peng. 2012. A modified echo state network based remaining useful life estimation approach. In IEEE Conference on Prognostics and Health Management (PHM), 2012.
 Pham et al. (2014) Vu Pham, Théodore Bluche, Christopher Kermorvant, and Jérôme Louradour. 2014. Dropout improves recurrent neural networks for handwriting recognition. In Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE, 285–290.
 Qiu et al. (2003) Hai Qiu, Jay Lee, Jing Lin, and Gang Yu. 2003. Robust performance degradation assessment methods for enhanced rolling element bearing prognostics. Advanced Engineering Informatics 17, 3 (2003), 127–140.
 Ramasso (2014) Emmanuel Ramasso. 2014. Investigating computational geometry for failure prognostics. International Journal of Prognostics and Health Management 5, 1 (2014), 005.
 Ramasso et al. (2013) Emmanuel Ramasso, Michele Rombaut, and Noureddine Zerhouni. 2013. Joint prediction of continuous and discrete states in timeseries based on belief functions. Cybernetics, IEEE Transactions on 43, 1 (2013), 37–50.
 Reddy et al. (2016) Kishore K Reddy, Vivek Venugopalan, and Michael J Giering. 2016. Applying Deep Learning for Prognostic Health Monitoring of Aerospace and Building Systems. 1st ACM SIGKDD Workshop on ML for PHM. (2016).
 Saxena et al. (2008a) Abhinav Saxena, Jose Celaya, Edward Balaban, Kai Goebel, Bhaskar Saha, Sankalita Saha, and Mark Schwabacher. 2008a. Metrics for evaluating performance of prognostic techniques. In Prognostics and health management, 2008. phm 2008. international conference on. IEEE, 1–17.
 Saxena and Goebel (2008) A Saxena and K Goebel. 2008. Turbofan Engine Degradation Simulation Data Set. NASA Ames Prognostics Data Repository (2008).
 Saxena et al. (2008b) Abhinav Saxena, Kai Goebel, Don Simon, and Neil Eklund. 2008b. Damage propagation modeling for aircraft engine run-to-failure simulation. In Prognostics and Health Management, 2008. PHM 2008. International Conference on. IEEE, 1–9.
 Si et al. (2011) Xiao-Sheng Si, Wenbin Wang, Chang-Hua Hu, and Dong-Hua Zhou. 2011. Remaining useful life estimation – A review on the statistical data driven approaches. European Journal of Operational Research 213, 1 (2011), 1–14.
 Srivastava et al. (2014) Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, 1 (2014), 1929–1958.
 Srivastava et al. (2015) Nitish Srivastava, Elman Mansimov, and Ruslan Salakhutdinov. 2015. Unsupervised learning of video representations using LSTMs. In International Conference on Machine Learning. 843–852.
 Sutskever et al. (2014) Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems. 3104–3112.
 Vincent et al. (2008) Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. 2008. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th international conference on Machine learning. ACM, 1096–1103.
 Wang et al. (2008) Tianyi Wang, Jianbo Yu, David Siegel, and Jay Lee. 2008. A similaritybased prognostics approach for remaining useful life estimation of engineered systems. In Prognostics and Health Management, 2008. PHM 2008. International Conference on. IEEE, 1–6.
 Yan and Yu (2015) Weizhong Yan and Lijie Yu. 2015. On accurate and reliable anomaly detection for gas turbine combustors: A deep learning approach. In Proceedings of the Annual Conference of the Prognostics and Health Management Society.
 Zhao et al. (2016a) Rui Zhao, Jinjiang Wang, Ruqiang Yan, and Kezhi Mao. 2016a. Machine health monitoring with LSTM networks. In Sensing Technology (ICST), 2016 10th International Conference on. IEEE, 1–6.
 Zhao et al. (2016b) Rui Zhao, Ruqiang Yan, Zhenghua Chen, Kezhi Mao, Peng Wang, and Robert X Gao. 2016b. Deep Learning and Its Applications to Machine Health Monitoring: A Survey. arXiv preprint arXiv:1612.07640 (2016).
 Zhao et al. (2017) Rui Zhao, Ruqiang Yan, Jinjiang Wang, and Kezhi Mao. 2017. Learning to monitor machine health with convolutional bi-directional LSTM networks. Sensors 17, 2 (2017), 273.
Appendix A Performance metrics
There are several metrics proposed to evaluate the performance of prognostics models (Saxena et al., 2008a). We measure the performance of our models in terms of Timeliness Score (S), Accuracy (A), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), False Positive Rate (FPR), and False Negative Rate (FNR), as given in Equations 8–11.
For a test instance $i$, let $e_i = \hat{R}_i - R_i$ denote the error between the estimated RUL ($\hat{R}_i$) and the actual RUL ($R_i$). The timeliness score used to measure the performance of a model is given by:
(8) $S = \sum_{i=1}^{N} \left( e^{\gamma \cdot |e_i|} - 1 \right)$
where $\gamma = 1/\tau_1$ if $e_i < 0$, else $\gamma = 1/\tau_2$, and $N$ is the total number of test instances. Usually, $\tau_2 < \tau_1$ such that late predictions are penalized more compared to early predictions. The lower the value of $S$, the better the performance.
(9) $A = \frac{100}{N} \sum_{i=1}^{N} I(e_i)$
where $I(e_i) = 1$ if $e_i \in [-\tau_1, \tau_2]$, else $I(e_i) = 0$.
(10) $\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |e_i|, \quad \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} e_i^2$
(11) $\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \frac{|e_i|}{R_i}$
A prediction is a false positive (FP) if $e_i < -\tau_1$, and a false negative (FN) if $e_i > \tau_2$.
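The metrics above can be computed directly from the RUL errors. The sketch below assumes the common turbofan-benchmark convention $\tau_1 = 13$, $\tau_2 = 10$; the function name and these defaults are our assumptions:

```python
import numpy as np

def prognostics_metrics(rul_true, rul_pred, tau1=13.0, tau2=10.0):
    """Compute the Appendix-A metrics from actual and estimated RULs.

    e_i = rul_pred - rul_true; late predictions (e_i > 0) are penalized
    more in the timeliness score via tau2 < tau1."""
    rul_true = np.asarray(rul_true, dtype=float)
    e = np.asarray(rul_pred, dtype=float) - rul_true
    gamma = np.where(e < 0, 1.0 / tau1, 1.0 / tau2)
    return {
        "S": float(np.sum(np.exp(gamma * np.abs(e)) - 1.0)),      # Eq. 8
        "A": float(100.0 * np.mean((e >= -tau1) & (e <= tau2))),  # Eq. 9
        "MAE": float(np.mean(np.abs(e))),                         # Eq. 10
        "MSE": float(np.mean(e ** 2)),
        "MAPE": float(100.0 * np.mean(np.abs(e) / rul_true)),     # Eq. 11
        "FPR": float(100.0 * np.mean(e < -tau1)),  # too-early predictions
        "FNR": float(100.0 * np.mean(e > tau2)),   # too-late predictions
    }
```

For perfect predictions, S, MAE, MSE, MAPE, FPR, and FNR are all zero and A is 100.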
Appendix B Benchmarks on Turbofan Engine Dataset
We provide a comparison of some approaches for RUL estimation on the engine dataset (test_FD001.txt) below:
Approach | S | A | MAE | MSE | MAPE | FPR | FNR
Bayesian-1 (Mosallam et al., 2014) | NR | NR | NR | NR | 12 | NR | NR
Bayesian-2 (Mosallam et al., 2015) | NR | NR | NR | NR | 11 | NR | NR
ESN-KF (Peng et al., 2012) | NR | NR | NR | 4026 | NR | NR | NR
EV-KNN (Ramasso et al., 2013) | NR | 53 | NR | NR | NR | 36 | 11
IBL (Khelif et al., 2014) | NR | 54 | NR | NR | NR | 18 | 28
Shapelet (Khelif et al., 2014) | 652 | NR | NR | NR | NR | NR | NR
Deep CNN (Babu et al., 2016) | 1287 | NR | NR | 340 | NR | NR | NR
SOM (Macmann et al., 2016) [a] | NR | NR | NR | 297 | NR | NR | NR
SVR (Khelif et al., 2017) | 449 | 70 | NR | NR | NR | NR | NR
RULCLIPPER (Ramasso, 2014) [b] | 216 | 67 | 10.0 | 176 | 20 | 56 | 44
LR-ED (ReconLR) (Malhotra et al., 2016b) | 256 | 67 | 9.9 | 164 | 18 | 13 | 20
EmbedLR (Proposed) | 219 | 59 | 9.8 | 155 | 19 | 14 | 27
NR: not reported.
[a] Dataset simulated under similar settings.
[b] Unlike this method, which tunes the parameters on the test set to obtain the best score, we learn the parameters of our model on a validation set and still get similar performance.