Sequence-to-Sequence Imputation of Missing Sensor Data

Abstract

Although the sequence-to-sequence (encoder-decoder) model is considered the state-of-the-art in deep learning sequence models, there is little research into using this model for recovering missing sensor data. The key challenge is that the missing sensor data problem typically comprises three sequences (a sequence of observed samples, followed by a sequence of missing samples, followed by another sequence of observed samples), whereas the sequence-to-sequence model only considers two sequences (an input sequence and an output sequence). We address this problem by formulating the sequence-to-sequence model in a novel way. A forward RNN encodes the data observed before the missing sequence and a backward RNN encodes the data observed after the missing sequence. A decoder combines the two encodings in a novel way to predict the missing data. We demonstrate that this model produces the lowest errors in 12% more cases than the current state-of-the-art.

Keywords: Imputation · Interpolation · LSTM · Encoder-Decoder Model · Sequence-to-sequence model.

Published in

Preprint of article published in AI 2019: Advances in Artificial Intelligence. See https://doi.org/10.1007/978-3-030-35288-2_22

1 Introduction

From smart cities [1] to personalised body sensor networks [9], sensor data is becoming ubiquitous. This has been fuelled by the rise of the internet of things (IoT), smart sensor networks, and low-cost sensors. Such technologies are however imperfect, and their failure may result in missing data. Sensors may fail due to hardware or software faults. Communication networks can break down due to low signal level, network congestion, packet collision, limited memory capacity, or communication node failures [14]. Even if sensors and communications prevail, missing data may result from scheduled outages such as maintenance and upgrade routines.

When a data-driven model (such as a machine learning model) uses sensor data for prediction, missing data introduces various challenges in parameterising or training the model. This is especially problematic when the temporal structure of the data is important to the model. To address this problem, various methods for imputing or interpolating the missing data have been proposed in the literature.

The Recurrent Neural Network (RNN) has been shown to perform well for missing data recovery [5, 15, 4, 10]. However, there is little research into using the sequence-to-sequence (encoder-decoder) model [12], despite it being considered a state-of-the-art model in deep learning sequence modelling. The key challenge in applying this model to missing data is that it is designed to use a single input sequence to predict some output sequence. However, the missing data problem can be considered to have two input sequences that are separated by the missing data. That is, relative to the missing data, the model must take into account data that is observed both before and after the missing data sequence.

We propose a novel sequence-to-sequence model that incorporates the data before and after a missing data sequence to predict the missing values of that sequence. For this, two encoders are used: one propagating in the positive time direction and one propagating in the negative time direction. These two encoders feed into a decoder that naturally combines the forward and backward encodings to provide an accurate prediction of the missing data sequence. A key feature of the sequence-to-sequence model is that it can handle arbitrary length input and output sequences. Our key contributions are:

  1. The proposed decoder architecture is novel in the way that it merges information from two encoders.

  2. We introduce a novel approach to scaling a forward and backward RNN within the decoder according to their proximity to observed data.

  3. We demonstrate results which show that our model outperforms the current state-of-the-art methods on several datasets.

The proposed model is particularly applicable in problems where there is no neighbouring data available for imputing across variables at each sequence step. The recovery of the missing data must be determined from temporal information alone. These include univariate problems or multivariate problems where sequences of data are missing across all measured variables at the same time. This typically occurs when there is a central system failure, such as the failure of a multi-parameter sensor, the failure of a central communications node in a star-network, or a scheduled outage across a system.

2 Related Work

Various models such as MICE [3] and ST-MVL [13] have been proposed for missing data recovery. We however focus on RNNs, as these are considered to be the state-of-the-art in many missing data recovery applications. Various forms of the RNN have been tested for data imputation. Che et al. [5] use the Gated Recurrent Unit with trainable decays (GRU-D) model for recovering missing data. The decay rates exponentially reduce the importance of predictions that are distant from observations. The model however does not consider samples that occur after the missing data sequence.

The M-RNN [15] uses a bidirectional neural network for imputation. The model is a multi-directional RNN that considers temporal information as well as information across sensors to recover missing data. It thus relies on a subset of the data being available at any given time.

The RITS and BRITS [4] models use an RNN to perform one-step-ahead forecasting and modelling over sequences. Compared with M-RNN, they train output nodes with missing data as part of the network. Both this and a bidirectional RNN provide a means to learn from data that lies before and after the missing data sequence. Additionally, they use trainable decays similar to the GRU-D. However, like the M-RNN, the RITS and BRITS models perform imputation by considering temporal information as well as information across sensors. Cao et al. [4] do however propose the RITS-I and BRITS-I models as reduced versions of RITS and BRITS which exclude the mechanism used to perform predictions across sensors. These reduced models focus on temporal predictions and are thus used for comparison in this study.

The Iterative Imputing Model (IIM) [17] uses a forward and a backward RNN to encode information before and after the missing data. These RNNs could be considered to perform the task of the encoder in the sequence-to-sequence model. However, to predict the missing data, a predict-update loop (similar to the EM algorithm) is used to iteratively impute each missing sample. This iterative process is computationally expensive and does not correspond to a decoder in the sequence-to-sequence model.

The SSIM model [16] is the first model to use the sequence-to-sequence approach for recovering missing data. To address the problem of including observations before and after the missing data, SSIM uses a forward and backward RNN together with a variable-length sliding window. A drawback of the model is that it has to “learn” that there is a difference between the observations before and after the missing data [16].

Compared with GRU-D, BRITS, and M-RNN, our model uses the sequence-to-sequence approach, which is the state-of-the-art in applications such as natural language processing. Furthermore, we consider the problem where there is a complete set of data across all sensors or variables. The result is that data recovery is performed on temporal information alone. Compared with IIM, our model uses an arbitrary length decoder that does not require an iterative updating approach. Compared with SSIM, our model naturally stitches the observations before and after the missing data and is thus not required to learn that there is a difference between them. Furthermore, it does not require a variable sliding window to operate.

3 Model

3.1 Architecture

A sequence-to-sequence (encoder-decoder) model is proposed to recover missing time series data. As illustrated in Figure 1, the network comprises a forward encoder, a backward encoder, and a form of bidirectional decoder. The network can be viewed as containing two traditional sequence-to-sequence models [12], one in the forward direction and one in the backward direction. The outputs of the forward and backward RNN cells in the decoder are scaled and merged in a final output layer in the form of a Multilayered Perceptron (MLP).


Figure 1: Illustration of the proposed sequence-to-sequence model for missing sensor data imputation. Square nodes denote LSTM cells and circular nodes denote linear output neurons. The circular nodes with the $\odot$ operator denote element-wise multiplication with the scaling factors $\alpha_i$ and $\beta_i$. Observations are provided for $x_1, \dots, x_p$ and $x_{p+n+1}, \dots, x_T$. The values $x_{p+1}, \dots, x_{p+n}$ are missing. The forward encoder encodes $x_1, \dots, x_p$ and the backward encoder encodes $x_T, \dots, x_{p+n+1}$. The decoder is a bidirectional LSTM that predicts $\hat{x}_{p+1}, \dots, \hat{x}_{p+n}$. Each forward and backward LSTM cell in the decoder predicts the missing data and this prediction is input to the next RNN cell in the sequence, as illustrated by the dashed arrows. The LSTMs in the decoder thus perform one-step-ahead forecasting.

The forward and backward decoder RNNs operate by performing one-step-ahead predictions. The prediction of the previous RNN cell is fed to the input of the current cell, as illustrated by the dashed arrows in Figure 1. In a regression problem, the prediction is performed using an MLP with a linear output layer whose inputs are the outputs of the corresponding RNN cell. The forward decoder predictions are denoted by $\hat{x}^{(f)}_i$ and the backward decoder predictions are denoted by $\hat{x}^{(b)}_i$.

These additional RNN-level outputs are required because the final output layer's outputs are not all available at each sequence step. As illustrated in Figure 1, the final output at a given position requires the outputs of both the forward and the backward decoder RNN cells at that position. If the final output layer's outputs were fed to the next cell, the input to the next forward RNN cell would depend on a backward RNN cell that has not yet been processed (and symmetrically for the backward direction). To address this dilemma, the forward RNNs and the backward RNNs are first processed over the entire sequence with their local outputs. The results are then passed to the final output layer.
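To make this processing order concrete, a minimal PyTorch sketch of the decoder is given below. The class and variable names (BidirectionalImputationDecoder, fwd_cell, merge, and so on) are our own illustrative assumptions rather than the authors' implementation, and the scaling factors of Section 3.2 are omitted for brevity.

```python
import torch
import torch.nn as nn

class BidirectionalImputationDecoder(nn.Module):
    """Sketch of the two-pass decoding order (hypothetical names)."""

    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.fwd_cell = nn.LSTMCell(1, hidden_size)
        self.bwd_cell = nn.LSTMCell(1, hidden_size)
        self.fwd_out = nn.Linear(hidden_size, 1)  # local forward prediction head
        self.bwd_out = nn.Linear(hidden_size, 1)  # local backward prediction head
        self.merge = nn.Sequential(               # final output MLP
            nn.Linear(2 * hidden_size, hidden_size), nn.Tanh(),
            nn.Linear(hidden_size, 1))

    def forward(self, fwd_state, bwd_state, x_before, x_after, n):
        # Pass 1: forward decoder over the whole gap. Each cell feeds its own
        # local one-step-ahead prediction to the next cell (the dashed arrows
        # in Figure 1).
        (h, c), x = fwd_state, x_before
        fwd_h, fwd_pred = [], []
        for _ in range(n):
            h, c = self.fwd_cell(x, (h, c))
            x = self.fwd_out(h)
            fwd_h.append(h)
            fwd_pred.append(x)

        # Pass 2: backward decoder, processed independently and in reverse.
        (h, c), x = bwd_state, x_after
        bwd_h, bwd_pred = [], []
        for _ in range(n):
            h, c = self.bwd_cell(x, (h, c))
            x = self.bwd_out(h)
            bwd_h.append(h)
            bwd_pred.append(x)
        bwd_h, bwd_pred = bwd_h[::-1], bwd_pred[::-1]

        # Pass 3: only now is a forward and a backward hidden state available
        # at every missing position, so the final output layer can be applied.
        merged = [self.merge(torch.cat([hf, hb], dim=-1))
                  for hf, hb in zip(fwd_h, bwd_h)]
        return merged, fwd_pred, bwd_pred
```

The encoder final states would be passed in as fwd_state and bwd_state, while x_before and x_after are the observations immediately adjacent to the gap.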

3.2 Scaling Factors

Figure 2: An illustration demonstrating the principle of the scaling factor approach using two arbitrary linear functions. The function $f$ decays with increasing $i$ (illustrated with a vanishing curve), whereas $g$ decays with decreasing $i$. The prediction is the weighted combination of $f$ and $g$. In the proposed model, $f$ corresponds to the output of the forward decoder RNN, $g$ corresponds to the output of the backward decoder RNN, and the summation operation in the weighted combination is a nonlinear operation performed by the output layer of the model.

Before the outputs of the forward and backward decoder RNNs are merged in the final output MLP, the RNN outputs are scaled with the scaling factors $\alpha_i$ and $\beta_i$. In our novel approach, the scaling factors decay as predictions progress further from observed data. The forward RNN outputs in the decoder are scaled according to the linear function

\alpha_i = 1 - \frac{i}{n + 1} \qquad (1)

where $n$ is the length of the missing data sequence and $i \in \{1, \dots, n\}$ is the index of the missing data sequence samples. The backward RNN outputs in the decoder are scaled according to

\beta_i = \frac{i}{n + 1} \qquad (2)

Thus, at missing-sample index $i$, the forward RNN output is scaled by a factor of $\alpha_i$, which decays towards zero as $i$ increases. The opposite is true for the backward RNN, whose output is scaled by a factor of $\beta_i$, which decays towards zero as $i$ decreases. The result is that the forward decoder RNN is emphasised near the observations associated with the forward encoder and the backward decoder RNN is emphasised near the observations associated with the backward encoder. The principle of this process is illustrated with an example using linear functions in Figure 2.

Scaling factors have been previously used in RNNs in [5] and [15]. Those factors however decay exponentially and are integrated into the RNN network where they can be learned. In our approach, the scaling factors can be viewed as a form of "forced" attention mechanism that favours the RNN outputs that are nearest the observed data. Furthermore, the linear nature of the proposed scaling factors ensures a balanced weighting between the RNNs across the sequence such that $\alpha_i + \beta_i = 1$.
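A short sketch of these factors follows, assuming the linear form of Eqs. (1) and (2) as written above.

```python
import torch

def scaling_factors(n: int):
    """Linear scaling factors for a gap of n missing samples.

    alpha_i weights the forward decoder RNN and beta_i the backward
    decoder RNN, with alpha_i + beta_i = 1 at every index i = 1..n.
    """
    i = torch.arange(1, n + 1, dtype=torch.float32)
    alpha = 1.0 - i / (n + 1)  # near 1 beside the forward encoder, decays with i
    beta = i / (n + 1)         # near 1 beside the backward encoder, grows with i
    return alpha, beta
```

The scaled outputs, such as alpha[i] * h_fwd[i] and beta[i] * h_bwd[i], are what the final output MLP consumes.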

3.3 Backpropagation With Scaling Factors

As the scaling factor scales the predictions, it also scales the derivatives used in backpropagation. This is because $\alpha_i$ is a fixed constant rather than a learned parameter. For example, consider the forward decoder RNN (in the form of an Elman network for illustrative purposes) with the output layer as illustrated in Figure 3. The scaling factor $\alpha_i$ is applied to each hidden output $h_j$ before it enters the output layer. The variable $a_k$ is the linear combination of inputs at output neuron $k$, and $z_j$ is the linear combination of inputs at hidden neuron $j$. The weight matrices $U$, $W$, and $V$ are associated with the input-to-hidden, hidden-to-hidden, and hidden-to-output connections respectively.

Figure 3: An illustration of a forward RNN and the output layer in the decoder for the discussion on backpropagation with the scaling factor.

Following the backpropagation derivation, the derivative of the cost $E$ with respect to the weight $u_{jm}$ connecting input $m$ to RNN hidden node $j$ is given by

\frac{\partial E}{\partial u_{jm}} = \left( \sum_{k=1}^{K} \frac{\partial E}{\partial a_k} \frac{\partial a_k}{\partial h_j} + \sum_{l=1}^{H} \frac{\partial E}{\partial z_l} \frac{\partial z_l}{\partial h_j} \right) \frac{\partial h_j}{\partial z_j} \frac{\partial z_j}{\partial u_{jm}},

where $K$ is the number of output units, $H$ is the number of hidden units, and the second term in the parentheses accounts for the recurrent connections to the hidden units at the next time step. The scaling factor affects the link between the hidden layer outputs $h_j$ and the output layer linear combination $a_k$. This corresponds to the second factor in the first term, whose derivative is computed as

\frac{\partial a_k}{\partial h_j} = \frac{\partial}{\partial h_j} \left( \alpha_i \sum_{j'=1}^{H} v_{kj'} h_{j'} + b_k \right) = \alpha_i v_{kj},

where $b_k$ is a bias. The scaling factor thus affects the derivatives passed back from the outputs to the hidden layers. The result is that, similar to the scaling of the predictions, the backpropagated errors are scaled to emphasise the RNN cells that are near the corresponding encoders. The scaling is thus incorporated into the learning process.
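This effect can be checked numerically with autograd; the values below are toy numbers used purely for illustration.

```python
import torch

# The fixed scaling factor alpha scales the derivative passed from the
# output pre-activation a back to the hidden outputs h: da/dh = alpha * v.
h = torch.tensor([1.0, -0.5], requires_grad=True)  # hidden layer outputs
v = torch.tensor([3.0, 2.0])                       # hidden-to-output weights
alpha = 0.25                                       # fixed scaling factor

a = (v * (alpha * h)).sum()  # scaled linear combination at an output neuron
a.backward()
print(h.grad)  # tensor([0.7500, 0.5000]), i.e. alpha * v
```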

3.4 Output Layer

The scaled forward and backward decoder RNN outputs are passed to an MLP which predicts the missing data. The prediction provided by this output layer at index $i$ is denoted by $\hat{x}_i$. With linear outputs producing the predictions $\hat{x}_i$, $\hat{x}^{(f)}_i$, and $\hat{x}^{(b)}_i$, the cost function is given by

E = \sum_{i=1}^{n} \left[ \mathcal{L}(\hat{x}_i, x_i) + \mathcal{L}(\hat{x}^{(f)}_i, x_i) + \mathcal{L}(\hat{x}^{(b)}_i, x_i) \right] \qquad (3)

where $x_i$ is the ground truth value for the missing sample at index $i$ and $\mathcal{L}$ is the mean squared error loss function (for the regression case).
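In code, the cost can be sketched as follows, assuming Eq. (3) sums the mean squared errors of the merged output and the two local decoder predictions.

```python
import torch.nn.functional as F

def imputation_loss(pred, pred_fwd, pred_bwd, target):
    """Cost (3): the MSE of the merged output plus the MSEs of the local
    forward and backward decoder predictions, so that all three
    prediction heads receive a direct training signal."""
    return (F.mse_loss(pred, target)
            + F.mse_loss(pred_fwd, target)
            + F.mse_loss(pred_bwd, target))
```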

4 Experiments

Several freely available datasets are used to evaluate and compare the proposed model. The PM2.5 air quality dataset (from 2014-2015) is used as it has become a benchmark in several previous studies such as [13], [4], and [17]. Note that imputations are made across time and across sensors in those studies, whereas in the current study imputations are made across time only. In addition to this dataset, the Metro Interstate Traffic Volume dataset, the Birmingham Parking dataset [11], and the Beijing PM2.5 Air Quality dataset (from 2010-2014) [8] are used. These datasets are freely available from the UCI Machine Learning Repository1.

For the PM2.5 dataset, the PM2.5 data for sensor 001001 is used. In the Traffic dataset, the temperature and traffic volume variables are used. Each parking area provides a unique variable in the Parking dataset. Finally, the dew point, temperature, and pressure variables are used in the AirQuality dataset.

The Mean Absolute Error (MAE) and the Mean Relative Error (MRE) are used as performance metrics, following [17, 13, 4]:

\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |\hat{x}_i - x_i|, \qquad \mathrm{MRE} = \frac{\sum_{i=1}^{N} |\hat{x}_i - x_i|}{\sum_{i=1}^{N} |x_i|},

where $N$ is the total number of observations.
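Concretely, the two metrics are straightforward NumPy transcriptions of the definitions above.

```python
import numpy as np

def mae(pred, truth):
    """Mean Absolute Error over the N imputed samples."""
    return float(np.mean(np.abs(pred - truth)))

def mre(pred, truth):
    """Mean Relative Error: the total absolute error normalised by the
    total absolute ground-truth magnitude, as in [17, 13, 4]."""
    return float(np.sum(np.abs(pred - truth)) / np.sum(np.abs(truth)))
```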

The proposed model's results are compared with those of the RITS-I [4], BRITS-I [4], and sequence-to-sequence [12] models. In all models, 64 hidden units are used in the Long Short-Term Memory (LSTM) [6] RNN. A linear layer is used in the final output layer. All models are trained using the standard backpropagation approach to minimise (3) with the Adam optimisation algorithm [7]. Early stopping is used to avoid overfitting.

The dataset is split such that the last 80% of the dataset is used as a test set and the remainder as a training set. Training and test samples are extracted using a sliding window that is slid across the datasets. Each extracted window is split into a sequence of missing values, a sequence of observed values preceding the missing values, and a sequence of observed values following the missing values. The models are implemented in PyTorch and trained on dual Xeon 14-core E5-2690 v4 compute nodes.
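The window extraction can be sketched as below; the window sizes and stride are illustrative assumptions, as the paper does not specify them here.

```python
def extract_windows(series, before, gap, after, stride=1):
    """Slide a window across the series (a list or NumPy array) and split
    each window into an observed prefix, an artificially missing middle,
    and an observed suffix (illustrative parameters)."""
    width = before + gap + after
    for start in range(0, len(series) - width + 1, stride):
        w = series[start:start + width]
        yield w[:before], w[before:before + gap], w[before + gap:]
```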

5 Results and Discussion

(a) Without scaling factor
(b) With scaling factor
Figure 4: Demonstration of the scaling factor operation on a Traffic dataset sample. The scaling factor emphasises the forward decoder RNN at the beginning of the prediction and it emphasises the backward decoder RNN at the end of the prediction. The result is a more accurate prediction.

To demonstrate the scaling factor, a prediction from a Traffic dataset sample is presented. The predictions of the forward decoder RNN, the backward decoder RNN, and the model output are plotted in Figure 4. The forward and backward RNNs produce significantly differing predictions. If the scaling factor is excluded from the model, the prediction is similar to the average of the forward and backward sequences. As both of these predictions deviate from the ground truth, this final prediction is inaccurate. Including the scaling in the model shifts the prediction towards the observed data points, providing a more accurate result.

Table 1 lists the MAEs and Figure 6 plots the MREs for the set of models and datasets. The Traffic dataset indices refer to the temperature and traffic volume variables. The Parking dataset indices refer to the various parking areas. Finally, the AirQuality dataset indices refer to the dew point, temperature, and pressure variables. For reference, the dataset ranges are included in Table 1. In figures and tables, the proposed model is denoted by seq2seqImp and the sequence-to-sequence model is denoted by seq2seq.

The share of lowest MAEs is presented as a pie chart in Figure 5. Overall, the proposed model has the highest share, producing the lowest MAE in 38% of cases, which is 12 percentage points more than any other model. The sequence-to-sequence model has the smallest share, with the lowest error in 15% of cases. This is expected, as this model is only provided with data prior to the missing data sequence, whereas the other models are provided with data before and after the missing data sequence.

In the PM2.5 and AirQuality datasets, the proposed model produces significantly lower errors than the other models. For example, on the AirQuality dataset the proposed model produces MAEs that are roughly a third lower than those of the competing models. The RITS-I model has the majority of its lowest errors in the Parking dataset and is thus well suited to this dataset.

To provide an aggregated representation of the results, Borda counts are used to rank the models through voting. A Borda count ranks a set of $M$ models with integers such that the model with the highest error is assigned a value of 1 and the model with the lowest error is assigned a value of $M$, as sketched in the code below. The sums of the Borda counts for the models over all datasets are presented in Table 2. The results show that the proposed model is voted the highest ranked model.
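A minimal sketch of this aggregation, with ties left to NumPy's default ordering:

```python
import numpy as np

def borda_counts(errors):
    """Sum of Borda counts for an errors array of shape
    (n_datasets, n_models): per dataset, the model with the highest
    error receives 1 point and the one with the lowest receives
    n_models points; points are summed over the datasets."""
    n_models = errors.shape[1]
    ranks = errors.argsort(axis=1).argsort(axis=1)  # 0 = lowest error
    return (n_models - ranks).sum(axis=0)
```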

Figure 5: Pie chart indicating each model's share of lowest-error results.
Figure 6: MRE% results. See Table 1 for detailed MAE results. (The AirQuality results are not visible due to their scale. Refer to Table 1).
Dataset Range seq2seqImp seq2seqImp-fwd seq2seqImp-bwd seq2seq RITS-I BRITS-I
PM2.5:0 [3,429] 11.13 16.27 15.50 16.95 15.50 13.92
Traffic:0 [0,308] 1.45 2.32 2.28 2.35 1.85 1.51
Traffic:1 [0,7280] 832.33 1027.22 1111.64 1021.09 621.72 682.48
Parking:0 [20,492] 55.82 65.82 66.52 51.25 54.54 49.70
Parking:1 [0,320] 36.45 39.06 42.87 31.54 33.71 33.95
Parking:2 [68,821] 106.73 143.12 127.77 106.05 143.84 139.90
Parking:3 [39,402] 56.68 65.11 61.18 58.41 72.29 78.57
Parking:4 [0,1013] 146.85 163.61 188.61 146.69 110.33 99.68
Parking:5 [25,1197] 136.90 158.31 211.23 133.06 105.68 142.88
Parking:6 [15,612] 53.61 59.70 78.22 50.89 43.56 50.70
Parking:7 [30,470] 50.17 61.18 80.78 54.91 38.96 41.62
Parking:8 [2,220] 38.77 51.53 42.93 46.94 54.85 39.51
Parking:9 [170,678] 62.27 75.78 80.01 65.05 57.03 59.27
Parking:10 [55,845] 101.60 124.23 141.86 102.11 103.75 103.80
Parking:11 [156,723] 74.22 88.79 104.37 61.45 59.43 74.16
Parking:12 [53,503] 62.62 72.78 95.61 75.66 56.69 52.46
Parking:13 [155,413] 36.69 42.93 45.09 41.38 43.56 44.47
Parking:14 [4,246] 30.60 30.41 38.44 27.21 26.46 27.70
Parking:15 [46,593] 82.45 92.53 120.29 106.17 104.31 96.41
Parking:16 [48,689] 73.78 84.67 116.80 73.22 52.34 60.43
Parking:17 [77,2811] 307.67 361.59 451.15 299.45 236.23 268.42
Parking:18 [1,847] 63.88 79.55 84.51 77.57 60.53 56.34
Parking:19 [1,696] 57.90 73.52 79.15 71.45 54.61 47.50
Parking:20 [452,1578] 134.03 166.74 170.08 135.78 151.08 127.09
Parking:21 [51,1534] 113.36 153.19 145.35 138.09 142.10 123.52
Parking:22 [524,3949] 432.00 520.62 576.16 401.78 367.53 358.83
Parking:23 [472,3429] 317.78 462.98 533.55 313.17 349.44 362.69
Parking:24 [331,1444] 98.10 134.56 155.27 110.25 113.20 106.14
Parking:25 [224,1023] 87.47 105.46 109.22 80.14 109.43 100.09
Parking:26 [390,1911] 142.83 196.90 193.64 188.51 155.96 152.47
Parking:27 [248,1561] 155.64 211.38 228.86 145.20 158.50 170.15
AirQuality:0 [-33,28] 1.52 2.20 2.18 2.28 2.13 2.19
AirQuality:1 [-19,41] 1.32 1.78 1.83 1.77 1.78 2.00
AirQuality:2 [991,1046] 0.64 1.22 1.19 1.30 1.25 1.09
Table 1: MAE on the datasets for the various models. The proposed model is denoted by seq2seqImp. The prediction errors of the forward and backward decoder RNNs in the proposed model are included as seq2seqImp-fwd and seq2seqImp-bwd respectively. The sequence-to-sequence model is denoted by seq2seq.
seq2seqImp seq2seqImp-fwd seq2seqImp-bwd seq2seq RITS-I BRITS-I
MAE 157 78 58 131 141 149
MRE 157 74 56 138 144 145
Table 2: Sum of Borda counts of the models over the datasets. A higher value indicates more points in the voting score. The forward and backward decoder RNNs of the proposed model are included as seq2seqImp-fwd and seq2seqImp-bwd respectively.

6 Conclusion

We propose a novel sequence-to-sequence model for recovering missing sensor data. Our decoder merges the outputs of two encoders that summarise the information in the data before and after a missing data sequence. This is performed with a forward and a backward RNN within the decoder. The decoder RNNs are merged by a novel overarching output layer that scales the RNN cell outputs according to their proximity to observed data.

The proposed model is demonstrated on several time series datasets. It is shown that the proposed model produces the lowest errors in 12% more cases than three other state-of-the-art models and is ranked as the best model according to the Borda count.

In future work, it is expected that a significant improvement in the results could be achieved by using the attention mechanism [2] between the encoders and the decoder. Furthermore, the scaling mechanism could possibly be improved by parameterising it within the model such that it can be learned. This could be achieved by using a softmax layer such as that used in the attention mechanism.

Acknowledgments.

The authors thank YiFan Zhang from CSIRO for the discussions around the topic of this study.

Footnotes

  1. https://archive.ics.uci.edu/ml/index.php

References

  1. H. Arasteh, V. Hosseinnezhad, V. Loia, A. Tommasetti, O. Troisi, M. Shafie-khah and P. Siano (2016) IoT-based smart cities: a survey. In 2016 IEEE 16th International Conference on Environment and Electrical Engineering (EEEIC), pp. 1–6. Cited by: §1.
  2. D. Bahdanau, K. Cho and Y. Bengio (2015) Neural machine translation by jointly learning to align and translate. In Proc. International Conference on Learning Representations, External Links: Link Cited by: §6.
  3. S. van Buuren and K. Groothuis-Oudshoorn (2000) Multivariate imputation by chained equations: MICE V1.0 user's manual. Technical Report PG/VGZ/00.038, TNO Prevention and Health, Leiden. Cited by: §2.
  4. W. Cao, D. Wang, J. Li, H. Zhou, L. Li and Y. Li (2018) BRITS: bidirectional recurrent imputation for time series. In Advances in Neural Information Processing Systems, pp. 6775–6785. Cited by: §1, §2, §4, §4, §4.
  5. Z. Che, S. Purushotham, K. Cho, D. Sontag and Y. Liu (2018) Recurrent Neural Networks for Multivariate Time Series with Missing Values. Scientific Reports 8 (1), pp. 6085. External Links: Document, ISSN 2045-2322, Link Cited by: §1, §2, §3.2.
  6. S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780. Cited by: §4.
  7. D. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.
  8. X. Liang, T. Zou, B. Guo, S. Li, H. Zhang, S. Zhang, H. Huang and S. X. Chen (2015) Assessing Beijing's PM2.5 pollution: severity, weather impact, APEC and winter heating. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 471 (2182), pp. 20150257. Cited by: §4.
  9. C. C. Y. Poon, B. P. L. Lo, M. R. Yuce, A. Alomainy and Y. Hao (2015) Body sensor networks: in the era of big data and beyond. IEEE Reviews in Biomedical Engineering 8, pp. 4–16. Cited by: §1.
  10. L. Shen, Q. Ma and S. Li (2018) End-to-end time series imputation via residual short paths. In Proceedings of The 10th Asian Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 95, pp. 248–263. Cited by: §1.
  11. D. H. Stolfi, E. Alba and X. Yao (2017) Predicting car park occupancy rates in smart cities. In International Conference on Smart Cities, pp. 107–117. Cited by: §4.
  12. I. Sutskever, O. Vinyals and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence and K. Q. Weinberger (Eds.), pp. 3104–3112. External Links: Link Cited by: §1, §3.1, §4.
  13. X. Yi, Y. Zheng, J. Zhang and T. Li (2016) ST-MVL: filling missing values in geo-sensory time series data. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI'16, pp. 2704–2710. Cited by: §2, §4.
  14. J. Yick, B. Mukherjee and D. Ghosal (2008) Wireless sensor network survey. Computer Networks 52 (12), pp. 2292–2330. Cited by: §1.
  15. J. Yoon, W. R. Zame and M. van der Schaar (2018) Deep sensing: active sensing using multi-directional recurrent neural networks. In Sixth International Conference on Learning Representations, Cited by: §1, §2, §3.2.
  16. Y. Zhang, P. J. Thorburn, W. Xiang and P. Fitch (2019) SSIM—a deep learning approach for recovering missing time series sensor data. IEEE Internet of Things Journal 6 (4), pp. 6618–6628. Cited by: §2.
  17. J. Zhou and Z. Huang (2018) Recover missing sensor data with iterative imputing network. In Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §2, §4, §4.