A Review of Deep Learning Methods for Irregularly Sampled Medical Time Series Data

Irregularly sampled time series (ISTS) data have irregular temporal intervals between observations and different sampling rates across sequences. ISTS data commonly appear in healthcare, economics, and geoscience. In the medical setting especially, the widely used electronic health records (EHRs) contain abundant irregularly sampled medical time series (ISMTS) data. Developing deep learning methods on EHRs data is critical for personalized treatment, precise diagnosis, and medical management. However, it is challenging to apply deep learning models directly to ISMTS data. On the one hand, ISMTS data have intra-series and inter-series relations, so both the local and the global structures should be considered. On the other hand, methods should balance task accuracy against model complexity while retaining generality and interpretability. Many existing works have addressed these problems and achieved good results. In this paper, we review these deep learning methods from the perspectives of technology and task. Under the technology-driven perspective, we summarize them into two categories: missing data-based methods and raw data-based methods. Under the task-driven perspective, we likewise summarize them into two categories: data imputation-oriented and downstream task-oriented. For each category, we point out the advantages and disadvantages. Moreover, we implement some representative methods and compare them on four medical datasets with two tasks. Finally, we discuss the challenges and opportunities in this area.


Irregularly sampled time series, medical data, deep learning.

1 Introduction

Time series data are widely used in practical applications, such as health [15], geoscience [80], sales [96], and traffic [4]. The popularity of time series prediction, classification, and representation has attracted increasing attention, and many efforts have been made to address these problems in the past few years [97, 3, 17, 2].

The majority of models assume that time series data are evenly spaced and complete. However, in the real world, time series observations usually have non-uniform time intervals between successive measurements. Three reasons can cause this characteristic: 1) missing data exists in the time series due to broken sensors, failed data transmissions, or damaged storage; 2) the sampling machine itself does not have a constant sampling rate; 3) different time series usually come from different sources with various sampling rates. We call such data irregularly sampled time series (ISTS) data. ISTS data naturally occur in many real-world domains, such as weather/climate [80], traffic [101], and economics [96].

In the medical environment, irregularly sampled medical time series (ISMTS) data are abundant. The widely used electronic health records (EHRs) contain a large amount of ISMTS data. EHRs are the real-time, patient-centered digital version of patients' paper charts. EHRs provide opportunities to develop advanced deep learning methods that improve healthcare services and save lives by assisting clinicians with diagnosis, prognosis, and treatment [81]. Many works based on EHRs data have achieved good results, such as mortality risk prediction [84, 42], disease prediction [70, 65, 20], concept representation [19, 21] and patient typing [6, 21, 14].

Due to the special characteristics of ISMTS, the most important step is establishing suitable models for it. However, this is especially challenging in medical settings.

Various tasks need different adaptation methods. Data imputation and prediction are the two main tasks. Data imputation is a processing task performed while modeling the data, whereas prediction is a downstream task serving the final goal; the two may be intertwined. Standard techniques, such as mean imputation [103], singular value decomposition (SVD) [35] and k-nearest neighbors (kNN) [26], can impute data, but they leave a large gap between the imputed and the true data distributions and cannot perform downstream tasks such as mortality prediction. Linear regression (LR) [33], random forest (RF) [52] and support vector machines (SVM) [109] can predict, but fail on ISTS data.

State-of-the-art deep learning architectures have been developed to perform not only supervised tasks but also unsupervised ones relating to both imputation and prediction. Recurrent neural networks (RNNs) [24, 107, 27], auto-encoders (AEs) [59, 106] and generative adversarial networks (GANs) [13, 99] have achieved good performance in medical data imputation and medical prediction, thanks to the learning and generalization abilities conferred by their complex nonlinearities. They can carry out the prediction task or the imputation task separately, or both at once by combining neural network structures.

Full name                                   Abbreviation
Time series                                 TS
Irregularly sampled time series             ISTS
Irregularly sampled medical time series     ISMTS
Electronic health record                    EHR
Recurrent neural network                    RNN
Long short-term memory                      LSTM
Gated recurrent unit                        GRU
Generative adversarial network              GAN
Table 1: Abbreviations

Existing deep learning methods embody different understandings of the characteristics of ISMTS data. We summarize them as the missing data-based perspective and the raw data-based perspective. The first perspective [15, 91, 64, 12, 53] treats irregular series as series with missing data and solves the problem through more accurate data imputation. The second perspective [6, 90, 5, 83, 16] works on the structure of the raw data itself, modeling ISMTS directly by utilizing the irregular time information. Neither view is strictly superior to the other.

Either way, it is necessary to grasp the data relations comprehensively for more effective modeling. We identify two kinds of relations in ISMTS: intra-series relations (data relations within a time series) and inter-series relations (data relations between different time series). All the existing works model one or both of them. They correspond to the local and the global structures of the data, which we will introduce in Section 3.

Besides, different EHR datasets may lead to different performance for the same method. For example, the real-world MIMIC-III [49] and CINC [1] datasets record multiple different diseases. The records of different diseases have distinct data characteristics, and the prediction results of general methods [15, 6, 64, 53] vary between disease datasets. Thus, many existing methods model the records of a specific disease, like sepsis [46], atrial fibrillation [111, 43] and kidney disease [77], and have improved the prediction accuracy.

The rest of the paper is organized as follows. Section 2 gives the basic definitions and abbreviations. Section 3 describes the features of ISMTS from two viewpoints: intra-series and inter-series. Sections 4 and 5 introduce the related works from the technology-driven and task-driven perspectives, respectively. In each perspective, we summarize the methods into specific categories and analyze their merits and demerits. Section 6 compares experiments with several methods on four medical datasets with two tasks. Sections 7 and 8 raise the challenges and opportunities for modeling ISMTS data and conclude.

Figure 1: An example of a patient’s temporal EHR

2 Preliminaries

The summary of abbreviations is in Table 1.

A typical EHR dataset consists of a number of patient records, each of which includes demographic information and in-hospital information. In-hospital information has the hierarchical patient-admission-code form shown in Figure 1. Each patient has one or more admission records, as he/she could be in hospital several times. The codes include diagnoses, lab values and vital sign measurements.

Definition 1 (Electronic Health Records data, EHRs)

An EHR dataset consists of a set of patient records $\{P_1, P_2, \dots, P_N\}$, where $N$ is the number of records. A patient $P_n$, having demographic information $D_n$, may have multiple admission records. $A_n = \{A_n^1, A_n^2, \dots\}$ is the admission record set of patient $P_n$. Each record $A_n^i$ consists of many codes, including a static diagnosis code set and a dynamic vital sign code set. Each code has a time stamp $t$.

EHRs contain many ISMTS for two reasons: 1) multiple admissions of one patient and 2) multiple time series records in one admission. The admission records of each patient have different time stamps. Because of health status dynamics and some unpredictable factors, a patient visits the hospital at varying intervals [62]. For example, in Figure 1, March 23, 2006, July 11, 2006 and February 14, 2011 are patient admission times. The time interval between the 1st and 2nd admissions is a couple of months, while the interval between admissions 2 and 3 is about 5 years. Each time series within one admission, like blood pressure, also has varying time intervals. As shown for Admission 2 in Figure 1, the sampling times are not fixed. Different physiological variables are examined at different times due to changes in symptoms, and not every possible test is measured regularly during an admission. When a certain symptom worsens, the corresponding variables are examined more frequently; when the symptom disappears, they are no longer examined.

Without loss of generality, we only discuss univariate time series; multivariate time series can be modeled in the same way.

Definition 2 (Irregularly Sampled Medical Time Series, ISMTS)

A dataset of ISMTS contains $N$ time series $\{(s_n, y_n)\}_{n=1}^{N}$, where $s_n$ is the observation sequence of the $n$th sample and $y_n$ is its label. $s_n$ is represented as a tuple sequence $\{(v_l, t_l)\}_{l=1}^{L}$ with $L$ time steps, where $v_l$ is the observed value at time step $l$ and $t_l$ is the corresponding time. The time interval between the $(l-1)$th and the $l$th time steps of $s_n$ is $\delta_l = t_l - t_{l-1}$. Time intervals vary between different neighboring time steps.


Definition 2 illustrates three important quantities of ISMTS: the value $v$, the time $t$ and the time interval $\delta$. Some missing value-based works (introduced in Section 4) additionally use a masking vector $m$ to indicate which values are missing.
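As a concrete illustration of these quantities, the following minimal sketch stores one univariate ISMTS as (value, time) pairs and derives the intervals $\delta$ and the masking vector $m$. The function name and the convention $\delta_1 = 0$ are illustrative choices, not from the cited works.

```python
import numpy as np

# A minimal sketch of Definition 2: values v, times t, intervals delta,
# and a masking vector m (1 = observed, 0 = missing).
def make_ismts_sample(values, times):
    """Store one univariate ISMTS as (v, t) pairs and derive intervals."""
    v = np.asarray(values, dtype=float)
    t = np.asarray(times, dtype=float)
    delta = np.diff(t, prepend=t[0])       # delta_1 = 0 by convention
    m = (~np.isnan(v)).astype(int)         # masking vector: 1 = observed
    return {"v": v, "t": t, "delta": delta, "m": m}

# A pH series observed at hours 0 and 8, with a missing reading at hour 2.5:
sample = make_ismts_sample([7.35, np.nan, 7.41], [0.0, 2.5, 8.0])
```

Note how the irregular spacing is kept explicitly in `delta` rather than being discarded by resampling.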


3 Characteristics of irregularly sampled medical time series

Medical measurements are frequently correlated both within streams and across streams. For example, the blood pressure of a patient at a given time could be correlated with the blood pressure at other times, and it could also be related to the heart rate at that time. Thus, we introduce the irregularity of ISMTS in two aspects: 1) intra-series and 2) inter-series.

Intra-series irregularity refers to the irregular time intervals between neighboring observations within a stream. For example, as shown in Figure 1, the blood pressure time series has varying time intervals, such as 1 hour, 2 hours, and even 24 hours. Large intervals between observations add a time-sparsity factor [62]. Two existing strategies can handle the irregular time intervals: 1) determining a fixed interval and treating the time points without data as missing data; 2) modeling the time series directly, treating the irregular time intervals as information. The first strategy requires a function to impute the missing data [9, 86]. For example, some RNNs [15, 48, 12, 41, 23, 55] can impute the sequence data effectively by considering the order dependency. The second strategy usually uses the irregular time intervals as inputs. For example, some RNNs [6, 90] apply a time decay to the order dependency, weakening the relation between neighbors separated by long time intervals.
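The first strategy, discretizing onto a fixed grid, can be sketched as follows. The function name, the bin width, and the last-value-per-bin policy are illustrative choices, not taken from the cited works.

```python
import numpy as np

# Sketch of the discretization strategy: fix a sampling interval and treat
# empty bins as missing data (NaN).
def discretize(times, values, interval, horizon):
    n_bins = int(np.ceil(horizon / interval))
    grid = np.full(n_bins, np.nan)
    for t, v in zip(times, values):
        grid[min(int(t // interval), n_bins - 1)] = v  # last value wins per bin
    return grid

# Blood pressure observed at hours 0, 1 and 24, gridded hourly over 25 hours:
grid = discretize([0.0, 1.0, 24.0], [120.0, 118.0, 125.0],
                  interval=1.0, horizon=25.0)
```

The large 23-hour gap in the raw series becomes a long run of NaN bins, which is exactly the missing data problem the first strategy then has to solve.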

Inter-series irregularity is mainly reflected in the different sampling rates of different time series. For example, as shown in Figure 1, vital signs such as heart rate (ECG data) have a high sampling rate (in seconds), while lab results such as pH are measured infrequently (in days) [40, 4]. Two existing strategies can handle the multi-sampling-rate problem: 1) treating the data as one multivariate time series; 2) processing multiple univariate time series separately. The first strategy aligns the variables of different series in the same dimension and then solves the resulting missing data problem [82]. The second strategy models the different time series simultaneously and then designs fusion methods [83].
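The first strategy above can be sketched by aligning series with different sampling rates on a shared grid, which immediately surfaces missing entries for the slower series. The function name and grid policy are illustrative, not from [82].

```python
import numpy as np

# Sketch of multivariate alignment: series sampled at different rates are
# placed on one shared grid; the slower series gains NaN (missing) entries.
def align(series, interval, horizon):
    """series: {name: (times, values)} -> {name: gridded array with NaNs}."""
    n_bins = int(np.ceil(horizon / interval))
    aligned = {}
    for name, (times, values) in series.items():
        grid = np.full(n_bins, np.nan)
        for t, v in zip(times, values):
            grid[min(int(t // interval), n_bins - 1)] = v
        aligned[name] = grid
    return aligned

# Hourly heart rate vs. a pH value measured only twice in the same window:
aligned = align({"heart_rate": ([0, 1, 2, 3], [80, 82, 81, 79]),
                 "ph": ([0, 3], [7.40, 7.32])}, interval=1.0, horizon=4.0)
```

The heart-rate channel is complete, while the pH channel is mostly NaN: the multi-sampling-rate problem has been converted into a missing data problem.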

Numerous related works are capable of modeling ISMTS data. We categorize them from two perspectives: 1) technology-driven and 2) task-driven. We describe each category in detail below.

Figure 2: Categories based on the technology-driven perspective

Figure 3: The missing rates of four real-world EHRs datasets: (a) the PhysioNet dataset (the lines stand for the maximum, minimum and average missing rate at each hour), (b) the MIMIC-III dataset, (c) the sepsis dataset, and (d) the COVID-19 dataset.

4 Categorization based on technology-driven

Based on the technology-driven perspective, we divide the existing works into two categories: 1) the missing data-based perspective and 2) the raw data-based perspective. The specific categories are shown in Figure 2.

The missing data-based perspective assumes every time series has uniform time intervals; the time points without data are considered missing data points. As shown in Figure 5(a), converting irregular time intervals to regular ones makes missing data show up. The missing rate, the fraction of sampling points without an observation at a given sampling rate, measures the degree of missingness.

The ISMTS in real-world EHRs have a severe missing data problem. For example, Luo et al. [69] gathered statistics on the CINC2012 dataset [84, 34]: the maximum missing rate at each timestamp is always higher than 95%, most variables' missing rates are above 80%, and the mean missing rate is 80.67%, as shown in Figure 3(a). The other three real-world EHRs datasets, the MIMIC-III dataset [49], the CINC2019 dataset [79, 1], and the COVID-19 dataset [100], are also affected by missing data, as shown in Figures 3(b), 3(c), and 3(d). In this viewpoint, existing methods impute the missing data or model the missing data information directly.
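A minimal sketch of the missing-rate statistic described above: grid a variable's observation times at a chosen interval and count the empty bins. The function name and example numbers are illustrative, not from [69].

```python
import math

# Sketch: at sampling interval `interval`, the missing rate is the fraction
# of grid points carrying no observation.
def missing_rate(times, interval, horizon):
    n_bins = math.ceil(horizon / interval)
    observed = {min(int(t // interval), n_bins - 1) for t in times}
    return 1.0 - len(observed) / n_bins

# A variable measured every 4 hours over 48 hours, gridded hourly:
rate = missing_rate(times=range(0, 48, 4), interval=1.0, horizon=48.0)
```

Note that the missing rate depends on the chosen sampling interval: the same 12 observations gridded every 4 hours would have a missing rate of 0.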

The raw data-based perspective uses the irregular data directly. These methods do not fill in missing data to make the irregular sampling regular; on the contrary, they consider the irregular timing itself to be valuable information. As shown in Figure 5(b), the times remain irregular and the time intervals are recorded. Irregular time intervals and multiple sampling rates are the intra-series and inter-series characteristics introduced in Section 3, respectively, and they are very common in EHR databases. For example, the CINC2019 dataset is relatively clean but still has more than 60% of samples with irregular time intervals, and only 1.28% of samples in the MIMIC-III dataset have the same sampling rate. In this viewpoint, methods usually integrate features of the varied time intervals into the model inputs, or design models that can process samples with different sampling rates.

Figure 4: Systolic and diastolic blood pressure time series with missing data for a patient in the ICU
(a) Converting irregularly sampled data to missing data – an example of a patient's pH values
(b) Recording the irregular time intervals between observations – an example of a patient's pH values
Figure 5: Two perspectives on modeling irregularly sampled medical time series

4.1 Missing data-based perspective

The methods of the missing data-based perspective convert ISMTS into equally spaced data. They [89, 88, 98] discretize the time axis into non-overlapping intervals with hand-designed durations; the missing data then shows up.

Missing values damage the temporal dependencies of sequences [69] and make directly applying many existing models infeasible, such as linear regression [50] and recurrent neural networks (RNNs) [95]. As shown in Figure 4, because of missing values, the second valley of the blue signal is not observed and cannot be inferred by simply relying on basic models [50, 95]. But the valley values of blood pressure are significant for indicating sepsis in ICU patients [28], a leading cause of ICU mortality [68]. Thus, missing values have an enormous impact on data quality, resulting in unstable predictions and other unpredictable effects [22]. Many prior efforts have been dedicated to models that can handle missing values in time series. They can be divided into two categories: 1) two-step approaches and 2) end-to-end approaches.

Two-step approaches

Two-step approaches ignore or impute missing values and then process downstream tasks based on the preprocessed data. A simple solution is to omit the missing data and perform analysis only on the observed data, but this can leave a large amount of useful data unused [15]. The core of these methods is how to impute the missing data.

Some basic methods are dedicated to filling in the values, such as smoothing, interpolation [58], and splines [74], but they cannot capture the correlations between variables or complex patterns. Other methods estimate the missing values by spectral analysis [78], kernel methods [44], and the expectation-maximization (EM) algorithm [32]. However, their simple designs and strong model assumptions make the imputation inaccurate. Recently, with the vigorous development of deep learning, deep methods have achieved higher accuracy than traditional ones. The deep learning-based data imputation methods are mainly realized by RNNs and GANs.

A substantial literature uses RNNs to impute the missing data in ISMTS. RNNs take sequence data as input, recursion occurs in the direction of sequence evolution, and all units are chained together. This special structure allows them to process sequence data by learning the order dynamics. In an RNN, the current hidden state $h_t$ is affected by the previous state $h_{t-1}$ and the current input $x_t$, and can be described as

$h_t = \sigma(W h_{t-1} + U x_t + b)$

where $W$ and $U$ are weight matrices, $b$ is a bias and $\sigma$ is a nonlinear activation function.
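The recurrence above can be sketched in a few lines; the weights here are random placeholders rather than trained parameters, and tanh stands in for the generic activation.

```python
import numpy as np

# A minimal sketch of the RNN recurrence: h_t = tanh(W h_{t-1} + U x_t + b).
def rnn_forward(xs, W, U, b):
    h = np.zeros(W.shape[0])       # initial hidden state h_0 = 0
    for x in xs:                   # recursion along the sequence
        h = np.tanh(W @ h + U @ x + b)
    return h

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))        # hidden-to-hidden weights (untrained)
U = rng.normal(size=(4, 2))        # input-to-hidden weights (untrained)
b = np.zeros(4)
h = rnn_forward([np.ones(2), np.zeros(2)], W, U, b)
```

Every time step is assumed to be one "tick" here, which is exactly why irregular intervals pose a problem for the basic formulation.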
RNNs can be integrated with basic methods such as EM [93] and linear models (LR) [75]: these methods first estimate the missing values and then use the reconstructed data streams as inputs to a standard RNN. However, EM imputes the missing values using only the synchronous relationships across data streams (inter-series relations), not the temporal relationships within streams (intra-series relations). LR interpolates the missing values using only the temporal relationships within each stream (intra-series relations), ignoring the relationships across streams (inter-series relations). Meanwhile, most RNN-based imputation methods, like the simple recurrent network (SRN) and LSTM, which Kim et al. [53] showed to be effective for imputing medical data, also learn an incomplete relation by considering intra-series relations only.

Chu et al. [48] noticed the difference between these two relations in ISMTS data and designed the multi-directional recurrent neural network (M-RNN) for both imputation and interpolation. M-RNN operates forward and backward in the intra-series direction through an interpolation block and across the inter-series direction through an imputation block, implementing interpolation with a Bi-RNN structure and imputation with fully connected layers. The final objective function is the mean squared error between the real and the estimated data over the observed entries:

$\mathcal{L} = \sum_{l} m_l (v_l - \hat{v}_l)^2$

where $v$, $m$ and $\delta$ denote the data value, masking and time interval defined in Section 2 (we will not repeat this below). Bi-RNN is the bidirectional RNN [37], an advanced RNN structure with forward and backward RNN chains. It has two hidden states for each time point, one per direction, which are concatenated or summed into the final value at that time point. Unlike the basic Bi-RNN, the timing of the inputs into the hidden layers of M-RNN is lagged in the forward direction and advanced in the backward direction.
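To illustrate the bidirectional intuition behind M-RNN (this is a toy sketch, not the authors' model): a forward pass and a backward pass each propose a value for a missing point, and the two proposals are combined.

```python
import numpy as np

# Toy sketch of bidirectional estimation: a forward carry and a backward
# carry each fill a missing point, and the two estimates are averaged,
# standing in for M-RNN's two RNN directions.
def bidirectional_fill(v):
    v = np.asarray(v, dtype=float)
    fwd, bwd = v.copy(), v.copy()
    for i in range(1, len(v)):               # forward: carry the last value
        if np.isnan(fwd[i]):
            fwd[i] = fwd[i - 1]
    for i in range(len(v) - 2, -1, -1):      # backward: carry the next value
        if np.isnan(bwd[i]):
            bwd[i] = bwd[i + 1]
    return np.where(np.isnan(v), (fwd + bwd) / 2.0, v)

filled = bidirectional_fill([1.0, np.nan, 3.0])
```

A one-directional carry would estimate the middle point as 1.0 or 3.0; combining both directions gives 2.0, using context from both sides of the gap.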

However, in M-RNN, the relations between missing variables are dropped; the estimated values are treated as constants that cannot be sufficiently updated.

To solve this problem, Cao et al. [12] proposed bidirectional recurrent imputation for time series (Brits) to predict missing values with bidirectional recurrent dynamics. In this model, the missing values are regarded as variables of the model graph and receive delayed gradients in both the forward and backward directions with consistency constraints, which makes their estimation more accurate. Brits updates the predicted missing data with a combined objective of three errors: the historical-based estimation error $\mathcal{L}_h$, the feature-based estimation error $\mathcal{L}_f$ and the combined estimation error $\mathcal{L}_c$,

$\mathcal{L} = \mathcal{L}_h + \mathcal{L}_f + \mathcal{L}_c$

which not only considers the relations between missing data and known data, but also models the relations between missing data that M-RNN ignores.
However, Brits does not take both inter-series and intra-series relations into account, whereas M-RNN does.

GANs are a type of deep learning model that trains generative models through an adversarial process [54]. From the perspective of game theory, GAN training can be seen as a minimax two-player game [108] between a generator $G$ and a discriminator $D$ with the objective function

$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$
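As a numeric illustration of the minimax value $V(D, G)$ above (the sample probabilities are illustrative, not drawn from any trained model):

```python
import numpy as np

# Numeric sketch of V(D, G): the empirical value for fixed discriminator
# outputs on a batch of real samples (d_real) and generated samples (d_fake).
def gan_value(d_real, d_fake):
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident discriminator (D -> 1 on real, -> 0 on fake) gives a high value:
v_good = gan_value(np.array([0.99, 0.98]), np.array([0.01, 0.02]))
# A fooled discriminator (D -> 0.5 everywhere) gives a lower value:
v_fooled = gan_value(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```

The discriminator maximizes this value while the generator minimizes it, driving $D$ toward the fooled state where it cannot distinguish real from generated data.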
However, typical GANs require fully observed data during training. In response, Yoon et al. [102] proposed the generative adversarial imputation nets (GAIN) model. Different from the standard GAN, its generator receives both noise and a mask as input data; the masking mechanism makes partially missing data usable as input. GAIN's discriminator, rather than judging a whole sample as real or fake, outputs which components are real (observed) and which are fake (imputed). Meanwhile, a hint mechanism gives the discriminator additional information in the form of a hint vector $h$. GAIN changes the objective of the basic GAN to

$\min_G \max_D \mathbb{E}\big[ m^\top \log D(\hat{v}, h) + (1 - m)^\top \log(1 - D(\hat{v}, h)) \big]$

where $\hat{v}$ is the completed data vector produced by the generator.
To improve GAIN, Camino et al. [11] used multiple inputs and multiple outputs for the generator and the discriminator. Their method splits the variables by using dense layers connected in parallel for each variable.

Zhang et al. [110] designed a Stackelberg GAN based on GAIN to impute missing medical data with computational efficiency. The Stackelberg GAN can generate more diverse imputed values by using multiple generators instead of a single one and applying an ensemble of all pairs of standard GAN losses.
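The masking mechanism that lets GAIN accept incomplete inputs can be sketched as follows; the noise range and function name are illustrative, and the generator network itself is omitted.

```python
import numpy as np

# Sketch of GAIN-style input construction: the generator sees the observed
# entries, noise in place of the missing entries, and the mask itself.
def generator_input(x, m, rng):
    z = rng.uniform(0.0, 0.01, size=x.shape)   # noise for the missing slots
    x_tilde = np.where(m == 1, x, z)           # observed values are kept
    return np.concatenate([x_tilde, m.astype(float)])

rng = np.random.default_rng(0)
x = np.array([7.4, 0.0, 120.0])                # second entry is missing
m = np.array([1, 0, 1])                        # mask: 1 = observed
inp = generator_input(x, m, rng)
```

Concatenating the mask tells the generator (and later the discriminator) which entries were actually measured, so missingness itself becomes part of the input.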


The main goal of the above two-step methods is to estimate the missing values in the converted time series (irregularly sampled features converted to features with missing data). However, in the medical setting, the ultimate goal is to carry out medical tasks such as mortality prediction [84, 42] and patient subtyping [6, 21, 14]. Two separated steps may lead to suboptimal analyses and predictions [10], as the missing patterns are not effectively explored for the final tasks. Thus, some research has proposed solving the downstream tasks directly rather than filling in missing values first.

End-to-end approaches

End-to-end approaches process the downstream tasks directly by modeling the time series with missing data. The core objective is prediction, classification, or clustering; data imputation is at most an auxiliary task in this type of method.

Lipton et al. [65] demonstrated a simple strategy: using a basic RNN model to cope with missing data in sequential inputs, with the RNN output serving as the final features for prediction. To improve this basic idea, they addressed the multilabel classification of diagnoses from clinical time series and found that RNNs can make remarkable use of binary indicators for missing data, improving AUC and F1 significantly. Thus, rather than imputing missing data heuristically, their follow-up work [64] directly models missingness as a first-class feature.

Similarly, Che et al. [15] also used the RNN idea to make medical predictions directly. To handle the missing data problem, they designed a masking vector as an indicator of missingness, so that the value $v$, the time interval $\delta$ and the masking $m$ jointly inform the imputation. Their approach first replaces missing data with mean values, and then uses a feedback loop to update the imputed values, which become the input of a standard RNN for prediction.
Meanwhile, they proposed GRU-Decay (GRU-D) to model EHRs data for medical predictions with trainable decays. The decay rate $\gamma_t$ weighs the correlation between the missing data and the other data (the previous value $v_{t'}$ and the empirical mean $\tilde{v}$):

$\gamma_t = \exp(-\max(0, W_\gamma \delta_t + b_\gamma)), \qquad \hat{v}_t = m_t v_t + (1 - m_t)\big(\gamma_t v_{t'} + (1 - \gamma_t)\tilde{v}\big)$
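A toy sketch of this trainable-decay imputation, with an illustrative scalar weight in place of the learned parameters $W_\gamma$, $b_\gamma$:

```python
import numpy as np

# Sketch of GRU-D-style decay: gamma = exp(-max(0, w*delta + b)); a missing
# value is blended from the last observation toward the empirical mean as
# the elapsed gap delta grows. w and b are illustrative, not learned.
def decay_impute(last_obs, mean, delta, w=0.5, b=0.0):
    gamma = np.exp(-np.maximum(0.0, w * delta + b))
    return gamma * last_obs + (1.0 - gamma) * mean

x_short = decay_impute(last_obs=100.0, mean=80.0, delta=0.0)   # no gap
x_long  = decay_impute(last_obs=100.0, mean=80.0, delta=50.0)  # long gap
```

With no gap the imputation trusts the last observation entirely; after a long gap it falls back to the variable's mean, encoding the intuition that stale measurements lose relevance.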
In the same work, the authors plotted the Pearson correlation coefficients between the variables' missing rates and the labels on the MIMIC-III dataset. They observed that the missing rate is correlated with the labels, demonstrating the usefulness of missingness patterns for prediction tasks.

However, the above models [15, 50, 12, 5, 64] are limited to the local information (the empirical mean or the nearest observation) of ISMTS. For example, GRU-D assumes that a missing variable can be represented as a combination of its last observed value and its mean value. The global structure and statistics are not directly considered, and the local statistics become unreliable when consecutive data points are missing (as in Figure 4) or the missing rate rises.

Tang et al. [91] realized this problem and designed LGnet, which explores the global and local dependencies simultaneously. They used GRU-D to model the local structure, grasping intra-series relations, and a memory module to model the global structure, learning inter-series relations. The memory module captures the global temporal dynamics for missing values together with the variable correlations. Meanwhile, an adversarial training process enhances the modeling of the global temporal distribution.

4.2 Raw data-based perspective

The alternative to pre-discretizing ISMTS and processing sequences with missing data is constructing models that can directly receive ISMTS as input. The intuition of the raw data-based perspective comes from the characteristics of the raw data itself: the intra-series relation and the inter-series relation. The intra-series relation of ISMTS is reflected in the irregular time intervals between two neighboring observations within one series; the inter-series relation is reflected in the different sampling rates of different time series. Thus, the two subcategories are 1) irregular time intervals-based approaches and 2) multi-sampling rates-based approaches.

Irregular time intervals-based approaches

In the EHRs setting, the time lapse between successive elements in patient records can vary from days to months, which is the characteristic of irregular time intervals in ISMTS. A better way to handle it is to model the unequally spaced data using the time information directly.

Basic RNNs can only process uniformly distributed longitudinal data, as they assume that the sequences have equally distributed time differences. Thus, the design of traditional RNNs may lead to suboptimal performance on ISMTS.

For better RNN performance, Baytas et al. [6] proposed a novel unit, the time-aware LSTM (T-LSTM), to handle irregular time intervals in ISMTS for patient subtyping. T-LSTM incorporates the elapsed time information into the basic LSTM through a time decay function such as

$g(\delta_t) = 1 / \log(e + \delta_t)$

They applied a memory discount in coordination with the elapsed time to capture the irregular temporal dynamics: the previous memory $C_{t-1}$ is decomposed into a short-term component $C_{t-1}^S$ and a long-term component $C_{t-1}^T = C_{t-1} - C_{t-1}^S$, and the discounted short-term memory is added back to form the adjusted memory

$C_{t-1}^* = C_{t-1}^T + g(\delta_t)\, C_{t-1}^S$

which replaces $C_{t-1}$ in the standard LSTM update.
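A toy sketch of this memory adjustment, assuming the decay $g(\delta) = 1/\log(e + \delta)$ reported for T-LSTM; the memory vectors here are illustrative placeholders rather than outputs of a trained network.

```python
import numpy as np

# Sketch of a T-LSTM-style memory discount: split the previous memory into
# short- and long-term parts, decay only the short-term part by g(delta).
def adjust_memory(C_prev, C_short, delta):
    g = 1.0 / np.log(np.e + delta)
    C_long = C_prev - C_short          # long-term component is kept intact
    return C_long + g * C_short        # discounted short-term is added back

C_prev = np.array([2.0, 2.0])          # illustrative previous cell memory
C_short = np.array([1.0, 0.5])         # illustrative short-term component

C_now = adjust_memory(C_prev, C_short, delta=0.0)     # no elapsed time
C_late = adjust_memory(C_prev, C_short, delta=100.0)  # long elapsed time
```

With zero elapsed time $g = 1$ and the memory is unchanged; after a long gap only the short-term part fades while the long-term part survives.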
However, T-LSTM is a completely irregular time intervals-based method only when the ISMTS is univariate. For multivariate ISMTS, it has to align the multiple time series and fill in missing data first, which reintroduces the missing data problem. The paper did not mention a specific filling strategy and used simple interpolation, such as mean values, during preprocessing.

To address the multivariate ISMTS alignment problem, Tan et al. [90] gave an end-to-end dual-attention time-aware gated recurrent unit (DATA-GRU) to predict patients' mortality risk. DATA-GRU uses a time-aware GRU structure (T-GRU) analogous to T-LSTM, and the authors also give a strategy for the multivariate data alignment problem. When aligning different time series into multiple dimensions, previous missing data approaches, such as GRU-D [15] and LGnet [91], assign equal weights to observed and imputed data, ignoring the relatively larger unreliability of imputation compared with actual observations. DATA-GRU tackles this difficulty with a novel dual-attention structure: an unreliability-aware attention with a reliability score, and a symptom-aware attention. The dual-attention structure jointly considers data quality and medical knowledge.

Further, the attention structure makes DATA-GRU explainable through its interpretable embedding, addressing an urgent need in medical tasks.

Instead of using RNNs to learn the order dynamics in ISMTS, Bahadori et al. [5] proposed methods for analyzing multivariate clinical time series that are invariant to temporal clustering. The events in EHRs may appear together in a single admission or may be dispersed over multiple admissions. For example, the authors postulated that whether a series of blood tests is completed at once or in rapid succession should not alter predictions. Thus, they designed a data augmentation technique, temporal coarsening, that exploits temporal-clustering invariance to regularize deep neural networks optimized for clinical prediction tasks. Moreover, they proposed a multi-resolution ensemble (MRE) model over the coarsening-transformed inputs to improve predictive accuracy.

Multi-sampling rates-based approaches

Modeling only the irregular time intervals of the intra-series relation would ignore the multi-sampling-rate aspect of the inter-series relation. Moreover, modeling the inter-series relation also reflects consideration of the global structure of ISMTS.

The above irregular time intervals-based RNN methods only consider the local order dynamics. Although LGnet [91] integrates global structure, it incorporates the information from all time points into an interpolation model, which is redundant and poorly adaptive. Some models can also learn the global structure of time series, like the basic Kalman filter [39] and deep Markov models [76]. However, such models mainly process time series with a stable sampling rate.

Che et al. [16] focused on the problem of modeling multi-rate multivariate time series and proposed a multi-rate hierarchical deep Markov model (MR-HDMM) for healthcare forecasting and interpolation tasks. MR-HDMM learns a generation model and an inference network with auxiliary connections and learnable switches. The latent hierarchical structure, reflected in the states and switches, factorizes the joint probability layer by layer.

Figure 6: Categories based on task-driven

These structures can capture the temporal dependencies and the data generation process. Similarly, Binkowski et al. [7] presented an autoregressive framework for regression tasks on ISMTS data; its core idea is roughly similar to that of MR-HDMM.

However, these methods consider the different sampling rates between series but ignore the irregular time intervals within each series: they process each time series at a stable sampling rate (uniform time intervals). To obtain a stable sampling rate, they have to use forward or linear interpolation, where the global structure is again omitted in making the intervals uniform. The Gaussian process can build global interpolation layers for processing multi-sampling-rate data, a technique used by Li et al. [61] and Futoma et al. [31]. But for multivariate time series, the covariance functions are challenging due to the complicated and expensive computation.

Satya et al. [83] designed a fully modular interpolation-prediction network (IPN). IPN has an interpolation network to accommodate the complexity of ISMTS data and provide the multi-channel output by modeling three information - broad trends , transients and local observation frequencies . The three information is calculated by a low-pass interpolation , a high-pass interpolation and an intensity function .


IPN also has a prediction network that operates on the regularly partitioned inputs produced by the interpolation module. In addition to capturing data relationships from multiple perspectives, IPN makes up for the lack of modularity in [15] and avoids the computational difficulty of the Gaussian process interpolation layers in [61, 31].
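The three channels can be sketched with fixed squared-exponential kernel weights (an illustrative simplification; in IPN the bandwidths and channel parameters are learned): a wide kernel gives the broad trend, the residual against a sharper kernel exposes transients, and the summed kernel weights act as an observation-intensity estimate.

```python
import numpy as np

def three_channel_interp(t_obs, y_obs, t_ref, alpha_low=0.5, alpha_high=5.0):
    """Sketch of IPN-style interpolation channels on a reference grid t_ref.

    Weights w(t, t_i) = exp(-alpha * (t - t_i)^2): a small alpha yields a
    smooth (low-pass) curve, a large alpha a sharper one; their difference
    exposes transients, and the summed sharp weights estimate how densely
    the series is observed near each reference point.
    """
    d2 = (t_ref[:, None] - t_obs[None, :]) ** 2
    w_low, w_high = np.exp(-alpha_low * d2), np.exp(-alpha_high * d2)
    smooth = (w_low @ y_obs) / w_low.sum(axis=1)    # broad trends
    sharp = (w_high @ y_obs) / w_high.sum(axis=1)
    transients = sharp - smooth                     # deviations from trend
    intensity = w_high.sum(axis=1)                  # local obs. frequency
    return smooth, transients, intensity

t_obs = np.array([0.0, 0.2, 1.5, 1.6, 1.7])        # clustered observations
y_obs = np.array([1.0, 1.1, 2.0, 2.2, 1.9])
t_ref = np.linspace(0, 2, 5)
smooth, trans, intensity = three_channel_interp(t_obs, y_obs, t_ref)
```

The intensity channel is highest near the 1.5-1.7 cluster, which is the kind of observation-frequency signal IPN passes to its prediction network.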

5 Categorization based on task-driven

Modeling ISTS data aims to achieve two main tasks: 1) Missing data imputation and 2) Downstream tasks. The specific categories are shown in Figure 6.

Missing data imputation is of practical significance: as machine learning research has become more active, obtaining large amounts of complete data has become an important issue. However, complete data is almost impossible to obtain in the real world for many reasons, such as lost records. In many cases, a time series with missing values becomes useless and is thrown away, resulting in a large amount of data loss. Incomplete data also has adverse effects when learning a model [54].

Basic methods, such as interpolation [58], kernel methods [44] and the EM algorithm [32, 93], were proposed long ago. With the popularity of deep learning in recent years, most new methods are implemented with artificial neural networks (ANNs). One of the most popular models is the RNN [95]: RNNs can capture long-term temporal dependencies and use them to estimate missing values, and existing works [22, 12, 53, 64, 18, 63] have designed several special RNN structures to adapt to missingness and achieve good results. Another popular model is the GAN [36], which generates plausible fake data through adversarial training and has been successfully applied to face completion and sentence generation [8, 63, 67, 105]. Based on their data generation abilities, some studies [69, 54, 102, 60] have applied GANs to time series generation while incorporating sequence information into the process.
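Several of the special RNN structures above (e.g., GRU-D [15]) estimate a missing input by decaying the last observation toward the empirical mean as the gap since the last measurement grows. A minimal NumPy sketch of that input-imputation step (here with fixed decay parameters w and b, which are learned in the actual models):

```python
import numpy as np

def decayed_inputs(x, mask, delta, x_mean, w=0.5, b=0.0):
    """GRU-D-style input imputation (sketch; w and b are learned in practice).

    gamma_t = exp(-max(0, w * delta_t + b)) decays from 1 toward 0 as the
    time since the last observation grows, blending the last observed
    value with the empirical mean of the feature.
    """
    out = np.empty(len(x))
    x_last = x_mean                           # nothing observed yet
    for t in range(len(x)):
        gamma = np.exp(-max(0.0, w * delta[t] + b))
        out[t] = x[t] if mask[t] else gamma * x_last + (1 - gamma) * x_mean
        if mask[t]:
            x_last = x[t]
    return out

x     = np.array([1.0, 0.0, 0.0, 3.0])       # 0.0 marks a missing slot
mask  = np.array([1, 0, 0, 1], dtype=bool)
delta = np.array([0.0, 1.0, 2.0, 1.0])       # time since last observation
imputed = decayed_inputs(x, mask, delta, x_mean=2.0)
```

As the gap grows (t=1 then t=2), the imputed value moves monotonically from the last observation toward the feature mean.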

Downstream tasks generally include prediction, classification, and clustering. For ISMTS data, medical prediction (such as mortality prediction, disease classification and image classification) [84, 70, 65, 20, 42], concept representation [19, 21] and patient subtyping [6, 21, 14] are the three main tasks. Downstream task-oriented methods impute missing values and perform the downstream task simultaneously, which is expected to avoid the suboptimal analyses and predictions that arise when imputation and the final task are separated and the missing patterns are therefore not effectively explored [10]. Most methods [15, 91, 6, 90, 83, 16, 5, 64] use deep learning technology to achieve higher task accuracy.

6 Experiments

In this section, we apply the above methods to four datasets and two tasks, and analyze the methods through the experimental results.

6.1 Datasets

Four datasets were used to evaluate the performance of baselines.

MIMIC-III dataset [49] is a freely accessible de-identified critical care database, developed and maintained by the Massachusetts Institute of Technology Laboratory for Computational Physiology. In this work, we extract diagnoses and demographics information for disease prediction and choose records with more than one visit. The dataset comprises 19,993 hospital admissions, 260,326 diagnoses and 4,893 unique ICD-9 codes of 7,537 patients. The average number of visits per patient is 2.66. There are 13.02 codes per visit on average and up to 39 codes in a visit.

CINC2012 dataset [84] consists of records from 12,000 ICU stays and includes 4,000 multivariate clinical time series. All patients were adults admitted for a wide variety of reasons to cardiac, medical, surgical, and trauma ICUs. Each record is a multivariate time series of roughly 48 hours containing 41 variables such as albumin, heart rate and glucose.

CINC2019 dataset [1] is publicly available and comes from two hospitals; it contains 30,336 patient admission records and 2,359 records of diagnosed sepsis cases. It is a set of multivariate time series that contains 40 related features, 8 kinds of vital signs, 26 kinds of laboratory values and 6 kinds of demographics. The time interval is 1 hour. The sequence length is between 8 and 336, and 29,414 records have lengths less than 60.

COVID-19 dataset [100] was collected between 10 January and 18 February 2020 from Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China. It contains 375 patients with 6,120 blood sample records as the training set, 110 patients with 757 records as the test set, and 80 characteristics.

6.2 Experiment setting

The experiments comprise two tasks: 1) mortality prediction and 2) data imputation. The mortality prediction task uses the time series of the 48 hours before onset time from the above four datasets. The imputation task uses 8 features (selected with the method in [16]) from which 10% of the observed measurements are eliminated; the eliminated values serve as the ground truth.
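The elimination protocol can be sketched as follows (a generic hold-out routine; only the 10% fraction comes from the text, the rest is illustrative): randomly hide a fraction of the observed entries and keep the hidden values for evaluation.

```python
import numpy as np

def hold_out_observed(values, observed_mask, frac=0.1, seed=0):
    """Hide `frac` of the observed entries to build imputation ground truth.

    Returns the corrupted array, the reduced observation mask, the hidden
    indices, and the hidden values (the evaluation ground truth).
    """
    rng = np.random.default_rng(seed)
    obs_idx = np.flatnonzero(observed_mask)
    n_hide = max(1, int(frac * len(obs_idx)))
    hidden = rng.choice(obs_idx, size=n_hide, replace=False)
    corrupted = values.astype(float).copy()
    corrupted[hidden] = np.nan                 # now "missing"
    new_mask = observed_mask.copy()
    new_mask[hidden] = False
    return corrupted, new_mask, hidden, values[hidden]

values = np.arange(20, dtype=float)
mask = np.ones(20, dtype=bool)
corrupted, new_mask, hidden, truth = hold_out_observed(values, mask)
```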

For RNN-based methods, we fix the hidden state dimension to 64. For GAN-based methods, the series inputs also use an RNN structure. For the final prediction, all methods use one 128-dimensional FC layer and one 64-dimensional FC layer, and all apply the Adam optimizer [57]. We use learning rate decay with a decay step of 2000. Five-fold cross-validation is used for both tasks.
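The learning-rate schedule can be sketched as a staircase exponential decay; the decay rate and base rate were lost in extraction, so the values 0.96 and 1e-3 below are purely illustrative assumptions, while the decay step of 2000 comes from the text.

```python
def decayed_lr(step, base_lr=1e-3, decay_rate=0.96, decay_steps=2000):
    """Staircase exponential decay: lr = base * rate ** (step // decay_steps).

    base_lr and decay_rate are illustrative assumptions; only the decay
    step of 2000 is taken from the experiment description.
    """
    return base_lr * decay_rate ** (step // decay_steps)
```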

6.3 Evaluation metrics

The prediction results are evaluated with the area under the receiver operating characteristic curve (AUC-ROC). The ROC is a curve of the true positive rate (TPR) against the false positive rate (FPR), where TPR = TP/(TP+FN) and FPR = FP/(FP+TN); TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.


We evaluate the imputation performance in terms of mean squared error (MSE): for the i-th eliminated item, let y_i be the real value and ŷ_i the predicted value; with N missing values, MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)².
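Both metrics follow directly from their definitions; a short sketch (computing a single ROC operating point at one threshold, rather than the full AUC integration):

```python
import numpy as np

def roc_point(y_true, y_score, threshold):
    """One (FPR, TPR) point of the ROC curve at a given threshold."""
    pred = y_score >= threshold
    tp = np.sum(pred & (y_true == 1))
    fn = np.sum(~pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    tn = np.sum(~pred & (y_true == 0))
    return fp / (fp + tn), tp / (tp + fn)

def mse(y_real, y_pred):
    """Mean squared error over the held-out (eliminated) values."""
    y_real, y_pred = np.asarray(y_real, float), np.asarray(y_pred, float)
    return np.mean((y_real - y_pred) ** 2)

fpr, tpr = roc_point(np.array([1, 1, 0, 0]), np.array([0.9, 0.4, 0.6, 0.1]), 0.5)
err = mse([1.0, 2.0, 3.0], [1.0, 2.5, 3.5])
```

Sweeping the threshold over all scores traces the full ROC curve, and AUC-ROC is the area under that curve.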

Method | MIMIC-III | CINC2012 | CINC2019 | COVID-19
RNN [64] | 0.809 ± 0.014 | 0.800 ± 0.016 | 0.825 ± 0.024 | 0.945 ± 0.004
LSTM [53] | 0.812 ± 0.009 | 0.805 ± 0.010 | 0.829 ± 0.019 | 0.945 ± 0.005
GRU-D [15] | 0.829 ± 0.003 | 0.818 ± 0.009 | 0.835 ± 0.013 | 0.965 ± 0.003
M-RNN [48] | 0.827 ± 0.005 | 0.820 ± 0.011 | 0.842 ± 0.010 | 0.959 ± 0.003
Brits [12] | 0.833 ± 0.002 | 0.819 ± 0.012 | 0.839 ± 0.013 | 0.959 ± 0.002
T-LSTM [6] | 0.817 ± 0.004 | 0.804 ± 0.010 | 0.831 ± 0.014 | 0.963 ± 0.003
DATA-GRU [90] | 0.832 ± 0.006 | 0.822 ± 0.012 | 0.851 ± 0.012 | 0.961 ± 0.003
LGnet [91] | 0.833 ± 0.003 | 0.822 ± 0.013 | 0.843 ± 0.013 | 0.956 ± 0.002
IPN [83] | 0.831 ± 0.003 | 0.824 ± 0.009 | 0.844 ± 0.015 | 0.960 ± 0.003
Table 2: Performance comparison for the mortality prediction task (in AUC-ROC)
Method | MIMIC-III | CINC2012 | CINC2019 | COVID-19
RNN [64] | 4.985 | 4.180 | 2.901 | 1.685
LSTM [53] | 4.712 | 4.046 | 2.899 | 1.710
GRU-D [15] | 4.412 | 3.567 | 2.379 | 1.543
M-RNN [48] | 4.435 | 3.236 | 2.337 | 1.530
Brits [12] | 4.339 | 3.238 | 2.439 | 1.533
GAN-GRUI [69] | 3.920 | 3.129 | 2.225 | 1.533
Stackelberg [110] | 4.431 | 3.741 | 2.435 | 1.932
MR-HDMM [16] | 4.015 | 3.113 | 2.217 | 1.555
Table 3: Performance comparison for the imputation task (in RMSE)
Method | MIMIC-III | CINC2012 | CINC2019 | COVID-19
RNN [64] | 0.805 | 0.783 | 0.814 | 0.945
LSTM [53] | 0.802 | 0.794 | 0.819 | 0.943
GRU-D [15] | 0.819 | 0.813 | 0.840 | 0.949
M-RNN [48] | 0.826 | 0.818 | 0.842 | 0.946
Brits [12] | 0.823 | 0.810 | 0.831 | 0.947
GAN-GRUI [69] | 0.820 | 0.809 | 0.842 | 0.945
Stackelberg [110] | 0.798 | 0.767 | 0.789 | 0.941
MR-HDMM [16] | 0.829 | 0.814 | 0.841 | 0.945
Table 4: Performance comparison for the mortality prediction task based on the imputed data (in AUC-ROC)

6.4 Results

Table 2 shows the performance of the baselines on the mortality prediction task. Between the two categories of technology-driven methods, each has its own merits, but irregularity-based methods work relatively well: missing data-based methods obtain 2/4 top-1 and 2/5 top-2 results, while irregularity-based methods obtain 2/4 top-1 and 3/5 top-2 results.

Regarding whether the two kinds of series relations are considered, the methods that take both inter-series and intra-series relations (both global and local structures) into account perform better: IPN, LGnet, and DATA-GRU obtain relatively good results. The methods also behave differently across datasets. For example, since COVID-19 is a small dataset, unlike the other three, relatively simple methods perform better on it, such as T-LSTM, which does not perform well on the other three datasets.

Table 3 shows the performance of the baselines on the imputation task. RNN-based and GAN-based methods perform similarly: RNN-based methods obtain 3/4 top-1 and 2/5 top-2 results, while GAN-based methods obtain 1/4 top-1 and 3/5 top-2 results. MR-HDMM and GAN-GRUI are the two best methods.

Imputation is better on the CINC2019 (sepsis) and COVID-19 datasets, perhaps because the time series in these two datasets come from patients who suffered from the same disease. That is probably also why they show relatively better results in the prediction task.

Table 4 shows the performance of a basic RNN model on the mortality prediction task using the baselines' imputed data. Different from the results in Table 3, the RNN-based methods perform better here, with 4/5 top-1 results versus 1/5 for GAN-based methods. The reason may be that the RNN-based approaches integrate the downstream task while imputing, so the data they generate is more suitable for the final prediction task.

7 Discussions

Based on the analysis of the technologies and experimental results, in this section we discuss the ISMTS modeling task from three perspectives: 1) the imputation task versus the prediction task, 2) intra-series versus inter-series relations (local versus global structure), and 3) missing data-based versus raw data-based modeling. Conclusions for the approaches in this survey are given in Table 5.

7.1 Challenges

Based on the above three perspectives, we summarize the challenges as follows.

How to balance imputation with prediction? Different kinds of methods suit different tasks: GANs favor imputation while RNNs favor prediction. In the medical setting, however, this conclusion does not always hold across datasets. For example, on the COVID-19 dataset, missing data is generated better by RNNs than by GANs, and the two-step GAN-based methods are no worse for mortality prediction than using RNNs directly. It therefore seems difficult to achieve a general and effective modeling method in medical settings; the method should be chosen according to the specific task and the characteristics of the dataset.

How to handle the intra-series and inter-series relations of ISMTS? In other words, how to trade off the local structure against the global structure? In ISMTS format, a patient has several time series of vital signs connected to disease diagnoses or the probability of death. Seeing these time series as one multivariate data sample, intra-series relations are reflected in longitudinal and horizontal dependencies. The longitudinal dependencies comprise the sequence order and context, the time intervals, and the decay dynamics; the horizontal dependencies are the relations between different dimensions. The inter-series relations are reflected in the patterns of the time series of different samples.

However, when seeing these time series as separate samples of one patient, the relations change. Intra-series relations become the dependencies between values observed at different time steps in a univariate ISMTS, where the features of different time intervals must be taken care of. Inter-series relations become the pattern relations between different patients' samples and between different time series of the same vital sign.

At the structural level, modeling intra-series relations is basically local, while modeling inter-series relations is global. It is not clear which consideration and which structure will yield better results. Modeling both local and global structures seems to perform better for mortality prediction, but such methods are more complex and not universal across datasets.

How to choose the modeling perspective, missing data-based or irregularity-based? Both kinds of methods have advantages and disadvantages. Most existing works are missing data-based, and methods for estimating missing data have existed for a long time [29]. In the missing data-based setting, the discretization interval length is a hyper-parameter that needs to be determined. If the interval is large, there is less missing data, but several values may fall into the same interval; if the interval is small, the missing data increases. Empty intervals hamper performance, while intervals with several values require an ad-hoc selection method. Meanwhile, missing data-based methods have to interpolate new values, which may artificially introduce dependencies that do not occur naturally. Over-imputation may cause an explosion in data size, and the pursuit of multivariate alignment may lose the dependencies in the raw data. Thus, of particular interest are irregularity-based methods that learn directly from multivariate sparse and irregularly sampled time series as input, without any imputation.
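The interval-size trade-off can be quantified by binning the irregular timestamps and counting empty bins (which become missing data) versus bins holding several values (which need an ad-hoc selection rule). A small sketch with illustrative timestamps:

```python
import numpy as np

def bin_stats(timestamps, t_max, delta):
    """Counts of empty and multi-valued bins for a given interval size delta.

    A large delta yields few empty bins but many value collisions; a small
    delta yields many empty bins: the hyper-parameter trade-off above.
    """
    n_bins = int(np.ceil(t_max / delta))
    counts = np.bincount((np.asarray(timestamps) / delta).astype(int),
                         minlength=n_bins)
    return int(np.sum(counts == 0)), int(np.sum(counts > 1))

ts = [0.1, 0.2, 3.7, 3.8, 9.5]                        # irregular times
empty_small, multi_small = bin_stats(ts, 10.0, 0.5)   # fine grid
empty_large, multi_large = bin_stats(ts, 10.0, 5.0)   # coarse grid
```

With the fine grid most bins are empty (missing data to impute), while the coarse grid has no empty bins but collapses several observations into one slot.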

However, although raw data-based methods have the merit of introducing no artificial dependencies, they suffer from not always achieving the desired results, complex designs, and large numbers of parameters. Irregular time intervals-based methods are not complex, as they can be implemented by simply injecting time decay information; but on specific tasks such as mortality prediction, they do not perform as well as one might expect (as concluded in the experiments section). Moreover, for multivariate time series, these methods have to align values across dimensions, which leads to missing data problems again. Multi-sampling rates-based methods do not cause missing data; however, processing multiple univariate time series simultaneously requires more parameters and is not friendly to batch learning, and modeling entire univariate series may require assumptions about the data generation model.

7.2 Opportunities

Considering the complexity of patient states, the number of interventions, and the real-time requirement, data-driven approaches that learn from EHRs are the desiderata for helping clinicians.

Although some difficulties remain unsolved, deep learning methods show a better ability to model medical ISMTS data than basic methods. Basic methods cannot model ISMTS completely: interpolation-based methods [58, 74] exploit only the correlation within each series, imputation-based methods [32, 56] exploit only the correlation among different series, and matrix completion-based methods [71, 104] assume the data is static and ignore its temporal component. Deep learning methods learn data structures through parameter training, and many basic methods can be integrated into the design of neural networks. The deep learning methods introduced in this survey largely overcome the problems of the common methods and have achieved state-of-the-art results in medical prediction tasks, including mortality prediction, disease prediction, and admission-stay prediction. Therefore, deep learning models based on ISMTS data have broad prospects in medical tasks.

The deep learning methods mentioned in this survey, both RNN-based and GAN-based, are troubled by poor interpretability [73, 87], whereas clinical settings prefer interpretable models. Although this defect is hard to overcome due to the models' characteristics, some researchers have made progress; for example, the attention-like structures used in [70, 20] can provide explanations for medical predictions.

Category | Subcategory | Researches | Advantages | Disadvantages
Missing data-based | Two-step | [95, 69, 22, 12] [54, 102, 60, 53] [110, 30, 25] | Generality; ability of imputation | Suboptimal prediction; incomplete data relation; data generation pattern assumptions; introduced artificial dependency
Missing data-based | End-to-end | [15, 91, 64, 51] | Optimal prediction | Non-commonality; introduced artificial dependency
Raw data-based | Irregular time intervals-based | [6, 90, 5, 38] [47, 72, 45] [92, 94] | No artificial dependency | Low applicability for multivariate data; incomplete data relation
Raw data-based | Multi-sampling rates-based | [83, 16, 66, 85] | No artificial dependency; no data imputation | Implementation complexity; data generation pattern assumptions
Table 5: Conclusions of introduced methods for medical tasks using EHRs data

8 Conclusion

This survey introduced a kind of data - irregularly sampled medical time series (ISMTS). Combined with medical settings, we described the characteristics of ISMTS. We then investigated the relevant methods for modeling ISMTS data and classified them from a technology-driven perspective and a task-driven perspective. For each category, we divided the subcategories in detail and described each specific model's implementation. Meanwhile, based on the imputation and prediction experiments, we analyzed the advantages and disadvantages of representative methods and drew conclusions. Finally, we summarized the challenges and opportunities of the ISMTS modeling task.


  1. M. A. Reyna, C. S. Josef, R. Jeter, S. P. Shashikumar, M. B. Westover, S. Nemati, G. D. Clifford and A. Sharma (2020) Early prediction of sepsis from clinical data: the PhysioNet/Computing in Cardiology Challenge 2019. Critical Care Medicine 48 (121), pp. 210–217. Cited by: §1, §4, §6.1.
  2. K. Aggarwal, S. R. Joty, L. Fernández-Luque and J. Srivastava (2019) Adversarial unsupervised representation learning for activity time-series. In The Thirty-Third AAAI Conference on Artificial Intelligence, pp. 834–841. Cited by: §1.
  3. M. Ali, A. Alqahtani, M. W. Jones and X. Xie (2019) Clustering and classification for time series data in visual analytics: a survey. IEEE Access PP (99), pp. 1–1. Cited by: §1.
  4. S. Arslanturk, M. R. Siadat, T. Ogunyemi, K. Killinger and A. Diokno (2016) Analysis of incomplete and inconsistent clinical survey data. Knowledge and Information Systems 46 (3), pp. 731–750. Cited by: §3.
  5. M. T. Bahadori and Z. C. Lipton (2019) Temporal-clustering invariance in irregular healthcare time series. CoRR abs/1904.12206. Cited by: §1, §4.1.2, §4.2.1, §5, Table 5.
  6. I. M. Baytas, C. Xiao, X. Zhang, F. Wang and J. Zhou (2017) Patient subtyping via time-aware lstm networks. In Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, Cited by: §1, §1, §1, §3, §4.1.1, §4.2.1, §5, Table 2, Table 5.
  7. M. Binkowski, G. Marti and P. Donnat (2018) Autoregressive convolutional neural networks for asynchronous time series. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80, pp. 579–588. Cited by: §4.2.2.
  8. A. Bora, E. Price and A. G. Dimakis (2018) AmbientGAN: generative models from lossy measurements. In 6th International Conference on Learning Representations, Cited by: §5.
  9. G. E. P. Box and G. M. Jenkins (2010) Time series analysis : forecasting and control. Journal of Time 31 (3). Cited by: §3.
  10. A. S. N. Brian J Wells and M. W. Kattan (2010) Strategies for handling missing data in electronic health record derived data. Generating Evidence and Methods 1 (3). Cited by: §4.1.1, §5.
  11. R. D. Camino, C. A. Hammerschmidt and R. State (2019) Improving missing data imputation with deep generative models. CoRR abs/1902.10666. Cited by: §4.1.1.
  12. W. Cao, D. Wang, J. Li, H. Zhou, L. Li and Y. Li (2018) BRITS: bidirectional recurrent imputation for time series. In Advances in Neural Information Processing Systems, pp. 6776–6786. Cited by: §1, §3, §4.1.1, §4.1.2, §5, Table 2, Table 3, Table 4, Table 5.
  13. M. Chai and Y. Zhu (2019) Research and application progress of generative adversarial networks. Computer Engineering. Cited by: §1.
  14. Z. Che, D. C. Kale, W. Li, M. T. Bahadori and Y. Liu (2015) Deep computational phenotyping. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 507–516. Cited by: §1, §4.1.1, §5.
  15. Z. Che, S. Purushotham, K. Cho, D. Sontag and Y. Liu (2018) Recurrent neural networks for multivariate time series with missing values. Scientific Reports 8 (1), pp. 6085. Cited by: §1, §1, §1, §3, §4.1.1, §4.1.2, §4.1.2, §4.2.1, §4.2.2, §5, Table 2, Table 3, Table 4, Table 5.
  16. Z. Che, S. Purushotham, M. G. Li, B. Jiang and Y. Liu (2018) Hierarchical deep generative models for multi-rate multivariate time series. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80, pp. 783–792. Cited by: §1, §4.2.2, §5, §6.2, Table 3, Table 4, Table 5.
  17. Z. Cheng, Y. Yang, W. Wang, W. Hu, Y. Zhuang and G. Song (2020) Time2Graph: revisiting time series modeling with dynamic shapelets. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, pp. 3617–3624. Cited by: §1.
  18. E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart and J. Sun (2016) Doctor AI: predicting clinical events via recurrent neural networks. In Proceedings of the 1st Machine Learning in Health Care, Vol. 56, pp. 301–318. Cited by: §5.
  19. E. Choi, M. T. Bahadori, E. Searles, C. Coffey, M. Thompson, J. Bost, J. Tejedor-Sojo and J. Sun (2016) Multi-layer representation learning for medical concepts. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1495–1504. Cited by: §1, §5.
  20. E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz and W. F. Stewart (2016) RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. In Advances in Neural Information Processing Systems, pp. 3504–3512. Cited by: §1, §5, §7.2.
  21. E. Choi, C. Xiao, W. F. Stewart and J. Sun (2018) MiME: multilevel medical embedding of electronic health records for predictive healthcare. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems, pp. 4552–4562. Cited by: §1, §4.1.1, §5.
  22. X. Chu, I. F. Ilyas, S. Krishnan and J. Wang (2016) Data cleaning: overview and emerging challenges. In International Conference on Management of Data, Cited by: §4.1, §5, Table 5.
  23. J. Chung, C. Gulcehre, K. H. Cho and Y. Bengio (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. Eprint Arxiv. Cited by: §3.
  24. R. Cui, M. Liu and A. D. N. Initiative (2019) RNN-based longitudinal analysis for diagnosis of alzheimer’s disease. Comput. Medical Imaging Graph. 73, pp. 1–10. Cited by: §1.
  25. S. Dash, A. Yale, I. Guyon and K. P. Bennett (2020) Medical time-series data generation using generative adversarial networks. In Artificial Intelligence in Medicine, Vol. 12299, pp. 382–391. Cited by: Table 5.
  26. W. Dong and W. Zhenjiang (2007) An efficient nearest neighbor classifier algorithm based on pre-classify. Computer Science. Cited by: §1.
  27. H. Duan, Z. Sun, W. Dong, K. He and Z. Huang (2020) On clinical event prediction in patient treatment trajectory using longitudinal electronic health records. IEEE Journal of Biomedical and Health Informatics 24 (7), pp. 2053–2063. Cited by: §1.
  28. M. W. Dünser, J. Takala, H. Ulmer, V. D. Mayr, G. Luckner, S. Jochberger, F. Daudel, P. Lepper, W. R. Hasibeder and S. M. Jakob (2009) Arterial blood pressure during early sepsis and outcome. Intensive Care Medicine 35 (7), pp. 1225–1233. Cited by: §4.1.
  29. I. Ezzine and L. Benhlima (2018) A study of handling missing data methods for big data. In 2018 IEEE 5th International Congress on Information Science and Technology (CiSt), Cited by: §7.1.
  30. N. Fazakis, G. Kostopoulos, S. Kotsiantis and I. Mporas (2020) Iterative robust semi-supervised missing data imputation. IEEE Access 8, pp. 90555–90569. Cited by: Table 5.
  31. J. Futoma, S. Hariharan and K. A. Heller (2017) Learning to detect sepsis with a multitask gaussian process RNN classifier. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 1174–1182. Cited by: §4.2.2, §4.2.2.
  32. P. J. García-Laencina, J. Sancho-Gómez and A. R. Figueiras-Vidal (2010) Pattern classification with missing data: a review. Neural Computing and Applications 19 (2), pp. 263–282. Cited by: §4.1.1, §5, §7.2.
  33. K. Godfrey (1986) Simple linear regression in medical research.. N Engl J Med 315 (26), pp. 1629–1636. Cited by: §1.
  34. A. Goldberger, L. Amaral, L. Glass, J. Hausdorff, P. C. Ivanov, R. Mark and H. E. Stanley (2000) PhysioBank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals. Circulation [Online] 101 (23), pp. pp. e215–e220. Cited by: §4.
  35. G. H. Golub (1970) Singular value decomposition and least squares solutions. Numerische Mathematik 14 (5), pp. 403–420. Cited by: §1.
  36. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu and D. Warde-Farley (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §5.
  37. A. Graves and J. Schmidhuber (2005) Framewise phoneme classification with bidirectional lstm and other neural network architectures. Neural Networks 18 (5–6), pp. 602–610. Cited by: §4.1.1.
  38. A. Guecioueur and F. J. Király (2020) Kernels for time series with irregularly-spaced multivariate observations. CoRR abs/2004.08545. Cited by: Table 5.
  39. S. Haykin (2001) Kalman filtering and neural networks. John Wiley and Sons, Inc. Cited by: §4.2.2.
  40. M. Herland, T. M. Khoshgoftaar and R. Wald (2013) Survey of clinical data mining applications on big data in health informatics. In International Conference on Machine Learning and Applications, ICMLA, pp. 465–472. Cited by: §3.
  41. S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780. Cited by: §3.
  42. S. Hong, Y. Xu, A. Khare, S. Priambada, K. O. Maher, A. Aljiffry, J. Sun and A. Tumanov (2020) HOLMES: health online model ensemble serving for deep learning models in intensive care units. In The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1614–1624. Cited by: §1, §4.1.1, §5.
  43. S. Hong, Y. Zhou, J. Shang, C. Xiao and J. Sun (2020) Opportunities and challenges of deep learning methods for electrocardiogram data: A systematic review. Comput. Biol. Medicine 122, pp. 103801. Cited by: §1.
  44. I. R. White, P. Royston and A. M. Wood (2011) Multiple imputation using chained equations: issues and guidance for practice. Statistics in Medicine 30 (2), pp. 377–399. Cited by: §4.1.1, §5.
  45. Y. N. Jane, H. K. Nehemiah and A. Kannan (2017) A bio-statistical mining approach for classifying multivariate clinical time series data observed at irregular intervals. Expert Syst. Appl. 78, pp. 283–300. Cited by: Table 5.
  46. S. L. Javan, M. M. Sepehri, M. L. Javan and T. Khatibi (2019) An intelligent warning model for early prediction of cardiac arrest in sepsis patients. Comput. Methods Programs Biomed. 178, pp. 47–58. Cited by: §1.
  47. Y. Jiao, K. Yang, S. Dou, P. Luo, S. Liu and D. Song (2020) TimeAutoML: autonomous representation learning for multivariate irregularly sampled time series. CoRR abs/2010.01596. Cited by: Table 5.
  48. J. Yoon, W. R. Zame and M. van der Schaar (2017) Estimating missing data in temporal data streams using multi-directional recurrent neural networks. IEEE Transactions on Biomedical Engineering PP, pp. 1–1. Cited by: §3, §4.1.1, Table 2, Table 3, Table 4.
  49. A. E. W. Johnson et al. (2016) MIMIC-III, a freely accessible critical care database. Scientific Data. Cited by: §1, §4, §6.1.
  50. B. Jonathan and J. Ian (1979) Linear regression with censored data. Biometrika (3), pp. 429–436. Cited by: §4.1.2, §4.1.
  51. O. Karaahmetoglu, F. Ilhan, I. Balaban and S. S. Kozat (2020) Unsupervised online anomaly detection on irregularly sampled or missing valued time-series data using LSTM networks. CoRR abs/2005.12005. Cited by: Table 5.
  52. M. Khalilia, S. Chakraborty and M. Popescu (2011) Predicting disease risks from highly imbalanced data using random forest. Bmc Medical Informatics and Decision Making 11 (1), pp. 51–51. Cited by: §1.
  53. H. Kim, G. Jang, H. Choi, M. Kim, Y. Kim and J. Choi (2017) Recurrent neural networks with missing information imputation for medical examination data prediction. In IEEE International Conference on Big Data and Smart Computing, pp. 317–323. Cited by: §1, §1, §4.1.1, §5, Table 2, Table 3, Table 4, Table 5.
  54. J. Kim, D. Tae and J. Seok (2020) A survey of missing data imputation using generative adversarial networks. In 2020 International Conference on Artificial Intelligence in Information and Communication, pp. 454–456. Cited by: §4.1.1, §5, §5, Table 5.
  55. Y. Kim and M. Chi (2018) Temporal belief memory: imputing missing data during RNN training. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, pp. 2326–2332. Cited by: §3.
  56. B. King and D. B. Rubin (1988) Multiple imputation for nonresponse in surveys. Journal of the American Statal Association 84 (406), pp. 612. Cited by: §7.2.
  57. D. P. Kingma and J. Ba (2015) Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, Cited by: §6.2.
  58. D. M. Kreindler and C. J. Lumsden (2006) The effects of the irregular sample and missing data in time series analysis. Nonlinear Dynamics, Psychology, and Life Sciences 10 (2), pp. 187–214. Cited by: §4.1.1, §5, §7.2.
  59. J. Lee, S. Sun, S. M. Yang, J. J. Sohn, J. Park, S. Lee and H. C. Kim (2019) Bidirectional recurrent auto-encoder for photoplethysmogram denoising. IEEE Journal of Biomedical and Health Informatics 23 (6), pp. 2375–2385. Cited by: §1.
  60. S. C. Li, B. Jiang and B. M. Marlin (2019) MisGAN: learning from incomplete data with generative adversarial networks. In 7th International Conference on Learning Representations, Cited by: §5, Table 5.
  61. S. C. Li and B. M. Marlin (2016) A scalable end-to-end gaussian process adapter for irregularly sampled time series classification. In Advances in Neural Information Processing Systems, pp. 1804–1812. Cited by: §4.2.2, §4.2.2.
  62. S. C. Li and B. M. Marlin (2020) Learning from irregularly-sampled time series: A missing data perspective. CoRR abs/2008.07599. Cited by: §2, §3.
  63. Y. Li, S. Liu, J. Yang and M. Yang (2017) Generative face completion. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 5892–5900. Cited by: §5.
  64. Z. C. Lipton, D. C. Kale and R. C. Wetzel (2016) Directly modeling missing data in sequences with rnns: improved classification of clinical time series. In Proceedings of the 1st Machine Learning in Health Care, Vol. 56, pp. 253–270. Cited by: §1, §1, §4.1.2, §4.1.2, §5, §5, Table 2, Table 3, Table 4, Table 5.
  65. Z. C. Lipton, D. C. Kale, C. Elkan and R. C. Wetzel (2016) Learning to diagnose with LSTM recurrent neural networks. In 4th International Conference on Learning Representations, ICLR, Cited by: §1, §4.1.2, §5.
  66. M. Liu, F. Stella, A. Hommersom, P. J. F. Lucas, L. Boer and E. Bischoff (2019) A comparison between discrete and continuous time bayesian networks in learning from clinical time series data with irregularity. Artif. Intell. Medicine 95, pp. 104–117. Cited by: Table 5.
  67. S. Liu, O. Bousquet and K. Chaudhuri (2017) Approximation and convergence properties of generative adversarial learning. In Advances in Neural Information Processing Systems, pp. 5545–5553. Cited by: §5.
  68. V. Liu, G. J. Escobar, J. D. Greene, J. Soule, A. Whippy, D. C. Angus and T. J. Iwashyna (2014) Hospital deaths in patients with sepsis from 2 independent cohorts. Jama 312 (1), pp. 90–92. Cited by: §4.1.
  69. Y. Luo, X. Cai, Y. Zhang, J. Xu and X. Yuan (2018) Multivariate time series imputation with generative adversarial networks. In Advances in Neural Information Processing Systems, pp. 1603–1614. Cited by: §4.1, §4, §5, Table 3, Table 4, Table 5.
  70. F. Ma, R. Chitta, J. Zhou, Q. You, T. Sun and J. Gao (2017) Dipole: diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1903–1911. Cited by: §1, §5, §7.2.
  71. R. Mazumder, T. Hastie and R. Tibshirani (2010) Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research 11, pp. 2287–2322. Cited by: §7.2.
  72. Z. Mei, X. Zhao and H. Chen (2018) A distributed descriptor characterizing structural irregularity of EEG time series for epileptic seizure detection. In 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 3386–3389. Cited by: Table 5.
  73. C. Molnar (2020) Interpretable machine learning: a guide for making black box models explainable. online. Cited by: §7.2.
  74. T. Ogden (2002) Wavelet methods for time series analysis (book review). Journal of the American Statistical Association (March). Cited by: §4.1.1, §7.2.
  75. S. Parveen and P. D. Green (2001) Speech recognition with missing data using recurrent neural nets. In Advances in Neural Information Processing Systems, pp. 1189–1195. Cited by: §4.1.1.
  76. S. Pederson (1998) Hidden Markov and other models for discrete-valued time series. Technometrics 40 (3), pp. 263–263. Cited by: §4.2.2.
  77. A. J. Perotte, R. Ranganath, J. S. Hirsch, D. M. Blei and N. Elhadad (2015) Risk prediction for chronic kidney disease progression using heterogeneous electronic health record data and time series analysis. J. Am. Medical Informatics Assoc. 22 (4), pp. 872–880. Cited by: §1.
  78. K. Rehfeld, N. Marwan, J. Heitzig and J. Kurths (2011) Comparison of correlation analysis techniques for irregularly sampled time series. Nonlinear Processes in Geophysics. Cited by: §4.1.1.
  79. M. A. Reyna, C. Josef, R. Jeter, S. P. Shashikumar, M. B. Westover, A. Sharma, S. Nemati and G. D. Clifford (2019) Early prediction of sepsis from clinical data: the PhysioNet/Computing in Cardiology Challenge 2019 (version 1.0.0). PhysioNet. Cited by: §4.
  80. X. Shi, Z. Chen, H. Wang, D. Yeung, W. Wong and W. Woo (2015) Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems, pp. 802–810. Cited by: §1, §1.
  81. B. Shickel, P. Tighe, A. Bihorac and P. Rashidi (2018) Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J. Biomed. Health Informatics 22 (5), pp. 1589–1604. Cited by: §1.
  82. S. N. Shukla and B. M. Marlin (2018) Modeling irregularly sampled clinical time series. CoRR abs/1812.00531. Cited by: §3.
  83. S. N. Shukla and B. M. Marlin (2019) Interpolation-prediction networks for irregularly sampled time series. In International Conference on Learning Representations, Cited by: §1, §3, §4.2.2, §5, Table 2, Table 5.
  84. I. Silva, G. Moody, D. J. Scott, L. A. Celi and R. G. Mark (2012) Predicting in-hospital mortality of ICU patients: the PhysioNet/Computing in Cardiology Challenge 2012. In Computing in Cardiology, pp. 245–248. Cited by: §1, §4.1.1, §4, §5, §6.1.
  85. B. P. Singh, I. Deznabi, B. Narasimhan, B. Kucharski, R. Uppaal, A. Josyula and M. Fiterau (2019) Multi-resolution networks for flexible irregular time series modeling (multi-fit). CoRR abs/1905.00125. Cited by: Table 5.
  86. S. Srivastava, P. Sen and B. Reinwald (2020) Forecasting in multivariate irregularly sampled time series with missing values. CoRR abs/2004.03398. Cited by: §3.
  87. G. Stiglic, P. Kocbek, N. Fijacko, M. Zitnik, K. Verbert and L. Cilar (2020) Interpretability of machine learning-based prediction models in healthcare. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 10 (5). Cited by: §7.2.
  88. Q. Tan, A. J. Ma, H. Deng, V. W. Wong, Y. Tse, T. C. Yip, G. L. Wong, C. L. J. Yuet, F. K. Chan and P. C. Yuen (2018) A hybrid residual network and long short-term memory method for peptic ulcer bleeding mortality prediction. In AMIA 2018, American Medical Informatics Association Annual Symposium, Cited by: §4.1.
  89. Q. Tan, A. J. Ma, M. Ye, B. Yang, H. Deng, V. W. Wong, Y. Tse, T. C. Yip, G. L. Wong, J. Y. Ching, F. K. Chan and P. C. Yuen (2019) UA-CRNN: uncertainty-aware convolutional recurrent neural network for mortality risk prediction. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 109–118. Cited by: §4.1.
  90. Q. Tan, M. Ye, B. Yang, S. Liu, A. J. Ma, T. C. Yip, G. L. Wong and P. Yuen (2020) DATA-GRU: dual-attention time-aware gated recurrent unit for irregular multivariate time series. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, pp. 930–937. Cited by: §1, §3, §4.2.1, §5, Table 2, Table 5.
  91. X. Tang, H. Yao, Y. Sun, C. C. Aggarwal, P. Mitra and S. Wang (2020) Joint modeling of local and global temporal dynamics for multivariate time series forecasting with missing values. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, pp. 5956–5963. Cited by: §1, §4.1.2, §4.2.1, §4.2.2, §5, Table 2, Table 5.
  92. S. Tirunagari, S. C. Bull and N. Poh (2016) Automatic classification of irregularly sampled time series with unequal lengths: A case study on estimated glomerular filtration rate. In 26th IEEE International Workshop on Machine Learning for Signal Processing, pp. 1–6. Cited by: Table 5.
  93. V. Tresp and T. Briegel (1997) A solution for missing data in recurrent neural networks with an application to blood glucose prediction. In Advances in Neural Information Processing Systems, pp. 971–977. Cited by: §4.1.1, §5.
  94. L. Wang, H. Wang, Y. Song and Q. Wang (2019) MCPL-based FT-LSTM: medical representation learning-based clinical prediction model for time series events. IEEE Access 7, pp. 70253–70264. Cited by: Table 5.
  95. R. J. Williams and D. Zipser (1989) A learning algorithm for continually running fully recurrent neural networks. Neural Computation 1 (2), pp. 270–280. Cited by: §4.1, §5, Table 5.
  96. X. Wu (2018) RESTFul: resolution-aware forecasting of behavioral time series data. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Cited by: §1, §1.
  97. D. Xu, W. Cheng, B. Zong, D. Song and X. Zhang (2020) Tensorized lstm with adaptive shared memory for learning trends in multivariate time series. Proceedings of the AAAI Conference on Artificial Intelligence 34 (2), pp. 1395–1402. Cited by: §1.
  98. Y. Xu, S. Biswal, S. R. Deshpande, K. O. Maher and J. Sun (2018) RAIM: recurrent attentive and intensive model of multimodal patient monitoring data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Y. Guo and F. Farooq (Eds.), pp. 2565–2573. Cited by: §4.1.
  99. B. Yan, H. Wang, X. Wang and Y. Zhang (2017) An accurate saliency prediction method based on generative adversarial networks. In 2017 IEEE International Conference on Image Processing (ICIP), Cited by: §1.
  100. L. Yan, H. Zhang, J. Goncalves, et al. (2020) An interpretable mortality prediction model for COVID-19 patients. Nature Machine Intelligence 2. Cited by: §4, §6.1.
  101. H. Yao, X. Tang, H. Wei, G. Zheng and Z. Li (2019) Revisiting spatial-temporal similarity: a deep learning framework for traffic prediction. Proceedings of the AAAI Conference on Artificial Intelligence 33, pp. 5668–5675. Cited by: §1.
  102. J. Yoon, J. Jordon and M. van der Schaar (2018) GAIN: missing data imputation using generative adversarial nets. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80, pp. 5675–5684. Cited by: §4.1.1, §5, Table 5.
  103. W. Young, G. Weckman and W. Holland (2011) A survey of methodologies for the treatment of missing values within datasets: limitations and benefits. Theoretical Issues in Ergonomics Science 12 (1), pp. 15–43. Cited by: §1.
  104. H. Yu, N. Rao and I. S. Dhillon (2016) Temporal regularized matrix factorization for high-dimensional time series prediction. In Advances in Neural Information Processing Systems, pp. 847–855. Cited by: §7.2.
  105. L. Yu, W. Zhang, J. Wang and Y. Yu (2017) SeqGAN: sequence generative adversarial nets with policy gradient. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 2852–2858. Cited by: §5.
  106. Y. Z. Yu and J. Hui (2017) A deep learning method based on hybrid auto-encoder model. IEEE Journal of Biomedical and Health Informatics, pp. 1100–1104. Cited by: §1.
  107. Y. Wang, K. Lin, Y. Qi, Q. Lian, S. Feng, G. Pan and Z. Wu (2018) Estimating brain connectivity with varying-length time lags using a recurrent neural network. IEEE Transactions on Biomedical Engineering 65, pp. 1953–1963. Cited by: §1.
  108. G. Zhang, E. Tu and D. Cui (2017) Stable and improved generative adversarial nets (GANS): A constructive survey. In 2017 IEEE International Conference on Image Processing, pp. 1871–1875. Cited by: §4.1.1.
  109. G. Zhang (2009) A modified SVM classifier based on RS in medical disease prediction. International Journal of Advanced Research in Computer Engineering and Technology, pp. 144–147. Cited by: §1.
  110. H. Zhang and D. Woodruff (2018) Medical missing data imputation by stackelberg gan. Carnegie Mellon University. Cited by: §4.1.1, Table 3, Table 4, Table 5.
  111. Y. Zhou, S. Hong, J. Shang, M. Wu, Q. Wang, H. Li and J. Xie (2019) K-margin-based residual-convolution-recurrent neural network for atrial fibrillation detection. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pp. 6057–6063. Cited by: §1.