# Patient Subtyping with Disease Progression and Irregular Observation Trajectories

## Abstract

Patient subtyping based on temporal observations can lead to significantly nuanced subtyping that acknowledges the dynamic characteristics of diseases. Existing methods for subtyping trajectories treat the evolution of clinical observations as a homogeneous process or employ data available at regular intervals. In reality, diseases may have transient underlying states and a state-dependent observation pattern. In our paper, we present an approach to subtype irregular patient data while acknowledging the underlying progression of disease states. Our approach consists of two components: a probabilistic model to determine the likelihood of a patient’s observation trajectory and a mixture model to measure similarity between asynchronous patient trajectories. We demonstrate our model by discovering subtypes of progression to hemodynamic instability (requiring cardiovascular intervention) in a patient cohort from a multi-institution ICU dataset. We find three primary patterns: two of which show classic signs of decompensation (rising heart rate with dropping blood pressure), with one of these showing a faster course of decompensation than the other. The third pattern has transient period of low heart rate and blood pressure. We also show that our model results in a 13% reduction in average cross-entropy error compared to a model with no state progression when forecasting vital signs.

## 1 Introduction

Patient subtyping is an important topic in medical informatics. Subtyping can be used to make improved predictions, understand disease etiologies and healthcare practices, plan customized treatments, and design efficient clinical trials [\citeauthoryearMarlin et al.2012, \citeauthoryearChang, Clark, and Weiner2011, \citeauthoryearGundlapalli et al.2008, \citeauthoryearKohane2011]. The medical community has begun to recognize that many diseases have heterogeneous underlying mechanisms and phenotypes [\citeauthoryearKeulenaer and Brutsaert2009, \citeauthoryearSuratt and Parsons2017]. With increasing amounts of patient data being collected in electronic medical records and large medical databases, there is a tremendous opportunity to mine data and identify patient subtypes to better patient care and improve outcomes. Patient subtyping can also have a direct economic impact on the healthcare system by characterizing current patient-provider interactions and suggesting optimal resource management. Today’s medical databases have patients’ longitudinal trajectories of observations and interventions charted along with their time stamps. Subtyping patients based on their temporal observation trajectories could lead to significantly nuanced subtyping that acknowledges the dynamic characteristics of diseases. Traditionally, patient subtyping has been based either on aggregates of patient’s entire time course of observations or summaries over blocks of fixed time intervals. These approaches result in vectors of fixed size for all patients, which are then amenable to well-known clustering approaches such as K-means clustering, hierarchichal clustering etc [\citeauthoryearVranas et al.2017, \citeauthoryearMarlin et al.2012, \citeauthoryearCohen et al.2010]. Native data in medical records, however, are almost always incomplete and irregularly observed over varying time intervals. The missingness of data is typically tackled by imputation to produce a complete dataset of constant dimension. Although practically useful, clustering with summary-based fixed dimensional dataset ignores the rich information in the temporal patterns of clinical observations/interventions. Methods that subtype entire observation trajectories are being developed in recent times [\citeauthoryearDoshi-Velez, Ge, and Kohane2014, \citeauthoryearSchulam, Wigley, and Saria2015, \citeauthoryearSaria, Koller, and Penn2010]. Existing methods, however, either treat the evolution of clinical markers as one homogeneous process or employ data available at regular intervals; in reality diseases may have transient underlying states and a state-dependent observation pattern. In this paper, we present an approach to subtype raw unsummarized data from electronic medical records, while acknowledging the underlying progression of disease states. Our approach consists of two components: a probabilistic model to determine the likelihood of a patient’s observation trajectory and a mixture model to measure similarity between asynchronous patient trajectories. By constructing a probabilistic model that acknowledges the progression of disease states and works with irregular data, we evaluate a likelihood for any observation trajectory. Subtyping is then accomplished by using a mixture model to cluster the observation trajectories given their likelihoods.

## 2 Methodology

We begin by describing the structure of the dataset that is available in medical records and the resulting challenges it presents for subtyping. Consider that we have data from patients, each associated with their time course of observations given by . Here, is the vector of observations at time and is the trajectory of observation vectors of patient . The length of each is , where is the number of features that could be observed. For example, if we have the heart rate, blood pressure, and respiratory rate measurements of patients, would be three. Usually, only a subset of the features are actually observed at any time, with the specific features observed being different at each time point. This results in an incomplete observation set with many feature observations missing. As such, patient observations are made when appropriate—when the patient appears for a routine check-up or when clinicians ask for specific tests/measurements. As a result, the trajectories of patient observations do not synchronize in time or the type of features that are observed.

### 2.1 Patient disease trajectory model

Disease evolution is fundamentally a continuous process: patient’s disease state transitions can happen at any time with the chance of state transition between any two time points higher if the time interval is longer. We thus use a continuous-time Markov chain to model the evolution of a patient’s disease state. The disease state of patient at time is denoted as and takes one of a set of discrete values. The disease state is naturally hidden, i.e., we never get to observe the actual disease state. In fact, the precise definition of the disease states is apriori unknown. The disease states can be learnt from data in an unsupervised manner and subsequently the states interpreted based on the parameters that describe the states. The observation vectors are surrogates of the underlying disease state . To reflect this behaviour in our model, we model the observations by a conditionally independent probability model , where observation is independent of all other observations given the current disease state . Overall, our model for the patient’s observation trajectory can be described by the continuous-time hidden Markov model (CT-HMM) shown in Figure 1. A CT-HMM models the temporal evolution of disease states and the state-dependent observation vectors.

#### Continuous time Markov chains

As mentioned in the above paragraph, we model the evolution of disease state with a continuous-time Markov chain. A continuous-time Markov chain is a continuous-time process on a state-space (here the different disease states) satisfying the Markov property. This means that if is all the information about the history of the disease state up to time and , then Z(t) is independent of all , where , given Z(s). Mathematically, this can be expressed as

(2.1) |

Further, we assume the process to be time-homogeneous, so that

(2.2) |

Equations 2.1 and 2.2 define a time-homogeneous continuous-time Markov chain and model the disease state evolution in our model. We allow different disease states in our model. The transition probability of moving from state to state over time in a continuous-time Markov chain is given by

(2.3) |

where is the generator matrix of the Markov process and is the matrix exponential. The probability of the initial state is parameterized by and given by

(2.4) |

#### Model formulation

We model the state-observation trajectory of a patient by the continuous-time hidden Markov model shown in Figure 1. The probability of the trajectory of patient is given by

(2.5) |

Throughout this paper, we use the notation l:r to denote all values ranging from to (inclusive of both boundaries). Due to the conditional independence property of the CT-HMM in Figure 1, the joint probability of the states and observations can be written as

(2.6) |

A common modeling choice that we incorporate in our model is that the features are conditionally independent given the corresponding disease state, i.e.,

(2.7) |

The choice of the conditional distribution of the observation given the disease state can be made as per the context. If recorded data only indicates whether a certain feature was observed or not, then the individual features can be modeled by a Bernoulli random variable. An example of this with healthcare data is when ICD9 code assignments are recorded along with their time stamps [\citeauthoryearWang, Sontag, and Wang2014]. In contrast, if the magnitude of feature observations are available, then continuous distributions like the Gaussian or the log-normal distributions could be used. Another approach to work with numerical values is to bin the values and then model the probability of observing values in the bins through a categorical distribution. This is the approach we adopt in our work here. Specifically, if we assume that a feature can fall into one of bins, the conditional probability of the th bin is given by

(2.8) |

where refers to the disease state at time and are the parameters of the categorical distribution of the feature given disease state . By construction . In case of missing feature observation, we marginalize that observation from the model.

The disease states in a patient’s observation timeline are unknown. Thus, we can quantify the patient’s observation trajectory by marginalizing the disease state out of Equation 2.6, giving

(2.9) |

This is we refer to as the likelihood under the patient’s disease trajectory model. A disease trajectory model is parameterized by , , and . Equipped with a likelihood model of the patient’s observation trajectory, we are now in a position to describe a measure of similarity between trajectories of different patients. Patients whose observation trajectories are more probable under a disease trajectory model than other trajectory models can be considered to be similar trajectories. This measure of similarity works even in cases when patients’ observation trajectories do not match in the time stamps of observations and the features observed. In other words, the patient observation trajectory likelihood based similarity metric allows comparison between trajectories when the patient observations are irregular, incomplete and when the observation trajectories across patients are asynchronous.

### 2.2 Mixture model

We subtype patients into different clusters using the mixture model. Consider that we are interested in identifying subtypes among the patients. In a mixture model, the probability of a patient observation trajectory is the weighted sum of the probability of observing the trajectory in each of the mixture components, where the weights sum to 1. Mathematically, the probability of a trajectory in a mixture model is expressed as

(2.10) |

where is the probability of observing the trajectory given that the patient belongs to subtype and is the prior probability of subtype . Thus, the joint distribution of patient ’s subtype assignment and his/her trajectory is given by

(2.11) |

Here and is evaluated using the disease trajectory model with parameters corresponding to subtype . To infer the subtypes, we identify patient subtype assignments and subtype parameters so as to maximize the joint probability of the subtype assignment and the conditional observation trajectory probability over all patients. Mathematically, assuming independence of patients, the objective used to identify the subtypes is

(2.12) |

With the above objective, each patient gets assigned the subtype with the highest posterior probability. One could instead also use the maximum likelihood of the mixture model (2.10) as the objective, producing a soft clustering of patients. A natural choice for the prior probabilities of subtype assignment is to assume a uniform distribution () for each , i.e., each patient is apriori equally likely to belong to any subtype. This translates into patients in the training data being assigned into subtypes and subtype parameters learnt so as to maximize the product of the conditional likelihood of all patient trajectories. Of course, in cases where the modeler would like to relax this assumption, the prior distribution of the subtype assignment can also be learnt from available data. Once the optimal parameters are learnt from the training data, for a new patient not in the training data, the subtype is identified as the subtype with the highest posterior probability for that patient.

(2.13) |

For m=1 to M: | |||

(2.14) |

## 3 Subtype learning

We now present the different steps involved in training the subtyping model. We train our subtyping model by maximizing the objective (Equation 2.12) using a coordinate ascent optimization algorithm. The algorithm consists of two alternating steps: Step 1, when each patient trajectory gets assigned to the subtype with the highest posterior probability, and Step 2, when all patients assigned to a subtype are used to optimize the parameters of that subtype. The precise mathematical forms of the two steps are given in Algorithm 1. In Step 2 of the above algorithm, parameters of each subtype are learnt by training the disease trajectory model described in Section 2.1. The solution of each maximization problem in Step 2 is a maximum likelihood estimate of the subtype parameters with the data assigned to that subtype. As was explained in Section 2.1, the likelihood of a patient trajectory can be realized by marginalizing the hidden disease states from the joint probability distribution of the observations and the disease states (Equation 2.9). Thus, the optimization in Step 2 can be solved with the expectation maximization algorithm. The E- and M- steps in optimizing equation 2.14 are given in Algorithm 2.

(3.1) |

(3.2) |

The E-Step and M-Step in Algorithm 2 can be simplified for our construction of the patient disease trajectory model. The expectation of the first term of the RHS in Equation 3.1 is given by

(3.3) |

where

(3.4) |

is the number of transitions between states and in time and is the duration of time spent in state during time interval . A detailed derivation of the above expression can be found in [\citeauthoryearMetzner, Horenko, and Schütte2007]. The second term of the RHS in Equation 3.1 can be written as

(3.5) |

where is the posterior probability of disease state for patient at time point , is an indicator of feature not missing at time point and is an indicator function for observation belonging to the discrete bin.

The M-step in Equation 3.2 results in the following closed-form expressions for the parameters of the observation model and the initial probability vector:

(3.6) |

(3.7) |

The generator matrix can be updated in each iteration using the closed-form solution [\citeauthoryearMetzner, Horenko, and Schütte2007, \citeauthoryearWang, Sontag, and Wang2014]:

(3.8) |

The specific formulae for the involved terms are:

(3.9) |

(3.10) |

(3.11) |

(3.12) |

Here, is the transition probability of moving from state to in time interval (Equation 2.3). The evaluation of posterior probabilities and is done using the forward-backward algorithm for computing the posterior probabilities in hidden Markov models. We give only the final results here; detailed derivations are in [\citeauthoryearBishop2007]. The approach consists of sequential updates of the form:

(3.13) |

and

(3.14) |

where and . Note in the above equations is the coefficient that normalizes the RHS in Equation 3.13. The marginal likelihood , the posterior probability , and the bivariate marginal posterior probability can then be obtained as

(3.15) |

(3.16) |

and

(3.17) |

## 4 Results

### 4.1 Data summary

We applied our subtyping model to a cohort of hemodynamically unstable patients obtained from the eResearch Institute [\citeauthoryearMcShea et al.2010]. The patients were identified to be hemodynamically unstable based on charted clinical interventions. Specifically, patients were deemed unstable if administered inotropic or vasopressor medications, or atleast 2.4L of fluid (colloid or crystalloid) over 8 hours, or administered packed red blood cells (PRBC). The criteria for instability were developed based on a strong consensus among a group of experienced intensive care physicians. The dataset consists of data from 6972 patients, each with an episode of less than 24 hours, during which observations of various features were made at irregular time points. For the present analysis, we only consider heart rate and noninvasive systolic blood pressure measurements. Interventions were given after all heart rate and blood pressure observations. The allowed range of heart rate values we consider is 40–150 beats/min and that of systolic blood pressure is 40–200 mmHg. All outliers are treated as missing observations. The observations for both features are discretized into five equal-sized bins. We tried a few different bin sizes and found the results to be similar. The heart rate and SBP bins are summarized in Table 1.

Bin | Heart rate | Systolic blood pressure |
---|---|---|

1 | 40.0–62.0 | 40.0–72.0 |

2 | 62.0–84.0 | 72.0–104.0 |

3 | 84.0–106.0 | 104.0–136.0 |

4 | 106.0–128.0 | 136.0–168.0 |

5 | 128.0–150.0 | 168.0–200.0 |

### 4.2 Inferring subtypes of hemodynamically unstable patients

We use our subtyping algorithm to identify 4 clusters using the time course of heart rate and blood pressure measurements and also the intervention time. Thus, effectively, we have three features in the analysis: heart rate, systolic blood pressure, and an indicator of treatment administration. As indicated earlier, the total length of the observation window for all patients is less than 24 hours and then a treatment administered. We force the administration of an intervention to be the indicator of the final disease state in our model. This is achieved by constraining the observation model:

(4.1) |

As a result, all prior disease states can be viewed relative to the final intervention state. By anchoring the intervention to one of the disease states, not only do we obtain a disease state which should have a similar meaning across patient subtypes, but it also helps in the interpretation of other disease states in relation to the anchored state. In other settings, where either such treatment information is not part of the available dataset or if none of the observations are candidates for anchoring, a standard application of our subtyping algorithm with all feature types allowed to be observed in each disease state would be appropriate. We set the number of allowed disease states to be 4 and impose a uniform prior distribution on the subtype of all patients. We allow the initial probability vector to be nonzero for all four disease states. Further, for the present analysis, we only allow instantaneous state transitions to the next state, although this assumption can be relaxed in general. The constraint is reflected by a generator matrix where only the diagonal and the first upper diagonal elements are nonzero. The optimal cluster parameters are learnt by applying Algorithm 1.

Each of the parameters of the subtypes reflect something about the expected trajectory of patients in that subtype. indicates the prior probabilities of the initial disease state of any patient in subtype , is the generator matrix of subtype , thereby defining the temporal evolution of the disease states, and defines the conditional distributions of the heart rate and blood pressure values. The progression of disease trajectories assuming the patients arrive in state 1 of the inferred subtypes are shown in Figure 2. To compute the progression trajectory, we calculate the expected duration of the four disease states and compute the expected values of the features in each disease state. The expected duration of state is since the holding time of each state is exponentially distributed with parameter . We find three primary patterns: subtype 2, 3, and 4 show classic signs of decompensation (rising heart rate with dropping blood pressure), with subtype 4 showing a faster course of decompensation than the other two. Subtype 1 has transient period of low heart rate and blood pressure. In general, across all subtypes we also observe that the state 3 has the shortest duration among all disease states. This corroborates a common observation of quick deterioration—a period of acute changes over a short time—before intervention among ICU patients.

### 4.3 Model evaluation by prediction

We evaluate our subtyping algorithm through quantitative tests of prediction on future observations. For this purpose, we consider the cohort of hemodynamically unstable patients utilized above and split it into a training set and a test set in a 80:20 proportion. This results in 5577 patients in the training set and 1395 patients in the test set. We only consider the vital signs (heart rate and blood pressure values) for this analysis and do not include the indicators of intervention. The evaluation of prediction accuracy is performed as follows. First we use the 5577 patients to learn four subtypes. Once again a uniform prior distribution is imposed on the subtype of each patient. Having learnt the underlying subtype parameters, for each patient in the test data, we use the first 70% of the timepoints when observations are made to identify the patient’s subtype. Knowing the patient’s subtype, we predict the observations at the remaining 30% of the timepoints. We compare the predicted heart rate and blood pressure bin probabilities with the available data points to compute the average cross-entropy error over the latter 30% observations. The total forecasting error is obtained by computing the mean of the cross-entropy error over all patients in the test set. The estimates of forecasting error and their standard error with different number of disease states are presented in Table 2. We use a paired t-test to test for significance in the difference in forecasting error between models with multiple disease states and a single disease state model. The p-values for all models were less than 0.05.

No. of disease states | Mean cross-entropy error |
---|---|

1 | |

2 | |

3 | |

4 | |

5 |

We see that there is a reduction in the forecasting error in predicting patient trajectories with 4 disease states as compared to subtyping models with only one disease state. These results demonstrate the potential of improved subtyping and more accurate forecasting by subtyping patient disease trajectories while acknowledging the underlying disease progression. In addition, we obtain forecasting error with a 4-disease-state patient disease trajectory model (no subtyping) trained and tested using the same datasets, showing that subtyping of patients indeed results in improvement in accuracy compared to training one model for all patients.

## 5 Conclusions

Subtyping based on temporal observations can lead to significantly nuanced subtyping that acknowledges the dynamic characteristics of diseases. In this paper, we present an approach to subtype irregular observation trajectories while acknowledging the underlying disease state of the patients. Subtyping of patient observation trajectories with transient disease states enables improved predictions and a better understanding of disease etiologies and patient care practices.

With increasing amounts of patient data being collected in medical records, subtyping tools can help clinicians discover hitherto unobserved disease patterns and identify optimal treatment protocols. A natural direction of future work would be to learn subtypes of observation trajectories with a richer class of state-conditioned observation models, e.g., models with polynomial basis.

### References

- Bishop, C. M. 2007. Pattern Recognition and Machine Learning. Springer.
- Chang, H. Y.; Clark, J. M.; and Weiner, J. P. 2011. Morbidity trajectories as predictors of utilization: multi-year disease patterns in Taiwan’s national health insurance program. Medical Care 49(10):918–923.
- Cohen, M. J.; Grossman, A. D.; Morabito, D.; Knudson, M. M.; Butte, A. J.; and Manley, G. T. 2010. Identification of complex metabolic states in critically injured patients using bioinformatic cluster analysis. Critical Care 14(1):R10.
- Doshi-Velez, F.; Ge, Y.; and Kohane, I. 2014. Comorbidity clusters in Autism Spectrum Disorders: an electronic health record time-series analysis. Pediatrics 133(1):e54–63.
- Gundlapalli, A. V.; South, B. R.; Phansalkar, S.; Kinney, A. Y.; Shen, S.; Delisle, S.; Perl, T.; and Samore, M. H. 2008. Application of natural language processing to VA electronic health records to identify phenotypic characteristics for clinical and research purposes. Summit on Translational Bioinformatics 36–40.
- Keulenaer, G. W. D., and Brutsaert, D. L. 2009. The heart failure spectrum: time for a phenotype-oriented approach. Circulation 119(24):3044–3046.
- Kohane, I. S. 2011. Using electronic health records to drive discovery in disease genomics. Nature Reviews Genetics 12:417–428.
- Marlin, B. M.; Kale, D. C.; Khemani, R. G.; and Wetzel, R. C. 2012. Unsupervised pattern discovery in electronic health care data using probabilistic clustering models. IHI ’12 Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium 389–398.
- McShea, M.; Holl, R.; Badawi, O.; Riker, R. R.; and Silfen, E. 2010. The eICU research institute - a collaboration between industry, health-care providers, and academia. IEEE Eng Med Biol Mag 18–25.
- Metzner, P.; Horenko, I.; and Schütte, C. 2007. Generator estimation of Markov jump processes based on incomplete observations equidistant in time. Physical Review E 76(6):066702.
- Saria, S.; Koller, D.; and Penn, A. 2010. Learning individual and population level traits from clinical temporal data. Proceedings of Neural Information Processing Systems (NIPS), Predictive Models in Personalized Medicine Workshop.
- Schulam, P.; Wigley, F.; and Saria, S. 2015. Clustering longitudinal clinical marker trajectories from electronic health data: Applications to phenotyping and endotype discovery. AAAI ’15 2956–2964.
- Suratt, B. T., and Parsons, P. E. 2017. In ARDS, Heterogeneity = Opportunity. Chest 151(4):731–732.
- Vranas, K. C.; Jopling, J. K.; Sweeney, T. E.; Ramsey, M. C.; Milstein, A. S.; Slatore, C. G.; Escobar, G. J.; and Liu, V. X. 2017. Identifying distinct subgroups of ICU patients: A machine learning approach. Critical care Medicine 45(10):1607–1615.
- Wang, X.; Sontag, D.; and Wang, F. 2014. Unsupervised learning of disease progression models. KDD’14 7.