Development of a Common Patient Assessment Scale across the Continuum of Care: A Nested Multiple Imputation Approach\thanksrefT1

Development of a Common Patient Assessment Scale across the Continuum of Care: A Nested Multiple Imputation Approach\thanksrefT1


Evaluating and tracking patients’ functional status through the post-acute care continuum requires a common instrument. However, different post-acute service providers such as nursing homes, inpatient rehabilitation facilities and home health agencies rely on different instruments to evaluate patient’s functional status. These instruments assess similar functional status domains, but they comprise different activities, rating scales and scoring instructions. These differences hinder the comparison of patients’ assessments across health care settings. We propose a two-step procedure that combines nested multiple imputation with the multivariate ordinal probit (MVOP) model to obtain a common patient assessment scale across the post-acute care continuum. Our procedure imputes the unmeasured assessments at multiple assessment dates, and enables evaluation and comparison of the rates of functional improvement experienced by patients treated in different health care settings using a common measure. To generate multiple imputations of the unmeasured assessments using the MVOP model, a likelihood-based approach that combines the EM algorithm and the bootstrap method as well as a fully Bayesian approach using the data augmentation algorithm are developed. Using a dataset on patients who suffered a stroke, we simulate missing assessments and compare the MVOP model to existing methods for imputing incomplete multivariate ordinal variables. We show that, for all of the estimands considered, and in most of the experimental conditions that were examined, the MVOP model appears to be superior. The proposed procedure is then applied to patients who suffered a stroke and were released from rehabilitation facility to skilled nursing facilities or home.


arXiv:0000.0000 \startlocaldefs \endlocaldefs


A Nested Multiple Imputation Approach \thankstextT1Supported in part by the Changing Long-Term Care in America project funded by the National Institute on Aging (P01AG027296).



Data augmentation \kwdEM algorithm \kwdMissing data \kwdNested Multiple imputation \kwdMultivariate ordinal probit model \kwdSlice sampler.

1 Introduction

1.1 Overview

To track and evaluate patients through the post-acute care continuum a common standardized evaluation tool is needed. Current evaluation tools have been largely developed within each type of health care provider, and cannot be easily compared. In inpatient rehabilitation facilities (IRFs), patients’ functional status is evaluated by the Functional Independence Measure (FIM). After being discharged from IRFs, the functional status of patients who stay in skilled nursing facilities (SNFs) (G1) is collected using the Minimum Data Set (MDS), while the Outcome and Assessment Information Set (OASIS) is collected for patients who receive home health care provided by home health agencies (G2). All of these assessments examine similar functional capabilities (e.g., eating, grooming, dressing, etc.), but the specific instruments, rating scales and instructions for scoring the different activities vary between these post-acute settings. Thus, it is difficult to evaluate and compare the rates of functional improvement experienced by patients treated in the different health care settings.

The Continuity Assessment Record and Evaluation item set is a standardized evaluation tool that was developed for use at acute hospital discharge and at post-acute care admission and discharge (Gage et al., 2012). This tool is intended to be a common evaluation tool for evaluating patients across the continuum of post-acute care, and considerable resources were invested in its development. However, implementing new instruments in all post-acute care settings may result in additional investments in administration and training as well as in changes to reimbursement system (Li et al., 2017). Moreover, adopting new instruments would require translating past functional outcome data so that comparison to existing outcome data is possible.

Equating setting-specific instruments so that functional status scores from one instrument could be used interchangeably with ones from another instrument is a possible approach to obtain a common evaluation tool (Kolen and Brennan, 2004; Dorans, Pommerich and Holland, 2007; von Davier, 2010). Linking and equating scores across different standardized assessments has been a major focus in the field of educational testing for the past 90 years (see Dorans, Pommerich and Holland, 2007, chapter 2, for details). Score equating methods have been recently used in health outcomes research. The conversion table method (Velozo, Byers and Joseph, 2007) was used to equate FIM assessments with MDS assessments. Conversion table equates the sum of individual item scores, also referred to as the total score, by matching on the latent functional status estimated from two different instruments using Item Response Theory models. This method was also used to equate scores from two physical functioning scales (ten Klooster et al., 2013), as well as two depression scales (Fischer et al., 2011). Conversion table may result in an invalid statistical procedure when further analysis is performed using the imputed scores, because it ignores the variability from the estimation and the imputation processes (Gu and Gutman, 2017a). Furthermore, a data set that comprises contemporaneous MDS and OASIS assessments is required in order to equate MDS and OASIS instruments. However, MDS and OASIS assessments are not jointly observed for patients in and .

We propose a nested multiple imputation procedure (Shen, 2000; Harel, 2003; Rubin, 2003) to impute unmeasured assessments across the continuum of care, which would enable evaluation and comparison of the rates of functional improvement across different health care settings. This procedure consists of an Equating step and a Translating step. In the Equating step, we impute the unmeasured assessments in MDS or OASIS that are close to the FIM assessment date to obtain a synthetic data set with simultaneous MDS and OASIS assessments. In the Translating step, we rely on the synthetic data set from the first step to estimate the relationship between MDS and OASIS that will be used to impute multiple unmeasured assessments in MDS or OASIS at later assessment dates. This two-step procedure accounts for the uncertainty in both steps, and provides flexibility for researchers to choose different models in the second step without the need to re-equate the instruments.

The Equating step utilizes modeling of multiple ordinal outcomes to impute the missing instruments that consist of multiple ordinal items. The logistic and probit link functions are commonly used to model single ordinal variable. Both link functions tend to result in similar model fit and predictive performance. Bayesian inference for these models relies on sampling from complex posterior distributions. However, by introducing auxiliary variables, sampling from these posterior distributions can become more efficient. Albert and Chib (1993) described this technique for the probit link function and more recently Holmes et al. (2006) and Polson, Scott and Windle (2013) proposed two possible approaches for the logistic link function. The multivariate ordinal probit (MVOP) model was proposed as an extension of the probit model (Albert and Chib, 1993) and the multivariate probit model (Ashford and Sowden, 1970) to multivariate ordinal responses. Similar extensions for logistic link function with multiple ordinal outcomes is an area of future research. Using the MVOP model we can capture the complex dependence structure and the ordinal nature of the different functional assessment instruments as well as adjust for observed patients’ covariates.

To generate multiple imputations of the unmeasured functional assessments using the MVOP model, we develop two computational approaches. The first approach combines the EM algorithm (Dempster, Laird and Rubin, 1977) for obtaining the maximum likelihood estimates of the parameters in the MVOP model and the bootstrap method to multiply impute the missing values (Little and Rubin, 2002). The second approach relies on the data augmentation (DA) algorithm (Tanner and Wong, 1987) to draw the unknown parameters of the MVOP model and the missing values from their joint posterior distribution. We compared the MVOP model to existing methods for imputing incomplete multivariate ordinal variables with respect to the biases, the sampling variances, and the RMSEs of their point estimates, as well as the widths and coverage rates of their interval estimates. For all of the estimands considered and in most of the experimental conditions that were examined, the MVOP model appears to be superior. In the Translating step, different models can be used to estimate the relationship between MDS and OASIS assessments. We illustrate this flexibility by either imputing the missing individual items using the MVOP model or imputing the missing total scores using a linear regression model.

The remainder of this section describes the analytical data set, introduces the basic framework and reviews related work. Section 2 describes the MVOP models and their estimation methods. Section 3 presents the NMI procedure. Section 4 compares the MVOP models to existing methods using a simulation study. Section 5 describes the empirical data analysis. Conclusions and discussions are provided in Section 6.

1.2 Motivating Example

The analytical data set includes 72,575 patients who suffered a stroke and were discharged from IRFs between 2011 and 2014. Of these patients, 38,629 were released to SNFs (), where the MDS assessments were collected for them. The other 33,946 patients were discharged home (), where the OASIS assessments were used to measure their functional status. Patient assessments were collected on admission and at various times points during their post-acute stays. The median number of assessments for each patient was 5 for patients in SNFs with a range of 0 to 91, and 3 for patients receiving home health care with a range of 0 to 46. Two assessments for each patient were included in our analyses. One assessment is collected at admission within 30 days from the IRF’s discharge date. The other is recorded approximately 30 days after the first assessment. The primary research objective is to examine and compare the rates of functional improvement experienced by patients treated in the different health care settings after being discharged from IRFs. To describe the functional change for patients who were released to either SNFs or home, we will use the MDS scale.

1.3 Basic Framework

We consider equating setting-specific patient assessments as a missing data problem. We assume that all patients have complete FIM assessments and complete demographic characteristics. Let , , be an indicator that is equal to 1 if patient is in and 0 if in . Let , , and denote matrices of item responses in FIM, MDS, and OASIS, respectively, with rows referring to subjects and columns referring to variables, and where , , , and . The subscripts A and B denote the assessments on admission and at a later date, respectively. The subscripts obs and mis denote the observed and missing assessments, respectively. In addition, let denote a set of fully observed covariates.

The joint posterior distribution of the missing data and the parameters can be written as


where and index the imputation models in the Equating and Translating steps, respectively. The Equating step is performed once, and the Translating step can be performed multiple times. To reduce the computation complexity and provide flexibility to researchers we assumed in Equation (1.1) that and are conditionally independent. The data setting that consists of patient’s covariates, FIM assessments, and first MDS or OASIS assessments resembles the statistical matching setup (D’Orazio, Di Zio and Scanu, 2006). In this setup, the joint distribution of is not identifiable based on and alone because MDS and OASIS are never jointly observed.

This setup also arises in the test equating literature when using common-item nonequivalent groups design (Kolen and Brennan, 2004; Dorans, Pommerich and Holland, 2007). This design assumes that different groups of examinees are assessed using two different test forms that share some common items. An important property of the common-item set when used for equating is that they would be representative of the total test forms in content and statistical characteristics (Kolen and Brennan, 2004; Dorans, Pommerich and Holland, 2007). This is commonly attained by ensuring that the items are exactly the same in both forms and are at the same location in the form. Here, is similar for patients in and , and it is administered prior to and within a short time frame from the initial MDS and OASIS assessments. In addition, include similar content to the MDS and OASIS assessments, because it attempts to approximate the same underlying functional status. Based on these observations, a natural starting point is to apply the conditional independence assumption (CIA), (D’Orazio, Di Zio and Scanu, 2006). This assumption is often implicitly made in test equating applications using only . Here, we include other patient’s characteristics as well.

We further assumed that the unmeasured assessments are missing at random (MAR) (Little and Rubin, 2002), because a major determinant of patient’s discharge to SNF or home from a rehabilitation facility is the patient’s functional status, which is measured using the validated FIM instrument. Under CIA and MAR, we can impute using the posterior distribution in the Equating step. These two assumptions cannot be inferred from the data and may not always be plausible. To examine the plausibility of these assumptions, we conducted a sensitivity analysis to examine whether our results changed in a substantial way when these assumptions are violated (Rubin, 1986; Heitjan and Landis, 1994).

The Equating step generates complete synthetic data sets that comprise MDS and OASIS assessments simultaneously for patients in . Assuming that the relationship between contemporary imputed and observed instruments does not change across the continuum of care, we can simplify the third line of Equation (1.1):

where , and impute using in the Translating step, where denotes the imputed MDS assessments at admission.

1.4 Related Work

The Equating and the Translating steps require methods that can impute multivariate ordinal variables. Multivariate imputation methods can be classified into two types of methods: fully conditional specification and joint modeling. Fully conditional specification (Van Buuren, 2007) involves a series of univariate conditional models that imputes missing values sequentially with current model estimates. In practice, users only include main effects in these models, because it is challenging to identify and include higher-order interactions and nonlinear terms at each of the conditional models (Vermunt et al., 2008). With multiple ordinal variables, the default implementation of fully conditional specification relies on the ordered logit model. Gu and Gutman (2017a) note that this implementation fails to capture the full correlation structure of the imputed items when the proportional odds assumption is violated. Recently, a multi-level model based on the probit link function was proposed as a possible imputation model for missing ordinal variable (Enders, Keller and Levy, 2017).

The joint modeling approach (Schafer, 1997) imputes the missing values according to a joint model for the variables that is common to all observations. Yucel, He and Zaslavsky (2011) proposed a method that is based on multivariate normal model to impute ordinal variables and supplemented it with a rounding technique that preserves the observed marginal distribution of the ordinal variables. When there is a large proportion of missing values, propagation of errors in the underlying modeling approximation can compound and result in invalid statistical inferences (Yucel, He and Zaslavsky, 2011; Gu and Gutman, 2017a).

Imputation by Propensity score matching (IPSM) can be embedded in a joint modeling approach to define cells within which hot-deck imputations can be drawn (Andridge and Little, 2010). The propensity score is defined as the probability of a unit to have missing values. IPSM imputes missing values with observed values from units with similar estimated propensity scores. IPSM is a generally valid statistical method, but its performance is sensitive to the specification of the propensity score model (Gu and Gutman, 2017a).

Latent variable matching (LVM) (Gu and Gutman, 2017a) is a recently proposed procedure that combines IRT models with multiple imputation (Rubin, 1987) to impute unmeasured assessments. LVM is also a hot deck imputation method, which matches units using the underlying functional status estimated from IRT models. In its original form, LVM ignores patient covariates, which may violate the MAR assumption (Rubin, 1996). LVM can be extended to account for a set of discrete and continuous covariates by applying it within subgroups of the covariates; however, this approach may become computationally intensive when the number of possible covariate values is large.

Among these methods, IPSM and LVM are the strongest candidate methods in terms of validity and efficiency for imputing the missing assessments in our datasets (Gu and Gutman, 2017a). Thus, we only compared these two methods with the newly proposed procedure in Section 4.

2 The Multivariate Ordinal Probit Model

Let denote a generic matrix of item responses, where and are the matrices of missing and observed item responses, respectively, is the response of patient to item , and is the number of response levels of item . For example, in the Equating step, corresponds to the unmeasured assessments in MDS, , and corresponds to the observed assessments in FIM and MDS, and . The MVOP model introduces a matrix of latent response variables such that , if , where are unknown threshold parameters. The MVOP model assumes that , where is a vector of covariates for patient , is a matrix of unknown regression coefficients and is an unknown covariance matrix. Statistical inferences for are based on the likelihood


where is the interval if , and is a normalizing constant.

The vector parameter is not identifiable because the likelihood (2.1) is invariant to location and scale transformations on . Threshold constraints and correlation constraints are two types of identification constraints that are commonly made with the MVOP model. The threshold constraints fix two threshold parameters for each outcome. For example, one could set and either or . Applying these constraints allows to sample the covariance matrix from a known probability distribution (Chen and Dey, 2000; Jeliazkov, Graves and Kutzbach, 2008). The correlation constraints either fix or , and restrict to be a correlation matrix (Lawrence et al., 2008; Zhang et al., 2017). We refer to the MVOP model under the correlation constraints and threshold constraints as the MVOP-C model and MVOP-T model, respectively.

MVOP models have been analyzed using likelihood-based methods, including a direct likelihood approach that involves the evaluation of integrals using Gaussian-Hermite quadrature (Li and Schafer, 2008), an approximate EM algorithm (Guo et al., 2015), a pseudo-likelihood approach (Varin and Czado, 2010), and Bayesian approaches (Chen and Dey, 2000; Lawrence et al., 2008; Zhang et al., 2017). Of these methods, only Zhang et al. (2017) extended the MVOP model to handle incomplete correlated ordinal responses. Here, when some of the outcomes are missing, we propose Monte Carlo EM (MCEM) algorithms (Wei and Tanner, 1990) for maximum likelihood estimation and DA algorithms for Bayesian inference under the MVOP model to produce imputed values.

2.1 MCEM Algorithm

We first consider the MVOP-T model, and fix and . The complete-data likelihood is

The E-step of the EM algorithm, given the current value of the parameter, , involves evaluating the expectation

which consists of multiple integrations with respect to a truncated multivariate normal distribution of . cannot be calculated analytically, but Monte Carlo methods can be used to approximate it. We extend the slice sampler algorithm proposed by Damien and Walker (2001) for bivariate normal distribution to sample from the truncated multivariate normal distribution. The algorithm introduces a latent variable so that the slice sampler runs on a sequence of conditional distributions which can all be sampled directly using uniform distributions, and it has a faster mixing rate than the Gibbs sampling algorithm (Geweke, 1991). Details of the algorithm are described in Section 1 of the online supplement (Gu and Gutman, 2017b).

In the M-step, we rely on conditional maximization (Meng and Rubin, 1993) to update in successive steps with respect to and :

where are draws from . To decrease the Monte Carlo errors, Wei and Tanner (1990) suggested using a large .

We also derive a consistent estimator for based on the empirical marginal distribution of the observed and imputed responses in the absence of threshold constraints (Olsson, 1979),

where if , and is imputed through the indicator function given the current estimate of if , and is the cumulative distribution function of the standard Normal distribution. The estimate of given the threshold constraints is

where .

2.2 Data Augmentation Algorithm

For Bayesian inference of the MVOP-T model, we assign a prior distribution for and a prior distribution for , where denotes the inverse-Wishart distribution with degrees of freedom and scale matrix . Based on the work of Albert and Chib (1993), we use a uniform prior distribution over the polytope for . The feasible region for the parameter space of :

The DA algorithm for drawing samples from the posterior distribution of consists of an Imputation step that draws from using the slice sampler algorithm described in Section 2.1, and three Posterior simulation (P) steps:

P-step 1: Draw , where , is the th row of , , and .

P-step 2: Draw .

P-step 3: Draw , for , and , where denotes the uniform distribution with support .

After each cycle of the algorithm, we impute the missing responses through the indicator functions , for and given the corresponding latent responses and threshold parameters .

2.3 Parameter Expansion Approach

For the MVOP-C model, we fix and constrain the covariance matrix to be a correlation matrix . In the MCEM algorithm, the M-step with respect to does not have a closed form solution (Chib and Greenberg, 1998), and direct maximization of the expectation of the complete-data likelihood is computationally intensive. For the Bayesian inference, the posterior distribution of does not follow a known probability distribution. Thus, we use the parameter expansion (PX) technique (Liu, Rubin and Wu, 1998; Liu and Wu, 1999), and propose a PX-MCEM algorithm to obtain the maximum likelihood estimates, and a PX-DA algorithm to sample from the posterior distribution of , respectively. These algorithms are similar to the work of Zhang, Boscardin and Belin (2006), Lawrence et al. (2008) and Zhang et al. (2017).

We consider the following transformations:


so that is transformed into a general covariance matrix, where is a diagonal matrix with diagonal elements . The PX-MCEM and the PX-DA algorithms using the transformed parameters and the latent responses proceed as the MCEM and DA algorithms described in Section 2.1 and Section 2.2, respectively. After each iteration, are updated via the inverse transformations of identities (2.2).

3 Nested Multiple Imputation Procedure

Let be a quantity of interest. We summarize the proposed procedure to multiply impute :

Equating: Impute from the predictive distribution .

Step 1: Draw independent parameters from the posterior distribution , or from the asymptotic distribution obtained by applying the EM algorithm to a bootstrapped sample of the cases.

Step 2: Impute through the indicator functions , where is determined by and , .

Step 3: Repeat steps 1 and 2 times to create imputed datasets , .

Translating: For each of the imputed datasets in Stage 1, impute from the predictive distribution .

Step 4: Draw independent parameters from the posterior distribution , or from the asymptotic distribution obtained by applying the EM algorithm to a bootstrapped sample of the cases.

Step 5: Impute through the indicator functions , where is determined by and , .

Step 6: Repeat steps 4 and 5 times to create imputed datasets , .

Combining Rules: For each of the complete datasets , for , and , the estimate of and its sampling variance are and , respectively. The overall estimate of and its sampling variance are obtained using the nested multiple imputation combining rule, confidence intervals and significance tests are based on a Student- reference distribution (Shen, 2000; Harel, 2003; Rubin, 2003).

4 Simulation Study

We compared the performance of the MVOP model in comparison to existing methods for imputing incomplete multivariate ordinal variables using a simulation study.

4.1 Partially Simulated Data

The simulation study was based on observed FIM assessments and MDS assessments on admission for patients in , and missing MDS assessments were artificially generated. To generate incomplete data sets, we fitted a logistic regression model to the entire dataset where the explanatory variables comprised , patients’ age and patients’ gender,


where , , and and denote the age and gender of patient , respectively. This resulted in estimated regression coefficients , where is the estimated intercept and is a vector of estimated regression coefficients for the other predictors. A simple random sample of = 1,000 patients was then drawn from the set of patients in , and () was sampled from a Bernoulli distribution with probability , where is the c.d.f. of a specified distribution and is the link function (McCullagh and Nelder, 1989). We considered three choices of , the logistic distribution, the Cauchy distribution, which is symmetric but has heavier tails than the logistic distribution, and the Box-Cox family distributions (Guerrero and Johnson, 1982). The Box-Cox distribution takes the form

This distribution allows us to assess the effect of skewness in the missing data mechanism. It is positively skewed for and negatively skewed for ; here, was fixed at either -0.3 or 0.3. The value of was fixed so that is either , or , where is the number of patients who have missing assessments. MDS assessments for patient were deleted to create incomplete data set when . For each configuration, 1,000 replications were produced.

The methods examined in the simulations were IPSM, LVM and the MVOP models implemented using both EM and DA algorithms: MVOPT-DA, MVOPT-EM, MVOPC-DA and MVOPC-EM. For IPSM, we estimated the propensity score using the logistic regression model: . For both IPSM and LVM, we used the nearest-neighbor matching algorithm to find a potential donor. Ten multiple imputations were generated using each of the methods.

We examined the performance of the different methods on two estimands: (1) the population mean total score of items in MDS, , where , ; (2) the pairwise Goodman and Kruskal’s rank correlation coefficients (Goodman and Kruskal, 1954) between items in MDS, . For each method, at each configuration, and at each of the 1,000 replications, we recorded , , their estimated sampling variances, the corresponding root mean square errors (RMSEs), the interval estimate widths, and determined whether the interval covered or did not cover the true value . Using these values, we calculated for each approach and each configuration, the average coverage rate, the bias, the mean estimated sampling variance, the mean RMSE, and the mean interval width. For each configuration, we also calculated a loss function based on the negatively oriented interval scores (Gneiting and Raftery, 2007, Equation (61)). This loss function provides flexible assessment of coverage by accounting for the distance between the interval estimate and the estimand. For estimand , the loss function for interval estimate , has the form


where and denotes the Lebesgue measure of the interval estimate .

The simulations were implemented using R 3.1.0 (R Core Team, 2014). The proposed EM and DA algorithms were implemented in C++ for efficiency. For the EM algorithms, we generated samples from in the E-step, and calculated the observed-data likelihood using a Monte Carlo method (Genz, 1992) to monitor the convergence of the MCEM algorithm. For the DA algorithms, multiple parallel chains of 50,000 iterations with dispersed initial values were generated. Standard MCMC convergence diagnostics such as Gelman-Rubin Statistic (Gelman and Rubin, 1992), trace plots, and autocorrelation plots were examined for a small sample of the simulations, and did not indicate failure to converge.

4.2 Results

Table 1 displays the mean biases, variances, RMSEs, coverages, interval widths and interval estimate loss function of for configurations, where is the logistic distribution and . All of the methods generally produce statistically valid inferences. Compared to all of the methods that were examined, MVOP models implemented using the DA algorithms have coverages that are closest to nominal across all configurations. IPSM has coverages that are slightly smaller than LVM for , and worse than LVM for . When , the parametric models underlying LVM impute the missing values with less bias than the propensity score model used in IPSM. Similar results were observed when predictive mean matching was compared to IPSM (Andridge and Little, 2010). The MVOP models implemented using the DA algorithms generally have the smallest biases and RMSEs, while the MVOP models implemented using the EM algorithms and the bootstrap method have the largest biases, variances, RMSEs and interval widths. Because some methods have lower coverage but with shorter intervals, and some have higher coverage with wider intervals, we used the loss function in Equation (4.2) to compare the methods on coverage and interval width simultaneously. Generally, the MVOPT models implemented using the DA algorithms had the smallest interval score loss followed by LVM.

Method Bias Variance RMSE Coverage Width Equation (4.2)
IPSMa -0.013 0.014 0.124 93.2 0.460 0.607
LVMb -0.047 0.014 0.124 93.9 0.459 0.588
0.2 MVOPTc-DAd -0.015 0.014 0.116 95.5 0.472 0.547
MVOPCe-DA -0.012 0.013 0.115 94.3 0.453 0.569
MVOPT-EMf -0.017 0.029 0.133 98.0 0.735 0.678
MVOPC-EM -0.032 0.030 0.130 97.8 0.752 0.677
IPSM -0.015 0.016 0.147 90.4 0.500 0.752
LVM -0.070 0.020 0.159 91.8 0.565 0.767
0.4 MVOPT-DA -0.030 0.025 0.139 96.1 0.641 0.715
MVOPC-DA -0.025 0.020 0.141 94.8 0.564 0.793
MVOPT-EM -0.070 0.039 0.175 97.3 0.885 0.823
MVOPC-EM -0.073 0.039 0.166 97.2 0.886 0.836
IPSM 0.062 0.034 0.228 89.6 0.748 1.191
LVM -0.050 0.035 0.196 92.2 0.757 1.007
0.6 MVOPT-DA -0.027 0.050 0.183 97.9 0.931 0.984
MVOPC-DA -0.007 0.033 0.179 95.3 0.749 0.871
MVOPT-EM -0.180 0.061 0.269 91.7 1.137 1.084
MVOPC-EM -0.167 0.063 0.263 92.4 1.164 1.024
  • IPSM: imputation by propensity score matching;

  • LVM: latent variable matching;

  • MVOPT: multivariate ordinal probit model with threshold constraints;

  • DA: data augmentation algorithm;

  • MVOPC: multivariate ordinal probit model with correlation constraints;

  • EM: expectation-maximization algorithm.

Table 1: Biases, variances, RMSEs, interval coverages, confidence interval widths, and interval estimate loss function (Equation (4.2)) for given that and is the logistic distribution.

Figure 1 displays the distribution of biases, interval coverages, interval widths and interval score loss of for configurations where is the logistic distribution and . The MVOP models except for MVOPC-DA have coverages that are close to nominal, while IPSM and LVM have median coverage that is lower than . However, except for MVOPT-DA, the other MVOP models have larger biases than IPSM and LVM. As with , the MVOPT-EM and MVOPC-EM methods have largest biases and interval lengths, but their coverages are closer to nominal when compared to LVM and IPSM. Lastly, the MVOPT-DA method has better coverages and smaller interval score loss than the MVOPC-DA method. These trends are similar to the ones observed with (see Figures 5 - 6 Section 2 of the online supplement (Gu and Gutman, 2017b)).

Because the MVOPT-DA method generally has the best operating characteristics when is the logistic distribution, we only included this method when we examined the effects of misspecification of the propensity score model. Table 1 in Section 2 of the online supplement (Gu and Gutman, 2017b) displays the results for when is the Cauchy distribution or the Box-Cox family distribution with . The performance of IPSM is sensitive to misspecification of the propensity score model. For example, when and is the Box-Cox family distribution with , the coverage of IPSM is only and its loss is much larger than LVM and MVOPT-DA. In contrast, LVM and MVOPT-DA are robust to different link functions. MVOPT-DA has better coverages and smaller biases than LVM across all of the configurations that were examined, and generally has smaller interval score loss than LVM. Figures 7 - 15 in Section 2 of the online supplement (Gu and Gutman, 2017b) display the results for when is the Cauchy distribution and the Box-Cox family distribution with , respectively. IPSM, LVM and MVOPT-DA have similar point estimates, but MVOPT-DA has better coverages and smaller interval score loss than LVM and IPSM in most of the configurations examined. IPSM has the lowest coverages, and the median of its coverages is about when is the Box-Cox distribution with and . When the percentage of missingness decreases, the coverages of MVOPT-DA are closer to nominal.

(a) Bias
(b) Coverage
(c) CI Width
(d) Interval Score Loss
Figure 1: Distribution across 1,000 simulation replications of (a) biases, (b) coverages of confidence interval, (c) widths of confidence interval and (d) interval score loss for given that is the logistic distribution and .

4.3 Sensitivity of the methods to the CIA and MAR Assumptions

The proposed methods rely on the validity of the CIA and MAR assumptions (Section 1.3). We conducted an additional simulation study to examine the plausibility of these assumptions in this analysis.

One clinical variable that is recorded for patients in IRFs is their swallowing status at discharge. Swallowing status is a categorical variable with three categories of Regular Food, Modified Food Consistency/Supervision and Tube/Parenteral. Swallowing status is correlated with patient’s self-care functional status as well as patient’s discharge destination. We recoded the swallowing status using two dummy variables, which were added to the Equation (4.1). The setup of the coefficients in Equation (4.1) was similar to the one in Section 4.1, and we also considered the three different link functions and three possible values for . Because MVOPT-DA has the best operating characteristics, we only examined the validity of the CIA and MAR assumptions with this model. Swallowing status was not included when fitting the MVOPT-DA model to address the possibility that physicians may have used unobserved clinical information when selecting between the two possible discharge locations. The fitted MVOP model potentially violates both the CIA and the MAR assumptions.

Table 2 and Figure 16 in Section 2 of the online supplement (Gu and Gutman, 2017b) display the results for and for in this sensitivity analysis, respectively. MVOPT-DA generally provides statistically valid inferences. When the percentage of missingness decreases, the biases, variances, RMSEs and interval widths of MVOPT-DA decrease.

5 Motivating Example Revisited

5.1 Data

FIM, MDS and OASIS include similar functional status items, but they have differences in the rating levels (i.e., independence is reflected by a higher score in FIM but a lower score in MDS). To increase the consistency of the items in these three instruments, we reversed the rating levels of FIM prior to the analysis such that in all three instruments lower rating levels represent better functional status. In addition, we recoded any MDS items with score of 7 or 8 (activity occurred only once or twice or activity did not occur) as a score of 4 (totally dependent) (Wysocki, Thomas and Mor, 2015), and combined the scores 3, 4, and 5 in the item Feeding or Eating in OASIS due to a small proportion ( 1) of patients responding at these levels. After recoding, the items in FIM, MDS and OASIS have seven, five and four rating levels, respectively, except for the item bathing in OASIS that has seven levels.

Patients’ demographic characteristics are summarized in Table 2. Table 3 displays patients’ functional assessments in the three instruments. The patients who were discharged home () have an average FIM total score of 17.19 ( = 6.21), and the average of the total score for patients who were discharged to SNFs () is 27.41 ( = 7.46). This suggests that patients in have better functional status when they were discharged from IRFs. Table 3 also shows that patients in both and have smaller average total scores at a later assessment date, suggesting that the functional status for most of patients improves over the course of their post-acute stay. The magnitude of improvement among the subsample of patients who received home health care appears to be larger than those who stayed in SNFs. of the patients in improve their functional status, while only of the patients in have functional improvement.

Variable Overall
Age 77.17 (9.62) 76.40 (10.05) 76.81 (9.83)
Gender, female () 53.0 53.2 53.1
Race, white () 81.2 77.0 79.2
Marital status, married () 42.2 50.0 45.9
Table 2: Summary of the observed covariates for patients in , and overall.
Instrument Variable Overall
FIM Score 27.41(7.46) 17.19(6.21) 22.63(8.59)
Scorea at time 1 18.38 (2.85) - -
MDS Score at time 2 17.43 (3.66) - -
Differenceb -0.95 (2.41) - -
Improvedc () 48.2 - -
Score at time 1 - 15.60 (4.12) -
OASIS Score at time 2 - 10.59 (4.75) -
Difference - -5.01 (4.07) -
Improved () - 84.5 -
  • Score: the total score of functional assessments in each instrument;

  • Difference: the difference in total scores measured on admission and at a later assessment date;

  • Improved: the proportion of patients who experience functional improvement.

Table 3: Summary of patient’s functional outcomes in three instruments.

5.2 Imputation Model

We illustrate the proposed nested multiple imputation procedure using the complete data set of 72,575 patients. In the first imputation stage, we imputed the unmeasured assessments in MDS using the MVOP-T model and the DA algorithm described in Section 2.2. Age, gender, race and marital status were included in the model. We generated ten parallel chains of 50,000 iterations with dispersed initial values, and created ten imputed data sets.

In the Translating step, we considered two possible models to illustrate the flexibility of the proposed procedure for translating assessments without re-equating the instruments. The first model is a linear regression model, referred to as , , where and denote the total scores of the imputed and observed items in MDS and OASIS on admission, respectively. The unmeasured total scores in MDS at a later assessment date are then imputed using the estimates of and and the observed total scores in OASIS at the later date. The second model is the MVOP-T model, referred to as , which models the joint distribution of all individual items in the imputed MDS and the observed OASIS instruments, . The unmeasured individual items in MDS at the later date are imputed using the estimates of and the observed individual items in OASIS at the later date, . For the MVOP-T model, the DA algorithm in Section 2.2 was used to generate multiple imputations. Ten imputed data sets were generated in the second stage, resulting in a total of 100 complete data sets.

We also used the conversion table method and LVM to equate the MDS and OASIS instruments in the Equating step, and the linear regression model to impute the missing total scores in MDS in the Translating step. For LVM, in order to accommodate patient’s covariates, we first partitioned the sample into five subclasses by sub-classifying at the quintiles of the distributions of the estimated propensity scores, , and then imputed the unmeasured assessments within each subclass.

5.3 Model Diagnostics

As suggested by Gelman et al. (2005) and Abayomi, Gelman and Levy (2008), we evaluated the appropriateness of the imputation model by comparing the distributions of observed and imputed values. Patients who were released home have smaller total MDS scores (see Figure 2), and are more likely to be at lower levels of each item (not shown). These patterns are consistent with the patterns that are observed in FIM.

Figure 2: Histograms of the observed (gray) and imputed (black) total scores in MDS. The gray dotted line and black dashed line are the average observed and imputed total scores, respectively.

We further examined the imputation model using posterior predictive checks (Gelman, Meng and Stern, 1996; Burgette and Reiter, 2010; He and Zaslavsky, 2012; Si and Reiter, 2013; Si et al., 2016). We first created complete data sets and replicated data sets in which both and are simulated from the imputation model. We then compared each with its corresponding on three test statistics in the first stage imputation: (1) the mean total score of items in MDS, ; (2) the proportion of response levels in each of the items in MDS, , where is the number of responses at level in item ; and (3) the pairwise Goodman and Kruskal’s rank correlation coefficients between items in MDS, . Let and , , be the values of computed with and , respectively. For each , we computed the two-sided posterior predictive probability (),

where is the indicator function that is equal to 1 if the condition is satisfied and 0 otherwise. A small indicates that and deviate from each other in one direction, which suggests that the imputation model distorts the data characteristics captured by .

To obtain the pairs , we added a step to the DA algorithm that replaces all values of and using the sampled parameter values at each iteration. We calculated the test statistics based on complete and replicated data sets, and their differences , . The estimated two-sided , which does not indicate a deficiency in the imputation model for . The left and right panel of Figure 3 show the histogram of the two-sided values for and , respectively. None of the and values are below 0.05. Thus, we did not observe implausible imputations. Similar model diagnostics were performed for the second stage imputation, and no significant lack of model fit was detected (not shown).

Figure 3: Histograms of the two-sided posterior predictive probabilities () for (left panel) and (right panel). The red dashed line corresponds to a threshold value of 0.05.

Because posterior predictive checks may not be well calibrated (Hjort, Dahl and Steinbakk, 2006), we also examined the imputation performance using a sample partitioning method. Patients in were partitioned into a training sample that included 90 of the patients. The remaining 10 served as a test sample. We fit the MVOP model to the training sample and predicted the assessments of the test sample. We repeated this partitioning and prediction process ten times, and in each replication we compared the distribution of total mean score and pairwise rank correlation coefficients of the predicted MDS assessments to the observed ones. Table 3 of the online supplement (Gu and Gutman, 2017b) shows the results of the ten replications. No significant lack of model fit was detected.

5.4 Analysis of Multiply Imputed Data

We compared the rates of functional change experienced by patients treated in SNFs and those treated by home health agencies using the observed and imputed assessments in MDS.

We define and to be the average change in total scores of the items in MDS over two assessments after discharge from IRFs for patients in and , respectively:

where and are the total scores of the observed items in MDS for patients in on admission and at a later assessment, respectively, and are the total scores of the imputed items in MDS for patients in on admission and at a later assessment, respectively, and are the number of patients in and , respectively, and . We also define and to be the proportion of patients whose functional status improved during the course of the post-acute stay in and , respectively:

where is an indicator function that is equal to 1 if is true and 0 otherwise.

We applied the proposed NMI procedure to examine two quantities: (1) the difference in average change of total scores over the course of post-acute stay in and , ; and (2) the difference in proportions of patients whose functional status improved during the post-acute stay in and