Development of a Common Patient Assessment Scale across the Continuum of Care: A Nested Multiple Imputation Approach\thanksrefT1
Abstract
Evaluating and tracking patients’ functional status through the postacute care continuum requires a common instrument. However, different postacute service providers such as nursing homes, inpatient rehabilitation facilities and home health agencies rely on different instruments to evaluate patient’s functional status. These instruments assess similar functional status domains, but they comprise different activities, rating scales and scoring instructions. These differences hinder the comparison of patients’ assessments across health care settings. We propose a twostep procedure that combines nested multiple imputation with the multivariate ordinal probit (MVOP) model to obtain a common patient assessment scale across the postacute care continuum. Our procedure imputes the unmeasured assessments at multiple assessment dates, and enables evaluation and comparison of the rates of functional improvement experienced by patients treated in different health care settings using a common measure. To generate multiple imputations of the unmeasured assessments using the MVOP model, a likelihoodbased approach that combines the EM algorithm and the bootstrap method as well as a fully Bayesian approach using the data augmentation algorithm are developed. Using a dataset on patients who suffered a stroke, we simulate missing assessments and compare the MVOP model to existing methods for imputing incomplete multivariate ordinal variables. We show that, for all of the estimands considered, and in most of the experimental conditions that were examined, the MVOP model appears to be superior. The proposed procedure is then applied to patients who suffered a stroke and were released from rehabilitation facility to skilled nursing facilities or home.
arXiv:0000.0000 \startlocaldefs \endlocaldefs
A Nested Multiple Imputation Approach \thankstextT1Supported in part by the Changing LongTerm Care in America project funded by the National Institute on Aging (P01AG027296).
and
Data augmentation \kwdEM algorithm \kwdMissing data \kwdNested Multiple imputation \kwdMultivariate ordinal probit model \kwdSlice sampler.
1 Introduction
1.1 Overview
To track and evaluate patients through the postacute care continuum a common standardized evaluation tool is needed. Current evaluation tools have been largely developed within each type of health care provider, and cannot be easily compared. In inpatient rehabilitation facilities (IRFs), patients’ functional status is evaluated by the Functional Independence Measure (FIM). After being discharged from IRFs, the functional status of patients who stay in skilled nursing facilities (SNFs) (G1) is collected using the Minimum Data Set (MDS), while the Outcome and Assessment Information Set (OASIS) is collected for patients who receive home health care provided by home health agencies (G2). All of these assessments examine similar functional capabilities (e.g., eating, grooming, dressing, etc.), but the specific instruments, rating scales and instructions for scoring the different activities vary between these postacute settings. Thus, it is difficult to evaluate and compare the rates of functional improvement experienced by patients treated in the different health care settings.
The Continuity Assessment Record and Evaluation item set is a standardized evaluation tool that was developed for use at acute hospital discharge and at postacute care admission and discharge (Gage et al., 2012). This tool is intended to be a common evaluation tool for evaluating patients across the continuum of postacute care, and considerable resources were invested in its development. However, implementing new instruments in all postacute care settings may result in additional investments in administration and training as well as in changes to reimbursement system (Li et al., 2017). Moreover, adopting new instruments would require translating past functional outcome data so that comparison to existing outcome data is possible.
Equating settingspecific instruments so that functional status scores from one instrument could be used interchangeably with ones from another instrument is a possible approach to obtain a common evaluation tool (Kolen and Brennan, 2004; Dorans, Pommerich and Holland, 2007; von Davier, 2010). Linking and equating scores across different standardized assessments has been a major focus in the field of educational testing for the past 90 years (see Dorans, Pommerich and Holland, 2007, chapter 2, for details). Score equating methods have been recently used in health outcomes research. The conversion table method (Velozo, Byers and Joseph, 2007) was used to equate FIM assessments with MDS assessments. Conversion table equates the sum of individual item scores, also referred to as the total score, by matching on the latent functional status estimated from two different instruments using Item Response Theory models. This method was also used to equate scores from two physical functioning scales (ten Klooster et al., 2013), as well as two depression scales (Fischer et al., 2011). Conversion table may result in an invalid statistical procedure when further analysis is performed using the imputed scores, because it ignores the variability from the estimation and the imputation processes (Gu and Gutman, 2017a). Furthermore, a data set that comprises contemporaneous MDS and OASIS assessments is required in order to equate MDS and OASIS instruments. However, MDS and OASIS assessments are not jointly observed for patients in and .
We propose a nested multiple imputation procedure (Shen, 2000; Harel, 2003; Rubin, 2003) to impute unmeasured assessments across the continuum of care, which would enable evaluation and comparison of the rates of functional improvement across different health care settings. This procedure consists of an Equating step and a Translating step. In the Equating step, we impute the unmeasured assessments in MDS or OASIS that are close to the FIM assessment date to obtain a synthetic data set with simultaneous MDS and OASIS assessments. In the Translating step, we rely on the synthetic data set from the first step to estimate the relationship between MDS and OASIS that will be used to impute multiple unmeasured assessments in MDS or OASIS at later assessment dates. This twostep procedure accounts for the uncertainty in both steps, and provides flexibility for researchers to choose different models in the second step without the need to reequate the instruments.
The Equating step utilizes modeling of multiple ordinal outcomes to impute the missing instruments that consist of multiple ordinal items. The logistic and probit link functions are commonly used to model single ordinal variable. Both link functions tend to result in similar model fit and predictive performance. Bayesian inference for these models relies on sampling from complex posterior distributions. However, by introducing auxiliary variables, sampling from these posterior distributions can become more efficient. Albert and Chib (1993) described this technique for the probit link function and more recently Holmes et al. (2006) and Polson, Scott and Windle (2013) proposed two possible approaches for the logistic link function. The multivariate ordinal probit (MVOP) model was proposed as an extension of the probit model (Albert and Chib, 1993) and the multivariate probit model (Ashford and Sowden, 1970) to multivariate ordinal responses. Similar extensions for logistic link function with multiple ordinal outcomes is an area of future research. Using the MVOP model we can capture the complex dependence structure and the ordinal nature of the different functional assessment instruments as well as adjust for observed patients’ covariates.
To generate multiple imputations of the unmeasured functional assessments using the MVOP model, we develop two computational approaches. The first approach combines the EM algorithm (Dempster, Laird and Rubin, 1977) for obtaining the maximum likelihood estimates of the parameters in the MVOP model and the bootstrap method to multiply impute the missing values (Little and Rubin, 2002). The second approach relies on the data augmentation (DA) algorithm (Tanner and Wong, 1987) to draw the unknown parameters of the MVOP model and the missing values from their joint posterior distribution. We compared the MVOP model to existing methods for imputing incomplete multivariate ordinal variables with respect to the biases, the sampling variances, and the RMSEs of their point estimates, as well as the widths and coverage rates of their interval estimates. For all of the estimands considered and in most of the experimental conditions that were examined, the MVOP model appears to be superior. In the Translating step, different models can be used to estimate the relationship between MDS and OASIS assessments. We illustrate this flexibility by either imputing the missing individual items using the MVOP model or imputing the missing total scores using a linear regression model.
The remainder of this section describes the analytical data set, introduces the basic framework and reviews related work. Section 2 describes the MVOP models and their estimation methods. Section 3 presents the NMI procedure. Section 4 compares the MVOP models to existing methods using a simulation study. Section 5 describes the empirical data analysis. Conclusions and discussions are provided in Section 6.
1.2 Motivating Example
The analytical data set includes 72,575 patients who suffered a stroke and were discharged from IRFs between 2011 and 2014. Of these patients, 38,629 were released to SNFs (), where the MDS assessments were collected for them. The other 33,946 patients were discharged home (), where the OASIS assessments were used to measure their functional status. Patient assessments were collected on admission and at various times points during their postacute stays. The median number of assessments for each patient was 5 for patients in SNFs with a range of 0 to 91, and 3 for patients receiving home health care with a range of 0 to 46. Two assessments for each patient were included in our analyses. One assessment is collected at admission within 30 days from the IRF’s discharge date. The other is recorded approximately 30 days after the first assessment. The primary research objective is to examine and compare the rates of functional improvement experienced by patients treated in the different health care settings after being discharged from IRFs. To describe the functional change for patients who were released to either SNFs or home, we will use the MDS scale.
1.3 Basic Framework
We consider equating settingspecific patient assessments as a missing data problem. We assume that all patients have complete FIM assessments and complete demographic characteristics. Let , , be an indicator that is equal to 1 if patient is in and 0 if in . Let , , and denote matrices of item responses in FIM, MDS, and OASIS, respectively, with rows referring to subjects and columns referring to variables, and where , , , and . The subscripts A and B denote the assessments on admission and at a later date, respectively. The subscripts obs and mis denote the observed and missing assessments, respectively. In addition, let denote a set of fully observed covariates.
The joint posterior distribution of the missing data and the parameters can be written as
(1.1) 
where and index the imputation models in the Equating and Translating steps, respectively. The Equating step is performed once, and the Translating step can be performed multiple times. To reduce the computation complexity and provide flexibility to researchers we assumed in Equation (1.1) that and are conditionally independent. The data setting that consists of patient’s covariates, FIM assessments, and first MDS or OASIS assessments resembles the statistical matching setup (D’Orazio, Di Zio and Scanu, 2006). In this setup, the joint distribution of is not identifiable based on and alone because MDS and OASIS are never jointly observed.
This setup also arises in the test equating literature when using commonitem nonequivalent groups design (Kolen and Brennan, 2004; Dorans, Pommerich and Holland, 2007). This design assumes that different groups of examinees are assessed using two different test forms that share some common items. An important property of the commonitem set when used for equating is that they would be representative of the total test forms in content and statistical characteristics (Kolen and Brennan, 2004; Dorans, Pommerich and Holland, 2007). This is commonly attained by ensuring that the items are exactly the same in both forms and are at the same location in the form. Here, is similar for patients in and , and it is administered prior to and within a short time frame from the initial MDS and OASIS assessments. In addition, include similar content to the MDS and OASIS assessments, because it attempts to approximate the same underlying functional status. Based on these observations, a natural starting point is to apply the conditional independence assumption (CIA), (D’Orazio, Di Zio and Scanu, 2006). This assumption is often implicitly made in test equating applications using only . Here, we include other patient’s characteristics as well.
We further assumed that the unmeasured assessments are missing at random (MAR) (Little and Rubin, 2002), because a major determinant of patient’s discharge to SNF or home from a rehabilitation facility is the patient’s functional status, which is measured using the validated FIM instrument. Under CIA and MAR, we can impute using the posterior distribution in the Equating step. These two assumptions cannot be inferred from the data and may not always be plausible. To examine the plausibility of these assumptions, we conducted a sensitivity analysis to examine whether our results changed in a substantial way when these assumptions are violated (Rubin, 1986; Heitjan and Landis, 1994).
The Equating step generates complete synthetic data sets that comprise MDS and OASIS assessments simultaneously for patients in . Assuming that the relationship between contemporary imputed and observed instruments does not change across the continuum of care, we can simplify the third line of Equation (1.1):
where , and impute using in the Translating step, where denotes the imputed MDS assessments at admission.
1.4 Related Work
The Equating and the Translating steps require methods that can impute multivariate ordinal variables. Multivariate imputation methods can be classified into two types of methods: fully conditional specification and joint modeling. Fully conditional specification (Van Buuren, 2007) involves a series of univariate conditional models that imputes missing values sequentially with current model estimates. In practice, users only include main effects in these models, because it is challenging to identify and include higherorder interactions and nonlinear terms at each of the conditional models (Vermunt et al., 2008). With multiple ordinal variables, the default implementation of fully conditional specification relies on the ordered logit model. Gu and Gutman (2017a) note that this implementation fails to capture the full correlation structure of the imputed items when the proportional odds assumption is violated. Recently, a multilevel model based on the probit link function was proposed as a possible imputation model for missing ordinal variable (Enders, Keller and Levy, 2017).
The joint modeling approach (Schafer, 1997) imputes the missing values according to a joint model for the variables that is common to all observations. Yucel, He and Zaslavsky (2011) proposed a method that is based on multivariate normal model to impute ordinal variables and supplemented it with a rounding technique that preserves the observed marginal distribution of the ordinal variables. When there is a large proportion of missing values, propagation of errors in the underlying modeling approximation can compound and result in invalid statistical inferences (Yucel, He and Zaslavsky, 2011; Gu and Gutman, 2017a).
Imputation by Propensity score matching (IPSM) can be embedded in a joint modeling approach to define cells within which hotdeck imputations can be drawn (Andridge and Little, 2010). The propensity score is defined as the probability of a unit to have missing values. IPSM imputes missing values with observed values from units with similar estimated propensity scores. IPSM is a generally valid statistical method, but its performance is sensitive to the specification of the propensity score model (Gu and Gutman, 2017a).
Latent variable matching (LVM) (Gu and Gutman, 2017a) is a recently proposed procedure that combines IRT models with multiple imputation (Rubin, 1987) to impute unmeasured assessments. LVM is also a hot deck imputation method, which matches units using the underlying functional status estimated from IRT models. In its original form, LVM ignores patient covariates, which may violate the MAR assumption (Rubin, 1996). LVM can be extended to account for a set of discrete and continuous covariates by applying it within subgroups of the covariates; however, this approach may become computationally intensive when the number of possible covariate values is large.
Among these methods, IPSM and LVM are the strongest candidate methods in terms of validity and efficiency for imputing the missing assessments in our datasets (Gu and Gutman, 2017a). Thus, we only compared these two methods with the newly proposed procedure in Section 4.
2 The Multivariate Ordinal Probit Model
Let denote a generic matrix of item responses, where and are the matrices of missing and observed item responses, respectively, is the response of patient to item , and is the number of response levels of item . For example, in the Equating step, corresponds to the unmeasured assessments in MDS, , and corresponds to the observed assessments in FIM and MDS, and . The MVOP model introduces a matrix of latent response variables such that , if , where are unknown threshold parameters. The MVOP model assumes that , where is a vector of covariates for patient , is a matrix of unknown regression coefficients and is an unknown covariance matrix. Statistical inferences for are based on the likelihood
(2.1) 
where is the interval if , and is a normalizing constant.
The vector parameter is not identifiable because the likelihood (2.1) is invariant to location and scale transformations on . Threshold constraints and correlation constraints are two types of identification constraints that are commonly made with the MVOP model. The threshold constraints fix two threshold parameters for each outcome. For example, one could set and either or . Applying these constraints allows to sample the covariance matrix from a known probability distribution (Chen and Dey, 2000; Jeliazkov, Graves and Kutzbach, 2008). The correlation constraints either fix or , and restrict to be a correlation matrix (Lawrence et al., 2008; Zhang et al., 2017). We refer to the MVOP model under the correlation constraints and threshold constraints as the MVOPC model and MVOPT model, respectively.
MVOP models have been analyzed using likelihoodbased methods, including a direct likelihood approach that involves the evaluation of integrals using GaussianHermite quadrature (Li and Schafer, 2008), an approximate EM algorithm (Guo et al., 2015), a pseudolikelihood approach (Varin and Czado, 2010), and Bayesian approaches (Chen and Dey, 2000; Lawrence et al., 2008; Zhang et al., 2017). Of these methods, only Zhang et al. (2017) extended the MVOP model to handle incomplete correlated ordinal responses. Here, when some of the outcomes are missing, we propose Monte Carlo EM (MCEM) algorithms (Wei and Tanner, 1990) for maximum likelihood estimation and DA algorithms for Bayesian inference under the MVOP model to produce imputed values.
2.1 MCEM Algorithm
We first consider the MVOPT model, and fix and . The completedata likelihood is
The Estep of the EM algorithm, given the current value of the parameter, , involves evaluating the expectation
which consists of multiple integrations with respect to a truncated multivariate normal distribution of . cannot be calculated analytically, but Monte Carlo methods can be used to approximate it. We extend the slice sampler algorithm proposed by Damien and Walker (2001) for bivariate normal distribution to sample from the truncated multivariate normal distribution. The algorithm introduces a latent variable so that the slice sampler runs on a sequence of conditional distributions which can all be sampled directly using uniform distributions, and it has a faster mixing rate than the Gibbs sampling algorithm (Geweke, 1991). Details of the algorithm are described in Section 1 of the online supplement (Gu and Gutman, 2017b).
In the Mstep, we rely on conditional maximization (Meng and Rubin, 1993) to update in successive steps with respect to and :
where are draws from . To decrease the Monte Carlo errors, Wei and Tanner (1990) suggested using a large .
We also derive a consistent estimator for based on the empirical marginal distribution of the observed and imputed responses in the absence of threshold constraints (Olsson, 1979),
where if , and is imputed through the indicator function given the current estimate of if , and is the cumulative distribution function of the standard Normal distribution. The estimate of given the threshold constraints is
where .
2.2 Data Augmentation Algorithm
For Bayesian inference of the MVOPT model, we assign a prior distribution for and a prior distribution for , where denotes the inverseWishart distribution with degrees of freedom and scale matrix . Based on the work of Albert and Chib (1993), we use a uniform prior distribution over the polytope for . The feasible region for the parameter space of :
The DA algorithm for drawing samples from the posterior distribution of consists of an Imputation step that draws from using the slice sampler algorithm described in Section 2.1, and three Posterior simulation (P) steps:

Pstep 1: Draw , where , is the th row of , , and .

Pstep 2: Draw .

Pstep 3: Draw , for , and , where denotes the uniform distribution with support .
After each cycle of the algorithm, we impute the missing responses through the indicator functions , for and given the corresponding latent responses and threshold parameters .
2.3 Parameter Expansion Approach
For the MVOPC model, we fix and constrain the covariance matrix to be a correlation matrix . In the MCEM algorithm, the Mstep with respect to does not have a closed form solution (Chib and Greenberg, 1998), and direct maximization of the expectation of the completedata likelihood is computationally intensive. For the Bayesian inference, the posterior distribution of does not follow a known probability distribution. Thus, we use the parameter expansion (PX) technique (Liu, Rubin and Wu, 1998; Liu and Wu, 1999), and propose a PXMCEM algorithm to obtain the maximum likelihood estimates, and a PXDA algorithm to sample from the posterior distribution of , respectively. These algorithms are similar to the work of Zhang, Boscardin and Belin (2006), Lawrence et al. (2008) and Zhang et al. (2017).
We consider the following transformations:
(2.2) 
so that is transformed into a general covariance matrix, where is a diagonal matrix with diagonal elements . The PXMCEM and the PXDA algorithms using the transformed parameters and the latent responses proceed as the MCEM and DA algorithms described in Section 2.1 and Section 2.2, respectively. After each iteration, are updated via the inverse transformations of identities (2.2).
3 Nested Multiple Imputation Procedure
Let be a quantity of interest. We summarize the proposed procedure to multiply impute :

Equating: Impute from the predictive distribution .

Step 1: Draw independent parameters from the posterior distribution , or from the asymptotic distribution obtained by applying the EM algorithm to a bootstrapped sample of the cases.

Step 2: Impute through the indicator functions , where is determined by and , .

Step 3: Repeat steps 1 and 2 times to create imputed datasets , .


Translating: For each of the imputed datasets in Stage 1, impute from the predictive distribution .

Step 4: Draw independent parameters from the posterior distribution , or from the asymptotic distribution obtained by applying the EM algorithm to a bootstrapped sample of the cases.

Step 5: Impute through the indicator functions , where is determined by and , .

Step 6: Repeat steps 4 and 5 times to create imputed datasets , .


Combining Rules: For each of the complete datasets , for , and , the estimate of and its sampling variance are and , respectively. The overall estimate of and its sampling variance are obtained using the nested multiple imputation combining rule, confidence intervals and significance tests are based on a Student reference distribution (Shen, 2000; Harel, 2003; Rubin, 2003).
4 Simulation Study
We compared the performance of the MVOP model in comparison to existing methods for imputing incomplete multivariate ordinal variables using a simulation study.
4.1 Partially Simulated Data
The simulation study was based on observed FIM assessments and MDS assessments on admission for patients in , and missing MDS assessments were artificially generated. To generate incomplete data sets, we fitted a logistic regression model to the entire dataset where the explanatory variables comprised , patients’ age and patients’ gender,
(4.1) 
where , , and and denote the age and gender of patient , respectively. This resulted in estimated regression coefficients , where is the estimated intercept and is a vector of estimated regression coefficients for the other predictors. A simple random sample of = 1,000 patients was then drawn from the set of patients in , and () was sampled from a Bernoulli distribution with probability , where is the c.d.f. of a specified distribution and is the link function (McCullagh and Nelder, 1989). We considered three choices of , the logistic distribution, the Cauchy distribution, which is symmetric but has heavier tails than the logistic distribution, and the BoxCox family distributions (Guerrero and Johnson, 1982). The BoxCox distribution takes the form
This distribution allows us to assess the effect of skewness in the missing data mechanism. It is positively skewed for and negatively skewed for ; here, was fixed at either 0.3 or 0.3. The value of was fixed so that is either , or , where is the number of patients who have missing assessments. MDS assessments for patient were deleted to create incomplete data set when . For each configuration, 1,000 replications were produced.
The methods examined in the simulations were IPSM, LVM and the MVOP models implemented using both EM and DA algorithms: MVOPTDA, MVOPTEM, MVOPCDA and MVOPCEM. For IPSM, we estimated the propensity score using the logistic regression model: . For both IPSM and LVM, we used the nearestneighbor matching algorithm to find a potential donor. Ten multiple imputations were generated using each of the methods.
We examined the performance of the different methods on two estimands: (1) the population mean total score of items in MDS, , where , ; (2) the pairwise Goodman and Kruskal’s rank correlation coefficients (Goodman and Kruskal, 1954) between items in MDS, . For each method, at each configuration, and at each of the 1,000 replications, we recorded , , their estimated sampling variances, the corresponding root mean square errors (RMSEs), the interval estimate widths, and determined whether the interval covered or did not cover the true value . Using these values, we calculated for each approach and each configuration, the average coverage rate, the bias, the mean estimated sampling variance, the mean RMSE, and the mean interval width. For each configuration, we also calculated a loss function based on the negatively oriented interval scores (Gneiting and Raftery, 2007, Equation (61)). This loss function provides flexible assessment of coverage by accounting for the distance between the interval estimate and the estimand. For estimand , the loss function for interval estimate , has the form
(4.2) 
where and denotes the Lebesgue measure of the interval estimate .
The simulations were implemented using R 3.1.0 (R Core Team, 2014). The proposed EM and DA algorithms were implemented in C++ for efficiency. For the EM algorithms, we generated samples from in the Estep, and calculated the observeddata likelihood using a Monte Carlo method (Genz, 1992) to monitor the convergence of the MCEM algorithm. For the DA algorithms, multiple parallel chains of 50,000 iterations with dispersed initial values were generated. Standard MCMC convergence diagnostics such as GelmanRubin Statistic (Gelman and Rubin, 1992), trace plots, and autocorrelation plots were examined for a small sample of the simulations, and did not indicate failure to converge.
4.2 Results
Table 1 displays the mean biases, variances, RMSEs, coverages, interval widths and interval estimate loss function of for configurations, where is the logistic distribution and . All of the methods generally produce statistically valid inferences. Compared to all of the methods that were examined, MVOP models implemented using the DA algorithms have coverages that are closest to nominal across all configurations. IPSM has coverages that are slightly smaller than LVM for , and worse than LVM for . When , the parametric models underlying LVM impute the missing values with less bias than the propensity score model used in IPSM. Similar results were observed when predictive mean matching was compared to IPSM (Andridge and Little, 2010). The MVOP models implemented using the DA algorithms generally have the smallest biases and RMSEs, while the MVOP models implemented using the EM algorithms and the bootstrap method have the largest biases, variances, RMSEs and interval widths. Because some methods have lower coverage but with shorter intervals, and some have higher coverage with wider intervals, we used the loss function in Equation (4.2) to compare the methods on coverage and interval width simultaneously. Generally, the MVOPT models implemented using the DA algorithms had the smallest interval score loss followed by LVM.
Method  Bias  Variance  RMSE  Coverage  Width  Equation (4.2)  

IPSM^{a}  0.013  0.014  0.124  93.2  0.460  0.607  
LVM^{b}  0.047  0.014  0.124  93.9  0.459  0.588  
0.2  MVOPT^{c}DA^{d}  0.015  0.014  0.116  95.5  0.472  0.547 
MVOPC^{e}DA  0.012  0.013  0.115  94.3  0.453  0.569  
MVOPTEM^{f}  0.017  0.029  0.133  98.0  0.735  0.678  
MVOPCEM  0.032  0.030  0.130  97.8  0.752  0.677  
IPSM  0.015  0.016  0.147  90.4  0.500  0.752  
LVM  0.070  0.020  0.159  91.8  0.565  0.767  
0.4  MVOPTDA  0.030  0.025  0.139  96.1  0.641  0.715 
MVOPCDA  0.025  0.020  0.141  94.8  0.564  0.793  
MVOPTEM  0.070  0.039  0.175  97.3  0.885  0.823  
MVOPCEM  0.073  0.039  0.166  97.2  0.886  0.836  
IPSM  0.062  0.034  0.228  89.6  0.748  1.191  
LVM  0.050  0.035  0.196  92.2  0.757  1.007  
0.6  MVOPTDA  0.027  0.050  0.183  97.9  0.931  0.984 
MVOPCDA  0.007  0.033  0.179  95.3  0.749  0.871  
MVOPTEM  0.180  0.061  0.269  91.7  1.137  1.084  
MVOPCEM  0.167  0.063  0.263  92.4  1.164  1.024 

IPSM: imputation by propensity score matching;

LVM: latent variable matching;

MVOPT: multivariate ordinal probit model with threshold constraints;

DA: data augmentation algorithm;

MVOPC: multivariate ordinal probit model with correlation constraints;

EM: expectationmaximization algorithm.
Figure 1 displays the distribution of biases, interval coverages, interval widths and interval score loss of for configurations where is the logistic distribution and . The MVOP models except for MVOPCDA have coverages that are close to nominal, while IPSM and LVM have median coverage that is lower than . However, except for MVOPTDA, the other MVOP models have larger biases than IPSM and LVM. As with , the MVOPTEM and MVOPCEM methods have largest biases and interval lengths, but their coverages are closer to nominal when compared to LVM and IPSM. Lastly, the MVOPTDA method has better coverages and smaller interval score loss than the MVOPCDA method. These trends are similar to the ones observed with (see Figures 5  6 Section 2 of the online supplement (Gu and Gutman, 2017b)).
Because the MVOPTDA method generally has the best operating characteristics when is the logistic distribution, we only included this method when we examined the effects of misspecification of the propensity score model. Table 1 in Section 2 of the online supplement (Gu and Gutman, 2017b) displays the results for when is the Cauchy distribution or the BoxCox family distribution with . The performance of IPSM is sensitive to misspecification of the propensity score model. For example, when and is the BoxCox family distribution with , the coverage of IPSM is only and its loss is much larger than LVM and MVOPTDA. In contrast, LVM and MVOPTDA are robust to different link functions. MVOPTDA has better coverages and smaller biases than LVM across all of the configurations that were examined, and generally has smaller interval score loss than LVM. Figures 7  15 in Section 2 of the online supplement (Gu and Gutman, 2017b) display the results for when is the Cauchy distribution and the BoxCox family distribution with , respectively. IPSM, LVM and MVOPTDA have similar point estimates, but MVOPTDA has better coverages and smaller interval score loss than LVM and IPSM in most of the configurations examined. IPSM has the lowest coverages, and the median of its coverages is about when is the BoxCox distribution with and . When the percentage of missingness decreases, the coverages of MVOPTDA are closer to nominal.
4.3 Sensitivity of the methods to the CIA and MAR Assumptions
The proposed methods rely on the validity of the CIA and MAR assumptions (Section 1.3). We conducted an additional simulation study to examine the plausibility of these assumptions in this analysis.
One clinical variable that is recorded for patients in IRFs is their swallowing status at discharge. Swallowing status is a categorical variable with three categories of Regular Food, Modified Food Consistency/Supervision and Tube/Parenteral. Swallowing status is correlated with patient’s selfcare functional status as well as patient’s discharge destination. We recoded the swallowing status using two dummy variables, which were added to the Equation (4.1). The setup of the coefficients in Equation (4.1) was similar to the one in Section 4.1, and we also considered the three different link functions and three possible values for . Because MVOPTDA has the best operating characteristics, we only examined the validity of the CIA and MAR assumptions with this model. Swallowing status was not included when fitting the MVOPTDA model to address the possibility that physicians may have used unobserved clinical information when selecting between the two possible discharge locations. The fitted MVOP model potentially violates both the CIA and the MAR assumptions.
Table 2 and Figure 16 in Section 2 of the online supplement (Gu and Gutman, 2017b) display the results for and for in this sensitivity analysis, respectively. MVOPTDA generally provides statistically valid inferences. When the percentage of missingness decreases, the biases, variances, RMSEs and interval widths of MVOPTDA decrease.
5 Motivating Example Revisited
5.1 Data
FIM, MDS and OASIS include similar functional status items, but they have differences in the rating levels (i.e., independence is reflected by a higher score in FIM but a lower score in MDS). To increase the consistency of the items in these three instruments, we reversed the rating levels of FIM prior to the analysis such that in all three instruments lower rating levels represent better functional status. In addition, we recoded any MDS items with score of 7 or 8 (activity occurred only once or twice or activity did not occur) as a score of 4 (totally dependent) (Wysocki, Thomas and Mor, 2015), and combined the scores 3, 4, and 5 in the item Feeding or Eating in OASIS due to a small proportion ( 1) of patients responding at these levels. After recoding, the items in FIM, MDS and OASIS have seven, five and four rating levels, respectively, except for the item bathing in OASIS that has seven levels.
Patients’ demographic characteristics are summarized in Table 2. Table 3 displays patients’ functional assessments in the three instruments. The patients who were discharged home () have an average FIM total score of 17.19 ( = 6.21), and the average of the total score for patients who were discharged to SNFs () is 27.41 ( = 7.46). This suggests that patients in have better functional status when they were discharged from IRFs. Table 3 also shows that patients in both and have smaller average total scores at a later assessment date, suggesting that the functional status for most of patients improves over the course of their postacute stay. The magnitude of improvement among the subsample of patients who received home health care appears to be larger than those who stayed in SNFs. of the patients in improve their functional status, while only of the patients in have functional improvement.
Variable  Overall  

Age  77.17 (9.62)  76.40 (10.05)  76.81 (9.83) 
Gender, female ()  53.0  53.2  53.1 
Race, white ()  81.2  77.0  79.2 
Marital status, married ()  42.2  50.0  45.9 
Instrument  Variable  Overall  
FIM  Score  27.41(7.46)  17.19(6.21)  22.63(8.59) 
Score^{a} at time 1  18.38 (2.85)      
MDS  Score at time 2  17.43 (3.66)     
Difference^{b}  0.95 (2.41)      
Improved^{c} ()  48.2      
Score at time 1    15.60 (4.12)    
OASIS  Score at time 2    10.59 (4.75)   
Difference    5.01 (4.07)    
Improved ()    84.5   

Score: the total score of functional assessments in each instrument;

Difference: the difference in total scores measured on admission and at a later assessment date;

Improved: the proportion of patients who experience functional improvement.
5.2 Imputation Model
We illustrate the proposed nested multiple imputation procedure using the complete data set of 72,575 patients. In the first imputation stage, we imputed the unmeasured assessments in MDS using the MVOPT model and the DA algorithm described in Section 2.2. Age, gender, race and marital status were included in the model. We generated ten parallel chains of 50,000 iterations with dispersed initial values, and created ten imputed data sets.
In the Translating step, we considered two possible models to illustrate the flexibility of the proposed procedure for translating assessments without reequating the instruments. The first model is a linear regression model, referred to as , , where and denote the total scores of the imputed and observed items in MDS and OASIS on admission, respectively. The unmeasured total scores in MDS at a later assessment date are then imputed using the estimates of and and the observed total scores in OASIS at the later date. The second model is the MVOPT model, referred to as , which models the joint distribution of all individual items in the imputed MDS and the observed OASIS instruments, . The unmeasured individual items in MDS at the later date are imputed using the estimates of and the observed individual items in OASIS at the later date, . For the MVOPT model, the DA algorithm in Section 2.2 was used to generate multiple imputations. Ten imputed data sets were generated in the second stage, resulting in a total of 100 complete data sets.
We also used the conversion table method and LVM to equate the MDS and OASIS instruments in the Equating step, and the linear regression model to impute the missing total scores in MDS in the Translating step. For LVM, in order to accommodate patient’s covariates, we first partitioned the sample into five subclasses by subclassifying at the quintiles of the distributions of the estimated propensity scores, , and then imputed the unmeasured assessments within each subclass.
5.3 Model Diagnostics
As suggested by Gelman et al. (2005) and Abayomi, Gelman and Levy (2008), we evaluated the appropriateness of the imputation model by comparing the distributions of observed and imputed values. Patients who were released home have smaller total MDS scores (see Figure 2), and are more likely to be at lower levels of each item (not shown). These patterns are consistent with the patterns that are observed in FIM.
We further examined the imputation model using posterior predictive checks (Gelman, Meng and Stern, 1996; Burgette and Reiter, 2010; He and Zaslavsky, 2012; Si and Reiter, 2013; Si et al., 2016). We first created complete data sets and replicated data sets in which both and are simulated from the imputation model. We then compared each with its corresponding on three test statistics in the first stage imputation: (1) the mean total score of items in MDS, ; (2) the proportion of response levels in each of the items in MDS, , where is the number of responses at level in item ; and (3) the pairwise Goodman and Kruskal’s rank correlation coefficients between items in MDS, . Let and , , be the values of computed with and , respectively. For each , we computed the twosided posterior predictive probability (),
where is the indicator function that is equal to 1 if the condition is satisfied and 0 otherwise. A small indicates that and deviate from each other in one direction, which suggests that the imputation model distorts the data characteristics captured by .
To obtain the pairs , we added a step to the DA algorithm that replaces all values of and using the sampled parameter values at each iteration. We calculated the test statistics based on complete and replicated data sets, and their differences , . The estimated twosided , which does not indicate a deficiency in the imputation model for . The left and right panel of Figure 3 show the histogram of the twosided values for and , respectively. None of the and values are below 0.05. Thus, we did not observe implausible imputations. Similar model diagnostics were performed for the second stage imputation, and no significant lack of model fit was detected (not shown).
Because posterior predictive checks may not be well calibrated (Hjort, Dahl and Steinbakk, 2006), we also examined the imputation performance using a sample partitioning method. Patients in were partitioned into a training sample that included 90 of the patients. The remaining 10 served as a test sample. We fit the MVOP model to the training sample and predicted the assessments of the test sample. We repeated this partitioning and prediction process ten times, and in each replication we compared the distribution of total mean score and pairwise rank correlation coefficients of the predicted MDS assessments to the observed ones. Table 3 of the online supplement (Gu and Gutman, 2017b) shows the results of the ten replications. No significant lack of model fit was detected.
5.4 Analysis of Multiply Imputed Data
We compared the rates of functional change experienced by patients treated in SNFs and those treated by home health agencies using the observed and imputed assessments in MDS.
We define and to be the average change in total scores of the items in MDS over two assessments after discharge from IRFs for patients in and , respectively:
where and are the total scores of the observed items in MDS for patients in on admission and at a later assessment, respectively, and are the total scores of the imputed items in MDS for patients in on admission and at a later assessment, respectively, and are the number of patients in and , respectively, and . We also define and to be the proportion of patients whose functional status improved during the course of the postacute stay in and , respectively:
where is an indicator function that is equal to 1 if is true and 0 otherwise.
We applied the proposed NMI procedure to examine two quantities: (1) the difference in average change of total scores over the course of postacute stay in and , ; and (2) the difference in proportions of patients whose functional status improved during the postacute stay in and