Unified Method for Markov Chain Transition Model Estimation Using Incomplete Survey Data
Abstract
The Future Elderly Model and related microsimulations are modeled as Markov chains. These simulations rely on longitudinal survey data to estimate their transition models. The use of survey data presents several incomplete data problems, including coarse and irregular spacing of interviews, data collection from subsamples, and structural changes to surveys over time. The ExpectationMaximization algorithm is adapted to create a method for maximum likelihood estimation of Markov chain transition models using incomplete data. The method is demonstrated on a simplified version of the Future Elderly Model.
1 Introduction
1.1 Health Policy Simulations
The Future Elderly Model (FEM; Goldman et al., 2015) is a microsimulation of healthrelated outcomes for the U.S. population over age 50. The FEM was recently extended to simulate outcomes for the U.S. population starting at age 25 in the Future Americans Model (FAM; Goldman et al., 2016). Currently, an international effort is underway to adapt the FEM to populations around the world, including Canada (Boisclair et al., 2016), 10 European countries (euFem, 2017), Japan (Chen et al., 2016), Mexico, Singapore, and South Korea. Each of these models is simulated as a Markov chain and the transition probabilities are estimated using data from a longitudinal survey in which each individual is typically followed until death. The FEM is estimated using data from the Health and Retirement Study, public use dataset (2016; HRS) core interviews. Although the specific survey questions differ from country to country, many of these surveys are structured similarly to the HRS and include questions about individuals’ health, finances, family structure, utilization of medical services, employment, and participation in government programs (Hankin et al., 2001; Korea Employment Information Service, 2017; MHAS, Mexican Health and Aging Study, 2017; Research Institute of Economy, Trade and Industry, 2017; Statistics Canada, 2017; Survey of Health, Ageing and Retirement in Europe, 2017).
1.2 Model Specification, Estimation, Simulation, and Analysis
Policy simulation projects can be divided into three conceptual phases: estimation, simulation, and analysis. The estimation phase starts with vectors of observed outcomes, starting at time and ending at The observations are used to estimate the firstorder Markov transition probability: This is the probability of transitioning into state at time conditional on being is state in the previous time step, . A stationary, firstorder Markov chain is assumed here in order to keep notation simple. Neither of these assumptions are required and it is straightforward to extend the methodology to higherorder or nonstationary chains. This article focuses on maximum likelihood estimation of a parametric model where the transition probabilities are indexed by a parameter vector so that the transition probability functions have the form The estimation phase then consists of finding the parameter estimate, , that maximizes the likelihood of the observed sequence.
Next comes the simulation phase in which Monte Carlo methods are used to simulate future sequences of outcomes from the estimated transition probability model up to some time . Formally, for
Finally, the simulated outcomes are used as inputs to the analysis phase. In the health policy setting, simulation output is often used for forecasting, e.g., projecting future medical costs. The transition model can also be modified to represent a policy intervention and then a second simulation is run to asses the impact of the intervention. For example, a new medical intervention could be represented by reducing the probability of a simulated individual being diagnosed with cancer. The simulation analyst might look at the average difference in costs between the modified model and the scenario with no intervention to better understand the cost effectiveness of the intervention.
1.3 Estimation Challenges Stemming from Incomplete Survey Data
The FEM and related simulation projects use observed outcomes, , from survey data to estimate their transition models. This reliance on survey data introduces some incomplete data problems into the estimation phase. In the context of this article, incomplete means that the survey is missing data on some of the outcomes at some of the time points in the transition model to be estimated.
Examples include:
Coarse spacing of survey waves: The FEM relies on HRS for transition data. The FAM uses the Panel Study of Income Dynamics, public use dataset (2016; PSID). In both of these surveys, respondents are interviewed every two years. For a simulation of oneyear time steps, the surveys are missing all outcomes at every other time step. Only , , are observed
Irregular spacing of survey waves: The Mexico FEM uses (MHAS, Mexican Health and Aging Study, 2017) for transition data. Respondents were interviewed in 2001, 2003, 2012, and 2015. Even if twoyear time steps are chosen for simulation, there are missing data for outcomes between 2003 and 2012 and between 2012 and 2015. Similarly, the FEM for Europe uses the (Survey of Health, Ageing and Retirement in Europe, 2017), which has a twoyear wave cycle, but the 2008 survey does not ask the same questions as the other waves and so cannot be used as transition data.
Irregular spacing of individual interviews: There are occasions when a survey cannot reach all of its respondents within a wave cycle due budget or staff limitations. In these cases the time between a respondent’s consecutive interviews will be longer than the wave cycle. The responses might appear in the survey data as if they were collected within the correct wave, but in fact they should be counted in a later year, which leaves a gap of missing data.
Data collected from random subsamples: HRS measured biomarkers for a randomly selected subsample in 2004 (Crimmins et al., 2008). The collection was expanded to a randomly selected half of its sample in 2006 and 2010. Specimens were collected from the other half of the sample in 2008 and 2012 (Crimmins et al., 2013, 2015). Incorporating biomarkers as an outcome into the FEM transition model means data for at least half of the sample are missing in any given survey wave. The fouryear spacing between biomarker measurements also introduces the problem of missing data due coarse spacing.
Structural changes to survey data over time: HRS has been collecting data since the 19911992 wave and the FEM typically uses data starting in the 2000 wave for transition model estimation. Including biomarkers in the FEM transition model means some estimation data are missing from 2000 until the introduction of biomarkers.
Item reference time shorter than survey wave spacing: The PSID interviews collect information about income amounts and sources for the previous calendar year. Since the interviews are conducted in odd years, the income data are observed only for even years.
When data are complete, it is relatively easy to compute the likelihood of the observed outcome sequence and find the maximizing value. The likelihood of the sequence is simply the product of the transition probabilities at each time step.
(1) 
The marginal probability of is ignored here to keep the focus on transition model estimation. The loglikelihood is
and the difficulty of finding the maximizer depends mostly on the specification of
When an outcome is missing, the likelihood becomes more difficult to compute because outcomes in the same time step and the next time step depend on the missing outcome. Define as the set of outcomes observed in the survey at time and as the missing outcomes not observed in the survey at time Assume, for simplicity, that is fully observed so that and Then, the likelihood of the observed outcome sequence is:
(2) 
In this case, the integration makes it more difficult to find the maximizer. Nevertheless, (2) is the likelihood of the observed data. If the goal is to use maximum likelihood estimates in the simulation phase, then the maximizing value of this likelihood is required.
1.4 Unified Solution
This article presents a method to obtain the maximum likelihood estimates from (2) that is based on the ExpectationMaximization (EM) algorithm (Dempster et al., 1977). Section 2 shows how the EM algorithm can be applied to estimate Markov Chains using incomplete data and shows how machinery from the simulation phase could be reused to assist in the estimation phase. The result is an iterative process:

Simulate several copies of the missing survey data using the estimated transition model.

Compute imputation weights for each simulated data set.

Reestimate the transition model using the weighted simulated data sets.
This is repeated until the transition model parameter estimates meet some convergence criterion between iterations. Section 3 demonstrates how to apply this to the FEM and related simulations mentioned in Section 1.1. Section 4 provides implementation and performance details. Although the focus here is on solving some common problems in the estimation phase, it turns out that these same tools can be used to solve a related problem in the simulation phase by adjusting the analysis. This is discussed in Section 5 along with suggestions for performance improvements and a note of caution about the types of missing survey data.
2 ExpectationMaximization for Markov Chains
2.1 ExpectationMaximization Algorithm
The EM algorithm is a general method for maximum likelihood estimation in the presence of missing data. A brief sketch of EM for a general model is offered to introduce notation before bringing in the additional details that come with Markov chain models. Suppose a parametric probability model, where is a multivariate random variable. Suppose further that only some components of are observed. Let and respectively represent the observed and missing components of . If could be observed, then the likelihood of the complete data would be The likelihood for the observed data is an integration over the complete data likelihood with respect to the unobserved data:
(3) 
The EM algorithm provides an iterative approach to finding a maximizer of (3) instead of trying to find the maximizer directly, which is often difficult because of the integration. Start with an initial estimate of the parameters, . Then, repeat the following iteration starting at

Estep: Compute the expected value of the complete data loglikelihood conditional on the observed data and using parameter values
(4) The second line in (4) comes from the fact that the probability of conditional on is . Note that will be used as function of , so the Estep is not computing a numerical result, but is instead computing pieces of a function that will be used in the next step.

Mstep: Keeping constant, find the value of that maximizes :
(5)
Dempster et al. (1977) show how the sequence converges to maximize (3). In practice, the iteration is typically stopped when some convergence criterion is met, e.g., when or become very small. There are several methods for computing confidence intervals or performing hypothesis tests about the parameters based on the EM estimates. Section 4.2.3 of Givens and Hoeting (2012) has an overview of these methods with references.
2.2 ExpectationMaximization for Estimating Markov Chain Transition Models
Now, returning to Markov chain models with incomplete estimation data, the complete data likelihood (1) can be restated in terms of the observed and missing components:
Taking the log and putting this into the Estep gives:
Applying Bayes’ theorem to the term involving yields:
The denominator is a function of only and observed data. It can be ignored when searching for the maximizing value. Putting the numerator back into the Estep gives:
(6) 
2.3 Estep Computation Using Importance Sampling
Although the terms inside the integrals in (6) are computationally simple to evaluate, the integrals themselves can be challenging because of the Markov chain dependency structure. Instead of trying to evaluate these integrals analytically, a Monte Carlo approximation can be used. The ease of this approach depends on the ability to factorize into (a) terms that can be easily simulated and (b) terms that can be easily computed.
One approach is to use the factorization:
(7) 
Starting at , where is fully observed, Monte Carlo methods can be used to simulate draws, (), from . Computing and for each of these draws gives
as a Monte Carlo approximation of
Moving to , Monte Carlo draws of can be simulated from for and then compute . Combining the time steps gives
as a Monte Carlo approximation of
Continuing like this through time , let
then is a Monte Carlo estimate of (6). The goal here is to simulate from the distribution of missing outcomes conditional on all observed outcomes in past, present, and future time steps. This is accomplished by simulating the missing outcomes from a different distribution (dependent only on the previous time step), then weighting the simulated outcomes to match the target distribution (dependence on current and future time steps).
If (7) does not yield convenient simulation and computation, other factorizations can be considered, such as,
(8) 
Here, is simulated conditional on the the previous time step and observed data in the same time step, . Repeating times and letting
gives as the Monte Carlo approximation of (6).
If outcomes in the same time step are conditionally independent given outcomes at the previous time step, then the factorization is simply:
(9) 
Simulate from and compute weights,
Due to the conditional independence assumption in the transition model, the ability to simulate from will also be needed for the simulation phase. As a result, conditional independence allows the simulation developer to solve two problems with one implementation. It is not necessary to use the same factorization in every time step. As shown in Section 3, the FEM uses conditional independence (9) for all outcomes when an individual is alive at time factorization (7) when the mortality status is missing, and (8) if the individual was alive at time and died by time .
In practice, without conditional independence, the factoring of outcomes into a missing block and an observed block in (7) and (8) might not lead immediately to a structure that is easily simulated or weights that are easily computed. In that case, it is necessary to look at the dependencies between individual outcomes in and and to factorize into smaller blocks of outcomes. This is completely dependent on the transition model structure in a particular simulation project, but the guiding principle is to look for blocks of missing outcomes that can be easily simulated by using the same functions needed for the simulation phase.
To summarize, the EM algorithm is modified by inserting imputation (I) and weighting (W) steps:

Estep: For :

For :

Istep: Simulate the missing outcomes at time to create a complete data set:

Wstep: Compute the probability of the observed outcomes at time :


Compute


Mstep: Keeping constant, find

Stop when convergence criterion is satisfied.
The implementation details of the Istep and Wstep depend on how is factorized at each time The convergence check is required as the final step because using finite will leave some Monte Carlo error between and . This approach can be seen as a hybrid between the analytical Estep of Dempster et al. (1977) and the fully Monte Carlo Estep of Wei and Tanner (1990). This method is called EMMCIS (ExpectationMaximization for Markov Chains using Importance Sampling) in what follows.
3 Application to Future Elderly Model
The FEM simulates a rich set of outcomes from HRS, including physical and mental health status, disability, health risk factors, employment status, earnings, retirement income, and Medicare and Medicaid participation. A simplified version of the model is used here to demonstrate the new method. It captures important features of the FEM, but sacrifices richness for ease of demonstration. Specifically, the simplified model includes current smoking status, diagnosis of diseases, and mortality as binary yes/no transitioned outcomes. Age is computed from birth year. Gender and race/ethnicity are fixed characteristics. The possible disease diagnoses are cancer, diabetes, heart disease, hypertension, lung disease, and stroke. Each disease diagnosis is an absorbing state – if an individual reports a diagnosis at time they will remain in that state for every subsequent time step. Smoking status is transient. The demographic variable states are male or female for gender and white, Hispanic, or nonHispanic black for race/ethnicity. Following the FEM, the simplified model is a firstorder Markov chain and outcomes are independent conditional upon the state in the previous time step and mortality in the current time step. If an individual dies in the current time step, then the model does not simulate their transition in any of the other outcomes. This matches the structure of HRS core interview data used for estimation (RAND HRS Data, Version P, 2016).
Outcome  Dependencies in previous time step 

Smoking status  cancer, diabetes, heart disease, hypertension, lung disease, stroke, smoking status 
Cancer  smoking status 
Diabetes  smoking status 
Heart disease  diabetes, hypertension, smoking status 
Hypertension  diabetes, smoking status 
Lung disease  smoking status 
Stroke  cancer, diabetes, heart disease, hypertension, smoking status 
Mortality  cancer, diabetes, heart disease, hypertension, lung disease, stroke, smoking status 
Each of the FEM outcomes considered here is modeled as a Probit regression. Let the outcomes for individual at time be denoted as with as smoking status and as mortality status. The outcomes for are the six disease conditions, but the exact assignment of disease condition to index is not important. Let be the vector of outcomes at time on which depends (see Table 1). For example, because smoking status at time depends on all outcomes in the previous time step. Let be the vector of individual covariates for outcome at time containing a unit constant, age, demographic indicators, and the elements of The Probit model for each outcome has its own parameter vector of coefficients corresponding to the elements of : an intercept term, an age coefficient, coefficients for demographics, and coefficients for outcomes in the previous time step. Each transition probability is then computed as with adjustments for absorption and mortality, where appropriate. Let be the vector of outcomes, including age and demographics, for individual at time The loglikelihood has the following components:
for smoking status,
for disease conditions (), and
for mortality. Putting everything together, the loglikelihood for the simplified FEM model is:
This is the likelihood of the model to be simulated. However, the HRS data for estimating it are incomplete.
There are missing responses in HRS data because some respondents could not be contacted at all in a particular survey wave or because some respondents did not answer specific questions. For the sake of demonstration, assume all of these data are missing at random conditional on the observed data. See Section 5.2 for discussion. Additionally, the model uses a oneyear time step. This means HRS data are only available for every other time step. However, the missing time steps can be filled in with information about absorbing outcomes because implies for all and implies for all The only times when will be missing for absorbing outcomes are when the most recent observation is 0 and the subsequent observation is 1. Individuals’ ages and demographics are always observed. The loglikelihood for the observed data is:
where and are the observed and missing outcomes for individual at time , respectively.
The EMMCIS method was implemented for this demonstration by iterating over individuals at the highest level and iterating sequentially over time steps within each individual. For each time step, the first task is to check mortality status and impute it if it is unknown. Then, if the individual is alive at time (either observed or imputed), the conditional independence of outcomes allows processing the remaining outcomes in any order or in parallel. The Estep is applied to each individual is as follows:

For :

Initialize

For :

If is missing, simulate with probability , otherwise set and update

If , stop iteration over and move to next in step 1.

For :

If is missing, simulate with probability , otherwise set and update



Compute

Note that the Istep and Wstep are applied to each outcome in steps 1.b.1 and 1.b.3.a. The Mstep is computed by combining imputations across individuals and the weights:
(10) 
4 Implementation
The performance of EMMCIS in estimating the simplified FEM model was evaluated using individuals surveyed by HRS in 2000 and followed until death or 2014. The estimation was implemented in R (R Core Team, 2016) and run on a server using 20 CPUs for parallelization over individuals. The server had 196 GB of memory and 2.4 GHz clock speed for each CPU. There was some competition for CPUs from other users while the estimation was running, but the estimation had at least 85% of the CPU time.
The R implementation used the plyr package (Wickham, 2011) with the doParallel backend (Revolution Analytics and Weston, 2015) to divide work between nodes. To generate an initial estimate of each , a Probit regression was run on the observed data, i.e., treating twoyear time steps as a oneyear time step and throwing away any transitions with a missing outcome value () or missing values for dependencies in the previous time step (). The Mstep used the BFGS method for optimization via the optim function in R. The BFGS iteration was run for a prespecified number of iterations or until its default convergence criterion (a relative likelihood change of less than roughly ) was met. The EM iteration was stopped when the absolute change in likelihood between subsequent iterations was less than
There are two conceptual gains in each EM iteration. First, there is a gain from computing the Estep with a value that is closer to the maximizer than . Second, there is a gain from running the Mstep to find a more optimal value. Each of these gains has a cost: the Estep requires simulating copies of the missing values and the Mstep involves repeatedly evaluation the function (and its gradient). During implementation tests, it was observed that initial EM iterations made large gains of the second kind from only one BFGS iteration while the first kind of gain had smaller magnitudes. This was true even when was small. This led to an adaptive iteration scheme where the number of iterations in the Estep and Mstep were increased based on changes in the function. At the beginning of the estimation, was set to 10 and the maximum number of BFGS iterations was set to 3. If any iteration ended with then was increased by a factor of ten. The threshold for increasing the number of iterations was based on a roughly estimated upper bound for the Monte Carlo variance of using a small number of Monte Carlo replicates. Further work is needed to determine an appropriate threshold. After reached 1,000, then, in any iteration that failed to increase beyond the threshold, the maximum number of BFGS iterations was increased by a factor of 10 until reaching 300. With this adaptive strategy, the estimation converged on the 13th EM iteration after running for 126 hours. Across all EM iterations, was evaluated 998 times and its gradient 112 times.
5 Discussion
5.1 Paths to Improved Performance
There are some obvious ways to make the estimation in Section 4 use computational resources more efficiently. As a first step, some efficiency will be gained by translating the R code to a compiled language like C. However, the greatest gains should come from minimizing the amount of data communicated between processing nodes and reducing the number of iterations in the Mstep optimization. Implementation of any of these performance improvements is expected to make the estimation to run noticeably faster.
The current implementation starts by loading all of the observed data into the master node’s memory. During the Estep, the plyr methods in R communicate pieces of the observed data set to each worker node and the worker nodes return the corresponding complete data imputations. Pieces of the complete data imputations are then sent back to the worker nodes during each optimization iteration in the Mstep. It is possible to eliminate the communication of individual data points in a distributed memory environment with shared storage by creating worker processes with the following functionality.

Initialization: Load a section of the observed data from storage into local memory, e.g., the th node of total nodes will load data for individuals.

Estep: Receive from master node. For each individual stored locally, create imputations of missing data and compute .

Mstep: Receive from master node. Evaluate the complete data likelihood component (or its gradient or Hessian) for each individual stored locally. Send the weighted sum over these individuals back to the master node.
With this functionality, the largest object communicated between processes is a gradient or Hessian, both of which are much smaller than the observed and imputed data sets.
There are several ways that the Mstep implementation can be improved. The overall goal is to reduce the number of optimization iterations because evaluation of the function or its derivatives is very expensive. The optim function in R evaluates and its gradient separately. However, the same iteration over individuals, imputations, and time steps is required for both evaluations, so it would be more efficient to compute and its gradient during the same iteration when both evaluations are needed. Next, efficiency can be improved by exploiting the independence of the model parameters. Because the vectors are specific to each outcome, the Mstep in (10) can be separated into eight independent maximization problems:
Finally, with this independence in mind, Newton’s Method might be a faster alternative to the BFGS method. The FEM models rarely have more than 20 coefficient parameters in each and it is relatively easy to compute the Hessian of each likelihood component.
5.2 Appropriate Types of Missingness
The incomplete data examples discussed in 1.3 are all due to circumstances unrelated to the state of the survey respondent. In general, the method developed here can be applied when the unobserved state of a missing outcome is independent of the fact that its missing, conditional on all other observed data. The method cannot be applied to a missing data point for an individual when combining the observed data for that individual with the fact the the data point is missing makes the unobserved state more or less likely. In these cases, the transition likelihood needs to be adjusted to condition on the missing status of the outcome. Little and Rubin (2002) provide a detailed discussion of the EM algorithm applied to different missing data mechanisms.
5.3 Comparison of Other Methods
A maximum likelihood solution to the coarse data spacing issue discussed in Section 1.3 is given by Craig and Sendi (2002) and, more recently, Chhatwal et al. (2016). Craig and Sendi (2002) also provide an EM solution to the irregular data spacing issue. The method presented here is more broadly applicable because it does not require discrete states and does not require homogeneity or stationarity assumptions. The present method can also be applied more generally because it does not depend on any structural pattern in the observed and missing data.
5.4 Connection to Bridge Simulations
The EMMCIS method addresses the problem of incomplete survey data during the estimation phase. There is a related problem in the simulation phase that can be solved using the same tools. Suppose that a fully specified transition model is already present (either estimated or obtained from some external source) and the goal is to simulate outcomes starting with observed outcomes as the initial state. This is straightforward: use the transition model to simulate conditional on the observed outcomes, simulate conditional on the simulated outcomes, and so on. Now suppose that outcomes from a later time are also observed as with . How can the outcomes be simulated from the given transition model, but conditional on the observed starting and ending states, and ? Following Section 2, the target model for simulation is:
Once again, the model can be simulated by importance sampling. After simulating a replicate of the chain, using the transition model, compute the replicate weight as . Results in the analysis phase are then computed as a weighted average:
Section 6.4.1 of Givens and Hoeting (2012) discusses bias and other important theoretical considerations when using importance weights in the analysis.
Acknowledgments
The HRS (Health and Retirement Study) is sponsored by the National Institute on Aging (grant number NIA U01AG009740) and is conducted by the University of Michigan. This work was supported by the National Institute On Aging of the National Institutes of Health under Award Number P30AG024968. The content is solely the responsibility of the author and does not necessarily represent the official views of the National Institutes of Health.
References
 Boisclair, D., CôtéSergent, A., LalibertéAuger, F., Marchand, S., and Michaud, P.C. (2016). A health microsimulation model for Quebec and Canada. Technical report, Industrial Alliance Research Chair on the Economics of Demographic Change.
 Chen, B. K., Jalal, H., Hashimoto, H., Suen, S.C., Eggleston, K., Hurley, M., Schoemaker, L., and Bhattacharya, J. (2016). Forecasting trends in disability in a superaging society: Adapting the Future Elderly Model to Japan. The Journal of the Economics of Ageing, 8:42 – 51. The Demographic Dividend and Population Aging in Asia and the Pacific.
 Chhatwal, J., Jayasuriya, S., and Elbasha, E. H. (2016). Changing cycle lengths in statetransition models: Challenges and solutions. Medical Decision Making, 36(8):952–964.
 Craig, B. A. and Sendi, P. P. (2002). Estimation of the transition matrix of a discretetime Markov chain. Health Economics, 11(1):33–42.
 Crimmins, E., Faul, J., Kim, J. K., Guyer, H., Langa, K., Ofstedal, M. B., Sonnega, A., Wallace, R., and Weir, D. (2013). Documentation of Biomarkers in the 2006 and 2008 Health and Retirement Study. University of Michigan Survey Research Center, Ann Arbor, MI.
 Crimmins, E., Faul, J., Kim, J. K., and Weir, D. (2015). Documentation of Biomarkers in the 2010 and 2012 Health and Retirement Study. University of Michigan Survey Research Center, Ann Arbor, MI.
 Crimmins, E., Guyer, H., Langa, K., Ofstedal, M. B., Wallace, R., and Weir, D. (2008). Documentation of Physical Measures, Anthropometrics and Blood Pressure in the Health and Retirement Study. University of Michigan Survey Research Center, Ann Arbor, MI.
 Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1–38.
 euFem (2017). http://www.eufem.org.
 Givens, G. H. and Hoeting, J. A. (2012). Computational Statistics. John Wiley & Sons, 2nd edition.
 Goldman, D. P., Ermini Leaf, D., and Tysinger, B. (2016). The Future Americans Model: Technical documentation. Technical report, University of Southern California, https://healthpolicy.app.box.com/v/FAMTechdoc.
 Goldman, D. P., Lakdawalla, D., Michaud, P.C., Eibner, C., Zheng, Y., Gailey, A., Vaynman, I., Sullivan, J., Tysinger, B., and Ermini Leaf, D. (2015). The Future Elderly Model: Technical documentation. Technical report, University of Southern California, https://healthpolicy.app.box.com/v/FEMTechdoc.
 Hankin, J. H., Stram, D. O., Arakawa, K., Park, S., Low, S.H., Lee, H.P., and Yu, M. C. (2001). Singapore Chinese Health Study: development, validation, and calibration of the quantitative food frequency questionnaire. Nutrition and Cancer, 39(2):187–195.
 Health and Retirement Study, public use dataset (2016). Produced and distributed by the University of Michigan with funding from the National Institute on Aging (grant number NIA U01AG009740).
 Korea Employment Information Service (2017). Korean Longitudinal Study of Ageing. http://survey.keis.or.kr/eng/klosa/klosa01.jsp.
 Little, R. and Rubin, D. (2002). Statistical Analysis with Missing Data. John Wiley & Sons, 2nd edition.
 MHAS, Mexican Health and Aging Study (2017). Data files and documentation (public use): Mexican health and aging study. http://www.mhasweb.org.
 Panel Study of Income Dynamics, public use dataset (2016). Produced and distributed by the Institute for Social Research, University of Michigan.
 R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
 RAND HRS Data, Version P (2016). Produced by the RAND Center for the Study of Aging, with funding from the National Institute on Aging and the Social Security Administration.
 Research Institute of Economy, Trade and Industry (2017). Japanese Study of Aging and Retirement. http://www.rieti.go.jp/en/projects/jstar/.
 Revolution Analytics and Weston, S. (2015). doParallel: Foreach Parallel Adaptor for the ’parallel’ Package. R package version 1.0.10.
 Statistics Canada (2017). National Population Health Survey. http://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=3225.
 Survey of Health, Ageing and Retirement in Europe (2017). http://www.shareproject.org.
 Wei, G. C. and Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. Journal of the American Statistical Association, 85(411):699–704.
 Wickham, H. (2011). The splitapplycombine strategy for data analysis. Journal of Statistical Software, 40(1):1–29.