Regression by clustering using Metropolis-Hastings
High quality risk adjustment in health insurance markets weakens insurer incentives to engage in inefficient behavior to attract lower-cost enrollees. We propose a novel methodology based on Markov Chain Monte Carlo methods to improve risk adjustment by clustering diagnostic codes into risk groups optimal for health expenditure prediction. We test the performance of our methodology against common alternatives using panel data from 3.5 million enrollees of the Colombian Healthcare System. Results show that our methodology outperforms common alternatives and suggest that it has potential to improve access to quality healthcare for the chronically ill.
Adolfo Quiroz (Universidad de los Andes, Bogotá, Colombia), Simón Ramírez-Amaya (Quantil and Universidad de los Andes, Bogotá, Colombia), Álvaro Riascos (Quantil and Universidad de los Andes, Bogotá, Colombia)
Preprint. Work in progress.
1 Introduction
Contrary to expenditures on other services, health care expenditures are characterized both by large random variation and by large predictable variation across individuals (Van de ven and Ellis, 2000). Such differences create potential for efficiency gains due to risk reduction from insurance, and raise concerns about fairness across individuals with different expected needs in unregulated competitive health insurance markets.
Cross-subsidies based on the observable characteristics of enrollees have become an increasingly important regulatory tool in health insurance markets. This type of subsidy has the potential to decouple insurers' expected costs from expected profits and thus weaken their incentives to manipulate insurance products to attract lower-cost consumers. Enrollee-based subsidies to insurers are known as risk adjustment, and their introduction has been motivated by a broader shift towards regulated private insurance markets (Geruso and Layton, 2015; Gruber, 2017).
The main purpose of a risk-adjusted system is to allocate resources from enrollees to insurers in such a way that the risk that enrollees represent to insurers is equalized, or at least smoothed. On one side, the system receives mandatory income-related contributions from consumers. On the other, it pays anticipated risk-adjusted payments to insurers. Risk-adjusted payments are delivered on a subscription basis for providing services under specified standards, regardless of the actual nature or number of services provided. Figure 1 presents an abstraction of the most relevant features of risk-adjusted systems.
Several policy choices need to be fine-tuned for risk-adjustment systems to work properly. Among them is the estimation of risk-adjusted payments. Since the cost level of a health care service package is hard to determine, payments are based on observed expenses rather than needs-based costs (Van de ven and Ellis, 2000). High-quality estimation is vital, since poor risk adjustment gives insurance companies incentives to engage in cream-skimming (Cutler and Zeckhauser, 1999; Ven et al., 2000; Shmueli and Nissan-Engelcin, 2013). This type of behavior threatens risk solidarity and efficiency, and may even lead to the unraveling of the insurance market itself.
Evidence suggests that demographic adjusters such as age, sex and residence are weak predictors of individual expenditure, and that risk adjustment can be greatly improved by using diagnosis-based information (Van de ven and Ellis, 2000). Since the early 1980s, a considerable amount of research has developed risk adjustment models that use diagnostic information from insurance claims to estimate risk-adjusted payments.
Although each model has its own unique features, they share two characteristics worth highlighting. First, all models rely on the diagnostic standard known as the International Classification of Diseases (ICD) of the World Health Organization (WHO). In its 10th revision, the ICD allows for more than 14,000 distinct codes for disease identification. Second, since the code space is large, models rely on classification systems to cluster ICD codes into meaningful Diagnostic Risk Groups (DRG).
Traditionally, DRG have been constructed using ad hoc expert criteria based on clinical, cost and incentive considerations (Van de ven and Ellis, 2000; Juhnke et al., 2016). The most refined of these classification systems begin by classifying diagnoses into a tractable number of diagnostic-based groups and then use these groups to classify individuals according to the specific combination of conditions each individual has. More recently, there have been attempts to inform DRG construction through iterative hypothesis testing (Hughes et al., 2004).
Despite the wealth of classification systems available, the quality of risk adjustment remains limited (Kleef et al., 2013; Brown et al., 2014; Alfonso et al., 2014; Riascos et al., 2017). We believe that risk adjustment can be improved by formally approaching the problem of finding optimal DRG. In this paper we develop a methodology aimed at solving this problem that uses Markov Chain Monte Carlo methods to efficiently traverse the space of possible solutions. We also test our methodology against common alternatives in the Colombian Health Sector using two-year panel data for 3.5 million enrollees. Results show that our methodology outperforms common alternatives and suggest that it has the potential to improve access to quality healthcare for the chronically ill.
2 Theoretical Framework
Let $I$ be a finite index of all observations and $D = \{(x_i, z_i, y_i)\}_{i \in I}$ the set of all observations. Here $x_i$ is a vector of characteristics that comprises continuous and discrete variables, $z_i$ is a categorical variable and $y_i$ is a continuous real dependent variable. Let $X$ be the set of continuous and discrete characteristics, $Z$ be the set of categorical characteristics and $\mathbb{R}$ the set of real numbers. When useful we represent $z_i$ as dummy variables. $|Z|$ is large.
We want to learn a hypothesis $h : X \times Z \to \mathbb{R}$ that minimizes a loss function $E[L(h; x, z, y)]$, where $L(h; x, z, y)$ is the loss given an example $(x, z, y)$ and hypothesis $h$. We will do this using as plausible hypotheses the set of linear learning functions, possibly including interactions among all features.
We expect good performance from reducing the dimension of the categorical feature space. Doing this may also reveal interesting relationships among the categorical variables.
Let $\mathcal{P}(Z)$ be the set of all partitions of $Z$, fix a natural number $k$ and let $\mathcal{P}_k(Z)$ be the set of all partitions of $Z$ with $k$ clusters (elements). Let $H_P$ be the set of all linear hypotheses (possibly including interactions) with categorical clusters defined by $P$. For every partition $P \in \mathcal{P}_k(Z)$ we can define an optimal learning hypothesis $h_P \in H_P$ trained using a training subsample of $D$. For example, $h_P$ could be the least squares learner.
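As an illustration, a least squares learner $h_P$ for a given partition can be sketched as follows, with one dummy column per cluster rather than per raw code. This is only a sketch under assumed data layouts; the function names and argument conventions are illustrative, not the paper's actual implementation.

```python
import numpy as np

def fit_partition_ols(X, z, y, partition):
    """Least squares learner h_P for a given partition of the codes.

    X         : (n, p) array of continuous/discrete features
    z         : length-n sequence of categorical codes (e.g. ICD codes)
    y         : length-n array of observed expenditures
    partition : dict mapping each code to a cluster id in {0, ..., k-1}
    """
    k = len(set(partition.values()))
    n = len(y)
    # One dummy column per cluster instead of one per raw code;
    # the cluster dummies also absorb the intercept.
    dummies = np.zeros((n, k))
    for i, code in enumerate(z):
        dummies[i, partition[code]] = 1.0
    design = np.column_stack([X, dummies])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design, beta

def mean_squared_loss(design, beta, y):
    # Estimate of E[L(h_P; x, z, y)] on a (testing) subsample.
    return float(np.mean((design @ beta - y) ** 2))
```

Grouping the codes before fitting keeps the design matrix at $p + k$ columns instead of $p + |Z|$, which is what makes traversing the partition space tractable.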
For a fixed $k$ the problem we want to solve is:
$$\min_{P \in \mathcal{P}_k(Z)} E\left[L(h_P; x, z, y)\right]$$
The expectation can be estimated with a large testing subsample of $D$. We propose solving this problem using a Metropolis-Hastings algorithm.
In order to use the Metropolis-Hastings algorithm we first need to introduce a distance function among partitions. Given $P, Q \in \mathcal{P}_k(Z)$ (from now on we omit the subscript $k$), define the partition distance $d(P, Q)$ between $P$ and $Q$ as the minimum number of elements that must be deleted from $Z$ so that the restrictions of $P$ and $Q$ to the remaining elements are identical. If in doing so a cluster becomes empty, it is no longer considered a cluster. This definition is equivalent to the following: $d(P, Q)$ is the minimum number of elements that have to be moved in $P$ (including the creation of clusters) so that the resulting partition is identical to $Q$. This last interpretation is the most appropriate in the context of the Metropolis-Hastings algorithm that we now introduce.
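The partition distance can be computed from the cluster-overlap counts via a maximum-weight matching between the clusters of the two partitions. A brute-force stdlib sketch (the function name and dict-based representation are illustrative assumptions):

```python
from itertools import permutations

def partition_distance(P, Q, elements):
    """d(P, Q): minimum number of elements to move in P so it equals Q.

    P, Q     : dicts mapping each element to a cluster id
    elements : the common ground set Z
    Uses the identity d(P, Q) = |Z| - w, where w is the weight of a
    maximum matching between clusters of P and clusters of Q, weighted
    by intersection sizes. Brute force over matchings is exponential in
    the number of clusters; in practice use the Hungarian algorithm
    (e.g. scipy.optimize.linear_sum_assignment) for polynomial time.
    """
    els = list(elements)
    p_ids = sorted(set(P.values()))
    q_ids = sorted(set(Q.values()))
    overlap = {(i, j): 0 for i in p_ids for j in q_ids}
    for e in els:
        overlap[(P[e], Q[e])] += 1
    if len(p_ids) > len(q_ids):  # match the smaller side injectively
        p_ids, q_ids = q_ids, p_ids
        overlap = {(j, i): v for (i, j), v in overlap.items()}
    best = max(
        sum(overlap[(i, j)] for i, j in zip(p_ids, perm))
        for perm in permutations(q_ids, len(p_ids))
    )
    return len(els) - best
```

The matching view makes the polynomial-time claim concrete: the assignment problem on the $k \times k$ overlap matrix is solvable in $O(k^3)$.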
To solve our optimization problem we will sample from the following distribution on $\mathcal{P}_k(Z)$:
$$\pi(P) = \frac{1}{C} \exp\left(-\lambda\, E\left[L(h_P; x, z, y)\right]\right)$$
where $C$ is a normalization constant that makes $\pi$ a probability distribution. The parameter $\lambda$ is a meta-parameter.
Let $P \in \mathcal{P}_k(Z)$ and consider the following Markov chain $q$ on $\mathcal{P}_k(Z)$ (i.e., proposal distribution):
$$q(P, Q) = \begin{cases} \dfrac{1}{|\{Q' : d(P, Q') = 1\}|} & \text{if } d(P, Q) = 1, \\[4pt] 0 & \text{otherwise.} \end{cases}$$
The proposal distribution only gives a positive transition probability to partitions that are sufficiently close: at most one element has to be moved to obtain the other partition. (Given $P$, for each cluster $c \in P$, let $n_c$ be the number of elements of $c$. Since $\sum_{c \in P} n_c = |Z|$, the number of partitions at distance 1 from $P$ is $|Z|(k-1)$, less the moves that would empty a singleton cluster.) The partition distance can be computed in polynomial time. Using the Metropolis-Hastings algorithm, we can modify $q$ to a new Markov chain $\tilde{q}$ that will have $\pi$ as its stationary distribution.
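The distance-1 neighborhood can be enumerated explicitly, which also makes the counting argument above concrete. A sketch with hypothetical names; moves that would empty a cluster are skipped so that the chain stays in $\mathcal{P}_k(Z)$:

```python
from collections import Counter

def distance_one_neighbors(P, k):
    """All partitions with k clusters at partition distance 1 from P.

    P : dict mapping each element to a cluster id in {0, ..., k-1}.
    A neighbor is obtained by moving exactly one element to a different
    cluster; moves that would empty a cluster are skipped, since the
    resulting partition would have fewer than k clusters.
    """
    sizes = Counter(P.values())
    neighbors = []
    for element, cluster in P.items():
        if sizes[cluster] == 1:
            continue  # moving this element would empty its cluster
        for target in range(k):
            if target != cluster:
                Q = dict(P)
                Q[element] = target
                neighbors.append(Q)
    return neighbors
```

When no cluster is a singleton, the enumeration returns exactly $|Z|(k-1)$ neighbors, matching the count used by the proposal distribution.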
Define a Markov chain $\tilde{q}$ on $\mathcal{P}_k(Z)$ in the following way. Let the acceptance ratio be defined as
$$\alpha(P, Q) = \min\left\{1, \frac{\pi(Q)\, q(Q, P)}{\pi(P)\, q(P, Q)}\right\}$$
and define:
$$\tilde{q}(P, Q) = \begin{cases} \alpha(P, Q)\, q(P, Q) & \text{if } Q \neq P, \\ 1 - \sum_{Q' \neq P} \alpha(P, Q')\, q(P, Q') & \text{if } Q = P. \end{cases}$$
Note that the normalization constant $C$ is irrelevant for the definition of $\tilde{q}$, since it cancels in the ratio $\pi(Q)/\pi(P)$. Therefore, $\pi$ is a stationary distribution of $\tilde{q}$ and, for any initial partition, the chain converges in distribution to $\pi$.
3 Empirical Framework
3.1 Data and models
We use two years of panel data containing all insurance claims of 3.5 million enrollees of the Colombian Healthcare System to test our methodology. We train four different linear models to predict individual expenditure, each with a different feature specification.
The first model is a naive benchmark that fits mean expenditure in the past year. The second model fits demographic features exclusively. The third and fourth models fit demographic features as well as individual membership to DRG. The third model uses DRG constructed using expert criteria (as reported in Riascos et al. (2017)), while the fourth uses DRG constructed using Metropolis-Hastings (MH-DRG).
3.2 Results and discussion
Figure 2 shows the mean absolute error, in Colombian currency, for each model specification over the full sample and over the upper and lower percentiles of the enrollee expenditure distribution. Inclusion of MH-DRG improves expenditure prediction over both the no-DRG and the traditional expert-DRG alternatives. This is decidedly the case for the upper percentile of the expenditure distribution, where the model's absolute error falls by about 5%.
The proposed methodology has the potential to improve risk adjustment and align incentives towards the provision of quality healthcare. In Colombia, the aforementioned improvement would imply a redistribution of resources among insurers on the order of millions of US dollars, an amount comparable to the system-wide resources that are currently redistributed among insurers by ex post risk adjustment mechanisms (Acuña, 2014).
4 Future work
Future research should consider a Metropolis-Hastings implementation that traverses the space of partitions of any (reasonable) size. That is, at each iteration, the chain would consider transitioning not only to neighboring partitions of the same size but also to neighboring partitions of smaller or greater size. This will likely require redefining the proposal distribution $q$ appropriately and, when computing the acceptance ratio $\alpha$, explicitly calculating $q(P, Q)$ and $q(Q, P)$, since the proposal is no longer symmetric. Further interesting questions can be derived from the broader problem of considering a wider set of neighboring partitions at each iteration.
Appendix A Algorithms
The Metropolis-Hastings algorithm is remarkably simple: starting from an arbitrary state, repeatedly draw a proposal $Q$ from $q(P, \cdot)$, accept it with probability $\alpha(P, Q)$, and otherwise remain at the current state $P$.
As stated before, in the problem at hand proposals are partitions of $Z$ into $k$ clusters. We denote by $P(z)$ the cluster of $z$ under partition $P$. Our specific implementation for a fixed $k$ closely follows this scheme.
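A minimal sketch of such an implementation follows. Function and parameter names are illustrative assumptions, `loss` stands for the testing-subsample estimate of the expected loss of $h_P$, and the symmetric distance-1 proposal lets the acceptance ratio reduce to $\pi(Q)/\pi(P)$:

```python
import math
import random

def mh_partition_search(codes, k, loss, lam=1.0, steps=1000, seed=0):
    """Metropolis-Hastings over partitions of `codes` into k clusters.

    Samples from pi(P) proportional to exp(-lam * loss(P)) and keeps
    track of the best partition visited. `loss` maps a partition
    (dict code -> cluster id) to its estimated expected loss.
    """
    rng = random.Random(seed)
    # Arbitrary initial partition with all k clusters non-empty.
    P = {c: i % k for i, c in enumerate(codes)}
    current = loss(P)
    best, best_loss = dict(P), current
    for _ in range(steps):
        # Propose a distance-1 move: one code to a different cluster.
        code = rng.choice(codes)
        target = rng.randrange(k)
        if target == P[code]:
            continue
        Q = dict(P)
        Q[code] = target
        if len(set(Q.values())) < k:
            continue  # move would empty a cluster; stay in P_k(Z)
        proposed = loss(Q)
        # Symmetric proposal: alpha = min(1, exp(-lam*(proposed - current))).
        if math.log(rng.random()) < lam * (current - proposed):
            P, current = Q, proposed
            if current < best_loss:
                best, best_loss = dict(P), current
    return best, best_loss
```

In practice each call to `loss` involves refitting the linear model under the proposed partition, so caching or incremental refitting is what dominates the cost of a run.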
It is important to note that from definition (3) it follows that, for a fixed $k$, $q(P, Q) = q(Q, P)$, and therefore when calculating the acceptance ratio $\alpha$ there is no need to explicitly compute $q(P, Q)$ and $q(Q, P)$.
- Acuña (2014) Acuña, L. (2014). El financiamiento de las enfermedades de alto costo. Technical report, Cuenta de Alto Costo.
- Alfonso et al. (2014) Alfonso, E., Riascos, A., and Romero, M. (2014). The performance of risk adjustment models in the Colombian competitive health insurance market. Technical report, Universidad de los Andes.
- Brown et al. (2014) Brown, J., Duggan, M., Kuziemko, I., and Woolston, W. (2014). How does risk selection respond to risk adjustment? New evidence from the Medicare Advantage program. American Economic Review, 104(10):3335–64.
- Cutler and Zeckhauser (1999) Cutler, D. M. and Zeckhauser, R. J. (1999). The anatomy of health insurance. Working Paper 7176, National Bureau of Economic Research.
- Geruso and Layton (2015) Geruso, M. and Layton, T. (2015). Upcoding: Evidence from Medicare on squishy risk adjustment. Working Paper 21222, National Bureau of Economic Research.
- Gruber (2017) Gruber, J. (2017). Delivering Public Health Insurance through Private Plan Choice in the United States. Journal of Economic Perspectives, 31(4):3–22.
- Hughes et al. (2004) Hughes, J. S., Averill, R. F., Eisenhandler, J., Goldfield, N. I., Muldoon, J., Neff, J. M., and Gay, J. C. (2004). Clinical risk groups (crgs): A classification system for risk-adjusted capitation-based payment and health care management. Medical Care, 42(1):81–90.
- Juhnke et al. (2016) Juhnke, C., Bethge, S., and Mühlbacher, A. C. (2016). A review on methods of risk adjustment and their use in integrated healthcare systems. In International journal of integrated care.
- Kleef et al. (2013) Kleef, R. C. V., Vliet, R. C. V., and de Ven, W. P. V. (2013). Risk equalization in the Netherlands: an empirical evaluation. Expert Review of Pharmacoeconomics & Outcomes Research, 13(6):829–839.
- Riascos et al. (2017) Riascos, A., Romero, M., and Serna, N. (2017). Risk adjustment revisited using machine learning techniques. Technical report, Universidad de los Andes.
- Shmueli and Nissan-Engelcin (2013) Shmueli, A. and Nissan-Engelcin, E. (2013). Local availability of physicians' services as a tool for implicit risk selection. Social Science & Medicine, 84:53–60.
- Van de ven and Ellis (2000) Van de ven, W. P. and Ellis, R. P. (2000). Risk adjustment in competitive health plan markets. In Culyer, A. J. and Newhouse, J. P., editors, Handbook of Health Economics, volume 1 of Handbook of Health Economics, chapter 14, pages 755–845. Elsevier.
- Ven et al. (2000) van de Ven, W. P. M. M., van Vliet, R. C. J. A., Schut, F. T., and van Barneveld, E. M. (2000). Access to coverage for high risks in a competitive individual health insurance market: Via premium rate restrictions or risk-adjusted premium subsidies. Journal of Health Economics, 19:311–339.