# On application of a response propensity model to estimation from web samples.

###### Abstract

Increasing nonresponse rates and the cost of data collection are two pressing problems encountered in traditional randomized surveys. The proliferation of inexpensive data from web surveys stimulates interest in statistical techniques for valid inferences from web samples. We consider estimation of population and domain means in the two-sample setup, where the web sample contains variables of interest and covariates that are shared with an auxiliary random sample. First, we propose an estimator of the population mean based on the estimated propensity of response to a web survey, a.k.a. web response propensity. This makes inference from web samples similar to well-established techniques used for observational studies and missing data problems. Second, we propose an “implicit” logistic regression for estimating parameters of the web response model in the two-sample setup. In addition to random sample design information, it utilizes random sample inclusion probabilities, nominally assigned to web sample units, and the size of the subpopulation of web responders. A simulation study confirms validity of the proposed estimator in comparison with alternative approximate estimators. We illustrate our method by estimating prevalence of chronic health conditions and related medication use for the U.S. population of adults, using web and random samples from an experimental web survey and the National Health Interview Survey.

Keywords: web samples, web response propensity, implicit logistic regression, weighted logistic regression, double-robust estimators, two-phase sampling.

† The findings and conclusions in this paper are those of the author and do not necessarily represent the views of the National Center for Health Statistics, Centers for Disease Control and Prevention.

## Introduction

Propensity models are widely used for inferences in observational studies and in survey sampling with missing data. In observational studies, the treatment effect is estimated after responses from treated and control populations are either matched or balanced by their propensity to be treated, which is derived from modeling the treatment assignment indicator, as in Rosenbaum and Rubin (1983), Rosenbaum and Rubin (1984), and Lunceford and Davidian (2004). In survey statistics, estimates of general population parameters are obtained from the data collected from responders by adjusting sampling weights of responding units by the estimated survey response propensity, thus calibrating the responding subpopulation to the general population (Sarndal and Lundstrom (2006); Haziza and Rao (2006)). Unbiased and consistent estimators of the treatment effect and of general population parameters, using the propensity scores estimated with treatment assignment or response models, were developed for both cases.

It is important to identify the target population in each case. For observational studies, it is the combined sample of treated and controls. In the case of estimation from survey samples with missing data, it is the general surveyed population. Therefore, estimation of response propensity must account for survey weights (Sarndal and Lundstrom (2006)). The problem in both cases is that the outcome variable was observed only for a part of the target population: response to treatment only for the treated, response without treatment only for the controls, or an outcome variable only for the survey responders. Generally, the propensity score is used to propagate the expected response to the rest of the target population, for which the outcome variable was not observed.

We formulate a problem of estimation from a web and random sample similar to the two problems described above. The target population in this case is the population surveyed by the random sample with known design but unobserved variables of interest. The web sample is assumed to represent the population of web responders, which is a subpopulation of the target population. If web response propensity is properly estimated, it can be utilized to propagate expectation of the outcome variable from the subpopulation of web responders to the target population. Under this approach, existing methodology can be applied to estimation from web samples.

However, estimating web response propensity in the two-sample setup is not straightforward. This is because the web response indicator variable is not observed in the combined web and random samples and cannot be directly modeled by standard techniques, such as logistic regression. Indeed, this indicator equals one for all units of the web sample and is undefined for the random sample. We propose a modification of logistic regression to estimate web response propensity. We call it “implicit” logistic regression because the maximum likelihood equations (MLE) are derived from modeling the web sample indicator variable, defined on the combined web and random samples, while the logistic link and associated parameterization are considered for the web response propensity, which is implicitly related to the expectation of the web sample indicator. Currently, the conventional way to estimate web response propensity involves weighted logistic regression, as in Beresovsky (2016), Elliott and Valliant (2017), and Valliant and Dever (2011).

Our proposed class of estimators of the target population mean in the two-sample setup is an adaptation of the estimators developed by Haziza and Rao (2006) and Kim and Haziza (2014) for the missing data problem. They utilize predictions of both web response and outcome variable models. These estimators may be considered an extension of the model-assisted calibration estimators, first presented by Wu and Sitter (2001) and recently reviewed by Breidt and Opsomer (2017). The proposed estimators are flexible, in that they admit any form of both models. Using predicted outcomes results in improved efficiency of the estimates and can also reduce bias due to an incorrectly specified web response model. Estimators of this class, initially proposed by Robins et al. (1994) for observational studies, are called “double-robust” in the sense that, when both models are included, only one of them needs to be correctly specified to produce unbiased estimates. Our formulation, however, is general, so that the proposed class of estimators may be based on only one of the models. For example, the estimators in this paper use only predictions by the web response model.

With properly estimated web response propensity, asymptotically unbiased variance estimators can be readily formulated in the randomization framework of Kim and Haziza (2014). Because the proposed point estimators comprise contributions from web and random samples, variance estimators account for variability associated with the selection of both samples.

In Section 1, using an analogy with propensity-based estimation of treatment effect in observational studies and of population parameters in survey sampling with missing data, we present estimators of population means from a web sample, which are unbiased if either the web response model or the outcome model is correctly specified. Variance estimators, derived in the web response model approach, are presented in Section 2. In Section 3, we introduce implicit logistic regression (ILR) for estimating parameters of the web response model by modeling the observed web sample indicator. Simulations, presented in Section 4, illustrate application of ILR for inferences of web response model parameters and outcome variable means. These inferences are compared with inferences using web response propensity approximately estimated by averaged implicit logistic regression (AILR) and weighted logistic regression (WLR). Section 5 provides an application of the proposed estimation methods to health care data, collected by fielding an identical questionnaire in the experimental web survey and the National Health Interview Survey (NHIS). Results are summarized, and directions for future research are discussed, in the Conclusion.

## 1 Inferences with propensity scores: common problems and solutions

It is important to qualify the estimation of general population characteristics from web samples as a typical problem of estimation using propensity scores. This helps to justify the application of estimation methodologies, originally developed for observational studies and nonresponse adjustment of estimators from survey samples. Specifics of and similarities between different cases of application of response propensity are described below.

Observational studies.

Outcome, covariates, and treatment indicator variables are available for all units of the combined sample of treated and controls. This combined sample coincides with the population of interest, for which the treatment effect is defined as the mean difference between responses with and without treatment. The problem is that response to treatment is observed only for the treated units, and response without treatment only for the controls, so the difference in responses cannot be estimated directly from existing data.

The solution, proposed by Rosenbaum and Rubin (1983), uses the propensity of treatment assignment to establish conditional ignorability between outcomes and treatment assignment. If the treatment propensity is specified correctly, then the treatment effect is unbiasedly estimated by matching treated and control units by propensity score and taking the expectation of the observed differences over the covariate space.

An alternative estimator of the treatment effect is generated by inverse propensity weighting (IPW) of treated and control outcomes.

This estimator is unbiased if the treatment propensity is correctly estimated by the treatment assignment model. Robustness to misspecification of the treatment model may be improved, along with efficiency, by augmenting the IPW estimator with predictions by outcome models for treated and controls

(1.1) |

This is an augmented IPW (AIPW) estimator, initially proposed by Robins et al. (1994). Lunceford and Davidian (2004) proved that the variance of the AIPW estimator is smaller than the variance of the IPW estimator. The AIPW estimator is easily shown to remain unbiased if only one of the treatment and outcome models is misspecified. This property is called double-robustness.
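As an illustration of the augmentation idea, the following minimal sketch implements the standard textbook form of the AIPW estimator of the average treatment effect, in the form discussed by Lunceford and Davidian (2004). The function and argument names are hypothetical, and the notation may differ from that of (1.1).

```python
def aipw_treatment_effect(y, t, e, m1, m0):
    """Augmented IPW estimate of the average treatment effect.

    y  : observed outcomes
    t  : treatment indicators (1 = treated, 0 = control)
    e  : estimated treatment propensities, strictly between 0 and 1
    m1 : outcome-model predictions under treatment
    m0 : outcome-model predictions under control
    All names are illustrative assumptions, not the paper's notation.
    """
    n = len(y)
    total = 0.0
    for yi, ti, ei, m1i, m0i in zip(y, t, e, m1, m0):
        # IPW term for the treated, augmented by the outcome prediction
        treated = ti * yi / ei - (ti - ei) / ei * m1i
        # IPW term for the controls, augmented by the outcome prediction
        control = (1 - ti) * yi / (1 - ei) + (ti - ei) / (1 - ei) * m0i
        total += treated - control
    return total / n
```

Setting both prediction vectors to zero recovers the plain IPW estimator, while perfect outcome predictions make the result equal to the mean predicted difference regardless of the propensities.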

Survey samples with nonresponse.

A random sample is drawn with known selection probabilities from the population of interest, but only part of the sampled units, identified by a response indicator, responded to the survey questionnaire. Because the outcome is available only for the respondents, direct estimation of the population mean by a weighted sum over the full sample is impossible.
The probability of response to the survey questionnaire, or response propensity, can be estimated if covariates are available for every sampled unit. In analogy with the IPW estimator of treatment effect, a similar estimator of the target population mean is

Properties of the IPW estimator are evaluated over the joint distribution of sample and response indicators; e.g., unbiasedness can be proved by taking the total expectation. Haziza and Rao (2006) proposed an AIPW-like estimator of the target population mean

(1.2) |

Besides the response propensity, it also uses predictions from the outcome model for survey respondents. Properties of the AIPW estimator can be evaluated over the joint distribution of indicators, or over that of the sampling indicator and the outcome variable. Its unbiasedness is easily demonstrated under both distributional assumptions, if the corresponding model is correctly specified. Kim and Haziza (2014) treated (1.2) as a class of estimators, differentiated by the specific method of estimating parameters of the response and outcome models. They considered one such estimator, for which parameters of both models are estimated by minimizing the difference between (1.2) and the Horvitz-Thompson estimator. They proved consistency and double-robustness of inferences obtained with this estimator.
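The structure of such an estimator can be sketched as follows. This is a minimal illustration of a doubly robust mean under nonresponse in the general form studied by Kim and Haziza (2014), written under the assumption that each sampled unit contributes its prediction plus, for respondents, a propensity-weighted residual. All names are hypothetical.

```python
def aipw_survey_mean(y, r, pi, p, m, N):
    """AIPW-type estimate of a population mean under nonresponse.

    y  : outcomes (may be None for nonrespondents)
    r  : response indicators (1 = responded)
    pi : sample inclusion probabilities
    p  : estimated response propensities
    m  : outcome-model predictions for every sampled unit
    N  : population size (assumed known in this sketch)
    """
    total = 0.0
    for yi, ri, pii, pri, mi in zip(y, r, pi, p, m):
        # residual correction is available only for respondents
        resid = (yi - mi) / pri if ri else 0.0
        # every sampled unit is expanded by its design weight 1/pi
        total += (mi + resid) / pii
    return total / N
```

With all predictions set to zero, the function reduces to the IPW estimator; with full response and unit propensities, it reduces to the Horvitz-Thompson mean.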

Web and random samples.

To formulate a web response propensity-based estimator of the population mean from web and random samples, it is important to formalize the setup of the problem.
Suppose that a random sample is drawn from the population of interest with known inclusion probabilities. At the same time, every unit of the population has a nonzero probability of response to a web survey, where the web response indicator defines one of the possible realizations of the subpopulation of web responders. A web sample is then drawn from this subpopulation with known probabilities, and may be viewed as a product of two-phase selection. Understanding the selection mechanism is important for deriving a variance estimator, which is discussed in the next section.
Because the outcome variable is collected only for the web sample, which is not representative of the population of interest, direct estimation of population parameters is impossible. Similar to other applications of response propensity, we use the propensity of response to a web survey to make the web sample representative of the population of interest. An estimator of the population mean from web and random samples can be formulated in analogy with the AIPW estimators (1.1-1.2)

(1.3) |

The summation includes units of the combined web and random sample. Sample indicators equal one for web sample units, and zero otherwise. Dependence of the estimator on the web response indicator appears when sampled units are identified on the population level by sampling indicators

(1.4) | |||

where and are any functions of covariates and outcome variables, which are treated as constants in the web response propensity approach. Therefore, expectation and variance of are evaluated depending on rather than sample indicator.

Similar to the nonresponse estimator (1.2), estimator (1.3) may be considered either in the web response propensity or in the outcome prediction model approach. It is unbiased in both cases if the corresponding model is correctly specified, which is easily demonstrated by taking expectation of the difference between the estimator and target population parameter. It is worthwhile to determine how to extend the proof of consistency of the nonresponse estimator by Kim and Haziza (2014) to the estimator (1.3) from web samples.
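The two-sample structure can be sketched in code. The sketch below assumes a form in which random sample units contribute design-weighted outcome predictions and web sample units contribute residuals weighted by the inverse of the product of selection probability and web response propensity. This form is an assumption made for illustration and is not a transcription of (1.3); all names are hypothetical.

```python
def web_aipw_mean(y_web, m_web, p_web, pi_web, m_rand, d_rand, N):
    """Two-sample AIPW-type estimate of a population mean (illustrative).

    y_web  : outcomes observed in the web sample
    m_web  : outcome-model predictions for web sample units
    p_web  : estimated web response propensities of web sample units
    pi_web : selection probabilities from the web responders' subpopulation
    m_rand : outcome-model predictions for random sample units
    d_rand : design weights (inverse inclusion probabilities) of random sample units
    N      : population size, or an estimate of it
    """
    # prediction part, expanded from the random sample
    pred_part = sum(d * m for d, m in zip(d_rand, m_rand))
    # residual part, expanded from the web sample
    resid_part = sum(
        (y - m) / (pw * p)
        for y, m, p, pw in zip(y_web, m_web, p_web, pi_web)
    )
    return (pred_part + resid_part) / N
```

When the outcome model is dropped (all predictions zero), only the IPW part over the web sample remains; when predictions reproduce the web outcomes exactly, only the random sample part remains.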

## 2 Variance estimation in the web response propensity approach

Kim and Haziza (2014) derived the variance of the population mean estimator with missing data in the response propensity approach using the inverse two-phase response framework of Shao and Steel (1999). It assumes that nonrespondents are identified for the finite population and that a sample of both kinds of units is then selected according to the given sampling design. This process is similar to the mechanism of web sample selection described in the previous section, except that only web responders are selected for the web sample. In the response propensity approach, the outcome variable is considered fixed, and the variance is evaluated over the joint distribution of sampling and response indicators using the law of total variance

(2.1) |

From (1.4), it follows that both web and random samples depend on the sampling indicator and, therefore, contribute to the variance of random sampling, which is estimated as the variance of the Horvitz-Thompson expansion estimator

(2.2) |

Units of the random sample are associated with predictions by the outcome model, and web sample units with residuals weighted by the inverse web response propensity. The estimator uses joint inclusion probabilities of random sampling for pairs of units, and independence is assumed between the sampling and web response indicators.

The other term of expression (2.1) is the variance of the conditional expectation, which is evaluated as the variance of Poisson sampling of web responders from the general population

(2.3) |

Multiplier appears as a result of estimating the variance from the available web sample.
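For reference, the textbook variance estimator of a Horvitz-Thompson total under Poisson sampling can be sketched as below. It is given only as a generic illustration of the Poisson sampling variance mentioned above, not as a transcription of (2.3); the names are hypothetical.

```python
def poisson_ht_variance(y, prob):
    """Estimated variance of the Horvitz-Thompson total under Poisson sampling.

    y    : values observed for the selected units
    prob : their selection probabilities (0 < prob <= 1)
    The population variance sum((1 - p) / p * y**2) over all units is
    estimated from the sample by dividing each term by p once more.
    """
    return sum((1.0 - p) / (p * p) * v * v for v, p in zip(y, prob))
```

Units selected with certainty contribute nothing to this variance, as expected for Poisson sampling.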

In our view, two-phase selection of web samples adequately represents the selection process characteristic of web surveys. However, separating web response and random sampling from the subpopulation of web responders requires additional information, which may not be available in practical situations. Therefore, we also consider variance estimation in the simplified one-phase framework, where the web sample is selected directly from the general population by Poisson sampling. Under this framework, the web sample coincides with the population of responders, so that web response and web sample selection are described by a single combined indicator and a single combined probability. In this case, only the random sample contributes to the sampling variance (2.2), and the total variance estimator is

(2.4) |

Variance estimators (2.2-2.3) and (2.4), corresponding to both randomization frameworks, are compared in simulations and found to be practically identical.

## 3 Implicit logistic regression for modeling web response propensity

Propensity of web response, essential for point and variance estimators (1.3) and (2.2-2.4), cannot be directly estimated because the indicator of web response is not observed on sample data. Beresovsky (2016), Elliott and Valliant (2017), and Valliant and Dever (2011) used the following “intuitive” approach to estimate web response propensity. Random sample units are assigned their design weights, and the units of the web sample are assigned their own selection weights, which simplify under the direct web sample selection framework. The distribution of sample indicators on the combined sample is fitted by weighted logistic regression (WLR), which makes it possible to predict the Z-propensity. Consider the general and web responders' population sizes conditional on covariates. For large samples, the weighted Z-propensity must converge to its population value, and the web response propensity to the ratio of web and general population sizes. The WLR estimator of web response propensity is derived from these expressions as

(3.1) |

This methodology is approximate and suboptimal because solutions of the maximum likelihood equations (MLE) are parameters of the Z-model rather than of the web response model. Also, using a logistic link with the Z-propensity is incorrect because in the large-sample limit it varies between 0 and 0.5. These problems can be addressed by modifying MLE in such a way that parameters of the web response model are directly estimated by fitting the distribution of sample indicators. Because the relation between sample indicators and web response indicators is indirect, we refer to this methodology as “implicit” logistic regression (ILR).
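The large-sample argument behind the WLR approach can be illustrated numerically. The sketch below assumes that the weighted Z-propensity converges to the web responders' population size divided by the sum of the general and web responders' population sizes, which is consistent with its stated range of 0 to 0.5. Under that assumption, the web response propensity is recovered as the odds of the Z-propensity. The function name is hypothetical.

```python
def web_propensity_from_z(p_z):
    """Convert a weighted Z-propensity into a web response propensity.

    Assumes p_z = N_web / (N + N_web) in the large-sample limit, so that
    p_z / (1 - p_z) = N_web / N, the ratio of web and general population sizes.
    """
    if not 0.0 <= p_z < 1.0:
        raise ValueError("p_z must lie in [0, 1)")
    return p_z / (1.0 - p_z)


# Example with hypothetical sizes: 250 web responders among 1,000 units
n_general, n_web = 1000, 250
p_z = n_web / (n_general + n_web)
p_web = web_propensity_from_z(p_z)  # close to n_web / n_general
```

A Z-propensity above 0.5 would map to a propensity above one, which shows why a logistic link on the Z-propensity scale is poorly matched to this quantity.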

A key premise of ILR is the relation between web response and Z-propensities, which follows from expressing the joint probability distribution of sample indicator variables through conditional probability

(3.2) |

The probability of belonging to a web sample, under two-phase web sample selection, is a product of the selection probability from the web responders' population and the web response propensity. The probability of belonging to a combined sample follows from general principles of probability

If sampling fractions are small, the last term can be neglected from the above expression. Finally, we determine the relation between web response and Z-propensities, which is central to ILR

(3.3) |
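The reasoning in this derivation can be sketched numerically. The function below assumes independence between random sampling and web sample membership, so that the probability of belonging to the combined sample follows from inclusion-exclusion, and the Z-propensity is the conditional probability of web sample membership given membership in the combined sample. This is an illustrative reading of the derivation rather than a transcription of (3.3); all names are hypothetical.

```python
def z_propensity(p, pi_r, pi_w, small_fractions=True):
    """Probability that a unit of the combined sample is a web sample unit.

    p    : web response propensity
    pi_r : random sample inclusion probability
    pi_w : selection probability from the web responders' subpopulation
    """
    in_web = pi_w * p  # two-phase selection into the web sample
    if small_fractions:
        # neglect the product term when sampling fractions are small
        in_combined = pi_r + in_web
    else:
        # inclusion-exclusion under independence of the two selections
        in_combined = pi_r + in_web - pi_r * in_web
    return in_web / in_combined
```

For small sampling fractions, the exact and approximate versions differ only slightly, which is the justification given for dropping the last term.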

Note that ILR requires knowing both selection probabilities for every unit of the combined sample. Assigning random selection probabilities to web sample units requires collecting additional information from web responders, which adds burden for administrators and responders of web surveys. Average implicit logistic regression (AILR) can be formulated by assigning average design weights to web sample units. Inferences produced by ILR, AILR, and WLR are compared in simulations.

We start with MLE resulting from fitting sample indicator with regular logistic regression

(3.4) |

Expression (3.3) defines the implicit dependence of these MLE on the parameters of the web response model. They are introduced with the regular logistic link function, applied to vectors of covariates and model parameters of equal dimension, including an intercept.

Solving equations (3.4) requires expressing them as a function of web response model parameters . This is straightforward for and , but deriving the modified link function is more involved. Finally, we obtain , where

(3.5) |

Now it is possible to estimate model parameters iteratively by the Fisher scoring algorithm described by Searle et al. (1992). At each iteration step, the next values of the parameters are calculated as

(3.6) |

It is understood that all components of these equations are evaluated at the current parameter values. The asymptotic variance of the estimates takes the familiar form of the inverse of the information matrix used in the scoring step.
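The iteration can be illustrated with a minimal sketch of Fisher scoring for ordinary logistic regression with an intercept and one covariate. It uses the regular logistic link and does not implement the modified link (3.5), so it shows only the mechanics of the scoring step and of the inverse-information variance. All names are hypothetical.

```python
import math


def fisher_scoring_logistic(x, y, n_iter=25, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x by Fisher scoring.

    Returns the estimates (b0, b1) and the inverse information matrix,
    which serves as the asymptotic covariance of the estimates.
    """
    b0, b1 = 0.0, 0.0
    cov = None
    for _ in range(n_iter):
        s0 = s1 = 0.0          # score vector
        i00 = i01 = i11 = 0.0  # expected information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            s0 += yi - p
            s1 += xi * (yi - p)
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # inverse of the 2x2 information matrix at the current values
        cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
        # scoring step: inverse information times score
        step0 = cov[0][0] * s0 + cov[0][1] * s1
        step1 = cov[1][0] * s0 + cov[1][1] * s1
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return (b0, b1), cov
```

For a binary covariate the fitted probabilities reproduce the observed group proportions, which gives a simple check of convergence.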

## 4 Simulation study

This simulation study compares inferences of parameters of web response model and finite population means of outcome variables, obtained using three estimators of web response propensity.
We simulate a general population of fixed size, from which a subpopulation of web responders, of variable size, is selected by Poisson sampling with unit-specific probability for each simulation. A random sample of fixed size is selected from the general population with probability proportional to size (PPS), depending on size measures. The web sample is selected either following an inverse two-phase framework or directly from the general population. In the first case, a simple random sample (SRS) of fixed size is selected from the subpopulation of web responders. In the second case, the probability of web response is chosen in such a way that the web sample coincides with a subpopulation of web responders of the desired average size.
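The two selection mechanisms used in this design can be sketched as follows. The sketch assumes the common definition of PPS inclusion probabilities as the sample size times the unit's share of the total size measure; the function names, seed, and inputs are hypothetical.

```python
import random


def pps_inclusion_probabilities(sizes, n):
    """Inclusion probabilities proportional to size for a sample of size n.

    Values above one would indicate certainty units; this sketch does not
    handle that case and assumes n is small relative to the population.
    """
    total = sum(sizes)
    return [n * s / total for s in sizes]


def poisson_sample(probs, rng):
    """Select each unit independently with its own probability.

    Returns the indices of the selected units; the realized sample size
    is random, with expectation equal to the sum of the probabilities.
    """
    return [i for i, p in enumerate(probs) if rng.random() < p]


rng = random.Random(2024)  # fixed seed for reproducibility
responders = poisson_sample([0.05] * 1000, rng)
```

The random size of the Poisson sample is the reason the subpopulation of web responders varies in size from one simulation to the next.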

The first estimator of web response propensity is ILR, introduced in the previous section. Proper application of ILR requires knowing the size measure for the units of both random and web samples and accounting for the size of the web responders' subpopulation. Because obtaining this information can be difficult, we consider a simplified ILR estimator, which uses actual design weights for the units of the random sample, and average design weights for the units of the web sample. It also assumes direct one-phase selection of a web sample and does not account for the size of the web responders' subpopulation, even under two-phase selection of a web sample. This estimator is referred to as averaged ILR (AILR). The third estimator uses WLR (3.1) to estimate web response propensity. It requires the same reduced input information as AILR, but uses a different estimation technique.

A set of four covariates is generated for each unit of the general population from a multivariate normal distribution , with covariance matrix We choose and for this simulation.

The outcome variable is generated as the Bernoulli variable, where probabilities come from equation

The finite population is subdivided into four strata defined by two binary design variables. Table 1 shows strata sizes and size measures used in the PPS design of the random sample.

Stratum | Design variable 1 | Design variable 2 | Stratum size | Size measure |
---|---|---|---|---|
1 | 0 | 0 | 150,000 | 0.1 |
2 | 0 | 1 | 100,000 | 0.3 |
3 | 1 | 0 | 50,000 | 0.5 |
4 | 1 | 1 | 15,000 | 0.7 |

We allowed the web sample to be selected either in two phases or directly. Two-phase selection is simulated by generating the web response indicator from the Bernoulli distribution, with probabilities

This results in subpopulations of web responders of varying size. A web sample of fixed size is selected at the second phase as SRS. Direct selection of web samples of a given average size is accomplished for probabilities of web response defined as

The intercept is tuned to provide the desired average size of the web responders' subpopulation and to avoid further selection of the web sample. All estimators of web response propensity use a correctly specified design matrix. Inferences of model parameters by ILR, AILR, and WLR estimators, in cases of two-phase and direct selection of web samples, are averaged over 2,000 simulations. In Table 2, we present inferences of parameters of the web response model for two-phase and direct selection of the web sample by three estimators: implicit logistic regression (ILR), averaged implicit logistic regression (AILR), and weighted logistic regression (WLR).

Selection | Estimator | Statistic | Intercept | Coef. 1 | Coef. 2 | Coef. 3 | Coef. 4 |
---|---|---|---|---|---|---|---|
Two-phase | | True value | -1.0 | 1.0 | -0.5 | 0.25 | 0.1 |
Two-phase | ILR | Estimate , covg | -1.00 , 1.00 | 1.01 , .96 | -.50 , .95 | .25 , .95 | .10 , .95 |
Two-phase | ILR | SD , SE | 0.035 , .076 | .091 , .096 | .084 , .085 | .081 , .081 | .080 , .078 |
Two-phase | AILR | Estimate , covg | -5.84 , .00 | .64 , .00 | -.32 , .08 | .16 , .62 | .06 , 0.88 |
Two-phase | AILR | SD , SE | 0.031 , .051 | .053 , .054 | .052 , .053 | .051 , .052 | .050 , .050 |
Two-phase | WLR | Estimate , covg | -5.93 , .00 | .59 , .00 | -.30 , .02 | .15 , .29 | .06 , .61 |
Two-phase | WLR | SD , SE | 0.033 , .037 | .065 , .033 | .064 , .034 | .061 , .034 | .061 , .033 |
Direct | | True value | -6.2 | 1.0 | -0.5 | 0.25 | 0.1 |
Direct | ILR | Estimate , covg | -6.20 , .98 | 1.00 , .95 | -.50 , .94 | .25 , .95 | .10 , .95 |
Direct | ILR | SD , SE | .050 , .060 | .059 , .059 | .059 , .057 | .056 , .056 | .053 , .054 |
Direct | AILR | Estimate , covg | -6.10 , .61 | 1.00 , .95 | -0.50 , .94 | .25 , .95 | .10 , .96 |
Direct | AILR | SD , SE | .052 , .058 | .057 , .058 | .057 , .056 | .055 , .054 | .051 , .053 |
Direct | WLR | Estimate , covg | -6.22 , .77 | 1.02 , .50 | -.51 , .56 | .26 , .58 | .10 , .59 |
Direct | WLR | SD , SE | .071 , .045 | .097 , .033 | .084 , .033 | .078 , .033 | .074 , .032 |

Model parameters and standard errors are unbiasedly estimated by ILR under both mechanisms of web sample selection. AILR and WLR produce biased point estimates in the case of two-phase selection, because AILR is designed to approximate two-phase selection with direct selection, and WLR simply does not have an option to account for two-phase selection. However, while AILR estimates of standard errors are generally correct, estimates by WLR are consistently lower than the corresponding standard deviations of point estimates. When the web sample is selected directly, point estimates of model parameters are unbiased for all three estimators, and it is possible to compare them by efficiency. We find that standard deviations of model parameters estimated by ILR and AILR are close, while estimates by WLR are substantially less efficient.
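The summary statistics discussed above can be computed from simulation output as in the following sketch. It assumes normal-theory confidence intervals with a 1.96 multiplier and a nonzero true value for the relative bias; the function name and the dictionary keys are hypothetical.

```python
import math


def summarize_simulations(estimates, std_errors, truth, z=1.96):
    """Summaries of repeated simulations for one parameter.

    estimates  : point estimates, one per simulation
    std_errors : estimated standard errors, one per simulation
    truth      : true parameter value (must be nonzero for rel_bias)
    """
    n = len(estimates)
    mean_est = sum(estimates) / n
    # empirical standard deviation of point estimates over simulations
    sd = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (n - 1))
    mean_se = sum(std_errors) / n
    # share of intervals estimate +/- z * se that contain the true value
    covered = sum(
        1 for e, se in zip(estimates, std_errors) if abs(e - truth) <= z * se
    )
    return {
        "mean": mean_est,
        "rel_bias": (mean_est - truth) / truth,
        "sd": sd,
        "mean_se": mean_se,
        "coverage": covered / n,
    }
```

Comparing `sd` with `mean_se` shows whether standard errors are underestimated, and `coverage` far below the nominal level signals either bias or understated variance.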

Inferences of the population mean with the AIPW estimator (1.3) using web response propensity are studied in the remaining simulations. For simplicity, the estimator of the mean does not use the outcome variable model. Instead, the IPW estimator is used to predict the mean outcome, resulting in a simplified AIPW estimator

(4.1) |

where and are estimators of population size. Means are estimated for the general population and in two domains, defined as : and : 1st quartile of . Note that does not correlate with web response propensity and the outcome variable, while does. In Table 3, we present inferences obtained in cases of two-phase and direct selection of web sample, using web response propensity estimated by ILR, AILR, and WLR.

Selection | Estimator | Statistic | Pop | Domain 1 | Domain 2 |
---|---|---|---|---|---|
Two-phase | Direct | RB | 0.4 | 0.4 | 0.16 |
Two-phase | ILR | RB , covg | 0.0 , .95 | 0.0 , .96 | -.01 , .92 |
Two-phase | ILR | SD , SE | 1.69 , 1.75 | 3.07 , 3.08 | 2.98 , 3.01 |
Two-phase | AILR | RB , covg | .01 , .91 | 0.0 , 0.92 | -.02 , 0.86 |
Two-phase | AILR | SD , SE | 1.71 , 1.49 | 2.96 , 2.62 | 2.82 , 2.29 |
Two-phase | WLR | RB , covg | .05 , .82 | .04 , .90 | 0.0 , .86 |
Two-phase | WLR | SD , SE | 2.06 , 1.66 | 3.28 , 2.92 | 2.92 , 2.39 |
Direct | Direct | RB | 0.73 | 0.72 | 0.19 |
Direct | ILR | RB , covg | 0.0 , .96 | 0.0 , .95 | 0.0 , .88 |
Direct | ILR | SD , SE | 2.04 , 2.06 | 3.65 , 3.61 | 3.90 , 3.93 |
Direct | AILR | RB , covg | 0.0 , .93 | 0.0 , .92 | -.01 , .86 |
Direct | AILR | SD , SE | 1.99 , 1.81 | 3.46 , 3.18 | 3.84 , 3.51 |
Direct | WLR | RB , covg | -.01 , .84 | -.01 , .91 | -.04 , .88 |
Direct | WLR | SD , SE | 2.98 , 2.17 | 4.30 , 3.83 | 3.98 , 4.30 |

ILR unbiasedly estimates population and domain means and their standard deviations under both mechanisms of web sample selection. Point estimates by AILR are practically unbiased, but standard error estimates are substantially lower than corresponding standard deviations. Some of the WLR point estimates are noticeably biased, and standard errors substantially underestimate standard deviation of point estimates over simulations.

All estimators are more efficient for two-phase web sample selection than for direct selection. This is because the web sample size is constant over simulations in the first case, while it fluctuates around its average in the second case. At the same time, the ILR and AILR estimators show universally better efficiency than the WLR estimator. Overall, it can be concluded that inferences of population and domain means obtained with the WLR estimator are generally inferior to inferences obtained with ILR and AILR because WLR does not estimate the propensity of web response adequately.

## 5 Inferences of chronic health conditions from web and NHIS samples

To be published after peer review

## Conclusion

We consider the problem of estimating population parameters from a combination of two samples: One is based on responses to a web survey, and the other is a random sample collected by a traditional survey. The web-based sample contains variables of interest, and the random sample is needed to calibrate estimates based on a shared set of covariates. We present an estimator, utilizing web response propensity to remove possible bias due to nonrandom selection of web samples, and demonstrate its relation to estimators widely employed in observational studies and inference with missing data in survey sampling. Variance of the proposed estimator is evaluated as a total variance associated with response and sampling indicators.

To make inferences from web samples possible, we develop implicit logistic regression to estimate parameters of the web response model by implicitly modeling the web response indicator. In previous research, web response propensity was estimated using regular weighted logistic regression. The proposed method requires additional input information, such as nominal random selection probabilities for the web sample units and the overall size of the subpopulation of web responders. Because this information may not always be available, we consider average implicit logistic regression, which relies on the same input information as weighted logistic regression.

The simulation study demonstrates a clear advantage of the proposed methodology. In all cases of web sample selection, unbiased point and variance estimates are obtained only with implicit logistic regression. Using average implicit logistic regression produces almost unbiased point estimates but consistently underestimates variances. Estimates with weighted logistic regression are the least reliable: Point estimates are sporadically biased; variances are mostly substantially underestimated, sometimes by 30%. In addition, these estimates are less efficient compared with estimates using implicit and average implicit logistic regression. Given the clear advantages of implicit logistic regression, we emphasize the importance of planning a web survey to ensure that all of the required input information is collected and made available for estimation.

In this paper, we did not elaborate on the issue of model selection. This may explain the results from comparing the estimates of prevalence of chronic health conditions and rates of medication use, obtained from data collected by the real web survey and regular NHIS. We found that some of the estimates match reasonably well, while others are substantially different. Relying on a limited set of covariates to model web response propensity can be sufficient to reduce bias for some, but not for all, estimates. Other factors, such as survey mode effect, may also contribute to discrepancies between estimates from the web survey and regular NHIS.

Notably, estimates from the web survey are more efficient than estimates from NHIS, given the difference in sample sizes. This is because estimated propensity of web response is less variable than NHIS random-selection probabilities, which results in smaller design effect, or equivalently, larger effective sample size.
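The link between weight variability and effective sample size can be illustrated with Kish's approximation, used here as an assumption because the text does not state which design effect formula is applied. Weights would be inverse estimated propensities for the web sample and inverse selection probabilities for NHIS; the function names are hypothetical.

```python
def kish_effective_sample_size(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum w^2."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2


def kish_design_effect(weights):
    """Design effect due to unequal weighting: n / n_eff."""
    return len(weights) / kish_effective_sample_size(weights)
```

Equal weights give a design effect of one, and any variability in the weights increases it, which reduces the effective sample size.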

Robustness to model misspecification and efficiency of estimates from web survey data can be substantially improved for the AIPW estimator (1.3) by using models for both web response and the outcome variable. This further highlights the importance of using optimal model selection techniques, such as the Lasso of Tibshirani (1996), in application to implicit logistic regression and joint modeling of web response and the outcome variable. This will be a subject for future research.

## References

- Beresovsky (2016) Beresovsky, V. (2016). Using official surveys to reduce bias of estimates from nonrandom samples collected by web surveys. In Proceedings of the American Statistical Association, Survey Research Methods Section, pages 1804–1819.
- Breidt and Opsomer (2017) Breidt, J. F. and Opsomer, J. D. (2017). Model-assisted survey estimation with modern prediction techniques. Statistical Science, 32(2):190–205.
- Elliott and Valliant (2017) Elliott, M. and Valliant, R. (2017). Inference for nonprobability samples. Statistical Science, 32(2):249–264.
- Haziza and Rao (2006) Haziza, D. and Rao, J. (2006). A nonresponse model approach to inference under imputation with missing survey data. Survey Methodology, 32:53–64.
- Kim and Haziza (2014) Kim, J. K. and Haziza, D. (2014). Doubly robust inference with missing data in survey sampling. Statistica Sinica, 24:375–394.
- Lunceford and Davidian (2004) Lunceford, J. K. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 23:2937–2960.
- Robins et al. (1994) Robins, J. M., Rotnitzky, A., and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89:846–866.
- Rosenbaum and Rubin (1983) Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70:41–55.
- Rosenbaum and Rubin (1984) Rosenbaum, P. R. and Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79:516–524.
- Sarndal and Lundstrom (2006) Sarndal, C-E. and Lundstrom, S. (2006). Estimation in Surveys with Nonresponse. Wiley Series in Survey Methodology, Chichester, England.
- Searle et al. (1992) Searle, S. R., Casella, G., and McCulloch, C. E. (1992). Variance Components. Wiley & Sons, New York, New York.
- Shao and Steel (1999) Shao, J. and Steel, P. (1999). Variance estimation for survey data with composite imputation and nonnegligible sampling fractions. Journal of the American Statistical Association, 94:254–265.
- Tibshirani (1996) Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288.
- Valliant and Dever (2011) Valliant, R. and Dever, J. A. (2011). Estimating propensity adjustments for volunteer web surveys. Sociological Methods and Research, 40(1):105–137.
- Wu and Sitter (2001) Wu, C. and Sitter, R. R. (2001). A model-calibration approach to using complete auxiliary information from survey data. Journal of the American Statistical Association, 96(453):185–193.