# [

###### Abstract

In some randomized clinical trials, patients may die before the measurements of their outcomes. Even though randomization generates comparable treatment and control groups, the remaining survivors often differ significantly in background variables that are prognostic to the outcomes. This is called the truncation by death problem. Under the potential outcomes framework, the only well-defined causal effect on the outcome is within the subgroup of patients who would always survive under both treatment and control. Because the definition of the subgroup depends on the potential values of the survival status that could not be observed jointly, without making strong parametric assumptions, we cannot identify the causal effect of interest and consequently can only obtain bounds of it. Unfortunately, however, many bounds are too wide to be useful. We propose to use detailed survival information before and after the measurements of the outcomes to sharpen the bounds of the subgroup causal effect. Because survival times contain useful information about the final outcome, carefully utilizing them could improve statistical inference without imposing strong parametric assumptions. Moreover, we propose to use a copula model to relax the commonly-invoked but often doubtful monotonicity assumption that the treatment extends the survival time for all patients.

Using Survival Information in Truncation by Death Problems]Using Survival Information in Truncation by Death Problems Without the Monotonicity Assumption

ounds; Causal inference; Principal stratification; Survivor average causal effect.

## 1 Introduction

In randomized clinical studies, evaluations of the effectiveness of alternative treatments on a non-mortality outcome such as the health related quality of life (HRQOL) outcome are often complicated by truncation by death. The motivation of our study comes from a prostate cancer research (Petrylak et al., 2004), a Southwest Oncology Group (SWOG) clinical trial, where an interest is to assess the effect on a HRQOL outcome measured at six months after treatment among advanced refractory prostate cancer patients being treated with Docetaxel and Estramustine (DE) versus Mitoxantrone and Prednisone (MP). But some patients died within six months after treatment, and therefore their HRQOL outcomes were not measured and were not even well-defined. As it is well known, to estimate the treatment effect on a HRQOL outcome that is truncated by death, a direct comparison between survivors in two treatment arms could be biased. This is because death serves as a mechanism of informative censoring given its strong correlation with the HRQOL. Intuitively, those patients who died would usually have had worse HRQOL outcomes than those who survived had they somehow been kept alive (Cox et al., 1992). To formulate a well-defined treatment effect on outcomes truncated by death, we adopt the principal stratification framework (Frangakis and Rubin, 2002). Based on whether each patient would survive to the HRQOL outcome measurement under treatment and whether the patient would survive to the HRQOL outcome measurement under control, subjects are classified into four principal strata as will be discussed in Section 2.2. The stratum of patients who would survive to the outcome measurement under both treatment and control are called always survivors. Following Frangakis and Rubin (2002) and Rubin (2006), we focus on the treatment effect among the always survivors, also called the survivor average causal effect (SACE), because they have well-defined HRQOL outcomes under both treatment arms.

Unfortunately, the SACE is not pointly identified without strong and untestable assumptions such as by imposing a full parametric model (Zhang et al., 2009; Frumento et al., 2012) or by utilizing a substitution variable for the latent survival type (Ding et al., 2011; Jiang et al., 2016; Ding and Lu, 2017; Wang et al., 2017). Although plausible for the settings considered by the authors, those assumptions can be too strong to be applied to many general cases where the parametric constraint is questionable and where a valid substitution variable is not available. Under weaker assumptions on the degree of selection bias, one can instead achieve partial identification on the SACE. Zhang and Rubin (2003) derived large sample bounds on the SACE under ranked average score assumptions, which are the shortest bounds possible without further assumptions. Long and Hudgens (2013) sharpened the bounds by using covariates. Yang and Small (2016) extended the ranked average score assumptions to further utilize survival information at a time point after the measurement of the HRQOL outcome.

In the previous literature on bounding the SACE, we are aware of two limitations. First, to shorten the bounds on the SACE without imposing a parametric model, the monotonicity assumption on the survival status is often invoked with a few exceptions (Zhang and Rubin, 2003; Ding et al., 2011; Wang et al., 2017). The monotonicity assumption states that the treatment does not cause death compared to the control, meaning that if a subject would die before the measurement of the HRQOL outcome therefore being truncated under treatment, the subject would also die before the measurement of the HRQOL outcome had the subject received the control. This assumption cannot be directly validated, and can be suspicious in many studies. Taking our motivating prostate cancer study SWOG as an example, although DE results in longer survival on average (Petrylak et al., 2004), from a clinical point of view, the monotonicity assumption that no patients could benefit from MP in survival may not be true since both DE and MP are active treatments. Previous research (Ding et al., 2011) suggested that the monotonicity assumption was likely to be violated in the SWOG study. Second, survival tends to be positively correlated with HRQOL and is informative as a proxy of health condition, but the previous literature on bounding the SACE utilizes only limited information of survival. Zhang and Rubin (2003), by imposing the ranked average score assumptions, utilized the survival information right before the measurement on the HRQOL outcome, and showed that this information was helpful to sharpen the bounds of the SACE. The authors derived the bounds with and without imposing the monotonicity assumption. Yang and Small (2016) showed that using a post measurement time point survival information in addition to the survival information right before the measurement could further improve the bounds on the SACE. However, in practice, more detailed survival information is often available in studies where patients were followed up for multiple times, such as the SWOG clinical trial where patients were followed up at three months, six months and twelve months, respectively. This more detailed survival information can provide additional information on the health condition of the patients, and therefore, help further improve the inference on the SACE. In addition, the method the authors developed relied on the monotonicity assumption, which limited the applicability of the method.

In this paper, under the principal stratification framework, we propose a set of ranked average score assumptions to incorporate detailed survival information to sharpen the inference on the SACE in the context of randomized trials, and meanwhile, remove the untestable and often violated monotonicity assumption on survival. Because only one potential survival status is observed for each patient, the stratum membership is not observable in general. With detailed survival information, the four principal strata can be further divided into many finer ones given the potential values of the survival time, thereby introducing additional complications to inference. To address the issue of latent stratum membership, assumptions on the joint distribution of potential survivals are necessary. In our approach, we model the joint distribution of the two potential survival times under treatment and control through a copula (Nelsen, 2006). Copulas provide a flexible way for modeling dependencies and have been used to model the joint distributions of potential outcomes in various contexts (Bartolucci and Grilli, 2011; Ma et al., 2011; Conlon et al., 2017). Using a copula model, we avoid the monotonicity assumption, and characterize the association between the two potential survival times by a single copula parameter. For each fixed value of the copula parameter, deriving the bounds on the SACE under the proposed ranked average score assumptions defines a linear programming problem. The final bounds on the SACE can then be obtained by varying the value of the copula parameter in a plausible range. We apply our proposed method to the SWOG study. By utilizing detailed survival information, we are able to substantially narrow the bounds on the SACE for the effect of DE versus MP.

## 2 Notation and Assumptions

### 2.1 Notation: potential and observed outcomes

We consider two-arm randomized experiments and adopt the potential outcomes framework to define causal effects. We let be the binary treatment for the -th subject ; we call level “the treatment” and level “the control”. Let denote the vector of treatment assignment indicators for all subjects. We use to represent the discretized potential survival time of subject from the initiation of the treatment that would be observed under treatment assignment which could be measured, for instance, by month, by year, etc. Assuming that there are follow-ups at time points respectively from the initiation of the treatment, then the values that can take are where . Take the SWOG study as an example, subjects were followed up three times, and therefore . will take a value if under the treatment assignment patient would die before the first follow-up time three months, if patient would die between three months and the second follow up time six months, if patient would die between six months and the last follow up time twelve months, and if patient would still be alive at twelve months. We use to denote the binary potential HRQOL outcome of subject that would be observed under treatment assignment . We consider HRQOL outcome measured at a fixed time point, the th follow-up (), and therefore for subjects who would die before time point , i.e., those with , their potential HRQOL outcomes ’s are not defined. Throughout this paper, we let level of the HRQOL outcome be worse than level . We use and to denote respectively subject ’s observed survival time and observed HRQOL outcome.

### 2.2 Assumptions and the parameter of interest

Under the stable unit treatment value assumption (SUTVA), there is only a single version of each treatment level and there is no interference between subjects. Therefore, we can write and as and , respectively. Moreover, the subjects can be classified into four latent groups based on the joint values of potential survival status at the HRQOL outcome measurement time under treatment and under control. Let denote subject ’s latent group, which is defined as: always survivor if and , meaning that the subject would survive at least to the time point of measurement under both treatment and control; protected if and , meaning that the subject would survive at least to the time point of measurement only under treatment; harmed if and , meaning that the subject would survive at least to the time point of measurement only under control; and never survivor if and , meaning that the subject would die before the time point of measurement under both treatment and control. Among those four groups, the always survivors constitute the only group for which both and are well defined at time . Thus the treatment effect on the HRQOL outcome is only well defined for always survivors (Frangakis and Rubin, 2002; Rubin, 2006), that is, the survivor average causal effect:

(1) |

The treatment is independent of the potential outcomes , , and .

In randomized studies such as SWOG, the ignorable treatment assignment assumption is guaranteed by design.

The following two assumptions compare two groups of subjects: and , where .
\assumption (i) When both groups and have well defined (i.e., and ), if and , then, ;

(ii) When both groups and have well defined (i.e., and ), if and , then, .

Assumption 2.2 compares the HRQOL outcomes between two groups of subjects where one group’s survival times under treatment and control are both longer than or equal to those of the other group. The probability of a worse HRQOL outcome for the group with longer survival times is not higher than that for the other group, recalling that level of the HRQOL outcome is worse than level .

(i) When both groups and have well defined (i.e., and ), if , but , then, ;

(ii) When both groups and have well defined (i.e., and ), if , but , then .

Assumption 2.2(i) compares the HRQOL outcome under treatment between two groups of subjects where one group has longer survival under treatment but shorter survival under control than the other group. If one group’s additional length of survival under treatment compared to the other group is no less than their reduced length of survival under control, Assumption 2.2(i) says that the probability of the worse HRQOL outcome under treatment for the group with longer survival under treatment is not higher than that for the other group. Assumption 2.2(ii) is the analogous assumption on the HRQOL outcome under control.

Assumptions 2.2 and 2.2 are our generalized ranked average score assumptions which utilize the survival information on multiple time points. They are plausibly satisfied in many HRQOL studies because survival is often positively related to the HRQOL. Therefore, it is reasonable to assume that the potential survival time is positively associated with a better potential HRQOL outcome, and is more predictive to the potential HRQOL outcome under the same treatment condition than under a different treatment condition. In particular, Assumption 2.2 says that the subjects with longer survivals under both treatment and control tend to be healthier on average, and therefore, are less likely to develop the bad HRQOL outcome. Assumption 2.2(i) says that for subjects’ health condition under treatment, survival under treatment is a better predictor of the HRQOL than survival under control. Therefore, even if one group of subjects live shorter than the other group under control, as long as they live much longer under treatment, they are healthier under treatment and are less likely to develop bad HRQOL outcome under treatment. Similarly, for subjects’ health condition under control, Assumption 2.2(ii) says that it is survival under control is a better predictor.

Remark . Our generalized ranked average score assumptions on the HRQOL become intuitive if we consider the following generalized linear models:

If , then and are the conditional relative risks of on the treatment and control potential HRQOL for The models above imply that

(2) | |||||

(3) |

If , then Assumption 2.2 holds. If , i.e., the conditional relative risk of on is larger than , then Assumption 2.2(i) holds. If , i.e., the conditional relative risk of on is larger than , then Assumption 2.2(ii) holds.

If , then and are the partial regression coefficients of on the treatment and control potential HRQOL for Then the right-hand sides of (2) and (3) are the corresponding comparisons of the conditional probabilities on the risk difference scale. If , then Assumption 2.2 holds. If , i.e., the partial regression coefficient of on is larger than , then Assumption 2.2(i) holds. If , i.e., the partial regression coefficient of on is larger than , then Assumption 2.2(ii) holds.

Note that the models in this remark are used to aid interpretations, and our method does not need to invoke them.

Zhang and Rubin (2003), hereafter ZR, proposed ranked average score assumptions to utilize the survival information on a single time point, i.e., the survival status when the HRQOL outcome was measured. Their assumptions say that when assigned to treatment, the risk of the worse HRQOL outcome for always survivors is not higher than that for the protected; and when assigned to control, the risk of the worse HRQOL outcome for always survivors is not higher than that for the harmed. Mathematically, ZR assumed,

(4) | |||||

(5) |

Different from our assumptions, ZR used only the survival information at the time point of the measurement of the HRQOL outcome. However, detailed survival information contains additional information on the health condition of the subjects, which can help further sharpen the inference on the SACE. Moreover, detailed survival information creates strata finer than the four principal strata, and the comparisons among those finer strata result in more plausible assumptions than the coarse comparisons among the four principal strata. For example, consider the SWOG study of the effect of DE (treatment) versus MP (control) on the HRQOL at six months described in the introduction, where patients were followed up at three months, six months and twelve months. Let us compare the following two groups’ HRQOL outcomes. The first group consists of patients who would die shortly after six months no matter being treated by DE or MP, i.e., . The second group consists of patients who would survive more than twelve months no matter being treated by DE or MP, i.e., . Apparently, the first group subjects’ health conditions at six months are much worth than those of the second group because the first group would die shortly after six months no matter being treated by DE or MP. Therefore, it’s reasonable to assume that the first group’s risk of bad HRQOL outcome is not lower than that for the second group as implied by our Assumption 2.2. However, ZR did not provide a comparison between two groups like those. For another example, ZR assumed that the protected, on average, had worse HRQOL outcomes than always survivors under treatment. In contrast, our Assumption 2.2 assumes that some particular subgroups of always survivors, on average, have worse HRQOL outcomes than some particular subgroups of the protected under treatment, which are more reasonable assumptions for many HRQOL studies given more survival information. Let’s still consider the SWOG study and compare the following two groups’ HRQOL outcomes under DE. The first group again consists of patients who would die shortly after six months no matter being treated by DE or MP, i.e., . The second group consists of patients who would survive more than twelve months if being treated by DE and would die shortly after three months if being treated by MP, i.e., , and . Patients respond differently to different treatments. Although the patients in the second group do not respond to MP as those patients in the first group, they respond to DE much better than the patients in the first group. Thus, it is reasonable to assume that when being treated by DE, the second group of patients, a subgroup of the protected, are likely to be less sick at six month, and therefore, their risks of bad HRQOL outcome are not higher than those for the first group of patients, a subgroup of always survivors.

Yang and Small (2016), hereafter YS, extended ZR’s ranked average score assumptions by utilizing a post measurement time point survival information. Their assumptions could be viewed as a simple version of our generalized ranked average score assumptions with two follow-up time points and with monotonicity constraints on the survival. The authors considered a scenario where the subjects were followed up twice () and their HRQOL outcomes were measured at the first follow up time point (). Besides the SUTVA and Ignorability assumptions, the monotonicity assumption was imposed to restrict the possible combinations of the values of and . Mathematically, the monotonicity assumption states that, As we discussed in the introduction, this assumption is strong and often suspicious. Assumptions to in YS are exactly the same as our generalized ranked average score assumptions (i.e., Assumptions 2.2 and 2.2) in this special case of and equal lengths of follow-up intervals (i.e., ) or longer second follow-up interval than the first (i.e., ), without considering subjects whose survival would be harmed by the treatment. Compared with YS, our assumptions are more conservative in the case of longer first follow-up interval than the second (i.e., ). Consider two groups of patients where the first group consists of patients who would survive to the first follow-up time point under both treatment and control (i.e., patients with ), and the second group consists of patients who would survive at least to the second follow-up time point under treatment however would die even before the first follow-up time point (i.e., patients with ). YS always assume that the second group’s risk of bad HRQOL outcome under treatment is not higher than that for the first group; in contrast, we only make this assumption when the second group’s additional length of survival under treatment compared to the first group (i.e., ) is no less than their reduced length of survival under control (i.e., ). In addition, the derived bounds in YS rely on the monotonicity assumption, which may or may not cover the true effect when the monotonicity is violated. See the numerical examples in the Supplementary Materials.

## 3 Derivation of Bounds Under a Copula Model

In this section, we derive large sample bounds for the SACE under Assumptions 2.2–2.2 assuming that the observable joint distribution of is known.

### 3.1 Bounds given the joint distribution of and

Define . Define as the proportion of the fine stratum that consists of patients who would survive to time (i.e., -th follow-up) under treatment and (i.e., -th follow-up) under control. In this subsection, we assume that the ’s are known. In terms of the fine strata, the SACE is:

SACE | ||||

(6) |

Meanwhile, under Assumption 2.2, . Therefore, the observable distribution is a mixture distribution of potential outcomes of fine strata as shown in the following identities:

(7) |

(8) |

If we hypothetically know the proportions of each fine stratum , then the bounds for the SACE could be obtained by solving a linear programming problem with objective function (3.1), subject to the linear equality constraints (7) and (8), the linear inequality constraints (9) that all the probabilities ’s are bounded between and ,

(9) |

and the linear inequality constraints (10) and (11) imposed by Assumptions 2.2 and 2.2:

(10) | |||||

(11) |

### 3.2 A copula model

Although the marginal distributions of the potential survival times, and are observable, their joint distribution is not. Therefore, the proportions of fine strata ’s are not identified without further assumptions. To capture the dependence between the two potential survival times, we propose to use the copula. We assume that the joint distribution of these intermediate outcomes follows a Plackett copula (Plackett, 1965), where the degree of association is measured by a single parameter. Let be the marginal distribution function for the random variable with . The joint distribution function of and linked by a Plackett copula is given by , where when , and when and The parameter measures the association between and , and the Spearman correlation coefficient is a monotonic function of : . Therefore, and are independent for , negatively associated for , and positively associated for . Since the survival times of patients under both treatment arms are highly dependent on their underlying health status, it is reasonable to assume that patients who live longer under one treatment arm are more likely to live longer under the other, implying that (i.e., ).

For a fixed , the ’s could be calculated based on the joint distribution function, . Then given the values of the ’s, the bounds for the SACE can be obtained by solving the linear programming problem described in Section 5. The final bounds for the SACE will be constructed by varying the value of on a suitable grid and obtaining the bounds for each value of . The final lower bound will then take the value of the smallest lower bound among all the lower bounds, and the final upper bound will take the value of the largest upper bound among all the upper bounds. Alternatively, we can view as a sensitivity parameter, and draw conclusions at different values of .

We give two numerical examples in the Supplementary Materials to show the improvement of our approach over ZR’s and the potential bias of YS’s when monotonicity does not hold. Although the intuition is overwhelming that our method will lead to narrower bounds than ZR’s in many cases, it is technically challenging to give a formal proof. Even under monotonicity, YS did not give a formal proof of improvement over ZR although in most cases the improvement is apparent.

## 4 Statistical Inference Accounting for Sampling Variability

The bounds derived in the previous section are large sample bounds, where we assume that the joint distribution of under each treatment arm is known. However, in practice, we need to account for the sampling uncertainty in statistical inference. Because our bounds are results of a linear programming problem, they will be in the form of intersection (i.e., the lower/upper bound takes the maximum/minimum a collection of functionals). The maximum and minimum operators involved in the intersection bounds generate significant complications for both estimation and inference from a frequentist perspective. Most methods (e.g. Chernozhukov et al., 2013) focusing on asymptotic properties may not have desirable finite sample properties as investigated by Yang and Small (2016). A recent method proposed by Jiang and Ding (ress) requires explicit forms of the bounds. To avoid these difficulties in frequentists’ inference, we adopt a Bayesian approach to conduct inference by deriving the exact credible intervals for the bounds. The Bayesian approaches are being increasingly adopted in partially identified models (Gustafson and Greenland, 2009; Scharfstein et al., 2011; Mealli and Pacini, 2013; Gustafson, 2015).

Suppose that the subjects are followed up to time point . Let us define parameter vectors for where . Given the value of the copula parameter , the joint probability for survival times under treatment and control is, where , and is defined as for . We define parameters for and . Given parameters ’s and ’s, the joint probability of survival and HRQOL outcomes under each treatment arm is, , for and . We further define compatible region to be the region of parameters constrained by Assumptions 2.2–2.2, i.e.,

(12) |

Prior to observing any data, we assume that the ’s are independent and follow truncated Dirichlet distributions with all parameters being , i.e., , where is the indicator function taking on the value if the statement is true and otherwise. Similarly, we assume that the ’s are independent and follow truncated Beta distributions with all parameters being , i.e., . The posterior distributions of the ’s and ’s can be derived analytically according to the priors and the likelihood. As an illustration, details on the posterior distributions calculation are provided for the SWOG study in Section 5.

To find the corresponding joint posteriors of the bounds given the value of the copula parameter , we simulate from the posterior distributions of the ’s and ’s, and perform the linearly constrained optimizations to obtain the lower and upper bounds on the SACE for each simulation. Credible intervals for the bounds are not unique. A credible interval for the bounds on the SACE would be any interval where there is a posterior probability that the bounds (i.e., both the lower and upper bounds) fall within. Among all the credible intervals for the bounds on the SACE given the value of , we choose the one with the shortest length by a numerical search.

## 5 Application to the Southwest Oncology Group Study

The data previously analyzed by Ding et al. (2011) contain 487 androgen-independent prostate cancer patients enrolled in the Southwest Oncology Group (SWOG) trial between 1999 and 2003, who were randomized to receive either Docetaxel and Estramustine (DE) or Mitoxantrone and Prednisone (MP). We view the patients who received DE as the treatment group (), and the patients who received MP as the control group (). Let if there was a reduction in patient ’s HRQOL six months after treatment compared to his HRQOL at baseline (i.e., a worse HRQOL outcome), and otherwise. If the patient died before six months after treatment, the HRQOL level would not be measured and will be undefined. Note that we dichotomize the continuous HRQOL measurement to be a binary outcome indicating its reduction compared to its baseline. Although it might cause loss of information, this dichotomized outcome is a meaningful measure of the efficacy of the treatment. Moreover, for statistical inference of this binary outcome, we do not need to impose any modeling assumptions in contrast to the original continuous outcome.

As described in Section 2, takes values or for death before three months, between three months and six months, between six months and twelve months, or after twelve months, respectively. The corresponding for the Southwest Oncology Group study equals , and the corresponding equals . Among the patients, there are 135 of them have missing measurements for their HRQOL although survived beyond six months. In our analysis, we assume that is missing at random given survival and the treatment assignment , and therefore, we can ignore the missing data model in the Bayesian analysis. We set the priors to be independent and non-informative as described in Section 4, Let for , and , for and , and let for and . Under the missing at random assumption, the posterior density is

where , , and are the observed vectors of the HRQOL outcomes, survival times and treatment assignment indicators for all subjects. Therefore, the posterior distributions of and are Dirichlet with parameters and , respectively, and the posterior distributions of are Beta with parameters , , , , respectively, all truncated within the compatible region. Truncation could be done by simulating from the un-truncated distributions and rejecting the simulations without feasible solutions to the linear constrained optimization problem. The details of the optimization problem we solve for the SWOG study are presented in the Supplementary Materials.

Estimated Bounds | Relative Length | Credible Interval | Relative Length | ||
---|---|---|---|---|---|

0.0000 | 0.000 | 0.764 | 0.889 | ||

0.1000 | 0.301 | 0.684 | 0.850 | ||

0.2000 | 0.607 | 0.602 | 0.811 | ||

0.3000 | 0.926 | 0.515 | 0.770 | ||

0.4000 | 1.264 | 0.428 | 0.727 | ||

0.5000 | 1.632 | 0.346 | 0.685 | ||

0.6000 | 2.049 | 0.274 | 0.654 | ||

0.7000 | 2.545 | 0.224 | 0.633 | ||

0.8000 | 3.189 | 0.191 | 0.620 | ||

0.9000 | 4.191 | 0.148 | 0.606 | ||

0.9900 | 7.110 | 0.072 | 0.582 | ||

0.9990 | 9.773 | 0.056 | 0.578 | ||

0.9999 | 12.331 | 0.056 | 0.578 | ||

ZR | 1.0000 | 1.0000 | |||

Table 1 compares the estimated bounds and the credible intervals under the following two sets of assumptions: (i) Assumptions 2.2–2.2, and (ii) Assumptions 2.2 with ZR’s ranked average score assumptions (4) and (5). The point estimates of the lower and upper bounds are reported based on their posterior medians, and the posterior distributions of the bounds under (ii) are obtained similarly with non-informative Beta priors on parameters ’s and ’s for . According to the results under the set of assumptions (i), with a moderately small correlation between the two potential survival times under DE and MP, , among the patients with androgen-independent prostate cancer who would survive to at least six months under both DE and MP, DE would help reduce the risk of bad HRQOL by 0.8 percent to 9.9 percent. Given the facts that the two potential survival times and two potential HRQOL outcomes are highly dependent on subjects’ underlying health status, and that the correlation between the potential HRQOL outcome and potential survival time under DE is 0.22, is probably a plausible and conservative assumption on the correlation between the two potential survival times under DE and MP. These bounds are about shorter and are also more informative than the bounds obtained by utilizing only the survival information at the time point of measurement, which estimates that the effect of DE is somewhere between reducing the risk of bad HRQOL outcome by 13.0 percent and increasing the risk by 2 percent compared to MP. Unfortunately, due to the small sample size and missing data, all the credible intervals cover , indicating that there is not enough evidence to conclude that DE improves the HRQOL outcome among androgen-independent prostate cancer patients who would be able to survive to at least six months under both treatments.

To evaluate the sensitivity of the proposed approach to the choice of the type of copula, we also conducted the analysis using the Gaussian copula. The results are very close. For instance, with a correlation between the two potential survival times under DE and MP, under the Gaussian copula, the estimated bounds are (compared to under the Plackett copula) with a credible interval (compared to under the Plackett copula). To save space, the description of the Gaussian copula and the detailed results are presented in the Supplementary Materials.

## 6 Discussion

Our approach can also be applied to stratified randomized trials where strata are created based on prognostic factors and randomization is conducted within each stratum. In these settings, we can weaken our assumptions by requiring them to hold in each stratum. Each stratum specific SACE can be bounded, and the overall bounds on the SACE will then be obtained as a weighted average of stratum specific SACE’s with weights proportional to the sizes of strata measured by the numbers of always survivors (Freiman and Small, 2014).

We focused on binary outcomes which allow for obtaining nonparametric bounds of the SACE. If the original outcome is continuous and dichotomization will lose information, we can consider applying our method to estimate SACE on at each point of , or imposing additional modeling assumptions on We leave this to future research.

In our analysis of the SWOG data, we focus on the patients’ HRQOL outcome measured at six months after initiation of the treatment. However, patients’ HRQOL outcomes were measured repeatedly in this study at each follow-up time point, namely, three months, six months and twelve months. Using the current approach, one would conduct separate analyses on the SACE’s for the HRQOL outcomes measured at different time points, which is not ideal to study the change in the SACE over time. How to incorporate the information from repeated measurements of the HRQOL outcomes requires further research.

Subject matter knowledge is necessary to judge the plausibility of our assumptions because they could not be validated by the observable data. We expect that our approach could be widely applied to HRQOL outcomes because patients surviving longer tend to be healthier, and patients’ health conditions under one arm tend to be better predicted by their survival lengths under the same arm than that under a different arm. However, in studies where the relationship between the time to truncation and the outcome of interest is not clear, our assumptions could be suspicious. Consider the causal effect of a job-training program on participants’ wages that are potentially truncated by unemployment. For some subjects, longer time to get employed may indicate their being less competitive in the job market, and therefore, is associated with lower wages. However, for some subjects that are very competitive and can afford longer unemployment, they may decline some job opportunities with low wages, and therefore, longer time to get employed is associated with higher wages. In such a case, our assumptions may not apply, and alternative assumptions should be invoked to sharpen the inference.

## 7 Supplementary Materials

Numerical examples referenced in Section 3.2, details on the optimization problem and sensitivity analysis for the SWOG study referenced in Section 5, original data and source code can be found at the Biometrics website on Wiley Online Library.

## Acknowledgments

We thank the Editor (Professor Stijn Vansteelandt), Associate Editor and a reviewer for constructive comments. We thank Dylan Small for helpful discussions and suggestions. Fan Yang’s research is partially supported by National Institutes of Health grant R03 CA208387-01A1. Peng Ding’s research was partially supported by the Institute of Education Sciences grant R305D150040.

## References

- Bartolucci and Grilli (2011) Bartolucci, F. and Grilli, L. (2011). Modeling partial compliance through copulas in a principal stratification framework. Journal of the American Statistical Association 106, 469–479.
- Chernozhukov et al. (2013) Chernozhukov, V., Lee, S., and Rosen, A. M. (2013). Intersection bounds: estimation and inference. Econometrica 81, 667–737.
- Conlon et al. (2017) Conlon, A., Taylor, J., and Elliott, M. (2017). Surrogacy assessment using principal stratification and a gaussian copula model. Statistical methods in medical research 26, 88–107.
- Cox et al. (1992) Cox, D. R., Fitzpatrick, R., Fletcher, A., Gore, S., Spiegelhalter, D., and Jones, D. (1992). Quality-of-life assessment: can we keep it simple? Journal of the Royal Statistical Society: Series A (Statistics in Society) 155, 353–393.
- Ding et al. (2011) Ding, P., Geng, Z., Yan, W., and Zhou, X.-H. (2011). Identifiability and estimation of causal effects by principal stratification with outcomes truncated by death. Journal of the American Statistical Association 106, 1578–1591.
- Ding and Lu (2017) Ding, P. and Lu, J. (2017). Principal stratification analysis using principal scores. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 757–777.
- Frangakis and Rubin (2002) Frangakis, C. E. and Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics 58, 21–29.
- Freiman and Small (2014) Freiman, M. H. and Small, D. S. (2014). Large sample bounds on the survivor average causal effect in the presence of a binary covariate with conditionally ignorable treatment assignment. The International Journal of Biostatistics 10, 143–163.
- Frumento et al. (2012) Frumento, P., Mealli, F., Pacini, B., and Rubin, D. B. (2012). Evaluating the effect of training on wages in the presence of noncompliance, nonemployment, and missing outcome data. Journal of the American Statistical Association 107, 450–466.
- Gustafson (2015) Gustafson, P. (2015). Bayesian Inference for Partially Identified Models: Exploring the Limits of Limited Data, volume 140. CRC Press.
- Gustafson and Greenland (2009) Gustafson, P. and Greenland, S. (2009). Interval estimation for messy observational data. Statistical Science 24, 328–342.
- Jiang and Ding (ress) Jiang, Z. and Ding, P. (In press). Using missing types to improve partial identification with missing binary outcomes. Annals of Applied Statistics .
- Jiang et al. (2016) Jiang, Z., Ding, P., and Geng, Z. (2016). Principal causal effect identification and surrogate end point evaluation by multiple trials. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78, 829–848.
- Long and Hudgens (2013) Long, D. M. and Hudgens, M. G. (2013). Sharpening bounds on principal effects with covariates. Biometrics 69, 812–819.
- Ma et al. (2011) Ma, Y., Roy, J., and Marcus, B. (2011). Causal models for randomized trials with two active treatments and continuous compliance. Statistics in medicine 30, 2349–2362.
- Mealli and Pacini (2013) Mealli, F. and Pacini, B. (2013). Using secondary outcomes to sharpen inference in randomized experiments with noncompliance. Journal of the American Statistical Association 108, 1120–1131.
- Nelsen (2006) Nelsen, R. B. (2006). An Introduction to Copulas (2nd ed.). New York: Springer.
- Petrylak et al. (2004) Petrylak, D. P., Tangen, C. M., Hussain, M. H., Lara Jr, P. N., Jones, J. A., Taplin, M. E., et al. (2004). Docetaxel and estramustine compared with mitoxantrone and prednisone for advanced refractory prostate cancer. New England Journal of Medicine 351, 1513–1520.
- Plackett (1965) Plackett, R. L. (1965). A class of bivariate distributions. Journal of the American Statistical Association 60, 516–522.
- Rubin (2006) Rubin, D. B. (2006). Causal inference through potential outcomes and principal stratification: application to studies with “censoring” due to death. Statistical Science 21, 299–309.
- Scharfstein et al. (2011) Scharfstein, D., Onicescu, G., Goodman, S., and Whitaker, R. (2011). Analysis of subgroup effects in randomized trials when subgroup membership is missing: application to the second multicenter automatic defibrillator intervention trial. Journal of the Royal Statistical Society: Series C (Applied Statistics) 60, 607–617.
- Wang et al. (2017) Wang, L., Zhou, X.-H., and Richardson, T. S. (2017). Identification and estimation of causal effects with outcomes truncated by death. Biometrika 104, 597–612.
- Yang and Small (2016) Yang, F. and Small, D. S. (2016). Using post-outcome measurement information in censoring-by-death problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78, 299–318.
- Zhang and Rubin (2003) Zhang, J. L. and Rubin, D. B. (2003). Estimation of causal effects via principal stratification when some outcomes are truncated by death. Journal of Educational and Behavioral Statistics 28, 353–368.
- Zhang et al. (2009) Zhang, J. L., Rubin, D. B., and Mealli, F. (2009). Likelihood-based analysis of causal effects of job-training programs using principal stratification. Journal of the American Statistical Association 104, 166–176.

Supplementary Material

## Appendix A Numerical Examples

### a.1 Numerical example showing improvements

Consider a hypothetical study where subjects are followed up four times () at time points , , , and with equal follow-up intervals and the HRQOL outcome is measured at the second follow-up (). We set the marginal distributions for to and as follows:

Given the marginal distributions, their joint distribution is specified by a Plackett copula with , corresponding to a Spearman correlation . The underlying risks of developing the bad HRQOL outcome under treatment and under control for each fine stratum are described in Table 2.

\backslashbox | |||||
---|---|---|---|---|---|

In the above setup, SACE , meaning that the treatment increases the risk of the bad HRQOL outcome by among patients who will survive at least to time point under both treatment arms.

Suppose that we have an infinite sample. Under Assumption 1, we would observe the values of ’s since , and would observe the joint distribution of and in each treatment arm,

Bounds | Relative Length | ||
---|---|---|---|

0.000 | 0.228 | ||

0.301 | 0.188 | ||

0.200 | 0.607 | 0.158 | |

0.300 | 0.926 | 0.132 | |

0.400 | 1.264 | 0.115 | |

0.500 | 1.632 | 0.099 | |

0.600 | 2.049 | 0.084 | |

0.700 | 2.545 | 0.069 | |

0.800 | 3.189 | 0.055 | |

0.900 | 4.191 | 0.041 | |

0.990 | 7.110 | 0.029 | |

0.999 | 9.773 | 0.028 | |

0.9999 | 12.331 | 0.028 | |

ZR | 1.000 |

Table 3 presents the bounds for the SACE under Assumptions 1–3 as a function of the Spearman correlation coefficient . The bounds get tighter as increases. This reflects the fact that with larger , there is less uncertainty in each subject’ stratum membership when one of the two potential survival status is observed. With no prior information on