Randomization inference with general interference and censoring

# Randomization inference with general interference and censoring

## Abstract

Interference occurs between individuals when the treatment (or exposure) of one individual affects the outcome of another individual. Previous work on causal inference methods in the presence of interference has focused on the setting where a priori it is assumed there is ‘partial interference,’ in the sense that individuals can be partitioned into groups wherein there is no interference between individuals in different groups. Bowers, Fredrickson, and Panagopoulos (2012) and Bowers, Fredrickson, and Aronow (2016) consider randomization-based inferential methods that allow for more general interference structures in the context of randomized experiments. In this paper, extensions of Bowers et al. are considered, including allowing for failure time outcomes subject to right censoring. Permitting right censored outcomes is challenging because standard randomization-based tests of the null hypothesis of no treatment effect assume that whether an individual is censored does not depend on treatment. The proposed extension of Bowers et al. to allow for censoring entails adapting the method of Wang, Lagakos, and Gray (2010) for two sample survival comparisons in the presence of unequal censoring. The methods are examined via simulation studies and utilized to assess the effects of cholera vaccination in an individually-randomized trial of children and women in Matlab, Bangladesh.

C

ausal Inference; Censoring; Interference; Permutation test; Randomization inference; Spillover effects.

## 1 Introduction

### 1.1 Background

Interference arises when an individual’s potential outcomes depend on the treatment status of others. The assumption of no interference (Cox, 1958) rules out an individual’s potential outcome depending on anyone else’s treatment. Assuming interference is absent when assessing the causal effect of a treatment on an outcome may be scientifically implausible in certain settings. For example, in the study of infectious diseases, whether one individual receives a vaccine may affect whether another individual becomes infected or develops disease. Motivated by infectious diseases, social sciences, econometrics, and other settings where individuals interact, many existing causal inference methods have recently been extended to allow for interference between individuals; see Halloran and Hudgens (2016) for a recent review of these methods.

Some previous work on causal inference methods in the presence of interference has focused on the setting where a priori it is assumed that there is partial interference (Sobel, 2006), that is, individuals can be partitioned into groups wherein there is no interference between individuals in different groups. In this paper we consider the more general setting where the assumed interference structure can take any form. The assumed interference structure under general interference may be encoded by a network adjacency matrix, with vertices as individuals and (directed) edges indicating possible interference between individuals. Approaches that adopt a network perspective for evaluating treatment effects when interference may be present include Bowers, Fredrickson, and Panagopoulos (2012), Toulis and Kao (2013), Ugander et al. (2013), van der Laan (2014), Bowers, Fredrickson, and Aronow (2016), Forastiere, Airoldi, and Mealli (2016), Liu, Hudgens, and Becker-Dreps (2016), Aronow and Samii (2017), Athey, Eckles, and Imbens (2018), and Eckles, Karrer, and Ugander (2017) among others.

### 1.2 Randomization methods for inference under general interference

In randomized experiments where the treatment assignment mechanism is known, Bowers, Fredrickson, and Panagopoulos (2012) (henceforth BFP) described how to carry out randomization-based (i.e., permutation or design-based) inference on parameters in causal models which allow for general interference. BFP proposed a causal model which restricts interference to those who did not receive treatment, with the direct (or individual) effect parametrized to be larger in magnitude than the spillover (or peer) effect. Bowers, Fredrickson, and Aronow (2016) (henceforth BFA) considered a different causal model, adopted from Manski (2013), that assumes additive direct and spillover effects. In this paper, we focus on these two causal models, although the methods developed are general and can easily be applied to other causal models.

For each causal model, a randomization-based approach is employed for drawing statistical inference about the parameters of the causal model. The approach entails constructing confidence sets for the causal parameters by inverting a set of hypothesis tests. An appealing aspect of randomization-based inference (Fisher 1935; Rosenbaum 2002, Chapter 2) is that no assumption of random sampling from some hypothetical superpopulation is invoked. Another benefit is the resulting confidence sets are exact, i.e., the probability the true causal parameters are contained in a confidence set is at least the nominal level . Moreover, in settings where possible interference is a priori assumed to have an arbitrary specified network structure, it is unreasonable to assume individual outcomes are independent, such that standard frequentist approaches are not justified; in contrast randomization-based methods that allow for possible general interference readily apply.

For a specified casual model and particular parameter value, the observed outcomes may be mapped to the potential outcomes under the uniformity trial (Rosenbaum, 2007) where no one receives treatment. If the causal model is correctly specified and the true parameter value assumed, then the uniformity trial potential outcomes should be similarly distributed between treatment and control groups. This suggests hypothesis tests which can be inverted to construct confidence sets. For instance, BFP utilized the two-sample Kolmogorov-Smirnov (KS) test statistic to compare the empirical distributions of the uniformity trial outcomes in each group. BFA employed a multiple linear regression model of the uniformity trial outcomes, using the resulting sum of squares of residuals (SSR) as a test statistic.

### 1.3 Proposed methods to accommodate right-censoring

In this article we consider the setting where the response of interest is a failure time, and only the censoring time is observed for a subset of individuals due to right censoring. Extensions of Bowers et al. are proposed which allow for right censoring. The log-rank statistic comparing the treated and control groups is used instead of the KS statistic in the presence of censoring. Similarly, a parametric accelerated failure time model is used in lieu of a multiple linear regression model. However, it is shown empirically that usual randomization-based tests employing these test statistics are not guaranteed to have type I error rates close to the nominal level in the presence of censoring. The proposed extension of Bowers et al. to permit right censored observations entails adapting the method of Wang, Lagakos, and Gray (2010) for two sample survival comparisons in the presence of unequal censoring. The proposed methods are examined via simulation studies and utilized to assess the effects of cholera vaccination in an individually-randomized trial of women and children in Matlab, Bangladesh.

### 1.4 Outline

The remainder of this article is organized as follows. In Section 2 notation is introduced and causal models are defined. In Section 3 different test statistics when outcomes are completely observed, i.e., there is no censoring, are presented, and the randomization inferential procedure is described. In Section 4 simulation studies comparing the different tests are conducted, including the setting where the assumed causal model differs from the true data-generating causal model. In Section 5 extensions for right censored outcomes are proposed, and the operating characteristics of the proposed methods are assessed via simulation studies. In Section 6 the methods are applied to the cholera example, and a summary is provided in Section 7.

## 2 General Interference and Causal Models

### 2.1 General Interference

Consider a finite population of individuals randomly assigned to either treatment or control. For each individual , let if individual is assigned treatment and otherwise. The vector of length comprising all treatment assignments is denoted . Let be an outcome of interest. Denote the potential outcome for individual under a treatment assignment , possibly counter to fact, by . By causal consistency (Pearl, 2010), . The potential outcomes for and are considered fixed features of the finite population of individuals (Neyman, 1990). Define the interference matrix with entry for as follows. Let for . For let if it is assumed a priori individual does not interfere with individual ; otherwise let . We emphasize that implies it is assumed a priori does not depend on , whereas merely indicates the possibility that individual may interfere with individual , and does not necessarily imply depends on . Indeed, one of our primary inferential goals is to determine whether such possible interference is present. The definition of encodes the assumption that any spillover effects on individual may emanate only from individuals where , and not from those where . The exact relationship between and is specified using a causal model described in the next section. Here and throughout we assume is known and invariant to . Let the interference set for individual be the set of indices where . Denote the -th row of by the vector , and the number of individuals in the interference set by the scalar . Under partial interference, individuals can be partitioned into groups or clusters wherein there is no interference between groups, in which case can be expressed as a block-diagonal matrix with each block corresponding to a group. Under general interference, each individual is allowed to have their own possibly unique interference set, so that there is no restriction on the structure of . More generally we could define to take on non-binary values that represent the extent that is allowed to interfere with , such as the distance measure used by Choi (2017); however, here we only consider with binary entries.

### 2.2 Causal Models

A (counterfactual) causal model expresses the potential outcomes as a parametric deterministic function of any treatment . Following Bowers et al. (2012, 2016) we consider a class of causal models which entail the composition of two functions. In particular, assume for user-specified functions and . The function takes as its arguments the treatment vector , causal parameter , and interference matrix . The specification of determines how an individual’s potential outcomes differ across different treatments and different values of the parameter , and includes, but is not limited to, how direct and spillover effects propagate. For notational simplicity we henceforth write with the dependence on implicit. The dependence of on is also left implicit notationally as it is implied under the specified causal model . The link function maps to in terms of a specified . The uniformity trial potential outcomes can be determined from the observed outcomes under a specified causal model by , where is the inverse of i.e., . In general, may be any one-to-one function. For the remainder of this paper, assume .

Let the number and proportion of individuals in individual ’s interference set assigned to treatment be and respectively; here implies . Note that and depend on , but this dependence is suppressed for notational convenience. For a treatment assignment , let and be the realizations of and . We consider the following specific causal models, although the methods described below are general and can be applied to other forms of :

 Fadd(z;δ,τ) =δzi+τgi; (1) FBFP(z;δ,τ) =δ+log[1+(1−zi){exp(−δ)−1}exp(−τ2ti)]. (2)

Under both causal models, the effect of on the outcome for individual takes the form of a bivariate treatment: (i) is the treatment received, and (ii) (or ) is the proportion (or number) of individuals in the interference set treated. The parameters and measure the extent to which the potential outcomes increase or decrease, relative to , due to and (or ). As both and depend only on the total number in the interference set treated, a peer effect homogeneity assumption is implied by these two causal models; Hudgens and Halloran (2008) refer to the assumption as stratified interference, which may be formally stated as if The next section describes how to carry out randomization inference for the parameter under a specified causal model .

## 3 Randomization-based Inference

### 3.1 Test statistics

When a hypothesis holds under a specified causal model , the uniformity trial potential outcomes can be determined from the observed data by . In a randomized experiment where individuals are assigned treatment with equal probability, the uniformity trial outcomes should be similar between treatment and control groups if is true (Rosenbaum, 2002). Therefore the null hypothesis can be tested using a test statistic that compares the uniformity outcomes between individuals in the treatment () and control () groups. For example, BFP used the two-sample KS test to compare the empirical distributions of the uniformity outcomes from each group.

BFA proposed a multiple linear regression model of the uniformity outcomes on and , using the resulting sum of squares of residuals (SSR) as a test statistic. The regression procedure may be formalized as follows. For a treatment assignment , let the regression predictors for individual be ; the interaction term and the interference set size are included to potentially improve the regression model fit. (For the causal model, may be replaced by .) Since the potential outcomes are log-linear in under the specified causal model, i.e., , we suggest using the log-transformed uniformity outcomes, denoted by , as the regression outcomes. The regression model is fitted, where is the regression model matrix with as the -th row, and are the regression error terms. From multiple linear regression theory, the residual sum-of-squares is minimized by the ordinary least squares (OLS) estimate . The test statistic under is the reciprocal is taken so that larger values of the test statistic suggest stronger evidence against .

To guide the choice of regression model in practice, we propose replacing the main effects and (or ) in with instead. For example, the regression model corresponding to may be . The estimate for can be found by minimizing the residual sum of squares:

 ^β=argminβn∑i=1[logyi(0)−{β0+FBFP(zi,ti;β1,β2)+β12ziti+β3Ai}]2.

Since is a non-linear function of and , there does not appear to be a closed form solution for ; however, the solution may be found numerically. The test statistic based on the modified regression model is then

In addition to the KS and SSR statistics, a likelihood ratio statistic may also be considered. Following the likelihood ratio principle for testing, a likelihood ratio permutation test is expected to be the most powerful test against certain alternatives (see Lehmann and Romano (2005), Chapter 5.9 for an example in the setting where there is no interference). Under the regression model , where the errors are independent and normally distributed with mean zero and variance , denote the log-likelihood function as ; see for example, Equation (A.29) of Weisberg (2014). Let denote the log-likelihood function evaluated at the maximum likelihood estimates (MLEs) of the parameters (), which are respectively the OLS estimators , and . Let denote the log-likelihood for the ‘intercept-only’ model evaluated at the (restricted) MLE. Since is constant with respect to for a fixed parameter value , the log-likelihood difference is proportional to . Furthermore, is a monotonic function of (see Equation (A.30) of Weisberg (2014)), so that the sampling distributions of and over re-randomizations of are equivalent.

The regression model is considered as a ‘working model,’ used solely to generate a test statistic for a hypothesis testing procedure. Under the randomization-based framework, the validity of the type I error is guaranteed by construction, and does not rely on the working model, including the normality and homoscedasticity assumptions, being correctly specified. In particular, is a mathematical (scalar) summary of and that is compared against other values of in deciding whether or not to reject the null hypothesis . The next section describes how to carry out such randomization inference.

### 3.2 Inference

For a chosen test statistic , the plausibility of can be assessed by evaluating the frequency of obtaining a value at least as ‘extreme’ (from ) as the observed value, over hypothetical re-randomizations of under . Here and throughout a completely randomized experiment is assumed, where the number assigned to treatment, denoted by , is fixed by design. The sample space of all hypothetical re-randomizations is the set of vectors of length containing ’s and ’s, and is denoted by The sampling distribution of the test statistic under may be determined exactly by computing for each of the possible randomizations for . Under , each randomization occurs with probability , so that a two-sided p-value may be defined as

 pv(θ0)=|Ω|−1∑z∈ΩI{TS(z;θ0)≥TS(Z;θ0)}, (3)

where larger values of suggest stronger evidence against , and if is true and otherwise. When it is not computationally feasible to enumerate exactly, an approximation with random draws of from may be used. The p-value is

 pvC(θ0)=C−1∑z∈ΩCI{TS(z;θ0)≥TS(Z;θ0)}. (4)

The subset of values where , or , is greater than or equal to forms a exact confidence set for .

## 4 Empirical size and power of tests

To assess empirically the operating characteristics of the proposed tests in Section 3.1, three different simulation studies were conducted. In study 1, the model was assumed for comparing the KS and SSR tests. In study 2, the causal model was assumed for comparing the KS, SSR and tests. In study 3, we consider the setting where the causal model assumed for inference differed from the true (data-generating) causal model.

### 4.1 Study 1: Fadd

For simulation study 1, let , of which individuals were assigned to treatment in a completely randomized experiment. For each individual , the interference set was generated once as follows: (i) randomly draw the interference set size as ; (ii) sample without replacement values of and set for the sampled values of ; and (iii) set the remaining values of to 0. The true uniformity trial potential outcomes were generated by carrying out step 0 below once. Steps 1 and 2 were then repeated 400 times each to obtain the empirical coverage of the 95% confidence sets.

1. For , generate the true uniformity trial potential outcomes as:

2. Randomly draw an observed treatment assignment from . Determine the observed outcomes as . The values of were chosen so that for the median value of , the magnitude of the spillover effect is greater than the direct effect i.e., .

3. For the observed dataset , carry out the KS and SSR tests for a discrete grid of values for under the causal model. Calculate the p-values , as defined in (4), with .

The size of the tests are evaluated under the null hypothesis where the assumed causal parameter values are set to the true (data-generating) values; i.e., . Both the KS and SSR tests preserved the nominal type I error rate under , which is guaranteed by construction. The empirical cumulative distribution functions (ECDFs) of the respective p-values are plotted in the left panel of Figure 1.

Power can be assessed by assuming parameter values for inference that differ from the true values. In particular, under , the SSR test had higher power on average than the KS test, rejecting at a higher rate than the KS test; see the ECDFs in the right panel of Figure 2. It was observed empirically that under the model, the values of were similarly distributed in the and groups so long as the correct value of was assumed. Thus hypotheses such as , where the assumed value of was correct but the assumed value of was incorrect, were rejected at a rate no more than the nominal significance level using the KS test. The proportion of SSR and KS 95% confidence sets that included each value of tested are plotted in Figure 2. For the correct value of , the KS confidence sets extended over values of at the nominal coverage level.

### 4.2 Study 2: FBFP

For the second simulation study, the causal model was assumed for comparing the KS, SSR and tests. Details on the study and plots of the empirical results are provided in Web Appendix A. As expected, all three tests maintained the nominal type I error rate . The SSR_BFP test had the greatest power on average empirically among the three tests.

### 4.3 Study 3: Misspecification of assumed causal model

Each hypothesis is a joint hypothesis of the specified causal model and the assumed parameter value . A test statistic may thus summarize aspects of the data relevant to both and , and may not necessarily reflect the plausibility of either or alone. For example, under the causal model, the spillover effect is restricted to only those in the group, with magnitude no larger than the direct effect. If the observed outcomes arose from a (true) causal model in which those in the group experienced a spillover effect, or the magnitude of the spillover effect was greater than the direct effect, or both, it may not be possible to recover the true uniformity trial outcomes from using for any value of . It is shown in Web Appendix B that if the observed outcomes are generated under the true causal model, there may not exist any unique solutions for when fitting the misspecified model.

Using a third simulation study, we demonstrate that the resulting confidence sets for assuming a misspecified causal model may be empty. Following simulation study 1 in Section 4.1, the observed outcomes were determined under the causal model in steps 0 and 1. Instead of assuming the correct data-generating causal model () in step 2, an incorrect causal model () was used to determine the uniformity trial outcomes. The resulting SSR confidence sets were empty for all 400 simulations, providing evidence of the implausibility of the assumed model. Details of the simulation study, including how a linear approximation of may be used to carry out the SSR test, are described in Web Appendix B1.

## 5 Right censored failure time outcomes

Now suppose each individual’s response is a (positive) failure time, subject to right censoring if the individual is not followed long enough for failure to be observed. For , let and denote the failure time and the censoring time respectively. is observed only if , so that the observed data are and the failure indicator . Under a null hypothesis for a specified causal model , let be the uniformity trial potential failure time. Unfortunately, the uniformity trial potential failure time can only be determined for individuals observed to fail. In particular, let as in the previous sections. Then if , and if .

### 5.1 Test statistics that accommodate right censoring

The test statistics considered in Section 3.1 require modification to accommodate right censoring. Instead of the KS test statistic, the log-rank statistic may be used to compare the possibly right censored uniformity failure times in the two treatment groups. An analog of the multiple linear regression model is the parametric accelerated failure time (AFT) model where the log-transformed failure times are linear functions of the predictors; see for example, Kalbfleisch and Prentice (2011). The log-normal AFT model of the uniformity failure times is where the errors are independent and normally distributed with mean zero and variance one. Denote the log-likelihood function as where ; see for example, Equation (6.25) of Collett (2003). In general, the MLEs of the parameters (), denoted by (), can be found iteratively using the Newton-Raphson method. Let denote the log-likelihood function evaluated at the MLE. Let denote the log-likelihood for the ‘intercept-only’ model evaluated at the restricted MLE. The log-likelihood difference is . In practice, can be used in place of since is constant with respect to for a fixed parameter value .

### 5.2 Empirical size of tests with right censoring

The randomization-based inferential procedures described in Section 3.2 do not necessarily yield tests that preserve the nominal size in the presence of right censoring, even if the test statistics considered in Section 5.1 are utilized. To demonstrate, a simulation study is conducted as follows. Let , and generate the interference sets as in Section 4.1.

1. Generate the uniformity failure times as

2. Randomly draw an observed treatment assignment from , where individuals are assigned to treatment. Determine the underlying failure time for individual with observed treatment () as for . The censoring times are then drawn from distributions that depend on treatment. First, randomly draw the dropout times from a lognormal distribution . The administrative censoring time is then set as . If , let the censoring times be ; otherwise, assume there is no dropout and let . Set the observed outcomes and failure indictors as defined above.

3. Determine the uniformity outcomes under . For the dataset , carry out the LogR and LRaft tests, holding fixed over re-randomizations. Calculate the p-values with .

Step 0 was carried out once, then steps 1 and 2 repeated 800 times each. The average failure rates were approximately and in the and arms respectively. The ECDFs of the log-rank (LogR), and LRaft p-values are plotted in the left panel of Figure 3. In general, neither test controlled the nominal type I error rate.

### 5.3 Correcting for right censored uniformity trial failure times

In this section, an alternative randomization-based procedure is proposed to better control the type I error in the presence of right censored outcomes.

Randomization tests of no treatment effect in the presence of censoring generally only preserve the nominal size when treatment does not affect the censoring times. To see this, consider for a moment the setting where there is no interference between individuals, so that each individual has two potential failure time outcomes and , and two potential censoring times and . Let and , and define and as above. Consider testing the sharp null hypothesis of no treatment effect i.e., for , using some test statistic which is a function of . If we assume , then under the sharp null both and will be the same regardless of treatment assignment, allowing exact determination of the test statistic’s sampling distribution by enumeration over all possible randomization assignments .

Returning to the setting where there is interference, in Section 5.2 the null hypothesis for a specified causal model was tested by conducting a randomization test of no effect of treatment on the possibly right censored uniformity potential failure times for given , with each test statistic a function of . Unfortunately, the standard randomization testing approach described above cannot be used to determine a test statistic’s sampling distribution under the null because in general an individual’s censoring indicator will not be fixed over all possible re-randomizations , even if treatment has no effect on the censoring times. To see this, consider the causal model and suppose and i.e., individual is assigned treatment and is censored at time . Let denote the potential censoring time for individual under treatment assignment ; further assume treatment has no effect on the censoring times, so that for all . Now consider some other where so that . Therefore, for , individual would not be censored when .

Thus it is not surprising that the empirical results in the previous section demonstrate that the randomization tests as presented do not in general have type I error rates close to the nominal level. Instead, we propose the following randomization-based method that allows for the set of censored individuals to vary over randomizations. Following the permutation test by Wang et al. (2010), first compute the Kaplan-Meier (KM) estimator of the survival function using the right censored uniformity failure times and failure indicators . Then for each re-randomization , uniformity trial failure times are imputed using the KM estimator. The failure times under treatment assignment are then determined under the specified causal model and the assumed values of the parameters; for example, under the causal model , .

Next the censoring times are imputed based on KM estimators of the censoring time distribution. For , the KM estimators are computed using the censoring times and censoring indicators (i.e., ) for individuals with . Then for each re-randomization , censoring times are randomly sampled from the corresponding distribution for . Finally, the outcomes and failure indicators are determined, and the test statistic of interest is computed. The procedure for a single observed dataset is presented in Web Appendix C.

### 5.4 Empirical evaluation of proposed tests

The simulation study described in Section 5.2 was repeated, but with step 2 replaced with the proposed test in the preceding section. The empirical results are shown in the right panel of Figure 3. The LogR and LRaft tests both had type I error rates that approximately equal the nominal size for all significance levels .

In another simulation study, the proportion of LogR and LRaft 95% confidence sets that included each value of tested are plotted in Figure 4. Similar to the setting without censoring, for the correct value of , the LogR confidence sets included multiple (incorrect) values of at the nominal coverage level. Details on the study are provided in Web Appendix D.

## 6 Application to randomized trial of cholera vaccine

In this section the methods described above are utilized to assess the effects of cholera vaccination in a placebo-controlled individually-randomized trial in Matlab, Bangladesh (Ali et al., 2005; Ali et al., 2009). In prior analyses of these data, Ali et al. (2005) found a negative association between an individual’s risk of cholera infection and the proportion of individuals vaccinated in the area surrounding an individual’s residence, suggesting possible interference. Similarly, analysis by Emch et al. (2009) found that the risk of cholera is inversely related with vaccine coverage in environmental networks that were connected via shared ponds. Likewise, Root et al. (2011) concluded that the risk of cholera among placebo recipients was inversely associated with level of vaccine coverage in their social networks.

Motivated by these association analyses, Perez-Heydrich et al. (2014) used inverse probability weighted estimators proposed by Tchetgen Tchetgen and VanderWeele (2012) to provide evidence of a significant indirect (spillover) effect of cholera vaccination. However, Perez-Heydrich et al. (2014) assumed partial interference based on a spatial clustering of individuals into groups and did not account for right censoring. Misspecification of the interference structure and failure to account for right censoring may bias results. The analysis below considers other possible interference structures and allows for right censoring.

All children aged 2-15 years and females over 15 years in the Matlab research site of the International Centre for Diarrheal Disease Research, Bangladesh were individually assigned randomly to one of three possible treatments: B subunit-killed whole cell oral cholera vaccine; killed whole cell-only oral cholera vaccine; or Escherichia coli K12 placebo. Recipients of either vaccine were grouped together for analysis as the vaccines were identical in cellular composition and similar in protective efficacy in previous analyses. Denote for those assigned to placebo, and for those assigned to either vaccine. Individuals were only included in the analysis if they had completely ingested an initial dose and had completely or almost completely ingested at least one additional dose. There were a total of individuals in the randomized trial subpopulation for analysis, with assigned to vaccine and to placebo. The primary outcome for analysis was the (failure) time in days from the 14th day after the vaccination regimen was completed (end of the immunogenic window; Clemens et al. 1988), until a patient was diagnosed with cholera following presentation for treatment of diarrhea. Failure times for many trial participants were right censored either due to outmigration from the field trial area or death prior to the end of the study, or administrative censoring at the end of the study on June 1, 1986.

### 6.1 Interference specifications

The vaccine trial was analyzed using three different specifications of interference. For all three specifications, an individual’s interference set included all other individuals residing in the same bari (i.e., households of patrilineally-related individuals), since person-to-person transmission of cholera often takes place within the same bari. There were a total of 6423 baris as defined with the Demographic Surveillance System and detailed geographic information system mapping of Matlab. Since baris were geographically discrete, three different specifications were posited regarding how an individual’s interference set might include other individuals in different baris, based on either geographical proximity or social interactions.

The first specification followed the same approach in Perez-Heydrich et al. (2014). Baris were partitioned into neighborhoods according to a single linkage agglomerative clustering method (Everitt et al., 2011). No interference was assumed between individuals in different baris and no additional assumptions were imposed regarding the interference structure. That is, partial interference was a priori assumed under this specification. The average number of individuals in each interference set was 419 with an interquartile range (IQR) of 120–631.

Ali et al. (2005, 2018) found an association between the risk of cholera for a placebo recipient and the level of vaccine coverage among individuals living within a 500 meter (m) radius of the placebo recipient. Following Ali et al. (2005, 2018), the second specification of the individual interference sets entails assuming an individual’s potential outcome may possibly depend on those living in a bari within a 500m radius of the bari s/he resided in. This specification does not assume a priori partial interference between individuals in different neighborhoods. The average number of individuals in each interference set was 499 (IQR 339–626).

The previous two specifications of the interference structure employed a ‘local’ neighborhood context based on geographical measures. Following Root et al. (2011), the third interference structure was defined according to a kinship-based social network between baris. The Matlab Demographic Surveillance System recorded the exact dates and bari of residence over time for each individual. An individual who migrated between two baris, primarily due to kinship relationships such as marriage, created a non-directional social tie between the baris (Emch et al., 2012). The average number of individuals in each interference set was 162 (IQR 70–225). The interference matrices under each of the three specifications are plotted in Figure 5.

The study population also included 44887 individuals who did not participate in the randomized trial, and thus have zero probability of receiving either cholera vaccine. However, most of these individuals also resided in the same baris as those who took part in the trial: 5661 baris contained a mixture of participants and non-participants, with a median participation rate of 71% within a bari. Since the three specified interference sets were defined based on baris, the definition of was expanded to include non-trial participants as follows. Let be the total number in the study population, regardless of trial participation, who may possibly interfere with person , so that . Denote the proportion of who receive treatment as , i.e., .

### 6.2 Results

For each specified interference matrix, confidence sets for () were constructed under the causal model, , by conducting hypothesis tests over a discrete grid of values of . It was not computationally feasible to enumerate the randomization set exactly with possible re-randomizations, so p-values were calculated with random draws of from . The LRaft 95% confidence sets are plotted in Figure 6, with the contours indicating pairs of () yielding the same p-values. The boundaries of the confidence set are demarcated by the contour lines that indicate p-values at least as large as 0.05.

There is evidence vaccination had an effect on the risk of cholera as the 95% confidence sets excluded under all three interference structure specifications. Point estimates of the joint treatment effects, corresponding to values of () with the largest p-value, were positive, suggesting the effect of the vaccine in reducing the risk of cholera was a combination of protective direct and spillover effects. For the 500m interference structure, the estimated treatment effect was . We offer two interpretations of under the additive causal model. First, the average time until cholera diagnosis had everyone not received vaccine (i.e., the uniformity trial) would have been times faster than if everyone had received vaccine (e.g., the ‘blanket coverage’ trial). Second, the estimated risk of cholera incidence at 365 days under the uniformity trial would be approximately 2.30% compared to 0.06% under the blanket coverage trial, corresponding to a 98% reduction.

## 7 Conclusion

In this paper we proposed randomization-based methods for assessing the effect of treatment on right-censored outcomes in the presence of general interference. Validity and power of the methods were demonstrated empirically via simulation studies. Utilization of the methods in the analysis of a large randomized cholera vaccine trial corroborate prior analyses indicating interference whereby unvaccinated individuals benefit from the vaccination of others (Barkley et al., 2017; Perez-Heydrich et al., 2014; Ali et al., 2009). In addition to vaccine studies, the developed methods can be applied to many other settings where general interference may be present, such as social network analyses and spatial analyses in econometrics, education, epidemiology and political science.

There are several avenues of possible future research related to the methods for censored outcomes developed in this paper. The adapted procedure as implemented only allows for unequal censoring based on . To allow for censoring to differ based on both and , a stratified Cox proportional hazards model (Cox, 1972), with and as strata and predictor respectively, may be used in place of the KM estimator of the censoring distributions. Since inference is contingent on the choice of interference structure assumed, another possible extension is to develop sensitivity analysis methods for assessing robustness to interference structure misspecification. Building on the empirical results in this paper, future research could examine the theoretical properties of the various tests considered, and derive tests that maximize power against particular alternatives. While illustrated in this paper using data from an individually-randomized trial, the proposed methods can also be employed in cluster-randomized trials. The methods could be extended to observational data under the assumption of no unobserved confounding, in principle by using a restricted set of randomizations. Finally, although this paper has focused on two specific causal models, the proposed methods are general and easily extended to other causal models.

\backmatter

## Software

R scripts implementing the methods described herein are provided at http://users.ugent.be/~weloh/.

## Acknowledgements

The authors thank Brian Barkley, Sujatro Chakladar and Bradley Saul for helpful comments. This research was supported by the NIH grant R01 AI085073-05 and by a Gillings Innovation Laboratory award from the UNC Gillings School of Global Public Health. Part of the computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation – Flanders (FWO) and the Flemish Government – department EWI. The content is solely the responsibility of the authors and does not represent the official views of the National Institutes of Health.

Web Appendices

## Appendix A Simulation study in Section 4.2

In Section 4.2, a simulation study was conducted to assess the operating characteristics of the KS, SSR and tests under the causal model. Let and assume an interference matrix that is upper diagonal with entry for as follows. For , let if ; otherwise let . A completely randomized experiment was simulated where eight individuals were randomly assigned to . The following steps were undertaken:

1. Randomly generate the uniformity trial potential outcomes as

2. Randomly draw an observed treatment assignment from . Determine the observed outcomes as for .

3. For the observed dataset , test each of the hypotheses , and , using the KS, SSR and tests under the causal model.

Step 0 was carried out once, then steps 1 and 2 were repeated for all to obtain the exact empirical cumulative distribution functions (ECDFs) of the p-values, as plotted in the left panel of Figure 1. For , the type I error rate was preserved for all three tests; i.e., a p-value is less than or equal to in no more than of hypothetical re-randomizations. Furthermore, the two SSR tests were size tests with approximately uniformly distributed p-values (ECDFs lying close to the diagonal).

Statistical power is evaluated by assuming parameter values for inference that differ from the true values. In particular, under , the SSR_BFP test had the greatest power on average among the compared tests. For a given significance level , the SSR_BFP p-values had a higher probability of being smaller than than the KS and SSR p-values. The ECDFs of the p-values under the hypothesis are plotted in the right panel of Figure 1.

## Appendix B Comparison of FBFP and Fadd

The notation is first defined for a given individual as follows. Let be the (row) vector in the interference matrix corresponding to the individual’s interference set; let be the size of the interference set; let be the number of treated individuals in the interference set for a treatment assignment (vector) ; and let be the individual’s assigned treatment.

For a fixed value of (), the BFP causal model may be written as the function In a completely randomized experiment where treatment is independently assigned, let be the probability of being assigned to treatment, and the average number of treated individuals in the interference set over re-randomizations. Note that the values of both and are fixed over re-randomizations in a completely randomized experiment. A linear approximation of at is thus

 f(Z,T) ≈f(μZ,μT)+∂f(Z,T)∂Z∣∣∣(μZ,μT)(Z−μZ)+∂f(Z,T)∂T∣∣∣(μZ,μT)(T−μT) (B.5) ∝exp(δ)−1exp(δ+τ2μT)−(1−μZ){exp(δ)−1}{Z+τ2(1−μZ)T} (B.6) =exp(δ)−1exp(δ+τ2Am/n)−(1−m/n){exp(δ)−1}{Z+τ2(1−m/n)T}. (B.7)

Consider the additive causal model . To obtain expressions for in terms of (), set the coefficients of and in the linear approximation of to be equal to the corresponding coefficients in as follows. First, using the coefficients for , the relationship between () and is:

 θ1 =exp(δ)−1exp(δ+τ2Am/n)−(1−m/n){exp(δ)−1} (B.8) ⟺ τ (B.9)

Hence, is a well-defined function of and only if

 {1−exp(−δ)}{1/θ1+(1−m/n)}>1. (B.10)

Furthermore, if , it follows that

 δ >−log[1−{1θ1+(1−mn)}−1]=θ1+(mn−12)θ21+O(θ31). (B.11)

Next, using the coefficients for , the relationship between and () is:

 θ2θ1A =τ2(1−mn) (B.12) ⟺ δ (B.13)

Hence, is a well-defined function of () only if

 {1/θ1+(1−m/n)}−1exp{θ2/θ1×m/(n−m)}<1. (B.14)

Similarly, it follows that if ,

 θ2 <θ1n−mmlog[1θ1+(1−mn)]. (B.15)

The parameters () in a linear approximation of may thus be expressed in terms of the parameters () in .

An example is provided to illustrate the results above. Suppose the observed outcome was generated by , where is the (true) uniformity trial outcome. However, under the assumed causal model , the uniformity trial outcomes were determined as , so that