## Abstract

Targeted therapies based on biomarker profiling are becoming a mainstream direction of cancer research and treatment. Depending on the expression of specific prognostic biomarkers, targeted therapies assign different cancer drugs to subgroups of patients even if they are diagnosed with the same type of cancer by traditional means, such as tumor location. For example, Herceptin is only indicated for the subgroup of patients with HER2+ breast cancer, but not other types of breast cancer. However, subgroups like HER2+ breast cancer with effective targeted therapies are rare, and most cancer drugs are still being applied to large patient populations that include many patients who might not respond or benefit. Also, the response to targeted agents in humans is usually unpredictable. To address these issues, we propose SUBA, subgroup-based adaptive designs that simultaneously search for prognostic subgroups and allocate patients adaptively to the best subgroup-specific treatments throughout the course of the trial. The main features of SUBA include the continuous reclassification of patient subgroups based on a random partition model and the adaptive allocation of patients to the best treatment arm based on posterior predictive probabilities. We compare the SUBA design with three alternative designs: equal randomization, outcome-adaptive randomization, and a design based on a probit regression. In simulation studies we find that SUBA compares favorably against the alternatives.

KEY WORDS: Adaptive designs; Bayesian inference; Biomarkers; Posterior; Subgroup identification; Targeted therapies.

Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials

Yanxun Xu, Lorenzo Trippa, Peter Müller and Yuan Ji

1 Division of Statistics and Scientific Computing, The University of Texas at Austin, Austin, TX, U.S.A.

2 Department of Biostatistics, Harvard School of Public Health, Boston, MA, U.S.A.

3 Department of Mathematics, The University of Texas at Austin, Austin, TX, U.S.A.

4 Center for Clinical and Research Informatics, NorthShore University HealthSystem, Evanston, IL, U.S.A.

5 Pritzker School of Medicine, The University of Chicago, Chicago, IL, U.S.A.

Email: yji@health.bsd.uchicago.edu

## 1 Introduction

### 1.1 Targeted Therapy

With the rapid development of genomics and personalized medicine, it is becoming increasingly feasible to diagnose and treat cancer based on measurements from genomic interrogations at the molecular level such as gene expression (Van De Vijver et al., 2002; Snijders et al., 2001), DNA copy numbers (Curtis et al., 2012; Baladandayuthapani et al., 2010), and epigenetic marks (Wang et al., 2008; Barski and Zhao, 2009; Mitra et al., 2013). In particular, pairing genetic traits with targeted treatment options has been an important focus in recent research. This has led to successful findings such as the use of trastuzumab, doxorubicin, or taxanes on HER2+ breast cancer (Hudis, 2007), and the recommendation against treatment with EGFR antibodies on KRAS mutated colorectal cancer (Misale et al., 2012). It is now broadly understood that patients with the same cancer defined by classification criteria such as tumor location, staging, and risk-stratification can respond differently to the same drug, depending on their genetic profiles.

First proposed by Simon and Maitournam (2004), “targeted designs” restrict the eligibility of patients to receive a treatment based on predicted response using genomic information. Under fixed sample sizes and compared with standard equal randomization in two-arm trials, the authors showed that targeted designs could drastically increase the study power in situations where the new treatment benefited only a subset of patients and those patients could be accurately identified. Sargent et al. (2005) proposed the biomarker-by-treatment interaction design and a biomarker-based-strategy design, both using prognostic biomarkers to facilitate treatment allocations to targeted subgroups. Maitournam and Simon (2005) further showed that the relative efficiency of targeted designs depended on (1) the relative sizes of the treatment effects in biomarker positive and negative subgroups, (2) the prevalence of the patient group who favorably responds to the experimental treatment, and (3) the accuracy of the biomarker evaluation. Recently, new designs have been proposed by Freidlin et al. (2010), Simon (2010) and Mandrekar and Sargent (2010), among others.

BATTLE (Kim et al., 2011) and I-SPY 2 (Barker et al., 2009) are two widely known biomarker cancer trials using Bayesian designs. The design of BATTLE predefined five biomarker groups on the basis of 11 biomarkers, and assigned patients to four drugs using an outcome-adaptive randomization (AR) scheme. AR is implemented with the expectation that an overall higher response rate would be achieved relative to equal randomization (ER), assuming at least one biomarker group has variations in the outcome distributions across arms. However, the analysis of the trial data revealed otherwise; the response rate was actually slightly lower during the AR period than during the initial ER period. This fact can be attributed to several factors such as possible trends in the enrolled population, or variations in the procedures for measuring primary outcomes. In practice, targeted agents can fail for reasons such as having no efficacy on the targeted patients, being unexpectedly toxic, or being uniformly ineffective. There is a need for adaptive designs that accommodate these situations, improve trial efficiency, and maintain trial ethics (Yin et al., 2012; Gu and Lee, 2010; Zhu et al., 2013).

Researchers are also developing new designs that allow for the redefinition of biomarker groups that could be truly responsive to targeted treatments. Ruberg et al. (2010) and Foster et al. (2011) developed tree-based algorithms to identify and evaluate the subgroup effects by searching the covariate space for regions with substantially better treatment effects. Bayesian models are natural candidates for adaptive learning of subgroups, and have been known and applied in non-medical contexts (Loredo, 2003; Kruschke, 2008).

### 1.2 A Subgroup-Based Adaptive Design

In this paper, we propose a class of SUbgroup-Based Adaptive (SUBA) designs for targeted therapies which utilize individual biomarker profiles and clinical outcomes as they become available.

To understand and characterize a clinical trial design it is useful to distinguish between the patients in the trial and future patients. There exist a number of methods that address optimization for the patients in the trial. Most approaches target the optimization of a pre-selected objective function (criterion); see, for example, Fedorov and Leonov (2013, chapters 8 and 9). SUBA aims to address both goals: successful treatment of patients in the trial and optimal treatment selection for future patients. We achieve the former by allocating each patient, on the basis of the patient’s biomarker profile, to the treatment with the best currently estimated success probability. That is, the optimal treatment for a patient with biomarker profile $\boldsymbol{x}$ is

$$t^{\star}(\boldsymbol{x}) = \arg\max_{t}\, q(t, \boldsymbol{x}),$$

where $q(t, \boldsymbol{x})$ is the posterior predictive response rate of a patient with biomarker profile $\boldsymbol{x}$ under treatment $t$. This can be characterized as a stochastic optimization problem. In contrast, the optimal treatment selection for future patients is not considered as an explicit criterion in SUBA. It is indirectly addressed by partitioning the biomarker space into subsets with different response probabilities for the treatments under consideration. Learning about the implied patient subpopulations facilitates personalized treatment selection for a future patient on the basis of the patient’s biomarker profile $\boldsymbol{x}$. The outcome of SUBA is an estimated partition of the biomarker space and the corresponding optimal treatment assignments.

The main assumption underlying the proposed design approach is that there exist subgroups of patients who differentially respond to treatments. For example, consider a scenario with two subgroups of patients that respond well to either of two different treatments, but not both. An ideal design should search for such subgroups and link each subgroup with its corresponding superior treatment. That is, a design should aim to identify subgroups with elevated response rates to particular treatments. The key innovations of SUBA are that such biomarker subgroups are continuously redefined based on patients’ differential responses to treatments and that patients are allocated to the currently estimated best treatment based on posterior predictive inference.

In summary, SUBA conducts subgroup discovery, estimation, and patient allocation simultaneously. We propose a prior for the partition that classifies tumor profiles into biomarker subgroups. The stochastic partition has the advantage that biomarker subgroups are not fixed up front before patient accrual. The goal is to use the data, during the trial, to learn which partitions are likely to be relevant and could potentially become clinically useful. We define a random partition of tumor profiles using a tree-based model that shares similarities with Bayesian CART algorithms (Chipman et al., 1998; Denison et al., 1998). We provide closed-form expressions for posterior computations and describe an algorithm for adaptive patient allocation during the course of the trial.

### 1.3 Motivating Trial

We consider a breast cancer trial with three candidate treatments. Patients who are eligible have undergone neoadjuvant systemic therapy (NST) and surgery. Protein biomarkers for all patients are measured through biopsy samples by reverse phase protein arrays (RPPA) at the end of NST, but before surgery. The first treatment is a poly (ADP-ribose) polymerase (PARP) inhibitor, which affects DNA repair and cell death programming. The second treatment is a PI3K pathway inhibitor, which affects cell growth, proliferation, cell differentiation and ultimate survival. The third treatment is a cell cycle inhibitor that targets the cell cycle pathway. The main goal is to identify for each of the three treatments subgroups of patients that will respond favorably to the respective treatment.

## 2 Methodology

### 2.1 Sampling Model

Assume that $T$ candidate treatments are under consideration in a clinical trial. We use $t = 1, \ldots, T$ to index the treatments and $i = 1, \ldots, N$ to index patients, where $N$ is the maximum sample size. The primary outcome for each patient is a binary variable $y_i \in \{0, 1\}$. We assume that $y_i$ can be measured without delay. We denote with $\boldsymbol{x}_i = (x_{i1}, \ldots, x_{iK})$ the biomarker profile of the $i$-th patient, recorded at baseline. We assume that all biomarkers are continuous, $x_{ik} \in \mathbb{R}$. Finally, let $z_i$ denote the treatment allocation for patient $i$, with $z_i = t$ if patient $i$ is assigned to treatment $t$.

The underlying assumption of a biomarker clinical trial is that there exist subgroups of patients that differentially respond to the same treatment. For example, subgroup may respond well to treatment but not while subgroup may respond well to treatment but not . However, the subgroups are not known before the trial and must be estimated adaptively based on response data and biomarker measurements from already treated patients. To estimate the subgroups and their expected response rates to treatments, we propose a random partition model. Assuming that all biomarker measurements are continuous, , we construct patient subgroups by defining a partition of the biomarker space . A partition is a family of subsets , where is the size of the partition and are the partitioning subsets such that and . The partition of the biomarker sample space implies a partition of the patients into biomarker subgroups. Patient belongs to biomarker subgroup if . We will construct a prior probability measure for in the next section. In the following discussion we will occasionally refer to as a subset of patients, implying the subset of patients that is defined by the partitioning subset .

We define a sampling model for conditional on and as

(1) |

where is the response rate of treatment for a patient in subgroup . Thus the joint likelihood function for patients is the product of such Bernoulli probabilities, using and depending on the recorded outcomes . In each biomarker subgroup , let count the number of patients, the number of patients assigned to treatment , and the number of patients in group assigned to with response . Here is the indicator function. Let , , , and . Then

Adding a prior on and we complete (1) to define a 3-level hierarchical model

(2) |

The last two factors define the prior model for and . We assume and discuss the prior for next. Posterior inference on and provides learning on subgroups and their treatment-specific response rates. Posterior probabilities for and are the key inference summaries that we will later use to define the desired adaptive trial design.

### 2.2 Random Biomarker Partition

We propose a tree-type random partition on the biomarker space to define random biomarker subgroups. A partition is obtained through a tree of recursive binary splits. Each node of the tree corresponds to a subset of , and is either a final leaf which defines one of the partitioning subsets , or it is in turn split into two descendants. In the latter case the two descendants are defined by first selecting a biomarker and then splitting the current subset by thresholding . The threshold splits the ancestor set into two components. A sequence of such splits generates a partition of as the collection of the resulting subsets. For the motivating breast cancer trial, we limit the partition to at most eight biomarker subgroups in the random partition. We impose this constraint to limit the number of subgroups with critically small numbers of patients, and therefore only allow three rounds of random splits.

An example is shown in Figure 1. The figure shows a realization of the random partition with biomarkers. In each round, we consider each of the current subsets and either do not split it further with probability or with probability choose biomarker to split the subset into two parts. If an ancestor subset is split by the -th biomarker, then the resulting partition contains two new subsets, defined by and , where is the median of and is computed across all available data points in the subset . That is, is a conditional median which can vary during the course of the trial, as more data become available. In Figure 1 the sequence of splits is as follows. We first split on . In the second round the two resulting subsets are split on and , respectively. In a third round of splits, only one subset of the earlier four subsets is split on again, three others are not further split.
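The splitting procedure can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the implementation used for the simulations: the function name `sample_partition`, the list `v` holding the no-split probability followed by the per-biomarker split probabilities, the convention that a subset left unsplit becomes a final leaf, and the convention that values equal to the conditional median fall into the lower subset are all choices made here for illustration.

```python
import random
import statistics


def sample_partition(X, v, rounds=3, rng=None):
    """Draw one random partition of the patients in X by recursive median splits.

    X      : list of biomarker profiles, each a list of K floats
    v      : [v0, v1, ..., vK]; v0 is the probability of not splitting,
             vk the probability of splitting on biomarker k (hypothetical layout)
    rounds : number of rounds of splits (three rounds give at most 8 subsets)
    Returns a list of index lists, one per biomarker subgroup.
    """
    rng = rng or random.Random()
    K = len(X[0])
    active = [list(range(len(X)))]  # subsets that may still be split
    final = []                      # subsets that were declared leaves
    for _ in range(rounds):
        next_active = []
        for idx in active:
            choice = rng.choices(range(K + 1), weights=v)[0]
            if choice == 0:
                # Assumption: an unsplit subset is a final leaf.
                final.append(idx)
                continue
            k = choice - 1
            # Conditional median: computed only from patients in this subset.
            med = statistics.median(X[i][k] for i in idx)
            lower = [i for i in idx if X[i][k] <= med]
            upper = [i for i in idx if X[i][k] > med]
            next_active.extend(s for s in (lower, upper) if s)
        active = next_active
    return final + active
```

Because the threshold is recomputed from the patients currently in a subset, rerunning the function as data accrue yields subsets whose boundaries move over the course of the trial.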

Let be the sample space of all possible partitions based on the three rounds of splits. For each partition , we calculate the prior probability based on the above random splitting rules. For example, the partition in Figure 1 has prior probability

(3) |

with the three factors corresponding to the three rounds of splits.

We use a variation of the described probability model. The main rationale is that, if a biomarker is selected for an initial split, then it is desirable to augment the probability of splitting on it again at the subsequent levels in the tree. The goal is to facilitate the identification of relevant subgroups while maintaining the simplicity of the partition model. To implement this, in each possible partition , we calculate as the number of distinct biomarkers selected in the three rounds of splits. We then add a penalty term proportional to to the above prior probability of , so that the prior favors partitions that repeatedly split on the same marker. For example, in Figure 1, the modified prior probability is

(4) |

Similarly, we can calculate the prior probability for any partition in . When the two probability models that we described coincide while values of in allow one to tune the concentration of over partitions that split over a parsimonious number of biomarkers.

### 2.3 Decision Rule for Patient Allocation

A major objective of the SUBA design is to assign future patients to superior treatments based on their biomarker profiles and the observed outcomes of all previous patients. Assuming that the outcomes of the first patients have been observed, we denote by the posterior predictive probability of response under treatment for an patient with biomarker profile . Denoting the observed trial data , based on (2),

(5) |

The posterior probability can be computed as follows. Given a partition , all patients are divided into biomarker subgroups. Recall the definition of and from Section 2.1. The posterior distribution of is

where is the prior probability of partition that can be calculated as in (4). Let denote the beta function, and let denote a beta p.d.f. With independent prior distributions for the parameters we can further simplify the above equation to

(6) |

The conditional probability is the integral of (1) with respect to the posterior on . Then

(7) | |||||

Let index the partitioning subset with . The sum over in (7) reduces to just the term with . Combining (6) and (7), we compute the posterior predictive response rate of patient receiving treatment in closed form

(8) |

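Denote with $z_{n+1}$ the treatment decision for the $(n+1)$-th patient, that is, the next patient to be allocated after the outcomes of the first $n$ patients have been observed. We choose $z_{n+1}$ by adopting a minimum posterior predictive loss approach described in Gelfand and Ghosh (1998). Under a variety of loss functions (such as the 0-1 loss), the optimal rule that minimizes the posterior predictive loss is

$$z_{n+1} = \arg\max_{t}\, q_{n+1}(t, \boldsymbol{x}_{n+1}), \tag{9}$$

where $q_{n+1}(t, \boldsymbol{x}_{n+1})$ is the posterior predictive response rate under treatment $t$ for a patient with biomarker profile $\boldsymbol{x}_{n+1}$.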

See Raiffa and Schlaifer (1961) or Gelfand and Ghosh (1998) for details. Alternatively, one could use the probabilities in a biased randomization , as proposed in Thall and Wathen (2007).
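To make the computation concrete, the following Python sketch evaluates a posterior predictive response rate by averaging over a finite list of candidate partitions. It assumes independent Beta(a, b) priors on the response rates, so that each partition is weighted by its prior probability times a product of beta-function ratios, and the conditional response rate in a subgroup is a posterior mean. The function names, the representation of a partition as a function from a biomarker profile to a subgroup label, and the argument layout are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def predictive_response(x_new, data, partitions, prior, treatments, a=1.0, b=1.0):
    """Posterior predictive response rate for each treatment at profile x_new.

    data       : list of (x, z, y) with profile x, treatment z, outcome y in {0, 1}
    partitions : list of functions mapping a profile to a subgroup label
    prior      : prior probability of each candidate partition (all > 0)
    """
    log_w, cond = [], []
    for assign, p in zip(partitions, prior):
        # counts[(m, t, y)]: patients in subgroup m on treatment t with outcome y
        counts = Counter((assign(x), z, y) for x, z, y in data)
        cells = {(m, z) for m, z, _ in counts}
        lw = math.log(p)
        for m, z in cells:
            # Marginal likelihood of the outcomes in one (subgroup, treatment) cell
            lw += log_beta(a + counts[m, z, 1], b + counts[m, z, 0]) - log_beta(a, b)
        log_w.append(lw)
        m_new = assign(x_new)
        # Posterior mean response rate in the subgroup that contains x_new
        cond.append({t: (a + counts[m_new, t, 1])
                        / (a + b + counts[m_new, t, 1] + counts[m_new, t, 0])
                     for t in treatments})
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]  # unnormalized posterior weights
    total = sum(w)
    return {t: sum(wi * c[t] for wi, c in zip(w, cond)) / total for t in treatments}


def allocate(q):
    """Deterministic allocation: the treatment with the largest predictive rate."""
    return max(q, key=q.get)
```

With a single trivial partition the result reduces to the usual beta-binomial posterior mean; with several partitions it is a weighted average of subgroup-specific posterior means, with weights given by the posterior probabilities of the partitions.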

### 2.4 The SUBA Design

Computing the posterior predictive response rates for all candidate treatments allows us to compare treatments and monitor the trial accordingly. If one treatment is inferior to all other treatments, that treatment should be dropped from the trial. If there is only one treatment left after dropping inferior treatments, the trial should be stopped early for ethical and logistical reasons.

The SUBA design starts a trial with a run-in phase during which patients are equally randomized to treatments. After the initial run-in, we continuously monitor the trial until either the trial is stopped early based on a stopping rule, or the trial is stopped after reaching a prespecified maximum sample size .

We include rules to exclude inferior treatments and stop the trial early if indicated. Recall that the biomarker space is . Consider the -th biomarker and observed biomarker values . We define an equally spaced grid of size between and , where and are the observed smallest and largest values for that biomarker. Taking the Cartesian product of these grids we then create a dimensional grid of size . Let , , denote the list of all grid points. After an initial run-in phase with equal randomization, we evaluate the posterior predictive response rate for treatment for each . Any treatment with uniformly inferior success probability

is dropped from the trial. That is, we remove from the list of treatments, . Also, if only one treatment is left in the trial, then the trial is stopped early.

Alternatively to the construction of the grid , any available data set of typical biomarker values could be used. For large this is clearly preferable. If such data were available, it could also be used for an alternative definition of in the specification of the splits in the prior for discussed earlier.

The SUBA design consists of the following steps.

1. Initial run-in. Start the trial and randomize patients equally to treatments in the set .
2. Treatment exclusion and early stopping. Drop treatment if for all and . Set . If enrollment remains active only for a single treatment then stop the trial.
3. Adaptive patient allocation. Allocate patient to treatment according to (9). When the response is available, go back to step 2 and repeat for patients .
4. Reporting patient subpopulations. Upon conclusion of the trial we report the estimated partition together with the estimated optimal treatment allocations.
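The grid construction and the exclusion rule in step 2 can be sketched in Python as follows. This is an illustrative reading of the rule, not the authors' code: we assume a treatment is dropped when its posterior predictive response rate is strictly below that of every other active treatment at every grid point. The names `make_grid` and `drop_inferior`, and the dictionary `q` of predictive rates evaluated on the grid, are hypothetical.

```python
import itertools


def make_grid(X, H):
    """Cartesian product of equally spaced grids (H >= 2 points per biomarker)
    between the observed minimum and maximum of each biomarker."""
    axes = []
    for k in range(len(X[0])):
        lo = min(x[k] for x in X)
        hi = max(x[k] for x in X)
        axes.append([lo + h * (hi - lo) / (H - 1) for h in range(H)])
    return list(itertools.product(*axes))


def drop_inferior(q, active):
    """Remove treatments that are uniformly inferior on the grid.

    q      : dict mapping treatment -> list of predictive response rates,
             one entry per grid point
    active : list of treatments still enrolling
    """
    dropped = {
        t for t in active
        if len(active) > 1 and all(
            q[t][g] < q[s][g]
            for s in active if s != t
            for g in range(len(q[t]))
        )
    }
    return [t for t in active if t not in dropped]
```

Under this reading at most one treatment can be dropped per evaluation, since a treatment that is below all others everywhere prevents any other treatment from being below all others. The trial stops early once the returned list contains a single treatment.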

In step 4, summarizing the posterior distribution over random partitions and determining the best partition over a large number of possible partitions is a challenging problem. Following Medvedovic et al. (2004) we define an association matrix of co-clustering indicators for each partition . Here is an indicator of patients and being in the same subgroup with respect to the biomarker partition . Dahl (2006) introduced a least-squares estimate for random partitions using draws from Markov chain Monte Carlo (MCMC) posterior simulation. Following this idea, we propose a least-squares summary

where is the posterior mean association matrix and denotes the sum of squared elements of a matrix . In words, minimizes the sum of squared deviations between an association matrix and the posterior mean .
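A minimal Python sketch of this least-squares summary is given below. It assumes that a finite list of candidate partitions, each encoded as a vector of subgroup labels for the patients, is available together with posterior probabilities; the function names and this encoding are assumptions made for illustration.

```python
def association(labels):
    """Association matrix: entry (i, j) is 1 if patients i and j share a subgroup."""
    n = len(labels)
    return [[int(labels[i] == labels[j]) for j in range(n)] for i in range(n)]


def least_squares_partition(candidates, probs):
    """Return the candidate whose association matrix is closest, in summed
    squared deviation, to the posterior mean association matrix.

    candidates : list of label vectors, one per candidate partition
    probs      : posterior probability of each candidate (summing to 1)
    """
    n = len(candidates[0])
    mats = [association(c) for c in candidates]
    mean = [[sum(p * G[i][j] for p, G in zip(probs, mats)) for j in range(n)]
            for i in range(n)]

    def sq_dist(G):
        return sum((G[i][j] - mean[i][j]) ** 2 for i in range(n) for j in range(n))

    best = min(range(len(candidates)), key=lambda c: sq_dist(mats[c]))
    return candidates[best]
```

The association matrix depends only on which patients are grouped together, so the summary is invariant to how subgroups are labeled.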

Alternatively, one could report a partition that minimizes the average squared deviation, averaging with respect to . That is, minimize the posterior mean squared distance instead of the squared distance to the posterior mean association matrix. While the former has an appealing justification as a formal Bayes rule, the latter is easier to compute.

## 3 Simulation Studies

### 3.1 Simulation Setup

We conduct simulation studies to evaluate the proposed design. The setup is chosen to mimic the motivating breast cancer study. For each simulated trial, we fix a maximum sample size of patients in a three-arm study with three treatments . We assume that a set of biomarkers are measured at baseline for each patient and generate from a uniform distribution on , i.e., . The hyperprior parameters are fixed as , , , and . That is, each biomarker has the same prior probability of being selected for a split, and the response rates have uniform priors. To set up the grid for the stopping rule we select equally spaced points on each biomarker subspace, and thus grid points in . During the initial run-in phase, patients are equally randomized to three treatments.

#### Scenarios 1 through 6.

We consider six scenarios and simulated trials for each scenario. In the first two scenarios, we assume that biomarkers and are relevant to the response, but not biomarkers and . The simulation truth for the outcome is a probit regression. Specifically, we assume that the true response rates for a patient with covariate vector under treatments 1, 2 or 3 are , , or , respectively, where is the cumulative distribution function (CDF) of a Gaussian distribution with and . Figure 2 plots the response rates under three treatments versus given different values of . The red lines represent treatment 1, black lines refer to treatment 2 and green lines to treatment 3. Treatment 3 is always the most effective arm when , the three treatments have equal success rates when , and treatment 1 is superior when . In summary, the optimal treatment is a function of the second biomarker, . That is, identifies the optimal treatment selection. The response rates of three treatments increase with , but the ordering of the three treatments does not change varying the first biomarker. Therefore, is only predictive of response, but ideally should not be involved for treatment selection. To assess the performance of SUBA under this setup, we select two scenarios. In an over-simplified scenario 1, we assume that all the patients have . Thus, treatment 1 is more effective than 2, which in turn is more effective than 3. In scenario 2, we do not fix the values of and randomly generate all biomarker values.

In scenario 3, we assume that biomarkers 1, 2 and 3 are related to the response and there are interactions. The true response rates under treatments 1, 2, or 3 are , , or , respectively. Figure 3 plots the response rates under three treatments versus given (Figure 3a) and given (Figure 3b). Here, all three markers are predictive of the ordering of the treatment effects in a complicated fashion.

We design scenarios 4 and 5 with treatment 3 being uniformly inferior to treatments 1 and 2. We assume that the response rates under treatments 1 and 2 are or . The implied minimum response rate for treatments 1 and 2 is 0.37 and the response rates of treatments 1 and 2 are close for all biomarker values (differences range from -0.24 to 0.24, with the first quartile across biomarker profiles equal to -0.06 and the third quartile equal to 0.09). We assume in scenario 4 and in scenario 5, thus for all and . So we can expect that treatment 3 should be excluded in both scenarios.

Finally, Scenario 6 is a null case, in which no biomarkers are related to response. We assume that the response rates under the three treatments for all the patients are the same at 40%, that is, .

#### Comparison.

For comparison, we implement a standard design with equal randomization (ER), an outcome-adaptive randomization (AR) design, and a design based on a probit regression model (Reg). In the ER design, all patients are equally randomized to the three treatments and their responses are generated from for patient receiving treatment , and . The values of are defined by the Gaussian CDFs given above. Under the AR design, we assume that three predefined biomarker subgroups are fixed before the trial (similar to the BATTLE trial; Kim et al., 2011). We assume that the three subgroups are defined as , and , using the quartiles of the empirical distribution of biomarker as thresholds. Clearly, these subgroups are wrongly defined and do not match the true response curves in scenarios 1-6. The mismatch is deliberately chosen to evaluate the importance of correctly defining subgroups. Let be the response rate of treatment in subgroup , and the total number of patients receiving treatment in subgroup , and . For this design we use the model . With a conjugate beta prior distribution beta(1,1) on , we easily compute the posterior of as , where is the number of patients who responded to treatment in subgroup . Then under the AR design, we first equally randomize 100 patients to the three treatments, and adaptively randomize the next 200 patients sequentially. The AR probability for a future patient in subgroup equals , where is the posterior mean . Alternatively, other summaries of the posterior can be used to adapt treatment assignment (Thall and Wathen, 2007). Under the Reg design, we model binary outcomes using a probit regression. In the probit model, the inverse standard normal CDF of the response rate is modeled as a linear combination of the biomarkers and treatment, . The parameters and are obtained using maximum likelihood estimation. Under the Reg design, we randomize the first 100 patients with equal probabilities to the three treatments, and then assign the next 200 patients to the treatment with estimated best success probability, sequentially.
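The AR comparator can be sketched in a few lines of Python. The sketch assumes that, within a predefined subgroup, the randomization probability of each treatment is proportional to the posterior mean of its response rate under the beta(1,1) prior; this normalization is our reading of the description above, and the function name and arguments are hypothetical.

```python
import random


def ar_probabilities(responders, treated):
    """Randomization probabilities within one predefined biomarker subgroup.

    responders[t] : number of responders among patients given treatment t
    treated[t]    : number of patients given treatment t
    Assumption: probabilities are proportional to the posterior means of
    Beta(1 + responders, 1 + treated - responders).
    """
    means = {t: (1 + responders[t]) / (2 + treated[t]) for t in treated}
    total = sum(means.values())
    return {t: m / total for t, m in means.items()}


def ar_assign(responders, treated, rng=None):
    """Draw a treatment for the next patient in the subgroup."""
    rng = rng or random.Random()
    probs = ar_probabilities(responders, treated)
    arms = list(probs)
    return rng.choices(arms, weights=[probs[t] for t in arms])[0]
```

Because the subgroups are fixed in advance, the counts entering this rule never adapt to the true response surface, which is the feature of AR that the comparison with SUBA is meant to probe.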

### 3.2 Simulation Results

#### Response rates.

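Define the overall response rate (ORR) as

$$\mathrm{ORR} = \frac{1}{N - n} \sum_{i = n+1}^{N} y_i,$$

where $n$ is the number of patients in the run-in phase, $N$ is the maximum sample size, and $y_i$ is the binary response of patient $i$. This is the proportion of responders among those patients who are treated after the run-in phase. We summarize ORR differences between SUBA and each of ER, AR, and Reg for each scenario in Figure 4. In our comparisons we use the same run-in period across designs.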

For scenarios 2 and 3, SUBA outperforms ER, AR and Reg with higher ORR in almost all the simulated trials. The ER and AR designs perform similarly. This suggests that no gains are obtained with AR when the biomarker subgroups are wrongly defined, confirming that an appropriate upfront selection of the biomarker subgroups is essential for AR. In scenarios 1, 4 and 5, SUBA and Reg are preferable to ER and AR. SUBA exhibits a larger ORR value than Reg in 676 of 1,000 simulations in scenario 1, in 612 of 1,000 simulations in scenario 4 and in 605 of 1,000 simulations in scenario 5. In scenario 6, the true response rates are constant and not related to biomarkers, and the four designs show similar ORR distributions across 1,000 simulations.

#### Early stopping.

Table 1 reports the average number of patients under the SUBA design. When a trial is stopped early by SUBA, there must be one last treatment left, which is considered more efficacious than all the removed treatments. For a fair comparison with ER, AR and Reg, which do not include early stopping, the summaries in Table 2 are based on assigning all remaining patients, up to the maximum sample size, to that last active arm.

Scenario | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
# of patients | 245.28 | 299.41 | 300.00 | 167.63 | 215.07 | 209.52 |

#### Treatment assignment.

We compute the average number of patients (ANP) assigned to treatment after the run-in phase under each design. Denote as the number of patients assigned to treatment in simulated trial after the run-in phase, i.e., , and . Thus

Table 2 shows the results. In scenario 1, treatment 1 is always the most effective arm since the second biomarker is fixed at 0.8 (see Figure 2). We can see that most of the patients are allocated to treatment 1 in scenario 1 by SUBA. Scenario 6 is a null case in which the biomarkers are not related to response rates and the response rates across treatments are the same, so the patient allocation under SUBA is similar to that under ER, AR and Reg.

Scenario | Subset | ER trt 1 | ER trt 2 | ER trt 3 | AR trt 1 | AR trt 2 | AR trt 3 | Reg trt 1 | Reg trt 2 | Reg trt 3 | SUBA trt 1 | SUBA trt 2 | SUBA trt 3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | / | 66.76 | 66.60 | 66.64 | 83.02 | 65.35 | 51.63 | 119.46 | 70.13 | 10.41 | 177.11 | 18.67 | 4.22 |
2 | second biomarker positive | 33.49 | 33.09 | 33.24 | 33.37 | 33.19 | 33.25 | 35.24 | 32.88 | 31.69 | 72.57 | 18.37 | 8.88 |
2 | second biomarker negative | 33.27 | 33.51 | 33.40 | 33.41 | 33.25 | 33.53 | 35.42 | 33.01 | 31.76 | 8.63 | 17.79 | 73.77 |
3 | set 1 | 19.49 | 19.09 | 19.29 | 22.21 | 17.63 | 18.03 | 18.65 | 16.40 | 22.81 | 41.11 | 8.94 | 7.82 |
3 | set 2 | 25.23 | 25.17 | 25.35 | 21.13 | 26.81 | 27.80 | 24.10 | 21.86 | 29.79 | 13.67 | 35.91 | 26.17 |
3 | set 3 | 22.05 | 22.34 | 22.00 | 24.61 | 20.52 | 21.26 | 21.27 | 18.99 | 26.12 | 11.33 | 11.54 | 43.52 |
4 | set 1 | 33.26 | 33.11 | 33.44 | 43.01 | 42.32 | 14.49 | 51.81 | 48.00 | 0 | 52.76 | 46.96 | 0.10 |
4 | set 2 | 33.50 | 33.49 | 33.20 | 42.32 | 43.46 | 14.41 | 51.75 | 48.44 | 0 | 50.78 | 49.29 | 0.11 |
5 | set 1 | 33.26 | 33.11 | 33.44 | 39.14 | 38.49 | 22.19 | 51.51 | 48.25 | 0.05 | 51.13 | 47.05 | 1.63 |
5 | set 2 | 33.50 | 33.49 | 33.20 | 38.29 | 39.32 | 22.58 | 51.22 | 48.92 | 0.05 | 47.07 | 51.53 | 1.59 |
6 | / | 66.76 | 66.60 | 66.64 | 66.66 | 66.89 | 66.46 | 65.04 | 67.84 | 67.12 | 66.90 | 64.20 | 68.90 |

In scenario 2, we separately report the average numbers of patients assigned to the three treatments after the run-in phase, among those whose second biomarker is positive or negative. We report these two averages separately to demonstrate the benefits of the SUBA design: depending on the sign of the second biomarker, a different treatment is the most beneficial for the patient. When the second biomarker is positive, treatment 1 is the superior arm; when the second biomarker is negative, treatment 3 is the most effective arm according to our simulation settings. From Table 2, among the 200 post-run-in patients, about 100 patients have positive (negative) values of the second biomarker. In Table 2 we use and to denote sets of patients. Think of as a partition in the simulation truth. Among patients with a positive second biomarker, Table 2 reports that an average of approximately 73 are allocated to treatment 1, 18 to treatment 2, and 9 to treatment 3. Among those with a negative second biomarker, 9 are allocated to treatment 1, 18 to treatment 2, and 74 to treatment 3. Most of the patients are assigned to the correct superior treatments according to their biomarker values, highlighting the utility of the SUBA design. In contrast, the ER, AR and Reg designs assign far fewer patients to the most effective treatments. These results, like those in Figure 4, show the utility of the SUBA approach.

In scenario 3, biomarkers 1, 2 and 3 are related to the response. In a similar fashion, we break down the patient allocations according to three subsets that are indicative of the true optimal treatment allocation depending on the biomarker values. Denote , , and . According to the simulation truth, we consider three sets and , defined as , and . Under this assumption, the best treatment for patients in set is treatment according to the simulation truth. Table 2 reports the simulation results for , and . Under SUBA, most of the patients are assigned to the correct superior treatments. In contrast, the ER, AR and Reg designs fail to do so.

In scenarios 4 and 5, biomarkers 1 and 2 are related to the response. Since treatment 3 is inferior to treatments 1 and 2, the biomarker space is split into only two sets and according to the simulation truth. Denote , . So and . Table 2 again shows that SUBA assigns more patients to their corresponding optimal treatments than the ER and AR designs, but performs similarly to Reg. Scenarios 4 and 5 are two challenging cases, in which the dose-response surfaces are “U”-shaped (plots not shown) and treatments 1 and 2 have similar true response rates for most biomarker values. Treatment 3 is much less desirable than treatments 1 and 2, and is excluded quickly by SUBA and Reg across most of the simulations. Both designs assign similar numbers of patients on average to treatments 1 and 2. However, both designs also assign a considerable number of patients to suboptimal treatments. For example, in both scenarios 50% of the patients received a suboptimal treatment, which could be caused by false negative splits that failed to capture the superior subgroups for those patients. Nevertheless, SUBA is still markedly better than the ER and AR designs in these scenarios.

In summary, SUBA continuously learns the response function to pair optimal treatments with targeted patients and can substantially outperform ER, AR and Reg in terms of OOR.

#### Posterior estimated partition.

Figure 6 shows the least-square partition in an arbitrarily selected trial for scenarios 2 and 3. The number in each circle represents the biomarker used to split the biomarker space. In scenario 2, biomarkers 1 and 2 are related to the response rate. Treatment 1 is the best treatment when the second biomarker is positive, and treatment 3 is the best when the second biomarker is negative. The least-square partition uses biomarker 2 for its first split of the biomarker space, which corresponds to the simulation truth. In scenario 3, biomarkers 1, 2, and 3 are related to the response rate, and the least-square partition likewise splits on these truly response-related biomarkers.
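As an illustration of how a single representative partition can be extracted from posterior samples, the sketch below implements one common least-squares summary: it selects the sampled partition whose pairwise co-membership matrix is closest, in squared distance, to the posterior mean co-membership matrix. This is an assumption about the form of such a summary, not a reproduction of the computation behind Figure 6; the function names and the toy label vectors are hypothetical.

```python
from typing import List, Sequence


def comembership(labels: Sequence[int]) -> List[List[int]]:
    """Return the n x n 0/1 matrix whose (i, j) entry is 1 when items i and j share a label."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)] for i in range(n)]


def least_squares_partition(samples: Sequence[Sequence[int]]) -> int:
    """Return the index of the sampled partition closest to the mean co-membership matrix."""
    n = len(samples[0])
    mats = [comembership(s) for s in samples]
    m = len(mats)
    # Entry (i, j) of the mean estimates the posterior probability that i and j are grouped together.
    mean = [[sum(mat[i][j] for mat in mats) / m for j in range(n)] for i in range(n)]

    def sq_dist(mat: List[List[int]]) -> float:
        return sum((mat[i][j] - mean[i][j]) ** 2 for i in range(n) for j in range(n))

    # Ties are broken in favor of the earliest sample.
    return min(range(m), key=lambda k: sq_dist(mats[k]))


# Toy posterior draws (hypothetical): each vector assigns 4 patients to subgroup labels.
draws = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 0],  # same partition as the first draw, with the labels swapped
]
best = least_squares_partition(draws)
```

Working with co-membership matrices rather than raw labels makes the summary invariant to label switching, which is why the first and last draws above count as the same partition.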

### 3.3 Sensitivity Analysis

To evaluate the impact of the maximum sample size on the simulation results, we carried out a sensitivity analysis with in scenario 1, with first patients equally randomized. Recall that in scenario 1, treatment 1 has a higher response rate than treatments 2 and 3, regardless of their biomarker values. Therefore the effect of sample size on the posterior inference can be easily evaluated.

Figure 5 plots the histogram of differences between treatments and after or patients have been treated in the trial. When , treatment 1 is reported as better than treatment 2 in 752 of 1,000 simulations; when , treatment 1 is better than treatment 2 in 838 of 1,000 simulations; when , treatment 1 is better than treatment 2 in 884 of 1,000 simulations. The more patients treated, the more precise the posterior estimates and more accurate assignments for future patients. Similar patterns are observed for the comparison between treatments 1 and 3.
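To see why the proportion of trials that rank treatment 1 above treatment 2 should grow with the sample size, the following sketch computes exactly the probability that one arm shows more responders than another when both arms receive the same number of patients and outcomes are independent binomials. With a common Beta(1, 1) prior and equal arm sizes, the arm with more responders also has the larger posterior mean. This is a simplified stand-in for the posterior comparison in the simulations: the response rates 0.6 and 0.4 and the per-arm sample sizes are hypothetical, not the values used in scenario 1.

```python
from math import comb
from typing import List


def binom_pmf(n: int, p: float) -> List[float]:
    """Probability mass function of Bin(n, p) evaluated at 0, 1, ..., n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def prob_arm1_ahead(n: int, p1: float, p2: float) -> float:
    """P(X1 > X2) for independent X1 ~ Bin(n, p1) and X2 ~ Bin(n, p2)."""
    pmf1 = binom_pmf(n, p1)
    pmf2 = binom_pmf(n, p2)
    total = 0.0
    below = 0.0  # running value of P(X2 < k)
    for k in range(n + 1):
        total += pmf1[k] * below
        below += pmf2[k]
    return total


# Hypothetical true response rates and per-arm sample sizes.
results = {n: prob_arm1_ahead(n, 0.6, 0.4) for n in (10, 50, 100)}
for n, prob in results.items():
    print(f"n per arm = {n:3d}: P(arm 1 ahead) = {prob:.3f}")
```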

We also varied the values and conducted sensitivity analysis with using scenario 2. Table 3 shows the average numbers of patients needed to make the decision of stopping trials early and the average numbers of patients assigned to three treatments after the run-in phase in two defined subsets. In summary, the reported summaries vary little across the considered hyperparameter choices, indicating robustness with respect to changes within a reasonable range of values.

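Table 3: Sensitivity analysis in scenario 2. Each group of three treatment columns corresponds to one of the three hyperparameter settings.

Subset | Trt 1 | Trt 2 | Trt 3 | Trt 1 | Trt 2 | Trt 3 | Trt 1 | Trt 2 | Trt 3 |
---|---|---|---|---|---|---|---|---|---|
No. of patients | 298.10 | | | 299.41 | | | 299.15 | | |
Second biomarker positive | 71.66 | 19.09 | 9.06 | 72.57 | 18.37 | 8.88 | 72.21 | 18.50 | 9.11 |
Second biomarker negative | 8.64 | 18.50 | 73.05 | 8.63 | 17.79 | 73.77 | 8.79 | 18.31 | 73.09 |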

## 4 Discussion

We demonstrated the importance of subgroup identification in adaptive designs when such subgroups are predictive of treatment response. The key contribution of the proposed model-based approach is the construction of the random partition prior, which provides a flexible and simple mechanism to realize subgroup exploration as posterior inference on the partition. The Bayesian paradigm facilitates continuous updating of this posterior inference as data become available in the trial. The proposed construction is easy to interpret and, most importantly, achieves a good balance between the computational burden of posterior computation and the flexibility of the resulting prior distribution. The priors of the response rates are i.i.d. Beta(1, 1), i.e., uniform priors, in our simulation studies. If desired, this prior can be calibrated to reflect the historical response rate of the drug. The i.i.d. assumption simplifies posterior inference. Alternatively, one could impose dependence across the response rates; for example, one could assume that adjacent partition sets have similar values.
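The role of the Beta prior can be illustrated with the standard conjugate update for binary responses. The sketch below assumes independent Bernoulli responses within a single partition set and treatment arm; the function names, the patient counts, and the "historical" prior parameters are hypothetical and serve only to contrast a uniform prior with a calibrated one.

```python
from typing import Tuple


def beta_posterior(a: float, b: float, responders: int, nonresponders: int) -> Tuple[float, float]:
    """Conjugate update: a Beta(a, b) prior with binomial data gives Beta(a + r, b + n - r)."""
    return a + responders, b + nonresponders


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Uniform Beta(1, 1) prior, then 7 responders and 3 non-responders (hypothetical counts).
a_post, b_post = beta_posterior(1.0, 1.0, 7, 3)
uniform_mean = beta_mean(a_post, b_post)  # (1 + 7) / (2 + 10)

# Prior calibrated to a hypothetical historical response rate of 0.3,
# weighted like 10 previously observed patients: Beta(3, 7).
a_hist, b_hist = beta_posterior(3.0, 7.0, 7, 3)
calibrated_mean = beta_mean(a_hist, b_hist)  # (3 + 7) / (10 + 10)
```

Under this model the posterior mean is also the posterior predictive probability that the next patient in the same set responds, so a more informative prior pulls that probability toward the historical rate until enough trial data accumulate.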

The proposed SUBA design focuses on the treatment success for the patients who are enrolled in the current trial by identifying subgroups of patients who respond most favorably to each of the treatments. One could easily add to the SUBA algorithm a final recommendation of a suitable patient population for a follow-up trial, such as . Other directions of generalization include an extension of the models to incorporate variable selection, when a large number of biomarkers are measured.

## Acknowledgment

The research of YJ and PM is partly supported by NIH R01 CA132897. PM was also partly supported by NIH R01CA157458. This research was supported in part by NIH through resources provided by the Computation Institute and the Biological Sciences Division of the University of Chicago and Argonne National Laboratory, under grant S10 RR029030-01. We specifically acknowledge the assistance of Lorenzo Pesce (U of Chicago) and Yitan Zhu (NorthShore University HealthSystem).

[Figure panels: (a) 3rd biomarker = 0.6; (b) 3rd biomarker = -0.6]

[Figure panels: Scenario 1; Scenario 2; Scenario 3; Scenario 4; Scenario 5; Scenario 6]

[Figure panels: (a) Scenario 2; (b) Scenario 3]

### References

- Baladandayuthapani, V., Y. Ji, R. Talluri, L. E. Nieto-Barajas, and J. S. Morris (2010). Bayesian random segmentation models to identify shared copy number aberrations for array cgh data. Journal of the American Statistical Association 105(492).
- Barker, A., C. Sigman, G. Kelloff, N. Hylton, D. Berry, and L. Esserman (2009). I-spy 2: an adaptive breast cancer trial design in the setting of neoadjuvant chemotherapy. Clinical Pharmacology & Therapeutics 86(1), 97–100.
- Barski, A. and K. Zhao (2009). Genomic location analysis by chip-seq. Journal of cellular biochemistry 107(1), 11–18.
- Chipman, H. A., E. I. George, and R. E. McCulloch (1998). Bayesian cart model search. Journal of the American Statistical Association 93(443), 935–948.
- Curtis, C., S. P. Shah, S.-F. Chin, G. Turashvili, O. M. Rueda, M. J. Dunning, D. Speed, A. G. Lynch, S. Samarajiwa, Y. Yuan, et al. (2012). The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486(7403), 346–352.
- Dahl, D. (2006). Model-based clustering for expression data via a dirichlet process mixture model. Bayesian inference for gene expression and proteomics, 201–218.
- Denison, D. G., B. K. Mallick, and A. F. Smith (1998). A bayesian cart algorithm. Biometrika 85(2), 363–377.
- Fedorov, V. V. and S. L. Leonov (2013). Optimal design for nonlinear response models. CRC Press.
- Foster, J. C., J. M. Taylor, and S. J. Ruberg (2011). Subgroup identification from randomized clinical trial data. Statistics in medicine 30(24), 2867–2880.
- Freidlin, B., L. M. McShane, and E. L. Korn (2010). Randomized clinical trials with biomarkers: design issues. Journal of the National Cancer Institute 102(3), 152–160.
- Gelfand, A. E. and S. K. Ghosh (1998). Model choice: A minimum posterior predictive loss approach. Biometrika 85(1), 1–11.
- Gu, X. and J. J. Lee (2010). A simulation study for comparing testing statistics in response-adaptive randomization. BMC medical research methodology 10(1), 48.
- Hudis, C. A. (2007). Trastuzumab?mechanism of action and use in clinical practice. New England Journal of Medicine 357(1), 39–51.
- Kim, E. S., R. S. Herbst, I. I. Wistuba, J. J. Lee, G. R. Blumenschein, A. Tsao, D. J. Stewart, M. E. Hicks, J. Erasmus, S. Gupta, et al. (2011). The battle trial: personalizing therapy for lung cancer. Cancer Discovery 1(1), 44–53.
- Kruschke, J. (2008). Bayesian approaches to associative learning: from passive to active learning. Learning & Behavior 36, 210–226.
- Loredo, T. (2003). Bayesian adaptive exploration in a nutshell. Statistical Problems in Particle Physics, Astrophysics, and Cosmology 1, 162–165.
- Maitournam, A. and R. Simon (2005). On the efficiency of targeted clinical trials. Statistics in medicine 24(3), 329–339.
- Mandrekar, S. J. and D. J. Sargent (2010). Predictive biomarker validation in practice: lessons from real trials. Clinical Trials 7(5), 567–573.
- Medvedovic, M., K. Y. Yeung, and R. E. Bumgarner (2004). Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 20(8), 1222–1232.
- Misale, S., R. Yaeger, S. Hobor, E. Scala, M. Janakiraman, D. Liska, E. Valtorta, R. Schiavo, M. Buscarino, G. Siravegna, et al. (2012). Emergence of kras mutations and acquired resistance to anti-egfr therapy in colorectal cancer. Nature 486(7404), 532–536.
- Mitra, R., P. Müller, S. Liang, L. Yue, and Y. Ji (2013). A bayesian graphical model for chip-seq data on histone modifications. Journal of the American Statistical Association 108(501), 69–80.
- Raiffa, H. and R. Schlaifer (1961). Applied statistical decision theory (harvard business school publications).
- Ruberg, S. J., L. Chen, and Y. Wang (2010). The mean does not mean as much anymore: finding sub-groups for tailored therapeutics. Clinical Trials 7(5), 574–583.
- Sargent, D. J., B. A. Conley, C. Allegra, and L. Collette (2005). Clinical trial designs for predictive marker validation in cancer treatment trials. Journal of Clinical Oncology 23(9), 2020–2027.
- Simon, R. (2010). Clinical trial designs for evaluating the medical utility of prognostic and predictive biomarkers in oncology. Personalized medicine 7(1), 33–47.
- Simon, R. and A. Maitournam (2004). Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10(20), 6759–6763.
- Snijders, A. M., N. Nowak, R. Segraves, S. Blackwood, N. Brown, J. Conroy, G. Hamilton, A. K. Hindle, B. Huey, K. Kimura, et al. (2001). Assembly of microarrays for genome-wide measurement of dna copy number. Nature genetics 29(3), 263–264.
- Thall, P. F. and J. K. Wathen (2007). Practical bayesian adaptive randomisation in clinical trials. European Journal of Cancer 43(5), 859–866.
- Van De Vijver, M. J., Y. D. He, L. J. van’t Veer, H. Dai, A. A. Hart, D. W. Voskuil, G. J. Schreiber, J. L. Peterse, C. Roberts, M. J. Marton, et al. (2002). A gene-expression signature as a predictor of survival in breast cancer. New England Journal of Medicine 347(25), 1999–2009.
- Wang, Z., C. Zang, J. A. Rosenfeld, D. E. Schones, A. Barski, S. Cuddapah, K. Cui, T.-Y. Roh, W. Peng, M. Q. Zhang, et al. (2008). Combinatorial patterns of histone acetylations and methylations in the human genome. Nature genetics 40(7), 897–903.
- Yin, G., N. Chen, and J. Jack Lee (2012). Phase ii trial design with bayesian adaptive randomization and predictive probability. Journal of the Royal Statistical Society: Series C (Applied Statistics) 61(2), 219–235.
- Zhu, H., F. Hu, and H. Zhao (2013). Adaptive clinical trial designs to detect interaction between treatment and a dichotomous biomarker. Canadian Journal of Statistics, 1–15.