# Central limit Theorem for an Adaptive Randomly Reinforced Urn Model

## Abstract

: The generalized Pólya urn (GPU) models and their variants have been investigated in several disciplines. However, typical assumptions made with respect to the GPU do not include urn models with diagonal replacement matrix, which arise in several applications, specifically in clinical trials. To facilitate mathematical analyses of models in these applications, we introduce an adaptive randomly reinforced urn model that uses accruing statistical information to adaptively skew the urn proportion toward specific targets. We study several probabilistic aspects that are important in implementing the urn model in practice. Specifically, we establish the law of large numbers and a central limit theorem for the number of sampled balls. To establish these results, we develop new techniques involving last exit times and crossing time analyses of the proportion of balls in the urn. To obtain precise estimates in these techniques, we establish results on the harmonic moments of the total number of balls in the urn. Finally, we describe our main results in the context an application to response-adaptive randomization in clinical trials. Our simulation experiments in this context demonstrate the ease and scope of our model.

CLT for an Adaptive Urn Model

,

and

t2Supported in part by NsF grant DMS 000-03-07057.

class=MSC] \kwd[Primary ]60F15 \kwd60F05\kwd60E20\kwd60G99 \kwd[; secondary ]68Q87 \kwd97K50

Clinical trials \kwdcrossing times \kwdharmonic moments \kwdlast exit times \kwdgeneralized Pólya urn \kwdtarget allocation

## 1 Introduction

A generalized Pólya urn (GPU) model [4] is characterized by the pair of random variables representing the number of balls of two colors, red and white, for instance. The process is described as follows: at time , the process starts with balls. A ball is drawn at random. If the color is red, the ball is returned to the urn along with the random numbers of red and white balls; otherwise, the ball is returned to the urn along with the random numbers of red and white balls, respectively. Let and denote the urn composition when the sampled ball is red; similarly, let and denote the urn composition when the sampled ball is white. The process is repeated yielding the collection . The quantities and are collections of independent and identically distributed (i.i.d.) non-negative integer valued random variables, and is assumed to be independent of . We refer to

as a replacement matrix.

In this paper, we focus on an extension of the randomly reinforced urn (RRU) model, a variant of the randomized Pólya urn (RPU) models, whose replacement matrix is given by

where the random variables and are supported on , rather than on the set of non-negative integers. Let and . For the RRU model, a law of large numbers was established in [18]; i.e.

(1.1) |

where stands for almost sure convergence and is a random variable supported on . The properties of the distribution of were studied in [1, 2]. Denoting the number of balls of red and white colors sampled from the urn, one can deduce from the above LLN that converges to the same limit as .

Notice that the limit of the RRU in (1.1) is always 1 or 0 when . However, in applications it is common to target a specific value . This was achieved in [3], where the modified randomly reinforced urn (MRRU) model was introduced. The MRRU model is an RRU model with two fixed thresholds , such that if , no white balls are replaced in urn, while if , no red balls are replaced in the urn. These changes occur at random times and will in general depend on and . Thus, even if the sequences and are i.i.d., the replacements matrices of the MRRU model are not i.i.d. Indeed the replacement matrix has the following representation:

The LLN for the MRRU when is established as

A second order result for , namely the asymptotic distribution of after appropriate centering, was derived in [13]. However, the validity of the CLT for in the MRRU model is not known.

A critical issue in the MRRU model is that and are typically unknown in real applications. In this paper, we use the accruing information concerning the balls in the urn to provide random thresholds which converge a.s. to specified targets. More specifically, our replacement matrix becomes

(1.2) |

where and represent the random thresholds. We call this adaptive urn model an adaptive randomly reinforced urn (ARRU), to distinguish it from the RRU and the MRRU. In this paper, we investigate the asymptotic properties of the ARRU model when . Specifically, we establish the LLN for and , and the CLT for . Before concluding this section, we describe some recent works in the literature which are similar in spirit to the present work but are quite different from our proposed model.

Let , where is the “information” up to the time . This is referred to as the generating matrix. Asymptotic properties of the urn composition for homogeneous GPU, i.e. for all , have been studied in [4] under the assumption that is irreducible. In [21], the extended Pólya urn (EPU) is defined as a GPU such that all the rows of sum to the same positive constant, i.e.

(1.3) |

Under the assumption that has simple eigenvalues, second-order asymptotic properties on the proportion of sampled color extracted from the urn are obtained in [21]. In [15], the limiting distribution of the proportion of sampled balls for homogeneous urn models are derived. In [5], weak consistency and asymptotic normality of the urn composition for non-homogeneous GPU are established. However, in [5], the sequence is deterministic and converges to a matrix satisfying (1.3). [6, 7] extend [5] to random generating matrices and establish almost sure convergence of the proportion of sampled balls. They also investigate the second-order properties. A key assumption in [6, 7] is (1.3). In [24] the sequence of generating matrices is defined as function of adaptive estimators, which guarantees the convergence of to a limiting matrix satisfying (1.3). For “immigrated” urn models, theoretical results have been obtained in [25] under the assumptions (1.3), or . These extensions do not include the RRU model, where is diagonal, non-negative and (1.3) is not satisfied. For distributional results concerning large Pólya urns, see [8]. We now describe application to clinical trial literature (see [11]). For applications to computer science, we refer the reader to [17].

### 1.1 Applications to clinical trials

Urn models have a long history of applications in clinical trials, by providing randomization procedures that target certain objectives (for a review, see [19]). In this context, patients are sequentially allocated to treatments according to the sampled colors and the associated responses are used to update the urn. This is referred to as response-adaptive, since the probability of assignment depends on information about the treatment performances. For a literature review on response-adaptive designs in clinical trials see [14, 16]. In an RRU model, responses to treatments are typically transformed by a utility function to obtain the reinforcement values, so that the higher the reinforcement, the better the treatment. This yields a more ethical allocation in clinical trials, because (1.1) shows that the RRU assigns more patients to the superior treatment. However, response-adaptive designs usually aim at obtaining good inferential properties by targeting a certain proportion , which is typically chosen to satisfy some optimality criteria (see [20]). For this reason, in [3] the RRU was modified to asymptotically attain any target allocation proportion, . This guarantees the MRRU design to have an asymptotic allocation within there by incorporating ethical constraints (viz. assigning more subjects to the superior treatment). The main issue is that and are typically functions of unknown parameters (see [20]). The ARRU model presented in this paper allows and to be functions of such unknown parameters, and adaptively updates by substituting sequential estimates for the parameters. The limiting results in this paper demonstrate that such procedures target the unknown optimal allocation and provide an appropriate randomization procedure for such trials in large samples. We also demonstrate by simulation that the properties hold relatively well for moderate sample sizes.

### 1.2 Structure of the paper

The paper is organized as follows. In Section 2, we present the notation and assumptions concerning the ARRU model and related main results. Specifically, in Subsection 2.1, we present the LLN; in Subsection 2.2 we present the CLT under the assumption that the thresholds are updated at exponentially changing times. Subsection 2.3 is devoted to the implications of the main results in the context of clinical trials.

In Section 3, we describe several fundamental results concerning the ARRU model that are needed in the proof of the CLT. Specifically, we prove that the harmonic moments of the total number of balls in the ARRU are uniformly bounded. Then, we use this to obtain a uniform -bound for the distance between the urn proportion at successive update times and the adaptive thresholds. In Section 4 the proofs of the main results are provided, while Section 5 contains results of a simulation study. Section 6 contains extensions to multi-color urn models.

Finally, some remarks concerning proofs are in order. The LLN and CLT for are deduced using the asymptotic properties of . For this reason, in several results of this paper we will provide a detailed probabilistic description of the sequence .

## 2 Model assumptions, notation and main results

We begin by describing our model precisely. Let and be two sequences of i.i.d. random variables, with probability distributions and respectively. Without loss of generality (Wlog), assume that the support of and is the same. We denote it by . Consider an urn containing red balls and white balls, and define . At time , a ball is drawn at random from the urn and its color is observed. Let the random variable be such that

We assume to be independent of the sequences and . Note that is a Bernoulli random variable with parameter .

Let and be two random variables such that and a.s. Let , . If and , we return the extracted ball to the urn together with new red balls. While, if and , we return it to the urn together with new white balls. If and , or if and , the urn composition is not modified. To ease notation, let denote and . Formally, the extracted ball is always replaced in the urn together with

new balls of the same color to the extracted one; now, the urn composition becomes

Set and . Now, by iterating the above procedure we define and to be two random variables, measurable with respect to the -algebra , with and a.s. Let and be the means of and , respectively. We assume throughout the paper the following condition:

(2.1) |

The urn process is then repeated for all . Let and be two random variables, measurable with respect to the -algebra

with and a.s. We will refer to as threshold parameters. At time , a ball is extracted and let if the ball is red and otherwise. Then, the ball is returned to the urn together with

balls of the same color, where , , , and for any . Formally,

and . If and , i.e. , or if and , i.e. , the urn composition does not change at time . Note that condition a.s., which implies , ensures that the urn composition can change with positive probability for any , since the replacement matrix (1.2) is never a zero matrix. Since, conditionally to the -algebra , is assumed to be independent of , is conditionally Bernoulli distributed with parameter .

We will denote by and the number of red and white sampled balls, respectively, after the first draws, that is and . Let and be two constants such that . We will adopt the following notation:

### 2.1 Law of large numbers

Our first result is concerned with the LLN.

###### Theorem 2.1.

From Theorem 2.1 we can obtain the convergence of sampled balls, namely .

### 2.2 Central limit theorem

We next study the limit distribution of proportion of sampled balls . By the description of the model, depends on the sequence , . However, frequent changes to make the sequence more erratic. To stabilize the behavior of , we fix a constant and introduce the sequence as

(2.5) |

for ; that is, we adapt the threshold parameters to change “slowly” at exponential times . An alternative definition of , which is used in some proofs, is the following

(2.6) |

for any . We will denote by

We now turn to the statement of the CLT. In the following represents the convergence in distribution.

###### Theorem 2.2.

###### Remark 2.1.

Theorem 2.2 introduces an asymptotic bias for given by . We show that this bias is exactly of order ; our next proposition makes this observation precise.

###### Remark 2.2.

In the case when and for any , Theorem 2.2 provides a CLT for the allocation proportion of MRRU model. This is summarized in the following corollary:

###### Corollary 2.2.

In a MRRU, under assumption (2.1), we have that

### 2.3 Application to clinical trials (revisited)

Consider two competing treatments and . The random variables and are interpreted as the potential responses to treatments and , respectively, given by subjects that sequentially enter the trial. At all times , a subject is allocated to a treatment according to the color of the sampled ball and a new response is collected. Note that only one response is observable from every subject, that is . The function transforms the responses into reinforcements and that update the urn. Typically, is chosen such that (or ) is considered the superior treatment when (). We assume there exists a unique superior treatment, which is formally stated in assumption (2.1).

We now describe the role of the sequences and in clinical trails. Assume the distributions and are parametric, depending on the vectors and respectively, with , with . Let be an estimator of after the first allocations, so that is measurable with respect to the -algebra . We assume that the distributions and are parametrically independent, in the sense that does not depend on and does not depend on . Hence, is computed with the observations , while is computed with the observations . Thus, and are defined as follows:

(2.10) |

where and are two continuous functions such that

this implies a.s. for every . Moreover, set

The LLN presented in Theorem 2.1 suggests a direct interpretation for the functions and in a clinical trial context: and represent the desired limiting allocations for the sequence , in case the superior treatment is () or (), respectively. This is a great improvement, since the design can target an arbitrary known function of all the parameters of the response distributions.

Ideally, and are chosen to obtain good statistical properties from the design. Typically, in clinical trials, a design is constructed to satisfy certain optimality criteria related to its statistical performances (e.g., power; see [20]). Letting denote the limit proportion of subjects to be allocated to treatment , this design can be obtained by the urn model described in Section 2 by choosing . However, in some experiments, ethical aspects are important and the main goal may be to assign fewer subjects to the inferior treatment; in this case we choose and . Designs requiring both ethical and statistical goals can also be obtained from our design, by setting . For instance, we may take

(2.11) |

where is a biasing term, which introduces a trade-off between the ethics and statistical properties.

Finally, it is worth emphasizing that conditions (2.2) and (2.7) required in the LLN of Theorem 2.1 and in the CLT of Theorem 2.2, respectively, are straightforwardly satisfied when we take to be maximum likelihood estimators (MLEs) for .

Moreover, condition (c2) in Remark 2.1 is equivalent of the assumption that the ranges of and are subsets of , for some .

## 3 Harmonic moments and related asymptotics

### 3.1 Harmonic moments

In this subsection, we show that the harmonic moments of the total number of balls in the urn are uniformly bounded. This is a key result which is needed in several probabilistic estimates, and in particular in the proof of the CLT. More specifically, as explained previously the results concerning the asymptotic behavior of , depend critically on the behavior of . In Subsection 3.2 we provide bounds for , by using comparison arguments with the MRRU model. Now, to replace the random scaling by the deterministic scaling , one needs to investigate the behavior of . Our next theorem provides a precise estimates of the moment of for any .

In the proof of Theorem 3.1, we need the following lemma that provides an upper bound on the increments of the urn process , by imposing a condition on the total number of balls in the urn . Hence, the proof of Theorem 3.1 is reported after the following result.

###### Lemma 3.1.

For any , we have that

(3.1) |

Proof. The difference can be expresses as follows:

Consider , since the case is analogous. Note that implies that and . Then, since a.s., on the set we have

where the last inequality follows from in (3.1).

Proof. [Proof of Theorem 3.1] In this proof, when we have set of integers with , to ease notation we will just write , omitting the symbol . First, note that, since a.s. for any and , we have that

(3.2) | ||||||

for any . To keep calculation transparent we choose . We recall that, by construction, we have that and for any ; hence, the random variables are, conditionally to the -algebra , Bernoulli distributed with parameter greater than or equal to . Hence, the behavior of is intrinsically related to the behavior of .

Thus, let us introduce the sets (down), (center) and (up) as follows:

where will be appropriately fixed more ahead in the proof. Then, we perform the following decomposition on the behavior of ,

On the set the process is bounded away from the extreme values ; Hence we can use comparison arguments with a sequence of i.i.d. Bernoulli random variables with parameter to get the boundedness of . After that, we will focus on proving that and converge to zero exponentially fast.

First, note that on the set the random variables

are, conditionally to the -algebra , Bernoulli with parameter with parameter greater than or equal to for any . Hence, if we introduce a sequence of i.i.d. Bernoulli random variable with parameter , from (3.2) we have that

We now show that

with .
To this end, we apply Theorem 2.1 of [12], with , , for .
All the assumptions of the theorem are satisfied in our case.
In fact, at first we have because .

Secondly, note that are identically distributed for all , since are i.i.d. Bernoulli of parameter .
Finally, converges in distribution, since .
Hence, by Theorem 2.1 of [12], it follows that is uniformly integrable. As a consequence,

Now, we will prove that and converge to zero exponentially fast. We will show that this occurs because and are bounded away from the extreme values , with probability that converge to one exponentially fast. Formally, fix , such that and , and define , , for any . Now, for any define the following sets:

where we recall that and are the adaptive thresholds. Note that . We have that