Analyzing and Provably Improving Fixed Budget Ranking and Selection Algorithms
Abstract
This paper studies the fixed budget formulation of the Ranking and Selection (R&S) problem with independent normal samples, where the goal is to investigate the convergence rates of different algorithms in terms of their resulting probability of false selection (PFS). First, we reveal that for the well-known Optimal Computing Budget Allocation (OCBA) algorithm and its two variants, a constant initial sample size (independent of the total budget) only amounts to a subexponential (or even polynomial) convergence rate. After that, a modification is proposed to achieve an exponential convergence rate, where the improvement is shown by a finite-sample bound on the PFS as well as numerical results. Finally, we focus on a more tractable two-design case and explicitly characterize the large deviations rate of PFS for some simplified algorithms. Our analysis not only develops insights into the algorithms’ properties, but also highlights several useful techniques for analyzing the convergence rate of fixed budget R&S algorithms.
Keywords: ranking and selection; fixed budget; convergence rate.
1 Introduction
Stochastic simulation has become one of the most effective approaches to modeling large, complex and stochastic systems arising in various fields such as transportation, finance, supply chain management, and power and energy. It has also been used in many applications to identify which system design is optimal under a certain performance criterion (e.g., the expected cost). This leads to what is called simulation optimization or Optimization via Simulation (OvS). In particular, using simulations to identify the best design among a finite number of candidates, generally referred to as Ranking and Selection (R&S), is of great practical interest.
The research on R&S is largely concerned with two related yet different formulations. The fixed confidence formulation aims to attain a target confidence level of the selected design’s quality using as little simulation effort as possible, whereas the fixed budget formulation typically requires maximizing the probability of correct selection (PCS) under a fixed budget of simulation runs. For fixed confidence, a considerable amount of research effort goes to the indifference zone (IZ) formulation, which dates back at least to [1]. An IZ procedure guarantees selecting the best design with (frequentist) probability higher than a certain level (e.g., 95%), provided that the difference between the top two designs is sufficiently large. Numerous efficient IZ procedures have been proposed in the simulation literature, including but not limited to the KN procedure in [2], the KVP and UVP procedures in [3], and the BIZ procedure in [4]. We refer the reader to [5] and [6] for excellent reviews of the development on this topic. In addition, Bayesian approaches (see, e.g., [7]) and probably approximately correct (PAC) selection (see, e.g., [8]) have also been studied in this stream of work.
In this paper, we study the fixed budget formulation under a frequentist setting. In simulation optimization, the Optimal Computing Budget Allocation (OCBA) algorithm in [9] is one of the most widely applied and studied algorithms. Although OCBA is usually derived under a normality assumption and asymptotic approximations, it is well known for its robust empirical performance even when the sample distributions are non-normal. Moreover, its key allocation rule can be formally justified from the perspective of the large deviations theory (see, e.g., [10]). However, a major criticism is that a theoretical performance guarantee is still lacking to date, mostly due to the difficulties in characterizing the PCS for sequential sampling algorithms. On the other hand, in the Multi-Armed Bandit literature, the same problem has been studied under the name of “Best-Arm Identification”, where the Successive Rejects (SR) algorithm proposed in [11] currently stands as one of the best algorithms. Built on a framework of sequential elimination, SR not only achieves good performance but also allows a finite-sample bound to be derived. Furthermore, [12] showed that SR can match the optimal rate up to some constant in the Bernoulli setting. Nevertheless, SR’s performance under general distributions has not yet been studied. Bayesian methods have also been gaining momentum recently. For example, the simple Bayesian algorithms proposed in [13] were shown to achieve the optimal posterior convergence rate. However, there was no guarantee on the frequentist performance. The follow-up work, [14], improved the Expected Improvement method and provided a frequentist bound, but the guarantee was for the fixed confidence setting.
Among the aforementioned algorithms, OCBA inarguably has the most variants and extensions. It has been extended to multi-objective optimization ([15]), finding simplest good designs ([16]), R&S under input uncertainty ([17]), optimizing expected opportunity cost ([18]) and many others. Meanwhile, there are attempts to approach the problem from different perspectives. For example, [19] considered fixed budget R&S under a Bayesian framework and formulated the problem as a Markov Decision Process, allowing a Bellman equation and an approximately optimal allocation to be derived. Also interestingly, [20] revealed that some variants of the Expected Improvement methods essentially have the same allocation as the OCBA methodology. Nonetheless, as was mentioned in [21] as one of the open challenges, “there are no theoretical proof to show how good the finite-time performance of OCBA is with respect to the real problem”.
The purpose of this paper is to better understand existing algorithms’ behavior through rigorous analysis, and develop insights for improving their performance. In particular, our work highlights convergence analysis on the OCBA algorithm and some of its variants, where the convergence rate is measured in terms of the large deviations (LD) rate of the probability of false selection (PFS). A small portion of the results in this paper can be found in [22], which only improves two variants of OCBA in a simplified two-design setting. The current paper significantly extends [22] by generalizing to the multiple-design case, and characterizing the LD rate for several other algorithms. Specifically, our contributions are summarized as follows.

We show that for three OCBA-type algorithms including the original OCBA, a constant initial sample size only amounts to a subexponential (or even polynomial) convergence rate of PFS.

By making the initial sample size increase linearly with the total budget, we improve the convergence rate to exponential, as is shown by a finite-sample bound on the PFS. The improvement is further validated via numerical experiments.

As further exploration towards general convergence analysis, we exactly characterize the convergence rate of two simplified algorithms for a two-design case, where the results showcase some interesting insights as well as useful proof techniques.
The rest of the paper is organized as follows. A brief review of the fixed budget R&S problem is provided in Section 2. Section 3 reveals the drawback of a constant initial sample size for several OCBA-type algorithms, and proposes a modification to improve their convergence rate. Section 4 conducts a preliminary study on convergence rate characterization by analyzing some simplified algorithms in a two-design case. Numerical results are presented in Section 5, followed by conclusions and future work in Section 6.
2 Problem Formulation
Given a set of designs , our goal is to select (without loss of generality) the one with the highest expected performance. Samples from simulating design are denoted by , where denotes the th simulation run. Each design’s expected performance is unknown, and is typically evaluated through multiple simulation runs and estimated by the sample mean
where is how many times design has been sampled/simulated. The subscript will be suppressed when there is no ambiguity. The true best and the observed best designs are denoted by
respectively. We make the following standard assumptions to avoid technicalities, where stands for normal distribution and “i.i.d.” means “independent and identically distributed”.
Assumption 2.1.

and for any two different designs and .

For each design , are i.i.d. samples from , where . The samples are also independent across different designs.
Then, under a fixed budget of simulation runs, it is desired to maximize the probability of correct selection (PCS), which is defined as
We will also refer to 1 − PCS as the probability of false selection (PFS). The challenge of the fixed budget R&S problem lies in how to make the best use of a finite simulation budget to distinguish the best design from the rest. Numerous algorithms have been proposed to this end, and their performance is typically evaluated using two types of measures.
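To make the PCS and PFS definitions concrete, the following minimal sketch estimates the PCS of a plain equal allocation by Monte Carlo. The function name `estimate_pcs_equal_allocation`, the three-design instance, and the choice of equal allocation are illustrative assumptions and are not taken from the paper.

```python
import random
import statistics

def estimate_pcs_equal_allocation(means, stds, budget, reps=2000, seed=0):
    """Monte Carlo estimate of the PCS when the budget is split equally.

    Assumes independent normal samples and that a larger mean is better.
    """
    rng = random.Random(seed)
    k = len(means)
    n_per_design = budget // k          # equal allocation, remainder unused
    true_best = max(range(k), key=lambda i: means[i])
    correct = 0
    for _ in range(reps):
        sample_means = [
            statistics.fmean(rng.gauss(means[i], stds[i]) for _ in range(n_per_design))
            for i in range(k)
        ]
        observed_best = max(range(k), key=lambda i: sample_means[i])
        correct += observed_best == true_best
    return correct / reps

# Hypothetical three-design instance; the PFS is 1 minus the PCS.
pcs_small = estimate_pcs_equal_allocation([1.0, 0.5, 0.0], [1.0, 1.0, 1.0], budget=15)
pcs_large = estimate_pcs_equal_allocation([1.0, 0.5, 0.0], [1.0, 1.0, 1.0], budget=150)
```

With a larger budget, the estimated PCS should move closer to 1, which is the behavior the measures discussed next try to quantify.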
The first type is asymptotic measures, which are often based on the large deviations (LD) theory. It has been shown in [10] that many algorithms have the following asymptotic property.
(2.1) 
where is an algorithm, is a problem instance, is the PFS of algorithm applied to problem under budget , and is called an LD rate function. For convenience, we say an algorithm has an exponential convergence rate if its PFS converges exponentially fast to 0, i.e., its LD rate is positive. Asymptotically optimal algorithms have been derived by maximizing (see, e.g., [10]), but it is an insufficient performance measure since it focuses primarily on the asymptotic performance. For example, all the terms in have the same LD rate according to (2.1), yet they behave quite differently for small values of .
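The limitation of the LD rate as a finite-budget measure can be illustrated numerically. The sketch below uses three hypothetical PFS-like sequences (chosen here for illustration, not taken from the paper) that share the same exponent but differ by a constant or polynomial prefactor, and evaluates the normalized negative log-probability at several budgets.

```python
import math

def empirical_rate(log_prefactor, poly_power, rate, budget):
    """Return -(1/T) * log(a * T**(-k) * exp(-c*T)), computed in log space."""
    log_value = log_prefactor - poly_power * math.log(budget) - rate * budget
    return -log_value / budget

# Three hypothetical sequences sharing the exponent c = 0.1.
sequences = {
    "exp(-0.1 T)": (0.0, 0.0, 0.1),
    "exp(-0.1 T) / T": (0.0, 1.0, 0.1),
    "0.01 * exp(-0.1 T)": (math.log(0.01), 0.0, 0.1),
}

for budget in (10, 100, 10_000, 1_000_000):
    rates = {name: empirical_rate(*params, budget) for name, params in sequences.items()}
    print(budget, {name: round(r, 4) for name, r in rates.items()})
```

All three normalized values approach 0.1 as the budget grows, yet at a budget of 10 they differ by a large factor, so the limit alone says little about small-budget behavior.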
Measures of the second type place more emphasis on finite-sample performance. One approach is to approximate the PFS using tight bounds, but it could be remarkably difficult for algorithms that allocate the budget in a sequential style. Another approach is to plot out the PCS curve and visualize how fast it converges to 1 as increases. The main downside, however, is that such empirical results are problem-specific and may fail to represent the general performance of an algorithm.
Bearing the pros and cons of these three approaches in mind, we will analyze and improve existing algorithms from an LD perspective, and substantiate the improvement using finite-sample bounds combined with numerical results.
3 Analyzing and Improving OCBA-type Algorithms
We begin by addressing the convergence rate of the well-known OCBA algorithm, which has not been rigorously studied to the best of our knowledge. Two of its variants, called OCBA-D and OCBA-R, are also studied for a comparison. Surprisingly, we discover that if all three algorithms follow the common practice of a constant initial sample size, then the PFS converges only subexponentially fast as opposed to the often implicitly conjectured exponential rate. A quick modification is then proposed to guarantee an exponential convergence rate.
3.1 OCBA, OCBA-D, and OCBA-R
This section gives a brief introduction to OCBA and two of its variants, which we call OCBA-D and OCBA-R. To better describe the algorithms, we introduce the following notation. Let denote the standard sample variance estimator of i.i.d. samples from design , and let denote the iteration number of the algorithms, where corresponds to the initialization phase. The budget allocated to design at the end of the th iteration is written as , and other quantities are defined accordingly. For example, we let and . The OCBA algorithm is presented in Algorithm 1.
OCBA has three input parameters: (i) is the size of samples for an initial estimation of each design’s mean and variance; (ii) is the increment of available budget at each iteration; (iii) is the total budget. An auxiliary variable, , is introduced to implement sequential allocation. The procedure begins with estimating each design’s mean and variance using samples, where is set to be . Then, at each iteration, the algorithm increases by , and (re)computes the fractions according to the following equations.
(3.1) 
With the fractions computed, the algorithm tries to match its current with the target allocation to the greatest possible extent: if is below the target, run additional simulations to match its target; otherwise, maintain the current since consumed budget cannot be refunded. All the mean and variance estimates are updated at the end of each iteration. The process continues iteratively until the total budget is depleted. Finally, the design with the highest sample mean is selected as the output.
Observe that two features of OCBA stand out from Algorithm 1. The first one to notice is the allocation fractions specified by (3.1), which are plug-in estimates of
(3.2) 
The fractions in (3.2) can be derived by asymptotically maximizing a lower bound of the PCS under a normality assumption (see, e.g., [9]). Moreover, [10] showed that for algorithms using a deterministic allocation of , such fractions approximately maximize the LD rate of PFS in the case of i.i.d. normal samples. The other feature is sequential allocation, which consists of incrementally allocating the budget, repeatedly updating the estimated fractions , and asymptotically matching the true allocation fractions as . Empirical evidence shows that sequential allocation may be the key to its good finite-sample performance, even though a quantitative analysis is not available due to its highly complex dynamics. In this paper, we attempt to better understand OCBA by studying its asymptotic behavior, and our results will also shed some light on its finite-sample performance.
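As a concrete reference point, the sketch below computes plug-in allocation fractions using the OCBA rule as it is commonly stated in the literature. We assume, without being able to confirm it from the text above, that (3.1) and (3.2) take this standard form; the function name `ocba_fractions` and the example inputs are hypothetical.

```python
import math

def ocba_fractions(means, stds):
    """Plug-in OCBA allocation fractions (larger mean is better).

    For a non-best design i: alpha_i is proportional to (std_i / gap_i)**2,
    where gap_i is the distance to the best mean. For the best design b:
    alpha_b = std_b * sqrt(sum over i != b of (alpha_i / std_i)**2).
    Assumes distinct means and positive standard deviations.
    """
    k = len(means)
    best = max(range(k), key=lambda i: means[i])
    weights = [0.0] * k
    for i in range(k):
        if i != best:
            gap = means[best] - means[i]
            weights[i] = (stds[i] / gap) ** 2
    weights[best] = stds[best] * math.sqrt(
        sum((weights[i] / stds[i]) ** 2 for i in range(k) if i != best)
    )
    total = sum(weights)
    return [w / total for w in weights]

# Two designs: the rule reduces to allocation proportional to the standard deviations.
two_design = ocba_fractions([1.0, 0.0], [1.0, 3.0])
three_design = ocba_fractions([1.0, 0.5, 0.0], [1.0, 1.0, 1.0])
```

In a sequential algorithm the inputs would be the current sample means and sample standard deviations, so the fractions change from iteration to iteration.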
In addition to OCBA, we also consider variations on OCBA and propose two variants, OCBA-D and OCBA-R, which are presented in Algorithms 2 and 3, respectively. The “D” and “R” stand for “Deterministic” and “Randomized”. Both variants inherit the fractions in (3.1) and are designed to be fully sequential, i.e., at each iteration only a single additional run is allocated to some design . However, their difference lies in the way is chosen. For OCBA-D, corresponds to the design with the largest ratio , where the ratio is roughly a measure of need for simulations: intuitively, an undersampled design is reflected by a larger ratio relative to the others’. In OCBA-R, is chosen randomly by using the fractions as a sampling distribution. In other words, conditional on the vector, the choice of is independent of everything else. In sum, all three algorithms are governed by the “asymptotically optimal” fractions given by (3.2), except that they use different sequential allocation strategies to approximate such fractions.
We consider OCBA-D and OCBA-R for two reasons. First, fully sequential allocation and randomization are among the most natural forms of generalization to consider, examples including the most-starving version of OCBA ([23]) and the Top-two Sampling Algorithms ([13]). It is therefore important to know if any finding for OCBA also applies to these variants. Second, such variations can often make the algorithm behave more regularly and thus more amenable to analysis.
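The two selection rules can be sketched as follows. This is a minimal illustration under the assumption that the "ratio" used by the deterministic variant is the estimated fraction divided by the current sample count; the function names and the example state are hypothetical.

```python
import random

def next_design_deterministic(fractions, counts):
    """Deterministic rule: pick the design with the largest fraction-to-count ratio."""
    return max(range(len(counts)), key=lambda i: fractions[i] / counts[i])

def next_design_randomized(fractions, rng):
    """Randomized rule: draw a design using the fractions as probabilities."""
    return rng.choices(range(len(fractions)), weights=fractions, k=1)[0]

# Hypothetical state: estimated fractions and current sample counts.
fractions = [0.5, 0.3, 0.2]
counts = [10, 10, 2]
chosen_d = next_design_deterministic(fractions, counts)   # ratios: 0.05, 0.03, 0.1
rng = random.Random(1)
draws = [next_design_randomized(fractions, rng) for _ in range(10_000)]
```

The deterministic rule favors the third design here because it is the most undersampled relative to its target, whereas the randomized rule ignores the counts and only matches the fractions on average.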
3.2 Convergence Analysis
As a main contribution of this paper, we formally analyze the performance of OCBA, OCBA-D and OCBA-R. Firstly, we show that all three algorithms attain the “asymptotically optimal” allocation fractions given by (3.2) as . Secondly, we reveal that despite the convergence of fractions, if the initial sample size is chosen as a constant independent of , then these algorithms suffer from a subexponential convergence rate.
To put our work in perspective, [10] were among the first to study the asymptotics of fixed budget R&S algorithms. They established that if an algorithm pre-specifies some fractions and simply sets , then the PFS converges exponentially fast under weak assumptions on the sample distributions’ tails. In particular, if the samples are i.i.d. normal, then the fractions given by (3.2) approximately maximize the LD rate of the PFS. Perhaps under the influence of such insights, there seems to be an implicit conjecture that algorithms which “asymptotically” attain the optimal allocation fractions, such as OCBA, should enjoy an LD rate similar to that of their static counterpart, or at least guarantee exponential convergence. In what follows, we disprove this conjecture by using OCBA and the two proposed variants as counterexamples.
To set the basis for our major discovery, we link Algorithms 1-3 through the convergence of their actual allocation fractions . Observe that if and only if , so we characterize such convergence in terms of for convenience. All the proofs omitted in the paper can be found in the electronic companion.
Proposition 3.1.
Let Assumption 2.1 hold and denote “almost surely” by “a.s.”. Then, for OCBA, OCBA-D and OCBA-R, the following holds.

a.s. as for all .

a.s. as for all .

a.s. as for all .
Proposition 3.1 is not surprising since all three algorithms are designed to approximate and match the true fractions in (3.2). It holds regardless of the value of (as long as ), because the algorithms are capable of correcting the estimation error from the initialization phase. For this reason, a small is often employed to leave room for better allocation flexibility in succeeding iterations. For example, a common suggestion for is between 5 and 20 (see, e.g., [24, 25]). Nevertheless, the following theorem suggests that a constant independent of can cause the PFS to converge rather slowly.
Theorem 3.1.
Let Assumption 2.1 hold. If is chosen as a constant independent of , then for OCBA and OCBA-D,
(3.3) 
for some constant independent of . Also, for OCBA-R,
(3.4) 
Theorem 3.1 appears somewhat surprising, as it states that a constant initial sample size leads to at most a polynomial convergence rate for OCBA and OCBA-D, and a subexponential convergence rate for OCBA-R. At a high level, it implies that the initial estimation error, though vanishing as , does not decrease at a sufficiently fast rate. It also implies that the convergence of allocation fractions alone does not say much about how fast the PFS converges. Before showing Theorem 3.1, we present a few technical lemmas and describe the main idea behind the proof.
Lemma 3.1.
Let be the sample standard deviation of i.i.d. normal samples with variance . Then, for any ,
(3.5) 
(3.6) 
Lemma 3.2.
Let be the sample variance of i.i.d. random variables. Then, such that
(3.7) 
Lemma 3.3.
Let be the sample variance of i.i.d. random variables. Then, , where is a constant, such that
Lemma 3.1 provides some basic tail bounds for the standard deviation estimator , which can be used to prove Lemma 3.2. Lemma 3.3 is the leading cause behind the polynomial convergence rate for OCBA and OCBA-D, as it points out that the left tail of converges to 0 only at a polynomial rate. We will present the proof of Theorem 3.1 for OCBA, and the rest can be found in the electronic companion. To illustrate the main idea, consider an adversarial scenario for OCBA where

After the initialization phase, is some suboptimal design, e.g., design 2.

The algorithm allocates all the remaining budget to design 2.

The sample mean of design 2 beats all other designs’ over all iterations.
In the scenario described above, we say that the algorithm “freezes” all the designs other than design 2, which only happens if the initial estimates for the “frozen” designs are highly inaccurate. For instance, we may consider a case where for all , takes a very small value and thus is also tiny. This would trick the algorithm into greedily sampling design 2, while all the other designs’ mean and variance estimates get no further update and thus stay inaccurate. To avoid technicalities, we further require design 2 to be the observed best design throughout the allocation process, so that takes the same functional form for any iteration (recall from (3.1) that has a different form than ). What remains is to bound the probability of such an event from below, and show that it is not exponentially rare.
Proof of Theorem 3.1 (OCBA). Assume without loss of generality that . For each design , we will construct events such that on , a false selection always occurs. Without ambiguity, we will simply drop and write and instead. To begin with, by Lemma 3.2 we can choose such that . By a similar argument, there exists such that . Let
Then, by a union bound. For , we let
We now show that by induction, where “FS” stands for the false selection event. Fix a sample path on . Note that after the initialization phase. Assume that at the end of the th iteration, then at the th iteration, for any ,
where . From we have , thus for all and only design 2 will receive additional samples at step 7 of Algorithm 1. Since for all , design 2 will still be at the end of the th iteration. Continuing this process, a false selection is certain when the algorithm terminates. Finally, the probability of can be bounded from below as follows.
where the last equality follows from the independence of and . Furthermore, for some constant (independent of ), and by Lemma 3.3, where are constants independent of . Gather all the terms and the conclusion follows.
∎
The key to proving Theorem 3.1 is to exploit the asymmetry of the standard deviation estimator’s distribution. Specifically, when constructing events , we require to decrease in order as for all . Then, Lemma 3.3 can be used to show a polynomial lower bound for . Another way to construct a “freezing” event is by increasing in order , but this merely produces an exponentially small lower bound according to (3.6) in Lemma 3.1. In other words, exploiting the left tail of produces a tighter lower bound for the PFS than exploiting the right tail.
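The asymmetry between the two tails can be checked in a special case with a closed form. The sketch below assumes an initial sample size of 3, for which twice the sample variance divided by the true variance is chi-square with 2 degrees of freedom; the thresholds 1/T and T are chosen here purely for illustration and are not the ones used in the proof.

```python
import math

def left_tail_prob(threshold, sigma=1.0):
    """P(S^2 <= threshold) for the sample variance of 3 i.i.d. N(mu, sigma^2) samples.

    With 3 samples, 2*S^2/sigma^2 is chi-square with 2 degrees of freedom,
    whose c.d.f. is 1 - exp(-y/2), so the probability has a closed form.
    """
    return -math.expm1(-threshold / sigma**2)

def right_tail_prob(threshold, sigma=1.0):
    """P(S^2 >= threshold) in the same setting."""
    return math.exp(-threshold / sigma**2)

for budget in (10, 100, 1000, 10_000):
    shrink = left_tail_prob(1.0 / budget)      # variance estimate below 1/T
    grow = right_tail_prob(float(budget))      # variance estimate above T
    print(budget, shrink, grow)
```

The probability of a badly underestimated variance decays only like 1/T in this example, while the probability of a badly overestimated variance decays like exp(-T), which is why an event built on the left tail is far less rare.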
Theorem 3.1 can be counterintuitive at first glance. Recall from [10] that for normal samples, any fixed fractions would guarantee an exponential convergence rate. This particularly includes equal allocation, i.e., for all designs . In this regard, Theorem 3.1 seems to suggest that equal allocation is better than more sophisticated sequential allocation procedures, which contradicts numerous empirical studies in which OCBA exhibits a significant advantage over equal allocation. To resolve the “conflict”, note that the LD rate is only defined in an asymptotic sense, meaning that when gets large enough, equal allocation will eventually achieve a lower PFS than all three OCBA-type algorithms we consider. However, the crossing point of may be so large that the PFS is already very close to 0, which also explains why such a crossing point is not always observed in numerical results.
3.3 A Modification for Improvement
We propose a simple modification to the three OCBA-type algorithms, which is to make grow linearly in . This can be done by choosing a constant and setting . Intuitively, the PFS should converge at least as fast as equally allocating to all designs, where an exponential convergence is guaranteed. More formally, we have the following finite-sample bound on the PFS.
Theorem 3.2.
Let Assumption 2.1 hold and suppose that . If for some , then for OCBA, OCBA-D and OCBA-R, there exist positive constants (independent of ) such that
(3.8) 
where and for .
Proof of Theorem 3.2. Note that since for all designs , if the event
occurs, then we have a correct selection regardless of the exact values of ’s. Apply a Gaussian tail bound for and we have
Evaluate the geometric sums and (3.8) follows.
∎
The bound (3.8) fills the long-standing void of a finite-sample PFS upper bound for OCBA-type algorithms. It also applies to a broad class of algorithms that involve a warm-up phase of acquiring initial estimates. An idea similar to using a linearly increasing is to enforce hard thresholds for the actual fractions such that, e.g., for some . Both methods will force to grow at least linearly fast in , but we work with the former mainly for conveniently obtaining a PFS bound. The choice of inevitably involves a trade-off between lower initial estimation error and higher flexibility in subsequent allocation. In Section 5, we will use numerical results to demonstrate that an appropriately chosen can lead to a significant improvement in the finite-sample PCS.
One drawback of the finite-sample bound in (3.8) is that it is too general and thus can be quite loose. While a tighter upper bound should reflect the pros and cons of different sequential allocation strategies, deriving such a bound is known to be very challenging even for nicely structured fully sequential algorithms. In the upcoming section, we turn our attention to algorithms which follow simple designs yet capture some key features of advanced algorithms. The idea is to examine the individual impact of a feature through LD rate analysis, and keep the intuition uncluttered from other common features in a sophisticated algorithm.
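For intuition, one way a bound of this type can be obtained is sketched below, following the steps named in the proof (a union bound over sample sizes, a Gaussian tail bound, and a geometric sum). All notation in the block is assumed for illustration, and the constants need not coincide with those in (3.8).

```latex
% Assumed notation: b is the best design, \delta_i = \mu_b - \mu_i for i \neq b,
% \delta_{\min} = \min_{i \neq b} \delta_i, \bar{X}_i(n) is the mean of the first n
% samples of design i, and n_0 = \alpha T is the initial sample size.
\begin{align*}
\mathrm{PFS}
 &\le \sum_{n \ge n_0} P\Big(\bar{X}_b(n) \le \mu_b - \tfrac{\delta_{\min}}{2}\Big)
   + \sum_{i \neq b} \sum_{n \ge n_0} P\Big(\bar{X}_i(n) \ge \mu_i + \tfrac{\delta_i}{2}\Big) \\
 &\le \sum_{n \ge n_0} e^{-n c_b} + \sum_{i \neq b} \sum_{n \ge n_0} e^{-n c_i}
  = \sum_{i} \frac{e^{-c_i n_0}}{1 - e^{-c_i}},
\qquad c_b = \frac{\delta_{\min}^2}{8\sigma_b^2}, \quad
       c_i = \frac{\delta_i^2}{8\sigma_i^2} \ (i \neq b).
\end{align*}
```

Because the initial sample size grows linearly in the budget, every term on the right decays exponentially in the budget, whichever sequential rule is used afterwards.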
4 Characterizing the LD Rate for Simplified Algorithms
In this section, we focus on algorithms with an exponential convergence rate, for which the LD rate is one of the most precise quantitative measures of asymptotic behavior. Nevertheless, LD rate analysis remains difficult for sophisticated sequential allocation algorithms. We therefore exactly characterize the LD rate for some simplified algorithms, and compare their LD rates with that achieved by the optimal static allocation derived in [10]. Each algorithm to be considered has a simple structure yet represents a certain important feature of more advanced algorithms. Our analysis will focus on a two-design case for better tractability, but the proof techniques and insights can provide a basis for more general convergence analysis.
4.1 Algorithms Overview
We consider a case of and study three algorithms, which are presented in Algorithms 4-6. Algorithm 4 is the deterministic algorithm studied in [10], which statically allocates the budget according to pre-specified fractions and , hence the name “deterministic static (DS)”.
A slight modification of DS leads to the randomized static (RS) algorithm in Algorithm 5, which uses the static fractions as a sampling probability distribution at every iteration, and thus can be roughly regarded as a simplified version of OCBA-R or the Top-two Sampling Algorithms in [13]. To the best of our knowledge, the (frequentist) convergence rate of such a randomized algorithm has not been studied in the literature.
Finally, Algorithm 6 is a two-phase algorithm which uses phase I to estimate the optimal DS fractions (see Section 4.2 for more details), and then implements the estimated fractions in phase II. The two-phase algorithm is a vanilla version of our modified OCBA-type algorithms, as it enforces a linearly growing , but does not update the fraction estimates in any subsequent iteration. Also, notice that we do not reuse the initial samples in phase II, so and are not bounded from below by a linear function of and Theorem 3.2 does not apply. However, we will show that it still has an exponential convergence rate due to the rapid decrease of initial estimation error as increases.
4.2 LD Rate Analysis
4.2.1 Analysis of DS Algorithm
Before proceeding to the LD rate analysis of the RS and two-phase algorithms, we recall some well-established results for the DS algorithm. In addition, we also derive a few new results which will serve as benchmarks. Following the normality assumption and letting , the PFS can be written as
where we assume that and henceforth. Ignoring the integrality constraints on and , it can be shown that setting minimizes the PFS. In the simulation literature, this is often known as “the optimal strategy for two-design problems is to allocate the budget proportionally to their standard deviations”. The same conclusion can be reached by maximizing the following LD rate with respect to ,
where is again the unique maximizer, and the corresponding optimal LD rate is given by
(4.1) 
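The proportional-to-standard-deviation rule can be verified by brute force on a small instance. The sketch below assumes the standard two-design expression for the PFS under normal samples, namely the normal c.d.f. evaluated at the negative mean gap divided by the standard deviation of the difference of sample means; the function names and the instance are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal c.d.f., written with erfc to stay accurate in the far left tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def pfs_static(n1, n2, delta, sigma1, sigma2):
    """PFS when designs 1 and 2 get n1 and n2 runs, with mean gap delta > 0."""
    return normal_cdf(-delta / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2))

def best_split(budget, delta, sigma1, sigma2):
    """Brute-force the integer split (n1, budget - n1) with the smallest PFS."""
    return min(
        range(1, budget),
        key=lambda n1: pfs_static(n1, budget - n1, delta, sigma1, sigma2),
    )

# Standard deviations 1 and 3 suggest giving design 1 a quarter of the budget.
n1_star = best_split(100, delta=1.0, sigma1=1.0, sigma2=3.0)
```

Here the brute-force optimum is 25 runs for design 1 out of 100, which matches the fraction 1/(1+3) predicted by the rule.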
We will use the optimal DS allocation as a benchmark in subsequent analysis. In practice, the true variances are unknown and thus cannot be implemented. A simple alternative is equal allocation (EA), i.e., setting . The LD rate for EA is given by
(4.2) 
Since , we have . In other words, EA’s LD rate is never more than a factor of 2 away from the optimal DS rate. Another interesting fact is that, without prior knowledge on the designs’ performance, EA is the most robust algorithm. Indeed, consider the robust optimization problem
which is to find the that minimizes the worst case ratio between and . It can be checked that the inner-layer problem’s optimal value is , so is the optimal solution.
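Both the factor-of-2 claim and the robustness of equal allocation can be checked numerically. The sketch below assumes the standard two-design LD rate for static allocation with normal samples, that is, the squared mean gap divided by twice the sum of each variance over its fraction; the function names and the grid of standard deviations are illustrative assumptions.

```python
def ld_rate_static(p, delta, sigma1, sigma2):
    """LD rate of a static allocation giving fraction p to design 1 (two designs)."""
    return delta**2 / (2.0 * (sigma1**2 / p + sigma2**2 / (1.0 - p)))

def rate_ratio(p, sigma1, sigma2):
    """Rate of fraction p divided by the rate of the optimal fraction."""
    p_star = sigma1 / (sigma1 + sigma2)
    return ld_rate_static(p, 1.0, sigma1, sigma2) / ld_rate_static(p_star, 1.0, sigma1, sigma2)

def worst_case_ratio(p, sigma_grid):
    """Smallest ratio over a grid of values for sigma2, with sigma1 fixed to 1."""
    return min(rate_ratio(p, 1.0, s) for s in sigma_grid)

sigma_grid = [10.0 ** e for e in range(-3, 4)]
ea_ratios = [rate_ratio(0.5, 1.0, s) for s in sigma_grid]
worst = {p: worst_case_ratio(p, sigma_grid) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
```

On this grid the equal allocation ratios stay between 1/2 and 1, and among the candidate fractions the value 0.5 has the largest worst-case ratio.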
We now derive a PFS bound that holds for an important class of algorithms. Since the optimal DS allocation only involves the designs’ variance information, it would be reasonable for us to restrict our discussion to algorithms with the following property.
Definition 4.1.
A fully sequential algorithm is called variance-driven if

at iteration , it runs replications for each design to obtain initial variance estimates and ;

at every iteration , the algorithm decides which design to simulate next solely based on for all , i.e., the history of variance estimates up to iteration ;

at the end of the final iteration , output .
Note that in the case of , OCBA’s allocation fractions in (3.2) degenerate to . Therefore, the three OCBA-type algorithms we considered in Section 3, i.e., OCBA, OCBA-D and OCBA-R, all fall into the category of variance-driven on two-design problems. We will derive a tight PFS lower bound which holds for all algorithms of this type.
Lemma 4.1.
Let and be the sample mean and sample variance of i.i.d. normal random variables, respectively. Then, for all , is independent of .
Lemma 4.1 is an extension of the well-known result that and are independent for normal distribution. Given a total budget of , we let and denote the total number of simulation runs for designs 1 and 2 when the algorithm terminates, i.e., . Then, Lemma 4.1 has the following implication in our context.
Corollary 4.1.
For any variance-driven algorithm, it holds that almost surely, where , , and is independent of .
Generally speaking, the final mean estimates and can have a highly nontrivial correlation if an algorithm sequentially allocates the computing budget based on some iteratively updated statistics. Surprisingly, Corollary 4.1 reveals that for variance-driven algorithms, and are conditionally independent given for some . Moreover, their joint distribution coincides with what we get from deterministically allocating and runs to designs 1 and 2, respectively. This is due to the nice property of normal distribution characterized in Lemma 4.1, and it gives rise to a tight PFS lower bound for all variance-driven algorithms.
Proposition 4.1.
For any variance-driven algorithm , we have
where is the cumulative distribution function (c.d.f.) of distribution.
Proof of Proposition 4.1. For any fixed ,
where the right-hand side (RHS) is convex in and is minimized when . For an algorithm described in the statement, it follows from Corollary 4.1 that is distributed as two independent normal random variables. Thus, by Jensen’s inequality,
where the RHS is further bounded from below by the PFS corresponding to , which yields exactly the lower bound in the statement. ∎
Proposition 4.1 establishes optimality for the optimal DS algorithm in a very strong sense: no variance-driven algorithm can beat the optimal DS allocation under any finite (up to some rounding error). The same typically does not hold if , where it can be checked numerically that the optimal DS algorithm may perform poorly on some problem instances under small budgets. Nonetheless, from an asymptotic point of view, it remains an open question whether sequential algorithms can achieve a higher LD rate than the optimal DS algorithm when .
4.2.2 Analysis of RS Algorithm
Recall from Algorithm 5 that at each iteration, the RS algorithm simulates design 1 with probability (w.p.) and design 2 w.p. , where and the samples are independent of the decisions. Let be a sequence of i.i.d. random variables representing whether design 1 is sampled at each iteration . To ensure that the sample means are well-defined, we set so that each design gets sampled at least once. Then, the PFS is given by
(4.3) 
which does not allow a closed form. However, a quick observation is that the RHS of (4.3) is bounded from below by the term corresponding to , i.e.,
which gives the LD rate upper bound
Thus, the RS algorithm’s LD rate is bounded as , which is in sharp contrast with the LD rate of the DS algorithm, where the latter grows in order according to (4.1). Since the separation margin of and measures the difficulty of a correct selection, this means that the RS algorithm cannot take advantage of a larger due to the randomness introduced in allocation. It also echoes our observation in Section 3 that algorithms with the same limiting allocation fractions may have drastically different LD rates. More precisely, we have the following exact characterization.
Theorem 4.1.
For the RS algorithm, we have
(4.4) 
where is the Kullback-Leibler (KL) divergence between two Bernoulli distributions with parameters and , respectively.
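The conditioning argument behind (4.3) can be explored numerically for moderate budgets. The sketch below is a minimal illustration under the assumption that each design receives one forced run and every remaining run goes to design 1 with probability p independently of the samples; the function names and the instance are hypothetical, and the single-term bound corresponds to design 1 receiving no run beyond its forced one.

```python
import math

def normal_cdf(x):
    """Standard normal c.d.f., written with erfc to stay accurate in the far left tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def pfs_given_counts(n1, n2, delta, sigma1, sigma2):
    """PFS conditional on the final sample counts, with mean gap delta > 0."""
    return normal_cdf(-delta / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2))

def pfs_randomized_static(budget, p, delta, sigma1, sigma2):
    """Exact PFS of the randomized scheme, obtained by conditioning on the counts."""
    free = budget - 2                               # runs left after the forced ones
    total = 0.0
    for k in range(free + 1):                       # k extra runs for design 1
        weight = math.comb(free, k) * p**k * (1.0 - p) ** (free - k)
        total += weight * pfs_given_counts(1 + k, 1 + free - k, delta, sigma1, sigma2)
    return total

def single_term_lower_bound(budget, p, delta, sigma1, sigma2):
    """The term in which design 1 keeps only its forced run."""
    free = budget - 2
    return (1.0 - p) ** free * pfs_given_counts(1, 1 + free, delta, sigma1, sigma2)

budget, p, delta = 200, 0.5, 3.0
pfs_rs = pfs_randomized_static(budget, p, delta, 1.0, 1.0)
lower = single_term_lower_bound(budget, p, delta, 1.0, 1.0)
rate_rs = -math.log(pfs_rs) / budget                # finite-budget proxy for the LD rate
rate_ds = delta**2 / (2.0 * (1.0 + 1.0) ** 2)       # optimal static rate for this instance
```

For this instance the finite-budget proxy for the randomized scheme stays well below the optimal static rate of 1.125, because the rare event of starving one design costs only a factor of 1/2 per run regardless of how large the mean gap is.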
The optimization problem in (4.4) is in general nonconvex and an analytical solution is not available. Nonetheless, it can be checked numerically that the value maximizing the LD rate of the RS algorithm is different from , the optimal DS fraction. The proof of Theorem 4.1 relies on the following lemma, where denotes the set of nonnegative integers.
Lemma 4.2.
Let be a sequence of functions for . If there exists a function such that converges uniformly to on , then
(4.5) 
Proof of Lemma 4.2. First of all, notice that
so taking the supremum on the RHS gives a lower bound. For the upper bound,