A Generalization of Multiple Choice Balls-into-Bins: Tight Bounds
This paper investigates a general version of the multiple choice model called the -choice process in which balls are assigned to bins. In the process, balls are placed into the least loaded out of bins chosen independently and uniformly at random in each of rounds. The primary goal is to derive tight bounds on the maximum bin load for -choice for any . Our results enable one to choose suitable parameters and for which the -choice process achieves the optimal tradeoff between the maximum bin load and message cost: a constant maximum load and messages. The maximum load for a heavily loaded case where balls are placed into bins is also presented for the case . Potential applications are discussed such as distributed storage as well as parallel job scheduling in a cluster.
Key words. Balanced allocation, Load balance, Coupling
In the classical single choice balls-into-bins problem, a ball is placed into a bin chosen independently and uniformly at random (i.u.r.). It is well known that the maximum bin load of this basic process after balls are placed into bins is with high probability (w.h.p.). In the multiple choice paradigm, each ball is placed into the least loaded out of bins chosen i.u.r. Azar et al. showed that the maximum load in this case is exponentially reduced to w.h.p. Since then, numerous variations of the standard multiple choice problem have been investigated (e.g. [11, 19, 17, 9, 14, 4]). For example, Berenbrink et al. and Talwar et al. proved that the gap between the maximum and average load still remains even if the number of balls grows unboundedly large. Czumaj and Stemann proposed an adaptive algorithm (i.e. the number of choices made by each ball varies depending on the load of the chosen bins) that achieves maximum load with message cost. Here, the message cost is the cost of network communication incurred by bin probing, defined as the number of bins probed. Parallel versions [1, 16, 11, 10, 3] of balanced allocation have also been studied. Recent works addressed near-optimal adaptive algorithms: a constant maximum bin load using an average of bin choices per ball [10, 6].
In our prior work (a preliminary version of this paper was published in PODC '11, pages 297-298, under the title “Brief announcement: A Generalization of multiple choice balls-into-bins”), we posed the following questions: If we place balls at a time into least loaded out of bins chosen i.u.r., is the maximum load still ? What would occur if balls are placed into bins or balls into bins? In general, if balls are assigned to the least loaded among possible destinations chosen i.u.r., which we call -choice, what is the maximum load of any bin after all balls are placed into bins?
In this paper, we derive tight bounds on the maximum load of -choice as a function of three parameters and , which in turn allow one to choose appropriate values of and to achieve a striking balance between the maximum bin load and message cost: a constant maximum load with messages, or maximum load with messages. This suggests that our non-adaptive allocation scheme is near-optimal (for appropriate and ), and outperforms existing non-adaptive allocation schemes; to the best of our knowledge, none of the previously known algorithms using messages achieves a constant maximum load. Here, a balls-into-bins model is called non-adaptive if the number of choices per ball is fixed.
The -choice model represents the full spectrum of balanced allocations that lie between the single- and multi-choice algorithms: for small , -choice acts like the standard -choice, while it converges to the classic single choice balls-into-bins problem for and large . Our work is similar in spirit to the -choice algorithm proposed by Peres et al., where each ball goes to the lesser loaded of two random bins with probability and a random bin with probability , in that both schemes can be viewed as a mix between single- and multiple-choice strategies, though these two models exhibit no other structural similarities. The -choice algorithm is a (semi) parallel version of the basic sequential -choice, but it is fundamentally different from existing parallel balanced allocations [1, 16, 11, 10, 3]: in previous parallel models, each ball carries out bin probing independently of the other balls, whereas, in our case, a group of balls shares information on bin state and uses that information to lower the maximum load. From a technical point of view, Markov chain coupling and layered induction [2, 5] are the inspiration for our analysis. We extend the existing analysis to a general setting where there are strong data dependencies arising from the behavior of balls.
We note that there are some ambiguities in the -choice allocation policy. For example, suppose that four bins – bin1,…, bin4– contain , and balls, respectively, in the beginning of a round for the -choice process. We consider three different scenarios:
(a) Each of the four bins is sampled once.
(b) Each of bin2 and bin3 is sampled once, and bin4 is sampled twice.
(c) Each of bin1 and bin4 is sampled twice.
In the first scenario, each of bin2, bin3, and bin4 receives a new ball; however, in the next two cases, some bins are sampled multiple times, creating ambiguities about the destinations of the three balls. In scenario (b), one option is to assign one ball to each sampled bin, and another option is to assign two balls to bin4 and the other to bin3. The last scenario is even more problematic since only two destinations are available.
We eliminate this ambiguity by imposing the following restriction on the allocation policy: a bin sampled times can receive at most balls. This rule can be explained in a different way: In each round, each of balls (instead of balls) is placed sequentially into a random bin. At the end of the round, balls (among the balls belonging to the round) with maximal height are removed, where the height of a ball is defined as the number of balls in the bin containing the ball right after it is placed. According to the policy, bin3 receives a ball and bin4 receives two balls in scenario (b), and bin1 receives one ball and bin4 receives two in scenario (c). Note that the -choice policy is not always optimal; a better strategy is to assign one ball to bin3 and two balls to bin4 in scenario (a), and all three balls to bin4 in scenario (c). In practice, one can easily modify the policy to improve load balance.
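The "place, then remove the highest balls" description of a round can be sketched in code. The following is only an illustration under stated assumptions: the function name `allocate_round` and the parameter name `keep` (the number of balls that survive the round) are hypothetical, ties between equal heights are broken by bin index as an arbitrary choice, and the starting loads used in the note afterwards are made up for the example rather than taken from the paper.

```python
def allocate_round(loads, sampled, keep):
    """Apply one round of the allocation policy and return the updated loads.

    loads:   current number of balls in each bin (modified in place)
    sampled: indices of the bins sampled this round (repeats allowed)
    keep:    number of balls that remain after the round (hypothetical name)
    """
    tentative = []  # one (height, bin) pair per tentatively placed ball
    extra = {}      # tentative balls added to each bin so far in this round
    for b in sampled:
        extra[b] = extra.get(b, 0) + 1
        # Height = number of balls in the bin right after this ball is placed.
        tentative.append((loads[b] + extra[b], b))
    # Removing the balls with maximal height is the same as keeping the
    # `keep` balls of smallest height. Sorting tuples breaks ties by bin index.
    tentative.sort()
    for _, b in tentative[:keep]:
        loads[b] += 1
    return loads
```

With hypothetical starting loads `[5, 4, 3, 0]` for bin1 through bin4 and `keep=3`, sampling `[1, 2, 3, 3]` (bin2 and bin3 once, bin4 twice) gives one ball to bin3 and two to bin4, and sampling `[0, 0, 3, 3]` gives one ball to bin1 and two to bin4, in line with scenarios (b) and (c). A bin sampled a given number of times never receives more balls than that, since each kept ball corresponds to one sample.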
The rest of the paper is organized as follows. In the remainder of this section, we discuss our main results, simulation results, and potential applications. Section 2 presents the model, definitions, and a list of notations used throughout this paper. We present some key properties of -choice in Section 3 and analyze the upper and lower bounds on the maximum loads in Section 4 and Section 5, respectively. We provide proofs of some lemmas in Section 6, and conclude the paper in Section 7.
1.1 Main Results
We assume that is a multiple of and the -choice process consists of rounds. Our balls-into-bins model is described as follows.
Our main result is formalized as follows.
For , let
Let denote the maximum load after balls are placed into bins under the -choice process.
If , the following holds with probability .
(i) If , then
(ii) If as , then
As , we have , and hence (2) can be simplified as follows.
If , then with probability
We discuss several interesting consequences derived from the main result. If we choose the smallest () and hence , the result (1) reduces to the maximum load for the standard -choice algorithm. At the other extreme, if for large then (3) implies that the maximum load becomes ; this agrees with the maximum load for the classical single choice algorithm. The true benefit of the -choice scheme lies between these two extremes. For example, if and , then the result (2) implies that -choice achieves maximum load using the asymptotically minimal cost of messages. To the best of our knowledge, the previously known result using messages is an adaptive algorithm with maximum load presented in . Another example is that if and (such as and ), then (1) suggests that a constant maximum load is achieved at the cost of messages, which is comparable to the best known result of an adaptive algorithm in . This suggests that our non-adaptive allocation scheme performs as well as the best known adaptive algorithm.
We obtain the following (partial) results on the heavily loaded case where the number of balls exceeds the number of bins.
Let denote the maximum bin load after balls are placed into bins following the -choice process. If , then the maximum load is
with probability .
1.2 Experimental Results
In Table 1, we present simulation results of -choice after balls are placed into bins using and varying and values. A pseudo-random number generator is used to sample random bins in each round of the process. The maximum load shown in the table is obtained after running the simulation ten times for each setting. All values we have chosen divide so that exactly balls are inserted in each round. In the second and third columns, the maximum load of single-choice and two-choice is given. It is worth noting that the result of -choice is close to that of two-choice, and that -choice outperforms two-choice and achieves the same maximum load as -choice. We also remark that -choice performs noticeably better than single-choice.
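An experiment of this kind can be reproduced in outline with a short simulation. The following is a minimal sketch and not the code used to produce Table 1: the names `simulate`, `n`, `k`, and `d` are assumptions (with `k` balls kept out of `d` sampled bins in each round), ties between equal heights are broken at random, and the parameter values mentioned afterwards are arbitrary.

```python
import random


def simulate(n, k, d, seed=0):
    """Place n balls into n bins, k balls per round with d random probes.

    Returns the final list of bin loads. Assumes k divides n and 1 <= k <= d.
    """
    if n % k != 0 or not 1 <= k <= d:
        raise ValueError("need k to divide n and 1 <= k <= d")
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(n // k):
        tentative = []  # (height, random tie-breaker, bin) per sampled bin
        extra = {}      # tentative balls placed in each bin during this round
        for _ in range(d):
            b = rng.randrange(n)
            extra[b] = extra.get(b, 0) + 1
            tentative.append((loads[b] + extra[b], rng.random(), b))
        # Keep the k tentative balls of smallest height; drop the other d - k.
        tentative.sort()
        for _, _, b in tentative[:k]:
            loads[b] += 1
    return loads
```

Under these assumptions, `max(simulate(n, 1, 1))` corresponds to single choice and `max(simulate(n, 1, 2))` to two-choice, so the same function can be used to compare them against other `(k, d)` pairs. Averaging over several seeds gives a more stable picture than a single run.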
The -choice allocation scheme is used for parallel job scheduling in a cluster environment, as -choice enables low response time. Suppose that a job consists of tasks to be scheduled in parallel, and each task issues random probes individually (as in -choice). In this case, it is likely that there will be a ball/task whose possible destinations are all heavily loaded. Since a job’s completion time is determined by the task finishing last, the performance of the standard multiple choice degrades as a job’s parallelism increases. Our -choice model solves this problem by letting tasks share information across all the probes in a job, which effectively reduces the chance that any task commits to a heavily loaded worker machine.
A distributed storage system is another application domain to which -choice is naturally applicable. Data replication and fragmentation are widely used in this setting to increase file availability, fault tolerance, and load balance. Suppose that a new file is created and replicated into copies (or that a large file is split into chunks), and each of the replicas (or chunks) is to be stored on servers. The -choice scheme provides a simple and efficient solution for fast allocation and load balance with the minimum message cost; replicas (or chunks) are stored on the least loaded out of servers chosen randomly. If, for example, and , then -choice provides asymptotically the same maximum load as the two-choice scheme at half the message cost of two-choice. In the case of data partitioning, if a file search requires retrieving all chunks of the file, then the search operation costs , which is (asymptotically) minimum and approximately half of the search cost for two-choice.
2 Model, Notations, and Definitions
2.1 The Model and Notations
We will assume that, at the end of each round, bins are sorted in decreasing order in terms of bin load (with ties broken randomly). By bin , we denote the th most loaded bin (at the time of consideration); that is, bin is the most loaded bin, bin is the second most loaded bin, and so on. Then the -choice process can be viewed as Markovian with the state space composed of the sorted bin load vectors.
The height of a ball is the number of balls in the bin containing the ball immediately after it is placed. If two balls fall into the same bin in the same round, each of them is assumed to have a different height. For the purpose of analysis, we may assume that one of the balls is placed first and thus has a smaller height than the other. The following is the list of notations used in this paper.
is the -choice algorithm.
is the classical single choice equivalence, where balls are placed into bins in each round.
is the number of balls in bin at the end of the th round resulting from algorithm .
is the number of balls in the most loaded bins at the end of the th round resulting from algorithm .
is the number of balls with height at least at the end of the th round resulting from algorithm .
is the number of bins with at least balls at the end of the th round resulting from algorithm .
denotes the maximum load after balls are placed into bins resulting from the -choice process.
For simplicity, we sometimes use (or ) to denote , the number of balls in bin at the end of the -choice process. We also use and to denote and , respectively.
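The two counting quantities in the list above (balls with at least a given height, and bins with at least a given load) can be read off a load vector directly. The sketch below is an illustration with hypothetical function names; it assumes that the balls in a bin holding `load` balls have heights 1 through `load`, which matches the convention that balls landing in the same bin receive distinct heights.

```python
def bins_with_load_at_least(loads, i):
    """Number of bins that hold at least i balls."""
    return sum(1 for load in loads if load >= i)


def balls_with_height_at_least(loads, i):
    """Number of balls whose height is at least i, for i >= 1.

    A bin with `load` balls contains balls at heights 1..load, so it
    contributes max(0, load - i + 1) balls of height at least i.
    """
    return sum(max(0, load - i + 1) for load in loads)
```

Every bin with at least `i` balls contains a ball of height exactly `i`, so the first count never exceeds the second. This is the elementary reason why a bound on the number of high balls also bounds the number of heavily loaded bins.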
The -choice process can run sequentially. In some parts of the analysis, we treat -choice as a sequential process and need a notion of bin state at any time .
Serialization of -choice For , let be a permutation of . In each round , a set of bins is chosen i.u.r. and each of balls is placed sequentially into a bin as follows. The first ball falls into the th least loaded bin in , the second ball falls into the th least loaded bin in , and so on. Let . By , we denote the serialized version of -choice induced by .
is the number of balls in bin at time (right after the th ball is placed and the bins are sorted), , resulting from .
is the probability that the th ball, , is placed into bin resulting from algorithm , where bin is the th most loaded bin at time (i.e., right before the th ball is placed).
Let and , where and are permutations of . If for some , then and may have different probability distributions and in general. There should be no confusion between and : The former is the load of bin at the end of th round (right after balls are placed), while the latter is the load of bin at time (right after balls are placed). We use to denote .
Let and be allocation processes starting with empty bins.
Let represent the number of balls in the most loaded bins at the end of process , , where denotes the number of balls in the th most loaded bin.
i) We say that and are equivalent, denoted , if
for and .
ii) We say that is majorized by , denoted , if
for and .
iii) We say that is dominated by , denoted , if
for and .
We note that domination is a stronger concept than majorization and that if is dominated by then the total number of balls in at the end of the process may be less than that in .
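For intuition, the relationship between the two orderings can be checked on fixed load vectors. The sketch below is an assumption-laden illustration rather than the paper's definition: the definitions above concern random allocation processes, whereas the code only tests the deterministic analogue (prefix sums of the sorted loads for majorization, entrywise comparison of the sorted loads for domination), and the function names are hypothetical.

```python
from itertools import accumulate


def is_majorized_by(x, y):
    """True if, for every i, the i most loaded bins of x hold at most as
    many balls in total as the i most loaded bins of y."""
    if len(x) != len(y):
        raise ValueError("load vectors must have the same number of bins")
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    return all(a <= b for a, b in zip(accumulate(xs), accumulate(ys)))


def is_dominated_by(x, y):
    """True if, for every i, the i-th most loaded bin of x holds at most
    as many balls as the i-th most loaded bin of y."""
    if len(x) != len(y):
        raise ValueError("load vectors must have the same number of bins")
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    return all(a <= b for a, b in zip(xs, ys))
```

In this deterministic setting, `[2, 2, 2]` is majorized by `[3, 2, 1]` but not dominated by it, while `[1, 1, 0]` is dominated (and hence majorized) by `[2, 1, 1]` even though it contains fewer balls in total.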
Let . By , we denote an allocation process starting with empty bins in which each ball chooses a bin i.u.r., say bin (the th most loaded bin), and is placed into the bin only if and discarded if .
3 Key Properties of -choice
In this section, we list useful properties of -choice which will be used later. First, we need the following lemma.
Let and be sets of Bernoulli random variables and let be a sequence of random variables in the range . Suppose that and , and that
for any and any . Then
for any .
3.1 Properties of -choice
Properties of -choice: Let and .
, for any choice of .
, if .
The basic techniques used in the proof of this lemma are majorization and coupling arguments (see [2, 5] for background on majorization and coupling). The first four properties are intuitively obvious and can be proved by a natural coupling, whereas the last property (v) needs a more sophisticated coupling argument. We provide a proof sketch for each of parts (i)-(iv) and a detailed analysis for (v).
Part (i): Consider the following natural coupling for and : In each round , the same set of random bins is chosen to probe for both and . For any permutation of , the numbers of balls in the most loaded bins for the two processes are equal at the end of round . That is, holds under this coupling.
Part (ii): Assume that, in each round , a set of random bins and a random subset of with bins are chosen to probe for and , respectively. Under this coupling, holds with certainty.
Part (iii): Property (iii) is obtained from (v) and (i) as follows:
Part (iv): We define a coupling that links one round for and rounds for as follows. Suppose that a set of random bins have been selected in the beginning of round for . Partition the set into random subsets with equal size, each of which is used as a set of random bins in each of rounds for . One can show that under this coupling .
Part (v): It suffices to show that
for some and . For fixed , let . That is, is a Bernoulli random variable which is if and only if the bin containing is one of the most loaded bins at time (i.e. right after the th ball is placed). Similarly, let . We will show that, for any , we have
For the rest of proof, we specify the processes and and define a coupling running on and in which (7) holds. We assume that without loss of generality.
Definition of : For , let denote the set . Define
We view the set as the space of all possible choices that balls have in each round for . The process begins by choosing a permutation of randomly and hence determines a list of sets . In each round , a set of random bins are selected. Let be a permutation of by which each of balls in round is allocated. That is, the th ball in the round is placed into the th least loaded bin if , and is placed into th least loaded bin if . Let .
Definition of : We view as a -random-bins-model, making comparable to as follows. In each round , a set of random bins (rather than bins) is selected first. Then one of the bins is chosen randomly and removed, and then balls fall into the least loaded among the remaining bins. Clearly this process is equivalent to . We describe this procedure formally as follows. For , define to be the set
Let . The process starts by choosing a vector randomly to determine a list of sets . In each round , choose a set of random bins to probe and a permutation of . Each ball (that belongs to round ) is placed sequentially into the th least loaded bin in . In the following coupling process, we choose based on used in the definition of .
Coupling: Fix first. We define a coupling for and . Assume that ball belongs to round for . That is, . Suppose that the process begins by choosing a permutation of randomly. Then the corresponding process of chooses a vector randomly under the following restriction on (and no restrictions on other entries):
Therefore, either or . Recall that each of the balls associated with under the process belongs to either round or round , and is placed into a bin by the permutation of . To keep the notations simple, we rename the permutation as . The key observation is that, by the choice of , there is a permutation of such that for all . We set to be to specify . Let be the balls associated with and in both processes. Under , ball , , is placed into th least loaded bin (out of the random bins chosen either in round or ). Under , ball is placed into th least loaded bin (out of the random bins chosen in round ). Let for some . The fact guarantees the inequality (7) as desired. ∎
3.2 Proof of Theorem 2
We observe that all the properties of -choice listed in the previous section hold when the allocation process is extended to the case balls. Therefore, by properties (iv) and (v),
holds regardless of the number of balls. Using the result on the heavily loaded case of -choice , we obtain Theorem 2. For , the behavior of the -choice in the heavily loaded case remains an open question. The rest of this paper is devoted to proving Theorem 1.
4 Upper Bound Analysis
In this section, we analyze upper bounds on the maximum load . A schematic diagram of the sorted bin load resulting from the -choice process is shown in Figure 1, where represents the number of balls in bin , the th most loaded bin. We select a suitable constant for which we break the maximum bin load into and , on each of which we derive an upper bound separately. Depending on the range of and we use different approaches as described in subsequent sections.
4.1 Upper Bound when
Throughout this subsection, we assume that .
4.1.1 Upper Bound on
First, we note the relation between and : Using the following two lemmas and the fact that , we obtain an upper bound on and in turn an upper bound on for some . Carefully chosen will make the bound essentially tight, as proved in Section 5.1.
For any ,
For any and ,
The main result of this section is formalized as follows.
Let and . The number of balls in bin resulting from -choice process is
with probability .
By Lemma 2,
holds with probability . Using Lemma 3 and the fact that , we obtain
Let be the smallest such that . Let . By the definition of , we have
Therefore, we have
Using Stirling’s formula we solve the above inequality for to obtain
where . Note that, since , (11) guarantees and hence . Therefore, with probability ,
Since implies , we complete the proof. ∎
4.1.2 Upper Bound on
We revisit the layered induction approach presented in [2, 11] (for example, see pages 9-13 in ). The key formulation in that analysis is the recursive definition for expressed in terms of , where is the number of bins with load at least . Azar et al. and Mitzenmacher showed that the sequence of decreases doubly exponentially (with high probability). Our approach here is similar to the existing analysis. In the -choice context, however, the interplay among balls within a round, in addition to the dependencies between different rounds, poses several challenges. For example, depends not only on but also on some other bins with load less than ; if some bins with fewer than balls are sampled multiple times, they may receive multiple balls and in turn become another source that increases the value of . Furthermore, the random variables we deal with take on values in the range , and therefore the Chernoff bounds on the sum of Bernoulli random variables no longer apply to our case. In addition, the Chernoff-Hoeffding bounds that hold for random variables in a large range are not strong enough to guarantee the tight bound we desire.
Let represent the number of balls placed in the th round of -choice with height at least . For , we have
where denotes the bins that received a ball in round and .
In the following two lemmas, we discuss a Chernoff-type tail bound that holds on the sum of non-Bernoulli random variables under a specific condition.
Let be independent random variables with and .
If is decreasing by (at least) a factor of (that is, ), then the following results hold.
i) For , we have
ii) If , then