
# Smoothed Analysis of Balancing Networks∗

Tobias Friedrich, Max-Planck-Institut für Informatik, Saarbrücken, Germany; Thomas Sauerwald, Simon Fraser University, Burnaby, Canada; and Dan Vilenchik, Department of Mathematics, University of California Los Angeles, CA, USA
###### Abstract.

In a balancing network each processor has an initial collection of unit-size jobs (tokens) and in each round, pairs of processors connected by balancers split their load as evenly as possible. An excess token (if any) is placed according to some predefined rule. As it turns out, this rule crucially affects the performance of the network. In this work we propose a model that studies this effect, bridging the uniformly-random assignment rule and the arbitrary one (in the spirit of smoothed analysis). We start with an arbitrary assignment of balancer directions and then flip each assignment with probability $\alpha$ independently. For a large class of balancing networks our result implies that after $O(\log n)$ rounds the discrepancy is $O\big((1/2-\alpha)\log n+\log\log n\big)$ with high probability. This matches and generalizes known upper bounds for $\alpha=0$ and $\alpha=1/2$. We also show that a natural network matches the upper bound for any $\alpha$.

A conference version [10] appeared in the 36th International Colloquium on Automata, Languages and Programming (ICALP 2009). This work was done while the first two authors were postdoctoral fellows at International Computer Science Institute Berkeley supported by the German Academic Exchange Service (DAAD) and the third author was a postdoctoral fellow at the Computer Science Division of the University of California Berkeley.

## 1. Introduction

In this work we are concerned with two topics whose names contain the word “smooth”, but with totally different meanings. The first is balancing (smoothing) networks, the second is smoothed analysis. Let us start by introducing these two topics, and then present our contribution – interrelating the two.

### 1.1. Balancing (smoothing) networks

In the standard abstraction of smoothing (balancing) networks [2], processors are modeled as the vertices of a graph and the connections between them as edges. Each processor has an initial collection of unit-size jobs (which we call tokens). Tokens are routed through the network by transmitting tokens along the edges according to some local rule. We measure the quality of such a balancing procedure by the maximum difference between the number of tokens at any two vertices at the end.

The local scheme of communication we study is a balancer gate: the number of tokens is split as evenly as possible between the communicating vertices with the excess token (if such remains) routed to the vertex towards which the balancer points. More formally, the balancing network consists of $n$ vertices and a sequence of matchings (either perfect or not) $M^{(1)},M^{(2)},\ldots$ We associate with every matching edge a balancer gate (that is, we think of the edges as directed edges). At the beginning of the first iteration, an initial number of tokens is placed at every vertex, and at every iteration $t$, the vertices of the network perform a balancing operation according to the matching $M^{(t)}$ (that is, vertices $u$ and $v$ interact if they are matched in $M^{(t)}$).
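The gate just described can be sketched in a few lines of Python. This is a minimal illustration, not the paper's formalism; in particular, encoding a balancer as an ordered pair `(u, v)` meaning "points to `u`" is our own convention.

```python
def balance_round(load, matching):
    """Apply one round of balancer gates.

    load     -- list of token counts, one per vertex
    matching -- list of directed edges (u, v); the excess token
                (if the pair's total is odd) is routed to u.
    """
    new_load = list(load)
    for u, v in matching:
        total = load[u] + load[v]
        new_load[u] = total // 2 + total % 2   # gets the excess token, if any
        new_load[v] = total // 2
    return new_load
```

For example, `balance_round([5, 2, 0, 0], [(0, 1), (2, 3)])` returns `[4, 3, 0, 0]`; note that the total number of tokens is always conserved.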

One motivation for considering smoothing networks comes from the server-client world. Each token represents a client request for some service; the service is provided by the servers residing at the vertices. Routing tokens through the network must ensure that all servers receive approximately the same number of tokens, no matter how unbalanced the initial number of tokens is (cf. [2]). More generally, smoothing networks are attractive for multiprocessor coordination and load balancing applications where low-contention is a requirement; these include producers-consumers [11] and distributed numerical computations [3]. Together with counting networks, smoothing networks have been studied quite extensively since introduced in the seminal paper of Aspnes et al. [2].

Herlihy and Tirthapura [12, 13] initiated the study of the $\mathsf{CCC}$ network (cube-connected cycles, see Figure 1) as a smoothing network. (Actually, they considered the so-called block network; however, it was observed in [16] that the block network is isomorphic to the $\mathsf{CCC}$-network, and therefore we will stick to the latter in the following.) For the special case of the $\mathsf{CCC}$, sticking to previous conventions, we adopt a “topographical” view of the network, thus calling the vertices wires, and looking at the left-most side of the network as the “input” and the right-most as the “output”. In the $\mathsf{CCC}$, two wires at layer $\ell$ are connected by a balancer if the respective bit strings of the wires differ exactly in bit $\ell$. The $\mathsf{CCC}$ is a canonical network in the sense that it has the smallest possible depth (number of rounds) of $\log n$, as a smaller depth cannot ensure any discrepancy independent of the initial one. Moreover, it has been used in more advanced constructions such as the periodic (counting) network [2, 6].
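To make the wiring concrete, here is a small Python sketch that enumerates the balancer matchings of a $\mathsf{CCC}$ layer by layer. The bit-ordering convention (most significant bit first) is our assumption for illustration; the text fixes only that each layer examines one bit.

```python
def ccc_matchings(log_n):
    """Balancer matchings of a CCC on n = 2**log_n wires, one list per
    layer: layer i (0-indexed) pairs wires whose binary labels differ
    in exactly one bit. MSB-first bit order is an assumed convention."""
    n = 1 << log_n
    layers = []
    for i in range(log_n):
        bit = 1 << (log_n - 1 - i)          # bit examined at this layer
        layers.append([(w, w | bit) for w in range(n) if not w & bit])
    return layers
```

For $n=4$ this yields the two layers $\{(0,2),(1,3)\}$ and $\{(0,1),(2,3)\}$: every wire meets exactly one balancer per layer, and the depth is $\log n$.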

As it turns out, the initial setting of the balancers’ directions is crucial. Two popular options are an arbitrary orientation or a uniformly random one. A maximal discrepancy of $O(\log n)$ was established for the $\mathsf{CCC}$ under an arbitrary initial orientation [13]. For a random initial orientation of the $\mathsf{CCC}$, [12] show a discrepancy of at most $O(\sqrt{\log n})$ (this holds whp over the random initialization; writing whp we mean with probability tending to $1$ as $n$ goes to infinity), which was improved by Mavronicolas and Sauerwald [16] to $\log\log n+O(1)$ (along with a matching lower bound).

Results for more general networks have been derived in Rabani et al. [17] for arbitrary orientations. For expander graphs, they show an $O(\log n)$-discrepancy after $O(\log(Kn))$ rounds, where $K$ is the discrepancy of the initial load vector. This was recently strengthened assuming the orientations are set randomly and, in addition, the matchings themselves are chosen randomly [9]. Specifically, for expander graphs one can achieve within the same number of rounds a discrepancy of only $O(\log\log n)$.

### 1.2. Smoothed analysis

Let us now turn to the second meaning of “smoothed”. Smoothed analysis bridges the gap between the random instance, which typically has a very specific “unrealistic” structure, and the completely arbitrary instance, which in many cases reflects just the worst-case scenario and is thus over-pessimistic in general. In the smoothed analysis paradigm, first an adversary generates an input instance, then this instance is randomly perturbed.

The smoothed analysis paradigm was introduced by Spielman and Teng in 2001 [19] to help explain why the simplex algorithm for linear programming works well in practice but not in (worst-case) theory. They considered instances formed by taking an arbitrary constraint matrix and perturbing it by adding independent Gaussian noise with variance $\sigma^2$ to each entry. They showed that, in this case, the shadow-vertex pivot rule succeeds in expected polynomial time. Independently, Bohman et al. [4] studied the issue of Hamiltonicity in a dense graph when random edges are added. In the context of graph optimization problems we can also mention [14, 8], in the context of $k$-SAT [7, 5], and in various other problems [18, 15, 20, 1].

Our work joins this long series of papers studying perturbed instances of a variety of problems. Specifically, in our setting we study the following question: what if the balancers are set neither completely adversarially nor in a completely random fashion? Besides the mathematical and analytical challenge that such a problem poses, in real network applications one may not always assume that the random source is unbiased, and in some cases one will not be able to quantitatively measure the amount of randomness involved in the network generation. Still, it is desirable to have an estimate of the typical behavior of the network. Although we do not claim that our smoothed-analysis model captures all possible behaviors, it does give a rigorous and tight characterization of the tradeoff between the quality of load balancing and the randomness involved in setting the balancers’ directions, under rather natural probabilistic assumptions.

As far as we know, no smoothed analysis framework has been suggested for a networking-related problem before. Formally, we suggest the following framework.

### 1.3. The Model

We define both the smoothed-analysis aspect of the model, and the load-balancing one. For the load balancing part, our model is similar to (and, as we will shortly explain, a generalization of) the periodic balancing circuits studied in [17]. We think of the balancing network in terms of an $n$-vertex graph. The processors in the network are the vertices of the graph, and balancers connecting processors are the (directed) edges of the graph.

Before we proceed, since what follows is somewhat heavy on notation and indices, it will be helpful for the reader to bear in mind the following legend: we use superscripts (in round brackets) to denote a time stamp, and subscripts to denote an index. In subscripts, we use the vertices of the graph as indices (thus assuming some ordering of the vertex set). For example, $P^{(t)}_{u,v}$ stands for the $(u,v)$-entry of the matrix $P^{(t)}$, which corresponds to time/round $t$.

Let $M^{(1)},M^{(2)},\ldots$ be an arbitrary sequence of (not necessarily perfect) matchings. With each matching $M^{(t)}$ we associate a matrix $P^{(t)}$ with $P^{(t)}_{u,v}=\frac12$ if $u$ and $v$ are matched in $M^{(t)}$, $P^{(t)}_{u,u}=\frac12$ if $u$ is matched in $M^{(t)}$, $P^{(t)}_{u,u}=1$ if $u$ is not matched in $M^{(t)}$, and $P^{(t)}_{u,v}=0$ otherwise.
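As a sanity check of this definition, the following sketch builds the matrix of a matching as a plain list of rows (the helper name is ours):

```python
def matching_matrix(n, matching):
    """Averaging matrix P of a matching on n vertices: matched u, v get
    P[u][u] = P[v][v] = P[u][v] = P[v][u] = 1/2, while an unmatched
    vertex u keeps P[u][u] = 1."""
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in matching:
        P[u][u] = P[v][v] = 0.5
        P[u][v] = P[v][u] = 0.5
    return P
```

The resulting matrix is symmetric and doubly stochastic, so the all-one vector is a fixed point — multiplying a load vector by it performs exact (divisible) averaging along the matching.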

In round $t$, every two vertices matched in $M^{(t)}$ perform a balancing operation. That is, the sum of the number of tokens in both vertices is split evenly between the two, with the remaining token (if one exists) directed to the vertex pointed to by the matching edge.

###### Remark 1.1.

In periodic balancing networks (see [17] for example) an ordered set of (usually perfect) matchings is fixed. Every round of balancing is a successive application of the matchings. Our model is a (slight) generalization of the latter.

Let us now turn to the smoothed-analysis part. Given a balancing network consisting of a set of directed matchings, an $\alpha$-perturbation of the network is a flip of direction of every edge with probability $\alpha$, independently of all other edges. Setting $\alpha=0$ gives the completely “adversarial” model, and $\alpha=1/2$ is the uniformly random case.
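The perturbation itself is a one-liner; in this sketch (our encoding, continuing the convention that an ordered pair points to its first coordinate) a flip is represented by reversing the pair:

```python
import random

def perturb(matchings, alpha, rng=random):
    """alpha-perturbation: flip the direction of every balancer edge
    (u, v) -> (v, u) independently with probability alpha."""
    return [[(v, u) if rng.random() < alpha else (u, v) for (u, v) in m]
            for m in matchings]
```

With `alpha=0` the network is unchanged (the adversarial model); with `alpha=0.5` every orientation becomes uniformly random, regardless of the initial assignment.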

###### Remark 1.2.

For our results, it suffices to consider $\alpha\le\frac12$. The case $\alpha>\frac12$ can be reduced to the case $1-\alpha\le\frac12$ by flipping the initial orientation of all balancers and taking $1-\alpha$ instead of $\alpha$. It is easy to see that both distributions are identical.

### 1.4. Our Contribution

For a load vector $x=(x_1,\ldots,x_n)$, its discrepancy is defined to be $\max_u x_u-\min_u x_u$. We use $e_u$ to denote the unit vector all of whose entries are $0$ except the $u$-th, which is $1$. For a matrix $P$, $\lambda(P)$ stands for the second largest eigenvalue of $P$ (in absolute value). Unless stated otherwise, $\|x\|$ stands for the $\ell_2$-norm of the vector $x$. In the following, we will assume an ordering of the vertices from $1$ to $n$. When we write $[u:v]$, we refer to the case where $u$ and $v$ are connected by an undirected edge and $u<v$.

###### Theorem 1.1.

Let $\mathcal{N}$ be a balancing network with matchings $M^{(1)},\ldots,M^{(t_2)}$. For any two time stamps $t_1\le t_2$, and any input vector with initial discrepancy $K$, the discrepancy at time step $t_2$ in the $\alpha$-perturbed $\mathcal{N}$ is whp at most

$$\lambda\big(P^{[1,t_2]}\big)\,K\sqrt{n}+(t_2-t_1)+3\Big(\frac12-\alpha\Big)t_1+\Lambda_1+\Lambda_2,$$

where

$$\Lambda_1=\max_{w\in V}\sqrt{\log n\sum_{i=1}^{t_1}\sum_{[u:v]\in M^{(i)}}\Big((e_u-e_v)^{\mathsf{T}}\Big(\prod_{j=i+1}^{t_2}P^{(j)}\Big)e_w\Big)^2},\qquad\Lambda_2

Before we proceed, let us motivate the result stated in Theorem 1.1. There are two factors that affect the discrepancy: the fact that tokens are indivisible (and therefore a balancing operation may not be “perfect”), and how many balancing rounds there are. On the one hand, the more rounds there are, the more balancing operations are carried out, and the smoother the output is. On the other hand, the longer the process runs, the more its susceptibility to rounding errors and arbitrary placement of excess tokens increases. This is, however, only a seeming tension, as indeed the more rounds there are, the smoother the output is. Nevertheless, in the analysis (at least as we carry it out), this tension plays a part. Specifically, optimizing over these two contesting tendencies is reflected in the choice of $t_1$ and $t_2$: one part of the bound is the contribution resulting from the number of balancing rounds being bounded, while the remaining terms account for the indivisibility of the tokens. In the cases that will interest us, $t_1$ and $t_2$ will be chosen so that $\Lambda_1$ and $\Lambda_2$ are low-order terms compared to the rest of the bound.

Our Theorem 1.1 also implies the following results:

• For the aforementioned periodic setting, Theorem 1.1 implies the following: after sufficiently many rounds ($\nu$ is the spectral gap of the so-called round matrix which corresponds to one period of $d$ matchings, and $K$ is the initial discrepancy) the discrepancy is whp at most

$$O\Big(\frac{d\log(Kn)}{\nu}\cdot\Big(\frac12-\alpha\Big)+\frac{d\log\log n}{\nu}\Big).$$

Setting $\alpha=0$ (and assuming $K$ is polynomial in $n$) we get the result of [17] (we point out that in the original statement in [17, Corollary 5], only the number of periods is counted; hence, in their statement the number of rounds is smaller by a factor of $d$), and for $\alpha=1/2$ we get the result of [9]. (The restriction on $K$ being polynomial can be lifted, but at the price of more cumbersome expressions in Theorem 1.1. Arguably, the interesting cases are anyway those in which the total number of tokens, and in particular $K$, is polynomial.) Complete details in Section 4.

• For the $\mathsf{CCC}$, after $\log n$ rounds the discrepancy is whp at most

$$3\Big(\frac12-\alpha\Big)\log n+\log\log n+O(1).$$

Full details in Section 5.
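The bound above can be explored empirically with a self-contained simulation of an $\alpha$-perturbed $\mathsf{CCC}$ started from the all-up orientation with uniform random inputs. All conventions here (bit order, "up" = smaller wire index) are our illustrative assumptions, not the paper's exact definitions.

```python
import random

def simulate_ccc(log_n, alpha, seed=0):
    """Run an alpha-perturbed CCC on n = 2**log_n wires: all balancers
    initially point 'up' (to the smaller wire index), each is flipped
    independently with probability alpha, and the input at each wire is
    uniform over {0, ..., n-1}. Returns the final discrepancy max - min."""
    rng = random.Random(seed)
    n = 1 << log_n
    load = [rng.randrange(n) for _ in range(n)]
    for i in range(log_n):
        bit = 1 << (log_n - 1 - i)
        for u in range(n):
            if u & bit:
                continue
            v = u | bit
            total = load[u] + load[v]
            up = rng.random() >= alpha       # does the balancer still point up?
            load[u] = total // 2 + (total % 2 if up else 0)
            load[v] = total // 2 + (0 if up else total % 2)
    return max(load) - min(load)
```

Averaging `simulate_ccc(log_n, alpha, seed)` over many seeds for $\alpha\in\{0,0.25,0.5\}$ reproduces the qualitative picture of the theorems: the discrepancy shrinks from order $\log n$ towards order $\log\log n$ as $\alpha$ grows.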

Let us now turn to the lower bound. Here we consider the all-up orientation of the balancers of a $\mathsf{CCC}$, meaning that before the $\alpha$-perturbation, all balancers are directed to the wire with the smaller number.

###### Theorem 1.2.

Consider a $\mathsf{CCC}$ with the all-up orientation of the balancers and assume that the number of tokens at each wire is uniformly distributed over $\{0,\ldots,n-1\}$ (independently at each wire). The discrepancy of the $\alpha$-perturbed network is whp at least

$$\max\Big\{\Big(\frac12-\alpha\Big)\log n-2\log\log n,\;(1-o(1))\,\frac{\log\log n}{2}\Big\}.$$

Two more points to note regarding the lower bound:

• For $\alpha=0$ our lower bound matches the experimental findings of [12]. The authors examined $\mathsf{CCC}$’s where all balancers are set in the same direction and the number of tokens at each input wire is random. Their observation was that the average discrepancy is close to $\frac12\log n$ (which matches our lower bound with $\alpha=0$).

• The input distribution that we use for the lower bound is arguably more natural than the tailored and somewhat artificial ones used in previous lower bound proofs [13, 16].

Finally, we state a somewhat more technical result that we obtain, which lies at the heart of the proof of the lower bound and sheds light on the mechanics of the $\mathsf{CCC}$ on an average-case input. In what follows, for a balancer $b$, we let $\mathrm{Odd}(b)$ be an indicator function which is 1 if $b$ had an excess token. By $B_w$ we denote the set of balancers that affect output wire $w$ (that is, there is a path through consecutive layers from the balancers to the output wire $w$). $B_w(\ell)$ is the restriction of $B_w$ to balancers in layer $\ell$.

###### Lemma 1.3.

Consider a $\mathsf{CCC}$ network with any fixed orientation of the balancers. Assume that the number of tokens at each wire is uniformly distributed over $\{0,\ldots,n-1\}$ (independently at each wire). Then every balancer $b$ in layer $\ell$, $1\le\ell\le\log n$, satisfies the following properties:

• $\Pr[\mathrm{Odd}(b)=1]=\frac12$,

• moreover, for every wire $w$, $\{\mathrm{Odd}(b)\,:\,b\in B_w\}$ is a set of independent random variables.

The proof of the lemma is given in Section 2.3. Let us remark at this point that the fact that the lemma holds under such weak conditions is rather surprising. First, it is valid regardless of the given orientation. Secondly, the $\mathrm{Odd}(b)$’s of the balancers that affect the same output wire are independent. While this is obvious for balancers that are in the same layer, it seems somewhat surprising for balancers in subsequent layers that are connected.
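The first property of the lemma is easy to probe empirically. The sketch below simulates a small $\mathsf{CCC}$ with the all-up orientation (bit-order convention is ours) and estimates the probability that one fixed balancer sees an odd total:

```python
import random

def odd_frequency(log_n, layer, trials=20000, seed=1):
    """Estimate Pr[Odd(b) = 1] for the balancer on wire 0 in the given
    1-indexed layer of a CCC with the all-up orientation, when the input
    at each wire is uniform over {0, ..., n-1}."""
    rng = random.Random(seed)
    n = 1 << log_n
    hits = 0
    for _ in range(trials):
        load = [rng.randrange(n) for _ in range(n)]
        for i in range(log_n):
            bit = 1 << (log_n - 1 - i)
            for u in range(n):
                if u & bit:
                    continue
                v = u | bit
                total = load[u] + load[v]
                if i + 1 == layer and u == 0:
                    hits += total % 2          # Odd(b) for our fixed balancer
                load[u] = total // 2 + total % 2   # excess token goes 'up'
                load[v] = total // 2
    return hits / trials
```

The estimate should come out near $\frac12$ for every layer, as the lemma predicts.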

### 1.5. Paper’s Organization

The remainder of the paper is organized as follows. We set out with the proof of Theorem 1.2 in Section 2, preceding the proof of Theorem 1.1 in Section 3. The reason is that the lower bound is concerned with the $\mathsf{CCC}$ (a special case of our general model). The techniques used in the proof of the lower bound serve as a good introduction to the more complicated proof of the upper bound. In Sections 4 and 5 we show how to derive the special cases of the periodic balancing network and the $\mathsf{CCC}$ from Theorem 1.1. Finally, we present experimental results that we obtained for the $\mathsf{CCC}$ in Section 6.

## 2. Proof of the Lower Bound

As we mentioned before, for the special case of the $\mathsf{CCC}$ we adopt a “topographical” view of the network: calling the vertices wires, the time steps layers, the left-most side of the network the “input” and the right-most the “output”.

The proof outline is the following. Given an input vector (uniformly distributed over $\{0,\ldots,n-1\}$ at each wire), we shall calculate the expected divergence from the average load $\mu$. The expectation is taken over both the smoothing operation and the input. After establishing the “right” order of divergence (in expectation) we shall prove a concentration result. One of the main keys to estimating the expectation is Lemma 1.3, saying that if the input is uniformly distributed as above, then for every balancer $b$, $\Pr[\mathrm{Odd}(b)=1]=\frac12$ (the probability is taken only over the input).

Before proceeding with the proof, let us introduce some further notation. Let $y_1$ be the number of tokens exiting on the top output wire of the network. For any balancer $b$, $\Psi(b)$ is a random variable which takes the value $-\frac12$ if the balancer was perturbed, and $+\frac12$ otherwise (recall that all balancers are pointing up before the perturbation takes place).

Using the “standard” backward (recursive) unfolding (see also [12, 16] for a concrete derivation for the $\mathsf{CCC}$) we obtain that

$$y_1=\mu+\sum_{\ell=1}^{\log n}2^{-\log n+\ell}\sum_{b\in B_1(\ell)}\mathrm{Odd}(b)\cdot\Psi(b).$$

The latter already implies that the discrepancy of the entire network is at least

$$y_1-\mu=\sum_{\ell=1}^{\log n}2^{-\log n+\ell}\sum_{b\in B_1(\ell)}\mathrm{Odd}(b)\cdot\Psi(b),$$

because there is at least one wire whose output has at most $\mu$ tokens (a further improvement by a factor of $2$ will be obtained by considering additionally the bottom output wire and proving that on this wire only a small number of tokens exit).

Write $y_1-\mu=\sum_{\ell=1}^{\log n}S_\ell$, defining for each layer $\ell$,

$$S_\ell:=2^{-\log n+\ell}\sum_{b\in B_1(\ell)}\mathrm{Odd}(b)\cdot\Psi(b). \qquad (1)$$

### 2.1. Proof of $(\frac12-\alpha)\log n-2\log\log n$

We now turn to bounding the expected value of $S_\ell$. Using the following facts: (a) the $\mathrm{Odd}(b)$ and $\Psi(b)$ are independent, (b) Lemma 1.3, which gives $\Pr[\mathrm{Odd}(b)=1]=\frac12$, (c) the simple fact that $\mathbb{E}[\Psi(b)]=\frac12-\alpha$, and (d) the fact that in layer $\ell$ there are $2^{\log n-\ell}$ balancers which affect output wire 1 (this is simply by the structure of the $\mathsf{CCC}$), we get

$$\mathbb{E}[S_\ell]=2^{-\log n+\ell}\sum_{b\in B_1(\ell)}\frac12\cdot\Big(\frac12-\alpha\Big)=\frac12\Big(\frac12-\alpha\Big).$$

This in turn gives that

$$\mathbb{E}[y_1-\mu]=\sum_{\ell=1}^{\log n}\mathbb{E}[S_\ell]=\frac12\Big(\frac12-\alpha\Big)\log n.$$

Our next goal is to claim that typically the discrepancy behaves like the expectation; in other words, we need a concentration result. Specifically, we apply Hoeffding’s bound to each layer separately. It is applicable as the random variables are independent for balancers within the same layer (such balancers concern disjoint sets of input wires, and the input to the network was chosen independently for each wire). For the bound to be useful we need the range of values of the random variables to be small. Thus, in the probabilistic argument, we shall be concerned only with the first $\log n-\log\log n$ layers (the last $\log\log n$ layers we shall bound deterministically). We use the following Hoeffding bound:

###### Lemma 2.1 (Hoeffding’s Bound).

Let $Z_1,\ldots,Z_n$ be a sequence of independent random variables with $a_i\le Z_i\le b_i$ for each $i$. Then for any number $\varepsilon>0$,

$$\Pr\Big[\Big|\sum_{i=1}^nZ_i-\mathbb{E}\Big[\sum_{i=1}^nZ_i\Big]\Big|\ge\varepsilon\Big]\le2\cdot\exp\Big(-\frac{2\varepsilon^2}{\sum_{i=1}^n(b_i-a_i)^2}\Big).$$
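For concreteness, the bound can be wrapped as a small helper (the function name is ours):

```python
import math

def hoeffding_bound(ranges, eps):
    """Two-sided Hoeffding tail bound for a sum of independent random
    variables Z_i with b_i - a_i = ranges[i]:
    Pr[|sum Z_i - E sum Z_i| >= eps] <= 2 exp(-2 eps^2 / sum ranges[i]^2)."""
    return 2.0 * math.exp(-2.0 * eps ** 2 / sum(r * r for r in ranges))
```

In the application that follows, the ranges are `[2**(l - log_n)] * (2**(log_n - l))` and `eps = 2**((l - log_n + loglog_n) / 2)`, so the exponent evaluates to $-2\log n$ and the bound is at most $n^{-1}$.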

For any random variable $Z$, let $\mathrm{Range}[Z]$ be the difference between the maximum and minimum values that $Z$ can attain. For a balancer $b$ in layer $\ell$ we plug in

$$Z_b=2^{-\log n+\ell}\cdot\mathrm{Odd}(b)\cdot\Psi(b),\qquad\varepsilon=2^{(\ell-\log n+\log\log n)/2},\qquad\mathrm{Range}[Z_b]^2=\big(2^{\ell-\log n}\big)^2,$$

and the sum is over the balancers in layer $\ell$ that affect output wire 1. Therefore,

$$\Pr\big[|S_\ell-\mathbb{E}[S_\ell]|\ge2^{(\ell-\log n+\log\log n)/2}\big]\le2\exp\Big(-\frac{2\cdot2^{\ell-\log n+\log\log n}}{2^{\ell-\log n}}\Big)\le n^{-1}.$$

In turn, with probability at least $1-\frac{\log n}{n}$ (take the union bound over at most $\log n$ terms):

$$\sum_{\ell=1}^{\log n-\log\log n}S_\ell\ge\frac12\Big(\frac12-\alpha\Big)(\log n-\log\log n)-\sum_{\ell=1}^{\log n-\log\log n}2^{(\ell-\log n+\log\log n)/2}.$$

The second term is just a geometric series with quotient $\sqrt2$, and therefore can be bounded by $\frac{1}{1-1/\sqrt{2}}\le4$.

For the last $\log\log n$ layers, we have that for every $\ell$, $|S_\ell|$ cannot exceed $\frac12$, and therefore their contribution, in absolute value, is at most $\frac12\log\log n$. Wrapping it up, whp

$$y_1-\mu=\sum_{\ell=1}^{\log n}S_\ell\ge\frac12\Big(\frac12-\alpha\Big)(\log n-\log\log n)-4-\frac12\log\log n.$$

The same calculation implies that the number of tokens at the bottom-most output wire deviates from $\mu$ in the same way (just in the opposite direction), so

$$y_n-\mu\le-\frac12\Big(\frac12-\alpha\Big)(\log n-\log\log n)+4+\frac12\log\log n.$$

Hence, the discrepancy is whp lower bounded by (using the union bound over the top and bottom wire, and not claiming independence)

$$y_1-y_n\ge\Big(\frac12-\alpha\Big)\log n-8-\Big(\frac32-\alpha\Big)\log\log n\ge\Big(\frac12-\alpha\Big)\log n-2\log\log n.$$

### 2.2. Proof of $(1-o(1))\log\log n/2$

The proof here goes along similar lines to Section 2.1, only that now we choose the set of balancers we apply it to more carefully. By the structure of the $\mathsf{CCC}$, the last $k$ layers form the parallel cascade of $n/2^k$ independent $\mathsf{CCC}$ subnetworks, each of which has $2^k$ wires (by independent we mean that the sets of balancers are disjoint).

We call a subnetwork good if, after an $\alpha$-perturbation of the all-up initial orientation, all the balancers affecting the top (or bottom) output wire were not flipped (that is, they still point up).

The first observation that we make is that whp (for a suitable choice of $k$, to be determined shortly) at least one subnetwork is good. Let us prove this fact.

The number of balancers affecting the top (or bottom) output wire in one of the subnetworks is at most $2^k$. In total, there are no more than $2^{k+1}$ balancers affecting both wires. The probability that none of these balancers was flipped is at least $(1-\alpha)^{2^{k+1}}\ge2^{-2^{k+1}}$ (using our assumption $\alpha\le\frac12$). Choosing $k=\log\log n-2$, this probability is at least $2^{-\log n/2}=n^{-1/2}$; there are at least $n/\log n$ such subnetworks, thus the probability that none is good is at most

$$\big(1-n^{-1/2}\big)^{n/\log n}=o(1).$$

Fix one good subnetwork and let $\mu'$ be the average load at the input to that subnetwork. Repeating the arguments from Section 2.1 (with $\alpha=0$, $n$ re-scaled to $2^k$, and now using the second item in Lemma 1.3, which guarantees that the probability of an excess token is still $\frac12$ for any orientation of the balancers) gives that on the top output wire of the subnetwork there are whp at least $\mu'+(1-o(1))\frac{\log\log n}{4}$ tokens, while on the bottom output wire there are whp at most $\mu'-(1-o(1))\frac{\log\log n}{4}$ tokens. Using the union bound, the discrepancy is whp at least their difference, that is, at least $(1-o(1))\frac{\log\log n}{2}$.

### 2.3. Proof of Lemma 1.3

The following observation is the key idea in proving Lemma 1.3. Recall that $B_w$ stands for the set of balancers that affect wire $w$. For a balancer $b$ in layer $\ell$, let $A(b)$ describe an assignment of $\mathrm{Odd}$-values for all balancers in preceding layers that affect $b$ (that is, there is a path from them to $b$ through consecutive layers).

###### Lemma 2.2.

Consider a $\mathsf{CCC}$ network with any orientation of the balancers. Assume that the number of tokens at each wire is uniformly distributed over $\{0,\ldots,n-1\}$. Consider a balancer $b$ in layer $\ell$ with $1\le\ell\le\log n$ and let $x_1,x_2$ denote the values on the two input wires that go into $b$. Then for any assignment $A(b)$, $x_i\bmod(n/2^{\ell-1})$ is uniformly distributed over $\{0,\ldots,n/2^{\ell-1}-1\}$, for $i\in\{1,2\}$.

The lemma easily implies Lemma 1.3: Consider a balancer $b$ in layer $\ell$, $1\le\ell\le\log n$, with two inputs $x_1,x_2$. Lemma 2.2 implies that each $x_i$ is uniformly distributed modulo $n/2^{\ell-1}$, and in particular is uniform over that range mod 2 (note that $n/2^{\ell-1}$ is even for $\ell\le\log n$). Hence both are odd/even with probability 1/2. Further observe that by the structure of the $\mathsf{CCC}$ network, the two input wires depend on disjoint sets of wires and balancers. By the two latter facts it follows that the sum $x_1+x_2$ is odd (or even) with probability $\frac12$, or in other words $\Pr[\mathrm{Odd}(b)=1]=\frac12$. The independence part follows from the fact that this is true for every conditioning on balancers from previous layers that affect $b$ (by Lemma 2.2) and the fact that balancers in the same layer that affect the same output wire are independent by construction, as those balancers depend on disjoint sets of balancers and disjoint sets of input wires.

###### Proof of Lemma 2.2.

We prove the lemma by induction on $\ell$, the layer of the balancer. The base case $\ell=1$ is immediate: the input to a balancer in layer 1 is just the original input, which is by definition distributed uniformly over $\{0,\ldots,n-1\}$, and $n/2^{\ell-1}=n$. Assume the lemma is true for all balancers in layer $\ell$ and consider a balancer $b$ in layer $\ell+1$. Let $x_1,x_2$ be the values on its two input wires. By the structure of the $\mathsf{CCC}$, the value on each wire is determined by a disjoint set of balancers and preceding wires, therefore we can treat, w.l.o.g., only $x_1$. Let $A_1(b)$ be the part of $A(b)$ that affects $x_1$. ($A_1(b)$ would be the set of balancers such that there is a path from them to $x_1$. Since the initial load on the input wires is chosen independently at every wire, and the sets of balancers affecting $x_1$ and $x_2$ are disjoint, indeed $A_1(b)$ only affects $x_1$, and the same applies for $A_2(b)$ and $x_2$.) Thus our goal is to calculate

$$\Pr\big[x_1\equiv k\bmod(n/2^{\ell})\;\big|\;A_1(b)\big].$$

Let $b'$ be the balancer in layer $\ell$ one of whose outlets is $x_1$, and let $x_1',x_2'$ be the values on its two inputs. Recall that $x_1=\lfloor(x_1'+x_2')/2\rfloor$, with a possible $+1$ addition in case this sum is odd and the balancer points in the direction of $x_1$. Furthermore it is easy to verify that for every $a$ (assume $a$ is even; if it is odd, write $a+1$ instead)

$$a/2\equiv k\ \big(\mathrm{mod}\ n/2^{\ell}\big)\iff a\equiv2k\ \big(\mathrm{mod}\ n/2^{\ell-1}\big).$$

Therefore, for the event $x_1\equiv k\bmod(n/2^{\ell})$ to occur, either

• $x_1'+x_2'\equiv2k\bmod(n/2^{\ell-1})$, or

• $x_1'+x_2'\equiv2k-1\bmod(n/2^{\ell-1})$ (assuming w.l.o.g. that the balancer points towards $x_1$; otherwise the sum equals $2k+1$).

Let us consider the first case.

$$\Pr\big[x_1'+x_2'\equiv2k\bmod(n/2^{\ell-1})\,\big|\,A_1(b)\big]=\Pr\Big[\bigvee_{i=0}^{n/2^{\ell-1}-1}x_1'\equiv i\bmod(n/2^{\ell-1})\wedge x_2'\equiv2k-i\bmod(n/2^{\ell-1})\,\Big|\,A_1(b)\Big].$$

Now observe that the values of $x_1'$ and $x_2'$ are determined independently, as again, by the structure of the $\mathsf{CCC}$, they involve disjoint sets of balancers and input wires. Similarly to $A_1(b)$ we can define $A_1'(b)$ and $A_2'(b)$, which correspond to the parts of $A_1(b)$ affecting $x_1'$ and $x_2'$. By the structure of the $\mathsf{CCC}$, $x_1'$ and $x_2'$ depend on disjoint sets of input wires (and balancers). As the input is chosen independently for every wire, $A_1'(b)$ does not affect $x_2'$ and similarly $A_2'(b)$ does not affect $x_1'$. Thus the latter reduces to

$$\sum_{i=0}^{n/2^{\ell-1}-1}\Pr\big[x_1'\equiv i\bmod(n/2^{\ell-1})\,\big|\,A_1'(b)\big]\cdot\Pr\big[x_2'\equiv2k-i\bmod(n/2^{\ell-1})\,\big|\,A_2'(b)\big].$$

By the induction hypothesis (applied to the balancer $b'$ at layer $\ell$), for every $i$, each of $x_1'$ and $x_2'$ is uniformly distributed modulo $n/2^{\ell-1}$, and therefore in particular the entire expression does not depend on $k$, or, put differently, is the same for every choice of $k$. The same argument holds for the case $x_1'+x_2'\equiv2k-1\bmod(n/2^{\ell-1})$. This completes the proof. ∎

## 3. Proof of the Upper Bound

We shall derive our bound by measuring the difference between the number of tokens at any vertex and the average load $\mu$ (as we did in the proof of the lower bound for the $\mathsf{CCC}$). Specifically, we shall bound $\max_w|y^{(t)}_w-\mu|$, $y^{(t)}_w$ being the number of tokens at vertex $w$ at time $t$ (we use $y^{(t)}$ for the vector of loads at time $t$). There are two contributions to the divergence from $\mu$ (which we analyze separately):

• The divergence of the idealized process from $\mu$, due to its finiteness (only a bounded number of rounds are performed).

• The divergence of the actual process from the idealized process, due to the indivisibility of the tokens.

The idea to compare the actual process to an idealized one was suggested in [17] and was combined with convergence results of Markov chains. Though we were inspired by the basic setup from [17] and the probabilistic analysis from [9], our setting differs in a crucial point: when dealing with the case $0<\alpha<\frac12$, we get a delicate mixture of the deterministic and the random model. For example, the random variables in our analysis are not symmetric anymore, which leads to additional technicalities (cf. Lemma 3.2).

Formally, let $\xi^{(t)}$ be the load vector of the idealized process at time $t$; then by the triangle inequality ($\mathbf{1}$ is the all-one vector)

$$\|y^{(t)}-\mu\mathbf{1}\|_\infty\le\|y^{(t)}-\xi^{(t)}\|_\infty+\|\xi^{(t)}-\mu\mathbf{1}\|_\infty.$$
###### Proposition 3.1.

Let $\mathcal{N}$ be a balancing network with matchings $M^{(1)},\ldots,M^{(t_2)}$. Then,

• $\|\xi^{(t_2)}-\mu\mathbf{1}\|_\infty\le\lambda\big(P^{[1,t_2]}\big)K\sqrt{n}$,

• whp over the $\alpha$-perturbation operation, $\|y^{(t_2)}-\xi^{(t_2)}\|_\infty\le(t_2-t_1)+3\big(\frac12-\alpha\big)t_1+\Lambda_1+\Lambda_2$.

Theorem 1.1 then follows. The proof of the first part of the proposition consists of standard spectral arguments and is given in Section 3.1 for completeness. The proof of the second part is more involved and is given in Section 3.2.

### 3.1. Proof of Proposition 3.1: Bounding $\|\xi^{(t_2)}-\mu\mathbf{1}\|_\infty$

Letting $\xi^{(0)}$ be the initial load vector, it is easily seen that

$$\xi^{(t_2)}=\xi^{(0)}P^{(1)}P^{(2)}\cdots P^{(t_2)},$$

where $P^{(i)}$ is the matrix corresponding to matching $M^{(i)}$ (as defined in Section 1.3). For simplicity let us abbreviate

$$P^{[i,j]}:=P^{(i)}P^{(i+1)}P^{(i+2)}\cdots P^{(j)}.$$
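The product $P^{[i,j]}$ can be computed directly; a small pure-Python sketch (helper names are ours):

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def product(mats):
    """Left-to-right product of a list of square matrices, as in P^{[i,j]}."""
    n = len(mats[0])
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for M in mats:
        P = mat_mul(P, M)
    return P
```

Each factor is doubly stochastic, so the product remains doubly stochastic and fixes the all-one vector — the two facts the spectral argument in this section relies on.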

Since $P^{[1,t_2]}$ is real valued and symmetric (as each of the $P^{(i)}$’s is), it has real-valued eigenvalues $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$ whose corresponding eigenvectors $v_1,\ldots,v_n$ form an orthogonal basis of $\mathbb{R}^n$. Next we observe that

$$\xi^{(t_2)}-\mu\mathbf{1}=\xi^{(0)}P^{[1,t_2]}-\mu\mathbf{1}P^{[1,t_2]}=\big(\xi^{(0)}-\mu\mathbf{1}\big)P^{[1,t_2]},$$

since $\mathbf{1}$ is an eigenvector of $P^{[1,t_2]}$ corresponding to the eigenvalue $1$. Furthermore, $\mu n$ is just the total (initial) number of tokens, and therefore by the definition of $K$ we get $\|\xi^{(0)}-\mu\mathbf{1}\|\le K\sqrt{n}$. Finally, let us project $\xi^{(0)}-\mu\mathbf{1}$ onto $v_1,\ldots,v_n$, that is, write $\xi^{(0)}-\mu\mathbf{1}=\sum_{i=2}^nc_iv_i$ ($c_1=0$ as $\xi^{(0)}-\mu\mathbf{1}$ is orthogonal to $\mathbf{1}$). For our goal it suffices to bound $\|\xi^{(t_2)}-\mu\mathbf{1}\|$ (recall that $\|\cdot\|$ refers to the $\ell_2$-norm), as for every vector $x$, $\|x\|_\infty\le\|x\|$. By the above,

$$\big\|\xi^{(t_2)}-\mu\mathbf{1}\big\|=\big\|(\xi^{(0)}-\mu\mathbf{1})P^{[1,t_2]}\big\|=\Big\|\sum_{i=2}^nc_iv_i\,P^{[1,t_2]}\Big\|=\Big\|\sum_{i=2}^nc_i\lambda_iv_i\Big\|.$$

Recall that $\lambda\big(P^{[1,t_2]}\big)$ denotes the second largest eigenvalue of $P^{[1,t_2]}$ in absolute value. By the definition of the $\ell_2$-norm, and using the fact that the $v_i$ form an orthogonal basis, the latter equals

$$\Big(\sum_{i=2}^nc_i^2\lambda_i^2\|v_i\|^2\Big)^{1/2}\le\lambda\big(P^{[1,t_2]}\big)\cdot\Big(\sum_{i=2}^nc_i^2\|v_i\|^2\Big)^{1/2}=\lambda\big(P^{[1,t_2]}\big)\cdot\big\|\xi^{(0)}-\mu\mathbf{1}\big\|\le\lambda\big(P^{[1,t_2]}\big)K\sqrt{n}.$$

### 3.2. Proof of Proposition 3.1: Bounding $\|y^{(t_2)}-\xi^{(t_2)}\|_\infty$

The proof of this part resembles in nature the proof of Theorem 1.2. Assuming an ordering of the network’s vertices, for a balancer $(u,v)\in M^{(t)}$ in round $t$, $u<v$, we set $\Phi^{(t)}_{u,v}=1$ if the initial direction (before the perturbation) is towards $u$ and $\Phi^{(t)}_{u,v}=-1$ otherwise (in the lower bound we considered the all-up orientation, thus we had no use for these variables). As in Section 2, for a balancer in round $t$, the random variable $\Psi^{(t)}_{u,v}$ is $-\frac12$ if the balancer is perturbed and $+\frac12$ otherwise. Using these notations we define a rounding vector $\rho^{(t)}$, which accounts for the rounding errors in step $t$. Formally,

$$\rho^{(t)}_u=\begin{cases}\mathrm{Odd}\big(y^{(t-1)}_u+y^{(t-1)}_v\big)\cdot\Psi^{(t)}_{u,v}\cdot\Phi^{(t)}_{u,v}&\text{if $u$ and $v$ are matched in $M^{(t)}$ and $u<v$,}\\-\,\mathrm{Odd}\big(y^{(t-1)}_u+y^{(t-1)}_v\big)\cdot\Psi^{(t)}_{v,u}\cdot\Phi^{(t)}_{v,u}&\text{if $u$ and $v$ are matched in $M^{(t)}$ and $u>v$,}\\0&\text{if $u$ is unmatched.}\end{cases}$$

Now we can write the actual process as follows:

$$y^{(t)}=y^{(t-1)}P^{(t)}+\rho^{(t)}. \qquad (2)$$
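Equation (2) can be checked mechanically: the sketch below performs one integral round and returns the rounding vector explicitly. Here `phi` encodes the direction after the perturbation; names and encoding are ours.

```python
def actual_step(y, matching, phi):
    """One round of the integral process: returns (y_new, rho) with
    y_new = y P + rho entrywise, P being the averaging matrix of the
    matching. phi[(u, v)] is +1 if the (possibly perturbed) balancer
    points to u and -1 if it points to v."""
    y_new = [float(x) for x in y]
    rho = [0.0] * len(y)
    for u, v in matching:
        total = y[u] + y[v]
        r = (total % 2) * phi[(u, v)] / 2.0   # rounding error in {-1/2, 0, +1/2}
        y_new[u] = total / 2.0 + r
        y_new[v] = total / 2.0 - r
        rho[u], rho[v] = r, -r
    return y_new, rho
```

For `y = [5, 2]`, `matching = [(0, 1)]` and `phi = {(0, 1): 1}` this gives `y_new = [4.0, 3.0]` and `rho = [0.5, -0.5]`, i.e. the idealized average $(3.5, 3.5)$ plus the rounding vector.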

Let $M^{(t)}_{\mathrm{Even}}$ be the set of balancers at time $t$ with no excess token, and $M^{(t)}_{\mathrm{Odd}}$ the ones with an excess token. We can rewrite $\rho^{(t)}$ as follows:

$$\rho^{(t)}=\sum_{(u,v)\in M^{(t)}_{\mathrm{Odd}}}\Psi^{(t)}_{u,v}\cdot\Phi^{(t)}_{u,v}\cdot(e_u-e_v).$$

Unfolding equation (2) then yields

$$y^{(t)}=y^{(0)}P^{[1,t]}+\sum_{i=1}^t\rho^{(i)}P^{[i+1,t]},$$

where $P^{[t+1,t]}$ is defined to be the identity matrix. Observe that $y^{(0)}P^{[1,t]}$ is just $\xi^{(t)}$ (as $y^{(0)}=\xi^{(0)}$), and therefore

$$y^{(t)}-\xi^{(t)}=\sum_{i=1}^t\rho^{(i)}P^{[i+1,t]}=\sum_{i=1}^t\sum_{(u,v)\in M^{(i)}_{\mathrm{Odd}}}\Psi^{(i)}_{u,v}\cdot\Phi^{(i)}_{u,v}\cdot(e_u-e_v)\,P^{[i+1,t]}.$$

In turn,

$$\big(y^{(t)}-\xi^{(t)}\big)_w=\sum_{i=1}^t\sum_{(u,v)\in M^{(i)}_{\mathrm{Odd}}}\Psi^{(i)}_{u,v}\cdot\Phi^{(i)}_{u,v}\cdot\big(P^{[i+1,t]}_{u,w}-P^{[i+1,t]}_{v,w}\big). \qquad (3)$$

Our next task is to bound equation (3) to obtain the desired term from Proposition 3.1. We do that similarly in spirit to the way we proceeded in Section 2.1. We break this sum into its first $t_1$ summands (whose expected sum we calculate and to which we apply a large-deviation bound). The remaining terms are bounded deterministically.

One major difficulty in the general setting is that Lemma 1.3 (which was crucial in the proof of Theorem 1.2) does not hold in general, as its proof makes substantial use of the special structure of the $\mathsf{CCC}$.

Equation (3) with $t=t_2$ yields

$$\big(y^{(t_2)}-\xi^{(t_2)}\big)_w=\sum_{i=1}^{t_2}\sum_{(u,v)\in M^{(i)}_{\mathrm{Odd}}}\Psi^{(i)}_{u,v}\Phi^{(i)}_{u,v}\big(P^{[i+1,t_2]}_{u,w}-P^{[i+1,t_2]}_{v,w}\big).$$

With $e_{u,v}$ denoting the row-vector with $1$ at the $u$-th column and $-1$ at the $v$-th column and zeros elsewhere, we can rewrite and split this equation as follows:

$$\big(y^{(t_2)}-\xi^{(t_2)}\big)_w=\sum_{i=1}^{t_1}\sum_{(u,v)\in M^{(i)}_{\mathrm{Odd}}}\Psi^{(i)}_{u,v}\Phi^{(i)}_{u,v}\big(e_{u,v}P^{[i+1,t_2]}e_w\big)+\sum_{i=t_1+1}^{t_2}\sum_{(u,v)\in M^{(i)}_{\mathrm{Odd}}}\Psi^{(i)}_{u,v}\Phi^{(i)}_{u,v}\big(e_{u,v}P^{[i+1,t_2]}e_w\big).$$

Clearly,

$$\sum_{i=t_1+1}^{t_2}\sum_{(u,v)\in M^{(i)}_{\mathrm{Odd}}}\Psi^{(i)}_{u,v}\Phi^{(i)}_{u,v}\big(e_{u,v}P^{[i+1,t_2]}e_w\big)=\sum_{i=t_1+1}^{t_2}\Big(\sum_{(u,v)\in M^{(i)}_{\mathrm{Odd}}}\Psi^{(i)}_{u,v}\Phi^{(i)}_{u,v}e_{u,v}\Big)\cdot\big(P^{[i+1,t_2]}e_w\big).$$

Observe that $\sum_{(u,v)\in M^{(i)}_{\mathrm{Odd}}}\Psi^{(i)}_{u,v}\Phi^{(i)}_{u,v}e_{u,v}$ is a vector all of whose entries are bounded by $\frac12$ in absolute value. Since $P^{[i+1,t_2]}$ is a stochastic matrix, the sum of all entries of the $w$-th column of $P^{[i+1,t_2]}$, which is $P^{[i+1,t_2]}e_w$, is exactly one, hence

$$\Big|\sum_{i=t_1+1}^{t_2}\Big(\sum_{(u,v)\in M^{(i)}_{\mathrm{Odd}}}\Psi^{(i)}_{u,v}\Phi^{(i)}_{u,v}e_{u,v}\Big)\cdot\big(P^{[i+1,t_2]}e_w\big)\Big|\le(t_2-t_1).$$

It remains to bound

$$W_{\mathrm{Odd}}:=\sum_{i=1}^{t_1}\sum_{(u,v)\in M^{(i)}_{\mathrm{Odd}}}\Psi^{(i)}_{u,v}\Phi^{(i)}_{u,v}\big(e_{u,v}P^{[i+1,t_2]}e_w\big).$$

Because $W_{\mathrm{Odd}}$ is not necessarily a sum of independent random variables, it will be more convenient to work with the following quantity (which is a sum of independent random variables, as it assumes that every balancer gets an excess token):

$$W:=\sum_{i=1}^{t_1}\sum_{(u,v)\in M^{(i)}}\Psi^{(i)}_{u,v}\Phi^{(i)}_{u,v}\big(e_{u,v}P^{[i+1,t_2]}e_w\big).$$

So our strategy is first to bound the deviation of $W$ from its mean by Hoeffding’s bound, and then apply the following lemma (whose proof is in Section 3.3), which justifies using $W$ instead of $W_{\mathrm{Odd}}$.

###### Lemma 3.2.

Fix $w\in V$. For all $t_1\le t_2$ and arbitrary weights $w^{(i)}_{u,v}$, let

$$W_{\mathrm{Odd}}:=\sum_{i=t_1}^{t_2}\sum_{(u,v)\in M^{(i)}_{\mathrm{Odd}}}\Psi^{(i)}_{u,v}\Phi^{(i)}_{u,v}w^{(i)}_{u,v},\qquad W_{\mathrm{Even}}:=\sum_{i=t_1}^{t_2}\sum_{(u,v)\in M^{(i)}_{\mathrm{Even}}}\Psi^{(i)}_{u,v}\Phi^{(i)}_{u,v}w^{(i)}_{u,v}.$$