# Ignore or Comply? On Breaking Symmetry in Consensus

## Abstract

We study consensus processes on the complete graph of nodes. Initially, each node supports one opinion from a set of up to different opinions. Nodes randomly and in parallel sample the opinions of a constant number of nodes. Based on these samples, they use an update rule to change their own opinion. The goal is to reach consensus, a configuration where all nodes support the same opinion.

We compare two well-known update rules: 2-Choices and 3-Majority. In the former, each node samples two nodes and adopts their opinion if they agree. In the latter, each node samples three nodes: if an opinion is supported by at least two samples, the node adopts it; otherwise, it randomly adopts one of the sampled opinions. Known results for these update rules focus on initial configurations with a limited number of colors (say ), or typically assume a bias, where one opinion has a much larger support than any other. For such biased configurations, the time to reach consensus is roughly the same for 2-Choices and 3-Majority.

Interestingly, we prove that this is no longer true for configurations with a large number of initial colors. In particular, we show that 3-Majority reaches consensus with high probability in rounds, while 2-Choices can need rounds. We thus get the first unconditional sublinear bound for 3-Majority and the first result separating the consensus time of these processes. Along the way, we develop a framework that allows a fine-grained comparison between consensus processes from a specific class. We believe that this framework might help to classify the performance of more consensus processes.

Keywords: Distributed Consensus; Randomized Protocols; Majorization Theory; Leader Election

## 1 Introduction

We study consensus (also known as agreement) processes resulting from executing a simple algorithm in a distributed system. The system consists of anonymous nodes connected by a complete graph. Initially, each node supports one opinion from the set . We refer to these opinions as colors. The system state is modeled as a configuration vector , whose -th component denotes the number (support) of nodes with color . A consensus process is specified by an update rule that is executed by each node. The goal is to reach a configuration in which all nodes support the same color; the special case where nodes have pairwise distinct colors is leader election, an important primitive in distributed computing. We assume a severely restricted synchronous communication mechanism known as Uniform Pull [DGH87, KSSV00, KDG03]. Here, in each discrete round, nodes independently pull information from some (typically constant) number of randomly sampled nodes. Both the message sizes and the nodes’ local memory must be of size .

The so-called Voter process (also known as Polling) uses the most naïve update rule: In every round, each node samples one neighbor independently and uniformly at random and adopts that node’s color. Two further natural and prominent consensus processes are the 2-Choices and the 3-Majority process. Their corresponding update rules, as executed synchronously by each node, are as follows:

• 2-Choices: Sample two nodes independently and uniformly at random. If the samples have the same color, adopt it. Otherwise, ignore them and keep your current color.

• 3-Majority: Sample three nodes independently and uniformly at random. If a color is supported by at least two samples, adopt it. Otherwise, adopt the color of one of them at random.
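The three update rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: nodes are represented by a list `colors`, sampling is uniform with replacement over all nodes (so a node may sample itself), and the function names `voter`, `two_choices`, `three_majority`, `run_round`, and `rounds_to_consensus` are hypothetical names chosen for this sketch, not taken from the paper.

```python
import random

def voter(own, colors, rng):
    """Voter: adopt the color of one uniformly sampled node."""
    return rng.choice(colors)

def two_choices(own, colors, rng):
    """2-Choices: adopt the sampled color if both samples agree, else keep own."""
    a, b = rng.choice(colors), rng.choice(colors)
    return a if a == b else own

def three_majority(own, colors, rng):
    """3-Majority: adopt a color seen at least twice, else one sample at random."""
    a, b, c = (rng.choice(colors) for _ in range(3))
    if a == b or a == c:
        return a
    if b == c:
        return b
    return rng.choice((a, b, c))

def run_round(colors, rule, rng):
    """One synchronous round: every node samples from the old configuration."""
    return [rule(own, colors, rng) for own in colors]

def rounds_to_consensus(colors, rule, rng, max_rounds=10_000):
    """Number of rounds until one color remains (None if max_rounds is hit)."""
    for t in range(max_rounds):
        if len(set(colors)) == 1:
            return t
        colors = run_round(colors, rule, rng)
    return None
```

For example, `rounds_to_consensus(list(range(100)), three_majority, random.Random(0))` runs 3-Majority from a configuration with pairwise distinct colors; the returned value depends on the seed.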

One reason for the interest in these processes is that they represent simple and efficient self-stabilizing solutions for Byzantine agreement [PSL80, Rab83]: achieving consensus in the presence of an adversary that can disrupt a bounded set of nodes each round [BCN14, BCN16, CER14, EFK16]. Further interest stems from the fact that they capture aspects of how agreement is reached in social networks and biological systems [BSDDS10, CER14, FPM02].

At first glance, the above processes look quite different. But a slight reformulation of 3-Majority’s update rule reveals an interesting connection:

• 3-Majority (alt.): Sample two nodes independently and uniformly at random. If the samples have the same color, adopt it. Otherwise, sample a new neighbor and adopt its color.

This highlights the fact that 3-Majority is a combination of 2-Choices and Voter: Each node performs the update rule of 2-Choices. If the sampled colors do not match, instead of keeping its color, it executes the update rule of Voter. Interestingly enough, 3-Majority and 2-Choices behave identically in expectation. In comparison to Voter, both 2-Choices and 3-Majority exhibit a drift: they favor colors with a large support, for which it is more likely that the first two samples match. In particular, if there is a certain initial bias towards one color, Voter still needs linear time (in ) to reach consensus, while both 2-Choices and 3-Majority can exploit the bias to achieve sublinear time. On the other hand, it is unknown how 2-Choices and 3-Majority behave when they start from configurations having a large number of colors and no (or small) bias (see Section 1.1 for details).
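The claim that both processes behave identically in expectation can be checked with exact arithmetic. The sketch below is an illustration, not part of the paper: it assumes sampling with replacement over all nodes and uses the alternative formulation of 3-Majority given above. The function names are hypothetical.

```python
from fractions import Fraction

def expected_next_2choices(c):
    """Expected support of each color after one 2-Choices round."""
    n = sum(c)
    agree = sum(Fraction(cj, n) ** 2 for cj in c)  # Pr[both samples agree]
    # Any node adopts color i w.p. (c_i/n)^2; a node of color i keeps it
    # whenever the two samples disagree, i.e. w.p. 1 - agree.
    return [n * Fraction(ci, n) ** 2 + ci * (1 - agree) for ci in c]

def expected_next_3majority(c):
    """Expected support of each color after one 3-Majority round."""
    n = sum(c)
    agree = sum(Fraction(cj, n) ** 2 for cj in c)
    # First two samples agree on i, or they disagree and the third sample is i.
    return [n * (Fraction(ci, n) ** 2 + (1 - agree) * Fraction(ci, n)) for ci in c]
```

Both functions return the same vector for every configuration, since `ci * (1 - agree)` equals `n * (1 - agree) * (ci / n)`. The difference between the processes lies in the distribution around this expectation, which is what the separation result exploits.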

In this paper, we compare the time 2-Choices and 3-Majority require to reach consensus. In particular, we prove that there is a polynomial gap between their performance if the bias is small and the number of colors is large (say ). This result follows from our unconditional sublinear bound for 3-Majority and an almost linear lower bound for 2-Choices. Details of our results and contribution are given in Section 1.2.

### 1.1 Related Work

Consensus processes are a quite general model that can be used to study and understand different flavors of "spreading" phenomena, for example the spread of infectious diseases, rumors and opinions in societies, or viruses/worms in computer networks. Apart from the above-mentioned processes, such spreading processes include the Moran process [LHN05, DGRS14], contact processes, and classic epidemic processes [BG90, Lig99, Mor16]. This overview concentrates on results concerned with the time needed by Voter, 2-Choices, and 3-Majority to reach consensus. We also provide a short comparison of these processes and briefly discuss some of the slightly more distant relatives at the end of this section.

#### Voter

Previous work provides strong results for the time to consensus of the Voter process, even for arbitrary graphs. These results exploit an interesting duality: the time reversal of the Voter process turns out to be the coalescing random walk process (see [HP01, AF02]). The expected runtime of Voter on the complete graph and with nodes of pairwise distinct colors is . This follows easily from results for more general graphs and the above mentioned duality: The authors of [CEOR13] provide the upper bound on the expected coalescing time. Here is the graph’s spectral gap and , where denotes the average degree. In [BGKMT16] the authors show that the expected time to consensus is bounded by . Here, is the number of edges, the conductance, and the minimal degree.

#### 2-Choices

To the best of our knowledge, the only work that considers the case for 2-Choices is [EFK16]. The authors study the complete graph and show that 2-Choices reaches consensus with high probability in rounds, provided that for a small constant and an initial bias of . In [CER14, CER15], the authors consider 2-Choices for on different graphs. For random -regular graphs, [CER14] proves that all nodes agree on the first color in rounds, provided that the bias is . The same holds for arbitrary -regular graphs if the bias is , where is the second largest eigenvalue of the transition matrix. In [CER15], these results are extended to general expander graphs.

#### 3-Majority

All theoretical results for 3-Majority consider the complete graph. The authors of [BCN14] assume that the bias is . Under this assumption, they prove that consensus is reached with high probability in rounds, and that this is tight if . The only result without bias [BCN16] restricts the number of initial colors to . Under this assumption, they prove that 3-Majority reaches consensus with high probability in rounds. Their analysis considers phases of length and shows that, at the end of each phase, one of the initial colors disappears with high probability. Note that this approach, so far the only one not assuming any bias, cannot yield sublinear bounds with respect to .

Comparing the above processes on the complete graph for , we see that there are situations where Voter is much slower than 2-Choices or 3-Majority. Even with a linear bias, Voter is known to have linear runtime. In contrast, whenever there is a sufficient bias towards one color, both 2-Choices and 3-Majority can exploit this to achieve sublinear runtime. However, to the best of our knowledge, there are no unconditional results on 2-Choices and 3-Majority. All but one of the results need a minimum bias (at least ), and the only approach that works without any bias restricts the number of colors to [BCN16].

A related consensus process is 3-Median [DGM11]. Here, every node updates its color (a numerical value) to the median of its own value and two randomly sampled nodes. Without assuming any initial bias, the authors show that this process reaches consensus with high probability in rounds. This is seemingly stronger than the bounds achieved for 3-Majority and 2-Choices without bias. However, it comes at the price of a complete order on the colors (our processes require colors only to be testable for identity). Moreover, 3-Median is not self-stabilizing for Byzantine agreement (unlike 3-Majority and 2-Choices [BCN16, EFK16]): it cannot guarantee validity [BCN16]. Another consensus process is the UndecidedDynamics. Here, each node randomly samples one neighbor and, if the sample has a different color, adopts a special "undecided" color. In subsequent rounds, it tries to find a new (real) color by sampling one random neighbor. The most recent results [BCN15] show that, for a large enough bias, consensus is reached with high probability in at most rounds. Slightly more involved variants yield improved bounds of [BFGK16, GP16, EFK16]. However, observe that for all nodes become undecided with constant probability instead of agreeing on a color.

### 1.2 Contribution & Approach

In this work, we provide an upper bound on 3-Majority and a lower bound on 2-Choices that solve two open issues: We give the first unconditional sublinear bound on any of these processes (an open issue from, e.g., [BCN16]) and prove that there can be a polynomial gap between the performance of 3-Majority and 2-Choices (see Theorem 1 below). One should note that this gap is in stark contrast not only to the expected behavior of both processes (which is identical) but also to the setting when there is a bias towards one color (where both processes exhibit the same asymptotic runtime ; see Section 1.1).

The following theorem states slightly simplified versions of our upper and lower bounds (see Theorem 4 and Theorem 5 for the detailed statements).

###### Theorem 1 (Simplified).

Starting from an arbitrary configuration, 3-Majority reaches consensus with high probability in rounds. When started from a configuration where each color is supported by at most nodes, 2-Choices needs with high probability rounds to reach consensus.

The lower bound for 2-Choices follows mostly by standard techniques, using a coupling with a slightly simplified process and Chernoff bounds. The proof of the upper bound for 3-Majority is more involved and based on a combination of various techniques and results from different contexts. This approach not only results in a concise proof of the upper bound, but yields some additional, interesting results along the way. We give a brief overview of our approach in the next paragraph.

#### Approach

To derive our upper bound on the time to consensus required by 3-Majority, we split the analysis into two phases: (a) the time needed to go from to colors and (b) the time needed to go from to one color.

The runtime of the second phase follows by a simple application of [BCN16] and is . Bounding the runtime of the first phase is more challenging: we cannot rely on the drift from a bias or similar effects, and it is not clear how to perform a direct analysis in this setting (3-Majority is geared towards biased configurations). To overcome this issue, we resort to a coupling between Voter and 3-Majority. Since the construction of such a coupling seems elusive, we use some machinery from majorization theory [MOA11] to merely prove the existence of the coupling (see next paragraph). As a consequence of (the existence of) this coupling, we get that the time needed by 3-Majority to reduce the number of colors to a fixed value is stochastically dominated by the time Voter needs for this (Lemma 2). This, finally, allows us to upper bound the time needed by 3-Majority to go from to colors by the time Voter needs for this (which, in turn, we bound in Lemma 3 by ).

The technically most interesting part of our analysis is the proof of the stochastic dominance between 3-Majority and Voter. It works for a wide class of processes (including Voter and 3-Majority), which we call anonymous consensus (AC-) processes (see Definition 1). These are defined by an update rule that causes each node to adopt any color with the same probability that depends only on the current frequency of colors.

In the following, we provide a natural way to compare two processes. First, we define a way to compare two configurations and . We use vector majorization for this purpose: majorizes () if the total support of the largest colors in is not smaller than that in for all . In particular, note that a configuration where all nodes have the same color majorizes any other configuration. Let us write for the (random) configuration obtained by performing one step of a process on configuration . Consider two processes and two configurations with . We say dominates if, for all , the following holds: the sum of the largest components of the vector is not smaller than that of . Note that this definition is not restricted to AC-processes.

Our main technical result (Theorem 2) proves that, for two AC-processes, dominating implies that the time needed by to reduce the number of colors to a fixed value stochastically dominates the time needs for this. Note that while this statement might sound obvious, it is not true in general (if one of the processes is not an AC-process): 2-Choices dominates Voter, but it is much slower in reducing the number of colors when there are many colors.

## 2 Consensus Model & Technical Framework

This section introduces our technical framework using concepts from majorization theory, which is used in Section 3 to derive the sublinear upper bound on 3-Majority. After defining the model and general notation, we provide a few definitions and state the main result of this section (Theorem 2).

### 2.1 Model and Notation

We consider the consensus problem on the complete graph of nodes. Initially, each node supports one opinion (or color) from the set , where . Nodes interact in synchronous, discrete rounds using the Uniform Pull mechanism [DGH87]. That is, during every round each node can ask the opinion of a constant number of random neighbors. Given these opinions, it updates its own opinion according to some fixed update rule. The goal of the system is to reach consensus (a configuration where all nodes support the same opinion).

Let and . We describe the system state after any round by an -dimensional integral vector with . Here, the -th component corresponds to the number of nodes supporting opinion . If , then for all . We use to denote the set of all possible configurations.

Let and . We define and . Moreover, let denote a permutation of such that all components are sorted non-increasingly. We write and say majorizes if, for all , we have and . For two random variables and we write if is stochastically dominated by , i.e., for all . A function is Schur-convex if . For a probability vector , we use to denote the multinomial distribution for trials and categories (the -th category having probability ).
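The majorization test described above (compare the partial sums of the components sorted in non-increasing order, with equal totals) can be written as a small helper. This is a minimal sketch; the function name `majorizes` and the zero-padding of vectors of different length are assumptions made for illustration.

```python
def majorizes(c, c_tilde):
    """Return True if c majorizes c_tilde.

    Both vectors are sorted non-increasingly; c majorizes c_tilde if the
    totals agree and every prefix sum of c is at least that of c_tilde.
    """
    a = sorted(c, reverse=True)
    b = sorted(c_tilde, reverse=True)
    width = max(len(a), len(b))
    a += [0] * (width - len(a))  # pad with empty colors (assumption)
    b += [0] * (width - len(b))
    if sum(a) != sum(b):
        return False
    prefix_a = prefix_b = 0
    for ai, bi in zip(a, b):
        prefix_a += ai
        prefix_b += bi
        if prefix_a < prefix_b:
            return False
    return True
```

A consensus configuration such as `[4, 0, 0, 0]` majorizes every other configuration on four nodes, and the configuration `[1, 1, 1, 1]` with pairwise distinct colors is majorized by all of them. The exact comparison of totals is meant for integer or `Fraction` entries; floating-point inputs would need a tolerance.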

### 2.2 Comparing Anonymous Consensus Processes

We first introduce a class of processes whose update rules depend only on the current configuration. The update rule states that each node adopts a color with the same probability , where is the current configuration. In particular, node IDs (including the sampling node’s ID) do not influence the outcome. In this sense, such update rules are anonymous.

###### Definition 1 (Anonymous Consensus Processes).

Given a distributed system of nodes, an anonymous consensus process is characterized by a process function with for all . When in configuration , each node independently adopts opinion with probability . We use the shorthand AC-processes to refer to this class.

Given an AC-process and a fixed initial configuration, let denote the configuration of at time . By Definition 1, is a Markov chain, since depends only on . Another immediate consequence of Definition 1 is that conditional on is distributed according to . In other words, the 1-step distribution of an AC-process is a multinomial distribution. Two important examples of AC-processes include Voter and 3-Majority:

• In the Voter process , each node samples one node (according to the pull mechanism) and (always) adopts that node’s opinion. Thus

$$\alpha^{(V)}_i(c) = \frac{c_i}{n}. \tag{1}$$

• In the 3-Majority process , each node samples three nodes independently and uniformly at random. If a color is supported by at least two of the samples, the node adopts it. Otherwise, it adopts a random one of the sampled colors. Simple calculations (see [BCN14]) show

$$\alpha^{(3M)}_i(c) = \frac{c_i}{n} \cdot \left(1 + \frac{c_i}{n} - \left\|\frac{c}{n}\right\|_2^2\right). \tag{2}$$
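As an illustration, the two process functions from Equations (1) and (2) and one step of a generic AC-process can be sketched as follows. The names `alpha_voter`, `alpha_3majority`, and `ac_step` are hypothetical; the step function draws one color per node, which yields the multinomial one-step distribution noted above.

```python
import random
from fractions import Fraction

def alpha_voter(c):
    """Process function of Voter, Equation (1): alpha_i(c) = c_i / n."""
    n = sum(c)
    return [Fraction(ci, n) for ci in c]

def alpha_3majority(c):
    """Process function of 3-Majority, Equation (2)."""
    n = sum(c)
    x = [Fraction(ci, n) for ci in c]
    norm_sq = sum(xi * xi for xi in x)  # squared 2-norm of c/n
    return [xi * (1 + xi - norm_sq) for xi in x]

def ac_step(c, alpha, rng):
    """One round of an AC-process: each of the n nodes independently
    adopts color i with probability alpha(c)[i]."""
    n = sum(c)
    weights = [float(p) for p in alpha(c)]
    picks = rng.choices(range(len(c)), weights=weights, k=n)
    nxt = [0] * len(c)
    for i in picks:
        nxt[i] += 1
    return nxt
```

Using `Fraction` keeps the probability vectors exact, so one can verify that each process function sums to one; the conversion to `float` happens only when sampling.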

For any protocol starting with configuration let denote the first time step where the number of remaining colors reduces to where . The next definition introduces dominance between protocols. Intuitively, a protocol dominates another protocol if their expected behavior preserves majorization.

###### Definition 2 (Protocol Dominance).

Consider two (not necessarily AC-) processes . We say dominates if holds for all with .

Note that, in the case of AC-protocols, Definition 2 can be stated as follows: dominates if and only if for all with . With this, the main result of our framework can be stated as follows.

###### Theorem 2.

Consider two AC-Processes and where dominates . Assume and are started from the same configuration . Then, for any , the time needed by to reduce the number of remaining colors to dominates the time needs for this, i.e.,

$$T^{\kappa}_{P'}(c) \geq_{\mathrm{st}} T^{\kappa}_{P}(c).$$

One should note that the statement of Theorem 2 is not true in general (i.e., for non-AC-processes). In particular, 2-Choices dominates Voter, but our upper bound on Voter (Section 3.2) and our lower bound on 2-Choices (Theorem 5) contradict the statement of Theorem 2.

### 2.3 Coupling two AC-Processes

In order to prove Theorem 2, we formulate a strong 1-step coupling property for AC-processes:

###### Lemma 1 (1-Step Coupling).

Let and be two AC-processes. Consider any two configurations with . Let and be the configurations of and after one round, respectively. Then, there exists a coupling such that .

Note that Theorem 2 is an immediate consequence of Lemma 1: Since dominates (which is, for AC-processes, equivalent to for all with ) we can apply Lemma 1 iteratively to get Theorem 2. The fine-grained comparison enabled by Lemma 1 is based on three observations:

1. The (pre-)order on the set of configurations naturally measures the closeness to consensus. Indeed, a configuration with only one remaining color is maximal with respect to this order. Similarly, the -color configuration is minimal.

2. We can define a vector variant of stochastic domination (see Definition 3) such that ([MOA11, Proposition 11.E.11] or Proposition 1 in Appendix A).

3. Consider two configurations with . Since and , the previous observations imply that one step of on is stochastically "better" than one step of on . Our goal is to apply Lemma 1 iteratively to get Theorem 2. For this, we prove a coupling showing majorization between the resulting configurations. We achieve this via a variant of Strassen’s Theorem (see Theorem 3 below), which translates stochastic domination among random vectors to the existence of such a coupling.

We now give a definition of stochastic majorization that is compatible with the majorization preorder on the configuration space (cf. [MOA11, Chapter 11]).

###### Definition 3 (Stochastic Majorization).

For two random vectors and in , we write and say that stochastically majorizes if for all Schur-convex functions on such that the expectations are defined.

We proceed by stating the aforementioned variant (Theorem 3) of Strassen’s Theorem (Theorem 6) whose derivation is provided in Section A.2.

###### Theorem 3 (Strassen’s Theorem (variant)).

Consider a closed subset such that the set is closed. For two random vectors and over , the following conditions are equivalent:

1. (Stochastic Majorization) and

2. (Coupling) there is a coupling between and such that .

With this, Lemma 1 follows by a straightforward combination of the aforementioned machinery. See Section A.3 for details.

## 3 Upper Bound for 3-Majority

In this section, we provide a sublinear upper bound on the time needed by 3-Majority to reach consensus with high probability. This is one of our main results and is formulated in the following theorem.

###### Theorem 4.

Starting from any configuration , 3-Majority reaches consensus w.h.p. in rounds.

The analysis is split into two phases, each consisting of rounds.

Phase 1: From up to to colors.

This is the crucial part of the analysis. Instead of analyzing 3-Majority directly, we use our machinery from Section 2.2 to show that 3-Majority is not slower than Voter (Lemma 2). Then, we prove that Voter reaches colors in rounds (Lemma 3).

Phase 2: From up to to color (consensus).

Once we have reached a configuration with colors, we can apply [BCN16, Theorem 3.1] (see Theorem 8 in Appendix A), a previous analysis of 3-Majority. It works only for initial configurations with at most colors ( arbitrarily small). In that case, [BCN16, Theorem 3.1] yields a runtime of . Since the first phase leaves us with colors, this immediately implies that the second phase takes rounds.

This section proceeds by proving the runtime of Phase 1 in two steps: dominating the runtime of 3-Majority by that of Voter (Section 3.1) and proving the corresponding runtime for Voter (Section 3.2). Afterwards, Section A.7 combines these results with [BCN16, Theorem 3.1] to prove Theorem 4.

### 3.1 Analysis of Phase 1: 3-Majority vs. Voter

We prove the following lemma.

###### Lemma 2.

Consider Voter () and 3-Majority () started from the same initial configuration . There is a coupling such that after any round, the number of remaining colors in Voter is not smaller than that in 3-Majority. In particular, the time Voter needs to reach consensus stochastically dominates the time needed by 3-Majority to reach consensus, i.e.,

$$T^{\kappa}_{3M}(c) \leq_{\mathrm{st}} T^{\kappa}_{V}(c).$$

###### Proof.

By Theorem 2, all we have to prove is (see Section 2.2). To this end, consider two configurations with . Let and . We have to show . Since these are probability vectors, we have . It remains to consider the partial sums for . For this, let and . Remember that (Equation 2) and (Equation 1). In the following, we assume (w.l.o.g.) and (this implies and ). We compute

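$$
\begin{aligned}
\sum_{i=1}^{k} p_i - \sum_{i=1}^{k} \tilde{p}_i
&= \sum_{i=1}^{k} x_i^2 + \sum_{i=1}^{k} x_i - \|x\|_2^2 \sum_{i=1}^{k} x_i - \sum_{i=1}^{k} \tilde{x}_i \\
&\geq \sum_{i=1}^{k} x_i^2 - \|x\|_2^2 \sum_{i=1}^{k} x_i.
\end{aligned}
\tag{3}
$$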

We have to show that this last expression is non-negative, which is equivalent to

$$\|x\|_2^2 \leq \left(\sum_{i=1}^{k} x_i^2\right) \Big/ \left(\sum_{i=1}^{k} x_i\right). \tag{4}$$

This holds trivially for (where we have equality). Thus, it is sufficient to show that is non-increasing in . That is, for any we seek to show the inequality

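$$\frac{\sum_{i=1}^{k+1} x_i^2}{\sum_{i=1}^{k+1} x_i} = \frac{\sum_{i=1}^{k} x_i^2 + x_{k+1}^2}{\sum_{i=1}^{k} x_i + x_{k+1}} \leq \frac{\sum_{i=1}^{k} x_i^2}{\sum_{i=1}^{k} x_i}. \tag{5}$$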

This inequality is of the form , where . Rearranging shows that this is equivalent to . Thus, Equation 5 holds if and only if . This last inequality holds via , where we used . This finishes the proof. ∎
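The condition established in this proof can be checked exhaustively on small instances. The sketch below is only a sanity check, not a replacement for the argument: it enumerates all configurations of a small system (up to permutation of colors) and verifies that the 3-Majority probability vector of a configuration majorizes the Voter probability vector of every configuration it majorizes. All function names are hypothetical.

```python
from fractions import Fraction

def prefix_sums_desc(v):
    """Prefix sums of v after sorting it non-increasingly."""
    sums, total = [], 0
    for value in sorted(v, reverse=True):
        total += value
        sums.append(total)
    return sums

def majorizes(u, v):
    """True if u majorizes v (vectors of equal length, exact arithmetic)."""
    pu, pv = prefix_sums_desc(u), prefix_sums_desc(v)
    return pu[-1] == pv[-1] and all(a >= b for a, b in zip(pu, pv))

def alpha_voter(c):
    n = sum(c)
    return [Fraction(ci, n) for ci in c]

def alpha_3majority(c):
    n = sum(c)
    x = [Fraction(ci, n) for ci in c]
    norm_sq = sum(xi * xi for xi in x)
    return [xi * (1 + xi - norm_sq) for xi in x]

def sorted_configs(n, k, largest=None):
    """Yield all non-increasing k-tuples of non-negative ints summing to n."""
    if largest is None:
        largest = n
    if k == 1:
        if n <= largest:
            yield (n,)
        return
    for first in range(min(n, largest), -1, -1):
        for rest in sorted_configs(n - first, k - 1, first):
            yield (first,) + rest

def check_lemma2_condition(n, k):
    """Check alpha_3M(c) majorizes alpha_V(c~) for all c majorizing c~."""
    configs = list(sorted_configs(n, k))
    return all(
        majorizes(alpha_3majority(c), alpha_voter(ct))
        for c in configs
        for ct in configs
        if majorizes(c, ct)
    )
```

Restricting to sorted configurations loses nothing here, because both process functions treat colors symmetrically and majorization ignores the order of components.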

### 3.2 Analysis of Phase 1: A Bound for Voter

We analyze the time the Voter process takes to reduce the number of remaining colors from to . One should note that [BGKMT16] studies a similar process. However, their analysis relies critically on the fact that their process is lazy (i.e., nodes do not sample another node with probability ), while our proof does not require any laziness.

We make use of the well-known duality (via time reversal) between the Voter process and coalescing random walks. In the coalescing random walks process there are initially independent random walks, one placed at each of the nodes. While performing synchronous steps, whenever two or more random walks meet, they coalesce into a single random walk. Let denote the number of steps it takes to reduce the number of random walks from to in the coalescing random walks process (the coalescence time). Similarly, let denote the number of rounds it takes Voter to reduce the number of remaining colors from to .
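The coalescing random walks process on the complete graph is easy to simulate. The following sketch assumes that each walk jumps to a uniformly random node in every step (possibly staying where it is) and that walks sharing a node merge; the function name `coalescing_time` and the `max_steps` safeguard are assumptions for illustration.

```python
import random

def coalescing_time(n, k, rng, max_steps=10**6):
    """Steps until at most k walks remain, starting with one walk per node."""
    positions = set(range(n))  # occupied nodes; co-located walks have merged
    steps = 0
    while len(positions) > k and steps < max_steps:
        # Every remaining walk moves to a uniformly random node;
        # building a set merges walks that land on the same node.
        positions = {rng.randrange(n) for _ in positions}
        steps += 1
    return steps
```

Averaging `coalescing_time(n, k, random.Random(seed))` over several seeds gives an empirical feel for how the coalescence time scales with the two parameters.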

Here we only sketch the main ideas of the proof, while the full proof is given in Section A.6.

###### Lemma 3.

Consider an arbitrary initial configuration . Voter reaches a configuration having at most remaining colors w.h.p. in rounds, i.e., .

###### Sketch of Proof.

It is well-known (e.g., [AF02]), that . This statement generalizes for all (see Lemma 4 in Appendix A for a proof) to

$$T^{k}_{V} = T^{k}_{C}. \tag{6}$$

Thanks to the previous identity, we can prove the lemma’s statement by proving that w.h.p. . To this end, we show that . From this we can easily derive a high probability statement (see the full proof in Section A.6). We now analyze the coalescing random walks.

Let denote the number of coalescing random walks at time . We have and . One can argue (see the full proof for details) that in expectation

$$\mathbb{E}\left[X_{t+1} \mid X_t = x\right] \leq x - \frac{x^2}{10n}. \tag{7}$$

Using this expected drop together with a drift theorem (Theorem 7 in Appendix A.4), we get . Using this and Equation 6, we derive the claimed bound; the full derivation is given in Section A.6.
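To get a feel for the drift in Equation (7), one can iterate the corresponding deterministic recurrence. This is a heuristic only: it treats the bound on the conditional expectation as an exact update rule, whereas the actual proof applies a drift theorem. The function name `steps_for_drift` is hypothetical, and the sketch assumes a target of at least one walk.

```python
def steps_for_drift(n, k):
    """Iterate x <- x - x^2 / (10 n) from x = n until x <= k (k >= 1)."""
    x, t = float(n), 0
    while x > k:
        x -= x * x / (10 * n)
        t += 1
    return t
```

Since the update implies that the reciprocal of x grows by at least 1/(10n) per step, this recurrence needs at most 10n/k steps to drop from n to k, which matches the intuition that the number of walks decays on a time scale of order n/k.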

## 4 Lower Bound for 2-Choices

This section gives an almost linear worst-case lower bound on the time needed by 2-Choices to reach consensus with high probability. It turns out that, when started from an almost balanced configuration, the consensus time is dictated by the time it takes for one of the colors to gain a support of . To prove this result, we prove a slightly stronger statement that captures the slow initial part of the process when started from configurations with a maximal load of . Here we only provide a proof sketch. The full proof is given in Section A.8.

###### Theorem 5.

Let be a sufficiently large constant. Consider the 2-Choices process starting from any initial configuration . Let be the support of the largest color. Then, for , it holds with high probability that no color has a support larger than for rounds. In symbols,

 Prmaxici(t)>ℓ′ for some t

In particular, starting from the -color configuration, it holds with high probability that no color has a support larger than for rounds.

###### Sketch of Proof.

Let . For any fixed opinion we show that , so that, by a union bound over all opinions and using that , we obtain . Intuitively, we would like to show that, conditioning on , the expected number of nodes joining opinion is dominated by a binomial distribution with parameters and . The main obstacle to this is that naïvely applying Chernoff bounds for every time step yields a weak bound, since with constant probability at each round at least one color increases its support by a constant number of nodes. Instead, we consider a new process in which the number of nodes supporting color at time majorizes as long as ; we will then show that, after a certain time w.h.p. is still smaller than implying that indeed majorizes the original process. Using the fact that in we can simply apply Chernoff bounds over several rounds, we can finally get w.h.p.

Formally, process is defined as follows. and , where is a Bernoulli random variable with and, by a standard coupling, it is whenever node sees two times color at round (note that the latter event happens with probability at most for any ). By definition, if it holds , which implies that the probability that any node in the original process gets opinion is at most . Thus, we can couple 2-Choices and for so that . This implies that

$$T' := \min\{t \geq 0 \mid P(t) \geq \ell'\} \preceq T_i.$$

Relying on Chernoff bounds, we show in the full proof (Section A.8) that , and from this we derive the theorem’s statement.
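The slow initial phase can be observed empirically. The sketch below runs 2-Choices from the configuration with pairwise distinct colors and records the largest support after each round. It is an experiment under stated assumptions (uniform sampling with replacement over all nodes, hypothetical function names), not a reproduction of the proof.

```python
import random
from collections import Counter

def two_choices_round(colors, rng):
    """One synchronous 2-Choices round on a list of node colors."""
    new_colors = []
    for own in colors:
        a, b = rng.choice(colors), rng.choice(colors)
        new_colors.append(a if a == b else own)  # ignore disagreeing samples
    return new_colors

def max_support_trace(n, rounds, rng):
    """Largest color support after each round, starting from n distinct colors."""
    colors = list(range(n))
    trace = []
    for _ in range(rounds):
        colors = two_choices_round(colors, rng)
        trace.append(max(Counter(colors).values()))
    return trace
```

In the starting configuration two samples agree only if the same node is drawn twice, which happens with probability 1/n per node and round, so very few nodes change color early on and the recorded maximum tends to stay small for many rounds.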

## 5 Conclusion & Future Work

This section briefly discusses some directions of future work and our conjecture that our framework might help to gain a better understanding of how different (AC-) processes compare to each other.

#### Fault Tolerance

As mentioned in the introduction, previous studies [BCN14, BCN16, CER14, EFK16] show that 2-Choices and 3-Majority are consensus protocols that can tolerate dynamic, worst-case adversarial faults. In more detail, the protocols work even in the presence of an adversary that can, in every round, corrupt the state of a bounded set of nodes. The goal in this setting is to achieve a stable regime in which "almost all" nodes support the same valid color (i.e., a color initially supported by at least one non-corrupted node). The size of the corrupted set is one of the studied quality parameters and depends on the number of colors and/or on the bias in the starting configuration. For instance, in [BCN16] it is proven that, for , 3-Majority tolerates a corrupted set of size . A natural and important open issue is to investigate whether our framework for AC-processes can be used to make statements about fault-tolerance properties in this (or in similar) adversarial models. We moderately lean toward thinking that our analysis is sufficiently general and "robust" to be adapted to this adversarial scenario over a wider range of and bias than the previous analyses.

#### Towards a Hierarchy

Consider the process functions of the general -Majority process for arbitrary . Intuitively, -Majority should be (stochastically) slower than -Majority. We strongly believe this result holds. However, naïvely applying our machinery to prove this does not work and needs to be amended. Our conjecture that such a \enquotehierarchy for h for different holds is backed by the proof of Lemma 2 (which shows this for , since the Voter process is actually equivalent to -Majority and -Majority).

###### Conjecture 1.

For , we can couple -Majority and -Majority such that the latter never has more remaining colors than the former. In particular, -Majority is stochastically faster than -Majority.

However, as we show in Appendix B via a counterexample, it turns out that Lemma 1 is not strong enough to derive Conjecture 1. In fact, our failed attempts in adapting our approach may suggest that similar counterexamples exist for any majorization attempt that uses a total order on vectors.

## Appendix A Auxiliary Tools and Full Proofs

### A.1 Tools from Majorization Theory

###### Proposition 1 ([MOA11, Proposition 11.E.11], [Rs+77]).

For and a probability vector , consider a random vector having the multinomial distribution . Let

$$\phi\colon \Bigl\{ x \in \mathbb{N}_0^{l} \Bigm| \sum_{i \in [l]} x_i = N \Bigr\} \to \mathbb{R} \tag{9}$$

be such that exists. Note that this expected value depends on . Define the function on probability vectors as . If is Schur-convex, then so is .

###### Theorem 6 (Strassen’s Theorem [MOA11, 17.B.6]).

Suppose that is closed and that is the preorder of generated by the convex cone of real-valued functions defined on . Suppose further that is a closed set. Then the conditions

1. and

2. there exists a pair of random variables such that

1. and are identically distributed, and are identically distributed and

are equivalent if and only if ; i.e., the stochastic completion of is complete.

### A.2 Proof of Theorem 3

Consider the cone

$$C \coloneqq \{\, \phi\colon A \to \mathbb{R} \mid \phi \text{ is Schur-convex} \,\}$$

of real-valued Schur-convex functions on . This cone implies a preorder on by the definition for all . One can show that this preorder is the vector majorization (cf. [MOA11, Example 14.E.5])9. Now, being equal to has two implications:

1. The stochastic majorization implied by the preorder is the stochastic majorization from Definition 3 (cf. [MOA11, Definition 17.B.1]).

2. Since a cone is complete if it is maximal with respect to functions preserving the preorder (cf. [MOA11, Definition 14.E.2]), is complete (Schur-convex functions are by definition the set of all functions preserving the majorization preorder).

From 1 we get that Condition 1 is actually Condition 1 of Theorem 6. The same holds for Condition 2. From 2 we get that (cf. [MOA11, Proposition 17.B.3]), such that Conditions 1 and 2 are equivalent by Theorem 6. This finishes the proof. ∎
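The vector majorization preorder used in this proof can be checked mechanically. The sketch below assumes the standard definition (equal totals, and dominating prefix sums after sorting both vectors in decreasing order); the function name `majorizes` is hypothetical, and exact comparison is used because configuration vectors are integer-valued.

```python
from itertools import accumulate


def majorizes(x, y):
    """Return True if x majorizes y under the standard definition.

    Both vectors are sorted in decreasing order; x majorizes y if the totals
    agree and every prefix sum of x is at least the matching prefix sum of y.
    """
    if len(x) != len(y) or sum(x) != sum(y):
        return False
    prefix_x = accumulate(sorted(x, reverse=True))
    prefix_y = accumulate(sorted(y, reverse=True))
    return all(a >= b for a, b in zip(prefix_x, prefix_y))
```

For example, the consensus configuration `[3, 0, 0]` majorizes the balanced configuration `[1, 1, 1]`, but not the other way around. A Schur-convex function is exactly one that does not decrease when moving from the majorized vector to the majorizing one.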

### A.3 Proof of Lemma 1

Consider the processes and with the configurations and from the lemma’s statement. Let and denote the configurations resulting after one round of on and on , respectively. Let and . As observed earlier in Section 2.2, we have and . By the lemma’s assumption, we have . Since, by Proposition 1 (see Appendix A), the function is Schur-convex for any Schur-convex function for which the expectation exists, we get .

Since the configuration space is a finite subset of , it is closed and so is . We now apply Theorem 3 (Strassen’s Theorem, see Appendix A) to get that there exists a coupling between and such that10 . This finishes the proof. ∎

### A.4 Tools from Drift Theory

###### Theorem 7 (Variable Drift Theorem [LW14, Corollary 1.(i)]).

Let , be a stochastic process over some state space , where . Let be a differentiable function. Then the following statements hold for the first hitting time . If and , then

$$E[T \mid X_0] \le \frac{x_{\min}}{h(x_{\min})} + \int_{x_{\min}}^{X_0} \frac{1}{h(y)}\,dy.$$
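To see how the bound behaves, consider a worked instance under the assumption of a linear drift function $h(x) = \delta x$ with a constant $\delta > 0$ and $X_0 \ge x_{\min} > 0$. This choice of $h$ is purely illustrative and is assumed to satisfy the hypotheses of the theorem.

```latex
\begin{align*}
E[T \mid X_0]
  &\le \frac{x_{\min}}{\delta x_{\min}}
     + \int_{x_{\min}}^{X_0} \frac{1}{\delta y}\,dy
   = \frac{1}{\delta} + \frac{1}{\delta} \ln\frac{X_0}{x_{\min}}
   = \frac{1 + \ln(X_0 / x_{\min})}{\delta}.
\end{align*}
```

In this special case the bound takes the familiar multiplicative-drift form, growing only logarithmically in the ratio $X_0 / x_{\min}$.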

### A.5 Tools for Consensus Processes

###### Theorem 8 ([BCN+16, Theorem 3.1]).

Let be an arbitrarily small constant. Starting from any initial configuration with colors, -Majority reaches consensus w.h.p. in

$$O\bigl((k^2 \log^{1/2} n + k \log n)\cdot(k + \log n)\bigr)$$

rounds.

The following lemma uses the high-level idea of the proof presented in [AF02, Chapter 14] which only considers the case . For the purposes of our proof we would only require a coupling with , but for the sake of completeness we show the stronger claim .

###### Lemma 4.

For any graph , there exists a coupling such that .

###### Proof.

For and for define the random variables with , where denotes the uniform distribution and denotes the neighborhood of . Hence, means that pulls information from node in step . In the Coalescence process, the random variable , captures the transition performed by the random walk which is at at time (if any). In other words, these random variables define the arrows in Figure 1. For the voter process means that in step node adopts the opinion of .

Let be the trajectory of the random walk starting at . We can thus express

$$X_t(u) = \begin{cases} u & \text{if } t = 0, \\ Y_{t-1}(X_{t-1}(u)) & \text{otherwise.} \end{cases} \tag{10}$$

Thus, this trajectory and the random variable are completely determined by the random variables .

Let be the Voter process whose starting time equals the time of the coalescence process (see also Figure 1). Let be the opinion of at time of . For every node and we can thus express

$$O^*_{T^k_C - t'}(u) = \begin{cases} u & \text{if } t' = 0, \\ O^*_{T^k_C - (t'-1)}(Y_{T^k_C - t'}(u)) & \text{otherwise.} \end{cases} \tag{11}$$

Note that (11) constructs a coupling between the Voter process and the coalescence process through the common usage of the random variables in (10) and (11). In particular, by unrolling (10) and (11) we get

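$$\begin{aligned}
X_{T^k_C}(u) &= Y_{T^k_C-1}(Y_{T^k_C-2}(\dots(Y_0(X_0(u)))\dots)) \stackrel{(a)}{=} Y_{T^k_C-1}(Y_{T^k_C-2}(\dots(Y_0(u))\dots)), \\
O^*_0(u) &= O^*_{T^k_C}(Y_{T^k_C-1}(Y_{T^k_C-2}(\dots(Y_0(u))\dots))) \stackrel{(b)}{=} Y_{T^k_C-1}(Y_{T^k_C-2}(\dots(Y_0(u))\dots)),
\end{aligned}$$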

where in (a) we used that $X_0(u) = u$ and in (b) we used that $O^*_{T^k_C}(v) = v$ for all $v$. The above equations imply

$$X_{T^k_C}(u) = O^*_0(u). \tag{12}$$
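The identity (12) can be checked on concrete randomness. The sketch below follows the recursions (10) and (11) with a shared table of pull choices; it assumes a small ring graph and a fixed horizon `steps` in place of the coalescence time, and the names `sample_pulls`, `walk_positions`, and `voter_opinions` are hypothetical.

```python
import random


def sample_pulls(neighbors, steps, rng):
    """pulls[t][u] plays the role of Y_t(u): the neighbor that u pulls from in step t."""
    return [{u: rng.choice(neighbors[u]) for u in neighbors} for _ in range(steps)]


def walk_positions(pulls, nodes):
    """Forward recursion (10): X_0(u) = u and X_t(u) = Y_{t-1}(X_{t-1}(u))."""
    position = {u: u for u in nodes}
    for y in pulls:
        position = {u: y[position[u]] for u in nodes}
    return position


def voter_opinions(pulls, nodes):
    """Backward recursion (11): start with opinion u at node u and apply Y_{T-1}, ..., Y_0."""
    opinion = {u: u for u in nodes}  # opinions at the starting time T of the Voter process
    for y in reversed(pulls):
        opinion = {u: opinion[y[u]] for u in nodes}
    return opinion  # opinions at time 0


if __name__ == "__main__":
    n = 8
    # Assumption: a ring on n nodes; any graph given as adjacency lists would do.
    ring = {u: [(u - 1) % n, (u + 1) % n] for u in range(n)}
    pulls = sample_pulls(ring, steps=12, rng=random.Random(3))
    nodes = list(ring)
    print(walk_positions(pulls, nodes) == voter_opinions(pulls, nodes))
```

Both functions compute the same composition of the maps $Y_0, \dots, Y_{T-1}$, once from the inside out and once from the outside in, which is why the final walk positions and the final opinions coincide for every choice of the shared pulls.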

Let denote the positions of the remaining walks in the coalescence process at time . Observe that , , by definition of . We have, by (12), that

$$Z_{T^k_C} = \{X_{T^k_C}(u) : u \in V\} = \{O^*_0(u) : u \in V\} =: O^*_0. \tag{13}$$

From (13) we infer , which implies that

$$T^k_V \le T^k_C.$$

In the remainder we generalize the previous coupling to show that

$$T^k_V = T^k_C.$$

In particular, we consider the Voter process for all starting positions (all nodes have different colors at round ) and show that the resulting number of opinions is strictly more than .

Let be the Voter process that starts at time , and let be the opinion of at time of . For every node and we have

$$O^{\tau}_{\tau - t'}(u) = \begin{cases} u & \text{if } t' = 0, \\ O^{\tau}_{\tau - (t'-1)}(Y_{\tau - t'}(u)) & \text{otherwise.} \end{cases} \tag{14}$$

Similarly as before, by unrolling (10) and (14) we get

$$X_{\tau}(u) = Y_{\tau-1}(Y_{\tau-2}(\dots(Y_0(X_0(u)))\dots)) \stackrel{(a)}{=} Y_{\tau-1}(Y_{\tau-2}(\dots(Y_0(u))\dots))$$