Minimizing Message Size in Stochastic Communication Patterns: Fast Self-Stabilizing Protocols with 3 bits

A preliminary version of this work appears as a 3-page Brief Announcement in PODC 2016 [BKN16] and as an extended abstract at SODA 2017 [BKN17].


Lucas Boczkowski, Amos Korman: IRIF, CNRS and University Paris Diderot, Paris, 75013, France. E-mail: {Lucas.Boczkowski,Amos.Korman}@irif.fr.

Emanuele Natale: Max Planck Institute for Informatics, Saarbrücken, 66123, Germany. E-mail: emanuele.natale@mpi-inf.mpg.de. This work has been partly done while the author was visiting the Simons Institute for the Theory of Computing.

This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 648032).
Abstract

This paper considers the basic $\mathcal{PULL}$ model of communication, in which in each round, each agent extracts information from a few randomly chosen agents. We seek to identify the smallest amount of information revealed in each interaction (message size) that nevertheless allows for efficient and robust computations of fundamental information dissemination tasks. We focus on the Majority Bit Dissemination problem that considers a population of $n$ agents, with a designated subset of source agents. Each source agent holds an input bit and each agent holds an output bit. The goal is to let all agents converge their output bits on the most frequent input bit of the sources (the majority bit). Note that the particular case of a single source agent corresponds to the classical problem of Broadcast (also termed Rumor Spreading). We concentrate on the severe fault-tolerant context of self-stabilization, in which a correct configuration must be reached eventually, despite all agents starting the execution with arbitrary initial states. In particular, the specification of who is a source and what is its initial input bit may be set by an adversary.

We first design a general compiler which can essentially transform any self-stabilizing algorithm with a certain property (called “the bitwise-independence property”) that uses $\ell$-bit messages to one that uses only $\log \ell$-bit messages, while paying only a small penalty in the running time. By applying this compiler recursively we then obtain a self-stabilizing Clock Synchronization protocol, in which agents synchronize their clocks modulo some given integer $T$, within $\tilde{O}(\log n \log T)$ rounds w.h.p., and using messages that contain 3 bits only.

We then employ the new Clock Synchronization tool to obtain a self-stabilizing Majority Bit Dissemination protocol which converges in $\tilde{O}(\log n)$ time, w.h.p., on every initial configuration, provided that the ratio of sources supporting the minority opinion is bounded away from half. Moreover, this protocol also uses only 3 bits per interaction.


1 Introduction

1.1 Background and motivation

Distributed systems composed of limited agents that interact in a stochastic fashion to jointly perform tasks are common in the natural world as well as in engineered systems. Examples include a wide range of insect populations [HM85], chemical reaction networks [CCDS14], and mobile sensor networks [Pop1]. Such systems have been studied in various disciplines, including biology, physics, computer science and chemistry, while employing different mathematical and experimental tools.

From an algorithmic perspective, such complex systems share a number of computational challenges. Indeed, they all perform collectively in dynamically changing environments despite being composed of limited individuals that communicate through seemingly unpredictable, unreliable, and restricted interactions. Recently, there has been significant interest in understanding the computational limitations that are inherent to such systems, by abstracting some of their characteristics as distributed computing models, and analyzing them algorithmically [Dan2, Pop1, AFJ06, BCN15, Doty, OHK14]. These models usually consider agents which are restricted in their memory and communication capacities, that interact independently and uniformly at random (u.a.r.). By now, our understanding of the computational power of such models is rather advanced. However, it is important to note that much of this progress has been made assuming non-faulty scenarios - a rather strong assumption when it comes to natural or sensor-based systems. For example, to synchronize actions between processors, many known distributed protocols rely on the assumption that processors know when the protocol is initiated. However, in systems composed of limited individuals that do not share a common time notion, and must react to a dynamically changing environment, it is often unclear how to achieve such conditions. To have a better understanding of such systems, it is desirable to identify the weakest computational models that still allow for both efficient as well as robust computations.

This paper concentrates on the basic $\mathcal{PULL}$ model of communication [DGH88, DF11a, DGM11, KSSV00], in which in each round, each agent can extract (pull) information from a few other agents, chosen u.a.r. In the computer science discipline, this model, as well as its companion $\mathcal{PUSH}$ model, gained their popularity due to their simplicity and inherent robustness to different kinds of faults. Here, focusing more on the context of natural systems, we view the $\mathcal{PULL}$ model as an abstraction for communication in well-mixed scenarios, where agents can occasionally “observe” arbitrary other agents. We aim to identify the minimal model requirements with respect to achieving basic information dissemination tasks under conditions of increased uncertainty. As many natural systems appear to be more restricted by their communication abilities than by their memory capacities [Survey, beeping1, emek], our main focus is on understanding what can be computed while revealing as few bits per interaction as possible (see Footnote 1).

Footnote 1: We note that stochastic communication patterns such as $\mathcal{PULL}$ or $\mathcal{PUSH}$ are inherently sensitive to congestion issues. Indeed, in such models it is unclear how to simulate a protocol that uses large messages while using only small size messages. For example, the straightforward strategy of breaking a large message into small pieces and sequentially sending them one after another does not work, since one typically cannot make sure that the small messages reach the same destination. Hence, reducing the message size may have a profound impact on the running time, and perhaps even on the solvability of the problem at hand.

Self-stabilizing Bit Dissemination.

Disseminating information from one or several sources to the rest of the population is one of the most fundamental building blocks in distributed computing [CHHKM12, CLP09, DGH88, DF11a, KSSV00], and an important primitive in natural systems [FishConsensus, Razin, ManyEyes]. Here, we focus on the Majority Bit Dissemination problem defined as follows. We consider a population of $n$ agents. The population may contain multiple source agents which are specified by a designated bit in the state of an agent indicating whether the agent is a source or not. Each source agent holds a binary input bit; however, sources may not necessarily agree on their input bits. In addition, each agent holds a binary output bit (also called opinion). The goal of all agents is to converge their opinion on the majority bit among the initial input bits of the sources, termed $b_{maj}$. This problem aims to capture scenarios in which some individuals view themselves as informed, but some of these agents could also be wrong, or not up-to-date. Such situations are common in nature [COU05, Razin] as well as in man-made systems. The number of sources is termed $k$. We do not assume that agents know the value $k$, or that sources know whether they are in the majority or minority (in terms of their input bit). For simplicity, to avoid dealing with the case that the fraction of the majority input bit among sources is arbitrarily close to that of the minority input bit, we shall guarantee convergence only when the fraction of source agents holding the majority input bit is bounded away from $1/2$.

The particular case where we are promised to have $k=1$ is called Bit Dissemination, for short. In this case we have a single source agent that aims to disseminate its input bit $b$ to the rest of the population, and there are no other sources introducing a conflicting opinion. Note that this problem has been studied extensively in different models under different names (e.g., Broadcast or Rumor Spreading). A classical example of Bit Dissemination considers the synchronous $\mathcal{PUSH}$/$\mathcal{PULL}$ communication model, where $b$ can be propagated from the source to all other agents in $O(\log n)$ rounds, by simply letting each uninformed agent copy it whenever it sees an informed agent [KSSV00]. The correctness of this protocol heavily relies on the absence of incorrect information held by agents. Such reliability however may be difficult to achieve in dynamic or unreliable conditions. For example, if the source is sensitive to an unstable environment, it may change its mind several times before stabilizing to its final opinion. Meanwhile, it may have already invoked several consecutive executions of the protocol with contradicting initial opinions, which may in turn “infect” other agents with the wrong opinion $1-b$. If agents do not share a common time notion, it is unclear how to let infected agents distinguish their current wrong opinion from the more “fresh”, correct opinion. To address such difficulty, we consider the context of self-stabilization [dijkstra], where agents must converge to a correct configuration from any initial configuration of states.

1.2 Technical difficulties and intuition

Consider the Bit Dissemination problem (where we are guaranteed to have a single source agent). This particular case is already difficult in the self-stabilizing context if we are restricted to use $O(1)$ bits per interaction. As hinted above, a main difficulty lies in the fact that agents do not necessarily share a common time notion. Indeed, it is easy to see that if all agents share the same clock, then convergence can be achieved in $O(\log n)$ time with high probability (w.h.p.), i.e., with a probability of at least $1 - n^{-\Omega(1)}$, and using two bits per interaction.

Self-stabilizing Bit Dissemination (k=1) with 2 bits per interaction, assuming synchronized clocks.

The source sets her output bit to be her input bit $b$. In addition to communicating its output bit $b_u$, each agent $u$ stores and communicates a certainty bit $c_u$. Informally, having a certainty bit equal to 1 indicates that the agent is certain of the correctness of its output bit. The source’s certainty bit is always set to 1. Whenever a non-source agent $v$ observes $u$ and sees the tuple $(b_u, c_u)$, where $c_u = 1$, it copies the output and certainty bits of $u$ (i.e., sets $b_v = b_u$ and $c_v = 1$). In addition, all non-source agents count rounds, and reset their certainty bit to 0 simultaneously every $T = O(\log n)$ rounds. The reset gets rid of “old” output bits that may result from applying the protocol before the source’s output bit has stabilized. This way, from the first time a reset is applied after the source’s output bit has stabilized, the correct source’s output bit will propagate to all agents within $O(\log n)$ rounds, w.h.p. Note however, that if agents do not share a consistent notion of time they cannot reset their certainty bit to zero simultaneously. In such cases, it is unclear how to prevent agents that have just reset their certainty bit to 0 from being “infected” by “misleading” agents, namely, those that have the wrong output bit and certainty bit equal to 1.
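The two-bit scheme above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function names, the choice of agent 0 as the source, the single observation per round, and the rule that agents do not copy in the reset round are all hypothetical choices made here, not details taken from the paper.

```python
import random

def two_bit_round(out, cert, source, src_bit, t, period, rng):
    """One synchronous round; t is the clock value shared by all agents."""
    n = len(out)
    new_out, new_cert = out[:], cert[:]
    for v in range(n):
        if v == source:
            # The source always displays its input bit with certainty 1.
            new_out[v], new_cert[v] = src_bit, 1
        elif t % period == 0:
            # All non-source agents reset their certainty bit together.
            new_cert[v] = 0
        else:
            u = rng.randrange(n)  # observe one agent chosen u.a.r.
            if cert[u] == 1:
                # Copy output and certainty bits of a "certain" agent.
                new_out[v], new_cert[v] = out[u], 1
    return new_out, new_cert

def run_two_bit(n, src_bit, period, rounds, seed=0):
    """Run from an arbitrary (here: random) initial configuration."""
    rng = random.Random(seed)
    out = [rng.randrange(2) for _ in range(n)]
    cert = [rng.randrange(2) for _ in range(n)]
    history = []
    for t in range(rounds):
        out, cert = two_bit_round(out, cert, 0, src_bit, t, period, rng)
        history.append((out, cert))
    return history
```

In this sketch, once a common reset has occurred, every agent whose certainty bit is 1 holds the source's bit. This is exactly the property that is lost when resets are not simultaneous.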

Self-stabilizing Bit Dissemination (k=1) with a single bit per interaction, assuming synchronized clocks.

Under the assumption that all agents share the same clock, the following trick shows how to obtain convergence in $O(\log n)$ time and using only a single bit per message, namely, the output bit. As before, the source sets her output bit to be her input bit $b$. Essentially, agents divide time into phases of some prescribed length $T = O(\log n)$, each of them being further subdivided into 2 subphases of length $T/2$. In the first subphase of each phase, non-source agents are sensitive to opinion 0. This means that whenever they see a 0 in the output bit of another agent, they turn their output bit to 0, but if they see 1 they ignore it. Then, in the second subphase of each phase, they do the opposite, namely they switch their output bit to 1 as soon as they see a 1 (see Figure 1). Consider the first phase starting after initialization. If $b = 0$ then within one complete subphase $[1, T/2]$, every output bit is 0 w.h.p., and remains there forever. Otherwise, if $b = 1$, when all agents go over a subphase $[T/2+1, T]$ all output bits are set to 1 w.h.p., and remain 1 forever. Note that a common time notion is required to achieve correctness.
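A minimal sketch of this single-bit scheme follows, again assuming a shared clock. The names `one_bit_round` and `run_one_bit`, the parameter `half` (the subphase length), the use of agent 0 as the source, and the single observation per round are illustrative assumptions.

```python
import random

def one_bit_round(out, source, src_bit, t, half, rng):
    """One synchronous round with shared clock t and phases of length 2*half."""
    # First subphase: sensitive to 0. Second subphase: sensitive to 1.
    sensitive_to = 0 if t % (2 * half) < half else 1
    n = len(out)
    new_out = out[:]
    for v in range(n):
        if v == source:
            new_out[v] = src_bit
        elif out[rng.randrange(n)] == sensitive_to:
            # Adopt the observed bit only if it is the one we are sensitive to.
            new_out[v] = sensitive_to
    return new_out

def run_one_bit(out, src_bit, half, rounds, seed=0):
    rng = random.Random(seed)
    for t in range(rounds):
        out = one_bit_round(out, 0, src_bit, t, half, rng)
    return out
```

Note that a configuration in which every output bit equals the source's bit is absorbing in this sketch: no agent can ever observe the opposite bit again.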

The previous protocol indicates that the self-stabilizing Bit Dissemination problem is highly related to the self-stabilizing Clock Synchronization problem, where each agent internally stores a clock modulo $T$ incremented at every round and, despite having arbitrary initial states, all agents should converge on sharing the same value of the clock. Indeed, given such a protocol, one can obtain a self-stabilizing Bit Dissemination protocol by running the Clock Synchronization protocol in parallel to the last example protocol. This parallel execution costs only an additional bit to the message size and an $O(\log n)$ additive factor to the time complexity over the complexities of the Clock Synchronization protocol.

Intuition behind the self-stabilizing Clock Synchronization algorithm.

Our technique for obtaining the Clock Synchronization protocol is based on a compact recursive use of the stabilizing consensus protocol proposed by Doerr et al. [DGM11] through our Message Reduction Theorem (Theorem 3.1).

In the Preliminaries section (Section 2.2) we describe a simple protocol called Syn-Simple that uses $\log T$ bits per message. In Syn-Simple, each agent $u$ maintains a clock $C_u \in [0, T-1]$. At each round, each agent $u$ displays the opinion of her clock, pulls 2 other such clock opinions, and updates her clock as the bitwise majority of the two clocks she pulled and her own. Then the clock is incremented. This protocol essentially amounts to running the protocol of Doerr et al. on each bit separately and in parallel, and self-stabilizes in $O(\log T \log n)$ rounds w.h.p. (Proposition 2.1).

We want to apply a strategy similar to Syn-Simple, while using only $O(1)$ many bits per interaction. The core technical ingredient, made rigorous in the Message Reduction Theorem, is that a certain class of protocols using messages of $\ell$ bits, to which Syn-Simple belongs, can be emulated by another protocol which uses $\lceil \log \ell \rceil + 1$ bits only. The idea is to build a clock modulo $\ell$ using Syn-Simple itself on $\lceil \log \ell \rceil$ bits and sequentially display one bit of the original $\ell$-bit message according to such clock. Thus, by applying such strategy to Syn-Simple itself, we use a smaller clock modulo $\ell' \ll T$ to synchronize a clock modulo $T$. Iterating such process, in Section 4.2, we obtain a compact protocol which uses only 3 bits.
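The display step of this idea can be illustrated as follows. The sketch assumes that the reduced message consists of a small clock modulo $\ell$ (written on $\lceil \log \ell \rceil$ bits) together with the one bit of the original message indexed by that clock. The function names and the least-significant-bit-first encoding are hypothetical, and the sketch deliberately ignores how the small clock itself gets synchronized.

```python
from math import ceil, log2

def displayed_message(message_bits, clock):
    """Show the small clock plus the single message bit it points to."""
    ell = len(message_bits)
    width = max(1, ceil(log2(ell)))  # bits needed for a clock modulo ell
    clock %= ell
    clock_bits = [(clock >> i) & 1 for i in range(width)]  # LSB first
    return clock_bits + [message_bits[clock]]

def reduced_size(ell):
    """Message size after one reduction step, under the assumption above."""
    return ceil(log2(ell)) + 1
```

For instance, `reduced_size(64)` is 7, `reduced_size(7)` is 4, `reduced_size(4)` is 3 and `reduced_size(3)` is again 3, which suggests why iterating such a reduction would bottom out at 3 bits.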

1.3 The model

The communication model.

We adopt the synchronous $\mathcal{PULL}$ model [BCN16, DGH88]. Specifically, in the $\mathcal{PULL}(\eta)$ model, communication proceeds in discrete rounds. In each round, each agent $u$ “observes” $\eta$ arbitrary other agents, chosen u.a.r. (“u.a.r.” stands for “uniformly at random”, with replacement) among all agents, including herself. (We often omit the parameter $\eta$ when it is equal to 2.) When an agent $u$ “observes” another agent $v$, she can peek into a designated visible part of $v$’s memory. If several agents observe an agent $v$ at the same round then they all see the same visible part. The message size denotes the number of bits stored in the visible part of an agent. We denote with $\mathcal{PULL}(\eta, \ell)$ the $\mathcal{PULL}(\eta)$ model with message size $\ell$. We are primarily interested in message size that is independent of $n$, the number of agents.

Agents.

We assume that agents do not have unique identities, that is, the system is anonymous. We do not aim to minimize the (non-visible) memory requirement of the agent, yet, we note that our constructions can be implemented with relatively short memory. We assume that each agent internally stores a clock modulo some integer $T$, which is incremented at every round.

Majority Bit Dissemination problem.

We assume a system of $n$ agents each having an internal state that contains an indicator bit which indicates whether or not the agent is a source. Each source holds a binary input bit (see Footnote 3) and each agent (including sources) holds a binary opinion. The number of sources (i.e., agents whose indicator bit is 1) is denoted by $k$. We denote by $k_0$ and $k_1$ the number of sources whose input bit is initially set to 0 and 1, respectively. Assuming $k_0 \neq k_1$, we define the majority bit, termed $b_{maj}$, as 1 if $k_1 > k_0$ and 0 if $k_1 < k_0$. Source agents know that they are sources (using the indicator bit) but they do not know whether they hold the majority bit. The parameters $k$, $k_0$ or $k_1$ are not known to the sources or to any other agent. It is required that the opinions of all agents eventually converge to the majority bit. We note that agents hold their output and indicator bits privately, and we do not require them to necessarily reveal these bits publicly (in their visible parts) unless they wish to. To avoid dealing with the cases where the number of sources holding the majority bit is arbitrarily close to $k/2$, we shall guarantee correctness (w.h.p.) only if the fraction of sources holding the majority is bounded away from $1/2$, i.e., only if $\max(k_0, k_1)/k \geq 1/2 + \epsilon$, for some positive constant $\epsilon$. When $k = 1$, the problem is called Bit Dissemination, for short. Note that in this case, the single source agent holds the bit to be disseminated and there is no other source agent introducing a conflicting opinion.

Footnote 3: Note that having the indicator bit equal to 1 is equivalent to possessing an input bit: both are exclusive properties of source nodes. However, we keep them distinct for a clearer presentation.
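As a concrete reading of this definition, the following sketch computes the majority bit from a global view of the initial configuration. Agents themselves cannot run it, since they do not know the numbers of sources. The name `majority_bit`, the pair encoding of agents, and the exact form of the "bounded away from 1/2" test are assumptions made for illustration.

```python
def majority_bit(agents, eps):
    """agents: list of (is_source, input_bit) pairs. Returns 0, 1 or None."""
    k1 = sum(1 for is_source, bit in agents if is_source and bit == 1)
    k0 = sum(1 for is_source, bit in agents if is_source and bit == 0)
    k = k0 + k1
    # No guarantee is given without sources, or when the majority fraction
    # is not bounded away from 1/2 (assumed here to mean >= 1/2 + eps).
    if k == 0 or max(k0, k1) < (0.5 + eps) * k:
        return None
    return 1 if k1 > k0 else 0
```

With a single source the condition always holds, and the function simply returns that source's input bit.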

T-Clock Synchronization.

Let $T$ be an integer. In the $T$-Clock Synchronization problem, each agent maintains a clock modulo $T$ that is incremented at each round. The goal of agents is to converge on having the same value in their clocks modulo $T$. (We may omit the parameter $T$ when it is clear from the context.)

Probabilistic self-stabilization and convergence.

Self-stabilizing protocols are meant to guarantee that the system eventually converges to a legal configuration regardless of the initial states of the agents [dijkstra]. Here we use a slightly weaker notion, called probabilistic self-stabilization, where stability is guaranteed w.h.p. [BCN15b]. More formally, for the Clock Synchronization problem, we assume that all states are initially set by an adversary. For the Majority Bit Dissemination problem, we assume that all states are initially set by an adversary except that it is assumed that the agents know their total number $n$, and that this information is not corrupted.

In the context of $T$-Clock Synchronization, a legal configuration is reached when all clocks show the same time modulo $T$, and in the Majority Bit Dissemination problem, a legal configuration is reached when all agents output the majority bit $b_{maj}$. Note that in the context of the Majority Bit Dissemination problem, the legality criterion depends on the initial configuration (that may be set by an adversary). That is, the agents must converge their opinion on the majority of input bits of sources, as evident in the initial configuration.

The system is said to stabilize in $t$ rounds if, from any initial configuration w.h.p., within $t$ rounds it reaches a legal configuration and remains legal for at least some polynomial time [BCN15b, BCN16, DGM11]. In fact, for the self-stabilizing Bit Dissemination problem, if there are no conflicting source agents holding a minority opinion (such as in the case of a single source agent), then our protocols guarantee that once a legal configuration is reached, it remains legal indefinitely. Note that, for any of the problems, we do not require that each agent irrevocably commits to a final opinion but that eventually agents arrive at a legal configuration without necessarily being aware of that.

1.4 Our Results

Our main results are the following.

Theorem 1.1

Fix an arbitrarily small constant $\epsilon > 0$. There exists a protocol, called Syn-Phase-Spread, which solves the Majority Bit Dissemination problem in a self-stabilizing manner in $\tilde{O}(\log n)$ rounds (see Footnote 4) w.h.p. using 3-bit messages, provided that the majority bit is supported by at least a fraction $1/2 + \epsilon$ of the source agents.

Footnote 4: With a slight abuse of notation, $\tilde{O}(\cdot)$ hides polylogarithmic factors of its argument. All logarithms are in base 2.

Theorem 1.1 is proved in Section 5. The core ingredient of Syn-Phase-Spread is our construction of an efficient self-stabilizing $T$-Clock Synchronization protocol, which is used as a black-box. For this purpose, the case that interests us is when $T = \tilde{O}(\log n)$. Note that in this case, the following theorem, proved in Section 4, states that the convergence time of the Clock Synchronization algorithm is $\tilde{O}(\log n)$.

Theorem 1.2

Let $T$ be an integer. There exists a self-stabilizing $T$-Clock Synchronization protocol, called Syn-Clock, which employs only 3-bit messages, and synchronizes clocks modulo $T$ within $\tilde{O}(\log n \log T)$ rounds w.h.p.

In addition to the self-stabilizing context, our protocols can tolerate the presence of Byzantine agents, as long as their number is $O(n^{1/2-\epsilon})$ (see Footnote 5). However, in order to focus on the self-stabilizing aspect of our results, in this work we do not explicitly address the presence of Byzantine agents.

Footnote 5: Specifically, it is possible to show that, as a corollary of our analysis and the fault-tolerance property of the analysis in [DGM11], if $T$ is at most polynomial in $n$ then Syn-Clock can tolerate the presence of up to $O(n^{1/2-\epsilon})$ Byzantine agents for any $\epsilon > 0$. In addition, Syn-Phase-Spread can tolerate a number of Byzantine agents that depends on the numbers of sources supporting the majority and minority opinions. Note that for the case of a single source ($k = 1$), no Byzantine agents are allowed; indeed, a single Byzantine agent pretending to be the source with the opposite opinion can clearly ruin any protocol.

The proofs of both Theorem 1.2 and Theorem 1.1 rely on recursively applying a new general compiler which can essentially transform any self-stabilizing algorithm with a certain property (called “the bitwise-independence property”) that uses $\ell$-bit messages to one that uses only $(\lceil \log \ell \rceil + 1)$-bit messages, while paying only a small penalty in the running time. This compiler is described in Section 3, in Theorem 3.1, which is also referred to as “the Message Reduction Theorem”. The relations among our different lemmas and results are summarized in Figure 2.

It remains an open problem, both for the self-stabilizing Bit Dissemination problem and for the self-stabilizing Clock Synchronization problem, whether the message size can be reduced to 2 bits or even to 1 bit, while keeping the running time poly-logarithmic.

1.5 Related work

The computational study of abstract systems composed of simple individuals that interact using highly restricted and stochastic interactions has recently been gaining considerable attention in the community of theoretical computer science. Popular models include population protocols [Pop1, Pop2, Pop-ss, Joffroy], which typically consider constant size individuals that interact in pairs (using constant size messages) in random communication patterns, and the beeping model [beeping1, emek], which assumes a fixed network with extremely restricted communication. Our model also falls in this framework as we consider the $\mathcal{PULL}$ model [DGH88, KSSV00, KDG03] with constant size messages. So far, despite interesting works that consider different fault-tolerant contexts [Aspnes, Pop-ss, Joffroy], most of the progress in this framework considered non-faulty scenarios.

Information dissemination is one of the most well-studied topics in the community of distributed computing, see, e.g., [Aspnes, CHHKM12, DGH88, DF11a, DGM11, OHK14, KSSV00]. Classical examples include the Broadcast (also referred to in the literature as Rumor Spreading) problem, in which a piece of information residing at one source agent is to be disseminated to the rest of the population, and majority-consensus (here, called Majority Bit Dissemination) problems in which processors are required to agree on a common output value which is the majority initial input value among all agents [Aspnes, kutten] or among a set of designated source agents [OHK14]. An extensive amount of research has been dedicated to study such problems in $\mathcal{PUSH}$/$\mathcal{PULL}$ based protocols (including the phone call model), due to the inherent simplicity and fault-tolerant resilience of such meeting patterns. Indeed, the robustness of $\mathcal{PUSH}$/$\mathcal{PULL}$ based protocols to weak types of faults, such as crashes of messages and/or agents, or to the presence of relatively few Byzantine agents, has been known for quite a while [ES09, KSSV00]. Recently, it has been shown that under the $\mathcal{PUSH}$ model, there exist efficient Broadcast and Majority Bit Dissemination protocols that use a single bit per message and can overcome flips in messages (noise) [OHK14]. The protocols therein, however, heavily rely on the assumption that agents know when the protocol has started. Observe that in a self-stabilizing context, in which the adversary can corrupt the initial clocks setting them to arbitrary times, such an assumption would be difficult to remove while preserving the small message size.

In general, there are only a few known self-stabilizing protocols that operate efficiently under stochastic and capacity restricted interactions. An example, which is also of high relevance to this paper, is the work of Doerr et al. on Stabilizing Consensus [DGM11] operating in the $\mathcal{PULL}$ model. In that work, each agent initially has a state taken out of a set of $m$ opinions and the goal is to converge on one of the proposed states. The proposed algorithm, which runs in logarithmic time, is based on sampling the states of 2 agents and updating the agent’s state to be the median of the 2 sampled states and the current state of the agent (3 opinions in total). Since the total number of possible states is $m$, the number of bits that must be revealed in each interaction is $\Omega(\log m)$. Another example is the plurality consensus protocol in [BCN16], in which each agent has initially an opinion and we want the system to converge to the most frequent one in the initial configuration of the system. In fact, the Majority Bit Dissemination problem can be viewed as a generalization of the majority-consensus problem (i.e. the plurality consensus problem with two opinions), to the case in which multiple agents may initially be unopinionated. In the previous sense, we also contribute to the line of research on the majority-consensus problem [BCN15, CER15, EFK16].

Another fundamental building block is Clock Synchronization [Attiya, Lamport, Lenzen3, Lenzen2]. We consider a synchronous system in which clocks tick at the same pace but may not share the same opinion. This version has earlier been studied, e.g., in [Ben-Or, Dolev07, Dolev97, Dolev04, SIROCCO, Herman] under different names, including “digital Clock Synchronization” and “synchronization of phase-clocks”; we simply use the term “Clock Synchronization”. There is by now a substantial line of work on Clock Synchronization problems in a self-stabilizing context [DolevKLRS13, Dolev04, Lenzen4, Lenzen5]. We note that in these papers the main focus is on the resilience to Byzantine agents. The number of rounds and message lengths are also minimized, but typically as a function of the number of Byzantine processors. Our focus is instead on minimizing the time and message complexities as much as possible. The authors in [Lenzen4, Lenzen5] consider mostly a deterministic setting. The communication model is very different from ours, as every agent gets one message from every other agent on each round. Moreover, agents are assumed to have unique identifiers. In contrast, we work in a more restricted, yet randomized communication setting. In [DolevKLRS13, Lenzen4] randomized protocols are also investigated. We remark that the first protocol we discuss, Syn-Simple (Proposition 2.1), which relies on a known simple connection between consensus and counting [DolevKLRS13], already improves exponentially on the randomized algorithms from [DolevKLRS13, Lenzen4] in terms of number of rounds, number of memory states, message length and total amount of communication, in a restricted regime of the resilience parameter. We further note that the works [Lenzen5, Lenzen4] also use a recursive construction for their clocks (although very different from the one we use in the proof of Theorem 1.2). The induction in [Lenzen4] is on the resilience parameter, the number of agents and the clock length together. This idea is improved in [Lenzen5] to achieve optimality in terms of resilience to Byzantine agents.

To the best of our knowledge there are no previous works on self-stabilizing Clock Synchronization or Majority Bit Dissemination that aim to minimize the message size beyond logarithmic.

2 Preliminaries

2.1 A majority based, self-stabilizing protocol for consensus on one bit

Let us recall (see Footnote 6) the stabilizing consensus protocol by Doerr et al. in [DGM11]. In this protocol, called maj-consensus, each agent holds an opinion. In each round each agent looks at the opinions of two other random agents and updates her opinion taking the majority among the bits of the observed agents and her own. Note that this protocol uses only a single bit per interaction, namely, the opinion. The usefulness of maj-consensus comes from its extremely fast and fault-tolerant convergence toward an agreement among agents, as given by the following result.

Footnote 6: Our protocols will use this protocol as a black box. However, we note that the constructions we outline are in fact independent of the choice of consensus protocol, and this protocol could be replaced by other protocols that achieve similar guarantees.

Theorem 2.1 (Doerr et al. [DGM11])

From any initial configuration, maj-consensus converges to a state in which all agents agree on the same output bit in $O(\log n)$ rounds, w.h.p. Moreover, if there are at most $\kappa \leq n^{1/2-\epsilon}$ Byzantine agents, for any constant $\epsilon > 0$, then after $O(\log n)$ rounds all non-Byzantine agents have converged and consensus is maintained for $n^{\Omega(1)}$ rounds w.h.p. (see Footnote 7).

Footnote 7: The original statement of [DGM11] says that if at most $\kappa \leq \sqrt{n}$ agents can be corrupted at any round, then convergence happens for all but at most $O(\kappa)$ agents. Let us explain how this implies the statement we gave, namely that we can replace $O(\kappa)$ by $\kappa$, if $\kappa \leq n^{1/2-\epsilon}$. Assume that we are in the regime $\kappa \leq n^{1/2-\epsilon}$. It follows from [DGM11] that all but a set of $O(\kappa)$ agents reach consensus after $O(\log n)$ rounds. This set of size $O(\kappa)$ contains both Byzantine and non-Byzantine agents. However, if the number of agents holding the minority opinion is $O(\kappa) = O(n^{1/2-\epsilon})$, then the expected number of non-Byzantine agents that disagree with the majority at the next round is $O(\kappa^2/n) = O(n^{-2\epsilon})$. Thus, by Markov’s inequality, at the next round consensus is reached among all non-Byzantine agents w.h.p. Note also that, for the same reasons, the Byzantine agents do not affect any other non-Byzantine agent for $n^{\Omega(1)}$ rounds w.h.p.
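The update rule of maj-consensus is simple enough to state in a few lines. The sketch below is a non-authoritative simulation of it without Byzantine agents. The function names and the `max_rounds` cap are illustrative, and samples are drawn uniformly with replacement among all agents.

```python
import random

def maj(a, b, c):
    """Majority of three bits."""
    return 1 if a + b + c >= 2 else 0

def maj_consensus_round(opinions, rng):
    """Each agent takes the majority of its own bit and two sampled bits."""
    n = len(opinions)
    return [maj(opinions[u],
                opinions[rng.randrange(n)],
                opinions[rng.randrange(n)])
            for u in range(n)]

def rounds_until_agreement(opinions, seed=0, max_rounds=10_000):
    """Number of rounds until all opinions coincide, or None if never seen."""
    rng = random.Random(seed)
    for t in range(max_rounds):
        if len(set(opinions)) == 1:
            return t
        opinions = maj_consensus_round(opinions, rng)
    return None
```

Calling `rounds_until_agreement` on random initial opinions for growing population sizes gives an informal feel for the logarithmic convergence time stated in the theorem.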

2.2 Protocol Syn-Simple: A simple protocol with many bits per interaction

We now present a simple self-stabilizing -Clock Synchronization protocol, called Syn-Simple, that uses relatively many bits per message, and relies on the assumption that is a power of 2. The protocol is based on iteratively applying a self-stabilizing consensus protocol on each bit of the clock separately, and in parallel.

Formally, each agent maintains a clock . At each round, displays the opinion of her clock , pulls  uniform other such clock opinions, and updates her clock as the bitwise majority of the two clocks she pulled and her own. Subsequently, the clock is incremented. We present the pseudocode of Syn-Simple in Algorithm 1.
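The local update just described can be sketched as follows. This is an illustrative sketch rather than the pseudocode of Algorithm 1: the function names are hypothetical, clocks are represented as non-negative integers below the modulus, and the modulus `T` is assumed to be a power of 2 as in the text.

```python
def bitwise_majority(a, b, c):
    """Bitwise majority of three non-negative integers."""
    return (a & b) | (a & c) | (b & c)


def syn_simple_update(own, pulled1, pulled2, T):
    """One local step: take the bitwise majority of the agent's own clock
    and the two pulled clocks, then increment modulo T.

    Assumption: T is a power of 2 and all clocks lie in range(T).
    """
    assert T > 0 and T & (T - 1) == 0, "T must be a power of 2"
    return (bitwise_majority(own, pulled1, pulled2) + 1) % T
```

For instance, with `T = 8`, an agent holding clock 7 that pulls clocks 7 and 0 computes the bitwise majority 7 and then wraps around to 0.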

We prove the correctness of Syn-Simple in the next proposition.

Proposition 2.1

Let be a power of . The protocol Syn-Simple is a self-stabilizing protocol that uses bits per interaction and synchronizes clocks modulo in rounds w.h.p.

• Let us look at the least significant bit. One round of Syn-Simple is equivalent to one round of maj-consensus with an extra flipping of the opinion due to the increment of the clock. The crucial point is that all agents jointly flip their bit on every round. Because the function agents apply, maj, is symmetric, it commutes with the flipping operation. More formally, let be the vector of the first bits of the clocks of the agents at round under an execution of Syn-Simple. E.g., is the value of the least significant bit of node ’s clock at time . Similarly, we denote by the first bits of the clocks of the agents at round obtained by running a modified version of Syn-Simple in which time is not incremented (i.e. we skip line 1 in Algorithm 1). We couple and trivially, by running the two versions on the same interaction pattern (in other words, each agent starts with the same memory and pulls the same agents at each round in both executions). Then, is equal to when is even, while is equal to when is odd. Moreover, we know from Theorem 2.1 that converge to a stable opinion in a self-stabilizing manner. It follows that, from any initial configuration of states (i.e. clocks), w.h.p., after rounds of executing Syn-Simple, all agents share the same opinion for their first bit, and jointly flip it in each round. Once agents agree on the first bit, since is a power of , the increment of time makes them flip the second bit jointly once every rounds (footnote 8). More generally, assuming agents agree on the first bits of their clocks, they jointly flip the ’st bit once every rounds, on top of doing the maj-consensus protocol on that bit. Hence, the same coupling argument shows that the flipping does not affect the convergence on bit .

Footnote 8: To get the feeling of the kind of dependence more significant bits have on the less significant ones when is not a power of observe that, for example, if then the first bit takes cyclically the values , and again .
Therefore, rounds after the first bits are synchronized, w.h.p. the ’st bit is synchronized as well. The result thus follows by induction.

2.3 The bitwise-independence property

Our general transformer described in Section 3 is useful for reducing the message size of protocols with a certain property called bitwise-independence. Before defining the property we need to define a variant of the model, which we refer to as the model. The reason we introduce such a variant is mainly technical, as it appears naturally in our proofs.

Recall that in the model, at any given round, each agent is reading an -bit message for each of the observed agents chosen u.a.r. (in our case ), and then, in turn, updates her state according to the instructions of a protocol P. Informally, in the model, each agent also receives messages, however, in contrast to the model where each such message corresponds to one observed agent, in the model, the ’th bit of each such message is received independently from a (typically new) agent, chosen u.a.r. from all agents.

Definition 1 (The BIT model)

In the model, at each round, each agent picks agents u.a.r., namely, ,…,, and reads , the -th bit of the visible part of agent , for every and . For each , let be the -bit string . By a slight abuse of language we call the strings the messages received by in the model.
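To make the difference between the two models concrete, the following sketch contrasts how one agent's received messages are formed in each of them. It is an illustration only: the function names, the representation of displayed messages as strings of '0'/'1' characters, and the parameter name `eta` for the number of received messages are assumptions introduced here.

```python
import random


def pull_messages(displayed, eta, rng):
    """Standard pull: each of the eta messages is read in full from a
    single agent chosen u.a.r."""
    n = len(displayed)
    return [displayed[rng.randrange(n)] for _ in range(eta)]


def bit_model_messages(displayed, eta, rng):
    """BIT-model variant: the i-th bit of each of the eta messages is read
    from an independently chosen agent, picked u.a.r. for that bit alone."""
    n = len(displayed)
    ell = len(displayed[0])  # message length in bits
    return [
        "".join(displayed[rng.randrange(n)][i] for i in range(ell))
        for _ in range(eta)
    ]
```

When all agents display the same message, both functions return identical outputs; the two models differ only in how bits from different agents may be recombined.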

Definition 2 (The bitwise-independence property)

Consider a protocol P designed to work in the model. We say that P has the bitwise-independence property if its correctness and running time guarantees remain the same, under the model (assuming that given the messages it receives at any round, each agent performs the same actions that it would have, had it received these messages in the model).

Let us first state a fact about protocols having the bitwise-independence property.

Lemma 2.1

Assume protocol Syn-Generic is a protocol synchronizing clocks modulo for some and protocol P is a protocol which works assuming agents share a clock modulo . Denote by Syn-P the parallel execution of Syn-Generic and P, with P using the clock synchronized by Syn-Generic. If Syn-Generic and P are self-stabilizing then so is Syn-P, and the convergence time of Syn-P is at most the sum of convergence times of Syn-Generic and P. Finally, if Syn-Generic and P have the bitwise-independence property, and P is also self-stabilizing, Syn-P has the bitwise-independence property too.

• The self-stabilizing property of Syn-P and its convergence time easily follow from those of Syn-Generic and P.

As for the bitwise-independence property, assume we run Syn-P in the model. The execution of Syn-Generic is carried out independently of the execution of P. Since, by hypothesis, Syn-Generic has the bitwise-independence property, eventually all agents have a synchronized clock modulo . Thus, once clocks are synchronized, we can disregard the part of the message corresponding to Syn-Generic, and view the execution of Syn-P as simply P. Therefore, since P is self-stabilizing and has the bitwise-independence property, Syn-P still works in the model as in the original model.

We next show that the protocol Syn-Simple has the aforementioned bitwise-independence property.

Lemma 2.2

Syn-Simple has the bitwise-independence property.

• Let be the size of the clocks. Assume the first bits of the clocks have been synchronized. At this stage, the -st bit of each agent is flipped every rounds and updated as the majority of the -st bit of and the pulled messages on each round. Since the first bits are synchronized, the previous flipping is performed by all agents at the same round. The thesis follows from the observation that, in order for Syn-Simple to work, we do not need the bit at index to come from the same agent as those bits used to synchronize the other indices, as long as convergence on the first bits has been achieved.

3 A General Compiler that Reduces Message Size

In this section we present a general compiler that makes it possible to implement a protocol P designed for -bit messages using messages of order instead, as long as P enjoys the bitwise-independence property. The compiler is based on replacing a message by an index to a given bit of the message. This tool will be used repeatedly in the following sections to obtain our Clock Synchronization and Majority Bit Dissemination algorithms that use 3-bit messages.

Theorem 3.1 (the Message Reduction Theorem)

Any self-stabilizing protocol P in the model having the bitwise-independence property, and whose running time is , can be emulated by a protocol Emul(P) which runs in the model, has running time and has itself the bitwise-independence property.

Remark 1

The only reason for designing Emul(P) to run in the model in the Message Reduction Theorem is the consensus protocol we adopt, maj-consensus, which works in the model. In fact, Emul(P) can be adapted to run in the model by using a consensus protocol which works in the model. However, no self-stabilizing binary consensus protocol in the model with the same performances as maj-consensus is currently known.

Proof of Theorem 3.1. Let be the message displayed by an agent under P at a given round. For simplicity’s sake, in the following we assume that is even; the other case is handled similarly. In Emul(P), agent keeps the message  privately, and instead displays a clock written on bits, and one bit of the message , which we refer to as the P-bit. Thus, the total number of bits displayed by the agent operating in Emul(P) is . The purpose of the clock is to indicate to agent which bit of to display. In particular, if the counter has value , then the -th bit (i.e. the least significant bit) of is shown as the P-bit, and so on. In what follows, we refer to as the private message of , to emphasize the fact that this message is not visible in Emul(P). See Figure 3 for an illustration.

Each round of P executed in the model by an agent is emulated by rounds of Emul(P) in the model. We refer to such rounds as a phase, which is further divided into subphases of length . Note that since each agent samples 2 agents in a round, the total number of agents sampled by an agent during a phase is .

For a generic agent , a phase starts when its clock is zero, and ends after a full loop of its clock (i.e. when returns to zero). Each agent is running protocol Syn-Simple on the bits which correspond to her clock . Note that the phases executed by different agents may initially be unsynchronized, but, thanks to Proposition 2.1, the clocks eventually converge to the same value, for each agent , and hence all agents eventually agree on when each phase (and subphase) starts.

Let be an arbitrary agent. Denote by , …, the P-bits collected by from agents chosen u.a.r during a phase. Consider a phase and a round in that phase. Let and be such that . We view as round of subphase of the phase. On this round, agent pulls two messages from agents and , chosen u.a.r. Once the clocks (and thus phases and subphases) have synchronized, agents and are guaranteed to be displaying the th index of their private messages, namely, the values and , respectively. Agent then sets equal to and equal to .

In Emul(P), the messages displayed by agents are only updated after a full loop of . It therefore follows from the previous paragraph that the P-bits collected by agent after a full-phase are distributed like the bits collected during one round of P in the model (see Definition 1), assuming the clocks are synchronized already.

Correctness. The bitwise-independence property of Syn-Simple (Lemma 2.2) implies that Syn-Simple still works when messages are constructed from the P-bits collected by Emul(P). Therefore, from Proposition 2.1, eventually all the clocks are synchronized. Since private messages are only updated after a full loop of , once the clocks are synchronized a phase of Emul(P) corresponds to one round of P, executed in the model. Hence, the hypothesis that P operates correctly in a self-stabilizing way in the model implies the correctness of Emul(P).

Running time. Once the clocks are synchronized, for all agents , using the first bits of the messages, the agents reproduce an execution of P with a multiplicative time-overhead of . Moreover, from Proposition 2.1, synchronizing the clocks takes rounds. Thus, the time to synchronize the clocks costs only an additive factor of rounds, and the total running time is .

Bitwise-independence property. Protocol Emul(P) inherits the bitwise-independence property from that of Syn-Simple (Lemma 2.2) and P (which has the property by hypothesis): We can apply Lemma 2.1 where Syn-Generic is Syn-Simple and P is the subroutine described above, which displays at each round the bit of whose index is given by a synchronized clock modulo (i.e. the one produced by Syn-Simple). Observe that the aforementioned subroutine is self-stabilizing, since it emulates P once clocks are synchronized. Then, in the notation of Lemma 2.1, Emul(P) is Syn-P.

4 Self-Stabilizing Clock Synchronization

In Section 2.2 we described Syn-Simple, a simple self-stabilizing Clock Synchronization protocol that uses bits per interaction. In this section we describe our main self-stabilizing Clock Synchronization protocol, Syn-Bits, that uses only bits per interaction. We first assume is a power of . We show how to get rid of this assumption in Section 4.2.

4.1 Clock Synchronization with 3-bit messages, assuming T is a power of two

In this section, we show the following result.

Lemma 4.1

Let be a power of . There exists a synchronization protocol Syn-Intermediate which synchronizes clocks modulo in time using only 3-bit messages. Moreover, Syn-Intermediate has the bitwise-independence property.

Before presenting the proof of Lemma 4.1, we need a remark about clocks.

Remark 2

In order to synchronize a clock modulo , throughout the analysis we often obtain a clock modulo which is incremented every rounds. However, can still be translated back to a clock modulo which is incremented every round, by keeping a third clock modulo and setting

\[ C = C' + C'' \bmod T. \]

Proof of Lemma 4.1. At a high level, we simply apply iteratively the Message Reduction Theorem in order to reduce the message to bits, starting with P = Syn-Simple. A pictorial representation of our recursive protocol is given in Figure 4, and a pseudocode is given in Algorithm 2 (footnote 9).

Footnote 9: The pseudocode deviates from the presentation done in the proof, as it makes no use of recursion.

Let us consider what we obtain after applying the Message Reduction Theorem the first time to P = Syn-Simple for clocks modulo . Recall that we assume that is a power of 2. From Proposition 2.1 we know that in this case, the convergence time of Syn-Simple is , the number of pulled agents at each round is and the number of bits of each message is .

With the emulation produced by the Message Reduction Theorem, the clock used in P = Syn-Simple is incremented only every rounds. Another way to interpret this is that we obtain a clock modulo and using Remark 2 we can view the clock as a counter modulo that is incremented at each round. Hence, by the running time analysis of the Message Reduction Theorem, we obtain a protocol Emul(P) which synchronizes a clock modulo in rounds. The message size is reduced from to .

By repeatedly applying the Message Reduction Theorem, we reduce the size of the message as long as , i.e. as long as . The number of repeated applications of the Message Reduction Theorem until the message size is is thus of order .

Let us analyze the running time. Let , and let be the smallest integer such that . We apply the Message Reduction Theorem times, and we obtain a message size and a running time , such that

\[ L_{i+1} \le \gamma_1 \left( \log \ell_i \log n + \ell_i L_i \right), \tag{4.1} \]

for some constant independent of . We set to be , taking the maximum with for technical convenience. The second term dominates in (Equation 4.1) because and . Hence is at most of order . More precisely, by induction we can bound , since

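\[ L_{i+1} \le \gamma_1 \log \ell_i \log n + \gamma_1^{i} \prod_{j=1}^{i} \ell_j \cdot L_1 \le \gamma_1 \ell_i \log n + \gamma_1^{i} \prod_{j=1}^{i} \ell_j \cdot L_1 \le 2 \gamma_1^{i} \prod_{j=1}^{i} \ell_j \cdot L_1 \le \gamma_1^{i+1} \prod_{j=1}^{i} \ell_j \cdot L_1, \tag{4.2} \]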

where we use the fact that , and the definition of .

The running time of Emul(P) Syn-Clock after the last application of the Message Reduction Theorem, i.e. , is thus

\[ L_{\textsc{Syn-Clock}} := L_{\tau} \le \gamma_1^{\tau} \prod_{i=1}^{\tau} \ell_i \, L_1. \tag{4.3} \]

We use the following fact.

Fact 4.1

If , it holds

\[ e^{\frac{x}{1+x}} \le 1 + x \le e^{x} \le 1 + \frac{x}{1-x}. \]
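As a sanity check, the chain of inequalities in Fact 4.1 can be verified numerically. The sketch below assumes the hypothesis of the fact is |x| < 1 (the condition is not legible in the statement above, so this range is an assumption made here); the function names and the tolerance are likewise illustrative.

```python
import math


def fact_4_1_holds(x, tol=1e-12):
    """Check e^{x/(1+x)} <= 1 + x <= e^x <= 1 + x/(1-x) at a point x.

    Assumption: -1 < x < 1, so that both fractions are well defined.
    """
    a = math.exp(x / (1 + x))
    b = 1 + x
    c = math.exp(x)
    d = 1 + x / (1 - x)
    return a <= b + tol and b <= c + tol and c <= d + tol


def check_grid(step=0.01):
    """Evaluate the chain on a grid of points strictly inside (-1, 1)."""
    k_max = int(round(1 / step)) - 1
    return all(fact_4_1_holds(k * step) for k in range(-k_max, k_max + 1))
```

A grid check of this kind does not replace a proof, but it helps catch transcription errors in the direction of the inequalities.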

From the bounds , , , and Lemma B.1, we obtain and

\[ \ell_3^{\tau} \le 2^{O\left( (\log^{\circledast 4} T)^2 \right)} \le 2^{O(\log\log\log T)} \le (\log\log T)^{O(1)}. \]

We thus conclude that

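\[
\begin{aligned}
L_{\textsc{Syn-Clock}} &\le \gamma_1^{\tau} \prod_{i=1}^{\tau} \ell_i \, L_1 \le O(\log\log\log T) \cdot \ell_1 \ell_2 \ell_3^{\tau} \cdot O(\log T \log n) && (4.4)\\
&\le O(\log\log\log T) \cdot O(\log T) \cdot O(\log\log T) \cdot O(\log\log T)^{O(1)} \cdot O(\log T \log n) && (4.5)\\
&\le \log^2 T \log n \cdot (\log\log T)^{O(1)}. && (4.6)
\end{aligned}
\]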

The total slowdown with respect to Syn-Simple corresponds to . Hence the clock produced by the emulation is incremented every rounds. In other words we obtain a clock modulo for some function . But using Remark 2 we can still view this as a clock modulo .

4.2 Extension to general T and running time improvement.

In this subsection we aim to get rid of the assumption that is a power of in Lemma 4.1, and also reduce the running time of our protocol to , proving Theorem 1.2.

Proof of Theorem 1.2. From Lemma 4.1, we know that Syn-Intermediate synchronizes clocks modulo in time using only -bit messages, provided that is a power of 2. While protocol Syn-Intermediate emulates protocol Syn-Simple, it displays the first bit of the message of Syn-Simple only once every rounds. Of course, it would be more efficient to display it times in a row, so that maj-consensus would make every agent agree on this bit, and then move to agreeing on the second bit, and so on. To achieve this, as in the proof of Syn-Simple, we can view a clock modulo , say , as written on bits. If agents already possess a “small” counter modulo they can use it to display the first bit for rounds, then the second one for rounds, and so on until each one of the bits of has been synchronized. This would synchronize all bits of the desired clock within rounds, w.h.p., while being very economical in terms of message length, since only bit is displayed at any time.

Therefore, we can use Lemma 4.1 to synchronize a counter modulo in rounds, using bits per message. Then, we can use a fourth bit to run maj-consensus on each of the bits of for consecutive rounds, for a total running time of rounds. At this point, an application of the Message Reduction Theorem would give us a protocol with running time using -bit messages. However, perhaps surprisingly, a similar strategy enables us to synchronize a clock modulo any integer (not necessarily a power of ).

Let us assume that is an arbitrary integer. Let be an upper bound on the convergence time of maj-consensus which guarantees a correct consensus with probability at least , for some constant large enough [DGM11]. Let be the smallest power of bigger than . By Lemma 4.1, using bits, the agents can build a synchronized clock running modulo in time . The other main ingredient in this construction is another clock which is incremented once every rounds and runs modulo . The desired clock modulo , which we denote , is obtained by

\[ C := (C' + Q_{T'} \cdot T') \bmod T. \tag{4.7} \]

It is easy to check, given the definitions of and that this choice indeed produces a clock modulo .
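The check can also be carried out mechanically. The sketch below simulates the two underlying counters and the composed clock of (4.7); it assumes that the fast clock C' runs modulo T' and is incremented every round, and that the slow counter is incremented each time C' wraps around and itself runs modulo T. The function names are hypothetical, and the modulus of the slow counter is an assumption made here for the illustration.

```python
def composed_clock(c_prime, q, t_prime, T):
    """Clock modulo T obtained from the fast clock c_prime (modulo t_prime)
    and the slow counter q, as in (4.7)."""
    return (c_prime + q * t_prime) % T


def step(c_prime, q, t_prime, T):
    """Advance both counters by one round.

    Assumption: q is incremented exactly when c_prime wraps around to 0,
    and q is stored modulo T.
    """
    c_prime = (c_prime + 1) % t_prime
    if c_prime == 0:
        q = (q + 1) % T
    return c_prime, q
```

Under these assumptions the composed value advances by exactly one modulo T in every round, including the rounds in which either counter wraps around.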

It remains to show how the clock modulo is synchronized. At first glance, it may seem as if we did not simplify the problem since is a clock modulo itself. However, the difference between and a regular clock modulo is that is incremented only once every rounds. This is exploited as follows.

The counter is written on internal bits. We show how to synchronize using a 4-th bit in the messages, similarly to the aforementioned strategy to synchronize ; we later show how to remove this assumption using the Message Reduction Theorem. Let us call a loop of modulo an epoch. The rounds of an epoch are divided into phases of equal length (the remaining rounds are just ignored). The clock determines which bit from to display. The first bit of is displayed during the first phase, then the second one is displayed during the second phase, and so on. By Theorem 2.1, the length of each phase guarantees that consensus is achieved on each bit of via maj-consensus w.h.p. (footnote 10). More precisely, after the first bit has been displayed for rounds, all agents agree on it with probability , provided is large enough (footnote 11). Thus, at the end of an epoch, agents agree on all bits of with probability greater than .

Footnote 10: Observe that, once clock is synchronized, the bits of do not change for each agent during each subphase. Thus, we may replace maj-consensus by the Min protocol where on each round of subphase each agent pulls another agent u.a.r. and updates her -th bit of to the minimum between her current -th bit of and the one of . However, for simplicity’s sake, we reuse the already introduced maj-consensus protocol.

Footnote 11: From Theorem 2.1, we have that after rounds, with large enough, the probability that consensus has not been reached is smaller than . Thus, after rounds, the probability that consensus has not been reached is smaller than . If we choose , we thus get the claimed upper bound .

We have thus shown that, by the time reaches its maximum value of , i.e. after one epoch, all agents agree on w.h.p. and then increment it jointly. From Lemma 4.1, Syn-Intermediate takes rounds to synchronize a clock modulo w.h.p. Together with the rounds to agree on w.h.p., this implies that after rounds the clocks are all synchronized w.h.p.

Finally, we show how to get rid of the extra -th bit to achieve agreement on . Observe that, once is synchronized, this bit is used in a self-stabilizing way. Thus, since Syn-Intermediate has the bitwise-independence property, using Lemma 2.1, the protocol we described above possesses the bitwise-independence property too. By using the Message Reduction Theorem we can thus reduce the message size from bits to bits, while only incurring a constant multiplicative loss in the running time. The clock we obtain counts modulo but is incremented only every rounds. It follows from Remark 2 that we may still view this as a clock modulo .

Remark 3 (Internal memory space)

The internal memory space needed to implement our protocols Syn-Simple, Syn-Intermediate, and Syn-Clock is close to in all cases: protocol Syn-Simple uses one counter written on bits, Syn-Intermediate needs internal memory of size

\[ \log T + O(\log\log T + \log\log\log T + \dots) \le \log T \, (1 + o(1)), \tag{4.8} \]

and the internal memory requirement of Syn-Clock is of order .

5 Majority Bit Dissemination with a Clock

In this section we assume that agents are equipped with a synchronized clock modulo for some big enough constant . In the previous section we showed how to establish such a synchronized clock in time and using 3-bit messages. We have already seen in Section 1.2 how to solve the Bit Dissemination problem (when we are promised to have a single source agent) assuming such synchronized clocks, by paying an extra bit in the message size and an additive factor in the running time. This section is dedicated to showing that, in fact, the more general Majority Bit Dissemination problem can be solved with the same time complexity and using 3-bit messages, proving Theorem 1.1.

In Section 5.1, we describe and analyze protocol Syn-Phase-Spread, which solves Majority Bit Dissemination by paying only a additive overhead in the running time w.r.t. Clock Synchronization. For clarity’s sake, we first assume that the protocol is using bits (i.e. 1 additional bit over the bits used for Clock Synchronization), and we later show how to decrease the number of bits back to 3 in Section 5.2, by applying the Message Reduction Theorem.

The main idea behind the -bit protocol, called Syn-Phase-Spread, is to make the sources’ input bits disseminate on the system in a way that preserves the initial ratio between the number of sources supporting the majority and minority input bit. This is achieved by dividing the dissemination process into phases, similarly to the main protocol in [OHK14], which was designed to solve the Bit Dissemination problem in a variant of the model in which messages are affected by noise. The phases induce a spreading process which allows us to leverage the concentration property of the Chernoff bounds, preserving the aforementioned ratio. While, on an intuitive level, the role of noisy messages in the model considered in [OHK14] may be related to the presence of sources having conflicting opinions in our setting, we remark that our protocol and its analysis depart from those of [OHK14] on several key points: while the protocol in [OHK14] needs to know the noise parameter, Syn-Phase-Spread does not assume any knowledge about the number of different sources, and our analysis does not need to control the growth of the number of speaking agents from above (footnote 12).

Footnote 12: To get such an upper bound, the analysis in [OHK14] leveraged the property that in the model the number of agents getting a certain message can be upper bounded by the number of agents sending such a message, which is not the case for the passive communication of the model.

In order to perform such a spreading process with 1 bit only, the protocol in [OHK14] leverages the fact that in the model agents can choose when to speak, i.e. whether to send a message or not. To emulate this property in the model, we use the parity of the clock : on odd rounds agents willing to “send” a display , while others display and conversely on even rounds. Rounds are then grouped by two, so rounds in the model correspond to round in the version.

In this section we describe protocol Syn-Phase-Spread. As mentioned above, for clarity’s sake we assume that Syn-Phase-Spread uses -bit messages, and we show how to remove this assumption in Section 5.2. Three of such bits are devoted to the execution of Syn-Clock, in order to synchronize a clock modulo for some constant large enough. Throughout this section we assume, thanks to Theorem 1.2, that has already been synchronized, which happens after rounds from the start of the protocol. In Section 5.1.1, we present a protocol Phase-Spread solving Majority Bit Dissemination assuming agents already share a common clock.

Let be a constant to be set later. Protocol Phase-Spread is executed periodically over periods of length , given by a clock . One run of length is divided into phases, the first and the last ones lasting rounds, all the other phases lasting rounds. The first phase is called boosting, the last one is called polling, and all the intermediate ones are called spreading. For technical convenience, in Phase-Spread agents disregard the messages they get as their second pull (footnote 13).

Footnote 13: In other words, Phase-Spread works in the model.

During the boosting and the spreading phases, as we already explained in the introduction of this section, we make use of the parity of time to emulate the ability to actively send a message or not to communicate anything as in the model (footnote 14): in the first case we say that the agent is speaking, in the second case we say that the agent is silent. This induces a factor slowdown which we henceforth omit for simplicity.

Footnote 14: Of course, agents are still not able to control who sees/contacts them.

At the beginning of the boosting, each non-source agent is silent. During the boosting and during each spreading phase, each silent agent pulls until she sees a speaking agent. When a silent agent sees a speaking agent , memorizes but remains silent until the end of the phase; at the end of the current phase, starts speaking and sets . The bit is then never modified until the clock reaches again. Then, during the polling phase, each agent counts how many agents with and how many with she sees. At the end of the phase, each agent sets their output bit to the most frequent value of observed during the polling phase. We want to show that, for all agents, the latter is w.h.p. (the most frequent initial opinion among sources).
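The behaviour of a single boosting or spreading phase can be sketched as follows. This is a simplified illustration, not the full Phase-Spread protocol: the function name and the parameter `phase_len` are hypothetical (the actual phase lengths are those fixed in the text), the parity-based emulation of speaking is abstracted away, and each silent agent pulls one agent per round, possibly herself.

```python
import random


def run_phase(bits, phase_len, rng):
    """Simulate one boosting/spreading phase.

    bits[u] is 0 or 1 if agent u is speaking with that bit at the start of
    the phase, and None if u is silent. Returns the configuration at the
    start of the next phase.
    """
    n = len(bits)
    memo = list(bits)  # what each agent has memorized so far
    for _ in range(phase_len):
        for u in range(n):
            if memo[u] is None:  # silent, nothing memorized yet: keep pulling
                v = rng.randrange(n)
                if bits[v] is not None:  # v was speaking during this phase
                    memo[u] = bits[v]
    # Agents that memorized a bit start speaking only after the phase ends.
    return memo
```

Note that agents who memorize a bit during the phase are not visible as speakers until the phase ends, which is what keeps the spreading process of each phase driven only by the agents speaking at its start.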

5.1.2 Analysis

We prove that at the end of the last spreading phase w.h.p. all agents are speaking and each agent has with probability for some positive constant (where the dependency in is monotonically increasing), otherwise. From the Chernoff bound (Corollary A.1) and the union bound, this implies that when at the end of the polling phase w.h.p. each agent learns .

Without loss of generality, let , i.e. .

The analysis is divided into the following lemmas.

Lemma 5.1

At the end of the boosting phase it holds w.h.p.

 k(1)1+k(1)0 ≥(k1+k0)γphase3logn⋅{1}{k1+k0n−2√nlogn}, (5.10) k(1)1k(1)0 ≥k1k0(1−√9γphasek0).
• First, we prove (5.9). By using Fact 4.1, we have

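\[
\begin{aligned}
\mathbb{E}\left[ k_1^{(1)} + k_0^{(1)} \right] &= k_1 + k_0 + (n - k_1 - k_0) \left( 1 - \left( 1 - \frac{k_1 + k_0}{n} \right)^{\gamma_{phase} \log n} \right) \\
&\ge k_1 + k_0 + (n - k_1 - k_0) \left( 1 - e^{-\frac{k_1 + k_0}{n} \gamma_{phase} \log n} \right). && (5.11)
\end{aligned}
\]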

We distinguish three cases.

Case . By using Fact 4.1 again, from (5.11) we get

 E[k(1)1+k(1)0] ≥k1+k0+(n−k1−k0)(1−e−k1+k0nγphaselogn) ≥k1+k0+(n−k1−k0)k1+k0nγphaselogn1+