# Efficient Multiparty Interactive Coding for Insertions, Deletions and Substitutions

Ran Gelles. Faculty of Engineering, Bar-Ilan University, ran.gelles@biu.ac.il. Supported in part by the Israel Science Foundation (ISF) through grant No. 1078/17.

Yael T. Kalai. Microsoft Research, yael@microsoft.com.

Govind Ramnarayan. EECS Department, MIT, govind@mit.edu. Supported in part by NSF award CCF 1665252 and NSF award DMS-1737944.
###### Abstract

In the field of interactive coding, two or more parties wish to carry out a distributed computation over a communication network that may be noisy. The ultimate goal is to develop efficient coding schemes that can tolerate a high level of noise while increasing the communication by only a constant factor (i.e., constant rate).

In this work we consider synchronous communication networks over an arbitrary topology, in the powerful adversarial insertion-deletion noise model. Namely, the noisy channel may adversarially alter the content of any transmitted symbol, as well as completely remove a transmitted symbol or inject a new symbol into the channel.

We provide efficient, constant-rate schemes that successfully conduct any computation with high probability as long as the adversary corrupts at most an $\varepsilon/m$ fraction of the total communication, where $m$ is the number of links in the network and $\varepsilon$ is a small constant. This scheme assumes the parties share a random string to which the adversarial noise is oblivious. We can remove this assumption at the price of being resilient to an $\varepsilon/(m\log m)$ fraction of adversarial error.

While previous work considered the insertion-deletion noise model in the two-party setting, to the best of our knowledge, our scheme is the first multiparty scheme that is resilient to insertions and deletions. Furthermore, our scheme is the first computationally efficient scheme in the multiparty setting that is resilient to adversarial noise.

## 1 Introduction

In his pioneering work, Schulman [Sch92, Sch96] introduced and studied the problem of how one can perform an interactive two-party computation over a noisy communication channel. In [RS94], Rajagopalan and Schulman extended the two-party case and considered a network of $n$ parties that wish to compute some function of their private inputs by communicating over an arbitrary noisy network. (By “arbitrary” we mean that the topology of the network can be an arbitrary graph $G=(V,E)$ where each node is a party and each edge is a communication channel connecting the parties associated with these nodes.) The work of [RS94] shows that if each channel is the binary symmetric channel (BSC), that is, a channel that flips every bit with some constant probability, then one can obtain a coding scheme that takes any protocol that assumes noiseless communication, and converts it into a resilient protocol that computes the same task over the noisy network.

The coding scheme in [RS94] defies noise by adding redundancy. The amount of added redundancy is usually measured with respect to the noiseless setting: the rate of the coding is the communication of the noiseless protocol divided by the communication of the noise-resilient one. The rate assumes values between zero and one, and ideally the rate is bounded away from zero, commonly known as constant or positive rate. The rate may vary according to the network in consideration; for instance, the rate in [RS94] behaves as $1/O(\log(d+1))$, where $d$ is the maximal degree in the network. Hence, for networks where the maximal degree is non-constant, the rate approaches zero as the network size increases.

The next major step for multiparty coding schemes was provided by Jain et al. [JKL15] and by Hoza and Schulman [HS16]. In these works the noise is no longer assumed to be stochastic but instead is adversarial. That is, they consider worst-case noise where the only limit is the number of bits flipped by the adversary. They showed that as long as the adversary flips at most an $\varepsilon/m$-fraction of the total communication, a coding scheme with a constant rate can be achieved, where $\varepsilon$ is some small constant, and $m$ is the number of communication links in the network. ([JKL15] obtained this result for the specific star network, whereas [HS16] generalized this result to a network with arbitrary topology.) Hoza and Schulman further improved the noise resilience to $O(1/n)$, where $n$ is the number of parties, at the expense of reducing the rate to $O(n/(m\log n))$, which is no longer constant.

In this work we further improve the state of the art and show coding schemes for arbitrary networks that defy a stronger type of noise, namely, insertions and deletions. This type of noise may completely remove a transmission (so that the receiver is not aware that a bit was sent to it), or inject new transmissions (so that the receiver receives a bit while the sender did not send anything). Insertion and deletion noise is more general, and considered to be more difficult, than bit-flips. Indeed, a bit flip (a substitution noise) can be simulated by a deletion followed by an insertion. Our first coding scheme (Algorithm A) achieves a constant rate, and is resilient to an $\varepsilon/m$-fraction of adversarial insertion, deletion, and substitution noise (where each insertion, deletion, or substitution counts as a single corruption), assuming the parties share a common random string, and assuming the adversarial noise is oblivious to this common randomness. In our second coding scheme (Algorithm B), we remove the common randomness assumption and remove the restriction of the obliviousness of the adversary, at the price of being resilient to an $\varepsilon/(m\log m)$-fraction of insertion, deletion, and substitution errors. If we assume the parties pre-share a random string, then we can further improve the resilience against a non-oblivious adversary to an $\varepsilon/(m\log\log m)$-fraction of insertion and deletion errors (Algorithm C).

A main feature of our coding schemes is that they are all computationally efficient. So far, all previous coding schemes that considered adversarial noise in the multiparty setting used a combinatorial object known as a tree code for which no efficient construction is known. See Table 1 for a comparison of our new results (Algorithms A, B, and C) with the state of the art.

##### The communication model.

Some care should be taken when defining the communication model in the presence of insertions and deletions (see, for instance, [BGMO17, SW17] for the two-party case). Consider, for instance, a network where at each round every party sends a single bit to all its neighbors; this setting is used by [RS94, HS16, ABE16, BEGH17] and is sometimes called fully utilized. In such a setting, insertions and deletions reduce to substitutions and erasures: since exactly one bit is expected at every round, if no bit has arrived at a given round this must be due to adversarial activity.

In this work we consider a more relaxed communication setting, where a party may or may not speak at a given round; this setting is very common for distributed computations and was previously considered for interactive coding by [JKL15, GK17]. Furthermore, we no longer assume that the underlying error-free protocol is fully utilized. We relax that assumption and only require the underlying protocol to have a fixed order of speaking, which is independent of the parties’ inputs. Naturally, one can convert any protocol in our model to a fully-utilized protocol by forcing all parties to speak at every round, and then apply an interactive coding scheme to the fully-utilized protocol. However, the conversion to a fully-utilized protocol may cause the communication complexity to increase by a factor of up to $m$, and thus prevent a constant-rate coding.

As first noted by Hoza [Hoz15], when parties are allowed to keep silent at certain rounds, the pattern of communication may carry information by itself. To see this, consider a party that speaks once every two rounds: on an even round to communicate the bit ‘0’ and on an odd round to communicate the bit ‘1’. In fact, this kind of communication is completely resilient to noise that only flips bits, since only the timing of the transmission matters. Therefore, in this setting it seems crucial to allow insertion and deletion errors, which we do.
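The timing-channel example above can be made concrete with a small sketch. It is only an illustration under assumed conventions (rounds are numbered from 0, silence is modeled as `None`, and the helper names are hypothetical): flipping every transmitted symbol leaves the decoded message intact, while moving a transmission to the neighboring round, that is, one deletion plus one insertion, changes it.

```python
from typing import List, Optional

Slot = Optional[int]  # None = silence, 0/1 = a transmitted bit


def encode_by_timing(bits: List[int]) -> List[Slot]:
    """Each message bit occupies two rounds; the sender speaks in exactly one.

    Speaking in the even round of the pair means 0, in the odd round means 1.
    """
    rounds: List[Slot] = []
    for b in bits:
        pair: List[Slot] = [None, None]
        pair[b] = 0  # the transmitted symbol itself carries no information
        rounds.extend(pair)
    return rounds


def decode_by_timing(rounds: List[Slot]) -> List[Optional[int]]:
    """Recover each bit from *when* a symbol arrived, ignoring its value."""
    decoded: List[Optional[int]] = []
    for i in range(0, len(rounds), 2):
        spoke = [off for off in (0, 1) if rounds[i + off] is not None]
        decoded.append(spoke[0] if len(spoke) == 1 else None)  # None = ambiguous
    return decoded


def flip_all(rounds: List[Slot]) -> List[Slot]:
    """Substitution noise: flip every transmitted symbol."""
    return [None if s is None else 1 - s for s in rounds]


def shift_transmission(rounds: List[Slot], pair_index: int) -> List[Slot]:
    """One deletion plus one insertion: move the symbol to the other round."""
    out = list(rounds)
    a, b = 2 * pair_index, 2 * pair_index + 1
    out[a], out[b] = out[b], out[a]
    return out


message = [1, 0, 1, 1]
sent = encode_by_timing(message)
after_flips = decode_by_timing(flip_all(sent))
after_shift = decode_by_timing(shift_transmission(sent, 0))
```

Here `after_flips` equals the original message, whereas `after_shift` differs from it in the first bit.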

We note that even though our noise-resilient protocol increases the communication complexity by only a constant factor, it may blow up the number of rounds of communication by more than a constant factor. As opposed to the fully utilized model, where the communication complexity determines the round complexity, this is no longer the case in our model. Specifically, in our model an interactive protocol with communication complexity $\mathsf{CC}$ may consist of $\Theta(\mathsf{CC}/m)$ rounds (in the case that the network is fully utilized) or may consist of as many as $\mathsf{CC}$ rounds (in the case where the communication is very sparse).

In a recent work, Efremenko et al. [EHK18] constructed an interactive coding scheme resilient to insertions and deletions that blows up both the communication and the round complexity by at most a constant factor. They considered only the two-party setting, and a communication model in which in each round each party sends a message of an arbitrary length. We leave the problem of extending their result to the multiparty setting for future work.

### 1.1 Our results

We give efficient interactive coding schemes with constant rate for arbitrary synchronous networks (not necessarily fully-utilized) that suffer from a certain fraction of insertion, deletion and substitution noise. Our result is two-fold. First, we assume that the parties share a common random string, and that the adversary is oblivious, i.e., the noise is predetermined and is independent of this randomness. In this case our result is as follows.

###### Theorem 1.1 (common randomness, oblivious noise, informal).

Let $G=(V,E)$ be an arbitrary synchronous network with $n=|V|$ nodes and $m=|E|$ links. For any noiseless protocol $\Pi$ over $G$ with a predetermined order of speaking, and for any sufficiently small constant $\varepsilon$, there exists an efficient coding scheme that simulates $\Pi$ over a noisy network $G$, assuming the parties share common randomness. The simulated protocol is robust to adversarial insertion, deletion, and substitution noise, assuming at most an $\varepsilon/m$-fraction of the communication is corrupted. The simulated protocol communicates $O(\mathsf{CC}(\Pi))$ bits, and succeeds with probability at least $1-\exp(-\Omega(\mathsf{CC}(\Pi)/m))$, assuming the noise is oblivious.

Next, we remove the common randomness and the restriction to oblivious noise. Namely, we consider adversaries that may adaptively decide which transmissions to corrupt according to the observed transcript, as well as the parties’ inputs (however, the noise is unaware of any private coin-tossing a party may perform later in the protocol). In this case we still obtain an efficient coding scheme with a constant rate, albeit with slightly smaller noise resilience.

###### Theorem 1.2 (no common randomness, non-oblivious noise, informal).

Let $G=(V,E)$ be an arbitrary synchronous network with $n=|V|$ nodes and $m=|E|$ links. For any noiseless protocol $\Pi$ over $G$ with a predetermined order of speaking, and for any sufficiently small constant $\varepsilon$, there exists an efficient coding scheme that simulates $\Pi$ over a noisy network $G$. The simulated protocol is robust to adversarial insertion, deletion, and substitution noise, assuming at most an $\varepsilon/(m\log m)$-fraction of the communication is corrupted. The simulated protocol communicates $O(\mathsf{CC}(\Pi))$ bits, and succeeds with probability at least $1-\exp(-\Omega(\mathsf{CC}(\Pi)/m))$.

In Appendix B we consider the case where the adversarial channel is non-oblivious but the parties pre-share randomness. In this case we show a coding scheme (Algorithm C) that is resilient to a somewhat higher noise level of an $\varepsilon/(m\log\log m)$-fraction of insertion and deletion noise, while still incurring a constant blowup in the communication. See Appendix B for the complete details.

### 1.2 Our techniques

The basic idea towards constructing a multiparty coding scheme is to have each pair of parties perform a two-party coding scheme [RS94, JKL15, HS16]. However, merely correcting errors in a pairwise manner is insufficient, since if a pair of parties found an inconsistency and backtracked, this may cause new inconsistencies (in particular, between these parties and their other neighbors). In [JKL15], this problem was solved by assuming there is one party who is connected to all parties (i.e., the star topology). This party has a global view on the progress of the simulation in the entire network, and hence can function as the “conductor of the orchestra.”

In our setting, where the topology may be arbitrary, no such central party exists and consequently no party knows the state of the simulation in the entire network. Instead, each party only sees its local neighborhood, and needs to propagate its “status” to the entire network in a completely decentralized manner.

We mention that Hoza and Schulman [HS16] also considered an arbitrary topology, but they consider the fully-utilized model. Loosely speaking, correcting errors in the fully-utilized model seems easier since the parties can afford to do a consistency check at every round, and therefore the error is caught before too much wasted communication occurred. We elaborate on this point later.

In order to keep our simulation efficient, as opposed to previous works in the multiparty setting which used the (inefficient) tree-code approach, we use the rewind-if-error approach [Sch92, BK12, KR13, Hae14, GHK18] (see also [Gel17]). Namely, each two neighboring parties send a hash of their simulated pairwise transcripts, and if the hash values do not match, then an error is detected, and the two parties initiate a “meeting-points” mechanism [Sch92, Hae14] in order to find a previous point where their transcripts agree.
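The detection step of the rewind-if-error approach can be sketched as follows. This is a minimal illustration, not the hash family used in the paper: keyed BLAKE2b from Python's standard library stands in for a seeded hash, and the function names and the 8-bit default output length are assumptions made for the example. The meeting-points search itself, which locates the common prefix, is not modeled here.

```python
import hashlib


def short_hash(transcript: bytes, seed: bytes, out_bits: int = 8) -> int:
    """Seeded hash of a pairwise transcript, truncated to out_bits bits.

    Stand-in only: a keyed BLAKE2b digest replaces the seeded hash family.
    """
    if not 1 <= out_bits <= 64:
        raise ValueError("out_bits must be between 1 and 64")
    digest = hashlib.blake2b(transcript, key=seed, digest_size=8).digest()
    return int.from_bytes(digest, "big") >> (64 - out_bits)


def needs_meeting_points(my_transcript: bytes, seed: bytes,
                         neighbor_hash: int, out_bits: int = 8) -> bool:
    """True if the hash received from the neighbor disagrees with ours.

    A False answer does not prove agreement: with short outputs, two
    different transcripts may collide.
    """
    return short_hash(my_transcript, seed, out_bits) != neighbor_hash


seed = b"shared-seed"  # hypothetical seed known to both endpoints
same = needs_meeting_points(b"0110", seed, short_hash(b"0110", seed))
```

Since both endpoints hash with the same seed, identical transcripts always produce identical hashes, so `same` is `False`; a mismatch is what triggers the meeting-points mechanism.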

As mentioned above, once a party $u$ decides to rewind due to an error on the link $(u,v)$, this has an effect on the simulation of $u$ with its other neighbors, and those should be rewound to the same point as well. We emphasize that merely running the meeting-points mechanism with all the neighbors does not get the desired result, since the transcripts that $u$ shares with its other neighbors agree!

The next idea that comes to mind is to have party $u$ artificially truncate its transcript with all its neighbors and initiate a meeting-points sequence. However, this approach runs into several difficulties. First, this causes the rewind to be made sequentially, and can result in different parties simulating a different part of the original protocol at a given time (and thus cause the parties to be “out of sync”).

To illustrate this point, consider the line topology, where for each $i\in\{2,\dots,n-1\}$, party $i$ is connected to parties $i-1$ and $i+1$ (and party $1$ is connected only to party $2$ and party $n$ is connected only to party $n-1$). Moreover, suppose that the underlying protocol proceeds in chunks, where in each chunk the parties send messages in a straight-line manner from party $1$ to party $n$, and then party $n-1$ and party $n$ send $\Theta(n)$ messages back-and-forth to each other. Note that since we do not want to increase the overall communication complexity by more than a constant factor, we can only do consistency checks once every chunk.

Next suppose that an error occurred between parties $1$ and $2$ in a given chunk $i$. In particular, this implies that the bits communicated between party $n-1$ and party $n$ in this chunk are useless. Assuming there were no additional errors or hash collisions, parties $1$ and $2$ will notice this inconsistency in chunk $i+1$, which will cause them to rewind their pairwise transcript. As a result, parties $2$ and $3$ will notice this error in chunk $i+2$, and so on. Note that parties $n-1$ and $n$ will only notice this error at chunk $i+n-1$, and by that point, these two parties will have sent $\Theta(n^2)$ bits of wasted communication. (We emphasize that this problem does not exist in the fully-utilized network, since in that setting a consistency check can occur in each round. That is, parties 1 and 2 will be able to detect the error after one round, rather than after $\Theta(n)$ rounds as in the non-fully-utilized example.)

To overcome this blowup, we introduce a “flag-passing” phase in which each party informs the network whether all seems correct and the simulation should continue, or it is in an error-correcting state and the network should idle. To this end, each party generates its own continue/idle flag, and these flags are communicated via a spanning tree over the network in a standard convergecast manner: the leaves transfer their flags to their parents and so on until the flag reaches the root. Each node passes a “continue” flag if and only if all the incoming flags from its children, as well as its own flag, are “continue”. Once the flag reaches the root the process repeats in the other direction, so that all nodes obtain the final flag (as long as there were no errors in the flag-passing communication).
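The convergecast described above can be sketched in a few lines. The sketch assumes a noiseless run and a spanning tree given as a hypothetical `children` map; the function name and the returned bit count are illustrative additions, not part of the paper's protocol description.

```python
from typing import Dict, List, Tuple

CONTINUE, IDLE = True, False


def pass_flags(children: Dict[str, List[str]], root: str,
               own_flag: Dict[str, bool]) -> Tuple[Dict[str, bool], int]:
    """Noiseless flag passing over a rooted spanning tree.

    Returns the flag every node ends up with, and the number of bits sent.
    """
    bits_sent = 0

    def convergecast(node: str) -> bool:
        # A node forwards CONTINUE iff its own flag and all of its
        # children's incoming flags are CONTINUE.
        nonlocal bits_sent
        combined = own_flag[node]
        for child in children.get(node, []):
            child_flag = convergecast(child)
            bits_sent += 1  # the child sends one flag bit to its parent
            combined = combined and child_flag
        return combined

    final_flag = convergecast(root)
    received: Dict[str, bool] = {}

    def broadcast(node: str) -> None:
        # The root's decision travels back down the same tree.
        nonlocal bits_sent
        received[node] = final_flag
        for child in children.get(node, []):
            bits_sent += 1  # the parent sends the final flag to the child
            broadcast(child)

    broadcast(root)
    return received, bits_sent


tree = {"r": ["a", "b"], "a": ["c"]}
flags = {"r": CONTINUE, "a": CONTINUE, "b": CONTINUE, "c": IDLE}
final, cost = pass_flags(tree, "r", flags)
```

In this example a single idle leaf forces every node to idle, and the phase costs two bits per tree edge, six bits in total for the four-node tree.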

Note that $O(n)$ bits are sent during this flag-passing phase. To ensure that this does not blow up the communication, the flag-passing phase cannot be run too often. In particular, in order to obtain a constant blowup, we are allowed to communicate $O(1)$ bits of meta-data per bit of the protocol we simulate. To this end, our noise-resilient protocol will consist of phases, where each phase will take care of one specific activity in the simulation and will have at most $O(m)$ bits of communication. We now briefly sketch the different phases in our protocol, and refer the reader to Section 3.1 for an elaborated sketch of the protocol’s structure and a detailed description of the various phases and their role.

In the simulation phase we simulate a (small) chunk of the original protocol consisting of $\Theta(m)$ bits. Then we run a consistency-check phase, where the parties check if they are consistent with their neighbors via the meeting-points mechanism. This consistency-check phase (which we call the meeting-points phase) consists of at least $\Omega(m)$ bits, since all the neighbors communicate. In order to avoid a blowup in communication, we will also ensure it takes no more than $O(m)$ bits, interleaving its functionality over several iterations, if needed. Next we run the flag-passing phase, which, as discussed above, takes $O(n)$ communication. If the flag-passing propagated an idle flag, then in the next simulation phase each party simply sends a dummy message, and in the next consistency-check phase the parties will try to make progress and correct the errors. If the propagated flag is a continue flag then the parties continue to simulate the next chunk of the protocol.

It turns out, however, that the high-level protocol described above is insufficient. As argued above, it may take $n$ rounds of consistency checks for the entire network to do the necessary rewinding caused by a single error. During this time the network is idle (to ensure that the parties remain in sync), and yet $\Theta(mn)$ bits are communicated. Thus, a single error can cause an increase of $\Theta(mn)$ in the communication complexity (during which no progress is made in the protocol). It then follows that, using this approach, if we want to obtain only a constant blowup in communication we can only afford an $O(1/(mn))$-fraction of error. One of the conceptual novelties of this work is in the way we get around this blowup. This is done by separating the rewindings that stem from channel noise on a link from those that are triggered by rewinding on other links, and not by noise on that specific link. The first type will be corrected using the meeting-points approach. For the second type, we introduce a new rewind phase in which the parties can send “rewind” requests to their neighbors. Such a rewind request (if accepted) causes the receiver to truncate a single chunk of its transcript simulated on that link. This process essentially speeds up the rewinding process in the entire network. As usual, caution should be taken since these requests may arrive corrupted at the other side (or be injected by the noisy channel, etc.).

Finally, we mention that there are several other delicate issues that need to be dealt with. To give just a single example, one can observe it is important that the parties all agree on which phase they are currently in: whether it is the consistency-check phase (where the meeting point mechanism is run), the flag-passing phase, the simulation phase (which may be a dummy phase if the parties should not simulate due to an idle flag), or the rewinding phase (described above).

To avoid any possible confusion we fix the number of rounds each phase takes. This leads to several difficulties. For instance, the meeting-points mechanism may span many rounds (proportional to the number of corruptions observed). If the meeting-points mechanism takes more than the predetermined number of rounds allocated to the consistency-check phase, we will not be able to complete it during that phase. Instead, we let it span several iterations of the consistency-check phase (separated by the other phases). We do this by ensuring that each of these phases consists of an a priori fixed number of rounds.
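Because every phase has a fixed length, each party can determine the current phase from the global round number alone, with no coordination. The following sketch illustrates this; the phase lengths are hypothetical placeholders and the phase order simply follows the list given above, so neither should be read as the paper's actual parameters.

```python
from typing import List, Tuple

# Hypothetical phase lengths (in rounds), for illustration only.
PHASES: List[Tuple[str, int]] = [
    ("meeting_points", 3),
    ("flag_passing", 2),
    ("simulation", 5),
    ("rewind", 1),
]


def locate_round(global_round: int,
                 phases: List[Tuple[str, int]] = PHASES) -> Tuple[int, str, int]:
    """Map a global round number to (iteration, phase name, round in phase)."""
    iteration_length = sum(length for _, length in phases)
    iteration, offset = divmod(global_round, iteration_length)
    for name, length in phases:
        if offset < length:
            return iteration, name, offset
        offset -= length
    raise AssertionError("unreachable: offset always falls inside a phase")


where = [locate_round(r) for r in (0, 3, 5, 10, 11)]
```

A meeting-points computation that does not finish within its allotted rounds would simply resume at offset 0 of the same phase in the next iteration.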

Furthermore, for the simulation phase, we need to partition the underlying (errorless) protocol into a predetermined number of rounds, while ensuring that each such chunk consists of $\Theta(m)$ bits. Note that this is not trivial at all since, assuming a general protocol, a fixed number of rounds (say, $r$) implies a communication of anywhere between $0$ and $\Theta(mr)$ bits, and moreover, the amount of communication may depend on the specific inputs and noise that the parties see. Due to the above, we will assume that in the underlying protocol, whether a message is sent between a pair of parties in a given round is known and does not depend on the private inputs or the transcript so far (and only the content of the message depends on the inputs and the transcript). We will define a chunk as a set of consecutive rounds that consists of $\Theta(m)$ bits of communication in our underlying protocol, and let the predetermined number of rounds in the simulation phase be larger than the number of rounds in the longest chunk (where the parties remain quiet after the chunk is done, until the simulation phase is over).
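Since the speaking order is input-independent, the chunk boundaries can be computed once, ahead of time, from the schedule alone. The sketch below shows one way to do this under assumptions of our own: the schedule is given as a list of how many bits are sent in each round, chunks are cut greedily once a bit budget is reached, and the function names and example numbers are hypothetical.

```python
from typing import List, Tuple


def split_into_chunks(bits_per_round: List[int],
                      chunk_bits: int) -> List[Tuple[int, int]]:
    """Greedily cut the schedule into chunks of consecutive rounds.

    Each chunk is a half-open round interval [start, end) that is closed as
    soon as it carries at least chunk_bits bits; the final chunk may be short.
    """
    chunks: List[Tuple[int, int]] = []
    start, carried = 0, 0
    for rnd, bits in enumerate(bits_per_round):
        carried += bits
        if carried >= chunk_bits:
            chunks.append((start, rnd + 1))
            start, carried = rnd + 1, 0
    if start < len(bits_per_round):
        chunks.append((start, len(bits_per_round)))
    return chunks


def simulation_phase_rounds(chunks: List[Tuple[int, int]]) -> int:
    """Fixed simulation-phase length: the number of rounds in the longest chunk."""
    return max(end - start for start, end in chunks)


schedule = [1, 1, 1, 4, 0, 0, 1, 2, 4]  # bits sent in each round of the protocol
chunks = split_into_chunks(schedule, chunk_bits=4)
phase_len = simulation_phase_rounds(chunks)
```

With this schedule the two chunks span 4 and 5 rounds, so the simulation phase would be allotted 5 rounds and the parties would stay quiet for the last round when simulating the first chunk.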

A more detailed overview of the coding scheme can be found in Section 3.1.

##### Common random string

In our interactive coding scheme described above, during the consistency phase, parties send each other a hash of their simulated transcript. At first, we assume that the seeds to these hash functions come in the form of shared randomness (also known as a common random string), and we analyze the protocol assuming the adversary is oblivious to this shared random string.

Next, we show how to remove the need for this shared common randomness. The basic idea is adopted from previous work [GMS11, BK12, Hae14]: simply have each pair of parties sample this string and send it across their shared link. However, this random string is very long, and sending it will increase the communication complexity by a quadratic factor. Therefore, instead, the parties communicate a short random string which serves as a seed that generates a longer $\delta$-biased string. This is done via a well-known technique by Naor and Naor [NN93], or Alon et al. [AGHP92]. This $\delta$-biased randomness is used instead of the shared randomness above.

Intuitively, $\delta$-biased randomness suffices, since hashing with a $\delta$-biased seed behaves “close” to hashing with a truly uniform seed. However, this holds only when the input to the hash function is independent of the $\delta$-biased seed. Unfortunately, in our setting, the input to the hash function may depend on the $\delta$-biased seed. More specifically, in our coding scheme, the parties hash their partial transcripts in each meeting-points phase, and these partial transcripts do depend on the seed, since they are a function of the output of previous hashes, e.g., on whether or not hash collisions occurred in previous rounds.

We circumvent this issue by showing that a given pattern of hash collisions determines the (partial) transcripts throughout the protocol. Once the transcripts are fixed, the hash outputs behave similarly to the case of uniform randomness, up to a small statistical difference governed by $\delta$. Therefore, if $\delta$ is small enough, we can prove robustness for all possible fixed patterns of hash collisions. Then, we can union bound over all the possible patterns of hash collisions, which bounds the failure probability for the $\delta$-biased case by the failure probability of the uniform case plus an error term that depends on $\delta$ and on the number of different hash-collision patterns. One cannot take $\delta$ to be too small, since generating the $\delta$-biased string requires communicating $\Omega(\log(1/\delta))$ bits; hence the smaller $\delta$ gets, the more communication we need. Luckily, we can set $\delta$ to be sufficiently small so that the probability of failure is still exponentially small even after the union bound, while keeping a constant rate in the communication.
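The union-bound step can be written out schematically. The notation here is assumed for illustration and does not appear in the text above: $\delta$ is the bias of the seed, $\mathcal{H}$ is the set of possible hash-collision patterns, $\mathsf{Fail}$ is the event that the simulation fails, and $\delta'$ is the statistical difference (a function of $\delta$) between the hash outputs under the biased seed and under a uniform seed once the transcripts are fixed.

$$
\Pr_{\delta\text{-biased}}[\mathsf{Fail}]
\;=\; \sum_{h\in\mathcal{H}} \Pr_{\delta\text{-biased}}[\mathsf{Fail}\wedge h]
\;\le\; \sum_{h\in\mathcal{H}} \Bigl(\Pr_{\text{uniform}}[\mathsf{Fail}\wedge h] + \delta'\Bigr)
\;=\; \Pr_{\text{uniform}}[\mathsf{Fail}] + |\mathcal{H}|\cdot\delta' .
$$

Under these assumptions the extra term stays negligible only if $\delta'$ is much smaller than $1/|\mathcal{H}|$, which is the tension with the seed length described above.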

We refer the reader to Section 5 for details.

##### Our analysis

A main hurdle is analyzing the correctness of the scheme and its probability of success. As in previous works, the analysis is done using a potential argument that measures the progress of the simulation. The progress of each individual link is given by a potential function similar to the one in [Hae14]. On top of this per-link potential, we need to measure the progress from a “global” point of view. We next highlight the main technical difficulties in our analysis (that do not come up in previous works).

The first difficulty arises from our communication model, which is not fully-utilized. As a result, the communication complexity of the noise-resilient protocol is not a priori bounded, even if the communication in the underlying error-free protocol is bounded. In particular, suppose that in the underlying errorless protocol, in most of the rounds the communication is sparse (say, a single party speaks in each round). The adversary can insert errors that will cause the parties to get out of sync, and as a result all speak simultaneously. Thus, by inserting errors, the communication increases, which in turn gives the adversary additional budget to inject more errors. This issue does not come into play in the fully-utilized model, since in that setting, if the underlying protocol takes $T$ rounds, then in the noise-resilient protocol the parties simply abort after $O(T)$ rounds, which ensures only a constant blowup to the communication complexity. In our setting, we also instruct the parties to abort after $O(T)$ rounds, but in our case the communication can still increase from roughly $T$ bits to as many as $\Theta(mT)$ bits.

The second difficulty relates to hash collisions. Recall that our noise-resilient protocol consists of phases, where in the consistency phase all parties check if they are consistent with their neighbors. This is done using the meeting points mechanism, in which each party sends a hash of their (partial) transcript. To ensure that the communication rate remains constant, the hash functions have constant output length, and thus a pair of parties will not notice an error between them with constant probability. This event is known as a hash collision.

We note that in the two-party coding scheme of [Hae14] hash collisions also occur with constant probability, which was the main technical hurdle in that work. However, in our work, the situation is much worse, since the number of parties is large, and in particular larger than the inverse of the collision probability. To exemplify the issue, consider (as above) the example of a line network: if an error occurs between parties $i$ and $i+1$, then they will need to propagate this throughout the line, but with very high probability there will be a hash collision somewhere along the way and the propagation will fail! In addition, hash collisions may cause the parties to continue simulating the protocol in a wrong manner, and hence may increase the communication even further (and as a result increase the error budget).

Indeed, bounding the expected number of hash collisions is a subtle task, and is a main component of our proof. This is done in three steps.

We begin by assuming the parties share a common random string (CRS), and assume that the noise is independent of this CRS. As mentioned above, this random string is used as the seed of the hash functions, so that the probability of a hash collision is exponentially small in the output length, for any given input. We bound (with high probability) the number of hash collisions and, as a result, show that the total communication increases by only a constant factor (with high probability). This part of the analysis is quite similar to the analysis done in previous works (in particular, the work of [Hae14]).

We then remove the CRS by replacing it with a $\delta$-biased string, and argue that the number of hash collisions does not increase by much, assuming the adversary is oblivious to the $\delta$-biased string. As mentioned above, this part is quite subtle, and requires careful conditioning of probabilistic events, and a careful setting of the parameters to ensure that the union bound does not reduce the probability of success by too much.

Finally, we consider the case where the adversary is non-oblivious. We would like to follow the approach of [Hae14] and simply union bound over all the possible oblivious attacks. Unfortunately, as opposed to the two-party case of [Hae14], here there are too many options for attacks that lead to too many possible hash collisions, and the proof fails spectacularly.

To overcome this issue, we must increase the length of the hash output, so it is no longer constant but rather $\Theta(\log m)$. Then the inherent hash-collision probability drops to $1/\mathrm{poly}(m)$, and the union bound yields the desired outcome. However, the length of each hash increased from a constant to $\Theta(\log m)$, and the rate of the coding scheme is no longer constant. To restore a constant rate, we simulate the protocol in larger chunks. That is, instead of sending a hash every $\Theta(m)$ bits of communication of $\Pi$, parties exchange hashes of length $\Theta(\log m)$ every $\Theta(m\log m)$ bits of simulation, thus increasing the overall communication by just a constant factor.
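A back-of-the-envelope check of why larger chunks restore a constant rate, under assumptions made only for this illustration: $m$ is the number of links, each link carries two hashes per consistency check (one from each endpoint), a hash has length $c\log m$, and a chunk carries $K\,m\log m$ simulated bits, where $c$ and $K$ are hypothetical constants.

$$
\frac{\text{hash bits per check}}{\text{simulated bits per chunk}}
\;=\; \frac{2m\cdot c\log m}{K\,m\log m}
\;=\; \frac{2c}{K} \;=\; O(1),
\qquad\text{whereas}\qquad
\frac{2m\cdot c\log m}{K\,m} \;=\; \frac{2c\log m}{K}.
$$

The second ratio, which corresponds to keeping chunks of $Km$ bits while lengthening the hashes, grows with $m$ and would make the rate vanish.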

### 1.3 Related work

As mentioned above, interactive coding was initiated by Schulman [Sch92, Sch96]. Over the last several years there has been a tremendous amount of work on interactive coding schemes in the two-party setting (e.g., [GMS14, BR14, BE14, BKN14, GH14, GHK18]), and in the multiparty setting (detailed below). We refer the reader to [Gel17] for a survey on the field (and to references therein).

In what follows we only mention the schemes that are closely related to our setting, namely, ones that are either in the multiparty setting, or ones that are in the two-party setting but are resilient to insertions and deletions.

Coding schemes for insertions and deletions in the two-party setting were first constructed by Braverman, Gelles, Mao, and Ostrovsky [BGMO17]. As noted above, in the model where each party sends a single bit in every round (which is the model used by most previous works, including [BGMO17]), insertions and deletions are only meaningful in the asynchronous model, as otherwise such an error model is equivalent to the erasure model. However, the asynchronous model seems to be incompatible with insertions and deletions, since a single deletion can cause a “deadlock”, where both parties wait for the next incoming message. Braverman et al. suggested a model where any deletion is followed by an insertion, so that the protocol never “halts” due to noise. However, the noise may delete a certain message and then inject a spoofed “reply” to the original sender. Then, one party believes that the protocol has progressed by one step, while the other party is completely oblivious to this. This type of noise was called a synchronization attack, as it brings the parties out of sync.

Braverman et al. [BGMO17] constructed a coding scheme for insertions and deletions in this model with constant communication rate, and resilience to a constant fraction of noise. Later, Sherstov and Wu [SW17] showed that a very similar scheme can actually resist an optimal noise level. Both these schemes are computationally inefficient. Haeupler, Shahrasbi, and Vitercik [HSV17] constructed an efficient scheme that is resilient to (a small) constant fraction of insertions and deletions. Furthermore, they constructed a scheme where the communication rate approaches 1 as the noise level approaches 0. Efremenko, Haramaty, and Kalai [EHK18] considered the synchronous setting, where parties can send messages of arbitrary length in each round, and the adversary may insert and delete bits in the content of each message. They constructed an efficient coding scheme with constant communication rate, and constant blowup in the round complexity, that is resilient to a small constant fraction of noise.

In the multiparty setting, Rajagopalan and Schulman [RS94] constructed a coding scheme for stochastic noise with rate for networks with maximal degree . This implies a constant rate coding scheme for graphs with constant degree. Alon et al. [ABE16] showed that if the topology is a clique, or a dense -regular graph, then constant rate coding is also possible. Yet, Braverman, Efremenko, Gelles, and Haeupler [BEGH17] proved that a constant rate is impossible if the topology is a star. All the above works assume a synchronous fully-utilized network. Gelles and Kalai [GK17] showed that constant rate coding schemes are impossible also on graphs with constant degree, such as a cycle, assuming a synchronous, yet not fully-utilized model.

The case of adversarial noise in the multiparty setting was first considered by Jain, Kalai, and Lewko [JKL15], who constructed a constant-rate coding scheme over an asynchronous star network that is resilient to fraction of noise. Hoza and Schulman [HS16] considered synchronous networks with arbitrary topology and constructed a constant rate coding scheme that is resilient to noise. Via routing and scheduling techniques, they show how to resist a fraction of -noise, while reducing the rate to . Both these schemes use tree-codes, and therefore are computationally inefficient.

Aggarwal, Dani, Hayes, and Saia [ADHS17] constructed an efficient synchronous coding scheme, assuming the parties have shared randomness that is unknown to the adversary. They consider a somewhat different communication model than the above, where the length of the protocol is not predetermined and may vary with the noise (similar to the two-party adaptive notion of [AGS16]). Their coding scheme is resilient to an arbitrary (and a priori unknown) amount of bit-flips (as long as the noise pattern is predetermined and independent of the parties’ shared randomness), and has a rate of .

Censor-Hillel, Gelles, and Haeupler [CGH18] constructed asynchronous schemes, where the parties do not know the topology of the network (an assumption that is very common in the distributed computation community). Their scheme is resilient to noise and has a rate of .

## 2 Preliminaries

### 2.1 Notations and setting

##### Notations and basic properties

For we denote by the set . The function is taken to base 2. For a distribution  we use to denote that is sampled according to the distribution . For a finite set  we let be the uniform distribution over ; we commonly omit  and write when the domain is clear from context. The indicator function is set to if and only if the event  occurs. We usually use it to indicate an equality of values, e.g., , which equals if and only if .

##### Multiparty interactive communication model

We assume an undirected network of parties, , and edges, where is connected to if and only if . We identify parties with nodes, and treat and node  as one. For , let denote the neighborhood of in , i.e. . The network is assumed to be a connected simple graph (i.e., without self-loops or multi-edges).

The communication model works in synchronous rounds as follows. At each round, any subset of parties may decide to speak. Each link is allowed to transmit at most one symbol per round in each direction from a constant-sized alphabet . We will assume throughout this paper that (both in the noiseless and noisy settings), however, our results extend to a larger alphabet as well. At each round, a party is allowed to send multiple (possibly different) symbols over multiple links. A transmission over a certain link at a certain round is the event of a party sending a message on this link at that round (if both parties send messages these are two separate transmissions).

We emphasize that, contrary to most previous work, our communication model is not fully-utilized and does not require all parties to speak at each round on every communication channel connected to them; in fact, we do not require any party to speak at all at any given round.

##### Multiparty protocol

Each party is given an input , and its desire is to output for some predefined function  at the end of the process. A protocol dictates to each party what is the next symbol to send over which channel (if any), as a function of the party’s input, the round number, and all the communication that the party has observed so far. After a fixed and predetermined number of rounds, the protocol terminates and each party outputs a value as a function of its input and observed transcript. The length of the protocol, also called its round complexity is the maximal number of rounds takes to complete on any possible input. The communication complexity of the protocol (in bits), denoted by , is the total number of transmissions in the protocol times . Since we assume , the communication complexity equals the number of transmissions.

Throughout this work we denote the underlying (i.e., noiseless) interactive protocol that the parties are trying to compute by . We will usually use the notation to denote the length of the (noiseless) protocol in chunks rather than in rounds; see Section 3.2 for details on partitioning protocols into chunks. We assume that the noiseless protocol has the property that the speaking order is independent of the inputs that the parties receive, and may depend only on the messages the party received. (Footnote 6: While we assume a fixed order of speaking for the noiseless protocol , we emphasize that this requirement does not apply to the coding scheme that simulates  over the noisy network.)

##### Noise model

We concern ourselves with simulating multiparty interactive protocols among parties over a noisy network. A single transmission over a noisy channel with alphabet is defined as the function

$$\mathsf{Ch}:\Sigma\cup\{*\}\to\Sigma\cup\{*\},$$

where is a special symbol that means “no message”. Given a single utilization of the channel, we say the transmission is corrupt (or noisy) if and . Specifically, if , this event is called a substitution noise, if (and ) the event is an insertion and if and the event is called a deletion.

We stress that the noise may have a crucial effect on the protocol executed by the parties. Not only may its correctness be harmed, but its length and communication may also vary. Hence, in the noisy setting we redefine and to be the length and communication complexity, respectively, of a given instance of the protocol (which is determined by the inputs, the randomness, and the noise pattern). Moreover, we emphasize that, as opposed to some previous work where the number of rounds fixes the communication complexity, in our model these two are only related by the trivial bound . We note that the gap between these two may be substantial.

The fraction of noise observed in a given instance is the fraction of corrupt transmissions out of all the transmissions in that instance. For the binary case the noise fraction can be written as

$$\mu=\frac{\#(\text{noisy transmissions})}{\mathsf{CC}(\Pi)}.$$

This is also known as relative noise fraction in adaptive (length-varying) settings, see for instance [AGS16, BGMO17, SW17, EHK18].
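
To make the three corruption types and the relative noise fraction concrete, the following minimal Python sketch classifies a single channel use and computes the noise fraction of a finite trace over a binary alphabet. Representing the silent symbol by `None`, as well as the names `classify` and `noise_fraction`, are illustrative assumptions and not part of the scheme; the denominator counts only symbols that were actually sent, following the definition of a transmission in Section 2.1.

```python
from typing import Iterable, Optional, Tuple

NO_MESSAGE = None  # assumed stand-in for the special "no message" symbol


def classify(sent: Optional[int], received: Optional[int]) -> str:
    """Classify one channel use by comparing channel input and output."""
    if sent == received:
        return "clean"         # output equals input (including silence on both ends)
    if sent is NO_MESSAGE:
        return "insertion"     # nothing was sent, yet a symbol arrived
    if received is NO_MESSAGE:
        return "deletion"      # a symbol was sent, yet nothing arrived
    return "substitution"      # a symbol was sent and a different symbol arrived


def noise_fraction(uses: Iterable[Tuple[Optional[int], Optional[int]]]) -> float:
    """Corrupt channel uses divided by the number of symbols actually sent.

    Assumes the trace contains at least one sent symbol.
    """
    uses = list(uses)
    corrupt = sum(1 for s, r in uses if classify(s, r) != "clean")
    transmissions = sum(1 for s, _ in uses if s is not NO_MESSAGE)
    return corrupt / transmissions


# Hypothetical trace of (sent, received) pairs on one link.
trace = [(0, 0), (1, 1), (1, 0), (0, None), (None, 1), (1, 1)]
print([classify(s, r) for s, r in trace], noise_fraction(trace))
```

Note that an insertion raises the numerator without raising the denominator, which is one reason the noise fraction is measured relative to the communication of the specific instance.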

An oblivious adversary is an adversary that pre-determines its noise attack, independently of the inputs and randomness of the parties. In this work we focus on a specific oblivious adversary known as an additive adversary [BBT60] (see, e.g., [CDF08, GS10, GIP14] for applications resilient against additive adversaries).

Assuming a binary alphabet, the oblivious additive adversary fixes a noise pattern that defines the noise per each link in each round of the protocol. In particular, the entry determines the noise added to the link in round . Assuming transmits to in iteration the message (where denotes the case of no message, i.e., ), then receives the transmission . The number of corruptions is the number of non-zero entries in . The noise fraction, again, is the number of corruptions divided by the actual communication given that error pattern (and specific inputs and randomness).
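
As an illustration of how a fixed noise pattern acts on the channel, here is a small Python sketch. It rests on assumptions that are not confirmed by the text above: bits are encoded as 0 and 1, the "no message" symbol is encoded as 2, each pattern entry lies in {0, 1, 2}, and the received symbol is the sum of the message and the entry modulo 3. The dictionary layout and function names are hypothetical as well.

```python
from typing import Dict, Tuple

STAR = 2  # assumed numeric encoding of "no message"; the bits are 0 and 1

# Hypothetical layout: (round, sender, receiver) -> noise entry in {0, 1, 2}.
NoisePattern = Dict[Tuple[int, str, str], int]


def apply_additive_noise(message: int, noise: int) -> int:
    """Assumed channel action: add the pre-fixed entry to the message modulo 3."""
    return (message + noise) % 3


def count_corruptions(pattern: NoisePattern) -> int:
    """The number of corruptions is the number of non-zero entries of the pattern."""
    return sum(1 for entry in pattern.values() if entry != 0)


pattern: NoisePattern = {(1, "u", "v"): 0, (2, "u", "v"): 1, (2, "v", "u"): 2}
print(apply_additive_noise(1, pattern[(1, "u", "v")]))  # entry 0 leaves the bit intact
print(count_corruptions(pattern))
```

Under this assumed encoding, a non-zero entry always changes what is received, so every non-zero entry is a genuine corruption: it turns a bit into the other bit (substitution), a bit into silence (deletion), or silence into a bit (insertion), depending on what was sent.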

###### Remark 1.

While there are other types of oblivious adversaries, we choose the additive one because (a) it is easier to analyze, and (b) it is sufficient for our end-goal (Theorem 1.2). In particular, we view the oblivious-adversary scheme as a stepping stone towards obtaining a coding scheme that is resilient to a non-oblivious adversary. As such, any (strong enough) oblivious adversary would do, and we choose the additive one as it makes the extension to non-oblivious adversaries (in Section 6) simpler.

If our final goal were the oblivious case (i.e., Theorem 1.1), then it would make sense to choose a stronger type of oblivious adversary; for instance, an adversary that fixes in advance the output of the channel in any corrupted transmission, e.g., whether it is 0, 1, or silence. Note that while such an adversary is more natural for the case of insertions and deletions, counting the actual number of corruptions such an adversary makes in this case is more difficult (e.g., when the adversary sets the output to be exactly what the parties communicate in that round anyway, this would not count as a corruption).

We remark here that our result (Theorem 1.1) also holds for the above stronger oblivious adversary. In fact, the oblivious-adversary analysis in Sections 4 and 5 goes through as-is. However, as mentioned, extending results for this oblivious adversary to the non-oblivious case would require additional care. ⬣

##### Coding scheme—a noise-resilient protocol

A coding scheme is a scheme that converts any protocol  into a noise-resilient protocol  that simulates  correctly with high probability on any given input. We say that a protocol simulates  correctly on a given input if each party can obtain its output corresponding to  from the transcript it sees when executing . The protocol is said to be resilient to -fraction of noise (with probability ), if it simulates correctly (with probability at least ) when executed over a noisy network with adversarial noise that corrupts at most -fraction of the transmissions in any instance of the protocol.

### 2.2 Codes

We use a standard binary error-correction code with constant rate and constant distance, which has efficient encoding and decoding procedures. Such codes can be constructed by concatenating Reed-Solomon codes with binary linear random codes, or by employing the near-linear codes by Guruswami and Indyk [GI05].

###### Theorem 2.1.

For every there exists and sufficiently large such that the following holds. There exists a binary linear code with rate and relative distance at least . Furthermore, can be encoded and decoded from up to fraction of bit-flips in polynomial time in .

### 2.3 Hash functions, δ-biased strings

We use a standard inner-product based hash function. The hash function is seeded with a (usually random) string such that each bit of the output is an inner product between the input and a certain part of (using independent parts of for different output bits). Formally,

###### Definition 2.2 (Inner Product Hash Function).

The inner product hash function is defined for any input of length and seed  of length , as the concatenation of inner products between  and disjoint parts of , namely,

$$h(x,s)=\langle x,s[1,L]\rangle\circ\cdots\circ\langle x,s[(\tau-1)L+1,\tau L]\rangle.$$

We use the shorthand .

We sometimes abuse notation and hash a ternary-string (or a string over a larger alphabet). In this case, assume we first convert into a binary string in the natural manner (each symbol separately, using bits) and then hash the binary string. The seed length should increase appropriately (by at most a constant).
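
The definition above can be sketched in a few lines of Python. This is only an illustration: bits are represented as Python integers in lists, inner products are taken modulo 2, and the names `inner_product` and `ip_hash` are our own.

```python
from typing import List, Sequence


def inner_product(x: Sequence[int], block: Sequence[int]) -> int:
    """Inner product of two equal-length bit vectors over GF(2)."""
    return sum(a & b for a, b in zip(x, block)) % 2


def ip_hash(x: Sequence[int], seed: Sequence[int], tau: int) -> List[int]:
    """Concatenate tau inner products of x with disjoint length-L blocks of the seed."""
    L = len(x)
    if len(seed) != tau * L:
        raise ValueError("seed must have length tau * len(x)")
    return [inner_product(x, seed[j * L:(j + 1) * L]) for j in range(tau)]


# Input of length L = 3, output length tau = 2, seed of length tau * L = 6.
print(ip_hash([1, 0, 1], [1, 1, 0, 0, 1, 1], tau=2))
```

One consequence visible directly from the code is that an all-zero input hashes to the all-zero output under every seed, so inputs that differ only by trailing zeros cannot be told apart by this hash alone.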

The following is a trivial property of the inner product hash function, stating that, given a uniform seed, the output is also uniformly distributed.

###### Lemma 2.3.

For any , and

$$\Pr_{s\sim U}[h_s(x)=r]=2^{-\tau}.$$

It is easy to see Lemma 2.3 also implies that the collision probability of the inner product hash function with output length is exactly , since given two strings and such that , the Lemma implies that the probability that .
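
For small parameters, both the uniformity of the output and the collision probability can be checked exhaustively. The sketch below enumerates all seeds for a hypothetical input length of 3 and output length of 2; it is a sanity check under these toy parameters, not a proof, and the helper names are our own. It uses a nonzero input, since the all-zero input always hashes to zero.

```python
from collections import Counter
from itertools import product


def ip_hash(x, seed, tau):
    """Inner product hash over GF(2): tau inner products with disjoint seed blocks."""
    L = len(x)
    return tuple(
        sum(a & b for a, b in zip(x, seed[j * L:(j + 1) * L])) % 2
        for j in range(tau)
    )


def output_counts(x, tau):
    """For each possible output, count the seeds (out of 2^(tau*L)) that produce it."""
    L = len(x)
    return Counter(ip_hash(x, s, tau) for s in product((0, 1), repeat=tau * L))


def collision_count(x, y, tau):
    """Count the seeds under which x and y receive the same hash value."""
    L = len(x)
    return sum(
        ip_hash(x, s, tau) == ip_hash(y, s, tau)
        for s in product((0, 1), repeat=tau * L)
    )


counts = output_counts((1, 0, 1), tau=2)
print(counts)                                          # every output is hit equally often
print(collision_count((1, 0, 1), (0, 1, 1), tau=2))    # distinct inputs
```

With 64 seeds and 4 possible outputs, each output is produced by 16 seeds, and two distinct inputs collide under exactly 16 seeds, matching a probability of 2^{-2} in both cases.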

Usually, the hash function is seeded with a uniform string. In order to reduce the amount of randomness needed, we use -biased random strings, which are close enough to uniform (for our needs), yet can be constructed from much shorter (uniform) seeds, via a result by Naor and Naor [NN93].

###### Definition 2.4 (δ-bias).

Fix . A distribution over is -biased if for any , we have that

$$\left|\Pr_{x\sim D}\left[\sum_{i=1}^{n}v_ix_i=0\right]-1/2\right|\le\delta.$$
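
For intuition, the bias of an explicitly given distribution over short bit strings can be computed by brute force. The sketch below assumes, as in the standard definition, that the test vectors range over all nonzero vectors and that the sum is taken modulo 2; the function name `bias` and the dictionary representation of a distribution are illustrative choices.

```python
from itertools import product
from typing import Dict, Tuple


def bias(dist: Dict[Tuple[int, ...], float], n: int) -> float:
    """Smallest delta for which dist (over n-bit tuples) is delta-biased."""
    worst = 0.0
    for v in product((0, 1), repeat=n):
        if not any(v):
            continue  # the all-zero test vector is excluded (assumed, as is standard)
        # Probability that the parity of the coordinates selected by v is zero.
        p_zero = sum(
            p for x, p in dist.items()
            if sum(a & b for a, b in zip(v, x)) % 2 == 0
        )
        worst = max(worst, abs(p_zero - 0.5))
    return worst


uniform = {x: 0.25 for x in product((0, 1), repeat=2)}
point_mass = {(0, 0): 1.0}
print(bias(uniform, 2), bias(point_mass, 2))
```

The uniform distribution has bias 0, while a point mass is as biased as possible (bias 1/2); a small-bias distribution sits close to the former while needing far fewer random bits to sample.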

###### Lemma 2.5 ([NN93, AGHP92]).

There is a constant and an efficiently computable function such that the following holds. Fix . For any size and a uniformly random string , we have that is a -biased distribution over bit strings of length of .

Finally, we appeal to the following lemma from [Hae14] (which is in turn based on [NN93]) that connects the behaviour of a hash function seeded with a -biased string to that of a hash function seeded with a uniformly random string, as long as the input is fixed and independent of the seed.

###### Lemma 2.6 (Lemma 6.3 from [Hae14]).

Fix positive integers , and . Consider pairs of binary strings where each string has length at most . Let be an inner product hash family with input length , output length , and seed length . Let be a random seed of length .

1. If is drawn from a uniform distribution over , then for each ,

$$\Pr_{s(i)}\left[h_{s(i)}(x_i)=h_{s(i)}(y_i)\right]=2^{-\tau}$$

if . Note that, trivially, when . Furthermore, for each the events of hash collisions are independent.

2. If is drawn from a -biased distribution over , then it holds that the distribution

$$\left(\mathbf{1}_{h_{s(1)}(x_1)=h_{s(1)}(y_1)},\ldots,\mathbf{1}_{h_{s(k)}(x_k)=h_{s(k)}(y_k)}\right)$$

is -close to the case where is uniformly random.

Recall that -closeness means that the statistical difference between two distributions is bounded by .

###### Definition 2.7 (δ-closeness).

We say that distributions over the probability space  are -close if

$$\sup_{A\subseteq\Omega}|P(A)-Q(A)|\le\delta.$$

The above is equivalent to having

$$\frac{1}{2}\sum_{\omega\in\Omega}\left|P(\omega)-Q(\omega)\right|\le\delta.$$

Note that as a corollary of Lemma 2.6, by setting to 1, to a constant, and using -biased randomness with , we obtain hash functions whose seed length is logarithmic in the size of their input, by seeding the inner product hash function with -biased seeds. This corollary was noted in previous work, e.g., in Naor and Naor [NN93].

###### Corollary 2.8.

There is a hash function family with input length , output length , and seed length such that, given any pair of inputs and such , we have that

$$\Pr_{Q}[h_Q(x)=h_Q(y)]\le 2\cdot 2^{-\tau}.$$

## 3 Coding scheme for oblivious adversarial channels

### 3.1 Overview

The high-level description of the simulation is as follows. The basic mechanism is the rewind-if-error approach from previous works [Sch92, BK12, Hae14] (see also [Gel17]). In particular, the parties execute the noiseless protocol  for some rounds and then exchange some information to verify whether there were any errors. If everything seems consistent, the simulation proceeds to the next part; otherwise, the parties rewind to a previous (hopefully consistent) point in and proceed from there.

Note that since multiple parties are involved, it may be that some parties believe the simulation so far is correct while others believe it is not. Yet, even if one party notices some inconsistency, the entire network should rewind. Hence, we need a mechanism that allows propagating the local view of each party to the entire network.

Our simulation algorithm consists of repeatedly executing the following four phases: (i) consistency check, (ii) flag passing, (iii) simulation, and (iv) rewind. The simulation protocol cycles through these four phases in a fixed manner, and each such cycle is referred to as an iteration. Each phase consists of a fixed number of rounds (independent of the parties’ inputs and the content of the messages exchanged). Therefore, there is never an ambiguity as to which phase (and which iteration) is being executed. We next describe each phase (not in the order they are performed in the protocol).

• Simulation: In this phase the parties simulate a single chunk of the protocol . Specifically, we split into chunks—consecutive sets of rounds—where at each chunk bits are being communicated, for some that is fixed throughout the simulation and such that is divisible by . Jumping ahead, we note that is set to be in the first protocol we construct, that is robust to oblivious adversaries, and is set to in the final protocol that considers arbitrary (non-oblivious) adversaries. Note that since the speaking order in  is fixed and predetermined, the partition into chunks can be done in advance and is independent of the inputs. We assume without loss of generality that each party speaks at least once in each chunk (this is without loss of generality since one can preprocess to achieve this property while increasing the communication complexity by only a constant factor).

In this phase, the parties “execute” the next chunk of , sending and receiving messages as dictated by the protocol .

This phase always takes rounds, which is the maximal number of rounds required to simulate transmissions of . It may be that the simulation of a specific chunk takes fewer rounds; in this case, the phase still takes rounds, and all the parties remain silent after the chunk’s simulation has completed until rounds have passed.

We note that some parties may be aware that the simulation so far contains errors that were not corrected yet (jumping ahead, this information can be obtained via local consistency checks that failed or via the global flag-passing phase, described below). When we reach the simulation phase, these parties will send a dummy message  to their neighbors and remain silent for rounds until the simulation phase completes.

• Consistency check: The main purpose of this phase is to check whether each two neighboring parties have consistent transcripts and can continue to simulate, or whether instead they need to correct prior errors. This phase is based on the meeting points mechanism [Sch92, Hae14], which allows the parties to efficiently find the highest chunk number up to which they both agree.

Roughly speaking, every time the parties enter this phase, they exchange a hash of their current transcripts with each other. If the hashes agree, the parties believe that everything is consistent and effectively continue with simulating . If the hashes do not agree, the parties try to figure out the longest prefix of their transcripts on which they do agree. To this end, they send hashes of prefixes of their transcript until the hashes agree. In our setting, each time the parties enter the “consistency check” phase they perform a single iteration of the meeting-points mechanism [Hae14], which consists of sending two hash values. If the hashes mismatch, they will send the next two hash values (of some prefixes of the transcript, as instructed by the meeting-points mechanism) next time they enter the consistency check phase. (Footnote 7: In addition to exchanging two hash values corresponding to prefixes of the transcript, the parties also exchange a hash indicating how long they have been running the meeting-points mechanism; see Appendix A for a full description.)

Note that the above is performed between each pair of adjacent parties, in parallel over the entire network.

• Flag passing: In the flag passing phase, the parties attempt to synchronize whether or not they should continue the simulation of in the next simulation phase. As mentioned, it may be that some parties believe that the simulation so far is flawless while others may notice that there are some inconsistencies. In this phase the information about (2-party) inconsistencies is propagated to all the parties.

Roughly, if any party believes it shouldn’t continue with the simulation, it notifies all its neighbors, which propagate the message to the rest of the network, and no party will simulate in the upcoming simulation phase. However, if all parties believe everything is consistent then no such message will be sent, and all the parties will continue simulating the next chunk of .

Technically speaking, the parties accomplish this synchronization step by passing a “flag” (i.e., a stop/continue bit) along a spanning tree of . Namely, each party receives a flag from each of its children in . If one of the flags is stop, or if the party sees an inconsistency with one of its neighbors, it sends a stop flag to its parent in the tree. Otherwise, it sends its parent the continue flag. After this phase ends and the root of receives all the flags, the root propagates the computed flag in the opposite direction back to the leaves. If there is no channel noise in this phase, it is clear that all parties are synchronized regarding whether the simulation should continue or not (recall that a party sends a dummy message during the simulation phase if its flag is set to stop).

• Rewind: In the rewind phase, each party tries to correct any obvious (i.e., length-wise) inconsistencies with its neighbors. Recall that the meeting-points mechanism allows two neighboring parties to truncate their mismatching transcripts to a prefix on which both parties agree. However, this may cause inconsistencies with all their other neighbors. Indeed, if and rewind several chunks off their transcript with each other, then must inform any other party to rewind the same number of chunks. This rewinding happens even if the transcripts on the link are consistent at both ends. Therefore, this discrepancy is not necessarily revealed by the meeting-points mechanism between and , and must be resolved in a different manner.

Technically, if the transcript of and consists of chunks, then will send a “rewind” message to any neighbor for which the transcript of and contains more than chunks. However, there are a few caveats. Any party that is currently trying to find agreement with via the meeting-points mechanism should not rewind the transcript with . Intuitively, we can see that any such rewind seems unnecessary, since is already going to truncate its transcript when it eventually finds agreement with in the meeting-points subroutine. Furthermore, an underlying assumption of the meeting-points protocol is that, until the parties decide to truncate their transcripts in the protocol, their transcripts do not change.

Additionally, we restrict each party to rewinding at most one chunk in each of its pairwise transcripts. This is primarily for ease of analysis: it means that no matter what kind of errors the adversary induces, there is only so much harm that can be done during the rewind phase. The upshot of this is that it is not necessarily true that after the rewind phase, sees exactly the same number of simulated chunks with all .

Once a party sends a rewind message to a neighbor , party  will truncate one chunk of the transcript that corresponds to the link , and might then want to send rewind messages to its own neighbors. These rewinds could trigger more rewinds, leading to a wave of rewinds going through the network. By providing rounds in the rewind phase, we make sure that this wave has enough time to go through the entire network. (Footnote 8: Alternatively, we could have fixed the rewind phase to consist of rounds (rather than  rounds), where is the diameter of the graph .) This is critical to guaranteeing that we fix errors quickly enough to simulate with constant overhead.
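
The wave of rewinds can be illustrated with a toy, noise-free Python sketch. It makes several simplifying assumptions that are not part of the scheme: both endpoints of a link agree on the number of chunks on that link, no link is in the middle of the meeting-points mechanism, and a link is truncated as soon as either endpoint sees it as longer than its shortest link. The data layout and the name `rewind_phase` are hypothetical.

```python
from typing import Dict, FrozenSet


def rewind_phase(lengths: Dict[FrozenSet[str], int], rounds: int) -> Dict[FrozenSet[str], int]:
    """Run a noise-free rewind phase; lengths maps each link to its number of chunks."""
    lengths = dict(lengths)
    rewound = set()  # each pairwise transcript loses at most one chunk per phase
    for _ in range(rounds):
        # Every party computes the shortest pairwise transcript in its neighborhood.
        shortest: Dict[str, int] = {}
        for edge, k in lengths.items():
            for party in edge:
                shortest[party] = min(shortest.get(party, k), k)
        # A link receives a rewind request if some endpoint sees it as ahead of the rest.
        requests = {
            edge for edge, k in lengths.items()
            if edge not in rewound and any(k > shortest[p] for p in edge)
        }
        if not requests:
            break
        for edge in requests:
            lengths[edge] -= 1
            rewound.add(edge)
    return lengths


# Path a - b - c - d where the link (a, b) is one chunk behind the other two.
links = {frozenset("ab"): 3, frozenset("bc"): 4, frozenset("cd"): 4}
print(rewind_phase(links, rounds=1))
print(rewind_phase(links, rounds=3))
```

In this example the truncation of link (b, c) in the first round is what causes party c to request a rewind on link (c, d) in the second round, which is why the phase needs enough rounds for the wave to cross the network.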

##### Randomness Exchange.

Our coding scheme is randomized. The only usage of randomness in our scheme is for the consistency check phase, where hash values are exchanged. Since both parties must use the same hash function they must agree on the seed of the hash function, which can be thought of as a random string. In the description of the protocol below, we will assume that any pair of neighboring parties share a long random string of polynomial length in the communication of the protocol, and this random string determines the seeds to the hash functions used in the protocol.

In Section 5 we show how to remove the need for a long shared random string. The basic approach is from previous works [BK12, Hae14]: First make the common randomness short, by having the randomness consist of a (short) seed for -biased randomness. Then, in the protocol, use the -biased randomness, as opposed to the true randomness. Once the random string is short, the parties can simply send it over the network using a standard error-correcting code, without blowing up the communication complexity by more than a constant factor. (Footnote 9: The reason a standard error-correcting code suffices is that this “randomness sharing” step of the protocol fully utilizes the network; hence, insertions are nonexistent, and deletions are equivalent to erasures.)

Unfortunately, this approach does not work in a straightforward manner. The reason is that our assumption, that hashes seeded with a -biased string behave “close to” hashes seeded with a uniform string, no longer holds when the inputs to the hash depend on the seed itself. Indeed, in our case the inputs depend on which hash collisions occurred in previous rounds, which is a function of the -biased string. Showing that a uniform seed can be replaced with a -biased one, despite the dependency of the inputs on the seed, is a technicality that was overlooked in [Hae14], and is addressed in Section 5. (Footnote 10: This technicality was not an issue in [BK12] since in their protocol, the seed to the hash function is not sent ahead of time, but rather is sent together with the hash value in each round. In our setting (as in [Hae14]) we cannot afford to send the hash seed in each iteration, since we have the budget to send only constantly many bits between two parties in each iteration.)

### 3.2 The coding scheme

We now formally describe the coding scheme assuming a common random string (CRS) and oblivious noise. Let be a noiseless protocol over , with rounds and transmissions throughout. Assume that the communication pattern and amount are predetermined, and independent of the parties’ inputs and the transcript. Namely, let be the noiseless transcript of ; the content of the messages depends on the specific inputs; however, their order, source and destination are fixed for .

We partition into rounds according to , and group the rounds into chunks, where each chunk is a set of contiguous rounds with total communication complexity exactly . Specifically, we keep adding rounds to a chunk until adding a round would cause the communication to exceed . Note that without the last round, the communication in the chunk is at least bits. We can then add a virtual round that makes the communication in the chunk be exactly  bits. This addition affects the communication complexity by a constant factor. From this point on, we assume that  adheres to our required structure.
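
The greedy partition described above can be sketched as follows. The per-chunk budget is passed as a parameter `target`, whose concrete value is a placeholder here, and the sketch assumes that no single round communicates more than `target` bits; the function name and the output format are illustrative.

```python
from typing import List, Sequence, Tuple


def partition_into_chunks(round_sizes: Sequence[int], target: int) -> List[Tuple[List[int], int]]:
    """Group consecutive rounds into chunks of exactly `target` bits.

    round_sizes[i] is the number of bits sent in round i. Each returned pair holds
    the round indices of a chunk and the size of the virtual padding round that
    brings the chunk's communication up to exactly `target` bits.
    """
    chunks: List[Tuple[List[int], int]] = []
    current: List[int] = []
    used = 0
    for i, size in enumerate(round_sizes):
        if size > target:
            raise ValueError("a single round exceeds the chunk budget")
        if used + size > target:
            # Adding this round would exceed the budget: close the chunk and pad it.
            chunks.append((current, target - used))
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        chunks.append((current, target - used))
    return chunks


print(partition_into_chunks([3, 4, 2, 5, 1], target=8))
```

Since the order and size of the rounds do not depend on the inputs, this partition can be computed once, in advance, by every party.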

We number the chunks in order, starting from 1. For any (possibly partial) transcript , we let denote the number of chunks contained in the transcript . In particular, is the maximal number of chunks in . We assume without loss of generality that in each chunk, each party sends at least one bit to each of its neighbors (again, this can easily be achieved by pre-processing  while increasing its communication by a constant factor). In addition, we assume that the protocol is padded with enough dummy chunks where parties simply send zeros. This padding is standard in the literature on interactive coding, and is added to deal with the case that the adversary behaves honestly in all the rounds until the last few rounds, and fully corrupts the last few rounds.

The parties simulate one chunk at a time, by cycling through the following phases in the following order: consistency check, flag passing, simulation, and rewind. Each phase takes a number of rounds that is a priori fixed, and since our model is synchronous, the parties are always in agreement regarding which phase is being executed.
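
Because every phase has a fixed length, each party can derive the current iteration and phase from the round number alone. A minimal sketch, with hypothetical phase lengths and a zero-based round counter:

```python
from typing import Sequence, Tuple

# Order in which the phases are executed within one iteration.
PHASES = ("consistency check", "flag passing", "simulation", "rewind")


def locate(round_number: int, phase_lengths: Sequence[int]) -> Tuple[int, str, int]:
    """Map a global round number to (iteration, phase name, round within the phase)."""
    iteration, offset = divmod(round_number, sum(phase_lengths))
    for name, length in zip(PHASES, phase_lengths):
        if offset < length:
            return iteration, name, offset
        offset -= length
    raise AssertionError("unreachable: offset is smaller than the iteration length")


lengths = (2, 4, 5, 3)  # illustrative values only, not the lengths used by the scheme
print(locate(0, lengths), locate(6, lengths), locate(26, lengths))
```

No communication is needed for this bookkeeping, which is what allows the scheme to assume that all parties agree on the phase being executed even when messages are inserted or deleted.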

Let denote the pairwise transcript of the link as seen by , where ; similarly, is the transcript of the same link as seen by  (which may differ from due to channel noise). In more detail, is the concatenation of the transcripts generated at each chunk, where the transcript of chunk  consists of two parts: (1) the simulated communication of chunk , and (2) the chunk number . (Footnote 11: It is important to add the chunk number since the inner-product hash function we use (Definition 2.2) has the property that for any string , .) The structure of part (1) is as follows. Assume that in the -th chunk in , bits are exchanged over the link in rounds . Then holds a string of length over describing the communication at times , as observed by . The symbol denotes the event of not receiving a bit at the specific round (i.e., due to a deletion). The transcript is defined analogously from ’s point of view. Note that restricted to the substrings that belong to chunk , if and only if there were no errors at rounds in the simulation phase; insertions and deletions at other rounds are ignored. We abuse notation and define to be the number of chunks that appear in .

In Algorithm 1 we describe the noise-resilient protocol for a fixed party . The parties start by initializing their state with a call to InitializeState() (Algorithm 2). Much of this state is used for keeping state across iterations of the meeting-points mechanism, described in Algorithm 7 (in Appendix A).

Next, the parties perform a single iteration of the meeting-points mechanism (Algorithm 7). Given a pair of adjacent parties and , the meeting-points mechanism outputs a variable , which indicates whether the parties want to simulate (in which case ) or continue with the meeting-points mechanism (in which case ).

Then, according to the output of the meeting-point mechanism and according to any apparent inconsistencies in the simulated transcripts with its neighbors, each party sets its “flag” to denote whether it should continue with the simulation or not. This status is used as an input to the flag-passing phase, described in Algorithm 3.

Each party ends the flag-passing phase with a flag denoted that is set to  if the network as a whole seems to be correct. Then, the parties perform a simulation phase. If the flag is set to , they execute for one additional chunk, according to the place they believe they are at. Otherwise, they send a special symbol  to indicate they are not participating in the current simulation phase.
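
In the absence of noise, the flag-passing phase amounts to aggregating the local flags up a spanning tree and broadcasting the result back down. The following sketch illustrates this; the tree representation, the use of booleans (`True` for continue, `False` for stop), and the name `flag_passing` are our own assumptions.

```python
from typing import Dict, List


def flag_passing(children: Dict[str, List[str]], root: str, local_ok: Dict[str, bool]) -> Dict[str, bool]:
    """Noise-free flag passing on a rooted spanning tree.

    local_ok[p] is True if party p sees no inconsistency with any neighbor.
    Returns the flag each party holds at the end of the phase.
    """
    def up(node: str) -> bool:
        # A party reports continue only if it and its entire subtree are consistent.
        ok = local_ok[node]
        for child in children.get(node, []):
            ok = up(child) and ok
        return ok

    final = up(root)  # the flag computed at the root

    # The root propagates the computed flag back towards the leaves.
    flags: Dict[str, bool] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        flags[node] = final
        stack.extend(children.get(node, []))
    return flags


tree = {"r": ["a", "b"], "a": ["c"]}
print(flag_passing(tree, "r", {"r": True, "a": True, "b": True, "c": False}))
```

A single party with a stop flag, here the leaf `c`, is enough to make every party skip the upcoming simulation phase.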

Finally, the rewind phase begins, where any party that sees an obvious discrepancy in the lengths of the transcripts in its neighborhood sends a single rewind request to any neighbor that is ahead of the rest, provided that and are not currently in the middle of a meeting-points process.

## 4 Coding scheme for oblivious channels: Analysis

In this section we analyze the coding scheme presented in Section 3 and prove the following Theorem.

###### Theorem 4.1.

Assume a network with parties and links. Suppose is a multiparty protocol over the network  with communication complexity , binary alphabet and fixed order of speaking. Let and let be a sufficiently small constant. Then, with probability at least , Algorithm 1 simulates correctly with communication complexity , assuming an oblivious adversary with error rate .

In order to prove the above theorem we define a potential function that measures the progress of the simulation at every iteration. In Section 4.1 we define the potential function and intuitively explain most of its terms. In Section 4.3 we prove that in every iteration the potential increases by at least , while the communication increases by at most , where counts the channel errors and hash collisions that occurred in that specific iteration. (Footnote 12: Recall that a single iteration of Algorithm 1 consists of a consistency-check phase, a flag-passing phase, a simulation phase and a rewind phase.)

We split the analysis of the potential into two parts: the meeting points mechanism and the rest of the coding scheme. The first part re-iterates the analysis of [Hae14] with minor adaptations. We defer the full proofs to Appendix A. The rest of the potential analysis is novel and performed in Sections 4.3.2 and 4.3.3. Specifically, in Section 4.3.2 we focus on the iterations with no errors/hash-collisions, and in Section 4.3.3 we focus on iterations that suffer from errors/hash-collisions. Then, in Section 4.4 we bound (with high probability) the number of hash-collisions that may happen throughout the entire execution of the coding scheme. Finally, in Section 4.5 we complete the proof of Theorem 4.1, by showing that the potential at the end of the coding scheme must be high enough to imply a correct simulation of , given the bounded amount of errors and hash-collisions.

In the following, all our quantities measure progress in chunks, where each chunk contains exactly bits. Recall that we denote by  the number of chunks in the noiseless protocol , and by  the number of chunks in the simulated (partial) transcript .

### 4.1 The potential function

Our potential function $\phi$ will measure the progress of the network towards simulating the underlying interactive protocol correctly. Naturally, $\phi$ changes as the simulation of Algorithm 1 progresses, and so depends on the round number. In what follows, for ease of notation, we omit the current round number in all the terms used to define $\phi$.

For each adjacent pair of parties $u$ and $v$, define

$$G_{u,v} \tag{1}$$

to be the size (in chunks) of the longest common prefix of $T_{u,v}$ and $T_{v,u}$. Namely, $G_{u,v}$ is the length of the longest prefix of communication between parties $u$ and $v$ in , that these parties agree on. Define $B_{u,v}$ to be

$$B_{u,v} \stackrel{\text{def}}{=} \max\bigl(|T_{u,v}|, |T_{v,u}|\bigr) - G_{u,v}. \tag{2}$$

Namely, $B_{u,v}$ is the gap between how far one of the parties thinks they have simulated and how far they have simulated correctly. (Note that we can have $B_{u,v}=0$ even when there have been errors in the network, as long as those errors were corrected.) Note that $B_{u,v}$ is always nonnegative by design. Furthermore, $B_{u,v}=0$ if and only if the parties have no differences in their pairwise transcripts with each other.

Define

$$G^* \stackrel{\text{def}}{=} \min_{(u,v)\in E} G_{u,v} \tag{3}$$

to be the largest chunk number through which the network as a whole has correctly simulated. Let

$$H^* \stackrel{\text{def}}{=} \max_{u} \max_{v\in N(u)} |T_{u,v}| \tag{4}$$

denote the largest chunk number which any party in the network thinks it has simulated; note that, by definition, $H^* \ge G^*$. Finally, we define

$$B^* \stackrel{\text{def}}{=} H^* - G^*. \tag{5}$$
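The progress terms of Eqs. (1) to (5) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each pairwise transcript is stored as a list of chunks; the names `common_prefix_len`, `progress_terms` and the dictionary layout are hypothetical and not part of the coding scheme.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]      # an undirected link (u, v)
Transcript = List[str]      # one entry per simulated chunk (illustrative encoding)


def common_prefix_len(a: Transcript, b: Transcript) -> int:
    """Length, in chunks, of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def progress_terms(T: Dict[Tuple[str, str], Transcript], links: List[Link]):
    """Return per-link G and B (Eqs. 1-2) and the global G*, H*, B* (Eqs. 3-5).

    T[(u, v)] is party u's view of its pairwise transcript with neighbor v,
    so T is assumed to hold both directions of every link.
    """
    G: Dict[Link, int] = {}
    B: Dict[Link, int] = {}
    for u, v in links:
        G[(u, v)] = common_prefix_len(T[(u, v)], T[(v, u)])
        B[(u, v)] = max(len(T[(u, v)]), len(T[(v, u)])) - G[(u, v)]
    G_star = min(G.values())                    # Eq. (3)
    H_star = max(len(t) for t in T.values())    # Eq. (4)
    B_star = H_star - G_star                    # Eq. (5)
    return G, B, G_star, H_star, B_star
```

For instance, on a path a, b, c where a and b disagree on their third chunk and b and c agree on two chunks, the sketch gives $G_{a,b}=2$, $B_{a,b}=1$, $B_{b,c}=0$, and $(G^*, H^*, B^*) = (2, 3, 1)$.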

In addition, our potential function also quantifies the progress of the meeting-points mechanism between any two adjacent parties in the network (which we elaborate on in Section 4.2 below, and in Appendix A). This is done via the term $\varphi_{u,v}$ defined in Eq. (39), which is closely inspired by the potential function stated in [Hae14]. Intuitively, $\varphi_{u,v}$ is the number of iterations of the meeting-points mechanism that parties $u$ and $v$ need to do to make $B_{u,v}=0$; indeed, for all pairs $(u,v)\in E$ it holds that (Proposition A.2)

$$0 \le B_{u,v} \le \varphi_{u,v},$$

and in particular, $\varphi_{u,v}=0$ implies that $B_{u,v}=0$.

Finally, let $\mathsf{EHC}$ denote the number of errors and hash collisions that have occurred in the protocol until the current round of Algorithm 1. Similarly to all the other terms in the potential, we drop the dependence on the round .

Our potential function is defined to be:

$$\phi \stackrel{\text{def}}{=} \sum_{(u,v)\in E} \left( \frac{K}{m} G_{u,v} - K\cdot\varphi_{u,v} \right) - C_1 K B^* + C_7 K\cdot \mathsf{EHC} \tag{6}$$

where and are constants such that is sufficiently larger than 2, but smaller than all the constants defined in Eq. (39), and is a constant sufficiently larger than .
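To make the roles of the terms in Eq. (6) concrete, the following minimal sketch evaluates the potential on toy values. It reads the first term as $(K/m)\cdot G_{u,v}$ with $m$ the number of links, and the numeric values chosen for $K$, $C_1$ and $C_7$ are illustrative assumptions only; the function name `potential` is hypothetical.

```python
from typing import Dict, Tuple

Link = Tuple[str, str]


def potential(G: Dict[Link, int], varphi: Dict[Link, int],
              B_star: int, EHC: int,
              K: int, C1: float, C7: float) -> float:
    """Evaluate Eq. (6), reading its first term as (K/m) * G_{u,v}."""
    m = len(G)  # number of links in the network
    link_sum = sum((K / m) * G[e] - K * varphi[e] for e in G)
    return link_sum - C1 * K * B_star + C7 * K * EHC


# Toy network with m = 3 links; K, C1, C7 are illustrative values only.
links = [("a", "b"), ("b", "c"), ("c", "a")]
K, C1, C7 = 12, 3.0, 100.0
no_mismatch = {e: 0 for e in links}

before = potential({e: 4 for e in links}, no_mismatch, 0, 0, K, C1, C7)
after = potential({e: 5 for e in links}, no_mismatch, 0, 0, K, C1, C7)
gain = after - before  # every link extends its agreed prefix by one chunk
```

Under this reading, when every link correctly extends its common prefix by one chunk and no other term changes, the $m$ contributions of $K/m$ add up to a gain of exactly $K$ (here, 12).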

###### Remark 2 (Remark on Notation).

For any variable that represents the state of some party in Algorithm 1, including all the ones in Table 1, we let denote the value of the variable at the beginning of iteration . For example, denotes the value of the partial transcript at the very start of the tenth iteration of Algorithm 1. ⬣

### 4.2 The meeting-points mechanism and potential $\varphi_{u,v}$

In what follows, we briefly recall the meeting-points mechanism and why we use it. We defer the formal definition of $\varphi_{u,v}$ and all the proofs regarding it to Appendix A.

If two adjacent parties and have (or equivalently, ), then they should not simulate further with each other without fixing the differences in their transcripts. If and knew which of them needs to roll back and by how much, they could simply roll back the simulated chunks until , at which point they can continue the simulation. However, they do not know this information. Furthermore, they cannot afford to communicate or , since these numbers potentially require bits to communicate.

This problem is solved via the “meeting-points” mechanism [Hae14] which is designed to roll back and to a point where , while only requiring exchanges of hashes between parties  and , and guaranteeing that (in the absence of error) neither nor truncate their transcript too much. That is, (resp. ) truncates (resp. ) by at most chunks. While errors and hash collisions can mess up this guarantee, each error or hash collision causes only a bounded amount of damage. Since the adversary’s allowed error rate is sufficiently small, the simulation overcomes this damage with high probability.

As mentioned above, our analysis of the meeting-points mechanism essentially follows that of Haeupler [Hae14] after adapting it to our construction, where the meeting-points mechanism is interleaved over several iterations, rather than performed all at once. Specifically, for each link $(u,v)$ we define a “meeting-points potential” term $\varphi_{u,v}$ that approximately measures the number of hash exchanges it will require for $u$ and $v$ to roll back $T_{u,v}$ and $T_{v,u}$ to a common point. We analyze how this potential behaves in each of the phases of our simulation protocol. While its behavior in the meeting-points phase naturally repeats the analysis of [Hae14], $\varphi_{u,v}$ can also change during the other phases of the protocol, especially when noise is present. Our analysis bounds the change in $\varphi_{u,v}$ in all the phases as a function of the errors and hash collisions that occurred throughout. This allows bounding the change in the overall potential $\phi$. We bound the changes in $\varphi_{u,v}$ in the Flag Passing, Rewind, and Simulation phases in Claim A.1. The changes in $\varphi_{u,v}$ in the Meeting Points phase are addressed in Lemma A.6 (analogous to Lemma 7.4 in [Hae14]) and Proposition A.4, and are combined to establish how the potential changes in the Meeting Points phase (Lemma A.11).

We defer the formal definition of the meeting-points mechanism and the proofs of the relevant properties to Appendix A.

### 4.3 Bounding the potential increase and communication per iteration

In this section we prove the following technical lemma, which says that the potential (Eq. (6)) increases in each iteration by at least . Furthermore, the amount of communication performed during a single iteration can be bounded by roughly  times the number of links (i.e., pairs of parties) that suffer from channel noise or experience a hash collision during this iteration.

###### Lemma 4.2.

Fix any iteration of Algorithm 1 and let $\ell$ be the number of links with errors or hash collisions on them during this iteration. Then,

1. The potential increases by at least in this iteration.

2. The amount of communication in the entire network during this iteration ($\mathsf{CC}$) satisfies

$$\mathsf{CC} \le \alpha(1+\ell)K,$$

where $\alpha$ is a sufficiently large constant.
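The per-iteration bound in the second item adds up across iterations: over $I$ iterations the total communication is at most $\alpha K$ times $I$ plus the total number of noisy links. A minimal sketch of this bookkeeping follows; the function names and the numeric values of $K$ and $\alpha$ are illustrative assumptions, not values fixed by the lemma.

```python
from typing import Iterable


def iteration_comm_bound(ell: int, K: int, alpha: float) -> float:
    """Upper bound alpha * (1 + ell) * K on one iteration's communication."""
    return alpha * (1 + ell) * K


def total_comm_bound(ells: Iterable[int], K: int, alpha: float) -> float:
    """Sum of the per-iteration bounds, where ells[i] counts the links that
    suffer errors or hash collisions in iteration i."""
    return sum(iteration_comm_bound(ell, K, alpha) for ell in ells)


# Five iterations, three noisy links overall; K and alpha are toy values.
ells = [0, 0, 2, 0, 1]
bound = total_comm_bound(ells, K=10, alpha=4.0)
```

With these toy values the bound equals $\alpha K (I + \sum_i \ell_i) = 4 \cdot 10 \cdot (5 + 3) = 320$, so iterations without noise cost only $\alpha K$ each, and every noisy link adds at most another $\alpha K$.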

The next sections are devoted to proving the above lemma. Let us begin by giving a high-level overview of the proof.

#### 4.3.1 Proof Overview

We proceed to prove the lemma in two conceptual steps.

1. First, in Section 4.3.2, we consider iterations that have no errors or hash collisions.

We first establish that the communication in this case is at most . To this end, we first argue that the communication in the meeting-points, flag-passing, and rewind phases is always bounded by (Proposition 4.3), regardless of errors committed by the adversary. Therefore, it suffices to bound the communication in the simulation phase. If every party is simulating the same chunk, then the communication is easily bounded by . However, if the parties are simulating many different chunks, then the communication could be much larger. This is where the flag-passing phase is useful: if there are no errors, then the flags will prevent all parties from simulating when two parties are at different chunks.

We next establish that the potential increases by at least , as follows. If the parties simulate, then since there are no errors or hash collisions, increase by , and none of the other terms change. If the parties do not simulate, then either some adjacent parties did not pass their consistency check, in which case increases by (Lemma A.6) and none of the other terms change, or some parties rewind, in which case decreases and none of the other terms change.

2. Next, in Section 4.3.3, we consider iterations that have errors or hash collisions.

We first argue that errors and hash collisions increase by at least . To this end, note that errors may cause some terms of to decrease, but this is compensated for by the accompanying increase in , and since is set to be large enough, even though some of the terms decrease, overall the potential increases by at least .

We would then like to argue that the communication increases by at most , though unfortunately, this claim is false. The communication in an iteration can actually greatly exceed , though we show that in these cases, there were many errors or hash collisions in the iteration. Specifically, we argue that each error or hash collision individually does not cause too much extra communication. This is formalized in Lemma 4.8.

Before we formalize the intuition above, we mention some salient properties of the meeting-points potential function  that we use in the analysis.

#### 4.3.2 Iterations with no errors or hash collisions

First, we prove a simple proposition, which says that the communication in the meeting-points, flag-passing, and rewind phases is bounded. This reduces bounding the overall communication in an iteration to bounding the communication in the corresponding simulation phase.

###### Proposition 4.3.

The communication during the flag-passing and rewind phases is in total, and the communication in the meeting-points phase is , regardless of errors or hash collisions in the iteration.

###### Proof.

In the meeting-points phase, each adjacent pair of parties exchanges hashes of their transcripts (see Algorithm 7), where the output length of the hash functions is . Hence, there is communication in the meeting-points phase.
The communication pattern in the flag-passing phase is deterministic and consists of two messages per link of the spanning tree , hence it is upper bounded by . Finally, each link can have at most one valid “rewind” message in the rewind phase (note that messages that are inserted do not count towards our communication bound). ∎
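The count of two messages per spanning-tree link can be illustrated with a noiseless sketch of flag aggregation. The sketch assumes, as one natural realization, that status bits are AND-ed from the leaves up to the root and the result is then broadcast back down; the function `flag_passing`, the parent-map encoding of the tree and the message accounting are hypothetical and ignore insertions, deletions and the exact message format of Algorithm 1.

```python
from typing import Dict, List, Optional, Tuple


def flag_passing(parent: Dict[str, Optional[str]],
                 status: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """AND all status bits up a rooted spanning tree, then broadcast down.

    parent maps each party to its tree parent (the root maps to None).
    Returns the resulting flag of every party and the number of messages.
    """
    children: Dict[str, List[str]] = {u: [] for u in parent}
    root = None
    for u, p in parent.items():
        if p is None:
            root = u
        else:
            children[p].append(u)

    messages = 0

    def aggregate(u: str) -> int:
        nonlocal messages
        flag = status[u]
        for c in children[u]:
            flag &= aggregate(c)
            messages += 1          # upward message from child c to u
        return flag

    net_correct = aggregate(root)
    messages += len(parent) - 1    # one downward message per tree link
    return {u: net_correct for u in parent}, messages
```

On a tree with $n$ parties this uses $2(n-1)$ messages, and a single party with status 0 forces every party's flag to 0, which is the behavior the no-error case analysis relies on.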

###### Lemma 4.4.

Suppose that there are no errors or hash collisions in a single iteration of Algorithm 1. Then the overall communication in the network is .

###### Proof.

By Proposition 4.3, the communication in all the phases except the simulation phase is bounded by , and it remains to bound the communication in the simulation phase.

In the simulation phase, each party either sends or simulates a specific chunk. Say that simulates chunk number with all its neighbors if it did not send  (however, its neighbors may simulate a different chunk). Each chunk contains at most bits of communication; hence, the total amount of communication in the simulation phase is bounded by times the number of distinct chunk numbers being simulated in the network. In other words, it is bounded by , up to additional “messages” (which in our case are merely bits).

Therefore, to finish the proof it remains to argue that if there are no errors or hash collisions then and the potential function increases by at least in the iteration. We consider several different cases according to the state of the network at the beginning of the iteration, specifically, whether the parties have set or not.

##### Case 1: At the end of the flag-passing phase, $\mathsf{netCorrect}_u = 1$ for every party $u$.

Since there were no errors or hash collisions, the fact that means that each party  had before the flag-passing phase. This follows since by the definition of the flag-passing phase, for every party , . Note that, since we assume that none of the parties have , and we assume no errors, there are no symbols sent in the simulation phase.

The fact that each party has implies that for all , it holds that . Further, for any , , or otherwise the hashes would indicate a mismatch and the parties would have set . Putting these two facts together, we get that and hence , which implies that indeed , as desired.

##### Case 2: At the end of the flag-passing phase, some party $u$ has $\mathsf{netCorrect}_u = 0$.

Since for some party , there must be some party such that , and hence we have that for all . Since for all parties , we know that none of the parties will simulate (they will only send s) and hence the overall communication in the iteration will be .

We next show that the potential increases by at least  in any such iteration.

###### Lemma 4.5.

Suppose that there are no errors or hash collisions in a single iteration of Algorithm 1. Then the potential increases by at least  during this iteration.

###### Proof.

We consider the status of the network in this iteration according to the following three cases.

##### Case 1: At the end of the flag-passing phase, $\mathsf{netCorrect}_u = 1$ for every party $u$.

Recall that since there were no errors or hash collisions, the fact that means that each party  had before the flag-passing phase. This in turn implies that for all , it holds that . Further, for any , , or otherwise the hashes would indicate a mismatch and the parties would have set . Consequently, we have and . The fact that for every party , together with the fact that , implies that all parties simulate the same chunk, and the absence of errors in the communication implies that this simulation is done correctly. Hence, each is extended correctly according to . This in turn implies that increases for each , which causes to increase by .

Next, we argue that none of the other terms of decrease. We first argue that remains zero at the end of the iteration. To this end, note that since all parties choose to simulate one more chunk in each of their pairwise transcripts, we still have the property that for all and after the simulation phase. Since there were no errors, we also have that for all . As noted before, this gives us that after the simulation phase, and since there are no errors it remains zero after the rewind phase as well.

It remains to argue that does not increase for any . By Proposition A.4 we know that does not increase in the meeting-points phase. Furthermore, it does not increase in the flag-passing, simulation or rewind phases either, by Claim A.1.

Putting this all together, we have that each increases by one, does not change and does not increase, which implies that the potential increases by at least  overall, as desired.

Since , has