Network-Aware Byzantine Broadcast in Point-to-Point Networks using Local Linear Coding

This research is supported in part by Army Research Office grant W-911-NF-0710287 and National Science Foundation award 1059540. Any opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect the views of the funding agencies or the U.S. government.

Guanfeng Liang and Nitin Vaidya
Department of Electrical and Computer Engineering, and
Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
gliang2@illinois.edu, nhv@illinois.edu
Abstract

The goal of Byzantine Broadcast (BB) is to allow a set of fault-free nodes to agree on information that a source node wants to broadcast to them, in the presence of Byzantine faulty nodes. We consider the design of efficient algorithms for BB in synchronous point-to-point networks, where the rate of transmission over each communication link is limited by its “link capacity”. The throughput of a particular BB algorithm is defined as the average number of bits that can be reliably broadcast to all fault-free nodes per unit time using the algorithm without violating the link capacity constraints. The capacity of BB in a given network is then defined as the supremum of all achievable BB throughputs in the given network, over all possible BB algorithms.

We develop NAB – a Network-Aware Byzantine broadcast algorithm – for arbitrary point-to-point networks consisting of $n$ nodes, wherein the number of faulty nodes is at most $f$, $f < n/3$, and the network connectivity is at least $2f+1$. We also prove an upper bound on the capacity of Byzantine broadcast, and conclude that NAB can achieve throughput at least 1/3 of the capacity. When the network satisfies an additional condition, NAB can achieve throughput at least 1/2 of the capacity.

To the best of our knowledge, NAB is the first algorithm that can achieve a constant fraction of capacity of Byzantine Broadcast (BB) in arbitrary point-to-point networks.

1 Introduction

The problem of Byzantine Broadcast (BB) – also known as the Byzantine Generals problem [12] – was introduced by Pease, Shostak and Lamport in their 1980 paper [19]. Since the first paper on this topic, Byzantine Broadcast has been the subject of intense research activity, due to its many potential practical applications, including replicated fault-tolerant state machines [5], and fault-tolerant distributed file storage [20]. Informally, Byzantine Broadcast (BB) can be described as follows (we will define the problem more formally later). There is a source node that needs to broadcast a message (also called its input) to all the other nodes such that even if some of the nodes are Byzantine faulty, all the fault-free nodes will still be able to agree on an identical message; the agreed message is identical to the source’s input if the source is fault-free.

We consider the problem of maximizing the throughput of Byzantine Broadcast (BB) in synchronous networks of point-to-point links, wherein each directed communication link is subject to a “capacity” constraint. Informally speaking, throughput of BB is the number of bits of Byzantine Broadcast that can be achieved per unit time (on average), under the worst-case behavior by the faulty nodes. Despite the large body of work on BB [7, 6, 3, 11, 2, 18], performance of BB in arbitrary point-to-point networks has not been investigated previously. When capacities of the different links are not identical, previously proposed algorithms can perform poorly. In fact, one can easily construct example networks in which previously proposed algorithms achieve throughput that is arbitrarily worse than the optimal throughput.

Our Prior Work:

In our prior work, we have considered the problem of optimizing throughput of Byzantine Broadcast in 4-node networks [14]. By comparing with an upper bound on the capacity of BB in 4-node networks, we showed that our 4-node algorithm is optimal. Unfortunately, the 4-node algorithm does not yield very useful insights on design of good algorithms for larger networks. This paper presents an algorithm that uses a different approach than that in [14], and also develops a different upper bound on capacity that is helpful in our analysis of the new algorithm. In other related work, we explored design of efficient Byzantine consensus algorithms when total communication cost is the metric (which is oblivious of link capacities) [15].

Main contributions:

This paper studies throughput and capacity of Byzantine broadcast in arbitrary point-to-point networks.

  1. We develop a Network-Aware Byzantine (NAB) broadcast algorithm for arbitrary point-to-point networks wherein each directed communication link is subject to a capacity constraint. The proposed NAB algorithm is “network-aware” in the sense that its design takes the link capacities into account.

  2. We derive an upper bound on the capacity of BB in arbitrary point-to-point networks.

  3. We show that NAB can achieve throughput at least 1/3 of the capacity in arbitrary point-to-point networks. When the network satisfies an additional condition, NAB can achieve throughput at least 1/2 of the capacity.

We consider a synchronous system consisting of $n$ nodes, named $1, 2, \dots, n$, with one node designated as the sender or source node. In particular, we will assume that node 1 is the source node. Source node 1 is given an input value $x$ containing $L$ bits, and the goal here is for the source to broadcast its input to all the other nodes. The following conditions must be satisfied when the input value at the source node is $x$:

  • Termination: Every fault-free node $i$ must eventually decide on an output value of $L$ bits; let us denote the output value of fault-free node $i$ as $y_i$.

  • Agreement: All fault-free nodes must agree on an identical output value, i.e., there exists $y$ such that $y = y_i$ for each fault-free node $i$.

  • Validity: If the source node is fault-free, then the agreed value must be identical to the input value of the source, i.e., $y = x$.
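The three conditions above can be read as a predicate over the outputs of the fault-free nodes. The following is a minimal sketch of such a check, useful for testing a simulated run; the function name `check_bb_outcome`, the use of `None` for "not yet decided", and the byte-string encoding of values are illustrative assumptions, not part of the paper.

```python
from typing import Dict, Optional, Set


def check_bb_outcome(source_input: bytes,
                     outputs: Dict[int, Optional[bytes]],
                     faulty: Set[int],
                     source: int = 1) -> Dict[str, bool]:
    """Evaluate termination, agreement and validity over fault-free nodes.

    outputs[i] is the value node i decided on, or None if it never decided
    (a hypothetical convention for this sketch).
    """
    # Only fault-free nodes are constrained by the three conditions.
    decided = [outputs[i] for i in outputs if i not in faulty]

    termination = all(y is not None for y in decided)
    agreement = termination and len(set(decided)) <= 1

    # Validity only constrains the agreed value when the source is fault-free.
    if source in faulty:
        validity = True
    else:
        validity = agreement and all(y == source_input for y in decided)

    return {"termination": termination,
            "agreement": agreement,
            "validity": validity}
```

Note that when the source is faulty, validity holds vacuously, so the fault-free nodes only need to agree on some common value.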

Failure Model:

The faulty nodes are controlled by an adversary that has complete knowledge of the network topology, the algorithm, and the information the source is trying to send. No secret is hidden from the adversary. The adversary can take over up to $f$ nodes at any point during execution of the algorithm, where $f < n/3$. These nodes are said to be faulty. The faulty nodes can deviate from the algorithm in any way, including sending incorrect or inconsistent messages to their neighbors.

We assume that the set of faulty nodes remains fixed across different instances of execution of the BB algorithm. This assumption captures the conditions in practical replicated server systems. In such a system, the replicas may use Byzantine Broadcast to agree on requests to be processed. The set of faulty (or compromised) replicas that may adversely affect the agreement on each request does not change arbitrarily. We model this by assuming that the set of faulty nodes remains fixed over time.

When a faulty node fails to send a message to a neighbor as required by the algorithm, we assume that the recipient node interprets the missing message as being some default value.

Network Model:

We assume a synchronous point-to-point network modeled as a directed simple graph $\mathcal{G}(\mathcal{V}, \mathcal{E})$, where the set of vertices $\mathcal{V} = \{1, 2, \dots, n\}$ represents the nodes in the point-to-point network, and the set of edges $\mathcal{E}$ represents the links in the network. The capacity of an edge $e \in \mathcal{E}$ is denoted as $z_e$. With a slight abuse of terminology, we will use the terms edge and link interchangeably, and use the terms vertex and node interchangeably. We assume that $n \geq 3f+1$ and that the network connectivity is at least $2f+1$ (these two conditions are necessary for the existence of a correct BB algorithm [7]).

In the given network, links may not exist between all node pairs. Each directed link is associated with a fixed link capacity, which specifies the maximum amount of information that can be transmitted on that link per unit time. Specifically, over a directed edge $e = (i, j)$ with capacity $z_e$ bits/unit time, we assume that up to $z_e \tau$ bits can be reliably sent from node $i$ to node $j$ over time duration $\tau$ (for any non-negative $\tau$). This is a deterministic model of capacity that has been commonly used in other work [13, 4, 9, 10]. All link capacities are assumed to be positive integers. Rational link capacities can be turned into integers by choosing a suitable time unit. Irrational link capacities can be approximated by integers with arbitrary accuracy by choosing a suitably long time unit. Propagation delays on the links are assumed to be zero (relaxing this assumption does not impact the correctness of results shown for large input sizes). We also assume that each node correctly knows the identity of the nodes at the other end of its links.

Throughput and Capacity of Byzantine Broadcast

When defining the throughput of a given BB algorithm in a given network, we consider $Q \geq 1$ independent instances of BB. The source node is given an $L$-bit input for each of these $Q$ instances, and the validity and agreement properties need to be satisfied for each instance separately (i.e., independent of the outcome for the other instances).

For any BB algorithm $\mathcal{A}$, denote $t(\mathcal{G}, L, Q, \mathcal{A})$ as the duration of time required, in the worst case, to complete $Q$ instances of $L$-bit Byzantine Broadcast, without violating the capacity constraints of the links in $\mathcal{G}$. Throughput of algorithm $\mathcal{A}$ in network $\mathcal{G}$ for $L$-bit inputs is then defined as

$$T(\mathcal{G}, L, \mathcal{A}) = \lim_{Q \to \infty} \frac{LQ}{t(\mathcal{G}, L, Q, \mathcal{A})}$$

We then define capacity as follows.

Capacity of Byzantine Broadcast in network $\mathcal{G}$ is defined as the supremum over the throughput of all algorithms $\mathcal{A}$ that solve the BB problem and all values of $L$. That is,

$$C_{BB}(\mathcal{G}) = \sup_{\mathcal{A}, L} T(\mathcal{G}, L, \mathcal{A}) \quad (1)$$

2 Algorithm Overview

Each instance of our NAB algorithm performs Byzantine broadcast of an $L$-bit value. We assume that the NAB algorithm is used repeatedly, and during all these repeated executions, the cumulative number of faulty nodes is upper bounded by $f$. Due to this assumption, the algorithm can perform well by amortizing the cost of fault tolerance over a large number of executions. Larger values of $L$ also result in better performance for the algorithm. The algorithm is intended to be used for sufficiently large $L$, to be elaborated later.

The $k$-th instance of NAB executes on a network corresponding to graph $\mathcal{G}_k(\mathcal{V}_k, \mathcal{E}_k)$, defined as follows:

  • For the first instance, $k = 1$, and $\mathcal{G}_1 = \mathcal{G}$. Thus, $\mathcal{V}_1 = \mathcal{V}$ and $\mathcal{E}_1 = \mathcal{E}$.

  • The $k$-th instance of NAB occurs on graph $\mathcal{G}_k$ in the following sense: (i) all the fault-free nodes know the node and edge sets $\mathcal{V}_k$ and $\mathcal{E}_k$, (ii) only the nodes corresponding to the vertices in $\mathcal{V}_k$ need to participate in the $k$-th instance of BB, and (iii) only the links corresponding to the edges in $\mathcal{E}_k$ are used for communication in the $k$-th instance of NAB (communication received on other links can be ignored).

    During the $k$-th instance of NAB using graph $\mathcal{G}_k$, if misbehavior by some faulty node(s) is detected, then, as described later, additional information is gleaned about the potential identity of the faulty node(s). In this case, $\mathcal{G}_{k+1}$ is obtained by removing from $\mathcal{G}_k$ appropriately chosen edges and possibly some vertices (as described later).

    On the other hand, if during the $k$-th instance, no misbehavior is detected, then $\mathcal{G}_{k+1} = \mathcal{G}_k$.

The $k$-th instance of the NAB algorithm consists of three phases, as described next. The main contributions of this paper are (i) the algorithm used in Phase 2 below, and (ii) a performance analysis of NAB.

If graph $\mathcal{G}_k$ does not contain the source node 1, then (as will be clearer later) by the start of the $k$-th instance of NAB, all the fault-free nodes already know that the source node is surely faulty; in this case, the fault-free nodes can agree on a default value for the output, and terminate the algorithm. Hereafter, we will assume that the source node 1 is in $\mathcal{G}_k$.

Figure 1: Example graphs.
Figure 2: Different graph representations of a network. Numbers next to the edges indicate link capacities. Panels: (a) directed graph; (b) undirected graph; (c) two unit-capacity spanning trees in the directed graph, where every directed edge has capacity 1; (d) a spanning tree in the undirected graph, shown in dotted edges.

Phase 1: Unreliable Broadcast

In Phase 1, source node 1 broadcasts $L$ bits to all the other nodes in $\mathcal{G}_k$. This phase makes no effort to detect or tolerate misbehavior by faulty nodes. As elaborated in Appendix A, unreliable broadcast can be performed using a set of spanning trees embedded in graph $\mathcal{G}_k$. Now let us analyze the time required to perform unreliable broadcast in Phase 1.

$\mathrm{MINCUT}(\mathcal{G}_k, 1, j)$ denotes the minimum cut in the directed graph $\mathcal{G}_k$ from source node 1 to node $j$. Let us define

$$\gamma_k = \min_{j \in \mathcal{V}_k,\, j \neq 1} \mathrm{MINCUT}(\mathcal{G}_k, 1, j)$$

$\mathrm{MINCUT}(\mathcal{G}_k, 1, j)$ is equal to the maximum flow rate possible from node 1 to node $j$. It is well-known [17] that $\gamma_k$ is the maximum rate achievable for unreliable broadcast from node 1 to all the other nodes in $\mathcal{V}_k$, under the capacity constraints on the links in $\mathcal{E}_k$. Thus, the least amount of time in which $L$ bits can be broadcast by node 1 in graph $\mathcal{G}_k$ is given by

$$L / \gamma_k \quad (2)$$

Footnote 2: To simplify the analysis, we ignore propagation delays. Analogous results on throughput and capacity can be obtained in the presence of propagation delays as well.

Clearly, $\gamma_k$ depends on the capacities of the links in $\mathcal{G}_k$. For example, if $\mathcal{G}_k$ were the directed graph in Figure 1, then , , and hence .
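To make the Phase 1 timing concrete, the following sketch computes the broadcast rate as the smallest max-flow (equivalently, min-cut) from the source to any other node, and then the broadcast time as the input size divided by that rate. It uses a textbook Edmonds-Karp max-flow; the function names and the dictionary-based graph encoding are assumptions made for illustration, and the example capacities in the usage note are hypothetical rather than taken from the paper's figures.

```python
from collections import deque
from typing import Dict, Tuple

Edge = Tuple[int, int]


def max_flow(cap: Dict[Edge, int], s: int, t: int) -> int:
    """Edmonds-Karp max flow from s to t (equals the s-t min cut)."""
    residual: Dict[int, Dict[int, int]] = {}
    for (u, v), c in cap.items():
        residual.setdefault(u, {}).setdefault(v, 0)
        residual.setdefault(v, {}).setdefault(u, 0)
        residual[u][v] += c
    if s not in residual or t not in residual:
        return 0

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow

        # Find the bottleneck capacity along the path, then augment.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def broadcast_rate(cap: Dict[Edge, int], source: int = 1) -> int:
    """Smallest min-cut from the source to any other node."""
    nodes = {u for e in cap for u in e}
    return min(max_flow(cap, source, j) for j in nodes if j != source)


def phase1_time(L: int, cap: Dict[Edge, int], source: int = 1) -> float:
    """Least time to broadcast L bits unreliably: L divided by the rate."""
    return L / broadcast_rate(cap, source)
```

For a hypothetical 4-node network with capacities `{(1, 2): 2, (1, 3): 1, (1, 4): 1, (2, 3): 1, (2, 4): 1, (3, 4): 2}`, the min-cuts from node 1 to nodes 2, 3 and 4 are 2, 2 and 4, so the rate is 2 and broadcasting 1000 bits takes 500 time units.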

At the end of the broadcast operation in Phase 1, each node should have received $L$ bits. At the end of Phase 1 of the $k$-th instance of NAB, one of the following four outcomes will occur:

  • (i) The source node 1 is fault-free, and all the fault-free nodes correctly receive the source node’s $L$-bit input for the $k$-th instance of NAB, or

  • (ii) The source node 1 is fault-free, but some of the fault-free nodes receive incorrect $L$-bit values due to misbehavior by some faulty node(s), or

  • (iii) The source node 1 is faulty, but all the fault-free nodes still receive an identical $L$-bit value in Phase 1, or

  • (iv) The source node is faulty, and the fault-free nodes do not all receive an identical $L$-bit value in Phase 1.

The values received by the fault-free nodes in cases (i) and (iii) satisfy the agreement and validity conditions, whereas in cases (ii) and (iv) at least one of the two conditions is violated.

Phase 2: Failure Detection

Phase 2 performs the following two operations. As stipulated in the fault model, a faulty node may not follow the algorithm specification correctly.

  • (Step 2.1) Equality check: Using an Equality Check algorithm, the nodes in $\mathcal{V}_k$ perform a comparison of the $L$-bit value they received in Phase 1, to determine if all the nodes received an identical value. The source node 1 also participates in this comparison operation (treating its input as the value “received from” itself).

    Section 3 presents the Equality Check algorithm, which is designed to guarantee that if the values received by the fault-free nodes in Phase 1 are not identical, then at least one fault-free node will detect the mismatch.

  • (Step 2.2) Agreeing on the outcome of equality check: Using a previously proposed Byzantine broadcast algorithm, such as [19], each node performs Byzantine broadcast of a 1-bit flag to other nodes in $\mathcal{G}_k$ indicating whether it detected a mismatch during Equality Check.

If any node broadcasts in step 2.2 that it has detected a mismatch, then subsequently Phase 3 is performed. On the other hand, if no node announces a mismatch in step 2.2 above, then Phase 3 is not performed; in this case, each fault-free node agrees on the value it received in Phase 1, and the $k$-th instance of NAB is completed.

We will later prove that, when Phase 3 is not performed, the values agreed above by the fault-free nodes satisfy the validity and agreement conditions for the $k$-th instance of NAB. On the other hand, when Phase 3 is performed during the $k$-th instance of NAB, as noted below, Phase 3 results in a correct outcome for the $k$-th instance.

When Phase 3 is performed, Phase 3 determines $\mathcal{G}_{k+1}$. Otherwise, $\mathcal{G}_{k+1} = \mathcal{G}_k$.

Phase 3: Dispute Control

Phase 3 employs a dispute control mechanism that has also been used in prior work [1, 15]. Appendix B provides the details of the dispute control algorithm used in Phase 3. Here we summarize the outcomes of this phase – this summary should suffice for understanding the main contributions of this paper.

The dispute control in Phase 3 has very high overhead, due to the large amount of data that needs to be transmitted. From the above discussion of Phase 2, it follows that Phase 3 is performed only if at least one faulty node misbehaves during Phases 1 or 2. The outcomes from Phase 3 performed during the $k$-th instance of NAB are as follows.

  • Phase 3 results in correct Byzantine broadcast for the $k$-th instance of NAB. This is obtained as a byproduct of the Dispute Control mechanism.

  • By the end of Phase 3, one of the nodes in $\mathcal{V}_k$ is correctly identified as faulty, and/or at least one pair of nodes in $\mathcal{V}_k$, say nodes $a, b$, is identified as being “in dispute” with each other. When a node pair $(a, b)$ is found in dispute, it is guaranteed that (i) at least one of these two nodes is faulty, and (ii) at least one of the directed edges $(a, b)$ and $(b, a)$ is in $\mathcal{E}_k$. Note that the dispute control phase never finds two fault-free nodes in dispute with each other.

  • Phase 3 in the $k$-th instance computes graph $\mathcal{G}_{k+1}$. In particular, any nodes that can be inferred as being faulty based on their behavior so far are excluded from $\mathcal{V}_{k+1}$; links attached to such nodes are excluded from $\mathcal{E}_{k+1}$. In Appendix B we elaborate on how the faulty nodes are identified. Then, for each node pair in $\mathcal{V}_{k+1}$, if that node pair has been found in dispute in at least one instance of NAB so far, the links between the node pair are excluded from $\mathcal{E}_{k+1}$. Phase 3 ensures that all the fault-free nodes compute an identical graph $\mathcal{G}_{k+1}$ to be used during the next instance of NAB.

Consider two special cases for the $k$-th instance of NAB:

  • If graph $\mathcal{G}_k$ does not contain the source node 1, it implies that all the fault-free nodes are aware that node 1 is faulty. In this case, they can safely agree on a default value as the outcome for the $k$-th instance of NAB.

  • Similarly, if the source node is in $\mathcal{G}_k$ but at least $f$ other nodes are excluded from $\mathcal{G}_k$, that implies that the remaining nodes in $\mathcal{G}_k$ are all fault-free; in this case, algorithm NAB can be reduced to just Phase 1.

Observe that during each execution of Phase 3, either a new pair of nodes in dispute is identified, or a new node is identified as faulty. Once a node is found to be in dispute with $f+1$ distinct nodes, it can be identified as faulty, and excluded from the algorithm’s execution. Therefore, Dispute Control needs to be performed at most $f(f+1)$ times over repeated executions of NAB. Thus, even though each dispute control phase is expensive, the bounded number ensures that the amortized cost over a large number of instances of NAB is small, as reflected in the performance analysis of NAB (in Section 5 and Appendix D).
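The graph update described above can be sketched as simple bookkeeping over the accumulated dispute pairs. This is an illustrative sketch only: it covers the two rules stated in this section (drop links between disputing pairs, and drop any node in dispute with at least f + 1 distinct nodes) and does not model the other ways Appendix B may identify faulty nodes. The function name `apply_disputes` and the data layout are assumptions.

```python
from typing import Dict, FrozenSet, Set, Tuple

Edge = Tuple[int, int]


def apply_disputes(nodes: Set[int],
                   cap: Dict[Edge, int],
                   disputes: Set[FrozenSet[int]],
                   f: int):
    """Return (next_nodes, next_cap, faulty) given all disputes found so far."""
    # Collect, for every node, the distinct nodes it has been in dispute with.
    partners: Dict[int, Set[int]] = {}
    for pair in disputes:
        a, b = sorted(pair)
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)

    # A fault-free node can only be in dispute with faulty nodes (at most f),
    # so f + 1 distinct dispute partners expose a node as faulty.
    faulty = {v for v, p in partners.items() if len(p) >= f + 1}

    next_nodes = nodes - faulty
    next_cap = {
        (u, v): c
        for (u, v), c in cap.items()
        if u in next_nodes and v in next_nodes
        and frozenset((u, v)) not in disputes
    }
    return next_nodes, next_cap, faulty
```

Because every call either records a new dispute pair or removes a node, the number of calls that change the graph is bounded, which is the source of the amortization argument in the paragraph above.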

3 Equality Check Algorithm with Parameter $\rho_k$

We now present the Equality Check algorithm used in Phase 2, which has an integer parameter $\rho_k$ for the $k$-th instance of NAB. Later in this section, we will elaborate on the choice of $\rho_k$, which is dependent on capacities of the links in $\mathcal{G}_k$.

Let us denote by $x_i$ the $L$-bit value received by fault-free node $i \in \mathcal{V}_k$ in Phase 1 of the $k$-th instance. For simplicity, we do not include index $k$ in the notation $x_i$. To simplify the presentation, let us assume that $L/\rho_k$ is an integer. Thus we can represent the $L$-bit value $x_i$ as $\rho_k$ symbols from Galois Field $GF(2^{L/\rho_k})$. In particular, we represent $x_i$ as a vector $X_i$,

$$X_i = [X_i(1), X_i(2), \dots, X_i(\rho_k)]$$

where each symbol $X_i(j) \in GF(2^{L/\rho_k})$ can be represented using $L/\rho_k$ bits. As discussed earlier, for convenience, we assume that all the link capacities are integers when using a suitable time unit.

Each node $i \in \mathcal{V}_k$ should perform these steps:

  1. On each outgoing link $e = (i, j) \in \mathcal{E}_k$ whose capacity is $z_e$, node $i$ transmits $z_e$ linear combinations of the $\rho_k$ symbols in vector $X_i$, with the weights for the linear combinations being chosen from $GF(2^{L/\rho_k})$.

    More formally, for each outgoing edge $e = (i, j)$ of capacity $z_e$, a $\rho_k \times z_e$ matrix $C_e$ is specified as a part of the algorithm. Entries in $C_e$ are chosen from $GF(2^{L/\rho_k})$. Node $i$ sends to node $j$ a vector $Y_e$ of $z_e$ symbols obtained as the matrix product $Y_e = X_i C_e$. Each element of $Y_e$ is said to be a “coded symbol”. The choice of the matrix $C_e$ affects the correctness of the algorithm, as elaborated later.

  2. On each incoming edge $d = (j, i) \in \mathcal{E}_k$, node $i$ receives a vector $Y_d$ containing $z_d$ symbols from $GF(2^{L/\rho_k})$. Node $i$ then checks, for each incoming edge $d$, whether $Y_d = X_i C_d$. The check is said to fail iff $Y_d \neq X_i C_d$.

  3. If checks of symbols received on any incoming edge fail in the previous step, then node $i$ sets a 1-bit flag equal to MISMATCH; else the flag is set to NULL. This flag is broadcast in Step 2.2 above.

Algorithm 1 Equality Check in $\mathcal{G}_k$ with parameter $\rho_k$

In the Equality Check algorithm, $z_e$ symbols of size $L/\rho_k$ bits are transmitted on each link $e$ of capacity $z_e$. Therefore, the Equality Check algorithm requires time duration

$$L / \rho_k \quad (3)$$
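The three steps of the Equality Check can be simulated in a few lines. The sketch below makes several simplifying assumptions that are not in the paper: the symbol field is fixed to GF(2^8) (that is, each symbol is one byte, reduced modulo the polynomial 0x11B), every simulated node follows the algorithm honestly, and the helper names (`gf256_mul`, `encode`, `random_matrices`, `equality_check`) are hypothetical. It is meant only to show the single-round, no-forwarding structure of the check.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
Matrix = List[List[int]]


def gf256_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11B           # reduce when the degree reaches 8
        b >>= 1
    return result


def encode(x: List[int], C: Matrix) -> List[int]:
    """Row vector x (rho symbols) times a rho-by-z coding matrix C."""
    coded = []
    for col in range(len(C[0])):
        acc = 0
        for row, symbol in enumerate(x):
            acc ^= gf256_mul(symbol, C[row][col])
        coded.append(acc)
    return coded


def random_matrices(cap: Dict[Edge, int], rho: int,
                    rng: random.Random) -> Dict[Edge, Matrix]:
    """One rho-by-z_e matrix per edge, entries uniform over the field."""
    return {e: [[rng.randrange(256) for _ in range(z)] for _ in range(rho)]
            for e, z in cap.items()}


def equality_check(values: Dict[int, List[int]],
                   cap: Dict[Edge, int],
                   matrices: Dict[Edge, Matrix]) -> Dict[int, str]:
    """Run steps 1-3 for all nodes and return each node's 1-bit flag."""
    # Step 1: node i sends coded symbols of its own value on each edge (i, j).
    sent = {e: encode(values[e[0]], matrices[e]) for e in cap}

    # Steps 2-3: node i re-encodes its own value with the same matrix and
    # compares it against what arrived on each incoming edge (j, i).
    flags = {i: "NULL" for i in values}
    for (j, i), received in sent.items():
        if received != encode(values[i], matrices[(j, i)]):
            flags[i] = "MISMATCH"
    return flags
```

Each node only compares its own value against what its direct neighbors sent, so no node ever relays another node's symbols; this is the property the next subsection relies on.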

Salient Feature of Equality Check Algorithm

In the Equality Check algorithm, a single round of communication occurs between adjacent nodes. No node is required to forward packets received from other nodes during the Equality Check algorithm. This implies that, while a faulty node may send incorrect packets to its neighbors, it cannot tamper with information sent between fault-free nodes. This feature of Equality Check is important in being able to prove its correctness despite the presence of faulty nodes in $\mathcal{G}_k$.

Choice of Parameter

We define a set $\Omega_k$ as follows using the disputes identified through the first $(k-1)$ instances of NAB.

$$\Omega_k = \{ H \mid H \text{ is a subgraph of } \mathcal{G}_k \text{ containing } (n-f) \text{ nodes such that no two nodes in } H \text{ have been found in dispute through the first } (k-1) \text{ instances of NAB} \}$$

As noted in the discussion of Phase 3 (Dispute Control), fault-free nodes are never found in dispute with each other (fault-free nodes may be found in dispute with faulty nodes, however). This implies that $\mathcal{G}_k$ includes all the fault-free nodes, since a fault-free node will never be found in dispute with $f+1$ other nodes. There are at least $n-f$ fault-free nodes in the network. This implies that set $\Omega_k$ is non-empty.

Corresponding to a directed graph $H(V, E)$, let us define an undirected graph $\overline{H}(\overline{V}, \overline{E})$ as follows: (i) both $H$ and $\overline{H}$ contain the same set of vertices, (ii) undirected edge $(i, j) \in \overline{E}$ if either $(i, j) \in E$ or $(j, i) \in E$, and (iii) capacity of undirected edge $(i, j)$ is defined to be equal to the sum of the capacities of directed links $(i, j)$ and $(j, i)$ in $H$ (if a directed link does not exist in $H$, here we treat its capacity as 0). For example, Figure 2(b) shows the undirected graph corresponding to the directed graph in Figure 2(a).

Define a set of undirected graphs $\overline{\Omega}_k$ as follows. $\overline{\Omega}_k$ contains the undirected version of each directed graph in $\Omega_k$.

Define $U_k$ as the minimum value of the MINCUTs between all pairs of nodes in all the undirected graphs in the set $\overline{\Omega}_k$. For instance, suppose that , and the graph shown in Figure 1 is $\mathcal{G}$, whereas $\mathcal{G}_k$ is the graph shown in Figure 1. Thus, nodes 2 and 3 have been found in dispute previously. Then, $\Omega_k$ and $\overline{\Omega}_k$ each contain two subgraphs, one subgraph corresponding to the node set , and the other subgraph corresponding to the node set . In this example, . Also notice that in this example, there is no edge between nodes 2 and 4 in $\mathcal{G}$ to begin with – so these two nodes will never be found in dispute.

Parameter $\rho_k$ is chosen such that

$$\rho_k \leq U_k / 2$$

Under the above constraint on $\rho_k$, as per (3), execution time of Equality Check is minimized when $\rho_k = U_k/2$. Under the above constraint on $\rho_k$, we will prove the correctness of the Equality Check algorithm, with its execution time being $L/\rho_k$.
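The choice of the parameter can be illustrated by computing the minimum pairwise min-cut over the candidate fault-free subgraphs directly. The sketch below rests on assumptions that should be checked against the full paper: that the candidate subgraphs are the induced subgraphs on n − f nodes with no disputing pair, and that the parameter may be at most half of that minimum cut. It also uses the fact that the minimum over all node pairs of the pairwise min-cut in an undirected graph equals the global min cut, which is found here by brute-force enumeration of bipartitions (exponential, so suitable only for small examples). All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Set, Tuple

Edge = Tuple[int, int]


def undirected_caps(cap: Dict[Edge, int],
                    members: Set[int]) -> Dict[FrozenSet[int], int]:
    """Sum the two directed capacities of each node pair inside `members`."""
    und: Dict[FrozenSet[int], int] = {}
    for (u, v), c in cap.items():
        if u in members and v in members:
            key = frozenset((u, v))
            und[key] = und.get(key, 0) + c
    return und


def global_min_cut(members: Set[int],
                   und: Dict[FrozenSet[int], int]) -> int:
    """Brute-force global min cut: smallest weight over all bipartitions."""
    ordered = sorted(members)
    first, rest = ordered[0], ordered[1:]
    best: Optional[int] = None
    for size in range(len(rest)):            # keep at least one node outside
        for side in combinations(rest, size):
            inside = {first, *side}
            weight = sum(c for e, c in und.items() if len(e & inside) == 1)
            best = weight if best is None else min(best, weight)
    return best if best is not None else 0


def min_cut_over_candidates(nodes: Set[int], cap: Dict[Edge, int],
                            disputes: Set[FrozenSet[int]],
                            n: int, f: int) -> Optional[int]:
    """Minimum pairwise min-cut over all dispute-free (n - f)-node subgraphs."""
    best: Optional[int] = None
    for members in combinations(sorted(nodes), n - f):
        if any(frozenset(p) in disputes for p in combinations(members, 2)):
            continue                         # two members are in dispute
        group = set(members)
        cut = global_min_cut(group, undirected_caps(cap, group))
        best = cut if best is None else min(best, cut)
    return best


def largest_parameter(U: int) -> int:
    """Largest integer parameter not exceeding U / 2 (assumed constraint)."""
    return U // 2
```

In a hypothetical 4-node network with f = 1, unit-capacity links in both directions between every pair, and nodes 2 and 3 in dispute, the candidate node sets are {1, 2, 4} and {1, 3, 4}; each is a triangle of undirected capacity 2 per edge, so the minimum cut is 4 and the largest admissible parameter under the assumed constraint is 2.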

3.1 Correctness of the Equality Check Algorithm

The correctness of Algorithm 1 depends on the choices of the parameter $\rho_k$ and the set of coding matrices $\{C_e \mid e \in \mathcal{E}_k\}$. Let us say that a set of coding matrices is correct if the resulting Equality Check Algorithm 1 satisfies the following requirement:

  • (EC)   if   there exists a pair of fault-free nodes $i, j \in \mathcal{V}_k$ such that $X_i \neq X_j$ (i.e., $x_i \neq x_j$),
    then   the 1-bit flag at at least one fault-free node is set to MISMATCH.

Recall that $X_i$ is a vector representation of the $L$-bit value $x_i$ received by node $i$ in Phase 1 of NAB. Two consequences of the above correctness condition are:

  • If some node (possibly the source node) misbehaves during Phase 1 leading to outcomes (ii) or (iv) for Phase 1, then at least one fault-free node will set its flag to MISMATCH. In this case, the fault-free nodes (possibly including the sender) do not share identical $L$-bit values $x_i$ as the outcome of Phase 1.

  • If no misbehavior occurs in Phase 1 (thus the values received by fault-free nodes in Phase 1 are correct), but the MISMATCH flag at some fault-free node is set in Equality Check, then misbehavior must have occurred in Phase 2.

The following theorem shows that when $\rho_k \leq U_k/2$, and when $L$ is sufficiently large, there exists a set of coding matrices that is correct.

Theorem 1

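For $\rho_k \leq U_k/2$, when the entries of the coding matrices $\{C_e \mid e \in \mathcal{E}_k\}$ in step 1 of Algorithm 1 are chosen independently and uniformly at random from $GF(2^{L/\rho_k})$, then $\{C_e \mid e \in \mathcal{E}_k\}$ is correct with probability . Note that when $L$ is large enough, .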

Proof Sketch:

The detailed proof is presented in Appendix C. Here we provide a sketch of the proof. The goal is to prove that property (EC) above holds with a non-zero probability. That is, regardless of which (up to $f$) nodes in $\mathcal{G}_k$ are faulty, when $X_i \neq X_j$ for some pair of fault-free nodes $i$ and $j$ in $\mathcal{G}_k$ during the $k$-th instance, at least one fault-free node (which may be different from nodes $i$ and $j$) will set its 1-bit flag to MISMATCH. To prove this, we consider every subgraph $H \in \Omega_k$ (see definition of $\Omega_k$ above). By definition of $\Omega_k$, no two nodes in $H$ have been found in dispute through the first $(k-1)$ instances of NAB. Therefore, $H$ represents one potential set of fault-free nodes in $\mathcal{G}_k$. For each edge $e = (i, j)$ in $H$, steps 1-2 of Algorithm 1 together have the effect of checking whether or not $X_i C_e = X_j C_e$. Without loss of generality, for the purpose of this proof, rename the nodes in $H$ as $1, \dots, n-f$. Denote for , then

(4)

Define . Let be the sum of the capacities of all the directed edges in . As elaborated in Appendix C, we define to be a matrix whose entries are obtained using the elements of for each edge in in an appropriate manner. For the suitably defined matrix, we can show that the comparisons in steps 1-2 of Algorithm 1 at all the nodes in are equivalent to checking whether or not

(5)

We can show that for a particular subgraph , when , ; and when the set of coding matrices are generated as described in Theorem 1, for large enough , with non-zero probability contains a square submatrix that is invertible. In this case if and only if , i.e., . In other words, if all nodes in subgraph are fault-free, and for two fault-free nodes , then and hence the check in step 2 of Algorithm 1 fails at some fault-free node in .

We can then show that, for large enough $L$, with a non-zero probability, this is also simultaneously true for all subgraphs $H \in \Omega_k$. This implies that, for large enough $L$, correct coding matrices ($C_e$ for each $e \in \mathcal{E}_k$) can be found. These coding matrices are specified as a part of the algorithm specification. Further details of the proof are in Appendix C.

4 Correctness of NAB

For Phase 1 (Unreliable Broadcast) and Phase 3 (Dispute Control), the proof that the outcomes claimed in Section 2 indeed occur follows directly from the prior literature cited in Section 2 (and elaborated in Appendices A and B). Now consider two cases:

  • The values received by the fault-free nodes in Phase 1 are not identical: Then the correctness of Equality Check ensures that a fault-free node will detect the mismatch, and consequently Phase 3 will be performed. As a byproduct of Dispute Control in Phase 3, the fault-free nodes will correctly agree on a value that satisfies the validity and agreement conditions.

  • The values received by the fault-free nodes in Phase 1 are identical: If no node announces a mismatch in step 2.2, then the fault-free nodes will agree on the value received in Phase 1. It is easy to see that this is a correct outcome. On the other hand, if some (faulty) node announces a mismatch in step 2.2, then Dispute Control will be performed, which will result in a correct outcome for the broadcast of the $k$-th instance.

Thus, in all cases, NAB will lead to correct outcome in each instance.

5 Throughput of NAB and Capacity of BB

5.1 A Lower Bound on Throughput of NAB for Large $Q$ and $L$

In this section, we provide the intuition behind the derivation of the lower bound. More detail is presented in Appendix D. We prove the lower bound when the number of instances $Q$ and input size $L$ for each instance are both “large” (in an order sense) compared to $n$. Two consequences of $Q$ and $L$ being large:

  • As a consequence of $Q$ being large, the average overhead of Dispute Control per instance of NAB becomes negligible. Recall that Dispute Control needs to be performed at most $f(f+1)$ times over $Q$ executions of NAB.

  • As a consequence of $L$ being large, the overhead of 1-bit broadcasts performed in step 2.2 of Phase 2 becomes negligible when amortized over the $L$ bits being broadcast by the source in each instance of NAB.
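The amortization argument in the two bullets above can be checked numerically. The sketch below is a back-of-the-envelope model, not the paper's analysis: it assumes each instance costs the Phase 1 time plus the Equality Check time (both proportional to the input size), plus a fixed cost for the 1-bit flag broadcasts, and that Dispute Control runs its maximum number of times at a fixed cost per run. The parameter names and all numeric values in the usage note are hypothetical.

```python
def amortized_throughput(L: int, Q: int, gamma: float, rho: float,
                         flag_time: float, dispute_time: float,
                         f: int) -> float:
    """Bits per unit time over Q instances in a simple worst-case cost model.

    gamma: assumed Phase 1 broadcast rate (bits per unit time)
    rho:   assumed Equality Check parameter (check takes L / rho time)
    flag_time: assumed fixed cost of the 1-bit flag broadcasts per instance
    dispute_time: assumed fixed cost of one Dispute Control run
    """
    per_instance = L / gamma + L / rho + flag_time
    # Dispute Control is charged f * (f + 1) times in total, not per instance.
    total_time = Q * per_instance + f * (f + 1) * dispute_time
    return L * Q / total_time
```

With `L=1000`, `gamma=200`, `rho=200`, `flag_time=1.0`, `dispute_time=10000` and `f=1`, a single instance (`Q=1`) yields a throughput of about 0.05 bits per unit time because the dispute cost dominates, while `Q=10**6` yields about 90.7, close to the limit of 1000/11 that remains once the dispute cost is amortized away. Increasing `L` further shrinks the relative weight of `flag_time` in the same way.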

It then suffices to consider only the time it takes to complete the Unreliable Broadcast in Phase 1 and Equality Check in Phase 2. For the $k$-th instance of NAB, as discussed previously, the unreliable broadcast in Phase 1 can be done in $L/\gamma_k$ time units (see the definition of $\gamma_k$ in Section 2). We now define

Appendix E provides a systematic construction of the set . Define the minimum value of all possible :

Then an upper bound of the execution time of Phase 1 in all instances of NAB is .

With parameter , the execution time of the Equality Check in Phase 2 is . Recall that is defined as the minimum value of the MINCUTs between all pairs of nodes in all undirected graphs in the set . As discussed in Appendix C.2, , where . Hence in all possible . Define

Then for all possible and the execution time of the Equality Check is upper-bounded by . So the throughput of NAB for large $Q$ and $L$ can be lower bounded by

(6)

Footnote 3: To simplify the analysis above, we ignored propagation delays. Appendix D describes how to achieve this bound even when propagation delays are considered.

5.2 An Upper Bound on Capacity of BB

Theorem 2

In any point-to-point network , the capacity of Byzantine broadcast () with node 1 as the source satisfies the following upper bound

Appendix F presents a proof of this upper bound. Given the throughput lower bound in (6) and the upper bound on from Theorem 2, as shown in Appendix G, the result below can be obtained.

Theorem 3

For graph :

Moreover, when :

6 Conclusion

This paper presents NAB, a network-aware Byzantine broadcast algorithm for point-to-point networks. We derive an upper bound on the capacity of Byzantine broadcast, and show that NAB can achieve throughput of at least 1/3 of the capacity over a large number of execution instances, when $L$ is large. The fraction can be improved to at least 1/2 when the network satisfies an additional condition.

References

  • [1] Z. Beerliova-Trubiniova and M. Hirt. Efficient multi-party computation with dispute control. In IACR Theory of Cryptography Conference (TCC), 2006.
  • [2] Z. Beerliova-Trubiniova and M. Hirt. Perfectly-secure mpc with linear communication complexity. In IACR Theory of Cryptography Conference (TCC), 2008.
  • [3] P. Berman, J. A. Garay, and K. J. Perry. Bit optimal distributed consensus. Computer science: research and applications, 1992.
  • [4] N. Cai and R. W. Yeung. Network error correction, part II: Lower bounds. Communications in Information and Systems, 2006.
  • [5] M. Castro and B. Liskov. Practical byzantine fault tolerance. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 1999.
  • [6] B. A. Coan and J. L. Welch. Modular construction of a byzantine agreement protocol with optimal message bit complexity. Journal of Information and Computation, 1992.
  • [7] M. J. Fischer, N. A. Lynch, and M. Merritt. Easy impossibility proofs for distributed consensus problems. In ACM symposium on Principles of Distributed Computing (PODC), 1985.
  • [8] T. Ho, R. Koetter, M. Medard, D. Karger, and M. Effros. The benefits of coding over routing in a randomized setting. In IEEE International Symposium on Information Theory (ISIT), 2003.
  • [9] T. Ho, B. Leong, R. Koetter, M. Medard, M. Effros, and D. Karger. Byzantine modification detection in multicast networks using randomized network coding (extended version). http://www.its.caltech.edu/ tho/multicast.ps, 2004.
  • [10] S. Jaggi, M. Langberg, S. Katti, T. Ho, D. Katabi, and M. Medard. Resilient network coding in the presence of byzantine adversaries. In IEEE International Conference on Computer Communications (INFOCOM), 2007.
  • [11] V. King and J. Saia. Breaking the bit barrier: scalable byzantine agreement with an adaptive adversary. In ACM symposium on Principles of Distributed Computing (PODC), 2010.
  • [12] L. Lamport, R. Shostak, and M. Pease. The byzantine generals problem. ACM Transactions on Programming Languages and Systems, 1982.
  • [13] S.-Y. Li, R. Yeung, and N. Cai. Linear network coding. IEEE Transactions on Information Theory, 2003.
  • [14] G. Liang and N. Vaidya. Capacity of byzantine agreement with finite link capacity. In IEEE International Conference on Computer Communications (INFOCOM), 2011.
  • [15] G. Liang and N. Vaidya. Error-free multi-valued consensus with byzantine failures. In ACM Symposium on Principles of Distributed Computing (PODC), 2011.
  • [16] E. M. Palmer. On the spanning tree packing number of a graph: a survey. Journal of Discrete Mathematics, 2001.
  • [17] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity, chapter 6.1 The Max-Flow, Min-Cut Theorem, pages 120–128. Courier Dover Publications, 1998.
  • [18] A. Patra and C. P. Rangan. Communication optimal multi-valued asynchronous byzantine agreement with optimal resilience. Cryptology ePrint Archive, 2009.
  • [19] M. Pease, R. Shostak, and L. Lamport. Reaching agreement in the presence of faults. Journal of the ACM (JACM), 1980.
  • [20] A. Silberschatz, P. B. Galvin, and G. Gagne. Operating System Concepts, chapter 17 Distributed File Systems. Addison-Wesley, 1994.

Appendix A Unreliable Broadcast in Phase 1

According to [16], in a given graph with , there always exists a set of unit-capacity spanning trees of such that the total usage on each edge by all the spanning trees combined is no more than its link capacity . Each spanning tree is “unit-capacity” in the sense that 1 unit of capacity of each link on that tree is allocated for transmissions on that tree. For example, Figure 2(c) shows 2 unit-capacity spanning trees that can be embedded in the directed graph in Figure 2(a): one spanning tree is shown with solid edges and the other with dotted edges. Observe that link (1,2) is used by both spanning trees, each tree using a unit of capacity on link (1,2), for a total usage of 2 units, which is the capacity of link (1,2).

To broadcast an -bit value from source node 1, we represent the -bit value as symbols, each symbol being represented using bits. One symbol ( bits) is then transmitted along each of the unit-capacity spanning trees.
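As a rough illustration of the splitting step above, the following Python sketch cuts a bit string into equal-size symbols, one per spanning tree. The names `split_into_symbols` and `num_trees` are hypothetical, and the sketch assumes that the value length is a multiple of the number of trees. It does not model the transmission along the trees itself.

```python
def split_into_symbols(value_bits, num_trees):
    """Split a bit string into num_trees equal-length symbols.

    Assumption (not from the paper): len(value_bits) is divisible by
    num_trees, so every symbol carries the same number of bits.
    """
    if num_trees <= 0:
        raise ValueError("num_trees must be positive")
    if len(value_bits) % num_trees != 0:
        raise ValueError("value length must be a multiple of num_trees")
    size = len(value_bits) // num_trees
    # Symbol k is the k-th consecutive chunk; it would be sent on tree k.
    return [value_bits[k * size:(k + 1) * size] for k in range(num_trees)]


symbols = split_into_symbols("110010", 2)
```

Concatenating the symbols in tree order recovers the original value, which is what a fault-free receiver would do in the absence of misbehavior.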

Appendix B Dispute Control

The dispute control algorithm is performed in the -th instance of NAB only if at least one node misbehaves during Phases 1 or 2. The goal of dispute control is to learn some information about the identity of at least one faulty node. In particular, the dispute control algorithm will identify a new node as being faulty, and/or identify a new node pair in dispute (at least one of the nodes in the pair is guaranteed to be faulty). The steps in dispute control in the -th instance of NAB are as follows:

  • (DC1) Each node in uses a previously proposed Byzantine broadcast algorithm, such as [6], to broadcast to all other nodes in all the messages that this node claims to have received from other nodes, and sent to the other nodes, during Phases 1 and 2 of the -th instance. Source node 1 also uses an existing Byzantine broadcast algorithm [6] to broadcast its -bit input for the -th instance to all the other nodes. Thus, at the end of this step, all the fault-free nodes will reach correct agreement for the output for the -th instance.

  • (DC2) If for some node pair , a message that node claims above to have sent to node does not match the claim of received messages made by node , then node pair is found in dispute. In step DC1, since a Byzantine broadcast algorithm is used to disseminate the claims, all the fault-free nodes will identify identical node pairs in dispute.

    It should be clear that a pair of fault-free nodes will never be found in dispute with each other in this step.

  • (DC3) The NAB algorithm is deterministic in nature. Therefore, the messages that should be sent by each node in Phases 1 and 2 can be completely determined by the messages that the node receives, and, in case of node 1, its initial input. Thus, if the claims of the messages sent by some node are inconsistent with the message it claims to have received, and its initial input (in case of node 1), then that node must be faulty. Again, all fault-free nodes identify these faulty nodes identically. Any nodes thus identified as faulty until now (including all previous instances of NAB) are deemed to be “in dispute” with all their neighbors (to whom the faulty nodes have incoming or outgoing links).

    It should be clear that a fault-free node will never be found to be faulty in this step.

  • (DC4) Consider the node pairs that have been identified as being in dispute in DC2 and DC3 of at least one instance of NAB so far.

    We will say that a set of nodes , where , “explains” all the disputes so far, if for each pair found in dispute so far, at least one of and is in . It should be easy to see that for any set of disputes that may be observed, there must be at least one such set that explains the disputes. It is easy to argue that the nodes in the set below must be necessarily faulty (in fact, the nodes in the set intersection below are also guaranteed to include nodes identified as faulty in step DC3).

    Then, is obtained as . is obtained by removing from edges incident on nodes in , and also excluding edges between node pairs that have been found in dispute so far.

As noted earlier, the above dispute control phase may be executed in at most instances of NAB.
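The dispute detection in DC2 and the "explaining set" reasoning in DC4 can be sketched in Python as follows. This is only an illustrative sketch: the function names, the dictionary layout of the claims, and the assumption that an explaining set contains at most `f` nodes are ours, since the exact size constraint is not legible in this version of the text. The search is a brute force over all candidate sets and is intended only for small examples.

```python
from itertools import combinations


def find_disputes(sent_claims, received_claims):
    """DC2 sketch: sent_claims[(a, b)] is what a claims to have sent to b;
    received_claims[(a, b)] is what b claims to have received from a."""
    return {frozenset(pair) for pair, msg in sent_claims.items()
            if received_claims.get(pair) != msg}


def surely_faulty(nodes, disputes, f):
    """DC4 sketch: intersect all node sets of size <= f (assumed bound)
    that contain at least one endpoint of every dispute."""
    explaining = [set(c)
                  for k in range(f + 1)
                  for c in combinations(sorted(nodes), k)
                  if all(pair & set(c) for pair in disputes)]
    return set.intersection(*explaining) if explaining else set()


sent = {(2, 3): "a", (2, 4): "b", (1, 2): "c"}
received = {(2, 3): "x", (2, 4): "y", (1, 2): "c"}
disputes = find_disputes(sent, received)
faulty = surely_faulty({1, 2, 3, 4}, disputes, f=1)
```

In this toy run node 2 is in dispute with both node 3 and node 4, so with at most one faulty node the only set that explains both disputes is the one containing node 2. With a single dispute, two different sets explain it and the intersection is empty, so no node can yet be declared faulty.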

Appendix C Proof of Theorem 1

To prove Theorem 1, we first prove that when the coding matrices are generated at random as described, for a particular subgraph , with non-zero probability, the coding matrices define a matrix (as defined later) such that if and only if . Then we prove that this is also simultaneously true for all subgraphs .

C.1 For a given subgraph

Consider any subgraph . For each edge in , we “expand” the corresponding coding matrix (of size ) to a matrix as follows: consists of blocks, each of which is a matrix:

  • If and , then the -th and -th blocks equal and , respectively. The other blocks are all set to .

    Here denotes the transpose of a matrix or vector.

  • If , then the -th block equals , and the other blocks are all set to the 0 matrix.

  • If , then the -th block equals , and the other blocks are all set to the 0 matrix.

Define for as the difference between and in the -th element. Recall that and . So is a row vector of elements from that captures the differences between and for all . It should be easy to see that

So for edge , steps 1-2 of Algorithm 1 have the effect of checking whether or not .
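The equivalence used here, that comparing coded symbols across an edge is the same as testing whether the difference of the two vectors is mapped to zero by the coding matrix, follows from linearity. The Python sketch below illustrates it under stated assumptions: arithmetic is done modulo a small prime `P` as a stand-in for the finite field used by NAB, and the function names and the example coding matrix are hypothetical.

```python
P = 257  # hypothetical prime modulus standing in for the paper's field


def encode(x, C):
    """Return the row vector x times the matrix C, entries reduced mod P."""
    return [sum(x[r] * C[r][c] for r in range(len(x))) % P
            for c in range(len(C[0]))]


def check_passes(x_i, x_j, C):
    """Edge check: the receiver compares the sender's coded symbols
    with the symbols it computes from its own vector."""
    return encode(x_i, C) == encode(x_j, C)


def difference_is_annihilated(x_i, x_j, C):
    """Equivalent test: (x_i - x_j) * C equals the zero vector."""
    d = [(a - b) % P for a, b in zip(x_i, x_j)]
    return all(v == 0 for v in encode(d, C))


C = [[3], [5]]  # a single coded symbol, as on a unit-capacity edge
same = check_passes([1, 2], [1, 2], C)
differ = check_passes([1, 2], [1, 3], C)
collide = check_passes([5, 0], [0, 3], C)
```

The last pair shows that a single coded symbol can fail to expose a difference, because both vectors encode to the same value. This is why the argument that follows needs the concatenation of the coding matrices over all edges to contain an invertible submatrix.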

If we label the set of edges in as , and let be the sum of the capacities of all edges in , then we construct a matrix by concatenating all expanded coding matrices:

where each column of represents one coded symbol sent over the corresponding edge. Then steps 1-2 of Algorithm 1 for all edges in have the same effect as checking whether or not . So to prove Theorem 1, we need to show that there exists at least one such that

It is obvious that if , then for any . So all that is left to show is that there exists at least one such that . It is then sufficient to show that contains a submatrix that is invertible, because when such an invertible submatrix exists,

Now we describe how one such submatrix can be obtained. Notice that each column of represents one coded symbol sent on the corresponding edge. A submatrix of is said to be a “spanning matrix” of if the edges corresponding to the columns of form an undirected spanning tree of , the undirected representation of . In Figure 2(d), an undirected spanning tree of the undirected graph in Figure 2(b) is shown with dotted edges. It is worth pointing out that an undirected spanning tree in an undirected graph does not necessarily correspond to a directed spanning tree in the corresponding directed graph . For example, the directed edges in Figure 2(a) corresponding to the dotted undirected edges in Figure 2(d) do not constitute a spanning tree in the directed graph in Figure 2(a).

It is known that in an undirected graph whose MINCUT equals , at least undirected unit-capacity spanning trees can be embedded [16]. (The definition of embedding undirected unit-capacity spanning trees in undirected graphs is similar to that of embedding directed unit-capacity spanning trees in directed graphs, obtained by dropping the direction of edges.) This implies that contains a set of spanning matrices such that no two spanning matrices in the set cover the same column in . Let be one set of such spanning matrices of . Then the union of these spanning matrices forms an submatrix of :

Next, we will show that when the set of coding matrices are generated as described in Theorem 1, with non-zero probability we obtain an invertible square matrix . When is invertible,

For the following discussion, it is convenient to reorder the elements of into

so that the -th through the elements of represent the difference between () and in the -th element.

We also reorder the rows of each spanning matrix () accordingly. It can be shown that after reordering, becomes and has the following structure:

(7)

Here is a square matrix, and it is called the adjacency matrix of the spanning tree corresponding to . is formed as follows. Suppose that the -th column of corresponds to a coded symbol sent over a directed edge in , then

  1. If and , then the -th column of has the -th element as 1 and the -th element as -1, the remaining entries in that column are all 0;

  2. If , then the -th element of the -th column of is set to -1, the remaining elements of that column are all 0;

  3. If , then the -th element of the -th column of is set to 1, the remaining elements of that column are all 0.

For example, suppose is the graph shown in Figure 2(b), and corresponds to a spanning tree of consisting of the dotted edges in Figure 2(d). Suppose that we index the corresponding directed edges in the graph shown in Figure 2(a) in the following order: (2,3), (1,4), (4,3). The resulting adjacency matrix .
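The three column rules above can be sketched in Python for the example edges (2,3), (1,4), (4,3). This sketch assumes that the node excluded from the rows (the reference node against which differences are taken) is node 4, the highest-numbered node; that reading is our assumption, since the corresponding symbols are not legible in this version of the text. The helper names are hypothetical.

```python
def tree_adjacency_matrix(edges, n, ref):
    """Build the square matrix whose columns follow the three rules:
    +1 in the row of the tail i, -1 in the row of the head j, with the
    row of the reference node ref omitted (assumed convention)."""
    others = [v for v in range(1, n + 1) if v != ref]
    row = {v: r for r, v in enumerate(others)}
    A = [[0] * len(edges) for _ in others]
    for col, (i, j) in enumerate(edges):
        if i != ref:
            A[row[i]][col] = 1
        if j != ref:
            A[row[j]][col] = -1
    return A


def det(M):
    """Integer determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** c * M[0][c] * det([r[:c] + r[c + 1:] for r in M[1:]])
               for c in range(len(M)))


A = tree_adjacency_matrix([(2, 3), (1, 4), (4, 3)], n=4, ref=4)
d = det(A)
```

Under this assumption the determinant has magnitude 1, which is consistent with the later claim that the adjacency matrix of a spanning tree is always invertible.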

On the other hand, each square matrix is a diagonal matrix. The -th diagonal element of equals the -th coefficient used to compute the coded symbol corresponding to the -th column of . For example, suppose the first column of corresponds to a coded packet being sent on link . Then the first diagonal elements of and are 1 and 2, respectively.

So after reordering, can be written as , which has the following structure:

(8)

Notice that is obtained by permuting the rows of . So being invertible is equivalent to being invertible.

Define for . Note that is a sub-matrix of when , and . We prove the following lemma:

Lemma 1

For any , with probability at least , matrix is invertible. Hence is also invertible.

Proof:

We now show that each is invertible with probability at least for all . The proof is done by induction, with being the base case.

Base Case – :
(9)

As shown later in Appendix C.3, is always invertible and . Since is a -by- diagonal matrix, it is invertible provided that all its diagonal elements are non-zero. Recall that the diagonal elements of are chosen uniformly and independently from . The probability that they are all non-zero is .
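The counting behind this base case can be checked numerically. The Python sketch below uses hypothetical parameters `q` (the field size) and `m` (the number of diagonal entries), neither of which is taken from the paper. It relies only on the standard fact that `m` independent uniform draws from a set of `q` elements all avoid one fixed element with probability (1 - 1/q)^m, and confirms this by exhaustive enumeration for a small case.

```python
from fractions import Fraction
from itertools import product


def prob_all_nonzero(q, m):
    """Closed form: each of m independent uniform draws from q field
    elements is non-zero with probability 1 - 1/q."""
    return (1 - Fraction(1, q)) ** m


def prob_by_enumeration(q, m):
    """Brute-force check: label field elements 0..q-1 with 0 as the
    zero element and count the diagonals with no zero entry."""
    good = sum(1 for diag in product(range(q), repeat=m) if all(diag))
    return Fraction(good, q ** m)


closed = prob_all_nonzero(4, 3)
counted = prob_by_enumeration(4, 3)
```

The enumeration only counts labels, so it does not depend on how field multiplication is defined. The probability approaches 1 as the field grows, which is why a sufficiently large field makes the random choice of coefficients succeed with non-zero probability.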

Induction Step – to :

The square matrix can be written as

(10)

where

(11)

is an -by- matrix, and

(12)

is an -by- matrix.

Assuming that is invertible, we transform as follows:

(13)
(14)
(15)

Here and denote a and a identity matrix, respectively. Note that since the matrix multiplied at the left has determinant , and the matrix multiplied at the right has determinant 1.

Observe that the diagonal elements of the diagonal matrix are chosen independently from . Then it can be proved that is invertible with probability at least (see Appendix C.4), given that is invertible, which happens with probability at least according to the induction assumption. So we have

(16)

This completes the induction. Now we can see that is invertible with probability

(17)

Now we have proved that there exists a set of coding matrices such that the resulting satisfies the condition that if and only if .

C.2 For all subgraphs in

In this section, we show that, for , if the coding matrices are generated as described in Theorem 1, then with non-zero probability the set of square matrices are all invertible simultaneously. When this is true, there exists a set of coding matrices that is correct.

To show that ’s for all are simultaneously invertible with non-zero probability, we consider the product of all these square matrices: