Universal and Robust Distributed Network Codes
Abstract
Random linear network codes can be designed and implemented in a distributed manner, with low computational complexity. However, these codes are classically implemented [1] over finite fields whose size depends on global network parameters (the size of the network, the number of sinks) that may not be known prior to code design. Also, if new nodes join, the entire network code may have to be redesigned.
In this work, we present the first universal and robust distributed linear network coding schemes. Our schemes are universal since they are independent of all network parameters. They are robust since if nodes join or leave, the remaining nodes do not need to change their coding operations and the receivers can still decode. They are distributed since nodes need only have topological information about the part of the network upstream of them, which can be naturally streamed as part of the communication protocol.
We present both probabilistic and deterministic schemes that are all asymptotically rate-optimal in the coding blocklength, and have guarantees of correctness. Our probabilistic designs are computationally efficient, with order-optimal complexity. Our deterministic designs guarantee zero-error decoding, albeit via codes with high computational complexity in general. Our coding schemes are based on network codes over “scalable fields”. Instead of choosing coding coefficients from one field at every node as in [1], each node uses linear coding operations over an “effective field size” that depends on the node’s distance from the source node. The analysis of our schemes requires technical tools that may be of independent interest. In particular, we generalize the Schwartz-Zippel lemma [2] by proving a non-uniform version, wherein variables are chosen from sets of possibly different sizes. We also provide a novel robust distributed algorithm to assign unique IDs to network nodes.
I Introduction
The paradigm of network coding allows each node in a network to process information in a nontrivial manner. As shown in [3, 4, 5], even if intermediate nodes simply perform linear operations over some finite field, the resulting network codes can be information-theoretically rate-optimal for a large class of communication problems. In particular, algorithms that design codes for multicast communication problems, wherein each of multiple sinks requires the same information from a source node, have been well-studied. The design algorithms in [6, 7] are deterministic and centralized, and result in network codes with zero error. In contrast, the algorithms in [1, 8] are decentralized and probabilistic, and for any ε > 0 result in network codes that “fail” with probability at most ε. Both these types of design algorithms (and the resulting network codes) are computationally tractable.
However, for all current network code design algorithms, some information about network parameters is necessary prior to the code design, to determine the size of the finite field over which linear network coding is performed. In particular, the centralized algorithms in [6, 7] require prior knowledge of the entire network, and even the decentralized algorithms in [1, 8] require knowledge of the network size and the number of sinks – if these parameters are unavailable, code design cannot proceed with any guarantees of correctness, hence prior designs are not universal. Also, in the case of dynamically changing network topologies, if even one new node joins the network the entire network code may need to be updated due to a change in the required field size, hence such codes are also not robust.
In this work we develop the first universal and robust distributed linear codes that are independent of all network parameters, and are designed to satisfy a pre-specified tolerance on the error probability (defined as the probability that the linear transform from the source to some sink is not invertible). The essential idea behind our design is that of using “scalable fields”. (Scalable fields do not mean embeddings of one finite field into a larger one, since arithmetic operations are defined differently over different fields, and hence the overall transform would not be linear.) Linear coding operations are chosen from nested finite subsets of an appropriate infinite field – in particular we choose the field of rational functions over F_2, i.e., the field whose elements are ratios of binary polynomials. Operations over this field can be implemented via binary filters (or equivalently, convolutional codes) at each node. For instance, a node that chooses to implement the operation a(z)/b(z) on an incoming binary sequence x (with transform X(z)) to generate an outgoing binary sequence y (with transform Y(z)) would set b(z)Y(z) = a(z)X(z). Convolutional network coding as a model of linear network coding has been well-studied (see for example [9]).
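To make the filter view concrete, the following sketch (our own illustration, not code from the paper; the function name and interface are hypothetical) applies a rational operation a(z)/b(z) to a bit sequence over GF(2). The numerator coefficients act as forward taps and the denominator coefficients act as feedback taps, exactly as in a binary linear filter.

```python
def gf2_filter(x, a, b):
    """Apply the rational operation a(z)/b(z) to bit sequence x over GF(2).

    a, b are coefficient lists (index = power of z); b[0] must be 1 so the
    feedback recursion is well defined. Returns the first len(x) output
    bits, i.e. y satisfying b(z)*Y(z) = a(z)*X(z) (mod z^len(x)).
    """
    assert b[0] == 1, "feedback polynomial must have constant term 1"
    y = []
    for k in range(len(x)):
        acc = 0
        # forward (numerator) taps: XOR of a[i] * x[k-i]
        for i, ai in enumerate(a):
            if i <= k:
                acc ^= ai & x[k - i]
        # feedback (denominator) taps: XOR of b[j] * y[k-j] for j >= 1
        for j, bj in enumerate(b[1:], start=1):
            if j <= k:
                acc ^= bj & y[k - j]
        y.append(acc)
    return y
```

For example, with a(z) = 1 and b(z) = 1 + z the filter computes the running XOR of the input; applying a(z)/b(z) and then b(z)/a(z) recovers the original sequence, which is why such coding operations are invertible.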
As information percolates down the network, each node makes its own estimate of the “effective field size”, i.e., the size of the subset of the field from which that node should choose its coding operations, so as to meet the pre-specified tolerance on the overall error probability. Our codes are able to perform this bookkeeping despite having access only to information that can be percolated down the network at rates that are asymptotically negligible in the blocklength – like standard distributed network codes, our codes are also asymptotically rate-optimal.
Our results are as follows. In Section III we prove a generalization of the Schwartz-Zippel lemma [2] that is useful as a technical tool in some of our code constructions; it may also be of independent interest for other universal algorithms.
In Section IV we present probabilistic universal and robust codes. That is, given any ε > 0 and any network, we present codes that guarantee that the linear transform from the source to each sink is invertible with probability at least 1 − ε – hence our codes are universal. Further, even if nodes join or leave, pre-existing nodes do not need to change their coding operations to preserve the same guarantee of correctness – hence our codes are robust. We present two such codes. The first code is independent of network size, but does depend on the number of sinks. We present it primarily for expository purposes, since its presentation is simpler than that of our second set of codes, which are independent of all network parameters, including the number of sinks. Both these sets of codes base their choices of coding operations on each node’s distance from the source node. While the effective field size over which our codes operate, and hence the computational complexity of our codes, are larger than those of prior distributed designs [1, 8], the complexity of implementing them is still polynomial in network parameters. Also, we present in Theorem 3 a class of networks that demonstrates that our codes have essentially order-optimal computational complexity for universal codes.
In Section V we consider deterministic universal and robust codes. As a technical tool we first discuss a decentralized algorithm to distribute unique IDs to each node in a robust manner – even if a new node joins we guarantee that it too can be given an ID that is distinct from all others in the network. Building on this tool, and a novel use of Cantor’s classical bijection between tuples of natural numbers and the natural numbers, we design zero-error decentralized codes that are independent of all network parameters, and robust to changes in network topology. We provide two constructions. Our first construction, also primarily expository, is only for codes of one particular rate, and is computationally efficient to design and implement. Our second construction is for codes of arbitrary rate. This generalization comes at the cost of exponentially increasing the implementation complexity, compared to our other constructions. (We distinguish between the computational complexity of design and that of implementation. The former refers to the computational cost of designing the coding operations at each node, and is a one-time cost. The latter corresponds to the computational cost incurred by each node as it implements the pre-designed coding operations, and is a repeated cost for each packet transmitted by that node. All our codes have design complexity that is at most polynomial in the network parameters. Further, most of our designed codes have implementation complexity that is also polynomial in network parameters; the only exception is the last of the proposed designs, corresponding to the general design for zero-error universal and robust distributed linear codes.)
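Cantor’s pairing map, the classical tool referenced above, can be sketched as follows (our own illustration; the paper’s exact use of the map is described in Section V). The pairing is a bijection from pairs of naturals to the naturals, and iterating it encodes any fixed-length tuple as a single natural number.

```python
def cantor_pair(i, j):
    """Cantor's pairing function: a bijection from N x N to N."""
    return (i + j) * (i + j + 1) // 2 + j

def cantor_tuple(t):
    """Encode a tuple in N^k as a single natural by iterating the pairing."""
    code = t[0]
    for x in t[1:]:
        code = cantor_pair(code, x)
    return code
```

Because the map is injective, two distinct ID tuples can never collide after encoding – the property needed for assigning unique IDs in a decentralized way.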
We note that all our algorithms provide guarantees of correctness as long as the source transmits information at a rate no greater than can be supported by the network, i.e., its min-cut. We view the process of determining this rate as a rate-control issue – our code designs are independent of the size of the min-cut.
I-A Related work
The distributed random linear codes of [1, 8] require field sizes that scale with network parameters. As shown in [10], even with centralized design of network codes, there are networks for which the field size over which coding must be performed grows with the number of sinks.
As for universal codes (codes independent of some problem parameters), they have been well-studied in the classical information-theory setting (for instance in source coding [11] and channel coding [12]).
In the network coding setting, however, the literature is much sparser. The work of [5] proposes “robust network codes” that are resilient to network failure patterns. However, the field size over which coding is performed depends on the number of failure patterns, and hence these codes are not truly universal. Further, the computational complexity of designing such codes is prohibitive. There is also significant work on network coding for packet erasure networks (for instance [13]). Our codes can tolerate all such erasures.
The work of [14] examines “decentralized network coding” in which new nodes can join a network without disrupting pre-designed coding operations. Here, too, the field-size choice for the initial design depends on the size of the network. Further, the code designs are for special cases – either for codes of one particular rate (analogous to the codes we present in Section V-C) or for networks with only two sink nodes.
II Notation and definitions
II-A Network Model
In this paper, we adopt the single-source multicast network model of [5]. (The work can be directly extended to multi-source multicast networks, and to networks that may contain cycles, as long as each source has a unique identifier. To ease notational and descriptive complexity we omit the details here.) Let the network be represented by a directed acyclic graph G = (V, E). Here V represents the set of nodes and E the set of edges. The graph has a pre-specified source node and a set of sink nodes. A directed edge e from node u to node v is said to have tail u (denoted tail(e)) and head v (denoted head(e)). The link e is then said to be an incident outgoing link of u and an incident incoming link of v.
II-B Communication Model
The communication goal is for the source to communicate identical information to each sink.
As is standard [5], we assume that each link carries one packet of information per time-step. This is reasonable since if some link’s capacity is less, we may consider the link’s communication to be over multiple successive time-steps, and if the link’s capacity is greater, we can subdivide it into multiple links. The packet length in bits is denoted by n.
The network capacity, denoted by C, is the time-average of the maximum number of packets that can be delivered from the source to each sink simultaneously. It can also be expressed as the minimum over all sinks of the min-cut from the source to that sink. The rate R is the average number of information packets that the source generates per time-step, to be delivered to each sink over the network. Without loss of generality we assume that R equals C. Lastly, the maximum capacity of any single link in the network is also a parameter of interest.
II-C Code Model
II-C1 Network code
The network code comprises the encoders at the source and at each node inside the network, and the decoders at each of the sinks. In particular we focus on linear network codes, i.e., codes where the source node, each internal node, and each sink performs linear combinations of the information in packets on incident incoming links to generate packets on incident outgoing links. Specifically we consider the class of convolutional linear operations, well-studied in classical coding theory, which we reprise below. The base field for arithmetic is chosen to be F_2, hence all operations described below are binary.
II-C2 Convolutional network code
Recall that the transform [17] of any sequence x = (x_0, x_1, …, x_{n−1}) of bits is given by the polynomial x_0 + x_1 z + … + x_{n−1} z^{n−1}, denoted X(z). Further, recall that the convolution of two sequences x and y (with terms x_i and y_j set to zero for indices outside the sequences’ ranges) is the sequence whose k-th term equals the sum over i of x_i y_{k−i}. Lastly, it is well-known that the transform of the convolution of x and y equals X(z)Y(z).
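The convolution just defined is exactly polynomial multiplication of the transforms. The following small sketch (our own illustration; the function name is hypothetical) computes the convolution of two bit sequences over GF(2), where addition is XOR.

```python
def gf2_convolve(x, y):
    """Convolution of two bit sequences over GF(2): out[k] = XOR over i of
    x[i] & y[k - i]. Equivalently, multiplication of the transforms X(z)Y(z)."""
    out = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] ^= xi & yj
    return out
```

For instance, convolving (1, 1) with itself gives (1, 0, 1), matching (1 + z)^2 = 1 + z^2 over F_2, where the cross terms cancel.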
Convolutional codes [17] have long been used in point-to-point communication scenarios. The idea of using convolutional codes for network coding (in networks with cycles) was foreshadowed in [3], and made explicit in [5] (who also noted that such an algebraic model for coding operations can help kill two birds with one stone, i.e., it can also help model delays in networks). The work of Erez et al. (see for example [9]) gave the first efficient designs for convolutional network codes. In our work, the convolutional setting affords the advantage that it allows coding operations to be chosen from a potentially unbounded set, which helps us circumvent the difficulty that we do not know the network’s parameters in advance.
The source’s packets are denoted X_1, …, X_C – each is a length-n bit-vector. The corresponding transforms are denoted X_1(z), …, X_C(z). Collectively they are represented by the length-C vector of polynomials X(z). Each edge e carries a packet, whose transform is denoted Y_e(z). Lastly, the transforms of the packets on incident incoming links to any sink t are collected in the vector Y_t(z). We henceforth refer to a sequence and its transform interchangeably.
Let u, v, and w be three nodes such that there is at least one edge from u to v and at least one edge from v to w. We use a tuple (u, v, w, i, j) to index a coding choice for such nodes – specifically, it refers to the local coding coefficient of the convolutional operation from the information on the i-th edge from u to v to the j-th edge from v to w. The choices of values of the local coding coefficients are code design parameters whose specification is the primary objective of subsequent sections. Let e′ be a specific edge from v to w, and let e denote a dummy variable that ranges over all edges incoming to v (and hence is indexed by a pair (u, i)). The convolutional operation performed at node v comprises taking linear combinations of the information Y_e(z) with the appropriate local coding coefficients over all edges e incident incoming to node v, to generate the information Y_{e′}(z) on the edge e′ incident outgoing from v. To simplify notation, we henceforth write the local coding coefficient simply as f_{e,e′}(z), with the understanding that e and e′ index the appropriate tuple. Thus the linear transform at each node can be written symbolically as Y_{e′}(z) = Σ_e f_{e,e′}(z) Y_e(z).
Since all the linear operations performed by the network can be represented via operations over polynomials over the binary field, we henceforth consider all arithmetic to be over the field of rational functions [5] over F_2, denoted by F_2(z). The elements of this field are of the form a(z)/b(z), where both a(z) and b(z) are binary polynomials. Linear codes over this field have been well-studied in the convolutional coding literature [17].
As in classical distributed network codes [1], the codes in this work are distributed, i.e., the choice of a value for f_{e,e′}(z) at node v can depend only on the local parameters of v and the corresponding parameters of the nodes upstream of v. Since we consider only directed acyclic networks in this work, this imposes a significant design constraint, since nodes that cannot directly communicate with each other over the network cannot coordinate their coding choices.
One idea of [1] that we too use is that of having “short headers” in each transmitted packet. Specifically, each packet (containing n bits) transmitted through the network also contains the linear transformation induced by the network from the source to that packet – as in [1] these transforms are computed in a distributed manner and percolated down the network along with the payload information at an asymptotically negligible rate. For every edge e, the network transfer vector from the source to e collects these induced linear transformations – these too can be computed in a distributed manner. For each sink t, let M_t be the overall network transfer matrix from the source to t, formed by collecting the transfer vectors of the links incoming to t, and let det(M_t) denote its determinant.
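The header mechanism can be sketched on the butterfly network (our own toy illustration, not code from the paper). For brevity the sketch uses scalar GF(2) local coefficients rather than elements of F_2(z); each edge’s header is the XOR-combination of its parents’ headers, so transfer vectors percolate downstream with purely local computation.

```python
# edge -> (parent edges, local GF(2) coefficients); parents listed before children
edges = {
    'sa': ([], None), 'sb': ([], None),        # source edges carry unit headers
    'ac': (['sa'], [1]), 'bc': (['sb'], [1]),
    'cd': (['ac', 'bc'], [1, 1]),              # the coding edge: XOR of both
    'at1': (['sa'], [1]), 'bt2': (['sb'], [1]),
    'dt1': (['cd'], [1]), 'dt2': (['cd'], [1]),
}

headers = {'sa': [1, 0], 'sb': [0, 1]}         # unit vectors at the source
for e, (parents, coeffs) in edges.items():
    if parents:
        h = [0, 0]
        for p, c in zip(parents, coeffs):
            if c:
                h = [a ^ b for a, b in zip(h, headers[p])]
        headers[e] = h

def invertible_gf2(rows):
    """A 2x2 GF(2) matrix is invertible iff its determinant is 1."""
    (a, b), (c, d) = rows
    return (a & d) ^ (b & c) == 1
```

Each sink stacks the headers on its incoming links into a matrix and checks invertibility – exactly the condition that its transfer matrix M_t is nonsingular.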
Our codes are either probabilistic or deterministic depending on whether local coding coefficients are chosen probabilistically or deterministically. The error probability is the probability, over the choices of local coding coefficients, that at least one sink’s reconstruction of at least one possible message from the source is inaccurate. For linear network codes this happens if and only if the transfer matrix from the source to some sink is not invertible. A rate is said to be achievable if for any ε > 0 there exists a coding scheme of some block length with that rate and error probability at most ε. In particular, we require our deterministic network codes to be zero-error, i.e., to have error probability exactly zero.
III The Generalized Schwartz-Zippel Lemma
The classical Schwartz-Zippel lemma [2] provides an upper bound on the probability that a nonzero polynomial evaluates to zero when its variables are chosen uniformly at random from a subset of a field.
Recall that the degree of a variable in a polynomial is the maximal exponent of that variable over the polynomial’s nonzero terms. Further, recall that the degree of the polynomial itself is the maximal value of the sum of the exponents of the variables over all its nonzero terms. Note that the degree of each variable is at most the degree of the polynomial.
Lemma 1 (Schwartz-Zippel lemma [2]).
Let P(x_1, …, x_n) be a nonzero polynomial of degree d over a field F. Let S be a finite subset of F, and let the value of each x_i be selected independently and uniformly at random from S. Then the probability that the polynomial evaluates to zero is at most d/|S|.
The Schwartz-Zippel lemma is a useful tool in the analysis of random linear network codes (for instance [1]). A random linear network code causes an error if and only if one of the transfer matrices from the source to some destination is singular. This in turn happens if and only if the product of the determinants of these transfer matrices equals zero. But this product of determinants may be viewed as a polynomial whose variables consist of the local coding coefficients at each node. Hence the Schwartz-Zippel lemma provides an upper bound on the probability of error of a random linear network code.
In this work we are interested in a generalization of the Schwartz-Zippel lemma, for polynomials whose variables are chosen from different subsets of the field. We prove:
Lemma 2.
Let P(x_1, …, x_n) be a nonzero polynomial over a field F. For each i, let S_i be a finite subset of F, let the degree of x_i in P be d_i, and let the value of each x_i be selected independently and uniformly at random from S_i. Then the probability that the polynomial evaluates to zero is at most the sum over i of d_i/|S_i|.
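The non-uniform bound can be sanity-checked numerically (our own illustration; the polynomial, the prime field, and the subset sizes are arbitrary choices, not taken from the paper). For P(x_1, x_2) = x_1² x_2 − 1 over F_101, the bound is d_1/|S_1| + d_2/|S_2| = 2/4 + 1/10.

```python
import random

random.seed(1)

q = 101                   # work over the prime field F_101
S1 = list(range(1, 5))    # |S1| = 4,  deg_{x1}(P) = 2
S2 = list(range(1, 11))   # |S2| = 10, deg_{x2}(P) = 1

trials = 20000
zeros = sum(
    1 for _ in range(trials)
    if (random.choice(S1) ** 2 * random.choice(S2) - 1) % q == 0
)
empirical = zeros / trials
bound = 2 / len(S1) + 1 / len(S2)   # sum of d_i / |S_i|
```

Here the empirical zero frequency is far below the bound (only the point (1, 1) is a root within the chosen subsets), which is typical: the lemma is a worst-case guarantee over all nonzero polynomials.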
Proof: Given in Appendix A.
Note: Neither Lemma 1 nor Lemma 2 puts any restriction on the size of the field F, as long as the subsets from which the variables are chosen are finite.
Note: A related but inequivalent generalization of the Schwartz-Zippel lemma was proved in [18].
The utility of this lemma is that it allows the variables of the polynomial to be chosen non-uniformly. This is integral to the proof techniques in this work, wherein we choose local coding coefficients from progressively larger sets, depending on how far from the source the corresponding nodes are.
IV Probabilistic designs
In this section we describe two probabilistic designs of universal distributed robust network codes. In particular, given any ε > 0, we present schemes such that the overall error probability of the code is at most ε.
Our first scheme is independent of the size (number of nodes/edges) of the network, but does require that the source have a priori knowledge of the number of sinks it shall be required to service. Hence we say it is only weakly universal. Our purpose in presenting this scheme is primarily expository, since its proof is significantly easier than that of the second scheme – it helps set the stage for the second scheme.
The second scheme is strongly universal and is independent of all network parameters, including the number of sinks.
We first describe some useful preprocessing steps relevant for both of our schemes.
IV-A Graph transformation
We find it desirable to work over a transformed graph rather than the original graph G. This transformation can be done locally at each node, and results in a graph with some useful properties. In particular, we use the work of [7], which demonstrates the equivalence between general network coding problems and those over “low-degree” networks where each node has degree at most three. In particular, nodes in the reduced network either have one incident incoming edge and at most two incident outgoing edges (in which case they broadcast the incoming information on the incident outgoing edges, and hence are called broadcasting nodes), or they have two incident incoming edges and one incident outgoing edge (in which case they code the information on the incident incoming edges to generate the information transmitted on the incident outgoing edge, and hence are called coding nodes). (See Figure 3(b).) This equivalence is useful for our probabilistic algorithms since it allows us to effectively enumerate networks. We change the equivalence relationship of [7] slightly, as described below, so as to make it robust to nodes joining the original network. That is, in our equivalence relationship, nodes can join the original network while only locally perturbing the “low-degree” network.
The transformation is as follows. For every node we construct a virtual robust gadget (see Figure 1 for an example; we thank Michael Langberg for providing the template for Figures 1 and 2).
Suppose a node has some number of incoming links and some number of outgoing links. Corresponding to each incoming link we construct a binary tree whose root is connected to that incoming link, and which has one leaf per outgoing link plus one additional leaf. Similarly, corresponding to each outgoing link we construct an inverted binary tree whose root is connected to that outgoing link, and which has one leaf per incoming link plus one additional leaf. The last leaf node of each binary tree is called a virtual node, and the other leaf nodes are called connection nodes. We then connect connection nodes so that there is exactly one path from each incoming link to each outgoing link (the connection order does not matter).
Suppose a new link is created in the network – say, a link directed from node u to node v (see Figure 2 for an example). (A new node is treated as the corresponding set of new links created in the network; similarly, a node’s departure is treated as the corresponding set of links being removed.) In this case we first create a virtual gadget corresponding to the new link. We then split each virtual node on the inverted binary trees (corresponding to the outgoing links of v) into two by appending a binary tree of depth one to it. We denote the second of the two new leaf nodes as a new virtual node, and the first as a new connection node. The connection nodes on the new link’s virtual gadget are then connected to the new connection nodes on each of the outgoing links’ virtual gadgets so that there is exactly one path from the new link to each link outgoing from v. A corresponding (but inverted) procedure holds if the new link is outgoing from a node. The removal of a link simply corresponds to the removal of the corresponding virtual gadgets on the incoming and outgoing sides, and of all links connected to them. The virtual nodes in each virtual gadget are what give our transformation robustness: if a node joins or leaves the network, nodes other than the ones directly connecting with the changing node experience no structural changes in their existing virtual gadgets. (The addition of virtual nodes and the corresponding robust connection procedure is the only substantive difference between our construction and that in [7].)
Henceforth, all algorithms in Section IV convert the original graph to the virtual graph above as a preprocessing step, and all computations are over this virtual graph. Also, as part of normal communication each node in the virtual graph estimates its depth, i.e., the length of the shortest path from the source to itself. This can be done by any of a variety of distributed shortest-path algorithms over acyclic graphs, such as the Bellman-Ford algorithm [19].
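With unit-weight edges, the depth estimate above reduces to breadth-first search from the source (a special case of Bellman-Ford). The following centralized sketch (our own illustration; a real deployment would run the distributed variant) computes every node’s depth on a DAG given as an adjacency dict.

```python
from collections import deque

def depths_from_source(adj, source):
    """Depth (shortest unit-weight path length from `source`) of every
    reachable node in a DAG, via BFS."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in depth:      # first visit = shortest unit-weight path
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth
```

On the butterfly network, for example, the two sinks sit at depth 2 via their direct links even though they also receive packets along the longer coded path.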
IV-B Weakly universal design
The essential idea behind our first scheme is as follows. Each node, having estimated its depth, chooses a subset of F_2(z) whose size scales exponentially in this depth, from which it picks its coding coefficients uniformly at random. We then show that the probability of error due to information being lost at any depth decays geometrically in the depth, and hence by the union bound the overall probability of error can be controlled so as not to exceed any desired ε.
WUP(ε) (Weakly Universal Probabilistic) Code:

Each coding node in the vertex set of the virtual graph chooses two local coding coefficients, corresponding to its two incoming links, uniformly at random from the set of polynomials whose degree is at most a bound that grows linearly in the node’s depth. (This choice of the degree bound is simply to ease the analysis of Theorem 1. All logarithms are binary. Also, for simplicity of presentation we assume the degree bound is an integer – if not, we may round up to the nearest integer with negligible error in our estimates.)
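The coefficient-drawing step can be sketched as follows (our own illustration; the function name and the `eps_exponent` slack term, standing in for the ε- and sink-dependent part of the degree bound, are hypothetical). A uniformly random binary polynomial of degree at most D is just D + 1 independent fair bits, so the candidate set has size 2^{D+1} and doubles with each unit of depth.

```python
import random

def random_coefficient(depth, eps_exponent=4, rng=random):
    """Draw a uniformly random binary polynomial (as a coefficient list,
    index = power of z) of degree at most depth + eps_exponent.

    The linear-in-depth degree bound mirrors the WUP idea: deeper nodes
    draw from exponentially larger sets, so the per-depth error
    contributions in Lemma 2's bound form a geometric series.
    """
    max_deg = depth + eps_exponent
    return [rng.randint(0, 1) for _ in range(max_deg + 1)]
```

A coding node at depth d would call this twice, once per incoming link, and use the two draws as its local convolutional taps.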
Theorem 1.
For any ε > 0, WUP(ε) has error probability at most ε.
Proof: Recall that det(M_t) represents the determinant of the transfer matrix from the source to sink t. As noted in [5], the network code is error-free if and only if the polynomial comprising the product of these determinants over all sinks (with the network’s local coding coefficients as variables) is nonzero. To evaluate the probability that this is the case given the random assignment of local coding coefficients in WUP(ε), we use Lemma 2. Specifically, each variable in Lemma 2 corresponds to a local coding coefficient. We group the coding coefficients in terms of the depths of the nodes at which they are used. There are at most C·2^d coding nodes at any depth d in the virtual graph, since after the transformation in Section IV-A the fastest possible growth rate for the new graph would be attained if it comprised C parallel binary trees – one for each of the source’s messages. Hence the number of local coding coefficients at that depth is correspondingly bounded. Also, a corollary in [1] bounds from above the degree of each local coding coefficient in the product of determinants. By construction, in WUP(ε) each coding node chooses local coding coefficients uniformly at random from a set of polynomials of bounded degree; the set of binary polynomials of degree at most D is a subset of F_2(z) of size 2^{D+1}. Summing over all local coding coefficients at all possible (possibly infinite) depths and substituting the appropriate parameters into Lemma 2, the error probability of the network code is bounded from above by ε.
The computational complexity of WUP(ε) codes is polynomial in the network parameters, and the achievable rates approach the network capacity asymptotically in the blocklength. Further, our codes are robust to links joining and leaving. Since the analysis of these properties is very similar to that of the codes in Section IV-C, we delay the discussion to the end of that section.
IV-C Strongly universal design
We now present the design of probabilistic robust linear network codes that are strongly universal, i.e., independent of all network parameters. This obviates the requirement, in Section IV-B, of knowledge of the number of sinks. The idea underlying the construction in this section is as follows. For the purpose of analysis, for each sink we identify a set of edge-disjoint paths, and estimate the probability that the information on these edge-disjoint paths remains invertible as it flows through the network. In particular, for any sink and any depth in the network we identify the sets of edges in these edge-disjoint paths that must contain linearly independent combinations of the source’s information. We call such sets of edges flowcuts. It turns out that the number of flowcuts at any depth is in fact independent of the number of sinks, and further, a bound on this number at each depth can be computed locally. Thus sinks can be classified according to flowcuts. Hence, instead of trying to ensure that the linear transform to each sink is invertible as in Theorem 1, nodes at each depth simply try to ensure that the linear transform to each flowcut is invertible. To analyze the probability of non-invertibility at each flowcut, an alternative to the end-to-end analysis of the probability of error used in [1, 8] is required. Here we use the proof technique of [20], which analyzes the probability that information gets lost from one set of edges in the network to a neighbouring set of edges.
SUP(ε) (Strongly Universal Probabilistic) Code

Each coding node at depth d in the vertex set of the virtual graph chooses two local coding coefficients, corresponding to its two incoming links, uniformly at random from the set of polynomials whose degree is at most a bound that grows linearly in d.
Recall that by assumption the capacity of the network is at least C. Hence for each sink t there is a set P_t of at least C edge-disjoint paths from the source to t.
Corresponding to each such set P_t of edge-disjoint paths, we define flowcuts. A flowcut is a set of edges with the property that each edge in the flowcut is from a distinct edge-disjoint path in P_t. These flowcuts are useful since we intend to analyze the linear (in)dependence of the information flowing through the edges in each flowcut – if the packets on the edges of a flowcut are linearly independent, then the source’s information can be retrieved from that flowcut. Hence, we only need to inductively prove that no information is lost from one flowcut to the “next” flowcut, appropriately defined below. (Similar intuition was used in the proofs of [6] and [20], where such sets were called “frontier edge-sets”. Note that a flowcut need not be a cut or a subset of one – for instance, it may include two edges on two edge-disjoint paths such that one is incoming to a node and the other is outgoing from it.)
We define the depth of a flowcut as the maximum depth of the head of any edge in it; where needed, we index a flowcut by its depth d.
We then define a flowset as an ordered set of flowcuts with the following property: each flowcut in a flowset differs from the successive flowcut in exactly one edge. Specifically, if one flowcut differs from the next in that some edge e is replaced by another edge e′, then it must be the case that e is the edge preceding e′ in some path in the set of edge-disjoint paths P_t. Intuitively, each flowset captures successive snapshots of how information flows from the source to the sink t.
Examples of flowcuts and flowsets are provided in Figure 4(b), based on the butterfly network in Figure 4(a).
Consider some flowcut of depth d to sink t, and the flowcut immediately preceding it in its flowset. (The depth of the preceding flowcut might be either d or d − 1, since two successive flowcuts differ in exactly one edge, which may or may not be the deepest edge in the flowcut; if not, both flowcuts have the same depth, and if so, the depth can change by at most one.) Let the linear transforms that the network imposes from the source to the edges in these two flowcuts have some ranks. Then the following lemma gives an upper bound on the probability that choosing local coding coefficients according to the dictates of SUP(ε) results in a loss of information in going from the preceding flowcut to this flowcut.
Theorem 2.
For every , SUP() has error probability at most .
Proof: Note that of the two types of nodes in the virtual graph, the broadcasting nodes induce no additional error: if a flow-cut contains linearly independent packets, and one of the edges in the flow-cut is replaced with another edge at a broadcasting node, the information in the succeeding flow-cut remains unchanged. Thus from now on we focus only on coding nodes.
By construction, the structure of the virtual graph is such that each node can have at most two outgoing edges, and further the source node is replicated. Hence the maximum possible number of edges in the virtual graph up to a given depth occurs when it comprises parallel binary trees. But each binary tree has a bounded number of edges, and hence the total number of edges in the virtual graph up to that depth is correspondingly bounded. Also, the total number of flow-sets of a given depth is bounded from above. (All exponents in what follows are base 2.)
We use these bounds to bound from above the number of distinct types of coding choices that a coding node at a certain depth faces. All our analysis now focuses on the following specific coding node. Of its two incoming edges, let one belong to a flow-cut in a flow-set going towards a sink, and let the other be an arbitrary other edge. Then the outgoing edge replaces the former in the flow-cut to produce the succeeding flow-cut. By the bounds in the preceding paragraph, the number of ways an arbitrary flow-cut of a given depth can result from the merger of a preceding flow-cut and an arbitrary edge of at most that depth is bounded from above by the product of the two bounds, which equals
(1) 
Next, we estimate the probability that a coding node "loses information". That is, we bound from above the probability that the number of linearly independent packets on the edges of a flow-cut is smaller than for the immediately preceding flow-cut.
Say a matrix has as its rows the linear combinations of the source's messages on the links in the flow-cut of the given depth. Correspondingly, let another matrix represent the linear transform from the source to the flow-cut immediately preceding it in the flow-set, and suppose the latter is of full rank. Then the message on the new edge in the flow-cut may be written as a combination of the messages carried by the two edges incoming to the node, weighted by the local coding coefficients at that node. But since this transform is of full rank by assumption, the message may be written as a linear combination of the messages on the edges in the preceding flow-cut. Thus the message on the new edge may be written as
for some . This in turn equals
But the information on the links of the flow-cut other than the new edge is unchanged, and hence the only manner in which the messages on its edges can be linearly dependent is if the coefficient above equals zero. But by the choice specified in SUP(), the coding coefficients are chosen from a set of polynomials of bounded degree; Lemma 2 then implies that the probability that the polynomial above equals zero is correspondingly small. Analogously to Theorem 1, taking the union bound over all possible coding operations at a given depth (for which (1) is an upper bound), and summing over all (possibly infinite) depths, gives us that the overall probability of error is at most
IV-D Robustness
Due to the robust graph transformation described in Section IV-A, neither the addition nor the deletion of edges or nodes in the network causes problems with our proof. If new nodes are added to the network, the actual depth of some nodes in the network may decrease. However, we require each node to use in perpetuity the value of the depth it estimates in the first round of communication. This ensures that the bound on the number of coding nodes at a particular depth is not violated. Conversely, if nodes leave the network, the actual depth of a node may increase. Nonetheless, the bound on the number of coding nodes at a particular depth is still not violated, since the total number of nodes in the network has only decreased.
IV-E Complexity analysis
The complexity of both WUP() and SUP() scales with the corresponding degree of the polynomials chosen by nodes as coding coefficients.
For WUP(), by construction, the degree of the polynomials used as coding coefficients (which, as noted in [21], is a good proxy for the implementation complexity of codes) scales as the maximum depth of the virtual graph. But in the virtual graph each original node is replaced with a gadget whose depth is approximately the logarithm of the degree of the node, which in the worst case is bounded by the maximum value of any edge's capacity in the network. Hence the maximum depth of the virtual graph is at most a logarithmic factor times the number of vertices in the original graph, and the complexity of implementation scales accordingly.
The corresponding redundancy the network introduces in the codes, arising from the delays introduced by each coding node, then scales at worst as the maximum depth of the virtual graph times the maximum degree of the polynomials used by any node, since each coding node in a path can introduce at most the maximal delay and delays along a path add up. Using the bounds above, this scales as .
A similar analysis shows that the complexity of implementation of SUP() scales as , and that the redundancy introduced by such codes scales as .
We now demonstrate that the implementation complexity of WUP() and SUP() is in fact order-optimal, by exhibiting a class of networks for which any universal design requires computational complexity of the same order of magnitude as that of WUP() and SUP(). Our construction is inspired by that of [10]. Consider any universal linear network code, i.e., any network code that requires that each sink be able to reconstruct the source's information with bounded probability of error, even if the network topology is not known in advance.
Theorem 3.
There exists a class of networks for which the implementation complexity for any error universal network code is .
Proof:
We construct a single-source multicast network that requires that for any universal code, coding operations must be chosen from a set that is exponentially larger than would actually be needed if the topology were known in advance. Consider the "binary-tree-like" network demonstrated in Figure 5. The upper part of this graph comprises a binary tree with the source located at its root. Each link of this binary tree has capacity two. Next, each leaf node of this binary tree has a link of unit capacity leaving it to a corresponding forwarding node. Finally, we consider two possibilities. Either there are many sinks, such that each sink is connected to a distinct subset of size two of the set of forwarding nodes, via unit-capacity links to each of the two nodes in the subset (this is in the spirit of the combination networks examined in [10]); or there is only one sink node in the network, connected to two of these forwarding nodes via links of unit capacity each.
As to coding strategies, each node in the binary-tree part of the network can forward two linearly independent messages on each of its outgoing links. Hence each of the leaves of the binary tree has both messages. Since neither of the two leaf nodes corresponding to the single sink's forwarding nodes is aware of which of the two configurations the network is in, to be universal they must use coding operations that work for both configurations.
First we consider the zero-error case, i.e., every message must be decoded correctly. Suppose the two leaf nodes choose to transmit linear combinations of the two source messages, each parametrized by a scalar coefficient. But the messages on each of the forwarding links must be linearly independent (to take into account the eventuality that the network is in the first configuration with multiple sinks). Hence there must be at least as many choices for these coefficients as there are leaf nodes (one for each leaf node, minus one for a degenerate case, which can be handled separately). The implementation complexity, which scales as the logarithm of the size of the set of possible coding operations, must then be at least proportional to the depth of the tree. In contrast, if the network happened to be in the configuration with only one sink and this was known in advance, then each of the two designated leaf nodes could simply forward one bit, for a constant implementation complexity.
The case with errors can be similarly analyzed, by allowing sinks to make errors a fraction of the time. A direct counting argument gives the required result.
V Deterministic designs
In this section we describe two deterministic designs of universal distributed robust network codes that are zero-error. (Preliminary versions of the proofs in this section appeared in the thesis [15].)
Our first scheme is only for codes of rate 2. It is related to a construction of [14], but generalizes it so that the choice of coding operations is independent of the size of the network. We call our scheme the Rate-2 Deterministic Design, or R2 for short. Our purpose in presenting this first scheme is primarily expository: its proof is significantly easier than that of the second scheme, and it helps set the stage for it.
Our second scheme is for general rates and is independent of all network parameters, including the number of sinks. We call this scheme the Capacity or more, Probability of error scheme, or C3 for short. However, C3 is more an existence result than a practical code, since the computational complexity of its implementation is exponential in network parameters.
We first describe some useful preprocessing steps relevant for both of our schemes.
V-A Robust distributed unique ID assignment
While the codes in Section IV only required nodes to estimate their depth, the zero-error codes presented in this section require nodes to obtain a unique ID, i.e., an ID that is distinct for each node in the network. Such an ID allows nodes to loosely coordinate coding choices even if they are unable to communicate directly with each other, and thereby ensure that the overall code is "good". Such IDs might be preassigned to nodes (for example via factory stamps, GPS coordinates, or IP addresses), or be assigned on the fly, as described below.
The task of distributing unique IDs to nodes over a directed graph was considered in [22]. The essential idea of their algorithm is to pretend that the graph is a tree directed from the root to the leaves (if not, extra edges are removed for the ID-assignment protocol), and to assign IDs so that the binary expansion of each node's ID is a prefix of the binary expansion of the ID of every node downstream of it. This ID distribution can be carried out with communication cost that is asymptotically negligible in the packet length, in conjunction with the normal flow of information through the network, for instance in the header. Here, as in Section IV-A, we need to change the unique-ID distribution protocol slightly to make it robust to network changes, so that new nodes are still ensured that the IDs assigned to them do not clash with previously assigned IDs. In the same spirit as the robust virtual gadgets in Section IV-A, at each node we reserve a virtual ID for the event that a new node might in the future connect to it; if so, this virtual ID is again split into another virtual ID, and an ID that is assigned to the new node. As noted in [22], the worst-case growth rate of the largest node ID with the network size is exponential, for reasons similar to those outlined in Theorem 3: nodes might be unable to distinguish between a full binary tree and a very sparse graph.
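As an illustration of the prefix-based assignment with a reserved virtual ID, consider the following sketch. The `IDAllocator` class and its interface are hypothetical, not taken from [22]; the actual protocol interleaves this bookkeeping with the normal flow of packets.

```python
class IDAllocator:
    """Sketch of prefix-based unique-ID assignment down a tree.

    Each allocator holds a reserved 'virtual' ID; when a new node
    attaches, the virtual ID is split into an ID handed to the
    newcomer and a fresh virtual ID kept for future arrivals, so
    later joins never clash with earlier assignments."""

    def __init__(self, own_id=""):
        self.own_id = own_id
        self.reserved = own_id             # virtual ID for future children

    def attach_child(self):
        child_id = self.reserved + "0"     # given to the new node
        self.reserved += "1"               # retained for the next arrival
        return child_id

root = IDAllocator()
first = root.attach_child()                     # "0"
second = root.attach_child()                    # "10"
grandchild = IDAllocator(first).attach_child()  # "00"
print(first, second, grandchild)                # 0 10 00
```

Note the prefix property: the root's (empty) ID is a prefix of every ID, and a node's ID is a prefix of its descendants' IDs, while IDs handed out at different times are always distinct.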
V-B Cantor labeling
The well-known Cantor diagonal argument [23] makes an unexpected cameo in this work. One version states that the cardinality of the set of integers is the same as that of the set of finite-dimensional vectors with integer components, and further gives an effective bijection between the two sets. Further, this bijection guarantees that any vector with bounded maximum component is mapped to an integer of correspondingly bounded size. This mapping is useful since, given a unique ID for each node, we then need to produce unique coding coefficients for each pair of edges such that one is incoming to the node and the other is outgoing from it. Prior to code design, the number of such coefficients that each node might need to choose is unknown. However, each coefficient can be labeled by at most five indices, each of which is an integer. Hence, given a node's unique ID, one can produce a unique integral label for each such vector of indices that is not too much larger (at most the fifth power) than any of the five parameters. This mapping can then be used to select distinct local coding coefficients as needed in Sections V-C and V-D.
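One concrete mapping with the stated growth is bit interleaving, sketched below. Whether this matches the exact bijection used here is an assumption, but it maps any k-tuple with maximum component m to a distinct integer of order roughly m to the k-th power (the fifth power for the five indices above).

```python
def interleave(v):
    """Map a k-tuple of nonnegative integers to a single integer by
    interleaving their binary digits. Distinct tuples of the same
    length get distinct labels, and a tuple with maximum component
    m >= 1 maps to an integer below (2*m)**k."""
    k = len(v)
    width = max(x.bit_length() for x in v) or 1
    label = 0
    for bit in range(width):
        for i, x in enumerate(v):
            label |= ((x >> bit) & 1) << (bit * k + i)
    return label

tuples = [(1, 2, 3, 4, 5), (5, 4, 3, 2, 1), (1, 2, 3, 5, 4)]
print(len({interleave(t) for t in tuples}))  # 3
```

Injectivity holds because each input bit occupies its own position in the output, so the tuple is recoverable from the label.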
V-C Rate-2 zero-error codes
For the case when the transmission rate equals 2, note that there are essentially just two nontrivial scenarios for each node: either a node receives one linearly independent message on its incoming links, or it receives two. In the former case, it can only broadcast incoming information on outgoing links. In the latter case, it can reconstruct the source's information, and thereby fully control the linear combinations on outgoing links. Our construction for R2 rests on an analysis of these two cases.
R2 (Rate-2 Deterministic Design)

- The source has two linearly independent messages.
- Depending on its connectivity to the source, each node receives on its incoming edges either one or two linearly independent combinations of the source messages.
- If a node receives only one linearly independent message on incoming links, it broadcasts it down all outgoing edges.
- If a node receives two linearly independent combinations, it can reconstruct both source messages. For each directed edge connecting a pair of nodes (connected possibly by multiple parallel edges), we use the Cantor labeling algorithm in Section V-B to assign a distinct local coding coefficient, and the node then transmits down each such edge the combination with that edge's coefficient. (In Section V-B we assume that all the indices are variable, but in this section some of them are fixed.)
Theorem 4.
For any network with min-cut capacity at least 2, R2 succeeds with zero error.
Proof: For any sink such that the min-cut between the source and that sink is at least 2, there are at least two edge-disjoint paths from the source to it. By the statement of our R2 algorithm, for any pair of nodes on such paths, the linear combinations of the two source messages on all their outgoing links must be distinct, and hence pairwise linearly independent (since two such combinations are linearly independent if and only if their local coding coefficients are distinct).
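The determinant argument underlying Theorem 4 can be checked numerically. The sketch below uses hypothetical scalar coefficient labels in place of the Cantor-derived ones: combinations of the form "first message plus alpha times second message" are pairwise linearly independent exactly when the alphas are distinct.

```python
from fractions import Fraction

def combo(alpha):
    """Coefficient vector of the linear combination x1 + alpha * x2."""
    return (Fraction(1), Fraction(alpha))

def independent(u, v):
    """Two 2-dimensional vectors are linearly independent iff their
    2x2 determinant is nonzero."""
    return u[0] * v[1] - u[1] * v[0] != 0

alphas = [0, 1, 2, 7]           # hypothetical distinct labels
vecs = [combo(a) for a in alphas]
all_independent = all(independent(vecs[i], vecs[j])
                      for i in range(len(vecs))
                      for j in range(i + 1, len(vecs)))
print(all_independent)                   # True
print(independent(combo(3), combo(3)))   # False
```

The determinant of the pair (1, a), (1, b) is b minus a, which vanishes exactly when the two labels coincide; this is the invariant that the distinct Cantor labels preserve.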
V-D General zero-error codes
The challenge in extending the results of Section V-C to rates greater than 2 lies in the fact that there might be nodes that receive two or more linearly independent pieces of information and yet are unable to decode the source messages. In this case, they do not have full control over the messages they send out, and hence the argument of Theorem 4 fails. In this section, we get around this challenge by examining a different invariant of linear convolutional network codes. In particular, we choose coding coefficients in a distributed manner so that the delay of the source messages along every path in the network is distinct. This means that the source messages never cancel out at the sinks, and hence can be reconstructed.
C3 (Capacity or more, Probability of error ) codes

- For each tuple of the five indices, apply the five-dimensional Cantor mapping defined in Section V-B to obtain a label. We define the corresponding local coding coefficient as the monomial in the delay variable whose degree is two raised to this label (here the exponent is base 2).
Theorem 5.
For any network , C3 succeeds with zero error.
Proof: A theorem in [1] demonstrates that the determinant of the transfer matrix from the source to any sink can be written as a summation over all paths from the source to that sink, where each term is a nonzero constant corresponding to the path times the product of all the local coding coefficients on that path. Our choice of local coding coefficients along any path in C3 implies that this determinant equals
(2) 
But by choice, each of the degrees is distinct, and hence the binary expansion of each degree has a single 1 in a distinct location. But if two paths in the summation (2) differ, then they must differ in at least one of the local coding coefficients, and therefore the exponents of the delay variable along the two paths must differ; hence each path corresponds to a distinct power of the delay variable. This implies that as long as there is at least one path from the source to each sink, each of the corresponding transforms must be invertible.
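The no-cancellation argument can be sanity-checked with integers. Below, hypothetical distinct Cantor labels stand in for the real ones: each coefficient contributes a delay that is a power of two, so a path's total delay is a sum of distinct powers of two, and distinct label sets always yield distinct totals.

```python
from itertools import combinations

labels = [0, 1, 3, 4, 6]        # hypothetical distinct Cantor labels
delays = {sum(2 ** l for l in subset)
          for r in range(1, len(labels) + 1)
          for subset in combinations(labels, r)}

# A 5-element set has 2**5 - 1 = 31 nonempty subsets, and every subset
# sum is distinct because the subset is readable off the sum's binary
# expansion:
print(len(delays))  # 31
```

This is exactly why two differing paths in (2) contribute distinct powers of the delay variable, so their terms can never cancel.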
V-E Complexity analysis
The complexity of both R2 and C3 scales with the corresponding Cantor labeling and node-ID assignments.
For R2, the size of the set from which any node chooses its coding coefficients scales as the third power of the largest node ID or the largest link capacity in the network. But as noted in Section V-A, the largest node ID can scale exponentially in the network size. Hence the degree of the polynomials used as coding coefficients, which scales logarithmically in the size of the sets from which local coding coefficients are chosen, in turn scales with the network size. The corresponding redundancy the network introduces in the codes, arising from the delays introduced by each coding node, then scales as the maximum path length times this maximal per-node delay, since each coding node in a path can introduce at most the maximal delay and delays along a path add up.
A similar analysis shows that the complexity of implementation of C3 scales as , and that the redundancy introduced by such codes scales as .
The problem of polynomial identity testing (PIT) [24] examines the question of deterministically determining whether a polynomial with a succinct but nonstandard representation (such as the determinant of a matrix of polynomials) identically equals zero. The deterministic complexity of such problems is a longstanding open problem in theoretical computer science. Given this context, we are unable to provide intuition on whether our codes in Section VD have orderoptimal computational complexity – indeed, answering this question in either direction would represent significant progress in resolving the complexity of PIT problems.
VI Implementation issues
As noted in [21], the complexity of implementation of network codes scales polynomially in the logarithm of the field size over which operations are performed, or, in the case of convolutional network codes, polynomially in the degree of the polynomials used at each node. By this measure, the implementation complexity of the codes in [1, 8] is polylogarithmic in network parameters, whereas the implementation complexity of the first three of the four codes in this work is polynomial in network parameters. While this is an exponential blowup, we note that the resulting codes are still computationally tractable, and further, as noted in Section IV-E, such a blowup is in fact necessary for codes to be universal.
Code   | Network size known | Number of sinks known | Error probability | Implementation complexity
[1]    | yes | yes |   |
[7]    | yes | yes | 0 |
WUP()  | yes | no  |   |
SUP()  | no  | no  |   |
R2     | no  | no  | 0 |
C3     | no  | no  | 0 |
While the schemes in this work have been presented in the context of convolutional network coding operations at each node, they also go through for other infinite fields: the only requirement is that the field be unbounded in size, and that an infinite subset of it have a succinct representation.
Also, despite presenting all messages at the source and each link as bitstreams of possibly unbounded length, the schemes described in prior sections can also be implemented by packetization, by chopping up the bitstreams into packets of a standard size .
In our codes the header of each packet contains low-rate control information used by each node to decide on its coding operations. However, by design, the size of this header changes as information flows down the network; the rate of change depends on the network topology, and hence is unpredictable in advance. One challenge in the implementation of our codes is thus to ensure that the intermediate nodes are able to distinguish between header information and payload information. One standard trick for such scenarios is used in a theorem of [25]: each bit of the header is doubled, and the final such doubled bit is followed by a 01 to signify the end of the header. Since the length of the header is asymptotically negligible in the packet size, the communication cost of this bit-doubling is still asymptotically negligible.
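A minimal sketch of this self-punctuating header follows, under the assumption that the terminator is the pair 01 (the first non-doubled pair that can appear in the stream).

```python
def encode_header(header_bits, payload_bits):
    """Double each header bit, then append the terminator '01' so a
    reader can find where the header ends without knowing its length."""
    doubled = "".join(bit + bit for bit in header_bits)
    return doubled + "01" + payload_bits

def decode_header(stream):
    """Scan doubled pairs until the '01' terminator, then split the
    stream into (header, payload)."""
    i = 0
    header = []
    while stream[i] == stream[i + 1]:   # doubled header bits agree
        header.append(stream[i])
        i += 2
    assert stream[i:i + 2] == "01"      # terminator found
    return "".join(header), stream[i + 2:]

packet = encode_header("1011", "000111")
print(decode_header(packet))  # ('1011', '000111')
```

Since every doubled pair is 00 or 11, the pair 01 can only occur at the terminator, which is what makes the header self-delimiting at the cost of a constant factor on its (asymptotically negligible) length.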
VII Discussion
In this work we provide the first rate-optimal network code designs that have guaranteed decodability performance and yet are independent of all network parameters. While requiring such universality makes us pay a price in computational complexity and redundancy, all but one of our codes are computationally efficient to implement. The analytical tools we derive may well be of independent interest.
References
 [1] T. Ho, R. Kötter, M. Médard, D. Karger, and M. Effros, “The benefits of coding over routing in a randomized setting,” in IEEE International Symposium on Information Theory (ISIT), Yokohama, July 2003, p. 442.
 [2] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge University Press, 1995.
 [3] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, “Network information flow,” IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 1204–1216, Jul. 2000.
 [4] S.-Y. R. Li, R. W. Yeung, and N. Cai, “Linear network coding,” IEEE Transactions on Information Theory, vol. 49, no. 2, pp. 371–381, Feb. 2003.
 [5] R. Kötter and M. Médard, “Beyond routing: An algebraic approach to network coding,” in Proceedings of the 21st Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), vol. 1, 2002, pp. 122–130.
 [6] S. Jaggi, P. Sanders, P. A. Chou, M. Effros, S. Egner, K. Jain, and L. Tolhuizen, “Polynomial time algorithms for multicast network code construction,” IEEE Transactions on Information Theory, vol. 51, no. 6, pp. 1973–1982, June 2005.
 [7] M. Langberg, A. Sprintson, and J. Bruck, “The encoding complexity of network coding,” in International Symposium on Information Theory, Sept. 2005.
 [8] S. Jaggi, P. A. Chou, and K. Jain, “Low complexity algebraic multicast network codes,” in IEEE International Symposium on Information Theory (ISIT), Yokohama, July 2003, p. 368.
 [9] E. Erez and M. Feder, “Convolutional network codes,” in IEEE International Symposium on Information Theory, 2004.
 [10] A. Lehman and E. Lehman, “Complexity classification of network information flow problems,” in Proceedings of SODA, 2004.
 [11] J. Rissanen, “A universal data compression system,” IEEE Transactions on Information Theory, vol. 29, no. 5, pp. 656–664, Sep. 1983.
 [12] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Akadémiai Kiadó, 1981.
 [13] A. F. Dana, R. Gowaikar, R. Palanki, B. Hassibi, and M. Effros, “Capacity of wireless erasure networks,” IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 789–804, Mar. 2006.
 [14] C. Fragouli and E. Soljanin, “Decentralized network coding,” in Information Theory Workshop, San Antonio, TX, USA, 2004, pp. 310–314.
 [15] S. Jaggi, “Design and analysis of network codes,” Dissertation, California Institute of Technology, 2006.
 [16] S. Jaggi, T. Ho, and M. Effros, “Zeroerror distributed network codes,” in Information Theory and Applications Workshop (unpublished), UCSD, San Diego, CA, Jan 2007.
 [17] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
 [18] N. Alon, “Combinatorial nullstellensatz,” Combinatorics, Probability and Computing, vol. 8, pp. 7–29, 1999.
 [19] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd Edition. MIT Press and McGrawHill, 2001.
 [20] S. Jaggi, Y. Cassuto, and M. Effros, “Low complexity encoding for network codes,” in Proc. of the International Symposium on Information Theory, Seattle, WA, USA, Sep 2006.
 [21] S. Jaggi, M. Effros, T. Ho, and M. Médard, “On linear network coding,” in Proceedings of 42nd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, 2004.
 [22] J. Bruck, M. Langberg, and M. Schwartz, “Distributed broadcasting and mapping protocols in directed anonymous networks,” in Proceedings of the Twenty-Sixth Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC), 2007, pp. 382–383.
 [23] G. Cantor, “Über eine Eigenschaft des Inbegriffes aller reellen algebraischen Zahlen,” Crelle’s Journal, vol. 77, pp. 258–262, 1874.
 [24] M. Agrawal and R. Saptharishi, “Classifying polynomials and identity testing,” Current Trends in Science, 2009.
 [25] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. John Wiley and Sons, 2006.
 [26] M. Mitzenmacher and E. Upfal, Probability and Computing : Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York (NY), 2005.
A Proof of Lemma 2
We proceed by mathematical induction. In the base case, when the polynomial has a single variable, Lemma 2 is equivalent to the Schwartz-Zippel lemma in one variable.
As the inductive hypothesis, suppose that Lemma 2 is true for polynomials in $n-1$ variables.
Now consider the case when the polynomial, call it $P$, has $n$ variables $x_1, \dots, x_n$, with each $x_i$ chosen independently and uniformly from a set $S_i$, and let $d_n$ denote the degree of $x_n$ in $P$. The polynomial can be rewritten so that
$$P(x_1, \dots, x_n) = x_n^{d_n} A(x_1, \dots, x_{n-1}) + B(x_1, \dots, x_n)$$
for some polynomials $A$ and $B$ over the appropriate variables, where $A$ is not identically zero and the degree of $x_n$ in $B$ is strictly less than $d_n$.
The probability that $P$ equals zero can be bounded from above by
(3) $\Pr[P = 0] \le \Pr[A = 0] + \Pr[P = 0 \mid A \ne 0]$.
But by the inductive hypothesis,
(4) $\Pr[A = 0] \le \sum_{i=1}^{n-1} d_i/|S_i|$,
where $d_i$ denotes the degree of $x_i$ in $P$ (which is an upper bound on the degree of $x_i$ in $A$).
Also, by the Principle of Deferred Decisions [26], the probability is unaffected if the value of $x_n$ is chosen after the values of all the other variables have been fixed. In this case, if $A(x_1, \dots, x_{n-1}) \ne 0$, then $P$ is a polynomial of degree $d_n$ in $x_n$. By the Schwartz-Zippel lemma in one variable,
(5) $\Pr[P = 0 \mid A \ne 0] \le d_n/|S_n|$.
Combining (3), (4), and (5) completes the induction.
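The nonuniform bound can be checked exactly on a small example. The sketch below assumes Lemma 2 has the form: the probability that the polynomial vanishes is at most the sum over variables of (degree in that variable) divided by (size of that variable's set).

```python
from fractions import Fraction
from itertools import product

def zero_probability(poly, sets):
    """Exact probability that poly evaluates to 0 when each variable is
    drawn independently and uniformly from its corresponding set."""
    points = list(product(*sets))
    zeros = sum(1 for pt in points if poly(*pt) == 0)
    return Fraction(zeros, len(points))

# Example polynomial P(x, y) = (x - 1)(x - 2)(y - 1):
# degree 2 in x, degree 1 in y.
P = lambda x, y: (x - 1) * (x - 2) * (y - 1)
S_x, S_y = range(5), range(3)            # |S_x| = 5, |S_y| = 3

exact = zero_probability(P, [S_x, S_y])
bound = Fraction(2, 5) + Fraction(1, 3)  # sum_i d_i / |S_i|

print(exact, bound, exact <= bound)      # 3/5 11/15 True
```

Here the exact vanishing probability is 3/5 while the nonuniform bound gives 11/15, so the bound holds (and is not tight, as expected of a union-bound-style estimate).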