# Mission impossible: Computing the network coding capacity region

## Abstract

One of the main theoretical motivations for the emerging area of network coding is the achievability of the max-flow/min-cut rate for single source multicast. This rate can exceed the rate achievable with routing alone, and it is achievable with linear network codes. The multi-source problem is more complicated: computation of its capacity region is equivalent to determination of the set $\Gamma^*$ of all entropy functions, which is non-polyhedral. The aim of this paper is to demonstrate that this difficulty can arise even in single source problems, in particular for single source networks with hierarchical sink requirements and for single source networks with secrecy constraints. In both cases, we exhibit networks whose capacity regions involve $\Gamma^*$. As in the multi-source case, linear codes are insufficient.

## 1 Introduction

Network coding [1] generalizes routing by allowing intermediate nodes to perform coding operations which combine received data packets. One of the most celebrated benefits of this approach is increased throughput in multicast scenarios, which stimulated much of the early research in the area. One fundamental problem in network coding is to understand the capacity region and the classes of codes that achieve capacity. In the single session multicast scenario, the problem is well understood: the capacity region is characterized by max-flow/min-cut bounds, and linear network codes are sufficient to achieve maximal throughput [2]. Network coding therefore not only yields a throughput advantage over routing; its capacity can also be easily determined and easily achieved. This is in stark contrast to routing, where computation of the capacity region and of optimal routes is fundamentally difficult.
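The max-flow/min-cut characterization is easy to evaluate in practice. As a minimal sketch (not taken from this paper), the single-source multicast rate can be computed as the minimum, over all sinks, of the max-flow from the source to that sink. The sketch assumes Edmonds-Karp for the max-flow step and uses the well-known butterfly network with unit link capacities; the function and node names are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow; `capacity` is a dict of dicts {u: {v: c}}."""
    # Residual graph, with reverse edges initialised to zero capacity.
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Walk back from the sink to collect the path and its bottleneck.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        # Push the bottleneck flow along the augmenting path.
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck


def multicast_capacity(capacity, source, sinks):
    """Single-source multicast rate: the smallest source-to-sink max-flow."""
    return min(max_flow(capacity, source, t) for t in sinks)


# Butterfly network with unit link capacities (hypothetical node names).
BUTTERFLY = {
    "s": {"a": 1, "b": 1},
    "a": {"c": 1, "t1": 1},
    "b": {"c": 1, "t2": 1},
    "c": {"d": 1},
    "d": {"t1": 1, "t2": 1},
}

print(multicast_capacity(BUTTERFLY, "s", ["t1", "t2"]))  # 2
```

Each sink individually has a min-cut of 2, so the multicast rate is 2; achieving it simultaneously at both sinks requires the coding node `c` to combine its two inputs.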

Significant practical and theoretical complications arise in more general multicast scenarios, involving more than one session. An expression for the capacity region is known [4]; however, it is given by the intersection of a set of hyperplanes (specified by the network topology and connection requirement) and the set of entropy functions $\Gamma^*$. Unfortunately, neither this capacity region nor the inner and outer bounds of [5] can be computed in practice, due to the lack of an explicit characterization of the set of entropy functions for more than three random variables. This difficulty is not simply a consequence of the particular formulation of the capacity region given in [4]. It was recently shown that determining the capacity region for the multi-source problem is in fact entirely equivalent to determining $\bar{\Gamma}^*$, the set of almost entropic functions [8]. Furthermore, the non-polyhedral nature of $\bar{\Gamma}^*$, revealed in [9], implies a non-polyhedral capacity region (in contrast to the max-flow result for single sources). Worse still, linear network codes are known to be insufficient for the multi-source problem [3].

In this paper, we show that non-polyhedral capacity regions can occur even in *single source* scenarios. We demonstrate this phenomenon for single source networks with hierarchical sink constraints, and for single source networks with security constraints. Our approach is in the spirit of our recent work [8], which revealed a deep duality between network codes and entropy functions. Direct consequences of our constructions are non-polyhedral capacity regions, the insufficiency of linear network codes, and the importance of non-Shannon information inequalities.

Section 2 provides the basic setup for secure network codes, and formally defines achievability and admissibility for networks with wiretapping adversaries. Section 3 focuses on the single source incremental multicast scenario, in which the sinks have hierarchical requirements. Given a function $g$, we construct an incremental multicast network that is solvable if and only if $g$ is entropic. In Section 4 we construct a special single source secure multicast problem which is equivalent to an insecure multi-source multicast problem. Invoking the duality results from [8], these constructions relate the solvability of both single source incremental multicast and single source secure multicast to multi-source multicast problems.

## 2 Background

The network topology will be modeled by a directed acyclic graph $G=(\mathcal{V},\mathcal{E})$. Vertices $v\in\mathcal{V}$ correspond to communication nodes and directed edges $e\in\mathcal{E}$ are error-free point-to-point communication links. The *connection requirement* $M=(\mathcal{S},O,D)$ is specified by three components. The set $\mathcal{S}$ indexes the independent multicast sessions, each of which is a collection of packets to be multicast to a prescribed set of destinations. The session-source location mapping $O:\mathcal{S}\to\mathcal{V}$ specifies the originating node $O(s)$ for session $s$. The receiver-location mapping $D:\mathcal{S}\to 2^{\mathcal{V}}$ indicates the set of nodes $D(s)\subseteq\mathcal{V}$ which require the data of session $s$.

A *network code* is identified by a set of discrete random variables $\{Y_s, s\in\mathcal{S};\ U_e, e\in\mathcal{E}\}$, defined on finite sample spaces, where for concise notation, set-valued subscripts denote a set of objects indexed by the set, e.g. $Y_{\mathcal{S}}=\{Y_s, s\in\mathcal{S}\}$. The source random variables $Y_s$ are mutually independent and are uniformly distributed on sample spaces whose size will be denoted $|\mathcal{Y}_s|$. The variables $U_e$ are the messages transmitted over link $e$.

Since the network is acyclic, the source variables and edge messages can be ancestrally ordered according to the network topology. Causal coding requires that each edge message be conditionally independent of its non-incident ancestral messages, given its incident source and message variables.
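An ancestral ordering can be read off any topological order of the acyclic graph. The following is an illustrative sketch only (the function names and the small graph are hypothetical): it applies Kahn's algorithm to the nodes and then sorts the edges by the position of their tail node, so that every edge message appears after all messages entering the node that transmits it.

```python
from collections import deque


def topological_order(nodes, edges):
    """Kahn's algorithm: return nodes so that every edge (u, v) has u before v."""
    indegree = {n: 0 for n in nodes}
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in out[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph has a directed cycle")
    return order


def ancestral_edge_order(nodes, edges):
    """Order edges by their tail's topological rank (a stable sort)."""
    rank = {n: i for i, n in enumerate(topological_order(nodes, edges))}
    return sorted(edges, key=lambda e: rank[e[0]])


print(ancestral_edge_order(["s", "a", "b", "t"],
                           [("a", "t"), ("s", "a"), ("s", "b"), ("b", "t")]))
```

Any edge entering node `u` has a tail ranked strictly before `u`, so sorting by tail rank is sufficient; the cycle check reflects the acyclicity assumption made in the text.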

Probabilistic network codes can be implemented by using independent random variables (internal randomness) at each node, such that all outgoing messages from a node are deterministic functions of its incoming source and link messages and of the independent randomness generated at the node. It is easy to prove that all probabilistic network codes can be implemented in this way. Accordingly, we shall specify a probabilistic network code by the set of its source variables, edge messages and internal random variables.

The implication of the lemma is as follows. At the sinks (or any intermediate node) of the network, if reconstruction of the source messages is possible, then it can also be achieved in the absence of “internal randomness”. In fact, in the absence of security constraints, it is known that deterministic network codes are sufficient [6]. This is not always the case for the wiretapping scenarios considered in Section 4.

In addition to legitimate sinks, there are adversaries, each of which can eavesdrop on every message transmitted along a given collection of links. Each adversary attempts to reconstruct a particular set of source messages; the wiretapped links and targeted sources of all adversaries together constitute the wiretapping pattern.

For a given network code designed with respect to a connection requirement, define the error probability $P_e$ as the probability that at least one receiver fails to correctly reconstruct one or more of its requested source messages. A *zero-error network code* is one for which $P_e=0$, and hence the source messages can be perfectly reconstructed at the desired sinks. The goal of secure communications is to transmit information such that any eavesdropper listening to the traffic on all the links it wiretaps remains “ignorant” of the data transmitted by the sources it targets. A *perfectly secure* network code is one for which the information leakage, i.e. the mutual information between an eavesdropper's targeted source messages and its observed link messages, is zero for every eavesdropper.
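To make the leakage criterion concrete, the sketch below (an illustration, not the paper's construction) computes the mutual information $I(Y;U)$ in bits from a joint distribution. It assumes a one-bit source $Y$ and a one-bit internal random key $K$, both uniform and independent, and a single wiretapped message $U$ produced by a hypothetical encoder; the names `leakage` and `encode` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
from math import log2


def mutual_information(joint):
    """I(A;B) in bits from a dict {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def leakage(encode):
    """I(Y; U) for U = encode(Y, K), with Y and K independent uniform bits.

    K plays the role of internal randomness (a secret key) at the encoder.
    """
    joint = defaultdict(float)
    for y, k in product((0, 1), repeat=2):
        joint[(y, encode(y, k))] += 0.25
    return mutual_information(joint)


print(leakage(lambda y, k: y ^ k))  # one-time pad: 0.0
print(leakage(lambda y, k: y))      # uncoded forwarding: 1.0
```

A legitimate sink that also receives $K$ recovers $Y = U \oplus K$, while the eavesdropper learns nothing; without the random key, no deterministic encoding of a single bit onto this link can be both decodable and secure, which illustrates why internal randomness matters once secrecy constraints are imposed.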

The preceding definitions consider zero-error network codes and perfect security. Relaxing these requirements prompts the following definition.

In the absence of any security constraints (that is, with an empty wiretapping pattern), these definitions reduce to the usual ones, and the multi-source, multi-sink capacity region is given by [4]. Bounds for the multi-source multi-sink scenario with wiretappers were given in [10].

## 3 Incremental Multicast

In this section, we study the special case of *incremental multicast*, meaning that the session indexes are totally ordered such that a receiver requesting a particular session also requests all sessions with lower index. We consider the simplest incremental multicast scenario, with only two source messages and no secrecy constraints (permitting deterministic codes). We will show that determining the capacity region, even in such a simple scenario, can be no simpler than solving the general multicast problem.
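The incremental condition on receiver demands can be stated as a simple check. In this hypothetical sketch, sessions are assumed to be labelled $1, 2, \ldots$ in their total order and each receiver's demand is a set of session labels; the requirement is incremental exactly when every nonempty demand is a prefix $\{1, \ldots, m\}$.

```python
def is_incremental(requests):
    """True if every receiver demands sessions 1..m for some m >= 1.

    `requests` maps a (hypothetical) receiver name to the set of session
    labels it demands; labels are assumed to be 1, 2, ... in order.
    """
    return all(req == set(range(1, max(req) + 1))
               for req in requests.values() if req)


# Two sessions: one receiver wants only session 1, another wants both.
print(is_incremental({"lower": {1}, "upper": {1, 2}}))  # True
print(is_incremental({"odd": {2}}))                      # False
```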

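Our approach is inspired by [8]. Let $g\in\mathbb{R}^{2^n-1}$ have coordinates indexed by the nonempty subsets of a ground set $\mathcal{N}$ with $n$ elements. Points $g$ can be regarded as functions $g:2^{\mathcal{N}}\setminus\{\emptyset\}\to\mathbb{R}$, with $g(\alpha)$ denoting the coordinate indexed by $\alpha\subseteq\mathcal{N}$. Given such a $g$, we will construct a special network, an incremental connection requirement and a rate-capacity tuple that is admissible if and only if $g$ is entropic.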
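A vector indexed by the nonempty subsets of an $n$-element ground set is entropic when it equals $h(\alpha)=H(X_\alpha)$ for some random vector $X=(X_1,\ldots,X_n)$. As an illustrative sketch (names are hypothetical, and subsets are 0-indexed tuples for convenience), the entropy function of a finite random vector can be computed directly from its joint probability mass function; the example uses two independent uniform bits and their XOR.

```python
from collections import defaultdict
from itertools import combinations
from math import log2


def entropy_function(pmf, n):
    """Map each nonempty subset alpha of {0, ..., n-1} to H(X_alpha) in bits.

    `pmf` is a dict from n-tuples (x_0, ..., x_{n-1}) to probabilities.
    """
    h = {}
    for r in range(1, n + 1):
        for alpha in combinations(range(n), r):
            # Marginalize onto the coordinates in alpha.
            marginal = defaultdict(float)
            for x, p in pmf.items():
                marginal[tuple(x[i] for i in alpha)] += p
            h[alpha] = -sum(p * log2(p) for p in marginal.values() if p > 0)
    return h


# X_0, X_1 independent uniform bits, X_2 = X_0 XOR X_1.
XOR_PMF = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
h = entropy_function(XOR_PMF, 3)
print(h[(0,)], h[(0, 1)], h[(0, 1, 2)])  # 1.0 2.0 2.0
```

The resulting $2^3-1=7$ numbers form one entropic point; deciding whether an arbitrary given vector arises this way is the hard problem the construction encodes.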

The network topology, connection requirement and link capacities are defined in Figure ?, which, for convenience, is divided into several subnetworks. The single source node is an open circle, labelled with the two available sessions (this node is repeated for convenience in Figures ?, ? and ?). The destinations are double circles, labelled with their requirements. Intermediate nodes are solid circles. The source and sink labels define the session-source and receiver-location mappings. Each capacitated edge is labelled with a pair of symbols denoting the edge capacity and the edge message (and corresponding random variable). Unlabelled edges are assumed to be uncapacitated, or to have a finite but sufficiently large capacity to losslessly forward all received messages.

The first part of the network, shown in Figure ?, contains the source, at which two independent sessions (i.e., two source messages) are available, each with its own desired source rate. Certain edge messages are of particular interest. Rather than naming all edge variables generically, we give these particular edge variables dedicated labels. The remaining edge variables are labelled with generic symbols indexed by an integer.

In Figure ?, the source node generates from each of the two source messages a set of network coded messages, which are duplicated as required and forwarded to the rest of the network. The remainder of the network is divided into subnetworks of two types, shown in Figures ? and ?.

With reference to Figure ?, there are type 1 subnetworks, one for each nonempty subset of the ground set. Each of these subnetworks introduces a capacitated edge between the source and a sink with its own session requirement. There is also an intermediate node with further incident edges (from Figure ?) carrying selected coded messages; this intermediate node in turn has a capacitated edge to the sink.

Figure ? shows the structure of type 2 subnetworks, each of which is indexed by a set together with an element. Each type 2 subnetwork connects the source to the upper receiver. In addition, there are other incident edges carrying selected coded messages. For notational simplicity, a shorthand is used in the figure.

So far, we have described a network and a connection requirement, and have assigned rates to sources and capacities to links. Clearly, the connection requirement depends only on the size of the ground set, and not in any other way on the given function. Similarly, the topology of the network depends only on the size of the ground set. The choice of the function affects only the source rates and edge capacities, which are collected into the rate-capacity tuple. Also, we can assume without loss of generality that the rate-capacity tuple is a linear function of the given function.

Suppose that the rate-capacity tuple is admissible. By Definition ?, admissibility on the constructed network requires the existence of a zero-error network code with the two source messages and a distinguished subset of coded messages. Given this hypothesis, we will show that the given function is the entropy function of these distinguished coded messages, and that they are quasi-uniform.

First focus on Figure ?. Applying min-cut bounds, it is straightforward to prove

Similarly, applying min-cut bounds to type 1 subnetworks of Figure ?, .

We now focus on type 2 subnetworks of Figure ? and aim to prove that for any . In order for the upper receiver to reconstruct and ,

or equivalently, . In addition,

As a result, which further implies that is a function of . Thus can be recovered at . On the other hand, from the lower part of the subnetwork,

where follows from the fact that can be reconstructed at the lower receiver. This implies that can be reconstructed at . From [8], that can decode and that can decode further implies . By mathematical induction (similar to the proof of [8]), the only solution that satisfies all of the conditions above is when the entropy function of is equal to .

Finally, from type 1 subnetworks, the support of is at most . Hence, is indeed quasi-uniform (this also implies that the are quasi-uniform, via and the independence of the ).
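A random vector is quasi-uniform when, for every nonempty subset $\alpha$, the marginal distribution of $X_\alpha$ is uniform on its support, so that $H(X_\alpha)$ equals the logarithm of the support size. The sketch below (illustrative only; the function name and examples are hypothetical, with 0-indexed coordinates) checks this property directly from a joint probability mass function.

```python
from collections import defaultdict
from itertools import combinations
from math import isclose


def is_quasi_uniform(pmf, n):
    """True if, for every nonempty alpha, X_alpha is uniform on its support."""
    for r in range(1, n + 1):
        for alpha in combinations(range(n), r):
            # Marginal of X_alpha restricted to its support.
            marginal = defaultdict(float)
            for x, p in pmf.items():
                if p > 0:
                    marginal[tuple(x[i] for i in alpha)] += p
            target = 1.0 / len(marginal)
            if not all(isclose(p, target) for p in marginal.values()):
                return False
    return True


XOR_PMF = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
SKEWED = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
print(is_quasi_uniform(XOR_PMF, 3))  # True
print(is_quasi_uniform(SKEWED, 2))   # False
```

In the skewed example the marginal of the first coordinate puts probability 0.75 on one value, so the vector fails the test even though its support is small.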

From Theorems ? and ?, we can follow the approach in [8] and easily extend the result to almost entropic functions.

## 4 Secure Multicast

Linear network codes (for single source multicast) that are resilient to eavesdropping are considered in [11], where sufficient conditions for the existence of such codes were also derived. This work was further generalized to the multi-source case in [12]. A similar result was also obtained in [13], which gives necessary and sufficient conditions under which transmitted data are safe from being revealed to eavesdroppers. All of the above-cited works assume that the wiretapper aims to reconstruct all sources. Similar results have been obtained for the case where only a subset of the sources is to be reconstructed [14]. Inner and outer bounds to the secure capacity region were given in [10].

We will now show that even for a simple single-session secure multicast problem, determination of the capacity region can be extremely hard. In particular, the problem is at least as hard as any multi-source multi-session multicast problem.

Figure 1 shows the construction of the network. There is a single source message with a prescribed rate, and the link capacities are parametrized as indicated in the figure. There is a single eavesdropper, who observes only one designated message variable. Thus Figure 1 also specifies the connection requirement and the wiretapping pattern.

From the capacity constraint on , we have

Together with the decodability requirement, , we have

Applying a min-cut bound on the set of edge variables , we can also prove that and . On the other hand, the secrecy constraint requires and hence

as is a function of .

Now, we will show that . First,

where (a) follows from the fact that is a function of and (b) follows from the conditional independence implied by the underlying network topology. Using the same argument, we can also prove that .

Since is a function of and is thus independent of internal randomness, Lemma ? implies that . Together with , we have

Since , it implies that or equivalently that is a function of and . Similarly, using the same argument, we can also prove that .

Our final aim is to show that and . Clearly, both and are bounded above by due to the edge capacity constraint. We obtain a lower bound on the entropy of as follows.

where (a) follows from (Equation 1). Hence, . And similarly, we can also prove that .

Independence of and implies

where (a) follows from . Consequently, .

Similarly, . Finally,

where (a) follows from independence of and . Hence, which further implies .

Under a regularity condition (that and are integers), the converse of Proposition ? also holds.

Essentially, Propositions ? and ? suggest that the admissibility of the single source secure multicast problem depends on the communication of a secret key from one designated node to another. By joining several copies of the network together (see Figure 2), we can easily generalize the construction so that admissibility implies that multiple secret keys must be transmitted across the network. This turns the single source secure multicast problem into a multi-source multicast problem.

## 5 Implications and conclusion

Theorems ? and ? show that even for a single-source network multicast problem with two independent sets of messages or for a single source secure multicast problem, the determination of the set of achievable rate-capacity tuples can be extremely hard. Following the same arguments as used in [8], we can also prove the following results for a single-source two-session multicast problem or for a single-source single-session multicast problem with secrecy constraints:

- Capacity regions are not polyhedral in general.^{1}
- LP bounds are not tight in general.
- Linear codes are not sufficient to achieve capacity.

In other words, finding capacity regions for (secure) multicast problems seems to be a mission impossible. Not only are the existing bounding techniques loose, but the non-polyhedral nature of the capacity region also means that LP bounds cannot fully characterize the region, even with the addition of more and more newly discovered information inequalities. Any finite set of such new inequalities can only further tighten the bound, but can never yield the exact capacity region.
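LP bounds replace the set of (almost) entropic functions by the polyhedral cone cut out by finitely many linear information inequalities. As a minimal sketch (illustrative, with hypothetical names and 0-indexed subsets), the check below tests the elemental Shannon inequalities, $H(X_i \mid X_{\mathcal{N}\setminus\{i\}}) \ge 0$ and $I(X_i; X_j \mid X_K) \ge 0$, on a vector given as a map from subsets to values. Passing this test does not certify that a vector is entropic once $n \ge 4$, which is exactly the gap that non-Shannon inequalities expose.

```python
from itertools import combinations


def satisfies_shannon(h, n, tol=1e-9):
    """Check the elemental Shannon inequalities for h: frozenset -> float.

    h must be defined on every nonempty subset of {0, ..., n-1}; h(empty) = 0.
    """
    ground = frozenset(range(n))

    def H(s):
        return h[frozenset(s)] if s else 0.0

    # Monotonicity: H(X_i | X_{N - i}) >= 0 for each i.
    for i in range(n):
        if H(ground) - H(ground - {i}) < -tol:
            return False
    # Submodularity: I(X_i; X_j | X_K) >= 0 for i < j and K disjoint from {i, j}.
    for i, j in combinations(range(n), 2):
        rest = sorted(ground - {i, j})
        for r in range(len(rest) + 1):
            for K in combinations(rest, r):
                K = frozenset(K)
                if H(K | {i}) + H(K | {j}) - H(K | {i, j}) - H(K) < -tol:
                    return False
    return True


# Entropy function of (X_0, X_1, X_0 XOR X_1) for uniform independent bits.
XOR_H = {frozenset(s): v for s, v in [((0,), 1.0), ((1,), 1.0), ((2,), 1.0),
                                      ((0, 1), 2.0), ((0, 2), 2.0),
                                      ((1, 2), 2.0), ((0, 1, 2), 2.0)]}
# Violates monotonicity: H(X_0, X_1) < H(X_0).
BAD_H = {frozenset({0}): 2.0, frozenset({1}): 0.0, frozenset({0, 1}): 1.0}
print(satisfies_shannon(XOR_H, 3))  # True
print(satisfies_shannon(BAD_H, 2))  # False
```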

Despite the hardness of the problem, many questions remain to be answered. It is unclear what makes the capacity region so difficult to find. In the case of single session multicast, or the case where there are only two sinks, capacity regions have explicit polyhedral characterizations provided by min-cut bounds. On the other hand, when there are many sinks, the capacity region can be extremely complicated to characterize, even if there are only two independent sessions. It will be of great importance to classify the set of networks and connection requirements that lead to polyhedral capacity regions characterized by min-cut bounds or LP bounds.

### Footnotes

- That the single-source single-session secure multicast problem has a non-polyhedral capacity region is somewhat surprising, since the region for the same problem without the secrecy constraint is completely determined by the min-cut bound.

### References

- [1] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, “Network information flow,” *IEEE Trans. Inform. Theory*, vol. 46, no. 4, pp. 1204–1216, July 2000.
- [2] S.-Y. R. Li, R. Yeung, and N. Cai, “Linear network coding,” *IEEE Trans. Inform. Theory*, vol. 49, no. 2, pp. 371–381, Feb. 2003.
- [3] R. Dougherty, C. Freiling, and K. Zeger, “Insufficiency of linear coding in network information flow,” *IEEE Trans. Inform. Theory*, vol. 51, no. 8, pp. 2745–2759, Aug. 2005.
- [4] X. Yan, R. Yeung, and Z. Zhang, “The capacity region for multi-source multi-sink network coding,” in *IEEE Int. Symp. Inform. Theory*, Nice, France, 2007, pp. 116–120.
- [5] L. Song, R. W. Yeung, and N. Cai, “Zero-error network coding for acyclic networks,” *IEEE Trans. Inform. Theory*, vol. 49, no. 12, pp. 3129–3139, Dec. 2003.
- [6] R. W. Yeung, *A First Course in Information Theory*, ser. Information Technology: Transmission, Processing and Storage. New York: Kluwer Academic/Plenum Publishers, 2002.
- [7] R. W. Yeung, S.-Y. R. Li, N. Cai, and Z. Zhang, *Network Coding Theory*, ser. Foundations and Trends in Communications and Information Theory. Now Publishers, 2006.
- [8] T. Chan and A. Grant, “Dualities between entropy functions and network codes,” submitted to IEEE Trans. Inform. Theory. [Online]. Available: http://arxiv.org/abs/0708.4328v1
- [9] F. Matus, “Infinitely many information inequalities,” in *IEEE Int. Symp. Inform. Theory*, Nice, France, 2007, pp. 24–29.
- [10] T. H. Chan and A. Grant, “Capacity bounds for secure network coding,” 2008, submitted to Australian Communications Theory Workshop.
- [11] N. Cai and R. Yeung, “Secure network coding,” in *IEEE Int. Symp. Inform. Theory*, 2002.
- [12] N. Cai and R. W. Yeung, “A security condition for multi-source linear network coding,” in *IEEE Int. Symp. Inform. Theory*, 2007.
- [13] J. Feldman, T. Malkin, C. Stein, and R. Servedio, “On the capacity of secure network coding,” in *42nd Annual Allerton Conference on Communication, Control, and Computing*, 2004.
- [14] K. Bhattad and K. Narayanan, “Weakly secure network coding,” in *Workshop on Network Coding, Theory, and Applications*, 2005.