
# A Unified Approach for Network Information Theory

Si-Hyeon Lee and Sae-Young Chung. S.-H. Lee is with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada (e-mail: sihyeon.lee@utoronto.ca). This work was done when she was at KAIST. S.-Y. Chung is with the Department of Electrical Engineering, KAIST, Daejeon, South Korea (e-mail: sychung@ee.kaist.ac.kr). The material in this paper will be presented in part at IEEE ISIT 2015 [1].
###### Abstract

In this paper, we take a unified approach for network information theory and prove a coding theorem, which can recover most of the achievability results in network information theory that are based on random coding. The final single-letter expression has a very simple form, which was made possible by many novel elements such as a unified framework that represents various network problems in a simple and unified way, a unified coding strategy that consists of a few basic ingredients but can emulate many known coding techniques if needed, and new proof techniques beyond the use of standard covering and packing lemmas. For example, in our framework, sources, channels, states and side information are treated in a unified way and various constraints such as cost and distortion constraints are unified as a single joint-typicality constraint.

Our theorem can be useful in proving many new achievability results easily and in some cases gives simpler rate expressions than those obtained using conventional approaches. Furthermore, our unified coding can strictly outperform existing schemes. For example, we obtain a generalized decode-compress-amplify-and-forward bound as a simple corollary of our main theorem and show it strictly outperforms previously known coding schemes. Using our unified framework, we formally define and characterize three types of network duality based on channel input-output reversal and network flow reversal combined with packing-covering duality.

## I Introduction

In network information theory, we study the fundamental limits of information flow and processing in a network and develop coding strategies that can approach the limits closely. Instead of studying a fully general network, however, we often study simple canonical models such as the multiple-access channel [2], relay channel [3], and distributed source coding [4] because they are easier to study and more importantly because we can get useful insights from studying them. Once such insights are obtained, one can try to develop a more general theory that is applicable to general networks.

However, such a task is challenging and only partial results have been obtained so far [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], in which the network model and/or the applied coding technique is limited. For example, network coding [5] and compress-and-forward (CF) [3] were unified as noisy network coding in [8, 9], which, however, does not include decode-and-forward (DF) [3]. DF and partial DF [3] were generalized for single-source multiple-relay single-destination networks [6] and for multicast and broadcast networks [13, 14], respectively. In [10], noisy network coding was combined with network DF [6], but the combined scheme does not allow a relay to perform both partial DF and CF simultaneously. For joint source-channel coding problems, a hybrid analog/digital coding strategy [12] was proposed that recovers and generalizes many previously known results. Such a hybrid coding scheme was applied to some relay networks and was shown to unify both amplify-and-forward (AF) [16] and CF [3]. In [15], a novel framework for proving achievability was proposed based on output statistics of random binning and source–channel duality. One important feature of this framework is that the addition of secrecy is free, i.e., once an achievability result is obtained for a network model using this framework, an achievability result with an additional secrecy constraint follows immediately. We note that [12] and [15] took a bottom-up approach in the sense that achievability results are obtained separately for each of various network models.

In this paper, we take a top-down approach and prove a unified achievability theorem for a general network scenario with arbitrarily many nodes. Our setup is general enough that any combination of source coding, channel coding, joint source-channel coding, and coding for computing problems can be treated. Our result recovers most of the existing achievability results in network information theory as long as they are based on random coding. Some examples of known results recovered by our theorem are listed as follows:

• Channel coding: Gelfand-Pinsker coding [17], Marton’s inner bound for the broadcast channel [18], Han-Kobayashi inner bound for the interference channel [19], [20], coding for channels with action-dependent states [21], interference decoding for a 3-user interference channel [22], [23], Cover-Leung inner bound for the multiple access channel with feedback [24], a combination of partial DF and CF for the relay channel [3], network DF [6], noisy network coding [8, 9], short message noisy network coding with a DF option [10], offset encoding for the multiple access relay channel [25].

• Source coding: Slepian-Wolf coding [4], Wyner-Ziv coding [26], Berger-Tung inner bound for distributed lossy compression [27], [28], Zhang-Berger inner bound for multiple description coding [29].

• Joint source-channel coding: hybrid coding [12] and all previous results recovered by hybrid coding including sending arbitrarily correlated sources over multiple access channels [30], broadcast channels [31], and interference channels [32, 33].

• Coding for computing: coding for computing [34], cascade coding for computing [35].

In Table I, we compare many approaches that attempted to unify various coding strategies. The check mark ‘✓’ means that the corresponding unification approach subsumes both the model and the achievability bound of the previous result.

Our theorem can be useful in proving new achievability results easily and in some cases gives simpler rate expressions than those obtained using conventional approaches. Furthermore, our unified coding can strictly outperform existing schemes. To illustrate this, we obtain a generalized decode-compress-amplify-and-forward bound for acyclic networks as a simple corollary of our main theorem and show that it strictly outperforms previously known coding schemes. As another special case of our main theorem, we derive a generalized decode-compress-and-forward bound for a discrete memoryless network (DMN) in [36], which recovers both the noisy network coding [9] and distributed decode-and-forward [13] bounds. This is the first time the partial-decode-compress-and-forward bound (Theorem 7 of Cover and El Gamal [3]) has been generalized for DMNs such that each relay performs both partial DF and CF simultaneously.

Our unified coding theorem enables us to state various types of duality arising in network information theory. Specifically, we formally define and characterize three types of network duality based on channel input-output reversal and network flow reversal combined with packing-covering duality. Our duality results include as special cases many known duality relationships in network information theory, e.g., the duality between coding for the multiple-access channel [2] and distributed sources [27], [28] (type-I duality), the duality between Gelfand-Pinsker coding [17] and Wyner-Ziv coding [26] (type-II duality), and the duality between coding for the multiple-access channel [2] and broadcast channel [18] (type-III duality).

Our unified achievability result is enabled by many novel elements such as a unified framework that represents various network problems in a simple and unified way, a unified coding strategy that consists of a few basic ingredients but can emulate known coding techniques if needed, and new proof techniques beyond the use of standard covering and packing lemmas. In our framework, sources, channels, states and side information are treated in a unified way and various constraints such as cost and distortion constraints are combined as a joint-typicality constraint, which is specified by a single joint distribution. Furthermore, we mainly consider acyclic discrete memoryless networks (ADMN) in this paper, where information flows in an acyclic manner. However, we also show that our coding theorem can be applied to general DMNs by unfolding the network. Graph unfolding was first used in [5] for network coding.

Our coding scheme has four main ingredients, i.e., superposition coding, simultaneous nonunique decoding, simultaneous compression, and symbol-by-symbol mapping. We note that our coding scheme does not explicitly include binning and multicoding, but is still general enough to emulate them if needed. Although none of these coding ingredients is new by itself, they are tweaked and combined in a special way to enable the unification of many previous approaches. In our coding scheme, covering codebooks are used to compress information that each node observes and decodes. These covering codebooks are generated to permit superposition coding [37]. Each node operates according to the following three steps. The first step is simultaneous nonunique decoding [20, 38, 39], where a node uniquely decodes some covering codewords of other nodes together with some other covering codewords that do not need to be decoded uniquely. The next step is simultaneous compression, where the node simultaneously finds covering codewords that carry information about a received channel output sequence and decoded codewords. Since we allow general superposition relationships among covering codebooks, a more general analysis beyond the multivariate covering lemma [40, 41] is needed. The last step is a symbol-by-symbol mapping from a received channel output sequence and decoded and covered codewords to a channel input sequence. The technique of using a symbol-by-symbol mapping was introduced in [42], which is referred to as the Shannon strategy. Our symbol-by-symbol mapping from all three, i.e., the channel output sequence and decoded and covered codewords, was first used in [43] for a three-node noncausal relay channel. We note that such a use of symbol-by-symbol mapping results in correlation between a channel input sequence and nonchosen covering codewords, and thus the standard packing lemma [41] cannot be applied for the error analysis.
Such correlation was problematic in many previous works and was resolved for some simple networks in [44, 12, 43]. Our proof technique completely resolves this correlation issue in a fully general network setup.

This paper is organized as follows. In Section II, we present our unified framework. In Section III, we propose a unified coding scheme and present the main theorem of this paper. We also show various examples to illustrate how to utilize our results. In Section IV, we characterize three types of network duality. To demonstrate the usefulness of our unified coding theorem, in Section V, we derive a generalized decode-compress-amplify-and-forward bound as a simple corollary of our theorem and show that it strictly outperforms previously known coding schemes. In Section VI, we present a unified coding theorem for the Gaussian case. We conclude this paper in Section VII.

The following notation is used throughout the paper.

### I-A Notation

For two integers i and j, [i:j] denotes the set {i, i+1, …, j}. For a set S of real numbers, S(i) denotes the i-th smallest element in S. For constants x_1, …, x_N and a set A ⊆ [1:N], x_A denotes the vector (x_j : j ∈ A) and x_i^j denotes (x_i, x_{i+1}, …, x_j), where the subscript is omitted when i = 1, i.e., x^j = (x_1, …, x_j). For random variables X_1, …, X_N, X_A and X_i^j are defined similarly. For sets 𝒳_1, …, 𝒳_N, 𝒳_A denotes ∏_{j ∈ A} 𝒳_j and 𝒳_i^j denotes 𝒳_i × ⋯ × 𝒳_j, where the subscript is omitted when i = 1. Consider two real vectors a and b of length n. We say that a is smaller than b and write a ≺ b if there exists i such that a_j ≤ b_j for all j ∈ [1:n] and a_i < b_i. Furthermore, we say that a is component-wise smaller than b and write a ≤ b if a_j ≤ b_j for all j ∈ [1:n]. 1 denotes an all-one vector and I denotes an identity matrix. When X is a Gaussian random vector with mean μ and covariance matrix Σ, we write X ∼ N(μ, Σ). 𝟙_A is the indicator function of an event A, i.e., it is 1 if A occurs and 0 otherwise. δ(ϵ) denotes a function of ϵ that tends to zero as ϵ tends to zero.

We follow the notion of typicality in [45], [41]. Let π_{x^n}(x) denote the number of occurrences of x in the sequence x^n. Then, x^n is said to be ϵ-typical (or just typical) for X ∼ p(x) if for every x ∈ 𝒳,

 |π_{x^n}(x)/n − p(x)| ≤ ϵ p(x).

The set of all ϵ-typical x^n is denoted as T_ϵ^{(n)}(X), which is shortly denoted as T_ϵ^{(n)}. A jointly typical set (or just a typical set) such as T_ϵ^{(n)}(X, Y) for multiple variables, which will also be denoted as T_ϵ^{(n)}, is naturally defined from the definition of T_ϵ^{(n)}(X).
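As a quick sanity check of this definition, the following minimal sketch tests whether a sequence is ϵ-typical for a given pmf; the fair-coin pmf and the test sequences are illustrative choices, not from the paper.

```python
from collections import Counter

def is_typical(xn, p, eps):
    # eps-typicality as defined above: |pi_{x^n}(x)/n - p(x)| <= eps * p(x)
    # for every symbol x; assumes xn takes values in the support of p.
    n = len(xn)
    counts = Counter(xn)
    return all(abs(counts.get(x, 0) / n - px) <= eps * px for x, px in p.items())

# Illustrative fair-coin pmf: the first sequence is balanced, the second is not.
p = {0: 0.5, 1: 0.5}
print(is_typical([0, 1, 0, 1, 0, 1, 0, 1], p, 0.1))  # True
print(is_typical([0, 0, 0, 0, 0, 0, 0, 1], p, 0.1))  # False
```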

## II Unified Framework

In this section, we build a unified framework for proving the achievability of many network information theory problems including channel coding, source coding, joint source–channel coding, and coding for computing. Let us first construct a unified framework for point-to-point scenarios and then generalize it to general network scenarios.

### II-A Point-to-point scenarios

Consider the standard channel coding and source coding problems [46] illustrated in Fig. 1. These two problems can be stated with the following elements: information to be communicated, node interaction and node processing functions, and the definition of achievability. Let us investigate differences between these two coding problems for each element and discuss how we can unify them into a single framework. In the following, n denotes the number of channel uses for channel coding and the number of source symbols for source coding, and R denotes the rate in each problem.

• Information to be communicated: In channel coding, a message M, uniformly distributed over [1:2^{nR}], is communicated from node 1 to node 2. In source coding, a discrete memoryless source (DMS) is given to node 1 and is reconstructed at node 2 (up to a prescribed distortion level in the case of lossy source coding). We can observe that a message can be regarded as a DMS that is uniformly distributed over its alphabet. Hence, in both channel coding and source coding problems, we can say that a DMS is given to node 1 and is reconstructed at node 2.

• Node interaction and node processing functions: In channel coding, node 1 communicates with node 2 through a discrete memoryless channel (DMC) p(y|x). Node 1 maps the message to a channel input sequence and node 2 receives a channel output sequence and maps it to a message estimate. In source coding, node 1 maps the source sequence to an index M ∈ [1:2^{nR}] and node 2 receives M exactly and maps it to a reconstruction sequence. The noiseless communication of an index in source coding can be regarded as a DMC that delivers the index without noise. Hence, in both channel coding and source coding problems, we can say that node 1 communicates with node 2 through a DMC, the processing function at node 1 is a mapping from the information node 1 receives to what it sends, and the processing function at node 2 is a mapping from what node 2 receives to what it produces. By denoting the information coming into node k by Y_k and the information going out of node k by X_k, we can further unify the notation for sequences and the node processing functions, i.e., a sequence received by node k is denoted by Y_k^n, the resultant sequence from processing at node k is denoted by X_k^n, and the node processing function at node k is a mapping from Y_k^n to X_k^n.
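The observation that a noiseless index link behaves like a DMC can be checked numerically: for the identity channel on an alphabet of M indices, the mutual information under a uniform input equals log2 M bits, the entropy of the index. A small sketch, where the helper `mutual_information` and the alphabet size M = 4 are illustrative choices, not from the paper:

```python
import math

def mutual_information(p_x, channel):
    # I(X;Y) in bits for an input pmf p_x (dict) and a channel p(y|x)
    # given as a dict of dicts: channel[x][y].
    p_xy = {(x, y): p_x[x] * channel[x][y]
            for x in p_x for y in channel[x] if p_x[x] * channel[x][y] > 0}
    p_y = {}
    for (x, y), pr in p_xy.items():
        p_y[y] = p_y.get(y, 0.0) + pr
    return sum(pr * math.log2(pr / (p_x[x] * p_y[y])) for (x, y), pr in p_xy.items())

# Noiseless transport of an index m in {0,...,3}: p(y|x) = 1 if y = x.
M = 4
noiseless = {m: {m: 1.0} for m in range(M)}
uniform = {m: 1.0 / M for m in range(M)}
print(mutual_information(uniform, noiseless))  # 2.0 bits = log2(4)
```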

• Achievability: In the channel coding and lossless source coding problems, a rate R is said to be achievable if there exists a sequence of node processing functions such that the probability of error tends to zero as n tends to infinity, where the probability of error is the probability that the reconstruction at node 2 differs from the information given to node 1. In the lossy source coding problem, a rate–distortion pair (R, D) is said to be achievable if there exists a sequence of node processing functions such that the expected distortion between the source and reconstruction sequences is asymptotically at most D, where the distortion is measured by a distortion measure d between two arguments.

Now, let us introduce a new definition of achievability from which we can show the achievability of both channel coding and source coding problems in a unified way. We say a joint distribution p(x_1, x_2, y_1, y_2), shortly denoted as p*, is achievable if there exists a sequence of node processing functions such that the following probability tends to zero as n tends to infinity for any ϵ > 0:

 P((X_1^n, X_2^n, Y_1^n, Y_2^n) ∉ T_ϵ^{(n)}),

in which the typical set is defined with respect to p*. Then, the achievability of an appropriately chosen p* implies the achievability of R or (R, D) in channel coding and source coding problems. For the channel coding and lossless source coding problems, R is achievable if a p* under which node 2 reproduces the DMS given to node 1 is achievable. For the lossy source coding problem, (R, D) is achievable if a p* under which the expected distortion is at most D/(1+ϵ) is achievable, which follows from the typical average lemma [41] and the continuity of the rate–distortion function in the distortion level.
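The reduction of a distortion constraint to a joint-typicality constraint rests on the typical average lemma [41]: if a source–reconstruction pair is jointly typical, its empirical distortion is close to the expected distortion under the target distribution. For a nonnegative distortion measure d, the lemma reads:

```latex
(s^n,\hat{s}^n) \in T_\epsilon^{(n)}(S,\hat{S})
\;\Longrightarrow\;
\frac{1}{n}\sum_{i=1}^{n} d(s_i,\hat{s}_i)
\le (1+\epsilon)\,\mathbb{E}\big[d(S,\hat{S})\big].
```

Hence choosing a target distribution whose expected distortion is at most D/(1+ϵ) ensures that every jointly typical pair meets the distortion level D, so a vanishing probability of atypicality implies the distortion constraint asymptotically.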

To see whether the aforementioned unification approach is general enough for point-to-point scenarios, let us consider more general point-to-point scenarios in Fig. 2. First, in channels with noncausal states [17] illustrated in Fig. 2-(a), node 1 observes a message of rate R and a state sequence and encodes them into a channel input sequence. Then, node 2 receives the channel output sequence and estimates the message. Achievability is defined in the same way as in the channel coding problem. Let us apply the aforementioned unification approach to this problem. Since Y_1 represents all the information node 1 receives, we let Y_1 consist of the message part and the state part, which are independent, where the message part corresponds to the message of rate R. But, we cannot use the channel form of p(y_2|x_1) to capture the dependency of the channel output on the state. This indicates that a more general channel form of p(y_2|x_1, y_1) is needed in the unified framework. Then, we can let the channel in the unified framework incorporate the state dependency. If we choose p* such that the message estimate at node 2 equals the message, the achievability of p* implies the achievability of R in the original problem.

Next, in lossy source coding with side information [26] represented in Fig. 2-(b), node 1 receives a source sequence and encodes it as an index. Then, node 2 receives the index and side information and reconstructs the source up to some distortion level. Achievability is defined in the same way as in the lossy source coding problem. For this problem, we apply the unification approach as follows. We let Y_1 be the source. Since node 2 has two channel inputs, we let Y_2 consist of two components and let the channel p(y_2|x_1, y_1) be decomposed into two parts, where the first part corresponds to the noiseless communication of the index of rate R and the second part captures the correlation between the source and the side information. We pick the target distribution in the same way as in the lossy source coding problem. Furthermore, the coding for computing problem [34], where node 2 wishes to reconstruct a function of the source and the side information up to distortion D with respect to a distortion measure d, can also be included in this framework by choosing p* such that the expected distortion between the function value and its reconstruction is at most D/(1+ϵ).

In summary, the achievability of the aforementioned point-to-point coding problems can be shown by considering the following unified framework. The network model is given by (X_1, X_2, Y_1, Y_2, p(y_1)p(y_2|y_1, x_1)) as illustrated in Fig. 3 and the objective is specified by a target distribution p*(x_1, x_2, y_1, y_2). p* is said to be achievable if there exists a sequence of node processing functions such that the probability of atypicality tends to zero as n tends to infinity for any ϵ > 0.

### II-B General scenarios

In this subsection, we generalize the unified framework in Section II-A to general N-node networks. In our unified framework for N nodes, we define an N-node acyclic discrete memoryless network (ADMN) (X_1, …, X_N, Y_1, …, Y_N, ∏_{k=1}^N p(y_k|y^{k−1}, x^{k−1})), which consists of a set of alphabet pairs (X_k, Y_k), k ∈ [1:N], and a collection of conditional pmfs p(y_k|y^{k−1}, x^{k−1}), k ∈ [1:N]. Here, Y_k and X_k represent any information that comes into and goes out of node k, respectively. Y_k can be a channel output, message, source, noncausal state information, or any combination of those. X_k can be a channel input, message estimate, reconstructed source, action for generating channel state, or any combination of those. Next, p(y_k|y^{k−1}, x^{k−1}) signifies the correlation between information prior to node k and information received at node k. It can capture a channel distribution possibly with states, correlation between distributed sources, and complicated network-wide correlation among sources and channels.

In this network, information flows in one direction and node operations are sequential. Let n denote the number of channel uses. First, Y_1^n is generated according to ∏_{i=1}^n p(y_{1,i}) and then node 1 processes X_1^n based on Y_1^n. Next, Y_2^n is generated according to ∏_{i=1}^n p(y_{2,i}|y_{1,i}, x_{1,i}) and then node 2 encodes X_2^n based on Y_2^n. Similarly, Y_k^n is generated according to ∏_{i=1}^n p(y_{k,i}|y_{1,i}, …, y_{k−1,i}, x_{1,i}, …, x_{k−1,i}) and node k encodes X_k^n based on Y_k^n for k ∈ [1:N]. Clearly, any layered network [7] or noncausal network (without an infinite loop) [47] possibly with noncausal state or side information is represented as an ADMN. Furthermore, any strictly causal (usual discrete memoryless network with relay functions having one sample delay) or causal network (relays without delay [47]) with blockwise operations can be represented as an ADMN by unfolding the network. Note that our unified achievability theorem (Theorem 1) still applies to the unfolded network. Therefore, considering only the acyclic DMN (ADMN) in our unified approach is without loss of generality while greatly simplifying our unification approach. In the following subsection, we show several known examples represented by an ADMN.
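The sequential generate-then-encode order described above can be sketched as follows; the two-node kernels and forwarding functions are toy stand-ins, not from the paper:

```python
import random

def run_admn(n, channels, node_fns):
    # One block of an N-node ADMN: for k = 1..N, first Y_k^n is generated
    # from the symbols of nodes 1..k-1, and only then node k computes X_k^n.
    ys, xs = [], []
    for k in range(len(channels)):
        yk = [channels[k]([y[i] for y in ys], [x[i] for x in xs]) for i in range(n)]
        ys.append(yk)
        xs.append(node_fns[k](yk))
    return ys, xs

random.seed(0)
# Toy kernels (illustrative): node 1 observes a Bernoulli source;
# node 2 observes node 1's channel input noiselessly.
channels = [
    lambda yh, xh: random.randint(0, 1),
    lambda yh, xh: xh[0],
]
node_fns = [lambda yk: yk, lambda yk: yk]  # both nodes simply forward
ys, xs = run_admn(8, channels, node_fns)
print(ys[1] == xs[0])  # True: node 2 received node 1's sequence exactly
```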

Achievability is specified using a target joint distribution p(x_1, …, x_N, y_1, …, y_N), which is shortly denoted as p*. For a set of node processing functions, the ϵ-probability of error is defined as P((X_1^n, …, X_N^n, Y_1^n, …, Y_N^n) ∉ T_ϵ^{(n)}), where the typical set is defined with respect to p*. We say the target distribution p* is achievable if there exists a sequence of node processing functions such that the ϵ-probability of error tends to zero as n tends to infinity for any ϵ > 0. We note that p* unifies diverse network demands and constraints. It can be used for designating the source–destination relationship and for imposing distortion and cost constraints.

### II-C Examples

In this subsection, we represent some network information theory problems by an ADMN and a target distribution p* such that the achievability of p* implies the achievability of the original problem. Let us first consider some examples of single-hop networks.

###### Example 1 (Multiple access channels [2])

For the multiple access channel problem with rates R_1 and R_2, we let the two messages be independent DMSs of rates R_1 and R_2 given to nodes 1 and 2, let the channel to node 3 be the multiple access channel, and choose p* such that the reconstruction at node 3 equals the message pair.

###### Example 2 (Distributed lossy compression [27], [28])

For the distributed lossy compression problem with rate–distortion pairs (R_1, D_1) and (R_2, D_2), we let the pair of correlated DMSs be given to nodes 1 and 2, let the channels from nodes 1 and 2 to node 3 be noiseless links of rates R_1 and R_2, and choose p* such that E[d_j(Y_j, Ŷ_j)] ≤ D_j/(1+ϵ) for j = 1, 2, where d_j is a distortion measure between the two arguments Y_j and Ŷ_j.

###### Example 3 (Broadcast channels [18])

For the broadcast channel problem with rates R_1 and R_2, we let the two independent messages of rates R_1 and R_2 be given to node 1, let the channel from node 1 to nodes 2 and 3 be the broadcast channel, and choose p* such that the reconstruction at node 2 equals the first message and the reconstruction at node 3 equals the second message.

###### Example 4 (Multiple description coding [48])

For multiple description coding with rates R_1 and R_2 and distortion triple (D_1, D_2, D_0), we let the DMS be given to node 1, let node 1 send two descriptions of rates R_1 and R_2 over noiseless links such that each side decoder receives one description and the central decoder receives both, and choose p* such that the expected distortion at each decoder is at most the corresponding distortion level divided by (1+ϵ) under the corresponding distortion measure.

Next, we show an example of multi-hop networks.

###### Example 5 (Relay channels)

Consider a three-node relay channel p(y_2, y_3|x_1, x_2), illustrated in Fig. 4-(a), where node 1 wishes to send a message to node 3 with the help of node 2. Let R and n denote the rate and the number of channel uses, respectively, and let M and M̂ denote the message of rate R at node 1 and the estimated message at node 3, respectively. Then, the node processing function at node 1 is a mapping from M to X_1^n, the node processing function at node 2 at time i is a mapping from Y_2^{i−1} to X_{2,i}, and the node processing function at node 3 is a mapping from Y_3^n to M̂. The probability of error is defined as P(M ≠ M̂) and a rate R is said to be achievable if there exists a sequence of node processing functions such that the probability of error tends to zero as n tends to infinity.

If we assume a blockwise operation at each node, we can represent this network as an ADMN by unfolding the network. Assume b transmission blocks, each consisting of n channel uses. In the unfolded network illustrated in Fig. 4-(b), each node corresponds to the operation of one node of the original network at the end of one block. To reflect the fact that two unfolded nodes correspond to the same node of the original network in consecutive blocks, we assume that the earlier one has an orthogonal link of sufficiently large rate to the later one, which is represented as a dashed line in Fig. 4-(b). Because this unfolded network is acyclic, it can be represented as an ADMN and p* can be chosen accordingly.
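One way to picture the unfolding is as a relabeling of (node, block) pairs into a single acyclic ordering, with large-rate links tying together the copies of the same original node across blocks. The indexing below is one natural choice for illustration, not the exact indexing of Fig. 4-(b):

```python
def unfold(num_nodes, num_blocks):
    # Relabel (original node k, block b) pairs into a single acyclic order,
    # block-major; this particular indexing is illustrative, not the paper's.
    index = {(k, b): b * num_nodes + k
             for b in range(num_blocks) for k in range(num_nodes)}
    # Copies of the same original node in consecutive blocks are tied by an
    # orthogonal link of sufficiently large rate (dashed lines in Fig. 4-(b)).
    links = [((k, b), (k, b + 1))
             for b in range(num_blocks - 1) for k in range(num_nodes)]
    return index, links

index, links = unfold(3, 2)
print(index[(1, 1)])              # 4
print(((1, 0), (1, 1)) in links)  # True
```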

### II-D Introduction of a virtual node

The following two propositions are obtained by introducing a virtual node in an ADMN, which turn out to be useful in recovering some known achievability results in Section III.

###### Proposition 1

Consider an N-node ADMN

 (X_1, …, X_N, Y_1, …, Y_N, ∏_{k=1}^N p(y_k|y^{k−1}, x^{k−1}))

and target distribution p*(x^N, y^N). For some v ∈ [2:N] and a finite set 𝒴, assume that a conditional pmf p(y|x^{v−1}, y^v) is given for y ∈ 𝒴. Then, we have

 p(y|x^{v−1}, y^{v−1}) = ∑_{y_v} p(y_v|x^{v−1}, y^{v−1}) p(y|x^{v−1}, y^v),
 p(y_v|x^{v−1}, y^{v−1}, y) = p(y_v|x^{v−1}, y^{v−1}) p(y|x^{v−1}, y^v) / ∑_{y_v} p(y_v|x^{v−1}, y^{v−1}) p(y|x^{v−1}, y^v).

Consider an (N+1)-node ADMN

 (X′_1, …, X′_{N+1}, Y′_1, …, Y′_{N+1}, ∏_{k=1}^{N+1} p′(y_k|y^{k−1}, x^{k−1}))

and target distribution p′* such that

 X′_k = X_k if k < v, null if k = v, and X_{k−1} if k > v,   Y′_k = Y_k if k < v, Y if k = v, and Y_{k−1} if k > v,
 p′(y_k|x^{k−1}, y^{k−1}) = p_{Y_k|X^{k−1},Y^{k−1}}(y_k|x^{k−1}, y^{k−1}) if k < v, p(y|x^{v−1}, y^{v−1}) if k = v, p(y_v|x^{v−1}, y^{v−1}, y) if k = v+1, and p_{Y_{k−1}|X^{k−2},Y^{k−2}}(y_{k−1}|x^{k−2}, y^{k−2}) with the virtual node's components removed from the conditioning if k > v+1,

and

 ∑_{x_v, y_v} p′*(x^{N+1}, y^{N+1}) = p*(x^{v−1}, x_{v+1}^{N+1}, y^{v−1}, y_{v+1}^{N+1}).

Then, if p′* is achievable for the (N+1)-node ADMN, p* is achievable for the N-node ADMN.

{proof}

The proof is straightforward from the observation that the (N+1)-node ADMN is obtained by introducing a virtual node, whose channel output is Y and channel input is null, between nodes v−1 and v in the N-node ADMN and reindexing the nodes.

###### Proposition 2

Consider an N-node ADMN

 (X_1, …, X_N, Y_1, …, Y_N, ∏_{k=1}^N p(y_k|y^{k−1}, x^{k−1}))

and target distribution p*(x^N, y^N), and fix v_1, v_2 ∈ [1:N] with v_1 < v_2. Let T denote the common part of the two random variables Y_{v_1} and Y_{v_2}, where the common part of two discrete memoryless sources is defined in [49], [50]. Consider an (N+1)-node ADMN

 (X′_1, …, X′_{N+1}, Y′_1, …, Y′_{N+1}, ∏_{k=1}^{N+1} p′(y_k|y^{k−1}, x^{k−1}))

and target distribution p′* such that

 X′_k = X_k if k < v_1 and X_{k−1} if k > v_1,   Y′_k = Y_k if k < v_1, T if k = v_1, (Y_{k−1}, X′_{v_1}) if k ∈ {v_1+1, v_2+1}, and Y_{k−1} otherwise for k > v_1,
 p′(y_k|x^{k−1}, y^{k−1}) = p_{Y_k|X^{k−1},Y^{k−1}}(y_k|x^{k−1}, y^{k−1}) if k < v_1, the conditional pmf of the common part T given the history if k = v_1, and the correspondingly reindexed conditional pmf of Y_{k−1} if k > v_1, where for k ∈ {v_1+1, v_2+1} the channel additionally delivers X′_{v_1} noiselessly,

and

 ∑_{x_{v_1}, y_{v_1}} p′*(x^{N+1}, y^{N+1}) = p*(x^{v_1−1}, x_{v_1+1}^{N+1}, y^{v_1−1}, y_{v_1+1}^{N+1}),

where the alphabet of the virtual channel input X′_{v_1} can be arbitrarily large.

Then, if p′* is achievable for the (N+1)-node ADMN, p* is achievable for the N-node ADMN.

{proof}

Note that in the N-node ADMN, both nodes v_1 and v_2 observe the common part T and hence can share any function of T. Thus, we can introduce a virtual node whose channel output is T and whose channel input is X′_{v_1}, and assume that X′_{v_1} is available at nodes v_1+1 and v_2+1 of the (N+1)-node ADMN.

## III Unified Coding Theorem

In this section, we propose a unified coding scheme and present the main theorem of this paper, followed by various examples that show how to utilize our results. Our scheme consists of the following ingredients: 1) superposition, 2) simultaneous nonunique decoding, 3) simultaneous compression, and 4) symbol-by-symbol mapping. These are tweaked and combined in a special way to enable unification of many previous approaches. Let us first briefly explain the proposed scheme and introduce related coding parameters. Detailed description of our scheme is given in the proof of Theorem 1.

• Codebook generation: In our coding scheme, covering codebooks are used to compress information that each node observes and decodes. We generate covering codebooks C_1, …, C_M. Let U_j for j ∈ [1:M] denote the random variable for the codeword symbols of C_j. For indexing of codewords, we consider index sets I_1, …, I_M, where I_j = [1:2^{nr_j}] for some r_j ≥ 0 for each j ∈ [1:M]. We denote by Γ_j the set of the indices of I's associated with C_j in a way that each codeword in C_j is indexed by the index vector over Γ_j and hence C_j consists of 2^{n ∑_{i ∈ Γ_j} r_i} codewords. Each codebook is constructed allowing superposition coding. Let A_j denote the set of the indices of U's on which U_j is constructed by superposition.

• Node operation: Node operates according to the following three steps as illustrated in Fig. 5.

• Simultaneous nonunique decoding: After receiving Y_k^n, node k decodes some covering codewords of previous nodes simultaneously, where some are decoded uniquely and the others are decoded nonuniquely. We denote by D_k and B_k the sets of the indices of U's whose codewords are decoded uniquely and nonuniquely, respectively, at node k.

• Simultaneous compression: After decoding, node k finds covering codewords simultaneously according to a conditional pmf p(u_{W_k}|u_{D_k}, y_k) that carry some information about the received channel output sequence Y_k^n and the uniquely decoded codewords, where W_k denotes the set of the indices of U's used for compression at node k.

• Symbol-by-symbol mapping: After decoding and compression, node k generates X_k^n by a symbol-by-symbol mapping from the uniquely decoded codewords, the covered codewords, and the received channel output sequence Y_k^n. Let x_k(u_{D_k}, u_{W_k}, y_k) denote the function used for the symbol-by-symbol mapping.

In summary, our scheme requires the following set of coding parameters, where some constraints are added to make the aforementioned codebook generation and node operation proper:

1. positive integers n and M

2. alphabets U_1, …, U_M

3. an M-rate tuple (r_1, …, r_M)

4. sets A_j and Γ_j for j ∈ [1:M] and D_k, B_k, and W_k for k ∈ [1:N] that satisfy

1. Γ_j's are disjoint,

2. codebooks are superposed only on already generated codebooks, i.e., the superposition relation given by the A_j's is acyclic,

3. the sets D_k, B_k, and W_k are disjoint and contain only the indices of codebooks available at node k.

5. a set of conditional pmfs p(u_{W_k}|u_{D_k}, y_k) and functions x_k(u_{D_k}, u_{W_k}, y_k) for k ∈ [1:N] such that the joint distribution p(x^N, y^N) induced by

 ∏_{k=1}^N p(y_k|y^{k−1}, x^{k−1}) p(u_{W_k}|u_{D_k}, y_k) 𝟙{x_k = x_k(u_{D_k}, u_{W_k}, y_k)} (1)

is the same as the target distribution p*.

Now, we are ready to present our main theorem, which gives a sufficient condition for achievability using the aforementioned scheme. For an ADMN and target distribution p*, let Θ denote the set of all possible tuples of the aforementioned coding parameters.

###### Theorem 1

For an N-node ADMN, p* is achievable if there exists a coding parameter tuple in Θ such that

 ∑_{j ∈ S̄_k} r_j < ∑_{j ∈ S_k} I(U_j; U_{S_k[j] ∪ S_k^c}, Y_k | U_{A_j}) (2)

 ∑_{j ∈ T̄_k} r_j > ∑_{j ∈ T_k} I(U_j; U_{T_k[j] ∪ D_k}, Y_k | U_{A_j}) (3)

for all k ∈ [1:N] and all nonempty index sets S̄_k and T̄_k of the appropriate form, where

 S_k ≜ {j : j ∈ D_k ∪ B_k, Γ_j ∩ S̄_k ≠ ∅}, (4)

 T_k ≜ {j : j ∈ W_k, Γ_j ∩ (T̄_k ∪ D̄_k)^c = ∅}. (5)
###### Remark 1

For each k ∈ [1:N], the inequalities (2) and (3) are the conditions for successful simultaneous nonunique decoding and simultaneous compression, respectively, at node k.

###### Remark 2

Theorem 1 can be improved using coded time sharing [19].

{proof}

Consider . Let for all such that and . Let