Minimum Communication Cost for Joint Distributed Source Coding and Dispersive Information Routing


Abstract

This paper considers the problem of minimum cost communication of correlated sources over a network with multiple sinks, which consists of distributed source coding followed by routing. We introduce a new routing paradigm called dispersive information routing, wherein the intermediate nodes are allowed to ‘split’ a packet and forward subsets of the received bits on each of the forward paths. This paradigm opens up a rich class of research problems which focus on the interplay between encoding and routing in a network. Unlike conventional routing methods such as in [1], dispersive information routing ensures that each sink receives just the information needed to reconstruct the sources it is required to reproduce. We demonstrate using simple examples that our approach offers better asymptotic performance than conventional routing techniques. This paradigm leads to a new information theoretic setup, which has not been studied earlier. We propose a new coding scheme, using principles from multiple descriptions encoding [2] and Han and Kobayashi decoding [3]. We show that this coding scheme achieves the complete rate region for certain special cases of the general setup and thereby achieves the minimum communication cost under this routing paradigm.

Index Terms

Distributed source coding, Minimum cost routing, Compression of correlated sources

1 Introduction

Compression of sources in conjunction with communication over a network has been an important research area, notably with the recent advancements in distributed compression of correlated sources and network (routing) design, coupled with the deployment of various sensor networks. Encoding correlated sources in a network, such as a sensor network with multiple nodes and sinks as shown in Fig. 1, has conventionally been approached from two different directions. The first approach is routing the information from different sources in such a way as to efficiently re-compress the data at intermediate nodes without recourse to distributed source coding (DSC) methods (we refer to this approach as joint coding via ‘explicit communication’). Such techniques tend to be wasteful at all but the last hops of the communication path. The second approach performs DSC followed by simple routing. Well designed DSC followed by optimal routing can provide good performance gains. We will focus on the latter category. Relevant background on DSC and route selection in a network is given in the next section.

Figure 1: A general multi-source multi-sink sensor network. The circles denote sources and stars denote sinks. The arrows denote allowed communication links.

This paper focuses on minimum cost communication of correlated sources over a network with multiple sinks. We introduce a new routing paradigm called Dispersive Information Routing (DIR), wherein intermediate nodes are allowed to "split a packet" and forward a subset of the received bits on each of the forward paths. This paradigm opens up a rich class of research problems which focus on the interplay between encoding and routing in a network. What makes it particularly interesting is the challenge in encoding sources such that exactly the required information is routed to each sink, to reconstruct the prescribed subset of sources. We will show, using simple examples, that DIR asymptotically achieves a lower cost than conventional routing methods, wherein the sinks usually receive more information than they need. This paradigm leads to a general class of information theoretic problems, which have not been studied earlier. In this paper, we formulate this problem and the associated rate region. We introduce a new (random) coding technique using principles from multiple descriptions encoding and Han and Kobayashi decoding, which leads to an achievable rate region for this problem. We show that this achievable rate region is complete under certain special scenarios.

The rest of the paper is organized as follows. In Section 2, we review prior work related to distributed source coding and network routing. Before stating the problem formally, in Section 3, we provide two simple examples to demonstrate the basic principles behind DIR and the new encoding scheme. We also demonstrate the suboptimality of conventional routing methods using these simple examples. In Section 4, we formally state the DIR problem and provide an achievable rate region. Finally, in Section 5, we show that this achievable rate region is complete for some special cases of the setup.

2 Prior Work

Multi-terminal source coding has one of its early roots in the seminal work of Slepian and Wolf [4]. They showed, in the context of lossless coding, that side-information available only at the decoder can nevertheless be fully exploited as if it were available to the encoder, in the sense that there is no asymptotic performance loss. Later, Wyner and Ziv [5] derived a lossy coding extension that bounds the rate-distortion performance in the presence of decoder side information. Extensive work followed considering different network scenarios and obtaining achievable rate regions for them, including [6, 7, 8, 9, 10, 11, 12, 13, 14]. Han and Kobayashi [3] extended the Slepian-Wolf result to general multi-terminal source coding scenarios. For a multi-sink network, with each sink reconstructing a prespecified subset of the sources, they characterized an achievable rate region for lossless reconstruction of the required sources at each sink. Csiszár and Körner [15] provided an alternative characterization of the achievable rate region for the same setup by relating the region to the solution of a class of problems called the "entropy characterization problems".

There has also been a considerable amount of work on joint compression-routing for networks. A survey of routing techniques for sensor networks is given in [16]. It was shown in [17] that the problem of finding the optimum route for compression using explicit communication is an NP-complete problem. [18] compared different joint compression-routing schemes for a correlated sensor grid and also proposed an approximate, practical, static source clustering scheme to achieve compression efficiency. Much of the above work is related to compression using explicit communication, without recourse to distributed source coding techniques. Cristescu et al. [1] considered joint optimization of Slepian-Wolf coding and a routing mechanism we call 'broadcasting', wherein each source broadcasts its information to all sinks that intend to reconstruct it. Such a routing mechanism is motivated by the extensive literature on optimal routing for independent sources [19]. [20] proved the general optimality of that approach for networks with a single sink. We recently demonstrated its sub-optimality for the multi-sink scenario in [21]. This paper takes a step further towards finding the best joint compression-routing mechanism for a multi-sink network. We note that a preliminary version of our results appeared in [22] and [23].

We note the existence of a volume of work on minimum cost network coding for correlated sources, e.g. [24, 25]. But the routing mechanism we introduce in this paper does not require possibly complex network coders at intermediate nodes, and can be realized using simple conventional routers. The approach does have potential implications on network coding, but these are beyond the scope of this paper.

3 Dispersive Information Routing - Simple Networks

3.1 Basic Notation

We begin by introducing the basic notation. In what follows, $2^{\mathcal{A}}$ denotes the set of all subsets (power set) of a set $\mathcal{A}$ and $|\mathcal{A}|$ denotes the set cardinality. Note that $|2^{\mathcal{A}}| = 2^{|\mathcal{A}|}$. $\mathcal{A}^c$ denotes the set complement (the universal set will be specified when there is ambiguity) and $\emptyset$ denotes the null set. For two sets $\mathcal{A}$ and $\mathcal{B}$, we denote the set difference by $\mathcal{A} - \mathcal{B}$. Random variables are denoted by upper case letters (for example, $X$) and their realizations are denoted by lower case letters (for example, $x$). We also use upper case letters to denote source nodes and sinks, and the ambiguity will be clarified wherever necessary. A sequence of $n$ independent and identically distributed (iid) random variables and its realization are denoted by $X^n$ and $x^n$, respectively. The length-$n$, $\epsilon$-typical set is denoted by $T_\epsilon^n(X)$. $X \leftrightarrow Y \leftrightarrow Z$ denotes that the three random variables form a Markov chain in that order. Notation in [26] is used to denote standard information theoretic quantities.
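The information measures appearing throughout this section are easy to evaluate numerically from a joint pmf. The following minimal Python helpers (not part of the original development; numpy is assumed) are used by the illustrative snippets in the rest of this section.

```python
# Minimal helpers (not from the paper; numpy assumed) for evaluating the
# information measures used in the examples below.
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf given as a numpy array (any shape)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def joint_entropy(pxy):
    """H(X,Y) from a 2-D joint pmf with axes (x, y)."""
    return entropy(pxy)

def conditional_entropy(pxy):
    """H(X|Y) = H(X,Y) - H(Y); the Y marginal is the sum over axis 0."""
    return joint_entropy(pxy) - entropy(np.asarray(pxy).sum(axis=0))

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    pxy = np.asarray(pxy, dtype=float)
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy)
```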

3.2 Illustrative example - No helpers case

Figure 2: Example 1 - Conventional Routing

Consider the network shown in Fig. 2. There are three source nodes, $X$, $Y$ and $Z$, and two sinks, $S_1$ and $S_2$. The three source nodes observe correlated memoryless sequences $X^n$, $Y^n$ and $Z^n$, respectively. Sink $S_1$ reconstructs the pair $(X, Y)$, while $S_2$ reconstructs $(X, Z)$. $X$ communicates with the two sinks through an intermediate node (called the 'collector') which is functionally a simple router. The edge weights on each path in the network are as shown in the figure; we denote by $d_X$ the weight of the branch from $X$ to the collector, by $d_1$ and $d_2$ the weights of the branches from the collector to $S_1$ and $S_2$, and by $d_Y$ and $d_Z$ the weights of the direct branches from $Y$ to $S_1$ and from $Z$ to $S_2$, respectively. The cost of communication through an edge $e$ is a function of the bit rate flowing through it, denoted by $R_e$, and the corresponding edge weight, denoted by $d_e$, which in this paper we will assume for simplicity to be the simple product $C_e = R_e d_e$, noting that the approach is directly extendible to more complex cost functions. We further assume that the total cost is the sum of the individual communication costs over all edges. The objective is to find the minimum total communication cost for lossless transmission of the sources to the respective sinks.

We first consider the communication cost when broadcast routing is employed [1], wherein the routers forward all the bits received from a source to all the decoders that reconstruct it. In other words, routers are not allowed to "split" a packet and forward a portion of the received information on the forward paths. Hence the branches connecting the collector to the two sinks carry the same rate as the branch connecting $X$ to the collector. We denote the rates at which $X$, $Y$ and $Z$ are encoded by $R_X$, $R_Y$ and $R_Z$, respectively.

Using results in [1], it can be shown that the minimum communication cost under broadcast routing is given by the solution to the following linear programming formulation:

$\min_{R_X, R_Y, R_Z} \; R_X (d_X + d_1 + d_2) + R_Y d_Y + R_Z d_Z$   (1)

under the constraints:

$R_X \geq H(X|Y), \quad R_Y \geq H(Y|X), \quad R_X + R_Y \geq H(X,Y)$
$R_X \geq H(X|Z), \quad R_Z \geq H(Z|X), \quad R_X + R_Z \geq H(X,Z)$   (2)
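As an illustration, the linear program (1)-(2) can be solved numerically for any given joint pmf and edge weights. The sketch below uses scipy.optimize.linprog together with the entropy helpers from Section 3.1; the joint pmf and the weights are illustrative placeholders, not values from the paper.

```python
# Hedged sketch: solving the broadcast-routing LP (1)-(2) numerically for a
# toy joint pmf p[x,y,z]; the pmf and edge weights are placeholders.
import numpy as np
from scipy.optimize import linprog

p = np.zeros((2, 2, 2))
p[0, 0, 0] = p[1, 1, 1] = 0.3
p[0, 0, 1] = p[1, 1, 0] = 0.1
p[0, 1, 0] = 0.15
p[1, 0, 1] = 0.05

pxy, pxz = p.sum(axis=2), p.sum(axis=1)       # joints of (X,Y) and (X,Z)
HX_Y = conditional_entropy(pxy)               # H(X|Y)
HX_Z = conditional_entropy(pxz)               # H(X|Z)
HY_X = joint_entropy(pxy) - entropy(pxy.sum(axis=1))   # H(Y|X)
HZ_X = joint_entropy(pxz) - entropy(pxz.sum(axis=1))   # H(Z|X)

dX, d1, d2, dY, dZ = 1.0, 1.0, 1.0, 0.5, 0.5  # placeholder edge weights
c = [dX + d1 + d2, dY, dZ]                    # objective (1) over (R_X,R_Y,R_Z)
A_ub = [[-1, 0, 0], [0, -1, 0], [-1, -1, 0],  # constraints (2), negated to <=
        [-1, 0, 0], [0, 0, -1], [-1, 0, -1]]
b_ub = [-HX_Y, -HY_X, -joint_entropy(pxy),
        -HX_Z, -HZ_X, -joint_entropy(pxz)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print('broadcast minimum cost:', res.fun)
```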

To gain intuition into dispersive information routing, we will later consider a special case of the above network in which the branch weights satisfy $d_Y = d_Z = 0$. Let us specialize the above equations for this case. As the branches from $Y$ to $S_1$ and from $Z$ to $S_2$ are free, $Y$ and $Z$ can be encoded at rates $R_Y = H(Y)$ and $R_Z = H(Z)$, respectively, without affecting the total cost. Therefore the scenario effectively captures the case when $Y$ and $Z$ are available as side information at the respective decoders. It follows from (1) and (2) that for achieving minimum communication cost, $R_X$ is:

$R_X = \max\{H(X|Y), H(X|Z)\}$   (3)

and therefore the minimum communication cost is given by:

$\mathcal{C}_{BR} = \max\{H(X|Y), H(X|Z)\} \, (d_X + d_1 + d_2)$   (4)

Is this the best we can do? The collector has to transmit enough information to sink $S_1$ for it to decode $X$, and therefore the rate on the branch connecting the collector to $S_1$ is at least $H(X|Y)$. Similarly, the rate on the branch connecting the collector to $S_2$ is at least $H(X|Z)$. But if $H(X|Y) \neq H(X|Z)$, there is excess rate on one of the branches.

Let us now relax this restriction and allow the collector node to "split" the packet and route different subsets of the received bits on the forward paths. We could equivalently think of the source transmitting 3 smaller packets to the collector; the first packet has rate $R_{12}$ bits per source symbol and is destined to both sinks. The two other packets have rates $R_1$ and $R_2$ and are destined to sinks $S_1$ and $S_2$, respectively. Technically, in this case, the collector is again a simple conventional router.

We refer to such a routing mechanism, where each intermediate node transmits a subset of the received bits on each of the forward paths, as "Dispersive Information Routing" (DIR). Note that unlike network coding, DIR does not require possibly expensive coders at intermediate nodes, and can always be realized using conventional routers, with each source transmitting multiple packets into the network intended for different subsets of sinks. Hereafter, we interchangeably use the ideas of "packet splitting" at intermediate nodes and conventional routing of smaller packets, noting the equivalence in achievable rates and costs. This scenario is depicted in Fig. 3 with the modified cost each packet encounters.

Figure 3: Example - DIR. Note that the notion of ‘packet splitting’ is equivalent to the sources transmitting multiple smaller packets

Two obvious questions arise - Does DIR achieve a lower communication cost compared to conventional routing? If so, what is the minimum communication cost under DIR?

We first aim to find the minimum cost using DIR under the special case of $d_Y = d_Z = 0$ (i.e., $R_Y = H(Y)$ and $R_Z = H(Z)$). To establish the minimum communication cost we need to first establish the complete achievable rate region for the rate tuple $(R_1, R_2, R_{12})$ for lossless reconstruction of $X$ at both the decoders and then find the point in the achievable rate region that minimizes the total communication cost, determined using the modified weights shown in Fig. 3. Before deriving the ultimate solution, it is instructive to consider one operating point, $(R_1, R_2, R_{12}) = (I(X;Z|Y), I(X;Y|Z), H(X|Y,Z))$, and provide the coding scheme that achieves it. Extension to other "interesting points" and to the whole achievable region follows along similar lines. This particular rate point is considered first due to its intuitive appeal, as shown in a Venn diagram (Fig. 4a).

Figure 4: Venn Diagram based intuition: (a) Amount of information routed using DIR when operating at the point $(R_1, R_2, R_{12}) = (I(X;Z|Y), I(X;Y|Z), H(X|Y,Z))$. Observe that each of the sinks receives information at the respective minimum rates $H(X|Y)$ and $H(X|Z)$. Green, blue and red represent the three rate components $R_1$, $R_2$ and $R_{12}$, respectively. (b) Intuitive representation of Wyner's common information. Observe that in Wyner's setup, it is generally not possible to split the information exactly and that there is a rate loss due to transmitting the common bit stream.

Gray and Wyner considered a closely resembling network [13] shown in Fig. 5. In their setup, the encoder observes iid sequences of a pair of correlated random variables $(X, Y)$ and transmits 3 packets (at rates $R_1$, $R_2$ and $R_{12}$, respectively), one meant for each subset of sinks. The two sinks reconstruct sequences $X^n$ and $Y^n$, respectively. They showed that the rate tuple $(R_1, R_2, R_{12}) = (H(X|Y), H(Y|X), I(X;Y))$ is not achievable in general and that there is a rate loss due to transmitting a common bit stream; in the sense that individual decoders must receive more information than they need to reconstruct their respective sources if the sum rate is maintained at minimum. Wyner defined the term "Common Information" [11], here denoted by $C(X;Y)$, as the minimum rate $R_{12}$ such that $(R_1, R_2, R_{12})$ is achievable and $R_1 + R_2 + R_{12} = H(X,Y)$. He also showed that $C(X;Y) = \inf I(X,Y;W)$, where the $\inf$ is taken over all auxiliary random variables $W$ such that $X \leftrightarrow W \leftrightarrow Y$ form a Markov chain. He further showed that, in general, $C(X;Y) \geq I(X;Y)$. We note in passing, the existence of an earlier definition of common information by Gács and Körner [27] which measures the maximum shared information that can be fully utilized by both the decoders. It is less relevant to dispersive information routing.

Figure 5: Gray-Wyner Setup. Note the resemblance to the DIR setup in Fig. 3

At first glance, it might be tempting to extend Wyner's argument to the DIR setting and say that the point $(I(X;Z|Y), I(X;Y|Z), H(X|Y,Z))$ is not achievable in general, i.e., each decoder has to receive more information than it needs. But interestingly enough, a rather simple coding scheme achieves this point, and simple extensions of the coding scheme can achieve the entire rate region for this example. The primary difference between the Gray-Wyner network and DIR is that in their setup two correlated sources are encoded jointly for separate decoding at each sink. However, in our setup, a single source $X$ is encoded for lossless decoding at both the sinks. Note that this section only provides intuitive arguments to support the result. A coding scheme will be formally derived in Section 4 for the general setup.

We concentrate on encoding at $X$ assuming that $Y$ and $Z$ transmit at their respective source entropies. $X$ observes a sequence of $n$ iid random variables, $X^n$. This sequence belongs to the typical set, $T_\epsilon^n(X)$, with high probability. Every typical sequence is assigned 3 indices, each independent of the others. The three indices $(i, j, k)$ are assigned using uniform pmfs over $\{1, \ldots, 2^{nR_1}\}$, $\{1, \ldots, 2^{nR_2}\}$ and $\{1, \ldots, 2^{nR_{12}}\}$, respectively. All the sequences with the same first index $i$ form a bin $B_1(i)$. Similarly, bins $B_2(j)$ and $B_{12}(k)$ are formed for all indices $j$ and $k$, respectively. Upon observing a sequence with indices $(i, j, k)$, the encoder transmits index $i$ to decoder $S_1$ alone, index $j$ to decoder $S_2$ alone and index $k$ to both the decoders.

The first decoder receives indices $i$ and $k$. It tries to find a typical sequence in $B_1(i) \cap B_{12}(k)$ which is jointly typical with the decoded side information sequence $y^n$. As the indices are assigned independently of each other, every typical sequence has a uniform pmf of being assigned to the index pair $(i, k)$ over $\{1, \ldots, 2^{nR_1}\} \times \{1, \ldots, 2^{nR_{12}}\}$. Therefore, having received indices $i$ and $k$, using arguments similar to Slepian-Wolf [4] and Cover [7], the probability of decoding error asymptotically approaches zero if:

$R_1 + R_{12} \geq H(X|Y)$   (5)

Similarly, the probability of decoding error approaches zero at the second decoder if:

$R_2 + R_{12} \geq H(X|Z)$   (6)

Clearly, (5) and (6) imply that $(R_1, R_2, R_{12}) = (I(X;Z|Y), I(X;Y|Z), H(X|Y,Z))$ is achievable. Along the lines of [4, 7], the above achievable region can also be shown to satisfy the converse and hence is the complete achievable rate region for this problem. We term such a binning approach 'Power Binning', as an independent index is assigned to each (non-trivial) subset of the decoders - the power set. It is worthwhile to note that the same rate region can be obtained by applying results of Han and Kobayashi [3], assuming 3 independent encoders at $X$, albeit with a more complicated coding scheme involving multiple auxiliary random variables (see also [28]). We also note that the mechanism of assigning multiple independent random bin indices has been used in several related prior works, such as [29, 30].
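To make the power binning mechanism concrete, the following toy sketch (hypothetical, with a tiny block length and no explicit typicality machinery) assigns the three independent indices and shows the bin-intersection search performed by sink $S_1$; the list of sequences "plausible" given the side information stands in for the jointly typical set of size about $2^{nH(X|Y)}$.

```python
# Toy illustration (hypothetical, tiny block length) of power binning for a
# source X reconstructed at two sinks. Every sequence draws one independent
# uniform index per nonempty subset of sinks: i -> S1, j -> S2, k -> both.
import itertools
import random

random.seed(0)
n = 12                           # block length (illustrative)
R1, R2, R12 = 0.25, 0.25, 0.5    # rates in bits/symbol (illustrative)

sequences = list(itertools.product([0, 1], repeat=n))   # stand-in for T_eps
bins = {x: (random.randrange(2 ** round(n * R1)),
            random.randrange(2 ** round(n * R2)),
            random.randrange(2 ** round(n * R12))) for x in sequences}

def decode_at_sink1(i, k, plausible):
    """Sink 1 intersects bins B1(i) and B12(k) with the set of sequences
    'plausible' given its side information y^n (abstracted as a list)."""
    return [x for x in plausible if bins[x][0] == i and bins[x][2] == k]

x_true = sequences[0]
i, j, k = bins[x_true]
# Unique decoding is likely when |plausible| ~ 2^{nH(X|Y)} << 2^{n(R1+R12)}.
print(decode_at_sink1(i, k, sequences[:64]))
```

Decoding at $S_1$ succeeds with high probability precisely when the bin-intersection rate $R_1 + R_{12}$ exceeds $H(X|Y)$, matching condition (5).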

The minimum cost operating point is the point that satisfies equations (5) and (6) and minimizes the cost function:

$\mathcal{C}_{DIR} = R_1 (d_X + d_1) + R_2 (d_X + d_2) + R_{12} (d_X + d_1 + d_2)$   (7)

The solution is either one of the two points $(H(X|Y) - H(X|Z), \, 0, \, H(X|Z))$ or $(0, \, H(X|Z) - H(X|Y), \, H(X|Y))$, depending on which conditional entropy is larger, and both achieve lower total communication cost compared to broadcast routing, in (4), for any positive edge weights if $H(X|Y) \neq H(X|Z)$.
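Continuing the toy example, the minimum DIR cost in (7) subject to (5) and (6) is again a small linear program, and can be compared directly against the broadcast cost (4). This sketch reuses the conditional entropies and placeholder weights computed in the earlier broadcast-routing snippet.

```python
# Hedged numeric check (reusing HX_Y, HX_Z and dX, d1, d2 from the sketch
# above): minimum DIR cost (7) subject to (5)-(6) versus broadcast cost (4).
from scipy.optimize import linprog

cost_br = max(HX_Y, HX_Z) * (dX + d1 + d2)       # equation (4)

c = [dX + d1, dX + d2, dX + d1 + d2]             # costs of R1, R2, R12 in (7)
A_ub = [[-1, 0, -1], [0, -1, -1]]                # (5) and (6), negated to <=
b_ub = [-HX_Y, -HX_Z]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print('broadcast:', cost_br, ' DIR:', res.fun)   # DIR cost never exceeds (4)
```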

The above coding scheme can be easily extended to the case of arbitrary edge weights. Then, the rate region for the tuple $(R_1, R_2, R_{12}, R_Y, R_Z)$ and the cost function to be minimized are given by:

$\min \; R_1 (d_X + d_1) + R_2 (d_X + d_2) + R_{12} (d_X + d_1 + d_2) + R_Y d_Y + R_Z d_Z$   (8)

under the constraints:

$R_1 + R_{12} \geq H(X|Y), \quad R_Y \geq H(Y|X), \quad R_1 + R_{12} + R_Y \geq H(X,Y)$
$R_2 + R_{12} \geq H(X|Z), \quad R_Z \geq H(Z|X), \quad R_2 + R_{12} + R_Z \geq H(X,Z)$   (9)

If $R_Y = H(Y)$ and $R_Z = H(Z)$, (9) specializes to (5) and (6). Also, it can easily be shown that the total communication cost obtained as a solution to the above formulation is never higher than that for conventional routing, and is strictly lower for a wide range of edge weights and source distributions (e.g., whenever $H(X|Y) \neq H(X|Z)$ in the special case above). This example clearly demonstrates the gains of DIR over broadcast routing to communicate correlated sources over a network.

Observe that in the above example, the sinks only receive information from the source nodes they intend to reconstruct. Such a scenario is called the 'No helpers' case in the literature [15]. In a network with multiple sources and sinks, if source $X_i$ is to be reconstructed at a subset of sinks $\mathcal{S}_i$, power binning assigns $2^{|\mathcal{S}_i|} - 1$ independently generated indices, each being routed to a different nonempty subset of $\mathcal{S}_i$ (as illustrated in the sketch below). It will be shown later in Section 5 that power binning achieves minimum cost under DIR, even for a general setup, as long as there are no helpers, i.e., when each sink is allowed to receive information only from the requested sources. However, the problem of establishing the complete achievable rate region becomes considerably harder when every source is allowed to communicate with every sink, a scenario that is highly relevant to practical networks. It was shown in [21] that for certain networks, unbounded gains in communication cost are obtained when source nodes are allowed to communicate with sinks that do not reconstruct them. In this paper, we derive an achievable rate region for this setup. In the following subsection, to keep the notation simple and aid understanding, we begin with one of the simplest setups which illustrates the underlying ideas.
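The indices assigned by power binning are in one-to-one correspondence with the nonempty subsets of the requesting sinks $\mathcal{S}_i$; a short illustration of this enumeration (sink names are hypothetical):

```python
# The 2^{|S_i|} - 1 independent power-binning indices correspond to the
# nonempty subsets of the requesting sinks S_i (names are illustrative).
from itertools import chain, combinations

def nonempty_subsets(sinks):
    return list(chain.from_iterable(
        combinations(sinks, r) for r in range(1, len(sinks) + 1)))

print(nonempty_subsets(('S1', 'S2', 'S3')))   # 2^3 - 1 = 7 subsets
```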

3.3 A simple network with helpers

Figure 6: The 2 Source - 2 Sink example. Each source acts as the principal source for one sink and as a helper for the other

We will again provide only an intuitive description of the encoding scheme here and defer the formal proofs for the general case to Section 4. Consider the network shown in Fig. 6. Two source nodes, $N_1$ and $N_2$, observe correlated memoryless sequences $X_1^n$ and $X_2^n$, respectively. Two sinks, $S_1$ and $S_2$, require lossless reconstructions of $X_1$ and $X_2$, respectively. The source nodes can communicate with the sinks only through a collector node. The edge weights are as shown in the figure; we denote by $d_{N_i}$ the weight of the branch from $N_i$ to the collector and by $d_{S_j}$ the weight of the branch from the collector to $S_j$. Observe that each source, while requested by one sink, acts as a helper for the other.

Under dispersive information routing, each source transmits a packet to every subset of sinks. In this example, $N_1$ sends 3 packets to the collector at rates $(R^1_{\{1\}}, R^1_{\{2\}}, R^1_{\{1,2\}})$, respectively. The collector forwards the first packet to $S_1$, the second to $S_2$ and the third to both $S_1$ and $S_2$. Similarly, $N_2$ sends 3 packets to the collector at rates $(R^2_{\{1\}}, R^2_{\{2\}}, R^2_{\{1,2\}})$, which are forwarded to the corresponding sinks. Our objective is to determine the set of achievable rate tuples $(R^1_{\{1\}}, R^1_{\{2\}}, R^1_{\{1,2\}}, R^2_{\{1\}}, R^2_{\{2\}}, R^2_{\{1,2\}})$ that allows for lossless reconstruction at the two sinks. The minimum cost then follows by finding the point in the achievable rate region which minimizes the effective communication cost, $\mathcal{C}$, given by:

$\mathcal{C} = \sum_{i \in \{1,2\}} \; \sum_{\emptyset \neq \mathcal{S} \subseteq \{1,2\}} R^i_{\mathcal{S}} \Big( d_{N_i} + \sum_{j \in \mathcal{S}} d_{S_j} \Big)$   (10)
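For concreteness, the effective cost (10) can be evaluated as follows; the weights and rates below are illustrative placeholders, not values from the paper.

```python
# Illustrative evaluation of (10): the packet from source i to sink subset S
# pays d_{N_i} (source-to-collector) plus the sum of d_{S_j} for j in S.
d_src = {1: 1.0, 2: 1.0}      # d_{N_1}, d_{N_2} (placeholders)
d_snk = {1: 2.0, 2: 1.5}      # d_{S_1}, d_{S_2} (placeholders)

def total_cost(rates):
    """rates maps (i, frozenset S) -> packet rate R^i_S in bits/symbol."""
    return sum(R * (d_src[i] + sum(d_snk[j] for j in S))
               for (i, S), R in rates.items())

rates = {(1, frozenset({1})): 0.4, (1, frozenset({1, 2})): 0.3,
         (2, frozenset({2})): 0.5}
print(total_cost(rates))      # 0.4*3.0 + 0.3*4.5 + 0.5*2.5 = 3.8
```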

A non-single letter characterization of the complete rate region is possible using the results of Han and Kobayashi in [3]. They also provide a single-letter partial achievable rate region. However, applicability of their result requires artificial imposition of 3 independent encoders at each source, which is an unnecessary restriction. We present a more general achievable rate region, which maintains the dependencies between the messages at each encoder. Note that the source coding setup which arises out of the DIR framework is a special case of the general problem of distributed multiple descriptions and therefore the principles underlying the coding schemes for distributed source coding [3] and multiple descriptions encoding [2] play crucial roles in deriving a coding mechanism for dispersive information routing. It is interesting to observe that, unlike the general MD setting, the DIR framework is non-trivial even in the lossless scenario and deriving a complete rate region for lossless reconstruction at all the sinks is a challenging problem.

We now give an achievable region for the example in Fig. 6. Suppose we are given random variables $(U^1_{\{1\}}, U^1_{\{2\}}, U^1_{\{1,2\}})$ and $(U^2_{\{1\}}, U^2_{\{2\}}, U^2_{\{1,2\}})$ jointly distributed with $(X_1, X_2)$ such that the following Markov chain condition holds:

$(U^1_{\{1\}}, U^1_{\{2\}}, U^1_{\{1,2\}}) \leftrightarrow X_1 \leftrightarrow X_2 \leftrightarrow (U^2_{\{1\}}, U^2_{\{2\}}, U^2_{\{1,2\}})$   (11)

Note that the (binned) codeword index of $U^i_{\mathcal{S}}$ is sent in the packet from source $N_i$ to the sinks in $\mathcal{S}$. The encoding is divided into 3 stages.

Encoding: We first focus on the encoding at $N_1$. In the first stage, $2^{n\rho^1_{\{1,2\}}}$ codewords of $U^1_{\{1,2\}}$, each of length $n$, are generated independently, with elements drawn according to the marginal density $P_{U^1_{\{1,2\}}}$. Conditioned on each of these codewords, $2^{n\rho^1_{\{1\}}}$ and $2^{n\rho^1_{\{2\}}}$ codewords of $U^1_{\{1\}}$ and $U^1_{\{2\}}$ are generated according to the conditional densities $P_{U^1_{\{1\}}|U^1_{\{1,2\}}}$ and $P_{U^1_{\{2\}}|U^1_{\{1,2\}}}$, respectively. Codebooks for $(U^2_{\{1\}}, U^2_{\{2\}}, U^2_{\{1,2\}})$ are generated at $N_2$ in a similar fashion. On observing a sequence $x_1^n$, $N_1$ first tries to find a codeword tuple from the codebooks of $(U^1_{\{1\}}, U^1_{\{2\}}, U^1_{\{1,2\}})$ such that $(x_1^n, u^{1,n}_{\{1,2\}}, u^{1,n}_{\{1\}}) \in T_\epsilon^n$ and $(x_1^n, u^{1,n}_{\{1,2\}}, u^{1,n}_{\{2\}}) \in T_\epsilon^n$. The probability of finding such a codeword tuple approaches 1 if,

$\rho^1_{\{1,2\}} \geq I(X_1; U^1_{\{1,2\}})$
$\rho^1_{\{1\}} \geq I(X_1; U^1_{\{1\}} \mid U^1_{\{1,2\}})$
$\rho^1_{\{2\}} \geq I(X_1; U^1_{\{2\}} \mid U^1_{\{1,2\}})$   (12)

Let the codewords selected at $N_1$ be denoted by $(u^{1,n}_{\{1\}}, u^{1,n}_{\{2\}}, u^{1,n}_{\{1,2\}})$. Similar constraints on $(\rho^2_{\{1\}}, \rho^2_{\{2\}}, \rho^2_{\{1,2\}})$ must be satisfied for encoding at $N_2$. Denote the codewords selected at $N_2$ by $(u^{2,n}_{\{1\}}, u^{2,n}_{\{2\}}, u^{2,n}_{\{1,2\}})$. It follows from (11) and the 'Conditional Markov Lemma' in [10] that the codewords selected at the two encoders are jointly typical with $(x_1^n, x_2^n)$, and hence with each other, with high probability.

In the second stage of encoding, each encoder uniformly divides the $2^{n\rho^i_{\mathcal{S}}}$ codewords of $U^i_{\mathcal{S}}$ into $2^{nr^i_{\mathcal{S}}}$ bins. All the codewords which have the same bin index $b$ are said to fall in the bin $B^i_{\mathcal{S}}(b)$. Note that the number of codewords in each bin is $2^{n(\rho^i_{\mathcal{S}} - r^i_{\mathcal{S}})}$. If $N_1$ selects the codewords $(u^{1,n}_{\{1\}}, u^{1,n}_{\{2\}}, u^{1,n}_{\{1,2\}})$ in the first stage and if the bin indices associated with them are $(b^1_{\{1\}}, b^1_{\{2\}}, b^1_{\{1,2\}})$, then index $b^1_{\{1\}}$ is routed to sink $S_1$, $b^1_{\{2\}}$ to sink $S_2$ and $b^1_{\{1,2\}}$ to both the sinks $S_1$ and $S_2$. Similarly, bin indices $(b^2_{\{1\}}, b^2_{\{2\}}, b^2_{\{1,2\}})$ are routed from $N_2$ to the corresponding sinks.

The third stage of encoding resembles the 'Power Binning' scheme described in Section 3.2. Every typical sequence of $X_1$ is assigned a random bin index uniformly chosen over $\{1, \ldots, 2^{nr^1_0}\}$. All sequences with the same index, $c^1$, form a bin $B^1_0(c^1)$. Upon observing a sequence with bin index $c^1$, in addition to $(b^1_{\{1\}}, b^1_{\{2\}}, b^1_{\{1,2\}})$ (from the second stage of encoding), encoder $N_1$ also routes index $c^1$ to sink $S_1$. Similarly, bin index $c^2$ is routed from $N_2$ to $S_2$ in addition to $(b^2_{\{1\}}, b^2_{\{2\}}, b^2_{\{1,2\}})$. These bin indices are used to reconstruct $X_1$ and $X_2$ losslessly at the respective decoders. Note that, in a general setup, if source $X_i$ is to be reconstructed at a subset of sinks $\mathcal{S}_i$, the source assigns $2^{|\mathcal{S}_i|} - 1$ independently generated indices, each being routed to a different nonempty subset of $\mathcal{S}_i$. We also note that $U^1_{\{1\}}$ and $U^2_{\{2\}}$ can be conveniently set to constants without changing the overall rate region. However, we continue to use them to avoid complex notation.
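The second-stage binning can be sketched as follows (a hypothetical toy, with the codebook rate $\rho$ and bin rate $r$ as placeholders): only the bin index of the codeword chosen in stage one enters the packet, so each sink must resolve the residual ambiguity of roughly $2^{n(\rho - r)}$ codewords per bin using its other received indices.

```python
# Hypothetical sketch of second-stage binning: 2^{n rho} codewords of one
# auxiliary codebook are assigned uniformly at random to 2^{n r} bins; only
# the bin index enters the packet, leaving ~2^{n(rho - r)} candidates per bin.
import random

random.seed(1)
n, rho, r = 8, 0.75, 0.25                     # illustrative rates
num_codewords = 2 ** round(n * rho)           # 64 codewords
num_bins = 2 ** round(n * r)                  # 4 bins

bin_index = [random.randrange(num_bins) for _ in range(num_codewords)]

def packet_payload(chosen_codeword):
    """Stage two sends only the bin index of the stage-one codeword."""
    return bin_index[chosen_codeword]

candidates = [cw for cw in range(num_codewords)
              if bin_index[cw] == packet_payload(5)]
print(len(candidates))                        # roughly num_codewords / num_bins
```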

Decoding: We again focus on the first sink $S_1$. It receives the indices $(b^1_{\{1\}}, b^1_{\{1,2\}}, b^2_{\{1\}}, b^2_{\{1,2\}}, c^1)$. It first looks for a unique pair of codewords from $B^1_{\{1,2\}}(b^1_{\{1,2\}})$ and $B^2_{\{1,2\}}(b^2_{\{1,2\}})$ which are jointly typical. Obviously, there is at least one pair, $(u^{1,n}_{\{1,2\}}, u^{2,n}_{\{1,2\}})$, which is jointly typical. The probability that no other pair of codewords is jointly typical approaches 1 if:

$(\rho^1_{\{1,2\}} - r^1_{\{1,2\}}) + (\rho^2_{\{1,2\}} - r^2_{\{1,2\}}) \leq I(U^1_{\{1,2\}}; U^2_{\{1,2\}})$   (13)

Noting that the number of codewords in each of these bins is $2^{n(\rho^i_{\{1,2\}} - r^i_{\{1,2\}})}$ for $i = 1, 2$, and applying the constraints on $\rho^1_{\{1,2\}}$ and $\rho^2_{\{1,2\}}$ from (12), we get the following constraint for $r^1_{\{1,2\}}$ and $r^2_{\{1,2\}}$:

$r^1_{\{1,2\}} + r^2_{\{1,2\}} \geq I(X_1; U^1_{\{1,2\}}) + I(X_2; U^2_{\{1,2\}}) - I(U^1_{\{1,2\}}; U^2_{\{1,2\}})$   (14)

The decoder at $S_1$ next looks at the codebooks of $U^1_{\{1\}}$ and $U^2_{\{1\}}$, which were generated conditioned on $U^1_{\{1,2\}}$ and $U^2_{\{1,2\}}$, respectively, to find a unique pair of codewords from $B^1_{\{1\}}(b^1_{\{1\}})$ and $B^2_{\{1\}}(b^2_{\{1\}})$ which are jointly typical with $(u^{1,n}_{\{1,2\}}, u^{2,n}_{\{1,2\}})$. We again have one pair, $(u^{1,n}_{\{1\}}, u^{2,n}_{\{1\}})$, which is jointly typical with $(u^{1,n}_{\{1,2\}}, u^{2,n}_{\{1,2\}})$. It can be shown using arguments similar to [3] that the probability of finding no other jointly typical pair approaches 1 if:

$\rho^1_{\{1\}} - r^1_{\{1\}} \leq I(U^1_{\{1\}}; U^2_{\{1\}}, U^2_{\{1,2\}} \mid U^1_{\{1,2\}})$
$\rho^2_{\{1\}} - r^2_{\{1\}} \leq I(U^2_{\{1\}}; U^1_{\{1\}}, U^1_{\{1,2\}} \mid U^2_{\{1,2\}})$
$(\rho^1_{\{1\}} - r^1_{\{1\}}) + (\rho^2_{\{1\}} - r^2_{\{1\}}) \leq I(U^1_{\{1\}}; U^2_{\{1,2\}} \mid U^1_{\{1,2\}}) + I(U^2_{\{1\}}; U^1_{\{1\}}, U^1_{\{1,2\}} \mid U^2_{\{1,2\}})$   (15)

On substituting the constraints for $\rho^1_{\{1\}}$ and $\rho^2_{\{1\}}$ from (12), and using the Markov chain condition in (11), we get:

$r^1_{\{1\}} \geq I(X_1; U^1_{\{1\}} \mid U^1_{\{1,2\}}) - I(U^1_{\{1\}}; U^2_{\{1\}}, U^2_{\{1,2\}} \mid U^1_{\{1,2\}})$
$r^2_{\{1\}} \geq I(X_2; U^2_{\{1\}} \mid U^2_{\{1,2\}}) - I(U^2_{\{1\}}; U^1_{\{1\}}, U^1_{\{1,2\}} \mid U^2_{\{1,2\}})$
$r^1_{\{1\}} + r^2_{\{1\}} \geq I(X_1; U^1_{\{1\}} \mid U^1_{\{1,2\}}) + I(X_2; U^2_{\{1\}} \mid U^2_{\{1,2\}}) - I(U^1_{\{1\}}; U^2_{\{1,2\}} \mid U^1_{\{1,2\}}) - I(U^2_{\{1\}}; U^1_{\{1\}}, U^1_{\{1,2\}} \mid U^2_{\{1,2\}})$   (16)

After successfully decoding the codewords $(u^{1,n}_{\{1\}}, u^{1,n}_{\{1,2\}}, u^{2,n}_{\{1\}}, u^{2,n}_{\{1,2\}})$, the decoder at $S_1$ looks for a unique sequence from $B^1_0(c^1)$ which is jointly typical with them. We again have $x_1^n$ satisfying this property. It can be shown that the probability of finding no other sequence which is jointly typical with the decoded codewords approaches 1 if:

$r^1_0 \geq H(X_1 \mid U^1_{\{1\}}, U^1_{\{1,2\}}, U^2_{\{1\}}, U^2_{\{1,2\}})$   (17)

Similar conditions at sink $S_2$ (together with individual constraints analogous to those in (16)) lead to the following constraints:

$r^1_{\{2\}} + r^2_{\{2\}} \geq I(X_1; U^1_{\{2\}} \mid U^1_{\{1,2\}}) + I(X_2; U^2_{\{2\}} \mid U^2_{\{1,2\}}) - I(U^1_{\{2\}}; U^2_{\{1,2\}} \mid U^1_{\{1,2\}}) - I(U^2_{\{2\}}; U^1_{\{2\}}, U^1_{\{1,2\}} \mid U^2_{\{1,2\}})$
$r^2_0 \geq H(X_2 \mid U^1_{\{2\}}, U^1_{\{1,2\}}, U^2_{\{2\}}, U^2_{\{1,2\}})$   (18)

The first packet from $N_1$, destined only to $S_1$, carries indices $(b^1_{\{1\}}, c^1)$ at rate $R^1_{\{1\}} = r^1_{\{1\}} + r^1_0$. The second and third packets carry $b^1_{\{2\}}$ and $b^1_{\{1,2\}}$ at rates $R^1_{\{2\}} = r^1_{\{2\}}$ and $R^1_{\{1,2\}} = r^1_{\{1,2\}}$, respectively, and are routed to the corresponding sinks. Similarly, 3 packets are transmitted from $N_2$ carrying indices $b^2_{\{1\}}$, $(b^2_{\{2\}}, c^2)$ and $b^2_{\{1,2\}}$ at rates $R^2_{\{1\}} = r^2_{\{1\}}$, $R^2_{\{2\}} = r^2_{\{2\}} + r^2_0$ and $R^2_{\{1,2\}} = r^2_{\{1,2\}}$ to sinks $S_1$, $S_2$ and $\{S_1, S_2\}$, respectively. Constraints for $(R^1_{\{1\}}, R^1_{\{2\}}, R^1_{\{1,2\}}, R^2_{\{1\}}, R^2_{\{2\}}, R^2_{\{1,2\}})$ can now be obtained using (14), (16), (17) and (18). The convex closure of achievable rates over all such random variables gives the achievable rate region for the 2 source - 2 sink DIR problem. It is easy to verify that this region subsumes the region that would be produced by employing the approach of Han and Kobayashi [3], which must assume three independent encoders at each source. Observe that in the above illustration, we assumed that the decoding is performed in a sequential manner, i.e., the codewords of $(U^1_{\{1,2\}}, U^2_{\{1,2\}})$ are decoded first, followed by the codewords of $(U^1_{\{1\}}, U^2_{\{1\}})$ and the source sequence, respectively. This was done only for ease of understanding. In Theorem 1, we derive the conditions on the rates for the decoders to find typical sequences from all the codebooks jointly (at once). Note that the conditions on the rates for joint decoding are generally weaker (the region is larger) than those for sequential decoding.

4 Dispersive Information Routing - General Setup

Let a network be represented by an undirected connected graph $G = (V, E)$. Each edge $e \in E$ is associated with an edge weight, $d_e$. The communication cost is assumed to be a simple product of the edge rate and edge weight, i.e., $C_e = R_e d_e$. The nodes consist of $M$ source nodes (denoted by $N_1, \ldots, N_M$), $K$ sinks (denoted by $S_1, \ldots, S_K$), and intermediate nodes. We define the index sets $\mathcal{M} = \{1, \ldots, M\}$ and $\mathcal{K} = \{1, \ldots, K\}$. Source node $N_i$ observes iid random variables $X_i$, each taking values over a finite alphabet $\mathcal{X}_i$. Sink $S_j$ reconstructs (requests) a subset of the sources specified by $\mathcal{R}_j \subseteq \mathcal{M}$. Conversely, source node $N_i$ is reconstructed at a subset of sinks specified by $\mathcal{S}_i = \{j \in \mathcal{K} : i \in \mathcal{R}_j\}$. The objective is to find the minimum communication cost achievable by dispersive information routing for lossless reconstruction of the requested sources at each sink when every source node can (possibly) communicate with every sink.

4.1 Obtaining the effective costs

Under DIR each source transmits at most $2^K - 1$ packets into the network, each meant for a different nonempty subset of sinks. Note that, while $\mathcal{S}_i$ is the subset of sinks reconstructing $X_i$, $N_i$ may be transmitting packets to many other subsets of sinks. Let the packet from source $N_i$ to the subset of sinks $\mathcal{S} \subseteq \mathcal{K}$ be denoted by $P^i_{\mathcal{S}}$ and let it carry information at rate $R^i_{\mathcal{S}}$.

The optimum route for packet $P^i_{\mathcal{S}}$ from the source to these sinks is determined by a spanning tree optimization (minimum Steiner tree) [19]. More specifically, for each packet $P^i_{\mathcal{S}}$, the optimum route is obtained by minimizing the cost over all trees rooted at node $N_i$ which span all sinks in $\mathcal{S}$. The minimum cost of transmitting packet $P^i_{\mathcal{S}}$ with $nR^i_{\mathcal{S}}$ bits from source $N_i$ to the subset of sinks $\mathcal{S}$, denoted by $C^i_{\mathcal{S}}$, is:

$C^i_{\mathcal{S}} = R^i_{\mathcal{S}} \, d^i_{\mathcal{S}}, \quad \text{where } d^i_{\mathcal{S}} = \min_{T \in \mathcal{T}^i_{\mathcal{S}}} \sum_{e \in T} d_e$   (19)

where $\mathcal{T}^i_{\mathcal{S}}$ denotes the set of all trees connecting source $N_i$ to the subset of sinks $\mathcal{S}$. Having obtained the effective cost for each packet in the network, our next objective is to find an achievable rate region for the tuple $\{R^i_{\mathcal{S}} : i \in \mathcal{M}, \, \emptyset \neq \mathcal{S} \subseteq \mathcal{K}\}$. The minimum communication cost then follows directly from a simple linear programming formulation. Note that the minimum Steiner tree problem is NP-hard and requires approximate algorithms to solve in practice. Also note that, in theory, each encoder transmits $2^K - 1$ packets into the network, while in practice we might be able to realize improvements over broadcast routing using significantly fewer packets (see e.g., [31]).
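The effective per-bit weight $d^i_{\mathcal{S}}$ in (19) is the weight of a minimum Steiner tree, which in practice is computed approximately. A sketch using the networkx approximation routine (the graph and weights below are illustrative placeholders):

```python
# Sketch (assuming networkx; graph and weights are illustrative) of the
# effective per-bit weight d^i_S in (19): the weight of an (approximate)
# minimum Steiner tree connecting source i to the sinks in S.
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.Graph()
G.add_weighted_edges_from([
    ('N1', 'c', 1.0), ('c', 'S1', 2.0), ('c', 'S2', 1.5), ('N1', 'S1', 4.0),
])

def effective_weight(source, sinks):
    T = steiner_tree(G, [source] + list(sinks), weight='weight')
    return sum(d['weight'] for _, _, d in T.edges(data=True))

print(effective_weight('N1', ['S1', 'S2']))   # cost per bit of packet P^1_{S1,S2}
```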

4.2 An achievable rate region

In what follows, we use the shorthand $U^i_{\mathcal{A}}$ for the collection $\{U^i_{\mathcal{S}} : \mathcal{S} \in \mathcal{A}\}$, where $\mathcal{A}$ is a set of subsets of $\mathcal{K}$, and similarly $X_{\mathcal{B}}$ for $\{X_i : i \in \mathcal{B}\}$. Note the difference between $U^i_{\mathcal{A}}$ and $U^i_{\mathcal{S}}$: $U^i_{\mathcal{A}}$ is a set of variables, whereas $U^i_{\mathcal{S}}$ is a single variable. For example, $U^i_{2^{\{1,2\}}}$ denotes the set of variables $\{U^i_{\{1\}}, U^i_{\{2\}}, U^i_{\{1,2\}}\}$, whereas $U^i_{\{1,2\}}$ represents the single variable indexed by the subset $\{1,2\}$.

We first give a formal definition of a block code and an associated rate region for DIR. We denote the set $\{1, 2, \ldots, L\}$ by $\mathcal{I}_L$ for any positive integer $L$. We assume that the source node $N_i$ observes the random sequence $X_i^n$. An $\left(n, \{2^{nR^i_{\mathcal{S}}}\}\right)$ DIR-code is defined by the following mappings:

  • Encoders:

    $f^i_{\mathcal{S}} : \mathcal{X}_i^n \rightarrow \mathcal{I}_{2^{nR^i_{\mathcal{S}}}}, \quad \forall i \in \mathcal{M}, \; \emptyset \neq \mathcal{S} \subseteq \mathcal{K}$   (20)
  • Decoders:

    $g_j : \prod_{i \in \mathcal{M}} \prod_{\mathcal{S} \ni j} \mathcal{I}_{2^{nR^i_{\mathcal{S}}}} \rightarrow \prod_{i \in \mathcal{R}_j} \mathcal{X}_i^n, \quad \forall j \in \mathcal{K}$   (21)

Denoting $m^i_{\mathcal{S}} = f^i_{\mathcal{S}}(X_i^n)$, where $m^i_{\mathcal{S}} \in \mathcal{I}_{2^{nR^i_{\mathcal{S}}}}$, the decoder estimates are given by:

$\{\hat{X}_i^n : i \in \mathcal{R}_j\} = g_j\big(\{m^i_{\mathcal{S}} : i \in \mathcal{M}, \; \mathcal{S} \ni j\}\big)$   (22)

Note the correspondence between the encoder-decoder mappings and dispersive information routing. Observe that packet $P^i_{\mathcal{S}}$ carries index $m^i_{\mathcal{S}}$ at rate $R^i_{\mathcal{S}}$ from source $N_i$ to the subset of sinks $\mathcal{S}$. The probability of error is defined as:

$P_e^{(n)} = P\Big( \bigcup_{j \in \mathcal{K}} \bigcup_{i \in \mathcal{R}_j} \big\{ \hat{X}_i^n \neq X_i^n \big\} \Big)$   (23)

A rate tuple