Energy Efficient Distributed Coding for Data Collection in a Noisy Sparse Network
Abstract
We consider the problem of data collection in a two-layer network consisting of (1) links between distributed agents and a remote sink node; (2) a sparse network formed by these distributed agents. We study the effect of inter-agent communications on the overall energy consumption. Despite the sparse connections between agents, we provide an in-network coding scheme that reduces the overall energy consumption by a factor of compared to a naive scheme which neglects inter-agent communications. By providing lower bounds on both the energy consumption and the sparseness (number of links) of the network, we show that the proposed scheme is energy-optimal except for a factor of . The proposed scheme extends a previous work of Gallager [1] on noisy broadcasting from a complete graph to a sparse graph, while bringing in new techniques from error control coding and noisy circuits.
Index terms: graph codes, sparse codes, noisy networks, distributed encoding, scaling bounds.
1 Introduction
Consider a problem of collecting messages from distributed agents in a two-layer network. Each agent has one independent random bit , called the self-information bit. The objective is to collect all self-information bits in a remote sink node with high accuracy. Apart from a noisy channel directly connected to the sink node, each agent can also construct a few noisy channels to other agents. We assume that the inter-agent network has the advantage that an agent can transmit bits simultaneously to all its neighbors using a broadcast. However, constructing connections between distributed agents is difficult, meaning that the inter-agent network is required to be sparse.
Since agents are connected directly to the sink, there exists a simple scheme [1] which achieves polynomially decaying error probability with : for all such that , the th agent transmits to the sink times, where , to ensure that . Then, using the union bound, we have that . However, this naive scheme can only provide a solution in which the number of transmissions scales as . In this paper, we show that, by carrying out inter-agent broadcasts, we can reduce the number of transmissions between the distributed agents and the remote sink from to , and hence dramatically reduce the energy consumption. Moreover, we show that, for the inter-agent broadcasting scheme to work, only inter-agent connections are required.
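For concreteness, the naive repetition scheme can be sketched in a few lines of Python. The repetition count and the constant 2 in the exponent below are illustrative choices (the paper's exact constants are elided in this version): they make the per-bit failure probability at most 1/n^2, so that the union bound gives an overall error probability of at most 1/n, at the cost of O(n log n) total transmissions.

```python
import math
import random

def naive_collect(bits, erasure_p, seed=0):
    """Naive scheme (illustrative parameters): every agent repeats its
    self-information bit r = ceil(2 ln n / ln(1/eps)) times over its own
    BEC link to the sink; a bit is lost only if all r copies are erased,
    which happens with probability eps**r <= 1/n**2."""
    rng = random.Random(seed)
    n = len(bits)
    r = max(1, math.ceil(2 * math.log(n) / math.log(1 / erasure_p)))
    received = []
    for b in bits:
        # Keep the copies that survive independent erasures.
        copies = [b for _ in range(r) if rng.random() > erasure_p]
        received.append(copies[0] if copies else None)  # None = erased
    total_transmissions = n * r  # scales as O(n log n)
    return received, total_transmissions
```

The scheme needs no inter-agent links at all, which is exactly why its transmission count to the sink cannot be reduced below the repetition budget.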
A related problem is function computation in sensor networks [1, 3, 2, 4, 5, 6], especially the identity function computation problem [3, 2, 1]. In [1], Gallager designed a coding scheme with broadcasts for identity function computation in a complete graph. Here, we address the same problem in a much sparser graph and obtain the same scaling bound using a conceptually different distributed encoding scheme that we call a graph code. We also show that the required inter-agent graph is the sparsest possible except for a factor, in that the number of links in the sparsest graph achieving communications (energy consumption) has to be , if the error probability is required to be . In [3], Giridhar and Kumar studied the rate of computing type-sensitive and type-threshold functions in a random planar network. In [2], Karamchandani, Appuswamy and Franceschetti studied function computation in a grid network. Readers are referred to an extended version [7] for a thorough literature review.
From the perspective of coding theory, the proposed graph code is closely related to erasure codes that have low-density generator matrices (LDGM). In fact, the graph code in this paper is equivalent to an LDGM erasure code with noisy encoding circuitry [11], where the encoding noise is introduced by distributed encoding in the noisy inter-agent communication graph. Based on this observation, we show (in Corollary 1) that our result directly leads to a known result on LDGM codes. Similar results have been reported by Luby [8] for fountain codes, and by Dimakis, Prabhakaran and Ramchandran [9] and by Mazumdar, Chandar and Wornell [10] for distributed storage, all with noise-free encoding. In the extended version [7], we show that this LDGM code achieves sparseness (number of ones in the generator matrix) that is within a multiple of an information-theoretic lower bound. Finally, we briefly summarize the main technical contributions of this paper:

we extend the classic distributed data collection problem (identity function computation) to sparse graphs, and obtain the same scaling bounds on energy consumption;

we provide both upper and lower bounds on the sparseness (number of edges) of the communication graph for constrained energy consumption;

we extend classic results on LDGM codes to in-network computing with encoding noise.
2 System Model and Problem Formulations
Denote by the set of distributed agents. Assume that in the first layer of the network, each agent has a link to the sink node , and this link is a BEC (binary erasure channel) with erasure probability . Each transmission from a distributed agent to the sink consumes energy . We denote by the second layer of the network, i.e., a directed inter-agent graph. We assume that each directed link in is also a BEC with erasure probability . We denote by and the one-hop in-neighborhood and out-neighborhood of . Each broadcast from a node to all of its out-neighbors in consumes energy .
2.1 Data Gathering with Transmitting and Broadcasting
A computation scheme is a sequence of Boolean functions, such that at each time slot , a single node computes the function (whose arguments are made precise below), and either broadcasts the computed output bit to , or transmits it to . We assume that the scheme terminates in finite time, i.e., . The arguments of may consist of all the information that the broadcasting node has up to time , including its self-information bit , randomly generated bits, and information obtained from its in-neighborhood. A scheme has to be feasible, meaning that all arguments of must be available at before time . We only consider oblivious transmission schemes, i.e., the three-tuple and the decisions to broadcast or to transmit are predetermined. Denote by the set of all feasible oblivious schemes. For a feasible scheme , denote by the number of transmissions from to the sink, and by the number of broadcasts from to . Then, the overall energy consumption is
(1) 
Conditioned on the graph , the error probability is defined as , where denotes the final estimate of at the sink . It is required that , where is the target error probability and might be zero. We also impose a sparsity constraint on the problem, meaning that the number of edges in the second layer of the network is smaller than . The problem to be studied is therefore
(2) 
A related problem formulation is to minimize the number of edges (obtaining the sparsest graph) while making the energy consumption constrained:
(3) 
2.2 Lower Bounds on Energy Consumption and Sparseness
Theorem 1.
(Lower Bounds) For Problem 1, suppose , where . Then, the solution of Problem 1 satisfies
(4) 
For Problem 2, suppose and . Then, the solution of Problem 2 satisfies
(5) 
Proof.
Due to limited space, we only include a brief sketch of the proof idea. See Appendix A for a complete proof. First, for the th node, the probability that all transmissions and broadcasts to its neighbors are erased is . If this event happens for , all information about is erased, and hence the self-information bits cannot all be recovered. Thus,
(6) 
The above inequality can be relaxed by
(7) 
where is the target error probability. The lower bounds for Problem 1 and Problem 2 are obtained by relaxing the constraint using (7). In what follows, we provide some intuition for Problem 1 as an example. For Problem 1, we notice that, in order to make the overall energy in (1) smaller, we should either make smaller, or make smaller, while keeping large enough for to hold. In particular, we can make the following observations:

if , we should set , i.e. we should forbid from broadcasting. Otherwise, we should set ;

if , since , we can always make the energy consumption smaller by setting , i.e., we construct no out-edges from in the graph .
Using these observations, we can decompose the original optimization into two subproblems respectively regarding and . We can complete the proof using standard optimization techniques and basic inequalities. ∎
Remark 1.
Note that the lower bounds hold for individual graph instances with arbitrary graph topologies. Although the two lower bounds are not tight in all cases, we are especially interested in the case when the sparseness constraint satisfies and the energy constraint satisfies . In this case, we provide an upper bound that differs from the lower bound by a multiple of . In Section 4.1, we give a detailed comparison between the upper and the lower bounds.
3 Main Technique: Graph Code
In this section, we provide a distributed coding scheme in accordance with the goals of Problem 1 and Problem 2. The code considered in this paper, which we call the 3 graph code, is defined as
(8) 
where denotes the self-information bits and denotes the encoding output with length . This means that the code bit calculated by a node is either its self-information bit or the parity of the self-information bits in its in-neighborhood . Therefore, 3 codes are easy to encode using inter-agent broadcasts and admit distributed implementations. In what follows, we define the in-network computing scheme associated with the 3 code.
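As an illustration of this definition, the sketch below assumes (consistent with (8) and (12)) that the codeword is the systematic part followed by the local parities computed over GF(2); the function name and array layout are our own conventions.

```python
import numpy as np

def graph_encode(m, A):
    """Graph-code encoding (sketch): the codeword is the n self-information
    bits followed by n local parities, where parity j is the XOR of m_i
    over the in-neighborhood of node j, i.e. over the ones in column j of
    the adjacency matrix A. All arithmetic is modulo 2."""
    m = np.asarray(m, dtype=np.uint8)
    A = np.asarray(A, dtype=np.uint8)
    parities = (m @ A) % 2          # x = [m, m A] over GF(2)
    return np.concatenate([m, parities])
```

For example, with two nodes and adjacency matrix [[1,0],[1,1]] (columns list in-neighborhoods), the message [1,1] encodes to [1,1,0,1].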
3.1 In-network Computing Scheme
The in-network computing scheme has two steps. During the first step, each node in turn broadcasts its self-information bit to for times, where
(9) 
where and are two predetermined constants. Then, each node estimates all self-information bits from all its in-neighbors in . The probability that a certain bit is erased all times when transmitted from a node to one of its out-neighbors is
(10) 
If all information bits from its in-neighborhood are sent successfully, computes the local parity
(11) 
where is the th column of the adjacency matrix , and the summation is modulo 2. If any bit is not sent to successfully, i.e., erased all times, the local parity cannot be computed. In this case, is assumed to take the value ''. We denote the vector of all local parity bits by . If all nodes could successfully receive all information from their in-neighborhoods, we would have
(12) 
where is the adjacency matrix of the graph .
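Step one can be simulated directly; the sketch below (with illustrative names, using '?' as the erasure placeholder) treats each of the repeated broadcast copies as an independent use of the BEC, so an in-neighbor misses a bit only if all copies are erased:

```python
import random

def local_parities(bits, A, r, eps, seed=0):
    """Step one of the in-network scheme (sketch): node i broadcasts its
    bit r times; in-neighbor j misses it only if all r copies are erased
    (probability eps**r). A node whose in-neighborhood arrives incomplete
    reports the erasure symbol '?' instead of a parity."""
    rng = random.Random(seed)
    n = len(bits)
    parities = []
    for j in range(n):
        acc, ok = 0, True
        for i in range(n):
            if A[i][j]:  # i is an in-neighbor of j
                if all(rng.random() < eps for _ in range(r)):
                    ok = False      # bit i never reached node j
                    break
                acc ^= bits[i]
        parities.append(acc if ok else '?')
    return parities
```

With noiseless links (eps = 0) this reproduces the parities m A over GF(2) from the previous sketch.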
During the second step, each node transmits and the local parity to the sink exactly once. If a local parity has the value '', sends the value ''. Denote the received (possibly erased) version of the self-information bits at the sink by , and the received (possibly erased) version of the local parities by . Notice that there might be some bits in changed into the value '' during the second step. We denote all information gathered at the sink by . If all the connections between the distributed agents and from the distributed agents to the sink were perfect, the received information at the sink could be written as (8). However, the received version possibly contains erasures, so the sink carries out the Gaussian elimination algorithm to recover all information bits, using all non-erased information. If there are too many erased bits, leading to more than one possible decoded value , the sink claims an error.
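The sink's decoding step can be sketched as follows, under the same assumed systematic layout: every non-erased symbol contributes one linear equation over GF(2), Gaussian elimination is run on the stacked equations, and failure to pin down a unique message corresponds to the error event declared above (None marks an erased symbol; all names are illustrative).

```python
import numpy as np

def ge_decode(y, A):
    """Erasure decoding at the sink (sketch): keep the non-erased entries
    of y = [m, m A] as linear equations over GF(2) and run Gaussian
    elimination; return None if more than one message is consistent."""
    A = np.asarray(A, dtype=np.uint8)
    n = A.shape[0]
    rows, rhs = [], []
    for j, v in enumerate(y):
        if v is None:               # erased symbol carries no equation
            continue
        if j < n:
            e = np.zeros(n, dtype=np.uint8)
            e[j] = 1                # systematic bit: m_j = v
        else:
            e = A[:, j - n].copy()  # parity bit: <m, A[:, j-n]> = v
        rows.append(e)
        rhs.append(v)
    M = np.array(rows, dtype=np.uint8) if rows else np.zeros((0, n), np.uint8)
    b = np.array(rhs, dtype=np.uint8)
    piv, r = [], 0
    for c in range(n):              # forward elimination over GF(2)
        sel = next((i for i in range(r, M.shape[0]) if M[i, c]), None)
        if sel is None:
            continue
        M[[r, sel]] = M[[sel, r]]
        b[[r, sel]] = b[[sel, r]]
        for i in range(M.shape[0]):
            if i != r and M[i, c]:
                M[i] ^= M[r]
                b[i] ^= b[r]
        piv.append(c)
        r += 1
    if r < n:
        return None                 # rank deficient: decoding error
    m = np.zeros(n, dtype=np.uint8)
    for i, c in enumerate(piv):
        m[c] = b[i]
    return m
```

For instance, with A = [[1,0],[1,1]] and codeword [1,1,0,1], erasing the first systematic bit still leaves a full-rank system, while keeping only the last parity does not.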
In total, the energy consumption is
(13) 
where is defined in (9), and the constant in is introduced in the second step, when both the self-information bit and the local parity are transmitted to the sink.
4 Analysis of the Error Probability
First, we define a random graph ensemble based on Erdős-Rényi graphs [12]. In this graph ensemble, each node has a directed link to another node with probability , where is the same constant as in (9). All connections are independent of each other. We sample a random graph from this graph ensemble and carry out the in-network broadcasting scheme provided in Section 3.1. Then, the error probability is itself a random variable, because of the randomness in the graph sampling stage and the randomness of the input. We define as the expected error probability over the random graph ensemble.
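Sampling from this ensemble is straightforward; the sketch below returns an adjacency matrix with each directed edge drawn independently, and allows self-loops as in the analysis of Section 4.2 (names are illustrative).

```python
import random

def sample_er_digraph(n, p, seed=0):
    """Sample the directed Erdős-Rényi ensemble (sketch): each ordered
    pair (i, j), self-loops included, gets a directed edge independently
    with probability p. Returns the n-by-n adjacency matrix."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(n)]
            for _ in range(n)]
```

The expected number of edges is n^2 p, which concentrates sharply around its mean, a fact used below via the Chernoff bound.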
Theorem 2.
(Upper Bound on the Ensemble Error Probability) Suppose is a constant, is a constant, is the channel erasure probability and . Assume . Define
(14) 
and assume
(15) 
Then, for the transmission scheme in Section 3.1, we have
(16) 
That is to say, if , the error probability eventually decreases polynomially with . The rate of decrease can be maximized over all that satisfy (15).
Proof.
See Section 4.2. ∎
Thus, we have proved that the expected error probability, averaged over the graph code ensemble, decays polynomially with . Denote by the event that an estimation error occurs at the sink, i.e., ; then
(17) 
Since the number of edges in the directed graph is a Binomial random variable, using the Chernoff bound [13], we can get
(18) 
(19) 
which decays polynomially with . This means that there exists a graph code (graph topology) with links that achieves any required non-zero error probability when is large enough. Interestingly, the derivation above implies a more fundamental corollary for erasure coding over point-to-point channels. The following corollary states the result for communication with noise-free circuitry, while the conclusions in this paper (see Theorem 2) show the existence of an LDGM code that is tolerant of noisy and distributed encoding.
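The concentration step above uses the standard multiplicative Chernoff bound for a Binomial edge count. As a numeric sketch (the constant 3 in the exponent is the textbook form of the bound for 0 < delta <= 1, not a quantity from this paper):

```python
import math

def chernoff_upper_tail(mu, delta):
    """Multiplicative Chernoff bound (sketch): for a Binomial X with mean
    mu and 0 < delta <= 1, P(X >= (1 + delta) mu) <= exp(-delta^2 mu / 3).
    Applied here with mu = n^2 p, the expected number of directed edges."""
    return math.exp(-delta * delta * mu / 3)
```

Since mu grows with n, the probability of exceeding the mean by any fixed fraction vanishes, which is what lets a single good graph instance be extracted from the ensemble.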
Corollary 1.
For a discrete memoryless pointtopoint BEC with erasure probability , there exists a systematic linear code with rate
Proof.
See Appendix E. ∎
Remark 2.
In the extended version [7, Section VI], we discuss a distributed coding scheme, called 2, for a geometric graph. The 2 code divides the geometric graph into clusters and conquers each cluster using a dense code with length . Notice that the 2 code requires the same sparsity and the same number of broadcasts (and hence the same scaling of energy consumption) as 3. However, the scheduling cost of 2 is high. Further, it requires a powerful code with length , which is not practical for moderate (this is also a problem of the coding scheme in [1]). Nonetheless, the graph topology for the 2 code is deterministic, which does not require ensemble-type arguments.
4.1 Gap Between the Upper and the Lower Bounds
In this subsection, we compare the energy consumption and the graph sparseness of the 3 graph code with the two lower bounds in Theorem 1. First, we examine Problem 1 when and , which is the same case as the 3 graph code. In this case, the lower bound (4) has the following form:
(20) 
Under the mild condition , the lower bound can be simplified as
(21) 
The energy consumption of the 3 graph code has the form (see (13)), which has a multiplicative gap with the lower bound. Notice that if we make the assumption , i.e., the interagent communications are cheaper, the two bounds have the same scaling .
Then, we examine Problem 2 when and , which is also the same case as the 3 graph code. Notice that under mild assumptions, , which means that the condition in Theorem 1 holds when is large enough. In this case, the lower bound (5) takes the form
(22) 
The number of edges of the 3 graph code scales as . Therefore, the ratio between the upper and the lower bound satisfies
(23) 
4.2 An Upper Bound on the Error Probability
Lemma 1 below states that is upper-bounded by an expression which is independent of the input (the self-information bits). In Lemma 1, each term on the RHS of (24) can be interpreted as the probability that there exists a nonzero input vector that is confused with the all-zero vector after all the nonzero entries of are erased, in which case is indistinguishable from the all-zero channel input. For example, suppose the code length is , and the sent codeword and the output at the sink happen to be . In this case, we cannot distinguish between the input vector and based on the output at the sink.
Lemma 1.
The error probability can be upperbounded by
(24) 
where is the -dimensional zero vector.
Proof.
See Appendix B. ∎
Therefore, to upper-bound , we only need to consider the event mentioned above, i.e., a nonzero input of self-information bits being confused with the all-zero vector . This happens if and only if each entry of the received vector at the sink is either zero or ''. When and the graph are both fixed, different entries in are independent of each other. Thus, the ambiguity probability for a fixed nonzero input and a fixed graph instance is the product of the corresponding ambiguity probabilities of the entries in (each being a zero or a '').
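The ambiguity test underlying Lemma 1 is easy to state in code (using '?' for an erased symbol; the helper function is our own illustration):

```python
def confusable_with_zero(y):
    """Lemma 1 setting (sketch): a received vector is consistent with the
    all-zero message iff every entry is either 0 or the erasure symbol
    '?'. A nonzero message whose codeword is erased on all of its nonzero
    entries therefore cannot be distinguished from the zero message."""
    return all(v in (0, '?') for v in y)
```

The union bound in (25) simply sums the probability of this event over every nonzero candidate message.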
The ambiguity event for each entry may occur due to structural deficiencies in the graph topology as well as due to erasures. In particular, three events contribute to the error at the th entry of : the product of and the th column of is zero (topology deficiency); the th entry of is '' due to erasures in the first step; the th entry is '' due to an erasure in the second step. We denote these three events respectively by , and , where the superscript and the argument indicate that the events are for the th entry and conditioned on a fixed message vector . The ambiguity event on the th entry is the union of the above three events. Denote this union event by . By applying the union bound over all possible inputs, the error probability (for an arbitrary input ) can be upper bounded by
(25) 
In this expression, the randomness of lies in the random edge connections. We use the binary indicator to denote whether there is a directed edge from to . Note that we allow self-loops. By assumption, all random variables in are mutually independent:
(26) 
where equality (a) holds because, in the in-network computing scheme, the self-information bit and the local parity bit depend only on the in-edges of , i.e., the edge set , and because different in-edge sets and are independent (by the independence of link generation) for any pair with ; equality (b) follows from the law of iterated expectations.
Lemma 2.
Define as the number of ones in and , where is the erasure probability of the BECs and is a constant defined in (9). Further suppose . Then, for , it holds that
(27) 
For , it holds that
(28) 
where is the connection probability.
Proof.
See Appendix C for a complete proof. The main idea is to directly compute the probabilities of three error events , and for each bit . ∎
5 Conclusions
In this paper, we obtain both upper and lower scaling bounds on the energy consumption and on the number of edges in the inter-agent broadcast graph for the problem of data collection in a two-layer network. In the directed Erdős-Rényi graph ensemble, the average error probability of the proposed distributed coding scheme decays polynomially with the size of the graph. We show that the obtained code is almost optimal in terms of sparseness (it has nearly the minimum number of ones in the generator matrix) except for a multiplicative gap. Finally, we show a connection between our result and LDGM codes with noisy and distributed encoding.
Appendix A Proof of Theorem 1
First, we state a lemma that we will use in the proof.
Lemma 3.
Suppose the constants . Suppose , and suppose the minimization problem
(30) 
has a solution, i.e., the feasible region is not empty. Then, the solution of the above minimization problem satisfies
(31) 
Proof.
First, consider the case when are fixed. In this case, it can easily be shown from the KKT conditions that the minimum is attained when
which is equivalent to
(32) 
Since , we have . Therefore, for fixed , summing up (32) over all and plugging in , we get
(33) 
When , we can prove that the function is convex in . Therefore, the function is convex in . Using Jensen's inequality, we have that
(34) 
∎
For the th node, the probability that all transmissions and broadcasts are erased is lower bounded by
(35) 
If this event happens for any node, the instant messages cannot all be computed reliably, because all information about is erased. Thus, we have
(36) 
which is equivalent to . Using the AM-GM inequality, we have that
(37) 
Using the fact that , we have that
(38) 
Plugging in (35), we get
(39) 
where is the target error probability in Problem 1 and Problem 2. Note that to lower-bound the solutions of Problem 1 and Problem 2, we can always replace a constraint with a relaxed version. In the following proof, we always relax the constraint using (39), which only makes our lower bound looser, but still valid.
Consider Problem 1, in which we have a constraint on the sparseness , and a constraint on the error probability . Our goal is to minimize . Note that in this problem, we have the constraint that . We relax this constraint to , which still yields a legitimate lower bound.
First, we notice the following facts:

If , we should set . Otherwise, we should set .

If , we can always make the energy consumption smaller by setting .
Proof.
For the th node, if we keep fixed, the LHS of the constraint (39) does not change. Noticing that the energy spent at the th node can be written as , we conclude that we should set when . Otherwise, we should maximize , which means setting . This proves the first statement.
Based on the first statement, we have that, when , we set . Therefore, the constraint (39) does not contain for anymore, which means that further reducing does not affect the constraints. Thus, we should set , which can help relax the constraints for other . ∎
Without loss of generality, we assume . Using the two arguments above, we arrive at the following statement about the solution of the relaxed minimization Problem 1:
Statement A.1 : there exists , s.t.
1. for , , ;
2. for , , .
Since , we know that . We can then rewrite the original optimization problem as follows:
(40) 
When and are fixed, we decompose the problem into two subproblems:
(41) 
(42) 
According to Lemma 3, the first subproblem, if , satisfies the lower bound
(43) 
where
(44) 
The second subproblem can be solved using simple convex-optimization techniques, and the optimal solution satisfies
(45) 
Therefore, when is fixed,