# Information-Theoretic Bounds for Multiround Function Computation in Collocated Networks${}^{1}$

Nan Ma ECE Dept, Boston University
Boston, MA 02215
nanma@bu.edu
Prakash Ishwar ECE Dept, Boston University
Boston, MA 02215
pi@bu.edu
Piyush Gupta Bell Labs, Alcatel-Lucent
Murray Hill, NJ 07974
pgupta@research.bell-labs.com
###### Abstract

We study the limits of communication efficiency for function computation in collocated networks within the framework of multi-terminal block source coding theory. With the goal of computing a desired function of sources at a sink, nodes interact with each other through a sequence of error-free, network-wide broadcasts of finite-rate messages. For any function of independent sources, we derive a computable characterization of the set of all feasible message coding rates - the rate region - in terms of single-letter information measures. We show that when computing symmetric functions of binary sources, the sink will inevitably learn certain additional information which is not demanded in computing the function. This conceptual understanding leads to new improved bounds for the minimum sum-rate. The new bounds are shown to be orderwise better than those based on cut-sets as the network scales. The scaling law of the minimum sum-rate is explored for different classes of symmetric functions and source parameters.

## I Introduction

${}^{1}$The work of N. Ma and P. Ishwar was supported by the US National Science Foundation (NSF) under award (CAREER) CCF-0546598. The work of P. Gupta was supported in part by NSF Grant CNS-0519535. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

Both wired and wireless data networks, such as the Internet and mobile ad hoc and wireless mesh networks, have been designed with the goal of efficient data transfer as opposed to data processing. As a result, computation takes place only after all the relevant data has been moved. Two-way interaction is used primarily to improve the reliability of data reproduction rather than the efficiency of data processing. However, to maximize the data processing efficiency, it may be necessary for nodes to interact bidirectionally in multiple rounds to perform distributed computations in the network. In this paper we attempt to formalize this intuition through a distributed function computation problem where data processing efficiency is measured in terms of the total number of bits exchanged per sample computed. Our objective is to study the fundamental limits of multiround function computation efficiency within a distributed source coding framework, involving block-coding asymptotics and vanishing probability of function-computation error, for "collocated" networks where broadcast messages can be heard by all nodes. We derive an information-theoretic characterization of the set of feasible coding rates and explore the benefit of multiround communication.

This problem was studied in [1] within a communication complexity framework where computation is required to be error-free. For collocated networks and random planar multihop networks, the scaling law of the maximum rate of computation with respect to a growing size of the network was derived for divisible functions and two subclasses of symmetric functions namely type-sensitive and type-threshold functions. This work was extended in [2] to multihop networks having a finite maximum degree. In [2] it was also shown that for any network, if a nonzero per-sample error probability was allowed, the computation of a type-sensitive function could be converted to that of a type-threshold function. In [3] a min-cut bound was developed for acyclic network topology and was shown to be tight for tree networks and divisible functions.

In [4], a function computation problem in a collocated network was posed within a distributed block source coding framework, under the assumption that conditioned on the desired function, the observations of source nodes are independent. An information-theoretic lower bound for the sum-rate-distortion function was derived. It was shown that if the desired function and the observation noises are Gaussian, the lower bound is tight and there is no advantage to be gained, in terms of sum-rate, by broadcasting messages, in comparison to sending messages through point-to-point links from source nodes to the sink where the function is desired to be computed. Multiround (interactive) function computation in a two-terminal network was studied in [5, 6] within a distributed block source coding framework.

The impact of transmission noise on function computation was considered in [7, 8, 9] but without a block coding rate, i.e., only one source sample is available at each node. A joint source-channel function computation problem over noninteractive multiple-access channels was studied in [10]. Our focus is on the block source coding aspects of function computation and we assume that message exchanges are error-free.

The present work studies a multiround function computation problem in a collocated network within a multi-terminal source coding framework described in Sec. II. Sensors observe discrete memoryless stationary sources taking values in finite alphabets. The goal is to compute a samplewise function at a sink with a probability which tends to one as the block-length tends to infinity. We derive a computable characterization of the rate region and the minimum sum-rate in terms of information quantities (Sec. III). For computation of symmetric functions of binary sources, the sink is shown to inevitably obtain certain additional information, which is not demanded in computing the function (Sec. IV-A). This key observation is formalized under the vanishing block-error probability criterion (Lemma 2) and also the zero-error criterion (Lemma 3). This conceptual understanding leads to improved bounds for the minimum sum-rate (Sec. IV-B). These bounds are shown to be orderwise better than cut-set bounds as the size of the network grows. The scaling law of the minimum sum-rate is evaluated in different cases in Sec. IV-C.

## II Multiround Computation in Collocated Networks

Consider a network consisting of $m$ source nodes numbered $1,\ldots,m$, and one (un-numbered) sink (node). Each source node observes a discrete memoryless stationary source taking values in a finite alphabet. The sink has no source samples. For each $j\in[1,m]$ (footnote 2: when $a$ and $b$ are integers, $[a,b]$ denotes an integer interval, which is the set of all consecutive integers beginning with $a$ and ending with $b$), let $\mathbf{X}_j := (X_j(1),\ldots,X_j(n))\in(\mathcal{X}_j)^n$ denote the $n$ source samples which are available at node-$j$. To isolate the impact of the structure of the desired function on the efficiency of computation, we assume sources are independent, i.e., for $i\in[1,n]$, $(X_1(i),\ldots,X_m(i))\sim$ iid $p_{X^m}=\prod_{j=1}^{m}p_{X_j}$. Let $f:\mathcal{X}_1\times\cdots\times\mathcal{X}_m\to\mathcal{Z}$ be the function of interest at the sink and let $Z(i) := f(X_1(i),\ldots,X_m(i))$. The tuple $\mathbf{Z} := (Z(1),\ldots,Z(n))$, which denotes $n$ samples of the samplewise function of all the sources, is desired to be computed at the sink.

The communication takes place over $r$ rounds. In each round, source nodes broadcast messages according to the schedule $1,2,\ldots,m$. Each message depends on the source samples and all the previous messages which are available to the broadcasting node. Nodes are collocated, meaning that every broadcast message is recovered without error at every node. After $t = mr$ message broadcasts over $r$ rounds, the sink computes the samplewise function based on all the messages.

###### Definition 1

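An $r$-round distributed block source code for function computation in a collocated network with parameters $(r,n,|\mathcal{M}_1|,\ldots,|\mathcal{M}_t|)$ is the tuple $(e_1,\ldots,e_t,g)$ of $t$ block encoding functions $e_1,\ldots,e_t$ and a block decoding function $g$, of block-length $n$, where for every $j\in[1,t]$, $k=(j \bmod m)$ (footnote 3: $k=(j \bmod m)$ means that $k\in[1,m]$ and $m$ divides $(j-k)$),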

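$$e_j:(\mathcal{X}_k)^n\times\bigotimes_{i=1}^{j-1}\mathcal{M}_i\to\mathcal{M}_j,\qquad g:\bigotimes_{j=1}^{t}\mathcal{M}_j\to\mathcal{Z}^{n}.$$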

The output of $e_j$, denoted by $M_j$, is called the $j$-th message, $r$ is the number of rounds, and $t=mr$ is the total number of messages. The output of $g$ is denoted by $\widehat{\mathbf{Z}}$. For each $j\in[1,t]$, $(1/n)\log_2|\mathcal{M}_j|$ is called the $j$-th block-coding rate (in bits per sample).

Remarks: (i) Each message could be a null message ($|\mathcal{M}_j|=1$). By incorporating null messages, the multiround coding scheme described above subsumes all orders of message transfers from source nodes, and an $r$-round coding scheme subsumes an $r'$-round coding scheme if $r'<r$. (ii) Since the information available to the sink is also available to all source nodes, there is no advantage, in terms of sum-rate, in allowing the sink to send any message.

###### Definition 2

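A rate tuple $\mathbf{R}=(R_1,\ldots,R_t)$ is admissible for $r$-round function computation if, $\forall\epsilon>0$, $\exists\,\bar{n}(\epsilon,t)$ such that $\forall n>\bar{n}(\epsilon,t)$, there exists an $r$-round distributed block source code with parameters $(r,n,|\mathcal{M}_1|,\ldots,|\mathcal{M}_t|)$ satisfying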

$$\forall j\in[1,t],\ \ \frac{1}{n}\log_2|\mathcal{M}_j|\le R_j+\epsilon,\qquad P(\widehat{\mathbf{Z}}\neq\mathbf{Z})\le\epsilon.$$

The set of all admissible rate tuples, denoted by $\mathcal{R}_r$, is called the operational rate region for $r$-round function computation. The minimum sum-rate $R_{sum,r}$ is given by $\min_{\mathbf{R}\in\mathcal{R}_r}\sum_{j=1}^{t}R_j$. Note that since each message could be a null message, if $r'<r$, then $R_{sum,r'}\ge R_{sum,r}$ holds. The goal of this work is to obtain a single-letter characterization of the rate region (a computable characterization independent of block-length $n$), to study the scaling behavior of $R_{sum,r}$, and to investigate the benefit of multiround function computation.

## III Rate Region

The rate region for $r$-round function computation for $m$ independent sources can be characterized by Theorem 1, in terms of single-letter mutual information quantities involving auxiliary random variables satisfying Markov chain and conditional entropy constraints.

###### Theorem 1
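
$$\mathcal{R}_r=\Big\{\mathbf{R}\ \Big|\ \exists\,U^t\ \text{s.t.}\ \forall j\in[1,t]\ \text{and}\ k=(j \bmod m):\ R_j\ge I(X_k;U_j\mid U^{j-1}),\ \ U_j-(U^{j-1},X_k)-(X^{k-1},X_{k+1}^{m}),\ \ H(f(X^m)\mid U^t)=0\Big\},\tag{3.1}$$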

where $U^t$ are auxiliary random variables taking values in finite alphabets. Cardinality bounds on the alphabets of the auxiliary random variables can be derived using the Carathéodory theorem but are omitted.

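The proof of achievability follows from standard random conditional coding arguments and is briefly outlined as follows. For the $j$-th message, $j\in[1,t]$, node-$k$ ($k=(j \bmod m)$) quantizes $\mathbf{X}_k$ into $\mathbf{U}_j$ with $\mathbf{U}^{j-1}$ as side information, which is available at every node, so that every node can reproduce $\mathbf{U}_j$. After all the message transfers, the sink produces $\widehat{\mathbf{Z}}$ based on $\mathbf{U}^t$. The constraints in (3.1) ensure that $P(\widehat{\mathbf{Z}}\neq\mathbf{Z})\to 0$ as $n\to\infty$.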

The (weak) converse, given in Appendix A, is proved using standard information inequalities, suitably defining auxiliary random variables, and using time-sharing arguments. Specifically, , Uniform independent of , for all , , and for all , .

By adding all the rate inequalities in (3.1) and enforcing all the constraints, we have the following characterization of the minimum sum-rate.

###### Corollary 1

$$R_{sum,r}=\min_{U^t}I(X^m;U^t),\tag{3.2}$$

where $U^t$ are subject to all the Markov chain and conditional entropy constraints in (3.1).
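The step from (3.1) to (3.2) can be made explicit with a short derivation. The following is a sketch written for illustration under the constraints of (3.1), not a quotation of the paper's proof. For each $j$ with $k=(j \bmod m)$, the Markov chain in (3.1) makes the added conditional mutual information term in the first line equal to zero, and the remaining two steps are the chain rule.

```latex
\begin{align*}
\sum_{j=1}^{t} I(X_k;U_j \mid U^{j-1})
  &= \sum_{j=1}^{t}\Big[\, I(X_k;U_j \mid U^{j-1})
     + I(X^{k-1},X_{k+1}^{m};U_j \mid U^{j-1},X_k) \,\Big] \\
  &= \sum_{j=1}^{t} I(X^{m};U_j \mid U^{j-1}) \\
  &= I(X^{m};U^{t}).
\end{align*}
```

Hence every admissible tuple satisfies $\sum_{j}R_j\ge I(X^m;U^t)$ for some feasible $U^t$, and choosing $R_j=I(X_k;U_j\mid U^{j-1})$ meets this with equality.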

The Markov chain and conditional entropy constraints of (3.1) imply a key structural property which $U^t$ needs to satisfy. This property is described below in Lemma 1. This lemma provides a bridge between certain fundamental concepts which have been studied in the communication complexity literature [11] and distributed source coding theory. In order to state the lemma, we need to introduce some terminology used in the communication complexity literature [11]. A subset $\mathcal{A}\subseteq\mathcal{X}_1\times\cdots\times\mathcal{X}_m$ is called a rectangle if for every $i\in[1,m]$, there exists $\mathcal{S}_i\subseteq\mathcal{X}_i$ such that $\mathcal{A}=\bigotimes_{i=1}^{m}\mathcal{S}_i$. A set $\mathcal{A}$ is called $f$-monochromatic if the function $f$ is constant on $\mathcal{A}$. The support-set of a probability mass function $p$ is the set over which it is strictly positive and is denoted by $\operatorname{supp}(p)$.

###### Lemma 1

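Let $U^t$ be any set of auxiliary random variables satisfying the Markov chain and conditional entropy constraints in (3.1). If $\operatorname{supp}(p_{X^m})=\mathcal{X}_1\times\cdots\times\mathcal{X}_m$, then for any realization $u^t$ of $U^t$, $\mathcal{A}(u^t):=\{x^m : p_{X^mU^t}(x^m,u^t)>0\}$ is an $f$-monochromatic rectangle in $\mathcal{X}_1\times\cdots\times\mathcal{X}_m$.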

Proof: The Markov chains in (3.1) induce the following factorization of the joint probability.

$$p_{X^mU^t}(x^m,u^t)=p_{X^m}(x^m)\,p_{U_1|X_1}(u_1|x_1)\,p_{U_2|X_2U_1}(u_2|x_2,u_1)\cdots=:p_{X^m}(x^m)\prod_{i=1}^{m}\phi_i(x_i,u^t),$$

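where $\phi_i(x_i,u^t)$ is the product of all the factors having conditioning on $x_i$. For each $i\in[1,m]$, let $\mathcal{S}_i(u^t):=\{x_i : \phi_i(x_i,u^t)>0\}$. Since $\operatorname{supp}(p_{X^m})$ is the full product alphabet, $p_{X^m}(x^m)>0$ for every $x^m$, so the set $\mathcal{A}(u^t):=\{x^m : p_{X^mU^t}(x^m,u^t)>0\}$ satisfies $\mathcal{A}(u^t)=\bigotimes_{i=1}^{m}\mathcal{S}_i(u^t)$. Since $H(f(X^m)\mid U^t)=0$ holds, $\mathcal{A}(u^t)$ is $f$-monochromatic.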

## IV Computing Symmetric Functions of Binary Sources

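In this section, we focus on the problem of computing symmetric functions of $m$ nontrivial Bernoulli sources: $\forall i\in[1,m]$, $\mathcal{X}_i=\{0,1\}$, $p_{X_i}(1)=p_i$, where $p_i\in(0,1)$. Symmetric functions are invariant to any permutation of their arguments. A symmetric function of binary sources is completely determined by the (integer) sum of the sources $S:=\sum_{i=1}^{m}X_i$. In other words, $\exists f':[0,m]\to\mathcal{Z}$, such that $f(x^m)=f'\big(\sum_{i=1}^{m}x_i\big)$.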

###### Definition 3

Given a function $f':[0,m]\to\mathcal{Z}$, an interval $[a,b]\subseteq[0,m]$ is a maximal $f'$-monochromatic interval if (i) it is $f'$-monochromatic and (ii) it is not a proper subset of an $f'$-monochromatic interval.

The collection of all the maximal $f'$-monochromatic intervals can be constructed as follows. First, consider all the inverse images $\{f'^{-1}(z)\}_{z\in\mathcal{Z}}$. Next, each inverse image can be written as a disjoint union of nonadjacent intervals. The collection of all such intervals from all inverse images, denoted by $\{[a_v,b_v]\}_{v=1}^{v_{\max}}$, forms the collection of all the maximal $f'$-monochromatic intervals. Note that they also form a partition of $[0,m]$. Without loss of generality, we assume that these intervals are ordered so that $a_1=0$ and $a_{v+1}=b_v+1$ for every $v\in[1,v_{\max}-1]$.
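The construction described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper: the function name `maximal_monochromatic_intervals` and the argument `f_prime` (standing for the function of the sum $S$) are assumptions made here. Instead of forming inverse images explicitly, the sketch scans $0,1,\ldots,m$ once and cuts wherever the function value changes, which yields the same ordered partition.

```python
from typing import Callable, Hashable, List, Tuple


def maximal_monochromatic_intervals(
    f_prime: Callable[[int], Hashable], m: int
) -> List[Tuple[int, int]]:
    """Return the maximal monochromatic intervals of f_prime on {0, ..., m}.

    The intervals are returned in increasing order as (a_v, b_v) pairs and
    form a partition of {0, ..., m}.
    """
    intervals: List[Tuple[int, int]] = []
    start = 0
    for s in range(1, m + 1):
        # A change in the function value closes the current interval.
        if f_prime(s) != f_prime(s - 1):
            intervals.append((start, s - 1))
            start = s
    intervals.append((start, m))
    return intervals


if __name__ == "__main__":
    # Parity of four binary sources: every interval is a singleton.
    print(maximal_monochromatic_intervals(lambda s: s % 2, 4))
    # Minimum of four binary sources: equals 1 only when all sources are 1.
    print(maximal_monochromatic_intervals(lambda s: int(s == 4), 4))
```

For parity the scan cuts at every step, while for the minimum it produces only the two intervals $[0,m-1]$ and $[m,m]$.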

### IV-A Sink learns more than the result of function computation

Note that if $f'(S)=z$ then $S\in f'^{-1}(z)$ which is, in general, a disjoint union of several maximal $f'$-monochromatic intervals. Thus, if the sink can successfully compute the function $f$, one may expect that the sink can only estimate the value of $S$ as belonging to the union of several intervals. Somewhat surprisingly, however, it turns out that due to the structure of the multiround code, the sink will, in fact, be able to identify a single maximal monochromatic interval to which $S$ belongs as opposed to the union of several intervals. More surprisingly, the sink will be able to correctly identify the source-values at certain nodes. Lemma 2 formalizes this unexpected property and plays a central role in proving Theorem 2(i).

###### Lemma 2

Let $f$ be a symmetric function of $m$ binary variables and $\{[a_v,b_v]\}_{v=1}^{v_{\max}}$ the collection of all the maximal $f'$-monochromatic intervals associated with $f$. Let $X^m$ be $m$ independent nontrivial Bernoulli random variables and $U^t$ auxiliary random variables which satisfy the Markov chain and conditional entropy constraints in (3.1). Then for any $u^t\in\operatorname{supp}(p_{U^t})$, the following conditions hold.
(i) There exists $v(u^t)\in[1,v_{\max}]$ such that

$$P\big(S\in[a_{v(u^t)},b_{v(u^t)}]\ \big|\ U^t=u^t\big)=1.$$

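(ii) There exist $\mathcal{K}_1(u^t)\subseteq[1,m]$ and $\mathcal{K}_0(u^t)\subseteq[1,m]$ such that: $|\mathcal{K}_1(u^t)|=a_{v(u^t)}$, $|\mathcal{K}_0(u^t)|=m-b_{v(u^t)}$, and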

$$P\big(\forall i\in\mathcal{K}_1(u^t),\ \forall i'\in\mathcal{K}_0(u^t),\ X_i=1,\ X_{i'}=0\ \big|\ U^t=u^t\big)=1.$$

###### Proof:

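Due to Lemma 1, $\mathcal{A}(u^t):=\{x^m : p_{X^mU^t}(x^m,u^t)>0\}$ is an $f$-monochromatic rectangle, which can be expressed as $\bigotimes_{i=1}^{m}\mathcal{S}_i(u^t)$, where each $\mathcal{S}_i(u^t)$ is either $\{0\}$ or $\{1\}$ or $\{0,1\}$. Let $\mathcal{K}'_1:=\{i : \mathcal{S}_i(u^t)=\{1\}\}$ and $\mathcal{K}'_0:=\{i : \mathcal{S}_i(u^t)=\{0\}\}$. Let $a':=|\mathcal{K}'_1|$ and $b':=m-|\mathcal{K}'_0|$. It can be shown that the projection of $\mathcal{A}(u^t)$ under the linear transformation given by $s=\sum_{i=1}^{m}x_i$ is an $f'$-monochromatic interval $[a',b']$. Since $\{[a_v,b_v]\}_{v=1}^{v_{\max}}$ is the collection of all the maximal $f'$-monochromatic intervals, $\exists\,v(u^t)$ such that $[a',b']\subseteq[a_{v(u^t)},b_{v(u^t)}]$. Therefore (i) holds. Since $[a',b']\subseteq[a_{v(u^t)},b_{v(u^t)}]$, we have $|\mathcal{K}'_1|\ge a_{v(u^t)}$ and $|\mathcal{K}'_0|\ge m-b_{v(u^t)}$. Therefore (ii) holds. $\square$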
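The rectangle argument in this proof can be checked by brute force for small $m$. The sketch below is illustrative only; the helper name `monochromatic_rectangles` and its return convention are assumptions made here. It enumerates every rectangle in $\{0,1\}^m$, that is, every product of sides chosen from $\{0\}$, $\{1\}$, and $\{0,1\}$, keeps those on which the function is constant, and reports the range of sums each one covers: from the number of sides fixed to one up to $m$ minus the number of sides fixed to zero.

```python
from itertools import product
from typing import Callable, Hashable, Iterator, Tuple

Side = Tuple[int, ...]


def monochromatic_rectangles(
    f_prime: Callable[[int], Hashable], m: int
) -> Iterator[Tuple[Tuple[Side, ...], int, int]]:
    """Yield (sides, lo, hi) for each rectangle on which f is constant.

    f is the symmetric function x -> f_prime(sum(x)); [lo, hi] is the set of
    sums taken by points of the rectangle.
    """
    for sides in product(((0,), (1,), (0, 1)), repeat=m):
        values = {f_prime(sum(x)) for x in product(*sides)}
        if len(values) == 1:
            ones_fixed = sum(1 for s in sides if s == (1,))
            zeros_fixed = sum(1 for s in sides if s == (0,))
            yield sides, ones_fixed, m - zeros_fixed


if __name__ == "__main__":
    m = 3
    # Parity: only single points are monochromatic, so lo == hi everywhere.
    print(all(lo == hi for _, lo, hi in monochromatic_rectangles(lambda s: s % 2, m)))
    # Minimum: each rectangle sits inside [0, m-1] or is exactly the point m.
    print(all(hi <= m - 1 or (lo, hi) == (m, m)
              for _, lo, hi in monochromatic_rectangles(lambda s: int(s == m), m)))
```

In both cases every monochromatic rectangle projects into a single maximal monochromatic interval, and the sides fixed to $\{1\}$ or $\{0\}$ name specific nodes whose values are determined.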

Although learning that $f(X^m)=z$ is equivalent to learning that $S\in f'^{-1}(z)$, which is generally a union of several intervals, Lemma 2(i) shows that the structure of block source coding for function computation in collocated networks is such that the sink will inevitably learn the exact interval in which $S$ resides even though this information is not demanded in computing $f$. Similarly, Lemma 2(ii) shows that although learning that $S\in[a_v,b_v]$ is equivalent to learning that there exist $a_v$ nodes observing ones and $(m-b_v)$ nodes observing zeros, the sink will inevitably learn the identities of these nodes.

Lemma 2 describes a property of the single-letter characterization of the rate region. It does not, as such, have a direct operational significance. Hence the conclusions of the previous paragraph can only be accepted as intuitive interpretations. If, however, the block-error probability criterion in Definition 2 is replaced by the zero-error criterion as in [1], we obtain Lemma 3, which holds for every sample realization and gives operational significance to the results suggested by Lemma 2.

###### Lemma 3

Let be a symmetric function of binary variables and the collection of all the maximal -monochromatic intervals associated with . Let be independent nontrivial Bernoulli sources. For any -round, block-length code444The results of Lemma 3 hold for not only the multiround block coding strategy described in Definition 1 but also for the class of collision-free coding strategies defined in [1]. for computing in a collocated network, if , then given all the messages , for every sample , the following conditions hold. (i) There exists such that . (ii) There exist and such that: , and .

The proof of Lemma 3, given in Appendix B, is similar in structure to those of Lemmas 1 and 2.

Example: (Parity function) Let $f$ be the Boolean XOR function (parity) of $m$ binary variables. Then $f'(s)=(s \bmod 2)$ and $v_{\max}=m+1$. Thus for all $v\in[1,m+1]$, $a_v=b_v=v-1$, and all the $f'$-monochromatic intervals are singletons. For every sample $i$, if $f$ is computed with zero error, Lemma 3(i) shows that the sink ends up knowing $S(i)$ exactly, because every interval is now a singleton. In addition, Lemma 3(ii) shows that the sink will also identify $S(i)$ source nodes which observe ones and $(m-S(i))$ source nodes which observe zeros. Therefore the sink essentially needs all the raw data from all the source nodes in order to compute the parity function in a collocated network.

### IV-B Bounds for minimum sum-rate

Returning to the block-error probability criterion, Lemma 2 leads to the following bounds for $R_{sum,r}$ when $X^m\sim$ iid Bernoulli$(p)$, that is, $\forall i\in[1,m]$, $p_i=p$.

###### Theorem 2

Let $f$ be a symmetric function of $m$ binary variables and $\{[a_v,b_v]\}_{v=1}^{v_{\max}}$ the collection of all the maximal $f'$-monochromatic intervals associated with $f$. If $X^m\sim$ iid Bernoulli$(p)$, $p\in(0,1)$, then for all $r\in\mathbb{Z}^{+}$, (i)

$$R_{sum,r}\ \ge\ m\,h(p)-\sum_{v=1,\ a_v\neq b_v}^{v_{\max}}(b_v-a_v)\,h\!\left(\frac{E(S\mid S\in[a_v,b_v])-a_v}{b_v-a_v}\right)P(S\in[a_v,b_v]),$$

(ii)
(iii)
where $h(\cdot)$ is the binary entropy function.

Remark: The minimum sum-rate for "data downloading", where all source samples are to be reproduced at the sink, is $m\,h(p)$. Theorem 2(ii) explicitly bounds the efficiency of multiround broadcasting relative to data downloading. Since (ii) is proved by relaxing the lower bound in (i), the right side of (ii) is not greater than that of (i).
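The lower bound of Theorem 2(i) is straightforward to evaluate numerically, since $S$ is binomial when the sources are iid Bernoulli$(p)$. The following Python sketch is an illustration under that assumption; the function names and the representation of the intervals as a list of `(a_v, b_v)` pairs are choices made here, not notation from the paper.

```python
from math import comb, log2
from typing import List, Tuple


def binary_entropy(q: float) -> float:
    """Binary entropy h(q) in bits, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * log2(q) - (1.0 - q) * log2(1.0 - q)


def theorem2_i_lower_bound(
    intervals: List[Tuple[int, int]], m: int, p: float
) -> float:
    """Evaluate the right side of Theorem 2(i) for iid Bernoulli(p) sources.

    intervals lists the maximal monochromatic intervals (a_v, b_v) of the
    function of the sum S, where S ~ Binomial(m, p).
    """
    pmf = [comb(m, s) * p**s * (1.0 - p) ** (m - s) for s in range(m + 1)]
    bound = m * binary_entropy(p)
    for a, b in intervals:
        if a == b:
            continue  # singleton intervals are excluded from the sum
        prob = sum(pmf[a : b + 1])
        if prob == 0.0:
            continue
        cond_mean = sum(s * pmf[s] for s in range(a, b + 1)) / prob
        bound -= (b - a) * binary_entropy((cond_mean - a) / (b - a)) * prob
    return bound


if __name__ == "__main__":
    m, p = 4, 0.5
    # Parity: all intervals are singletons, so the bound equals m * h(p).
    print(theorem2_i_lower_bound([(s, s) for s in range(m + 1)], m, p))
    # Minimum: intervals [0, m-1] and [m, m].
    print(theorem2_i_lower_bound([(0, m - 1), (m, m)], m, p))
```

When every interval is a singleton the subtracted sum is empty and the bound coincides with the data-downloading rate $m\,h(p)$; for a constant function, whose only interval is $[0,m]$, the bound evaluates to zero.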

### IV-C Scaling law of minimum sum-rate

Consider a sequence of problems, where in the -th problem, , source nodes observe Bernoulli source samples which are iid both across samples and across nodes and is the desired function. Let be the minimum sum-rate of the -th problem. The scaling law of with respect to is explored in the following cases.

Case 1: We need to use the following fact.

###### Fact 1

For any , if , then such that .

###### Proof:

For any , if , then , which in turn implies that for , holds. \qed

If such that for every maximal -monochromatic interval , , then due to Fact 1, . Then due to Theorem 2(ii), , which implies that and data downloading is orderwise optimal. Conversely, if , then due to Theorem 2(ii), as . Therefore there exists a vanishing sequence such that holds for the -th problem. Due to Fact 1, there exists a sequence of maximal -monochromatic intervals such that as . In other words, multiround computation of symmetric functions of iid binary sources in collocated networks is orderwise more efficient than data downloading only if each sample of is determined with a probability which tends to one as .555We cannot, however, let nodes send nothing and set the output of the sink to be the determined function value because then, for each the probability of block error will tend to one with increasing block-length violating Definition 2.

Case 2: () For any symmetric function of iid Bernoulli sources, let . Theorem 2(i) and (iii) imply that . This shows that multiround computation can at most halve the minimum sum-rate of one-round computation. Since can be easily computed using the binomial distribution, can be easily evaluated within a factor of for all .

Case 3: (Type-sensitive functions, ) A sequence of symmetric functions of binary variables is type-sensitive if and such that , for every -monochromatic interval , (defined in [1], adapted to our notation). For example, the sum, mode, and parity functions are type-sensitive. For iid Bernoulli sources, it can be shown that by applying Theorem 2(i). Remark: For the zero-error criterion, the minimum worst-case sum-rate is also [1].

Case 4: (Type-threshold functions) A sequence of symmetric functions of binary variables is type-threshold, if there exist such that is -monochromatic for every (defined in [1], adapted to our notation). For example, the minimum and maximum functions are type-threshold. (i) If , then exponentially fast as . By applying Theorem 2(i) and (iii), we have , which is orderwise less than . (ii) If and , then , , , and , due to Theorem 2(ii), . Remark: For the zero-error criterion, the minimum worst-case sum-rate is [1].

### IV-D Comparison to cut-set bounds

How do the bounds given in Sec. IV-B behave in comparison to bounds based on cut-sets? We will show that in some cases they are orderwise tighter than cut-set bounds and in some cases they coincide with them.

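For any subset $\mathcal{S}\subseteq[1,m]$, let $\mathcal{S}^c:=[1,m]\setminus\mathcal{S}$. We can formulate a two-terminal interactive function computation problem with alternating message transfers [6] by regarding the set of source nodes in $\mathcal{S}$ as supernode-$\mathcal{S}$ and the other source nodes and the sink as supernode-$\mathcal{S}^c$. The sources $X_{\mathcal{S}}$ and $X_{\mathcal{S}^c}$ are available to supernode-$\mathcal{S}$ and supernode-$\mathcal{S}^c$ respectively and the function $f$ is to be computed at supernode-$\mathcal{S}^c$. Let $\mathcal{R}^{\mathcal{S},\mathcal{S}^c}$ denote the directed sum-rate region of the two-terminal problem, which is the set of tuples $(R_{\mathcal{S}\to\mathcal{S}^c},R_{\mathcal{S}^c\to\mathcal{S}})$ such that $R_{\mathcal{S}\to\mathcal{S}^c}$ and $R_{\mathcal{S}^c\to\mathcal{S}}$ are admissible directed sum-rates from $\mathcal{S}$ to $\mathcal{S}^c$ and from $\mathcal{S}^c$ to $\mathcal{S}$ respectively, for two-terminal interactive function computation with $t'$ alternating messages, where $t'$ is the minimum number of messages needed in the two-terminal problem to simulate the multiround code.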

For any multiround code for a collocated network, for every $i\in[1,m]$, let $r_i$ denote the sum-rate of the messages broadcast by node-$i$. This code can be mapped into a two-terminal interaction code for the two-terminal problem described above, which generates the same computation result. The directed sum-rate tuple is $\big(\sum_{i\in\mathcal{S}}r_i,\ \sum_{i\in\mathcal{S}^c}r_i\big)$, which should belong to the directed sum-rate region of the two-terminal problem. This leads to the following cut-set bound.

###### Theorem 3

(cut-set bound) For all $r\in\mathbb{Z}^{+}$,

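$$R_{sum,r}\ \ge\ R_{cut}:=\min_{\substack{\forall\mathcal{S}\subseteq[1,m],\ \left(\sum_{i\in\mathcal{S}}r_i,\ \sum_{i\in\mathcal{S}^c}r_i\right)\in\mathcal{R}^{\mathcal{S},\mathcal{S}^c}\\ \forall i\in[1,m],\ r_i\ge 0}}\ \sum_{i=1}^{m}r_i.\tag{4.3}$$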

One could also consider a different type of cut-set bound:

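$$R_{sum,r}\ \ge\ R'_{cut}:=\max_{\mathcal{S}\subseteq[1,m]}R_{sum}^{\mathcal{S},\mathcal{S}^c}:=\max_{\mathcal{S}\subseteq[1,m]}\left(\min_{\substack{\left(\sum_{i\in\mathcal{S}}r_i,\ \sum_{i\in\mathcal{S}^c}r_i\right)\in\mathcal{R}^{\mathcal{S},\mathcal{S}^c}\\ \forall i\in[1,m],\ r_i\ge 0}}\ \sum_{i=1}^{m}r_i\right),$$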

where is called the bi-directional minimum sum-rate of the two-terminal problem given by the cut-set . Note that . In fact, can be orderwise looser than . For example, for the problem in Prop. 1, and .

###### Proposition 1

If iid Bernoulli and , then .

###### Proof:

For any , if , by applying the cut-set bound for the two-terminal interactive function computation problem [6, Corollary 1(ii)], we have , . Adding the inequalities for all , we have . \qed

Since is an admissible sum-rate for the problem stated in Prop. 1, the cut-set bound is tight. Note that Theorem 2(i) also gives the same bound . However, in the following case, the cut-set bound is orderwise loose.

###### Proposition 2

If iid Bernoulli and , then .

###### Proof:

It is sufficient to show that , is feasible for the minimization problem in (4.3), which requires showing , . Let iid  Bernoulli and iid  Bernoulli. The computation of the two-terminal problem can be performed by the following two schemes. (i) (One-message scheme) Supernode- sends to supernode- at the rate . Therefore , which implies that . (ii) (Two-message scheme) Supernode- sends to supernode- at the rate . Then supernode- computes the samplewise minimum of and , and sends it back to supernode- with as side information available to both supernodes, at the rate . Therefore , which implies that . By evaluating the entropies, it can be shown that, if , then , otherwise . Therefore .

Detailed steps: If $|\mathcal{S}|\ge m/2$, then

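$$H(Y_{\mathcal{S}})=h\!\left(\frac{1}{2^{|\mathcal{S}|}}\right)\le\frac{\log_2\!\big(e\,2^{|\mathcal{S}|}\big)}{2^{|\mathcal{S}|}}\le\frac{\log_2\!\big(8^{|\mathcal{S}|}\big)}{2^{|\mathcal{S}|}}\le\frac{3|\mathcal{S}|}{2^{m/2}},$$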

where the first inequality is because and the second inequality is because . Therefore . If , then . If ,

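$$H\big(\min(Y_{\mathcal{S}},Y_{\mathcal{S}^c})\ \big|\ Y_{\mathcal{S}^c}\big)=\frac{1}{2^{|\mathcal{S}^c|}}\,h\!\left(\frac{1}{2^{|\mathcal{S}|}}\right)\le\frac{3|\mathcal{S}|}{2^{m}}\le\frac{3|\mathcal{S}|}{2^{m/2}}.$$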

Otherwise (), . Therefore . \qed
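The first chain of entropy inequalities in the detailed steps can be spot-checked numerically. The sketch below is illustrative; the function names and the tolerance are assumptions made here. It writes $s$ for the cut size $|\mathcal{S}|$ and uses the identity $\log_2(8^{s})=3s$ directly to avoid forming large integers. The last step of the chain relies on $s\ge m/2$, so the check is run only for such pairs.

```python
from math import e, log2
from typing import Tuple


def binary_entropy(q: float) -> float:
    """Binary entropy h(q) in bits, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * log2(q) - (1.0 - q) * log2(1.0 - q)


def entropy_chain(s: int, m: int) -> Tuple[float, float, float, float]:
    """Return h(2^-s), log2(e 2^s)/2^s, log2(8^s)/2^s and 3s/2^(m/2)."""
    q = 2.0 ** (-s)
    return (
        binary_entropy(q),
        log2(e * 2.0**s) / 2.0**s,
        3.0 * s / 2.0**s,  # log2(8^s) = 3s
        3.0 * s / 2.0 ** (m / 2.0),
    )


def chain_holds(s: int, m: int, tol: float = 1e-12) -> bool:
    """Check that the four terms are nondecreasing, up to a small tolerance."""
    terms = entropy_chain(s, m)
    return all(x <= y + tol for x, y in zip(terms, terms[1:]))


if __name__ == "__main__":
    # The chain should hold whenever 1 <= s <= m and s >= m/2.
    print(all(chain_holds(s, m)
              for m in range(1, 21)
              for s in range(1, m + 1) if 2 * s >= m))
    # It can fail when s < m/2, e.g. s = 1 and m = 10.
    print(chain_holds(1, 10))
```

The failing case with $s<m/2$ shows why the argument treats small and large cuts separately.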

Since the problem considered in Prop. 2 is a special case of Case 4(i), due to Theorem 2 we have . Therefore the exponentially vanishing cut-set bound given by Theorem 3 is orderwise loose.

## V Concluding Remarks

We studied function computation in collocated networks using a distributed block source coding framework. We showed that in computing symmetric functions of binary sources, the sink will inevitably obtain certain additional information which is not part of the problem requirement. Leveraging this conceptual understanding we developed bounds for the minimum sum-rate and showed that they can be better than cut-set bounds by orders of magnitude. Directions for future work include characterizing the scaling law of the minimum sum-rate for large source alphabets and general multihop networks.

## Appendix A Converse proof of Theorem 1

Suppose a rate tuple is admissible for -round function computation. By Definition 2, , such that , there exists an -round distributed source code satisfying and . Define auxiliary random variables as follows: 666 means , and