Distributed Source Coding for
Interactive Function Computation
A two-terminal interactive distributed source coding problem with alternating messages for function computation at both locations is studied. For any number of messages, a computable characterization of the rate region is provided in terms of single-letter information measures. While interaction is useless in terms of the minimum sum-rate for lossless source reproduction at one or both locations, the gains can be arbitrarily large for function computation even when the sources are independent. For a class of sources and functions, interaction is shown to be useless, even with infinite messages, when a function has to be computed at only one location, but is shown to be useful, if functions have to be computed at both locations. For computing the Boolean AND function of two independent Bernoulli sources at both locations, an achievable infinite-message sum-rate with infinitesimal-rate messages is derived in terms of a two-dimensional definite integral and a rate-allocation curve. A general framework for multiterminal interactive function computation based on an information exchange protocol which successively switches among different distributed source coding configurations is developed. For networks with a star topology, multiple rounds of interactive coding are shown to decrease the scaling law of the total network rate by an order of magnitude as the network grows.
distributed source coding, function computation, interactive coding, rate-distortion region, Slepian-Wolf coding, two-way coding, Wyner-Ziv coding.
I Introduction

This material is based upon work supported by the US National Science Foundation (NSF) under award (CAREER) CCF–0546598. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. A part of this work was presented at ISIT’08.
In networked systems where distributed inferencing and control need to be performed, the raw data (source samples) generated at different nodes (information sources) need to be transformed and combined in a number of ways to extract actionable information. This requires performing distributed computations on the source samples. A pure data-transfer solution approach would advocate first reliably reproducing the source samples at decision-making nodes and then performing suitable computations to extract actionable information. Two-way interaction and statistical dependencies among source, destination, and relay nodes would be utilized, if at all, primarily to improve the reliability of data-reproduction rather than the overall computation-efficiency.
However, to maximize the overall computation-efficiency, it is necessary for nodes to interact bidirectionally, perform computations, and exploit statistical dependencies in data as opposed to only generating, receiving, and forwarding data. In this paper we attempt to formalize this common wisdom through some examples of distributed function-computation problems with the goal of minimizing the total number of bits exchanged per source sample. Our objective is to highlight the role of interaction in computation-efficiency within a distributed source coding framework involving block-coding asymptotics and vanishing probability of function-computation error. We derive information-theoretic characterizations of the set of feasible coding-rates for these problems and explore the fascinating interplay of function-structure, distribution-structure, and interaction.
I-A Problem setting
Consider the following general two-terminal interactive distributed source coding problem with alternating messages illustrated in Figure 1. Here, samples , of an information source are available at location . A different location has samples of a second information source which are statistically correlated to . Location desires to produce a sequence such that where is a nonnegative distortion function of variables. Similarly, location desires to produce a sequence such that . All alphabets are assumed to be finite. To achieve the desired objective, coded messages, , of respective bit rates (bits per source sample), , are sent alternately from the two locations, starting with one of the two locations. The message sent from a location can depend on the source samples at that location and on all the previous messages (which are available to both locations). There is enough memory at both locations to store all the source samples and messages. An important goal is to characterize the set of all rate -tuples for which both and as . This set of rate-tuples is called the rate region.
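As a plain-code sketch of the alternating-message structure just described (not the paper's formal definition, which appears in Section II-A), each encoder in turn sees only its own source block and the shared transcript of earlier messages; the location labels "A"/"B" and the callables below are hypothetical placeholders.

```python
def interact(x_block, y_block, encoders, decoders, start="A"):
    """Run alternating messages: encoders[i] sees its own source block
    plus the full transcript of all earlier messages, then both
    locations decode from their samples and the common transcript."""
    transcript = []
    sender = start
    for enc in encoders:
        block = x_block if sender == "A" else y_block
        msg = enc(block, tuple(transcript))  # own samples + past messages only
        transcript.append(msg)
        sender = "B" if sender == "A" else "A"
    z_a = decoders["A"](x_block, tuple(transcript))
    z_b = decoders["B"](y_block, tuple(transcript))
    return z_a, z_b
```

A two-message toy instance: location A sends its block verbatim and location B replies with the samplewise AND; real codes would of course compress these messages down to the rates the theorems characterize.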
I-B Related work
The available literature closely related to this problem can be roughly partitioned into three broad categories. The salient features of related problems in these three categories are summarized below using the notation of the problem setting described above.
I-B1 Communication complexity 
Here, and are typically deterministic, is not fixed in advance, and and are the indicator functions of the sets and respectively. Thus, the goal is to compute the function at location and the function at location . Both deterministic and randomized coding strategies have been studied. If coding is deterministic, the functions are required to be computed without error, i.e., . If coding is randomized, with the sources of randomness independent of each other and of the source sequences, then and are random variables. In this case, computation could be required to be error-free and the termination time random (the Las-Vegas framework) or the termination time could be held fixed but large enough to keep the probability of computation error smaller than some desired value (the Monte-Carlo framework).
The coding-efficiency for function computation is called communication complexity. When coding is deterministic, communication complexity is measured in terms of the minimum value, over all codes, of the total number of bits that need to be exchanged between the two locations, to compute the functions without error, irrespective of the values of the sources. When coding is randomized, both the worst-case and the expected value of the total number of bits, over all sources of randomization, have been considered. The focus of much of the literature has been on establishing order-of-magnitude upper and lower bounds for the communication complexity and not on characterizing the set of all source coding rate tuples in bits per source sample. In fact, the ranges of and considered in the communication complexity literature are often orders of magnitude smaller than their domains. This would correspond to a vanishing source coding rate.
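A standard tool for the deterministic lower bounds mentioned here is the fooling-set method (a classical communication-complexity technique, not one developed in this paper). A sketch, checking the fooling-set property mechanically for the equality function on n-bit inputs:

```python
from math import log2

def is_fooling_set(pairs, f):
    """Classical fooling-set property: all pairs share one function
    value c, and mixing the coordinates of any two distinct pairs
    breaks that value for at least one of the mixed inputs."""
    vals = {f(x, y) for x, y in pairs}
    if len(vals) != 1:
        return False
    c = vals.pop()
    for x1, y1 in pairs:
        for x2, y2 in pairs:
            if (x1, y1) != (x2, y2):
                if f(x1, y2) == c and f(x2, y1) == c:
                    return False
    return True

n = 3
eq = lambda x, y: int(x == y)
diag = [(x, x) for x in range(2 ** n)]
assert is_fooling_set(diag, eq)
# Deterministic communication complexity of equality on n-bit inputs
# is therefore at least log2(|fooling set|) = n bits.
print(log2(len(diag)))  # 3.0
```

Note that this counts worst-case total bits for a single input pair, which is the communication-complexity notion; the paper's rate region instead counts bits per source sample over long blocks.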
Recently, however, Giridhar and Kumar successfully applied the communication complexity framework to study how the rate of function computation can scale with the size of the network for deterministic sources [2, 3]. They considered a network where each node observes a (deterministic) sequence of source samples and a sink node where the sequence of function values needs to be computed. To study how the computation rate scales with the network size, they considered the class of connected random planar networks and the class of co-located networks and focused on the divisible and symmetric families of functions.
I-B2 Interactive source reproduction
Kaspi  considered a distributed block source coding [5, Section 14.9] formulation of this problem for discrete memoryless stationary sources taking values in finite alphabets. However, the focus was on source reproduction with distortion and not function computation. The source reproduction quality was measured in terms of two single-letter distortion functions of the form and . Coupled single-letter distortion functions of the form and , and probability of block error for lossless reproduction, were not considered. For a fixed number of messages , a single-letter characterization of the sum-rate pair (not the entire rate region) was derived. However, no examples were presented to illustrate the benefits of two-way source coding. The key question: “does two-way (interactive) distributed source coding with more messages require a strictly smaller sum-rate than with fewer messages?” was left unanswered.
The recent paper by Yang and He  studied two-terminal interactive source coding for the lossless reproduction of a stationary non-ergodic source at with decoder side-information . Here, the code termination criterion depended on the sources and previous messages so that was a random variable. Two-way interactive coding was shown to be strictly better than one-way non-interactive coding.
I-B3 Interactive function computation
In , Yamamoto studied the problem where the source pair is a doubly symmetric binary source (DSBS), that is, a uniform binary source observed at one terminal together with its output through a binary symmetric channel at the other. One terminal is required to compute a Boolean function of the sources satisfying an expected per-sample Hamming distortion criterion, only one message is allowed, i.e., , and nothing is required to be computed at the other terminal, i.e., . This is equivalent to Wyner-Ziv source coding  with decoder side-information for a per-sample distortion function which depends on the decoder reconstruction and both the sources. Yamamoto computed the rate-distortion functions for all the Boolean functions of two binary variables and showed that they take only three distinct forms.
In , Han and Kobayashi studied a three-terminal problem where and are discrete memoryless stationary sources taking values in finite alphabets, is observed at terminal one and at terminal two, and terminal three wishes to compute a samplewise function of the sources losslessly. Terminals one and two can each send only a single message to terminal three. Han and Kobayashi characterized the class of functions for which the rate region of this problem coincides with the Slepian-Wolf  rate region.
Orlitsky and Roche  studied a distributed block source coding problem whose setup coincides with Kaspi’s problem  described above. However, the focus was on computing a samplewise function of the two sources at terminal using up to two messages (). Nothing was required to be computed at terminal , i.e., . Both probability of block error and per-sample expected Hamming distortion were considered. A single-letter characterization of the rate region was derived. Example 8 in  showed that the sum-rate with two messages is strictly smaller than with one message.
We study the two-terminal interactive function computation problem described in Section I-A for discrete memoryless stationary sources taking values in finite alphabets. The goal is to compute samplewise functions at one or both locations and the two functions can be the same or different. We focus on a distributed block source coding formulation involving a probability of block error which is required to vanish as the blocklength tends to infinity. We derive a computable characterization of the rate region and the minimum sum-rate for any finite number of messages in terms of single-letter information quantities (Theorem 1 and Corollary 1). We show how the rate-regions for different number of messages and different starting locations are nested (Proposition 1). We show how the Markov chain and conditional entropy constraints associated with the rate region are related to certain geometrical properties of the support-set of the joint distribution and the function-structure (Lemma 1). This relationship provides a link to the concept of monochromatic rectangles which has been studied in the communication complexity literature. We also consider a concurrent kind of interaction where messages are exchanged simultaneously and show how the minimum sum-rate is bounded by the sum-rate for alternating-message interaction (Proposition 2). We also consider per-sample average distortion criteria based on coupled single-letter distortion functions which involve the decoder output and both sources. For expected distortion as well as probability of excess distortion we discuss how the single-letter characterization of the rate-distortion region is related to the rate region for probability of block error (Section III-B).
Striking examples are presented to show how the benefit of interactive coding depends on the function-structure, computation at one/both locations, and the structure of the source distribution. Interactive coding is useless (in terms of the minimum sum-rate) if the goal is lossless source reproduction at one or both locations but the gains can be arbitrarily large for computing nontrivial functions involving both sources even when the sources are independent (Sections IV-A, IV-B, and IV-C). For certain classes of sources and functions, interactive coding is shown to have no advantage (Theorems 2 and 3). In fact, for doubly symmetric binary sources, interactive coding, even with an unbounded number of messages, is useless for computing any function at one location (Section IV-D) but is useful if computation is desired at both locations (Section IV-E). For independent Bernoulli sources, when the Boolean AND function is required to be computed at both locations, we develop an achievable infinite-message sum-rate with an infinitesimal rate for each message (Section IV-F). This sum-rate is expressed in analytic closed-form, in terms of two two-dimensional definite integrals, which represent the total rate flowing in each direction, and a rate-allocation curve which coordinates the progression of function computation.
We develop a general formulation of multiterminal interactive function computation in terms of an interaction protocol which switches among many distributed source coding configurations (Section V). We show how results for the two-terminal problem can be used to develop insights into optimum topologies for information flow in larger networks through a linear program involving cut-set lower bounds (Sections V-B and V-C). We show that allowing any arbitrary number of interactive message exchanges over multiple rounds cannot reduce the minimum total rate for the Körner-Marton problem . For networks with a star topology, however, we show that interaction can, in fact, decrease the scaling law of the total network rate by an order of magnitude as the network grows (Example 3 in Section V-C).
Notation: In this paper, the terms terminal, node, and location, are synonymous and are used interchangeably. The acronym ‘iid’ stands for independent and identically distributed and ‘pmf’ stands for probability mass function. Boldface letters such as, , etc., are used to denote vectors. Although the dimension of a vector is suppressed in this notation, it will be clear from the context. With the exception of the symbols , and , random quantities are denoted in upper case, e.g., , etc., and their specific instantiations are denoted in lower case, e.g., , etc. When denotes a random variable, denotes the ordered tuple and denotes the ordered tuple . However, for a set , denotes the -fold Cartesian product . The symbol denotes and denotes . The indicator function of set which is equal to one if and is zero otherwise, is denoted by . The support-set of a pmf is the set over which it is strictly positive and is denoted by . Symbols , and represent Boolean XOR, AND, and OR respectively.
II Two-terminal interactive function computation
II-A Interactive distributed source code
We consider two statistically dependent discrete memoryless stationary sources taking values in finite alphabets. For , let iid . Here, is a joint pmf which describes the statistical dependencies among the samples observed at the two locations at each time instant . Let and be functions of interest at locations and respectively, where and are finite alphabets. The desired outputs at locations and are and respectively, where for , and .
A (two-terminal) interactive distributed source code (for function computation) with initial location and parameters is the tuple of block encoding functions and two block decoding functions , of blocklength , where for ,
The output of , denoted by , is called the -th message, and is the number of messages. The outputs of and are denoted by and respectively. For each , is called the -th block-coding rate (in bits per sample).
Intuitively speaking, coded messages, , are sent alternately from the two locations starting with location . The message sent from a location can depend on the source samples at that location and on all the previous messages (which are available to both locations from previous message transfers). There is enough memory at both locations to store all the source samples and messages.
We consider two types of fidelity criteria for interactive function computation in this paper. These are 1) probability of block error and 2) per-sample distortion.
II-B Probability of block error and operational rate region
Of interest here are the probabilities of block error and which are multi-letter distortion functions. The performance of -message interactive coding for function computation is measured as follows.
A rate tuple is admissible for -message interactive function computation with initial location if, , such that , there exists an interactive distributed source code with initial location and parameters satisfying
The set of all admissible rate tuples, denoted by , is called the operational rate region for -message interactive function computation with initial location . The rate region is closed and convex due to the way it has been defined. The minimum sum-rate is given by where the minimization is over . For initial location , the rate region and the minimum sum-rate are denoted by and respectively.
II-C Per-sample distortion and operational rate-distortion region
Let and be bounded single-letter distortion functions. The fidelity of function computation can be measured by the per-sample average distortion
Of interest here are either the expected per-sample distortions and or the probabilities of excess distortion and . Note that although the desired functions and do not explicitly appear in these fidelity criteria, they are subsumed by and because they accommodate general relationships between the sources and the outputs of the decoding functions. The performance of -message interactive coding for function computation is measured as follows.
A rate-distortion tuple is admissible for -message interactive function computation with initial location if, , such that , there exists an interactive distributed source code with initial location and parameters satisfying
The set of all admissible rate-distortion tuples, denoted by , is called the operational rate-distortion region for -message interactive function computation with initial location . The rate-distortion region is closed and convex due to the way it has been defined. The sum-rate-distortion function is given by where the minimization is over all such that . For initial location , the rate-distortion region and the minimum sum-rate-distortion function are denoted by and respectively.
The admissibility of a rate-distortion tuple can also be defined in terms of the probability of excess distortion by replacing the expected distortion conditions in Definition 3 by the conditions and . Although these conditions appear to be more stringent (any tuple which is admissible according to the probability of excess distortion criterion is also admissible according to the expected distortion criterion), it can be shown, using strong-typicality arguments in the proof of the achievability part of the single-letter characterization of the rate-distortion region, that they lead to the same operational rate-distortion region. For simplicity, we focus on the expected distortion conditions as in Definition 3.
For a -message interactive distributed source code, if , then (null message), nothing needs to be sent in the last step, and the -message code reduces to a -message code. Thus the -message rate region is contained within the -message rate region. For generality and convenience, is allowed for all . The following proposition summarizes some key properties of the rate regions which are needed in the sequel.
(i) If , then . Hence . (ii) If , then . Hence . Similarly, . (iii) .
Proof: (i) Any -message code with initial location can be regarded as a special case of a -message code with initial location by taking . (ii) Any -message code with initial location can be regarded as a special case of a -message code with initial location by taking . (iii) From (i), and are nonincreasing in and bounded from below by zero, so the limits exist. From (ii), , hence the limits are equal.
Proposition 1 is also true for any fixed distortion levels if we replace rate regions and minimum sum-rates in the proposition by rate-distortion regions and sum-rate-distortion functions respectively.
II-E Interaction with concurrent message exchanges
In contrast to the type of interaction described in Section II-A which involves alternating message transfers, one could also consider another type of interaction which involves concurrent message exchanges. In the -th round of this type of interaction, two messages and are generated simultaneously by encoding functions (at location ) and (at location ) respectively. These messages are based on the source samples which are available at each location and on all the previous messages which are available to both locations from previous rounds of interaction. Then and are exchanged. In rounds, messages are transferred. After rounds of interaction, decoding functions and generate function estimates based on all the messages and the source samples which are available at locations and respectively. We can define the rate region and the rate-distortion region for concurrent interaction as in Sections II-B and II-C for alternating interaction. Let denote the minimum sum-rate for -round interactive function computation with concurrent message exchanges.
The following proposition shows how the minimum sum-rates for concurrent and alternating types of interaction bound each other. This is based on a purely structural comparison of alternating and concurrent modes of interaction.
(i) . (ii) .
Proof: (i) The first inequality holds because any -message interactive code with alternating messages and initial location can be regarded as a special case of a -round interactive code with concurrent messages by taking for all even and for all odd .
The second inequality can be proved as follows. Given any -round interactive code with concurrent messages and encoding functions , one can construct a -message interactive code with alternating messages as follows: (1) Set . (2) For , if is even, define as the combination of and , otherwise, define as the combination of and . (3) If is even, set , otherwise set . It can be verified by induction that the inputs of defined in this way are indeed available when these encoding functions are used. Hence these are valid encoding functions for interactive coding with alternating messages. This -message interactive code with alternating messages has the same sum-rate as the original -round interactive code with concurrent messages. Therefore we have .
(ii) This follows from (i).
Although a -round interactive code with concurrent messages uses messages, the sum-rate performance is bounded by that of an alternating-message code with only messages. When is large, the benefit of concurrent interaction over alternating interaction disappears. For this reason, and because for two-terminal function computation it is easier to describe results for alternating interaction, in Sections III and IV our discussion will be confined to alternating interaction. For multiterminal function computation, however, the framework of concurrent interaction becomes more convenient. Hence in Section V we consider multiterminal function computation problems with concurrent interaction.
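The message-combining construction in the proof of part (i) is pure rate bookkeeping: the first alternating message carries A's round-1 message, and each subsequent alternating message packs two consecutive concurrent messages from one side. A sketch that tracks only rates (not the encoding functions themselves, whose validity the proof verifies by induction):

```python
def concurrent_to_alternating(rounds):
    """rounds: list of (rate_A_i, rate_B_i), one pair per concurrent round.
    Pack them into at most t+1 alternating messages, preserving sum-rate:
    msg1 = A1, msg2 = B1+B2, msg3 = A2+A3, msg4 = B3+B4, ..."""
    t = len(rounds)
    a = [r[0] for r in rounds]
    b = [r[1] for r in rounds]
    msgs = [a[0]]
    i, j, turn = 1, 0, "B"  # next unpacked A-round and B-round indices
    while i < t or j < t:
        if turn == "B":
            msgs.append(sum(b[j:j + 2]))
            j += 2
            turn = "A"
        else:
            msgs.append(sum(a[i:i + 2]))
            i += 2
            turn = "B"
    return msgs
```

For t concurrent rounds this yields t+1 alternating messages with the same total rate, matching the structural comparison in Proposition 2.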
III Rate region
III-A Probability of block error
When the probability of block error is used to measure the quality of function computation, the rate region for -message interactive distributed source coding with alternating messages can be characterized in terms of single-letter mutual information quantities involving auxiliary random variables satisfying conditional entropy constraints and Markov chain constraints. This characterization is provided by Theorem 1.
where are auxiliary random variables taking values in alphabets with the cardinalities bounded as follows,
It should be noted that the right side of (3.1) is convex and closed. This is because is convex and closed and Theorem 1 shows that the right side of (3.1) is the same as . In fact the convexity and closedness of the right side of (3.1) can be shown directly without appealing to Theorem 1 and the properties of . This is explained at the end of Appendix A.
The proof of achievability follows from standard random coding and random binning arguments as in the source coding with side information problem studied by Wyner, Ziv, Gray, Ahlswede, and Körner  (also see Kaspi ). We only develop the intuition and informally sketch the steps leading to the proof of achievability. The key idea is to use a sequence of “Wyner-Ziv-like” codes. First, Enc.1 quantizes to using a random codebook-1. The codewords are further randomly distributed into bins and the bin index of is sent to location . Enc.2 identifies from the bin with the help of as decoder side-information. Next, Enc.2 jointly quantizes to using a random codebook-2. The codewords are randomly binned and the bin index of is sent to location . Enc.3 identifies from the bin with the help of as decoder side-information. Generally, for the -th message, odd, Enc. jointly quantizes to using a random codebook-. The codewords are randomly binned and the bin index of is sent to location . Enc. identifies from the bin with the help of as decoder side information. If is even, interchange the roles of locations and and sources and in the procedure for an odd . Note that implies the existence of a deterministic function such that . At the end of messages, Dec. produces by . Similarly, Dec. produces . The rate and Markov chain constraints ensure that all quantized codewords are jointly strongly typical with the sources and are recovered with a probability which tends to one as . The conditional entropy constraints ensure that the corresponding block error probabilities for function computation go to zero as the blocklength tends to infinity.
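The "bin index recovered with side information" step in this sketch can be illustrated with a toy stand-in: linear binning by syndromes of the (7,4) Hamming code in place of random binning. This is an illustration only (the actual proof uses random codebooks and strong typicality); here 7 source bits are compressed to a 3-bit bin index, decodable whenever the side information differs from the source block in at most one position.

```python
# Parity-check matrix of the (7,4) Hamming code:
# column c (for c = 1..7) is the binary expansion of c.
H = [[(c >> r) & 1 for c in range(1, 8)] for r in range(3)]

def syndrome(v):
    return tuple(sum(h * b for h, b in zip(row, v)) % 2 for row in H)

def encode(x):
    # The "bin index" of the length-7 source block x is its syndrome.
    return syndrome(x)

def decode(bin_idx, y):
    # Recover the unique block in bin bin_idx within Hamming distance 1 of y.
    s = tuple((a + b) % 2 for a, b in zip(bin_idx, syndrome(y)))
    if s == (0, 0, 0):
        return list(y)
    pos = s[0] + 2 * s[1] + 4 * s[2] - 1  # position whose column equals s
    x_hat = list(y)
    x_hat[pos] ^= 1
    return x_hat
```

The rate here is 3/7 bits per source bit, far below the 1 bit per bit needed without side information, which is the essence of the binning gain exploited at every message of the interactive code.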
The (weak) converse is proved in Appendix A following  using standard information inequalities, suitably defining auxiliary random variables, and using convexification (time-sharing) arguments. The conditional entropy constraints are established using Fano’s inequality as in [8, Lemma 1]. The proof of cardinality bounds for the alphabets of the auxiliary random variables is also sketched.
Proof: For (i), add all the rate inequalities in (3.1) enforcing all the constraints. Inequality (ii) can be proved either using (3.3) and relaxing the Markov chain constraints, or using the following cut-set bound argument. If is also available at location , then can be computed at location . Hence by the converse part of the Slepian-Wolf theorem, the sum-rate of all messages from to must be at least for to form . Similarly, the sum-rate of all messages from to must be at least .
Although (3.1) and (3.3) provide computable single-letter characterizations of and respectively for all finite , they do not provide a characterization for in terms of computable single-letter information quantities. This is because the cardinality bounds for the alphabets of the auxiliary random variable , given by (3.2), grow with .
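The cut-set bound in the proof of part (ii) above is a conditional entropy that is easy to evaluate for small pmfs. A sketch with assumed numbers (independent uniform bits and the AND function; these specific values are illustrative, not an example worked in the text):

```python
from math import log2

def H_cond(joint, f):
    """H(f(X,Y) | Y) for a joint pmf {(x, y): p}: the cut-set lower
    bound on the total rate of messages flowing toward the terminal
    that observes Y and must compute f."""
    py, pzy = {}, {}
    for (x, y), p in joint.items():
        py[y] = py.get(y, 0.0) + p
        z = f(x, y)
        pzy[(z, y)] = pzy.get((z, y), 0.0) + p
    return sum(p * log2(py[y] / p) for (z, y), p in pzy.items() if p > 0)

joint = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(H_cond(joint, lambda x, y: x & y))  # 0.5: when Y=1 the decoder
# still needs a full bit of X; when Y=0 it needs nothing.
```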
The Markov chain and conditional entropy constraints of (3.1) imply certain structural properties which the support-set of the joint distribution of the source and auxiliary random variables need to satisfy. These properties are formalized below in Lemma 1. This lemma provides a bridge between certain concepts which have played a key role in the communication complexity literature and distributed source coding theory. In order to state the lemma, we need to introduce some terminology used in the communication complexity literature. This is adapted to our framework and notation. A subset is called -monochromatic if the function is constant on . A subset is called a rectangle if for some and some . Subsets of the form , , are called rows, and subsets of the form , , are called columns of the rectangle . By definition, the empty set is simultaneously a rectangle, a row, and a column. If each row of a rectangle is -monochromatic, then is said to be row-wise -monochromatic. Similarly, if each column of a rectangle is -monochromatic, then is said to be column-wise -monochromatic. Clearly, if is both row-wise and column-wise -monochromatic, then it is an -monochromatic subset of .
Let be any set of auxiliary random variables satisfying the Markov chain and conditional entropy constraints of (3.1). Let denote the projection of the -slice of onto . If , then for all , the following four conditions hold. (i) is a rectangle. (ii) is row-wise -monochromatic. (iii) is column-wise -monochromatic. (iv) If in addition, , then is -monochromatic.
Proof: (i) The Markov chains in (3.1) induce the following factorization of the joint pmf.
where is the product of all the factors having conditioning on and is the product of all the factors having conditioning on . Let and . Since for all and , . (ii) This follows from the conditional entropy constraint in (3.1). (iii) This follows from the conditional entropy constraint in (3.1). (iv) This follows from parts (ii) and (iii) of this lemma.
Note that is the empty set if, and only if, . The above lemma holds for all values of . The fact that the set has a rectangular shape is a consequence of the fact that the auxiliary random variables need to satisfy the Markov chain constraints in (3.1). These Markov chain constraints are in turn consequences of the structural constraints which are inherent to the coding process: messages alternate from one terminal to the other and can depend on only the source samples and all the previously received messages which are available at a terminal. The rectangular property thus depends “less directly” on the function-structure than on the structure of the coding process and the structure of the joint source distribution. On the other hand, the fact that is row-wise and/or column-wise monochromatic is a consequence of the fact that the auxiliary random variables need to satisfy the conditional entropy constraints in (3.1). This property is more closely tied to the structure of the function and the structure of the joint distribution of the sources. Lemma 1 will be used to prove Theorems 2 and 4 in the sequel.
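The rectangle and monochromaticity properties in Lemma 1 are finite, mechanically checkable conditions on a support set. The helpers below (hypothetical names; S is a set of (x, y) pairs and f a function on them) make them concrete:

```python
def is_rectangle(S):
    """S is a rectangle iff S equals the product of its projections."""
    C = {x for x, _ in S}
    D = {y for _, y in S}
    return S == {(x, y) for x in C for y in D}

def is_row_monochromatic(S, f):
    """Every row {x} x D of S is f-monochromatic."""
    rows = {}
    for x, y in S:
        rows.setdefault(x, set()).add(f(x, y))
    return all(len(vals) == 1 for vals in rows.values())

def is_col_monochromatic(S, f):
    """Every column C x {y} of S is f-monochromatic."""
    cols = {}
    for x, y in S:
        cols.setdefault(y, set()).add(f(x, y))
    return all(len(vals) == 1 for vals in cols.values())
```

For instance, the full square {0,1} x {0,1} is a rectangle but is neither row-wise nor column-wise monochromatic for the AND function, which is the obstruction exploited when Lemma 1 is applied to prove the converse results.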
III-B Rate-distortion region
When per-sample distortion criteria are used, the single-letter characterization of the rate-distortion region is given by Theorem 1 with the conditional entropy constraints in (3.1) replaced by the following expected distortion constraints: there exist deterministic functions and , such that and . The proof of achievability is similar to that of Theorem 1. The distortion constraints are satisfied “automatically” by using strongly typical sets in the random coding and binning arguments. The proof of the converse given in Appendix A will continue to hold if equations (A.4) and (A.5) are replaced by and respectively, and the subsequent steps in the proof are changed appropriately.
The following proposition clarifies the relationship between the rate region for probability of block error and the rate-distortion region.
Let denote the Hamming distortion function. If , , and , then .
Proof: In order to show that , note that , we have and for the distortion function assumed in the statement of the proposition. Therefore .
In order to show that , note that such that , we have , which implies , which in turn implies . Similarly, we have . Therefore .
Does interaction really help? In other words, does interactive coding with more messages strictly outperform coding with fewer messages in terms of the sum-rate? When only one nontrivial function has to be computed at only one location, at least one message is needed. In this situation, interaction will be considered to be “useful” if there exists such that . When nontrivial functions have to be computed at both locations, at least two messages are needed, one going from to and the other from to . Since messages go in both directions, a two-message code can potentially be considered interactive. However, this is a trivial form of interaction because function computation is impossible without two messages. Therefore, in this situation, interaction will be considered to be useful if there exists such that . Corollary 1 does not directly tell us if or when interaction is useful. In this section we explore the value of interaction in different scenarios through some striking examples. Interaction does help in Examples IV-C, IV-E, and IV-F, and does not (even with infinite messages) in Examples IV-A, IV-B, and IV-D.
IV-A Interaction is useless for reproducing one source at one location: .
IV-B Interaction is useless for reproducing both sources at both locations: .
Unless or , at least two messages are necessary. From (3.4), . But this sum-rate is achievable by Slepian-Wolf coding, first with as the source and as decoder side-information, and then vice versa. Hence, by Proposition 1(i), for all .
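The non-interactive optimality claimed here can be checked numerically: the minimum sum-rate H(X|Y) + H(Y|X) is met by two one-way Slepian-Wolf codes, with no benefit from further messages. A sketch with assumed numbers (a DSBS-like pmf in which the two bits agree with probability 0.9; the text leaves the pmf symbolic):

```python
from math import log2

def conditional_entropies(joint):
    """Return (H(X|Y), H(Y|X)) for a joint pmf {(x, y): p}."""
    px, py, hxy = {}, {}, 0.0
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
        hxy -= p * log2(p)
    hx = -sum(p * log2(p) for p in px.values())
    hy = -sum(p * log2(p) for p in py.values())
    return hxy - hy, hxy - hx

joint = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}
h_x_given_y, h_y_given_x = conditional_entropies(joint)
# Minimum sum-rate for reproducing both sources at both locations,
# achieved by two one-way Slepian-Wolf codes (no interaction needed):
print(h_x_given_y + h_y_given_x)  # 2*h(0.1), about 0.938 bits/sample
```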
Examples IV-A and IV-B show that if the goal is source reproduction with vanishing distortion, interaction is useless. (However, interaction can prove useful for source reproduction when it is either required to be error-free [11, 12] or when the sources are stationary but non-ergodic.) To discover the value of interaction, we must study either nonzero distortions or functions which involve both sources. Our focus is on the latter.
IV-C Benefit of interaction can be arbitrarily large for function computation: , , , (real multiplication).
This is an expanded version of Example 8 in . At least one message is necessary. If , an achievable scheme is to send by Slepian-Wolf coding at the rate so that the function can be computed at location . Although location is required to compute only the samplewise product and is not required to reproduce , it turns out, rather surprisingly, that the one-message rate cannot be decreased. This is a direct consequence of a lemma due to Han and Kobayashi which we now state by adapting it to our situation and notation.
(Han and Kobayashi [8, Lemma 1]) Let . If , , there exists such that , then .
The condition of Lemma 2 is satisfied in our present example with . Therefore we have . With one extra message and initial location , however, can be reproduced at location by entropy-coding at the rate bits per sample. Then, can be computed at location and conveyed to location via Slepian-Wolf coding at the rate bits per sample, where is the binary entropy function. Therefore, . The benefit of even one extra message can be significant: For fixed , can be made arbitrarily large for suitably small . For fixed , can be made arbitrarily large for suitably large .
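With hypothetical numbers (the text leaves the alphabet size and the Bernoulli parameter symbolic), the one- versus two-message comparison in this example can be checked numerically. Here X is assumed uniform over 2**k values (so its entropy is k bits) and Y ~ Bernoulli(p), independent of X:

```python
from math import log2

def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def one_message_rate(k, p):
    # By the Han-Kobayashi lemma, the single message must carry X in
    # full: H(X|Y) = H(X) = k bits/sample (X and Y are independent).
    return k

def two_message_rate(k, p):
    # B first reveals Y by entropy coding at h(p) bits/sample; A then
    # sends X*Y with Y as decoder side information, which is nonzero
    # only when Y = 1, costing H(XY|Y) = p*k bits/sample.
    return h2(p) + p * k

k, p = 10, 0.01
print(one_message_rate(k, p) / two_message_rate(k, p))  # ratio grows as p -> 0
```

As the text notes, the ratio can be made arbitrarily large either by shrinking p for fixed k, or by growing k for fixed p.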
Extrapolating from this example, one might be led to believe that the benefit of interaction arises due to computing nontrivial functions which involve both sources as opposed to reproducing the sources themselves. In other words, the function-structure determines whether interaction is beneficial or not (recall that the sources were independent in this example). However, the structure of the joint distribution plays an equally important role and this aspect will be highlighted in the next example.