Lower bounds on information complexity via zero-communication protocols and applications
Abstract
We show that almost all known lower bound methods for communication complexity are also lower bounds for the information complexity. In particular, we define a relaxed version of the partition bound of Jain and Klauck [JK10] and prove that it lower bounds the information complexity of any function. Our relaxed partition bound subsumes all norm-based methods (e.g. the $\gamma_2$ method) and rectangle-based methods (e.g. the rectangle/corruption bound, the smooth rectangle bound, and the discrepancy bound), except the partition bound.
Our result uses a new connection between rectangles and zero-communication protocols where the players can either output a value or abort. We prove the following compression lemma: given a protocol for a function $f$ with information complexity $I$, one can construct a zero-communication protocol that has non-abort probability at least $2^{-O(I)}$ and that computes $f$ correctly with high probability conditioned on not aborting. Then, we show how such a zero-communication protocol relates to the relaxed partition bound.
We use our main theorem to resolve three of the open questions raised by Braverman [Bra12]. First, we show that the information complexity of the Vector in Subspace Problem [KR11] is $\Omega(n^{1/3})$, which, in turn, implies that there exists an exponential separation between quantum communication complexity and classical information complexity. Moreover, we provide an $\Omega(n)$ lower bound on the information complexity of the Gap Hamming Distance Problem.
1 Introduction
Information complexity is a way of measuring the amount of information Alice and Bob must reveal to each other in order to solve a distributed problem. The importance of this notion has been made apparent in recent years through a flurry of results that relate the information complexity of a function to its communication complexity. One of the main applications of information complexity is to prove direct sum theorems in communication complexity, namely to show that computing $n$ copies of a function costs $n$ times the communication of computing a single copy. Chakrabarti, Shi, Wirth and Yao [CSWY01] used information complexity to prove a direct sum theorem for simultaneous messages protocols (their notion is now usually called the external information complexity, whereas in this paper we work exclusively with what is often called the internal information complexity). Bar-Yossef et al. [BYJKS04] used the information cost in order to prove a linear lower bound on the two-way randomized communication complexity of Disjointness. More recently, information-theoretic techniques enabled the proof of the first non-trivial direct sum result for general two-way randomized communication complexity: the randomized communication complexity of $n$ copies of a function $f$ is at least $\tilde{\Omega}(\sqrt{n})$ times the randomized communication complexity of $f$ [BBCR10]. Then, Braverman and Rao [BR11] showed a tight relation between the amortized distributional communication complexity of a function and its internal information cost. Braverman [Bra12] defined interactive information complexity, a notion which is independent of the prior distribution of the inputs, and proved that it is equal to the amortized communication complexity of the function. Braverman and Weinstein [BW12] showed that the information complexity is lower bounded by discrepancy.
The main question pertaining to information complexity is its relation to communication complexity. On the one hand, the information complexity provides a lower bound on the communication complexity of the function, since the information leaked cannot exceed the length of the messages exchanged. However, it is still open whether the information complexity of a function can be much smaller than its communication complexity or whether the two notions are basically equivalent. In order to make progress towards this question, it is imperative to provide strong lower bounds for information complexity, and more specifically to see whether the lower bound methods for communication complexity also apply to information complexity.
Lower bound methods in communication complexity can be seen to fall into three main categories: the norm-based methods, such as the $\gamma_2$ method of Linial and Shraibman [LS09b] (see Lee and Shraibman's survey for an overview [LS09a]); the rectangle-based methods, such as discrepancy and the rectangle bound; and, of course, the information-theoretic methods, among which is information complexity. Recently, Jain and Klauck [JK10] introduced the smooth rectangle bound, as well as the stronger partition bound, and showed that they subsume both $\gamma_2$ and the rectangle bound.
The first lower bound on information complexity was proved by Braverman [Bra12], who showed that it is lower bounded by the logarithm of the communication complexity. Recently, Braverman and Weinstein showed that the discrepancy method lower bounds the information complexity [BW12]. Their result follows from a compression lemma for protocols: a protocol for a function $f$ that leaks $I$ bits of information implies the existence of a protocol with communication complexity $O(I)$ and advantage $2^{-O(I)}$ (over a random guess) in computing $f$.
1.1 Our results
In this paper, we show that all known lower bound methods for communication complexity, with the notable exception of the partition bound, generalize to information complexity. More precisely, we introduce the relaxed partition bound (in subsection 3.2), denoted by $\overline{\mathrm{prt}}^{\mu}_{\varepsilon}(f)$, which depends on the function to be computed $f$, the input distribution $\mu$, and the error parameter $\varepsilon$, and is such that the distributional communication complexity satisfies $D^{\mu}_{\varepsilon}(f) \ge \log \overline{\mathrm{prt}}^{\mu}_{\varepsilon}(f)$ for any $f$. We prove that the information complexity of a function is bounded below by the relaxed partition bound:
Theorem 1.1.
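There is a positive constant $C$ such that for all functions $f : \mathcal{I} \to \mathcal{Z}$, all $\varepsilon, \delta \in (0, 1/2]$, and all distributions $\mu$, we have $\mathrm{IC}_{\mu}(f, \varepsilon) \ge \frac{\delta^2}{C}\left(\log \overline{\mathrm{prt}}^{\mu}_{\varepsilon+3\delta}(f) - \log|\mathcal{Z}|\right) - \delta$.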
Since we show in subsection 3.2 that the relaxed partition bound subsumes the norm-based methods (e.g. the $\gamma_2$ method) and the rectangle-based methods (e.g. the rectangle/corruption bound, the smooth rectangle bound, and the discrepancy bound), all of these bounds are also lower bounds on the information complexity. Moreover, together with the direct sum theorem for information complexity, our main result implies a direct sum theorem on communication complexity for many notable functions (see Corollary 1.3 below).
Technique.
The key idea of our result is a new connection between communication rectangles and zero-communication protocols, where the players can either output a value or abort, but without communicating. A priori, it is surprising that protocols with no communication can actually provide some insight into the communication or information complexity of a function. However, this model, which has been extensively used in quantum information for the study of non-local games and Bell inequalities, turns out to be a very powerful tool for the study of classical communication and information complexity. The communication complexity of simulating distributions is known to be related to the probability of not aborting in zero-communication protocols that can abort [GG99, Mas02, BHMR03, BHMR06]. More recently, connections have been shown for specific lower bound methods: zero-communication protocols with error give rise to the factorization norm method [DKLR11], and the connection between the partition bound and zero-communication protocols with abort was studied in [LLR12].
In a deterministic zero-communication protocol with abort, each of the two players looks at their input and decides either to abort the protocol or to output some value $z$. The output of the protocol is $z$ if both players agree on $z$, and the protocol aborts otherwise. It is easy to see that for any deterministic zero-communication protocol with abort, the set of inputs where both players choose to output $z$ forms a rectangle, namely $A_z \times B_z$, where $A_z$ is the set of inputs on which Alice outputs $z$ and $B_z$ the set of inputs on which Bob outputs $z$. Hence the protocol is characterized by a set of rectangles, each labeled by an output. In a randomized protocol, we have instead a distribution over labeled rectangles.
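The rectangle structure can be checked directly on a small example. The following Python sketch is a hypothetical illustration (the decision rules `alice` and `bob` are made up, and `None` stands for the abort symbol): it runs a deterministic zero-communication protocol with abort and verifies that the inputs on which it outputs $z$ are exactly $A_z \times B_z$.

```python
from itertools import product

ABORT = None  # stands for the abort symbol ⊥


def run(alice, bob, x, y):
    """Output of a deterministic zero-communication protocol with abort."""
    a, b = alice(x), bob(y)
    return a if a is not ABORT and a == b else ABORT


def labeled_rectangles(alice, bob, X, Y, Z):
    """Map each output z to (A_z, B_z): the inputs on which Alice, resp. Bob, output z."""
    return {z: ({x for x in X if alice(x) == z}, {y for y in Y if bob(y) == z})
            for z in Z}


# Hypothetical toy protocol on inputs {0, 1, 2, 3} with outputs {0, 1}.
def alice(x):
    return ABORT if x == 3 else x % 2


def bob(y):
    return ABORT if y == 0 else int(y >= 2)


X = Y = range(4)
rects = labeled_rectangles(alice, bob, X, Y, Z=(0, 1))
for x, y in product(X, Y):
    out = run(alice, bob, x, y)
    # (x, y) lies in the rectangle labeled z exactly when the protocol outputs z.
    covering = [z for z, (A, B) in rects.items() if x in A and y in B]
    assert covering == ([] if out is ABORT else [out])
```

Because each player's decision depends only on their own input, the sets $A_z$ are disjoint across labels (and likewise the $B_z$), so every input lies in at most one labeled rectangle.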
This connection between rectangles and zero-communication protocols with abort allows us to obtain our lower bound for information complexity from a new compression lemma for protocols (subsection 3.3): a protocol for a function $f$ that leaks $I$ bits of information implies the existence of a zero-communication protocol that has non-abort probability at least $2^{-O(I)}$ and that computes $f$ correctly with high probability when not aborting. Our main theorem follows from this new compression lemma.
The technical tools we use are drawn from Braverman [Bra12] and in particular from Braverman and Weinstein [BW12]. We now describe the difference between our compression and that of [BW12]. There, they take a protocol for computing a function $f$ that has information cost $I$ and compress it to a protocol with communication $O(I)$ and advantage $2^{-O(I)}$ in computing $f$ (i.e. the error increases considerably). Then, they apply the discrepancy method, which can handle such a small advantage.
In our compression, we suppress the communication entirely, and, moreover, we only introduce an arbitrarily small error since the compressed protocol aborts when it does not believe it can correctly compute the output. This compression enables us to provide much sharper lower bounds on the information complexity and in particular, the lower bound in terms of the relaxed partition bound.
Applications.
Our lower bound implies that for most functions for which there exists a lower bound on their communication complexity, the same bound extends to their information complexity. Specifically, we can apply our lower bound in order to resolve three of the open questions in [Bra12].
First, we show that there exists a function $f$ such that the quantum communication complexity of $f$ is exponentially smaller than the information complexity of $f$ (Open Problem 3 in [Bra12]).
Theorem 1.2.
There exists a function , s.t. for all .
In order to prove the above separation, we show that the proof of the lower bound on the randomized communication complexity of the Vector in Subspace Problem (VSP) [KR11] in fact provides a lower bound on the relaxed partition bound. By our lower bound, this implies that $\mathrm{IC}(\mathrm{VSP}) = \Omega(n^{1/3})$ (Open Problem 7 in [Bra12]). Since the quantum communication complexity of VSP is $O(\log n)$, we obtain the above theorem. Moreover, this implies an exponential separation between classical and quantum information complexity. We refrain from defining quantum information cost in this paper (see [JN10] for a definition), but since the quantum information cost is always at most the quantum communication complexity, the separation follows trivially from the above theorem.
In addition, we resolve the question of the information complexity of the Gap Hamming Distance Problem (GHD) (Open Problem 6 in [Bra12]), since the lower bounds on the randomized communication complexity of this problem go through the rectangle/corruption bound [She12] or the smooth rectangle bound [CR11, Vid12].
Theorem 1.3.
The information complexity of the Gap Hamming Distance Problem is $\Omega(n)$.
Regarding direct sum theorems, it was shown [Bra12] that the information complexity satisfies a direct sum theorem, namely . If in addition it holds that , then we can immediately deduce that , i.e. the direct sum theorem holds for . Therefore our main result also gives the following corollary:
Corollary 1.3.
For any and any , if , then for all and integers , it holds that .
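For example, since the randomized communication complexity of GHD is trivially $O(n)$, this corollary along with the fact that $\log \overline{\mathrm{prt}}_{\varepsilon}(\mathrm{GHD}) = \Omega(n)$ ([She12, CR11, Vid12], see subsection 5.2) immediately implies a direct sum theorem for GHD.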
Finally, regarding the central open question of whether or not it is possible to compress communication down to the information complexity for any function, our result says that if one hopes to prove a negative result and separate information complexity from communication complexity, then one must use a lower bound technique that is stronger than the relaxed partition bound. To the best of our knowledge, the only such technique in the literature is the (standard) partition bound. We note, however, that we are not aware of any problem whose communication complexity can be lower-bounded by the partition bound but not by the relaxed partition bound.
1.2 Related work
Definitions of information complexity, with some variations, extend back to the work on privacy in interactive protocols [BYCKO93], and related definitions appear in the privacy literature [Kla02, FJS10, ACC12]. Information complexity as a tool in communication complexity was first used to prove direct sum theorems in the simultaneous message model [CSWY01], and subsequently to prove direct sum theorems and to study amortized communication complexity, as stated in the first paragraph of this paper [BYJKS04, BBCR10, BR11, Bra12, BW12]. There are many other works using information complexity to prove lower bounds for specific functions or to prove direct sum theorems in restricted models of communication complexity, for example [JKS03, JRS03, JRS05, HJMR07].
In independent and concurrent work, Chakrabarti et al. proved that information complexity is lower bounded by the smooth rectangle bound under product distributions [CKW12]. While our result implies the result of [CKW12] as a special case, we note that their proof uses entirely different techniques and may be of independent interest.
2 Preliminaries
2.1 Notation and information theory facts
Let $\mu$ be a probability distribution over a (finite) universe $\mathcal{U}$. We will often treat $\mu$ as a function $\mu : 2^{\mathcal{U}} \to [0,1]$. For $T \subseteq \mathcal{U}$, we let $\mu(T) = \sum_{u \in T} \mu(u)$. For singletons $\{u\}$, we write interchangeably $\mu_u = \mu(u) = \mu(\{u\})$. Random variables are written in uppercase and fixed values in lowercase. We sometimes abuse notation and write a random variable in place of the distribution of that random variable.
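For two distributions $\mu, \nu$, we let $|\mu - \nu|$ denote their statistical distance, i.e. $|\mu - \nu| = \max_{T \subseteq \mathcal{U}} (\mu(T) - \nu(T))$. We let $D(\mu \| \nu) = \sum_{u} \mu(u) \log \frac{\mu(u)}{\nu(u)}$ be the relative entropy (i.e. KL-divergence). For two random variables $X, Y$, the mutual information is defined as $I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$, where $H(\cdot)$ is the Shannon entropy.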
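For concreteness, these quantities can be computed for finite distributions represented as Python dictionaries mapping outcomes to probabilities. This is a minimal sketch with logarithms in base 2 (so that information is measured in bits); the function names are our own.

```python
from math import log2


def stat_dist(P, Q):
    """Statistical distance max_T (P(T) - Q(T)); equals half the L1 distance."""
    support = set(P) | set(Q)
    return sum(max(P.get(u, 0.0) - Q.get(u, 0.0), 0.0) for u in support)


def kl(P, Q):
    """Relative entropy D(P || Q) in bits (infinite if P puts mass where Q does not)."""
    total = 0.0
    for u, p in P.items():
        if p > 0:
            q = Q.get(u, 0.0)
            if q == 0:
                return float("inf")
            total += p * log2(p / q)
    return total


def entropy(P):
    """Shannon entropy H(P) in bits."""
    return -sum(p * log2(p) for p in P.values() if p > 0)


def mutual_information(joint):
    """I(X;Y) for a joint distribution {(x, y): prob}, computed via the
    equivalent identity I(X;Y) = H(X) + H(Y) - H(X, Y)."""
    PX, PY = {}, {}
    for (x, y), p in joint.items():
        PX[x] = PX.get(x, 0.0) + p
        PY[y] = PY.get(y, 0.0) + p
    return entropy(PX) + entropy(PY) - entropy(joint)
```

The maximum over sets $T$ in the statistical distance is attained at $T = \{u : P(u) > Q(u)\}$, which is why the code simply sums the positive parts of $P - Q$.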
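A rectangle of $\mathcal{X} \times \mathcal{Y}$ is a product set $A \times B$ where $A \subseteq \mathcal{X}$ and $B \subseteq \mathcal{Y}$. We let $R$ denote a rectangle in $\mathcal{X} \times \mathcal{Y}$. We let $(x,y) \in \mathcal{X} \times \mathcal{Y}$ denote a fixed input, and $(X,Y)$ be random inputs sampled according to some distribution (specified from context and usually denoted by $\mu$).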
2.2 Information complexity
We study 2-player communication protocols for calculating a function $f : \mathcal{I} \to \mathcal{Z}$, where $\mathcal{I} \subseteq \mathcal{X} \times \mathcal{Y}$. Let $P$ be a randomized protocol (allowing both public and private coins, unless otherwise specified). We denote the randomness used by the protocol $P$ by $r_P$. Let $P(x,y)$ denote its output, i.e. the value in $\mathcal{Z}$ the two parties wish to compute.
The transcript of a protocol includes all messages exchanged, the output of the protocol (in fact we just need that both players can compute the output of the protocol from the transcript), as well as any public coins (but no private coins). The communication complexity of $P$ is the maximum (over all random coins) of the number of bits exchanged.
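Let $\mu$ be a distribution over $\mathcal{X} \times \mathcal{Y}$. Define $\mathrm{err}_f(P; x, y) = \Pr_{r_P}[P(x,y) \ne f(x,y)]$ if $(x,y) \in \mathcal{I}$ and $0$ otherwise, and $\mathrm{err}_f(P; \mu) = \mathbb{E}_{(x,y) \sim \mu}[\mathrm{err}_f(P; x, y)]$.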
Definition 2.0.
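Fix $f$, $\mu$ and $\varepsilon$. Let $(X, Y, \Pi)$ be the tuple distributed according to $(X,Y)$ sampled from $\mu$ and then $\Pi$ being the transcript of the protocol $P$ applied to $(X,Y)$. Then define:
1. $\mathrm{IC}_{\mu}(P) = I(X; \Pi \mid Y) + I(Y; \Pi \mid X)$;
2. $\mathrm{IC}_{\mu}(f, \varepsilon) = \inf_{P :\, \mathrm{err}_f(P; \mu) \le \varepsilon} \mathrm{IC}_{\mu}(P)$;
3. $\mathrm{IC}(f, \varepsilon) = \inf_{P :\, \max_{(x,y)} \mathrm{err}_f(P; x, y) \le \varepsilon} \ \max_{\mu} \mathrm{IC}_{\mu}(P)$.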
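As a sanity check of the definition of $\mathrm{IC}_\mu(P)$, the following sketch computes $I(X;\Pi \mid Y) + I(Y;\Pi \mid X)$ exactly, assuming the protocol is deterministic and given as a map from inputs to transcripts (so there are no public coins to include in the transcript). The toy protocol, in which Alice sends $x$ and Bob replies with $x \wedge y$, is a hypothetical example: under the uniform distribution on $\{0,1\}^2$, Bob learns one full bit, while Alice learns $y$ only when $x = 1$, for a total of $1.5$ bits.

```python
from itertools import product
from math import log2


def H(dist):
    """Shannon entropy in bits of a dict {outcome: prob}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, idx):
    """Marginal of a joint distribution over tuples, keeping coordinates idx."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_mi(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); a, b, c are index lists."""
    return (H(marginal(joint, a + c)) + H(marginal(joint, b + c))
            - H(marginal(joint, a + b + c)) - H(marginal(joint, c)))


def internal_ic(mu, transcript):
    """IC_mu(P) = I(X;Pi|Y) + I(Y;Pi|X) for a deterministic protocol given as
    a function (x, y) -> transcript; mu maps (x, y) -> probability."""
    joint = {}
    for (x, y), p in mu.items():
        key = (x, y, transcript(x, y))
        joint[key] = joint.get(key, 0.0) + p
    X, Y, PI = [0], [1], [2]
    return cond_mi(joint, X, PI, Y) + cond_mi(joint, Y, PI, X)


# Hypothetical toy protocol: Alice sends x, Bob replies with x AND y.
mu = {(x, y): 0.25 for x, y in product((0, 1), repeat=2)}
print(internal_ic(mu, lambda x, y: (x, x & y)))  # 1.5
```

Note that the transcript contains the output $x \wedge y$, as required by the convention that both players can read the output off the transcript.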
3 Zerocommunication protocols and the relaxed partition bound
3.1 The zerocommunication model and rectangles
Let us consider a (possibly partial) function $f$. We say that $(x,y)$ is a valid input if $(x,y) \in \mathcal{I}$, that is, $(x,y)$ satisfies the promise. In the zero-communication model with abort, the players either output a value $z \in \mathcal{Z}$ (they accept the run) or output $\perp$ (they abort).
Definition 3.0.
The zerocommunication model with abort is defined as follows:
 Inputs

Alice and Bob receive inputs $x$ and $y$ respectively.
 Output

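Alice outputs $a \in \mathcal{Z} \cup \{\perp\}$ and Bob outputs $b \in \mathcal{Z} \cup \{\perp\}$. If both Alice and Bob output the same $z \in \mathcal{Z}$, then the output is $z$. Otherwise, the output is $\perp$.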
We will study (public-coin) randomized zero-communication protocols for computing functions in this model.
3.2 Relaxed partition bound
The relaxed partition bound with error $\varepsilon$ and input distribution $\mu$, denoted by $\overline{\mathrm{prt}}^{\mu}_{\varepsilon}(f)$, is defined as follows.
Definition 3.0.
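The distributional relaxed partition bound $\overline{\mathrm{prt}}^{\mu}_{\varepsilon}(f)$ is the value of the following linear program. (The value of $z$ ranges over $\mathcal{Z}$ and $R$ over all rectangles, including the empty rectangle.)
$$\overline{\mathrm{prt}}^{\mu}_{\varepsilon}(f) = \min_{\eta,\ p_{R,z} \ge 0} \ \frac{1}{\eta} \quad \text{subject to:}$$
$$\sum_{(x,y) \in \mathcal{I}} \mu_{x,y} \sum_{R \ni (x,y)} p_{R, f(x,y)} + \sum_{(x,y) \notin \mathcal{I}} \mu_{x,y} \sum_{z,\, R \ni (x,y)} p_{R,z} \ \ge\ (1 - \varepsilon)\,\eta \tag{1}$$
$$\forall (x,y) \in \mathcal{X} \times \mathcal{Y}, \quad \sum_{z,\, R \ni (x,y)} p_{R,z} \ \le\ \eta \tag{2}$$
$$\sum_{R,z} p_{R,z} = 1 \tag{3}$$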
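The relaxed partition bound is defined as $\overline{\mathrm{prt}}_{\varepsilon}(f) = \max_{\mu} \overline{\mathrm{prt}}^{\mu}_{\varepsilon}(f)$.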
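To make the linear program concrete, the following sketch evaluates it for a fixed, hand-chosen set of weights $p_{R,z}$ rather than optimizing over them, so it only yields an upper bound on $\overline{\mathrm{prt}}^{\mu}_{\varepsilon}(f)$. For fixed weights, constraint (2) forces $\eta$ to be at least the largest coverage of any input, constraint (1) caps $\eta$ at $(1-\varepsilon)^{-1}$ times the $\mu$-weighted correct coverage, and since $1/\eta$ is minimized we take the largest admissible $\eta$. Representing rectangles as pairs of frozensets is our own choice, and exact `Fraction` arithmetic avoids spurious infeasibility caused by rounding.

```python
from fractions import Fraction
from itertools import product


def lp_objective(mu, f, weights, X, Y, eps):
    """Return 1/eta for the best eta compatible with the fixed weights
    {((A, B), z): p}, or None if no eta satisfies constraints (1)-(3).
    f maps valid inputs to outputs; inputs missing from f are invalid."""
    if sum(weights.values()) != 1:                          # constraint (3)
        return None

    def covers(rect, x, y):
        A, B = rect
        return x in A and y in B

    # Left-hand side of constraint (1): mu-weighted mass of correct labels;
    # on invalid inputs every label counts as correct.
    correct = sum(m * p
                  for (x, y), m in mu.items()
                  for (rect, z), p in weights.items()
                  if covers(rect, x, y) and ((x, y) not in f or f[(x, y)] == z))
    # Constraint (2): eta must be at least the coverage of every input.
    max_cov = max(sum(p for (rect, _), p in weights.items() if covers(rect, x, y))
                  for x, y in product(X, Y))
    if correct == 0:
        return None
    eta = correct / (1 - eps)        # largest eta allowed by constraint (1)
    return 1 / eta if eta >= max_cov else None


# Toy example (hypothetical): f = AND on one-bit inputs, uniform mu.
X = Y = (0, 1)
f = {(x, y): x & y for x, y in product(X, Y)}
mu = {xy: Fraction(1, 4) for xy in f}
third = Fraction(1, 3)
partition = {((frozenset({0}), frozenset({0, 1})), 0): third,
             ((frozenset({1}), frozenset({0})), 0): third,
             ((frozenset({1}), frozenset({1})), 1): third}
print(lp_objective(mu, f, partition, X, Y, eps=Fraction(0)))  # 3
```

Here the weights spread uniformly over a partition of the inputs into three monochromatic rectangles, which gives objective value $3$; putting all the weight on the full rectangle labeled $0$ is feasible only once $\varepsilon \ge 1/4$.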
We can identify feasible solutions to the program in subsection 3.2 with a particular type of randomized zero-communication protocol: Alice and Bob sample $(R, z)$ according to the distribution given by the $p_{R,z}$, and each individually checks whether their own input is consistent with $R$ (writing $R = A \times B$, Alice checks that $x \in A$ and Bob checks that $y \in B$); if so they output $z$, otherwise they abort. The parameter $\eta$ is the efficiency of the protocol [LLR12], that is, the probability that the protocol does not abort, and ideally we want it to be as large as possible.
There is also a natural way to convert any zero-communication protocol $P$ into a distribution over $(R, z)$: sample $z$ uniformly from $\mathcal{Z}$, sample random coins $r$ for $P$, and let $R = A \times B$ be such that $A$ is the set of inputs on which Alice outputs $z$ in the protocol $P$ using random coins $r$, and similarly for $B$. (The sampling of a random $z$ incurs a loss of $1/|\mathcal{Z}|$ in the efficiency, which is why our bounds have a loss depending on $|\mathcal{Z}|$. See subsection 3.4 for details.)
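The conversion just described can be sketched as follows, assuming a public-coin protocol is given as a finite list of (probability, Alice's rule, Bob's rule) triples, with `None` standing for the abort symbol; this representation and the toy protocol are our own illustration. The check at the end confirms the $1/|\mathcal{Z}|$ loss: the total weight of the labeled rectangles containing an input equals the protocol's non-abort probability on that input divided by $|\mathcal{Z}|$.

```python
from fractions import Fraction
from itertools import product


def to_labeled_rectangles(strategies, X, Y, Z):
    """Turn a public-coin zero-communication protocol, given as a list of
    (p_r, alice_r, bob_r), into weights {((A, B), z): p}: pick z uniformly,
    pick r with probability p_r, and let A (resp. B) be the inputs on which
    Alice (resp. Bob) outputs z under coins r. Empty rectangles are kept."""
    weights = {}
    for p_r, alice, bob in strategies:
        for z in Z:
            rect = (frozenset(x for x in X if alice(x) == z),
                    frozenset(y for y in Y if bob(y) == z))
            weights[(rect, z)] = weights.get((rect, z), 0) + Fraction(p_r) / len(Z)
    return weights


def non_abort(strategies, x, y):
    """Probability over the public coins that Alice and Bob output the same z."""
    return sum(p for p, alice, bob in strategies
               if alice(x) is not None and alice(x) == bob(y))


# Hypothetical two-outcome protocol on one-bit inputs.
strategies = [
    (Fraction(1, 2), lambda x: x, lambda y: y),                      # output x if x == y
    (Fraction(1, 2), lambda x: 0, lambda y: 0 if y == 0 else None),
]
X = Y = Z = (0, 1)
weights = to_labeled_rectangles(strategies, X, Y, Z)
for x, y in product(X, Y):
    covered = sum(p for ((A, B), _), p in weights.items() if x in A and y in B)
    assert covered == non_abort(strategies, x, y) / len(Z)
```

Each pair $(r, z)$ contributes exactly one (possibly empty) rectangle, so the weights always sum to $1$, matching the normalization constraint (3).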
Relation to other bounds.
The relaxed partition bound is, as its name implies, a relaxation of the partition bound [JK10]. We also show that the relaxed partition bound is stronger than the smooth rectangle bound (the proof is provided in Appendix A).
Lemma 3.0.
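For all $f$ and $\varepsilon$, we have $\mathrm{srec}_{\varepsilon}(f) \le \overline{\mathrm{prt}}_{\varepsilon}(f)$, where $\mathrm{srec}_{\varepsilon}(f)$ denotes the smooth rectangle bound.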
Since Jain and Klauck have shown in [JK10] that the smooth rectangle bound is stronger than the rectangle/corruption bound, the $\gamma_2$ method and the discrepancy method, the relaxed partition bound subsumes all these bounds as well. Therefore, our result implies that all these bounds are also lower bounds for information complexity.
We briefly explain the difference between the relaxed partition bound and the partition bound (more details appear in Appendix A). The partition bound includes two types of constraints. The first is a correctness constraint: on every input, the output of the protocol should be correct with probability at least $(1-\varepsilon)\eta$. The second is a completeness constraint: on every input, the efficiency of the protocol (i.e. the probability that it does not abort) should be exactly $\eta$. In the relaxed partition bound, we keep the same correctness constraint. Since in certain applications the function is partial (such as the Vector in Subspace Problem [KR11]), one also has to handle the inputs where the function is not defined; we make this explicit in our correctness constraint. On the other hand, we relax the completeness constraint so that the efficiency may lie anywhere between $(1-\varepsilon)\eta$ and $\eta$. This relaxation seems to be crucial for our proof of the lower bound on information complexity, since we are unable to achieve efficiency exactly $\eta$.
3.3 Compression lemma
Lemma 3.0 (Main compression lemma).
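There exists a universal constant $C$ such that for all distributions $\mu$, communication protocols $P$ and $\delta \in (0, 1)$, there exists a zero-communication protocol $P'$ and a real number $\lambda \ge 2^{-C(\mathrm{IC}_{\mu}(P)/\delta^2 + 1/\delta)}$ such that
$$\Big|\, \big(X, Y, P'(X,Y)\big) \,\big|\, P'(X,Y) \ne \perp \ \ - \ \ \big(X, Y, P(X,Y)\big) \,\Big| \le \delta \tag{4}$$
(in statistical distance) and
$$\forall (x,y), \quad \Pr_{r_{P'}}\big[P'(x,y) \ne \perp\big] \le (1+\delta)\lambda \tag{5}$$
$$\Pr_{r_{P'},\, (X,Y) \sim \mu}\big[P'(X,Y) \ne \perp\big] \ge (1-\delta)\lambda. \tag{6}$$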
Our compression extends the strategy outlined by [BW12]. At a high level, the protocol $P'$ does the following:
 Sample transcripts

Alice and Bob use their shared randomness to run many independent executions of an experiment to sample transcripts (subsection 4.1). Alice and Bob each decide whether each experiment is accepted (they may differ in their opinions).
 Find common transcript

Let $\mathcal{A}$ be the set of accepted experiments for Alice, and $\mathcal{B}$ the set of accepted experiments for Bob. They try to guess an element of $\mathcal{A} \cap \mathcal{B}$. If they find one, they output according to the transcript from this experiment.
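The "Find common transcript" step must be carried out without communication. The following toy sketch is *not* the protocol of subsection 4.2 but a hypothetical simplification that shows why such a guess costs only a bounded factor in efficiency: shared randomness supplies a random $b$-bit hash of every experiment index and a random target value $g$; Alice continues only if the hash of her first accepted index equals $g$, and Bob picks his smallest accepted index that hashes to $g$. Alice continues with probability $2^{-b}$, and when her first accepted index $i$ is also accepted by Bob, the two choose the same index unless some smaller index accepted by Bob collides with $i$ under the hash.

```python
import random


def guess_common(A, B, T, bits, rng):
    """Hypothetical zero-communication guess of an index in A ∩ B.

    A, B: experiment indices in range(T) accepted by Alice and by Bob.
    rng: shared (public) randomness; both players derive the same hash
    values h[0..T-1] and the same target g from it. Returns (i, j), the
    indices chosen by Alice and Bob, or None if either player aborts."""
    h = [rng.getrandbits(bits) for _ in range(T)]   # shared random hash
    g = rng.getrandbits(bits)                       # shared random guess
    if not A or h[min(A)] != g:
        return None                 # Alice aborts
    i = min(A)                      # Alice's candidate: her first accepted index
    hits = [j for j in sorted(B) if h[j] == g]
    if not hits:
        return None                 # Bob aborts
    return i, hits[0]               # i != hits[0] only on a hash collision


rng = random.Random(2024)
runs = [guess_common({2, 3, 5}, {2, 4}, T=6, bits=2, rng=rng) for _ in range(4000)]
success = [r for r in runs if r is not None]
print(len(success) / len(runs))     # close to 2**-2 = 0.25
```

In this run Alice's first accepted index is $2$, which Bob also accepts and which is his smallest accepted index, so every non-aborting run agrees on index $2$.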
We prove our compression lemma in Section 4.
3.4 Information cost is lower bounded by the relaxed partition bound
We show how our compression lemma implies the main theorem.
Proof of Theorem 1.1.
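Let $P$ be a randomized communication protocol achieving $\mathrm{IC}_{\mu}(f, \varepsilon)$ and let $\mathcal{R}$ be the following relation that naturally arises from the function $f$:
$$\mathcal{R} = \{(x, y, f(x,y)) : (x,y) \in \mathcal{I}\} \cup \{(x, y, z) : (x,y) \notin \mathcal{I},\ z \in \mathcal{Z}\}.$$
Let us now consider the zero-communication protocol $P'$ from Lemma 3.3, together with its parameter $\lambda$. As mentioned in subsection 3.2, there is a natural way to identify $P'$ with a distribution over labeled rectangles $(R, z)$: sample $z$ uniformly from $\mathcal{Z}$, sample the randomness $r_{P'}$, and let $R = A \times B$, where $A$ is the set of inputs on which Alice outputs $z$, and similarly for $B$. The sampling of $z$ incurs a loss of $1/|\mathcal{Z}|$ in the efficiency.
We make this formal: for any fixed randomness $r$ occurring with probability $p_r$, we define $R_{z,r}$ as the set of $(x,y)$ on which the protocol outputs $z$ (a rectangle, since each player's decision depends only on their own input and on $r$), and we let
$$p_{R,z} = \frac{1}{|\mathcal{Z}|} \sum_{r :\, R_{z,r} = R} p_r, \qquad \eta = \frac{(1+\delta)\lambda}{|\mathcal{Z}|}.$$
We check the normalization constraint (Equation 3):
$$\sum_{R,z} p_{R,z} = \frac{1}{|\mathcal{Z}|} \sum_{z \in \mathcal{Z}} \sum_{r} p_r = 1.$$
To see that Equation 2 is satisfied, we have by definition of $p_{R,z}$ and by Equation 5 that for any $(x,y)$:
$$\sum_{z,\, R \ni (x,y)} p_{R,z} = \frac{1}{|\mathcal{Z}|} \Pr_{r_{P'}}\big[P'(x,y) \ne \perp\big] \le \frac{(1+\delta)\lambda}{|\mathcal{Z}|} = \eta.$$
Finally, to see that Equation 1 is satisfied, note that its left-hand side can be written as below, and we have
$$\begin{aligned}
\sum_{(x,y)} \mu_{x,y} \sum_{z :\, (x,y,z) \in \mathcal{R}} \ \sum_{R \ni (x,y)} p_{R,z}
&= \frac{1}{|\mathcal{Z}|} \Pr_{r_{P'},\, (X,Y) \sim \mu}\big[(X, Y, P'(X,Y)) \in \mathcal{R}\big] \\
&= \frac{1}{|\mathcal{Z}|} \Pr\big[P'(X,Y) \ne \perp\big] \cdot \Pr\big[(X, Y, P'(X,Y)) \in \mathcal{R} \,\big|\, P'(X,Y) \ne \perp\big] \\
&\ge \frac{(1-\delta)\lambda}{|\mathcal{Z}|} \Big( \Pr\big[(X, Y, P(X,Y)) \in \mathcal{R}\big] - \delta \Big) \\
&\ge \frac{(1-\delta)\lambda}{|\mathcal{Z}|} (1 - \varepsilon - \delta) = \frac{1-\delta}{1+\delta} (1 - \varepsilon - \delta)\, \eta \ \ge\ (1 - \varepsilon - 3\delta)\, \eta,
\end{aligned}$$
where the third line uses Equations 4 and 6, and for the last line we used the fact that $P$ has error $\varepsilon$, and so $\Pr[(X, Y, P(X,Y)) \in \mathcal{R}] \ge 1 - \varepsilon$, together with $\frac{1-\delta}{1+\delta} \ge 1 - 2\delta$. This satisfies the constraints in the linear program (subsection 3.2) for $\overline{\mathrm{prt}}^{\mu}_{\varepsilon+3\delta}(f)$ with objective value $1/\eta = \frac{|\mathcal{Z}|}{(1+\delta)\lambda} \le |\mathcal{Z}| \cdot 2^{C(\mathrm{IC}_{\mu}(P)/\delta^2 + 1/\delta)}$; taking logarithms and rearranging gives the theorem. ∎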
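Assuming the bounds $(1 \pm \delta)\lambda$ and the statistical distance $\delta$ from the compression lemma, the passage from error $\varepsilon$ to error $\varepsilon + 3\delta$ rests on the elementary inequality $\frac{1-\delta}{1+\delta}(1-\varepsilon-\delta) \ge 1-\varepsilon-3\delta$ for $\varepsilon, \delta \in [0, 1/2]$. It holds because $\frac{1-\delta}{1+\delta} \ge 1-2\delta$ and $(1-2\delta)(1-\varepsilon-\delta) = 1-\varepsilon-3\delta+2\delta(\varepsilon+\delta)$. The following exact check over a grid is only a sanity test of this algebra, not a proof.

```python
from fractions import Fraction


def slack(eps, delta):
    """(1 - delta)/(1 + delta) * (1 - eps - delta) - (1 - eps - 3*delta)."""
    return (1 - delta) / (1 + delta) * (1 - eps - delta) - (1 - eps - 3 * delta)


grid = [Fraction(i, 100) for i in range(51)]        # eps, delta in [0, 1/2]
# The slack is at least 2*delta*(eps + delta), hence nonnegative.
assert all(slack(e, d) >= 2 * d * (e + d) for e in grid for d in grid)
print(min(slack(e, d) for e in grid for d in grid))  # 0 (attained at delta = 0)
```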
By the definitions of the information complexity and the relaxed partition bound, we have immediately
Corollary 3.0.
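There exists a universal constant $C$ such that for all functions $f$ and all $\varepsilon, \delta \in (0, 1/2]$, we have $\mathrm{IC}(f, \varepsilon) \ge \frac{\delta^2}{C}\left(\log \overline{\mathrm{prt}}_{\varepsilon+3\delta}(f) - \log|\mathcal{Z}|\right) - \delta$.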
4 The zerocommunication protocol
The zero-communication protocol consists of two stages. First, Alice and Bob use their shared randomness to come up with candidate transcripts, based on the a priori information they have on the distribution of the transcripts given by the information cost of the protocol. To do this, they run some sampling experiments and decide which ones to accept. Second, they use their shared randomness in order to choose an experiment that they have both accepted. If anything fails in the course of the protocol, they abort by outputting $\perp$.
4.1 Single sampling experiment
The single sampling experiment is described in subsection 4.1 and appeared first in [BW12] (variants also appeared in [Bra12] and [BR11]). Roughly, the experiment takes a distribution $M$ and two distributions $P, Q$ over a universe $\mathcal{U}$ such that $P$ and $Q$ are not too far from $M$, and tries to sample an element of $\mathcal{U}$ that is close to being distributed according to $M$.
Let us informally describe the goal of this sampling experiment in our context. Alice, knowing $x$, and Bob, knowing $y$, want to sample transcripts according to $\Pi_{xy}$, which is the distribution over the transcripts of the protocol $P$ applied to $(x,y)$. When the inputs are fixed, the probability of a transcript $\pi$ occurring is the product of the probabilities of each bit in the transcript. The product of the probabilities for Alice's bits is some function $p_A(\pi)$ which depends on $x$, and the product of the probabilities for Bob's bits is some function $p_B(\pi)$ which depends on $y$. Alice can also estimate $p_B(\pi)$ by averaging it over $y$ drawn from $\mu$ conditioned on $x$; call this estimate $q_A(\pi)$, and similarly let $q_B(\pi)$ be Bob's estimate of $p_A(\pi)$. Set $M = p_A p_B$, $P = p_A q_A$ and $Q = q_B p_B$.
The challenge is that Alice and Bob know only $P$ and $Q$ respectively and do not know $M$ (in our setting, $M = \Pi_{xy}$). They use a variant of rejection sampling, in which Alice overestimates $q_A$ by a factor $2^k$; likewise for Bob. Let us define the set of bad elements with respect to $P$, $M$ and $k$ as follows:
$$\mathcal{B}_k(P, M) = \{u \in \mathcal{U} : M(u) > 2^k P(u)\}.$$
Intuitively, $u$ is bad if $M$ gives much more weight to it than $P$. Observe that if $p_A(u) > 0$, then $u \notin \mathcal{B}_k(P, M)$ implies that $p_B(u) \le 2^k q_A(u)$.
To prove our compression lemma, we use the following claim about the single sampling experiment.
Claim 4.0.
Let . Let . Then the following holds about subsection 4.1:

The probability that Alice accepts equals and the same for Bob.

The probability that the experiment is accepted is at most and at least .

Let denote the distribution of the output of the experiment, conditioned on it being accepted. Then .
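To see the three items of the claim on a concrete instance, here is an exact computation for one hypothetical instantiation of the single sampling experiment; its normalization is our own assumption and need not match the algorithm referenced above. Public coins give $u$ uniform in $\mathcal{U}$ and $\alpha, \beta$ uniform in $[0, 2^k]$; Alice accepts iff $\alpha \le p_A(u)$ and $\beta \le 2^k q_A(u)$, and Bob accepts iff $\alpha \le 2^k q_B(u)$ and $\beta \le p_B(u)$. Here $p_A$ (resp. $p_B$) is the product of the probabilities of Alice's (resp. Bob's) bits of a transcript, $q_A$ is Alice's estimate of $p_B$, $q_B$ is Bob's estimate of $p_A$, $M = p_A p_B$, $P = p_A q_A$ and $Q = q_B p_B$. The toy protocol (inputs uniform; each player sends one bit equal to their input with probability $3/4$) is likewise made up. In this instantiation, Alice accepts with probability exactly $1/(2^k|\mathcal{U}|)$, the experiment is accepted with probability between $(1-\gamma)/(2^{2k}|\mathcal{U}|)$ and $1/(2^{2k}|\mathcal{U}|)$, where $\gamma$ is the $M$-mass of the elements $u$ with $M(u) > 2^k P(u)$ or $M(u) > 2^k Q(u)$, and the output is within statistical distance $\gamma$ of $M$; the code asserts these facts for $k = 0, 1, 2$.

```python
from fractions import Fraction


def experiment_stats(U, pA, qA, pB, qB, k):
    """Exact statistics of the hypothetical experiment described above.
    Returns (Pr[Alice accepts], Pr[both accept], law of u given both accept)."""
    K = Fraction(2) ** k
    alice = sum(pA[u] / K * qA[u] for u in U) / len(U)
    # Pr[both accept | u] = Pr[alpha <= min(pA, K qB)] * Pr[beta <= min(K qA, pB)]
    weight = {u: min(pA[u], K * qB[u]) / K * min(K * qA[u], pB[u]) / K for u in U}
    total = sum(weight.values())
    return alice, total / len(U), {u: w / total for u, w in weight.items()}


def stat_dist(P, Q):
    return sum(max(P[u] - Q[u], 0) for u in P)


# Hypothetical toy protocol: inputs x = y = 1, mu uniform on {0,1}^2; each
# player sends one bit equal to their input with probability 3/4.
x, y = 1, 1
hi, lo, half = Fraction(3, 4), Fraction(1, 4), Fraction(1, 2)
U = [(b1, b2) for b1 in (0, 1) for b2 in (0, 1)]    # (Alice's bit, Bob's bit)
pA = {u: hi if u[0] == x else lo for u in U}
pB = {u: hi if u[1] == y else lo for u in U}
qA = {u: half for u in U}   # average of pB over uniform Y: (3/4 + 1/4) / 2
qB = {u: half for u in U}   # average of pA over uniform X
M = {u: pA[u] * pB[u] for u in U}
P = {u: pA[u] * qA[u] for u in U}
Q = {u: qB[u] * pB[u] for u in U}

for k in range(3):
    K, n = Fraction(2) ** k, len(U)
    bad = {u for u in U if M[u] > K * P[u] or M[u] > K * Q[u]}
    gamma = sum(M[u] for u in bad)
    alice, both, out = experiment_stats(U, pA, qA, pB, qB, k)
    assert alice == 1 / (K * n)
    assert (1 - gamma) / (K * K * n) <= both <= 1 / (K * K * n)
    assert stat_dist(M, out) <= gamma
    print(k, alice, both, gamma, stat_dist(M, out))
```

In this example there are no bad elements once $k \ge 1$, and the output of an accepted experiment is then distributed exactly according to $M$.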
Intuitively, this claim says that Alice accepts each single experiment with the fixed probability given in item 1, and it also implies that, conditioned on Alice accepting the $i$-th experiment, it is relatively likely that Bob accepts it. Therefore, by repeating this experiment enough times, there is a reasonable probability that Alice and Bob both accept the same execution of the experiment. Conditioned on the experiment being accepted, its output is distributed close to the original distribution. In the next subsection, we show how to use a hash function to select a common accepting execution of the experiment out of many executions.
We will use the following lemma that appears in [Bra12].
Lemma 4.0 ([Bra12]).
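For all distributions $M, P$ over $\mathcal{U}$ and all $k > 0$, it holds that $M\big(\{u \in \mathcal{U} : M(u) > 2^k P(u)\}\big) \le \frac{D(M \| P) + 1}{k}$.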
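We state the lemma in the form we believe is standard; the bound follows because each $u$ with $M(u) > 2^k P(u)$ contributes more than $k \cdot M(u)$ to $D(M \| P)$, while the negative contributions to the divergence sum to at least $-\log_2(e)/e > -1$. The following sketch checks this form numerically on random skewed distributions (the sampling scheme and the range of $k$ are arbitrary choices made for illustration).

```python
import random
from math import log2


def bad_mass(M, P, k):
    """M-mass of {u : M(u) > 2^k P(u)}."""
    return sum(m for m, p in zip(M, P) if m > 2 ** k * p)


def kl(M, P):
    """D(M || P) in bits; assumes P has full support."""
    return sum(m * log2(m / p) for m, p in zip(M, P) if m > 0)


def random_dist(n, rng):
    w = [rng.random() ** 4 + 1e-9 for _ in range(n)]   # skewed, full support
    s = sum(w)
    return [v / s for v in w]


rng = random.Random(7)
for _ in range(2000):
    M, P = random_dist(6, rng), random_dist(6, rng)
    for k in (0.5, 1, 2, 4, 8):
        assert bad_mass(M, P, k) <= (kl(M, P) + 1) / k
print("bound held on all samples")
```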
Proof of subsection 4.1.
We use the arguments first given in [BW12] and prove the items in order.

Probability Alice/Bob accepts. We do the analysis for Alice; the case for Bob is entirely symmetric. We may write:

Probability of accepting. First consider . For such , if then and also if then . Therefore we may write
(7) Furthermore, for any , we may write
For the upper bound we have:
For the lower bound we have

Statistical closeness of and . Let denote the probability that the experiment is accepted. From the previous point, we have that . By the definition of statistical distance, it suffices to prove that:
We proceed by splitting the elements of based on whether they intersect .
From Equation 7 we can deduce that . Therefore:
Since , we have that , which concludes the proof.
∎
4.2 Description and analysis of the zerocommunication protocol
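Let $\mu$ be any distribution on inputs and $P$ be any protocol with information complexity $I = \mathrm{IC}_{\mu}(P)$. Let $(X, Y, \Pi)$ be the joint random variables where $(X,Y)$ are distributed according to $\mu$ and $\Pi$ is the distribution of the transcript of the protocol $P$ applied to $(X,Y)$ (by slight abuse of notation we use the letter $\Pi$ for both the transcript and its distribution). Let $\Pi_{xy}$ be $\Pi$ conditioned on $X = x, Y = y$, let $\Pi_x$ be $\Pi$ conditioned on $X = x$, and let $\Pi_y$ be defined likewise.
Let $\mathcal{U}$ be the space of all possible transcripts. We assume that each transcript contains the output of the protocol. As shown in [Bra12] and described above, Alice can construct functions $p_A, q_A : \mathcal{U} \to [0,1]$ (depending on $x$) and Bob can construct functions $p_B, q_B : \mathcal{U} \to [0,1]$ (depending on $y$), such that for all $\pi \in \mathcal{U}$, $\Pi_{xy}(\pi) = p_A(\pi) p_B(\pi)$, $\Pi_x(\pi) = p_A(\pi) q_A(\pi)$, and $\Pi_y(\pi) = q_B(\pi) p_B(\pi)$.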
The zero-communication protocol is described in subsection 4.2. This protocol is an extension of the one in [BW12]; here, Alice uses public coins to guess the hash function value instead of calculating it and transmitting it to Bob, and both players are allowed to abort when they do not believe they can output the correct value.
In order to analyze our protocol, we first define some events and give bounds on their probabilities.
Definition 4.0.
We define the following events over the probability space of sampling according to and running on to produce a transcript :

Large divergence. occurs if such that or . We will also let denote the set of such .

Collision. occurs if there exist distinct such that .

Protocol outputs something. occurs if .
The proof of the main compression lemma (subsection 3.3) uses the following claim.
Claim 4.0.
The probability of the above events are bounded as follows:

The inputs rarely have large divergence: .

For all , the hash function rarely has a collision: .

For all , the probability of outputting something is not too small: .

For all the probability of outputting something is not too large: .

For all protocols , input distributions and , the protocol in subsection 4.2 satisfies: For all , let be the distribution of conditioned on (namely, on ). Then .
Proof.
In the following, we will frequently use the fact that for all , it holds that .
We extend the arguments given in [BW12] to prove the items of the claim in order.

By the definition of information complexity and the fact that mutual information is equal to the expectation of the divergence, we have that for distributed according to ,
This implies that , and since divergence is nonnegative we have by Markov’s inequality that
The same argument holds for and by a union bound, we have .

We may write:
where we have used the independence between the trials and independence of the from the trials, as well as item 1 of subsection 4.1.

Let us define to be the event that the smallest satisfies , and also . (Notice this implies that are both nonempty.) We have
Observe that an element is in if and only if experiment is accepted by Alice. By item 1 of subsection 4.1, the probability of Alice aborting each experiment is . Since the experiments are independent, the probability of Alice aborting all experiments is
We assume now that is non empty and we denote by its first element.
For all , the probability that is exactly , in particular this holds for the first element of .
For any , we have that . Let us say that a transcript is “bad for Alice” (resp. Bob) if it lies in the set (resp. in the set . Using subsection 4.1, this implies that
It follows that .
By definition, for any , experiment is accepted if and only if . Therefore, :
where we used item 1 and item 2 of subsection 4.1, and the fact that implies that .
Also, observe that by the definition of the protocol, the choice of and are completely independent of the experiments. Therefore we may add the condition that without altering the probability. Since implies , we can add this condition too. We use this with , so therefore we may write
Finally, observe that , therefore we may conclude that:

We will again use the event as defined in the previous point. We will again use the fact that:
As before the first factor is exactly for any . We may also write:
where we used item 1 and item 2 of subsection 4.1. As with the previous point, adding the conditions and does not affect the probabilities. Therefore, . Finally, observe that , and therefore:

The distribution of conditioned on not aborting and on no collision is simply the distribution of the output of a single experiment, and we know from the facts about the single experiment (subsection 4.1) that this is close to . We wish to conclude that conditioned only on not aborting is also close to . The following lemma allows us to do this by using the fact that the probability of collision is small:
Claim 4.0.
Let and two distributions taking output in a common universe. Let and be two events in the underlying probability space of