The Pattern Matrix Method

(To appear in SIAM J. Comput., 2009. A preliminary version of this article appeared under the title “The Pattern Matrix Method for Lower Bounds on Quantum Communication” in Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC), pages 85-94, 2008.)
We develop a novel and powerful technique for communication lower bounds, the pattern matrix method. Specifically, fix an arbitrary function $f\colon\{0,1\}^n\to\{0,1\}$ and let $A_f$ be the matrix whose columns are each an application of $f$ to some subset of the variables $x_1,x_2,\dots,x_{4n}.$ We prove that $A_f$ has bounded-error communication complexity $\Omega(d),$ where $d$ is the approximate degree of $f.$ This result remains valid in the quantum model, regardless of prior entanglement. In particular, it gives a new and simple proof of Razborov’s breakthrough quantum lower bounds for disjointness and other symmetric predicates. We further characterize the discrepancy, approximate rank, and approximate trace norm of $A_f$ in terms of well-studied analytic properties of $f,$ broadly generalizing several recent results on small-bias communication and agnostic learning. The method of this paper has recently enabled important progress in multiparty communication complexity.
Key words. Pattern matrix method, bounded-error communication complexity, quantum communication complexity, discrepancy, Degree/Discrepancy Theorem, approximate rank, approximate trace norm, linear programming duality, approximation and sign-representation of Boolean functions by real polynomials.
AMS subject classifications. 03D15, 68Q15, 81P68
A central model in communication complexity is the bounded-error model. Let $f$ be a given function on $X\times Y,$ where $X$ and $Y$ are finite sets. Alice receives an input $x\in X,$ Bob receives $y\in Y,$ and their objective is to compute $f(x,y)$ with minimal communication. To this end, Alice and Bob share an unlimited supply of random bits. Their protocol is said to compute $f$ if on every input $(x,y),$ the output is correct with probability at least $1-\epsilon.$ The canonical setting is $\epsilon=1/3,$ but any other parameter $\epsilon\in(0,1/2)$ can be considered. The cost of a protocol is the worst-case number of bits exchanged on any input. Depending on the physical nature of the communication channel, one studies the classical model, in which the messages are classical bits $0$ and $1,$ and the more powerful quantum model, in which the messages are quantum bits and arbitrary prior entanglement is allowed. The communication complexity in these models is denoted $R_\epsilon(f)$ and $Q^*_\epsilon(f),$ respectively.
Bounded-error protocols have been the focus of much research in communication complexity since the introduction of the area by Yao  three decades ago. A variety of techniques have been developed for proving lower bounds on classical communication, e.g., [27, 55, 21, 54, 13, 42, 20, 60]. There has been consistent progress on quantum communication as well [66, 3, 12, 31, 29, 56, 42], although quantum protocols remain much less understood than their classical counterparts.
The main contribution of this paper is a novel and powerful method for lower bounds on classical and quantum communication complexity, the pattern matrix method. The method converts analytic properties of Boolean functions into lower bounds for the corresponding communication problems. The analytic properties in question pertain to the approximation and sign-representation of a given Boolean function by real polynomials of low degree, which are among the oldest and most studied objects in theoretical computer science. In other words, the pattern matrix method takes the wealth of results available on the representations of Boolean functions by real polynomials and puts them at the disposal of communication complexity.
We consider two ways of representing Boolean functions by real polynomials. Let $f\colon\{0,1\}^n\to\{-1,+1\}$ be a given Boolean function. The $\epsilon$-approximate degree of $f,$ denoted $\deg_\epsilon(f),$ is the least degree of a real polynomial $p$ such that $|f(x)-p(x)|\le\epsilon$ for all $x\in\{0,1\}^n.$ There is an extensive literature on the $\epsilon$-approximate degree of Boolean functions [48, 50, 26, 8, 1, 2, 58, 64], for the canonical setting $\epsilon=1/3$ and various other settings. Apart from uniform approximation, the other representation scheme of interest to us is sign-representation. Specifically, the degree-$d$ threshold weight $W(f,d)$ of $f$ is the minimum $\sum_{|S|\le d}|\lambda_S|$ over all integers $\lambda_S$ such that
\[ f(x)\equiv\operatorname{sgn}\Big(\sum_{S\subseteq\{1,\dots,n\},\ |S|\le d}\lambda_S\chi_S(x)\Big), \]
where $\chi_S(x)=(-1)^{\sum_{i\in S}x_i}.$ If no such integers exist, we write $W(f,d)=\infty.$ The threshold weight of Boolean functions has been heavily studied, both when $W(f,d)$ is infinite [45, 4, 37, 38, 33, 32, 49] and when it is finite [45, 46, 7, 63, 34, 36, 52, 53]. The notions of uniform approximation and sign-representation are closely related, as we discuss in Section 2. Roughly speaking, the study of threshold weight corresponds to the study of the $\epsilon$-approximate degree for $\epsilon=1-o(1).$
Having defined uniform approximation and sign-representation for Boolean functions, we now describe how we use them to prove communication lower bounds. The central concept in our work is what we call a pattern matrix. Consider the communication problem of computing
\[ f(x|_V), \]
where $f\colon\{0,1\}^t\to\{-1,+1\}$ is a fixed Boolean function; the string $x\in\{0,1\}^n$ is Alice’s input ($n$ is a multiple of $t$); and the set $V\subset\{1,2,\dots,n\}$ with $|V|=t$ is Bob’s input. In words, this communication problem corresponds to a situation when the function $f$ depends on only $t$ of the inputs $x_1,\dots,x_n.$ Alice knows the values of all the inputs $x_1,\dots,x_n$ but does not know which $t$ of them are relevant. Bob, on the other hand, knows which $t$ inputs are relevant but does not know their values. This communication game was introduced and studied in an earlier work by the author, in the context of small-bias communication. For the purposes of the introduction, one can think of the $(n,t,f)$-pattern matrix as the matrix $[f(x|_V)]_{x,V},$ where $V$ ranges over the sets that have exactly one element from each block of the following partition:
\[ \{1,\dots,n\}=\Big\{1,2,\dots,\frac nt\Big\}\cup\Big\{\frac nt+1,\dots,\frac{2n}t\Big\}\cup\cdots\cup\Big\{\frac{(t-1)n}t+1,\dots,n\Big\}. \]
We defer the precise definition to Section LABEL:sec:pattern-matrices. Observe that restricting $V$ to be of special form only makes our results stronger.
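The introductory description of a pattern matrix can be made concrete with a small sketch. It assumes the simplified form stated above (rows indexed by strings $x\in\{0,1\}^n,$ columns by sets $V$ with one index per block); the precise definition deferred by the text may differ in detail. The function name `pattern_matrix` and the encoding of $V$ by per-block offsets are illustrative choices, not notation from the paper.

```python
from itertools import product

def pattern_matrix(n, t, f):
    """Build [f(x|_V)] for x in {0,1}^n and V with one index per block.

    Assumption: the index set {0,...,n-1} is split into t consecutive
    blocks of size n/t, and V is encoded by the offset it picks in
    each block.
    """
    assert n % t == 0, "n must be a multiple of t"
    block = n // t
    rows = list(product((0, 1), repeat=n))        # Alice's inputs x
    cols = list(product(range(block), repeat=t))  # Bob's inputs V

    def restrict(x, offsets):
        # x|_V: the t bits of x selected by V, one from each block
        return tuple(x[i * block + off] for i, off in enumerate(offsets))

    return [[f(restrict(x, V)) for V in cols] for x in rows]

# Base function: parity on t = 2 bits, written in +1/-1 form.
parity = lambda z: (-1) ** sum(z)
F = pattern_matrix(4, 2, parity)
```

Here `F` has $2^n=16$ rows and $(n/t)^t=4$ columns, so even tiny parameters show how quickly the matrix grows with $n.$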
1.1 Our results
Our main result is a lower bound on the communication complexity of a pattern matrix in terms of the $\epsilon$-approximate degree of the base function $f.$ The lower bound holds for both classical and quantum protocols, regardless of prior entanglement.
Theorem 1.1 (communication complexity)
Let be the -pattern matrix, where is given. Then for every and every
Note that Theorem LABEL:thm:main-cc yields lower bounds for communication complexity with error probability for any In particular, apart from bounded-error communication (LABEL:eqn:main-cc-bounded), we obtain lower bounds for communication with small bias, i.e., error probability In Section LABEL:sec:additional-results, we derive another lower bound for small-bias communication, in terms of threshold weight
As R. de Wolf pointed out to us , the lower bound (LABEL:eqn:main-cc-bounded) for bounded-error communication is within a polynomial of optimal. More precisely, has a classical deterministic protocol with cost by the results of Beals et al. . See Proposition LABEL:prop:det-upper-bound for details. In particular, Theorem LABEL:thm:main-cc exhibits a large new class of communication problems whose quantum communication complexity is polynomially related to their classical complexity, even if prior entanglement is allowed. Before our work, the largest class of problems with polynomially related quantum and classical bounded-error complexities was the class of symmetric functions (see Theorem LABEL:thm:razborov03quantum below), which is broadly subsumed by Theorem LABEL:thm:main-cc. Exhibiting a polynomial relationship between the quantum and classical bounded-error complexities for all functions is a longstanding open problem.
Pattern matrices are of interest because they occur as submatrices in many natural communication problems. For example, Theorem LABEL:thm:main-cc can be interpreted in terms of function composition. Setting for concreteness, we obtain:
Let be given. Define by Then
As another illustration of Theorem 1.1, we revisit the quantum communication complexity of symmetric functions. In this setting Alice has a string $x\in\{0,1\}^n,$ Bob has a string $y\in\{0,1\}^n,$ and their objective is to compute $D(|x\wedge y|)$ for some predicate $D\colon\{0,1,\dots,n\}\to\{-1,+1\}$ fixed in advance. This framework encompasses several familiar functions, such as disjointness (determining if $x$ and $y$ intersect) and inner product modulo $2$ (determining if $x$ and $y$ intersect in an odd number of positions). In a celebrated result, Razborov established optimal lower bounds on the quantum communication complexity of every function of the above form:
Theorem 1.3 (Razborov)
Let be a given predicate. Put Then
where and are the smallest integers such that is constant in the range
Using Theorem LABEL:thm:main-cc, we give a new and simple proof of Razborov’s result. No alternate proof was available prior to this work, despite the fact that this problem has drawn the attention of various researchers [3, 12, 31, 29, 24, 42]. Moreover, the next-best lower bounds for general predicates were nowhere close to Theorem LABEL:thm:razborov03quantum. To illustrate, consider the disjointness predicate given by Theorem LABEL:thm:razborov03quantum shows that it has communication complexity while the next-best lower bound [3, 12] was only
Approximate rank and trace norm
We now describe some matrix-analytic consequences of our work. The -approximate rank of a matrix denoted is the least rank of a real matrix such that for all This natural analytic quantity arose in the study of quantum communication [66, 12, 56] and has since found applications to learning theory. In particular, Klivans and Sherstov  proved that concept classes (i.e., sign matrices) with high approximate rank are beyond the scope of all known techniques for efficient learning, in Kearns’ well-studied agnostic model . Exponential lower bounds were derived in  on the approximate rank of disjunctions, majority functions, and decision lists, with the corresponding implications for agnostic learning. We broadly generalize these results on approximate rank to any functions with high approximate degree or high threshold weight:
Theorem 1.4 (approximate rank)
Let be the -pattern matrix, where is given. Then for every and every
In addition, for every and every integer
We derive analogous results for the approximate trace norm, another matrix-analytic notion that has been studied in complexity theory. Theorem LABEL:thm:main-approx-rank is close to optimal for a broad range of parameters. See Section LABEL:sec:approx-rank for details.
The discrepancy of a function $f,$ denoted $\operatorname{disc}(f),$ is a combinatorial measure of the complexity of $f$ (small discrepancy corresponds to high complexity). This complexity measure plays a central role in the study of communication. In particular, it fully characterizes membership in the class of communication problems with efficient small-bias protocols. Discrepancy is also known to be equivalent to margin complexity, a key notion in learning theory. Finally, discrepancy is of interest in circuit complexity [21, 22, 47]. We are able to characterize the discrepancy of every pattern matrix in terms of threshold weight:
Theorem 1.5 (discrepancy)
Let be the -pattern matrix, for a given function Then
As we show in Section LABEL:sec:disc, Theorem LABEL:thm:main-discrepancy is close to optimal. It is a substantial improvement on the author’s earlier work .
As an application of Theorem 1.5, we revisit the discrepancy of $\mathsf{AC}^0,$ the class of polynomial-size constant-depth circuits with AND, OR, NOT gates. In an earlier work, we obtained the first exponentially small upper bound on the discrepancy of a function in $\mathsf{AC}^0.$ We used this result to prove that depth-$2$ majority circuits for $\mathsf{AC}^0$ require exponential size, solving an open problem due to Krause and Pudlák. Using Theorem 1.5, we are able to considerably sharpen the bound from that earlier work. Specifically, we prove:
We defer the new circuit implications and other discussion to Sections LABEL:sec:disc and LABEL:sec:app-discrepancy. Independently of our earlier work, Buhrman et al. exhibited another function in $\mathsf{AC}^0$ with exponentially small discrepancy:
Theorem (Buhrman et al.). Let be given by Then
Using Theorem LABEL:thm:main-discrepancy, we give a new and simple proof of this result.
1.2 Our techniques
The setting in which to view our work is the generalized discrepancy method, a straightforward but very useful principle introduced by Klauck and reformulated in its current form by Razborov. Let $f(x,y)$ be a Boolean function whose bounded-error communication complexity is of interest. The generalized discrepancy method asks for a Boolean function $h(x,y)$ and a distribution $\mu$ on $(x,y)$-pairs such that:
the functions $f$ and $h$ have correlation $\Omega(1)$ under $\mu$; and
all low-cost protocols have negligible advantage in computing $h$ under $\mu.$
If such $h$ and $\mu$ indeed exist, it follows that no low-cost protocol can compute $f$ to high accuracy (otherwise it would be a good predictor for the hard function $h$ as well). This method applies broadly to many models of communication, as we discuss in Section LABEL:sec:discrepancy. It generalizes Yao’s original discrepancy method, in which $h=f.$ The advantage of the generalized version is that it makes it possible, in theory, to prove lower bounds for functions such as disjointness, to which the traditional method does not apply.
The hard part, of course, is finding $h$ and $\mu$ with the desired properties. Except in rather restricted cases [29, Thm. 4], it was not known how to do it. As a result, the generalized discrepancy method was of limited practical use prior to this paper. Here we overcome this difficulty, obtaining $h$ and $\mu$ for a broad range of problems, namely, the communication problems of computing $f(x|_V).$
Pattern matrices are a crucial first ingredient of our solution. We derive an exact, closed-form expression for the singular values of a pattern matrix and their multiplicities. This spectral information reduces our search from $h$ and $\mu$ to a much smaller and simpler object, namely, a function $\psi\colon\{0,1\}^t\to\mathbb{R}$ with certain properties. On the one hand, $\psi$ must be well-correlated with the base function $f.$ On the other hand, $\psi$ must be orthogonal to all low-degree polynomials. We establish the existence of such $\psi$ by passing to the linear programming dual of the approximate degree of $f.$ Although the approximate degree and its dual are classical notions, we are not aware of any previous use of this duality to prove communication lower bounds.
For the results that feature threshold weight, we combine the above program with the dual characterization of threshold weight. To derive the remaining results on approximate rank, approximate trace norm, and discrepancy, we apply our main technique along with several additional matrix-analytic and combinatorial arguments.
1.3 Recent work on multiparty complexity
The method of this paper has recently enabled important progress in multiparty communication complexity by a number of researchers. Lee and Shraibman  and Chattopadhyay and Ada  observed that our method adapts in a straightforward way to the multiparty model, thereby obtaining much improved lower bounds on the communication complexity of disjointness for up to players. David and Pitassi  ingeniously combined this line of work with the probabilistic method, establishing a separation of the communication classes NP and BPP for up to players. Their construction was derandomized in a follow-up paper by David, Pitassi, and Viola , resulting in an explicit separation. See the survey article  for a unified guide to these results, complete with all the key proofs. A very recent development is due to Beame and Huynh-Ngoc , who continue this line of research with improved multiparty lower bounds for functions.
We start with a thorough review of technical preliminaries in Section 2. The two sections that follow are concerned with the two principal ingredients of our technique, the pattern matrices and the dual characterization of the approximate degree and threshold weight. Section LABEL:sec:pattern-matrix-method integrates them into the generalized discrepancy method and establishes our main result, Theorem 1.1. In Section LABEL:sec:additional-results, we prove an additional version of our main result using threshold weight. We characterize the discrepancy of pattern matrices in Section LABEL:sec:disc. Approximate rank and approximate trace norm are studied next, in Section LABEL:sec:approx-rank. We illustrate our main result in Section LABEL:sec:razborovs-result by giving a new proof of Razborov’s quantum lower bounds. As another illustration, we study the discrepancy of $\mathsf{AC}^0$ in Section LABEL:sec:app-discrepancy. We conclude with some remarks on the well-known log-rank conjecture in Section LABEL:sec:logrank and a discussion of related work in Section LABEL:sec:shi-zhu.
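We view Boolean functions as mappings $X\to\{-1,+1\}$ for a finite set $X,$ where $-1$ and $+1$ correspond to “true” and “false,” respectively. Typically, the domain will be $X=\{0,1\}^n$ or $X=\{0,1\}^n\times\{0,1\}^n.$ A predicate is a mapping $D\colon\{0,1,\dots,n\}\to\{-1,+1\}.$ The notation $[n]$ stands for the set $\{1,2,\dots,n\}.$ For a set $S\subseteq[n],$ its characteristic vector $\mathbf{1}_S\in\{0,1\}^n$ is defined by
\[ (\mathbf{1}_S)_i=\begin{cases}1 & \text{if } i\in S,\\ 0 & \text{otherwise.}\end{cases} \]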
For we put For we define For the notation refers as usual to the component-wise conjunction of and Analogously, the string stands for the component-wise disjunction of and In particular, is the number of positions in which the strings and both have a Throughout this manuscript, “” refers to the logarithm to base As usual, we denote the base of the natural logarithm by For any mapping where is a finite set, we adopt the standard notation We adopt the standard definition of the sign function:
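Finally, we recall the Fourier transform over $\mathbb{Z}_2^n.$ Consider the vector space of functions $\{0,1\}^n\to\mathbb{R},$ equipped with the inner product
\[ \langle f,g\rangle=2^{-n}\sum_{x\in\{0,1\}^n}f(x)g(x). \]
For $S\subseteq[n],$ define $\chi_S\colon\{0,1\}^n\to\{-1,+1\}$ by $\chi_S(x)=(-1)^{\sum_{i\in S}x_i}.$ Then $\{\chi_S\}_{S\subseteq[n]}$ is an orthonormal basis for the inner product space in question. As a result, every function $f\colon\{0,1\}^n\to\mathbb{R}$ has a unique representation of the form
\[ f(x)=\sum_{S\subseteq[n]}\hat f(S)\,\chi_S(x), \]
where $\hat f(S)=\langle f,\chi_S\rangle.$ The reals $\hat f(S)$ are called the Fourier coefficients of $f.$ The degree of $f,$ denoted $\deg(f),$ is the quantity $\max\{|S| : \hat f(S)\ne0\}.$ The orthonormality of $\{\chi_S\}$ immediately yields Parseval’s identity:
\[ \sum_{S\subseteq[n]}\hat f(S)^2=\langle f,f\rangle=2^{-n}\sum_{x\in\{0,1\}^n}f(x)^2. \]
The following fact is immediate from the definition of $\hat f(S)$:
Let $f\colon\{0,1\}^n\to\mathbb{R}$ be given. Then
\[ \max_{S\subseteq[n]}|\hat f(S)|\le2^{-n}\sum_{x\in\{0,1\}^n}|f(x)|. \]
A Boolean function $f\colon\{0,1\}^n\to\{-1,+1\}$ is called symmetric if $f(x)$ is uniquely determined by $x_1+\cdots+x_n.$ Equivalently, a Boolean function $f$ is symmetric if and only if
\[ f(x_1,x_2,\dots,x_n)=f(x_{\sigma(1)},x_{\sigma(2)},\dots,x_{\sigma(n)}) \]
for all inputs $x\in\{0,1\}^n$ and all permutations $\sigma\colon[n]\to[n].$ Note that there is a one-to-one correspondence between predicates and symmetric Boolean functions. Namely, one associates a predicate $D$ with the symmetric function $f(x)\equiv D(x_1+\cdots+x_n).$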
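As a small illustration of the Fourier expansion recalled above, the following sketch computes the coefficients $\hat f(S)=2^{-n}\sum_x f(x)\chi_S(x)$ by brute force and checks Parseval’s identity. The example function (majority on three bits, with $-1$ standing for “true”) and the helper names `chi` and `fourier` are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
from itertools import combinations, product

def chi(S, x):
    # Character chi_S(x) = (-1)^(sum of x_i over i in S)
    return -1 if sum(x[i] for i in S) % 2 else 1

def fourier(f, n):
    """Return {S: f_hat(S)} with f_hat(S) = 2^-n * sum_x f(x) chi_S(x)."""
    points = list(product((0, 1), repeat=n))
    subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
    return {S: Fraction(sum(f(x) * chi(S, x) for x in points), 2 ** n)
            for S in subsets}

# Majority on 3 bits in the -1 = "true" convention.
maj3 = lambda x: -1 if sum(x) >= 2 else 1
coeffs = fourier(maj3, 3)

# Parseval: the squared coefficients sum to 2^-n * sum_x f(x)^2, which is 1 here.
parseval_sum = sum(c * c for c in coeffs.values())
# Degree: the largest |S| with a nonzero coefficient.
degree = max(len(S) for S, c in coeffs.items() if c != 0)
```

Exact rational arithmetic (`Fraction`) is used so that the identity can be checked without floating-point tolerance.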
2.1 Matrix analysis
We draw freely on basic notions from matrix analysis. In particular, we assume familiarity with the singular value decomposition; positive semidefinite matrices; matrix similarity; matrix trace and its properties; the Kronecker product and its spectral properties; the relation between singular values and eigenvalues; and eigenvalue computation for matrices of simple form. An excellent reference on the subject is . The review below is limited to notation and the more substantial results.
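The symbol $\mathbb{R}^{m\times n}$ refers to the family of all $m\times n$ matrices with real entries. We specify matrices by their generic entry, e.g., $A=[F(i,j)]_{i,j}.$ In most matrices that arise in this work, the exact ordering of the columns (and rows) is irrelevant. In such cases we describe a matrix by the notation $[F(i,j)]_{i\in I,\,j\in J},$ where $I$ and $J$ are some index sets. We denote the rank of $A\in\mathbb{R}^{m\times n}$ by $\operatorname{rk}A.$ We also write
\[ \|A\|_\infty=\max_{i,j}|A_{ij}|,\qquad \|A\|_1=\sum_{i,j}|A_{ij}|. \]
We denote the singular values of $A$ by $\sigma_1(A)\ge\sigma_2(A)\ge\cdots\ge\sigma_{\min\{m,n\}}(A)\ge0.$ Recall that the spectral norm, trace norm, and Frobenius norm of $A$ are given by
\[ \|A\|=\max_{x\in\mathbb{R}^n,\ \|x\|=1}\|Ax\|=\sigma_1(A),\qquad \|A\|_\Sigma=\sum_i\sigma_i(A),\qquad \|A\|_F=\sqrt{\sum_{i,j}A_{ij}^2}=\sqrt{\sum_i\sigma_i(A)^2}. \]
For a square matrix $A\in\mathbb{R}^{n\times n},$ its trace is given by $\operatorname{tr}A=\sum_iA_{ii}.$
Recall that every matrix $A\in\mathbb{R}^{m\times n}$ has a singular value decomposition $A=U\Sigma V^{\mathsf T},$ where $U$ and $V$ are orthogonal matrices and $\Sigma$ is diagonal with entries $\sigma_1(A),\sigma_2(A),\dots,\sigma_{\min\{m,n\}}(A).$ For $A,B\in\mathbb{R}^{m\times n},$ we write $\langle A,B\rangle=\sum_{i,j}A_{ij}B_{ij}=\operatorname{tr}(AB^{\mathsf T}).$ A useful consequence of the singular value decomposition is:
\[ \langle A,B\rangle\le\|A\|\,\|B\|_\Sigma\qquad(A,B\in\mathbb{R}^{m\times n}). \]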
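Following earlier work, we define the $\epsilon$-approximate trace norm of a matrix $F\in\mathbb{R}^{m\times n}$ by
\[ \|F\|_{\Sigma,\epsilon}=\min\{\|A\|_\Sigma : \|F-A\|_\infty\le\epsilon\}. \]
The next proposition is a trivial consequence of (LABEL:eqn:hoffman-wielandt-conseq).
Let $F\in\mathbb{R}^{m\times n}$ and $\epsilon\ge0.$ Then
\[ \|F\|_{\Sigma,\epsilon}\ge\sup_{\Psi\in\mathbb{R}^{m\times n},\ \Psi\ne0}\frac{\langle F,\Psi\rangle-\epsilon\|\Psi\|_1}{\|\Psi\|}. \]
Proof. Fix any $\Psi\ne0$ and $A$ such that $\|F-A\|_\infty\le\epsilon.$ Then $\langle\Psi,A\rangle\le\|\Psi\|\,\|A\|_\Sigma$ by (LABEL:eqn:hoffman-wielandt-conseq). On the other hand, $\langle\Psi,A\rangle\ge\langle\Psi,F\rangle-\|\Psi\|_1\,\|F-A\|_\infty\ge\langle\Psi,F\rangle-\epsilon\|\Psi\|_1.$ Comparing these two estimates of $\langle\Psi,A\rangle$ gives the sought lower bound on $\|A\|_\Sigma.$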
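Following earlier work, we define the $\epsilon$-approximate rank of a matrix $F\in\mathbb{R}^{m\times n}$ by
\[ \operatorname{rk}_\epsilon F=\min\{\operatorname{rk}A : \|F-A\|_\infty\le\epsilon\}. \]
The approximate rank and approximate trace norm are related by virtue of the singular value decomposition, as follows.
Let $F\in\mathbb{R}^{m\times n}$ and $\epsilon\ge0$ be given. Then
\[ \operatorname{rk}_\epsilon F\ge\frac{(\|F\|_{\Sigma,\epsilon})^2}{\sum_{i,j}(|F_{ij}|+\epsilon)^2}. \]
Proof (adapted from earlier work). Fix $A$ with $\|F-A\|_\infty\le\epsilon.$ Then
\[ \|F\|_{\Sigma,\epsilon}\le\|A\|_\Sigma\le\|A\|_F\sqrt{\operatorname{rk}A}\le\Big(\sum_{i,j}(|F_{ij}|+\epsilon)^2\Big)^{1/2}\sqrt{\operatorname{rk}A}. \]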
We will also need a well-known bound on the trace norm of a matrix product, which we state with a proof for the reader’s convenience.
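For all real matrices $A$ and $B$ of compatible dimensions,
\[ \|AB\|_\Sigma\le\|A\|_F\,\|B\|_F. \]
Proof. Write the singular value decomposition $AB=U\Sigma V^{\mathsf T}.$ Let $u_1,u_2,\dots$ and $v_1,v_2,\dots$ stand for the columns of $U$ and $V,$ respectively. By definition, $\|AB\|_\Sigma$ is the sum of the diagonal entries of $\Sigma.$ We have:
\[ \|AB\|_\Sigma=\sum_iu_i^{\mathsf T}ABv_i\le\sum_i\|A^{\mathsf T}u_i\|\,\|Bv_i\|\le\sqrt{\sum_i\|A^{\mathsf T}u_i\|^2}\,\sqrt{\sum_i\|Bv_i\|^2}\le\|U^{\mathsf T}A\|_F\,\|BV\|_F=\|A\|_F\,\|B\|_F. \]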
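The two norm inequalities used in this subsection, namely $\langle A,B\rangle\le\|A\|\,\|B\|_\Sigma$ and the standard product bound $\|AB\|_\Sigma\le\|A\|_F\|B\|_F,$ can be sanity-checked numerically. The sketch below is restricted to $2\times2$ matrices, where the singular values have a closed form via $\sigma_1^2+\sigma_2^2=\|M\|_F^2$ and $\sigma_1\sigma_2=|\det M|$; the matrices and helper names are arbitrary illustrative choices.

```python
import math

def frob(M):
    # Frobenius norm: square root of the sum of squared entries
    return math.sqrt(sum(v * v for row in M for v in row))

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace_norm2(M):
    # sigma1 + sigma2 = sqrt(sigma1^2 + sigma2^2 + 2*sigma1*sigma2)
    return math.sqrt(frob(M) ** 2 + 2 * abs(det2(M)))

def spectral_norm2(M):
    # sigma1 = ((sigma1 + sigma2) + (sigma1 - sigma2)) / 2
    s, d = frob(M) ** 2, 2 * abs(det2(M))
    return (math.sqrt(s + d) + math.sqrt(max(s - d, 0.0))) / 2

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inner(A, B):
    # <A, B> = sum of entrywise products
    return sum(A[i][j] * B[i][j] for i in range(2) for j in range(2))

A = [[1, 2], [3, 4]]
B = [[0, 1], [-1, 2]]
AB = matmul2(A, B)

product_bound_holds = trace_norm2(AB) <= frob(A) * frob(B)
inner_bound_holds = inner(A, B) <= spectral_norm2(A) * trace_norm2(B)
```

A check on one pair of matrices is of course no substitute for the proofs; it only guards against misreading the direction or the norms in each inequality.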
2.2 Approximation and sign-representation
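For a function $f\colon\{0,1\}^n\to\mathbb{R},$ we define
\[ E(f,d)=\min_p\|f-p\|_\infty, \]
where the minimum is over real polynomials of degree up to $d.$ The $\epsilon$-approximate degree of $f,$ denoted $\deg_\epsilon(f),$ is the least $d$ with $E(f,d)\le\epsilon.$ In words, the $\epsilon$-approximate degree of $f$ is the least degree of a polynomial that approximates $f$ uniformly within $\epsilon.$
For a Boolean function $f\colon\{0,1\}^n\to\{-1,+1\},$ the $\epsilon$-approximate degree is of particular interest for $\epsilon=1/3.$ The choice of $\epsilon=1/3$ is a convention and can be replaced by any other constant in $(0,1)$ without affecting $\deg_\epsilon(f)$ by more than a multiplicative constant. Another well-studied notion is the threshold degree $\deg_\pm(f),$ defined for a Boolean function $f\colon\{0,1\}^n\to\{-1,+1\}$ as the least degree of a real polynomial $p$ with $f(x)\equiv\operatorname{sgn}p(x).$ In words, $\deg_\pm(f)$ is the least degree of a polynomial that represents $f$ in sign.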
So far we have considered representations of Boolean functions by real polynomials. Restricting the polynomials to have integer coefficients yields another heavily studied representation scheme. The main complexity measure here is the sum of the absolute values of the coefficients. Specifically, for a Boolean function $f\colon\{0,1\}^n\to\{-1,+1\},$ its degree-$d$ threshold weight $W(f,d)$ is defined to be the minimum $\sum_{|S|\le d}|\lambda_S|$ over all integers $\lambda_S$ such that
\[ f(x)\equiv\operatorname{sgn}\Big(\sum_{S\subseteq[n],\ |S|\le d}\lambda_S\chi_S(x)\Big). \]
If no such integers can be found, we put $W(f,d)=\infty.$ It is straightforward to verify that the following three conditions are equivalent: $W(f,d)=\infty$; $E(f,d)=1$; $d<\deg_\pm(f).$ In all expressions involving $W(f,d),$ we adopt the standard convention that $1/\infty=0$ and $\min\{t,\infty\}=t$ for any real $t.$
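For very small functions, the degree-$d$ threshold weight can be found by exhaustive search, which may help in reading the definition. The sketch below tries integer coefficient vectors $(\lambda_S)_{|S|\le d}$ in order of increasing total weight $\sum|\lambda_S|$ and returns the first one that sign-represents $f.$ The cap `max_weight` is an assumption of this sketch: a return value of `None` only means that no representation was found within the cap, not that $W(f,d)=\infty.$

```python
from itertools import combinations, product

def chi(S, x):
    # Character chi_S(x) = (-1)^(sum of x_i over i in S)
    return -1 if sum(x[i] for i in S) % 2 else 1

def sgn(t):
    return (t > 0) - (t < 0)

def threshold_weight(f, n, d, max_weight=6):
    """Brute-force the least sum |lambda_S| with f == sgn(sum lambda_S chi_S)."""
    points = list(product((0, 1), repeat=n))
    subsets = [S for k in range(d + 1) for S in combinations(range(n), k)]
    table = [[chi(S, x) for S in subsets] for x in points]
    target = [f(x) for x in points]
    for w in range(1, max_weight + 1):
        for lam in product(range(-w, w + 1), repeat=len(subsets)):
            if sum(abs(c) for c in lam) != w:
                continue  # only vectors of total weight exactly w
            if all(sgn(sum(c * v for c, v in zip(lam, row))) == tgt
                   for row, tgt in zip(table, target)):
                return w, dict(zip(subsets, lam))
    return None

or2 = lambda x: -1 if any(x) else 1   # OR on 2 bits, -1 = "true"
xor2 = lambda x: chi((0, 1), x)       # parity on 2 bits

w_or = threshold_weight(or2, 2, 1)    # degree 1 suffices for OR
w_xor_d1 = threshold_weight(xor2, 2, 1)
w_xor_d2 = threshold_weight(xor2, 2, 2)
```

For OR on two bits, the representation $\operatorname{sgn}(\chi_{\{1\}}+\chi_{\{2\}}-1)$ has weight $3,$ and the search confirms that no lighter degree-$1$ representation exists. For parity, no degree-$1$ representation is found, in line with its threshold degree being $2.$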
As one might expect, representations of Boolean functions by real and integer polynomials are closely related. In particular, we have the following relationship between $E(f,d)$ and $W(f,d).$
Let be given. Then for
with the convention that
Since Theorem LABEL:thm:E-vs-W is not directly used in our derivations, we defer its proof to Appendix LABEL:sec:E-vs-W. Similar statements have been noted earlier by several authors [38, 11]. We close this section with Paturi’s tight estimate  of the approximate degree for each symmetric Boolean function.
Theorem 2.6 (Paturi)
Let be a given function such that for some predicate Then
where and are the smallest integers such that is constant in the range
2.3 Quantum communication
This section reviews the quantum model of communication complexity. We include this review mainly for completeness; our proofs rely solely on a basic matrix-analytic property of such protocols and on no other aspect of quantum communication.
There are several equivalent ways to describe a quantum communication protocol. Our description closely follows Razborov . Let and be complex finite-dimensional Hilbert spaces. Let be a Hilbert space of dimension whose orthonormal basis we denote by Consider the tensor product which is itself a Hilbert space with an inner product inherited from and The state of a quantum system is a unit vector in and conversely any such unit vector corresponds to a distinct quantum state. The quantum system starts in a given state and traverses a sequence of states, each obtained from the previous one via a unitary transformation chosen according to the protocol. Formally, a quantum communication protocol is a finite sequence of unitary transformations
where: and are the identity transformations in and respectively; are unitary transformations in ; and are unitary transformations in The cost of the protocol is the length of this sequence, namely, On Alice’s input and Bob’s input (where are given finite sets), the computation proceeds as follows.
The quantum system starts out in an initial state
Through successive applications of the above unitary transformations, the system reaches the state
Let denote the projection of onto The output of the protocol is with probability and with the complementary probability
All that remains is to specify how the initial state is constructed from It is here that the model with prior entanglement differs from the model without prior entanglement. In the model without prior entanglement, and have orthonormal bases and respectively, where is a finite set corresponding to the private workspace of each of the parties. The initial state is the pure state
where is a certain fixed element. In the model with prior entanglement, the spaces and have orthonormal bases and respectively, where is as before and is a finite set corresponding to the prior entanglement. The initial state is now the entangled state
Apart from finite size, no assumptions are made about or In particular, the model with prior entanglement allows for an unlimited supply of entangled qubits. This mirrors the unlimited supply of shared random bits in the classical public-coin randomized model.
Let be a given function. A quantum protocol is said to compute with error if
for all where the random variable is the output of the protocol on input Let denote the least cost of a quantum protocol without prior entanglement that computes with error Define analogously for protocols with prior entanglement. The precise choice of a constant affects and by at most a constant factor, and thus the setting entails no loss of generality.
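Let $D\colon\{0,1,\dots,n\}\to\{-1,+1\}$ be a predicate. We associate with $D$ the function $f\colon\{0,1\}^n\times\{0,1\}^n\to\{-1,+1\}$ defined by $f(x,y)=D(|x\wedge y|).$ We let $Q_\epsilon(D)=Q_\epsilon(f)$ and $Q^*_\epsilon(D)=Q^*_\epsilon(f).$ More generally, by computing $D$ in the quantum model we mean computing the associated function $f.$ We write $R_\epsilon(f)$ for the least cost of a classical public-coin protocol for $f$ that errs with probability at most $\epsilon$ on any given input. Another classical model that figures in this paper is the deterministic model. We let $D(f)$ denote the deterministic communication complexity of $f.$ Throughout this paper, by the communication complexity of a Boolean matrix $F=[F_{ij}]_{i\in I,\,j\in J}$ we will mean the communication complexity of the associated function $f\colon I\times J\to\{-1,+1\}$ given by $f(i,j)=F_{ij}.$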
2.4 The generalized discrepancy method
The generalized discrepancy method is an intuitive and elegant technique for proving communication lower bounds. A starting point in our discussion is the following fact due to Linial and Shraibman [42, Lem. 10], with closely analogous statements established earlier by Yao , Kremer , and Razborov .
Let be finite sets. Let be a quantum protocol with or without prior entanglement with cost qubits and input sets and Then
for some real matrices with and
Theorem LABEL:thm:protocol2matrix states that the matrix of acceptance probabilities of every low-cost protocol has a nontrivial factorization. This transition from quantum protocols to matrix factorization is a standard technique and has been used by various authors in various contexts.
The generalized discrepancy method was first applied by Klauck [29, Thm. 4] and reformulated more broadly by Razborov . The treatment in  is informal. In what follows, we propose a precise formulation of the generalized discrepancy method and supply a proof.
Theorem 2.8 (generalized discrepancy method)
Let be finite sets and a given function. Let be any real matrix with Then for each
Proof. Let be a quantum protocol with prior entanglement that computes with error and cost Put
Then we can write where is the all-ones matrix and is some matrix with As a result,
On the other hand, Theorem LABEL:thm:protocol2matrix guarantees the existence of matrices and with and Therefore,
by Prop. LABEL:prop:bound-on-trace-of-product
The theorem follows by comparing (LABEL:eqn:lower-inner-product) and (LABEL:eqn:upper-inner-product).
Theorem LABEL:thm:discrepancy-method is not to be confused with Razborov’s multidimensional technique, also found in , which we will have no occasion to use or describe.
We will now abstract away the particulars of Theorem LABEL:thm:discrepancy-method and articulate the fundamental mathematical technique in question. This will clarify the generalized discrepancy method and show that it is simply an extension of Yao’s original discrepancy method [40, §3.5]. Let be a given function whose communication complexity we wish to estimate. The underlying communication model is irrelevant at this point. Suppose we can find a function and a distribution on that satisfy the following two properties.
Correlation. The functions and are well correlated under :
where is a given constant.
Hardness. No low-cost protocol in the given model of communication can compute to a substantial advantage under Formally, if is a protocol in the given model with cost bits, then
where The inner expectation in (LABEL:eqn:hardness-under-mu) is over the internal operation of the protocol on the fixed input
If the above two conditions hold, we claim that any protocol in the given model that computes with error at most on each input must have cost Indeed, let be a protocol with for all Then standard manipulations reveal:
where the last step uses (LABEL:eqn:correlation-f-g). In view of (LABEL:eqn:hardness-under-mu), this shows that must have cost
We attach the term generalized discrepancy method to this abstract framework. Readers with background in communication complexity will note that the original discrepancy method of Yao [40, §3.5] corresponds to the case when and the communication takes place in the two-party randomized model.
The purpose of our abstract discussion was to expose the fundamental mathematical technique in question, which is independent of the communication model. Indeed, the communication model enters the picture only in the proof of (LABEL:eqn:hardness-under-mu). It is here that the analysis must exploit the particularities of the model. To place an upper bound on the advantage under in the quantum model with entanglement, as we see from (LABEL:eqn:upper-inner-product), one considers the quantity where In the classical randomized model, the quantity to estimate happens to be
which is known as the discrepancy of under
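For intuition, discrepancy under a distribution can be computed by brute force on tiny matrices. The sketch assumes the usual rectangle-based definition, $\max_{S,T}\big|\sum_{x\in S}\sum_{y\in T}\mu(x,y)h(x,y)\big|,$ with $S$ and $T$ ranging over all subsets of rows and columns; the function name `discrepancy` and the example matrix are illustrative, and the enumeration is exponential in the matrix dimensions.

```python
from fractions import Fraction
from itertools import product

def discrepancy(H, mu):
    """Max over rectangles S x T of |sum of mu[i][j] * H[i][j] over the rectangle|."""
    rows, cols = len(H), len(H[0])
    best = Fraction(0)
    for S in product((0, 1), repeat=rows):        # indicator vector of S
        for T in product((0, 1), repeat=cols):    # indicator vector of T
            total = sum(mu[i][j] * H[i][j]
                        for i in range(rows) if S[i]
                        for j in range(cols) if T[j])
            best = max(best, abs(total))
    return best

# Inner product mod 2 on one bit, in +1/-1 form, under the uniform distribution.
H2 = [[1, 1],
      [1, -1]]
uniform2 = [[Fraction(1, 4)] * 2 for _ in range(2)]
disc_H2 = discrepancy(H2, uniform2)

# The 4 x 4 analogue (inner product mod 2 on two bits).
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]
uniform4 = [[Fraction(1, 16)] * 4 for _ in range(4)]
disc_H4 = discrepancy(H4, uniform4)
```

For the $2\times2$ matrix the full rectangle already attains $1/2.$ For the $4\times4$ matrix the value cannot exceed $1/2$ either, since the sum of a Hadamard matrix of order $N$ over any rectangle $S\times T$ is at most $\sqrt{|S|\,|T|\,N}$ in absolute value.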
3 Duals of approximation and sign-representation
Crucial to our work are the dual characterizations of the uniform approximation and sign-representation of Boolean functions by real polynomials. As a starting point, we recall a classical result from approximation theory due to Ioffe and Tikhomirov on the duality of norms. A more recent treatment is available in the textbook of DeVore and Lorentz, p. 61, Thm. 1.3. We provide a short and elementary proof of this result in Euclidean space, which will suffice for our purposes. We let $\mathbb{R}^X$ stand for the linear space of real functions on the set $X.$
Theorem 3.1 (Ioffe and Tikhomirov)
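Let $X$ be a finite set. Fix $\Phi\subseteq\mathbb{R}^X$ and a function $f\colon X\to\mathbb{R}.$ Then
\[ \min_{\phi\in\operatorname{span}(\Phi)}\|f-\phi\|_\infty=\max_\psi\Big\{\sum_{x\in X}f(x)\psi(x)\Big\}, \]
where the maximum is over all functions $\psi\colon X\to\mathbb{R}$ such that
\[ \sum_{x\in X}|\psi(x)|\le1 \]
and, for each $\phi\in\Phi,$
\[ \sum_{x\in X}\phi(x)\psi(x)=0. \]
Proof. The theorem holds trivially when $\operatorname{span}(\Phi)=\{0\}.$ Otherwise, let $\phi_1,\dots,\phi_k$ be a basis for $\operatorname{span}(\Phi).$ Observe that the left member of the identity in the theorem is the optimum of the following linear program in the variables $\epsilon,\alpha_1,\dots,\alpha_k$:
\[ \begin{array}{ll}\text{minimize:} & \epsilon\\ \text{subject to:} & \Big|f(x)-\sum_{i=1}^k\alpha_i\phi_i(x)\Big|\le\epsilon\quad\text{for each } x\in X,\\ & \alpha_i\in\mathbb{R}\quad\text{for each } i,\\ & \epsilon\ge0.\end{array} \]
Standard manipulations reveal the dual:
\[ \begin{array}{ll}\text{maximize:} & \sum_{x\in X}\psi_xf(x)\\ \text{subject to:} & \sum_{x\in X}|\psi_x|\le1,\\ & \sum_{x\in X}\psi_x\phi_i(x)=0\quad\text{for each } i,\\ & \psi_x\in\mathbb{R}\quad\text{for each } x\in X.\end{array} \]
Both programs are clearly feasible and thus have the same finite optimum. We have already observed that the optimum of the first program is the left-hand side of the identity in the theorem. Since $\phi_1,\dots,\phi_k$ form a basis for $\operatorname{span}(\Phi),$ the optimum of the second program is by definition the right-hand side.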
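The duality in Theorem 3.1 can be seen at work on a two-bit example. The sketch below takes $f$ to be OR on two bits (with $-1$ for “true”) and $\Phi$ to be the polynomials of degree at most $1,$ spanned by $1,\chi_{\{1\}},\chi_{\{2\}}.$ It exhibits a degree-$1$ polynomial with uniform error $1/2$ and a dual function $\psi$ of $\ell_1$ norm $1,$ orthogonal to $\Phi,$ with $\sum_xf(x)\psi(x)=1/2.$ Since any feasible primal value is at least any feasible dual value, the pair certifies that the best degree-$1$ error is exactly $1/2.$ The particular $p$ and $\psi$ were chosen by hand for this illustration.

```python
from fractions import Fraction
from itertools import product

points = list(product((0, 1), repeat=2))
f = {x: (-1 if any(x) else 1) for x in points}   # OR in +1/-1 form

# Primal side: the degree-1 polynomial p(x) = 1/2 - (x1 + x2).
p = {x: Fraction(1, 2) - sum(x) for x in points}
primal_error = max(abs(f[x] - p[x]) for x in points)

# Dual side: psi = chi_{1,2} / 4, a scaled parity of both bits.
psi = {x: Fraction((-1) ** sum(x), 4) for x in points}
l1_norm = sum(abs(v) for v in psi.values())

# Orthogonality to a basis of the degree-<=1 polynomials: 1, chi_1, chi_2.
basis = [lambda x: 1, lambda x: (-1) ** x[0], lambda x: (-1) ** x[1]]
orthogonality = [sum(phi(x) * psi[x] for x in points) for phi in basis]

dual_value = sum(f[x] * psi[x] for x in points)
```

Here `primal_error` and `dual_value` coincide, so neither side can be improved. In particular, no degree-$1$ polynomial approximates this $f$ within $1/3.$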
As a corollary to Theorem LABEL:thm:berr-morth, we obtain a dual characterization of the approximate degree.
Theorem 3.2 (approximate degree)
Fix Let be given, Then there is a function such that
Proof. Set and