# A Characterization of Approximation Resistance

## Abstract

A predicate $f:\{-1,1\}^k \to \{0,1\}$ with density $\rho(f) = \frac{|f^{-1}(1)|}{2^k}$ is called approximation resistant if, given a near-satisfiable instance of CSP$(f)$, it is computationally hard to find an assignment that satisfies at least a $\rho(f)+\Omega(1)$ fraction of the constraints.

We present a complete characterization of approximation resistant predicates under the Unique Games Conjecture. We also present characterizations in the mixed linear and semi-definite programming hierarchy and the Sherali-Adams linear programming hierarchy. In the former case, the characterization coincides with the one based on UGC. Each of the two characterizations is in terms of existence of a probability measure with certain symmetry properties on a natural convex polytope associated with the predicate.

## 1 Introduction

Constraint satisfaction problems (CSPs) are some of the most well-studied NP-hard problems. Given a predicate $f:\{-1,1\}^k \to \{0,1\}$, an instance of CSP$(f)$ consists of $\{-1,1\}$-valued$^{1}$ variables and constraints, where each constraint is the predicate $f$ applied to an ordered subset of $k$ variables, possibly in negated form. For example, the OR predicate on $k$ variables corresponds to the $k$-SAT problem, whereas the PARITY predicate (i.e. whether the product of the variables equals a prescribed sign) on $k$ variables corresponds to the $k$-LIN problem. The satisfiability problem for CSP$(f)$ asks whether there is an assignment that satisfies all the constraints. A well-known dichotomy result of Schaefer [35] shows that for every predicate $f$, the satisfiability problem for CSP$(f)$ is either in P or NP-complete; moreover, his characterization explicitly gives a (short) list of predicates for which the problem is in P.

An instance of CSP$(f)$ is called $\alpha$-satisfiable if there is an assignment that satisfies at least an $\alpha$ fraction of the constraints. The focus of this paper is whether, given a $(1-o(1))$-satisfiable instance, there is an efficient algorithm with a non-trivial performance. The density $\rho(f)$ of the predicate is the probability that a uniformly random assignment to its variables satisfies the predicate. Given an instance of CSP$(f)$, a naive algorithm that assigns random values to its variables yields an assignment that satisfies a $\rho(f)$ fraction of the constraints in expectation.
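As a quick sanity check of the density and the random-assignment baseline just described, the following is a minimal Python sketch. The helper names (`density`, `or_predicate`, `parity_predicate`) and the sign conventions (that $-1$ encodes "true" and that PARITY is satisfied when the product is $-1$) are assumptions made for illustration only, not notation taken from this paper.

```python
from itertools import product

def density(f, k):
    """Fraction of the 2^k assignments in {-1,1}^k that satisfy f.

    This equals the expected fraction of constraints satisfied by a
    uniformly random assignment.
    """
    return sum(f(x) for x in product((-1, 1), repeat=k)) / 2 ** k

def or_predicate(x):
    # Assumed convention: -1 encodes "true", so OR fails only on all +1.
    return int(-1 in x)

def parity_predicate(x):
    # Assumed convention: satisfied when the product of the variables is -1.
    prod = 1
    for xi in x:
        prod *= xi
    return int(prod == -1)

print(density(or_predicate, 3))      # density of OR on 3 variables
print(density(parity_predicate, 3))  # density of PARITY on 3 variables
```

Under these conventions the two densities are $7/8$ and $1/2$, which are the trivial thresholds that a non-trivial algorithm for 3SAT and 3LIN would have to beat.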

With this observation in mind, we consider the well-studied notion of approximation resistance. The instance is promised to be $(1-o(1))$-satisfiable and the algorithm is considered non-trivial if it finds an assignment such that the fraction of constraints satisfied is at least $\rho(f)+\Omega(1)$, i.e. the algorithm has to do something more clever than outputting a random assignment. If such an efficient algorithm exists, the predicate is called approximable, and approximation resistant otherwise.

Towards the study of approximation resistance, it is convenient to define the gap version of the problem. GapCSP$(f)_{c,s}$ is a promise problem such that the instance is guaranteed to be either $c$-satisfiable or at most $s$-satisfiable. Thus a predicate is approximation resistant if GapCSP$(f)_{1-o(1),\,\rho(f)+o(1)}$ is not in P. For resistant predicates, one would ideally like to show that the corresponding gap problem is NP-hard or, as is often the case, settle for a weaker notion of hardness such as UG-hardness (i.e. NP-hardness assuming the Unique Games Conjecture [24]) or hardness, a.k.a. an integrality gap, for a specific family of linear or semidefinite programming relaxations.

Until the early 1990s, very little, if anything, was known regarding whether any interesting predicate is approximable or approximation resistant. By now we have a much better understanding of this issue thanks to a sequence of spectacular results. Goemans and Williamson [17, 37] showed that 2SAT and 2LIN are approximable.$^{2}$ The discovery of the PCP Theorem [15, 2, 1], aided by works such as [7, 33], eventually led to Håstad’s result that 3SAT and 3LIN are approximation resistant and, in fact, that the appropriate gap versions are NP-hard. Since then, many predicates have been shown to be approximation resistant (see e.g. [18, 34, 23, 14], all NP-hardness) and most recently, a remarkable result of Chan [9] shows the approximation resistance of the Hypergraph Linearity Predicate (he shows NP-hardness, whereas UG-hardness was shown earlier in [34]). Also, a general result of Raghavendra [30] shows that if a predicate is approximable, then it is so via a natural SDP relaxation of the problem followed by a rounding of the solution (the result is more general than stated: it applies to every $(c,s)$-gap).

In this paper, our focus is on obtaining a complete characterization of approximation resistance for all predicates, in the spirit of Schaefer’s theorem. There has been some progress in this direction that we sketch now. Every predicate of arity $2$ is approximable, as follows from Goemans and Williamson’s algorithm [17].$^{3}$ A complete classification of predicates of arity $3$ is known [37, 39]: a predicate of arity $3$ is approximation resistant (NP-hard) if it is implied by PARITY up to variable negations and approximable otherwise. For predicates of arity $4$, Hast [19] gives a partial classification. Austrin and Mossel [6] show that a predicate is approximation resistant (UG-hard) if the set of its satisfying assignments supports a pairwise independent distribution (for a somewhat more general sufficient condition see [4]). Using this sufficient condition, Austrin and Håstad [3] show that a vast majority of $k$-ary predicates for large $k$ are approximation resistant. Hast [20] shows that a $k$-ary predicate with sufficiently few satisfying assignments is approximable (see [20] for the precise bound).

In spite of all these works, a complete characterization of approximation resistance remained elusive. A recent result of Austrin and Khot [5] gives a complete characterization of approximation resistance (UGC-based) when the CSP is restricted to be $k$-partite$^{4}$ and the predicate is even.$^{5}$ Given an even predicate $f$, the authors therein associate with it a convex polytope consisting of all vectors of dimension $\binom{k}{2}$ that arise as the second moment vectors of distributions supported on $f^{-1}(1)$. It is shown that the $k$-partite version of CSP$(f)$ is approximation resistant (UG-hard) if and only if this polytope supports a distribution (a probability measure, to be more precise) with a certain (difficult to state) property. The $k$-partiteness condition is rather restrictive, and without the evenness condition one would need to take into account the first moment vector as well; it is not clear how to incorporate this in [5].

#### Characterizing Approximation Resistance

In this paper, we indeed give a complete characterization of approximation resistance, via an approach that is entirely different from that of [5]. Before stating the characterization, we point out that it is not as simple as one may wish and that we do not yet know whether it is decidable; both features are shared by the result in [5].$^{6}$

Roughly speaking, our characterization states that a predicate is approximation resistant (UG-hard) if and only if a convex polytope associated with it supports a probability measure with certain symmetry properties.$^{7}$ Specifically, let $\mathcal{C}(f)$ be the convex polytope consisting of all vectors of dimension $k+\binom{k}{2}$ that arise as the first and second moment vectors

$$\left( \left(\mathbb{E}_{z\sim\nu}[z_i] \;\middle|\; 1\le i\le k\right),\ \left(\mathbb{E}_{z\sim\nu}[z_i z_j] \;\middle|\; 1\le i<j\le k\right) \right)$$

of distributions $\nu$ supported on $f^{-1}(1)$. For a measure $\Lambda$ on $\mathcal{C}(f)$ and a subset $S \subseteq [k]$, let $\Lambda_S$ denote the projection of $\Lambda$ onto the co-ordinates in $S$. For a permutation $\pi$ of $S$ and a choice of signs $b \in \{-1,1\}^S$, let $\Lambda_{S,\pi,b}$ denote the measure $\Lambda_S$ after permuting the indices in $S$ according to $\pi$ and then (possibly) negating the co-ordinates according to multiplication by $b$. We are now ready to state our characterization.

###### Definition 1.1

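Consider the family of all predicates $f$ (of all arities $k$) for which there is a probability measure $\Lambda$ on $\mathcal{C}(f)$ such that for every $t \in [k]$, the signed measure

$$\Lambda^{(t)} := \mathbb{E}_{|S|=t}\ \mathbb{E}_{\pi:[t]\to[t]}\ \mathbb{E}_{b\in\{-1,1\}^{t}}\left[\left(\prod_{i=1}^{t} b_i\right)\cdot \hat{f}(S)\cdot \Lambda_{S,\pi,b}\right] \tag{1.1}$$

vanishes identically. If so, $\Lambda$ itself is said to vanish.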

Much elaboration is in order. In the above expression, the expectation is over a random subset $S$ of $[k]$ of size $t$, a random permutation $\pi$ and a random choice of signs $b$ on the $t$ co-ordinates of $S$. The coefficients $\hat{f}(S)$ are the Fourier coefficients of the predicate $f$, namely, the coefficients in the Fourier representation:

$$f(x_1,\ldots,x_k)=\rho(f)+\sum_{S\neq\emptyset}\hat{f}(S)\prod_{i\in S}x_i.$$
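The Fourier representation above can be computed by brute force for small predicates. The sketch below is illustrative only: the function names are hypothetical, subsets are indexed from $0$ rather than $1$, and the PARITY convention (satisfied when the product is $-1$) is an assumption.

```python
from itertools import combinations, product

def fourier_coefficients(f, k):
    """Return {S: f_hat(S)} for every subset S of {0,...,k-1}.

    f_hat(S) is the average of f(x) * prod_{i in S} x_i over {-1,1}^k,
    so the coefficient of the empty set is the density of f.
    """
    points = list(product((-1, 1), repeat=k))
    coeffs = {}
    for size in range(k + 1):
        for S in combinations(range(k), size):
            total = 0
            for x in points:
                chi = 1
                for i in S:
                    chi *= x[i]
                total += f(x) * chi
            coeffs[S] = total / len(points)
    return coeffs

def evaluate(coeffs, x):
    """Evaluate the multilinear expansion at a point x in {-1,1}^k."""
    value = 0.0
    for S, c in coeffs.items():
        term = c
        for i in S:
            term *= x[i]
        value += term
    return value

def parity3(x):
    # Assumed convention: satisfied when the product of the variables is -1.
    return int(x[0] * x[1] * x[2] == -1)

coeffs = fourier_coefficients(parity3, 3)
print(coeffs[()], coeffs[(0, 1, 2)])
```

For this predicate the only non-zero coefficients are the constant term (the density) and the top-level coefficient, matching the identity $f(x) = \tfrac12 - \tfrac12 x_1x_2x_3$ under the assumed convention.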

A signed measure is allowed to take negative values as well (as is evident from the possibly negative sign of $\prod_{i} b_i$ and $\hat{f}(S)$ in the above expression). An equivalent way to state the condition is that if one writes Expression (1.1) as a difference of two non-negative measures by grouping the terms with positive and negative coefficients respectively, then the two non-negative measures are identical.

Our characterization states that if $f$ belongs to the family of Definition 1.1, then $f$ is approximation resistant (UG-hard), and otherwise it is approximable. In the former case, the vanishing measure $\Lambda$ is a hard to round measure (in fact, any proposed hard to round measure must be a vanishing measure). In the latter case, we can in fact conclude that the predicate is approximable via a natural SDP relaxation followed by a $(k+1)$-dimensional rounding algorithm. A $(k+1)$-dimensional rounding algorithm samples a $(k+1)$-dimensional rounding function $\psi$ from an appropriate distribution, projects the SDP vectors onto a random $(k+1)$-dimensional subspace and then rounds using $\psi$. We find this conclusion rather surprising. As mentioned earlier, it follows from Raghavendra [30] that if a predicate is approximable then it is so via (the same) SDP relaxation followed by a rounding. However, his rounding (and/or the one in [31]) is high dimensional in the sense that one first projects onto a random $d$-dimensional subspace and then rounds using an appropriately sampled function, and there is no a priori upper bound on the dimension $d$ required.

It is instructive to check that our characterization generalizes the sufficient condition for approximation resistance due to Austrin and Mossel [6]. Suppose that a predicate $f$ supports a pairwise independent distribution. This amounts to saying that the $\left(k+\binom{k}{2}\right)$-dimensional all-zeroes vector lies in the polytope $\mathcal{C}(f)$. It is immediate that the measure $\Lambda$ concentrated at this single vector is vanishing (the all-zeroes vector and its projections onto subsets remain unchanged under sign-flips via $b$, and these terms cancel each other out due to the sign $\prod_{i} b_i$ in the expression), and hence the predicate is approximation resistant. It is also instructive to check the case $t=1$. In this case, the condition implies, in particular, that

$$\mathbb{E}_{\zeta\sim\Lambda}\left[\sum_{i=1}^{k}\hat{f}(\{i\})\cdot\zeta(i)\right]=0.$$

Here $\zeta(i)$ denotes the $i$-th first moment (i.e. bias) in the vector $\zeta$. For all the predicates that are known to be approximation resistant so far in the literature, there is always a single hard to round point $\zeta$, i.e. the measure $\Lambda$ is concentrated at a single point $\zeta$. In that case, the above condition specializes to $\sum_{i=1}^{k}\hat{f}(\{i\})\cdot\zeta(i)=0$, and this condition is known to be necessary (as folklore among the experts at least). This is because otherwise a rounding that simply rounds each variable according to its bias given by the LP relaxation (and then flips the signs of all variables simultaneously if necessary) will strictly exceed the threshold $\rho(f)$. The term $\sum_{i=1}^{k}\hat{f}(\{i\})\cdot\zeta(i)$ represents the contribution to the advantage over $\rho(f)$ by the level-$1$ Fourier coefficients, and a standard trick allows one to ignore the (potentially troublesome) interference from higher order Fourier levels. The conditions for $t \geq 2$ intuitively rule out successively more sophisticated rounding strategies, and taken together for all $t \in [k]$ they form a complete set of necessary and sufficient conditions for strong approximation resistance.
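The two checks discussed above (the all-zeroes moment vector of a pairwise independent distribution, and the level-one term for a single point) can be illustrated numerically. This is a hedged sketch: all function names are hypothetical, indices start at $0$, and the conventions that $-1$ encodes "true" for OR and that PARITY means product $-1$ are assumptions. The uniform distribution on satisfying assignments is used only as one convenient example of a point in the polytope.

```python
from itertools import combinations, product

def moment_vector(dist, k):
    """First and second moments of a distribution on {-1,1}^k.

    `dist` maps assignments (tuples) to probabilities. The result lists
    the k first moments followed by the k-choose-2 second moments.
    """
    first = [sum(p * z[i] for z, p in dist.items()) for i in range(k)]
    second = [sum(p * z[i] * z[j] for z, p in dist.items())
              for i, j in combinations(range(k), 2)]
    return first + second

def uniform_on_satisfying(f, k):
    """Uniform distribution over the satisfying assignments of f."""
    sat = [x for x in product((-1, 1), repeat=k) if f(x)]
    return {x: 1 / len(sat) for x in sat}

def parity3(x):
    # Assumed convention: satisfied when the product is -1.
    return int(x[0] * x[1] * x[2] == -1)

def or3(x):
    # Assumed convention: -1 encodes "true".
    return int(-1 in x)

def level_one_term(singleton_coeffs, zeta, k):
    """Sum over i of f_hat({i}) * zeta(i), using the first k entries of zeta."""
    return sum(singleton_coeffs[i] * zeta[i] for i in range(k))

# PARITY on 3 variables: the uniform distribution on satisfying
# assignments is pairwise independent, so the moment vector is all zeroes.
zero_point = moment_vector(uniform_on_satisfying(parity3, 3), 3)

# OR on 3 variables: under the assumed convention each singleton Fourier
# coefficient is -1/8, and this particular point has a non-zero level-one term.
or_point = moment_vector(uniform_on_satisfying(or3, 3), 3)
or_level_one = level_one_term([-1 / 8] * 3, or_point, 3)

print(zero_point)
print(or_level_one)
```

The non-zero level-one term for the second point only says that this particular point is not hard to round; it says nothing about other points of the polytope for the same predicate.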

It seems appropriate to point out another aspect in which our result differs from [30, 31]. It can be argued (as also discussed in [5]) that [31] also gives a characterization of approximation resistance in the following sense. The authors therein propose a brute force search over all instances and their potential SDP solutions on a bounded number of variables (the bound depending on $\varepsilon$), which determines the hardness threshold up to an additive $\varepsilon$. Thus if a predicate is approximable with an advantage of, say, $\varepsilon$ over the trivial threshold $\rho(f)$ and if $\varepsilon$ were known a priori, then the brute force search would be able to affirm this. However, there is no a priori lower bound on $\varepsilon$ and thus this characterization is not known to be decidable either. Moreover, it seems somewhat of a stretch to call it a characterization because of the nature of the search involved. On the other hand, our characterization is in terms of concrete symmetry properties of a measure supported on the explicit and natural polytope $\mathcal{C}(f)$. The characterization does not depend on the topology (i.e. the hyper-graph structure) of the CSP instance. We find this conclusion rather surprising as well. A priori, what might make a predicate hard is both a hard to round measure over local LP/SDP distributions (i.e. a measure on $\mathcal{C}(f)$) as well as the topology of the constraint hyper-graph (i.e. how the variables and constraints fit together). Our conclusion is that the latter aspect is not relevant, not in any direct manner at least. This conclusion may be contrasted against Raghavendra’s result. He shows that any SDP integrality gap instance can be used as a gadget towards proving a UG-hardness result with the same gap. The instance here refers to both the variable-constraint topology and the local LP/SDP distributions, and from his result it is not clear whether one or the other or both aspects are required to make the CSP hard.

When CSP instances are restricted to be $k$-partite as in [5], we are able to obtain a complete characterization. If $f$ belongs to the family defined below, then the partite version is approximation resistant, and otherwise the partite version is approximable.

###### Definition 1.2

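Consider the family of all predicates $f$ (of all arities $k$) for which there is a probability measure $\Lambda$ on $\mathcal{C}(f)$ such that for every non-empty $S \subseteq [k]$, the signed measure

$$\Lambda^{S} := \hat{f}(S)\cdot\mathbb{E}_{b\in\{-1,1\}^{S}}\left[\left(\prod_{i\in S} b_i\right)\cdot \Lambda_{S,b}\right] \tag{1.2}$$

vanishes identically.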

The difference from Definition 1.1 is that each non-empty set $S \subseteq [k]$ is considered separately and there are no permutations of the set. We note that for even predicates, the first $k$ co-ordinates in the body $\mathcal{C}(f)$, corresponding to the first moments (i.e. “biases”), can be assumed to be identically zero, and then the characterization boils down to the one in [5] (though there it is stated differently).

We point out some directions left open by the discussion so far (we do not consider these as the focus of the current paper). Firstly, it would be nice to show that our characterization is decidable. Secondly, we are not aware of an approximation resistant predicate where one needs a combination of more than one hard to round point in $\mathcal{C}(f)$. In other words, it might be the case that for every approximation resistant predicate, there exists a vanishing measure on $\mathcal{C}(f)$ that is concentrated on a single point or on a bounded number of points with an a priori bound. If this were the case, our characterization would be decidable (we omit the proof). In this regard, it would be interesting to investigate the example of a predicate in [4] (Example 8.7 therein). The authors show that the predicate is approximation resistant by presenting a hard to round point. However, the approximation resistance is shown in an ad hoc manner that, as far as we see, does not immediately give a vanishing measure. Such a measure must exist by our results and would perhaps require a combination of more than one point. Thirdly, it would be interesting to show that for some special classes of predicates our characterization takes a much simpler form. For instance, [11] asks whether there is a linear threshold predicate that is approximation resistant. It would be nice if for such predicates our characterization took a simpler form and led to a resolution of the question. Finally, for predicates that do not satisfy our characterization and hence are not approximation resistant, our result suggests that there are sophisticated rounding algorithms whose analysis may require looking at terms at level $3$ and above in the Fourier representation (as opposed to only using terms at the first and second levels). We do not yet have explicit examples and leave this as an exciting open question.

#### Results for Linear and Semidefinite Relaxations

We now move on to a discussion of our results concerning the notion of approximation resistance in the context of linear and/or semi-definite programming relaxations. A CSP instance can be formulated as an integer program and its variables may be relaxed to assume real values (in the case of an LP relaxation) or vector values (in the case of an SDP relaxation). The integrality gap of a relaxation is the maximum gap between the optimum of the integer program and the optimum of the relaxed program. An integrality gap instance is a concrete instance of a CSP whose LP/SDP optimum is high and whose integer optimum is low. Constructing such gap instances is taken as evidence that the LP/SDP based approach will not achieve a good approximation to the CSP. The LP/SDP relaxation may be ad hoc or may be obtained by systematically adding inequalities in successive rounds, each additional round yielding a potentially tighter relaxation. The latter method is referred to as an LP or SDP hierarchy, and several such hierarchies have been proposed and well-studied [12].

In this paper, we focus on one ad hoc relaxation that we call the basic relaxation and on two hierarchies, namely the mixed hierarchy and the Sherali-Adams LP hierarchy. We refer to Section 2 for their formal definitions, but provide a quick sketch here. Consider a CSP instance with a $k$-ary predicate $f$, a set of variables and a set of constraints. We think of the number of rounds $r$ as $k$ or more. The $r$-round Sherali-Adams LP is required to provide, for every set $S$ of at most $r$ variables, a local distribution over assignments to the set $S$, namely over $\{-1,1\}^{S}$. The local distributions must be consistent in the sense that for any two sets $S, T$ of size at most $r$ with $S \cap T \neq \emptyset$, the local distributions on $S$ and $T$ have the same marginals on $S \cap T$. The $r$-round mixed hierarchy, in addition, is supposed to assign unit vectors $v_i$ to variables $x_i$ such that the pairwise inner products of these vectors match the second moments of the local distributions: $\langle v_i, v_j\rangle = \mathbb{E}[x_i x_j]$ (this is a somewhat simplified view). The basic relaxation is a reduced form of the $k$-round mixed hierarchy where a local distribution over a set $S$ needs to be specified only if $S$ is the set of variables of some constraint. The only consistency requirements are that $\langle v_i, v_j\rangle$ equals the second moment of $x_i x_j$ under the local distribution on $S$ whenever variables $x_i, x_j$ appear together inside some constraint on set $S$. Finally, the objective function for all three programs is the same: the probability that an assignment sampled from the local distribution over a constraint satisfies the predicate (accounting for variable negations), averaged over all constraints.

A $(c,s)$-integrality gap for a relaxation is an instance that is at most $s$-satisfiable, but has a feasible LP/SDP solution with objective value at least $c$. A predicate is approximation resistant w.r.t. a given relaxation if the relaxation has a $(1-o(1),\ \rho(f)+o(1))$ integrality gap. The general result of Raghavendra referred to before shows that for any gap location $(c,s)$, UG-hardness is equivalent to an integrality gap for the basic relaxation. Moreover, the general results of Raghavendra and Steurer [32] and Khot and Saket [26] show that the integrality gap for the basic relaxation is equivalent to that for a super-constant number of rounds of the mixed hierarchy.

Our characterization of approximation resistance for the basic relaxation and the mixed hierarchy is the same and coincides with the one in Definition 1.1, whereas that for the Sherali-Adams LP is different and is presented below.

When $f$ belongs to the family of Definition 1.1, we construct a $(1-o(1),\ \rho(f)+o(1))$ integrality gap for the basic relaxation. From the general results [30, 32, 26] mentioned before, an integrality gap for the basic relaxation can be translated into the same gap for the mixed hierarchy and into UG-hardness. When $f$ does not belong to this family, we know that the predicate is approximable and moreover that the algorithm is a rounding of the basic relaxation. When $f$ belongs to the family of Definition 1.2, the UG-hardness as well as the integrality gap constructions can be ensured to be on $k$-partite instances, as in [5].

Finally, we focus on the characterization of approximation resistance in the Sherali-Adams LP hierarchy. Here the situation is fundamentally different at a conceptual level. The difference is illustrated by (arguably the simplest) predicate 2LIN. Goemans and Williamson show that 2LIN is approximable via an SDP relaxation, namely the basic relaxation according to our terminology. In fact the approximation is really close: on a $(1-\varepsilon)$-satisfiable instance, the relaxation finds a $\left(1-\frac{1}{\pi}\arccos(1-2\varepsilon)\right)$-satisfying assignment (which is asymptotically $1-\frac{2}{\pi}\sqrt{\varepsilon}$). It is also known that this is precisely the integrality gap as well as the UG-hardness gap [16, 25]. However, the predicate turns out to be approximation resistant in the Sherali-Adams LP hierarchy, as shown by de la Vega and Mathieu [13]. They show that the integrality gap persists for several rounds of the Sherali-Adams hierarchy, and the number of rounds is subsequently improved in [10].

Even though the approximation resistance in Sherali-Adams LP hierarchy is fundamentally different, our characterization of resistance here looks syntactically similar to the ones before, once we ignore the second moments (which are not available in the LP case).

###### Definition 1.3

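Consider the family of all predicates $f$ (of all arities $k$) for which there is a probability measure $\Lambda^{*}$ on $\mathcal{C}^{*}(f)$ such that for every $t \in [k]$, the signed measure

$$\Lambda^{*,(t)} := \mathbb{E}_{|S|=t}\ \mathbb{E}_{\pi:[t]\to[t]}\ \mathbb{E}_{b\in\{-1,1\}^{t}}\left[\left(\prod_{i=1}^{t} b_i\right)\cdot \hat{f}(S)\cdot \Lambda^{*}_{S,\pi,b}\right] \tag{1.3}$$

vanishes identically. Here $\mathcal{C}^{*}(f)$ is the projection of the polytope $\mathcal{C}(f)$ to the first $k$ co-ordinates corresponding to the first moments, and $\Lambda^{*}_{S,\pi,b}$ are as earlier, but for the projected polytope $\mathcal{C}^{*}(f)$.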

We show that if $f$ belongs to the family of Definition 1.3, then there is a $(1-o(1),\ \rho(f)+o(1))$ integrality gap for a super-constant number of rounds of the Sherali-Adams hierarchy. Otherwise there is an approximation given by $k$ rounds of the hierarchy. For the class of symmetric $k$-ary predicates, our characterization takes a simple form: a symmetric $f$ belongs to the family if and only if there is a pair of inputs satisfying explicit conditions.

#### Equivalence of Approximation Resistance and Strong Approximation Resistance

In a previous version of this paper [27], we obtained a characterization of a related notion that we called strong approximation resistance. In this notion, an algorithm is considered non-trivial if on a near-satisfiable instance of CSP$(f)$, it finds an assignment such that the fraction of constraints satisfied is outside the range $[\rho(f)-\Omega(1),\ \rho(f)+\Omega(1)]$. If such an efficient algorithm exists, the predicate is called weakly approximable, and strongly approximation resistant otherwise.

We are now able to show that our characterization for strong approximation resistance applies to approximation resistance as well, i.e. the two notions of resistance are equivalent. In other words, every predicate is either approximable (as opposed to weakly approximable) or strongly approximation resistant (as opposed to just approximation resistant), i.e. the best of both worlds. To emphasize further, the equivalence means that for an approximation resistant predicate $f$, there is a reduction from the Unique Games problem to GapCSP$(f)_{1-o(1),\,\rho(f)+o(1)}$ such that in the NO case, the instance has the additional property that every assignment to its variables satisfies between a $\rho(f)-o(1)$ and a $\rho(f)+o(1)$ fraction of the constraints, i.e. not more and not less than the threshold $\rho(f)$ by a non-negligible amount. Similarly, all the LP/SDP integrality gap instances also share this additional property.

We show that a predicate is either approximable or there exists a vanishing measure $\Lambda$. In the latter case, the measure is used to construct a $(1-o(1),\ \rho(f)+o(1))$ integrality gap for the mixed hierarchy, which then implies approximation resistance via Raghavendra’s result. Though we do not present it here, it is also possible to use $\Lambda$ to directly construct a dictatorship test and prove approximation resistance (i.e. without going through the integrality gap construction). In either case, the fact that $\Lambda$ is a vanishing measure ensures that in the soundness analysis, the Fourier terms that are potentially responsible for deviating from the threshold $\rho(f)$ precisely cancel each other out (up to $o(1)$ error). Thus, the integrality gap instance as well as the NO instance in the hardness reduction have the property that for every assignment, the fraction of satisfied constraints cannot even deviate from $\rho(f)$ (and therefore cannot exceed it either) by a non-negligible amount, yielding strong approximation resistance.

The notion of strong approximation resistance has been considered in the literature before, albeit implicitly. In fact, almost all known proofs of approximation resistance actually show strong resistance, either implicitly or explicitly, or by a minor modification or possibly by switching from NP-hardness to UG-hardness. This is because the soundness analysis of these constructions shows that the Fourier terms that are potentially responsible for deviating from the threshold $\rho(f)$ are all bounded by $o(1)$ in magnitude (our analysis has the novelty that these Fourier terms cancel each other out, without each term necessarily being $o(1)$ in magnitude). The only possible exception we are aware of is the predicate in [4] (Example 8.7 therein) that we mentioned before. The predicate is shown to be approximation resistant therein, and now our result implies that it is also strongly approximation resistant.

We will avoid referring to the notion of strong approximation resistance henceforth and refer an interested reader to the previous version of this paper [27] for relevant definitions.

### 1.1 Overview of the Proof Techniques

In this section we give an informal overview of the main ideas and techniques used in our results. A significant ingredient in our results is the Von Neumann min-max theorem for zero-sum games, which was also used by O’Donnell and Wu [29] towards characterizing the approximability curve for the MAX-CUT problem. The game-theoretic and measure-theoretic framework we develop is likely to find other applications. For instance, it is possible to give an exposition of Raghavendra’s result in our framework, providing a clear explanation in terms of the duality of the min-max theorem and (in our opinion) demystifying the result.

We first focus on the main result in the paper, namely that a predicate $f$ is approximation resistant if and only if it belongs to the family of Definition 1.1. Before we begin the overview, we briefly comment on how the characterization in Definition 1.1 comes about and how it makes sense from the perspective of both the hardness and the algorithmic side. In hindsight, the characterization in Definition 1.1, in terms of the existence of a vanishing measure $\Lambda$, is tailor-made to prove the hardness result: given a vanishing measure $\Lambda$, it is straightforward, at least at a conceptual level, to design a dictatorship test and prove approximation resistance modulo the UGC. As we said earlier, the vanishing condition precisely ensures that the Fourier terms that are potentially responsible for deviating from $\rho(f)$ exactly cancel each other out, and the novel feature here is that the Fourier terms cancel each other out without each term necessarily being $o(1)$ in magnitude. From the algorithmic side, one would want to show that if a vanishing measure does not exist, then the predicate is approximable. However, we do not actually know how to design such an algorithm directly. Instead, our entire argument runs in reverse. We propose a family of (SDP rounding) algorithms and show that either some algorithm in this family works or else there exists a vanishing measure. Given a vanishing measure, it is relatively straightforward to prove the hardness result and construct integrality gaps (which are equivalent by Raghavendra’s result), as mentioned earlier. Our argument is non-constructive on both the algorithmic and the hardness side: it yields neither an explicit algorithm nor an explicit vanishing measure.

With the benefit of hindsight, there might be an intuitive explanation why non-existence of a vanishing measure implies existence of an algorithm. Each point in the body $\mathcal{C}(f)$ corresponds to a Gaussian density function. Non-existence of a vanishing measure on $\mathcal{C}(f)$, by duality in an appropriate setting, might be interpreted as linear independence of these Gaussian density functions in the following sense: w.r.t. any probability measure on $\mathcal{C}(f)$, the integral of the Gaussian density functions (with Fourier coefficients of the predicate and sign flips thrown in so as to have both positive and negative terms), which is a signed density, is a non-zero density with a positive lower bound on its norm. The rounding algorithm is then a function that distinguishes this signed density from the zero density; the algorithm, however, has to work for all possible probability measures simultaneously. One might be able to translate this intuition into a formal proof in an appropriate setting, but we have not yet investigated this possibility.

We now begin our overview. We make several simplifying assumptions and use informal, mathematically imprecise language as we proceed (for the sake of a cleaner overview only). Let $f:\{-1,1\}^k\to\{0,1\}$ be the predicate under consideration with $\rho(f)=\frac{|f^{-1}(1)|}{2^k}$. We make the simplifying assumption that the predicate is even, i.e. $f(-x)=f(x)$. This allows us to assume that the first moments (i.e. “biases”) are all zero for any distribution supported on $f^{-1}(1)$ and can be safely ignored.$^{8}$ Therefore we let the polytope $\mathcal{C}(f)$ be the set of all $\binom{k}{2}$-dimensional second moment vectors over all distributions supported on $f^{-1}(1)$. Our main concern is whether there is an efficient algorithm for CSP$(f)$ that achieves a non-trivial approximation, i.e. on a $(1-o(1))$-satisfiable instance obtains an assignment such that the fraction of satisfied constraints is at least $\rho(f)+\Omega(1)$. We make the simplifying assumption that the CSP instance is in fact perfectly satisfiable. This implies that the basic relaxation yields, for every constraint $C$ that depends on variables, say, $x_1,\ldots,x_k$, a distribution $\nu(C)$ over the set of satisfying assignments $f^{-1}(1)$ and unit vectors $v_1,\ldots,v_k$ such that $\langle v_i,v_j\rangle=\mathbb{E}_{z\sim\nu(C)}[z_i z_j]$. The corresponding vector $\zeta(C)$ of second moments (which equal $\langle v_i,v_j\rangle$) is then a $\binom{k}{2}$-dimensional point of $\mathcal{C}(f)$. The uniform distribution over the vectors $\zeta(C)$ over all constraints is then a probability measure $\lambda$ on $\mathcal{C}(f)$. We regard the measure $\lambda$ as essentially representing the given CSP instance (a priori, we seem to be losing information by ignoring the topology of the instance, but as we will see this does not matter).

Note that in the relaxed solution, the vector assignment is global in the sense that the vector assigned to each CSP variable is fixed, independent of the constraint in which the variable participates, whereas the distribution $\nu(C)$ is local in the sense that it depends on the specific constraint $C$.

Our main idea, as hinted at before, is to propose a family of algorithms based on “$d$-dimensional roundings” of the SDP solution for $d=k+1$ and to show that either one such algorithm achieves a non-trivial approximation or else the polytope $\mathcal{C}(f)$ supports a probability measure $\Lambda$ as in Definition 1.1 (note again that we are ignoring the first moments). In the latter case, the existence and symmetry of $\Lambda$ lead naturally to a $(1-o(1),\ \rho(f)+o(1))$ integrality gap for the basic relaxation (and therefore the mixed hierarchy) and a UG-hardness result for GapCSP$(f)_{1-o(1),\,\rho(f)+o(1)}$, showing that the predicate is approximation resistant.

The proposed family of $d$-dimensional roundings is easy to describe: any function $\psi:\mathbb{R}^d\to\{-1,1\}$ serves as a candidate rounding algorithm, where the SDP vectors $v_i$ are projected onto a random $d$-dimensional subspace inducing points $y_i\in\mathbb{R}^d$ and then the variable $x_i$ is assigned the boolean value $\psi(y_i)$. From the algorithmic viewpoint, one seeks a rounding function $\psi$ (more generally a distribution over $\psi$) such that its “performance” on every instance $\lambda$$^{9}$ significantly exceeds $\rho(f)$ (on average, if a distribution over $\psi$ is used). From the hardness viewpoint, a natural goal then would be to come up with a “hard-to-round measure” $\Lambda$ on $\mathcal{C}(f)$ such that the “performance” of every rounding function $\psi$ is at most $\rho(f)$.

These considerations lead naturally to a two-player zero-sum game between Harry, the “hardness player”, and Alice, the “algorithm player” (we view Harry as the row player and Alice as the column player). The pure strategies of Harry are the probability measures $\lambda$ on $\mathcal{C}(f)$ to be rounded, and the pure strategies of Alice are the rounding functions $\psi:\mathbb{R}^d\to\{-1,1\}$. The payoff to Alice when the two players play $(\lambda,\psi)$ respectively is the “advantage over $\rho(f)$” achieved by rounding $\lambda$ using $\psi$. More precisely, consider the scenario where the set of local distributions on CSP constraints is represented by the measure $\lambda$. The local distribution on a randomly selected constraint is a sample $\zeta\sim\lambda$ along with vectors $v_1,\ldots,v_k$ whose pairwise inner products match $\zeta$. During the rounding process, the vectors are projected onto a random $d$-dimensional subspace, generating a sequence of points $y_1,\ldots,y_k$ that are standard $d$-dimensional Gaussians with correlations $\zeta$. The CSP variables $x_1,\ldots,x_k$ are then rounded to the boolean values $\psi(y_1),\ldots,\psi(y_k)$. Whether these values satisfy the constraint or not is determined by plugging them into the Fourier representation of the predicate $f$. The “advantage over $\rho(f)$” is precisely this Fourier expression without the constant term (which is $\rho(f)$). Given this intuition, we define the payoff to Alice as the expression:

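$$\mathrm{PayOff}(\lambda,\psi) := \mathbb{E}_{\zeta\sim\lambda}\ \mathbb{E}_{y_1,\ldots,y_k\sim N_d(\zeta)}\left[\sum_{S\neq\emptyset}\hat{f}(S)\cdot\prod_{i\in S}\psi(y_i)\right], \tag{1.4}$$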

where denotes a sequence of standard -dimensional Gaussians with correlations . We apply Von Neumann’s min-max theorem and conclude that there exists a number , namely the “value” of the game, a mixed equilibrium strategy (a distribution over ) for Alice and an equilibrium strategy (a pure one as we will observe!) for Harry. Actually Von Neumann’s theorem applies only to games where the sets of strategies for both players are finite, but we ignore this issue for now. Depending on whether the value of the game is strictly positive or zero (it is non-negative since Alice can always choose a random function and achieve a zero payoff), we get the “dichotomy” that the predicate is approximable or approximation resistant (modulo UGC).
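The payoff in Equation (1.4) can be estimated numerically for small cases. The following is only a rough sketch under simplifying assumptions that are not part of the paper: Harry's measure is a point mass on a single positive definite correlation matrix `corr`, the first moments are ignored, and all function names (`fourier_coefficients`, `advantage`, `estimate_payoff`, and so on) are hypothetical names chosen for illustration.

```python
import itertools
import math
import random


def fourier_coefficients(f, k):
    """Return {S: f_hat(S)} for f: {-1,1}^k -> {0,1}; S is a tuple of indices."""
    points = list(itertools.product((-1, 1), repeat=k))
    coeffs = {}
    for r in range(k + 1):
        for S in itertools.combinations(range(k), r):
            total = sum(f(x) * math.prod(x[i] for i in S) for x in points)
            coeffs[S] = total / len(points)
    return coeffs


def advantage(coeffs, values):
    """Fourier expression without the constant term, evaluated at `values`."""
    return sum(c * math.prod(values[i] for i in S)
               for S, c in coeffs.items() if S)


def cholesky(corr):
    """Lower-triangular L with L L^T = corr (assumes corr is positive definite)."""
    k = len(corr)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(corr[i][i] - s)
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]
    return L


def sample_correlated_gaussians(corr, d, rng):
    """Return k points in R^d whose coordinates have pairwise correlations `corr`."""
    k = len(corr)
    L = cholesky(corr)
    g = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    return [[sum(L[i][m] * g[m][c] for m in range(i + 1)) for c in range(d)]
            for i in range(k)]


def estimate_payoff(coeffs, corr, psi, d, samples=2000, seed=0):
    """Monte Carlo estimate of the payoff for a point-mass measure on `corr`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        ys = sample_correlated_gaussians(corr, d, rng)
        total += advantage(coeffs, [psi(y) for y in ys])
    return total / samples
```

For instance, one could pass the coefficients of the PARITY predicate on three variables together with a rounding function such as `lambda y: 1 if y[0] >= 0 else -1`. An odd rounding function and a general measure over correlation vectors, as used in the actual argument, would require extending this sketch.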

The conclusion when is easy: in this case Alice has a mixed strategy such that her payoff (expected over ) is at least for every pure strategy of Harry. This is the same as saying that if a rounding function is sampled and then used to round the relaxed solution, it achieves an advantage over for every CSP instance .

The conclusion when is more subtle: in this case in general Harry has a mixed strategy, say , such that for every pure strategy of Alice, her expected payoff (expected over ) is at most zero. We observe that Harry may replace his mixed strategy by a pure strategy . Noting that is a distribution over measures , we let be the single averaged measure informally written as . Thus the expectations over and may be merged into the expectation over . We may therefore conclude that for the measure over , for every :

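$$\mathbb{E}_{\zeta\sim\Lambda}\ \mathbb{E}_{y_1,\ldots,y_k\sim N_d(\zeta)}\left[\sum_{S\neq\emptyset}\hat{f}(S)\cdot\prod_{i\in S}\psi(y_i)\right] \leq 0. \tag{1.5}$$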

Now we view this expression as a multi-linear polynomial in (uncountable number of) variables . We observe that if a multi-linear polynomial in finitely many -valued variables with no constant term is upper bounded by zero, then it must be identically zero (see Lemma 2.11). We pretend, for now, that the same conclusion holds for the “polynomial” above and hence that it is identically zero and we may equate every “coefficient” of this polynomial to zero.
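One standard way to see the finite statement is an averaging argument over the points of the hypercube. The sketch below assumes the variables may in particular take the values ±1; the symbols p, c_S and n are notation introduced here for illustration and need not match the proof of Lemma 2.11.

```latex
\begin{align*}
p(x) &= \sum_{\emptyset \neq S \subseteq [n]} c_S \prod_{i \in S} x_i,
  \qquad x \in \{-1,1\}^n, \\
\mathbb{E}_{x \in \{-1,1\}^n}\bigl[p(x)\bigr]
  &= \sum_{\emptyset \neq S \subseteq [n]} c_S \prod_{i \in S} \mathbb{E}[x_i] = 0, \\
p(x) \le 0 \ \ \forall x
  &\;\Longrightarrow\; p(x) = 0 \ \ \forall x
  \;\Longrightarrow\; c_S = 2^{-n} \sum_{x} p(x) \prod_{i \in S} x_i = 0 \ \ \forall S.
\end{align*}
```

The first implication holds because a non-positive function with zero average must vanish at every point; the second uses the orthogonality of the monomials under the uniform distribution on the hypercube.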

Fix any . For every , we are interested in the coefficient of the monomial . Firstly, this coefficient can arise from precisely the sets with . Secondly, for a fixed set , the coefficient is really the joint density of standard -dimensional Gaussians with correlations at the sequence , where is the same as restricted to indices in . Thirdly, for any permutation , we must consider all sequences and add up their coefficients (i.e. Gaussian densities) since they all correspond to the same monomial . Finally, we did not mention this so far, but we need to allow only odd rounding functions , i.e. , to account for the issue of variable negations in CSPs. This has the effect that the monomials are the same as for a choice of signs , and hence their coefficients (i.e. Gaussian densities) must be added up together. With all these considerations, the coefficient of the monomial can be written as:

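$$\mathbb{E}_{\zeta\sim\Lambda}\left[\sum_{S,\,|S|=t}\ \sum_{\pi:[t]\to[t]}\ \sum_{b\in\{-1,1\}^t}\hat{f}(S)\cdot\left(\prod_{i=1}^{t}b_i\right)\gamma_{t,d}\bigl((y_1,\ldots,y_t),\zeta_{S,\pi,b}\bigr)\right].$$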

Here is the sequence of correlations between the indices in after accounting for the permutation of indices according to and the sign-flips according to . Also is the joint density of standard -dimensional Gaussians with correlations . Defining the “signed measure” as in Equation (1.1), the conclusion that the above coefficient is zero (for every ), can be written as:

$$\forall\, y_1,\ldots,y_t\in\mathbb{R}^d,\qquad \int\gamma_{t,d}\bigl((y_1,\ldots,y_t),\xi\bigr)\,d\Lambda^{(t)}(\xi)=0.$$

In words, w.r.t. the signed measure on (corresponding to all possible correlation vectors between standard -dimensional Gaussians), the integral of every function vanishes (there is one such function for every fixed choice of ). The class of these functions is rich enough that, after jumping through several hoops, we are able to conclude that the signed measure itself must identically vanish.

This proves the existence of the measure as in Definition 1.1. After this, the construction of the integrality gap for the CSP is obtained by generalizing the construction for MAX-CUT due to Feige and Schechtman [16]. We describe the construction in the continuous setting and ignore the discretization step here. The variables in the CSP instance correspond to points in for a high enough dimension and the variables for and are designated as negations of each other. The constraints of the CSP are defined by sampling and then sampling Gaussian points with correlations and placing a constraint on these variables. For the completeness part, one observes that for large the space with the Gaussian measure is (up to errors) the same as the unit sphere for our purposes and we may assume that all the CSP variables lie on the unit sphere. Each point on the sphere is assigned a vector that is itself and for every constraint, the local distribution equals if is used towards that constraint. For the soundness part, an assignment to the CSP corresponds to a function and the “advantage over ” is precisely the expression on the l.h.s. of Equation (1.5), if were chosen from instead of ( therein). The symmetry properties of (i.e. that the signed measure vanishes for every ) ensure that this expression vanishes identically and hence no CSP assignment can exceed or even deviate from . We would like to emphasize here that the existence of was deduced only assuming that no -dimensional rounding exceeds , but once the existence of is established, it automatically implies that no higher dimensional rounding can deviate from .

Once the integrality gap is established, the UG-hardness of GapCSP follows automatically from the general result of Raghavendra [30], and the same integrality gap for a super-constant number of rounds of the mixed hierarchy follows automatically from the general results of Raghavendra and Steurer [32], and Khot and Saket [26].

As we said, this is a simplified and informal view and we actually need to work around all the simplifying assumptions we made, formalize all the arguments, and address many issues that we swept under the carpet, e.g. setting and the reason say does not work, handling the first moments, handling the possibility that a Gaussian density is degenerate, etc. Also, we cannot apply Von Neumann’s min-max theorem to infinite games. In principle, one might be able to use min-max theorems for infinite games such as Glicksberg’s theorem, but then one has to ensure that the strategy spaces are compact. Instead, we find it easier to work with a sequence of finite approximations to the infinite game and then use limiting arguments everywhere (this is easier said than done and this is where much of the work lies).

Another tricky issue is to ensure that the polynomial, obtained as a discretized finite analogue of the expression (1.5), stays multi-linear. We ensure this by modifying the function so as to delete the non-multi-linear terms from the very start. In general, it seems difficult to argue that the norm on the terms so deleted is negligible compared to the norm on the linear combination of the remaining terms (which might suffer heavily due to cancellations). We observe however that the norm on the deleted terms needs to be negligible only compared to the value of the game in the case and this is indeed the case when the discretization is fine enough. The reason is that deleting certain terms changes the function by a corresponding amount, but as long as this amount is negligible compared to , in the case , the algorithm player has a strategy with value at least say even w.r.t. the original payoff function and hence still gets an advantage of over .

#### Approximation Resistance for LP Hierarchies

Now we give an overview of the characterization of approximation resistance (i.e. Definition 1.3) for a super-constant number of rounds of Sherali-Adams LP. We proceed along a similar line as earlier with one difference: we work with a different body instead of .

In the LP case, the second moments are not available at all and the first moments are all one has. We will nevertheless pretend that the second moments are available by using their dummy setting. For any distribution supported on , let the vector consist of the first moments and in addition, dummy second moments corresponding to those of independent unit -norm Gaussians with the given first moments, i.e. and . The body is defined as the set of all vectors over all distributions supported on . Note that is different than the polytope and not necessarily convex (we never used convexity), but its projection onto the first co-ordinates is the same as that of , namely as in Definition 1.3.

Once the polytope is replaced by the body , our argument proceeds as before. Note that since the second moments reflect independent Gaussians, our rounding is really using only the first moments, as ought to be the case with LPs. We conclude that either the predicate is approximable or there is a probability measure on that satisfies the characterization in Definition 1.1. Projecting onto the first co-ordinates gives a measure on satisfying the characterization in Definition 1.3.

Once the existence of is established, we proceed to constructing the integrality gap in the Sherali-Adams hierarchy. This step however turns out to be more involved than before since general results as in [30, 32, 26] are not available in the LP setting. Instead, we are able to rework the MAX-CUT construction of de la Vega and Kenyon [13] for any predicate .

An intuitive way of looking at the construction is as follows. The variables of the CSP are points in the interval and the variables for and are negations of each other (called folding). Constraints are defined by sampling and then placing the constraint on variables . The local distribution for this constraint is such that . The LP-bias of a variable is itself. The vanishing condition in Definition 1.3 implies that any (measurable) -assignment to this CSP instance satisfies exactly fraction (measure) of the constraints. This conclusion also holds for -valued assignments appropriately interpreted.

This continuous instance only has a basic LP solution, i.e. the local distributions are defined only for constraints. We now construct the actual instance as follows. We discretize the interval by picking equally spaced points with fine enough granularity (and ensuring that a point and its negation are both included and are folded). Each variable is now blown up into a block of variables for a large (so the total number of variables is ). Whenever a constraint is generated in the continuous setting by sampling , we first round to the nearest and then the constraint is actually placed on randomly chosen variables from blocks corresponding to respectively. This is the way one constraint is randomly introduced and the process is repeated independently times for . This defines the CSP instance as a -uniform hyper-graph. By deleting a small fraction of the constraints, one ensures that the hyper-graph has super-constant girth. Finally, the construction of de la Vega and Kenyon [13] is reworked to construct local distributions for all -sets of variables, i.e. for the -round Sherali-Adams LP. Our presentation is somewhat different from that in [13]: we find it easier to first construct a nearly correct LP solution and then correct it as in [32, 26].

One interesting and novel feature of our construction is how the CSP instance is constructed and how the “soundness” is proved as opposed to a standard construction of random CSPs.

A standard construction, in one step, generates a constraint by uniformly selecting a -subset of variables and then randomly selecting the polarities (i.e. whether a variable occurs in a negated form or not). This step is then repeated independently to generate constraints. Since the polarities are randomly chosen in each step, for any fixed global assignment, the probability that the assignment satisfies the constraint is precisely , and then one uses the Chernoff bound and the union bound to conclude that w.h.p. every global assignment to the instance satisfies between fraction of the constraints.
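The claim about random polarities can be checked exactly for small predicates by enumerating all sign patterns. The following is an illustrative sketch only; the helper names `density` and `prob_satisfied_random_polarities` and the example predicate are hypothetical and not taken from the paper.

```python
import itertools


def density(f, k):
    """Fraction of points of {-1,1}^k on which the predicate f equals 1."""
    points = list(itertools.product((-1, 1), repeat=k))
    return sum(f(x) for x in points) / len(points)


def prob_satisfied_random_polarities(f, values):
    """Exact probability that f(values[i] * b[i]) = 1 for uniformly random signs b."""
    k = len(values)
    patterns = list(itertools.product((-1, 1), repeat=k))
    hits = sum(f(tuple(v * b for v, b in zip(values, signs)))
               for signs in patterns)
    return hits / len(patterns)


# A deliberately asymmetric example predicate on three variables.
def example_predicate(x):
    return 1 if (x[0] == 1 and x[1] * x[2] == -1) else 0
```

For any fixed values of the variables, multiplying by a uniformly random sign pattern yields a uniformly random point of the hypercube, so the two quantities agree for every predicate and every assignment.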

In our case, the one step of generating a constraint is different. In particular, the -subset of variables chosen is not necessarily uniformly random (it depends on since ) and the polarities are not necessarily random either (they depend on signs of due to folding). However it is still true that for any fixed global assignment, the probability that the assignment satisfies the constraint is precisely (up to errors introduced by discretization)! This property is simply inherited from the continuous setting by viewing the global assignment as a function where is the average of the global values to variables in block ! This concludes our overview.

## 2 Preliminaries and Our Results

In this section, we present formal definitions and statements of our results and a preliminary background on mathematical tools used.

### 2.1 Constraint Satisfaction Problems

###### Definition 2.1

For a predicate , an instance of CSP consists of a set of variables and a set of constraints where each constraint is over a -tuple of variables and is of the form

$$C_i \equiv f\bigl(x_{i_1}\cdot b_{i_1},\ldots,x_{i_k}\cdot b_{i_k}\bigr)$$

for some . For an assignment , let denote the fraction of constraints satisfied by . The instance is called -satisfiable if there exists an assignment such that . The maximum fraction of constraints that can be simultaneously satisfied is denoted by , i.e.

$$\mathrm{OPT}(\Phi)=\max_{A:\{x_1,\ldots,x_n\}\to\{-1,1\}}\mathrm{sat}(A).$$

The density of the predicate is .

For a constraint of the above form, we use to denote the tuple of variables and to denote the tuple of bits . We then write the constraint as . We also denote by the set of indices of the variables participating in the constraint .
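The quantities in Definition 2.1 can be computed by brute force on tiny instances. The sketch below is illustrative only: it represents each constraint as a pair `(indices, signs)`, and the function names `sat`, `opt` and `density` as well as the example predicate `or3` (which assumes, as a convention chosen here, that the value -1 stands for "true") are hypothetical choices rather than the paper's notation.

```python
import itertools


def sat(f, constraints, assignment):
    """Fraction of constraints satisfied by `assignment` (a tuple of +1/-1 values)."""
    hits = sum(f(tuple(assignment[i] * b for i, b in zip(indices, signs)))
               for indices, signs in constraints)
    return hits / len(constraints)


def opt(f, constraints, n):
    """Maximum of sat over all 2^n assignments (exponential time, tiny n only)."""
    return max(sat(f, constraints, a)
               for a in itertools.product((-1, 1), repeat=n))


def density(f, k):
    """Probability that a uniformly random point of {-1,1}^k satisfies f."""
    points = list(itertools.product((-1, 1), repeat=k))
    return sum(f(x) for x in points) / len(points)


def or3(x):
    """OR on three variables, assuming -1 encodes 'true' (a convention chosen here)."""
    return 0 if all(v == 1 for v in x) else 1
```

As a usage example, `opt(or3, [((0, 1, 2), (1, 1, 1))], 3)` evaluates a single-constraint instance, while an instance containing all eight sign patterns on the same three variables has every assignment satisfying the same fraction of constraints.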

###### Definition 2.2

A predicate is called approximable if there exists a constant and a polynomial time algorithm, possibly randomized, that given an -satisfiable instance of CSP(), outputs an assignment such that . Here the expectation is over the randomness used by the algorithm.

Towards defining the notion of approximation resistance, it is convenient to define the gap version of the CSP. Though the gap version can be defined w.r.t. any gap location, we do so only for the location that is of interest to us, namely versus . We say that a decision problem is UG-hard if there is a polynomial time reduction from the Unique Games Problem [24] to the problem under consideration (we will not be directly concerned with the Unique Games Problem and the Conjecture; hence their discussion is deferred to the end of the preliminaries section).

###### Definition 2.3

Let be a constant.

Let GapCSP denote the promise version of CSP where the given instance is promised to have either  or  . The predicate is called approximation resistant if for every , GapCSP is UG-hard.

### 2.2 The LP and SDP Relaxations for Constraint Satisfaction Problems

Below we present three LP and SDP relaxations for the problem that are relevant in this paper: the Sherali-Adams LP relaxation, mixed LP/SDP relaxation and finally the basic relaxation.

We start with the -round Sherali-Adams relaxation. The intuition behind it is the following. Note that an integer solution to the problem can be given by an assignment . Using this, we can define -valued variables for each and , with the intended solution if and 0 otherwise. We also introduce a variable , which equals 1. We relax the integer program and allow variables to take real values in . Now the variables give a probability distribution over assignments to . We can enforce consistency between these local distributions by requiring that for , the distribution over assignments to , when marginalized to , is precisely the distribution over assignments to . The relaxation is shown in Figure 1.
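The consistency requirement between local distributions amounts to a marginalization check. The following minimal sketch assumes a local distribution is stored as a dictionary from assignment tuples (with values in {-1, 1}, an encoding chosen here) to probabilities; the names `marginalize` and `consistent` are hypothetical and the code is not a rendering of Figure 1.

```python
def marginalize(dist, S, T):
    """Marginal on the variable tuple T of a distribution `dist` over assignments to S.

    `S` and `T` are tuples of variable indices with every element of T in S.
    """
    positions = [S.index(i) for i in T]
    out = {}
    for alpha, p in dist.items():
        key = tuple(alpha[j] for j in positions)
        out[key] = out.get(key, 0.0) + p
    return out


def consistent(dist_S, S, dist_T, T, tol=1e-9):
    """Check that dist_S marginalized to T agrees with dist_T up to `tol`."""
    m = marginalize(dist_S, S, T)
    keys = set(m) | set(dist_T)
    return all(abs(m.get(a, 0.0) - dist_T.get(a, 0.0)) <= tol for a in keys)
```

For example, a distribution on two variables that puts probability 1/2 on each of the two assignments in which both variables agree is consistent with the uniform distribution on either single variable.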

We can further strengthen the integer program by adding the quadratic constraints

$$x_{(\{i_1,i_2\},(b_1,b_2))} = x_{(\{i_1\},b_1)}\cdot x_{(\{i_2\},b_2)}.$$

As solving quadratic programs is NP-hard, we then relax these quadratic constraints to the existence of vectors and a unit vector , and impose the above constraints on inner products of the corresponding vectors. Adding these SDP variables and constraints to the -round Sherali-Adams LP as above yields the -round mixed relaxation as in Figure 2.

Finally, the basic relaxation is a reduced form of the above mixed relaxation where only those variables are included for which is the set of CSP variables for some constraint . The consistency constraints between pairs of vectors are included only for those pairs that occur inside some constraint. The relaxation (after a minor rewriting) is shown in Figure 3.

For an LP/SDP relaxation of CSP, and for a given instance of the problem, we denote by the LP/SDP (fractional) optimum. For the particular instance , the integrality gap is defined as . The integrality gap of the relaxation is the supremum of integrality gaps over all instances. The integrality gap thus defined is in terms of a ratio whereas we are concerned with the specific gap location versus .

###### Definition 2.4

Let be a constant.

A relaxation is said to have a -integrality gap if there exists a CSP instance such that and .

We will use known results showing that the integrality gap for the basic relaxation as in Figure 3 implies a UG-hardness result as well as integrality gap for the mixed relaxation as in Figure 2 for a super-constant number of rounds, while essentially preserving the gap. The first implication is by Raghavendra [30] and the second by Raghavendra and Steurer [32] and Khot and Saket [26]. We state these results in a form suitable for our purpose.

###### Theorem 2.5

[30] Let be an arbitrarily small constant.

If the basic relaxation as in Figure 3 has a -integrality gap, then GapCSP is UG-hard.

[32, 26] Let