# Approximate Counting via Correlation Decay in Spin Systems

Liang Li†
Peking University
liang.li@pku.edu.cn
Pinyan Lu
Microsoft Research Asia
pinyanl@microsoft.com
Yitong Yin†‡
Nanjing University
yinyt@nju.edu.cn
† This work was done when these authors visited Microsoft Research Asia.
‡ Supported by the National Science Foundation of China under Grant No. 61003023 and No. 61021062.
###### Abstract

We give the first deterministic fully polynomial-time approximation scheme (FPTAS) for computing the partition function of a two-state spin system on an arbitrary graph, when the parameters of the system satisfy the uniqueness condition on infinite regular trees. This condition is of physical significance and is believed to be the right boundary between approximable and inapproximable.

The FPTAS is based on the correlation decay technique introduced by Bandyopadhyay and Gamarnik [SODA 06] and Weitz [STOC 06]. The classic correlation decay is defined with respect to graph distance. Although this definition has natural physical meanings, it does not directly support an FPTAS for systems on arbitrary graphs, because for graphs with unbounded degrees, the local computation that provides a desirable precision by correlation decay may take super-polynomial time. We introduce a notion of computationally efficient correlation decay, in which the correlation decay is measured in a refined metric instead of graph distance. We use a potential method to analyze the amortized behavior of this correlation decay and establish a correlation decay that guarantees an inverse-polynomial precision by polynomial-time local computation. This gives us an FPTAS for spin systems on arbitrary graphs. This new notion of correlation decay properly reflects the algorithmic aspect of the spin systems, and may be used for designing FPTAS for other counting problems.

## 1 Introduction

Spin systems are well studied in Statistical Physics. We focus on two-state spin systems. An instance of a spin system is a graph $G=(V,E)$. A configuration $\sigma: V\to\{0,1\}$ assigns every vertex one of the two states. We shall refer to the two states as blue and green. The contributions of local interactions between adjacent vertices are quantified by a matrix

$$A=\begin{bmatrix}A_{0,0} & A_{0,1}\\ A_{1,0} & A_{1,1}\end{bmatrix}=\begin{bmatrix}\beta & 1\\ 1 & \gamma\end{bmatrix},$$

where $\beta,\gamma\ge 0$. The weight of an assignment is the product of the contributions of all local interactions, and the partition function of a system is the sum of the weights over all possible assignments. Formally,

$$Z_A(G)=\sum_{\sigma\in 2^V}\ \prod_{(u,v)\in E}A_{\sigma(u),\sigma(v)}.$$
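As an illustration of the definition, the following minimal sketch evaluates the partition function by brute force over all $2^{|V|}$ configurations. The function name `partition_function`, the vertex labelling `0..n-1`, and the encoding of blue as state 0 and green as state 1 are assumptions made for this example only; the enumeration is exponential and is meant for checking small instances, not as an algorithm from the paper.

```python
from itertools import product

def partition_function(n, edges, beta, gamma):
    """Brute-force Z_A(G) for vertices 0..n-1; state 0 is blue, state 1 is green."""
    A = [[beta, 1.0], [1.0, gamma]]
    total = 0.0
    for sigma in product((0, 1), repeat=n):   # all 2^n configurations
        weight = 1.0
        for u, v in edges:                    # product of edge contributions
            weight *= A[sigma[u]][sigma[v]]
        total += weight
    return total

# Hypothetical example: a 4-cycle with beta = 0 and gamma = 1.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
z_cycle4 = partition_function(4, cycle4, beta=0.0, gamma=1.0)
```

With $\beta=0$ and $\gamma=1$ no two adjacent vertices may both be blue, so the blue vertices form an independent set and `z_cycle4` counts the independent sets of the 4-cycle (the empty set, four singletons, and two opposite pairs).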

Although it originated in Statistical Physics, the spin system is also accepted in Computer Science as a framework for counting problems. Considering the two very well studied frameworks, the weighted Constraint Satisfaction Problems (#CSP) [14, 7, 26, 6, 12, 19, 11] and Graph Homomorphisms [17, 8, 28, 36, 10, 9], the two-state spin systems can be viewed as the most basic setting in these frameworks: a Boolean #CSP problem with one symmetric binary relation, or Graph Homomorphisms to a graph with two vertices. Many natural combinatorial problems can be formulated as two-state spin systems. For example, with $\beta=0$ and $\gamma=1$, $Z_A(G)$ is the number of independent sets (or vertex covers) of the graph $G$.

Given a matrix $A$, it is a computational problem to compute $Z_A(G)$ where the graph $G$ is given as input. We want to characterize the computational complexity of computing $Z_A(G)$ in terms of $\beta$ and $\gamma$. For exact computation of $Z_A(G)$, polynomial time algorithms are known only for the very restricted settings that $\beta\gamma=1$ or $\beta=\gamma=0$, and for all other settings the problem is proved to be #P-hard [8]. We consider the approximation of $Z_A(G)$, with the fully polynomial-time approximation scheme (FPTAS) and its randomized relaxation, the fully polynomial-time randomized approximation scheme (FPRAS).

In a seminal paper [48], Jerrum and Sinclair gave an FPRAS when $\beta=\gamma>1$, which was further extended to the entire region $\beta\gamma>1$ [41]. For $0\le\beta,\gamma\le 1$ except that $\beta\gamma=1$ or $\beta=\gamma=0$, Goldberg, Jerrum and Paterson prove that the problem does not admit an FPRAS unless NP = RP [41]. For the other values of the parameters, namely $0\le\beta<1<\gamma$ with $\beta\gamma<1$, or symmetrically $0\le\gamma<1<\beta$ with $\beta\gamma<1$, the approximability of $Z_A(G)$ is not very well understood. It was shown in [41] that by coupling a simple heat-bath random walk, there exists an additional region of $\beta$ and $\gamma$ which admits an FPRAS. The true characterization of approximability is still left open.

Within this unknown region, there lies a critical curve with physical significance, called the uniqueness threshold. The phase transition of the Gibbs measure occurs at this threshold curve. Such statistical physics phase transitions are believed to coincide with the transitions of computational complexity. However, there are only very few examples where the connection is rigorously proved. One example is the hardcore (counting independent sets) model. It was conjectured in [56] by Mossel, Weitz and Wormald, and settled in a line of works by Dyer, Frieze and Jerrum [23], Weitz [61], Sly [58], and very recently Galanis, Ge, Štefankovič, Vigoda and Yang [31] that in the hardcore model the uniqueness threshold essentially characterizes the approximability of the partition function. It would be very interesting to observe a similar transition in spin systems.

### 1.1 Main results

We extend the approximable region (in terms of $\beta$ and $\gamma$) of $Z_A(G)$ to the uniqueness threshold in two-state spin systems, which is believed to be the right boundary between approximable and inapproximable. Specifically, we formulate a criterion for $\beta$ and $\gamma$ such that there is a unique Gibbs measure on all infinite regular trees (technically, there is a small integrality gap caused by the continuous generalization of the condition; the formal statement is given in the following section), and prove that there is an FPTAS for computing $Z_A(G)$ when this uniqueness condition is satisfied. This improves the approximable boundary (dashed lines in Figure 1) provided by the heat-bath random walk in [41]. Moreover, the algorithm is deterministic.

The FPTAS is based on the correlation decay technique first used in [61, 1] for approximate counting. We elaborate a bit on the ideas. A spin system induces a natural probability distribution over all configurations, called the Gibbs measure, where the probability of a configuration is proportional to its weight. Due to a standard self-reduction procedure, computing $Z_A(G)$ is reduced to computing the marginal distribution of the state of one vertex, which is made plausible by Weitz in [61] with the self-avoiding walk (SAW) tree construction. For efficiency of computation, the marginal distribution of a vertex is estimated using only a local neighborhood around the vertex. To justify the precision of the estimation, we show that far-away vertices have little influence on the marginal distribution. This is done by analyzing the rate at which the correlation between two vertices decays as the distance between them grows.

The correlation decay by itself is a phenomenon of physical significance. One of our main discoveries is that two-state spin systems on any graph have exponential correlation decay when the above uniqueness condition is satisfied.

### 1.2 Technical contributions

The technique of using correlation decay to design FPTAS for partition functions was developed in the hardcore model. We introduce several new ideas to address the challenges arising from spin systems. We believe these challenges are typical in counting problems, and the new ideas will make the correlation decay technique more applicable for approximate counting.

1. The correlation decay technique used in [61] relies on a monotonicity property specific to the hardcore model. Correlation decays in graphs are reduced via this monotonicity to the decays in infinite regular trees, while the latter have solvable phase transition thresholds. It was already observed in [61] that such monotonicity may not generally hold for other models. Indeed, it does not hold for spin systems. We develop a more general method which does not rely on monotonicity: we directly compute the correlation decay in arbitrary trees (and as a result in arbitrary graphs via the SAW tree reduction), and use the potential method to analyze the amortized behavior of correlation decay.

2. To have an FPTAS, the marginal distribution of a single vertex should be approximable up to certain precision from a local neighborhood of polynomial size. The classic correlation decay is measured with respect to graph distance. The local neighborhoods in this sense are balls in the graph metric. A SAW tree enumerates all paths originating from a vertex. For graphs of unbounded degrees, the SAW tree transformation may have the balls offering desirable precisions explode to super-polynomial sizes.

We introduce the notion of computationally efficient correlation decay. Correlation decay is now measured in a refined metric, which has the advantage that a desirable precision is achievable by a ball (in the new metric) of polynomial size even after the SAW tree transformation. We prove an exponential correlation decay in this new metric when the uniqueness condition is satisfied. As a result, we have an FPTAS for arbitrary graphs as long as the uniqueness condition holds.

### 1.3 Related works

The approximation of partition functions has been extensively studied with both positive [48, 50, 39, 18, 29, 47, 60] and negative results [38, 5, 56, 37, 13, 33, 3, 32]. Some special problems in these frameworks are well studied combinatorial problems, e.g. counting independent sets [23, 29, 53] and graph coloring [54, 47, 45, 20, 22, 43, 21, 44, 46, 30, 55, 60, 4, 40]. Some dichotomies (or trichotomies) of complexity for approximate counting CSP were also obtained [27, 24, 31, 58]. Almost all known approximate counting algorithms are based on random sampling [51, 25], usually through the famous Markov Chain Monte Carlo (MCMC) method [16, 49]. There are very few deterministic approximation algorithms for any counting problems. Some notable examples include [1, 34, 2, 42, 59].

In a very recent work [57], Sinclair, Srivastava, and Thurley give an FPTAS using correlation decay for the two-state spin systems on bounded degree graphs. They allow the two-state spin systems to have an external field, and the uniqueness thresholds they used are defined with respect to specific maximum degrees.

## 2 Definitions and Statements of Results

A spin system is described by a graph $G=(V,E)$. A configuration of the system is one of the $2^{|V|}$ possible assignments $\sigma: V\to\{0,1\}$ of states to vertices. We also use two colors blue and green to denote these two states. Let $A=\begin{bmatrix}\beta & 1\\ 1 & \gamma\end{bmatrix}$, where $\beta,\gamma\ge 0$. The Gibbs measure is a distribution over all configurations defined by

$$\mu(\sigma)=\frac{1}{Z_A(G)}\prod_{(u,v)\in E}A_{\sigma(u),\sigma(v)}.$$

The normalization factor $Z_A(G)$ is called the partition function.

From this distribution, we can define the marginal probability $p_v$ of $v$ to be colored blue. Let $\sigma_\Lambda$ be a configuration defined on vertices in $\Lambda\subset V$. We call vertices in $\Lambda$ fixed vertices, and vertices in $V\setminus\Lambda$ free vertices. We use $p_v^{\sigma_\Lambda}$ to denote the marginal probability of $v$ to be colored blue conditioned on the configuration of $\Lambda$ being fixed as $\sigma_\Lambda$.

###### Definition 1

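A spin system on a family of graphs is said to have exponential correlation decay if for any graph $G=(V,E)$ in the family, any $v\in V$ and $\sigma_\Lambda,\tau_\Lambda\in\{0,1\}^\Lambda$,

$$\left|p_v^{\sigma_\Lambda}-p_v^{\tau_\Lambda}\right|\le\exp\left(-\Omega(\mathrm{dist}(v,\Delta))\right),$$

where $\Delta\subseteq\Lambda$ is the subset on which $\sigma_\Lambda$ and $\tau_\Lambda$ differ, and $\mathrm{dist}(v,\Delta)$ is the shortest distance from $v$ to any vertex in $\Delta$.

This definition is equivalent to the “strong spatial mixing” in [61] with an exponential rate. It is stronger than the standard notion of exponential correlation decay in Statistical Physics [15], where the decay is measured with respect to $\mathrm{dist}(v,\Lambda)$ instead of $\mathrm{dist}(v,\Delta)$.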

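The marginal probability in a tree can be computed by the following recursion. Let $T$ be a tree rooted at $v$. We denote by $R_T^{\sigma_\Lambda}$ the ratio of the probabilities that the root $v$ is blue and green, respectively, when imposing the condition $\sigma_\Lambda$. Formally, $R_T^{\sigma_\Lambda}=p_v^{\sigma_\Lambda}/(1-p_v^{\sigma_\Lambda})$ (when $p_v^{\sigma_\Lambda}=1$, let $R_T^{\sigma_\Lambda}=\infty$ by convention). Suppose that the root of $T$ has $d$ children. Let $T_i$ be the subtree rooted at the $i$-th child of the root. The distributions on distinct subtrees are independent. A calculation then gives that

$$R_T^{\sigma_\Lambda}=\prod_{i=1}^{d}\frac{\beta R_{T_i}^{\sigma_\Lambda}+1}{R_{T_i}^{\sigma_\Lambda}+\gamma}. \tag{1}$$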
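The recursion (1) translates directly into a recursive procedure on a rooted tree. The sketch below is one possible rendering, assuming $\gamma>0$; the tuple representation `(fixed, children)` with `fixed` in `{None, "blue", "green"}` and the function name `ratio` are hypothetical choices for this example.

```python
import math

def ratio(node, beta, gamma):
    """Ratio Pr[root blue] / Pr[root green] for node = (fixed, children).

    A vertex fixed to blue has ratio infinity, a vertex fixed to green has
    ratio 0, and a free leaf has ratio 1 (empty product). Assumes gamma > 0.
    """
    fixed, children = node
    if fixed == "blue":
        return math.inf
    if fixed == "green":
        return 0.0
    r = 1.0
    for child in children:
        rc = ratio(child, beta, gamma)
        # (beta*rc + 1) / (rc + gamma) tends to beta as rc tends to infinity.
        r *= beta if math.isinf(rc) else (beta * rc + 1.0) / (rc + gamma)
    return r

# Hypothetical example: a root with one free child and one child fixed to blue.
leaf = (None, [])
example_tree = (None, [leaf, ("blue", [])])
example_ratio = ratio(example_tree, beta=0.5, gamma=2.0)
```

In the example the free child contributes $(\beta+1)/(1+\gamma)$ and the blue child contributes $\beta$, so the root ratio is the product of the two factors.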

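It is of physical significance to study the Gibbs measures on infinite $(d+1)$-regular trees [35]. On such a tree, the recursion is of a symmetric form $f(x)=\left(\frac{\beta x+1}{x+\gamma}\right)^d$. There may be more than one Gibbs measure on infinite graphs. We say that the system has uniqueness if there is exactly one Gibbs measure. Let $\hat{x}=f(\hat{x})$ be the fixed point of $f$. It is known [52, 54] that the spin system on the infinite $(d+1)$-regular tree undergoes a phase transition at $|f'(\hat{x})|=1$, with uniqueness when $|f'(\hat{x})|<1$. This motivates the following definition:

$$\Gamma(\beta)=\inf\left\{\gamma\ge 1\ \middle|\ \forall d\ge 1,\ \frac{d(1-\beta\gamma)(\beta\hat{x}+1)^{d-1}}{(\hat{x}+\gamma)^{d+1}}\le 1\right\}.$$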

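For a fixed $\beta$, $\Gamma(\beta)$ gives the boundary such that all infinite regular trees exhibit uniqueness when $\gamma>\Gamma(\beta)$. We call $\Gamma(\beta)$ the uniqueness threshold. Indeed, for any $d$, there is a critical $\gamma(d)$ such that the infinite $(d+1)$-regular tree exhibits uniqueness when $\gamma>\gamma(d)$. Furthermore, there is a finite crucial $D$ such that $\gamma(D)=\sup_{d\ge 1}\gamma(d)=\Gamma(\beta)$. That is, the infinite $(D+1)$-regular tree has the highest uniqueness threshold among all infinite regular trees.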
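The condition inside the definition of the threshold can be checked numerically for a given $d$. The sketch below finds the fixed point of $x\mapsto\left(\frac{\beta x+1}{x+\gamma}\right)^d$ by bisection and evaluates $d(1-\beta\gamma)\hat{x}/((\beta\hat{x}+1)(\hat{x}+\gamma))$, which equals the quantity in the definition once the fixed-point equation is substituted. The function names are hypothetical, and the code assumes $\gamma>0$ and $\beta\gamma<1$ so that the map is decreasing and its fixed point lies in $[0,\gamma^{-d}]$.

```python
def fixed_point(beta, gamma, d, iters=200):
    """Bisection for the positive fixed point of f(x) = ((beta*x+1)/(x+gamma))**d.

    Assumes gamma > 0 and beta*gamma < 1, so f is decreasing with f(0) = gamma**(-d).
    """
    f = lambda x: ((beta * x + 1.0) / (x + gamma)) ** d
    lo, hi = 0.0, gamma ** (-d)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) > mid:      # fixed point lies to the right of mid
            lo = mid
        else:                 # fixed point lies at or to the left of mid
            hi = mid
    return (lo + hi) / 2.0

def derivative_magnitude(beta, gamma, d):
    """|f'(x)| at the fixed point, written via the fixed-point identity x = f(x)."""
    x = fixed_point(beta, gamma, d)
    return d * (1.0 - beta * gamma) * x / ((beta * x + 1.0) * (x + gamma))

def satisfies_condition(beta, gamma, d):
    """True if the condition in the definition of the threshold holds for this d."""
    return derivative_magnitude(beta, gamma, d) <= 1.0
```

For instance, with $\beta=0$ and $\gamma=1$ the recursion is that of the hardcore model with unit activity, and the check passes for $d=4$ but fails for $d=5$.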

We remark that for technical reasons, we treat $d$ as a real number, thus $\Gamma(\beta)$ is slightly greater than the one defined by integer $d$. An integer version of $\Gamma(\beta)$ is given in Section 6, where a slightly improved and tight analysis is given for the special case .

###### Definition 2

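A fully polynomial-time approximation scheme (FPTAS) for $Z_A(G)$ is an algorithm that, given as input an instance $G$ and an $\epsilon>0$, outputs a number $Z$ in time $\mathrm{poly}(|G|,1/\epsilon)$ such that $(1-\epsilon)Z_A(G)\le Z\le(1+\epsilon)Z_A(G)$.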

In Definition 1, the correlation decay is measured in graph distance. In order to support an FPTAS for graphs with unbounded degrees, we need to define the following refined metric.

###### Definition 3

Let $T$ be a rooted tree and $M$ be a constant. We define the $M$-based depth $\ell_M(v)$ of a vertex $v$ in $T$ recursively as follows: $\ell_M(v)=0$ if $v$ is the root of $T$; and for every child $u$ of $v$, if $v$ has $d$ children, $\ell_M(u)=\ell_M(v)+\lceil\log_M(d+1)\rceil$.

If every vertex in $T$ has $d<M$ children, $\ell_M(v)$ is precisely the depth of $v$. If there are vertices having $d\ge M$ children, we actually replace every such vertex and its children with an $M$-ary tree of depth $\lceil\log_M(d+1)\rceil$, and $\ell_M(v)$ is the depth of $v$ in this new tree.

###### Definition 4

Let $T$ be a rooted tree and $M$ be a constant. Let $B_M(L)$, called an $M$-based $L$-ball, be the set of vertices in $T$ whose $M$-based depths are no greater than $L$; and let $B_M^*(L)$, called an $M$-based $L$-closed-ball, be the set of vertices in $B_M(L)$ and all their children in $T$.

The main technical result of the paper is the following theorem which establishes an exponential correlation decay in the refined metric when the uniqueness condition holds.

###### Theorem 5 (Computationally Efficient Correlation Decay)

Let , , and . There exists a sufficiently large constant which depends only on and , such that on an arbitrary tree , for any two configurations and which differ on , if then

$$\left|R_T^{\sigma_\Lambda}-R_T^{\tau_\Lambda}\right|\le\exp(-\Omega(L)).$$

The name computationally efficient correlation decay is due to the fact that in any tree, thus an exponential decay would imply a polynomial-size giving an inverse-polynomial precision.

Theorem 5 has the following implications via Weitz’s self-avoiding tree construction [61].

###### Theorem 6

Let , , . It is of exponential correlation decay for the Gibbs measure on any graph.

###### Theorem 7

Let , , . There is an FPTAS for computing the partition function for arbitrary graph .

By symmetry, in Theorems 5, 6, and 7, the roles of $\beta$ and $\gamma$ can be switched.

In Section 3, we will show the FPTAS implied by Theorem 5, followed by a formal treatment of the uniqueness threshold in Section 4, and finally the formal proof of Theorem 5 in Section 5.

## 3 An FPTAS for the Partition Function

Assuming that Theorem 5 is true, we show that when and , there is an FPTAS for the partition function for arbitrary graph . The FPTAS is based on approximation of , the ratio between the probabilities that is blue and green, respectively, when imposing the condition .

The self-avoiding walk tree is introduced by Weitz in [61] for calculating . Given a graph , we fix an arbitrary order of vertices. Originating from any vertex , a self-avoiding walk tree, denoted , is constructed as follows. Every vertex in corresponds to one of the walks in such that , all edges are distinct and are distinct, i.e. the self-avoiding walks originating from and those appended with a vertex closing a cycle. The root of corresponds to the trivial walk . The vertex parents in , if and only if their respective walks and satisfy that for some . For a leaf of whose walk closes a cycle, supposed that the cycle is , fix the leaf to be blue if and green otherwise. When a configuration is imposed on of the original graph , for any vertex of whose corresponding walk ends at a , the color of the vertex is fixed to be . We abuse the notation and denote the resulting configuration on by as well.

This novel tree construction has the advantage that the probabilities are exactly the same in both the original spin system and the constructed tree.

###### Theorem 8 (Weitz [61])

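Let $T=T_{\mathrm{SAW}}(G,v)$. It holds that the marginal distribution of $v$ in $G$ conditioned on $\sigma_\Lambda$ is exactly the marginal distribution of the root of $T$ conditioned on the corresponding configuration on $T$.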

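Due to (1), in a tree $T$, the following recursion holds for $R_T^{\sigma_\Lambda}$:

$$R_T^{\sigma_\Lambda}=\prod_{i=1}^{d}\frac{\beta R_{T_i}^{\sigma_\Lambda}+1}{R_{T_i}^{\sigma_\Lambda}+\gamma}.$$

The base case is either when the current root $v\in\Lambda$, i.e. $v$'s color is fixed, in which case $R_T^{\sigma_\Lambda}=\infty$ or $R_T^{\sigma_\Lambda}=0$ (depending on whether $v$ is fixed to be blue or green), or when $v$ is free and has no children, in which case $R_T^{\sigma_\Lambda}=1$ (this is consistent with the recursion since the outcome of an empty product is 1 by convention).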

For $\beta\gamma<1$, the recursion is monotonically decreasing with respect to every $R_{T_i}^{\sigma_\Lambda}$. An upper (lower) bound of $R_T^{\sigma_\Lambda}$ can be computed by replacing $R_{T_i}^{\sigma_\Lambda}$ in the recursion by their respective lower (upper) bounds. Algorithm 1 computes the lower or upper bound of $R_T^{\sigma_\Lambda}$ up to vertices in the $M$-based $L$-closed-ball $B_M^*(L)$. For the vertices outside $B_M^*(L)$, it uses the trivial bounds $0\le R\le\infty$.
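Algorithm 1 itself is not reproduced in this text, so the following is only a sketch of a truncated recursion in the spirit of the description above, not the paper's pseudocode. It assumes $\gamma>0$ and $\beta\gamma<1$, assumes that the $M$-based depth of a child exceeds that of its parent by $\lceil\log_M(d+1)\rceil$ where $d$ is the number of children of the parent, and uses a hypothetical tuple representation `(fixed, children)` with `fixed` in `{None, "blue", "green"}`. All function and parameter names are illustrative.

```python
import math

def m_depth_step(d, M):
    """Smallest integer k with M**k >= d + 1, i.e. ceil(log_M(d + 1))."""
    k, power = 0, 1.0
    while power < d + 1:
        power *= M
        k += 1
    return k

def bound(node, beta, gamma, M, L, depth=0, lower=True, in_closed_ball=True):
    """Lower (lower=True) or upper bound on the ratio at the root of `node`.

    Vertices outside the closed ball get the trivial bounds 0 and infinity.
    Because the recursion is decreasing, a lower bound at a vertex is obtained
    from upper bounds at its children, so the flag flips at every level.
    """
    fixed, children = node
    if fixed == "blue":
        return math.inf
    if fixed == "green":
        return 0.0
    if not in_closed_ball:
        return 0.0 if lower else math.inf
    child_depth = depth + m_depth_step(len(children), M)
    r = 1.0
    for child in children:
        # Children of a vertex of depth <= L still belong to the closed ball.
        rc = bound(child, beta, gamma, M, L, child_depth, not lower, depth <= L)
        r *= beta if math.isinf(rc) else (beta * rc + 1.0) / (rc + gamma)
    return r

def full_tree(arity, height):
    """Hypothetical helper: a complete tree of free vertices."""
    if height == 0:
        return (None, [])
    return (None, [full_tree(arity, height - 1) for _ in range(arity)])
```

A usage pattern would be to call `bound(T, beta, gamma, M, L, lower=True)` and `bound(T, beta, gamma, M, L, lower=False)` on the same tree and report the resulting interval; when `L` exceeds the depth of every vertex, both calls return the exact value of the recursion.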

Due to the monotonicity of the recursion, it holds that

$$R(T,\sigma_\Lambda,L,0,\mathrm{true})\le R_T^{\sigma_\Lambda}\le R(T,\sigma_\Lambda,L,0,\mathrm{false}).$$

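Note that the naive lower bound $0$ (or the upper bound $\infty$) of $R$ for a vertex outside $B_M^*(L)$ can be achieved by fixing the vertex to be green (or blue). Denote by $\tau_0$ and $\tau_1$ the configurations achieving the lower and upper bounds respectively. It is easy to see that $\tau_0$ and $\tau_1$ differ only on vertices outside $B_M^*(L)$ in $T$. Then due to Theorem 5, there is a constant $\alpha<1$ such that

$$\left|R(T,\sigma_\Lambda,L,0,\mathrm{false})-R(T,\sigma_\Lambda,L,0,\mathrm{true})\right|=\left|R_T^{\tau_1}-R_T^{\tau_0}\right|=O(\alpha^L).$$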

To compute for an arbitrary graph , we first construct the of , and run Algorithm 1. Due to Theorem 8, , thus it returns and such that and . Since , we can output and so that and .

The running time of this algorithm relies on the size of in . The maximum degree of is bounded by the maximum degree of , which is trivially bounded by , thus . The running time of the algorithm is .

By setting , we can approximate within absolute error in time . For , it holds that for free thus , therefore the above procedure approximates within factor . We have an FPTAS for .

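The partition function $Z_A(G)$ can be computed from the marginal probabilities by the following standard routine. Let $v_1,\ldots,v_n$ enumerate the vertices in $G$, and let $\sigma_i$, $0\le i\le n$, be the configurations fixing the first $i$ vertices $v_1,\ldots,v_i$ to be green, where $\sigma_0$ means all vertices are free. The probability measure of $\sigma_n$ (all green) can be computed as

$$\mu(\sigma_n)=\prod_{i=1}^{n}\Pr[v_i\text{ is green}\mid v_1,\ldots,v_{i-1}\text{ are green}]=\prod_{i=1}^{n}\left(1-p_{v_i}^{\sigma_{i-1}}\right).$$

On the other hand, it is easy to see that $\mu(\sigma_n)=\gamma^{|E|}/Z_A(G)$ by the definition of $\mu$. Thus

$$Z_A(G)=\frac{\gamma^{|E|}}{\mu(\sigma_n)}=\frac{\gamma^{|E|}}{\prod_{i=1}^{n}\left(1-p_{v_i}^{\sigma_{i-1}}\right)}.$$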

Notice that . Therefore, an FPTAS for implies an FPTAS for .
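The telescoping identity used in this reduction can be checked on small graphs. The sketch below computes each conditional marginal by brute-force enumeration (standing in for the approximation procedure, which is not what is implemented here) and then recovers the partition function from the product of the conditional probabilities of being green. It assumes $\gamma>0$; the function names and the encoding of blue as 0 and green as 1 are hypothetical.

```python
from itertools import product

def weight(sigma, edges, beta, gamma):
    """Weight of a configuration; state 0 is blue, state 1 is green."""
    A = [[beta, 1.0], [1.0, gamma]]
    w = 1.0
    for u, v in edges:
        w *= A[sigma[u]][sigma[v]]
    return w

def z_from_marginals(n, edges, beta, gamma):
    """Recover Z_A(G) as gamma^|E| divided by the probability of all green."""
    configs = list(product((0, 1), repeat=n))
    mu_all_green = 1.0
    for i in range(n):
        # Condition on vertices 0..i-1 being green (state 1).
        cond = [s for s in configs if all(s[j] == 1 for j in range(i))]
        total = sum(weight(s, edges, beta, gamma) for s in cond)
        blue = sum(weight(s, edges, beta, gamma) for s in cond if s[i] == 0)
        mu_all_green *= 1.0 - blue / total   # Pr[vertex i green | earlier green]
    return gamma ** len(edges) / mu_all_green

def z_direct(n, edges, beta, gamma):
    """Direct summation over all configurations, for comparison."""
    return sum(weight(s, edges, beta, gamma) for s in product((0, 1), repeat=n))
```

On any small graph the two functions should agree up to floating-point error, which mirrors the statement that accurate marginals yield an accurate partition function.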

## 4 The uniqueness threshold

In this section, we formally define the uniqueness threshold and the critical . We also prove several propositions regarding these quantities which are useful for the analysis of the correlation decay.

###### Definition 9

Let be a fixed parameter. Suppose that and . Let be the positive solution of

$$x=\left(\frac{\beta x+1}{x+\gamma}\right)^d. \tag{2}$$

Define that . Then is the positive fixed point of . For , is continuous and strictly decreasing over , and it holds that and , thus has a unique fixed point over . Therefore, for and , is well defined and .

###### Definition 10

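Let

$$\Gamma(\beta)=\inf\left\{\gamma\ge 1\ \middle|\ \forall d\ge 1,\ \frac{d(1-\beta\gamma)\left(\beta x(\gamma,d)+1\right)^{d-1}}{\left(x(\gamma,d)+\gamma\right)^{d+1}}\le 1\right\}.$$

We write $\Gamma=\Gamma(\beta)$ for short if no ambiguity is caused.

Note that $\Gamma(\beta)$ can be equivalently defined as

$$\Gamma=\inf\left\{\gamma\ge 1\ \middle|\ \forall d\ge 1,\ \frac{d(1-\beta\gamma)\,x(\gamma,d)}{\left(\beta x(\gamma,d)+1\right)\left(x(\gamma,d)+\gamma\right)}\le 1\right\},$$

because $x(\gamma,d)$ satisfies (2).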
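The equivalence of the two forms follows from a one-line substitution, spelled out here as a sketch. Writing $x$ for the positive solution of (2),

$$\frac{(\beta x+1)^{d-1}}{(x+\gamma)^{d+1}}=\left(\frac{\beta x+1}{x+\gamma}\right)^{d}\cdot\frac{1}{(\beta x+1)(x+\gamma)}=\frac{x}{(\beta x+1)(x+\gamma)},$$

where the last step replaces the $d$-th power by $x$ using (2). Multiplying both sides by $d(1-\beta\gamma)$ gives the two conditions. Both quantities equal the magnitude of the derivative of $\left(\frac{\beta x+1}{x+\gamma}\right)^d$ at the fixed point, since

$$\frac{\mathrm{d}}{\mathrm{d}x}\left(\frac{\beta x+1}{x+\gamma}\right)^{d}=-\frac{d(1-\beta\gamma)(\beta x+1)^{d-1}}{(x+\gamma)^{d+1}}.$$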

The following lemma states that for , is well-defined and nontrivial.

###### Lemma 11

For , it holds that .

Proof: We first show that . It is sufficient to show that if then there exists a such that , where satisfies that .

By contradiction, suppose that and for all , where satisfies that . Then,

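$$1\ge\frac{d(1-\beta\gamma)x}{(\beta x+1)(x+\gamma)}=\frac{d(1-\beta\gamma)}{\beta x+\frac{\gamma}{x}+(1+\beta\gamma)}\ge\frac{d(1-\beta\gamma)}{\beta x+\frac{\gamma}{x}+2}.$$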

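Specifically, suppose that $d$ is sufficiently large so that the following hold:

$$\beta^d\exp\left(\frac{d}{d(1-\beta\gamma)-3}\right)<\frac{d(1-\beta\gamma)-3}{\beta},\qquad \exp\left(-\frac{\gamma d}{d(1-\beta\gamma)-3}\right)>\frac{\gamma}{d(1-\beta\gamma)-3}.$$
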
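• $x\ge 1$. Then $\frac{\gamma}{x}\le 1$. Thus,

$$1\ge\frac{d(1-\beta\gamma)}{\beta x+\frac{\gamma}{x}+2}\ge\frac{d(1-\beta\gamma)}{\beta x+3},$$

which implies that $x\ge\frac{d(1-\beta\gamma)-3}{\beta}$. On the other hand,

$$x=\left(\frac{\beta x+1}{x+\gamma}\right)^d\le\left(\frac{\beta x+1}{x}\right)^d\le\left(\beta+\frac{\beta}{d(1-\beta\gamma)-3}\right)^d\le\beta^d\exp\left(\frac{d}{d(1-\beta\gamma)-3}\right),$$

a contradiction for sufficiently large $d$.

• $x<1$. Then $\beta x<1$. Thus,

$$1\ge\frac{d(1-\beta\gamma)}{\beta x+\frac{\gamma}{x}+2}\ge\frac{d(1-\beta\gamma)}{\frac{\gamma}{x}+3},$$

which implies that $x\le\frac{\gamma}{d(1-\beta\gamma)-3}$. On the other hand,

$$x=\left(\frac{\beta x+1}{x+\gamma}\right)^d\ge\frac{1}{(x+1)^d}\ge\left(1+\frac{\gamma}{d(1-\beta\gamma)-3}\right)^{-d}\ge\exp\left(-\frac{\gamma d}{d(1-\beta\gamma)-3}\right)>\frac{\gamma}{d(1-\beta\gamma)-3},$$

a contradiction.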

We proceed to show that . It is sufficient to show that there exists a such that for all , , where satisfies that .

If , then . Thus,

$$\frac{d(1-\beta\gamma)x}{(\beta x+1)(x+\gamma)}=\frac{dx}{x+\gamma}\le dx\le\frac{d}{\gamma^d}\le\frac{1}{e\ln\gamma},$$

where the last inequality can be verified by taking the maximum of over . Therefore, setting , it holds that .
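The maximization behind the last inequality can be spelled out as follows; this is a routine calculus sketch that treats $d>0$ as a real variable and assumes $\gamma>1$, so that $\ln\gamma>0$. Let $g(d)=d\gamma^{-d}$. Then

$$g'(d)=\gamma^{-d}\left(1-d\ln\gamma\right),$$

which vanishes only at $d=1/\ln\gamma$, is positive before it and negative after it. Hence

$$\max_{d>0}\,d\gamma^{-d}=\frac{1}{\ln\gamma}\cdot\gamma^{-1/\ln\gamma}=\frac{1}{\ln\gamma}\cdot e^{-1}=\frac{1}{e\ln\gamma},$$

and this bound is at most $1$ exactly when $\gamma\ge e^{1/e}$. The same computation with $\alpha<1$ in place of $1/\gamma$ gives $\max_{d>0}d\alpha^d=\frac{1}{-e\ln\alpha}$.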

On the other hand, if , choosing an arbitrary constant which also satisfies that , and assuming , we have

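$$x=\left(\frac{\beta x+1}{x+\gamma}\right)^d=\left(\beta+\frac{1-\beta\gamma}{x+\gamma}\right)^d\le\left(\beta+1-\beta\gamma\right)^d\le\alpha^d.$$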

Thus,

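$$\frac{d(1-\beta\gamma)x}{(\beta x+1)(x+\gamma)}\le d(1-\beta\gamma)x\le(1-\beta\gamma)\,d\alpha^d\le\frac{1-\beta\gamma}{-e\ln\alpha},$$

where the last inequality is also proved by taking the maximum of $d\alpha^d$ over $d$. Therefore, we can choose , which indeed satisfies , to guarantee that .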

Therefore, for , there always exists a such that for all , it holds that , where satisfies that . This implies .

###### Definition 12

Let be the solution of

$$\frac{d(1-\beta\gamma)\,x(\gamma,d)}{\left(\beta x(\gamma,d)+1\right)\left(x(\gamma,d)+\gamma\right)}=1 \tag{3}$$

over , and define by convention if such solution does not exist.

The following lemma states that is well-defined and captures the uniqueness threshold for different instances of .

###### Lemma 13

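The following hold for $\gamma(d)$:

1. $\gamma(d)$ is a well-defined function for $d\ge 1$.

2. $\Gamma(\beta)=\sup_{d\ge 1}\gamma(d)$.

3. There exists a finite constant $D$ such that $\gamma(D)=\Gamma(\beta)$, and $D$ is a stationary point of $\gamma(d)$, i.e. $\left.\frac{\mathrm{d}\gamma(d)}{\mathrm{d}d}\right|_{d=D}=0$.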

Proof:

1. We first show that for any $d\ge 1$, there exists at most one $\gamma$ satisfying (3), which will imply that $\gamma(d)$ is well-defined.

Observe that for any fixed , is strictly decreasing with respect to over . By contradiction, assume that for some , is non-decreasing over . Then is strictly decreasing over , a contradiction.

Therefore, must be strictly decreasing with respect to , or otherwise would have been non-decreasing, contradicting that is strictly decreasing.

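Combining these together, we have that

$$\frac{d(1-\beta\gamma)\,x(\gamma,d)}{\left(\beta x(\gamma,d)+1\right)\left(x(\gamma,d)+\gamma\right)}=\frac{d(1-\beta\gamma)}{x(\gamma,d)+\gamma}\cdot\frac{1}{\beta+\frac{1}{x(\gamma,d)}}$$

is strictly decreasing in $\gamma$. Thus, there exists at most one $\gamma$ satisfying (3). Therefore, $\gamma(d)$ is well-defined.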

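2. We then show that $\Gamma(\beta)=\sup_{d\ge 1}\gamma(d)$. For any $d\ge 1$, let

$$\Gamma_d(\beta)=\inf\left\{\gamma\ge 1\ \middle|\ \frac{d(1-\beta\gamma)\,x(\gamma,d)}{\left(\beta x(\gamma,d)+1\right)\left(x(\gamma,d)+\gamma\right)}\le 1\right\}.$$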

Note that for any , when , , thus . In addition to that, since is strictly decreasing over , is either equal to the unique solution of over or equal to 1 if such solution does not exist. Therefore,

$$\Gamma_d(\beta)=\gamma(d).$$

Since is strictly decreasing over , for any that for all , it holds that for all , i.e. . Thus, . The other direction is universal. Therefore,

$$\Gamma(\beta)=\sup_{d\ge 1}\Gamma_d(\beta)=\sup_{d\ge 1}\gamma(d).$$

3. We show that there is a finite $D$ such that $\gamma(D)=\Gamma(\beta)$.

First notice that . By contradiction assume that . Substituting in with the positive solution of gives us a . Then by conventional definition, . From the previous analysis, we know that and due to Lemma 11, . A contradiction.

We treat as a single-variate function of . We claim that as . By contradiction, if is bounded away from 0 by a constant as , then as , a contradiction.

Therefore, when , it must hold that , because if otherwise , since as , it holds that , which approaches either or as , which contradicts that as .

We just show that for sufficiently large , which means that for these s, is defined by (3) instead of defined by the convention . Thus, for sufficiently large , and can be treated as single-variate functions of satisfying both (2) and (3).

For , it holds that , thus

$$\frac{d}{\gamma^d}\ge\frac{d(1-\beta\gamma)}{\gamma^d}\ge d(1-\beta\gamma)x=(\beta x+1)(x+\gamma)\ge\gamma,$$

where the equality holds by (3). Thus, $\gamma^{d+1}\le d$.

Recall that for all sufficiently large , thus there is a finite such that is bounded from below by a constant greater than 1. On the other hand, as . Therefore, there is a finite such that . Due to Lemma 13, this implies .

Since and is neither infinite nor equal to 1, must be a stationary point of , i.e. .

We can then define the crucial which generates the highest uniqueness threshold .

###### Definition 14

Let be the value satisfying . Let .

It is obvious that both (2) and (3) hold for , , and . Two less obvious but very useful identities are given in the following lemma.

###### Lemma 15

The following hold for  and .

 1 βΓ≤√βΓ

Proof:

1. Since , , and satisfies (3), it holds that

 D(1−βΓ) =