# Spatial Mixing and Non-local Markov chains

Antonio Blanca, School of Computer Science, Georgia Tech, Atlanta, GA 30332. Email: ablanca@cc.gatech.edu. Research supported in part by NSF grants 1420934, 1563838 and 1617306.

Pietro Caputo, Department of Mathematics, University of Roma Tre, Largo San Murialdo 1, 00146 Roma, Italy. Email: caputo@mat.uniroma3.it

Alistair Sinclair, Computer Science Division, U.C. Berkeley, Berkeley, CA 94720. Email: sinclair@cs.berkeley.edu. Research supported in part by NSF grant 1420934.

Eric Vigoda, School of Computer Science, Georgia Tech, Atlanta, GA 30332. Email: vigoda@gatech.edu. Research supported in part by NSF grants 1563838 and 1617306.
Part of this work was done at the Simons Institute for the Theory of Computing.
###### Abstract

We consider spin systems with nearest-neighbor interactions on an $n$-vertex $d$-dimensional cube of the integer lattice graph $\mathbb{Z}^d$. We study the effect that exponential decay of spin correlations with distance, specifically the strong spatial mixing condition (SSM), has on the rate of convergence to the equilibrium distribution of non-local Markov chains. We prove that SSM implies mixing of a block dynamics whose steps can be implemented efficiently. We then develop a methodology, consisting of several new comparison inequalities concerning various block dynamics, that allows us to extend this result to other non-local dynamics. As a first application of our method we prove that, if SSM holds, then the relaxation time (i.e., the inverse spectral gap) of general block dynamics is , where is the number of blocks. A second application of our technology concerns the Swendsen-Wang dynamics for the ferromagnetic Ising and Potts models. We show that SSM implies an bound for the relaxation time. As a by-product of this implication we observe that the relaxation time of the Swendsen-Wang dynamics in square boxes of is throughout the subcritical regime of the -state Potts model, for all . We also prove that for monotone spin systems SSM implies that the mixing time of systematic scan dynamics is . Systematic scan dynamics are widely employed in practice but have proved hard to analyze. Our proofs use a variety of techniques for the analysis of Markov chains including coupling, functional analysis and linear algebra.

## 1 Introduction

Spin systems are a general framework for modeling interacting systems of simple elements, and arise in a wide variety of settings including statistical physics, computer vision and machine learning (where they are often referred to as “graphical models” or “Markov random fields”). A spin system consists of a finite graph and a set  of spins; a configuration assigns a spin value to each vertex . For definiteness in this version of the paper, we focus on the classical case where is a cube in the -dimensional lattice . The probability of finding the system in a given configuration  is given by the Gibbs (or Boltzmann) distribution

$$\mu(\sigma) = \exp(-H(\sigma))/Z, \tag{1}$$

where is the normalizing factor (or “partition function”) and the Hamiltonian  contains terms that depend on the spin values at each vertex (a “vertex potential”) and at each pair of adjacent vertices (an “edge potential”). See Section 2 for a precise definition.

One of the most fundamental properties of spin systems is (strong) spatial mixing (SSM), which captures the fact that the correlation between spins at different vertices decays with the distance between them (uniformly over the size of the underlying graph $G$); again, see Section 2 for a precise definition. SSM is closely related to the classical physical concept of a phase transition, which refers to the sudden disappearance of long-range correlations as some parameter of the system (typically, the edge or vertex potential) is continuously varied. (Footnote 1: Actually, phase transitions are usually related to a weaker notion called “weak spatial mixing” (WSM); in two-dimensional spin systems WSM and SSM are known to be equivalent [35].) SSM has proved to have a number of powerful algorithmic applications, both in the analysis of spin system dynamics (discussed in detail below) and in the design of efficient approximation algorithms for the partition function (a weighted generalization of approximate counting) using the associated self-avoiding walk trees (see, e.g., [49, 41, 30, 19, 40, 42, 43]).

While SSM is a static property of a spin system, there is equal interest in dynamic properties. By this we mean the behavior of ergodic Markov chains whose states are the configurations of the spin system and whose equilibrium measure is the Gibbs distribution (1). Such dynamics are of interest in their own right: they provide algorithms for sampling from the Gibbs distribution and (in many cases) are a plausible model for the evolution of the underlying system of spins. Of particular interest are Glauber dynamics, which at each step pick a vertex uniformly at random and update its spin in a reversible fashion depending on the neighboring spins.

It has been well known since pioneering work in mathematical physics from the late 1980s (see, e.g., [26, 1, 50, 44, 33, 34, 8]) that SSM implies that the mixing time (i.e., rate of convergence) of the Glauber dynamics is , and hence optimal [25]; indeed, the reverse implication is also true, so the phase transition is manifested in the mixing time of the dynamics (see, e.g., [44, 33, 15]). The above implication was established using sophisticated functional analytic techniques, though more recently a simple combinatorial proof was given in [15] for the special case of monotone systems (where the edge potential favors pairs of equal spins—see Section 6 for a precise definition).

The intuition for these mixing time bounds comes from the fact that in the absence of long-range correlations (i.e., SSM), the system mimics the behavior of one with no interactions where the Gibbs distribution (1) is simply a product measure. Consequently, local Markov chains like the Glauber dynamics require steps to mix. On the other hand, non-local dynamics, where a large fraction of the configuration may be updated in a single step, could potentially converge to the Gibbs distribution much faster. These dynamics have to contend with the possibly high computational cost of implementing a single step. However, in some cases, non-local steps can be efficiently implemented by taking advantage of specific features of the models.

The current paper concerns the effects of SSM on the rate of convergence to equilibrium of non-local dynamics. Our first contribution consists of tight bounds for the mixing time and the spectral gap of a block dynamics. The spectral gap is the inverse of the relaxation time, which measures the speed of convergence to the stationary distribution when the initial configuration is reasonably close to this distribution (a “warm start”), whereas the mixing time assumes a worst possible starting configuration. The relaxation time is another well studied notion of rate of convergence (see, e.g., [27, 28]).

Let be a collection of sets (or blocks) such that . A (heat-bath) block dynamics with blocks is a Markov chain that in each step picks a block uniformly at random and updates the configuration in with a new configuration distributed according to the conditional measure in given the configuration in . We first consider the following choice of blocks. Start with a regular pattern of non-overlapping -dimensional lattice cubes of side , with a fixed minimal distance between cubes, and let denote the union of all cubes in this pattern. By considering all possible lattice translations of the set we obtain the blocks where ; see Figure 1. Each such block is called a tiling of and the associated block dynamics is called the tiled block dynamics. We refer to Section 3 for a precise definition.

###### Theorem 1.1.

When is a sufficiently large constant (independent of ), SSM implies that the mixing time of the tiled block dynamics is and that its relaxation time is .

In practice, the steps of the tiled block dynamics can be implemented efficiently in parallel. However, the main significance of this result is that, in conjunction with a comparison methodology we develop, it allows us to establish several new results for standard non-local dynamics. The first consequence of this technology is a tight bound for the relaxation time of general block dynamics.

###### Theorem 1.2.

SSM implies that the spectral gap of any heat-bath block dynamics with blocks is , and hence its relaxation time is .

We observe that there are no restrictions on the geometry of the blocks in this theorem, other than . This optimal bound for the spectral gap was known before only for certain specific collections of blocks (see, e.g., [32, 15]), and previous analytic methods apparently do not apply to the general setting.

A second application of our techniques concerns the so-called Swendsen-Wang (SW) dynamics [45]. The SW dynamics is a widely studied reversible dynamics for the ferromagnetic Ising and Potts models, which are among the most important and classical of all spin systems. In the ferromagnetic -state Potts model, there are spin values and the edge potential favors equal spins on neighbors. More precisely, where is the number of edges connecting vertices with the same spin values in , and is a parameter of the model. The Ising model is just the special case .

The SW dynamics is non-local, and updates the entire configuration in a single step, according to a scheme inspired by the related random-cluster model. (The exact definition of this dynamics is given in Section 4.) We prove that the relaxation time of the SW dynamics is , provided SSM holds. More formally, let be the transition matrix of the Swendsen-Wang dynamics for the Potts model on an -vertex cube in , and let denote its spectral gap.

###### Theorem 1.3.

For all , SSM implies that ; hence the relaxation time of the SW dynamics is .

This optimal bound for the spectral gap is a substantial improvement over the best previous result due to Ullrich [47], where, in , SSM was shown to imply that . For earlier related work in see [36, 9]. Tight spectral gap bounds such as ours for the SW dynamics were known previously only in the mean-field setting, where the graph is the complete graph [31, 18, 4]. For other relevant work see [22], where Guo and Jerrum proved that when  the SW dynamics mixes in polynomial time on any graph. We note that our spectral gap result does not immediately imply a bound on the mixing time, as one might hope; this is because there is an inherent penalty of in relating spectral gap to mixing time, so the mixing time bound implied by Theorem 1.3 is .

In two dimensions SSM is known to hold for all and all , where is the uniqueness threshold; this is a consequence of the results in [3, 2, 35]. Therefore, we have the following interesting corollary of Theorem 1.3.

###### Corollary 1.4.

In an -vertex square box of , for all and all we have ; hence the relaxation time of the SW dynamics is .

In , Ullrich’s result [47] implies that the relaxation time of the SW dynamics is for , for , and at most polynomial in for and . Recently, Gheissari and Lubetzky [20, 21], using the results of Duminil-Copin et al. [11, 10] settling the continuity of phase transition, analyzed the dynamics at the critical point for all . They showed that the mixing time is at most polynomial in for , at most quasi-polynomial for , and for . Previously, Borgs et al. [6, 5] proved an exponential lower bound for the mixing time on the -dimensional torus when , but only for sufficiently large .

Our last contribution concerns the systematic scan dynamics, which is a version of Glauber dynamics in which the vertex  to be updated is chosen not uniformly at random but according to a fixed ordering of the vertex set ; one step of systematic scan consists of updating each vertex once according to this ordering. Systematic scan is widely employed in practice, and there is a folklore belief that its mixing time should be closely related to that of standard (random update) Glauber dynamics; however, it has proved much harder to analyze, and indeed a number of works have been devoted to this topic (see, e.g., [12, 13, 14, 24]). The best general condition under which systematic scan dynamics is known to be rapidly mixing is due to Dyer, Goldberg and Jerrum [14], and is closely related to the Dobrushin condition for uniqueness of the Gibbs measure; this condition in turn is known to be stronger (and in some cases significantly stronger) than SSM [44, 33].

For the special case of monotone spin systems we can show that the systematic scan dynamics mixes in steps for any ordering of the vertices, whenever SSM holds. Additionally, for a wide class of orderings we can show that the mixing time is , provided again that SSM holds. For a vertex ordering , let denote the length of the longest subsequence of that is a path in .

###### Theorem 1.5.

In a monotone spin system on , SSM implies that the mixing time for the systematic scan dynamics on an -vertex cube in is for any ordering . Moreover, if then SSM implies that the mixing time is .

Note that the condition is usually easy to check in practice. Moreover, it is easy to choose orderings  for which is bounded; for example, in , is always bipartite, so the ordering  that updates first all the even vertices, then all the odd ones, has . This particular systematic scan dynamics, called the alternating scan dynamics, is used in practice to sample from the Gibbs distribution and thus has received some attention [38, 23]. Using our comparison technology we prove that, for general spin systems, the relaxation time of the alternating scan dynamics is , provided SSM holds.

###### Theorem 1.6.

SSM implies that the relaxation time of the alternating scan dynamics on an -vertex cube in is .

We emphasize that Theorem 1.6 applies to general (not necessarily monotone) spin systems. In spin systems with the SSM property, the best previously known bound for the relaxation time of the alternating scan dynamics was ; this bound follows from a recent result of Guo et al. [23]. We observe that since the alternating scan dynamics is non-reversible, its relaxation time is defined in terms of the spectral gap of its multiplicative reversiblization; see, e.g., [17, 37].
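To make the alternating scan dynamics concrete, the following is a minimal sketch of one sweep on a two-dimensional box with free boundary. It assumes the ferromagnetic Potts weight proportional to `exp(beta * #{monochromatic edges})`; the function names, the dictionary representation of configurations and the choice of a planar box are illustrative assumptions, not notation from the paper.

```python
import math
import random

def heat_bath_update(sigma, v, q, beta, rng):
    """Resample the spin at v from its conditional law given the neighbours."""
    i, j = v
    nbrs = [u for u in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)) if u in sigma]
    # Assumed Potts weight: exp(beta * number of neighbours sharing the spin s).
    weights = [math.exp(beta * sum(sigma[u] == s for u in nbrs)) for s in range(q)]
    sigma[v] = rng.choices(range(q), weights=weights)[0]

def alternating_scan_sweep(sigma, q, beta, rng):
    """One step of the scan: all even vertices first, then all odd vertices."""
    ordering = sorted(sigma, key=lambda v: ((v[0] + v[1]) % 2, v))
    for v in ordering:
        heat_bath_update(sigma, v, q, beta, rng)
    return ordering
```

Because the box is bipartite, the even vertices are conditionally independent given the odd ones, so each half of the sweep could equally be carried out in parallel.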

The rest of the paper is organized as follows. We conclude this introduction with a brief discussion of our techniques. Section 2 contains some basic terminology, definitions and facts used throughout the paper. In Section 3 we derive our results for the tiled block dynamics (Theorem 1.1) and introduce our comparison technology in Section 3.1. In Sections 4 and 5 we provide two applications of this technology: bounds for the spectral gaps of the SW dynamics (Theorem 1.3) and of the general block dynamics (Theorem 1.2), respectively. Finally, in Section 6 we provide our proofs for Theorems 1.5 and 1.6 concerning systematic scan dynamics.

### 1.1 Overview of Techniques

We conclude this introduction by briefly indicating some of our techniques. We use the path coupling method of Bubley and Dyer [7] to establish our results for the tiled (heat-bath) block dynamics in Theorem 1.1. Our proof of this theorem is a generalization of the methods in [15]. We then develop a novel comparison methodology, consisting of several new comparison inequalities concerning various block dynamics, that together with this result allows us to establish Theorems 1.2 and 1.3. We next provide a high-level overview of this technology.

We consider a more general class of tiled block dynamics. Suppose that for each and each configuration in , we are given an ergodic Markov chain that acts only on the tiling , has as the fixed configuration in and is reversible with respect to . Given this family of Markov chains, we consider the tiled block dynamics that chooses a tiling uniformly at random from and updates the configuration in with a step of , provided is the configuration in . We are able to show that the spectral gap of any such tiled block dynamics is determined by the spectral gap of the tiled heat-bath block dynamics (which is considered in Theorem 1.1) and the spectral gaps of the ’s. To bound the spectral gaps of the ’s we crucially use the fact that, by design, the ’s consist of non-interacting -dimensional cubes of constant volume.

We use this methodology in the proof of Theorem 1.2 to show that the heat-bath block dynamics with exactly two blocks, one “even” block containing all the even vertices and an “odd” one with all the odd vertices, has a constant spectral gap provided SSM holds. For this, we consider the tiled block dynamics that picks a tiling uniformly at random and with probability performs a heat-bath update in all the even vertices in , and otherwise in all the odd ones. The other part of the proof consists of establishing a comparison inequality between the spectral gaps of the even/odd heat-bath block dynamics (i.e., the block dynamics with exactly two blocks: the even and odd ones) and general heat-bath block dynamics (i.e., where the collection of blocks is arbitrary). For this, we use two key properties of the variance functional: monotonicity and tensorization.

To derive our results for the SW dynamics in Theorem 1.3 we introduce an auxiliary variant of the SW dynamics that only updates isolated vertices (instead of connected components of any size). This isolated vertices variant can be compared to a tiled block dynamics that in a step updates all the isolated vertices in a single block chosen uniformly at random from . Our comparison methodology above is then used to show that the spectral gap of this tiled block dynamics is . To establish comparison inequalities between the spectral gaps of the SW dynamics, the isolated vertices variant of the SW dynamics and the tiled block dynamics that updates isolated vertices in a tiling, we use elementary functional analysis and the comparison framework of Ullrich [47, 48, 46].

The proof of our later theorem on systematic scan for monotone systems (Theorem 1.5) is loosely based on ideas from [15]. Finally, to establish our result for the alternating scan dynamics (Theorem 1.6), we relate the spectral gap of this dynamics to that of the even/odd heat-bath block dynamics, which we analyze in the proof of Theorem 1.2.

## 2 Background

### 2.1 Spin systems

Let be the infinite -dimensional lattice graph, where for , iff . Let be a finite subset of and let be the induced subgraph. We use to denote the boundary of , i.e., the set of vertices in connected by an edge in to .

A spin system on consists of a set of spins , a symmetric edge potential and a vertex potential . A configuration of the system is an assignment of spins to the vertices of ; we denote by the set of all configurations. A boundary condition for  is an assignment of spins to some (or all) vertices in ; i.e., with . The boundary condition where is called the free boundary condition.

Given a boundary condition , each configuration is assigned probability

$$\mu^{\psi}(\sigma) = \frac{1}{Z} \cdot e^{-H^{\psi}_{G}(\sigma)},$$

where is the normalizing constant and

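$$H^{\psi}_{G}(\sigma) = -\sum_{(u,v)\in E} U(\sigma(u),\sigma(v)) \;-\; \sum_{(u,v)\in E:\, u\in A_{\psi},\, v\in V} U(\psi(u),\sigma(v)) \;-\; \sum_{u\in V} W(\sigma(u)).$$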

In the statistical physics literature, is called the partition function and the Hamiltonian of the system.

A particularly well known and widely studied spin system is the Ising/Potts model, where , and . The parameter is related to the inverse temperature of the system and to an external magnetic field. In Section 4 we analyze dynamics for the Ising/Potts model with ferromagnetic interactions () and no external field ( for all ).

###### Remark 1.

There are important spin systems, such as the hard-core model and the antiferromagnetic Potts model at zero temperature (proper -colorings), that require the edge potential to be infinite for certain configurations; namely, there are hard constraints in the system that make certain configurations invalid. Our results in Sections 3, 5 and 6 hold in this more general setting provided the system is permissive. A spin system is permissive if for any and any configuration on , there is at least one configuration on such that . This ensures that the measure is well-defined. It is easy to verify that, in addition to systems without hard constraints, the hard-core model for all and proper -colorings when are all permissive systems.

### 2.2 Glauber dynamics

Consider the spin system on with a fixed boundary condition . Let be a Markov chain that, given a configuration on , performs the following update:

1. Pick uniformly at random (u.a.r.);

2. Replace with a spin from sampled according to the distribution .

This Markov chain is called the (heat-bath) Glauber dynamics. is clearly reversible with respect to (w.r.t.) and, to avoid complications, we assume that it is irreducible. (This is always the case in systems without hard constraints, but could be reducible for some permissive systems; e.g., proper -colorings when .)
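The two steps above can be sketched in a few lines. This is a minimal illustration rather than a definitive implementation: it assumes a two-dimensional box with free boundary and the ferromagnetic Potts weight proportional to `exp(beta * #{monochromatic edges})`, and the names `glauber_step`, `sigma`, `L`, `q` and `beta` are our own.

```python
import math
import random

def glauber_step(sigma, L, q, beta, rng):
    """One heat-bath Glauber update on an L x L box with free boundary.

    sigma maps (i, j) -> spin in {0, ..., q-1} and is modified in place.
    Assumes the Potts weight exp(beta * #{monochromatic edges}).
    """
    v = (rng.randrange(L), rng.randrange(L))       # step 1: vertex u.a.r.
    i, j = v
    nbrs = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
    nbrs = [u for u in nbrs if u in sigma]         # free boundary: ignore outside vertices
    # Step 2: conditional law of the spin at v given the neighbouring spins.
    weights = [math.exp(beta * sum(1 for u in nbrs if sigma[u] == s))
               for s in range(q)]
    sigma[v] = rng.choices(range(q), weights=weights)[0]
    return v
```

Only the spin at the chosen vertex can change in a step, which is what makes this dynamics local.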

### 2.3 Strong spatial mixing (SSM)

Several notions of decay of correlations in spin systems have been useful in the analysis of local algorithms. A particularly important one is SSM, which says that the influence of a set on another decays exponentially with the distance between these sets.

For a fixed finite and , let be the condition that for all , all , and any pair of boundary conditions , on that differ only at , we have

$$\left\| \mu^{\psi}_{B} - \mu^{\psi_u}_{B} \right\|_{\mathrm{TV}} \le b \exp(-a \cdot \mathrm{dist}(u,B)), \tag{2}$$

where and are the probability measures induced in by and , respectively, denotes total variation distance and .

###### Definition 2.1.

A spin system on has SSM if there exist such that holds for every -dimensional cube .

###### Remark 2.

The definition of SSM varies in the literature. The main difference lies in the class of subsets for which is required to hold. The two boundary conditions may also differ on a larger subset of . We work here with one of the weakest versions of SSM. In particular, this notion is known to hold for the Ising/Potts model on for all and , where is the uniqueness threshold.

### 2.4 Mixing and coupling times

Let be an ergodic Markov chain over with stationary distribution . Let denote the distribution of after steps starting from , and let

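$$\tau_{\mathrm{mix}}(M,\varepsilon) = \max_{X_0\in\Omega} \min\left\{ t \ge 0 : \left\| M^{t}(X_0,\cdot) - \mu^{\psi} \right\|_{\mathrm{TV}} \le \varepsilon \right\}.$$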

The mixing time of is defined as .
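For a chain small enough to store its transition matrix, this definition can be evaluated by brute force. The sketch below assumes the conventional threshold $\varepsilon = 1/4$ as the default and an ergodic chain; the function names are illustrative.

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two distributions on the same set."""
    return 0.5 * np.abs(p - q).sum()

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalised to sum to 1."""
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def mixing_time(P, eps=0.25, t_max=10_000):
    """Smallest t such that every row of P^t is within eps of stationarity."""
    pi = stationary_distribution(P)
    Pt = np.eye(P.shape[0])                      # P^0
    for t in range(t_max + 1):
        worst = max(total_variation(Pt[x], pi) for x in range(P.shape[0]))
        if worst <= eps:
            return t
        Pt = Pt @ P
    raise RuntimeError("did not mix within t_max steps")
```

The maximum over rows corresponds to the worst starting configuration in the definition; this enumeration is only feasible for toy state spaces, which is why bounds such as those in this paper are needed.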

A (one step) coupling of the Markov chain specifies, for every pair of states , a probability distribution over such that the processes and , viewed in isolation, are faithful copies of , and if then . Let be the minimum such that , maximized over pairs of initial configurations , . The following inequality is standard:

$$\tau_{\mathrm{mix}}(M,\varepsilon) \le T_{\mathrm{coup}}(\varepsilon);$$

(see, e.g., [29]). The coupling time is and thus . Moreover, if for any positive integer , then

$$\Pr[X_T \neq Y_T] \le 1/4^{k}. \tag{3}$$
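Inequality (3) follows by restarting the coupling every $T_{\mathrm{coup}}$ steps. The derivation below is a sketch under two assumptions that reflect our reading of the definitions above: the coupling is Markovian, and $T_{\mathrm{coup}}$ is the coupling time at threshold $1/4$, so that from any pair of starting states the two copies disagree after $T_{\mathrm{coup}}$ steps with probability at most $1/4$. Writing $T_0 = T_{\mathrm{coup}}$ and using that coalesced copies stay together:

```latex
\begin{align*}
\Pr[X_{jT_0} \neq Y_{jT_0}]
  &= \Pr\!\left[X_{jT_0} \neq Y_{jT_0} \,\middle|\, X_{(j-1)T_0} \neq Y_{(j-1)T_0}\right]
     \cdot \Pr[X_{(j-1)T_0} \neq Y_{(j-1)T_0}] \\
  &\le \tfrac14 \, \Pr[X_{(j-1)T_0} \neq Y_{(j-1)T_0}],
\end{align*}
% and induction on j = 1, \dots, k gives
% \Pr[X_{kT_0} \neq Y_{kT_0}] \le 4^{-k}.
```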

### 2.5 Analytic tools

Our proofs use elementary notions from functional analysis, which we briefly review here. For extensive background on the application of such ideas to the analysis of finite Markov chains, see [39, 37].

Let be the transition matrix of a finite irreducible Markov chain with state space and stationary distribution . For any , we let . If we endow with the inner product , we obtain a Hilbert space denoted and defines an operator from to . The Cauchy-Schwarz inequality implies

$$\langle f, Pf \rangle_{\mu} \le \langle f, f \rangle_{\mu}. \tag{4}$$

Consider two Hilbert spaces and with inner products and respectively, and let be a bounded linear operator. The adjoint of is the unique operator satisfying for all and . If , is self-adjoint when .

In our setting, the adjoint of in is given by the transition matrix , and therefore is self-adjoint iff is reversible w.r.t. . In this case the spectrum of is real and we let denote its eigenvalues ( because is irreducible). The absolute spectral gap of is defined by , where . If is ergodic (i.e., irreducible and aperiodic), then , and it is a standard fact that for all all reversible Markov chains satisfy

$$\tau_{\mathrm{mix}}(P,\varepsilon) \ge \left( \lambda(P)^{-1} - 1 \right) \log\left( \frac{1}{2\varepsilon} \right), \tag{5}$$

(see Theorem 12.4 in [29]). is called the relaxation time.

is positive semidefinite if and , . In this case has only nonnegative eigenvalues. The Dirichlet form of a reversible Markov chain is defined as

$$\mathcal{E}_{P}(f,f) = \langle f, (I-P)f \rangle_{\mu} = \frac{1}{2} \sum_{x,y\in\Omega} \mu(x) P(x,y) \left( f(x) - f(y) \right)^{2},$$

for any . If is positive semidefinite, then the absolute spectral gap of satisfies

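$$\lambda(P) = 1 - \lambda_2 = \min_{f\in\mathbb{R}^{|\Omega|},\ \mathrm{Var}_{\mu}(f)\neq 0} \frac{\mathcal{E}_{P}(f,f)}{\mathrm{Var}_{\mu}(f)}, \tag{6}$$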

where and .
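The quantities of this subsection are easy to compute numerically for a small reversible chain. The sketch below is illustrative only: it assumes a chain with at least two states that is reversible w.r.t. `mu`, takes the absolute spectral gap to be one minus the largest absolute value of a non-trivial eigenvalue (the standard convention), and uses function names of our own choosing.

```python
import numpy as np

def dirichlet_form(P, mu, f):
    """E_P(f, f) = (1/2) * sum_{x,y} mu(x) P(x,y) (f(x) - f(y))^2."""
    diff = f[:, None] - f[None, :]
    return 0.5 * np.sum(mu[:, None] * P * diff ** 2)

def variance(mu, f):
    """Variance of f under the distribution mu."""
    mean = np.dot(mu, f)
    return np.dot(mu, (f - mean) ** 2)

def absolute_spectral_gap(P, mu):
    """1 - max(|lambda_2|, |lambda_min|) for P reversible w.r.t. mu."""
    d = np.sqrt(mu)
    A = (d[:, None] * P) / d[None, :]        # symmetric when P is reversible
    eig = np.sort(np.linalg.eigvalsh((A + A.T) / 2))[::-1]
    return 1.0 - max(abs(eig[1]), abs(eig[-1]))
```

For a positive semidefinite chain, the ratio `dirichlet_form(P, mu, f) / variance(mu, f)` is at least `absolute_spectral_gap(P, mu)` for every non-constant `f`, with equality for an eigenfunction of the second eigenvalue, in line with (6).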

## 3 SSM and tiled block dynamics for general spin systems

Let be a -dimensional cube of volume . (Footnote 2: for , the volume of is .) Let be the induced subgraph and let be a fixed boundary condition on . For ease of notation we set .

Let be a collection of sets (or blocks) such that . A block dynamics w.r.t. this collection of sets is a Markov chain that in each step picks a set uniformly at random from and updates the configuration in . The heat-bath block dynamics corresponds to the case where the configuration in is replaced by a new configuration distributed according to the conditional measure in given the configuration in .

In this section we consider two different versions of the block dynamics for a particular collection of sets, that with slight abuse of terminology we call tilings. The steps of this dynamics can be efficiently implemented in parallel, so we believe it is interesting in its own right. Moreover, the mixing time and spectral gap bounds we derive here will be crucially used later in our proofs in Sections 4 and 5, where we consider the SW dynamics and general block dynamics, respectively.

We first define the collection of blocks, which we denote . Let be an odd integer. For each , let be the union of all -dimensional cubes of side length with centers at for some . The cubes in have volume and are at distance from each other (see Figure 1). For each , let and let ; then . We call each a tiling of since it corresponds to a tiling of with cubes of side length . Any block dynamics w.r.t. is called a tiled block dynamics.

###### Remark 3.

In our proofs we will choose to be a sufficiently large constant independent of . The choice of the distance between the -dimensional cubes is so that neighboring cubes do not interact. This distance is sufficient because we are considering spin systems with only nearest-neighbor interactions. To extend our proofs to arbitrary finite range spin systems on it suffices to choose a larger distance between these cubes.

Let be the transition matrix of the heat-bath tiled block dynamics. That is, given a configuration at time , the chain proceeds as follows:

1. Pick u.a.r.;

2. Update the configuration in with a sample from .

This chain is clearly ergodic and reversible w.r.t. . We prove the following lemma, which corresponds to Theorem 1.1 from the introduction.
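As an illustration, the following sketch implements a one-dimensional analogue of this chain for the Potts model on a path with free boundary. The geometry is a simplifying assumption of ours: each tiling consists of intervals of `side` vertices separated by a single fixed vertex, and the tilings are the `side + 1` translations of this pattern. Each interval is resampled by brute-force enumeration, which is feasible only because the intervals have constant size.

```python
import itertools
import math
import random

def heat_bath_interval(sigma, lo, hi, q, beta, rng):
    """Resample sigma[lo:hi] from its conditional law given the rest of the path."""
    n = len(sigma)
    configs = list(itertools.product(range(q), repeat=hi - lo))
    weights = []
    for c in configs:
        # Attach the fixed neighbours on either side, when they exist.
        ext = ([sigma[lo - 1]] if lo > 0 else []) + list(c) + ([sigma[hi]] if hi < n else [])
        agree = sum(1 for a, b in zip(ext, ext[1:]) if a == b)
        weights.append(math.exp(beta * agree))   # assumed Potts weight
    sigma[lo:hi] = rng.choices(configs, weights=weights)[0]

def tiled_block_step(sigma, side, q, beta, rng):
    """One step: pick a translation u.a.r. and heat-bath update all its intervals."""
    period = side + 1                  # interval plus one separating vertex
    k = rng.randrange(period)          # step 1: pick a tiling u.a.r.
    n = len(sigma)
    start = k - period                 # the first interval may be cut off by the box
    while start < n:
        lo, hi = max(start, 0), min(start + side, n)
        if lo < hi:
            heat_bath_interval(sigma, lo, hi, q, beta, rng)   # step 2
        start += period
    return k
```

Since the intervals are separated by vertices that are not updated and interactions are nearest-neighbor, resampling the intervals one after another is equivalent to a single heat-bath update of their union, and the individual interval updates could run in parallel.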

###### Lemma 3.1.

When is a sufficiently large constant (independent of ), SSM implies that and .

###### Proof.

The proof is a generalization of the path coupling argument in [15]. Let and be two copies of the tiled heat-bath block dynamics that differ at a single vertex . We construct a coupling of the steps of such that the expected number of disagreements between and is strictly less than one.

The region chosen in step 1 of the chain is the same in both copies. For every tiling there are three possibilities (see Figure 1):

(a) , in which case we use the same configuration for in both copies and so with probability 1;

(b) , and again we use the same configuration to update in both copies. Then, and differ only at with probability 1; or

(c) . In this case disagreements could propagate from to the interior of , but we describe next a coupling that limits the extent of such propagation.

Case (a) occurs with probability for large enough . Let us consider case (c); i.e., . This case occurs with probability at most . Moreover, is in the boundary of exactly one of the smaller cubes (of side length at most ) in , which we denote . The cube can be partitioned into the sets of vertices that are close and far from . More precisely, let , and . SSM implies

where and are the two boundary conditions induced in by and , respectively, and thus differ only at . This implies that there is a coupling of the distributions and such that if is a sample from this coupling (so, and are configurations on ), then

where the last inequality holds for large enough . Hence, we can couple the update on such that and disagree on with probability at most . Then, the expected number of disagreements in is crudely bounded by

$$|C| + \frac{|F|}{L^{d}} \le (2R)^{d} + 1 \le \frac{L}{8d} + 1.$$

The same configuration is used to update both copies in and so with probability one. This is possible because the configuration in the boundary of is the same in both and .

Combining these facts, we obtain a coupling such that the expected number of disagreements at time is at most

$$1 - \frac{1}{2} + \frac{2d}{L}\left( \frac{L}{8d} + 1 \right) = \frac{3}{4} + \frac{2d}{L} \le \frac{7}{8},$$

provided that is large enough. The path coupling method [7] then implies that

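$$\max_{\sigma\in\Omega} \left\| \mathcal{B}_{\mathcal{D}}^{\,t}(\sigma,\cdot) - \mu(\cdot) \right\|_{\mathrm{TV}} \le n \left( \frac{7}{8} \right)^{t}.$$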

This implies that the mixing time of is and that (see, e.g., Corollary 12.6 in [29]); hence, as claimed. ∎
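The constants in this argument can be checked directly. The short computation below is a worked sketch, not part of the original proof: it records when the final inequality $\frac34 + \frac{2d}{L} \le \frac78$ holds and how many steps the bound $n(7/8)^t$ needs to fall below the threshold $1/4$ (assuming the conventional threshold for the mixing time).

```latex
\begin{align*}
\frac{2d}{L} \le \frac{1}{8} \quad &\Longleftrightarrow \quad L \ge 16d, \\[4pt]
n\left(\frac{7}{8}\right)^{t} \le \frac{1}{4} \quad &\Longleftrightarrow \quad
t \ge \frac{\ln(4n)}{\ln(8/7)} \approx 7.49\,\ln(4n).
\end{align*}
```

In particular the number of steps required grows only logarithmically in $n$ once $L$ is a sufficiently large constant.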

### 3.1 Comparing tiled block dynamics

In this subsection we introduce a more general class of tiled block dynamics and relate the spectral gaps of the dynamics in this class to that of the heat-bath tiled block dynamics. This will allow us to deduce bounds for the spectral gaps of various tiled block dynamics, a key step in our comparison methodology.

Each dynamics in this class chooses a tiling uniformly at random from and updates the configuration in in a reversible fashion. Formally, for each and each valid configuration in , let be the transition matrix of an ergodic Markov chain whose state space is the set of valid configurations in given that is the configuration in . That is, is a Markov chain acting on the specific tiling with as the fixed configuration in the exterior of . We assume that, for each and , is reversible w.r.t.  and positive semidefinite. Using the ’s we define a tiled block dynamics as follows. Given a spin configuration , consider the chain that performs the following update to obtain :

1. Pick u.a.r.;

2. If , let and perform a step of to obtain .

Let denote the transition matrix of this chain. The ergodicity and reversibility of w.r.t.  follow from the ergodicity and reversibility of the ’s w.r.t. . We establish the following inequality between the spectral gaps of and . For , let be the set of the valid configurations of . Then,

###### Lemma 3.2.
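$$\lambda(\mathcal{S}_{\mathcal{D}}) \ge \lambda(\mathcal{B}_{\mathcal{D}}) \min_{k=1,\dots,m}\ \min_{\tau\in\Omega(B_k^{c})} \lambda(S_k^{\tau}).$$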

In words, this inequality states that the spectral gap of a generic tiled block dynamics is bounded from below by the spectral gap of the tiled heat-bath block dynamics times the smallest spectral gap of any of the ’s. This is indeed a natural inequality since roughly steps of should be enough to simulate one step of in when is the configuration in . Lemmas 3.1 and 3.2 put together allow us to bound the spectral gap of a general class of tiled block dynamics, provided that SSM holds and that we know the spectral gaps of the ’s. As we shall see in our later applications of these results, the geometry of the tilings in was chosen in a way that facilitates the analysis of many natural choices of the ’s.

Before proving Lemma 3.2 we state two standard properties of heat-bath updates that will be used in the proof. For let be the transition matrix that corresponds to a heat-bath update in the set . That is, for ,

$$K_A(\sigma,\sigma') = \mathbf{1}\left( \sigma(A^{c}) = \sigma'(A^{c}) \right) \mu\left( \sigma'(A) \mid \sigma(A^{c}) \right).$$

For ease of notation let denote the Dirichlet form of ; i.e., .

###### Fact 3.3.

is positive semidefinite. Moreover, for any

$$\mathcal{E}_A(f,f) = \sum_{\tau\in\Omega(A^{c})} \mathrm{Var}^{\tau}_{A}(f)\, \mu(\tau),$$

where and .

We proceed with the proof of Lemma 3.2.

###### Proof of Lemma 3.2.

Let . Since ,

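$$\mathcal{E}_{\mathcal{B}_{\mathcal{D}}}(f,f) = \frac{1}{m} \sum_{k=1}^{m} \mathcal{E}_{B_k}(f,f) = \frac{1}{m} \sum_{k=1}^{m} \sum_{\tau\in\Omega(B_k^{c})} \mu(\tau)\, \mathrm{Var}^{\tau}_{B_k}(f) \tag{7}$$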

by Fact 3.3.

For , let be the set of valid configurations on given that is the configuration on . For , let be such that for any . By assumption, is positive semidefinite, ergodic and reversible w.r.t. . Since also from (6), we get

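$$0 < \lambda(S_k^\tau) \le \frac{\mathcal{E}_{S_k^\tau}(f^\tau,f^\tau)}{\operatorname{Var}_{\mu(\cdot\mid\tau)}(f^\tau)} = \frac{\mathcal{E}_{S_k^\tau}(f^\tau,f^\tau)}{\operatorname{Var}_{B_k}^{\tau}(f)}. \tag{8}$$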

Let

$$\lambda_{\min} = \min_{k=1,\dots,m}\,\min_{\tau\in\Omega(B_k^c)}\lambda(S_k^\tau).$$

Then, from the definition of the Dirichlet form, (7) and (8) we get

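\begin{align}
\mathcal{E}_{S_D}(f,f) &= \frac{1}{m}\sum_{k=1}^{m}\sum_{\tau\in\Omega(B_k^c)}\mu(\tau)\,\mathcal{E}_{S_k^\tau}(f^\tau,f^\tau) \tag{9}\\
&\ge \frac{1}{m}\sum_{k=1}^{m}\sum_{\tau\in\Omega(B_k^c)}\mu(\tau)\,\lambda(S_k^\tau)\,\operatorname{Var}_{B_k}^{\tau}(f) \ge \lambda_{\min}\,\mathcal{E}_{B_D}(f,f). \notag
\end{align}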

Finally, we claim that both and are positive semidefinite. is an average over heat-bath updates each of which is positive semidefinite by Fact 3.3. Hence, is positive semidefinite. Similarly, the positivity of follows from the fact that by assumption the ’s are positive semidefinite. Indeed, from (9) and the definition of Dirichlet form, we get

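$$\langle f, S_D f\rangle_{\mu} = \frac{1}{m}\sum_{k=1}^{m}\sum_{\tau\in\Omega(B_k^c)}\mu(\tau)\,\langle f^\tau, S_k^\tau f^\tau\rangle_{\mu(\cdot\mid\tau)} \ge 0.$$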

Therefore, by (6), , as claimed. ∎

We conclude this section with the proof of Fact 3.3.

###### Proof of Fact 3.3.

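Since $K_A^2 = K_A$ and $K_A$ is reversible w.r.t. $\mu$, $K_A$ is positive semidefinite. For $\tau\in\Omega(A^c)$, let $\Omega^{\tau}(A)$ be the set of valid configurations on $A$ when the configuration on $A^c$ is $\tau$. Then, by the definition of the Dirichlet form,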

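\begin{align*}
\mathcal{E}_A(f,f) &= \frac{1}{2}\sum_{\tau\in\Omega(A^c)}\sum_{\sigma,\sigma'\in\Omega^{\tau}(A)}\mu(\sigma\cup\tau)\,\mu(\sigma'\mid\tau)\,\bigl(f(\sigma\cup\tau)-f(\sigma'\cup\tau)\bigr)^2 \\
&= \frac{1}{2}\sum_{\tau\in\Omega(A^c)}\mu(\tau)\sum_{\sigma,\sigma'\in\Omega^{\tau}(A)}\mu(\sigma\mid\tau)\,\mu(\sigma'\mid\tau)\,\bigl(f(\sigma\cup\tau)-f(\sigma'\cup\tau)\bigr)^2 \\
&= \sum_{\tau\in\Omega(A^c)}\operatorname{Var}_A^{\tau}(f)\,\mu(\tau).
\end{align*}
∎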

## 4 SSM and the Swendsen-Wang dynamics for the Potts model

In this section we show that SSM implies fast mixing of the Swendsen-Wang (SW) dynamics. In particular, we prove that when is a finite -dimensional cube, the relaxation time (i.e., the inverse spectral gap) of the SW dynamics on the graph induced by is at most , provided the system has SSM.

The SW dynamics is a non-local Markov chain for the ferromagnetic Potts model () with no external field ( for all ); see Section 2.1 for the definition of this model. The state space of the SW dynamics is the set of Potts configurations , and it is straightforward to verify the reversibility of this chain w.r.t. the Potts measure, which, for distinctness, we will denote (see, e.g., [16]). We focus here on the free boundary condition case for clarity, but our results hold without significant modifications for the SW dynamics with arbitrary boundary conditions.

Let be a -dimensional cube of volume and let be the induced subgraph. Given a Potts configuration , a step of the SW dynamics results in a new configuration as follows:

1. Add each monochromatic edge independently with probability to obtain a joint configuration , where and an edge is monochromatic if ;

2. Assign to each connected component of independently a new spin from u.a.r.;

3. Remove all edges to obtain the new Potts configuration .
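The three steps above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: spins are labelled `0, ..., q-1`, the edge probability `p` is passed in directly (its relation to the inverse temperature is the one fixed in the model definition and is not assumed here), and `grid_edges`, `find` and `swendsen_wang_step` are hypothetical helper names. Connected components are tracked with a simple union-find structure.

```python
import random


def grid_edges(side):
    """Edges of the side x side grid graph, vertices indexed row-major."""
    edges = []
    for r in range(side):
        for c in range(side):
            v = r * side + c
            if c + 1 < side:
                edges.append((v, v + 1))
            if r + 1 < side:
                edges.append((v, v + side))
    return edges


def find(parent, v):
    """Union-find root lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def swendsen_wang_step(spins, edges, q, p, rng):
    """One Swendsen-Wang update; returns the new list of spins."""
    n = len(spins)
    parent = list(range(n))
    # Step 1: keep each monochromatic edge independently with probability p.
    for u, v in edges:
        if spins[u] == spins[v] and rng.random() < p:
            parent[find(parent, u)] = find(parent, v)
    # Step 2: give every connected component a fresh uniform spin.
    component_spin = {}
    new_spins = []
    for v in range(n):
        root = find(parent, v)
        if root not in component_spin:
            component_spin[root] = rng.randrange(q)
        new_spins.append(component_spin[root])
    # Step 3: the edge set is discarded; only the spins are returned.
    return new_spins


if __name__ == "__main__":
    rng = random.Random(0)
    side, q = 4, 3
    spins = [rng.randrange(q) for _ in range(side * side)]
    for _ in range(10):
        spins = swendsen_wang_step(spins, grid_edges(side), q, p=0.5, rng=rng)
    print(spins)
```

Because a single component may span a large part of the cube, one step can change many spins at once, which is the non-local behavior the analysis has to control.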

Let be the transition matrix of the SW dynamics on . In this section we prove Theorem 1.3 from the introduction. Corollary 1.4 follows directly from Theorem 1.3 and the fact that, in , SSM holds for all and (see [3, 2, 35]). In the proof of Theorem 1.3 we use several auxiliary Markov chains that we define and briefly motivate in Section 4.1. The proof of Theorem 1.3 is then provided in Section 4.2.

### 4.1 Auxiliary Markov chains

In Section 3 we established that the spectral gap of the tiled heat-bath block dynamics is at least , provided SSM holds (see Lemma 3.1). To prove Theorem 1.3 we show that the spectral gap of the SW dynamics is at least the spectral gap of the tiled heat-bath block dynamics times a constant that depends only on , and . Establishing such an inequality directly seems difficult because the SW dynamics could change the spins in a large component intersecting many of the -dimensional cubes in a tiling. To work around this issue we introduce the following Markov chain.

Isolated vertices (SW) dynamics . Consider the Markov chain that, given a Potts configuration at time , performs the following update to obtain :

1. Add each monochromatic edge independently with probability to obtain ;

2. Assign to each isolated vertex of independently a new spin from u.a.r.;

3. Remove all edges to obtain .
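For comparison with the SW update, the isolated vertices update can be sketched as follows. The code is an illustration only: spins are assumed to be labelled `0, ..., q-1`, the edge probability `p` is taken as an input, and `isolated_vertices_step` is a hypothetical name. A vertex is isolated when none of its incident edges was added in the first step, so no component search is needed.

```python
import random


def isolated_vertices_step(spins, edges, q, p, rng):
    """One isolated vertices update; returns the new list of spins."""
    n = len(spins)
    touched = [False] * n  # touched[v] is True if some added edge contains v
    # Step 1: keep each monochromatic edge independently with probability p.
    for u, v in edges:
        if spins[u] == spins[v] and rng.random() < p:
            touched[u] = True
            touched[v] = True
    # Step 2: resample only the isolated vertices; all other spins are kept.
    # Step 3: the edge set is discarded; only the spins are returned.
    return [spins[v] if touched[v] else rng.randrange(q) for v in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    path = [(0, 1), (1, 2), (2, 3)]
    print(isolated_vertices_step([0, 0, 1, 2], path, q=3, p=0.5, rng=rng))
```

Unlike the SW update, components with two or more vertices keep their spins here, so each step changes the configuration only at single vertices.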

We call this chain the isolated vertices dynamics and with a slight abuse of notation we let also denote its transition matrix. Intuitively, the SW dynamics ought to be faster than the isolated vertices dynamics since it updates all the components of any size simultaneously, instead of just the isolated vertices. We show that this is indeed the case.

###### Lemma 4.1.

$\lambda(SW) \ge \lambda(I_{\textsc{sw}})$.

The proof of this lemma is given in Section 4.2.2. The motivation for introducing is that now we can easily define a tiled variant of this chain as follows.

Isolated vertices tiled dynamics . Recall that is the collection of tilings; see Section 3 for the precise definition. Given a Potts configuration , one step of the isolated vertices tiled dynamics is given by:

1. Add each monochromatic edge independently with probability to obtain ;

2. Pick u.a.r.;

3. Assign to each isolated vertex in independently a new spin from u.a.r.;

4. Remove all edges to obtain .
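A sketch of the tiled variant follows. The precise geometry of the tilings is fixed in Section 3 and is not reproduced here: in this illustration each tiling is represented simply by the set of vertices it covers, which is an assumption of the example, as are the spin labels `0, ..., q-1`, the input `p` and the name `isolated_vertices_tiled_step`.

```python
import random


def isolated_vertices_tiled_step(spins, edges, tilings, q, p, rng):
    """One tiled isolated vertices update; returns the new list of spins.

    `tilings` is a list of vertex sets, one per tiling (hypothetical encoding).
    """
    n = len(spins)
    touched = [False] * n
    # Step 1: keep each monochromatic edge independently with probability p.
    for u, v in edges:
        if spins[u] == spins[v] and rng.random() < p:
            touched[u] = True
            touched[v] = True
    # Step 2: pick one tiling uniformly at random.
    covered = rng.choice(tilings)
    # Step 3: resample isolated vertices inside the chosen tiling only.
    # Step 4: the edge set is discarded; only the spins are returned.
    return [
        rng.randrange(q) if (v in covered and not touched[v]) else spins[v]
        for v in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    path = [(0, 1), (1, 2), (2, 3)]
    tilings = [{0, 1}, {2, 3}]
    print(isolated_vertices_tiled_step([0, 0, 1, 2], path, tilings, q=3, p=0.5, rng=rng))
```

With a single tiling that covers every vertex, this update coincides with the untiled isolated vertices update.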

We use to denote the transition matrix of this chain. Intuitively, should reach equilibrium faster than since in each step it updates the spins of all isolated vertices, instead of just those in a single tiling. This intuition is made rigorous in the following lemma, which is proved in Section 4.2.2.

###### Lemma 4.2.

$\lambda(I_{\textsc{sw}}) \ge \lambda(I_D)$.

Finally, it will be useful in our proofs to consider yet another variant of the isolated vertices dynamics that acts on a particular tiling with a fixed configuration in its exterior. These chains correspond to the ’s from Section 3 for the tiled dynamics .

Conditional isolated vertices tiled dynamics . For each and each fixed configuration in , we consider the Markov chain with transition matrix and state space such that, if , then is obtained as follows:

1. Add each monochromatic edge in (according to ) independently with probability ;

2. Assign to each isolated vertex in independently a new spin from u.a.r.;

3. Remove all edges to obtain .

### 4.2 Proof of Theorem 1.3

Let

$$\lambda_{\min} = \min_{k=1,\dots,m}\,\min_{\tau\in\Omega_P(B_k^c)}\lambda(I_k^\tau).$$

(Recall that is the set of valid configurations of and is the conditional isolated vertices tiled dynamics on with as the fixed configuration in the exterior of .) We prove the following two lemmas that, together with Lemmas 4.1 and 4.2 and the results in Section 3, imply Theorem 1.3.

###### Lemma 4.3.

1. and are reversible w.r.t.  and positive semidefinite.

2. For all and , is reversible w.r.t.  and positive semidefinite.

###### Lemma 4.4.

.

###### Proof of Theorem 1.3.

By Lemmas 4.1 and 4.2,

$$\lambda(SW) \ge \lambda(I_{\textsc{sw}}) \ge \lambda(I_D).$$

is a tiled block dynamics. Indeed, if is the configuration in , then the configuration in is updated with a step of the ergodic Markov chain . By Lemma 4.3, is reversible w.r.t.  and positive semidefinite. Lemma 4.3 also implies that is reversible w.r.t.  and positive semidefinite, for all and . Hence, by Lemma 3.2

$$\lambda(I_D) \ge \lambda_{\min}\,\lambda(B_D).$$

By Lemma 3.1, when is a sufficiently large constant (independent of ), SSM implies that . Moreover, by Lemma 4.4, . Then

$$\lambda(SW) \ge \frac{1}{56}\,e^{-2\beta d L^{d}},$$

and the result follows from the fact that . ∎

The rest of this section is organized as follows. The proofs of Lemmas 4.1, 4.2 and 4.3 use a common representation of the Markov chains , and which we introduce in Section 4.2.1. The actual proofs of these lemmas are provided in Section 4.2.2. The proof of Lemma 4.4 is provided in Section 4.2.3 and crucially uses the fact that by design the -dimensional cubes of side length in each tiling do not interact with each other.

#### 4.2.1 Common representation

We provide here a decomposition of the transition matrices , and as products of simpler matrices, which will be used in our proofs of Lemmas 4.1, 4.2 and 4.3. We are able to do this because the steps of these chains all include a “lifting” substep to a joint configuration space , where configurations consist of a spin assignment to the vertices together with a subset of the edges of . The joint Edwards-Sokal measure on is given by

 ν(A,σ)=p|A|(1