# Rapid mixing of Swendsen-Wang and single-bond dynamics in two dimensions

Mario Ullrich (the author was supported by the DFG GK 1523)
Mathematisches Institut, Universität Jena
email: mario.ullrich@uni-jena.de
###### Abstract

We prove that the spectral gap of the Swendsen-Wang dynamics for the random-cluster model on arbitrary graphs with $m$ edges is bounded above by $16\,m\log m$ times the spectral gap of the single-bond (or heat-bath) dynamics. This and the corresponding lower bound (from [U12]) imply that rapid mixing of these two dynamics is equivalent.
Using the known lower bound on the spectral gap of the Swendsen-Wang dynamics for the two-dimensional square lattice $\mathbb{Z}_L^2$ of side length $L$ at high temperatures and a result for the single-bond dynamics on dual graphs, we obtain rapid mixing of both dynamics on $\mathbb{Z}_L^2$ at all non-critical temperatures. In particular this implies, as far as we know, the first proof of rapid mixing of a classical Markov chain for the Ising model on $\mathbb{Z}_L^2$ at all temperatures.

## 1 Introduction

Markov chains for the random-cluster model and the closely related $q$-state Potts model are the topic of many research articles from various areas of mathematics and statistical physics. Probably the most studied model is the Ising model (or 2-state Potts model) on the two-dimensional square lattice. While there is almost complete knowledge about the mixing properties of single-spin dynamics, such as the heat-bath dynamics, there are only a few results on cluster algorithms, such as Swendsen-Wang, or dynamics on the corresponding random-cluster model. The single-spin dynamics on $\mathbb{Z}_L^2$, i.e. the two-dimensional square lattice of side length $L$, is known to mix rapidly above the critical temperature [MO94], and below the critical temperature the mixing time is exponential in the side length $L$, see [CGMS96]. See also [Mar99] for an excellent survey of the results known at that time. Only recently it was proven by Lubetzky and Sly [LS10] that the single-spin dynamics is also rapidly mixing at the critical temperature.

One approach to overcome the torpid (or slow) mixing at low temperatures was to consider cluster algorithms that change the spin of a large portion of vertices at once. The most successful approach (so far) is the Swendsen-Wang dynamics (SW) that is based on the close relation between Potts and random-cluster models, see [SW87] and [ES88]. Although it is conjectured that this dynamics is rapidly mixing at high and at low temperatures, again most of the results concern high temperatures. Known results for general graphs include rapid mixing on trees and complete graphs at all temperatures, see e.g. [CF98], [CDFR00] and [LNP07], and on graphs with small maximum degree at high temperatures, see [CF98] and [Hub03]. Additionally, it is proven that, for bounded degree graphs, rapid mixing of single-spin dynamics implies rapid mixing of SW [U11].

At low temperatures there are only a few results on the mixing time of SW. Besides the results for trees and complete graphs, we are only aware of two articles concerning the low temperature case, [Mar92] and [Hub03]. While Huber [Hub03] states rapid mixing for temperatures below some constant that depends on the size of the graph, Martinelli [Mar92] gave a result for hypercubic subsets of $\mathbb{Z}^d$ at sufficiently low temperatures that do not depend on the side length. In addition to the rapid mixing results for SW there are some results on torpid mixing. These include torpid mixing for the $q$-state Potts model at the critical temperature on the complete graph for all $q\ge 3$ [GJ97] and on hypercubic subsets of $\mathbb{Z}^d$ for sufficiently large $q$ [BCT10].

In this article we study the mixing properties of the Swendsen-Wang and the heat-bath dynamics for the random-cluster model (in fact, SW can be seen as a Markov chain for both random-cluster and Potts models), and we prove that the spectral gap of SW is bounded above by some polynomial in the size of the graph times the spectral gap of the heat-bath dynamics. In particular, this implies rapid mixing of SW for the Potts model on the two-dimensional square lattice at all temperatures below the critical one.

To state our results in detail, we first have to define the models and the algorithms. Let $G=(V,E)$ be a graph with finite vertex set $V$ and edge set $E$. The random-cluster model (also known as the FK-model) with parameters $p\in(0,1)$ and $q\in\mathbb{N}$, see Fortuin and Kasteleyn [FK72], is defined on the graph $(V,E)$ by its state space $\Omega_{\rm RC}=\{A:\ A\subseteq E\}$ and the RC measure

$$\mu(A) := \mu^G_{p,q}(A) = \frac{1}{Z(G,p,q)}\Big(\frac{p}{1-p}\Big)^{|A|} q^{c(A)},$$

where $c(A)$ is the number of connected components in the graph $(V,A)$, counting isolated vertices as a component, and $Z$ is the normalization constant that makes $\mu$ a probability measure. Note that this model is well-defined also for non-integer values of $q$, but we do not need this generalization here. See [Gri06] for further details and related topics.
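As a small illustration of the definition, the RC measure can be enumerated by brute force on a tiny graph. The following is only a sketch: it assumes the standard form $\mu(A)\propto (p/(1-p))^{|A|}q^{c(A)}$, and the names `count_components` and `rc_measure` as well as the triangle example are hypothetical, not taken from the article.

```python
from itertools import combinations

def count_components(n, edges):
    """Number of connected components of ({0..n-1}, edges); isolated vertices count."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    comps = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            comps -= 1
    return comps

def rc_measure(n, edges, p, q):
    """Return {A: mu(A)} for every edge subset A (assumed weight: (p/(1-p))^|A| * q^c(A))."""
    weights = {}
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            A = frozenset(subset)
            weights[A] = (p / (1 - p)) ** len(A) * q ** count_components(n, A)
    Z = sum(weights.values())  # normalization constant Z(G, p, q)
    return {A: w / Z for A, w in weights.items()}

# Hypothetical example: a triangle with p = 1/2 and q = 2.
triangle = [(0, 1), (1, 2), (0, 2)]
mu = rc_measure(3, triangle, p=0.5, q=2)
```

The enumeration is exponential in the number of edges, so it only serves to check small cases by hand.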

A closely related model is the $q$-state Potts model on $G$ at inverse temperature $\beta\ge 0$, that is defined as the set of possible configurations $\Omega_{\rm P}=\{1,\dots,q\}^V$, where $\{1,\dots,q\}$ is the set of colors (or spins), together with the probability measure

$$\pi(\sigma) := \pi^G_{\beta,q}(\sigma) = \frac{1}{Z(G,1-e^{-\beta},q)}\,\exp\Big\{\beta\sum_{u,v:\,\{u,v\}\in E}\mathbb{1}\big(\sigma(u)=\sigma(v)\big)\Big\}$$

for $\sigma\in\Omega_{\rm P}$, where $Z$ is the same normalization constant as for the RC model (see [Gri06, Th. 1.10]). For $q=2$ this model is called the Ising model.

The connection of these models is given by a coupling of the RC and the Potts measure in the case $p=1-e^{-\beta}$. Let $\sigma\in\Omega_{\rm P}$ and $A\subseteq E$. Then the joint measure of $(\sigma,A)$ is defined by

$$\nu(\sigma,A) := \nu^G_{p,q}(\sigma,A) = \frac{1}{Z(G,p,q)}\Big(\frac{p}{1-p}\Big)^{|A|}\,\mathbb{1}\big(A\subset E(\sigma)\big),$$

where

$$E(\sigma) := \big\{\{u,v\}\in E:\ \sigma(u)=\sigma(v)\big\}.$$

The marginal distributions of $\nu$ are exactly $\pi$ and $\mu$, respectively, and we will call $\nu$ the FKES (Fortuin-Kasteleyn-Edwards-Sokal) measure, see e.g. [ES88] and [Gri06].

The Swendsen-Wang dynamics (SW) uses this coupling implicitly in the following way. Suppose the SW at time $t$ is in the state $A_t\subseteq E$. We choose $\sigma_t\in\Omega_{\rm P}$ with respect to the measure $\nu(\cdot,A_t)$, i.e. every connected component of $(V,A_t)$ is colored independently and uniformly at random with a color from $\{1,\dots,q\}$. Then take $E(\sigma_t)$ and delete each edge independently with probability $1-p$ to obtain $A_{t+1}$, which can be seen as sampling from $\nu(\sigma_t,\cdot)$. Denote by $P_{\rm SW}$ the transition matrix of this Markov chain. Of course, we can make these two steps in reverse order to obtain a Markov chain for the $q$-state Potts model with transition matrix .
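The two-stage update described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the article's implementation: vertices are labeled `0..n-1`, colors are `0..q-1` instead of `1..q`, and the helper names `component_roots` and `sw_step` are hypothetical.

```python
import random

def component_roots(n, edges):
    """Return root[v] for v in 0..n-1; two vertices share a root iff they are connected."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return [find(v) for v in range(n)]

def sw_step(n, edges, A, p, q, rng=random):
    """One Swendsen-Wang step on the random-cluster model: A_t -> A_{t+1}."""
    root = component_roots(n, A)
    # Stage 1: color each connected component of (V, A) uniformly and independently.
    color = {r: rng.randrange(q) for r in set(root)}
    sigma = [color[root[v]] for v in range(n)]
    # Stage 2: among edges with equally colored endpoints, keep each with probability p.
    return frozenset(e for e in edges
                     if sigma[e[0]] == sigma[e[1]] and rng.random() < p)

# Hypothetical usage on a triangle.
triangle = [(0, 1), (1, 2), (0, 2)]
next_state = sw_step(3, triangle, frozenset([(0, 1)]), p=0.5, q=2, rng=random.Random(0))
```

Keeping an edge with probability $p$ is the same as deleting it with probability $1-p$, which is how the text phrases the second stage.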

The heat-bath dynamics (HB) for the random-cluster model is a local Markov chain that, given the current state $A_t$, sets $A_{t+1}=A_t$ with probability $\frac12$ and otherwise chooses an edge $e\in E$ uniformly at random and changes the state at most at the edge $e$ with respect to the conditional measure given all the other edges, that is, it samples $A_{t+1}$ from the conditional measure $\mu(\,\cdot\mid\{A_t\cup e,\,A_t\setminus e\})$. The transition matrix of this chain is denoted $P_{\rm HB}$.

The spectral gap of a Markov chain with transition matrix $P$ is defined by

$$\lambda(P) := 1-\max\big\{|\xi|:\ \xi \text{ is an eigenvalue of } P,\ \xi\neq 1\big\}.$$

We prove

###### Theorem 1.

Let $P_{\rm SW}$ (resp. $P_{\rm HB}$) be the transition matrix of the Swendsen-Wang (resp. heat-bath) dynamics for the random-cluster model on a graph with $m$ edges. Then

$$\lambda(P_{\rm SW}) \le 16\,m\log(m)\,\lambda(P_{\rm HB}).$$

Using the corresponding lower bound, which was proven in [U12], we obtain that SW is rapidly mixing if and only if HB is rapidly mixing, since the spectral gaps can differ only by a polynomial in the number of edges of the graph. Furthermore we prove that the heat-bath dynamics for the RC model on a planar graph $G$ with parameters $p$ and $q$ has the same spectral gap as the heat-bath dynamics for the dual model, which is the random-cluster model on the dual graph $G^\dagger$ (see Section 5.1 for definitions) with parameters $p^*$ and $q$, where $p^*$ satisfies $\frac{p^*}{1-p^*}=\frac{q(1-p)}{p}$. We denote the dynamics for the dual model by $P^\dagger_{\rm SW}$ (resp. $P^\dagger_{\rm HB}$). This was probably known before, but we could not find a reference. It follows

###### Corollary 2.

Let $P_{\rm SW}$ be the transition matrix of the Swendsen-Wang dynamics for the random-cluster model on a planar graph $G$ with $m$ edges and let $P^\dagger_{\rm SW}$ be the SW dynamics for the dual model. Then there exists a constant $c$, such that

$$\lambda(P_{\rm SW}) \le c\,m\log(m)\,\lambda(P^\dagger_{\rm SW}).$$

If we consider the two-dimensional square lattice of side length , i.e. the graph with and , we can deduce the following from the results of [U11].

###### Theorem 3.

Let be the transition matrix of the Swendsen-Wang dynamics for the random-cluster model on with parameters and . Let . Then there exist constants such that

•     for ,

•  for ,

•      for and ,

where .

An immediate consequence is the following corollary.

###### Corollary 4.

Let be the transition matrix of the Swendsen-Wang dynamics for the -state Potts model on at inverse temperature . Let . Then there exist constants such that

•     for ,

•  for ,

•      for and ,

where .

This seems to be the first proof of rapid mixing of a classical Markov chain for the Ising model at all temperatures. In fact, in [U11] a somewhat artificial Markov chain, which makes an additional step on the dual graph, is proven to be rapidly mixing.

We also obtain for the heat-bath dynamics

###### Theorem 5.

Let be the transition matrix of the heat-bath dynamics for the random-cluster model on with parameters and . Let . Then there exist constants such that

•  for ,

•      for and ,

where .

The results of [U11], and hence the proofs of Theorems 3 and 5, rely ultimately on the rapid mixing results for the heat-bath dynamics for the Potts (resp. Ising) model that were proven over the last decades. To name only some of them, see e.g. [LS10], [MO94], [MOS94] and [Ale98] together with the proof of exponential decay of connectivities up to the critical temperature from [BD10]. These articles give an almost complete picture of what is known so far about mixing of single-spin dynamics on the two-dimensional square lattice.

The plan of this article is as follows. In Section 2, we introduce the necessary notation related to the spectral gap of Markov chains. Section 3 contains a more detailed description of the algorithms and the definition of the “building blocks” that are necessary to represent the dynamics on the FKES model. In Section 4 we will prove Theorem 1, and in Section 5 we introduce the notion of dual graphs and prove the remaining results from above.

## 2 Spectral gap and mixing time

As stated in the introduction, we want to estimate the efficiency of Markov chains. For an introduction to Markov chains and techniques to bound the convergence rate to the stationary distribution, see e.g. [LPW09]. In this article we consider the spectral gap as measure of the efficiency. Let $P$ be the transition matrix of a Markov chain with state space $\Omega$ that is ergodic, i.e. irreducible and aperiodic, and has unique stationary measure $\pi$. Additionally let the Markov chain be reversible with respect to $\pi$, i.e.

$$\pi(x)P(x,y) = \pi(y)P(y,x) \quad\text{for all } x,y\in\Omega.$$

Then we know that the spectral gap of the Markov chain can be expressed in terms of norms of the (Markov) operator $P$ that maps from $L_2(\pi)$ to $L_2(\pi)$, where inner product and norm are given by $\langle f,g\rangle_\pi=\sum_{x\in\Omega}f(x)g(x)\pi(x)$ and $\|f\|_\pi^2=\sum_{x\in\Omega}f(x)^2\pi(x)$, respectively. The operator is defined by

$$Pf(x) := \sum_{y\in\Omega}P(x,y)f(y) \tag{1}$$

and represents the expected value of the function $f$ after one step of the Markov chain starting in $x\in\Omega$. The operator norm of $P$ is

$$\|P\|_\pi := \|P\|_{L_2(\pi)\to L_2(\pi)} = \max_{\|f\|_\pi\le 1}\|Pf\|_\pi$$

and we use $\|\cdot\|_\pi$ interchangeably for functions and operators, because it will be clear from the context which norm is used. It is well known that $\lambda(P)=1-\|P-S_\pi\|_\pi$ for reversible $P$, where $S_\pi(x,y)=\pi(y)$, and that reversibility of $P$ is equivalent to self-adjointness of the corresponding Markov operator, i.e. $P=P^*$, where $P^*$ is the (adjoint) operator that satisfies $\langle f,Pg\rangle_\pi=\langle P^*f,g\rangle_\pi$ for all $f,g\in L_2(\pi)$. The transition matrix that corresponds to the adjoint operator satisfies

$$P^*(x,y) = \frac{\pi(y)}{\pi(x)}\,P(y,x).$$

If we are considering a family of state spaces with a corresponding family of Markov chains , we say that the chain is rapidly mixing for the given family if for all and some .

In several (or probably most of the) articles on mixing properties of Markov chains the authors prefer to use the mixing time as measure of efficiency, which is defined by

$$\tau(P) := \min\Big\{t:\ \max_{x\in\Omega}\sum_{y\in\Omega}\big|P^t(x,y)-\pi(y)\big|\le\frac1e\Big\}.$$
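To make the definition concrete, the mixing time of a very small chain can be computed directly from matrix powers. The sketch below is only an illustration: the two-state chain, the parameter values and the function names `mat_mul` and `mixing_time` are assumptions chosen for this example.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mixing_time(P, pi, t_max=10_000):
    """Smallest t >= 1 with max_x sum_y |P^t(x,y) - pi(y)| <= 1/e."""
    n = len(pi)
    Pt = P
    for t in range(1, t_max + 1):
        worst = max(sum(abs(Pt[x][y] - pi[y]) for y in range(n)) for x in range(n))
        if worst <= 1 / math.e:
            return t
        Pt = mat_mul(Pt, P)
    raise ValueError("chain did not mix within t_max steps")

# Hypothetical two-state chain with flip probabilities a and b.
a = b = 0.25
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]   # stationary distribution
gap = a + b                       # eigenvalues are 1 and 1 - a - b (valid for a + b <= 1)
tau = mixing_time(P, pi)
```

For this chain the distance after $t$ steps is $2^{-t}$, so the threshold $1/e$ is first met at $t=2$, which is consistent with spectral gap $1/2$.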

The mixing time and spectral gap of a Markov chain (on finite state spaces) are closely related by the following inequality, see e.g. [LPW09, Theorem 12.3 & 12.4].

###### Lemma 6.

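Let $P$ be the transition matrix of a reversible, ergodic Markov chain with state space $\Omega$ and stationary distribution $\pi$. Then

$$\lambda(P)^{-1}-1 \le \tau(P) \le \log\Big(\frac{2e}{\pi_{\min}}\Big)\,\lambda(P)^{-1},$$

where $\pi_{\min}:=\min_{x\in\Omega}\pi(x)$.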

In particular, we obtain the following for the random-cluster model.

###### Corollary 7.

Let $P$ be the transition matrix of a reversible, ergodic Markov chain for the random-cluster model on $G=(V,E)$ with parameters $p$ and $q$. Then

$$\lambda(P)^{-1}-1 \le \tau(P) \le \Big(2+|E|\log\frac{1}{p(1-p)}+|V|\log q\Big)\,\lambda(P)^{-1}.$$

Therefore, all results of this article can also be written in terms of the mixing time, losing the same factor as in Corollary 7.

## 3 Joint representation of the algorithms

In order to make our description of the considered Markov chains complete, we state in this section formulas for their transition matrices. Additionally, we introduce another local Markov chain that will be necessary for the further analysis and introduce a representation of the dynamics on the (joint) FKES model.

The Swendsen-Wang dynamics (on the RC model), as stated in the introduction, is based on the given connection of the random-cluster and Potts models and has the transition matrix

$$P_{\rm SW}(A,B) = q^{-c(A)}\Big(\frac{p}{1-p}\Big)^{|B|}\sum_{\sigma\in\Omega_{\rm P}}(1-p)^{|E(\sigma)|}\,\mathbb{1}\big(A\cup B\subset E(\sigma)\big). \tag{2}$$

Recall that we denote by the Swendsen-Wang dynamics for the Potts model and note that both dynamics have the same spectral gap, see [U11].

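The second algorithm we want to analyze is the (lazy) heat-bath dynamics. Let the graph $G=(V,E)$ be given and denote by $u\overset{A}{\longleftrightarrow}v$ (resp. $u\overset{A}{\nleftrightarrow}v$) that $u$ and $v$ are connected (resp. not connected) in the subgraph $(V,A)$. Additionally we use throughout this article $A\cup e$ instead of $A\cup\{e\}$ (respectively for $\setminus$) and denote the endpoints of $e$ by $e^{(1)}$ and $e^{(2)}$, i.e. $e=\{e^{(1)},e^{(2)}\}$. For $A,B\subseteq E$, $A\neq B$, the transition probabilities of the HB dynamics are given by

$$P_{\rm HB}(A,B) := \frac{1}{2|E|}\sum_{e\in E}\frac{\mu(B)}{\mu(A\cup e)+\mu(A\setminus e)}\,\mathbb{1}\big(A\ominus B\subset e\big), \tag{3}$$

where $\ominus$ denotes the symmetric difference and $P_{\rm HB}(A,A)$ is chosen such that $P_{\rm HB}$ is stochastic. Hence, $P_{\rm HB}$ satisfies

$$P_{\rm HB}(A,B) = \frac{1}{2|E|}\sum_{e\in E}\begin{cases} p, & \text{for } B=A\cup e \text{ and } e^{(1)}\overset{A\setminus e}{\longleftrightarrow}e^{(2)}\\[2pt] 1-p, & \text{for } B=A\setminus e \text{ and } e^{(1)}\overset{A\setminus e}{\longleftrightarrow}e^{(2)}\\[2pt] \dfrac{p}{p+q(1-p)}, & \text{for } B=A\cup e \text{ and } e^{(1)}\overset{A\setminus e}{\nleftrightarrow}e^{(2)}\\[6pt] \dfrac{q(1-p)}{p+q(1-p)}, & \text{for } B=A\setminus e \text{ and } e^{(1)}\overset{A\setminus e}{\nleftrightarrow}e^{(2)} \end{cases}$$

for $A\neq B$. The heat-bath dynamics has the advantage that the corresponding HB dynamics for the dual model has the same spectral gap, see Section 5.1. Unfortunately, this Markov chain does not admit a representation on the joint model like the SW dynamics. Therefore we introduce the following (non-lazy) local dynamics with transition probabilities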

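$$P_{\rm SB}(A,B) = \frac{1}{|E|}\sum_{e\in E}\begin{cases} p, & \text{for } B=A\cup e \text{ and } e^{(1)}\overset{A}{\longleftrightarrow}e^{(2)}\\[2pt] 1-p, & \text{for } B=A\setminus e \text{ and } e^{(1)}\overset{A}{\longleftrightarrow}e^{(2)}\\[2pt] \dfrac{p}{q}, & \text{for } B=A\cup e \text{ and } e^{(1)}\overset{A}{\nleftrightarrow}e^{(2)}\\[6pt] 1-\dfrac{p}{q}, & \text{for } B=A\setminus e \text{ and } e^{(1)}\overset{A}{\nleftrightarrow}e^{(2)}. \end{cases} \tag{4}$$

We call this Markov chain the single-bond dynamics (SB). This chain is inspired by the Swendsen-Wang dynamics since $P_{\rm SB}=P_{\rm SW}$ for a graph that consists of two vertices connected by a single edge. Note that $P_{\rm SW}$, $P_{\rm HB}$ and $P_{\rm SB}$ are reversible with respect to $\mu$.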
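The reversibility of the two local chains can be checked exactly on a small graph. The following sketch builds both transition matrices from the case distinctions in (3) and (4) with rational arithmetic; it assumes the RC weight $(p/(1-p))^{|A|}q^{c(A)}$, and the function names (`connected`, `rc_weight`, `local_chain`, `is_reversible`) and the triangle example are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def connected(edges, u, v):
    """True if u and v are joined by a path that uses only the given edges."""
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for a, b in edges:
            for s, t in ((a, b), (b, a)):
                if s == x and t not in seen:
                    seen.add(t)
                    stack.append(t)
    return v in seen

def rc_weight(n, A, p, q):
    """Unnormalized RC weight (p/(1-p))^|A| * q^c(A) on vertices 0..n-1 (assumed form)."""
    comps = sum(1 for v in range(n) if not any(connected(A, u, v) for u in range(v)))
    return (p / (1 - p)) ** len(A) * q ** comps

def local_chain(edges, p, q, kind):
    """Transition probabilities {(A, B): prob} of the "SB" chain or the lazy "HB" chain."""
    m, P = len(edges), {}
    states = [frozenset(c) for r in range(m + 1) for c in combinations(edges, r)]
    for A in states:
        for e in edges:
            if kind == "SB":   # as in (4): connectivity is tested in A, no laziness
                add, w = (p if connected(A, *e) else p / q), Fraction(1, m)
            else:              # as in (3): connectivity is tested in A \ e, factor 1/2
                add = p if connected(A - {e}, *e) else p / (p + q * (1 - p))
                w = Fraction(1, 2 * m)
            for B, prob in ((A | {e}, add), (A - {e}, 1 - add)):
                P[A, B] = P.get((A, B), 0) + w * prob
        if kind == "HB":       # remaining mass 1/2: the chain stays where it is
            P[A, A] = P.get((A, A), 0) + Fraction(1, 2)
    return states, P

def is_reversible(n, states, P, p, q):
    """Exact check of detailed balance mu(A) P(A,B) = mu(B) P(B,A)."""
    return all(rc_weight(n, A, p, q) * P.get((A, B), 0)
               == rc_weight(n, B, p, q) * P.get((B, A), 0)
               for A in states for B in states)

# Hypothetical example: a triangle with p = 1/3 and q = 2.
triangle = [(0, 1), (1, 2), (0, 2)]
p, q = Fraction(1, 3), 2
results = {}
for kind in ("SB", "HB"):
    states, P = local_chain(triangle, p, q, kind)
    rows_ok = all(sum(P.get((A, B), 0) for B in states) == 1 for A in states)
    results[kind] = (rows_ok, is_reversible(3, states, P, p, q))
```

Using `Fraction` keeps every probability exact, so the detailed-balance test is an equality check rather than a floating-point comparison.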

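Before we state the representation of $P_{\rm SW}$ and $P_{\rm SB}$ on the FKES model, we show that the spectral gaps of $P_{\rm SB}$ and $P_{\rm HB}$ are closely related. For this let $I(A,B):=\mathbb{1}(A=B)$ and note that $\frac{I+P_{\rm SB}}{2}$ is the transition matrix of the lazy single-bond dynamics.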

###### Lemma 8.

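For $P_{\rm SB}$ and $P_{\rm HB}$ for the random-cluster model with parameters $p$ and $q$ we have

$$\frac12\,\lambda(P_{\rm SB}) \le \lambda(P_{\rm HB}) \le \big(1-p(1-q^{-1})\big)^{-1}\,\lambda\Big(\frac{I+P_{\rm SB}}{2}\Big).$$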
###### Proof.

Using standard comparison ideas, e.g. from [DSC93, Section 2.A], we obtain that for two transition matrices and , for all implies , where for lazy Markov chains the inequality for all is sufficient. Additionally we have in general . Therefore it is enough to prove for all , which is easy to check.

We want to represent the Swendsen-Wang and the single-bond dynamics on the FKES model, which consists of the product state space $\Omega_{\rm P}\times\Omega_{\rm RC}$ and the FKES measure $\nu$. This was done first in [U12] and we follow the steps from this article. First we introduce the stochastic matrix that defines the mapping (by matrix multiplication) from the RC to the FKES model

$$M\big(B,(\sigma,A)\big) := q^{-c(B)}\,\mathbb{1}(A=B)\,\mathbb{1}\big(B\subset E(\sigma)\big). \tag{5}$$

Note that $M$ defines an operator (like in (1)) that maps from $L_2(\nu)$ to $L_2(\mu)$ and its adjoint operator $M^*$ can be given by the (stochastic) matrix

$$M^*\big((\sigma,A),B\big) = \mathbb{1}(A=B).$$

The following matrix represents the updates of the RC “coordinate” in the FKES model. For $e\in E$, $\sigma,\tau\in\Omega_{\rm P}$ and $A,B\subseteq E$ let

$$T_e\big((\sigma,A),(\tau,B)\big) := \mathbb{1}(\sigma=\tau)\begin{cases} p, & B=A\cup e \text{ and } \sigma(e^{(1)})=\sigma(e^{(2)})\\ 1-p, & B=A\setminus e \text{ and } \sigma(e^{(1)})=\sigma(e^{(2)})\\ 1, & B=A\setminus e \text{ and } \sigma(e^{(1)})\neq\sigma(e^{(2)}). \end{cases} \tag{6}$$

The following simple lemma shows some interesting properties of the matrices from (5) and (6), e.g. $\{T_e\}_{e\in E}$ is a family of commuting projections in $L_2(\nu)$. This will be important in the proof of the main result.

###### Lemma 9.

Let , and be the matrices from above. Then

1. and are self-adjoint in .

2. and for all .

3. and .

Now we can state the desired Markov chains with the matrices from above.

###### Lemma 10.

Let , and be the matrices from above. Then

1. .

2. .

From Lemma 9 we have that the order of multiplication in is unimportant. Lemma 9 and 10 were proven in [U12, Lemma 3&4], but note that in this article is the lazy version of from here.

## 4 Proof of Theorem 1

In this section we will prove Theorem 1. This is done in two subsections. In the first one we prove some general norm estimates for operators on (resp. between) Hilbert spaces. In the second subsection we will apply these estimates to the setting from above to obtain the result.

### 4.1 Technical lemmas

In this section we provide some technical lemmas that will be necessary for the analysis. We state them in a general form, because we believe that they could also be useful in other settings. First let us introduce the notation. Throughout this section consider two Hilbert spaces $H_1$ and $H_2$ with the corresponding inner products $\langle\cdot,\cdot\rangle_{H_1}$ and $\langle\cdot,\cdot\rangle_{H_2}$. The norms in $H_1$ and $H_2$ are defined as usual as the square root of the inner product of a function with itself. Additionally, we denote by $\|\cdot\|_{H_1}$ (resp. $\|\cdot\|_{H_2\to H_1}$) the operator norms of operators mapping from $H_1$ to $H_1$ (resp. $H_2$ to $H_1$). We consider two bounded, linear operators, $R$ and $T$. The operator $R$ maps from $H_2$ to $H_1$ and has the adjoint $R^*$, i.e. $R^*:H_1\to H_2$ with $\langle R^*f,g\rangle_{H_2}=\langle f,Rg\rangle_{H_1}$ for all $f\in H_1$ and $g\in H_2$. The operator $T$ is self-adjoint and acts on $H_2$. Obviously, $RTR^*$ is then self-adjoint on $H_1$.

###### Lemma 11.

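In the setting from above let $T$ be also positive, i.e. $\langle Tg,g\rangle_{H_2}\ge 0$ for all $g\in H_2$, then

$$\big\|RT^{k+1}R^*\big\|_{H_1} \le \|T\|_{H_2}\,\big\|RT^{k}R^*\big\|_{H_1}.$$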

In the special case this lemma was used in [U12] to prove a lower bound on the spectral gap of SW. We will recall this result later.

###### Proof.

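By the assumptions, $T$ has a unique positive square root $\widetilde T$, i.e. $T=\widetilde T^2$, which is again self-adjoint, see e.g. [Kre78, Th. 9.4-2]. We obtain

$$\big\|RT^{k+1}R^*\big\|_{H_1} = \big\|R\widetilde T^{2k+2}R^*\big\|_{H_1} = \big\|R\widetilde T^{k+1}\big\|^2_{H_2\to H_1} \le \big\|R\widetilde T^{k}\big\|^2_{H_2\to H_1}\,\big\|\widetilde T\big\|^2_{H_2} = \big\|R\widetilde T^{2k}R^*\big\|_{H_1}\,\|T\|_{H_2} = \|T\|_{H_2}\,\big\|RT^{k}R^*\big\|_{H_1}.$$

In particular, if $\|T\|_{H_2}\le 1$ this proves monotonicity in $k$.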

###### Lemma 12.

In the setting from above let additionally , then

$$\|RTR^*\|^{2^k}_{H_1} \le \big\|RT^{2^k}R^*\big\|_{H_1}$$

for all .

###### Proof.

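The case $k=0$ is obvious. Now suppose the statement is correct for $k-1$, then

$$\|RTR^*\|^{2^k}_{H_1} = \Big(\|RTR^*\|^{2^{k-1}}_{H_1}\Big)^2 \le \big\|RT^{2^{k-1}}R^*\big\|^2_{H_1} \le \big\|RT^{2^{k-1}}\big\|^2_{H_2\to H_1}\,\|R^*\|^2_{H_1\to H_2} = \big\|RT^{2^{k-1}}T^{2^{k-1}}R^*\big\|_{H_1}\,\|RR^*\|_{H_1} \le \big\|RT^{2^k}R^*\big\|_{H_1},$$

which proves the statement for $k$. ∎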

The next corollary combines the statements of the last two lemmas to give a result similar to Lemma 12 for arbitrary exponents.

###### Corollary 13.

Additionally to the general assumptions of this section let be positive, and . Then

 ∥RTR∗∥2kH1≤∥∥RTkR∗∥∥H1

for all .

###### Proof.

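Let $l\in\mathbb{N}_0$ such that $2^l\le k\le 2^{l+1}$. Since $\|RTR^*\|_{H_1}\le 1$ by assumption, we obtain

$$\|RTR^*\|^{2k}_{H_1} \le \|RTR^*\|^{2^{l+1}}_{H_1} \overset{\text{L. 12}}{\le} \big\|RT^{2^{l+1}}R^*\big\|_{H_1} \overset{\text{L. 11}}{\le} \big\|RT^{k}R^*\big\|_{H_1}.$$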

### 4.2 Proof

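In this section we apply the estimates from the last one. Recall that we consider the dynamics on a graph $G=(V,E)$ with $m$ edges, i.e. $|E|=m$. Fix an arbitrary ordering $e_1,\dots,e_m$ of the edges $e\in E$. We set the Hilbert spaces from the last section to $H_1:=L_2(\mu)$ and $H_2:=L_2(\nu)$ and define the operators

$$T := \frac1m\sum_{i=1}^m T_{e_i}$$

and

$$\mathcal{T} := \prod_{i=1}^m T_{e_i}$$

with $T_{e_i}$ from (6). Note that $P_{\rm SB}=MTM^*$ and $P_{\rm SW}=M\mathcal{T}M^*$ by Lemma 10. Additionally we define

$$T_\alpha := \prod_{i=1}^m T_{e_i}^{\alpha_i}$$

for $\alpha\in\mathbb{N}_0^m$. By Lemma 9(ii) we obtain for $\alpha,\gamma\in\mathbb{N}_0^m$ that $T_\alpha=T_\gamma$ if and only if $\{i:\ \alpha_i=0\}=\{i:\ \gamma_i=0\}$. Furthermore, $T_\alpha=\mathcal{T}$ for every $\alpha$ with $\alpha_i>0$ for all $i=1,\dots,m$. We prove the following theorem.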

###### Theorem 14.

Let and be a bounded, linear operator with . Then

$$\big\|RT^{k}R^*\big\|_{H_1} \le (1-\varepsilon)\,\big\|R\mathcal{T}R^*\big\|_{H_1}+\varepsilon.$$
###### Proof.

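Define the index sets $I_{m,k}:=\{\alpha\in\mathbb{N}_0^m:\ \sum_{i=1}^m\alpha_i=k\}$ and $I^1_{m,k}:=\{\alpha\in I_{m,k}:\ \alpha_i>0 \text{ for all } i\}$. Let $I^0_{m,k}:=I_{m,k}\setminus I^1_{m,k}$ and denote by $\binom{k}{\alpha}$, for $\alpha\in I_{m,k}$, the multinomial coefficient. Obviously (by the multinomial theorem),

$$\sum_{\alpha\in I_{m,k}}\binom{k}{\alpha} = m^k$$

and

$$Z_{m,k} := \sum_{\alpha\in I^0_{m,k}}\binom{k}{\alpha} \le \sum_{i=1}^m\ \sum_{\alpha\in I^0_{m,k}:\ \alpha_i=0}\binom{k}{\alpha} = m\sum_{\gamma\in I_{m-1,k}}\binom{k}{\gamma} = m(m-1)^k.$$

We write

$$T^k = \Big(\frac1m\sum_{i=1}^m T_{e_i}\Big)^k = \frac{1}{m^k}\sum_{\alpha\in I_{m,k}}\binom{k}{\alpha}T_\alpha = \frac{1}{m^k}\sum_{\alpha\in I^1_{m,k}}\binom{k}{\alpha}T_\alpha + \frac{1}{m^k}\sum_{\alpha\in I^0_{m,k}}\binom{k}{\alpha}T_\alpha.$$

Note that we use for the second equality that the $T_e$'s are commuting by Lemma 9(ii). Here $T_\alpha=\prod_{i=1}^m T_{e_i}^{\alpha_i}$ and $\mathcal{T}=\prod_{i=1}^m T_{e_i}$ denotes the product of all $T_{e_i}$. Since we know that $T_\alpha=\mathcal{T}$ for every $\alpha\in I^1_{m,k}$ (note that $I^1_{m,k}=\emptyset$ for $k<m$) and $\|RT_\alpha R^*\|_{H_1}\le 1$ for every $\alpha\in I_{m,k}$, we obtain

$$\big\|RT^kR^*\big\|_{H_1} \le \frac{1}{m^k}\sum_{\alpha\in I^1_{m,k}}\binom{k}{\alpha}\big\|R\mathcal{T}R^*\big\|_{H_1} + \frac{1}{m^k}\sum_{\alpha\in I^0_{m,k}}\binom{k}{\alpha}\big\|RT_\alpha R^*\big\|_{H_1} \le \Big(1-\frac{Z_{m,k}}{m^k}\Big)\big\|R\mathcal{T}R^*\big\|_{H_1} + \frac{Z_{m,k}}{m^k}.$$

Using $Z_{m,k}\le m(m-1)^k$ and $\|R\mathcal{T}R^*\|_{H_1}\le 1$ it follows

$$\big\|RT^kR^*\big\|_{H_1} \le \Big(1-m\Big(1-\frac1m\Big)^k\Big)\big\|R\mathcal{T}R^*\big\|_{H_1} + m\Big(1-\frac1m\Big)^k.$$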

Setting yields the result. ∎
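The counting steps in the proof above can be checked numerically for small $m$ and $k$. The sketch below enumerates all exponent vectors $\alpha$ with $\sum_i\alpha_i=k$, sums the multinomial coefficients, and compares the mass of vectors with at least one zero entry against the bound $m(m-1)^k$. The function names `compositions`, `multinomial` and `split_mass` are hypothetical helpers for this illustration.

```python
from math import factorial

def compositions(m, k):
    """Yield all alpha in {0,1,2,...}^m with alpha_1 + ... + alpha_m = k."""
    if m == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(m - 1, k - first):
            yield (first,) + rest

def multinomial(k, alpha):
    """Multinomial coefficient k! / (alpha_1! * ... * alpha_m!)."""
    out = factorial(k)
    for a in alpha:
        out //= factorial(a)
    return out

def split_mass(m, k):
    """Return (total, Z): the sum over all alpha, and over alpha with some alpha_i = 0."""
    total = Z = 0
    for alpha in compositions(m, k):
        c = multinomial(k, alpha)
        total += c
        if 0 in alpha:
            Z += c
    return total, Z

# Hypothetical small case: m = 3 edges, k = 4 steps.
total, Z = split_mass(3, 4)
bound = 3 * (3 - 1) ** 4
```

In this case the total is $3^4=81$, the mass with a vanishing entry is $45$, and the bound $m(m-1)^k$ equals $48$, so the bound is not tight but of the right order.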

Now we are able to prove the comparison result for SW and SB dynamics. For this let for all and , which defines an operator (by (1)) that maps from to . The adjoint operator is then given by and thus, for all . For the proof we set

$$R := M-S_1.$$

It follows that , since , but and thus . This implies . Additionally, and .

###### Theorem 15.

Let $P_{\rm SW}$ (resp. $P_{\rm SB}$) be the transition matrix of the Swendsen-Wang (resp. single-bond) dynamics for the random-cluster model on a graph with $m$ edges. Then

$$\lambda(P_{\rm SW}) \le 8\,m\log m\ \lambda(P_{\rm SB}).$$
###### Proof.

Let . Then

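$$\lambda(P_{\rm SW}) = 1-\big\|R\mathcal{T}R^*\big\|_{H_1} \overset{\text{Th. 14}}{\le} 1-\frac{1}{1-\varepsilon}\Big(\big\|RT^kR^*\big\|_{H_1}-\varepsilon\Big) = \frac{1}{1-\varepsilon}\Big(1-\big\|RT^kR^*\big\|_{H_1}\Big) \overset{\text{Coro. 13}}{\le} \frac{1}{1-\varepsilon}\Big(1-\|RTR^*\|^{2k}_{H_1}\Big) \le \frac{2k}{1-\varepsilon}\Big(1-\|RTR^*\|_{H_1}\Big) = \frac{2k}{1-\varepsilon}\,\lambda(P_{\rm SB}),$$

where the last inequality comes from $1-x^{2k}\le 2k(1-x)$ for $x\in[0,1]$. Setting , we obtain . This proves the statement.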

Combining Lemma 8 and Theorem 15 proves Theorem 1.

## 5 Proof of Theorems 3 and 5

In this section we introduce the notion of (planar) dual graphs and prove that the heat-bath dynamics on a planar graph $G$ has the same spectral gap as the heat-bath dynamics for the dual model on $G^\dagger$, which is the dual graph of $G$. This immediately implies Corollary 2 and hence that rapid mixing of the Swendsen-Wang dynamics for the random-cluster model and its dual model is equivalent. Finally, we use the known lower bounds on the spectral gap of SW on the two-dimensional square lattice at high temperatures to prove Theorem 3 and Theorem 5.

### 5.1 Dual graphs

Let $G=(V,E)$ be a planar graph, i.e. a graph that can be embedded into a sphere such that two edges of $G$ intersect only at a common endvertex. We fix such an embedding for $G$. Then we define the dual graph $G^\dagger=(V^\dagger,E^\dagger)$ of $G$ as follows. Place a dual vertex in each face, i.e. in each region of the sphere whose boundary consists of edges in the embedding of $G$, and connect two vertices by the dual edge $e^\dagger$ if and only if the corresponding faces of $G$ share the boundary edge $e$ (see e.g. [Gri10, Section 8.5]). Note that the dual graph certainly depends on the embedding used. It is clear that the number of vertices can differ in the dual graph, but we have the same number of edges.

Additionally we define a dual RC configuration $A^\dagger\subseteq E^\dagger$ in $G^\dagger$ to an RC state $A\subseteq E$ in $G$ by

$$e\in A \iff e^\dagger\notin A^\dagger,$$

where $e^\dagger$ is the edge in $E^\dagger$ that intersects $e$ in our (fixed) embedding. (By construction, this edge is unique.)

It is easy to obtain (see [Gri06, p. 134]) that the random-cluster models on the (finite) graphs $G$ and $G^\dagger$ are related by the equality

 μGp,q(A)=μG