# Ising models on locally tree-like graphs

## Abstract

We consider ferromagnetic Ising models on graphs
that converge locally to trees. Examples include random regular
graphs with bounded degree and uniformly random graphs with
bounded average degree. We prove that the “cavity”
prediction for the
limiting
free energy per spin is correct for
*any positive temperature and external field*.
Further, local marginals can be approximated
by iterating a set of mean field (cavity) equations.
Both results are achieved by proving the local convergence
of the Boltzmann distribution on the original graph to the Boltzmann
distribution on the appropriate infinite random tree.

DOI: 10.1214/09-AAP627. Volume 20, Issue 2 (2010), pages 565–592.

Amir Dembo and Andrea Montanari (montanari@stanford.edu). Supported in part by NSF Grant DMS-08-06211.

AMS subject classifications: Primary 82B44; secondary 82B23, 60F10, 60K35, 05C80, 05C05. Keywords: Ising model, random sparse graphs, cavity method, Bethe measures, belief propagation, local weak convergence.

## 1 Introduction

A ferromagnetic *Ising model on the finite graph $G$* (with vertex set $V$, and edge set $E$) is defined by the following Boltzmann distribution over $x = \{x_i : i \in V\}$, with $x_i \in \{+1, -1\}$:

(1) $\mu(x) = \frac{1}{Z(\beta, B)} \exp\Bigl\{\beta \sum_{(i,j)\in E} x_i x_j + B \sum_{i\in V} x_i\Bigr\}.$

These distributions are parametrized by the
"magnetic field" $B$ and "inverse temperature" $\beta \ge 0$,
where the partition function $Z(\beta, B)$ is
fixed by the normalization condition $\sum_x \mu(x) = 1$.
Throughout the paper, we will be interested in sequences of
graphs $\{G_n\}_{n\in\mathbb{N}}$ of diverging size.
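To make the Boltzmann distribution concrete, here is a minimal brute-force sketch (with illustrative toy parameters; the function name is ours, not from the paper) that enumerates all $2^n$ configurations of a small graph, computes the partition function by the normalization condition, and returns the distribution:

```python
import itertools
import math

def ising_boltzmann(edges, n, beta, B):
    """Exhaustively compute Z(beta, B) and the Boltzmann distribution
    on a small graph with vertices 0..n-1 (feasible only for tiny n)."""
    weights = {}
    for x in itertools.product([+1, -1], repeat=n):
        # Hamiltonian contribution: beta * sum over edges + B * sum over spins.
        energy = beta * sum(x[i] * x[j] for i, j in edges) + B * sum(x)
        weights[x] = math.exp(energy)
    Z = sum(weights.values())          # normalization constant
    return Z, {x: w / Z for x, w in weights.items()}

# A 4-cycle at inverse temperature beta = 0.5, magnetic field B = 0.1.
Z, mu = ising_boltzmann([(0, 1), (1, 2), (2, 3), (3, 0)], 4, 0.5, 0.1)
```

At $B = 0$ the distribution is invariant under the global spin flip $x \mapsto -x$, which is a quick sanity check on the implementation.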

Nonrigorous statistical mechanics techniques, such as the "replica"
and "cavity" methods, allow one to make a number of predictions
on the model (1) when the graph $G$
"lacks any finite-dimensional structure."
The most basic quantity in this context is the
asymptotic *free entropy density*

(2) $\phi(\beta, B) \equiv \lim_{n\to\infty} \frac{1}{n} \log Z_n(\beta, B)$

(this quantity is also sometimes called the free energy or pressure in the literature). The limit free entropy density and the large deviation properties of the Boltzmann distribution were characterized in great detail [9] in the case of a complete graph (the inverse temperature must then be scaled by $1/n$ to get a nontrivial limit). Statistical physics predictions exist, however, for a much wider class of graphs, including most notably sparse random graphs with bounded average degree; see, for instance, [8, 15, 18]. This is a direction of interest for at least two reasons:

(i) Sparse graphical structures arise in a number of problems from combinatorics and theoretical computer science. Examples include random satisfiability, coloring of random graphs and graph partitioning [21]. In all of these cases, the uniform measure over solutions can be regarded as the Boltzmann distribution for a modified spin glass with multispin interactions. Such problems have been successfully attacked using nonrigorous statistical mechanics techniques. A mathematical foundation of this approach is still lacking, and would be extremely useful.

(ii) Sparse graphs allow one to introduce a nontrivial notion of distance between vertices, namely the length of the shortest path connecting them. This geometrical structure allows for new characterizations of the measure (1) in terms of correlation decay. This type of characterization is in turn related to the theory of Gibbs measures on infinite trees [17].

The asymptotic free entropy density (2) was determined rigorously only in a few cases for sparse graphs. In [11], this task was accomplished for random regular graphs. De Sanctis and Guerra [7] developed interpolation techniques for random graphs with independent edges (Erdős–Rényi type) but only determined the free entropy density at high temperature and at zero temperature (in both cases with vanishing magnetic field). The latter is in fact equivalent to counting the number of connected components of a random graph. Interestingly, the partition function can be approximated in polynomial time for $\beta > 0$, using an appropriate Markov chain Monte Carlo algorithm [14]. It is intriguing that no general approximation algorithm exists in the case $\beta < 0$ (the "antiferromagnetic" Ising model). Correspondingly, the statistical physics conjecture for the free entropy density [21] becomes significantly more intricate (presenting the so-called "replica symmetry breaking" phenomenon).

In this paper we generalize the previous results by rigorously verifying the validity of the Bethe free entropy prediction for the value of the limit in (2) for generic graph sequences that converge locally to trees. Indeed, we control the free entropy density by proving that the Boltzmann measure (1) converges locally to the Boltzmann measure of a model on a tree. The philosophy is related to the local weak convergence method of [2].

Finally, several of the proofs have an algorithmic interpretation,
providing an efficient procedure for approximating the local marginals
of the
Boltzmann measure. The essence of this procedure consists in solving
by iteration certain mean field (cavity) equations.
Such an algorithm is known in artificial intelligence and computer
science under the name of *belief propagation*. Despite its success
and wide applicability, only weak performance guarantees
have been proved so far. Typically, it is possible to prove its correctness
in the high temperature regime, as a consequence
of a uniform decay of correlations holding there (spatial mixing)
[26, 3, 23].
The behavior of iterative inference algorithms on Ising models
was recently considered in [22, 24].

The emphasis of the present paper is on the low-temperature regime in which uniform decorrelation does not hold. We are able to prove that belief propagation converges exponentially fast on any graph, and that the resulting estimates are asymptotically exact for large locally tree-like graphs. The main idea is to introduce a magnetic field to break explicitly the symmetry, and to carefully exploit the monotonicity properties of the model.

A key step consists of estimating the correlation between the root spin of an Ising model on a tree and positive boundary conditions. Ising models on trees are interesting per se, and have been the object of significant mathematical work; see, for instance, [20, 16, 10]. The question considered here appears, however, to be novel.

The next section provides the basic technical definitions (in particular concerning graphs and local convergence to trees), and the formal statement of our main results. Notation and certain key tools are described in Section 3 with Section 4 devoted to proofs of the relevant properties of Ising models on trees (which are of independent interest). The latter are used in Sections 5 and 6 to derive our main results concerning models on tree-like graphs. A companion paper [5] deals with the related challenging problem of spin glass models on sparse graphs.

## 2 Definitions and main results

The next subsections contain some basic definitions on graph sequences and the notion of local convergence to random trees. Sections 2.2 and 2.3 present our results on the free entropy density and the algorithmic implications of our analysis.

### 2.1 Locally tree-like graphs

Let $P = \{P_k : k \ge 0\}$ be a probability distribution over the nonnegative integers, with finite, positive first moment, and denote by

(3) $\rho_k = \frac{(k+1) P_{k+1}}{\sum_{l} l P_l}$

its size-biased version. For any $t \in \mathbb{N}$, we let $\mathtt{T}(\rho, t)$ denote the random rooted tree generated as follows. First draw an integer $K$ with distribution $\rho$, and connect the root to $K$ offspring. Then recursively, for each node in the last generation, generate an integer independently with distribution $\rho$, and connect the node to that many new nodes. This is repeated until the tree has $t$ generations.

Sometimes it will be useful to consider the ensemble $\mathtt{T}(P, \rho, t)$ whereby the root node has degree $k$ with probability $P_k$. We will drop the degree distribution arguments from $\mathtt{T}(\rho, t)$ or $\mathtt{T}(P, \rho, t)$ and write $\mathtt{T}(t)$ whenever clear from the context. Notice that the infinite trees $\mathtt{T}(\rho, \infty)$ and $\mathtt{T}(P, \rho, \infty)$ are well defined.

The average branching factor of trees $\mathtt{T}(\rho, t)$ will be denoted by $\bar{B}$, and the average root degree of $\mathtt{T}(P, \rho, t)$ by $\bar{d}$. In formulae,

(4) $\bar{B} \equiv \sum_k k \rho_k, \qquad \bar{d} \equiv \sum_k k P_k.$
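The size-biasing step and the generation-by-generation construction of the random tree can be sketched as follows (a simple Monte Carlo sampler; the degree distribution `P` below is an arbitrary illustrative choice, and function names are ours):

```python
import random

def size_biased(P):
    """Size-biased law rho_k = (k+1) P_{k+1} / sum_l l P_l."""
    mean = sum(k * p for k, p in enumerate(P))
    return [(k + 1) * P[k + 1] / mean for k in range(len(P) - 1)]

def sample_tree(rho, t, rng=random):
    """Sample t generations of the random tree: every node gets an
    independent rho-distributed offspring number.  Returned as a list
    of per-generation offspring counts."""
    def draw():
        u, acc = rng.random(), 0.0
        for k, p in enumerate(rho):
            acc += p
            if u < acc:
                return k
        return len(rho) - 1
    generations, current = [], 1
    for _ in range(t):
        counts = [draw() for _ in range(current)]
        generations.append(counts)
        current = sum(counts)
        if current == 0:          # the tree died out early
            break
    return generations

P = [0.0, 0.0, 0.5, 0.5]   # half the vertices of degree 2, half of degree 3
rho = size_biased(P)        # offspring law of the non-root generations
B_bar = sum(k * r for k, r in enumerate(rho))   # average branching factor
```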

We denote by $G = (V, E)$ a graph with vertex set $V = [n] \equiv \{1, \dots, n\}$. The distance $d(i,j)$ between $i, j \in V$ is the length of the shortest path from $i$ to $j$ in $G$. Given a vertex $i \in V$, we let $\mathtt{B}_i(t)$ be the set of vertices whose distance from $i$ is at most $t$. With a slight abuse of notation, $\mathtt{B}_i(t)$ will also denote the subgraph induced by those vertices. For $i \in V$, we let $\partial i$ denote the set of its neighbors, and $|\partial i|$ its size (i.e., the degree of $i$).

This paper is concerned with sequences of graphs $\{G_n\}_{n\in\mathbb{N}}$
of diverging size that converge locally to trees.
Consider two trees $\mathtt{T}_1$ and $\mathtt{T}_2$ with vertices labeled arbitrarily.
We shall write $\mathtt{T}_1 \simeq \mathtt{T}_2$ if the two trees become identical when vertices
are relabeled from $1$ to $|\mathtt{T}|$, in a
breadth-first fashion, and following lexicographic order among siblings.

###### Definition 2.1

Considering a sequence of graphs $\{G_n\}_{n\in\mathbb{N}}$, let
$\mathbb{P}_n$ denote the law induced on the ball $\mathtt{B}_i(t)$
in $G_n$
centered at a uniformly chosen random vertex $i \in V_n$.
We say that $\{G_n\}$ *converges locally* to the random tree
$\mathtt{T}(\rho, \infty)$
if, for any $t \ge 0$, and any rooted tree $\mathtt{T}$
with $t$ generations,

(5) $\lim_{n\to\infty} \mathbb{P}_n\{\mathtt{B}_i(t) \simeq \mathtt{T}\} = \mathbb{P}\{\mathtt{T}(\rho, t) \simeq \mathtt{T}\}.$

We say that a sequence of graphs $\{G_n\}_{n\in\mathbb{N}}$
is *uniformly sparse* if

(6) $\lim_{l\to\infty} \limsup_{n\to\infty} \frac{1}{n} \sum_{i\in V_n} |\partial i|\, \mathbb{1}\{|\partial i| \ge l\} = 0.$

### 2.2 Free entropy

According to the statistical physics derivation [18], the model (1) has a line of first-order phase transitions for $B = 0$ and $\beta > \beta_c$ [i.e., where the continuous function $B \mapsto \phi(\beta, B)$ exhibits a discontinuous derivative]. The critical temperature $\beta_c$ depends on the graph only through the average branching factor $\bar{B}$ and is determined by the condition

(7) $\bar{B} \tanh(\beta_c) = 1.$

Notice that $\beta_c \approx 1/\bar{B}$ for large degrees.
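On a tree of average branching factor $\bar{B}$, the standard threshold condition $\bar{B} \tanh(\beta_c) = 1$ can be solved in closed form; a minimal sketch (assuming that form of the condition):

```python
import math

def beta_critical(B_bar):
    """Solve B_bar * tanh(beta_c) = 1; no finite solution when B_bar <= 1."""
    if B_bar <= 1.0:
        return math.inf   # tree too thin: no phase transition
    return math.atanh(1.0 / B_bar)

bc = beta_critical(2.0)          # finite above the threshold B_bar > 1
bc_large = beta_critical(100.0)  # approximately 1 / B_bar for large degrees
```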

The asymptotic free-entropy density is given in terms of the fixed point of a distributional recursion. One characterization of this fixed point is as follows.

###### Lemma 2.1

Consider the sequence of random variables $\{h^{(t)}\}_{t \ge 0}$ defined by $h^{(0)} \equiv 0$ identically and, for $t \ge 0$,

(8) $h^{(t+1)} \stackrel{d}{=} B + \sum_{i=1}^{K} f(h_i^{(t)}),$

where $K$ is an integer-valued random variable of distribution $\rho$,

(9) $f(x) \equiv \operatorname{atanh}[\tanh(\beta) \tanh(x)],$

and the $h_i^{(t)}$'s are i.i.d. copies of $h^{(t)}$ that are independent of $K$. If $B > 0$ and $\rho$ has finite first moment, then the distributions of $h^{(t)}$ are stochastically monotone and $h^{(t)}$ converges in distribution to the unique fixed point $h^*$ of the recursion (8) that is supported on $[0, \infty)$.
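A standard way to approximate such a distributional recursion numerically is "population dynamics": represent the law of the cavity field at time $t$ by a large sample and apply the recursion to the sample. A minimal sketch, assuming the update $h^{(t+1)} = B + \sum_{i=1}^{K} \operatorname{atanh}[\tanh\beta \tanh h_i^{(t)}]$ with $K \sim \rho$ (function names are ours):

```python
import math
import random

def f(x, beta):
    """The cavity map f(x) = atanh(tanh(beta) * tanh(x))."""
    return math.atanh(math.tanh(beta) * math.tanh(x))

def population_dynamics(rho, beta, B, t_max, pop=10_000, rng=random):
    """Monte Carlo stand-in for the distributional recursion:
    a particle population approximating the law of h^(t)."""
    def draw_K():
        u, acc = rng.random(), 0.0
        for k, p in enumerate(rho):
            acc += p
            if u < acc:
                return k
        return len(rho) - 1
    h = [0.0] * pop                       # h^(0) = 0 identically
    for _ in range(t_max):
        h = [B + sum(f(rng.choice(h), beta) for _ in range(draw_K()))
             for _ in range(pop)]
    return h

# Degenerate case rho_2 = 1 (every node has 2 offspring): the
# population stays deterministic, so convergence is easy to inspect.
h = population_dynamics([0.0, 0.0, 1.0], beta=0.3, B=0.2, t_max=20, pop=200)
```

With $B > 0$ the sampled fields stay positive, in line with the stochastic monotonicity asserted by the lemma.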

Our next result confirms the statistical physics prediction for the free-entropy density.

###### Theorem 2.2

Let $\{G_n\}_{n\in\mathbb{N}}$ be a sequence of uniformly sparse graphs that converges locally to $\mathtt{T}(P, \rho, \infty)$. If $\rho$ has finite first moment (that is, if $P$ has finite second moment), then for any $B \in \mathbb{R}$ and $\beta \ge 0$ the following limit exists:

(10) $\phi(\beta, B) = \lim_{n\to\infty} \frac{1}{n} \log Z_n(\beta, B).$

Moreover, for $B > 0$ the limit is given by

(11) $\phi(\beta, B) = \frac{\bar{d}}{2} \log\cosh(\beta) - \frac{\bar{d}}{2}\, \mathbb{E} \log\bigl[1 + \tanh(\beta)\tanh(h_1)\tanh(h_2)\bigr] + \mathbb{E} \log\Bigl\{ e^{B} \prod_{i=1}^{L} \bigl[1 + \tanh(\beta)\tanh(h_i)\bigr] + e^{-B} \prod_{i=1}^{L} \bigl[1 - \tanh(\beta)\tanh(h_i)\bigr] \Bigr\},$

where $L$ has distribution $P$ and is independent of the "cavity fields" $h_i$ that are i.i.d. copies of the fixed point $h^*$ of Lemma 2.1. Also, $\phi(\beta, -B) = \phi(\beta, B)$ and $\phi(\beta, 0)$ is the limit of $\phi(\beta, B)$ as $B \downarrow 0$.

The proof of Theorem 2.2 is based on two steps:

(a) Reduce the computation of $\phi(\beta, B)$
to computing expectations
of local (in $G_n$)
quantities with respect to the Boltzmann measure (1).
This is achieved by noticing that the derivative of
$\frac{1}{n}\log Z_n(\beta, B)$
with respect to $\beta$ is a sum of such expectations.

(b)
Show that expectations of local quantities on $G_n$ are well
approximated by the same expectations with respect to an Ising model on the
associated tree (for $t$ and $n$ large).
This is proved by showing that, on such a tree, local expectations are
insensitive to boundary conditions that stochastically dominate
free boundaries.

The theorem then follows by monotonicity arguments.
The key step is of course the last one. A stronger requirement would be that
these
expectation values are insensitive to any boundary condition,
which would coincide with uniqueness of the Gibbs measure on
$\mathtt{T}(\rho, \infty)$. Such a requirement would allow
for an elementary proof, but holds only at "high" temperature,
$\beta < \beta_c$.
Indeed, insensitivity to positive boundary conditions is proved
in Section 4 for
the following collection of
trees of conditionally independent (and of bounded
average) offspring numbers.

###### Definition 2.2

An infinite tree $\mathtt{T}$ rooted at the vertex $\emptyset$
is called *conditionally independent* if,
for each integer $k \ge 0$, conditional on
the subtree $\mathtt{T}(k)$ of the first $k$ generations of $\mathtt{T}$,
the number of offspring $\Delta_j$ for $j \in \partial\mathtt{T}(k)$
are independent of each other, where $\partial\mathtt{T}(k)$
denotes the set of vertices at generation $k$.
We further assume that the [conditional on $\mathtt{T}(k)$]
first moments of $\Delta_j$ are uniformly bounded by a given
nonrandom finite constant $\bar{\Delta}$.
Beyond the random tree $\mathtt{T}(\rho, \infty)$,
these include deterministic trees with bounded degrees
and certain multi-type branching processes (such as
random bipartite trees and percolation clusters
on deterministic trees of bounded degree).
Consequently, Theorem 2.2
extends to any uniformly sparse graph sequence
that converges locally to a random tree of
the form of Definition 2.2, except that
the formula for $\phi(\beta, B)$ is in general more
involved than the one given in (11).
For example, such an extension allows one to handle
uniformly random bipartite graphs with different
degree distributions for the two types of vertices.
While we refrain from formalizing and proving such
generalizations, we note in passing that our derivation
of the formula (11) implicitly
uses the fact that $\mathtt{T}(P, \rho, \infty)$ possesses the
involution invariance of [2]. As pointed out
in [1], every local limit of finite graphs
must have the involution invariance property (which
clearly not every conditionally independent tree has).

### 2.3 Algorithmic implications

The free entropy density is not the only quantity that can be characterized for Ising models on locally tree-like graphs. Indeed local marginals can be efficiently computed with good accuracy. The basic idea is to solve a set of mean field equations iteratively. These are known as Bethe–Peierls or cavity equations and the corresponding algorithm is referred to as “belief propagation” (BP).

More precisely, associate to each directed edge $i \to j$ in the graph $G$, with $(i,j) \in E$, a distribution $\nu^{(t)}_{i\to j}$ over $\{+1, -1\}$. In the computer science literature these distributions are referred to as "messages." They are updated as follows:

(12) $\nu^{(t+1)}_{i\to j}(x_i) = \frac{1}{z_{i\to j}}\, e^{B x_i} \prod_{l \in \partial i \setminus j} \sum_{x_l} e^{\beta x_i x_l}\, \nu^{(t)}_{l\to i}(x_l),$

where $z_{i\to j}$ is a normalization constant. The initial conditions $\nu^{(0)}_{i\to j}$ may be taken
to be
uniform or chosen according to some heuristic. We will say that the
initial condition is *positive* if $\nu^{(0)}_{i\to j}(+1) \ge \nu^{(0)}_{i\to j}(-1)$ for each of these messages.
Our next result concerns the uniform exponential convergence of the BP iteration to the same fixed point of (12), irrespective of the positive initial condition. Here and below, we denote by $\|\nu - \nu'\|_{\mathrm{TV}}$ the total variation distance between distributions $\nu$ and $\nu'$.

###### Theorem 2.3

Assume $\beta \ge 0$, $B > 0$ and $G$ is a graph of finite maximal degree $\Delta$. Then, there exist $A = A(\beta, B, \Delta)$ and $\lambda = \lambda(\beta, B, \Delta) > 0$ finite, and a fixed point $\{\nu^*_{i\to j}\}$ of the BP iteration (12), such that for any positive initial condition and all $t \ge 0$,

(13) $\max_{(i,j)\in E} \bigl\|\nu^{(t)}_{i\to j} - \nu^*_{i\to j}\bigr\|_{\mathrm{TV}} \le A\, e^{-\lambda t}.$

For $U \subseteq V$, let $\mathtt{B}(U, t)$ be the ball of radius $t$ around $U$ in $G$, denoting by $E_{U,t}$ its edge set, by $\partial\mathtt{B}(U,t)$ its border (i.e., the set of its vertices at distance $t$ from $U$), and for each $i \in \partial\mathtt{B}(U,t)$ let $j(i)$ denote any one fixed neighbor of $i$ in $\mathtt{B}(U,t)$.

Our next result shows that the probability distribution

(14) $\nu_{U,t}(x_{\mathtt{B}(U,t)}) \equiv \frac{1}{z_{U,t}} \prod_{(i,j)\in E_{U,t}} e^{\beta x_i x_j} \prod_{i\in \mathtt{B}(U,t)\setminus\partial\mathtt{B}(U,t)} e^{B x_i} \prod_{i\in\partial\mathtt{B}(U,t)} \nu^*_{i\to j(i)}(x_i),$

with $\{\nu^*_{i\to j}\}$ the fixed point of the BP iteration per Theorem 2.3, is a good approximation for the marginal of the variables $x_U$ under the Ising model (1).

###### Theorem 2.4

Assume $\beta \ge 0$, $B > 0$ and $G$ is a graph of finite maximal degree $\Delta$. Then, there exist $A$ and $\lambda > 0$ finite such that for any $U \subseteq V$ and $t \ge 0$, if $\mathtt{B}(U,t)$ is a tree then

(15) $\|\mu_U - \nu_{U,t}\|_{\mathrm{TV}} \le A\, |U|\, e^{-\lambda t},$

where $\mu_U$ denotes the marginal of $x_U$ under the Ising measure (1) and $\nu_{U,t}$ the corresponding marginal of (14).

### 2.4 Examples

Many common random graph ensembles [13] naturally fit our framework.

#### Random regular graphs

Let $G_n$ be a uniformly random graph with degree $k$. As $n \to \infty$, the sequence $\{G_n\}$ is obviously uniformly sparse, and converges locally almost surely to the rooted infinite tree of degree $k$ at every vertex. Therefore, in this case Theorem 2.2 applies with $P_k = 1$ and $\rho_{k-1} = 1$. The distributional recursion (8) then evolves with a deterministic sequence (i.e., $h^{(t)}$ is nonrandom), recovering the result of [11].
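In the regular case the distributional recursion collapses to a scalar fixed-point iteration; a sketch under the assumption that the update takes the form $h_{t+1} = B + (k-1)\operatorname{atanh}[\tanh\beta\tanh h_t]$ with $h_0 = 0$:

```python
import math

def cavity_field_regular(k, beta, B, tol=1e-12, max_iter=10_000):
    """Deterministic cavity recursion for the k-regular tree:
    h_{t+1} = B + (k-1) * atanh(tanh(beta) * tanh(h_t)), h_0 = 0."""
    h = 0.0
    for _ in range(max_iter):
        h_new = B + (k - 1) * math.atanh(math.tanh(beta) * math.tanh(h))
        if abs(h_new - h) < tol:
            return h_new
        h = h_new
    return h

h_star = cavity_field_regular(k=3, beta=0.3, B=0.2)
```

The returned value is (numerically) a fixed point of the scalar recursion, and it increases with the field $B$, in line with the monotonicity properties used throughout the paper.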

#### Erdős–Rényi graphs

Let $G_n$ be a uniformly random graph with $M = \lceil n\gamma/2 \rceil$ edges over $n$ vertices. The sequence $\{G_n\}$ converges locally almost surely to a Galton–Watson tree with Poisson offspring distribution of mean $\gamma$. This corresponds to taking $P_k = e^{-\gamma}\gamma^k/k!$. The same happens for classical variants of this ensemble. For instance, one can add an edge independently for each pair $(i,j)$ with probability $\gamma/n$, or consider a multigraph with $\mathrm{Poisson}(\gamma/n)$ edges between each pair $(i,j)$.

#### Arbitrary degree distribution

Let $P$ be a distribution with finite second moment and $G_n$ a uniformly random graph with degree distribution $P$ (more precisely, we set the number of vertices of degree $k$ to $\lfloor n P_k \rfloor$, adding one if needed for an even sum of degrees). Then, $\{G_n\}$ is uniformly sparse and with probability one it converges locally to $\mathtt{T}(P, \rho, \infty)$. The same happens if $G_n$ is drawn according to the so-called configuration model (cf. [4]).
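A minimal sketch of the configuration-model sampler mentioned above (uniform pairing of half-edges; the degree sequence is an arbitrary illustrative choice, and the function name is ours):

```python
import random

def configuration_model(degrees, rng=random):
    """Sample a (multi)graph with the given degree sequence by uniformly
    pairing half-edges; self-loops and multiple edges are kept, which
    does not affect the local tree limit."""
    half_edges = [i for i, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(half_edges)
    return [(half_edges[2 * m], half_edges[2 * m + 1])
            for m in range(len(half_edges) // 2)]

# 10 vertices: six of degree 2 and four of degree 3 (sum of degrees even).
edges = configuration_model([2] * 6 + [3] * 4)
```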

## 3 Preliminaries

We review here the notation and a couple of classical tools we use throughout this paper. To this end, when proving our results it is useful to allow for vertex-dependent magnetic fields $B_i$, that is, to replace the basic model (1) by

(16) $\mu(x) = \frac{1}{Z(\beta, \underline{B})} \exp\Bigl\{\beta \sum_{(i,j)\in E} x_i x_j + \sum_{i\in V} B_i x_i\Bigr\}.$

Given $U \subseteq V$, we denote by $x_U$ [respectively, $B_U$] the vector $\{x_i : i \in U\}$ [respectively, $\{B_i : i \in U\}$], dropping the subscript whenever clear from the context. Further, we use $x \le x'$ when two real-valued vectors $x$ and $x'$ are such that $x_i \le x'_i$ for all $i$ and say that a distribution $\mu$ over $\{+1,-1\}^V$ is dominated by a distribution $\mu'$ over this set (denoted $\mu \preceq \mu'$), if the two distributions can be coupled so that $x \le x'$ for any pair $(x, x')$ drawn from this coupling. Finally, we use throughout the shorthand $\langle f \rangle_\mu \equiv \sum_x \mu(x) f(x)$ for a distribution $\mu$ and function $f$ on the same finite set, or $\langle f \rangle$ when $\mu$ is clear from the context.

The first classical result we need is Griffiths inequality (see [19], Theorem IV.1.21).

###### Theorem 3.1

Consider two Ising models $\mu$ and $\mu'$ on graphs $G = (V, E)$ and $G' = (V, E')$, inverse temperatures $\beta$ and $\beta'$, and magnetic fields $\underline{B}$ and $\underline{B}'$, respectively. If $E \subseteq E'$, $0 \le \beta \le \beta'$ and $0 \le B_i \le B'_i$ for all $i \in V$, then $\langle \prod_{i\in A} x_i \rangle_\mu \le \langle \prod_{i\in A} x_i \rangle_{\mu'}$ for any $A \subseteq V$.

The second classical result we use is the GHS inequality (see [12]) about the effect of the magnetic field on the local magnetizations at various vertices.

###### Theorem 3.2 ((Griffiths, Hurst, Sherman))

Let $G = (V, E)$ and, for $\underline{B} \in \mathbb{R}^V$, denote by $m_k(\underline{B}) \equiv \langle x_k \rangle$ the local magnetization at vertex $k$ in the Ising model (16). If $B_i \ge 0$ for all $i \in V$, then for any three vertices $i, j, k \in V$ (not necessarily distinct),

(17) $\frac{\partial^2 m_k}{\partial B_i\, \partial B_j} \le 0.$
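The concavity asserted by the GHS inequality can be checked numerically on a tiny example via a second finite difference of the magnetization in a uniform field $B \ge 0$ (brute-force enumeration; graph and parameters are illustrative):

```python
import itertools
import math

def magnetization(n, edges, beta, B):
    """<x_0> for the Ising model with uniform field B, by enumeration."""
    Z = m = 0.0
    for x in itertools.product([1, -1], repeat=n):
        w = math.exp(beta * sum(x[i] * x[j] for i, j in edges) + B * sum(x))
        Z += w
        m += w * x[0]
    return m / Z

# Second finite difference in B: GHS predicts concavity for B >= 0.
edges = [(0, 1), (1, 2), (2, 0)]   # a triangle
d = 1e-3
second_diff = (magnetization(3, edges, 0.5, 0.3 + d)
               - 2 * magnetization(3, edges, 0.5, 0.3)
               + magnetization(3, edges, 0.5, 0.3 - d))
```

The second difference comes out negative, and the magnetization itself is increasing in $B$, as Griffiths' inequality requires.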

Finally, we need the following elementary inequality:

###### Lemma 3.3

For any function and distributions , on the finite set such that and ,

(18) |

In particular, if , then the right-hand side is bounded by .

Assuming without loss of generality that , the left-hand side of (18) can be bounded as

This implies the lemma.

## 4 Ising models on trees

We prove in this section certain facts about Ising models on trees which are of independent interest and as a byproduct we deduce Lemma 2.1 and the theorems of Section 2.3. In doing so, recall that for each the Ising models on with free and plus boundary conditions are

(19) | |||||

(20) | |||||

Equivalently is the Ising model (16) on with magnetic fields and is the modified Ising model corresponding to the limit for all . To simplify our notation we denote such limits hereafter simply by setting and use for statements that apply to both free and plus boundary conditions.

We start with the following simple but useful observation.

###### Lemma 4.1

For a subtree of a finite tree let denote the subset of vertices of connected by an edge to and for each let denote the root magnetization of the Ising model on the maximal subtree of rooted at . The marginal on of the Ising measure on , denoted is then an Ising measure on with magnetic field for and for .

Since is a subtree of the tree , the subtrees for are disjoint. Therefore, with denoting the Ising model distribution for we have that

(21) |

for the Boltzmann weight

Further, so for each and some constants ,

Embedding the normalization constants within we thus conclude that is an Ising measure on with the stated magnetic field . Finally, comparing the root magnetization for with that for we have by Griffiths inequality that , as claimed.

###### Theorem 4.2

Suppose is a conditionally independent infinite tree of average offspring numbers bounded by , as in Definition 2.2. For , and finite, there exist and finite such that if for all and for all , , then

(22) |

for , all and .

Fixing it suffices to consider [for which the left-hand side of (22) is maximal]. For this and we have that and , where in this case the Boltzmann weight in (21) is bounded above by and below by for . Further, the plus and free boundary conditions then differ in (21) by having the corresponding boundary conditions at generation of each subtree , which we distinguish by using instead of . Since the total variation distance between two product measures is at most the sum of the distance between their marginals, upon applying Lemma 3.3 we deduce from (21) that

By our assumptions, conditional on , the subtrees of denoted hereafter also by are for independent of each other. Further, is precisely the magnetization of their root vertex under plus/free boundary conditions at generation . Thus, taking (and using the inequality ), it suffices to show that the magnetizations at the root of any such conditionally independent infinite tree satisfy , for some finite, all and , where we have removed the absolute value since by Griffiths inequality. For greater convenience of the reader, this fact is proved in the next lemma.

###### Lemma 4.3

Suppose is a conditionally independent infinite tree of average offspring numbers bounded by . For , and finite, there exist such that

(23) |

where are the root magnetizations under and free boundary condition on .

Note that (23) trivially holds for [in which case ]. Assuming hereafter that we proceed to prove (23) when each vertex of has a nonzero offspring number. To this end, for let

and denote by the corresponding root magnetization. Writing instead of for constant magnetic field on the leaf nodes, that is, when for each , we note that and . Further, applying Lemma 4.1 for the subtree of we represent as the root magnetization on where for and for all other . Consequently,

(24) |

Recall that if for , then applying Jensen’s inequality one variable at a time we have that for any independent random variables . By the GHS inequality, this is the case for , hence with denoting the conditional on expectation over the independent offspring numbers for , we deduce that

(25) |

where the last inequality is a consequence of Griffiths inequality and our assumption that for any and all . Since each has at least one offspring whose magnetic field is at least , it follows by Griffiths inequality that is bounded below by the magnetization at the root of the subtree of where for all and for all . Applying Lemma 4.1 for and , the root magnetization for the Ising distribution on turns out to be precisely for of (9). Thus, one more application of Griffiths inequality yields that

(26) |

Next note that and by GHS inequality is concave. Hence,

(27) |

for the finite constant

and all . Combining (25), (26) and (27) we obtain that

We have seen in (26) that is nondecreasing whereas from (24) and Griffiths inequality we have that is nonincreasing. With magnetization bounded above by one, we thus get upon summing the preceding inequalities for that

from which we deduce (23).

Considering now the general case where the infinite tree has vertices (other than the root) of degree one, let denote the “backbone” of , that is, the subtree induced by vertices along self-avoiding paths between and . Taking as the subtree of in Lemma 4.1, note that for each the subtree contains no vertex from . Consequently, the marginal measures are Ising measures on with the same magnetic fields outside . Thus, with denoting the corresponding magnetizations at the root for , we deduce that where for all . By definition every vertex of has a nonzero offspring number and with , the required bound

follows by the preceding argument, since is a conditionally independent tree whose offspring numbers do not exceed those of . Indeed, for , given the offspring numbers at are independent of each other [with probability of proportional to the sum over of the product of the probability of and that of precisely out of the offspring of in having a line of descendants that survives additional generations, for ].

Simon’s inequality (see [25], Theorem 2.1) allows one to bound the (centered) two point correlation functions in ferromagnetic Ising models with zero magnetic field. We provide next its generalization to arbitrary magnetic field, in the case of Ising models on trees.

###### Lemma 4.4

If edge is on the unique path from to , with a descendant of , , then

(28) |

where denotes the expectation with respect to the Ising distribution on the subtree of and all its descendants in and denotes the centered two point correlation function.

It is not hard to check that if are -valued random variables with and conditionally independent given , then

(29) |

In particular, under the random variables and are conditionally independent given with

Hence, if is the unique descendant of then