Ising models on locally tree-like graphs

Abstract

We consider ferromagnetic Ising models on graphs that converge locally to trees. Examples include random regular graphs with bounded degree and uniformly random graphs with bounded average degree. We prove that the “cavity” prediction for the limiting free energy per spin is correct for any positive temperature and external field. Further, local marginals can be approximated by iterating a set of mean field (cavity) equations. Both results are achieved by proving the local convergence of the Boltzmann distribution on the original graph to the Boltzmann distribution on the appropriate infinite random tree.

Amir Dembo and Andrea Montanari (montanari@stanford.edu)

Supported in part by NSF Grant DMS-08-06211.

DOI: 10.1214/09-AAP627. Volume 20, Issue 2 (2010), pages 565–592.

AMS subject classifications: Primary 82B44; secondary 82B23, 60F10, 60K35, 05C80, 05C05.

Keywords: Ising model, random sparse graphs, cavity method, Bethe measures, belief propagation, local weak convergence.

1 Introduction

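A ferromagnetic Ising model on the finite graph $G$ (with vertex set $V$ and edge set $E$) is defined by the following Boltzmann distributions over $\underline{x} = \{x_i : i \in V\}$, with $x_i \in \{+1,-1\}$:

$$\mu(\underline{x}) = \frac{1}{Z(\beta,B)} \exp\Big\{\beta \sum_{(i,j)\in E} x_i x_j + B \sum_{i\in V} x_i\Big\}. \tag{1}$$

These distributions are parametrized by the “magnetic field” $B$ and “inverse temperature” $\beta \ge 0$, where the partition function $Z(\beta,B)$ is fixed by the normalization condition $\sum_{\underline{x}} \mu(\underline{x}) = 1$. Throughout the paper, we will be interested in sequences of graphs $G_n = (V_n \equiv [n], E_n)$ of diverging size $n$.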
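For very small graphs the partition function can be evaluated by exhaustive summation, which is a convenient sanity check for anything that follows. The sketch below assumes the standard ferromagnetic form of (1), with weight $\exp\{\beta \sum_{(i,j)\in E} x_i x_j + B \sum_i x_i\}$; the function names are illustrative and not taken from the paper.

```python
import itertools
import math


def boltzmann_weights(n, edges, beta, B):
    """Unnormalized weight of every configuration x in {-1,+1}^n."""
    weights = {}
    for x in itertools.product((-1, 1), repeat=n):
        exponent = beta * sum(x[i] * x[j] for i, j in edges) + B * sum(x)
        weights[x] = math.exp(exponent)
    return weights


def partition_function(n, edges, beta, B):
    """Z(beta, B): the normalizing constant, by exhaustive summation."""
    return sum(boltzmann_weights(n, edges, beta, B).values())


def free_entropy_density(n, edges, beta, B):
    """(1/n) log Z, the finite-n quantity whose large-n limit is of interest."""
    return math.log(partition_function(n, edges, beta, B)) / n


# Usage: a 4-cycle at beta = 0.5, B = 0.1.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(free_entropy_density(4, cycle, 0.5, 0.1))
```

The cost is $2^n$ evaluations, so this is only a reference implementation for graphs with a few dozen vertices at most.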

Nonrigorous statistical mechanics techniques, such as the “replica” and “cavity methods,” allow one to make a number of predictions on the model (1) when the graph “lacks any finite-dimensional structure.” The most basic quantity in this context is the asymptotic free entropy density

$$\phi(\beta,B) \equiv \lim_{n\to\infty} \frac{1}{n} \log Z_n(\beta,B) \tag{2}$$

(this quantity is also sometimes called free energy or pressure in the literature). The limit free entropy density and the large deviation properties of the Boltzmann distribution were characterized in great detail [9] in the case of a complete graph (the inverse temperature must then be scaled by $1/n$ to get a nontrivial limit). Statistical physics predictions exist, however, for a much wider class of graphs, including most notably sparse random graphs with bounded average degree; see, for instance, [8, 15, 18]. This is a direction of interest for at least two reasons:

    (i) Sparse graphical structures arise in a number of problems from combinatorics and theoretical computer science. Examples include random satisfiability, coloring of random graphs and graph partitioning [21]. In all of these cases, the uniform measure over solutions can be regarded as the Boltzmann distribution for a modified spin glass with multispin interactions. Such problems have been successfully attacked using nonrigorous statistical mechanics techniques. A mathematical foundation of this approach is still lacking, and would be extremely useful.

    (ii) Sparse graphs allow one to introduce a nontrivial notion of distance between vertices, namely the length of the shortest path connecting them. This geometrical structure allows for new characterizations of the measure (1) in terms of correlation decay. This type of characterization is in turn related to the theory of Gibbs measures on infinite trees [17].

    The asymptotic free entropy density (2) was determined rigorously only in a few cases for sparse graphs. In [11], this task was accomplished for random regular graphs. De Sanctis and Guerra [7] developed interpolation techniques for random graphs with independent edges (Erdős–Rényi type) but only determined the free entropy density at high temperature and at zero temperature (in both cases with vanishing magnetic field). The latter is in fact equivalent to counting the number of connected components of a random graph. Interestingly, the partition function can be approximated in polynomial time for $\beta \ge 0$, using an appropriate Markov chain Monte Carlo algorithm [14]. It is intriguing that no general approximation algorithm exists in the case $\beta < 0$ (the “antiferromagnetic” Ising model). Correspondingly, the statistical physics conjecture for the free entropy density [21] becomes significantly more intricate (presenting the so-called “replica symmetry breaking” phenomenon).

    In this paper we generalize the previous results by rigorously verifying the validity of the Bethe free entropy prediction for the value of the limit in (2) for generic graph sequences that converge locally to trees. Indeed, we control the free entropy density by proving that the Boltzmann measure (1) converges locally to the Boltzmann measure of a model on a tree. The philosophy is related to the local weak convergence method of [2].

    Finally, several of the proofs have an algorithmic interpretation, providing an efficient procedure for approximating the local marginals of the Boltzmann measure. The essence of this procedure consists in solving by iteration certain mean field (cavity) equations. Such an algorithm is known in artificial intelligence and computer science under the name of belief propagation. Despite its success and wide applicability, only weak performance guarantees have been proved so far. Typically, it is possible to prove its correctness in the high temperature regime, as a consequence of a uniform decay of correlations holding there (spatial mixing) [26, 3, 23]. The behavior of iterative inference algorithms on Ising models was recently considered in [22, 24].

    The emphasis of the present paper is on the low-temperature regime in which uniform decorrelation does not hold. We are able to prove that belief propagation converges exponentially fast on any graph, and that the resulting estimates are asymptotically exact for large locally tree-like graphs. The main idea is to introduce a magnetic field to break explicitly the symmetry, and to carefully exploit the monotonicity properties of the model.

    A key step consists of estimating the correlation between the root spin of an Ising model on a tree and positive boundary conditions. Ising models on trees are interesting per se, and have been the object of significant mathematical work; see, for instance, [20, 16, 10]. The question considered here appears, however, to be novel.

    The next section provides the basic technical definitions (in particular concerning graphs and local convergence to trees), and the formal statement of our main results. Notation and certain key tools are described in Section 3 with Section 4 devoted to proofs of the relevant properties of Ising models on trees (which are of independent interest). The latter are used in Sections 5 and 6 to derive our main results concerning models on tree-like graphs. A companion paper [5] deals with the related challenging problem of spin glass models on sparse graphs.

2 Definitions and main results

    The next subsections contain some basic definitions on graph sequences and the notion of local convergence to random trees. Sections 2.2 and 2.3 present our results on the free entropy density and the algorithmic implications of our analysis.

    2.1 Locally tree-like graphs

    Let $P = \{P_k : k \ge 0\}$ be a probability distribution over the nonnegative integers, with finite, positive first moment, and denote by

    $$\rho_k = \frac{(k+1) P_{k+1}}{\sum_{l \ge 1} l P_l} \tag{3}$$

    its size-biased version. For any $t \ge 0$, we let $\mathsf{T}(P,\rho,t)$ denote the random rooted tree generated as follows. First draw an integer $k$ with distribution $P_k$, and connect the root to $k$ offspring. Then recursively, for each node in the last generation, generate an integer $k$ independently with distribution $\rho_k$, and connect the node to $k$ new nodes. This is repeated until the tree has $t$ generations.

    Sometimes it will be useful to consider the ensemble $\mathsf{T}(\rho,t)$ whereby the root node has degree $k$ with probability $\rho_k$. We will drop the degree distribution arguments from $\mathsf{T}(P,\rho,t)$ or $\mathsf{T}(\rho,t)$ and write $\mathsf{T}(t)$ whenever clear from the context. Notice that the infinite trees $\mathsf{T}(P,\rho,\infty)$ and $\mathsf{T}(\rho,\infty)$ are well defined.
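The two-stage construction above (root degree drawn from the degree distribution, all later offspring numbers drawn from its size-biased version) can be sketched as follows. This is a minimal illustration under two assumptions: the degree distribution has finite support and is passed as a list `P` with `P[k]` the probability of degree `k`, and the size-biased version is taken to be `(k+1) P[k+1] / sum_l l P[l]`. The names `size_biased` and `sample_tree` are hypothetical.

```python
import random


def size_biased(P):
    """Size-biased version of a finite-support degree distribution P (a list)."""
    mean = sum(k * p for k, p in enumerate(P))
    if mean <= 0:
        raise ValueError("P must have a positive first moment")
    return [(k + 1) * P[k + 1] / mean for k in range(len(P) - 1)]


def sample_tree(P, t, rng):
    """Sample t generations; returns {node: list of children}, root is node 0."""
    rho = size_biased(P)
    children = {0: []}
    frontier = [0]
    next_id = 1
    for generation in range(t):
        # The root uses P; every later generation uses the size-biased law.
        dist = P if generation == 0 else rho
        new_frontier = []
        for node in frontier:
            k = rng.choices(range(len(dist)), weights=dist)[0]
            for _ in range(k):
                children[node].append(next_id)
                children[next_id] = []
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return children


# Usage: degree 3 at every vertex, so the root has 3 children and all others 2.
tree = sample_tree([0, 0, 0, 1], t=3, rng=random.Random(1))
print(len(tree))
```

Nodes are numbered in breadth-first order, which matches the relabeling convention used when comparing trees.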

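    The average branching factor of trees will be denoted by $\bar\rho$, and the average root degree by $\bar P$. In formulae,

    $$\bar P \equiv \sum_{k \ge 0} k P_k, \qquad \bar\rho \equiv \sum_{k \ge 0} k \rho_k. \tag{4}$$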

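    We denote by $G_n = (V_n, E_n)$ a graph with vertex set $V_n \equiv [n] = \{1,\ldots,n\}$. The distance $d(i,j)$ between $i,j \in V_n$ is the length of the shortest path from $i$ to $j$ in $G_n$. Given a vertex $i \in V_n$, we let $\mathsf{B}_i(t)$ be the set of vertices whose distance from $i$ is at most $t$. With a slight abuse of notation, $\mathsf{B}_i(t)$ will also denote the subgraph induced by those vertices. For $i \in V_n$, we let $\partial i$ denote the set of its neighbors $\partial i \equiv \{j \in V_n : (i,j) \in E_n\}$, and $|\partial i|$ its size (i.e., the degree of $i$).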

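    This paper is concerned with sequences of graphs $\{G_n\}$ of diverging size that converge locally to trees. Consider two trees $T_1$ and $T_2$ with vertices labeled arbitrarily. We shall write $T_1 \simeq T_2$ if the two trees become identical when vertices are relabeled from $1$ to $|T_1| = |T_2|$, in a breadth first fashion, and following lexicographic order among siblings.

    Definition. Considering a sequence of graphs $\{G_n\}$, let $\mathbb{P}_n$ denote the law induced on the ball $\mathsf{B}_i(t)$ in $G_n$ centered at a uniformly chosen random vertex $i \in [n]$. We say that $\{G_n\}$ converges locally to the random tree $\mathsf{T}(P,\rho,\infty)$ if, for any $t$, and any rooted tree $T$ with $t$ generations,

    $$\lim_{n\to\infty} \mathbb{P}_n\{\mathsf{B}_i(t) \simeq T\} = \mathbb{P}\{\mathsf{T}(P,\rho,t) \simeq T\}. \tag{5}$$

    Definition. We say that a sequence of graphs $\{G_n\}$ is uniformly sparse if

    $$\lim_{l\to\infty} \limsup_{n\to\infty} \frac{1}{n} \sum_{i \in V_n} |\partial i| \, \mathbb{I}(|\partial i| \ge l) = 0. \tag{6}$$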

    2.2 Free entropy

    According to the statistical physics derivation [18], the model (1) has a line of first-order phase transitions for $\beta > \beta_c$ and $B = 0$ [i.e., where the continuous function $B \mapsto \phi(\beta,B)$ exhibits a discontinuous derivative]. The critical temperature depends on the graph only through the average branching factor $\bar\rho$ and is determined by the condition

    $$\bar\rho \tanh \beta_c = 1. \tag{7}$$

    Notice that $\beta_c \simeq 1/\bar\rho$ for large degrees.
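As a quick numerical illustration of the critical condition, the sketch below computes the average branching factor from a finite-support degree distribution and solves for the critical inverse temperature. It assumes the condition takes the form $\bar\rho \tanh\beta_c = 1$ and that $\bar\rho = \sum_k k(k-1)P_k / \sum_k k P_k$ (the mean of the size-biased distribution). Returning infinity when $\bar\rho \le 1$ is a convention of this sketch, reflecting that the condition then has no finite solution.

```python
import math


def mean_branching(P):
    """Average branching factor: mean of the size-biased version of P."""
    first = sum(k * p for k, p in enumerate(P))
    if first <= 0:
        raise ValueError("P must have a positive first moment")
    return sum(k * (k - 1) * p for k, p in enumerate(P)) / first


def critical_beta(P):
    """Solve rho_bar * tanh(beta_c) = 1; math.inf if there is no finite solution."""
    rho_bar = mean_branching(P)
    if rho_bar <= 1:
        return math.inf
    return math.atanh(1.0 / rho_bar)


# Usage: for degree 3 at every vertex, rho_bar = 2 and beta_c = atanh(1/2).
print(critical_beta([0, 0, 0, 1]))
```

For large $\bar\rho$ the value is close to $1/\bar\rho$, in line with the remark above.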

    The asymptotic free-entropy density is given in terms of the fixed point of a distributional recursion. One characterization of this fixed point is as follows.

    Lemma 2.1

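    Consider the sequence of random variables $\{h^{(t)}\}$ defined by $h^{(0)} = 0$ identically and, for $t \ge 0$,

    $$h^{(t+1)} \stackrel{d}{=} B + \sum_{i=1}^{K} \xi(\beta, h_i^{(t)}), \tag{8}$$

    where $K$ is an integer valued random variable of distribution $\rho$,

    $$\xi(\beta,h) \equiv \operatorname{atanh}[\tanh(\beta)\tanh(h)], \tag{9}$$

    and the $h_i^{(t)}$s are i.i.d. copies of $h^{(t)}$ that are independent of $K$. If $B > 0$ and $\rho$ has finite first moment, then the distributions of $h^{(t)}$ are stochastically monotone and $h^{(t)}$ converges in distribution to the unique fixed point $h^*$ of the recursion (8) that is supported on $[0,\infty)$.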
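The distributional recursion of Lemma 2.1 can be approximated numerically by keeping a finite population of samples and resampling from it at every step (often called population dynamics; this is a numerical device, not part of the lemma). The sketch assumes the recursion has the form $h^{(t+1)} = B + \sum_{i=1}^{K} \xi(\beta, h_i^{(t)})$ with $\xi(\beta,h) = \operatorname{atanh}[\tanh\beta \tanh h]$. The offspring sampler `sample_offspring` is a hypothetical callback supplied by the caller.

```python
import math
import random


def xi(beta, h):
    """Cavity map xi(beta, h) = atanh(tanh(beta) * tanh(h))."""
    return math.atanh(math.tanh(beta) * math.tanh(h))


def population_dynamics(beta, B, sample_offspring, iterations, population_size, rng):
    """Approximate the law of h^(t) by a population of samples, starting from h = 0."""
    population = [0.0] * population_size
    for _ in range(iterations):
        new_population = []
        for _ in range(population_size):
            k = sample_offspring(rng)  # offspring number K
            fields = rng.choices(population, k=k)  # approximately i.i.d. copies of h^(t)
            new_population.append(B + sum(xi(beta, h) for h in fields))
        population = new_population
    return population


# Usage: deterministic offspring number K = 2, for which the recursion is scalar.
rng = random.Random(0)
population = population_dynamics(0.8, 0.1, lambda r: 2, 50, 20, rng)
print(population[0])
```

With a deterministic offspring number every sample follows the same scalar iteration, which gives an easy consistency check. For very large $\beta$ and $h$ the floating-point product of hyperbolic tangents can round to 1, in which case `math.atanh` raises an error; moderate parameters avoid this.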

    Our next result confirms the statistical physics prediction for the free-entropy density.

    Theorem 2.2

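    Let $\{G_n\}$ be a sequence of uniformly sparse graphs that converges locally to $\mathsf{T}(P,\rho,\infty)$. If $\rho$ has finite first moment (that is, if $P$ has finite second moment), then for any $\beta \ge 0$ and $B \in \mathbb{R}$ the following limit exists:

    $$\phi(\beta,B) = \lim_{n\to\infty} \frac{1}{n} \log Z_n(\beta,B). \tag{10}$$

    Moreover, for $B > 0$ the limit is given by

    $$\phi(\beta,B) = \frac{\bar P}{2} \log\cosh(\beta) - \frac{\bar P}{2}\, \mathbb{E} \log\big[1 + \tanh(\beta)\tanh(h_1)\tanh(h_2)\big] + \mathbb{E} \log\Big\{ e^{B} \prod_{i=1}^{L} \big[1 + \tanh(\beta)\tanh(h_i)\big] + e^{-B} \prod_{i=1}^{L} \big[1 - \tanh(\beta)\tanh(h_i)\big] \Big\}, \tag{11}$$

    where $L$ has distribution $P$ and is independent of the “cavity fields” $h_i$ that are i.i.d. copies of the fixed point $h^*$ of Lemma 2.1. Also, $\phi(\beta,B) = \phi(\beta,-B)$ and $\phi(\beta,0)$ is the limit of $\phi(\beta,B)$ as $B \to 0$.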
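When every vertex has the same degree $k$, the cavity field is deterministic and the free entropy prediction reduces to a closed-form expression in a single number. The sketch below evaluates it under the assumption that the Bethe expression takes the standard form $\frac{k}{2}\log\cosh\beta - \frac{k}{2}\log[1 + \tanh\beta \tanh^2 h] + \log\{e^{B}(1 + \tanh\beta\tanh h)^k + e^{-B}(1 - \tanh\beta\tanh h)^k\}$, with $h$ solving $h = B + (k-1)\,\xi(\beta,h)$. For $k = 2$ the local limit is the infinite line, so the result can be compared with the classical transfer-matrix value for the one-dimensional chain.

```python
import math


def xi(beta, h):
    """Cavity map xi(beta, h) = atanh(tanh(beta) * tanh(h))."""
    return math.atanh(math.tanh(beta) * math.tanh(h))


def cavity_fixed_point(beta, B, k, iterations=2000):
    """Iterate h <- B + (k - 1) * xi(beta, h) starting from h = 0."""
    h = 0.0
    for _ in range(iterations):
        h = B + (k - 1) * xi(beta, h)
    return h


def bethe_free_entropy_regular(beta, B, k):
    """Bethe free entropy density when all degrees equal k (assumed standard form)."""
    h = cavity_fixed_point(beta, B, k)
    t, u = math.tanh(beta), math.tanh(h)
    edge_term = 0.5 * k * (math.log(math.cosh(beta)) - math.log(1 + t * u * u))
    vertex_term = math.log(math.exp(B) * (1 + t * u) ** k
                           + math.exp(-B) * (1 - t * u) ** k)
    return edge_term + vertex_term


def chain_free_entropy(beta, B):
    """Exact value for the one-dimensional chain (largest transfer-matrix eigenvalue)."""
    lam = math.exp(beta) * math.cosh(B) + math.sqrt(
        math.exp(2 * beta) * math.sinh(B) ** 2 + math.exp(-2 * beta))
    return math.log(lam)


print(bethe_free_entropy_regular(0.5, 0.3, 2), chain_free_entropy(0.5, 0.3))
```

At $\beta = 0$ the spins decouple and the expression reduces to $\log(2\cosh B)$ for any $k$, which is a second easy check.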

    The proof of Theorem 2.2 is based on two steps:

      (a) Reduce the computation of $\phi_n(\beta,B)$ to computing expectations of local (in $G_n$) quantities with respect to the Boltzmann measure (1). This is achieved by noticing that the derivative of $\phi_n(\beta,B)$ with respect to $\beta$ is a sum of such expectations.

      (b) Show that expectations of local quantities on $\mathsf{B}_i(t)$ are well approximated by the same expectations with respect to an Ising model on the associated tree $\mathsf{T}(t)$ (for $t$ and $n$ large). This is proved by showing that, on such a tree, local expectations are insensitive to boundary conditions that dominate stochastically free boundaries. The theorem then follows by monotonicity arguments.

    The key step is of course the last one. A stronger requirement would be that these expectation values are insensitive to any boundary condition, which would coincide with uniqueness of the Gibbs measure on the infinite tree. Such a requirement would allow for an elementary proof, but holds only at “high” temperature. Indeed, insensitivity to positive boundary conditions is proved in Section 4 for the following collection of trees of conditionally independent (and of bounded average) offspring numbers.

    Definition. An infinite tree $\mathsf{T}$ rooted at the vertex ø is called conditionally independent if for each integer $k \ge 0$, conditional on the subtree $\mathsf{T}(k)$ of the first $k$ generations of $\mathsf{T}$, the numbers of offspring $\Delta_j$ for $j \in \partial\mathsf{T}(k)$ are independent of each other, where $\partial\mathsf{T}(k)$ denotes the set of vertices at generation $k$. We further assume that the [conditional on $\mathsf{T}(k)$] first moments of $\Delta_j$ are uniformly bounded by a given nonrandom finite constant $\Delta$.

    Beyond the random tree $\mathsf{T}(P,\rho,\infty)$, these include deterministic trees with bounded degrees and certain multi-type branching processes (such as random bipartite trees and percolation clusters on deterministic trees of bounded degree). Consequently, Theorem 2.2 extends to any uniformly sparse graph sequence that converges locally to a random tree of the form of Definition 2.2, except that the formula is in general more involved than the one given in (11). For example, such an extension allows one to handle uniformly random bipartite graphs with different degree distributions for the two types of vertices. While we refrain from formalizing and proving such generalizations, we note in passing that our derivation of the formula (11) implicitly uses the fact that the limiting tree possesses the involution invariance of [2]. As pointed out in [1], every local limit of finite graphs must have the involution invariance property (which clearly not every conditionally independent tree has).

    2.3 Algorithmic implications

    The free entropy density is not the only quantity that can be characterized for Ising models on locally tree-like graphs. Indeed local marginals can be efficiently computed with good accuracy. The basic idea is to solve a set of mean field equations iteratively. These are known as Bethe–Peierls or cavity equations and the corresponding algorithm is referred to as “belief propagation” (BP).

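    More precisely, associate to each directed edge $i \to j$ in the graph, with $(i,j) \in E$, a distribution $\nu_{i\to j}(x_i)$ over $x_i \in \{+1,-1\}$. In the computer science literature these distributions are referred to as “messages.” They are updated as follows (with $z^{(t)}_{i\to j}$ fixed by normalization):

    $$\nu^{(t+1)}_{i\to j}(x_i) = \frac{1}{z^{(t)}_{i\to j}}\, e^{B x_i} \prod_{l \in \partial i \setminus j} \sum_{x_l} e^{\beta x_i x_l}\, \nu^{(t)}_{l\to i}(x_l). \tag{12}$$

    The initial conditions $\nu^{(0)}_{i\to j}(\cdot)$ may be taken to be uniform or chosen according to some heuristic. We will say that the initial condition is positive if $\nu^{(0)}_{i\to j}(+1) \ge \nu^{(0)}_{i\to j}(-1)$ for each of these messages.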
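The message update can be sketched in a few lines. The code below assumes the standard BP rule for the Ising model, $\nu_{i\to j}(x_i) \propto e^{B x_i} \prod_{l \in \partial i \setminus j} \sum_{x_l} e^{\beta x_i x_l} \nu_{l\to i}(x_l)$, with parallel updates from the uniform (hence positive) initial condition; each message is stored as the single number $\nu_{i\to j}(+1)$. The function names are illustrative. On a finite tree BP is exact once the number of iterations exceeds the diameter, so the output is compared with exhaustive enumeration.

```python
import itertools
import math


def bp_marginals(n, edges, beta, B, iterations):
    """Run parallel BP updates; return the estimates of P(x_i = +1)."""
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # nu[(i, j)] stores nu_{i->j}(+1); the uniform start is a positive initial condition.
    nu = {(i, j): 0.5 for i in range(n) for j in neighbors[i]}

    def local_weight(i, x_i, excluded, messages):
        # e^{B x_i} * prod over neighbors l != excluded of sum_{x_l} e^{beta x_i x_l} nu_{l->i}(x_l)
        weight = math.exp(B * x_i)
        for l in neighbors[i]:
            if l != excluded:
                p = messages[(l, i)]
                weight *= math.exp(beta * x_i) * p + math.exp(-beta * x_i) * (1.0 - p)
        return weight

    for _ in range(iterations):
        updated = {}
        for i, j in nu:
            plus = local_weight(i, 1, j, nu)
            minus = local_weight(i, -1, j, nu)
            updated[(i, j)] = plus / (plus + minus)
        nu = updated

    marginals = []
    for i in range(n):
        plus = local_weight(i, 1, None, nu)
        minus = local_weight(i, -1, None, nu)
        marginals.append(plus / (plus + minus))
    return marginals


def exact_marginals(n, edges, beta, B):
    """P(x_i = +1) by exhaustive enumeration, for comparison on small graphs."""
    total = 0.0
    plus = [0.0] * n
    for x in itertools.product((-1, 1), repeat=n):
        w = math.exp(beta * sum(x[a] * x[b] for a, b in edges) + B * sum(x))
        total += w
        for i in range(n):
            if x[i] == 1:
                plus[i] += w
    return [p / total for p in plus]


tree = [(0, 1), (0, 2), (2, 3), (2, 4)]
bp = bp_marginals(5, tree, beta=0.8, B=0.2, iterations=10)
exact = exact_marginals(5, tree, beta=0.8, B=0.2)
print(max(abs(a - b) for a, b in zip(bp, exact)))
```

On graphs with cycles the same routine still runs, but its output is only an approximation whose quality depends on how tree-like the neighborhoods are.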

    Our next result concerns the uniform exponential convergence of the BP iteration to the same fixed point of (12), irrespective of its positive initial condition. Here and below, we denote by the total variation distance between distributions and .

    Theorem 2.3

    Assume , and is a graph of finite maximal degree . Then, there exists finite, and a fixed point of the BP iteration (12) such that for any positive initial condition and all ,

    (13)

    For let be the ball of radius around in , denoting by its edge set, by its border (i.e., the set of its vertices at distance from ), and for each let denote any one fixed neighbor of in .

    Our next result shows that the probability distribution

    (14)

    with the fixed point of the BP iteration per Theorem 2.3, is a good approximation for the marginal of variables under the Ising model (1).

    Theorem 2.4

    Assume , and is a graph of finite maximal degree . Then, there exist finite and such that for any and , if is a tree then

    (15)

    2.4 Examples

    Many common random graph ensembles [13] naturally fit our framework.

    Random regular graphs

    Let be a uniformly random graph with degree . As , the sequence is obviously uniformly sparse, and converges locally almost surely to the rooted infinite tree of degree at every vertex. Therefore, in this case Theorem 2.2 applies with and for . The distributional recursion (8) then evolves with a deterministic sequence recovering the result of [11].

    Erdős–Rényi graphs

    Let be a uniformly random graph with edges over vertices. The sequence converges locally almost surely to a Galton–Watson tree with Poisson offspring distribution of mean . This corresponds to taking . The same happens to classical variants of this ensemble. For instance, one can add an edge independently for each pair with probability , or consider a multi-graph with edges between each pair .

    The sequence is with probability one uniformly sparse in each of these cases. Thus, Theorem 2.2 extends the results of [7] to arbitrary nonzero temperature and magnetic field.
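The fact that the root and the later generations share the same offspring law in the Poisson case follows from a one-line computation. Writing $\lambda$ for the Poisson mean (a symbol chosen here for illustration) and assuming the size-biased version is $\rho_k = (k+1)P_{k+1}/\sum_l l P_l$, we get:

```latex
\[
P_k = e^{-\lambda}\frac{\lambda^{k}}{k!}
\quad\Longrightarrow\quad
\rho_k
= \frac{(k+1)\,P_{k+1}}{\sum_{l} l\,P_l}
= \frac{(k+1)}{\lambda}\, e^{-\lambda}\frac{\lambda^{k+1}}{(k+1)!}
= e^{-\lambda}\frac{\lambda^{k}}{k!}
= P_k .
\]
```

In particular the average root degree and the average branching factor coincide, both being equal to $\lambda$.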

    Arbitrary degree distribution

    Let be a distribution with finite second moment and a uniformly random graph with degree distribution (more precisely, we set the number of vertices of degree to , adding one for if needed for an even sum of degrees). Then, is uniformly sparse and with probability one it converges locally to . The same happens if is drawn according to the so-called configuration model (cf. [4]).

    3 Preliminaries

    We review here the notations and a couple of classical tools we use throughout this paper. To this end, when proving our results it is useful to allow for vertex-dependent magnetic fields , that is, to replace the basic model (1) by

    (16)

    Given , we denote by [respectively, ] the vector , [respectively, ], dropping the subscript whenever clear from the context. Further, we use when two real-valued vectors and are such that for all and say that a distribution over is dominated by a distribution over this set (denoted ), if the two distributions can be coupled so that for any pair drawn from this coupling. Finally, we use throughout the shorthand for a distribution and function on the same finite set, or when is clear from the context.

    The first classical result we need is Griffiths inequality (see [19], Theorem IV.1.21).

    Theorem 3.1

    Consider two Ising models and on graphs and , inverse temperatures and , and magnetic fields and , respectively. If , and for all , then for any .

    The second classical result we use is the GHS inequality (see [12]) about the effect of the magnetic field on the local magnetizations at various vertices.

    Theorem 3.2 (Griffiths, Hurst, Sherman)

    Let and for , denote by the local magnetization at vertex in the Ising model (16). If for all , then for any three vertices (not necessarily distinct),

    (17)
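Both classical inequalities are easy to probe numerically on a small graph. The sketch below is only an illustration on one example, not a proof: it computes a local magnetization by exhaustive enumeration for vertex-dependent nonnegative fields, then checks that adding an edge or raising a field does not lower it (Griffiths-type monotonicity) and that its second difference in one field is nonpositive (GHS-type concavity). The graph, parameter values and names are arbitrary choices made here.

```python
import itertools
import math


def magnetization(n, edges, beta, fields, i):
    """<x_i> under weights exp(beta * sum_edges x_a x_b + sum_v fields[v] * x_v)."""
    total = 0.0
    weighted = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        w = math.exp(beta * sum(x[a] * x[b] for a, b in edges)
                     + sum(f * s for f, s in zip(fields, x)))
        total += w
        weighted += x[i] * w
    return weighted / total


path = [(0, 1), (1, 2), (2, 3)]
beta, B, delta = 0.6, 0.3, 0.05

# Griffiths-type monotonicity: adding an edge or raising a field cannot lower <x_0>.
m_path = magnetization(4, path, beta, [B] * 4, 0)
m_cycle = magnetization(4, path + [(3, 0)], beta, [B] * 4, 0)
m_more_field = magnetization(4, path, beta, [B, B, B + delta, B], 0)

# GHS-type concavity: the second difference of <x_0> in the field at vertex 2 is <= 0.
m_less_field = magnetization(4, path, beta, [B, B, B - delta, B], 0)
second_difference = m_more_field - 2 * m_path + m_less_field
print(m_path, m_cycle, m_more_field, second_difference)
```

All fields in the example stay nonnegative, which is the regime in which the two inequalities are stated.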

    Finally, we need the following elementary inequality:

    Lemma 3.3

    For any function and distributions , on the finite set such that and ,

    (18)

    In particular, if , then the right-hand side is bounded by .

    Proof.

    Assuming without loss of generality that , the left-hand side of (18) can be bounded as

    This implies the lemma.

    4 Ising models on trees

    We prove in this section certain facts about Ising models on trees which are of independent interest and as a byproduct we deduce Lemma 2.1 and the theorems of Section 2.3. In doing so, recall that for each the Ising models on with free and plus boundary conditions are

    (19)
    (20)

    Equivalently is the Ising model (16) on with magnetic fields and is the modified Ising model corresponding to the limit for all . To simplify our notation we denote such limits hereafter simply by setting and use for statements that apply to both free and plus boundary conditions.

    We start with the following simple but useful observation.

    Lemma 4.1

    For a subtree of a finite tree let denote the subset of vertices of connected by an edge to and for each let denote the root magnetization of the Ising model on the maximal subtree of rooted at . The marginal on of the Ising measure on , denoted is then an Ising measure on with magnetic field for and for .

    Proof.

    Since is a subtree of the tree , the subtrees for are disjoint. Therefore, with denoting the Ising model distribution for we have that

    (21)

    for the Boltzmann weight

    Further, so for each and some constants ,

    Embedding the normalization constants within we thus conclude that is an Ising measure on with the stated magnetic field . Finally, comparing the root magnetization for with that for we have by Griffiths inequality that , as claimed.
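The mechanism behind this lemma is already visible when the removed part of the tree is a single leaf. As a worked illustration, suppose a leaf $l$ is attached to a vertex $i$ and that, after summing over everything below it, the leaf carries an effective field $h$ (so that its own weight is proportional to $e^{h x_l}$). Summing over $x_l$ gives a pure field term on $x_i$:

```latex
\[
\sum_{x_l \in \{+1,-1\}} e^{\beta x_i x_l + h x_l}
= 2\cosh(\beta x_i + h)
= c(\beta,h)\, e^{\xi(\beta,h)\, x_i},
\qquad
\tanh \xi(\beta,h)
= \frac{\cosh(\beta+h) - \cosh(\beta-h)}{\cosh(\beta+h) + \cosh(\beta-h)}
= \tanh(\beta)\tanh(h),
\]
\[
c(\beta,h) = 2\sqrt{\cosh(\beta+h)\cosh(\beta-h)} .
\]
```

Hence removing the leaf leaves an Ising measure on the remaining tree in which the field at $i$ is increased by $\xi(\beta,h) = \operatorname{atanh}[\tanh(\beta)\tanh(h)]$, which is nonnegative whenever $\beta, h \ge 0$. Iterating this step from the leaves inward is what produces the modified fields on the boundary of the subtree.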

    Theorem 4.2

    Suppose is a conditionally independent infinite tree of average offspring numbers bounded by , as in Definition 2.2. For , and finite, there exist and finite such that if for all and for all , , then

    (22)

    for , all and .

    Proof.

    Fixing it suffices to consider [for which the left-hand side of (22) is maximal]. For this and we have that and , where in this case the Boltzmann weight in (21) is bounded above by and below by for . Further, the plus and free boundary conditions then differ in (21) by having the corresponding boundary conditions at generation of each subtree , which we distinguish by using instead of . Since the total variation distance between two product measures is at most the sum of the distance between their marginals, upon applying Lemma 3.3 we deduce from (21) that

    By our assumptions, conditional on , the subtrees of denoted hereafter also by are for independent of each other. Further, is precisely the magnetization of their root vertex under plus/free boundary conditions at generation . Thus, taking (and using the inequality ), it suffices to show that the magnetizations at the root of any such conditionally independent infinite tree satisfy , for some finite, all and , where we have removed the absolute value since by Griffiths inequality. For the reader's convenience, this fact is proved in the next lemma.

    Lemma 4.3

    Suppose is a conditionally independent infinite tree of average offspring numbers bounded by . For , and finite, there exist such that

    (23)

    where are the root magnetizations under and free boundary condition on .

    Proof.

    Note that (23) trivially holds for [in which case ]. Assuming hereafter that we proceed to prove (23) when each vertex of has a nonzero offspring number. To this end, for let

    and denote by the corresponding root magnetization. Writing instead of for constant magnetic field on the leaf nodes, that is, when for each , we note that and . Further, applying Lemma 4.1 for the subtree of we represent as the root magnetization on where for and for all other . Consequently,

    (24)

    Recall that if for , then applying Jensen’s inequality one variable at a time we have that for any independent random variables . By the GHS inequality, this is the case for , hence with denoting the conditional on expectation over the independent offspring numbers for , we deduce that

    (25)

    where the last inequality is a consequence of Griffiths inequality and our assumption that for any and all . Since each has at least one offspring whose magnetic field is at least , it follows by Griffiths inequality that is bounded below by the magnetization at the root of the subtree of where for all and for all . Applying Lemma 4.1 for and , the root magnetization for the Ising distribution on turns out to be precisely for of (9). Thus, one more application of Griffiths inequality yields that

    (26)

    Next note that and by GHS inequality is concave. Hence,

    (27)

    for the finite constant

    and all . Combining (25), (26) and (27) we obtain that

    We have seen in (26) that is nondecreasing whereas from (24) and Griffiths inequality we have that is nonincreasing. With magnetization bounded above by one, we thus get upon summing the preceding inequalities for that

    from which we deduce (23).

    Considering now the general case where the infinite tree has vertices (other than the root) of degree one, let denote the “backbone” of , that is, the subtree induced by vertices along self-avoiding paths between and . Taking as the subtree of in Lemma 4.1, note that for each the subtree contains no vertex from . Consequently, the marginal measures are Ising measures on with the same magnetic fields outside . Thus, with denoting the corresponding magnetizations at the root for , we deduce that where for all . By definition every vertex of has a nonzero offspring number and with , the required bound

    follows by the preceding argument, since is a conditionally independent tree whose offspring numbers do not exceed those of . Indeed, for , given the offspring numbers at are independent of each other [with probability of proportional to the sum over of the product of the probability of and that of precisely out of the offspring of in having a line of descendants that survives additional generations, for ].

    Simon’s inequality (see [25], Theorem 2.1) allows one to bound the (centered) two point correlation functions in ferromagnetic Ising models with zero magnetic field. We provide next its generalization to arbitrary magnetic field, in the case of Ising models on trees.

    Lemma 4.4

    If edge is on the unique path from to , with a descendant of , , then

    (28)

    where denotes the expectation with respect to the Ising distribution on the subtree of and all its descendants in and denotes the centered two point correlation function.

    Proof.

    It is not hard to check that if are -valued random variables with and conditionally independent given , then

    (29)

    In particular, under the random variables and are conditionally independent given with

    Hence, if is the unique descendant of then