Disorder chaos and multiple valleys in spin glasses

Sourav Chatterjee
367 Evans Hall #3860
Department of Statistics
University of California at Berkeley
Berkeley, CA 94720-3860
E-mail: sourav@stat.berkeley.edu
URL: http://www.stat.berkeley.edu/sourav
September 5, 2009
Abstract.

We prove that the Sherrington-Kirkpatrick model of spin glasses is chaotic under small perturbations of the couplings at any temperature in the absence of an external field. The result is proved for two kinds of perturbations: (a) distorting the couplings via Ornstein-Uhlenbeck flows, and (b) replacing a small fraction of the couplings by independent copies. We further prove that the S-K model exhibits multiple valleys in its energy landscape, in the weak sense that there are many states with near-minimal energy that are mutually nearly orthogonal. We show that the variance of the free energy of the S-K model is unusually small at any temperature. (By ‘unusually small’ we mean that it is much smaller than the number of sites; in other words, it beats the classical Gaussian concentration inequality, a phenomenon that we call ‘superconcentration’.) We prove that the bond overlap in the Edwards-Anderson model of spin glasses is not chaotic under perturbations of the couplings, even large perturbations. Lastly, we obtain sharp lower bounds on the variance of the free energy in the E-A model on any bounded degree graph, generalizing a result of Wehr and Aizenman and establishing the absence of superconcentration in this class of models. Our techniques apply to the $p$-spin models and the Random Field Ising Model as well, although we do not work out the details in these cases.

Key words and phrases:
Sherrington-Kirkpatrick model, Edwards-Anderson model, spin glass, chaos, disorder, multiple valleys, concentration of measure, low temperature phase, Gaussian field
2000 Mathematics Subject Classification:
60K35, 60G15, 82B44, 60G60, 60G70
The author’s research was partially supported by NSF grant DMS-0707054 and a Sloan Research Fellowship

1. Introduction

Spin glasses are magnetic materials with strange properties that distinguish them from ordinary ferromagnets. In statistical physics, the study of spin glasses originated with the works of Edwards and Anderson [11] and Sherrington and Kirkpatrick [33] in 1975. In the following decade, the theoretical study of spin glasses led to the invention of deep and powerful new methods in physics, most notably Parisi’s broken replica method. We refer to [26] for a survey of the physics literature.

However, these physical breakthroughs were far beyond the reach of rigorous proof at the time, and much of the theory remains so to this day. The rigorous analysis of the Sherrington-Kirkpatrick model began with the works of Aizenman, Lebowitz and Ruelle [1] and Fröhlich and Zegarliński [15] in the late eighties; for a while after that the field moved slowly, with only occasional notable papers (e.g. [8], [32]). The deepest mysteries of the broken replica analysis of the S-K model remained mathematically intractable for many more years, until the path-breaking contributions of Guerra, Toninelli, Talagrand, Panchenko and others in the last ten years (see e.g. [2], [19], [18], [30], [17], [34], [35]). Arguably the most notable achievement in this period was Talagrand’s proof of the Parisi formula [35].

Still, in spite of all this remarkable progress, our understanding of these complicated mathematical objects remains shrouded in mystery, and many conjectures remain unresolved. In this article we attempt to give a mathematical foundation to some aspects of spin glasses that have been well known in the physics community for a long time but have never before been penetrated by rigorous mathematics. Let us now embark on a description of our main results. Further references and connections with the literature will be given at the appropriate places along the way.

1.1. Weak multiple valleys in the S-K model

Consider the following simple-looking probabilistic question: Suppose $(g_{ij})_{1\le i<j\le N}$ are i.i.d. standard Gaussian random variables, and we define, for each $\sigma = (\sigma_1,\ldots,\sigma_N) \in \{-1,+1\}^N$, the quantity

\[ H_N(\sigma) := \frac{1}{\sqrt{N}}\sum_{1\le i<j\le N} g_{ij}\,\sigma_i\sigma_j. \qquad (1) \]

Then is it true that, for a given small $\epsilon > 0$, with high probability there is a large subset $\mathcal{A}$ of $\{-1,+1\}^N$ such that every $\sigma \in \mathcal{A}$ satisfies

\[ H_N(\sigma) \;\ge\; (1-\epsilon)\max_{\sigma'\in\{-1,+1\}^N} H_N(\sigma'), \qquad (2) \]

and any two distinct elements $\sigma^1, \sigma^2$ of $\mathcal{A}$ are nearly orthogonal, in the sense that

\[ R(\sigma^1,\sigma^2) := \frac{1}{N}\sum_{i=1}^N \sigma^1_i\sigma^2_i \quad\text{satisfies}\quad \big|R(\sigma^1,\sigma^2)\big| \le \epsilon\,? \qquad (3) \]

(In the spin glass literature, the quantity $R(\sigma^1,\sigma^2)$ is called the ‘overlap’ between the ‘configurations’ $\sigma^1$ and $\sigma^2$.) To realize the non-triviality of the question, consider a slightly different Gaussian field on $\{-1,+1\}^N$, defined as

\[ H'_N(\sigma) := \sum_{i=1}^N g_i\,\sigma_i, \]

where $g_1,\ldots,g_N$ are i.i.d. standard Gaussian random variables. Then clearly, $H'_N$ is maximized at $\sigma^* := (\mathrm{sign}(g_1),\ldots,\mathrm{sign}(g_N))$, where $H'_N(\sigma^*) = \sum_{i=1}^N |g_i|$. Note that for any $\sigma$,

\[ H'_N(\sigma) \;=\; H'_N(\sigma^*) \;-\; 2\sum_{i:\,\sigma_i\ne\sigma^*_i}|g_i|. \]

It is not difficult to argue from here that if $\sigma$ is another configuration that is near-maximal for $H'_N$, then $\sigma$ must agree with $\sigma^*$ at nearly all coordinates. Thus, the field $H'_N$ does not satisfy the ‘multiple peaks picture’ that we are investigating for $H_N$. This is true in spite of the fact that $H'_N(\sigma^1)$ and $H'_N(\sigma^2)$ are approximately independent for almost all pairs of configurations $\sigma^1, \sigma^2$.
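To make the last claim concrete, here is the routine computation behind it, written in the notation introduced above (the small parameter $\epsilon$ and the constant $c(\delta)$ are introduced only for this illustration). If $H'_N(\sigma) \ge (1-\epsilon)\,H'_N(\sigma^*)$, then the displayed identity gives

\[ \sum_{i:\,\sigma_i\ne\sigma^*_i}|g_i| \;\le\; \frac{\epsilon}{2}\sum_{i=1}^N |g_i| . \]

Since $N^{-1}\sum_{i=1}^N|g_i|\to\sqrt{2/\pi}$ by the law of large numbers, and for every $\delta>0$ there is a $c(\delta)>0$ such that with high probability at most $\delta N$ of the $|g_i|$ are smaller than $c(\delta)$, the set $\{i:\sigma_i\ne\sigma^*_i\}$ has size at most $\delta N + \epsilon\sum_i|g_i|/(2c(\delta)) \le 2\delta N$ once $\epsilon$ is small enough. Thus any near-maximizer of $H'_N$ agrees with $\sigma^*$ on all but a vanishing fraction of coordinates.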

We have the following result about the existence of multiple peaks in the field $H_N$ defined in (1). It says that with high probability, there is a large collection $\mathcal{A}$ of configurations satisfying (2) and (3), that is, $R(\sigma^1,\sigma^2)$ is nearly zero for any two distinct $\sigma^1, \sigma^2 \in \mathcal{A}$, and $H_N(\sigma)$ is nearly maximal for each $\sigma \in \mathcal{A}$.

Theorem 1.1.

Let $H_N$ be the field defined in (1), and define the overlap $R$ between configurations by the formula (3). Let

Then there are constants , , , and such that with probability at least , there is a set $\mathcal{A} \subseteq \{-1,+1\}^N$ satisfying

  (a) ,

  (b) for all , , and

  (c) for all .

Quantitatively, we can take , , and , where is an absolute constant. However these are not necessarily the best choices.

Let us now discuss the implication of this result in spin glass theory. The Sherrington-Kirkpatrick model of spin glasses, introduced in [33], is defined through the Hamiltonian (i.e. energy function)

\[ -H_N(\sigma) \;=\; -\frac{1}{\sqrt{N}}\sum_{1\le i<j\le N} g_{ij}\,\sigma_i\sigma_j, \qquad (4) \]

where $H_N$ is exactly the field defined in (1).

The S-K model at inverse temperature $\beta$ defines a probability measure $G_\beta$ on $\{-1,+1\}^N$ through the formula

\[ G_\beta(\{\sigma\}) := \frac{e^{\beta H_N(\sigma)}}{Z_N(\beta)}, \qquad (5) \]

where $Z_N(\beta) := \sum_{\sigma'\in\{-1,+1\}^N} e^{\beta H_N(\sigma')}$ is the normalizing constant. The measure $G_\beta$ is called the Gibbs measure.

According to the folklore in the statistical physics community, the energy landscape of the S-K model has ‘multiple valleys’. Although no precise formulation is available, one way to view this is that there are many nearly orthogonal states with nearly minimal energy. For a physical discussion of the ‘many states’ aspect of the S-K model, we refer to [26], Chapter III. A very interesting rigorous formulation was attempted by Talagrand (see [34], Conjecture 2.2.23), but no theorems were proved. Our achievement is quite modest and may not satisfy physicists, because we do not prove that the approximate minimum energy states correspond to significantly large regions of the state space; indeed, one may object that this is not what the physical term ‘multiple valleys’ means at all, since an isolated low energy state does not necessarily represent a valley. Nevertheless, it does seem that Theorem 1.1 is the first rigorous result about the multimodal geometry of the Sherrington-Kirkpatrick energy landscape. We may call it ‘multiple valleys in a weak sense’.

Theorem 1.1 can be generalized to the following Corollary, which shows that weak multiple valleys exist at ‘every energy level’ and not only for the lowest energy.

Corollary 1.2.

Let all notation be the same as in Theorem 1.1. Fix a number . Then for all sufficiently large , with probability at least there exists a set satisfying conditions (a) and (b) of Theorem 1.1, such that for all .

The variables $g_{ij}$ in the Hamiltonian are collectively called the ‘couplings’ or the ‘disorder’. Our proof of Theorem 1.1 is based on the chaotic nature of the S-K model under small perturbations of the couplings; this is discussed in the next subsection. The relation between chaos and multiple valleys follows from a general principle outlined in [7], although the proof in the present paper is self-contained.

1.2. Disorder chaos in the S-K model

Recall the Gibbs measure $G_\beta$ of the S-K model, defined in (5). Suppose $\sigma^1$ and $\sigma^2$ are two configurations drawn independently according to the measure $G_\beta$, and the overlap $R(\sigma^1,\sigma^2)$ is defined as in (3). It is known that when $\beta < 1$, $R(\sigma^1,\sigma^2) \approx 0$ with high probability [15, 8, 34]. However, it is also known that the overlap cannot be concentrated near zero for all $\beta$, because that would contradict the existence of a phase transition, as established in [1]. In fact, it is believed that the limiting distribution of the overlap in the low temperature phase is given by the so-called ‘Parisi measure’, a notion first made rigorous by Talagrand [35, 36].

Now suppose we choose $\sigma^2$ not from the Gibbs measure $G_\beta$, but from a new Gibbs measure based on a new Hamiltonian, obtained by applying a small perturbation to the original one. (We will make precise the notion of a small perturbation below.) Is it still true that $R(\sigma^1,\sigma^2)$ has a non-degenerate limiting distribution at low temperatures? The conjecture of disorder chaos (i.e. chaos with respect to small fluctuations in the disorder $(g_{ij})$) states that this is no longer the case: the overlap is concentrated near zero if $\sigma^1$ is picked from the Gibbs measure $G_\beta$ and $\sigma^2$ is picked from a perturbed Gibbs measure. This is supposed to be true at all temperatures. To the best of our knowledge, disorder chaos for the S-K model was first discussed in the widely cited paper of Bray and Moore [5]; a related discussion appears in the earlier paper [25]. The phenomenon of chaos itself was first conjectured by Fisher and Huse [13] in the context of the Edwards-Anderson model, although the term was coined in [5]. Again, to the best of our knowledge, nothing has been proved rigorously yet. For further references in the physics literature, we refer to the recent paper [24].

Note that this idea of chaos should not be confused with temperature chaos (also discussed in [5]), which says that spin glasses are chaotic with respect to small changes in the inverse temperature $\beta$.

We shall consider two kinds of perturbation of the disorder. The first, which we call ‘discrete perturbation’, is executed by replacing a randomly chosen small fraction of the couplings by independent copies. Here ‘small fraction’ means a fraction that goes to zero as $N \to \infty$. Discrete perturbation is the usual way to proceed in the noise-sensitivity literature (see e.g. [3, 4, 31, 27, 16]). In fact, it seems that the following result is intimately connected with noise-sensitivity, although we do not see any obvious way to use the standard noise-sensitivity techniques to derive it.

Theorem 1.3.

Consider the S-K model at inverse temperature $\beta$. Take any and . Suppose a randomly chosen fraction of the couplings are replaced by independent copies to give a perturbed Gibbs measure. Let $\sigma^1$ be chosen from the original Gibbs measure and $\sigma^2$ from the perturbed measure. Let the overlap $R(\sigma^1,\sigma^2)$ be defined as in (3). Then

where is an absolute constant and the expectation is taken over all randomness.

This theorem shows that the system is chaotic if the fraction of replaced couplings goes to zero more slowly than . The derivation of this result is based on the ‘superconcentration’ property of the free energy in the S-K model that we present in the next subsection.

The notion of perturbation in the above theorem, though natural, is not the only available notion. In fact, in the original physics papers (e.g. [5]), a different manner of perturbation is proposed, which we call continuous perturbation. Here we replace each coupling $g_{ij}$ by $\sqrt{1-\epsilon}\,g_{ij} + \sqrt{\epsilon}\,g'_{ij}$, where $(g'_{ij})$ is another set of independent standard Gaussian random variables and $\epsilon \in [0,1]$, so that the resultant couplings are again standard Gaussian. When $\epsilon$ is close to zero, we say that the perturbation is small. A convenient way to parametrize the perturbation is to set $\epsilon = 1 - e^{-2t}$, where $t \ge 0$ is a parameter that we call ‘time’. This nomenclature is natural, because perturbing the couplings up to time $t$ corresponds to running an Ornstein-Uhlenbeck flow at each coupling for time $t$, with initial value $g_{ij}$. The following theorem says that the S-K model is chaotic under small continuous perturbations.

Theorem 1.4.

Consider the S-K model at inverse temperature $\beta$. Take any $t > 0$. Suppose we continuously perturb the couplings up to time $t$, as defined above. Let $\sigma^1$ be chosen from the original Gibbs measure and $\sigma^2$ be chosen from the perturbed measure. Let the overlap $R(\sigma^1,\sigma^2)$ be defined as in (3). Then there is an absolute constant such that for any positive integer ,

The expectation is taken over all randomness.

Again, the achievement is very modest, and does not come anywhere close to the claims of the physicists. But once again, this is the first rigorous result about chaos of any kind in the S-K model. To the best of our knowledge, the only other instance of a rigorous proof of chaos in any spin glass model is in the work of Panchenko and Talagrand [30], who established chaos with respect to small changes in the external field in the spherical S-K model. Disorder chaos in directed polymers was established by the author in [7].

A deficiency of both theorems in this subsection is that they do not cover the case of zero temperature, that is, $\beta = \infty$, where the Gibbs measure concentrates all its mass on the ground state. In principle, the same techniques should apply, but there are some crucial hurdles that cannot be cleared with the available ideas.
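To make the two notions of perturbation concrete, here is a small, purely illustrative numerical sketch; it is not part of the mathematical development, and the parameter values, variable names and the choice of monitoring the average squared overlap are ours. For a single realization of the disorder and a small $N$, it computes the average of $R(\sigma^1,\sigma^2)^2$ when $\sigma^1$ is drawn from the original Gibbs measure and $\sigma^2$ from a perturbed one, by exact enumeration of all configurations.

import numpy as np
from itertools import product

# Illustrative only: brute-force computation of the average squared overlap <R^2>
# between a configuration from the S-K Gibbs measure and one from a perturbed
# Gibbs measure, for one realization of the disorder.

rng = np.random.default_rng(0)

def sk_field(g, sigmas):
    # H_N(sigma) = N^{-1/2} * sum_{i<j} g_ij sigma_i sigma_j, for all configurations at once
    N = g.shape[0]
    iu = np.triu_indices(N, k=1)
    pair_products = sigmas[:, iu[0]] * sigmas[:, iu[1]]
    return pair_products @ g[iu] / np.sqrt(N)

def gibbs_weights(g, sigmas, beta):
    # Gibbs probabilities proportional to exp(beta * H_N(sigma))
    h = sk_field(g, sigmas)
    w = np.exp(beta * (h - h.max()))  # subtract the max for numerical stability
    return w / w.sum()

def avg_sq_overlap(g1, g2, sigmas, beta):
    # <R^2> with sigma^1 ~ Gibbs(g1) and sigma^2 ~ Gibbs(g2), drawn independently
    N = sigmas.shape[1]
    p1 = gibbs_weights(g1, sigmas, beta)
    p2 = gibbs_weights(g2, sigmas, beta)
    R = sigmas @ sigmas.T / N  # overlap of every pair of configurations
    return p1 @ (R ** 2) @ p2

N, beta, t, fraction = 10, 2.0, 0.05, 0.05
sigmas = np.array(list(product([-1.0, 1.0], repeat=N)))
iu = np.triu_indices(N, k=1)

g = np.zeros((N, N))
g[iu] = rng.standard_normal(len(iu[0]))  # couplings g_ij for i < j

# Continuous (Ornstein-Uhlenbeck) perturbation up to time t:
# g_ij -> exp(-t) g_ij + sqrt(1 - exp(-2t)) g'_ij, so the couplings stay standard Gaussian.
g_ou = np.zeros((N, N))
g_ou[iu] = np.exp(-t) * g[iu] + np.sqrt(1.0 - np.exp(-2.0 * t)) * rng.standard_normal(len(iu[0]))

# Discrete perturbation: replace a randomly chosen fraction of the couplings by fresh copies.
g_disc = g.copy()
mask = rng.random(len(iu[0])) < fraction
g_disc[iu[0][mask], iu[1][mask]] = rng.standard_normal(int(mask.sum()))

print("<R^2>, no perturbation:      ", avg_sq_overlap(g, g, sigmas, beta))
print("<R^2>, OU perturbation:      ", avg_sq_overlap(g, g_ou, sigmas, beta))
print("<R^2>, discrete perturbation:", avg_sq_overlap(g, g_disc, sigmas, beta))

At such small $N$, of course, the asymptotic statements of Theorems 1.3 and 1.4 are not visible; the sketch only fixes the meaning of the two perturbation schemes.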

1.3. Superconcentration in the S-K model

The notion of superconcentration was defined in [7]. The definition in [7] pertains only to maxima of Gaussian fields, but it can be generalized to roughly mean the following: a Lipschitz function of a collection of independent standard Gaussian random variables is superconcentrated whenever its order of fluctuations is much smaller than its Lipschitz constant. This definition is related to the classical concentration result for the Gaussian measure, which says that the order of fluctuations of a Lipschitz function under the Gaussian measure is bounded by its Lipschitz constant (see e.g. Theorem 2.2.4 in [34]), irrespective of the dimension.
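One standard quantitative form of the classical fact just mentioned is the Gaussian Poincaré inequality (recalled here for comparison; the formulation is not specific to this paper): if $X$ is a standard Gaussian random vector in $\mathbb{R}^n$ and $f:\mathbb{R}^n\to\mathbb{R}$ is Lipschitz with constant $L$, then

\[ \mathrm{Var}\big(f(X)\big) \;\le\; \mathbb{E}\,|\nabla f(X)|^2 \;\le\; L^2, \]

irrespective of the dimension $n$. In this language, superconcentration is the assertion that $\mathrm{Var}(f(X)) = o(L^2)$.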

The free energy of the S-K model is defined as

\[ F_N(\beta) := \frac{1}{\beta}\log Z_N(\beta) = \frac{1}{\beta}\log \sum_{\sigma\in\{-1,+1\}^N} e^{\beta H_N(\sigma)}, \qquad (6) \]

where $-H_N$ is the Hamiltonian defined in (4). It follows from classical concentration of measure that the variance of $F_N(\beta)$ is bounded by a constant multiple of $N$ (see Corollary 2.2.5 in [34]); indeed, $\partial F_N/\partial g_{ij} = N^{-1/2}\langle\sigma_i\sigma_j\rangle$, where $\langle\cdot\rangle$ denotes the Gibbs average, so the squared Euclidean norm of the gradient of $F_N$ in the couplings is at most $\binom{N}{2}/N \le N/2$. This is the best known bound for $\beta \ge 1$. When $\beta < 1$, Talagrand (Theorems 2.2.7 and 2.2.13 in [34]) proved that the variance can actually be bounded by an absolute constant. This is also indicated in the earlier works of Aizenman, Lebowitz and Ruelle [1] and Comets and Neveu [8]. Therefore, according to our definition, the free energy is superconcentrated when $\beta < 1$. The following theorem shows that $F_N(\beta)$ is superconcentrated at any $\beta$.

Theorem 1.5.

Let $F_N(\beta)$ be the free energy of the S-K model defined above in (6). For any $\beta$, we have

where is an absolute constant.

This result may be reminiscent of the improved variance bound for the first passage percolation time in [4]. However, the proof is quite different in our case, since hypercontractivity, the major tool in [4], does not seem to work for spin glasses in any obvious way; in that sense, the two results are quite unrelated. Our proof is based on our chaos theorem for continuous perturbation (Theorem 1.4) and ideas from [7]. On the other hand, Theorem 1.5 is used to derive the chaos theorem for discrete perturbation, again drawing upon ideas from [7]. This equivalence between chaos and superconcentration is one of the main themes of [7]; in a way it shows the significance of superconcentration, which might otherwise be viewed as just a curious phenomenon.

Incidentally, it was shown by Talagrand ([37], eq. (10.13)) that the lower tail fluctuations of are actually as small as order under an unproven hypothesis about the Parisi measure.

1.4. Disorder chaos in the E-A model

Let $G = (V, E)$ be a finite undirected graph. The Edwards-Anderson spin glass [11] on $G$ is defined through the Hamiltonian

\[ -H(\sigma) \;=\; -\sum_{(i,j)\in E} g_{ij}\,\sigma_i\sigma_j, \qquad \sigma \in \{-1,+1\}^V, \qquad (7) \]

where $(g_{ij})_{(i,j)\in E}$ is again a collection of i.i.d. random variables, often taken to be Gaussian. The S-K model corresponds to the case of the complete graph, up to normalization by $\sqrt{N}$.

For a survey of the (few) rigorous and non-rigorous results available for the Edwards-Anderson model, we refer to Newman and Stein [28].

Unlike the S-K model, there are two kinds of overlap in the E-A model. The ‘site overlap’ is the usual overlap defined in (3) (with $N = |V|$). The ‘bond overlap’ between two states $\sigma^1$ and $\sigma^2$, on the other hand, is defined as

\[ Q(\sigma^1,\sigma^2) := \frac{1}{|E|}\sum_{(i,j)\in E} \sigma^1_i\sigma^1_j\,\sigma^2_i\sigma^2_j. \qquad (8) \]
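With the normalizations assumed in (3) and (8), the two overlaps are directly related on the complete graph, which is one reason why only the site overlap was needed in the discussion of the S-K model: there,

\[ Q(\sigma^1,\sigma^2) \;=\; \binom{N}{2}^{-1}\sum_{1\le i<j\le N}\sigma^1_i\sigma^1_j\,\sigma^2_i\sigma^2_j \;=\; \frac{N\,R(\sigma^1,\sigma^2)^2 - 1}{N-1}, \]

so the bond overlap is essentially the square of the site overlap. On a bounded degree graph there is no such identity, and the two overlaps can behave quite differently.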

We show that the bond overlap in the E-A model is not chaotic with respect to small fluctuations of the couplings, at any temperature. This does not say anything about the site overlap; the site overlap in the E-A model may well be chaotic with respect to small fluctuations of the couplings, as predicted in [13, 5].

Theorem 1.6.

Suppose the E-A Hamiltonian (7) on a graph $G$ is continuously perturbed up to time $t$, according to the definition of continuous perturbation in Section 1.2. Let $\sigma^1$ be chosen from the original Gibbs measure at inverse temperature $\beta$ and $\sigma^2$ be chosen from the perturbed measure. Let the bond overlap $Q(\sigma^1,\sigma^2)$ be defined as in (8). Let

where is the maximum degree of . Then

where is a positive absolute constant. Moreover, the result holds for $\beta = \infty$ also, with the interpretation that the Gibbs measure at $\beta = \infty$ is just the uniform distribution on the set of ground states.

An interesting case of the above theorem is when $t = 0$. The result then says that if two configurations are drawn independently from the same Gibbs measure, they have a non-negligible bond overlap with non-vanishing probability. The fact that this holds at any finite temperature is in contrast with the mean-field case (i.e. the S-K model), where there is a high-temperature phase ($\beta < 1$) in which the bond overlap becomes negligible.

However, while Theorem 1.6 establishes that the bond overlap does not become zero under any amount of perturbation, the system does exhibit a sort of ‘quenched chaos’, in the following sense.

Theorem 1.7.

Fix and let be as in Theorem 1.6. Then

That is, if we perturb the system by an amount , the bond overlap between two configurations drawn from the two Gibbs measures is approximately equal to the quenched average of the overlap. In physical terms, the overlap ‘self-averages’.

The combination of the last two theorems brings to light a surprising phenomenon. On the one hand, the perturbation retains a memory of the original Gibbs measure, because the overlap is non-vanishing in Theorem 1.6. On the other hand, the perturbation causes a chaotic reorganization of the Gibbs measure in such a way that the overlap concentrates on a single value in Theorem 1.7. The author can see no clear explanation of this confusing outcome.

1.5. Absence of superconcentration in the E-A model

The proof of Theorem 1.6 is based on the following result, which says that the free energy is not superconcentrated in the E-A model on bounded degree graphs. This generalizes a well-known result of Wehr and Aizenman [38], who proved the analogous result on square lattices. The relative advantage of our approach is that it does not use the structure of the graph, whereas the Wehr-Aizenman proof depends heavily on properties of the lattice.

Theorem 1.8.

Let denote the free energy in the Edwards-Anderson model on a graph $G$, defined as in (6). Let be the maximum degree of $G$. Then for any $\beta$, including $\beta = \infty$ (where the free energy is just the energy of the ground state), we have

The above result is based on a formula (Theorem 3.11) for the variance of an arbitrary smooth function of Gaussian random variables.

1.6. A note about other models

It will be clear from our proofs that the chaos and superconcentration results hold for the $p$-spin versions of the S-K model for even $p$. (See Chapter 6 of [34] for the definition of these models and various results.) In fact, a generalization of Theorem 1.4 is proved in Theorem 3.5 later, which includes the $p$-spin models for even $p$.

It will also be clear that the lack of superconcentration holds for the Random Field Ising Model on general bounded degree graphs. (Again, the lattice case is handled in [38]; we refer to [38] for the definition of the RFIM.) The absence of superconcentration in the RFIM implies that the site overlap, rather than the bond overlap as in the E-A model, is stable under perturbations.

A simple model where our techniques give sharp results is the Random Energy Model (REM). This is discussed in Subsection 3.14.
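For the reader's convenience, we recall that (up to normalization conventions) the REM replaces the correlated field (1) by independent energies:

\[ H_N(\sigma) := \sqrt{N}\,g_\sigma, \qquad \sigma \in \{-1,+1\}^N, \]

where $(g_\sigma)_{\sigma\in\{-1,+1\}^N}$ are i.i.d. standard Gaussian random variables, so that the field retains the order-$\sqrt{N}$ scale of (1) while the energies of distinct configurations are exactly independent.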

1.7. Unsolved questions

In spite of the progress made in this paper over [7], many key issues are still out of reach. Some of them are as follows:

  1. Improve the multiple valley theorem (Theorem 1.1) so that is a negative power of , preferably better than , which will prove ‘strong multiple valleys’ in the sense of [7].

  2. Another possible improvement to Theorem 1.1 can be achieved by increasing to something of the form .

  3. Prove the chaos theorems (Theorems 1.3 and 1.4) for the ground state ($\beta = \infty$) of the S-K model.

  4. Improve the superconcentration result (Theorem 1.5) so that the right hand side is for some . This is tied to the improvement of the chaos result.

  5. If the above is not possible, at least prove a version of the superconcentration result where the right hand side does not depend on $\beta$, or at least has a better dependence on $\beta$. This would solve the question of chaos for $\beta = \infty$.

  6. Prove that the site overlap in the Edwards-Anderson model is chaotic with respect to fluctuations in the disorder, even though the bond overlap is not.

  7. Prove disorder chaos in the S-K model with nonzero external field, that is, when there is an additional term of the form $h\sum_{i=1}^N \sigma_i$ in the Hamiltonian. The general nature of the S-K model indicates that any result for $h \ne 0$ may be substantially harder to prove than for $h = 0$. (Reportedly, a sketch of the proof in this case will appear in the new edition of [34].)

  8. Show that in the E-A model, the variance of tends to zero as the graph size goes to infinity.

  9. Establish temperature chaos in any of these models.

The rest of the paper is organized as follows. In Section 2, we sketch the proofs of the main results. In Section 3, we present some general results that cover a wider class of Gaussian fields, together with the proofs of all the results stated above.

2. Proof sketches

In this section we give very short sketches of some of the main ideas of this paper.

2.1. Multiple valleys from chaos

Suppose we choose $\sigma^1$ from the Gibbs measure at inverse temperature $\beta$ and $\sigma^2$ from the measure obtained by applying a continuous perturbation up to time $t$. Let $H_N$ and $H^t_N$ be the original and the perturbed fields, so that $-H_N$ and $-H^t_N$ are the two Hamiltonians. Suppose $\beta \to \infty$ and $t \to 0$ sufficiently slowly so that chaos holds (i.e. $R(\sigma^1,\sigma^2) \to 0$ as $N \to \infty$). Clearly this is possible by Theorem 1.4. Then due to chaos, $\sigma^1$ and $\sigma^2$ are approximately orthogonal. Since $\beta$ is large, $\sigma^1$ nearly minimizes $-H_N$ and $\sigma^2$ nearly minimizes $-H^t_N$. But, since $t$ is small, $H^t_N \approx H_N$. Thus, $\sigma^1$ and $\sigma^2$ both nearly minimize $-H_N$. This procedure finds two states that have nearly minimal energy and are nearly orthogonal. Repeating this procedure, we find many such states. The details of this argument are worked out in Subsection 3.3.
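One way to make the step ‘$\sigma^1$ and $\sigma^2$ both nearly minimize the energy’ quantitative is the following elementary observation, where the error parameters $\eta$ and $\delta$ are introduced only for this illustration:

\[ \text{if } \sup_\sigma\big|H_N(\sigma)-H^t_N(\sigma)\big|\le\eta \ \text{ and } \ H^t_N(\sigma^2)\ge\max_\sigma H^t_N(\sigma)-\delta, \ \text{ then } \ H_N(\sigma^2)\ge\max_\sigma H_N(\sigma)-\delta-2\eta. \]

Combined with the fact that $\sup_\sigma|H_N(\sigma)-H^t_N(\sigma)|$ is small compared to $\max_\sigma H_N(\sigma)$ when $t$ is small, this shows that $\sigma^1$ and $\sigma^2$ are both near-minimizers of the same energy $-H_N$.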

2.2. Superconcentration iff chaos under continuous perturbations

Let $\langle\cdot\rangle_t$ denote the Gibbs average when $\sigma^1$ is drawn from the unperturbed Gibbs measure at inverse temperature $\beta$ and $\sigma^2$ is drawn, independently, from the Gibbs measure continuously perturbed up to time $t$. Let $F_N(\beta)$ be the free energy defined in (6). Then we show that

\[ \mathrm{Var}\big(F_N(\beta)\big) \;=\; \frac12\int_0^\infty e^{-t}\Big(N\,\mathbb{E}\big\langle R(\sigma^1,\sigma^2)^2\big\rangle_t - 1\Big)\,dt. \qquad (9) \]

The proof of this result (Theorem 3.8) is simply a combination of the heat equation for the Ornstein-Uhlenbeck process and integration by parts. The formula directly shows that $\mathrm{Var}(F_N(\beta)) = o(N)$ whenever $\mathbb{E}\langle R(\sigma^1,\sigma^2)^2\rangle_t$ falls off sharply to zero, which is a way of saying that chaos implies superconcentration.
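For the reader's convenience, here is the semigroup identity from which a formula of the type (9) follows; it is stated for the conventions fixed in (1), (5) and (6), and the constants in Theorem 3.8 may be organized differently. For a smooth function $f$ of a standard Gaussian vector $X$, and $X^t := e^{-t}X + \sqrt{1-e^{-2t}}\,X'$ with $X'$ an independent copy of $X$,

\[ \mathrm{Var}\big(f(X)\big) \;=\; \int_0^\infty e^{-t}\,\mathbb{E}\big[\nabla f(X)\cdot\nabla f(X^t)\big]\,dt, \]

which is proved by writing $\mathbb{E}[f(X)\,P_tf(X)]$ in terms of the Ornstein-Uhlenbeck semigroup $P_t$, differentiating in $t$ (the heat equation for $P_t$), and using Gaussian integration by parts. Taking $f = F_N(\beta)$ and $X = (g_{ij})$, and writing $\langle\cdot\rangle_g$ for the Gibbs average with couplings $g$, one has $\partial F_N/\partial g_{ij} = N^{-1/2}\langle\sigma_i\sigma_j\rangle_g$, so

\[ \mathbb{E}\big[\nabla F_N(g)\cdot\nabla F_N(g^t)\big] \;=\; \frac1N\sum_{1\le i<j\le N}\mathbb{E}\big[\langle\sigma_i\sigma_j\rangle_g\,\langle\sigma_i\sigma_j\rangle_{g^t}\big] \;=\; \frac12\Big(N\,\mathbb{E}\big\langle R(\sigma^1,\sigma^2)^2\big\rangle_t-1\Big), \]

where the last step uses $\sum_{i<j}\sigma^1_i\sigma^2_i\,\sigma^1_j\sigma^2_j = \frac12\big((\sum_i\sigma^1_i\sigma^2_i)^2-N\big)$. This gives (9) as displayed above.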

In Subsection 3.1, we show that $t \mapsto \mathbb{E}\langle R(\sigma^1,\sigma^2)^2\rangle_t$ is a nonnegative and decreasing function. This proves the converse implication, since the integral of a nonnegative decreasing function can be small only if the function drops off sharply to zero.

2.3. Chaos under continuous perturbations

Suppose $\sigma^1$ is drawn from the Gibbs measure of the S-K model at inverse temperature $\beta$, and $\sigma^2$ from the measure continuously perturbed up to time $t$. Let $R(\sigma^1,\sigma^2)$ be the overlap of $\sigma^1$ and $\sigma^2$, as usual, and let

\[ f(t) := \mathbb{E}\big\langle R(\sigma^1,\sigma^2)^2\big\rangle_t . \]

We have to show that for all ,

where is some constant that depends only on .

By repeated applications of differentiation and Gaussian integration-by-parts, we show that $(-1)^k f^{(k)}(t) \ge 0$ for all $t > 0$ and all integers $k \ge 0$. Here $f^{(k)}$ denotes the $k$th derivative of $f$. Such functions are called completely monotone. Now, by a classical theorem of Bernstein about completely monotone functions, there is a probability measure $\mu$ on $[0,\infty)$ such that

\[ f(t) = f(0)\int_{[0,\infty)} e^{-tx}\,d\mu(x) \quad \text{for all } t \ge 0. \qquad (10) \]

By Hölder’s inequality and the above representation, it follows that for $0 < s < t$,

\[ f(s) \;\le\; f(t)^{s/t}\,f(0)^{1-s/t}. \]

In other words, chaos under large perturbations implies chaos under small perturbations. Thus, it suffices to prove the required bound on $f(t)$ for sufficiently large $t$.
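For completeness, here is the Hölder computation behind the last display, carried out with the representation (10) as written above (the probability measure $\mu$ and the prefactor $f(0)$ are part of that reconstruction). For $0 < s < t$, Hölder's inequality with exponents $t/s$ and $t/(t-s)$ gives

\[ f(s) \;=\; f(0)\int_{[0,\infty)}e^{-sx}\,d\mu(x) \;\le\; f(0)\Big(\int_{[0,\infty)}e^{-tx}\,d\mu(x)\Big)^{s/t} \;=\; f(t)^{s/t}\,f(0)^{1-s/t}, \]

since $\mu$ has total mass one. As $f(0) = \mathbb{E}\langle R(\sigma^1,\sigma^2)^2\rangle_0 \le 1$, a good bound on $f$ at a single large time therefore yields a bound at every smaller time.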

The next step is an ‘induction from infinity’. It is not difficult to see that when $t = \infty$, after integrating out the disorder, $\sigma^1$ and $\sigma^2$ are independent and uniformly distributed on $\{-1,+1\}^N$. From this it follows that $f(\infty) = 1/N$. We use this to obtain a similar bound on $f(t)$ for sufficiently large $t$, through the following steps. First, we show that for any and ,

Thus, we have a chain of differential inequalities. It is possible to manipulate this chain to conclude that

The right hand side is bounded by if and only if is sufficiently large. (This is related to the fact that when is a standard Gaussian random variable, if and only if .) This completes the proof sketch. The details of the above argument are worked out in Subsection 3.1.

2.4. Chaos in E-A model

The proof of Theorem 1.6, again, is based on the representation (9) of the variance of the free energy and the representation (10) of the function (both of which hold for the E-A model as well). From (10), it follows that there is a nonnegative random variable such that for all ,

From this and (9) it follows that

Next, we prove a simple analytical fact: Suppose is a nonnegative random variable and let . Then for any ,

Using this inequality for the random variable and the lower bound on the variance from Theorem 1.8, it is easy to obtain the required lower bound on the function , which establishes the absence of chaos. The details of this argument are presented in Subsection 3.7.
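The statement of the ‘simple analytical fact’ is suppressed above; one elementary inequality in this spirit, which converts a lower bound on $\mathbb{E}[1/(1+X)]$ (the kind of quantity produced by integrating $e^{-t}\,\mathbb{E}[e^{-tX}]$ over $t$, as in (9)) into a lower bound on $\mathbb{E}[e^{-tX}]$ for each fixed $t$, is the following; it is recorded only as an illustration and need not coincide with the fact proved in Subsection 3.7. For any nonnegative random variable $X$ and any $a, t > 0$,

\[ \mathbb{E}\big[e^{-tX}\big] \;\ge\; e^{-ta}\,\mathbb{P}(X\le a) \;\ge\; e^{-ta}\Big(\mathbb{E}\Big[\frac{1}{1+X}\Big]-\frac{1}{1+a}\Big), \]

because $\mathbb{E}[1/(1+X)] \le \mathbb{P}(X\le a) + 1/(1+a)$.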

The proof of Theorem 1.7 involves a new idea. Let , and let be independent copies of . For each , let

For each , let denote a configuration drawn from the Gibbs measure defined by the disorder . For , we assume that and are independent given . Define

By a similar logic as in the derivation of (10), one can show that is a completely monotone function. Also, is bounded by . Thus, for any ,

(11)

Now fix and let

It turns out that

and

where is the bond overlap between and . Combining these two identities with the inequality (11), it is easy to complete the proof of Theorem 1.7. The details are in Subsection 3.11.

2.5. Chaos under discrete perturbations

Let , and let be an independent copy of . For any , let be the array whose th component is

Let be the free energy, considered as a function of . Suppose and are constants such that for all ,

Fix , and let be a subset of , chosen uniformly at random from the collection of all subsets of size . Let be chosen from the Gibbs measure at inverse temperature defined by the disorder , and let be drawn from the Gibbs measure defined by . Let denote the overlap of and , as usual. The key step is to prove that for some absolute constant ,

This inequality is the content of Theorem 3.14. The proof is completed by showing that we can choose and such that , and using the superconcentration bound (Theorem 1.5) on the variance of . The details of the proof are given in Subsection 3.12.

2.6. No superconcentration in the E-A model

Although this result was already proven in [38] for the E-A model on lattices, it may be worth sketching our argument for general bounded degree graphs here. Our proof is based on a general lower bound for arbitrary functions of Gaussian random variables. The result (Theorem 3.12) goes as follows: Suppose is an absolutely continuous function such that there is a version of its gradient that is bounded on bounded sets. Let be a standard Gaussian random vector in , and suppose and are both finite. Then

where denotes the usual inner product on . We apply this result to the Gaussian vector , taking the function to be the free energy . A few tricks are required to get a lower bound on the right hand side that does not blow up as .

Incidentally, the above lower bound on the variance of Gaussian functionals is based on a multidimensional Plancherel formula that may be of independent interest: for sufficiently smooth $f, g:\mathbb{R}^n\to\mathbb{R}$ and a standard Gaussian random vector $X$ in $\mathbb{R}^n$,

\[ \mathrm{Cov}\big(f(X),\,g(X)\big) \;=\; \sum_{k=1}^{\infty}\frac{1}{k!}\sum_{i_1,\ldots,i_k=1}^{n} \mathbb{E}\Big[\frac{\partial^k f(X)}{\partial x_{i_1}\cdots\partial x_{i_k}}\Big]\; \mathbb{E}\Big[\frac{\partial^k g(X)}{\partial x_{i_1}\cdots\partial x_{i_k}}\Big]. \qquad (12) \]

Versions of this formula have been previously derived in the literature using expansions with respect to the multivariate orthogonal Hermite polynomial basis (see Subsection 3.8 for references). We give a different proof avoiding the use of the orthogonal basis.
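For orientation, we record two standard consequences of an identity of this type; both are classical and neither is a restatement of the theorems quoted above. First, by repeated Gaussian integration by parts, $\mathbb{E}[\partial_{i_1}\cdots\partial_{i_k}f(X)] = \mathbb{E}[f(X)\,H_{i_1\cdots i_k}(X)]$, where $H_{i_1\cdots i_k}$ denotes the corresponding multivariate Hermite polynomial, so (12) is precisely Parseval's identity for the Hermite expansions of $f$ and $g$. Second, when $f = g$ every term in (12) is a square; keeping only the $k=1$ terms gives the elementary lower bound

\[ \mathrm{Var}\big(f(X)\big) \;\ge\; \big|\mathbb{E}\,\nabla f(X)\big|^2 . \]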

3. General results about Gaussian fields and proofs

The results of Section 1 are applications of some general theorems about Gaussian fields. These are presented in this section, together with the proofs of the theorems of Section 1. Unlike the previous sections, we proceed according to the theorem-proof format in the rest of the paper.

3.1. Chaos in Gaussian fields

Let $S$ be a finite set and let $X = (X_i)_{i\in S}$ be a centered Gaussian random vector. Let

Let $X' = (X'_i)_{i\in S}$ be an independent copy of $X$, and for each $t \ge 0$, let

\[ X^t_i := e^{-t}X_i + \sqrt{1-e^{-2t}}\;X'_i, \qquad i \in S. \]

Fix . For each , define a probability measure on that assigns mass

to the point , for each . The average of a function under the measure will be denoted by , that is,

We will consider the covariance kernel as a function on , defined as . Alternatively, it will also be considered as a square matrix.

Theorem 3.1.

Assume that for all . For each , let

Let be any convergent power series on all of whose coefficients are nonnegative. Then for each ,

Moreover, is a decreasing function of .

Roughly, the way to apply this theorem is the following: prove that the right hand side is small for some large using high temperature methods, and then use the infimum to show that the smallness persists for small as well.

Since the application of Theorem 3.1 to the S-K model seems to yield a suboptimal result (Theorem 1.4), one can question whether Theorem 3.1 can ever give sharp bounds. In Subsection 3.14 we settle this issue by showing that Theorem 3.1 gives a sharp result for Derrida’s Random Energy Model.

Let us now proceed to prove Theorem 3.1. In the following, will denote the set of all infinitely differentiable real-valued functions on with bounded derivatives of all orders.

Let us first extend the definition of to negative . This is done quite simply. Let be another independent copy of that is also independent of , and for each , let

Let us now recall Gaussian integration by parts: if $F$ is an absolutely continuous function such that $\nabla F(X)$ has finite expectation, then for any $i \in S$,

\[ \mathbb{E}\big[X_i\,F(X)\big] \;=\; \sum_{j\in S}\mathbb{E}\big[X_iX_j\big]\;\mathbb{E}\big[\partial_j F(X)\big], \]

where $\partial_j F$ denotes the partial derivative of $F$ along the $j$th coordinate (see e.g. [34], Appendix A.6). The following lemma is simply a reformulated version of the above identity.

Lemma 3.2.

For any , we have

Proof.

For each , define

A simple computation gives

(Note that issues like moving derivatives inside expectations are easily taken care of due to the assumption that .) One can verify by computing covariances that and the pair are independent. Moreover,

So for any , Gaussian integration by parts gives

The proof is completed by combining the last two steps. ∎

Our next lemma is the most crucial component of the whole argument. It gives a way of extrapolating high temperature results to the low temperature regime.

Lemma 3.3.

Let be the class of all functions on that can be expressed as

for some nonnegative integer and nonnegative real numbers , and functions in . For any , there is a probability measure on such that for each ,

In particular, for any ,

Proof.

Note that any must necessarily be a nonnegative function, since and are independent and identically distributed conditional on , which gives

Now, if , then for all , and there is nothing to prove. So let us assume .

Since is a positive semidefinite matrix, there is a square matrix such that . Thus, given a function , if we define

then by Lemma 3.2 we have

From this observation and the definition of , it follows easily that if , then . Proceeding by induction, we see that for any , is a nonnegative function (where denotes the th derivative of ). Such functions on are called ‘completely monotone’. The most important property of completely monotone functions (see e.g. Feller [12], Vol. II, Section XIII.4) is that any such function can be represented as the Laplace transform of a positive Borel measure on , that is,

Moreover, . By taking , this proves the first assertion of the lemma. For the second, note that by Hölder’s inequality, we have that for any ,

This completes the proof. ∎

The next lemma is obtained by a variant of the Gaussian interpolation methods for analyzing mean field spin glasses at high temperatures. It is similar to R. Latała’s unpublished proof of the replica symmetric solution of the S-K model (to appear in the new edition of [34]).

Lemma 3.4.

Let and be as in Theorem 3.1. Then for each ,

Proof.

For each , define a function as

Note that

where if and otherwise. Since is bounded, this proves in particular that .

Take any nonnegative integer . Since is a positive semidefinite matrix, so is . (To see this, just note that if are independent copies of , then .) Therefore there exists a matrix such that . Define the functions

In the following we will denote and by and respectively, for all . Let

By Lemma 3.2 we get