Strict convexity of the free energy

Strict convexity of the free energy of the canonical ensemble under decay of correlations

Abstract.

We consider a one-dimensional lattice system of unbounded, real-valued spins. We allow arbitrarily strong, attractive, nearest-neighbor interaction. We show that the free energy of the canonical ensemble converges uniformly in $C^2$ to the free energy of the grand canonical ensemble. The error estimates are quantitative. A direct consequence is that the free energy of the canonical ensemble is uniformly strictly convex for large systems. Another consequence is a quantitative local Cramér theorem which yields the strict convexity of the coarse-grained Hamiltonian. With small adaptations, the argument could be generalized to systems with finite-range interaction on a graph, as long as the degree of the graph is uniformly bounded and the associated grand canonical ensemble has uniform decay of correlations.

Key words and phrases:
Canonical ensemble, equivalence of ensembles, local Cramér theorem, strong interaction, phase transition
2010 Mathematics Subject Classification:
Primary: 82B05, Secondary: 60F05, 82B26.

1. Introduction

The broader scope of this article is the study of phase transitions. A phase transition occurs if a microscopic change in a parameter leads to a fundamental change in one or more properties of the underlying physical system. The best-known phase transition is when water becomes ice. Many physical and non-physical systems and mathematical models have phase transitions. For example, liquid-to-gas phase transitions are known as vaporization. Solid-to-liquid phase transitions are known as melting. Solid-to-gas phase transitions are known as sublimation. More examples are the phase transition in the 2-d Ising model (see for example [Sel16]), the Erdős–Rényi phase transition in random graphs (see for example [ErRe60][ErRe61] or [KrSu13]) or phase transitions in social networks (see for example [Fro07]).

We are interested in studying a one-dimensional lattice system of unbounded real-valued spins. The system consists of a finite number of sites  on the lattice . For convenience, we assume that the set  is given by . At each site  there is a spin . In the Ising model the spins can take on the value  or . In this article, we consider real-valued spins . A configuration of the lattice system is given by a vector . The energy of a configuration  is given by the Hamiltonian  of the system. For the detailed definition of the Hamiltonian  we refer to Section 2. We consider arbitrarily strong, attractive, nearest-neighbor interaction.

We consider two ensembles of the lattice system. The first ensemble is the grand-canonical ensemble which is given by the Gibbs measure

(1)

Here,  is a generic normalization constant making the measure  a probability measure. The constant  is interpreted as an external field. The second ensemble is the canonical ensemble. It emerges from the grand-canonical ensemble by conditioning on the mean spin

(2)

The canonical ensemble is given by the probability measure

(3)

where  denotes the -dimensional Hausdorff measure.

The grand-canonical ensemble has a phase transition on the two-dimensional lattice (see for example [Pei36]). However, on the one-dimensional lattice the grand-canonical ensemble does not have a phase transition if the interaction decays fast enough (see for example [Isi25, Dob68, Dob74, Rue68, MeNi14]). More precisely, in this work a system has no phase transition if the infinite-volume Gibbs measure of the system is unique. It is a natural question if the canonical ensemble  also does not have a phase transition on the one-dimensional lattice. This is a non-trivial question since there are known examples where the grand canonical ensemble has no phase transition but the canonical ensemble has (see for example [ScSh96, BiChKo02, BiChKo03]).

If the spins are -valued there is no phase transition for the canonical ensemble on a one-dimensional lattice with nearest-neighbor interaction. The authors could not find a proof of that statement in the literature but it follows from a result by Cancrini, Martinelli and Roberto [CMR02]. There, a logarithmic Sobolev inequality is deduced for the canonical ensemble on lattices of arbitrary dimension, provided the grand canonical ensemble satisfies a mixing condition. The mixing condition used in [CMR02] is that the grand canonical ensemble has an exponential decay of correlations that is uniform in the external field . This hypothesis is satisfied if the underlying lattice is one-dimensional. In our article we will use a similar mixing condition.

To the best of the authors' knowledge, this question is still open if the spins are real-valued and unbounded. We conjecture that this is true, i.e. the infinite-volume Gibbs measure of the canonical ensemble should be unique. A first step toward verifying this conjecture is to study the equivalence of the grand-canonical and canonical ensemble. In equivalent ensembles, properties usually transfer from one ensemble to the other. The equivalence of ensembles in one-dimensional lattice systems was deduced by Dobrushin [DoTi77] for discrete (or bounded) spin values and by Georgii [Ge95] for quadratic Hamiltonians. However, our case where the spin values are unbounded and the Hamiltonian is not quadratic is still open.

There are many different notions of equivalence of ensembles. We only consider the simplest type, namely the equivalence of thermodynamic quantities (see for example [Ad06]). This means that as the system size goes to infinity the free energy of the grand canonical ensemble converges to the free energy of the canonical ensemble (for more details see Section 2 below).

In the main result of this article, i.e. in Theorem 2.3 below, we show that the grand canonical and canonical ensemble are equivalent. In fact, we show that the free energies converge uniformly in  as the system size goes to infinity. The rate of convergence in Theorem 2.3 is explicit. We therefore extend and refine the results of Dobrushin [DoTi77] and Georgii [Ge95].

Our argument is quite general and should apply to more general situations. The argument does not use that the lattice is one-dimensional. Instead, it only uses that the grand canonical ensemble on a one-dimensional lattice has a uniform exponential decay of correlations (see for example [MeNi14] and [Zeg96]). Under the assumption of a uniform exponential decay of correlations, one should be able to use similar calculations to deduce the local Cramér theorem for spin systems on arbitrary graphs, as long as the degree is uniformly bounded and the interaction has finite range. However, we only consider the one-dimensional lattice with nearest-neighbor interaction because a lighter notational burden makes it easier to explain the ideas and present the calculations.

A consequence of Theorem 2.3 is that the free energy of the canonical ensemble is uniformly strictly convex and quadratic for large enough systems (see Corollary 2.4). Strict convexity of the free energy rules out phase coexistence, which corresponds to flat parts in the free energy. The most prominent example of phase coexistence is that under ordinary pressure water and ice can coexist at 0 degrees Celsius. We want to point out that our result already applies to large but finite systems. In the infinite-volume limit, ordinary equivalence of ensembles (and not equivalence in ) would suffice to conclude that the free energy of the canonical ensemble is strictly convex.

Closely related to the free energy  of the canonical ensemble is the notion of the coarse-grained Hamiltonian (cf. (28) and [GrOtViWe09]). As in [GrOtViWe09], we derive from Theorem 2.3 a local Cramér theorem (see Theorem 2.6). The local Cramér theorem shows that the coarse-grained Hamiltonian converges in  to the Legendre transform of the free energy of the grand canonical ensemble. It is a direct consequence of the -local Cramér theorem that the coarse-grained Hamiltonian  is also uniformly strictly convex for large enough system size  (cf. Corollary 2.7).

The coarse-grained Hamiltonian  plays an important role when studying the Kawasaki dynamics. The Kawasaki dynamics is a natural drift-diffusion process on our lattice system that conserves the mean spin of the system. The canonical ensemble is the stationary and ergodic distribution of the Kawasaki dynamics. The strict convexity of  is a central ingredient for deducing a uniform logarithmic Sobolev inequality (LSI) for the canonical ensemble via the two-scale approach [GrOtViWe09]. The LSI characterizes the speed of convergence of the Kawasaki dynamics to the canonical ensemble. With the equivalence of dynamic and static phase transitions (see [MeNi14] or [Yos03]), a uniform LSI would also yield the absence of a phase transition and verify our conjecture (i.e. that the infinite-volume Gibbs measure is unique). Additionally, a uniform LSI is one of the main ingredients when deducing the hydrodynamic limit of the Kawasaki dynamics via the two-scale approach (see next paragraph). The uniform LSI for the canonical ensemble with no interaction is a well-known result (see for example [Cha03, LaPaYa02, GrOtViWe09]). For weak interaction the uniform LSI was deduced in [Me11]. The question whether the canonical ensemble satisfies a uniform LSI for strong nearest-neighbor interaction is still open. For -valued spins the answer is yes (see [CMR02]). The authors believe that this should also be the case for unbounded real-valued spins.

The strict convexity of the coarse-grained Hamiltonian also plays a crucial role when deducing the hydrodynamic limit of the Kawasaki dynamics. The hydrodynamic limit is a law of large numbers for processes. It states that under the correct scaling the Kawasaki dynamics (which is a stochastic process) converges to the solution of a non-linear heat equation (which is deterministic). It is conjectured by H.T. Yau that the hydrodynamic limit also holds for strong finite-range interactions on a one-dimensional lattice. So far, this conjecture is still wide open. The strict convexity of the coarse-grained Hamiltonian, which is deduced in this article, is an important cornerstone to tackle this problem with the help of the two-scale approach (see [GrOtViWe09]).

Let us now comment on how the -equivalence of ensembles is deduced. The motivation for our approach comes from the proof of the local Cramér theorem in [GrOtViWe09] and [Me11]. By using Cramér’s trick of an exponential shift it suffices to show -bounds on the density of a sum of random variables   (see also Proposition 3.6 below). Those desired bounds were derived in [GrOtViWe09] and [Me11] via a local central limit theorem (clt) for independent random variables. Our situation is a lot more subtle: Instead of deducing a local clt for independent random variables we would have to deduce a local clt for dependent random variables. At this point one could hope to use existing methods to deduce the local clt. Let us mention for example the approach of Dobrushin [Dob74], the approach of Bender [Ben73] or the approach of Wang and Woodroofe [WaWo90]. Unfortunately this does not help. All methods, at least the ones that are known to the authors, use the following principle (see also [DeSaMe16]):

(4)

The first ingredient, namely the integral clt for the dependent random variables  is relatively easy to deduce. There are a lot of methods available. Let us mention for example Stein’s method (see for example [ChGoSh11]), methods that are based on mixing, or methods that are based on Donsker’s theorem (see for example [Dur2010]). Deducing the second ingredient is tricky, not to mention that Dobrushin [Dob74] carried out that step only for discrete or bounded random variables.

All in all, this approach has two fundamental problems. The first one is that we need not only to control the density itself but also the first and second derivative. As a consequence, one would need very detailed information about the regularity of the density. We also believe that showing this regularity is as hard as directly deducing the local central limit theorem. Let us turn to the second problem. In order to deduce Theorem 2.3 the local central limit theorem must be quantitative. Using the principle from above yields suboptimal rates of convergence. For deducing Theorem 2.3 one has to iteratively apply the principle three times; and in each iteration the convergence rate gets worse. One would have to hope that in the end the convergence rate is still good enough for deducing Theorem 2.3.

Instead of using the principle from above, we generalize a well-known method for proving the local clt for independent random variables to dependent ones. We generalize the method that is based on characteristic functions and Fourier inversion (see [Fel71] and [GrOtViWe09]). Calculations get quite involved and lengthy. We do not deduce a local clt for dependent random variables in this work. Instead, we only deduce bounds that are needed to deduce Theorem 2.3 (cf. Proposition 3.6 below). However, one could use our calculations as a guideline for deducing a quantitative, local clt for dependent random variables. When doing so, one would have to substitute some of our arguments that use the specific structure of our lattice model. We use the following special structure:

  • Exponential decay of correlations (see Lemma 3.5).

  • The interaction has finite range . More precisely, we use that two spins   and  become independent if  and one conditions on the spin values  between them (see Section 2).

  • The Hamiltonian is quadratic. More precisely, we use the following consequence. For all  the conditional variance  is bounded from above and below uniformly in the values  (see Section 2).

  • Higher moments of  conditioned on  are uniformly controlled by lower moments. This fact is used to show that the characteristic functions of  conditioned on  have a uniform decay (see Lemma 3.2 and Lemma 3.4).

As mentioned before, the local Cramér theorem (see Theorem 2.6) is deduced by generalizing the argument of [GrOtViWe09] for independent random variables to dependent random variables. This adds a lot more complexity to the task. We overcome the technical challenges of considering dependent random variables by using two strategies. The first strategy is to induce artificial independence by conditioning on even or odd random variables. The second strategy is to handle dependencies as a perturbation. We morally treat large blocks as single sites of a coarse-grained system. Because there is a big distance between the blocks, the blocks are only weakly dependent. Then, the error term can be controlled by using the decay of correlations. For more details we refer to the comments after Proposition 3.6 and at the beginning of Section 4.

Let us briefly discuss possible generalizations of our main result. We expect that one can generalize our method with only slight modifications to the following situation:

  • instead of nearest-neighbor interaction to finite range interaction.

  • instead of exponential decay of correlations to sufficiently fast algebraic decay.

  • instead of a 1d lattice to any lattice or graph with bounded degree, as long as the grand canonical ensemble  has sufficient decay of correlations, uniformly in the system size and the external field .

  • instead of attractive interaction to repulsive and mixed interactions, as long as the estimate

    (5)

    is satisfied. For attractive interaction this estimate is deduced in Lemma 3.2.

More challenging, it would be very interesting to study the local Cramér theorem for the following changes:

  • Instead of finite-range interaction one could consider infinite-range interaction. In the case of the grand canonical ensemble on a one-dimensional lattice, there is no phase transition if the interaction  decays algebraically faster than . The decay condition is sharp (see for example [MeNi14] and references therein). It would be very interesting to know if the -local Cramér theorem (see Theorem 2.6) also holds for this decay or a stronger algebraic decay is needed.

  • In our model we need a quadratic single-site potential. Inspired by [FaMe14], it is natural to ask if the local Cramér theorem also holds for super-quadratic or polynomially increasing single-site potentials.

  • Inspired by [Dob74] or [Ge95] it would be interesting to study more general interaction than pairwise-quadratic interaction.

We conclude the introduction by giving a short overview of the remainder of the article. In Section 2 we introduce the precise setting and formulate the main results. In Section 3 we deduce the main results of this article up to Proposition 3.6. The main computations are done in Section 4 where we give the proof of Proposition 3.6.

Conventions and Notation

  • The symbol  denotes the term that is given by the line .

  • With uniform we mean that a statement holds uniformly in the system size , the mean spin  and the external field .

  • We denote with  a generic uniform constant. This means that the actual value of  might change from line to line or even within a line.

  • denotes that there is a uniform constant  such that .

  • means that  and .

  • denotes the -dimensional Hausdorff measure.

  • is a generic normalization constant. It denotes the partition function of a measure.

2. Setting and main results

We start with explaining the details of our model. We consider the sublattice . The Hamiltonian  of the system is defined as

(6)

We make the following assumptions:

  • The single-site potential  can be written as

    (7)

    where the function  satisfies

    (8)
  • The numbers  are arbitrary. They model the interaction of the system with an external field or the boundary.

  • The number  is arbitrary. It models the strength of the interaction. The interaction is attractive if . The interaction is repulsive if  .

Now, let us turn to the first main result of this article, namely the equivalence of ensembles (see Theorem 2.3 below). The grand canonical ensemble (gce)  is a probability measure on  given by the Lebesgue density

(9)

The free energy of the gce  is given by (cf. (1) and [GrOtViWe09])

(10)

We observe that  is uniformly strictly convex. More precisely, it holds:

Lemma 2.1.

Let  be a real-valued random variable distributed according to the gce . Assume that

(11)

Then the free energy  of the gce  is uniformly strictly convex in the sense that there exists a constant  such that for all 

(12)

The proof of Lemma 2.1 is given in Section 3. The core ingredient of the argument is a uniform Poincaré inequality. The additional assumption (11) is not very restrictive. For example, it is automatically satisfied if the interaction is attractive (see Lemma 3.2 below).

Let us turn to the canonical ensemble (ce) . It emerges from the gce by conditioning on (i.e. fixing) the mean spin

(13)

The ce is given by the probability measure

(14)

where  denotes the -dimensional Hausdorff measure. The free energy of the ce  is given by

(15)

Equivalence of ensembles only holds if the external field  of the gce  and the mean spin  of the ce  are related in the following way.

Assumption 2.2.

We then choose  and  such that the following relation is satisfied:

(16)

By the strict convexity of  (see Lemma 2.1) there exists for any  a unique  that satisfies the relation (16) or vice versa.

Now, let us formulate our first main result, namely the equivalence of the free energies in .

Theorem 2.3 (Equivalence of ensembles).

Let  be a real-valued random variable distributed according to

(17)

Assume that

(18)

Then it holds that

(19)

where the convergence is uniform in the mean spin  and the external field . More precisely, given a constant , there is an integer  such that for all 

(20)
(21)
(22)

We would like to emphasize that Theorem 2.3 contains explicit rates of convergence. We give the proof of Theorem 2.3 in Section 3.

A direct consequence of Lemma 2.1 and Theorem 2.3 is that the free energy  is uniformly strictly convex for large enough systems.

Corollary 2.4.

There is a uniform constant  and an integer  such that for all  and all 

(23)

Let us turn to the second main result of this article, the local Cramér theorem. For that purpose let us introduce  which denotes the Legendre transform of the free energy  (also denoted by ) i.e.

(24)

It follows from elementary observations that  is uniformly strictly convex.

Lemma 2.5.

For any 

(25)

Additionally, under the same assumptions as in Theorem 2.3, it holds that  is uniformly strictly convex in the sense that there is a uniform constant  such that for all 

(26)

We give the proof of Lemma 2.5 in Section 3.

The coarse-grained Hamiltonian  is defined as

(27)

Hence, we can rewrite the free energy of the ce as

(28)

It follows that the difference of the free energies  and  can be expressed as

(29)
(30)

From Theorem 2.3 we deduce the following local Cramér theorem.

Theorem 2.6 ($C^2$-local Cramér theorem).

Let  be a real-valued random variable distributed according to

(31)

Assume that

(32)

Then it holds that

(33)

where the convergence is uniform in the mean spin  and the external field . More precisely, given a constant , there is an integer  such that for all 

(34)
(35)
(36)

Theorem 2.6 is an extension of the local Cramér theorems that were deduced in Proposition 31 in [GrOtViWe09], Theorem 4 in [Me11] and [MeOt13]. The proof of Theorem 2.6 is given in Section 3. The main ingredient is Theorem 2.3.

An important consequence of Lemma 2.5 and of Theorem 2.6 is that for large enough systems the coarse-grained Hamiltonian  is uniformly strictly convex.

Corollary 2.7.

Under the assumptions of Theorem 2.6 there is a positive integer  such that for all  the coarse-grained Hamiltonian  is uniformly strictly convex. More precisely, there is a uniform constant  such that for all 

(37)

3. Proof of the main results

In this section we prove the main results of this article.

Assumption 3.1.

From now on we assume that  is a real-valued random vector distributed according to

(38)

We begin with a simple auxiliary lemma.

Lemma 3.2.

There exists a uniform constant  such that

(39)

Moreover, if  in (6) is nonnegative, then the condition (11) in Lemma 2.1 is satisfied.

The last lemma shows that the variance of the mean spin of the gce  is well behaved.

Proof of Lemma 3.2.  For the proof of the upper bound, one can apply a result of [MeNi14], which provides the logarithmic Sobolev inequality (LSI) for . It is well known that the logarithmic Sobolev inequality implies the spectral gap inequality (SG). This implies

(40)

where  is the constant in the spectral gap inequality, which is independent of .

The proof of the lower bound relies on [Me11, Lemma 9]. Let  be a random variable distributed according to the distribution

(41)

In [Me11, Lemma 9], it was shown that there is a constant  such that for all ,

(42)

Observe that the conditional expectation  has the Lebesgue density

(43)

Let  be the measure defined by the following property

(44)

Then it follows that

(45)
(46)
(47)

Moreover, Menz and Nittka (cf. [MeNi14]) proved that for  we have

(48)

Then it follows using (47) and (48) that

(49)

Now we are ready to give proofs of Lemma 2.1 and Lemma 2.5.

Proof of Lemma 2.1.  It is a direct consequence of Lemma 3.2. Indeed, we have

(50)
(51)

Differentiating once more, we obtain

(52)
(53)

Therefore, we conclude from Lemma 3.2 that there is a constant  with

(54)
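The computation in this proof rests on the standard fact that the second derivative of a log-partition function with respect to the tilt parameter equals the variance of the tilted measure. The following minimal sketch checks this numerically for a single site. It is only an illustration under stated assumptions: the potential `psi`, the integration window, the value of `sigma` and the finite-difference step `eps` are hypothetical choices and are not taken from the article, and the normalization by the system size is not reproduced.

```python
import math

def psi(x):
    """Hypothetical single-site potential: quadratic plus a bounded perturbation."""
    return 0.5 * x * x + math.cos(2.0 * x)

def tilted_moments(sigma, lo=-12.0, hi=12.0, steps=4800):
    """Return (log Z, mean, variance) of the density proportional to
    exp(sigma * x - psi(x)), using the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    z = m1 = m2 = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        w = (0.5 if k in (0, steps) else 1.0) * math.exp(sigma * x - psi(x))
        z += w
        m1 += w * x
        m2 += w * x * x
    mean = m1 / z
    return math.log(z * h), mean, m2 / z - mean * mean

def log_partition(sigma):
    """Log-partition function of the tilted single-site measure."""
    return tilted_moments(sigma)[0]

sigma, eps = 0.7, 1e-3  # illustrative tilt and finite-difference step
first_fd = (log_partition(sigma + eps) - log_partition(sigma - eps)) / (2 * eps)
second_fd = (log_partition(sigma + eps) - 2 * log_partition(sigma)
             + log_partition(sigma - eps)) / eps ** 2
_, mean, variance = tilted_moments(sigma)

print("first derivative  vs mean    :", first_fd, mean)
print("second derivative vs variance:", second_fd, variance)
```

In this toy setting, a two-sided bound on the variance translates directly into a two-sided bound on the second derivative of the log-partition function, which is the role played by Lemma 3.2 in the proof.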

Proof of Lemma 2.5.  Let us denote . Since  is the Legendre transform of the strictly convex function , there exists a unique  such that

(55)

Moreover, for each  satisfies

(56)

which is equivalent to

(57)
(58)
(59)

Then it follows that

(60)
(61)
(62)
(63)

In Lemma 2.1 we proved

(64)

Thus Lemma 3.2 implies there exists a constant  with

(65)
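The argument uses the standard duality between a strictly convex function and its Legendre transform: at corresponding points, the second derivatives are reciprocal to each other. The sketch below checks this numerically. It assumes the usual convention that the Legendre transform is the supremum of `s * m - phi(s)` over `s`; the function `phi` is a hypothetical stand-in with second derivative between 1 and 2 and is not the free energy of the article.

```python
import math

def phi(s):
    """Hypothetical strictly convex function with 1 <= phi'' <= 2."""
    return 0.5 * s * s + math.log(math.cosh(s))

def dphi(s):
    """First derivative of phi."""
    return s + math.tanh(s)

def d2phi(s):
    """Second derivative of phi."""
    return 1.0 + 1.0 / math.cosh(s) ** 2

def maximizer(m, lo=-50.0, hi=50.0, iters=200):
    """Solve dphi(s) = m by bisection; dphi is strictly increasing."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dphi(mid) < m:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def legendre(m):
    """Legendre transform: sup over s of (s * m - phi(s)), attained at dphi(s) = m."""
    s = maximizer(m)
    return s * m - phi(s)

m, eps = 0.8, 1e-3  # illustrative point and finite-difference step
second_fd = (legendre(m + eps) - 2 * legendre(m) + legendre(m - eps)) / eps ** 2
reciprocal = 1.0 / d2phi(maximizer(m))

print("second derivative of the transform:", second_fd)
print("reciprocal of phi'' at the dual point:", reciprocal)
```

Since the second derivative of `phi` lies between 1 and 2, the second derivative of its transform lies between 1/2 and 1. This is the mechanism by which uniform upper and lower bounds on one side yield uniform strict convexity on the other.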

Let us now turn to the proof of Theorem 2.3. We need some more auxiliary results. The first one is Cramér’s trick of exponential shift of measures.

Lemma 3.3.

It holds that

(66)
(67)

Here,  denotes the distribution of

(68)

Proof of Lemma 3.3.  The lemma follows from a direct computation:

(69)
(70)
(71)
(72)
(73)
(74)
(75)

Taking the exponential function and using (30) yields the lemma as desired. ∎
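To illustrate the exponential shift in the simplest possible setting, the following sketch considers two independent sites and checks the change-of-measure identity for the density of their sum: the density under the untilted law equals the density under the tilted law, multiplied by an explicit exponential factor and a ratio of partition functions. Everything here is a hypothetical toy setup, including the potential `psi`, the grid, and the values of `sigma` and `s`. It is not the dependent lattice system of the article and does not reproduce the normalization of Lemma 3.3.

```python
import math

def psi(x):
    """Hypothetical single-site potential: quadratic plus a bounded perturbation."""
    return 0.5 * x * x + math.cos(2.0 * x)

STEP = 0.01
GRID = [-12.0 + STEP * k for k in range(2401)]  # truncated real line (assumption)

def weight(sigma, x):
    """Unnormalized tilted density exp(sigma * x - psi(x))."""
    return math.exp(sigma * x - psi(x))

def partition(sigma):
    """Riemann-sum approximation of Z(sigma), the integral of weight(sigma, .)."""
    return STEP * sum(weight(sigma, x) for x in GRID)

def density_of_sum(sigma, s):
    """Density at s of X_1 + X_2 for iid X_i with law weight(sigma, .) / Z(sigma)."""
    conv = STEP * sum(weight(sigma, x) * weight(sigma, s - x) for x in GRID)
    return conv / partition(sigma) ** 2

sigma, s = 0.9, 1.5  # illustrative tilt and evaluation point
lhs = density_of_sum(0.0, s)
rhs = (math.exp(-sigma * s) * (partition(sigma) / partition(0.0)) ** 2
       * density_of_sum(sigma, s))

print("untilted density of the sum:", lhs)
print("shifted expression         :", rhs)
```

The point of the shift is that the tilt can be chosen so that the evaluation point becomes the mean of the sum under the tilted law. A large-deviation question for the untilted law then reduces to a question about the density near the mean, where central-limit-type estimates apply.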

Next, we need the following direct consequence of Lemma 3.2.

Lemma 3.4.

Assume that the single-site potential  satisfies (7) and (8). Then for any finite set  and , we have

(76)

Proof of Lemma 3.4.  Using the arithmetic-geometric mean inequality we get

(77)
(78)
(79)

The lemma now follows from the observation that because the gce  satisfies a uniform Poincaré inequality it holds that for all  and 

(80)

where the constant  only depends on . ∎

The next auxiliary lemma states that on a one-dimensional lattice with nearest-neighbor interaction, the gce has uniform exponential decay of correlations.

Lemma 3.5.

For all functions  we have

(81)
(82)

where  and  denote the supports of the functions  and , respectively.

Proof of Lemma 3.5.  See [MeNi14]. ∎
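As a sanity check of what exponential decay of correlations means, one can look at a purely Gaussian toy chain, where the covariance is the inverse of a tridiagonal matrix and can be computed exactly. The sketch below is not the model of the article: it assumes the hypothetical Hamiltonian given by half the sum of the squared spins plus `j / 2` times the sum of the squared differences of neighboring spins, and the values of `n`, `j` and `site` are illustrative.

```python
import math

def covariance_column(n, j, site):
    """Solve Q c = e_site for the tridiagonal precision matrix
    Q = I + j * (path-graph Laplacian) by the Thomas algorithm.
    The result is the column Cov(X_site, X_k), k = 0..n-1, of the inverse of Q."""
    diag = [1.0 + j * (1 if k in (0, n - 1) else 2) for k in range(n)]
    off = -j
    rhs = [1.0 if k == site else 0.0 for k in range(n)]
    # Forward elimination of the sub-diagonal.
    for k in range(1, n):
        factor = off / diag[k - 1]
        diag[k] -= factor * off
        rhs[k] -= factor * rhs[k - 1]
    # Back substitution.
    col = [0.0] * n
    col[-1] = rhs[-1] / diag[-1]
    for k in range(n - 2, -1, -1):
        col[k] = (rhs[k] - off * col[k + 1]) / diag[k]
    return col

n, j, site = 201, 5.0, 100  # illustrative chain length, coupling, reference site
col = covariance_column(n, j, site)

# Ratios of successive covariances Cov(X_site, X_{site+k+1}) / Cov(X_site, X_{site+k}).
ratios = [col[site + k + 1] / col[site + k] for k in range(20)]

# Decay rate of the infinite chain: the root in (0, 1) of j*r^2 - (1 + 2j)*r + j = 0.
rate = ((1.0 + 2.0 * j) - math.sqrt(1.0 + 4.0 * j)) / (2.0 * j)

print("first ratios  :", ratios[:3])
print("predicted rate:", rate)
```

In this toy model the covariances are positive and decay geometrically with the distance, even for a strong coupling such as `j = 5`. Only the rate degrades as the coupling grows.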

Now, we get to the core estimates needed for the proof of Theorem 2.3 and of Theorem 2.6.

Proposition 3.6.

For each  and , there exists a uniform constant  and an integer  such that for all  and all 

(83)
(84)
(85)

The statement of Proposition 3.6 should be compared to Proposition 31 in [GrOtViWe09] or Proposition 3.1 in [MeOt13]. The main difference is that in our situation the random variables  are dependent. This also makes the proof of Proposition 3.6 a lot harder.

The estimates of Proposition 3.6 are motivated by the problem of deducing a quantitative local central limit theorem for the properly normalized sum of the random variables . For example, if the random variables  are iid, the estimate (83) is a weaker version of the quantitative local clt estimate

(86)

The last inequality states that the density of the normalized sum at point  converges to the density of the normal distribution. As we mentioned in the introduction, we believe that one could strengthen the estimates of Proposition 3.6 to get a local central limit theorem for dependent random variables. However, we choose to derive weaker bounds instead because they are sufficiently strong for deducing our main results (see Theorem 2.3 and Theorem 2.6). Deducing those weaker estimates is already quite subtle and challenging.
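For independent random variables, the method based on characteristic functions and Fourier inversion can be illustrated numerically. The sketch below computes the density at zero of a normalized sum of iid variables by Fourier inversion and compares it with the Gaussian density at zero. All choices are assumptions made for illustration: the uniform law on an interval of variance one, the truncation `cutoff` of the Fourier integral, and the number of quadrature steps. Dependent random variables, which are the actual difficulty of the article, are not covered.

```python
import math

def char_fn_uniform(t):
    """Characteristic function of the uniform law on [-sqrt(3), sqrt(3)]
    (mean 0, variance 1)."""
    a = math.sqrt(3.0) * t
    return 1.0 if abs(a) < 1e-12 else math.sin(a) / a

def density_normalized_sum_at_zero(n, cutoff=60.0, steps=60000):
    """Density at 0 of (X_1 + ... + X_n) / sqrt(n) via Fourier inversion:
    (1 / (2 pi)) * integral of char_fn(t / sqrt(n)) ** n dt,
    approximated by the trapezoidal rule on [-cutoff, cutoff]."""
    h = 2.0 * cutoff / steps
    total = 0.0
    for k in range(steps + 1):
        t = -cutoff + k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * char_fn_uniform(t / math.sqrt(n)) ** n
    return total * h / (2.0 * math.pi)

gauss_at_zero = 1.0 / math.sqrt(2.0 * math.pi)
errors = {n: abs(density_normalized_sum_at_zero(n) - gauss_at_zero)
          for n in (4, 16, 64)}

for n, err in errors.items():
    print(n, err)
```

The error shrinks as the number of summands grows. In the iid case this follows from the factorization of the characteristic function of the sum into a power of a single characteristic function, a structure that is lost for dependent random variables.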

We deduce Proposition 3.6 in Section 4. There, we also comment on how to overcome the problem of considering dependent random variables and not independent ones. Now, we are prepared for the proof of Theorem 2.3.

Proof of Theorem 2.3.  Let us begin with the estimate (20). From (66), we have

(87)

Then a combination of (83) and (87) yields, as desired,

(88)

Let us turn to the estimate (21). Taking the derivative with respect to  in (87) yields

(89)

Let us choose . Then a combination of (83), (84) and (89) implies

(90)

Let us turn to the estimate (22). Differentiating (89) again, we get

(91)

Then after choosing , a combination of (83), (84) and (85) yields

(92)

Let us proceed to the proof of Theorem 2.6.

Proof of Theorem 2.6.  Recall the difference of the free energies (30) of and 

(93)

Then the first desired estimate (34) follows from a combination of (30) and (20) in Theorem 2.3. Let us turn to the estimate (35). A direct computation yields

(94)
(95)
(96)

Then (21), (64) and Lemma 3.2 imply, as desired,

(97)

Before we move on to the estimate (36), let us deduce some auxiliary results. A direct calculation yields

(98)
(99)
(100)
(101)

In particular we have

(102)

Now we claim that

(103)

To prove this, we use (101) and (102) to get