# Population Annealing with Weighted Averages: A Monte Carlo Method for Rough Free Energy Landscapes

## Abstract

The population annealing algorithm introduced by Hukushima and Iba is described. Population annealing combines simulated annealing and Boltzmann weighted differential reproduction within a population of replicas to sample equilibrium states. Population annealing gives direct access to the free energy. It is shown that unbiased measurements of observables can be obtained by weighted averages over many runs with weight factors related to the free energy estimate from the run. Population annealing is well suited to parallelization and may be a useful alternative to parallel tempering for systems with rough free energy landscapes such as spin glasses. The method is demonstrated for spin glasses.

###### pacs:

05.10.Ln,61.43.Bn,75.10.Nr## I Introduction

One of the most difficult challenges in computational statistical physics is sampling the low temperature equilibrium states of systems with complex free energy landscapes such as occur, for example, in spin glasses, configurational glasses and biomolecules. Standard Markov chain Monte Carlo (MCMC) methods such as the Metropolis algorithm become stuck in local minima and are unable to correctly sample the equilibrium distribution. Attempts to solve this difficulty generally involve simulating a broadened distribution that smooths the free energy landscape. One of the most widely used of these methods is parallel tempering or replica exchange Monte Carlo Swendsen and Wang (1986); Geyer (1991); Hukushima and Nemoto (1996). In parallel tempering, many replicas of the system are simultaneously simulated using a standard MCMC method with each replica at a different temperature. The sequence of temperatures spans a range from high temperatures where the free energy landscape is smooth to the low temperatures of interest where it is rough. Replica exchanges are allowed that permit replicas to move between low and high temperatures. The hope is that visits to high temperatures allow more rapid mixing of the Markov chain and equilibration between different minima.

Closely related to the problem of sampling equilibrium is the problem of finding ground states of systems with complex energy landscapes. In computer science the same question arises for NP-hard combinatorial optimization problems. One generic method for finding ground states or at least low lying states is simulated annealing Kirkpatrick et al. (1983). In simulated annealing the system is subject to an equilibrating MCMC procedure at a sequence of temperatures. Following this ‘annealing schedule’ the system is gradually cooled through the sequence of temperatures from a high temperature where the free energy landscape is smooth to a low temperature where it is rough. The hope is that the system will explore many minima while the temperature is high and settle into the deepest minima as the temperature is gradually lowered.

The population annealing algorithm introduced by Hukushima and Iba Hukushima and Iba (2003) is based on simulated annealing Kirkpatrick et al. (1983) but also shares features with parallel tempering, histogram re-weighting Ferrenberg and Swendsen (1988) and diffusion Monte Carlo Anderson (1975). Like simulated annealing, it is a single pass algorithm with an annealing schedule. A population of replicas of the system is simultaneously cooled following the annealing schedule. Unlike simulated annealing, the population is maintained in equilibrium throughout the annealing process and an equilibrium sample is generated at every temperature. Equilibrium is maintained by differential reproduction (resampling) of replicas according to their relative Boltzmann weights. A useful by-product of the calculation of relative Boltzmann weights is direct access to the equilibrium free energy at every temperature. Methods for extracting free energy differences from population annealing are related to Jarzinski’s equality Jarzynski (1997). Population annealing is an example of a sequential Monte Carlo scheme Doucet et al. (2001) and is related to nested sampling methods Skilling (2006).

In Hukushima and Iba (2003), Hukushima and Iba compare population annealing to parallel tempering and show that they produce results with comparable efficiency. However, they note that population annealing suffers biases for small population size. The main contribution of this work is to show that these biases, together with statistical errors, can be made arbitrarily small by using appropriate weighted averages over many independent runs of the algorithm. The weight factor is obtained from the free energy estimator from each run and the variance of this estimator serves as a useful diagnostic for the algorithm. As proof of principle of the method, we apply it to 1D and 3D spin glasses.

## Ii The Algorithm

We now describe our implementation of the algorithm in detail. We start with a population of replicas of the system. For disordered spin systems, each replica has the same set of couplings. Each replica is initialized at infinite temperature, , thus each replica is in an independent microstate chosen at random with equal probability from among the microstates of the system. For Ising systems with spins, the microstates are described by binary variables and . Without loss of generality, suppose that the average energy of the system at infinite temperature is zero, as is the case for the Ising model. The algorithm can also be used with a finite initial temperature but then absolute measurements of the free energy are not available.

The temperature of the set of replicas is now lowered while keeping the ensemble in equilibrium. In order to accomplish this, the population is resampled—some replicas are reproduced and others are eliminated with the expected number of replicas held fixed. Suppose we have a population of replicas chosen from an equilibrium ensemble at temperature and we want to lower the temperature to with . For a replica , with energy the ratio of the statistical weight at and is . This reweighting factor is typically large so that a normalization is needed to keep the population size reasonable. We compute normalized weights whose sum over the ensemble is ,

(1) |

where is the normalization given by

(2) |

The new population of replicas is generated by resampling the original population. The number of copies of state in the new population is where is a Poisson random variate with expected value . If , the configuration is eliminated. The size, of the population after the temperature step has expectation . In our implementation, the size of the population is variable but stays close to the set of target values in contrast to the fixed population method introduced in Hukushima and Iba (2003).

Of course, the new population is now more correlated than before since several replicas may be in the same microstate. Furthermore, to the extent that the energy distributions at the two temperatures do not overlap, the equilibrium distribution may no longer be adequately sampled at the lower temperature. In order to mitigate both of these difficulties, all replicas are now subject to an equilibrating MCMC process at the new temperature.

The algorithm thus consists of a sequence of temperature steps with differential reproduction within the population according to the relative Boltzmann weights followed by additional equilibration at the new temperature using a standard MCMC procedure. The ordered sequence of inverse temperatures, determines the annealing schedule. The population of replicas is equilibrated at using sweeps of the MCMC process. The temperature is now lowered to with differential reproduction as described above followed by MCMC sweeps. At each temperature, observables are measured after the MCMC sweeps and is recorded. In the present implementation, observables are measured after resampling so that no weight factors are needed in taking averages. The temperature is lowered until is reached. In the simplest version of the algorithm the target population sizes and number of MCMC sweeps are independent of temperature, and for all .

Either the MCMC process or the differential reproduction steps are in principle sufficient to produce equilibrium at the new temperature though neither one individually is efficient. However, the combined processes are substantially more efficient than either alone.

One advantage of the algorithm is that it gives direct access to the free energy at each temperature. The normalization factor is an estimator of the ratio of the partition functions at the two temperatures,

(3) | |||||

Thus, the estimator of the free energy, at each simulated temperature is,

(4) |

This section concludes with pseudocode for the algorithm:

Population Annealing

## Iii Weighted Averages

Unless the population size, number of temperature steps and number of MCMC sweeps/step are sufficiently large, the result of a single run of population annealing is significantly biased because the equilibrium distribution is not fully sampled. Simple averaging over an ensemble of independent runs does not reduce this bias. However, an appropriate weighted average over an ensemble of independent runs effectively reduces both statistical errors and biases. The free energy estimator from a given run of the algorithm represents the weight that should be assigned to the observations made during that run. In particular, suppose that an observable is estimated in run to be at temperature . An unbiased estimate of from the entire simulation is the weighted average over the ensemble of independent runs of population annealing,

(5) |

where the weight for run is given by

(6) |

and is the free energy estimator (4) for the run of the algorithm at temperature . The weight factors are needed because they represent the total weight at that temperature that would have been associated with run if the normalization factor in (1) had not been used to keep the population size near the target size. Runs with different but fixed target populations sizes can be combined using (5) with replaced by in both the numerator and denominator of (6).

The free energy appears to require a more complicated weighting formula because it is the sum of terms from all temperatures. However, as shown below it may be expressed as a simple average. The quantity that is directly averaged at each temperature is , not , so the logarithm must be taken after the weighted average. The unbiased estimate of the free energy is given by,

(7) | |||||

The inner summation indexed by is over the ensemble of independent runs while the outer summation indexed by is over the temperature steps between the highest temperature and the temperature of interest . The second expression follows from the first via (4) and (6). The last expression shows that may be directly averaged.

Errors in the weighted averages can be obtained by resampling. Here we use the bootstraps method Newman and Barkema (1999) where the error is the standard deviation of a large number () of resampled ensembles, each of size .

Statistical and systematic errors in the method are determined by the probability distribution of the dimensionless free energy . If this distribution has a variance that is much smaller than unity and tails that decay faster than exponentially then the important weight factors (6) do not vary much and the tails of the distribution are unimportant. On the other hand, if the variance is larger than one, is controlled by the upper tail of the distribution and the ensemble size must be large to reduce errors. In the case that is normally distributed, the typical value of that dominates the sum in (5) is one variance (not one standard deviation!) above the mean. The variance of is thus a criteria for the performance of the algorithm. If it is smaller than about 0.5 then a modest ensemble of independent runs will be sufficient to yield accurate results. Optimizing population annealing should be guided by minimizing the variance of the free energy estimator for a fixed amount of computational effort. Of course, it is always possible that the actual distribution has significant weight far above its mean because of important low energy states that have never been explored. This diabolical situation would similarly fool equilibration tests for parallel tempering Machta (2009).

Population annealing using independent runs with population size is less accurate than a single run with population size . If is small, the loss of accuracy is small and vice versa. Nonetheless, an ensemble of independent runs has several important advantages. First, independent runs with moderate require only a single processor or simple parallelism without communication between processors. Large values of are only practical with true parallelism involving communication between processor. Second, the measurement of the variance of over the ensemble provides a diagnostic for the method. Third, error bars are simply obtained from an ensemble using resampling.

## Iv Application to Ising Spin Glasses

To illustrate the effectiveness of population annealing for high precision simulations we apply it to the Ising spin glass in one and three dimensions. Ising spin glasses are notoriously difficult to equilibrate at low temperature. The Ising spin glass is defined by the energy

(8) |

where is the Ising spin at site and the summation is over the nearest neighbor bonds of the lattice. We consider the simple cubic lattice (3D) with periodic boundary conditions and a periodic chain of spins (1D). The couplings are quenched random variables chosen independently for each bond from a Gaussian distribution with mean zero and variance one. The 3D Ising spin glass with Gaussian couplings has a continuous phase transition at transition temperature Katzgraber et al. (2006). The low temperature phase of this model is still the subject of controversy. On the other hand, the 1D Ising spin glass is a trivial model. Nonetheless, it has a rough free energy landscape and is difficult to equilibrate using generic Monte Carlo methods including parallel tempering Katzgraber and Young (2003); Katzgraber et al. (2005). The 1D spin glass has the advantage that the Monte Carlo results can be compared with exact transfer matrix calculations.

For the 1D spin glass we simulated 18 disorder realizations for a chain with spins using population annealing. We report results for the lowest temperature, . We used a mean population size with temperatures and Metropolis sweeps at each temperature. Of the 99 temperature steps, 19 are equally spaced in inverse temperature between and while the remaining 80 are equally spaced in temperature. Each run used Metropolis sweeps and for each realization the ensemble consisted of independent runs.

First we report results for a single 1D disorder realization at the lowest temperature, . Figure 1 shows the histogram of the dimensionless free energy .

The variance of is 0.34 and the distribution does not appear to have fat tails. Therefore, we expect weighted averages should be useful in calculating observables. Using (5) to compute and the bootstraps method to find the one standard deviation error, we obtain compared to the exact value, obtained from numerically summing the transfer matrices. On the other hand, the unweighted average energy of the ensemble is . Thus, the unweighted average is significantly biased while the weighted average shows no bias to within the statistical errors. The ground state energy for this disorder realization is so at the system is quite close to its ground state. For the free energy of the same realization we have while the weighted average is and the unweighted average is . Again, the weighted estimator is not significantly biased while the unweighted estimator is biased.

Next we consider the set of 18 disorder realizations and investigate how the bias decreases as the size of the ensemble of runs increases. We divide the 200 independent runs for each disorder realization into blocks of size and then perform weighted averages for the energy within each of these blocks. Next we take the mean over the set of blocks and finally average these means over the 18 disorder realizations. Let be this disorder averaged bias,

(9) |

where is the weighted energy estimator in block with block size . We can similarly obtain the standard error of the mean from the blocks. Figure 2 shows the bias as a function of block size . The case corresponds to unweighted averages. By the time an ensemble of 8 runs is used, the bias is reduced by approximately a factor of 6. The weighted energy estimator for the full ensemble of runs for each disorder realization yields a mean bias of where the error estimate is obtained using the bootstraps method. This weighted average displays no bias within the error bars. Similar results hold for the free energy. The weighted estimate of averaged over the 18 disorder realizations deviates from the exact value by , which is again indistinguishable from zero, while the unweighted free energy averaged over the same 18 realizations deviates from the exact answer by .

For the 3D Ising spin glass we studied 25 disorder realizations using population annealing for system size 8. The target population size is , temperatures are evenly spaced between and with Metropolis sweeps performed after every temperature step. Each run requires 2 Metropolis sweeps. In addition to the energy and free energy, we considered the observable , the probability that the overlap is less than 0.2, . The overlap is defined as where the superscripts refer to two independent spin configurations of the system. The probability distribution for the overlap reflects features of the free energy landscape. When there is substantial weight for small , there are two or more free energy minima that are widely separated but with comparable free energies. For parallel tempering this situation is expected to lead to long equilibration times Machta (2009). Thus, it is interesting to look for correlations between the variance of and . In the study of spin glasses, is important in distinguishing the competing pictures of the low temperature phase of the 3D Ising spin glass Marinari and Zuliani (1999); Palassini and Young (2000); Krzakala and Martin (2000); Katzgraber and Young (2002). Since involves the correlation between two independent replicas, we use pairs of independent runs to construct . For each pair of runs, we compute by averaging over pairs of replicas, one from each run and then evaluate for that pair. The appropriate weighted estimator is

(10) |

with

(11) |

where the superscripts refer to the two of independent runs from which is constructed. We did pairs of independent runs to compute .

Figure 3 shows a scatter plot of vs. for the 25 disorder realizations. The values of along the line are estimated as zero by the algorithm on the basis of measurements of the overlap. Most realizations have very small values of , with only a few realizations dominating the disorder average. The variance of the free energy estimator for the 25 realizations never exceeds 0.6. There appears to be a correlation between a large variance of the free energy and a large value of . Realizations with free energy variances less than 0.05 are all associated with very small values of the overlap near zero. This correlation can be understood by realizing that implies that there are at least two free energy minima with comparable free energies. If is not sufficiently large, the population of replicas may be dominated by one or another free energy minima in a single run and thus display a large variance in free energy from one run to another.

Of the 25 disorder realizations we now look more closely at the realization with the largest variance of . This ‘worst’ realization has is . For this realization we obtained data from an ensemble of 125 runs (with , and ). The histogram of is shown in Fig. 4 (left panel). It should be noted that for the 3D simulations the highest temperature is not infinite, however, as we shall see below, a negligible contribution to arises from high temperatures. The mean of is 0.521 below the best estimate , and this deviation is roughly equal to as expected. The value of from these runs is and the value of the energy is . For this realization we also did simulations with a larger target population size without changing or . The histogram of for population size is shown in Fig. 4 (right panel). For the larger population size so increasing the population size by a factor of 5 has reduced the variance by nearly the same factor. The estimate for from this larger population size is , which is consistent with the results from the smaller population size. The estimate for the energy is , which is marginally consistent with the smaller run. For comparison we simulated the same disorder realization using parallel tempering with two sets of 18 replicas each equally spaced in temperature with the same high and low temperatures. The system was equilibrated for sweeps and data was collected for sweeps. We ran this simulation 200 times to obtain means and errors. Thus the total number of Metropolis sweeps for the parallel tempering runs was approximately compared to approximately for population annealing. Parallel tempering runs yielded and . Results are marginally consistent for both and .

Next we consider an initial attempt to improve the performance of the algorithm. The variance of the free energy estimator should be minimized to optimize the algorithm. From (4) we see that at each temperature is a sum of positive terms so that successively lower temperatures have successively larger variances. Figure 5 shows as a function of temperature for the ‘worst’ disorder realizations with the largest variance discussed above. The variance at each temperature is obtained from 50 iterations of the algorithm. The upper points (blue online) show the results from runs where the average population size is fixed, for all . The variance increases slowly for small and then much more rapidly for large . The upward curvature of suggests that more replicas should be used at low temperature. The total computational work remains nearly fixed if for and for . The lower points (red online) in Fig. 5 show the results for this choice of target population size and demonstrates that, indeed, a significant reduction in from 0.57 to 0.36 is achieved at the lowest temperature. This attempt at optimizing the algorithm shows that there is room for improving its performance by careful choice of the parameters.

## V Discussion

Population annealing is a promising tool for measuring the free energy and other observables in spin glasses and perhaps other systems with rough free energy landscapes. By using weighted averages over an ensemble of runs, biases inherent in a single run can be made small and high precision results can be obtained. If the variance of the dimensionless free energy estimator is less than about 0.5, high precision results can be obtained from a relatively small ensemble of runs. This variance thus serves as measure of the convergence of the algorithm.

The method can be optimized by minimizing the variance of the free energy estimator. In future work it would be important to optimize the method and study its efficiency and scaling with system size in comparison with well-established methods such as parallel tempering.

Compared to parallel tempering, population annealing is better suited to parallelization. Parallel tempering is optimized with a relatively small number of replicas. Increasing the number of replicas in parallel tempering improves the acceptance rate of replica exchange moves but also lengthens the round trip distance between the warmest and coldest replicas. Thus, the optimum number of replicas is relatively small in practice. On the other hand, population annealing is monotonically improved by increasing the population size. For example, for the , 3D Ising spin glasses studied here population annealing makes effective use of thousands of replicas while parallel tempering is optimized with tens of replicas. Population annealing may find use in quickly equilibrating large systems using a very large population spread over many processors.

###### Acknowledgements.

This work was supported in part from NSF grant DMR-0907235. I thank Helmut Katzgraber, Burcu Yucesoy and Yukita Iba for helpful discussions. I thank the Santa Fe Institute for their hospitality and support.### References

- R. H. Swendsen and J.-S. Wang, Phys. Rev. Lett. 57, 2607 (1986).
- C. Geyer, in Computing Science and Statistics: 23rd Symposium on the Interface, edited by E. M. Keramidas (Interface Foundation, Fairfax Station, 1991), p. 156.
- K. Hukushima and K. Nemoto, J. Phys. Soc. Jpn. 65, 1604 (1996).
- K. Hukushima and Y. Iba, in THE MONTE CARLO METHOD IN THE PHYSICAL SCIENCES: Celebrating the 50th Anniversary of the Metropolis Algorithm, edited by J. E. Gubernatis (AIP, 2003), vol. 690, pp. 200–206.
- S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Science 220, 671 (1983).
- A. M. Ferrenberg and R. H. Swendsen, Phys. Rev. Lett. 61, 2635 (1988).
- J. B. Anderson, J. Chem. Phys. 63, 1499 (1975).
- C. Jarzynski, Phys. Rev. E 56, 5018 (1997).
- A. Doucet, N. de Freitas, and N. Gordon, eds., Sequential Monte Carlo Methods in Practice (Springer-Verlag, New York, 2001).
- J. Skilling, Bayesian Analysis 1, 833 (2006).
- M. E. J. Newman and G. T. Barkema, Monte Carlo Methods in Statistical Physics (Oxford, Oxford, 1999).
- J. Machta, Phys. Rev. E 80, 056706 (2009).
- H. G. Katzgraber, M. Körner, and A. P. Young, Phys. Rev. B 73, 224432 (2006).
- H. G. Katzgraber and A. P. Young, Phys. Rev. B 67, 134410 (2003).
- H. G. Katzgraber, M. Körner, F. Liers, M. Jünger, and A. K. Hartmann, Phys. Rev. B 72, 094421 (2005).
- E. Marinari and F. Zuliani, J. Phys. A: Math. Gen. 32, 7447 (1999).
- M. Palassini and A. P. Young, Phys. Rev. Lett. 85, 3017 (2000).
- F. Krzakala and O. C. Martin, Phys. Rev. Lett. 85, 3013 (2000).
- H. G. Katzgraber and A. P. Young, Phys. Rev. B 65, 214402 (2002).