Self-Healing Umbrella Sampling
Convergence and efficiency

This work is supported by the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement number 614492 and by the French National Research Agency under the grant ANR-12-BS01-0019 (STAB). We also benefited from the scientific environment of the Laboratoire International Associé between the Centre National de la Recherche Scientifique and the University of Illinois at Urbana-Champaign.

Gersende Fort
LTCI, CNRS & Telecom ParisTech, 46 rue Barrault, 75634 Paris Cedex 13, France
email: gersende.fort@telecom-paristech.fr

Benjamin Jourdain, Tony Lelièvre, Gabriel Stoltz
Université Paris-Est, CERMICS (ENPC), INRIA, F-77455 Marne-la-Vallée, France
email: {jourdain,lelievre,stoltz}@cermics.enpc.fr

Received: date / Accepted: date
Abstract

The Self-Healing Umbrella Sampling (SHUS) algorithm is an adaptive biasing algorithm which has been proposed in SHUS in order to efficiently sample a multimodal probability measure. We show that this method can be seen as a variant of the well-known Wang-Landau algorithm wang-landau-01 ; wang-landau-01-PRL . Adapting results on the convergence of the Wang-Landau algorithm obtained in fort:jourdain:kuhn:lelievre:stoltz:2014 , we prove the convergence of the SHUS algorithm. We also compare the two methods in terms of efficiency. We finally propose a modification of the SHUS algorithm in order to increase its efficiency, and exhibit some similarities of SHUS with the well-tempered metadynamics method barducci-bussi-parrinello-08 .

Keywords:
Wang-Landau algorithm · Stochastic approximation algorithm · Free energy biasing techniques
journal: Statistics and Computing

1 Introduction

The efficient sampling of a probability measure defined over a high dimensional space is required in many application fields, such as computational statistics or molecular dynamics lelievre-rousset-stoltz-book-10 . Standard algorithms consist in building a dynamics which is ergodic with respect to the target distribution such as Langevin dynamics lelievre-rousset-stoltz-book-10 ; roberts-tweedie-96 or Metropolis-Hastings dynamics MRRTT53 ; Hastings70 . Averages over trajectories of this ergodic dynamics are then used as approximations of averages with respect to the target probability measure. In many cases of interest, this probability measure is multimodal: regions of high probability are separated by regions of low probability, and this implies that the ergodic dynamics is metastable. This means that it takes a lot of time to leave a high probability region (called a metastable state). The consequence of this metastable behavior is that trajectorial averages converge very slowly to their ergodic limits.

Many techniques have been proposed to overcome these difficulties. Among them, importance sampling consists in modifying the target probability using a well-chosen bias in order to enhance the convergence to equilibrium. Averages with respect to the original target are then recovered using a reweighting of the biased samples. In general, it is not easy to devise an appropriate bias. Adaptive importance sampling methods have thus been proposed in order to automatically build a “good” bias (see (lelievre-rousset-stoltz-book-10, , Chapter 5) for a review of these approaches).

Let us explain the principle of adaptive biasing techniques in a specific setting (we refer to (lelievre-rousset-stoltz-book-10, , Chapter 5) for a more general introduction to such methods). To fix the ideas, let us consider a target probability measure $\pi \, d\lambda$ on the state space $\mathcal{X}$, where $\lambda$ denotes the Lebesgue measure on $\mathcal{X}$, and let us be given a partition $\mathcal{X}_1, \dots, \mathcal{X}_d$ of $\mathcal{X}$ into $d$ subsets. The subsets are henceforth called 'strata'. The choice of such a partition will be discussed later (see Footnote 1 below). We assume that the target measure is multimodal in the sense that the weights of the strata span several orders of magnitude. In other words, the weights of the strata

$\theta_\star(i) = \int_{\mathcal{X}_i} \pi \, d\lambda, \qquad i \in \{1, \dots, d\},$    (1)

vary a lot and the ratio $\max_{1 \le i \le d} \theta_\star(i) / \min_{1 \le i \le d} \theta_\star(i)$ is large. We suppose from now on, without loss of generality (up to removing some strata), that $\theta_\star(i) > 0$ for all $i \in \{1, \dots, d\}$. In such a situation, it is natural to consider the following biased probability density:

$\pi_\star(x) = \dfrac{1}{d} \sum_{i=1}^d \dfrac{\pi(x)}{\theta_\star(i)} \, \mathbf{1}_{\mathcal{X}_i}(x),$    (2)

which is such that

$\int_{\mathcal{X}_i} \pi_\star \, d\lambda = \dfrac{1}{d}$    (3)

for all $i \in \{1, \dots, d\}$: under the biased probability measure $\pi_\star \, d\lambda$, each stratum has the same weight. In particular, the ergodic dynamics which are built with $\pi_\star$ as the target measure are typically less metastable than the dynamics with target $\pi$. The practical difficulty in implementing this technique is of course that the vector

$\theta_\star = (\theta_\star(1), \dots, \theta_\star(d))$ is unknown. The principle of adaptive biasing methods is to learn the vector $\theta_\star$ on the fly, in order to eventually sample the biased probability measure $\pi_\star \, d\lambda$. Adaptive algorithms thus build a sequence of vectors $(\theta_n)_{n \ge 0}$ which is expected to converge to $\theta_\star$. Various updating procedures have been proposed (lelievre-rousset-stoltz-book-10, , Chapter 5). Such adaptive techniques are used on a daily basis by many practitioners, in particular for free energy computations in computational statistical physics. In this context, the partition of $\mathcal{X}$ is related to the choice of a so-called reaction coordinate (Footnote 1: For a given measure $\pi \, d\lambda$, the choice of a good partition or, equivalently, of a good reaction coordinate does of course influence the efficiency of the algorithm. This choice is a difficult problem that we do not consider here (see for example chopin-lelievre-stoltz-12 for such discussions in the context of computational statistics). Here, both of them are assumed to be given.), and the weights $\theta_\star(i)$ give the free energy profile associated with this reaction coordinate. We focus here on a specific adaptive biasing method called the Self-Healing Umbrella Sampling (SHUS) technique SHUS ; dickson-legoll-lelievre-stoltz-fleurat-lessard-10 . We will show that it is a variant of the well-known Wang-Landau method wang-landau-01 ; wang-landau-01-PRL . From a practical viewpoint, the main interest of SHUS compared to Wang-Landau is that the practitioner has fewer numerical parameters to tune (as will be explained in Section 2.3).

The aim of this paper is to analyze the SHUS algorithm in terms of convergence and efficiency. First, we adapt the results of fort:jourdain:kuhn:lelievre:stoltz:2014 which prove the convergence of Wang-Landau to obtain the convergence of SHUS (see Theorem 3.2). Second, we perform numerical experiments to analyze the efficiency of SHUS, in the spirit of amrx where similar numerical tests are performed for the Wang-Landau algorithm. The efficiency analysis consists in estimating the average exit time from a metastable state in the limit when goes to infinity. Adaptive techniques (such as SHUS or Wang-Landau) yield exit times which are much smaller than for the original non-adaptive dynamics.

The main output of this work is that, both in terms of convergence (longtime behavior) and efficiency (exit times from metastable states), SHUS is essentially equivalent to the Wang-Landau algorithm for a specific choice of the numerical parameters. These numerical parameters are not the optimal ones in terms of efficiency and we propose in Section 5.2 a modified SHUS algorithm which is (in the longtime limit) equivalent to the Wang-Landau algorithm with better sets of parameters.

This article is organized as follows. In Section 2, we introduce the SHUS algorithm, check its asymptotic correctness and explain how it can be seen as a Wang-Landau algorithm with a stochastic stepsize sequence. In Section 3, we state a convergence result for Wang-Landau algorithms with general (either deterministic or stochastic) stepsize sequences and deduce the convergence of SHUS. The proofs are based on stochastic approximation arguments and postponed to Section 6. Numerical results illustrating the efficiency of the algorithm and comparing its performance with the standard Wang-Landau algorithm are provided in Section 4. Finally, in Section 5, we draw some conclusions on the interest of SHUS compared with the Wang-Landau algorithm, and further compare SHUS with other adaptive techniques, such as the well-tempered metadynamics algorithm barducci-bussi-parrinello-08 and the above-mentioned modified SHUS algorithm. We also prove the convergence of this modified SHUS algorithm (see Proposition 4), and present numerical results showing that this new method is closely related to a Wang-Landau dynamics with larger stepsizes.

2 The SHUS algorithm

Using the notation of the introduction, we consider a target probability measure $\pi \, d\lambda$ on the state space $\mathcal{X}$ and a partition of $\mathcal{X}$ into strata $\mathcal{X}_1, \dots, \mathcal{X}_d$.

We introduce a family of biased densities $\pi_\theta$, for $\theta \in \Theta$, where $\Theta$ is the set of positive probability measures on $\{1, \dots, d\}$:

$\Theta = \left\{ \theta = (\theta(1), \dots, \theta(d)) \ : \ \theta(i) > 0 \text{ for all } i \text{ and } \sum_{i=1}^d \theta(i) = 1 \right\}.$

The biased densities are obtained from $\pi$ by a reweighting of each stratum:

$\pi_\theta(x) = \left( \sum_{j=1}^d \dfrac{\theta_\star(j)}{\theta(j)} \right)^{-1} \sum_{i=1}^d \dfrac{\pi(x)}{\theta(i)} \, \mathbf{1}_{\mathcal{X}_i}(x),$    (4)

where $\theta_\star$ is defined by (1). Observe that for any $\theta \in \Theta$ and $i \in \{1, \dots, d\}$,

$\int_{\mathcal{X}_i} \pi_\theta \, d\lambda = \dfrac{\theta_\star(i)/\theta(i)}{\sum_{j=1}^d \theta_\star(j)/\theta(j)}.$    (5)

Equations (2) and (3) are respectively Equations (4) and (5) with the specific choice $\theta = \theta_\star$.

2.1 Description of the algorithm

As explained above, the principle of many adaptive biasing techniques, and SHUS in particular, is to build a sequence $(\theta_n)_{n \ge 0}$ which converges to $\theta_\star$. This makes it possible to sample $\pi_{\theta_\star} \, d\lambda = \pi_\star \, d\lambda$, which is less multimodal than $\pi \, d\lambda$. In order to understand how the updating rule for $\theta_n$ is built in SHUS, one may proceed as follows.

Let us first assume that we are given a Markov chain which is ergodic with respect to the target measure (think of a Metropolis-Hastings dynamics). Let us introduce the sequence (for a given )

(6)

which, in some sense, counts the number of visits to each stratum. By the ergodic property, it is straightforward to check that converges almost surely (a.s.) to as . As explained in the introduction, the difficulty with this algorithm is that the convergence of to is very slow due to the metastability of the density and thus of the Markov chain .

The idea is then that if an estimate of is available, one should instead consider a Markov chain which is ergodic with respect to and thus hopefully less metastable. To estimate with this new Markov chain, one should modify the updating rule (6) as

(7)

in order to unbias the samples (since ). Again, by the ergodic property, one easily gets that converges a.s. to as . Indeed, converges a.s. to (given by (5)) as . Since

(8)

converges a.s. to , which implies in turn

Hence converges a.s. to as .
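To make this reweighting idea concrete, the following minimal Python sketch estimates the stratum weights from a trajectory of a chain which is ergodic with respect to the biased measure associated with a fixed vector of weights; each visit to a stratum is reweighted by the corresponding biasing weight. The helper names are illustrative, and this is only one standard way of implementing the unbiasing step described above.

```python
import numpy as np

def estimate_weights(samples, stratum_index, theta):
    """Estimate the stratum weights from samples of a chain ergodic w.r.t. the
    biased density associated with the fixed weights theta: each visit to
    stratum i is reweighted by theta[i] to compensate for the bias."""
    d = len(theta)
    weighted_counts = np.zeros(d)
    for x in samples:
        i = stratum_index(x)          # index of the stratum containing x
        weighted_counts[i] += theta[i]
    # normalizing yields an estimate of the weights of the strata under the
    # original (unbiased) target measure
    return weighted_counts / weighted_counts.sum()
```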

The SHUS algorithm consists in using the current value $\theta_n$ as the estimate of $\theta_\star$ in the previous algorithm. Let us now precisely define the SHUS algorithm. Let $(P_\theta)_{\theta \in \Theta}$ be a family of transition kernels on $\mathcal{X}$ which are ergodic with respect to $\pi_\theta \, d\lambda$. In particular, for any $\theta \in \Theta$ and any measurable set $A \subset \mathcal{X}$,

$\int_{\mathcal{X}} P_\theta(x, A) \, \pi_\theta(x) \, \lambda(dx) = \int_A \pi_\theta \, d\lambda .$

Let $\gamma > 0$, $S_0 \in (0, +\infty)^d$ and $X_0 \in \mathcal{X}$ be deterministic. The SHUS algorithm consists in iterating the following steps:

Algorithm 1

Given $(S_n, X_n)$,

  • compute the probability measure $\theta_n$ on $\{1, \dots, d\}$,

    $\theta_n(i) = \dfrac{S_n(i)}{\sum_{j=1}^d S_n(j)}, \qquad i \in \{1, \dots, d\};$    (9)

  • draw $X_{n+1}$ according to the kernel $P_{\theta_n}(X_n, \cdot)$,

  • compute, for all $i \in \{1, \dots, d\}$,

    $S_{n+1}(i) = S_n(i) + \gamma \, \theta_n(i) \, \mathbf{1}_{\mathcal{X}_i}(X_{n+1}).$    (10)

Notice that only simulation under $P_{\theta_n}$ is needed to implement SHUS. By choosing $P_\theta$ as a Metropolis-Hastings kernel with target measure $\pi_\theta \, d\lambda$, the density $\pi$ needs to be known only up to a normalizing constant. Therefore, SHUS covers the case when the target measure is known only up to a multiplicative constant, which is generally the case in view of applications to Bayesian statistics and statistical physics.

As proved in Section 3, $\theta_n$ converges almost surely to $\theta_\star$ as $n \to \infty$. According to the update mechanism (10), the increment $S_{n+1}(i) - S_n(i)$ is nonnegative, and it is positive if and only if the current draw $X_{n+1}$ lies in the stratum $\mathcal{X}_i$. In addition, this variation is proportional to the current approximation $\theta_n(i)$ of $\theta_\star(i)$, with a factor $\gamma$ chosen by the user prior to the run of the algorithm (the choice of $\gamma$ is numerically investigated in Section 4).

The principle of this algorithm is thus to penalize the strata which have already been visited, in order to favor transitions towards unexplored strata. The penalization strength is proportional to the current bias of the strata. The prefactor $\theta_n(i)$ in (10) can be understood as a way to unbias the samples in order to recover samples distributed according to the target measure $\pi \, d\lambda$; see also formula (17) below.
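For concreteness, here is a minimal Python sketch of one SHUS iteration following the steps (9)–(10) of Algorithm 1. The functions pi and stratum_index, the Gaussian proposal standard deviation and the treatment of proposals falling outside the strata are illustrative choices rather than part of the algorithm specification; note that the Metropolis-Hastings step only uses the target density up to its normalizing constant.

```python
import numpy as np

rng = np.random.default_rng(0)

def shus_step(S, x, pi, stratum_index, gamma, proposal_std):
    """One SHUS iteration: normalize the weights (step (9)), perform one
    Metropolis-Hastings move targeting the biased density pi(x)/theta[I(x)]
    (up to a constant), and update the unnormalized weights (step (10))."""
    theta = S / S.sum()                                   # step (9)
    # symmetric Gaussian proposal, accepted with the Metropolis ratio of the
    # biased (unnormalized) density pi(y)/theta[I(y)] over pi(x)/theta[I(x)]
    y = x + proposal_std * rng.standard_normal(size=np.shape(x))
    log_ratio = (np.log(pi(y)) - np.log(theta[stratum_index(y)])) \
              - (np.log(pi(x)) - np.log(theta[stratum_index(x)]))
    if np.log(rng.uniform()) < log_ratio:
        x = y                                             # accept the move
    # step (10): increase the weight of the stratum which has just been visited,
    # proportionally to its current estimated weight
    S_new = S.copy()
    i = stratum_index(x)
    S_new[i] += gamma * theta[i]
    return S_new, x
```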

2.2 Asymptotic correctness

Let us consider the SHUS algorithm 1 and let us assume that $\theta_n$ converges to some value, say $\theta_\infty$. It is then expected that the updating rule (10) leads to the same asymptotic behavior as (7) (where $\theta_n$ in (10) has been replaced by its limit $\theta_\infty$). Thus, it is expected that $\theta_n$ converges a.s. to $\theta_\star$ by (8), and therefore that $\theta_\infty = \theta_\star$ a.s. Hence, the only possible limit of the sequence $(\theta_n)_{n \ge 0}$ is $\theta_\star$.

This heuristic argument is of course not a proof of convergence, but it explains why one can expect the SHUS algorithm to behave like a Metropolis-Hastings algorithm with target measure . The rigorous result for the convergence is given in Section 3, and the efficiency of SHUS is discussed in Section 4.

2.3 Reformulation as a Wang-Landau algorithm with a stochastic stepsize sequence

One key observation of this work is that SHUS can be seen as a Wang-Landau algorithm with nonlinear update of the weights and with a specific stochastic stepsize sequence $(\gamma_n)_{n \ge 1}$, see wang-landau-01 ; wang-landau-01-PRL ; fort:jourdain:kuhn:lelievre:stoltz:2014 ; amrx . The Wang-Landau algorithm with nonlinear update of the weights consists in replacing the updating formula (10) by:

$\theta_{n+1}(i) = \dfrac{\theta_n(i) \left( 1 + \gamma_{n+1} \mathbf{1}_{\mathcal{X}_i}(X_{n+1}) \right)}{\sum_{j=1}^d \theta_n(j) \left( 1 + \gamma_{n+1} \mathbf{1}_{\mathcal{X}_j}(X_{n+1}) \right)},$    (11)

where the deterministic stepsize sequence $(\gamma_n)_{n \ge 1}$ has to be chosen by the practitioner beforehand. The choice of this sequence is not easy: it should converge to zero as $n$ goes to infinity (vanishing adaptation) in order to ensure the convergence of the sequence $(\theta_n)_{n \ge 0}$, but not too fast, otherwise the convergence of $\theta_n$ to $\theta_\star$ is not ensured.

Remark 1

In fact, the updating rule of the original Wang-Landau algorithm is more complicated than (11) since the stepsizes are changed at random stopping times related to a quasi-uniform population of the strata and not at every iteration, see JR11 for a mathematical analysis of the well-posedness of the algorithm.

Going back to SHUS, by setting

$\gamma_{n+1} = \dfrac{\gamma}{\sum_{j=1}^d S_n(j)},$    (12)

it is easy to check that (10) is equivalent to

$\theta_{n+1}(i) = \dfrac{\theta_n(i) \left( 1 + \gamma_{n+1} \mathbf{1}_{\mathcal{X}_i}(X_{n+1}) \right)}{\sum_{j=1}^d \theta_n(j) \left( 1 + \gamma_{n+1} \mathbf{1}_{\mathcal{X}_j}(X_{n+1}) \right)},$    (13)

which explains why SHUS can be seen as a Wang-Landau algorithm with nonlinear update of the weights (see (11)) but with a stepsize sequence which is not chosen by the practitioner: it is adaptively built by the algorithm.
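The equivalence between the update of the unnormalized weights and the nonlinear Wang-Landau update can be checked numerically; the small Python sketch below does so for a single iteration, using formulas (10), (12) and (13) (the numerical values are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)
d, gamma = 5, 1.0
S = rng.uniform(1.0, 2.0, size=d)        # arbitrary unnormalized weights S_n
theta = S / S.sum()                      # normalized weights theta_n
i = 2                                    # stratum visited at iteration n+1

# SHUS update of the unnormalized weights, then renormalization
S_new = S.copy()
S_new[i] += gamma * theta[i]
theta_from_S = S_new / S_new.sum()

# direct nonlinear Wang-Landau update with the stochastic stepsize (12)
gamma_np1 = gamma / S.sum()
unnormalized = theta * (1.0 + gamma_np1 * (np.arange(d) == i))
theta_direct = unnormalized / unnormalized.sum()

print(np.allclose(theta_from_S, theta_direct))   # True: (10) and (13) coincide
```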

3 Convergence results

The goal of this section is to establish convergence results on the sequence $(\theta_n)_{n \ge 0}$ given by (9) to the weight vector $\theta_\star$ and on the distribution of the samples $(X_n)_{n \ge 0}$. To do so, we will extend the results of fort:jourdain:kuhn:lelievre:stoltz:2014 and more generally prove convergence of the Wang-Landau algorithm with a general (either deterministic or stochastic) stepsize sequence and the nonlinear update of the weights. We need the following assumptions on the target density $\pi$ and the kernels $P_\theta$:

  • A1. The density $\pi$ of the target distribution is such that $0 < \inf_{\mathcal{X}} \pi \le \sup_{\mathcal{X}} \pi < +\infty$ and the strata satisfy $\lambda(\mathcal{X}_i) > 0$ for all $i \in \{1, \dots, d\}$.

Notice that this assumption implies that $\theta_\star$ given by (1) is such that $\theta_\star(i) > 0$ for all $i$ (hence $\theta_\star \in \Theta$).

  • A2. For any $\theta \in \Theta$, $P_\theta$ is a Metropolis-Hastings transition kernel with a symmetric proposal kernel satisfying a positivity condition (see fort:jourdain:kuhn:lelievre:stoltz:2014 ), and with invariant distribution $\pi_\theta \, d\lambda$, where $\pi_\theta$ is given by (4).

3.1 Convergence of Wang-Landau algorithms with a general stepsize sequence

In fort:jourdain:kuhn:lelievre:stoltz:2014 , we consider the Wang-Landau algorithm with a linear update of the weights. The linear update version of Wang-Landau consists in changing the updating rule (11) to a linearized version (in the limit of small stepsizes) acting on the normalized weights (in what follows, $I(x)$ denotes the index of the stratum containing the point $x$):

$\theta_{n+1}(i) = \theta_n(i) + \gamma_{n+1} \, \theta_n(i) \left( \mathbf{1}_{\mathcal{X}_i}(X_{n+1}) - \theta_n(I(X_{n+1})) \right).$    (14)

We prove in fort:jourdain:kuhn:lelievre:stoltz:2014 that, when the target density and the kernels satisfy A1 and A2, the Wang-Landau algorithm with this linear update of the weights converges under the following condition on : the sequence is deterministic, ultimately non-increasing,

To the best of our knowledge, neither the convergence of the Wang-Landau algorithm with the nonlinear update of the weights (11) nor the case of a random sequence of stepsizes is addressed in the literature. Both are particular cases of the following Wang-Landau algorithm with general stepsize sequence, which also generalizes SHUS: starting from random variables $\theta_0$ and $X_0$, iterate the following steps:

Algorithm 2

Given ,

  • compute the probability measure on ,

    (15)
  • draw according to the kernel ,

  • compute, for all ,

    (16)

where the positive stepsize sequence $(\gamma_{n+1})_{n \ge 0}$ is supposed to be predictable with respect to the natural filtration $(\mathcal{F}_n)_{n \ge 0}$ of the algorithm (i.e. $\gamma_{n+1}$ is $\mathcal{F}_n$-measurable).

Algorithm 2 is a meta-algorithm: to obtain a practical algorithm, one has to specify the way the stepsize sequence is generated.
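As an illustration of the previous remark, two possible specifications of the stepsize sequence are sketched below: a deterministic Wang-Landau-type schedule fixed beforehand, and the SHUS-type stochastic stepsize computed from the current unnormalized weights. The function names and the particular deterministic schedule are illustrative.

```python
def deterministic_stepsize(n, c=1.0, alpha=0.8):
    """Deterministic Wang-Landau-type schedule chosen beforehand, gamma_n = c / n**alpha."""
    return c / float(n) ** alpha

def shus_stepsize(S, gamma=1.0):
    """SHUS-type stochastic stepsize gamma_{n+1} = gamma / sum_j S_n(j); it only
    depends on the weights at iteration n, hence it is predictable."""
    return gamma / S.sum()
```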

Definition 1

An algorithm of the type described in Algorithm 2 is said to converge if it satisfies the following properties:

  1. $\lim_{n \to \infty} \theta_n = \theta_\star$ almost surely.

  2. For any bounded measurable function $f$ on $\mathcal{X}$, $\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^n f(X_k) = \int_{\mathcal{X}} f \, \pi_\star \, d\lambda$ almost surely.

  3. For any bounded measurable function $f$ on $\mathcal{X}$,

    $\lim_{n \to \infty} \dfrac{1}{n} \sum_{k=1}^n \theta_{k-1}(I(X_k)) \, f(X_k) = \dfrac{1}{d} \int_{\mathcal{X}} f \, \pi \, d\lambda \quad \text{almost surely.}$    (17)

In addition to the almost sure convergence of $\theta_n$ to $\theta_\star$, this definition encompasses the ergodic convergence of the random sequence $(X_n)_{n \ge 0}$ to $\pi_\star \, d\lambda$ and the convergence of an importance sampling-type Monte Carlo average to the probability measure $\pi \, d\lambda$. Our main result is the following Theorem.
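As an illustration of the importance sampling-type average appearing in the third property, the following sketch computes such an estimate from one adaptive trajectory; it follows the form of (17), and the array names are illustrative.

```python
import numpy as np

def reweighted_average(f_values, visited_strata, theta_history):
    """Estimate the average of f under the original target from an adaptive
    trajectory: each sample X_k is weighted by theta_{k-1}(I(X_k)), following
    formula (17) (names are illustrative)."""
    f_values = np.asarray(f_values)
    visited_strata = np.asarray(visited_strata)
    theta_history = np.asarray(theta_history)     # row k is theta_{k-1}
    n, d = theta_history.shape
    weights = theta_history[np.arange(n), visited_strata]
    return d * np.mean(weights * f_values)
```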

Theorem 3.1

Assume A1, A2 and that there exists a deterministic sequence converging to zero such that

(18)

and the stepsize sequence is a.s. non-increasing. Then Algorithm 2 converges in the sense of Definition 1.

One immediately deduces convergence of the Wang-Landau algorithm with deterministic stepsize sequence and nonlinear update (11) under the same assumptions as for the linear update (14), which generalizes the result of fort:jourdain:kuhn:lelievre:stoltz:2014 .

Corollary 1

Assume A1, A2 and that the deterministic sequence is non-increasing and such that and . Then the Wang-Landau algorithm with nonlinear update (11) converges in the sense of Definition 1.

Before outlining the proof of Theorem 3.1, let us discuss its application to the SHUS algorithm.

3.2 Convergence of SHUS

The stepsize sequence obtained in the reformulation of the SHUS algorithm 1 as a Wang-Landau algorithm is clearly decreasing, since the sequence $\left( \sum_{j=1}^d S_n(j) \right)_{n \ge 0}$ is increasing. To apply Theorem 3.1, we also need to check (18). This is the purpose of the following proposition, which is proved in Section 6.

Proposition 1

With probability one, the stepsize sequence in the SHUS algorithm 1 is decreasing and for any ,

Moreover, under A1 and A2, there exists a random variable such that

Since where is deterministic, combining Theorem 3.1 and Proposition 1, we obtain the following convergence result for the SHUS algorithm 1.

Theorem 3.2

Under A1 and A2, the SHUS algorithm 1 converges in the sense of Definition 1.

Remark 2

Since the sequence generated by the SHUS algorithm is a Markov chain, one easily deduces that the SHUS algorithm started from a random initial condition also converges in the sense of Definition 1.

The convergence result from Theorem 3.2 allows us to characterize the asymptotic behavior of the stepsizes. Indeed, since, by (10) and (12),

$\gamma_{n+1} = \dfrac{\gamma}{\sum_{j=1}^d S_0(j) + \gamma \sum_{k=1}^{n} \theta_{k-1}(I(X_k))},$

the property (17) applied with $f \equiv 1$ implies that the SHUS algorithm generates sequences $(S_n)_{n \ge 0}$ and $(\gamma_n)_{n \ge 1}$ which satisfy

Corollary 2

Under A1 and A2, $\lim_{n \to \infty} n \, \gamma_n = d$ almost surely.

Notice that this Corollary implies that the stepsize sequence scales like $d/n$ in the large $n$ limit. In Section 4.3, we will therefore compare SHUS with the Wang-Landau algorithm implemented with a stepsize sequence proportional to $1/n$.

3.3 Strategy of the proof of Theorem 3.1

The proof of Theorem 3.1 is given in Section 6. It relies on a rewriting of the updating mechanism of the sequence as a Stochastic Approximation (SA) algorithm for which convergence results have been proven (see for example andrieu:moulines:priouret:2005 ). Notice that the updating formula (15)–(16) is equivalent to: for all and

(19)

where for all $x \in \mathcal{X}$,

$I(x) = \sum_{i=1}^d i \, \mathbf{1}_{\mathcal{X}_i}(x)$

denotes the index of the stratum where $x$ lies. Upon noting that , (19) is equivalent to

(20)

where is defined by

(21)

and

(22)

Notice that the last term in (20) is of the order of . The recurrence relation (20) is thus in a standard form to apply convergence results for SA algorithms (see e.g. andrieu:moulines:priouret:2005 ).
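For readers less familiar with stochastic approximation, the generic form of such a recursion (as studied, e.g., in andrieu:moulines:priouret:2005 ) can be summarized as follows; this is only a schematic template, the exact expressions of the mean field and of the perturbation term being those of (20)–(22).

```latex
% Schematic form of a stochastic approximation recursion: the iterate moves
% along a mean field h, whose zeros are the candidate limits, plus a
% perturbation term eta which has to be controlled.
\[
  \theta_{n+1} = \theta_n + \gamma_{n+1}\, h(\theta_n) + \gamma_{n+1}\, \eta_{n+1} .
\]
% Here theta_star is expected to be a zero of h, and eta_{n+1} typically
% involves the difference between the update evaluated at the new sample and
% its average under the invariant distribution of the current kernel.
```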

There are however three specific points in Algorithm 2 which make the study of this SA recursion quite technical. A first difficulty arises from the fact that $(X_n)_{n \ge 0}$ alone is not a Markov chain: given the past up to time $n$, $X_{n+1}$ is generated according to a Markov transition kernel computed at the current position $X_n$ but controlled by the current value $\theta_n$, which depends on the whole trajectory. A second one comes from the randomness of the stepsizes $(\gamma_n)_{n \ge 1}$. Finally, it is not clear whether the sequence $(\theta_n)_{n \ge 0}$ remains a.s. in a compact subset of the open set $\Theta$, so that a preliminary step when proving the convergence of the SA recursion is to establish its recurrence, namely the fact that the sequence returns infinitely often to a compact subset of $\Theta$.

3.4 A few crucial intermediate results

Let us highlight a few results which are crucial to tackle these difficulties and establish Theorem 3.1. The fundamental result to address the dynamics of $(X_n)_{n \ge 0}$ controlled by $(\theta_n)_{n \ge 0}$ is the following proposition, established in (fort:jourdain:kuhn:lelievre:stoltz:2014, , Proposition 3.1).

Proposition 2

Assume A1 and A2. There exists such that

where for a signed measure , the total variation norm is defined as

In the present case, the recurrence property of the SA algorithm means the existence of a positive threshold such that infinitely often in , the minimal weight

(23)

is larger than the threshold. Let

(24)

be the smallest index of stratum with smallest weight according to and be the times of return to the stratum of smallest weight: and, for ,

(25)

with the convention . We prove in Section 6 the following recurrence property.

Proposition 3

Assume A1, A2 and the existence of a deterministic sequence converging to such that . Then Algorithm 2 is such that

(26)

and

(27)

Using the recurrence property of Proposition 3, we are then able to prove that Algorithm 2 converges in the sense of Definition 1-(1) by using general convergence results for SA algorithms given in andrieu:moulines:priouret:2005 . The properties in Definition 1-(2) and 1-(3) then follow from convergence results for adaptive Markov Chain Monte Carlo algorithms given in fort:moulines:priouret:2011 . See Section 6 for the details.

4 Numerical investigation of the efficiency

We present in this section some numerical results illustrating the efficiency of SHUS in terms of exit times from a metastable state. We also compare the performances of SHUS and of the Wang-Landau algorithm on this specific example.

4.1 Presentation of the model and of the dynamics

We consider the system based on the two-dimensional potential suggested in PSLS03 , see also amrx for similar experiments on the Wang-Landau algorithm. The state space is $\mathcal{X} \subset \mathbb{R}^2$, and we denote by $x = (x_1, x_2)$ a generic element of $\mathcal{X}$. The reference measure is the Lebesgue measure $\lambda(dx) = dx_1 \, dx_2$. The density of the target measure reads

$\pi(x) = Z_\beta^{-1} \, \mathrm{e}^{-\beta V(x)}$

for some positive inverse temperature $\beta$, with the potential $V$ given by

(28)

and the normalization constant $Z_\beta = \int_{\mathcal{X}} \mathrm{e}^{-\beta V(x)} \, \lambda(dx)$.

We introduce strata , with and . Thus, the element lies in the stratum .

A plot of the level sets of the potential  is presented in Figure 1. The global minima of the potential  are located at the points and (notice that the potential is symmetric with respect to the -axis).

The Metropolis-Hastings kernels are constructed using isotropic proposal moves distributed in each direction according to independent Gaussian random variables with variance . The reference Metropolis-Hastings dynamics is ergodic and reversible with respect to the measure with density . This dynamics is metastable: for local moves ( of the order of a fraction of ), it takes a lot of time to go from the left to the right, or equivalently from the right to the left. More precisely, there are two main metastable states: one located around , and another one around . These two states are separated by a region of low probability. The metastability of the dynamics increases with (i.e. as the temperature decreases). The larger is, the larger is the ratio between the weight under of the strata located near the main metastable states and the weight under of the transition region around , and the more difficult it is to leave the left metastable state to enter the one on the right (and conversely). We refer for example to (amrx, , Fig. 3-1) for a numerical quantification of this statement.
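To fix ideas, here is a Python sketch of the ingredients of this two-dimensional example, which can be plugged into the shus_step sketch of Section 2: a three-hole potential of the type proposed in PSLS03 (the coefficients below are illustrative and are not claimed to reproduce the exact expression (28)), the corresponding unnormalized Boltzmann density, and the stratum index based on the first coordinate. The numerical values of beta, sigma, the number of strata and the stratum range are also illustrative.

```python
import numpy as np

beta = 5.0            # inverse temperature (illustrative value)
sigma = 0.1           # standard deviation of the isotropic Gaussian proposal
d = 64                # number of strata along the first coordinate
x1_min, x1_max = -2.0, 2.0

def V(x):
    """Two-dimensional multimodal potential with two deep wells separated by a
    low-probability region (illustrative coefficients)."""
    x1, x2 = x
    return (3.0 * np.exp(-x1**2 - (x2 - 1.0/3.0)**2)
            - 3.0 * np.exp(-x1**2 - (x2 - 5.0/3.0)**2)
            - 5.0 * np.exp(-(x1 - 1.0)**2 - x2**2)
            - 5.0 * np.exp(-(x1 + 1.0)**2 - x2**2)
            + 0.2 * x1**4 + 0.2 * (x2 - 1.0/3.0)**4)

def pi(x):
    """Unnormalized target density exp(-beta V)."""
    return np.exp(-beta * V(np.asarray(x)))

def stratum_index(x):
    """Index of the stratum, determined by the first coordinate only."""
    i = int((x[0] - x1_min) / (x1_max - x1_min) * d)
    return min(max(i, 0), d - 1)   # clip proposals falling outside the range
```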

As already pointed out in amrx , adaptive algorithms such as the Wang-Landau dynamics are less metastable than the original Metropolis-Hastings dynamics, in the sense that the typical time to leave a metastable state is much smaller thanks to the adaptation mechanism. In this section, we compare the adaptive Markov chain corresponding to the SHUS algorithm with the one generated by the Wang-Landau dynamics with a nonlinear update of the weights (see (11)) with stepsizes for some constant . This choice for is motivated by the asymptotic behavior of in the limit , see Corollary 2. The proposal kernel used in the Metropolis algorithm is the same for SHUS and Wang-Landau. Therefore, the two algorithms only differ by the update rules of the weight sequence . The initial weight vector is and the initial condition is for both dynamics.

Figure 1: Level sets of the potential  defined in (28). The minima are located at the positions , and there are three saddle-points, at the positions and . The energy differences of these saddle points with respect to the minimal potential energy are respectively  and .

4.2 Study of a typical realization

Let us first consider a typical realization of the SHUS algorithm, in the case when is equal to the width of a stratum , with (so that ), and an inverse temperature . The values of the first component of the chain as a function of the iteration index are reported in Figure 2. The trajectory is qualitatively very similar to the trajectories obtained with the Wang-Landau algorithm (see for example (amrx, , Figure 5)). In particular, the exit time out of the second visited well is much larger than the exit time out of the first one. Such a behavior is proved for the Wang-Landau algorithm with a deterministic stepsize in (amrx, , Section 6). After the exit out of the second well, the convergence of the sequence is already almost achieved, and the dynamics freely moves between the two wells.

When the initial exit out of the first well occurs, the biases within this well are already very well converged, see Figure 3.

In the longtime limit, according to Figure 4 (top), one has $n \gamma_n \to d$, as predicted by Corollary 2. In the initial phase (before the exit out of the left well), $n \gamma_n$ stabilizes around a value corresponding to the number of strata explored within the well, see Figure 4 (bottom) (here, the exploration is well performed for values of $x_1$ between -1.2 and -0.45, which corresponds to 15 strata). In this phase, only the restriction of the target density to these strata is seen by the algorithm and this convergence can be seen as a local version of Corollary 2.

Figure 2: Top: Typical trajectory  for the parameters , , and .
Figure 3: Plot of the bias at the iteration index  when the system leaves the left well for the first time (, , and ). The reference values are computed with a numerical quadrature of the integrals (1).
Figure 4: Behavior of the stepsize sequence  (, , and ). Top: Longtime convergence. Bottom: Stabilization during the exploration of the left well before the first exit.

4.3 Scalings of the first exit time

In this section, we study the influence of the three parameters , and on the first exit times (in the limit of small temperature). Concerning , we stick to the assumption that, without any prior knowledge on the system, the choice is natural. In addition, notice that multiplying and the parameter by the same constant does not modify the sequence generated by the SHUS algorithm. This is why we always choose .

Average exit times are obtained by performing independent realizations of the following procedure: initialize the system in the state , and run the dynamics until the first time index such that (i.e. the first component of is larger than 1) for SHUS or for Wang-Landau. The average of this first exit time is denoted by for SHUS and by for Wang-Landau. For a given value of the inverse temperature , the computed average exit times  and  are obtained by averaging over independent realizations of the process started at . We use the Mersenne-Twister random number generator as implemented in the GSL library. Since we work with a fixed maximal computational time (of about a week or two on our computing machines with our implementation of the code), turns out to be of the order of a few thousand for the largest exit times, while in the easiest cases corresponding to the shortest exit times. In our numerical results, we checked that is always sufficiently large so that the relative error on  and  is less than a few percent in the worst cases.
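The averaging procedure can be summarized by the following Python sketch; the stepping function is meant to be one iteration of the adaptive dynamics (for instance the shus_step sketch of Section 2 with its parameters frozen), the stopping criterion is the first iteration at which the first coordinate exceeds 1, and the cap on the number of iterations is an illustrative safeguard absent from the original procedure.

```python
import numpy as np

def mean_first_exit_time(n_real, step, S0, x0, n_max=10**9):
    """Monte Carlo estimate of the average first exit time: run n_real
    independent adaptive realizations started from x0 and record the first
    iteration index at which the first coordinate of the chain exceeds 1.
    `step` is a function (S, x) -> (S, x) performing one iteration."""
    exit_times = []
    for _ in range(n_real):
        S, x = np.array(S0, dtype=float), np.array(x0, dtype=float)
        for n in range(1, n_max):
            S, x = step(S, x)
            if x[0] > 1.0:                 # the chain has reached the right well
                exit_times.append(n)
                break
    exit_times = np.array(exit_times)
    # empirical mean and statistical error of the exit time
    return exit_times.mean(), exit_times.std() / np.sqrt(len(exit_times))
```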

According to the numerical experiments performed in amrx that confirm the theoretical analysis of a simple three-state model also given in amrx , the scaling behavior for in the limit for the Wang-Landau algorithm with a stepsize sequence is

(29)

where and are positive constants which depend on , and .

Due to the initial convergence of $n \gamma_n$ to the number of strata visited before the first exit time, one expects the first exit time of the SHUS algorithm to behave like the first exit time for the Wang-Landau algorithm with stepsizes proportional to $1/n$, with a prefactor of the order of this number of visited strata. Note that since we use strata of equal sizes, this number is proportional to the total number of strata (see Table 4 for a more quantitative assessment).

We consider three situations:

  1. We first study how the exit times vary as a function of with and the fixed value ; see Figure 5 and Table 1.

  2. We then fix and study how the exit times depend on , still with the fixed value ; see Figure 6 and Table 2.

  3. We finally study the scaling of the exit times depending on the value when and are fixed; see Figure 7 and Table 3.

We observe in all cases that the average first exit time is of the form

in the limit of large $\beta$, the values of the prefactor and of the exponential rate being obtained by a least-squares fit in log-log scale. In view of (29), this confirms the validity of the above comparison with the Wang-Landau algorithm. The exponential rate does not seem to depend on the values of the parameters , and . Only the prefactors  depend on these parameters. The larger the number of strata, and the lower the value of , the larger the prefactor is. A more quantitative assessment of the increase of the prefactor with respect to larger numbers of strata  and smaller values of  is provided in the captions of Tables 2 and 3.
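The fit mentioned above can be reproduced with a few lines of Python; assuming an exponential growth of the exit time with the inverse temperature, the rate and the prefactor are obtained by an affine fit of the logarithm of the exit time against the inverse temperature. The data below are placeholders, not the values reported in the figures and tables.

```python
import numpy as np

# placeholder data standing for measured average exit times as in Figures 5-7
beta_values = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
exit_times = np.array([2.1e3, 8.5e3, 3.4e4, 1.4e5, 5.6e5])

# assuming an asymptotic behavior of the form A * exp(c * beta),
# log(exit time) is an affine function of beta
c, logA = np.polyfit(beta_values, np.log(exit_times), deg=1)
print("exponential rate c =", c, "prefactor A =", np.exp(logA))
```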

Figure 5: Exit times as a function of the inverse temperature  when the number of strata is varied while the magnitude of the proposed displacement is modified accordingly as