Mix & Match Hamiltonian Monte Carlo

# Mix & Match Hamiltonian Monte Carlo

ms.bib

Abstract

The Hamiltonian Monte Carlo (HMC) method has been recognized as a powerful sampling tool in computational statistics. We show that performance of HMC can be dramatically improved by incorporating importance sampling and an irreversible part of the dynamics into the chain. This is achieved by replacing Hamiltonians in the Metropolis test with modified Hamiltonians, and a complete momentum update with a partial momentum refreshment. We call the resulting generalized HMC importance sampler—Mix & Match Hamiltonian Monte Carlo (MMHMC). The method is irreversible by construction and has been further complemented by (i) the efficient algorithms for computation of modified Hamiltonians; (ii) the implicit momentum update procedure as well as (iii) the two-stage splitting integration schemes specially derived for the methods sampling with modified Hamiltonians. MMHMC has been implemented in the in-house software package HaiCS (Hamiltonians in Computational Statistics), tested on the popular statistical models and compared in sampling efficiency with HMC, Generalized Hybrid Monte Carlo (GHMC), Riemann Manifold Hamiltonian Monte Carlo (RMHMC), Metropolis Adjusted Langevin Algorithm (MALA) and Random Walk Metropolis-Hastings (RWMH). To make a fair comparison, we propose a metric that accounts for both correlations among samples and weights, and can be readily used for all methods which generate such samples. The experiments reveal the superiority of MMHMC over popular sampling techniques, especially in solving high dimensional problems.

Keywords: Bayesian inference, Markov chain Monte Carlo, Hamiltonian Monte Carlo, modified Hamiltonians, splitting integrators

## 1 Introduction

Despite the complementary nature, Hamiltonian dynamics and Metropolis Monte Carlo had never been considered jointly until the Hybrid Monte Carlo method was formulated in the seminal paper by \citet*DKPR87. It was originally applied to lattice field theory simulations and remained unknown for statistical applications till 1994, when R. Neal used the method in neural network models \citepNeal94. Since then, the common name in statistical applications is Hamiltonian Monte Carlo (HMC). The practitioners-friendly guides to HMC can be found in \citepNeal10,Betancourt:2017, while comprehensive geometrical foundations are provided in \citepBBLG16. The conditions under which HMC is geometrically ergodic have been established recently \citepLBBG16.

Nowadays, HMC is used in a wide range of applications—from molecular simulations to statistical problems appearing in many fields, such as ecology, cosmology, social sciences, biology, pharmacometrics, biomedicine, engineering, business. The software package Stan \citepStan has contributed to the increased popularity of the method by implementing HMC based sampling within a probabilistic modeling language in which statisticians can write their models in a familiar notation.

For a range of problems in computational statistics the HMC method has proved to be a successful and valuable technique. The efficient use of gradient information of the posterior distribution allows it to overcome the random walk behavior typical of the Metropolis-Hastings Monte Carlo method.

On the other hand, the performance of HMC deteriorates exponentially, in terms of acceptance rates, with respect to the system’s size and the step size due to errors introduced by numerical approximations \citepIH04. Many rejections induce high correlations between samples and reduce the efficiency of the estimator. Thus, in systems with large number of parameters, or latent parameters, or when the data set of observations is very large, efficient sampling might require a substantial number of evaluations of the posterior distribution and its gradient. This may be computationally too demanding for HMC. In order to maintain the acceptance rate for larger systems at a high level, one should either decrease a step size or use a higher order integrator, which is usually impractical for large systems.

Ideally, one would like to have a sampler that increases acceptance rates, has faster convergence, improves sampling efficiency and whose rational choice of simulation parameters is not difficult to determine.

To achieve some of those goals, several modifications of the HMC method have been recently developed for statistical applications (see Figure 1).

It is worth of mentioning here the methods employing a position dependent ‘mass’ matrix \citepGC11,Betancourt13a,Lan:2012, adaptive HMC \citepHoffmanGelman14,Betancourt13,WdF12,WMdF13, delayed rejections HMC \citepSDMdW14,CSS14, HMC with the approximated gradients \citepCFG14,SSLS15,Zhang:2015,Zhang:2015a, problem related HMC \citepBetancourt:2010, brubaker2012family,LZS14,PP13,LSS14, Betancourt:2015,BG13, ZS14,zhang2012continuous, enhanced sampling HMC—for example, tempering HMC \citepvdMPW14, Hamiltonian Annealed Importance Sampling \citepSDC12, and special cases of HMC—for example, Metropolis Adjusted Langevin Algorithm \citepKennedy:1990.

Among the modifications introduced in computational physical sciences, the most important ones are partial momentum update and sampling with modified energies.

The partial momentum update (in contrast to the complete momentum update) was introduced by \citetHorowitz91 within Generalized guided Monte Carlo, also known as the second order Langevin Monte Carlo (L2MC). The purpose of this method was to retain more dynamical information on a simulated system.

\citet

Kennedy01 formalized this idea in the Generalized Hybrid Monte Carlo (GHMC) method. GHMC is defined as the concatenation of two steps: Molecular Dynamics Monte Carlo and Partial Momentum Update.

Applications of the GHMC method to date include mainly molecular simulations. Behavior of non-special cases of GHMC are not well studied in statistical simulations, with only a few exceptions, for example in \citepSohl-Dickstein12, SDMdW14.

The idea of implementing HMC method with respect to a modified density by using the modified (shadow) Hamiltonian in the Metropolis test was suggested by \citetIH04. The performance of the resulting Shadow Hybrid Monte Carlo (SHMC) is limited by the need for fine tuning the parameter introduced for controlling the difference in the true and modified Hamiltonians and by evaluation of a non-separable modified Hamiltonian.

The SHMC was modified by \citetSHSI09 by replacing a non-separable shadow Hamiltonian with the separable 4th order shadow Hamiltonian to result in Separable Shadow Hybrid Monte Carlo (S2HMC).

The first method to incorporate both the partial momentum update and sampling with respect to a modified density was introduced by \citetAR06 and called Targeted Shadow Hybrid Monte Carlo (TSHMC). However, the Generalized Shadow Hybrid Monte Carlo (GSHMC) method formulated by \citetAkhmatskaya08 appears the most efficient among the listed methods \citepWee08,Akhmatskaya:2009,ARN11.

The potential advantage of GSHMC compared to HMC is enhanced sampling as a consequence of: (i) higher acceptance rates, achieved due to better conservation of modified Hamiltonians by symplectic integrators; (ii) an access to second-order information about the target distribution; (iii) an additional parameter for improving performance; and (iv) irreversibility. The latter property of the method has never been pointed out by the authors whatsoever. Nevertheless, there is a great evidence that irreversible samplers result in better mixing properties than their reversible counterparts do \citepOttobre:2016.

On the other hand, potential disadvantages include one more parameter that should be tuned and some extra computational cost that is introduced through computation of modified Hamiltonians for each proposal as well as an additional Metropolis test for the momentum update step.

The GSHMC method has never been investigated for solving statistical inference problems although its applicability has been recognized \citepAkhmatskaya:2012.

In this paper, we present the Mix & Match Hamiltonian Monte Carlo (MMHMC) method which is based on the GSHMC method but modified, enriched with the new features and adapted specially to computational statistics. We implemented MMHMC in our software package HaiCS (details in Section 9), which offers implementation of several other HMC based samplers as well as a range of popular statistical models.

The modifications of GSHMC that led to the MMHMC method include:

• New formulations of modified Hamiltonians that allow for (i) employing multi-stage integrators in MMHMC; (ii) efficient implementation using quantities available from a simulation.

• Derivation of novel multi-stage numerical integrators, which can improve conservation of (modified) Hamiltonians compared with the commonly used Verlet integrator.

• Incorporating momentum updates within the Metropolis test resulting in less frequent calculation of derivatives in certain cases.

• An extension of the reduced momentum flipping technique to the methods sampling with modified Hamiltonians, in order to lessen the potentially negative impact of reverse trajectories.

As an additional contribution in this paper, we propose a new metric for sampling efficiency of methods that generate samples, which are both correlated and weighted (see Section 9.2).

In the following we provide details on each modification to the original GSHMC. We start with the formulation of MMHMC in Section 2. Section 3 provides two alternative ways for derivation of the 4th and 6th order modified Hamiltonians for multi-stage splitting integrators. In Section 4 the new multi-stage integrators specifically developed for the methods sampling with modified Hamiltonians are presented. Section 5 introduces the implicit partial momentum update whereas Section 6 offers the adaptation of a reduced flipping technique \citepWagonerPande12 to MMHMC. The choice of parameters for MMHMC is discussed in Section 7. We provide the algorithm for MMHMC in Section 8. The details of software implementation and testing procedure as well as the test results obtained for MMHMC and compared with various popular sampling techniques are discussed in Section 9. The conclusions are summarized in Section 10.

## 2 Mix & Match Hamiltonian Monte Carlo (MMHMC): Formulation

### 2.1 Hamiltonian Monte Carlo: Essentials

We start with revising the Hamiltonian Monte Carlo method. The purpose of HMC is to sample a random variable with the distribution or to estimate integrals of the form

 I=∫f(θ)π(θ)dθ. (1)

We will use the same notation for the probability density function (p.d.f.), which can be written as

 π(θ)=1Zexp(−U(θ)),

where the variable corresponds to the position vector, to the potential function of a Hamiltonian system and is the normalizing constant such that integrates to one. In Bayesian framework, the target distribution is the posterior distribution of model parameters given data , being the size of the data, and the potential function can be defined as

 U(θ)=−logL(θ|y)−logp(θ), (2)

for the likelihood function and prior p.d.f.  of model parameters.

The auxiliary momentum variable , conjugate to and independent on the vector is typically drawn from a normal distribution

 p∼N(0,M), (3)

with a covariance matrix , which is positive definite and often diagonal. The Hamiltonian function can be defined in terms of the target p.d.f. as the sum of the potential function and the kinetic function

 H(θ,p)=U(θ)+K(p)=U(θ)+12pTM−1p+12log((2π)D|M|). (4)

The joint p.d.f. is then

 π(θ,p) =1Zexp(−H(θ,p))=(2π)D2|M|Zexp(−U(θ))exp(−12pTM−1p) (5) ∝exp(−U(θ))exp(−12pTM−1p).

By simulating a Markov chain with the invariant distribution (5) and marginalizing out momentum variables, one recovers the target distribution .

HMC samples from by alternating a step for a momentum refreshment and a step for a joint, position and momentum, update, for each Monte Carlo iteration. In the first step, momentum is replaced by a new draw from the normal distribution (3). In the second step, a proposal for the new state, , is generated by integrating Hamiltonian dynamics

 dθdt=M−1p,dpdt=−Uθ(θ), (6)

for steps using a symplectic integrator with a step size . Due to the numerical approximation of integration, Hamiltonian function and thus, the density (5), are not preserved. In order to restore this property, which ensures invariance of the target density, an accept-reject step is added through a Metropolis criteria. The acceptance probability has a simple form

 α=min{1,exp(−(H(θ′,p′)−H(θ,p)))},

which, due to the preservation of volume, does not include potentially difficult to compute Jacobians of the mapping. As in any MCMC method, in case of a rejection, the current state is stored as a new sample. Once next sample is obtained, momentum is replaced by a new draw, so Hamiltonians have different values for consecutive samples. This means that samples are drawn along different level sets of Hamiltonians, which actually makes HMC an efficient sampler.

For a constant matrix the last term in the Hamiltonian (4) is a constant that cancels out in the Metropolis test. Therefore, the Hamiltonian can be defined as

 H(θ,p)=U(θ)+12pTM−1p.

The algorithmic summary of the HMC method is given below in Algorithm 1.

### 2.2 Mmhmc

As HMC, the MMHMC method aims at sampling unknown parameters with the distribution (known up to a normalizing constant)

 π(θ)∝exp(−U(θ)).

However, this is achieved indirectly, as shown in Figure 2. More precisely, MMHMC performs HMC importance sampling \citepKahnMarshall53 on the joint state space of parameters and momenta with respect to a modified Hamiltonian . The importance sampling distribution is defined as

 ~π(θ,p)∝exp(−~H(θ,p)). (7)

The target distribution on the joint state space , with respect to the true Hamiltonian , is recovered through importance reweighting and finally, the desired distribution by marginalizing momenta variables.

MMHMC consists of the three main steps:

1. Partial Momentum Monte Carlo (PMMC) – Momentum is partially updated using a noise vector and accepted according to the extended modified distribution with defined as

 ^H(x,p,u)=~H(x,p)+12u⊺M−1u. (8)

For the current momentum and a noise vector a proposal for the new momentum is generated from the mapping such that

 R(θ,p,u)=(θ,√1−φp+√φu,−√φp+√1−φu) (9)

and it is accepted according to the extended p.d.f.

 ^π(θ,p,u)∝exp(−(~H(θ,p)+12uTM−1u)) (10)

with probability

 P=min⎧⎨⎩1,exp(−(~H(θ,p∗)+12(u∗)TM−1u∗))exp(−(~H(θ,p)+12uTM−1u))⎫⎬⎭. (11)

If the new momentum is accepted, one set . Otherwise, the momentum stays unchanged.

The parameter controls the amount of noise introduced in every iteration. In Section 5 it will be shown that more efficient formulation of partial momentum update can be proposed in order to reduce a number of expensive derivative calculations.

2. Hamiltonian Dynamics Monte Carlo (HDMC) – A proposal is generated by simulating Hamiltonian dynamics (6) using a symplectic and reversible numerical integrator and accepted with the Metropolis criterion corresponding to the modified distribution (7) as

 (θnew,pnew)={(θ′,p′) with probability% α=min{1,exp(−Δ~H)}F(θ,p) otherwise (12)

where flips the momentum in the case of rejection and .

3. Reweighting – After iterations of the MMHMC algorithm (here is a chosen length of a MMHMC simulation), reweighting is required in order to estimate the integral (1). By making use of the standard technique for importance samplers, the integral is rewritten as

 I= Eπ[f]=∫f(θ)π(θ,p)dθdp=∫f(θ)π(θ,p)~π(θ,p)~π(θ,p)dθdp (13) = ∫f(θ)w(θ,p)~π(θ,p)dθdp=E~π[fw],

where is the importance distribution and the importance weight function. Since distributions and are known up to a normalizing constant, one may estimate this integral as

 ^I=∑Nn=1f(θn)wn∑Nn=1wn,wn=exp(~H(θn,pn)−H(θn,pn)), (14)

where are draws from , and are the corresponding weights.

Once a step size is chosen such that the modified Hamiltonian is a close approximation of the true Hamiltonian, the backward error analysis is valid \citepLR05. In particular, the difference between the true and modified Hamiltonian

 H(θ,p)−~H(θ,p)=O(hp), (15)

where is the order of the numerical integrator, implies that the reduction in efficiency of the estimator (14), introduced due to importance sampling, is minor in the case of the MMHMC method.

The main algorithmic differences between HMC and MMHMC are listed in Table 1. We discuss in more detail the last difference in the next section.

### 2.3 Irreversibility of MMHMC

Until recently, the significant attention in the literature has been paid to the theoretical analysis of reversible Markov chains rather than the study of irreversible Markov Chain Monte Carlo methods. However, numerous latest theoretical and numerical results demonstrate the advantage of irreversible MCMC over reversible algorithms both in terms of rate of convergence to the target distribution and variance of an estimator \citepNeal:2004,Suwa:2012,Ohzeki:2015,Bouchard-Cote:2016,Ottobre:2016,Duncan:2016,Duncan:2017. These well documented facts have induced a design of new algorithms which break the detailed balance condition (DBC)—a commonly used criterion to demonstrate the invariance of the chain. Some recent examples of irreversible methods based on Hamiltonian dynamcis can be found in \citepOttobre:2016,OPPS14,Ma:2016.

The core of the MMHMC algorithm consists of two steps, PMMC and HDMC, which both leave the target distribution invariant \citepAkhmatskaya08. However, the resulting chain is not reversible.

Apart from being invariant with respect to the target distribution, the HDMC step satisfies the modified DBC. The proof for the GHMC method can be found e.g. in \citepFSSS14, and the only difference in a case of MMHMC is that the target distribution, and thus the acceptance probability, is defined with respect to the modified Hamiltonian.

As the PMMC step is specific only to MMHMC (and, of course, GSHMC), we provide a direct proof of invariance of this step (Supplementary material). Furthermore, in an analogous way to HDMC, it can be proved that PMMC satisfies the modified DBC. The key observation is that the proposal mapping for momenta (9) is reversible w.r.t. the extended target , , and the reversing mapping is an involution.

The irreversibility of MMHMC arrises from an important observation—a composition of steps satisfying the DBC does not preserve the DBC. Therefore, although both steps of MMHMC do satisfy the (modified) DBC, their compositon is not symmetric and hence, the chain generated by MMHMC is not reversible by construction.

By that point, MMHMC is essentially the GSHMC method formulated in statistical framework. In the following sections, we introduce the new features that are specific to MMHMC only.

## 3 Modified Hamiltonians

The original GSHMC method has been formulated and implemented using the leapfrog integrator and the corresponding modified Hamiltonians. Our intention is to combine MMHMC with the numerical integrators which potentially can offer better conservation properties than Verlet. More specifically, we are interested in numerical integrators belonging to two-stage

 ψh=φBbh∘φAh2∘φB(1−2b)h∘φAh2∘φBbh, (16)

three-stage

 ψh=φBbh∘φAah∘φB(12−b)h∘φA(1−2a)h∘φB(12−b)h∘φAah∘φBbh (17)

and four-stage

 ψh=φBb1h∘φAah∘φBb2h∘φA(12−a)h∘φB(1−2b1−2b2)h∘φA(12−a)h∘φBb2h∘φAah∘φBb1h (18)

families of splitting methods, which require two, three or four gradient evaluations per step size, respectively. The exact flows and are solutions to the split systems

 A:dθdt=0,dpdt=−Uθ(θ), (19)

and

 B:dθdt=M−1p,dpdt=0, (20)

respectively. To incorporate splitting integrators in the MMHMC method, the formulation and implementation of appropriate modified Hamiltonians are required.

One procedure to calculate modified Hamiltonians of orders up to 24 is provided in \citepSkeel01,Engle05 for the Verlet integrator and it is further improved using Richardson extrapolation in \citepMoanNiesen10. This approach could be generalized to multi-stage integrators. However, it requires a modification of the integrator by introducing an additional scalar variable into dynamics. We opt for a different strategy in deriving appropriate expressions for modified Hamiltonians depending on one, two and three parameters for two-, three- and four-stage methods, respectively.

We consider splitting methods and start with writing the expansion of the Hamiltonian function with a quadratic kinetic function, in terms of Poisson brackets1 of partial Hamiltonians (19)-(20)

 ~H=H + h2α{A,A,B}+h2β{B,B,A} + h4γ1{A,A,A,A,B}+h4γ2{B,A,A,A,B} + h4γ3{B,B,A,A,B}+h4γ4{A,A,B,B,A}+O(h6)

where are polynomials written in terms of the integrators’ coefficients \citepBCSS14. Iterated Poisson brackets are denoted as .

The expressions for a modified Hamiltonian of an arbitrary order can be obtained by directly applying the Baker-Campbell-Hausdorff formula to the exponentials of Lie derivatives and iteratively, but the computation is cumbersome except for a low order approximation \citepSanzSerna94. Alternatively, coefficients multiplying Poisson brackets for the 4th, 6th and 8th order modified Hamiltonians for symmetric composition methods can be derived from expressions given by \citetOMF02. In the case of general non-symmetric composition methods with an arbitrary number of stages, one can obtain coefficients and using results derived in ([HLW06], Lemma III.5.5).

Here we propose two alternative ways to derive the expression for the 4th and 6th order modified Hamiltonians. One uses analytical derivatives of the potential function whereas another one relies on numerical time derivatives of its gradient, obtained through the quantities available from a simulation.

### 3.1 Analytical Derivatives

For problems in which derivatives of the potential functions are available, we derive the 4th and 6th order modified Hamiltonians by first expanding terms from (3) using Poisson bracket definition as

 {A,A,B} = pTM−1Uθθ(θ)M−1p {B,B,A} = Uθ(θ)TM−1Uθ(θ) {A,A,A,A,B} = Uθθθθ(θ)M−1pM−1pM−1pM−1p {B,A,A,A,B} = −3Uθ(θ)TM−1Uθθθ(θ)M−1pM−1p {B,B,A,A,B} = 2Uθ(θ)TM−1Uθθ(θ)M−1Uθ(θ) {A,A,B,B,A} = 2Uθ(θ)TM−1Uθθθ(θ)M−1pM−1p +2pTM−1Uθθ(θ)M−1Uθθ(θ)M−1p.

This leads to the following 4th and 6th order modified Hamiltonians for splitting integrators

 ~H[4](θ,p)= H(θ,p)+h2c21pTM−1Uθθ(θ)M−1p+h2c22Uθ(θ)TM−1Uθ(θ), (23) ~H[6](θ,p)= ~H[4](θ,p)+h4c41Uθθθθ(θ)M−1pM−1pM−1pM−1p (24) +h4c42Uθ(θ)TM−1Uθθθ(θ)M−1pM−1p +h4c43Uθ(θ)TM−1Uθθ(θ)M−1Uθ(θ) +h4c44pTM−1Uθθ(θ)M−1Uθθ(θ)M−1p,

where

 c21=α,c22=β,c41=γ1,c42=2γ4−3γ2,c43=2γ3,c44=2γ4. (25)

Coefficients can be derived from expressions in terms of Poisson brackets, given in \citepOMF02 where the authors analyzed the so called force-gradient integrators for molecular dynamics. In particular, they considered the splitting integrators that are extended by an additional higher-order operator into the single-exponential propagations. If the potential function is quadratic, i.e. corresponding to problems of sampling from Gaussian distributions, the 6th order modified Hamiltonian (24) simplifies to

 ~H[6](θ,p)=~H[4](θ,p) +h4c43Uθ(θ)TM−1Uθθ(θ)M−1Uθ(θ) (26) +h4c44pTM−1Uθθ(θ)M−1Uθθ(θ)M−1p.

Combining (25) with expressions for we obtain the following coefficients for the two-stage integrator family (16)

 c21= 124(6b−1) (27) c22= 112(6b2−6b+1) c41= 15760(7−30b) c42= 1240(−10b2+15b−3) c43= 1120(−30b3+35b2−15b+2) c44= 1240(20b2−1).

For three-stage integrators (17) (a two-parameter family) we get

 c21= 112(1−6a(1−a)(1−2b)) (28) c22= 124(6a(1−2b)2−1) c41= 1720(1+2(a−1)a(8+31(a−1)a)(1−2b)−4b) c42= 1240(6a3(1−2b)2−a2(19−116b+36b2+240b3)+a(27−208b+308b2)−48b2+48b−7) c43= 1180(1+15a(1−2b)(−1+2a(2−3b+a(4b−2)))) c44= 1240(−1+20a(1−2b)(b+a(1+6(b−1)b))).

Finally, for four-stage integrators (18) (a three-parameter family) the coefficients read as

 c21= 112(6b21−6b1+1+6b2(1−2a)(2b1+b2−1)) (29) c22= 124(6(b1+b2(1−2a)2)−1) c41= 15760(7+60(8(a−1)2a2−1)b1) c42= 196(1−12b1+40b21−24b31+4(1−2a)(a−3+(20−6a)b1+6(3+2a)b21)b2+8(1−2a)(5+ 9a2+6a(b1−2)−9b1)b22−24(1−2a)2b32) c43= 1360(2−15b1+30b21+15(1−2a)2(4(1+a)b1−1−2a)b2+30(1−2a)3b22) c44= 1120(2−30b31+5b21(7−6(4a(1+a)−3)b2)+5(1−2a)b2((7−6b2)b2−3+ 2a(6b22−1−3b2))+5b1(2(1−2a)b2(7−9b2+6a(1+b2))−3)).

Using (27) one can also obtain the modified Hamiltonian for the Verlet integrator, since two steps of Verlet integration are equivalent to one step of the two-stage integrator with . The coefficients are therefore

 c21 = 112,c22=−124 (30) c41 =

Figure 3 shows computational overheads of MMHMC, using the 4th order modified Hamiltonian (23) derived for two-stage integrators, compared to the HMC method. The left-hand graph presents the overhead for a model with a tridiagonal Hessian matrix and indicates that for two different dimensions of the system the overhead becomes negligible as the number of integration steps increases. In contrast, for models with a dense Hessian matrix computation of modified Hamiltonians may introduce a significant additional cost, as shown in the right-hand graph.

### 3.2 Numerical Derivatives

For applications with a dense Hessian matrix (and higher derivatives), computational overhead from calculations of modified Hamiltonians reduces the advantages of the MMHMC method. In order to implement such calculations in an efficient manner, we propose to express modified Hamiltonians in terms of quantities that are available during the simulation. Instead of making use of the time derivatives of the position vectors, as carried out in the original GSHMC method, we employ identities for time derivatives of the gradient of the potential function, as follows,

 U(1)θ = Uθθ(θ)M−1p U(2)θ = Uθθθ(θ)M−1pM−1p−Uθθ(θ)M−1Uθ(θ) (31) U(3)θ = Uθθθθ(θ)M−1pM−1pM−1p−3Uθθθ(θ)M−1Uθ(θ)M−1p −Uθθ(θ)M−1Uθθ(θ)M−1p.

Substituting these time derivatives (3.2) into the analytical expressions (23)-(24) for the 4th and 6th order modified Hamiltonians, respectively, one obtains

 ~H[4](θ,p) = H(θ,p)+h2k21pTM−1Uθ(1)+h2k22Uθ(θ)TM−1Uθ(θ), ~H[6](θ,p) = ~H[4](θ,p)+h4k41pTM−1Uθ(3)+h4k42Uθ(θ)TM−1Uθ(2) + h4k43Uθ(1)TM−1Uθ(1)+h4k44Uθ(θ)TM−1Uθθ(θ)M−1Uθ(θ),

where the coefficients are

 k21 = c21,k22=c22, (34) k41 =

We note that the newly derived expression (3.2) does not include the Hessian of the potential function and thus, allows for computation of using quantities available from a simulation. However, it is not the case for the resulting 6th order Hamiltonians. The last term in (3.2), arising from an expansion of the Poisson bracket , cannot be computed using time derivatives of available quantities and requires explicit calculation of the Hessian matrix of the potential function. Only for the Verlet integrator does this term vanish and the resulting coefficients are

 k21 = 112,k22=−124, k41 = (35)

One can now write explicit expressions for coefficients by simply substituting the derived coefficients (27), (28) or (29) into (34) for two-, three- or four-stage integrators, respectively.

In the original GSHMC method, an interpolating polynomial of positions is constructed from a numerical trajectory , where and for the 4th and 6th order modified Hamiltonian, respectively. This requires four or six additional gradient calculations in order to compute or , respectively. We choose a different strategy and calculate the polynomial in terms of the gradient of the potential function

 U(ti)=Uθ(θi),i=n−k,…,n,…,n+k.

With this approach for the 4th order and for the 6th order modified Hamiltonian, meaning that an evaluation of or requires two or four additional gradient calculations, respectively. Note that corresponds to a multiple of the full integration step only in the case of the Verlet integrator; for others it is the number of stages performed (e.g.  corresponds to a half integration step of a four-stage method). Also note that an efficient implementation does not include the unnecessary integration sub-step of momentum update at the very beginning and very end of the numerical trajectory .

Time derivatives of the gradient of the potential function are approximated using central finite difference of second order of accuracy for the 4th order modified Hamiltonian

 U(1)θ≈U(tn+1)−U(tn−1)2ε=:U(1), (36)

where for the Verlet, for two-stage and for three- and four-stage integrators, being the integration step size and being the integrator’s coefficient advancing position variables. The 6th order modified Hamiltonian, here considered only for the Verlet and two-stage integrators, is calculated using centered differences of fourth order accuracy for the first derivative and second order accuracy for the second and third derivatives

 U(1)θ ≈ U(tn−2)−8U(tn−1)+8U(tn+1)−U(tn+2)12ε=:U(1) U(2)θ ≈ U(tn−1)−2U(tn)+U(tn+1)ε2=:U(2) U(3)θ ≈ −U(tn−2)+2U(tn−1)−2U(tn+1)+U(tn+2)2ε3=:U(3),

where depends on the integrator as before.

The final expressions for the newly derived modified Hamiltonians are

 ~H[4](θ,p) = H(θ,p)+hk21pTM−1P1+h2k22Uθ(θ)TM−1Uθ(θ) ~H[6](θ,p) = ~H[4](θ,p)+hk41pTM−1P3+h2k42Uθ(θ)TM−1P2 + h2k43PT1M−1P1+h4k44Uθ(θ)TM−1Uθθ(θ)M−1Uθ(θ),

where . We note that the term with the coefficient is calculated exactly, i.e. avoiding finite difference approximation, which therefore improves the approximation of the modified Hamiltonian compared to the original strategy used in GSHMC. We also note that compared to the expressions with analytical derivatives (23) and (24) with coefficients multiplying exact derivatives, in the formulations (3.2) and (3.2) for the 4th and 6th order Hamiltonians, respectively, the terms arising from those multiplying and are approximated with . The level of accuracy provided by the modified Hamiltonians (3.2) and (3.2), however, are not affected by these approximations.

The computational overhead of MMHMC compared to the HMC method is shown in Figure 4 for models with a tridiagonal (left-hand graph) and a dense Hessian matrix (right-hand graph) using the modified Hamiltonians (3.2) and (3.2) of 4th and 6th order, respectively, with numerical approximations of derivatives. Compared to Figure 3, where all derivatives are calculated analytically, we note that for models with a sparse Hessian (left-hand graphs), the 4th order modified Hamiltonian (23) with analytical derivatives introduces less computational overhead than (3.2) with a numerical approximation. This is due to additional forwards and backwards integration steps, which do not counterbalance the inexpensive Hessian calculation. For models with a dense Hessian matrix (right-hand graphs) we recommend always using (3.2), which significantly reduces the overhead. The 6th order modified Hamiltonian (3.2) clearly requires additional computational effort, due to two extra gradient calculations per MC iteration.

In summary, we provided two alternative formulations of the 4th and 6th order modified Hamiltonians corresponding to multi-stage integrators (16)–(18) with arbitrary coefficients. For the cases when analytical derivatives of the potential function are available and inexpensive to compute, the modified Hamiltonians can be calculated using (23)-(30). For problems in which this is not the case, we provided formulations of modified Hamiltonians which mainly rely on quantities available from the simulation. Both approaches can be used with any multi-stage integrator (16)–(18) including the Verlet integrator.

In the following section, we devise the novel numerical integrators specifically for the methods sampling with modified Hamiltonians and examine their performance in comparison with the earlier proposed integrators for HMC methods.

## 4 Multi-stage Integrators

Until now, the leapfrog integrator has been the integrator of choice for the GSHMC method. Modified Hamiltonians in this case have been obtained using the Lagrangian formalism by \citetAkhmatskaya08. In this section, we consider alternative integrators and investigate their competitiveness with the Verlet integrator.

Our focus now shifts to multi-stage integrators. There are two reasons for an interest in these integrators. One is their potentially higher than in Verlet accuracy at the same computational cost. This implies higher acceptance rates and longer step sizes, thus better space exploration. Another possible benefit from the integrators of this class is avoiding a need for computationally expensive higher order modified Hamiltonians due to the accurate integration.

Our goal is to derive the new multi-stage integrators for being used in the methods which sample with modified Hamiltonians and compare their impact on the MMHMC performance with the efficiency of the integrators proposed for HMC by \citetBCSS14 and the Verlet integrator.

In the MMHMC method, the underlying system is driven by Hamiltonian dynamics (6). The equations of motion are therefore the same as in the HMC method; however, MMHMC possesses the different Metropolis test whose success depends on the accuracy of an integrator. Indeed, the sampling performance of MMHMC is controlled not by an energy error as in HMC but by a modified energy error. Thus, inspired by the ideas of \citetMcL95 and \citetBCSS14 for improving HMC performance by minimizing (expected) energy error through the appropriate choice of parameters of an integrator, we design the new integrators by considering the (expected) error in the modified Hamiltonian of order , in order to enhance performance of MMHMC. The expected values of such errors are taken with respect to the modified density , instead of the true density .

The parameters of new integrators will be chosen through minimization of either Hamiltonian error introduced after integration

 Δ=~H[l](Ψh,L(θ,p))−~H[l](θ,p), (39)

or its expected value . Here is the exact -time map of the modified Hamiltonian . With this approach, we design the minimum error and minimum expected error integrators for sampling with modified (M) Hamiltonians. In order to distinguish these integrators from the corresponding ones designed for the HMC method, the new integrators are denoted as M-ME and M-BCSS, respectively.

### 4.1 Minimum Error (M-ME) Integrators

We wish to construct the minimum error integrators for the 4th order modified Hamiltonian.

The Taylor expansion of the 4th order modified Hamiltonian after one integration step with the method can be written as \citepSanzSerna94

 ~H[4](θ′,p′) = ~H[4](Ψh(θ,p))=exp(hL~H)~H[4](θ,p) = ~H[4](θ,p)+hL~H~H[4](θ,p)+12h2L2~H~H[4](θ,p)+…,

where is the modified Hamiltonian (3) expressed in terms of Poisson brackets. Recalling the definition of the Lie derivative, , the error in after one integration step reads

 Δ(θ,p)=h5( γ1{A,A,A,A,A,B}(θ,p)+γ1{B,A,A,A,A,B}(θ,p) (40) + γ2{A,B,A,A,A,B}(θ,p)+γ2{B,B,A,A,A,B}(θ,p) + γ3{A,B,B,A,A,B}(θ,p)+γ3{B,B,B,A,A,B}(θ,p) + γ4{A,A,A,B,B,A}(θ,p)+γ4{B,A,A,B,B,A}(θ,p)).

An error metric for the 4th order modified Hamiltonian can then be defined as a function of the integrating coefficients

 E=√γ21+γ22+γ23+γ24, (41)

where the explicit expressions for follow from relationship (25) as

 γ1=c41,γ2=13(c44−c42),γ3=12c43,γ4=12c44

and coefficients are calculated from (27), (28) or (29) for two-, three- or four-stage integrators, respectively. For quadratic potential and kinetic functions, corresponding to the problem of sampling from a Gaussian distribution, error (40) simplifies and the error metric can be defined as

 EG=|γ2+γ4|. (42)

In contrast to this approach, the error metric for the minimum error integrator derived for sampling with the true Hamiltonian, i.e. the HMC method, is defined through the Hamiltonian truncation error at the state , rather than the error in Hamiltonian after numerical integration \citepMcL95. Minimization of the error metric results in the coefficient for the two-stage integrator.

In order to obtain numerical values for integrating coefficients for the MMHMC method, we minimized the metrics and on the interval using Mathematica. In Table 2 the coefficients obtained for each integrator with the corresponding error metrics for multi-stage M-ME integrators are summarized. The smallest error metric is achieved using three-stage integrators.

Error metric definitions (41) and (42) are based on the assumption that the iterated brackets from error (40) in contribute equally to the Hamiltonian error. This assumption does not hold in general, although it is a reasonable assumption to start with. Moreover, the weights of the brackets depend on the problem at hand, and their estimation could lead to different families of problem specific integrators. However, in this paper, our aim is to obtain the integrators for use in a broad range of problems.

### 4.2 Minimum Expected Error Two-stage Integrator (M-BCSS)

As in the previous section, the modified Hamiltonians considered here are of order 4. We adopt a strategy similar to the one proposed by \citetBCSS14, namely to find the parameters of integrators that minimize the expected value of the error. In our case, the error (39), resulting from numerical integration is in terms of the modified Hamiltonian and the expected value is taken with respect to the modified density .

As in the case when considering the error in the true Hamiltonian, one may prove that the expected error in the modified Hamiltonian is also positive. The objective is, therefore, to find a function that bounds , i.e.

 0≤E~π(Δ)≤ρ(h,b).

Here is a parameter for two-stage integrators and is a dimensionless step size; to guarantee stability. We omit here the derivation of as it can be found elsewhere \citepAkhmatskaya:2017 and provide the expression for for the family of two-stage integrators when sampling with 4th order modified Hamiltonians, which is

 ρ(h,b)=h8(b(12+4b(6b−5)+b(1+4b(3b−2))h2)−2)24(2−bh2)(4+(2b−1)h2)(2+b(2b−1)h2)(12+(6b−1)h2)