Error bounds for Metropolis–Hastings algorithms applied to perturbations of Gaussian measures in high dimensions
Abstract
The Metropolis-adjusted Langevin algorithm (MALA) is a Metropolis–Hastings method for approximate sampling from continuous distributions. We derive upper bounds for the contraction rate in Kantorovich–Rubinstein–Wasserstein distance of the MALA chain with semi-implicit Euler proposals applied to log-concave probability measures that have a density w.r.t. a Gaussian reference measure. For sufficiently “regular” densities, the estimates are dimension-independent, and they hold for sufficiently small step sizes that do not depend on the dimension either. In the limit as the step size goes to zero, the bounds approach the known optimal contraction rates for overdamped Langevin diffusions in a convex potential.
A similar approach also applies to Metropolis–Hastings chains with Ornstein–Uhlenbeck proposals. In this case, the resulting estimates are still independent of the dimension but less sharp, reflecting the fact that MALA is a higher-order approximation of the diffusion limit than Metropolis–Hastings with Ornstein–Uhlenbeck proposals.
DOI: 10.1214/13-AAP926. Volume 24, Issue 1 (2014), pages 337–377.
AMS subject classifications: Primary 60J22; secondary 60J05, 65C40, 65C05.
Keywords: Metropolis algorithm, Markov chain Monte Carlo, Langevin diffusion, Euler scheme, coupling, contractivity of Markov kernels.
1 Introduction
The performance of Metropolis–Hastings (MH) methods MRTT , H , RC for sampling probability measures on high-dimensional continuous state spaces has attracted growing attention in recent years. The pioneering works by Roberts, Gelman and Gilks RGG and Roberts and Rosenthal RR show in particular that for product measures on $\mathbb{R}^d$, the average acceptance probabilities for the Random Walk Metropolis algorithm (RWM) and the Metropolis-adjusted Langevin algorithm (MALA) converge to a strictly positive limit as $d \to \infty$ only if the step sizes go to zero of order $d^{-1}$ and $d^{-1/3}$, respectively. In this case, a diffusion limit as $d \to \infty$ has been derived, leading to an optimal scaling of the step sizes maximizing the speed of the limiting diffusion, and an asymptotically optimal acceptance probability.
Recently, the optimal scaling results for RWM and MALA have been extended significantly to targets that are not of product form but have a sufficiently regular density w.r.t. a Gaussian measure; cf. MPS , PSTb . On the other hand, it has been pointed out BRSV , BS , HSV , CRSW that for corresponding perturbations of Gaussian measures, the acceptance probability has a strictly positive limit as $d \to \infty$ for small step sizes that do not depend on the dimension, provided the random walk or Euler proposals in RWM and MALA are replaced by Ornstein–Uhlenbeck or semi-implicit (“preconditioned”) Euler proposals, respectively; cf. also below. Pillai, Stuart and Thiéry PST show that in this case, the Metropolis–Hastings algorithm can be realized directly on an infinite-dimensional Hilbert space arising in the limit as $d \to \infty$, and the corresponding Markov chain converges weakly to an infinite-dimensional overdamped Langevin diffusion as the step size goes to zero.
Mixing properties and convergence to equilibrium of Langevin diffusions have been studied intensively Has , MT , B , BCG , PZ , Roy . In particular, it is well known that contractivity and exponential convergence to equilibrium in Wasserstein distance can be quantified if the stationary distribution is strictly log-concave CL , RS ; cf. also E11 for a recent extension to the non-log-concave case. Because of the diffusion limit results, one might expect that the approximating Metropolis–Hastings chains have similar convergence properties. However, this heuristic may also be wrong, since the convergence of the Markov chains to the diffusion is known only in a weak and nonquantitative sense.
Although there is a huge number of results quantifying the speed of convergence to equilibrium for Markov chains on discrete state spaces (cf. P , SC for an overview), there are relatively few quantitative results on Metropolis–Hastings chains on $\mathbb{R}^d$ when $d$ is large. The most remarkable exceptions are the well-known works DFK , KLS , LV , LV1 , LV2 , which prove upper bounds for the mixing time that are polynomial in the dimension for Metropolis chains with ball walk proposals applied to uniform measures on convex sets and more general log-concave measures.
Below, we develop an approach to quantify Wasserstein contractivity and convergence to equilibrium in a dimension-independent way for Metropolis–Hastings chains with Ornstein–Uhlenbeck and semi-implicit Euler proposals. Our approach applies in the strictly log-concave case (or, more generally, if the measure is strictly log-concave on an appropriate ball) and yields very precise bounds for small step sizes. The results for semi-implicit Euler proposals require less restrictive assumptions than those for Ornstein–Uhlenbeck proposals, reflecting the fact that the corresponding Markov chain is a higher-order approximation of the diffusion.
Our results are closely related and complementary to the recent work HSV11 , and to the dimension-dependent geometric ergodicity results in BRHV . In particular, in HSV11 , Hairer, Stuart and Vollmer apply related methods to establish exponential convergence to equilibrium in Wasserstein distance for Metropolis–Hastings chains with Ornstein–Uhlenbeck proposals in a less quantitative way, but without assuming log-concavity. In the context of probability measures on function spaces, the techniques developed here are applied in the PhD thesis of Gruhlke G .
We now recall some basic facts on Metropolis–Hastings algorithms and describe our setup and the main results. Sections 2 and 3 contain basic results on Wasserstein contractivity of Metropolis–Hastings kernels, and contractivity of the proposal kernels. In Sections 4 and 5, we prove bounds quantifying rejection probabilities and the dependence of the rejection event on the current state for Ornstein–Uhlenbeck and semi-implicit Euler proposals. These bounds, combined with an upper bound for the exit probability of the corresponding Metropolis–Hastings chains from a given ball, derived in Section 6, are crucial for the proof of the main results in Section 7.
1.1 Metropolis–Hastings algorithms
Let $U : \mathbb{R}^d \to \mathbb{R}$ be a lower bounded measurable function such that $\int_{\mathbb{R}^d} e^{-U(x)}\,dx < \infty$, and let $\mu$ denote the probability measure on $\mathbb{R}^d$ with density proportional to $e^{-U}$. We use the same letter $\mu$ for the measure and its density, that is,
(1) $\mu(dx) \;=\; \mu(x)\,dx, \qquad \mu(x) \;\propto\; e^{-U(x)}.$
Below, we view the measure $\mu$ defined by (1) as a perturbation of the standard normal distribution $\gamma^d$ in $\mathbb{R}^d$; that is, we decompose
(2) $U(x) \;=\; \frac{|x|^2}{2} + V(x)$
with a measurable function $V : \mathbb{R}^d \to \mathbb{R}$, and obtain the representation
(3) $\mu(dx) \;=\; Z^{-1}\, e^{-V(x)}\,\gamma^d(dx)$
with normalization constant $Z = \int e^{-V}\,d\gamma^d$. Here $|\cdot|$ denotes the Euclidean norm.
Note that in $\mathbb{R}^d$, any probability measure with a strictly positive density can be represented as an absolutely continuous perturbation of $\gamma^d$ as in (3). In an infinite-dimensional limit, however, the density may degenerate. Nevertheless, absolutely continuous perturbations of Gaussian measures also form an important and widely used class of models on infinite-dimensional spaces.
[(Transition path sampling)] We briefly describe a typical application; cf. HSV and G for details. Suppose that we are interested in sampling a trajectory of a diffusion process conditioned on a given endpoint at a fixed terminal time. We assume that the unconditioned diffusion process satisfies a stochastic differential equation of the form
(4) 
where $B$ is a Brownian motion, and the potential $H$ is bounded from below. Then, by Girsanov’s theorem and Itô’s formula, a regular version of the law of the process conditioned on its initial and terminal values is given, on path space, by
(5) 
where the reference measure is the law of the Brownian bridge between the two prescribed endpoints,
(6) 
cf. Roy . In order to obtain finite-dimensional approximations of the measure on path space, we consider the Wiener–Lévy expansion
(7) 
of a path in terms of the Schauder basis functions adapted to the dyadic partitions of the unit interval. Here the coefficients are real numbers. Recall that truncating the series at a finite level corresponds to taking the polygonal interpolation of the path adapted to the corresponding dyadic partition of the interval. Now fix a level $m$, let $d = 2^m - 1$, and let
$x = (x_1, \ldots, x_d)$ denote the vector consisting of the first $d$ coefficients in the basis expansion of a path. Then the image of the Brownian bridge measure under the projection that maps a path to this coefficient vector is the $d$-dimensional standard normal distribution $\gamma^d$; cf., for example, Ste . Therefore, a natural finite-dimensional approximation to the infinite-dimensional sampling problem described above consists in sampling from the probability measure
(8) 
on $\mathbb{R}^d$, where $Z$ is a normalization constant, and
(9) 
where the integrand in (9) is evaluated along the polygonal path corresponding to the coefficient vector $x$.
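To make the truncated expansion concrete, the following sketch (our own notation: `schauder_basis_matrix`, level `m`) evaluates the Schauder basis functions up to a fixed dyadic level at the interior grid points and verifies that i.i.d. standard normal coefficients reproduce the Brownian bridge covariance $\min(s,t) - st$ exactly on that grid:

```python
import numpy as np

def schauder_basis_matrix(m):
    """Rows: interior dyadic grid points j * 2^-m; columns: Schauder (tent)
    basis functions e_{n,k} for levels n = 0,...,m-1, evaluated at the grid."""
    ts = np.arange(1, 2**m) / 2**m          # 2^m - 1 interior grid points
    cols = []
    for n in range(m):
        for k in range(2**n):
            # tent supported on [k 2^-n, (k+1) 2^-n] with peak 2^(-n/2 - 1)
            left, mid, right = k / 2**n, (k + 0.5) / 2**n, (k + 1) / 2**n
            e = np.zeros_like(ts)
            up = (ts >= left) & (ts <= mid)
            down = (ts > mid) & (ts <= right)
            e[up] = 2 ** (n / 2) * (ts[up] - left)
            e[down] = 2 ** (n / 2) * (right - ts[down])
            cols.append(e)
    return ts, np.column_stack(cols)

m = 4
ts, A = schauder_basis_matrix(m)
# With i.i.d. N(0,1) coefficients, the truncated expansion evaluated at the
# grid has covariance A @ A.T; for the Brownian bridge this equals
# min(s,t) - s*t exactly at dyadic points of mesh 2^-m.
cov = A @ A.T
target = np.minimum.outer(ts, ts) - np.outer(ts, ts)
print(np.max(np.abs(cov - target)))   # numerically zero
```

Basis functions of level $\ge m$ vanish at all grid points of mesh $2^{-m}$, which is why the truncated covariance is exact on the grid.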
Returning to our general setup, suppose that $p(x, dy)$ is an absolutely continuous transition kernel on $\mathbb{R}^d$ with strictly positive densities $p(x, y)$. Let
(10) $\alpha(x,y) \;=\; \min\!\left(1,\; \frac{\mu(y)\,p(y,x)}{\mu(x)\,p(x,y)}\right).$
Note that $\alpha$ does not depend on the normalization constant of $\mu$. The Metropolis–Hastings algorithm with proposal kernel $p$ is the following Markov chain Monte Carlo method for approximate sampling and Monte Carlo integration w.r.t. $\mu$:
(1) Choose an initial state $X_0$.
(2) For $n = 0, 1, 2, \ldots$:
(a) Sample $Y_n \sim p(X_n, \cdot)$ and $U_n \sim \operatorname{Unif}(0,1)$ independently.
(b) If $U_n \le \alpha(X_n, Y_n)$, then accept the proposal and set $X_{n+1} = Y_n$; else reject the proposal and set $X_{n+1} = X_n$.
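The algorithm above can be sketched generically; the helper names and the random-walk usage example at the end are illustrative, not from the paper:

```python
import numpy as np

def metropolis_hastings(log_mu, propose, log_q, x0, n_steps, rng):
    """Generic Metropolis-Hastings chain.

    log_mu  -- unnormalized log-density of the target (normalization cancels)
    propose -- propose(x, rng) -> y, a sample from the proposal kernel p(x, .)
    log_q   -- log proposal density log p(x, y)
    """
    x = np.asarray(x0, dtype=float)
    chain = [x]
    for _ in range(n_steps):
        y = propose(x, rng)
        # log acceptance ratio: log [ mu(y) p(y,x) / (mu(x) p(x,y)) ], cf. (10)
        log_alpha = log_mu(y) + log_q(y, x) - log_mu(x) - log_q(x, y)
        if np.log(rng.uniform()) <= min(0.0, log_alpha):
            x = y                      # accept the proposal
        # else: reject and stay at x
        chain.append(x)
    return np.array(chain)

# Usage: random-walk proposal targeting a one-dimensional standard normal.
rng = np.random.default_rng(0)
chain = metropolis_hastings(
    log_mu=lambda x: -0.5 * float(x @ x),
    propose=lambda x, rng: x + 0.5 * rng.standard_normal(x.shape),
    log_q=lambda x, y: 0.0,            # symmetric proposal: density cancels
    x0=np.zeros(1), n_steps=5000, rng=rng)
print(chain.mean(), chain.var())       # roughly 0 and 1
```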
The algorithm generates a time-homogeneous Markov chain $(X_n)$ with initial state $X_0$ and transition kernel
(11) $q(x, dy) \;=\; \alpha(x,y)\,p(x,y)\,dy \;+\; r(x)\,\delta_x(dy).$
Here
(12) $r(x) \;=\; 1 - \int \alpha(x,y)\,p(x,y)\,dy$
is the average rejection probability for the proposal when the Markov chain is at $x$. Note that $q(x, dy)$ restricted to $\mathbb{R}^d \setminus \{x\}$ is again absolutely continuous with density $q(x,y) = \alpha(x,y)\,p(x,y)$. Since
$\mu(x)\,q(x,y) \;=\; \min\bigl(\mu(x)\,p(x,y),\; \mu(y)\,p(y,x)\bigr)$
is a symmetric function in $x$ and $y$, the kernel satisfies the detailed balance condition
(13) $\mu(dx)\,q(x, dy) \;=\; \mu(dy)\,q(y, dx).$
In particular, $\mu$ is a stationary distribution for the Metropolis–Hastings chain, and the chain with initial distribution $\mu$ is reversible. Therefore, under appropriate ergodicity assumptions, the distribution of $X_n$ will converge to $\mu$ as $n \to \infty$.
To analyze Metropolis–Hastings algorithms it is convenient to introduce the function
(14) $G(x,y) \;=\; \log \frac{\mu(x)\,p(x,y)}{\mu(y)\,p(y,x)}.$
For any $x$ and $y$,
(15) $\alpha(x,y) \;=\; \min\bigl(1,\, e^{-G(x,y)}\bigr) \;=\; \exp\bigl(-G(x,y)^+\bigr).$
In particular, for any $x$ and $y$,
(16)  
(17)  
(18) 
The function $G$ defined by (14) can also be represented in terms of $V$: indeed, since
we have
(19) 
where $\varphi^d$ denotes the standard normal density in $\mathbb{R}^d$.
1.2 Metropolis–Hastings algorithms with Gaussian proposals
We aim to prove contractivity of Metropolis–Hastings kernels w.r.t. appropriate Kantorovich–Rubinstein–Wasserstein distances. For this purpose, we look for a proposal kernel that has adequate contractivity properties and sufficiently small rejection probabilities. The rejection probability is small if the proposal kernel approximately satisfies the detailed balance condition w.r.t. $\mu$.
1.2.1 Ornstein–Uhlenbeck proposals
A straightforward approach is to use a proposal density that satisfies the detailed balance condition
(20) $p(x,y)\,\gamma^d(x) \;=\; p(y,x)\,\gamma^d(y)$
w.r.t. the standard normal distribution. In this case,
(21) $\alpha(x,y) \;=\; \min\bigl(1,\, e^{V(x) - V(y)}\bigr).$
The simplest proposal distributions satisfying (20) are the transition kernels of AR(1) (discrete Ornstein–Uhlenbeck) processes given by
(22) $p_h(x, \cdot) \;=\; N\bigl(\sqrt{1-h}\,x,\; h\,I_d\bigr)$
for some constant $h \in (0,1]$. If $Z$ is a standard normally distributed $\mathbb{R}^d$-valued random variable, then the random variables
(23) $Y_h(x) \;=\; \sqrt{1-h}\,x + \sqrt{h}\,Z$
have distributions $p_h(x, \cdot)$. Note that by (21), the acceptance probabilities
(24) $\alpha_h(x,y) \;=\; \min\bigl(1,\, e^{V(x) - V(y)}\bigr)$
for Ornstein–Uhlenbeck proposals do not depend on $h$.
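The detailed balance relation (20) for the AR(1) proposal kernel can be verified directly in one dimension; the following sketch (function names are ours) checks it numerically for several step sizes:

```python
import math

def ou_log_density(x, y, h):
    """log p_h(x, y) for the discrete Ornstein-Uhlenbeck (AR(1)) proposal
    p_h(x, .) = N(sqrt(1 - h) x, h) in one dimension."""
    m = math.sqrt(1.0 - h) * x
    return -0.5 * math.log(2.0 * math.pi * h) - (y - m) ** 2 / (2.0 * h)

def std_normal_log_density(x):
    return -0.5 * math.log(2.0 * math.pi) - x ** 2 / 2.0

# Detailed balance (20): gamma(x) p_h(x, y) = gamma(y) p_h(y, x) for all x, y, h.
dev = max(
    abs((std_normal_log_density(x) + ou_log_density(x, y, h))
        - (std_normal_log_density(y) + ou_log_density(y, x, h)))
    for h in (0.01, 0.5, 0.99)
    for (x, y) in [(0.3, -1.2), (2.0, 0.7), (-1.5, -0.4)])
print(dev)   # numerically zero
```

The identity holds because the exponent of $\gamma(x)\,p_h(x,y)$ is $-(x^2 + y^2 - 2\sqrt{1-h}\,xy)/(2h)$, which is symmetric in $x$ and $y$.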
1.2.2 Euler proposals
In continuous time, under appropriate regularity and growth conditions on $U$, detailed balance w.r.t. $\mu$ is satisfied exactly by the transition functions of the diffusion process solving the overdamped Langevin stochastic differential equation
(25) $dX_t \;=\; dB_t \;-\; \tfrac{1}{2}\,X_t\,dt \;-\; \tfrac{1}{2}\,\nabla V(X_t)\,dt \;=\; dB_t \;-\; \tfrac{1}{2}\,\nabla U(X_t)\,dt,$
because the generator $\mathcal{L} = \frac{1}{2}\Delta - \frac{1}{2}\nabla U \cdot \nabla$
is a self-adjoint operator on an appropriate dense subspace of $L^2(\mu)$; cf. Roy . Although we cannot compute and sample from the transition functions exactly, we can use approximations as proposals in a Metropolis–Hastings algorithm. A corresponding MH algorithm where the proposals are obtained from a discretization scheme for the SDE (25) is called a Metropolis-adjusted Langevin algorithm (MALA); cf. RT , RC .
In this paper, we focus on the MALA scheme with proposal kernel
(26) $p_h(x, \cdot) \;=\; N\bigl(\sqrt{1-h}\,x - \tfrac{h}{2}\,\nabla V(x),\; h\,I_d\bigr)$
for some constant $h \in (0,1]$; that is, $p_h(x, \cdot)$ is the distribution of
$Y_h(x) \;=\; \sqrt{1-h}\,x - \tfrac{h}{2}\,\nabla V(x) + \sqrt{h}\,Z,$
where $Z$ is a standard normal random variable with values in $\mathbb{R}^d$.
Note that if the coefficient $\sqrt{1-h}$ of $x$ is replaced by $1 - h/2$, then (26) becomes a standard Euler discretization step for the SDE (25). Using $\sqrt{1-h}$ instead of $1 - h/2$ ensures that detailed balance is satisfied exactly for $V \equiv 0$. Alternatively, (26) can be viewed as a semi-implicit Euler discretization step for (25):
[(Euler schemes)] The explicit Euler discretization of the overdamped Langevin equation (25) with time step size $h > 0$ is given by
(28) $X_{k+1} \;=\; X_k - \tfrac{h}{2}\,X_k - \tfrac{h}{2}\,\nabla V(X_k) + \sqrt{h}\,Z_{k+1},$
where $Z_1, Z_2, \ldots$ are i.i.d. random variables with distribution $N(0, I_d)$. The process defined by (28) is a time-homogeneous Markov chain with transition kernel
(29) 
Even for $V \equiv 0$, the measure $\mu$ is not a stationary distribution for the kernel defined by (29). A semi-implicit Euler scheme for (25) with time step size $h$ is given by
(30) 
with $Z_1, Z_2, \ldots$ i.i.d. with distribution $N(0, I_d)$; cf. HSV . Note that the scheme is implicit only in the linear part of the drift but explicit in $\nabla V$. Solving (30) for $X_{k+1}$ and rescaling the step size yields the equivalent equation
(31) 
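The non-stationarity of the standard normal distribution under the explicit scheme can be checked directly in the Gaussian case $V \equiv 0$: assuming the linear drift $-x/2$ as in (25), the explicit Euler chain is an AR(1) process whose stationary variance differs from $1$ (a minimal check; the function name is ours):

```python
# Explicit Euler chain for V = 0: X_{k+1} = (1 - h/2) X_k + sqrt(h) Z_{k+1},
# an AR(1) process with coefficient a = 1 - h/2 and innovation variance h.
def stationary_variance(h):
    a = 1.0 - h / 2.0
    # the stationary variance v of X' = a X + sqrt(h) Z solves v = a^2 v + h
    return h / (1.0 - a * a)       # = 1 / (1 - h/4)

for h in (0.5, 0.1, 0.01):
    # strictly larger than 1 for every h > 0, so N(0, 1) is not stationary;
    # the discrepancy vanishes only in the limit h -> 0
    print(h, stationary_variance(h))
```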
We call the Metropolis–Hastings algorithm with proposal kernel $p_h$ a semi-implicit MALA scheme with step size $h$.
Proposition 1.1 ((Acceptance probabilities for semiimplicit MALA))
Let $h \in (0,1]$ and $x, y \in \mathbb{R}^d$. Then the acceptance probabilities for the Metropolis-adjusted Langevin algorithm with proposal kernels $p_h$ are given by $\alpha_h(x,y) = \exp(-G_h(x,y)^+)$ with
(32)  
For explicit Euler proposals with step size $h$, a corresponding representation holds with
The proof of the proposition is given in Section 4 below.
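Combining the proposal and the acceptance probability, one full transition of the semi-implicit MALA chain might be sketched as follows. The coefficients of the proposal are an assumption on our part (the coefficient $\sqrt{1-h}$ in place of the explicit Euler coefficient makes the proposal reversible w.r.t. the standard normal for $V \equiv 0$), and the perturbation in the usage example is hypothetical:

```python
import numpy as np

def semi_implicit_mala_step(x, V, grad_V, h, rng):
    """One Metropolis-Hastings step with a semi-implicit Euler proposal
    (illustrative sketch; the exact coefficients of the scheme are an
    assumption here).  The proposal mean is sqrt(1-h) x - (h/2) grad V(x)
    with variance h, so for V = 0 the proposal kernel is in detailed
    balance with N(0, I_d) and every proposal is accepted."""
    a, s2 = np.sqrt(1.0 - h), h

    def mean(u):
        return a * u - 0.5 * h * grad_V(u)

    def log_mu_p(u, v):
        # log[ mu(u) p_h(u, v) ] up to a constant, mu ~ exp(-|u|^2/2 - V(u))
        d = v - mean(u)
        return -0.5 * (u @ u) - V(u) - (d @ d) / (2.0 * s2)

    y = mean(x) + np.sqrt(s2) * rng.standard_normal(x.shape)
    log_alpha = log_mu_p(y, x) - log_mu_p(x, y)   # cf. (10)
    return y if np.log(rng.uniform()) <= min(0.0, log_alpha) else x

# Usage with the hypothetical perturbation V(x) = |x|^4 / 4 in d = 2:
rng = np.random.default_rng(1)
x = np.ones(2)
for _ in range(100):
    x = semi_implicit_mala_step(x, lambda u: 0.25 * (u @ u) ** 2,
                                lambda u: (u @ u) * u, h=0.1, rng=rng)
print(x.shape)  # (2,)
```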
For explicit Euler proposals, the corresponding correction term does not vanish for $V \equiv 0$. More significantly, this term blows up as $d \to \infty$, and its variance w.r.t. the proposal distribution grows with the dimension.
1.3 Bounds for rejection probabilities
We fix a norm $\|\cdot\|_-$ on $\mathbb{R}^d$ such that
(34) $\|x\|_- \;\le\; |x| \qquad \text{for all } x \in \mathbb{R}^d.$
We assume that $V$ is sufficiently smooth w.r.t. $\|\cdot\|_-$, with derivatives growing at most polynomially:
The function $V$ is in $C^4(\mathbb{R}^d)$, and for each order of differentiation $n \le 4$ there exist finite constants $C_n$, $p_n$ such that
$\bigl|\partial^n_{\xi_1 \cdots \xi_n} V(x)\bigr| \;\le\; C_n \max\bigl(1, \|x\|_-\bigr)^{p_n}\, \|\xi_1\|_- \cdots \|\xi_n\|_-$
holds for any $x, \xi_1, \ldots, \xi_n \in \mathbb{R}^d$. For discretizations of infinite-dimensional models, $\|\cdot\|_-$ will typically be a finite-dimensional approximation of a norm that is almost surely finite w.r.t. the limit measure in infinite dimensions.
[(Transition path sampling)] Consider the situation of Example 1.1, and assume that $H$ is sufficiently smooth. Then by (9) and (6), $V$ is smooth as well. For $x$ and directions $\xi$ in $\mathbb{R}^d$, the directional derivatives of $V$ are given by
(35) 
where the polygonal paths correspond to the respective coefficient vectors. Assuming that the derivatives of $H$ grow at most polynomially, we can estimate
where the middle factor is a discrete norm of the polygonal path, and the constants are finite and do not depend on the dimension $d$. One could now choose for the minus norm the norm on $\mathbb{R}^d$ corresponding to this discrete norm on polygonal paths. However, it is more convenient to choose a norm coming from an inner product. To this end, we consider Sobolev-type norms on path space, and the induced norms on $\mathbb{R}^d$. One can show that for an appropriate range of the smoothness parameter, the minus norm can be bounded from above by the discrete norm independently of the dimension; cf. G . On the other hand, for small enough smoothness parameter, the norm is finite for almost every path of the Brownian bridge. This property will be crucial when restricting to balls w.r.t. $\|\cdot\|_-$. For intermediate values of the parameter, both requirements are satisfied, and Assumption 1.3 holds with constants that do not depend on the dimension.
The next proposition yields in particular an upper bound for the average rejection probability w.r.t. both Ornstein–Uhlenbeck and semi-implicit Euler proposals at a given position $x$; cf. BouRabee for an analogous result:
Proposition 1.2 ((Upper bounds for MH rejection probabilities))
Suppose that Assumption 1.3 is satisfied. Then there exist polynomials of explicitly given degrees such that for any $x \in \mathbb{R}^d$ and any sufficiently small step size $h$,
(1) The polynomials in Proposition 1.2 are explicit; cf. the proof below. They depend only on the constants in Assumption 1.3 and on the moments
(36) 
but they do not depend on the dimension $d$. For semi-implicit Euler proposals, the upper bound in Proposition 1.2 is stated in explicit form in Section 4 below.
(2) For explicit Euler proposals, corresponding estimates hold with modified constants; cf. Remark 4 below. Note, however, that these estimates degenerate as $d \to \infty$.
Our next result is a bound for the average dependence of the acceptance event on the current state w.r.t. Ornstein–Uhlenbeck and semi-implicit Euler proposals. Let $\|\cdot\|_+$ denote the dual norm of $\|\cdot\|_-$ on $\mathbb{R}^d$, that is,
Note that
For a function $f \in C^1(\mathbb{R}^d)$,
$|f(x) - f(y)| \;\le\; \sup_z \|\nabla f(z)\|_+\; \|x - y\|_-;$
that is, the plus norm of the gradient determines the Lipschitz constant w.r.t. the minus norm.
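The duality between the plus and minus norms can be illustrated with weighted Euclidean norms (an assumed concrete choice of the inner product; the weights below are hypothetical):

```python
import numpy as np

w = np.array([1.0, 4.0, 9.0])    # hypothetical diagonal weights

def minus(v):
    """||v||_- induced by the inner product (u, v) = sum_i w_i u_i v_i."""
    return float(np.sqrt(np.sum(w * v * v)))

def plus(g):
    """Dual norm: ||g||_+ = sup { g . v : ||v||_- <= 1 } = |W^{-1/2} g|."""
    return float(np.sqrt(np.sum(g * g / w)))

# Cauchy-Schwarz in the weighted inner product gives |g . v| <= ||g||_+ ||v||_-
rng = np.random.default_rng(3)
ok = all(
    abs(float(g @ v)) <= plus(g) * minus(v) + 1e-12
    for g, v in (rng.standard_normal((2, 3)) for _ in range(1000)))
print(ok)   # True
```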
Proposition 1.3 ((Dependence of rejection on the current state))
Suppose that Assumption 1.3 is satisfied, and let the step size be sufficiently small. Then there exist polynomials of explicitly given degrees such that for any $x, y \in \mathbb{R}^d$,
(37)  
(38)  
(39)  
(40)  
where the bound involves the line segment between $x$ and $y$.
The proof of the proposition is given in Section 5 below.
Again, the polynomials are explicit. They depend only on the constants in Assumption 1.3 and on the moments of the standard normal distribution, but they do not depend on the dimension $d$. For semi-implicit Euler proposals, the upper bound in Proposition 1.3 is made explicit in Section 5 below.
For Ornstein–Uhlenbeck proposals, it will be useful to state the bounds in Propositions 1.2 and 1.3 more explicitly for the case in which the second derivatives of $V$ are uniformly bounded w.r.t. the minus norm:
Proposition 1.4
1.4 Wasserstein contractivity
The bounds in Propositions 1.2, 1.3 and 1.4 can be applied to study contractivity properties of Metropolis–Hastings transition kernels. Recall that the Kantorovich–Rubinstein or $L^1$ Wasserstein distance of two probability measures $\nu$ and $\tilde\nu$ on the Borel $\sigma$-algebra w.r.t. a given metric $\rho$ on $\mathbb{R}^d$ is defined by
$W_\rho(\nu, \tilde\nu) \;=\; \inf_{\xi \in \Pi(\nu, \tilde\nu)} \int \rho(x,y)\,\xi(dx\,dy),$
where $\Pi(\nu, \tilde\nu)$ consists of all couplings of $\nu$ and $\tilde\nu$, that is, all probability measures on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\nu$ and $\tilde\nu$; cf., for example, Villani . Recall that a coupling of $\nu$ and $\tilde\nu$ can be realized by random variables $X$ and $Y$ defined on a joint probability space such that $X \sim \nu$ and $Y \sim \tilde\nu$.
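In one dimension, the infimum in the definition above can be computed explicitly for empirical measures with equally many atoms: the monotone (sorted) pairing is an optimal coupling. A minimal sketch, illustrative only and not part of the paper's argument:

```python
import numpy as np

def w1_empirical(xs, ys):
    """L^1 Wasserstein distance w.r.t. rho(x, y) = |x - y| between two
    empirical measures with the same number of equally weighted atoms:
    in one dimension the monotone (sorted) pairing is an optimal coupling."""
    return float(np.mean(np.abs(np.sort(xs) - np.sort(ys))))

xs = np.array([0.0, 1.0, 3.0])
ys = np.array([2.0, 0.5, 4.0])
# monotone pairing: 0 - 0.5, 1 - 2, 3 - 4, so the distance is (0.5 + 1 + 1) / 3
print(w1_empirical(xs, ys))   # 0.8333...
```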
In order to derive upper bounds for the Kantorovich–Rubinstein–Wasserstein distances between the Metropolis–Hastings transition probabilities at different initial states, we define a coupling of the MALA transition probabilities by setting
Here the basic coupling of the proposal distributions defined in Section 1.2.2 uses the same Gaussian increment for both components, and the random variable $U$ used in the accept–reject step is uniformly distributed in $(0,1)$ and independent of the increment.
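The construction above, in which both chains are driven by the same Gaussian increment and the same uniform accept–reject variable, can be sketched in one dimension for OU proposals; the perturbation $V(t) = t^2$ and all parameters below are hypothetical choices for illustration:

```python
import numpy as np

def coupled_mh_ou_step(x1, x2, V, h, rng):
    """Basic coupling of two MH chains with Ornstein-Uhlenbeck proposals:
    both components use the same increment z and the same uniform u."""
    z = rng.standard_normal()
    u = rng.uniform()
    a = np.sqrt(1.0 - h)
    y1, y2 = a * x1 + np.sqrt(h) * z, a * x2 + np.sqrt(h) * z
    # by (21), a proposal from x is accepted iff log(u) <= V(x) - V(y)
    x1 = y1 if np.log(u) <= V(x1) - V(y1) else x1
    x2 = y2 if np.log(u) <= V(x2) - V(y2) else x2
    return x1, x2

rng = np.random.default_rng(2)
V = lambda t: t * t                   # hypothetical convex perturbation
x1, x2 = 1.0, -1.0
dists = [abs(x1 - x2)]
for _ in range(200):
    x1, x2 = coupled_mh_ou_step(x1, x2, V, h=0.05, rng=rng)
    dists.append(abs(x1 - x2))
print(dists[0], dists[-1])
```

On steps where both chains accept, the coupling distance contracts exactly by the factor $\sqrt{1-h}$; it stays constant when both reject and can grow only when the two accept–reject decisions differ.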
Correspondingly, we define a coupling of the Metropolis–Hastings transition kernels based on Ornstein–Uhlenbeck proposals by setting
Let
denote the centered ball of radius $R$ w.r.t. the minus norm. As a consequence of Proposition 1.4 above, we obtain the following upper bound for the Kantorovich–Rubinstein–Wasserstein distance of the transition probabilities w.r.t. the metric induced by $\|\cdot\|_-$:
Theorem 1.5 ((Contractivity of MH transitions based on OU proposals))
Suppose that Assumption 1.3 is satisfied with uniformly bounded second derivatives of $V$. Then for any radius $R$, any two initial states in the corresponding centered ball, and any sufficiently small step size $h$,
where
with an explicit constant that only depends on the constants in the assumption.
The proof is given in Section 7 below.
Theorem 1.5 shows that Wasserstein contractivity holds on the centered ball provided the step size $h$ is chosen sufficiently small depending on the radius. In this case, the contraction constant depends on the dimension only through the constants in the assumption. On the other hand, the following one-dimensional example shows that otherwise the acceptance–rejection step may destroy the contraction properties of the OU proposals:
Suppose that $d = 1$. If the perturbation $V$ has a sufficiently small second derivative, then by Theorem 1.5, Wasserstein contractivity holds for the Metropolis–Hastings chain with Ornstein–Uhlenbeck proposals on a bounded interval provided $h$ is chosen sufficiently small. On the other hand, for suitable choices of $V$, the logarithmic density
is strictly concave, and it can easily be seen that Wasserstein contractivity does not hold for the MH chain with OU proposals if $h$ is sufficiently small.
A disadvantage of the result for Ornstein–Uhlenbeck proposals stated above is that not only a lower bound on the second derivative of $V$ is required (this would be a fairly natural condition, as the example indicates), but also an upper bound of the same size. For semi-implicit Euler proposals, we can derive a better result that requires only a strictly positive lower bound on the second derivative and Assumption 1.3 with arbitrary finite constants. For this purpose we assume that $\|x\|_- = (x, x)_-^{1/2}$ for an inner product $(\cdot, \cdot)_-$ on $\mathbb{R}^d$, and we make the following assumption on $U$:
There exists a strictly positive constant $K$ such that
(41) 
Of course, Assumption 1.4 is still restrictive, and it will often be satisfied only in a suitable ball around a local minimum of $U$. Most of the results below are stated on a given ball w.r.t. the minus norm. In this case it is enough to assume that Assumption 1.4 holds on that ball. If $\|\cdot\|_-$ coincides with the Euclidean norm, then the assumption is equivalent to strong convexity of $U$. Moreover, since $U(x) = |x|^2/2 + V(x)$, a sufficient condition for (41) to hold is
(42) 