Importance sampling in path space for diffusion processes with slowfast variables
Abstract
Importance sampling is a widely used technique to reduce the variance of a Monte Carlo estimator by an appropriate change of measure. In this work, we study importance sampling in the framework of diffusion process and consider the change of measure which is realized by adding a control force to the original dynamics. For certain exponential type expectation, the corresponding control force of the optimal change of measure leads to a zerovariance estimator and is related to the solution of a HamiltonJacobiBellmann equation. We focus on certain diffusions with both slow and fast variables, and the main result is that we obtain an upper bound of the relative error for the importance sampling estimators with control obtained from the limiting dynamics. We demonstrate our approximation strategy with an illustrative numerical example.
Keywords:
Importance sampling HamiltonJacobiBellmann equation Monte Carlo method change of measure rare events diffusion process.∎
1 Introduction
Monte Carlo (MC) methods are powerful tools to solve highdimensional problems that are not amenable to gridbased numerical schemes junliu_mc . Despite their quite long history since the invention of the computer, the development of MC method and applications thereof are a field of active research. Variants of the standard Monte Carlo method include Metropolis MC mcmchastings ; mcmcbrooks , Hybrid MC hmcduane ; hmcSchutte , Sequential MC smcliu ; smcDoucet2001 , to mention just a few.
A key issue for many MC methods is variance reduction in order to improve the convergence of the corresponding MC estimators. Although all unbiased MC estimators share the same decay of their variances with the sample size , the prefactor matters a lot for the performance of the MC method. Therefore variance reduction techniques (see, e.g., mcasmussen2007 ; junliu_mc ) seek to decrease the constant prefactor and thus to increase the accuracy and efficiency of the estimators.
In this paper, we focus on the importance sampling method for variance reduction. The basic idea is to generate samples from an alternative probability distribution (rather than sampling from the original probability distribution), so that the “important” regions in state space are more frequently sampled. To give an example, consider a realvalued random variable on some probability space and the calculation of a probability
of the event that is rare. When the set is rarely hit by the random variable , it may be a good idea to draw samples from another probability distribution, say, so that the event has larger probability under . An unbiased estimator of can then be based on the appropriately reweighted expectation under , i.e.,
with being the RadonNikodym derivative of with respect to . The difficulty now lies in a clever choice of , because not every probability measure that puts more weight on the “important” region leads to a variance reduction of the corresponding estimator. Especially in cases when the two probability distributions are too different from each other so that the RadonNikodym derivative (or likelihood ratio) becomes almost degenerate, the variance typically grows and one is better off with the plain vanilla MC estimator that is based on drawing samples from the original distribution . Importance sampling thus deals with clever choices of that enhance the sampling of events like while mimicking the behaviour of the original distribution in the relevant regions. Often such a choice can be based on large deviation asymptotics that provides estimates for the probability of the event as a function of a smallness parameter; see, e.g., ipblanchet2008 ; ipglasserman ; ipasmussen2008 ; ipdupuis ; ipdupuismultiscale ; iperic .
Here we focus on the path sampling problem for diffusion processes. Specifically, given a diffusion process governed by a stochastic differential equation (SDE), our aim is to compute the expectation of some path functional of with respect to the underlying probability measure generated by the Brownian motion. In this setting, we want to apply importance sampling and draw samples (i.e. trajectories) from a modified SDE to which a control force has been added that drives the dynamics to the important regions in state space. The control force generates a new probability measure on the space of trajectories , and estimating the expectation of the path functional with respect to the original probability measure by sampling from the controlled SDE is possible if the trajectories are reweighted according to the Girsanov theorem oksendalSDE . We confine ourselves to certain exponential path functionals which will be explicitly given below. For this type of path functionals, the optimal change of measure exists that admits importance sampling estimator with zero variance. Furthermore, the path sampling problem admits a dual formulation in terms of a stochastic optimal control problem, in which case finding the optimal change of measure is equivalent to solving the HamiltonJacobiBellmann (HJB) equation associated with the stochastic control problem.
Relevant work and contribution of this paper. While in general it is impractical to find the exact optimal control force by solving an optimal control problem, there is some hope to find computable approximations to the optimal control that yield importance sampling estimators which are sufficiently accurate in that they have small variance. A general theoretical framework has been established by Dupuis and Wang in dupuis_isaaces ; ipdupuis , where they connected the subsolutions of HJB equation and the rate of variance decay for the corresponding importance sampling estimators. This theoretical framework has been further applied by Dupuis, Spiliopoulos and Wang in a series of papers rare_event_rough ; ipdupuismultiscale ; ipkostas1 ; ipkostas3 to study systems of quite general forms and several adaptive importance sampling schemes were suggested based on large deviation analysis. In many cases, these importance sampling schemes were shown to be asymptotically optimal in logarithmic sense. Also see discussions in iperic ; spiliopoulos2015 . Closely related to our present work, dynamics involving two parameters , that represent time scale separation between slow and fast variables and the noise intensity, were studied in ipkostas1 . Therein the author carried out a systematic analysis for dynamics within different regimes that are expressed by the ratio as , where . Importance sampling for systems in the regime when with random environment was studied in ipkostas3 . A numerical scheme that leads to importance sampling estimators with vanishing relative error for diffusion processes in the small noise limit has been proposed in iperic . On the other hand, while importance sampling is crucial in the small noise limit when , some recent work ipkostas2 ; spiliopoulos2015 also considered the performance of importance sampling estimators when is small but fixed (preasymptotic regime), especially when systems’ metastability is involved ipkostas2 .
Inspired by these previous studies, in the present work we consider importance sampling for diffusions with both slow and fast time scales. See equation (3.1) in Section 3. Instead of studying importance sampling estimators associated with general subsolutions of the HJB equation as in ipdupuis ; rare_event_rough ; ipdupuismultiscale ; ipkostas1 ; ipkostas3 , we consider a specific control which can be constructed from the lowdimensional limiting dynamics. The main contribution of the present work is Theorem 3.1 in Section 3 which states that, under certain assumptions, the importance sampling estimator associated to this specific control is asymptotically optimal in the time scale separation limit and an upper bound on the relative error of the corresponding estimator is obtained. To the best of our knowledge, this is the first result about the explicit dependence of the relative error of the importance sampling estimator on the timescale separation parameter. As a secondary contribution, since the proof is based on a careful study of the multiscale process and the limiting process, several error estimates for the strong approximation of the original process by the limiting process are obtained as a byproduct. See Theorem 5.25.4 in Section 5.
Before concluding the introduction, we compare our results with the previous work in more details and discuss some limitations. First of all, the twoscale dynamics (3.1) considered in the present work is a special case of the dynamics considered in ipkostas1 ; ipkostas3 (corresponding to coefficients there). This specialization allows us to prove strong convergence of the dynamics towards the limit dynamics. Secondly, instead of considering asymptotic regime for both as in ipdupuismultiscale ; ipkostas1 ; ipkostas3 , here we only consider the timescale separation limit and assume the other parameter in (3.1), which is related to system’s temperature, is fixed. (Roughly speaking, this corresponds to the case when with fixed in ipkostas1 ; ipkostas3 ). As a consequence, the constant in Theorem 3.1 depends on . Thirdly, we assume Lipschitz conditions on system’s coefficients, which may be restrictive in many applications. Generalizing the theoretical results to nonLipschitz case is possible but not trivial and will be considered in future work. We refer to cerrai_nonlip for a related studies of reactiondiffusion equations.
Nevertheless, the twoscale dynamics (3.1) is an interesting mathematical paradigm for many applications that involve both slow and fast time scales (we refer to asymptotic_analysis ; book_PS08 for general references about averaging and homogenization). And our results are of different type comparing to the above mentioned literatures. In applications, especially in climate sciences and molecular dynamics meta_simple_climate_model ; majda_math_pers ; msm , systems may have a few degrees of freedom which evolves on a large time scale and exhibits metastability feature, while the other degrees of freedom are rapidly evolving. In this situation, due to the presence of metastability, standard Monte Carlo sampling may become inefficient and shows large sample variance even for moderate temperatures (also see ipkostas2 ). We expect our results will be relevant for developing efficient importance sampling schemes in this situation. A more detailed discussion based on an illustrative numerical example will be presented in Section 4.
Organization of the article. This paper is organized as follows. In Section 2, we briefly introduce the importance sampling method in the diffusion setting and discuss the variance of Monte Carlo estimators corresponding to a general control force. Section 3 states the assumptions and our main result: an upper bound of the relative error for the importance sampling estimator based on suboptimal controls for the multiscale diffusions; the result is proved in Section 5, but we provide some heuristic arguments based on formal asymptotic expansions already in Section 3. Section 4 shows an illustrative numerical example that demonstrate the performance of the importance sampling method. Appendix A and B contain technical results that are used in the proof.
2 Importance sampling of diffusions
We consider the conditional expectation
(2.1) 
on a finite time interval , where , , and satisfies the dynamics
(2.2) 
with , , is a standard dimensional Wiener process. Exponential expectations similar to (2.1) may arise either in connection with importance sampling ipdupuismultiscale ; ipkostas1 ; ipkostas3 ; iperic , or due to its close relationship with certain optimal control problem var_rep1998 ; fleming2006 . In recent years, it has also been exploited by physicists to study phase transitions Jack1 ; Hedges06032009 .
2.1 Importance sampling method
In this subsection we introduce the importance sampling method to compute quantify (2.1). To simplify matters, we assume all the coefficients are smooth and the controls satisfy the Novikov condition such that the Girsanov theorem can be applied oksendalSDE . Specific assumptions and the concrete form of dynamics will be given in Section 3.
It is known that dynamics (2.2) induces a probability measure over the path ensembles starting from . To apply the importance sampling method, we introduce
(2.3) 
where will be referred to as the control force. Then it follows from Girsanov theorem oksendalSDE that is a standard dimensional Wiener process under probability measure , with RadonNikodym derivative
(2.4) 
In the following, we will omit the conditioning on the initial value at time . Letting denote the expectation under , we have
(2.5) 
with variance
(2.6) 
Moreover, under , we have
(2.7) 
Now consider the calculation of (2.5) by a Monte Carlo sampling in path space, and suppose that independent trajectories of (2.7) have been generated where . An unbiased estimator of (2.1) is now given by
(2.8) 
whose variance is
(2.9) 
Notice that when , and we recover the standard Monte Carlo method. In order to quantify the efficiency of the Monte Carlo method, we introduce the relative error ipdupuis ; iperic
(2.10) 
The advantage of introducing the control force is that we may choose to reduce the relative error of the estimator (2.8). From (2.6) and (2.9), we can see that minimizing the relative error of the new estimator is equivalent to choosing such that
(2.11) 
is as close as possible to .
2.2 Dual optimal control problem and estimate of relative error
To proceed, we make use of the following duality relation var_rep1998 :
(2.12) 
where the infimum is over all processes which are progressively measurable with respect to the augmented filtration generated by the Brownian motion. See var_rep1998 for more discussions. It is known that there is a feedback control such that the infimum on the righthand side (RHS) of (2.12) is attained (see (fleming2006, , Sec. VI, Thm. 3.1)). We will call the optimal control force. Accordingly we define to be the respective quantities in (2.3) and (2.4) with replaced by , and we denote as the solution of (2.7) with control force . Using Jensen’s inequality one can show that (2.12) implies
(2.13) 
Combining the above equality with (2.9), it follows that the change of measure induced by is optimal in the sense that the variance of the importance sampling estimator (2.8) vanishes.
It is helpful to note that the RHS of (2.12) has an interpretation as the value function of the stochastic control problem:
(2.14) 
From the dynamic programming principle fleming2006 , we know that satisfies the following HamiltonJacobiBellman (HJB) or dynamic programming equation:
(2.15) 
The latter implies that the optimal control force is of feedback form and satisfies
(2.16) 
Now we estimate (2.11) and thus the relative error (2.10) for a general control . To this end we suppose that the probability measures and are mutually equivalent. Then, using (2.13), we can conclude that
(2.17) 
and therefore
(2.18)  
where by Girsanov’s theorem (2.4), we have
(2.19) 
In order to simplify (2.18), we follow ipdupuismultiscale and introduce another control force and change the measure again. Specifically, we choose and define as in (2.3)–(2.4), with being replaced by . If we now let denote the expectation with respect to then, using equations (2.18) and (2.19), we obtain
(2.20) 
Roughly speaking, the last equation indicates that the relative error
(2.10)
of the importance sampling estimator associated to a general control depends on
the difference between control and the optimal control . This
relation will be further used in Section 5 to prove the upper bound for
the relative error of importance sampling estimator.
3 Importance sampling of multiscale diffusions
Our main result in this paper concerns dynamics with two time scales. Specifically, we consider the case when the state variable can be split into a slow variable and a fast variable , i.e. , and we assume that (2.2) is of the form
(3.1) 
where , are smooth vector fields, , are smooth noise coefficients and , are independent Wiener processes with . The parameter describes the timescale separation between processes and .
Let be given and suppose that the fast subsystem
(3.2) 
is ergodic with a unique invariant measure whose density with respect to Lebesgue measure is denoted by (see Appendix B for more details). Then it is well known that when , under some mild conditions on the coefficients, the slow component of (3.1) converges in probability to the averaged dynamics freidlin2012random ; khasminskii ; book_PS08 ; liu2010
(3.3) 
where for every , we have
(3.4) 
Further define
(3.5) 
and consider the averaged value function
(3.6) 
where is the solution of
(3.7) 
The idea of using suboptimal controls for importance sampling of multiscale systems such as (3.1) is to use the solution of the limiting control problem (3.6)–(3.7) to construct an asymptotically optimal control of the form
(3.8) 
for the full system. Comparing (3.8) to the optimal control force (2.16), this means that we construct the control for the slow variable by using the averaged value function in (3.6) and leave the fast variable uncontrolled. Notice that control (3.8) has also been suggested in ipkostas1 for more general dynamics with a general subsolution of the HJB equation.
Remark 1
Another variant of a suboptimal control would be
(3.9) 
where the component is the optimal control of the averaged system (3.6)–(3.7). The advantage of using (3.9) rather than (3.8) is that the fast variables do not need to be explicitly known or observable in order to control the system. In the following we will assume that is independent of , in which case (3.8) and (3.9) coincide (see Assumption 3).
3.1 Main result
Our main assumptions are as follows.
Assumption 1
are functions, with derivatives that are uniformly bounded by a constant . and are bounded. Furthermore, there exist constant , such that
.
Assumption 2
, such that , we have
(3.10) 
where denotes the Frobenius norm.
Assumption 3
and do not depend on .
Remark 2

Assumption 1 implies the coefficients are Lipschitz functions. In particular, it holds that , (similarly for the other coefficients).

Assumption 2 guarantees that the fast dynamics is exponentially mixing. As we study the asymptotic solution of (3.1) as at fixed noise intensity, the inverse temperature can be absorbed into the coefficients , and . In Section 5, we will therefore assume , in which case Assumption 2 implies that
(3.11) where is an matrix with components
(3.12) Combining this with Assumption 1, we have
(3.13) The constant in (3.11) is not optimal, but it will simplify matters later on.
Now we are ready to state our main result, whose proof will be given in Section 5.
3.2 Formal expansion by asymptotic analysis
The proof of Theorem 3.1 in Section 5 is relatively long and technical, which is why we shall give a formal derivation of (3.8) first. The idea is to identify the suboptimal control as the leading term of the optimal control using formal asymptotic expansions asymptotic_analysis ; book_PS08 . To this end, let denote the solution of (2.15), for which we seek an asymptotic expansion in powers of . Further let . From the dual relation (2.12), we know that is the expectation (2.1) we want to compute. By the FeynmanKac formula, we have
(3.14) 
where is the infinitesimal generator of process (3.1), with
(3.15) 
Now consider the expansion of in powers of . Plugging it into (3.14) and comparing different powers of , we obtain :
(3.16)  
(3.17) 
By the assumption that the fast dynamics (3.2) are ergodic for every with unique invariant density , it follows that is the unique solution to the linear equation with . Here is the adjoint operator of with respect to the standard scalar product in the space . Hence we can conclude from (3.17) that is independent of . Integrating both sides of (3.16) against , we obtain a closed equation for :
(3.18) 
with
(3.19) 
Notice that is the infinitesimal generator of the averaged dynamics (3.3). Again by the FeynmanKac formula, the solution to (3.18) is recognized as the conditional expectation
(3.20) 
of the averaged path functional over all realizations of the averaged dynamics (3.3) starting at . Recalling , it follows that has the expansion
(3.21) 
Combining (3.21) with (3.20) and the dual relation (2.12), we conclude that in (3.6) satisfies and is the leading term of in expansion (3.21). Finding the corresponding expression for the optimal control is now straightforward: Setting , the relation (2.16) between the optimal feedback control and the value function yields
(3.22) 
where all functions are evaluated at .
The last equation shows that (3.8) appears to be the leading term of the optimal control force as . Reiterating the argument given in Section 2, we expect (3.8) to be a reasonably good approximation of the exact control force that gives rise to sufficiently accurate importance sampling estimators of (2.1) in the asymptotic regime .
As for the corresponding numerical algorithm, our derivations suggest that one possible strategy for finding good control forces for importance sampling is to first compute from (3.6) or (3.20), which corresponds to a lowdimensional stochastic optimal control problem, and then to construct the control force as in (3.8) to perform importance sampling. The numerical strategy will be discussed in Section 4, along with some details regarding the numerical implementation.
Remark 3
A closely related variant of the slowfast dynamics (3.1) is homogenization problems that exhibit more than two time scales book_PS08 . Although a rigorous treatment of multiscale diffusions with three or more time scales is beyond the scope of this work, we stress that the formal asymptotic argument carries over directly. See ipdupuismultiscale ; ipkostas1 ; ipkostas3 for large deviations and importance sampling studies of related dynamics.
4 Numerical example
In this section, we study a numerical example and discuss some algorithmic issues related to the calculation of the suboptimal control force (3.8) as proposed in Section 3. The dynamics we considered here is described by the twodimensional SDE
(4.1) 
where , is a twodimensional Wiener process and . The potential is defined as
(4.2) 
with if , and otherwise. The function is a smooth bistable potential that has two “wells” centered around and . As in (2.1), we aim at computing the expectation
(4.3) 
where
(4.4) 
with parameter . The graphs of the functions and are shown in Figure 1. Notice that the auxiliary function is introduced in (4.2) and (4.4) in order to guarantee that Assumption 13 of Theorem 3.1 in Section 3 are satisfied. More discussions on these assumptions can be found in the section of Introduction and Conclusions.
Using the specific form of potential , we can explicitly compute the invariant measure of the fast dynamics in (4.1), which for each fixed has the Lebesgue density
(4.5) 
Recalling the discussion in Section 3, especially (3.3) and (3.4), we conclude that the averaged dynamics is a onedimensional diffusion in a double well potential
(4.6) 
where the potential is given in (4.2) and is a onedimensional Wiener process.
Before we proceed, we shall briefly discuss the potential difficulties to
compute (4.3) with the standard Monte Carlo method, which is
mainly due to the inherent metastability of the system, even for moderate
values of . To this end, notice that, in the path space, the exponential integrand in (4.3)
is peaked around trajectories which spend a large portion of time at the minimum
of , which is located around (Figure 1).
But in order to get close to the state , trajectories starting from need to cross the energy barrier
of (Figure 1).
The probability of these barriercrossing trajectories is roughly of order when is large. Combining these facts, we
expect that the rare barrier crossing events play an important role for
computing (4.3). And standard Monte Carlo method will be inefficient in such a situation due to
insufficient sampling of these rare events (cf. the discussion in Section 1).
Computation of the suboptimal estimator based on the averaged equation. Now let us consider the method outlined in Subsection 3.1. In accordance with (3.18), the conditional expectation solves the linear backward evolution equation
(4.7) 
with
(4.8) 
The equation for is onedimensional (in space), and can be solved by standard gridbased method. For instance, using Rothe’s method, we can first discretize (4.7) in time, which yields
(4.9) 
where denotes the approximation of at time , with time step size . Equation (4.9) is then further discretized in space using the structurepreserving finite volume method described in Latorre2010 . Starting from , we can obtain all for by solving (4.9) backwardly.
After obtaining , we can compute the feedback control force (3.8) as
(4.10) 
when system’s state is at at time . Plugging the last expression into (4.1) then yields the controlled dynamics (also see (2.7))
(4.11) 
which will be employed to sample (4.3) using the reweighted estimator (2.8).
Numerical results. Now we turn to the numerical results. Table 1 shows the numerical results of the Monte Carlo method with the above importance sampling strategy, i.e. (4.11), which should be compared to Table 2 that shows the result of standard Monte Carlo method. For both the weighted and unweighted estimates, the sample size was set to trajectories of length with time step that is chosen small enough to remove discretization bias. The control (4.10) was obtained by computing from (4.9) on a grid of size . For comparison, we have computed a reference importance sampling MonteCarlo solution (“exact” mean value) based on independent realizations that is displayed in Table 1 in the column with label “”. The performance of the Monte Carlo methods can be evaluated based on the variance (2.6) and the relative error (2.10). In our numerical study, they are estimated from the sampled trajectories as
(4.12) 
where is the th trajectories, , is the estimator (2.8) of , and denotes the control force. See Section 2 for details. Furthermore, in order to illustrate the actual effect of the control force, we monitor the barrier crossing events with for some and let record the ratio of trajectories which cross the barrier among all the trajectories.
In Table 1, for different values of , we can see that the relative error of the importance sampling estimator becomes smaller as decreases from to . This indicates that the importance sampling estimator performs better and better when deceases and therefore is accordance with the conclusion of Theorem 3.1 in Section 3.
It is also worth making a comparison of both the importance sampling estimator and the standard Monte Carlo estimator. For the importance sampling estimator (Table 1), we observe that both the mean values and the variances, estimated with trajectories, are stable after we ran several times and are close to the results estimated with trajectories, which we take as the “exact” mean value. For the standard Monte Carlo method (Table 2), at , while it gives acceptable mean values, the sample variances (and the relative errors) are larger compared to the importance sampling estimator. For , the results of standard Monte Carlo method drift away from the “exact” mean values and show a significant bias. These results indicate that the standard Monte Carlo method is inefficient or useless in this situation.
The above results can be better understood if we record the barriercrossing events during time . These events are related to the metastability of the system and become rare for and . In the “” column of Table 2, we see that very few trajectories can cross the energy barrier when , and it becomes even rarer when is further increased to , at which no barriercrossing trajectories are sampled with trajectories. This observation reveals the fact that crossing the energy barrier is a rare event (in the uncontrolled system) due to system’s metastability at moderate temperature. And it also explains why the estimations of the mean values are largely underestimated by the standard Monte Carlo method (compare Table 1 and Table 2). On the other hand, as shown in “” column of Table 1, the barriercrossing events are much better sampled by the importance sampling estimator. Figure 2 shows the control force (4.10) as a function of and time for various values of . We clearly observe that the control acts against the energy barrier (blue region) and assists the slow variable of the system to transit from to .
We conclude this section with a couple of comments on numerical issues.
Remark 4

It is necessary to solve the averaged equation (3.6) for , or equivalently (3.18) for , in order to compute control (3.8). Solving from (3.18) may be relatively easy because the equation is linear. Furthermore, since equation (3.18) doesn’t involve the small parameter any more, it can be solved on a coarser grid and the numerical computation is not expensive.

In our example, the probability density can be solved analytically and used to obtain averaged dynamics (3.3) or (4.6). In general, the coefficients (3.4) of the averaged dynamics (3.3) could be numerically computed from the time integration of the fast subsystem (3.2). See Chapter  of book_PS08 and also analysis_SDE for more details.

In principle, the method described above for solving linear PDE (4.7) is computationally applicable when the dimension of system’s slow variables is smaller or equal to . In certain cases, however, the slow dynamics may still be higher dimensional, and alternatives to the direct numerical discretization are needed. We refer to the Conclusions for further discussions of this issue.