Bounding stationary averages of polynomial diffusions via semidefinite programming

# Bounding stationary averages of polynomial diffusions via semidefinite programming

## Abstract

We introduce an algorithm based on semidefinite programming that yields increasing (resp. decreasing) sequences of lower (resp. upper) bounds on polynomial stationary averages of diffusions with polynomial drift vector and diffusion coefficients. The bounds are obtained by optimising an objective, determined by the stationary average of interest, over the set of real vectors defined by certain linear equalities and semidefinite inequalities which are satisfied by the moments of any stationary measure of the diffusion. We exemplify the use of the approach through several applications: a Bayesian inference problem; the computation of Lyapunov exponents of linear ordinary differential equations perturbed by multiplicative white noise; and a reliability problem from structural mechanics. Additionally, we prove that the bounds converge to the infimum and supremum of the set of stationary averages for certain SDEs associated with the computation of the Lyapunov exponents, and we provide numerical evidence of convergence in more general settings.

## 1Introduction.

Stochastic differential equations (SDEs) and the diffusion processes they generate are important modelling tools in numerous scientific fields, such as chemistry, economics, physics, biology, finance, and epidemiology [22]. Stationary measures are to SDEs what fixed points are to deterministic systems: if the SDE is stable, then its stationary measures determine its long-term statistics. More concretely, both the ensemble averages and the time averages of the process converge to averages with respect to a stationary measure (which we call stationary averages). For the majority of SDEs, there are no known analytical expressions for their stationary measures. Consequently, large efforts have been directed at developing computational tools that estimate stationary averages and, more generally, at developing tools that can be used to study the long-term behaviour of SDEs. Among these, most prominent are Monte Carlo discretisation schemes, PDE methods (finite-difference, finite-element, Galerkin methods), path integral methods, moment closure methods, and linearisation techniques (see, e.g., [39]).

The purpose of this paper is to introduce a new algorithm that provides an alternative approach to the analysis of stationary measures. It uses semidefinite programming to compute hard bounds on polynomial stationary averages of polynomial diffusions. Our approach has distinct advantages: (i) it returns monotone sequences of both upper and lower bounds on the stationary averages, hence quantifying precisely the uncertainty in our knowledge of the stationary averages; (ii) no assumptions are required on the uniqueness of the stationary measures of the SDE; and (iii) the availability of high quality SDP solvers and of the modelling package GloptiPoly 3 drastically reduces the investment of effort and specialised knowledge required to implement the algorithm for the analysis of a given diffusion process.

### 1.1Problem definition.

We consider -valued diffusion processes satisfying stochastic differential equations of the form

where the entries of the drift vector and the diffusion coefficients are polynomials. In the above, is a standard -valued Brownian motion and the initial condition is a Borel measurable random variable. We assume that the underlying filtered space satisfies the usual conditions.

A Borel probability measure on is a stationary (or invariant) measure of the dynamics if

where we use the notation to mean that the random variable has law . The set of stationary measures of is denoted .

The problem we address here is how to estimate stationary averages of the form

in a systematic, computationally efficient manner. We present an algorithm that yields bounds on averages when is a polynomial. Hence, our algorithm can be used to bound the moments of the stationary measures of .

More precisely, the algorithm returns lower and upper bounds on the set

where is a given real algebraic variety

for given polynomials . The case corresponds to .

Stationary averages of the form and the set are of broad interest: if enjoys some mild stability and regularity properties [32], the stationary averages provide succinct, quantitative information about the long-term behaviour of . In particular, for almost every sample path (that is, -almost every ), there exists a such that

see Theorem ? for a formal statement. Furthermore, for appropriately chosen , the set is the set of limits where is any sample path contained in :

where is the subset of samples such that belongs to for all , see [32]. Similar considerations also apply to the ensemble averages (see the remark after Theorem ?). Therefore, bounds on the set equip us with quantitative information on the long-term behaviour of the paths of that are contained in .

### Remark (Linking the long-term behaviour to the stationary measures):

Although our work is motivated by the study of the long-term behaviour of a diffusion , the scope of this paper is restricted to the problem of bounding the stationary averages and the associated set . It is important to remark that to connect the long-term behaviour of with the set (and the bounds our algorithm returns) it is necessary to establish separately:

• the existence of stationary measures of with support contained in and the finiteness of their moments up to order ;

• the convergence of the time averages, i.e. verify that the limit holds for the diffusion at hand;

• that, for the initial conditions of interest, takes values in .

A straightforward way to verify (c) is to apply Itô’s formula and check that

If the initial condition takes values in , then clearly takes values in as well. Establishing (a) and (b) typically requires additional proofs beyond the scope of this paper, and has been studied extensively elsewhere (see [32]). We point out that whether or not conditions (a)-(c) hold, the algorithm still yields bounds on . Without proving (a)-(c), it may simply be that the set is empty, or that the relationships or do not exist. To make the paper self-contained, we recall briefly in Section 2.2 some simple conditions that we use in our later examples to establish (a) and (b).

### 1.3Brief description of algorithm.

Mathematically, our approach consists of four steps:

1. We derive a finite set of linear equalities satisfied by the moments of any stationary measure of (see Lemma ? and Section 3.1). Such a system of equalities is often underdetermined (see Example ?) and therefore admits infinitely many solutions.

2. To rule out spurious solutions, we exploit the fact that the solutions must be the moments of a probability measure, hence must satisfy extra constraints (e.g. even moments cannot be negative). We use well known results [25] to construct semidefinite inequalities (so called moment constraints) that are satisfied by moments of any probability measure with support contained in . This is done in Section 3.2.

3. Exploiting the fact that is a polynomial, we rewrite the stationary average as a linear combination of the moments of .

4. By maximising (resp. minimising) the linear combination over the set of all vectors of real numbers that satisfy both the linear equalities and the semidefinite inequalities, we obtain an upper (resp. lower) bound on the set .

Computationally, to find the upper (or lower) bound we solve a semidefinite program, a particularly tractable class of convex optimisation problems for which high-quality solvers are freely available online. The semidefinite constraints in (2), popularised by Lasserre and co-authors (see [25] and references therein), can be implemented via the freely-available, user-friendly package GloptiPoly 3 [13], which makes the approach described here accessible to non-experts.

### Remark (Convergence of the algorithm):

Concerning moment approaches, a question that often arises is that of convergence of the algorithm. In the context of this paper, this question takes the following form. Suppose that we want to obtain bounds on the set

where is the degree of . Note that if is compact, then any measure with support on has all moments finite, thus ; hence is the only set of interest. As will become clear later, repeated applications of our algorithm yield both an increasing sequence of lower bounds and a decreasing sequence of upper bounds on . The algorithm is said to converge if these sequences converge to the infimum and supremum, respectively, of . In general, the algorithm presented in this paper is not guaranteed to converge in the above sense. However, our numerics indicate that the algorithm often converges in practice (see Examples ?, ? and ?). In Section 4.2, we do prove convergence for SDEs related to the Lyapunov exponents of linear ordinary differential equations (ODEs) perturbed by multiplicative white noise. The question of convergence is of theoretical interest, but regardless of its answer, the bounds computed are still valid and are often still appropriate for the application in question (e.g., see the examples in Section 4).

### 1.5Related literature and contributions of the paper.

Computational methods that yield bounds on functionals of Markov processes by combining linear equalities (arising from the definitions of the functional and the process) and moment constraints have appeared in the literature. We refer to this class of methods as generalised moment approaches (GMAs). The various GMAs differ in the type of Markov processes and moment constraints they consider. The ideas underlying GMAs were first discussed in [4] where they were used to obtain analytical bounds on moments for measure-valued diffusions. The first GMA was presented in E. Schwerer’s PhD thesis [40] in the context of jump processes with bounded generators and reflected Brownian motion on the non-negative orthant. In [15], the authors present a GMA that yields bounds on the moments of stationary measures of discrete-time Feller Markov chains. In [12], analogous techniques are used to bound the moments of the stationary measures of diffusion approximations of the Wright-Fisher model on the unit simplex. GMAs have also been proposed to solve optimal control problems [14], to estimate exit times [10], and to price financial derivatives [27].

The contributions of this paper are as follows. Whereas [40] consider only stationary averages of specific SDEs, we introduce GMAs to the setting of general polynomial diffusions over unbounded domains—this requires setting up the technical background of Lemma Section 2.4. Also in contrast with [40], we employ moment constraints that lead to SDPs instead of linear programs. In Section 4.2, we prove convergence of our algorithm for SDEs related to the Lyapunov exponents of linear ODEs perturbed by multiplicative white noise. To the best of our knowledge, this is the first such result in the setting of stationary averages of diffusion processes. The remaining contributions are the applications of the algorithm to several examples of interest. Section 4.1 explains how the algorithm can be combined with the ideas underlying the Metropolis Adjusted Langevin Algorithm (a Markov chain Monte Carlo algorithm) to carry out numerical integration with respect to certain target measures, and then applies it to a simple Bayesian inference problem. In Section 4.2, we use our algorithm to obtain bounds on the Lyapunov exponents of linear differential equations perturbed by multiplicative white noise, and we show that in this case our approach is both sufficient and necessary (i.e., with enough computation power, it yields lower (upper) bounds arbitrarily close to the minimum (maximum) Lyapunov exponent, see Theorem ?). Finally, in Section 4.3 we explain how the algorithm can be extended to yield bounds on stationary averages where is piecewise polynomial, and we use this extension to tackle a reliability problem from structural mechanics.

## 2Background and preliminaries.

### 2.1Notation

Throughout this paper we use the following notation:

• denotes expectation with respect to the underlying probability measure , and we use and interchangeably.

• Given a function , denotes its partial derivative with respect to its argument; ; denotes its gradient vector; and denotes its Hessian matrix.

• Suppose that is a smooth manifold. denotes the set of real-valued, twice continuously differentiable functions on .

• For any two matrices ,

denotes the trace inner product of and , and denotes the Frobenius norm of .

• Let be the set of -tuples of natural numbers . Let be the subset of -tuples such that . The cardinality of is . We define the sum of two tuples to be the tuple .

• The space of real-valued vectors indexed by , , is isomorphic to and we make no distinction between them. Similarly for the space of real-valued matrices indexed by , , and . With this in mind, we denote the standard inner product on as

and the standard outer product on as

• Let and . Monomials are denoted as , and denotes the vector of monomials of degree or less. Hence the -th component of the -dimensional vector is given by

Let denote the vector space of real polynomials on of degree at most . The set is a basis (known as the canonical or monomial basis) of , and so we can write any polynomial as

where in is the vector of coefficients of . We denote the degree of any polynomial with .

• Let be a Borel measure on . A vector in is the vector of moments of order up to of if, for any , the -th component of is given by

assuming that the integrals are well defined.

### 2.2Stability and regularity properties of X.

We now briefly review well known properties of and a relevant drift condition used in the examples below. Throughout this section we assume that is an -dimensional smooth submanifold of . For this to be the case, it is sufficient that the vectors form a linearly independent set, for each in , where the ’s are defined in .

Firstly, smoothness of the components of and imply that has a unique strong solution , which is defined up to a stopping time [47]. The generator (or Kolmogorov operator) associated with is the second order differential operator

for , , and where denotes the diffusion matrix of . It is well known that if is a hypoelliptic operator on (see [16]), then generates a strong Feller Markov process. This is the regularity property of we use in our examples below. Although this condition can be replaced with weaker ones (e.g., being a T-process [32]), such alternative conditions usually require more work to establish in practice. The stability properties we require are summarised as follows:

In the case , it is easy to argue that the solution is globally defined. In the case , global existence of the solution follows from [33]. The rest follows from Theorems 3.4 and 8.1 in [32], plus Theorem 4.7 in [33] in the case of the drift condition.

### Remark (The ensemble averages):

If, additionally to the premise of Theorem ?, the semigroup generated by is aperiodic [32], then the analogous statements to Theorem ? (iii) and (iv) hold for the ensemble averages . In this case, we have that

where depends on the law of the initial condition , see [32].

### 2.4A relationship between the generator of and its stationary measures.

The following technical lemma is necessary for the development of the algorithm presented in this paper.

An application of Itô’s formula shows that, if , then

is a local martingale. The generator evaluated at function and point describes the rate of change in time of the expected value of conditioned on the event . If is the law of , then describes the rate of change in time of . It follows that if is a stationary measure of , then the law of does not change in time, and thus we would expect that . Unfortunately, for technical reasons, this is not always the case (see [8] for counterexamples). However, it is not difficult to find sufficient conditions on such that . The following lemma gives one such condition specialised for polynomial functions , which are the focus of this paper. For a proof of the lemma see Appendix A.

## 3The algorithm.

Our algorithm constructs a tractable outer approximation of the set defined by . As shown below, the approximation we derive is the image of a spectrahedron (a set defined by linear equalities and semidefinite inequalities) through a linear functional. Finding the infimum and supremum of reduces to solving two SDPs, which can be efficiently carried out using one of several high-quality solvers freely available. Since , the computed infimum (supremum) is a lower (upper) bound of .

To introduce our method, we first take a closer look at the set from two alternative perspectives. First, note that the set defined in is the image of the set

through the linear functional (on the vector space of signed measures) . It is straightforward to check that, as a subset of the vector space of signed measures, is convex. Consequently, its image is a (possibly unbounded) interval, which is fully described by its supremum and infimum (leaving aside whether or not contains its endpoints).

Alternatively, since is a polynomial (with vector of coefficients ), the set is the image of the set

through the linear functional (on ) We now see some concrete examples of these sets.

In the above example, we could obtain the sets , and their projections directly from . However, this is difficult in general. Indeed, results from real algebraic geometry show that optimising over the cone of vectors whose components are moments of a measure is an NP-hard problem [25]. We believe that, except for trivial cases, the same holds for which is a subset of this cone. Instead, we construct here a spectrahedral outer approximation of the set . Optimising over consists of solving an SDP, a polynomial-time problem. Explicitly, is a subset of defined by linear equalities and semidefinite inequalities such that . Because the outer approximation is a convex set, its image through the linear functional is an interval that contains :

Hence our task is reduced to obtaining the outer approximation . We do this in two steps:

• In Section 3.1, we use Lemma ? to construct a set of linear equalities satisfied by the moments of any stationary measure of .

• In Section 3.2, we construct a set of linear equalities and a semidefinite inequality satisfied by the moments of any unsigned measure with support on .

The outer approximation then consists of the set of vectors in that satisfy both of the above.

### 3.1Linear equalities satisfied by the moments of stationary measures.

By assumption, both the drift vector and the diffusion coefficients in are polynomials. Therefore, if is a polynomial, is also a polynomial. Suppose that belongs to and choose any measure in that has as its vector of moments of order . From Lemma ?, if , then

Since spans , checking that satisfies the above is equivalent to checking that

In words, every vector in satisfies the linear equalities defined by .

At this point, it is worth remarking that the conditions on and in Lemma Section 2.4 are only sufficient but not necessary for to hold, as the following example shows.

For most SDEs, together with the equations defines a set of underdetermined linear equations as the following example illustrates.

### 3.2Moment conditions.

That the moment equations are underdetermined in the above example is essentially the same issue that moment closure methods attempt to address (see [39] for a review). These methods ‘close’ the equations by heuristically removing the dependence of the first few equations on higher moments. We do not follow this approach here. Instead, we overcome this issue by exploiting the fact that we are not interested in all the solutions to the equations , but only in those that are the vector of moments of a probability measure with support contained in . There are various tractable conditions, known as moment conditions, which are satisfied by the vector of moments of any measure with support contained in , but not in general by an arbitrary vector of real numbers. For example, trivially, the even moments of an unsigned Borel measure on must be non-negative. We now describe in more detail the moment conditions we use in the rest of this paper.

Let be a measure with vector of moments of order , and let be a polynomial of degree . If has support contained in , then

By the definition , is zero everywhere in for all . Thus for every and . Since spans , checking that satisfies for any given is equivalent to checking that satisfies the following equations:

To these linear equations, we append a semidefinite inequality that stems from the fact that probability measures are unsigned. Let be any polynomial of degree . Then it follows that

Since is a non-negative function we have that

where the moment matrix

denotes the element-wise integration of the matrix . Note that the entries of the moment matrix are a function of the moments of :

Since holds for all , is positive semidefinite:

We then combine , , and the normalisation to obtain our outer approximation:

From , it follows that the projection contains . Therefore, we have the following bounds:

Computing and can be efficiently done by solving two SDPs with variables each.

For many SDEs, like those in Examples ? and ?, all stationary measures have moments of some order . If is compact, ; otherwise, such a can be found by verifying a drift condition like the one in Condition ?. In such cases, for all . However, as Example ? shows, this does not hold in general for our outer approximations. Instead, we only have that for all . Therefore are monotone non-decreasing (resp. non-increasing) sequences of lower (resp. upper) bounds on . In practice, the best bounds are obtained by solving for the infimum/supremum of for the largest that can be handled computationally by the solver.

## 4Applications.

We now consider three applications of the algorithm. To compute the bounds presented in this section we used the modelling package GloptiPoly 3 [13] to construct the SDPs corresponding to the outer approximations and the solver SDPT3 [46] to solve the SDPs. All computations were carried out on a desktop computer with a 3.4 GHz processor and 16GB of memory running Ubuntu 14.04.

### 4.1Langevin diffusions, numerical integration, and an inference problem.

The Metropolis Adjusted Langevin Algorithm (MALA) [37] is a popular Markov chain Monte Carlo algorithm (MCMC). MALA can be used to estimate integrals with respect to measures of the form

where denotes the Lebesgue measure on , is a smooth confining potential, and is the normalising constant It is well known that is the unique stationary measure of the Langevin diffusion

The SDE has globally defined solutions and, regardless of the initial condition, the limit holds with for all -integrable functions [37]. MALA proceeds by discretising , adding a Metropolis accept-reject step to preserve stationarity of , and simulating the resulting chain. The time averages of the simulation then converge to the desired average [37].

Since is the unique stationary measure of , we can use our algorithm to directly compute bounds on when both and are polynomials, circumventing any discretisation or simulation. We illustrate this idea with the following simple Bayesian inference problem.

It is important to remark that Example ? can be solved equally well with MCMC methods—indeed, MCMC algorithms scale better than ours with the dimension of the target measure . However, our alternative approach presents some attractive features: it is fast (see caption of Fig. ?) and simple to implement; no tuning of the algorithm is required (e.g., choosing the discretisation step size in MALA); and it delivers deterministic bounds on the integrals of interest instead of stochastic estimates (hence avoiding issues concerning the convergence of MCMC simulations). Our algorithm can also provide useful information in situations where MCMC methods face difficulties; in particular when the target distribution has several isolated modes. In such cases, MCMC algorithms get stuck at one of the modes and do not explore the rest of the target distribution. Our numerical observations suggest that our algorithm is also affected by the presence of isolated modes, producing a large gap between the upper and lower bounds for the desired integrals (since each bound is stuck at a different mode). The presence of such large gaps can alert the practitioner to the existence of isolated modes, something which is often not obvious for target distributions of dimension three or more. Methods designed to deal with isolated modes, such as simulated-tempering [3], can then be used instead of standard a Metropolis-Hastings MCMC method.

### 4.2Lyapunov exponents of linear SDEs driven by multiplicative white noise.

In practical applications involving systems of linear ordinary differential equations (ODEs), we are interested in situations where the parameters are perturbed by Gaussian white noise. In those cases, we obtain the following class of linear stochastic differential equations

where , and are independent standard Brownian motions on . It is well known that has globally defined solutions. Furthermore, there exists a jointly continuous process such that is the unique solution of (see [17]).

In 1967, Khas’minskii [21] made the following observation. Applying Itô’s formula twice, he found that the projection of onto the unit sphere, , satisfies the SDE

where and . In addition, satisfies

where It is not difficult to argue [20] that

Suppose that the generator of is hypoelliptic on , where denotes the unit sphere in . Since , by definition, takes values in , Theorem ? tells us that has at least one stationary measure, and together with and that for -almost every sample path there exists a stationary measure of such that

Typically, the above integral is estimated by choosing an appropriate discretisation scheme for , simulating the resulting chain, and computing the corresponding time average [44]. We instead exploit the fact that and are all polynomials and apply our algorithm on to compute bounds for . In particular, for any we have that:

where and are as in with notation adapted to . Note that

which implies that

i.e., the equilibrium solution of is almost surely asymptotically stable if , and almost surely asymptotically unstable if . Therefore, our algorithm applied to yields a sufficient test for the asymptotic stability or instability of . The method also yields a necessary test for asymptotic stability in the following sense:

Let denote the generator of and not of . The theorem is proved only for the lower bounds . The proof for the upper bounds is identical. The proof proceeds in three steps:

1. We show that the set of limits is the same as the set

2. We show that the equalities fully characterise the stationary measures of , in the sense that if is a Borel probability measure with support contained in such that

then is a stationary measure of .

3. We then only have to apply Theorem 4.3 in [25], which shows that, for large enough , , and

Then fully characterises the stationary measures of , which implies that

For the detailed proof, see Appendix B.

In Example ?, we could have evaluated numerically instead of computing . However, such analytical expressions for stationary measures of are not available in higher dimensions. As explained in [1] (see also [48]): “The direct use of Khas’minskii’s method to higher dimensional systems has not met with much success because of the difficulty of studying diffusion processes occurring on surfaces of unit hyperspheres in higher dimensional Euclidean spaces”. Our approach complements Khas’minskii’s procedure: it is simple to implement; the number of stationary measures of is not a limitation; and the fact that the measures have support on the unit sphere is an advantage instead of a disadvantage, as Theorem ? shows.

### 4.3Piecewise polynomial averages, a noisy nonlinear oscillator and structural reliability problems.

Our algorithm can be extended to bound stationary averages where is a piecewise polynomial of the type

Here are polynomials, is the indicator function of set , and are disjoint sets in defined by

This type of extension was first discussed in [24], and has been considered in [25]. Many stationary averages of interest can be written in the above form. For instance, if , then is the probability that event occurs.

#### Extension of the algorithm.

We follow here similar arguments to those presented in [25]. Let be a stationary measure of with support in and with moments of order . Let be the restriction of to set , and let denote the restriction to , i.e., to the complement in of the union of all s. Clearly,

Let denote the vectors of moments of , respectively. From the definition it follows that

assuming that . The normalisation condition takes the form:

Similarly to , the proof of Lemma ? tells us that the stationarity of implies that

The vectors of moments also fulfil similar conditions to and , in particular:

Furthermore, the definition of the sets leads to two additional sets of conditions. Firstly, the vectors of moments fulfil

Secondly, using similar arguments as those leading to , we obtain additional moment conditions. Let be a polynomial that is non-negative on and is any polynomial of degree . Then we have

Since has support in

where is the shift operator and the localising matrix

is the element-wise integration of the matrix . Since holds for all , the localising matrix is positive semidefinite. From the definition , is nonnegative on , hence we obtain our additional set of moment conditions:

Together, – and provide computationally tractable necessary conditions satisfied by the moments of any measure with support on . Thus we have the following outer approximation of :

The extended algorithm obtains bounds on by finding the infimum and supremum of ; that is, by solving two SDPs with times as many variables as in our original algorithm.

### Remark (Support of π0).

Notice that does not include conditions on related to the support of . The reason why is that, in general, has no simple description of the form . If such a description exists, extra constraints can easily be appended. If such constraints are lacking, there is an unfortunate consequence: always contains the origin, since satisfies all the constraints in . Consequently, our extended algorithm only yields informative bounds if is nonnegative (or nonpositive) and, in this case, only the upper (resp. lower) bound will be informative. However, for certain purposes this may be sufficient, as the following example demonstrates.

## 5Concluding remarks.

In this paper, we have introduced an algorithm based on semidefinite programming that yields upper and lower bounds on stationary averages of SDEs with polynomial drift and diffusion coefficients. The motivation behind our work is the study of long-term behaviour of such SDEs. As explained in the introduction, additional work is required to link the bounds obtained by our algorithm with this long-term behaviour. Typically, a drift condition must be verified by finding an appropriate Lyapunov function [33]. For polynomial drift vectors and diffusion matrices, one can also employ semidefinite programming to search for these Lyapunov functions (see, sum of squares programming approaches [35]). In many respects, these approaches are dual to the method we describe in this paper, see [28] for more on the connections.

Our algorithm is also applicable to SDEs whose diffusion coefficients are not polynomial, but whose diffusion matrix is polynomial (e.g., the Cox-Ingersoll-Ross interest rate model in financial mathematics). We have concentrated on polynomial diffusion coefficients for simplicity, in order to guarantee uniqueness of solutions. Furthermore, our algorithm can be extended to SDEs with rational drift vector and diffusion matrix—one must just carefully choose polynomials such that is still a polynomial.

Our choice of moment constraints was motivated by the convenience of use of the modelling package GloptiPoly 3. However, there is a wide selection of moment conditions, some of which lead to easier conic programs (e.g., linear programs or second-order cone programs) [25]. We also restricted ourselves to stationary measures with supports contained in algebraic varieties. We did this to simplify the exposition and because the applications chosen did not require more generality. However, from Section 4.3 it is clear that a similar algorithm can be constructed for measures with supports in so called basic semialgebraic sets of the form [25]. Such an approach could be advantageous for Example ?—using Stroock Varadhan’s Support Theorem it is not difficult to deduce that any stationary measure of those SDEs must have support on the nonnegative semiaxis .

Lastly, a practical issue relating to numerical aspects of our algorithm, and of GMAs in general. In contrast with other approaches, our algorithm runs in polynomial time—its computational complexity follows from that of primal-dual interior point algorithms [23]. However, in our experience, the applicability of the algorithm is affected by certain numerical issues. Specifically, the SDPs that arise from moment problems can be badly conditioned, causing the solvers to perform badly for medium to large problems. We believe that this is a consequence of using raw moments as the basis of the space of moment vectors, as is done in most GMAs. Indeed, the order of magnitude of the moments of a distribution varies rapidly (e.g., see in Example ?). Since the feasible set over which we optimise contains such a vector of moments and these sets are often compact [25], we expect large discrepancies in the order of magnitude of the entries of our feasible points and of the moment matrix . This can lead to a bad condition number of and consequently to poor performance of the solver. Improvements could be achieved by using an orthonormal basis with respect to the measures of interest, but this is not easy in practice; not only do we usually have little a priori information about the measures to guide our choice of basis, but the necessary modifications of the algorithmic implementation are substantial and the package GloptiPoly 3 could not be used in its current form. In our experience, a simple way to mitigate the bad conditioning is to scale the entries of the vectors by , where . This is equivalent to scaling by the moments of a Dirac measure at , so should be chosen so that the entry is close to the absolute value of the component of the mean of the measure of interest. A similar scaling was employed in [10]. It is then straightforward to show that satisfies the same semidefinite inequalities as , and all that is left to do is to rewrite the equality constraints in terms of the rescaled variables .

### Acknowledgements:

We thank Nikolas Kantas and Philipp Thomas for many helpful discussions.

## AProof of Lemma

Let be a solution of whose initial condition has law . Suppose that (defined in ) is a martingale and not just a local martingale. Then

for any . Since is a stationary measure of , and so the above implies

Using Tonelli’s Theorem and stationarity we have