
# On hard quadrature problems for marginal distributions of SDEs with bounded smooth coefficients

## Abstract.

In recent work of Hairer, Hutzenthaler and Jentzen, see [9], a stochastic differential equation (SDE) with infinitely often differentiable and bounded coefficients was constructed such that the Monte Carlo Euler method for approximating the expected value of the first component of the solution at the final time converges but fails to achieve a mean square error of polynomial rate. In the present paper we show that this type of bad performance for quadrature of SDEs with infinitely often differentiable and bounded coefficients is not a shortcoming of the Euler scheme in particular but can be observed in a worst case sense for every approximation method that is based on finitely many function values of the coefficients of the SDE. Even worse, we show that for every sequence of Monte Carlo methods based on finitely many sequential evaluations of the coefficients and all their partial derivatives and for every arbitrarily slow convergence speed there exists a sequence of SDEs with infinitely often differentiable coefficients bounded by one such that the first order derivatives of all diffusion coefficients are bounded by one as well, the first order derivatives of all drift coefficients are uniformly dominated by a single real-valued function, and the corresponding sequence of mean absolute errors for approximating the expected value of the first component of the solution at the final time cannot converge to zero faster than the given speed.

## 1. Introduction

Let $d,m\in\mathbb{N}$ and consider a $d$-dimensional system of autonomous SDEs

 (1) $dX_{a,b}(t) = a(X_{a,b}(t))\,dt + b(X_{a,b}(t))\,dW(t),\quad t\in[0,1],\qquad X_{a,b}(0)=0,$

with an $m$-dimensional Brownian motion $W$ and infinitely often differentiable, bounded coefficients $a\colon\mathbb{R}^d\to\mathbb{R}^d$ and $b\colon\mathbb{R}^d\to\mathbb{R}^{d\times m}$. In particular, there exists a unique strong solution $X_{a,b}$ of (1), see, e.g., Theorem 3.1.1 in [24], and $X_{a,b}(t)$ has finite absolute moments of every order for every $t\in[0,1]$. Let $f\colon\mathbb{R}^d\to\mathbb{R}$ be a measurable function that satisfies a polynomial growth condition. We study the computational task to approximate the quantity

 $S(a,b,f)=\mathbb{E}[f(X_{a,b}(1))]$

by means of an algorithm that uses function values of $a$, $b$ and $f$ and, possibly, their partial derivatives $D^\alpha a$, $D^\alpha b$, $D^\alpha f$ at finitely many points in $\mathbb{R}^d$.

A classical method of this type is the Monte Carlo quadrature rule based on $n$ repetitions of the Euler scheme with time step $1/n$, i.e.,

 $\hat S^{E}_n(a,b,f)=\frac1n\sum_{i=1}^n f\bigl(Y^{a,b}_i\bigr),$

where $Y^{a,b}_1,\dots,Y^{a,b}_n$ are independent and identically distributed as $\hat X_{a,b}(1)$ and the scheme $\hat X_{a,b}$ is recursively defined by $\hat X_{a,b}(0)=0$ and

 $\hat X_{a,b}\bigl(\tfrac{\ell}{n}\bigr)=\hat X_{a,b}\bigl(\tfrac{\ell-1}{n}\bigr)+a\bigl(\hat X_{a,b}\bigl(\tfrac{\ell-1}{n}\bigr)\bigr)\cdot\tfrac1n+b\bigl(\hat X_{a,b}\bigl(\tfrac{\ell-1}{n}\bigr)\bigr)\cdot\bigl(W\bigl(\tfrac{\ell}{n}\bigr)-W\bigl(\tfrac{\ell-1}{n}\bigr)\bigr)$

for $\ell=1,\dots,n$. Then it is easy to see that

 $\mathbb{E}\bigl[|S(a,b,f)-\hat S^{E}_n(a,b,f)|^2\bigr]\le c_1\cdot\|f\|^2_{\mathrm{Lip}}\cdot\exp\Bigl(c_2\cdot\max_{\alpha\in\mathbb{N}^d_0,\,|\alpha|_1=1}\bigl(\|D^\alpha a\|^2_\infty+\|D^\alpha b\|^2_\infty\bigr)\Bigr)\cdot\frac1n,$

where $\|f\|_{\mathrm{Lip}}$ denotes the Lipschitz seminorm of $f$, and $c_1$ and $c_2$ are positive reals that only depend on the dimensions $d$ and $m$. Thus, if the first order partial derivatives of the coefficients $a$ and $b$ are also bounded and the integrand $f$ is Lipschitz continuous then the sequence of Monte Carlo Euler approximations achieves a polynomial rate of root mean square error convergence in terms of the total number of evaluations of the coefficients $a$ and $b$ and the integrand $f$.
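For illustration, the Monte Carlo Euler rule above can be sketched in a few lines. The scalar coefficients $a=\sin$, $b=\cos$ and the integrand $f(x)=x$ below are hypothetical placeholders (smooth, bounded, with bounded derivatives), not taken from the paper.

```python
import numpy as np

def mc_euler(a, b, f, n, rng):
    """Monte Carlo Euler rule: average f over n independent Euler paths,
    each computed with time step 1/n on [0, 1] and started at 0."""
    dt = 1.0 / n
    x = np.zeros(n)                                # n copies of X-hat(0) = 0
    for _ in range(n):                             # n Euler steps up to time 1
        dW = rng.normal(0.0, np.sqrt(dt), size=n)  # Brownian increments
        x = x + a(x) * dt + b(x) * dW
    return float(np.mean(f(x)))                    # (1/n) * sum of f(Y_i)

# hypothetical smooth, bounded coefficients with bounded first derivatives
rng = np.random.default_rng(0)
est = mc_euler(np.sin, np.cos, lambda x: x, n=2000, rng=rng)
```

For such coefficients the error bound above applies, so the root mean square error decays like $n^{-1/2}$ in the number of time steps, at a cost of $n^2$ coefficient evaluations.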

On the other hand, Hairer, Hutzenthaler and Jentzen have presented in [9] an equation (1) with infinitely often differentiable, bounded coefficients $a,b$ such that for the integrand $f(x)=x_1$ and every $\kappa\in(0,\infty)$,

 (2) $\lim_{n\to\infty} n^\kappa\cdot\mathbb{E}\bigl[|S(a,b,f)-\hat S^{E}_n(a,b,f)|^2\bigr]=\infty.$

Hence the sequence of Monte Carlo Euler approximations may fail to achieve a polynomial rate of root mean square error convergence in terms of the number of evaluations of the coefficients $a$ and $b$ and the integrand $f$ if the first order partial derivatives of the coefficients are not bounded as well.

It seems natural to ask whether the latter result merely demonstrates a particular deficiency of the Monte Carlo Euler method and whether a polynomial rate of convergence could always be achieved for equations (1) with infinitely often differentiable and bounded coefficients $a,b$ if only a more advanced approximation scheme than the Euler scheme were employed. In fact, there is a variety of strong approximation schemes available in the literature that have been constructed to cope with non-Lipschitz continuous coefficients and have been shown to achieve a polynomial rate of convergence, in terms of the number of time steps, for suitable classes of such equations. See, e.g., [11, 10, 27, 18, 13, 30, 26, 25, 4, 29, 5, 16] for equations with globally monotone coefficients and see, e.g., [3, 8, 7, 1, 20, 12, 14, 6] for equations with possibly non-monotone coefficients.

However, the following result, Theorem 1, which is a straightforward consequence of Corollary 1 in Section 4.1, shows that the pessimistic alternative is true in a worst case sense with respect to the coefficients $a$ and $b$: for every sequence of Monte Carlo methods based on some kind of Itô-Taylor scheme there exists a sequence of equations (1) with infinitely often differentiable coefficients bounded by one such that the resulting sequence of mean absolute errors for approximating the expected value of the first component of the solution at the final time does not converge to zero with a polynomial rate.

To state this finding in a more formal way let

 $I=\bigcup_{k=1}^\infty\{0,1\}^k$

denote the set of all finite sequences of zeros and ones, put $W^0(t)=t$ and $W^1(t)=W(t)$ for $t\in[0,1]$, and for $\beta=(\beta_1,\dots,\beta_k)\in I$, $n\in\mathbb{N}$ and $\ell\in\{1,\dots,n\}$ let

 $J^\beta_{n,\ell}=\int_{(\ell-1)/n}^{\ell/n}\int_{(\ell-1)/n}^{u_k}\dots\int_{(\ell-1)/n}^{u_2}1\,dW^{\beta_1}(u_1)\dots dW^{\beta_k}(u_k)$

denote the corresponding iterated Itô integral over the time interval $[(\ell-1)/n,\ell/n]$.
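For intuition, the iterated integrals $J^\beta_{n,\ell}$ of multiplicity at most two can be sampled exactly from a pair of standard normals. The distributional identities in the sketch below are the standard ones for a single driving Brownian motion, stated here under the convention $W^0(t)=t$, $W^1(t)=W(t)$ with innermost integrator $dW^{\beta_1}$; this snippet is an illustration, not part of the paper's construction.

```python
import numpy as np

def iterated_ito_integrals(n, rng):
    """Sample the iterated Ito integrals J^beta_{n,l} with |beta| <= 2 over a
    single step of length dt = 1/n (their joint law is the same for every l).

    Identities used (one driving Brownian motion):
      J^(0) = dt,            J^(1) = dW,
      J^(0,0) = dt^2 / 2,    J^(1,1) = (dW^2 - dt) / 2,
      J^(1,0) = dZ,          J^(0,1) = dt*dW - dZ,
    where dW = sqrt(dt)*xi and dZ = 0.5*dt**1.5*(xi + eta/sqrt(3))
    for independent standard normals xi, eta.
    """
    dt = 1.0 / n
    xi, eta = rng.standard_normal(2)
    dW = np.sqrt(dt) * xi
    dZ = 0.5 * dt**1.5 * (xi + eta / np.sqrt(3.0))
    return {
        (0,): dt,
        (1,): dW,
        (0, 0): dt**2 / 2.0,
        (1, 1): (dW**2 - dt) / 2.0,
        (1, 0): dZ,
        (0, 1): dt * dW - dZ,
    }

j = iterated_ito_integrals(10, np.random.default_rng(1))
```

The pair $(dW, dZ)$ reproduces the exact joint Gaussian law of $(J^{(1)}, J^{(1,0)})$, and the remaining entries are deterministic functions of it, e.g. $J^{(1,0)}+J^{(0,1)}=J^{(0)}\cdot J^{(1)}$ by integration by parts.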

###### Theorem 1.

Let $d=4$ and $m=1$ and let $\varphi\colon(\mathbb{R}^4\times\mathbb{R}^4)^{\mathbb{N}^4_0}\times\mathbb{R}^I\to\mathbb{R}^4$ be a measurable mapping. For all infinitely often differentiable functions $a,b\colon\mathbb{R}^4\to\mathbb{R}^4$ and every $n\in\mathbb{N}$ define the scheme $\hat X^{a,b}_n$ by $\hat X^{a,b}_n(0)=0$ and

 $\hat X^{a,b}_n\bigl(\tfrac{\ell}{n}\bigr)=\hat X^{a,b}_n\bigl(\tfrac{\ell-1}{n}\bigr)+\varphi\Bigl(\bigl(D^\alpha a\bigl(\hat X^{a,b}_n\bigl(\tfrac{\ell-1}{n}\bigr)\bigr),D^\alpha b\bigl(\hat X^{a,b}_n\bigl(\tfrac{\ell-1}{n}\bigr)\bigr)\bigr)_{\alpha\in\mathbb{N}^4_0},\bigl(J^\beta_{n,\ell}\bigr)_{\beta\in I}\Bigr)$

for $\ell=1,\dots,n$, and let $Y^{a,b}_{n,1},\dots,Y^{a,b}_{n,n}$ be independent and identically distributed as the first component $\hat X^{a,b}_{n,1}(1)$ of $\hat X^{a,b}_n(1)$.

Then there exist sequences $(a_n)_{n\in\mathbb{N}}$ and $(b_n)_{n\in\mathbb{N}}$ of infinitely often differentiable functions $a_n,b_n\colon\mathbb{R}^4\to\mathbb{R}^4$ with $\|a_n\|_\infty\le1$ and $\|b_n\|_\infty\le1$ such that for every $\kappa\in(0,\infty)$,

 $\lim_{n\to\infty} n^\kappa\cdot\mathbb{E}\Bigl[\Bigl|\mathbb{E}\bigl[X^{a_n,b_n}_1(1)\bigr]-\frac1n\sum_{k=1}^n Y^{a_n,b_n}_{n,k}\Bigr|\Bigr]=\infty.$

The latter result neither covers multilevel Monte Carlo schemes nor the case of a non-uniform discretization of time. Moreover, one might argue that a result like Theorem 1 is not surprising since the order one partial derivatives of the chosen coefficients $a_n$ and $b_n$ in the theorem are not required to simultaneously satisfy some kind of growth condition. However, from Corollary 1 in Section 4.1 we even obtain that the coefficients $a_n$ and $b_n$ in Theorem 1 can be chosen in such a way that the order one partial derivatives of $b_n$ are bounded by one as well and the order one partial derivatives of $a_n$ are uniformly dominated by a single real-valued function, and that furthermore the statement of the theorem extends to any sequence of Monte Carlo methods based on sequential evaluation of the coefficients $a$ and $b$ and all their partial derivatives $D^\alpha a$, $D^\alpha b$ at finitely many points in $\mathbb{R}^4$. More formally, we have the following theorem as an immediate consequence of Corollary 1.

###### Theorem 2.

Let $d=4$ and $m=1$ and let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space. For every $n\in\mathbb{N}$ let $\psi_{n,1}\colon\Omega\to\mathbb{R}^4$ as well as

 $\psi_{n,i}\colon\bigl(\mathbb{R}^4\times\mathbb{R}^4\bigr)^{\mathbb{N}^4_0\times\{1,\dots,i-1\}}\times\Omega\to\mathbb{R}^4,\quad i=2,\dots,n,$

and

 $\varphi_n\colon\bigl(\mathbb{R}^4\times\mathbb{R}^4\bigr)^{\mathbb{N}^4_0\times\{1,\dots,n\}}\times\Omega\to\mathbb{R}$

be measurable mappings. For all infinitely often differentiable functions $a,b\colon\mathbb{R}^4\to\mathbb{R}^4$ and for every $n\in\mathbb{N}$ define random variables $Z_{n,1},\dots,Z_{n,n}$ by $Z_{n,1}(\omega)=\bigl((D^\alpha a,D^\alpha b)(\psi_{n,1}(\omega))\bigr)_{\alpha\in\mathbb{N}^4_0}$ and

 $Z_{n,i}(\omega)=\bigl((D^\alpha a,D^\alpha b)\bigl(\psi_{n,i}(Z_{n,1}(\omega),\dots,Z_{n,i-1}(\omega),\omega)\bigr)\bigr)_{\alpha\in\mathbb{N}^4_0},\quad i=2,\dots,n.$

Then there exist sequences $(a_n)_{n\in\mathbb{N}}$ and $(b_n)_{n\in\mathbb{N}}$ of infinitely often differentiable functions $a_n,b_n\colon\mathbb{R}^4\to\mathbb{R}^4$ with $\|a_n\|_\infty\le1$ and $\|b_n\|_\infty\le1$, with $\|D^\alpha b_n\|_\infty\le1$ for all $\alpha\in\mathbb{N}^4_0$ with $|\alpha|_1=1$, and with the first order partial derivatives of all $a_n$ uniformly dominated by a single real-valued function, such that for every $\kappa\in(0,\infty)$,

 $\lim_{n\to\infty} n^\kappa\cdot\mathbb{E}\bigl[\bigl|\mathbb{E}\bigl[X^{a_n,b_n}_1(1)\bigr]-\varphi_n\bigl(Z^{a_n,b_n}_{n,1},\dots,Z^{a_n,b_n}_{n,n},\cdot\bigr)\bigr|\bigr]=\infty.$

Perhaps even more surprisingly, we obtain from Corollary 2 in Section 4.1 that for every such sequence of Monte Carlo methods and for every arbitrarily slow convergence speed there exist a strictly increasing and continuous function and a sequence of infinitely often differentiable coefficients $a_n,b_n$ bounded by one such that the order one partial derivatives of $b_n$ are bounded by one as well, the order one partial derivatives of $a_n$ are dominated by that function, and the resulting sequence of mean absolute errors for computing the expectation of the first component of the solution at the final time cannot converge to zero faster than the given speed of convergence. This finding is formally stated in Theorem 3, which follows from Corollary 2 in Section 4.1.

###### Theorem 3.

Let $d=4$ and $m=1$ and let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space. For every $n\in\mathbb{N}$ let $\psi_{n,1}\colon\Omega\to\mathbb{R}^4$ and let

 $\psi_{n,i}\colon\bigl(\mathbb{R}^4\times\mathbb{R}^4\bigr)^{\mathbb{N}^4_0\times\{1,\dots,i-1\}}\times\Omega\to\mathbb{R}^4,\quad i=2,\dots,n,$

as well as

 $\varphi_n\colon\bigl(\mathbb{R}^4\times\mathbb{R}^4\bigr)^{\mathbb{N}^4_0\times\{1,\dots,n\}}\times\Omega\to\mathbb{R}$

be measurable mappings. For all infinitely often differentiable functions $a,b\colon\mathbb{R}^4\to\mathbb{R}^4$ and for every $n\in\mathbb{N}$ define random variables $Z_{n,1},\dots,Z_{n,n}$ by $Z_{n,1}(\omega)=\bigl((D^\alpha a,D^\alpha b)(\psi_{n,1}(\omega))\bigr)_{\alpha\in\mathbb{N}^4_0}$ and

 $Z_{n,i}(\omega)=\bigl((D^\alpha a,D^\alpha b)\bigl(\psi_{n,i}(Z_{n,1}(\omega),\dots,Z_{n,i-1}(\omega),\omega)\bigr)\bigr)_{\alpha\in\mathbb{N}^4_0},\quad i=2,\dots,n.$

Let $(\varepsilon_n)_{n\in\mathbb{N}}$ be a sequence of positive reals with $\lim_{n\to\infty}\varepsilon_n=0$.

Then there exist $c\in(0,\infty)$ and a strictly increasing, continuous function as well as sequences $(a_n)_{n\in\mathbb{N}}$ and $(b_n)_{n\in\mathbb{N}}$ of infinitely often differentiable functions $a_n,b_n\colon\mathbb{R}^4\to\mathbb{R}^4$ with $\|a_n\|_\infty\le1$ and $\|b_n\|_\infty\le1$, with $\|D^\alpha b_n\|_\infty\le1$ for all $\alpha\in\mathbb{N}^4_0$ with $|\alpha|_1=1$, and with the first order partial derivatives of all $a_n$ dominated by that function, such that for every $n\in\mathbb{N}$,

 $\mathbb{E}\bigl[\bigl|\mathbb{E}\bigl[X^{a_n,b_n}_1(1)\bigr]-\varphi_n\bigl(Z^{a_n,b_n}_{n,1},\dots,Z^{a_n,b_n}_{n,n},\cdot\bigr)\bigr|\bigr]\ge c\cdot\varepsilon_n.$

In Theorems 1–3 the integrand $f$ is fixed to be a coordinate projection and lower bounds are provided for the worst case mean absolute error of a Monte Carlo quadrature rule on subclasses of equations (1) with infinitely often differentiable coefficients that are bounded by one. On the other hand, one can fix a specific equation (1) with infinitely often differentiable and bounded coefficients $a$ and $b$ and study the worst case mean absolute error of a Monte Carlo quadrature rule with respect to a class of integrands $f$. In the latter setting a negative result of the type stated in Theorems 2 and 3, holding for any sequence of Monte Carlo quadrature rules that are based on finitely many evaluations of the integrand $f$, can of course not be true. In fact, consider the direct simulation method based on $n$ independent copies of the solution of the fixed equation (1) at the final time, i.e.,

 $\hat S^{\mathrm{ds}}_n(a,b,f)=\frac1n\sum_{i=1}^n f\bigl(V^{a,b}_i\bigr),$

where $V^{a,b}_1,\dots,V^{a,b}_n$ are independent and identically distributed as $X_{a,b}(1)$. Clearly, if $f$ is bounded by one then

 $\mathbb{E}\bigl[|S(a,b,f)-\hat S^{\mathrm{ds}}_n(a,b,f)|^2\bigr]\le\frac1n.$
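This is the standard variance bound for an unbiased Monte Carlo estimate: since the $V^{a,b}_i$ are i.i.d. copies of $X_{a,b}(1)$ and $|f|\le 1$,

```latex
\mathbb{E}\bigl[|S(a,b,f)-\hat S^{\mathrm{ds}}_n(a,b,f)|^2\bigr]
  =\frac{1}{n}\,\operatorname{Var}\bigl(f(X_{a,b}(1))\bigr)
  \le\frac{1}{n}\,\mathbb{E}\bigl[f^2(X_{a,b}(1))\bigr]
  \le\frac{1}{n}.
```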

However, if only deterministic quadrature rules are considered then we obtain again negative statements in the spirit of Theorems 2 and 3 even for the seemingly easy problem of computing the expected value $\mathbb{E}[f(W(1))]$ for a one-dimensional Brownian motion $W$ and infinitely often differentiable integrands $f$ that are bounded by one. For instance, we can show that for any sequence of deterministic quadrature rules that are based on evaluations of the integrand and all its derivatives at finitely many points in $\mathbb{R}$ and for every arbitrarily slow convergence speed there exist a strictly increasing and continuous function and a sequence of infinitely often differentiable integrands $f_n$ bounded by one such that the first derivatives of all $f_n$ are dominated by that function and the resulting sequence of approximation errors for computing the expectation cannot converge to zero faster than the given speed of convergence. This finding, which is formally stated in the following theorem, is a straightforward consequence of Corollary 5 in Section 4.2.

###### Theorem 4.

Assume that $W$ is a one-dimensional Brownian motion. For every $n\in\mathbb{N}$ let $x_{n,1},\dots,x_{n,n}\in\mathbb{R}$ and let $\varphi_n\colon(\mathbb{R}^n)^{\mathbb{N}_0}\to\mathbb{R}$ be a measurable mapping. Let $(\varepsilon_n)_{n\in\mathbb{N}}$ be a sequence of positive reals with $\lim_{n\to\infty}\varepsilon_n=0$. Then there exist $c\in(0,\infty)$, a strictly increasing, continuous function and a sequence $(f_n)_{n\in\mathbb{N}}$ of infinitely often differentiable functions $f_n\colon\mathbb{R}\to\mathbb{R}$ with $\|f_n\|_\infty\le1$ and first derivatives dominated by that function such that for every $n\in\mathbb{N}$,

 $\bigl|\mathbb{E}[f_n(W(1))]-\varphi_n\bigl(\bigl(f^{(k)}_n(x_{n,1}),\dots,f^{(k)}_n(x_{n,n})\bigr)_{k\in\mathbb{N}_0}\bigr)\bigr|\ge c\cdot\varepsilon_n.$

The findings stated in Theorems 1–3 are worst case results for randomized quadrature rules with respect to a given class of equations (1). It remains an open question whether these results can be strengthened in the sense that for every sequence of Monte Carlo methods for quadrature of the first component of the solution, based on finitely many sequential evaluations of the coefficients and all their partial derivatives, there exists a single equation with infinitely often differentiable and bounded coefficients that leads to the prescribed slow convergence rate of the corresponding sequence of mean absolute errors. Up to now, a positive answer to this question is only known for the sequence of Monte Carlo Euler schemes, see [9] and (2). Similarly, it is unclear whether Theorem 4 can be strengthened in the sense that for every sequence of deterministic quadrature rules for quadrature with respect to the one-dimensional standard normal distribution, based on finitely many sequential evaluations of the integrand and all its derivatives, there exists a single infinitely often differentiable and bounded integrand leading to the prescribed slow convergence rate of the corresponding sequence of absolute errors. We conjecture that both questions can be answered in the positive and we will address these issues in future research.

We add that there is a number of results on worst case lower error bounds for quadrature of marginals of SDEs in the case of coefficients that satisfy a uniform global Lipschitz condition and integrands with first order partial derivatives that satisfy a uniform polynomial growth condition, see [23, 17, 22, 19].

We further add that recently in [15] equations (1) with infinitely often differentiable and bounded coefficients $a,b$ have been constructed whose solutions cannot be approximated at the final time in the pathwise sense with a polynomial rate by any approximation method based on finitely many evaluations of the driving Brownian motion. In the present paper we use a construction which is conceptually similar to the one from [15] but specifically tailored to the analysis of the quadrature problem.

We briefly describe the content of the paper. In Section 2 we fix some notation with respect to the regularity of coefficients and integrands. In Section 3 we set up the framework for studying worst case errors of randomized and deterministic algorithms for the approximation of nonlinear functionals on function spaces. In particular, we establish lower error bounds for the corresponding minimal randomized and deterministic errors that generalize classical results of Bakhvalov [2] and Novak [21] for linear integration problems. In Section 4 we use the framework from Section 3 to study quadrature problems for SDEs. Section 4.1 is devoted to lower bounds for worst case errors with respect to the coefficients, while Section 4.2 contains our results on worst case errors with respect to the integrands. The proofs of the main results, Theorems 5 and 6, are carried out in Section 5.

## 2. Notation

Let $k,\ell_1,\ell_2\in\mathbb{N}$. For a vector $x\in\mathbb{R}^k$ and a matrix $A\in\mathbb{R}^{\ell_1\times\ell_2}$ we use $|x|$ and $|A|$ to denote the maximum norm of $x$ and $A$, respectively. For a function $h\colon\mathbb{R}^k\to\mathbb{R}^{\ell_1\times\ell_2}$ we put $\|h\|_\infty=\sup_{x\in\mathbb{R}^k}|h(x)|$. By $C^\infty(\mathbb{R}^k,\mathbb{R}^{\ell_1\times\ell_2})$ we denote the set of all functions $h\colon\mathbb{R}^k\to\mathbb{R}^{\ell_1\times\ell_2}$ that are infinitely often differentiable, and for $h\in C^\infty(\mathbb{R}^k,\mathbb{R}^{\ell_1\times\ell_2})$ and a multi-index $\alpha=(\alpha_1,\dots,\alpha_k)\in\mathbb{N}^k_0$ we use

 $D^\alpha h=\frac{\partial^{\alpha_1+\dots+\alpha_k}h}{\partial x^{\alpha_k}_k\dots\partial x^{\alpha_1}_1}\colon\mathbb{R}^k\to\mathbb{R}^{\ell_1\times\ell_2}$

to denote the corresponding partial derivative of $h$. For every $r\in\mathbb{N}$ we use

 $C^\infty_r(\mathbb{R}^k,\mathbb{R}^{\ell_1\times\ell_2})=\Bigl\{h\in C^\infty(\mathbb{R}^k,\mathbb{R}^{\ell_1\times\ell_2}):\max_{\alpha\in\mathbb{N}^k_0,\,|\alpha|_1\le r}\|D^\alpha h\|_\infty\le1\Bigr\}$

to denote the set of all such functions that are bounded by one and whose partial derivatives up to order $r$ are bounded by one as well.

## 3. Approximation of nonlinear functionals on function spaces and lower worst case error bounds

Let $A$ and $B$ be nonempty sets, let $G$ be a nonempty set of functions $g\colon A\to B$ and let

 $S\colon G\to\mathbb{R}.$

We study the approximation of $S(g)$ for $g\in G$ by means of a deterministic or randomized algorithm that is based on finitely many evaluations of the mapping $g$ at points in $A$. Our goal is to provide lower bounds for the worst case mean error of any such algorithm in terms of its worst case average number of function evaluations.

A generalized randomized algorithm for this problem is specified by a probability space $(\Omega,\mathcal{F},\mathbb{P})$ and a triple

 $(\psi,\nu,\varphi),$

where

• $\psi=(\psi_k)_{k\in\mathbb{N}}$ is a sequence of mappings

 $\psi_k\colon B^{k-1}\times\Omega\to A,$

which are used to sequentially determine random evaluation nodes in $A$ for a given input $g\in G$,

• the mapping

 $\nu\colon G\times\Omega\to\mathbb{N}$

determines the random total number of evaluations for a given input $g\in G$, and

• $\varphi=(\varphi_k)_{k\in\mathbb{N}}$ is a sequence of mappings

 $\varphi_k\colon B^k\times\Omega\to\mathbb{R},$

which are used to obtain for every input $g\in G$ a random approximation to $S(g)$ based on the observed function values of $g$.

To be more precise, we define for every $k\in\mathbb{N}$ a mapping

 $N^\psi_k\colon G\times\Omega\to B^k$

by

 $N^\psi_k=(y_1,\dots,y_k),$

where

 $y_1(g,\omega)=g(\psi_1(\omega))$

and

 $y_\ell(g,\omega)=g\bigl(\psi_\ell(y_1(g,\omega),\dots,y_{\ell-1}(g,\omega),\omega)\bigr),\quad\ell=2,\dots,k.$

For a given $\omega\in\Omega$ and a given input $g\in G$ the algorithm specified by $(\psi,\nu,\varphi)$ sequentially performs $\nu(g,\omega)$ evaluations of $g$ at the points

 $\psi_1(\omega),\;\psi_2(y_1(g,\omega),\omega),\;\dots,\;\psi_{\nu(g,\omega)}\bigl(y_1(g,\omega),\dots,y_{\nu(g,\omega)-1}(g,\omega),\omega\bigr)\in A$

and finally applies the mapping $\varphi_{\nu(g,\omega)}$ to the observed data to obtain the real number

 $\hat S^{\psi,\nu,\varphi}(g,\omega)=\varphi_{\nu(g,\omega)}\bigl(N^\psi_{\nu(g,\omega)}(g,\omega),\omega\bigr)$

as an approximation to $S(g)$. The induced mapping

 $\hat S^{\psi,\nu,\varphi}\colon G\times\Omega\to\mathbb{R}$

is called a generalized randomized algorithm if for every $g\in G$ the mappings

 $\hat S^{\psi,\nu,\varphi}(g,\cdot)\colon\Omega\to\mathbb{R}\quad\text{and}\quad\nu(g,\cdot)\colon\Omega\to\mathbb{N}$

are random variables.
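The interplay of $\psi$, $\nu$ and $\varphi$ can be made concrete in a small sketch. The quadrature instance at the bottom (three fixed midpoint nodes for integrating $g$ over $[0,1]$) is a hypothetical example, not an algorithm from the paper.

```python
def run_algorithm(g, psi, nu, phi, omega):
    """Evaluate S-hat^{psi,nu,phi}(g, omega): query g sequentially at nodes
    chosen from the data observed so far, then map the data to a real number."""
    n_evals = nu(g, omega)                    # total number of evaluations
    data = []                                 # observed values y_1, ..., y_k
    for k in range(1, n_evals + 1):
        node = psi[k](tuple(data), omega)     # psi_k uses y_1, ..., y_{k-1}
        data.append(g(node))
    return phi[n_evals](tuple(data), omega)   # terminal map phi_{nu(g,omega)}

# hypothetical deterministic instance: midpoint rule with 3 fixed nodes,
# approximating S(g) = integral of g over [0, 1]
psi = {k: (lambda k: (lambda data, omega: (k - 0.5) / 3))(k) for k in (1, 2, 3)}
nu = lambda g, omega: 3
phi = {3: lambda data, omega: sum(data) / 3}
approx = run_algorithm(lambda x: x, psi, nu, phi, omega=None)
```

Here the node maps ignore the observed data, so the algorithm is nonadaptive; an adaptive method would let `psi[k]` inspect `data`.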

We use $\mathcal{A}^{\mathrm{ran}}$ to denote the class of all generalized randomized algorithms. The error and the cost of $\hat S\in\mathcal{A}^{\mathrm{ran}}$ are defined in the worst case sense by

 $e(\hat S)=\sup_{g\in G}\mathbb{E}\bigl[|S(g)-\hat S(g,\cdot)|\bigr]$

and

 $\mathrm{cost}(\hat S)=\inf_{\psi,\nu,\varphi}\bigl\{\sup_{g\in G}\mathbb{E}[\nu(g,\cdot)]:\hat S=\hat S^{\psi,\nu,\varphi}\bigr\},$

respectively. Thus the definition of the cost of $\hat S$ takes into account that the representation $\hat S=\hat S^{\psi,\nu,\varphi}$ is not unique in general.

A generalized randomized algorithm $\hat S$ is called deterministic if the random variables $\hat S(g,\cdot)$ and $\nu(g,\cdot)$ are constant for all $g\in G$. In this case we have $\hat S=\hat S^{\psi,\nu,\varphi}$ with mappings

 (3) $\psi_k\colon B^{k-1}\to A,\quad\nu\colon G\to\mathbb{N},\quad\varphi_k\colon B^k\to\mathbb{R},$

and it is easy to see that

 $\mathrm{cost}(\hat S)=\inf\bigl\{\sup_{g\in G}\nu(g):\hat S=\hat S^{\psi,\nu,\varphi},\ (\psi,\nu,\varphi)\text{ satisfies (3)}\bigr\}.$

The class of all generalized deterministic algorithms is denoted by $\mathcal{A}^{\mathrm{det}}$.

Let $n\in\mathbb{N}$. The crucial quantities for our analysis are the $n$-th minimal errors

 $e^{\mathrm{det}}_n(G;S)=\inf\bigl\{e(\hat S):\hat S\in\mathcal{A}^{\mathrm{det}},\ \mathrm{cost}(\hat S)\le n\bigr\}$

and

 $e^{\mathrm{ran}}_n(G;S)=\inf\bigl\{e(\hat S):\hat S\in\mathcal{A}^{\mathrm{ran}},\ \mathrm{cost}(\hat S)\le n\bigr\},$

i.e., the smallest possible worst case error that can be achieved by generalized deterministic algorithms based on at most $n$ function values of $g$ and the smallest possible worst case mean error that can be achieved by generalized randomized algorithms that use at most $n$ function values of $g$ on average, respectively. Clearly, $e^{\mathrm{ran}}_n(G;S)\le e^{\mathrm{det}}_n(G;S)$.

We present two types of lower bounds for the minimal errors $e^{\mathrm{det}}_n(G;S)$ and $e^{\mathrm{ran}}_n(G;S)$, which generalize classical results of Bakhvalov and Novak for the case of $S$ being a linear functional on a space of real-valued functions, see [2, 21].

###### Proposition 1.

Let $b_*\in B$, $m\in\mathbb{N}$, $\varepsilon\in(0,\infty)$, and assume that there exist functions

 $g_{1,+},g_{1,-},\dots,g_{m,+},g_{m,-}\colon A\to B$

with the following properties.

• The sets

 $\{g_{1,+}\ne b_*\}\cup\{g_{1,-}\ne b_*\},\;\dots,\;\{g_{m,+}\ne b_*\}\cup\{g_{m,-}\ne b_*\}$

are pairwise disjoint,

• We have $g_{1,+},g_{1,-},\dots,g_{m,+},g_{m,-}\in G$,

• We have $|S(g_{i,+})-S(g_{i,-})|\ge\varepsilon$ for $i=1,\dots,m$.

Then, for every $n\in\mathbb{N}$,

 $e^{\mathrm{ran}}_n(G;S)\ge\frac{m-16n}{8m}\,\varepsilon.$
###### Proposition 2.

Let $B$ be a linear space. Let $m\in\mathbb{N}$, $\varepsilon\in(0,\infty)$, and assume that there exist functions

 $g_{1,+},g_{1,-},\dots,g_{m,+},g_{m,-}\colon A\to B$

with the following properties.

• The sets

 $\{g_{1,+}\ne b_*\}\cup\{g_{1,-}\ne b_*\},\;\dots,\;\{g_{m,+}\ne b_*\}\cup\{g_{m,-}\ne b_*\}$

are pairwise disjoint,

• We have $b_*=0$ and for all $(\delta_1,\dots,\delta_m)\in\{+,-\}^m$ we have

 $\sum_{i=1}^m g_{i,\delta_i}\in G\quad\text{and}\quad S\Bigl(\sum_{i=1}^m g_{i,\delta_i}\Bigr)=\sum_{i=1}^m S(g_{i,\delta_i}),$
• We have $|S(g_{i,+})-S(g_{i,-})|\ge\varepsilon$ for $i=1,\dots,m$.

Then, for every $n\in\mathbb{N}$,

 $e^{\mathrm{det}}_n(G;S)\ge\frac{m-n}{2}\,\varepsilon$

and for every $n\in\mathbb{N}$,

 $e^{\mathrm{ran}}_n(G;S)\ge\frac{\sqrt{m-4n}}{128}\,\varepsilon.$

For the proof of the lower bounds for the $n$-th minimal randomized errors in Propositions 1 and 2 we employ a classical averaging principle of Bakhvalov, see [2]. Consider a probability measure $\mu$ on the power set of $G$ with finite support. For a deterministic algorithm $\hat S\in\mathcal{A}^{\mathrm{det}}$ we define the average error and the average cost of $\hat S$ with respect to $\mu$ by

 $e(\hat S,\mu)=\int_G|S(g)-\hat S(g)|\,\mu(dg)$

and

 $\mathrm{cost}(\hat S,\mu)=\inf\Bigl\{\int_G\nu(g)\,\mu(dg):\hat S=\hat S^{\psi,\nu,\varphi},\ (\psi,\nu,\varphi)\text{ satisfies (3)}\Bigr\}.$

The smallest possible average error with respect to $\mu$ that can be achieved by any generalized deterministic algorithm with average cost at most $n$ is then given by

 $e^{\mathrm{det}}_n(\mu)=\inf\bigl\{e(\hat S,\mu):\hat S\in\mathcal{A}^{\mathrm{det}},\ \mathrm{cost}(\hat S,\mu)\le n\bigr\}.$

###### Lemma 1.

For every probability measure $\mu$ on the power set of $G$ with finite support and every $n\in\mathbb{N}$ we have

 $e^{\mathrm{ran}}_n(G;S)\ge\tfrac12\,e^{\mathrm{det}}_{2n}(\mu).$

For convenience of the reader we provide a proof of Lemma 1.

###### Proof of Lemma 1.

Let $\hat S\in\mathcal{A}^{\mathrm{ran}}$ with $\mathrm{cost}(\hat S)\le n$. Let $\rho\in(0,\infty)$ and choose $(\psi,\nu,\varphi)$ such that $\hat S=\hat S^{\psi,\nu,\varphi}$ and $\sup_{g\in G}\mathbb{E}[\nu(g,\cdot)]\le n+\rho$. Put

 $\Omega_1=\Bigl\{\omega\in\Omega:\int_G\nu(g,\omega)\,d\mu(g)\le2n\Bigr\}.$

Then $\mathbb{P}(\Omega\setminus\Omega_1)\le\frac{n+\rho}{2n}$, and therefore $\mathbb{P}(\Omega_1)\ge\frac12-\frac{\rho}{2n}$. For every $\omega\in\Omega_1$ we have $\hat S(\cdot,\omega)\in\mathcal{A}^{\mathrm{det}}$ and $\mathrm{cost}(\hat S(\cdot,\omega),\mu)\le2n$, which implies

 $\int_G|S(g)-\hat S(g,\omega)|\,\mu(dg)\ge e^{\mathrm{det}}_{2n}(\mu).$

Hence

 $e(\hat S)\ge\int_G\mathbb{E}\bigl[|S(g)-\hat S(g,\cdot)|\bigr]\,\mu(dg)\ge\int_{\Omega_1}\int_G|S(g)-\hat S(g,\omega)|\,\mu(dg)\,\mathbb{P}(d\omega)\ge\Bigl(\frac12-\frac{\rho}{2n}\Bigr)\cdot e^{\mathrm{det}}_{2n}(\mu).$

Letting $\rho$ tend to zero completes the proof. ∎

###### Proof of Proposition 1.

Let $\mu$ denote the uniform distribution on

 $\tilde G=\{g_{1,+},g_{1,-},\dots,g_{m,+},g_{m,-}\}.$

We show that

 (4) $e^{\mathrm{det}}_n(\mu)\ge\frac{m-8n}{4m}\,\varepsilon,$

which jointly with Lemma 1 yields the lower bound in Proposition 1.

In order to prove (4), let $\hat S\in\mathcal{A}^{\mathrm{det}}$ with $\mathrm{cost}(\hat S,\mu)\le n$. Let $\rho\in(0,\infty)$ and choose $(\psi,\nu,\varphi)$ satisfying (3) such that $\hat S=\hat S^{\psi,\nu,\varphi}$ and $\int_G\nu(g)\,\mu(dg)\le n+\rho$. Put

 $\tilde G_1=\{g\in\tilde G:\nu(g)\le4n\}$

and let

 $I=\{i\in\{1,\dots,m\}:g_{i,+},g_{i,-}\in\tilde G_1\}.$

Then $\mu(\tilde G\setminus\tilde G_1)\le\frac{n+\rho}{4n}$, and therefore $|\tilde G\setminus\tilde G_1|\le\frac{m(n+\rho)}{2n}$. Since every $i\in\{1,\dots,m\}\setminus I$ contributes at least one element to $\tilde G\setminus\tilde G_1$, we conclude that

 $|I|\ge\bigl(\tfrac12-\tfrac{\rho}{2n}\bigr)\cdot m.$

Let

 $K=\{\psi_1,\psi_2(b_*),\dots,\psi_{4n}(b_*,\dots,b_*)\}$

denote the set of the first $4n$ evaluation nodes in $A$ that are generated by the sequence $\psi$ if every observed function value equals $b_*$, and put

 $J=\{i\in I:K\cap(\{g_{i,+}\ne b_*\}\cup\{g_{i,-}\ne b_*\})=\emptyset\}.$

Clearly, for all $i\in J$ and $\delta\in\{+,-\}$ all function values of $g_{i,\delta}$ observed by the algorithm equal $b_*$, which implies $\hat S(g_{i,+})=\hat S(g_{i,-})$ for every $i\in J$, and, observing property (i), we conclude $|J|\ge|I|-4n$. Thus, by property (iii),

 $\int_G|S(g)-\hat S(g)|\,\mu(dg)\ge\frac{1}{2m}\sum_{i\in J}\bigl(|S(g_{i,+})-\hat S(g_{i,+})|+|S(g_{i,-})-\hat S(g_{i,-})|\bigr)\ge\frac{1}{2m}\sum_{i\in J}|S(g_{i,+})-S(g_{i,-})|\ge\frac{|J|}{2m}\,\varepsilon\ge\Bigl(\frac14-\frac{\rho}{4n}-\frac{2n}{m}\Bigr)\varepsilon.$

Letting $\rho$ tend to zero yields

 $e(\hat S,\mu)\ge\bigl(\tfrac14-\tfrac{2n}{m}\bigr)\varepsilon,$

which completes the proof. ∎

###### Proof of Proposition 2.

We first prove the lower bound for the $n$-th minimal error of deterministic methods. Let $\hat S\in\mathcal{A}^{\mathrm{det}}$ with $\mathrm{cost}(\hat S)\le n$. Choose $(\psi,\nu,\varphi)$ satisfying (3) such that $\hat S=\hat S^{\psi,\nu,\varphi}$ and $\sup_{g\in G}\nu(g)\le n$. Consider the function