
# On non-polynomial lower error bounds for adaptive strong approximation of SDEs

## Abstract.

Recently, it has been shown in [9] that there exists a system of stochastic differential equations (SDE) with infinitely often differentiable and bounded coefficients such that the Euler scheme with equidistant time steps converges to the solution of this SDE at the final time in the strong sense, but with no polynomial rate. Even worse, in [20] it has been shown that for any sequence (an)n∈N of positive reals, which may converge to zero arbitrarily slowly, there exists an SDE with infinitely often differentiable and bounded coefficients such that no approximation of the solution of this SDE at the final time based on n evaluations of the driving Brownian motion at fixed time points can achieve a smaller absolute mean error than the given number an. In the present article we generalize the latter result to the case when the approximations may choose the location as well as the number of the evaluation sites of the driving Brownian motion in an adaptive way dependent on the values of the Brownian motion observed so far.

## 1. Introduction

Let d, m ∈ N, let T ∈ (0,∞) and consider a d-dimensional system of autonomous stochastic differential equations (SDE)

 (1) dX(t) =μ(X(t))dt+σ(X(t))dW(t),t∈[0,T], X(0) =x0

with a deterministic initial value x0 ∈ Rd, a drift coefficient μ: Rd → Rd, a diffusion coefficient σ: Rd → Rd×m and an m-dimensional driving Brownian motion W, and assume that (1) has a unique strong solution X. Our computational task is to approximate X(T) by means of methods that use finitely many evaluations of the driving Brownian motion W. In particular we are interested in the following question: under which assumptions on the coefficients μ and σ does there exist a method of the latter type that converges to X(T) in absolute mean with a polynomial rate?

It is well-known that if the coefficients μ and σ are globally Lipschitz continuous then the classical Euler scheme achieves the rate of convergence 1/2, see [26]. Moreover, the recent literature on numerical approximation of SDEs contains a number of results on approximation schemes that are specifically designed for non-Lipschitz coefficients and achieve polynomial convergence rates for suitable classes of such SDEs, see e.g. [16, 12, 18, 25, 38, 35, 37, 3, 21, 4] for SDEs with globally monotone coefficients and see e.g. [2, 8, 5, 1, 32, 17, 19, 23, 24, 33, 11] for SDEs with possibly non-monotone coefficients.
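As a concrete reference point for the classical scheme mentioned above, the Euler scheme with equidistant time steps can be sketched as follows; this is a minimal illustration with names of our own choosing (`euler_maruyama`, an Ornstein–Uhlenbeck test equation), not the construction studied in this article.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, n, rng):
    """Euler scheme with n equidistant steps for dX = mu(X)dt + sigma(X)dW.

    mu maps R^d -> R^d, sigma maps R^d -> R^{d x m}; W is m-dimensional.
    Returns the Euler approximation of X(T).
    """
    m = sigma(np.asarray(x0, dtype=float)).shape[1]
    dt = T / n
    x = np.array(x0, dtype=float)
    for _ in range(n):
        dW = rng.normal(0.0, np.sqrt(dt), size=m)  # Brownian increment
        x = x + mu(x) * dt + sigma(x) @ dW
    return x

# Ornstein-Uhlenbeck example (globally Lipschitz coefficients),
# for which the strong rate 1/2 applies:
rng = np.random.default_rng(0)
x_T = euler_maruyama(lambda x: -x, lambda x: np.array([[0.5]]), [1.0], 1.0, 1000, rng)
```

Running the scheme for decreasing step sizes against a fine reference path is the usual way to observe the rate 1/2 empirically.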

On the other hand, it has recently been shown in [20] that for any sequence (an)n∈N of positive reals, which may converge to zero arbitrarily slowly, there exists an SDE (1) with d = 4 and m = 1 and with infinitely often differentiable and bounded coefficients μ and σ such that no approximation of X(T) based on finitely many evaluations of the driving Brownian motion W at fixed time points converges in absolute mean faster than the given sequence (an)n∈N. More formally, for every n ∈ N,

 (2) inf_{s1,…,sn∈[0,T]} inf_{u: Rn→R4 measurable} E∥X(T) − u(W(s1),…,W(sn))∥ ≥ an.

In particular, there exists an SDE (1) with infinitely often differentiable and bounded coefficients μ and σ such that its solution at the final time cannot be approximated with a polynomial rate of convergence based on finitely many evaluations of the driving Brownian motion W at fixed time points. We add that the latter statement in the special case when the approximation is given by the Euler scheme with equidistant time steps has first been shown in [9].

Note that the time points s1,…,sn that are used by an approximation in (2) are fixed, and therefore this negative result does not cover approximations that may choose the number as well as the location of the evaluation sites of the driving Brownian motion W in an adaptive way, e.g. numerical schemes that adjust the actual step size according to a criterion that is based on the values of the driving Brownian motion observed so far, see e.g. [6, 29, 30, 27, 34, 22, 13, 14] and the references therein. See Section 4 for the formal definition of this type of approximation. It is well-known that for SDEs (1) with (essentially) globally Lipschitz continuous coefficients μ and σ, adaptive approximations cannot achieve a better rate of convergence than what is best possible for nonadaptive ones, which at the same time coincides with the best possible rate of convergence that can be achieved by any approximation based on evaluations of W at equidistant time points, see [29, 30]. However, as has recently turned out, this is not necessarily the case anymore if the coefficients μ and σ are not both globally Lipschitz continuous. In [10] it has been shown that for the one-dimensional squared Bessel process, which is the solution of a one-dimensional SDE (1), the following holds: the best possible rate of convergence that can be achieved by any approximation based on evaluations of W at n equidistant time points equals 1/2, i.e. there exist c1, c2 ∈ (0,∞) such that

 c1·n^{−1/2} ≤ inf_{u: Rn→R measurable} E|X(T) − u(W(T/n), W(2T/n), …, W(T))| ≤ c2·n^{−1/2},

while the best possible rate of convergence that can be achieved by approximations based on adaptively chosen evaluations of the driving Brownian motion W equals infinity. More formally, for every α ∈ (0,∞) there exist c ∈ (0,∞) and a sequence (ˆXn)n∈N of approximations based on adaptively chosen evaluations of W such that for all n ∈ N

 E|X(T)−ˆXn|≤c⋅n−α.

In view of the latter result one might hope that the non-polynomial lower error bound in (2) could be overcome by using adaptive approximations, see also the discussion in [7, p. 2]. In the present article we prove that the pessimistic alternative is true. We show that for any sequence (an)n∈N of positive reals, which may converge to zero arbitrarily slowly, there exists an SDE (1) with d = 4 and m = 1 and with infinitely often differentiable and bounded coefficients μ and σ such that no approximation based on n adaptively chosen evaluations of the driving Brownian motion W on average can achieve a smaller absolute mean error than the given number an, i.e.

 E∥X(T) − ˆXn∥ ≥ an

for every approximation ˆXn of the latter type. This fact is an immediate consequence of Corollary 2 in Section 5 together with an appropriate scaling argument. For the proof of the latter result we employ the same class of SDEs as in [20]. Thus, roughly speaking, these SDEs cannot be solved approximately in the strong sense in reasonable computational time by means of any kind of adaptive (or nonadaptive) method based on finitely many evaluations of the driving Brownian motion W.

We conjecture that a similar negative result holds even if one allows for adaptive approximations based on finitely many evaluations of arbitrary linear continuous functionals of the driving Brownian motion W. However, in this case one cannot employ the class of SDEs from [20], since for every such SDE its solution at the final time can be approximated with error zero based on the evaluation of only two linear continuous functionals of the driving Brownian motion W, see the explicit solution formula (4).

We add that negative results in the spirit of (2) for quadrature problems for marginal distributions of SDEs have recently been established in [31].

We briefly describe the content of the paper. In Section 2 we fix some notation. In Section 3 we briefly introduce the class of SDEs from [20], which is studied in this article as well. In Section 4 we formally define the class of adaptive approximations, which are analysed in this article. Our lower error bounds are stated in Section 5. The proof of the main result, Theorem 1, is carried out in Section 6.

## 2. Notation

Throughout this article the following notation is used. For a set , a vector space , a set , and a function we put . For sets , , a function and a subset we denote by the restriction of to . Moreover, for and we write for the Euclidean norm of . For and we denote by and the Borel σ-fields on and on , respectively, where the latter space is equipped with the supremum norm. For a space that is a finite product of the latter two spaces we denote by the Borel σ-field on generated by the respective product topology.

## 3. A family of SDEs with smooth and bounded coefficients

Throughout this article we study SDEs provided by the following setting.

Let T ∈ (0,∞), let (Ω, F, P) be a probability space with a normal filtration (Ft)t∈[0,T], and let W be a standard (Ft)t∈[0,T]-Brownian motion on (Ω, F, P).

Let 0 < τ1 < τ2 < T and let f, g, h: [0,T] → R be bounded and satisfy , , , , , and .

For every infinitely often differentiable function ψ: R → (0,∞) let μψ: R4 → R4 and σ: R4 → R4 be given by

 μψ(x) =(1,0,0,h(x1)⋅cos(x2ψ(x3))), σ(x) =(0,f(x1),g(x1),0)

and consider the following 4-dimensional system of SDEs

 (3) dXψ(t) =μψ(Xψ(t))dt+σ(Xψ(t))dW(t),t∈[0,T], Xψ(0) =0.
###### Remark 1.

Note that for every such ψ the functions μψ and σ are infinitely often differentiable and bounded.

###### Remark 2.

It is easy to see that for every such ψ the SDE (3) has a unique strong solution Xψ, given by

 Xψ1(t) = t,
 Xψ2(t) = ∫_{0}^{min(t,τ1)} f(s) dW(s),
 (4) Xψ3(t) = 1_{[τ1,T]}(t) · ∫_{min(t,τ1)}^{min(t,τ2)} g(s) dW(s),
 Xψ4(t) = 1_{[τ2,T]}(t) · cos(Xψ2(τ1)·ψ(Xψ3(τ2))) · ∫_{τ2}^{t} h(s) ds

for all t ∈ [0,T].
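The closed-form solution (4) can be sampled directly by discretizing the two Wiener integrals. The concrete choices of τ1, τ2, T and of the functions f, g, ψ below (and h ≡ 1 on [τ2, T]) are hypothetical placeholders; the article only prescribes their qualitative properties.

```python
import numpy as np

# Hypothetical parameters and coefficient functions (placeholders only):
tau1, tau2, T = 1.0, 2.0, 3.0
f = lambda s: np.sin(s)      # used on [0, tau1]
g = lambda s: np.exp(-s)     # used on [tau1, tau2]
psi = lambda x: np.exp(x)    # positive and strictly increasing

def solution_at_T(n, rng):
    """Sample X^psi(T) from the explicit solution (4), discretizing the
    stochastic integrals with left-point sums on grids of n cells each."""
    s1 = np.linspace(0.0, tau1, n + 1)
    s2 = np.linspace(tau1, tau2, n + 1)
    dW1 = rng.normal(0.0, np.sqrt(np.diff(s1)))   # increments of W on [0, tau1]
    dW2 = rng.normal(0.0, np.sqrt(np.diff(s2)))   # increments of W on [tau1, tau2]
    X2 = np.sum(f(s1[:-1]) * dW1)                 # int_0^{tau1} f dW
    X3 = np.sum(g(s2[:-1]) * dW2)                 # int_{tau1}^{tau2} g dW
    X4 = np.cos(X2 * psi(X3)) * (T - tau2)        # h == 1 on [tau2, T]
    return np.array([T, X2, X3, X4])

x = solution_at_T(10_000, np.random.default_rng(1))
```

Note how the fourth component is a bounded functional of the two Gaussian random variables X2(τ1) and X3(τ2); this is exactly the structure exploited in the lower bound proofs.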

## 4. Adaptive strong approximations

Let δ ∈ (0,T). We study general strong approximations of Xψ(T) based on the trajectory of W on [δ,T] and on finitely many sequential evaluations of W in the interval (0,δ). Every such approximation is defined by three sequences

 φ=(φn)n∈N,χ=(χn)n∈N,ϕ=(ϕn)n∈N

of measurable mappings

 φn :Rn−1×R[δ,T]→(0,δ), (5) χn :Rn×R[δ,T]→{0,1}, ϕn :Rn×R[δ,T]→R4.

The sequence φ determines the evaluation sites of a trajectory of W in the interval (0,δ). The total number of evaluations is determined by the sequence χ of stopping rules. Finally, the sequence ϕ is used to obtain the approximation to Xψ(T) from the observed data.

More precisely, let ω ∈ Ω, let v = (W(t)(ω))t∈[δ,T] be the corresponding trajectory of W on [δ,T] and put s1 = φ1(v). The sequential observation of W starts at the knot s1. After n steps the available information is then given by Dn(ω) = (y1,…,yn,v), where y1 = W(s1)(ω), s2 = φ2(y1,v), y2 = W(s2)(ω), …, sn = φn(y1,…,yn−1,v), yn = W(sn)(ω), and we decide whether we stop or further evaluate W according to the value of χn(Dn(ω)). The total number of observations of W in the interval (0,δ) is thus given by

 (6) ν(ω)=min{n∈N:χn(Dn(ω))=1}.

If ν(ω) = n, then the data Dn(ω) is used to construct the estimate ϕn(Dn(ω)).

For obvious reasons we require that ν < ∞ P-a.s. Then the resulting approximation is given by

 ˆX=ϕν(Dν).

Without loss of generality we assume that

 (7) φk(y1,…,yk−1,v)≠φl(y1,…,yl−1,v)

for all k, l ∈ N with k < l, all y1, …, yl−1 ∈ R and all v ∈ R[δ,T], i.e. the evaluation sites are pairwise distinct. We put

 c(ˆX)=Eν,

that is, the expected number of evaluations of the driving Brownian motion W in the interval (0,δ). We denote by Xδ the class of all methods of the above form and for N ∈ N we put

 XδN={ˆX∈Xδ:c(ˆX)≤N}.

Clearly, XδN ⊆ XδN′ for all δ ∈ (0,T) and all N, N′ ∈ N with N ≤ N′.

Let us stress that the class Xδ contains in particular all methods from the literature that use a step size control based on finitely many sequential evaluations of W on average, see e.g. [6, 29, 30, 27, 34, 22, 13, 14] and the references therein. Moreover, Xδ of course contains all nonadaptive approximations u(W(s1),…,W(sN)) based on evaluations of W at fixed time points s1, …, sN ∈ (0,δ) and on the trajectory of W on [δ,T], as studied in [20]. In the latter case one can take any sequences φ, χ and ϕ satisfying

 φn = sn for n ≤ N, χ1 = … = χN−1 = 0, χN = 1 and ϕN = u.
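The sequential mechanism defined by the triple (φ, χ, ϕ) can be sketched in code; the function names and interfaces below are our own, and the piecewise-linear path is only a crude stand-in for a Brownian trajectory. The nonadaptive special case above corresponds to a site rule that ignores the observed data and a stopping rule that fires after exactly N steps.

```python
import numpy as np

def run_adaptive(phi, chi, varphi, W, v):
    """Sequentially evaluate a path W at sites chosen by phi, stop via chi,
    and return (estimate, number of evaluations nu). Mirrors (phi, chi, phi_n)."""
    y = []
    while True:
        s = phi(len(y) + 1, y, v)        # next evaluation site in (0, delta)
        y.append(W(s))
        if chi(len(y), y, v) == 1:       # stopping rule chi_n(D_n)
            return varphi(len(y), y, v), len(y)

# A crude Brownian proxy on [0, delta]: fine grid + linear interpolation.
delta, N = 0.5, 4
rng = np.random.default_rng(2)
grid = np.linspace(0.0, delta, 1001)
path = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(np.diff(grid))))])
W = lambda t: float(np.interp(t, grid, path))

# Nonadaptive special case: N fixed sites, stop after N steps, measurable map u.
sites = [delta * (k + 1) / (N + 1) for k in range(N)]
Xhat, nu = run_adaptive(
    phi=lambda n, y, v: sites[n - 1],
    chi=lambda n, y, v: 1 if n == N else 0,
    varphi=lambda n, y, v: float(np.mean(y)),   # stand-in for u
    W=W, v=None)
```

An adaptive method would instead let `phi` and `chi` inspect `y`, e.g. refining the step size where the observed increments are large.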

## 5. Main results

Assume the setting in Section 3 and put

 (8) α = inf_{t∈[0,τ1/2]} |f′(t)|², β = ∫_{τ1}^{τ2} g²(t) dt, γ = ∫_{τ2}^{T} h(t) dt

as well as

 c1 = γ·exp(−π²/4 − 1/β) / (8π·√(2πβ)), c2 = γ·exp(−π²/4) / (4π).

Our main result is stated in Theorem 1. It provides a uniform lower bound for the mean absolute error of any strong approximation of Xψ(T) that is based on the trajectory of W on [δ,T] and on N sequential evaluations of W in the interval (0,δ) on average, in the case that ψ is positive, strictly increasing and satisfies lim_{x→−∞} ψ(x) = 0 as well as lim_{x→∞} ψ(x) = ∞. See Section 6 for the proof.

###### Theorem 1.

Let δ ∈ (0,T) and let ψ: R → (0,∞) be positive, strictly increasing with lim_{x→−∞} ψ(x) = 0 and lim_{x→∞} ψ(x) = ∞. Then for all N ∈ N and all ˆX ∈ XδN we have

 (9) E∥Xψ(T) − ˆX∥ ≥ c1·exp(−(1/β)·(ψ⁻¹((1 + √(96/(α·(min(δ,τ1/2))³)))·N³))²) − c2/N.
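To get a feeling for how weak the lower bound (9) can be, one may plug in a sample function; below we take ψ(x) = exp(exp(x)), so that ψ⁻¹(z) = log(log(z)), and use placeholder constants c1 = c2 = β = 1 and C = 2 standing in for the factor 1 + √(96/(α·(min(δ,τ1/2))³)). These values are illustrative only, not the ones determined by (8).

```python
import math

def lower_bound(N, c1=1.0, c2=1.0, beta=1.0, C=2.0):
    """Right-hand side of (9) for psi(x) = exp(exp(x)), placeholder constants."""
    return c1 * math.exp(-(math.log(math.log(C * N**3)))**2 / beta) - c2 / N

# The bound decays, but slower than any power of N: even after multiplying
# by sqrt(N) it still grows along N = 1e13, 1e14, 1e15.
vals = [lower_bound(10.0**k) * math.sqrt(10.0**k) for k in (13, 14, 15)]
```

The double-exponential ψ makes ψ⁻¹ grow only double-logarithmically, so the exponential in (9) decays like exp(−(log log N)²), which is eventually larger than any power N^{−q}.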

As a consequence of Theorem 1 we obtain a non-polynomial decay of the smallest possible mean absolute error of strong approximation of Xψ(T) based on the trajectory of W on [δ,T] and on N sequential evaluations of W in the interval (0,δ) on average, if additionally ψ satisfies an exponential growth condition.

###### Corollary 1.

Let δ ∈ (0,T) and let ψ: R → (0,∞) be positive, strictly increasing with lim_{x→−∞} ψ(x) = 0 and lim_{x→∞} ψ(x) = ∞. Moreover assume that for all q ∈ (0,∞)

 lim_{x→∞} ψ(x)·exp(−q·x²) = ∞.

Then for all q ∈ (0,∞) we have

 limN→∞(Nq⋅infˆX∈XδNE∥Xψ(T)−ˆX∥)=∞.
###### Proof.

The assumptions on the function ψ ensure that for all q ∈ (0,∞)

 (10) lim_{N→∞}(N^q · exp(−(1/β)·(ψ⁻¹((1 + √(96/(α·(min(δ,τ1/2))³)))·N³))²)) = ∞,

see Lemma 4.5 in [20]. This in particular implies that there exists N0 ∈ N such that for all N ≥ N0

 c1·exp(−(1/β)·(ψ⁻¹((1 + √(96/(α·(min(δ,τ1/2))³)))·N³))²) ≥ 2·c2/N.

Employing Theorem 1 we therefore conclude that for all N ≥ N0

 inf_{ˆX∈XδN} E∥Xψ(T) − ˆX∥ ≥ (c1/2)·exp(−(1/β)·(ψ⁻¹((1 + √(96/(α·(min(δ,τ1/2))³)))·N³))²).

The latter estimate and (10) imply the statement of the corollary. ∎

The following result shows that the smallest possible mean absolute error of strong approximation of Xψ(T) based on the trajectory of W on [δN,T] and on N sequential evaluations of W in the interval (0,δN) on average may converge to zero arbitrarily slowly, even when the sequence (δN)N∈N tends to zero with any given speed.

###### Corollary 2.

Let (aN)N∈N ⊂ (0,∞) and (δN)N∈N ⊂ (0,T) satisfy lim_{N→∞} aN = 0 and lim_{N→∞} δN = 0. Then there exist an infinitely often differentiable function ψ: R → (0,∞) and κ ∈ (0,∞) such that for all N ∈ N we have

 infˆX∈XδNNE∥Xψ(T)−ˆX∥≥κ⋅aN.
###### Proof.

We proceed similarly to the proof of Corollary 4.3 in [20]. Without loss of generality we may assume that the sequences (aN)N∈N and (δN)N∈N are strictly decreasing. Let

 N0 = min{N ∈ N : aN + c2/N < c1}

and for N ∈ N put

 bN = √(−β·ln((1/c1)·(aN + c2/N))), dN = (1 + √(96/(α·(min(δN,τ1/2))³)))·N³.

Note that the sequences (bN)N≥N0 and (dN)N∈N are strictly increasing and satisfy

 limN→∞bN=limN→∞dN=∞.

Define a function ψ: R → (0,∞) by

 ψ(x)=⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩dN0⋅(1−exp(1x−bN0)),if xN0.

Then ψ is positive, strictly increasing, infinitely often differentiable and satisfies lim_{x→−∞} ψ(x) = 0 as well as lim_{x→∞} ψ(x) = ∞.

For N ∈ N put

 εN=infˆX∈XδNNE∥Xψ(T)−ˆX∥.

Theorem 1 implies that for all N ≥ N0

 εN ≥ c1·exp(−(1/β)·(ψ⁻¹(dN))²) − c2/N = c1·exp(−(1/β)·bN²) − c2/N = aN.

Since the sequence (εN)N∈N is decreasing we hence obtain for all N ≤ N0

 εN≥εN0≥aN0.

Using the assumption that the sequence (aN)N∈N is strictly decreasing we therefore conclude that for all N ∈ N

 εN ≥ min{1, aN0/aN}·aN ≥ (aN0/a1)·aN,

which completes the proof of the corollary with κ = aN0/a1. ∎

## 6. Proof of Theorem 1

Let δ ∈ (0,T), N ∈ N and let ˆX ∈ XδN be given by sequences φ, χ and ϕ of measurable mappings, see (5). Recall the definition (6) of ν. We first determine the regular conditional distribution PW|Dν.

For n ∈ N put

 Sn={s∈(0,δ)n:|{s1,…,sn}|=n}.

For n ∈ N, s ∈ Sn, y ∈ Rn and v ∈ C([δ,T]) define functions

 ms,y,v:[0,T]→R and Rs:[0,T]2→R

as follows. If s1 < … < sn, put s0 = 0, sn+1 = δ, y0 = 0 and yn+1 = v(δ), and let

 ms,y,v(t)={sk−tsk−sk−1⋅yk−1+t−sk−1sk−sk−1⋅yk,if t∈[sk−1,sk) for k∈{1,…,n+1},v(t),if t∈[δ,T]

as well as

 Rs(r,t)={(sk−max(r,t))⋅(min(r,t)−sk−1)sk−sk−1,if r,t∈[sk−1,sk) for k∈{1,…,n+1},0,otherwise,

for r, t ∈ [0,T]. Otherwise put

 ms,y,v=m(sπ(1),…,sπ(n)),(yπ(1),…,yπ(n)),v,Rs=R(sπ(1),…,sπ(n)),

where π is the permutation of {1,…,n} such that sπ(1) < … < sπ(n).

For n ∈ N, k ∈ {1,…,n}, y ∈ Rn and v ∈ C([δ,T]) put

 (11) sy,vk=φk(y1,…,yk−1,v).

Note that due to the assumption (7) we have sy,v = (sy,v1, …, sy,vn) ∈ Sn. Let Qy,v denote the Gaussian measure on (C([0,T]), B(C([0,T]))) with mean msy,v,y,v and covariance function Rsy,v. Consider the measurable space

 (12) (Ω1, F1) = (⋃n∈N (Rn × C([δ,T])), σ(⋃n∈N B(Rn × C([δ,T]))))

It is easy to see that Dν is F–F1 measurable. Define the mapping

 K:Ω1×B(C([0,T]))→[0,1]

by

 K((y,v),A)=Qy,v(A)

for all (y,v) ∈ Ω1 and all A ∈ B(C([0,T])).

###### Lemma 1.

The mapping K is a version of the regular conditional distribution PW|Dν.

In the case of the statement of Lemma 1 seems to be well-known, see, e.g., [15, 28, 29, 30], but a proof of it seems not to be available in the literature. If, additionally, is constant then Lemma 1 follows from Lemma 2.9.7 in [36, p. 474], but measurability issues have not been fully addressed in the proof of the latter result. For convenience of the reader we therefore provide a proof of Lemma 1 here.

###### Proof.

Clearly, for all (y,v) ∈ Ω1 the mapping

 B(C([0,T]))∋A↦K((y,v),A)∈[0,1]

is a probability measure on (C([0,T]), B(C([0,T]))).

Next, let A ∈ B(C([0,T])). We show that the mapping

 (13) Ω1∋(y,v)↦K((y,v),A)∈[0,1]

is F1–B([0,1]) measurable. For n ∈ N, s ∈ Sn and u ∈ C([0,δ]) define a function

 Fs,u:[0,T]→R

by

 Fs,u(t)={u(t)−ms,(u(s1),…,u(sn)),0(t),if t∈[0,δ],0,if t∈(δ,T]

for t ∈ [0,T]. It is easy to see that for all (y,v) ∈ Ω1

 Qy,v=PFsy,v,(W(t))t∈[0,δ]+msy,v,y,v,

and therefore for all (y,v) ∈ Ω1

 (14) K((y,v),A)=∫C([0,δ])1A(Fsy,v,u+msy,v,y,v)P(W(t))t∈[0,δ](du).

Clearly,

 {((y,v),u)∈Ω1×C([0,δ]):Fsy,v,u+msy,v,y,v∈A}=∞⋃n=1An,

where

 An={(y,v,u)∈Rn×C([δ,T])×C([0,δ]):Fsy,v,u+msy,v,y,v∈A}

for n ∈ N. The measurability of the functions φk, k ∈ N, implies that for every n ∈ N the mapping

 Rn×C([δ,T])∋(y,v)↦sy,v∈Sn

is measurable. Thus, observing that for every n ∈ N the mappings

 Sn×C([0,δ])∋(s,u)↦Fs,u∈C([0,T])

and

 Sn×Rn×C([δ,T])∋(s,y,v)↦ms,y,v∈C([0,T])

are continuous, we conclude that for every n ∈ N the mapping

 Rn×C([δ,T])×C([0,δ])∋(y,v,u)↦Fsy,v,u+msy,v,y,v∈C([0,T])

is measurable. Hence for every n ∈ N

 An∈B(Rn×C([δ,T])×C([0,δ]))⊂F1⊗B(C([0,δ])),

which implies that the mapping

 Ω1×C([0,δ])∋((y,v),u)↦1A(Fsy,v,u+msy,v,y,v)∈R

is F1⊗B(C([0,δ]))–B(R) measurable. Using (14) and employing Fubini’s theorem we thus conclude that the mapping (13) is F1–B([0,1]) measurable.

Finally, let A ∈ B(C([0,T])) and E ∈ F1. We show that

 (15) P({W∈A}∩{Dν∈E})=∫EK((y,v),A)PDν(d(y,v)).

We have

 {Dν∈E}∩{ν<∞} =∞⋃n=1({Dν∈E}∩{ν=n}) =∞⋃n=1({Dn∈E}∩{χn(Dn)=1}∩n−1⋂k=1{χk(Dk)=0}) =∞⋃n=1{Dn∈E∩Cn},

where Cn ⊆ Rn × C([δ,T]) is given by

 Cn=χ−1n({1})∩n−1⋂k=1{(y,v)∈Rn×C([δ,T]):(y1,…,yk,v)∈χ−1k({0})}

for n ∈ N. Since ν < ∞ a.s. we thus obtain

 P({W∈A}∩{Dν∈E}) =∞∑n=1P({W∈A}∩{Dn∈E∩Cn}) =∞∑n=1∫E∩CnPW|Dn=(y,v)(A)PDn(d(y,v)).

Similarly to the proof of Lemma 2.9.7 on page 474 in [36] one can show that for all n ∈ N and all G ∈ B(Rn × C([δ,T]))

 ∫GPW|Dn=(y,v)(A)PDn(d(y,v))=∫GQy,v(A)PDn(d(y,v)).

Hence

 P({W∈A}∩{Dν∈E}) =∞∑n=1∫E∩CnQy,v(A)PDn(d(y,v)) =∞∑n=1∫E∩CnK((y,v),A)PDn(d(y,v)).

Since PDν(B ∩ (Rn × C([δ,T]))) = PDn(B ∩ Cn) for all n ∈ N and all B ∈ F1, we thus conclude

 P({W∈A}∩{Dν∈E})=∞∑n=1∫E∩(Rn×C([δ,T]))K((y,v),A)PDν(d(y,v)),

which implies (15) and completes the proof of the lemma. ∎

Next, let n ∈ N, y ∈ Rn and v ∈ C([δ,T]). Recall the definition (11) of the time points sy,v1, …, sy,vn and put

 (16) sy,v0=0,sy,vn+1=δ.

Let be the permutation of such that . Let and put

 (17) t0=sy,vπ(i∗),t1=sy,vπ(