
# Optimal Strong Approximation of the One-dimensional Squared Bessel Process

## Abstract.

We consider the one-dimensional squared Bessel process given by the stochastic differential equation (SDE)

 dX_t = 1\,dt + 2\sqrt{X_t}\,dW_t,\qquad X_0 = x_0,\quad t \in [0,1],

and study strong (pathwise) approximation of the solution at the final time point $t=1$. This SDE is a particular instance of a Cox-Ingersoll-Ross (CIR) process where the boundary point zero is accessible. We consider numerical methods that have access to values of the driving Brownian motion at a finite number of time points. We show that the polynomial convergence rates of the $p$-th minimal errors for the class of adaptive algorithms and for the class of algorithms that rely on equidistant grids are equal to infinity and $1/2$, respectively. This shows that adaption results in a tremendously improved convergence rate. As a by-product, we obtain that the parameters appearing in the CIR process affect the convergence rate of strong approximation.

###### Key words and phrases:
Cox-Ingersoll-Ross process; strong approximation; $p$-th minimal error; adaptive algorithm; reflected Brownian motion

## 1. Introduction

In recent years, strong approximation of stochastic differential equations (SDEs) has intensively been studied for SDEs of the form

 (1) dX_t = (a - bX_t)\,dt + \sigma\sqrt{X_t}\,dW_t,\qquad X_0 = x_0,\quad t \ge 0,

with a one-dimensional Brownian motion $W$, initial value $x_0 \ge 0$, $a \ge 0$, $b \in \mathbb{R}$, and $\sigma > 0$. These SDEs are known to have a unique non-negative strong solution. Such SDEs were proposed in [11] as a model for short-term interest rates. The solution is called a Cox-Ingersoll-Ross (CIR) process. Moreover, CIR processes are used as the volatility process in the Heston model [18].

Strong approximation is of particular interest due to the multi-level Monte Carlo technique, see [14, 15, 17]. For an optimality result of this technique applied to quadrature problems of the form $\mathrm{E}(f(X))$ with a functional $f$ of the solution $X$, we refer to [12]. In mathematical finance, the functional $f$ often represents a discounted payoff of some derivative and $\mathrm{E}(f(X))$ is the corresponding price.

In [1], various numerical schemes have been proposed and numerically tested for the SDE (1) with different choices of the corresponding parameters. These numerical results indicate convergence at a polynomial rate, which depends on the parameters $a$, $b$, and $\sigma$. More precisely, the empirical convergence rate is monotonically decreasing in the quotient $\sigma^2/(2a)$ for all numerical schemes that have been tested. Polynomial convergence rates for strong approximation of (1) have been proven in [4, 13, 2, 25, 19], where either a global error or the error at the final time point w.r.t. the $L_p$-norm is studied. All these results only hold for some range of the quotient $\sigma^2/(2a)$ and share the same monotonicity, see Figure 1. For an overview of numerical schemes and results on strong convergence without a rate we refer to [13] and the references therein.

In this paper we consider the particular case of the SDE (1) with

 \sigma^2/(2a) = 2.

By a linear rescaling of the solution, we thus may restrict ourselves to SDEs of the form

 (2) dX_t = (1 - bX_t)\,dt + 2\sqrt{X_t}\,dW_t,\qquad X_0 = x_0,\quad t \ge 0,

with $x_0 \ge 0$ and $b \in \mathbb{R}$. Furthermore, we focus on the particular instance of (2) with $b = 0$, i.e.,

 (3) dX_t = 1\,dt + 2\sqrt{X_t}\,dW_t,\qquad X_0 = x_0,\quad t \ge 0.

Its solution is called the square of a one-dimensional Bessel process. For a detailed study of (squared) Bessel processes we refer to [26, Chap. XI].

In the context of (strong) approximation of SDEs, the majority of numerical methods in the literature are non-adaptive [22]. A non-adaptive algorithm uses a fixed discretisation of the driving Brownian motion whereas adaptive algorithms may sequentially choose the evaluation points. The most frequently studied methods in the class of non-adaptive algorithms are Euler or Milstein-type methods that are based on values of the driving Brownian motion on an equidistant grid. For various strong approximation problems of SDEs satisfying standard assumptions, adaption does not help up to a multiplicative constant, see, e.g., [24]. In particular, [23] showed for strong approximation of scalar SDEs at the final time point that no adaptive method can be better (up to a multiplicative constant) than the classical Milstein scheme. Let us stress that these standard assumptions are not fulfilled by (3) since the diffusion coefficient is not even locally Lipschitz continuous.

In contrast to that, the main result of this paper is that adaptive methods are far superior to methods that are based on an equidistant grid for strong approximation of the solution of (3). For this, we determine the polynomial convergence rate of the corresponding $p$-th minimal errors, which will be introduced below.

Let $X_1$ be the solution of (3) at time $t = 1$, and let $p \in [1,\infty)$. The error of an approximation $\hat{X}_1$ of $X_1$ is defined by

 (4) e_p(\hat{X}_1) = \bigl(\mathrm{E}\bigl(|X_1 - \hat{X}_1|^p\bigr)\bigr)^{1/p}.

At first, we consider the class of methods that only use values of the driving Brownian motion on an equidistant grid with $n$ points, given by

 C_{eq}(n) = \bigl\{\hat{X}_1 = \Phi(W_{1/n}, W_{2/n}, \dots, W_1)\colon\ \Phi\colon \mathbb{R}^n \to \mathbb{R}\ \text{Borel-measurable}\bigr\}.

The corresponding $p$-th minimal error for the approximation of $X_1$ is given by

 (5) e^{eq}_p(n) = \inf\bigl\{e_p(\hat{X}_1)\colon\ \hat{X}_1 \in C_{eq}(n)\bigr\}.

Roughly speaking, $e^{eq}_p(n)$ is the error of the best algorithm for the approximation of $X_1$ w.r.t. the $L_p$-norm that only uses $W_{1/n}, W_{2/n}, \dots, W_1$. Clearly, Euler and Milstein-type schemes fit into this class of algorithms. In the case $p = 2$, the optimal approximation is given by the conditional expectation of $X_1$ given the $\sigma$-algebra generated by $W_{1/n}, W_{2/n}, \dots, W_1$.

The class of adaptive methods that use values of the driving Brownian motion at sequentially chosen points is given by

 C_{ad}(n) = \bigl\{\hat{X}_1 = \Phi(W_{t_1}, \dots, W_{t_n})\colon\ \Phi\colon \mathbb{R}^n \to \mathbb{R}\ \text{Borel-measurable},\ t_1 \in [0,1],\ t_2 = \varphi_2(W_{t_1}),\ \varphi_2\colon \mathbb{R} \to [0,1]\ \text{Borel-measurable},\ \dots,\ t_n = \varphi_n(W_{t_1}, \dots, W_{t_{n-1}}),\ \varphi_n\colon \mathbb{R}^{n-1} \to [0,1]\ \text{Borel-measurable}\bigr\}.

Here, in contrast to the class $C_{eq}(n)$, the $k$-th evaluation site $t_k$ may depend on the previous observations $W_{t_1}, \dots, W_{t_{k-1}}$. Moreover, considering the particular choice of constant mappings $\varphi_k$ yields $C_{eq}(n) \subseteq C_{ad}(n)$ for all $n$. The $p$-th minimal error for the approximation of $X_1$ for the class of adaptive methods is given by

 (6) e^{ad}_p(n) = \inf\bigl\{e_p(\hat{X}_1)\colon\ \hat{X}_1 \in C_{ad}(n)\bigr\}.

We clearly have $e^{ad}_p(n) \le e^{eq}_p(n)$ for all $n$.

In the following we present our main results. For sequences $(a_n)$ and $(b_n)$ of non-negative reals we write $a_n \preccurlyeq b_n$ if there exists a constant $c > 0$ such that $a_n \le c\,b_n$ for all $n$. Moreover, we write $a_n \asymp b_n$ if $a_n \preccurlyeq b_n$ and $b_n \preccurlyeq a_n$.

We show that the polynomial convergence rate of the $p$-th minimal error $e^{eq}_p(n)$ is equal to $1/2$ for all $p \in [1,\infty)$. More precisely, Corollary 1 yields

 (7) e^{eq}_p(n) \asymp n^{-1/2}

for all $p \in [1,\infty)$. Of course, the constants hidden in the “$\asymp$”-notation may depend on $p$. Furthermore, the corresponding upper bound is attained by the drift-implicit Euler scheme $\hat{X}^{n,\mathrm{imp}}$, see Theorem 2 and (23). In the more general case of the SDE (2), the drift-implicit Euler scheme is given by

 (8) \hat{X}^{n,\mathrm{imp}}_{(k+1)/n} = \left(\frac{\sqrt{\hat{X}^{n,\mathrm{imp}}_{k/n}} + (W_{(k+1)/n} - W_{k/n}) + \sqrt{\Bigl(\sqrt{\hat{X}^{n,\mathrm{imp}}_{k/n}} + (W_{(k+1)/n} - W_{k/n})\Bigr)^2}}{2 + b/n}\right)^2

for $k = 0, \dots, n-1$ and $\hat{X}^{n,\mathrm{imp}}_0 = x_0$. Let us mention that the drift-implicit Euler scheme is actually proposed for the SDE (1) with parameters satisfying $\sigma^2/(2a) < 2$, see [1]. Nevertheless, it is still well defined in the limiting case $\sigma^2/(2a) = 2$. Note that the upper bound from (7) is the first strong convergence result with a positive rate in the case $\sigma^2/(2a) = 2$, cf. Figure 1.
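To make the recursion concrete, the following Python sketch implements the drift-implicit Euler scheme (8); this is our own illustration, not code from the paper, and the function name and parameter values are arbitrary. It uses the algebraically equivalent positive-part form of the positive root.

```python
import numpy as np

def drift_implicit_euler(x0, b, n, dW):
    """Drift-implicit Euler scheme (8) for dX = (1 - b*X) dt + 2*sqrt(X) dW
    on [0, 1], given the n Brownian increments dW over the equidistant grid."""
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        y = np.sqrt(x[k]) + dW[k]
        # positive root of the implicit equation; ((y + |y|) / (2 + b/n))^2
        # equals (max(y, 0) / (1 + b/(2n)))^2
        x[k + 1] = (max(y, 0.0) / (1.0 + 0.5 * b / n)) ** 2
    return x

rng = np.random.default_rng(0)
n = 1000
dW = rng.normal(0.0, np.sqrt(1.0 / n), size=n)
path = drift_implicit_euler(x0=1.0, b=0.5, n=n, dW=dW)
```

By construction, the iterates are non-negative for every $b > -2n$ and every realisation of the increments.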

For adaptive algorithms the situation is rather different. Corollary 2 shows that

 (9) e^{ad}_p(n) \preccurlyeq n^{-q}

for all $p \in [1,\infty)$ and for all $q \in [1,\infty)$. Hence the polynomial convergence rate of the $p$-th minimal error $e^{ad}_p(n)$ is equal to infinity. More precisely, for every $q$ we construct an adaptive algorithm that converges (at least) at a polynomial rate $q$, see Theorem 4. In fact, numerical experiments suggest an exponential decay of the error, see Figure 3. Moreover, such algorithms can be easily implemented on a computer with a number of operations of order $n^2$, see Remark 10. Combining (7) and (9) establishes our claim that adaptive algorithms are far superior to non-adaptive algorithms that are based on equidistant grids for strong approximation of (3). Let us stress that this is the first result on SDEs where adaption results in an improved convergence rate compared to methods that are based on equidistant grids.

A key step in the proofs of (7) and (9) consists of identifying the pathwise solution of (3), see Proposition 1, and linking this problem to global optimization under the Wiener measure. Let us mention that the analysis of the adaptive algorithm in Theorem 4 heavily relies on results from [9].

Although we have shown that adaptive algorithms are far superior to methods that are based on an equidistant grid for a particular choice of the parameters of SDE (1), it is open whether this superiority also holds for more general parameter constellations.

We now turn to the more general case of the SDE (2) with $b \in \mathbb{R}$. Moreover, we consider a stronger error criterion, which is pathwise given by the supremum norm. In this case we obtain

 (10) \Bigl(\mathrm{E}\Bigl(\sup_{0\le t\le 1}\bigl|X_t - \bar{X}^n_t\bigr|^p\Bigr)\Bigr)^{1/p} \preccurlyeq n^{-1/2}\cdot\sqrt{\ln(1+n)}

for all $p \in [1,\infty)$, where $\bar{X}^n$ denotes a projected equidistant Euler scheme, see Remark 7. This scheme coincides with the drift-implicit Euler scheme for $b = 0$. Let us stress that this error bound is the first strong convergence result with a positive rate in the case $\sigma^2/(2a) = 2$ and arbitrary $b \in \mathbb{R}$, cf. Figure 1. At present, we have only shown the upper bound (10). Nevertheless, we expect this upper bound to be sharp even for adaptive algorithms.

Let us briefly comment on some consequences of the results presented above for strong approximation of CIR processes.

It is well-known that the parameters $a$, $b$, and $\sigma$ in SDE (1) have an influence on the behavior of its solution. For instance, the solution remains strictly positive (the boundary point $0$ is inaccessible) if and only if the so-called Feller condition $\sigma^2/(2a) \le 1$ is satisfied. As illustrated in Figure 1, the drift-implicit Euler scheme converges at least with rate $1$ if the quotient $\sigma^2/(2a)$ is sufficiently small, see [2, 25], and so does the corresponding $p$-th minimal error for methods using an equidistant grid. Hence the quotient $\sigma^2/(2a)$ affects the convergence rate of the $p$-th minimal error for equidistant methods, since this rate drops down to $1/2$ for $\sigma^2/(2a) = 2$ according to (7).

In contrast to the known upper bounds, cf. Figure 1, the convergence rate of the drift-implicit Euler scheme for (3) does not depend on the parameter $p$ of the $L_p$-norm appearing in the error criterion, see (7).

Let us comment on lower bounds for strong approximation of SDEs at the final time point based on values of the driving Brownian motion. In [10], a two-dimensional SDE is presented for which the corresponding convergence rate is shown to be $1/2$. In contrast to the rate $1$ for smooth scalar SDEs [23], the difficulty in [10] arises from the presence of Lévy areas. More recently, the existence of SDEs with smooth coefficients has been shown for which the corresponding $p$-th minimal error converges arbitrarily slowly to zero, see [16, 20]. It is crucial that these SDEs are multi-dimensional. Apart from (7), we are not aware of any other lower bound with convergence rate less than $1$ for a scalar SDE.

This paper is organized as follows. In Section 2 we derive an explicit representation of the solution of (3) and of the more general SDE (2). Using this representation we show sharp upper and lower bounds for $e^{eq}_p(n)$ in Section 3. In Section 4 we consider a particular adaptive method that achieves an arbitrarily high polynomial convergence rate. Finally, we illustrate our results by numerical experiments.

## 2. Squared Bessel Process of Dimension One

In this section, we will derive an explicit expression for the strong solution of (3) by using basic results about reflected SDEs. Subsequently, we will extend this technique to the more general case of SDE (2).

In the following let $(\Omega, \mathcal{F}, \mathrm{P})$ be a complete probability space and let $(\mathcal{F}_t)_{t\ge0}$ be a filtration on this space satisfying the usual conditions.

Given $x_0 \ge 0$ and a Brownian motion $B$ w.r.t. $(\mathcal{F}_t)_{t\ge0}$, we define

 (11) W_t = \int_0^t \operatorname{sgn}\bigl(B_s + \sqrt{x_0}\bigr)\,dB_s

for all $t \ge 0$, where $\operatorname{sgn}(x) = 1$ for $x > 0$ and $\operatorname{sgn}(x) = -1$ for $x \le 0$. Then, $W$ is a Brownian motion w.r.t. $(\mathcal{F}_t)_{t\ge0}$. Indeed, the quadratic variation of $W$ satisfies

 [W]_t = \int_0^t \operatorname{sgn}\bigl(B_s + \sqrt{x_0}\bigr)^2\,ds = \int_0^t 1\,ds = t,

and thus Lévy’s characterization can be applied. Now, consider the SDE (3) where the driving Brownian motion has the particular form given in (11). Due to this construction of $W$, we see that the solution of (3) is given by

 (12) X_t = (B_t + \sqrt{x_0})^2,

since $\sqrt{X_s} = |B_s + \sqrt{x_0}|$ and hence

 2\int_0^t \sqrt{X_s}\,dW_s = 2\int_0^t \bigl|B_s + \sqrt{x_0}\bigr|\cdot\operatorname{sgn}\bigl(B_s + \sqrt{x_0}\bigr)\,dB_s = 2\int_0^t (B_s + \sqrt{x_0})\,dB_s = B_t^2 - t + 2\sqrt{x_0}B_t.

Moreover, Tanaka’s formula [21, Prop. III.6.8] given by

 |B_t - a| = |a| + \int_0^t \operatorname{sgn}(B_s - a)\,dB_s + 2L^B_t(a),\qquad a \in \mathbb{R},

where $L^B(a)$ denotes the local time of $B$ in $a$, yields for $a = -\sqrt{x_0}$ that

 \bigl|B_t + \sqrt{x_0}\bigr| = \sqrt{x_0} + W_t + 2L^B_t(-\sqrt{x_0}) = \sqrt{x_0} + W_t + \max\Bigl(0,\ \sup_{0\le s\le t} -\bigl(\sqrt{x_0} + W_s\bigr)\Bigr),

where the second equality follows from Skorokhod’s lemma, see [21, Lem. III.6.14]. Finally, using $\max(0,-x) = x^-$ for $x \in \mathbb{R}$ leads to the solution of (3) given by

 (13) X_t = \Bigl((W_t + \sqrt{x_0}) + \Bigl(\inf_{0\le s\le t} W_s + \sqrt{x_0}\Bigr)^-\Bigr)^2,

where $x^-$ denotes the negative part of $x \in \mathbb{R}$. We stress that the explicit solution (13) of the SDE (3) holds for any Brownian motion $W$ and does not depend on the particular construction given in (11), see [21, Cor. V.3.23], since pathwise uniqueness and strong existence hold for the SDE (3). Hence the unique strong solution of the SDE (3) is given by (13).
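Since (13) involves only the current value and the running infimum of $W$, it can be evaluated along a discretised Brownian path in a few lines. The following Python sketch is our own illustration (function name and grid size are arbitrary):

```python
import numpy as np

def squared_bessel_from_bm(W, x0):
    """Pathwise solution (13): X_t = ((W_t + sqrt(x0)) + (inf_{s<=t} W_s + sqrt(x0))^-)^2,
    evaluated along a discretised Brownian path W with W[0] = 0."""
    r = np.sqrt(x0)
    running_inf = np.minimum.accumulate(W)
    neg_part = np.maximum(0.0, -(running_inf + r))  # x^- = max(0, -x)
    return ((W + r) + neg_part) ** 2

rng = np.random.default_rng(1)
n = 10_000
W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n), n))))
X = squared_bessel_from_bm(W, x0=0.25)
```

Note that the resulting path is non-negative by construction, in line with the non-negativity of the solution.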

###### Remark 1.

It is well-known that the solution of the SDE (3) can be expressed in terms of $B$ as in (12), see, e.g., [26, Ex. IX.3.16]. However, we are not aware of a result regarding the explicit form of the strong solution given by (13).

In the context of SDEs, the somewhat explicit solution of $X$ by means of $B$ in (12) is rather useless for strong approximation. The concept of strong solutions entails a functional dependence of the solution process on the driving Brownian motion appearing in the SDE, which is $W$ in our case. We thus seek to “construct” the solution $X$ out of $W$.

###### Remark 2.

Equation (12) clearly yields

 (X_t)_{t\ge0} \stackrel{d}{=} \bigl((W_t + \sqrt{x_0})^2\bigr)_{t\ge0},

cf. [21, Thm. III.6.17]. However, this equation is only valid in the distributional sense and does not hold pathwise. Note that the solution hits $0$ exactly when the Brownian path attains its running minimum below the level $-\sqrt{x_0}$. More precisely, we have

 (14) X_t = 0 \iff W_t \le -\sqrt{x_0}\ \wedge\ W_t = \inf_{0\le s\le t} W_s,

cf. Figure 2.

###### Remark 3.

The expression

 (W_t + \sqrt{x_0}) + \Bigl(\inf_{0\le s\le t} W_s + \sqrt{x_0}\Bigr)^-

appearing in (13) is known as a reflected Brownian motion (with reflecting barrier at $0$) starting in $\sqrt{x_0}$, see [29].

We now turn to the more general case of SDE (2) with arbitrary $b \in \mathbb{R}$.

###### Proposition 1.

The unique strong solution of the SDE (2) is given by

 X_t = \Bigl(u_t + e^{-\frac{b}{2}t}\Bigl(\inf_{0\le s\le t} e^{\frac{b}{2}s}u_s\Bigr)^-\Bigr)^2,

where $u$ denotes the unique strong solution of the SDE

 (15) du_t = -\tfrac{b}{2}u_t\,dt + dW_t,\qquad u_0 = \sqrt{x_0},\quad t \ge 0.

In particular, the unique strong solution of the SDE (3) is given by (13).

###### Remark 4.

Note that the solution to the linear SDE (15) is called an Ornstein-Uhlenbeck process and is given explicitly by

 (16) u_t = e^{-\frac{b}{2}t}\Bigl(\sqrt{x_0} + \int_0^t e^{\frac{b}{2}s}\,dW_s\Bigr),\qquad t \ge 0,

see, e.g., [21, Ex. V.6.8].

###### Proof of Proposition 1.

Analogous to the above derivation, we start with a Brownian motion $B$ and consider the SDE

 du^B_t = -\tfrac{b}{2}u^B_t\,dt + dB_t,\qquad u^B_0 = \sqrt{x_0},\quad t \ge 0.

Moreover, we assume that the Brownian motion $W$ appearing in (2) has the particular form

 W_t = \int_0^t \operatorname{sgn}(u^B_s)\,dB_s,\qquad t \ge 0.

Then, Itô’s formula shows

 d\bigl((u^B_t)^2\bigr) = \bigl(1 - b(u^B_t)^2\bigr)\,dt + 2u^B_t\,dB_t = \bigl(1 - b(u^B_t)^2\bigr)\,dt + 2\bigl|u^B_t\bigr|\,dW_t.

Hence the solution of SDE (2) is given by

 (17) X_t = (u^B_t)^2,\qquad t \ge 0.

On the other hand, the Tanaka-Meyer formula [21, Thm. III.7.1(v)] applied to

 \tilde{u}^B_t = u^B_t e^{\frac{b}{2}t}

combined with the explicit expression (16) yields

 \bigl|\tilde{u}^B_t\bigr| = \sqrt{x_0} + \int_0^t \operatorname{sgn}(\tilde{u}^B_s)\,d\tilde{u}^B_s + 2\Lambda^{\tilde{u}^B}_t(0) = \sqrt{x_0} + \int_0^t e^{\frac{b}{2}s}\,dW_s + 2\Lambda^{\tilde{u}^B}_t(0),

where $\Lambda^{\tilde{u}^B}(0)$ denotes the semimartingale local time of $\tilde{u}^B$ at $0$. Thus, Skorokhod’s lemma [21, Lem. III.6.14] shows

 \bigl|\tilde{u}^B_t\bigr| = \sqrt{x_0} + \int_0^t e^{\frac{b}{2}s}\,dW_s + \Bigl(\inf_{0\le s\le t}\Bigl(\sqrt{x_0} + \int_0^s e^{\frac{b}{2}u}\,dW_u\Bigr)\Bigr)^-,

and consequently

 \bigl|u^B_t\bigr| = u_t + e^{-\frac{b}{2}t}\Bigl(\inf_{0\le s\le t} e^{\frac{b}{2}s}u_s\Bigr)^-.

It remains to apply [21, Cor. V.3.23]. ∎

###### Remark 5.

Consider the situation of the proof of Proposition 1. The Tanaka-Meyer formula applied to $u^B$ yields

 \bigl|u^B_t\bigr| = \sqrt{x_0} - \tfrac{b}{2}\int_0^t u^B_s\operatorname{sgn}(u^B_s)\,ds + \int_0^t \operatorname{sgn}(u^B_s)\,dB_s + 2\Lambda^{u^B}_t(0) = \sqrt{x_0} - \tfrac{b}{2}\int_0^t \bigl|u^B_s\bigr|\,ds + \int_0^t dW_s + 2\Lambda^{u^B}_t(0).

Hence $|u^B|$ is the solution of the reflected SDE on the domain $[0,\infty)$ given by

 (18) dZ_t = -\tfrac{b}{2}Z_t\,dt + dW_t + dK_t,\qquad Z_0 = \sqrt{x_0},\quad t \ge 0,

where $K$ is a process of bounded variation whose variation increases only when $Z_t = 0$. For details on reflected SDEs we refer to [29]. In view of (17), we can express the solution to the SDE (2) by

 X_t = (Z_t)^2,\qquad t \ge 0.

## 3. Equidistant Methods for SDE (3)

As shown in the previous section, the SDE (3) admits the explicit solution (13). This immediately links the problem of approximating SDE (3) to global optimization under the Wiener measure. We refer to [27] for results on global optimization under the Wiener measure.

In this section we show sharp (up to constants) upper and lower bounds for $e^{eq}_p(n)$ for all $p \in [1,\infty)$. Recall that $e^{eq}_p(n)$ denotes the $p$-th minimal error corresponding to the approximation of SDE (3) by algorithms that are based on equidistant grids, see (5). We show that the convergence rate of $e^{eq}_p(n)$ is equal to $1/2$. Moreover, we show that the drift-implicit Euler scheme converges with the optimal rate $1/2$.

For the proofs we exploit results from [3] on the asymptotic error distribution of the infimum of a Brownian motion approximated by equidistant points on the unit interval. For $n \in \mathbb{N}$ we define

 (19) \delta_n = \sqrt{n}\cdot\Bigl(\min_{0\le k\le n} W_{k/n} - \inf_{0\le s\le 1} W_s\Bigr),

where $W$ denotes a standard Brownian motion. The following result is due to [3, Thm. 1 and Lem. 6].

###### Theorem 1 ([3]).
1. The sequence $(\delta_n)_{n\in\mathbb{N}}$ converges in distribution.

2. For all $p \in [1,\infty)$ the sequence $(\delta_n^p)_{n\in\mathbb{N}}$ is uniformly integrable. In particular, for all $p \in [1,\infty)$ we have

 (20) \Bigl(\mathrm{E}\Bigl(\bigl|\min_{0\le k\le n} W_{k/n} - \inf_{0\le s\le 1} W_s\bigr|^p\Bigr)\Bigr)^{1/p} \preccurlyeq n^{-1/2}.
###### Remark 6.

In [3], the limiting distribution of the sequence $(\delta_n)_{n\in\mathbb{N}}$ is given explicitly by means of three-dimensional Bessel processes.

Recall that the solution of (3) at time $1$ is given by $X_1 = Y_1^2$ with

 (21) Y_1 = (W_1 + \sqrt{x_0}) + \Bigl(\inf_{0\le s\le 1} W_s + \sqrt{x_0}\Bigr)^-.

Moreover, for $n \in \mathbb{N}$ we define the approximation $\hat{X}^{(n)}_1$ of $X_1$ by

 \hat{X}^{(n)}_1 = \bigl(\hat{Y}^{(n)}_1\bigr)^2,

where

 \hat{Y}^{(n)}_1 = (W_1 + \sqrt{x_0}) + \Bigl(\min_{0\le k\le n} W_{k/n} + \sqrt{x_0}\Bigr)^-

serves as an approximation of $Y_1$. Here, the global infimum is simply replaced by the discrete minimum over the equidistant knots.
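As a hedged numerical illustration (ours, not from the paper), the following Python sketch estimates the mean error of this discrete-minimum approximation by Monte Carlo, using a much finer grid as a proxy for the exact value; the observed mean error should be of order $n^{-1/2}$, in line with Theorem 2 below.

```python
import numpy as np

def xhat(W_grid, x0):
    """X_1-approximation from observations of W on a grid: the global
    infimum in (13) is replaced by the minimum over the observed values."""
    r = np.sqrt(x0)
    return ((W_grid[-1] + r) + max(0.0, -(W_grid.min() + r))) ** 2

rng = np.random.default_rng(2)
fine, n, reps = 4096, 64, 2000
err = 0.0
for _ in range(reps):
    W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / fine), fine))))
    x_ref = xhat(W, 0.0)               # fine grid as a proxy for X_1
    x_n = xhat(W[:: fine // n], 0.0)   # n + 1 equidistant observations
    err += abs(x_ref - x_n)
err /= reps
```

With $x_0 = 0$ and $n = 64$ the resulting mean error is a small fraction of the typical magnitude of $X_1$ itself.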

The following upper bound is a consequence of Theorem 1(ii).

###### Theorem 2.

For every $p \in [1,\infty)$ we have

 e_p\bigl(\hat{X}^{(n)}_1\bigr) \preccurlyeq n^{-1/2}.

In particular, the $p$-th minimal error satisfies

 e^{eq}_p(n) \preccurlyeq n^{-1/2}.
###### Proof.

At first, note that $0 \le \hat{Y}^{(n)}_1 \le Y_1$ and

 \bigl|Y_1 - \hat{Y}^{(n)}_1\bigr| = \Bigl(\inf_{0\le s\le 1} W_s + \sqrt{x_0}\Bigr)^- - \Bigl(\min_{0\le k\le n} W_{k/n} + \sqrt{x_0}\Bigr)^- \le \min_{0\le k\le n} W_{k/n} - \inf_{0\le s\le 1} W_s.

Hence we get

 \bigl|X_1 - \hat{X}^{(n)}_1\bigr| = \bigl(Y_1 + \hat{Y}^{(n)}_1\bigr)\cdot\bigl|Y_1 - \hat{Y}^{(n)}_1\bigr| \le 2Y_1\cdot\Bigl(\min_{0\le k\le n} W_{k/n} - \inf_{0\le s\le 1} W_s\Bigr).

Finally, the Cauchy-Schwarz inequality yields

 \mathrm{E}\Bigl(\bigl|X_1 - \hat{X}^{(n)}_1\bigr|^p\Bigr) \le 2^p\cdot\Bigl(\mathrm{E}\bigl(Y_1^{2p}\bigr)\cdot\mathrm{E}\Bigl(\bigl|\min_{0\le k\le n} W_{k/n} - \inf_{0\le s\le 1} W_s\bigr|^{2p}\Bigr)\Bigr)^{1/2} \preccurlyeq n^{-p/2}

due to (20) and

 \mathrm{E}(Y_1^r) < \infty,\qquad 1 \le r < \infty,

since $Y_1 \stackrel{d}{=} |W_1 + \sqrt{x_0}|$, see Remark 2. ∎

The natural extension of $\hat{X}^{(n)}_1$ to an approximation on the whole equidistant grid with mesh size $1/n$ is given by

 \hat{X}^{(n)}_{k/n} = \bigl(\hat{Y}^{(n)}_{k/n}\bigr)^2,\qquad k = 0, \dots, n,

where

 \hat{Y}^{(n)}_{k/n} = (W_{k/n} + \sqrt{x_0}) + \Bigl(\min_{0\le l\le k} W_{l/n} + \sqrt{x_0}\Bigr)^-,\qquad k = 0, \dots, n.

Let us stress that this scheme can be expressed by the following Euler-type scheme

 (22) \hat{Y}^{(n)}_{(k+1)/n} = \max\Bigl(\hat{Y}^{(n)}_{k/n} + (W_{(k+1)/n} - W_{k/n}),\,0\Bigr),\qquad k = 0, \dots, n-1,

and $\hat{Y}^{(n)}_0 = \sqrt{x_0}$. In the context of reflected SDEs, scheme (22) is simply the projected Euler scheme for a reflected Brownian motion. Due to

 \max(x,0) = \frac{x + |x|}{2} = \frac{x + \sqrt{x^2}}{2},\qquad x \in \mathbb{R},

the drift-implicit Euler scheme (8) for $b = 0$ reads

 (23) \hat{X}^{n,\mathrm{imp}}_{(k+1)/n} = \Bigl(\max\Bigl(\sqrt{\hat{X}^{n,\mathrm{imp}}_{k/n}} + (W_{(k+1)/n} - W_{k/n}),\,0\Bigr)\Bigr)^2,\qquad k = 0, \dots, n-1.

Thus it coincides with the squared version of (22), i.e.,

 \hat{X}^{n,\mathrm{imp}}_{k/n} = \hat{X}^{(n)}_{k/n},\qquad k = 0, \dots, n.
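This identity is easy to check numerically. The following short Python sketch (our own) runs the recursions (22) and (23) on the same Brownian increments and verifies that the squared iterates of (22) coincide with those of (23), up to floating-point round-off:

```python
import numpy as np

rng = np.random.default_rng(3)
n, x0 = 200, 0.5
dW = rng.normal(0.0, np.sqrt(1.0 / n), n)

y = np.sqrt(x0)  # scheme (22): projected Euler for reflected Brownian motion
x = x0           # scheme (23): drift-implicit Euler for b = 0
for k in range(n):
    y = max(y + dW[k], 0.0)
    x = max(np.sqrt(x) + dW[k], 0.0) ** 2
    assert abs(x - y * y) < 1e-9  # squared (22) coincides with (23)
```

The design point is that both recursions propagate the same non-negative square-root iterate, just on different scales (the process vs. its square).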
###### Remark 7.

We briefly comment on results for the more general case of SDE (2) with arbitrary $b \in \mathbb{R}$. Moreover, we consider a stronger global error criterion where pathwise the global error is measured in the supremum norm. Up to a logarithmic factor we will obtain the same error bound as in Theorem 2.

For $n \in \mathbb{N}$ we denote by $\bar{Z}^n$ the projected Euler scheme with $n$ steps associated to the reflected SDE (18) up to time $1$, see [28]. More precisely, $\bar{Z}^n$ is defined by

 \bar{Z}^n_0 = \sqrt{x_0},\qquad \bar{Z}^n_{(k+1)/n} = \max\Bigl(\bar{Z}^n_{k/n} - \tfrac{b}{2}\cdot\bar{Z}^n_{k/n}\cdot\tfrac{1}{n} + (W_{(k+1)/n} - W_{k/n}),\,0\Bigr),\qquad k = 0, \dots, n-1,

and piecewise constant interpolation, i.e.,

 \bar{Z}^n_t = \bar{Z}^n_{k/n},\qquad t \in [k/n, (k+1)/n),\quad k = 0, \dots, n-1.

The solution to SDE (2) is then approximated by

 \bar{X}^n_t = \bigl(\bar{Z}^n_t\bigr)^2,\qquad t \in [0,1].

At the grid points, this scheme coincides with the drift-implicit Euler scheme (23) if $b = 0$. Similar to the proof of Theorem 2, we obtain

 \Bigl(\mathrm{E}\Bigl(\sup_{0\le t\le 1}\bigl|X_t - \bar{X}^n_t\bigr|^p\Bigr)\Bigr)^{1/p} \preccurlyeq n^{-1/2}\cdot\sqrt{\ln(1+n)}

for all $p \in [1,\infty)$ due to [28, Cor. 2.5, Cor. 2.6 and Thm. 3.2(i)] and Remark 5. Let us stress that this error bound is the first strong convergence result with a positive rate in the case $\sigma^2/(2a) = 2$ and arbitrary $b \in \mathbb{R}$, cf. Figure 1.
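A minimal Python sketch of the projected Euler scheme from this remark (our own illustration; the function name, step number, seed, and parameter values are arbitrary):

```python
import numpy as np

def projected_euler_sq(x0, b, n, dW):
    """Projected Euler scheme for the reflected SDE (18); squaring the
    reflected iterate yields the grid values of the approximation of
    SDE (2) described in Remark 7."""
    z = np.empty(n + 1)
    z[0] = np.sqrt(x0)
    for k in range(n):
        # Euler step for the drift -b/2 * Z, then projection onto [0, inf)
        z[k + 1] = max(z[k] - 0.5 * b * z[k] / n + dW[k], 0.0)
    return z ** 2

rng = np.random.default_rng(4)
n = 500
dW = rng.normal(0.0, np.sqrt(1.0 / n), n)
xbar = projected_euler_sq(x0=1.0, b=2.0, n=n, dW=dW)
```

For $b = 0$ the projection step reduces the recursion to scheme (22), consistent with the coincidence noted above.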

We now turn to the question whether an algorithm can do better than $\hat{X}^{(n)}_1$ in an asymptotic sense if this algorithm has the same information about the Brownian motion as $\hat{X}^{(n)}_1$.

Recall the definition of the $p$-th minimal error $e^{eq}_p(n)$ given in (5). The proof of the following theorem is postponed to Section 5.

###### Theorem 3.

For all $p \in [1,\infty)$ we have

 e^{eq}_p(n) \succcurlyeq n^{-1/2}.

Combining Theorem 2 and Theorem 3 yields the following asymptotic behavior of the $p$-th minimal error.

###### Corollary 1.

For all $p \in [1,\infty)$ we have

 e^{eq}_p(n) \asymp n^{-1/2}.

In particular, the drift-implicit Euler scheme (23) is asymptotically optimal.

###### Remark 8.

In Corollary 1 we obtain the same rate as in [27] for global optimization. In [27], the author studies optimal approximation of the time point where a Brownian motion attains its maximum, and provides a detailed analysis of general non-adaptive algorithms that do not necessarily rely on equidistant grids.

###### Remark 9.

If we allow for more information about the Brownian path than just point evaluations, the situation may change completely. For instance, if we consider algorithms that have access to the final value $W_1$ and to the infimum $\inf_{0\le s\le 1} W_s$ of the Brownian path, the problem of strong approximation of $X_1$ becomes trivial due to (13). Let us stress that the joint distribution of $\bigl(W_1, \inf_{0\le s\le 1} W_s\bigr)$ has an explicit representation by means of a Lebesgue density, see, e.g., [6, p. 154].

## 4. Adaptive Methods for SDE (3)

In this section we present an adaptive algorithm for the approximation of the solution of SDE (3) based on sequential observations of the Brownian motion $W$. In contrast to Section 3, here the evaluation points are chosen adaptively, i.e., the $k$-th evaluation site $t_k$ is a measurable function of the previous observations $W_{t_1}, \dots, W_{t_{k-1}}$. From Proposition 1 it is clear that the actual task consists of the approximation of the global infimum $\inf_{0\le s\le 1} W_s$. For this we use the adaptive algorithm from [9], see also [7, 8]. This algorithm approximates the global infimum by the discrete minimum $\min_{1\le k\le n} W_{t_k}$. In the following we describe the adaptive choice of the evaluation sites.

The first observation is non-adaptively chosen to be the endpoint, i.e., $t_1 = 1$. Moreover, for notational convenience we define $t_0 = 0$, so that $W_{t_0} = 0$.

Let $n \in \mathbb{N}$, and consider the step of the algorithm in which $t_{n+1}$ will be chosen based on the previous observations $W_{t_1}, \dots, W_{t_n}$. For this, we denote the ordered first evaluation sites by

 0 = t^{(n)}_0 < t^{(n)}_1 < \dots < t^{(n)}_n = 1

such that $\{t^{(n)}_0, \dots, t^{(n)}_n\} = \{t_0, \dots, t_n\}$. Moreover, we assume that we have made the following observations

 (24) W_{t^{(n)}_0} = y^{(n)}_0 = 0,\quad W_{t^{(n)}_1} = y^{(n)}_1,\ \dots,\ W_{t^{(n)}_n} = y^{(n)}_n,

and we denote the corresponding discrete minimum by

 m^{(n)} = \min_{0\le k\le n} y^{(n)}_k.

Conditioned on (24), we have independent Brownian bridges from $y^{(n)}_{k-1}$ to $y^{(n)}_k$ on the subintervals $[t^{(n)}_{k-1}, t^{(n)}_k]$ of length $T^{(n)}_k = t^{(n)}_k - t^{(n)}_{k-1}$, for $k = 1, \dots, n$. In the following we denote a Brownian bridge from $x$ to $y$ on $[0,T]$ with $T > 0$ and $x, y \in \mathbb{R}$ by $B^{x,T,y}$.

The basic idea of the adaptive algorithm is a simple greedy strategy: The next observation is taken at the midpoint of the subinterval where the probability that the corresponding Brownian bridge undershoots the current discrete minimum minus some threshold is maximal. More precisely, we split the interval according to

 (25) k^* = \operatorname*{argmax}_{1\le k\le n}\ \mathrm{P}\Bigl(\inf_{0\le s\le T^{(n)}_k} B^{y^{(n)}_{k-1},\,T^{(n)}_k,\,y^{(n)}_k}_s \le m^{(n)} - \varepsilon(n)\Bigr)

with a threshold $\varepsilon(n) > 0$ specified below, and evaluate $W$ at

 t_{n+1} = \bigl(t^{(n)}_{k^*} + t^{(n)}_{k^*-1}\bigr)/2.

We note that the infimum of a Brownian bridge satisfies

 \mathrm{P}\Bigl(\inf_{0\le s\le T} B^{x,T,y}_s \le m\Bigr) = \exp\Bigl(-\tfrac{2(x-m)(y-m)}{T}\Bigr)

for $m \le \min(x,y)$, see [6, p. 67]. Hence (25) reduces to maximizing a quotient, namely

 k^* = \operatorname*{argmax}_{1\le k\le n}\ \frac{T^{(n)}_k}{\bigl(y^{(n)}_{k-1} - m^{(n)} + \varepsilon(n)\bigr)\cdot\bigl(y^{(n)}_k - m^{(n)} + \varepsilon(n)\bigr)}.

Finally, we have to specify the threshold $\varepsilon(n)$. This threshold is chosen to be

 \varepsilon(n) = \sqrt{\lambda\,h(n)\ln(1/h(n))},

where $h(n)$ denotes the length of the smallest subinterval at step $n$ and where $\lambda > 0$ is some prespecified parameter. Let us stress that all of the above (adaptive) quantities depend on the choice of the parameter $\lambda$, although this is not explicitly indicated.

This amounts to a family of adaptive algorithms $\hat{X}^{ad,\lambda}_n$ defined by

 (26) \hat{X}^{ad,\lambda}_n = \Bigl((W_1 + \sqrt{x_0}) + \Bigl(\min_{1\le k\le n} W_{t_k} + \sqrt{x_0}\Bigr)^-\Bigr)^2

for the approximation of the solution (13) at time $1$, where the adaptively chosen points $t_1, \dots, t_n$ depend on the prespecified choice of $\lambda$.
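For illustration, the greedy strategy can be sketched in Python as follows. This is our own reading of the algorithm from [9]: the function name, the tiny floor on the threshold (which avoids a division by zero in the first step, where $h = 1$), and the sampling of new values of $W$ from the exact conditional (Brownian bridge) distribution are our additions.

```python
import numpy as np

def adaptive_discrete_min(n, lam, rng):
    """Sample W at n sequentially chosen sites (t_1 = 1 chosen first) and
    return the discrete minimum, following the greedy rule (25) with
    threshold eps = sqrt(lam * h * ln(1/h))."""
    ts = [0.0, 1.0]
    ys = [0.0, float(rng.normal())]   # W_0 = 0 and W_{t_1} = W_1
    for _ in range(n - 1):
        m = min(ys)
        h = min(b - a for a, b in zip(ts, ts[1:]))
        eps = max(np.sqrt(lam * h * np.log(1.0 / h)), 1e-9) if h < 1.0 else 1e-9
        # maximise the bridge undershoot probability (25), equivalently
        # T_k / ((y_{k-1} - m + eps) * (y_k - m + eps))
        k = max(range(len(ts) - 1),
                key=lambda i: (ts[i + 1] - ts[i])
                / ((ys[i] - m + eps) * (ys[i + 1] - m + eps)))
        t = 0.5 * (ts[k] + ts[k + 1])
        # conditional law of W_t given the two neighbouring observations
        mean = ys[k] + (ys[k + 1] - ys[k]) * (t - ts[k]) / (ts[k + 1] - ts[k])
        var = (t - ts[k]) * (ts[k + 1] - t) / (ts[k + 1] - ts[k])
        ts.insert(k + 1, t)
        ys.insert(k + 1, float(mean + np.sqrt(var) * rng.normal()))
    return min(ys)

m_hat = adaptive_discrete_min(200, lam=1.0, rng=np.random.default_rng(5))
```

The returned discrete minimum is then plugged into (13) in place of the global infimum to obtain the adaptive approximation of $X_1$.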

###### Remark 10.

A straightforward implementation of the algorithm (26) on a computer requires a number of operations of order $n^2$, since the threshold $\varepsilon(n)$, and with it the maximization in (25), has to be recomputed in each step.

The following result is an immediate consequence of [9, Thm. 1].

###### Theorem 4.

For all $p \in [1,\infty)$ and for all $q \in [1,\infty)$ there exists $\lambda > 0$ such that

 e_p\bigl(\hat{X}^{ad,\lambda}_n\bigr) \preccurlyeq n^{-q}.

###### Remark 11.

The analysis in [9] shows that

 \lambda \ge 144\cdot(1 + 2pq)

is sufficient to obtain a convergence order $q$ for the $L_p$-norm in Theorem 4. However, numerical experiments indicate an exponential decay of the error even for small values of $\lambda$, see Figure 3.

Recall the definition of the $p$-th minimal error $e^{ad}_p(n)$ given in (6).

###### Corollary 2.

For all $p \in [1,\infty)$ and for all $q \in [1,\infty)$ we have

 e^{ad}_p(n) \preccurlyeq n^{-q}.