Optimal Strong Approximation of the One-Dimensional Squared Bessel Process
Abstract.
We consider the one-dimensional squared Bessel process given by the stochastic differential equation (SDE)
$dX_t = dt + 2\sqrt{X_t}\,dW_t, \quad X_0 = x_0,$
and study strong (pathwise) approximation of the solution $X$ at the final time point $t = 1$. This SDE is a particular instance of a Cox-Ingersoll-Ross (CIR) process where the boundary point zero is accessible. We consider numerical methods that have access to values of the driving Brownian motion $W$ at a finite number of time points. We show that the polynomial convergence rates of the $n$th minimal errors for the class of adaptive algorithms as well as for the class of algorithms that rely on equidistant grids are equal to infinity and $1/2$, respectively. This shows that adaption results in a tremendously improved convergence rate. As a by-product, we obtain that the parameters appearing in the CIR process affect the convergence rate of strong approximation.
Key words and phrases:
Cox-Ingersoll-Ross process; strong approximation; $n$th minimal error; adaptive algorithm; reflected Brownian motion

1. Introduction
In recent years, strong approximation of stochastic differential equations (SDEs) has been studied intensively for SDEs of the form
(1) $dX_t = (a - b X_t)\,dt + \sigma\sqrt{X_t}\,dW_t, \quad X_0 = x_0,$
with a one-dimensional Brownian motion $W$, and $x_0, a \in [0,\infty)$, $b \in \mathbb{R}$, and $\sigma \in (0,\infty)$. These SDEs are known to have a unique nonnegative strong solution. Such SDEs were proposed in [11] as a model for short-term interest rates. The solution is called a Cox-Ingersoll-Ross (CIR) process. Moreover, CIR processes are used as the volatility process in the Heston model [18].
Strong approximation is of particular interest due to the multilevel Monte Carlo technique, see [14, 15, 17]. For an optimality result of this technique applied to quadrature problems of the form $\mathrm{E}[f(X)]$, we refer to [12]. In mathematical finance, the functional $f$ often represents a discounted payoff of some derivative and $\mathrm{E}[f(X)]$ is the corresponding price.
In [1], various numerical schemes have been proposed and numerically tested for the SDE (1) with different choices of the corresponding parameters. These numerical results indicate a convergence at a polynomial rate, which depends on the parameters . More precisely, the empirical convergence rate is monotonically decreasing in the quotient for all numerical schemes that have been tested. Polynomial convergence rates for strong approximation of (1) have been proven by [4, 13, 2, 25, 19], where either a global or the final time error w.r.t. the norm is studied. All these results only hold for some parameter range within and share the same monotonicity, see Figure 1. For an overview of numerical schemes and results on strong convergence without a rate we refer to [13] and the references therein.
In this paper we consider the particular case of the SDE (1) with $b = 0$.
By rescaling with $\sigma$, we thus may restrict ourselves to SDEs of the form
(2) $dX_t = \delta\,dt + 2\sqrt{X_t}\,dW_t, \quad X_0 = x_0,$
with $\delta \in [0,\infty)$ and $x_0 \in [0,\infty)$. Furthermore, we focus on the particular instance of (2) with $\delta = 1$, i.e.,
(3) $dX_t = dt + 2\sqrt{X_t}\,dW_t, \quad X_0 = x_0.$
Its solution is called the square of a one-dimensional Bessel process. For a detailed study of (squared) Bessel processes we refer to [26, Chap. XI].
In the context of (strong) approximation of SDEs, the majority of numerical methods in the literature are non-adaptive [22]. A non-adaptive algorithm uses a fixed discretisation of the driving Brownian motion, whereas adaptive algorithms may choose the evaluation points sequentially. The most frequently studied methods in the class of non-adaptive algorithms are Euler- or Milstein-type methods that are based on values of the driving Brownian motion on an equidistant grid. For various strong approximation problems of SDEs satisfying standard assumptions, adaption does not help up to a multiplicative constant, see, e.g., [24]. In particular, [23] showed for strong approximation of scalar SDEs at the final time point that no adaptive method can be better (up to a multiplicative constant) than the classical Milstein scheme. Let us stress that these standard assumptions are not fulfilled by (3) since the diffusion coefficient is not even locally Lipschitz continuous.
In contrast to that, the main result of this paper is that adaptive methods are far superior to methods that are based on an equidistant grid for strong approximation of the solution of (3). For this, we determine the polynomial convergence rate of the corresponding $n$th minimal errors, which will be introduced below.
Let $X_1$ be the solution of (3) at time $1$, and let $p \in [1,\infty)$. The error of an approximation $\hat X$ of $X_1$ is defined by
(4) $e_p(\hat X) = \bigl(\mathrm{E}\bigl[|X_1 - \hat X|^p\bigr]\bigr)^{1/p}.$
At first, we consider the class of methods that only use values of the driving Brownian motion on an equidistant grid with $n$ points, given by
$\mathfrak{X}^{\mathrm{eq}}_n = \bigl\{\varphi(W_{1/n}, W_{2/n}, \dots, W_1) : \varphi\colon \mathbb{R}^n \to \mathbb{R} \text{ measurable}\bigr\}.$
The corresponding $n$th minimal error for the approximation of $X_1$ is given by
(5) $e^{\mathrm{eq}}_n(p) = \inf\bigl\{e_p(\hat X) : \hat X \in \mathfrak{X}^{\mathrm{eq}}_n\bigr\}.$
Roughly speaking, $e^{\mathrm{eq}}_n(p)$ is the error of the best algorithm for the approximation of $X_1$ w.r.t. the $L_p$ norm that only uses $W_{1/n}, \dots, W_1$. Clearly, Euler- and Milstein-type schemes fit into this class of algorithms. In the case $p = 2$, the optimal approximation is given by the conditional expectation of $X_1$ given the $\sigma$-algebra generated by $W_{1/n}, \dots, W_1$.
The class of adaptive methods that use values of the driving Brownian motion at $n$ sequentially chosen points is denoted by $\mathfrak{X}^{\mathrm{ad}}_n$.
Here, in contrast to the class $\mathfrak{X}^{\mathrm{eq}}_n$, the $k$th evaluation site may depend on the previous $k-1$ observations of $W$. Moreover, considering the particular choice of constant mappings for the evaluation sites yields $\mathfrak{X}^{\mathrm{eq}}_n \subseteq \mathfrak{X}^{\mathrm{ad}}_n$ for all $n \in \mathbb{N}$. The $n$th minimal error for the approximation of $X_1$ for the class of adaptive methods is given by
(6) $e^{\mathrm{ad}}_n(p) = \inf\bigl\{e_p(\hat X) : \hat X \in \mathfrak{X}^{\mathrm{ad}}_n\bigr\}.$
We clearly have $e^{\mathrm{ad}}_n(p) \le e^{\mathrm{eq}}_n(p)$ for all $n \in \mathbb{N}$.
In the following we present our main results. We write $a_n \preceq b_n$ for sequences of nonnegative reals $(a_n)_{n\in\mathbb{N}}$ and $(b_n)_{n\in\mathbb{N}}$ if there exists a constant $c \in (0,\infty)$ such that $a_n \le c \cdot b_n$ for all $n \in \mathbb{N}$. Moreover, we write $a_n \asymp b_n$ if $a_n \preceq b_n$ and $b_n \preceq a_n$.
We show that the polynomial convergence rate of the $n$th minimal error $e^{\mathrm{eq}}_n(p)$ is equal to $1/2$ for all $p \in [1,\infty)$. More precisely, Corollary 1 yields
(7) $e^{\mathrm{eq}}_n(p) \asymp n^{-1/2}$
for all $p \in [1,\infty)$. Of course, the constants hidden in the “$\asymp$”-notation may depend on $p$. Furthermore, the corresponding upper bound is attained by the drift-implicit Euler scheme, see Theorem 2 and (23). In the more general case of the SDE (2), the drift-implicit Euler scheme is given by
(8) $\hat X_{k+1} = \Bigl(\tfrac{\sqrt{\hat X_k} + \Delta W_k}{2} + \sqrt{\tfrac{(\sqrt{\hat X_k} + \Delta W_k)^2}{4} + \tfrac{\delta - 1}{2n}}\Bigr)^2, \qquad \Delta W_k = W_{(k+1)/n} - W_{k/n},$
for $k = 0, \dots, n-1$ and $\hat X_0 = x_0$. Let us mention that the drift-implicit Euler scheme is actually proposed for the SDE (1) with parameters satisfying $\sigma^2 < 4a$, see [1]. Nevertheless, it is still well defined in the limiting case $\sigma^2 = 4a$. Note that the upper bound from (7) is the first strong convergence result with a positive rate in the case $\sigma^2 = 4a$, cf. Figure 1.
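To make the scheme concrete, the following sketch (our own naming and discretization details, not the authors' code) implements the drift-implicit Euler recursion for the SDE (2) via the closed-form positive root of the implicit step for the square-root process; it assumes $\delta \ge 1$ so that the root is real.

```python
import math
import random

def drift_implicit_euler(x0, delta, n, dW):
    """Drift-implicit Euler for dX = delta*dt + 2*sqrt(X) dW on [0, 1],
    applied to the square root S = sqrt(X); assumes delta >= 1.
    dW is the list of n Brownian increments on the equidistant grid."""
    h = 1.0 / n
    s = math.sqrt(x0)
    for k in range(n):
        y = s + dW[k]
        # positive root of s^2 - y*s - (delta - 1)*h/2 = 0
        s = y / 2.0 + math.sqrt(y * y / 4.0 + (delta - 1.0) * h / 2.0)
    return s * s
```

For $\delta = 1$ the implicit step collapses to $s \mapsto \max(s + \Delta W, 0)$, i.e., the squared projected Euler scheme for reflected Brownian motion, in line with the discussion around (22) and (23).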
For adaptive algorithms the situation is rather different. Corollary 2 shows that
(9) $e^{\mathrm{ad}}_n(p) \preceq n^{-r}$
for all $p \in [1,\infty)$ and for all $r \in (0,\infty)$. Hence the polynomial convergence rate of the $n$th minimal error $e^{\mathrm{ad}}_n(p)$ is equal to infinity. More precisely, for every $r \in (0,\infty)$ we construct an adaptive algorithm that converges (at least) at a polynomial rate $r$, see Theorem 4. In fact, numerical experiments suggest an exponential decay, see Figure 3. Moreover, such algorithms can be easily implemented on a computer with a number of operations of order . Combining (7) and (9) establishes our claim that adaptive algorithms are far superior to non-adaptive algorithms that are based on equidistant grids for strong approximation of (3). Let us stress that this is the first result on SDEs where adaption results in an improved convergence rate compared to methods that are based on equidistant grids.
A key step in the proofs of (7) and (9) consists of identifying the pathwise solution of (3), see Proposition 1, and linking this problem to global optimization under the Wiener measure. Let us mention that the analysis of the adaptive algorithm in Theorem 4 relies heavily on results of [9].
Although we have shown that adaptive algorithms are far superior to methods that are based on an equidistant grid for a particular choice of the parameters of SDE (1), it is open whether this superiority also holds for more general parameter constellations.
We now turn to the more general case of the SDE (2) with arbitrary $\delta \in [0,\infty)$. Moreover, we consider a stronger error criterion, which is pathwise given by the supremum norm. In this case we obtain
(10) 
for all $p \in [1,\infty)$, where $\bar X_n$ denotes a projected equidistant Euler scheme, see Remark 7. This scheme coincides with the drift-implicit Euler scheme for $\delta = 1$. Let us stress that this error bound is the first strong convergence result with a positive rate in the case $\sigma^2 > 4a$, cf. Figure 1. At present, we have only shown the upper bound (10). Nevertheless, we expect this upper bound to be sharp even for adaptive algorithms.
Let us briefly comment on some consequences of the results presented above for strong approximation of CIR processes.
It is well-known that the parameters $a$, $b$, and $\sigma$ in SDE (1) have an influence on the behavior of its solution. For instance, the solution remains strictly positive (the boundary point $0$ is inaccessible) if and only if the so-called Feller condition $2a \ge \sigma^2$ is satisfied. As illustrated in Figure 1, the drift-implicit Euler scheme converges at least with rate $1$ if $\sigma^2 \le 2a$, see [2, 25], and so does the corresponding $n$th minimal error for methods using an equidistant grid. Hence the quotient $\sigma^2/a$ affects the convergence rate of the $n$th minimal error for equidistant methods since it drops down to $1/2$ for $\sigma^2 = 4a$ according to (7).
In contrast to the known upper bounds, cf. Figure 1, the convergence rate of the drift-implicit Euler scheme for (3) does not depend on the $L_p$ norm appearing in the error criterion, see (7).
Let us comment on lower bounds for strong approximation of SDEs at the final time point based on values of the driving Brownian motion. In [10], a two-dimensional SDE is presented for which the corresponding convergence rate is shown to be $1/2$. In contrast to the rate $1$ for smooth scalar SDEs [23], the difficulty in [10] arises from the presence of Lévy areas. More recently, the existence of SDEs with smooth coefficients has been shown for which the corresponding $n$th minimal error converges arbitrarily slowly to zero, see [16, 20]. It is crucial that these SDEs are multi-dimensional. Apart from (7), we are not aware of any other lower bound with a convergence rate less than $1$ for a scalar SDE.
This paper is organized as follows. In Section 2 we derive an explicit representation of the solution of (3) and of the more general SDE (2). Using this representation, we show sharp upper and lower bounds for $e^{\mathrm{eq}}_n(p)$ in Section 3. In Section 4 we consider a particular adaptive method that achieves an arbitrarily high polynomial convergence rate. Finally, we illustrate our results by numerical experiments.
2. Squared Bessel Process of Dimension One
In this section, we will derive an explicit expression for the strong solution of (3) by using basic results about reflected SDEs. Subsequently, we will extend this technique to the more general case of SDE (2).
In the following let $(\Omega, \mathcal{A}, \mathrm{P})$ be a complete probability space and let $(\mathcal{F}_t)_{t \ge 0}$ be a filtration on this space satisfying the usual conditions.
Given $x_0 \in [0,\infty)$ and a Brownian motion $B$ w.r.t. $(\mathcal{F}_t)_{t \ge 0}$, we define
(11) $W_t = \int_0^t \mathrm{sgn}\bigl(\sqrt{x_0} + B_s\bigr)\,dB_s$
for all $t \ge 0$, with $\mathrm{sgn}(x) = 1$ for $x \ge 0$ and $\mathrm{sgn}(x) = -1$ for $x < 0$.
Then, $W$ is a Brownian motion w.r.t. $(\mathcal{F}_t)_{t \ge 0}$. Indeed, the quadratic variation of $W$ satisfies
$\langle W \rangle_t = \int_0^t \mathrm{sgn}\bigl(\sqrt{x_0} + B_s\bigr)^2\,ds = t,$
and thus Lévy's characterization can be applied. Now, consider the SDE (3) where the driving Brownian motion has the particular form given in (11). Due to this construction of $W$, we see that the solution of (3) is given by
(12) $X_t = \bigl(\sqrt{x_0} + B_t\bigr)^2,$
since $d\bigl(\sqrt{x_0} + B_t\bigr)^2 = dt + 2\bigl(\sqrt{x_0} + B_t\bigr)\,dB_t$ and hence
$dX_t = dt + 2\bigl|\sqrt{x_0} + B_t\bigr|\,\mathrm{sgn}\bigl(\sqrt{x_0} + B_t\bigr)\,dB_t = dt + 2\sqrt{X_t}\,dW_t.$
Moreover, Tanaka’s formula [21, Prop. III.6.8], given by
$|Y_t| = |Y_0| + \int_0^t \mathrm{sgn}(Y_s)\,dY_s + L_t^0(Y),$
where $L_t^0(Y)$ denotes the local time of $Y$ in $0$, yields for $Y_t = \sqrt{x_0} + B_t$ that
$\bigl|\sqrt{x_0} + B_t\bigr| = \sqrt{x_0} + W_t + L_t^0(Y) = \sqrt{x_0} + W_t + \Bigl(\inf_{0 \le s \le t}\bigl(\sqrt{x_0} + W_s\bigr)\Bigr)^-,$
where the second equality follows from Skorokhod’s lemma, see [21, Lem. III.6.14]. Finally, using $X_t = \bigl|\sqrt{x_0} + B_t\bigr|^2$ for $t \ge 0$ leads to the solution of (3) given by
(13) $X_t = \Bigl(\sqrt{x_0} + W_t + \Bigl(\inf_{0 \le s \le t}\bigl(\sqrt{x_0} + W_s\bigr)\Bigr)^-\Bigr)^2,$
where $u^- = \max(0, -u)$ denotes the negative part of $u \in \mathbb{R}$. We stress that the explicit solution (13) of the SDE (3) holds for any Brownian motion $W$ and does not depend on the particular construction given in (11), see [21, Cor. V.3.23], since pathwise uniqueness and strong existence hold for the SDE (3). Hence the unique strong solution of the SDE (3) is given by (13).
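The explicit solution (13) can be evaluated directly along a sampled Brownian path. The following minimal sketch (our own naming; the running infimum is approximated by the running minimum over the simulation grid) illustrates this.

```python
import math
import random

def squared_bessel_path(x0, w):
    """Pathwise solution (13) of dX = dt + 2*sqrt(X) dW evaluated at the
    grid points of a sampled Brownian path w = [W_0, W_{1/n}, ..., W_1].
    The running infimum of sqrt(x0) + W is approximated on the grid."""
    run_min = math.inf
    xs = []
    for wt in w:
        y = math.sqrt(x0) + wt
        run_min = min(run_min, y)
        neg_part = max(0.0, -run_min)  # (inf_{s<=t} (sqrt(x0) + W_s))^-
        xs.append((y + neg_part) ** 2)
    return xs

rng = random.Random(1)
n = 1000
w = [0.0]
for _ in range(n):
    w.append(w[-1] + rng.gauss(0.0, math.sqrt(1.0 / n)))
xs = squared_bessel_path(0.5, w)
```

By construction the output starts at $x_0$ and stays nonnegative, as the solution of (3) must.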
Remark 1.
It is well-known that the solution of the SDE (3) can be expressed in terms of $B$ as in (12), see, e.g., [26, Ex. IX.3.16]. However, we are not aware of a result regarding the explicit form of the strong solution given by (13).
In the context of SDEs, the somewhat explicit solution of (3) by means of $B$ in (12) is rather useless for strong approximation. The concept of strong solutions entails a functional dependence of the solution process on the input Brownian motion appearing in the SDE, which is $W$ in our case. We thus seek to “construct” the solution $X$ out of $W$.
Remark 2.
Remark 3.
We now turn to the more general case of SDE (2) with arbitrary $\delta \in [0,\infty)$.
Proposition 1.
Remark 4.
Proof of Proposition 1.
Analogously to the above derivation, we start with a Brownian motion $B$ and consider the SDE
Moreover, we assume that the Brownian motion $W$ appearing in (2) has the particular form
Then, Itô’s formula shows
Hence the solution of SDE (2) is given by
(17) 
On the other hand, the Tanaka-Meyer formula [21, Thm. III.7.1(v)] applied to
combined with the explicit expression (16) yields
where denotes the semimartingale local time of at . Thus, Skorokhod’s lemma [21, Lem. III.6.14] shows
and consequently
It remains to apply [21, Cor. V.3.23]. ∎
Remark 5.
Consider the situation of the proof of Proposition 1. The Tanaka-Meyer formula applied to yields
Hence it is the solution of the reflected SDE on the domain $[0,\infty)$ given by
(18) 
where is a process of bounded variation with variation increasing only when . For details on reflected SDEs we refer to [29]. In view of (17), we can express the solution to the SDE (2) by
3. Equidistant Methods for SDE (3)
As shown in the previous section, the SDE (3) admits the explicit solution (13). This immediately links the problem of approximating the solution of SDE (3) to global optimization under the Wiener measure. We refer to [27] for results on global optimization under the Wiener measure.
In this section we show sharp (up to constants) upper and lower bounds for $e^{\mathrm{eq}}_n(p)$ for all $p \in [1,\infty)$. Recall that $e^{\mathrm{eq}}_n(p)$ denotes the $n$th minimal error corresponding to the approximation of SDE (3) by algorithms that are based on equidistant grids, see (5). We show that the convergence rate of $e^{\mathrm{eq}}_n(p)$ is equal to $1/2$. Moreover, we show that the drift-implicit Euler scheme converges with the optimal rate $1/2$.
For the proofs we exploit results from [3] on the asymptotic error distribution of the infimum of a Brownian motion approximated by equidistant points on the unit interval. For $n \in \mathbb{N}$ we define
(19) $V_n = \sqrt{n}\,\Bigl(\min_{0 \le k \le n} W_{k/n} - \inf_{0 \le t \le 1} W_t\Bigr),$
where $W$ denotes a standard Brownian motion. The following result is due to [3, Thm. 1 and Lem. 6].
Theorem 1 ([3]).
(i) The sequence $(V_n)_{n \in \mathbb{N}}$ converges in distribution.
(ii) For all $p \in [1,\infty)$ the sequence $(V_n^p)_{n \in \mathbb{N}}$ is uniformly integrable. In particular, for all $p \in [1,\infty)$ we have
(20) $\sup_{n \in \mathbb{N}} \mathrm{E}\bigl[V_n^p\bigr] < \infty.$
Remark 6.
In [3], the limiting distribution of the sequence $(V_n)_{n \in \mathbb{N}}$ is given explicitly by means of three-dimensional Bessel processes.
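The discretization gap studied here can be probed numerically. The following sketch (our own setup, not taken from [3]) samples a Brownian path on a fine reference grid and compares discrete minima over nested coarse subgrids; since each coarse grid is contained in the next finer one, the gaps are deterministically nonnegative and nonincreasing.

```python
import math
import random

def brownian_grid(n, rng):
    """Brownian motion sampled at k/n, k = 0, ..., n."""
    w = [0.0]
    for _ in range(n):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(1.0 / n)))
    return w

rng = random.Random(42)
fine = brownian_grid(1024, rng)  # fine-grid proxy for the continuous path

def coarse_min(step):
    """Minimum over the subgrid using every `step`-th fine-grid point."""
    return min(fine[::step])

# gaps for coarse grids with n = 16, 64, 256, 1024 points
gaps = [coarse_min(2 ** j) - min(fine) for j in (6, 4, 2, 0)]
```

Theorem 1 quantifies the corresponding rate: the gap scaled by $\sqrt{n}$ stays bounded in the $p$th mean.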
Recall that the solution of (3) at time $1$ is given by $X_1 = \bigl(\sqrt{x_0} + W_1 + (I)^-\bigr)^2$ with
(21) $I = \inf_{0 \le t \le 1}\bigl(\sqrt{x_0} + W_t\bigr).$
Moreover, for $n \in \mathbb{N}$ we define the approximation of $X_1$ by
$\hat X_n = \bigl(\sqrt{x_0} + W_1 + (\hat I_n)^-\bigr)^2,$
where
$\hat I_n = \min_{0 \le k \le n}\bigl(\sqrt{x_0} + W_{k/n}\bigr)$
serves as an approximation of $I$. Here, the global infimum is simply replaced by the discrete minimum over $n+1$ equidistant knots.
The following upper bound is a consequence of Theorem 1(ii).
Theorem 2.
For every $p \in [1,\infty)$ we have
$e_p(\hat X_n) \preceq n^{-1/2}.$
In particular, the $n$th minimal error satisfies $e^{\mathrm{eq}}_n(p) \preceq n^{-1/2}$.
Proof.
The natural extension of $\hat X_n$ to an approximation on the whole equidistant grid with mesh size $1/n$ is given by
$\hat X_{k/n} = \hat Y_k^2, \qquad k = 0, \dots, n,$
where
$\hat Y_k = \sqrt{x_0} + W_{k/n} + \Bigl(\min_{0 \le j \le k}\bigl(\sqrt{x_0} + W_{j/n}\bigr)\Bigr)^-.$
Let us stress that this scheme can be expressed by the following Euler-type scheme:
(22) $\hat Y_{k+1} = \max\bigl(\hat Y_k + W_{(k+1)/n} - W_{k/n},\, 0\bigr)$
and $\hat Y_0 = \sqrt{x_0}$. In the context of reflected SDEs, scheme (22) is simply the projected Euler scheme for a reflected Brownian motion. Due to
$\tfrac{y}{2} + \sqrt{\tfrac{y^2}{4}} = \max(y, 0), \qquad y \in \mathbb{R},$
the drift-implicit Euler scheme (8) for $\delta = 1$ reads
(23) $\hat X_{k+1} = \Bigl(\max\bigl(\sqrt{\hat X_k} + W_{(k+1)/n} - W_{k/n},\, 0\bigr)\Bigr)^2.$
Thus it coincides with the squared version of (22).
Remark 7.
We briefly comment on results for the more general case of SDE (2) with arbitrary $\delta \in [0,\infty)$. Moreover, we consider a stronger global error criterion where, pathwise, the error is measured in the supremum norm. Up to a logarithmic factor we will obtain the same error bound as in Theorem 2.
For $n \in \mathbb{N}$ we denote by $\bar Z_n$ the projected Euler scheme with $n$ steps associated to the reflected SDE (18) up to time $1$, see [28]. More precisely, $\bar Z_n$ is defined by
and piecewise constant interpolation, i.e.,
The solution to SDE (2) is then approximated by $\bar X_n = \bar Z_n^2$.
At the grid points, this scheme coincides with the drift-implicit Euler scheme (23) if $\delta = 1$. Similar to the proof of Theorem 2, we obtain the claimed error bound
for all $p \in [1,\infty)$ due to [28, Cor. 2.5, Cor. 2.6 and Thm. 3.2(i)] and Remark 5. Let us stress that this error bound is the first strong convergence result with a positive rate in the case $\sigma^2 > 4a$, cf. Figure 1.
We now turn to the question whether an algorithm can do better than $\hat X_n$ in an asymptotic sense if it has the same information about the Brownian motion as $\hat X_n$.
Recall the definition of the $n$th minimal error $e^{\mathrm{eq}}_n(p)$ given in (5). The proof of the following theorem is postponed to Section 5.
Theorem 3.
For all $p \in [1,\infty)$ we have
$e^{\mathrm{eq}}_n(p) \succeq n^{-1/2}.$
Combining Theorem 2 and Theorem 3 yields the following asymptotic behavior of the $n$th minimal error.
Corollary 1.
For all $p \in [1,\infty)$ we have $e^{\mathrm{eq}}_n(p) \asymp n^{-1/2}$.
Remark 8.
In Corollary 1 we obtain the same rate as in [27] for global optimization. In [27], the author studies optimal approximation of the time point where a Brownian motion attains its maximum, and provides a detailed analysis of general nonadaptive algorithms that do not necessarily rely on equidistant grids.
Remark 9.
If we allow for more information about the Brownian path than just point evaluations, the situation may change completely. For instance, if we consider algorithms that have access to the final value $W_1$ and to the infimum $\inf_{0 \le t \le 1} W_t$ of the Brownian path, the problem of strong approximation of $X_1$ becomes trivial. Let us stress that the joint distribution of $\bigl(W_1, \inf_{0 \le t \le 1} W_t\bigr)$ has an explicit representation by means of a Lebesgue density, see, e.g., [6, p. 154].
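As a sketch of this observation (our own naming): by the standard reflection-principle formula, $\mathrm{P}(\inf_{0 \le t \le 1} W_t \le m \mid W_1 = y) = \exp(2m(y-m))$ for $m \le \min(0, y)$, which can be inverted with a single uniform random variable. The code below samples $(W_1, \inf W)$ exactly and then evaluates $X_1$ via the explicit solution (13).

```python
import math
import random

def sample_W1_and_inf(rng):
    """Exact joint sample of (W_1, inf_{0<=t<=1} W_t): draw W_1 ~ N(0,1),
    then invert P(inf <= m | W_1 = y) = exp(2*m*(y - m)), m <= min(0, y)."""
    y = rng.gauss(0.0, 1.0)
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    m = (y - math.sqrt(y * y - 2.0 * math.log(u))) / 2.0
    return y, m

def exact_X1(x0, rng):
    """X_1 for SDE (3) via the explicit solution (13)."""
    y, m = sample_W1_and_inf(rng)
    # inf of sqrt(x0) + W over [0, 1] equals sqrt(x0) + m
    neg = max(0.0, -(math.sqrt(x0) + m))
    return (math.sqrt(x0) + y + neg) ** 2

rng = random.Random(7)
samples = [exact_X1(1.0, rng) for _ in range(1000)]
```

Since $\mathrm{E}[X_1] = x_0 + 1$ for the squared Bessel process of dimension one, the sample mean should be close to $2$ for $x_0 = 1$.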
4. Adaptive Methods for SDE (3)
In this section we present an adaptive algorithm for the approximation of the solution of SDE (3) based on sequential observations of the Brownian motion $W$. In contrast to Section 3, here the evaluation points are chosen adaptively, i.e., the $k$th evaluation site is a measurable function of the previous $k-1$ observations. From Proposition 1 it is clear that the actual task consists of the approximation of the global infimum $I = \inf_{0 \le t \le 1}\bigl(\sqrt{x_0} + W_t\bigr)$. For this we use the adaptive algorithm from [9], see also [7, 8]. This algorithm approximates $I$ by the discrete minimum over the adaptively chosen sites. In the following we describe the adaptive choice of the evaluation sites.
The first observation is chosen non-adaptively to be the endpoint, i.e., the first evaluation site is $1$. Moreover, for notational convenience we use the observation $W_0 = 0$ at time $0$.
Let $n \ge 2$, and consider the $n$th step of the algorithm, where the next evaluation site will be chosen based on the previous observations. For this, we denote the ordered first evaluation sites by
$0 = t_0 < t_1 < \dots < t_n = 1.$
Moreover, we assume that we have made the following observations
(24) $W_{t_0} = w_0, \dots, W_{t_n} = w_n,$
and we denote the corresponding discrete minimum by
$m_n = \min_{0 \le i \le n} w_i.$
Conditioned on (24), the path of $W$ consists of independent Brownian bridges from $w_{i-1}$ to $w_i$ on the subintervals $[t_{i-1}, t_i]$, for $i = 1, \dots, n$. In the following we denote a Brownian bridge from $y_1$ to $y_2$ on $[s_1, s_2]$ with $s_1 < s_2$ and $y_1, y_2 \in \mathbb{R}$ by $B^{y_1, y_2}_{[s_1, s_2]}$.
The basic idea of the adaptive algorithm is a simple greedy strategy: the next observation is taken at the midpoint of the subinterval where the probability that the corresponding Brownian bridge undershoots the current discrete minimum minus some threshold is maximal. More precisely, we choose the subinterval according to
(25) $\max_{1 \le i \le n} \mathrm{P}\Bigl(\inf_{t_{i-1} \le t \le t_i} B^{w_{i-1}, w_i}_{[t_{i-1}, t_i]}(t) \le m_n - \varepsilon_n\Bigr)$
with a threshold $\varepsilon_n > 0$, and evaluate $W$ at the midpoint of a maximizing subinterval.
We note that the infimum of a Brownian bridge satisfies
$\mathrm{P}\Bigl(\inf_{s_1 \le t \le s_2} B^{y_1, y_2}_{[s_1, s_2]}(t) \le m\Bigr) = \exp\Bigl(-\tfrac{2(y_1 - m)(y_2 - m)}{s_2 - s_1}\Bigr)$
for $m \le \min(y_1, y_2)$, see [6, p. 67]. Hence (25) reduces to maximizing
$\exp\Bigl(-\tfrac{2(w_{i-1} - m_n + \varepsilon_n)(w_i - m_n + \varepsilon_n)}{t_i - t_{i-1}}\Bigr)$
over $i = 1, \dots, n$. Finally, we have to specify the threshold $\varepsilon_n$. This threshold is chosen in terms of the length of the smallest subinterval at step $n$ and a pre-specified parameter. Let us stress that all of the above (adaptive) quantities depend on the choice of this parameter, although this is not explicitly indicated.
This amounts to a family of adaptive algorithms defined by
(26) $\hat X^{\mathrm{ad}}_n = \Bigl(\sqrt{x_0} + W_1 + \Bigl(\min_{0 \le i \le n}\bigl(\sqrt{x_0} + W_{t_i}\bigr)\Bigr)^-\Bigr)^2$
for the approximation of (13), where the adaptively chosen points $t_0, \dots, t_n$ depend on the pre-specified choice of the parameter.
Remark 10.
A straightforward implementation of the algorithm (26) on a computer requires operations of order .
The following result is an immediate consequence of [9, Thm. 1].
Theorem 4.
For all $p \in [1,\infty)$ and for all $r \in (0,\infty)$ there exists a choice of the parameter such that $e_p\bigl(\hat X^{\mathrm{ad}}_n\bigr) \preceq n^{-r}$.
Remark 11.
Recall the definition of the $n$th minimal error $e^{\mathrm{ad}}_n(p)$ given in (6).
Corollary 2.
For all $p \in [1,\infty)$ and for all $r \in (0,\infty)$ we have $e^{\mathrm{ad}}_n(p) \preceq n^{-r}$.
5. Proofs
For