Optimal Strong Approximation of the One-dimensional Squared Bessel Process
We consider the one-dimensional squared Bessel process given by the stochastic differential equation (SDE)
    dX_t = dt + 2 √|X_t| dW_t,   X_0 = x_0 ≥ 0,
and study strong (pathwise) approximation of the solution at the final time point t = 1. This SDE is a particular instance of a Cox-Ingersoll-Ross (CIR) process in which the boundary point zero is accessible. We consider numerical methods that have access to values of the driving Brownian motion at a finite number of time points. We show that the polynomial convergence rates of the n-th minimal errors for the class of adaptive algorithms and for the class of algorithms that rely on equidistant grids are equal to infinity and 1/2, respectively. This shows that adaption results in a tremendously improved convergence rate. As a by-product, we obtain that the parameters appearing in the CIR process affect the convergence rate of strong approximation.
Key words and phrases: Cox-Ingersoll-Ross process; strong approximation; n-th minimal error; adaptive algorithm; reflected Brownian motion
In recent years, strong approximation of stochastic differential equations (SDEs) has intensively been studied for SDEs of the form
with a one-dimensional Brownian motion , and , , and . These SDEs are known to have a unique non-negative strong solution. Such SDEs were proposed in  as a model for short-term interest rates. The solution is called a Cox-Ingersoll-Ross (CIR) process. Moreover, CIR processes are used as the volatility process in the Heston model .
Strong approximation is of particular interest due to the multi-level Monte Carlo technique, see [14, 15, 17]. For an optimality result of this technique applied to quadrature problems of the form with , we refer to . In mathematical finance, the functional often represents a discounted payoff of some derivative and is the corresponding price.
In , various numerical schemes have been proposed and numerically tested for the SDE (1) with different choices of the corresponding parameters. These numerical results indicate convergence at a polynomial rate, which depends on the parameters . More precisely, the empirical convergence rate is monotonically decreasing in the quotient for all numerical schemes that have been tested. Polynomial convergence rates for strong approximation of (1) have been proven in [4, 13, 2, 25, 19], where either a global error or the error at the final time is studied w.r.t. the L_p-norm. All these results only hold for some parameter range within and share the same monotonicity, see Figure 1. For an overview of numerical schemes and results on strong convergence without a rate we refer to  and the references therein.
In this paper we consider the particular case of the SDE (1) with
By rescaling with , we thus may restrict ourselves to SDEs of the form
with and . Furthermore, we focus on the particular instance of (2) with δ = 1, i.e.,
    dX_t = dt + 2 √|X_t| dW_t,   X_0 = x_0 ≥ 0.    (3)
Its solution is called the square of a one-dimensional Bessel process. For a detailed study of (squared) Bessel processes we refer to [26, Chap. XI].
In the context of (strong) approximation of SDEs, the majority of numerical methods in the literature are non-adaptive . A non-adaptive algorithm uses a fixed discretisation of the driving Brownian motion whereas adaptive algorithms may sequentially choose the evaluation points. The most frequently studied methods in the class of non-adaptive algorithms are Euler or Milstein-type methods that are based on values of the driving Brownian motion on an equidistant grid. For various strong approximation problems of SDEs satisfying standard assumptions, adaption does not help up to a multiplicative constant, see, e.g., . In particular,  showed for strong approximation of scalar SDEs at the final time point that no adaptive method can be better (up to a multiplicative constant) than the classical Milstein scheme. Let us stress that these standard assumptions are not fulfilled by (3) since the diffusion coefficient is not even locally Lipschitz continuous.
In contrast to that, the main result of this paper is that adaptive methods are far superior to methods based on an equidistant grid for strong approximation of the solution of (3). To this end, we determine the polynomial convergence rate of the corresponding n-th minimal errors, which will be introduced below.
Let be the solution of (3) at time , and let . The error of an approximation of is defined by
First, we consider the class of methods that use only values of the driving Brownian motion on an equidistant grid with points given by
The corresponding n-th minimal error for the approximation of is given by
Roughly speaking, is the error of the best algorithm for the approximation of w.r.t. the L_p-norm that uses only . Clearly, Euler and Milstein-type schemes fit into this class of algorithms. In the case p = 2, the optimal approximation is given by the conditional expectation of given the σ-algebra generated by .
The class of adaptive methods that use values of the driving Brownian motion at sequentially chosen points is given by
Here, in contrast to the class , each subsequent evaluation site may depend on the previous observations of . Moreover, considering the particular choice of constant mappings yields for all . The n-th minimal error for the approximation of for the class of adaptive methods is given by
We clearly have for all .
In the following we present our main results. For sequences (a_n) and (b_n) of non-negative reals we write a_n ≲ b_n if there exists a constant c > 0 such that a_n ≤ c · b_n for all n. Moreover, we write a_n ≍ b_n if a_n ≲ b_n and b_n ≲ a_n.
We show that the polynomial convergence rate of the n-th minimal error is equal to 1/2 for all . More precisely, Corollary 1 yields
for all . Of course, the constants hidden in this asymptotic notation may depend on . Furthermore, the corresponding upper bound is attained by the drift-implicit Euler scheme , see Theorem 2 and (23). In the more general case of the SDE (2), the drift-implicit Euler scheme is given by
for and . Let us mention that the drift-implicit Euler scheme is actually proposed for the SDE (1) with parameters satisfying , see . Nevertheless, it is still well defined in the limiting case . Note that the upper bound from (7) is the first strong convergence result with a positive rate in the case , cf. Figure 1.
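The displayed scheme (8) is not reproduced in this copy; as a hedged sketch, the following implements the standard drift-implicit (backward) Euler construction for dX_t = δ dt + 2√(X_t) dW_t: apply the implicit Euler step to Y = √X, which solves dY_t = ((δ − 1)/(2 Y_t)) dt + dW_t, and take the positive root of the resulting quadratic equation. All names are ours, and the sketch assumes δ ≥ 1.

```python
import numpy as np

def drift_implicit_euler(x0, delta, T, n, dW):
    """Drift-implicit Euler for dX = delta dt + 2 sqrt(X) dW on [0, T],
    formulated for Y = sqrt(X): the implicit step
        Y_{k+1} = Y_k + (delta - 1) / (2 Y_{k+1}) * h + dW_k
    is solved for its positive root (requires delta >= 1)."""
    h = T / n
    y = np.sqrt(x0)
    for k in range(n):
        z = y + dW[k]
        y = z / 2.0 + np.sqrt(z * z / 4.0 + (delta - 1.0) * h / 2.0)
    return y * y
```

For δ = 1 the update degenerates to Y_{k+1} = max(Y_k + ΔW_k, 0), i.e. the square of the projected Euler scheme for reflected Brownian motion discussed later in the text.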
For adaptive algorithms the situation is rather different. Corollary 2 shows that
for all and for all . Hence the polynomial convergence rate of the n-th minimal error is equal to infinity. More precisely, for every we construct an adaptive algorithm that converges (at least) at a polynomial rate , see Theorem 4. In fact, numerical experiments suggest an exponential decay, see Figure 3. Moreover, such algorithms can easily be implemented on a computer with a number of operations of order . Combining (7) and (9) establishes our claim that adaptive algorithms are far superior to non-adaptive algorithms that are based on equidistant grids for strong approximation of (3). Let us stress that this is the first result on SDEs where adaption results in an improved convergence rate compared to methods that are based on equidistant grids.
A key step in the proofs of (7) and (9) consists of identifying the pathwise solution of (3), see Proposition 1, and linking this problem to global optimization under the Wiener measure. Let us mention that the analysis of the adaptive algorithm in Theorem 4 relies heavily on results of .
Although we have shown that adaptive algorithms are far superior to methods that are based on an equidistant grid for a particular choice of the parameters of SDE (1), it is open whether this superiority also holds for more general parameter constellations.
We now turn to the more general case of the SDE (2) with . Moreover, we consider a stronger error criterion which is pathwise given by the supremum norm. In this case we obtain
for all , where denotes a projected equidistant Euler scheme, see Remark 7. This scheme coincides with the drift-implicit Euler scheme for . Let us stress that this error bound is the first strong convergence result with a positive rate in the case and arbitrary , cf. Figure 1. At present, we have only shown the upper bound (10). Nevertheless, we expect this upper bound to be sharp even for adaptive algorithms.
Let us briefly comment on some consequences of the results presented above for strong approximation of CIR processes.
It is well-known that the parameters , , and in SDE (1) have an influence on the behavior of its solution. For instance, the solution remains strictly positive (the boundary point is inaccessible) if and only if the so-called Feller condition is satisfied. As illustrated in Figure 1, the drift-implicit Euler scheme converges at least with rate if , see [2, 25], and so does the corresponding n-th minimal error for methods using an equidistant grid. Hence the quotient affects the convergence rate of the n-th minimal error for equidistant methods, since it drops down to 1/2 for and according to (7).
Let us comment on lower bounds for strong approximation of SDEs at the final time point based on values of the driving Brownian motion. In , a two-dimensional SDE is presented for which the corresponding convergence rate is shown to be 1/2. In contrast to the rate 1 for smooth scalar SDEs , the difficulty in  arises from the presence of Lévy areas. More recently, the existence of SDEs with smooth coefficients has been shown for which the corresponding n-th minimal error converges arbitrarily slowly to zero, see [16, 20]. It is crucial that these SDEs are multi-dimensional. Apart from (7), we are not aware of any other lower bound with a convergence rate less than 1 for a scalar SDE.
This paper is organized as follows. In Section 2 we derive an explicit representation of the solution of (3) and the more general case of (2). Using this representation we show sharp upper and lower bounds of in Section 3. In Section 4 we consider a particular adaptive method that achieves an arbitrarily high polynomial convergence rate. Finally, we illustrate our results by numerical experiments.
2. Squared Bessel Process of Dimension One
In this section, we will derive an explicit expression for the strong solution of (3) by using basic results about reflected SDEs. Subsequently, we will extend this technique to the more general case of SDE (2).
In the following let (Ω, A, P) be a complete probability space and let (F_t)_{t ≥ 0} be a filtration on this space satisfying the usual conditions.
Given and a Brownian motion w.r.t. , we define
for all with . Then, is a Brownian motion w.r.t. . Indeed, the quadratic variation of satisfies
and thus Lévy’s characterization can be applied. Now, consider the SDE (3) where the driving Brownian motion has the particular form given in (11). Due to this construction of , we see that the solution of (3) is given by
since and hence
Moreover, Tanaka’s formula [21, Prop. III.6.8] given by
where denotes the local time of in , yields for that
where denotes the negative part of . We stress that the explicit solution (13) of the SDE (3) holds for any Brownian motion and does not depend on the particular construction given in (11), see [21, Cor. V.3.23], since pathwise uniqueness and strong existence hold for the SDE (3). Hence the unique strong solution of the SDE (3) is given by (13).
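Concretely, the derivation above yields the explicit solution X_t = (√x0 + W_t + (inf_{s≤t}(√x0 + W_s))⁻)², the square of a Brownian motion started in √x0 and reflected at zero. A minimal sketch evaluating this formula along grid values of W (the running grid minimum stands in for the true running infimum; names are ours):

```python
import numpy as np

def solution_on_grid(x0, W):
    """Evaluate X_t = (sqrt(x0) + W_t + (inf_{s<=t}(sqrt(x0) + W_s))^-)^2
    along grid values of W (with W[0] = 0); the running minimum over the
    grid replaces the true running infimum."""
    y = np.sqrt(x0) + np.asarray(W, dtype=float)
    neg_part = np.maximum(0.0, -np.minimum.accumulate(y))  # (running inf)^-
    return (y + neg_part) ** 2
```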
It is well-known that the solution of the SDE (3) can be expressed in terms of in (12), see, e.g., [26, Ex. IX.3.16]. However, we are not aware of a result regarding the explicit form of the strong solution given by (13).
In the context of SDEs, the somewhat explicit solution of by means of in (12) is rather useless for strong approximation. The concept of strong solutions entails a functional dependence of the solution process on the driving Brownian motion appearing in the SDE, which in our case is . We thus seek to “construct” the solution out of .
We now turn to the more general case of SDE (2) with arbitrary .
Proof of Proposition 1.
Analogous to the above derivation, we start with a Brownian motion and consider the SDE
Moreover, we assume that the Brownian motion appearing in (2) has the particular form
Then, Itô’s formula shows
Hence the solution of SDE (2) is given by
On the other hand, the Tanaka-Meyer formula [21, Thm. III.7.1(v)] applied to
combined with the explicit expression (16) yields
where denotes the semimartingale local time of at . Thus, Skorokhod’s lemma [21, Lem. III.6.14] shows
It remains to apply [21, Cor. V.3.23]. ∎
Consider the situation of the proof of Proposition 1. The Tanaka-Meyer formula applied to yields
Hence is the solution of the reflected SDE on the domain given by
As shown in the previous section, the SDE (3) admits the explicit solution (13). This immediately links the problem of approximating SDE (3) to global optimization under the Wiener measure. We refer to  for results on global optimization under the Wiener measure.
In this section we show sharp (up to constants) upper and lower bounds for for all . Recall that denotes the n-th minimal error corresponding to the approximation of SDE (3) by algorithms that are based on equidistant grids, see (5). We show that the convergence rate of is equal to 1/2. Moreover, we show that the drift-implicit Euler scheme converges with the optimal rate 1/2.
For the proofs we exploit results from  on the asymptotic error distribution of the infimum of a Brownian motion approximated by equidistant points on the unit interval. For we define
where denotes a standard Brownian motion. The following result is due to [3, Thm. 1 and Lem. 6].
Theorem 1 ().
The sequence converges in distribution.
For all the sequence is uniformly integrable. In particular, for all we have
In , the limiting distribution of the sequence is given explicitly by means of three-dimensional Bessel processes.
Recall that the solution of (3) is given by with
Moreover, for we define the approximation of by
serves as an approximation of . Here, the global infimum is simply replaced by the discrete minimum over equidistant knots.
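In other words, writing the final value of the explicit solution as X_1 = (√x0 + W_1 + (inf_{s≤1}(√x0 + W_s))⁻)², the approximation uses the minimum of the grid values in place of the infimum; a hedged sketch (names are ours):

```python
import numpy as np

def x_hat(x0, W_grid):
    """Approximation of X_1: the global infimum in the explicit solution
    is replaced by the minimum over the values of W on an equidistant
    grid of [0, 1] (W_grid[0] = 0, W_grid[-1] = W_1)."""
    y = np.sqrt(x0) + np.asarray(W_grid, dtype=float)
    return float((y[-1] + max(0.0, -float(y.min()))) ** 2)
```

Refining the grid can only decrease the discrete minimum and hence only increase the output, so on nested grids the approximation increases monotonically towards the value computed from the full path.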
The following upper bound is a consequence of Theorem 1(ii).
For every we have
In particular, the n-th minimal error satisfies
The natural extension of to an approximation on the whole equidistant grid with mesh size is given by
Let us stress that this scheme can be expressed by the following Euler-type scheme
and . In the context of reflected SDEs, scheme (22) is simply the projected Euler scheme for a reflected Brownian motion. Due to
the drift-implicit Euler scheme (8) for reads
Thus it coincides with the squared version of (22), i.e.,
We briefly comment on results for the more general case of SDE (2) with arbitrary . Moreover, we consider a stronger global error criterion where pathwise the global error is measured in the supremum norm. Up to a logarithmic factor we will obtain the same error bound as in Theorem 2.
and piecewise constant interpolation, i.e.,
The solution to SDE (2) is then approximated by
We now turn to the question whether an algorithm can do better than in an asymptotic sense if this algorithm has the same information about the Brownian motion as .
For all we have
For all we have
In particular, the drift-implicit Euler scheme (23) is asymptotically optimal.
In Corollary 1 we obtain the same rate as in  for global optimization. In , the author studies optimal approximation of the time point where a Brownian motion attains its maximum, and provides a detailed analysis of general non-adaptive algorithms that do not necessarily rely on equidistant grids.
If we allow for more information about the Brownian path than just point evaluations , the situation may change completely. For instance, if we consider algorithms that have access to the final value and to the infimum of the Brownian path, the problem of strong approximation of becomes trivial. Let us stress that the joint distribution of has an explicit representation by means of a Lebesgue density, see, e.g., [6, p. 154].
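To make this concrete: the law of the minimum of a Brownian bridge gives P(inf_{s≤1} W_s ≤ m | W_1 = w) = exp(−2m(m − w)) for m ≤ min(0, w), so the pair (W_1, inf_{s≤1} W_s) can be sampled exactly by inversion, and X_1 is then obtained from the explicit solution without discretisation error. A hedged sketch (names are ours):

```python
import numpy as np

def sample_x1_exact(x0, rng):
    """Exact draw of X_1 = (sqrt(x0) + W_1 + (inf_{s<=1}(sqrt(x0) + W_s))^-)^2:
    sample W_1, then sample the minimum M of W on [0, 1] by inverting its
    conditional CDF exp(-2 m (m - W_1))."""
    w = rng.standard_normal()                          # W_1
    u = 1.0 - rng.uniform()                            # uniform in (0, 1]
    m = (w - np.sqrt(w * w - 2.0 * np.log(u))) / 2.0   # M = inf of W on [0, 1]
    y1 = np.sqrt(x0) + w
    return (y1 + max(0.0, -(np.sqrt(x0) + m))) ** 2
```

For x0 = 0 the resulting X_1 is chi-square distributed with one degree of freedom, which gives a quick sanity check of the sampler.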
In this section we present an adaptive algorithm for the approximation of the solution of SDE (3) based on sequential observations of the Brownian motion . In contrast to Section 3, here the points are chosen adaptively, i.e., each new evaluation site is a measurable function of the previous observations. From Proposition 1 it is clear that the actual task consists of approximating the global infimum . For this we use the adaptive algorithm from , see also [7, 8]. This algorithm approximates by the discrete minimum . In the following we describe the adaptive choice of .
The first observation is non-adaptively chosen to be the endpoint, i.e., . Moreover, for notational convenience we define .
Let , and consider the -th step of the algorithm where will be chosen based on the previous observations . For this, we denote the ordered first evaluation sites by
such that . Moreover, we assume that we have made the following observations
and we denote the corresponding discrete minimum by
Conditioned on (24), we have independent Brownian bridges from to on the subinterval , for . In the following we denote a Brownian bridge from to on with and by .
The basic idea of the adaptive algorithm is a simple greedy strategy: The next observation is taken at the midpoint of the subinterval where the probability that the corresponding Brownian bridge undershoots the current discrete minimum minus some threshold is maximal. More precisely, we split the interval according to
with , and evaluate at
We note that the infimum of a Brownian bridge from a to b on an interval of length h satisfies
    P( inf ≤ m ) = exp( −2 (a − m)(b − m) / h )    for every m ≤ min(a, b),
so that the undershoot probabilities of all current subintervals can be evaluated in closed form.
Finally, we have to specify the threshold . This threshold is chosen to be
where denotes the length of the smallest subinterval at step and where is some prespecified parameter. Let us stress that all above (adaptive) quantities depend on the choice of the parameter , although it is not explicitly indicated.
This amounts to a family of adaptive algorithms defined by
for the approximation of (13), where the adaptively chosen points depend on the prespecified choice of .
A straightforward implementation of the algorithm (26) on a computer requires operations of order .
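To make the strategy above concrete, here is a sketched implementation. The paper's exact threshold is not reproduced in this copy, so we substitute (length of the smallest current subinterval)^ρ as a stand-in; the undershoot probability of each bridge is evaluated with the closed-form formula exp(−2(a − m)(b − m)/h). All names are ours.

```python
import math

def undershoot_prob(a, b, h, m):
    """P(inf of a Brownian bridge from a to b over an interval of
    length h is <= m); equals 1 if an endpoint is already below m."""
    if m >= min(a, b):
        return 1.0
    return math.exp(-2.0 * (a - m) * (b - m) / h)

def adaptive_min(path, n, rho=2.0):
    """Greedy adaptive approximation of inf_{s<=1} path(s) using n
    sequential evaluations of 'path' (path(0) = 0 is assumed known):
    always bisect the subinterval whose Brownian bridge is most likely
    to undershoot the current discrete minimum minus a threshold."""
    ts = [0.0, 1.0]
    ws = [0.0, path(1.0)]
    for _ in range(n - 1):
        m_cur = min(ws)
        h_min = min(t2 - t1 for t1, t2 in zip(ts, ts[1:]))
        level = m_cur - h_min ** rho          # threshold: stand-in choice
        probs = [undershoot_prob(ws[i], ws[i + 1], ts[i + 1] - ts[i], level)
                 for i in range(len(ts) - 1)]
        i = max(range(len(probs)), key=probs.__getitem__)
        t_mid = (ts[i] + ts[i + 1]) / 2.0     # evaluate at the midpoint
        ts.insert(i + 1, t_mid)
        ws.insert(i + 1, path(t_mid))
    return min(ws)
```

In the paper the analogous parameter governs the achievable polynomial rate (Theorem 4); in practice `path` would be realised by sequential Brownian-bridge sampling at the requested points, but any callable works, which keeps the sketch testable.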
The following result is an immediate consequence of [9, Thm. 1].
For all and for all there exists such that
Recall the definition of the n-th minimal error given in (6).
For all and for all we have