
# Local linear estimator for stochastic differential equations driven by α-stable Lévy motions

Song Yu-Ping, Lin Zheng-Yan¹
Department of Mathematics, Zhejiang University, Hangzhou 310027, China
¹ Corresponding author: zlin@zju.edu.cn

Abstract: We study the local linear estimator for the drift coefficient of stochastic differential equations driven by α-stable Lévy motions, observed at discrete time instants with observation step Δ → 0. Under regularity conditions, we derive the weak consistency and a central limit theorem for the estimator. Compared with the Nadaraya-Watson estimator, the local linear estimator achieves a bias reduction under the different sampling schemes, whether or not the kernel function is symmetric.

Keywords: local linear estimator; stable Lévy motions; bias reduction; consistency; central limit theorem.

MSC 2010 subject classifications: 60J52; 62G20; 62M05; 65C30.

## 1 Introduction

Continuous-time models play an important role in the study of financial time series. Many models in economics and finance, such as those for an interest rate or an asset price, involve continuous-time diffusion processes, and their theoretical and empirical applications to finance are quite extensive (see Jacod and Shiryaev [18]). However, growing evidence shows that stochastic processes with jumps are becoming more and more important (see Andersen et al. [2]; Bakshi et al. [7]; Duffie et al. [11]). Recently, stochastic processes with jumps, as an extension of continuous-path ones, have been studied by more and more statisticians, since they characterize financial phenomena better (see Aït-Sahalia and Jacod [1]; Bandi and Nguyen [3]).

A diffusion model with continuous paths is represented by the following stochastic differential equation:

$$dX_t=\mu(X_t)\,dt+\sigma(X_t)\,dW_t, \tag{1.1}$$

where $W_t$ is a standard Brownian motion, the drift $\mu(\cdot)$ is an unknown measurable function and the diffusion $\sigma(\cdot)$ is an unknown positive function. Many authors have investigated nonparametric estimation of the drift function $\mu(\cdot)$ and the diffusion function $\sigma(\cdot)$, which to some extent prevents misspecification of model (1.1) compared with parametric estimation. Prakasa Rao [27] constructed a nonparametric estimator similar to the Nadaraya-Watson estimator for $\mu(\cdot)$. Bandi and Phillips [4] discussed the Nadaraya-Watson estimator for these functions for non-stationary recurrent diffusion processes. Fan and Zhang [15] proposed local linear estimators for them and obtained bias reduction properties. In finite samples, Xu [34] extended the re-weighted idea proposed by Hall and Presnell [16] to estimate $\mu(\cdot)$ under recurrence. Xu [33] discussed empirical likelihood-based inference for nonparametric recurrent diffusions to construct confidence intervals. Furthermore, Bandi and Phillips [5] proposed a simple and robust approach to specify a parametric class of diffusions and estimate the parameters of interest by minimizing criteria based on the integrated squared difference between kernel estimates of the drift and diffusion functions and their parametric counterparts.

Recently, stochastic processes with jumps have received much attention in various applications, for instance, in financial time series to reflect the discontinuity of asset returns (see Bakshi et al. [7]; Duffie et al. [11]; Johannes [20]; Bandi and Nguyen [3]). In this paper, we consider a stochastic process with jumps given by the stochastic differential equation driven by an α-stable Lévy motion ($1<\alpha<2$):

$$dX_t=\mu(X_{t-})\,dt+\sigma(X_{t-})\,dZ_t,\qquad X_0=\eta, \tag{1.2}$$

where $\{Z_t\}$ is a standard α-stable Lévy motion defined on a probability space equipped with a right-continuous and increasing family of σ-algebras, and $\eta$ is a random variable independent of $Z$. $Z_1$ has an α-stable distribution with the characteristic function:

$$E\exp\{iuZ_1\}=\exp\Big\{-|u|^{\alpha}\Big(1-i\beta\,\mathrm{sgn}(u)\tan\frac{\alpha\pi}{2}\Big)\Big\},\qquad u\in\mathbb R,$$

where $\beta\in[-1,1]$ is the skewness parameter. One can refer to Sato [30] and Barndorff-Nielsen et al. [6] for more detailed properties of stable distributions. Usually, we get discrete observations $\{X_{t_i}\}_{i=0}^{n}$ for model (1.2), where $\Delta=t_{i+1}-t_i$ is the observation frequency and $n$ is the sample size. This paper is devoted to the nonparametric estimation of the unknown drift function, and our estimation procedure for model (1.2) is based on these discrete observations.
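For intuition, a trajectory of model (1.2) can be simulated with an Euler scheme: over a step of length $\Delta$ the α-stable increment scales like $\Delta^{1/\alpha}$. The pure-Python sketch below (our own helper names; symmetric case $\beta=0$ only, via the Chambers-Mallows-Stuck method) illustrates the model and is not part of the estimation procedure.

```python
import math
import random

def stable_rv(alpha, rng):
    """One symmetric alpha-stable variate (beta = 0),
    drawn by the Chambers-Mallows-Stuck method."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha))

def euler_path(mu, sigma, x0, n, delta, alpha, seed=0):
    """Euler scheme for dX_t = mu(X_t-)dt + sigma(X_t-)dZ_t;
    the stable increment over [t_i, t_{i+1}] is delta**(1/alpha) * Z_1."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n):
        z = delta ** (1.0 / alpha) * stable_rv(alpha, rng)
        path.append(path[-1] + mu(path[-1]) * delta + sigma(path[-1]) * z)
    return path

# Hypothetical mean-reverting drift with constant diffusion coefficient.
path = euler_path(mu=lambda v: -2.0 * v, sigma=lambda v: 1.0,
                  x0=0.0, n=5000, delta=0.01, alpha=1.5)
```

The heavy tails of the stable increments produce the occasional large jumps that motivate the robustness considerations discussed below.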

The stochastic differential equation driven by a Lévy motion has recently received growing interest from both theoreticians and practitioners, with applications to finance, climate dynamics, etc. Masuda ([23], [24]) proved some probabilistic properties of multidimensional diffusion processes with jumps and provided mild regularity conditions for a multidimensional Ornstein-Uhlenbeck process driven by a general Lévy process, with any initial distribution, to be exponentially β-mixing. A special case of model (1.2) is the mean-reverting Ornstein-Uhlenbeck process driven by a Lévy process, in which the drift is known to be linear and depends on unknown parameters to be estimated. In this setting, Hu and Long [17] studied the least-squares estimator of the drift parameter when $Z$ is symmetric α-stable. Masuda [25] considered an approximate self-weighted least absolute deviation type estimator. Zhou and Yu [37] proved the asymptotic distributions of the least squares estimator of the mean reversion parameter, allowing for nonlinearity in the diffusion function, under three sampling schemes. However, in model (1.2), the drift function is seldom known and the diffusion function may be nonlinear in reality. With no prior specified form of the drift function, Long and Qian [22] discussed the Nadaraya-Watson estimator for it and obtained weak consistency and a central limit theorem.

The Nadaraya-Watson estimator locally approximates $\mu$ by a constant (a zero-degree polynomial). However, in the context of nonparametric estimation with finite-dimensional auxiliary variables, local polynomial smoothing has become the "golden standard" (see Fan [12], Wand and Jones [36]). The local polynomial estimator is known to share the simplicity and consistency of kernel estimators such as the Nadaraya-Watson or Gasser-Müller estimators, but avoids boundary effects, at least as far as convergence rates are concerned. Local polynomial smoothing at a point $x$ fits a polynomial to the observation pairs whose covariates fall in a neighborhood of $x$ determined by a smoothing parameter (bandwidth). The local polynomial estimator has received increasing attention and has gained acceptance as an attractive method for the nonparametric estimation of a function and its derivatives; it has become a powerful and useful diagnostic tool for data analysis. In particular, the local linear estimator locally fits a polynomial of degree one. In this paper, we propose the local linear estimator for the drift function in model (1.2). As a nonparametric methodology, the local polynomial estimator uses the observed information to estimate the corresponding functions without assuming a functional form. The estimator is obtained by locally fitting a polynomial of degree one to the data via weighted least squares, and it shows advantages over the Nadaraya-Watson approach (see Fan and Gijbels [13]). For further motivation and study of the local linear estimator, see Fan and Gijbels [14], Ruppert and Wand [29], Stone [32], Cleveland [10].

The remainder of this paper is organized as follows. In Section 2, local linear estimator and appropriate assumptions for model (1.2) are introduced. In Section 3, we present some technical lemmas and asymptotic results. The proofs will be collected in Section 4.

## 2 Local Linear Estimator and Assumptions

We first lay out some notation. For simplicity, we shall omit the subscript $n$ in $\Delta_n$ and $h_n$ if no confusion is caused. We will use "$\stackrel{p}{\longrightarrow}$" to denote convergence in probability, "$\stackrel{a.s.}{\longrightarrow}$" to denote almost sure convergence and "$\Rightarrow$" to denote convergence in distribution.

The local polynomial estimator, first introduced in Fan [12], has been widely used in regression analysis and time series analysis. It has gained acceptance as an attractive method for the nonparametric estimation of a regression function and its derivatives. The estimator is obtained by locally fitting a $p$-th degree polynomial to the data via weighted least squares, and it shows advantages over other kernel nonparametric regression estimators. The idea of weighted local polynomial regression is the following: under some smoothness conditions on the curve $m(\cdot)$, we can expand $m(x)$ in a neighborhood of a point $x_0$ as follows:

$$m(x)\approx m(x_0)+m'(x_0)(x-x_0)+\frac{m''(x_0)}{2!}(x-x_0)^2+\cdots+\frac{m^{(p)}(x_0)}{p!}(x-x_0)^p\equiv\sum_{j=0}^{p}\beta_j(x-x_0)^j,$$

where $\beta_j=m^{(j)}(x_0)/j!$, $j=0,1,\ldots,p$.

Thus, the problem of estimating the infinite-dimensional $m(\cdot)$ is equivalent to estimating the $(p+1)$-dimensional parameter $(\beta_0,\beta_1,\ldots,\beta_p)$. Consider the weighted local polynomial regression:

$$\mathop{\arg\min}_{\beta_0,\beta_1,\cdots,\beta_p}\sum_{i=0}^{n-1}\Big\{Y_i-\sum_{j=0}^{p}\beta_j(X_i-x)^j\Big\}^2K_{h_n}(X_i-x),$$

where $K_{h_n}(\cdot)=K(\cdot/h_n)/h_n$ and $K$ is a kernel function with bandwidth $h_n$.
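The minimization above is an ordinary weighted least-squares problem in the $p+1$ coefficients, so it reduces to solving $(p+1)\times(p+1)$ normal equations. A minimal pure-Python sketch (the Epanechnikov kernel and all function names are our own illustrative choices):

```python
def epanechnikov(u):
    """Compactly supported kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def local_poly_fit(xs, ys, x, h, p):
    """Minimize sum_i {y_i - sum_j beta_j (x_i - x)^j}^2 K_h(x_i - x);
    returns (beta_0, ..., beta_p), where beta_0 estimates m(x)."""
    # Build the (p+1)x(p+1) weighted normal equations A beta = b.
    A = [[0.0] * (p + 1) for _ in range(p + 1)]
    b = [0.0] * (p + 1)
    for xi, yi in zip(xs, ys):
        w = epanechnikov((xi - x) / h) / h   # K_h(x_i - x)
        if w == 0.0:
            continue
        d = xi - x
        for j in range(p + 1):
            b[j] += w * yi * d ** j
            for k in range(p + 1):
                A[j][k] += w * d ** (j + k)
    return gauss_solve(A, b)

def gauss_solve(A, b):
    """Solve A beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (M[r][n] - sum(M[r][k] * beta[k]
                                 for k in range(r + 1, n))) / M[r][r]
    return beta
```

On data generated by an exactly linear curve, the local linear fit ($p=1$) recovers the curve exactly, which is the source of its boundary-bias advantage.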

Since we are interested in estimating $\mu(\cdot)$ itself, it is reasonable, as Fan and Gijbels [14] remarked, to take $p=1$: the local linear estimator for the drift function. The local linear estimator for $\mu(x)$ is the solution of the optimization problem:

$$\mathop{\arg\min}_{\beta_0,\beta_1}\sum_{i=0}^{n-1}\Big\{Y_i-\sum_{j=0}^{1}\beta_j(X_i-x)^j\Big\}^2K_{h}(X_i-x).$$

The solution for $\beta_0$ is

$$\hat\mu(x)=\frac{\sum_{i=0}^{n-1}K_h(X_i-x)\Big\{\frac{S_{n,2}}{h^2}-\Big(\frac{X_i-x}{h}\Big)\frac{S_{n,1}}{h}\Big\}(X_{i+1}-X_i)}{\Delta\sum_{i=0}^{n-1}K_h(X_i-x)\Big\{\frac{S_{n,2}}{h^2}-\Big(\frac{X_i-x}{h}\Big)\frac{S_{n,1}}{h}\Big\}},$$

where $S_{n,j}=\sum_{i=0}^{n-1}K_h(X_i-x)(X_i-x)^j$, $j=1,2$.

We can also write $\hat\mu(x)=\hat g_n(x)/\hat h_n(x)$, where

$$\hat h_n(x)=\frac{1}{n}\sum_{i=0}^{n-1}K_h(X_i-x)\Big\{\frac{S_{n,2}}{nh^2}-\Big(\frac{X_i-x}{h}\Big)\frac{S_{n,1}}{nh}\Big\},$$
$$\hat g_n(x)=\frac{1}{n\Delta}\sum_{i=0}^{n-1}K_h(X_i-x)\Big\{\frac{S_{n,2}}{nh^2}-\Big(\frac{X_i-x}{h}\Big)\frac{S_{n,1}}{nh}\Big\}(X_{i+1}-X_i).$$
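The ratio $\hat g_n(x)/\hat h_n(x)$ can be computed directly from a discretely observed path. A sketch in pure Python (our own names; Epanechnikov kernel as a concrete choice; for the toy path we use Gaussian increments as a stand-in for the stable driver, since only the weighting scheme is being illustrated):

```python
import random

def epan(u):
    """Epanechnikov kernel, compact support on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def drift_local_linear(X, delta, x, h):
    """Local linear drift estimator mu_hat(x) = g_n(x) / h_n(x):
    weights K_h(X_i - x){S_{n,2}/h^2 - ((X_i - x)/h) S_{n,1}/h},
    responses (X_{i+1} - X_i)/delta."""
    n = len(X) - 1
    Kh = [epan((X[i] - x) / h) / h for i in range(n)]
    S1 = sum(Kh[i] * (X[i] - x) for i in range(n))       # S_{n,1}
    S2 = sum(Kh[i] * (X[i] - x) ** 2 for i in range(n))  # S_{n,2}
    num = den = 0.0
    for i in range(n):
        w = Kh[i] * (S2 / h ** 2 - ((X[i] - x) / h) * (S1 / h))
        den += w
        num += w * (X[i + 1] - X[i]) / delta
    return num / den

# Toy path: mean-reverting drift mu(x) = -2x, unit diffusion coefficient.
rng = random.Random(1)
delta, n = 0.01, 200000
X = [0.0]
for _ in range(n):
    X.append(X[-1] - 2.0 * X[-1] * delta + rng.gauss(0.0, delta ** 0.5))

est = drift_local_linear(X, delta, x=0.5, h=0.3)  # true value mu(0.5) = -1
```

With a long observation horizon ($n\Delta$ large) and small $h$, the estimate concentrates around $\mu(x)$, in line with the consistency result of Section 3.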

We now present some assumptions used in this paper.

(A.1) The drift function $\mu$ is twice continuously differentiable with bounded first and second order derivatives; the diffusion function $\sigma$ satisfies a global Lipschitz condition: there exists a positive constant $C$ such that

$$|\sigma(y)-\sigma(x)|\le C|y-x|,\qquad y,\,x\in\mathbb R.$$

(A.2) There exist positive constants such that, for each $t$,

(A.3) The solution $\{X_t\}$ of (1.2) admits a unique invariant distribution and is geometrically strongly mixing, i.e. there exist constants $C>0$ and $\rho>0$ such that the strong mixing coefficient satisfies $\alpha_X(t)\le Ce^{-\rho t}$.

(A.4) The density function $f$ of the stationary distribution is continuously differentiable and $f(x)>0$.

(A.5) The kernel function $K$ is a nonnegative probability density function with compact support; write $K_j=\int_{-\infty}^{+\infty}u^jK(u)\,du$ for $j\ge 0$.

(A.6) As $n\to\infty$: $\Delta\to 0$, $h\to 0$ and $n\Delta h\to\infty$.

Condition (A.1) ensures that (1.2) admits a unique non-explosive càdlàg adapted solution; see Jacod and Shiryaev [18]. (A.3) implies that $\{X_t\}$ is ergodic and stationary. The mixing property of a stochastic process describes the temporal dependence in the data; one can refer to Bradley [9] for the different kinds of mixing properties, and to Masuda [24] for sufficient conditions that guarantee (A.3). The kernel function is not necessarily symmetric; sometimes a unilateral kernel function may make prediction easier (see Fan and Zhang [15]).
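The kernel moments $K_j=\int u^jK(u)\,du$ appearing in the limits of Section 3 are easy to tabulate for a concrete kernel satisfying (A.5). A small sketch using the Epanechnikov kernel and a midpoint-rule integration (helper names are ours):

```python
def epanech(u):
    """Epanechnikov kernel: a density with compact support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_moment_j(j, m=100000):
    """Numerically integrate u^j K(u) over [-1, 1] by the midpoint rule."""
    s = 0.0
    for i in range(m):
        u = -1.0 + (i + 0.5) * 2.0 / m
        s += u ** j * epanech(u)
    return s * 2.0 / m

K0 = kernel_moment_j(0)   # should be 1 (K is a density)
K1 = kernel_moment_j(1)   # 0 by symmetry of this kernel
K2 = kernel_moment_j(2)   # 1/5 for the Epanechnikov kernel
```

For an asymmetric (e.g. one-sided) kernel, $K_1\neq 0$, which is exactly the case where the bias comparison of Section 3 favors the local linear estimator.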

## 3 Some Technical Lemmas and Asymptotic Results

We say that a continuous function $G$ grows more slowly than $x^{\alpha}$ if there exist positive constants $c$ and $C$ such that $G(\lambda x)\le C\lambda^{\alpha}G(x)$ for all $\lambda\ge c$ and all $x>0$.

Lemma 3.1. Let $\{\phi(t)\}$ be a predictable process satisfying $\int_0^T|\phi(t)|^{\alpha}\,dt<\infty$ almost surely for every $T>0$. We assume that either $\phi$ is nonnegative or $Z$ is symmetric. If $G$ grows more slowly than $x^{\alpha}$, then there exist positive constants $c_1$ and $c_2$ depending only on $\alpha$ and $G$ such that for each $T>0$,

$$c_1E\Big[G\Big(\Big(\int_0^T|\phi(t)|^{\alpha}dt\Big)^{1/\alpha}\Big)\Big]\le E\Big[G\Big(\sup_{t\le T}\Big|\int_0^t\phi(s)\,dZ_s\Big|\Big)\Big]\le c_2E\Big[G\Big(\Big(\int_0^T|\phi(t)|^{\alpha}dt\Big)^{1/\alpha}\Big)\Big].$$

This lemma can be viewed as a generalization of Theorem 3.2 in Rosinski and Woyczynski [28], where only the case that $Z$ is symmetric was dealt with.

Lemma 3.2. Suppose that there is a deterministic and nonnegative function $\Phi(T)$ such that

$$\Phi^{\alpha}(T)\int_0^T|\phi(t)|^{\alpha}\,dt\stackrel{p}{\longrightarrow}1\quad\text{as}\quad T\to\infty.$$

Then, we have

$$\Phi(T)\int_0^T|\phi(t)|\,dZ_t\Longrightarrow S_{\alpha}(1,\beta,0).$$

This lemma can be regarded as an extension of Theorem 1.19 in Kutoyants [21] to the α-stable case.

Lemma 3.3. Assumptions (A.1)-(A.6) lead to the following result: for each nonnegative integer $k$,

$$\frac{1}{n}\sum_{i=0}^{n-1}K_h(X_i-x)\Big(\frac{X_i-x}{h}\Big)^k\stackrel{a.s.}{\longrightarrow}f(x)\int_{-\infty}^{+\infty}u^kK(u)\,du.$$

In Long and Qian [22], a weaker version of this result (convergence in probability) was proved; the limits of the normalized sums $S_{n,1}/(nh)$ and $S_{n,2}/(nh^2)$ are easily obtained from this lemma.
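Lemma 3.3 can be checked numerically. In the even simpler i.i.d. setting (a special case of the mixing framework the lemma covers), the kernel average converges to $f(x)\int u^kK(u)\,du$. A quick Monte Carlo sketch with a standard normal density, so that $f(0)=1/\sqrt{2\pi}$, and the Epanechnikov kernel (all names are ours):

```python
import math
import random

def epa(u):
    """Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_average(sample, x, h, k):
    """(1/n) sum_i K_h(X_i - x) ((X_i - x)/h)^k."""
    n = len(sample)
    return sum(epa((v - x) / h) / h * ((v - x) / h) ** k
               for v in sample) / n

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200000)]

# Limits: f(0) * int u^k K(u) du, with f the standard normal density.
est0 = kernel_average(sample, 0.0, 0.3, 0)  # near f(0) ~ 0.3989
est1 = kernel_average(sample, 0.0, 0.3, 1)  # near f(0) * K_1 = 0
```

The residual discrepancy is of order $h^2$ (smoothing bias) plus the Monte Carlo error of order $1/\sqrt{nh}$.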

Theorem 3.1. Assume that (A.1)-(A.6) hold; then $\hat\mu(x)\stackrel{p}{\longrightarrow}\mu(x)$ as $n\to\infty$.

Theorem 3.2. Let $x$ be fixed and assume that (A.1)-(A.6) are satisfied.

If and for some , then

$$(n\Delta h)^{1-\frac{1}{\alpha}}\Lambda(x)\big(\hat\mu(x)-\mu(x)\big)\Rightarrow S_{\alpha}(1,\beta,0).$$

If and for some , then

$$(n\Delta h)^{1-\frac{1}{\alpha}}\Lambda(x)\big(\hat\mu(x)-\mu(x)-h^2\Gamma_{\mu}(x)\big)\Rightarrow S_{\alpha}(1,\beta,0),$$

where $\Lambda(x)$ and $\Gamma_{\mu}(x)$ denote the corresponding scaling and bias factors.

In Long and Qian [22], the following results for the Nadaraya-Watson estimator were shown under the assumptions of this paper. If the kernel $K$ is not symmetric, then

$$(n\Delta h)^{1-\frac{1}{\alpha}}\Lambda(x)\big(\hat\mu(x)-\mu(x)-hK_1\big)\Rightarrow S_{\alpha}(1,\beta,0).$$

If the kernel $K$ is symmetric, then

$$(n\Delta h)^{1-\frac{1}{\alpha}}\Lambda(x)\big(\hat\mu(x)-\mu(x)-h^2\Gamma_{\mu}(x)\big)\Rightarrow S_{\alpha}(1,\beta,0),$$

where $\Lambda(x)$ and $\Gamma_{\mu}(x)$ are as above. Comparing the results of this paper with those of Long and Qian [22], we can easily observe that the bias in the local linear case is smaller than that in the Nadaraya-Watson case, whether $K$ is symmetric or not; a further comparison follows easily from Theorem 3.2.

## 4 Proofs

Proof of Lemma 3.1. See Long and Qian [22] (Lemma 2.7).

Proof of Lemma 3.2. See Long and Qian [22] (Lemma 2.6).

Proof of Lemma 3.3. We first note that

$$\begin{aligned}&\frac{1}{n}\sum_{i=0}^{n-1}K_h(X_i-x)\Big(\frac{X_i-x}{h}\Big)^k-f(x)\int_{-\infty}^{+\infty}u^kK(u)\,du\\&=\Big\{\frac{1}{n}\sum_{i=0}^{n-1}K_h(X_i-x)\Big(\frac{X_i-x}{h}\Big)^k-\frac{1}{n}\sum_{i=0}^{n-1}E\Big[K_h(X_i-x)\Big(\frac{X_i-x}{h}\Big)^k\Big]\Big\}\\&\quad+\Big\{\frac{1}{n}\sum_{i=0}^{n-1}E\Big[K_h(X_i-x)\Big(\frac{X_i-x}{h}\Big)^k\Big]-f(x)\int_{-\infty}^{+\infty}u^kK(u)\,du\Big\}.\end{aligned}\tag{4.1}$$

From the stationarity of , we have:

$$\begin{aligned}\frac{1}{n}\sum_{i=0}^{n-1}E\Big[K_h(X_i-x)\Big(\frac{X_i-x}{h}\Big)^k\Big]&=E\Big[K_h(X_1-x)\Big(\frac{X_1-x}{h}\Big)^k\Big]=\int_{-\infty}^{+\infty}K_h(y-x)\Big(\frac{y-x}{h}\Big)^kf(y)\,dy\\&=\int_{-\infty}^{+\infty}K(u)u^kf(x+uh)\,du\longrightarrow f(x)\int_{-\infty}^{+\infty}u^kK(u)\,du.\end{aligned}\tag{4.2}$$

Thus, from (4.1) and (4.2) it suffices to prove that

$$\begin{aligned}&\frac{1}{n}\sum_{i=0}^{n-1}K_h(X_i-x)\Big(\frac{X_i-x}{h}\Big)^k-\frac{1}{n}\sum_{i=0}^{n-1}E\Big[K_h(X_i-x)\Big(\frac{X_i-x}{h}\Big)^k\Big]\\&=\frac{1}{n}\sum_{i=0}^{n-1}\Big\{K_h(X_i-x)\Big(\frac{X_i-x}{h}\Big)^k-E\Big[K_h(X_i-x)\Big(\frac{X_i-x}{h}\Big)^k\Big]\Big\}=:\frac{1}{n}\sum_{i=0}^{n-1}\delta_{n,i}(x)\stackrel{a.s.}{\longrightarrow}0.\end{aligned}\tag{4.3}$$

Note that $|\delta_{n,i}(x)|\le C_0h^{-1}$ a.s. for some positive constant $C_0$, by the compact support of $K$. Applying Theorem 1.3 (2) in Bosq [8], we have, for each integer $q\ge 1$ and each $\varepsilon>0$,

$$P\Big(\frac{1}{n}\Big|\sum_{i=0}^{n-1}\delta_{n,i}(x)\Big|>\varepsilon\Big)\le 4\exp\Big(-\frac{\varepsilon^2q}{8\nu^2(q)}\Big)+22\Big(1+\frac{4C_0}{h\varepsilon}\Big)^{1/2}q\,\alpha_X([p]\Delta),\tag{4.4}$$

where

$$\nu^2(q)=\frac{2}{p^2}s(q)+\frac{C_0\varepsilon}{2h}$$

with $p=\frac{n}{2q}$ and

$$s(q)=\max_{0\le j\le 2q-1}E\Big[([jp]+1-jp)\delta_{n,[jp]+1}(x)+\delta_{n,[jp]+2}(x)+\cdots+\delta_{n,[(j+1)p]}(x)+((j+1)p-[(j+1)p])\delta_{n,[(j+1)p]+1}(x)\Big]^2.$$

Using the Hölder inequality and the stationarity of $\{X_t\}$, one can easily obtain that $s(q)=O(p^2/h)$.

By choosing $q=O\big(\sqrt{n\Delta/h}\big)$ and $p=n/(2q)$, we get

$$\frac{\varepsilon^2q}{8\nu^2(q)}=\varepsilon^2\cdot O(qh)=O\big(\varepsilon^2\sqrt{n\Delta h}\big).\tag{4.5}$$

Moreover, we can obtain

$$22\Big(1+\frac{4C_0}{h\varepsilon}\Big)^{1/2}q\,\alpha_X([p]\Delta)\le C(\varepsilon)\exp\big(-O(\varepsilon^2\sqrt{n\Delta h})\big)\tag{4.6}$$

under the mixing property of $\{X_t\}$ in (A.3) and condition (A.6).

(4.4), (4.5) and (4.6) imply

$$P\Big(\frac{1}{n}\Big|\sum_{i=0}^{n-1}\delta_{n,i}(x)\Big|>\varepsilon\Big)\le C(\varepsilon)\exp\big(-O(\varepsilon^2\sqrt{n\Delta h})\big).$$

Therefore, $\frac{1}{n}\sum_{i=0}^{n-1}\delta_{n,i}(x)\stackrel{a.s.}{\longrightarrow}0$ by the Borel-Cantelli lemma and (A.6).

Proof of Theorem 3.1. Since $\hat\mu(x)=\hat g_n(x)/\hat h_n(x)$ and Lemma 3.3 yields the almost sure limit of $\hat h_n(x)$, it suffices to prove that

$$\hat g_n(x)\stackrel{p}{\longrightarrow}\big[K_2f^2(x)-(K_1f(x))^2\big]\mu(x).$$

By (1.2), writing $w_i(x)=\frac{S_{n,2}}{nh^2}-\big(\frac{X_i-x}{h}\big)\frac{S_{n,1}}{nh}$ and using $X_{i+1}-X_i=\mu(X_i)\Delta+\int_{t_i}^{t_{i+1}}\big(\mu(X_{s-})-\mu(X_i)\big)ds+\int_{t_i}^{t_{i+1}}\sigma(X_{s-})\,dZ_s$, we first note that

$$\begin{aligned}\hat g_n(x)&=\frac{1}{n}\sum_{i=0}^{n-1}K_h(X_i-x)w_i(x)\mu(X_i)+\frac{1}{n\Delta}\sum_{i=0}^{n-1}K_h(X_i-x)w_i(x)\int_{t_i}^{t_{i+1}}\big(\mu(X_{s-})-\mu(X_i)\big)ds\\&\quad+\frac{1}{n\Delta}\sum_{i=0}^{n-1}K_h(X_i-x)w_i(x)\int_{t_i}^{t_{i+1}}\sigma(X_{s-})\,dZ_s\\&=:g_{n,1}(x)+g_{n,2}(x)+g_{n,3}(x).\end{aligned}$$

To show the convergence of $\hat g_n(x)$, we prove the following three results:

(i) $g_{n,1}(x)\stackrel{p}{\longrightarrow}\big[K_2f^2(x)-(K_1f(x))^2\big]\mu(x)$;

(ii) $g_{n,2}(x)\stackrel{p}{\longrightarrow}0$;

(iii) $g_{n,3}(x)\stackrel{p}{\longrightarrow}0$.

$$\begin{aligned}g_{n,1}(x)&=\mu(x)\frac{1}{n}\sum_{i=0}^{n-1}K_h(X_i-x)\Big\{\frac{S_{n,2}}{nh^2}-\Big(\frac{X_i-x}{h}\Big)\frac{S_{n,1}}{nh}\Big\}\\&\quad+\frac{1}{n}\sum_{i=0}^{n-1}K_h(X_i-x)\Big\{\frac{S_{n,2}}{nh^2}-\Big(\frac{X_i-x}{h}\Big)\frac{S_{n,1}}{nh}\Big\}\big(\mu(X_i)-\mu(x)\big)\\&=:A_{n,1}(x)+A_{n,2}(x).\end{aligned}\tag{4.7}$$

Using Lemma 3.3, it is obvious that

$$A_{n,1}(x)\stackrel{a.s.}{\longrightarrow}\mu(x)\big[K_2f^2(x)-(K_1f(x))^2\big].\tag{4.8}$$

By the Lipschitz property of and the stationarity of , we have

$$|A_{n,2}(x)|\le\frac{L}{n}\sum_{i=0}^{n-1}|X_i-x|K_h(X_i-x)\Big|\frac{S_{n,2}}{nh^2}\Big|+\frac{L}{n}\sum_{i=0}^{n-1}|X_i-x|K_h(X_i-x)\Big|\frac{X_i-x}{h}\Big|\Big|\frac{S_{n,1}}{nh}\Big|,\tag{4.9}$$

where $L$ denotes the bound of the first derivative of $\mu$.

The two terms on the right-hand side are dealt with in the same way, so for convenience we only treat the first one:

$$\begin{aligned}\frac{1}{n}\sum_{i=0}^{n-1}|X_i-x|K_h(X_i-x)\Big|\frac{S_{n,2}}{nh^2}\Big|&=\frac{1}{n}\sum_{i=0}^{n-1}\Big(|X_i-x|K_h(X_i-x)\Big|\frac{S_{n,2}}{nh^2}\Big|-E\Big[|X_i-x|K_h(X_i-x)\Big|\frac{S_{n,2}}{nh^2}\Big|\Big]\Big)\\&\quad+\frac{1}{n}\sum_{i=0}^{n-1}E\Big[|X_i-x|K_h(X_i-x)\Big|\frac{S_{n,2}}{nh^2}\Big|\Big].\end{aligned}\tag{4.10}$$

We find that $|S_{n,2}/(nh^2)|$ is a.s. uniformly bounded for each $i$, by Lemma 3.3 and the compact support of $K$. Similarly to the proof of (4.3), we can show that

$$\frac{1}{n}\sum_{i=0}^{n-1}\Big(|X_i-x|K_h(X_i-x)\Big|\frac{S_{n,2}}{nh^2}\Big|-E\Big[|X_i-x|K_h(X_i-x)\Big|\frac{S_{n,2}}{nh^2}\Big|\Big]\Big)\stackrel{p}{\longrightarrow}0.\tag{4.11}$$

As for the second part,

$$\begin{aligned}\lim_{h\to 0}\frac{1}{n}\sum_{i=0}^{n-1}E\Big[|X_i-x|K_h(X_i-x)\Big|\frac{S_{n,2}}{nh^2}\Big|\Big]&=\lim_{h\to 0}E\Big[|X_1-x|K_h(X_1-x)\Big|\frac{S_{n,2}}{nh^2}\Big|\Big]\\&=\lim_{h\to 0}K_2f(x)E\big[|X_1-x|K_h(X_1-x)\big]\\&=\lim_{h\to 0}hK_2f(x)\int_{-\infty}^{+\infty}|u|K(u)f(x+uh)\,du\\&=\lim_{h\to 0}hK_2f^2(x)\int_{-\infty}^{+\infty}|u|K(u)\,du=0.\end{aligned}\tag{4.12}$$

It follows that $A_{n,2}(x)\stackrel{p}{\longrightarrow}0$ as $h\to 0$. Hence (i) holds by (4.7)-(4.12).

To deal with $g_{n,2}(x)$, we first introduce a basic inequality for (1.2):

$$\sup_{t_i\le t\le t_{i+1}}|X_t-X_{t_i}|\le e^{L\Delta}\Big(|\mu(X_i)|\Delta+\sup_{t_i\le t\le t_{i+1}}\Big|\int_{t_i}^{t}\sigma(X_{s-})\,dZ_s\Big|\Big),\tag{4.13}$$

for which one can refer to Long and Qian [22], Shimizu and Yoshida [31] and Jacod and Protter [19]. Then

$$\begin{aligned}|g_{n,2}(x)|&\le\frac{1}{n\Delta}\sum_{i=0}^{n-1}K_h(X_i-x)\Big\{\Big|\frac{S_{n,2}}{nh^2}\Big|+\Big|\frac{X_i-x}{h}\Big|\Big|\frac{S_{n,1}}{nh}\Big|\Big\}\int_{t_i}^{t_{i+1}}|\mu(X_{s-})-\mu(X_i)|\,ds\\&\le\frac{L}{n\Delta}\sum_{i=0}^{n-1}K_h(X_i-x)\Big\{\Big|\frac{S_{n,2}}{nh^2}\Big|+\Big|\frac{X_i-x}{h}\Big|\Big|\frac{S_{n,1}}{nh}\Big|\Big\}\int_{t_i}^{t_{i+1}}|X_{s-}-X_i|\,ds\\&\le\frac{L}{n}\sum_{i=0}^{n-1}K_h(X_i-x)\Big\{\Big|\frac{S_{n,2}}{nh^2}\Big|+\Big|\frac{X_i-x}{h}\Big|\Big|\frac{S_{n,1}}{nh}\Big|\Big\}\sup_{t_i\le t\le t_{i+1}}|X_t-X_{t_i}|\\&\le\cdots=:A_{n,3}(x)+A_{n,4}(x)+A_{n,5}(x).\end{aligned}\tag{4.14}$$

Similarly to the proof of (4.1), the terms $A_{n,3}(x)$ and $A_{n,4}(x)$ are dealt with by the same approach, hence here we only treat $A_{n,5}(x)$. According to Lemma 3.3, we only need to verify

$$\frac{1}{n}\sum_{i=0}^{n-1}K_h(X_i-x)\sup_{t_i\le t\le t_{i+1}}\Big|\int_{t_i}^{t}\sigma(X_{s-})\,dZ_s\Big|\stackrel{p}{\longrightarrow}0.\tag{4.15}$$

By the Markov inequality and Lemma 3.1, we have

$$\begin{aligned}P\Big(\frac{1}{n}\sum_{i=0}^{n-1}K_h(X_i-x)\sup_{t_i\le t\le t_{i+1}}\Big|\int_{t_i}^{t}\sigma(X_{s-})\,dZ_s\Big|>\varepsilon\Big)&\le\frac{1}{n\varepsilon}\sum_{i=0}^{n-1}E\Big[\sup_{t_i\le t\le t_{i+1}}\Big|\int_{t_i}^{t}K_h(X_i-x)\sigma(X_{s-})\,dZ_s\Big|\Big]\\&\le\cdots\end{aligned}$$