# Approximate continuous-discrete filters for the estimation of diffusion processes from partial and noisy observations

J.C. Jimenez
Instituto de Cibernética, Matemática y Física
Departamento de Matemática Interdisciplinaria
Calle 15, no. 551, Vedado, La Habana, Cuba
e-mail: jcarlos@icimaf.cu
###### Abstract

In this paper, an alternative approximation to the innovation method is introduced for the parameter estimation of diffusion processes from partial and noisy observations. It is based on a convergent approximation to the first two conditional moments of the innovation process through approximate continuous-discrete filters of minimum variance. It is shown that, for finite samples, the resulting approximate estimators converge to the exact one as the error of the approximate filters decreases. For an increasing number of observations, the estimators are asymptotically normally distributed and their bias decreases when the above-mentioned error does. A simulation study is provided to illustrate the performance of the new estimators. The results show that, with respect to the conventional approximate estimators, the new ones significantly enhance the parameter estimation of the test equations. The proposed estimators are intended for the recurrent practical situation where a nonlinear stochastic system must be identified from a reduced number of partial and noisy observations distant in time.

## 1 Introduction

The statistical inference for diffusion processes described by Stochastic Differential Equations (SDEs) is currently a subject of intensive research. A basic difficulty of this statistical problem is that, except for a few simple examples, the joint distribution of the discrete-time observations of the process has no known closed form. In addition, if only some components of the diffusion process, contaminated with noise, are observed, then an extra complication arises. Typically, in this situation, the statistical problem under consideration is reformulated in the framework of continuous-discrete state space models, where the SDE to be estimated defines the continuous state equation and the given observations are described in terms of a discrete observation equation. For such a class of models, a number of estimators based on analytical and simulated approximations have been developed in the last four decades. See, for instance, Nielsen et al. (2000a) and Jimenez et al. (2006) for a review.

In particular, the present paper deals with the class of innovation estimators for the parameter estimation of SDEs given a time series of partial and noisy observations. These are the estimators obtained by maximizing a normal log-likelihood function of the discrete-time innovations associated with the underlying continuous-discrete state space model. Approximations to this class of estimators have been derived by approximating the discrete-time innovations by means of inexact filters. With this purpose, approximate continuous-discrete filters like the Local Linearization (Ozaki 1994, Shoji 1998, Jimenez & Ozaki 2006), the extended Kalman (Nielsen & Madsen 2001, Singer 2002), and the second order (Nielsen et al. 2000b, Singer 2002) filters have been used, as well as discrete-discrete filters after the discretization of the SDE by means of a numerical scheme (Ozaki & Iino 2001, and Peng et al. 2002). The approximate innovation estimators obtained in this way have been useful for the identification, from actual data, of a variety of neurophysiological, financial and molecular models, among others (see, e.g., Calderon 2009, Chiarella et al. 2009, Jimenez et al. 2006, Riera et al. 2004, Valdes et al. 1999). However, a common feature of the approximate innovation estimators mentioned above is that, once the observations are given, the error between the approximate and the exact innovations is fixed and completely determined by the distance between observations. Clearly, this fixes the bias of the approximate estimators for finite samples and obstructs their asymptotic correction when the number of observations increases.

In this paper, an alternative approximation to the innovation estimator for diffusion processes is introduced, which is oriented to reduce and control the estimation bias. It is based on a recursive approximation of the first two conditional moments of the innovation process through approximate filters that converge to the linear one of minimum variance. It is shown that, for finite samples, the resulting approximate estimators converge to the exact one when the error of the approximate filters decreases. For an increasing number of observations, they are asymptotically normally distributed and their bias decreases when the above-mentioned error does. As a particular instance, the approximate innovation estimators designed with the order-$\beta$ Local Linearization filters are presented. Their convergence, practical algorithms and performance in simulations are also considered in detail. The simulations show that, with respect to the conventional approximations to the innovation estimators, the new approximate estimators significantly enhance the parameter estimation of the test equations given a reduced number of partial and noisy observations distant in time, which is a typical situation in many practical inference problems.

The paper is organized as follows. In Section 2, basic notations and definitions are presented. In Section 3, the new approximate estimators are defined and some of their properties are studied. As a particular instance, the order-$\beta$ innovation estimator based on convergent Local Linearization filters is presented in Section 4, as well as algorithms for its practical implementation. In the last section, the performance of the new estimators is illustrated with various examples.

## 2 Notation and preliminaries

Let $(\Omega, \mathcal{F}, P)$ be the underlying complete probability space, let $\{\mathcal{F}_t, t \geq t_0\}$ be an increasing right-continuous family of complete sub-$\sigma$-algebras of $\mathcal{F}$, and let $x = \{x(t), t \geq t_0\}$ be a $d$-dimensional diffusion process defined by the stochastic differential equation

$$ dx(t) = f(t, x(t); \theta)\,dt + \sum_{i=1}^{m} g_i(t, x(t); \theta)\,dw^i(t) \tag{1} $$

for $t \geq t_0$, where $f$ and $g_i$ are differentiable functions, $w = (w^1, \ldots, w^m)$ is an $m$-dimensional $\mathcal{F}_t$-adapted standard Wiener process, $\theta$ is a vector of parameters, and $\mathcal{D}_{\theta} \subset \mathbb{R}^p$ is a compact set. Linear growth, uniform Lipschitz and smoothness conditions on the functions $f$ and $g_i$ that ensure the existence and uniqueness of a strong solution of (1) with bounded moments are assumed for all $\theta \in \mathcal{D}_{\theta}$.

Let us consider the state space model defined by the continuous state equation (1) and the discrete observation equation

$$ z(t_k) = C x(t_k) + e_{t_k}, \quad \text{for } k = 0, 1, \ldots, M-1, \tag{2} $$

where $\{e_{t_k}\}$ is a sequence of $r$-dimensional i.i.d. Gaussian random vectors with zero mean and variance $\Pi_{t_k}$, independent of $w$, $\Pi_{t_k}$ is an $r \times r$ positive semi-definite matrix, and $C$ is an $r \times d$ matrix. Here, it is assumed that the time instants $t_k$ define an increasing sequence $\{t_k : k = 0, \ldots, M-1\}$ with $t_0 < t_1 < \cdots < t_{M-1}$.
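To fix ideas, a realization of the state space model (1)-(2) can be generated numerically: the state equation is integrated on a fine grid with the Euler-Maruyama scheme, and the observations are then produced through the discrete observation equation. The sketch below is only illustrative; the function names and the sub-step parameter `n_sub` are assumptions of this sketch, not part of the paper.

```python
import numpy as np

def simulate_model(f, g, theta, x0, t_obs, C, Pi, n_sub=20, rng=None):
    """Simulate the state equation (1) by Euler-Maruyama on a fine grid
    and generate observations through the discrete equation (2).
    f(t, x, theta) is the drift; g(t, x, theta) the d x m diffusion matrix."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    z = []
    for k in range(len(t_obs)):
        if k > 0:
            # integrate from t_{k-1} to t_k with n_sub Euler-Maruyama steps
            h = (t_obs[k] - t_obs[k - 1]) / n_sub
            t = t_obs[k - 1]
            for _ in range(n_sub):
                dw = rng.normal(scale=np.sqrt(h), size=g(t, x, theta).shape[1])
                x = x + f(t, x, theta) * h + g(t, x, theta) @ dw
                t += h
        # observation z_{t_k} = C x(t_k) + e_{t_k}, with e ~ N(0, Pi)
        e = rng.multivariate_normal(np.zeros(Pi.shape[0]), Pi)
        z.append(C @ x + e)
    return np.array(z)
```

For instance, with a scalar Ornstein-Uhlenbeck drift $f = -\theta_1 x$ and constant diffusion, this returns an $M \times r$ array of noisy observations.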

Suppose that, through (2), partial and noisy observations of the diffusion process defined by (1) with $\theta = \theta_0$ are given on $\{t_k\}$. In particular, denote by $Z = \{z_{t_k} : k = 0, \ldots, M-1\}$ the sequence of these observations, where $z_{t_k}$ denotes the observation at $t_k$ for all $k$.

The inference problem to be considered here is the estimation of the parameters $\theta$ of the SDE (1) given the time series $Z$. Specifically, let us consider the innovation estimator defined as follows.

###### Definition 1

(Ozaki, 1994) Given $M$ observations $Z$ of the continuous-discrete state space model (1)-(2) with $\theta = \theta_0$ on $\{t_k\}$,

$$ \hat{\theta}_M = \arg\!\left\{ \min_{\theta} U_M(\theta, Z) \right\} \tag{3} $$

defines the innovation estimator of $\theta$, where

$$ U_M(\theta, Z) = (M-1)\ln(2\pi) + \sum_{k=1}^{M-1} \ln\!\left(\det\left(\Sigma_{t_k}\right)\right) + \nu_{t_k}^{\top} \left(\Sigma_{t_k}\right)^{-1} \nu_{t_k}, $$

$\nu_{t_k}$ is the discrete innovation of the model (1)-(2) and $\Sigma_{t_k}$ the innovation variance for all $k$.

In the above definition,

$$ \nu_{t_k} = z_{t_k} - C x_{t_k/t_{k-1}} \qquad \text{and} \qquad \Sigma_{t_k} = C U_{t_k/t_{k-1}} C^{\top} + \Pi_{t_k}, $$

where $x_{t_k/t_{k-1}}$ and $U_{t_k/t_{k-1}}$ denote the conditional mean and variance of the diffusion process at $t_k$ given the observations up to $t_{k-1}$. Here, the predictions $x_{t_k/t_{k-1}}$ and $U_{t_k/t_{k-1}}$ are recursively computed through the Linear Minimum Variance filter for the model (1)-(2). Because the first two conditional moments of the innovation are correctly specified, Theorem 1 in Ljung & Caines (1979) for prediction error estimators implies the consistency and asymptotic normality of the innovation estimator (3) under conventional regularity conditions (Ozaki 1994, Nolsoe et al. 2000).
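Given the one-step predictions produced by some filter, the criterion $U_M$ of Definition 1 is straightforward to evaluate. The sketch below assumes the predictions have already been stacked in lists indexed by $k$, a convention of this sketch rather than of the paper:

```python
import numpy as np

def innovation_criterion(z, x_pred, U_pred, C, Pi):
    """Evaluate U_M(theta, Z) of Definition 1 from the one-step prediction
    means x_pred[k] and variances U_pred[k] (k = 1, ..., M-1) supplied by
    whichever filter is in use."""
    M = len(z)
    U = (M - 1) * np.log(2 * np.pi)
    for k in range(1, M):
        nu = z[k] - C @ x_pred[k]            # innovation nu_{t_k}
        Sigma = C @ U_pred[k] @ C.T + Pi     # innovation variance Sigma_{t_k}
        U += np.log(np.linalg.det(Sigma)) + nu @ np.linalg.solve(Sigma, nu)
    return U
```

Minimizing this function over $\theta$ (with the predictions recomputed at each trial value of $\theta$) yields the innovation estimator (3).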

In general, since the conditional mean and variance of equation (1) have no explicit formulas, approximations to them are needed. If $\tilde{x}_{t_k/t_{k-1}}$ and $\tilde{U}_{t_k/t_{k-1}}$ are approximations to $x_{t_k/t_{k-1}}$ and $U_{t_k/t_{k-1}}$, then the estimator

$$ \hat{\vartheta}_M = \arg\!\left\{ \min_{\theta} \tilde{U}_M(\theta, Z) \right\}, $$

with

$$ \tilde{U}_M(\theta, Z) = (M-1)\ln(2\pi) + \sum_{k=1}^{M-1} \ln\!\left(\det\left(\tilde{\Sigma}_{t_k}\right)\right) + \tilde{\nu}_{t_k}^{\top} \left(\tilde{\Sigma}_{t_k}\right)^{-1} \tilde{\nu}_{t_k} $$

provides an approximation to the innovation estimator (3), where

$$ \tilde{\nu}_{t_k} = z_{t_k} - C \tilde{x}_{t_k/t_{k-1}} \qquad \text{and} \qquad \tilde{\Sigma}_{t_k} = C \tilde{U}_{t_k/t_{k-1}} C^{\top} + \Pi_{t_k} $$

are approximations to $\nu_{t_k}$ and $\Sigma_{t_k}$.

Approximate estimators of this type have been considered early on in a number of papers. Approximate continuous-discrete filters like the Local Linearization filters (Ozaki 1994, Shoji 1998, Jimenez & Ozaki 2006), the extended Kalman filter (Nielsen & Madsen 2001, Singer 2002), and second order filters (Nielsen et al. 2000b, Singer 2002) have been used for approximating the values of $x_{t_k/t_{k-1}}$ and $U_{t_k/t_{k-1}}$. On the other hand, in Ozaki & Iino (2001) and Peng et al. (2002), discrete-discrete filters have also been used after the discretization of equation (1) by means of a numerical scheme. In all these approximations, once the data are given (and so the time partition $\{t_k\}$ is specified), the error between the approximate and the exact predictions is completely settled by the distances between observations and cannot be reduced. In this way, the difference between the approximate innovation estimator and the exact one cannot be reduced either. Clearly, this is an important limitation of these approximate estimators. Nevertheless, in a number of practical situations (see Jimenez & Ozaki 2006, Jimenez et al. 2006, and references therein) the bias of the approximate innovation estimators is negligible. Therefore, these estimators have been useful for the identification, from actual data, of a variety of neurophysiological, financial and molecular models, among others, as mentioned above. Further, in a simulation study with the Cox-Ingersoll-Ross model of short-term interest rate, approximate innovation methods have provided similar or better results than those obtained by prediction-based estimating functions, but with a much lower computational cost (Nolsoe et al., 2000). Similar results have been reported in a comparative study with the approximate likelihood via simulation method (Singer, 2002).

Denote by $C_P^l(\mathbb{R}^d, \mathbb{R})$ the space of $l$ times continuously differentiable functions $g$ for which $g$ and all its partial derivatives up to order $l$ have polynomial growth.

## 3 Order-β innovation estimator

Let $(\tau)_h = \{\tau_n : n = 0, \ldots, N\}$ be a time discretization of $[t_0, t_{M-1}]$ such that $\{t_k\} \subset (\tau)_h$, and let $y_{\tau_n}$ be the approximate value of $x(\tau_n)$ obtained from a discretization of equation (1) for all $\tau_n \in (\tau)_h$. Let us consider the continuous-time approximation $y = \{y(t), t \in [t_0, t_{M-1}]\}$ of $x$ with initial conditions

 (4)

satisfying the bound condition

$$ \mathrm{E}\!\left( |y(t)|^{2q} \,\middle|\, \mathcal{F}_{t_0} \right) \leq L \tag{5} $$

for all $t \geq t_0$; and the weak convergence criterion

$$ \sup_{t_k \leq t \leq t_{k+1}} \left| \mathrm{E}\!\left( g(x(t)) \,\middle|\, \mathcal{F}_{t_k} \right) - \mathrm{E}\!\left( g(y(t)) \,\middle|\, \mathcal{F}_{t_k} \right) \right| \leq L_k h^{\beta} \tag{6} $$

for all $k$ and any function $g \in C_P^{2(\beta+1)}(\mathbb{R}^d, \mathbb{R})$, where $q \geq 1$, $L$ and $L_k$ are positive constants, $\beta > 0$, and $h = \max_n (\tau_{n+1} - \tau_n)$. The process $y$ defined in this way is typically called an order-$\beta$ approximation to $x$ in weak sense (Kloeden & Platen, 1999). The second conditional moment of $y(t)$ is also assumed to be positive definite and continuous for all $t$.

In addition, let us consider the following approximation to the Linear Minimum Variance (LMV) filter of the model (1)-(2).

###### Definition 2

(Jimenez 2012b) Given a time discretization $(\tau)_h$, the order-$\beta$ Linear Minimum Variance filter for the state space model (1)-(2) is defined, between observations, by

$$ y_{t/t} = \mathrm{E}\!\left( y(t) \,|\, Z_t \right) \qquad \text{and} \qquad V_{t/t} = \mathrm{E}\!\left( y(t) y^{\top}(t) \,|\, Z_t \right) - y_{t/t}\, y_{t/t}^{\top} \tag{7} $$

for all $t \in (t_k, t_{k+1})$, and by

$$ y_{t_{k+1}/t_{k+1}} = y_{t_{k+1}/t_k} + K_{t_{k+1}} \left( z_{t_{k+1}} - C y_{t_{k+1}/t_k} \right), \tag{8} $$
$$ V_{t_{k+1}/t_{k+1}} = V_{t_{k+1}/t_k} - K_{t_{k+1}} C V_{t_{k+1}/t_k}, \tag{9} $$

for each observation $z_{t_{k+1}}$ at $t_{k+1}$, with filter gain

$$ K_{t_{k+1}} = V_{t_{k+1}/t_k} C^{\top} \left( C V_{t_{k+1}/t_k} C^{\top} + \Pi_{t_{k+1}} \right)^{-1} \tag{10} $$

for all $k$, where $y$ is an order-$\beta$ approximation to the solution of (1) in weak sense, and $Z_t = \{z_{t_k} : t_k \leq t\}$ are the given observations of (1)-(2) up to the time instant $t$. The predictions $y_{t_{k+1}/t_k}$ and $V_{t_{k+1}/t_k}$, with initial conditions $y_{t_0/t_0}$ and $V_{t_0/t_0}$, are defined for all $k$.
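At each observation time, the update (8)-(10) is an ordinary Kalman-type measurement update. A minimal sketch, with the prediction step left to whichever order-$\beta$ scheme is chosen:

```python
import numpy as np

def lmv_update(y_pred, V_pred, z_new, C, Pi):
    """Measurement update (8)-(10) of the LMV filter at an observation
    time t_{k+1}; y_pred and V_pred are the filter predictions
    y_{t_{k+1}/t_k} and V_{t_{k+1}/t_k}."""
    S = C @ V_pred @ C.T + Pi                  # innovation variance
    K = V_pred @ C.T @ np.linalg.inv(S)        # filter gain (10)
    y_filt = y_pred + K @ (z_new - C @ y_pred)  # mean update (8)
    V_filt = V_pred - K @ C @ V_pred            # variance update (9)
    return y_filt, V_filt
```

In the scalar case with $C = \Pi = V_{\text{pred}} = 1$, the gain is $1/2$ and the filtered variance is halved, as expected from (9)-(10).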

Once an order-$\beta$ approximation $y$ to the solution of equation (1) is chosen, and so an order-$\beta$ LMV filter is specified, the following approximate innovation estimator can naturally be defined.

###### Definition 3

Given $M$ observations $Z$ of the continuous-discrete state space model (1)-(2) with $\theta = \theta_0$ on $\{t_k\}$, the order-$\beta$ innovation estimator for the parameters of (1) is defined by

$$ \hat{\theta}_M(h) = \arg\!\left\{ \min_{\theta} U_{M,h}(\theta, Z) \right\}, \tag{11} $$

where

$$ U_{M,h}(\theta, Z) = (M-1)\ln(2\pi) + \sum_{k=1}^{M-1} \ln\!\left(\det\left(\Sigma_{h,t_k}\right)\right) + \nu_{h,t_k}^{\top} \left(\Sigma_{h,t_k}\right)^{-1} \nu_{h,t_k}, $$

with $\nu_{h,t_k} = z_{t_k} - C y_{t_k/t_{k-1}}$ and $\Sigma_{h,t_k} = C V_{t_k/t_{k-1}} C^{\top} + \Pi_{t_k}$, where $y_{t_k/t_{k-1}}$ and $V_{t_k/t_{k-1}}$ are the prediction mean and variance of an order-$\beta$ LMV filter for the model (1)-(2), and $h$ is the maximum stepsize of the time discretization $(\tau)_h$ associated to the filter.

In principle, according to the above definitions, any kind of approximation $y$ converging to $x$ in a weak sense can be used to construct an approximate order-$\beta$ LMV filter and so an approximate order-$\beta$ innovation estimator. In this way, the Euler-Maruyama scheme, the Local Linearization scheme and any high order numerical scheme for SDEs, such as those considered in Kloeden & Platen (1999), might be used as well. However, the approximations $y_{t_k/t_{k-1}}$ and $V_{t_k/t_{k-1}}$ to $x_{t_k/t_{k-1}}$ and $U_{t_k/t_{k-1}}$ in (3) at each $t_k$ will now be derived from the predictions of the approximate LMV filter after various iterations with stepsizes lower than $t_k - t_{k-1}$. Note that, when $h = \max_k (t_{k+1} - t_k)$, an order-$\beta$ LMV filter might reduce to one of the conventional approximations to the exact LMV filter. In this situation, the corresponding order-$\beta$ innovation estimator reduces to one of the approximate innovation estimators mentioned in Section 2; in particular, to those considered in Ozaki (1994), Shoji (1998) and Jimenez & Ozaki (2006) when Local Linearization schemes are used to define the order-$\beta$ LMV filters.
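As an illustration of the simplest choice, the sketch below propagates the filter mean and covariance between two observations with `n_sub` Euler sub-steps, linearizing the drift through its Jacobian and assuming, for brevity, a state-independent diffusion matrix; these simplifications are assumptions of this sketch, not the paper's scheme. Refining `n_sub` reduces the effective stepsize $h$ below $t_k - t_{k-1}$, which is precisely the mechanism the order-$\beta$ estimators exploit:

```python
import numpy as np

def euler_moment_prediction(y, V, f, Jf, G, t0, t1, n_sub):
    """Propagate the filter mean y and covariance V from t0 to t1 with
    n_sub Euler sub-steps (an order-1 weak scheme). Jf is the Jacobian
    of the drift f, and G(t) the state-independent diffusion matrix."""
    h = (t1 - t0) / n_sub
    t = t0
    for _ in range(n_sub):
        A = Jf(t, y)                                   # drift Jacobian at the mean
        y = y + f(t, y) * h                            # mean step
        V = V + (A @ V + V @ A.T + G(t) @ G(t).T) * h  # covariance step
        t += h
    return y, V
```

For a linear scalar drift $f(t, y) = -y$, the predicted mean after integrating over a unit interval approaches $e^{-1}$ as `n_sub` grows, consistent with the weak convergence criterion (6) for $\beta = 1$.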

Note that the goodness of the approximation $y$ to $x$ is measured (in weak sense) by the left hand side of (6). Thus, the inequality (6) gives a bound for the errors of the approximation $y$ to $x$ for all $t$ between each pair of consecutive observations. Moreover, this inequality states the convergence (in weak sense and with rate $\beta$) of the approximation $y$ to $x$ as the maximum stepsize $h$ of the time discretization $(\tau)_h$ goes to zero. Clearly this includes, as a particular case, the convergence of the first two conditional moments of $y$ to those of $x$, which implies the convergence of the order-$\beta$ LMV filter (7)-(10) to the exact LMV filter stated by Theorem 5 in Jimenez (2012b). Since the approximate innovation estimator (11) is designed in terms of the order-$\beta$ LMV filter (7)-(10), the weak convergence of $y$ to $x$ should then imply the convergence of the approximate innovation estimator (11) to the exact one (3) and the similarity of their asymptotic properties, as $h$ goes to zero. The next results deal with these matters.

### 3.1 Convergence

For a finite sample of observations of the state space model (1)-(2), Theorem 5 in Jimenez (2012b) states the convergence of the order-$\beta$ LMV filters to the exact LMV one when $h$ decreases. Therefore, the convergence of the order-$\beta$ innovation estimator to the exact innovation estimator is predictable when $h$ goes to zero.

###### Theorem 4

Let $Z$ be a time series of $M$ observations of the state space model (1)-(2) with $\theta = \theta_0$ on the time partition $\{t_k\}$. Let $\hat{\theta}_M$ and $\hat{\theta}_M(h)$ be, respectively, the innovation and an order-$\beta$ innovation estimator for the parameters of (1) given $Z$. Then

$$ \left| \hat{\theta}_M(h) - \hat{\theta}_M \right| \rightarrow 0 $$

as $h \rightarrow 0$. Moreover,

$$ \mathrm{E}\!\left( \left| \hat{\theta}_M(h) - \hat{\theta}_M \right| \right) \rightarrow 0 $$

as $h \rightarrow 0$, where the expectation is with respect to the measure on the underlying probability space generating the realizations of the model (1)-(2) with $\theta = \theta_0$.

Proof. Defining $\Delta\Sigma_{h,t_k} = \Sigma_{t_k} - \Sigma_{h,t_k}$, it follows that

$$ \det\left(\Sigma_{h,t_k}\right) = \det\left(\Sigma_{t_k} - \Delta\Sigma_{h,t_k}\right) = \det\left(\Sigma_{t_k}\right) \det\!\left( I - \Sigma_{t_k}^{-1} \Delta\Sigma_{h,t_k} \right) \tag{12} $$

and

$$ \Sigma_{h,t_k}^{-1} = \left( \Sigma_{t_k} - \Delta\Sigma_{h,t_k} \right)^{-1} = \Sigma_{t_k}^{-1} + \Sigma_{t_k}^{-1} \Delta\Sigma_{h,t_k} \left( I - \Sigma_{t_k}^{-1} \Delta\Sigma_{h,t_k} \right)^{-1} \Sigma_{t_k}^{-1} \tag{13} $$

By using these two identities and the identity

$$ \begin{aligned} (z_{t_k} - \mu_{h,t_k})^{\top} \left(\Sigma_{h,t_k}\right)^{-1} (z_{t_k} - \mu_{h,t_k}) = {}& (z_{t_k} - \mu_{t_k})^{\top} \left(\Sigma_{h,t_k}\right)^{-1} (z_{t_k} - \mu_{t_k}) \\ & + (z_{t_k} - \mu_{t_k})^{\top} \left(\Sigma_{h,t_k}\right)^{-1} (\mu_{t_k} - \mu_{h,t_k}) \\ & + (\mu_{t_k} - \mu_{h,t_k})^{\top} \left(\Sigma_{h,t_k}\right)^{-1} (z_{t_k} - \mu_{t_k}) \\ & + (\mu_{t_k} - \mu_{h,t_k})^{\top} \left(\Sigma_{h,t_k}\right)^{-1} (\mu_{t_k} - \mu_{h,t_k}), \end{aligned} \tag{14} $$

with $\mu_{t_k} = C x_{t_k/t_{k-1}}$ and $\mu_{h,t_k} = C y_{t_k/t_{k-1}}$, it is obtained that

$$ U_{M,h}(\theta, Z) = U_M(\theta, Z) + R_{M,h}(\theta), \tag{15} $$

where $U_M$ and $U_{M,h}$ are defined in (3) and (11), respectively, and

$$ \begin{aligned} R_{M,h}(\theta) = \sum_{k=1}^{M-1} {}& \ln\!\left(\det\left( I - \Sigma_{t_k}^{-1} \Delta\Sigma_{h,t_k} \right)\right) + (z_{t_k} - \mu_{t_k})^{\top} M_{h,t_k} (z_{t_k} - \mu_{t_k}) \\ & + (z_{t_k} - \mu_{t_k})^{\top} \left(\Sigma_{h,t_k}\right)^{-1} (\mu_{t_k} - \mu_{h,t_k}) + (\mu_{t_k} - \mu_{h,t_k})^{\top} \left(\Sigma_{h,t_k}\right)^{-1} (z_{t_k} - \mu_{t_k}) \\ & + (\mu_{t_k} - \mu_{h,t_k})^{\top} \left(\Sigma_{h,t_k}\right)^{-1} (\mu_{t_k} - \mu_{h,t_k}) \end{aligned} $$

with $M_{h,t_k} = \left(\Sigma_{h,t_k}\right)^{-1} - \left(\Sigma_{t_k}\right)^{-1}$.

Theorem 5 in Jimenez (2012b) deals with the convergence of the order-$\beta$ filters to the exact LMV one. In particular, for the predictions, it states that

$$ \left| x_{t_k/t_{k-1}} - y_{t_k/t_{k-1}} \right| \leq K h^{\beta} \qquad \text{and} \qquad \left| U_{t_k/t_{k-1}} - V_{t_k/t_{k-1}} \right| \leq K h^{\beta} \tag{16} $$

for all $k$, where $K$ is a positive constant. Here, we recall that $x_{t_k/t_{k-1}}$ and $U_{t_k/t_{k-1}}$ are the predictions of the exact LMV filter for the model (1)-(2), whereas $y_{t_k/t_{k-1}}$ and $V_{t_k/t_{k-1}}$ are the predictions of the order-$\beta$ filter. From this, and taking into account that $\mu_{t_k} = C x_{t_k/t_{k-1}}$, $\mu_{h,t_k} = C y_{t_k/t_{k-1}}$, $\Sigma_{t_k} = C U_{t_k/t_{k-1}} C^{\top} + \Pi_{t_k}$ and $\Sigma_{h,t_k} = C V_{t_k/t_{k-1}} C^{\top} + \Pi_{t_k}$, it follows that

$$ \left| \mu_{t_k} - \mu_{h,t_k} \right| \rightarrow 0 \qquad \text{and} \qquad \left| \Sigma_{t_k} - \Sigma_{h,t_k} \right| \rightarrow 0 $$

as $h \rightarrow 0$ for all $k$ and $\theta \in \mathcal{D}_{\theta}$. This and the finite bound for the first two conditional moments of $x$ and $y$ imply that $R_{M,h}(\theta) \rightarrow 0$ as well when $h \rightarrow 0$. From this and (15),

$$ \left| \hat{\theta}_M(h) - \hat{\theta}_M \right| = \left| \arg\!\left\{ \min_{\theta} \left\{ U_M(\theta, Z) + R_{M,h}(\theta) \right\} \right\} - \arg\!\left\{ \min_{\theta} U_M(\theta, Z) \right\} \right| \rightarrow 0 \tag{17} $$

as $h \rightarrow 0$, which implies the first assertion of the theorem.

On the other hand, since the constant $K$ in (16) does not depend on a specific realization of the model (1)-(2), it follows from these inequalities that

$$ \mathrm{E}\!\left( \left| x_{t_k/t_{k-1}} - y_{t_k/t_{k-1}} \right| \right) \leq K h^{\beta} \qquad \text{and} \qquad \mathrm{E}\!\left( \left| U_{t_k/t_{k-1}} - V_{t_k/t_{k-1}} \right| \right) \leq K h^{\beta}, $$

where the new expectation here is with respect to the measure on the underlying probability space generating the realizations of the model (1)-(2) with $\theta = \theta_0$. From this and (17), it follows that $\mathrm{E}(|\hat{\theta}_M(h) - \hat{\theta}_M|) \rightarrow 0$ as $h \rightarrow 0$, which concludes the proof.

The first assertion of this theorem states that, for each given data $Z$, the order-$\beta$ innovation estimator converges to the exact one as $h$ goes to zero. Because $h$ controls the weak convergence criterion (6), it is then clear that the order-$\beta$ innovation estimator (11) converges to the exact one (3) when the error (in weak sense) of the order-$\beta$ approximation $y$ to $x$ decreases or, equivalently, when the error between the order-$\beta$ filter and the exact LMV filter decreases. On the other hand, the second assertion implies that the average of the errors corresponding to different realizations of the model (1)-(2) decreases when $h$ does.

The next theorem deals with the error between the averages of the estimators $\hat{\theta}_M$ and $\hat{\theta}_M(h)$ computed for different realizations of the state space model.

###### Theorem 5

Let $Z$ be a time series of $M$ observations of the state space model (1)-(2) with $\theta = \theta_0$ on the time partition $\{t_k\}$. Let $\hat{\theta}_M$ and $\hat{\theta}_M(h)$ be, respectively, the innovation and an order-$\beta$ innovation estimator for the parameters of (1) given $Z$. Then,

$$ \left| \mathrm{E}\!\left( \hat{\theta}_M(h) \right) - \mathrm{E}\!\left( \hat{\theta}_M \right) \right| \rightarrow 0 $$

as $h \rightarrow 0$, where the expectation is with respect to the measure on the underlying probability space generating the realizations of the model (1)-(2) with $\theta = \theta_0$.

Proof. Trivially,

$$ \left| \mathrm{E}\!\left( \hat{\theta}_M(h) \right) - \mathrm{E}\!\left( \hat{\theta}_M \right) \right| = \left| \mathrm{E}\!\left( \hat{\theta}_M(h) - \hat{\theta}_M \right) \right| \leq \mathrm{E}\!\left( \left| \hat{\theta}_M(h) - \hat{\theta}_M \right| \right), $$

where the expectation is taken with respect to the measure on the underlying probability space generating the realizations of the model (1)-(2) with $\theta = \theta_0$. From this and the second assertion of Theorem 4, the proof is completed.

Here, it is worth remarking that the conventional approximate innovation estimators mentioned in Section 2 do not have the desired convergence properties stated in the theorems above for the order-$\beta$ innovation estimator. Further note that neither in Definition 3 nor in Theorems 4 and 5 has any restriction on the time partition $\{t_k\}$ for the data been assumed. Thus, there are no specific constraints on the time distance between two consecutive observations, which allows the application of the order-$\beta$ innovation estimator in a variety of practical problems with a reduced number of observations not close in time, with sequential random measurements, or with multiple missing data. Nor are there restrictions on the time discretization $(\tau)_h$ on which the order-$\beta$ innovation estimator is defined. Thus, $(\tau)_h$ can be set by the user by taking into account some specifications or previous knowledge of the inference problem under consideration, or automatically designed by an adaptive strategy, as will be shown in the section concerning the numerical simulations.
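One simple way to realize such an adaptive strategy is to recompute the estimator with successively halved stepsizes until successive estimates stabilize. The interface `estimate_fn(h)`, standing for a whole fit at stepsize $h$, is hypothetical and only illustrates the idea:

```python
import numpy as np

def refine_until_stable(estimate_fn, h0, tol=1e-3, max_halvings=10):
    """Halve the stepsize of the order-beta innovation estimator until two
    successive estimates differ by less than tol, as suggested by the
    convergence results of Theorems 4 and 5."""
    h = h0
    theta_prev = estimate_fn(h)
    for _ in range(max_halvings):
        h /= 2
        theta = estimate_fn(h)
        if np.max(np.abs(theta - theta_prev)) < tol:
            return theta, h
        theta_prev = theta
    return theta_prev, h
```

Since the estimator converges at rate $h^{\beta}$, the difference between successive estimates shrinks geometrically and the loop terminates after a moderate number of halvings.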

### 3.2 Asymptotic properties

In this section, asymptotic properties of the approximate innovation estimator will be studied by using a general result obtained in Ljung & Caines (1979) for prediction error estimators. According to that, the relation between the estimator and the global minimum of the function

$$ W_M(\theta) = \mathrm{E}\!\left( U_M(\theta, Z) \right) \quad \text{with } \theta \in \mathcal{D}_{\theta} \tag{18} $$

should be considered, where $U_M$ is defined in (3) and the expectation is taken with respect to the measure on the underlying probability space generating the realizations of the state space model (1)-(2). Here, it is worth remarking that the minimizer of (18) is not an estimator of $\theta$ since the function $W_M$ does not depend on a given data $Z$. In fact, it indexes the best predictor, in the sense that the average prediction error loss function is minimized at this parameter (Ljung & Caines, 1979).

In what follows, regularity conditions for the unique identifiability of the state space model (1)-(2) are assumed, which are typically satisfied by stationary and ergodic diffusion processes (see, e.g., Ljung & Caines, 1979).

###### Lemma 6

If $\Sigma_{t_k}$ is positive definite for all $k$, then the function $W_M$ defined in (18) has a unique minimum and

$$ \arg\!\left\{ \min_{\theta \in \mathcal{D}_{\theta}} W_M(\theta) \right\} = \theta_0. \tag{19} $$

Proof. Since $\Sigma_{t_k}$ is positive definite for all $k$, Lemma A.2 in Bollerslev & Wooldridge (1992) ensures that $\theta_0$ is the unique minimum of the function

$$ l_k(\theta) = \mathrm{E}\!\left( \ln\!\left(\det\left(\Sigma_{t_k}\right)\right) + \nu_{t_k}^{\top} \left(\Sigma_{t_k}\right)^{-1} \nu_{t_k} \,\middle|\, Z_{t_{k-1}} \right) $$

on $\mathcal{D}_{\theta}$ for all $k$, where $\nu_{t_k} = z_{t_k} - C x_{t_k/t_{k-1}}$ and $\Sigma_{t_k} = C U_{t_k/t_{k-1}} C^{\top} + \Pi_{t_k}$. Consequently, under the assumed unique identifiability of the model (1)-(2), $\theta_0$ is then the unique minimum of

$$ W_M(\theta) = (M-1)\ln(2\pi) + \sum_{k=1}^{M-1} \mathrm{E}\!\left( l_k(\theta) \right) $$

on $\mathcal{D}_{\theta}$.

Denote by $U'_{M,h}$ the derivative of $U_{M,h}$ with respect to $\theta$, and by $W''_M$ the second derivative of $W_M$ with respect to $\theta$.

###### Theorem 7

Let $Z$ be a time series of $M$ observations of the state space model (1)-(2) with $\theta = \theta_0$ on the time partition $\{t_k\}$. Let $\hat{\theta}_M(h)$ be an order-$\beta$ innovation estimator for the parameters of (1) given $Z$. Then

$$ \hat{\theta}_M(h) - \theta_0 \rightarrow \Delta\theta_M(h) \tag{20} $$

w.p.1 as $M \rightarrow \infty$, where $\Delta\theta_M(h) \rightarrow 0$ as $h \rightarrow 0$. Moreover, if for some $\epsilon > 0$ there exists $M_0$ such that

$$ W''_M(\theta) > \epsilon I \qquad \text{and} \qquad H_{M,h}(\theta) = M\, \mathrm{E}\!\left( U'_{M,h}(\theta, Z) \left( U'_{M,h}(\theta, Z) \right)^{\top} \right) > \epsilon I \tag{21} $$

for all $M > M_0$ and $\theta \in \mathcal{D}_{\theta}$, then

$$ \sqrt{M}\, P_{M,h}^{-1/2} \left( \hat{\theta}_M(h) - \theta_0 \right) \sim \mathcal{N}\!\left( \Delta\theta_M(h), I \right) \tag{22} $$

as $M \rightarrow \infty$, where $P_{M,h} = \left( W''_M \right)^{-1} H_{M,h} \left( W''_M \right)^{-1} + \Delta P_{M,h}$ with $\Delta P_{M,h} \rightarrow 0$ as $h \rightarrow 0$.

Proof. Let $W_{M,h}(\theta) = \mathrm{E}(U_{M,h}(\theta, Z))$ and $\alpha_M(h) = \arg\{\min_{\theta \in \mathcal{D}_{\theta}} W_{M,h}(\theta)\}$, where $U_{M,h}$ is defined in (11).

For a fixed $h$, Theorem 1 in Ljung & Caines (1979) implies that

$$ \hat{\theta}_M(h) - \alpha_M(h) \rightarrow 0 \tag{23} $$

w.p.1 as $M \rightarrow \infty$; and

$$ \sqrt{M}\, P_{M,h}^{-1/2}(\alpha_M(h)) \left( \hat{\theta}_M(h) - \alpha_M(h) \right) \sim \mathcal{N}(0, I) \tag{24} $$

as $M \rightarrow \infty$, where

$$ P_{M,h}(\theta) = \left( W''_{M,h}(\theta) \right)^{-1} H_{M,h}(\theta) \left( W''_{M,h}(\theta) \right)^{-1} $$

with $H_{M,h}(\theta) = M\, \mathrm{E}\!\left( U'_{M,h}(\theta, Z) \left( U'_{M,h}(\theta, Z) \right)^{\top} \right)$.

By using the identities (12)-(14), the function

$$ W_{M,h}(\theta) = (M-1)\ln(2\pi) + \sum_{k=1}^{M-1} \mathrm{E}\!\left( \ln\!\left(\det\left(\Sigma_{h,t_k}\right)\right) + (z_{t_k} - \mu_{h,t_k})^{\top} \left(\Sigma_{h,t_k}\right)^{-1} (z_{t_k} - \mu_{h,t_k}) \right), $$

with $\mu_{h,t_k} = C y_{t_k/t_{k-1}}$, can be written as

$$ W_{M,h}(\theta) = W_M(\theta) + \mathrm{E}\!\left( R_{M,h}(\theta) \right), \tag{25} $$

where $W_M$ is defined in (18) and

$$ \begin{aligned} R_{M,h}(\theta) = \sum_{k=1}^{M-1} {}& \mathrm{E}\!\left( \ln\!\left(\det\left( I - \Sigma_{t_k}^{-1} \Delta\Sigma_{h,t_k} \right)\right) \,\middle|\, \mathcal{F}_{t_{k-1}} \right) + \mathrm{E}\!\left( (z_{t_k} - \mu_{t_k})^{\top} M_{h,t_k} (z_{t_k} - \mu_{t_k}) \,\middle|\, \mathcal{F}_{t_{k-1}} \right) \\ & + \mathrm{E}\!\left( (z_{t_k} - \mu_{t_k})^{\top} \left(\Sigma_{h,t_k}\right)^{-1} (\mu_{t_k} - \mu_{h,t_k}) \,\middle|\, \mathcal{F}_{t_{k-1}} \right) + \mathrm{E}\!\left( (\mu_{t_k} - \mu_{h,t_k})^{\top} \left(\Sigma_{h,t_k}\right)^{-1} (z_{t_k} - \mu_{t_k}) \,\middle|\, \mathcal{F}_{t_{k-1}} \right) \\ & + \mathrm{E}\!\left( (\mu_{t_k} - \mu_{h,t_k})^{\top} \left(\Sigma_{h,t_k}\right)^{-1} (\mu_{t_k} - \mu_{h,t_k}) \,\middle|\, \mathcal{F}_{t_{k-1}} \right) \end{aligned} $$

with $\Delta\Sigma_{h,t_k} = \Sigma_{t_k} - \Sigma_{h,t_k}$, $M_{h,t_k} = \left(\Sigma_{h,t_k}\right)^{-1} - \left(\Sigma_{t_k}\right)^{-1}$, $\mu_{t_k} = C x_{t_k/t_{k-1}}$ and $\mu_{h,t_k} = C y_{t_k/t_{k-1}}$.

Denote by $W''_{M,h}$ and $R''_{M,h}$ the second derivatives of $W_{M,h}$ and $R_{M,h}$ with respect to $\theta$.

Taking into account that

$$ \left( W''_{M,h}(\theta) \right)^{-1} = \left( W''_M(\theta) + \mathrm{E}\!\left( R''_{M,h}(\theta) \right) \right)^{-1} = \left( W''_M(\theta) \right)^{-1} + K_{M,h}(\theta) $$

with

$$ K_{M,h}(\theta) = -\left( W''_M(\theta) \right)^{-1} \mathrm{E}\!\left( R''_{M,h}(\theta) \right) \left( I + \left( W''_M(\theta) \right)^{-1} \mathrm{E}\!\left( R''_{M,h}(\theta) \right) \right)^{-1} \left( W''_M(\theta) \right)^{-1}, $$

it is obtained that

$$ P_{M,h}(\theta) = \left( W''_M(\theta) \right)^{-1} H_{M,h}(\theta) \left( W''_M(\theta) \right)^{-1} + \Delta P_{M,h}(\theta), \tag{26} $$

where

$$ \Delta P_{M,h}(\theta) = \left( W''_M(\theta) \right)^{-1} H_{M,h}(\theta)\, K_{M,h}(\theta) + K_{M,h}(\theta)\, H_{M,h}(\theta) \left( W''_M(\theta) \right)^{-1} + K_{M,h}(\theta)\, H_{M,h}(\theta)\, K_{M,h}(\theta). $$

Theorem 5 in Jimenez (2012b) deals with the convergence of the order-$\beta$ filters to the exact LMV one. In particular, for the predictions, it states that

$$ \left| x_{t_k/t_{k-1}} - y_{t_k/t_{k-1}} \right| \leq K h^{\beta} \qquad \text{and} \qquad \left| U_{t_k/t_{k-1}} - V_{t_k/t_{k-1}} \right| \leq K h^{\beta} $$

for all $k$, where $K$ is a positive constant. Here, we recall that $x_{t_k/t_{k-1}}$ and $U_{t_k/t_{k-1}}$ are the predictions of the exact LMV filter for the model (1)-(2), whereas $y_{t_k/t_{k-1}}$ and $V_{t_k/t_{k-1}}$ are the predictions of the order-$\beta$ filter. From this and taking into account that