# Data assimilation and parameter estimation for a multiscale stochastic system with α-stable Lévy noise

Yanjie Zhang, Zhuan Cheng, Xinyong Zhang, Xiaoli Chen, Jinqiao Duan and Xiaofan Li

Center for Mathematical Sciences & School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China; Department of Applied Mathematics, Illinois Institute of Technology, Chicago, IL 60616, USA; Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China

This work was partly supported by the NSF grant 1620449 and the NSFC grants 11531006, 11371367, and 11271290.
###### Abstract

This work is about low dimensional reduction for a slow-fast data assimilation system with non-Gaussian α-stable Lévy noise via stochastic averaging. When observations are only available for the slow components, we show that the averaged, low dimensional filter approximates the original filter, by examining the corresponding Zakai stochastic partial differential equations. Furthermore, we demonstrate that the low dimensional slow system approximates the slow dynamics of the original system, by examining parameter estimation and most probable paths.

Keywords: Multiscale systems, non-Gaussian Lévy noise, averaging principle, Zakai equation, parameter estimation, most probable paths

## 1 Introduction

Data assimilation is a procedure for extracting system state information with the help of observations [18]. The state evolution and the observations are usually subject to random fluctuations. The general idea is to obtain the best estimate for the true system state, in terms of the probability distribution of the system state, given only noisy observations of the system. Data assimilation provides a recursive algorithm for estimating a signal or state of a random dynamical system based on noisy measurements, and it is important in many practical applications, from inertial guidance of aircraft and spacecraft to weather and climate prediction. Most existing works on data assimilation are conducted in the context of Gaussian random fluctuations. The effects of multiscale signal and observation processes in the Gaussian context were considered by Park et al. (see [1]), who showed that the probability density of the original system converges to that of the reduced system, by a Fourier analysis method. Imkeller et al. [2] further proved the convergence in distribution of the optimal filter, via backward stochastic differential equations and asymptotic techniques.

However, random fluctuations are often non-Gaussian (in particular, of Lévy type) in nonlinear systems, for example, in geosciences [21] and biosciences [3, 4, 5, 14, 15, 16, 32]. There are experimental demonstrations of Lévy fluctuations in optimal foraging theory, the rapid geographical spread of emergent infectious diseases, and switching currents. Humphries et al. [25] used GPS to track wandering black-browed albatrosses around an island in the Southern Indian Ocean to study their food-searching movement patterns. By fitting the data of the movement steps, they found that the movement patterns obey a power law. La Cognata et al. [15] considered a Lotka-Volterra system of two competing species subject to multiplicative α-stable Lévy noise, and analyzed the role of the Lévy noise sources. Lisowski et al. [14] studied a model of a stepping molecular motor consisting of two connected heads, and examined its dynamics as noise parameters varied; speed and direction appear to depend very sensitively on the characteristics of the noise. A further study explored the effect of noise on ballistic graphene-based small Josephson junctions in the framework of the resistively and capacitively shunted model, and found that analysis of the switching current distribution makes it possible to efficiently detect a non-Gaussian noise component in a Gaussian background.

Lévy motions are appropriate models for a class of important non-Gaussian processes with jumps or bursts [3, 4, 22]. It is desirable to consider data assimilation when the system evolution is under Lévy motions. This has been recently considered by one of us and other authors but not in the multiscale context (see [20, 28, 29, 35]).

Multiscale stochastic dynamical systems arise widely in finance and biology. For example, there are two kinds of mutual securities in financial markets: low-risk bonds, which can be characterized by ordinary differential equations, and high-risk stocks, whose prices undergo two different kinds of changes. On the one hand, the marginal change caused by normal supply and demand can be characterized by Gaussian noise. On the other hand, upon the arrival of important information about a stock, there will be a finite jump in the stock price; such perturbations can be characterized by non-Gaussian noise. In general, stock prices change at all times, while bond prices change over months or even years. Thus, the prices of these two securities can be characterized by a two-scale system with non-Gaussian noise (see [9]). Moreover, a large number of observations from biological experiments show that the production of mRNA and proteins occurs in a bursty, unpredictable, and intermittent manner, which creates variation or noise in individual cells or cell-to-cell interactions. Such burst-like events appear to be appropriately modeled by non-Gaussian noise. Since the mRNA synthesis process is faster than the protein dynamics, this leads to a two-time-scale system (see [23]), where ε represents the ratio between the natural time scales of the protein and mRNA.

Parameter estimation for continuous time stochastic models is an increasingly important part of the overall modeling strategy in a wide variety of applications. It is quite often the case that the data to be fitted to a diffusion process has a multiscale character with Gaussian noise. One example is molecular dynamics, where it is desirable to find effective models for low dimensional phenomena (such as conformational dynamics, vacancy diffusion and so forth) which are embedded within higher dimensional time series. We are often interested in the parameter (see [23]) representing the degradation or production rates of protein and mRNA. In this paper, we develop a parameter estimation method for multiscale diffusions with non-Gaussian noise. The results established here may be used to examine the change rate for low-risk bonds (see [9]).

In the present paper, we consider a slow-fast data assimilation system under Lévy noise, where only the slow component is observable. By averaging out the fast component via an invariant measure, we reduce the system dimension, focusing on the slow system evolution. We prove that the reduced, lower dimensional filter effectively approximates (in probability distribution) the original filter. We demonstrate that a system parameter may be estimated via the low dimensional slow system, utilizing only observations on the slow component. The accuracy of this estimation is quantified in the p-th moment, with p ∈ (1, α). We apply the stochastic Nelder-Mead method [31] to search for the estimated parameter value. Furthermore, we illustrate the low dimensional approximation by comparing the most probable paths for the slow system and the original system. Finally, we make some remarks in Section 6.

This paper is organized as follows. After recalling some basic facts about Lévy motions and generalized solutions in the next section, we address the effects of the multiscale signal and observation processes in the context of Lévy random fluctuations in Sections 3 to 5. We illustrate the low dimensional slow approximation by examining the Zakai equation, parameter estimation, and most probable paths. Finally, we give some discussions and comments in a more biological context.

## 2 Preliminaries

We recall some basic definitions for Lévy motions (or Lévy processes).

###### Definition 1.

A stochastic process Lt is a Lévy process if

1. L0 = 0 (a.s.);

2. Lt has independent and stationary increments; and

3. Lt has stochastically continuous sample paths, i.e., for every s ≥ 0, Lt → Ls in probability, as t → s.

A Lévy process Lt taking values in Rn is characterized by a drift vector b, a non-negative-definite, symmetric n × n covariance matrix Q, and a Borel measure ν defined on Rn∖{0}. We call (b, Q, ν) the generating triplet of the Lévy motion Lt. Moreover, we have the Lévy-Itô decomposition for Lt as follows:

 Lt=bt+BQ(t)+∫||y||<1y˜N(t,dy)+∫||y||≥1yN(t,dy), (2.1)

where N(t, dy) is the Poisson random measure, ˜N(t, dy) = N(t, dy) − t ν(dy) is the compensated Poisson random measure, ν is the jump measure, and BQ(t) is an independent n-dimensional Brownian motion with covariance matrix Q. The characteristic function of Lt is given by

 E[exp(i⟨u,Lt⟩)]=exp(tψ(u)),   u∈Rn, (2.2)

where the function ψ is the characteristic exponent

 ψ(u)=i⟨u,b⟩−12⟨u,Qu⟩+∫Rn∖{0}(ei⟨u,z⟩−1−i⟨u,z⟩I{||z||\textless1})ν(dz). (2.3)

The Borel measure ν is called the jump measure, and ⟨·,·⟩ denotes the scalar product in Rn.

###### Definition 2.

For α ∈ (0, 2), an n-dimensional symmetric α-stable process is a Lévy process with characteristic exponent

 ψ(u)=−C1(n,α)|u|α, for u∈Rn (2.4)

with a positive constant C1(n, α).

For an n-dimensional symmetric α-stable Lévy process, the diffusion matrix Q = 0, the drift vector b = 0, and the Lévy measure ν is given by

 ν(du)=C2(n,α)||u||n+αdu, (2.5)

where C2(n, α) is a positive normalizing constant depending only on n and α.

For every function ϕ in C20(Rn), the generator Aα for this symmetric α-stable Lévy process is

 (Aαϕ)(x)=∫Rn∖{0}[ϕ(x+u)−ϕ(x)]ν(du). (2.6)

It is known [6] that Aα extends uniquely to a self-adjoint operator on its domain. By the inverse Fourier transform,

 Aαϕ=θα,n(−Δ)α/2ϕ(x), (2.7)

where

 θα,n=∫Rn∖{0}(cos(⟨e,y⟩)−1)ν(dy) (2.8)

with e being a unit vector in Rn.
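As a computational aside, symmetric α-stable variates of the kind driving Lt can be sampled with the Chambers-Mallows-Stuck method, and a path of the α-stable process can then be built from the self-similar scaling of its increments. The Python sketch below is purely illustrative; the function names are our own.

```python
import numpy as np

def alpha_stable_increment(alpha, size, rng):
    # Chambers-Mallows-Stuck sampler for symmetric alpha-stable
    # variates (skewness beta = 0, unit scale), valid for 0 < alpha <= 2.
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    W = rng.exponential(1.0, size)                 # unit-mean exponential
    return (np.sin(alpha * U) / np.cos(U) ** (1 / alpha)
            * (np.cos(U - alpha * U) / W) ** ((1 - alpha) / alpha))

def levy_path(alpha, T, n, rng):
    # Approximate path of the alpha-stable process on [0, T] with n steps;
    # an increment over a step dt scales like dt**(1/alpha) by self-similarity.
    dt = T / n
    steps = dt ** (1 / alpha) * alpha_stable_increment(alpha, n, rng)
    return np.concatenate([[0.0], np.cumsum(steps)])
```

For α = 2 the sampler reduces (up to scale) to a Gaussian, consistent with the degenerate case of the Lévy measure above.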

Fix η, and set

 ρ(x)=(1+|x|)η. (2.9)

Let Lp(Rn; ρ) be the weighted Lp-space with norm:

 ||f||p;ρ=(∫Rn|f(x)|pρ(x)dx)1p (2.10)

For l ∈ N, let Wl,p(Rn; ρ) be the l-th order weighted Sobolev space with norm

 ||f||l,p;ρ=l∑i=0||∇if||p;ρ. (2.11)

###### Definition 3.

A backward predictable stochastic process v(t, x) is called a generalized solution of the equation

 v(t,x)=φ(x)+∫[t,T]Lv(s,x)ds+∫[t,T]hlv(s,x)dwl(s), (2.12)

if for every test function y it satisfies the following equation

 (v(t,x),y)=(φ,y)+∫[t,T](v(s,x),L∗y)ds+∫[t,T](hlv(s,x),y)dwl(s) (2.13)

with L being the generator of some Lévy process and L∗ its adjoint operator.

## 3 A slow-fast filtering system

Let us consider the following slow-fast signal-observation system:

 dXt=f1(Xt,Yt)dt+g1(Xt,Yt)dB1t+σ1dLα11(t),   x∈Rn, (3.14)
 dYt=ε−1f2(Xt,Yt)dt+ε−1/2g2(Xt,Yt)dB2t+ε−1/α2dLα22(t),   y∈Rm, (3.15)
 dZt=h(Xt)dt+dWt,   z∈Rd. (3.16)

Here (Xt, Yt) is an Rn+m-valued signal process, with Xt the slow component and Yt the fast component. The constant σ1 represents the noise intensity for the slow variable. The observation process Zt is Rd-valued. The standard Brownian motions B1t, B2t and Wt are independent. The non-Gaussian processes Lα11 and Lα22 (with α1, α2 ∈ (1, 2)) are independent symmetric α1-stable and α2-stable Lévy processes with triplets (0, 0, ν1) and (0, 0, ν2), respectively. The parameter ε is the ratio of the slow time scale to the fast time scale. We make the following assumptions on this filtering system.
Hypothesis H.1. The functions f1, g1, f2, g2 satisfy the global Lipschitz conditions, i.e., there exists a positive constant γ such that

 ||f1(x1,y1)−f1(x2,y2)||2+||g1(x1,y1)−g1(x2,y2)||2+||f2(x1,y1)−f2(x2,y2)||2+||g2(x1,y1)−g2(x2,y2)||2≤γ[||x1−x2||2+||y1−y2||2]

for all x1, x2 ∈ Rn and y1, y2 ∈ Rm.

###### Remark 1.

Note that with the help of the global Lipschitz condition, it follows that there is a positive constant K such that

 ||f1(x,y)||2+||g1(x,y)||2+||f2(x,y)||2+||g2(x,y)||2≤K(1+||x||2+||y||2)

for all x ∈ Rn and y ∈ Rm.

Hypothesis H.2. The coefficients are of class C2, with the first and second order derivatives bounded.

Hypothesis H.3. The sensor function h is of class Cb(Rn), the space of bounded continuous functions on Rn.

Hypothesis H.4. There exists a positive constant C, such that

 supx{Tr[g2(x,y)g2(x,y)T]+2⟨y,f2(x,y)⟩}≤−C(1+|y|2). (3.17)

This hypothesis (H.4) ensures the existence of an invariant measure μx (see [12]) for the fast component Yt. With the special scaling exponent 1/α2 for the fast component's Lévy term, this invariant measure is independent of ε (see [11, 12]).
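For intuition about the two time scales, a system of the form (3.14)-(3.15) can be simulated by an Euler-Maruyama scheme. The sketch below uses illustrative scalar coefficients of our own choosing (f1(x, y) = y − x, f2(x, y) = x − y, unit Brownian intensities); it is a toy instance of the general setting, not the paper's method.

```python
import numpy as np

def stable(alpha, size, rng):
    # Chambers-Mallows-Stuck sampler for symmetric alpha-stable variates
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * U) / np.cos(U) ** (1 / alpha)
            * (np.cos(U - alpha * U) / W) ** ((1 - alpha) / alpha))

def simulate_slow_fast(eps, alpha1=1.5, alpha2=1.5, sigma1=0.1,
                       T=1.0, n=2000, seed=0):
    # Euler-Maruyama for a scalar toy version of (3.14)-(3.15):
    #   dX = (Y - X) dt + dB1 + sigma1 dL1,
    #   dY = eps^-1 (X - Y) dt + eps^-1/2 dB2 + eps^-1/alpha2 dL2.
    rng = np.random.default_rng(seed)
    dt = T / n
    X = np.empty(n + 1); Y = np.empty(n + 1)
    X[0], Y[0] = 1.0, 0.0
    dL1 = dt ** (1 / alpha1) * stable(alpha1, n, rng)          # slow jumps
    dL2 = (dt / eps) ** (1 / alpha2) * stable(alpha2, n, rng)  # fast jumps
    dB1 = np.sqrt(dt) * rng.standard_normal(n)
    dB2 = np.sqrt(dt / eps) * rng.standard_normal(n)
    for k in range(n):
        X[k + 1] = X[k] + (Y[k] - X[k]) * dt + dB1[k] + sigma1 * dL1[k]
        Y[k + 1] = Y[k] + (X[k] - Y[k]) * dt / eps + dB2[k] + dL2[k]
    return X, Y
```

Note that the ε^{-1/α2} prefactor combines with the dt^{1/α2} self-similar scaling into the single factor (dt/ε)^{1/α2}, which is exactly the rescaling under which μx becomes independent of ε.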

The infinitesimal generator of the slow-fast stochastic system is

 L=L1+1εL2, (3.18)

where

 ⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩L1Φ(x,y,t)=f1⋅∇xΦ(x,y,t)+12Tr[G1⋅Hx(Φ(x,y,t))]+P.V.∫Rn∖{0}(Φ(x+σ1z1,y,t)−Φ(x,y,t))ν1(dz1),L2Φ(x,y,t)=f2⋅∇yΦ(x,y,t)+12Tr[G2⋅Hy(Φ(x,y,t))]+P.V.∫Rm∖{0}(Φ(x,y+z2,t)−Φ(x,y,t))ν2(dz2). (3.19)

Here Gi = gi gTi (i = 1, 2), ∇ is the gradient, and H is the Hessian matrix (with respect to x and y respectively). Let

 Zt=σ(Zs:0≤s≤t)∨N, (3.20)

where N is the collection of all P-negligible sets of the underlying σ-algebra F. Define

 Z=⋁t∈R+Zt, (3.21)

where ⋁ denotes the σ-algebra generated by the union. That is,

 Z=σ(⋃tZt). (3.22)

By Girsanov's change of measure theorem, we obtain a new probability measure ~P such that the observation Zt becomes ~P-independent of the signal variables (Xt, Yt). This can be done through

 d~PdP=exp(−m∑i=1∫t0hi(Xs)dWis−12m∑i=1∫t0hi(Xs)2ds). (3.23)

Denote

 ~Rt=dPd~P∣∣∣Zt. (3.24)

Then by the Kallianpur-Striebel formula, for every bounded differentiable function ϕ, we have the following representation:

 E[ϕ(Xt,Yt)|Zt]=~E[~Rtϕ(Xt,Yt)|Zt]~E[~Rt|Zt]=~E[~Rtϕ(Xt,Yt)|Z]~E[~Rt|Z]. (3.25)

The unnormalized conditional distribution Pt of (Xt, Yt), given Zt, is defined by Pt(ϕ) = ~E[~Rt ϕ(Xt, Yt)|Zt]. Thus, we have the following Zakai equation:

 {dPt(ϕ)=Pt(Lϕ)dt+Pt(ϕhT)dZt,P0(ϕ)=E[ϕ(X0,Y0)]. (3.26)

The x-marginal of Pt is defined as

 ρt(ϕ)=∫Rn+mϕ(x)Pt(dx,dy). (3.27)

Now we define a reduced, low dimensional signal-observation system

 {d¯Xt=¯f1(¯Xt)dt+¯g1(¯Xt)dB1t+σ1dLα11,d¯Zt=h(¯Xt)dt+dBt. (3.28)

Here μx denotes the invariant measure of the fast process with frozen slow variable x, and

 {¯f1(x)=∫Rmf1(x,y)μx(dy),¯G(x)=∫Rmg1(x,y)g1(x,y)Tμx(dy). (3.29)

The unnormalized conditional distribution corresponding to the filter for the reduced system (3.28) satisfies the following (reduced) Zakai equation

 {d¯ρt(ϕ)=¯ρt(¯Lϕ)dt+¯ρt(ϕhT)d¯Zt,¯ρ0(ϕ)=E[ϕ(X0)], (3.30)

where

 ¯Lϕ(x,t)=¯f1(x)⋅∇xϕ(x,t)+12Tr[Hx(¯G(x)ϕ(x,t))]+∫Rn(ϕ(x+σ1z1,t)−ϕ(x,t))ν1(dz1). (3.31)

However, we are more interested in the reduced filtering problem with the actual observation . This leads us to rewrite the reduced Zakai equation (3.30) as follows:

 {d¯ρt(ϕ)=¯ρt(¯Lϕ)dt+¯ρt(ϕhT)dZt,¯ρ0(ϕ)=E[ϕ(X0)]. (3.32)
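The averaged coefficients in (3.29) are seldom available in closed form. One practical route exploits ergodicity of μx: freeze the slow variable x, simulate the fast equation on its own, and time-average f1 along the trajectory. The sketch below uses an illustrative Ornstein-Uhlenbeck fast process with f1(x, y) = y, choices of our own for which the invariant mean is known (so the estimate can be sanity-checked); it is not the paper's general setting.

```python
import numpy as np

def averaged_drift(x, T=200.0, dt=0.01, seed=0):
    # Monte-Carlo estimate of f1_bar(x) = integral of f1(x, y) mu_x(dy)
    # by an ergodic time average. Illustrative choices: fast equation
    # dY = (x - Y) dt + dB2 (an OU process with invariant mean x) and
    # slow drift f1(x, y) = y, so that f1_bar(x) = x exactly.
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    y = 0.0
    acc = 0.0
    for _ in range(n):
        y += (x - y) * dt + np.sqrt(dt) * rng.standard_normal()
        acc += y              # accumulate f1(x, y) = y along the path
    return acc / n            # time average approximates the space average
```

In the paper's setting the fast equation also carries an α2-stable term, but by the symmetry of the stable noise the same time-averaging recipe applies.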
###### Lemma 1.

Assume that the following conditions are satisfied:

1. The functions σ, b, and h and their derivatives of first order (in x), as well as their derivatives of second order (in x), are uniformly bounded by a constant; these functions are locally uniformly bounded in t.

2. .

Let u be a generalized solution of the problem

 ⎧⎪ ⎪⎨⎪ ⎪⎩du(t,x,ω)=Lu(t,x,ω)dt+hl(t,x,ω)u(t,x,ω)dBl(t),(t,x,ω)∈[T0,T]×Rk×Ω,u(T0,x,ω)=φ(x,ω),(x,ω)∈Rk×Ω, (3.33)

where

 Lu(t,x)=12σik(t,x,ω)σjk(t,x,ω)∂2iju(t,x,ω)+bi∂iu(t,x,ω)+12∫Rk(u(t,x+z)+u(t,x−z)−2u(t,x))dz|z|k+α, (3.34)

and v is a generalized solution of the problem

 (3.35)

Then the following formulas hold

 u(t,x)=E(φ(Y(t,x,T0))χ1(t,T0)|FT0t), (3.36)

and

 v(t,x)=E(φ(X(T,x,t))χ2(T,t)|FTt), (3.37)

where Y(t, x, s) solves the forward stochastic differential equation with generator L, and X(s, x, t) solves the backward stochastic differential equation with generator L. Here χ1 and χ2 satisfy the following equations

 χ1(t,s)=exp(∫[s,t]hl(τ,Y(t,x,τ))dBl(τ)−12∫[s,t]hlhl(τ,Y(t,x,τ))dτ), (3.38)

and

 χ2(s,t)=exp(∫[t,s]hl(τ,X(τ,x,t))dBl(τ)−12∫[t,s]hlhl(τ,X(τ,x,t))dτ). (3.39)
###### Proof.

Before we proceed to the proof of the lemma, let us consider the problem

 (3.40)

By the definition of a backward predictable stochastic process, the coefficients in equation (3.40) are predictable relative to the corresponding family of σ-algebras. The process B is a Wiener martingale with respect to the same family, and the initial condition is measurable with respect to the minimal σ-algebra of this family.

Let v be a generalized solution of problem (3.40). Then by the definition of a generalized solution, for every test function y, the following equality holds on [0, T]:

 (v(T−s,x),y)=(φ,y)+∫[0,T−s][−(aij(T−s)∂iv(s,x),yj)+(bi(T−s)−aijj(T−s))∂iv(s,x))+12∫Rk(v(s,x+z)+v(s,x−z)−2v(s,x))dz|z|k+α,y)]ds+∫[0,T−s]hl(T−t)v(t,x)dBlT(s). (3.41)

Setting u(t, x) = v(T − t, x) in equation (3.41), we find that u satisfies the following equation on [0, T], for every test function y:

 (u(t,x),y)=(φ,y)+∫[t,T][−(aij∂iv(s,x),yj)+((bi−aijj)∂iu(t,x))+12∫Rk(v(t,x+z)+v(t,x−z)−2v(t,x))dz|z|k+α,y)]ds+∫[t,T]hlv(s,x)dBl(s). (3.42)

Thus u is a generalized solution of problem (3.35).

On the other hand, changing the variables back in equality (3.42), we obtain that v is a generalized solution of problem (3.40). Thus we have proved that problems (3.35) and (3.40) are equivalent, and all the results obtained for problem (3.35) carry over to problem (3.40). For problem (3.40), a similar purely probabilistic argument yields the existence and uniqueness of a generalized solution to the stochastic fractal equations (see [19, 30]). This completes the proof of this lemma. ∎

Now we show that the reduced system approximates the original system, by examining the corresponding Zakai equations. Before stating the theorem, we describe the probabilistic representation for semi-linear stochastic fractal equations. We work under the probability measure ~P; note that the observation process Zt is a Brownian motion under ~P. For convenience, we write the backward representations of the two filters as vϕ and ¯vϕ, respectively. By Lemma 1, vϕ solves a stochastic partial differential equation (SPDE):

 {−dvϕ(t,x,y)=Lvϕ(t,x,y)dt+h(x)Tvϕ(t,x,y)d←−Bt.vϕ(T,x,y)=ϕ(x). (3.43)

Here d←−Bt denotes Itô's backward integral. Likewise, we introduce ¯vϕ for the reduced system, which solves the following SPDE

 ⎧⎨⎩−d¯¯¯vϕ(t,x)=¯¯¯¯L¯¯¯vϕ(t,x)dt+h(x)T¯¯¯vϕ(t,x)d←−Bt,¯¯¯vϕ(T,x)=ϕ(x). (3.44)

By Girsanov's change of measure theorem and the Markov property of the signal process, the filters admit representations in terms of vϕ and ¯vϕ. In particular, we have

 ρT(ϕ)=∫vϕ(0,x,y)P(X0,Y0)(dx,dy). (3.45)

Similarly, we have

 ¯ρT(ϕ)=∫¯vϕ(0,x)P(X0)(dx). (3.46)

We can now state our main result on the comparison between the original filter and the reduced filter. Such a comparison is desirable when only the slow component is observable.

###### Theorem 1.

Under the hypotheses (H.1)-(H.4), for ϕ ∈ Cb and p ∈ (1, α), there exists a positive constant C such that for sufficiently small ε, the following estimate holds

 E[|ρT(ϕ)−¯ρT(ϕ)|p]≤Cε, (3.47)

with the positive constant C independent of ε. This implies that the reduced filter approximates the original filter, as the scale parameter ε tends to zero.

###### Proof.
 E[|ρT(ϕ)−¯ρT(ϕ)|p]=E[∣∣∫(vϕ(0,x,y)−¯vϕ(0,x))P(X0,Y0)(dx,dy)∣∣p],≤E[∫∥∥vϕ(0,x,y)−¯vϕ(0,x)∣∣pP(X0,Y0)(dx,dy)],=∫E[∣∣vϕ(0,x,y)−¯vϕ(0,x)∣∣p]P(X0,Y0)(dx,dy),=∫E[∣∣~Ex,y[ϕ(XT)~R0,T|Y0,T]−~Ex[ϕ(¯XT)~R0,T|Y0,T]∣∣p]P(X0,Y0)(dx,dy),≤C∫E∣∣~Ex[(ϕ(XT)−ϕ(¯XT))~R0,T|Yt,T]∣∣pP(X0,Y0)(dx,dy),=C∫E∣∣E[ϕ(XT)−ϕ(¯XT)|YT]~Ex[~R0,T|Y0,T]∣∣pP(X0,Y0)(dx,dy),≤C∫E∣∣E[XT−¯XT|YT]∣∣pP(X0,Y0)(dx,dy). (3.48)

Using Jensen's inequality, we have

 E[|ρT(ϕ)−¯ρT(ϕ)|p]≤∫E|XT−¯XT|pP(X0,Y0)(dx,dy). (3.49)

Applying a similar argument as in [26], we obtain

 limϵ→0E|XT−¯XT|p=0 (3.50)

Finally, we conclude that

 E[|ρT(ϕ)−¯ρT(ϕ)|p]≤Cε. (3.51)

This completes the proof of Theorem 1. ∎
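Numerically, a filter of this kind can be realized, for instance, with a bootstrap particle filter whose weights discretize the Kallianpur-Striebel likelihood ratio in (3.25). The sketch below is a toy scalar instance with coefficients of our own choosing (signal dX = −X dt + dB + σ1 dL, sensor h(x) = x); it is not the paper's construction.

```python
import numpy as np

def stable(alpha, size, rng):
    # Chambers-Mallows-Stuck sampler for symmetric alpha-stable variates
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * U) / np.cos(U) ** (1 / alpha)
            * (np.cos(U - alpha * U) / W) ** ((1 - alpha) / alpha))

def reduced_particle_filter(z_inc, dt, n_part=500, alpha=1.5,
                            sigma1=0.1, seed=0):
    # Bootstrap particle filter for a toy reduced model:
    #   signal      dX = -X dt + dB + sigma1 dL^alpha,
    #   observation dZ = X dt + dW,  h(x) = x.
    # Weights discretize the likelihood ratio exp(h(x) dZ - 0.5 h(x)^2 dt).
    rng = np.random.default_rng(seed)
    x = np.zeros(n_part)
    for dz in z_inc:
        # propagate particles under the signal dynamics
        x += -x * dt + np.sqrt(dt) * rng.standard_normal(n_part) \
             + sigma1 * dt ** (1 / alpha) * stable(alpha, n_part, rng)
        # reweight by the discretized likelihood ratio, then resample
        logw = x * dz - 0.5 * x ** 2 * dt
        w = np.exp(logw - logw.max())
        w /= w.sum()
        x = rng.choice(x, size=n_part, p=w)
    return x.mean()        # approximation of the conditional mean E[X_T | Z]
```

The resampling step keeps the particle weights from degenerating, which matters here because stable jumps occasionally throw a particle far from the observed path.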

## 4 Parameter estimation

In this section, we consider parameter estimation in the following slow-fast dynamical system

 (4.52)

where A is an n × n matrix such that −A is positive definite, and θ is an unknown parameter defined in a compact subset Θ of Rd.

Hypothesis H.5. For all θ ∈ Θ, the function f1 is uniformly bounded and Lipschitz with respect to x and y. The function f1 is smooth in θ, with first order derivatives bounded by a constant.

Define an averaged, low dimensional slow stochastic dynamical system (see [27]), as in the previous section:

 d¯x(t)=[A¯x(t)+¯f1(¯x(t),θ)]dt, (4.53)

where

 ¯f1(¯x,θ)≜∫Rmf1(x,y,θ)μx(dy).

We will estimate the unknown parameter θ, based on this low dimensional slow system (4.53), using observations on the slow component only. Denote the solution of the original slow-fast system (4.52) with the actual parameter θ0 by x(t, θ0), and the solution of the slow system (4.53) with parameter θ by ¯x(t, θ). We recall the following lemma.

###### Lemma 2.

Under hypothesis (H.6), the following strong convergence holds

 limε→0E|x(t,θ0)−¯x(t,θ0)|p=0, t∈[0,T] and p∈(1,α). (4.54)
###### Proof.

The proof of the averaging principle is similar to the infinite dimensional case studied in [26]; thus we omit it here. ∎

We take F(θ) = ∫T0 E|x(t, θ0) − ¯x(t, θ)|p dt as the objective function, and assume there is a unique minimizer ^θε of F over Θ. This is our estimated parameter value. Then we can state our main result as follows.
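To carry out the minimization in practice, the paper applies the stochastic Nelder-Mead method [31]; in the scalar sketch below we substitute a golden-section search (a simple derivative-free stand-in) on an illustrative reduced model d¯x = (θ − ¯x) dt whose solution is known in closed form. All coefficient and noise choices here are our own.

```python
import numpy as np

def xbar(theta, t, x0=0.0):
    # closed-form solution of the reduced slow ODE dxbar = (theta - xbar) dt
    return theta + (x0 - theta) * np.exp(-t)

def golden_section(f, a, b, tol=1e-6):
    # derivative-free scalar minimizer (a 1-D stand-in for Nelder-Mead)
    invphi = (np.sqrt(5.0) - 1) / 2
    while b - a > tol:
        c, d = b - invphi * (b - a), a + invphi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

# synthetic "observation" of the slow component with true theta0 = 1.5
t = np.linspace(0.0, 5.0, 501)
theta0 = 1.5
rng = np.random.default_rng(1)
data = xbar(theta0, t) + 0.01 * rng.standard_normal(t.size)

def F(theta, p=1.5):
    # empirical objective: time-integrated p-th moment of the mismatch
    dt = t[1] - t[0]
    return np.sum(np.abs(data - xbar(theta, t)) ** p) * dt

theta_hat = golden_section(F, 0.0, 3.0)
```

For p > 1 the empirical objective is convex in θ here (¯x is affine in θ), so a unimodal line search suffices; in higher dimensions and with noisy objective evaluations, a stochastic simplex method such as [31] is the appropriate tool.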

###### Theorem 2.

Under hypothesis (H.6), the estimated parameter value converges to the true parameter value, as the scale parameter approaches zero. That is,

 limε→0^θϵ=θ0. (4.55)
###### Proof.

Note that

 E|¯x(t,θ0)−¯x(t,^θε)|p≤C(E|x(t,θ0)−¯x(t,θ0)|p+E|x(t,θ0)−¯x(t,^θε)|p), (4.56)

for some positive constant . Integrating both sides with respect to time, we get

 ∫T0E|¯x(t,θ0)−¯x(t,^θε)|pdt≤C(F(θ0)+F(^θε))≤2CF(θ0). (4.57)

We compute the difference between ˙¯x(t, θ0) and ˙¯x(t, ^θε) to obtain

 ˙¯x(t,θ0)−˙¯x(t,^θε)=A[¯x(t,θ0)−¯x(t,^θε)]+[¯f1(¯x(t,θ0),θ0)−¯f1(¯x(t,^θε),^θε)].

By the variation of constants formula, we have

 ¯x(t,θ0)−¯x(t,^θε)=∫t0eA(t−s)[¯f1(¯x(s,θ0),θ0)−¯f1(¯x(s,^θε),^θε)]ds.

Using the mean value theorem, we obtain

 ¯x(t,θ0)−¯x(t,^θε)=(θ0−^θε)∫t0eA(t−s)∇θ¯f1(¯x(s,θ′),θ′)ds, (4.58)

for some θ′ lying between θ0 and ^θε. Denote

 G(θ′)=∫T0∣∣∣∫t0eA(t−s)∇θ¯f