
# Time Averages of Markov Processes and Applications to Two-Timescale Problems

## Abstract

We prove a decomposition into the sum of a martingale and a deterministic quantity for time averages of the solutions to non-autonomous SDEs and for discrete-time Markov processes. In the SDE case the martingale has an explicit representation in terms of the gradient of the associated semigroup or transition operator. We then show how the results can be used to obtain quenched Gaussian concentration inequalities for time averages and to provide deeper insights into averaging principles for two-timescale processes.


## 1 Introduction

For a Markov process $(X_t)$ with $t \in [0,T]$ or $t \in \{0,\dots,T-1\}$ let

$$ S_T f = \int_0^T f(t, X_t)\,dt $$

in the continuous-time case or

$$ S_T f = \sum_{t=0}^{T-1} f(t, X_t) $$

in discrete time.

In the first part of this work, we will show a decomposition of the form

$$ S_T f = \mathbb{E} S_T f + M_T^{T,f} $$

where $M^{T,f}$ is a martingale depending on $T$ and $f$, for which we will give an explicit representation in terms of the transition operator or semigroup associated to $X$.

We then proceed to illustrate how the previous results can be used to obtain Gaussian concentration inequalities for $S_T f$ when $X$ is the solution to an Itô SDE.

The last part of the work showcases a number of results on two-timescale processes that follow from our martingale representation.

## 2 Martingale Representation

Consider the following SDE with time-dependent coefficients on $\mathbb{R}^n$:

$$ dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t, \qquad X_0 = x $$

where $B$ is a standard Brownian motion with filtration $(\mathcal{F}_t)_{t \ge 0}$, and $b, \sigma$ are continuous in $t$ and locally Lipschitz continuous in $x$. We assume that $X$ does not explode in finite time.

Denote $C_c^\infty$ the set of smooth compactly supported space-time functions on $[0,\infty) \times \mathbb{R}^n$.

Let $P_{s,t}$ be the evolution operator associated to $X$,

$$ P_{s,t} f(x) = \mathbb{E}[f(t, X_t) \mid X_s = x], \qquad f \in C_c^\infty. $$

For fixed $T$ consider the martingale

$$ M_t = \mathbb{E}^{\mathcal{F}_t} \int_0^T f(s, X_s)\,ds $$

and observe that, since $X$ is adapted, by the Markov property

$$ M_t = \int_0^t f(s, X_s)\,ds + \mathbb{E}^{\mathcal{F}_t} \int_t^T f(s, X_s)\,ds = \int_0^t f(s, X_s)\,ds + R_t^T f(X_t) $$

with

$$ R_t^T f(x) = \int_t^T P_{t,s} f(x)\,ds. $$

By applying the Itô formula to $R_t^T f(X_t)$ we can identify the martingale $M$. This is the content of the following short theorem.

###### Theorem 2.1.

For fixed $T$ and $f \in C_c^\infty$,

$$ \int_0^t f(s, X_s)\,ds + R_t^T f(X_t) = \mathbb{E}\int_0^T f(s, X_s)\,ds + M_t^{T,f} $$

with

$$ M_t^{T,f} = \int_0^t \nabla R_s^T f(X_s) \cdot \sigma(s, X_s)\,dB_s. $$
###### Proof.

From the Kolmogorov backward equation $\partial_t P_{t,s} f = -L_t P_{t,s} f$ and since $P_{t,t} f(x) = f(t,x)$ we have

$$ \partial_t R_t^T f(x) = -f(t,x) - \int_t^T L_t P_{t,s} f(x)\,ds = -f(t,x) - L_t R_t^T f(x). $$

By Itô’s formula

$$ \begin{aligned} R_t^T f(X_t) &= R_0^T f(X_0) + \int_0^t \partial_s R_s^T f(X_s)\,ds + \int_0^t L_s R_s^T f(X_s)\,ds + \int_0^t \nabla R_s^T f(X_s) \cdot \sigma(s, X_s)\,dB_s \\ &= \mathbb{E}\int_0^T f(t, X_t)\,dt - \int_0^t f(s, X_s)\,ds + \int_0^t \nabla R_s^T f(X_s) \cdot \sigma(s, X_s)\,dB_s \end{aligned} $$

and we are done. ∎
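As a sanity check, Theorem 2.1 can be verified by simulation in a case where everything is explicit. The sketch below (our own construction, not from the text) uses the one-dimensional Ornstein–Uhlenbeck process $dX_t = -X_t\,dt + dB_t$ with $f(t,x) = x$, for which $P_{s,t}f(x) = x e^{-(t-s)}$ and $\nabla R_t^T f(x) = 1 - e^{-(T-t)}$:

```python
import math, random

# Pathwise check of Theorem 2.1 for the 1d OU process dX = -X dt + dB,
# f(t, x) = x.  Here P_{s,t}f(x) = x e^{-(t-s)}, so
#   R_t^T f(x) = x (1 - e^{-(T-t)})  and  (R_t^T f)'(x) = 1 - e^{-(T-t)}.
# At t = T the theorem reads, pathwise,
#   int_0^T X_s ds = E int_0^T X_s ds + int_0^T (1 - e^{-(T-s)}) dB_s.

random.seed(0)
T, n = 1.0, 2000
dt = T / n
x0 = 1.0

X = x0
S = 0.0            # running integral int_0^t X_s ds
M = 0.0            # running martingale int_0^t (1 - e^{-(T-s)}) dB_s
for k in range(n):
    t = k * dt
    dB = random.gauss(0.0, math.sqrt(dt))
    S += X * dt
    M += (1.0 - math.exp(-(T - t))) * dB
    X += -X * dt + dB

# E int_0^T X_s ds = R_0^T f(x0) = x0 (1 - e^{-T})
residual = abs(S - x0 * (1.0 - math.exp(-T)) - M)
```

Up to Euler discretisation error the residual vanishes pathwise, not merely in expectation.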

###### Remark 2.2 (Poisson Equation).

In the time-homogeneous case, when the limit below is finite, it is independent of $t$ and we have

$$ R^\infty f := \lim_{T\to\infty} R_t^T f = \lim_{T\to\infty} \int_t^T P_{s-t} f\,ds = \lim_{T\to\infty} \int_0^{T-t} P_s f\,ds = \int_0^\infty P_s f\,ds. $$

This is the resolvent formula for the solution to the Poisson equation $-Lg = f$.

By taking $t = T$ in Theorem 2.1 we can identify the martingale part in the martingale representation theorem for $S_T f$.

###### Corollary 2.3.

For fixed $T$,

$$ \int_0^T f(t, X_t)\,dt - \mathbb{E}\int_0^T f(t, X_t)\,dt = \int_0^T \nabla\!\int_t^T P_{t,s} f(X_t)\,ds \cdot \sigma(t, X_t)\,dB_t. $$

By applying the Itô formula to $P_{t,T} f(X_t)$ we obtain for fixed $T$

$$ dP_{t,T} f(X_t) = \nabla P_{t,T} f(X_t) \cdot \sigma(t, X_t)\,dB_t \tag{2.1} $$

and by integrating from $0$ to $T$

$$ f(T, X_T) = \mathbb{E}[f(T, X_T)] + \int_0^T \nabla P_{t,T} f(X_t) \cdot \sigma(t, X_t)\,dB_t. $$

This was observed at least as far back as \autocite{elliott_integration_1989} and is commonly used in the derivation of probabilistic formulas for the gradient of the semigroup.

Combining the formula (2.1) with Theorem 2.1 we obtain the following expression for the centred time average in terms of the gradient of the evolution operator.

###### Corollary 2.4.

For $T$ and $f$ fixed and any $t \le T$

$$ \int_0^t f(s, X_s) - \mathbb{E} f(s, X_s)\,ds = M_t^{T,f} - Z_t^{T,f} $$

with

$$ \begin{aligned} Z_t^{T,f} &= \int_t^T\!\int_0^t \nabla P_{r,s} f(X_r) \cdot \sigma(r, X_r)\,dB_r\,ds \\ M_t^{T,f} &= \int_0^t\!\int_r^T \nabla P_{r,s} f(X_r)\,ds \cdot \sigma(r, X_r)\,dB_r. \end{aligned} $$
###### Proof.

Let $f_0(s,x) = f(s,x) - \mathbb{E} f(s, X_s)$. We have

$$ R_t^T f_0(X_t) = \int_t^T P_{t,s} f_0(X_t)\,ds = \int_t^T P_{t,s} f(X_t) - P_{0,s} f(X_0)\,ds = \int_t^T\!\int_0^t \nabla P_{r,s} f(X_r) \cdot \sigma(r, X_r)\,dB_r\,ds $$

where the last equality follows by integrating (2.1) from $0$ to $t$ (with $T$ replaced by $s$). Since $\nabla P_{t,s} f_0 = \nabla P_{t,s} f$ and $\mathbb{E}\int_0^T f_0(s, X_s)\,ds = 0$ we get from Theorem 2.1 that

$$ \int_0^t f_0(s, X_s)\,ds = M_t^{T,f} - R_t^T f_0(X_t) $$

and the result follows with $Z_t^{T,f} = R_t^T f_0(X_t)$. ∎

###### Remark 2.5 (Carré du Champ and Mixing).

For differentiable functions $f, g$ let

$$ \Gamma_t(f,g)(x) = \tfrac{1}{2}\nabla f(t,x) \cdot (\sigma\sigma^\top)(t,x)\,\nabla g(t,x). $$

Then we have the following expression for the quadratic variation of $M^{T,f}$:

$$ d\langle M^{T,f}\rangle_t = \Big|\int_t^T \sigma(t, X_t)^\top \nabla P_{t,s} f(X_t)\,ds\Big|^2\,dt = \Big(4\int_{t\le s\le r\le T} \Gamma_t(P_{t,s}f, P_{t,r}f)(X_t)\,dr\,ds\Big)\,dt. $$

Furthermore, since

$$ \partial_s P_{r,s}(P_{s,t}f\,P_{s,t}g) = 2P_{r,s}\big(\Gamma_s(P_{s,t}f, P_{s,t}g)\big) $$

and setting $g = P_{s,r} f$ for $s \le r$ we have

$$ \begin{aligned} \mathbb{E}\langle M^{T,f}\rangle_T &= 4\int_{0\le t\le s\le r\le T} P_{0,t}\Gamma_t(P_{t,s}f, P_{t,s}g)\,dr\,ds\,dt \\ &= 2\int_{0\le s\le r\le T}\int_0^s \partial_t P_{0,t}(P_{t,s}f\,P_{t,s}g)\,dt\,dr\,ds \\ &= 2\int_{0\le s\le r\le T} P_{0,s}(fg) - P_{0,s}f\,P_{0,s}g\,dr\,ds \\ &= 2\int_{0\le s\le r\le T} \mathrm{Cov}(f(s,X_s), f(r,X_r))\,dr\,ds. \end{aligned} $$

This shows how the expressions we obtain in terms of the gradient of the semigroup relate to mixing properties of $X$.
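For the Ornstein–Uhlenbeck process the final identity can be checked by quadrature, since both sides are explicit. The sketch below (our own example, not from the text) compares $\mathbb{E}\langle M^{T,f}\rangle_T = \int_0^T (1-e^{-(T-t)})^2\,dt$ with $2\int\!\!\int_{0\le t\le s\le T}\mathrm{Cov}(X_t,X_s)\,ds\,dt$ for $dX = -X\,dt + dB$ started from a point:

```python
import math

# Check E<M^{T,f}>_T = 2 ∫∫_{0<=t<=s<=T} Cov(X_t, X_s) ds dt for the 1d OU
# process dX = -X dt + dB, X_0 deterministic, f(x) = x.  Then
#   |sigma grad R_t^T f| = 1 - e^{-(T-t)},
#   Cov(X_t, X_s)       = e^{-(s-t)} (1 - e^{-2t}) / 2   for t <= s,
# and the inner integral ∫_t^T e^{-(s-t)} ds = 1 - e^{-(T-t)} is closed form.

T, n = 2.0, 4000
h = T / n

# midpoint rule for ∫_0^T (1 - e^{-(T-t)})^2 dt
lhs = sum((1.0 - math.exp(-(T - (k + 0.5) * h))) ** 2 for k in range(n)) * h

# midpoint rule for 2 ∫_0^T (1 - e^{-2t})/2 * (1 - e^{-(T-t)}) dt
rhs = sum(
    (1.0 - math.exp(-2.0 * ((k + 0.5) * h)))
    * (1.0 - math.exp(-(T - (k + 0.5) * h)))
    for k in range(n)
) * h

gap = abs(lhs - rhs)
```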

###### Remark 2.6 (Pathwise estimates).

We would like to have a similar estimate for

$$ \mathbb{E}\sup_{0\le t\le T}\Big|\int_0^t f(X_s) - \mathbb{E} f(X_s)\,ds\Big|. $$

Setting

$$ f_0(t,x) = f(x) - \mathbb{E} f(X_t) = f(x) - P_{0,t} f(x_0) $$

we have

$$ \mathbb{E}\sup_{0\le t\le T}\Big|\int_0^t f(X_s) - \mathbb{E} f(X_s)\,ds\Big| \le \mathbb{E}\sup_{0\le t\le T}|M_t^{T,f_0}| + \mathbb{E}\sup_{0\le t\le T}|R_t^T f_0(X_t)| \le 2\big(\mathbb{E}\langle M^{T,f_0}\rangle_T\big)^{1/2} + \mathbb{E}\sup_{0\le t\le T}|R_t^T f_0(X_t)| $$

and

$$ R_t^T f_0(X_t) = \int_t^T P_{t,s} f(X_t) - P_{0,s} f(x_0)\,ds = \int_t^T\!\int_0^t \nabla P_{r,s} f(X_r) \cdot \sigma(r, X_r)\,dB_r\,ds $$

where the last equality follows from (for fixed $s$)

$$ dP_{t,s} f(X_t) = \nabla P_{t,s} f(X_t) \cdot \sigma(t, X_t)\,dB_t. $$

### 2.1 Discrete time

Consider a discrete-time Markov process $(X_n)$ with transition operator

$$ P_{m,n} f(x) = \mathbb{E}[f_n(X_n) \mid X_m = x] $$

and generator

$$ L_n f(x) = P_{n,n+1} f(x) - f_n(x). $$

As in the continuous-time setting

$$ M_n := f_n(X_n) - f_0(X_0) - \sum_{m=0}^{n-1} L_m f(X_m) $$

is a martingale (by the definition of $L$) and by direct calculation

$$ M_n - M_{n-1} = f_n(X_n) - P_{n-1,n} f(X_{n-1}). $$

Let

$$ R_n^N f(x) = \sum_{m=n}^{N-1} P_{n,m} f(x) $$

and observe that

$$ L_n R^N f(x) = \sum_{m=n+1}^{N-1} P_{n,n+1} P_{n+1,m} f(x) - \sum_{m=n}^{N-1} P_{n,m} f(x) = -f_n(x). $$

Note that

$$ R_N^N f(x) = 0 \quad\text{and}\quad R_0^N f(x) = \mathbb{E}\Big[\sum_{m=0}^{N-1} f_m(X_m) \,\Big|\, X_0 = x\Big]. $$

It follows that

$$ \sum_{m=0}^{n-1} f_m(X_m) + R_n^N f(X_n) = -\sum_{m=0}^{n-1} L_m R^N f(X_m) + R_n^N f(X_n) = R_0^N f(X_0) + M_n^{N,f} $$

with

$$ M_n^{N,f} - M_{n-1}^{N,f} = \sum_{m=n}^{N-1} P_{n,m} f(X_n) - P_{n-1,m} f(X_{n-1}). $$
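The decomposition above is a purely algebraic identity and can be checked pathwise. The sketch below (our own, with a hypothetical two-state chain and arbitrary $P$, $f$, $N$) builds $R_n^N f$ from powers of the transition matrix and verifies the identity along one simulated path:

```python
import random

# Pathwise check of sum_{m<n} f(X_m) + R_n^N f(X_n) = R_0^N f(X_0) + M_n^{N,f}
# for a time-homogeneous two-state chain, where R_n^N f = sum_{m=n}^{N-1} P^{m-n} f
# and M_n^{N,f} - M_{n-1}^{N,f} = sum_{m=n}^{N-1} (P^{m-n} f)(X_n) - (P^{m-n+1} f)(X_{n-1}).

random.seed(1)
P = [[0.9, 0.1], [0.3, 0.7]]   # transition matrix (hypothetical example)
f = [1.0, -1.0]
N = 12

def apply_P(g):
    # (P g)(x) = sum_y P(x, y) g(y)
    return [sum(P[x][y] * g[y] for y in range(2)) for x in range(2)]

powers = [f]                   # powers[k] = P^k f as a function on {0, 1}
for _ in range(N - 1):
    powers.append(apply_P(powers[-1]))

def R(n, x):
    # R_n^N f(x); empty sum (= 0) for n = N
    return sum(powers[m - n][x] for m in range(n, N))

# simulate one path of the chain
X = [0]
for _ in range(N):
    X.append(0 if random.random() < P[X[-1]][0] else 1)

S, M, residual = 0.0, 0.0, 0.0
for n in range(1, N + 1):
    M += sum(powers[m - n][X[n]] - powers[m - n + 1][X[n - 1]] for m in range(n, N))
    S += f[X[n - 1]]
    residual = max(residual, abs(S + R(n, X[n]) - R(0, X[0]) - M))
```

The residual is zero up to floating-point rounding for every path, illustrating that no expectation is taken anywhere in the decomposition.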

Analogous to the continuous-time case, we define the carré du champ

$$ \begin{aligned} \Gamma_n(f,g) &:= L_n(fg) - g_n L_n f - f_n L_n g \\ &= P_{n,n+1}(fg) - f_n P_{n,n+1} g - g_n P_{n,n+1} f + f_n g_n \\ &= \mathbb{E}[(f_{n+1}(X_{n+1}) - f_n(X_n))(g_{n+1}(X_{n+1}) - g_n(X_n)) \mid \mathcal{F}_n] \end{aligned} $$

and, expanding the conditional variance of the increments,

$$ \langle M^{N,f}\rangle_n - \langle M^{N,f}\rangle_{n-1} = \mathbb{E}[(M_n^{N,f} - M_{n-1}^{N,f})^2 \mid \mathcal{F}_{n-1}] = \sum_{k,m=n}^{N-1} \Gamma_{n-1}(P_{\cdot,k}f, P_{\cdot,m}f)(X_{n-1}) $$

where $P_{\cdot,k}f$ denotes the space-time function $(j,x) \mapsto P_{j,k}f(x)$.

## 3 Concentration inequalities from exponential gradient bounds

In this section we focus on the case where we have uniform exponential decay of the semigroup gradient, so that

$$ |\sigma(s,x)^\top \nabla P_{s,t} f(x)| \le C_s e^{-\lambda_s(t-s)} \qquad (0\le s\le t\le T) \tag{3.1} $$

for all $x$ and some class of functions $f$.

We first show that exponential gradient decay implies a concentration inequality.

###### Proposition 3.1.

For fixed $T$ and all functions $f$ such that (3.1) holds we have

$$ \mathbb{P}\Big(\frac{1}{T}\int_0^T f(t,X_t) - \mathbb{E} f(t,X_t)\,dt > R\Big) \le \exp\Big(-\frac{R^2 T}{2 V_T}\Big), \qquad V_T = \sup_{0\le t\le T}\Big(\frac{C_t\big(1-e^{-\lambda_t(T-t)}\big)}{\lambda_t}\Big)^2. $$
###### Proof.

By (3.1)

$$ d\langle M^{T,f}\rangle_t = \Big|\int_t^T \sigma(t,X_t)^\top \nabla P_{t,s} f(X_t)\,ds\Big|^2\,dt \le \Big(\int_t^T C_t e^{-\lambda_t(s-t)}\,ds\Big)^2\,dt = \Big(\frac{C_t}{\lambda_t}\big(1-e^{-\lambda_t(T-t)}\big)\Big)^2\,dt $$

so that $\langle M^{T,f}\rangle_T \le V_T T$.

By Corollary 2.3 and since Novikov’s condition holds trivially, $\langle M^{T,f}\rangle_T$ being bounded by a deterministic constant, we get

$$ \mathbb{E}\exp\Big(a\int_0^T f(t,X_t) - \mathbb{E} f(t,X_t)\,dt\Big) = \mathbb{E}\exp\big(a M_T^{T,f}\big) \le \mathbb{E}\Big[\exp\Big(a M_T^{T,f} - \frac{a^2}{2}\langle M^{T,f}\rangle_T\Big)\Big]\exp\Big(\frac{a^2}{2}V_T T\Big) \le \exp\Big(\frac{a^2}{2}V_T T\Big). $$

By Chebyshev’s inequality

$$ \mathbb{P}\Big(\frac{1}{T}\int_0^T f(t,X_t) - \mathbb{E} f(t,X_t)\,dt > R\Big) \le \exp(-aRT)\exp\Big(\frac{a^2}{2}V_T T\Big) $$

and the result follows by optimising over $a$. ∎
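Proposition 3.1 is easy to test by Monte Carlo. The sketch below (our own stand-in example) uses the OU process $dX = -X\,dt + dB$ with $f(x) = x$ and $X_0 = 0$, for which (3.1) holds with $C_t = 1$, $\lambda_t = 1$, hence $V_T = (1 - e^{-T})^2$:

```python
import math, random

# Monte Carlo illustration of Proposition 3.1 for dX = -X dt + dB, f(x) = x:
# (3.1) holds with C_t = 1, lambda_t = 1, so V_T = (1 - e^{-T})^2 and the
# tail of the time average should be below exp(-R^2 T / (2 V_T)).

random.seed(2)
T, n_steps, n_paths = 5.0, 500, 2000
dt = T / n_steps
R = 0.5

V_T = (1.0 - math.exp(-T)) ** 2
bound = math.exp(-R ** 2 * T / (2.0 * V_T))

exceed = 0
for _ in range(n_paths):
    X, avg = 0.0, 0.0
    for _ in range(n_steps):
        avg += X * dt / T
        X += -X * dt + random.gauss(0.0, math.sqrt(dt))
    if avg > R:          # E f(X_t) = 0 since X_0 = 0
        exceed += 1

empirical = exceed / n_paths
```

The empirical exceedance frequency comes out well below the bound, as expected since the sub-Gaussian estimate uses the worst-case variance proxy $V_T$.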

The corresponding lower bound is obtained by replacing $f$ by $-f$.

For the rest of this section, suppose that $f = f(x)$ does not depend on time and that we are in the time-homogeneous case, so that $P_{s,t}$ depends only on $t - s$ and we write $P_{t-s} = P_{s,t}$. An important case where bounds of the form (3.1) hold is when there is exponential contractivity in the Kantorovich (Wasserstein) distance $W_1$. If for any two probability measures $\mu, \nu$ on $\mathbb{R}^n$

$$ W_1(\mu P_t, \nu P_t) \le C e^{-\lambda t} W_1(\mu, \nu) \tag{3.2} $$

then (3.1) holds for all Lipschitz functions $f$ with $C_s = C\|f\|_{\mathrm{Lip}}$ and $\lambda_s = \lambda$.

Here the $W_1$ distance between two probability measures $\mu$ and $\nu$ on $\mathbb{R}^n$ is defined by

$$ W_1(\mu,\nu) = \inf_\pi \int |x-y|\,\pi(dx\,dy) $$

where the infimum runs over all couplings $\pi$ of $\mu$ and $\nu$. We also have the Kantorovich–Rubinstein duality

$$ W_1(\mu,\nu) = \sup_{\|f\|_{\mathrm{Lip}}\le 1} \int f\,d\mu - \int f\,d\nu \tag{3.3} $$

and we use the notation

$$ \|f\|_{\mathrm{Lip}} = \sup_{x\ne y} \frac{|f(x)-f(y)|}{|x-y|}. $$

We can see that (3.2) implies (3.1) from

$$ |\nabla P_t f|(x) = \lim_{y\to x} \frac{|P_t f(y) - P_t f(x)|}{|y-x|} \le \|f\|_{\mathrm{Lip}} \lim_{y\to x} \frac{W_1(\delta_y P_t, \delta_x P_t)}{|y-x|} \le \|f\|_{\mathrm{Lip}}\, C e^{-\lambda t} \lim_{y\to x} \frac{W_1(\delta_y, \delta_x)}{|y-x|} = \|f\|_{\mathrm{Lip}}\, C e^{-\lambda t} $$

where the first inequality is due to the Kantorovich–Rubinstein duality (3.3) and the second is (3.2).
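For the OU process the contraction (3.2) holds with $C = 1$, $\lambda = 1$, and in fact $W_1(\delta_x P_t, \delta_y P_t) = e^{-t}|x-y|$ exactly, which can be checked empirically: in one dimension the $W_1$ distance between two equal-size samples is the average distance between their order statistics. A small sketch of this check (our own, not from the text):

```python
import math, random

# Empirical check of (3.2) for dX = -X dt + dB: the laws started from x and y
# at time t are Gaussians with the same variance and means x e^{-t}, y e^{-t},
# so W1(delta_x P_t, delta_y P_t) = e^{-t} |x - y|.  In 1d, W1 between two
# n-point samples is the mean distance between sorted samples.

random.seed(3)
t, n_steps, n_paths = 1.0, 200, 4000
dt = t / n_steps
x, y = 2.0, -1.0

def terminal(z0):
    # n_paths independent Euler paths started from z0, sorted terminal values
    samples = []
    for _ in range(n_paths):
        z = z0
        for _ in range(n_steps):
            z += -z * dt + random.gauss(0.0, math.sqrt(dt))
        samples.append(z)
    return sorted(samples)

a, b = terminal(x), terminal(y)
w1_est = sum(abs(u - v) for u, v in zip(a, b)) / n_paths
w1_exact = math.exp(-t) * abs(x - y)
```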

Bounds of the form (3.2) have been obtained using coupling methods in \autocite{eberle_reflection_2016,eberle_quantitative_2016,wang_exponential_2016}, under the condition that there exist positive constants $\kappa, R_0$ such that

$$ (x-y)\cdot(b(x)-b(y)) \le -\kappa|x-y|^2 \quad\text{when } |x-y| > R_0. $$

Similar techniques lead to the corresponding results for kinetic Langevin diffusions \autocite{eberle_couplings_2017}.

Using a different approach, in \autocite{crisan_pointwise_2016} the authors directly show uniform exponential contractivity of the semigroup gradient for bounded continuous functions, focusing on situations beyond hypoellipticity.

Besides gradient bounds, exponential contractivity in $W_1$ also implies the existence of a stationary measure $\mu_\infty$ \autocite{eberle_reflection_2016}. Proposition 3.1 now leads to a simple proof of a deviation inequality that was obtained in a similar setting in \autocite{joulin_new_2009} via a tensorization argument.

###### Proposition 3.2.

If (3.2) holds then for all Lipschitz functions $f$ and all initial measures $\mu_0$

$$ \mathbb{P}_{\mu_0}\Big(\frac{1}{T}\int_0^T f(X_t)\,dt - \int f\,d\mu_\infty > R\Big) \le \exp\Bigg(-\frac{1}{2}\bigg(\frac{\lambda\sqrt{T}\,R}{C\|f\|_{\mathrm{Lip}}\big(1-e^{-\lambda T}\big)} - \frac{W_1(\mu_0,\mu_\infty)}{\sqrt{T}}\bigg)^2\Bigg). $$
###### Proof.

We start by applying Proposition 3.1, so that

$$ \begin{aligned} \mathbb{P}_{\mu_0}\Big(\frac{1}{T}\int_0^T f(X_t)\,dt - \int f\,d\mu_\infty > R\Big) &= \mathbb{P}_{\mu_0}\Big(\frac{1}{T}\int_0^T f(X_t) - \mathbb{E} f(X_t)\,dt > R + \frac{1}{T}\int_0^T \mu_\infty(f) - \mu_0 P_t(f)\,dt\Big) \\ &\le \exp\Bigg(-\frac{\Big(R - \big|\frac{1}{T}\int_0^T \mu_\infty(f) - \mu_0 P_t(f)\,dt\big|\Big)^2 T}{2 V_T}\Bigg), \qquad V_T = \Big(\frac{\|f\|_{\mathrm{Lip}}\, C\big(1-e^{-\lambda T}\big)}{\lambda}\Big)^2. \end{aligned} $$

By the Kantorovich–Rubinstein duality and the invariance of $\mu_\infty$

$$ \Big|\frac{1}{T}\int_0^T \mu_\infty(f) - \mu_0 P_t(f)\,dt\Big| \le \frac{1}{T}\int_0^T \|f\|_{\mathrm{Lip}}\, W_1(\mu_\infty P_t, \mu_0 P_t)\,dt \le \frac{\|f\|_{\mathrm{Lip}}\, C\big(1-e^{-\lambda T}\big)}{\lambda T} W_1(\mu_\infty, \mu_0) = \frac{\sqrt{V_T}}{T} W_1(\mu_\infty, \mu_0) $$

from which the result follows immediately. ∎

## 4 Averaging: Two-timescale Ornstein-Uhlenbeck

Consider the following linear multiscale SDE on $\mathbb{R}^2$, where the first component is accelerated by a factor $\alpha$:

$$ \begin{aligned} dX_t &= -\alpha(X_t - Y_t)\,dt + \sqrt{\alpha}\,dB_t^X, & X_0 &= x_0 \\ dY_t &= -(Y_t - X_t)\,dt + dB_t^Y, & Y_0 &= y_0 \end{aligned} $$

with independent Brownian motions $B^X, B^Y$ on $\mathbb{R}$. Denote $P_t$ and $L$ the associated semigroup and infinitesimal generator respectively.

Let $f(x,y) = x - y$ and note that $Lf = -(\alpha+1)f$. By the regularity of $f$ and the Kolmogorov backward equation we have

$$ \partial_t \partial_x P_t f = \partial_x P_t L f = -(\alpha+1)\partial_x P_t f $$

so that

$$ \partial_x P_t f = \partial_x f\, e^{-(\alpha+1)t} = e^{-(\alpha+1)t}. $$

Repeating the same reasoning for $\partial_y P_t f$ and $P_t f$ gives

$$ \partial_y P_t f = -e^{-(\alpha+1)t} \quad\text{and}\quad P_t f(x,y) = (x-y)e^{-(\alpha+1)t}. $$

From Corollary 2.3

$$ \int_0^T X_t - Y_t\,dt = R_0^T f(x_0, y_0) + M_T^{T,f} $$

with

$$ \begin{aligned} R_t^T f(x,y) &= \int_t^T P_{s-t} f(x,y)\,ds = (x-y)\,\frac{1 - e^{-(\alpha+1)(T-t)}}{\alpha+1}, \\ M_T^{T,f} &= \int_0^T\!\int_t^T \partial_x P_{s-t} f(X_t, Y_t)\,ds\,\sqrt{\alpha}\,dB_t^X + \int_0^T\!\int_t^T \partial_y P_{s-t} f(X_t, Y_t)\,ds\,dB_t^Y \\ &= \int_0^T \frac{1 - e^{-(\alpha+1)(T-t)}}{\alpha+1}\big(\sqrt{\alpha}\,dB_t^X - dB_t^Y\big). \end{aligned} $$

This shows that for each fixed $T$

$$ Y_T - (B_T^Y + y_0) = \int_0^T X_t - Y_t\,dt $$

is a Gaussian random variable with mean

$$ R_0^T f(x_0, y_0) = (x_0 - y_0)\,\frac{1 - e^{-(\alpha+1)T}}{\alpha+1} $$

and variance

$$ \langle M^{T,f}\rangle_T = \frac{1}{\alpha+1}\int_0^T \big(1 - e^{-(\alpha+1)(T-t)}\big)^2\,dt. $$
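The closed-form mean and variance give a concrete target for simulation. The following sketch (our own; the parameter values are arbitrary choices) simulates the pair $(X, Y)$ by Euler–Maruyama and compares the empirical law of $\int_0^T (X_t - Y_t)\,dt$ with the formulas:

```python
import math, random

# Monte Carlo check of the Gaussian law of int_0^T (X_t - Y_t) dt for the
# two-timescale OU pair, against the closed-form mean
#   R_0^T f = (x0 - y0)(1 - e^{-(a+1)T})/(a+1)
# and variance <M>_T = (1/(a+1)) int_0^T (1 - e^{-(a+1)(T-t)})^2 dt.

random.seed(4)
a = 10.0                      # the timescale parameter alpha
T, n_steps, n_paths = 2.0, 1000, 1500
dt = T / n_steps
x0, y0 = 1.0, 0.0

mean_exact = (x0 - y0) * (1.0 - math.exp(-(a + 1) * T)) / (a + 1)
var_exact = sum(
    (1.0 - math.exp(-(a + 1) * (T - (k + 0.5) * dt))) ** 2 for k in range(n_steps)
) * dt / (a + 1)

vals = []
for _ in range(n_paths):
    X, Y, I = x0, y0, 0.0
    for _ in range(n_steps):
        I += (X - Y) * dt
        dBX = random.gauss(0.0, math.sqrt(dt))
        dBY = random.gauss(0.0, math.sqrt(dt))
        X, Y = (X - a * (X - Y) * dt + math.sqrt(a) * dBX,
                Y - (Y - X) * dt + dBY)
    vals.append(I)

mean_emp = sum(vals) / n_paths
var_emp = sum((v - mean_emp) ** 2 for v in vals) / n_paths
```

Note that $\alpha\,dt$ must be kept small for the explicit Euler scheme to resolve the fast component.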

## 5 Averaging: Exact gradients in the linear case

Consider

$$ \begin{aligned} dX_t &= -\alpha(X_t - Y_t)\,dt + \sqrt{\alpha}\,dB_t^X, & X_0 &= x_0 \\ dY_t &= -(Y_t - X_t)\,dt - \beta Y_t\,dt + dB_t^Y, & Y_0 &= y_0. \end{aligned} $$

Denote $Z_t(z)$ the solution with initial condition $z = (x_0, y_0)$, and let $V_t(z,v)$ be the derivative of the flow in the direction $v$, i.e. the derivative at $\varepsilon = 0$ of $\varepsilon \mapsto Z_t(z + \varepsilon v)$. Since the noise is additive, $V$ solves the deterministic linear ODE

$$ dV_t = -A V_t\,dt \quad\text{with}\quad A = \begin{pmatrix} \alpha & -\alpha \\ -1 & 1+\beta \end{pmatrix}, $$

whose solution is

$$ V_t(z,v) = e^{-At}v. $$

Since $V_t$ does not depend on $z$ we drop it from the notation. Now for any continuously differentiable function $f$ on $\mathbb{R}^2$ and $v \in \mathbb{R}^2$ we obtain the following expression for the gradient of $P_t f$ in the direction $v$:

$$ \begin{aligned} \nabla_v P_t f(z) &= \lim_{\varepsilon\to 0} \frac{P_t f(z + \varepsilon v) - P_t f(z)}{\varepsilon} = \lim_{\varepsilon\to 0} \frac{\mathbb{E}\,f(Z_t(z + \varepsilon v)) - f(Z_t(z))}{\varepsilon} \\ &= \lim_{\varepsilon\to 0} \frac{\mathbb{E}\,\nabla f(Z_t(z)) \cdot V_t(\varepsilon v) + o(|V_t(\varepsilon v)|)}{\varepsilon} \\ &= \mathbb{E}\,\nabla f(Z_t(z)) \cdot e^{-At}v. \end{aligned} $$

Since $\nabla_v P_t f = \nabla P_t f \cdot v$ we can identify $\nabla P_t f(z) = e^{-A^\top t}\,\mathbb{E}\,\nabla f(Z_t(z))$.

The eigenvalues of $A$ are $\lambda_0$ and $\alpha\lambda_1$ with

$$ \begin{aligned} \lambda_0 &= \tfrac{1}{2}\Big(\alpha+\beta+1 - \sqrt{(\alpha+\beta+1)^2 - 4\alpha\beta}\Big), \\ \lambda_1 &= \tfrac{1}{2\alpha}\Big(\alpha+\beta+1 + \sqrt{(\alpha+\beta+1)^2 - 4\alpha\beta}\Big). \end{aligned} $$

By observing that

$$ (\alpha+\beta+1)^2 - 4\alpha\beta = \big(\alpha - (1+\beta)\big)^2 + 4\alpha = \big(\beta - (\alpha+1)\big)^2 + 4\beta $$

we see that asymptotically as $\alpha \to \infty$

$$ \begin{aligned} \lambda_0 &= \beta + O(1/\alpha) \\ \lambda_1 &= 1 + \frac{1}{\alpha} + O(1/\alpha^2). \end{aligned} $$
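These formulas are easy to check numerically. The script below (our own) verifies that $\lambda_0$ and $\alpha\lambda_1$ are roots of the characteristic polynomial of $A$, checks the algebraic identity for the discriminant, and probes the asymptotics at a large value of $\alpha$:

```python
import math

# Checks on the eigenvalue formulas for A = [[a, -a], [-1, 1 + b]]
# (a = alpha, b = beta): the eigenvalues are l0 and a*l1 with
#   l0 = (a+b+1 - sqrt((a+b+1)^2 - 4ab)) / 2,
#   l1 = (a+b+1 + sqrt((a+b+1)^2 - 4ab)) / (2a).

def charpoly(a, b, lam):
    # det(A - lam*Id); the off-diagonal product is (-a)*(-1) = a
    return (a - lam) * (1 + b - lam) - a

a, b = 1000.0, 2.0
disc = math.sqrt((a + b + 1) ** 2 - 4 * a * b)
l0 = (a + b + 1 - disc) / 2
l1 = (a + b + 1 + disc) / (2 * a)

root_err = max(abs(charpoly(a, b, l0)), abs(charpoly(a, b, a * l1)))
ident_err = abs(((a + b + 1) ** 2 - 4 * a * b) - ((a - (1 + b)) ** 2 + 4 * a))
asym0 = abs(l0 - b)              # l0 = b + O(1/a)
asym1 = abs(l1 - (1 + 1 / a))    # l1 = 1 + 1/a + O(1/a^2)
```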

We can compute the following explicit expression for $e^{-At}$:

$$ e^{-At} = c_0(t)\,\mathrm{Id} - \frac{c_1(t)}{\alpha}A = \begin{pmatrix} c_2(t) & c_1(t) \\ c_1(t)/\alpha & c_0(t) - \frac{1+\beta}{\alpha}c_1(t) \end{pmatrix} $$

with

 c0(t)