Persistence exponents in Markov chains

Department of Statistics,
Columbia University,
New York, USA
Department of Mathematics,
Weizmann Institute of Science,
Rehovot, Israel
Abstract

We prove the existence of the persistence exponent

\[ -\log\lambda := \lim_{n\to\infty}\frac{1}{n}\log\mathbb{P}_\mu(X_0\in S,\dots,X_n\in S) \]

for a class of time homogeneous Markov chains in a Polish space, where $S$ is a Borel measurable set and $\mu$ is the initial distribution. Focusing on the case of AR($p$) and MA($q$) processes with $p, q\in\mathbb{N}$ and continuous innovation distribution, we study the existence of $\lambda$ and its continuity in the parameters, for $S=\mathbb{R}_{\ge 0}$. For AR processes with log-concave innovation distribution, we prove the strict monotonicity of $\lambda$. Finally, we compute new explicit exponents in several concrete examples.

Research partially supported by DFG grant AU370/4-1, by NSF grant DMS-1712037, and by grant 147/17 from the Israel Science Foundation.

MSC classes: Primary 60J05, 60F10; secondary 45C05, 47A75

Keywords: ARMA, eigenvalue problem, integral equation, large deviations, Markov chain, persistence, quasi-stationary distribution

1 Introduction

Let $\{X_i\}_{i\ge 0}$ be a time homogeneous Markov chain on a Polish space with transition kernel $P$. For a given Borel measurable set $S$, we are interested in the asymptotics of the persistence probability

\[ p_n(P,S,\mu) := \mathbb{P}_\mu(X_i\in S,\ 0\le i\le n) = \int_{S^{n+1}}\prod_{i=0}^{n-1}P(x_i,dx_{i+1})\,\mu(dx_0), \]

where $\mu$ is the initial measure, i.e. the law of $X_0$. We stress that we shall be particularly interested in non-compact $S$. We will be interested in the existence of the persistence exponent $\lambda(P,S,\mu)$, defined as

\[ -\log\lambda(P,S,\mu) := \lim_{n\to\infty}\frac{1}{n}\log p_n(P,S,\mu) \tag{1.1} \]

and its continuity and monotonicity properties in parameters of the kernel.

The asymptotics of persistence probabilities for not necessarily Markov processes has received both classical and recent interest in probability theory and theoretical physics. For recent surveys on persistence probabilities we refer the reader to [6] for a theoretical physics point of view and to [3] for a review of the mathematical literature.

Our approach exploits the Markovian structure and relates the persistence exponent to an eigenvalue of an appropriate operator, via the Krein-Rutman theorem. Such ideas go back at least to Tweedie [17, 18]. We work under somewhat different assumptions, and much of our effort lies in deriving the existence of the persistence exponent and its properties directly in terms of the kernel. The analysis in [17, 18] shows, under assumptions that are not always satisfied in the examples that we consider, the equivalence of the exponent’s existence and properties of the eigenvalue equation determined by the kernel. Even in very natural examples, we sometimes, and in particular in Section 5, need to work not with the kernel itself but rather with a modification of it.

One upshot of our approach is a study of monotonicity and continuity properties of the persistence exponent in parameters of the kernel $P$. We illustrate this in the case of AR and MA processes, where the kernel (and thus the persistence exponent) depends on the coefficient vector. In this context, we derive a monotonicity lemma (Lemma 5.2) that might be of independent interest. We emphasize that since we are not working in the Gaussian setup, standard tools such as Slepian’s inequality cannot be applied; in particular, our results seem to provide the first monotonicity statements outside of the Gaussian case. As an application, we prove the strict monotonicity of the persistence exponent for AR($p$) processes with log concave innovation distributions. Finally, we demonstrate the strength of our approach by computing a number of new persistence exponents in concrete examples by solving the corresponding eigenvalue equation.

The outline of the paper is as follows: Section 1.1 contains our main abstract existence result. The short and technical Section 1.2 contains an abstract monotonicity lemma and a continuity lemma. The abstract framework is then applied to auto-regressive (AR) and moving-average (MA) processes in Section 2, where existence of the exponent, continuity of the exponent, (strict) monotonicity results, and the question whether the exponent is degenerate are discussed. Finally, Section 3 contains a number of concrete cases where we are able to solve the eigenvalue equation, i.e. to find the leading eigenvalue explicitly. Sections 4 to 6 are devoted to the proofs corresponding to the former three topics, respectively.

1.1 Existence of the exponent

We begin with the following definition.

Definition 1.1.

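Let $B(S)$ denote the set of all bounded measurable functions on $S$, and let $C_b(S)$ denote the space of continuous bounded functions on $S$ equipped with the sup norm. For a bounded linear operator $K$ mapping $B(S)$ to itself, define the operator norm

\[ \|K\| := \sup_{g\in B(S):\,\|g\|_\infty\le 1}\|Kg\|_\infty, \]

and set

\[ \lambda(K) := \lim_{n\to\infty}\|K^n\|^{1/n}\in[0,1]. \tag{1.2} \]

Note that the limit in (1.2) exists by sub-additivity, and that $\lambda(K)\le\|K\|$. Note also that $p_n(P,S,\mu)=\int_S[P_S^n 1](x)\,\mu(dx)$, where $P_S$ is the linear operator on $B(S)$ defined by

\[ [P_S(g)](x) := \int_S g(y)\,P(x,dy),\quad x\in S, \tag{1.3} \]

while, for comparison,

\[ \lambda(P,S) := \lambda(P_S) = \limsup_{n\to\infty}\Big(\sup_{x\in S}P_S^n 1(x)\Big)^{1/n}. \]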
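For a finite state space the quantities above reduce to matrix computations, which makes the definitions easy to check numerically. The following sketch is an illustration only: the chain on three states, the choice $S=\{0,1\}$, the initial law, and all function names are assumptions made for this example and do not come from the paper.

```python
# Illustrative finite-state example (not from the paper): a chain on {0, 1, 2}
# with S = {0, 1}. Restricting the kernel to S gives a substochastic matrix P_S.
P_S = [[0.5, 0.3],   # from state 0: stay in S with probability 0.8
       [0.2, 0.4]]   # from state 1: stay in S with probability 0.6


def apply(K, g):
    """Return the vector K g, i.e. [K g](x) = sum_y K[x][y] * g[y]."""
    return [sum(K[x][y] * g[y] for y in range(len(g))) for x in range(len(K))]


def persistence_probabilities(K, mu, n_max):
    """Return [p_0, ..., p_{n_max}] with p_n = sum_x mu[x] * [K^n 1](x)."""
    g = [1.0] * len(K)          # the constant function 1 on S
    probs = []
    for _ in range(n_max + 1):
        probs.append(sum(m * v for m, v in zip(mu, g)))
        g = apply(K, g)         # g now holds the next power of K applied to 1
    return probs


mu = [1.0, 0.0]                          # hypothetical initial law: start in state 0
p = persistence_probabilities(P_S, mu, 60)
ratio_estimate = p[60] / p[59]           # successive ratio p_{n+1} / p_n
root_estimate = p[60] ** (1.0 / 60)      # n-th root, as in the definition
print(ratio_estimate, root_estimate)
```

The matrix chosen here has eigenvalues 0.7 and 0.2, so both estimates approach 0.7; the successive ratio converges geometrically, while the $n$-th root carries a slowly vanishing prefactor.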

We recall that an operator $K$ from $C_b(S)$ to itself is called compact if for any sequence $\{g_n\}_{n\ge 1}$ in $C_b(S)$ with $\|g_n\|_\infty\le 1$ one finds a subsequence $\{n_k\}_{k\ge 1}$ such that $Kg_{n_k}$ converges in sup norm.

Theorem 1.2.

Assume the following conditions:

(i) $K$ is a non-negative linear operator which maps $C_b(S)$ into itself, and $K^k$ is compact for some $k\ge 1$.

(ii) $\mu(U)>0$ for any non-empty open set $U\subseteq S$.

Then,

\[ \int_S K^n 1(x)\,\mu(dx) = \lambda(K)^{n+o(n)}. \tag{1.4} \]

Further, if $\lambda(K)>0$, then $\lambda(K)$ is the largest eigenvalue of the operator $K$, the corresponding eigenfunction is non-negative, and there exists a bounded, non-negative, finitely additive regular measure on $S$ which is a left eigenvector of $K$ corresponding to the eigenvalue $\lambda(K)$.

Remark 1.3.

1) Replacing $K$ by $P_S$ in Theorem 1.2 gives immediately a sufficient condition for the existence of a universal persistence exponent for all initial conditions $\mu$ satisfying condition (ii). As we will see in Section 2, this is not always the best choice.
2) The assumption of compactness of $K^k$ for some $k$ (rather than the compactness of $K$ itself) is (a) sufficient for the proof to go through and (b) necessary in order to deal with concrete examples: for example, one can show that the operator corresponding to an MA($q$) process is typically not compact itself, but only a suitable power of it is compact.
3) The left eigenvector in Theorem 1.2 is only finitely additive. This is a consequence of the fact that $S$ can be (and typically is, in our applications) non-compact. This complicates some of the following arguments. For example, the proof of Proposition 3.6 would be immediate if it were a measure.

1.2 Properties of exponents

We begin with a definition.

Definition 1.4.

Suppose is equipped with a partial order . Let denote the class of bounded, non-negative, non-decreasing (in the sense of this partial order) measurable functions on .

A non-negative bounded linear operator on is stochastically non-decreasing with respect to the partial order if maps to itself.

The following lemma gives a sufficient condition for comparing and for two bounded non-negative linear operators .

Lemma 1.5.

Let and be two bounded non-negative linear operators on , such that the following conditions hold:

1. There exists a non-negative measurable function on such that for any .

2. is stochastically non-decreasing on .

Then for any we have where .

For studying the continuity of exponents, we prove the following lemma, which relates the continuity of exponents to continuity in operator norm.

Lemma 1.6.

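For every $1\le\ell\le\infty$ let $K_\ell$ be a bounded linear operator on $B(S)$. If $\lim_{\ell\to\infty}\|K_\ell-K_\infty\|=0$, then $\lim_{\ell\to\infty}\lambda(K_\ell)=\lambda(K_\infty)$.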

2 Results for AR and MA processes

In this section we study the persistence exponents of two classes of examples of interest: auto-regressive processes and moving-average processes.

2.1 Auto-regressive processes

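First we deal with auto-regressive processes (AR processes of order $p$). Let $\{\xi_i\}$ be a sequence of i.i.d. random variables of law $F$ possessing a continuous density function $\phi$ with respect to the Lebesgue measure. Let $a=(a_1,\dots,a_p)\in\mathbb{R}^p$ be a vector, called the coefficient vector. Given $(Z_0,\dots,Z_{p-1})$ independent of the sequence $\{\xi_i\}$, we define an AR($p$) process $\{Z_i\}_{i\ge 0}$ by setting

\[ Z_i := \sum_{j=1}^p a_jZ_{i-j}+\xi_i,\quad i\ge p. \]

We will always assume that $(Z_0,\dots,Z_{p-1})$ is distributed according to the law $\mu$. Let $K$ be defined by

\[ K\psi(x_1,\dots,x_p) := \int_{y+\sum_{j=1}^p a_jx_{p+1-j}>0}\psi\Big(x_2,\dots,x_p,\,y+\sum_{j=1}^p a_jx_{p+1-j}\Big)\phi(y)\,dy. \tag{2.1} \]

It is not hard to verify that under the above assumptions, $K$ maps $C_b([0,\infty)^p)$ to itself.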

We first turn to the existence of the persistence exponent. Here, we have to distinguish the cases , , and .

Theorem 2.1.

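Fix $p\ge 1$, $a\in(-\infty,0]^p$, continuous innovation density $\phi$, initial distribution $\mu$ satisfying $\mu(U)>0$ for all open $U\subseteq[0,\infty)^p$, and let $\{Z_i\}_{i\ge 0}$ be the associated AR($p$) process.

1. There is a $\theta_F(a)$, independent of $\mu$, such that

\[ \mathbb{P}_\mu\Big(\min_{0\le i\le n}Z_i\ge 0\Big) = \theta_F(a)^{n+o(n)}. \]

Further, if $\theta_F(a)>0$, then $\theta_F(a)$ is the largest eigenvalue of the operator $K$ from (2.1), viewed as an operator mapping $C_b([0,\infty)^p)$ to itself. The corresponding eigenvector is non-negative and continuous.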

2. If then , and if then .

3. If is a sequence of vectors in converging to and , then we have .

As the next proposition shows, if the coefficient vector in Theorem 2.1 is allowed to have positive entries, depending on the initial distribution the exponent need not even exist.

Proposition 2.2.

Suppose is an AR() process with innovation distribution , and .

1. If the initial distribution is , then and it is continuous in .

2. If the initial distribution satisfies

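\[ \limsup_{M\to\infty}\frac{1}{\log M}\log\mathbb{P}_\mu(Z_0>M) = 0, \tag{2.2} \]
\[ \liminf_{M\to\infty}\frac{1}{\log M}\log\mathbb{P}_\mu(Z_0>M) = -\infty, \tag{2.3} \]

then there exist sequences $\{m_k\}_{k\ge 1}$ and $\{n_k\}_{k\ge 1}$ such that

\[ \liminf_{k\to\infty}\frac{1}{m_k}\log\mathbb{P}\Big(\min_{0\le i\le m_k}Z_i\ge 0\Big) = 0, \tag{2.4} \]
\[ \limsup_{k\to\infty}\frac{1}{n_k}\log\mathbb{P}\Big(\min_{0\le i\le n_k}Z_i\ge 0\Big) \le \log\theta_F(a_1,\mu_0), \tag{2.5} \]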

where is as in part (a). In particular, the exponent does not exist for any , whereas .

It follows that the AR() operator with and is no longer compact on , as otherwise Theorem 1.2 would be applicable with .

In order to address this issue and derive an existence result for more general situations where the operator is not compact, one needs to make a judicious choice of the operator in Theorem 1.2, and this requires additional assumptions on the initial measure and innovation. We focus below on the contractive case .

Theorem 2.3.

Fix , parameters satisfying , continuous innovation density , initial distribution satisfying for all open , and let be the associated AR(p) process. Further assume that there exists such that

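\[ \mathbb{E}_\mu\,e^{\delta\sum_{j=0}^{p-1}Z_j}\,1\Big\{\min_{0\le i\le p-1}Z_i\ge 0\Big\}<\infty \tag{2.6} \]

and

\[ \limsup_{t\to\infty}\frac{1}{|t|}\log\phi(t)<0. \tag{2.7} \]

1. There is a $\theta_F(a)$ independent of the initial distribution $\mu$ such that

\[ \mathbb{P}_\mu\Big(\min_{0\le i\le n}Z_i\ge 0\Big) = \theta_F(a)^{n+o(n)}. \]

Further, if $\theta_F(a)>0$, then $\theta_F(a)$ is an eigenvalue of the operator $K$ defined by (2.1). The corresponding eigenvector is non-negative and continuous.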

2. If then , and if then .

3. The function is continuous on the set .

As mentioned before, the proof of Theorem 2.3 employs a modified version of the operator , which now turns out to be compact if . The motivation behind the modification of the operator borrows from [2, 5, 15], who use a similar strategy to deal with AR() processes with Gaussian innovations starting at stationarity.

2.1.1 Strict monotonicity of the exponent

We first begin with the following proposition which shows monotonicity (not necessarily strict) of the AR persistence exponent. Recall that if then the exponent may not exist, as shown in Proposition 2.2.

Proposition 2.4.

For any , if then

\[ \mathbb{P}_{\mu,b}\Big(\min_{0\le i\le n}Z_i\ge 0\Big) \ge \mathbb{P}_{\mu,a}\Big(\min_{0\le i\le n}Z_i\ge 0\Big). \]

In particular, if both exponents exist, then we have .

If the limit in (2.3) in Proposition 2.2 is , then the same proof shows that the corresponding exponent exists and equals for all , and consequently the function is not strictly monotone. Our next theorem shows that if has a log concave density on and the initial distribution has finite exponential moment, then the map is strictly increasing on the set . The exponential decay of log concave densities ensures that (2.7) holds, and so Theorem 2.3 guarantees the existence of a non-trivial exponent which is free of the initial distribution.

Theorem 2.5.

Assume is a strictly positive log concave density over , that , and that satisfies (2.6). Then with implies .

We complete the picture on the positive orthant through the next theorem, which says that the persistence exponent for all such that , for any innovation distribution .

Proposition 2.6.

Assume that and . If and then .

Proposition 2.6, together with Theorems 2.3 and 2.5, gives a complete picture in terms of monotonicity on the positive orthant. The function is continuous and non-increasing on , and identically equal to on the set . If further the innovation density is log concave and the initial distribution has finite exponential moment, then the exponent is strictly increasing on .

2.1.2 Positivity of the exponent

Part (b) of Theorem 2.1 and Theorem 2.3 give conditions ensuring that the exponent is non-trivial, i.e. the persistence probability decays at an exponential rate. The next proposition generalizes this to show that no matter what the coefficient vector may be, the exponent can never be , i.e. the persistence probability can never decay at a super exponential rate.

Proposition 2.7.

Fix , parameters , innovation distribution satisfying , and initial satisfying for every . Let be the associated AR(p) process. Then,

\[ \liminf_{n\to\infty}\frac{1}{n}\log\mathbb{P}\Big(\min_{0\le i\le n}Z_i\ge 0\Big) > -\infty. \]

In particular, under the assumptions of Proposition 2.7, if exists then it must be positive.

Remark 2.8.

One cannot dispense completely with the assumptions in Proposition 2.7. Indeed, concerning the condition on the initial distribution, when , , and , one sees that forces , and so . On the other hand, concerning the condition on the innovation distribution, if , , and for all , one obtains that , and so again .

2.2 Moving Average process

We next discuss moving average processes. Let $\{\xi_i\}_{i\ge -q}$ be a sequence of i.i.d. random variables from a continuous distribution function $F$. For a coefficient vector $a=(a_1,\dots,a_q)\in\mathbb{R}^q$ define the MA($q$) process $\{Z_i\}_{i\ge 0}$ by setting

\[ Z_i := \xi_i+\sum_{j=1}^q a_j\xi_{i-j},\quad i=0,1,2,\dots. \]

Define the operator $K$ by

\[ K\psi(x_1,\dots,x_q) = \int_{y+\sum_{j=1}^q a_jx_{q+1-j}>0}\psi(x_2,\dots,x_q,y)\,F(dy). \tag{2.8} \]

Theorem 2.9.

For all MA() processes with , there is a so that

\[ \mathbb{P}\Big(\min_{0\le i\le n}Z_i\ge 0\Big) = \beta_F(a)^{n+o(n)}. \tag{2.9} \]

Further, if $\beta_F(a)>0$ then $\beta_F(a)$ is the largest eigenvalue of the operator $K$ defined in (2.8), and the corresponding eigenfunction is non-negative and continuous.

The following theorem establishes the continuity of the MA persistence exponent.

Theorem 2.10.

In the setting of Theorem 2.9, the function is continuous on .

Theorem 2.9 shows that . Our next proposition gives a necessary and sufficient condition for .

Proposition 2.11.

Suppose we have an MA() process such that . Then if and only if .

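For the particular case $q=1$ and $a_1=-1$ and any innovation distribution with a continuous density, we have

\[ \mathbb{P}\Big(\min_{0\le i\le n}Z_i\ge 0\Big) = \mathbb{P}(\xi_{-1}<\xi_0<\dots<\xi_n) = \frac{1}{(n+2)!}, \tag{2.10} \]

and so $\beta_F(a)=0$ in this case. This observation has already been noted in [12, 14].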
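The identity (2.10) can be checked by simulation. The sketch below is a Monte Carlo illustration, assuming standard Gaussian innovations (any continuous law should do) and reading (2.10) as the MA(1) case with coefficient $-1$, so that $Z_i=\xi_i-\xi_{i-1}$; the function name, seed, and sample sizes are choices made for this example.

```python
import math
import random


def ma1_persistence_mc(a1, n, trials, rng):
    """Monte Carlo estimate of P(min_{0<=i<=n} Z_i >= 0) for Z_i = xi_i + a1 * xi_{i-1}."""
    hits = 0
    for _ in range(trials):
        prev = rng.gauss(0.0, 1.0)      # plays the role of xi_{-1}
        alive = True
        for _ in range(n + 1):          # i = 0, 1, ..., n
            cur = rng.gauss(0.0, 1.0)   # xi_i
            if cur + a1 * prev < 0.0:
                alive = False
                break
            prev = cur
        hits += alive
    return hits / trials


rng = random.Random(0)                  # fixed seed for reproducibility
for n in range(4):
    estimate = ma1_persistence_mc(-1.0, n, 100_000, rng)
    print(n, estimate, 1.0 / math.factorial(n + 2))
```

The second printed column should track $1/(n+2)!$ up to sampling error, which illustrates the super-exponential decay in this degenerate case.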

3 Computing the exponent in concrete cases

Using our operator approach, we can compute the persistence exponent in a number of concrete examples.

3.1 Results for AR(1) processes

We begin with the computation of the persistence exponent for the AR(1) process with uniformly distributed innovations.

Proposition 3.1.

Let denote an AR(1) process with , arbitrary initial distribution , and with innovation density , where . Then

\[ \mathbb{P}\Big(\min_{0\le i\le n}Z_i\ge 0\Big) = \Big(\frac{2b}{\pi(a+b)}\Big)^{n+o(n)}. \]

Our second example concerns exponential innovations.

Proposition 3.2.

Let denote an AR(1) process with , arbitrary initial distribution , and standard exponential innovations. Then

\[ \mathbb{P}\Big(\min_{0\le i\le n}Z_i\ge 0\Big) = \Big(\frac{1}{1-a_1}\Big)^{n-1}\,\mathbb{E}\,e^{a_1Z_0}1\{Z_0\ge 0\}. \]
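The closed form in Proposition 3.2 lends itself to a quick simulation check. The sketch below makes two assumptions that are not stated in the text above: the coefficient $a_1$ is taken negative (the condition on $a_1$ is elided in the statement), and $Z_0$ is taken standard exponential, for which $\mathbb{E}\,e^{a_1Z_0}1\{Z_0\ge 0\}=1/(1-a_1)$. Function names, seed, and sample sizes are illustrative.

```python
import random


def ar1_exponential_mc(a1, n, trials, rng):
    """Monte Carlo estimate of P(min_{0<=i<=n} Z_i >= 0) with Z_0 ~ Exp(1)."""
    hits = 0
    for _ in range(trials):
        z = rng.expovariate(1.0)            # Z_0, automatically >= 0
        alive = True
        for _ in range(n):                  # steps i = 1, ..., n
            z = a1 * z + rng.expovariate(1.0)
            if z < 0.0:
                alive = False
                break
        hits += alive
    return hits / trials


def closed_form(a1, n):
    """Right-hand side of Proposition 3.2 for Z_0 ~ Exp(1), a1 < 0 and n >= 1.

    The factor E[exp(a1 * Z_0) 1{Z_0 >= 0}] equals 1 / (1 - a1) in this case.
    """
    return (1.0 / (1.0 - a1)) ** (n - 1) * (1.0 / (1.0 - a1))


rng = random.Random(0)                      # fixed seed for reproducibility
for n in range(1, 5):
    print(n, ar1_exponential_mc(-0.5, n, 100_000, rng), closed_form(-0.5, n))
```

The agreement reflects the memoryless property: conditionally on $Z_i\ge 0$ for some $i\ge 1$, the value $Z_i$ is again standard exponential, so each further step contributes the same factor $1/(1-a_1)$.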

3.2 Results for MA processes

We next consider MA(1) processes, starting with uniform innovation density.

Proposition 3.3.

Let denote an MA(1) process with and innovation density , where .

• If then

\[ \mathbb{P}\Big(\min_{0\le i\le n}Z_i\ge 0\Big) = \Big(\frac{4b}{\pi(a+b)}\Big)^{n+o(n)}. \]
• If then

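\[ \mathbb{P}\Big(\min_{0\le i\le n}Z_i\ge 0\Big) = \lambda^{n+o(n)}, \]

where $\lambda$ is the largest real solution to the equation

\[ \tan\Big(\frac{a}{(a+b)\lambda}\Big) = \frac{1-(1-2a/(a+b))/\lambda}{1+(1-2a/(a+b))/\lambda}. \tag{3.1} \]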

For $a=b$ in Proposition 3.3, one obtains the value $2/\pi$. The next theorem shows that for continuous symmetric innovation distributions this value is universal.
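The symmetric case can be verified directly. The sketch below assumes that (3.1) reads $\tan\big(a/((a+b)\lambda)\big)$ on the left and a ratio of $1\mp(1-2a/(a+b))/\lambda$ on the right; under that reading, setting $a=b$ reduces the equation to $\tan(1/(2\lambda))=1$, whose largest solution is $\lambda=2/\pi$. The function name is hypothetical.

```python
import math


def residual_3_1(lam, a, b):
    """Left-hand side minus right-hand side of equation (3.1), as read above."""
    c = 1.0 - 2.0 * a / (a + b)
    lhs = math.tan(a / ((a + b) * lam))
    rhs = (1.0 - c / lam) / (1.0 + c / lam)
    return lhs - rhs


a = b = 1.0                        # symmetric case; the common value is arbitrary
lam = 2.0 / math.pi
first_case_value = 4.0 * b / (math.pi * (a + b))   # first formula of Proposition 3.3
print(residual_3_1(lam, a, b))     # close to zero
print(first_case_value, lam)       # both formulas give the same value at a = b
```

Both branches of Proposition 3.3 therefore agree at $a=b$, consistent with the universal value appearing in Theorem 3.4.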

Theorem 3.4.

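Let $\{Z_i\}$ denote an MA(1) process with $a_1=1$ and symmetric innovation density. Then

\[ \mathbb{P}\Big(\min_{1\le i\le n}Z_i\ge 0\Big) = \mathbb{P}\Big(\min_{1\le i\le n}(\xi_i+\xi_{i-1})\ge 0\Big) = \sum_{k\in\mathbb{Z}}\frac{2}{(\pi/2+2\pi k)^{n+2}} = \Big(\frac{2}{\pi}\Big)^{n+o(n)}. \]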
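The series in Theorem 3.4 is easy to evaluate numerically. The sketch below sums it symmetrically over $|k|\le k_{\max}$ (a truncation chosen for this example) and compares the result with two values that can be checked by hand when the minimum runs over $1\le i\le n$: with no constraint ($n=0$) the probability is 1, and with the single constraint $\xi_1+\xi_0\ge 0$ it is $1/2$ by symmetry. The function name and truncation level are assumptions of this illustration.

```python
import math


def theorem_3_4_series(n, k_max=100_000):
    """Symmetric partial sum over |k| <= k_max of 2 / (pi/2 + 2*pi*k)**(n + 2)."""
    total = 0.0
    for k in range(-k_max, k_max + 1):
        total += 2.0 / (math.pi / 2.0 + 2.0 * math.pi * k) ** (n + 2)
    return total


for n in range(4):
    print(n, theorem_3_4_series(n))

# For large n the k = 0 term dominates, so successive ratios approach 2 / pi.
ratio = theorem_3_4_series(21, 50) / theorem_3_4_series(20, 50)
print(ratio, 2.0 / math.pi)
```

The ratio of consecutive terms settles at $2/\pi$ very quickly because the next largest term in the series is smaller by a factor of order $3^{-(n+2)}$.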

Theorem 3.4 first appears in [14]. Their proof uses different methods.

We show in Proposition 3.5 below that the universality in Theorem 3.4 does not extend to discrete distributions. In fact, for discrete innovation distributions , there can be non-trivial differences between the two quantities

\[ \mathbb{P}\Big(\min_{0\le i\le n}Z_i>0\Big)\quad\text{and}\quad\mathbb{P}\Big(\min_{0\le i\le n}Z_i\ge 0\Big). \]

Proposition 3.5.

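Let $\{Z_i\}$ denote an MA(1) process with $a_1=1$ and Rademacher innovations, i.e. $\xi_i$ equal to $\pm 1$ with probability $1/2$ each. Then

\[ \mathbb{P}\Big(\min_{0\le i\le n}Z_i>0\Big) = (1/2)^{n+2}, \]

while

\[ \mathbb{P}\Big(\min_{0\le i\le n}Z_i\ge 0\Big) = \Big(\frac{1}{2}+\frac{1}{\sqrt{5}}\Big)\Big(\frac{1+\sqrt{5}}{4}\Big)^{n+1}+\Big(\frac{1}{2}-\frac{1}{\sqrt{5}}\Big)\Big(\frac{1-\sqrt{5}}{4}\Big)^{n+1}. \]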
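Since the innovations take only two values, both probabilities in Proposition 3.5 can be checked by exhaustive enumeration for small $n$. The sketch below assumes the coefficient $a_1=1$, so that $Z_i=\xi_i+\xi_{i-1}$ with $\xi_{-1},\dots,\xi_n\in\{-1,+1\}$; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product
import math


def rademacher_ma1_probabilities(n):
    """Enumerate (xi_{-1}, ..., xi_n) in {-1, +1}^(n+2) for Z_i = xi_i + xi_{i-1}."""
    strict = weak = 0
    for xi in product((-1, 1), repeat=n + 2):
        sums = [xi[i] + xi[i + 1] for i in range(n + 1)]   # Z_0, ..., Z_n
        strict += min(sums) > 0
        weak += min(sums) >= 0
    total = 2 ** (n + 2)
    return Fraction(strict, total), Fraction(weak, total)


def weak_formula(n):
    """Closed form for P(min Z_i >= 0) stated in Proposition 3.5."""
    s5 = math.sqrt(5.0)
    return ((0.5 + 1 / s5) * ((1 + s5) / 4) ** (n + 1)
            + (0.5 - 1 / s5) * ((1 - s5) / 4) ** (n + 1))


for n in range(6):
    strict, weak = rademacher_ma1_probabilities(n)
    print(n, strict, weak, weak_formula(n))
```

The strict event forces every innovation to equal $+1$, while the weak event only forbids two consecutive $-1$ values, which explains the Fibonacci-type growth behind the golden-ratio base $(1+\sqrt{5})/4$.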

Our final example considers MA(1) processes with exponential innovation distribution.

Proposition 3.6.

Let denote an MA(1) process with and standard exponential innovations. Then

\[ \mathbb{P}\Big(\min_{0\le i\le n}Z_i\ge 0\Big) = (1+a_1)^{n+o(n)}. \]

4 Proof of the results of Section 1

Proof of Theorem 1.2.

The upper bound is simple: using that $\|K^n(1)\|_\infty\le\|K^n\|$, we obtain from (1.2) that

\[ p_n(P,S,\mu) = \int_S[K^n(1)](x)\,\mu(dx) \le \sup_{x\in S}[K^n 1](x) = \|K^n(1)\|_\infty \le \lambda(K)^{n+o(n)}. \]

We turn to the lower bound. We may and will assume that $\lambda(K)>0$ since otherwise there is nothing left to prove. Note that $C_b(S)$ equipped with the sup norm is a Banach space (even if $S$ is not compact, see [9], p. 257). Thus denoting by $K^k$ the $k$-fold composition of $K$ (note that we consider $K$ acting on the smaller space $C_b(S)$), by assumption (i), $K^k$ is a compact operator. Further,

\[ \lim_{n\to\infty}\big(\|(K^k)^n\|\big)^{1/n} = \Big(\lim_{n\to\infty}\big(\|K^{nk}(1)\|_\infty\big)^{\frac{1}{kn}}\Big)^k = \lambda(K)^k>0, \]

and so an application of [7, Theorem 19.2] (also see [1, Problem 7.1.9]) yields the existence of a non-negative continuous function $\tilde\psi$, $\tilde\psi\not\equiv 0$, such that

\[ K^k\tilde\psi(x) = \lambda^k\tilde\psi(x),\quad\forall x\in S. \]

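Setting

\[ \psi(x) := \sum_{a=0}^{k-1}\lambda^a[K^{k-1-a}(\tilde\psi)](x), \]

we note that $\psi\in C_b(S)$. Also note that $\psi\ge\lambda^{k-1}\tilde\psi$, and so $\psi$ is non-zero and non-negative. Finally a telescopic cancellation gives

\[ K\psi-\lambda\psi = \sum_{a=0}^{k-1}\lambda^aK^{k-a}(\tilde\psi)-\sum_{a=0}^{k-1}\lambda^{a+1}K^{k-1-a}(\tilde\psi) = K^k\tilde\psi-\lambda^k\tilde\psi = 0, \]

and so $K\psi=\lambda\psi$. Thus, setting $c:=\|\psi\|_\infty$, we obtain

\[ [K^n(1)](x) \ge \frac{1}{c}[K^n(\psi)](x) = \frac{1}{c}\lambda^n\psi(x). \]

Integrating the last inequality with respect to $\mu$ gives

\[ \int_S[K^n(1)](x)\,\mu(dx) \ge \frac{\int_S\psi(x)\,\mu(dx)}{c}\lambda^n. \]

Since $\int_S\psi(x)\,\mu(dx)>0$ by assumption (ii) on $\mu$, the lower bound in (1.4) follows at once.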
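The passage from an eigenfunction of $K^k$ to an eigenfunction of $K$ can be seen in a two-dimensional toy case. The sketch below is an illustration only: the matrix, the starting vector, and the helper names are assumptions chosen so that $\tilde\psi$ is an eigenvector of $K^2$ but not of $K$, and the construction $\psi=\sum_{a=0}^{k-1}\lambda^aK^{k-1-a}\tilde\psi$ repairs this.

```python
def mat_vec(K, v):
    """Return the matrix-vector product K v."""
    return [sum(K[i][j] * v[j] for j in range(len(v))) for i in range(len(K))]


def build_eigenfunction(K, psi_tilde, lam, k):
    """Return psi = sum_{a=0}^{k-1} lam**a * K^{k-1-a} psi_tilde."""
    # powers[m] holds K^m psi_tilde for m = 0, ..., k-1
    powers = [list(psi_tilde)]
    for _ in range(k - 1):
        powers.append(mat_vec(K, powers[-1]))
    psi = [0.0] * len(psi_tilde)
    for a in range(k):
        term = powers[k - 1 - a]
        psi = [p + lam ** a * t for p, t in zip(psi, term)]
    return psi


# Toy operator: K swaps the two coordinates and halves them, so K^2 = 0.25 * I.
K = [[0.0, 0.5],
     [0.5, 0.0]]
lam, k = 0.5, 2
psi_tilde = [1.0, 0.0]          # eigenvector of K^2 (eigenvalue lam**2), not of K
psi = build_eigenfunction(K, psi_tilde, lam, k)
print(psi, mat_vec(K, psi))     # K psi equals lam * psi
```

Here $\psi=(0.5,0.5)$ and $K\psi=(0.25,0.25)=\lambda\psi$, matching the telescoping identity used in the proof.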

Finally, the fact that $\lambda$ is the largest eigenvalue of $K$ follows from the fact that $\lambda^k$ is the largest eigenvalue of $K^k$, a consequence of [7, Theorem 19.2]. Also, existence of the left eigenvector follows from [7, Exercise 12, p. 236], along with the observation that the dual of $C_b(S)$ is the space of bounded, finitely additive regular measures on $S$, see [9, Theorem IV.6.2.2]. ∎

Proof of Lemma 1.5.

Since , using assumption (ii) we have for all . Using condition (i) , which is the desired conclusion for . To verify the statement for general , we proceed by induction:

\[ K_1^i(g) = K_1(K_1^{i-1}(g)) \ge h(x)K_2(K_1^{i-1}(g)) = K_{2,h}(K_1^{i-1}(g)) \ge K_{2,h}^i(g). \]

In the last display, we use the fact that along with condition (i) for the first inequality, and the induction hypothesis along with the fact that preserves the ordering in the second inequality. ∎

Proof of Lemma 1.6.

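Since $K_\ell$ converges to $K_\infty$, without loss of generality assume $\|K_\ell-K_\infty\|\le 1$. Also without loss of generality, by scaling all operators involved if necessary, we can assume that $\|K_\infty\|\le 1$. Thus, for any $f$ with $\|f\|_\infty\le 1$ and arbitrary $\delta\in(0,1)$ we have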

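\[ \|K_\ell^nf\|_\infty = \|(K_\infty+K_\ell-K_\infty)^nf\|_\infty \le \lfloor n\delta\rfloor\binom{n}{\lfloor n\delta\rfloor}\|K_\infty^{n-\lfloor n\delta\rfloor}\|+2^n\|K_\ell-K_\infty\|^{\lfloor n\delta\rfloor}, \]

which upon taking sup over $f$, taking $n$-th roots, letting $n\to\infty$ and invoking (1.2) gives

\[ \lambda(K_\ell) \le \max\Big(\delta^{-\delta}(1-\delta)^{-(1-\delta)}\lambda(K_\infty)^{1-\delta},\ 2\|K_\ell-K_\infty\|^{\delta}\Big). \]

Letting $\ell\to\infty$ followed by $\delta\to 0$ gives

\[ \limsup_{\ell\to\infty}\lambda(K_\ell) \le \lambda(K_\infty), \]

which is the upper bound. The lower bound follows by a symmetric argument, reversing the roles of $K_\ell$ and $K_\infty$. ∎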
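The continuity statement of Lemma 1.6 can be observed numerically on small matrices. The sketch below is an illustration under stated assumptions: the limit matrix, the perturbation direction, and the helper names are hypothetical, and $\lambda(K)$ is approximated by the ratio $\|K^{n+1}\|/\|K^n\|$, which is a reasonable proxy here because the matrices used have a simple dominant positive eigenvalue.

```python
def mat_mul(A, B):
    """Return the matrix product A B for square matrices of equal size."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def sup_operator_norm(A):
    """Operator norm induced by the sup norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)


def spectral_radius_estimate(K, n=200):
    """Approximate lambda(K) = lim ||K^n||^(1/n) by the ratio ||K^(n+1)|| / ||K^n||."""
    power = K
    for _ in range(n - 1):
        power = mat_mul(power, K)       # power equals K^n after the loop
    return sup_operator_norm(mat_mul(power, K)) / sup_operator_norm(power)


K_inf = [[0.5, 0.3],
         [0.2, 0.4]]                    # eigenvalues 0.7 and 0.2


def perturbed(ell):
    """K_ell = K_inf + (1/ell) * E, where E has a single 1 in the top-left entry."""
    return [[0.5 + 1.0 / ell, 0.3],
            [0.2, 0.4]]


limit_value = spectral_radius_estimate(K_inf)
gaps = [abs(spectral_radius_estimate(perturbed(ell)) - limit_value)
        for ell in (10, 100, 1000)]
print(limit_value, gaps)
```

As $\|K_\ell-K_\infty\|=1/\ell$ shrinks, the printed gaps shrink as well, in line with the lemma.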

5 Proofs of the results of Section 2

5.1 Proof of the results of Section 2.1

Proof of Theorem 2.1.

The proof is based on Theorem 1.2 with , and , and consists in checking the assumptions there, and in particular the compactness of .

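(a) If $a=0$ then the process $\{Z_i\}_{i\ge 0}$ is i.i.d., for which all conclusions are trivial. Thus assume w.l.o.g. that $a\ne 0$, and that $a_p\ne 0$ (otherwise we can reduce the value of $p$). Note that $\{(Z_i,\dots,Z_{i+p-1})\}_{i\ge 0}$ is a Markov chain on $\mathbb{R}^p$. Note that

\[ [P_S^p(g)](x_1,\dots,x_p) = \int_{(0,\infty)^p}g(x_{p+1},\dots,x_{2p})\prod_{\ell=p+1}^{2p}\phi\Big(x_\ell-\sum_{j=1}^p a_jx_{\ell-j}\Big)\,dx_\ell. \]

If $x_\ell>L$ for some $1\le\ell\le p$, then

\[ x_{p+\ell}-\sum_{j=1}^p a_jx_{p+\ell-j} \ge x_{p+\ell}-a_px_\ell \ge x_{p+\ell}-a_pL, \]

and so given a sequence of functions $\{g_n\}_{n\ge 1}$ such that $\|g_n\|_\infty\le 1$ we have

\[ \sup_{x\in S:\,\|x\|_\infty>L}|[P_S^p(g_n)](x)| \le 1-F(-a_pL). \]

Therefore, given $\varepsilon>0$ there exists $L<\infty$ such that

\[ \sup_{x\in S:\,\|x\|_\infty>L}|[P_S^p(g_n)](x)| \le \varepsilon. \tag{5.1} \]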

On the other hand, for $\mathbf{x}_1,\mathbf{x}_2\in[0,L]^p$ we have

\[ |[P_S^p(g_n)](\mathbf{x}_1)-[P_S^p(g_n)](\mathbf{x}_2)| \le 2\|P^p(\mathbf{x}_1,\cdot)-P^p(\mathbf{x}_2,\cdot)\|_{TV} \]

where $P^p(\mathbf{x},\cdot)$ is the law of $(Z_p,\dots,Z_{2p-1})$ given $(Z_0,\dots,Z_{p-1})=\mathbf{x}$. The assumption that $\phi$ is a continuous density along with Scheffé’s lemma shows that the right-hand side above is uniformly continuous, and so the sequence $\{P_S^pg_n\}_{n\ge 1}$ is uniformly equicontinuous on $[0,L]^p$. Thus by the Arzelà-Ascoli theorem, there exists a subsequence along which $P_S^pg_n$ is Cauchy in sup norm on $[0,L]^p$. Taking limits along this subsequence and using (5.1) gives

\[ \limsup_{m,n\to\infty}\|P_S^pg_n-P_S^pg_m\|_\infty \le 2\varepsilon, \]

and so we have proved the existence of a convergent subsequence in sup norm on $S$, and thus the compactness of $P_S^p$. An application of Theorem 1.2 then yields part (a).

(b) The fact that follows from Proposition 2.7 along with the assumption that has full support, and . For the other inequality, for any non-negative function such that we have

\[ P_S(g)(x_1,\dots,x_p) = \int_{y+\sum_{j=1}^p a_jx_{p+1-j}>0}g\Big(x_2,\dots,x_p,\,y+\sum_{j=1}^p a_jx_{p+1-j}\Big)\phi(y)\,dy \le \mathbb{P}(\xi_1>0) \]

and so .

(c) By assumption we have , and so there exists such that for all . Along with (5.1), this gives

\[ |P^p_{a^{(k)},S}f(x)-P^p_{a,S}f(x)| \le 1-F(\delta L)+2\sup_{x\in[0,L]^p}\|P^p_{a^{(k)}}(x,\cdot)-P^p_a(x,\cdot)\|_{TV}, \]

which on taking a sup over such that and letting gives . Upon letting we have , which, using Lemma 1.6, gives the desired conclusion.

Proof of Proposition 2.2.

(a) In this case is a stationary Gaussian sequence with non-negative summable correlations, and the conclusions follow from [8, Lemma 3.1].

(b) To begin, note that (2.2) implies the existence of a sequence of positive reals diverging to such that , where is a sequence converging to . W.l.o.g., by replacing by if necessary, we can also assume that diverges to . Setting