Persistence exponents in Markov chains
We prove the existence of the persistence exponent
for a class of time-homogeneous Markov chains in a Polish space, where is a Borel measurable set and is the initial distribution. Focusing on the case of AR() and MA() processes with and continuous innovation distribution, we study the existence of and its continuity in the parameters, for . For AR processes with log-concave innovation distribution, we prove the strict monotonicity of . Finally, we compute new explicit exponents in several concrete examples.
Research partially supported by DFG grant AU370/4-1, by NSF grant DMS-1712037, and by grant 147/17 from the Israel Science Foundation.
MSC subject classifications: Primary 60J05, 60F10; secondary 45C05, 47A75.
Keywords: ARMA, eigenvalue problem, integral equation, large deviations, Markov chain, persistence, quasi-stationary distribution.
Let be a time-homogeneous Markov chain on a Polish space with transition kernel . For a given Borel measurable set , we are interested in the asymptotics of the persistence probability
where is the initial measure, i.e. the law of . We stress that we are particularly interested in non-compact . Our main concern is the existence of the persistence exponent , defined as
and its continuity and monotonicity properties in parameters of the kernel.
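Since the displayed formulas are not reproduced above, we record the standard shape of these two definitions in notation of our own choosing (the symbols $p_n$, $\lambda_S$, $\mu$, $X_k$ are assumptions, not taken from the text):

```latex
% Assumed notation: (X_n) the chain, S the Borel set,
% P_mu the law of the chain with initial distribution mu.
p_n(\mu) := \mathbb{P}_\mu\bigl( X_k \in S \ \text{for all } k = 0, 1, \dots, n \bigr),
\qquad
\lambda_S(\mu) := -\lim_{n\to\infty} \frac{1}{n} \log p_n(\mu).
```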
The asymptotics of persistence probabilities, for processes that are not necessarily Markovian, have received both classical and recent interest in probability theory and theoretical physics. For recent surveys on persistence probabilities, we refer the reader to  for the theoretical physics point of view and to  for a review of the mathematical literature.
Our approach exploits the Markovian structure and relates the persistence exponent to an eigenvalue of an appropriate operator, via the Krein-Rutman theorem. Such ideas go back at least to Tweedie [17, 18]. We work under somewhat different assumptions, and much of our effort lies in deriving the existence of the persistence exponent and its properties directly in terms of the kernel. The analysis in [17, 18] shows, under assumptions that are not always satisfied in the examples that we consider, the equivalence of the exponent’s existence and properties of the eigenvalue equation determined by . Even in very natural examples, we sometimes, and in particular in Section 5, need to work not with but rather with a modification of it.
One upshot of our approach is a study of monotonicity and continuity properties of the persistence exponent in parameters of the kernel . We illustrate this in the case of AR and MA processes, where the kernel (and thus the persistence exponent) depends on the coefficient vector. In this context, we derive a monotonicity lemma (Lemma 5.2) that might be of independent interest. We emphasize that since we are not working in the Gaussian setup, standard tools such as Slepian’s inequality cannot be applied; in particular, our results seem to provide the first monotonicity statements outside of the Gaussian case. As an application, we prove the strict monotonicity of the persistence exponent of AR() processes with log-concave innovation distributions. Finally, we demonstrate the strength of our approach by computing a number of new persistence exponents in concrete examples by solving the corresponding eigenvalue equation.
The outline of the paper is as follows: Section 1.1 contains our main abstract existence result. The short and technical Section 1.2 contains an abstract monotonicity lemma and a continuity lemma. The abstract framework is then applied to auto-regressive (AR) and moving-average (MA) processes in Section 2, where existence of the exponent, continuity of the exponent, (strict) monotonicity results, and the question of whether the exponent is degenerate are discussed. Finally, Section 3 contains a number of concrete cases where we are able to solve the eigenvalue equation, i.e. to find the leading eigenvalue explicitly. Sections 4–6 are devoted to the proofs corresponding to these three topics, respectively.
1.1 Existence of the exponent
We begin with the following definition.
Let denote the set of all bounded measurable functions on , and let denote the space of continuous bounded functions on equipped with the sup norm. For a bounded linear operator mapping to itself, define the operator norm
and the spectral radius
Note that the limit in (1.2) exists by sub-additivity, and that . Note also that , where is the linear operator on defined by
while, for comparison,
We recall that an operator from to itself is called compact if for any sequence in with one finds a subsequence such that converges in sup norm.
Assume the following conditions:
is a non-negative linear operator which maps into itself, and is compact for some .
for any non-empty open set .
Further, if , then is the largest eigenvalue of the operator , the corresponding eigenfunction is non-negative, and there exists a bounded, non-negative, finitely additive regular measure on which is a left eigenvector of corresponding to the eigenvalue , i.e.
1) Replacing by in Theorem 1.2 immediately gives a sufficient condition for the existence of a universal persistence exponent for all initial conditions satisfying condition (ii). As we will see in Section 2, this is not always the best choice.
2) The assumption of compactness of for some (rather than compactness of itself) is (a) sufficient for the proof to go through and (b) necessary for dealing with concrete examples: for instance, one can show that the operator corresponding to an MA() process is typically not compact itself, but only is compact.
3) The left eigenvector in Theorem 1.2 is only finitely additive. This is a consequence of the fact that can be (and typically is, in our applications) non-compact. This complicates some of the following arguments. For example, the proof of Proposition 3.6 would be immediate if were a measure.
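Concretely, the eigenvalue problem behind Theorem 1.2 can be explored numerically: discretize the kernel restricted to the survival set on a grid and run power iteration on the resulting matrix. The sketch below is our own illustration (a Gaussian AR(1)-type kernel with the half-line truncated to [0, B]; none of these choices come from the text):

```python
import numpy as np

def leading_eigenvalue(a=0.5, B=8.0, m=400, iters=200):
    """Nystrom approximation of the leading eigenvalue of the sub-Markov
    operator (Lf)(x) = E[f(a*x + xi); a*x + xi in S] with xi ~ N(0, 1),
    where S = [0, B] truncates the half-line for the discretization."""
    x = np.linspace(0.0, B, m)
    dx = x[1] - x[0]
    # K[i, j] ~ transition density from x_i to x_j, times the grid weight dx
    K = np.exp(-0.5 * (x[None, :] - a * x[:, None]) ** 2) / np.sqrt(2.0 * np.pi) * dx
    f, lam = np.ones(m), 0.0
    for _ in range(iters):
        f = K @ f
        lam = f.max()
        f /= lam          # power iteration with sup-norm normalization
    return lam

lam = leading_eigenvalue()
print(lam, -np.log(lam))  # eigenvalue, and the implied persistence exponent
```

The negative logarithm of the computed eigenvalue then plays the role of the persistence exponent, in the spirit of the theorem.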
1.2 Properties of exponents
We begin with a definition.
Suppose is equipped with a partial order . Let denote the class of bounded, non-negative, non-decreasing (in the sense of this partial order) measurable functions on .
A non-negative bounded linear operator on is stochastically non-decreasing with respect to the partial order if maps to itself.
The following lemma gives a sufficient condition for comparing and for two bounded non-negative linear operators .
Let and be two bounded non-negative linear operators on , such that the following conditions hold:
There exists a non-negative measurable function on such that for any .
is stochastically non-decreasing on .
Then for any we have where .
For studying the continuity of exponents, we prove the following lemma, which relates the continuity of exponents to continuity in operator norm.
For every let be a bounded linear operator on . If , then .
2 Results for AR and MA processes
In this section we study the persistence exponents of two classes of examples of interest: auto-regressive processes and moving-average processes.
2.1 Auto-regressive processes
First we deal with auto-regressive processes (AR processes of order ). Let be a sequence of i.i.d. random variables of law possessing a continuous density function with respect to the Lebesgue measure. Let be a vector, called the coefficient vector. Given independent of the sequence , we define an AR() process by setting
We will always assume that is distributed according to the law . Let be defined by
It is not hard to verify that under the above assumptions, maps to itself.
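As a quick numerical companion to the existence statements below, the decay rate of the persistence probability can be estimated by simulation. The sketch below is our own illustration, assuming standard Gaussian innovations, X_0 = 0, and the survival set [0, ∞); none of these choices are taken from the text:

```python
import numpy as np

def ar1_persistence(a, n, trials=400_000, rng=None):
    """Monte Carlo estimate of P(X_1 >= 0, ..., X_n >= 0) for the AR(1)
    recursion X_k = a*X_{k-1} + xi_k with X_0 = 0 and standard normal xi_k."""
    rng = rng or np.random.default_rng(0)
    x = np.zeros(trials)
    alive = np.ones(trials, dtype=bool)
    for _ in range(n):
        x = a * x + rng.standard_normal(trials)
        alive &= (x >= 0)          # a path survives only if it stayed nonnegative
    return alive.mean()

n = 12
for a in (0.0, 0.5):
    p = ar1_persistence(a, n)
    print(a, p, -np.log(p) / n)   # empirical decay rate -(1/n) log p_n
```

For a = 0 the process is i.i.d. and the empirical rate should be close to log 2, while positive coefficients make survival easier and lower the rate.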
We first turn to the existence of the persistence exponent. Here, we have to distinguish the cases , , and .
Fix , , continuous innovation density , initial distribution satisfying for all open , and let be the associated AR(p) process.
There is a , independent of , such that
Further, if , then is the largest eigenvalue of the operator from (2.1), viewed as an operator mapping to itself. The corresponding eigenvector is non-negative and continuous.
If then , and if then .
If is a sequence of vectors in converging to and , then we have .
As the next proposition shows, if the coefficient vector in Theorem 2.1 is allowed to have positive entries, then, depending on the initial distribution, the exponent need not even exist.
Suppose is an AR() process with innovation distribution , and .
If the initial distribution is , then and it is continuous in .
If the initial distribution satisfies
then there exist sequences and such that
where is as in part (a). In particular, the exponent does not exist for any , whereas .
It follows that the AR() operator with and is no longer compact on , as otherwise Theorem 1.2 would be applicable with .
In order to address this issue and derive an existence result for more general situations where the operator is not compact, one needs to make a judicious choice of the operator in Theorem 1.2, and this requires additional assumptions on the initial measure and innovation. We focus below on the contractive case .
Fix , parameters satisfying , continuous innovation density , initial distribution satisfying for all open , and let be the associated AR(p) process. Further assume that there exists such that
There is a , independent of the initial distribution, such that
Further, if , then is an eigenvalue of the operator on defined by (2.1), and the corresponding eigenvector is non-negative and continuous.
If then , and if then .
The function is continuous on the set .
As mentioned before, the proof of Theorem 2.3 employs a modified version of the operator , which now turns out to be compact if . The motivation behind the modification of the operator borrows from [2, 5, 15], who use a similar strategy to deal with AR() processes with Gaussian innovations starting at stationarity.
2.1.1 Strict monotonicity of the exponent
We begin with the following proposition, which shows monotonicity (not necessarily strict) of the AR persistence exponent. Recall that if then the exponent may not exist, as shown in Proposition 2.2.
For any , if then
In particular, if both exponents exist, then we have .
If the limit in (2.3) in Proposition 2.2 is , then the same proof shows that the corresponding exponent exists and equals for all , and consequently the function is not strictly monotone. Our next theorem shows that if has a log-concave density on and the initial distribution has finite exponential moment, then the map is strictly increasing on the set . The exponential decay of log-concave densities ensures that (2.7) holds, and so Theorem 2.3 guarantees the existence of a non-trivial exponent which is free of the initial distribution.
Assume is a strictly positive log-concave density over , that , and that satisfies (2.6). Then with implies .
We complete the picture on the positive orthant through the next theorem, which says that the persistence exponent for all such that , for any innovation distribution .
Assume that and . If and then .
Proposition 2.6, together with Theorems 2.3 and 2.5, gives a complete picture in terms of monotonicity on the positive orthant. The function is continuous and non-increasing on , and identically equal to on the set . If further the innovation density is log-concave and the initial distribution has finite exponential moment, then the exponent is strictly increasing on .
2.1.2 Positivity of the exponent
Part (b) of Theorem 2.1 and Theorem 2.3 give conditions ensuring that the exponent is non-trivial, i.e. that the persistence probability decays at an exponential rate. The next proposition generalizes this to show that no matter what the coefficient vector may be, the exponent can never be , i.e. the persistence probability can never decay at a super-exponential rate.
Fix , parameters , innovation distribution satisfying , and initial satisfying for every . Let be the associated AR(p) process. Then,
In particular, under the assumptions of Proposition 2.7, if exists then it must be positive.
One cannot completely dispense with the assumptions in Proposition 2.7. Indeed, concerning the condition on the initial distribution, when , , and , one sees that forces , and so . On the other hand, concerning the condition on the innovation distribution, if , , and for all , one obtains that , and so again .
2.2 Moving-average processes
We next discuss moving average processes. Let be a sequence of i.i.d. random variables from a continuous distribution function . For a coefficient vector define the MA() process by setting
Define the operator mapping to itself by
For all MA() processes with , there is a so that
Further, if then is the largest eigenvalue of the operator defined in (2.8), and the corresponding eigenfunction is non-negative and continuous.
The following theorem establishes the continuity of the MA persistence exponent.
In the setting of Theorem 2.9, the function is continuous on .
Theorem 2.9 shows that . Our next proposition gives a necessary and sufficient condition for .
Suppose we have an MA() process such that . Then if and only if .
3 Computing the exponent in concrete cases
Using our operator approach, we can compute the persistence exponent in a number of concrete examples.
3.1 Results for AR(1) processes
We begin with the computation of the persistence exponent for the AR(1) process with uniformly distributed innovations.
Let denote an AR(1) process with , arbitrary initial distribution , and with innovation density , where . Then
Our second example concerns exponential innovations.
Let denote an AR(1) process with , arbitrary initial distribution , and standard exponential innovations. Then
3.2 Results for MA processes
We next consider MA(1) processes, starting with uniform innovation density.
Let denote an MA(1) process with and innovation density , where .
where is the largest real solution to the equation
For in Proposition 3.3, one obtains . The next theorem shows that for continuous symmetric innovation distributions this value is universal.
Let denote an MA(1) process with and symmetric innovation density. Then
We show in Proposition 3.5 below that the universality in Theorem 3.4 does not extend to discrete distributions. In fact, for discrete innovation distributions , there can be non-trivial differences between the two quantities
Let denote an MA(1) process with and Rademacher innovations (i.e., equal to ±1 with probability 1/2 each). Then
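The difference between the two quantities can be seen by brute-force enumeration. In the sketch below (our own illustration; the coefficient is taken to be 1, which is an assumption since the parameter value is not reproduced above), the weak-inequality event allows any sign string with no two consecutive minus signs, whereas the strict-inequality event forces every sign to be positive, so the two probabilities decay at different exponential rates:

```python
from itertools import product

def rademacher_counts(n):
    """Count sign sequences (eps_0, ..., eps_n) in {-1, +1}^(n+1) for which
    X_i = eps_i + eps_{i-1} stays >= 0 (resp. > 0) for i = 1, ..., n."""
    ge = gt = 0
    for eps in product((-1, 1), repeat=n + 1):
        xs = [eps[i] + eps[i - 1] for i in range(1, n + 1)]
        ge += all(x >= 0 for x in xs)   # no two consecutive -1 signs
        gt += all(x > 0 for x in xs)    # forces all signs to be +1
    return ge, gt

for n in (4, 8, 12):
    ge, gt = rademacher_counts(n)
    total = 2 ** (n + 1)
    print(n, ge / total, gt / total)
```

The weak-inequality counts follow a Fibonacci recursion (sign strings with no two consecutive minus signs), while the strict-inequality count is always 1.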
Our final example considers MA(1) processes with exponential innovation distribution.
Let denote an MA(1) process with and standard exponential innovations. Then
4 Proof of the results of Section 1
Proof of Theorem 1.2.
The upper bound is simple: using that , we obtain from (1.2) that
We turn to the lower bound. We may and will assume that since otherwise there is nothing left to prove. Note that equipped with the sup norm is a Banach space (even if is not compact, see , p. 257). Thus denoting by the -fold composition of (note that we consider acting on the smaller space ), by assumption (i), is a compact operator. Further,
we note that . Also note that , and so is non-zero and non-negative. Finally a telescopic cancellation gives
and so . Thus, setting , we obtain
Integrating the last inequality with respect to gives
Since by assumption (ii) on , the lower bound in (1.4) follows at once.
Finally, the fact that is the largest eigenvalue of follows from the fact that is the largest eigenvalue of , a consequence of [7, Theorem 19.2]. Also, existence of the left eigenvector follows from [7, Exercise 12, p. 236], along with the observation that the dual of is the space of bounded, finitely additive regular measures on , see [9, Theorem IV.6.2.2]. ∎
Proof of Lemma 1.5.
Since , using assumption (ii) we have for all . Using condition (i) , which is the desired conclusion for . To verify the statement for general , we proceed by induction:
In the last display, we use the fact that along with condition (i) for the first inequality, and the induction hypothesis along with the fact that preserves the ordering in the second inequality. ∎
Proof of Lemma 1.6.
Since converges to , without loss of generality assume . Also, without loss of generality, by scaling all operators involved if necessary, we can assume that . Thus, for any with and arbitrary we have
which upon taking sup over and invoking (1.2) gives
Letting followed by gives
which is the upper bound. The lower bound follows by a symmetric argument, reversing the roles of and . ∎
5 Proofs of the results of Section 2
5.1 Proof of the results of Section 2.1
Proof of Theorem 2.1.
The proof is based on Theorem 1.2 with , and , and consists in checking the assumptions there, and in particular the compactness of .
(a) If then the process is i.i.d., in which case all conclusions are trivial. Thus assume w.l.o.g. that , and that (otherwise we can reduce the value of ). Observe that is a Markov chain on . Note that
If for some , then
and so given a sequence of functions such that we have
Therefore, given there exists such that
On the other hand, for we have
where is the law of given . The assumption that is a continuous density along with Scheffé’s lemma shows that the right-hand side above is uniformly continuous, and so the sequence is uniformly equicontinuous on . Thus by the Arzelà–Ascoli theorem, there exists a subsequence along which is Cauchy in sup norm on . Taking limits along this subsequence and using (5.1) gives
and so we have proved the existence of a convergent subsequence in sup norm on , and thus the compactness of . An application of Theorem 1.2 then yields part (a).
(b) The fact that follows from Proposition 2.7 along with the assumption that has full support, and . For the other inequality, for any non-negative function such that we have
and so .
(c) By assumption we have , and so there exists such that for all . Along with (5.1), this gives
which on taking a sup over such that and letting gives . Upon letting we have , which, using Lemma 1.6, gives the desired conclusion.
Proof of Proposition 2.2.
(a) In this case is a stationary Gaussian sequence with non-negative summable correlations, and the conclusions follow from [8, Lemma 3.1].
(b) To begin, note that (2.2) implies the existence of a sequence of positive reals diverging to such that , where is a sequence converging to . W.l.o.g., by replacing by if necessary, we can also assume that diverges to . Setting