Statistics for Tail Processes of Markov Chains

Holger Drees
University of Hamburg
Department of Mathematics
Bundesstraße 55, 20146 Hamburg, Germany
Johan Segers, Michał Warchoł
Université catholique de Louvain
Institut de Statistique, Biostatistique et Sciences Actuarielles
Voie du Roman Pays 20, B-1348 Louvain-la-Neuve, Belgium,
July 6, 2019

At high levels, the asymptotic distribution of a stationary, regularly varying Markov chain is conveniently given by its tail process. The latter takes the form of a geometric random walk, the increment distribution depending on the sign of the process at the current state and on the flow of time, either forward or backward. Estimation of the tail process provides a nonparametric approach to analyze extreme values. A duality between the distributions of the forward and backward increments provides additional information that can be exploited in the construction of more efficient estimators. The large-sample distribution of such estimators is derived via empirical process theory for cluster functionals. Their finite-sample performance is evaluated via Monte Carlo simulations involving copula-based Markov models and solutions to stochastic recurrence equations. The estimators are applied to stock price data to study the absence or presence of symmetries in the succession of large gains and losses.


Heavy–tailed Markov chains; Regular variation; Stationary time series; Tail process; Time reversibility.

1 Introduction

If serial dependence at high levels is sufficiently strong, extreme values of a stationary time series may arrive in clusters rather than in isolation. This is the case, for instance, for linear time series with heavy-tailed innovations and for solutions of stochastic recurrence equations. If a particular time series model is to be used for prediction at such high levels, it is important to model these clusters well. Think of tail-related risk measures in finance or of return levels in hydrology: a rapid succession of particularly rainy days may be especially dangerous if the capacity of the system to absorb the water is limited.

To judge the quality of fit of a time series model at extreme levels, it is useful to have a benchmark relying on as few model assumptions as possible. A purely nonparametric approach, however, has the drawback that there may be too few data that are sufficiently large. For the purpose of extrapolation, the empirical measure is inadequate.

A solution is to rely on asymptotic theory describing possible limit distributions for the extremes of a time series. If this family of distributions is not too large, one may hope to be able to fit it to actual data.

For extremes of stationary time series, there are several asymptotic frameworks available, all of them more or less equivalent. For the study of short-range extremal dependence, the tail process (Basrak and Segers, 2009) is a convenient choice. It captures the collection of finite-dimensional limit distributions of the series conditionally on the event that, at a particular time instant, the series is far from the origin. For instance, the tail process determines the tail dependence coefficients and the extremal index (Leadbetter, 1983). It is also related to other tail-related objects such as the extremogram (Davis and Mikosch, 2009) and the extremal dependence measure (Larsson and Resnick, 2012).

The family of tail processes of regularly varying time series is still too large to permit accurate nonparametric estimation. Additional assumptions serve to render the inference problem more manageable. The choice made in this paper is to focus on stationary univariate Markov chains. The joint distribution of such a chain is determined by its bivariate margins, yielding considerable simplifications. Its tail process takes the form of a geometric random walk, the increments depending both on the sign of the process at the current state and on the direction of time, forward or backward. The random walk representation goes back to Smith (1992) and was developed further in Perfekt (1997), Bortot and Coles (2000), and Yun (2000). The formulation in terms of the tail process stems from Segers (2007) and Janßen and Segers (2014). By a marginal standardization procedure, the tail process may also be used for time series with light-tailed margins. Such time series arise for instance in environmental applications.

The tail process of a stationary time series is itself not stationary because of the special role played by the time instant figuring in the conditioning event. Still, its finite-dimensional distributions satisfy a collection of identities regarding the effect of a time shift. These equations can be summarized into the so-called time-change formula; see equation (2.6) below. Apart from being a probabilistic nicety, the time-change formula is useful from a statistical perspective because it provides additional information on the distribution of the tail process. Exploiting this information can lead to more efficient inference.

Our contribution is to propose and study nonparametric estimators for the tail process of a stationary univariate Markov chain. Large-sample theory and Monte Carlo simulations both confirm that efficiency gains are possible when the time-change formula is incorporated into the estimation procedure. The asymptotic distributions of the estimators are described via functional central limit theorems building on the empirical process theory developed in Drees and Rootzén (2010). The finite-sample performance is investigated for solutions of stochastic recurrence equations and for copula-based Markov models (Chen and Fan, 2006). We focus on the estimation of cumulative distribution functions. Following Bortot and Coles (2000), however, one could also use kernel methods to estimate their densities.

The structure of the paper is as follows. Tail processes are reviewed in Section 2, with special attention to those of Markov chains. The estimators of the tail process of a regularly varying Markov chain are described in Section 3. Their asymptotic properties are worked out in Section 4, whereas their finite-sample performance is evaluated via Monte Carlo simulations in Section 6 involving models presented in Section 5. In Section 7, the estimators are applied to analyze time series of daily log returns of Google and UBS stock prices, revealing interesting patterns regarding the succession of large losses and gains. Proofs and calculations are deferred to Section 8.

Some notational conventions: the law of a random object $X$ is denoted by $\mathcal{L}(X)$. Weak convergence is denoted by the arrow $\rightsquigarrow$. The indicator variable of the event $E$ is denoted by $1(E)$. The set of integers is denoted by $\mathbb{Z}$, while .

2 Tail processes of Markov chains

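A strictly stationary time series $(X_t)_{t \in \mathbb{Z}}$ is said to have a tail process $(Y_t)_{t \in \mathbb{Z}}$ if, for all $s, t \in \mathbb{Z}$ such that $s \le 0 \le t$,

$$\mathcal{L}\bigl(u^{-1} X_s, \ldots, u^{-1} X_t \,\big|\, |X_0| > u\bigr) \rightsquigarrow \mathcal{L}(Y_s, \ldots, Y_t), \qquad u \to \infty, \tag{2.1}$$

with the implicit understanding that the law of $|Y_0|$ is non-degenerate.

Specializing equation (2.1) to $s = t = 0$ implies that $P(|X_0| > uy) / P(|X_0| > u) \to P(|Y_0| > y)$ as $u \to \infty$ for all continuity points $y$ of the law of $|Y_0|$. Since the law of $|Y_0|$ was supposed to be non-degenerate, it follows that the function $u \mapsto P(|X_0| > u)$ is regularly varying at infinity: there exists $\alpha > 0$ such that

$$\lim_{u \to \infty} \frac{P(|X_0| > uy)}{P(|X_0| > u)} = y^{-\alpha}, \qquad y > 0. \tag{2.2}$$

The law of $|Y_0|$ is thus Pareto($\alpha$), i.e., $P(|Y_0| > y) = y^{-\alpha}$ for all $y \ge 1$. More generally, by Basrak and Segers (2009, Theorem 2.1), the time series $(X_t)_{t \in \mathbb{Z}}$ admits a tail process $(Y_t)_{t \in \mathbb{Z}}$ with non-degenerate $|Y_0|$ if and only if $(X_t)_{t \in \mathbb{Z}}$ is jointly regularly varying with some index $\alpha > 0$, i.e., if for all integers $k \ge 1$ the random vector $(X_1, \ldots, X_k)$ is multivariate regularly varying with index $\alpha$.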

Many time series models are jointly regularly varying and hence admit a tail process. Examples include linear processes with heavy-tailed innovations, solutions to stochastic recurrence equations, and models of the ARCH and GARCH families. Sufficient conditions for such models to be regularly varying can be found in Davis et al. (2013).

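The spectral tail process $(\Theta_t)_{t \in \mathbb{Z}}$ is defined by

$$\Theta_t = Y_t / |Y_0|, \qquad t \in \mathbb{Z}.$$

By (2.1) and the continuous mapping theorem, it follows that for all $s, t \in \mathbb{Z}$ such that $s \le 0 \le t$,

$$\mathcal{L}\bigl(X_s / |X_0|, \ldots, X_t / |X_0| \,\big|\, |X_0| > u\bigr) \rightsquigarrow \mathcal{L}(\Theta_s, \ldots, \Theta_t), \qquad u \to \infty. \tag{2.3}$$

The difference between (2.1) and (2.3) is that in the latter equation, the variables $X_t$ are normalized by $|X_0|$ rather than by the threshold $u$. Such auto-normalization allows the tail process to be decomposed into two stochastically independent components, i.e.,

$$Y_t = |Y_0| \, \Theta_t, \qquad t \in \mathbb{Z}. \tag{2.4}$$

Independence of $|Y_0|$ and $(\Theta_t)_{t \in \mathbb{Z}}$ is stated in Basrak and Segers (2009, Theorem 3.1). The random variable $|Y_0|$ characterizes the magnitudes of extremes, whereas $(\Theta_t)_{t \in \mathbb{Z}}$ captures serial dependence. The spectral tail process at time $t = 0$ yields information on the relative weights of the upper and lower tails of $X_0$: since $\Theta_0 = Y_0 / |Y_0| \in \{-1, 1\}$, we have

$$p = P(\Theta_0 = 1) = \lim_{u \to \infty} \frac{P(X_0 > u)}{P(|X_0| > u)}, \qquad 1 - p = P(\Theta_0 = -1) = \lim_{u \to \infty} \frac{P(X_0 < -u)}{P(|X_0| > u)}. \tag{2.5}$$

The distributions of the forward tail process $(Y_t)_{t \ge 0}$ and the backward tail process $(Y_t)_{t \le 0}$ mutually determine each other. The precise connection between the forward and backward (spectral) tail processes is captured by Theorem 3.1 in Basrak and Segers (2009). For all $i, s, t \in \mathbb{Z}$ with $s \le 0 \le t$ and for all measurable functions $f : \mathbb{R}^{t-s+1} \to \mathbb{R}$ satisfying $f(y_s, \ldots, y_t) = 0$ whenever $y_0 = 0$, we have, provided the expectations exist,

$$E\bigl[f(\Theta_{s-i}, \ldots, \Theta_{t-i})\bigr] = E\left[ f\left( \frac{\Theta_s}{|\Theta_i|}, \ldots, \frac{\Theta_t}{|\Theta_i|} \right) |\Theta_i|^{\alpha} \, 1(\Theta_i \ne 0) \right]. \tag{2.6}$$

We will refer to (2.6) as the time-change formula. By exploiting the time-change formula, we will be able to improve upon the efficiency of estimators of the tail process.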

A common procedure in multivariate extreme value theory is to standardize the margins. For jointly regularly varying time series, such a standardization is possible too, although some care is needed because of the possible presence of both positive and negative extremes.

Lemma 2.1.

Let be a stationary time series, jointly regularly varying with index , and having spectral tail process . Put for . Define a stationary time series by


Then is jointly regularly varying with index . Its spectral tail process is given by


In (2.8), note that the map is monotone and symmetric. The standardized series may be regularly varying even if the original series is not. In that sense, the standardization procedure in (2.7) widens the field of possible applications of tail processes. For instance, the marginal distributions of environmental variables are often light-tailed rather than regularly varying. After standardization as in Lemma 2.1, the serial dependence between extremes of such time series may still be modelled via tail processes.

Some time series models exhibit asymptotic independence of consecutive observations, that is, as for all . Well-known examples are non-degenerate Gaussian time series and classical stochastic volatility models. In such cases, the spectral tail process is noninformative in the sense that almost surely for all . More refined approaches to handle tail independence were developed in Ledford and Tawn (1996, 2003) and, more recently, in Janßen and Drees (2013) and Kulik and Soulier (2013).

Regularly varying Markov chains

For the purpose of statistical inference, the class of spectral tail processes is too large to be really useful: without additional modelling assumptions, it is impossible to estimate all limiting finite-dimensional distributions that appear in (2.1) or (2.3). Therefore, it is reasonable to consider families of spectral tail processes arising under additional constraints on the underlying time series.

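One such family was identified in Segers (2007) and Janßen and Segers (2014) in the context of first-order Markov chains. Let $(\Theta_t)_{t \in \mathbb{Z}}$ be the spectral tail process of an $\alpha$-regularly varying, stationary time series $(X_t)_{t \in \mathbb{Z}}$, arising as the limit process in (2.3). Put $p = P(\Theta_0 = 1)$ as in (2.5). Introduce random variables $A_1, A_{-1}, B_1, B_{-1}$ (or rather their laws) as follows: if $p > 0$, then, as $u \to \infty$,

$$\mathcal{L}(X_1 / X_0 \mid X_0 > u) \rightsquigarrow \mathcal{L}(A_1), \tag{2.9}$$
$$\mathcal{L}(X_{-1} / X_0 \mid X_0 > u) \rightsquigarrow \mathcal{L}(A_{-1}), \tag{2.10}$$

and if $p < 1$, then

$$\mathcal{L}(X_1 / X_0 \mid X_0 < -u) \rightsquigarrow \mathcal{L}(B_1), \tag{2.11}$$
$$\mathcal{L}(X_{-1} / X_0 \mid X_0 < -u) \rightsquigarrow \mathcal{L}(B_{-1}). \tag{2.12}$$

Further, let $A_t$ and $B_t$, $t \in \mathbb{Z} \setminus \{0\}$, be independent random variables such that $A_t \overset{d}{=} A_1$, $A_{-t} \overset{d}{=} A_{-1}$, $B_t \overset{d}{=} B_1$, and $B_{-t} \overset{d}{=} B_{-1}$ for all $t \ge 1$. Then the spectral tail process $(\Theta_t)_{t \in \mathbb{Z}}$ is said to be a Markov spectral tail chain if the following holds: the forward spectral tail process $(\Theta_t)_{t \ge 0}$ is given recursively by

$$\Theta_t = \begin{cases} \Theta_{t-1} A_t & \text{if } \Theta_{t-1} > 0, \\ \Theta_{t-1} B_t & \text{if } \Theta_{t-1} < 0, \\ 0 & \text{if } \Theta_{t-1} = 0, \end{cases} \qquad t \ge 1, \tag{2.13}$$

whereas the backward spectral tail process $(\Theta_t)_{t \le 0}$ is given by

$$\Theta_{-t} = \begin{cases} \Theta_{-t+1} A_{-t} & \text{if } \Theta_{-t+1} > 0, \\ \Theta_{-t+1} B_{-t} & \text{if } \Theta_{-t+1} < 0, \\ 0 & \text{if } \Theta_{-t+1} = 0, \end{cases} \qquad t \ge 1. \tag{2.14}$$

If $p = 0$, then $\Theta_t \le 0$ almost surely for all $t \in \mathbb{Z}$ and thus the definition of $A_t$ is immaterial; similarly if $p = 1$. This can be seen by applying the time-change formula (2.6).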

The motivation behind the above definition is that such spectral tail processes typically arise when is a stationary, first-order Markov chain; see Theorem 5.2 in Segers (2007) and Corollary 5.1 in Janßen and Segers (2014). However, they may as well arise in settings where the underlying process, , is non-Markovian; see Remark 5.1 in Janßen and Segers (2014). The forward and backward spectral tail processes and are Markovian themselves, and, conditionally on , they are independent. Their structure is that of a geometric random walk where the distribution of the increment at time depends on the sign of the process at time . The point zero acts as an absorbing state.
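The random-walk structure described above can be made concrete with a short simulation. The following Python sketch is illustrative only: the function name `simulate_forward_chain`, the samplers `draw_a` and `draw_b`, and the log-normal increment law in the usage lines are assumptions introduced here, not objects taken from the paper. The sketch assumes the multiplicative recursion in which the next state equals the current state times an increment whose law is selected by the sign of the current state, with zero absorbing.

```python
import random


def simulate_forward_chain(p, draw_a, draw_b, horizon, rng):
    """Simulate Theta_0, ..., Theta_horizon of a forward spectral tail chain.

    p       : probability that Theta_0 equals +1 (otherwise Theta_0 = -1)
    draw_a  : callable rng -> increment used when the current state is positive
    draw_b  : callable rng -> increment used when the current state is negative
    horizon : number of forward steps
    rng     : instance of random.Random
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    theta = [1.0 if rng.random() < p else -1.0]
    for _ in range(horizon):
        prev = theta[-1]
        if prev > 0:
            theta.append(prev * draw_a(rng))
        elif prev < 0:
            theta.append(prev * draw_b(rng))
        else:
            theta.append(0.0)  # zero is an absorbing state
    return theta


# Usage with hypothetical log-normal increments (an illustrative choice only).
rng = random.Random(2019)
path = simulate_forward_chain(
    p=0.5,
    draw_a=lambda r: r.lognormvariate(-0.5, 1.0),
    draw_b=lambda r: r.lognormvariate(-0.5, 1.0),
    horizon=10,
    rng=rng,
)
```

Negative increments are allowed: an increment with negative sign makes the chain switch between the upper and the lower tail, and an increment equal to zero sends the chain to the absorbing state.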

For Markov spectral tail chains, the distribution of the forward part is determined by , , and . Given additionally the index of regular variation , the distributions of and and thus of the backward part can be reconstructed from the time-change formula (2.6); see Lemma 3.1 below. It follows that the law of a Markov spectral tail process is determined by , , and the laws of and . This reduction provides a handle on the spectral tail process that can be exploited for statistical inference.

3 Estimating Markov spectral tail processes

In this section we propose estimators for , and . In combination with the index of regular variation , this triplet fully determines the law of a Markov spectral tail process as defined in equations (2.13) and (2.14), and of the tail processes .

Replacing population distributions by sampling distributions in the left-hand sides of (2.9) and (2.11) yields forward estimators for the laws of and . However, exploiting the time-change formula (2.6) allows one to express the laws of and in terms of and (and and ). These expressions motivate so-called backward estimators for and . Convex combinations of forward and backward estimators finally produce mixture estimators. For an appropriate choice of the mixture weights, the mixture estimators may be more efficient than both the forward and the backward estimators separately.

In order to estimate $p$, we simply take the empirical version of (2.5), yielding

$$\hat{p}_n = \frac{\sum_{i=1}^{n} 1(X_i > u_n)}{\sum_{i=1}^{n} 1(|X_i| > u_n)}. \tag{3.1}$$

For $\hat{p}_n$ to be consistent and asymptotically normal, the threshold sequence $u_n$ should tend to infinity at a certain rate described in detail in condition (B) in the next section.

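For estimating the cdf, $F^{(A_1)}$, of $A_1$ we propose

$$\hat{F}_n^{(f, A_1)}(x) = \frac{\sum_{i=1}^{n} 1(X_{i+1}/X_i \le x, \, X_i > u_n)}{\sum_{i=1}^{n} 1(X_i > u_n)}, \tag{3.2}$$

which we refer to as the forward estimator of the cdf of $A_1$. Similarly, for the forward estimator of the cdf of $B_1$ we take

$$\hat{F}_n^{(f, B_1)}(x) = \frac{\sum_{i=1}^{n} 1(X_{i+1}/X_i \le x, \, X_i < -u_n)}{\sum_{i=1}^{n} 1(X_i < -u_n)}. \tag{3.3}$$

The forward estimators of the cdf's of $A_1$ and $B_1$ are empirical versions of the left-hand sides of (2.9) and (2.11), respectively. Note that one can expect consistency of these estimators only if the target distribution functions are continuous in $x$, because otherwise $P(X_1/X_0 \le x \mid X_0 > u)$ need not converge to $P(A_1 \le x)$, for instance.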
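As an illustration of the forward approach, the following Python sketch computes the empirical proportion of positive exceedances and the empirical cdf of the ratios of consecutive observations, restricted to time points at which the series exceeds the threshold. The function names `sign_probability` and `forward_cdf` and the toy data are hypothetical; the index range of the sums (all consecutive pairs in the sample) is an assumption.

```python
def sign_probability(x, u):
    """Fraction of exceedances of |X_i| over u for which X_i is positive."""
    exceed = sum(1 for v in x if abs(v) > u)
    if exceed == 0:
        raise ValueError("no observation exceeds the threshold in absolute value")
    return sum(1 for v in x if v > u) / exceed


def forward_cdf(x, u, point, positive=True):
    """Empirical cdf at `point` of X_{i+1}/X_i over extreme X_i.

    positive=True  : condition on X_i > u   (increment after a large positive value)
    positive=False : condition on X_i < -u  (increment after a large negative value)
    """
    if u <= 0:
        raise ValueError("the threshold u must be positive")
    hits = 0
    extremes = 0
    for cur, nxt in zip(x[:-1], x[1:]):
        is_extreme = cur > u if positive else cur < -u
        if is_extreme:
            extremes += 1
            if nxt / cur <= point:
                hits += 1
    if extremes == 0:
        raise ValueError("no extreme observation on the requested side")
    return hits / extremes


# Toy data (hypothetical), threshold u = 4.
sample = [5.0, 2.5, -6.0, 3.0, 8.0, 4.0]
p_hat = sign_probability(sample, 4.0)
f_a = forward_cdf(sample, 4.0, 0.5, positive=True)
f_b = forward_cdf(sample, 4.0, -0.5, positive=False)
```

Only pairs whose first member is extreme contribute, so the effective sample size is the number of exceedances on the relevant side, not the length of the series.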

The time-change formula (2.6) yields a different representation of and , motivating different estimators than the ones above, based on different data points. For ease of reference, we record the relevant formulas in a lemma, whose proof is given in Appendix 8.2.

Lemma 3.1.

Let be a stationary time series, jointly regularly varying with index and spectral tail process . Let be given as in (2.9) to (2.12). If , then


Similarly, if , then


Formulas (3.4) to (3.7) remain valid when the time instances and are interchanged.

Assume for the moment that is known. Below, we will consider the more realistic situation that is unknown. Lemma 3.1 suggests the following backward estimator of the cdf of :


Similarly, we define the backward estimator of the cdf of as


For large, the backward estimators usually have a smaller variance than the forward estimators. To see this, note that for negative with large modulus only very few summands in the numerator of (3.2) do not vanish, because must be even larger in absolute value than , leading to a large variance of the numerator. In contrast, usually many more non-vanishing terms will be summed up in the numerator of (3.8), while each of them gets a rather low weight , leading to a smaller variance. For large positive one may argue similarly by considering the corresponding estimators of the survival function. Indeed, we show in Remark 4.2 that, provided , the backward estimator of the cdf of at has a smaller asymptotic variance than the forward estimator.
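The weighting argument in the preceding paragraph can be illustrated by a sketch of a backward-type estimator for a known index. The form used below is an assumption obtained by applying the time-change formula (2.6) to consecutive pairs; it may differ in detail from the displays (3.8) and (3.9). For a nonnegative argument `point`, it returns one minus the average, over time points with `X_i > u`, of `(X_{i-1}/X_i)**alpha` on the event `X_i/X_{i-1} > point`. For a negative argument, it sums `(-X_{i-1}/X_i)**alpha` over time points with `X_i < -u` on the event `X_i/X_{i-1} <= point`, and divides by the number of time points with `X_i > u`. The function name `backward_cdf_a` is hypothetical.

```python
def backward_cdf_a(x, u, point, alpha):
    """Backward-type estimate of the cdf of the forward increment after a
    large positive value, for a known index alpha (assumed form, see lead-in)."""
    if u <= 0 or alpha <= 0:
        raise ValueError("u and alpha must be positive")
    n_pos = 0      # number of time points with X_i > u
    weighted = 0.0  # weighted count of pairs (X_{i-1}, X_i)
    for prev, cur in zip(x[:-1], x[1:]):
        if cur > u:
            n_pos += 1
        if prev == 0:
            continue  # such a pair would carry weight zero
        ratio = cur / prev
        if point >= 0:
            # cur > u and ratio > point >= 0 force prev > 0
            if cur > u and ratio > point:
                weighted += (prev / cur) ** alpha
        else:
            # cur < -u and ratio <= point < 0 force prev > 0
            if cur < -u and ratio <= point:
                weighted += (-prev / cur) ** alpha
    if n_pos == 0:
        raise ValueError("no observation exceeds the threshold")
    return 1.0 - weighted / n_pos if point >= 0 else weighted / n_pos


# Toy data (hypothetical): every exceedance is four times its predecessor.
toy = [2.0, 8.0, 4.0, 16.0]
b_known = backward_cdf_a(toy, 5.0, 1.0, alpha=1.0)
```

In this form every pair with an extreme second member and a ratio beyond `point` contributes, but with a weight that is small when the predecessor is small relative to the exceedance, which is the mechanism behind the variance reduction discussed above.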

For well-chosen weights, convex combinations of the forward and backward estimators can achieve a lower asymptotic variance than each of the estimators individually. Unfortunately, the expression for the asymptotic covariance of the two estimators is intractable; see Corollary 4.1. It remains an open issue how to choose the mixture weights in order to minimize the asymptotic variance.

A pragmatic approach is to give more weight to the forward estimator for small and to give more weight to the backward estimator for large . To this end, define weights by

The mixture estimator for the cdf of is defined as


The mixture estimator for is defined by replacing in (3.10) with .
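The convex combination can be sketched in a few lines of Python. The piecewise-linear weight below is purely hypothetical and is not the weight function used in the paper; it merely respects the stated principle that the forward estimator dominates for arguments of small modulus and the backward estimator for arguments of large modulus. The names `linear_weight`, `mixture_cdf`, `lower` and `upper` are introduced here for illustration.

```python
def linear_weight(x, lower, upper):
    """Hypothetical weight of the forward estimator at argument x:
    1 for |x| <= lower, 0 for |x| >= upper, linear in between."""
    if not 0 <= lower < upper:
        raise ValueError("need 0 <= lower < upper")
    modulus = abs(x)
    if modulus <= lower:
        return 1.0
    if modulus >= upper:
        return 0.0
    return (upper - modulus) / (upper - lower)


def mixture_cdf(forward_value, backward_value, weight):
    """Convex combination of a forward and a backward estimate."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * forward_value + (1.0 - weight) * backward_value


# Usage: combine two (hypothetical) estimates at the argument x = 2.
w = linear_weight(2.0, lower=1.0, upper=3.0)
mixed = mixture_cdf(0.8, 0.6, w)
```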

The backward and the mixture estimators require the value of the index of regular variation, which is unknown in most applications. There are at least two approaches to deal with this issue:

  1. Estimate separately, for instance, by the Hill–type estimator


    and plug in the estimated value of in (3.8), (3.9) and (3.10).

  2. Employ an empirical version of the transformation in Lemma 2.1 to ensure that, after transformation, . The transformation in (2.7) requires the tail function . This function can be estimated, for instance, by


    where we divide by rather than by in order to avoid division by zero later on. The transformed variable

    is based on the sign of and the rank of among .
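Both approaches can be sketched in Python. The forms used below are assumptions, since they are reconstructed from the verbal description only: the Hill-type estimator is taken to be the number of exceedances of `|X_i|` over `u` divided by the sum of `log(|X_i|/u)` over those exceedances, and the estimated tail function is taken to be one minus the number of `|X_j|` not exceeding the argument, divided by `n + 1`. The function names `hill_alpha` and `rank_transform` are hypothetical.

```python
import bisect
import math


def hill_alpha(x, u):
    """Hill-type estimate of the index based on exceedances of |X_i| over u."""
    if u <= 0:
        raise ValueError("the threshold u must be positive")
    logs = [math.log(abs(v) / u) for v in x if abs(v) > u]
    if not logs:
        raise ValueError("no observation exceeds the threshold in absolute value")
    return len(logs) / sum(logs)


def rank_transform(x):
    """Sign of X_i divided by an estimated tail probability of |X_i|.

    The tail probability is 1 - rank/(n + 1), where rank is the number of
    |X_j| that do not exceed |X_i|; dividing by n + 1 keeps it positive."""
    n = len(x)
    abs_sorted = sorted(abs(v) for v in x)
    transformed = []
    for v in x:
        rank = bisect.bisect_right(abs_sorted, abs(v))
        tail = 1.0 - rank / (n + 1)
        sign = (v > 0) - (v < 0)
        transformed.append(sign / tail)
    return transformed


# Toy data (hypothetical).
alpha_hat = hill_alpha([math.exp(1.0), -math.exp(2.0), 0.5], 1.0)
x_star = rank_transform([1.0, -3.0, 2.0])
```

With the first approach, `alpha_hat` would be plugged into the backward and mixture estimators. With the second approach, the estimators would be computed from the transformed values with the index set to its value after standardization, so that no separate estimate of the index is needed.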

In the simulation study in Section 6, the mixture estimator based on the rank-transformed data performs better than the plug-in version. Note, however, that the two approaches are not directly comparable: with the second approach, what we estimate is the tail process of the transformed series . From (2.8) and (2.9), it follows that, if is a Markov spectral tail chain as in (2.13) and (2.14), then so is , with and to be replaced by and , respectively. Combining the above two estimation approaches, one could even recover and via and . Finally, as may have a tail process even when the original time series has none, the second approach is more widely applicable.

4 Large sample theory

Under certain conditions, the standardized estimation errors of the forward and the backward estimators converge jointly to a centered Gaussian process. In order not to overload the presentation, we focus on nonnegative Markov chains. In that case, the distribution of determines the distribution of the forward spectral tail process, and thus, via the time-change formula, together with , also the one of the backward spectral tail process. We distinguish between the cases where is known (Section 4.1) and unknown (Section 4.2). In addition, we briefly indicate how the conditions and results must be modified in the real-valued case (Remark 4.7).

4.1 Known index of regular variation

If the index of regular variation, , is known, all estimators under consideration can be expressed in terms of generalized tail array sums, that is, statistics of the form , with


Drees and Rootzén (2010) give conditions under which, after standardization, such statistics converge to a centered Gaussian process, uniformly over appropriate families of functions . From these results we will deduce a functional central limit theorem for the processes of forward and backward estimators defined in (3.2) and (3.8), respectively.

To ensure consistency, the threshold must tend to infinity such that

tends to 0, but the expected number of exceedances tends to infinity. Moreover, let

denote the -mixing coefficients. Here is the -field generated by . We assume that there exist sequences and some such that the following conditions hold:


The cdf, , of is continuous on .

  1. As , we have , , , ;

  2. as and .

Condition (B) imposes restrictions on the rate at which tends to 0 and thus on the rate at which tends to . Sufficient conditions to ensure that a Markov chain is -mixing can be found in Doukhan (1995, Section 2.4). Usually, for some and one may choose , and (B) is fulfilled for a suitably chosen if and .


For all there exists

such that .

Typically will be of the form with and . The interchangeability of the limit and the sum is then automatically fulfilled. For stochastic recurrence equations (Section 5.2), conditions (B) and (C) are verified in Example 8.3 below.

Under these conditions, one can prove the asymptotic normality of relevant generalized tail array sums (see Proposition 8.4 below) and thus the joint asymptotic normality of the forward and the backward estimator of centered by their expectation. However, additional conditions are needed to ensure that their bias is asymptotically negligible:


Here denotes the survival function of (and hence of ). These conditions are fulfilled if tends to sufficiently slowly, because by definition of the spectral tail process and by (3.4), the left-hand sides in (4.2)–(4.3) tend to if is continuous on .

Theorem 4.1.

Let be a stationary, regularly varying process with a Markov spectral tail chain. If (A()), (B), (C), (4.2) and (4.3) are fulfilled for some and , then


where the limit is a centered Gaussian process whose covariance function is given by


Remark 4.2.

For , we have

provided . Hence, for such , when the tail index is known, the backward estimator is asymptotically more efficient than the forward estimator.

Remark 4.3.

While it is not too restrictive to assume that the cdf of is continuous on , often has positive mass at 0; see Example 8.2. In this case, one may prove a version of Theorem 4.1 where the first coordinate in (4.4) is replaced with

for a weight function and any nondecreasing, continuous function with .

Remark 4.4.

Note that a similar result also holds true without assuming the Markovianity of the spectral tail chain, but then the formulas for the covariance function of the limiting process are more involved. The simple explicit formulas for the asymptotic variances obtained in Theorem 4.1 can be used to construct pointwise confidence intervals for the cdfs of or by a plug-in approach. However, if one wants to derive uniform confidence bands or tests for these cdfs then a resampling procedure may be advisable. The same holds true if one cannot assume that the tail sequence has the Markov property. The analysis of such methods is left for future work.

4.2 Unknown index of regular variation

In most applications, the index of regular variation, , is unknown. In the definition of the backward estimator , it must then be replaced with a suitable estimator. A popular estimator of is the Hill-type estimator (3.11). More generally, one may consider estimators that can be written in the form


with a remainder term and a suitable function which is a.s. continuous w.r.t.  for all such that and for all . Obviously, the Hill-type estimator is of this form with . Under weak dependence conditions, other well-known estimators like the maximum likelihood estimator in a generalized Pareto model examined by Smith (1987) and the moment estimator suggested by Dekkers et al. (1989) can be written in this way too; see Drees (1998a, Example 4.1) and Drees (1998b, Example 4.1) for similar results in the case of i.i.d. sequences.

Estimators of type (4.5) can be approximated by the ratio of the generalized tail array sums corresponding to the functions and , respectively, and their asymptotic behavior can hence be derived from Theorem 2.3 of Drees and Rootzén (2010). To this end, we replace (C) with the following condition:


For all there exists


such that .

Moreover, there exists such that


If is bounded, then (C’) follows from condition (C), but in general it is more restrictive, though it can often be established by similar arguments. For example, condition (C’) holds for the solutions to the stochastic recurrence equation studied in Example 8.3 and the Hill estimator, i.e., for .

The following result gives the asymptotic normality of centered at


This quantity tends to as by the assumptions on the function and condition (C’).

Lemma 4.5.

If is of the form (4.5) and if the conditions (B) and (C’) hold, then

for a centered Gaussian process with covariance function given by (8.6).

As in (4.2) and (4.3), we need an extra condition to ensure that the bias of is asymptotically negligible:


Now we are ready to state the asymptotic normality of the backward estimator with estimated index , i.e.,

Theorem 4.6.

Suppose that the conditions of Corollary 4.1 and of Lemma 4.5 are fulfilled and that (4.9) holds. Then



for a centered Gaussian process with covariance function given by (8.6).

The covariance function of the limiting process can be calculated in the same way as in the proof of Theorem 4.1. In general, the resulting expressions will involve sums over all . Moreover, it is no longer guaranteed that the backward estimator of at has a smaller variance than the forward estimator.

Remark 4.7.

For Markovian time series which are not necessarily positive, the forward and backward estimators of and can be represented in terms of generalized tail array sums constructed from

When , for example, the backward estimator equals the ratio of the generalized tail array sums pertaining to