Statistics for Tail Processes of Markov Chains
Abstract
At high levels, the asymptotic distribution of a stationary, regularly varying Markov chain is conveniently given by its tail process. The latter takes the form of a geometric random walk, the increment distribution depending on the sign of the process at the current state and on the flow of time, either forward or backward. Estimation of the tail process provides a nonparametric approach to analyze extreme values. A duality between the distributions of the forward and backward increments provides additional information that can be exploited in the construction of more efficient estimators. The large-sample distribution of such estimators is derived via empirical process theory for cluster functionals. Their finite-sample performance is evaluated via Monte Carlo simulations involving copula-based Markov models and solutions to stochastic recurrence equations. The estimators are applied to stock price data to study the absence or presence of symmetries in the succession of large gains and losses.
Keywords:
Heavy-tailed Markov chains; Regular variation; Stationary time series; Tail process; Time reversibility.
1 Introduction
If serial dependence at high levels is sufficiently strong, extreme values of a stationary time series may arrive in clusters rather than in isolation. This is the case, for instance, for linear time series with heavy-tailed innovations and for solutions of stochastic recurrence equations. If a particular time series model is to be used for prediction at such high levels, it is important to model these clusters well. Think of tail-related risk measures in finance or of return levels in hydrology: a rapid succession of particularly rainy days may be especially dangerous if the capacity of the system to absorb the water is limited.
To judge the quality of fit of a time series model at extreme levels, it is useful to have a benchmark relying on as few model assumptions as possible. A purely nonparametric approach, however, has the drawback that there may be too few data that are sufficiently large. For the purpose of extrapolation, the empirical measure is inadequate.
A solution is to rely on asymptotic theory describing possible limit distributions for the extremes of a time series. If this family of distributions is not too large, one may hope to be able to fit it to actual data.
For extremes of stationary time series, there are several asymptotic frameworks available, all of them more or less equivalent. For the study of short-range extremal dependence, the tail process (Basrak and Segers, 2009) is a convenient choice. It captures the collection of finite-dimensional limit distributions of the series conditionally on the event that, at a particular time instant, the series is far from the origin. For instance, the tail process determines the tail dependence coefficients and the extremal index (Leadbetter, 1983). It is also related to other tail-related objects such as the extremogram (Davis and Mikosch, 2009) and the extremal dependence measure (Larsson and Resnick, 2012).
The family of tail processes of regularly varying time series is still too large to permit accurate nonparametric estimation. Additional assumptions serve to render the inference problem more manageable. The choice made in this paper is to focus on stationary univariate Markov chains. The joint distribution of such a chain is determined by its bivariate margins, yielding considerable simplifications. Its tail process takes the form of a geometric random walk, the increments depending both on the sign of the process at the current state and on the direction of time, forward or backward. The random walk representation goes back to Smith (1992) and was developed further in Perfekt (1997), Bortot and Coles (2000), and Yun (2000). The formulation in terms of the tail process stems from Segers (2007) and Janßen and Segers (2014). By a marginal standardization procedure, the tail process may also be used for time series with light-tailed margins. Such time series arise for instance in environmental applications.
The tail process of a stationary time series is itself not stationary because of the special role played by the time instant figuring in the conditioning event. Still, its finite-dimensional distributions satisfy a collection of identities regarding the effect of a time shift. These equations can be summarized into the so-called time-change formula; see equation (2.6) below. Apart from being a probabilistic nicety, the time-change formula is useful from a statistical perspective because it provides additional information on the distribution of the tail process. Exploiting this information can lead to more efficient inference.
Our contribution is to propose and study nonparametric estimators for the tail process of a stationary univariate Markov chain. Large-sample theory and Monte Carlo simulations both confirm that efficiency gains are possible when the time-change formula is incorporated into the estimation procedure. The asymptotic distributions of the estimators are described via functional central limit theorems building on the empirical process theory developed in Drees and Rootzén (2010). The finite-sample performance is investigated for solutions of stochastic recurrence equations and for copula-based Markov models (Chen and Fan, 2006). We focus on the estimation of cumulative distribution functions. Following Bortot and Coles (2000), however, one could also use kernel methods to estimate their densities.
The structure of the paper is as follows. Tail processes are reviewed in Section 2, with special attention to those of Markov chains. The estimators of the tail process of a regularly varying Markov chain are described in Section 3. Their asymptotic properties are worked out in Section 4, whereas their finite-sample performance is evaluated via Monte Carlo simulations in Section 6, involving models presented in Section 5. In Section 7, the estimators are applied to analyze time series of daily log returns of Google and UBS stock prices, revealing interesting patterns regarding the succession of large losses and gains. Proofs and calculations are deferred to Section 8.
Some notational conventions: the law of a random object $X$ is denoted by $\mathcal{L}(X)$. Weak convergence is denoted by the arrow $\rightsquigarrow$. The indicator variable of the event $A$ is denoted by $\mathbb{1}_A$. The set of integers is denoted by $\mathbb{Z}$, while $\mathbb{N} = \{1, 2, \ldots\}$.
2 Tail processes of Markov chains
A strictly stationary time series $(X_t)_{t \in \mathbb{Z}}$ is said to have a tail process $(Y_t)_{t \in \mathbb{Z}}$ if, for all $s, t \in \mathbb{Z}$ such that $s \le 0 \le t$,
(2.1) $\mathcal{L}(x^{-1}X_s, \ldots, x^{-1}X_t \mid |X_0| > x) \rightsquigarrow \mathcal{L}(Y_s, \ldots, Y_t), \qquad x \to \infty,$
with the implicit understanding that the law of $|Y_0|$ is nondegenerate.
Specializing equation (2.1) to $s = t = 0$ implies that $P(|X_0| > ux \mid |X_0| > x) \to P(|Y_0| > u)$ as $x \to \infty$ for all continuity points $u$ of the law of $|Y_0|$. Since the law of $|Y_0|$ was supposed to be nondegenerate, it follows that the function $x \mapsto P(|X_0| > x)$ is regularly varying at infinity: there exists $\alpha \in (0, \infty)$ such that
(2.2) $\lim_{x \to \infty} \frac{P(|X_0| > ux)}{P(|X_0| > x)} = u^{-\alpha}, \qquad u \in (0, \infty).$
The law of $|Y_0|$ is thus Pareto($\alpha$), i.e., $P(|Y_0| > u) = u^{-\alpha}$ for all $u \ge 1$. More generally, by Basrak and Segers (2009, Theorem 2.1), the time series $(X_t)_{t \in \mathbb{Z}}$ admits a tail process with $|Y_0|$ nondegenerate if and only if $(X_t)_{t \in \mathbb{Z}}$ is jointly regularly varying with some index $\alpha \in (0, \infty)$, i.e., if for all integers $s \le t$ the random vector $(X_s, \ldots, X_t)$ is multivariate regularly varying with index $\alpha$.
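As a quick numerical illustration of the regular-variation property (2.2), one can check the tail ratio on simulated data; the following sketch assumes an exact Pareto tail (the simulation setup is ours, not the paper's):

```python
import numpy as np

# Illustration of (2.2): for a Pareto(alpha) variable,
# P(|X| > u*x) / P(|X| > x) should be close to u**(-alpha).
rng = np.random.default_rng(0)
alpha = 2.0
sample = rng.pareto(alpha, size=1_000_000) + 1.0  # survival function x**(-alpha), x >= 1

x, u = 5.0, 2.0
ratio = np.mean(sample > u * x) / np.mean(sample > x)
print(ratio)  # close to u**(-alpha) = 0.25
```

For an exactly Pareto sample the ratio matches $u^{-\alpha}$ up to Monte Carlo error; for merely regularly varying tails the approximation holds only for large $x$.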
Many time series models are jointly regularly varying and hence admit a tail process. Examples include linear processes with heavytailed innovations, solutions to stochastic recurrence equations, and models of the ARCH and GARCH families. Sufficient conditions for such models to be regularly varying can be found in Davis et al. (2013).
The spectral tail process $(\Theta_t)_{t \in \mathbb{Z}}$ is defined by $\Theta_t = Y_t / |Y_0|$ for $t \in \mathbb{Z}$.
By (2.1) and the continuous mapping theorem, it follows that for all $s, t \in \mathbb{Z}$ such that $s \le 0 \le t$,
(2.3) $\mathcal{L}(X_s/|X_0|, \ldots, X_t/|X_0| \mid |X_0| > x) \rightsquigarrow \mathcal{L}(\Theta_s, \ldots, \Theta_t), \qquad x \to \infty.$
The difference between (2.1) and (2.3) is that in the latter equation, the variables are normalized by $|X_0|$ rather than by the threshold $x$. Such autonormalization allows the tail process to be decomposed into two stochastically independent components, i.e.,
(2.4) $(Y_t)_{t \in \mathbb{Z}} = (|Y_0| \, \Theta_t)_{t \in \mathbb{Z}}.$
Independence of $|Y_0|$ and $(\Theta_t)_{t \in \mathbb{Z}}$ is stated in Basrak and Segers (2009, Theorem 3.1). The random variable $|Y_0|$ characterizes the magnitudes of extremes, whereas $(\Theta_t)_{t \in \mathbb{Z}}$ captures serial dependence. The spectral tail process at time $0$ yields information on the relative weights of the upper and lower tails of $X_0$: since $|\Theta_0| = 1$, we have
(2.5) $p := P(\Theta_0 = +1) = \lim_{x \to \infty} P(X_0 > x \mid |X_0| > x) = 1 - P(\Theta_0 = -1).$
The distributions of the forward tail process $(Y_t)_{t \ge 0}$ and the backward tail process $(Y_t)_{t \le 0}$ mutually determine each other. The precise connection between the forward and backward (spectral) tail processes is captured by Theorem 3.1 in Basrak and Segers (2009). For all $i, s, t \in \mathbb{Z}$ with $s \le 0 \le t$ and for all measurable functions $f : \mathbb{R}^{t-s+1} \to \mathbb{R}$ satisfying $f(x_s, \ldots, x_t) = 0$ whenever $x_{-i} = 0$, we have, provided the expectations exist,
(2.6) $\mathbb{E}[f(\Theta_{s-i}, \ldots, \Theta_{t-i})] = \mathbb{E}\!\left[ f\!\left( \frac{\Theta_s}{|\Theta_i|}, \ldots, \frac{\Theta_t}{|\Theta_i|} \right) |\Theta_i|^\alpha \, \mathbb{1}\{\Theta_i \ne 0\} \right].$
We will refer to (2.6) as the time-change formula. By exploiting the time-change formula, we will be able to improve upon the efficiency of estimators of the tail process.
A common procedure in multivariate extreme value theory is to standardize the margins. For jointly regularly varying time series, such a standardization is possible too, although some care is needed because of the possible presence of both positive and negative extremes.
Lemma 2.1.
Let $(X_t)_{t \in \mathbb{Z}}$ be a stationary time series, jointly regularly varying with index $\alpha \in (0, \infty)$, and having spectral tail process $(\Theta_t)_{t \in \mathbb{Z}}$. Put $F_{|X|}(x) = P(|X_0| \le x)$ for $x \in \mathbb{R}$. Define a stationary time series $(X^*_t)_{t \in \mathbb{Z}}$ by
(2.7) $X^*_t = \frac{\operatorname{sign}(X_t)}{1 - F_{|X|}(|X_t|)}, \qquad t \in \mathbb{Z}.$
Then $(X^*_t)_{t \in \mathbb{Z}}$ is jointly regularly varying with index $\alpha^* = 1$. Its spectral tail process $(\Theta^*_t)_{t \in \mathbb{Z}}$ is given by
(2.8) $\Theta^*_t = \operatorname{sign}(\Theta_t) \, |\Theta_t|^\alpha, \qquad t \in \mathbb{Z}.$
In (2.8), note that the map $\theta \mapsto \operatorname{sign}(\theta)\,|\theta|^\alpha$ is monotone and symmetric. The standardized series $(X^*_t)_{t \in \mathbb{Z}}$ may be regularly varying even if the original series is not. In that sense, the standardization procedure in (2.7) widens the field of possible applications of tail processes. For instance, the marginal distributions of environmental variables are often light-tailed rather than regularly varying. After standardization as in Lemma 2.1, the serial dependence between extremes of such time series may still be modelled via tail processes.
Some time series models exhibit asymptotic independence of consecutive observations, that is, $P(|X_t| > x \mid |X_0| > x) \to 0$ as $x \to \infty$ for all $t \ne 0$. Well-known examples are nondegenerate Gaussian time series and classical stochastic volatility models. In such cases, the spectral tail process is noninformative in the sense that $\Theta_t = 0$ almost surely for all $t \ne 0$. More refined approaches to handle tail independence were developed in Ledford and Tawn (1996, 2003) and, more recently, in Janßen and Drees (2013) and Kulik and Soulier (2013).
Regularly varying Markov chains
For the purpose of statistical inference, the class of spectral tail processes is too large to be really useful: without additional modelling assumptions, it is impossible to estimate all limiting finitedimensional distributions that appear in (2.1) or (2.3). Therefore, it is reasonable to consider families of spectral tail processes arising under additional constraints on the underlying time series.
One such family was identified in Segers (2007) and Janßen and Segers (2014) in the context of first-order Markov chains. Let $(\Theta_t)_{t \in \mathbb{Z}}$ be the spectral tail process of a regularly varying, stationary time series $(X_t)_{t \in \mathbb{Z}}$, arising as the limit process in (2.3). Put $p = P(\Theta_0 = +1)$ as in (2.5). Introduce random variables $A(1)$, $B(1)$, $A(-1)$, $B(-1)$ (or rather their laws) as follows: if $p > 0$, then, as $x \to \infty$,
(2.9) $\mathcal{L}(X_1 / X_0 \mid X_0 > x) \rightsquigarrow \mathcal{L}(A(1)),$
(2.10) $\mathcal{L}(X_{-1} / X_0 \mid X_0 > x) \rightsquigarrow \mathcal{L}(B(1)),$
and if $p < 1$, then
(2.11) $\mathcal{L}(X_1 / X_0 \mid X_0 < -x) \rightsquigarrow \mathcal{L}(A(-1)),$
(2.12) $\mathcal{L}(X_{-1} / X_0 \mid X_0 < -x) \rightsquigarrow \mathcal{L}(B(-1)).$
Further, let $\Theta_0$, $(A_t(1), A_t(-1))_{t \ge 1}$, and $(B_t(1), B_t(-1))_{t \le -1}$ be independent random variables such that $P(\Theta_0 = +1) = 1 - P(\Theta_0 = -1) = p$, $A_t(\sigma) \sim A(\sigma)$, and $B_t(\sigma) \sim B(\sigma)$ for $\sigma \in \{-1, +1\}$ and all $t$. Then the spectral tail process is said to be a Markov spectral tail chain if the following holds: the forward spectral tail process $(\Theta_t)_{t \ge 0}$ is given recursively by
(2.13) $\Theta_t = A_t(\operatorname{sign}(\Theta_{t-1})) \, \Theta_{t-1}, \qquad t = 1, 2, \ldots,$
whereas the backward spectral tail process $(\Theta_t)_{t \le 0}$ is given by
(2.14) $\Theta_t = B_t(\operatorname{sign}(\Theta_{t+1})) \, \Theta_{t+1}, \qquad t = -1, -2, \ldots.$
If $p = 1$, then $\Theta_t \ge 0$ almost surely for all $t$ and thus the definitions of $A(-1)$ and $B(-1)$ are immaterial; similarly for $A(1)$ and $B(1)$ if $p = 0$. This can be seen by applying the time-change formula (2.6).
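The forward recursion just described, a multiplicative random walk with sign-dependent increments and zero as an absorbing state, can be sketched as follows; the increment laws below are illustrative assumptions, and the function names are ours:

```python
import numpy as np

# Sketch: simulate the forward part of a Markov spectral tail chain.
# The increment laws A(1), A(-1) below are illustrative choices, not implied
# by the paper; any distributions may be plugged in. Zero is absorbing.
rng = np.random.default_rng(1)

def simulate_forward(T, p, draw_A_pos, draw_A_neg):
    """Theta_0 = +1 w.p. p, else -1; then Theta_t = A_t(sign(Theta_{t-1})) * Theta_{t-1}."""
    theta = np.empty(T + 1)
    theta[0] = 1.0 if rng.random() < p else -1.0
    for t in range(1, T + 1):
        prev = theta[t - 1]
        if prev == 0.0:                     # zero is absorbing
            theta[t] = 0.0
        elif prev > 0:
            theta[t] = draw_A_pos() * prev  # increment law for positive states
        else:
            theta[t] = draw_A_neg() * prev  # increment law for negative states
    return theta

# Illustrative increment laws: multiplicative log-normal steps with negative
# drift, so the chain decays geometrically, as a spectral tail process should.
draw_pos = lambda: rng.lognormal(mean=-0.5, sigma=0.5)
draw_neg = lambda: rng.lognormal(mean=-1.0, sigma=0.5)
path = simulate_forward(T=20, p=0.7, draw_A_pos=draw_pos, draw_A_neg=draw_neg)
print(abs(path[0]))  # modulus 1 at time 0, by construction
```

The backward part can be simulated in the same way with the $B$ increments, independently of the forward part given $\Theta_0$.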
The motivation behind the above definition is that such spectral tail processes typically arise when $(X_t)_{t \in \mathbb{Z}}$ is a stationary, first-order Markov chain; see Theorem 5.2 in Segers (2007) and Corollary 5.1 in Janßen and Segers (2014). However, they may as well arise in settings where the underlying process, $(X_t)_{t \in \mathbb{Z}}$, is non-Markovian; see Remark 5.1 in Janßen and Segers (2014). The forward and backward spectral tail processes $(\Theta_t)_{t \ge 0}$ and $(\Theta_t)_{t \le 0}$ are Markovian themselves, and, conditionally on $\Theta_0$, they are independent. Their structure is that of a geometric random walk where the distribution of the increment depends on the sign of the process at the previous time instant. The point zero acts as an absorbing state.
For Markov spectral tail chains, the distribution of the forward part is determined by $p$ and the laws of $A(1)$ and $A(-1)$. Given additionally the index of regular variation $\alpha$, the distributions of $B(1)$ and $B(-1)$ and thus of the backward part can be reconstructed from the time-change formula (2.6); see Lemma 3.1 below. It follows that the law of a Markov spectral tail process is determined by $\alpha$, $p$, and the laws of $A(1)$ and $A(-1)$. This reduction provides a handle on the spectral tail process that can be exploited for statistical inference.
3 Estimating Markov spectral tail processes
In this section we propose estimators for $p$ and the laws of $A(1)$ and $A(-1)$. In combination with the index of regular variation $\alpha$, this triplet fully determines the law of a Markov spectral tail process as defined in equations (2.13) and (2.14), and hence also that of the tail process $(Y_t)_{t \in \mathbb{Z}}$.
Replacing population distributions by sampling distributions in the left-hand sides of (2.9) and (2.11) yields forward estimators for the laws of $A(1)$ and $A(-1)$. However, exploiting the time-change formula (2.6) makes it possible to express the laws of $A(1)$ and $A(-1)$ in terms of $B(1)$ and $B(-1)$ (and $\alpha$ and $p$). These expressions motivate so-called backward estimators for the laws of $A(1)$ and $A(-1)$. Convex combinations of forward and backward estimators finally produce mixture estimators. For an appropriate choice of the mixture weights, the mixture estimators may be more efficient than both the forward and the backward estimators separately.
In order to estimate $p$, we simply take the empirical version of (2.5), yielding
(3.1) $\hat{p}_n = \frac{\sum_{i=1}^n \mathbb{1}\{X_i > u_n\}}{\sum_{i=1}^n \mathbb{1}\{|X_i| > u_n\}}.$
For $\hat{p}_n$ to be consistent and asymptotically normal, the threshold sequence $u_n$ should tend to infinity at a certain rate described in detail in condition (B) in the next section.
For estimating the cdf, $F_{A(1)}$, of $A(1)$ we propose
(3.2) $\hat{F}^{\,f}_{A(1), n}(x) = \frac{\sum_{i=1}^{n-1} \mathbb{1}\{X_{i+1}/X_i \le x, \; X_i > u_n\}}{\sum_{i=1}^{n-1} \mathbb{1}\{X_i > u_n\}},$
which we refer to as the forward estimator of the cdf of $A(1)$. Similarly, for the forward estimator of the cdf of $A(-1)$ we take
(3.3) $\hat{F}^{\,f}_{A(-1), n}(x) = \frac{\sum_{i=1}^{n-1} \mathbb{1}\{X_{i+1}/X_i \le x, \; X_i < -u_n\}}{\sum_{i=1}^{n-1} \mathbb{1}\{X_i < -u_n\}}.$
The forward estimators of the cdfs of $A(1)$ and $A(-1)$ are empirical versions of the left-hand sides of (2.9) and (2.11), respectively. Note that one can expect consistency of these estimators only if the target distribution functions are continuous at $x$, because otherwise $\hat{F}^{\,f}_{A(1), n}(x)$ need not converge to $F_{A(1)}(x)$, for instance.
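The forward estimators can be sketched as follows; this is a minimal implementation under our own naming, with toy data that are ours and purely illustrative:

```python
import numpy as np

# Sketch of the estimator of p and the forward cdf estimators, assuming
# observations x[0..n-1] and a threshold u. Names are ours.
def p_hat(x, u):
    """Share of positive exceedances among |X_i| > u, cf. (3.1)."""
    return np.sum(x > u) / np.sum(np.abs(x) > u)

def forward_cdf_A(x, u, t, sign=+1):
    """Empirical cdf of A(1) (sign=+1) or A(-1) (sign=-1) at the point t."""
    x0, x1 = x[:-1], x[1:]                      # pairs (X_i, X_{i+1})
    cond = x0 > u if sign > 0 else x0 < -u      # conditioning event
    return np.sum((x1 / x0 <= t) & cond) / np.sum(cond)

# Toy data: a nonnegative heavy-tailed sample (tail index 1).
rng = np.random.default_rng(2)
x = np.abs(rng.standard_cauchy(100_000))
u = np.quantile(x, 0.99)
print(p_hat(x, u))  # equals 1.0 for a nonnegative series
```

For a nonnegative series every exceedance of $u_n$ in modulus is a positive exceedance, so the estimate of $p$ is exactly one, as it should be.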
The time-change formula (2.6) yields a different representation of the laws of $A(1)$ and $A(-1)$, motivating different estimators than the ones above, based on different data points. For ease of reference, we record the relevant formulas in a lemma, whose proof is given in Appendix 8.2.
Lemma 3.1.
Assume for the moment that $\alpha$ is known. Below, we will consider the more realistic situation that $\alpha$ is unknown. Lemma 3.1 suggests the following backward estimator of the cdf of $A(1)$:
(3.8) $\hat{F}^{\,b}_{A(1), n}(x) = 1 - \frac{\sum_{i=2}^{n} |X_{i-1}/X_i|^\alpha \, \mathbb{1}\{X_i / X_{i-1} > x, \; X_{i-1} > 0, \; |X_i| > u_n\}}{\sum_{i=2}^{n} \mathbb{1}\{X_i > u_n\}}.$
Similarly, we define the backward estimator of the cdf of $A(-1)$ as
(3.9) $\hat{F}^{\,b}_{A(-1), n}(x) = \frac{\sum_{i=2}^{n} |X_{i-1}/X_i|^\alpha \, \mathbb{1}\{X_i / X_{i-1} \le x, \; X_{i-1} < 0, \; |X_i| > u_n\}}{\sum_{i=2}^{n} \mathbb{1}\{X_i < -u_n\}}.$
For $x$ large, the backward estimators usually have a smaller variance than the forward estimators. To see this, note that for negative $x$ with large modulus only very few summands in the numerator of (3.2) do not vanish, because $X_{i+1}$ must be even larger in absolute value than $|x| X_i$, leading to a large variance of the numerator. In contrast, usually many more nonvanishing terms will be summed up in the numerator of (3.8), while each of them gets a rather low weight $|X_{i-1}/X_i|^\alpha$, leading to a smaller variance. For large positive $x$ one may argue similarly by considering the corresponding estimators of the survival function. Indeed, we show in Remark 4.2 that, under a condition stated there, the backward estimator of the cdf of $A(1)$ at $x$ has a smaller asymptotic variance than the forward estimator.
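A sketch of the backward estimator for known $\alpha$, assuming the survival-function representation obtained from the time-change formula, with terms weighted by $|X_{i-1}/X_i|^\alpha$ as discussed above; function names and inequality conventions are ours:

```python
import numpy as np

# Backward estimator of the cdf of A(1) for known alpha, cf. (3.8):
# pairs (X_{i-1}, X_i) with |X_i| above the threshold contribute weights
# |X_{i-1}/X_i|**alpha to an estimate of the survival function of A(1).
def backward_cdf_A1(x, u, t, alpha):
    x0, x1 = x[:-1], x[1:]                          # pairs (X_{i-1}, X_i)
    mask = (x1 > t * x0) & (x0 > 0) & (np.abs(x1) > u)
    num = np.sum((x0[mask] / np.abs(x1[mask])) ** alpha)
    den = np.sum(x1 > u)
    return 1.0 - num / den

# Toy check on heavy-tailed data with tail index alpha = 1.
rng = np.random.default_rng(5)
x = np.abs(rng.standard_cauchy(200_000))
u = np.quantile(x, 0.99)
val = backward_cdf_A1(x, u, t=2.0, alpha=1.0)
print(val)  # lies in (0.5, 1] here, since every weight is below 2**(-alpha)
```

Note that for $t = 2$ and $\alpha = 1$ every contributing weight is below $1/2$, so the estimate is guaranteed to lie above $1/2$ regardless of the data, which illustrates the variance-damping effect of the weights.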
For well-chosen weights, convex combinations of the forward and backward estimators can achieve a lower asymptotic variance than each of the estimators individually. Unfortunately, the expression for the asymptotic covariance of the two estimators is intractable; see Corollary 4.1. It remains an open issue how to choose the mixture weights in order to minimize the asymptotic variance.
A pragmatic approach is to give more weight to the forward estimator for small values of $|x|$ and to give more weight to the backward estimator for large values of $|x|$. To this end, define weights by
The mixture estimator for the cdf of $A(1)$ is defined as
(3.10) $\hat{F}^{\,m}_{A(1), n}(x) = w_n(x) \, \hat{F}^{\,f}_{A(1), n}(x) + (1 - w_n(x)) \, \hat{F}^{\,b}_{A(1), n}(x).$
The mixture estimator for the cdf of $A(-1)$ is defined by replacing $A(1)$ in (3.10) with $A(-1)$.
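In code, the mixture estimator is a pointwise convex combination; the paper's specific weight function is not reproduced here, so the weight below is a hypothetical illustrative choice:

```python
# Mixture estimator, cf. (3.10): a convex combination of the forward and
# backward estimates at the point t. The weight function below is a
# hypothetical choice (not the paper's): it moves weight from the forward
# to the backward estimator as |t| grows.
def mixture_cdf(forward_value, backward_value, t, scale=1.0):
    w = 1.0 / (1.0 + (abs(t) / scale) ** 2)   # weight in (0, 1]
    return w * forward_value + (1.0 - w) * backward_value

print(mixture_cdf(0.4, 0.6, t=1.0))  # equal weights at |t| = scale: 0.5
```

Any weight function with values in $[0, 1]$ yields a valid convex combination; the efficiency of the result depends on how well the weights track the relative variances of the two estimators.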
The backward and the mixture estimators require the value of the index of regular variation $\alpha$, which is unknown in most applications. There are at least two approaches to deal with this issue:

1. Plug an estimator $\hat{\alpha}_n$ of $\alpha$ into the backward estimator, for instance the Hill-type estimator
(3.11) $\hat{\alpha}_n = \frac{\sum_{i=1}^n \mathbb{1}\{|X_i| > u_n\}}{\sum_{i=1}^n \log(|X_i|/u_n) \, \mathbb{1}\{|X_i| > u_n\}}.$
2. Employ an empirical version of the transformation in Lemma 2.1 to ensure that, after transformation, $\alpha^* = 1$. The transformation in (2.7) requires the tail function $1 - F_{|X|}$. This function can be estimated, for instance, by
(3.12) $\hat{F}_{|X|, n}(x) = \frac{1}{n+1} \sum_{i=1}^n \mathbb{1}\{|X_i| \le x\},$ where we divide by $n + 1$ rather than by $n$ in order to avoid division by zero later on. The transformed variable
$\hat{X}^*_t = \operatorname{sign}(X_t) / \big(1 - \hat{F}_{|X|, n}(|X_t|)\big)$
is based on the sign of $X_t$ and the rank of $|X_t|$ among $|X_1|, \ldots, |X_n|$.
In the simulation study in Section 6, the mixture estimator based on the rank-transformed data performs better than the plug-in version. Note, however, that the two approaches are not directly comparable: with the second approach, what we estimate is the tail process of the transformed series $(X^*_t)_{t \in \mathbb{Z}}$. From (2.8) and (2.9), it follows that, if $(\Theta_t)_{t \in \mathbb{Z}}$ is a Markov spectral tail chain as in (2.13) and (2.14), then so is $(\Theta^*_t)_{t \in \mathbb{Z}}$, with $A(\sigma)$ and $B(\sigma)$ to be replaced by $\operatorname{sign}(A(\sigma)) |A(\sigma)|^\alpha$ and $\operatorname{sign}(B(\sigma)) |B(\sigma)|^\alpha$, respectively. Combining the above two estimation approaches, one could even recover the laws of $A(\sigma)$ and $B(\sigma)$ from those of their transformed counterparts. Finally, as $(X^*_t)_{t \in \mathbb{Z}}$ may have a tail process even when the original time series has none, the second approach is more widely applicable.
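The rank-based standardization of the second approach can be sketched as follows; this is an empirical version of (2.7) with the tail-function estimate dividing by $n + 1$, with names that are ours and ties ignored:

```python
import numpy as np

# Empirical standardization: X*_t = sign(X_t) / (1 - F_hat(|X_t|)), where
# F_hat is the empirical cdf of |X_1|, ..., |X_n| with denominator n + 1,
# so that the estimated tail function never vanishes.
def rank_transform(x):
    n = len(x)
    ranks = np.abs(x).argsort().argsort()      # 0 = smallest, n-1 = largest
    tail = (n - ranks) / (n + 1.0)             # estimate of P(|X| > |x_t|), in (0, 1)
    return np.sign(x) / tail

rng = np.random.default_rng(6)
x = rng.normal(size=1000)                      # light-tailed input is fine
xs = rank_transform(x)
print(np.abs(xs).max())                        # largest |X*_t| is n + 1 = 1001
```

After the transformation, the moduli $|\hat{X}^*_t|$ take values of roughly standard-Pareto size between about $1$ and $n + 1$, whatever the marginal distribution of the input.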
4 Large sample theory
Under certain conditions, the standardized estimation errors of the forward and the backward estimators converge jointly to a centered Gaussian process. In order not to overload the presentation, we focus on nonnegative Markov chains. In that case, the distribution of $A := A(1)$ determines the distribution of the forward spectral tail process, and thus, via the time-change formula, together with $\alpha$, also the one of the backward spectral tail process. We distinguish between the cases where $\alpha$ is known (Section 4.1) and unknown (Section 4.2). In addition, we briefly indicate how the conditions and results must be modified in the real-valued case (Remark 4.7).
4.1 Known index of regular variation
If the index of regular variation, $\alpha$, is known, all estimators under consideration can be expressed in terms of generalized tail array sums, that is, statistics of the form $\sum_{i=1}^{n-1} \phi(X_{n,i})$, with
(4.1) $X_{n,i} := (X_i / u_n, \, X_{i+1} / u_n), \qquad 1 \le i \le n - 1.$
Drees and Rootzén (2010) give conditions under which, after standardization, such statistics converge to a centered Gaussian process, uniformly over appropriate families of functions $\phi$. From these results we will deduce a functional central limit theorem for the processes of forward and backward estimators defined in (3.2) and (3.8), respectively.
To ensure consistency, the threshold $u_n$ must tend to infinity such that $v_n := P(X_0 > u_n)$
tends to 0, but the expected number $n v_n$ of exceedances tends to infinity. Moreover, let
$\beta_{n,k} = \sup_{0 \le i \le n-k} \mathbb{E}\Big[ \sup_{B \in \mathcal{B}_{i+k+1}^{n}} \big| P(B \mid \mathcal{B}_1^i) - P(B) \big| \Big], \qquad 1 \le k \le n,$
denote the $\beta$-mixing coefficients. Here $\mathcal{B}_i^j$ denotes the $\sigma$-field generated by $(X_t)_{i \le t \le j}$. We assume that there exist sequences $(l_n)_n$ and $(r_n)_n$ and some $\delta > 0$ such that the following conditions hold:
 (A())

The cdf, $F_A$, of $A$ is continuous on $(0, \infty)$.
 (B)


As , we have , , , ;

as and .

Condition (B) poses restrictions on the rate at which $v_n$ tends to 0 and thus on the rate at which $u_n$ tends to infinity. Sufficient conditions to ensure that a Markov chain is $\beta$-mixing can be found in Doukhan (1995, Section 2.4). Usually, the mixing coefficients decay geometrically, and then (B) is fulfilled for suitably chosen sequences $(l_n)_n$ and $(r_n)_n$ under mild rate conditions on $v_n$.
 (C)

For all there exists
such that .
Typically will be of the form with and . The interchangeability of the limit and the sum is then automatically fulfilled. For stochastic recurrence equations (Section 5.2), conditions (B) and (C) are verified in Example 8.3 below.
Under these conditions, one can prove the asymptotic normality of relevant generalized tail array sums (see Proposition 8.4 below) and thus the joint asymptotic normality of the forward and the backward estimators, centered by their expectations. However, additional conditions are needed to ensure that their bias is asymptotically negligible:
(4.2)  
(4.3) 
Here $\bar{F}$ denotes the survival function of $X_0$ (and hence of $|X_0|$). These conditions are fulfilled if $u_n$ tends to infinity sufficiently slowly, because by definition of the spectral tail process and by (3.4), the left-hand sides in (4.2)–(4.3) tend to 0 if $F_A$ is continuous on $(0, \infty)$.
Theorem 4.1.
Remark 4.2.
For , we have
provided the relevant expectations are finite. Hence, for such $x$, when the tail index $\alpha$ is known, the backward estimator is asymptotically more efficient than the forward estimator.
Remark 4.3.
While it is not too restrictive to assume that the cdf of $A$ is continuous on $(0, \infty)$, often the law of $A$ has positive mass at 0; see Example 8.2. In this case, one may prove a version of Theorem 4.1 where the first coordinate in (4.4) is replaced with
for a weight function and any nondecreasing, continuous function with .
Remark 4.4.
Note that a similar result also holds true without assuming the Markovianity of the spectral tail chain, but then the formulas for the covariance function of the limiting process are more involved. The simple explicit formulas for the asymptotic variances obtained in Theorem 4.1 can be used to construct pointwise confidence intervals for the cdfs of $A(1)$ or $A(-1)$ by a plug-in approach. However, if one wants to derive uniform confidence bands or tests for these cdfs, then a resampling procedure may be advisable. The same holds true if one cannot assume that the tail sequence has the Markov property. The analysis of such methods is left for future work.
4.2 Unknown index of regular variation
In most applications, the index of regular variation, $\alpha$, is unknown. In the definition of the backward estimator, it must then be replaced with a suitable estimator. A popular estimator of $\alpha$ is the Hill-type estimator (3.11). More generally, one may consider estimators that can be written in the form
(4.5) 
with a remainder term and a suitable function which is a.s. continuous w.r.t. for all such that and for all . Obviously, the Hill-type estimator is of this form. Under weak dependence conditions, other well-known estimators like the maximum likelihood estimator in a generalized Pareto model examined by Smith (1987) and the moment estimator suggested by Dekkers et al. (1989) can be written in this way too; see Drees (1998a, Example 4.1) and Drees (1998b, Example 4.1) for similar results in the case of i.i.d. sequences.
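As an illustration, a Hill-type estimator can be computed as a ratio of two generalized tail array sums, for the constant function and the logarithm; this is a standard Hill-type form and a sketch under our own naming, not a reproduction of display (4.5):

```python
import numpy as np

# Hill-type estimator of alpha over the threshold u: the ratio of the tail
# array sums for phi = 1 and phi = log, i.e.
#   alpha_hat = sum 1{|X_i| > u} / sum log(|X_i|/u) 1{|X_i| > u}.
def hill_alpha(x, u):
    exc = np.abs(x)[np.abs(x) > u]
    return exc.size / np.sum(np.log(exc / u))

rng = np.random.default_rng(7)
alpha = 3.0
x = rng.pareto(alpha, size=500_000) + 1.0      # exact Pareto(alpha) tail
print(hill_alpha(x, u=2.0))                    # close to alpha = 3
```

For an exact Pareto tail, the log-exceedances $\log(|X_i|/u)$ are exponential with mean $1/\alpha$, so the ratio above is the reciprocal of their sample mean and is consistent for $\alpha$.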
Estimators of type (4.5) can be approximated by the ratio of the generalized tail array sums corresponding to the functions and , respectively, and their asymptotic behavior can hence be derived from Theorem 2.3 of Drees and Rootzén (2010). To this end, we replace (C) with the following condition:
 (C’)

For all there exists
(4.6) such that .
Moreover, there exists such that
(4.7)
If is bounded, then (C’) follows from condition (C), but in general it is more restrictive, though it can often be established by similar arguments. For example, condition (C’) holds for the solutions to the stochastic recurrence equation studied in Example 8.3 and the Hill estimator, i.e., for .
The following result gives the asymptotic normality of $\hat{\alpha}_n$ centered at
(4.8) 
This quantity tends to $\alpha$ as $n \to \infty$ by the assumptions on the function $\phi$ and condition (C').
Lemma 4.5.
Similarly as in (4.2) and (4.3), we need an extra condition to ensure that the bias of $\hat{\alpha}_n$ is asymptotically negligible:
(4.9) 
Now we are ready to state the asymptotic normality of the backward estimator with estimated index $\hat{\alpha}_n$, i.e.,
Theorem 4.6.
The covariance function of the limiting process can be calculated in the same way as in the proof of Theorem 4.1. In general, the resulting expressions will involve sums over all $t \in \mathbb{Z}$. Moreover, it is no longer guaranteed that the backward estimator of $F_A$ at $x$ has a smaller variance than the forward estimator.
Remark 4.7.
For Markovian time series which are not necessarily positive, the forward and backward estimators of the cdfs of $A(1)$ and $A(-1)$ can be represented in terms of generalized tail array sums constructed from
When , for example, the backward estimator equals the ratio of the generalized tail array sums pertaining to