Lower bounds to the accuracy of inference on heavy tails
The paper suggests a simple method of deriving minimax lower bounds to the accuracy of statistical inference on heavy tails. A well-known result by Hall and Welsh (Ann. Statist. 12 (1984) 1079–1084) states that if $\hat\alpha_n$ is an estimator of the tail index $\alpha$ and $\{z_n\}$ is a sequence of positive numbers such that the normalised error $z_n|\hat\alpha_n-\alpha|$ vanishes in probability uniformly over a certain class of heavy-tailed distributions, then the rate $z_n$ cannot grow arbitrarily fast. The paper presents a non-asymptotic lower bound to the probabilities $\mathbb{P}\bigl(|\hat\alpha_n-\alpha|\ge z_n\bigr)$. We also establish non-uniform lower bounds to the accuracy of tail constant and extreme quantile estimation. The results reveal that normalising sequences of robust estimators should depend in a specific way on the tail index and the tail constant.
Bernoulli 20(2), 2014, 979–989. DOI: 10.3150/13-BEJ512
Keywords: heavy-tailed distribution; lower bounds
A growing number of publications is devoted to the problem of statistical inference on heavy-tailed distributions. Such distributions naturally appear in finance, meteorology, hydrology, teletraffic engineering, etc. (see EKM (), R97 ()). In particular, it is widely accepted that high-frequency financial data (e.g., daily and hourly log-returns of share prices, stock indexes and currency exchange rates) often exhibit heavy tails (FR (), EKM (), M63 (), N09 ()), while lower-frequency financial data are typically light-tailed. The heaviness of the tail of a distribution appears to be responsible for extreme movements of stock indexes and share prices. The tail index indicates how heavy the tail is; extreme quantiles are used as measures of financial risk (EKM (), N09 ()). The need to evaluate the tail index and extreme quantiles has stimulated research on methods of statistical inference on heavy-tailed data.
The distribution of a random variable (r.v.) $X$ is said to have a heavy right tail if
\[
\mathbb{P}(X>x)=L(x)\,x^{-\alpha}\qquad(x>0,\ \alpha>0),
\tag{1}
\]
where the (unknown) function $L$ is slowly varying at infinity:
\[
\lim_{x\to\infty}L(tx)/L(x)=1\qquad(\forall t>0).
\]
We denote by the non-parametric class of distributions obeying (1).
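For intuition, slow variation can be checked numerically; a minimal sketch (our own illustration, not part of the paper, using $L=\log$, the canonical slowly varying function):

```python
import math

# A slowly varying L satisfies L(t*x)/L(x) -> 1 as x -> infinity, for every fixed t > 0.
# log is the canonical example; the ratio creeps toward 1 as x grows.
def sv_ratio(L, t, x):
    return L(t * x) / L(x)

ratios = [sv_ratio(math.log, 2.0, 10.0 ** k) for k in (2, 4, 8, 16)]
```

For $L=\log$ the ratio equals $1+\log t/\log x$, so it decreases monotonically toward 1, as the definition requires.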
The tail index $\alpha$ is the main characteristic describing the tail of a distribution. If $L(x)\to C\in(0,\infty)$ as $x\to\infty$, then $C$ is called the tail constant.
Let $F$ denote the distribution function (d.f.). Obviously, the tail index is a functional of the distribution function:
\[
\alpha(F)=-\lim_{x\to\infty}\frac{\log\bigl(1-F(x)\bigr)}{\log x}.
\]
If $L(x)$ tends to a constant (say, $C$) as $x\to\infty$, then the tail constant is also a functional of $F$:
\[
C(F)=\lim_{x\to\infty}x^{\alpha(F)}\bigl(1-F(x)\bigr).
\]
The statistical inference on a heavy-tailed distribution is straightforward if the class of unknown distributions is assumed to be a regular parametric family. The drawback of the parametric approach is that one usually cannot reliably check whether the unknown distribution belongs to a chosen parametric family.
A lot of attention during the past three decades has been given to the problem of reliable inference on heavy tails without parametric assumptions. The advantage of the non-parametric approach is that the class of unknown distributions is so large that the problem of testing the hypothesis that the unknown distribution belongs to it does not arise. The disadvantage of the non-parametric approach is that virtually no question concerning inference on heavy tails can be given a simple answer. In particular, the problem of establishing a lower bound to the accuracy of tail index estimation remained open for decades.
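For context on what such lower bounds benchmark: the best-known non-parametric tail index estimator is Hill's. A minimal sketch on simulated exact Pareto data (the estimator is classical and not specific to this paper; sample sizes and the choice of $k$ below are our own):

```python
import math
import random

def hill_estimator(sample, k):
    """Hill's estimator of the tail index alpha, built from the k largest observations."""
    xs = sorted(sample, reverse=True)
    threshold = xs[k]  # the (k+1)-th largest observation
    inv_alpha = sum(math.log(xs[i] / threshold) for i in range(k)) / k
    return 1.0 / inv_alpha

random.seed(0)
alpha = 2.0
# exact Pareto(alpha) sample: P(X > x) = x**(-alpha) for x >= 1
sample = [random.random() ** (-1.0 / alpha) for _ in range(20000)]
estimate = hill_estimator(sample, k=500)  # should be close to alpha = 2
```

The choice of the number $k$ of upper order statistics trades bias against variance; the lower bounds discussed in this paper limit how well any such choice can perform uniformly over a class of distributions.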
A lower bound to the accuracy of statistical inference sets a benchmark against which the accuracy of any particular estimator can be compared. When looking for an estimator of a quantity of interest, a functional of the unknown distribution, one often would like to choose an estimator that minimises a loss function uniformly over the class of distributions under consideration. A lower bound to this uniform risk follows if one can establish a lower bound to the risk over suitably chosen pairs of elements of the class.
The first step towards establishing a lower bound to the accuracy of tail index estimation was made by Hall and Welsh HW84 (), who proved the following result. Note that the class of heavy-tailed distributions is too “rich” for meaningful inference, and one usually deals with a subclass of imposing certain restrictions on the asymptotics of . Hall and Welsh dealt with the class of distributions on with densities
where . Note that the range of possible values of the tail index is restricted to an interval of fixed length. Let
be an arbitrary tail index estimator, where the observations are independent and identically distributed (i.i.d.) random variables, and let $\{z_n\}$ be a sequence of positive numbers. If
Beirlant et al. BBW06 () have a similar result for a larger class of distributions, but require that the estimators be uniformly consistent. Pfanzagl Pf () has established a lower bound in terms of a modulus of continuity related to the total variation distance. Let be the class of distributions with densities (3) such that , and set
where is the tail index of the distribution and is a neighborhood of it. Pfanzagl showed that no estimator can converge uniformly at a rate better than this modulus, and
Donoho and Liu DL () present a lower bound to the accuracy of tail index estimation in terms of a modulus of continuity. However, they do not calculate it explicitly. Their claim that a particular heavy-tailed distribution is stochastically dominant over all heavy-tailed distributions with the same tail index appears without proof. Assuming that the range of possible values of the tail index is restricted to an interval of fixed length, Drees Drees2001 () derives the asymptotic minimax risk for affine estimators of the tail index and indicates an approach to numerical computation of the asymptotic minimax risk for non-affine ones.
The paper presents a simple method of deriving minimax lower bounds to the accuracy of non-parametric inference on heavy-tailed distributions. The results are non-asymptotic, the constants in the bounds are explicit, and the range of possible values of the tail index is not restricted to an interval of fixed length. The information functional seems to be identified here for the first time, as is the lower bound to the accuracy of extreme quantile estimation.
The results indicate that the traditional minimax approach may require revising. The classical approach suggests looking for an estimator that minimises, say,
where is the “information functional” (an analogue of Fisher’s information). Theorems 1–4 reveal the information functionals and indicate that the normalising sequence of a robust estimator should depend in a specific way on the characteristics of the unknown distribution.
In the sequel, we deal with the non-parametric class
of distributions on , where and is the left end-point of the distribution. If then
The class is larger than ; the range of possible values of the tail index is not restricted to an interval of fixed length. Below, given a distribution function (d.f.) , we put
means the mathematical expectation with respect to and is the corresponding distribution. We set .
For any tail index estimator and any estimator of the index, there exist d.f.s such that and
as and .
Note that if as then for any we have for all large enough , yielding . Thus, the Hall–Welsh result follows from (6).
Theorem 1 identifies the natural normalising sequence. The information functional plays here the same role as Fisher's information in the Fréchet–Rao–Cramér inequality.
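For comparison, the classical Fréchet–Rao–Cramér inequality (quoted in standard notation purely for analogy, not taken from this paper) bounds the accuracy of unbiased estimation in a regular parametric family by the reciprocal of Fisher's information:

```latex
% Classical benchmark: in a regular parametric family {f_\theta} with
% Fisher information I(\theta), any unbiased estimator \hat\theta_n satisfies
\operatorname{Var}_\theta\bigl(\hat\theta_n\bigr) \;\ge\; \frac{1}{n\,I(\theta)},
\qquad
I(\theta) \;=\; \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial\theta}
  \log f_\theta(X)\right)^{\!2}\right].
```

The information functional of Theorem 1 plays the role of $I(\theta)$, but depends on the characteristics of the unknown heavy-tailed distribution rather than on a finite-dimensional parameter.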
Theorem 1 also yields minimax lower bounds to the moments of the estimation error. In particular,
For any there exist distribution functions such that and for any tail index estimator
The result holds if in (8) is replaced with
Let be a class of d.f.s such that as . Then for any estimator
Such a lower bound seems to be established here for the first time.
The presence of the information functional makes the bound non-uniform. Note that a uniform lower bound would be meaningless: as the range of possible values is not restricted to an interval of fixed length, it follows that
More generally, may tend to as if .
Let be an arbitrary tail constant estimator. The next theorem presents a lower bound to the probabilities .
Let be an arbitrary tail constant estimator. For any and there exist distribution functions such that and for all large enough , as ,
According to Hall and Welsh HW84 (),
if . This fact can be obtained as a consequence of Theorem 3: if as then for any we have for all large enough , hence .
We now present a lower bound to the accuracy of estimating extreme upper quantiles. We call an upper quantile of a given level "extreme" if the level tends to 0 as the sample size grows. In financial applications (see, e.g., EKM (), N09 ()), an upper quantile of the level as high as 0.05 can be considered extreme, since the empirical quantile estimator appears unreliable. Of course, there is an infinite variety of possible rates of decay of the level. Theorem 4 presents lower bounds in the case where the rate of decay is bounded away from the extreme regimes.
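A standard way to estimate such quantiles is Weissman-type extrapolation from a Hill tail-index estimate; a minimal sketch on simulated Pareto data (the estimator is classical, not the object of the theorem, and all tuning choices below are ours):

```python
import math
import random

def extreme_quantile(sample, k, p):
    """Weissman-type estimator of the upper quantile of level p: anchor at the
    (k+1)-th largest observation and extrapolate with a Hill tail-index estimate."""
    xs = sorted(sample, reverse=True)
    threshold = xs[k]
    inv_alpha = sum(math.log(xs[i] / threshold) for i in range(k)) / k
    return threshold * (k / (len(sample) * p)) ** inv_alpha

random.seed(1)
alpha, n, p = 2.0, 50000, 1e-4
# exact Pareto(alpha) sample: P(X > x) = x**(-alpha) for x >= 1
sample = [random.random() ** (-1.0 / alpha) for _ in range(n)]
est = extreme_quantile(sample, k=1000, p=p)
true_q = p ** (-1.0 / alpha)  # exact Pareto quantile of level p, here 100.0
```

The extrapolation factor amplifies any error in the tail-index estimate, which is one reason lower bounds for extreme quantile estimation involve the tail index and the tail constant simultaneously.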
Set We denote the upper quantile of level by
Let be an arbitrary estimator of . Denote .
For any there exist distribution functions such that and for all large enough and
where as .
Our approach to establishing lower bounds requires constructing two distribution functions: a Pareto d.f. and a "disturbed" version of it. We then apply Lemma 5, which provides a non-asymptotic lower bound to the accuracy of estimation when choosing between two close alternatives.
The problem of estimating the tail index, the tail constant and from is equivalent to the problem of estimating and quantiles from a sample of i.i.d. positive r.v.s with the distribution
where the function slowly varies at the origin.
We denote by the class of distributions obeying (12). Note that a distribution belongs to the original class if and only if the transformed distribution belongs to this one. Obviously, a tail index estimator can be considered an estimator of the index from the transformed sample, and vice versa. The tradition of dealing with this equivalent problem stems from H82 (). We proceed with this equivalent formulation.
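The equivalence rests on inversion: if $X$ has a heavy right tail with index $\alpha$, then $1/X$ has a d.f. that is regularly varying at the origin with the same index. A quick simulated check in the exact Pareto case (our own illustration):

```python
import random

random.seed(2)
alpha, n = 2.0, 100000
xs = [random.random() ** (-1.0 / alpha) for _ in range(n)]  # Pareto(alpha) on [1, infinity)
ys = [1.0 / x for x in xs]                                  # transformed sample on (0, 1]
# Near the origin the d.f. of 1/X is y**alpha: the same index now governs
# the behaviour of the distribution function at zero.
y0 = 0.1
empirical = sum(y <= y0 for y in ys) / n
exact = y0 ** alpha
```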
A counterpart to is the following non-parametric class of d.f.s on :
where and is the right end-point of . A d.f. obeys
Proof of Theorem 1 Let , and denote
We will employ the distribution functions and where
The counterparts to these distributions are
It is easy to see that and
Obviously, We now check that
The right-hand side of (15) takes on its maximum at ; the supremum is bounded by . Note that
Let denote the Hellinger distance. It is easy to check that
According to Lemma 5 below,
Note that as From (17),
where and Note that as . Hence, as and (6) follows.
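For two undisturbed Pareto densities, the affinity integral, and hence the squared Hellinger distance, has a simple closed form; the following numerical check (our own sketch, standing in for the paper's exact pair of d.f.s) verifies it by quadrature:

```python
import math

def hellinger2_pareto(a, b, grid=100000, upper=1.0e4):
    """Squared Hellinger distance between the Pareto densities a*x**(-a-1) and
    b*x**(-b-1) on [1, infinity), via the affinity integral on a log-spaced grid."""
    step = math.log(upper) / grid
    affinity = 0.0
    for i in range(grid):
        x = math.exp((i + 0.5) * step)             # midpoint rule in u = log x
        fa = a * x ** (-a - 1.0)
        fb = b * x ** (-b - 1.0)
        affinity += math.sqrt(fa * fb) * x * step  # dx = x du
    return 1.0 - affinity

a, d = 2.0, 0.1
numeric = hellinger2_pareto(a, a + d)
closed_form = 1.0 - 2.0 * math.sqrt(a * (a + d)) / (2.0 * a + d)
```

For tail indices $2$ and $2.1$ the squared distance is of order $10^{-4}$: alternatives this close are statistically hard to separate, which is precisely what drives the lower bounds.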
Let be an arbitrary estimator of index . Denote . Since Lemma 5 yields
With the left-hand side of this inequality is
where and , leading to (7).
Proof of Theorem 3 With and defined as above, we have
Let . Then
Note that as . The result follows.
Proof of Theorem 4 Denote
Obviously, this is the quantile of interest. We find it convenient to deal with the equivalent problem of estimating quantiles of the distribution of the transformed random variable.
With functions defined as above, it is easy to see that
where we put Note that Hence if ().
by the assumption.
Using the facts that and as , we derive
Hence, and . By Lemma 5,
for any estimator . Thus,
This leads to (11).
The next lemma presents a lower bound to the accuracy of choosing between two “close” alternatives.
Let be an arbitrary class of distributions, and assume that the quantity of interest is an element of a metric space. An estimator of this quantity is a measurable function of the sample taking values in a subspace of the metric space.
Examples of functionals include (a) where is a parametric family of distributions (); (b) where is the density of with respect to a particular measure; (c) . A minimax lower bound over follows from a lower bound to where
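The two-alternatives mechanism can be illustrated numerically; in the sketch below (our own Monte Carlo, with a pair of plain Pareto laws rather than the pair constructed in the proofs), the total variation distance between the $n$-fold product measures caps how well any test, and hence any estimator, can separate the two alternatives:

```python
import math
import random

def error_lower_bound(a, b, n, trials=5000, seed=3):
    """Monte Carlo estimate of 1 - TV(P_a^n, P_b^n) for Pareto laws
    f_c(x) = c*x**(-c-1) on [1, infinity): by the two-point (Le Cam) argument,
    this is a lower bound on the sum of the two error probabilities of ANY test."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        log_lr = 0.0
        for _ in range(n):
            x = rng.random() ** (-1.0 / a)  # one Pareto(a) observation
            log_lr += math.log(b / a) - (b - a) * math.log(x)
        # TV(P, Q) = E_P[max(0, 1 - dQ/dP)], evaluated on the whole sample
        acc += max(0.0, 1.0 - math.exp(log_lr))
    return 1.0 - acc / trials

# alternatives this close cannot be told apart reliably from n = 100 observations
bound = error_lower_bound(2.0, 2.1, n=100)
```

In this run the bound is well above $1/2$: with tail indices $2$ and $2.1$ and $100$ observations, every test errs most of the time, so no estimator can localise the tail index to within $0.1$ at this sample size.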