Lower bounds to the accuracy of inference on heavy tails

S.Y. Novak, Middlesex University, The Burroughs, London NW4 4BT, UK. E-mail: S.Novak@mdx.ac.uk
Received June 2012; revised January 2013
Abstract

The paper suggests a simple method of deriving minimax lower bounds to the accuracy of statistical inference on heavy tails. A well-known result by Hall and Welsh (Ann. Statist. 12 (1984) 1079–1084) states that if is an estimator of the tail index and is a sequence of positive numbers such that , where is a certain class of heavy-tailed distributions, then . The paper presents a non-asymptotic lower bound to the probabilities . We also establish non-uniform lower bounds to the accuracy of tail constant and extreme quantiles estimation. The results reveal that normalising sequences of robust estimators should depend in a specific way on the tail index and the tail constant.


Bernoulli 20(2), 2014, 979–989. DOI: 10.3150/13-BEJ512

Running title: Lower bounds

Keywords: heavy-tailed distribution; lower bounds

1 Introduction

A growing number of publications is devoted to the problem of statistical inference on heavy-tailed distributions. Such distributions naturally appear in finance, meteorology, hydrology, teletraffic engineering, etc. EKM (), R97 (). In particular, it is widely accepted that frequent financial data (e.g., daily and hourly log-returns of share prices, stock indexes and currency exchange rates) often exhibits heavy tails FR (), EKM (), M63 (), N09 (), while less frequent financial data is typically light-tailed. The heaviness of a tail of the distribution appears to be responsible for extreme movements of stock indexes and share prices. The tail index indicates how heavy the tail is; extreme quantiles are used as measures of financial risk EKM (), N09 (). The need to evaluate the tail index and extreme quantiles stimulated research on methods of statistical inference on heavy-tailed data.

The distribution of a random variable (r.v.) is said to have a heavy right tail if

(1)

where the (unknown) function is slowly varying at infinity:

We denote by the non-parametric class of distributions obeying (1).

The tail index is the main characteristic describing the tail of a distribution. If then is called the tail constant.

Let denote the distribution function (d.f.). Obviously, the tail index is a functional of the distribution function:

(2)

If tends to a constant (say, ) as then the tail constant is also a functional of :
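As a numerical illustration: for a pure Pareto tail $P(X>x)=Cx^{-\alpha}$, the tail index is (minus) the slope of $\log P(X>x)$ against $\log x$. A minimal simulation sketch (the values $\alpha=2$, $C=1$ are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, C, n = 2.0, 1.0, 200_000     # illustrative tail index and tail constant

# Inverse-transform sampling: P(X > x) = C * x**(-alpha) for x >= C**(1/alpha).
u = rng.uniform(size=n)
x = (C / u) ** (1.0 / alpha)

# On a log-log scale the tail is linear: log P(X > t) ~ log C - alpha * log t.
t = np.array([2.0, 4.0, 8.0])
emp_tail = np.array([(x > ti).mean() for ti in t])
slope = np.polyfit(np.log(t), np.log(emp_tail), 1)[0]
print(slope)   # close to -alpha = -2.0
```

The fitted slope recovers the tail index; the intercept would similarly recover the tail constant.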

The statistical inference on a heavy-tailed distribution is straightforward if the class of unknown distributions is assumed to be a regular parametric family. The drawback of the parametric approach is that one usually cannot reliably check whether the unknown distribution belongs to a chosen parametric family.

A lot of attention during the past three decades has been given to the problem of reliable inference on heavy tails without parametric assumptions. The advantage of the non-parametric approach is that the class of unknown distributions is so large that the problem of testing the hypothesis that the unknown distribution belongs to does not arise. The disadvantage of the non-parametric approach is that virtually no question concerning inference on heavy tails can be given a simple answer. In particular, the problem of establishing a lower bound to the accuracy of tail index estimation remained open for decades.

A lower bound to the accuracy of statistical inference sets a benchmark against which the accuracy of any particular estimator can be compared. When looking for an estimator of a quantity of interest, where is the unknown distribution, is the class of distributions and is a functional of one often would like to choose an estimator that minimises a loss function uniformly in (e.g., where is a particular loss function). A lower bound to follows if one can establish a lower bound to

The first step towards establishing a lower bound to the accuracy of tail index estimation was made by Hall and Welsh HW84 (), who proved the following result. Note that the class of heavy-tailed distributions is too “rich” for meaningful inference, and one usually deals with a subclass of imposing certain restrictions on the asymptotics of . Hall and Welsh dealt with the class of distributions on with densities

(3)

where . Note that the range of possible values of the tail index is restricted to interval . Let

be an arbitrary tail index estimator, where are independent and identically distributed (i.i.d.) random variables, and let be a sequence of positive numbers. If

(4)

then

(to be precise, Hall and Welsh HW84 () dealt with the random variables where are distributed according to (3)).

Beirlant et al. BBW06 () obtain a similar result for a larger class of distributions but require that the estimators be uniformly consistent in . Pfanzagl Pf () established a lower bound in terms of a modulus of continuity related to the total variation distance . Let be the class of distributions with densities (3) such that , and set

where is the tail index of distribution and is a neighborhood of . Pfanzagl showed that no estimator can converge to uniformly in at a rate better than , and

Donoho and Liu DL () present a lower bound to the accuracy of tail index estimation in terms of a modulus of continuity . However, they do not calculate . The claim that a particular heavy-tailed distribution is stochastically dominant over all heavy-tailed distributions with the same tail index appears without proof. Assuming that the range of possible values of the tail index is restricted to an interval of fixed length, Drees Drees2001 () derives the asymptotic minimax risk for affine estimators of the tail index and indicates an approach to numerical computation of the asymptotic minimax risk for non-affine ones.

The paper presents a simple method of deriving minimax lower bounds to the accuracy of non-parametric inference on heavy-tailed distributions. The results are non-asymptotic, the constants in the bounds are explicit, and the range of possible values of the tail index is not restricted to an interval of fixed length. The information functional appears to be identified here for the first time, as does the lower bound to the accuracy of extreme quantile estimation.

The results indicate that the traditional minimax approach may require revising. The classical approach suggests looking for an estimator that minimises, say,

(cf. Hu97 (), IH81 (), Tsy ()), while our results suggest looking for an estimator that minimises

where is the “information functional” (an analogue of Fisher’s information). Theorems 1–4 reveal the information functionals and indicate that the normalising sequence of a robust estimator should depend in a specific way on the characteristics of the unknown distribution.

2 Results

In the sequel, we deal with the non-parametric class

(5)

of distributions on , where and is the left end-point of the distribution. If then

The class is larger than ; the range of possible values of the tail index is not restricted to an interval of fixed length. Below, given a distribution function (d.f.) , we put

means the mathematical expectation with respect to and is the corresponding distribution. We set .

Theorem 1

For any , any tail index estimator and any estimator of index , there exist d.f.s such that and

(6)
(7)

as and .

Note that if as then for any we have for all large enough , yielding . Thus, the Hall–Welsh result follows from (6).

Theorem 1 shows that the natural normalising sequence for is . The information functional plays the same role here as Fisher’s information function in the Fréchet–Rao–Cramér inequality.

Theorem 1 also yields minimax lower bounds to the moments of . In particular, the following holds.

Corollary 2

For any there exist distribution functions such that and for any tail index estimator

(8)

The result holds if in (8) is replaced with

Let be a class of d.f.s such that as . Then for any estimator

()

A lower bound to seems to be established for the first time.

The presence of the information functional makes the bound non-uniform. Note that a uniform lower bound would be meaningless: as the range of possible values of is not restricted to an interval of fixed length, it follows from (???) that

More generally, may tend to as if .

Let be an arbitrary tail constant estimator. The next theorem presents a lower bound to the probabilities .

Theorem 3

Let be an arbitrary tail constant estimator. For any and there exist distribution functions such that and for all large enough , as ,

(9)

where

Similarly to (8), Theorem 3 yields lower bounds to the moments of . In particular, (9) entails

()

According to Hall and Welsh HW84 (),

if . This fact can be obtained as a consequence of Theorem 3: if as then for any we have for all large enough , hence .

We now present a lower bound to the accuracy of estimating extreme upper quantiles. We call an upper quantile of level “extreme” if tends to 0 as grows. In financial applications (see, e.g., EKM (), N09 ()), an upper quantile of a level as high as 0.05 can be considered extreme, as the empirical quantile estimator appears unreliable. Of course, there is an infinite variety of possible rates of decay of . Theorem 4 presents lower bounds in the case where is bounded away from and .
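For comparison, the standard non-parametric route to extreme quantiles, against which lower bounds such as Theorem 4 are benchmarked, is a Weissman-type plug-in built on a tail index estimate. A minimal sketch on simulated Pareto data (illustrative parameters, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, n, p = 2.0, 100_000, 1e-4   # extreme level: only n*p = 10 expected exceedances

# Exact Pareto(alpha) sample: P(X > x) = x**(-alpha), x >= 1.
sample = rng.uniform(size=n) ** (-1.0 / alpha)
true_quantile = p ** (-1.0 / alpha)          # = 100.0 here

# Weissman-type plug-in: hat_x_p = X_(k+1) * (k/(n*p))**(1/hat_alpha),
# with Hill's estimator hat_alpha built on the k largest observations.
k = 2000
x = np.sort(sample)[::-1]                    # descending order statistics
hat_alpha = 1.0 / (np.log(x[:k]) - np.log(x[k])).mean()
hat_x_p = x[k] * (k / (n * p)) ** (1.0 / hat_alpha)

print(hat_alpha, hat_x_p)   # hat_x_p comparable to true_quantile = 100.0
```

The extrapolation factor $(k/(np))^{1/\hat\alpha}$ is what lets the estimator reach far beyond the range where the empirical quantile, based on about $np$ points, is reliable.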

Set . We denote the upper quantile of level by

Let be an arbitrary estimator of . Denote .

Theorem 4

For any there exist distribution functions such that and for all large enough and

(10)
(11)

where as .

3 Proofs

Our approach to establishing lower bounds requires constructing two distribution functions and where is a Pareto d.f. and is a “disturbed” version of . We then apply Lemma 5 that provides a non-asymptotic lower bound to the accuracy of estimation when choosing between two close alternatives.

The problem of estimating the tail index, the tail constant and from is equivalent to the problem of estimating and quantiles from a sample of i.i.d. positive r.v.s with the distribution

(12)

where function slowly varies at the origin.

We denote by the class of distributions obeying (12). Note that if and only if . Obviously, a tail index estimator can be considered an estimator of index from the sample , and vice versa. The tradition of dealing with this equivalent problem stems from H82 (). We proceed with this equivalent formulation.
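The classical estimator from H82 () is Hill's estimator, built (in the original formulation) on the largest order statistics. A textbook sketch on exact Pareto data, included for orientation only (illustrative parameters; not part of the paper's argument):

```python
import numpy as np

def hill_estimator(sample, k):
    """Hill's (1982) tail index estimator from the k largest order
    statistics -- a textbook sketch, not the paper's construction."""
    x = np.sort(np.asarray(sample))[::-1]            # descending
    return 1.0 / (np.log(x[:k]) - np.log(x[k])).mean()

rng = np.random.default_rng(1)
alpha, n = 1.5, 100_000
sample = rng.uniform(size=n) ** (-1.0 / alpha)       # exact Pareto(alpha) tail

est = hill_estimator(sample, k=2000)
print(est)   # close to alpha = 1.5
```

In the transformed problem of (12), the same statistic is applied to the smallest order statistics of the sample near the origin.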

A counterpart to is the following non-parametric class of d.f.s on :

(13)

where and is the right end-point of . A d.f. obeys

where and

{pf*}

Proof of Theorem 1 Let , and denote

We will employ the distribution functions and where

The counterparts to these distributions are

It is easy to see that and

(14)

Obviously, . We now check that

Since

we have

(15)

The right-hand side of (15) takes on its maximum at ; the supremum is bounded by . Note that

Let denote the Hellinger distance. It is easy to check that

(16)
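For Pareto-type densities the Hellinger affinity has a closed form, which gives a quick numerical sanity check of computations like (16). A sketch for an illustrative Pareto pair on [1, ∞) (not the exact "disturbed" pair constructed in the proof):

```python
import numpy as np

# Hellinger affinity between two Pareto densities on [1, inf):
# p(x) = a x^{-a-1}, q(x) = b x^{-b-1}; an illustrative pair.
a, b = 2.0, 2.1

# Substituting x = 1/t maps the integral of sqrt(p*q) over [1, inf)
# to int_0^1 sqrt(a*b) t^{(a+b)/2 - 1} dt; apply the trapezoid rule.
t = np.linspace(0.0, 1.0, 400_001)
y = np.sqrt(a * b) * t ** ((a + b) / 2.0 - 1.0)
affinity = np.sum(y[1:] + y[:-1]) / 2.0 * (t[1] - t[0])

h2_numeric = 1.0 - affinity
h2_closed = 1.0 - 2.0 * np.sqrt(a * b) / (a + b)   # closed-form check
print(h2_numeric, h2_closed)   # both about 3.0e-4 for this pair
```

For this pair the squared Hellinger distance shrinks quadratically in $b-a$, which is what allows a small "disturbance" of the tail to remain nearly indistinguishable from the undisturbed distribution.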

According to Lemma 5 below,

(17)

Let where

Note that as . From (17),

(18)

where and . Note that as . Hence, as , and (6) follows.

Let be an arbitrary estimator of index . Denote . Since , Lemma 5 yields

With the left-hand side of this inequality is

where and , leading to (7).

{pf*}

Proof of Corollary 2 Note that

(19)

for any non-negative r.v. Since

as , (6) and (19) entail (8).
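The display (19) is not shown here; a standard Markov-type step of the kind presumably stated there, which turns the probability bound (6) into the moment bound (8), reads:

```latex
% Markov-type step (presumably the content of (19)): for a
% non-negative random variable \xi and any u > 0, q > 0,
\[
  \mathbf{E}\,\xi^{q}
    \;\ge\; \mathbf{E}\bigl[\xi^{q}\,\mathbf{1}\{\xi > u\}\bigr]
    \;\ge\; u^{q}\,\mathbf{P}(\xi > u).
\]
% Applied with \xi the estimation error and u the normalising level,
% any lower bound on the deviation probability yields a lower bound
% on the corresponding moment, with the factor u^q.
```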

{pf*}

Proof of Theorem 3 With and defined as above, we have

Using this inequality, (17) and Lemma 5, we derive

Let . Then

Note that as . The result follows.

{pf*}

Proof of Theorem 4 Denote

Obviously, is the quantile of . We find it convenient to deal with the equivalent problem of estimating quantiles of the distribution of a random variable .

With functions defined as above, it is easy to see that

(20)

where we put . Note that . Hence if ().

Denote

(21)

Then and

(22)

by the assumption.

Using the facts that and as , we derive

Hence, and . By Lemma 5,

for any estimator . Thus,

where and . Taking into account (21) and (22), we derive

This leads to (11).

Recall that . From (20),

By Lemma 5,

Hence,

where and . The proof is complete.

The next lemma presents a lower bound to the accuracy of choosing between two “close” alternatives.

Let be an arbitrary class of distributions, and assume that the quantity of interest, is an element of a metric space . An estimator of is a measurable function of taking values in a subspace of the metric space .

Examples of functionals include (a) where is a parametric family of distributions (); (b) where is the density of with respect to a particular measure; (c) . A minimax lower bound over follows from a lower bound to where
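Bounds of the kind stated in Lemma 5 follow the standard two-point scheme (cf. Tsy ()). A sketch of one common form, under the assumption that Lemma 5 is of this kind (its exact constants may differ):

```latex
% Two-point scheme: if d(\theta_0,\theta_1) \ge 2\Delta, an estimator
% within \Delta of the truth identifies the hypothesis, so estimation
% is at least as hard as testing P_0 against P_1:
\[
  \inf_{\hat\theta}\,\max_{i\in\{0,1\}}
    P_i\bigl(d(\hat\theta,\theta_i)\ge\Delta\bigr)
  \;\ge\; \frac{1 - V(P_0,P_1)}{2}
  \;\ge\; \frac{1 - H(P_0,P_1)\sqrt{2 - H^{2}(P_0,P_1)}}{2},
\]
% where V is the total variation distance and H the Hellinger
% distance; the last step uses the inequality V \le H\sqrt{2 - H^2}.
```

The Hellinger form is convenient for i.i.d. samples, since the affinity of product measures factorises.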

Lemma 5

Denote