# Lower bounds to the accuracy of inference on heavy tails

S.Y. Novak (S.Novak@mdx.ac.uk), Middlesex University, The Burroughs, London NW4 4BT, UK.
Received June 2012; revised January 2013.
###### Abstract

The paper suggests a simple method of deriving minimax lower bounds to the accuracy of statistical inference on heavy tails. A well-known result by Hall and Welsh (Ann. Statist. 12 (1984) 1079–1084) states that if α̂_n is an estimator of the tail index α_F and {z_n} is a sequence of positive numbers such that sup_F P_F(|α̂_n − α_F| ≥ z_n) → 0, where the supremum is taken over a certain class of heavy-tailed distributions, then z_n ≫ n^{−b/(2b+1)}. The paper presents a non-asymptotic lower bound to the probabilities P_F(|α̂_n − α_F| ≥ z_n). We also establish non-uniform lower bounds to the accuracy of tail constant and extreme quantile estimation. The results reveal that normalising sequences of robust estimators should depend in a specific way on the tail index and the tail constant.

Bernoulli 20(2), 2014, 979–989. DOI: 10.3150/13-BEJ512.

Keywords: heavy-tailed distribution; lower bounds

## 1 Introduction

A growing number of publications is devoted to the problem of statistical inference on heavy-tailed distributions. Such distributions naturally appear in finance, meteorology, hydrology, teletraffic engineering, etc. [EKM, R97]. In particular, it is widely accepted that frequent financial data (e.g., daily and hourly log-returns of share prices, stock indexes and currency exchange rates) often exhibit heavy tails [FR, EKM, M63, N09], while less frequent financial data is typically light-tailed. The heaviness of the tail of the distribution appears to be responsible for extreme movements of stock indexes and share prices. The tail index indicates how heavy the tail is; extreme quantiles are used as measures of financial risk [EKM, N09]. The need to evaluate the tail index and extreme quantiles stimulated research on methods of statistical inference on heavy-tailed data.

The distribution of a random variable (r.v.) X is said to have a heavy right tail if

 P(X ≥ x) = L(x)x^{−α}   (α > 0), (1)

where the (unknown) function L is slowly varying at infinity:

 lim_{x→∞} L(xt)/L(x) = 1   (∀t > 0).

We denote by H the non-parametric class of distributions obeying (1).

The tail index α is the main characteristic describing the tail of a distribution. If L(x) tends to a constant c as x → ∞ then c is called the tail constant.

Let F denote the distribution function (d.f.) of X. Obviously, the tail index is a functional of the distribution function:

 α_F ≡ α_P = −lim_{x→∞} ln P(X ≥ x)/ln x. (2)

If L(x) tends to a constant (say, c) as x → ∞ then the tail constant is also a functional of F:

 c_F ≡ c_P = lim_{x→∞} x^{α_F} P(X ≥ x).
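As an illustration of the fact that α_F in (2) can be recovered from data, the following sketch computes Hill's classical estimator of the tail index on simulated pure-Pareto data. This is a standard estimator, not one introduced in this paper; the sample size, the number k of upper order statistics and the seed are illustrative assumptions.

```python
import math
import random

def hill_estimator(sample, k):
    """Hill's estimator of the tail index alpha, based on the k largest
    order statistics (a standard estimator, used here only for illustration)."""
    xs = sorted(sample, reverse=True)                     # descending order statistics
    log_spacings = [math.log(xs[i] / xs[k]) for i in range(k)]
    return k / sum(log_spacings)                          # alpha_hat = k / sum of log-spacings

random.seed(1)
alpha, n, k = 2.0, 10_000, 500
# random.paretovariate(a) samples from P(X >= x) = x^{-a}, x >= 1,
# i.e. a pure Pareto tail with tail constant 1.
sample = [random.paretovariate(alpha) for _ in range(n)]
est = hill_estimator(sample, k)
```

For a pure Pareto sample the estimator is consistent with standard deviation of order α/√k, so with k = 500 the estimate lies close to the true value α = 2.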

The statistical inference on a heavy-tailed distribution is straightforward if the class of unknown distributions is assumed to be a regular parametric family. The drawback of the parametric approach is that one usually cannot reliably check whether the unknown distribution belongs to a chosen parametric family.

A lot of attention during the past three decades has been given to the problem of reliable inference on heavy tails without parametric assumptions. The advantage of the non-parametric approach is that the class H of unknown distributions is so large that the problem of testing the hypothesis that the unknown distribution belongs to H does not arise. The disadvantage of the non-parametric approach is that virtually no question concerning inference on heavy tails can be given a simple answer. In particular, the problem of establishing a lower bound to the accuracy of tail index estimation remained open for decades.

A lower bound to the accuracy of statistical inference sets a benchmark against which the accuracy of any particular estimator can be compared. When looking for an estimator â_n of a quantity of interest a_P, where P is the unknown distribution from a class P of distributions and a_P is a functional of P, one often would like to choose an estimator that minimises a loss function uniformly in P ∈ P (e.g., sup_{P∈P} E_P ℓ(â_n − a_P), where ℓ is a particular loss function). A lower bound to such a quantity follows if one can establish a lower bound to

 sup_{P∈P} P(|â_n − a_P| ≥ u)   (u > 0).

The first step towards establishing a lower bound to the accuracy of tail index estimation was made by Hall and Welsh [HW84], who proved the following result. Note that the class H of heavy-tailed distributions is too “rich” for meaningful inference, and one usually deals with a subclass of H imposing certain restrictions on the asymptotics of L. Hall and Welsh dealt with the class D_{b,A} of distributions with densities

 f(x) = cαx^{−α−1}(1 + u(x)), (3)

where sup_x |u(x)|x^{αb} ≤ A. Note that the range of possible values of the tail index is restricted to a fixed interval. Let

 α̂_n ≡ α̂_n(X₁, …, X_n)

be an arbitrary tail index estimator, where X₁, …, X_n are independent and identically distributed (i.i.d.) random variables, and let {z_n} be a sequence of positive numbers. If

 lim_{n→∞} sup_{F∈D_{b,A}} P_F(|α̂_n − α_F| ≥ z_n) = 0   (∀A > 0), (4)

then

 z_n ≫ n^{−b/(2b+1)}   (n → ∞)

(to be precise, Hall and Welsh [HW84] dealt with samples drawn from distributions with densities (3)).
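To make the Hall–Welsh rate concrete, here is the exponent b/(2b+1) evaluated at a few values of b (plain arithmetic: the larger b, i.e. the closer the class is to the pure Pareto model, the faster the attainable rate, which never exceeds the parametric rate n^{−1/2}):

```latex
b = 1:\quad z_n \gg n^{-1/3}, \qquad
b = 2:\quad z_n \gg n^{-2/5}, \qquad
b \to \infty:\quad n^{-b/(2b+1)} \to n^{-1/2}.
```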

Beirlant et al. [BBW06] have a similar result for a larger class of distributions, but require the estimators to be uniformly consistent. Pfanzagl [Pf] has established a lower bound in terms of a modulus of continuity related to the total variation distance. Let P_{n,ε} be a total-variation neighborhood of a distribution P₀ with density (3), and set

 s_n(ε, P₀) = sup_{P∈P_{n,ε}} |α_P − α_{P₀}|,

where α_P is the tail index of the distribution P. Pfanzagl has shown that no estimator can converge to α_P uniformly in P at a rate better than n^{−b/(1+2b)}, and

 inf_{0<ε<1} ε^{−2b/(1+2b)} liminf_{n→∞} n^{b/(1+2b)} s_n(ε, P₀) > 0.

Donoho and Liu [DL] present a lower bound to the accuracy of tail index estimation in terms of a modulus of continuity; however, they do not calculate it. Their claim that a particular heavy-tailed distribution is stochastically dominant over all heavy-tailed distributions with the same tail index appears without proof. Assuming that the range of possible values of the tail index is restricted to an interval of fixed length, Drees [Drees2001] derives the asymptotic minimax risk for affine estimators of the tail index and indicates an approach to numerical computation of the asymptotic minimax risk for non-affine ones.

The paper presents a simple method of deriving minimax lower bounds to the accuracy of non-parametric inference on heavy-tailed distributions. The results are non-asymptotic, the constants in the bounds are explicit, and the range of possible values of the tail index is not restricted to an interval of fixed length. The information functional seems to be found for the first time, as does the lower bound to the accuracy of extreme quantile estimation.

The results indicate that the traditional minimax approach may require revising. The classical approach suggests looking for an estimator that minimises, say,

 sup_{P∈P} E_P|â_n − a_P|

(cf. [Hu97, IH81, Tsy]), while our results suggest looking for an estimator that minimises the risk normalised by the “information functional” (an analogue of Fisher’s information). Theorems 1–4 reveal the information functionals and indicate that the normalising sequence of a robust estimator should depend in a specific way on the characteristics of the unknown distribution.

## 2 Results

In the sequel, we deal with the non-parametric class

 H(b) = {P ∈ H: sup_{x>K*(P)} |c_F^{−1} x^{α_F} P(X ≥ x) − 1| x^{bα_F} < ∞} (5)

of distributions, where b > 0 and K*(P) is the left end-point of the distribution. If P ∈ H(b) then

 P(X ≥ x) = c_F x^{−α_F}(1 + O(x^{−bα_F}))   (x → ∞).

The class H(b) is larger than D_{b,A}; the range of possible values of the tail index is not restricted to an interval of fixed length. Below, given a distribution function (d.f.) F_i, we put

 a_{F_i} = 1/α_{F_i},   r = b/(1+2b),

E_i means the mathematical expectation with respect to P_i, where P_i is the corresponding distribution.

###### Theorem 1

For any tail index estimator α̂_n and any estimator â_n of the index a_F = 1/α_F there exist d.f.s F₀, F₁ ∈ H(b) such that

 max_{i∈{0,1}} P_i(|α̂_n/α_{F_i} − 1| α_{F_i}^{r/b} c_{F_i}^r n^r ≥ v/2) ≥ (1 − v^{1/r}/8n)^{2n}/4, (6)
 max_{i∈{0,1}} P_i(|â_n/a_{F_i} − 1| a_{F_i}^{−r/b} c_{F_i}^r n^r ≥ v/2) ≥ (1 − v^{1/r}/8n)^{2n}/4. (7)

Note that if sup_F P_F(|α̂_n − α_F| ≥ z_n) → 0 as n → ∞ then, by (6), for any v > 0 we have z_n ≥ vn^{−r} (up to the normalising constants) for all large enough n, yielding z_n ≫ n^{−r}. Thus, the Hall–Welsh result follows from (6).

Theorem 1 shows that the natural normalising sequence for α̂_n/α_F − 1 is {α_F^{r/b} c_F^r n^r}. The information functional α_F^{r/b} c_F^r plays here the same role as Fisher’s information function in the Fréchet–Rao–Cramér inequality.

Theorem 1 also yields minimax lower bounds to the moments of |α̂_n/α_F − 1|. In particular, there holds

###### Corollary 2

For any tail index estimator α̂_n there exist distribution functions F₀, F₁ ∈ H(b) such that

 max_{i∈{0,1}} α_{F_i}^{r/b} c_{F_i}^r E_{F_i}|α̂_n/α_{F_i} − 1| n^r ≥ 4^r r Γ(r)/8 + o(1). (8)

The result holds if |α̂_n/α_{F_i} − 1| α_{F_i}^{r/b} in (8) is replaced with |â_n/a_{F_i} − 1| a_{F_i}^{−r/b}.

Let H_n(b) be a class of d.f.s from H(b) containing the pair F₀, F₁ for all large enough n. Then for any estimator α̂_n

 sup_{F∈H_n(b)} α_F^{r/b} c_F^r E_F|α̂_n/α_F − 1| n^r ≥ 4^r r Γ(r)/8 + o(1). (8*)

A lower bound to the moments of the normalised estimation error seems to be established here for the first time.

The presence of the information functional makes the bound non-uniform. Note that a uniform lower bound would be meaningless: as the range of possible values of α_F is not restricted to an interval of fixed length, it follows from the preceding bound that

 sup_{F∈H_n(b)} E_F|α̂_n/α_F − 1| → ∞   (n → ∞).

More generally, the normalised risk may tend to ∞ as n → ∞ if the information functional α_F^{r/b} c_F^r is allowed to approach zero over the class.

Let ĉ_n be an arbitrary tail constant estimator. The next theorem presents a lower bound to the probabilities P_i(|ĉ_n/c_{F_i} − 1| ≥ ·).

###### Theorem 3

Let ĉ_n be an arbitrary tail constant estimator. There exist distribution functions F₀, F₁ ∈ H(b) such that, for all v > 0 and all large enough n,

 max_{i∈{0,1}} P_i(|ĉ_n/c_{F_i} − 1| α_{F_i}^{r/b} c_{F_i}^r ≥ v^r n^{−r} r ln(n/ln n) t_n/2b) ≥ (1 − v/8n)^{2n}/4, (9)

where t_n → 1 as n → ∞.

Similarly to (8), Theorem 3 yields lower bounds to the moments of |ĉ_n/c_{F_i} − 1|. In particular, (9) entails

 max_{i∈{0,1}} α_{F_i}^{r/b} c_{F_i}^r E_{F_i}|ĉ_n/c_{F_i} − 1| ≥ (ln n) n^{−r} r² 4^{r−1} Γ(r)/(2b + o(1)). (9*)

According to Hall and Welsh [HW84],

 z_n ≫ (ln n) n^{−b/(2b+1)}

if sup_F P_F(|ĉ_n − c_F| ≥ z_n) → 0. This fact can be obtained as a consequence of Theorem 3: if the left-hand side of (9) vanishes then for any v > 0 we have z_n ≥ v^r n^{−r} r ln(n/ln n)/2b for all large enough n, hence z_n ≫ (ln n) n^{−r}.

We now present a lower bound to the accuracy of estimating extreme upper quantiles. We call an upper quantile of level q_n “extreme” if q_n tends to 0 as n grows. In financial applications (see, e.g., [EKM, N09]), an upper quantile of a level as high as 0.05 can be considered extreme, as the empirical quantile estimator appears unreliable. Of course, there is an infinite variety of possible rates of decay of {q_n}; Theorem 4 presents lower bounds for a natural range of {q_n}.

We denote the upper quantile of level q_n by

 x_{F,n} = F̄^{−1}(q_n).

Let x̂_n be an arbitrary estimator of x_{F,n}.
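To fix ideas about what such an estimator x̂_n may look like, here is a Weissman-type extreme quantile estimator that extrapolates beyond the sample range using a fitted Pareto tail. This is a standard construction, not the paper's; the sample size, the choice of k, the level q and the seed are illustrative assumptions.

```python
import math
import random

def hill(xs_desc, k):
    # Hill's tail-index estimator from the k largest observations
    return k / sum(math.log(xs_desc[i] / xs_desc[k]) for i in range(k))

def extreme_quantile(sample, k, q):
    """Weissman-type estimator of the upper quantile of level q:
    extrapolate from the (k+1)-th largest order statistic along the
    fitted Pareto tail (standard method, shown only as an illustration)."""
    xs = sorted(sample, reverse=True)
    a_hat = hill(xs, k)
    n = len(sample)
    return xs[k] * (k / (n * q)) ** (1.0 / a_hat)

random.seed(7)
alpha, n, k, q = 2.0, 20_000, 1_000, 0.005
sample = [random.paretovariate(alpha) for _ in range(n)]
x_hat = extreme_quantile(sample, k, q)
# For P(X >= x) = x^{-2}, the true upper quantile of level q is q^{-1/2}.
true_q = q ** (-1.0 / alpha)
```

Lower bounds such as Theorem 4 below benchmark the accuracy attainable by any estimator of this kind.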

###### Theorem 4

For any estimator x̂_n of x_{F,n} there exist distribution functions F₀, F₁ ∈ H(b) such that for all large enough n and all admissible u > 0,

 max_{i∈{0,1}} P_i(|x̂_n/x_{F_i,n} − 1| α_{F_i}^{2(1−r)} c_{F_i}^r /w_{F_i} t*_{i,n} ≥ u n^{−r}/2b) ≥ (1 − u^{1/r}/8n)^{2n}/4, (10)
 max_{i∈{0,1}} P_i(|x_{F_i,n}/x̂_n − 1| α_{F_i}^{2(1−r)} c_{F_i}^r /w_{F_i} t*_{i,n} ≥ u n^{−r}/2b) ≥ (1 − u^{1/r}/8n)^{2n}/4, (11)

where t*_{i,n} → 1 as n → ∞.

## 3 Proofs

Our approach to establishing lower bounds requires constructing two distribution functions F₀ and F₁, where F₀ is a Pareto d.f. and F₁ is a “disturbed” version of F₀. We then apply Lemma 5, which provides a non-asymptotic lower bound to the accuracy of estimation when choosing between two close alternatives.
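The standard two-point (Le Cam-type) bound behind this scheme can be sketched as follows; the constants below are the usual ones, matching the form of the right-hand sides of (6)–(11), though the precise statement of Lemma 5 is the paper's own.

```latex
% For an i.i.d. sample of size n and ANY estimator \hat a_n of a quantity
% taking the values a(F_0), a(F_1) under the two alternatives:
\max_{i\in\{0,1\}} P_i\Bigl(|\hat a_n - a(F_i)| \ge \tfrac12\,|a(F_1)-a(F_0)|\Bigr)
  \;\ge\; \bigl(1 - d_H^2(F_0,F_1)\bigr)^{2n}/4,
% using the product rule 1 - d_H^2(F_0^{\otimes n}, F_1^{\otimes n})
%                       = (1 - d_H^2(F_0,F_1))^n.
```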

The problem of estimating the tail index, the tail constant and extreme quantiles from a sample X₁, …, X_n is equivalent to the problem of estimating the corresponding quantities from a sample of i.i.d. positive r.v.s Y_i = 1/X_i with the distribution

 F(y) ≡ P(Y ≤ y) = y^α ℓ(y)   (y > 0), (12)

where the function ℓ slowly varies at the origin.

We denote by F the class of distributions obeying (12). Note that P ∈ H if and only if the distribution of Y = 1/X belongs to F. Obviously, a tail index estimator α̂_n(X₁, …, X_n) can be considered an estimator of the index α from the sample Y₁, …, Y_n, and vice versa. The tradition of dealing with this equivalent problem stems from [H82]. We proceed with this equivalent formulation.
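The equivalence rests on a one-line calculation (with Y = 1/X and the tail as in (1)):

```latex
P(Y \le y) \;=\; P(X \ge 1/y) \;=\; L(1/y)\,(1/y)^{-\alpha}
           \;=\; y^{\alpha}\,\ell(y), \qquad \ell(y) := L(1/y),
% and \ell slowly varies at the origin because L slowly varies at infinity;
% the tail index, the tail constant and the quantiles x = 1/y carry over.
```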

A counterpart to H(b) is the following non-parametric class of d.f.s on (0, ∞):

 F(b) = {F ∈ F: sup_{0<y<y*(F)} |c_F^{−1} y^{−α_F} F(y) − 1| y^{−bα_F} < ∞}, (13)

where b > 0 and y*(F) is the right end-point of F. A d.f. F ∈ F(b) obeys

 F(y) = c_F y^{α_F}(1 + O(y^{bα_F}))   (y → 0),

where α_F > 0 and c_F > 0.

Proof of Theorem 1. Let α > 0, c > 0 and h ∈ (0; 1), and denote

 α₀ = α,   α₁ = α + γ,   γ = h^{αb}.

We will employ the distribution functions F₀ and F₁, where

 F₀(y) = (y/c)^α 1{0<y<c} + 1{y≥c},
 F₁(y) = c^{−α}h^{−γ}y^{α₁} 1{0<y≤h} + (y/c)^α 1{h<y<c} + 1{y≥c}.

The counterparts to these distributions are

 P₀(X > x) = (cx)^{−α} 1{x ≥ 1/c},
 P₁(X > x) = (cx)^{−α} 1{1/c ≤ x < 1/h} + c^{−α}h^{−γ}x^{−α₁} 1{x ≥ 1/h}.

It is easy to see that F₀, F₁ ∈ F and

 α_{F₀} = α,   α_{F₁} = α₁,   c_{F₀} = c^{−α},   c_{F₁} = c^{−α}h^{−γ}. (14)

Obviously, F₀ ∈ F(b). We now check that F₁ ∈ F(b).

Since

 c_{F₁}^{−1} y^{−α₁} F₁(y) = y^{−γ}h^γ 1{h<y<c} + 1{0<y≤h},

we have

 sup_{0<y<c} |c_{F₁}^{−1} y^{−α₁} F₁(y) − 1| y^{−bα₁} = sup_{h<y<c} (1 − (h/y)^γ) y^{−bα₁}. (15)

The right-hand side of (15) takes on its maximum in (h; c); the supremum is finite. Hence F₁ ∈ F(b).

Let d_H denote the Hellinger distance. It is easy to check that

 d²_H(F₀; F₁) ≤ γ^{1/r}/8α²c^α. (16)

According to Lemma 5 below,

 max_{i∈{0,1}} P_i(|α̂_n − α_{F_i}| ≥ γ/2) ≥ (1 − γ^{1/r}/8α²c^α)^{2n}/4. (17)
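To see what a bound like (16) measures, here is a generic numeric computation of the squared Hellinger distance between two densities. The power-law pair below is a hypothetical example (its exponents are not the constants of the proof); the quadrature routine is a plain midpoint rule.

```python
import math

def hellinger_sq(f, g, lo, hi, m=100_000):
    """Squared Hellinger distance d_H^2 = 1 - int sqrt(f*g) between two
    probability densities supported on (lo, hi), via the midpoint rule."""
    step = (hi - lo) / m
    s = 0.0
    for i in range(m):
        y = lo + (i + 0.5) * step
        s += math.sqrt(f(y) * g(y)) * step
    return 1.0 - s

# Hypothetical example: two power-law d.f.s y^a on (0, 1), i.e. densities
# a*y^{a-1} -- pure Pareto-type laws of the kind used for F_0 above.
a, b_ = 2.0, 2.2
f = lambda y: a * y ** (a - 1)
g = lambda y: b_ * y ** (b_ - 1)
d2 = hellinger_sq(f, g, 0.0, 1.0)
# Closed form for this pair: 1 - 2*sqrt(a*b_)/(a + b_).
```

For close exponents the distance is small, which is exactly what makes the two alternatives hard to tell apart from a sample.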

Let γ = γ_n, where

 γ_n ≡ γ_n(α, c, v) = v(α²c^α/n)^r.

Note that γ_n → 0 as n → ∞. From (17),

 max_{i∈{0,1}} P_i(|α̂_n/α_{F_i} − 1| α_{F_i}^{r/b} c_{F_i}^r n^r ≥ v t_{n,i}/2) ≥ (1 − v^{1/r}/8n)^{2n}/4, (18)

where the factors t_{n,i} → 1 as n → ∞, and (6) follows.

Let â_n be an arbitrary estimator of the index a = 1/α. Denote a_i = 1/α_i. Since a₀ − a₁ = γa₀a₁, Lemma 5 yields

 max_{i∈{0,1}} P_i(|â_n − a_{F_i}| ≥ γa₀a₁/2) ≥ (1 − γ^{1/r}/8α²c^α)^{2n}/4.

With γ = γ_n the left-hand side of this inequality is

 max_{i∈{0,1}} P_i(|â_n − a_{F_i}| ≥ vn^{−r}a^{1−2r}a₁/2c_{F₀}^r) = max_{i∈{0,1}} P_i(|â_n/a_{F_i} − 1| a_{F_i}^{−r/b} c_{F_i}^r n^r ≥ v t⁺_{n,i}/2),

where t⁺_{n,i} → 1 as n → ∞, leading to (7).

Proof of Corollary 2. Note that

 Eξ = ∫₀^∞ P(ξ ≥ x) dx (19)

for any non-negative r.v. ξ. Since

 ∫₀^{z_n} (1 − v^{1/r}/8n)^{2n} dv = 4^r r Γ(r) + o(1)   (n → ∞)

for a suitable choice of z_n → ∞, (6) and (19) entail (8).
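The limit of this integral can be checked numerically: as n → ∞ the integrand tends to exp(−v^{1/r}/4), and the substitution u = v^{1/r} (so v = u^r, dv = r u^{r−1} du) gives ∫₀^∞ exp(−v^{1/r}/4) dv = 4^r r Γ(r). The sketch below takes b = 1, hence r = 1/3, as an illustrative choice.

```python
import math

b = 1.0
r = b / (1 + 2 * b)                  # r = 1/3 for this illustration
# Midpoint-rule integration of exp(-v^{1/r}/4) = exp(-v^3/4) on [0, 8];
# the integrand is negligible beyond v = 8 since 8^3/4 = 128.
m, hi = 200_000, 8.0
step = hi / m
approx = sum(math.exp(-((i + 0.5) * step) ** (1 / r) / 4) * step
             for i in range(m))
exact = 4 ** r * r * math.gamma(r)   # the claimed limit 4^r r Gamma(r)
```

The two values agree to several decimal places, confirming the constant appearing in (8).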

Proof of Theorem 3. With F₀ and F₁ defined as above, we have

 c_{F₁} − c_{F₀} = c^{−α}(γ^{−γ/αb} − 1) ≥ c^{−α}γ|ln γ|/αb.

Using this inequality, (16) and Lemma 5, we derive

 max_{i∈{0,1}} P_i(|ĉ_n − c_{F_i}| ≥ c^{−α}γ|ln γ|/2αb) ≥ (1 − γ^{1/r}/8α²c^α)^{2n}/4.

Let γ = (vα²c^α/n)^r. Then

 max_{i∈{0,1}} P_i(|ĉ_n − c_{F_i}| ≥ c_{F₀}(vα²c^α/n)^r r ln(n/ln n)/2αb) ≥ (1 − v/8n)^{2n}/4.

Note that |ln γ|/(r ln(n/ln n)) → 1 as n → ∞. The result follows.

Proof of Theorem 4. Denote

 x_i ≡ x_{F_i,n},   y_i = 1/x_i.

Obviously, y_i is the quantile of level q_n of the d.f. F_i. We find it convenient to deal with the equivalent problem of estimating quantiles of the distribution of the random variable Y = 1/X.

With the functions F₀, F₁ defined as above, it is easy to see that

 y₀ = cq_n^{1/α} = cκh,   y₁ = c^{α/α₁}q_n^{1/α₁}h^{γ/α₁} = y₀(cκ)^{−γ/α₁}, (20)

where we put κ = q_n^{1/α}/h. Note that (cκ)^{−γ/α₁} > 1 if cκ < 1; hence y₁ > y₀ if cκ < 1.

Denote

 γ ≡ γ_n(α, b, c) = u(α²c^α/n)^r. (21)

Then h = γ^{1/αb} and

 cκ = u^{−1/αb}s^{1/α}c^{2r}α^{−2r/αb} < 1 (22)

by the assumption.

Using the fact that e^x − 1 ≥ xe^{x/2} (x ≥ 0), we derive

 y₁ − y₀ = y₀((cκ)^{−γ/α₁} − 1) ≥ γ^{1+1/αb}(cκ)^{1−γ/2α₁}|ln cκ|/α₁.

By Lemma 5,

 max_{i∈{0,1}} P_i(|ŷ_n − y_i| ≥ γ^{1+1/αb}(cκ)^{1−γ/2α₁}|ln cκ|/2α₁) ≥ (1 − γ^{1/r}/8α²c^α)^{2n}/4

for any estimator ŷ_n. Taking into account (21) and (22), we derive

 max_{i∈{0,1}} P_i(|ŷ_n/y_i − 1| ≥ uα_{F_i}^{2(r−1)}c_{F_i}^{−r}n^{−r} ln(uα^{2r}/s^b c^{2rα}b) t*_{n,i}/2b) ≥ (1 − u^{1/r}/8n)^{2n}/4.

Recall that x_i = 1/y_i. From (20),

 |x₁ − x₀| = |y₁ − y₀|/y₀y₁ ≥ γ^{1−1/αb}(cκ)^{−1+γ/2α₁}|ln cκ|/α₁.

By Lemma 5,

 max_{i∈{0,1}} P_i(|x̂_n − x_i| ≥ γ^{1−1/αb}(cκ)^{−1+γ/2α₁}|ln cκ|/2α₁) ≥ (1 − γ^{1/r}/8α²c^α)^{2n}/4.

Hence,

 max_{i∈{0,1}} P_i(|x̂_n/x_i − 1| ≥ uα_{F_i}^{2(r−1)}c_{F_i}^{−r}n^{−r}|ln(s^b c^{2rα}b/α^{2r}u)| t̃_{n,i}/2b) ≥ (1 − u^{1/r}/8n)^{2n}/4,

where t̃_{n,i} → 1 as n → ∞. The proof is complete.

The next lemma presents a lower bound to the accuracy of choosing between two “close” alternatives.

Let P be an arbitrary class of distributions, and assume that the quantity of interest, a_P, is an element of a metric space. An estimator â_n of a_P is a measurable function of X₁, …, X_n taking values in a subspace of the metric space.

Examples of such functionals include (a) a_P = θ, where {P_θ} is a parametric family of distributions; (b) a_P = f_P, where f_P is the density of P with respect to a particular measure; (c) a_P = α_P. A minimax lower bound over the class P follows from a lower bound in the case of two alternatives P₀, P₁ ∈ P.

Denote