Logarithm of ratios of two order statistics
and regularly varying tails
Abstract
Here we suppose that the observed random variable has a cumulative distribution function $F$ with regularly varying tail, i.e. $1 - F \in RV_{-\alpha}$, $\alpha > 0$. Using results about exponential order statistics, we investigate logarithms of ratios of two order statistics of a sample of independent observations on a Pareto distributed random variable with parameter $\alpha$. Short explicit formulae for their mean and variance are obtained. Then we transform this function so as to obtain an unbiased, asymptotically efficient, and asymptotically normal estimator for $\alpha$. Finally we simulate Pareto samples and show that in the considered cases the proposed estimator outperforms the well-known Hill, t-Hill, Pickands and Dekkers-Einmahl-de Haan estimators.
Pavlina K. Jordanova, Milan Stehlík
History of the Problem
The usefulness of regularly varying (RV) functions in economics seems to have been discussed for the first time in the modelling of wealth in society by the Pareto distribution, named after Vilfredo Pareto (1897). J. Karamata (1933) provided their definition and integral representation. Later on, the convergence to types theorem, proved by R. A. Fisher and L. H. C. Tippett (1928) and B. V. Gnedenko (1948), played a key role in their further applications. It is well known that this class of distributions describes very well the domain of attraction of stable distributions (see Mandelbrot (1960) [15]) and the max-domain of attraction of the Fréchet distribution (see M. Fréchet (1927)). Laurens de Haan (1970) and coauthors [3, 4, 5] developed the main machinery for working with cumulative distribution functions (c.d.f.s) with such tail behaviour. Recall that a c.d.f. $F$ has a regularly varying right tail with parameter $\alpha > 0$ if
$$\lim_{t \to \infty} \frac{1 - F(tx)}{1 - F(t)} = x^{-\alpha}, \quad x > 0.$$
After their works the topic spread very quickly, and many estimators of the index of regular variation have been proposed; see e.g. Hill (1975) [10], Pickands (1975) [19], Dekkers-Einmahl-de Haan (1989) [6], and t-Hill (Stehlík and coauthors (2010) [24, 9, 25], and Pancheva and Jordanova (2012) [11, 14]), among others.
Here we show the usefulness of functions of two central order statistics in estimating the parameter of regular variation. Under very general settings we show that the logarithm of the ratio of two specific central order statistics is a weakly consistent and asymptotically normal estimator of the logarithm of the ratio of the corresponding theoretical quantiles. Then we use these functions to obtain our estimator for $\alpha$. Its main advantage is that it is very flexible and provides useful accuracy for mid-range and small samples. The Pareto case, considered in Section 3, motivates our investigation. First we define a biased form of the estimator. Then, using results about order statistics, which can be found e.g. in Nevzorov (2001) [18], we obtain explicit formulae for its mean and variance. This allows us to define an unbiased correction which is asymptotically efficient. Then we prove asymptotic normality and obtain large sample confidence intervals. Our simulation study illustrates the advantages of the considered estimators over the Hill, t-Hill, and Dekkers-Einmahl-de Haan estimators. The paper finishes with some concluding remarks.
Throughout the paper we assume that are independent observations on a random variable (r.v.) , and denote by the corresponding increasing order statistics.
$H_{n,m} = \sum_{i=1}^{n} i^{-m}$ denotes the $n$-th generalized harmonic number of power $m$, and $H_n = H_{n,1}$ stands for the well-known $n$-th harmonic number.
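Since harmonic numbers appear in all finite-sample formulae of the paper, it may help to see how they can be evaluated. The following is a minimal sketch, assuming the usual definition $1 + 1/2^m + \dots + 1/n^m$ (with the empty sum for $n = 0$ equal to zero); the function names `generalized_harmonic` and `harmonic` are illustrative and do not come from the paper.

```python
from fractions import Fraction


def generalized_harmonic(n: int, m: int = 1) -> Fraction:
    """Return 1 + 1/2**m + ... + 1/n**m exactly; the empty sum (n = 0) is 0."""
    if n < 0 or m < 0:
        raise ValueError("n and m must be non-negative integers")
    # Exact rational arithmetic avoids rounding error for small n.
    return sum((Fraction(1, i**m) for i in range(1, n + 1)), Fraction(0))


def harmonic(n: int) -> Fraction:
    """Return the ordinary n-th harmonic number (power m = 1)."""
    return generalized_harmonic(n, 1)


print(harmonic(4))                 # 25/12
print(generalized_harmonic(3, 2))  # 49/36
```

Exact fractions are convenient for checking small cases by hand; for large $n$ a floating-point sum is usually sufficient.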
The main object of interest in this point are the statistics
The estimator is obtained in Jordanova et al. [13] via a quantile matching procedure. For this procedure see e.g. Sgouropoulos et al. (2015) [21].
Throughout the paper $\stackrel{d}{\to}$ denotes convergence in distribution.
General Results
In 1933-1949 Smirnoff [22] showed that in the case of central order statistics, and more precisely for and such that and , the asymptotic distribution of is standard normal. Moreover, he appears to have obtained similar results for bivariate order statistics; see e.g. Arnold et al. (1992) [1], p. 226, Mosteller (1946) [16], p. 338, Nair [17], p. 330, or Wilks [27], among others. The multivariate delta method is a very powerful technique for obtaining confidence intervals in such cases. In the next theorem we apply these results and obtain the limiting distribution of the logarithmic differences of central order statistics.
Smirnoff’s theorem. Assume for , , , , and . Then
where the covariance matrix
We apply this theorem together with the Multivariate delta method and obtain asymptotic normality of the estimators, discussed in this paper.
Theorem 1. Consider a sample of , independent observations on a r.v. with c.d.f. and p.d.f. . If there exists and , then for
(0) 
The variance in ( ‣ GENERAL RESULTS) is , where , and
Proof: We apply Smirnoff’s theorem for and and the multivariate delta method.
By assumption the conditions , are satisfied. For we have , , and ; therefore Smirnoff’s theorem on the joint asymptotic normality of order statistics says that
where the asymptotic covariance matrix of this bivariate distribution is
and the asymptotic correlation between these two order statistics is .
Consider the function . For and it is continuously differentiable.
The Jacobian of the transformation is
The asymptotic mean is
Now we apply the multivariate delta method, which can be found e.g. in Sobel (1982) [23], and obtain that the asymptotic variance of is
Q.E.D.
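The delta-method step in the proof can be illustrated numerically. The sketch below assumes the standard form of the asymptotic covariance matrix of two sample quantiles at levels $p_1 < p_2$, with entries $p_i(1 - p_j)/(f(x_{p_i}) f(x_{p_j}))$ for $i \le j$, and the transformation $g(x, y) = \log y - \log x$. The function names, the argument names, and the unit-scale Pareto c.d.f. $1 - x^{-\alpha}$, $x \ge 1$, used in the example are assumptions made for illustration only.

```python
import math


def log_ratio_asymptotic_variance(p1, p2, x1, x2, f1, f2):
    """Delta-method variance of sqrt(n) * log(upper / lower sample quantile).

    p1 < p2 are the quantile levels, x1 and x2 the theoretical quantiles,
    and f1, f2 the density values at x1 and x2.
    """
    if not 0.0 < p1 < p2 < 1.0:
        raise ValueError("need 0 < p1 < p2 < 1")
    if min(x1, x2, f1, f2) <= 0.0:
        raise ValueError("quantiles and densities must be positive")
    # Asymptotic covariance matrix of the two sample quantiles.
    s11 = p1 * (1.0 - p1) / f1**2
    s12 = p1 * (1.0 - p2) / (f1 * f2)
    s22 = p2 * (1.0 - p2) / f2**2
    # Gradient of g(x, y) = log(y) - log(x) at the theoretical quantiles.
    g1, g2 = -1.0 / x1, 1.0 / x2
    # Quadratic form gradient' * Sigma * gradient.
    return g1 * g1 * s11 + 2.0 * g1 * g2 * s12 + g2 * g2 * s22


def pareto_quantile(p, alpha):
    """Quantile of the (assumed) unit-scale Pareto c.d.f. 1 - x**(-alpha)."""
    return (1.0 - p) ** (-1.0 / alpha)


def pareto_pdf(x, alpha):
    """Density of the (assumed) unit-scale Pareto distribution."""
    return alpha * x ** (-alpha - 1.0)


alpha, p1, p2 = 2.0, 0.25, 0.75
x1, x2 = pareto_quantile(p1, alpha), pareto_quantile(p2, alpha)
value = log_ratio_asymptotic_variance(
    p1, p2, x1, x2, pareto_pdf(x1, alpha), pareto_pdf(x2, alpha)
)
print(value)
```

For this Pareto example the quadratic form simplifies to $(p_2 - p_1)/(\alpha^2 (1 - p_1)(1 - p_2))$, which gives a convenient hand check of the numerical value.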
Slutsky’s theorem about continuous functions, together with the definition of convergence in probability, the quantile transform, and Smirnoff’s theorem about a.s. convergence of empirical quantiles to the corresponding theoretical ones, leads us to the following result. Without loss of generality we consider only a.s. positive r.v.s; however, the result can easily be adapted for or , .
Theorem 2. Assume . If , , then for
(0) 
Pareto Case
In this section we assume that are independent observations on a r.v. with Pareto c.d.f.
(0) 
Briefly we will denote this by . Different generalizations of this distribution can be found in Arnold (2015) [2]. The number is called the "index of regular variation of the tail of the c.d.f.". It determines the tail behaviour of the c.d.f.; see e.g. de Haan and Ferreira [5], Resnick [20], or Jordanova [12].
Denote by , the fact that the r.v. has c.d.f.
(0) 
The results in the following theorem allow us later on, in Corollaries 1 and 2, to obtain unbiased, consistent, and asymptotically efficient estimators of the parameter .
Theorem 3. Assume are order statistics of independent observations on a r.v. , , , and are integers.
 i)

Denote by a Beta distributed r.v. with parameters , and . Then
where is the th order statistic in a sample of independent observations on i.i.d. exponential r.v.s with parameter , and is the th order statistic of a sample of independent observations on an exponentially distributed r.v. with parameter . Its probability density function is
 ii)

and
Proof: Let us fix , integers. Because is a strictly increasing function, the probability quantile transform entails
where are order statistics of independent identically distributed (i.i.d.) r.v.s with . Then, because of the multiplicative property of the exponential distribution
where are order statistics of i.i.d. r.v.s with . See e.g. de Haan and Ferreira [5]. Denote the logarithm with base by log. Because , , is an increasing function,
The last equality can be found e.g. in de Haan and Ferreira [5] or Arnold et al. (1992) [1].
i) Follows from the equality , the well-known relation , and the formula for the probability density function (p.d.f.) of order statistics of a sample of i.i.d. r.v.s; see e.g. Nevzorov [18], p. 7.
ii) The mean and the variance of the last order statistics are well studied; see e.g. Nevzorov [18], p. 23. Using his results and the main properties of the expectation and the variance we obtain:
Q.E.D.
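The representation through exponential order statistics can be checked by simulation. The sketch below relies on the classical Rényi representation rather than on the exact indexing of Theorem 3: for generic ranks $i < j$ in a Pareto sample of size $n$ with shape $\alpha$, the log-ratio of the $j$-th to the $i$-th order statistic has mean $(H_{n-i} - H_{n-j})/\alpha$ and variance $(H_{n-i,2} - H_{n-j,2})/\alpha^2$, and the $\alpha$-th power of the inverse ratio is Beta distributed with parameters $n - j + 1$ and $j - i$. The ranks `i`, `j`, the function names, and the unit-scale Pareto sampler are assumptions chosen for illustration.

```python
import math
import random


def harmonic(n, m=1):
    """Generalized harmonic number 1 + 1/2**m + ... + 1/n**m (0 for n = 0)."""
    return sum(1.0 / k**m for k in range(1, n + 1))


def simulate_log_ratios(n, i, j, alpha, reps, seed=0):
    """Simulate log(X_(j) / X_(i)) for unit-scale Pareto(alpha) samples of size n."""
    if not 1 <= i < j <= n:
        raise ValueError("need 1 <= i < j <= n")
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xs = sorted(rng.paretovariate(alpha) for _ in range(n))
        values.append(math.log(xs[j - 1] / xs[i - 1]))
    return values


def theoretical_mean_var(n, i, j, alpha):
    """Mean and variance implied by the Renyi representation."""
    mean = (harmonic(n - i) - harmonic(n - j)) / alpha
    var = (harmonic(n - i, 2) - harmonic(n - j, 2)) / alpha**2
    return mean, var


def sample_mean_var(values):
    """Plain sample mean and (biased) sample variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


n, i, j, alpha = 30, 10, 20, 2.0
sims = simulate_log_ratios(n, i, j, alpha, reps=2000, seed=1)
print(sample_mean_var(sims), theoretical_mean_var(n, i, j, alpha))
```

With more replications the simulated moments settle close to the harmonic-number expressions, and the mean of `exp(-alpha * value)` approaches the Beta mean $(n - j + 1)/(n - i + 1)$.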
The next corollary is useful when working with finite samples. We obtain that for any , and for fixed the estimators are unbiased for . The accuracy of these estimators in that case is explicitly calculated. These estimators are also applicable for large samples, because for they are weakly consistent and asymptotically efficient.
Corollary 1. Assume , are order statistics of independent observations on a r.v. , , . Then, for all , and ,
 i)

Denote by a Beta distributed r.v. with parameters , and . Then
where is the th order statistic in a sample of independent observations on i.i.d. exponential r.v.s with parameter , and is the th order statistic of a sample of independent observations on an exponentially distributed r.v. with parameter . Its probability density function is
 ii)

and
 iii)

For all ,
 iv)

The estimator is asymptotically efficient. For ,
 v)

The estimator is weakly consistent. More precisely, for all ,
Proof: i) and ii) follow from Theorem 3, the definition of , and the relations
iii) is a corollary of ii) and Chebyshev’s inequality.
iv) It is well known that where is the Euler-Mascheroni constant, , and is the digamma function. By ii), for any fixed , we have
(0)  
(0) 
In the last equality we have used the well-known solution of the Basel problem, and more precisely the limit
v) is a consequence of ii), iii) and iv). Q.E.D.
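The limits used in step iv) are easy to inspect numerically. The sketch below uses only classical facts: $H_n - \log n$ tends to the Euler-Mascheroni constant with an error of about $1/(2n)$, and $\pi^2/6 - H_{n,2}$ behaves like $1/n$ (the Basel problem). The last printed column uses hypothetical ranks $3n$ and $n$, chosen only to show how a difference of harmonic numbers approaches a logarithm; they are not the indices of the corollary.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def harmonic(n, m=1):
    """Generalized harmonic number 1 + 1/2**m + ... + 1/n**m."""
    return sum(1.0 / k**m for k in range(1, n + 1))


for n in (10, 100, 1000):
    # Distance of H_n from log(n) + gamma; shrinks like 1 / (2 n).
    gap_gamma = harmonic(n) - math.log(n) - EULER_GAMMA
    # Tail of the Basel series; shrinks like 1 / n.
    gap_basel = math.pi**2 / 6 - harmonic(n, 2)
    # Difference of harmonic numbers at proportional ranks approaches log(3).
    gap_log = harmonic(3 * n) - harmonic(n) - math.log(3)
    print(n, gap_gamma, gap_basel, gap_log)
```

All three gaps shrink at rate $1/n$, which is the rate that matters when a variance is multiplied by the sample size before taking the limit.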
In the previous proof we have seen that for any fixed , . Therefore, although are biased, they are asymptotically unbiased, asymptotically normal, weakly consistent and asymptotically efficient estimators for . The next conclusions follow by the relation , and the main properties of the mean and the variance.
Corollary 2. Assume , are order statistics of independent observations on a r.v. , , .
 i)

Denote by a Beta distributed r.v. with parameters , and . Then, for all ,
where is the th order statistic in a sample of independent observations on i.i.d. exponential r.v.s with parameter , and is the th order statistic of a sample of independent observations on an exponentially distributed r.v. with parameter . Its probability density function is
 ii)

For all , and
 iii)

For all , and ,
 iv)

estimator is asymptotically unbiased and asymptotically efficient. More precisely
 v)

estimator is weakly consistent. For all ,
Applications of the previous results require knowledge about confidence intervals. Therefore, in the next theorem, we obtain asymptotic normality of these estimators, which allows us later on to construct large sample confidence intervals.
Theorem 4. If , , , then for all , and ,
(0) 
(0) 
(0) 
Proof: In this case , and . Therefore, , ,
For we have , , , and therefore we can apply Smirnoff’s theorem about the joint asymptotic normality of the order statistics and Theorem 1. In order to determine and let us note that . Therefore
The equalities
lead us to ( ‣ PARETO CASE). When we multiply the numerator in ( ‣ PARETO CASE) by , and the denominator by , and use that we obtain ( ‣ PARETO CASE). If we multiply both sides of ( ‣ PARETO CASE) by , and use that we obtain ( ‣ PARETO CASE). Q.E.D.
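Asymptotic normality of a log-ratio of order statistics is what makes large sample intervals possible. The following sketch shows one way such an interval could be built; it is not the paper's formula. It assumes generic ranks $i < j$, the moments $(H_{n-i} - H_{n-j})/\alpha$ and $(H_{n-i,2} - H_{n-j,2})/\alpha^2$ of the log-ratio for a Pareto sample, and a normal approximation for the standardized log-ratio. The function name `log_ratio_interval` and the moment-matching point estimate it returns are hypothetical.

```python
import math
import random
from statistics import NormalDist


def harmonic(n, m=1):
    """Generalized harmonic number 1 + 1/2**m + ... + 1/n**m (0 for n = 0)."""
    return sum(1.0 / k**m for k in range(1, n + 1))


def log_ratio_interval(sample, i, j, level=0.95):
    """Return (lower, point, upper) for the Pareto shape from ranks i < j.

    Uses that alpha * log(X_(j) / X_(i)) has mean d1 and variance d2 that do
    not depend on alpha, and treats its standardized value as standard normal.
    """
    n = len(sample)
    if not 1 <= i < j <= n:
        raise ValueError("need 1 <= i < j <= len(sample)")
    xs = sorted(sample)
    log_ratio = math.log(xs[j - 1] / xs[i - 1])
    if log_ratio <= 0.0:
        raise ValueError("the two order statistics coincide")
    d1 = harmonic(n - i) - harmonic(n - j)
    d2 = harmonic(n - i, 2) - harmonic(n - j, 2)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(d2)
    # Invert |alpha * log_ratio - d1| <= half_width for alpha.
    lower = max(0.0, (d1 - half_width) / log_ratio)
    upper = (d1 + half_width) / log_ratio
    return lower, d1 / log_ratio, upper


rng = random.Random(0)
data = [rng.paretovariate(1.5) for _ in range(200)]
print(log_ratio_interval(data, 50, 150))
```

The interval is obtained by inverting a pivot, so its width scales with the point estimate; the normal approximation is reasonable only when the number of ranks between `i` and `j` is not small.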
Now we are ready to compute the corresponding confidence intervals. Let us choose and denote by , the quantile of the standard normal distribution. Using ( ‣ PARETO CASE), and the definition of we obtain
Therefore for any fixed , the corresponding asymptotic confidence intervals for when are: