# A New Information Theoretical Concept: Information-Weighted Heavy-tailed Distributions

H.M. de Oliveira and R.J.S. Cintra
Federal University of Pernambuco, UFPE,
Statistics Department,
CCEN-UFPE, Recife, Brazil.
http://arxiv.org/a/deoliveira_h_1.html
http://arxiv.org/a/cintra_r_1.html
###### Abstract.

Given an arbitrary continuous probability density function, a conjugated probability density is introduced, which is defined through the Shannon information associated with its cumulative distribution function. These new densities are computed for a number of standard distributions, including uniform, normal, exponential, Pareto, logistic, Kumaraswamy, Rayleigh, Cauchy, Weibull, and Maxwell-Boltzmann. The case of joint information-weighted probability distributions is assessed. An additive property is derived in the case of independent variables. One-sided and two-sided information-weighting are considered. The asymptotic behavior of the tail of the new distributions is examined. It is proved that all probability densities proposed here define heavy-tailed distributions. It is shown that weighting a regularly varying distribution with a given extreme-value index again yields a regularly varying distribution with the same index. This approach can be particularly valuable in applications where the tails of the distribution play a major role.

###### Key words and phrases:
information theory, information-weighted probability distribution, conjugated probability density function, heavy-tailed distributions.
###### 2010 Mathematics Subject Classification:
60E05, 62B10, 62E15, 94A15.
This paper is dedicated to Professor Fernando Menezes Campello de Souza (PhD, Cornell), on his forthcoming 70th birthday, who has been steadfast in creating an exceptional atmosphere of interest in Statistics, and whose philosophy had a decisive influence on the authors' way of looking at the world.

## 1. Preliminaries

Information theory is a subject of relevance in many areas, particularly in Statistics (MacKay; Cover-Thomas). Given an arbitrary random variable $X$ with a continuous probability density function (pdf), $f_X(x)$, we can compute the (Shannon) information amount associated with the event $\{X \le x\}$ for each $x$. This is given by $-\log F_X(x)$, where $F_X$ denotes the cumulative distribution function of $X$.

###### Definition 1.1.

(cumulative information pdf) The information-weighted density is defined by:

$$f_{IX}(x) := -f_X(x)\cdot\log F_X(x). \tag{1.1}$$
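
The definition can be illustrated with a minimal sketch. The helper name `information_weighted_pdf` and the choice of a unit-rate exponential distribution are assumptions made here for illustration only; they do not come from the paper.

```python
import math
from typing import Callable


def information_weighted_pdf(
    pdf: Callable[[float], float], cdf: Callable[[float], float]
) -> Callable[[float], float]:
    """Return x -> -pdf(x) * log(cdf(x)), following Def. 1.1 (hypothetical helper)."""

    def weighted(x: float) -> float:
        F = cdf(x)
        if F <= 0.0:
            # log(0) is undefined: the product is 0 where the original
            # density vanishes and diverges where it is still positive.
            return 0.0 if pdf(x) == 0.0 else math.inf
        return -pdf(x) * math.log(F)

    return weighted


# Illustrative choice: exponential distribution with rate 1.
def exp_pdf(x: float, rate: float = 1.0) -> float:
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def exp_cdf(x: float, rate: float = 1.0) -> float:
    # -expm1(-t) equals 1 - exp(-t) but is more accurate for small t.
    return -math.expm1(-rate * x) if x >= 0 else 0.0


f_ix = information_weighted_pdf(exp_pdf, exp_cdf)

# At the median x = ln 2, F_X = 1/2 and f_X = 1/2, so f_IX = (1/2) * ln 2.
print(f_ix(math.log(2.0)), 0.5 * math.log(2.0))
```

Passing the pdf and CDF as separate callables keeps the sketch independent of any particular statistics library.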

Let us define an operator $I$, which maps a probability density $f_X$ into another function $f_{IX}$ according to Def. 1.1. This can be interpreted as a probability density pair: the new density is the former density, weighted by the information provided by its cumulative distribution. In the framework of distribution generalization theory, a mapping that takes one distribution into another allows the construction of several new distributions (e.g. leao), which is particularly attractive because the shape of the new distribution is quite flexible. For instance, the beta generalized normal distribution (cintra2014) encompasses the beta normal, beta Laplace, normal, and Laplace distributions as sub-models. This article has a somewhat similar scope, providing a way to generate new probability distributions. Noteworthy here, however, is the construction of heavy-tailed distributions, even from distributions that do not have this attribute.
The information-conjugated distribution is denoted by inserting an $I$ before the standard distribution, e.g. a normal distribution $\mathcal{N}(\mu,\sigma^2)$ becomes $I\mathcal{N}(\mu,\sigma^2)$. (Remark: the terms information-conjugated and information-weighted are used interchangeably throughout the paper.) The first property of a conjugated pdf concerns its support:

###### Corollary 1.2.

The support of $f_{IX}$ is contained in the support of $f_X$, i.e. $\operatorname{supp} f_{IX} \subseteq \operatorname{supp} f_X$.

Indeed

$$E\left(-\log F_X(X)\right) = -\int_{-\infty}^{\infty} f_X(x)\cdot\log F_X(x)\,dx. \tag{1.2}$$

This expression recalls the original definition of Shannon for the differential entropy of a continuous distribution (see Michalowicz), which is defined by

$$H(X) := -\int_{-\infty}^{\infty} f_X(x)\cdot\log f_X(x)\,dx. \tag{1.3}$$

One of the troubling questions of this setting is the possibility of negative values for $H(X)$. This is due to the fact that $f_X(x)$ is not bounded above by unity. Replacing $f_X$ by $F_X$ in the argument of the logarithm was our initial attempt to address this issue, bearing in mind that $0 \le F_X(x) \le 1$. However, rather than redefining entropy, this replacement always resulted in a unit integral, which led to the proposal laid down in this paper. The differential entropy also has an interesting link with wavelet analysis (deO). We show in the sequel that the integral in Eq. (1.2) always equals unity, whatever the original probability density. Thus, the operator preserves probability densities and the calculation of the area under the curve is an isometry.
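
As a worked illustration of both remarks, consider a uniform density on $(0,a)$. This example and the parameter $a > 0$ are chosen here for convenience and are not singled out in the text above. With $f_X(x) = 1/a$ and $F_X(x) = x/a$ on $(0,a)$:

$$
H(X) = -\int_{0}^{a} \frac{1}{a}\log\frac{1}{a}\,dx = \log a,
\qquad
-\int_{0}^{a} \frac{1}{a}\log\frac{x}{a}\,dx = -\int_{0}^{1} \log u\,du = 1 .
$$

The differential entropy is therefore negative whenever $a < 1$, whereas the integral with $F_X$ in the logarithm equals unity for every $a > 0$.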

###### Proposition 1.3.

$f_{IX}(x)$ is a valid probability density.

###### Proof.

In order to prove that this is a normalized nonnegative function, we shall show that:

$$\text{i)}\ \ \forall x,\ f_{IX}(x) \ge 0; \qquad \text{ii)}\ \ \int_{-\infty}^{\infty} f_{IX}(x)\,dx = 1.$$

We remark first that $0 \le F_X(x) \le 1$, hence $-\log F_X(x) \ge 0$, so (i) follows. Then we take

$$I := \int_{-\infty}^{\infty} f_{IX}(x)\,dx = -\int_{-\infty}^{\infty} f_X(x)\cdot\log F_X(x)\,dx, \tag{1.4}$$

which can be rewritten in terms of a Stieltjes integral (Protter):

$$I = -\int_{-\infty}^{\infty} \log F_X(x)\,dF_X(x). \tag{1.5}$$

Note that $F_X$ is the cumulative distribution function (CDF) of $X$. Integrating by parts, we derive:

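$$I = -\left[\log F_X(+\infty)\right]\cdot F_X(+\infty) + \left[\log F_X(-\infty)\right]\cdot F_X(-\infty) + \int_{-\infty}^{\infty} F_X(x)\,d\log F_X(x), \tag{1.6}$$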

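Both boundary terms vanish: $F_X(+\infty) = 1$ gives $\log F_X(+\infty) = 0$, and $u \log u \to 0$ as $u \to 0^{+}$ handles the term at $-\infty$, so that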

$$I = \int_{-\infty}^{\infty} F_X(x)\,d\log F_X(x) = \int_{-\infty}^{\infty} dF_X(x) = 1. \tag{1.7}$$

It is also straightforward to derive (by simple integration) that the CDF associated with the pdf of Def. 1.1 is:

$$F_{IX}(x) = F_X(x)\cdot\left[1 - \log F_X(x)\right]. \tag{1.8}$$

As expected, $F_{IX}(-\infty) = 0$ (since $\lim_{u \to 0^{+}} u \log u = 0$) and $F_{IX}(+\infty) = 1$.
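
The normalization result and the closed-form CDF can be checked numerically. The following is a minimal sketch, not part of the paper: the standard logistic distribution, the integration range $[-40, 40]$, and the midpoint rule are all choices made here for illustration.

```python
import math


def logistic_cdf(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logistic_pdf(x: float) -> float:
    F = logistic_cdf(x)
    return F * (1.0 - F)


def weighted_pdf(x: float) -> float:
    # Information-weighted density: -f_X(x) * log F_X(x).
    return -logistic_pdf(x) * math.log(logistic_cdf(x))


def weighted_cdf(x: float) -> float:
    # Closed form: F_X(x) * (1 - log F_X(x)).
    F = logistic_cdf(x)
    return F * (1.0 - math.log(F))


def midpoint_integral(func, a: float, b: float, n: int = 100_000) -> float:
    """Composite midpoint rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


# The range [-40, 40] is an illustrative truncation of the real line.
total = midpoint_integral(weighted_pdf, -40.0, 40.0)
partial = midpoint_integral(weighted_pdf, -40.0, 0.5)

print("area under weighted pdf:", total)
print("numerical CDF at 0.5   :", partial)
print("closed-form CDF at 0.5 :", weighted_cdf(0.5))
```

Under these assumptions the total area should be close to 1, and the numerically integrated CDF should agree with the closed form up to the discretization error of the midpoint rule.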
How can probabilistic events described by long-tailed distributions be modeled? Relatively few distributions are used in this setting (e.g. Cauchy, log-normal, Weibull, Burr…), most notably the Pareto distribution. A readable review of different classes of distributions with heavy tails can be found in (Werner). We are concerned particularly with two classes:

• class D: subexponential distributions,

• class C: regular variation with tail index $\alpha$.

We show in the sequel that this paper offers a profusion of new options, primarily concerning the class of subexponential distributions (Goldie).

## 2. Conjugated Information-Weighted Density Associated with Known Distributions

We now compute the conjugated information density associated with the standard distributions listed in Table 2 (see Walpole).
