
A new distribution function with bounded support: the reflected Generalized Topp-Leone Power Series distribution

Abstract

In this paper we introduce a new flexible class of distributions with bounded support, called the reflected Generalized Topp-Leone Power Series (rGTL-PS) class, obtained by compounding the reflected Generalized Topp-Leone distribution (van Dorp and Kotz, 2006) with the family of Power Series distributions. The proposed class includes, as special cases, some new distributions with bounded support such as the rGTL-Logarithmic, the rGTL-Geometric, the rGTL-Poisson and the rGTL-Binomial. This work is an attempt to partially fill a gap in the literature regarding continuous distributions with bounded support, which are very useful in many real contexts, including reliability. Some properties of the class, including moments, hazard rate and quantiles, are investigated. Moreover, the maximum likelihood estimators of the parameters are examined and the observed Fisher information matrix is provided. Finally, in order to show the usefulness of the new class, some applications to real data are reported.

Francesca Condino and Filippo Domma *

Department of Economics, Statistics and Finance

University of Calabria - Italy.

francesca.condino@unical.it and f.domma@unical.it

* Corresponding author: f.domma@unical.it

Key words: Compound Class, Bounded Support, Hazard Function, Flexible shape.

1 Introduction

In recent years, many authors have focused their attention on proposing new and more flexible distribution functions, constructed using various transformation techniques such as, for example, the Beta-generated distribution by Eugene et al. (2002) and Jones (2004), the Gamma-generated distribution by Zografos and Balakrishnan (2009), the Kumaraswamy-generated distribution by Cordeiro and de Castro (2011), the McDonald-generated distribution by Alexander et al. (2012), Ristic and Balakrishnan (2012), the Weibull-generated distribution by Bourguignon et al. (2014), just to name a few. Such transformation techniques may be viewed as special cases of the Transformed-Transformer method proposed by Alzaatreh et al. (2013). Other proposals are based on Azzalini's method (Azzalini, 1985, 1986) and its extensions (Domma et al., 2015). Using these techniques, many of the known distributions such as, for example, the Normal, Exponential, Weibull, Logistic, Pareto, Dagum and Singh-Maddala, have been generalized.
Almost all these new proposals concern distributions with unbounded support. Set against the numerous proposals of distributions with unbounded support, there is undoubtedly a great scarcity of distributions with bounded support (Marshall and Olkin, 2007, p. 473), although there are many real-life situations in which the observations clearly can take values only in a limited range, such as percentages, proportions or fractions. Papke and Wooldridge (1996) claim that variables bounded between zero and one arise naturally in many economic settings, e.g. the fraction of total weekly hours spent working, the proportion of income spent on non-durable consumption, pension plan participation rates, industry market shares, television ratings, the fraction of land area allocated to agriculture, etc. Various examples of proportions in the unit interval used in empirical finance are discussed in Cook et al. (2008).
In reliability analysis too, different authors refer to continuous models with finite support in order to describe lifetime data. This is often motivated by physical reasons such as the finite lifetime of a component or the bounded signals occurring in industrial systems (see, for example, Jiang, 2013; Dedecius and Ettler, 2013). From this perspective, models with infinite support can be viewed as an approximation of reality. Furthermore, when reliability is measured as a percentage or ratio, it is important to have models defined on the unit interval (Genç, 2013) in order to obtain plausible results.

It is well known that the most widely used distribution for modelling continuous variables in the unit interval is the Beta distribution. Its popularity is certainly due to the great flexibility of its density function, which can be constant, increasing, decreasing, unimodal or uniantimodal depending on the values of its parameters. This distribution has been used in various fields of science such as, for example, biology, ecology, engineering, economics, demography, finance, etc. (for a detailed discussion see Johnson et al., 1995, and Nadarajah and Kotz, 2007). On the other hand, the mathematical difficulties underlying the use of this model are also well known: its distribution function cannot be expressed in closed form and its evaluation involves the incomplete beta function ratio.
Recently, several authors have proposed an alternative to the Beta distribution by recovering the distribution proposed by Kumaraswamy in 1980 in the context of hydrology studies. As pointed out by Jones (2009), Kumaraswamy’s distribution has many of the properties of the Beta distribution and some advantages in terms of tractability, in particular its distribution function has a closed form and it does not involve any special function. In fact, this distribution turns out to be a special case of the generalized Beta of the first type proposed by McDonald (1984) (see Nadarajah (2008)).

Perhaps thanks to the work of Nadarajah and Kotz (2003), interest has recently been renewed in another distribution defined on bounded support, the Topp-Leone (TL) distribution, proposed by Topp and Leone (1955) and subsequently studied by several authors (see, for example, Ghitany, 2007; Genç, 2012; Vicari et al., 2008), also in the reliability context (Ghitany et al., 2005; Genç, 2013; Condino et al., 2014).

Starting from a generalized version of the TL distribution, van Dorp and Kotz (2006) proposed the reflected Generalized Topp-Leone (rGTL) distribution. Similarly to the Beta distribution, this distribution has a density that can be constant, increasing, decreasing, unimodal or uniantimodal, depending on the values of its parameters. Moreover, it has a strictly positive density value at its lower bound and a closed form for its distribution function. However, the hazard function of the rGTL shows a certain rigidity, since it is always increasing. This is a real weakness of the model, in particular in the fields of reliability theory and survival analysis.

With the aim of proposing a new flexible model defined on the unit interval, in this paper we consider the standard rGTL distribution and introduce the reflected Generalized Topp-Leone Power Series (rGTL-PS) class of distributions, obtained by compounding the Power Series distributions and the standard rGTL.

In the recent literature, many new distributions have been obtained by compounding a continuous distribution with a discrete one. An interesting motivation for this procedure can be found in the process underlying the failure of a series system composed of $N$ components. If $X_i$ is the lifetime of the $i$th component, the system will fail when the first component fails, so the lifetime of the whole system is $X_{(1)}=\min\{X_1,\dots,X_N\}$. In this situation, assuming that the lifetimes of the components are independent with a common distribution, and independent of $N$, it is easy to obtain the probability of failure for the system by compounding the distribution of the $X_i$'s and the distribution of $N$, as follows:

$$P(X_{(1)}\le x)=\sum_{n=1}^{\infty}P(N=n)\,\left\{1-\left[1-P(X_i\le x)\right]^{n}\right\}.$$

By considering different cumulative distribution functions for $X_i$ and for $N$, various models have been proposed in recent years. This is the case, for example, of the exponential-geometric distribution (Adamidis and Loukas, 1998), of the exponential-Poisson-Lindley distribution (Barreto-Souza and Bakouch, 2013) and of the exponential-logarithmic distribution (Tahmasbi and Rezaei, 2008), to name a few. Other distributions, such as the Weibull Power Series (Morais and Barreto-Souza, 2011), are obtained by describing the random variable $N$ through the Power Series distribution, or by considering a procedure similar to the one just mentioned, involving the maximum rather than the minimum of the lifetimes (Nadarajah et al., 2013; Flores et al., 2013).
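The compounding argument for a series system can be checked numerically. The following is a minimal sketch under illustrative assumptions that are not taken from the paper: component lifetimes are Uniform(0,1) and the number of components is geometric on {1, 2, ...}. All function names are hypothetical. It compares the series form of the system failure probability with a closed form for this particular pair and with a Monte Carlo estimate.

```python
import random

def draw_system_lifetime(theta, rng):
    """Lifetime of a series system with a random number of components.

    Illustrative assumptions: N is geometric on {1, 2, ...} with
    P(N = n) = (1 - theta) * theta**(n - 1), and the component lifetimes
    are independent Uniform(0, 1) variables.
    """
    n = 1
    while rng.random() < theta:  # add one more component with probability theta
        n += 1
    return min(rng.random() for _ in range(n))

def failure_prob_series(x, theta, n_max=500):
    """Sum over n of P(N = n) * P(minimum of n lifetimes <= x)."""
    total = 0.0
    for n in range(1, n_max + 1):
        p_n = (1.0 - theta) * theta ** (n - 1)
        total += p_n * (1.0 - (1.0 - x) ** n)
    return total

def failure_prob_closed(x, theta):
    """Closed form of the same sum for the uniform/geometric pair."""
    return x / (1.0 - theta * (1.0 - x))

def failure_prob_monte_carlo(x, theta, n_draws=20000, seed=0):
    """Empirical frequency of system failures before time x."""
    rng = random.Random(seed)
    hits = sum(draw_system_lifetime(theta, rng) <= x for _ in range(n_draws))
    return hits / n_draws
```

The three functions should agree up to truncation and simulation error, which illustrates why the compound distribution function is available in closed form whenever the sum of the power series is.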

The resulting rGTL-PS distribution preserves the main advantage of the rGTL distribution over the Beta distribution: its cumulative distribution function has a closed form, so the quantile function is easily obtained and random variates from the rGTL-PS distribution can be generated easily. Furthermore, the rGTL-PS density function has a flexibility analogous to that of the Beta density function, i.e. the density can be increasing, decreasing, unimodal or uniantimodal. Finally, the hazard rate, besides being increasing or bathtub-shaped as for the Beta distribution, can also show an N-shape (very useful in the context of reliability theory; see, for example, Bebbington et al. (2009), Lai and Izadi (2012)). These properties make our proposal a valid alternative to the Beta distribution.

This article is organized as follows. In Section 2, we define the rGTL-PS distribution. Some properties, such as the moments, the hazard rate and the quantile are derived in Section 3. The maximum likelihood estimation is discussed in Section 4 and some special cases are studied in Section 5. Finally, various applications on real data sets are reported in Section 6.

2 Reflected Generalized Topp-Leone Power Series distribution

A random variable $X$ is said to have a standard reflected Generalized Topp-Leone (rGTL) distribution (van Dorp and Kotz, 2006) if its cumulative distribution function (cdf) and the corresponding probability density function (pdf) are respectively given by

(1)

and

(2)

with and . The density of the rGTL can be strictly decreasing, strictly increasing or may possess a mode or an anti-mode, according to the values of the parameters.
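As an illustrative sketch only, the following block writes out the cdf and pdf that result from reflecting the generalized Topp-Leone cdf $\{\alpha x-(\alpha-1)x^{2}\}^{\beta}$ about the midpoint of the unit interval. The symbols $G$, $g$, $\alpha$, $\beta$ and the stated parameter range are assumptions made here for illustration and may differ from the parameterization in (1) and (2).

```latex
% Assumed parameterization: reflection x -> 1 - x of the generalized
% Topp-Leone cdf {alpha x - (alpha - 1) x^2}^beta on the unit interval.
\begin{align*}
G(x;\alpha,\beta) &= 1-\left\{\alpha(1-x)-(\alpha-1)(1-x)^{2}\right\}^{\beta},
  \qquad 0\le x\le 1,\\
g(x;\alpha,\beta) &= \beta\left\{\alpha(1-x)-(\alpha-1)(1-x)^{2}\right\}^{\beta-1}
  \left\{\alpha-2(\alpha-1)(1-x)\right\},
  \qquad 0<\alpha\le 2,\ \beta>0 .
\end{align*}
```

Under this assumed form, $g(0;\alpha,\beta)=\beta(2-\alpha)$, which is strictly positive for $\alpha<2$ and is therefore consistent with the positive density value at the lower bound mentioned in the Introduction.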

In order to define the new distribution, we consider a sequence of independent and identically distributed continuous random variables $X_1,\dots,X_N$, with distribution function $G(\cdot)$, where $N$ is a discrete random variable following a Power Series distribution truncated at zero, with probability function (pf) given by

$$P(N=n)=\frac{a_n\theta^{n}}{C(\theta)},\qquad n=1,2,\dots \tag{3}$$

where $C(\theta)=\sum_{n=1}^{\infty}a_n\theta^{n}$ is finite, $a_n\ge 0$ and $\theta>0$. In Table 1 the most common distributions belonging to the Power Series family are reported.

Distribution range of
Logarithmic
Geometric
Poisson
Binomial
Table 1: Some special cases of the Power Series distribution.
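For concreteness, the four families named in Table 1 can be encoded through their coefficients and series sums. The sketch below uses the standard textbook forms of $a_n$, $C(\theta)$ and $C'(\theta)$ for the zero-truncated logarithmic, geometric, Poisson and binomial laws; that these match the entries of Table 1 is an assumption, and the names `make_families` and `pf` are hypothetical.

```python
import math

def make_families(m=5):
    """Return a_n, C(theta) and C'(theta) for four power series laws.

    Standard textbook forms (assumed to correspond to Table 1); `m` is the
    number of trials of the binomial member.
    """
    return {
        # logarithmic: 0 < theta < 1
        "logarithmic": {"a": lambda n: 1.0 / n,
                        "C": lambda t: -math.log(1.0 - t),
                        "dC": lambda t: 1.0 / (1.0 - t)},
        # geometric: 0 < theta < 1
        "geometric": {"a": lambda n: 1.0,
                      "C": lambda t: t / (1.0 - t),
                      "dC": lambda t: 1.0 / (1.0 - t) ** 2},
        # Poisson: theta > 0
        "poisson": {"a": lambda n: 1.0 / math.factorial(n),
                    "C": lambda t: math.exp(t) - 1.0,
                    "dC": lambda t: math.exp(t)},
        # binomial: theta > 0, a_n = 0 for n > m
        "binomial": {"a": lambda n: float(math.comb(m, n)),
                     "C": lambda t: (1.0 + t) ** m - 1.0,
                     "dC": lambda t: m * (1.0 + t) ** (m - 1)},
    }

def pf(family, n, theta):
    """Zero-truncated power series pf: a_n * theta**n / C(theta), n >= 1."""
    return family["a"](n) * theta ** n / family["C"](theta)
```

Summing `pf` over n for any admissible theta should return one, which is a quick way to check that a coefficient sequence and its series sum have been paired correctly.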

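Let $X_{(1)}=\min\{X_1,\dots,X_N\}$. The conditional pdf of $X_{(1)}$ given that $N=n$, i.e. $f_{X_{(1)}|N}(x|n)$, can be obtained from the distribution of the minimum of $n$ random variables with common cdf $G$ and pdf $g$ (the rGTL cdf and pdf in (1) and (2)):

$$f_{X_{(1)}|N}(x|n)=n\,g(x)\,[1-G(x)]^{n-1}. \tag{4}$$

Hence, the joint pdf of $(X_{(1)},N)$ is given by

$$f_{X_{(1)},N}(x,n)=\frac{a_n\theta^{n}}{C(\theta)}\,n\,g(x)\,[1-G(x)]^{n-1}. \tag{5}$$

By denoting with $C'(\cdot)$ the derivative of $C(\cdot)$ with respect to the argument, the marginal pdf of $X_{(1)}$ is

$$f(x)=\theta\,g(x)\,\frac{C'\left(\theta[1-G(x)]\right)}{C(\theta)}, \tag{6}$$

with cdf given by

$$F(x)=1-\frac{C\left(\theta[1-G(x)]\right)}{C(\theta)}. \tag{7}$$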

Replacing the expressions (1) and (2) in (6) and (7), we obtain the rGTL-PS pdf and the corresponding cdf as follows:

(8)
(9)

It is evident, from (8), that the density of the rGTL-PS class can be expressed as a mixture of rGTL densities, $g(x;\alpha,n\beta)$, with weights $P(N=n)=a_n\theta^{n}/C(\theta)$. In the following, we rely on this property because it enables us to obtain some mathematical properties of the rGTL-PS distributions, such as the moments.
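The mixture representation can be illustrated numerically for the logarithmic member of the class. This is a minimal sketch, assuming the rGTL survival function is $\{\alpha(1-x)-(\alpha-1)(1-x)^{2}\}^{\beta}$ (an assumed parameterization) and the logarithmic forms $a_n=1/n$, $C(\theta)=-\log(1-\theta)$. All function names are hypothetical.

```python
import math

def rgtl_sf(x, alpha, beta):
    """Survival function 1 - G(x) of the rGTL (assumed parameterization)."""
    t = 1.0 - x
    return (alpha * t - (alpha - 1.0) * t * t) ** beta

def rgtl_pdf(x, alpha, beta):
    """rGTL density under the same assumed parameterization."""
    t = 1.0 - x
    base = alpha * t - (alpha - 1.0) * t * t
    return beta * base ** (beta - 1.0) * (alpha - 2.0 * (alpha - 1.0) * t)

def rgtl_log_pdf(x, alpha, beta, theta):
    """Compound density theta * g * C'(theta * S) / C(theta), logarithmic case."""
    s = rgtl_sf(x, alpha, beta)
    c_theta = -math.log(1.0 - theta)
    return theta * rgtl_pdf(x, alpha, beta) / ((1.0 - theta * s) * c_theta)

def rgtl_log_cdf(x, alpha, beta, theta):
    """Compound cdf 1 - C(theta * S) / C(theta), logarithmic case."""
    s = rgtl_sf(x, alpha, beta)
    return 1.0 - math.log(1.0 - theta * s) / math.log(1.0 - theta)

def rgtl_log_pdf_mixture(x, alpha, beta, theta, n_max=400):
    """Same density written as a mixture of rGTL(alpha, n * beta) densities."""
    c_theta = -math.log(1.0 - theta)
    total = 0.0
    for n in range(1, n_max + 1):
        weight = theta ** n / (n * c_theta)  # logarithmic pf with a_n = 1/n
        total += weight * rgtl_pdf(x, alpha, n * beta)
    return total
```

The closed-form density and the truncated mixture should coincide to numerical precision, because the minimum of n independent rGTL variables with parameters (alpha, beta) is, under this parameterization, again rGTL with parameters (alpha, n * beta).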

3 Statistical Properties

In this section, we study some properties of the rGTL-PS distribution. In particular, we determine the rth incomplete moment and the ordinary moments using the known properties of mixture distributions. Moreover, we compute the quantile and the hazard rate, evaluating the behaviour of the latter at the extremes of the support.

3.1 Moments

In order to calculate the rth moment of the rGTL-PS distribution, we first determine the incomplete moment of order r for an rGTL distribution; then, by using the properties of mixture distributions, we calculate the rth moment of an rGTL-PS distribution.

Lemma 1

If then the incomplete moment of order is

where and .

Proof Using , with we can write

Corollary 2

If then the moment of order is

(10)

Proof It is enough to put in the Lemma.

From (10), it is easy to obtain the moment of order of the rGTL-PS distribution:
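The mixture route to the moments can be sketched numerically. In the block below, numerical quadrature stands in for the closed-form expression (10): the rth moment of a variable on the unit interval is computed as $r\int_0^1 x^{r-1}S(x)\,dx$, where $S$ is the survival function. The rGTL survival function $\{\alpha(1-x)-(\alpha-1)(1-x)^{2}\}^{\beta}$ and the Poisson forms $a_n=1/n!$, $C(\theta)=e^{\theta}-1$ are assumptions, and the function names are hypothetical.

```python
import math

def rgtl_sf(x, alpha, beta):
    """Survival function 1 - G(x) of the rGTL (assumed parameterization)."""
    t = 1.0 - x
    return (alpha * t - (alpha - 1.0) * t * t) ** beta

def moment_from_sf(sf, r, n_grid=4000):
    """E[X^r] = r * integral_0^1 x**(r-1) * S(x) dx, by the midpoint rule."""
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        x = (i + 0.5) * h
        total += x ** (r - 1) * sf(x)
    return r * h * total

def rgtl_moment(r, alpha, beta):
    """rth moment of the rGTL, by quadrature rather than by (10)."""
    return moment_from_sf(lambda x: rgtl_sf(x, alpha, beta), r)

def rgtl_poisson_moment_direct(r, alpha, beta, theta):
    """Moment from the compound survival function C(theta * S) / C(theta)."""
    c_theta = math.expm1(theta)
    return moment_from_sf(
        lambda x: math.expm1(theta * rgtl_sf(x, alpha, beta)) / c_theta, r)

def rgtl_poisson_moment_mixture(r, alpha, beta, theta, n_max=60):
    """Moment as sum over n of P(N = n) * E[X^r] under rGTL(alpha, n * beta)."""
    c_theta = math.expm1(theta)
    total = 0.0
    for n in range(1, n_max + 1):
        weight = theta ** n / (math.factorial(n) * c_theta)
        total += weight * rgtl_moment(r, alpha, n * beta)
    return total
```

Both routes should return the same value, which is the numerical counterpart of obtaining the rGTL-PS moment as a weighted sum of rGTL moments.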

3.2 Hazard rate

In this section, we verify that the hazard rate of rGTL-PS distribution is more flexible than the hazard rate of the rGTL.

First of all, it is easy to show that the hazard rate of rGTL distribution is always increasing. Indeed, by (1) and (2) we obtain the hazard rate for the rGTL, as follows:

(11)

and, after easy algebra, the derivative of (11) with respect to is given by

that is positive , and .
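As a worked illustration of why the rGTL hazard is increasing, the following derivation assumes the rGTL survival function $1-G(x;\alpha,\beta)=\{(1-x)[1+(\alpha-1)x]\}^{\beta}$ with $0<\alpha\le 2$ and $\beta>0$. This parameterization and the symbol $h_G$ are assumptions made here, so the expressions may differ in form from (11).

```latex
% Sketch under an assumed parameterization of the rGTL distribution.
\begin{align*}
h_G(x) &= \frac{g(x;\alpha,\beta)}{1-G(x;\alpha,\beta)}
        = -\frac{d}{dx}\log\left\{1-G(x;\alpha,\beta)\right\}
        = \beta\left[\frac{1}{1-x}-\frac{\alpha-1}{1+(\alpha-1)x}\right],\\
\frac{d\,h_G(x)}{dx} &= \beta\left[\frac{1}{(1-x)^{2}}
        +\frac{(\alpha-1)^{2}}{\{1+(\alpha-1)x\}^{2}}\right] > 0,
        \qquad 0<x<1 .
\end{align*}
```

Both terms in the derivative are squares multiplied by $\beta>0$, so under this assumption the hazard is strictly increasing for every admissible pair $(\alpha,\beta)$.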

Starting from (6) and (7), we obtain the expression for the hazard rate of the rGTL-PS distribution, as follows:

(12)

The hazard rate is a complex function of . However, from (12) we have that for , the hazard rate tends to , while for the hazard rate tends to . Indeed, we have:

(13)

where has the same radius of convergence as and . Observing that , if then , given that ; while, if , given that , we have . Now, using l'Hôpital's rule we obtain:

(14)

It can be shown that , so that .

As we can see from Fig. 1, the hazard rate of the models belonging to the rGTL-PS class is much more flexible than that of the rGTL distribution. In particular, besides the increasing shape, it can also take the bathtub shape and the upside-down bathtub followed by bathtub shape (UB-BT, i.e. N-shaped). This wide range of behaviours makes the models of the rGTL-PS class suitable for reliability theory and survival analysis on bounded domains.
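The contrast between the two hazard rates can be reproduced with a few lines of code. The sketch below assumes the rGTL survival function $\{\alpha(1-x)-(\alpha-1)(1-x)^{2}\}^{\beta}$ and the logarithmic forms $C(t)=-\log(1-t)$, $C'(t)=1/(1-t)$. Both are assumptions, and the function names are hypothetical. The compound hazard is evaluated as $\theta\,g\,C'(z)/C(z)$ with $z=\theta(1-G)$, which follows from dividing (6) by one minus (7).

```python
import math

def rgtl_sf(x, alpha, beta):
    """Survival function 1 - G(x) of the rGTL (assumed parameterization)."""
    t = 1.0 - x
    return (alpha * t - (alpha - 1.0) * t * t) ** beta

def rgtl_pdf(x, alpha, beta):
    """rGTL density under the same assumed parameterization."""
    t = 1.0 - x
    base = alpha * t - (alpha - 1.0) * t * t
    return beta * base ** (beta - 1.0) * (alpha - 2.0 * (alpha - 1.0) * t)

def rgtl_hazard(x, alpha, beta):
    """Hazard rate g / (1 - G) of the rGTL."""
    return rgtl_pdf(x, alpha, beta) / rgtl_sf(x, alpha, beta)

def rgtl_log_hazard(x, alpha, beta, theta):
    """Hazard rate of the rGTL-Logarithmic: theta * g * C'(z) / C(z)."""
    z = theta * rgtl_sf(x, alpha, beta)
    return theta * rgtl_pdf(x, alpha, beta) / ((1.0 - z) * (-math.log1p(-z)))

def is_increasing(values):
    """True when the sequence is strictly increasing."""
    return all(a < b for a, b in zip(values, values[1:]))
```

With, for example, alpha = 1.4, beta = 0.87 and theta = 0.99 (illustrative values), the rGTL hazard increases over the whole grid, while the compound hazard first decreases and then increases, giving a bathtub shape.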

3.3 Quantile

An advantage of the rGTL-PS distribution is that its quantile function is available explicitly. Indeed, the cdf of the rGTL-PS can be expressed in closed form, as can be seen from expression (7), and therefore the quantile can be easily obtained by recalling the expression for the quantile of the rGTL distribution:

(15)

and by putting .

Thus, the quantile of the rGTL-PS distribution is given by

(16)
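Because the cdf is available in closed form, quantiles and random variates can be obtained by direct inversion. The sketch below does this for the logarithmic member, assuming the rGTL survival function $\{\alpha(1-x)-(\alpha-1)(1-x)^{2}\}^{\beta}$ and the logarithmic inverse $C^{-1}(y)=1-e^{-y}$. The parameterization is an assumption and the function names are hypothetical, so the expressions may differ in form from (15) and (16).

```python
import math
import random

def rgtl_quantile(p, alpha, beta):
    """Invert G(x) = p under the assumed rGTL parameterization.

    With t = 1 - x, solve alpha * t - (alpha - 1) * t**2 = (1 - p)**(1 / beta),
    taking the root in [0, 1] written in a form that also covers alpha = 1.
    """
    w = (1.0 - p) ** (1.0 / beta)
    t = 2.0 * w / (alpha + math.sqrt(alpha * alpha - 4.0 * (alpha - 1.0) * w))
    return 1.0 - t

def rgtl_log_cdf(x, alpha, beta, theta):
    """cdf 1 - C(theta * S) / C(theta) for the logarithmic family."""
    t = 1.0 - x
    s = (alpha * t - (alpha - 1.0) * t * t) ** beta
    return 1.0 - math.log(1.0 - theta * s) / math.log(1.0 - theta)

def rgtl_log_quantile(u, alpha, beta, theta):
    """Solve F(x) = u using C^{-1}(y) = 1 - exp(-y)."""
    c_theta = -math.log(1.0 - theta)
    s = (1.0 - math.exp(-(1.0 - u) * c_theta)) / theta  # required 1 - G(x)
    return rgtl_quantile(1.0 - s, alpha, beta)

def rgtl_log_sample(n, alpha, beta, theta, seed=0):
    """Inverse-transform sampling: apply the quantile to uniform draws."""
    rng = random.Random(seed)
    return [rgtl_log_quantile(rng.random(), alpha, beta, theta) for _ in range(n)]
```

Applying the cdf to the computed quantile should return the original probability level, and about half of a simulated sample should fall below the median.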

4 Inference

In order to estimate the parameters of the rGTL-PS distribution, we consider the maximum likelihood (ML) method. Let be a random sample of size from the rGTL-PS given by (6). The log-likelihood function for the vector of parameters can be expressed as:

(17)

Differentiating with respect to and , respectively, and setting the results equal to zero, we have:

(18)

where the quantities and are reported in the Appendix. The system does not admit any explicit solution; therefore, the ML estimates can only be obtained by means of numerical procedures. Under the usual regularity conditions, the known asymptotic properties of the maximum likelihood method ensure that , where is the asymptotic variance-covariance matrix and is the Fisher Information matrix. Moreover, the asymptotic variance-covariance matrix of can be approximated by the inverse of the observed information matrix , whose entries are given in the Appendix. In order to build confidence intervals and hypothesis tests, we use the fact that the asymptotic distribution of can be approximated by the multivariate normal distribution, , where is the inverse of the observed information matrix evaluated at .
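Since the likelihood equations have no explicit solution, the log-likelihood has to be maximized numerically. The sketch below evaluates the log-likelihood of the logarithmic member and uses a coarse grid search as a deliberately simple stand-in for a proper numerical optimizer; it is not the estimation procedure of the paper. The rGTL parameterization, the simulated data and all function names are assumptions made for illustration.

```python
import math
import random

def rgtl_log_logpdf(x, alpha, beta, theta):
    """Log-density of the rGTL-Logarithmic (assumed rGTL parameterization)."""
    t = 1.0 - x
    base = alpha * t - (alpha - 1.0) * t * t
    s = base ** beta
    log_g = (math.log(beta) + (beta - 1.0) * math.log(base)
             + math.log(alpha - 2.0 * (alpha - 1.0) * t))
    return (math.log(theta) + log_g - math.log(1.0 - theta * s)
            - math.log(-math.log(1.0 - theta)))

def log_likelihood(data, alpha, beta, theta):
    """Sum of log-densities over an independent sample."""
    return sum(rgtl_log_logpdf(x, alpha, beta, theta) for x in data)

def rgtl_log_quantile(u, alpha, beta, theta):
    """Quantile by inversion of the closed-form cdf (used to simulate data)."""
    s = (1.0 - (1.0 - theta) ** (1.0 - u)) / theta
    w = s ** (1.0 / beta)
    t = 2.0 * w / (alpha + math.sqrt(alpha * alpha - 4.0 * (alpha - 1.0) * w))
    return 1.0 - t

def grid_search_mle(data, alphas, betas, thetas):
    """Return (best log-likelihood, best parameter triple) over a grid."""
    best = None
    for a in alphas:
        for b in betas:
            for th in thetas:
                ll = log_likelihood(data, a, b, th)
                if best is None or ll > best[0]:
                    best = (ll, (a, b, th))
    return best
```

In practice the grid maximizer would only serve as a starting value for a Newton-type or quasi-Newton routine, with standard errors then taken from the inverse of the observed information matrix.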

4.1 EM algorithm

In the original formulation due to Dempster et al. (1977), the EM algorithm is a method for computing maximum likelihood estimates iteratively, starting from some initial guess, when the data are incomplete. The EM algorithm has become a popular tool in statistical estimation problems involving incomplete data, or in problems which can be posed in a similar form. Each iteration of the EM algorithm consists of an Expectation (E) step, in which we calculate the conditional expectation of the complete-data log-likelihood function given the observed data, and a Maximization (M) step, in which we maximize this expectation.
In our case, we suppose that the complete data () consist of an observable part and an unobservable part . By (5), the corresponding log-likelihood function is

(19)

Then, the E-step of the algorithm requires the computation of the conditional expectation

(20)

where is the current estimate of in the rth iteration. It is straightforward to verify that the E-step of an EM cycle requires the computation of the conditional expectation of the conditional random variable . Using (5) and (6), the conditional probability function of is

(21)

for and . After easy algebra, the conditional expectation is given by

(22)
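The E-step quantity can be illustrated with a short numerical check. For any power series family, the conditional pf of $N$ given $x$ is proportional to $n\,a_n z^{n-1}$ with $z=\theta[1-G(x)]$, which follows from dividing (5) by (6), and summing the series gives the conditional mean $1+z\,C''(z)/C'(z)$. The sketch below compares the truncated series with this closed form. The function names are hypothetical, and the closed form is stated here as a derived identity, not as a transcription of (22).

```python
import math

def conditional_mean_n_series(z, a, n_max=100):
    """E[N | x] from the conditional pf, which is proportional to
    n * a(n) * z**(n - 1), where z = theta * (1 - G(x))."""
    num = 0.0
    den = 0.0
    for n in range(1, n_max + 1):
        term = n * a(n) * z ** (n - 1)
        num += n * term
        den += term
    return num / den

def conditional_mean_n_closed(z, d1, d2):
    """Closed form 1 + z * C''(z) / C'(z); d1 and d2 are C' and C''."""
    return 1.0 + z * d2(z) / d1(z)
```

For the logarithmic family the closed form reduces to $1/(1-z)$ and for the Poisson family to $1+z$, so in both cases the E-step requires no numerical summation.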

The M-step of the EM algorithm requires the maximization of the complete-data likelihood over , with the unobservable part replaced by its conditional expectation. Thus the estimate at the (r+1)th iteration of the EM algorithm is the numerical solution of the following nonlinear system:

(23)

5 Special cases

In this section, we provide some results on special cases of the rGTL-PS class of distributions. In particular, by considering the quantities reported in Table 1 and the expressions given in (6) and (7), we obtain the pdf and the cdf for the rGTL-Logarithmic, the rGTL-Geometric, the rGTL-Poisson and the rGTL-Binomial distributions. Some plots of density functions and hazard rates are given in Fig. 1 to show the flexibility of these models.

Figure 1: Density and hazard rate for some models of rGTL-PS class for certain parameter values.

5.1 rGTL-Logarithmic distribution

The pdf and the cdf of the rGTL-Logarithmic (rGTL-Log) random variable, respectively, are:

(24)
(25)

with and . The hazard rate is given by

(26)

Some plots of the pdf and hazard rate are given in Fig. 1. Finally, considering that , from (16), we obtain the quantile for rGTL-Log distribution:

(27)

5.2 rGTL-Geometric distribution

The pdf and the cdf of the rGTL-Geometric (rGTL-Geo) random variable, respectively, are:

(28)
(29)

with and . After some algebra we obtain the hazard rate, given by

(30)

The quantile for the rGTL-Geo distribution can be obtained from (16), by considering that .

5.3 rGTL-Poisson distribution

The pdf and the cdf of the rGTL-Poisson (rGTL-Poi) random variable, respectively, are:

(31)
(32)

with and . After some algebra we obtain the hazard rate, given by

(33)

From the plots reported in Fig. 1, we can state that the hazard rate of the rGTL-Poisson distribution can be monotonically increasing, bathtub-shaped or upside-down bathtub followed by bathtub (UB-BT).

In this case , thus the quantile for the rGTL-Poi distribution can be obtained by inserting this expression in (16).

5.4 rGTL-Binomial distribution

The pdf and the cdf of the rGTL-Binomial (rGTL-Bin) random variable, respectively, are:

(34)
(35)

with , and . After some algebra we obtain the hazard rate:

(36)

Finally, from (16), it is possible to obtain the expression for the quantile of rGTL-Bin distribution, by considering that .

6 Applications

In this section, we fit some models belonging to the proposed class to real data. In the first example, we consider the dataset reported in Genç (2013), regarding two different algorithms, SC16 and P3, used to estimate unit capacity factors by the electric utility industry; in the second example, we consider the percentage of Muslim population and the percentage of atheists, used by Silva and Barreto-Souza (2014).

Example 1. In Genç (2013), the author fits the TL distribution to the capacity data. We compare the reported results with those obtained considering the rGTL distribution, the Beta distribution and three models from the rGTL-PS class.

In Table 2 the ML estimates of the parameters, with the corresponding standard errors, and the values of the Akaike Information Criterion (AIC) are reported for both the SC16 and P3 algorithms. The lower AIC values obtained for the rGTL-Log model, compared to those obtained for all the other models, suggest the superiority of the former in describing these data. Furthermore, Table 2 gives the results of the Kolmogorov-Smirnov (KS) test. Once again, the KS statistic for the rGTL-Log distribution is the lowest among those obtained for the considered distributions.

Finally, Fig. 2 shows, for the two algorithms, the fitted densities of the rGTL-Log, the TL, the rGTL and the Beta models. The plots also confirm the superiority of the rGTL-Log model in both cases.

| | rGTL-Log | rGTL-Geo | rGTL-Poi | rGTL | TL | Beta |
|---|---|---|---|---|---|---|
| SC16 | | | | | | |
| α | 1.3980 (0.687) | 0.8856 (0.621) | 0.6184 (0.511) | 0.5444 (0.431) | | a=0.4869 (0.121) |
| β | 0.8665 (0.455) | 0.5578 (0.576) | 1.0414 (0.544) | 1.5194 (0.518) | 0.5943 (0.1239) | b=1.1679 (0.358) |
| θ | 0.9920 (0.014) | 0.9055 (0.119) | 2.1089 (1.311) | | | |
| AIC | -16.7599 | -12.6145 | -8.1251 | -8.0792 | -14.2302 | -15.2149 |
| KS (p-value) | 0.1071 (0.9544) | 0.148 (0.6952) | 0.2376 (0.1491) | 0.3287 (0.0139) | 0.1690 (0.5272) | 0.1836 (0.4202) |
| P3 | | | | | | |
| α | 1.3275 (0.777) | 0.9098 (0.650) | 0.6455 (0.550) | 0.5573 (0.465) | | a=0.5539 (0.142) |
| β | 0.9141 (0.475) | 0.6557 (0.583) | 1.0148 (0.562) | 1.4533 (0.523) | 0.6778 (0.145) | b=1.2198 (0.376) |
| θ | 0.9821 (0.031) | 0.8611 (0.160) | 1.9458 (1.402) | | | |
| AIC | -10.6097 | -8.2475 | -5.3616 | -6.342 | -8.9965 | -9.5638 |
| KS (p-value) | 0.1345 (0.8212) | 0.1432 (0.758) | 0.2383 (0.1642) | 0.3395 (0.0099) | 0.1848 (0.4400) | 0.2002 (0.3413) |

Table 2: ML estimates of the parameters (standard errors in parentheses), AIC values and Kolmogorov-Smirnov test results for the first example dataset.

Figure 2: Empirical and fitted density functions for SC16 (left panel) and P3 (right panel) algorithms data.

Example 2. In this example, we consider the proportion of Muslim population in 152 countries and the proportion of atheists in 137 countries. These datasets have been considered by Silva and Barreto-Souza (2014) with the aim of selecting the best model between the Beta and the Kumaraswamy. Alongside these two models, we also consider the rGTL-Log, rGTL-Geo and rGTL-Poi models. The ML estimates of the parameters, with the corresponding standard errors, and the AIC values are reported in Table 3. In both cases, the rGTL-Log model appears to be the best model, as suggested by the lowest AIC value. We note that, for the Atheism dataset, the rGTL-Geo also seems to perform better than the Beta and Kumaraswamy distributions. In Fig. 3 the fitted densities for the considered models are shown.

| | rGTL-Log | rGTL-Geo | rGTL-Poi | Beta | KW |
|---|---|---|---|---|---|
| Muslim | | | | | |
| α | 1.3997 (0.341) | 0.6521 (0.178) | 0.4573 (0.120) | a=0.2976 (0.028) | a=0.2715 (0.033) |
| β | 0.2624 (0.063) | 0.1226 (0.100) | 0.5530 (0.086) | b=0.5159 (0.058) | b=0.5906 (0.057) |
| θ | 0.9997 (3e-04) | 0.9796 (0.018) | 2.5828 (0.448) | | |
| AIC | -250.545 | -150.499 | -71.596 | -232.908 | -225.667 |
| Atheism | | | | | |
| α | 0.8411 (0.744) | 0.9952 (0.608) | 0.5353 (0.283) | a=0.4368 (0.043) | a=0.5091 (0.042) |
| β | 2.3502 (1.242) | 0.9730 (0.758) | 3.0655 (0.782) | b=3.6347 (0.538) | b=3.0914 (0.412) |
| θ | 0.9900 (0.004) | 0.9746 (0.021) | 3.3155 (0.580) | | |
| AIC | -449.703 | -438.989 | -381.875 | -407.951 | -417.785 |

Table 3: ML estimates of the parameters (standard errors in parentheses) and AIC values for the second example dataset.

Figure 3: Empirical and fitted density functions for the proportions of Muslim population and atheists.

7 Conclusion

In many fields of applied science, the observations take values only in a limited range; this is the case, for example, of percentages, proportions and fractions. To model this type of data, the statistical literature offers very few alternatives, mainly the Beta distribution, and only recently have some authors recovered the Topp-Leone distribution, proposed in 1955, and the Kumaraswamy distribution, introduced in the literature in 1980. Certainly, the lack of distributions with bounded support in the literature contrasts with the abundance of distributions with unbounded support. With the aim of reducing this gap, in this paper we have proposed a new class of distribution functions with bounded support, namely the rGTL-PS class, obtained by compounding the Power Series distributions and the reflected Generalized Topp-Leone distribution. The proposed class includes, as special cases, some new distributions with bounded support such as the rGTL-Logarithmic, the rGTL-Geometric, the rGTL-Poisson and the rGTL-Binomial.

Like the Beta distribution, the rGTL-PS density function can be constant, increasing, decreasing, unimodal or uniantimodal depending on the values of its parameters. Unlike the Beta distribution, the hazard function of the rGTL-PS is much more flexible, since it can be increasing, bathtub-shaped or N-shaped. Moreover, the main advantage with respect to the Beta distribution is that the proposed model has a distribution function in closed form, so that the quantiles can be easily obtained. Finally, applications to some real data sets highlight the potential of the proposed model.

8 Appendix

The partial derivatives of and with respect to and are:

Putting and , the elements of the observed information matrix are given by