A new generalization of the beta distribution

# A new generalization of the beta distribution

Rose Baker
University of Salford, UK
###### Abstract

The beta distribution is the best-known distribution for modelling doubly-bounded data, e.g. percentage data or probabilities. A new generalization of the beta distribution is proposed, which uses a cubic transformation of the beta random variable. The new distribution is label-invariant like the beta distribution and has rational expressions for the moments. This facilitates its use in mean regression. The properties are discussed, and two examples of fitting to data are given. A modification is also explored in which the Jacobian of the transformation is omitted. This gives rise to messier expressions for the moments but better modal behaviour. In addition, the Jacobian alone gives rise to a general quadratic distribution that is of interest. The new distributions allow good fitting of unimodal data that fit poorly to the beta distribution, and could also be useful as prior distributions.

## Keywords

Beta distribution, cubic transformation, doubly-bounded data, Jacobian, label invariance, regression

## 1 Introduction

Doubly-bounded data occur in many application areas, as for example percentage data, and also when the random variable is a probability. In this case, the probabilities are not often directly measured, and the data are binary, with events such as death that either occur or do not. However, a distribution of probability may then be used as a prior distribution, in a Bayes or Empirical Bayes analysis.

There are relatively few distributions available for modelling doubly-bounded data. The 2-parameter beta distribution defined on has pride of place, and many attempts have been made to generalize it to allow more flexible behaviour. These include the generalized beta distribution reviewed by Pham-Gia and Duong (1989). Other generalizations are considered by Nadarajah and Kotz (2006). There are also two very simple generalizations. One is a mixture with a uniform distribution, to allow greater variance. Its use in regression is described in Bayes, Bazán and Garcia (2012). Another is zero inflation, a mixture with a delta-function at zero (Stewart, 2013).

Attempts have also been made to replace the beta distribution with a more flexible distribution. The best-known alternative is probably the Kumaraswamy distribution (ibid, 1980). Some other replacement distributions are described in the book ‘Beyond Beta’ by Kotz and van Dorp (2004), and include 2-sided power distributions, generalized trapezoidal distributions, the Topp and Leone distribution, and Johnson’s distribution (for which see Johnson, Kotz and Balakrishnan, 1995). There is also the log-Lindley distribution (Gómez-Déniz et al, 2014).

Many of these distributions are difficult to use, with likelihoods and moments only expressible in terms of hypergeometric functions, or with cusps (the 2-sided power and trapezoidal distributions). No useful distribution can be completely simple, and even the normal distribution requires a special function, the error function, to compute its distribution function. However, there are several requirements for a practically useful distribution to generalize the beta distribution.

One is that the moments should be simple to compute, or at least the mean. This is because we often wish to regress the mean on a covariate, e.g. mean percentage body fat can be regressed on the Quetelet index (body mass index or BMI). To do this using a likelihood-based method, we must compute the mean for an observation, as a function of the covariates, and in order to compute the log-likelihood, we must then be able to compute the model parameters. For example, with the beta distribution itself with parameters , the mean is , and Mielke (1975) suggests reparameterising to use and . Then we compute and and can compute the log-likelihood. Without a simple formula for the mean in terms of the model parameters, such a regression would be difficult.

Another requirement is label-invariance. With a random variable , and standardising the interval to be be , we can look at the distribution of , where the ends of the scale have been flipped. We have interchanged the labels, for example ‘success’ and ‘failure’, or ‘no treatment effect’ and ‘100% treatment effect’. Label-invariance means that follows a distribution from the same family as . Thus if , then . The fitted model will be the same with the same likelihood value and the same fitted parameter values whichever choice is made.

Some distributions, such as the Kumaraswamy distribution, are not label-invariant. However, label invariance in general is a common requirement e.g. in medicine. (e.g. Senn, 1996). For example, suppose we wish to model the distribution of a disease activity index measured on a scale from zero to unity. Without a label-invariant model, we would get a different distribution if we considered the corresponding health index . Which model should we believe?

We must also require that the pdf and distribution function can be easily computed on most platforms, i.e. they do not require special functions that may not be available; this is not a crucial requirement, because if a distribution proves useful, the necessary special functions will soon be produced. The distribution function is needed for computing the likelihood when data are censored, an extreme case being the application of a doubly-bounded distribution to fitting grouped data. Random numbers are also needed, for example in Markov-chain Monte Carlo methods, and their generation should preferably be straightforward.

The beta distribution itself has the virtues of having a simple expression for the mean and of label-invariance. To compute the likelihood we require the beta function, and the distribution function requires the incomplete beta function. This is also needed for the t-distribution, so is commonly available. Random number generation is not particularly simple, but there exist efficient methods for doing this.

It was desired to construct a practically useful generalized beta distribution, as there does not currently appear to be a distribution for doubly-bounded data that has relatively simple expressions for the moments, is free of cusps, and is label-invariant. The log-Lindley distribution, for example, has simple expressions for the moments but is not label-invariant.

Bearing the above considerations in mind, we took the random variable as , where , and the are chosen so that , i.e. we have a monotonic transformation of the beta random variable. This has been explored with and . The case of course simply gives the beta distribution.

These new distributions generalize the beta distribution and allow more flexible behaviour, e.g. the skewness for a given mean can change substantially. They are also label-invariant, as we can write , where the are linear functions of the . The mean can also be computed as a rational expression, thus facilitating its regression on covariates.

The difficult task is restricting the to require that . We shall see that this is straightforward for (quadratic or Q-beta distribution) and less so for (cubic or C-beta distribution). We have not yet gone beyond , where we already have two extra parameters, which gives plenty of flexibility.

An unexpected problem with the distribution led to the creation of ‘Jacobian-less’ distributions, and these are regarded as the most useful distributions resulting from this approach.

The next section briefly discusses transformations of distributions, then the following sections discuss the detailed properties of the new distributions, after which two examples of their use are given. Finally, the quadratic distributions arising when the transformation is applied to the uniform distribution are described in more detail in appendices.

## 2 Transformation of pdfs

Consider a distribution with pdf defined on and a transformation to with an inverse transformation . Here is the pdf of the beta distribution with parameters , and . Denote the respective random variables as and respectively.

We require that the Jacobian , so that the transformation is monotonic and one-to-one. Since we have that .

If the Jacobian becomes very small over some interval of , will become large, and the distribution might be multimodal. In view of this, a ‘Jacobian-less’ distribution was constructed with pdf , for which for some unknown constant . This distribution clearly has the same modal structure as , because . Since , the mode of occurs at , where is the mode of . If could not be easily determined, this type of pdf would be of little interest. The ‘parent’ distribution of which is transformed to yield is , which here is simply . This is a mixture of beta-distributions, although some of the weights in the mixture may be negative. It is therefore straightforward to evaluate by requiring that the pdf integrates to unity.

Discarding the Jacobian ensures a unimodal distribution if is unimodal. However, it unavoidably makes results for distribution functions and moments more complicated, and the motto when considering discarding Jacobians should be ‘if it’s not broken, don’t fix it’.

## 3 Computing issues and notation

To solve cubics, and even quadratics, use of the Newton-Raphson iteration (e.g. Press et al, 2007) is often recommended here in preference to analytic solutions. This is very quick and easy to program.

If it happens that the variable strays outside this probably would not cause a problem, because it would wander back into again before convergence, but it is faster and safer to set after each step. It is safer because is not guaranteed to be positive for or and so the iteration might oscillate or diverge.

Computation was done using purpose-written fortran programs and the NAG library.

To define some notation, let be a r.v. that follows the beta distribution with parameters , so that the pdf is:

 fp(p)=pα−1(1−p)β−1/B(α,β),

where denotes the beta function. We sometimes write for brevity . Distributions are taken as having support in , but it is trivial to change the interval to an arbitrary interval by adding two more parameters.

Note that quantiles must always be found from the distribution function using Newton-Raphson iteration; they are not discussed further. It is also straightforward to compute inverse moments such as or ; these are also not discussed further. Bivariate distributions could be constructed, but currently the best procedure would be to use a copula.

## 4 Properties: the Q-beta (quadratic) distribution

It seems that the (cubic) distribution is a lot more flexible than the distribution, and the Jacobian-less distribution still better, but we consider the simpler cases first.

### 4.1 Pdf

Define where , where . We say that follows the Q-beta (quadratic-beta) distribution, i.e. . First, since , we have that and . This quickly leads to the requirement , and when we regain the beta distribution. The pdf . Solving the quadratic gives

 p(x)=−γ+Δ(x)1−2γ=xγ+Δ(x)

where , and . Hence the pdf is

 fx(x)=(γ+Δ(x))2−α−βxα−1(γ+Δ(x)−x)β−1/2Δ(x)B(α,β).

The distribution, like the beta distribution, is label-invariant, so that if . We have , and so and we have the alternative form

 fx(x)=xα−1(1−x)β−12Δ(x)B(α,β)(γ+Δ(x))α−1(1−γ+Δ(x))β−1.

In practice to compute the pdf a less analytic approach, which works well also for the cubic distribution considered in the next section, can be used. We find by solving either by solving the quadratic or by Newton-Raphson iteration from and the pdf is then .

### 4.2 Distribution function

The distribution function is also simply related to the distribution function of the beta distribution: using yields , where denotes the incomplete beta function ratio.

### 4.3 Random numbers

Random numbers are generated by piggy-backing off the beta distribution; generate and then .

### 4.4 Moments

The moments are also simple to calculate, although messy. We have that

 E(Xn)=∫10fx(x)xndx=∫10(2γp+(1−2γ))p2)nf(p)dp,

from which the moments can be read off using

 ∫10pmf(p)dp=(α)m/(α+β)m,

where denotes the Pochhammer symbol, the ascending factorial, so that . Finally,

 E(Xn)=n∑j=0(nj)(1−2γ)j(2γ)n−j(α)n+j(α+β)n+j.

Specifically,

 E(X)=α(2γβ+α+1)(α+β)(α+β+1).
 var(X)=4γ2(α)2/(α+β)2+4γ(1−2γ)(α)3/(α+β)3+(1−2γ)2(α)4/(α+β)4−E(X)2.

### 4.5 Mode

The mode is best found from the pdf expressed in terms of as . Taking logarithms and writing the mode is at , where

 α−1pm−β−11−pm−1−2γγ+(1−2γ)pm=0

under the same conditions as for the mode of the beta distribution.

One can find and hence by solving this equation by Newton-Raphson iteration or by solving the corresponding quadratic

 (1−2γ)p2+{(α−2)(1−2γ)−(α+β−2)γ)p+(α−1)(1−γ)=0.

For the distribution is unimodal and in general has exactly the same modality as the beta distribution.

## 5 Properties: the C-beta (cubic) distribution

### 5.1 Parameters

We can add two parameters and create the variable . Since , , and we focus on and , which will yield parameters and . We require , so , whence or . Given , then at and and could only become negative if the equation for zero slope, , has at least one root in . A sufficient condition for this not to happen is that the determinant . Substituting for this yields limits for of . Note by the way that when , , and that we must have .

This last can be seen more convincingly from , i.e. or . From , i.e. , we have .

The requirement that is not a necessary condition for in , as real roots may exist but be outside the range . When , the lower limit is as previously given, and when , the upper limit for is and the lower limit zero. This can be seen as follows: write , i.e. an expansion in Bernstein polynomials. Clearly all coefficients are positive when (so that ) if , i.e. . Hence for the upper limit for such that in is and the lower limit is zero.

To summarise:

1. ;

2. if , the range of is ;

3. if , the range of is .

We now consider how and transform when . Writing we have that , so that , and . Hence is invariant under the label transformation, and we take the first parameter , so that . Next, for (or ), define , and for define , so that always. Under the label transformation, stays the same, while . Thus . Distributions with and are therefore symmetric.

This parameterisation allows to each vary in and exhibits the label symmetry, but has the drawback that the model with will not have a zero value of ; it occurs at . To convert to , we write , then if , , otherwise , and finally . When so that we have . Hence has the same meaning as for the Q-beta distribution, which is now seen to be the C-beta distribution with .

A sensible fitting sequence to ensure convergence would be:

1. fit the beta distribution in the usual way;

2. fit the Q-beta distribution starting from the fitted and values, with (so that ), (so that ).

3. in case of difficulty, float and then , or vice versa.

### 5.2 Pdf

To obtain from we solve the cubic to obtain . This could be done analytically, e.g. using Vieta’s method, but for some parameter values this method is numerically unstable, and a Newton-Raphson iteration starting from is fast and always converges quickly with no numerical problems.

The pdf can then be computed as

 fx(x)=p(x)α−1(1−p(x))β−1B(α,β)(3cp(x)2+2bp(x)+a). (1)

We have again .

### 5.4 Moments

The moments are found using the identity

 E(Xn)=∫10f(p)(a+bp+cp2)ndp,

where are found from . The mean is

 E(X)≡μ=αα+β{a+α+1α+β+1{b+α+2α+β+2c}},

which has been arranged for fast computation.

When regressing the mean on covariates, one can proceed as follows.

1. take as parameters and .

2. Solve for , either by solving the cubic, or (better) using Newton-Raphson iteration starting from .

3. find and compute the log-likelihood as usual.

### 5.5 Mode

The mode can be found by differentiating (1) as for the quadratic distribution. Then the mode where

 α−1pm−β−11−pm−2b+6cpma+2bpm+3cp2m=0. (2)

One can solve by Newton-Raphson iteration or by solving the resulting cubic equation for . Compared with the beta distribution, this distribution has a different modal structure, unlike the Q-beta distribution, which had the same structure as the beta. For there is always one mode in , besides the fact that the pdf will be infinite at zero if and at unity if . This behaviour, giving a mode superimposed on a U or J-shaped distribution, while interesting, is probably not often wanted. It was this behaviour, caused by the fact that the Jacobian can be small over a range of , that led to the creation of the ‘Jacobian-less distribution, described on the next section.

Modal regression could be done by taking parameters and . Given the mode and setting one finds and then solves (2) for , which only requires solving a linear equation. Then the likelihood can be computed in terms of by setting .

## 6 Distributions lacking the Jacobian: SQ-beta and SC-beta distributions

The form of this distribution, as , was derived earlier.

Making integrate to unity gives

 C−1=a+2bαα+β+3cα(α+1)(α+β)(α+β+1).

The properties of the SC-beta distribution only are described; the SQ-beta distribution is of course similar but simpler.

### 6.1 Pdf

The pdf is

 gx(x)=p(x)α−1(1−p(x))β−1B(α,β)(a+2bαα+β+3cα(α+1)(α+β)(α+β+1))

where are derived from as before and is defined as before.

### 6.2 Distribution function

From integrating the pdf this is

 G(x)=C{aI(α,β;p(x))+2bαα+βI(α+1,β;p(x))+3cα(α+1)(α+β)(α+β+1)I(α+2,β;p(x))}.

This form requires three evaluations of the incomplete beta function. However, the computation can be made quicker (but messier) using the identity

 I(α+1,β;x)=I(α,β;x)−xα(1−x)βαB(α,β),

which is well-known, and can be derived by integrating by parts, differentiating the term. We then have

 G(x)=C{a+2bαη+3cα(α+1)η(η+1)}I(α,β;x)−Cxα(1−x)βB(α,β){2bη+3c(α+1η(η+1)+xη+1)},

which requires only one evaluation of an incomplete beta function.

### 6.3 Random numbers

The pdf of the parent distribution is the pdf multiplied by . The rejection method can be used to generate random numbers, by generating random numbers from and accepting them with probability . The maximum value of is either at or or a stationary value in and so is , where is either or its maximum with the stationary value at , if . Hence the acceptance probability is Then, given , one forms .

The efficiency (proportion of generated random numbers retained) is

 ∫10fp(p)(a+2bp+3cp2)Mmaxdp=C−1/Mmax.

This varies depending on the parameter values, but for the examples it was 33.8% and 8%.

This method of generating random numbers works, but a more efficient method would be desirable. However, designing a more efficient method would be another research project; there are many ways to proceed.

### 6.4 Moments

The th moment is

 E(Xn)=C∫10fp(p)(a+2bp+3cp2)(ap+bp2+cp3)ndp.

The mean is then

 E(X)=Cαα+β{a2+α+1α+β+1{3ab+α+2α+β+2{4ac+2b2+α+3α+β+3{5bc+α+4α+β+4(3c2)}}}}.

### 6.5 Mode

The mode is simply

 pm(x)=α−1α+β−2

if it exists, or in full . Modal regression would thus be straightforward. The parameters would be and the equation for would be solved for , after which and the pdf can be computed. The transformation used gives rise to two simple distributions that generalise the uniform distribution and allow modal or U-shaped distributions. They are mentioned in appendices A and B for completeness and because they are new. They may find some use in modelling.

## 7 Fitting to data

Two datasets were fitted. The first comprises 252 observations of calculated percentage of body fat plus a variety of other body size measurements, downloaded from statlib and supplied by Dr. A. Fisher. It is referenced in Penrose et al (1985). The second dataset, also from statlib, is a sample of 349 observations of glycosylated hemoglobin (HBA1c) readings reported in DCCT percentages from diabetic patients. This is referenced in Daramola (2012).

The results of fitting the beta distribution model, and the Q-beta (quadratic) and C-beta (cubic) models, are shown in table 1. The Jacobian-less distributions SQ-beta and SC-beta were also fitted. Figures 1 and 2 show histograms of the data, with fitted beta and C-beta distributions. In both cases, the cubic distribution gives a very significant improvement in the log-likelihood. We have in the first case and in the second , showing that the two added parameters significantly improve the fit.

In the first case, the distribution of percentage body fat is almost normal, whereas the beta distribution skews it to the right. The cubic distribution can correct this and give a good fit. In the second case, the data are more skewed to the right than the beta distribution would allow. The cubic distribution corrects this opposite problem.

The Jacobian-less distributions in fact fit slightly better in both cases, as seen in table 1. The fitted parameters are quite similar.

## 8 Conclusions

The beta distribution has been generalized by allowing quadratic or cubic functions of the beta random variable. These distributions, especially the cubic, can greatly improve model fit for doubly bounded data.

They are fairly tractable, with moments that are rational functions, which allows a straightforward regression of the mean on covariates, and are label invariant like the beta distribution. Modes are computable either as solutions of a quadratic/cubic equation or by Newton-Raphson iteration, so that modal regression is also possible. Distribution functions and random number generation ‘piggy-back’ off that for the parent beta distribution.

An obvious modification is to omit the Jacobian in the transformed distribution, so that the parent distribution is now a mixture of beta distributions, where some of the weights can be negative. The rationale is that for some parameter values, a small Jacobian can introduce an extraneous peak into the distribution. The modified cubic distribution fitted the examples slightly better than the originals. It has a much simpler expression for the mode, and is unimodal for but it has messier expressions for the moments and the distribution function. It would of course also be useful if carrying out modal regression rather than mean regression.

These distributions could be useful in data fitting and as prior distributions. The beta distribution is well-known as the conjugate prior of the binomial distribution, and a more flexible prior can be useful, e.g. for sensitivity analysis.

Obvious future work would be to study the 5-parameter quartic distribution. However, the 4-parameter cubic distribution can already reproduce a wide range of behaviour, so this is not an urgent task. More efficient generation of random numbers for the SC-beta distribution would be useful. The method of generalizing the beta distribution proposed here can be applied to any distribution for doubly-bounded data, thus generating a vast number of possibilities.

## References

•  Bayes, C. L., Bazán, J. L. and Garcia, C. (2012). A new robust regression model for proportions, Bayesian Analysis, 7, 841-866.
•  Daramola O.F. (2012). Assessing the validity of random blood glucose testing for monitoring glycemic control and predicting HbA1c values in type 2 diabetics at Karl Bremer hospital. Masters Thesis (Family Medicine and Primary Care), Stellenbosch University: Stellenbosch, South Africa.
•  Gómez-Déniz, E., Sordo, M. A. and Calderín-Ojeda, E. (2014), The log-Lindley distribution as an alternative to the beta regression model with applications in insurance, Insurance: mathematics and Economics, 54, 49-57.
•  Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, Wiley, New York.
•  Kotz, S. and van Dorp, J. R. (2004). Beyond Beta: Other Continuous Families of Distributions with Bounded Support and Applications, World Scientific, Singapore.
•  Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes, Journal of Hydrology 46 (1-2), 7988.
•  Mielke, P. W. Jr. (1975). Convenient beta distribution likelihood techniques for describing and comparing meteorological data, Journal of Applied Meterology, 14, 985-990.
•  Nadarajah, S. and Kotz, S. (2006). Beta trigonometric distributions, Portuguese Economic Journal, 5, 207-224
•  Penrose, K., Nelson, A., and Fisher, A. (1985). Generalized Body Composition Prediction Equation for Men Using Simple Measurement Techniques” (abstract), Medicine and Science in Sports and Exercise, 17 (2), 189-189.
•  Pham-Gia, T. and Duong, Q. P. (1989). The generalized beta and F-distributions in statistical modelling, Mathematical computer modelling, 12, 1613-1625.
•  Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (2007). Numerical Recipes: the art of scientific computation, 3rd ed., Cambridge University Press, Cambridge
•  Senn, S. (1996). Relation between treatment benefit and underlying risk in meta-analysis - Standard of ’label invariance’ should not abandoned British Medical Journal, 313, 1550-1550.
•  Stewart, C. (2013). Zero-inflated beta distribution for modeling the proportions in quantitative fatty acid signature analysis, Journal of Applied Statistics, 40, 985-992.
•  Van Dorp, J R. and Kotz, S. (2002). The standard two-sided power distribution and its properties: with applications to financial engineering, American Statistician, 56, 90-99.

## Appendix A: the C-beta(1,1,γ,δ) (2-parameter) distribution

First, the C-beta(1,1) distribution is a special case of the C-beta distribution discussed earlier. With parameters , define as before. Then the pdf is , where as before solves . Figure 3 shows the pdf for various values of and .

The distribution function is simply .

The moments are

 E(X)=a/2+b/3+c/4,
 var(X)=a2/12+4b2/45+9c2/112+ab/6+bc/6+3ac/20.

The mode (which may be an antimode) is at if this lies in . The curvature at the mode is , so if the curvature is negative, and it is a mode not an antimode. For the mode may not exist, but for it always does.

Random numbers are generated by where is uniform on .

This distribution with can give narrow peaks with a flattish background, and would be suitable as a prior distribution with fat tails.

## A general 2-parameter quadratic distribution

The Jacobian-less distribution SC-beta is simply the uniform distribution. However, its parent, a quadratic distribution, is of interest. This type of distribution was not considered in the general case, because it did not fit the data as well as the C-beta and SC-beta distributions. When however we have a new and potentially useful distribution.

The distribution has pdf . This is shown in figure 4 for various values of and .

This gives distribution function . Random numbers can be generated in at least two ways. One is to solve , where is a uniformly-distributed random number, either analytically or using Newton-Raphson iteration. The other method is rejection sampling, by generating and accepting it with probability , as described for the Jacobian-less distribution.

The moments are

 E(P)=a/2+2b/3+3c/4,
 var(P)=a/3+b/2+3c/5−(a/2+2b/3+3c/4)2.

The moment-generating function can be found from

 E(exp(tP))=∫10(a+2bp+3cp2)exp(tp)dp,

as

 E(exp(tP))=a(exp(t)−1)t+2b{exp(t)t−(exp(t)−1)t2}+3c{exp(t)t−2exp(t)t2+2(exp(t)−1)t3}.

The mode is again at if this lies in .

This distribution generalizes the uniform and U-quadratic distributions.

## Figures and tables

You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters   