Inverse stable prior for exponential models
We consider a class of non-conjugate priors as a mixing family of distributions for a parameter (e.g., Poisson or gamma rate, inverse scale or precision of an inverse-gamma, inverse variance of a normal distribution) of an exponential subclass of discrete and continuous data distributions. The prior class is proper, nonzero at the origin (unlike the gamma prior with shape parameter less than one), and is easy to generate random numbers from. The prior class also provides flexibility (includes the Airy and the half-normal) in capturing different prior beliefs as modulated by a bounded parameter The resulting posterior family in the single-parameter case can be expressed in closed-form and is proper, making calibration unnecessary. The mixing induced by the inverse stable family results to a marginal prior distribution in the form of a generalized Mittag-Leffler function, which covers a broad array of distributional shapes. We derive closed-form expressions of some properties like the moment generating function and moments. We propose algorithms to generate samples from the posterior distribution and calculate the Bayes estimators for real data analysis. We formulate the predictive prior and posterior distributions. We test the proposed Bayes estimators using Monte Carlo simulations. The extension to hierarchical modeling and inverse variance components models is straightforward. We illustrate the methodology using a real data set, introduce a hyperprior density for the hyperparameters, and extend the model to a heavy-tailed distribution.
Keywords: rate, inverse scale, inverse variance, precision, Jeffreys, Mittag-Leffler
We illustrate some motivations of this paper using common practical applications of the widely used conjugate gamma (, ) prior density
If and as , the limiting density fails to exist. Therefore, an infinite mass at the origin introduces bias toward values near the origin.
Application 1. If a discrete dataset is a random sample from Poisson () with probability mass function then we obtain the Poisson()-gamma (, ) model where the posterior density is proportional to
As and whenever the posterior diverges (with infinite limit). This posits a limitation on inference for the Poisson-gamma model. A true zero rate (or infinite variance) cannot be ignored but an infinite mass cannot be assigned only at/near the origin as well. Note that observing in practice, for instance, is not uncommon especially if a relatively small dataset is sampled from zero-inflated populations.
Furthermore, the Jeffreys prior for the Poisson distribution also tends to infinity as The corresponding posterior kernel is
which again suffers from the same drawback as whenever
Application 2. If is continuous and is a random sample from gamma () with probability density function then we have the gamma ()-gamma () model. The posterior density kernel is
If both shape parameters ( and ) above are relatively small and such that then the limit ceases to exist also as
Application 3. If the sample data come from inverse-gamma () with probability density function then we obtain the inverse-gamma ()-gamma () model for the inverse scale parameter which yields another gamma posterior kernel
If the same conditions in the second application above are satisfied then the posterior suffers from the same drawback as well. Clearly, finding a prior that is proper and robust against the restriction or bias introduced near the origin for the above models is one motivation of this paper besides providing informative and proper priors to a rate parameter say, of frequently used exponential models.
The rest of the paper is organized as follows. The inverse stable density is introduced in Section 2. The single parameter formulation is presented in Section 3. The computational algorithms are given in Section 4. Predictive distributions are discussed in Section 5. The extensions to the hierarchical setting are in Section 6. Illustrations using real data are in Section 7. The extension to heavy-tailed prior specification and some concluding remarks are presented in Section 8.
2 Inverse stable density
The inverse stable (IS) density has been increasingly becoming popular in several areas of study particularly in physics and mathematics. It is a probability model for time-fractional differential equations, which leads to closed-form solutions (Piryatinska et al., 2005; Mainardi et al., 2010; Meerschaert et al., 2011; Meerschaert and Straka, 2013; Iksanov et al., 2017). It is also used as a subordinator (as the operational time rather than the physical time) for time-fractional diffusions and for Poisson-type processes (Beghin and Orsingher, 2010).
The inverse stable function is related to the strict -stable density (see all references above) as follows: If then
where the Laplace transform of the -stable density is Note that the Airy () and half-normal () distributions are special cases. As , the distribution becomes degenerate, i.e.,
The prior family (indexed by ) does not vanish at , i.e, (unlike the gamma prior with shape parameter less than one). The large-sample behavior (Mainardi et al., 2010; Meerschaert and Straka, 2013) of can be simply expressed in terms of as
where and Moreover,
where and are independently and uniformly distributed in .
Figure 1 below reveals some members of the prior family as a function of the hyperparameter Apparently, controls the shape of For smaller values of , assigns finite masses near the origin and becomes bounded as Clearly, the class of priors allows considerable flexibility in capturing prior beliefs.
3 Single parameter formulation
Proposition 1. Let be a random sample from a population belonging to an exponential family of distributions with the likelihood kernel
where . Using the non-conjugate inverse stable density as a prior for yields the the proper or normalized posterior distribution
where and are the hyperparameters,
is the generalized Mittag–Leffler function (Prabhakar, 1971), , is the Pochammer symbol, and
Proof. See Appendix.
For the Poisson-gamma model in Section 1, as and , the posterior , i.e., the limit exists. For both the gamma-gamma and inv-gamma-gamma models above, as , i.e., as the posterior as well as .
Corollary 1. The marginal density of the data given hyperparameters and is
Proof. It is trivial and is omitted. ∎
Corollary 2. The th posterior moment is
Proof. It is trivial and thus is not provided. ∎
When , the formula in (13) is checked as a valid probability density function.
Corollary 3. The moment generating function is straightforward to calculate as
Proof. The proof follows from the property of the inverse stable distribution and is trivial.∎
Corollary 4. As ,
Proof. The proof follows from the property of the inverse stable density and is omitted. ∎
The preceding corollary suggests that the Bayes estimator . It also indicates how controls the shape and/or variability of the posterior distribution from a non-degenerate family to a degenerate one . This shows the relevance of as when and as goes large.
Table 1 below shows some likelihood models with their corresponding parameterizations following the above formulation and notation. The inverse stable density can also be a prior for a rate or an inverse scale parameter of two-parameter models like the inverse-Weibull, lognormal and normal-inverse densities conditional on the other parameter (see e.g. Miller, 1980; Kundu, 2008).
4 Estimation algorithms
4.1 Monte Carlo integration
From (88) of Appendix A, we have
where Thus, the th moment can be estimated using the following algorithm.
Step 1. Generate
Step 2. Let
Step 3. Set
In particular, the Bayes estimator of with respect to the squared error loss criterion can be directly computed as
while the variance can be estimated as
4.2 Posterior simulation
Let the target density be and candidate density be and are both known. Then
The preceding expression is maximized at and
Below is an accept-reject algorithm for generating observations from the posterior distribution (13).
Step 1. Generate and
Step 2. Accept if
or equivalently, otherwise, go back to step 1.
Step 3. Repeat steps times and set .
As , if and only if and Thus, we expect high acceptance rates (close to ) of Algorithm 2 if the conditions (, and ) are close to being satisfied. The same acceptance rates are to be observed for , but is the solution of
We generated 500 observations from the different posterior distributions in 1000 replications based on observed data sizes to test both algorithms above. We calculated the mean point estimate, the median absolute deviation (MAD) and the maximum likelihood estimates (MLE); the results are in Table 4 of Appendix B. Both algorithms are generally more precise (with smaller MAD) than the MLE, and Algorithm 2 tends to be the most precise (with the smallest MAD) for small values of and the sample size. The point estimates from both algorithms are very close but can be considerably different from the MLE even for relatively large data sizes. It is observed that both bias and variability decrease as the sample size increases in all cases as expected.
5 Predictive distributions
Consider the exponential data model
where The prior predictive distribution of a conditionally independent new observation is
where , , and is an appropriate normalizing constant or function. We now propose a class of prior predictive densities for a family of transformations that provides a way of simulating by generating or inverting , and hence allows testing of prior beliefs. Note that these predictive distributions can be easily plotted using the maximum likelihood estimates of the hyperparameters and the Monte Carlo approximation (19).
Proposition 2. Let and be free of (see Table 1) and be a one-to-one mapping and invertible. Then the prior predictive density of is
Proof. The proof follows from the well-known Mellin transform of the generalized Mittag-Leffler function (Shukla and Prajapati, 1997) below for as
The above family of proper density functions can easily be plotted using the approximation in (19).
Corollary 5. The th moment can be calculated using formula (30) as
Proof. It is trivial and is omitted. ∎
Formula (29) can be checked to be proper when in the preceding corollary. The th moment exists provided .
Example 1. Let Then and thus,
Example 2. Let . Then and therefore,
When , the preceding ’s are easily checked as proper prior predictive density functions.
After observing the data , the posterior predictive distribution of a conditionally independent new observation is
where and is some normalizing function or constant.
Example 3. Let Then and thus,
Example 4. Let . Then and hence,
The above posterior predictive distributions can directly be plotted using (19) also.
6 Hierarchical models
Recall that , where the marginal prior follows from Corollary 1, with the inverse stable density as the mixing distribution. The marginal prior family can be plotted using the Monte Carlo algorithm above and can be made proper using the Mellin transform property (30). Below are some applications of the inverse stable prior in hierarchical contexts.
Let be the inverse variance and consider
The marginal prior of can be shown as
Also, the hyperparameters and can be estimated by maximizing
The Gibbs sampler for this model can be performed as
The marginal prior of can be obtained as
gives estimates of the hyperparameters and The Gibbs sampler for this model can be accomplished as
Poisson-exponential with unequal intensity rates
The marginal prior of can be calculated as
where . Hence,
The normalizing constant can be straightforwardly calculated using Monte Carlo integration. Therefore,
Global shrinkage in normal-normal model
Let be the inverse variance. Consider
The marginal prior for is
Local shrinkage in normal-normal model
Let be the local inverse variance. When and above, then the marginal prior for is