###### Abstract

In environmental studies, data are often skewed, and a flexible statistical model for such data is desirable. In this paper, we study a class of skewed distributions built on the arguments of Ferreira and Steel (2006, Journal of the American Statistical Association, 101: 823–829). In particular, we use the logistic kernel to derive a class of univariate distributions called the truncated-logistic skew symmetric (TLSS) distribution. We provide some structural properties of the proposed distribution and develop statistical inference for the TLSS distribution. A simulation study is conducted to investigate the efficacy of the maximum likelihood method. For illustrative purposes, two real data sets from environmental studies are used to demonstrate the applicability of the model.

A Class of Skewed Distributions with Applications in Environmental Data

Indranil Ghosh, Hon Keung Tony Ng

University of North Carolina, Wilmington, North Carolina, USA

Southern Methodist University, Dallas, Texas, USA.

Keywords and phrases: Maximum likelihood, Moments, Monte Carlo simulation, Skewed distribution, Truncation.

AMS 2010 subject classifications: 60E, 62F

## 1 Introduction

The need for skewed distributions arises in every area of the sciences, engineering and medicine because data often come from asymmetric populations. One common approach to constructing skewed distributions is to introduce skewness into a known symmetric distribution. Ferreira and Steel (2006) presented a unified approach for constructing such a class of skewed distributions. Let $X$ be a random variable symmetric about zero with probability density function (pdf) $f_X$ and cumulative distribution function (cdf) $F_X$. Then, the random variable $Y$ is a skewed version of the symmetric random variable $X$ with pdf

$$f_Y(y) = f_X(y)\, w[F_X(y)], \quad y \in \mathbb{R}, \tag{1.1}$$

where $w$ is a pdf defined on the unit interval (Definition 1, Ferreira and Steel, 2006). The unified family of distributions defined in Eq. (1.1) contains many well-known families of skewed distributions. One commonly used class of skewed distributions of the form of Eq. (1.1) is the class introduced by Azzalini (1985). Specifically, take $w(x) = 2G(\lambda F_X^{-1}(x))$, $x \in (0,1)$, where $G$ is the cdf of a distribution symmetric about zero and $\lambda \in \mathbb{R}$. Then, Eq. (1.1) reduces to

$$f_Y(y) = 2 f_X(y)\, G(\lambda y), \quad y \in \mathbb{R}. \tag{1.2}$$

A particular case of the model in Eq. (1.2) is the skewed normal distribution obtained by setting $f_X = \phi$ and $G = \Phi$, where $\phi$ and $\Phi$ are the pdf and cdf of the standard normal distribution, respectively. The family of distributions given by Eq. (1.2) and the skew-normal class have been studied and extended by many authors; see, for example, Azzalini (1986), Azzalini and Dalla Valle (1996), Azzalini and Capitanio (1999), Arnold and Beaver (2000), Pewsey (2000), Loperfido (2001), Arnold and Beaver (2002), Nadarajah and Kotz (2003), Gupta and Gupta (2004), Behboodian et al. (2006), Nadarajah and Kotz (2006), Huang and Chen (2007) and Sharafi and Behboodian (2006).

In environmental studies, data are typically skewed, and skewed distributions such as the Weibull, lognormal and gamma distributions are often used to model such data sets (see, for example, EPA 1992, Singh et al. 2002). For instance, the soil concentrations of the contaminants of potential concern (Singh et al. 2002; Shoari et al. 2015), mercury concentration in swordfish (Lee and Krutchkoff 1980), and survival times of mice exposed to gamma radiation (Gross and Clark 1975; Grice and Bain 1980) have been fitted by different skewed distributions. In this paper, we propose a class of skew-symmetric distributions as an alternative model for fitting skewed data originating from various environmental applications.

This paper is organized as follows. In Section 2, we discuss the proposed class of skew-symmetric distributions and introduce some special cases of this class. In Section 3, we study the structural properties of the proposed class of distributions. The random number generation for the proposed class of distributions is then discussed in Section 4. In Section 5, the maximum likelihood method is used to estimate the model parameters of the proposed class of distributions. In Section 6, two real data sets from environmental studies are used to illustrate the usefulness of the proposed class of distributions. Finally, some concluding remarks are presented in Section 7.

## 2 Truncated Logistic Skew-Symmetric Family of Distributions

In this section, we introduce the truncated logistic skew-symmetric (TLSS) family of distributions and study various structural properties of this family. First, we define the proposed family of distributions.

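Definition 1. A random variable $Y$ has the truncated-logistic skew-symmetric distribution with parameter $\lambda$, namely $Y \sim \text{TLSS}(\lambda)$, if its pdf has the following form:

$$f_Y(y;\lambda) = \left[\frac{2(1+e^{-\lambda})}{1-e^{-\lambda}}\right]\left[\frac{\lambda f_X(y)\, e^{-\lambda F_X(y)}}{\left(1+e^{-\lambda F_X(y)}\right)^2}\right], \quad y \in \mathbb{R},\ \lambda \in \mathbb{R}, \tag{2.1}$$

where $f_X$ and $F_X$ are, respectively, the pdf and the cdf of a random variable $X$ symmetric about zero, and $\lambda$ is a shape parameter.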

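From Eq. (2.1), the associated cdf of the random variable $Y$ has the form

$$F_Y(y;\lambda) = \int_{-\infty}^{y} f_Y(t;\lambda)\,dt = \left[\frac{1-e^{-\lambda F_X(y)}}{1+e^{-\lambda F_X(y)}}\right]\left(\frac{1+e^{-\lambda}}{1-e^{-\lambda}}\right), \quad y \in \mathbb{R},\ \lambda \in \mathbb{R}. \tag{2.2}$$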

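Then, solving $F_Y(y;\lambda) = u$ in Eq. (2.2) for $F_X(y)$, the inverse cdf of $Y$ can be expressed as

$$F_Y^{-1}(u;\lambda) = F_X^{-1}\left(\frac{1}{\lambda}\log\left[\frac{(1+e^{-\lambda}) + u\,(1-e^{-\lambda})}{(1+e^{-\lambda}) - u\,(1-e^{-\lambda})}\right]\right), \quad u \in (0,1),\ \lambda \in \mathbb{R}. \tag{2.3}$$

The inverse cdf in Eq. (2.3) can be used to obtain the distribution quantiles. Specifically, if $Y \sim \text{TLSS}(\lambda)$, then for any $p \in (0,1)$ the $p$-th quantile $y_p = F_Y^{-1}(p;\lambda)$ can be obtained by using Eq. (2.3). In addition, the inverse cdf in Eq. (2.3) can be used to generate a random sample from $\text{TLSS}(\lambda)$ based on a uniform random number in $(0,1)$ by means of the inverse transform method, i.e., $Y = F_Y^{-1}(U;\lambda)$, where $U$ is a random number from the uniform distribution on (0, 1) (see Section 4 for the details).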
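The quantile computation can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes a standard normal baseline $X$, uses the identity $(1-e^{-a})/(1+e^{-a}) = \tanh(a/2)$ to rewrite Eq. (2.2) as $F_Y(y;\lambda) = \tanh(\lambda F_X(y)/2)/\tanh(\lambda/2)$, and inverts that expression directly. The function names `tlss_cdf` and `tlss_quantile` are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed baseline X ~ N(0, 1); any symmetric baseline would do


def tlss_cdf(y, lam, base=STD_NORMAL):
    """Eq. (2.2), written as tanh(lam * F_X(y) / 2) / tanh(lam / 2)."""
    return math.tanh(lam * base.cdf(y) / 2.0) / math.tanh(lam / 2.0)


def tlss_quantile(u, lam, base=STD_NORMAL):
    """Solve F_Y(y; lam) = u for y by inverting Eq. (2.2) in closed form."""
    v = u * math.tanh(lam / 2.0)       # v = tanh(lam * F_X(y) / 2)
    p = 2.0 * math.atanh(v) / lam      # p = F_X(y), which lies in (0, 1)
    return base.inv_cdf(p)


if __name__ == "__main__":
    lam = 2.0  # illustrative shape parameter
    for u in (0.1, 0.5, 0.9):
        y = tlss_quantile(u, lam)
        # The third column should reproduce u up to rounding.
        print(u, round(y, 4), round(tlss_cdf(y, lam), 4))
```

Evaluating `tlss_cdf` at the returned quantile gives a simple round-trip check of the inversion.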

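Note that the class of distributions defined in Eq. (2.1) is a particular case of the class in Eq. (1.1) with

$$w(x) = \left[\frac{2(1+e^{-\lambda})}{1-e^{-\lambda}}\right]\frac{\lambda e^{-\lambda x}}{\left(1+e^{-\lambda x}\right)^2}, \quad x \in (0,1),$$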

which is the pdf of a truncated logistic distribution. By introducing the logistic function and replacing by one can see that the family of distributions in Eq. (2.1) is a natural extension of Eq. (1.1) to a logistic family. Furthermore, the family of distributions in Eq. (2.1) is symmetric with respect to in the sense that Additionally, in the limit, as $\lambda \to 0$, $Y$ has the same distribution as $X$. Again, we remark that Eq. (2.1) is undefined at $\lambda = 0$, so $\text{TLSS}(0)$ should be interpreted as the limit $\lambda \to 0$. If then reduces to degenerate random variables. If , then if and for all other values of If then if and for all other values of
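That $w$ is a proper density on the unit interval, and that it flattens to the uniform density as $\lambda \to 0$, can be checked numerically. The sketch below is a minimal illustration, assuming the weight implied by Eq. (2.1) (that is, including the factor $\lambda$); `w` and `simpson` are hypothetical helper names and the values of $\lambda$ are arbitrary.

```python
import math


def w(x, lam):
    """Truncated-logistic weight on (0, 1) implied by Eq. (2.1)."""
    const = 2.0 * (1.0 + math.exp(-lam)) / (1.0 - math.exp(-lam))
    e = math.exp(-lam * x)
    return const * lam * e / (1.0 + e) ** 2


def simpson(func, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


if __name__ == "__main__":
    # Total mass of w on (0, 1) for a moderate shape parameter.
    print(simpson(lambda x: w(x, 3.0), 0.0, 1.0))
    # For a very small shape parameter the weight is close to the uniform density.
    print(w(0.3, 1e-4))
```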

Next, we consider some specific members of the TLSS family:

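1. If $X \sim N(\mu, \sigma^2)$, i.e., $f_X(y) = \frac{1}{\sigma}\phi\left(\frac{y-\mu}{\sigma}\right)$ and $F_X(y) = \Phi\left(\frac{y-\mu}{\sigma}\right)$, $y \in \mathbb{R}$, $\mu \in \mathbb{R}$, and $\sigma > 0$, in Definition 1, then Eq. (2.1) gives the pdf of the random variable $Y$ as

$$f_Y(y;\mu,\sigma,\lambda) = \left[\frac{2(1+e^{-\lambda})}{\sigma(1-e^{-\lambda})}\right]\left\{\frac{\lambda\,\phi\left(\frac{y-\mu}{\sigma}\right) e^{-\lambda\Phi\left(\frac{y-\mu}{\sigma}\right)}}{\left[1+e^{-\lambda\Phi\left(\frac{y-\mu}{\sigma}\right)}\right]^2}\right\}, \quad y \in \mathbb{R},\ \lambda \in \mathbb{R},\ \mu \in \mathbb{R},\ \sigma \in \mathbb{R}^+. \tag{2.4}$$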

We refer to the distribution in Eq. (2.4) as the truncated-logistic-skew normal (TLSN) distribution with parameters $\mu$, $\sigma$ and $\lambda$. In Figure 1, we plot the pdfs of the TLSN distribution for fixed $\mu$ and $\sigma$ and different values of the parameter $\lambda$.

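2. If $X \sim \text{Laplace}(\mu, b)$, i.e., $f_X(y) = \frac{1}{2b}\exp\left(-\frac{|y-\mu|}{b}\right)$ and $F_X(y) = \frac{1}{2} + \frac{1}{2}\,\text{sgn}(y-\mu)\left[1-\exp\left(-\frac{|y-\mu|}{b}\right)\right]$, $y \in \mathbb{R}$, $\mu \in \mathbb{R}$, and $b > 0$, in Definition 1, then Eq. (2.1) gives the pdf of the random variable $Y$ as

$$f_Y(y;\mu,b,\lambda) = \left[\frac{2(1+e^{-\lambda})}{1-e^{-\lambda}}\right]\frac{\lambda \exp\left(-\frac{|y-\mu|}{b}\right) e^{-\lambda F_X(y)}}{2b\left[1+e^{-\lambda F_X(y)}\right]^2}, \quad y \in \mathbb{R},\ \lambda \in \mathbb{R},\ \mu \in \mathbb{R},\ b \in \mathbb{R}^+. \tag{2.5}$$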

We refer to the distribution in Eq. (2.5) as the truncated-logistic-skew Laplace (TLSL) distribution with parameters $\mu$, $b$ and $\lambda$. In Figure 2, we plot the pdfs of the TLSL distribution for fixed $\mu$ and $b$ and different values of the parameter $\lambda$.

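3. If $X \sim \text{Cauchy}(\mu, \xi)$, i.e., $f_X(y) = \left\{\pi\xi\left[1+\left(\frac{y-\mu}{\xi}\right)^2\right]\right\}^{-1}$ and $F_X(y) = \frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{y-\mu}{\xi}\right)$, $y \in \mathbb{R}$, $\mu \in \mathbb{R}$, and $\xi > 0$, in Definition 1, then Eq. (2.1) gives the pdf of the random variable $Y$ as

$$f_Y(y;\mu,\xi,\lambda) = \left[\frac{2(1+e^{-\lambda})}{1-e^{-\lambda}}\right] \times \frac{\lambda \exp\left(-\lambda\left[\frac{1}{2}+\frac{1}{\pi}\arctan\left(\frac{y-\mu}{\xi}\right)\right]\right)}{\pi\xi\left[1+\left(\frac{y-\mu}{\xi}\right)^2\right]\left\{1+\exp\left(-\lambda\left[\frac{1}{2}+\frac{1}{\pi}\arctan\left(\frac{y-\mu}{\xi}\right)\right]\right)\right\}^2}, \quad y \in \mathbb{R},\ \lambda \in \mathbb{R},\ \mu \in \mathbb{R},\ \xi \in \mathbb{R}^+. \tag{2.6}$$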

We refer to the distribution in Eq. (2.6) as the truncated-logistic-skew Cauchy (TLSC) distribution with parameters $\mu$, $\xi$ and $\lambda$. In Figure 3, we plot the pdfs of the TLSC distribution for fixed $\mu$ and $\xi$ and different values of the parameter $\lambda$.

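4. If $X \sim \text{Logistic}(\mu, s)$, i.e., $f_X(y) = \frac{\exp\left(-\frac{y-\mu}{s}\right)}{s\left[1+\exp\left(-\frac{y-\mu}{s}\right)\right]^2}$ and $F_X(y) = \frac{1}{1+e^{-(y-\mu)/s}}$, $y \in \mathbb{R}$, $\mu \in \mathbb{R}$, and $s > 0$, in Definition 1, then Eq. (2.1) gives the pdf of the random variable $Y$ as

$$f_Y(y;\mu,s,\lambda) = \left[\frac{2(1+e^{-\lambda})}{1-e^{-\lambda}}\right] \times \left\{\frac{\lambda \exp\left(-\frac{y-\mu}{s}\right)\exp\left(-\frac{\lambda}{1+e^{-(y-\mu)/s}}\right)}{s\left[1+\exp\left(-\frac{y-\mu}{s}\right)\right]^2\left[1+\exp\left(-\frac{\lambda}{1+e^{-(y-\mu)/s}}\right)\right]^2}\right\}, \quad y \in \mathbb{R},\ \lambda \in \mathbb{R},\ \mu \in \mathbb{R},\ s \in \mathbb{R}^+. \tag{2.7}$$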

We refer to the distribution in Eq. (2.7) as the truncated-logistic-skew logistic (TLSLG) distribution with parameters $\mu$, $s$ and $\lambda$. In Figure 4, we plot the pdfs of the TLSLG distribution for fixed $\mu$ and $s$ and different values of the parameter $\lambda$.
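As a numerical sanity check on the special cases above, the following Python sketch evaluates Eq. (2.1) for a generic baseline and integrates the resulting density with the trapezoidal rule. It is a minimal illustration under stated assumptions: the helper names (`tlss_pdf`, `normal_base`, `logistic_base`, `trapezoid`), the parameter values and the integration ranges are all illustrative choices, not part of the paper.

```python
import math
from statistics import NormalDist


def tlss_pdf(y, lam, base_pdf, base_cdf):
    """Eq. (2.1) for a baseline with density base_pdf and cdf base_cdf."""
    const = 2.0 * (1.0 + math.exp(-lam)) / (1.0 - math.exp(-lam))
    e = math.exp(-lam * base_cdf(y))
    return const * lam * base_pdf(y) * e / (1.0 + e) ** 2


def normal_base(mu, sigma):
    """Density and cdf of N(mu, sigma^2), giving the TLSN case of Eq. (2.4)."""
    dist = NormalDist(mu, sigma)
    return dist.pdf, dist.cdf


def logistic_base(mu, s):
    """Density and cdf of Logistic(mu, s), giving the TLSLG case of Eq. (2.7)."""
    def cdf(y):
        return 1.0 / (1.0 + math.exp(-(y - mu) / s))

    def pdf(y):
        p = cdf(y)
        return p * (1.0 - p) / s  # equals exp(-t) / (s * (1 + exp(-t))**2), t = (y - mu) / s

    return pdf, cdf


def trapezoid(func, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    inner = sum(func(a + i * h) for i in range(1, n))
    return h * (0.5 * func(a) + inner + 0.5 * func(b))


if __name__ == "__main__":
    pdf, cdf = normal_base(1.0, 2.0)
    print(trapezoid(lambda y: tlss_pdf(y, 2.0, pdf, cdf), -19.0, 21.0))
```

Passing the baseline as a pair of callables keeps one implementation of Eq. (2.1) for all members of the family; only the baseline pdf and cdf change.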

## 3 Structural Properties of the TLSS class of distributions

In this section, we study some important structural properties of the proposed class of skew-symmetric distributions.

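Result 1: Moment generating function and characteristic function: Let $M_{i:n}(t)$ and $\phi_{i:n}(t)$ denote the moment generating function (mgf) and the characteristic function (chf) of the $i$-th order statistic $X_{i:n}$ of a random sample of size $n$ from $F_X$, where $1 \le i \le n$. Then, the mgf and chf of $Y \sim \text{TLSS}(\lambda)$ can be expressed as

$$E[\exp(tY)] = \frac{2\lambda(1+e^{-\lambda})}{1-e^{-\lambda}} \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} (-1)^{j+k}(j+1)\,\frac{[\lambda(j+1)]^k}{k!\,(k+1)}\, M_{k+1:k+1}(t)$$

and

$$E[\exp(itY)] = \frac{2\lambda(1+e^{-\lambda})}{1-e^{-\lambda}} \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} (-1)^{j+k}(j+1)\,\frac{[\lambda(j+1)]^k}{k!\,(k+1)}\, \phi_{k+1:k+1}(t),$$

respectively.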

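Result 2: Suppose $Y \sim \text{TLSS}(\lambda)$. If $E(|X|^r)$ exists for any $r > 0$, then $E(|Y|^r)$ also exists.

Proof. Note that

$$\begin{aligned}
E(|Y|^r) &= \int_{-\infty}^{\infty} \left(\frac{2(1+e^{-\lambda})}{1-e^{-\lambda}}\right)\left[\frac{\lambda f_X(y)\, e^{-\lambda F_X(y)}}{\{1+e^{-\lambda F_X(y)}\}^2}\right]|y|^r\,dy \\
&= \left(\frac{2(1+e^{-\lambda})}{1-e^{-\lambda}}\right)\lambda \int_{-\infty}^{\infty} |y|^r \left\{\frac{e^{-\lambda F_X(y)}}{[1+e^{-\lambda F_X(y)}]^2}\right\} f_X(y)\,dy \\
&= \frac{2\lambda(1+e^{-\lambda})}{1-e^{-\lambda}}\, E\left(|X|^r \left\{\frac{e^{-\lambda F_X(X)}}{[1+e^{-\lambda F_X(X)}]^2}\right\}\right).
\end{aligned} \tag{3.1}$$

For any real $\lambda$ and $r > 0$, we have

$$|X|^r\, \frac{e^{-\lambda F_X(X)}}{[1+e^{-\lambda F_X(X)}]^2} \le |X|^r, \tag{3.2}$$

since $e^{-\lambda F_X(X)}/[1+e^{-\lambda F_X(X)}]^2$ is always less than 1. Thus, $E(|Y|^r) < \infty$ whenever $E(|X|^r) < \infty$.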
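The bound used in the proof can be made explicit. The following worked step is a sketch that is not stated in the source; it only uses the identity $e^{-a}/(1+e^{-a})^2 = 1/[4\cosh^2(a/2)]$, with $u = F_X(X) \in [0,1]$:

$$\frac{e^{-\lambda u}}{\left(1+e^{-\lambda u}\right)^2} = \frac{1}{4\cosh^2(\lambda u/2)} \le \frac{1}{4}
\quad\Longrightarrow\quad
E(|Y|^r) \le \frac{\lambda(1+e^{-\lambda})}{2(1-e^{-\lambda})}\, E(|X|^r).$$

So the $r$-th absolute moment of $Y$ is bounded by a constant multiple of that of $X$, where the constant depends only on $\lambda$.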

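Result 3: Alternative expression for $E(Y^r)$: Let $X_{i:n}$ denote the $i$-th order statistic from a random sample of size $n$ from $F_X$ and $Y \sim \text{TLSS}(\lambda)$. If the condition of Result 2 holds, then

$$E(Y^r) = \frac{2\lambda(1+e^{-\lambda})}{1-e^{-\lambda}} \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} (-1)^{j+k}(j+1)\cdot\frac{[\lambda(j+1)]^k}{k!}\cdot\frac{E\left[X^r_{k+1:k+1}\right]}{k+1}. \tag{3.3}$$

Proof. The $r$-th moment of the random variable $Y$ can be expressed as

$$\begin{aligned}
E(Y^r) &= \left(\frac{2(1+e^{-\lambda})}{1-e^{-\lambda}}\right)\lambda \int_{-\infty}^{\infty} y^r \left[\frac{f_X(y)\, e^{-\lambda F_X(y)}}{\{1+e^{-\lambda F_X(y)}\}^2}\right]dy \\
&= \frac{2\lambda(1+e^{-\lambda})}{1-e^{-\lambda}} \int_{-\infty}^{\infty} y^r f_X(y)\, e^{-\lambda F_X(y)} \left[\sum_{j=0}^{\infty} (-1)^{j}(j+1)\, e^{-\lambda j F_X(y)}\right]dy \\
&= \frac{2\lambda(1+e^{-\lambda})}{1-e^{-\lambda}} \sum_{j=0}^{\infty} (-1)^{j}(j+1) \int_{-\infty}^{\infty} y^r f_X(y)\, e^{-\lambda (j+1) F_X(y)}\,dy \\
&= \frac{2\lambda(1+e^{-\lambda})}{1-e^{-\lambda}} \sum_{j=0}^{\infty} (-1)^{j}(j+1) \int_{-\infty}^{\infty} y^r f_X(y) \left[\sum_{k=0}^{\infty} (-1)^k\, \frac{\{\lambda(j+1)\}^k}{k!}\, F_X^k(y)\right]dy \\
&= \frac{2\lambda(1+e^{-\lambda})}{1-e^{-\lambda}} \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} (-1)^{j+k}(j+1)\,\frac{[\lambda(j+1)]^k}{k!} \int_{-\infty}^{\infty} y^r f_X(y) F_X^k(y)\,dy \\
&= \frac{2\lambda(1+e^{-\lambda})}{1-e^{-\lambda}} \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} (-1)^{j+k}(j+1)\,\frac{[\lambda(j+1)]^k}{k!\,(k+1)}\, E\left(X^r_{k+1:k+1}\right). \quad \square
\end{aligned} \tag{3.4}$$

Here the second line uses the expansion $(1+x)^{-2} = \sum_{j=0}^{\infty} (-1)^j (j+1) x^j$ with $x = e^{-\lambda F_X(y)}$, and the last line uses the fact that $(k+1) f_X(y) F_X^k(y)$ is the pdf of $X_{k+1:k+1}$.

This alternative expression of $E(Y^r)$ can be used to obtain the mgf and chf discussed in Result 1.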

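Result 5: Tail behavior property of TLSS($\lambda$): First, note that $Y \sim \text{TLSS}(\lambda)$ and $X$ is a symmetric random variable about zero with the cdf and pdf $F_X$ and $f_X$, respectively. Then, the tails of $Y$ have the same behavior as the tails of $X$ because

$$\begin{aligned}
f_Y(y) &\sim \frac{2\lambda(1+e^{-\lambda})}{1-e^{-\lambda}}\cdot\frac{e^{-\lambda}}{(1+e^{-\lambda})^2}\, f_X(y) = \frac{2\lambda e^{-\lambda}}{1-e^{-2\lambda}}\, f_X(y), && \text{as } y \to +\infty, \\
f_Y(y) &\sim \frac{2\lambda(1+e^{-\lambda})}{1-e^{-\lambda}}\cdot\frac{1}{4}\, f_X(y) = \frac{\lambda(1+e^{-\lambda})}{2(1-e^{-\lambda})}\, f_X(y), && \text{as } y \to -\infty, \\
F_Y(y) &\sim \frac{\lambda(1+e^{-\lambda})}{2(1-e^{-\lambda})}\, F_X(y), && \text{as } y \to -\infty, \\
\text{and } 1-F_Y(y) &\sim \frac{2\lambda e^{-\lambda}}{1-e^{-2\lambda}}\,(1-F_X(y)), && \text{as } y \to +\infty.
\end{aligned}$$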
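The tail constants can be checked numerically, because Eq. (2.1) gives $f_Y(y)/f_X(y) = w(F_X(y))$, so the ratio tends to $w(1)$ as $y \to +\infty$ and to $w(0)$ as $y \to -\infty$. The sketch below is a minimal illustration assuming a standard normal baseline; `weight` and `density_ratio` are hypothetical names and $\lambda = 2$ is an arbitrary choice.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # assumed baseline X ~ N(0, 1)


def weight(u, lam):
    """w(u): the factor multiplying f_X(y) in Eq. (2.1), evaluated at u = F_X(y)."""
    const = 2.0 * (1.0 + math.exp(-lam)) / (1.0 - math.exp(-lam))
    e = math.exp(-lam * u)
    return const * lam * e / (1.0 + e) ** 2


def density_ratio(y, lam):
    """f_Y(y) / f_X(y) for the standard normal baseline."""
    return weight(STD.cdf(y), lam)


if __name__ == "__main__":
    lam = 2.0
    # Far in each tail, the ratio settles at the endpoint values of the weight.
    for y in (-8.0, -4.0, 0.0, 4.0, 8.0):
        print(y, density_ratio(y, lam))
    print("w(0) =", weight(0.0, lam), " w(1) =", weight(1.0, lam))
```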

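Result 6: Mode: The mode of the random variable TLSS($\lambda$) can be obtained by taking the first-order derivative of the density function and equating it to zero:

$$\frac{d}{dy} f_Y(y) = 0 \;\Rightarrow\; \lambda\left[\frac{2(1+e^{-\lambda})}{1-e^{-\lambda}}\right]\left\{\frac{\left[1+e^{-\lambda F_X(y)}\right]^2\left[f'_X(y)\, e^{-\lambda F_X(y)} - \lambda f_X^2(y)\, e^{-\lambda F_X(y)}\right] - A_1}{\left[1+e^{-\lambda F_X(y)}\right]^4}\right\} = 0,$$

where $A_1 = -2\lambda f_X^2(y)\, e^{-2\lambda F_X(y)}\left[1+e^{-\lambda F_X(y)}\right]$. After some algebraic simplification we obtain the following equation:

$$\begin{aligned}
&\left[1+e^{-\lambda F_X(y)}\right]\left[f'_X(y) - \lambda f_X^2(y)\right] + 2\lambda f_X^2(y)\, e^{-\lambda F_X(y)} = 0 \\
\Rightarrow\;& \left[1+e^{-\lambda F_X(y)}\right] f'_X(y) - \lambda f_X^2(y)\left[1-e^{-\lambda F_X(y)}\right] = 0 \\
\Rightarrow\;& \frac{f'_X(y)}{f_X^2(y)} = \frac{\lambda\left[1-e^{-\lambda F_X(y)}\right]}{1+e^{-\lambda F_X(y)}}.
\end{aligned} \tag{3.5}$$

The roots of Eq. (3.5) are the modes of the random variable TLSS($\lambda$). Note that the roots are to the left (right) of zero for (). The root of Eq. (3.5), say $y_0$, corresponds to a maximum if

$$\frac{d}{dy}\left[\frac{d}{dy} f_Y(y)\right]\Bigg|_{y=y_0} < 0 \;\Leftrightarrow\; f''_X(y_0)\left[1+e^{-\lambda F_X(y_0)}\right] - \lambda^2 f_X^3(y_0)\, e^{-\lambda F_X(y_0)} < \lambda f_X(y_0) f'_X(y_0)\left\{2 - e^{-\lambda F_X(y_0)}\right\}.$$

Similarly, the root of Eq. (3.5), $y_0$, corresponds to a minimum if

$$f''_X(y_0)\left[1+e^{-\lambda F_X(y_0)}\right] - \lambda^2 f_X^3(y_0)\, e^{-\lambda F_X(y_0)} > \lambda f_X(y_0) f'_X(y_0)\left\{2 - e^{-\lambda F_X(y_0)}\right\}.$$

The root of Eq. (3.5) corresponds to an inflection point if

$$f''_X(y_0)\left[1+e^{-\lambda F_X(y_0)}\right] - \lambda^2 f_X^3(y_0)\, e^{-\lambda F_X(y_0)} = \lambda f_X(y_0) f'_X(y_0)\left\{2 - e^{-\lambda F_X(y_0)}\right\}.$$

The mode corresponding to a maximum is unique if $f_X$ satisfies

$$f'_X(y) > f_X^2(y)\cdot\left\{\frac{\lambda\left(1-e^{-\lambda F_X(y)}\right)}{1+e^{-\lambda F_X(y)}}\right\} \quad \text{for all } y < y_0.$$

Similarly, the mode corresponding to a minimum is unique if $f_X$ satisfies

$$f'_X(y) < f_X^2(y)\cdot\left\{\frac{\lambda\left(1-e^{-\lambda F_X(y)}\right)}{1+e^{-\lambda F_X(y)}}\right\} \quad \text{for all } y > y_0.$$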
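In practice the mode can also be located by maximizing the density directly instead of solving the first-order condition. The sketch below is a minimal illustration for the TLSN case with $\mu = 0$ and $\sigma = 1$. It assumes the density has a single interior maximum on the search interval, which is what makes ternary search valid; the names `tlsn_pdf` and `find_mode`, the search interval and the value of $\lambda$ are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def tlsn_pdf(z, lam):
    """Eq. (2.4) with mu = 0 and sigma = 1."""
    const = 2.0 * (1.0 + math.exp(-lam)) / (1.0 - math.exp(-lam))
    e = math.exp(-lam * STD.cdf(z))
    return const * lam * STD.pdf(z) * e / (1.0 + e) ** 2


def find_mode(lam, lo=-6.0, hi=6.0, iters=200):
    """Ternary search for the maximizer of tlsn_pdf on [lo, hi].

    Assumes the density is unimodal on the interval.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if tlsn_pdf(m1, lam) < tlsn_pdf(m2, lam):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    mode = find_mode(2.0)
    print(mode, tlsn_pdf(mode, 2.0))
```

For other members of the family only `tlsn_pdf` needs to change; a heavy-tailed baseline may call for a wider search interval.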

## 4 Generating Random Variates from the TLSS class of distributions

In this section, we discuss the generation of random variates from the TLSS distribution based on the inverse transform method and an acceptance-rejection method. Since the cdf in Eq. (2.2) of a random variable $Y$ that follows the TLSS distribution is continuous, the cdf is invertible, with the inverse cdf presented in Eq. (2.3). Based on the inverse transform method, a random variate from the TLSS distribution with specified $\lambda$ and $F_X$ can be generated by the following steps:

• Generate a random variate $u$ from the uniform distribution on (0, 1), i.e., $u \sim U(0,1)$.

• Obtain the random variate $y$ by solving $F_Y(y;\lambda) = u$, i.e., $y = F_Y^{-1}(u;\lambda)$, where $F_Y^{-1}$ is presented in Eq. (2.3).

In general, there is no closed-form solution for the equation $F_Y(y;\lambda) = u$ and hence a numerical method is required to solve this non-linear equation in order to obtain a random variate that follows the TLSS distribution. To avoid using a numerical method, we consider the acceptance-rejection method with $f_X$ as the proposal distribution. The acceptance-rejection method provides an alternative way to generate $Y$ if $f_X$ is a density that can easily be simulated from. The following acceptance-rejection algorithm can be used to generate $Y$:

• Generate $X \sim f_X$ as a proposal.

• Generate $U \sim U(0,1)$.
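Both sampling schemes can be sketched in Python. This is a hedged illustration, not the authors' code. It assumes a standard normal baseline and inverts Eq. (2.2) directly via $F_Y(y;\lambda) = \tanh(\lambda F_X(y)/2)/\tanh(\lambda/2)$. It also supplies an acceptance step that the text above does not spell out: a proposal $x$ is accepted when $U \le f_Y(x)/[M f_X(x)]$ with $M = \sup_x f_Y(x)/f_X(x)$, the standard acceptance-rejection rule. Since $f_Y/f_X = w(F_X)$ is largest as $F_X(x) \to 0$, the acceptance ratio simplifies to $4e^{-\lambda F_X(x)}/[1+e^{-\lambda F_X(x)}]^2$. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # assumed baseline: X ~ N(0, 1)


def sample_inverse(lam, rng):
    """Inverse transform: draw u ~ U(0, 1) and solve F_Y(y; lam) = u using Eq. (2.2)."""
    u = rng.random()
    while u == 0.0:  # keep u strictly inside (0, 1)
        u = rng.random()
    p = 2.0 * math.atanh(u * math.tanh(lam / 2.0)) / lam  # p = F_X(y)
    p = min(p, 1.0 - 2.0 ** -53)  # guard against rounding up to exactly 1
    return STD.inv_cdf(p)


def sample_accept_reject(lam, rng):
    """Acceptance-rejection with f_X as the proposal density."""
    while True:
        x = rng.gauss(0.0, 1.0)  # proposal X ~ f_X
        u = rng.random()         # U ~ U(0, 1)
        e = math.exp(-lam * STD.cdf(x))
        # Acceptance ratio f_Y(x) / (M * f_X(x)), where M is the supremum of f_Y / f_X.
        if u <= 4.0 * e / (1.0 + e) ** 2:
            return x


if __name__ == "__main__":
    rng = random.Random(2024)
    lam = 2.0
    draws_inv = [sample_inverse(lam, rng) for _ in range(5000)]
    draws_ar = [sample_accept_reject(lam, rng) for _ in range(5000)]
    print(sum(draws_inv) / len(draws_inv), sum(draws_ar) / len(draws_ar))
```

The expected acceptance probability of the second sampler is $1/M = (2/\lambda)\tanh(\lambda/2)$, so it becomes less efficient as $|\lambda|$ grows.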

## 5 Estimation of Model Parameters

### 5.1 Maximum Likelihood Estimators and Fisher Information

Suppose that $y_1, y_2, \ldots, y_n$ is a random sample of size $n$ from the distribution with pdf in Eq. (2.1) and $\boldsymbol{\theta}$ is the parameter vector of the symmetric distribution $F_X$. Then the log-likelihood function can be written as

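$$\begin{aligned}
\ln L(\lambda, \boldsymbol{\theta}) &= \sum_{i=1}^{n} \ln f(y_i; \lambda, \boldsymbol{\theta}) \\
&= n\ln 2 + n\ln\lambda + n\ln(1+e^{-\lambda}) - n\ln(1-e^{-\lambda}) \\
&\quad + \sum_{i=1}^{n} \ln f_X(y_i; \boldsymbol{\theta}) - \lambda\sum_{i=1}^{n} F_X(y_i; \boldsymbol{\theta}) - 2\sum_{i=1}^{n} \ln\left[1+e^{-\lambda F_X(y_i; \boldsymbol{\theta})}\right].
\end{aligned} \tag{5.1}$$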

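The maximum likelihood estimator (MLE) of $(\lambda, \boldsymbol{\theta})$, denoted as $(\hat{\lambda}, \hat{\boldsymbol{\theta}})$, can be obtained by maximizing the log-likelihood function in Eq. (5.1) with respect to $(\lambda, \boldsymbol{\theta})$. Under standard regularity conditions, as $n \to \infty$, the distribution of $(\hat{\lambda}, \hat{\boldsymbol{\theta}})$ can be approximated by a multivariate normal distribution $N_{p+1}\left((\lambda, \boldsymbol{\theta}), J^{-1}(\hat{\lambda}, \hat{\boldsymbol{\theta}})\right)$, where $p$ is the number of parameters in the distribution $F_X$. Here, $J(\hat{\lambda}, \hat{\boldsymbol{\theta}})$ is the observed information matrix evaluated at the maximum likelihood estimate $(\hat{\lambda}, \hat{\boldsymbol{\theta}})$.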

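For illustrative purposes, we consider the TLSN distribution in Eq. (2.4) with the log-likelihood function

$$\begin{aligned}
\ell(\lambda,\mu,\sigma) &= \ln L(\lambda,\mu,\sigma) \\
&= n(\ln 2 + \ln\lambda) + n\ln(1+e^{-\lambda}) - n\ln(1-e^{-\lambda}) + \sum_{i=1}^{n} \ln\left\{\frac{\frac{1}{\sigma}\phi\left(\frac{y_i-\mu}{\sigma}\right) e^{-\lambda\Phi\left(\frac{y_i-\mu}{\sigma}\right)}}{\left\{1+e^{-\lambda\Phi\left(\frac{y_i-\mu}{\sigma}\right)}\right\}^2}\right\} \\
&= -n\ln\sigma + n(\ln 2 + \ln\lambda) + n\ln(1+e^{-\lambda}) - n\ln(1-e^{-\lambda}) \\
&\quad + \sum_{i=1}^{n} \ln\phi\left(\frac{y_i-\mu}{\sigma}\right) - \lambda\sum_{i=1}^{n} \Phi\left(\frac{y_i-\mu}{\sigma}\right) - 2\sum_{i=1}^{n} \ln\left\{1+e^{-\lambda\Phi\left(\frac{y_i-\mu}{\sigma}\right)}\right\}.
\end{aligned} \tag{5.2}$$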

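The MLEs of the parameters in the TLSN distribution, $\hat{\lambda}$, $\hat{\mu}$ and $\hat{\sigma}$, can be obtained by taking the partial derivatives of $\ell$ in Eq. (5.2) with respect to $\lambda$, $\mu$ and $\sigma$, respectively, and setting them to zero. Writing $z_i = (y_i - \mu)/\sigma$, we have the maximum likelihood equations:

$$\frac{\partial \ell}{\partial \lambda} = \frac{n}{\lambda} - \frac{2n e^{-\lambda}}{1-e^{-2\lambda}} - \sum_{i=1}^{n} \Phi(z_i) + 2\sum_{i=1}^{n} \frac{\Phi(z_i)\, e^{-\lambda\Phi(z_i)}}{1+e^{-\lambda\Phi(z_i)}} = 0, \tag{5.3}$$

$$\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma}\sum_{i=1}^{n} z_i + \frac{\lambda}{\sigma}\sum_{i=1}^{n} \phi(z_i) - \frac{2\lambda}{\sigma}\sum_{i=1}^{n} \frac{\phi(z_i)\, e^{-\lambda\Phi(z_i)}}{1+e^{-\lambda\Phi(z_i)}} = 0, \tag{5.4}$$

$$\frac{\partial \ell}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma}\left\{\sum_{i=1}^{n} z_i^2 + \lambda\sum_{i=1}^{n} z_i\,\phi(z_i) - 2\lambda\sum_{i=1}^{n} z_i\,\phi(z_i)\, e^{-\lambda\Phi(z_i)}\left[1+e^{-\lambda\Phi(z_i)}\right]^{-1}\right\} = 0. \tag{5.5}$$

Solving Eqs. (5.3)–(5.5) for $\lambda$, $\mu$ and $\sigma$ simultaneously gives the maximum likelihood estimates of $\lambda$, $\mu$ and $\sigma$, denoted as $\hat{\lambda}$, $\hat{\mu}$ and $\hat{\sigma}$, respectively. Here, the observed Fisher information matrix is given by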
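Before handing the likelihood equations to a root finder, it is worth confirming that the score agrees with the log-likelihood it is meant to differentiate. The sketch below is a minimal illustration, assuming $\lambda > 0$ and $\sigma > 0$: `tlsn_loglik` codes Eq. (5.2), and `tlsn_score` codes the partial derivatives obtained by differentiating Eq. (5.2) term by term. The function names and the toy data in the usage example are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def tlsn_loglik(lam, mu, sigma, data):
    """Log-likelihood of Eq. (5.2); assumes lam > 0 and sigma > 0."""
    n = len(data)
    total = n * (math.log(2.0) + math.log(lam) - math.log(sigma)
                 + math.log1p(math.exp(-lam))      # ln(1 + e^{-lam})
                 - math.log(-math.expm1(-lam)))    # ln(1 - e^{-lam})
    for y in data:
        z = (y - mu) / sigma
        cdf = STD.cdf(z)
        total += (math.log(STD.pdf(z)) - lam * cdf
                  - 2.0 * math.log1p(math.exp(-lam * cdf)))
    return total


def tlsn_score(lam, mu, sigma, data):
    """Partial derivatives of Eq. (5.2) with respect to lam, mu and sigma."""
    n = len(data)
    q = math.exp(-lam)
    d_lam = n / lam - 2.0 * n * q / (1.0 - q * q)
    d_mu = 0.0
    d_sigma = -n / sigma
    for y in data:
        z = (y - mu) / sigma
        cdf, pdf = STD.cdf(z), STD.pdf(z)
        e = math.exp(-lam * cdf)
        r = e / (1.0 + e)                        # e^{-lam Phi(z)} / (1 + e^{-lam Phi(z)})
        d_lam += -cdf + 2.0 * cdf * r
        g = z + lam * pdf - 2.0 * lam * pdf * r  # factor shared by the mu and sigma terms
        d_mu += g / sigma
        d_sigma += z * g / sigma
    return d_lam, d_mu, d_sigma


if __name__ == "__main__":
    toy = [0.3, -1.2, 0.8, 2.1, -0.4]  # made-up observations, for illustration only
    h = 1e-6
    # Compare the analytic derivative in lam with a central difference.
    numeric = (tlsn_loglik(1.5 + h, 0.2, 1.3, toy) - tlsn_loglik(1.5 - h, 0.2, 1.3, toy)) / (2 * h)
    print(tlsn_score(1.5, 0.2, 1.3, toy)[0], numeric)
```

A central-difference comparison like the one in the usage example is a cheap guard against sign or factor slips in hand-derived score equations.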

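$$J(\hat{\boldsymbol{\theta}}) = \begin{bmatrix} J_{\lambda\lambda} & J_{\lambda\mu} & J_{\lambda\sigma} \\ & J_{\mu\mu} & J_{\mu\sigma} \\ & & J_{\sigma\sigma} \end{bmatrix},$$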

where

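$$J_{\lambda\lambda} = -\frac{\partial^2 \ell}{\partial \lambda^2}\Bigg|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}} = \frac{n}{\lambda^2} - \frac{2n e^{-\lambda}\left(1+e^{-2\lambda}\right)}{\left(1-e^{-2\lambda}\right)^2} + 2\sum_{i=1}^{n} e^{-\lambda\Phi\left(\frac{y_i-\mu}{\sigma}\right)}\left[\frac{\Phi\left(\frac{y_i-\mu}{\sigma}\right)}{1+e^{-\lambda\Phi\left(\frac{y_i-\mu}{\sigma}\right)}}\right]^2,$$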
 Jμμ = −∂2ℓ∂μ2∣∣∣% \boldmathθ=\boldmath^θ = 1σ2⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩n∑i=1ϕ(1)(yi−μσ)[ϕ(yi−μσ)]2+λn∑i=1ϕ(1)(yi−μσ)⎫⎪ ⎪ ⎪⎬⎪ ⎪ ⎪⎭ −2σ2n∑i=1⎧⎨⎩[1+e−λΦ(yi−μσ)]−2e−λΦ(yi−μσ) ×[ϕ(1)(yi−μσ)+λϕ2(yi−μσ)+ϕ(1)(yi−μσ)e−λΦ(yi−μσ)]}, Jλμ = −∂2ℓ∂λ∂μ∣∣∣\boldmathθ=\boldmath^θ = −1σn∑i=1ϕ(yi−μσ) +2σn∑i=1ϕ(yi−μσ)[1+e−λΦ(yi−μσ)]−2Φ(yi−μσ)[1+2e−2λΦ(yi−μσ)], Jλσ = −∂2ℓ∂λ∂σ∣∣∣\boldmathθ=\boldmath^θ = −μσ2n∑i=1ϕ(yi−μσ)+2μσ2[n∑i=1ϕ(yi−μσ)e−λΦ(yi−μσ)[1+e−λΦ(yi−μσ)]−1 −2λn∑i=1ϕ(yi−μσ)[1+e−λΦ(yi−μσ)]−2Φ