# Multivariate Geometric Skew-Normal Distribution

## Abstract

Azzalini [3] introduced a skew-normal distribution of which the normal distribution is a special case. Recently, Kundu [9] introduced a geometric skew-normal distribution and showed that it has certain advantages over Azzalini's skew-normal distribution. In this paper we discuss the multivariate geometric skew-normal distribution. It can be used as an alternative to Azzalini's multivariate skew-normal distribution. We discuss different properties of the proposed distribution. It is observed that the joint probability density function of the multivariate geometric skew-normal distribution can take a variety of shapes. Several characterization results have been established. Generation from a multivariate geometric skew-normal distribution is quite simple, hence simulation experiments can be performed quite easily. The maximum likelihood estimators of the unknown parameters can be obtained quite conveniently using the expectation maximization (EM) algorithm. We perform some simulation experiments, and it is observed that the performance of the proposed EM algorithm is quite satisfactory. Further, the analyses of two data sets have been performed, and it is observed that the proposed methods and the model work very well.

Key Words and Phrases: Skew-normal distribution; moment generating function; infinitely divisible distribution; maximum likelihood estimators; EM algorithm; Fisher information matrix.

AMS 2000 Subject Classification: Primary 62F10; Secondary: 62H10

Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, Kanpur, Pin 208016, India.   e-mail: kundu@iitk.ac.in.

## 1 Introduction

Azzalini [3] proposed a class of three-parameter skew-normal distributions which includes the normal one. Azzalini's skew-normal (ASN) distribution has received considerable attention in the last two decades due to its flexibility and its applications in different fields. The probability density function (PDF) of the ASN distribution takes the following form:

$$f(x;\mu,\sigma,\lambda) = \frac{2}{\sigma}\,\phi\left(\frac{x-\mu}{\sigma}\right)\Phi\left(\frac{\lambda(x-\mu)}{\sigma}\right), \qquad -\infty < x < \infty,$$

where $\phi(\cdot)$ and $\Phi(\cdot)$ denote the standard normal PDF and standard normal cumulative distribution function (CDF), respectively. Here $-\infty < \mu < \infty$, $\sigma > 0$ and $-\infty < \lambda < \infty$ are known as the location, scale and skewness (or tilt) parameters, respectively. The ASN distribution has a unimodal PDF, and it can be either positively or negatively skewed depending on the skewness parameter. Arnold and Beaver [2] provided an interesting interpretation of this model in terms of hidden truncation. This model has been used quite effectively to analyze skewed data in different fields due to its flexibility.

Later, Azzalini and Dalla Valle [5] constructed a multivariate distribution with skew-normal marginals. From now on we call it Azzalini's multivariate skew-normal (AMSN) distribution, and it can be defined as follows. A random vector $\boldsymbol{Z}$ follows a $d$-dimensional AMSN distribution if it has the following PDF:

$$g(\boldsymbol{z}) = 2\,\phi_d(\boldsymbol{z};\boldsymbol{\Omega})\,\Phi(\boldsymbol{\alpha}^T\boldsymbol{z}), \qquad \boldsymbol{z} \in \mathbb{R}^d,$$

where $\phi_d(\boldsymbol{z};\boldsymbol{\Omega})$ denotes the PDF of the $d$-dimensional multivariate normal distribution with standardized marginals and correlation matrix $\boldsymbol{\Omega}$. We denote such a random vector as $\boldsymbol{Z} \sim \text{SN}_d(\boldsymbol{\Omega}, \boldsymbol{\alpha})$. Here the vector $\boldsymbol{\alpha}$ is known as the shape vector, and it can be easily seen that the PDF of the AMSN distribution is unimodal and can take different shapes. It has several interesting properties, and it has been used quite successfully to analyze several multivariate data sets in different areas because of its flexibility.

Although the ASN distribution is very flexible, it cannot be used to model moderate or heavy tail data; see for example Azzalini and Capitanio [4]. It is well known to be a thin-tailed distribution. Since the marginals of AMSN are ASN, multivariate heavy tail data cannot be modeled by using AMSN. For this reason, several other skewed distributions, often called skew-symmetric distributions, have been suggested in the literature using kernel functions other than the normal kernel and the same technique as Azzalini [3]. Depending on the kernel function, the resulting distribution can have moderate or heavy tail behavior. Among such distributions, the skew-$t$ distribution is quite commonly used in practice; it can produce a heavy-tailed distribution depending on the degrees of freedom of the associated $t$-distribution. It has a multivariate extension also. For a detailed discussion of different skew-symmetric distributions, the readers are referred to the excellent monograph by Azzalini and Capitanio [4].

Although the ASN model is a very flexible one-dimensional model with several interesting properties, it is well known that computing the maximum likelihood estimators (MLEs) of the unknown parameters of an ASN model is a challenging issue. Azzalini [3] has shown that there is a positive probability that the MLEs of the unknown parameters of an ASN model do not exist. If all the data points have the same sign, then the MLEs of the unknown parameters of the ASN model may not exist. The problem becomes more severe for the AMSN model, and it persists for other kernels as well.

Recently, the author [9] proposed a new three-parameter skewed distribution, of which the normal distribution is a special case. The proposed distribution can be obtained as a geometric sum of independent identically distributed (i.i.d.) normal random variables, and it is called the geometric skew-normal (GSN) distribution. It can be used quite effectively as an alternative to the ASN distribution. It is observed that the GSN distribution is very flexible, as its PDF can take different shapes depending on the parameter values. Moreover, the MLEs of the unknown parameters can be computed quite conveniently using the EM algorithm. It can be easily shown that the 'pseudo-log-likelihood' function has a unique maximum, which can be obtained in explicit form. Several interesting properties of the GSN distribution have also been developed by Kundu [9].

The main aim of this paper is to consider the multivariate geometric skew-normal (MGSN) distribution, develop its various properties and discuss different inferential issues. Several characterization results and dependence properties have also been established. It is observed that generation from an MGSN distribution is quite simple, hence simulation experiments can be performed quite conveniently. Note that the $d$-dimensional MGSN model has $d + d(d+1)/2 + 1$ unknown parameters. The MLEs of the unknown parameters can be obtained by solving a set of non-linear equations. We propose to use the EM algorithm, and it is observed that the 'pseudo-log-likelihood' function has a unique maximum, which can be obtained in explicit form. Hence, the implementation of the EM algorithm is quite simple, and the algorithm is very efficient. We perform some simulation experiments to see the performance of the proposed EM algorithm, and the performance is quite satisfactory. We also perform the analyses of two data sets to illustrate how the proposed methods can be used in practice. It is observed that the proposed methods and the model work quite satisfactorily.

The main motivation to introduce the MGSN distribution can be stated as follows. Although there are several skewed distributions available in one dimension, the same is not true in higher dimensions. The proposed MGSN distribution is a very flexible multivariate distribution which can produce a variety of shapes. The joint PDF can be unimodal or multimodal, and the marginals can have heavy tails depending on the parameters. It has several interesting statistical properties. Computation of the MLEs can be performed in a very simple manner even in high dimensions. Hence, if it is known that the data are obtained from a multivariate skewed distribution, the proposed model can be used for analysis purposes. Generating random samples from an MGSN distribution is quite simple, hence any simulation experiment related to this distribution can be performed quite conveniently. Further, it is observed that in one of our data examples the MLEs of the AMSN distribution do not exist, whereas the MLEs of the MGSN distribution exist. Hence, in certain cases the implementation of the MGSN distribution is easier than that of the AMSN distribution. The proposed MGSN distribution provides a practitioner with a new multivariate skewed distribution for analyzing multivariate skewed data.

The rest of the paper is organized as follows. In Section 2, we first briefly describe the univariate GSN model and discuss some of its properties, and then we describe the MGSN model. Different properties are discussed in Section 3. In Section 4, we discuss the implementation of the EM algorithm and some testing of hypotheses problems. Simulation results are presented in Section 5. The analyses of two data sets are presented in Section 6, and finally we conclude the paper in Section 7.

## 2 GSN and MGSN Distributions

We use the following notation in this paper. A normal random variable with mean $\mu$ and variance $\sigma^2$ will be denoted by $N(\mu, \sigma^2)$. A $d$-variate normal random variable with mean vector $\boldsymbol{\mu}$ and dispersion matrix $\boldsymbol{\Sigma}$ will be denoted by $N_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$; the corresponding PDF and CDF at the point $\boldsymbol{x}$ will be denoted by $\phi_d(\boldsymbol{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\Phi_d(\boldsymbol{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$, respectively. A geometric random variable with parameter $p$ will be denoted by $\text{GE}(p)$; it has the probability mass function (PMF) $P(N = n) = p(1-p)^{n-1}$ for $n = 1, 2, \ldots$.

### 2.1 GSN Distribution

Suppose $N \sim \text{GE}(p)$ and $\{X_i; i = 1, 2, \ldots\}$ are i.i.d. $N(\mu, \sigma^2)$ random variables. It is assumed that $N$ and the $X_i$'s are independently distributed. Then the random variable

$$X \stackrel{d}{=} \sum_{i=1}^{N} X_i$$

is known as a GSN random variable, and its distribution will be denoted by $\text{GSN}(\mu, \sigma, p)$. Here '$\stackrel{d}{=}$' means equal in distribution. The GSN distribution can be seen as one of the compound geometric distributions. The PDF of $X$ takes the following form:

$$f_X(x;\mu,\sigma,p) = \sum_{k=1}^{\infty} \frac{p}{\sigma\sqrt{k}}\,\phi\left(\frac{x-k\mu}{\sigma\sqrt{k}}\right)(1-p)^{k-1}.$$

When $\mu = 0$ and $\sigma = 1$, we say that $X$ has a standard GSN distribution, which will be denoted by $\text{GSN}(p)$.
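Since the terms of the series above decay geometrically, the PDF can be evaluated numerically by truncating the sum. The following Python sketch (the function name and the truncation point `kmax` are our own choices, not from the paper) implements the series; when $p = 1$ only the $k = 1$ term survives, and the expression reduces to the $N(\mu, \sigma^2)$ density.

```python
import math

def gsn_pdf(x, mu, sigma, p, kmax=500):
    """PDF of GSN(mu, sigma, p) via the truncated series
    f(x) = sum_{k>=1} (p / (sigma*sqrt(k))) * phi((x - k*mu) / (sigma*sqrt(k))) * (1-p)^(k-1)."""
    total = 0.0
    for k in range(1, kmax + 1):
        s = sigma * math.sqrt(k)
        z = (x - k * mu) / s
        # k-th mixture component: N(k*mu, k*sigma^2) density with weight p*(1-p)^(k-1)
        total += p * (1 - p) ** (k - 1) * math.exp(-0.5 * z * z) / (s * math.sqrt(2 * math.pi))
    return total
```

The truncation error is bounded by the tail of the geometric weights, so `kmax` can be chosen conservatively without much cost.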

The standard GSN distribution is symmetric about 0 and unimodal, but the PDF of $\text{GSN}(\mu, \sigma, p)$ can take different shapes: it can be unimodal or multimodal depending on the values of $\mu$, $\sigma$ and $p$. The hazard function is always an increasing function. If $X \sim \text{GSN}(\mu, \sigma, p)$, then the moment generating function (MGF) of $X$ becomes

$$M_X(t) = \frac{p\,e^{\mu t + \frac{\sigma^2 t^2}{2}}}{1 - (1-p)\,e^{\mu t + \frac{\sigma^2 t^2}{2}}}, \qquad t \in A_1(\mu, \sigma, p), \tag{1}$$

where

$$A_1(\mu,\sigma,p) = \left\{t \in \mathbb{R} : (1-p)\,e^{\mu t + \frac{\sigma^2 t^2}{2}} < 1\right\} = \left\{t \in \mathbb{R} : 2\mu t + \sigma^2 t^2 + 2\ln(1-p) < 0\right\}.$$
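The set $A_1(\mu, \sigma, p)$ is simply the open interval between the two real roots of the quadratic $\sigma^2 t^2 + 2\mu t + 2\ln(1-p)$; since $\ln(1-p) < 0$ for $0 < p < 1$, the roots are real and of opposite sign, so $A_1$ always contains a neighbourhood of $0$. A minimal sketch (the function name is ours) computes the endpoints:

```python
import math

def gsn_mgf_domain(mu, sigma, p):
    """Endpoints of A_1(mu, sigma, p) = {t : sigma^2 t^2 + 2 mu t + 2 ln(1-p) < 0}.
    For 0 < p < 1 the constant term 2 ln(1-p) is negative, so the two roots are
    real and of opposite sign, and the MGF exists on the open interval between them."""
    c = 2.0 * math.log(1.0 - p)
    disc = math.sqrt(mu * mu - sigma * sigma * c)  # discriminant is positive since c < 0
    return ((-mu - disc) / sigma**2, (-mu + disc) / sigma**2)
```

For example, with $\mu = 0$, $\sigma = 1$ and $p = 1 - e^{-1}$, the interval is $(-\sqrt{2}, \sqrt{2})$; at either endpoint $(1-p)\,e^{\mu t + \sigma^2 t^2/2}$ equals exactly $1$.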

The corresponding cumulant generating function (CGF) of $X$ is

$$K_X(t) = \ln M_X(t) = \ln p + \mu t + \frac{\sigma^2 t^2}{2} - \ln\left(1 - (1-p)\,e^{\mu t + \frac{\sigma^2 t^2}{2}}\right). \tag{2}$$

From (2), the mean, variance, skewness and kurtosis can be easily obtained as

$$E(X) = \frac{\mu}{p}, \tag{3}$$

$$V(X) = \frac{\sigma^2 p + \mu^2(1-p)}{p^2}, \tag{4}$$

$$\gamma_1 = \frac{(1-p)\left(\mu^3(2-p) + 3\mu\sigma^2 p\right)}{\left(p\sigma^2 + \mu^2(1-p)\right)^{3/2}},$$

$$\gamma_2 = \frac{\mu^4(1-p)(p^2-6p+6) - 2\mu^2\sigma^2 p(1-p)(p^2+3p-6) + 3\sigma^4 p^2}{\left(p\sigma^2 + \mu^2(1-p)\right)^2},$$

respectively. It is clear from the expressions (3) and (4) that as $p \to 0$, both $E(X)$ and $V(X)$ diverge when $\mu \neq 0$. This indicates that the GSN model can be used to model heavy tail data. It has been shown that the GSN law is infinitely divisible, and an efficient EM algorithm has been suggested to compute the MLEs of the unknown parameters.
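As a quick sanity check on (3) and (4), Monte Carlo estimates from the geometric-sum representation can be compared against the closed forms. The sketch below (function name, seed and parameter values are our own choices) draws directly from the definition $X = \sum_{i=1}^N X_i$:

```python
import random
import statistics

def rgsn(mu, sigma, p, rng=random):
    """One draw from GSN(mu, sigma, p): a GE(p) number of i.i.d. N(mu, sigma^2) terms."""
    n = 1
    while rng.random() >= p:  # N ~ GE(p), supported on {1, 2, ...}
        n += 1
    return sum(rng.gauss(mu, sigma) for _ in range(n))

random.seed(2024)
mu, sigma, p = 1.0, 1.0, 0.25
sample = [rgsn(mu, sigma, p) for _ in range(200000)]

mean_th = mu / p                                   # (3): E(X) = mu / p = 4
var_th = (sigma**2 * p + mu**2 * (1 - p)) / p**2   # (4): V(X) = 16
# statistics.fmean(sample) and statistics.variance(sample) should be close
# to mean_th and var_th, respectively.
```

With $p = 0.25$ the variance is already dominated by the $\mu^2(1-p)/p^2$ term, illustrating how small $p$ inflates both moments.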

### 2.2 MGSN Distribution

A $d$-variate MGSN distribution can be defined as follows. Suppose $N \sim \text{GE}(p)$, $\{\boldsymbol{X}_i; i = 1, 2, \ldots\}$ are i.i.d. $N_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ random vectors, and all the random variables are independently distributed. Define

$$\boldsymbol{X} \stackrel{d}{=} \sum_{i=1}^{N} \boldsymbol{X}_i; \tag{5}$$

then $\boldsymbol{X}$ is said to have a $d$-variate geometric skew-normal distribution with parameters $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$ and $p$, and its distribution will be denoted by $\text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$. If $\boldsymbol{X} \sim \text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$, then the CDF and PDF of $\boldsymbol{X}$ become

$$F_{\boldsymbol{X}}(\boldsymbol{x};\boldsymbol{\mu},\boldsymbol{\Sigma},p) = \sum_{k=1}^{\infty} p(1-p)^{k-1}\,\Phi_d(\boldsymbol{x}; k\boldsymbol{\mu}, k\boldsymbol{\Sigma})$$

and

$$f_{\boldsymbol{X}}(\boldsymbol{x};\boldsymbol{\mu},\boldsymbol{\Sigma},p) = \sum_{k=1}^{\infty} p(1-p)^{k-1}\,\phi_d(\boldsymbol{x}; k\boldsymbol{\mu}, k\boldsymbol{\Sigma}) = \sum_{k=1}^{\infty} \frac{p(1-p)^{k-1}}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}\,k^{d/2}}\, e^{-\frac{1}{2k}(\boldsymbol{x}-k\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-k\boldsymbol{\mu})},$$

respectively. Here $\Phi_d(\cdot; k\boldsymbol{\mu}, k\boldsymbol{\Sigma})$ and $\phi_d(\cdot; k\boldsymbol{\mu}, k\boldsymbol{\Sigma})$ denote the CDF and PDF, respectively, of a $d$-variate normal distribution with mean vector $k\boldsymbol{\mu}$ and dispersion matrix $k\boldsymbol{\Sigma}$.

If $\boldsymbol{\mu} = \boldsymbol{0}$ and $\boldsymbol{\Sigma} = \boldsymbol{I}$, we say that $\boldsymbol{X}$ is a standard $d$-variate MGSN random variable, and its distribution will be denoted by $\text{MGSN}_d(p)$. The PDF of $\text{MGSN}_d(p)$ is symmetric and unimodal for all values of $d$ and $p$, whereas the PDF of $\text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$ may not be symmetric, and it can be unimodal or multimodal depending on the parameter values. The MGF of the MGSN distribution can be obtained in explicit form. If $\boldsymbol{X} \sim \text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$, then the MGF of $\boldsymbol{X}$ is

$$M_{\boldsymbol{X}}(\boldsymbol{t}) = \frac{p\,e^{\boldsymbol{\mu}^T\boldsymbol{t} + \frac{1}{2}\boldsymbol{t}^T\boldsymbol{\Sigma}\boldsymbol{t}}}{1 - (1-p)\,e^{\boldsymbol{\mu}^T\boldsymbol{t} + \frac{1}{2}\boldsymbol{t}^T\boldsymbol{\Sigma}\boldsymbol{t}}}, \qquad \boldsymbol{t} \in A_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p), \tag{6}$$

where

$$A_d(\boldsymbol{\mu},\boldsymbol{\Sigma},p) = \left\{\boldsymbol{t} \in \mathbb{R}^d : (1-p)\,e^{\boldsymbol{\mu}^T\boldsymbol{t} + \frac{1}{2}\boldsymbol{t}^T\boldsymbol{\Sigma}\boldsymbol{t}} < 1\right\} = \left\{\boldsymbol{t} \in \mathbb{R}^d : \boldsymbol{\mu}^T\boldsymbol{t} + \frac{1}{2}\boldsymbol{t}^T\boldsymbol{\Sigma}\boldsymbol{t} + \ln(1-p) < 0\right\}.$$

Further, generating samples from an MGSN distribution is very simple. The following algorithm can be used to generate samples from an MGSN random variable.

Algorithm 1:

• Step 1: Generate $m$ from a $\text{GE}(p)$ distribution.

• Step 2: Generate $\boldsymbol{X} \sim N_d(m\boldsymbol{\mu}, m\boldsymbol{\Sigma})$.
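Algorithm 1 translates directly into a vectorized sampler. The NumPy sketch below (the function name and defaults are ours) draws all the geometric counts first and then scales a single batch of multivariate normal draws, using the fact that $\boldsymbol{X} \mid M = m \sim N_d(m\boldsymbol{\mu}, m\boldsymbol{\Sigma})$:

```python
import numpy as np

def rmgsn(n, mu, Sigma, p, seed=0):
    """Draw n samples from MGSN_d(mu, Sigma, p) via Algorithm 1:
    Step 1: m ~ GE(p); Step 2: X | m ~ N_d(m * mu, m * Sigma)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    d = mu.shape[0]
    m = rng.geometric(p, size=n)                            # GE(p) on {1, 2, ...}
    z = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    # shift by m*mu and scale by sqrt(m): turns N(0, Sigma) into N(m*mu, m*Sigma)
    return m[:, None] * mu + np.sqrt(m)[:, None] * z
```

The sample mean of a large draw should then be close to $\boldsymbol{\mu}/p$, in agreement with the moment formulas of Section 3.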

In Figure 1 we provide the joint PDF of a bivariate geometric skew-normal distribution for four different sets of parameter values of $p$, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$.

## 3 Properties

In this section we discuss different properties of the MGSN distribution. We use the following notation:

$$\boldsymbol{X} = \begin{pmatrix} \boldsymbol{X}_1 \\ \boldsymbol{X}_2 \end{pmatrix}, \qquad \boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}. \tag{7}$$

Here the vectors $\boldsymbol{X}_1$ and $\boldsymbol{\mu}_1$ are of order $h \times 1$ each, and the matrix $\boldsymbol{\Sigma}_{11}$ is of order $h \times h$. The rest of the quantities are defined so that they are compatible. The following result provides the marginals of the MGSN distribution.

Result 1: If $\boldsymbol{X} \sim \text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$ and $\boldsymbol{X}$, $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$ are partitioned as in (7), then

$$\boldsymbol{X}_2 \sim \text{MGSN}_{d-h}(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22}, p).$$

Proof: The result easily follows from the MGF of MGSN as provided in (6).

We further have the following results, similar to the multivariate normal distribution. The result may be used for simultaneously testing a set of linear hypotheses on the parameter vector, or it may have some independent interest also; see for example Rao [12].

Theorem 1: If $\boldsymbol{X} \sim \text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$, then $\boldsymbol{Z} = \boldsymbol{D}\boldsymbol{X} \sim \text{MGSN}_s(\boldsymbol{D}\boldsymbol{\mu}, \boldsymbol{D}\boldsymbol{\Sigma}\boldsymbol{D}^T, p)$, where $\boldsymbol{D}$ is an $s \times d$ matrix of rank $s \le d$.

Proof: The MGF of the random vector $\boldsymbol{Z} = \boldsymbol{D}\boldsymbol{X}$ is

$$M_{\boldsymbol{Z}}(\boldsymbol{t}) = E\left(e^{\boldsymbol{t}^T\boldsymbol{Z}}\right) = E\left(e^{\boldsymbol{t}^T\boldsymbol{D}\boldsymbol{X}}\right) = E\left(e^{(\boldsymbol{D}^T\boldsymbol{t})^T\boldsymbol{X}}\right) = \frac{p\,e^{(\boldsymbol{D}\boldsymbol{\mu})^T\boldsymbol{t} + \frac{1}{2}\boldsymbol{t}^T\boldsymbol{D}\boldsymbol{\Sigma}\boldsymbol{D}^T\boldsymbol{t}}}{1 - (1-p)\,e^{(\boldsymbol{D}\boldsymbol{\mu})^T\boldsymbol{t} + \frac{1}{2}\boldsymbol{t}^T\boldsymbol{D}\boldsymbol{\Sigma}\boldsymbol{D}^T\boldsymbol{t}}}, \qquad \boldsymbol{t} \in A_s^{\boldsymbol{D}},$$

where

$$A_s^{\boldsymbol{D}} = \left\{\boldsymbol{t} \in \mathbb{R}^s : (1-p)\,e^{(\boldsymbol{D}\boldsymbol{\mu})^T\boldsymbol{t} + \frac{1}{2}\boldsymbol{t}^T\boldsymbol{D}\boldsymbol{\Sigma}\boldsymbol{D}^T\boldsymbol{t}} < 1\right\} = \left\{\boldsymbol{t} \in \mathbb{R}^s : \ln(1-p) + (\boldsymbol{D}\boldsymbol{\mu})^T\boldsymbol{t} + \frac{1}{2}\boldsymbol{t}^T\boldsymbol{D}\boldsymbol{\Sigma}\boldsymbol{D}^T\boldsymbol{t} < 0\right\}.$$

Hence the result follows.

If $\boldsymbol{X} \sim \text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$, and if we denote $\boldsymbol{X} = (X_1, \ldots, X_d)^T$, $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_d)^T$ and $\boldsymbol{\Sigma} = ((\sigma_{ij}))$, then the moments and cumulants of $X_i$, for $i = 1, \ldots, d$, can be obtained from the MGF as follows:

$$E(X_i) = \left.\frac{\partial}{\partial t_i} M_{\boldsymbol{X}}(\boldsymbol{t})\right|_{\boldsymbol{t}=\boldsymbol{0}} = \frac{\mu_i}{p}, \tag{8}$$

$$E(X_i X_j) = \left.\frac{\partial^2}{\partial t_i\,\partial t_j} M_{\boldsymbol{X}}(\boldsymbol{t})\right|_{\boldsymbol{t}=\boldsymbol{0}} = \frac{p\sigma_{ij} + \mu_i\mu_j(2-p)}{p^2}.$$

Hence,

$$\text{Var}(X_i) = \frac{p\sigma_{ii} + \mu_i^2(1-p)}{p^2}, \tag{9}$$

$$\text{Cov}(X_i, X_j) = \frac{p\sigma_{ij} + \mu_i\mu_j(1-p)}{p^2}, \tag{10}$$

and

$$\text{Corr}(X_i, X_j) = \frac{p\sigma_{ij} + \mu_i\mu_j(1-p)}{\sqrt{p\sigma_{ii} + \mu_i^2(1-p)}\,\sqrt{p\sigma_{jj} + \mu_j^2(1-p)}}. \tag{11}$$

It is clear from (11) that the correlation between $X_i$ and $X_j$ for $i \neq j$ depends not only on $\sigma_{ij}$, but also on $\mu_i$ and $\mu_j$. For fixed $\sigma_{ii}$, $\sigma_{jj}$ and $\sigma_{ij}$, if $\mu_i \to \infty$ and $\mu_j \to \infty$, then $\text{Corr}(X_i, X_j) \to 1$, and if $\mu_i \to \infty$ and $\mu_j \to -\infty$, then $\text{Corr}(X_i, X_j) \to -1$. From (11) it also follows that if $\boldsymbol{X}$ is a standard $d$-variate MGSN random variable, i.e. $\mu_i = 0$ and $\sigma_{ij} = 0$ for $i \neq j$, then $\text{Corr}(X_i, X_j) = 0$. Therefore, in this case, although $X_i$ and $X_j$ are uncorrelated, they are not independent.
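The last point, uncorrelated but not independent, is easy to see numerically: with $\boldsymbol{\mu} = \boldsymbol{0}$ and $\boldsymbol{\Sigma} = \boldsymbol{I}$ the two coordinates share the same geometric count $N$, so their squares are positively related even though their correlation is zero. In the sketch below (seed and sample size are our own choices), $E(X_1^2 X_2^2) = E(N^2) = (2-p)/p^2 = 6$ at $p = 0.5$, while $E(X_1^2)\,E(X_2^2) = 1/p^2 = 4$:

```python
import numpy as np

# Standard bivariate MGSN: mu = 0, Sigma = I, with a shared geometric count N.
rng = np.random.default_rng(11)
n, p = 200000, 0.5
m = rng.geometric(p, size=n)
x = np.sqrt(m)[:, None] * rng.standard_normal((n, 2))  # X | N = m ~ N_2(0, m I)

corr = np.corrcoef(x[:, 0], x[:, 1])[0, 1]  # ~ 0, in agreement with (11)
dep = np.mean(x[:, 0]**2 * x[:, 1]**2)      # ~ E(N^2) = 6, well above E(X1^2) E(X2^2) = 4
```

The gap between `dep` and the product of second moments is exactly the dependence induced by the common random count.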

Now we would like to compute the multivariate skewness indices of the MGSN distribution. Different multivariate skewness measures have been introduced in the literature. Among them, the skewness index of Mardia [10, 11] is the most popular one. To define Mardia's multivariate skewness index, let us introduce the following notation for a random vector $\boldsymbol{X} = (X_1, \ldots, X_d)^T$:

$$\mu^{(r_1,\ldots,r_s)}_{i_1,\ldots,i_s} = E\left[\prod_{k=1}^{s}\left(X_{r_k} - \mu_{r_k}\right)^{i_k}\right],$$

where $\mu_{r_k} = E(X_{r_k})$. Mardia [10] defined the multivariate skewness index as

$$\beta_1 = \sum_{r,s,t=1}^{d}\ \sum_{r',s',t'=1}^{d} \sigma^{rr'}\sigma^{ss'}\sigma^{tt'}\, \mu^{(r,s,t)}_{1,1,1}\, \mu^{(r',s',t')}_{1,1,1},$$

where $\sigma^{rr'}$, for $r, r' = 1, \ldots, d$, denotes the $(r,r')$-th element of the inverse of the dispersion matrix of the random vector $\boldsymbol{X}$. In the case of the MGSN distribution,

$$\mu^{(l,h,m)}_{1,1,1} = \frac{1}{p^4}\left\{p(1-p)(2-p)\,\mu_h\mu_l\mu_m + p^2(1-p)\left(\mu_m\sigma_{hl} + \mu_l\sigma_{hm} + \mu_h\sigma_{lm}\right)\right\}. \tag{12}$$

It is clear from (12) that if $p = 1$ then $\beta_1 = 0$. Also, if $\mu_i = 0$ for all $i = 1, \ldots, d$, then $\beta_1 = 0$. Moreover, if $\mu_i \neq 0$ for some $i$, then the skewness index may diverge to $\infty$ as $p \to 0$. Therefore, for the MGSN distribution, Mardia's multivariate skewness index varies from $0$ to $\infty$.
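Equation (12), together with the relation $p^2\boldsymbol{\Sigma}_{\boldsymbol{X}} = p\boldsymbol{\Sigma} + (1-p)\boldsymbol{\mu}\boldsymbol{\mu}^T$ for the dispersion matrix of $\boldsymbol{X}$, lets one assemble $\beta_1$ numerically. The sketch below (the function name is ours) does this by brute force; for $d = 1$ Mardia's index reduces to $\gamma_1^2$, the squared univariate skewness, which gives a convenient check:

```python
import numpy as np

def mardia_beta1_mgsn(mu, Sigma, p):
    """Mardia's multivariate skewness index beta_1 for MGSN_d(mu, Sigma, p),
    assembled from the third central moments in (12)."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    d = mu.shape[0]
    q = 1.0 - p
    # dispersion matrix of X: p^2 Sigma_X = p Sigma + (1 - p) mu mu^T
    Sx = (p * Sigma + q * np.outer(mu, mu)) / p**2
    Sinv = np.linalg.inv(Sx)
    # third central moments mu_{111}^{(l, h, m)} from (12)
    M3 = np.zeros((d, d, d))
    for l in range(d):
        for h in range(d):
            for m in range(d):
                M3[l, h, m] = (p * q * (2 - p) * mu[h] * mu[l] * mu[m]
                               + p**2 * q * (mu[m] * Sigma[h, l]
                                             + mu[l] * Sigma[h, m]
                                             + mu[h] * Sigma[l, m])) / p**4
    # beta_1 = sum over all indices of sigma^{rr'} sigma^{ss'} sigma^{tt'} mu_{rst} mu_{r's't'}
    return float(np.einsum('ru,sv,tw,rst,uvw->', Sinv, Sinv, Sinv, M3, M3))
```

For $d = 1$, $\mu = \sigma = 1$ and $p = 0.5$, the closed form $\gamma_1 = 1.5$ gives $\beta_1 = \gamma_1^2 = 2.25$, and the function reproduces this exactly; with $\boldsymbol{\mu} = \boldsymbol{0}$ or $p = 1$ it returns $0$, as argued above.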

If $\boldsymbol{X} \sim \text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$, and if we denote the mean vector and dispersion matrix of $\boldsymbol{X}$ by $\boldsymbol{\mu}_{\boldsymbol{X}}$ and $\boldsymbol{\Sigma}_{\boldsymbol{X}}$, respectively, then from (8), (9) and (10) we have the following relations:

$$p\,\boldsymbol{\mu}_{\boldsymbol{X}} = \boldsymbol{\mu} \qquad \text{and} \qquad p^2\,\boldsymbol{\Sigma}_{\boldsymbol{X}} = p\,\boldsymbol{\Sigma} + (1-p)\,\boldsymbol{\mu}\boldsymbol{\mu}^T.$$

The following result provides the canonical correlation between $\boldsymbol{X}_1$ and $\boldsymbol{X}_2$. It may be mentioned that canonical correlation is very useful in multivariate data analysis. In an experimental context, suppose we take two sets of variables; then the canonical correlation can be used to see what is common between these two sets of variables; see for example Rao [12].

Theorem 2: Suppose $\boldsymbol{X} \sim \text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$, and $\boldsymbol{X}$, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are partitioned as in (7). Then for $\boldsymbol{\alpha} \in \mathbb{R}^h$ and $\boldsymbol{\beta} \in \mathbb{R}^{d-h}$ such that $\boldsymbol{\alpha}^T\boldsymbol{\Sigma}_{11}\boldsymbol{\alpha} = 1$ and $\boldsymbol{\beta}^T\boldsymbol{\Sigma}_{22}\boldsymbol{\beta} = 1$, the maximum of $\text{corr}(\boldsymbol{\alpha}^T\boldsymbol{X}_1, \boldsymbol{\beta}^T\boldsymbol{X}_2)$ is $\lambda_1$, where $\lambda_1$ is the maximum root of the polynomial equation

$$\begin{vmatrix} -\lambda\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & -\lambda\boldsymbol{\Sigma}_{22} \end{vmatrix} = 0.$$

Proof: From Theorem 1, we obtain

$$\begin{pmatrix} \boldsymbol{\alpha}^T\boldsymbol{X}_1 \\ \boldsymbol{\beta}^T\boldsymbol{X}_2 \end{pmatrix} \sim \text{MGSN}_2\left(\begin{pmatrix} \boldsymbol{\alpha}^T\boldsymbol{\mu}_1 \\ \boldsymbol{\beta}^T\boldsymbol{\mu}_2 \end{pmatrix}, \begin{pmatrix} \boldsymbol{\alpha}^T\boldsymbol{\Sigma}_{11}\boldsymbol{\alpha} & \boldsymbol{\alpha}^T\boldsymbol{\Sigma}_{12}\boldsymbol{\beta} \\ \boldsymbol{\beta}^T\boldsymbol{\Sigma}_{21}\boldsymbol{\alpha} & \boldsymbol{\beta}^T\boldsymbol{\Sigma}_{22}\boldsymbol{\beta} \end{pmatrix}, p\right).$$

Therefore, using (11), it follows that the problem is to find $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ that maximize

$$\text{corr}(\boldsymbol{\alpha}^T\boldsymbol{X}_1, \boldsymbol{\beta}^T\boldsymbol{X}_2) = \frac{p\,\boldsymbol{\alpha}^T\boldsymbol{\Sigma}_{12}\boldsymbol{\beta}}{\sqrt{p\,\boldsymbol{\alpha}^T\boldsymbol{\Sigma}_{11}\boldsymbol{\alpha}}\,\sqrt{p\,\boldsymbol{\beta}^T\boldsymbol{\Sigma}_{22}\boldsymbol{\beta}}} = \boldsymbol{\alpha}^T\boldsymbol{\Sigma}_{12}\boldsymbol{\beta},$$

subject to the restrictions $\boldsymbol{\alpha}^T\boldsymbol{\Sigma}_{11}\boldsymbol{\alpha} = 1$ and $\boldsymbol{\beta}^T\boldsymbol{\Sigma}_{22}\boldsymbol{\beta} = 1$. Now, following the same steps as in the multivariate normal case, see Anderson [1], the result follows.
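In practice the largest root in Theorem 2 can be computed as the square root of the largest eigenvalue of $\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$, exactly as for the first canonical correlation in the multivariate normal case. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def max_canonical_corr(Sigma, h):
    """Largest root lambda_1 of | [[-lam*S11, S12], [S21, -lam*S22]] | = 0,
    i.e. the first canonical correlation between the first h and the
    remaining d - h coordinates."""
    S = np.asarray(Sigma, dtype=float)
    S11, S12 = S[:h, :h], S[:h, h:]
    S21, S22 = S[h:, :h], S[h:, h:]
    # eigenvalues of S11^{-1} S12 S22^{-1} S21 are the squared canonical correlations
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)
    lam_sq = np.linalg.eigvals(M).real
    return float(np.sqrt(max(lam_sq.max(), 0.0)))
```

For a $2 \times 2$ matrix with unit diagonal and off-diagonal $0.5$, the function returns $0.5$, and one can verify directly that this value makes the block determinant of Theorem 2 vanish.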

The following result provides the characteristic function of a Wishart-type matrix based on MGSN random vectors.

Theorem 3: Suppose $\boldsymbol{Z}_1, \ldots, \boldsymbol{Z}_n$ are i.i.d. random vectors with $\boldsymbol{Z}_1 \sim \text{MGSN}_d(\boldsymbol{0}, \boldsymbol{\Sigma}, p)$. Let us consider the Wishart-type matrix

$$\boldsymbol{A} = \sum_{m=1}^{n} \boldsymbol{Z}_m\boldsymbol{Z}_m^T = ((A_{ij})), \qquad i, j = 1, \ldots, d.$$

If $\boldsymbol{\Theta} = ((\theta_{ij}))$, with $\theta_{ij} = \theta_{ji}$, is a $d \times d$ symmetric matrix, then the characteristic function of $\boldsymbol{A}$ is

$$E\left(e^{i\,\text{tr}(\boldsymbol{A}\boldsymbol{\Theta})}\right) = p^n\left[\sum_{k=1}^{\infty} \left|\boldsymbol{I} - 2ik\boldsymbol{\Theta}\boldsymbol{\Sigma}\right|^{-1/2}(1-p)^{k-1}\right]^n.$$

Proof:

$$E\left[e^{i\,\text{tr}(\boldsymbol{A}\boldsymbol{\Theta})}\right] = E\left[e^{i\sum_{m=1}^{n}\boldsymbol{Z}_m^T\boldsymbol{\Theta}\boldsymbol{Z}_m}\right] = \left(E\left[e^{i\,\boldsymbol{Z}_1^T\boldsymbol{\Theta}\boldsymbol{Z}_1}\right]\right)^n.$$

Now we would like to compute $E\left[e^{i\,\boldsymbol{Z}_1^T\boldsymbol{\Theta}\boldsymbol{Z}_1}\right]$. For a real symmetric matrix $\boldsymbol{\Theta}$, there is a real non-singular matrix $\boldsymbol{B}$ such that

$$\boldsymbol{B}^T\boldsymbol{\Sigma}^{-1}\boldsymbol{B} = \boldsymbol{I} \qquad \text{and} \qquad \boldsymbol{B}^T\boldsymbol{\Theta}\boldsymbol{B} = \boldsymbol{D} = \text{diag}\{\delta_1, \ldots, \delta_d\}.$$

Here, $\text{diag}\{\delta_1, \ldots, \delta_d\}$ means a diagonal matrix with diagonal entries $\delta_1, \ldots, \delta_d$. If we make the transformation $\boldsymbol{Y} = \boldsymbol{B}^{-1}\boldsymbol{Z}_1$, then using Theorem 1, $\boldsymbol{Y} \sim \text{MGSN}_d(\boldsymbol{0}, \boldsymbol{I}, p)$. Using the definition of the MGSN distribution, it follows that

$$\boldsymbol{Y} \stackrel{d}{=} \sum_{m=1}^{N} \boldsymbol{Y}_m,$$

where $N \sim \text{GE}(p)$, and the $\boldsymbol{Y}_m$'s are i.i.d. $N_d(\boldsymbol{0}, \boldsymbol{I})$ random vectors. We use the following notation:

$$\boldsymbol{Y} = (Y_1, \ldots, Y_d)^T \qquad \text{and} \qquad \boldsymbol{Y}_m = (Y_{m1}, \ldots, Y_{md})^T.$$

Hence, $Y_j = \sum_{m=1}^{N} Y_{mj}$ for $j = 1, \ldots, d$. Therefore,

$$\begin{aligned}
E\left[e^{i\,\boldsymbol{Z}_1^T\boldsymbol{\Theta}\boldsymbol{Z}_1}\right] &= E\left[e^{i\,\boldsymbol{Y}^T\boldsymbol{D}\boldsymbol{Y}}\right] = E\left[e^{i\sum_{j=1}^d \delta_j Y_j^2}\right] = E\left[e^{i\sum_{j=1}^d \delta_j\left(\sum_{m=1}^N Y_{mj}\right)^2}\right] \\
&= E_N\,E\left[\left.e^{i\sum_{j=1}^d \delta_j\left(\sum_{m=1}^N Y_{mj}\right)^2}\,\right|\,N\right] = E_N\,E\left[\left.e^{i\sum_{j=1}^d \delta_j N\left(\sum_{m=1}^N Y_{mj}/\sqrt{N}\right)^2}\,\right|\,N\right] \\
&= E_N \prod_{j=1}^d E\left[\left.e^{i\,\delta_j N\left(\sum_{m=1}^N Y_{mj}/\sqrt{N}\right)^2}\,\right|\,N\right] = E_N \prod_{j=1}^d \left(1 - 2i\delta_j N\right)^{-1/2} \\
&= E_N\left|\boldsymbol{I} - 2iN\boldsymbol{D}\right|^{-1/2} = E_N\left|\boldsymbol{I} - 2iN\boldsymbol{\Theta}\boldsymbol{\Sigma}\right|^{-1/2} \\
&= p\sum_{k=1}^{\infty} \left|\boldsymbol{I} - 2ik\boldsymbol{\Theta}\boldsymbol{\Sigma}\right|^{-1/2}(1-p)^{k-1}.
\end{aligned}$$

Theorem 4: If, for every $\boldsymbol{c} \in \mathbb{R}^d$, $\boldsymbol{c}^T\boldsymbol{X} \sim \text{GSN}(\mu(\boldsymbol{c}), \sigma(\boldsymbol{c}), p)$ for a $d$-dimensional random vector $\boldsymbol{X}$, then there exist a $d$-dimensional vector $\boldsymbol{\mu}$ and a $d \times d$ symmetric matrix $\boldsymbol{\Sigma}$ such that $\boldsymbol{X} \sim \text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$. Here $\mu(\boldsymbol{c})$ and $\sigma(\boldsymbol{c})$ are functions of $\boldsymbol{c}$.

Proof: If we denote the mean vector and dispersion matrix of $\boldsymbol{X}$ by $\boldsymbol{\mu}_{\boldsymbol{X}}$ and $\boldsymbol{\Sigma}_{\boldsymbol{X}}$, respectively, then $E(\boldsymbol{c}^T\boldsymbol{X}) = \boldsymbol{c}^T\boldsymbol{\mu}_{\boldsymbol{X}}$ and $V(\boldsymbol{c}^T\boldsymbol{X}) = \boldsymbol{c}^T\boldsymbol{\Sigma}_{\boldsymbol{X}}\boldsymbol{c}$. Hence, from (3) and (4), we have

$$\mu(\boldsymbol{c}) = p\,\boldsymbol{c}^T\boldsymbol{\mu}_{\boldsymbol{X}} \qquad \text{and} \qquad \sigma^2(\boldsymbol{c}) = p\,\boldsymbol{c}^T\boldsymbol{\Sigma}_{\boldsymbol{X}}\boldsymbol{c} - p(1-p)\left(\boldsymbol{c}^T\boldsymbol{\mu}_{\boldsymbol{X}}\right)^2. \tag{13}$$

Therefore, from (1), putting $t = 1$, it follows that

$$E\left(e^{\boldsymbol{c}^T\boldsymbol{X}}\right) = \frac{p\,\exp\left(\mu(\boldsymbol{c}) + \sigma^2(\boldsymbol{c})/2\right)}{1 - (1-p)\exp\left(\mu(\boldsymbol{c}) + \sigma^2(\boldsymbol{c})/2\right)}. \tag{14}$$

Let us define a $d$-dimensional vector $\boldsymbol{\mu}$ and a $d \times d$ symmetric matrix $\boldsymbol{\Sigma}$ as follows:

$$\boldsymbol{\mu} = p\,\boldsymbol{\mu}_{\boldsymbol{X}} \qquad \text{and} \qquad \boldsymbol{\Sigma} = p\,\boldsymbol{\Sigma}_{\boldsymbol{X}} - p(1-p)\,\boldsymbol{\mu}_{\boldsymbol{X}}\boldsymbol{\mu}_{\boldsymbol{X}}^T. \tag{15}$$

Therefore,

$$\mu(\boldsymbol{c}) + \frac{\sigma^2(\boldsymbol{c})}{2} = \boldsymbol{c}^T\boldsymbol{\mu} + \frac{1}{2}\boldsymbol{c}^T\boldsymbol{\Sigma}\boldsymbol{c},$$

and (14) can be written as

$$E\left(e^{\boldsymbol{c}^T\boldsymbol{X}}\right) = \frac{p\,\exp\left(\boldsymbol{c}^T\boldsymbol{\mu} + \frac{1}{2}\boldsymbol{c}^T\boldsymbol{\Sigma}\boldsymbol{c}\right)}{1 - (1-p)\exp\left(\boldsymbol{c}^T\boldsymbol{\mu} + \frac{1}{2}\boldsymbol{c}^T\boldsymbol{\Sigma}\boldsymbol{c}\right)} = M_{\boldsymbol{X}}(\boldsymbol{c}).$$

Hence the result follows.
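The key algebraic step in the proof, that $\mu(\boldsymbol{c}) + \sigma^2(\boldsymbol{c})/2 = \boldsymbol{c}^T\boldsymbol{\mu} + \frac{1}{2}\boldsymbol{c}^T\boldsymbol{\Sigma}\boldsymbol{c}$ under definitions (13) and (15), is easy to confirm numerically. The sketch below (all input values are our own choices) checks the identity for a random direction $\boldsymbol{c}$:

```python
import numpy as np

rng = np.random.default_rng(3)
d, p = 4, 0.3
mu_X = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma_X = A @ A.T + d * np.eye(d)      # a positive definite dispersion matrix

# (15): parameters of the induced MGSN law
mu = p * mu_X
Sigma = p * Sigma_X - p * (1 - p) * np.outer(mu_X, mu_X)

c = rng.normal(size=d)
# (13): GSN parameters of the projection c^T X
mu_c = p * (c @ mu_X)
sig2_c = p * (c @ Sigma_X @ c) - p * (1 - p) * (c @ mu_X) ** 2

lhs = mu_c + sig2_c / 2
rhs = c @ mu + 0.5 * (c @ Sigma @ c)   # exponent of the MGF (6) at t = c
```

Since the identity holds exactly in exact arithmetic, `lhs` and `rhs` agree to floating-point precision for any choice of $\boldsymbol{c}$, $\boldsymbol{\mu}_{\boldsymbol{X}}$, $\boldsymbol{\Sigma}_{\boldsymbol{X}}$ and $p$.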

Therefore, combining Theorems 1 and 4, we have the following characterization result for a $d$-variate MGSN distribution.

Theorem 5: If a $d$-dimensional random vector $\boldsymbol{X}$ has mean vector $\boldsymbol{\mu}_{\boldsymbol{X}}$ and dispersion matrix $\boldsymbol{\Sigma}_{\boldsymbol{X}}$, then $\boldsymbol{X} \sim \text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$, where $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are as defined in (15), if and only if for every $\boldsymbol{c} \in \mathbb{R}^d$, $\boldsymbol{c}^T\boldsymbol{X} \sim \text{GSN}(\mu(\boldsymbol{c}), \sigma(\boldsymbol{c}), p)$, where $\mu(\boldsymbol{c})$ and $\sigma(\boldsymbol{c})$ are as in (13).

Now we provide another characterization of the MGSN distribution.

Theorem 6: Suppose $\{\boldsymbol{X}_i; i = 1, 2, \ldots\}$ is a sequence of i.i.d. $d$-dimensional random vectors, and $M \sim \text{GE}(\alpha)$, for $0 < \alpha \le 1$, is independent of the $\boldsymbol{X}_i$'s. Consider a new $d$-dimensional random vector

$$\boldsymbol{Y} = \sum_{i=1}^{M} \boldsymbol{X}_i.$$

Then $\boldsymbol{Y} \sim \text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \alpha p)$ for $0 < \alpha \le 1$, if and only if $\boldsymbol{X}_1$ has a $\text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$ distribution.

Proof: If part. Suppose $\boldsymbol{X}_1 \sim \text{MGSN}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$. Then the MGF of $\boldsymbol{Y}$, for $\boldsymbol{t} \in A_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, p)$, can be written as

$$M_{\boldsymbol{Y}}(\boldsymbol{t}) = E\left(e^{\boldsymbol{t}^T\boldsymbol{Y}}\right) = \sum_{m=1}^{\infty} E\left(\left.e^{\sum_{i=1}^{M}\boldsymbol{t}^T\boldsymbol{X}_i}\,\right|\,M = m\right)P(M = m) = \sum_{m=1}^{\infty} \alpha(1-\alpha)^{m-1}\left(\frac{p\,e^{\boldsymbol{\mu}^T\boldsymbol{t} + \frac{1}{2}\boldsymbol{t}^T\boldsymbol{\Sigma}\boldsymbol{t}}}{1 - (1-p)\,e^{\boldsymbol{\mu}^T\boldsymbol{t} + \frac{1}{2}\boldsymbol{t}^T\boldsymbol{\Sigma}\boldsymbol{t}}}\right)^m$$