On Mixtures of Skew Normal and Skew t-Distributions

# On Mixtures of Skew Normal and Skew t-Distributions

## Abstract

Finite mixtures of skew distributions have emerged as an effective tool in modelling heterogeneous data with asymmetric features. With various proposals appearing rapidly in the recent years, which are similar but not identical, the connections between them and their relative performance becomes rather unclear. This paper aims to provide a concise overview of these developments by presenting a systematic classification of the existing skew symmetric distributions into four types, thereby clarifying their close relationships. This also aids in understanding the link between some of the proposed expectation-maximization (EM) based algorithms for the computation of the maximum likelihood estimates of the parameters of the models. The final part of this paper presents an illustration of the performance of these mixture models in clustering a real dataset, relative to other non-elliptically contoured clustering methods and associated algorithms for their implementation.

## 1 Introduction

In recent years, non-normal distributions have received substantial interest in the statistics literature. The growing need for more flexible tools to analyze datasets that exhibit non-normal features, including asymmetry, multimodality, and heavy tails, has led to intense development in non-normal model-based methods. In particular, finite mixtures of skew distributions have emerged as a promising alternative to the traditional Gaussian mixture modelling. They have been successfully applied to numerous datasets from a wide range of fields, including the medical sciences, bioinformatics, environmetrics, engineering, economics, and financial sciences. Some recent applications of multivariate skew normal and skew -mixture models include Pyne et al. (2009), Soltyk and Gupta (2011), Contreras-Reyes and Arellano-Valle (2012), and Riggi and Ingrassia (2013).

The rich literature and active discussion of skew distributions was initiated by the pioneering work of Azzalini (1985), in which the univariate skew normal distribution was introduced. Following its generalization to the multivariate skew normal distribution in Azzalini and Dalla Valle (1996), the number of contributions have grown rapidly. The concept of introducing additional parameters to regulate skewness in a distribution was subsequently extended to other parametric families, yielding the skew elliptical family; for a comprehensive survey of skew distributions, see, for example, the articles by Azzalini (2005), Arellano-Valle and Azzalini (2006), Arellano-Valle et al. (2006), and also the book edited by Genton (2004).

Besides the skew normal distribution, which plays a central role in these developments, the skew -distribution has also received much attention. Being a natural extension of the -distribution, the skew -distribution retains reasonable tractability and is more robust against outliers than the skew normal distribution. Finite mixtures of skew normal and skew -distributions have been studied by several authors, including Lin et al. (2007a, b), Pyne et al. (2009), Basso et al. (2010), Frühwirth-Schnatter and Pyne (2010), Lin (2010), Cabral et al. (2012), Vrbik and McNicholas (2012), Lee and McLachlan (2013), and Vrbik and McNicholas (2013), among others. With the existence of so many proposals, with their various characterizations of skew normal and skew -distributions, it becomes rather unclear how these proposals are related to each other, and to what extent can the subtle differences between them have in practical applications.

This paper provides a concise overview of various recent developments of mixtures of skew normal and skew -distributions. An illustration is given of the performance of mixtures of these distributions and some other skew mixture models in clustering a real dataset. We first present a systematic classification of multivariate skew normal and skew -distributions, with special references to those used in various existing proposals of finite mixture models. We then illustrate the relative performance of these models and other related algorithms by applications to a real dataset.

Recently, Lee and McLachlan (2011) referred to the skew normal and skew -distributions of Pyne et al. (2009) as ‘restricted’ skew distributions, and the class of skew elliptical distributions of Sahu et al. (2003) as having the ‘unrestricted’ form. While this terminology was later briefly discussed in Lee and McLachlan (2013) when outlining the equivalence between the skew distributions of Azzalini and Dalla Valle (1996), Pyne et al. (2009), and Basso et al. (2010), further details were not given. This papers aims to fill this gap. We shall adopt the above terminology, and expand this idea further to classify more general forms of skew distributions, namely, the ‘extended’ and ‘generalized’ forms.

The remainder of this paper is organized as follows. In Section 2, we present the classification scheme for multivariate skew normal and skew -distributions, clarifying the connections between various variants. Next, we discuss the development of currently available algorithms for fitting mixtures of multivariate skew normal and skew -distributions in Section 3, pointing out the equivalence between some of these algorithms. Section 4 presents an application to automated flow cytometric analysis, and comparisons are made with the results of other model-based clustering methods. Finally, some concluding remarks are given in Section 5.

## 2 Classification of multivariate skew normal and skew t-distributions

### 2.1 Multivariate skew normal distributions

Since the seminal article by Azzalini and Dalla Valle (1996) on the multivariate skew normal (MSN) distribution, numerous ‘extensions’ of the so-called skew normal distribution have appeared in rapid succession. The number of contributions are now so many that it is beyond the scope of this paper to include them all here. However, most of these developments can be considered as special cases of the fundamental skew normal (FUSN) distribution (Arellano-Valle and Genton, 2005), and can be systematically classified into four types, namely, the restricted, unrestricted, extended, and generalized forms.

We begin by briefly discussing the FUSN distribution, since it encompasses the first three forms of MSN distributions. The FUSN distribution itself is a generalized form of the MSN distribution. It can be generated by conditioning a multivariate normal variable on another (univariate or multivariate) random variable. Suppose and is a -dimensional random vector. Adopting the notation as used in Azzalini and Dalla Valle (1996), we let be the vector if all elements of are positive and otherwise. Then has a FUSN distribution. It is important to note that is not necessarily normally distributed, but in the restricted, unrestricted, and extended cases, it is restricted to be a random normal variate. The parameter , known as the extension parameter, can be viewed as a location shift for the latent variable . When the joint distribution of and is multivariate normal, the FUSN distribution reduces to a location-scale variant of the canonical FUSN (CFUSN) distribution, given by

 \boldmathY=(\boldmathY1∣\boldmathY0>% \boldmath0), (1)

where

 [\boldmathY0\boldmathY1]∼Nq+p([[]cτμ],[ΓΔTΔΣ]), (2)

where is a -dimensional vector, is -dimensional vector, is a scale matrix, is an arbitrary matrix, and is a scale matrix.

The restricted case corresponds to a highly specialized form of (2), where is restricted to be univariate (that is, ), , and . In the unrestricted case, both and have a -dimensional normal distribution (that is, =). Note that the use of “restricted” here refers to restrictions on the random vector in the (conditioning-type) stochastic definition of the skew distribution. It is not a restriction on the parameter space, and so a “restricted” form of a skew distribution is not necessarily nested within its corrresponding “unrestricted” form. Indeed, the restricted and unrestricted forms coincide in the univariate case.

The extended form has no restriction on the dimensions of , but can be a non-zero vector. When is not normally distributed, the density of has the generalized form. A summary of some of the existing multivariate skew normal distributions is given in Tables 1 and 2, where rMSN , uMSN, eMSN, and gMSN refer to the restricted, unrestricted, extended, and generalized version, respectively, of the multivariate skew normal distribution. The list is not exhaustive, and the names appearing in the final columns are representative examples only.

#### Restricted multivariate skew normal distributions

The restricted case is one of the simplest multivariate forms of the FUSN distribution. The latent variable is assumed to be a univariate random normal variable, and its correlation with is controlled by . There exists two parallel forms of stochastic representation for a MSN random variable, obtained via the conditioning and convolution mechanism (Azzalini, 2005). In general, the conditioning-type stochastic representation of a restricted MSN (rMSN) distribution is given by

 \boldmathY=μ+(\boldmathY1∣Y0>0), (3)

where

 [Y0\boldmathY1]∼N1+p([[]c0\boldmath0],[1δTδΣ]). (4)

Alternatively, the rMSN distribution can be generated via the convolution approach, which leads to a convolution-type stochastic representation, given by

 \boldmathY=μ+~δ∣∣~Y0∣∣+~\boldmathY1, (5)

where and are independent, and where denotes the vector whose th element is given by the absolute value of the th element of . Note that the parameters in (5) are not identical to those in (3) and (4), but can be obtained from the latter by taking and . The connection between the pairs and , are discussed in more detail in Azzalini and Capitanio (1999). The skew normal distribution proposed by Azzalini and Dalla Valle (1996), Branco and Dey (2001), Lachos et al. (2010), and Pyne et al. (2009) are identical after reparameterization, and can be formulated as the rMSN distribution.

The first multivariate skew normal distribution (A-MSN)
The first formal definition of the univariate skew normal distribution dates back to Azzalini (1985). However, its extension to the multivariate case did not appear until just over a decade later. The widely accepted ‘original’ multivariate skew normal distribution was introduced by Azzalini and Dalla Valle (1996). The density of this distribution, denoted by A-MSN (with some changes in notation) takes the form

 f(\boldmathy;μ,Σ,δA)=2ϕp(\boldmathy;μ,Σ)Φ1(δTA\boldmathR−1\boldmathD−1(\boldmathy−μ);0,1−δTA% \boldmathR−1δA), (6)

where is the correlation matrix, is a diagonal matrix formed by extracting the diagonal elements of , and denotes the th entry of . We let be the -dimensional normal density with mean and covariance matrix , and is the (univariate) normal distribution function of a normal variable with mean and variance . To avoid ambiguity in the notation, we have appended a subscript to some of the parameters used in different versions of the rMSN distributions throughout this paper, for example, denotes the version of used in the A-MSN distribution. The density (6) was obtained via the conditioning method (3), with , where and are distributed according to (4). It corresponds to the rMSN distribution in (4) with replaced by . This characterization of the MSN distribution was adopted in the work of Frühwirth-Schnatter and Pyne (2010) when formulating finite mixtures of skew normal distributions, and parameter estimation was carried out using a Bayesian approach.

The skew normal distribution of Branco and Dey (B-MSN)
Branco and Dey (2001) generalized the original skew normal distribution to the class of (restricted) skew elliptical distributions. In their parameterization, the term used in the A-MSN distribution was removed, resulting in an algebraically simpler form. However, under this variant parameterization, a change in scale will affect the skewness parameter. The reader is referred to Arellano-Valle and Azzalini (2006) for a discussion on the effects of adopting this parameterization. The skew normal member of this family, denoted by B-MSN, has density

 f(\boldmathy;μ,δ,Σ)=2ϕp(\boldmathy;μ,Σ)Φ1(δTΣ−1(\boldmathy−μ);0,1−δTΣ−1δ). (7)

It follows that the conditioning-type stochastic representation for is given by , where

 [Y0\boldmathY1]∼N1+p([[]c0\boldmath0],[1δTδΣ]), (8)

and the corresponding convolution-type representation is

 \boldmathY=μ+δ∣∣~Y0∣∣+~\boldmathY1, (9)

where and are independent, and where . Note that (8) and (9) are identical to (4) and (5) respectively. It can be observed that (7) is a reparameterization of the A-MSN distribution. Replacing in (7) with recovers (6).

The skew normal/independent skew normal distribution (SNI-SN)
The skew normal Independent (SNI) distributions are, in essence, scale mixtures of the skew normal distribution. Introduced by Branco and Dey (2001), and considered further in Lachos et al. (2010), the family includes the multivariate skew normal distribution as the basic degenerate case, the density of which is given by

 f(\boldmathy;μ,δS,Σ)=2ϕp(\boldmathy;μ,Σ)Φ1(δTSΣ−12(% \boldmathy−μ);0,1−δTSδS), (10)

where is the square root matrix of ; that is, . We shall adopt the notation SNI-SN when has density (10). As with all restricted MSN distributions, the SNI-SN distribution also enjoys two parallel stochastic representations. This density is very similar to (6) and (7), and actually, is a reparameterization of them. The connection between them can be easily observed by directly comparing their stochastic representations. The two stochastic representations of the SNI-SN are given by

 Y = μ+(\boldmathY1∣Y0>0), (11)

and

 Y = μ+Σ12δS|~Y0|+Σ12(Ip−δSδTS)12~\boldmathY1, (12)

where

 [Y0\boldmathY1]∼N1+p⎛⎝[[]c0\boldmath0],⎡⎣1δTSΣ12Σ12δSΣ⎤⎦⎞⎠, (13)

and are independent. It can be observed that (10) becomes identical to (7) by replacing in (10) with . Cabral et al. (2012) described maximum likelihood (ML) estimation for the SNI-SN distribution via the expectation-maximization (EM) algorithm, and an extension to the mixture model was also studied.

The skew normal distribution of Pyne et al. (P-MSN)
In a study of automated flow cytometry analysis, Pyne et al. (2009) proposed yet another parametrization of the restricted skew normal distribution. This variant, hereafter referred to as the rMSN distribution (as used in Lee and McLachlan (2013)), was obtained as a ‘simplification’ of the unrestricted skew normal distribution described in Sahu et al. (2003) (see Section 2.1.2). Its density is given by

 f(\boldmathy;μ,Σ,δ)=2ϕp(\boldmathy;μ,Σ)Φ1(δTΣ−1(\boldmathy−μ);0,1−δTΣ−1δ). (14)

It follows that the conditioning-type stochastic representation of (14) is given by

 \boldmathY=μ+(\boldmathY1∣Y0>0), (15)

where

 [Y0\boldmathY1]∼N1+p([[]c0\boldmath0],[1δTδΣ]), (16)

and the corresponding convolution-type representation is given by

 \boldmathY=μ+δ|~Y0|+~\boldmathY1, (17)

where again and are independent, and where . It can be observed that (14) is identical to (7). One advantage of this parameterization is that the convolution-type representation is in a relatively simple form, and leads to a nice hierarchical form which facilitates implementation of the EM algorithm for ML parameter estimation.

For ease of reference, we include a summary of the density and stochastic representation of the above-mentioned restricted MSN distributions in Table 3 and 4, respectively.

#### Unrestricted multivariate skew normal distributions

The unrestricted case is very similar to the restricted case, except that the scalar latent variable is replaced by a -dimensional normal random vector . Accordingly, the constraint becomes a set of constraints , which implies each element of is positive. Similar to (3) and (4), the unrestricted MSN (uMSN) distribution can be described by

 \boldmathY=μ+(\boldmathY1∣% \boldmathY0>\boldmath0), (18)

where

 [\boldmathY0\boldmathY1]∼N2p([\boldmath0\boldmath0],[\boldmath% IpΔTΔΣ]). (19)

Here, the skewness parameter is a matrix. The convolution-type representation is analogous to (5), and is given by

 \boldmathY=μ+~Δ|~% \boldmathY0|+~\boldmathY1, (20)

where the random vectors and are independently distributed. The relationship between the parameters in (19) and (20) is similar to that in (3)-(5). In this case, they satisfy and . The skew normal version of Sahu et al. (2003) is an unrestricted form of the MSN distribution, with restricted to be a diagonal matrix.

The skew normal distribution of Sahu et al. (S-MSN)
In Sahu et al. (2003), skewness is introduced to a class of elliptically symmetric distributions by conditioning on a multivariate variable, which produces a class of (unrestricted) skew elliptical distribution. The multivariate skew normal distribution proposed by Sahu et al. (2003), which is a member of this family, is given by

 f(\boldmathy;μ,Σ,δ)=2pϕp(\boldmathy;μ,Σ)Φp(ΔΣ−1(\boldmathy−μ);\boldmath0,Λ), (21)

where and . Observe that with this characterization of the MSN distribution, the density involves the multivariate normal distribution function, whereas the restricted forms is defined in terms of the univariate distribution instead. Accordingly, the conditioning-type stochastic representation of (21) is given by , where

 [\boldmathY0\boldmathY1]∼N2p([\boldmath0\boldmath0],[\boldmath% IpΔΔΣ]), (22)

and the convolution-type representation is given by

 \boldmathY=μ+Δ|~% \boldmathY0|+~\boldmathY1, (23)

where and are independent variables distributed as and , respectively, and where . ML estimation for the uMSN distribution, and its mixture case, is studied in Lin (2009).

#### Extended multivariate skew normal distributions

We consider now the extended skew normal (ESN) distribution, which originates from a selective sampling problem, where the variable of interest is affected by a latent variable that is truncated at an arbitrary threshold. It can be obtained via conditioning by setting , where and are distributed according to (4), which leads to the density

 f(\boldmathy;μ,Σ,τ)=ϕp(\boldmathy;μ,Σ)Φ1(τ+δTΣ−1(\boldmathy−%\boldmath$u$);0,1−δTΣ−1δ)Φ1(τ;0,1). (24)

This expression for an ESN distribution is due to Arnold et al. (1993), and the threshold is known as an extension parameter. With this additional parameter, the normalizing constant is no longer a simple fixed value (such as in the restricted case and in the unrestricted case), but a scalar value that depends on the extension parameter. Although the ESN is more complicated than the restricted and unrestricted skew normal distributions, it has nice properties not shared by these ‘no-extension’ cases, including closure under conditioning.

The ESN distribution represents one of the simplest cases of the extended form. Replacing the latent variable with a -dimensional version leads to the unified skew normal (SUN) distribution (Arellano-Valle and Azzalini, 2006). The SUN distribution is an attempt to unify all of the aforementioned skew normal distributions. Its conditioning-type stochastic representation is given by (1) and (2). It follows that the SUN density is given by

 f(\boldmathy;μ,Σ,Γ,Δ,τ)=ϕp(\boldmathy;μ,Σ)Φq(τ+ΔTΣ−1(\boldmathy−μ);\boldmath0,Γ−ΔTΣ−1Δ)Φq(τ;\boldmath0,Γ). (25)

Its construction can also be achieved via the convolution approach, where the -dimensional latent variable follows a truncated normal distribution with mean . More specifically, let and be independent variables, where denotes a multivariate normal variable with mean vector and covariance truncated to the positive hyperplane. Then has an extended MSN density. Note that in this case, the skewness parameter is a matrix instead of the -dimensional vector used in the restricted and unrestricted forms of the MSN distribution.

It is not difficult to show that the SUN distribution includes the restricted MSN distributions, the unrestricted MSN distributions, and the ESN distribution as special cases. There are also various versions of MSN distributions which turns out to be equivalent to the SUN distribution, including the hierarchical skew normal (HSN) of Liseo and Loperfido (2003), the closed skew normal (CSN) of González-Farás et al. (2004), the skew normal of Gupta et al. (2004) and a location-scale variant of the canonical fundamental skew normal (CFUSN) distribution (Arellano-Valle and Genton, 2005). For a detail discussion on the equivalence between these extended forms of MSN distributions, the reader is referred to Arellano-Valle and Azzalini (2006).

#### Generalized multivariate skew normal distributions

A further generalization of the extended form of the MSN distribution is to relax the distributional assumption of the latent variable . For the ‘generalized form’ of the MSN distribution, there are no other restrictions on the MSN density except that the symmetric part must be a multivariate normal density, that is, is normally distributed. This form is very general and apparently includes the other three forms discussed above. A prominent example is the fundamental skew normal distribution (FUSN), a member of the class of fundamental skew distributions considered by Arellano-Valle and Genton (2005). Its density is given by

 f(\boldmathy;μ,Σ,Qq)=K−1qϕp(\boldmathy;μ,Σ)Qq(% \boldmathy), (26)

where is a normalizing constant and is a skewing function. Notice that the skewing function here is not restricted to the normal family. As mentioned previously, the FUSN density can be obtained by defining , where follows the -dimensional normal distribution with location parameter and scale matrix and is a random vector. Under this definition, and is given by and , respectively.

An interesting special case of (26) is the location-scale variant of the so-called canonical fundamental skew normal (CFUSN) distribution, obtained by taking and cov. In this case, we have , where . This leads to the density

 f(\boldmathy;μ,Σ,Δ)=2qϕp(\boldmathy;μ,Σ)Φq(ΔTΣ−1(\boldmathy−μ);\boldmath0,Λ). (27)

We shall write . It should be noted that by taking and , (27) reduces to the unrestricted skew normal density introduced by Sahu et al. (2003). Also, the CFUSN density reduces to the restricted B-MSN distribution (7) when and .

### 2.2 Multivariate skew t-distributions

The multivariate skew -distribution is an important member of the family of skew-elliptical distributions. Like the skew normal distributions, there exists various different versions of the MST distribution, which can be nai̇vely classified into four broad forms. The MST distribution is of special interest because it offers greater flexibility than the normal distribution by combining both skewness and kurtosis in its formulation, while retaining a fair degree of tractability in an algebraic sense. This additional flexibility is much needed in some practical applications, as will be demonstrated in the example in Section 4.

In the past two decades, many variants of the multivariate skew -distribution have been proposed. Some notable proposals include the skew -member of Branco and Dey (2001)’s skew elliptical class, the skew -distribution of Azzalini and Capitanio (2003), the skew -distribution of Gupta (2003), the skew -distribution of Sahu et al. (2003)’s skew elliptical class, the skew normal/independent skew (SNI-ST) distribution of Lachos et al. (2010), the closed skew (CST) distribution of Iversen (2010), and the extended skew (EST) distribution of Arellano-Valle and Genton (2010). Many of these can be considered as special cases of the fundamental skew (FUST) distribution introduced by Arellano-Valle and Genton (2005). They may be classified as ‘restricted’, ‘unrestricted’, ‘extended’, and ‘generalized’ subclasses of the FUST distribution (see Table 5).

#### Restricted multivariate skew t-distributions

The restricted skew -distribution is obtained by conditioning on a univariate latent variable being positive. The correlation between and is described by the vector . Like the MSN distributions, the MST distributions can be obtained via a conditioning and convolution mechanism. In general, the restricted MST distribution has a conditioning-type stochastic representation given by:

 \boldmathY=μ+(\boldmathY1∣Y0>0), (28)

where

 [Y0\boldmathY1]∼t1+p([[]c0\boldmath0],[1δTδΣ],ν). (29)

The equivalent convolution-type representation is given by

 \boldmathY=μ+~δ|~Y0|+~\boldmathY1, (30)

where the two random variables and have a joint multivariate central -distribution with scale matrix and degrees of freedom. The link between the pairs of parameters and is the same as that for the rMSN distribution. The skew- distribution of Branco and Dey (2001), Azzalini and Capitanio (2003), Gupta (2003), the SNI-ST, and the skew -version given by Pyne et al. (2009) are equivalent to (28) up to a reparametrization.

The skew t-distribution of Branco and Dey (B-MST)
The skew elliptical class of Branco and Dey (2001) includes a skew -distribution, which is a special case of a scale mixture of the B-MSN distribution. Its density is given by

 f(\boldmathy) = 2tp(\boldmathy;μ,Σ,ν) (31) T1⎛⎝δTΣ