On the canonical form of scale mixtures of skew-normal distributions
Abstract
The canonical form of scale mixtures of multivariate skew-normal distributions is defined, emphasizing its role in summarizing some key properties of this class of distributions. It is also shown that the canonical form corresponds to an affine invariant coordinate system as defined in Tyler et al. (2009), and a method for obtaining the linear transform that converts a scale mixture of multivariate skew-normal distributions into a canonical form is presented. Related results, where the particular case of the multivariate skew $t$ distribution is considered in greater detail, are the general expression of the Mardia indices of multivariate skewness and kurtosis and the reduction of dimensionality in calculating the mode.
Keywords: affine invariance, kurtosis, Mardia indices of multivariate skewness and kurtosis, scale mixtures of normal distributions, skewness, skew-normal distribution, skew $t$ distribution.
1 Introduction
The Gaussian model plays a central role in statistical modelling; nevertheless, the need for flexible multivariate parametric models able to represent departures from normality is testified by the growing literature devoted to these issues during the last decade. Departure from normality can take place in different ways, such as multimodality, lack of central symmetry, and positive or negative excess of kurtosis. The present paper focuses on the last two features, considering the class of distributions generated by scale mixtures of the $d$-dimensional skew-normal random variables defined by Azzalini and Dalla Valle (1996).
The class of scale mixtures of skew-normal distributions includes parameters to regulate both skewness and kurtosis, and reduces to the class of scale mixtures of normal distributions when the skewness parameter vanishes. Moreover, the skew-normal distribution is recovered when the mixing distribution corresponds to a random variable that is equal to one with probability 1. Among the members of this family, whose general form was first introduced by Branco and Dey (2001), the skew $t$ distribution is the one that has received the greatest attention; it corresponds to the case where the mixing variable is $V^{-1/2}$, where $V$ is a $\chi^2_\nu/\nu$ random variable. Azzalini and Capitanio (2003) developed a systematic study of its main probabilistic properties as well as statistical issues; however, some aspects have been left unexplored, such as the expression of suitable indices of multivariate skewness and kurtosis and a formal proof of unimodality. The usefulness of the skew $t$ distribution has been explored in different applied problems. Azzalini and Genton (2008) proposed and discussed the use of the multivariate skew $t$ distribution as an attractive alternative to the classic robustness approach, and Walls (2005), Meucci (2006) and Adcock (2009), among others, adopted this model to represent relevant features of financial data. Another member which has been studied in some detail is the multivariate skew-slash distribution, defined by Wang and Genton (2006), which is obtained when the mixing variable is $U^{-1/q}$, where $U$ is a uniform random variable on the interval $(0,1)$ and $q$ is a real parameter greater than zero.
This paper introduces the definition of a canonical form associated with scale mixtures of skew-normal distributions, which generalizes the analogous one introduced in Azzalini and Capitanio (1999) for the multivariate skew-normal distribution. The motivation is its suitability in allowing a simplified representation of some relevant features which are shared by all the members of the class of scale mixtures of skew-normal distributions. In fact the components of the canonical form are such that all but one are symmetric: the skewed component summarizes the skewness of the distribution as a whole, leading to considerable simplifications in obtaining summary measures of the data shape. For instance, compact general expressions for the indices of multivariate skewness and kurtosis defined by Mardia (1970, 1974) are obtained for the entire class of scale mixtures of skew-normal distributions. It will also be shown that a data transformation leading to a canonical form generates an affine invariant coordinate system of the kind defined and discussed in Tyler et al. (2009) in connection with a general method for exploring multivariate data.
2 The skew-normal distribution and its canonical form
The multivariate skew-normal distribution was defined in Azzalini and Dalla Valle (1996). The parameterization adopted in the present paper is the one introduced by Azzalini and Capitanio (1999), who further explored the properties of this family.
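A $d$-dimensional variate $Y$ is said to have a skew-normal distribution if its density function is
(1) $$f(y) = 2\,\phi_d(y-\xi;\Omega)\,\Phi\{\alpha^T\omega^{-1}(y-\xi)\}, \qquad y\in\mathbb{R}^d,$$
where $\phi_d(\cdot\,;\Omega)$ denotes the $d$-dimensional normal density with zero mean and full rank covariance matrix $\Omega$, $\Phi$ is the $N(0,1)$ distribution function, $\xi\in\mathbb{R}^d$ is the location parameter, $\omega$ is a diagonal matrix of scale parameters such that $\bar\Omega=\omega^{-1}\Omega\omega^{-1}$ is a correlation matrix, and $\alpha\in\mathbb{R}^d$ is a shape parameter which regulates departure from symmetry. Note that when $\alpha=0$ the normal density is recovered. A random variable with density (1) will be denoted by $Y\sim SN_d(\xi,\Omega,\alpha)$. The skew-normal distribution shares many properties with the normal family, such as closure under marginalization and affine transforms, and the distribution of certain quadratic forms. See Azzalini and Capitanio (1999) for details on these issues. For later use we recall that the mean vector and the covariance matrix of $Y$ are
(2) $$E(Y) = \xi + \sqrt{2/\pi}\,\omega\delta, \qquad \mathrm{var}(Y) = \Sigma = \Omega - \frac{2}{\pi}\,\omega\delta\delta^T\omega,$$
where
(3) $$\delta = (1+\alpha^T\bar\Omega\alpha)^{-1/2}\,\bar\Omega\alpha$$
is a vector whose elements lie in the interval $(-1,1)$. From (3) we have also
(4) $$\alpha = (1-\delta^T\bar\Omega^{-1}\delta)^{-1/2}\,\bar\Omega^{-1}\delta.$$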
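It is important to note that the shape parameter of a marginal component of $Y$ is in general not equal to the corresponding component of $\alpha$. More specifically, when $Y$ is partitioned as $Y=(Y_1^T,Y_2^T)^T$ of dimension $h$ and $d-h$, respectively, the expression of the shape parameter of the marginal component $Y_1$ is given by
$$\bar\alpha_1 = \frac{\alpha_1+\bar\Omega_{11}^{-1}\bar\Omega_{12}\alpha_2}{(1+\alpha_2^T\bar\Omega_{22\cdot 1}\alpha_2)^{1/2}},$$
where $\bar\Omega_{22\cdot 1}=\bar\Omega_{22}-\bar\Omega_{21}\bar\Omega_{11}^{-1}\bar\Omega_{12}$, and $\bar\Omega_{ij}$ and $\alpha_i$, for $i,j=1,2$, denote the elements of the corresponding partitions of $\bar\Omega$ and $\alpha$, respectively. On the contrary, the entries of the vector $\delta$ after marginalization are obtained by extracting the corresponding components of the original parameter.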
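As a small numerical illustration of the link between the two shape parameterizations, the sketch below maps $\alpha$ to $\delta$ through $\delta=(1+\alpha^T\bar\Omega\alpha)^{-1/2}\bar\Omega\alpha$ and back again for $d=2$. The function names and the numerical values of the correlation matrix and of $\alpha$ are illustrative assumptions, not taken from the paper; only the Python standard library is used.

```python
import math

def delta_from_alpha(alpha, omega_bar):
    """delta = Omega_bar alpha / sqrt(1 + alpha' Omega_bar alpha)."""
    d = len(alpha)
    oa = [sum(omega_bar[i][j] * alpha[j] for j in range(d)) for i in range(d)]
    alpha_star_sq = sum(alpha[i] * oa[i] for i in range(d))
    scale = math.sqrt(1.0 + alpha_star_sq)
    return [v / scale for v in oa], alpha_star_sq

def alpha_from_delta_2d(delta, omega_bar):
    """Inverse map for d = 2, using the explicit inverse of a 2x2 matrix."""
    (a, b), (c, e) = omega_bar
    det = a * e - b * c
    inv_delta = [(e * delta[0] - b * delta[1]) / det,
                 (-c * delta[0] + a * delta[1]) / det]
    delta_star_sq = delta[0] * inv_delta[0] + delta[1] * inv_delta[1]
    scale = math.sqrt(1.0 - delta_star_sq)
    return [v / scale for v in inv_delta], delta_star_sq

omega_bar = [[1.0, 0.5], [0.5, 1.0]]  # illustrative correlation matrix
alpha = [2.0, -1.0]                   # illustrative shape vector
delta, alpha_star_sq = delta_from_alpha(alpha, omega_bar)
alpha_back, delta_star_sq = alpha_from_delta_2d(delta, omega_bar)
```

With these values the second entry of `delta` is zero although the second entry of `alpha` is not, which is one way to see that the components of $\alpha$ cannot be read marginally whereas those of $\delta$ can.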
Azzalini and Capitanio (1999, Proposition 4) introduced a canonical form associated with a skew-normal variate, via the following result.
Proposition 1
Let $Y\sim SN_d(\xi,\Omega,\alpha)$ and consider the affine non singular transform $Z^*=(C^{-1}P)^T\omega^{-1}(Y-\xi)$, where $C^TC=\bar\Omega$ and $P$ is an orthogonal matrix having the first column proportional to $C\alpha$. Then $Z^*\sim SN_d(0,I_d,\alpha_{Z^*})$, where $\alpha_{Z^*}=(\alpha_*,0,\dots,0)^T$ and $\alpha_*=(\alpha^T\bar\Omega\alpha)^{1/2}$.
The above authors called the variate $Z^*$ a canonical form of $Y$. With respect to the original definition, and without loss of generality, here it is assumed that the nonzero element of the shape vector is the first one. The above result can be easily verified by applying Proposition 3 of Azzalini and Capitanio (1999). Furthermore, using their Propositions 5 and 6 it is immediate to see that $Z_1^*\sim SN_1(0,1,\alpha_*)$ while the remaining components of $Z^*$ are $N(0,1)$ variates, and that in addition the components of $Z^*$ are mutually independent. Finally, it is remarked that the linear transform leading to a canonical form is not unique.
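Azzalini and Capitanio (1999) underlined how this transformation plays a role analogous to the one which converts a multivariate normal variable into a spherical form. Motivated by the expressions they obtained for the indices of multivariate skewness and kurtosis defined by Mardia (1970), they also highlighted the role of $\alpha_*$ as a quantity summarizing the shape of the distribution. In fact the two indices are
(5) $$\gamma_{1,d} = \left(\frac{4-\pi}{2}\right)^2\left(\frac{2\alpha_*^2}{\pi+(\pi-2)\alpha_*^2}\right)^3,$$
(6) $$\gamma_{2,d} = 2(\pi-3)\left(\frac{2\alpha_*^2}{\pi+(\pi-2)\alpha_*^2}\right)^2,$$
and they depend on $\alpha$ and $\bar\Omega$ only via $\alpha_*$.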
As an additional comment, note that by comparing expressions (5) and (6) with the corresponding ones for a univariate skew-normal distribution (see Azzalini 1985, sect. 2.3), and taking into account (4), it turns out that the values of the multivariate skewness and kurtosis indices are equal to those of the corresponding univariate indices for a skew-normal distribution having shape parameter equal to $\alpha_*$. In this sense, the canonical form is characterized by one component absorbing the departure from normality of the whole distribution.
Notice also that on using expression (3), the marginal shape parameter $\delta$ associated with the skewed component of a canonical form turns out to be $\delta_*=\alpha_*(1+\alpha_*^2)^{-1/2}$. Because of the one-to-one correspondence between these two quantities, it makes no difference which one is used as a summary quantity.
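The comparison with the univariate case can be checked numerically. The sketch below evaluates the two Mardia indices of a skew-normal distribution as functions of $\alpha_*$ alone and compares them with the Pearson indices of a scalar skew-normal variate with the same shape parameter. It assumes the standard expressions for the scalar skew-normal cumulants; since for $d=1$ the Mardia skewness index is the square of the Pearson skewness coefficient, the comparison is made with the squared coefficient. Function names are illustrative.

```python
import math

def mardia_sn(alpha_star):
    """Mardia skewness and excess kurtosis of SN_d, as functions of alpha_star only."""
    a2 = alpha_star ** 2
    ratio = 2.0 * a2 / (math.pi + (math.pi - 2.0) * a2)
    gamma1_d = ((4.0 - math.pi) / 2.0) ** 2 * ratio ** 3
    gamma2_d = 2.0 * (math.pi - 3.0) * ratio ** 2
    return gamma1_d, gamma2_d

def pearson_sn(alpha):
    """Pearson skewness and excess kurtosis of a scalar SN(0, 1, alpha)."""
    delta = alpha / math.sqrt(1.0 + alpha ** 2)
    mu = math.sqrt(2.0 / math.pi) * delta   # mean of the scalar variate
    var = 1.0 - mu ** 2                     # variance of the scalar variate
    g1 = (4.0 - math.pi) / 2.0 * mu ** 3 / var ** 1.5
    g2 = 2.0 * (math.pi - 3.0) * mu ** 4 / var ** 2
    return g1, g2

g1_d, g2_d = mardia_sn(1.7)   # 1.7 is an arbitrary illustrative value
g1, g2 = pearson_sn(1.7)
```

Neither function takes the dimension $d$ as an argument, which reflects the fact that a single component carries all the departure from normality.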
Some results contained in Tyler et al. (2009) allow us to provide new insight into the role of a canonical transformation. These authors introduced a general method for exploring multivariate data, based on a particular invariant coordinate system, which relies on the eigenvalue-eigenvector decomposition of one scatter matrix relative to another. The canonical transformation turns out to be an invariant coordinate system transformation with respect to the scatter matrices $\Omega$ and $\Sigma$, and taking into account the results of Section 3 of Tyler et al. (2009), a method to obtain a matrix $H$ such that $Z^*=H^T(Y-\xi)$ can be explicitly stated.
Proposition 2
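Let $Y\sim SN_d(\xi,\Omega,\alpha)$, and define $M=\Omega^{-1/2}\Sigma\,\Omega^{-1/2}$, where $\Omega^{1/2}$ is the unique positive definite symmetric square root of $\Omega$, and $\Sigma$ is the covariance matrix of $Y$. Let $Q\Lambda Q^T$ denote the spectral decomposition of $M$. Then the transform
$$Z^* = H^T(Y-\xi),$$
where $H=\Omega^{-1/2}Q$, converts $Y$ into a canonical form. Moreover, $H$ is an invariant coordinate system transformation based on the simultaneous diagonalization of the scatter matrices $\Omega$ and $\Sigma$.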
Proof. Consider the simultaneous diagonalization of the scatter matrices $\Omega$ and $\Sigma$, and let $\Omega^{1/2}$ denote the unique positive definite symmetric square root of $\Omega$. Following Tyler et al. (2009, Section 3), a matrix $H$ such that $H^T\Omega H=I_d$ and $H^T\Sigma H=\Lambda$ turns out to be $H=\Omega^{-1/2}Q$, where $\Lambda=\mathrm{diag}(\lambda_1,\dots,\lambda_d)$ contains the eigenvalues of $M$, or equivalently of $\Omega^{-1}\Sigma$, and where the $i$th column of the orthogonal matrix $Q$ is the normalized eigenvector of $M$ corresponding to the $i$th smallest eigenvalue. Furthermore, the $i$th column of $H$ is the eigenvector of $\Omega^{-1}\Sigma$ corresponding to the $i$th smallest eigenvalue of $\Omega^{-1}\Sigma$. The transform $H^T(Y-\xi)$ corresponds to an invariant coordinate system, as defined in Tyler et al. (2009, p. 558). After some straightforward algebra, the eigenvalues of $\Omega^{-1}\Sigma$ turn out to be $1$, with multiplicity $d-1$, and $1-(2/\pi)\,\delta^T\bar\Omega^{-1}\delta$, and the associated eigenspaces are the orthogonal complement of the subspace spanned by $\omega\delta$, and the subspace spanned by $\omega^{-1}\alpha$, respectively. This fact implies that the first row of $H^T$ is proportional to $\alpha^T\omega^{-1}$, while the last $d-1$ rows lie in the orthogonal complement of the subspace spanned by $\omega\delta$. On using the expressions for the parameters of a linear transformation of a skew-normal variate given in Azzalini and Capitanio (1999, p. 585), the distribution of $Z^*=H^T(Y-\xi)$ is $SN_d(0,I_d,H^{-1}\omega^{-1}\alpha)$; taking into account the structure of the matrix $H$ and the equality $H^{-1}=H^T\Omega$ we obtain, with the sign of the first column of $Q$ chosen so that the first entry is positive, $H^{-1}\omega^{-1}\alpha=(\alpha_*,0,\dots,0)^T$, and hence the variate $Z^*$ corresponds to the canonical form of $Y$. QED
The proof of Proposition 2 contains a description of the structure of the matrix $H$, which is shown to have one column proportional to $\omega^{-1}\alpha$ and the remaining ones belonging to the orthogonal complement of the subspace spanned by $\omega\delta$. This result implies that the projection $\alpha^T\omega^{-1}Y$ captures all the skewness and the kurtosis of the joint distribution, whereas by projecting onto the orthogonal complement of the subspace spanned by $\omega\delta$ independent normal variates are obtained.
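The construction described in Proposition 2 can be sketched numerically for $d=2$, where the symmetric square root and the spectral decomposition have closed forms. The code below is a minimal illustration, assuming the covariance matrix of a skew-normal variate is $\Omega-(2/\pi)\,\omega\delta\delta^T\omega$; the helper names and the parameter values are hypothetical, and only the standard library is used.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def inv2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def sqrt_spd2(a):
    """Symmetric positive definite square root of a 2x2 SPD matrix (closed form)."""
    s = math.sqrt(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    t = math.sqrt(a[0][0] + a[1][1] + 2.0 * s)
    return [[(a[0][0] + s) / t, a[0][1] / t], [a[1][0] / t, (a[1][1] + s) / t]]

def eigh2(m):
    """Ascending eigenvalues and orthonormal eigenvectors (as columns) of a symmetric 2x2 matrix."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    mid, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    lam = [mid - rad, mid + rad]
    if abs(b) < 1e-15:
        vecs = [[1.0, 0.0], [0.0, 1.0]] if a <= c else [[0.0, 1.0], [1.0, 0.0]]
    else:
        vecs = []
        for l in lam:
            n = math.hypot(b, l - a)
            vecs.append([b / n, (l - a) / n])
    return lam, transpose(vecs)

# Illustrative parameters (assumptions, not from the paper)
omega = [2.0, 0.5]
omega_bar = [[1.0, 0.5], [0.5, 1.0]]
alpha = [2.0, -1.0]

Omega = [[omega[i] * omega_bar[i][j] * omega[j] for j in range(2)] for i in range(2)]
oa = [sum(omega_bar[i][j] * alpha[j] for j in range(2)) for i in range(2)]
alpha_star_sq = sum(alpha[i] * oa[i] for i in range(2))
delta = [v / math.sqrt(1.0 + alpha_star_sq) for v in oa]
wd = [omega[i] * delta[i] for i in range(2)]
Sigma = [[Omega[i][j] - (2.0 / math.pi) * wd[i] * wd[j] for j in range(2)] for i in range(2)]

Omega_inv_half = inv2(sqrt_spd2(Omega))
M = matmul(matmul(Omega_inv_half, Sigma), Omega_inv_half)
lam, Q = eigh2(M)
H = matmul(Omega_inv_half, Q)                         # candidate canonical transform
HtOH = matmul(matmul(transpose(H), Omega), H)         # should be the identity
HtSH = matmul(matmul(transpose(H), Sigma), H)         # should be diag(lam)
```

In this example the column of `H` associated with the smallest eigenvalue is proportional to $\omega^{-1}\alpha=(1,-2)^T$, in line with the structure described in the proof; its sign is not determined by the decomposition.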
Since a matrix converting a skew-normal variate to its canonical form can be obtained through the simultaneous diagonalization of a pair of scatter matrices different from $\Omega$ and $\Sigma$, it is expected that when two scatter matrices, $V_1$ and $V_2$ say, are such that they become diagonal when the variate is in canonical form, then Proposition 2 will continue to be valid if the matrices $\Omega$ and $\Sigma$ are replaced by $V_1$ and $V_2$. An example of such matrices will be given in Proposition 6 at the end of Section 4.
3 Scale mixtures of skew-normal variates and their canonical form
In this section a canonical form analogous to the one introduced for the skew-normal distribution is defined for scale mixtures of skew-normal distributions, and some properties are given.
3.1 Scale mixtures of skew-normal variates
Scale mixtures of skew-normal distributions have been considered in Branco and Dey (2001). This class of distributions contains the corresponding class of scale mixtures of normal distributions and the skew-normal distribution as proper members, allowing a wide range of shapes to be modelled. A scale mixture of skew-normal distributions is defined as follows.
Definition 1
Let $Y=\xi+\omega SZ$, where $Z\sim SN_d(0,\bar\Omega,\alpha)$ and $S>0$ is an independent scalar random variable. Then the variate $Y$ is a scale mixture of skew-normal distributions, with location and scale parameters $\xi$ and $\omega$, respectively.
Note that, when $\alpha=0$, $Y$ reduces to the corresponding scale mixture of normal distributions.
The $r$th order moments of $Y$ can be calculated by differentiating the moment generating function given in Branco and Dey (2001, expression 4.1). An alternative and simpler way to obtain moments is to follow the scheme used by Azzalini and Capitanio (2003, expression (28)) for the moments of the skew $t$ distribution, which arises when $S=V^{-1/2}$ and $V$ is a $\chi^2_\nu/\nu$ random variable. Specifically, assuming that $\xi=0$ and $\omega=I_d$, by exploiting the stochastic representation given in Definition 1 we obtain
(7) $$E\{Y^{(r)}\} = E(S^r)\,E\{Z^{(r)}\},$$
where $E\{X^{(r)}\}$ denotes a moment of order $r$ of $X$. Note that to use this formula only the knowledge of the $r$th order moments of $S$ and $Z$ is required.
An appealing property of an $SN_d(0,\bar\Omega,\alpha)$ variate is that the distribution of any even function of it is equal to the one obtained by applying the same even function to a $N_d(0,\bar\Omega)$ variate. This fact can be easily seen by considering Proposition 2 in Azzalini and Capitanio (2003) and noting that the skew-normal distribution belongs to the broader class of distributions generated by perturbation of symmetry which the proposition is concerned with. As a corollary it follows from (7) that even order moments of $Y-\xi$ are equal to those of the corresponding scale mixture of normal distributions. On using (7) and taking into account (2), the mean vector and the covariance matrix of $Y$ are
(8) $$E(Y) = \xi + \sqrt{2/\pi}\,E(S)\,\omega\delta, \qquad \mathrm{var}(Y) = E(S^2)\,\Omega - \frac{2}{\pi}\,E(S)^2\,\omega\delta\delta^T\omega,$$
in agreement with those obtained by Branco and Dey (2001).
Scale mixtures of skew-normal distributions are models capable of accounting for both skewness and kurtosis, and it is important to have available the expressions of measures of these two features. The next proposition introduces the expression of the Pearson indices of skewness and kurtosis for the univariate case; the multivariate case will be considered later, as the introduction of the canonical form of $Y$ allows the problem to be tackled in a simpler manner.
Proposition 3
Let $Y=\xi+\omega SZ$, where $Z\sim SN_1(0,1,\alpha)$ and $S$ is a scalar random variable. Then, provided that the moments up to order three or up to order four of $S$ exist, the expressions of the skewness and excess of kurtosis indices $\gamma_1$ and $\gamma_2$ are
where .
Proof. Since the two indices are location and scale invariant, the case where $\xi=0$ and $\omega=1$ will be considered. The third and the fourth cumulants of $Y$ required to compute $\gamma_1$ and $\gamma_2$ are functions of the first four non central moments of $Y$, which in turn, taking into account (7), depend on the corresponding moments of $Z$. The first moment of $Z$ is given in (2), and taking into account that $Z^2\sim\chi^2_1$ (see Azzalini 1985, property H) the second and the fourth ones are equal to 1 and 3, respectively. Finally, by differentiating the moment generating function of the scalar skew-normal distribution given in Azzalini (1985, p. 174), the third moment of $Z$ turns out to be $\sqrt{2/\pi}\,(3\delta-\delta^3)$, where $\delta=\alpha(1+\alpha^2)^{-1/2}$. After some algebra the result follows. QED
Note that when $\alpha=0$ the variate $Y$ is a scale mixture of normal distributions, so that the index $\gamma_1$ becomes zero and $\gamma_2$ measures the excess of kurtosis of $Y$. When $S$ is degenerate and $S=1$, the expressions of the two indices for the skew-normal distribution are recovered. When $S$ is the inverse of the square root of a $\chi^2_\nu/\nu$ random variable, $Y$ follows a scalar skew $t$ distribution, and the two indices coincide with those given in Azzalini and Capitanio (2003, p. 382).
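The route followed in the proof, from the moments of the mixing variable to the two Pearson indices, can be sketched as follows. This is an illustrative implementation under stated assumptions, not the closed-form expressions of Proposition 3: it takes the moments $m_k=E(S^k)$, $k=1,\dots,4$, as inputs (the name `m_k` is introduced here), uses the non central moments of a scalar skew-normal variate, and converts non central moments into central ones.

```python
import math

def pearson_smsn(alpha, m1, m2, m3, m4):
    """Skewness and excess kurtosis of Y = S * Z, Z ~ SN(0, 1, alpha), given m_k = E(S^k)."""
    delta = alpha / math.sqrt(1.0 + alpha ** 2)
    b = math.sqrt(2.0 / math.pi)
    # Non central moments of Z: E(Z) = b*delta, E(Z^2) = 1,
    # E(Z^3) = b*(3*delta - delta^3), E(Z^4) = 3; then E(Y^r) = E(S^r) E(Z^r).
    ey1 = m1 * b * delta
    ey2 = m2
    ey3 = m3 * b * (3.0 * delta - delta ** 3)
    ey4 = 3.0 * m4
    var = ey2 - ey1 ** 2
    mu3 = ey3 - 3.0 * ey1 * ey2 + 2.0 * ey1 ** 3
    mu4 = ey4 - 4.0 * ey1 * ey3 + 6.0 * ey1 ** 2 * ey2 - 3.0 * ey1 ** 4
    return mu3 / var ** 1.5, mu4 / var ** 2 - 3.0

# Degenerate mixing variable S = 1: the skew-normal case
g1_sn, g2_sn = pearson_smsn(1.7, 1.0, 1.0, 1.0, 1.0)
# alpha = 0 and S = V^{-1/2}, V ~ chi^2_10 / 10: Student t with 10 degrees of freedom,
# for which E(S^2) = 10/8 and E(S^4) = 100/48 (odd moments of S play no role here)
g1_t, g2_t = pearson_smsn(0.0, 1.0, 10.0 / 8.0, 1.0, 100.0 / 48.0)
```

The second call reproduces the familiar excess kurtosis $6/(\nu-4)$ of a Student $t$ variate with $\nu=10$, which gives a quick sanity check on the moment bookkeeping.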
3.2 The canonical form of scale mixtures of skew-normal distributions
The canonical form for scale mixtures of skew-normal distributions is defined in the following way.
Definition 2
Let $Y=\xi+\omega SZ$, where $Z\sim SN_d(0,\bar\Omega,\alpha)$ and $S$ is an independent scalar random variable. The variate $Y^*=(C^{-1}P)^T\omega^{-1}(Y-\xi)$, where the matrices $C$ and $P$ are as in Proposition 1, will be called a canonical form of $Y$.
From the above definition, it is straightforward to see that Proposition 2 can be extended to scale mixtures of skew-normal variates, that is, the linear transform $H^T(Y-\xi)$, where $H$ is defined as in Proposition 2, converts $Y$ into a canonical form.
The next proposition states some properties of .
Proposition 4
Under the settings of Definition 2, the following facts hold.
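1. Only the first univariate component of $Y^*$ can be skewed. More specifically, $Y_1^*$ is a scale mixture of an $SN_1(0,1,\alpha_*)$ variate, where $\alpha_*=(\alpha^T\bar\Omega\alpha)^{1/2}$, and its mean and variance are
$$E(Y_1^*) = \sqrt{2/\pi}\,E(S)\,\delta_*, \qquad \mathrm{var}(Y_1^*) = E(S^2)-\frac{2}{\pi}\,E(S)^2\,\delta_*^2,$$
respectively, where $\delta_*=\alpha_*(1+\alpha_*^2)^{-1/2}$. The remaining components are identically distributed scale mixtures of $N(0,1)$ distributions, that is, symmetric about zero random variables with variance $E(S^2)$.
2. The components of $Y^*$ are uncorrelated.
3. The non zero elements of the set of moments $E(Y_i^*Y_j^*Y_k^*)$, $i,j,k=1,\dots,d$, are
$$E(Y_1^{*3}) = \sqrt{2/\pi}\,E(S^3)\,(3\delta_*-\delta_*^3)$$
and
$$E(Y_1^*Y_i^{*2}) = \sqrt{2/\pi}\,E(S^3)\,\delta_*, \qquad i=2,\dots,d.$$
4. The non zero elements of the set of moments $E(Y_i^*Y_j^*Y_k^*Y_l^*)$, $i,j,k,l=1,\dots,d$, are
$$E(Y_i^{*4}) = 3E(S^4), \quad i=1,\dots,d, \qquad E(Y_i^{*2}Y_j^{*2}) = E(S^4), \quad i\neq j.$$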
Proof. 1. By definition $Y^*=SZ^*$; the result follows taking into account that $Z_1^*\sim SN_1(0,1,\alpha_*)$ whilst the last $d-1$ components of $Z^*$ are $N(0,1)$. The expressions for the means and the variances can be obtained by (8) taking into account (3). 2. Using (3) the vector $\delta$ associated with $Z^*$ becomes $(\delta_*,0,\dots,0)^T$, where $\delta_*=\alpha_*(1+\alpha_*^2)^{-1/2}$; taking into account the expression of the covariance matrix given in (8), we see that $\mathrm{var}(Y^*)$ is diagonal. 3–4. From expression (7) we have $E(Y_i^*Y_j^*Y_k^*)=E(S^3)E(Z_i^*Z_j^*Z_k^*)$ and $E(Y_i^*Y_j^*Y_k^*Y_l^*)=E(S^4)E(Z_i^*Z_j^*Z_k^*Z_l^*)$. The result follows taking into account that the components of $Z^*$ are mutually independent and the expressions of their moments. QED
The above results show that the main features of the canonical form of the skew-normal distribution are preserved when a scale mixture is considered. In fact only the first component is skewed, and the influence of the parameters $\alpha$ and $\bar\Omega$ is completely summarized by the quantity $\alpha_*$, or equivalently by $\delta_*$. Independence among the components is replaced by zero correlation, as expected, since scale mixtures of normal distributions themselves do not allow independence between components to be modelled.
4 Mardia indices of multivariate skewness and kurtosis
The canonical form of $Y$ can lead to dramatic simplification in calculating quantities which are invariant or equivariant with respect to invertible affine transformations. This is the case, for instance, of the Mardia indices of multivariate skewness and kurtosis and of the mode. In this section the Mardia indices will be considered, while the latter issue will be developed in the next section.
Given a $d$-dimensional random variable $X$, the Mardia indices of multivariate skewness and excess of kurtosis are defined as follows
$$\gamma_{1,d} = \sum_{r,s,t}\,\sum_{r',s',t'}\sigma^{rr'}\sigma^{ss'}\sigma^{tt'}\mu_{rst}\,\mu_{r's't'}, \qquad \gamma_{2,d} = E\big[\{(X-\mu)^T\Sigma^{-1}(X-\mu)\}^2\big]-d(d+2),$$
where $\mu$ and $\Sigma$ denote the mean vector and the covariance matrix of $X$, respectively, $\mu_{rst}=E\{(X_r-\mu_r)(X_s-\mu_s)(X_t-\mu_t)\}$, and $\sigma^{rs}$ denotes the $(r,s)$th entry of $\Sigma^{-1}$.
Proposition 5
Consider the scale mixture of skew-normal distributions $Y=\xi+\omega SZ$, where $Z\sim SN_d(0,\bar\Omega,\alpha)$. Then the Mardia indices of multivariate skewness and excess of kurtosis of $Y$ are, provided that the involved moments of $S$ exist
where, using a self explanatory notation, the quantities , , and refer to the component of the canonical form associated to .
Proof. In the proof some symbols introduced in Proposition 4 will be used. Since and are invariant with respect to invertible affine transforms, the canonical form will be considered in place of . From of Proposition 4 we know that the last components of are symmetric about zero; a first implication is that for any . In addition, taking into account , it follows that for any choice of , and in . From we have for any , and consequently reduces to
Finally, by expressing in terms of non central moments and by applying (7), the first equality is proved.
Let us denote by the generic entry of the fourth order central moment of ; taking into account and of Proposition 4 we have
where the expressions of and for and greater than 1 are given in of Proposition 4, and that of can be obtained with the aid of (7). After some algebra the second equality follows. QED
This result shows that, if $Y$ is a scale mixture of skew-normal distributions, then $\gamma_{1,d}$ and $\gamma_{2,d}$ depend on the shape of the distribution of $S$, and on the underlying skew-normal variate only via the scalar quantity $\alpha_*$, or equivalently $\delta_*$, reinforcing its role as a summary quantity of the distribution shape.
By comparing these expressions with the corresponding ones of the skew-normal distribution, given by (5) and (6), respectively, we can observe that they have a different structure. In particular, when a scale mixture of skew-normal distributions is considered, the two indices do not coincide with their univariate version evaluated with respect to the marginal distribution of the only skewed component of the variate in canonical form.
It could be of interest to highlight the structure of . It turns out that it is the sum of three terms: the univariate kurtosis index of , whose expression is given in Proposition 3, the kurtosis index of the dimensional scale mixture of normal distribution , which is given by , and a term which is related with the fourth moment of through , for any .
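The three-term structure of the kurtosis index can be made concrete with a short computation. The sketch below is derived here from the properties of the canonical form (uncorrelated components, one skewed component, the remaining ones identically distributed and symmetric); it is an illustration under those assumptions and is not a transcription of the expressions in Proposition 5. The inputs are the dimension $d$, the summary quantity $\alpha_*$ and the moments $m_k=E(S^k)$; all names are introduced for the example.

```python
import math

def mardia_smsn(d, alpha_star, m1, m2, m3, m4):
    """Mardia skewness and excess kurtosis of a d-dimensional scale mixture of
    skew-normal variates, computed on its canonical form (m_k = E(S^k))."""
    delta = alpha_star / math.sqrt(1.0 + alpha_star ** 2)
    b = math.sqrt(2.0 / math.pi)
    # Skewed component Y1 = S * Z1 with Z1 ~ SN(0, 1, alpha_star)
    ey1 = m1 * b * delta
    ey3 = m3 * b * (3.0 * delta - delta ** 3)
    var1 = m2 - ey1 ** 2
    mu3 = ey3 - 3.0 * ey1 * m2 + 2.0 * ey1 ** 3
    mu4 = 3.0 * m4 - 4.0 * ey1 * ey3 + 6.0 * ey1 ** 2 * m2 - 3.0 * ey1 ** 4
    g1_star = mu3 / var1 ** 1.5            # univariate skewness of Y1
    g2_star = mu4 / var1 ** 2 - 3.0        # univariate excess kurtosis of Y1
    # Mixed central moments with a symmetric component Yi = S * Zi, i > 1
    mu_1ii = b * delta * (m3 - m1 * m2)                         # E[(Y1 - E Y1) Yi^2]
    mu_11ii = m4 - 2.0 * ey1 * b * delta * m3 + ey1 ** 2 * m2   # E[(Y1 - E Y1)^2 Yi^2]
    gamma1_d = g1_star ** 2 + 3.0 * (d - 1) * mu_1ii ** 2 / (var1 * m2 ** 2)
    gamma2_d = (g2_star                                          # kurtosis of Y1
                + (d - 1) * (d + 1) * (m4 / m2 ** 2 - 1.0)       # (d-1)-dim normal mixture
                + 2.0 * (d - 1) * (mu_11ii / (var1 * m2) - 1.0)) # cross term
    return gamma1_d, gamma2_d

# S = 1 recovers the skew-normal case; alpha_star = 0 gives a scale mixture of normals
g_sn = mardia_smsn(3, 1.7, 1.0, 1.0, 1.0, 1.0)
g_t = mardia_smsn(3, 0.0, 1.0, 10.0 / 8.0, 1.0, 100.0 / 48.0)
```

When $S=1$ the extra terms vanish and the indices no longer depend on $d$; with a non degenerate $S$ both indices pick up terms that grow with the dimension, which is the structural difference noted above.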
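When $S=V^{-1/2}$, with $V\sim\chi^2_\nu/\nu$, explicit expressions of the two indices can be easily obtained taking into account the well known result
$$E(V^{-k/2}) = \frac{(\nu/2)^{k/2}\,\Gamma\{(\nu-k)/2\}}{\Gamma(\nu/2)}, \qquad \nu>k,$$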
leading to
where
and the explicit expressions of and are given in Azzalini and Capitanio (2003, p. 382).
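For the skew $t$ case the moments of the mixing variable needed by the indices can be evaluated directly. The sketch below assumes $S=V^{-1/2}$ with $V\sim\chi^2_\nu/\nu$ and uses the standard expression $E(V^{-k/2})=(\nu/2)^{k/2}\,\Gamma\{(\nu-k)/2\}/\Gamma(\nu/2)$, valid for $\nu>k$; the function name is hypothetical, and the log-gamma function is used to avoid overflow for large $\nu$.

```python
import math

def moment_inv_sqrt_chi2(k, nu):
    """E(S^k) for S = V^{-1/2}, V ~ chi^2_nu / nu; finite only when nu > k."""
    if nu <= k:
        raise ValueError("the moment of order k requires nu > k")
    log_value = (0.5 * k * math.log(nu / 2.0)
                 + math.lgamma((nu - k) / 2.0) - math.lgamma(nu / 2.0))
    return math.exp(log_value)

nu = 10.0                                              # illustrative degrees of freedom
moments = [moment_inv_sqrt_chi2(k, nu) for k in (1, 2, 3, 4)]
```

The fourth moment exists only for $\nu>4$, so the kurtosis index of the skew $t$ distribution is defined only in that range.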
Note that an equivalent expression, obtained through a different method, for , is given in Kim and Mallik (2009). Finally, note also that the expression of and given in Proposition 5 reduces to the corresponding ones for the skewnormal distribution when is such that , while is the index of multivariate kurtosis of a scale mixture of normal distributions with mixing variable when .
The following proposition provides a further example of a pair of scatter matrices that can be used for obtaining the linear transform to convert a scale mixture of skewnormal variates into a canonical form. The proof of the proposition contains the proof of the fact that if two scatter matrices are diagonal when the considered variate is in canonical form, then it is expected that by applying to them the procedure described in Proposition 2 we obtain a matrix that induces a canonical form.
Proposition 6
Consider the scale mixture of skewnormal distribution , where , and define the scatter matrix
Let , where is the unique positive definite symmetric square root of , and is the covariance matrix of . Let denote the spectral decomposition of . Then the transform
where , converts into a canonical form.
Proof. By means of the results contained in Proposition 4 it is possible to show that when a scale mixture of skewnormal distribution is in canonical form, then both the scatter matrices and are diagonal. Let and denote such matrices, where is a matrix such that is in canonical form. The equality , where is the th column of the matrix and is the corresponding eigenvalue, implies that the equality must also hold true; since both and are diagonal, the equality is fulfilled when all the eigenvalues of are equal, or when . The first circumstance is out of interest, because it would imply that we are considering two scatter matrices which are proportional, the second one implies that the columns of are proportional to the corresponding columns of , and the proposition is proved. QED
On the basis of Propositions 2 and 6 we see that the matrix that defines the canonical form can be obtained working with the pair or with , no matter which one between them. However it is important to highlight the auxiliary information given by this technique, which essentially relies on a spectral decomposition. In particular, it is straightforward to note that the trace of the matrix , or equivalently, of , is equal to the sum of the variances of the marginal univariate components of the canonical form, while the trace of the matrix , or equivalently, of , is equal to .
5 The mode of the multivariate skew-normal and skew t distributions
The mode of the skew-normal and skew $t$ distributions cannot be calculated in closed form, so one needs to resort to numerical methods. In this section the uniqueness of the mode in the $d$-dimensional case is proved, and it is shown that its computation can be reduced to an equivalent one-dimensional problem, drastically reducing the dimensionality of the original problem. From the expression of the mode which is obtained, it also turns out that the mode, the mean and the location parameter are aligned. More specifically, they lie in a one dimensional linear manifold of direction $\omega\delta$. Thus, the departure from symmetry of these distributions is characterized by a displacement of the probability mass along this direction. The above issues are briefly discussed also for the general case of scale mixtures of skew-normal distributions.
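For later use, we recall that the density function of a $d$-dimensional skew $t$ variate as given by Azzalini and Capitanio (2003, expression 26) is
(9) $$f(y) = 2\,t_d(y;\nu)\,T_1\left\{\alpha^T\omega^{-1}(y-\xi)\left(\frac{\nu+d}{Q_y+\nu}\right)^{1/2};\,\nu+d\right\},$$
where $Q_y=(y-\xi)^T\Omega^{-1}(y-\xi)$, $t_d(y;\nu)$ is the density function of a $d$-dimensional $t$ variate with $\nu$ degrees of freedom, location $\xi$ and scale matrix $\Omega$, and $T_1(\cdot\,;\nu+d)$ is the scalar $t$ distribution function with $\nu+d$ degrees of freedom. A random variable having density (9) will be denoted by $Y\sim ST_d(\xi,\Omega,\alpha,\nu)$.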
Proposition 7
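Let $Y\sim SN_d(\xi,\Omega,\alpha)$. Then the unique mode of $Y$ is
$$M_0 = \xi+\frac{m_0^*}{\alpha_*}\,\omega\bar\Omega\alpha = \xi+\frac{m_0^*}{\delta_*}\,\omega\delta,$$
where $\alpha_*=(\alpha^T\bar\Omega\alpha)^{1/2}$, $\delta_*=\alpha_*(1+\alpha_*^2)^{-1/2}$, and $m_0^*$ is the mode of a scalar $SN_1(0,1,\alpha_*)$ random variable.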
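Proof. Consider first the mode of the canonical form $Z^*\sim SN_d(0,I_d,\alpha_{Z^*})$. If we calculate the mode by imposing the gradient of the density function to be equal to the null vector, the system of equations to be solved turns out to be, after dividing by the positive factor $2\phi_d(z;I_d)$,
$$\alpha_*\,\phi(\alpha_*z_1)-z_1\,\Phi(\alpha_*z_1)=0, \qquad z_i\,\Phi(\alpha_*z_1)=0, \quad i=2,\dots,d,$$
where $\phi$ is the $N(0,1)$ density and $z_i$ denotes the $i$th entry of the vector $z$. The last $d-1$ equations are satisfied when $z_i=0$ for $i=2,\dots,d$, whilst the unique root (for the uniqueness see Azzalini, 1985, Property D) of the first one corresponds to the mode, say $m_0^*$, of a $SN_1(0,1,\alpha_*)$, so that the mode of $Z^*$ is the vector $(m_0^*,0,\dots,0)^T$. Recalling that $Y=\xi+\omega C^TPZ^*$ and that the first column of $P$ is $\alpha_*^{-1}C\alpha$, and taking into account that the mode is equivariant with respect to affine transformations, the mode of $Y$ turns out to be
$$M_0 = \xi+\frac{m_0^*}{\alpha_*}\,\omega C^TC\alpha = \xi+\frac{m_0^*}{\alpha_*}\,\omega\bar\Omega\alpha = \xi+\frac{m_0^*}{\delta_*}\,\omega\delta,$$
where the last equality follows taking into account (3) and (4). QED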
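The reduction to a one-dimensional problem can be sketched as follows. The code solves the scalar first-order condition $z=\alpha_*\phi(\alpha_*z)/\Phi(\alpha_*z)$ by bisection and then maps the scalar mode back along the direction $\omega\bar\Omega\alpha$, which is proportional to $\omega\delta$. The mapping $\xi+(m_0^*/\alpha_*)\,\omega\bar\Omega\alpha$ is the form assumed here for the mode; the function names, the bracketing interval and the parameter values are illustrative choices, and bisection is just one convenient root finder.

```python
import math

def sn_mode_scalar(alpha, tol=1e-12):
    """Mode of a scalar SN(0, 1, alpha), found as the root of z - alpha*phi(alpha z)/Phi(alpha z)."""
    if alpha == 0.0:
        return 0.0
    if alpha < 0.0:
        return -sn_mode_scalar(-alpha, tol)   # SN(-alpha) is the mirror image of SN(alpha)

    def score(z):
        phi = math.exp(-0.5 * (alpha * z) ** 2) / math.sqrt(2.0 * math.pi)
        Phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
        return z - alpha * phi / Phi

    lo, hi = 0.0, 1.0 + alpha   # score(lo) < 0 < score(hi) when alpha > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sn_mode(xi, omega, omega_bar, alpha):
    """Mode of SN_d: location plus a scalar multiple of omega * Omega_bar * alpha."""
    d = len(alpha)
    oa = [sum(omega_bar[i][j] * alpha[j] for j in range(d)) for i in range(d)]
    alpha_star = math.sqrt(sum(alpha[i] * oa[i] for i in range(d)))
    if alpha_star == 0.0:
        return list(xi)
    m0 = sn_mode_scalar(alpha_star)
    return [xi[i] + (m0 / alpha_star) * omega[i] * oa[i] for i in range(d)]

# Illustrative parameters (assumptions, not from the paper)
mode = sn_mode([0.0, 0.0], [2.0, 0.5], [[1.0, 0.5], [0.5, 1.0]], [2.0, -1.0])
```

Whatever the dimension, only one scalar equation is solved; the remaining work is a matrix-vector product.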
Proposition 8
Let . Then the unique mode of is
where is the unique solution of the equation
where .
Proof. As for the skewnormal case, the canonical form , where is considered, and the mode is calculated by imposing the gradient of the density function to be equal to the null vector. The system of equations to solve turns out to be
where and . First note that the function on the left hand side of the first equation can be equal to zero only if . This fact implies that the remaining equations are equal to zero if and only if for . Hence the mode of is , where the scalar value is the solution of
(10) 
where . To see that equation (10) admits a unique solution, first notice that when the function on the right hand side is the difference between a strictly increasing function and a strictly decreasing one. Furthermore, when the latter is greater than zero while the former is equal to zero, and as the latter goes to zero while the former goes to . Hence, there exists a unique point in which their difference is equal to zero. The expression of the mode of is obtained on the basis of arguments analogous to those used for the mode of a multivariate skewnormal distribution. QED
Note that a different proof of the uniqueness of the mode for the multivariate skew $t$ distribution has been independently developed by Azzalini and Regoli (2012).
The issue of finding the mode of other members of the family of scale mixtures of skew-normal distributions can be tackled in a similar way. An open problem, which is not investigated here, is to assess the uniqueness of the solution.
It is straightforward to see that if a point of is the mode of the canonical form of a dimensional skew scale mixture of skewnormal variates, then it should be of type , where the real number is such that
where denotes the density function of . This implies that, as for the skewnormal and skew distributions, the mode of a scale mixture of skewnormal distributions will be of the form
Acknowledgements
The author is grateful to Adelchi Azzalini for helpful and stimulating discussions. This research has been supported by the grant scheme PRIN 2006, grant No. 2006132978, from MIUR, Italy.
References
Adcock, C. J. (2009). Asset pricing and portfolio selection based on the multivariate extended skew-Student-t distribution. Ann. Oper. Res. In press.
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scand. J. Statist. 12, 171–178.
Azzalini, A. & Capitanio, A. (1999). Statistical applications of the multivariate skew normal distribution. J. Roy. Statist. Soc., B 61, 579–602.
Azzalini, A. & Capitanio, A. (2003). Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. J. Roy. Statist. Soc., B 65, 367–389.
Azzalini, A. & Dalla Valle, A. (1996). The multivariate skew normal distribution. Biometrika 83, 715–726.
Azzalini, A. & Genton, M. G. (2008). Robust likelihood methods based on the skew-t and related distributions. Int. Statist. Rev. 76, 106–129.
Azzalini, A. & Regoli, G. (2012). Some properties of skew-symmetric distributions. Annals of the Institute of Statistical Mathematics 64, 857–879.
Branco, M. D. & Dey, D. K. (2001). A general class of multivariate skew elliptical distributions. Journal of Multivariate Analysis 79, 99–113.
Wang, J. & Genton, M. G. (2006). The multivariate skew-slash distribution. J. Statist. Plann. Inference 136, 209–220.
Genton, M. G., Li, H. & Xiangwei, L. (2001). Moments of skew normal random vectors and their quadratic forms. Statist. & Prob. Lett. 51, 319–325.
Kim, H. M. (2008). A note on scale mixtures of skew normal distribution. Statist. & Prob. Lett. 78, 1694–1701.
Kim, H. M., & Mallik, B. K. (2009). Corrigendum to: “Moments of random vectors with skew t distribution and their quadratic forms” [Statist. Probab. Lett. 63 (2003) 417–423]. Statist. & Prob. Lett. 79, 2098–2099.
Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika 57, 519–530.
Mardia, K.V. (1974). Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya B 36, 115–128.
Meucci, A. (2006). Beyond Black-Litterman: views on non-normal markets. Risk Magazine 19(2), 87–92.
Tyler, D. E., Critchley, F., Dümbgen, L., & Oja, H. (2009). Invariant coordinate selection (with discussion). J. Roy. Statist. Soc., B 71, 549–592.
Walls, W. D. (2005). Modeling heavy tails and skewness in film returns. Applied Financial Economics 15(17), 1181–1188.