
# The Sparse Multivariate Method of Simulated Quantiles

Mauro Bernardi (corresponding author), Department of Statistical Sciences, University of Padova, and Istituto per le Applicazioni del Calcolo “Mauro Picone” - CNR, Roma, Italy. Address: Via C. Battisti, 241, 35121 Padua, Italy; e-mail: mauro.bernardi@unipd.it; Tel.: +39.049.8274165.
Lea Petrella, MEMOTEF Department, Sapienza University of Rome; e-mail: lea.petrella@uniroma1.it.
Paola Stolfi, Department of Economics, Roma Tre University; e-mail: paola.stolfi@uniroma3.it.
July 5, 2019
###### Abstract

In this paper the method of simulated quantiles (MSQ) of Dominicy and Veredas (2013) and Dominicy et al. (2013) is extended to a general multivariate framework (MMSQ) and to provide sparse estimation of the scaling matrix (sparse–MMSQ). The MSQ, like alternative likelihood–free procedures, is based on the minimisation of the distance between appropriate statistics evaluated on the true data and on synthetic data simulated from the postulated model. Those statistics are functions of the quantiles, providing an effective way to deal with distributions that do not admit moments of any order, like the $\alpha$–Stable or the Tukey lambda distribution. The lack of a natural ordering represents the major challenge for the extension of the method to the multivariate framework. Here, we rely on the notion of projectional quantile recently introduced by Hallin et al. (2010b) and Kong and Mizera (2012). We establish consistency and asymptotic normality of the proposed estimator. The smoothly clipped absolute deviation (SCAD) penalty of Fan and Li (2001) is then introduced into the MMSQ objective function in order to achieve sparse estimation of the scaling matrix, which is the main source of the curse of dimensionality problem. We extend the asymptotic theory and show that the sparse–MMSQ estimator enjoys the oracle properties under mild regularity conditions. The method is illustrated, and its effectiveness is tested, using several synthetic datasets simulated from the Elliptical Stable distribution (ESD), for which alternative methods are recognised to perform poorly. The method is then applied to build a new network–based systemic risk measurement framework. The proposed methodology to build the network relies on a new systemic risk measure and on a parametric test of statistical dominance.

Keywords: directional quantiles, method of simulated quantiles, sparse regularisation, SCAD, Elliptical Stable distribution, systemic risk, network risk measures.

## 1 Introduction

Model–based statistical inference primarily deals with parameter estimation. Under the usual assumption of data being generated from a fully specified model belonging to a given family of distributions indexed by a parameter vector, inference on the true unknown parameter can be easily performed by maximum likelihood. However, in some pathological situations the maximum likelihood estimator (MLE) is difficult to compute, either because of the model complexity or because the probability density function is not analytically available. For example, the computation of the log–likelihood may involve numerical approximations or integrations that highly deteriorate the quality of the resulting estimates. Moreover, as the dimension of the parameter space increases, the computation of the likelihood, or its maximisation in a reasonable amount of time, becomes even more prohibitive. In all those circumstances, the researcher should resort to alternative solutions. The method of moments or its generalised versions, the generalised method of moments (GMM) of Hansen (1982) or the efficient method of moments (EMM) of Gallant and Tauchen (1996), may constitute feasible solutions when expressions for some moment conditions that uniquely identify the parameters of interest are analytically available. When this is not the case, simulation–based methods, such as the method of simulated moments (MSM) of McFadden (1989), the method of simulated maximum likelihood (SML) of Gouriéroux and Monfort (1996) and its nonparametric version of Kristensen and Shin (2012), or the indirect inference (II) method of Gouriéroux et al. (1993), are the only viable solutions to the inferential problem. Jiang and Turnbull (2004) give a comprehensive review of indirect inference from a statistical point of view. Despite their appealing characteristic of only requiring the ability to simulate from the specified DGP, some of those methods suffer from serious drawbacks.
The MSM, for example, requires that the moments of the postulated DGP exist, while the II method relies on an alternative, necessarily misspecified, auxiliary model, as well as on a strong form of identification between the parameters of interest and those of the auxiliary model. The quantile–matching estimation method (QM), Koenker (2005), exploits the same idea behind the method of moments without requiring any such condition. The QM approach estimates model parameters by matching the empirical percentiles with their theoretical counterparts, thereby requiring only the existence of a closed form expression for the quantile function.
This paper focuses on the method of simulated quantiles recently proposed by Dominicy and Veredas (2013) as a simulation–based extension of the QM of Koenker (2005). As any other simulation–based method, the MSQ estimates parameters by minimising a quadratic distance between a vector of quantile–based summary statistics calculated on the available sample of observations and the same vector calculated on synthetic data generated from the postulated theoretical model. Specifically, we extend the method of simulated quantiles to deal with multivariate data, originating the multivariate method of simulated quantiles (MMSQ). The extension of the MSQ to multivariate data is not trivial because it requires the definition of a multivariate quantile, which is not unique given the lack of a natural ordering in $\mathbb{R}^d$ for $d>1$. Indeed, only very recently has the literature on multivariate quantiles proliferated; see, e.g., Serfling (2002) for a review of some extensions of univariate quantiles to the multivariate case. Here we rely on the definition of projectional quantile of Hallin et al. (2010a) and Kong and Mizera (2012), that is a particular version of directional quantile. This latter definition is particularly appealing since it allows one to reduce the dimension of the problem by projecting data towards given directions in the plane. Moreover, the projectional quantiles incorporate information on the covariance between the projected variables, which is crucial in order to relax the assumption of independence between variables. An important methodological contribution of the paper concerns the choice of the relevant directions along which to project data in order to summarise the information on the parameters of interest. Although the inclusion of more directions can convey more information about the parameters, it comes at the cost of a larger number of expensive quantile evaluations.
Of course, the number of quantile functions is unavoidably related to the dimension of the observables and strictly depends upon the considered distribution. We provide a general solution for Elliptical distributions and for those Skew–Elliptical distributions that are closed under linear combinations. We also establish consistency and asymptotic normality of the proposed MMSQ estimator under weak conditions on the underlying true DGP. The conditions for consistency and asymptotic normality of the MMSQ are similar to those imposed by Dominicy and Veredas (2013), with minor changes due to the employed projectional quantiles. Moreover, for the distributions considered in our illustrative examples, full details on how to calculate all the quantities involved in the asymptotic variance–covariance matrix are provided. The asymptotic variance–covariance matrix of the MMSQ estimator is helpful to derive its efficient version, the E–MMSQ.
As any other simulation–based method, the MMSQ does not effectively deal with the curse of dimensionality problem, i.e., the situation where the number of parameters grows quadratically or exponentially with the dimension of the problem. Indeed, the right identification of the sparsity patterns becomes crucial because it reduces the number of parameters to be estimated. Those considerations motivate the use of sparse estimators that automatically shrink some parameters to zero, such as, for example, the off–diagonal elements of the variance–covariance matrix. Several works related to sparse estimation of the covariance matrix are available in the literature; most of them are related to graphical models, where the precision matrix, i.e., the inverse of the covariance matrix, represents the conditional dependence structure of the graph. Friedman et al. (2008) propose a fast algorithm based on a coordinate–wise updating scheme in order to estimate a sparse graph using the least absolute shrinkage and selection operator (LASSO) penalty of Tibshirani (1996). Meinshausen and Bühlmann (2006) propose a method for neighbourhood selection using the LASSO penalty as an alternative to covariance selection for Gaussian graphical models where the number of observations is less than the number of variables. Gao and Massam (2015) estimate the variance–covariance matrix of symmetry–constrained Gaussian models using three different penalty functions, i.e., the LASSO, the smoothly clipped absolute deviation (SCAD) of Fan and Li (2001) and the minimax concave penalty (MCP) of Zhang (2010). Bien and Tibshirani (2011) proposed a penalised version of the log–likelihood function, using the LASSO penalty, in order to estimate a sparse covariance matrix of a multivariate Gaussian distribution. These works show that sparse estimation has been proposed mainly either within the regression framework or in the context of Gaussian graphical models.
In both those cases, sparsity patterns are imposed by penalising a Gaussian log–likelihood.
In this paper we handle the lack of the model–likelihood, or the absence of valid moment conditions, together with the curse of dimensionality problem within a high–dimensional non–Gaussian framework. Specifically, our approach penalises the objective function of the MMSQ by adding a SCAD penalisation term that shrinks to zero the off–diagonal elements of the scale matrix of the postulated distribution. Moreover, we extend the asymptotic theory in order to account for the sparse estimation, and we prove that the resulting sparse–MMSQ estimator enjoys the oracle properties of Fan and Li (2001) under mild regularity conditions. Furthermore, since the chosen penalty is concave, we deliver a fast and efficient algorithm to solve the optimisation problem.
The proposed methods can be effectively used to make inference on the parameters of large–dimensional distributions such as, for example, the Stable, Elliptical Stable (Samorodnitsky and Taqqu 1994), Skew–Elliptical Stable (Branco and Dey 2001), Copula (Oh and Patton 2013), multivariate Gamma (Mathai and Moschopoulos 1992) and Tempered Stable (Koponen 1995). Among those, the Stable distribution allows for infinite variance, skewness and heavy tails that exhibit power decay, allowing extreme events to have higher probability mass than under a Gaussian model. To test the effectiveness of the MMSQ and sparse–MMSQ methods, several synthetic datasets have been simulated from the Elliptical Stable distribution previously considered by Lombardi and Veredas (2009). For a summary of the properties of the Stable distributions see Zolotarev (1964) and Samorodnitsky and Taqqu (1994), which provide a good theoretical background on heavy–tailed distributions. Univariate Stable laws have been studied in many branches of science and their theoretical properties have been deeply investigated from multiple perspectives; therefore many tools are now available for estimation and inference on parameters, for evaluating the cumulative distribution or the quantile function, and for fast simulation. The Stable distribution plays an interesting role in modelling multivariate data. Its heavy–tailed behaviour and its closure under summation make it appealing in the financial context. Nevertheless, multivariate Stable laws pose several challenges that go far beyond the lack of a closed form expression for the density. Although general expressions for the multivariate density have been provided by Abdul-Hamid and Nolan (1998), Byczkowski et al. (1993) and Matsui and Takemura (2009), their computation is still not feasible in dimensions larger than two. A recent overview of multivariate Stable distributions can be found in Nolan (2008).
As regards applications to real data, we consider the well–known problem of evaluating the systemic relevance of the financial institutions or banks belonging to a given market. After the Bear Stearns hedge funds collapse in July 2007, and the consequent global financial crisis which originated in the United States and then spread quickly to the rest of the world, the threat of a global collapse of the whole financial system has become the major concern of financial regulators. Systemic risk, as opposed to the risk associated with any one individual entity, aims at evaluating to what extent the bankruptcy of a bank or financial institution may degenerate into a collapse of the system as a consequence of a contagion effect. While individual risks are assessed using individual Value–at–Risks (VaR), one of the most employed systemic risk measures has become the Conditional VaR (CoVaR), introduced by Adrian and Brunnermeier (2011, 2016). Since then, the assessment of financial risk in a multi–institution framework, where some institutions are subject to systemic or non–systemic distress events, has been one of the hot topics which has received large attention from scholars in Mathematical Finance, Statistics and Management, see, e.g., Acharya et al. (2012), Billio et al. (2012), Bernardi et al. (2017c), Girardi and Ergün (2013), Caporin et al. (2013), Engle et al. (2014), Hautsch et al. (2014), Lucas et al. (2014), Bernardi and Catania (2015), Bernardi et al. (2015), Sordo et al. (2015), Bernardi et al. (2016b), Bernardi et al. (2016c), Bernardi et al. (2016a), Brownlees and Engle (2016) and Salvadori et al. (2016), just to quote a few of the most relevant approaches. For an extensive and up to date survey on systemic risk measures, see Bisias et al. (2012), while the recent literature on systemic risk is reviewed by Benoit et al. (2016).
The CoVaR measures the systemic impact on the whole financial system of a distress event affecting an institution by calculating the VaR of the system conditional on the distress event, as measured by the marginal VaR of that institution. As recognised by Bernardi et al. (2017c), this definition of CoVaR fails to consider the institution as a part of a system. Here, we introduce a new definition of CoVaR, the NetCoVaR, that overcomes this drawback by aggregating individual institutions, providing a measure of profit and loss of the whole financial market. Despite its appealing definition, the NetCoVaR, as any other risk measure, provides only point estimates of the amount of systemic risk. Within this context, statistical methods aim to assess whether two risk measures are statistically different from each other. As concerns the CoVaR, Castro and Ferrari (2014) recently proposed a nonparametric dominance test where pairwise CoVaRs are compared in order to statistically assess the systemic relevance of the different institutions. Here, we propose a parametric counterpart of the test of Castro and Ferrari (2014), assuming that the profits–and–losses of the different institutions are Elliptically Stable distributed. The asymptotic distribution of the dominance test is provided under the mild assumption of elliptically contoured distributions for the involved random variables. The dominance test is subsequently used to build a network that represents the interdependence relations among institutions. In this context the ESD distribution plays a relevant role, either because data are contaminated by the presence of outliers or because the methodology, such as in systemic risk assessment, strongly relies on heavy–tailed distributions.
The remainder of the paper is structured as follows. In Section 2 we introduce the multivariate method of simulated quantiles, and we establish its basic asymptotic properties. The asymptotic variance of the estimator is necessary to select the optimal weighting matrix for the squared distance in order to obtain the efficient MMSQ estimator. Section 3 deals with the curse of dimensionality problem, introduces the sparse–MMSQ estimator that induces sparsity in the scale matrix using the SCAD penalty, and shows that the sparse–MMSQ enjoys the oracle properties under mild regularity conditions. The penalised estimator cannot be used to make inference on the parameters shrunk to zero, therefore Section 3 ends by proposing a de–sparsified MMSQ estimator. The effectiveness of the method is tested in Section 4, where several synthetic datasets from the Elliptical Stable distribution are considered. Section 5 is devoted to the empirical application that aims to illustrate how the methodological contributions of the paper can be applied to systemic risk assessment. Section 6 concludes. Technical proofs of the theorems are deferred to Appendix A.

## 2 Multivariate method of simulated quantiles

In this Section we first recall the basic concepts of directional and projectional quantiles. Then, the multivariate method of simulated quantiles is introduced, and results about the consistency and asymptotic properties of the estimator are established.

### 2.1 Directional quantiles

The MMSQ requires the prior definition of the concept of multivariate quantile, a notion still vague until quite recently because of the lack of a natural ordering in dimension greater than one. Here, we rely on the definition of directional quantiles and projectional quantiles introduced by Hallin et al. (2010a), Paindaveine and Šiman (2011) and Kong and Mizera (2012). We first recall the definition of directional quantile given in Hallin et al. (2010a) and then we introduce the main assumptions that we will use to develop the MMSQ.

###### Definition 1.

Let $Y$ be a $d$–dimensional random vector in $\mathbb{R}^d$, $u$ be a vector in the unit sphere $\mathbb{S}^{d-1}$ and $\tau \in (0,1)$. The $\tau u$–quantile of $Y$ is defined as any element of the collection of hyperplanes

$$\pi_{\tau u} = \left\{ y : \mathbf{b}_{\tau u}' y - q_{\tau u} = 0 \right\},$$

such that

$$\left(q_{\tau u}, \mathbf{b}_{\tau u}'\right)' \in \left\{ \arg\min_{(q,\mathbf{b})} \Psi_{\tau u}(q,\mathbf{b}) \;\; \text{s.t.} \;\; \mathbf{b}'u = 1 \right\}, \qquad (1)$$

where

$$\Psi_{\tau u}(q,\mathbf{b}) = \mathbb{E}\left[\rho_\tau\left(\mathbf{b}'Y - q\right)\right], \qquad (2)$$

where $\rho_\tau(z) = z\left(\tau - \mathbb{1}(z<0)\right)$ denotes the quantile loss function evaluated at $z \in \mathbb{R}$, and $\mathbb{E}[\cdot]$ denotes the expectation operator.

The term directional is due to the fact that the multivariate quantile defined above is associated with a unit direction vector $u$.

###### Assumption 2.

The distribution of the random vector $Y$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^d$, with finite first order moment, having a density with connected support.

Under Assumption 2, for any $\tau \in (0,1)$ the minimisation problem defined in equation (1) admits a unique solution, which uniquely identifies one hyperplane $\pi_{\tau u}$.
A special case of directional quantile is obtained by setting $\mathbf{b} = u$; in that case the directional quantile becomes a scalar value and it inherits all the properties of the usual univariate quantile. This particular case of directional quantile is called projectional quantile, whose formal definition, reported below, is due to Kong and Mizera (2012) and Paindaveine and Šiman (2011).

###### Definition 3.

Let $Y$ be a $d$–dimensional random vector, $u$ be a vector in the unit sphere $\mathbb{S}^{d-1}$, and $\tau \in (0,1)$. The $\tau u$–projectional quantile of $Y$ is defined as

$$q_{\tau u} \in \left\{ \arg\min_{q \in \mathbb{R}} \Psi_{\tau u}(q) \right\}, \qquad (3)$$

where $\Psi_{\tau u}(q) = \Psi_{\tau u}(q, u)$, with $\Psi_{\tau u}$ as in equation (2).

Clearly the $\tau u$–projectional quantile is the $\tau$–quantile of the univariate random variable $u'Y$. This feature makes the definition of projectional quantile particularly appealing for extending the MSQ to a multivariate setting because, once the direction is properly chosen, it reduces to the usual univariate quantile. Given a sample of observations from $Y$, the empirical version of the projectional quantile is defined as

$$\hat{q}^{\,n}_{\tau u} \in \left\{ \arg\min_{q} \Psi^{n}_{\tau u}(q) \right\},$$

where $\Psi^{n}_{\tau u}(q)$ denotes the empirical version of the loss function defined in equation (2).
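Computationally, the empirical projectional quantile is simply a univariate quantile of the data projected along the chosen direction. The following minimal sketch illustrates this on a bivariate Gaussian sample (the distribution, direction and quantile level are illustrative assumptions, not taken from the paper); for Gaussian data the projection $u'Y$ is itself Gaussian, so the result can be checked in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative bivariate Gaussian sample with correlated components.
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
Y = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=100_000)

def projectional_quantile(Y, u, tau):
    """Empirical tau-u projectional quantile: the tau-quantile of u'Y."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)   # normalise the direction to the unit sphere
    z = Y @ u                   # project the data along u
    return np.quantile(z, tau)

u = np.array([1.0, 1.0])        # project along the bisector
q = projectional_quantile(Y, u, 0.95)

# For Gaussian data, u'Y ~ N(0, u' Sigma u) once u has unit norm,
# so the theoretical 95% projectional quantile is 1.6449 * sd.
sd = np.sqrt(u @ Sigma @ u / (u @ u))
```

The check below compares the empirical projectional quantile with its known Gaussian counterpart.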

### 2.2 The method of simulated quantiles

The MSQ introduced by Dominicy and Veredas (2013) is a likelihood–free, simulation–based inferential procedure based on matching quantile–based measures, which is particularly useful in situations where the density function is not analytically available and/or moments do not exist. Since it is essentially a simulation–based method, it can be applied to all those random variables that can be easily simulated. In the context of the MSQ, parameters are estimated by minimising the distance between an appropriately chosen vector of functions of empirical quantiles and their simulated counterparts based on the postulated parametric model. An appealing characteristic of the MSQ that makes it a valid alternative to other likelihood–free methods, such as the indirect inference of Gouriéroux et al. (1993), is that the MSQ does not rely on a necessarily misspecified auxiliary model. Furthermore, empirical quantiles are robust ordered statistics, able to achieve high protection against bias induced by the presence of outlier contamination.
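The matching idea behind the MSQ can be sketched in a few lines. The following univariate toy example (a Cauchy location–scale model, chosen here purely for illustration because its moments do not exist while its quantiles do) matches three empirical quantiles against quantiles computed on simulated data; the quantile levels, the identity weighting matrix, and the use of common random numbers across optimisation steps are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# "Observed" sample from a heavy-tailed location-scale model:
# standard Cauchy with location 1 and scale 2 (moments do not exist).
y = rng.standard_cauchy(5000) * 2.0 + 1.0

taus = np.array([0.25, 0.5, 0.75])
phi_hat = np.quantile(y, taus)          # quantile statistics on the data

# Common random numbers: fix the R simulation draws once, so that the
# objective is a smooth function of the parameters across iterations.
R, n = 10, 5000
z = rng.standard_cauchy((R, n))

def objective(theta):
    loc, log_scale = theta
    sims = loc + np.exp(log_scale) * z              # R simulated samples
    phi_tilde = np.quantile(sims, taus, axis=1).mean(axis=1)
    d = phi_hat - phi_tilde
    return d @ d                                    # identity weighting

res = minimize(objective, x0=[0.0, 0.0], method="Nelder-Mead")
loc_hat, scale_hat = res.x[0], np.exp(res.x[1])
```

Parameterising the scale on the log scale keeps the optimisation unconstrained; a full MSQ implementation would also replace the identity weighting with the optimal matrix discussed in Section 2.3.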
Here we introduce the MMSQ using the notion of projectional quantiles defined in Section 2.1. Let $Y$ be a $d$–dimensional random variable with distribution function $F_Y(\cdot; \vartheta)$, which depends on a vector of unknown parameters $\vartheta$, and let $(y_1, y_2, \ldots, y_n)$ be a vector of independent realisations of $Y$. Moreover, let $\mathbf{q}$ be a matrix of projectional quantiles at given confidence levels $\tau$ and directions $u$, and let $\Phi_\vartheta$ be a vector of functions of those quantiles, assumed to be continuously differentiable with respect to $\vartheta$ and measurable for all $\tau$ and $u$. Let us assume also that $\Phi_\vartheta$ cannot be computed analytically but can be estimated empirically on simulated data; denote those quantities by $\tilde{\Phi}^R_{u,\vartheta}$. Analogously, let $\hat{\mathbf{q}}$ be the matrix of sample projectional quantiles and let $\hat{\Phi}_u$ be the corresponding vector of functions of sample quantiles, which is measurable with respect to the observations.
At each iteration, the MMSQ estimates $\tilde{\Phi}^R_{u,\vartheta}$ on a sample of $R$ replications simulated from $F_Y(\cdot; \vartheta)$, averaging the function computed on each simulation path. The parameters are subsequently updated by minimising the distance between the vector of quantile measures calculated on the true observations and that calculated on the simulated realisations, as follows

$$\hat{\vartheta} = \arg\min_{\vartheta \in \Theta} \left( \hat{\Phi}_u - \tilde{\Phi}^R_{u,\vartheta} \right)' W_\vartheta \left( \hat{\Phi}_u - \tilde{\Phi}^R_{u,\vartheta} \right), \qquad (4)$$

where $W_\vartheta$ is a symmetric positive definite weighting matrix. The method of simulated quantiles of Dominicy and Veredas (2013) corresponds to selecting the first canonical direction as the relevant direction in the projectional quantile.
The vector of functions of projectional quantiles should be carefully selected in order to be as informative as possible for the vector of parameters of interest. In their applications, Dominicy and Veredas (2013) propose to use the MSQ only to estimate the parameters of a univariate Stable law. Toward this end they consider the following vector of quantile–based statistics, as in McCulloch (1986) and Kim and White (2004):

$$\Phi_\vartheta = \left( \frac{q_{0.95} + q_{0.05} - 2\,q_{0.5}}{q_{0.95} - q_{0.05}},\; \frac{q_{0.95} - q_{0.05}}{q_{0.75} - q_{0.25}},\; q_{0.75} - q_{0.25},\; q_{0.5} \right)',$$

where the first element of the vector is a measure of skewness, the second one is a measure of kurtosis, and the last two measures refer to scale and location, respectively. Of course, the selection of the quantile–based summary statistics depends both on the nature of the parameters and on the assumed distribution. The MMSQ also generalises the MSQ proposed by Dominicy et al. (2013), where the elements of the variance–covariance matrix of multivariate elliptical distributions are estimated by means of a measure of co–dispersion, which consists of the interquartile range of the standardised variables projected along the bisector. The MMSQ based on projectional quantiles is more flexible and allows us to deal with distributions more general than elliptically contoured ones, because it relies on quantile–based measures of the variables projected along optimal directions which depend upon the considered distribution. The selection of the relevant directions is deferred to Section 4.
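The four statistics above are cheap to compute from a sample. A minimal sketch (the standard normal test sample and the sample size are illustrative assumptions) is:

```python
import numpy as np

def stable_summary_stats(x):
    """McCulloch / Kim-White quantile-based statistics used in the MSQ:
    skewness, kurtosis, scale and location measures, in that order."""
    q05, q25, q50, q75, q95 = np.quantile(x, [0.05, 0.25, 0.5, 0.75, 0.95])
    skew = (q95 + q05 - 2.0 * q50) / (q95 - q05)   # quantile skewness
    kurt = (q95 - q05) / (q75 - q25)               # quantile kurtosis
    scale = q75 - q25                              # interquartile range
    loc = q50                                      # median
    return np.array([skew, kurt, scale, loc])

rng = np.random.default_rng(2)
phi = stable_summary_stats(rng.standard_normal(200_000))
```

For a standard normal sample the four statistics should be close to $0$, $(2 \times 1.6449)/(2 \times 0.6745) \approx 2.44$, $1.349$ and $0$, which gives a quick sanity check of the implementation.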

### 2.3 Asymptotic theory

In this section we establish consistency and asymptotic normality of the proposed MMSQ estimator. The next theorem establishes the asymptotic properties of projectional quantiles.

###### Theorem 4.

Let $Y$ be a $d$–dimensional random vector with cumulative distribution function $F_Y$ and variance–covariance matrix $\Sigma$. Let $(y_1, y_2, \ldots, y_n)$ be a sample of iid observations from $F_Y$. Let $u_k \in \mathbb{S}^{d-1}$ and let $Z_k = u_k' Y$ be the random variable projected along $u_k$, with cumulative distribution function $F_{Z_k}$, for $k = 1, 2, \ldots, K$. Let $q_{\tau_j, u_k}$, for $j = 1, 2, \ldots, J$, be the vector of directional quantiles along the direction $u_k$, and suppose $F_{Z_k}(q_{\tau_j, u_k}) = \tau_j$. Let us assume that $F_{Z_k}$ is differentiable in $q_{\tau_j, u_k}$ with $f_{Z_k}(q_{\tau_j, u_k}) > 0$, for $j = 1, 2, \ldots, J$ and $k = 1, 2, \ldots, K$. Then

1. for a given direction $u_k$, with $k = 1, 2, \ldots, K$, it holds

$$\sqrt{n}\left(\hat{q}_{\tau_k, u_k} - q_{\tau_k, u_k}\right) \xrightarrow{d} \mathcal{N}\left(0, \boldsymbol{\eta}\right),$$

as $n \to \infty$, where $\boldsymbol{\eta}$ denotes a symmetric matrix whose generic entry is

$$\eta_{r,c} = \frac{\tau_r \wedge \tau_c - \tau_r \tau_c}{f_{Z_k}\left(q_{\tau_r, u_k}\right) f_{Z_k}\left(q_{\tau_c, u_k}\right)},$$

for $r, c = 1, 2, \ldots, J$;

2. for a given level $\tau_j$, with $j = 1, 2, \ldots, J$, it holds

$$\sqrt{n}\left(\hat{q}_{\tau_j} - q_{\tau_j}\right) \xrightarrow{d} \mathcal{N}\left(0, \boldsymbol{\eta}\right),$$

as $n \to \infty$, where

$$\eta_{r,c} = \begin{cases} \dfrac{-\tau_j^2 + F_{Z_r,Z_c}\left(q_{\tau_j,r,c},\, \Sigma_{Z_r,Z_c}\right)}{f_{Z_r}\left(q_{\tau_j u_r}\right) f_{Z_c}\left(q_{\tau_j u_c}\right)}, & \text{for } r \neq c \\[2ex] \dfrac{\tau_j \left(1 - \tau_j\right)}{f_{Z_r}\left(q_{\tau_j u_r}\right)^2}, & \text{for } r = c, \end{cases}$$

and $\Sigma_{Z_r,Z_c}$ denotes the variance–covariance matrix of the random variables $Z_r$ and $Z_c$, for $r, c = 1, 2, \ldots, K$;

3. given levels $\tau_j$ and $\tau_l$, with $j, l = 1, 2, \ldots, J$ and $j \neq l$, and given directions $u_s$ and $u_t$, with $s, t = 1, 2, \ldots, K$ and $s \neq t$, it holds

as $n \to \infty$, where

$$\eta_{r,c} = \frac{-\tau_j \tau_l + F_{Z_s,Z_t}\left(\left(q_{\tau_j u_s}, q_{\tau_l u_t}\right), \Sigma_{Z_s,Z_t}\right)}{f_{Z_s}\left(q_{\tau_j}\right) f_{Z_t}\left(q_{\tau_l}\right)}, \qquad \text{for } r \neq c.$$
###### Proof.

See Appendix A. ∎

###### Remark 5.

The expression $\tau_r \wedge \tau_c$ stands for the minimum of $\tau_r$ and $\tau_c$. As regards the calculation of the sparsity function, we refer to Koenker (2005) and Dominicy and Veredas (2013).
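The diagonal entry of $\boldsymbol{\eta}$ in part 2 of Theorem 4 is the classical asymptotic variance $\tau(1-\tau)/f(q_\tau)^2$ of a sample quantile, and it can be checked numerically by Monte Carlo. The sketch below (standard normal projection, $\tau = 0.75$; both are illustrative assumptions) compares the simulated variance of $\sqrt{n}(\hat{q}_\tau - q_\tau)$ with the theoretical value:

```python
import math
import numpy as np

rng = np.random.default_rng(3)
tau, n, reps = 0.75, 2_000, 2_000

# Theoretical ingredients for a standard normal projected variable.
q_true = 0.6744897501960817                 # Phi^{-1}(0.75)
f_q = math.exp(-0.5 * q_true**2) / math.sqrt(2.0 * math.pi)
eta = tau * (1.0 - tau) / f_q**2            # tau(1-tau) / f(q)^2

# Monte Carlo variance of sqrt(n) * (q_hat - q_true):
# draw `reps` samples of size n and compute the tau-quantile of each.
qhats = np.quantile(rng.standard_normal((reps, n)), tau, axis=1)
mc_var = n * np.var(qhats)
```

With these settings the Monte Carlo variance should fall within a few percent of the asymptotic value, which is a useful unit test when implementing the covariance matrix of Theorem 4.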

To establish the asymptotic properties of the MMSQ estimates we need the following set of assumptions.

###### Assumption 6.

There exists a unique, unknown true value $\vartheta_0$ such that the sample functions of projectional quantiles equal the theoretical ones, provided that each quantile–based summary statistic is computed along a direction that is informative for the parameter of interest. That is, $\hat{\Phi} = \Phi_{\vartheta_0}$.

###### Assumption 7.

$\vartheta_0$ is the unique minimiser of the MMSQ objective function in equation (4).

###### Assumption 8.

Let $\hat{\Omega}_n$ be the sample variance–covariance matrix of $\hat{\Phi}$ and let $\Omega_\vartheta$ be the non–singular variance–covariance matrix of $\Phi_\vartheta$; then $\hat{\Omega}_n$ converges to $\Omega_\vartheta$ as $n$ goes to infinity.

###### Assumption 9.

The matrix $\partial \Phi_\vartheta / \partial \vartheta'$ is non–singular.

Under these assumptions we show the asymptotic properties of functions of quantiles.

###### Theorem 10.

Under the hypothesis of Theorem 4 and Assumptions 6–8, we have

$$\sqrt{n}\left(\hat{\Phi} - \Phi_\vartheta\right) \xrightarrow{d} \mathcal{N}\left(0, \Omega_\vartheta\right), \qquad \sqrt{n}\left(\tilde{\Phi} - \Phi_\vartheta\right) \xrightarrow{d} \mathcal{N}\left(0, \Omega_\vartheta\right),$$

as $n \to \infty$, where $\Omega_\vartheta$ depends on the variance–covariance matrix $\boldsymbol{\eta}$ of the projectional quantiles defined in Theorem 4.

###### Proof.

See Appendix A. ∎

The next theorem shows the asymptotic properties of the MMSQ estimator.

###### Theorem 11.

Under the hypothesis of Theorem 4 and Assumptions 6–9, we have

$$\sqrt{n}\left(\hat{\vartheta} - \vartheta\right) \xrightarrow{d} \mathcal{N}\left(0, \left(1 + \tfrac{1}{R}\right) D_\vartheta W_\vartheta \Omega_\vartheta W_\vartheta' D_\vartheta'\right),$$

as $n \to \infty$, where $D_\vartheta = \left(\frac{\partial \Phi_\vartheta}{\partial \vartheta'} W_\vartheta \frac{\partial \Phi_\vartheta}{\partial \vartheta}\right)^{-1} \frac{\partial \Phi_\vartheta}{\partial \vartheta'}$.

###### Proof.

See Appendix A. ∎

The next corollary provides the optimal weighting matrix $W^*_\vartheta$.

###### Corollary 12.

Under the hypothesis of Theorem 4 and Assumptions 6–9, the optimal weighting matrix is

$$W^*_\vartheta = \Omega^{-1}_\vartheta.$$

Therefore, the efficient method of simulated quantiles estimator (E–MMSQ) has the following asymptotic distribution

$$\sqrt{n}\left(\hat{\vartheta} - \vartheta\right) \xrightarrow{d} \mathcal{N}\left(0, \left(1 + \tfrac{1}{R}\right)\left(\frac{\partial \Phi_\vartheta}{\partial \vartheta'}\, \Omega^{-1}_\vartheta\, \frac{\partial \Phi_\vartheta}{\partial \vartheta}\right)^{-1}\right),$$

as $n \to \infty$.

## 3 Handling sparsity

In this section the MMSQ estimator is extended in order to achieve sparse estimation of the scaling matrix. Specifically, the smoothly clipped absolute deviation (SCAD) penalty of Fan and Li (2001) is introduced into the MMSQ objective function. Formally, let $Y$ be a random vector and let $\Sigma$ be its variance–covariance matrix; we are interested in providing a sparse estimate of $\Sigma$. To achieve this target we adopt a modified version of the MMSQ objective function, obtained by adding the SCAD penalty on the off–diagonal elements of the covariance matrix, in line with Bien and Tibshirani (2011). The SCAD function is a non–convex penalty function with the following form

$$p_\lambda(|\gamma|) = \begin{cases} \lambda |\gamma|, & \text{if } |\gamma| \leq \lambda \\[1ex] \dfrac{2 a \lambda |\gamma| - \gamma^2 - \lambda^2}{2(a-1)}, & \text{if } \lambda < |\gamma| \leq a\lambda \\[1ex] \dfrac{\lambda^2 (a+1)}{2}, & \text{if } |\gamma| > a\lambda, \end{cases} \qquad (6)$$

which corresponds to a quadratic spline function with knots at $\lambda$ and $a\lambda$. The SCAD penalty is continuously differentiable away from zero but singular at $0$, with derivative equal to zero outside the range $[-a\lambda, a\lambda]$. This results in small coefficients being set to zero and a few other coefficients being shrunk towards zero, while large coefficients are retained as they are. The penalised MMSQ estimator minimises the penalised MMSQ objective function, defined as follows
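The three branches of equation (6) can be implemented directly; a minimal sketch (vectorised with NumPy, using the default $a = 3.7$ discussed in Section 3.2) is:

```python
import numpy as np

def scad(gamma, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), evaluated elementwise."""
    g = np.abs(np.asarray(gamma, dtype=float))
    small = lam * g                                          # |g| <= lam
    mid = (2.0 * a * lam * g - g**2 - lam**2) / (2.0 * (a - 1.0))
    large = np.full_like(g, lam**2 * (a + 1.0) / 2.0)        # flat tail
    return np.where(g <= lam, small, np.where(g <= a * lam, mid, large))
```

A quick check of the spline structure: at $|\gamma| = \lambda$ both the linear and quadratic branches equal $\lambda^2$, at $|\gamma| = a\lambda$ the quadratic branch reaches the constant $\lambda^2(a+1)/2$, and beyond that the penalty is flat, so large coefficients are not shrunk.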

$$\hat{\vartheta} = \arg\min_{\vartheta} Q^{\star}(\vartheta), \qquad (7)$$

where $Q^{\star}(\vartheta)$ is the SCAD–penalised distance between $\hat{\Phi}$ and $\tilde{\Phi}^R_\vartheta$, the quantile–based summary statistics defined in Section 2.2. As shown in Fan and Li (2001), the SCAD estimator, with an appropriate choice of the regularisation (tuning) parameter, possesses a sparsity property, i.e., it estimates zero components of the true parameter vector exactly as zero with probability approaching one as the sample size increases, while still being consistent for the non–zero components. An immediate consequence of the sparsity property of the SCAD estimator is that the asymptotic distribution of the estimator remains the same whether or not the correct zero restrictions are imposed in the course of the SCAD estimation procedure. Fan and Li (2001) call these the oracle properties.

Let $\vartheta_0 = \left(\vartheta_{10}', \vartheta_{20}'\right)'$ be the true value of the unknown parameter $\vartheta$, where $\vartheta_{10}$ is the subvector of non–zero parameters and $\vartheta_{20} = 0$. The following definition of oracle estimator is given in Zou (2006).

###### Definition 13.

An oracle estimator has the following properties:

• consistent variable selection: $\lim_{n \to \infty} P\left(\hat{\mathcal{A}}_n = \mathcal{A}\right) = 1$, where $\hat{\mathcal{A}}_n = \{j : \hat{\vartheta}_j \neq 0\}$ and $\mathcal{A} = \{j : \vartheta_{0,j} \neq 0\}$;

• asymptotic normality: $\sqrt{n}\left(\hat{\vartheta}_1 - \vartheta_{10}\right) \xrightarrow{d} \mathcal{N}\left(0, \Sigma_1\right)$, as $n \to \infty$, where $\Sigma_1$ is the variance–covariance matrix of the estimator of the true subset model.

Following Fan and Li (2001), in the remainder of this section we establish the oracle properties of the SCAD–penalised MMSQ estimator. We first prove the sparsity property.

###### Theorem 14.

Given the SCAD penalty function $p_\lambda(\cdot)$, for a sequence $\lambda_n$ such that $\lambda_n \to 0$ and $\sqrt{n}\,\lambda_n \to \infty$ as $n \to \infty$, there exists a local minimiser $\hat{\vartheta}$ of $Q^{\star}(\vartheta)$ in (7) such that $\lVert \hat{\vartheta} - \vartheta_0 \rVert = O_p(n^{-1/2})$. Furthermore, we have

$$\lim_{n \to \infty} P\left(\hat{\vartheta}_0 = 0\right) = 1. \qquad (8)$$
###### Proof.

See Appendix A. ∎

The following theorem establishes the asymptotic normality of the SCAD–penalised MMSQ estimator; we denote by $\vartheta_1$ the subvector of $\vartheta$ that does not contain the zero off–diagonal elements of the variance–covariance matrix, and by $\hat{\vartheta}_1$ the corresponding penalised MMSQ estimator.

###### Theorem 15.

Given the SCAD penalty function $p_\lambda(\cdot)$, for a sequence $\lambda_n \to 0$ with $\sqrt{n}\,\lambda_n \to \infty$ as $n \to \infty$, $\hat{\vartheta}_1$ has the following asymptotic distribution:

$$\sqrt{n}\left(\hat{\vartheta}_1 - \vartheta_{10}\right) \xrightarrow{d} \mathcal{N}\left(0, \left(1 + \tfrac{1}{R}\right)\left(\frac{\partial \Phi_\vartheta}{\partial \vartheta_1'}\, \Omega^{-1}_{\vartheta_{10}}\, \frac{\partial \Phi_\vartheta}{\partial \vartheta_1}\right)^{-1}\right), \qquad (9)$$

as $n \to \infty$.

###### Proof.

See Appendix A. ∎

### 3.1 Algorithm

The objective function of the sparse estimator is the sum of a convex function and a non–convex function, which complicates the minimisation procedure. Here, we adapt the algorithms proposed by Fan and Li (2001) and Hunter and Li (2005) to our objective function in order to obtain a fast procedure for the minimisation problem.
The first derivative of the penalty function can be approximated as follows

$$\left[p_\lambda\left(|\sigma_{ij}|\right)\right]' = p_\lambda'\left(|\sigma_{ij}|\right) \mathrm{sgn}\left(\sigma_{ij}\right) \approx \frac{p_\lambda'\left(|\sigma_{ij,0}|\right)}{|\sigma_{ij,0}|}\, \sigma_{ij}, \qquad (10)$$

when $\sigma_{ij} \approx \sigma_{ij,0}$. We use it in the first order Taylor expansion of the penalty to get

$$p_\lambda\left(|\sigma_{ij}|\right) \approx p_\lambda\left(|\sigma_{ij,0}|\right) + \frac{1}{2}\, \frac{p_\lambda'\left(|\sigma_{ij,0}|\right)}{|\sigma_{ij,0}|} \left(\sigma_{ij}^2 - \sigma_{ij,0}^2\right), \qquad (11)$$

for $\sigma_{ij} \approx \sigma_{ij,0}$. The objective function in equation (7) can be locally approximated, except for a constant term, by

$$\begin{aligned} Q^{\star}(\vartheta) \approx{}& \left(\hat{\Phi} - \tilde{\Phi}^R_{\vartheta_0}\right)' \bar{W}_{\vartheta} \left(\hat{\Phi} - \tilde{\Phi}^R_{\vartheta_0}\right) - \frac{\partial \tilde{\Phi}^R_{\vartheta_0}}{\partial \vartheta} \bar{W}_{\vartheta} \left(\hat{\Phi} - \tilde{\Phi}^R_{\vartheta_0}\right) \left(\vartheta - \vartheta_0\right) \\ &+ \frac{1}{2}\left(\vartheta - \vartheta_0\right)' \frac{\partial \tilde{\Phi}^R_{\vartheta_0}}{\partial \vartheta} \bar{W}_{\vartheta} \frac{\partial \tilde{\Phi}^R_{\vartheta_0}}{\partial \vartheta} \left(\vartheta - \vartheta_0\right) + \frac{n}{2}\, \vartheta' \Sigma_\lambda\left(\vartheta_0\right) \vartheta, \end{aligned} \qquad (12)$$

where $\Sigma_\lambda(\vartheta_0)$ is the diagonal matrix with entries $p_\lambda'(|\sigma_{ij,0}|)/|\sigma_{ij,0}|$ in the positions of the penalised off–diagonal elements and zero elsewhere, for which the first order condition becomes

$$\begin{aligned} \frac{\partial Q^{\star}(\vartheta)}{\partial \vartheta} \approx{}& -\frac{\partial \tilde{\Phi}^R_{\vartheta_0}}{\partial \vartheta} \bar{W}_{\vartheta} \left(\hat{\Phi} - \tilde{\Phi}^R_{\vartheta_0}\right) + \frac{\partial \tilde{\Phi}^{R\,\prime}_{\vartheta_0}}{\partial \vartheta} \bar{W}_{\vartheta} \frac{\partial \tilde{\Phi}^R_{\vartheta_0}}{\partial \vartheta} \left(\vartheta - \vartheta_0\right) + n \Sigma_\lambda\left(\vartheta_0\right) \vartheta \\ ={}& -\frac{\partial \tilde{\Phi}^R_{\vartheta_0}}{\partial \vartheta} \bar{W}_{\vartheta} \left(\hat{\Phi} - \tilde{\Phi}^R_{\vartheta_0}\right) + \frac{\partial \tilde{\Phi}^{R\,\prime}_{\vartheta_0}}{\partial \vartheta} \bar{W}_{\vartheta} \frac{\partial \tilde{\Phi}^R_{\vartheta_0}}{\partial \vartheta} \left(\vartheta - \vartheta_0\right) \\ &+ n \Sigma_\lambda\left(\vartheta_0\right)\left(\vartheta - \vartheta_0\right) + n \Sigma_\lambda\left(\vartheta_0\right) \vartheta_0 = 0, \end{aligned} \qquad (13)$$

and therefore

$$\vartheta = \vartheta_0 - \left[\frac{\partial \tilde{\Phi}^{R\,\prime}_{\vartheta_0}}{\partial \vartheta} \bar{W}_{\vartheta} \frac{\partial \tilde{\Phi}^R_{\vartheta_0}}{\partial \vartheta} + n \Sigma_\lambda\left(\vartheta_0\right)\right]^{-1} \left[-\frac{\partial \tilde{\Phi}^R_{\vartheta_0}}{\partial \vartheta} \bar{W}_{\vartheta} \left(\hat{\Phi} - \tilde{\Phi}^R_{\vartheta_0}\right) + n \Sigma_\lambda\left(\vartheta_0\right) \vartheta_0\right]. \qquad (14)$$

The optimal solution can be found iteratively, as follows

$$\vartheta^{(k+1)} = \vartheta^{(k)} - \left[\frac{\partial \tilde{\Phi}^{R\,\prime}_{\vartheta^{(k)}}}{\partial \vartheta} \bar{W}_{\vartheta} \frac{\partial \tilde{\Phi}^R_{\vartheta^{(k)}}}{\partial \vartheta} + n \Sigma_\lambda\left(\vartheta^{(k)}\right)\right]^{-1} \left[-\frac{\partial \tilde{\Phi}^R_{\vartheta^{(k)}}}{\partial \vartheta} \bar{W}_{\vartheta} \left(\hat{\Phi} - \tilde{\Phi}^R_{\vartheta^{(k)}}\right) + n \Sigma_\lambda\left(\vartheta^{(k)}\right) \vartheta^{(k)}\right], \qquad (15)$$

and if $\sigma_{ij}^{(k+1)}$ is very close to zero, then $\hat{\sigma}_{ij}$ is set equal to zero. When the algorithm converges, the estimator satisfies the following equation:

$$-\left(\frac{\partial\tilde{\boldsymbol{\Phi}}_{R}^{\boldsymbol{\vartheta}_0}}{\partial\boldsymbol{\vartheta}}\right)^{\prime}\bar{\mathbf{W}}_{\boldsymbol{\vartheta}}\left(\hat{\boldsymbol{\Phi}}-\tilde{\boldsymbol{\Phi}}_{R}^{\boldsymbol{\vartheta}_0}\right)+n\boldsymbol{\Sigma}_{\lambda}\left(\boldsymbol{\vartheta}_0\right)\boldsymbol{\vartheta}_0=0,\qquad(16)$$

that is the first order condition of the minimisation problem of the SCAD MMSQ.
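To make the recursion (15) concrete, the following sketch runs it on a deliberately simplified toy problem in which the simulated binding function $\tilde{\boldsymbol{\Phi}}_{R}^{\boldsymbol{\vartheta}}$ is replaced by a linear map $\mathbf{A}\boldsymbol{\vartheta}$ with constant Jacobian $\mathbf{A}$, and every parameter is treated as penalised. All dimensions, values, and names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, a, lam, eps = 200, 3.7, 0.1, 1e-6   # sample size, SCAD constants, small perturbation

def scad_deriv(t):
    """SCAD first derivative p'_lambda(t), t >= 0 (Fan and Li, 2001)."""
    return lam * np.where(t <= lam, 1.0, np.maximum(a * lam - t, 0.0) / ((a - 1) * lam))

# Toy binding function: Phi_tilde(theta) = A @ theta, so the Jacobian is the
# constant matrix A and Phi_hat is generated at a sparse "true" parameter.
A = rng.normal(size=(5, 3))
theta_true = np.array([1.5, 0.0, 0.8])
phi_hat = A @ theta_true + 0.01 * rng.normal(size=5)
W = np.eye(5)

theta = np.ones(3)
for _ in range(100):
    # diagonal LQA matrix Sigma_lambda, perturbed to stay defined near zero
    Sigma = np.diag(scad_deriv(np.abs(theta)) / (eps + np.abs(theta)))
    H = A.T @ W @ A + n * Sigma
    g = -A.T @ W @ (phi_hat - A @ theta) + n * Sigma @ theta
    theta = theta - np.linalg.solve(H, g)   # the update (15)
theta[np.abs(theta) < 1e-3] = 0.0           # threshold tiny coefficients to zero
```

On this toy problem the truly zero coefficient is driven to zero by the penalty, while the large coefficients are left essentially unbiased because the SCAD derivative vanishes beyond $a\lambda$.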
The algorithm used above, introduced by Fan and Li (2001), is called local quadratic approximation (LQA). Hunter and Li (2005) showed that the LQA applied to penalised maximum likelihood is an MM (majorise–minimise) algorithm. Indeed, we define

$$\Psi_{\vert\sigma_{ij,0}\vert}\left(\vert\sigma_{ij}\vert\right)=p_{\lambda}\left(\vert\sigma_{ij,0}\vert\right)+\frac{1}{2}\frac{p_{\lambda}^{\prime}\left(\vert\sigma_{ij,0}\vert\right)}{\vert\sigma_{ij,0}\vert}\left(\sigma_{ij}^{2}-\sigma_{ij,0}^{2}\right);\qquad(17)$$

since the SCAD penalty is concave, it holds that

$$\Psi_{\vert\sigma_{ij,0}\vert}\left(\vert\sigma_{ij}\vert\right)\geq p_{\lambda}\left(\vert\sigma_{ij}\vert\right),\qquad\forall\,\vert\sigma_{ij}\vert,\qquad(18)$$

and equality holds when $\vert\sigma_{ij}\vert=\vert\sigma_{ij,0}\vert$. Then $\Psi_{\vert\sigma_{ij,0}\vert}$ majorises $p_{\lambda}$, and it holds that

$$\Psi_{\vert\sigma_{ij,0}\vert}\left(\vert\sigma_{ij}\vert\right)<\Psi_{\vert\sigma_{ij,0}\vert}\left(\vert\sigma_{ij,0}\vert\right)\;\Rightarrow\;p_{\lambda}\left(\vert\sigma_{ij}\vert\right)<p_{\lambda}\left(\vert\sigma_{ij,0}\vert\right),\qquad(19)$$

which is called the descent property. This feature allows us to construct an MM algorithm: at each iteration we build the majoriser $\Psi_{\vert\sigma_{ij}^{(k)}\vert}$ and then minimise it to get $\sigma_{ij}^{(k+1)}$, which satisfies $p_{\lambda}\left(\vert\sigma_{ij}^{(k+1)}\vert\right)\leq p_{\lambda}\left(\vert\sigma_{ij}^{(k)}\vert\right)$. Let us consider the following:

$$S_{k}\left(\boldsymbol{\vartheta}\right)=\left(\hat{\boldsymbol{\Phi}}-\tilde{\boldsymbol{\Phi}}_{R}^{\boldsymbol{\vartheta}}\right)^{\prime}\bar{\mathbf{W}}_{\boldsymbol{\vartheta}}\left(\hat{\boldsymbol{\Phi}}-\tilde{\boldsymbol{\Phi}}_{R}^{\boldsymbol{\vartheta}}\right)+n\sum_{i>j}\Psi_{\vert\sigma_{ij}^{(k)}\vert}\left(\vert\sigma_{ij}\vert\right),\qquad(20)$$

then $S_{k}\left(\boldsymbol{\vartheta}\right)$ majorises $Q^{\star}\left(\boldsymbol{\vartheta}\right)$; thus we only need to minimise $S_{k}\left(\boldsymbol{\vartheta}\right)$, which can be done as explained above. Hunter and Li (2005) proposed an improved version of the LQA for penalised maximum likelihood, aimed at avoiding zeroing out the parameters too early in the iterative procedure. Their method applied to the SCAD MMSQ reads as follows:

$$\begin{aligned}p_{\lambda,\epsilon}\left(\vert\sigma_{ij}\vert\right)&=p_{\lambda}\left(\vert\sigma_{ij}\vert\right)-\epsilon\int_{0}^{\vert\sigma_{ij}\vert}\frac{p_{\lambda}^{\prime}\left(t\right)}{\epsilon+t}\,\mathrm{d}t\\ Q_{\epsilon}^{\star}\left(\boldsymbol{\vartheta}\right)&=\left(\hat{\boldsymbol{\Phi}}-\tilde{\boldsymbol{\Phi}}_{R}^{\boldsymbol{\vartheta}}\right)^{\prime}\bar{\mathbf{W}}_{\boldsymbol{\vartheta}}\left(\hat{\boldsymbol{\Phi}}-\tilde{\boldsymbol{\Phi}}_{R}^{\boldsymbol{\vartheta}}\right)+n\sum_{i>j}p_{\lambda,\epsilon}\left(\vert\sigma_{ij}\vert\right)\\ \Psi_{\vert\sigma_{ij,0}\vert,\epsilon}\left(\vert\sigma_{ij}\vert\right)&=p_{\lambda,\epsilon}\left(\vert\sigma_{ij,0}\vert\right)+\frac{p_{\lambda}^{\prime}\left(\vert\sigma_{ij,0}\vert\right)}{2\left(\epsilon+\vert\sigma_{ij,0}\vert\right)}\left(\sigma_{ij}^{2}-\sigma_{ij,0}^{2}\right)\\ S_{k,\epsilon}\left(\boldsymbol{\vartheta}\right)&=\left(\hat{\boldsymbol{\Phi}}-\tilde{\boldsymbol{\Phi}}_{R}^{\boldsymbol{\vartheta}}\right)^{\prime}\bar{\mathbf{W}}_{\boldsymbol{\vartheta}}\left(\hat{\boldsymbol{\Phi}}-\tilde{\boldsymbol{\Phi}}_{R}^{\boldsymbol{\vartheta}}\right)+n\sum_{i>j}\Psi_{\vert\sigma_{ij}^{(k)}\vert,\epsilon}\left(\vert\sigma_{ij}\vert\right),\end{aligned}$$

where $\epsilon>0$ is a small perturbation and $\sigma_{ij,0}$ denotes a consistent estimate of $\sigma_{ij}$. They proved that, as $\epsilon\downarrow0$, the perturbed objective function $Q_{\epsilon}^{\star}$ converges uniformly to the unperturbed one, and that if $\boldsymbol{\vartheta}_{\epsilon}$ is a minimiser of $Q_{\epsilon}^{\star}$ then any limit point of the sequence $\left\{\boldsymbol{\vartheta}_{\epsilon}\right\}$ is a minimiser of $Q^{\star}$. This construction allows $\Psi_{\vert\sigma_{ij,0}\vert,\epsilon}$ to be defined even when $\sigma_{ij,0}=0$. The authors also provided a way to choose the value of the perturbation and suggested the following:

$$\epsilon=\frac{\tau}{2n\,p_{\lambda}^{\prime}\left(0\right)}\min\left\{\vert\sigma_{ij}^{(0)}\vert:\sigma_{ij}^{(0)}\neq0\right\},\qquad(21)$$

where $\tau$ is a small positive tuning constant.
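As a numerical sanity check of the machinery above (an illustrative sketch with our own function names, not part of the original procedure), the snippet below implements the SCAD penalty of Fan and Li (2001), verifies that the quadratic majoriser coincides with the penalty at $\vert\sigma_{ij}\vert=\vert\sigma_{ij,0}\vert$ and dominates it elsewhere, as in (18), and evaluates the perturbation (21) for an example starting value, assuming the SCAD property $p_{\lambda}^{\prime}\left(0^{+}\right)=\lambda$ and a purely illustrative $\tau$:

```python
import numpy as np

lam, a = 0.5, 3.7   # example values; a = 3.7 is the choice suggested by Fan and Li (2001)

def scad(t):
    """SCAD penalty p_lambda(|t|) of Fan and Li (2001), written piecewise."""
    t = np.abs(t)
    return np.where(t <= lam, lam * t,
           np.where(t <= a * lam,
                    (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),
                    lam ** 2 * (a + 1) / 2))

def scad_deriv(t):
    """First derivative p'_lambda(t) for t >= 0; note p'_lambda(0+) = lam."""
    t = np.abs(t)
    return lam * np.where(t <= lam, 1.0, np.maximum(a * lam - t, 0.0) / ((a - 1) * lam))

def majoriser(t, t0):
    """Quadratic majoriser of the penalty around t0 != 0, as in (17)."""
    return scad(t0) + 0.5 * scad_deriv(t0) / np.abs(t0) * (t ** 2 - t0 ** 2)

sigma0, grid = 0.8, np.linspace(-3.0, 3.0, 601)
gap = majoriser(grid, sigma0) - scad(grid)   # >= 0 everywhere, = 0 at |t| = sigma0

# perturbation (21) of Hunter and Li (2005); tau below is only an example value
tau, n = 1e-8, 250
sigma_init = np.array([0.4, 0.0, -0.05, 0.7])
eps = tau / (2 * n * scad_deriv(0.0)) * np.abs(sigma_init[sigma_init != 0.0]).min()
```

The non-negativity of `gap` is exactly the majorisation property (18) that guarantees the descent behaviour of the MM iterations.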

### 3.2 Tuning parameter selection

The SCAD penalty requires the selection of two tuning parameters $\left(a,\lambda\right)$. The first tuning parameter is fixed at $a=3.7$ as suggested in Fan and Li (2001), while the parameter $\lambda$ is selected using the following validation function:

$$V\left(\lambda\right)=\frac{1}{n}\left(\hat{\boldsymbol{\Phi}}-\tilde{\boldsymbol{\Phi}}_{R}^{\hat{\boldsymbol{\vartheta}}_{\lambda}}\right)^{\prime}\bar{\mathbf{W}}_{\hat{\boldsymbol{\vartheta}}_{\lambda}}\left(\hat{\boldsymbol{\Phi}}-\tilde{\boldsymbol{\Phi}}_{R}^{\hat{\boldsymbol{\vartheta}}_{\lambda}}\right),\qquad(22)$$

where $\hat{\boldsymbol{\vartheta}}_{\lambda}$ denotes the parameter estimate when $\lambda$ is selected as tuning parameter. We choose $\hat{\lambda}=\arg\min_{\lambda}V\left(\lambda\right)$; the minimisation is performed over a grid of values for $\lambda$.
An alternative approach is $K$–fold cross validation, in which the original sample is divided into $K$ subgroups, called folds. The validation function is

$$CV\left(\lambda\right)=\sum_{k=1}^{K}\frac{1}{n_{k}}\left(\hat{\boldsymbol{\Phi}}-\tilde{\boldsymbol{\Phi}}_{R}^{\hat{\boldsymbol{\vartheta}}_{\lambda,k}}\right)^{\prime}\bar{\mathbf{W}}_{\hat{\boldsymbol{\vartheta}}_{\lambda,k}}\left(\hat{\boldsymbol{\Phi}}-\tilde{\boldsymbol{\Phi}}_{R}^{\hat{\boldsymbol{\vartheta}}_{\lambda,k}}\right),\qquad(23)$$

where $\hat{\boldsymbol{\vartheta}}_{\lambda,k}$ denotes the parameter estimate obtained on the sample excluding the $k$–th fold, with $\lambda$ as tuning parameter. The optimal value is then chosen as $\hat{\lambda}=\arg\min_{\lambda}CV\left(\lambda\right)$; again the minimisation is performed over a grid of values for $\lambda$.
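Both selection rules amount to a one-dimensional grid search over $\lambda$. A schematic sketch follows, where `fit` and `fold_loss` are hypothetical stand-ins for the sparse-MMSQ estimation step and the per-fold criterion in (23); only the fold bookkeeping and the grid search reflect the procedure described above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 120, 5
folds = np.array_split(rng.permutation(n), K)   # the K folds of the original sample

def fit(train_idx, lam):
    """Hypothetical stand-in for sparse-MMSQ estimation without the k-th fold."""
    return {"lam": lam}

def fold_loss(model, test_idx):
    """Hypothetical stand-in for the (1/n_k)-scaled quadratic form in (23);
    here a toy convex curve whose minimum sits at lam = 0.3."""
    return (model["lam"] - 0.3) ** 2

def cv(lam):
    """K-fold cross-validation criterion CV(lambda)."""
    return sum(fold_loss(fit(np.concatenate([folds[j] for j in range(K) if j != k]), lam),
                         folds[k]) for k in range(K))

lam_grid = np.linspace(0.05, 1.0, 20)            # grid of candidate tuning parameters
lam_hat = lam_grid[int(np.argmin([cv(l) for l in lam_grid]))]
```

With the real estimator plugged in for `fit` and `fold_loss`, `lam_hat` is exactly the $\hat{\lambda}$ selected by (23).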

### 3.3 Implementation

The symmetry and positive definiteness of the variance–covariance matrix should be preserved at each step of the optimisation process. Preserving these properties is a difficult task, since the constraints that ensure the positive definiteness of a matrix are nonlinear. Therefore, we consider an implementation that is similar to the Graphical Lasso algorithm introduced by Friedman et al. (2008). We outline the steps of the algorithm below. Let $\boldsymbol{\Omega}$ be a correlation matrix of dimension $m\times m$ and partition it as follows:

$$\boldsymbol{\Omega}=\begin{bmatrix}\boldsymbol{\Omega}_{11}&\boldsymbol{\omega}_{12}\\\boldsymbol{\omega}_{12}^{\prime}&1\end{bmatrix},\qquad(24)$$

where $\boldsymbol{\Omega}_{11}$ is a matrix of dimension $\left(m-1\right)\times\left(m-1\right)$ and $\boldsymbol{\omega}_{12}$ is a vector of dimension $\left(m-1\right)$, and consider the transformation in which $\hat{\boldsymbol{\omega}}_{12}$ is obtained by applying a step of the Newton–Raphson algorithm as follows:

$$\hat{\boldsymbol{\omega}}_{12}=\boldsymbol{\omega}_{12}-\left[\left(\frac{\partial\tilde{\boldsymbol{\Phi}}_{R}^{\boldsymbol{\sigma}_{12}}}{\partial\boldsymbol{\sigma}_{12}}\right)^{\prime}\bar{\mathbf{W}}_{\boldsymbol{\sigma}_{12}}\frac{\partial\tilde{\boldsymbol{\Phi}}_{R}^{\boldsymbol{\sigma}_{12}}}{\partial\boldsymbol{\sigma}_{12}}+n\boldsymbol{\Sigma}_{\lambda}\left(\boldsymbol{\omega}_{12}\right)\right]^{-1}\left[-\left(\frac{\partial\tilde{\boldsymbol{\Phi}}_{R}^{\boldsymbol{\sigma}_{12}}}{\partial\boldsymbol{\sigma}_{12}}\right)^{\prime}\bar{\mathbf{W}}_{\boldsymbol{\sigma}_{12}}\left(\hat{\boldsymbol{\Phi}}-\tilde{\boldsymbol{\Phi}}_{R}^{\boldsymbol{\sigma}_{12}}\right)+n\boldsymbol{\Sigma}_{\lambda}\left(\boldsymbol{\omega}_{12}\right)\boldsymbol{\omega}_{12}\right].\qquad(25)$$

Once the last column has been updated, we move the next column into the last position and repeat the steps described above, cycling through the columns until convergence.
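The column-rotation bookkeeping can be sketched as follows. Here `update_omega12` is only a placeholder for the penalised Newton–Raphson step (25) (it merely zeroes small couplings, an illustrative assumption), and the toy matrix is our own; the point is the permutation logic of the partition (24) and the fact that symmetry is restored after each column update:

```python
import numpy as np

def update_omega12(omega12):
    """Placeholder for the penalised Newton-Raphson step (25):
    here it simply zeroes small couplings (illustrative assumption)."""
    out = omega12.copy()
    out[np.abs(out) < 0.10] = 0.0
    return out

# A well-conditioned 4x4 correlation matrix (toy data).
Omega = np.array([[1.00, 0.30, 0.05, 0.20],
                  [0.30, 1.00, 0.08, 0.00],
                  [0.05, 0.08, 1.00, 0.40],
                  [0.20, 0.00, 0.40, 1.00]])
m = Omega.shape[0]

for j in range(m):
    # permute row/column j into the last position, as in the partition (24)
    perm = [i for i in range(m) if i != j] + [j]
    P = np.eye(m)[perm]
    O = P @ Omega @ P.T
    O[:-1, -1] = update_omega12(O[:-1, -1])   # update the off-diagonal block omega_12
    O[-1, :-1] = O[:-1, -1]                   # keep the matrix symmetric
    Omega = P.T @ O @ P                       # undo the permutation
```

On this mildly shrunk, diagonally dominant example the sweep preserves both symmetry and positive definiteness, which is the invariant the implementation is designed to maintain.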

## 4 Synthetic data examples

As mentioned in the introduction, the Stable distribution plays an interesting role in modelling multivariate data. Its heavy tails and its closure under summation make it appealing in the financial context. Despite these attractive features, parameter estimation has always been challenging, and this has greatly limited its use in applied work, requiring simulation–based methods. In this section we briefly introduce the multivariate Elliptical Stable distribution (ESD), previously considered by Lombardi and Veredas (2009).

### 4.1 Multivariate Elliptical Stable distribution

A random vector $\mathbf{Y}$ is elliptically distributed if

$$\mathbf{Y}\overset{d}{=}\boldsymbol{\xi}+\mathcal{R}\,\boldsymbol{\Gamma}\mathbf{U},\qquad(26)$$

where $\boldsymbol{\xi}$ is a vector of location parameters, $\boldsymbol{\Gamma}$ is a matrix such that $\boldsymbol{\Omega}=\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\prime}$ is a