# High-dimensionality effects in the Markowitz problem and other quadratic programs with linear constraints: Risk underestimation

Noureddine El Karoui (nkaroui@stat.berkeley.edu)
Department of Statistics, University of California, Berkeley
367 Evans Hall, Berkeley, California 94720-3860, USA

Received August 2009; revised January 2010.
###### Abstract

We first study the properties of solutions of quadratic programs with linear equality constraints whose parameters are estimated from data in the high-dimensional setting where $p$, the number of variables in the problem, is of the same order of magnitude as $n$, the number of observations used to estimate the parameters. The Markowitz problem in Finance is a subcase of our study. Assuming normality and independence of the observations, we relate the efficient frontier computed empirically to the “true” efficient frontier. Our computations show that there is a separation of the errors induced by estimating the mean of the observations and estimating the covariance matrix. In particular, the price paid for estimating the covariance matrix is an underestimation of the variance by a factor roughly equal to $1 - p/n$. Therefore the risk of the optimal population solution is underestimated when we estimate it by solving a similar quadratic program with estimated parameters.

We also characterize the statistical behavior of linear functionals of the empirical optimal vector and show that they are biased estimators of the corresponding population quantities.

We investigate the robustness of our Gaussian results by extending the study to certain elliptical models and models where our observations are correlated (in “time”). We show a lack of robustness of the Gaussian results, but are still able to get results concerning first order properties of the quantities of interest, even in the case of relatively heavy-tailed data (we require two moments). Risk underestimation is still present in the elliptical case and more pronounced than in the Gaussian case.

We discuss properties of the nonparametric and parametric bootstrap in this context. We show several results, including the interesting fact that standard applications of the bootstrap generally yield inconsistent estimates of bias.

We propose some strategies to correct these problems and practically validate them in some simulations. Throughout this paper, we will assume that $p$ and $n$ tend to infinity, and $p/n \to \rho \in (0,1)$.

Finally, we extend our study to the case of problems with more general linear constraints, including, in particular, inequality constraints.

The Annals of Statistics, 2010, Vol. 38, No. 6, 3487–3566. DOI: 10.1214/10-AOS795.

Running title: High-dimensional quadratic programs.

Supported by the France–Berkeley Fund, a Sloan Research Fellowship and NSF Grants DMS-06-05169 and DMS-08-47647 (CAREER).

AMS subject classifications: Primary 62H10; secondary 90C20. Keywords: Covariance matrices, convex optimization, quadratic programs, multivariate statistical analysis, high-dimensional inference, concentration of measure, random matrix theory, Markowitz problem, Wishart matrices, elliptical distributions.

## 1 Introduction

Many statistical estimation problems are now formulated, implicitly or explicitly, as solutions of certain optimization problems. Naturally, the parameters of these problems tend to be estimated from data and it is therefore important that we understand the relationship between the solutions of two types of optimization problems: those which use the population parameters and those which use the estimated parameters. This question is particularly relevant in high-dimensional inference where one suspects that the differences between the two solutions might be considerable. The aim of this paper is to contribute to this understanding by focusing on quadratic programs with linear constraints. An important example of such a program where our questions are very natural is the celebrated Markowitz optimization problem in Finance which will serve as a supporting example throughout the paper.

The Markowitz problem [Markowitz (1952)] is a classic portfolio optimization problem in Finance, where investors choose to invest according to the following framework: one picks assets in such a way that the portfolio guarantees a certain level of expected returns but minimizes the “risk” associated with them. In the standard framework, this risk is measured by the variance of the portfolio.

Markowitz’s paper was highly influential and much work has followed. It is now part of the standard textbook literature on these issues [Ruppert (2006), Campbell, Lo and MacKinlay (1996)]. Let us recall the setup of the Markowitz problem.

• We have the opportunity to invest in $p$ assets.

• In the ideal situation, the mean returns of the assets are known and represented by a $p$-dimensional vector, $\mu$.

• Also, the covariance between the returns is known; we denote it by $\Sigma$.

• We want to create a portfolio, with guaranteed mean return $\mu_P$, and minimize its risk, as measured by variance.

• The question is: how should the assets be weighted in the portfolio? What is the vector of weights $w$?

We note that $\Sigma$ is positive semi-definite and hence is in particular symmetric. In the ideal (or population) solution, the covariance $\Sigma$ and the mean $\mu$ are known. The mathematical formulation is then the following simple quadratic program. We wish to find the weights $w$ that solve the following problem:

$$\begin{cases}\min_w \dfrac{1}{2}\,w'\Sigma w,\\ w'\mu = \mu_P,\\ w'e = 1.\end{cases}$$

Here $e$ is a $p$-dimensional vector with 1 in every entry. If $\Sigma$ is invertible, the solution is known explicitly (see Section 2). If we call $w_{\mathrm{optimal}}$ the solution of this problem, the curve $(\mu_P, \sqrt{w_{\mathrm{optimal}}'\Sigma w_{\mathrm{optimal}}})$, seen as a function of $\mu_P$, is called the efficient frontier.
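The closed-form solution alluded to here (derived in Section 2, Theorem 2.1) is easy to compute numerically. The sketch below, with purely illustrative numbers for $\Sigma$, $\mu$ and $\mu_P$, builds one point of the population efficient frontier:

```python
import numpy as np

# Sketch: closed-form solution of the population Markowitz problem,
#   min (1/2) w' Sigma w   s.t.   w' mu = mu_P,  w' e = 1,
# via w = Sigma^{-1} V M^{-1} U with V = [mu, e] and M = V' Sigma^{-1} V.
# All numerical values are illustrative, not from the paper.

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)           # a positive definite covariance
mu = 0.1 + 0.05 * rng.standard_normal(p)  # hypothetical mean returns
e = np.ones(p)
mu_P = 0.12                               # target expected return

V = np.column_stack([mu, e])              # constraint vectors
U = np.array([mu_P, 1.0])                 # constraint values
Sinv_V = np.linalg.solve(Sigma, V)        # Sigma^{-1} V
M = V.T @ Sinv_V                          # M = V' Sigma^{-1} V
w = Sinv_V @ np.linalg.solve(M, U)        # optimal weights

# Both constraints hold, and the minimal variance equals U' M^{-1} U.
print(w @ mu, w @ e, w @ Sigma @ w)
```

Varying `mu_P` and plotting the resulting minimal variance traces out the efficient frontier.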

Of course, in practice, we do not know $\mu$ and $\Sigma$ and we need to estimate them. An interesting question is therefore to know what happens in the Markowitz problem when we replace population quantities by corresponding estimators.

Naturally, we can ask a similar question for general quadratic programs with linear constraints [see below or Boyd and Vandenberghe (2004) for a definition], the Markowitz problem being a particular instance of such a problem. This paper provides an answer to these questions under certain distributional assumptions on the data. Hence our paper is really about the impact of estimation error on certain high-dimensional $M$-estimation problems.

It has been observed by many that there are problems in practice when replacing population quantities by standard estimators [see Lai and Xing (2008), Section 3.5], and alternatives have been proposed. A famous one is the Black–Litterman model [Black and Litterman (1990), Meucci (2005) and, e.g., Meucci (2008)]. Adjustments to the standard estimators have also been proposed: Ledoit and Wolf (2004), partly motivated by portfolio optimization problems, proposed to “shrink” the sample covariance matrix toward another positive definite matrix (often the identity matrix properly scaled), while Michaud (1998) proposed to use the bootstrap and to average bootstrap weights to find better-behaved weights for the portfolio. As noted in Lai and Xing (2008), there is a dearth of theoretical studies regarding, in particular, the behavior of bootstrap estimators.

An aspect of the problem that is of particular interest to us is the study of large-dimensional portfolios (or quadratic programs with linear constraints). To make matters clear, we focus on a portfolio with $p$ assets. If we use a year of daily data to estimate $\Sigma$, the covariance between the daily returns of the assets, we have $n$ (roughly 250) observations at our disposal. In modern statistical parlance, we are therefore in a “large $p$, large $n$” setting, and we know from random matrix theory that the sample covariance matrix $\hat\Sigma$ is a poor estimator of $\Sigma$, especially when it comes to spectral properties of $\Sigma$. There is now a developing statistical literature on properties of sample covariance matrices when $p$ and $n$ are both large, and it is now understood that, though $\hat\Sigma$ is unbiased for $\Sigma$, the eigenvalues and eigenvectors of $\hat\Sigma$ behave very differently from those of $\Sigma$. We refer the interested reader to Johnstone (2001), El Karoui (2007, 2008, 2009a), Bickel and Levina (2008a), Rothman et al. (2008) for a partial introduction to these problems. We wish with this study to make clear that the “large $p$, large $n$” character of the problem has an important impact on the empirical solution of the problem. By contrast, standard but thorough discussions of these problems [Meucci (2005)] give only a cursory treatment of dimensionality issues (e.g., one page out of a whole book).

Another interesting aspect of this problem is that the high-dimensional setting does not allow, by contrast to the classical “small $p$, large $n$” setting, a perturbative approach to go through. In the “small $p$, large $n$” setting, the paper Jobson and Korkie (1980) is concerned, in the Gaussian case, with issues similar to the ones we will be investigating.

The “large $p$, large $n$” setting is the one with which random matrix theory is concerned—and the high-dimensional Markowitz problem has therefore been of interest to random matrix theorists for some time now. We note in particular the paper Laloux et al. (2000), where a random matrix-inspired (shrinkage) approach to improved estimation of the sample covariance matrix is proposed in the context of the Markowitz problem.

Let us now remind the reader of some basic facts of random matrix theory that suggest that serious problems may arise if one solves naively the high-dimensional Markowitz problem or other quadratic programs with linear equality constraints. A key result in random matrix theory is the Marčenko–Pastur equation [Marčenko and Pastur (1967)] which characterizes the limiting distribution of the eigenvalues of the sample covariance matrix and relates it to the spectral distribution of the population covariance matrix. We give only its simplest form in this introduction and refer the reader to Marčenko and Pastur (1967), Wachter (1978), Silverstein (1995), Bai (1999) and, for example, El Karoui (2009a) for a more thorough introduction and very recent developments, as well as potential geometric and statistical limitations of the models usually considered in random matrix theory.

In the simplest setting, we consider data $X_1,\ldots,X_n$, which are $p$-dimensional. In a financial context, these vectors would be vectors of (log-)returns of assets, the portfolio consisting of $p$ assets. To simplify the exposition, let us assume that the $X_i$'s are i.i.d. with distribution $\mathcal{N}(0,\mathrm{Id}_p)$. We call $X$ the $n\times p$ matrix whose $i$th row is the vector $X_i'$. Let us consider the sample covariance matrix

$$\hat\Sigma = \frac{1}{n-1}\,(X-\bar X)'(X-\bar X),$$

where $\bar X$ is an $n\times p$ matrix whose rows are all equal to the vector of column means of $X$. Now let us call $F_p$ the spectral distribution of $\hat\Sigma$, that is, the probability distribution that puts mass $1/p$ at each of the eigenvalues of $\hat\Sigma$. A graphical representation of this probability distribution is naturally the histogram of eigenvalues of $\hat\Sigma$. A consequence of the main result of the very profound paper Marčenko and Pastur (1967) is that $F_p$, though a random measure, is asymptotically nonrandom, and its limit, in the sense of weak convergence of distributions, has a density (when $p/n \to \rho \in (0,1)$) that can be computed. The limit depends on $\rho$ in the following manner: if $\Sigma = \mathrm{Id}_p$, the density of the limiting spectral distribution is

$$f_\rho(x) = \frac{1}{2\pi\rho}\,\frac{\sqrt{(y_+-x)(x-y_-)}}{x}\,1_{y_-\le x\le y_+},$$

where $y_+ = (1+\sqrt{\rho})^2$ and $y_- = (1-\sqrt{\rho})^2$. Figure 1 presents a graphical illustration of this result.

What is striking about this result is that it implies that the largest eigenvalue of $\Sigma$, $\lambda_{\max}(\Sigma)$, will be overestimated by the largest eigenvalue of $\hat\Sigma$, $\lambda_{\max}(\hat\Sigma)$. Also, the smallest eigenvalue of $\Sigma$, $\lambda_{\min}(\Sigma)$, will be underestimated by the smallest eigenvalue of $\hat\Sigma$, $\lambda_{\min}(\hat\Sigma)$. As a matter of fact, in the model described above, $\Sigma$ has all its eigenvalues equal to 1, so $\lambda_{\max}(\Sigma) = \lambda_{\min}(\Sigma) = 1$, while $\lambda_{\max}(\hat\Sigma)$ will asymptotically be larger than or equal to $(1+\sqrt{\rho})^2$ and $\lambda_{\min}(\hat\Sigma)$ smaller than or equal to $(1-\sqrt{\rho})^2$ (in the Gaussian case and several others, $\lambda_{\max}(\hat\Sigma)$ and $\lambda_{\min}(\hat\Sigma)$ converge to those limits). We note that the result of Marčenko and Pastur (1967) is not limited to the case where $\Sigma$ is the identity, as presented here, but holds for general covariance $\Sigma$ ($F_p$ has of course a different limit then).
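This eigenvalue spreading is easy to reproduce by simulation. A minimal sketch (sizes are illustrative), with $\Sigma = \mathrm{Id}_p$ and $p/n = 1/4$:

```python
import numpy as np

# Sketch: eigenvalue spreading of the sample covariance matrix when
# Sigma = Id_p. With rho = p/n, the sample eigenvalues spread over roughly
# [(1 - sqrt(rho))^2, (1 + sqrt(rho))^2], even though every population
# eigenvalue equals 1. Sizes below are illustrative.

rng = np.random.default_rng(1)
n, p = 2000, 500                        # rho = p/n = 0.25
X = rng.standard_normal((n, p))         # rows i.i.d. N(0, Id_p)
Xc = X - X.mean(axis=0)
Sigma_hat = Xc.T @ Xc / (n - 1)
evals = np.linalg.eigvalsh(Sigma_hat)

rho = p / n
y_minus, y_plus = (1 - rho**0.5) ** 2, (1 + rho**0.5) ** 2
print(evals.min(), y_minus)   # smallest eigenvalue near (1 - sqrt(rho))^2
print(evals.max(), y_plus)    # largest eigenvalue near (1 + sqrt(rho))^2
```

A histogram of `evals` against the density $f_\rho$ above reproduces the picture described around Figure 1.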

Perhaps more concretely, let us consider a projection of the data along a vector $v$, with $\|v\|_2 = 1$, where $\|v\|_2$ is the Euclidean norm of $v$. Here it is clear that, if $\Sigma = \mathrm{Id}_p$, $\mathrm{var}(v'X_i) = 1$ for all $v$, since $\mathrm{var}(v'X_i) = v'\Sigma v = \|v\|_2^2$. However, if we do not know $\Sigma$ and estimate it by $\hat\Sigma$, a naive (and wrong) reasoning suggests that we can find directions of lower variance than 1, namely those corresponding to eigenvectors of $\hat\Sigma$ associated with eigenvalues that are less than 1. In particular, if $v_{\min}$ is the eigenvector associated with $\lambda_{\min}(\hat\Sigma)$, the smallest eigenvalue of $\hat\Sigma$, by naively estimating, for $X$ independent of $\hat\Sigma$, the variance in the direction of $v_{\min}$, $\mathrm{var}(v_{\min}'X)$, by the empirical version $v_{\min}'\hat\Sigma v_{\min} = \lambda_{\min}(\hat\Sigma)$, one would commit a severe mistake: the variance in any direction is 1, but it would be estimated by something roughly equal to $(1-\sqrt{\rho})^2$ in the direction of $v_{\min}$.

In a portfolio optimization context, this suggests that by using standard estimators, such as the sample covariance matrix, when solving the high-dimensional Markowitz problem, one might underestimate the variance of certain portfolios (or “optimal” vectors of weights). As a matter of fact, in the previous toy example, thinking (wrongly) that there is low variance in the direction $v_{\min}$, one might (numerically) “load” this direction more than warranted, given that the true variance is the same in all directions.

This simple argument suggests that severe problems might arise in the high-dimensional Markowitz problem and other quadratic programs with linear constraints, and in particular, risk might be underestimated. While this heuristic argument is probably clear to specialists of random matrix theory, the problem had not been investigated at a mathematical level of rigor in that literature before this paper was submitted [the paper Bai, Liu and Wong (2009) appeared while this paper was being refereed; it is concerned with different models than the ones we will be investigating and our results do not overlap]. It has received some attention at a physical level of rigor [see, e.g., Pafka and Kondor (2003), where the authors treat only the Gaussian case, and do not investigate the effect of the mean, which as we show below creates problems of its own]. In this paper, we propose a theoretical analysis of the problem in a Gaussian and elliptical framework for general quadratic programs with linear constraints, one of them involving the parameter $\mu$. Our results and contributions are several-fold. We relate the empirical efficient frontier to the theoretical efficient frontier that is key to the Markowitz theory, in a variety of theoretical settings. We show that the empirical frontier generally yields an underestimation of the risk of the portfolio and that Gaussian analysis gives an over-optimistic view of this problem. We show that the expected returns of the naive “optimal” portfolio are poorly estimated by their plug-in counterparts. We argue that the bootstrap will not solve the problems we are pointing out here. Besides new formulas, we also provide robust estimators of the various quantities we are interested in.

The paper is divided into four main parts and a conclusion. In Section 2, to make the paper self-contained, we discuss the solution of quadratic problems with linear equality constraints—a focus of this paper. In Section 3, we study the impact of parameter estimation on the solution of these problems when the observed data is i.i.d. Gaussian and obtain some exact distributional results for fixed $p$ and $n$. In Section 4, we obtain results in the case where the data is elliptically distributed. This allows us also to understand the impact of correlation between observations in the Gaussian case and to get information about the behavior of the nonparametric bootstrap. In Section 5, we apply the results of Section 4 to the quadratic programs at hand and compare the elliptical and the Gaussian cases. We show, among other things, that the Gaussian results are not robust in the class of elliptical distributions. In particular, two models may yield the same $\mu$ and $\Sigma$ but can have very different empirical behavior. In Section 5, we also propose various schemes to correct the problems we highlight (see Figures 2, 3 and 4 for pictures) and study more general problems with linear constraints (see Section 5.6). The conclusion summarizes our findings and the Appendix contains various facts and proofs that did not naturally flow in the main text or were better highlighted by being stated separately.

Several times in the paper $\Sigma^{-1}$ and $\hat\Sigma^{-1}$ will appear. Unless otherwise noted, when taking the inverse of a population matrix, we implicitly assume that it exists. The question of existence of inverses of sample covariance matrices is well understood in the statistics literature. Because our models will have a component with a continuous distribution, there are essentially no existence problems (unless we explicitly mention and treat them), as proofs similar to standard ones found in textbooks [e.g., Anderson (2003)] would show. Hence, we do not belabor this point any further in the rest of the paper, as our focus is on things other than rather well-understood technical details, and the paper is already a bit long.

Finally, let us mention that while the Finance motivation for our study is important to us, we treat the problem in this paper as a high-dimensional $M$-estimation question (which we think has practical relevance). We will not introduce particular modeling assumptions which might be relevant for practitioners of Finance but might make the paper less relevant in other fields. A companion paper [El Karoui (2009b)] deals with more “financial” issues and the important question of the realized risk of portfolios that are “plug-in” solutions of the Markowitz problem.

## 2 Quadratic programs with linear equality constraints

We discuss here the properties of the solution of quadratic programs with linear equality constraints as they lay the foundations for our analysis of similar problems involving estimated parameters (and of problems with inequality constraints). We included this section for the convenience of the reader to make the paper as self-contained as possible.

The problem we want to solve is the following:

$$\begin{cases}\min_{w\in\mathbb{R}^p} \dfrac{1}{2}\,w'\Sigma w,\\ w'v_i = u_i,\quad 1\le i\le k.\end{cases}\qquad\text{(QP-eqc)}$$

Here $\Sigma$ is a positive definite matrix of size $p\times p$, $v_i \in \mathbb{R}^p$ and $u_i \in \mathbb{R}$. We have the following theorem:

###### Theorem 2.1

Let us call $V$ the $p\times k$ matrix whose $i$th column is $v_i$, $U$ the $k$-dimensional vector whose $i$th entry is $u_i$ and $M$ the $k\times k$ matrix

$$M = V'\Sigma^{-1}V.$$

We assume that the $v_i$'s are such that $M$ is invertible. The solution of the quadratic program with linear equality constraints (QP-eqc) is achieved for

$$w_{\mathrm{optimal}} = \Sigma^{-1}VM^{-1}U,$$

and we have

$$w_{\mathrm{optimal}}'\Sigma w_{\mathrm{optimal}} = U'M^{-1}U.$$

Proof.

Let us call $\lambda$ a $k$-dimensional vector of Lagrange multipliers. The Lagrangian function is, in matrix notation,

$$L(w,\lambda) = \frac{w'\Sigma w}{2} - \lambda'(V'w - U).$$

This is clearly a (strictly) convex function in $w$, since $\Sigma$ is positive definite by assumption. We have

$$\frac{\partial L}{\partial w} = \Sigma w - V\lambda.$$

So $w_{\mathrm{optimal}} = \Sigma^{-1}V\lambda$. Now we know that $V'w_{\mathrm{optimal}} = U$. So $V'\Sigma^{-1}V\lambda = M\lambda = U$, and hence $\lambda = M^{-1}U$. Therefore,

$$w_{\mathrm{optimal}} = \Sigma^{-1}VM^{-1}U.$$

We deduce immediately that

$$w_{\mathrm{optimal}}'\Sigma w_{\mathrm{optimal}} = U'M^{-1}U. \qquad\blacksquare$$

We now turn to another result which will prove to be useful later. It gives a compact representation of linear combinations of the weights of the optimal solution, and we will rely heavily on it in particular in the case of Gaussian data.

###### Lemma 2.2

Let us consider the solution $w_{\mathrm{optimal}}$ of the optimization problem (QP-eqc). Let $\gamma$ be a vector in $\mathbb{R}^p$. Let us call $\mathcal{M}$ the $(k+1)\times(k+1)$ matrix that is written in block form

$$\mathcal{M} = \begin{pmatrix} V'\Sigma^{-1}V & V'\Sigma^{-1}\gamma \\ \gamma'\Sigma^{-1}V & \gamma'\Sigma^{-1}\gamma \end{pmatrix}.$$

Assume that $\mathcal{M}$ is invertible. Then

$$\gamma'w_{\mathrm{optimal}} = \frac{-1}{\mathcal{M}^{-1}(k+1,k+1)}\,(U'\ 0)\,\mathcal{M}^{-1}\begin{pmatrix}0_k\\1\end{pmatrix}. \tag{1}$$
Proof.

The proof is a consequence of the results discussed in the Appendix concerning inverses of partitioned matrices [see Section A.1 and equation (A.4) there]. Let us write

$$\mathcal{M} = \begin{pmatrix}\mathcal{M}_{11} & \mathcal{M}_{12}\\ \mathcal{M}_{21} & \mathcal{M}_{22}\end{pmatrix},$$

where $\mathcal{M}_{11}$ is $k\times k$, $\mathcal{M}_{12}$ is naturally $k\times 1$ and $\mathcal{M}_{22}$ is a scalar. With the same block notation, we have

$$\mathcal{M}^{-1} = \begin{pmatrix}\mathcal{M}^{11} & \mathcal{M}^{12}\\ \mathcal{M}^{21} & \mathcal{M}^{22}\end{pmatrix}.$$

Then, we know [see equation (A.4)] that $\mathcal{M}^{12} = -\mathcal{M}_{11}^{-1}\mathcal{M}_{12}\mathcal{M}^{22}$, but since $\mathcal{M}^{22}$ is a scalar, equal to $\mathcal{M}^{-1}(k+1,k+1)$, we have

$$\mathcal{M}_{11}^{-1}\mathcal{M}_{12} = -\mathcal{M}^{12}/\mathcal{M}^{22}.$$

Now $w_{\mathrm{optimal}} = \Sigma^{-1}V\mathcal{M}_{11}^{-1}U$, so $\gamma'w_{\mathrm{optimal}} = \gamma'\Sigma^{-1}V\mathcal{M}_{11}^{-1}U = \mathcal{M}_{21}\mathcal{M}_{11}^{-1}U = -U'\mathcal{M}^{12}/\mathcal{M}^{22}$. Hence,

$$w_{\mathrm{optimal}}'\gamma = \frac{-1}{\mathcal{M}^{22}}\,(U'\ 0)\,\mathcal{M}^{-1}\begin{pmatrix}0_k\\1\end{pmatrix}. \qquad\blacksquare$$

We note that here $\mathcal{M}^{22} > 0$, as an application of equation (A.2) clearly shows.
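The representation of Lemma 2.2 can be checked numerically against the direct formula of Theorem 2.1. A small sketch with illustrative random inputs:

```python
import numpy as np

# Sketch: numerical check that the block-matrix representation of
# gamma' w_optimal (Lemma 2.2) agrees with the direct closed form
# w_optimal = Sigma^{-1} V M^{-1} U (Theorem 2.1). Sizes are illustrative.

rng = np.random.default_rng(3)
p, k = 6, 2
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)          # positive definite
V = rng.standard_normal((p, k))          # constraint vectors
U = rng.standard_normal(k)               # constraint values
gamma = rng.standard_normal(p)

# Direct solution (Theorem 2.1).
Sinv_V = np.linalg.solve(Sigma, V)
w_opt = Sinv_V @ np.linalg.solve(V.T @ Sinv_V, U)
direct = gamma @ w_opt

# Block representation (Lemma 2.2): build the (k+1) x (k+1) matrix from
# [V, gamma] and read gamma' w_optimal off its inverse.
Vg = np.column_stack([V, gamma])
Mg = Vg.T @ np.linalg.solve(Sigma, Vg)
Mg_inv = np.linalg.inv(Mg)
u_ext = np.append(U, 0.0)                # (U', 0)
last = np.zeros(k + 1); last[k] = 1.0    # (0_k, 1)'
block = -(u_ext @ Mg_inv @ last) / Mg_inv[k, k]

print(direct, block)                     # the two expressions agree
```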

## 3 QP with equality constraints: Impact of parameter estimation in the Gaussian case

From now on, we will assume that we are in the high-dimensional setting where $p$ and $n$ go to infinity. Our study will be divided into two parts. We will first consider the Gaussian setting (in this section) and then study an elliptical distribution setting (in Section 4). (We note that for the Markowitz problem, the assumption of Gaussianity would be satisfied if we worked under Black–Scholes diffusion assumptions for our assets and were considering log-returns as our observations.) Interestingly, we will show that the results are not robust against the assumption of Gaussianity, which is not (so) surprising in light of recent random matrix results [see El Karoui (2009a)]. We will also show that understanding the elliptical setting allows us to understand the impact of correlation between observations and to discuss bootstrap-related ideas. In particular, we will see that various problems arise with the bootstrap in high dimension and that the results change when one deals with observations that are correlated (in time) or not.

We also address similar questions concerning inequality constrained problems in Section 5.6.

Before we proceed, we need to set up some notations: we call $e$ the $p$-dimensional vector whose entries are all equal to 1. We call $V$, as above, the $p\times k$ matrix containing all of our constraint vectors, which we may have to estimate (for instance, if $v_i = \mu$ for a certain $i$). We call $\hat V$ the matrix of estimated constraint vectors.

The template question for all our investigations will be the following (Markowitz) question: what can be said of the statistical properties of the solution of

$$\begin{cases}\min_{w\in\mathbb{R}^p} w'\hat\Sigma w,\\ w'\hat\mu = \mu_P,\\ w'e = 1\end{cases}$$

compared to the solution of the population version

$$\begin{cases}\min_{w\in\mathbb{R}^p} w'\Sigma w,\\ w'\mu = \mu_P,\\ w'e = 1\,?\end{cases}$$

We will solve the problem at a much greater degree of generality, by considering first quadratic programs with linear equality constraints (see Section 5.6 for inequality constraints) and comparing the solutions of

$$\begin{cases}\min_{w\in\mathbb{R}^p} w'\hat\Sigma w,\\ w'v_i = u_i,\quad 1\le i\le k-1,\\ w'\hat\mu = u_k\end{cases}\qquad\text{(QP-eqc-Emp)}$$

and

$$\begin{cases}\min_{w\in\mathbb{R}^p} w'\Sigma w,\\ w'v_i = u_i,\quad 1\le i\le k-1,\\ w'\mu = u_k.\end{cases}\qquad\text{(QP-eqc-Pop)}$$

Here $\hat\Sigma$ and $\hat\mu$ will be estimated from the data. We call $w_{\mathrm{emp}}$ the vector that yields a solution of problem (QP-eqc-Emp) and $w_{\mathrm{theo}}$ the vector that yields a solution of problem (QP-eqc-Pop).

We call $\hat V$ the $p\times k$ matrix containing $v_1,\ldots,v_{k-1}$ and $\hat\mu$, and $V$ its population counterpart, which contains $v_1,\ldots,v_{k-1}$ and $\mu$. We assume that $u_1,\ldots,u_k$ are deterministic and known (just like the vector $U$ in the Markowitz problem). In our analysis, $k$ will be held fixed. (The $k$th column of $\hat V$ will contain in general $\mu$ or our estimator of $\mu$.)

As should be clear from Theorem 2.1, the properties of the entries of the matrix $\hat V'\hat\Sigma^{-1}\hat V$ as compared to those of the matrix $V'\Sigma^{-1}V$ will be key to our understanding of this question. In what follows, we assume that the vectors $\hat v_i$ are either deterministic or equal to $\hat\mu$. The extension to linear combinations of a deterministic vector and $\hat\mu$ is straightforward. We also note that in the Gaussian case, we could just assume that the $\hat v_i$'s are (deterministic) functions of $\hat\mu$ (because $\hat\mu$ and $\hat\Sigma$ are independent in this case). On the other hand, the vector $U$ is assumed to be deterministic.

Before we proceed, let us mention that after our study was completed, we learned of similar results (restricted to the Markowitz case and not dealing with general quadratic programs with linear equality constraints) by Kan and Smith (2008). We stress the fact that our work was independent of theirs and is more general which is why it is included in the paper.

### 3.1 Efficient frontier problems

We first study questions concerning the efficient frontier and then turn to information we can get about linear functionals of the empirical weights.

###### Theorem 3.1

Let us assume that we observe i.i.d. data $X_i$, for $1\le i\le n$. Here $X_i$ is $p$-dimensional and $X_i \sim \mathcal{N}(\mu,\Sigma)$. Suppose we estimate $\Sigma$ with the sample covariance matrix $\hat\Sigma$, and $\mu$ with the sample mean $\hat\mu$. Suppose we wish to solve the problem

$$\begin{cases}\min_{w\in\mathbb{R}^p} w'\Sigma w,\\ w'v_j = u_j,\quad 1\le j\le k,\end{cases}\qquad\text{(QP-eqc-Pop)}$$

where the $u_j$'s are deterministic and given, the $v_j$'s are deterministic and given for $1\le j\le k-1$, and $v_k = \mu$. Assume that we use as a proxy for the previous problem the empirical version with plugged-in parameters. Let us consider the solution of the problem

$$\begin{cases}\min_{w\in\mathbb{R}^p} w'\hat\Sigma w,\\ w'\hat v_j = u_j,\quad 1\le j\le k.\end{cases}\qquad\text{(QP-eqc-Emp)}$$

Now $\hat v_j = v_j$ for $1\le j\le k-1$ and $\hat v_k = g(\hat\mu)$, for a given deterministic function $g$. Let us call $w_{\mathrm{emp}}$ the corresponding “weight” vector. The plug-in estimate of the minimal risk is $w_{\mathrm{emp}}'\hat\Sigma w_{\mathrm{emp}}$. Let us call $w_{\mathrm{oracle}}$ the optimal solution of the quadratic program obtained under the assumption that $\Sigma$ is given, but $\mu$ is not and is estimated by $\hat\mu$. Finally, we assume that $p < n$.

Then we have

$$w_{\mathrm{emp}}'\hat\Sigma w_{\mathrm{emp}} = w_{\mathrm{oracle}}'\Sigma w_{\mathrm{oracle}}\;\frac{\chi^2_{n-1-p+k}}{n-1}, \tag{2}$$

where $w_{\mathrm{oracle}}'\Sigma w_{\mathrm{oracle}}$ is random (because $\hat\mu$ is) but is statistically independent of the $\chi^2_{n-1-p+k}$ random variable. Also,

$$w_{\mathrm{oracle}}'\Sigma w_{\mathrm{oracle}} = U'(\hat V'\Sigma^{-1}\hat V)^{-1}U.$$

The previous theorem means that the cost of not knowing the covariance matrix and estimating it is the appearance of the $\chi^2_{n-1-p+k}/(n-1)$ factor. In the high-dimensional setting when $p$ and $n$ are of the same order of magnitude and $n$ is large, this term is approximately $1-p/n$. Hence, the theorem quantifies the random matrix intuition that having to estimate the high-dimensional covariance matrix at stake here leads to risk underestimation, by the factor $1-p/n$. In other words, using plug-in procedures leads to over-optimistic conclusions in this situation.
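This $1-p/n$ effect is easy to observe in simulation. The sketch below uses illustrative sizes, takes $\Sigma = \mathrm{Id}_p$ and the single deterministic constraint $w'e = 1$ (minimum-variance portfolio, so mean estimation plays no role), and compares the plug-in risk with the true minimal risk $1/p$:

```python
import numpy as np

# Sketch: risk underestimation by a factor ~ (1 - p/n) when the covariance
# is estimated. Sigma = Id_p, single constraint w'e = 1, so the true
# minimal risk is 1/p. All sizes and the number of repetitions are
# illustrative.

rng = np.random.default_rng(2)
n, p = 400, 200                          # p/n = 0.5
e = np.ones(p)
true_risk = 1.0 / p                      # U'(V' Sigma^{-1} V)^{-1} U here

ratios = []
for _ in range(50):
    X = rng.standard_normal((n, p))      # rows i.i.d. N(0, Id_p)
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (n - 1)              # sample covariance
    Sinv_e = np.linalg.solve(S, e)
    w = Sinv_e / (e @ Sinv_e)            # empirical optimal weights
    ratios.append((w @ S @ w) / true_risk)

print(np.mean(ratios), 1 - p / n)        # plug-in/true risk ratio ~ 1 - p/n
```

The average ratio matches the $\chi^2_{n-1-p+k}/(n-1)$ factor of equation (2), whose mean is $(n-p)/(n-1) \approx 1-p/n$ here.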

We also note that the previous theorem shows that, in the Gaussian setting under study here, the effects of estimating the mean and the covariance on the solution of the quadratic program are “separable”: the effect of the mean estimation is in the oracle term, while the effect of estimating the covariance is in the $\chi^2$ term. To show risk underestimation, it will therefore be necessary to relate $w_{\mathrm{oracle}}'\Sigma w_{\mathrm{oracle}}$ to $w_{\mathrm{theo}}'\Sigma w_{\mathrm{theo}}$. We do it in Proposition 3.2 but first give a proof of Theorem 3.1.


Proof of Theorem 3.1. The crux of the proof is the following result, which is well known by statisticians, concerning (essentially) blocks of the inverse of a Wishart matrix: if $S \sim W_p(\Sigma, m)$, that is, $S$ is a Wishart matrix with $m$ degrees of freedom and covariance $\Sigma$, and $A$ is a $p\times k$, deterministic matrix, then, when $m \ge p$,

$$(A'S^{-1}A)^{-1} \sim W_k\bigl((A'\Sigma^{-1}A)^{-1},\, m-p+k\bigr).$$

We refer to Eaton [(1983), Proposition 8.9, page 312] for a proof, and to Mardia, Kent and Bibby [(1979), pages 70–73] for related results.

Another important remark is the well-known fact that, in the situation we are considering, $(n-1)\hat\Sigma$ is $W_p(\Sigma, n-1)$ and independent of $\hat\mu$. Finally, it is also well known that if $S \sim W_k(\Lambda, m)$ and $u$ is a $k$-dimensional deterministic vector, then $u'Su/(u'\Lambda u) \sim \chi^2_m$.

Now $(n-1)\hat\Sigma \sim W_p(\Sigma, n-1)$. Therefore, since $\hat V$ is a function of $\hat\mu$, we have, by independence of $\hat\mu$ and $\hat\Sigma$,

$$(\hat V'\hat\Sigma^{-1}\hat V)^{-1}\,\big|\,\hat\mu \;\sim\; W_k\bigl((\hat V'\Sigma^{-1}\hat V)^{-1},\, n-1-p+k\bigr)\big/(n-1).$$

Therefore,

$$\frac{U'(\hat V'\hat\Sigma^{-1}\hat V)^{-1}U}{U'(\hat V'\Sigma^{-1}\hat V)^{-1}U}\,\bigg|\,\hat\mu \;\sim\; \frac{\chi^2_{n-p-1+k}}{n-1}.$$

Because the right-hand side does not depend on $\hat\mu$, we have established the independence of $\hat\mu$ and the ratio

$$\frac{U'(\hat V'\hat\Sigma^{-1}\hat V)^{-1}U}{U'(\hat V'\Sigma^{-1}\hat V)^{-1}U} \;\sim\; \frac{\chi^2_{n-p-1+k}}{n-1}.$$

Hence, we conclude that

$$U'(\hat V'\hat\Sigma^{-1}\hat V)^{-1}U = U'(\hat V'\Sigma^{-1}\hat V)^{-1}U\;\frac{\chi^2_{n-p-1+k}}{n-1},$$

and the two terms are independent. Now the term $U'(\hat V'\Sigma^{-1}\hat V)^{-1}U$ is the estimate we would get for the solution of problem (QP-eqc-Pop), if $\Sigma$ were known and $\mu$ were estimated by $\hat\mu$. In other words, it is the “oracle” solution described above. $\blacksquare$
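The Wishart fact quoted at the beginning of this proof can be sanity-checked by Monte Carlo. The sketch below (illustrative sizes) checks only the implied mean, $E[W_k(\Lambda, d)] = d\Lambda$ with $\Lambda = (A'\Sigma^{-1}A)^{-1}$ and $d = m-p+k$:

```python
import numpy as np

# Sketch: Monte Carlo check of the Wishart fact: if S ~ W_p(Sigma, m) and
# A is a p x k deterministic matrix, then
#   (A' S^{-1} A)^{-1} ~ W_k((A' Sigma^{-1} A)^{-1}, m - p + k).
# We check only the mean. All sizes are illustrative.

rng = np.random.default_rng(6)
p, k, m, reps = 8, 2, 30, 5000
B = rng.standard_normal((p, p))
Sigma = B @ B.T + p * np.eye(p)
L = np.linalg.cholesky(Sigma)
A = rng.standard_normal((p, k))
Lam = np.linalg.inv(A.T @ np.linalg.solve(Sigma, A))   # (A' Sigma^{-1} A)^{-1}

acc = np.zeros((k, k))
for _ in range(reps):
    Y = rng.standard_normal((m, p)) @ L.T              # rows i.i.d. N(0, Sigma)
    S = Y.T @ Y                                        # S ~ W_p(Sigma, m)
    acc += np.linalg.inv(A.T @ np.linalg.solve(S, A))

emp_mean = acc / reps
print(np.diag(emp_mean) / np.diag(Lam))                # near m - p + k = 24
```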

#### 3.1.1 Some remarks on the oracle solution

Theorem 3.1 sheds light on the separate effects of mean and covariance estimation on the problem considered above. To understand further the problem of risk estimation, we need to better understand the role the estimation of the mean might play. This is what we do now.

###### Proposition 3.2

Suppose that the last column of $\hat V$ is $\hat\mu$. Let us call $V_{-k}$ the $p\times(k-1)$-dimensional matrix whose $i$th column is $v_i$, $1\le i\le k-1$, which are known deterministic vectors. Suppose that $\mu'\Sigma^{-1}\mu$ remains bounded. Suppose further that $\lambda_{\min}(M)$ remains bounded away from 0, where $\lambda_{\min}(M)$ is the smallest eigenvalue of the matrix $M = V'\Sigma^{-1}V$.

Further, call $M = V'\Sigma^{-1}V$ and call $e_1,\ldots,e_k$ the canonical basis vectors in $\mathbb{R}^k$. Finally, call $\alpha = p/n$.

Then, when $p$ and $n$ tend to infinity with $p/n \to \rho$, asymptotically,

$$w_{\mathrm{oracle}}'\Sigma w_{\mathrm{oracle}} = w_{\mathrm{theo}}'\Sigma w_{\mathrm{theo}} - \frac{\alpha\,(U'M^{-1}e_k)^2}{1+\alpha\, e_k'M^{-1}e_k} + o_P(w_{\mathrm{theo}}'\Sigma w_{\mathrm{theo}}).$$

Let us discuss this result a little before we provide a proof. In the asymptotics we have in mind and are considering, $\alpha = p/n$ stays bounded away from 0, and therefore the correction term is nonnegative. So if $U'M^{-1}e_k \ne 0$, when the above analysis applies, the impact of the estimation of $\mu$ by $\hat\mu$ will be risk underestimation, just as is the case for the covariance matrix. Here, we can also quantify the impact of this estimation of $\mu$ by $\hat\mu$: it leads to risk underestimation by the amount $\alpha(U'M^{-1}e_k)^2/(1+\alpha e_k'M^{-1}e_k)$.


Proof of Proposition 3.2. Let us write $\hat\mu = \mu + \varepsilon$, where $\varepsilon = \Sigma^{1/2}Z/\sqrt{n}$ and $Z \sim \mathcal{N}(0,\mathrm{Id}_p)$. Clearly, $\hat V = V + \varepsilon e_k'$, where $e_k$ is the $k$th canonical basis vector in $\mathbb{R}^k$. We have, using block notations,

$$\hat V'\Sigma^{-1}\hat V = V'\Sigma^{-1}V + \begin{pmatrix}0 & 0\\ 0 & \varepsilon'\Sigma^{-1}\varepsilon\end{pmatrix} + \begin{pmatrix}0 & V_{-k}'\Sigma^{-1}\varepsilon\\ \varepsilon'\Sigma^{-1}V_{-k} & 2\mu'\Sigma^{-1}\varepsilon\end{pmatrix}.$$

Replacing $\varepsilon$ by its value, we have $\varepsilon'\Sigma^{-1}\varepsilon = Z'Z/n = p/n + O_P(n^{-1/2})$. By the same token, we can also get that

$$V_{-k}'\Sigma^{-1}\varepsilon = \frac{1}{\sqrt{n}}\,V_{-k}'\Sigma^{-1/2}Z \sim \mathcal{N}\biggl(0, \frac{V_{-k}'\Sigma^{-1}V_{-k}}{n}\biggr).$$

Our assumptions imply that $\mu'\Sigma^{-1}\varepsilon = O_P(n^{-1/2})$ and $V_{-k}'\Sigma^{-1}\varepsilon = O_P(n^{-1/2})$. Therefore,

$$\begin{pmatrix}0 & V_{-k}'\Sigma^{-1}\varepsilon\\ \varepsilon'\Sigma^{-1}V_{-k} & 2\mu'\Sigma^{-1}\varepsilon\end{pmatrix} = O_P\biggl(\frac{1}{\sqrt{n}}\biggr).$$

Hence, since $\varepsilon'\Sigma^{-1}\varepsilon = \alpha + O_P(n^{-1/2})$,

$$\hat V'\Sigma^{-1}\hat V = V'\Sigma^{-1}V + \alpha\, e_k e_k' + O_P(n^{-1/2}).$$

Our assumptions guarantee that $\lambda_{\min}(V'\Sigma^{-1}V + \alpha e_k e_k')$ stays bounded away from 0, and therefore this matrix is invertible with bounded inverse. In other respects, let $A$ be a symmetric matrix such that $\lambda_{\min}(A) \ge c > 0$ and $B$ be a symmetric matrix such that $\|B - A\|_2 = o(1)$. Recall that for symmetric matrices, $|\lambda_{\min}(B) - \lambda_{\min}(A)| \le \|B - A\|_2$ [see, e.g., Weyl's theorem, Horn and Johnson (1994), page 185]. So in this situation, $\lambda_{\min}(B)$ stays bounded away from 0 eventually. Let us now consider the implications of this remark on the difference of $A^{-1}$ and $B^{-1}$. We claim that $\|B^{-1} - A^{-1}\|_2 = o(1)$. By the first resolvent identity, $B^{-1} - A^{-1} = B^{-1}(A - B)A^{-1}$; our previous remark implies that $\|A^{-1}\|_2$ and $\|B^{-1}\|_2$ are bounded and the result follows. Applying the results of this discussion to $A = V'\Sigma^{-1}V + \alpha e_k e_k'$ and $B = \hat V'\Sigma^{-1}\hat V$, we have

$$(\hat V'\Sigma^{-1}\hat V)^{-1} = (V'\Sigma^{-1}V + \alpha e_k e_k')^{-1} + o_P\bigl((V'\Sigma^{-1}V + \alpha e_k e_k')^{-1}\bigr).$$

We can now use well-known results concerning inverses of rank-1 perturbation of matrices, namely

$$(V'\Sigma^{-1}V + \alpha e_k e_k')^{-1} = (M + \alpha e_k e_k')^{-1} = M^{-1} - \frac{\alpha\, M^{-1}e_k e_k' M^{-1}}{1+\alpha\, e_k' M^{-1} e_k}.$$

This allows us to conclude that

$$U'(\hat V'\Sigma^{-1}\hat V)^{-1}U = U'M^{-1}U - \frac{\alpha\,(U'M^{-1}e_k)^2}{1+\alpha\, e_k'M^{-1}e_k} + o_P(U'M^{-1}U).$$

This is the result announced in the proposition and the proof is complete. $\blacksquare$
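The rank-one update formula used above is the Sherman–Morrison identity; a quick numerical check on illustrative inputs:

```python
import numpy as np

# Sketch: check of the rank-one update formula used in the proof,
#   (M + alpha e_k e_k')^{-1}
#     = M^{-1} - alpha M^{-1} e_k e_k' M^{-1} / (1 + alpha e_k' M^{-1} e_k),
# i.e. the Sherman-Morrison identity. All numbers are illustrative.

rng = np.random.default_rng(4)
k = 4
B = rng.standard_normal((k, k))
M = B @ B.T + k * np.eye(k)              # symmetric positive definite
alpha = 0.7
ek = np.zeros(k); ek[k - 1] = 1.0        # k-th canonical basis vector

Minv = np.linalg.inv(M)
lhs = np.linalg.inv(M + alpha * np.outer(ek, ek))
rhs = Minv - alpha * np.outer(Minv @ ek, ek @ Minv) / (1 + alpha * ek @ Minv @ ek)

print(np.max(np.abs(lhs - rhs)))         # numerically zero
```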

We can now combine the results of Theorem 3.1 and Proposition 3.2 to obtain the following corollary.

###### Corollary 3.3

We assume that the assumptions of Theorem 3.1 and Proposition 3.2 hold and that $p/n$ has a finite nonzero limit, as $p$ and $n$ tend to infinity. Then we have

$$w_{\mathrm{emp}}'\hat\Sigma w_{\mathrm{emp}} = \biggl(1-\frac{p-k}{n-1}\biggr)\biggl(w_{\mathrm{theo}}'\Sigma w_{\mathrm{theo}} - \frac{(p/n)\,(U'M^{-1}e_k)^2}{1+(p/n)\, e_k'M^{-1}e_k}\biggr) + o_P\bigl(w_{\mathrm{theo}}'\Sigma w_{\mathrm{theo}} \vee n^{-1/2}\bigr),$$

where $w_{\mathrm{theo}}'\Sigma w_{\mathrm{theo}}$ is the population quantity $U'M^{-1}U$.

The corollary shows that the effects of both covariance and mean estimation are to underestimate the risk, and the empirical frontier is asymptotically deterministic.

### 3.2 On the optimal weights

Our matrix characterization of the empirical optimal weights (Lemma 2.2) allows us to give a precise characterization of the statistical properties of linear functionals of these weights. We give here some exact results, concerning distributions and expectations of those functionals. A longer discussion, including robustness and more detailed bias issues can be found in Section 5.

###### Proposition 3.4

Assume that the assumptions of Theorem 3.1 hold and in particular that the $X_i$'s are i.i.d. $\mathcal{N}(\mu,\Sigma)$. Let $\gamma$ be a fixed $p$-dimensional vector. Let us call $\hat V_\gamma$ the $p\times(k+1)$ matrix whose first $k$ columns are those of $\hat V$ and whose last column is $\gamma$. Let $\hat N_\gamma = (\hat V_\gamma'\Sigma^{-1}\hat V_\gamma)^{-1}$ and $W_\gamma$ be a matrix with distribution $W_{k+1}(\hat N_\gamma, n-p+k)$ (conditional on $\hat\mu$). Then,

$$\gamma'w_{\mathrm{emp}}\,\big|\,\hat\mu \;\stackrel{\mathcal{L}}{=}\; -\frac{\sum_{i=1}^k u_i W_\gamma(i,k+1)}{W_\gamma(k+1,k+1)}.$$

In particular,

$$E(\gamma'w_{\mathrm{emp}}\,|\,\hat\mu) = -\frac{\sum_{i=1}^k u_i \hat N_\gamma(i,k+1)}{\hat N_\gamma(k+1,k+1)}.$$

We note, somewhat heuristically, that when $\mu$ is estimated by $\hat\mu$, since $\hat\mu'\Sigma^{-1}\hat\mu \simeq \mu'\Sigma^{-1}\mu + p/n$, $\hat N_\gamma$ differs markedly from its population counterpart when $p$ and $n$ are both large (we refer again to Section 5 for a more precise statement). Hence $\hat N_\gamma$ is not a consistent estimator of the corresponding population quantity. As we will see in Section 5.2 and as can be expected from the previous proposition, this will also imply bias for linear combinations of empirical optimal weights. We will show in particular that returns are overestimated when using $\hat\mu$ as an estimator for $\mu$.

Another interesting aspect of the previous proposition is that it allows us to understand the fluctuation behavior of $\gamma'w_{\mathrm{emp}}$ when $n$ is large: as a matter of fact, the limiting fluctuation behavior of the entries of a (fixed-dimensional) Wishart matrix with a large number of degrees of freedom is well known [see, e.g., Anderson (2003), Theorem 3.4.4, page 87] and the $\delta$-method can be applied to get the information—conditional on $\hat\mu$.

For instance, if we assume that, conditional on $\hat\mu$, the matrix $\hat N_\gamma$ converges to a matrix $N_0$, which possibly depends on $\hat\mu$, we see that $\gamma'w_{\mathrm{emp}}$ is asymptotically normal (all statements are conditional on $\hat\mu$), if $n-p+k$ goes to infinity when $n$ and $p$ go to infinity. Furthermore, we know the limiting covariance of the entries of $W_\gamma$ (after scaling by $\sqrt{n-p+k}$), using Theorem 3.4.4 in Anderson (2003). Let us call it $\Gamma_0$ and let us call $\nu_0$ the limit of the last column of $\hat N_\gamma$—which we assume exists.

If we assume that $\nu_0(k+1)$ is not 0, Slutsky's lemma and the $\delta$-method give us through simple computations that

\[
\sqrt{n-p+k}\left(\gamma' w_{\mathrm{emp}} + \frac{\sum_{i=1}^{k} u_i \nu_0(i)}{\nu_0(k+1)}\right)\bigg|\,\hat{\mu} \Longrightarrow \frac{1}{\nu_0(k+1)^2}\,\mathcal{N}(0,\, C'\Gamma_0 C),
\]

where $C' = \bigl(-u_1\nu_0(k+1), \ldots, -u_k\nu_0(k+1), \sum_{i=1}^{k} u_i\nu_0(i)\bigr)$ is the vector arising from the gradient of the map $\nu \mapsto -\sum_{i=1}^{k} u_i\nu(i)/\nu(k+1)$ at $\nu_0$.
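As a quick numerical sanity check on the $\delta$-method computation, the following sketch compares the gradient of the map $\nu \mapsto -\sum_{i=1}^{k} u_i\nu(i)/\nu(k+1)$, written in the form $C/\nu_0(k+1)^2$, to a finite-difference approximation. The function names and the numerical values of $u$ and $\nu_0$ are purely illustrative.

```python
import numpy as np

def f(nu, u):
    # f(nu) = -sum_{i<=k} u_i nu(i) / nu(k+1), the map whose
    # linearization drives the delta-method CLT above
    k = len(u)
    return -np.dot(u, nu[:k]) / nu[k]

def C_vector(nu0, u):
    # the vector C, chosen so that grad f(nu0) = C / nu0(k+1)^2
    k = len(u)
    C = np.empty(k + 1)
    C[:k] = -u * nu0[k]          # C(i) = -u_i nu0(k+1), i <= k
    C[k] = np.dot(u, nu0[:k])    # C(k+1) = sum_i u_i nu0(i)
    return C

# hypothetical numbers, for illustration only
u = np.array([1.0, -2.0, 0.5])
nu0 = np.array([0.3, 1.1, -0.4, 2.0])

grad_pred = C_vector(nu0, u) / nu0[-1] ** 2

# central finite differences along each coordinate direction
eps = 1e-6
grad_fd = np.array([
    (f(nu0 + eps * e, u) - f(nu0 - eps * e, u)) / (2 * eps)
    for e in np.eye(len(nu0))
])

assert np.allclose(grad_pred, grad_fd, atol=1e-5)
```

The agreement confirms that the variance in the display above is the usual $\delta$-method variance $\nabla f(\nu_0)'\Gamma_0\nabla f(\nu_0)$, with the factor $\nu_0(k+1)^{-2}$ pulled out of $C$.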

We know the distribution of $\hat{\mu}$, so we could get (limiting) unconditional results for $\gamma' w_{\mathrm{emp}}$. This is not hard but a bit tedious if we want explicit expressions, and because our focus is mostly on first-order properties in this paper, we do not state the result.

Proof of Proposition 3.4. The proof follows from the representation we gave in Lemma 2.2, that is,

\[
\gamma' w_{\mathrm{emp}} = -\frac{1}{(\hat{V}_\gamma' \hat{\Sigma}^{-1} \hat{V}_\gamma)^{-1}(k+1,k+1)}
\begin{pmatrix} U' & 0 \end{pmatrix}
(\hat{V}_\gamma' \hat{\Sigma}^{-1} \hat{V}_\gamma)^{-1}
\begin{pmatrix} 0_k \\ 1 \end{pmatrix},
\]

and the fact that, by the same arguments as before, conditional on $\hat{\mu}$,

\[
(\hat{V}_\gamma' \hat{\Sigma}^{-1} \hat{V}_\gamma)^{-1} \,\big|\, \hat{\mu} \sim \mathcal{W}_{k+1}\bigl((\hat{V}_\gamma' \Sigma^{-1} \hat{V}_\gamma)^{-1},\, n-p+k\bigr)/(n-1).
\]

We conclude that

\[
\gamma' w_{\mathrm{emp}} \,\big|\, \hat{\mu} \stackrel{\mathcal{L}}{=} -\frac{\begin{pmatrix} U' & 0 \end{pmatrix} W_\gamma \begin{pmatrix} 0_k \\ 1 \end{pmatrix}}{W_\gamma(k+1,k+1)} = -\frac{\sum_{i=1}^{k} u_i W_\gamma(i,k+1)}{W_\gamma(k+1,k+1)}.
\]

This shows the first part of the proposition.

The second part follows from the following observation. Suppose the matrix $P$ is $\mathcal{W}(\mathrm{Id}, K)$. If $\alpha$ and $\beta$ are orthogonal vectors, let us consider

\[
\frac{\alpha' P \beta}{\beta' P \beta}.
\]

We can, of course, write $P = \sum_{i=1}^{K} Y_i Y_i'$, where the $Y_i$ are i.i.d. $\mathcal{N}(0,\mathrm{Id})$. In other respects, $Y_i'\alpha$ and $Y_i'\beta$ are clearly independent normal random variables, since their covariance is $\alpha'\beta = 0$ and they are jointly normal. So

\[
E\left(\frac{\alpha' P \beta}{\beta' P \beta} \,\bigg|\, \{Y_i'\beta\}_{i=1}^{K}\right) = 0
\]

because the quantity whose expectation we are taking is a linear combination of mean 0 independent normal random variables. Hence, also,

\[
E\left(\frac{\alpha' P \beta}{\beta' P \beta}\right) = 0.
\]

Now, when $\alpha$ is not orthogonal to $\beta$, we write $\alpha = \frac{\alpha'\beta}{\|\beta\|_2^2}\,\beta + \delta$, where $\delta$ is orthogonal to $\beta$. We immediately deduce that, in general,

\[
E\left(\frac{\alpha' P \beta}{\beta' P \beta}\right) = \frac{\alpha'\beta}{\|\beta\|_2^2} + E\left(\frac{\delta' P \beta}{\beta' P \beta}\right) = \frac{\alpha'\beta}{\|\beta\|_2^2}.
\]

Furthermore, when $P$ is $\mathcal{W}(\Sigma, K)$, because we can write $P = \Sigma^{1/2} P_0 \Sigma^{1/2}$, where $P_0 \sim \mathcal{W}(\mathrm{Id}, K)$, we finally have

\[
E\left(\frac{\alpha' P \beta}{\beta' P \beta}\right) = \frac{\alpha'\Sigma\beta}{\beta'\Sigma\beta}.
\]

In the case of interest to us, we have $P = W_\gamma$, $\alpha' = -(U', 0)$ and $\beta' = (0_k', 1)$. Applying the previous formula gives us the second part of the proposition.
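The expectation identity just derived lends itself to a direct Monte Carlo check. The following sketch, with arbitrary illustrative choices of $\Sigma$, $\alpha$, $\beta$ and a fixed seed, compares the empirical mean of $\alpha' P\beta / \beta' P\beta$ over Wishart draws to $\alpha'\Sigma\beta / \beta'\Sigma\beta$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, K, n_rep = 4, 10, 200_000

# illustrative positive-definite scale matrix and directions
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)
L = np.linalg.cholesky(Sigma)
alpha = rng.standard_normal(p)
beta = rng.standard_normal(p)

# rows of G[r] are i.i.d. N(0, Sigma), so P_r = G[r]' G[r] ~ Wishart(Sigma, K)
G = rng.standard_normal((n_rep, K, p)) @ L.T
Ga = G @ alpha                    # (n_rep, K): entries Y_i' alpha
Gb = G @ beta                     # (n_rep, K): entries Y_i' beta
ratios = (Ga * Gb).sum(axis=1) / (Gb * Gb).sum(axis=1)

mc = ratios.mean()
exact = (alpha @ Sigma @ beta) / (beta @ Sigma @ beta)
```

Note that the degrees of freedom $K$ can stay small here: the identity is exact for any $K$, which is what makes the second part of the proposition an exact (non-asymptotic) statement.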

We now turn to the question of understanding the robustness properties of the Gaussian results we just obtained. We will do so by studying the same problems under more general distributional assumptions, and specifically we will now assume that the observations are elliptically distributed.

## 4 Solutions of quadratic programs when the data is elliptically distributed

In Section 3, we studied the properties of the “plug-in” solution of problem (QP-eqc-Pop) under the assumption that the data was normally distributed. While this allowed us to shed light on the statistical properties of the solution of problem (QP-eqc-Emp), it is naturally extremely important to understand how robust the results are to our normality assumptions.

In this section, we will consider elliptical models, that is, models such that the data can be expressed as

\[
X_i = \mu + \lambda_i \Sigma^{1/2} Y_i,
\]

where $\lambda_i$ is a scalar random variable and the $Y_i$ are i.i.d. vectors with i.i.d. $\mathcal{N}(0,1)$ entries. $\lambda_i$ and $Y_i$ are assumed to be independent, and to lift the indeterminacy between $\lambda_i$ and $\Sigma$, we assume that $E(\lambda_i^2) = 1$. Under this assumption, we clearly have $\operatorname{cov}(X_i) = \Sigma$. We note that this is not the standard definition of elliptical models, which generally replaces $Y_i$ with a vector uniformly distributed on the sphere in $\mathbb{R}^p$, but it captures the essence of the problem. We refer the interested reader to Anderson (2003) and Fang, Kotz and Ng (1990) for extensive discussions of elliptical distributions.
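As an illustration of this data-generating mechanism, the following sketch simulates elliptical data with $E(\lambda_i^2)=1$ and checks that the sample covariance is close to $\Sigma$. The parameter values are hypothetical, and $\lambda_i^2 \sim \operatorname{Exp}(1)$ is just one possible choice of mixing distribution; the Cholesky factor is used in place of $\Sigma^{1/2}$, which yields the same distribution here because the Gaussian $Y_i$ are rotationally invariant.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200_000, 3

# hypothetical population parameters, for illustration only
mu = np.array([1.0, -0.5, 2.0])
A = rng.standard_normal((p, p))
Sigma = A @ A.T / p + np.eye(p)
L = np.linalg.cholesky(Sigma)      # any square root of Sigma works for Gaussian Y_i

# lambda_i independent of Y_i, normalized so that E(lambda_i^2) = 1;
# lambda_i^2 ~ Exp(1) gives data heavier-tailed than Gaussian
lam = np.sqrt(rng.exponential(size=n))
Y = rng.standard_normal((n, p))    # i.i.d. N(0,1) entries
X = mu + lam[:, None] * (Y @ L.T)  # X_i = mu + lambda_i Sigma^{1/2} Y_i

S = np.cov(X, rowvar=False)        # close to Sigma, since E(lambda_i^2) = 1
```

For non-Gaussian $Y_i$ with i.i.d. entries, the symmetric square root $\Sigma^{1/2}$ of the display should be used, since $LY_i$ and $\Sigma^{1/2}Y_i$ need not have the same distribution then.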

Our motivation for undertaking this study comes also from the fact that for certain types of data, such as financial data, it is sometimes argued that elliptical models are more reasonable than Gaussian ones, for instance, because they can capture nontrivial tail dependence [see Frahm and Jaekel (2005) where such models are advocated for high-dimensional modelization of financial returns, Meucci (2005) for a discussion of their relevance for certain financial markets, Biroli, Bouchaud and Potters (2007) for modelization considerations quite similar to Frahm and Jaekel (2005) and McNeil, Frey and Embrechts (2005) for a thorough discussion of tail dependence]. From a theoretical standpoint, considering elliptical models will also help in several other ways: the results will yield alternative proofs to some of the results we obtained in the Gaussian case, they will allow us to deal with some situations where the data are not independent and they will also allow us to understand the properties of the bootstrap.

We also want to point out that elliptical distributions allow us to not fall into the geometric “trap” of standard random matrix models highlighted in El Karoui (2009a): the fact that data vectors drawn from standard random matrix models are essentially assumed to be almost orthogonal to one another and that their norm (after renormalization by ) is almost constant. In a sense, studying elliptical models will allow us to understand what is the impact of the implicit geometric assumptions made about the data when assuming normality. (We purposely do so not under minimal assumptions but under assumptions that capture the essence of the problem while allowing us to show in the proofs the key stochastic phenomena at play.) This part of the article can therefore be viewed as a continuation of the investigation we started in El Karoui (2009a) where we showed a lack of robustness of random matrix models (contradicting claims of “universality”) by thoroughly investigating limiting spectral distribution properties of high-dimensional covariance matrices when the data is drawn according to elliptical models and generalizations. We show here that the theoretical problems we highlighted in El Karoui (2009a) have important practical consequences. [For more references on elliptical models in a random matrix context, we refer the reader to El Karoui (2009a) where an extended bibliography can be found.]

We now turn to the problem of understanding the solution of problem (QP-eqc-Emp) in the setting where the data is elliptically distributed. We will limit ourselves to the case where the constraint matrix consists of known and deterministic vectors, except possibly for the sample mean. In this section we restrict ourselves to convergence-in-probability results. It is clear from Section 2 that to tackle the problems we are considering we need to understand at least three types of quantities: for a deterministic $v$ with unit norm, $v'\hat{\Sigma}^{-1}v$, $v'\hat{\Sigma}^{-1}\hat{\mu}$ and $\hat{\mu}'\hat{\Sigma}^{-1}\hat{\mu}$.

Here is a brief overview of our findings. When we consider elliptical models, our results say that roughly speaking, under certain assumptions given precisely later:

1. , where satisfies, if is the limit law of the empirical distribution of the and , .

2. If , .

3. If , .

All these convergence results are to be understood in probability. They naturally allow us—under certain conditions on the population parameters—to conclude about the convergence in probability of the matrix . The results mentioned above are stated in all details in Theorems 4.1 and 4.6.

In the situation where the $\lambda_i$ are i.i.d., the results above hold when the $\lambda_i$ have a second moment and their distribution does not put too much mass near 0. This is interesting in practice because it tells us that our results hold for heavy-tailed data, which are of particular interest in some financial applications.

The bootstrap situation corresponds basically to $\lambda_i^2$ being Poisson(1), which we denote by $\mathcal{P}(1)$. Also, in the statements above, one should replace