Numerical Implementation of the QuEST Function

The authors wish to thank Edgar Dobriban (Stanford University), Jonathan Fletcher (University of Strathclyde), Matan Gavish (Stanford University), Na Huang (London School of Economics and Political Science), Wen Jun (National University of Singapore), Tatsuya Kubokawa (University of Tokyo), Clifford Lam (London School of Economics and Political Science), Artyom Lepold (University of Kaiserslautern), Stefano Monni (American University of Beirut), Nestor Parolya (University of Hannover), Simon Welsing (Technische Universität München), and Zhao Zhao (Huazhong University of Science and Technology) for testing early versions of this code. Any errors are ours.

Abstract

This paper deals with certain estimation problems involving the covariance matrix in large dimensions. Due to the breakdown of finite-dimensional asymptotic theory when the dimension is not negligible with respect to the sample size, it is necessary to resort to an alternative framework known as large-dimensional asymptotics. Recently, [27] have proposed an estimator of the eigenvalues of the population covariance matrix that is consistent according to a mean-square criterion under large-dimensional asymptotics. It requires numerical inversion of a multivariate nonrandom function which they call the QuEST function. The present paper explains how to numerically implement the QuEST function in practice through a series of six successive steps. It also provides an algorithm to compute the Jacobian analytically, which is necessary for numerical inversion by a nonlinear optimizer. Monte Carlo simulations document the effectiveness of the code.

JEL CLASSIFICATION NOS: C13, C61, C87.

1 Introduction

Many data sets in econometrics, biostatistics and electrical engineering, among a host of other fields, contain large numbers of related variables. The estimation of the covariance matrix poses challenging statistical problems when the dimension is not small relative to sample size. Approximations that are valid under traditional asymptotics, that is, when the dimension remains fixed while the sample size goes to infinity, perform poorly. This is why attention has turned to large-dimensional asymptotics where the dimension and the sample size go to infinity together, with their ratio converging to a finite, nonzero limit called the concentration (ratio).

Under large-dimensional asymptotics, the sample eigenvalues are not consistent estimators of the population eigenvalues. A new estimator for the population eigenvalues under large-dimensional asymptotics was recently introduced by [27]. It hinges critically on a multivariate nonrandom function called the QuEST function. This acronym stands for Quantized Eigenvalues Sampling Transform. [27] provide the mathematical definition of the QuEST function, but do not give any details about its numerical implementation. The problem of numerical implementation is non-trivial, due to the complexity of the definition of the QuEST function. A direct application of this method is the optimal estimation of the covariance matrix in the class of rotation-equivariant estimators introduced by [41] under various loss functions; see [25].

This paper explains how to numerically implement the QuEST function accurately and efficiently. In addition, given that the estimation of the population eigenvalues requires numerically inverting the QuEST function using a nonlinear optimizer, we also give the Jacobian analytically.

Section 2 reviews the literature on this subject. Section 3 gives the definition of the problem that will be solved numerically. Sections 4–9 describe in detail the six steps needed to implement the QuEST function numerically, delineating all the mathematical results that are needed along the way. Section 10 provides extensive Monte Carlo simulations. Section 11 concludes.

2 Literature Review

2.1 Estimation of the Population Covariance Matrix Eigenvalues

[10] proposed a way to estimate the empirical c.d.f. of population eigenvalues under large-dimensional asymptotics using a different approach than the QuEST function. However, the code executing his algorithm was not made available to other researchers in the field, and those who tried to replicate it themselves did not enjoy much success. The state of affairs is aptly summarized by [28]:

Actually, the general approach in [10] has several implementation issues that seem to be responsible for its relatively low performance as attested by the very simple nature of provided simulation results.

There are three reasons why the same criticisms cannot be levelled against the QuEST function: first, a Matlab executable implementing the QuEST function has already been used independently by [44], [16], [15], and [20], among others1; second, the present paper opens up the code of the QuEST function and its Jacobian to the general public for inspection and potential improvements; and third, Section 10 provides an extensive Monte Carlo study with nearly a third of a million simulations across a variety of challenging scenarios.

Apart from [10], other proposals have been put forward, making this field one of the most active ones in multivariate analysis in recent years.

  • [36] provide a solution when the population spectrum has a staircase structure, typically with half of the eigenvalues equal to one and the rest equal to two. The ability of this approach to handle the general case, where there can be up to $p$ distinct population eigenvalues with $p$ going to infinity, is not established.

  • [32] provides a solution when the concentration ratio is sufficiently small and/or the distinct population eigenvalues sufficiently far from one another, that is, when the sample eigenvalues display what is known as “spectral separation”. This is a favorable situation where the sample eigenvalues are grouped into easily identifiable clusters, each cluster corresponding to one single population eigenvalue (which can have multiplicity higher than one). Monte Carlo simulations assume no more than four distinct population eigenvalues.

  • [3] propose a solution based on the method of moments when the parametric dimension of the population spectrum is finite. They demonstrate good behavior up to order four.

  • [7] elaborate on the previous paper by providing more rigorous justification of the method when the model order is unknown. But Monte Carlo simulations only go to order three.

  • [46] can be seen as a cross between the papers of [32] and [3], but also requiring a finite number of distinct population eigenvalues. In practice, Monte Carlo simulations provided by the authors do not go above three distinct population eigenvalues.

The common point between all these other methods is that they do not purport to address the general case. They work with a finite number of degrees of freedom (in practice no more than four) in the choice of the population spectral distribution, whereas the real number of degrees of freedom is $p$, which goes to infinity. This is why it is important to avoid the criticisms that have been levelled at the only other ostensibly general approach, that of [10], by fully explaining how to numerically implement the QuEST function, and by providing extensive Monte Carlo simulations showing that it works in practice under a wide variety of circumstances.

Finally, we should note that [9] also provides a numerical method for solving the Marčenko-Pastur equation of [30]. He does not compute the QuEST function explicitly, and does not provide the Jacobian analytically. As a result, numerical inversion is very difficult; his paper, however, is not focused on the problem of recovering the population eigenvalues.

2.2 Potential Applications

The numerical implementation of the QuEST function given in this paper is essential for the estimation of the population eigenvalues, which in turn is essential for computing the optimal nonlinear shrinkage of the covariance matrix under large-dimensional asymptotics. Many fields are interested in shrinking the covariance matrix when the number of variables is high:

Acoustics

Optimally removing noise from signals captured from an array of hydrophones [48].

Cancer Research

Mapping out the influence of the Human Papillomavirus (HPV) on gene expression [35].

Chemistry

Estimating the temporal autocorrelation function (TACF) for fluorescence correlation spectroscopy [12].

Civil Engineering

Detecting and identifying vibration–based bridge damage through Random Coefficient Pooled (RCP) models [33].

Climatology

Detecting trends in average global temperature through the optimal fingerprinting method [37].

Econometrics

Specifying the target covariance matrix in the Dynamic Conditional Correlation (DCC) model to capture time-series effects in the second moments [13].

Electromagnetics

Studying correlation between reverberation chamber measurements collected at different stirrer positions [34].

Entertainment Technology

Designing a video game controlled by performing tricks on a skateboard [1].

Finance

Reducing the risk in large portfolios of stocks [17].

Genetics

Inferring large-scale covariance matrices from functional genomic data [38].

Geology

Modeling multiphase flow in subsurface petroleum reservoirs with the iterative stochastic ensemble method (ISEM) on inverse problems [11].

Image Recognition

Detecting anomalous pixels in hyperspectral imagery [2].

Neuroscience

Calibrating brain-computer interfaces [29].

Psychology

Modeling co-morbidity patterns among mental disorders [31].

Road Safety Research

Developing an emergency braking assistance system [14].

Signal Processing

Combining data recorded by an array of sensors to minimize the noise [8].

Speech Recognition

Automatically transcribing records of phone conversations [6].

Up until now, these fields have had to content themselves with linear shrinkage estimation of the covariance matrix [22]. However, this approach is asymptotically suboptimal in the class of rotation-equivariant estimators relative to nonlinear shrinkage, which requires numerical implementation of the QuEST function. The present paper makes this new and improved method universally available in practice.

3 Definition of the QuEST Function

The mathematical definition of the QuEST function is given by [27]. It is reproduced here for convenience. For any positive integers $n$ and $p$, the QuEST function, denoted by $Q_{n,p}$, is the nonrandom multivariate function given by:

\[ Q_{n,p} : \quad t := (t_1,\ldots,t_p)' \;\longmapsto\; Q_{n,p}(t) := \big(q_{n,p}^1(t),\ldots,q_{n,p}^p(t)\big)' , \tag{1} \]

where

\[ \forall i=1,\ldots,p: \qquad q_{n,p}^i(t) := p\int_{(i-1)/p}^{i/p} \big(F_{n,p}^t\big)^{-1}(v)\,dv , \qquad \text{with} \quad \big(F_{n,p}^t\big)^{-1}(v) := \sup\{x\in\mathbb{R} : F_{n,p}^t(x)\le v\} , \tag{2} \]

\[ \forall x\in\mathbb{R}: \qquad F_{n,p}^t(x) := \lim_{\eta\to0^+}\,\frac{1}{\pi}\int_{-\infty}^{x} \mathrm{Im}\!\left[m_{n,p}^t(\xi+i\eta)\right] d\xi , \tag{3} \]

and, for every $z$ in the upper half $\mathbb{C}^+$ of the complex plane, $m := m_{n,p}^t(z)$ is the unique solution in the set

\[ \left\{ m\in\mathbb{C} \;:\; -\,\frac{n-p}{n\,z} + \frac{p}{n}\,m \in \mathbb{C}^+ \right\} \]

to the equation

\[ m = \frac{1}{p}\sum_{i=1}^{p} \frac{1}{t_i\big(1-\tfrac{p}{n}-\tfrac{p}{n}\,z\,m\big) - z} . \tag{4} \]

The QuEST function is a natural discretization of equation (1.4) of [39], which is itself a reformulation of equation (1.14) of [30]. The basic idea is that $p$ represents the matrix dimension, $n$ the sample size, $t$ the population eigenvalues, $Q_{n,p}(t)$ the sample eigenvalues, $F_{n,p}^t$ the limiting empirical c.d.f. of sample eigenvalues, and $m_{n,p}^t$ its Stieltjes transform [43]. A fundamental result in large-dimensional asymptotics is that the relationship between the population spectral distribution and the sample spectral distribution is nonrandom in the limit. The figure below, publicized by Jianfeng Yao [45] in a conference presentation, gives a heuristic view of the area where Marčenko-Pastur asymptotic theory is more useful (labelled “MP area”) vs. the area where standard fixed-dimension asymptotic theory applies (labelled “Low-dim area”).

[Figure: heuristic partition of the $(n,p)$ plane into the “MP area”, where Marčenko-Pastur asymptotics are more useful, and the “Low-dim area”, where standard fixed-dimension asymptotics suffice.]

This insight is further developed in the recent book by [47]. Readers interested in the background from probability theory may also consult the authoritative monograph by [5].
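As a concrete illustration of this nonrandom limiting relationship, here is a minimal simulation (ours, not part of the original paper): even when every population eigenvalue equals one, the sample eigenvalues do not concentrate around one but spread over the whole Marčenko-Pastur support.

```python
import numpy as np

# Minimal illustration (ours): with identity covariance, all population
# eigenvalues equal 1, yet for c = p/n = 1/3 the sample eigenvalues spread
# over the whole Marcenko-Pastur support [(1-sqrt(c))^2, (1+sqrt(c))^2].
rng = np.random.default_rng(0)
n, p = 600, 200
c = p / n
X = rng.standard_normal((n, p))               # i.i.d. data, true covariance = I
sample_eigs = np.linalg.eigvalsh(X.T @ X / n) # sample covariance eigenvalues
print(sample_eigs.min(), sample_eigs.max())   # ~0.18 and ~2.49, not 1 and 1
print((1 - c**0.5)**2, (1 + c**0.5)**2)       # theoretical support edges
```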

The importance of the QuEST function is twofold. First, inverting it numerically yields an estimator of the population eigenvalues that is consistent under large-dimensional asymptotics. Second, once this has been achieved, it is possible to use Theorem 2 of [21] to construct shrinkage estimators of the covariance matrix that are asymptotically optimal with respect to a given loss function in the $p$-dimensional space of rotation-equivariant estimators introduced by [41]. [25] derive the optimal shrinkage formula for five different loss functions, and [26] for a sixth.

Numerical implementation of the QuEST function consists in a series of six successive operations: 1) finding the support of $F_{n,p}^t$; 2) choosing a grid that covers the support; 3) solving equation (4) on the grid; 4) computing the sample spectral density; 5) integrating it to obtain the empirical c.d.f. of sample eigenvalues; and 6) interpolating the c.d.f. to compute sample eigenvalues as per equation (2). Each of these steps is detailed below.
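To fix ideas, the following self-contained sketch (our notation, not the authors' reference Matlab code) chains the six operations together in the one special case where every step has a closed form, namely $t_1=\cdots=t_p=1$ with $p<n$, so that $F_{n,p}^t$ is the Marčenko-Pastur law:

```python
import numpy as np

# A minimal analogue of the six steps for the special case tau_1 = ... = 1,
# where the limiting sample spectral distribution is the Marcenko-Pastur law.
def quest_identity(p, n, g=4000):
    c = p / n                                          # concentration (c < 1 here)
    a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2    # step 1: support endpoints
    q = np.linspace(0, 1, g)[1:-1]
    x = a + (b - a) * np.sin(np.pi * q / 2)**2         # step 2: arcsine-spaced grid
    f = np.sqrt((b - x) * (x - a)) / (2*np.pi*c*x)     # steps 3-4: MP density (closed form)
    F = np.concatenate(([0.0], np.cumsum(np.diff(x) * (f[1:] + f[:-1]) / 2)))
    F /= F[-1]                                         # step 5: c.d.f. by trapezoidal rule
    u = (np.arange(p) + 0.5) / p                       # step 6: midpoint approximation
    return np.interp(u, F, x)                          #         of the quantiles in eq. (2)

print(quest_identity(p=200, n=600)[:5])                # smallest few 'sample' eigenvalues
```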

4 Support

In what follows we omit the subscripts $n,p$ and the superscript $t$ of $m_{n,p}^t$ in order to simplify the notation. We do not work directly with $m$ but with $u$, which is defined by:

\[ \forall z\in\mathbb{C}^+: \qquad u(z) := -\frac{1}{\breve m(z)} , \qquad \text{where} \quad \breve m(z) := \frac{p}{n}\,m(z) - \frac{n-p}{n\,z} \]

is the companion transform appearing in the set defined above. There is a direct mapping between $m$-space and $u$-space, as explained in Section 2 of [24]. Numerically it is more judicious to work in $u$-space.

To determine the image of the support of $F_{n,p}^t$ in $u$-space, we first need to group together the population eigenvalues that are equal to one another and, if necessary, discard those that are equal to zero. Let us say that there are $K$ distinct nonzero population eigenvalues $\tau_1<\tau_2<\cdots<\tau_K$. We can associate them with their respective weights $w_1,\ldots,w_K$: if $p_k$ elements of the vector $t$ are equal to $\tau_k$, then the corresponding weight is $w_k := p_k/p$.
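In code, this preliminary grouping step might look as follows (a sketch; the names are ours):

```python
import numpy as np

# Sketch of the preliminary bookkeeping: group equal population eigenvalues,
# drop those equal to zero, and attach weights w_k = multiplicity / p.
def distinct_spectrum(t):
    t = np.asarray(t, dtype=float)
    tau, counts = np.unique(t[t > 0], return_counts=True)  # sorted distinct values
    w = counts / len(t)
    return tau, w

tau, w = distinct_spectrum([0.0, 1.0, 1.0, 2.0, 5.0])
print(tau, w)   # [1. 2. 5.] [0.4 0.2 0.2]
```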

4.1 Spectral Separation

Now we look for spectral separation between $\tau_i$ and $\tau_{i+1}$ (for $i=1,\ldots,K-1$). This is done in two stages. First we run a quick test to see whether we can rule out spectral separation a priori. Second, if the test is inconclusive, we do the full analysis to ascertain whether spectral separation does indeed occur.

Necessary Condition

Spectral separation occurs between $\tau_i$ and $\tau_{i+1}$ if and only if

\[ \exists u\in(\tau_i,\tau_{i+1}): \qquad \sum_{k=1}^{K}\frac{w_k\,\tau_k^2}{(\tau_k-u)^2} < \frac{n}{p} , \]

which is equivalent to

\[ \min_{u\in(\tau_i,\tau_{i+1})}\;\sum_{k=1}^{K}\frac{w_k\,\tau_k^2}{(\tau_k-u)^2} \;<\; \frac{n}{p} . \tag{5} \]

Equation (5) is equivalent to the function defined in equation (1.6) of [40] being strictly increasing at $\breve m=-1/u$ for some $u\in(\tau_i,\tau_{i+1})$. Section 4 of [40] explains how this enables us to determine the support.

Call $\varphi$ the function on the left-hand side of equation (5):

\[ \forall u\in(\tau_i,\tau_{i+1}): \qquad \varphi(u) := \sum_{k=1}^{K}\frac{w_k\,\tau_k^2}{(\tau_k-u)^2} . \]

We can decompose it into $\varphi = \varphi^- + \varphi^0 + \varphi^+$, where

\[ \varphi^-(u) := \sum_{k=1}^{i-1}\frac{w_k\,\tau_k^2}{(\tau_k-u)^2}, \qquad \varphi^0(u) := \frac{w_i\,\tau_i^2}{(\tau_i-u)^2}+\frac{w_{i+1}\,\tau_{i+1}^2}{(\tau_{i+1}-u)^2}, \qquad \varphi^+(u) := \sum_{k=i+2}^{K}\frac{w_k\,\tau_k^2}{(\tau_k-u)^2} . \]

It is easy to see that the function $\varphi^0$ is convex over the interval $(\tau_i,\tau_{i+1})$, diverges to $+\infty$ near $\tau_i$ and $\tau_{i+1}$, and attains its minimum at

\[ \bar u_i := \frac{\tau_i + r_i\,\tau_{i+1}}{1+r_i} , \qquad \text{where} \quad r_i := \left(\frac{w_i\,\tau_i^2}{w_{i+1}\,\tau_{i+1}^2}\right)^{1/3} ; \tag{6} \]

therefore a lower bound for $\varphi^0$ on $(\tau_i,\tau_{i+1})$ is $\varphi^0(\bar u_i)$.

It is also easy to see that the function $\varphi^-$ is decreasing over the interval $(\tau_i,\tau_{i+1})$; therefore, it attains its minimum at $\tau_{i+1}$ and is bounded from below by $\varphi^-(\tau_{i+1})$. Conversely, the function $\varphi^+$ is increasing over the interval $(\tau_i,\tau_{i+1})$, attains its minimum at $\tau_i$ and is bounded from below by $\varphi^+(\tau_i)$. Putting these three results together yields the following lower bound for $\varphi$:

\[ \forall u\in(\tau_i,\tau_{i+1}): \qquad \varphi(u) \;\ge\; \varphi^-(\tau_{i+1}) + \varphi^0(\bar u_i) + \varphi^+(\tau_i) , \]

where $\bar u_i$ is given by equation (6).

Combining this bound with equation (5) means that

\[ \varphi^-(\tau_{i+1}) + \varphi^0(\bar u_i) + \varphi^+(\tau_i) \;<\; \frac{n}{p} \tag{7} \]

is a necessary (but not sufficient) condition for spectral separation to occur between $\tau_i$ and $\tau_{i+1}$. Thus, the numerical procedure can be made more efficient by first computing the quantity on the left-hand side of equation (7), comparing it to $n/p$, and discarding the interval $(\tau_i,\tau_{i+1})$ in the case where it is higher than $n/p$. If, on the other hand, it is strictly lower than $n/p$, then further work is needed to ascertain whether spectral separation does indeed occur. In practice, checking this condition seems to save a lot of time by eliminating many intervals $(\tau_i,\tau_{i+1})$, except perhaps when the concentration ratio $p/n$ is very small and the population eigenvalues are very spread out.
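A sketch of the quick test, assuming the decomposition and the formulas (6)–(7) as reconstructed above:

```python
import numpy as np

# Sketch of the quick a-priori test for the interval (tau_i, tau_{i+1}), using
# the lower bound (7) with the minimizer formula (6) reconstructed above.
# Indices are 0-based: calling with i tests separation between tau[i], tau[i+1].
def separation_possible(i, tau, w, p, n):
    a, b = tau[i], tau[i + 1]
    r = (w[i] * a**2 / (w[i + 1] * b**2)) ** (1 / 3)
    u_bar = (a + r * b) / (1 + r)                                     # eq. (6)
    mid = w[i] * a**2 / (a - u_bar)**2 + w[i + 1] * b**2 / (b - u_bar)**2
    low = np.sum(w[:i] * tau[:i]**2 / (tau[:i] - b)**2)               # phi^- at tau_{i+1}
    high = np.sum(w[i + 2:] * tau[i + 2:]**2 / (tau[i + 2:] - a)**2)  # phi^+ at tau_i
    return low + mid + high < n / p                                   # condition (7)
```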

Necessary and Sufficient Condition

Consider now some $i$ for which the condition in equation (7) does not hold. Given equation (5), we need to find the minimum of $\varphi$ over $(\tau_i,\tau_{i+1})$ and compare it to $n/p$. It is easy to check that $\varphi$ is strictly convex over $(\tau_i,\tau_{i+1})$, therefore this minimum exists, is unique, and is the only zero in $(\tau_i,\tau_{i+1})$ of the derivative function

\[ \varphi'(u) = \sum_{k=1}^{K}\frac{2\,w_k\,\tau_k^2}{(\tau_k-u)^3} . \]

Most numerical algorithms that find the zero of a function require as inputs two points $a<b$ such that the sign of the function is not the same at $a$ as at $b$. Finding two such points is the goal of the next step. There are three cases, depending on the sign of $\varphi'(\bar u_i)$.

  • $\varphi'(\bar u_i)=0$: Then the search is immediately over because $\varphi$ attains its minimum at $\bar u_i$. This would not happen generically unless $K=2$.

  • $\varphi'(\bar u_i)<0$: In this case, given that $\varphi'$ is strictly increasing, the minimizer of $\varphi$ lies in the interval $(\bar u_i,\tau_{i+1})$. We can feed the lower bound $a:=\bar u_i$ into the numerical procedure that will find the zero of $\varphi'$. It would be also tempting to set $b:=\tau_{i+1}$, but unfortunately doing so would not be practicable because $\varphi'(u)\to+\infty$ as $u\to\tau_{i+1}$, and most numerical procedures perform poorly near singularity points. Therefore we need to find some $b<\tau_{i+1}$ such that $\varphi'(b)\ge0$. Let $\tilde u_i$ denote the unique value in $(\bar u_i,\tau_{i+1})$ such that

    \[ \frac{2\,w_{i+1}\,\tau_{i+1}^2}{(\tau_{i+1}-\tilde u_i)^3} = \sum_{k=1}^{i}\frac{2\,w_k\,\tau_k^2}{(\bar u_i-\tau_k)^3} . \]

    Then the fact that $2w_k\tau_k^2/(\tau_k-u)^3$ is increasing in $u$ for any $k\le i$ implies that the following inequalities hold:

    \[ \forall u\in[\tilde u_i,\tau_{i+1}): \qquad \varphi'(u) \;\ge\; \frac{2\,w_{i+1}\,\tau_{i+1}^2}{(\tau_{i+1}-u)^3} - \sum_{k=1}^{i}\frac{2\,w_k\,\tau_k^2}{(\bar u_i-\tau_k)^3} \;\ge\; 0 , \]

    where the last inequality follows from $u\ge\tilde u_i$. Thus if we set

    \[ b := \tilde u_i = \tau_{i+1} - \left(\frac{w_{i+1}\,\tau_{i+1}^2}{\sum_{k=1}^{i} w_k\,\tau_k^2/(\bar u_i-\tau_k)^3}\right)^{1/3} , \]

    we know that $\varphi'(b)\ge0$. Launching a zero-finding algorithm for $\varphi'$ on the interval $[a,b]$ as defined above yields a unique solution $u_i^*$.

  • $\varphi'(\bar u_i)>0$: A similar line of reasoning points us to

    \[ a := \tau_i + \left(\frac{w_i\,\tau_i^2}{\sum_{k=i+1}^{K} w_k\,\tau_k^2/(\tau_k-\bar u_i)^3}\right)^{1/3} , \qquad b := \bar u_i , \]

    and yields a unique zero $u_i^*$ for $\varphi'$ over the interval $[a,b]$.

Across all three cases, the outcome of this procedure is $u_i^* := \arg\min_{u\in(\tau_i,\tau_{i+1})}\varphi(u)$. Spectral separation occurs between $\tau_i$ and $\tau_{i+1}$ if and only if $\varphi(u_i^*)<n/p$.

If there is no spectral separation, then we can dismiss the interval $(\tau_i,\tau_{i+1})$. Otherwise we need some additional work to compute the support boundaries.
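Putting the three cases together, a bracketing-and-root-finding sketch might look as follows; the bracket formulas are the reconstructed ones above, and SciPy's `brentq` stands in for the generic univariate zero-finder:

```python
import numpy as np
from scipy.optimize import brentq

# Sketch: locate the interior minimizer u_i* of phi on (tau_i, tau_{i+1}) as
# the unique zero of phi'. Indices are 0-based; u_bar comes from eq. (6).
def phi(u, tau, w):
    return np.sum(w * tau**2 / (tau - u)**2)

def phi_prime(u, tau, w):
    return np.sum(2 * w * tau**2 / (tau - u)**3)

def interior_minimizer(i, tau, w, u_bar):
    d = phi_prime(u_bar, tau, w)
    if d == 0.0:                      # minimum attained exactly at u_bar
        return u_bar
    if d < 0.0:                       # minimizer in (u_bar, tau_{i+1})
        s = np.sum(w[:i+1] * tau[:i+1]**2 / (u_bar - tau[:i+1])**3)
        b = tau[i+1] - (w[i+1] * tau[i+1]**2 / s)**(1/3)   # phi'(b) >= 0
        return brentq(phi_prime, u_bar, b, args=(tau, w))
    s = np.sum(w[i+1:] * tau[i+1:]**2 / (tau[i+1:] - u_bar)**3)
    a = tau[i] + (w[i] * tau[i]**2 / s)**(1/3)             # phi'(a) <= 0
    return brentq(phi_prime, a, u_bar, args=(tau, w))

# Separation occurs iff phi(u_star, tau, w) < n/p, with u_star as returned above.
```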

Interval Boundaries

Consider now some $i$ for which $u_i^*$ is known and $\varphi(u_i^*)<n/p$. Spectral separation means that the support ends at some point in $(\tau_i,u_i^*)$ and starts again at some point in $(u_i^*,\tau_{i+1})$. The equation that characterizes support endpoints is $\varphi(u)=n/p$. Thus we need to find two zeros of the function $\varphi-n/p$, one in the interval $(\tau_i,u_i^*)$ and the other in the interval $(u_i^*,\tau_{i+1})$.

Let us start with the first zero of the function $\varphi-n/p$, the one that lies in the interval $(\tau_i,u_i^*)$. Once again, we employ an off-the-shelf univariate zero-finding routine that takes as inputs two points $a<b$ such that $\varphi(a)-n/p\ge0$ and $\varphi(b)-n/p<0$. The obvious candidate for $b$ is $u_i^*$. For $a$, however, we cannot use $\tau_i$ because $\varphi(u)\to+\infty$ as $u\to\tau_i$. Therefore we need to find some $a\in(\tau_i,u_i^*)$ that verifies $\varphi(a)\ge n/p$. Such an $a$ can be found by considering the following series of inequalities, which hold for all $u\in(\tau_i,u_i^*)$:

\[ \varphi(u) \;\ge\; \frac{w_i\,\tau_i^2}{(\tau_i-u)^2} . \]

Notice that if we set

\[ a := \tau_i\left(1+\sqrt{\frac{p\,w_i}{n}}\right) , \]

then

\[ \frac{w_i\,\tau_i^2}{(\tau_i-a)^2} = \frac{n}{p} ; \]

therefore, $\varphi(a)\ge n/p$. Feeding $[a,b]$ thus defined into the zero-finding numerical routine with the function $\varphi-n/p$ yields an endpoint of the support.

A similar line of reasoning leads to setting $a := u_i^*$,

\[ b := \tau_{i+1}\left(1-\sqrt{\frac{p\,w_{i+1}}{n}}\right) , \]

and running a numerical routine to find a zero of the function $\varphi-n/p$ on the interval $[a,b]$. This zero will also be a support endpoint.
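In code, the two boundary searches might look as follows (a sketch under the bracket formulas above):

```python
import numpy as np
from scipy.optimize import brentq

# Sketch: once phi(u_star) < n/p, the support ends and resumes at the two
# roots of phi(u) = n/p around u_star. The brackets keep the solver away
# from the poles at tau_i and tau_{i+1}. Indices are 0-based.
def separation_endpoints(i, tau, w, p, n, u_star):
    f = lambda u: np.sum(w * tau**2 / (tau - u)**2) - n / p
    a = tau[i] * (1 + np.sqrt(p * w[i] / n))         # f(a) >= 0
    left = brentq(f, a, u_star)                      # support ends here
    b = tau[i+1] * (1 - np.sqrt(p * w[i+1] / n))     # f(b) >= 0
    right = brentq(f, u_star, b)                     # support resumes here
    return left, right
```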

4.2 Extremities of the Support

The procedure described so far identifies all support endpoints lying in the interval $(\tau_1,\tau_K)$. In order to complete the determination of the support, we must find the support endpoint that lies below $\tau_1$ and the support endpoint that lies in the interval $(\tau_K,+\infty)$.

Minimum of the Support

Let us start with the first support endpoint, the one lying below $\tau_1$. The equation that characterizes this point is the same as before: $\varphi(u)=n/p$. In order to employ the zero-finding numerical routine, we must find two bounds $a$ and $b$, both strictly less than $\tau_1$, such that $\varphi(a)-n/p\le0$ and $\varphi(b)-n/p\ge0$. The left-hand side bound can be obtained by considering the following inequalities:

\[ \forall u<\tau_1: \qquad \varphi(u) = \sum_{k=1}^{K}\frac{w_k\,\tau_k^2}{(\tau_k-u)^2} \;\le\; \frac{\sum_{k=1}^{K}w_k\,\tau_k^2}{(\tau_1-u)^2} . \tag{8} \]

Notice that if we set

\[ a := \tau_1 - \sqrt{\frac{p}{n}\sum_{k=1}^{K}w_k\,\tau_k^2} , \]

then

\[ \frac{\sum_{k=1}^{K}w_k\,\tau_k^2}{(\tau_1-a)^2} = \frac{n}{p} , \]

which in turn implies by equation (8) that $\varphi(a)-n/p\le0$, as desired.

The right-hand side bound can be found by considering a different inequality:

\[ \forall u<\tau_1: \qquad \varphi(u) \;\ge\; \frac{w_1\,\tau_1^2}{(\tau_1-u)^2} . \tag{9} \]

Notice that if we set

\[ b := \tau_1\left(1-\sqrt{\frac{p\,w_1}{n}}\right) , \]

then

\[ \frac{w_1\,\tau_1^2}{(\tau_1-b)^2} = \frac{n}{p} , \]

which in turn implies by equation (9) that $\varphi(b)-n/p\ge0$, as desired. Launching the numerical routine to find a zero of the function $\varphi-n/p$ over the interval $[a,b]$ thus defined yields the first endpoint of the support.

Maximum of the Support

For the last endpoint of the support, the one that lies in the interval $(\tau_K,+\infty)$, a similar line of reasoning leads us to define:

\[ a := \tau_K\left(1+\sqrt{\frac{p\,w_K}{n}}\right) , \qquad b := \tau_K + \sqrt{\frac{p}{n}\sum_{k=1}^{K}w_k\,\tau_k^2} . \]

Launching the numerical routine to find a zero of the function $\varphi-n/p$ over the interval $[a,b]$ thus defined yields the last endpoint of the support.
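A sketch combining both extremity searches (brackets as reconstructed above):

```python
import numpy as np
from scipy.optimize import brentq

# Sketch: the smallest and largest support endpoints in u-space solve
# phi(u) = n/p below tau_1 and above tau_K.
def support_extremities(tau, w, p, n):
    f = lambda u: np.sum(w * tau**2 / (tau - u)**2) - n / p
    span = np.sqrt(p * np.sum(w * tau**2) / n)       # makes f <= 0 at tau_1 - span
    u_min = brentq(f, tau[0] - span, tau[0] * (1 - np.sqrt(p * w[0] / n)))
    u_max = brentq(f, tau[-1] * (1 + np.sqrt(p * w[-1] / n)), tau[-1] + span)
    return u_min, u_max
```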

4.3 Output

The main outputs of this procedure are $\nu$, the number of distinct intervals that constitute the support, and $u_1<u_2<\cdots<u_{2\nu}$, the support endpoints. The support in $u$-space is $\bigcup_{j=1}^{\nu}[u_{2j-1},u_{2j}]$.

Another output of this procedure is a set of positive integers $\omega_1,\ldots,\omega_\nu$ summing up to $p$ that tell us how many population eigenvalues correspond to each support interval. If $\nu=1$ then there is no spectral separation and $\omega_1=p$. If $\nu\ge2$ and the first spectral separation occurs between $\tau_i$ and $\tau_{i+1}$ for some $i\in\{1,\ldots,K-1\}$, then $\omega_1=p\,(w_1+\cdots+w_i)$. If some population eigenvalues are equal to zero then $\omega_1$ needs to be augmented accordingly.

If $\nu\ge2$ and the last spectral separation occurs between $\tau_{i'}$ and $\tau_{i'+1}$ for some $i'\in\{1,\ldots,K-1\}$, then $\omega_\nu=p\,(w_{i'+1}+\cdots+w_K)$. If $\nu\ge3$ and the support interval $[u_{2j-1},u_{2j}]$ (for $j=2,\ldots,\nu-1$) is delimited on the left-hand side by spectral separation occurring between $\tau_{i_1}$ and $\tau_{i_1+1}$, and on the right-hand side by spectral separation occurring between $\tau_{i_2}$ and $\tau_{i_2+1}$ (where $i_1<i_2$), then $\omega_j=p\,(w_{i_1+1}+\cdots+w_{i_2})$. This information will turn out to be useful in subsequent operations.
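This bookkeeping is easy to mechanize; the following sketch (our notation) maps multiplicities and separation positions to $\omega_1,\ldots,\omega_\nu$:

```python
import numpy as np

# Bookkeeping sketch: given multiplicities of tau_1..tau_K and the 0-based
# indices i at which separation occurs (between tau[i] and tau[i+1]), count
# how many of the p population eigenvalues fall into each support interval.
def eigenvalues_per_interval(counts, breaks, n_zeros=0):
    edges = [0] + [i + 1 for i in sorted(breaks)] + [len(counts)]
    omega = [int(np.sum(counts[a:b])) for a, b in zip(edges[:-1], edges[1:])]
    omega[0] += n_zeros     # eigenvalues equal to zero join the first interval
    return omega

print(eigenvalues_per_interval(np.array([2, 3, 5]), breaks=[1]))   # [5, 5]
```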

4.4 Derivative of the Support Endpoints

If the QuEST function defined by equations (1)–(4) is to be used efficiently in an optimization algorithm, it is desirable to be able to compute its derivative analytically. Since this function is constructed as a chain of six successive operations, the first of which is the determination of support endpoints, its derivative can be computed in the same way, provided that we start by computing analytically the derivative of the support endpoints with respect to the distinct population eigenvalues $\tau_j$ for all $j=1,\ldots,K$.

Every $u_k$ for $k=1,\ldots,2\nu$ is a zero of the function

\[ \Phi(u) := \sum_{j=1}^{K}\frac{w_j\,\tau_j^2}{(\tau_j-u)^2} - \frac{n}{p} . \]

By differentiating the equation $\Phi(u_k)=0$ with respect to $\tau_j$ we get:

\[ \frac{\partial\Phi}{\partial\tau_j}(u_k) + \Phi'(u_k)\,\frac{\partial u_k}{\partial\tau_j} = 0 \qquad\Longrightarrow\qquad \frac{\partial u_k}{\partial\tau_j} = -\,\frac{\partial\Phi/\partial\tau_j\,(u_k)}{\Phi'(u_k)} . \]

The partial derivatives of the function $\Phi$ are as follows:

\[ \frac{\partial\Phi}{\partial\tau_j}(u) = -\,\frac{2\,w_j\,\tau_j\,u}{(\tau_j-u)^3} , \qquad \Phi'(u) = \sum_{l=1}^{K}\frac{2\,w_l\,\tau_l^2}{(\tau_l-u)^3} ; \]

therefore,

\[ \frac{\partial u_k}{\partial\tau_j} = \frac{w_j\,\tau_j\,u_k}{(\tau_j-u_k)^3} \;\Big/\; \sum_{l=1}^{K}\frac{w_l\,\tau_l^2}{(\tau_l-u_k)^3} . \tag{10} \]
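A direct transcription of equation (10); note that the derivatives are taken with respect to the distinct values $\tau_j$, and extend to the raw vector $t$ by the chain rule:

```python
import numpy as np

# Implicit function theorem applied to Phi(u_k) = 0: returns the vector of
# derivatives of the endpoint u_k with respect to tau_1, ..., tau_K.
def endpoint_derivative(u_k, tau, w):
    dPhi_du = np.sum(2 * w * tau**2 / (tau - u_k)**3)   # Phi'(u_k), a scalar
    dPhi_dtau = -2 * w * tau * u_k / (tau - u_k)**3     # elementwise over j
    return -dPhi_dtau / dPhi_du                         # eq. (10)
```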

5 Grid

The first operation generated the support in $u$-space and the number of population eigenvalues corresponding to each interval: $\bigcup_{j=1}^{\nu}[u_{2j-1},u_{2j}]$ and $\omega_1,\ldots,\omega_\nu$. The goal of the second operation is to produce a grid that covers this support. This problem can be broken down by considering each interval $[u_{2j-1},u_{2j}]$ separately.

5.1 Formula for the Grid Points

Take some $j\in\{1,\ldots,\nu\}$. How shall we determine a grid that covers the interval $[u_{2j-1},u_{2j}]$? The number of points on the grid will be a function of $\omega_j$; call it $g_j$. Specifically, we shall take $g_j$ points in the open interval $(u_{2j-1},u_{2j})$, plus the two endpoints $u_{2j-1}$ and $u_{2j}$. Thus, the total number of points covering the closed interval $[u_{2j-1},u_{2j}]$ will be $g_j+2$. Let us call these points $\xi_j^0\le\xi_j^1\le\cdots\le\xi_j^{g_j+1}$, with the convention that $\xi_j^0:=u_{2j-1}$ and $\xi_j^{g_j+1}:=u_{2j}$. Thus, what is left is to define $\xi_j^1,\ldots,\xi_j^{g_j}$.

There are many ways to choose such a grid, depending on how densely we want to cover the various parts of the interval. The simplest idea would be to have uniform coverage through a linearly spaced grid. But it is more judicious to increase coverage density near the edges of the interval because this is where a lot of the action is taking place. [40] demonstrate that the limiting density of sample eigenvalues has “square root”-type behavior near boundary points. This fact points us towards the inverse c.d.f. of the beta distribution with parameters $(1/2,1/2)$, also known as the arcsine distribution:

\[ \forall i=1,\ldots,g_j: \qquad \xi_j^i := u_{2j-1} + (u_{2j}-u_{2j-1})\,\sin^2\!\left(\frac{\pi}{2}\,\frac{i}{g_j+1}\right) . \tag{11} \]

Compared to the beta distribution with parameters $(1,1)$, which is the uniform distribution, reducing both parameters from $1$ to $1/2$ increases coverage density near the edges of the interval. Note that the density of the arcsine distribution goes to infinity at the edges of the interval (as does the derivative of the square root function), but the c.d.f., its inverse and the grid all remain well-behaved. The goal here is to enhance numerical accuracy.
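A sketch of the grid construction; `beta.ppf` is SciPy's inverse beta c.d.f., and the closed form $\sin^2(\pi q/2)$ gives identical numbers:

```python
import numpy as np
from scipy.stats import beta

# Sketch of equation (11): cover [u_lo, u_hi] with g interior points placed at
# arcsine (Beta(1/2,1/2)) quantiles, so coverage thickens near both edges.
def arcsine_grid(u_lo, u_hi, g):
    q = np.arange(g + 2) / (g + 1)                 # 0, 1/(g+1), ..., 1
    return u_lo + (u_hi - u_lo) * beta.ppf(q, 0.5, 0.5)
    # equivalently: u_lo + (u_hi - u_lo) * np.sin(np.pi * q / 2) ** 2

print(arcsine_grid(0.0, 1.0, 5))   # endpoints included, denser near 0 and 1
```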

5.2 Derivative of the Grid Points

In keeping with our earlier stated objective (see Section 4.4) of building towards an analytical formula for the partial derivative of the QuEST function with respect to the population eigenvalues, at this stage we need to compute $\partial\xi_j^i/\partial\tau_k$ for all $i$, $j$ and $k$. From equation (11) we can see immediately that it is

\[ \frac{\partial\xi_j^i}{\partial\tau_k} = \left[1-\sin^2\!\left(\frac{\pi}{2}\,\frac{i}{g_j+1}\right)\right]\frac{\partial u_{2j-1}}{\partial\tau_k} + \sin^2\!\left(\frac{\pi}{2}\,\frac{i}{g_j+1}\right)\frac{\partial u_{2j}}{\partial\tau_k} , \]

where $\partial u_{2j-1}/\partial\tau_k$ and $\partial u_{2j}/\partial\tau_k$ are given by equation (10).
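Its derivative is then a convex combination of the endpoint derivatives (sketch):

```python
import numpy as np

# Each grid point is an affine combination of the two endpoints, so its
# derivative w.r.t. any tau mixes the endpoint derivatives from eq. (10).
def grid_derivative(du_lo, du_hi, g):
    s = np.sin(np.pi * (np.arange(g + 2) / (g + 1)) / 2) ** 2  # weights in [0, 1]
    return np.outer(1 - s, du_lo) + np.outer(s, du_hi)         # row i = d xi_i / d tau
```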

6 Solving the Marčenko-Pastur Equation in u-Space

In this section we will assume that the interval index $j\in\{1,\ldots,\nu\}$ is fixed.

6.1 Statement of the Problem

Given a grid covering the support interval $[u_{2j-1},u_{2j}]$, the third operation solves the Marčenko-Pastur equation at each grid point $\xi_j^i$. For every $i\in\{1,\ldots,g_j\}$, define the function