User-Friendly Covariance Estimation for Heavy-Tailed Distributions: A Survey and Recent Results
Abstract
We offer a survey of selected recent results on covariance estimation for heavy-tailed distributions. By unifying ideas scattered across the literature, we propose user-friendly methods that facilitate practical implementation. Specifically, we introduce element-wise and spectrum-wise truncation operators, as well as their M-estimator counterparts, to robustify the sample covariance matrix. Different from the classical notion of robustness, which is typically characterized by the breakdown property, we focus on tail robustness, which is evidenced by the nonasymptotic deviation property of the estimator for data with only finite fourth moments. The key observation is that the robustification parameter needs to adapt to the sample size, dimensionality and moments to achieve an optimal trade-off between bias and robustness. Furthermore, to facilitate practical use, we propose tuning-free procedures that automatically calibrate the tuning parameters. Applications to a series of structured models in high dimensions, including bandable covariance estimation, sparse precision matrix estimation, low-rank covariance estimation, and covariance estimation and multiple testing under factor models, are revisited. Numerical examples lend strong support to our proposed methodology.
Keywords: Covariance estimation, heavy-tailed data, M-estimation, nonasymptotics, tail robustness, truncation.
1 Introduction
The covariance matrix plays a significant role in multivariate statistics. Its estimation, therefore, serves as a building block for many important statistical methods, including principal component analysis, discriminant analysis, clustering analysis and regression analysis, among many others. Recently, estimating structured large covariance matrices, such as bandable, sparse and low-rank matrices, has attracted ever-growing attention in statistics and machine learning (Bickel and Levina, 2008a, b; Cai, Ren and Zhou, 2016; Fan, Liao and Liu, 2016). Such estimation problems arise in a wide range of applications, from functional magnetic resonance imaging (fMRI) and the analysis of gene expression arrays to risk management and portfolio allocation.
Theoretical properties of large covariance matrix estimators in the literature often hinge upon the Gaussian or sub-Gaussian assumption (Vershynin, 2012); a random variable $X$ is said to have sub-Gaussian tails if there exist constants $c_1, c_2 > 0$ such that $\mathbb{P}(|X| \geq t) \leq c_1 e^{-c_2 t^2}$ for all $t \geq 0$. See, for example, Theorem 1 in Bickel and Levina (2008a). Such an assumption is typically very restrictive in practice. For example, a recent fMRI study by Eklund et al. (2016) reported that most of the common software packages for fMRI analysis, such as SPM and FSL, can result in inflated false-positive rates of up to 70% under 5% nominal levels, calling into question a number of the roughly 40,000 fMRI studies indexed in PubMed. Their results suggested that
The principal cause of the invalid cluster inferences is spatial autocorrelation functions that do not follow the assumed Gaussian shape.
Eklund et al. (2016) plotted the empirical spatial autocorrelation functions, which clearly exhibit heavier-than-Gaussian tails (Eklund et al., 2016, page 7902). Violations of Gaussianity have also been found in genomic studies (Liu et al., 2003; Purdom et al., 2005) and in quantitative finance (Fama, 1963; Mandelbrot, 1967). It is therefore imperative to develop robust inferential procedures that are insensitive to restrictive distributional assumptions.
We are interested in obtaining estimators that admit tight nonasymptotic deviation guarantees under weak assumptions, such as the existence of moments of only low order. A heavy-tailed distribution is a viable model for data contaminated by outliers of the kind typically encountered in applications. Due to heavy-tailedness, there is a higher chance that some observations are sampled far away from the population mean; we refer to such outlying data points as distributional outliers. A procedure that is robust against distributional outliers is called a tail-robust procedure, as evidenced by finite-sample performance superior to that of non-robust methods in the presence of heavy-tailed data. In this paper, by unifying ideas scattered across the literature, we provide a general framework for constructing user-friendly tail-robust covariance estimators. Specifically, we propose element-wise and spectrum-wise truncation operators, as well as their M-estimator counterparts, with adaptively chosen robustification parameters. Theoretically, we establish nonasymptotic deviation bounds and demonstrate that the robustification parameters should adapt to the sample size, dimensionality and noise level for an optimal trade-off between bias and robustness. Our goal is to obtain estimators that are computationally efficient and easily implementable in applications. To this end, we propose data-driven schemes to calibrate the robustification parameters, making our proposal computationally friendly. Finally, we discuss a series of applications to structured models in high dimensions, including bandable covariance estimation, sparse precision matrix estimation, low-rank covariance estimation, and covariance estimation and multiple testing under factor models.
Our definition of robustness differs from the conventional perspective under Huber's contamination model (Huber, 1964), where the focus has been on developing robust procedures with a high breakdown point. The breakdown point (Hampel, 1971) of an estimator is defined, roughly, as the largest proportion of arbitrary outliers that the estimator can tolerate before producing arbitrarily large estimates. Since the seminal work of Tukey (1975), a number of depth-based procedures have been developed; see, for example, Liu (1990), Zuo and Serfling (2000), Mizera (2002) and Salibian-Barrera and Zamar (2002), among others. Another line of work focuses on robust and resistant estimators, including the least median of squares and least trimmed squares (Rousseeuw, 1984), the S-estimator (Rousseeuw and Yohai, 1984) and the MM-estimator (Yohai, 1987). We refer to Portnoy and He (2000) for a literature review of classical robust statistics, and to Chen et al. (2018) for recent developments on nonasymptotic analysis under contamination models.
The rest of the paper proceeds as follows. We start with a motivating example in Section 2, which reveals the weakness of the sample covariance matrix. In Section 3, we introduce two types of generic robust covariance estimators and establish their deviation bounds under various norms of interest. The finite-sample performance of the proposed estimators, both element-wise and spectrum-wise, depends on a careful tuning of the robustification parameter, which should adapt to the noise level for a bias-robustness trade-off. We also discuss a median-of-means estimator, which is automatically tuning-free at the cost of slightly stronger moment conditions. For practical implementation, in Section 4 we propose a data-driven scheme for choosing the key tuning parameters. Section 5 presents various applications to the estimation of structured covariance and precision matrices and to multiple testing under factor models with heavy-tailed errors. Numerical studies are provided in Section 6. We end the paper with a discussion in Section 7.
1.1 Overview of the previous work
In the past several decades, there have been active developments of robust covariance estimation in the absence of normality. Examples include the Minimum Covariance Determinant (MCD) estimator, the Minimum Volume Ellipsoid (MVE) estimator, and Maronna's (Maronna, 1976) and Tyler's (Tyler, 1987) M-estimators of multivariate scatter matrices. We refer to Hubert, Rousseeuw and Van Aelst (2008) for a comprehensive review. Asymptotic properties of these methods have been established for the family of elliptically symmetric distributions; see, for example, Davies (1992), Butler, Davies and Jhun (1993) and Zhang, Cheng and Singer (2016), among others. However, the above estimators rely on either a parametric assumption or a shape constraint on the sampling distribution. Under a general setting where neither is assumed, robust covariance estimation remains a challenging problem.
The work of Catoni (2012) triggered a growing interest in developing tail-robust estimators, which are characterized by tight nonasymptotic deviation analyses rather than mean squared errors. The current state-of-the-art methods for covariance estimation with heavy-tailed data include those of Catoni (2016), Minsker (2018), Minsker and Wei (2018), Avella-Medina et al. (2018) and Mendelson and Zhivotovskiy (2018). From a spectrum-wise perspective, Catoni (2016) constructs a robust estimator of the Gram matrix of a random vector, or of a covariance matrix, via estimating the quadratic forms $\mathbb{E}\langle u, X\rangle^2$ uniformly over the unit sphere in $\mathbb{R}^d$, and obtains error bounds for the operator norm. More recently, Mendelson and Zhivotovskiy (2018) built a different robust covariance estimator that admits a dimension-free deviation bound under a finite kurtosis condition. Both constructions, however, involve brute-force search over every direction in a $d$-dimensional $\varepsilon$-net, and thus are computationally intractable. From an element-wise perspective, Avella-Medina et al. (2018) combine robust estimates of the first and second moments to obtain variance estimators. In practice, three potential drawbacks of this approach are: (i) the accumulated error consists of those from estimating the first and second moments, which may be significant; (ii) the diagonal variance estimators are not necessarily positive, and therefore additional adjustments are required; and (iii) using cross-validation to calibrate the $d(d+1)/2$ tuning parameters is computationally expensive.
Building upon the ideas of Minsker (2018) and Avella-Medina et al. (2018), we propose user-friendly tail-robust covariance estimators that enjoy desirable finite-sample deviation bounds under weak moment conditions. The constructed estimators only involve simple truncation techniques and are computationally friendly. Through a novel data-driven tuning scheme, we are able to efficiently compute these robust estimators for large-scale problems in practice. These two points distinguish our work from others in the literature. The proposed robust estimators serve as building blocks for estimating large structured covariance and precision matrices, and we illustrate their broad applicability in a series of problems.
1.2 Notation
We adopt the following notation throughout the paper. For any $1 \leq p, q \leq \infty$ and a matrix $A = (a_{kl}) \in \mathbb{R}^{d_1 \times d_2}$, we define the max-norm $\|A\|_{\max} = \max_{k,l} |a_{kl}|$, the Frobenius norm $\|A\|_{\mathrm F} = (\sum_{k,l} a_{kl}^2)^{1/2}$ and the operator norm
$$\|A\|_{p \to q} = \sup_{\|u\|_p \leq 1} \|Au\|_q,$$
where $\|u\|_p = (\sum_k |u_k|^p)^{1/p}$ for $p \geq 1$, and $\|u\|_\infty = \max_k |u_k|$. In particular, it holds that $\|A\|_{1\to 1} = \max_l \sum_k |a_{kl}|$ and $\|A\|_{\infty\to\infty} = \max_k \sum_l |a_{kl}|$. Moreover, we write $\|A\|_2 = \|A\|_{2\to 2}$ for the spectral norm and use $r(A) = \operatorname{tr}(A)/\|A\|_2$ to denote the effective rank of a nonnegative definite matrix $A$, where $\operatorname{tr}(A) = \sum_k a_{kk}$ is the trace of $A$. When $A$ is symmetric, we have $\|A\|_2 = \max_k |\lambda_k(A)|$, where $\lambda_1(A), \ldots, \lambda_d(A)$ are the eigenvalues of $A$. For any matrix $A$ and an index set $S$, we use $S^{\mathrm c}$ to denote the complement of $S$, and $A_S$ to denote the submatrix of $A$ with entries indexed by $S$. For a real-valued random variable $X$, let $\operatorname{kurt}(X)$ be the kurtosis of $X$, defined as $\operatorname{kurt}(X) = \mathbb{E}(X - \mu)^4/\sigma^4$, where $\mu = \mathbb{E}(X)$ and $\sigma^2 = \operatorname{Var}(X)$.
2 Motivating example: A challenge of heavy-tailedness
Suppose we observe a sample of independent and identically distributed (i.i.d.) data vectors $X_1, \ldots, X_n$ from a random vector $X \in \mathbb{R}^d$ with mean $\mu = \mathbb{E}(X)$ and covariance matrix $\Sigma = \mathbb{E}\{(X - \mu)(X - \mu)^{\mathrm T}\}$. To recognize the difficulty of mean and covariance estimation for heavy-tailed distributions, we first provide a lower bound for the deviation of the empirical mean $\bar X = n^{-1}\sum_{i=1}^n X_i$ under the $\ell_2$ norm in $\mathbb{R}^d$.
Proposition 2.1.
For any $\sigma > 0$ and $\delta \in (0, 1)$, there exists some distribution in $\mathbb{R}^d$ with mean $\mu$ and covariance matrix $\sigma^2 I_d$ such that the empirical mean $\bar X$ of $n$ i.i.d. observations from this distribution satisfies, with probability at least $\delta$, that
$$\|\bar X - \mu\|_2 \geq c\,\sigma\sqrt{\frac{d}{n\delta}}, \tag{2.1}$$
where $c > 0$ is an absolute constant.
The above proposition, which is a multivariate extension of Proposition 6.2 in Catoni (2012), provides a lower bound under the $\ell_2$ norm for estimating a mean vector via the empirical mean. Moreover, by Chebyshev's inequality, the empirical mean satisfies, with probability at least $1 - \delta$,
$$\|\bar X - \mu\|_2 \leq \sigma\sqrt{\frac{d}{n\delta}}.$$
Together, this upper bound and (2.1) show that the worst-case deviations of the empirical mean under the $\ell_2$ norm scale as $\sqrt{1/\delta}$, that is, only polynomially in $1/\delta$, in the presence of heavy-tailed distributions. We will see later that a modified estimator can achieve an exponential-type deviation bound, scaling as $\sqrt{\log(1/\delta)}$, and thus the sample mean is suboptimal.
To better appreciate the above motivation, we perform a toy numerical study on covariance matrix estimation. Let $X_1, \ldots, X_n$ be i.i.d. data vectors whose coordinates are independent and follow a centered Gamma distribution, so that each coordinate has mean zero. We compare the performance of three methods: the sample covariance matrix, the element-wise truncated covariance estimator and the spectrum-wise truncated estimator. The latter two are tail-robust covariance estimators that will be introduced in Sections 3.1 and 3.2, respectively. Their performances are assessed by the max-norm error. We fix the sample size $n$ and let the dimension $d$ increase over an equally spaced grid; the results are averaged over repeated simulations. Figure 1 displays the average (line) and the spread (dots) of the estimation errors for each method as the dimension increases. We see that the sample covariance estimator has not only the largest average error but also the highest variability in all settings. This example demonstrates that the sample covariance matrix suffers from poor finite-sample performance when the data are heavy-tailed.
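For concreteness, the following self-contained Python sketch runs a scaled-down version of this experiment; the sizes, the Gamma shape and the plug-in choice of the truncation levels are illustrative choices of ours, not the exact settings behind Figure 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, shape = 60, 40, 1.0                       # illustrative sizes
X = rng.gamma(shape, 1.0, size=(n, d)) - shape  # centered Gamma entries
Sigma = shape * np.eye(d)                       # true covariance matrix

S_sam = np.cov(X, rowvar=False)                 # sample covariance

# element-wise truncation on pairwise differences (Section 3.1)
iu, ju = np.triu_indices(n, 1)
Y = X[iu] - X[ju]                               # N = n(n-1)/2 differences
P = np.einsum('ik,il->ikl', Y, Y)               # products (X_ik-X_jk)(X_il-X_jl)
tau = np.sqrt(np.mean(P ** 2, axis=0) * n / np.log(n * d))  # crude plug-in rule
S_tr = (np.sign(P) * np.minimum(np.abs(P), tau)).mean(axis=0) / 2

print("max-norm error, sample:   ", np.abs(S_sam - Sigma).max())
print("max-norm error, truncated:", np.abs(S_tr - Sigma).max())
```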
3 Tail-robust covariance estimation
3.1 Element-wise truncated estimator
We consider the same setting as in the previous section. For mean estimation, we note that the suboptimality of the empirical mean $\bar X$ under the $\ell_2$ norm is due to the fact that its tail probability decays only polynomially. A simple yet natural idea for improvement is to truncate the data so as to eliminate outliers caused by the noise, so that each entry of the resulting estimator has sub-Gaussian tails. To realize this idea, we introduce the following truncation function/operator, which is closely related to the Huber loss to be discussed later.
Definition 3.1 (Truncation operator).
For $\tau > 0$, let $\psi_\tau : \mathbb{R} \to \mathbb{R}$ be a truncation operator given by
$$\psi_\tau(u) = (|u| \wedge \tau)\,\operatorname{sign}(u), \tag{3.1}$$
where the truncation level $\tau$ is also referred to as the robustification parameter, since it balances bias and robustness.
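In code, the operator is a one-liner. The following Python helper (the function name is ours) applies $\psi_\tau$ elementwise to scalars or arrays and is reused in the sketches below.

```python
import numpy as np

def psi(u, tau):
    """Truncation operator (3.1): psi_tau(u) = sign(u) * min(|u|, tau)."""
    return np.sign(u) * np.minimum(np.abs(u), tau)
```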
For illustrative purposes, we assume $\mu = 0$ so that $\Sigma = \mathbb{E}(XX^{\mathrm T})$. We apply the truncation operator above to each entry of $X_i X_i^{\mathrm T}$, and then take the average to obtain
$$\tilde\sigma_{kl} = \frac{1}{n}\sum_{i=1}^n \psi_{\tau_{kl}}(X_{ik}X_{il}),$$
where $\tau_{kl} = \tau_{lk} > 0$ are robustification parameters. When the mean vector is unspecified, a straightforward approach is to first estimate it using existing robust methods (Minsker, 2015; Lugosi and Mendelson, 2017), and then employ truncated averages of the recentered data as robust estimates of the second moments. Estimating the first and second moments separately, however, unavoidably introduces additional tuning parameters, which increases both the statistical variability and the computational complexity of the method. In what follows, we propose to use the pairwise difference approach, which is free of mean estimation.
Let $N = n(n-1)/2$ and define the paired data
$$Y_{ij} := X_i - X_j, \qquad 1 \leq i < j \leq n, \tag{3.2}$$
which are identically distributed as $X_1 - X_2$, a random vector with mean $0$ and covariance matrix $2\Sigma$. It is easy to check that the sample covariance matrix $\hat\Sigma^{\mathrm{sam}} = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)^{\mathrm T}$, with $\bar X = n^{-1}\sum_{i=1}^n X_i$, can be expressed using the paired data as
$$\hat\Sigma^{\mathrm{sam}} = \frac{1}{2N}\sum_{1\leq i<j\leq n} Y_{ij}Y_{ij}^{\mathrm T}.$$
Following the same argument as above, we apply the truncation operator to $Y_{ij}Y_{ij}^{\mathrm T}$ entrywise, and then take the average to obtain
$$\hat\sigma_{kl} = \frac{1}{2N}\sum_{1\leq i<j\leq n}\psi_{\tau_{kl}}\big(Y_{ij,k}\,Y_{ij,l}\big),$$
where $Y_{ij,k}$ denotes the $k$th coordinate of $Y_{ij}$.
Collecting these marginal estimators, we define the following element-wise truncated covariance matrix estimator
$$\hat\Sigma^{\mathcal T} = (\hat\sigma_{kl})_{1\leq k,l\leq d}, \tag{3.3}$$
where $T = (\tau_{kl})_{1\leq k,l\leq d}$ is a symmetric matrix of robustification parameters, so that $\hat\Sigma^{\mathcal T}$ can be viewed as a truncated version of the sample covariance matrix $\hat\Sigma^{\mathrm{sam}}$. We assume $n \geq 2$ and define $m = \lfloor n/2 \rfloor$, the largest integer no greater than $n/2$, which plays the role of the effective sample size in the analysis of the pairwise-difference construction. Moreover, let $V = (v_{kl})_{1\leq k,l\leq d}$ be a symmetric matrix with
$$v_{kl} = \Big[\mathbb{E}\big\{(X_{1k}-X_{2k})^2(X_{1l}-X_{2l})^2\big\}\Big]^{1/2}.$$
Theorem 3.1. For any $t > 0$, the estimator $\hat\Sigma^{\mathcal T}$ given in (3.3) with
$$\tau_{kl} \asymp v_{kl}\sqrt{\frac{n}{2\log d + t}} \tag{3.4}$$
satisfies, with probability at least $1 - 2e^{-t}$,
$$\big\|\hat\Sigma^{\mathcal T} - \Sigma\big\|_{\max} \lesssim \|V\|_{\max}\sqrt{\frac{2\log d + t}{n}}. \tag{3.5}$$
Theorem 3.1 indicates that, with a properly tuned parameter matrix $T$, the resulting covariance matrix estimator achieves element-wise tail robustness against heavy-tailed distributions: provided the fourth moments are bounded, each entry of $\hat\Sigma^{\mathcal T}$ concentrates tightly around its population counterpart, so that the maximum entrywise error scales as $\sqrt{(\log d)/n}$. Element-wise, we are thus able to estimate $\Sigma$ accurately at high confidence levels even when the dimension $d$ grows exponentially fast with the sample size $n$.
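A compact implementation of (3.3) is sketched below. It assumes the symmetric matrix of robustification parameters is supplied (see Section 4.1 for a data-driven choice) and materializes all $N$ pairwise products, so it is intended for moderate $n$ and $d$ only.

```python
import numpy as np

def psi(u, tau):                                 # truncation operator (3.1)
    return np.sign(u) * np.minimum(np.abs(u), tau)

def elementwise_truncated_cov(X, tau):
    """Element-wise truncated estimator (3.3).

    X   : (n, d) data matrix
    tau : symmetric (d, d) array of robustification parameters
    """
    n, d = X.shape
    iu, ju = np.triu_indices(n, 1)               # all pairs i < j
    Y = X[iu] - X[ju]                            # (N, d) paired differences
    P = np.einsum('ik,il->ikl', Y, Y)            # (N, d, d) entrywise products
    S = psi(P, tau).mean(axis=0) / 2.0           # E(Y Y^T) = 2 * Sigma
    return (S + S.T) / 2.0                       # enforce exact symmetry
```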
3.2 Spectrum-wise truncated estimator
In this section, we propose and study a covariance estimator that is tail-robust in the spectrum domain. Following the general robustification technique developed in Minsker (2018), we apply the truncation operator directly to matrices, acting on their spectra. To this end, we need the following definition of a matrix functional.
Definition 3.2 (Matrix functional).
Given a real-valued function $f$ defined on $\mathbb{R}$ and a symmetric matrix $A \in \mathbb{R}^{d\times d}$ with eigenvalue decomposition $A = U\Lambda U^{\mathrm T}$, where $\Lambda = \operatorname{diag}(\lambda_1,\ldots,\lambda_d)$, the matrix functional $f(A)$ is defined as $f(A) = U f(\Lambda) U^{\mathrm T}$, where $f(\Lambda) = \operatorname{diag}(f(\lambda_1),\ldots,f(\lambda_d))$.
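In code, Definition 3.2 amounts to applying $f$ to the eigenvalues; a short Python helper (the name is ours):

```python
import numpy as np

def matrix_func(f, A):
    """f(A) = U diag(f(lambda_1), ..., f(lambda_d)) U^T for symmetric A."""
    lam, U = np.linalg.eigh(A)
    return (U * f(lam)) @ U.T
```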
Following the same rationale as in the previous section, we propose a spectrum-wise truncated covariance estimator based on the pairwise difference approach:
$$\hat\Sigma^{\mathcal T}_2 = \frac{1}{2N}\sum_{1\leq i<j\leq n}\psi_\tau\big(Y_{ij}Y_{ij}^{\mathrm T}\big), \tag{3.6}$$
where the $Y_{ij}$ are given in (3.2) and $\psi_\tau$ is applied as a matrix functional in the sense of Definition 3.2. Note that $Y_{ij}Y_{ij}^{\mathrm T}$ is a rank-one matrix with nonzero eigenvalue $\|Y_{ij}\|_2^2$ and corresponding eigenvector $Y_{ij}/\|Y_{ij}\|_2$. By Definition 3.2, $\hat\Sigma^{\mathcal T}_2$ can be rewritten as
$$\hat\Sigma^{\mathcal T}_2 = \frac{1}{2N}\sum_{1\leq i<j\leq n}\big(\|Y_{ij}\|_2^2 \wedge \tau\big)\,\frac{Y_{ij}Y_{ij}^{\mathrm T}}{\|Y_{ij}\|_2^2}.$$
This alternative expression renders the computation almost effortless. The following theorem provides an exponential-type concentration inequality for $\hat\Sigma^{\mathcal T}_2$ in the operator norm, which is a useful complement to (Minsker, 2018, Remark 8). In a similar spirit to Theorem 3.1, our next theorem shows that $\hat\Sigma^{\mathcal T}_2$ achieves an exponential-type concentration bound under the operator norm for heavy-tailed data with finite operator-wise fourth moment, that is, provided that
$$v^2 := \Big\|\mathbb{E}\big\{\|X-\mu\|_2^2\,(X-\mu)(X-\mu)^{\mathrm T}\big\}\Big\|_2 \tag{3.7}$$
is finite. We refer to Minsker and Wei (2018) for a general treatment of robust modifications of U-statistics.
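Thanks to the rank-one structure, $\hat\Sigma^{\mathcal T}_2$ can be computed without a single eigendecomposition. A minimal Python sketch (the function name is ours; the scalar $\tau$ is assumed given, see Section 4.1):

```python
import numpy as np

def spectrumwise_truncated_cov(X, tau):
    """Spectrum-wise truncated estimator (3.6): for the rank-one matrix
    Y Y^T, psi_tau shrinks its sole nonzero eigenvalue ||Y||_2^2 to tau."""
    n, d = X.shape
    iu, ju = np.triu_indices(n, 1)
    Y = X[iu] - X[ju]                            # (N, d) paired differences
    sq = (Y ** 2).sum(axis=1)                    # ||Y_ij||_2^2
    w = np.minimum(sq, tau) / np.maximum(sq, 1e-300)  # eigenvalue shrinkage
    return (Y * w[:, None]).T @ Y / (2.0 * len(sq))   # (1/2N) sum psi(Y Y^T)
```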
Theorem 3.2.
For any $t > 0$, the estimator $\hat\Sigma^{\mathcal T}_2$ with
$$\tau \asymp v\sqrt{\frac{n}{\log d + t}} \tag{3.8}$$
satisfies, with probability at least $1 - 2e^{-t}$,
$$\big\|\hat\Sigma^{\mathcal T}_2 - \Sigma\big\|_2 \lesssim v\sqrt{\frac{\log d + t}{n}}. \tag{3.9}$$
To better understand the above result, note that $v^2$ can be written as
$$v^2 = \sup_{\|u\|_2 = 1}\mathbb{E}\big\{\|X-\mu\|_2^2\,\langle u, X-\mu\rangle^2\big\},$$
which is well defined if the fourth moments are finite. Define the maximum kurtosis of the one-dimensional projections of $X$ as $K = \sup_{\|u\|_2 = 1}\operatorname{kurt}(\langle u, X\rangle)$. Then, by the Cauchy–Schwarz inequality, it follows that
$$v^2 \leq K\operatorname{tr}(\Sigma)\,\|\Sigma\|_2 = K\,r(\Sigma)\,\|\Sigma\|_2^2.$$
The following result is a direct consequence of Theorem 3.2: $\hat\Sigma^{\mathcal T}_2$ admits exponential-type concentration for data with finite kurtosis.
Corollary 3.1.
Assume that $K$ is finite. Then, for any $t > 0$, the estimator $\hat\Sigma^{\mathcal T}_2$ with $\tau$ as in Theorem 3.2 satisfies
$$\big\|\hat\Sigma^{\mathcal T}_2 - \Sigma\big\|_2 \lesssim \|\Sigma\|_2\sqrt{\frac{K\,r(\Sigma)\,(\log d + t)}{n}} \tag{3.10}$$
with probability at least $1 - 2e^{-t}$.
Remark 1.
Theoretically, the current state-of-the-art estimator, proposed by Mendelson and Zhivotovskiy (2018), achieves a more refined deviation bound, in which the dimension dependence in (3.10) enters only through the effective rank $r(\Sigma)$; in particular, the deviations are controlled by the spectral norm $\|\Sigma\|_2$ instead of the (possibly much larger) trace $\operatorname{tr}(\Sigma)$. Estimators admitting such deviation guarantees are often called "sub-Gaussian," as they achieve performance similar to that of the sample covariance matrix computed from multivariate normal data. Unfortunately, the aforementioned estimator is computationally intractable. In recent work, Hopkins (2018) proposed a multivariate mean estimator that achieves theoretically optimal performance in the $\ell_2$ norm and is computationally tractable at the same time. It remains an open question to design a polynomial-time algorithm capable of computing a near-optimal covariance estimator efficiently.
3.3 An M-estimation viewpoint
In this section, we discuss alternative tail-robust covariance estimators from an M-estimation perspective, covering both the element-wise and the spectrum-wise constructions. The connection with the truncated covariance estimators is discussed at the end of this section. To proceed, we first revisit the definition of the Huber loss.
Definition 3.3 (Huber loss). The Huber loss $\ell_\tau(\cdot)$ (Huber, 1964) with robustification parameter $\tau > 0$ is defined as
$$\ell_\tau(u) = \begin{cases} u^2/2 & \text{if } |u| \leq \tau, \\ \tau|u| - \tau^2/2 & \text{if } |u| > \tau. \end{cases} \tag{3.11}$$
Compared with the squared error loss, large values of $u$ are down-weighted in the Huber loss, which is the source of its robustness. In general, the minimizer of the expected Huber loss is biased from the target parameter, and letting $\tau$ grow reduces this bias. In other words, the robustification parameter $\tau$ quantifies the trade-off between bias and robustness. As observed in Sun, Zhou and Fan (2017), to achieve an optimal trade-off between bias and robustness, $\tau$ should adapt to the sample size, dimension and noise level.
Starting with the element-wise method, we define the marginal Huber M-estimators
$$\hat\sigma_{kl} = \operatorname*{argmin}_{\theta\in\mathbb{R}}\ \sum_{1\leq i<j\leq n}\ell_{\tau_{kl}}\bigg(\frac{Y_{ij,k}\,Y_{ij,l}}{2} - \theta\bigg), \tag{3.12}$$
where $\tau_{kl} = \tau_{lk} > 0$ are robustification parameters. When $k = l$, even though the minimization is over the whole real line, it turns out the solution is still positive almost surely and therefore provides a reasonable estimator of the variance $\sigma_{kk}$. To see this, note that the summands $Y_{ij,k}^2/2$ in (3.12) are nonnegative, and that for any $\theta < 0$ and $z \geq 0$,
$$\ell_{\tau_{kk}}(z - \theta) > \ell_{\tau_{kk}}(z).$$
This implies that $\hat\sigma_{kk} \geq 0$, which is strictly positive as long as there are no tied observations. Again, collecting these marginal estimators, we obtain a Huber-type M-estimator
$$\hat\Sigma^{\mathcal H} = (\hat\sigma_{kl})_{1\leq k,l\leq d}, \tag{3.13}$$
where the superscript $\mathcal H$ signifies the use of the Huber loss. Our main result in this section indicates that $\hat\Sigma^{\mathcal H}$ achieves tight concentration under the max-norm for data with only finite fourth moments. A version of Theorem 3.3 follows from Theorem 3.1 in Minsker and Wei (2018), although the univariate case requires slightly weaker assumptions and is of independent interest.
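Each marginal problem (3.12) is a one-dimensional convex minimization and can be solved off the shelf. A sketch using SciPy (the helper names are ours):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber_loss(u, tau):
    """Huber loss of Definition 3.3, applied elementwise."""
    return np.where(np.abs(u) <= tau, 0.5 * u ** 2,
                    tau * np.abs(u) - 0.5 * tau ** 2)

def huber_sigma_kl(X, k, l, tau):
    """Marginal Huber M-estimate (3.12) of sigma_kl."""
    iu, ju = np.triu_indices(X.shape[0], 1)
    Z = (X[iu, k] - X[ju, k]) * (X[iu, l] - X[ju, l]) / 2.0
    return minimize_scalar(lambda th: huber_loss(Z - th, tau).sum()).x
```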
Theorem 3.3.
Let $V = (v_{kl})_{1\leq k,l\leq d}$ be a symmetric matrix satisfying
$$v_{kl} \geq \Big[\operatorname{Var}\big\{(X_{1k}-X_{2k})(X_{1l}-X_{2l})/2\big\}\Big]^{1/2}. \tag{3.14}$$
For any $t > 0$, the covariance estimator $\hat\Sigma^{\mathcal H}$ given in (3.13) with
$$\tau_{kl} \asymp v_{kl}\sqrt{\frac{n}{2\log d + t}} \tag{3.15}$$
satisfies, with probability at least $1 - 2e^{-t}$,
$$\big\|\hat\Sigma^{\mathcal H} - \Sigma\big\|_{\max} \lesssim \|V\|_{\max}\sqrt{\frac{2\log d + t}{n}} \tag{3.16}$$
as long as $n \gtrsim 2\log d + t$.
The M-estimator counterpart of the spectrum-wise truncated covariance estimator was first proposed by Minsker (2018) using a different robust loss function. Later, Minsker and Wei (2018) investigated robust modifications of empirical risk minimization for U-statistics. In line with the element-wise M-estimator above, we restrict our attention to the Huber loss and consider
$$\hat\Sigma^{\mathcal H}_2 = \operatorname*{argmin}_{M = M^{\mathrm T}}\ \sum_{1\leq i<j\leq n}\operatorname{tr}\,\ell_\tau\bigg(\frac{Y_{ij}Y_{ij}^{\mathrm T}}{2} - M\bigg), \tag{3.17}$$
where $\ell_\tau$ is applied as a matrix functional in the sense of Definition 3.2. This is a natural robust variant of the sample covariance matrix, which itself can be written as
$$\hat\Sigma^{\mathrm{sam}} = \operatorname*{argmin}_{M = M^{\mathrm T}}\ \sum_{1\leq i<j\leq n}\bigg\|\frac{Y_{ij}Y_{ij}^{\mathrm T}}{2} - M\bigg\|_{\mathrm F}^2.$$
Define the nonnegative definite matrix $V$ that satisfies
$$V^2 = \mathbb{E}\bigg(\frac{Y_{12}Y_{12}^{\mathrm T}}{2} - \Sigma\bigg)^2.$$
The following result is modified from Corollary 4.1 in Minsker and Wei (2018).
Theorem 3.4.
Assume that there exists some $v > 0$ such that $\|V\|_2 \leq v$. Then, for any $t > 0$, the estimator $\hat\Sigma^{\mathcal H}_2$ with $\tau \asymp v\sqrt{n/(\log d + t)}$ satisfies
$$\big\|\hat\Sigma^{\mathcal H}_2 - \Sigma\big\|_2 \leq C_1\, v\sqrt{\frac{\log d + t}{n}} \tag{3.18}$$
with probability at least $1 - 3e^{-t}$, as long as $n \geq C_2(\log d + t)$, where $C_1, C_2 > 0$ are absolute constants.
To solve the convex optimization problem (3.17), Minsker and Wei (2018) propose the following gradient descent algorithm: starting with an initial estimator $\hat\Sigma^{(0)}$, at iteration $k = 1, 2, \ldots$, compute
$$\hat\Sigma^{(k)} = \hat\Sigma^{(k-1)} + \frac{1}{N}\sum_{1\leq i<j\leq n}\psi_\tau\bigg(\frac{Y_{ij}Y_{ij}^{\mathrm T}}{2} - \hat\Sigma^{(k-1)}\bigg),$$
where $\psi_\tau$ is given in (3.1) and is applied as a matrix functional. From this point of view, the truncated estimator given in (3.6) can be viewed as the first step of the gradient descent iteration for solving the optimization problem (3.17) initialized at $\hat\Sigma^{(0)} = 0$. This procedure enjoys a nice contraction property, as demonstrated by Lemma 3.2 in Minsker and Wei (2018). However, since the difference matrix $Y_{ij}Y_{ij}^{\mathrm T}/2 - \hat\Sigma^{(k-1)}$ is no longer rank-one for each pair $(i,j)$, we need to perform a full eigenvalue decomposition to compute the matrix functional $\psi_\tau$ at every iteration.
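A direct, unoptimized Python sketch of this iteration is given below; the unit step size and the fixed iteration budget are our simplifications.

```python
import numpy as np

def psi_mat(A, tau):
    """psi_tau applied as a matrix functional (Definition 3.2)."""
    lam, U = np.linalg.eigh(A)
    return (U * (np.sign(lam) * np.minimum(np.abs(lam), tau))) @ U.T

def huber_cov_gd(X, tau, n_iter=50):
    """Gradient descent for the spectrum-wise Huber M-estimator (3.17);
    each residual matrix is full rank, hence the eigendecompositions."""
    n, d = X.shape
    iu, ju = np.triu_indices(n, 1)
    Y = X[iu] - X[ju]
    S = np.zeros((d, d))                         # initialize at zero
    for _ in range(n_iter):
        G = sum(psi_mat(np.outer(y, y) / 2.0 - S, tau) for y in Y)
        S = S + G / len(Y)
    return S
```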
We close this section with a discussion of the connection and the difference between the truncated estimators and the M-estimators. Both types of estimators achieve tail robustness through a bias-robustness trade-off, either element-wise or spectrum-wise. The difference lies in the quantity to which the truncation is applied. Intuitively, the M-estimators truncate symmetrically around the true expectation, as shown in (3.12) and (3.17), while the truncation estimators simply truncate symmetrically around zero, as in (3.3) and (3.6). Due to their smaller finite-sample bias, the M-estimators are expected to outperform the simple truncated estimators. However, since the optimal choice of the robustification parameter is often much larger in magnitude than the population moments, either element-wise or spectrum-wise, the difference between the truncation estimators and the M-estimators becomes insignificant when $n$ is large. We therefore advocate the simple truncated estimators, primarily for their simplicity and computational efficiency.
3.4 Median-of-means estimator
Truncation-based approaches described in the previous sections require knowledge of the robustification parameters. Adaptation and tuning of these parameters will be discussed in Section 4 below. Here, we suggest another method that does not need any tuning but requires stronger assumptions, namely, the existence of moments of order six. This method is based on the median-of-means (MOM) technique (Nemirovsky and Yudin, 1983; Devroye et al., 2016; Minsker and Strawn, 2017). To this end, assume that the index set $\{1,\ldots,n\}$ is partitioned into $K$ disjoint groups $G_1,\ldots,G_K$ (the partitioning scheme is assumed to be independent of the data) whose cardinalities satisfy $\lfloor n/K\rfloor \leq |G_j| \leq \lceil n/K\rceil$ for $j = 1,\ldots,K$. For each $j$, let $\bar X^{(j)} = |G_j|^{-1}\sum_{i\in G_j}X_i$ and
$$\hat\Sigma^{(j)} = \frac{1}{|G_j|}\sum_{i\in G_j}\big(X_i - \bar X^{(j)}\big)\big(X_i - \bar X^{(j)}\big)^{\mathrm T}$$
be the sample covariance evaluated over the data in group $G_j$. Then, for all $1\leq k,l\leq d$, the MOM estimator of $\sigma_{kl}$ is defined via
$$\hat\sigma^{\mathrm{MOM}}_{kl} = \operatorname{median}\big(\hat\sigma^{(1)}_{kl},\ldots,\hat\sigma^{(K)}_{kl}\big),$$
where $\hat\sigma^{(j)}_{kl}$ is the entry in the $k$th row and $l$th column of $\hat\Sigma^{(j)}$. This leads to the matrix estimator $\hat\Sigma^{\mathrm{MOM}} = \big(\hat\sigma^{\mathrm{MOM}}_{kl}\big)_{1\leq k,l\leq d}$.
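A minimal sketch (the random splitting scheme and the group count $K$ are user choices):

```python
import numpy as np

def mom_cov(X, K, seed=0):
    """Median-of-means covariance: entrywise median of the K group-wise
    sample covariance matrices."""
    n = X.shape[0]
    idx = np.random.default_rng(seed).permutation(n)  # data-independent split
    covs = [np.cov(X[g], rowvar=False) for g in np.array_split(idx, K)]
    return np.median(np.stack(covs), axis=0)
```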
Let $v_{kl}^2 = \operatorname{Var}\{(X_k - \mu_k)(X_l - \mu_l)\}$ for $1 \leq k, l \leq d$. The following result provides a deviation bound for the MOM estimator under the max-norm.
Theorem 3.5.
Assume that $n \geq 2K$ and that $v_{kl} > 0$ for all $1 \leq k, l \leq d$. Then there exists a constant $C > 0$, depending only on the fourth and sixth central moments of the entries of $X$, such that
$$\mathbb{P}\bigg(\big|\hat\sigma^{\mathrm{MOM}}_{kl} - \sigma_{kl}\big| \geq C\,v_{kl}\bigg(\sqrt{\frac{s}{n}} + \frac{K}{n}\bigg)\bigg) \leq 2e^{-s} \tag{3.19}$$
for all $s$ such that $s \lesssim K$.
Remark 2.

The only user-defined parameter in the definition of $\hat\Sigma^{\mathrm{MOM}}$ is the number of subgroups $K$. The bound above shows that, provided $K \leq \sqrt{sn}$, the term $K/n$ in (3.19) is of smaller order, and we obtain an estimator that admits tight deviation bounds for a wide range of $s$. In this sense, the estimator $\hat\Sigma^{\mathrm{MOM}}$ is essentially tuning-free (say, one could set $K = \lceil\sqrt{n}\rceil$).

Application of the MOM construction to large covariance estimation problems has been explored by Avella-Medina et al. (2018). However, the results obtained in that work are insufficient to conclude that MOM estimators are "tuning-free". In particular, Avella-Medina et al. (2018) considered weaker assumptions (fourth moments instead of sixth) and obtained weaker results that (i) are not uniform in the confidence parameter $s$, hence the estimator has to be recomputed for each value of this parameter, and (ii) yield error bounds that are controlled by the fourth moments $\mathbb{E}\{(X_k-\mu_k)^2(X_l-\mu_l)^2\}$ instead of the typically smaller variances $v_{kl}^2$.
4 Automatic tuning of robustification parameters
For all the proposed tail-robust estimators except the median-of-means, the robustification parameter needs to adapt properly to the sample size, dimension and noise level in order to achieve an optimal trade-off between bias and robustness in finite samples. An intuitive yet computationally expensive idea is to use cross-validation. Another approach is based on Lepski's method (Lepski and Spokoiny, 1997), which yields adaptive estimators with provable guarantees (Minsker, 2018; Minsker and Wei, 2018); however, it is also not computationally efficient. In this section, we propose tuning-free approaches, with low computational costs, for constructing both the truncated estimators and the M-estimators. Our nonasymptotic analysis provides useful guidance on the proposed choice of the key tuning parameters.
4.1 Adaptive truncated estimator
In this section we introduce a data-driven procedure that automatically tunes the robustification parameters in the element-wise truncated covariance estimator. In principle, each individual parameter $\tau_{kl}$ can be tuned by cross-validation over a grid. This, however, incurs extensive computational cost even when the dimension is moderately large.
Instead, we propose a data-driven procedure that automatically calibrates all $d(d+1)/2$ parameters. This procedure is motivated by the theoretical properties established in Theorem 3.1. For simplicity of presentation, we fix a pair $(k, l)$ with $1 \leq k \leq l \leq d$ and define $Z_1, \ldots, Z_N$ such that $\{Z_i\}_{i=1}^N = \{(X_{ik}-X_{jk})(X_{il}-X_{jl})\}_{1\leq i<j\leq n}$. Then $\hat\sigma_{kl}$ can be written as $(2N)^{-1}\sum_{i=1}^N\psi_\tau(Z_i)$ with $\tau = \tau_{kl}$. In view of (3.4), an "ideal" choice of $\tau$ is
$$\tau = \sigma_Z\sqrt{\frac{n}{z}} \quad\text{with } \sigma_Z^2 = \mathbb{E}(Z^2), \tag{4.1}$$
where $z = z(n, d) > 0$ is prespecified to control the confidence level and will be discussed later. A naive estimator of $\sigma_Z^2$ is the empirical second moment $N^{-1}\sum_{i=1}^N Z_i^2$, which tends to overestimate the true value when data are leptokurtic. Intuitively, a well-chosen $\tau$ makes $(2N)^{-1}\sum_{i=1}^N\psi_\tau(Z_i)$ a good estimator of $\sigma_{kl}$, and meanwhile we expect the empirical truncated second moment $N^{-1}\sum_{i=1}^N (Z_i^2 \wedge \tau^2)$ to be a reasonable estimate of $\sigma_Z^2$ as well. Plugging this empirical truncated second moment into (4.1) yields
$$\frac{1}{N}\sum_{i=1}^N\frac{Z_i^2 \wedge \tau^2}{\tau^2} = \frac{z}{n}. \tag{4.2}$$
We then solve the above equation to obtain $\hat\tau_{kl}$, a data-driven choice of $\tau_{kl}$. By Proposition 3 in Wang et al. (2018), equation (4.2) has a unique solution as long as $z/n$ is smaller than the proportion of nonzero $Z_i$'s. We characterize the theoretical properties of this tuning method in a companion paper (Wang et al., 2018).
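Numerically, (4.2) is a one-dimensional root-finding problem: its left-hand side decreases monotonically in $\tau$, from the proportion of nonzero $Z_i$'s down to zero. A Brent-type solver therefore suffices; the bracket construction below is ours.

```python
import numpy as np
from scipy.optimize import brentq

def solve_tau(Z, z, n):
    """Solve (4.2): mean(min(Z_i^2, tau^2)) / tau^2 = z / n for tau,
    assuming z / n is below the proportion of nonzero Z_i's."""
    f = lambda tau: np.mean(np.minimum(Z ** 2, tau ** 2)) / tau ** 2 - z / n
    lo = np.min(np.abs(Z[Z != 0]))               # f(lo) > 0 by assumption
    hi = 2.0 * np.sqrt(np.mean(Z ** 2) * n / z)  # f(hi) < 0
    return brentq(f, lo, hi)
```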
For the choice of $z$: on the one hand, because it controls the confidence level according to (3.5), we should let $z$ be sufficiently large so that the estimator concentrates around the true value with high probability. On the other hand, since $z$ also appears in the deviation bound, which corresponds to the width of the confidence interval, it should not grow too fast as a function of $n$. In practice, we recommend using $z = 2\log d + \log n$ (or equivalently, $t = \log n$), a typical slowly varying function of $n$.
To implement the spectrum-wise truncated covariance estimator in practice, note that there is only one tuning parameter $\tau$, whose theoretically optimal scale is, by (3.8),
$$\tau \asymp v\sqrt{\frac{n}{\log d + t}}.$$
Motivated by the data-driven tuning scheme for the element-wise estimator, we choose $\hat\tau$ by solving the equation
$$\frac{1}{N}\sum_{1\leq i<j\leq n}\frac{\|X_i - X_j\|_2^4 \wedge \tau^2}{\tau^2} = \frac{z}{n},$$
where, as before, we take $z = \log d + \log n$.
4.2 Adaptive Huber-type M-estimator
To construct a data-driven version of the Huber-type M-estimator, we follow the same rationale as in the previous subsection. With $Z_1, \ldots, Z_N$ now denoting the half-products $\{(X_{ik}-X_{jk})(X_{il}-X_{jl})/2\}_{1\leq i<j\leq n}$, whose common mean is $\sigma_{kl}$, the optimal $\tau$ depends on $\operatorname{Var}(Z)$ instead of the second moment $\mathbb{E}(Z^2)$; it is therefore conservative to directly apply the above data-driven method in this case. Instead, we propose to calibrate $\tau$ and estimate $\theta = \sigma_{kl}$ simultaneously by solving the following system of equations
$$\frac{1}{N}\sum_{i=1}^N\frac{(Z_i - \theta)^2 \wedge \tau^2}{\tau^2} = \frac{z}{n}, \tag{4.3a}$$
$$\sum_{i=1}^N\psi_\tau(Z_i - \theta) = 0, \tag{4.3b}$$
for $\tau > 0$ and $\theta \in \mathbb{R}$. Via a similar argument, it can be shown that, for any given $\theta$, equation (4.3a) has a unique solution in $\tau$ as long as $z/n$ is smaller than the proportion of nonzero residuals $Z_i - \theta$; for any given $\tau > 0$, equation (4.3b) also has a unique solution in $\theta$. Starting with the initial estimate $\theta^{(0)} = N^{-1}\sum_{i=1}^N Z_i$, the sample estimate of $\sigma_{kl}$, we iteratively solve (4.3a) for $\tau^{(k)}$ and (4.3b) for $\theta^{(k)}$, $k = 1, 2, \ldots$, until convergence. The resulting solution, denoted by $\hat\sigma_{kl}$ with slight abuse of notation, is referred to as the adaptive Huber estimate of $\sigma_{kl}$. We then obtain the data-adaptive Huber covariance matrix estimator $\hat\Sigma^{\mathcal{AH}} = (\hat\sigma_{kl})_{1\leq k,l\leq d}$. See Algorithm 1 for the pseudocode of this data-driven procedure.
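For a single entry, the alternating scheme of Algorithm 1 can be sketched as follows; `solve_tau` is the root finder from Section 4.1 above, and the convergence tolerance is our choice.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def adaptive_huber_entry(Z, z, n, n_iter=50, tol=1e-8):
    """Alternate between (4.3a) for tau and (4.3b) for theta, starting
    from the non-robust initial estimate theta = mean(Z)."""
    theta, tau = Z.mean(), None
    for _ in range(n_iter):
        tau = solve_tau(Z - theta, z, n)         # equation (4.3a)
        risk = lambda th: np.where(              # empirical Huber risk
            np.abs(Z - th) <= tau,
            0.5 * (Z - th) ** 2,
            tau * np.abs(Z - th) - 0.5 * tau ** 2).sum()
        theta_new = minimize_scalar(risk).x      # solves (4.3b)
        if abs(theta_new - theta) < tol:
            return theta_new, tau
        theta = theta_new
    return theta, tau
```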
5 Applications to structured covariance estimation
The robustness properties of the element-wise and spectrum-wise truncation estimators are demonstrated in Theorems 3.1 and 3.2. In particular, the exponential-type concentration bounds are essential for obtaining reasonable estimators of high-dimensional structured covariance and precision matrices. In this section, we apply the proposed generic robust methods to the estimation of bandable covariance matrices, sparse precision matrices and low-rank covariance matrices, as well as to covariance estimation and multiple testing under factor models.
5.1 Bandable covariance matrix estimation
Motivated by applications such as climate studies and spectroscopy, in which there is a natural order/metric on the index set of the variables $\{1,\ldots,d\}$, one can expect that a large distance $|k - l|$ implies near independence of $X_k$ and $X_l$. We characterize this feature by the following class of bandable covariance matrices, considered in Bickel and Levina (2008a) and Cai, Zhang and Zhou (2010):
$$\mathcal{F}_\alpha = \mathcal{F}_\alpha(M_0, M) = \Big\{\Sigma \succ 0 : \max_{k}\sum_{l : |k-l| > m}|\sigma_{kl}| \leq M m^{-\alpha}\ \text{for all } m \geq 1,\ \lambda_{\max}(\Sigma) \leq M_0\Big\}. \tag{5.1}$$
Here $M_0, M > 0$ are regarded as universal constants, and the parameter $\alpha > 0$ specifies the rate at which $\sigma_{kl}$ decays to zero as $|k - l|$ grows, for each row $k$.
Under sub-Gaussian conditions, Cai, Zhang and Zhou (2010) established the minimax-optimal rate of estimation over $\mathcal{F}_\alpha$ under the spectral norm loss. More specifically, they proposed a tapering estimator based on the sample covariance matrix: $\hat\Sigma^{\mathrm{tap}} = (w_{kl}\,\hat\sigma^{\mathrm{sam}}_{kl})_{1\leq k,l\leq d}$, where the positive (even) integer $m$ specifies the bandwidth, the weights satisfy $w_{kl} = 1$ when $|k-l| \leq m/2$, $w_{kl} = 2 - 2|k-l|/m$ when $m/2 < |k-l| \leq m$, and $w_{kl} = 0$ otherwise, and $\hat\sigma^{\mathrm{sam}}_{kl}$ denotes the sample covariance. When $X$ has sub-Gaussian tails, it is shown in Cai, Zhang and Zhou (2010) that, with an optimal choice of bandwidth $m \asymp n^{1/(2\alpha+1)}$, $\hat\Sigma^{\mathrm{tap}}$ achieves the minimax rate of convergence under the spectral norm, which is of order $n^{-\alpha/(2\alpha+1)} + \{(\log d)/n\}^{1/2}$.
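For reference, the tapering weights just described can be applied to any pilot estimate of $\Sigma$; a short sketch following that weight scheme:

```python
import numpy as np

def taper(S, m):
    """Tapering weights with (even) bandwidth m: w = 1 for |k-l| <= m/2,
    linear decay to 0 at |k-l| = m, and w = 0 beyond."""
    d = S.shape[0]
    dist = np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
    mh = m // 2
    w = (np.clip(m - dist, 0, None) - np.clip(mh - dist, 0, None)) / mh
    return w * S
```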
When $X$ exhibits general heavy tails with only finite fourth moments, it is still unknown whether the (nearly) same optimal rate can be achieved over $\mathcal{F}_\alpha$ unless additional distributional assumptions are imposed, such as elliptical symmetry (Mitra and Zhang, 2014; Chen et al., 2018). To close this gap, we propose a robust covariance estimator based on the spectrum-wise truncated covariance estimator introduced in Section 3.2. We first introduce some notation and a matrix operator to facilitate the description of our procedure. Let $Y_{ij}^{(s,m)}$ be the subvector of $Y_{ij}$ given in (3.2) formed by the coordinates indexed by $\{s, s+1, \ldots, s+m-1\}$, where $s$ and $m$ in the superscript indicate the location and the dimension, respectively. Then the spectrum-wise truncated estimator of the corresponding $m \times m$ principal submatrix of $\Sigma$ is defined as
$$\hat\Sigma^{(s,m)} = \frac{1}{2N}\sum_{1\leq i<j\leq n}\psi_{\tau_{s,m}}\Big(Y_{ij}^{(s,m)}\big(Y_{ij}^{(s,m)}\big)^{\mathrm T}\Big), \tag{5.2}$$
where the robustification parameter $\tau_{s,m}$ is defined in (3.8) with $d$ replaced by $m$, and $v$ therein is defined in (3.7) with $X$ replaced by the corresponding subvector $X^{(s,m)}$.
In addition, we define an expanding operator that embeds a small matrix into a large zero matrix: for an $m \times m$ matrix $A = (a_{kl})_{1\leq k,l\leq m}$, define the $d \times d$ matrix $E_s(A) = (b_{kl})_{1\leq k,l\leq d}$, where $s$ indicates the location and
$$b_{kl} = a_{k-s+1,\,l-s+1}\,\mathbb{1}\{s \leq k \leq s+m-1,\ s \leq l \leq s+m-1\}.$$
Our robust covariance estimator is then defined by assembling the expanded submatrix estimators $E_s(\hat\Sigma^{(s,m)})$ across locations $s$, in the same way that the tapering estimator aggregates principal submatrices of the sample covariance matrix.