Bernstein–von Mises Theorems for Functionals of Covariance Matrix
Abstract
We provide a general theoretical framework to derive Bernstein–von Mises theorems for matrix functionals. The conditions on functionals and priors are explicit and easy to check. Results are obtained for various functionals, including entries of the covariance matrix, entries of the precision matrix, quadratic forms, the log-determinant, and eigenvalues in the Bayesian Gaussian covariance/precision matrix estimation setting, as well as for Bayesian linear and quadratic discriminant analysis.
Keywords. Bernstein–von Mises Theorem, Bayes Nonparametrics, Covariance Matrix.
1 Introduction
The celebrated Bernstein–von Mises (BvM) theorem [20, 3, 29, 21, 27] justifies Bayesian methods from a frequentist point of view, bridging the gap between Bayesians and frequentists. Consider a parametric model , and a prior distribution . Suppose we have i.i.d. observations from the product measure . Under some weak assumptions, the Bernstein–von Mises theorem shows that the conditional distribution of
is asymptotically under the distribution with some centering and covariance when . In a local asymptotic normal (LAN) family, the centering can be taken as the maximum likelihood estimator (MLE) and as the inverse of the Fisher information matrix. An immediate consequence of the Bernsteinvon Mises theorem is that the distributions
are asymptotically the same under the sampling distribution . Note that the first one, known as the posterior, is of interest to Bayesians, and the second one is of interest to frequentists in large-sample theory. Applications of the Bernstein–von Mises theorem include constructing confidence sets from Bayesian methods with frequentist coverage guarantees.
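As a concrete toy illustration of this equivalence (our own sketch, not an example from the paper), consider a Bernoulli model with a uniform Beta(1, 1) prior. The Beta posterior's mean and standard deviation should match the MLE and the frequentist standard error closely for large sample sizes:

```python
import math
import random

# Hedged illustration: the classical BvM phenomenon in a Bernoulli(theta)
# model with a Beta(1, 1) prior.  The posterior is Beta(1 + s, 1 + n - s);
# its mean and standard deviation should match the MLE s/n and the
# frequentist standard error sqrt(theta_hat * (1 - theta_hat) / n).
random.seed(0)
n, theta0 = 20000, 0.3
s = sum(random.random() < theta0 for _ in range(n))

theta_hat = s / n                                      # MLE (centering)
se_freq = math.sqrt(theta_hat * (1 - theta_hat) / n)   # inverse Fisher info scale

a, b = 1 + s, 1 + n - s                                # Beta posterior parameters
post_mean = a / (a + b)
post_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

print(post_mean - theta_hat)   # O(1/n) discrepancy
print(post_sd / se_freq)       # ratio close to 1
```

The posterior here is available in closed form, so no sampling is needed; the same comparison underlies the matrix-functional results of this paper, with the sample covariance playing the role of the MLE.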
Despite the success of BvM results in the classical parametric setting, little is known about the high-dimensional case, where the unknown parameter is of increasing or even infinite dimension. The pioneering works of [11] and [13] (see also [17]) showed that in general BvM may fail in nonclassical cases. Despite these negative results, further work on certain notions of nonparametric BvM provides some positive answers. See, for example, [22, 8, 9, 24]. In this paper, we consider the question of whether it is possible to have BvM results for matrix functionals, such as matrix entries and eigenvalues, when the dimension of the matrix grows with the sample size .
This paper provides some positive answers to this question. To be specific, we consider a multivariate Gaussian likelihood and put a prior on the covariance matrix. We prove that the posterior distribution has a BvM behavior for various matrix functionals, including entries of the covariance matrix, entries of the precision matrix, quadratic forms, the log-determinant, and eigenvalues. All of these conclusions are obtained from a general theoretical framework we provide in Section 2, where we propose explicit, easy-to-check conditions on both functionals and priors. We illustrate the theory with both conjugate and non-conjugate priors. A slight extension of the general framework leads to BvM results for discriminant analysis. Both linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) are considered.
This work is inspired by a growing interest in studying BvM phenomena for a low-dimensional functional of the whole parameter. That is, the asymptotic distribution of
with being a map from to , where does not grow with . A special case is the semiparametric setting, where contains both a parametric part and a nonparametric part . The functional takes the form of . Work in this field was pioneered by [19] in a right-censoring model and by [26] for a general theory in the semiparametric setting. However, the conditions provided by [26] for BvM to hold are hard to check in specific examples. To the best of our knowledge, the first general framework for semiparametric BvM with conditions cleanly stated and easy to check is the beautiful work of [7], in which recent advances in Bayes nonparametrics such as [2] and [15] are nicely absorbed. [25] proves BvM for linear functionals for which the distribution of converges to a mixture of normals instead of a normal. At the time of writing, the most up-to-date theory is due to [10], which provides conditions for BvM to hold for general functionals. The general framework we provide for matrix-functional BvM is greatly inspired by the framework developed in [10] for functionals in nonparametrics. However, the theory in this paper differs from theirs, since we can take advantage of the structure of the Gaussian likelihood and avoid unnecessary expansion and approximation. Hence, in the covariance matrix functional case, our assumptions can be significantly weaker.
The paper is organized as follows. In Section 2, we state the general theoretical framework of our results, illustrated with two priors, one conjugate and one non-conjugate. Section 3 considers specific examples of matrix functionals and the associated BvM results. The extension to discriminant analysis is developed in Section 4. Finally, we devote Section 5 to a discussion of the assumptions and possible generalizations. Most of the proofs are gathered in Section 6.
1.1 Notation
Given a matrix , we use to denote its spectral norm, and to denote its Frobenius norm. The norm , when applied to a vector, is understood to be the usual vector norm. Let be the unit sphere in . For any , we use the notation and . The probability stands for and is for . In most cases, we use to denote the covariance matrix and to denote the precision matrix (including those with superscripts or subscripts). The notation is for a generic probability, whenever the distribution is clear from the context. We use and to denote stochastic orders under the sampling distribution of the data. We use to denote generic constants throughout the paper; they may differ from line to line.
2 A General Framework
Consider i.i.d. samples drawn from , where is a covariance matrix with inverse . A Bayes method puts a prior on the precision matrix , and the posterior distribution is defined as
where is the loglikelihood of defined as
We deliberately omit the logarithmic normalizing constant in for simplicity; this does not affect the definition of the posterior distribution. Note that specifying a prior on the precision matrix is equivalent to specifying a prior on the covariance matrix . The goal of this work is to show that the asymptotic distribution of the functional under the posterior distribution is approximately normal, i.e.,
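For reference, in the mean-zero Gaussian model the log-likelihood takes the following standard form, up to the omitted constant (we write $\Omega$ for the precision matrix and $\widehat{\Sigma}_n$ for the sample covariance purely for concreteness; the paper's notation may differ):

```latex
\[
\ell_n(\Omega)
= \frac{n}{2}\log\det\Omega - \frac{1}{2}\sum_{i=1}^n X_i^\top \Omega X_i
= \frac{n}{2}\Big(\log\det\Omega - \operatorname{tr}\big(\widehat{\Sigma}_n \Omega\big)\Big),
\qquad
\widehat{\Sigma}_n = \frac{1}{n}\sum_{i=1}^n X_i X_i^\top.
\]
```

This factorization through $\widehat{\Sigma}_n$ is the structural feature of the Gaussian likelihood that the general framework below exploits.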
where , as jointly, with some appropriate centering and variance . In this paper, we choose the centering to be the sample version of , where is replaced by the sample covariance , and compare the BvM results with the classical asymptotic normality of in the frequentist sense. Other centerings , including bias corrections of the sample version, will be considered in future work.
We first provide a framework for approximately linear functionals, and then use the general theory to derive results for specific examples of priors and functionals. For clarity of presentation, we consider the cases of functionals of and functionals of separately. Though a functional of is also a functional of , we treat them separately, since some functionals may be “more linear” in than in , or the other way around.
2.1 Functional of Covariance Matrix
Let us first consider a functional of , . We assume the functional is approximately linear in a neighborhood of the truth: there is a set satisfying
(1) 
for any sequence , on which is approximately linear in the sense that there exists a symmetric matrix such that
(2) 
The main result is stated in the following theorem.
Theorem 2.1.
Under the assumptions of (2) and , if for a given prior , the following two conditions are satisfied:

,

For any fixed , for the perturbed precision matrix
then
where .
The theorem gives explicit conditions on both the prior and the functional. The first condition says that the posterior distribution concentrates on a neighborhood of the truth under the spectral norm, on which the functional is approximately linear. The second condition says that the bias caused by the shifted parameter can be absorbed by the posterior distribution. Under both conditions, Theorem 2.1 shows that the asymptotic posterior distribution of is
2.2 Functional of Precision Matrix
We state a corresponding theorem for functionals of the precision matrix in this section. The condition for linear approximation is slightly different. Consider the functional . Let be a set satisfying
(3) 
for some integer and any sequence . We assume the functional is approximately linear on in the sense that there exists a symmetric matrix satisfying , such that
(4) 
The main result is stated in the following theorem.
Theorem 2.2.
Under the assumptions of (4), and , if for a given prior , the following conditions are satisfied:

,

For any fixed , for the perturbed precision matrix
then
where .
2.3 Priors
In this section, we provide examples of priors. In particular, we consider both a conjugate prior and a non-conjugate prior. Note that the result for a conjugate prior can be derived by directly exploiting the posterior form, without applying our general theory. However, the general framework provided in this paper handles both conjugate and non-conjugate priors in a unified way.
Wishart Prior
Consider the Wishart prior on with density function
(5) 
supported on the set of symmetric positive semidefinite matrices.
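For intuition, the conjugacy behind this prior can be made explicit. Under the common parametrization with density $p(\Omega) \propto |\Omega|^{(k-p-1)/2} \exp\{-\tfrac{1}{2}\operatorname{tr}(V^{-1}\Omega)\}$ (the paper's parametrization in (5) may differ), a standard computation gives:

```latex
\[
\Omega \sim W_p(k, V), \quad
X_1,\dots,X_n \mid \Omega \overset{\text{i.i.d.}}{\sim} N(0, \Omega^{-1})
\;\Longrightarrow\;
\Omega \mid X_{1:n} \sim W_p\Big(k + n,\; \big(V^{-1} + n\widehat{\Sigma}_n\big)^{-1}\Big),
\]
```

where $\widehat{\Sigma}_n$ is the sample covariance. The posterior is thus again Wishart, which is why results for this prior could also be derived by direct computation; the point of the general framework is that the same conditions cover priors without this closed form.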
Lemma 2.1.
Gaussian Prior
Consider the Gaussian prior on with density function
(6) 
supported on the following set
for some constant .
Lemma 2.2.
3 Examples of Matrix Functionals
We consider various examples of functionals in this section. The two conditions of Theorem 2.1 and Theorem 2.2 are satisfied by the Wishart prior and the Gaussian prior, as shown in Lemma 2.1 and Lemma 2.2, respectively. Hence, it is sufficient to check the approximate linearity of the functional with respect to or for the BvM result to hold. Among the four examples we consider, the first two are exactly linear and the last two are approximately linear. In the examples below, is always a random variable distributed as .
3.1 Entrywise Functional
We consider the entrywise functionals and . Note that these two functionals are linear with respect to and , respectively. For , we write
where the matrix is the th basis element of , with one in its th entry and zeros elsewhere. For , we write
Note that . Hence, the corresponding matrices and in the linear expansion of and are . In view of Theorem 2.1 and Theorem 2.2, the asymptotic variance for is
The asymptotic variance for is
Plugging these quantities into Theorem 2.1, Theorem 2.2, Lemma 2.1, and Lemma 2.2, we have the following Bernstein–von Mises results.
Corollary 3.1.
Consider the Wishart prior in (5) with integer . Assume and , then we have
where is the th element of the sample covariance . If we additionally assume , then
where is the th element of .
3.2 Quadratic Form
Consider the functional and for some . Therefore, the corresponding matrices and are . It is easy to see that . The asymptotic variances are
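The asymptotic variance for the covariance-side quadratic form can be seen directly (a sketch under the mean-zero Gaussian model of Section 2, with $\widehat{\Sigma}$ the sample covariance):

```latex
\[
v^\top \widehat{\Sigma} v = \frac{1}{n}\sum_{i=1}^n (v^\top X_i)^2,
\qquad v^\top X_i \overset{\text{i.i.d.}}{\sim} N\big(0,\, v^\top \Sigma v\big),
\]
so that $n\, v^\top\widehat{\Sigma} v / (v^\top\Sigma v) \sim \chi^2_n$, and hence
\[
\sqrt{n}\,\big(v^\top \widehat{\Sigma} v - v^\top \Sigma v\big)
\rightsquigarrow N\big(0,\; 2\,(v^\top\Sigma v)^2\big).
\]
```

The BvM results below assert that the posterior distribution of the quadratic form, centered at the sample version, matches this same limit.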
Plugging these representations into Theorem 2.1, Theorem 2.2, Lemma 2.1, and Lemma 2.2, we have the following Bernstein–von Mises results.
Corollary 3.3.
Consider the Wishart prior in (5) with integer . Assume and , then we have
If we additionally assume , then
Corollary 3.4.
Remark 3.1.
The entrywise functional and the quadratic form are both special cases of the functional for some . It is straightforward to apply the general framework to this functional and obtain the result
Similarly, for the functional for some , we have
Both results can be derived under the same conditions of Corollary 3.3 and Corollary 3.4.
3.3 Log Determinant
In this section, we consider the log-determinant functional. That is, . Unlike the entrywise functional and the quadratic form, we do not need to consider because of the simple observation
The following lemma establishes the approximate linearity of .
Lemma 3.1.
Assume and , then for any , we have
By Lemma 3.1, the corresponding matrix is . The asymptotic variance of is
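The linearization underlying Lemma 3.1 can be checked numerically (our own illustration, not the paper's proof): for a small symmetric perturbation $E$ of a positive definite $S$, $\log\det(S+E) - \log\det S \approx \operatorname{tr}(S^{-1}E)$, with error quadratic in the perturbation.

```python
import math

# First-order expansion of log-det: log det(S+E) - log det(S) ~ tr(S^{-1} E),
# verified for a 2x2 positive definite S with hand-coded det and inverse.

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

S = [[2.0, 0.3], [0.3, 1.0]]
t = 1e-4
E = [[t, 2 * t], [2 * t, -t]]                     # small symmetric perturbation
Spert = [[S[i][j] + E[i][j] for j in range(2)] for i in range(2)]

exact = math.log(det2(Spert)) - math.log(det2(S))
Sinv = inv2(S)
linear = sum(Sinv[i][j] * E[j][i] for i in range(2) for j in range(2))  # tr(S^{-1} E)

print(exact, linear)   # agree up to O(t^2)
```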
Corollary 3.5.
Consider the Wishart prior in (5) with integer . Assume and , then we have
where is the sample covariance matrix.
Proof.
Corollary 3.6.
Consider the Gaussian prior in (6). Assume and , then we have
where is the sample covariance matrix.
Proof.
The proof of this corollary is the same as that of the previous corollary for the Wishart prior. The only difference is that the choice of , according to the proof of Lemma 2.2, is
for some . Therefore,
for some under the assumption, and the approximate linearity holds. ∎
One immediate consequence of this result is a Bernstein–von Mises result for the entropy functional, defined as
It then follows directly that
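For concreteness, the differential entropy of a $p$-variate Gaussian is given by the standard formula (the paper's exact definition may differ by additive constants):

```latex
\[
H(\Sigma) \;=\; \frac{p}{2}\log(2\pi e) + \frac{1}{2}\log\det\Sigma,
\]
```

so a BvM result for $\log\det\Sigma$ immediately yields one for $H(\Sigma)$, after centering at the sample version and scaling the limiting variance by $1/4$.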
3.4 Eigenvalues
In this section, we consider the eigenvalue functional. In particular, let be the eigenvalues of the matrix in decreasing order. We investigate the posterior distribution of for each . Define the eigengap
The asymptotic order of plays an important role in the theory. The following lemma characterizes the approximate linearity of .
Lemma 3.2.
Assume and , then for any , we have
where is the th eigenvector of .
Lemma 3.2 implies that the corresponding in the linear expansion of is , and the asymptotic variance is
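The first-order eigenvalue perturbation behind Lemma 3.2 can be checked numerically (our own illustration, not the paper's proof): for a simple eigenvalue with unit eigenvector $u$, $\lambda(S+E) \approx \lambda(S) + u^\top E u$, with error controlled by the eigengap.

```python
import math

# 2x2 symmetric eigen-perturbation check, closed form (no linear algebra library).

def top_eig2(m):
    """Largest eigenvalue and unit eigenvector of a symmetric 2x2 matrix."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    u = (b, lam - a)                    # eigenvector (valid since b != 0 here)
    norm = math.hypot(*u)
    return lam, (u[0] / norm, u[1] / norm)

S = [[3.0, 0.8], [0.8, 1.0]]
t = 1e-4
E = [[2 * t, -t], [-t, t]]              # small symmetric perturbation
Spert = [[S[i][j] + E[i][j] for j in range(2)] for i in range(2)]

lam, u = top_eig2(S)
lam_pert, _ = top_eig2(Spert)
linear = sum(u[i] * E[i][j] * u[j] for i in range(2) for j in range(2))  # u^T E u

print(lam_pert - lam, linear)   # agree up to O(t^2 / eigengap)
```

The role of the eigengap is visible here: as the two eigenvalues of $S$ approach each other, the second-order error in this expansion blows up, which is why the asymptotic order of the eigengap enters the theory.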
We also consider eigenvalues of the precision matrix. With slight abuse of notation, we define the eigengap of to be
The approximate linearity of is established in the following lemma.
Lemma 3.3.
Assume , then for any , we have
where is the th eigenvector of .
Similarly, Lemma 3.3 implies that the corresponding in the linear expansion of is , and the asymptotic variance is
Plugging the above lemmas into our general framework, we get the following corollaries.
Corollary 3.7.
Consider the Wishart prior in (5) with integer . Assume and , then we have
where is the sample covariance matrix. If we instead assume with being the eigengap of , then
Proof.
Corollary 3.8.
Consider the Gaussian prior in (6). Assume and , then we have
where is the sample covariance matrix. If we instead assume with being the eigengap of , then
Proof.
We only need to check the approximate linearity. According to Lemma 2.2, the choice of is