Finite Dimensional Statistical Inference
Abstract
In this paper, we derive the explicit series expansion of the eigenvalue distribution of various models, namely the case of noncentral Wishart distributions, as well as correlated zero mean Wishart distributions. The tools used extend those of the free probability framework, which have been quite successful for high dimensional statistical inference (when the size of the matrices tends to infinity), also known as free deconvolution. This contribution focuses on the finite Gaussian case and proposes algorithmic methods to compute the moments. Cases where asymptotic results fail to apply are also discussed.
Gaussian matrices, Random Matrices, convolution, limiting eigenvalue distribution.
I Introduction
Random matrix and free probability theory have fruitful applications in many fields of research, such as digital communication [1], mathematical finance [2] and nuclear physics [3]. In particular, the free probability framework [4, 5, 6, 7, 8] can be used for high dimensional statistical inference (or free deconvolution), i.e., to retrieve the eigenvalue distributions of involved functionals of random matrices. The general idea of deconvolution is related to the following problem [9]:
Given , two independent square Hermitian (or symmetric)
random matrices:
1) Can one derive the eigenvalue distribution of from the ones of and ?
If feasible in the large limit, this operation is named additive free deconvolution,
2) Can one derive the eigenvalue distribution of from the ones
of and ? If feasible in the large limit, this operation is named multiplicative free deconvolution.
In the literature, deconvolution for the large limit has been studied, and the methods generally used to compute it are the method of moments [4], and the Stieltjes transform method [10]. The expressions turn out to be quite simple if some kind of asymptotic freeness [8] of the matrices involved is assumed. However, freeness usually does not hold for finite matrices. Quite remarkably, the method of moments can still be used to propose an algorithmic method to compute these operations. The goal of this contribution is exactly to propose a general finite dimensional statistical inference framework based on the method of moments, which is implemented in software. As the calculations are quite tedious, and for sake of clarity, we focus in this contribution on Gaussian matrices^{1}^{1}1Cases such as Vandermonde matrices can also be implemented in the same vein [11, 12]. The general case is, however, more difficult..
The method of moments [9] is based on the relations between the moments of the different matrices involved. It provides a series expansion of the eigenvalue distribution of the involved matrices. For a given random matrix , the th moment of is defined as
(1) 
where is the expectation, the normalized trace, and the associated empirical mean measure defined by , where are the eigenvalues of . Quite remarkably, when , converges in many cases almost surely to an analytical expression that depends only on some specific parameters of (such as the distribution of its entries)^{2}^{2}2Note that in the following, when speaking of moments of matrices, we refer to the moments of the associated measure.. This enables to reduce the dimensionality of the problem and simplifies the computation of convolution of measures. In recent works deconvolution has been analyzed when for some particular matrices and , such as when and are free [13], or random Vandermonde and diagonal [11, 12].
The inference framework described in this contribution is based on the method of moments in the finite case: it takes a set of moments as input, and produces a set of moments as output, with the dimensions of the matrices considered finite. The framework is flexible enough to allow for repeated combinations of the random matrices we consider, and the patterns in such combinations are reflected nicely in the algorithms. The framework also lends itself naturally to combinations with other types of random matrices, for which support has already been implemented in the framework [12]. This flexibility, exploited with the method of moments, is somewhat in contrast to methods such as the Stieltjes transform method [10], where combining patterns of matrices naturally leads to more complex equations for the Stieltjes transforms (when possible) and can only be performed in the large limit. While the simplest patterns we consider are sums and products, we also consider products of many independent matrices. The algorithms are based on iterations through partitions and permutations as in [14], where the case of a Wishart matrix was considered. Our methods build heavily on the simple form which the moments of complex Gaussian random variables have, as exploited in [14]. We remark that, in certain cases, it is possible to implement the method of moments in a different way also [15, 16]. However, we are not aware of any attempts to make an inference framework as general as the one presented here. The case presented in [16], for instance, handles only certain zeromean, onesided correlated Wishart matrices.
The paper is organized as follows. Section II provides background essentials on random matrix theory and combinatorics needed to state the main results. Parts of Section II is rather technical, but it is not necessary to understand all details therein to understand the statement of the main results. These are summarized in Section III. First, algorithms for the simplest patterns (sums and products of random matrices) in the finite dimensional statistical inference framework are presented. Then, recursive algorithms for products of many Wishart matrices and a deterministic matrix are included, as well with some general remarks on how the general situation can be attacked from these basic algorithms. We then explain how algorithms for deconvolution can be obtained within the same framework, and formalize the corresponding moment estimators. Section IV presents details on the software implementation of the finite dimensional statistical inference framework. Section V presents some simulations and useful applications showing the implications of the presented results in various applied fields.
Ii Random matrix Background Essentials
In the following, upper boldface symbols will be used for matrices, whereas lower symbols will represent scalar values. will denote the transpose operator, conjugation, and hermitian transpose. will represent the identity matrix. We let be the (nonnormalized) trace for square matrices, defined by,
where are the diagonal elements of the matrix . We also let be the normalized trace, defined by . When is nonrandom, there is of course no need to take the expectation in (1). will in general be used to denote such nonrandom matrices, and if are such matrices, we will write
(2) 
whenever . (2) are also called mixed moments.
To state the results of this paper, random matrix concepts will be combined with concepts from partition theory. will denote the partitions of . For a partition , denote its blocks, while denotes the number of blocks. We will write when and belong to the same block of . Partition notation is adapted to mixed moments in the following way:
Definition 1
For , with , we define
(3)  
(4) 
With the empirical eigenvalue distribution of a hermitian random matrix , we mean the (random) function
(5) 
where are the (random) eigenvalues of . In many cases, the moments determine the distribution of the eigenvalues [18]. Due to the expectation in (1), the results in this paper thus apply to the mean eigenvalue distribution of certain random matrices.
In the following, we will denote a standard complex Gaussian matrix by . Standard complex means that the matrix has i.i.d. complex Gaussian entries, with zero mean and unit variance (in particular, the real and imaginary parts of the entries are independent, each with mean , variance ). will sometimes also be used to denote a standard selfadjoint Gaussian matrix, standard selfadjoint meaning that it has i.i.d. entries only above or on the main diagonal, with the real and imaginary parts independent with variance [8]. The matrix sizes in the following will be denoted for rectangular matrices, for square matrices. All random matrices we consider will be using selfadjoint or complex Gaussian matrices as building blocks.
Iia The diagrammatic method
Some schools of science learn methods for computing the moments of Gaussian matrices from diagrams. As an example of what we mean by this, we have in Figure 1 demonstrated how the second moment of a Wishart matrix can be found in this way.
In Figure 2 we have similarly demonstrated how the second moment of a matrix on the form can be found, where and are independent from .
This matrix form is much used when combining many observations of a random vector. While these two figures assume a complex Gaussian matrix, Figure 3 explains how the diagrammatic method can be modified to compute the second moment of , where is selfadjoint, Gaussian, and is independent from it.
The diagrammatic method is easily generalized to higher moments, and to other random matrix models where Gaussian matrices are building blocks.
The simple ingredient behind the diagrammatic method is the fact that one only needs consider conjugate pairings of complex Gaussian elements [14], which simplifies the computation of moments to simple identification of edges in graphs in all possible ways, as illustrated. This simple fact will be formalized in the following combinatorial definitions, which will be needed for the main results. The stated formulas are not new, since it has been known for quite some time that the diagrammatic method can be used to obtain them. The value in this paper therefore does not lie in these formulas, but rather in making the general results possible to write down within a framework, and available for computation in terms of an accompanying software implementation.
Without going in all the details, there are similarities with the sketched diagrammatic approach, and other approaches based on diagrammatics. In particular in physics, and especially the field of statistical mechanics (see e.g. [19, 20]). It has been used recently in the field of wireless communications, related to the analysis of the mean and the variance of the Signal to Noise Ratio at the output of the MMSE receiver in MIMO and OFDMCDMA systems [21]. Instead of calculating all the moments individually, one can represent these operations diagrammatically by solid lines and dashed lines. The idea is to draw them using Feynman rules derived from a generating function, and perform a resummation of all relevant graphs where averaging over matrices corresponds to connecting in all possible ways the different lines seperately. In many cases, in the large limit, only terms with noncrossing lines survive, A general description is proposed in [22, 23, 24]. The nomenclature we use for stating our results deviate some from that found in the literature.
To explain better how the diagrammatic method is connected to random matrices, write the trace of a product of matrices as
(6)  
where the entries of are . We will visualize the matrix indices as points (in the following also called vertices) on a circle, and the matrix entries as edges labeled , with the points being the end points of the edge labeled . We will call this the circular representation of (6). If is standard, complex, Gaussian, the circular representation of , before any Gaussian pairings have taken place, is thus shown in Figure 4.
More general than (6), we can have a product of traces,
(7) 
This will also be given an interpretation in terms of circles, with points/edges on each, respectively. Conjugate pairings of complex Gaussian elements in (7) in all possible ways are performed as in the case of one circle, and is illustrated in Figure 5 for and , with the first edge on the first cirle paired with the last edge on the second circle. In (7), this corresponds to and being conjugate of each other (there are twelve matrices present here,since ). This can only be the case if and .
IiB Formalizing the diagrammatic method
The following definition, which is a generalization from [14], formalizes identifications of edges, as we have illustrated:
Definition 2
Let be a positive integer. By a partial permutation we mean a onetoone mapping between two subsets of . We denote by the set of partial permutations of elements. When , we define by
Note that in this definition, subtraction is performed in such a way that the result stays within the same circle. In terms of Figure 5, this means that and . Addition is assumed to be performed in the same way, so that , and . In the following, this convention for addition and subtraction will be used, and the number of edges on the circles will be implicitly assumed, and only mentioned when strictly needed.
When we compute , we multiply out to obtain a sum of terms of length on the form (6), where the terms are one of , or , with  and terms appearing in alternating order. corresponds to the indices of in such a term (after their order of appearance), to the indices of , and to the Gaussian conjugate pairings. Computing thus boils down to iterating through . In Figure 2, this was examplified with , with the sizes of the subsets equal to . was indicated by the starting edges of the arrows, by the ending edges. (a) to (d) represents the only possible pairings in the terms , , , and , respectively. The case for a Wishart matrix is simpler, since there is no need to multiply out terms, and we need only consider where , as shown in Figure 1. Such are in onetoone correspondence with , the set of permutations of elements.
It is clear that maps onto , and has period two (i.e. for all ), where . In particular, maps even numbers to odd numbers, and vice versa. When edges are identified as dictated by a partial permutation, vertices are identified as dictated by the partition defined as follows:
Definition 3
Let be a partial permutation, and let be determined by and a pairing between them. We associate to an equivalence relation on generated by
(8) 
We let and denote the number of blocks of consisting of only even or odd numbers, respectively.
Any block in consists either of even numbers, or odd numbers, since maps between even and odd numbers, so that the definitions of and above make sense. In the following, we will let be the cardinalities of the blocks consisting of even numbers only, and the cardinalities of the blocks consisting of odd numbers only. The restriction of to the odd numbers thus defines another partition, which we will denote . Similarly, the restriction of to the even numbers yields another partition, which we will denote . and will appear in the main results later on.
should be interpreted as an equivalence relation on matrix indices occuring in . The following definition similarly keeps track of how conjugate pairings group matrix indices occuring in into traces:
Definition 4
Let be the set of deterministic edges (i.e. edges corresponding to ocurrences of ), and let be determined by . is defined as the equivalence relation on generated by the relations
if  (9)  
if  (10) 
Let also be the number of blocks of contained within the even numbers which intersect , and let be the number of blocks of contained within the odd numbers which intersect .
Two edges from belong to the same block of if, after identifying edges, they are connected with a path of edges from . A block of which contains a vertex from corresponds to a matrix index which occurs in a deterministic element. As an example, in Figure 2 all four partial permutations are seen to give rise to a with one block only. They are seen to be for (a), for (b), for (c), and for (d).
IiC Formalizing the diagrammatic method for selfadjoint matrices
A standard, selfadjoint, Gaussian random matrix can be written on the form , where is an standard complex Gaussian matrix. We can thus compute the moments of and (with selfadjoint Gaussian, and selfadjoint and independent from ) by substituting this, and summing over all possible combinations of and . This rewriting in terms of complex Gaussian matrices means that we need to slightly change the definitions of the partitions and to the following:
Definition 5
Let be determined by disjoint subsets of with (in particular, ). We associate to an equivalence relation on generated by
for  
for 
As before, corresponds to choices from , to choices from , when the selfadjoint Gaussian matrix is expressed as a sum of complex Gaussian matrices. Definition 4 is modified as follows:
Definition 6
With as in Definition 5, is defined as the equivalence relation on generated by the relations
if  (11)  
if  (12)  
Define also as the number of blocks of which intersect .
As an example, In Figure 3(c) we have that , , in Figure 3(d) we have that , is the empty partition.
In the following, we will state our results in terms of normalized traces. We remark that some of these have been stated previously in terms of nonnormalized traces [14]. In some results, we have substituted , which makes the results compatible with the asymptotic case often used in the literature, where and grow to infinity at the same rate, the rate being . In the following, equivalence relations will interchangeably also be refered to as partitions.
Iii Statement of main results
The main results of the paper are split into three sections. In the first, basic sums and products are considered, basic meaning that there is only one random matrix involved. In the second section we expand to the case where independent random matrices are involved, in which case expectations of products of traces are brought into the picture. In these two sections, all Gaussian matrices are assumed complex and rectangular, for which the results relate to the moments of the singular law of the matrices. In the third section we state similar results for the case where the Gaussian matrices instead are assumed square and selfadjoint.
Iiia Basic sums and products
Our first and simplest result concerns the moments of a doubly correlated Wishart matrix. These matrices are the most general known form we have found which have been considered in the literature [25], which can be addressed by our results:
Theorem 1
Let be positive integers, be standard, complex, Gaussian, and a (deterministic) matrix, a (deterministic) matrix. For any positive integer ,
Theorem 1 is proved in Appendix A. The next results will be proved using the same techniques, and will therefore be given shorter proofs. The special case of a product of a Wishart matrix and a deterministic matrix, and a Wishart matrix itself, can be considered as a special case. It is seen that, for the latter, the contribution for the identification of edges refered to in Figure 1 is equal to , which indeed depends only on , and the number of evenlabeled and oddlabeled vertices in the resulting graphs. As an example, the contributions from the two possible identifications of edges giving the second moment of a Wishart matrix is (Figure 1(a)) and (Figure 1(b)). The second moment is thus , which also can be infered from the more general formuals in Section IV.
We remark also that other closed forms of (1) can be found in the literature. When the Wishart matrices are onesided correlated (i.e. ), [16] gives us the means to find the first order moments (i.e. one circle only is involved) in certain cases, also if the ’th moment is replaced with more general functionals of . It seems, however, that this result and the techniques used to prove it are hard to generalize.
We now turn to the moments of . In the large limit, the case where is related to the concept of rectangular free convolution [26], which admits a nice implementation in terms of moments [27]. When and are finite, the following will be proved in Appendix B.
Theorem 2
Let be an standard, complex, Gaussian matrix, deterministic matrices, and set . We have that
(13)  
Note that in Theorem 2,  and terms have not been grouped together. This has been done to make clear in the proof the origin of the different terms.
IiiB Expectations of products of traces
Theorems 1 and 2 can be recursively applied, once one replaces and with random matrices. In this process, we will see that expectations of products of traces are also needed, not only the first order moments as in theorems 1 and 2. The recursive version of Theorem 1 looks as follows.
Theorem 3
Assume that the random matrix and the random matrix are both independent from the standard, complex, Gaussian matrix , and define
Set , and let as before be the cardinalities of the blocks of odd numbers only of , be the cardinalities of the blocks of even numbers only of , with , the number of blocks consisting of even and odd numbers only, respectively. We have that
(14)  
There are only two differences from Theorem 1. First, is replaced with , since we now are taking traces instead of (we modify with additional trace normalization factors). Second, we replace the trace of a deterministic matrix with the expectation of a random matrix. It is clear that the only additional thing needed for the proof to be replicated is that the random matrices and are independent.
In some cases, for instance when , Theorem 3 also allows for deconvolution. By this we mean that we can write down unbiased estimators for , (which is the simplified notation for the mixed moments for the case where ) from an observation of . This is achieved by stating all possible equations in (14) (i.e. for all possible ), and noting that these express a linear relationship between all , and all , where there are as many equations are unknowns. The implementation presented in Section IV thus performs deconvolution by constructing the matrix corresponding to (14), and applying the inverse of this to the aggregate vector of all mixed moments of the observation. We remark that the inverse may not exist if , as will also be seen from expressions for these matrices in Section IV.
It is also clear that the theorem can be recursively applied to compute the moments of any product of independent Wishart matrices
(15) 
where is deterministic and is an standard complex Gaussian matrix. The ’s during these recursions will simply be
Unbiased estimators for the moments of from observations of the form (15) can also be written down. Such deconvolution is a multistage process, where each stage corresponds to multiplication with an inverse matrix, as in the case where only one Wishart matrix is involved.
The recursive version of Theorem 2 looks as follows.
Theorem 4
Let be an standard, complex, Gaussian matrix and let be and independent from . Set
We have that
(16) 
where are the cardinalities of the blocks of , divided by .
The proof is omitted, since it follows in the same way Theorem 1 was generalized to Theorem 3 above. Deconvolution is also possible here. It is in fact simpler than for Theorem 3, in that there is no need to form the inverse of a matrix [17]. This is explained further in the implementation presented in Section IV.
IiiC Selfadjoint Gaussian matrices
The analogues of Theorem 1 and Theorem 2 when the Gaussian matrices instead are selfadjoint look as follows. Since Theorem 1 and Theorem 2 had straightforward generalizations to the case where all matrices are random, we will here assume from the start that all matrices are random:
Theorem 5
Assume that the random matrix is independent from the standard selfadjoint Gaussian matrix , and define
Set , and let be the cardinalities of the blocks of . We have that
(17) 
The proof follows in the same way as the proofs in Appendix A and B. We therefore only give the following quick description on how the terms in (17) can be identified:

comes from the normalizing factors in ,

comes from replacing the nonnormalized traces with the normalized traces to obtain ,

comes from the normalizing factors in ,

comes from the traces taken in .
Similarly, the result for sums involving selfadjoint matrices takes the following form:
Theorem 6
Let be an standard selfadjoint Gaussian matrix, and let be and independent from . Set
Set , and let be the cardinalities of the blocks of from Definition 6. We have that
(18) 
The items in (18) are identified as follows:

comes from the normalizing factors in the choices of ,

comes from the normalizing factors in the choices of ,

comes from counting the vertices which do not come from applications of (12),

comes from the traces taken in ,

comes from replacing the nonnormalized traces with the normalized traces to obtain .
Recursive application of theorems 3, 4, 5, and 6, allows us to compute moments of most combinations of independent (selfadjoint or complex) Gaussian random matrices and deterministic matrices, in any order, and allows for deconvolution in the way explained. This type of flexibility makes the method of moments somewhat different from that of the Stieltjes transform, where expressions grow more complex, when the model grows more complex. Moreover, contrary to methods based on the Stieltjes transform, the results scale in terms of the number of moments: from a given number of moments, they enable us to compute the same number of output moments. The theorems also enable us to compute second order moments (i.e., covariances of traces) for many types of matrices, using the same type of results. Asymptotic properties of such second order moments have previously been studied [28, 29, 30]. While previous papers allow us to compute such moments and second order moments asymptotically, in many cases the exact result is needed.
Iv Software implementation
Theorems 3, 4, 5, and 6 present rather complex formulas. However, it is also clear that they are implementable: all that is required is traversing subsets (), permutations (), and implement the equivalence relations from . Code in Matlab for doing so has been implemented for this paper [31], as well as the equivalence relations we have defined. Also, the implementation stores results from traversing all partitions in matrices, and this traversal is performed only once. Our formulas are thus implemented by multiplying the vectors of mixed moments with a precomputed matrix. These operations are also vectorized, so that they can be applied to many observations simultaneously (each vector of mixed moments is stored as a column in a matrix, and one larger matrix multiplication is performed). Representing the operations through matrices also addresses more complex models, since many steps of matrix multiplication are easily combined. In [32], documentation of all public functions in this library can be found, as well as how our methods for Gaussian matrices can be combined with other types of matrices. The software can also generate formulas directly in LaTeX, in addition to performing the convolution or deconvolution numerically in terms of a set of input moments. All formulas in this section have in fact been automatically generated by this implementation. For products, we have written down the matrices needed for convolution and deconvolution, as described previously. For sums, we have only generated the expressions for the first moments. Due to the complexity of the expressions, it is not recommended to compute these by hand.
Iva Automatically generated formulas for theorems 3 and 2
We obtain the following expression for the first three moments in Theorem 3, where (we consider only onesided correlated Wishart matrices) and are as in that theorem:
More generally, in order to compute the moments of products of Wishart matrices, we need to compute matrices as above for the different sizes of the different Wishart matrices, and multiply these. In Section V, we will see an example where two Gaussian matrices are multiplied. Note that the matrices from above are not invertible when .
IvB Automatically generated formulas for theorems 5 and 6
We obtain the following expression for the first four moments in Theorem 5, where and are as in that theorem:
(since ). The implementation is also able to generate the expected moments of the product of any number deterministic matrices, independent, selfadjoint (or complex) Gaussian matrices, in any order [32]. This is achieved by constructing the matrices for the selfadjoint and complex cases as above, and multiplying the corresponding matrices together in the right order.
V Applications
In this section, we consider some wireless communications examples where the presented inference framework is used.
Va MIMO rate estimation
In many MIMO (Multiple Input Multiple Output) antenna based sounding and MIMO channel modelling applications, one is interested in obtaining an estimator of the rate in a noisy and mobile environment. In this setting, one has noisy observations of the channel , where is an deterministic channel matrix, is an standard, complex, Gaussian matrix representing the noise, and is the noise variance. The channel is supposed to stay constant during symbols. The rate estimator is given by