Channel Capacity Estimation using Free Probability Theory

Øyvind Ryan and Mérouane Debbah
This work was supported by Alcatel-Lucent within the Alcatel-Lucent Chair on flexible radio at SUPELEC.

This paper was presented in part at the Asilomar Conference on Signals, Systems and Computers, 2007, Pacific Grove, USA.

Øyvind Ryan is with the Centre of Mathematics for Applications, University of Oslo, P.O. Box 1053 Blindern, NO-0316 Oslo, NORWAY, oyvindry@ifi.uio.no.

Mérouane Debbah is with SUPELEC, Gif-sur-Yvette, France, merouane.debbah@supelec.fr.

Abstract

In many channel measurement applications, one needs to estimate some characteristics of the channels based on a limited set of measurements. This is mainly due to the highly time varying characteristics of the channel. In this contribution, it is shown how free probability can be used for channel capacity estimation in MIMO systems. Free probability has already been applied in various fields such as digital communications, nuclear physics and mathematical finance, and has been shown to be an invaluable tool for describing the asymptotic behaviour of many large-dimensional systems. In particular, using the concept of free deconvolution, we provide a capacity estimator for MIMO channels impaired with noise, called the free probability based estimator, which is asymptotically unbiased w.r.t. the number of observations. Another estimator, called the Gaussian matrix mean based estimator, is introduced by slightly modifying the free probability based estimator. This estimator is shown to give unbiased estimates of the moments of the channel matrix for any number of observations. It retains this property when we extend to MIMO channels with phase off-set and frequency drift, for which no estimator has been provided so far in the literature. It is also shown that both the free probability based and the Gaussian matrix mean based estimators are asymptotically unbiased capacity estimators as the number of transmit antennas goes to infinity, regardless of whether phase off-set and frequency drift are present. The limitations of the two estimators are also explained. Simulations are run to assess the performance of the estimators for a low number of antennas and samples, and to confirm the usefulness of the asymptotic results.

Index Terms: Free probability theory, random matrices, deconvolution, limiting eigenvalue distribution, MIMO.

I Introduction

Random matrices, and in particular limit distributions of sample covariance matrices, have proved to be a useful tool for modelling systems, for instance in digital communications [1], nuclear physics [2] and mathematical finance [3]. A typical random matrix model is the information-plus-noise model,

$$\mathbf{W}_n = \frac{1}{N}\left(\mathbf{R}_n + \sigma \mathbf{X}_n\right)\left(\mathbf{R}_n + \sigma \mathbf{X}_n\right)^H. \qquad (1)$$

$\mathbf{R}_n$ and $\mathbf{X}_n$ are assumed independent random matrices of dimension $n \times N$, where $\mathbf{X}_n$ contains i.i.d. standard (i.e. mean $0$, variance $1$) complex Gaussian entries. (1) can be thought of as the sample covariance matrix of random vectors $\mathbf{r}_n + \sigma \mathbf{x}_n$. $\mathbf{r}_n$ can be interpreted as a vector containing the system characteristics (direction of arrival for instance in radar applications or impulse response in channel estimation applications). $\mathbf{x}_n$ represents additive noise, with $\sigma$ a measure of the strength of the noise. Classical signal processing estimation methods consider the case where the number of observations $N$ is much larger than the dimension of the system $n$, for which equation (1) can be shown to be approximately

$$\mathbf{W}_n \approx \boldsymbol{\Gamma}_n + \sigma^2 \mathbf{I}_n. \qquad (2)$$

Here, $\boldsymbol{\Gamma}_n$ is the true covariance of the signal. In this case, one can separate the signal eigenvalues from the noise ones and infer (based only on the statistics of the signal) the characteristics of the input signal. However, in many situations, one can gather only a limited number of observations during which the characteristics of the signal do not change. In order to model this case, $n$ and $N$ will be increased so that $\lim_{n \to \infty} \frac{n}{N} = c$, i.e. the number of observations is increased at the same rate as the number of parameters of the system (note that equation (2) corresponds to the case $c = 0$).

Previous contributions have already dealt with this problem. In [4], Dozier and Silverstein explain how the eigenvalue distribution of the sample covariance matrix in (1) and that of the underlying signal covariance can be related by solving a given equation. In [5, 6], we provided an algorithm for passing between the two, using the concept of multiplicative free convolution, which admits a convenient implementation. The implementation performs free convolution exactly, based solely on moments.

In this paper, channel capacity estimation in MIMO systems is used as a benchmark application, exploiting the connection between free probability theory and systems of type (1). For MIMO channels with and without frequency off-sets, we derive explicit asymptotically unbiased estimators. We do not prove directly that the proposed estimators work better than the classical ones, but present simulations which indicate that they are superior. We remark that the proposed capacity estimators are not unbiased: either the number of transmit antennas or the number of observations must be large to obtain precise estimation. This limitation is most severe for channels with frequency off-sets, where the number of transmit antennas must in any case be large to obtain precise estimation. A case study in which free deconvolution is used for channel estimation can be found in [7] and [8].

This paper is organized as follows. Section II presents the problem under consideration. Section III provides the basic concepts needed on free probability, including free convolution. In section IV, we formalize a new channel capacity estimator based on free probability, and explain some of its shortcomings for MIMO models with frequency off-sets. Another estimator, called the Gaussian matrix mean based estimator, is then formalized to address the shortcomings of the free probability based estimator. We also present arguments for the Gaussian matrix mean based estimator performing better than the free probability based estimator in some specific cases. These arguments are, however, not definitive; we do not prove that one estimator is better than the other for the cases considered. The limitations of the estimators are also explained. The low rank of the channel (less than or equal to four) is the most notable limitation. In section V, simulations of the estimators are performed and compared, where several quantities are varied, like the noise variance, the rank and dimensions of the channel matrix, and the number of observations.

In the following, upper-case (lower-case) boldface symbols will be used for matrices (column vectors), whereas lower-case symbols will represent scalar values. $(\cdot)^T$ will denote the transpose operator, $(\cdot)^*$ conjugation and $(\cdot)^H = \left((\cdot)^T\right)^*$ hermitian transpose. $\mathbf{I}_n$ will represent the $n \times n$ identity matrix. $\mathrm{Tr}_n$ will denote the non-normalized trace on $n \times n$ matrices, while $\mathrm{tr}_n = \frac{1}{n}\mathrm{Tr}_n$ denotes the normalized trace. Also, we will throughout the paper use $c$ as a shorthand notation for the ratio between the number of rows and the number of columns in the random matrix model being considered.

II Statement of the problem

In usual time varying measurement methods for MIMO systems, one validates models [9] by determining how the model fits with actual capacity measurements. In this setting, one has to be extremely cautious about the measurement noise, especially for far field measurements where the signal strength can be lower than the noise.

The MIMO measured channel in the frequency domain can be modelled by [10, 11]

(3)

where , and are respectively the measured MIMO matrix ( is the number of receiving antennas, is the number of transmitting antennas), the MIMO channel and the noise matrix with i.i.d. standard Gaussian entries. Note that we suppose the noise matrix to be spatially white. In the realm of the channel measurements under study, the antenna outputs are connected to different RF (Radio Frequency) chains. As a consequence, for the case under study, the channel noise impairments are independent from one received antenna to the other. When one RF chain is used, the noise to be considered is not white. This case can also be studied within the framework of free deconvolution but goes beyond the scope of the paper. We suppose that the channel , although time varying, stays constant (block fading assumption) during blocks. and are and diagonal matrices which represent phase off-sets and phase drifts (which are impairments due to the antennas and not the channel) at the receiver and transmitter given respectively by (these are supposed to vary on a block basis)

where the phases and are random. We assume all phases independent and uniformly distributed.

We will also compare (3) with the simpler model

(4)

which is (3) without phase off-sets and phase drifts.

The capacity per receiving antenna (in the case where the noise is spatially white additive Gaussian and the channel is not known at the transmitter) of a channel with channel matrix and signal to noise ratio is given by

(5)

where are the eigenvalues of . The problem consists therefore of estimating the eigenvalues of based on few observations , which is paramount for modelling purposes. Note that the capacity expression supposes that the channel is perfectly known at the receiver and not at the transmitter. In practice, with the noise impairment, the channel will never be estimated perfectly and therefore expression (5) is not achievable. However, for MIMO modelling purposes, for which the capacity is often the matching metric, one needs to compare the capacity of the model with expression (5).

There are different methods actually used for channel capacity estimation [12, 13, 14, 15]. Usual methods discard, through an ad-hoc threshold procedure, all channels for which the channel to noise ratio () is lower than a threshold and then compute

where is the number of channels having a signal to noise ratio higher than the threshold. One of the drawbacks of this method is that one will not analyze the true capacity but only the capacity of the ”good channels”. Moreover, one has to limit the channel measurement campaign (in order to have enough channels higher than the threshold) only to regions which are close (in terms of actual distance) enough to the base station.

Other methods, in order to have a capacity estimation at a given signal to noise ratio (different from the measured one with noise variance ), normalize each channel realization and then compute for a different value of the noise variance (for example ) the capacity estimate . In the case where is high and is low, one usually finds a high capacity estimate as one measures only the noise, which is known to have a high multiplexing gain.

In this contribution, we will provide a neat framework, based on free deconvolution, for channel capacity estimation that circumvents all the previous drawbacks. Moreover, we will deal with model (3), for which no solution has been provided in the literature so far.

III Framework for free convolution

Free probability theory [16] has grown into an entire field of research through the pioneering work of Voiculescu in the 1980s. Free probability introduces an analogue of the concept of independence from classical probability, which can be used for non-commutative random variables like matrices. These more general random variables are elements in what is called a noncommutative probability space. This can be defined by a pair $(A, \phi)$, where $A$ is a unital $*$-algebra with unit $I$, and $\phi$ is a normalized (i.e. $\phi(I) = 1$) linear functional on $A$. The elements of $A$ are called random variables. In all our examples, $A$ will consist of $n \times n$ matrices or random matrices. For matrices, $\phi$ will be the normalized trace $\mathrm{tr}_n$. The unit in these $*$-algebras is the $n \times n$ identity matrix $\mathbf{I}_n$. The analogue of independence is called freeness:

Definition 1

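A family of unital $*$-subalgebras $(A_i)_{i \in J}$ will be called a free family if

$$\left.\begin{array}{l} a_j \in A_{i_j} \\ i_1 \neq i_2,\ i_2 \neq i_3,\ \ldots,\ i_{n-1} \neq i_n \\ \phi(a_1) = \phi(a_2) = \cdots = \phi(a_n) = 0 \end{array}\right\} \Rightarrow \phi(a_1 \cdots a_n) = 0. \qquad (6)$$

A family of random variables $a_i$ is said to be free if the algebras they generate form a free family.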

When restricting to spaces such as matrices, or functions with bounded support, it is clear that the moments of $a$ uniquely identify a probability measure, here called $\mu_a$, such that $\phi(a^k) = \int x^k \, d\mu_a(x)$. In such spaces, the distributions of $a_1 + a_2$ and $a_1 a_2$ give us two new probability measures, which depend only on the probability measures associated with $a_1$ and $a_2$ when these are free. Therefore we can define two operations on the set of probability measures: additive free convolution $\mu_1 \boxplus \mu_2$ for the sum of free random variables, and multiplicative free convolution $\mu_1 \boxtimes \mu_2$ for the product of free random variables. These operations can in many cases be used to predict the spectrum of sums or products of large random matrices: if $\mathbf{A}_{1n}$ has an eigenvalue distribution which approaches $\mu_1$ and $\mathbf{A}_{2n}$ has an eigenvalue distribution which approaches $\mu_2$, then in many cases the eigenvalue distribution of $\mathbf{A}_{1n} + \mathbf{A}_{2n}$ approaches $\mu_1 \boxplus \mu_2$.

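One important probability measure is the Marčenko-Pastur law $\mu_c$ [17], which has the density

$$f^{\mu_c}(x) = \left(1 - \frac{1}{c}\right)^+ \delta_0(x) + \frac{\sqrt{(x-a)^+ (b-x)^+}}{2 \pi c x}, \qquad (7)$$

where $(z)^+ = \max(0, z)$, $a = (1 - \sqrt{c})^2$, $b = (1 + \sqrt{c})^2$, and $\delta_0(x)$ is dirac measure (point mass) at $0$. According to the notation in [18], $\mu_c$ is also the free Poisson distribution with rate $\frac{1}{c}$ and jump size $c$. We will need the following formulas for the first moments of the Marčenko-Pastur law:

$$\begin{array}{lll} \int x \, d\mu_c(x) & = & 1 \\ \int x^2 \, d\mu_c(x) & = & 1 + c \\ \int x^3 \, d\mu_c(x) & = & 1 + 3c + c^2 \\ \int x^4 \, d\mu_c(x) & = & 1 + 6c + 6c^2 + c^3. \end{array} \qquad (8)$$

(8) follows immediately from applying what is called the moment-cumulant formula [18] to the free cumulants [18] of the Marčenko-Pastur law. The (free) cumulants of the Marčenko-Pastur law are $1, c, c^2, c^3, \ldots$ [5]. Cumulants and the moment-cumulant formula in free probability have analogous concepts in classical probability.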
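The moment-cumulant computation can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of [20]: it assumes the free cumulants of the law are $c^{r-1}$, so that a non-crossing partition of $\{1, \ldots, k\}$ with $j$ blocks contributes $c^{k-j}$, and it counts such partitions with the Narayana numbers. The function names `narayana` and `mp_moment` are hypothetical.

```python
from math import comb


def narayana(k: int, j: int) -> int:
    """Number of non-crossing partitions of {1, ..., k} with exactly j blocks."""
    return comb(k, j) * comb(k, j - 1) // k


def mp_moment(k: int, c: float) -> float:
    """k-th moment of the law with free cumulants c**(r-1), r = 1, 2, ...

    Free moment-cumulant formula: sum over non-crossing partitions of the
    product of cumulants of the block sizes. A partition with j blocks
    contributes c**(k - j), and there are narayana(k, j) of them.
    """
    return sum(narayana(k, j) * c ** (k - j) for j in range(1, k + 1))


if __name__ == "__main__":
    c = 0.5
    # Should agree with 1, 1 + c, 1 + 3c + c^2, 1 + 6c + 6c^2 + c^3.
    for k in range(1, 5):
        print(k, mp_moment(k, c))
```

The same routine gives higher moments without further algebra, which is convenient when checking deconvolution formulas numerically.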

$\mu_c$ describes asymptotic eigenvalue distributions of Wishart matrices, i.e. matrices of the form $\frac{1}{N}\mathbf{R}\mathbf{R}^H$, with $\mathbf{R}$ an $n \times N$ random matrix with independent standard complex Gaussian entries, and $\frac{n}{N} \to c$. This can be seen from the following result, where the difference from (8) vanishes when $N \to \infty$:

Proposition 1

Let be a complex standard Gaussian matrix, and set . Then

(9)

This will be useful later on when we compute mixed moments of Gaussian and deterministic matrices. The proof of proposition 1 is given in appendix B.

We will also find it useful to introduce the concept of multiplicative free deconvolution: let probability measures $\mu$ and $\mu_2$ be given. When there is a unique probability measure $\mu_1$ such that $\mu = \mu_1 \boxtimes \mu_2$, we will write $\mu_1 = \mu \boxslash \mu_2$, and say that $\mu_1$ is the multiplicative free deconvolution of $\mu$ with $\mu_2$. There is no reason why a probability measure should have a unique deconvolution, and whether one exists at all depends highly on the probability measure which we deconvolve with. This will not be a problem for our purposes: we will only have need for multiplicative free deconvolution with $\mu_c$, and only in order to find the moments of the channel matrix. The problem of a unique deconvolution is therefore addressed by an existing algorithm for free deconvolution [6], which finds unique moments of $\mu \boxslash \mu_c$ (as long as the first moment of $\mu$ is nonzero).

We will need the following definitions:

Definition 2

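By the empirical eigenvalue distribution of an $n \times n$ random matrix $\mathbf{X}$ we mean the random atomic measure

$$\frac{1}{n}\left(\delta_{\lambda_1(\mathbf{X})} + \cdots + \delta_{\lambda_n(\mathbf{X})}\right),$$

where $\lambda_1(\mathbf{X}), \ldots, \lambda_n(\mathbf{X})$ are the (random) eigenvalues of $\mathbf{X}$.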

Definition 3

A sequence of random variables in probability spaces is said to converge in distribution if, for any , , we have that the limit exists as .

To make the connection between models (4), (3) and model (1), we need the following result [5]:

Theorem 1

Assume that the empirical eigenvalue distribution of converges in distribution almost surely to a compactly supported probability measure . Then we have that the empirical eigenvalue distribution of also converges in distribution almost surely to a compactly supported probability measure uniquely identified by

(10)

where is dirac measure (point mass) at .

Theorem 1 can also be re-stated (through deconvolution) as

When we have observations in a MIMO system as in (4) or (3), we will form the random matrices

(11)

with

This is the way we will stack the observations in this paper. It is only one of many possible stackings. A stacking where the ratio between the number of rows and the number of columns converges to a quantity between and would allow us to use theorem 1 (which implicitly assumes ) directly to conclude almost sure convergence, which again would help us to conclude that the introduced capacity estimators are asymptotically unbiased. Such a stacking can also reduce the variance of the estimators. Even though the stacking considered here may not give the lowest variance, and may not give almost sure convergence, we show that its variance converges to and provides asymptotic unbiasedness for the corresponding capacity estimator.

For the case , the formula

(12)

can be combined with theorem 1 to give the approximation

(13)

for a single observation. This approximation works well when is large. For many observations, note that when there is no phase off-set and phase drift, so that the approximation

(14)

applies and generalizes (13). The ratio between the number of rows and columns in the matrices and is , considering the horizontal stacking of the observations in a larger matrix. It is only this stacking which will be considered in this paper.

When phase off-set and phase drift are added, it is much harder to adapt theorem 1 to produce the moments of . The reason is that theorem 1 really helps us to find the moments of . In the case without phase off-set and phase drift, this is enough since these moments are equal to the moments of . However, equality between these moments does not hold when phase off-set and phase drift are added. A procedure for converting between these moments may exist, but seems to be rather complex, and will not be dealt with here. In section IV, we will instead define an estimator for the channel capacity which does not stack observations into the matrix at all. Instead, an estimation will be performed for each observation, taking the mean of all the estimates at the end.
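The difference between per-observation moments and moments of the stacked matrix can be checked on a toy example. The sketch below is only an illustration under stated assumptions: noise is omitted on purpose, the stacked Gram matrix is normalized by the number of observations (an assumed normalization), and all function names are hypothetical. A single observation with unitary diagonal phase matrices has the same Gram-matrix moments as the channel itself, while the horizontally stacked observations in general do not.

```python
import cmath


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def herm(A):
    """Hermitian (conjugate) transpose."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]


def phase_diag(phases):
    """Diagonal matrix diag(exp(j*phi_1), ..., exp(j*phi_k))."""
    k = len(phases)
    return [[cmath.exp(1j * phases[i]) if i == j else 0j for j in range(k)]
            for i in range(k)]


def moment(G, k):
    """k-th moment of a square matrix G with respect to the normalized trace."""
    P = G
    for _ in range(k - 1):
        P = matmul(P, G)
    return sum(P[i][i] for i in range(len(G))).real / len(G)


def observe(H, rx_phases, tx_phases):
    """Noise-free observation D^r H D^t (noise deliberately left out)."""
    return matmul(matmul(phase_diag(rx_phases), H), phase_diag(tx_phases))


def stacked_gram(observations):
    """(1/L) [H_1 ... H_L][H_1 ... H_L]^H for horizontally stacked blocks."""
    L = len(observations)
    rows = len(observations[0])
    stacked = [sum((obs[i] for obs in observations), []) for i in range(rows)]
    G = matmul(stacked, herm(stacked))
    return [[g / L for g in row] for row in G]


if __name__ == "__main__":
    H = [[1, 1], [1, 1]]
    G = matmul(H, herm(H))
    obs1 = observe(H, [0.0, 0.0], [0.0, 0.0])
    obs2 = observe(H, [0.0, cmath.pi], [0.3, 1.1])
    single = matmul(obs2, herm(obs2))
    stacked = stacked_gram([obs1, obs2])
    print("channel      :", moment(G, 1), moment(G, 2))
    print("one obs.     :", moment(single, 1), moment(single, 2))
    print("stacked obs. :", moment(stacked, 1), moment(stacked, 2))
```

In this example the first moments agree in all three cases, but the second moment of the stacked matrix is smaller than that of the channel, because the receive-side phases differ between the two blocks and the off-diagonal terms partly cancel.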

IV New estimators for channel capacity

In this section, two new channel capacity estimators are defined. First, a free probability based estimator is introduced, which (for model (4)) will be shown to be asymptotically unbiased w.r.t. the number of observations. Then, by slightly modifying the free probability based estimator, we will construct what we call the Gaussian matrix mean based capacity estimator. This estimator will be shown, for model (4) and (3), to give unbiased estimates of the moments of the channel matrix for any number of observations. The computational complexity for the two estimators lies in the computation of eigenvalues and moments of the matrix , in addition to computing the free (de)convolution in terms of moments. For the matrix ranks considered here, free (de)convolution requires few computations. The complexity in the computation of eigenvalues and moments of the matrix grows with (the number of receiving antennas), which is small in this paper. The computational complexity in the estimators grows slowly with the number of observations, since the dimensions of does not grow with .

The two estimators are stated as estimators for the lower order moments of . Under the assumption that this matrix has limited rank (such as here), estimators for lower order moments can be used to define estimators for the channel capacity, since the capacity can be written as a function of the lowest moments when the matrix has rank , as explained below.

IV-A The free probability based capacity estimator

The free probability based estimator is defined as follows:

Definition 4

The free probability based estimator for the capacity of a channel with channel matrix of rank , denoted , is computed through the following steps:

  1. Compute the first moments of the sample covariance matrix (i.e. compute for ),

  2. use (14) to estimate the first moments of ,

  3. estimate the nonzero eigenvalues of from . Substitute these in (5).

We also call the free probability based estimators for the first moments of .

Steps 2 and 3 in definition 4 need some elaboration. To address step 3, consider the case of a rank channel matrix. For such channel matrices, only the lowest three moments , , of need to be estimated in order to estimate the eigenvalues. To see this, first write

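$$C = \frac{1}{n} \log_2 \left[ (1 + \rho\lambda_1)(1 + \rho\lambda_2)(1 + \rho\lambda_3) \right], \qquad (15)$$

where $\rho$ is the signal to noise ratio, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the three non-zero eigenvalues entering (5). This quantity can easily be calculated from the elementary symmetric polynomials

$$\begin{array}{lll} \Pi_1(\lambda_1, \lambda_2, \lambda_3) & = & \lambda_1 + \lambda_2 + \lambda_3 \\ \Pi_2(\lambda_1, \lambda_2, \lambda_3) & = & \lambda_1\lambda_2 + \lambda_2\lambda_3 + \lambda_1\lambda_3 \\ \Pi_3(\lambda_1, \lambda_2, \lambda_3) & = & \lambda_1\lambda_2\lambda_3 \end{array}$$

by observing that

$$(1 + \rho\lambda_1)(1 + \rho\lambda_2)(1 + \rho\lambda_3)$$

can be written as

$$1 + \rho\,\Pi_1 + \rho^2\,\Pi_2 + \rho^3\,\Pi_3. \qquad (16)$$

$\Pi_1$, $\Pi_2$ and $\Pi_3$ can in turn be calculated from the power polynomials

$$S_p(\lambda_1, \lambda_2, \lambda_3) = \lambda_1^p + \lambda_2^p + \lambda_3^p, \quad 1 \leq p \leq 3,$$

by using the Newton-Girard formulas [19], which for the three first moments take the form $\Pi_1 = S_1$, $2\Pi_2 = \Pi_1 S_1 - S_2$ and $3\Pi_3 = \Pi_2 S_1 - \Pi_1 S_2 + S_3$. Here $S_p$ equals $n$ times the $p$-th moment when moments are taken with respect to the normalized trace. If the channel matrix has a higher rank $r$, similar reasoning can be used to conclude that the first $r$ moments need to be estimated. In the simulations, the eigenvalues themselves are never computed, since computation of the moments and the Newton-Girard formulas make this unnecessary.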
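The passage from moments to capacity for a rank 3 channel can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the text: the capacity per receive antenna is taken to be $\frac{1}{n}\log_2\prod_i (1 + \rho\lambda_i)$ with `rho` the SNR factor multiplying each eigenvalue, the moments `m1`, `m2`, `m3` are taken with respect to the normalized trace on $n \times n$ matrices, and the function names are hypothetical.

```python
from math import log2


def elementary_from_power_sums(p1, p2, p3):
    """Newton-Girard: elementary symmetric polynomials from power sums."""
    e1 = p1
    e2 = (e1 * p1 - p2) / 2
    e3 = (e2 * p1 - e1 * p2 + p3) / 3
    return e1, e2, e3


def capacity_rank3(m1, m2, m3, n, rho):
    """Capacity per receive antenna from the first three moments.

    m_k is assumed to be (1/n) * sum_i lambda_i**k, so the power sums of
    the (at most three) non-zero eigenvalues are n * m_k.
    """
    e1, e2, e3 = elementary_from_power_sums(n * m1, n * m2, n * m3)
    # prod_i (1 + rho * lambda_i) = 1 + rho*e1 + rho^2*e2 + rho^3*e3
    det = 1 + rho * e1 + rho ** 2 * e2 + rho ** 3 * e3
    return log2(det) / n


if __name__ == "__main__":
    eigs, n, rho = [1.0, 2.0, 3.0], 4, 0.5
    moments = [sum(lam ** k for lam in eigs) / n for k in (1, 2, 3)]
    direct = sum(log2(1 + rho * lam) for lam in eigs) / n
    print(capacity_rank3(*moments, n, rho), direct)
```

No eigenvalue solver is involved: the three moments are enough, which is why an estimator of the moments can be turned directly into an estimator of the capacity in the low rank case.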

To address step 2 in definition 4, a Matlab implementation [20] which performs free (de)convolution in terms of moments as described in [6] was developed and used for the simulations in this paper. Free (de)convolution is computationally expensive for higher order moments only: For the first four moments, step 2 in definition 4 is equivalent to the following:

Proposition 2

Let and be as in definition 4. Then

(17)

where .

The proof of proposition 2 can be found in appendix A. The following is the main result on the free probability based estimator, and covers the different cases for bias and asymptotic bias w.r.t. number of observations or antennas.

Theorem 2

For observation, the following holds for both models (3) and (4):

  1. and are unbiased. and are biased, with the bias of given by

    In particular and are asymptotically unbiased when (with kept fixed), i.e.

  2. is asymptotically unbiased when (with kept fixed) and has rank , i.e. .

For any number of observations with model (4), the following holds:

  1. and are unbiased. and are biased, with the bias of given by

    In particular and are asymptotically unbiased when either or (with the other kept fixed), i.e.

    for .

  2. is asymptotically unbiased when either (with kept fixed), or (with kept fixed) and has rank , i.e. .

The proof of theorem 2 can be found in appendix C. The bias in theorem 2 motivates the definition of the estimator of the next section. The free probability based estimator performs estimation as if the Gaussian random matrices and deterministic matrices involved were free. It turns out that these matrices are only asymptotically free [16], which explains why there is a bias involved, and why the bias decreases as the matrix dimensions increase.

IV-B The Gaussian matrix mean based capacity estimator

The expression for the Gaussian matrix mean based capacity estimator is motivated from computing expected values of mixed moments of Gaussian and deterministic matrices (lemma 1). This results in expressions slightly different from (17). We will show that the Gaussian matrix mean based estimator can be used for channel capacity estimation in certain systems where the free probability based estimator fails. The definition of the Gaussian matrix mean based capacity estimator is as follows for matrices of rank :

Definition 5

The Gaussian matrix mean based estimator for the capacity of a channel with channel matrix of rank , denoted , is defined through the following steps:

  1. For each observation, perform the following

    1. Compute the first moments of the sample covariance matrix (i.e. compute for ),

    2. find estimates of the first four moments of by solving

      (18)

      where ,

    Form the estimates , , of the first moments of ,

  2. estimate the nonzero eigenvalues of from . Substitute these in (5).

We also call the Gaussian matrix mean based estimators for the first moments of .

While a Matlab implementation [20] of free (de)convolution is used for the free (de)convolution in the free probability based estimator, the algorithm for the Gaussian matrix mean based capacity estimator used by the simulations in this paper follows the steps in definition 5 directly.

Note that (18) resemble the formulas in (17) when . is used in definition 5 since the observation matrices are not stacked together in a larger matrix in this case. Instead, a mean is taken of all estimated moments in step 1 of the definition. This is not an optimal procedure, and we use it only because it is hard to compute mixed moments of matrices where observations of type (3) are stacked together.

The following theorem is the main result on the Gaussian matrix mean based estimator, and shows that it qualifies for its name.

Theorem 3

For either model (4) or (3), the following holds:

  1. The estimators are unbiased, i.e.

  2. is asymptotically unbiased as (with kept fixed) when has rank , i.e. .

  3. In the case of observation, and . In particular, when has rank .

The proof of theorem 3 can be found in appendix C.

IV-C Limitations of the two estimators

We have chosen to define two estimators, since they have different limitations.

The most severe limitation of the Gaussian matrix mean based capacity estimator, as defined here, lies in the restriction on the rank. This restriction is made to limit the complexity of the expression for the estimator. However, the computations in appendix C should convince the reader that capacity estimators with similar properties can be written down (however complex) for higher rank channels as well. Also, while the free probability based estimator has an algorithm [6] for channel matrices of any rank, there is no reason why a similar algorithm cannot be found for the Gaussian matrix mean based estimator. The computations in appendix C indicate that such an algorithm should be based solely on iteration through a finite set of partitions. How this can be done algorithmically is beyond the scope of this paper.

For the free probability based estimator the limitation lies in the presence of phase off-set and phase drift (model (3)): When model (3) is used, the comments at the end of section III make it clear that we lack a relation for obtaining the moments of from the moments of . Without such a relation, we also have no candidate for a capacity estimator (capacity estimators in this paper are motivated by first finding moment estimators). In conclusion, the stacking of observations performed by the free probability based estimator does not work for model (3). Only the Gaussian matrix mean based estimator can perform reliable capacity estimation for many observations with model (3). The second limitation of the free probability based estimator comes from the inherent bias in its deconvolution formulas (17). The bias is only large when both and are small (see theorem 2), so this point is less severe (however, channel matrices down to size occur in practice). The bias in the lower order moments is easily seen to affect capacity estimation from the following expansion of the capacity

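$$C = \frac{1}{\ln 2} \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} \rho^k m_k, \qquad (19)$$

which can be obtained from substituting the Taylor expansion

$$\log_2(1 + t) = \frac{1}{\ln 2} \left( t - \frac{t^2}{2} + \frac{t^3}{3} - \frac{t^4}{4} + \cdots \right) \qquad (20)$$

into the definition of the capacity. Here $\rho$ is the SNR, and $m_k$ are the moments (with respect to the normalized trace) of the matrix whose eigenvalues enter (5). It is clear from (19) that, at least if we restrict to small $\rho$, the expression is dominated by the contribution from the first order moments. When the bias in these moments is large (see theorem 2), we therefore first have a high relative error in the first moments after the deconvolution step, which will propagate to a high relative error in the capacity estimate for small $\rho$ due to (19). Thus, free probability based capacity estimation will work poorly for small $\rho$ whenever this bias is not negligible. The same limitation is not present in the Gaussian matrix mean based estimator, since its moment estimators are unbiased.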
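The dominance of the first moments at low SNR can be checked numerically. The sketch below is an illustration only, assuming the capacity per receive antenna has the form $\frac{1}{n}\sum_i \log_2(1 + \rho\lambda_i)$ and that moments are taken with respect to the normalized trace; `capacity_exact` and `capacity_series` are hypothetical names. The series only converges when $\rho\lambda_i < 1$ for all eigenvalues.

```python
from math import log, log2


def capacity_exact(eigs, n, rho):
    """(1/n) * sum_i log2(1 + rho * lambda_i) over the non-zero eigenvalues."""
    return sum(log2(1 + rho * lam) for lam in eigs) / n


def capacity_series(eigs, n, rho, order):
    """Truncated moment expansion of the capacity.

    Uses log2(1 + t) = (t - t^2/2 + t^3/3 - ...) / ln 2, so that the k-th
    term involves only the k-th moment m_k = (1/n) * sum_i lambda_i**k.
    """
    total = 0.0
    for k in range(1, order + 1):
        m_k = sum(lam ** k for lam in eigs) / n
        total += (-1) ** (k + 1) * rho ** k * m_k / k
    return total / log(2)


if __name__ == "__main__":
    eigs, n, rho = [1.0, 2.0, 3.0], 4, 0.05
    exact = capacity_exact(eigs, n, rho)
    for order in (1, 2, 3, 8):
        approx = capacity_series(eigs, n, rho, order)
        print(order, approx, abs(approx - exact) / exact)
```

With these values the first-order term alone is already within a few percent of the exact capacity, so a relative error in the first moment carries over almost unchanged to the capacity estimate.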

The limitation on the rank can in some cases be avoided if we have some bounds on the eigenvalues: if we knew that at most four of the eigenvalues are not “negligible”, we could still use proposition 2 to estimate the capacity. This follows from results on the continuity of multiplicative free convolution, which has been covered in [21]. Such continuity issues are also beyond the scope of this paper.

V Channel capacity estimation

Several candidates for channel capacity estimators for (4) have been used in the literature. We will consider the following:

(21)

These will be compared with the free probability based () and the Gaussian matrix mean based () estimators.

V-A Channels without phase off-set and phase drift

In figure 1, , and are compared for various number of observations, with , and a channel matrix of rank 3. It is seen that only the estimator gives values close to the true capacity. The channel considered has no phase drift or phase off-set. and are seen to have a high bias.

Fig. 1: Comparison of various classical capacity estimators for various number of observations, model (4). , receive antennas, transmit antennas. The rank of was .

In figure 2, the same and channel matrix are put to the test with the free-probability based and Gaussian matrix mean based estimators for various number of observations. These give values close to the true capacity. Both work better than for small number of observations.

Fig. 2: Comparison of and for various number of observations, model (4). , receive antennas, transmit antennas. The rank of was .

The free-probability based estimator converges faster (in terms of the number of observations) for lower rank channel matrices. In figure 3 we illustrate this for channel matrices of rank 3, 5 and 6.

Fig. 3: for various number of observations, model (4). , receive antennas, transmit antennas. The rank of was , and .

Simulations show that for channel matrices of lower dimension (for instance ), we have slower convergence to the true capacity.

V-B Channels with phase off-set and phase drift

In figure 4, the estimator is compared with the free-probability based estimator, the Gaussian matrix mean based estimator and the true capacity, for various number of observations, and with the same and channel matrix as in figure 1 and 2. Phase off-set and phase drift have also been introduced.

Fig. 4: Comparison of capacity estimators which worked for model (4) for increasing number of observations. Model (3) is used. , receive antennas, transmit antennas. The rank of was .

In this case, the free-probability based estimator and the -estimator seem to be biased.

In figure 5, simulations have been performed for various . Only observation was used, receive antennas, and transmit antennas. The channel matrix has rank . It is seen that the Gaussian matrix mean based capacity estimator is very close to the true capacity, There are only small deviations even if one observation is present, which provides a very good candidate for channel estimation in highly time-varying environments. The deviations are higher for higher .

Fig. 5: for observation, receive antennas, transmit antennas, with varying values of . Model (3). The rank of was .

In figure 6 we have also varied and used only one observation, but we have formed another rank matrix with, receive antennas, transmit antennas. It is seen that the deviation from the true capacity is much higher in this case.

Fig. 6: for observation, receive antennas, transmit antennas, with varying values of . Model (3). The rank of was .

We have in figure 7 increased the number of observations to , and used the same channel matrix. It is seen that this decreases the deviation from the true capacity.

Fig. 7: for observations, receive antennas, transmit antennas, with varying values of . Model (3). The rank of was .

Finally, let us use a channel matrix of rank . In this case we have to increase the number of observations even further to accurately predict the channel capacity. In figure 8, Gaussian matrix mean based capacity estimation is performed for a rank channel matrix with receive antennas, transmit antennas. observation is performed. If we increase the number of observations, Gaussian matrix mean based capacity estimation is seen to go very slowly towards the true capacity. To illustrate this, figure 9 shows Gaussian matrix mean based capacity estimation for observations on the same channel matrix.

Fig. 8: for observation, receive antennas, transmit antennas, with varying values of . Model (3). The rank of was .

It is seen that this decreases the deviation from the true capacity.

Fig. 9: for observations, receive antennas, transmit antennas, with varying values of . Model (3). The rank of was .

VI Conclusion

In this paper, we have shown that free probability provides a neat framework for estimating the channel capacity for certain MIMO systems. In the case of highly time varying environments, where one can rely only on a limited set of noisy measurements, we have provided an asymptotically unbiased estimator of the channel capacity. A modified estimator, called the Gaussian matrix mean based estimator, was also introduced to take into account the bias in the case of finite dimensions, and was proved to be adequate for low rank channel matrices. Moreover, although the results are based on asymptotic claims (in the number of observations), simulations show that the estimators also work well for a very low number of observations. Even when impairments such as phase drift and phase off-set are considered, the algorithm based on the Gaussian matrix mean based estimator provided very good performance. Further research is being conducted to take into account spatial correlation of the noise (in other words, deconvolving with measures other than the Marčenko-Pastur law).

Appendix A The proof of proposition 2

Let be the moments of , the moments of . Then [6]

(22)

Note that (22) can also be inverted to express the in terms of the instead:

(23)

Note also that the moments of are

(24)
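The moment relations used in this proof can be sketched in Python for the first four moments. This is a hedged illustration, not the Matlab implementation of [20]. It rests on two assumptions that should be checked against (10), (14) and (22)-(24): first, that the moments $M_k$ of $\mu \boxtimes \mu_c$ satisfy $cM_k = \sum_{\pi} \prod_{B \in \pi} c\, m_{|B|}$, the sum running over non-crossing partitions, where $m_k$ are the moments of $\mu$; second, that the estimator chain is multiplicative deconvolution with $\mu_c$, removal of the noise level $\sigma^2$ (a shift, since additive free convolution with a point mass is a translation), and multiplicative convolution with $\mu_c$. All function names are hypothetical.

```python
from math import comb


def mp_convolve(m, c):
    """First four moments of mu boxtimes mu_c from those of mu."""
    m1, m2, m3, m4 = m
    return (m1,
            m2 + c * m1 ** 2,
            m3 + 3 * c * m1 * m2 + c ** 2 * m1 ** 3,
            m4 + 4 * c * m1 * m3 + 2 * c * m2 ** 2
            + 6 * c ** 2 * m1 ** 2 * m2 + c ** 3 * m1 ** 4)


def mp_deconvolve(M, c):
    """Inverse of mp_convolve, solved moment by moment."""
    M1, M2, M3, M4 = M
    m1 = M1
    m2 = M2 - c * m1 ** 2
    m3 = M3 - 3 * c * m1 * m2 - c ** 2 * m1 ** 3
    m4 = (M4 - 4 * c * m1 * m3 - 2 * c * m2 ** 2
          - 6 * c ** 2 * m1 ** 2 * m2 - c ** 3 * m1 ** 4)
    return (m1, m2, m3, m4)


def shift(m, a):
    """Moments of the measure translated by a (binomial expansion)."""
    full = (1.0,) + tuple(m)
    return tuple(sum(comb(k, j) * a ** (k - j) * full[j] for j in range(k + 1))
                 for k in range(1, len(m) + 1))


def observed_moments(signal, c, sigma2):
    """Assumed forward chain: signal moments -> noisy sample moments."""
    return mp_convolve(shift(mp_deconvolve(signal, c), sigma2), c)


def estimate_signal_moments(observed, c, sigma2):
    """Assumed estimator chain: noisy sample moments -> signal moments."""
    return mp_convolve(shift(mp_deconvolve(observed, c), -sigma2), c)


if __name__ == "__main__":
    c, sigma2 = 0.5, 0.3
    signal = (1.0, 2.0, 5.0, 14.0)
    noisy = observed_moments(signal, c, sigma2)
    print(noisy)
    print(estimate_signal_moments(noisy, c, sigma2))
```

Two sanity checks follow from the formulas themselves: convolving the point mass at $1$ (all moments equal to $1$) with $\mu_c$ returns $1$, $1+c$, $1+3c+c^2$, $1+6c+6c^2+c^3$, and applying the estimator chain to the output of the forward chain returns the original moments.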

By the definition of the free probability based estimator,

where the moments of are . Denoting by , , we have that . Denote also the moments of by , the moments of by , and as before the moments of by . Write also as in proposition 2. For the third moment, we can apply (22), (24) and (23) in that order,