Marchenko-Pastur and Bercovici-Pata generalized

Marchenko-Pastur theorem and Bercovici-Pata bijections for heavy-tailed or localized vectors

Abstract.

The celebrated Marchenko-Pastur theorem gives the asymptotic spectral distribution of sums of random, independent, rank-one projections. Its main hypothesis is that these projections are more or less uniformly distributed on the first Grassmannian, which implies for example that the corresponding vectors are delocalized, i.e. are essentially supported by the whole canonical basis. In this paper, we propose a way to drop this delocalization assumption and we generalize this theorem to a quite general framework, including random projections whose corresponding vectors are localized, i.e. with some components much larger than the other ones. The first of our two main examples is given by heavy-tailed random vectors (as in the model introduced by Ben Arous and Guionnet in [5] or as in the model introduced by Zakharevich in [32] where the moments grow very fast as the dimension grows). Our second main example, related to the continuum between the classical and free convolutions introduced in [11], is given by vectors which are distributed as the Brownian motion on the unit sphere, with localized initial law. Our framework is in fact general enough to get new correspondences between classical infinitely divisible laws and some limit spectral distributions of random matrices, generalizing the so-called Bercovici-Pata bijection.

Key words and phrases:
Random matrices, Marchenko-Pastur Theorem, free probability, infinitely divisible distributions, Bercovici-Pata bijection
2000 Mathematics Subject Classification:
15A52, 46L54, 60F05
This work was partially supported by the Agence Nationale de la Recherche grant ANR-08-BLAN-0311-03 and partly accomplished during the first named author’s stay at New York University Abu Dhabi, Abu Dhabi (U.A.E.).

1. Introduction

In 1967, Marchenko and Pastur introduced a successful matrix model inspired by the elementary fact that each Hermitian matrix is the sum of orthogonal rank one homotheties. Substituting orthogonality with independence, they considered in their seminal paper [23] the random matrix defined by

(1)

where is an i.i.d. sequence of real valued random variables and is an i.i.d. sequence of -dimensional column vectors, whose conjugate transposes are denoted , independent of . As a main result, they proved that the empirical spectral measure of this matrix converges to a limit with an explicit characterization under the following assumptions:

  1. tend to infinity in such a way that ;

  2. the first four joint moments of the entries of are not too far, roughly speaking, from the ones of the entries of a standard Gaussian vector.

In the special case where all ’s are equal to one, the matrix (1) reduces to a so-called empirical covariance matrix, and its limit spectral distribution is none other than the well-known Marchenko-Pastur distribution with parameter .
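The special case above can be explored numerically. The following is a minimal sketch, not the authors' construction: it assumes real standard Gaussian vectors and the normalization M = (1/samples) Σ u_i u_iᵀ, both of which are choices made here for illustration (the names `dim`, `samples` and `ratio` are hypothetical). Under that normalization, the Marchenko-Pastur law with ratio dim/samples has first moment 1 and second moment 1 + ratio, and the normalized traces of M and M² should be close to these values.

```python
import random

def empirical_covariance_moments(dim, samples, seed=0):
    """Return (tr M / dim, tr M^2 / dim) for M = (1/samples) * sum_i u_i u_i^T.

    Assumption (not taken from the paper): the u_i are i.i.d. real standard
    Gaussian vectors of dimension `dim`.
    """
    rng = random.Random(seed)
    cov = [[0.0] * dim for _ in range(dim)]
    for _ in range(samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for a in range(dim):
            row, ua = cov[a], u[a] / samples
            for b in range(dim):
                row[b] += ua * u[b]
    first = sum(cov[a][a] for a in range(dim)) / dim
    # M is symmetric, so tr(M^2) is the sum of its squared entries.
    second = sum(x * x for row in cov for x in row) / dim
    return first, second

def marchenko_pastur_moments(ratio):
    """First two moments of the Marchenko-Pastur law with unit variance."""
    return 1.0, 1.0 + ratio

if __name__ == "__main__":
    print(empirical_covariance_moments(40, 80, seed=1))
    print(marchenko_pastur_moments(40 / 80))
```

Only traces are used, so the sketch needs no eigenvalue solver; comparing higher moments would require matrix powers.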

It should be noted that even in the general case, the limit spectral distribution does not depend on the particular choice of the ’s, provided they satisfy the above hypotheses. For example, one can choose to have the uniform distribution on the sphere of or with radius , or to be a standard Gaussian vector: such a vector is said to be delocalized, which means that with high probability, is small; more specifically:

After the initial paper of Marchenko and Pastur, a long list of further-reaching results about the limit spectral distribution of empirical covariance matrices has been obtained, by Yin and Krishnaiah [31], Götze and Tikhomirov [20, 21], Aubrun [3], Pajor and Pastur [25], Adamczak [1]. All of them are devoted to the empirical covariance matrix of more or less delocalized vectors, with limit spectral distribution being the Marchenko-Pastur distribution (except in the case treated in [31], but there the vectors are still very delocalized).

In this paper, our goal is to drop the delocalization assumption and to be able to deal with localized ’s, i.e. with some entries much larger than the other ones. For example, the applications of our main theorem include the case where the entries of have heavy tails, but also some other examples of localized vectors, such as the one where the law of results from a Brownian motion with localized initial condition.

This approach is based on our preceding works [6, 16] on the Bercovici-Pata bijection (see also [7, 26, 10]). This bijection, which we denote by , is a correspondence between the probability measures on the real line that are infinitely divisible with respect to the classical convolution and the ones which are infinitely divisible with respect to the free convolution . In [6, 16], we constructed a set of matrix ensembles which produces a quite natural interpretation of . This construction is easy to describe for compound Poisson laws and makes the connection with Marchenko-Pastur’s model quite clear. Let still be an i.i.d. sequence of real valued random variables, i.i.d. column vectors uniformly distributed on the sphere of with radius , and a standard Poisson process, , and being independent. For each , we defined the random matrix

(2)

and we proved that its empirical spectral law converges to when goes to infinity, where is the compound Poisson law of

The link with Marchenko-Pastur’s model of (1) is now obvious and it is easy to verify that the empirical spectral laws of (1) and (2) have the same limit if . Hence, our previous works [6, 16] could be viewed as another insight into Marchenko-Pastur’s results, partly more restricted, since we only considered uniformly distributed random vectors , partly more general, since our construction extended to all infinitely divisible laws. In fact, the main advantage of our matricial model, which has also been studied in [26], over Marchenko-Pastur’s one is that it is infinitely divisible. This allows us to derive simpler proofs, using appropriate tools such as cumulant computations or semi-groups.

In the present paper, we extend our construction to a larger class of ’s, while continuing to benefit from the infinitely divisible framework. Roughly speaking, we are able to prove the convergence of the empirical spectral law if we suppose only that the entries of are exchangeable and have a moment of order growing as at most in , for any fixed (cf. Theorem 2.6).

Then, approximations allow us to extend the result to with heavy-tailed entries (see Theorem 3.1). When the ’s are constant, we recover a result obtained by Belinschi, Dembo and Guionnet in [4]. Our result is more general from a certain point of view (we allow some random ’s), but less explicit since we characterize the limit spectral distribution as the weak limit of a sequence of probability distributions with calculable moments and not with a functional equation, as in [4]. We also state (Theorem 3.2) a “covariance matrices version” of Zakharevich’s generalization of Wigner’s theorem of [32], which is a direct consequence of our key result Theorem 2.6.

We also devote a particular interest to the special case where the column vectors ’s are copies of

(3)

where is uniformly distributed on the canonical basis ,…, , independent of , with independent standard Gaussian variables. Let us emphasize that is a quite typical localized vector since it has exactly one entry which is much bigger than the others. Precisely, we have:

This is the reason why the limits differ from the ones of the classical matrix models. For example, the classical Marchenko-Pastur Theorem about empirical covariance matrices is not true anymore for such vectors. More specifically, we have:

Theorem 1.1.

Let be i.i.d. copies of the vector defined in (3). Let

(4)

be the (dilated) empirical covariance matrix of the sample . As with , the empirical spectral law of converges to a limit law with unbounded support which is characterized by its moments, given by the formula

(5)

with the set of partitions of and defined by:

if are the connected components of ;

if is connected, is the number of times one changes from one block to another when running through in a cyclic way.

Note that the law is the Poisson law with parameter if and tends to the free Poisson law (also called Marchenko-Pastur law) as tends to : the weight penalizes crossings in partitions (indeed, if and only if is non-crossing). An illustration is given in Figure 1 below. In fact, Formula (5) can be generalized into a more general moments-cumulants formula which provides a new continuous interpolation between classically and freely infinitely divisible laws. This interpolation is related to the notion of -freeness developed by the first named author and Lévy in [11] and is based on a progressive penalization of the crossings in the moments-cumulants formula (19).

(a) Case where
(b) Case where
(c) Case where
Figure 1. The empirical spectral distribution of the matrix of (4) for , at several values of . We see that this distribution, which is approximately the law , is close to the Poisson law with parameter for close to zero, and that it converges to the Marchenko-Pastur law with parameter (whose density is plotted by a smooth continuous line) as grows.
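The two endpoints of this interpolation can be checked by hand on small cases. The sketch below does not implement the weights of Formula (5); it only uses two standard facts: a Poisson law whose parameter is called `lam` here (a name chosen for this illustration) has all classical cumulants equal to `lam`, so its k-th moment is the sum of `lam ** (number of blocks)` over all partitions of {1, …, k}, while the free Poisson law has all free cumulants equal to `lam`, so the same sum restricted to non-crossing partitions gives its moments.

```python
from itertools import combinations

def set_partitions(n):
    """Yield every partition of {1, ..., n} as a list of increasing blocks."""
    def rec(i, blocks):
        if i > n:
            yield [list(b) for b in blocks]
            return
        for b in blocks:            # put i into an existing block
            b.append(i)
            yield from rec(i + 1, blocks)
            b.pop()
        blocks.append([i])          # or open a new block containing i
        yield from rec(i + 1, blocks)
        blocks.pop()
    yield from rec(1, [])

def is_noncrossing(blocks):
    """True if no a < b < c < d has a, c in one block and b, d in another."""
    for b1, b2 in combinations(blocks, 2):
        for x, y in ((b1, b2), (b2, b1)):
            for a, c in combinations(x, 2):
                for b, d in combinations(y, 2):
                    if a < b < c < d:
                        return False
    return True

def moment(k, lam, noncrossing_only):
    """Sum of lam ** (number of blocks) over (non-crossing) partitions of {1..k}."""
    return sum(lam ** len(p) for p in set_partitions(k)
               if not noncrossing_only or is_noncrossing(p))

if __name__ == "__main__":
    print([moment(k, 1, False) for k in range(1, 6)])  # Poisson, lam = 1
    print([moment(k, 1, True) for k in range(1, 6)])   # free Poisson, lam = 1
```

For `lam = 1` the two sequences are the Bell and Catalan numbers; they first differ at k = 4, where exactly one partition of {1, 2, 3, 4} is crossing.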

The paper is organized as follows. The next section describes the main results; Section 3 provides some applications; the following sections are devoted to the proofs; the last section is an appendix where we recall some facts on the infinitely divisible laws, the Bercovici-Pata bijection and (hyper)graphs, for the reader's convenience.

As this preprint was published, Victor Pérez-Abreu informed us that he is working on a close subject with J. Armando Domínguez-Molina and Alfonso Rocha-Arteaga in the forthcoming preprint [18].

Acknowledgements. The authors thank James Mingo for a useful explanation during the program “Bialgebras in free probability” in the Erwin Schrödinger Institute in Wien during spring 2011. They also thank Thierry Lévy and Camille Male for some discussions on a preliminary version of the paper.

2. The main results

2.1. A general family of matrix ensembles

The basic facts on the -infinitely divisible laws are recalled in the appendix. For such a law, its Lévy exponent is the function such that the Fourier transform of is .

Case of compound Poisson laws

Such a law is the law of

where is a random variable with Poisson law with expectation and the ’s are i.i.d. random variables, independent of . In this case, if denotes the law of the ’s, the Lévy exponent of is For , let be a sequence of i.i.d. copies of a random column vector or . Then we define to be the law of

where is a random variable with Poisson law with expectation , the ’s are i.i.d. random variables with law and the ’s are i.i.d. copies of (whose conjugate transpose are denoted by ), all being independent. Let us notice that is still a compound Poisson law and that its Lévy exponent is given by:

for any Hermitian matrix .
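To make the construction concrete, here is a minimal sampling sketch of a matrix of the form Σ_{j ≤ N} X_j U_j U_jᵀ with N a Poisson variable. Several ingredients are assumptions made only for this illustration: the vectors are real standard Gaussian, the X_j are uniform on {−1, +1}, and the Poisson expectation is passed as a free parameter `mean_count`; none of these is the normalization used in the paper.

```python
import math
import random

def poisson(mean, rng):
    """Sample a Poisson variable by Knuth's multiplication method (small means)."""
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def sample_compound_poisson_matrix(dim, mean_count, rng):
    """Return (matrix, terms) with matrix = sum_j x_j * u_j u_j^T.

    Hypothetical choices: x_j uniform on {-1, +1}, u_j real standard Gaussian.
    """
    matrix = [[0.0] * dim for _ in range(dim)]
    terms = []
    for _ in range(poisson(mean_count, rng)):
        x = rng.choice([-1.0, 1.0])
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        terms.append((x, u))
        for a in range(dim):
            for b in range(dim):
                matrix[a][b] += x * u[a] * u[b]
    return matrix, terms

if __name__ == "__main__":
    m, terms = sample_compound_poisson_matrix(4, 5.0, random.Random(3))
    print(len(terms), [round(m[a][a], 3) for a in range(4)])
```

The returned matrix is symmetric by construction, and its trace equals Σ_j x_j ‖u_j‖², which gives a quick sanity check on a sample.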

General case

Here, we shall extend the previous construction for a general infinitely divisible law. In the following theorem and in the rest of the paper, the spaces of probability measures are endowed with the weak topology. denotes either or .

Theorem 2.1.

Let be an infinitely divisible law on , let us fix and let be a random column vector such that . Let be a sequence of i.i.d. copies of and, for each , let be i.i.d. -distributed random variables. Then the sequence of Hermitian matrices

(6)

converges in distribution, as ( being fixed), to a probability measure on the space of Hermitian matrices, whose Fourier transform is given by

(7)

for any Hermitian matrix .

The following proposition extends the theorem to a quite more general framework, where the ’s are i.i.d. but not necessarily distributed according to and only satisfy the limit theorem

for a sequence .

Proposition 2.2.

a) If the law of is compactly supported, then to any limit theorem

there corresponds a limit theorem in the space of Hermitian matrices

(8)

where are i.i.d. -distributed random variables, is a sequence of i.i.d. copies of , independent of the ’s.

b) In the case where the law of is not compactly supported, (8) stays true as long as one supposes that is bounded uniformly in .

Remark 2.3.

Such a construction has been generalized to the more general setting of Hopf algebras by Schürmann, Skeide and Volkwardt in [29].

2.2. Convergence of the empirical spectral law

The empirical spectral law of a matrix is the uniform probability measure on its eigenvalues (see Equation (44) in the appendix). When is uniformly distributed on the unit sphere, we proved that the empirical spectral law associated to converges when the size tends to infinity (cf. [6, 16]). In order to obtain other convergences, we shall first make the following assumptions on the random vector . We denote the entries of by . Roughly speaking, these assumptions mean that the ’s are exchangeable and that the moment of order of grows at most in , for all . These assumptions will be weakened in the next section to consider heavy tailed variables.

Hypothesis 2.4.

a) For each , the entries of are exchangeable and have moments of all orders.

b) As goes to infinity, we have:

  • for each , for each ,

    (9)
  • for each , for all positive integers , there exists finite such that

    (10)

    Moreover, there is a constant such that for all , for all ,

    (11)
Remark 2.5.

Let us define the random column vector , with some i.i.d. variables uniformly distributed on , independent of . Note that if satisfies Hypothesis 2.4, so does , with the same function . Moreover, the expectation of (9) is null for as soon as the multisets and are not equal (in the complex case) and as soon as an element in the multiset appears an odd number of times (in the real case).

The following theorem is the key result of the paper. For its second part, we endow the set of functions on multisets of integers (it is the set that belongs to) with the product topology.

Theorem 2.6.

We suppose that Hypothesis 2.4 holds.

a) For any -infinitely divisible distribution , the empirical spectral distribution of a -distributed random matrix converges almost surely, as , to a deterministic probability measure , which depends only on and on the function of Equation (10).

b) The probability measure depends continuously on the pair .

c) The moments of the limit law can be computed when has moments of all orders via the following formula,

(12)

where the non negative numbers , which factorize along the connected components of , are given at Lemma 6.3 and the numbers are the cumulants of (whose definition is recalled in the appendix).

d) If the Lévy measure of has compact support, then admits exponential moments of all orders.

The following proposition allows us to assert that many limit laws obtained in Theorem 2.6 have unbounded support. Recall that a law is said to be non degenerate if it is not a Dirac mass.

Proposition 2.7.

If the function of Hypothesis 2.4 is such that , then for any non degenerate -infinitely divisible distribution whose classical cumulants are all non negative, the law has unbounded support.

3. Applications and examples

3.1. Heavy-tailed Marchenko-Pastur theorem

In this section, we use Theorem 2.6 to extend the theorem of Marchenko and Pastur described in the introduction to vectors with heavy-tailed entries. This also extends Theorem 1.10 of Belinschi, Dembo and Guionnet in [4] (their theorem corresponds to the case ), but we do not provide the explicit characterization of the limit that they propose. Also, our result is valid in the complex case as well as in the real case, whereas the one of [4] is only stated in the real case (but we do not know whether this is an essential restriction of the approach of [4]).

Let us fix and let be an infinite array of i.i.d. -valued random variables such that the function

(13)

is slowly varying as . For each , define the column vector ( depends implicitly on ). Let us define .

Theorem 3.1.

For any set of i.i.d. real random variables (with any law, but that does not depend on ) independent of the ’s and any fixed , the empirical spectral law of the random matrix

converges almost surely, as with , to a deterministic probability measure which depends only on , on and on the law of the ’s. This limit law depends continuously on these three parameters.

3.2. Covariance matrices with exploding moments

The following theorem is a direct consequence of Theorem 2.6 and Proposition 2.7. It is the “covariance matrices version” of Zakharevich’s generalization of Wigner’s theorem to matrices with exploding moments (see [32] and also the recent work [22] by Male).

Theorem 3.2.

Let be a complex random matrix with i.i.d. centered entries whose distribution might depend on and . We suppose that there is a sequence such that is bounded and such that for each fixed ,

Then the empirical spectral law of

converges, as with , to a probability measure which depends continuously on the pair . If for all , is the Marchenko-Pastur distribution with parameter dilated by a coefficient . Otherwise, has unbounded support but admits exponential moments of all orders.

3.3. Non centered Gaussian vectors, Brownian motion on the unit sphere and a continuum of Bercovici-Pata’s bijections

In this section, we give examples of vectors which satisfy Hypothesis 2.4 for a certain function . These examples will allow us to construct the continuum of Bercovici-Pata bijections mentioned in the introduction.

Let us fix and consider defined thanks to one of the two (or three, actually) following definitions

  1. Gaussian and uniform cases:

    (14)

    where is uniformly distributed on the canonical basis ,… of , independent of , with i.i.d. variables whose distribution is either the centered Gaussian law on with variance or the uniform law on ,

  2. Brownian motion: is a Brownian motion on the unit sphere of taken at time , whose distribution at time zero is the uniform law on the canonical basis of . Such a process is a strong solution of the SDE

    (15)

    where is a standard Brownian motion on the Euclidean space of skew-Hermitian matrices, endowed with the scalar product .

Of course, these models make sense for : in the first model, it means only that and the second model, at , can be understood as its limit in law, i.e. a random vector with uniform law on the sphere with radius . For , all formulas below make sense (and stay true) by taking their limits.

Thanks to the results of [9], it can be seen that the Gaussian model and the Brownian one have approximately the same finite-dimensional marginals, as . The following proposition, whose proof is postponed to Section 9, makes this analogy stronger.

Proposition 3.3.

For as defined according to any of the above models, Hypothesis 2.4 holds for given by the following formulas:

(16)
(17)
(18)

Let us now consider the family of transforms (denoted by in the introduction).

In both models, when , the rank-one random projector is a diagonal matrix with unique non zero entry uniformly distributed on the diagonal and equal to . In such a case, it can be readily seen that is the law of a diagonal matrix with i.i.d. entries with law . Owing to the law of large numbers, is obviously the identity map. Moreover, it has been seen in [6, 16] that for , is the Bercovici-Pata bijection.

Therefore, provides a continuum of maps passing from the identity () to the Bercovici-Pata bijection (). This continuum is related to the notion of -freeness developed by the first named author and Lévy in [11]. The maps are made explicit (at least at the moments level) by the following proposition, whose proof, based on the explicitation of the functions , is postponed to Section 10.

Proposition 3.4.

Let be an infinitely divisible law with moments of all orders. Then for each ,

(19)

with defined by:

if are the connected components of ;

if is connected, is the number of times one changes from one block to another when running through in a cyclic way.

Example of computation of . For instance, let us consider the partition (cf. Figure 2). Then the connected components of are the partitions and induced by on the sets and . Since and , .

Figure 2. The partition , drawn on the points 1, …, 10.
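The counting rule used for a connected partition (the number of times one changes from one block to another when running through the points in a cyclic way) is easy to mechanize. The sketch below only computes this count; it does not evaluate the weight built from it, and the function name and the list-of-blocks encoding are choices made here for illustration.

```python
def cyclic_block_changes(blocks):
    """Count the points whose cyclic successor lies in a different block.

    `blocks` is a list of lists of integers forming a partition of a finite
    set of points; the points are visited in increasing order, and the last
    point is followed by the first one.
    """
    points = sorted(x for block in blocks for x in block)
    block_of = {x: i for i, block in enumerate(blocks) for x in block}
    n = len(points)
    return sum(
        1
        for i in range(n)
        if block_of[points[i]] != block_of[points[(i + 1) % n]]
    )

if __name__ == "__main__":
    print(cyclic_block_changes([[1, 3], [2, 4]]))     # every step changes block
    print(cyclic_block_changes([[1, 2, 3]]))          # a single block never changes
    print(cyclic_block_changes([[1, 3, 5], [2, 4]]))  # the step 5 -> 1 stays in place
```

Because the points are sorted before being visited, the same function applies to a connected component carried by a subset of the original points.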

Let us now say a few words about the bijection , as a function of .

If , then:

This corroborates the fact that is the identity map.

If tends to infinity, then:

where denotes the set of non-crossing partitions of (see Section 5.1 for a precise definition). Indeed, is positive unless is non-crossing. Since the continuity of w.r.t. the weak topology is uniform in , as it appears from the proof of Proposition 6.5 b), this proves that tends to the Bercovici-Pata bijection when goes to infinity, as expected.

3.4. The distribution made more explicit

When is a compound Poisson law, the definition of has been made explicit in Section 2.1.1. In this section, we make explicit for a Dirac mass or a Gaussian law (the column vector underlying the definition of staying as general as possible). Since any infinitely divisible law is a weak limit of convolutions of such laws and obviously, by the Formula (7) of the Fourier transform of , the law depends continuously on and satisfies

this gives a good idea of what a random matrix distributed according to looks like. Moreover, we also consider the case where is a Cauchy law, where a surprising behaviour w.r.t. the convolution appears.

First, it can easily be seen that if is the Dirac mass at , then is the Dirac mass at

Hence, due to Hypothesis 2.4 and Lemma 12.2, for .

Suppose now that is the standard Gaussian law and that Hypothesis 2.4 holds. Let be as in Remark 2.5. Then the distribution only depends on , and : when , is the law of

where are standard Gaussian variables and denotes a GOE or GUE matrix (according to the choice of ) as defined p. 51 of [2] (i.e. a standard Gaussian vector on the space of real symmetric or Hermitian matrices endowed with the scalar product ). As a consequence, since when , and , the spectral law of a -distributed matrix converges to

In the next proposition, we consider the case where is a Cauchy law with parameter :

Let be the associated kernel, defined by

Proposition 3.5.

We suppose that . For a -distributed random matrix and bounded, for any Hermitian matrix ,

real functions being applied to Hermitian matrices via the functional calculus.

The following Corollary follows easily, using the vector instead of (with as defined in Remark 2.5).

Corollary 3.6.

Under Hypothesis 2.4, the set of Cauchy laws is invariant by : for any , , for .

4. Proof of Theorem 2.1 and Proposition 2.2

One proves the theorem and the proposition at the same time, by showing that under the hypotheses of the theorem or of a) or b) of the proposition, the Fourier transform of the left-hand side of (8), which we denote by , converges pointwise to the right-hand side of (7). Indeed, for any Hermitian matrix ,

where for any , . The function converges to as , uniformly on every compact subset of (this follows from [27, Lem. 3.1] and from the fact that uniformly on every compact set). It is enough to prove the result in the case where the law of is compactly supported. To conclude in the case where (resp. in the case where ), one needs to argue that there is a constant such that for all , (resp. that ).

5. Preliminaries for the proof of Theorem 2.6 : partitions and graphs

The aim of this section is to introduce the combinatorial definitions which will be used in the next section in order to prove Theorem 2.6. The conventions we chose for the definitions of partitions, graphs, and for the less well-known notion of hypergraph are presented in Section 12.3 of the appendix.

5.1. Partitions

  • Let us recall that we denote by the set of partitions of , and by the set of non-crossing partitions of (a partition of is said to be non-crossing if there does not exist such that ).

  • For any given partition , we denote by the minimal non-crossing partition which is above for the refinement order; the partitions induced by on the blocks of are called the connected components of .

  • If has only one block, then is said to be connected.

  • We define to be the partition induced by on the subset of obtained by erasing whenever (with the convention ).

  • A partition is said to be thin if .
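The first three definitions above can be sketched in code. The approach taken here is an assumption of this illustration rather than something stated in the text: the minimal non-crossing partition above a given one is computed by repeatedly merging any two blocks that cross, which is justified because two crossing blocks must lie in the same block of every non-crossing partition above. The function names and the list-of-blocks encoding are hypothetical.

```python
from itertools import combinations

def blocks_cross(x, y):
    """True if some a < b < c < d has a, c in one block and b, d in the other."""
    xs, ys = sorted(x), sorted(y)
    for u, v in ((xs, ys), (ys, xs)):
        for a, c in combinations(u, 2):
            for b, d in combinations(v, 2):
                if a < b < c < d:
                    return True
    return False

def noncrossing_closure(blocks):
    """Merge crossing blocks until none is left; return sorted blocks."""
    merged = [set(b) for b in blocks]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(merged)), 2):
            if blocks_cross(merged[i], merged[j]):
                merged[i] |= merged[j]
                del merged[j]
                changed = True
                break  # indices are stale after a merge, so restart the scan
    return sorted(sorted(b) for b in merged)

def connected_components(blocks):
    """Partitions induced by `blocks` on each block of its non-crossing closure."""
    return [
        sorted(sorted(b) for b in blocks if set(b) <= set(big))
        for big in noncrossing_closure(blocks)
    ]

def is_connected(blocks):
    """A partition is connected when its non-crossing closure has one block."""
    return len(noncrossing_closure(blocks)) == 1

if __name__ == "__main__":
    pi = [[1, 3], [2, 4], [5, 6], [7]]
    print(noncrossing_closure(pi))
    print(connected_components(pi))
```

In this example the blocks {1, 3} and {2, 4} cross and are merged, so the partition has three connected components, only one of which contains more than one block.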

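For instance, for the partition is and Figure 3 illustrates the operation .

Figure 3. Illustration of the operation (left: a partition drawn on the points 1, …, 10; right: the induced partition on the remaining points 1, 2, 3, 4, 5, 6, 8, 9).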

For any function defined on a set , we denote by the partition of whose blocks are the level sets of . For any partition , for each , we denote by the index of the class of in , after having ordered the classes according to the order of their first element (for the example given Figure 2, we have , , and , , ,…). Moreover, is identified with and is identified with , so that and .
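The two conventions of this paragraph (the partition into level sets of a function, and the index of the class of a point once the classes are ordered by their first element) can be written down directly. This is a minimal sketch; the function names `kernel` and `class_index`, and the choice of numbering classes from 1, are assumptions of the illustration.

```python
def kernel(f, domain):
    """Partition of `domain` whose blocks are the level sets of `f`.

    `domain` is assumed to be an iterable of comparable points; the blocks
    are returned ordered by their smallest element.
    """
    level_sets = {}
    for x in sorted(domain):
        level_sets.setdefault(f(x), []).append(x)
    return sorted(level_sets.values(), key=lambda block: block[0])

def class_index(blocks):
    """Map each point to the index (from 1) of its class, classes being
    ordered according to their first (smallest) element."""
    ordered = sorted((sorted(b) for b in blocks), key=lambda block: block[0])
    return {x: i for i, block in enumerate(ordered, start=1) for x in block}

if __name__ == "__main__":
    print(kernel(lambda x: x % 3, range(1, 7)))
    print(class_index([[2, 5], [1, 3], [4]]))
```

Composing the two, `class_index(kernel(f, domain))` relabels the values of `f` by order of first appearance, which is the normalization the text uses to identify a partition with a function.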

5.2. Graphs

Let be a graph, and a partition of . Let be the canonical surjection from onto . The quotient graph is the graph with as set of vertices and with edges such that is the edge between and if is an edge between and , with the same direction if the graph is directed. Note that the quotient graph can have strictly fewer vertices than the initial one, but it has as many edges as the initial one. Note also that the quotient of a circuit is still a circuit.

For instance, let us consider the preceding partition and let be the cyclic graph with vertex set and edges (cf. Figure 4(a), where the partition is drawn with dashed lines). The quotient graph is then given by Figure 4(b) with , , and .

Figure 4. A cyclic graph and one of its quotients: (a) a cyclic graph on the vertices 1, …, 10 and a partition of its vertices; (b) the quotient of the graph in (a) by the partition, with vertices V1, V2, V3, V4.
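The quotient operation can be sketched on a small directed cycle. The example graph and partition below are hypothetical (they are not the ones of Figure 4), and edges are encoded as ordered pairs in a list so that multiple edges and loops in the quotient are kept, in line with the remark that the quotient has as many edges as the initial graph.

```python
def quotient_graph(edges, blocks):
    """Quotient of a directed multigraph by a partition of its vertices.

    `edges` is a list of (tail, head) pairs; `blocks` is a list of lists of
    vertices. Each edge (u, v) becomes (index of block of u, index of block
    of v); the edge list keeps its length and order.
    """
    block_of = {v: i for i, block in enumerate(blocks) for v in block}
    return [(block_of[u], block_of[v]) for u, v in edges]

def is_closed_walk(edges):
    """True if each edge ends where the next one starts, cyclically."""
    n = len(edges)
    return all(edges[i][1] == edges[(i + 1) % n][0] for i in range(n))

if __name__ == "__main__":
    # Hypothetical example: the directed cycle 1 -> 2 -> ... -> 6 -> 1.
    cycle = [(i, i % 6 + 1) for i in range(1, 7)]
    partition = [[1, 3], [2], [4, 6], [5]]
    quotient = quotient_graph(cycle, partition)
    print(quotient)
    print(len(quotient) == len(cycle), is_closed_walk(quotient))
```

The quotient has four vertices instead of six but still six edges, and the image of the closed walk is again a closed walk traversing every edge once.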

Let be a directed graph. For any vertex , we denote by the subset of out-going edges from , and by the subset of in-going edges to :

Notice that and are not necessarily disjoint.

Let us consider as a set of “colorings” of the edges of by the colors . For any coloring , let and . If for each vertex and each color , , the coloring is called admissible.

Figure 5 presents an instance of an admissible coloring of with three colors.

Figure 5. An admissible coloring (graph on the vertices V1, V2, V3, V4).

A partition of the edges of is said to be admissible if it is the kernel of an admissible coloring. In this case, for any vertex , we define

(20)

to be the common value of the decreasing reordering of the families for coloring such that .

5.3. Hypergraph associated to a partition of the edges of a graph

Let be a graph with vertex set , be a partition of and be a partition of the edge set of . Then one can define to be the hypergraph with the same vertex set as (i.e. ) and with edges , where each edge is the set of blocks such that at least one edge of starting or ending at belongs to .

For instance, with , and as given by the preceding example Figure 5, is the hypergraph with three edges drawn in Figure 6. Another example, where has no cycle, is given at Figure 7.

Figure 6. The hypergraph with , and as given by the preceding example of Figure 5 (vertices V1, V2, V3, V4).
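The definition of the hypergraph associated with a partition of the vertices and a partition of the edges can be sketched as follows. The encoding is an assumption of this illustration: edges are (tail, head) pairs in a list, a block of the edge partition is a list of positions in that list, and each hyperedge is returned as a frozenset of indices of vertex blocks. The example data are hypothetical and are not those of Figures 5 and 6.

```python
def hypergraph(edges, vertex_blocks, edge_blocks):
    """Hyperedges attached to a vertex partition and an edge partition.

    One hyperedge is produced per block of `edge_blocks`: it collects the
    vertex blocks at which at least one edge of that block starts or ends.
    """
    block_of = {v: i for i, block in enumerate(vertex_blocks) for v in block}
    hyperedges = []
    for edge_block in edge_blocks:
        touched = set()
        for position in edge_block:
            tail, head = edges[position]
            touched.add(block_of[tail])
            touched.add(block_of[head])
        hyperedges.append(frozenset(touched))
    return hyperedges

if __name__ == "__main__":
    # Hypothetical example: the directed cycle 1 -> 2 -> ... -> 6 -> 1.
    cycle = [(i, i % 6 + 1) for i in range(1, 7)]
    vertex_blocks = [[1, 3], [2], [4, 6], [5]]
    edge_blocks = [[0, 1], [2, 3, 4, 5]]  # positions in `cycle`
    print(hypergraph(cycle, vertex_blocks, edge_blocks))
```

Here the two hyperedges share exactly one vertex block, the one containing the vertices 1 and 3. Deciding whether such a hypergraph has a cycle requires the conventions of the appendix and is not attempted in this sketch.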

The next proposition will be used in the following. For a partition of the edges of a graph, a cycle of the graph is said to be -monochromatic if all the edges it visits belong to the same block of .

Proposition 5.1.

Let be a directed graph, which is a circuit. Fix and let be a partition of the edge set of such that has no cycle. Then is a disjoint union of -monochromatic cycles. As a consequence, is admissible.

Proof. If has more than one block (the case with one block being obvious), since is a circuit, one can find a closed path

of which visits each edge of exactly once and such that and do not belong to the same block of . For each edge of , let be the edge of consisting of all vertices of which are the beginning or the end of an edge of