
The outliers among the singular values
of large rectangular random matrices
with additive fixed rank deformation

François Chapon, Romain Couillet,
Walid Hachem and Xavier Mestre
January 2012
Abstract.

Consider the matrix $\Sigma_n = n^{-1/2} X_n D_n^{1/2} + A_n$, where the matrix $X_n \in \mathbb{C}^{N \times n}$ has Gaussian standard independent elements, $D_n$ is a deterministic diagonal nonnegative matrix, and $A_n$ is a deterministic matrix with fixed rank. Under some known conditions, the spectral measures of $\Sigma_n \Sigma_n^*$ and $n^{-1} X_n D_n X_n^*$ both converge towards a compactly supported probability measure $\mu$ as $N, n \to \infty$ with $N/n \to c > 0$. In this paper, it is proved that finitely many eigenvalues of $\Sigma_n \Sigma_n^*$ may stay away from the support of $\mu$ in the large dimensional regime. The existence and locations of these outliers in any connected component of $\mathbb{R} \setminus \mathrm{supp}(\mu)$ are studied. The fluctuations of the largest outliers of $\Sigma_n \Sigma_n^*$ are also analyzed. The results find applications in the fields of signal processing and radio communications.

Key words and phrases:
Random Matrix Theory, Stieltjes Transform, Fixed rank deformation, Extreme eigenvalues, Gaussian fluctuations.
2000 Mathematics Subject Classification:
Primary 15A52, Secondary 15A18, 60F15.
The work of the first three authors was supported by the French Ile-de-France region, DIM LSC fund, Digiteo project DESIR.

1. Introduction

1.1. The model and the literature

Consider a sequence of matrices $Y_n = n^{-1/2} X_n D_n^{1/2}$, $n = 1, 2, \ldots$, where $X_n$ is an $N \times n$ random matrix whose coefficients $X_{ij}^n$ are independent and identically distributed (iid) complex Gaussian random variables such that $\Re X_{ij}^n$ and $\Im X_{ij}^n$ are independent, each with mean zero and variance $1/2$, and where $D_n$ is a deterministic $n \times n$ nonnegative diagonal matrix. Writing $D_n = \mathrm{diag}(d_1^n, \ldots, d_n^n)$ and denoting by $\delta_x$ the Dirac measure at $x$, it is assumed that the spectral measure $\nu_n = n^{-1} \sum_{j=1}^n \delta_{d_j^n}$ of $D_n$ converges weakly to a compactly supported probability measure $\nu$ when $n \to \infty$. It is also assumed that the maximum of the distances from the diagonal elements of $D_n$ to the support $\mathrm{supp}(\nu)$ of $\nu$ goes to zero as $n \to \infty$. Assume that $N/n \to c$ when $n \to \infty$, where $c$ is a positive constant. Then it is known that, with probability one, the spectral measure of the Gram matrix $Y_n Y_n^*$ converges weakly to a compactly supported probability measure $\mu$ (see [26], [16], [35], [36]) and that, with probability one, $Y_n Y_n^*$ has no eigenvalues in any compact interval outside $\mathrm{supp}(\mu)$ for $n$ large [3].

Let $r$ be a given positive integer and consider a sequence of deterministic $N \times n$ matrices $A_n$, $n = 1, 2, \ldots$, such that $\mathrm{rank}(A_n) = r$ and $\sup_n \|A_n\| < \infty$, where $\|\cdot\|$ is the spectral norm. Consider the matrix $\Sigma_n = Y_n + A_n$. Since the additive deformation $A_n$ has a fixed rank, the spectral measure of $\Sigma_n \Sigma_n^*$ still converges to $\mu$ (see, e.g., [2, Lemma 2.2]). However, a finite number of eigenvalues of $\Sigma_n \Sigma_n^*$ (often called “outliers” in similar contexts) may stay away from the support of $\mu$. In this paper, minimal conditions ensuring the existence and the convergence of these outliers towards constant values outside $\mathrm{supp}(\mu)$ are provided, and these limit values are characterized. The fluctuations of the outliers lying at the right of $\mathrm{supp}(\mu)$ are also studied.
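
A minimal simulation sketch of this phenomenon (ours, not from the paper; the dimensions, the law of $D_n$ and the deformation below are arbitrary illustrative choices) shows that the bulk of the spectrum of $\Sigma_n \Sigma_n^*$ is insensitive to the rank-$r$ deformation, while a few eigenvalues may exit the bulk:

```python
import numpy as np

# Sketch: bulk invariance and outliers under an additive fixed rank deformation.
rng = np.random.default_rng(0)
N, n, r = 300, 900, 2                       # c = N/n = 1/3
d = rng.uniform(0.5, 1.5, size=n)           # diagonal of D_n, compact support
X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
Y = X * np.sqrt(d) / np.sqrt(n)             # Y_n = n^{-1/2} X_n D_n^{1/2}

U, _ = np.linalg.qr(rng.standard_normal((N, r)))        # N x r isometry
Gamma = np.sqrt(np.array([[4.0], [2.0]])) * rng.standard_normal((r, n)) / np.sqrt(n)
Sigma = Y + U @ Gamma                       # rank-r deformation, ||A_n|| bounded

eig_Y = np.linalg.eigvalsh(Y @ Y.conj().T)
eig_S = np.linalg.eigvalsh(Sigma @ Sigma.conj().T)
print("top eigenvalues, Y Y*        :", np.round(eig_Y[-4:], 3))
print("top eigenvalues, Sigma Sigma*:", np.round(eig_S[-4:], 3))
# Typically the two bulks coincide while Sigma Sigma* shows up to r outliers.
```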

The behavior of the outliers in the spectrum of large random matrices has given rise to an important research effort. In the statistics literature, one of the first contributions to deal with this subject was [23]. It raised the question of the behavior of the extreme eigenvalues of a sample covariance matrix when the population covariance matrix has all but finitely many of its eigenvalues equal to one (leading to a multiplicative fixed rank deformation). This problem has been studied thoroughly in [5, 6, 32]. Other contributions (see [11]) study the outliers of a Wigner matrix subject to an additive fixed rank deformation. The asymptotic fluctuations of the outliers have been addressed in [5, 33, 32, 1, 11, 12, 7].

Recently, Benaych-Georges and Nadakuditi proposed in [8, 9] a generic method for characterizing the behavior of the outliers for a large palette of random matrix models. For our model, this method shows that the limiting locations as well as the fluctuations of the outliers are intimately related to the asymptotic behavior of certain bilinear forms involving the resolvents $Q_n(z) = (Y_n Y_n^* - z I_N)^{-1}$ and $\tilde Q_n(z) = (Y_n^* Y_n - z I_n)^{-1}$ of the undeformed matrix for real values of $z$. When $D_n = I_n$, the asymptotic behavior of these bilinear forms can be simply identified (see [9]) thanks to the fact that the probability law of $X_n$ is invariant by left or right multiplication by deterministic unitary matrices. For general $D_n$, other tools need to be used. In this paper, these bilinear forms are studied with the help of an integration by parts formula for functionals of Gaussian vectors and the Poincaré-Nash inequality. These tools belong to the arsenal of random matrix theory, as shown in the recent monograph [31] and in the references therein. In order to be able to use them in our context, we make use of a regularizing function ensuring that the moments of the bilinear forms exist for the real values of $z$ of interest.

The study of the spectrum outliers of large random matrices has a wide range of applications. These include communication theory [20], fault diagnosis in complex systems [14], financial portfolio management [34], and chemometrics [29]. The matrix model considered in this paper is widely used in the fields of multidimensional signal processing and radio communications. Using the invariance of the probability law of $X_n$ by multiplication by a constant unitary matrix, $D_n$ can be straightforwardly replaced with a nonnegative Hermitian matrix $R_n$. In the model $\Sigma_n = n^{-1/2} X_n R_n^{1/2} + A_n$, where $R_n^{1/2}$ is any square root of $R_n$, matrix $A_n$ often represents the snapshots of a discrete time radio signal sent by $r$ sources and received by an array of $N$ antennas, while $n^{-1/2} X_n R_n^{1/2}$ is a temporally correlated and spatially independent “noise” (spatially correlated and temporally independent noises can be considered as well). In this framework, the results of this paper can be used for detecting the signal sources, estimating their powers, or determining their directions. These subjects are explored in the applicative paper [40].

The remainder of the article is organized as follows. The assumptions and the main results are provided in Section 2. The general approach as well as the basic mathematical tools needed for the proofs are provided in Section 3. These proofs are given in Sections 4 and 5, which concern respectively the first order (convergence) and the second order (fluctuations) behavior of the outliers.

2. Problem description and main results

Given a sequence of integers $N = N(n)$, $n = 1, 2, \ldots$, we consider the sequence of matrices $Y_n = n^{-1/2} X_n D_n^{1/2}$ with the following assumptions:

Assumption 1.

The ratio $c_n = N/n$ converges to a positive constant $c$ as $n \to \infty$.

Assumption 2.

The matrix $X_n$ is an $N \times n$ random matrix whose coefficients $X_{ij}^n$ are iid complex random variables such that $\Re X_{11}^n$ and $\Im X_{11}^n$ are independent, each with probability distribution $\mathcal{N}(0, 1/2)$.

Assumption 3.

The sequence of deterministic $n \times n$ diagonal nonnegative matrices $D_n = \mathrm{diag}(d_1^n, \ldots, d_n^n)$ satisfies the following:

  1. The probability measure $\nu_n = n^{-1} \sum_{j=1}^n \delta_{d_j^n}$ converges weakly to a probability measure $\nu$ with compact support.

  2. The distances $\mathbf{d}(d_j^n, \mathrm{supp}(\nu))$ from the $d_j^n$ to the support of $\nu$ satisfy
     $\max_{1 \le j \le n} \mathbf{d}\bigl(d_j^n, \mathrm{supp}(\nu)\bigr) \xrightarrow[n \to \infty]{} 0.$

The asymptotic behavior of the spectral measure of $Y_n Y_n^*$ under these assumptions has been thoroughly studied in the literature. Before pursuing, we recall the main results which describe this behavior. These results are built around the Stieltjes Transform, defined, for a positive finite measure $\theta$ over the Borel sets of $\mathbb{R}$, as

(1)   $m_\theta(z) = \int \frac{\theta(dt)}{t - z},$

analytic on $\mathbb{C}_+ = \{z \in \mathbb{C} : \Im z > 0\}$. It is straightforward to check that $\Im m_\theta(z) > 0$ when $z \in \mathbb{C}_+$, and $\sup_{y \ge 1} |y\, m_\theta(iy)| < \infty$. Conversely, any analytic function on $\mathbb{C}_+$ that has these two properties admits the integral representation (1) where $\theta$ is a positive finite measure. Furthermore, for any continuous real function $\varphi$ with compact support in $\mathbb{R}$,

(2)   $\int \varphi(t)\, \theta(dt) = \lim_{y \downarrow 0} \frac{1}{\pi} \int \varphi(x)\, \Im m_\theta(x + iy)\, dx,$

which implies that the measure $\theta$ is uniquely defined by its Stieltjes Transform. Finally, if $\Im(z\, m_\theta(z)) \ge 0$ when $z \in \mathbb{C}_+$, then $\theta((-\infty, 0)) = 0$ [25].
These facts can be generalized to Hermitian matrix-valued nonnegative finite measures [10, 15]. Let $\mathbf{M}(z)$ be a $\mathbb{C}^{r \times r}$-valued analytic function on $\mathbb{C}_+$. Letting $\Im \mathbf{M} = (\mathbf{M} - \mathbf{M}^*)/(2i)$, assume that $\Im \mathbf{M}(z) \ge 0$ and $\Im(z \mathbf{M}(z)) \ge 0$ in the order of the Hermitian matrices for any $z \in \mathbb{C}_+$, and that $\sup_{y \ge 1} \|y\, \mathbf{M}(iy)\| < \infty$. Then $\mathbf{M}$ admits the representation (1) where $\boldsymbol{\theta}$ is now a matrix-valued nonnegative finite measure carried by $[0, \infty)$ such that $\boldsymbol{\theta}(\mathbb{R}) = \lim_{y \to \infty} -iy\, \mathbf{M}(iy)$. One can also check that $\boldsymbol{\theta}$ is uniquely defined by $\mathbf{M}$.
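
As a quick numerical illustration (ours, not the paper's) of the inversion formula (2), consider the toy measure $\theta = \frac{1}{2}\delta_1 + \frac{1}{2}\delta_3$, whose Stieltjes Transform is explicit:

```python
import numpy as np

# Check of the inversion formula (2) for theta = 0.5*delta_1 + 0.5*delta_3,
# whose Stieltjes transform is m(z) = 0.5/(1 - z) + 0.5/(3 - z).
def m(z):
    return 0.5 / (1 - z) + 0.5 / (3 - z)

phi = lambda x: np.exp(-(x - 1) ** 2)      # smooth test function, fast decay
x = np.linspace(-2.0, 6.0, 200001)
dx = x[1] - x[0]
for y in (1e-1, 1e-2, 1e-3):
    approx = np.sum(phi(x) * m(x + 1j * y).imag) * dx / np.pi
    print(f"y = {y:.0e}: integral approx {approx:.6f}")
print("exact value:", 0.5 * phi(1.0) + 0.5 * phi(3.0))
```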

The first part of the following theorem has been shown in [26, 36], and the second part in [3]:

Theorem 2.1.

Under Assumptions 1, 2 and 3, the following hold true:

  1. For any $z \in \mathbb{C}_+$, the equation

    (3)   $m = \left( -z + \int \frac{t\, \nu(dt)}{1 + c\, t\, m} \right)^{-1}$

    admits a unique solution $m \in \mathbb{C}_+$. The function $m(z)$ so defined on $\mathbb{C}_+$ is the Stieltjes Transform of a probability measure $\mu$ whose support is a compact set of $[0, \infty)$.
    Let $\lambda_1^n \ge \cdots \ge \lambda_N^n$ be the eigenvalues of $Y_n Y_n^*$, and let $\theta_n = N^{-1} \sum_{i=1}^N \delta_{\lambda_i^n}$ be the spectral measure of this matrix. Then for every bounded and continuous real function $f$,

    (4)   $\int f\, d\theta_n \xrightarrow[n \to \infty]{\text{a.s.}} \int f\, d\mu.$
  2. For any interval $[x_1, x_2] \subset \mathbb{R} \setminus (\mathrm{supp}(\mu) \cup \{0\})$, almost surely, $Y_n Y_n^*$ has no eigenvalues in $[x_1, x_2]$ for all $n$ large.

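The fixed-point equation (3) lends itself to a simple numerical scheme. The sketch below (our own illustration; the discrete approximation of $\nu$ and the point $z$ are arbitrary) solves (3) by plain Picard iteration, which behaves well for $z$ not too close to the real axis:

```python
import numpy as np

# Solve the fixed-point equation (3) for m(z), z in C_+, by Picard iteration.
# nu is approximated by atoms t_k with weights w_k (a toy choice).
def m_mu(z, t, w, c, iters=2000):
    m = -1.0 / z                           # starting point in C_+
    for _ in range(iters):
        m = 1.0 / (-z + np.sum(w * t / (1.0 + c * t * m)))
    return m

c = 1 / 3
t = np.array([0.5, 1.0, 1.5])
w = np.array([0.3, 0.4, 0.3])
z = 2.0 + 0.05j
print("m(z) approx:", m_mu(z, t, w, c))
# Im m(x + iy) / pi for small y > 0 approximates the density of mu at x, cf. (2).
```
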
We now consider the additive deformation $A_n$, i.e., the matrix $\Sigma_n = Y_n + A_n$:

Assumption 4.

The deterministic $N \times n$ matrices $A_n$ have a fixed rank equal to $r$. Moreover, $\sup_n \|A_n\| < \infty$.

In order for some of the eigenvalues of $\Sigma_n \Sigma_n^*$ to converge to values outside $\mathrm{supp}(\mu)$, an extra assumption involving in some sense the interaction between $A_n$ and $D_n$ is needed. Let $A_n = U_n \Gamma_n$ be the Gram-Schmidt factorization of $A_n$, where $U_n$ is an $N \times r$ isometry matrix ($U_n^* U_n = I_r$) and where $\Gamma_n$ is an $r \times n$ upper triangular matrix in row echelon form whose first nonzero coefficient of each row is positive. The factorization so defined is then unique. Define the Hermitian nonnegative matrix-valued measure $\boldsymbol{\mu}_n$ as

$\boldsymbol{\mu}_n(dt) = \sum_{j=1}^n \gamma_j^n (\gamma_j^n)^*\, \delta_{d_j^n}(dt),$

where $\gamma_j^n$ denotes the $j$th column of $\Gamma_n$.

Assumption 3 shows that the supports of the $\boldsymbol{\mu}_n$ are included in a vanishing neighborhood of $\mathrm{supp}(\nu)$. Moreover, it is clear that the support of $\boldsymbol{\mu}_n$ is compact and that $\boldsymbol{\mu}_n(\mathbb{R}) = \Gamma_n \Gamma_n^*$. Since the sequence $(\boldsymbol{\mu}_n(\mathbb{R}))$ is bounded in norm, for every sequence of integers increasing to infinity, there exists a subsequence $(\varphi(n))$ and a nonnegative finite measure $\boldsymbol{\mu}$ such that $\int f\, d\boldsymbol{\mu}_{\varphi(n)} \to \int f\, d\boldsymbol{\mu}$ for every function $f \in C(\mathbb{R})$, with $C(\mathbb{R})$ being the set of continuous functions on $\mathbb{R}$. This fact is a straightforward extension of its analogue for scalar measures.
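
To make this construction concrete, here is a small sketch (our notation, under the reconstruction of $\boldsymbol{\mu}_n$ given above) computing the factorization $A_n = U_n \Gamma_n$ from a thin QR decomposition and evaluating $\int f\, d\boldsymbol{\mu}_n = \sum_j f(d_j^n)\, \gamma_j^n (\gamma_j^n)^*$ for a test function $f$:

```python
import numpy as np

# Factor A = U Gamma (U an N x r isometry, Gamma r x n with positive leading
# coefficients) via a thin QR, then integrate f against the matrix measure.
def factor(A, r):
    Q, R = np.linalg.qr(A)                 # A = Q R, thin QR
    Q, R = Q[:, :r], R[:r, :]              # keep the rank-r part
    s = np.sign(np.diag(R).real)
    s[s == 0] = 1.0
    return Q * s, s[:, None] * R           # flip signs: leading entries > 0

def integrate(f, Gamma, d):
    # sum_j f(d_j) gamma_j gamma_j^*  with gamma_j the j-th column of Gamma
    return (Gamma * f(d)) @ Gamma.conj().T

rng = np.random.default_rng(1)
N, n, r = 40, 60, 2
A = rng.standard_normal((N, r)) @ rng.standard_normal((r, n)) / np.sqrt(n)
d = rng.uniform(0.5, 1.5, size=n)          # diagonal of D_n
U, Gamma = factor(A, r)
print("A = U Gamma:", np.allclose(U @ Gamma, A),
      "| U isometry:", np.allclose(U.conj().T @ U, np.eye(r)))
print("mass mu_n(R) = Gamma Gamma*:\n", integrate(np.ones_like, Gamma, d))
```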

Assumption 5.

Any two accumulation points $\boldsymbol{\mu}$ and $\boldsymbol{\mu}'$ of the sequences $(\boldsymbol{\mu}_n)$ satisfy $\boldsymbol{\mu}' = W^* \boldsymbol{\mu} W$, where $W$ is a unitary matrix.

This assumption on the interaction between $A_n$ and $D_n$ appears to be the least restrictive assumption ensuring the convergence of the outliers to fixed values outside $\mathrm{supp}(\mu)$ as $n \to \infty$. If we consider some other factorization $A_n = \tilde U_n \tilde\Gamma_n$ of $A_n$, where $\tilde U_n$ is an isometry matrix with size $N \times r$, and if we associate to the $\tilde\Gamma_n$ the sequence of Hermitian nonnegative matrix-valued measures $\tilde{\boldsymbol{\mu}}_n$ defined as

(5)   $\tilde{\boldsymbol{\mu}}_n(dt) = \sum_{j=1}^n \tilde\gamma_j^n (\tilde\gamma_j^n)^*\, \delta_{d_j^n}(dt),$

where $\tilde\gamma_j^n$ is the $j$th column of $\tilde\Gamma_n$, then it is clear that $\tilde{\boldsymbol{\mu}}_n = W_n^* \boldsymbol{\mu}_n W_n$ for some unitary matrix $W_n$. By the compactness of the unitary group, Assumption 5 is satisfied for $(\boldsymbol{\mu}_n)$ if and only if it is satisfied for $(\tilde{\boldsymbol{\mu}}_n)$. The main consequence of this assumption is that for any function $f \in C(\mathbb{R})$, the eigenvalues of the matrix $\int f\, d\boldsymbol{\mu}_n$ arranged in some given order will converge.

An example taken from the fields of signal processing and wireless communications might help to better understand the applicability of Assumption 5. In these fields, the matrix $A_n$ often represents a multidimensional radio signal received by an array of $N$ antennas. Frequently this matrix can be factored as $A_n = U_n \Delta_n S_n$, where $U_n$ is a deterministic $N \times r$ isometry matrix, $\Delta_n$ is a deterministic $r \times r$ matrix such that $\Delta_n \Delta_n^*$ converges to a matrix $\Delta$ as $n \to \infty$ (one often assumes $\Delta_n = \Delta$ for each $n$), and $S_n$ is a random $r \times n$ matrix independent of $X_n$ with iid elements $S_{ij}^n$ satisfying $\mathbb{E} S_{ij}^n = 0$ and $\mathbb{E}|S_{ij}^n|^2 = 1/n$ (in the wireless communications terminology, $U_n \Delta_n$ is the so-called MIMO channel matrix and $S_n$ is the so-called signal matrix, see [38]). Taking the factorization $A_n = \tilde U_n \tilde\Gamma_n$ with $\tilde U_n = U_n$ and $\tilde\Gamma_n = \Delta_n S_n$ in (5) and applying the law of large numbers, one can see that, almost surely, for any $f \in C(\mathbb{R})$, the integral $\int f\, d\tilde{\boldsymbol{\mu}}_n$ converges to $\Delta \int f\, d\nu$ with $\Delta = \lim_n \Delta_n \Delta_n^*$. Clearly, the accumulation points of the measures obtained from any other sequence of factorizations of $A_n$ are of the form $W^* (\Delta\, \nu)\, W$ where $W$ is an $r \times r$ unitary matrix.

It is shown in [37] that the limiting spectral measure $\mu$ has a continuous density on $\mathbb{R} \setminus \{0\}$ (see Prop. 3.1 below). Our first order result addresses the problem of the presence of isolated eigenvalues of $\Sigma_n \Sigma_n^*$ in any compact interval outside the support of this density. Of prime importance will be the matrix functions

$\mathbf{G}(z) = \int \frac{\boldsymbol{\mu}(dt)}{1 + c\, m(z)\, t} \qquad\text{and}\qquad \mathbf{M}(z) = m(z)\, \mathbf{G}(z),$

where $\boldsymbol{\mu}$ is an accumulation point of a sequence $(\boldsymbol{\mu}_n)$. Since $\Im m(z) > 0$ on $\mathbb{C}_+$, the function $\mathbf{M}$ is analytic on $\mathbb{C}_+$. It is further easy to show that $\Im \mathbf{M}(z) \ge 0$ and $\Im(z \mathbf{M}(z)) \ge 0$ on $\mathbb{C}_+$, and $\sup_{y \ge 1} \|y\, \mathbf{M}(iy)\| < \infty$. Hence $\mathbf{M}$ is the Stieltjes Transform of a matrix-valued nonnegative finite measure carried by $\mathrm{supp}(\mu)$. Note also that, under Assumption 5, the eigenvalues of $\mathbf{M}(z)$ remain unchanged if $\boldsymbol{\mu}$ is replaced by another accumulation point.

The support of $\mu$ may consist of several connected components, corresponding to as many “bulks” of eigenvalues. Our first theorem specifies the locations of the outliers between any two bulks and on the right of the last bulk. It also shows that there are no outliers on the left of the first bulk:

Theorem 2.2.

Let Assumptions 1, 2 and 3 hold true. Denote by $\hat\lambda_{1,n} \ge \cdots \ge \hat\lambda_{N,n}$ the eigenvalues of $\Sigma_n \Sigma_n^*$. Let $(a, b)$ be any connected component of $\mathbb{R}_+ \setminus \mathrm{supp}(\mu)$. Then the following facts hold true:

  1. Let $(A_n)$ be a sequence satisfying Assumptions 4 and 5. Given an accumulation point $\boldsymbol{\mu}$ of a sequence $(\boldsymbol{\mu}_n)$, let $\mathbf{H}(z) = I_r + \mathbf{M}(z)$. Then $\mathbf{H}$ can be analytically extended to $(a, b)$, where its values are Hermitian matrices, and the extension is increasing in the order of the Hermitian matrices on $(a, b)$. The function $\lambda \mapsto \det \mathbf{H}(\lambda)$ has at most $r$ zeros on $(a, b)$. Let $\rho_1, \ldots, \rho_p$, $p \le r$, be these zeros counting multiplicities. Let $[x_1, x_2]$ be any compact interval in $(a, b)$ such that $\{\rho_1, \ldots, \rho_p\} \cap \{x_1, x_2\} = \emptyset$. Then
     $\#\{ i : \hat\lambda_{i,n} \in [x_1, x_2] \} \xrightarrow[n \to \infty]{\text{a.s.}} \#\{ k : \rho_k \in [x_1, x_2] \}.$

  2. Let $x_{\min} = \min\, \mathrm{supp}(\mu) \setminus \{0\}$ (assuming it is positive). Then for any compact interval $[x_1, x_2] \subset (0, x_{\min})$ and for any sequence of matrices $(A_n)$ satisfying Assumption 4, almost surely, $\Sigma_n \Sigma_n^*$ has no eigenvalues in $[x_1, x_2]$ for all $n$ large.

Given any sequence of positive real numbers lying in a connected component of $\mathbb{R}_+ \setminus \mathrm{supp}(\mu)$ situated after the first bulk, it would be interesting to see whether there exists a sequence of matrices $(A_n)$ that produces outliers converging to these values. The following theorem answers this question positively:

Theorem 2.3.

Let Assumptions 1, 2 and 3 hold true. Let $\rho_1, \ldots, \rho_p$ be positive real numbers lying in a connected component $(a, b)$ of $\mathbb{R}_+ \setminus \mathrm{supp}(\mu)$, and such that $(a, b)$ lies after the first bulk. Then there exists a sequence of matrices $(A_n)$ satisfying Assumptions 4 and 5 such that for any compact interval $[x_1, x_2] \subset (a, b)$ with $\{\rho_1, \ldots, \rho_p\} \cap \{x_1, x_2\} = \emptyset$,
$\#\{ i : \hat\lambda_{i,n} \in [x_1, x_2] \} \xrightarrow[n \to \infty]{\text{a.s.}} \#\{ k : \rho_k \in [x_1, x_2] \}.$

It would be interesting to complete the results of these theorems by specifying the indices of the outliers that appear between the bulks. This demanding analysis might be done by following the ideas of [11] or [39] relative to the so-called exact separation of the eigenvalues of large sample covariance matrices. Another approach dealing with the same kind of problem is developed in [4].

A case of practical importance, at least in the domain of signal processing, is described by the following assumption:

Assumption 6.

The accumulation points $\boldsymbol{\mu}$ are of the form $\boldsymbol{\mu} = W (\Omega\, \nu)\, W^*$, where

$\Omega = \mathrm{diag}\bigl( \omega_1 I_{j_1}, \ldots, \omega_t I_{j_t} \bigr), \qquad \omega_1 > \cdots > \omega_t > 0, \qquad j_1 + \cdots + j_t = r,$

and where $W$ is a unitary matrix.

Because of the specific structure of the factorization $A_n = U_n \Delta_n S_n$, the MIMO wireless communication model described above satisfies this assumption, the $\omega_i$ often referring to the powers of the radio sources transmitting their signals to the array of antennas.
Another case where Assumption 6 is satisfied is the case where $A_n$ is a random matrix independent of $X_n$, where its probability distribution is invariant by right multiplication with a constant unitary matrix, and where the nonzero singular values of $A_n$ converge almost surely towards constant values.
When this assumption is satisfied, we obtain the following corollary of Theorem 2.2, which exhibits some sort of phase transition analogous to the so-called BBP phase transition [5]:

Corollary 2.1.

Assume the setting of Theorem 2.2-(1), and let Assumption 6 hold true. Then the function

$\hat S(\lambda) = -m(\lambda) \int \frac{\nu(dt)}{1 + c\, m(\lambda)\, t}$

is decreasing on $(a, b)$. Depending on the value of $\omega_i$, $i \in \{1, \ldots, t\}$, the equation $\hat S(\lambda) = 1/\omega_i$ has either zero or one solution in $(a, b)$. Denote by $\rho_1, \ldots, \rho_p$ these solutions counting multiplicities. Then the conclusions of Theorem 2.2-(1) hold true for these $\rho_k$.
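
In the classical special case $D_n = I_n$ (so that $\nu = \delta_1$), the quantities above are explicit and the rank-one picture is well known [5, 9]: a deformation of squared norm $\omega$ produces an outlier if and only if $\omega > \sqrt{c}$, in which case it converges to $(1 + \omega)(c + \omega)/\omega$. The following sketch (our illustration; parameters are arbitrary) checks this threshold numerically:

```python
import numpy as np

# Phase transition illustration for D_n = I_n, rank-one deformation of
# squared norm omega: an outlier appears iff omega > sqrt(c), located at
# (1 + omega)(c + omega)/omega; otherwise the top eigenvalue sticks to the
# right edge (1 + sqrt(c))^2 of the Marchenko-Pastur bulk.
rng = np.random.default_rng(2)
N, n = 400, 1600
c = N / n                                   # c = 0.25, threshold sqrt(c) = 0.5
edge = (1 + np.sqrt(c)) ** 2
u = np.ones((N, 1)) / np.sqrt(N)
v = np.ones((n, 1)) / np.sqrt(n)
for omega in (0.2, 0.5, 1.0, 2.0):
    X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
    Sigma = X / np.sqrt(n) + np.sqrt(omega) * u @ v.T
    top = np.linalg.eigvalsh(Sigma @ Sigma.conj().T)[-1]
    pred = (1 + omega) * (c + omega) / omega if omega > np.sqrt(c) else edge
    print(f"omega = {omega:3.1f}: top eigenvalue {top:6.3f}, prediction {pred:6.3f}")
```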

We now turn to the second order result, which will be stated in the simple and practical framework of Assumption 6. Actually, a stronger assumption is needed:

Assumption 7.

The following facts hold true:

  1. $\limsup_n \sqrt{n}\, |c_n - c| < \infty$.

  2. $\limsup_n \sqrt{n}\, \max_{1 \le j \le n} \mathbf{d}\bigl(d_j^n, \mathrm{supp}(\nu)\bigr) < \infty$.

  3. There exists a sequence of factorizations $A_n = \tilde U_n \tilde\Gamma_n$ of $A_n$ such that the measures $\tilde{\boldsymbol{\mu}}_n$ associated with these factorizations by (5) converge to $\boldsymbol{\mu}$ and such that
     $\limsup_n \sqrt{n}\, \Bigl\| \int f\, d\tilde{\boldsymbol{\mu}}_n - \int f\, d\boldsymbol{\mu} \Bigr\| < \infty$
     for $f(t) = 1$ and $f(t) = t$.

Note that one could have required the above superior limits to be zero, which would simplify the statement of Theorem 2.4 below. However, in practice this is usually too strong a requirement; see, e.g., the wireless communications model discussed after Assumption 5, for which the fluctuations of $\int f\, d\tilde{\boldsymbol{\mu}}_n$ are of order $n^{-1/2}$. On the other hand, slower fluctuations would result in a much more intricate statement of Theorem 2.4, which we do not consider here.

Before stating the second order result, a refinement of the results of Theorem 2.1–(1) is needed:

Proposition 2.1 ([36, 22, 18]).

Assume that $D_n$ is an $n \times n$ diagonal nonnegative matrix. Then, for any $z \in \mathbb{C}_+$, the equation

$m = \left( -z + \frac{1}{n} \operatorname{tr} D_n \bigl( I_n + c_n\, m\, D_n \bigr)^{-1} \right)^{-1}$

admits a unique solution $m \in \mathbb{C}_+$. The function $m_n(z)$ so defined on $\mathbb{C}_+$ is the Stieltjes Transform of a probability measure $\mu_n$ whose support is a compact set of $[0, \infty)$. Moreover, the diagonal matrix-valued function $\tilde T_n(z) = -z^{-1} \bigl( I_n + c_n\, m_n(z)\, D_n \bigr)^{-1}$ is analytic on $\mathbb{C}_+$ and coincides with the Stieltjes Transform of a diagonal matrix-valued nonnegative finite measure.
Let Assumption 2 hold true, and assume that $\sup_n \|D_n\| < \infty$ and $0 < \liminf_n c_n \le \limsup_n c_n < \infty$. Then the resolvents $Q_n(z) = (Y_n Y_n^* - z I_N)^{-1}$ and $\tilde Q_n(z) = (Y_n^* Y_n - z I_n)^{-1}$ satisfy

(6)   $\frac{1}{N} \operatorname{tr}\bigl( Q_n(z) - m_n(z) I_N \bigr) \xrightarrow[n \to \infty]{\text{a.s.}} 0, \qquad \frac{1}{n} \operatorname{tr}\bigl( \tilde Q_n(z) - \tilde T_n(z) \bigr) \xrightarrow[n \to \infty]{\text{a.s.}} 0$

for any $z \in \mathbb{C}_+$. When in addition Assumptions 1 and 3 hold true, $m_n$ converges to the function $m$ provided in the statement of Theorem 2.1, uniformly on the compact subsets of $\mathbb{C}_+$.

The function $m_n$ is a finite-$n$ approximation of $m$. Notice that since $N^{-1} \operatorname{tr} Q_n(z)$ is the Stieltjes Transform of the spectral measure of $Y_n Y_n^*$, Convergence (4) stems from (6).
We shall also need a finite-$n$ approximation $\hat S_n$ of the function $\hat S$, defined as

$\hat S_n(\lambda) = -m_n(\lambda)\, \frac{1}{n} \operatorname{tr}\bigl( I_n + c_n\, m_n(\lambda)\, D_n \bigr)^{-1}.$

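The following sketch (ours; toy dimensions) illustrates the first convergence in (6): the normalized trace of the resolvent of $Y_n Y_n^*$ is compared with the solution $m_n(z)$ of the finite-$n$ fixed-point equation of Proposition 2.1:

```python
import numpy as np

# Compare tr Q_n(z)/N with m_n(z) from the finite-n fixed-point equation.
rng = np.random.default_rng(5)
N, n = 300, 900
c_n = N / n
d = rng.uniform(0.5, 1.5, size=n)          # diagonal of D_n
z = -1.0 + 0.5j

m = -1.0 / z                               # Picard iteration, as before
for _ in range(2000):
    m = 1.0 / (-z + np.mean(d / (1.0 + c_n * d * m)))

X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
Y = X * np.sqrt(d) / np.sqrt(n)
Q = np.linalg.inv(Y @ Y.conj().T - z * np.eye(N))
print("m_n(z)      =", m)
print("tr Q(z) / N =", np.trace(Q) / N)    # close to m_n(z) for large n
```
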
With these definitions, we have the following preliminary proposition:

Proposition 2.2.

Let Assumptions 1, 3-7 hold true. Let $\hat S$ be the function defined in the statement of Corollary 2.1 and let $\hat S_n$ be its finite-$n$ approximation defined above. Assume that the equation $\hat S(\lambda) = 1/\omega_1$ has a solution in $(x_+, \infty)$, where $x_+ = \max \mathrm{supp}(\mu)$, and denote by $\rho_1 > \cdots > \rho_q$ the existing solutions (with respective multiplicities $j_1, \ldots, j_q$) of the equations $\hat S(\lambda) = 1/\omega_i$, $i = 1, \ldots, t$, in $(x_+, \infty)$. Then the following facts hold true:

  • $-\hat S'(\rho_k)$ is positive for every $k \in \{1, \ldots, q\}$.

  • Denoting by $W_1, \ldots, W_q$ the first $q$ upper left diagonal blocks of $W$, where $W_k$ has dimensions $j_k \times j_k$, the quantities involving these blocks in the statement of Theorem 2.4 below are well defined for every $k \in \{1, \ldots, q\}$.

We recall that a GUE matrix (i.e., a matrix taken from the Gaussian Unitary Ensemble) is a random Hermitian matrix $G$ such that $G_{ii} \sim \mathcal{N}(0, 1)$, and $\Re G_{ij} \sim \mathcal{N}(0, 1/2)$ and $\Im G_{ij} \sim \mathcal{N}(0, 1/2)$ for $i < j$, and such that all these random variables are independent. Our second order result is provided by the following theorem:

Theorem 2.4.

Let Assumptions 1-7 hold true. Keeping the notations of Proposition 2.2, let

$\hat\Lambda_{k,n} = \sqrt{n}\, \bigl( \hat\lambda_{j_1 + \cdots + j_{k-1} + 1, n} - \rho_k, \ \ldots, \ \hat\lambda_{j_1 + \cdots + j_k, n} - \rho_k \bigr), \qquad k = 1, \ldots, q,$

where $j_0 = 0$ and where the eigenvalues $\hat\lambda_{1,n} \ge \cdots \ge \hat\lambda_{N,n}$ of $\Sigma_n \Sigma_n^*$ are arranged in decreasing order. Let $G_1, \ldots, G_q$ be independent GUE matrices such that $G_k$ is a $j_k \times j_k$ matrix. Then, for any bounded continuous $f$,

$\mathbb{E} f\bigl( \hat\Lambda_{1,n}, \ldots, \hat\Lambda_{q,n} \bigr) - \mathbb{E} f\bigl( \Lambda_{1,n}, \ldots, \Lambda_{q,n} \bigr) \xrightarrow[n \to \infty]{} 0,$

where $\Lambda_{k,n}$ is the random vector of the decreasingly ordered eigenvalues of the matrix

$B_{k,n} = \alpha_k\, G_k + \Xi_{k,n},$

where $\alpha_k$ is a positive constant and $\Xi_{k,n}$ is a deterministic $j_k \times j_k$ Hermitian matrix, both built explicitly from $m_n$, $\hat S_n$, the $\omega_i$, and the blocks $W_k$.

Some remarks can be useful at this stage. The first remark concerns Assumption 7, which is in some sense analogous to [7, Hypothesis 3.1]. This assumption is mainly needed to show that the quantities $\sqrt{n}(\hat\lambda_{i,n} - \rho_k)$ are bounded in probability, guaranteeing the tightness of the vectors $\hat\Lambda_{k,n}$. Assuming that $\boldsymbol{\mu}$ and $\boldsymbol{\mu}'$ both satisfy the third item of Assumption 7, and denoting respectively by $B_{k,n}$ and $B'_{k,n}$ the matrices associated to these measures as in the statement of Theorem 2.4, it is possible to show that $\|B_{k,n} - B'_{k,n}\| \to 0$ as $n \to \infty$. Thus the results of this theorem do not depend on the particular measure satisfying Assumption 7. Finally, we note that Assumption 7 can be lightened at the expense of replacing the limit values $\rho_k$ with certain finite approximations of the outliers, as is done in the applicative paper [40].

The second remark pertains to the Gaussian assumption on the elements of $X_n$. We shall see below that the results of Theorems 2.2-2.4 are intimately related to the first and second order behaviors of bilinear forms of the type $a^* Q_n(\rho) b$, $a^* Q_n(\rho) Y_n \tilde b$, and $\tilde a^* \tilde Q_n(\rho) \tilde b$, where $a$, $b$, $\tilde a$, and $\tilde b$ are deterministic vectors of bounded norm and of appropriate dimensions, and where $\rho$ is a real number lying outside the support of $\mu$. In fact, it is possible to generalize Theorems 2.2 and 2.3 to the case where the elements of $X_n$ are not necessarily Gaussian. This can be made possible by using the technique of [21] to analyze the first order behavior of these bilinear forms. On the other hand, the Gaussian assumption plays a central role in Theorem 2.4. Indeed, the proof of this theorem is based on the fact that these bilinear forms asymptotically fluctuate like Gaussian random variables when centered and scaled by $\sqrt{n}$. Take $a = b = e_1$, where $e_1$ is the first canonical vector of $\mathbb{C}^N$. We show below (see Proposition 2.1 and Lemmas 4.3 and 4.6) that the elements of the resolvent $Q_n(\rho)$ are close for large $n$ to the elements of the deterministic matrix $m_n(\rho) I_N$. We therefore write informally

$\sqrt{n}\, \bigl( e_1^* Q_n(\rho) e_1 - m_n(\rho) \bigr) \simeq \frac{1}{\sqrt{n}} \sum_{j=1}^n Z_{n,j} + \epsilon_n,$

where the $Z_{n,j}$ are random variables built from the entries of $X_n$ and $\epsilon_n$ is a remainder term.
It can be shown furthermore that $\epsilon_n$ is negligible for large $n$ and that the sum $n^{-1/2} \sum_j Z_{n,j}$ is tight. Hence, $\sqrt{n}( e_1^* Q_n(\rho) e_1 - m_n(\rho) )$ is tight. However, when $X_{11}$ is not Gaussian, we infer that this quantity does not converge in general towards a Gaussian random variable. In this case, if we choose canonical vectors for $a$, $b$, $\tilde a$, and $\tilde b$ (see Section 5), Theorem 2.4 no longer holds. Yet, we conjecture that an analogue of this theorem can be recovered when these vectors are replaced with delocalized vectors, following the terminology of [12]. In a word, the elements of these vectors are “spread enough” so that the Gaussian fluctuations are recovered.
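
A small Monte Carlo experiment (ours; $D_n = I_n$ so that the deterministic equivalent is explicit) illustrating both the concentration of $e_1^* Q_n(\rho) e_1$ and the $n^{-1/2}$ scale of its fluctuations:

```python
import numpy as np

# For D_n = I_n, Q(rho) = (XX*/n - rho)^{-1} and m(rho) solves the
# Marchenko-Pastur equation; rho = 4.0 lies outside the bulk for c = 1/4.
rng = np.random.default_rng(3)
c, rho, K = 0.25, 4.0, 200

def m_mp(rho, c):
    # real branch of the MP Stieltjes transform, rho to the right of the bulk
    return (1 - c - rho + np.sqrt((1 - c - rho) ** 2 - 4 * c * rho)) / (2 * c * rho)

for n in (400, 1600):
    N = int(c * n)
    samples = []
    for _ in range(K):
        X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
        Q = np.linalg.inv(X @ X.conj().T / n - rho * np.eye(N))
        samples.append(Q[0, 0].real)
    s = np.array(samples)
    print(f"n = {n}: mean {s.mean():+.4f} (m(rho) = {m_mp(rho, c):+.4f}), "
          f"std * sqrt(n) = {s.std() * np.sqrt(n):.3f}")
```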

A word about the notations

In the remainder of the paper, we shall often drop the subscript or the superscript $n$ when there is no ambiguity. A constant bound that may change from an inequality to another but which is independent of $n$ will always be denoted $K$. Element $(i, j)$ of a matrix $M$ is denoted $[M]_{ij}$ or $M_{ij}$. Element $i$ of a vector $v$ is denoted $[v]_i$ or $v_i$. Convergences in the almost sure sense, in probability and in distribution will be respectively denoted $\xrightarrow{\text{a.s.}}$, $\xrightarrow{\mathcal{P}}$, and $\xrightarrow{\mathcal{D}}$.

3. Preliminaries and useful results

We start this section by providing the main ideas of the proofs of Theorems 2.2 and 2.3.

3.1. Proof principles of the first order results

The proof of Theorem 2.2-(1), to begin with, is based on the idea of [8, 9]. We start with a purely algebraic result. Let $A_n = U_n \Gamma_n$ be a factorization of $A_n$ where $U_n$ is an $N \times r$ isometry matrix. Assume that $\rho$ is not an eigenvalue of $Y_n Y_n^*$. Then $\rho$ is an eigenvalue of $\Sigma_n \Sigma_n^*$ if and only if $\det \hat H_n(\rho) = 0$, where $\hat H_n(\rho)$ is the $2r \times 2r$ matrix

$\hat H_n(\rho) = I_{2r} + \begin{pmatrix} \Gamma_n \Gamma_n^* & I_r \\ I_r & 0 \end{pmatrix} \begin{pmatrix} U_n^* Q_n(\rho) U_n & U_n^* Q_n(\rho) Y_n \Gamma_n^* \\ \Gamma_n Y_n^* Q_n(\rho) U_n & \Gamma_n Y_n^* Q_n(\rho) Y_n \Gamma_n^* \end{pmatrix}$

(for details, see the derivations in [9] or in [20, Section 3]). The idea is now the following. Set $\rho \in \mathbb{R} \setminus \mathrm{supp}(\mu)$ in $\hat H_n(\rho)$. Using an integration by parts formula for functionals of Gaussian vectors and the Poincaré-Nash inequality [31], we show that when $n$ is large,

$U_n^* Q_n(\rho) U_n \simeq m_n(\rho) I_r, \qquad U_n^* Q_n(\rho) Y_n \Gamma_n^* \simeq 0, \qquad \Gamma_n \tilde Q_n(\rho) \Gamma_n^* \simeq \Gamma_n \tilde T_n(\rho) \Gamma_n^*,$

by controlling the moments of the elements of the left hand members. To be able to do these controls, we make use of a certain regularizing function which controls the escape of the eigenvalues of $Y_n Y_n^*$ out of $\mathrm{supp}(\mu)$. Thanks to these results, $\det \hat H_n(\rho)$ is close for large $n$ to

$\det\left( I_r + m_n(\rho) \int \frac{\boldsymbol{\mu}_n(dt)}{1 + c_n\, m_n(\rho)\, t} \right).$

Hence, we expect the eigenvalues of $\Sigma_n \Sigma_n^*$ in the interval $(a, b)$, when they exist, to be close for large $n$ to the zeros in $(a, b)$ of the function

$\lambda \mapsto \det\left( I_r + m_n(\lambda) \int \frac{\boldsymbol{\mu}_n(dt)}{1 + c_n\, m_n(\lambda)\, t} \right),$

which are close to the zeros of $\det \mathbf{H}(\lambda)$. By Assumption 5, these zeros are independent of the choice of the accumulation point $\boldsymbol{\mu}$.
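
The algebraic step can be checked numerically. The sketch below verifies, in the form derived above (the matrix $\hat H_n$ may be arranged differently in [9]; this is one valid instance of the determinant identity), that $\det \hat H_n(\rho)$ vanishes at the eigenvalues of $\Sigma_n \Sigma_n^*$ that are not eigenvalues of $Y_n Y_n^*$; all dimensions are arbitrary:

```python
import numpy as np

# Check: for rho not an eigenvalue of Y Y*, rho is an eigenvalue of
# (Y + U Gamma)(Y + U Gamma)* iff det H(rho) = 0, with
# H(rho) = I + [[Gamma Gamma*, I], [I, 0]] @ B^T Q(rho) B,  B = [U, Y Gamma^T].
rng = np.random.default_rng(4)
N, n, r = 30, 45, 2
Y = rng.standard_normal((N, n)) / np.sqrt(n)
U, _ = np.linalg.qr(rng.standard_normal((N, r)))
Gamma = rng.standard_normal((r, n))        # strong deformation: r clear outliers
Sigma = Y + U @ Gamma

def det_H(rho):
    Q = np.linalg.inv(Y @ Y.T - rho * np.eye(N))
    B = np.hstack([U, Y @ Gamma.T])
    M = np.block([[Gamma @ Gamma.T, np.eye(r)], [np.eye(r), np.zeros((r, r))]])
    return np.linalg.det(np.eye(2 * r) + M @ B.T @ Q @ B)

# det H vanishes (up to rounding) at the r outlier eigenvalues of Sigma Sigma*.
for rho in np.linalg.eigvalsh(Sigma @ Sigma.T)[-r:]:
    print(f"rho = {rho:9.4f}:  det H(rho) = {det_H(rho):+.2e}")
```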

To prove Theorems 2.2-(2) and 2.3, we make use of the results of [37] and [27, 28] relative to the properties of $\mu$ and to those of the restriction of $m$ to $\mathbb{R} \setminus \mathrm{supp}(\mu)$. The main idea is to show that

  • $\mathbf{H}(\lambda) > 0$ for all $\lambda \in (0, x_{\min})$ (these $\lambda$ lie at the left of the first bulk) and for all accumulation points $\boldsymbol{\mu}$.

  • For any component $(a, b)$ of $\mathbb{R}_+ \setminus \mathrm{supp}(\mu)$ such that $a > x_{\min}$ (i.e., lying between two bulks or at the right of the last bulk), there exists a Borel set $E$ such that $\nu(E) > 0$ and

    $-m(\lambda) \int_E \frac{\nu(dt)}{1 + c\, m(\lambda)\, t} > 0$

    for all $\lambda \in (a, b)$.

Thanks to the first result, for any $\lambda$ lying if possible between zero and the left edge of the first bulk, $\det \mathbf{H}(\lambda) > 0$, hence $\Sigma_n \Sigma_n^*$ has asymptotically no outlier at the left of the first bulk.
Coming to Theorem 2.3, let $E$ be a Borel set associated to $(a, b)$ by the result above. We build a sequence of matrices $A_n$ of rank $p$, and such that the associated $\boldsymbol{\mu}_n$ have an accumulation point of the form $\boldsymbol{\mu} = \mathrm{diag}(\omega_1, \ldots, \omega_p)\, \nu_E$, where $\nu_E(dt) = \mathbb{1}_E(t)\, \nu(dt)/\nu(E)$ and where we choose the $\omega_k$ adequately. Theorem 2.2-(1) shows that the function $\mathbf{H}$ associated with this $\boldsymbol{\mu}$ is increasing on $(a, b)$. As a result, $\mathbf{H}$ becomes singular precisely at the points $\rho_k$.

3.2. Sketch of the proof of the second order result

The fluctuations of the outliers will be deduced from the fluctuations of the elements of the matrices $\hat H_n(\rho_k)$ introduced above. The proof of Theorem 2.4 can be divided into two main steps. The first step (Lemma 5.4) consists in establishing a Central Limit Theorem on the $q$-tuple of random matrices