The outliers among the singular values
of large rectangular random matrices
with additive fixed rank deformation
Abstract.
Consider the matrix where the matrix has independent standard Gaussian elements, is a deterministic diagonal nonnegative matrix, and is a deterministic matrix with fixed rank. Under some known conditions, the spectral measures of and both converge towards a compactly supported probability measure as with . In this paper, it is proved that finitely many eigenvalues of may stay away from the support of in the large dimensional regime. The existence and locations of these outliers in any connected component of are studied. The fluctuations of the largest outliers of are also analyzed. The results find applications in the fields of signal processing and radio communications.
Key words and phrases:
Random Matrix Theory, Stieltjes Transform, Fixed rank deformation, Extreme eigenvalues, Gaussian fluctuations.
2000 Mathematics Subject Classification:
Primary 15A52, Secondary 15A18, 60F15.
1. Introduction
1.1. The model and the literature
Consider a sequence of matrices , , of the form
where is a random matrix
whose coefficients are independent and identically distributed (iid)
complex Gaussian random variables such that and
are independent, each with mean zero and variance , and where
is a deterministic nonnegative diagonal matrix.
Writing and denoting by
the Dirac measure, it is assumed that the spectral
measure of
converges weakly to a compactly supported probability
measure when .
It is also assumed that the maximum of the distances from the diagonal elements
of to the support of goes
to zero as .
Assume that when , where is a positive constant.
Then it is known that
with probability one, the spectral measure of the Gram matrix
converges weakly to a compactly supported probability measure
(see [26], [16], [35], [36]) and,
with probability one, has no eigenvalues
in any compact interval outside for large [3].
Let be a given positive integer and consider a sequence of
deterministic matrices , , such that and
where is the spectral norm.
Consider the matrix .
Since the additive deformation has a
fixed rank, the spectral measure of still converges
to (see, e.g., [2, Lemma 2.2]). However, a finite
number of eigenvalues of (often called “outliers” in
similar contexts) may stay away from the support of .
In this paper, minimal conditions ensuring the existence and the convergence
of these outliers towards constant values outside are provided,
and these limit values are characterized. The fluctuations of the outliers
lying to the right of are also studied.
The behavior of the outliers in the spectrum of large random matrices has attracted a significant research effort. In the statistics literature, one of the first contributions to deal with this subject was [23]. It raised the question of the behavior of the extreme eigenvalues of a sample covariance matrix when the population covariance matrix has all but finitely many of its eigenvalues equal to one (leading to a multiplicative fixed rank deformation). This problem has been studied thoroughly in [5, 6, 32]. Other contributions (see [11]) study the outliers of a Wigner matrix subject to an additive fixed rank deformation. The asymptotic fluctuations of the outliers have been addressed in [5, 33, 32, 1, 11, 12, 7].
Recently, Benaych-Georges and Nadakuditi proposed in
[8, 9] a generic method for
characterizing the behavior of the outliers for a large palette of
random matrix models. For our model, this method shows that the limiting
locations
as well as the fluctuations of the outliers are intimately related
to the asymptotic behavior of certain bilinear forms involving the resolvents
and
of the undeformed matrix for real
values of .
When , the asymptotic behavior of these bilinear forms
can be simply identified (see [9]) thanks to the fact that
the probability law of is invariant under left or right multiplication by
deterministic unitary matrices. For general , other tools
need to be used. In this paper, these bilinear forms are studied with the help
of an integration by parts formula for functionals of Gaussian vectors and the
Poincaré–Nash inequality.
These tools belong to the arsenal of random matrix theory, as shown
in the recent monograph [31] and in the references therein.
In order to be able to use them in our context, we make use of a regularizing
function ensuring that the moments of the bilinear forms exist for certain
.
The study of the spectrum outliers of large random matrices has a wide range of
applications. These include communication theory [20], fault
diagnosis in complex systems [14], financial portfolio
management [34], and chemometrics [29]. The matrix
model considered in this paper is widely used in the fields of multidimensional
signal processing and radio communications. Using the invariance of the
probability law of under multiplication by a constant unitary matrix,
can be straightforwardly replaced with a nonnegative Hermitian matrix .
In the model where is any square root of
, matrix often represents snapshots of a discrete time radio
signal sent by sources and received by an array of antennas, while is a temporally correlated and spatially independent “noise”
(spatially correlated and temporally independent noises can be considered as
well). In this framework, the results of this paper can be used for detecting
the signal sources, estimating their powers, or determining their directions.
These subjects are explored in the applicative paper
[40].
The remainder of the article is organized as follows. The assumptions and the main results are provided in Section 2. The general approach as well as the basic mathematical tools needed for the proofs are provided in Section 3. These proofs are given in Sections 4 and 5, which concern respectively the first order (convergence) and the second order (fluctuations) behavior of the outliers.
2. Problem description and main results
Given a sequence of integers , , we consider the sequence of matrices with the following assumptions:
Assumption 1.
The ratio converges to a positive constant as .
Assumption 2.
The matrix is a random matrix whose coefficients are iid complex random variables such that and are independent, each with probability distribution .
Assumption 3.
The sequence of deterministic diagonal nonnegative matrices satisfies the following:

The probability measure converges weakly to a probability measure with compact support.

The distances from to satisfy
The asymptotic behavior of the spectral measure of under these assumptions has been thoroughly studied in the literature. Before proceeding, we recall the main results which describe this behavior. These results are built around the Stieltjes Transform, defined, for a positive finite measure over the Borel sets of , as
(1) 
analytic on . It is straightforward to check that when , and . Conversely, any analytic function on that has these two properties admits the integral representation (1) where is a positive finite measure. Furthermore, for any continuous real function with compact support in ,
(2) 
which implies that the measure is uniquely defined by its Stieltjes
Transform. Finally, if when , then
[25].
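The displayed formulas (1) and (2) did not survive the text extraction. For the reader's convenience, here is the standard material the paragraph describes, written in the usual convention; this is a reconstruction of well-known definitions, not of the authors' exact typography.

```latex
% Stieltjes transform of a positive finite measure \mu on \mathbb{R}:
m_\mu(z) \;=\; \int_{\mathbb{R}} \frac{\mu(d\lambda)}{\lambda - z},
\qquad z \in \mathbb{C} \setminus \operatorname{supp}(\mu). \tag{1}

% Basic properties: for \Im z > 0 one has \Im m_\mu(z) > 0,
% |m_\mu(z)| \le \mu(\mathbb{R}) / \operatorname{dist}(z,\operatorname{supp}(\mu)),
% and \lim_{y\to\infty} -\mathrm{i} y\, m_\mu(\mathrm{i} y) = \mu(\mathbb{R}).

% Stieltjes inversion: for any continuous real \varphi with compact support,
\int \varphi \, d\mu
\;=\; \lim_{y \downarrow 0} \frac{1}{\pi}
\int \varphi(x)\, \Im m_\mu(x + \mathrm{i} y)\, dx. \tag{2}
```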
These facts can be generalized to Hermitian matrix-valued
nonnegative finite measures [10, 15].
Let be a valued analytic function on .
Letting , assume that
and in the order of the Hermitian matrices for any
, and that .
Then admits the representation (1) where is now a
matrix-valued nonnegative finite measure such that
. One can also check
that .
Theorem 2.1.
Under Assumptions 1, 2 and 3, the following hold true:

For any , the equation
(3) admits a unique solution . The function so defined on is the Stieltjes Transform of a probability measure whose support is a compact set of .
Let be the eigenvalues of , and let be the spectral measure of this matrix. Then for every bounded and continuous real function ,(4) 
For any interval ,
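The fixed-point equation (3) was also lost in extraction and cannot be recovered verbatim. For orientation only, the classical Marchenko–Pastur/Silverstein equation for a sample-covariance-type matrix reads as follows in one common convention (population spectral distribution H, dimension ratio c); equation (3) is presumably of this type, though the exact convention used by the authors cannot be inferred from the extracted text.

```latex
% One common convention (Silverstein, 1995): if the empirical spectral
% distribution of T_n converges weakly to H and n/N -> c > 0, then the
% Stieltjes transform m of the limiting spectral measure of
% \frac{1}{N} T_n^{1/2} X_n X_n^* T_n^{1/2} is the unique solution of
m(z) \;=\; \int \frac{H(dt)}{\,t\bigl(1 - c - c\,z\,m(z)\bigr) - z\,},
\qquad z \in \mathbb{C}^+ ,
% with \Im m(z) > 0; m is then the Stieltjes transform of a compactly
% supported probability measure, in analogy with Theorem 2.1.
```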
We now consider the additive deformation :
Assumption 4.
The deterministic matrices have a fixed rank equal to . Moreover, .
In order for some of the eigenvalues of to converge to values outside , an extra assumption involving in some sense the interaction between and is needed. Let be the Gram–Schmidt factorization of where is an isometry matrix and where is an upper triangular matrix in row echelon form whose first nonzero coefficient of each row is positive. The factorization so defined is then unique. Define the Hermitian nonnegative matrix-valued measure as
Assumption 3 shows that . Moreover, it is clear that the support of is included in and that . Since the sequence is bounded in norm, for every sequence of integers increasing to infinity, there exists a subsequence and a nonnegative finite measure such that for every function , with being the set of continuous functions on . This fact is a straightforward extension of its analogue for scalar measures.
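The unique Gram–Schmidt factorization invoked above can be illustrated numerically. The sketch below is my own illustration (names and the full-column-rank simplification are mine, and the real case stands in for the complex one): a thin QR decomposition followed by a sign normalization yields an isometry times an upper triangular factor with positive diagonal, which is the full-rank instance of the row echelon normalization described in the text.

```python
import numpy as np

def gram_schmidt_factor(A):
    """Unique factorization A = U @ T with U an isometry (U^T U = I_r)
    and T upper triangular with positive diagonal entries.
    Assumes A has full column rank (no row-echelon subtleties)."""
    U, T = np.linalg.qr(A)          # thin QR: A = U T, T upper triangular
    signs = np.sign(np.diag(T))
    signs[signs == 0] = 1.0         # guard against exact zeros
    U = U * signs                   # flip column signs of U ...
    T = signs[:, None] * T          # ... and matching row signs of T
    return U, T

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))     # full column rank almost surely
U, T = gram_schmidt_factor(A)
assert np.allclose(U @ T, A)                 # factorization holds
assert np.allclose(U.T @ U, np.eye(3))       # U is an isometry
assert np.all(np.diag(T) > 0)                # positive leading entries
```

The sign normalization is what makes the factorization unique: QR itself is only unique up to a diagonal unitary factor.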
Assumption 5.
Any two accumulation points and of the sequences satisfy where is a unitary matrix.
This assumption on the interaction between and appears to be the least restrictive assumption ensuring the convergence of the outliers to fixed values outside as . If we consider some other factorization of where is an isometry matrix with size , and if we associate to it the sequence of Hermitian nonnegative matrix-valued measures defined as
(5) 
then it is clear that
for some unitary matrix . By the compactness
of the unitary group, Assumption 5 is satisfied for
if and only if it is satisfied for .
The main consequence of this assumption is that for any function
, the eigenvalues of the matrix
arranged in some given order will converge.
An example taken from the fields of signal processing and wireless
communications might help to better understand the applicability
of Assumption 5.
In these fields, the matrix often represents a multidimensional radio
signal received by an array of antennas.
Frequently this matrix can be factored as where
is a deterministic isometry matrix, is a deterministic
matrix such that converges to a matrix as
(one often assumes for each ), and is a random
matrix independent of with iid elements satisfying
and (in the wireless communications
terminology,
is the so-called MIMO channel matrix and is the so-called signal matrix,
see [38]). Taking in
(5) and applying the law of large numbers, one can see that
for any , the integral
converges to with .
Clearly, the accumulation points of the measures obtained from any other
sequence of factorizations of are of the form
where is a unitary matrix.
It is shown in [37] that the limiting spectral measure has a continuous density on (see Prop. 3.1 below). Our first order result addresses the problem of the presence of isolated eigenvalues of in any compact interval outside the support of this density. Of prime importance will be the matrix functions
where is an accumulation point of a sequence . Since on , the function is analytic on . It is moreover easy to show that and on , and . Hence is the Stieltjes Transform of a matrix-valued nonnegative finite measure carried by . Note also that, under Assumption 5, the eigenvalues of remain unchanged if is replaced by another accumulation point.
The support of may consist of several connected components corresponding to as many “bulks” of eigenvalues. Our first theorem specifies the locations of the outliers between any two bulks and to the right of the last bulk. It also shows that there are no outliers to the left of the first bulk:
Theorem 2.2.
Let Assumptions 1, 2 and 3 hold true. Denote by the eigenvalues of . Let be any connected component of . Then the following facts hold true:

Let be a sequence satisfying Assumptions 4 and 5. Given an accumulation point of a sequence , let . Then can be analytically extended to where its values are Hermitian matrices, and the extension is increasing in the order of Hermitian matrices on . The function has at most zeros on . Let , be these zeros counting multiplicities. Let be any compact interval in such that . Then

Let . Then for any positive (assuming it exists) and for any sequence of matrices satisfying Assumption 4,
Given any sequence of positive real numbers lying in a connected component of after the first bulk, it would be interesting to see whether there exists a sequence of matrices that produces outliers converging to these . The following theorem answers this question positively:
Theorem 2.3.
It would be interesting to complete the results of these theorems by specifying
the indices of the outliers that appear between the bulks.
This demanding analysis might be done by following the
ideas of [11] or [39] relative to the so-called
separation of the eigenvalues of . Another approach
dealing with the same kind of problem is developed in
[4].
A case of practical importance at least in the domain of signal processing is described by the following assumption:
Assumption 6.
The accumulation points are of the form where
and where is a unitary matrix.
Because of the specific structure of in the factorization
, the MIMO wireless communication model described
above satisfies this assumption, the often referring to the powers
of the radio sources transmitting their signals to an array of antennas.
Another case where Assumption 6 is satisfied is the case where
is a random matrix independent of , where its probability distribution is
invariant under right multiplication by a constant unitary matrix, and where the
nonzero singular values of converge almost surely towards constant
values.
When this assumption is satisfied, we obtain the following corollary of Theorem 2.2 which exhibits some sort of phase transition analogous to the so-called BBP phase transition [5]:
Corollary 2.1.
Assume the setting of Theorem 2.2(1), and let Assumption 6 hold true. Then the function is decreasing on . Depending on the value of , , the equation has either zero or one solution in . Denote by , these solutions counting multiplicities. Then the conclusion of Theorem 2.2(1) holds true for these .
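The phase-transition phenomenon of Corollary 2.1 can be observed numerically in the simplest special case, a square Gaussian matrix with identity variance profile and a rank-one additive spike. The limit value quoted in the comment is the classical Benaych-Georges–Nadakuditi formula for that special case, not a restatement of the corollary; all parameter choices below are mine.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 600
theta = 3.0                                    # spike strength, above the threshold 1

X = rng.standard_normal((n, n)) / np.sqrt(n)   # bulk singular values fill [0, 2]
u = np.zeros(n); u[0] = 1.0                    # deterministic unit vectors
v = np.zeros(n); v[0] = 1.0
P = theta * np.outer(u, v)                     # fixed-rank (rank-one) deformation

s = np.linalg.svd(X + P, compute_uv=False)
s_max, s_second = s[0], s[1]

# Known limit for the square case (Benaych-Georges & Nadakuditi):
# s_max -> sqrt((1 + theta^2)(1 + 1/theta^2)) when theta > 1,
# while for theta <= 1 the top singular value sticks to the bulk edge 2.
limit = np.sqrt((1 + theta**2) * (1 + theta**(-2)))
print(f"outlier {s_max:.3f} vs predicted {limit:.3f}; second sv {s_second:.3f} (edge ~ 2)")
```

With theta above the threshold, exactly one singular value escapes the bulk and settles near the predicted location; the second-largest singular value stays at the bulk edge.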
We now turn to the second order result, which will be stated in the simple and practical framework of Assumption 6. Actually, a stronger assumption is needed:
Assumption 7.
The following facts hold true:
Moreover, there exists a sequence of factorizations of such that the measures associated with these factorizations by (5) converge to and such that
Note that one could have considered the above superior limits to be zero, which would simplify the statement of Theorem 2.4 below. However, in practice this is usually too strong a requirement; see, e.g., the wireless communications model discussed after Assumption 5, for which the fluctuations of are of order . Conversely, slower fluctuations of would result in a much more intricate statement of Theorem 2.4, which we do not consider here.
Proposition 2.1 ([36, 22, 18]).
Assume that is a diagonal nonnegative matrix. Then, for any , the equation
admits a unique solution for any . The function
so defined on is the Stieltjes Transform
of a probability measure whose support is a compact set of .
Moreover, the diagonal matrix-valued function
is analytic on and
coincides with the Stieltjes Transform of
.
Let Assumption 2 hold true, and assume that , and . Then
the resolvents and
satisfy
(6) 
for any . When in addition Assumptions 1 and 3 hold true, converges to provided in the statement of Theorem 2.1 uniformly on the compact subsets of .
The function
is a finite approximation of .
Notice that since is the Stieltjes Transform of the
spectral measure of , Convergence (4)
stems from (6).
We shall also need a finite approximation of defined as
With these definitions, we have the following preliminary proposition:
Proposition 2.2.
Let Assumptions 1 and 3–7 hold true. Let be the function defined in the statement of Corollary 2.1 and let . Assume that the equation has a solution in , and denote the existing solutions (with respective multiplicities ) of the equations in . Then the following facts hold true:

is positive for every .

Denoting by the first upper left diagonal blocks of , where , for every .
We recall that a GUE matrix (i.e., a matrix taken from the Gaussian Unitary Ensemble) is a random Hermitian matrix such that , and for , and such that all these random variables are independent. Our second order result is provided by the following theorem:
Theorem 2.4.
Let Assumptions 1–7 hold true. Keeping the notations of Proposition 2.2, let
where and where the eigenvalues of are arranged in decreasing order. Let be independent GUE matrices such that is a matrix. Then, for any bounded continuous ,
where is the random vector of the decreasingly ordered eigenvalues of the matrix
where
Some remarks can be useful at this stage. The first remark concerns Assumption
7, which is in some sense analogous to [7, Hypothesis
3.1]. This assumption is mainly needed to show that the are bounded, guaranteeing the tightness of
the vectors .
Assuming that and both satisfy the third item
of Assumption 7, denoting respectively by and
the matrices associated to these measures as in the
statement of Theorem 2.4, it is possible to show that
as
. Thus the results of this theorem do not depend on the
particular measure satisfying Assumption 7.
Finally, we note that Assumption 7 can be relaxed at the
expense of replacing the limit values with certain finite
approximations of the outliers, as is done in the applicative paper
[40].
The second remark pertains to the Gaussian assumption on the elements of . We shall see below that the results of Theorems 2.2–2.4 are intimately related to the first and second order behaviors of bilinear forms of the type , , and where , , and are deterministic vectors of bounded norm and of appropriate dimensions, and where is a real number lying outside the support of . In fact, it is possible to generalize Theorems 2.2 and 2.3 to the case where the elements of are not necessarily Gaussian. This can be made possible by using the technique of [21] to analyze the first order behavior of these bilinear forms. On the other hand, the Gaussian assumption plays a central role in Theorem 2.4. Indeed, the proof of this theorem is based on the fact that these bilinear forms asymptotically fluctuate like Gaussian random variables when centered and scaled by . Take and where is the canonical vector of . We show below (see Proposition 2.1 and Lemmas 4.3 and 4.6) that the elements of the resolvent are close for large to the elements of the deterministic matrix . We therefore write informally
It can be shown furthermore that for large and that the sum is tight. Hence, is tight. However, when is not Gaussian, we infer that does not converge in general towards a Gaussian random variable. In this case, if we choose (see Section 5), Theorem 2.4 no longer holds. Yet, we conjecture that an analogue of this theorem can be recovered when and are replaced with delocalized vectors, following the terminology of [12]. In a word, the elements of these vectors are “spread enough” so that the Gaussian fluctuations are recovered.
A word about the notations
In the remainder of the paper, we shall often drop the subscript or the superscript when there is no ambiguity. A constant bound that may change from one inequality to another but which is independent of will always be denoted . Element of matrix is denoted or . Element of vector is denoted . Convergences in the almost sure sense, in probability and in distribution will be respectively denoted , , and .
3. Preliminaries and useful results
3.1. Proof principles of the first order results
The proof of Theorem 2.2(1), to begin with, is based on the idea of [8, 9]. We start with a purely algebraic result. Let be a factorization of where is an isometry matrix. Assume that is not an eigenvalue of . Then is an eigenvalue of if and only if where is the matrix
(for details, see the derivations in [9] or in [20, Section 3]). The idea is now the following. Set in . Using an integration by parts formula for functionals of Gaussian vectors and the Poincaré–Nash inequality [31], we show that when is large,
by controlling the moments of the elements of the left-hand sides. To carry out these controls, we make use of a certain regularizing function which controls the escape of the eigenvalues of out of . Thanks to these results, is close for large to
Hence, we expect the eigenvalues of in the interval , when they exist, to be close for large to the zeros in of the function
which are close to the zeros of . By
Assumption 5, these zeros are independent of the choice of the
accumulation point .
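The algebraic starting point of this argument, namely that an eigenvalue away from the undeformed spectrum is detected by the vanishing of a small determinant, can be checked numerically. The sketch below uses the simpler Hermitian additive analogue (for a Hermitian B and a rank-r perturbation P = V Θ V*, one has det(B + P − λI) = det(B − λI) det(I_r + Θ V*(B − λI)^{-1} V)); the exact matrix M(ρ) of [9] for the singular-value model did not survive extraction, and all names and parameters here are mine.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 40, 2

B = np.diag(np.linspace(0.0, 1.0, n))              # "undeformed" Hermitian matrix, spec in [0, 1]
V, _ = np.linalg.qr(rng.standard_normal((n, r)))   # isometry: V^T V = I_r
Theta = np.diag([5.0, 3.0])                        # spike strengths
P = V @ Theta @ V.T                                # fixed-rank deformation

def secular(lam):
    # lam outside spec(B) is an eigenvalue of B + P iff this r x r
    # determinant vanishes (determinant lemma applied to B + P - lam I)
    R = np.linalg.inv(B - lam * np.eye(n))         # resolvent of B at lam
    return np.linalg.det(np.eye(r) + Theta @ V.T @ R @ V)

eigs = np.linalg.eigvalsh(B + P)
outliers = eigs[eigs > 1.0 + 1e-8]                 # eigenvalues escaping spec(B)
assert len(outliers) == 2                          # one outlier per spike here
for lam in outliers:
    assert abs(secular(lam)) < 1e-6                # determinant criterion vanishes
print("outliers:", np.round(outliers, 4))
```

The point of the reduction is dimensional: locating outliers of an n x n matrix becomes a root-finding problem for an r x r determinant, which is what makes the asymptotic analysis of the bilinear forms in the resolvent tractable.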
To prove Theorems 2.2(2) and 2.3, we make use of the results of [37] and [27, 28] relative to the properties of and to those of the restriction of to . The main idea is to show that

for all (these lie to the left of the first bulk) and for all .

For any component such that (i.e., lying between two bulks or to the right of the last bulk), there exists a Borel set such that and
for all .
Thanks to the first result, for any lying (when such an interval exists) between zero and
the left edge of the first bulk, , hence
has asymptotically no outlier to the left of the first bulk.
Coming to Theorem 2.3, let be a set associated to by
the result above. We build a sequence of matrices of rank , and such
that the associated have an accumulation point of the form
where we choose
.
Theorem 2.2(1) shows that the function
associated with this is
increasing on . As a result, becomes singular
precisely at the points .