Entrywise Eigenvector Analysis of Random Matrices with Low Expected Rank
Recovering low-rank structures via eigenvector perturbation analysis is a common problem in statistical machine learning, such as in factor analysis, community detection, ranking, matrix completion, among others. While a large variety of results provide tight bounds on the average errors between empirical and population statistics of eigenvectors, fewer results are tight for entrywise analyses, which are critical for a number of problems such as community detection and ranking.
This paper investigates the entrywise perturbation analysis for a large class of random matrices whose expectations are low-rank, including community detection, synchronization ($\mathbb{Z}_2$-spiked Wigner model) and matrix completion models. Denoting by $\{u_k\}$, respectively $\{u_k^*\}$, the eigenvectors of a random matrix $A$, respectively $\mathbb{E}A$, the paper characterizes cases for which $u_k \approx Au_k^*/\lambda_k^*$ serves as a first-order approximation under the $\ell_\infty$ norm. The fact that the approximation is both tight and linear in the random matrix $A$ allows for sharp comparisons of $u_k$ and $u_k^*$. In particular, it allows one to compare the signs of $u_k$ and $u_k^*$ even when $\|u_k - u_k^*\|_\infty$ is large, which in turn allows us to settle the conjecture in [?] that the spectral algorithm achieves exact recovery in the stochastic block model without any trimming or cleaning steps. The results are further extended to the perturbation of eigenspaces, providing new bounds for $\ell_\infty$-type errors in noisy matrix completion.
Keywords: Eigenvector perturbation, spectral analysis, synchronization, community detection, matrix completion, low-rank structures, random matrices.
Many estimation problems in statistics involve low-rank matrix estimators that are NP-hard to compute, and many of these estimators are solutions to nonconvex programs. This is partly because of the widespread use of maximum likelihood estimation (MLE), which, while enjoying good statistical properties, often poses computational challenges due to nonconvex or discrete constraints inherent in the problems. It is, thus, crucial to address computational issues, especially in modern large-scale problems.
Fortunately, computationally efficient eigenvector-based algorithms often afford good performance. The eigenvectors either directly lead to final estimates [?], or serve as warm starts followed by further refinements [?]. Such algorithms mostly rely on computing leading eigenvectors and matrix-vector multiplications, which are fast by nature.
While various heuristics abound, theoretical understanding of the entrywise behavior remains scarce, as does understanding of when refinements are needed or can be avoided. In particular, it remains open in various cases to determine whether a vanilla eigenvector-based method without preprocessing steps (e.g., trimming of outliers) or without refinement steps (e.g., cleaning with local improvements) enjoys the same optimality results as the MLE (or SDP) does. A crucial missing step is a sharp entrywise perturbation analysis of eigenvectors. This is partly due to the fact that directly targeting the distance between the eigenvectors of a random matrix and its expected counterpart is not the right approach; errors per entry can be asymmetrically distributed, as we shall see in this paper.
This paper investigates entrywise behaviors of eigenvectors, and more generally eigenspaces, for random matrices with low expected rank using the following approach. Let $A$ be a random matrix, let $A^* = \mathbb{E}A$, and let $A - A^*$ be the 'error' of $A$ around its mean. In many cases, $A^*$ is a symmetric matrix with low rank determined by the structure of a statistical problem, such as a low-rank matrix with blocks in community detection.
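Consider for now the case of symmetric $A$, and let $u_k$, resp. $u_k^*$, be the eigenvector corresponding to the $k$-th largest eigenvalue of $A$, resp. of $A^* = \mathbb{E}A$; denote the latter eigenvalue by $\lambda_k^*$. Roughly speaking, if $\|A - A^*\|_2$ is moderate, our first-order approximation reads
$$u_k \approx \frac{Au_k^*}{\lambda_k^*} = u_k^* + \frac{(A - A^*)u_k^*}{\lambda_k^*}.$$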
While eigenvectors are nonlinear functions of the random matrix $A$ (or equivalently of $A - A^*$), the approximation is linear in $A$, which greatly facilitates the analysis of eigenvectors. Under certain conditions, the maximum entrywise approximation error $\|u_k - Au_k^*/\lambda_k^*\|_\infty$ can be much smaller than $\|u_k^*\|_\infty$, allowing us to compare $u_k$, $Au_k^*/\lambda_k^*$ and $u_k^*$ sharply. To obtain such results, a key part of our theory is to characterize concentration properties of the random matrix $A$ and structural assumptions on its expectation $A^*$.
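To make the approximation concrete, here is a minimal NumPy sketch on a rank-one example. It assumes, purely for illustration, $A^* = xx^\top$ with random signs $x \in \{\pm1\}^n$ (so $\lambda^* = n$ and $u^* = x/\sqrt n$) and symmetric Gaussian noise of level $\sigma$; the helper name `first_order_errors` and the parameter values are hypothetical choices, not part of the paper. It reports $\|u - u^*\|_\infty$ and $\|u - Au^*/\lambda^*\|_\infty$ after resolving the sign ambiguity.

```python
import numpy as np


def first_order_errors(n=400, sigma=2.0, seed=0):
    """Compare the top eigenvector u of A with u* and with A u*/lambda*.

    Illustrative assumptions: A* = x x^T with random signs x, so that
    lambda* = n and u* = x / sqrt(n); the noise is sigma * W, W symmetric Gaussian.
    """
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=n)
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2)              # symmetric, unit off-diagonal variance
    A = np.outer(x, x) + sigma * W

    lam_star = float(n)                     # top eigenvalue of x x^T
    u_star = x / np.sqrt(n)                 # its unit eigenvector

    _, vecs = np.linalg.eigh(A)             # eigenvalues in ascending order
    u = vecs[:, -1]                         # eigenvector of the largest eigenvalue
    if u @ u_star < 0:                      # eigenvectors are defined up to sign
        u = -u

    linear = A @ u_star / lam_star          # first-order approximation
    pert_err = np.max(np.abs(u - u_star))   # ||u - u*||_inf
    approx_err = np.max(np.abs(u - linear)) # ||u - A u*/lambda*||_inf
    return pert_err, approx_err, np.max(np.abs(u_star))


pert_err, approx_err, scale = first_order_errors()
print(f"||u - u*||_inf = {pert_err:.4f}, "
      f"||u - Au*/lam*||_inf = {approx_err:.4f}, ||u*||_inf = {scale:.4f}")
```

In such runs one expects the second error to be noticeably smaller than the first, which is the phenomenon the theory below quantifies.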
This perturbation analysis leads to new and sharp theoretical guarantees. In particular, we find that for the problems we study in this paper, fast one-shot eigenvector-based algorithms enjoy the same optimality as the MLE does, i.e., the vanilla spectral algorithm (without trimming or cleaning) achieves the information-theoretic limit of the MLE estimator whenever the MLE estimator succeeds. This settles in particular a conjecture left open in [?]. Therefore, MLE or SDPs offer no advantage over spectral methods for exact recovery (a.k.a. strong consistency).
1.1 A sample problem
Let us consider a network model that has received widespread interest in recent years: the stochastic block model (SBM). Suppose that we have a graph $G$ with vertex set $[n] = \{1, \dots, n\}$, and assume for simplicity that $n$ is even. There is an unknown index set $J \subset [n]$ with $|J| = n/2$ such that the vertex set is partitioned into two groups $J$ and $J^c$. Within groups, there is an edge between each pair of vertices with probability $p$, and between groups, there is an edge with probability $q$. Let $x \in \{\pm1\}^n$ be the group assignment vector with $x_i = 1$ if $i \in J$ and $x_i = -1$ otherwise. The goal is to recover $x$ from the observed edges of the graph.
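For readers who want to experiment, a minimal NumPy sketch of sampling from this two-group SBM follows. It assumes a uniformly random balanced labeling and, as a convenience, also draws the diagonal (self-loops) with probability $p$, so that $\mathbb{E}A$ is exactly the block matrix described below; the function name `sample_sbm` is hypothetical.

```python
import numpy as np


def sample_sbm(n, p, q, seed=None):
    """Sample (A, x) from the two-group SBM with n even.

    x[i] = +1 for vertices in J and -1 otherwise, with |J| = n/2 chosen at random.
    Same-group pairs are connected with probability p, other pairs with q.
    Assumption: the diagonal is drawn too, with probability p (self-loops).
    """
    assert n % 2 == 0, "n must be even"
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.repeat([1, -1], n // 2))
    probs = np.where(np.equal.outer(x, x), p, q)   # p within groups, q across
    upper = np.triu(rng.random((n, n)) < probs)    # independent draws on/above diagonal
    A = (upper | upper.T).astype(float)            # symmetrize
    return A, x


A, x = sample_sbm(200, 0.5, 0.1, seed=0)
print(A.shape, int(A.sum()), x[:10])
```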
This random-graph-based model was proposed for social relationship networks [?], and many more realistic models have been developed based on the SBM since then. Given its fundamental importance, a large body of work addresses its statistical properties and algorithmic efficiency; we discuss these further below, and refer to [?] for a survey.
Under the regime $p = a\log n/n$ and $q = b\log n/n$, where $a > b > 0$ are constants, it is known from [?] and [?] that exact recovery is possible if and only if $\sqrt a - \sqrt b > \sqrt 2$, and that the limit can be achieved with efficient algorithms. The algorithms in these two papers use two-round procedures, with a clean-up phase, that achieve the threshold. Semidefinite relaxations are also known to achieve the threshold [?], as well as spectral methods with local refinements [?], which are also related to [?], [?], [?], [?] and [?].
While existing works tackle exact recovery rather successfully, some fundamental questions remain unsolved: how do the simple statistics—top eigenvectors of the adjacency matrix—behave? Are they informative enough to reveal the group structure under very challenging regimes?
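To study these questions, we start with the eigenvectors of $A^* = \mathbb{E}A$, the expectation of the graph adjacency matrix $A$. By definition, $A_{ij}$ is a Bernoulli random variable, and $\mathbb{P}(A_{ij} = 1)$ equals $p$ or $q$ depending on whether $i$ and $j$ are from the same group. The expectation must be a block matrix of the following form:
$$A^* = \begin{pmatrix} p\,\mathbf{1}_{\frac n2\times\frac n2} & q\,\mathbf{1}_{\frac n2\times\frac n2} \\ q\,\mathbf{1}_{\frac n2\times\frac n2} & p\,\mathbf{1}_{\frac n2\times\frac n2} \end{pmatrix},$$
where $\mathbf{1}_{\frac n2\times\frac n2}$ is the $\frac n2\times\frac n2$ all-one matrix. Here, for convenience, we represent $A^*$ as if $J = \{1, \dots, n/2\}$. But in general $J$ is unknown, and there is a permutation of indices in the matrix representation.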
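From the matrix representation it is clear that $A^*$ has rank 2, with the top two eigenvalues $\lambda_1^* = (p+q)n/2$ and $\lambda_2^* = (p-q)n/2$. Simple calculations give the corresponding (normalized) eigenvectors: $u_1^* = \mathbf{1}_n/\sqrt n$, and $u_{2,i}^* = 1/\sqrt n$ if $i \in J$ and $u_{2,i}^* = -1/\sqrt n$ if $i \notin J$, i.e., $u_2^* = x/\sqrt n$. Since $u_2^*$ perfectly aligns with the group assignment vector $x$, we hope to show that its counterpart $u_2$, i.e., the second eigenvector of $A$, also has desirable properties.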
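As a quick numerical sanity check of these formulas, the NumPy sketch below builds $A^*$ with the illustrative values $n = 10$, $p = 0.6$, $q = 0.2$ (taking $J$ to be the first half of the vertices) and compares its spectrum with $(p+q)n/2 = 4$ and $(p-q)n/2 = 2$.

```python
import numpy as np

n, p, q = 10, 0.6, 0.2
x = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])   # J = first half
A_star = np.where(np.equal.outer(x, x), p, q)              # block matrix E[A]

vals, vecs = np.linalg.eigh(A_star)
order = np.argsort(vals)[::-1]                             # sort in decreasing order
vals, vecs = vals[order], vecs[:, order]

u2_star = vecs[:, 1] * np.sign(vecs[0, 1])                 # fix the sign so u2*[0] > 0
print(vals[:3])              # should be close to (p+q)n/2, (p-q)n/2 and 0
print(np.sqrt(n) * u2_star)  # should be close to x, i.e. +1 on J and -1 off J
```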
The first reassuring fact is that the top eigenvalues preserve the proper ordering: by Weyl's inequality, the deviation of any eigenvalue $\lambda_i$ ($1 \le i \le n$) from $\lambda_i^*$ is bounded by $\|A - A^*\|_2$, which is $O(\sqrt{\log n})$ with high probability, due to a refined version of Feige-Ofek's result (see Lemma ?). Then, by Davis-Kahan's theorem, $u_1$ and $u_2$ are weakly consistent estimators of $u_1^*$ and $u_2^*$ respectively, in the sense that $|\langle u_k, u_k^*\rangle| \to 1$ in probability for $k = 1, 2$. However, weak consistency is only a guarantee in the average sense, and it is not helpful for understanding the behavior of the eigenvectors in the uniform sense, which is crucial for exact recovery. Weak consistency does not explain why we should expect a sharp phase transition phenomenon. This makes entrywise analysis both interesting and challenging.
This problem motivates some simulations of the coordinates of the top eigenvectors of $A$. In Figure 1, we calculate the rescaled second eigenvector $\sqrt n\,u_2$ of one typical realization of $A$, and make a histogram plot of its coordinates. (Note that the first eigenvector is aligned with the all-one vector $\mathbf{1}_n$, which is uninformative.) The parameters $a$ and $b$ we choose to generate the random graph satisfy $\sqrt a - \sqrt b > \sqrt 2$, so exact recovery is possible with high probability. The red vertical lines show the coordinates of $\sqrt n\,u_2^*$, which only take the two values $-1$ or $1$. Visibly, the coordinates are grouped into two clusters. Intuitively, the signs of these coordinates alone should be able to reveal the group structure.
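To probe into the second eigenvector $u_2$, we expand the perturbation $u_2 - u_2^*$ as follows:
$$u_2 - u_2^* = \Big(\frac{Au_2^*}{\lambda_2^*} - u_2^*\Big) + \Big(u_2 - \frac{Au_2^*}{\lambda_2^*}\Big).$$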
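The first term is exactly $(A - A^*)u_2^*/\lambda_2^*$, which is linear in $A - A^*$, and it represents the first-order perturbation error. The second term is nonlinear in $A - A^*$ in general, and represents the higher-order perturbation error. In Figure 1, we made boxplots of the infinity norm of the rescaled perturbation errors over repeated independent realizations of $A$ (see (i)-(iii)). It is clear from the boxplots that the higher-order error $\|u_2 - Au_2^*/\lambda_2^*\|_\infty$ is much smaller than both $\|u_2 - u_2^*\|_\infty$ and $\|(A - A^*)u_2^*/\lambda_2^*\|_\infty$. Indeed, we will see in Theorem ? that, up to sign,
$$\Big\|u_2 - \frac{Au_2^*}{\lambda_2^*}\Big\|_\infty = o_P(\|u_2^*\|_\infty).$$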
Here, the notation $X_n = o_P(Y_n)$ means $X_n/Y_n \to 0$ in probability as $n \to \infty$. Therefore, the entrywise behavior of $u_2$ is captured by the first-order term $Au_2^*/\lambda_2^*$, which is much more amenable to analysis. This observation will finally lead to sharp eigenvector results in Section 3.2.
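The decomposition above can be reproduced numerically. The following NumPy sketch uses illustrative, fairly dense parameters ($n = 400$, $p = 0.5$, $q = 0.1$, with $J$ the first half of the vertices and self-loops drawn with probability $p$ so that $\mathbb{E}A = A^*$ exactly); it is a toy check, not the experiment behind Figure 1. It splits $u_2 - u_2^*$ into the first-order and higher-order parts and prints their rescaled sup-norms.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 400, 0.5, 0.1                      # illustrative parameters
x = np.repeat([1.0, -1.0], n // 2)           # J = first half, for simplicity
probs = np.where(np.equal.outer(x, x), p, q)
upper = np.triu(rng.random((n, n)) < probs)
A = (upper | upper.T).astype(float)          # symmetric SBM adjacency matrix

lam2_star = (p - q) * n / 2                  # second eigenvalue of E[A]
u2_star = x / np.sqrt(n)                     # second eigenvector of E[A]

_, vecs = np.linalg.eigh(A)
u2 = vecs[:, -2]                             # eigenvector of the 2nd largest eigenvalue
if u2 @ u2_star < 0:                         # resolve the sign ambiguity
    u2 = -u2

linear = A @ u2_star / lam2_star
first_order = linear - u2_star               # equals (A - E[A]) u2* / lambda2*
higher_order = u2 - linear                   # nonlinear remainder
total = u2 - u2_star

for name, v in [("total", total), ("first-order", first_order),
                ("higher-order", higher_order)]:
    print(f"{name:>12}: sqrt(n) * sup-norm = {np.sqrt(n) * np.max(np.abs(v)):.3f}")
```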
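We remark that it is also possible to calculate the top eigenvector (denoted as $\bar u$) of the centered adjacency matrix $\bar A = A - \frac{\bar d}{n}\mathbf{1}_n\mathbf{1}_n^\top$, where $\bar d$ is the average degree of all vertices. The top eigenvector of the centered expectation $A^* - \frac{p+q}{2}\mathbf{1}_n\mathbf{1}_n^\top = \frac{p-q}{2}xx^\top$ is exactly $u_2^*$, so its counterpart $\bar u$ is very similar to $u_2$. In fact, the same reasoning and analysis apply to $\bar u$, and one obtains plots similar to Figure 1 (omitted here).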
1.2 First-order approximation of eigenvectors
Now we present a simpler version of our result that justifies the intuitions above. Consider a general symmetric random matrix $A \in \mathbb{R}^{n\times n}$ (more precisely, a sequence of random matrices indexed by $n$), not necessarily a graph adjacency matrix, with independent entries on and above its diagonal. Suppose its expectation $A^* = \mathbb{E}A$ has a low-rank structure, with $r$ nonzero eigenvalues. Let us assume that
(a) $r = O(1)$, and these eigenvalues are positive and ordered $\lambda_1^* \ge \lambda_2^* \ge \cdots \ge \lambda_r^* > 0$.
Their corresponding eigenvectors are denoted by $u_1^*, \dots, u_r^*$. In other words, we have the spectral decomposition $A^* = \sum_{j=1}^r \lambda_j^* u_j^* u_j^{*\top}$.
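We fix $k \in [r]$. To study the $k$-th eigenvector, let us define the eigen-gap (or spectral gap) by
$$\Delta^* = \min\{\lambda_{k-1}^* - \lambda_k^*,\ \lambda_k^* - \lambda_{k+1}^*\},$$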
where we define $\lambda_0^* = +\infty$ and $\lambda_{r+1}^* = 0$ for simplicity. Assume that
(b) $A$ concentrates under the spectral norm, i.e., there is a suitable $\gamma = o(1)$ such that $\|A - A^*\|_2 \le \gamma\Delta^*$ holds with probability $1 - o(1)$.
A direct yet important implication is that the fluctuation of $\lambda_k$ is much smaller than the gap $\Delta^*$, since by Weyl's inequality, $|\lambda_k - \lambda_k^*| \le \|A - A^*\|_2 \le \gamma\Delta^*$. Thus, $\lambda_k$ is well separated from the other eigenvalues, including the 'bulk' eigenvalues whose magnitudes are at most $\gamma\Delta^*$.
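The eigen-gap is a simple function of the ordered eigenvalues. As a minimal sketch (the helper name `eigen_gap` is hypothetical), the Python function below computes $\Delta^* = \min\{\lambda_{k-1}^* - \lambda_k^*, \lambda_k^* - \lambda_{k+1}^*\}$ with the conventions $\lambda_0^* = +\infty$ and $\lambda_{r+1}^* = 0$.

```python
import math


def eigen_gap(eigs, k):
    """Delta* for the k-th eigenvalue (1-indexed) of a rank-r matrix A*.

    eigs: the r positive eigenvalues lambda_1* >= ... >= lambda_r*.
    Uses the conventions lambda_0* = +inf and lambda_{r+1}* = 0.
    """
    padded = [math.inf, *eigs, 0.0]   # padded[j] = lambda_j* for j = 0, ..., r+1
    return min(padded[k - 1] - padded[k], padded[k] - padded[k + 1])


eigs = [5.0, 3.0, 1.0]
print([eigen_gap(eigs, k) for k in (1, 2, 3)])   # [2.0, 2.0, 1.0]
```

For $k = r$ the gap to $\lambda_{r+1}^* = 0$ enters, which is why the smallest nonzero eigenvalue also controls separation from the bulk.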
Besides, we assume $A$ concentrates in a row-wise sense. To understand its precise meaning, note that if each entry of $A - A^*$ is a Gaussian variable with variance $\sigma^2$, then for any $m \in [n]$ and any weight vector $w \in \mathbb{R}^n$, the weighted sum $(A - A^*)_{m\cdot}w$ is also a Gaussian variable with variance $\sigma^2\|w\|_2^2$, and thus its absolute value is bounded by $O(\sigma\|w\|_2\sqrt{\log n})$ with probability $1 - o(n^{-1})$ (since Gaussian variables have light tails).
Here, the notation $(A - A^*)_{m\cdot}$ means the $m$-th row vector of $A - A^*$. While a similar result holds for sub-Gaussian variables, it is often too crude for Bernoulli variables whose success probabilities vanish as $n \to \infty$. To characterize such concentration well for a broader distribution family beyond Gaussian variables, we assume
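(c) there exists a continuous non-decreasing function $\varphi: \mathbb{R}_+ \to \mathbb{R}_+$ that possibly depends on $n$, such that $\varphi(0) = 0$, $\varphi(x)/x$ is non-increasing, and that for any $m \in [n]$ and $w \in \mathbb{R}^n$, with probability $1 - o(n^{-1})$,
$$|(A - A^*)_{m\cdot}\,w| \le \Delta^*\|w\|_\infty\,\varphi\Big(\frac{\|w\|_2}{\sqrt n\,\|w\|_\infty}\Big).$$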
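For Gaussian variables, i.e., $A_{ij} - A_{ij}^* \sim N(0, \sigma^2)$, we can simply choose a linear function $\varphi(x) = cx$, where $c > 0$ is a suitable coefficient. The condition then reads
$$|(A - A^*)_{m\cdot}\,w| \le \frac{c\,\Delta^*}{\sqrt n}\,\|w\|_2,$$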
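which directly follows from the Gaussian tail bound once $c\Delta^*/\sqrt n \gtrsim \sigma\sqrt{\log n}$, since $(A - A^*)_{m\cdot}w$ is Gaussian with standard deviation $\sigma\|w\|_2$. For Bernoulli variables, we can choose a suitable nonlinear $\varphi$ (see Figure 2). In both cases we have $\varphi(1) = O(1)$ under suitable signal-to-noise conditions.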
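To see how the general row-concentration bound collapses to the Gaussian form, the NumPy sketch below evaluates the right-hand side $\Delta^*\|w\|_\infty\,\varphi(\|w\|_2/(\sqrt n\|w\|_\infty))$ for a nonzero weight vector and checks that, with the linear choice $\varphi(x) = cx$, it equals $c\Delta^*\|w\|_2/\sqrt n$. The function name `row_concentration_rhs`, the slope $c = 0.5$ and the value $\Delta^* = 100$ are illustrative assumptions.

```python
import numpy as np


def row_concentration_rhs(w, delta_star, phi):
    """Evaluate Delta* * ||w||_inf * phi(||w||_2 / (sqrt(n) * ||w||_inf)) for nonzero w."""
    w = np.asarray(w, dtype=float)
    w_inf = np.max(np.abs(w))
    ratio = np.linalg.norm(w) / (np.sqrt(w.size) * w_inf)   # lies in (0, 1]
    return delta_star * w_inf * phi(ratio)


c = 0.5   # illustrative slope for the linear (Gaussian) choice of phi


def phi_linear(t):
    return c * t


w = np.array([1.0, -0.5, 0.25, 0.0])
rhs = row_concentration_rhs(w, delta_star=100.0, phi=phi_linear)
# With a linear phi, the bound reduces to c * Delta* * ||w||_2 / sqrt(n).
print(rhs, c * 100.0 * np.linalg.norm(w) / np.sqrt(w.size))
```

A nonlinear $\varphi$ (as used for Bernoulli noise) can be passed in the same way; only the function argument changes.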
We are in a position to state our approximation result.
On the left-hand side, we are allowed to choose a suitable sign $s \in \{\pm1\}$, because eigenvectors are not uniquely defined. The second bound is a consequence of the first one, since $\gamma = o(1)$ and, by continuity of $\varphi$ with $\varphi(0) = 0$, $\varphi(\gamma) = o(1)$. We hide the dependency on $\varphi(1)$ in the above bound, since $\varphi(1)$ is bounded by a constant under a suitable signal-to-noise ratio. See Theorem ? for a detailed version of the theorem. Therefore, the approximation error is much smaller than $\|u_k^*\|_\infty$. This rigorously confirms the intuitions in Section 1.1.
Here are some remarks. (1) This theorem enables us to study $u_k$ via its linearized version $Au_k^*/\lambda_k^*$, since the approximation error is usually small order-wise. (2) The conditions of the theorem are fairly mild. For SBM, the theorem is applicable as long as we are in the regime $p = a\log n/n$ and $q = b\log n/n$ with constants $a > b > 0$, regardless of whether $\sqrt a - \sqrt b$ exceeds $\sqrt 2$. The requirement $\sqrt a - \sqrt b > \sqrt 2$ for exact recovery is only needed when showing that the entries of $Au_2^*/\lambda_2^*$ have the same signs as those of $u_2^*$ and are well separated into two clusters.
1.3 MLE, spectral algorithm, and strong consistency
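Once we obtain the approximation result $u_k \approx Au_k^*/\lambda_k^*$, the analysis of the entrywise behavior of eigenvectors boils down to that of $Au_k^*/\lambda_k^*$. For example, in the SBM problem, once we prove $\|u_2 - Au_2^*/\lambda_2^*\|_\infty = o_P(1/\sqrt n)$ (up to sign), we expect $\operatorname{sgn}(u_2) = \operatorname{sgn}(Au_2^*)$ with probability $1 - o(1)$, where $\operatorname{sgn}(\cdot)$ means the sign function applied to each coordinate of a vector. If, under suitable conditions, we further have with probability $1 - o(1)$ that $\operatorname{sgn}(Au_2^*) = \operatorname{sgn}(u_2^*)$ and the entries of $Au_2^*/\lambda_2^*$ are all bounded away from zero by an order of $1/\sqrt n$, then the vanilla eigenvector estimator $\operatorname{sgn}(u_2)$ achieves exact recovery. Since each coordinate of $Au_2^*$ is simply a sum of weighted Bernoulli variables, the analysis becomes much easier.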
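The vanilla estimator $\operatorname{sgn}(u_2)$ takes only a few lines. The NumPy sketch below is an illustration under stated assumptions, not the paper's code: it samples a two-group SBM with self-loops allowed, uses a dense, well-separated toy setting ($n = 300$, $p = 0.6$, $q = 0.1$) rather than the $\log n/n$ regime, sends exact zeros to $+1$, and measures agreement up to a global sign flip. The helper names are hypothetical.

```python
import numpy as np


def sample_sbm(n, p, q, rng):
    """Two-group SBM with a random balanced labeling x (self-loops allowed)."""
    x = rng.permutation(np.repeat([1, -1], n // 2))
    probs = np.where(np.equal.outer(x, x), p, q)
    upper = np.triu(rng.random((n, n)) < probs)
    return (upper | upper.T).astype(float), x


def vanilla_spectral(A):
    """Return sgn(u_2): signs of the eigenvector of the 2nd largest eigenvalue."""
    _, vecs = np.linalg.eigh(A)        # eigenvalues in ascending order
    u2 = vecs[:, -2]
    return np.where(u2 >= 0, 1, -1)    # sign function; exact zeros sent to +1


def agreement(x_hat, x):
    """Fraction of correctly recovered labels, up to a global sign flip."""
    return max(np.mean(x_hat == x), np.mean(x_hat == -x))


rng = np.random.default_rng(1)
A, x = sample_sbm(300, 0.6, 0.1, rng)  # illustrative, well-separated parameters
print(agreement(vanilla_spectral(A), x))
```

No trimming or cleaning step appears here; the point of the theory is that none is needed for exact recovery above the threshold.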
We remark on a subtlety of our result: our central analysis is a good control of $\|u_2 - Au_2^*/\lambda_2^*\|_\infty$, not necessarily of $\|u_2 - u_2^*\|_\infty$. For example, in SBM, an inequality such as $\|u_2 - u_2^*\|_\infty < \|u_2^*\|_\infty$ is not true in general. In Figure 1, the second boxplot shows that $\|u_2 - u_2^*\|_\infty$ may well exceed $\|u_2^*\|_\infty$ even if $\sqrt a - \sqrt b > \sqrt 2$. This suggests that the distributions of the coordinates of the two clusters, though well separated, have longer tails on one side than on the other. Thus, it is in vain to seek a good bound on $\|u_2 - u_2^*\|_\infty$ (see Theorem ? and the remarks following it). Instead of bounding $\|u_2 - u_2^*\|_\infty$, one should resort to the central quantity $\|u_2 - Au_2^*/\lambda_2^*\|_\infty$ for optimal results. This may partly explain why the conjecture was not settled until now.
The vector $Au_2^*$ also plays a pivotal role in the information-theoretic lower bound established in [?]. In SBM, for example, it is necessary that $\min_{i\in[n]} x_i(Au_2^*)_i > 0$ holds with at least a nonvanishing probability. Otherwise, by symmetry and the union bound, with probability bounded away from zero we can find some $i \in J$ and $j \in J^c$ such that $x_i(Au_2^*)_i < 0$ and $x_j(Au_2^*)_j < 0$, i.e., each of the two vertices has more neighbors in the other group than in its own. If that occurs, an exchange of the group assignments of $i$ and $j$ leads to an increase of the likelihood, and thus the MLE fails for exact recovery. It is well known that with a uniform prior on group assignments, the MLE is equivalent to the maximum a posteriori estimator, which is optimal for exact recovery. Therefore, to achieve exact recovery, it is necessary to ensure that no such local refinement is possible. This forms the core argument of the information-theoretic lower bound. The above analysis suggests an interesting property of the eigenvector estimator $\operatorname{sgn}(u_2)$: it achieves exact recovery whenever the MLE does.
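This is because the success of the MLE hinges on $\min_{i\in[n]} x_i(Au_2^*)_i > 0$, which is essentially the same condition that gives $Au_2^*/\lambda_2^*$, and hence $u_2$, the correct signs. See Section 3.2 for details. This phenomenon holds for two of the applications considered in this paper.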
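The event in this argument is easy to check on a given graph. The NumPy sketch below (the helper name is hypothetical) computes the margins $x_i(Ax)_i$, i.e., own-group minus other-group neighbor counts, whose signs agree with those of $x_i(Au_2^*)_i$ because $u_2^* = x/\sqrt n$. The toy graph, two triangles joined by a single edge, is purely illustrative.

```python
import numpy as np


def every_vertex_prefers_own_group(A, x):
    """Check min_i x_i (A x)_i > 0: each vertex has more neighbors in its
    own group than in the other group. Returns (flag, margins)."""
    x = np.asarray(x)
    margins = x * (np.asarray(A, dtype=float) @ x)   # own minus other neighbor counts
    return bool(np.min(margins) > 0), margins


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
x = np.array([1, 1, 1, -1, -1, -1])
ok, margins = every_vertex_prefers_own_group(A, x)
print(ok, margins)   # True [2. 2. 1. 1. 2. 2.]
```

Mislabeling vertex 2 makes its margin negative, which is exactly the kind of locally improvable configuration the lower-bound argument exploits.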
1.4 An iterative perspective
In the SBM, a key observation of the above heuristics is that $\|u_2 - Au_2^*/\lambda_2^*\|_\infty$ is small. Here we give some intuitions from the iterative (or algorithmic) perspective. For simplicity, we will focus on the top eigenvector $\bar u$ of the centered adjacency matrix $\bar A$. We denote the top eigenvalue of $\bar A$ by $\bar\lambda$.
It is well known that the top eigenvector of a symmetric matrix can be computed via the power method, namely, by computing iteratively $u^{t+1} = \bar Au^t/\|\bar Au^t\|_2$. Suppose we initialize the iterations by $u^0 = u_2^*$ (recall that $u_2^*$ is also the top eigenvector of the centered expectation $\frac{p-q}{2}xx^\top$). Note that since it is initialized with an unknown vector, this power method is not a real algorithm, though it helps us gain theoretical insights.
The first iterate after initialization is $u^1 = \bar Au_2^*/\|\bar Au_2^*\|_2$. By standard concentration inequalities, we can show that $\|\bar Au_2^*\|_2$ concentrates around $\lambda_2^* = (p-q)n/2$, the top eigenvalue of the centered expectation. Therefore, $u^1$ is approximately $\bar Au_2^*/\lambda_2^*$, which equals $Au_2^*/\lambda_2^*$ because $\mathbf{1}_n^\top u_2^* = 0$; this is exactly our first-order approximation. Under a suitable eigen-gap condition, the iterates typically converge to $\bar u$ at a linear rate, namely, with errors decaying geometrically. If those errors decay sufficiently fast, it is reasonable to expect that the first iterate is already good enough.
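The oracle power iteration can be simulated directly. In the NumPy sketch below, the parameters are illustrative, $\bar d$ is taken to be $\frac1n\sum_{i,j}A_{ij}$, and the initialization $u^0 = u_2^*$ uses the unknown labels, so, as noted above, this is a thought experiment rather than an algorithm. The code checks that centering does not change $\bar Au_2^*$ and compares the first and last iterates with $\bar u$. Because the power method converges to the eigenvalue of largest magnitude, the sketch relies on the top eigenvalue of $\bar A$ dominating the bulk, which holds comfortably for these parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 300, 0.6, 0.1                          # illustrative parameters
x = np.repeat([1.0, -1.0], n // 2)
probs = np.where(np.equal.outer(x, x), p, q)
upper = np.triu(rng.random((n, n)) < probs)
A = (upper | upper.T).astype(float)

d_bar = A.sum() / n                              # average degree
A_bar = A - (d_bar / n) * np.ones((n, n))        # centered adjacency matrix
u_star = x / np.sqrt(n)                          # oracle initialization u^0 = u_2*

# Because 1^T u_star = 0, centering leaves the product with u_star unchanged.
assert np.allclose(A_bar @ u_star, A @ u_star)

u = u_star.copy()
iterates = []
for _ in range(50):                              # power iterations
    v = A_bar @ u
    u = v / np.linalg.norm(v)
    iterates.append(u)

_, vecs = np.linalg.eigh(A_bar)
u_bar = vecs[:, -1]                              # top eigenvector of A_bar
print(abs(iterates[0] @ u_bar), abs(iterates[-1] @ u_bar))
```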
This iterative perspective is explored in recent works [?], where the latter studies both the eigenvector estimator and the MLE of a nonconvex problem. In this paper, we will not show any proof with iterations or inductions, though it is likely possible to do so. Instead, we resort to Davis-Kahan's theorem combined with a "leave-one-out" type of technique. Nevertheless, we believe the iterative perspective is helpful for many other nonconvex problems where a counterpart of Davis-Kahan's theorem is not available.
The study of eigenvector perturbation has a long history, with early works dating back to Rayleigh and Schrödinger [?], in which asymptotic expansions of perturbations are developed. In numerical analysis, notable works include Davis-Kahan's theorem [?], in which a perturbation bound for general eigenspaces under unitary-invariant norms is given in terms of the magnitude of the perturbation matrix and the eigen-gap. This result was later extended to general rectangular matrices in [?]. A comprehensive investigation of this topic can be found in the book [?]. However, norms that depend on the choice of basis, such as the $\ell_\infty$ norm, are not addressed there, but are of great interest in statistics.
There are several recent papers related to the study of entrywise perturbation. In [?], $\ell_\infty$ eigenvector perturbation bounds are proved, and their results are improved by [?], in which the authors focus on $\ell_{2\to\infty}$ norm bounds for eigenspaces. In [?], the perturbation of eigenvectors is expanded into an infinite series, from which an $\ell_\infty$ perturbation bound is developed. These results are deterministic by nature, and thus yield suboptimal bounds under challenging stochastic regimes with small signal-to-noise ratio. In [?], bilinear forms of principal components (or singular vectors) are studied, yielding a sharp bound on the $\ell_\infty$ error, which was later extended to tensors [?]. In [?], entrywise behaviors of eigenvectors are studied, and their connection with Rayleigh-Schrödinger perturbation theory is explored. In [?], in a related but slightly more complicated problem called "phase synchronization", the entrywise behaviors of both the eigenvector estimator and the MLE are analyzed under a near-optimal regime. In [?], similar ideas are used to derive the optimality of both the spectral method and the MLE in the top-$K$ ranking problem.
There is a rich literature on the three applications to which our perturbation theorems are applied. The synchronization problems [?] aim at estimating unknown signals (usually group elements) from their noisy pairwise measurements, and have recently attracted much attention in the optimization and statistics communities [?]. They are very relevant models for cryo-EM, robotics [?] and more.
The stochastic block model has been studied extensively in the past decades, with renewed activity in the recent years [?], see [?] for further references, and in particular [?], [?], [?], [?], [?] and [?], which are closest to this paper in terms of regimes and algorithms. The matrix completion problems [?] have seen great impacts in many areas, and new insights and ideas keep flourishing in recent works [?]. These lists are only a small fraction of the literature and are far from complete.
We organize our paper as follows: we present our main theorems on eigenvector and eigenspace perturbation in Section 2, which are rigorous statements of the intuitions introduced in Section 1. In Section 3, we apply the theorems to three problems: $\mathbb{Z}_2$-synchronization, SBM, and matrix completion from noisy entries. In Section 4, we present simulation results to verify our theories. The ideas of the proofs are outlined in Section 5, and technical details are deferred to the appendix. Finally, we conclude and discuss future work in Section 6.
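We use the notation $[n]$ to refer to $\{1, 2, \dots, n\}$ for $n \in \mathbb{Z}_+$, and let $\mathbb{R}_+ = [0, \infty)$. For any real numbers $a, b$, we denote $a \wedge b = \min\{a, b\}$ and $a \vee b = \max\{a, b\}$. For nonnegative $a_n$ and $b_n$ that depend on $n$ (e.g., problem size), we write $a_n \lesssim b_n$ to mean $a_n \le Cb_n$ for some constant $C > 0$. The notation $a_n \asymp b_n$ is similar, hiding two constants in upper and lower bounds. For any vector $x$, we define $\|x\|_2 = (\sum_i x_i^2)^{1/2}$ and $\|x\|_\infty = \max_i |x_i|$. For any matrix $M$, $M_{i\cdot}$ refers to its $i$-th row, which is a row vector, and $M_{\cdot j}$ refers to its $j$-th column, which is a column vector. The matrix spectral norm is $\|M\|_2 = \max_{\|v\|_2 = 1}\|Mv\|_2$, the matrix max-norm is $\|M\|_{\max} = \max_{i,j}|M_{ij}|$, and the matrix $2\to\infty$ norm is $\|M\|_{2\to\infty} = \max_{\|v\|_2 = 1}\|Mv\|_\infty = \max_i \|M_{i\cdot}\|_2$. The set of $n\times r$ matrices with orthonormal columns is denoted by $\mathcal{O}^{n\times r}$.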
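These matrix norms are straightforward to compute. A small NumPy sketch (the helper names are ours, for illustration) uses the row-wise characterization $\|M\|_{2\to\infty} = \max_i\|M_{i\cdot}\|_2$ and the standard chain $\|M\|_{\max} \le \|M\|_{2\to\infty} \le \|M\|_2$.

```python
import numpy as np


def two_to_inf_norm(M):
    """||M||_{2->inf}: the largest Euclidean norm of a row of M."""
    return np.max(np.linalg.norm(M, axis=1))


def max_norm(M):
    """||M||_max: the largest absolute entry of M."""
    return np.max(np.abs(M))


M = np.array([[3.0, 4.0], [0.0, 1.0]])
print(two_to_inf_norm(M))       # 5.0
print(max_norm(M))              # 4.0
print(np.linalg.norm(M, 2))     # spectral norm (largest singular value), about 5.06
```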
2.1 Random matrix ensembles
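Suppose $A \in \mathbb{R}^{n\times n}$ is a symmetric random matrix, and let $A^* = \mathbb{E}A$. Denote the eigenvalues of $A$ by $\lambda_1 \ge \cdots \ge \lambda_n$, and their associated eigenvectors by $u_1, \dots, u_n$. Analogously for $A^*$, the eigenvalues and eigenvectors are denoted by $\lambda_1^* \ge \cdots \ge \lambda_n^*$ and $u_1^*, \dots, u_n^*$. For convenience, we also define $\lambda_0 = \lambda_0^* = +\infty$ and $\lambda_{n+1} = \lambda_{n+1}^* = -\infty$. Note that we allow eigenvalues to be identical, so some eigenvectors may be defined only up to rotations.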
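Suppose $s$ and $r$ are two integers satisfying $1 \le r \le n$ and $0 \le s \le n - r$. Let $U = (u_{s+1}, \dots, u_{s+r}) \in \mathbb{R}^{n\times r}$, $U^* = (u_{s+1}^*, \dots, u_{s+r}^*) \in \mathbb{R}^{n\times r}$ and $\Lambda^* = \operatorname{diag}(\lambda_{s+1}^*, \dots, \lambda_{s+r}^*)$. We are interested in the eigenspace spanned by $U$, especially an approximate form of $U$. To this end, we assume an eigen-gap $\Delta^*$ that separates $\{\lambda_{s+j}^*\}_{j=1}^r$ from $0$ and the other eigenvalues (see Figure 3):
$$\Delta^* = (\lambda_s^* - \lambda_{s+1}^*) \wedge (\lambda_{s+r}^* - \lambda_{s+r+1}^*) \wedge \min_{j\in[r]}|\lambda_{s+j}^*|.$$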
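We define $\kappa = \max_{j\in[r]}|\lambda_{s+j}^*|/\Delta^*$, which is always bounded from below by 1. In our applications, $\kappa$ is usually bounded from above by a constant, i.e., $\Delta^*$ is comparable to $\max_{j\in[r]}|\lambda_{s+j}^*|$ in terms of magnitude.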
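For a concrete spectrum, $\Delta^*$ and $\kappa$ can be computed mechanically. The Python sketch below assumes the conventions $\lambda_0^* = +\infty$ and $\lambda_{n+1}^* = -\infty$, takes $\Delta^*$ as the minimum of the gap above the block, the gap below it, and the distance of the block from $0$, and sets $\kappa = \max_j|\lambda_{s+j}^*|/\Delta^*$; the function name `gap_and_kappa` and the example eigenvalues are hypothetical.

```python
import math


def gap_and_kappa(eigs, s, r):
    """Delta* and kappa for the block lambda_{s+1}*, ..., lambda_{s+r}*.

    eigs: all n eigenvalues of A*, sorted in decreasing order.
    Conventions assumed here: lambda_0* = +inf and lambda_{n+1}* = -inf.
    """
    lam = [math.inf, *eigs, -math.inf]          # lam[i] = lambda_i*, 1-indexed
    block = lam[s + 1 : s + r + 1]
    delta = min(lam[s] - lam[s + 1],            # separation from the eigenvalue above
                lam[s + r] - lam[s + r + 1],    # separation from the eigenvalue below
                min(abs(v) for v in block))     # separation from 0
    kappa = max(abs(v) for v in block) / delta
    return delta, kappa


print(gap_and_kappa([10.0, 6.0, 5.0, 0.0, -1.0, -3.0], s=1, r=2))   # (4.0, 1.5)
```

Since $\Delta^* \le \min_j|\lambda_{s+j}^*|$ by construction, the returned $\kappa$ is always at least 1, matching the remark above.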
The concentration property is characterized by a parameter $\gamma \ge 0$ and a function $\varphi(x)$. Roughly speaking, $\gamma$ is related to the noise level, and typically vanishes as $n$ tends to $\infty$. The function $\varphi$ is chosen according to the distribution of the noise, and is typically bounded by a constant for $x \in [0, 1]$. Particularly, in our applications, we use a linear $\varphi$ for Gaussian noise and a nonlinear $\varphi$ for Bernoulli noise (see Figure 2). In addition, we will also make a mild structural (incoherence) assumption: $\|A^*\|_{2\to\infty} \le \gamma\Delta^*$. In many applications involving low-rank structure, the eigenvalues of interest (and thus $\Delta^*$) typically scale with $n$, whereas $\|A^*\|_{2\to\infty}$ scales with $\sqrt n$.
We make the following assumptions, followed by a few remarks.
(Row- and column-wise independence) For any $m \in [n]$, the entries in the $m$th row and column of $A$ are independent of the other entries: namely, $\{A_{ij}: i = m \text{ or } j = m\}$ are independent of $\{A_{ij}: i \neq m, j \neq m\}$.
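(Spectral norm concentration) $\gamma$ is sufficiently small, and for some $\delta_0 \in (0, 1)$, $\mathbb{P}(\|A - A^*\|_2 \le \gamma\Delta^*) \ge 1 - \delta_0$.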
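(Row concentration) Suppose $\varphi(x)$ is continuous and non-decreasing in $\mathbb{R}_+$ with $\varphi(0) = 0$, $\varphi(x)/x$ is non-increasing in $\mathbb{R}_+$, and $\delta_1 \in (0, 1)$. For any $m \in [n]$ and $W \in \mathbb{R}^{n\times r}$,
$$\mathbb{P}\Big(\|(A - A^*)_{m\cdot}W\|_2 \le \Delta^*\|W\|_{2\to\infty}\,\varphi\Big(\frac{\|W\|_F}{\sqrt n\,\|W\|_{2\to\infty}}\Big)\Big) \ge 1 - \frac{\delta_1}{n}.$$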
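Here are some remarks about the above assumptions. The incoherence assumption $\|A^*\|_{2\to\infty} \le \gamma\Delta^*$ requires that no row of $A^*$ is dominant. To relate it to the usual concept of incoherence in [?] and [?], we consider the case where $A^*$ has rank $r$, so that $A^* = U^*\Lambda^*U^{*\top}$, and let $\mu = \frac nr\|U^*\|_{2\to\infty}^2$. Note that
$$\|A^*\|_{2\to\infty} = \|U^*\Lambda^*U^{*\top}\|_{2\to\infty} \le \|U^*\|_{2\to\infty}\|\Lambda^*\|_2 = \sqrt{\mu r/n}\,\|\Lambda^*\|_2,$$
and $\|\Lambda^*\|_2 = \kappa\Delta^*$. Then the incoherence assumption is satisfied if $\kappa\sqrt{\mu r/n} \le \gamma$, which is very mild.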
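The inequality $\|A^*\|_{2\to\infty} \le \sqrt{\mu r/n}\,\|\Lambda^*\|_2$ can be checked numerically. The NumPy sketch below assumes the usual incoherence parameter $\mu = \frac nr\|U^*\|_{2\to\infty}^2$ and uses a random orthonormal $U^*$ with illustrative eigenvalues $8$ and $5$; these choices are hypothetical, not taken from the paper.

```python
import numpy as np


def two_to_inf(M):
    """||M||_{2->inf}, the largest Euclidean row norm."""
    return np.max(np.linalg.norm(M, axis=1))


rng = np.random.default_rng(3)
n, r = 200, 2
U_star, _ = np.linalg.qr(rng.standard_normal((n, r)))   # n x r, orthonormal columns
Lam_star = np.diag([8.0, 5.0])                           # illustrative eigenvalues
A_star = U_star @ Lam_star @ U_star.T

mu = (n / r) * two_to_inf(U_star) ** 2       # incoherence parameter (always >= 1)
lhs = two_to_inf(A_star)
rhs = np.sqrt(mu * r / n) * np.linalg.norm(Lam_star, 2)
print(mu, lhs, rhs)                           # expect lhs <= rhs
```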
The row- and column-wise independence assumption is mild, and it encompasses common i.i.d. noise assumptions.
Roughly speaking, $\Delta^*$ can be interpreted as the signal strength. The spectral norm concentration assumption requires that the noise matrix $A - A^*$ is dominated by the signal strength under the spectral norm, since we need $\gamma$ (whose inverse is related to the signal-to-noise ratio) to be sufficiently small. For instance, in $\mathbb{Z}_2$-synchronization (see Section 3.1), $\Delta^* = n$, and the entries of $A - A^*$ above the diagonal are i.i.d. $N(0, \sigma^2)$. Since $\|A - A^*\|_2 \lesssim \sigma\sqrt n$ with high probability by standard concentration results, we need to require $\sigma = o(\sqrt n)$.
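In the row concentration assumption, we choose $\varphi$ according to the noise distribution, as discussed in Section 1.2. This row concentration is a generalization of the form therein. For instance, in $\mathbb{Z}_2$-synchronization, the noise is Gaussian, and we will choose a linear function $\varphi(x) = cx$. Since $\Delta^* = n$ in this case, for $w \in \mathbb{R}^n$ the bound reads $|(A - A^*)_{m\cdot}w| \le c\sqrt n\,\|w\|_2$, while the Gaussian tail bound gives $|(A - A^*)_{m\cdot}w| \lesssim \sigma\sqrt{\log n}\,\|w\|_2$ with probability $1 - o(n^{-1})$. Hence it suffices to take $c \asymp \sigma\sqrt{\log n/n}$, which keeps $\varphi(1) = c$ bounded as long as $\sigma \lesssim \sqrt{n/\log n}$; this is the regime where exact recovery takes place. Moreover, when $\|w\|_\infty = 1$ but many entries of $w$ are much smaller than $1$ in magnitude, intuitively there is less fluctuation and better concentration. Indeed, the row concentration assumption stipulates a tighter concentration bound through the factor $\varphi(\|w\|_2/(\sqrt n\|w\|_\infty))$, where $\|w\|_2/\sqrt n$ is typically much smaller than $\|w\|_\infty$ due to non-uniformity of the weights. This delicate concentration bound turns out to be crucial in the analysis of SBM.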
2.2 Entrywise perturbation of general eigenspaces
In this section, we generalize Theorem ? from individual eigenvectors to eigenspaces under milder conditions that are characterized by additional parameters. Note that neither $U$ nor $U^*$ is uniquely defined: they can only be determined up to a rotation if some eigenvalues are identical. For this reason, our result has to involve an orthogonal matrix. Beyond asserting that our result holds up to a suitable rotation, we give an explicit form of such an orthogonal matrix.
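Let $H = U^\top U^*$, and let its singular value decomposition be $H = \bar U\bar\Sigma\bar V^\top$, where $\bar U, \bar V \in \mathbb{R}^{r\times r}$ are orthonormal matrices, and $\bar\Sigma$ is a diagonal matrix. Define an orthonormal matrix $\operatorname{sgn}(H) \in \mathbb{R}^{r\times r}$ as
$$\operatorname{sgn}(H) = \bar U\bar V^\top.$$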
This orthogonal matrix is called the matrix sign function in [?]. Now we are able to extend the results in Section 1.2 to general eigenspaces.
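The matrix sign function is a one-line computation from the SVD. The NumPy sketch below (the helper name `matrix_sign` is ours) checks one sanity property: if $U = U^*Q$ spans the same subspace as $U^*$ in a rotated basis $Q$, then $H = U^\top U^* = Q^\top$ and $U\operatorname{sgn}(H)$ recovers $U^*$ exactly, which is why this rotation is the natural alignment in the bounds below. The dimensions and random seed are illustrative.

```python
import numpy as np


def matrix_sign(H):
    """sgn(H) = Ubar @ Vbar^T, from the SVD H = Ubar @ diag(sigma) @ Vbar^T."""
    Ubar, _, Vbar_t = np.linalg.svd(H)
    return Ubar @ Vbar_t


rng = np.random.default_rng(4)
n, r = 50, 3
U_star, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal columns
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))        # an arbitrary rotation
U = U_star @ Q                                           # same subspace, new basis

S = matrix_sign(U.T @ U_star)                            # here H = Q^T
print(np.allclose(S @ S.T, np.eye(r)), np.allclose(U @ S, U_star))
```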
The third inequality is derived by simply writing $U\operatorname{sgn}(H) - U^*$ as a sum of the first-order error $AU^*(\Lambda^*)^{-1} - U^* = (A - A^*)U^*(\Lambda^*)^{-1}$ and the higher-order error $U\operatorname{sgn}(H) - AU^*(\Lambda^*)^{-1}$, and bounding the former by the row concentration assumption. It will be useful for the noisy matrix completion problem. It is worth pointing out that Theorem ? is applicable to any eigenvector of $A$, not only the leading one. This is particularly useful when applied to the stochastic block model in Section 3.2, since we need to analyze the second eigenvector there. Besides, we do not need $A^*$ to have low rank, although the examples to be presented have such structure. For low-rank $A^*$, estimation errors of all the eigenvectors can be well controlled by the following corollary of Theorem ?.
Corollary ? directly follows from Theorem ? and inequality (Equation 5). To understand the inequalities in Corollary ?, let us consider a special case: $A^*$ is a rank-one matrix with a positive leading (top) eigenvalue $\lambda^*$ and the leading eigenvector $u^*$, i.e., $A^* = \lambda^*u^*u^{*\top}$. Equivalently, we set $s = 0$ and $r = 1$. With this structure, we have $\Delta^* = \lambda^*$ and $\kappa = 1$. With this simplification, the random matrix $A$ is usually called a spiked Wigner matrix in statistics and random matrix theory.
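Writing $u$ for the leading eigenvector of $A$, our first two inequalities are then simplified as
$$\|u\|_\infty \lesssim (1 + \varphi(1))\,\|u^*\|_\infty, \qquad \min_{s\in\{\pm1\}}\Big\|su - \frac{Au^*}{\lambda^*}\Big\|_\infty \lesssim (1 + \varphi(1))\,(\gamma + \varphi(\gamma))\,\|u^*\|_\infty,$$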
and the incoherence assumption $\|A^*\|_{2\to\infty} \le \gamma\Delta^*$ is equivalent to $\|u^*\|_\infty \le \gamma$. In words, the assumption requires that the eigenvector $u^*$ is not aligned with the canonical basis $\{e_1, \dots, e_n\}$. This indeed agrees with the usual incoherence condition as in [?] and [?]. Note that Theorem ? is a special case of Corollary ?, and hence of Theorem ?.
In many applications, $\varphi(1)$ is bounded by a constant, and $\gamma + \varphi(\gamma)$ vanishes as $n$ tends to infinity. Under such conditions, the first inequality implies that the magnitude of the perturbed eigenvector $u$ is bounded by that of the true eigenvector $u^*$ in the $\ell_\infty$ sense. Furthermore, the second inequality provides a finer approximation result: the difference between $u$ (up to sign) and $Au^*/\lambda^*$ is much smaller than $\|u^*\|_\infty$. Therefore, it is possible to study $u$ via $Au^*/\lambda^*$, which usually makes the analysis much easier.
3.1 $\mathbb{Z}_2$-synchronization and spiked Wigner model
The problem of $\mathbb{Z}_2$-synchronization is to recover $n$ unknown labels $\pm1$ from noisy pairwise measurements. It has served as the prototype of more general $\mathcal{G}$-synchronization problems over a group $\mathcal{G}$. Important cases include phase synchronization and $SO(3)$-synchronization, in which one wishes to estimate the phases of signals or the rotations of cameras, molecules, etc. It is relevant to many problems, including time synchronization of distributed networks [?], calibration of cameras [?], and cryo-EM [?].
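Consider an unknown signal $x \in \{\pm1\}^n$, where each entry $x_i$ only takes the value $1$ or $-1$. Suppose we have measurements (or data) of the form $Y_{ij} = x_ix_j + \sigma W_{ij}$, where $i \le j$, and the $W_{ij}$ are i.i.d. $N(0, 1)$. We can define $W_{ji} = W_{ij}$ and $Y_{ji} = Y_{ij}$ for simplicity, and write our model in matrix form as follows:
$$Y = xx^\top + \sigma W.$$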
This is sometimes called the Gaussian $\mathbb{Z}_2$-synchronization problem, in contrast to the $\mathbb{Z}_2$-synchronization problem with $\mathbb{Z}_2$-noise, also called the censored block model [?]. This problem can be further generalized: each entry $x_i$ is a unit-modulus complex number $e^{\mathrm{i}\theta_i}$, if the goal is to estimate the unknown angles from pairwise measurements; or each entry $x_i$ is an orthogonal matrix from $SO(3)$, if the goal is to estimate the unknown orientations of molecules, cameras, etc. Here, we focus on the simplest case, that is, each $x_i$ is $\pm1$.
Note that in $Y = xx^\top + \sigma W$, both $xx^\top$ and $W$ are symmetric matrices in $\mathbb{R}^{n\times n}$, and the data matrix $Y$ has a noisy rank-one decomposition. This falls into the spiked Wigner model. The quality of an estimator $\hat x$ is usually gauged either by its correlation with $x$, or by the proportion of labels $x_i$ it correctly recovers. It has been shown that the information-theoretic threshold for a nontrivial correlation is $\sigma = \sqrt n$ [?], and the threshold for exact recovery (i.e., $\hat x = \pm x$ with probability tending to $1$) is $\sigma = \sqrt{n/(2\log n)}$ [?].
When $\sigma \le \sqrt{n/((2+\varepsilon)\log n)}$ ($\varepsilon > 0$ is any constant), it is proved by [?] that semidefinite relaxation finds the maximum likelihood estimator and achieves exact recovery. In this paper, we show that a very simple method, both conceptually and computationally, also achieves exact recovery. This method is outlined as follows:
Compute the leading eigenvector of $Y$, which is denoted by $u$.
Take the estimate $\hat x = \operatorname{sgn}(u)$.
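The two steps above can be sketched in NumPy as follows. This is an illustration, not the authors' implementation: the noise matrix is built with a zero diagonal, exact zeros of $u$ are sent to $+1$, and $n = 500$, $\sigma = 2$ are illustrative values chosen well below the exact-recovery threshold $\sqrt{n/(2\log n)} \approx 6.3$.

```python
import numpy as np


def z2_sync_spectral(Y):
    """Estimate x by the signs of the leading eigenvector of Y."""
    _, vecs = np.linalg.eigh(Y)          # eigenvalues in ascending order
    u = vecs[:, -1]                      # leading eigenvector
    return np.where(u >= 0, 1, -1)       # exact zeros sent to +1


rng = np.random.default_rng(5)
n, sigma = 500, 2.0                      # sigma well below sqrt(n / (2 log n)) ~ 6.3
x = rng.choice([-1, 1], size=n)
W = np.triu(rng.standard_normal((n, n)), 1)
W = W + W.T                              # symmetric, zero diagonal, N(0,1) off-diagonal
Y = np.outer(x, x) + sigma * W

x_hat = z2_sync_spectral(Y)
print(max(np.mean(x_hat == x), np.mean(x_hat == -x)))   # fraction recovered up to sign
```

The global sign ambiguity is intrinsic: $x$ and $-x$ produce the same data, so recovery is always measured up to sign.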
Our next theorem asserts that this eigenvector-based method succeeds in finding $x$ asymptotically under $\sigma \le \sqrt{n/((2+\varepsilon)\log n)}$. Thus, under any regime where the MLE achieves exact recovery, our eigenvector estimator equals the MLE with high probability. This phenomenon also holds for other examples that we have examined, including stochastic block models.
Note that our eigenvector approach does not utilize the structural constraints $x_i \in \{\pm1\}$, whereas such constraints are encoded in the SDP formulation [?]. A natural question is how both methods compare at an increased noise level $\sigma$. A seminal work by [?] complements our story: the authors showed, via nonrigorous statistical mechanics arguments, that when $\sigma$ is on the order of $\sqrt n$, the SDP-based approach outperforms the eigenvector approach. Nevertheless, with a slightly larger signal strength, the SDP approach has no such advantage.
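Note also that without exploiting the nature of the signal, general results for spiked Wigner models [?] imply that, with high probability, the leading eigenvector $u$ has a nontrivial correlation with $x$, i.e., $|\langle u, x\rangle|/\sqrt n$ is bounded away from zero, when $\sigma \le (1-\varepsilon)\sqrt n$ for any small constant $\varepsilon > 0$. This is proved to be tight in [?], i.e., it is impossible to obtain a nontrivial reconstruction from any estimator if $\sigma \ge (1+\varepsilon)\sqrt n$.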
3.2 Stochastic Block Model
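As briefly discussed in Section 1, we focus on the symmetric SBM with two equally-sized groups, in which $A_{ij} \sim \mathrm{Bernoulli}(p)$ if $i$ and $j$ are in the same group and $A_{ij} \sim \mathrm{Bernoulli}(q)$ otherwise.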
The community detection problem aims at finding the index set $J$ given only one realization of $A$. Let $x_i = 1$ if $i \in J$ and $x_i = -1$ otherwise. Equivalently, we want to find an estimator $\hat x$ for the unknown labels $x$. Intuitively, the task is more difficult when $q$ is close to $p$, and when the magnitude of $p$ is small. It is impossible, for instance, to produce any meaningful estimator when $p = q$. The task is also impossible when $p$ and $q$ are so small, for example $p, q = o(n^{-2})$, that $A$ is a zero matrix with probability tending to $1$.
As already discussed in Section 1, under the regime $p = a\log n/n$ and $q = b\log n/n$, where $a$ and $b$ are constants independent of $n$, it is information-theoretically impossible to achieve exact recovery (the estimate equals $x$ or $-x$ with probability tending to $1$) when $\sqrt a - \sqrt b < \sqrt 2$. In contrast, when $\sqrt a - \sqrt b > \sqrt 2$, the goal is efficiently achievable. Further, it is known that SDPs succeed down to the threshold. Under the sparser regime $p = a/n$ and $q = b/n$, it is impossible to obtain a nontrivial correlation with $x$ when $(a-b)^2 \le 2(a+b)$.
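Here we focus on the regime where $p = a\log n/n$, $q = b\log n/n$, and $a > b > 0$ are constants. Note that $A^* = \mathbb{E}A = \frac{p+q}{2}\mathbf{1}_n\mathbf{1}_n^\top + \frac{p-q}{2}xx^\top$ is a rank-2 matrix. Its eigenvalues are $\lambda_1^* = (p+q)n/2$, $\lambda_2^* = (p-q)n/2$ and $\lambda_3^* = \cdots = \lambda_n^* = 0$, and the top two eigenvectors are $u_1^* = \mathbf{1}_n/\sqrt n$ and $u_2^* = x/\sqrt n$. Note that $u_2^*$ is aligned with $x$. As $u_2^*$ perfectly reveals the desired partition, the following vanilla spectral method is a natural candidate: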
Compute $u_2$, the eigenvector of $A$ corresponding to its second largest eigenvalue $\lambda_2$.
Take the estimate $\hat x = \operatorname{sgn}(u_2)$.
Various papers have investigated this algorithm and its variants, such as [?], [?], [?], [?], [?], [?], [?], [?], [?], among others. However, it is not known whether the simple algorithm above achieves exact recovery down to the information-theoretic threshold, nor whether it attains the optimal misclassification rate studied in [?] below the threshold. An important reason why this question has remained unsettled is that the entrywise behavior of $u_2$ is not fully understood. In particular, if one focuses on obtaining bounds on $\|u_2 - u_2^*\|_\infty$, these can exceed the value of $\|u_2^*\|_\infty$ (see Theorem ?), suggesting that the algorithm may fail by rounding to the incorrect sign. This is not necessarily the case, as errors could take place with larger magnitude on the 'good side' of the signal range, but this cannot be concluded by bounding only $\|u_2 - u_2^*\|_\infty$. To avoid suboptimal results in the analysis, two-round algorithms are often used in the literature [?], with a cleaning step that improves an approximate solution locally, and often also with a preliminary step to trim high-degree nodes [?]. Later, [?] and [?] showed that the exact recovery threshold can be achieved with such variants. With the following result, both such steps can be avoided: the vanilla spectral algorithm achieves the threshold and the minimax lower bound in one shot (Corollary ?).
This approximation bound is a consequence of Theorem ?. It holds for any constants and , and does not depend on the gap . Since each entry of is either or , the approximation error is negligible. By definition, ; however, Theorem ? shows that, up to sign, is approximately , which is linear in the random matrix . The analysis of our spectral method thus reduces to the study of the entries of , which are simply weighted sums of Bernoulli random variables. The next lemma is proved in [?] in a more general form; we provide a short proof in the appendix.
The lemma above shows that plays a central role in exact recovery. Moreover, when the gap is not large enough for exact recovery, our spectral method still achieves the optimal misclassification rate in the minimax sense [?]. The misclassification rate is defined as
Therefore, the first part of the corollary implies that, in the regime where the MLE achieves exact recovery, our eigenvector estimator coincides with the MLE with high probability. This proves Corollary ? in the introduction. Moreover, the second part asserts that in the more challenging regime where exact recovery is impossible, the eigenvector estimator attains the optimal misclassification rate.
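The defining formula for the misclassification rate did not survive extraction. The sketch below assumes a common form from the minimax community-detection literature: the fraction of misclassified nodes, minimized over a global swap of the two labels. It is an illustration of that assumed definition, and `misclassification_rate` is a hypothetical helper name.

```python
import numpy as np


def misclassification_rate(z_hat, z):
    """Fraction of misclassified nodes, minimized over a global sign flip.

    Labels are +1/-1. The partition is only identifiable up to swapping the
    two labels, hence the minimum over s in {+1, -1}.
    """
    z_hat = np.asarray(z_hat)
    z = np.asarray(z)
    return min(np.mean(z_hat != s * z) for s in (1, -1))


z = np.array([1, 1, 1, -1, -1, -1])
print(misclassification_rate(z, z))                        # perfect recovery
print(misclassification_rate(-z, z))                       # global flip is not an error
print(misclassification_rate([1, -1, 1, -1, -1, -1], z))   # one error out of six
```

Under this definition, exact recovery corresponds to a rate of zero.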
As mentioned earlier, our sharp result for the eigenvector estimator stems from a careful analysis of the linearized version of , and of the approximation error . This is superior to a direct analysis of the perturbation . The next theorem implies that can be larger than , which means such a bound cannot guarantee that preserves the signs of the entries of .
Now consider the case in Figure 1, where and . On the one hand, exact recovery is achievable since . On the other hand, taking gives and , so Theorem ? forces
In words, with high probability the fluctuation is larger than the signal strength. Hence, bounding the eigenvector perturbation alone cannot yield a sharp analysis of the spectral method for exact recovery.
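To make the contrast between the direct perturbation and the linearization concrete, the following sketch computes both on one simulated graph. It rests on several assumptions of our own. First, following the first-order approximation described in the abstract, we take the linearized eigenvector to be A u2* / λ2*, with u2* = z/√n the population eigenvector and λ2* its eigenvalue; this is our reading of the stripped notation. Second, the diagonal of A is zero, so E[A] = P − pI. Third, the parameter values are hypothetical and chosen so that the signal is strong. All errors are multiplied by √n, so that the signal magnitude |u2*[i]| equals 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 400, 0.5, 0.05  # hypothetical values with a strong signal

z = np.where(np.arange(n) < n // 2, 1, -1)
P = np.where(np.outer(z, z) > 0, p, q)
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)  # zero diagonal, so E[A] = P - p*I

# z / sqrt(n) is an exact eigenvector of E[A] with eigenvalue (p - q)n/2 - p.
A_star = P - p * np.eye(n)
u2_star = z / np.sqrt(n)
lam2_star = (p - q) * n / 2 - p
assert np.allclose(A_star @ u2_star, lam2_star * u2_star)

# Empirical eigenvector for the second largest eigenvalue, sign-aligned.
u2 = np.linalg.eigh(A)[1][:, -2]
if u2 @ u2_star < 0:
    u2 = -u2

linear = A @ u2_star / lam2_star  # first-order approximation, linear in A

# Rescale by sqrt(n) so that the signal |u2_star[i]| equals 1.
direct_err = np.sqrt(n) * np.max(np.abs(u2 - u2_star))
linear_err = np.sqrt(n) * np.max(np.abs(u2 - linear))
sign_errors = int(np.sum(np.sign(u2) != z))
print(f"direct {direct_err:.3f}, linearization {linear_err:.3f}, sign errors {sign_errors}")
```

In this strong-signal setting the linearization error is typically much smaller than the direct error. The point of the section is that the sign pattern of u2 should be read off from the linear term rather than from a bound on the direct error. Near the exact-recovery threshold the direct error can grow to the size of the signal itself. This setting is illustrative only and does not reproduce Figure 1.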
3.3 Matrix completion from noisy entries
Matrix completion from partial observations has wide applications, including collaborative filtering, system identification, global positioning and remote sensing, as outlined in [?]. A popular version is the “Netflix problem”. There, an incomplete table of customer ratings was posted online, and the goal was to predict the missing ratings with prescribed accuracy, which is useful for targeted recommendation. The problem has been studied intensively in the past decade, so our brief review below is by no means exhaustive. [?], [?] and [?] focus on exact recovery of low-rank matrices from noiseless observations. More realistic models with noisy observations are studied in [?], [?], [?], [?] and [?].
As an application of Theorem ?, we study a model similar to the one in [?], where both the sampling scheme and the noise are random. It can be viewed as a statistical problem with missing values. Suppose we have an unknown signal matrix . For each entry of , we have a noisy observation with probability , and no observation otherwise. Let record our observations, with missing entries treated as zeros. For convenience, we consider the rescaled partial observation matrix . It is easy to see that is an unbiased estimator of , and hence a popular starting point for further analysis. The definition of our model is formalized below.
Let and be its singular value decomposition (SVD), where , , is diagonal, and . We are interested in estimating , , and . The rank is assumed to be known; otherwise it can usually be estimated easily (see, e.g., [?]). We analyze a very simple spectral algorithm that often serves as an initial estimate of in iterative methods.
Compute the largest singular values of , and their associated left and right singular vectors and . Define , and .
Return , and as estimators for , , and , respectively.
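The observation model and the two-step spectral algorithm above can be sketched as follows. This is a minimal illustration, not the paper's implementation. We assume Gaussian noise, a random rank-r signal matrix, and hypothetical values for the dimensions `n1`, `n2`, the rank `r`, the sampling probability `p` and the noise level `sigma`. The helper names `rescaled_observation` and `spectral_estimate` are ours.

```python
import numpy as np


def rescaled_observation(M_star, p, sigma, rng):
    """Rescaled partial observation matrix (hypothetical helper).

    Each entry is observed independently with probability p. An observed
    entry equals M_star[i, j] plus N(0, sigma^2) noise (Gaussian noise is
    our assumption). Missing entries are zeros, and dividing by p makes the
    result unbiased for M_star.
    """
    mask = rng.random(M_star.shape) < p
    noisy = M_star + sigma * rng.standard_normal(M_star.shape)
    return np.where(mask, noisy, 0.0) / p


def spectral_estimate(A_rescaled, r):
    """Top-r singular triplets of the rescaled matrix and the rank-r estimate."""
    U, s, Vt = np.linalg.svd(A_rescaled, full_matrices=False)  # s descending
    U_hat, Sigma_hat, V_hat = U[:, :r], np.diag(s[:r]), Vt[:r].T
    return U_hat, Sigma_hat, V_hat, U_hat @ Sigma_hat @ V_hat.T


rng = np.random.default_rng(3)
n1, n2, r, p, sigma = 300, 200, 2, 0.5, 0.1  # illustrative values
M_star = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))

A_rescaled = rescaled_observation(M_star, p, sigma, rng)
U_hat, Sigma_hat, V_hat, M_hat = spectral_estimate(A_rescaled, r)

err_naive = np.max(np.abs(A_rescaled - M_star))  # entrywise error of A / p
err_spec = np.max(np.abs(M_hat - M_star))        # entrywise error of M_hat
rel_fro = np.linalg.norm(M_hat - M_star) / np.linalg.norm(M_star)
print(f"max-norm error: A/p {err_naive:.2f}, spectral {err_spec:.2f}")
print(f"relative Frobenius error of M_hat: {rel_fro:.3f}")
```

The rescaled matrix is unbiased but entrywise very noisy. Truncating to rank r averages out much of that noise, which is why the spectral estimate is a natural initializer for iterative methods.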
Note that the matrices in Definition ? are generally asymmetric, due to the rectangular shape and the independent sampling. Hence the SVD in our spectral algorithm differs from an eigendecomposition, and Theorem ? is not directly applicable. Nevertheless, the theorem can be adapted to our framework by a “symmetric dilation” trick; details are postponed to Appendix B. Below we present our results.
Our analysis yields entrywise bounds for reconstruction errors which, to the best of our knowledge, are the first results of this type for the spectral algorithm. In nonconvex optimization-based algorithms for matrix recovery [?], the spectral algorithm or one of its variants usually serves as the initialization step, followed by local refinements. Error bounds for the spectral algorithm therefore play an important role in the analysis of those algorithms. Existing studies rely on perturbation analysis of eigenvectors, and thus provide guarantees in terms of reconstruction error. We expect our entrywise analysis to be helpful for deriving finer results.
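The “symmetric dilation” trick mentioned above rests on a standard linear-algebra fact. For a rectangular matrix M, the symmetric block matrix [[0, M], [Mᵀ, 0]] has eigenvalues ±σ_i(M), plus zeros. The eigenvector for σ_i is (u_i, v_i)/√2, where u_i and v_i are the singular vector pair. The sketch below checks this numerically on an arbitrary random matrix. It illustrates the fact only; how Appendix B applies it is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2 = 6, 4
M = rng.standard_normal((n1, n2))  # an arbitrary rectangular matrix

# Symmetric dilation: a symmetric (n1 + n2) x (n1 + n2) matrix.
D = np.block([[np.zeros((n1, n1)), M],
              [M.T, np.zeros((n2, n2))]])

U, s, Vt = np.linalg.svd(M, full_matrices=False)  # s in descending order
eigvals, eigvecs = np.linalg.eigh(D)               # eigvals in ascending order

# The n2 largest eigenvalues of D are the singular values of M.
values_match = np.allclose(eigvals[-n2:][::-1], s)

# The top eigenvector of D is (u_1, v_1) / sqrt(2), up to sign.
expected = np.concatenate([U[:, 0], Vt[0]]) / np.sqrt(2)
vector_match = np.isclose(abs(eigvecs[:, -1] @ expected), 1.0)
print(values_match, vector_match)
```

Because D is symmetric, eigenvector perturbation results stated for symmetric random matrices can be transferred to singular vectors of rectangular ones.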
In Theorem ? we have made no attempt to optimize the dependence on or . To facilitate interpretation, we translate its results into the language of relative errors:
Two key quantities are and . The former quantifies the noise corruption of , while the latter characterizes the structure of . The latter is closely related to the matrix coherence , a concept introduced by [?] and widely used in the analysis of matrix recovery problems. The more delocalized and are, the smaller is. A trivial lower bound is .
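As a reminder of what “delocalized” means, the sketch below computes the standard coherence of a subspace from the matrix completion literature. For an n × r matrix U with orthonormal columns, μ(U) = (n/r) max_i ‖U[i, :]‖², which always lies between 1 and n/r. This is the standard definition, not necessarily the exact quantity used in this paper, and `coherence` is a hypothetical helper name.

```python
import numpy as np


def coherence(U):
    """Standard coherence of the column space of U (orthonormal columns).

    mu(U) = (n / r) * max_i ||U[i, :]||^2, which lies in [1, n / r].
    """
    n, r = U.shape
    row_norms_sq = np.sum(U**2, axis=1)
    return n / r * np.max(row_norms_sq)


n = 8
delocalized = np.ones((n, 1)) / np.sqrt(n)  # mass spread evenly over all rows
localized = np.eye(n)[:, :1]                 # all mass on a single row
print(coherence(delocalized), coherence(localized))
```

The evenly spread vector attains the minimum value 1, while the vector concentrated on one coordinate attains the maximum n/r.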
Now we further suppose