
# Entrywise Eigenvector Analysis of Random Matrices with Low Expected Rank

Emmanuel Abbe, Jianqing Fan, Kaizheng Wang and Yiqiao Zhong

Address: Program in Applied and Computational Mathematics, and Department of EE, Princeton University, Princeton, NJ 08544, USA; E-mail: eabbe@princeton.edu. The research was supported by NSF CAREER Award CCF-1552131, ARO grant W911NF-16-1-0051, and NSF CSOI CCF-0939370.

Address: Department of ORFE, Sherrerd Hall, Princeton University, Princeton, NJ 08544, USA; E-mails: {jqfan, kaizheng, yiqiaoz}@princeton.edu. The research was supported by NSF grants DMS-1662139 and DMS-1712591 and NIH grant R01-GM072611-11.
###### Abstract

Recovering low-rank structures via eigenvector perturbation analysis is a common problem in statistical machine learning, such as in factor analysis, community detection, ranking, matrix completion, among others. While a large variety of results provide tight bounds on the average errors between empirical and population statistics of eigenvectors, fewer results are tight for entrywise analyses, which are critical for a number of problems such as community detection and ranking.

This paper investigates the entrywise perturbation analysis for a large class of random matrices whose expectations are low-rank, which includes community detection, synchronization ($\mathbb{Z}_2$-spiked Wigner model) and matrix completion models. Denoting by $u_k$, respectively $u_k^*$, the eigenvectors of a random matrix $A$, respectively of its expectation $A^*$, the paper characterizes cases for which

$$u_k \approx \frac{A u_k^*}{\lambda_k^*}$$

serves as a first-order approximation under the $\ell_\infty$ norm. The fact that the approximation is both tight and linear in the random matrix $A$ allows for sharp comparisons of $u_k$ and $u_k^*$. In particular, it allows one to compare the signs of $u_k$ and $u_k^*$ even when $\|u_k - u_k^*\|_\infty$ is large, which in turn allows to settle the conjecture in ABH16 that the spectral algorithm achieves exact recovery in the stochastic block model without any trimming or cleaning steps. The results are further extended to the perturbation of eigenspaces, providing new bounds for $\ell_\infty$-type errors in noisy matrix completion.

Keywords: Eigenvector perturbation, spectral analysis, synchronization, community detection, matrix completion, low-rank structures, random matrices.

## 1 Introduction

Many estimation problems in statistics involve low-rank matrix estimators that are NP-hard to compute, and many of these estimators are solutions to nonconvex programs. This is partly because of the widespread use of maximum likelihood estimation (MLE), which, while enjoying good statistical properties, often poses computational challenges due to nonconvex or discrete constraints inherent in the problems. It is, thus, crucial to address computational issues, especially in modern large-scale problems.

Fortunately, computationally efficient eigenvector-based algorithms often afford good performance. The eigenvectors either directly lead to final estimates (SMa00; NJW02), or serve as warm starts followed by further refinements (KesMonOh10; JaiNetSan13; CanLiSol15). Such algorithms mostly rely on computing leading eigenvectors and matrix-vector multiplications, which are fast by nature.

While various heuristics abound, theoretical understanding remains scarce on the entrywise analysis, and on when refinements are needed or can be avoided. In particular, it remains open in various cases to determine whether a vanilla eigenvector-based method without preprocessing steps (e.g., trimming of outliers) or without refinement steps (e.g., cleaning with local improvements) enjoys the same optimality results as the MLE (or SDP) does. A crucial missing step is a sharp entrywise perturbation analysis of eigenvectors. This is partly due to the fact that targeting directly the $\ell_\infty$ distance between the eigenvectors of a random matrix and its expected counterpart is not the right approach; errors per entry can be asymmetrically distributed, as we shall see in this paper.

This paper investigates entrywise behaviors of eigenvectors, and more generally eigenspaces, for random matrices with low expected rank using the following approach. Let $A$ be a random matrix, and let $E = A - \mathbb{E}A$ be the 'error' of $A$ around its mean. In many cases, $A^* = \mathbb{E}A$ is a symmetric matrix with low rank determined by the structure of a statistical problem, such as a low-rank matrix with blocks in community detection.

Consider for now the case of symmetric $A$, and let $u_k$, resp. $u_k^*$, be the eigenvector corresponding to the $k$-th largest eigenvalue of $A$, resp. $A^*$. Roughly speaking, if the size of the noise $E$ is moderate, our first-order approximation reads

$$u_k = \frac{A u_k}{\lambda_k} \approx \frac{A u_k^*}{\lambda_k^*} = u_k^* + \frac{E u_k^*}{\lambda_k^*}.$$

While eigenvectors are nonlinear functions of the random matrix $A$ (or equivalently $E$), the approximation is linear in $E$, which greatly facilitates the analysis of eigenvectors. Under certain conditions, the maximum entrywise approximation error can be much smaller than $\|u_k^*\|_\infty$, allowing us to compare $u_k$, $u_k^*$ and $Au_k^*/\lambda_k^*$ sharply. To obtain such results, a key part of our theory is to characterize concentration properties of the random matrix $A$ and structural assumptions on its expectation $A^*$.
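As a sanity check, the following sketch (our own illustration; the matrix size and signal strength below are arbitrary choices, not from the paper) compares the entrywise error of the first-order approximation $Au^*/\lambda^*$ against the raw perturbation $u - u^*$ for a rank-one spiked Gaussian matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
u_star = np.ones(n) / np.sqrt(n)            # true leading eigenvector
lam_star = 10 * np.sqrt(n)                  # true leading eigenvalue (illustrative)
A_star = lam_star * np.outer(u_star, u_star)

G = rng.standard_normal((n, n))
E = np.triu(G) + np.triu(G, 1).T            # symmetric Gaussian noise
A = A_star + E

u = np.linalg.eigh(A)[1][:, -1]             # eigenvector of the largest eigenvalue
u *= np.sign(u @ u_star)                    # resolve the sign ambiguity

err_linear = np.abs(u - A @ u_star / lam_star).max()   # higher-order error
err_raw = np.abs(u - u_star).max()                     # total entrywise error
print(err_linear, err_raw, u_star.max())
```

With these (arbitrary) parameters, the higher-order error is noticeably smaller than the total perturbation, mirroring the phenomenon described in Section 1.1.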

This perturbation analysis leads to new and sharp theoretical guarantees. In particular, we find that for the problems we study in this paper, fast one-shot eigenvector-based algorithms enjoy the same optimality as the MLE does, i.e., the vanilla spectral algorithm (without trimming or cleaning) achieves the information-theoretic limit of the MLE estimator, whenever the MLE estimator succeeds. This settles in particular a conjecture left open in abh_arxiv; ABH16. Therefore, MLE or SDPs do not offer advantages over spectral methods for exact recovery (a.k.a. strong consistency).

### 1.1 A sample problem

Let us consider a network model that has received widespread interest in recent years: the stochastic block model (SBM). Suppose that we have a graph with vertex set $[n] = \{1, \dots, n\}$, and assume for simplicity that $n$ is even. There is an unknown index set $J \subseteq [n]$ with $|J| = n/2$ such that the vertex set is partitioned into two groups $J$ and $J^c$. Within groups, there is an edge between each pair of vertices with probability $p$, and between groups, there is an edge with probability $q$. Let $x \in \{\pm 1\}^n$ be the group assignment vector with $x_i = 1$ if $i \in J$ and $x_i = -1$ otherwise. The goal is to recover $x$ from the observed edges of the graph.
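The sampling model just described is straightforward to simulate; the following sketch (parameter values are arbitrary) draws one adjacency matrix with $J = \{1, \dots, n/2\}$:

```python
import numpy as np

def sbm_adjacency(n, p, q, rng):
    """Two-block SBM: vertices 1..n/2 form J; within-group edges w.p. p, across w.p. q."""
    x = np.ones(n)
    x[n // 2:] = -1                            # group assignment vector
    probs = np.where(np.equal.outer(x, x), p, q)
    upper = np.triu(rng.random((n, n)) < probs, 1)
    return (upper + upper.T).astype(float), x  # symmetric 0/1 matrix, no self-loops

rng = np.random.default_rng(1)
n = 1000
a, b = 5.0, 1.0                                # p = a log n / n, q = b log n / n
p, q = a * np.log(n) / n, b * np.log(n) / n
A, x = sbm_adjacency(n, p, q, rng)
print(A.sum() / n)                             # average degree
```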

This random-graph-based model was proposed for social relationship networks (HLL83), and many more realistic models have been developed based on the SBM since then. Given its fundamental importance, a plethora of papers address its statistical properties and algorithmic efficiencies; we will further discuss these below, and see Abb17 for a survey.

Under the regime $p = a\log n/n$ and $q = b\log n/n$, where $a > b > 0$ are constants, it is known from ABH16 and mossel_consist that exact recovery is possible if and only if $\sqrt{a} - \sqrt{b} > \sqrt{2}$, and that the limit can be achieved with efficient algorithms. The algorithms in these two papers use two-round procedures, with a clean-up phase, that achieve the threshold. Semidefinite relaxations are also known to achieve the threshold (ABH16; wu-xu; afonson; afonso_single), as well as spectral methods with local refinements (ASa15; prout2; Gao15), which are also related to Coj06, Vu14, Vu-new, massoulie-STOC and prout.

While existing works tackle exact recovery rather successfully, some fundamental questions remain unsolved: how do the simple statistics—top eigenvectors of the adjacency matrix—behave? Are they informative enough to reveal the group structure under very challenging regimes?

To study these questions, we start with the eigenvectors of $\mathbb{E}A$, the expectation of the graph adjacency matrix $A$. By definition, $A_{ij}$ is a Bernoulli random variable, and $\mathbb{E}A_{ij}$ equals $p$ or $q$ depending on whether $i$ and $j$ are from the same group. The expectation $\mathbb{E}A$ must be a block matrix of the following form:

$$\mathbb{E}A = \frac{\log n}{n}\begin{pmatrix} a\,\mathbf{1}_{\frac{n}{2}\times\frac{n}{2}} & b\,\mathbf{1}_{\frac{n}{2}\times\frac{n}{2}} \\ b\,\mathbf{1}_{\frac{n}{2}\times\frac{n}{2}} & a\,\mathbf{1}_{\frac{n}{2}\times\frac{n}{2}} \end{pmatrix},$$

where $\mathbf{1}_{\frac{n}{2}\times\frac{n}{2}}$ is the all-one matrix. Here, for convenience, we represent $\mathbb{E}A$ as if $J = \{1, \dots, n/2\}$. But in general $J$ is unknown, and there is a permutation of indices in the matrix representation.

From the matrix representation it is clear that $\mathbb{E}A$ has rank $2$, with the top two eigenvalues $\lambda_1^* = \frac{(a+b)\log n}{2}$ and $\lambda_2^* = \frac{(a-b)\log n}{2}$. Simple calculations give the corresponding (normalized) eigenvectors: $u_1^* = \frac{1}{\sqrt n}\mathbf{1}_n$, and $(u_2^*)_i = \frac{1}{\sqrt n}$ if $i \in J$ and $(u_2^*)_i = -\frac{1}{\sqrt n}$ if $i \notin J$. Since $u_2^*$ perfectly aligns with the group assignment vector $x$, we hope to show its counterpart $u_2$, i.e., the second eigenvector of $A$, also has desirable properties.
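These spectral facts are easy to verify numerically; in the sketch below (with arbitrary $n$, $a$, $b$) the two nonzero eigenvalues of the block matrix $\mathbb{E}A$ match $(a \pm b)\log n/2$ and the second eigenvector is constant $\pm 1/\sqrt n$ on the two groups:

```python
import numpy as np

n, a, b = 500, 5.0, 1.0
ones = np.ones((n // 2, n // 2))
EA = (np.log(n) / n) * np.block([[a * ones, b * ones],
                                 [b * ones, a * ones]])

vals, vecs = np.linalg.eigh(EA)              # eigenvalues in ascending order
lam1, lam2 = vals[-1], vals[-2]
u2 = vecs[:, -2]                             # second eigenvector
print(lam1, (a + b) * np.log(n) / 2)
print(lam2, (a - b) * np.log(n) / 2)
```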

The first reassuring fact is that the top eigenvalues preserve proper ordering: by Weyl's inequality, the deviation of any eigenvalue $\lambda_i$ of $A$ ($i \in [n]$) from its population counterpart is bounded by $\|A - \mathbb{E}A\|_2$, which is $O(\sqrt{\log n})$ with high probability, due to a refined version of Feige-Ofek's result (see Lemma 7). Then, by Davis-Kahan's theorem, $u_1$ and $u_2$ are weakly consistent estimators of $u_1^*$ and $u_2^*$ respectively, in the sense that $\min_{s\in\{\pm1\}}\|su_k - u_k^*\|_2 = o_P(1)$ for $k = 1, 2$. However, weak consistency is only a guarantee in the average sense, and it is not helpful for understanding behaviors of the eigenvectors in the uniform sense, which is crucial for exact recovery. Weak consistency does not explain why we should expect a sharp phase transition phenomenon. This makes entrywise analysis both interesting and challenging.

This problem motivates some simulations about the coordinates of top eigenvectors of $A$. In Figure 1, we calculate the rescaled second eigenvector $\sqrt n\, u_2$ of one typical realization $A$, and make a histogram plot of its coordinates. (Note the first eigenvector is aligned with the all-one vector $\mathbf{1}_n$, which is uninformative.) The parameters used to generate the random graph are chosen such that exact recovery is possible with high probability. The red vertical lines show the coordinates of $\sqrt n\, u_2^*$, which only take the two values $+1$ and $-1$. Visibly, the coordinates are grouped into two clusters. Intuitively, the signs of these coordinates alone should be able to reveal the group structure.

To probe into the second eigenvector $u_2$, we expand the perturbation as follows:

$$u_2 - u_2^* = \left(\frac{A u_2^*}{\lambda_2^*} - u_2^*\right) + \left(u_2 - \frac{A u_2^*}{\lambda_2^*}\right). \qquad (1)$$

The first term is exactly $Eu_2^*/\lambda_2^*$, which is linear in $A$, and it represents the first-order perturbation error. The second term is nonlinear in $A$ in general, and represents the higher-order perturbation error. In Figure 1, we made boxplots of the infinity norm of rescaled perturbation errors across realizations (see (i)-(iii)). It is clear from the boxplots that the higher-order error is much smaller than both the first-order error and the total perturbation. Indeed, we will see in Theorem 1.1 that, up to sign (this means we can choose an appropriate sign for the eigenvector, since it is not uniquely defined; see Theorem 1.1 for its precise meaning),

$$\left\|u_2 - A u_2^*/\lambda_2^*\right\|_\infty = o_P\left(\min_i |(u_2^*)_i|\right) = o_P(1/\sqrt n). \qquad (2)$$

Here, the notation $X_n = o_P(a_n)$ means that $X_n/a_n \to 0$ in probability as $n \to \infty$. Therefore, the entrywise behavior of $u_2$ is captured by the first-order term $Au_2^*/\lambda_2^*$, which is much more amenable to analysis. This observation will finally lead to sharp eigenvector results in Section 3.2.
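The decomposition (1) can be reproduced in a few lines; the sketch below (our own parameter choices, comfortably inside the exact-recovery regime) computes the $\ell_\infty$ norms of the first-order and higher-order terms on one SBM realization:

```python
import numpy as np

rng = np.random.default_rng(2)
n, a, b = 2000, 20.0, 5.0                     # sqrt(a) - sqrt(b) > sqrt(2)
p, q = a * np.log(n) / n, b * np.log(n) / n
x = np.ones(n); x[n // 2:] = -1
upper = np.triu(rng.random((n, n)) < np.where(np.equal.outer(x, x), p, q), 1)
A = (upper + upper.T).astype(float)

u2_star = x / np.sqrt(n)                      # second population eigenvector
lam2_star = (a - b) * np.log(n) / 2           # second population eigenvalue
u2 = np.linalg.eigh(A)[1][:, -2]              # second empirical eigenvector
u2 *= np.sign(u2 @ u2_star)

first_order = np.abs(A @ u2_star / lam2_star - u2_star).max()
higher_order = np.abs(u2 - A @ u2_star / lam2_star).max()
print(first_order, higher_order)
```

On typical draws the higher-order term is several times smaller than the first-order term, consistent with the boxplots of Figure 1.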

We remark that it is also possible to calculate the top eigenvector of the centered adjacency matrix $A^c$, obtained by subtracting from $A$ the rank-one component associated with the average degree of all vertices. The top eigenvector of $\mathbb{E}A^c$ is exactly $u_2^*$, so its counterpart is very similar to $u_2$. In fact, the same reasoning and analysis applies to $A^c$, and one obtains similar plots as Figure 1 (omitted here).

### 1.2 First-order approximation of eigenvectors

Now we present a simpler version of our result that justifies the intuitions above. Consider a general symmetric random matrix $A \in \mathbb{R}^{n\times n}$ (more precisely, this should be a sequence of random matrices), not necessarily a graph adjacency matrix, with independent entries on and above its diagonal. Suppose its expectation $A^* = \mathbb{E}A$ has a low-rank structure, with $r$ nonzero eigenvalues. Let us assume that

• (a) $r = O(1)$, these eigenvalues are positive and ordered as $\lambda_1^* \ge \cdots \ge \lambda_r^* > 0$, and $\lambda_1^*/\lambda_r^* = O(1)$.

Their corresponding eigenvectors are denoted by $u_1^*, \dots, u_r^*$. In other words, we have the spectral decomposition $A^* = \sum_{j=1}^r \lambda_j^* u_j^* (u_j^*)^\top$.

We fix $k \in [r]$. To study the $k$-th eigenvector, let us define the eigen-gap (or spectral gap) by

$$\Delta^* = \min\{\lambda_{k-1}^* - \lambda_k^*,\ \lambda_k^* - \lambda_{k+1}^*\},$$

where we define $\lambda_0^* = +\infty$ and $\lambda_{r+1}^* = 0$ for simplicity. Assume that

• (b) $A$ concentrates under the spectral norm, i.e., there is a suitable $\gamma \ge 0$ such that $\|A - A^*\|_2 \le \gamma\Delta^*$ holds with probability $1 - o(1)$.

A direct yet important implication is that the fluctuation of $\lambda_k$ is much smaller than the gap $\Delta^*$, since by Weyl's inequality, $|\lambda_k - \lambda_k^*| \le \|A - A^*\|_2 \le \gamma\Delta^*$. Thus, $\lambda_k$ is well separated from other eigenvalues, including the 'bulk' eigenvalues whose magnitudes are at most $\gamma\Delta^*$.

Besides, we assume $A$ concentrates in a row-wise sense. To understand its precise meaning, note that if each entry of $A - A^*$ is a Gaussian variable with variance $\sigma^2$, then for any $m \in [n]$ and any weight vector $w \in \mathbb{R}^n$ with $\|w\|_2 \le 1$, the weighted sum $(A - A^*)_{m\cdot}\, w$ is also a Gaussian variable with variance at most $\sigma^2$, and thus its absolute value is bounded by $O(\sigma\sqrt{\log n})$ with probability $1 - o(n^{-1})$ (since Gaussian variables have light tails).

Here, the notation $(A - A^*)_{m\cdot}$ means the $m$-th row vector of $A - A^*$. While a similar result holds for sub-Gaussian variables, it is often too crude for Bernoulli variables with vanishing mean. To characterize such concentration well for a broader distribution family beyond Gaussian variables, we assume

• (c) there exists a continuous non-decreasing function $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ that possibly depends on $n$, such that $\varphi(0) = 0$, $\varphi(x)/x$ is non-increasing, and that for any $m \in [n]$ and any fixed $w \in \mathbb{R}^n$, with probability $1 - o(n^{-1})$,

$$|(A - A^*)_{m\cdot}\, w| \le \Delta^* \|w\|_\infty\, \varphi\!\left(\frac{\|w\|_2}{\sqrt n\, \|w\|_\infty}\right).$$

For Gaussian variables, i.e. $A_{ij} \sim N(A_{ij}^*, \sigma^2)$, we can simply choose a linear function $\varphi(x) = cx$, where $c \asymp \sigma\sqrt{n\log n}/\Delta^*$ is some proper constant. The condition then reads

$$\mathbb{P}\left(|(A - A^*)_{m\cdot}\, w| \le c\,\sigma\sqrt{\log n}\,\|w\|_2\right) = 1 - o(n^{-1}),$$

which directly follows from the Gaussian tail bound since $(A - A^*)_{m\cdot}\, w \sim N(0, \sigma^2\|w\|_2^2)$. For Bernoulli variables, we can choose a nonlinear $\varphi$ that decays only logarithmically as $x \to 0$—see Figure 2. In both cases we have $\varphi(1) = O(1)$ under suitable signal-to-noise conditions.

We are in a position to state our approximation result.

###### Theorem 1.1 (Simpler form of Theorem 2.1).

Let $k \in [r]$ be fixed. Suppose that Assumptions (a), (b) and (c) hold, and $\gamma = o(1)$. Then, with probability $1 - o(1)$,

$$\min_{s\in\{\pm1\}}\left\|u_k - s\,A u_k^*/\lambda_k^*\right\|_\infty = O\big((\gamma + \varphi(\gamma))\,\|u_k^*\|_\infty\big) = o(\|u_k^*\|_\infty), \qquad (3)$$

where the notations $O(\cdot)$ and $o(\cdot)$ hide dependencies on the ratio $\lambda_1^*/\Delta^*$.

On the left-hand side, we are allowed to choose a suitable sign $s$, because eigenvectors are not uniquely defined. The second bound is a consequence of the first one, since $\gamma = o(1)$ and $\varphi(\gamma) = o(1)$ by continuity. We hide dependency on the ratio $\lambda_1^*/\Delta^*$ in the above bound, since it is bounded by a constant under a suitable signal-to-noise ratio. See Theorem 2.1 for a detailed version of the theorem. Therefore, the approximation error is much smaller than $\|u_k^*\|_\infty$. This rigorously confirms the intuitions in Section 1.1.

Here are some remarks. (1) This theorem enables us to study $u_k$ via its linearized version $Au_k^*/\lambda_k^*$, since the approximation error is usually small order-wise. (2) The conditions of the theorem are fairly mild. For the SBM, the theorem is applicable as long as we are in the regime $p = a\log n/n$ and $q = b\log n/n$ (with constants $a > b > 0$), regardless of the relative sizes of $a$ and $b$. The requirement $\sqrt a - \sqrt b > \sqrt 2$ for exact recovery is only needed when showing that the entries of $Au_2^*/\lambda_2^*$ have the same signs as those of $u_2^*$, and are well separated into two clusters.

### 1.3 MLE, spectral algorithm, and strong consistency

Once we obtain the approximation result (3), the analysis of the entrywise behavior of eigenvectors boils down to that of $Au_k^*/\lambda_k^*$. For example, in the SBM problem, once we prove (2), we expect $\mathrm{sgn}(u_2) = \mathrm{sgn}(Au_2^*/\lambda_2^*)$ with probability $1 - o(1)$, where $\mathrm{sgn}(\cdot)$ means the sign function applied to each coordinate of a vector. If, under suitable conditions, we further have with probability $1 - o(1)$ that $\mathrm{sgn}(Au_2^*/\lambda_2^*) = \mathrm{sgn}(u_2^*)$ and the entries of $Au_2^*/\lambda_2^*$ are all bounded away from zero by an order of $1/\sqrt n$, then the vanilla eigenvector estimator $\mathrm{sgn}(u_2)$ achieves exact recovery. Since each coordinate of $Au_2^*$ is simply a sum of weighted Bernoulli variables, the analysis becomes much easier.
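This sign-agreement argument suggests a concrete check: the signs of $u_2$, of the linearization $Au_2^*/\lambda_2^*$, and of $x$ should all coincide when exact recovery is possible. A sketch under assumed parameters well above the threshold:

```python
import numpy as np

rng = np.random.default_rng(3)
n, a, b = 2000, 20.0, 5.0
p, q = a * np.log(n) / n, b * np.log(n) / n
x = np.ones(n); x[n // 2:] = -1
upper = np.triu(rng.random((n, n)) < np.where(np.equal.outer(x, x), p, q), 1)
A = (upper + upper.T).astype(float)

u2_star = x / np.sqrt(n)
lam2_star = (a - b) * np.log(n) / 2
u2 = np.linalg.eigh(A)[1][:, -2]
u2 *= np.sign(u2 @ u2_star)

x_eig = np.sign(u2)                            # vanilla spectral estimate
x_lin = np.sign(A @ u2_star / lam2_star)       # sign of the linearization
agree_lin = np.mean(x_eig == x_lin)            # eigenvector vs. linearization
agree_true = np.mean(x_eig == x)               # eigenvector vs. truth
print(agree_lin, agree_true)
```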

We remark on a subtlety of our result: our central analysis is a good control of $\|u_2 - Au_2^*/\lambda_2^*\|_\infty$, not necessarily of $\|u_2 - u_2^*\|_\infty$. For example, in the SBM, an inequality such as $\|u_2 - u_2^*\|_\infty = o(1/\sqrt n)$ is not true in general. In Figure 1, the second boxplot shows that $\|u_2 - u_2^*\|_\infty$ may well exceed $\min_i |(u_2^*)_i| = 1/\sqrt n$ even when exact recovery is achievable. This suggests that the distributions of the coordinates of the two clusters, though well separated, have longer tails on one side than the other. Thus, it is in vain to seek a good bound on $\|u_2 - u_2^*\|_\infty$—see Theorem 3.3 and the following remarks. Instead of bounding $\|u_2 - u_2^*\|_\infty$, one should resort to the central quantity $\|u_2 - Au_2^*/\lambda_2^*\|_\infty$ for optimal results. This may partly explain why the conjecture was not settled until now.

The vector $Ax$ also plays a pivotal role in the information-theoretic lower bound established in ABH16. In the SBM, for example, it is necessary to require that $\min_{i\in[n]} x_i(Ax)_i > 0$ holds with a nonvanishing probability. Otherwise, by symmetry and the union bound, with probability bounded away from zero, we can find some $i \in J$ and $j \in J^c$ such that $x_i(Ax)_i < 0$ and $x_j(Ax)_j < 0$. If that occurs, an exchange of the group assignments of $i$ and $j$ leads to an increase of the likelihood, and thus the MLE fails for exact recovery. It is well known that with a uniform prior on group assignments, the MLE is equivalent to the maximum a posteriori estimator, which is optimal for exact recovery. Therefore, to achieve exact recovery, it is necessary to ensure that no such local refinement is possible. This forms the core argument of the information-theoretic lower bound. The above analysis suggests an interesting property about the eigenvector estimator $\hat x^{\rm eig}(A) = \mathrm{sgn}(u_2)$:

###### Corollary 1.1.

Suppose we are given constants $a > b > 0$ such that $\sqrt a - \sqrt b \ne \sqrt 2$, i.e., we exclude the regime where $(a, b)$ is at the boundary of the phase transition. Then, whenever the MLE is successful, in the sense that $\hat x^{\rm MLE}(A) = x$ (up to sign) with probability $1 - o(1)$, we have

$$\hat x^{\rm eig}(A) = \hat x^{\rm MLE}(A) = x$$

with probability $1 - o(1)$, where $x$ is the sign indicator of the true communities.

This is because the success of the MLE hinges on the event $\min_{i\in[n]} x_i(Ax)_i > 0$. See Section 3.2 for details. This phenomenon holds for two applications considered in this paper.

### 1.4 An iterative perspective

In the SBM, a key observation of the above heuristics is that $\|u_2 - Au_2^*/\lambda_2^*\|_\infty$ is small. Here we give some intuitions from the iterative (or algorithmic) perspective. For simplicity, we will focus on the top eigenvector $u$ of the centered adjacency matrix $A^c$ introduced in Section 1.1. We denote the top eigenvalue of $\mathbb{E}A^c$ by $\lambda^*$.

It is well known that the top eigenvector of a symmetric matrix can be computed via the power method, namely, computing iteratively $u^{(t+1)} = A^c u^{(t)}/\|A^c u^{(t)}\|_2$. Suppose we initialize the iterations by $u^{(0)} = u_2^*$ (recall $u_2^*$ is also the top eigenvector of $\mathbb{E}A^c$). Note that by initializing with an unknown vector, the power method is not a real algorithm, though it helps us gain theoretical insights.

The first iterate after initialization is $u^{(1)} = A^c u_2^*/\|A^c u_2^*\|_2$. By standard concentration inequalities, we can show that $\|A^c u_2^*\|_2$ concentrates around the top eigenvalue $\lambda^*$ of $\mathbb{E}A^c$. Therefore, $u^{(1)}$ is approximately $A^c u_2^*/\lambda^*$, which is exactly our first-order approximation. Under a suitable eigen-gap condition, the iterates typically converge to $u$ at a linear rate, namely, with errors decaying geometrically. If those errors decay sufficiently fast, it is reasonable to expect that the first iterate is already good enough.
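The one-step heuristic is easy to play with numerically. The sketch below (a rank-one spiked Gaussian surrogate with arbitrary sizes, standing in for $A^c$) runs one power iteration from the population eigenvector and compares it with the exact eigenvector:

```python
import numpy as np

def power_step(M, v):
    """One power-method iteration: v <- Mv / ||Mv||_2."""
    w = M @ v
    return w / np.linalg.norm(w)

rng = np.random.default_rng(4)
n = 1500
u_star = np.ones(n) / np.sqrt(n)
lam_star = 8 * np.sqrt(n)                      # illustrative signal strength
G = rng.standard_normal((n, n))
A = lam_star * np.outer(u_star, u_star) + np.triu(G) + np.triu(G, 1).T

one_step = power_step(A, u_star)               # ~ A u* / lambda*, the linearization
u = np.linalg.eigh(A)[1][:, -1]                # exact top eigenvector
u *= np.sign(u @ one_step)
print(np.abs(one_step - u).max(), np.abs(u - u_star).max())
```

The first iterate is already entrywise closer to the limit $u$ than the initialization is, which is the sense in which one step is "good enough."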

This iterative perspective is explored in recent works (Zho17; ZhoBou17), where the latter studies both the eigenvector estimator and the MLE of a nonconvex problem. In this paper, we will not show any proof with iterations or inductions, though it is likely possible to do so. Instead, we resort to Davis-Kahan’s theorem combined with a “leave-one-out” type of technique. Nevertheless, we believe the iterative perspective is helpful to many other nonconvex problems where a counterpart of Davis-Kahan’s theorem is not available.

### 1.5 Related works

The study of eigenvector perturbation has a long history, with early works dating back to Rayleigh and Schrödinger (rayleigh1896theory; schrodinger1926quantisierung), in which asymptotic expansions of perturbations are developed. In numerical analysis, notable works include Davis-Kahan's theorem (DavKah70), in which a perturbation bound for general eigenspaces under unitary-invariant norms is given in terms of the magnitude of the perturbation matrix and the eigen-gap. This result is later extended to general rectangular matrices in Wed72. A comprehensive investigation of this topic can be found in the book SSu90. However, norms that depend on the choice of basis, such as the $\ell_\infty$ norm, are not addressed, but are of great interest in statistics.

There are several recent papers related to the study of entrywise perturbation. In FanWanZho16, $\ell_\infty$ eigenvector perturbation bounds are proved, and their results are improved by CapTanPri17, in which the authors focus on $2\to\infty$ norm bounds for eigenspaces. In EldBelWan17, the perturbation of eigenvectors is expanded into an infinite series, and then an $\ell_\infty$ perturbation bound is developed. These results are deterministic by nature, and thus yield suboptimal bounds under challenging stochastic regimes with small signal-to-noise ratio. In KLo16; KolXia16, bilinear forms of principal components (or singular vectors) are studied, yielding a sharp bound on entrywise errors, which is later extended to tensors (XiaZho17). In Zho17, entrywise behaviors of eigenvectors are studied, and their connection with Rayleigh-Schrödinger perturbation theory is explored. In ZhoBou17, in a related but slightly more complicated problem called "phase synchronization", the entrywise behaviors of both the eigenvector estimator and the MLE are analyzed under a near-optimal regime. In CFM17, similar ideas are used to derive the optimality of both the spectral method and the MLE in the top-$K$ ranking problem.

There is a rich literature on the three applications to which our perturbation theorems are applied. The synchronization problems (Sin11; CucLipSin12) aim at estimating unknown signals (usually group elements) from their noisy pairwise measurements, and have attracted much attention in the optimization and statistics communities recently (bandeira2014tightness; JMR16). They are very relevant models for cryo-EM, robotics (Sin11; Ros16) and more.

The stochastic block model has been studied extensively in the past decades, with renewed activity in the recent years (Coj06; decelle; massoulie-STOC; Mossel_SBM2; KMM13; ABH16; sbm-groth; levina; ASa15; montanari_sen; bordenave; colin3cpam; banks2), see Abb17 for further references, and in particular McS01, Vu14, prout, massoulie-xu, Vu-new and prout2, which are closest to this paper in terms of regimes and algorithms. The matrix completion problems (CRe09; CPl10; KMO101) have seen great impacts in many areas, and new insights and ideas keep flourishing in recent works (Ge16; SLu16). These lists are only a small fraction of the literature and are far from complete.

We organize our paper as follows: we present our main theorems of eigenvector and eigenspace perturbation in Section 2, which are rigorous statements of the intuitions introduced in Section 1. In Section 3, we apply the theorems to three problems: $\mathbb{Z}_2$-synchronization, SBM, and matrix completion from noisy entries. In Section 4, we present simulation results to verify our theories. The ideas of proofs are outlined in Section 5, and technical details are deferred to the appendix. Finally, we conclude and discuss future works in Section 6.

### 1.6 Notations

We use the notation $[n]$ to refer to $\{1, 2, \dots, n\}$ for $n \in \mathbb{N}$. For any real numbers $a$ and $b$, we denote $a \vee b = \max\{a, b\}$ and $a \wedge b = \min\{a, b\}$. For nonnegative $a_n$ and $b_n$ that depend on $n$ (e.g., problem size), we write $a_n \lesssim b_n$ to mean $a_n \le Cb_n$ for some constant $C > 0$. The notation $a_n \asymp b_n$ is similar, hiding two constants in upper and lower bounds. For any vector $v = (v_1, \dots, v_n)^\top$, we define $\|v\|_2 = (\sum_i v_i^2)^{1/2}$ and $\|v\|_\infty = \max_i |v_i|$. For any matrix $M$, $M_{m\cdot}$ refers to its $m$-th row, which is a row vector, and $M_{\cdot m}$ refers to its $m$-th column, which is a column vector. The matrix spectral norm is $\|M\|_2 = \max_{\|v\|_2 = 1}\|Mv\|_2$, the matrix max-norm is $\|M\|_{\max} = \max_{i,j}|M_{ij}|$, and the matrix $2\to\infty$ norm is $\|M\|_{2\to\infty} = \max_{\|v\|_2=1}\|Mv\|_\infty$. The set of $n\times r$ matrices with orthonormal columns is denoted by $\mathcal{O}_{n\times r}$.
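For concreteness, the matrix norms just defined can be computed as follows (a small sketch with a toy matrix):

```python
import numpy as np

def two_to_inf_norm(M):
    """||M||_{2->inf} equals the largest l2 norm among the rows of M."""
    return np.linalg.norm(M, axis=1).max()

M = np.array([[3.0, 4.0],
              [1.0, 0.0]])
print(two_to_inf_norm(M))        # rows have l2 norms 5 and 1
print(np.abs(M).max())           # max-norm
print(np.linalg.norm(M, 2))      # spectral norm
```

In general $\|M\|_{2\to\infty} \le \|M\|_2$, which is one reason the $2\to\infty$ norm gives finer, row-wise control than the spectral norm.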

## 2 Main results

### 2.1 Random matrix ensembles

Suppose $A \in \mathbb{R}^{n\times n}$ is a symmetric random matrix, and let $A^* = \mathbb{E}A$. Denote the eigenvalues of $A^*$ by $\lambda_1^* \ge \lambda_2^* \ge \cdots \ge \lambda_n^*$, and their associated eigenvectors by $u_1^*, \dots, u_n^*$. Analogously for $A$, the eigenvalues and eigenvectors are denoted by $\lambda_1 \ge \cdots \ge \lambda_n$ and $u_1, \dots, u_n$. For convenience, we also define $\lambda_0^* = +\infty$ and $\lambda_{n+1}^* = -\infty$. Note that we allow eigenvalues to be identical, so some eigenvectors may be defined only up to rotations.

Suppose $s \ge 0$ and $r \ge 1$ are two integers satisfying $s + r \le n$. Let $U = (u_{s+1}, \dots, u_{s+r}) \in \mathbb{R}^{n\times r}$, $U^* = (u_{s+1}^*, \dots, u_{s+r}^*) \in \mathbb{R}^{n\times r}$, and $\Lambda^* = \mathrm{diag}(\lambda_{s+1}^*, \dots, \lambda_{s+r}^*)$. We are interested in the eigenspace spanned by the columns of $U$, especially an entrywise approximate form of $U$. To this end, we assume an eigen-gap $\Delta^*$ that separates $\{\lambda_{s+1}^*, \dots, \lambda_{s+r}^*\}$ from $0$ and the other eigenvalues (see Figure 3; compared to the usual eigen-gap (DavKah70), we include $\min_{i\in[r]}|\lambda_{s+i}^*|$ in addition, due to a technical reason—note that for a rank-deficient $A^*$, $0$ is itself an eigenvalue), i.e.,

$$\Delta^* = (\lambda_s^* - \lambda_{s+1}^*) \wedge (\lambda_{s+r}^* - \lambda_{s+r+1}^*) \wedge \min_{i\in[r]} |\lambda_{s+i}^*|. \qquad (4)$$

We define $\kappa := \max_{i\in[r]} |\lambda_{s+i}^*|/\Delta^*$, which is always bounded from below by $1$. In our applications, $\kappa$ is usually bounded from above by a constant, i.e., $\Delta^*$ is comparable to $\max_{i\in[r]}|\lambda_{s+i}^*|$ in terms of magnitude.

The concentration property is characterized by a parameter $\gamma \ge 0$ and a function $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$. Roughly speaking, $\gamma$ is related to the noise level, and typically vanishes as $n$ tends to infinity. The function $\varphi$ is chosen according to the distribution of the noise, and $\varphi(x)$ is typically bounded by a constant for $x \le 1$. Particularly, in our applications, $\varphi$ is linear (Gaussian noise) or logarithm-like (Bernoulli noise)—see Figure 2. In addition, we will also make a mild structural assumption: $\|A^*\|_{2\to\infty} \le \gamma\Delta^*$. In many applications involving low-rank structure, the eigenvalues of interest (and thus $\Delta^*$) typically scale with $n$, whereas $\|A^*\|_{2\to\infty}$ scales with $\sqrt n$.

We make the following assumptions, followed by a few remarks.

1. (Incoherence) $\|A^*\|_{2\to\infty} \le \gamma\Delta^*$.

2. (Row- and column-wise independence) For any $m \in [n]$, the entries in the $m$-th row and column of $A$ are independent of the other entries: namely, $\{A_{ij} : i = m \text{ or } j = m\}$ are independent of $\{A_{ij} : i \ne m,\ j \ne m\}$.

3. (Spectral norm concentration) $32\kappa\max\{\gamma, \varphi(\gamma)\} \le 1$ and for some $\delta_0 \in (0, 1)$,

$$\mathbb{P}\left(\|A - A^*\|_2 \le \gamma\Delta^*\right) \ge 1 - \delta_0. \qquad (5)$$
4. (Row concentration) Suppose $\varphi(x)$ is continuous and non-decreasing in $\mathbb{R}_+$ with $\varphi(0) = 0$, $\varphi(x)/x$ is non-increasing in $\mathbb{R}_+$, and $\delta_1 \in (0, 1)$. For any $m \in [n]$ and any fixed $W \in \mathbb{R}^{n\times r}$,

$$\mathbb{P}\left(\|(A - A^*)_{m\cdot} W\|_2 \le \Delta^* \|W\|_{2\to\infty}\, \varphi\!\left(\frac{\|W\|_F}{\sqrt n\, \|W\|_{2\to\infty}}\right)\right) \ge 1 - \frac{\delta_1}{n}. \qquad (6)$$

Here are some remarks about the above assumptions. Assumption 1 requires that no row of $A^*$ is dominant. To relate it to the usual concept of incoherence in CRe09 and CLMW11, we consider the case $s = 0$ and $\mathrm{rank}(A^*) = r$, and let $\mu = n\|U^*\|_{2\to\infty}^2/r$ be the incoherence parameter. Note that

$$\|U^*\Lambda^*(U^*)^\top\|_{2\to\infty} \le \|U^*\|_{2\to\infty}\,\|\Lambda^*(U^*)^\top\|_2 = \|U^*\|_{2\to\infty}\,\|\Lambda^*\|_2, \qquad (7)$$

and $\|\Lambda^*\|_2 \le \kappa\Delta^*$. Then Assumption 1 is satisfied if $\kappa\sqrt{\mu r/n} \le \gamma$, which is very mild.

Assumption 2 is a mild independence assumption, and it encompasses common i.i.d. noise assumptions.

Roughly speaking, $\Delta^*$ can be interpreted as the signal strength. Assumption 3 requires that the noise matrix $A - A^*$ is dominated by the signal strength under the spectral norm, since we need $\gamma$ (its inverse is related to the signal-to-noise ratio) to be sufficiently small. For instance, in $\mathbb{Z}_2$-synchronization (see Section 3.1), $\Delta^* = n$, and the entries of $A - A^*$ above the diagonal are i.i.d. $N(0, \sigma^2)$. Since $\|A - A^*\|_2 \lesssim \sigma\sqrt n$ by standard concentration results, we need to require $\sigma = o(\sqrt n)$.

In Assumption 4, we choose $\varphi$ according to different noise distributions, as discussed in Section 1.2. This row concentration is a generalization of the form therein. For instance, in $\mathbb{Z}_2$-synchronization, the noise is Gaussian, and we will choose a linear function $\varphi(x) = cx$ with $c \asymp \sigma\sqrt{n\log n}/\Delta^*$, so that $\varphi(1) = O(1)$ when $\sigma = O(\sqrt{n/\log n})$, which is the regime where exact recovery takes place. Moreover, when $\|W\|_F$ is held fixed but many rows of $W$ are much smaller than $\|W\|_{2\to\infty}$ in norm, intuitively there is less fluctuation and better concentration. Indeed, Assumption 4 stipulates a tighter concentration bound by a factor of $\varphi\big(\|W\|_F/(\sqrt n\|W\|_{2\to\infty})\big)/\varphi(1)$, where $\|W\|_F/(\sqrt n\|W\|_{2\to\infty})$ is typically much smaller than $1$ due to non-uniformity of weights. This delicate concentration bound turns out to be crucial in the analysis of the SBM.

### 2.2 Entrywise perturbation of general eigenspaces

In this section, we generalize Theorem 1.1 from individual eigenvectors to eigenspaces under milder conditions that are characterized by additional parameters. Note that neither $U$ nor $U^*$ is uniquely defined, and they can only be determined up to a rotation if some eigenvalues are identical. For this reason, our result has to involve an orthogonal matrix. Beyond asserting that our result holds up to a suitable rotation, we give an explicit form of such an orthogonal matrix.

Let $H = U^\top U^* \in \mathbb{R}^{r\times r}$, and let its singular value decomposition be $H = \bar U\bar\Sigma\bar V^\top$, where $\bar U, \bar V \in \mathbb{R}^{r\times r}$ are orthonormal matrices, and $\bar\Sigma$ is a diagonal matrix. Define an orthonormal matrix $\mathrm{sgn}(H) \in \mathbb{R}^{r\times r}$ as

$$\mathrm{sgn}(H) := \bar U\bar V^\top. \qquad (8)$$

This orthogonal matrix is called the matrix sign function in Gro11. Now we are able to extend the results in Section 1.2 to general eigenspaces.
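A short sketch of the matrix sign function (our own toy example): if $U = U^*R$ for some $r\times r$ orthogonal $R$, then $H = U^\top U^* = R^\top$, and $U\,\mathrm{sgn}(H)$ rotates $U$ back onto $U^*$:

```python
import numpy as np

def matrix_sign(H):
    """sgn(H) = Ubar @ Vbar^T from the SVD H = Ubar Sigma Vbar^T."""
    Ubar, _, Vbar_t = np.linalg.svd(H)
    return Ubar @ Vbar_t

rng = np.random.default_rng(5)
U_star = np.linalg.qr(rng.standard_normal((6, 2)))[0]   # orthonormal columns
R = np.linalg.qr(rng.standard_normal((2, 2)))[0]        # random 2x2 orthogonal matrix
U = U_star @ R                                          # a rotated copy of U_star

H = U.T @ U_star                                        # here H = R^T exactly
S = matrix_sign(H)
print(np.abs(U @ S - U_star).max())                     # alignment error
```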

###### Theorem 2.1.

Under Assumptions A1–A4, with probability at least $1 - \delta_0 - 2\delta_1$ we have

$$\|U\|_{2\to\infty} \lesssim (\kappa + \varphi(1))\,\|U^*\|_{2\to\infty} + \gamma\|A^*\|_{2\to\infty}/\Delta^*, \qquad (9)$$

$$\|U\mathrm{sgn}(H) - AU^*(\Lambda^*)^{-1}\|_{2\to\infty} \lesssim \kappa(\kappa + \varphi(1))(\gamma + \varphi(\gamma))\,\|U^*\|_{2\to\infty} + \gamma\|A^*\|_{2\to\infty}/\Delta^*, \qquad (10)$$

$$\|U\mathrm{sgn}(H) - U^*\|_{2\to\infty} \le \|U\mathrm{sgn}(H) - AU^*(\Lambda^*)^{-1}\|_{2\to\infty} + \varphi(1)\,\|U^*\|_{2\to\infty}. \qquad (11)$$

Here the notation $\lesssim$ only hides absolute constants.

The third inequality (11) is derived by simply writing $U\mathrm{sgn}(H) - U^*$ as a sum of the higher-order error $U\mathrm{sgn}(H) - AU^*(\Lambda^*)^{-1}$ and the first-order error $AU^*(\Lambda^*)^{-1} - U^* = (A - A^*)U^*(\Lambda^*)^{-1}$, and bounding the latter by the row concentration Assumption A4. It will be useful for the noisy matrix completion problem. It is worth pointing out that Theorem 2.1 is applicable to any eigenvector of $A$, which does not have to be the leading one. This is particularly useful when applied to the stochastic block model in Section 3.2, since we need to analyze the second eigenvector there. Besides, we do not need $A^*$ to have low rank, although the examples to be presented have such structure. For low-rank $A^*$, estimation errors of all the eigenvectors can be well controlled by the following corollary of Theorem 2.1.

###### Corollary 2.1.

Let Assumptions A1–A4 hold, and suppose that $\mathrm{rank}(A^*) = r$. With probability at least $1 - \delta_0 - 2\delta_1$, we have

$$\|U\|_{2\to\infty} \lesssim (\kappa + \varphi(1))\,\|U^*\|_{2\to\infty}, \qquad (12)$$

$$\|U\mathrm{sgn}(H) - AU^*(\Lambda^*)^{-1}\|_{2\to\infty} \lesssim \kappa(\kappa + \varphi(1))(\gamma + \varphi(\gamma))\,\|U^*\|_{2\to\infty}, \qquad (13)$$

$$\|U\mathrm{sgn}(H) - U^*\|_{2\to\infty} \le \|U\mathrm{sgn}(H) - AU^*(\Lambda^*)^{-1}\|_{2\to\infty} + \varphi(1)\,\|U^*\|_{2\to\infty}. \qquad (14)$$

Here the notation $\lesssim$ only hides absolute constants.

Corollary 2.1 directly follows from Theorem 2.1, inequality (7) and the fact that $\|\Lambda^*\|_2 \le \kappa\Delta^*$. To understand the inequalities in Corollary 2.1, let us consider a special case: $A^* = \lambda^* u^*(u^*)^\top$ is a rank-one matrix with a positive leading (top) eigenvalue $\lambda^*$ and leading eigenvector $u^*$. Equivalently, we set $s = 0$ and $r = 1$. With this structure, we have $\Delta^* = \lambda^*$ and $\kappa = 1$. With this simplification, the random matrix $A$ is usually called a spiked Wigner matrix in statistics and random matrix theory.

Our first two inequalities are then simplified as

$$\|u\|_\infty \lesssim (1 + \varphi(1))\,\|u^*\|_\infty, \qquad (15)$$

$$\|u - Au^*/\lambda^*\|_\infty \lesssim (\gamma + \varphi(\gamma))(1 + \varphi(1))\,\|u^*\|_\infty, \qquad (16)$$

and Assumption A1 is equivalent to $\|u^*\|_\infty \le \gamma$. In words, the assumption requires that the eigenvector $u^*$ is not aligned with the canonical basis vectors $e_1, \dots, e_n$. This indeed agrees with the usual incoherence condition as in CRe09 and CLMW11. Note that Theorem 1.1 is a special case of Corollary 2.1, and hence of Theorem 2.1.

In many applications, $\varphi(1)$ is bounded by a constant, and $\gamma + \varphi(\gamma)$ is vanishing as $n$ tends to infinity. Under such conditions, the first inequality (15) then implies that the magnitude of the perturbed eigenvector $u$ is bounded by that of the true eigenvector $u^*$ in the $\ell_\infty$ sense. Furthermore, the second inequality (16) provides a finer approximation result: the difference between $u$ and $Au^*/\lambda^*$ is much smaller than $\|u^*\|_\infty$. Therefore, it is possible to study $u$ via $Au^*/\lambda^*$, which usually makes the analysis much easier.

## 3 Applications

### 3.1 Z2-synchronization and spiked Wigner model

The problem of $\mathbb{Z}_2$-synchronization is to recover unknown labels $x_1, \dots, x_n \in \{\pm1\}$ from noisy pairwise measurements. It has served as the prototype of more general group synchronization problems. Important cases include phase synchronization and $\mathrm{SO}(3)$-synchronization, in which one wishes to estimate the phases of signals or rotations of cameras/molecules etc. It is relevant to many problems, including time synchronization of distributed networks (giridhar2006distributed), calibration of cameras (tron2009distributed), and cryo-EM (shkolnisky2012viewing).

Consider an unknown signal $x \in \{\pm1\}^n$, where each entry only takes value in $\{1, -1\}$. Suppose we have measurements (or data) of the form $Y_{ij} = x_ix_j + \sigma W_{ij}$, where $\sigma > 0$, and $\{W_{ij}\}_{i<j}$ are i.i.d. $N(0,1)$ with $W_{ij} = W_{ji}$. We can define $W_{ii} \sim N(0,1)$ independently for simplicity, and write our model in a matrix form as follows:

$$Y = xx^\top + \sigma W, \qquad x \in \{\pm1\}^n. \qquad (17)$$

This is sometimes called the Gaussian $\mathbb{Z}_2$-synchronization problem, in contrast to the $\mathbb{Z}_2$-synchronization problem with Bernoulli-type noise, also called the censored block model (abbs). This problem can be further generalized: each entry $x_j$ is a unit-modulus complex number $e^{i\theta_j}$, if the goal is to estimate the unknown angles $\theta_j$ from pairwise measurements; or, each entry is an orthogonal matrix from $\mathrm{SO}(3)$, if the goal is to estimate the unknown orientations of molecules, cameras, etc. Here, we focus on the simplest case, that is, each $x_j$ is $\pm1$.

Note that in (17), both $Y$ and $W$ are symmetric matrices in $\mathbb{R}^{n\times n}$, and the data matrix $Y$ has a noisy rank-one decomposition. This falls into the spiked Wigner model. The quality of an estimator $\hat x$ is usually gauged either by its correlation with $x$, or by the proportion of labels it correctly recovers. It has been shown that the information-theoretic threshold for a nontrivial correlation is $\sigma \asymp \sqrt n$ (JMR16; yash_sbm; lelarge2016fundamental; perry2016optimality), and the threshold for exact recovery (i.e., $\hat x = \pm x$ with probability tending to $1$) is $\sigma = \sqrt{n/(2\log n)}$ (bandeira2014tightness).

When $\sigma \le \sqrt{n/((2+\varepsilon)\log n)}$ ($\varepsilon > 0$ is any constant), it is proved by bandeira2014tightness that semidefinite relaxation finds the maximum likelihood estimator and achieves exact recovery. However, in this paper, we show that a very simple method, both conceptually and computationally, also achieves exact recovery. This method is outlined as follows:

1. Compute the leading eigenvector $u$ of $Y$ (normalized so that $\|u\|_2 = 1$).

2. Take the estimate $\hat x = \mathrm{sgn}(u)$, i.e., $\hat x_i = \mathrm{sgn}(u_i)$ for each $i \in [n]$.
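The two steps above amount to a few lines of code; here is a sketch on synthetic data (the dimension and noise level are arbitrary choices inside the exact-recovery regime):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 800
sigma = 0.2 * np.sqrt(n / np.log(n))      # well below the exact-recovery threshold
x = rng.choice([-1.0, 1.0], size=n)       # hidden labels
G = rng.standard_normal((n, n))
W = np.triu(G) + np.triu(G, 1).T          # symmetric Gaussian noise
Y = np.outer(x, x) + sigma * W            # model (17)

u = np.linalg.eigh(Y)[1][:, -1]           # step 1: leading eigenvector of Y
x_hat = np.sign(u)                        # step 2: entrywise signs
agree = max(np.mean(x_hat == x), np.mean(-x_hat == x))  # up to a global sign
print(agree)
```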

Our next theorem asserts that this eigenvector-based method succeeds in finding $x$ asymptotically under $\sigma \le \sqrt{n/((2+\varepsilon)\log n)}$. Thus, under any regime where the MLE achieves exact recovery, our eigenvector estimator equals the MLE with high probability. This phenomenon holds also for other examples that we have examined, including stochastic block models.

###### Theorem 3.1.

Suppose $\sigma \le \sqrt{n/((2+\varepsilon)\log n)}$ where $\varepsilon > 0$. Then, with probability $1 - o(1)$, the leading eigenvector $u$ of $Y$ with normalization $\|u\|_2 = 1$ satisfies

$$\sqrt n\, \min_{i\in[n]}\{s\,x_iu_i\} \ge 1 - \sqrt{\frac{2}{2+\varepsilon}} - \frac{C}{\sqrt{\log n}},$$

for a suitable $s \in \{\pm1\}$, where $C$ is an absolute constant. As a consequence, our eigenvector-based method achieves exact recovery.

Note that our eigenvector approach does not utilize the structural constraint $x \in \{\pm1\}^n$, whereas such constraints are encoded in the SDP formulation (bandeira2014tightness). A natural question is an analysis of both methods with an increased noise level $\sigma$. A seminal work by JMR16 complements our story: the authors showed, via nonrigorous statistical mechanics arguments, that when $\sigma$ is near the exact-recovery threshold, the SDP-based approach outperforms the eigenvector approach. Nevertheless, with a slightly larger signal strength, there is no such advantage of the SDP approach.

Note also that, without exploiting the $\pm1$ nature of the signal, general results for spiked Wigner models (BBP05; FPe07; Ben11) imply that, with high probability, $u$ has nontrivial correlation with $x/\sqrt n$ when $\sigma \le (1-\varepsilon)\sqrt n$ for any small constant $\varepsilon > 0$. This is proved to be tight in yash_sbm, i.e., it is impossible to obtain nontrivial reconstruction from any estimator if $\sigma \ge (1+\varepsilon)\sqrt n$.

### 3.2 Stochastic Block Model

As is briefly discussed in Section 1, we focus on the symmetric SBM with two equally-sized groups (when the two groups have different sizes, the analysis only requires slight modification; note that the second eigenvector of $\mathbb{E}A$ depends on the sizes of the two groups). For simplicity, we include self-loops in the random graph (i.e., there are possibly edges from vertices to themselves). It makes little difference if self-loops are not allowed: in that case, the expectation of the adjacency matrix changes by at most $p = O(\log n/n)$ under the spectral norm, which is negligible; moreover, Assumptions A1–A4 still hold with the same parameters.

###### Definition 3.1.

Let $n$ be even, $J \subseteq [n]$ with $|J| = n/2$, and $p, q \in (0,1)$ with $p > q$. The SBM is the ensemble of symmetric random matrices $A$ where $\{A_{ij}\}_{i\le j}$ are independent Bernoulli random variables, $A_{ij} = A_{ji}$, and

$$\mathbb{P}(A_{ij} = 1) = \begin{cases} p, & \text{if } i\in J,\ j\in J\ \text{ or }\ i\in J^c,\ j\in J^c, \\ q, & \text{otherwise.} \end{cases} \qquad (18)$$

The community detection problem aims at finding the index set $J$ given only one realization of $A$. Let $x_i = 1$ if $i \in J$ and $x_i = -1$ otherwise. Equivalently, we want to find an estimator for the unknown labels $x$. Intuitively, the task is more difficult when $p$ is close to $q$, and when the magnitudes of $p$ and $q$ are small. It is impossible, for instance, to produce any meaningful estimator when $p = q$. The task is also impossible when $p$ and $q$ are as small as, for example, $o(n^{-2})$, since $A$ is a zero matrix with probability tending to $1$.

As is already discussed in Section 1, under the regime $p = a\log n/n$ and $q = b\log n/n$, where $a > b > 0$ are constants independent of $n$, it is information-theoretically impossible to achieve exact recovery (the estimate equals $x$ or $-x$ with probability tending to $1$) when $\sqrt a - \sqrt b < \sqrt 2$. In contrast, when $\sqrt a - \sqrt b > \sqrt 2$, the goal is efficiently achievable. Further, it is known that SDPs succeed down to the threshold. Under the regime $p = a/n$ and $q = b/n$, it is impossible to obtain nontrivial correlation (this means the correlation between the estimate and $x$ is at least $\varepsilon$ for some $\varepsilon > 0$, since a random guess would get roughly half the signs correct and hence zero correlation with $x$) between any estimator and $x$ when $(a-b)^2 < 2(a+b)$, and when $(a-b)^2 > 2(a+b)$, it is possible to obtain efficiently nontrivial correlation (massoulie-STOC; Mossel_SBM2).

Here we focus on th