Near-optimal bounds for phase synchronization

Abstract

The problem of phase synchronization is to estimate the phases (angles) of a complex unit-modulus vector $z$ from their noisy pairwise relative measurements $C = zz^* + \sigma W$, where $W$ is a complex-valued Gaussian random matrix. The maximum likelihood estimator (MLE) is a solution to a unit-modulus constrained quadratic programming problem, which is nonconvex. Existing works have proposed polynomial-time algorithms such as a semidefinite relaxation (SDP) approach or the generalized power method (GPM) to solve it. Numerical experiments suggest both of these methods succeed with high probability for noise levels $\sigma$ up to $O(\sqrt{n/\log n})$, yet existing analyses only confirm this observation for $\sigma$ up to $O(n^{1/4})$. In this paper, we bridge the gap by proving that the SDP relaxation is tight for $\sigma = O(\sqrt{n/\log n})$, and that GPM converges to the global optimum under the same regime. Moreover, we establish a linear convergence rate for GPM, and derive a tighter $\ell_\infty$ bound for the MLE. A novel technique we develop in this paper is to track (theoretically) $n$ closely related sequences of iterates, in addition to the sequence of iterates GPM actually produces. As a by-product, we obtain an $\ell_\infty$ perturbation bound for leading eigenvectors. Our result also confirms intuitions that use techniques from statistical mechanics.

Keywords:

angular synchronization, nonconvex optimization, semidefinite relaxation, power method, maximum likelihood estimator, eigenvector perturbation bound.

1 Introduction

Phase synchronization is the problem of estimating $n$ angles $\theta_1, \ldots, \theta_n$ in $[0, 2\pi)$ based on noisy measurements of their differences $\theta_j - \theta_k \bmod 2\pi$. This is equivalent to estimating phases $e^{i\theta_1}, \ldots, e^{i\theta_n}$ from measurements of the relative phases $e^{i(\theta_j - \theta_k)}$.

A typical noise model for this estimation problem is as follows. The target parameter (the signal) is the vector $z \in \mathbb{C}^n$ with entries $z_j = e^{i\theta_j}$. The measurements are stored in a matrix $C \in \mathbb{C}^{n \times n}$ such that, for $1 \le j < k \le n$,

$$C_{jk} = z_j \bar{z}_k + \sigma W_{jk},$$

where $\sigma \ge 0$ is the noise level and the $W_{jk}$ are independent standard complex Gaussian variables. Under this model, defining $C_{kj} = \overline{C_{jk}}$ and $W_{kj} = \overline{W_{jk}}$ for consistency, the model is compactly written in matrix notation as

$$C = zz^* + \sigma W,$$

where both $C$ and $W$ are Hermitian. An easy derivation1 shows that a maximum likelihood estimator (MLE) $\hat{x}$ for the signal $z$ is a global optimum of the following quadratically constrained quadratic program (we define $\mathbb{C}_1^n = \{x \in \mathbb{C}^n : |x_1| = \cdots = |x_n| = 1\}$):

$$\hat{x} \in \arg\max_{x \in \mathbb{C}_1^n}\; x^* C x. \tag{P}$$

Problem (P) is non-convex and hard in general ([?]). Yet, numerical experiments in [?] suggest that, provided $\sigma = O(\sqrt{n/\log n})$,2 the following convex semidefinite relaxation of (P) admits $\hat{x}\hat{x}^*$ as its unique global optimum with high probability (more generally, if the problem below admits a solution of rank 1, $X = xx^*$, then the relaxation is said to be tight and $x$ is an optimum of (P)):

$$\max_{X \in \mathbb{C}^{n \times n}}\; \mathrm{trace}(CX) \quad \text{subject to} \quad X \succeq 0 \text{ and } X_{jj} = 1 \text{ for all } j. \tag{SDP}$$

In this paper, we give a rigorous proof of this observation, improving on the previous best result, which only handles $\sigma = O(n^{1/4})$ [?]. Our result also provides some justification for the analytical prediction in [?] on optimality of the semidefinite relaxation approach.3
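To make the model and the program above concrete, here is a minimal simulation sketch in Python/NumPy (our own code, not from the paper; the helper names, the value of sigma, and the zero-diagonal convention for W are our choices):

```python
import numpy as np

def generate_data(n, sigma, rng=None):
    """Sample a unit-modulus signal z and measurements C = z z^* + sigma * W,
    with W Hermitian, zero on the diagonal, and standard complex Gaussian above it."""
    rng = np.random.default_rng(0) if rng is None else rng
    z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    W = np.triu(G, k=1)
    W = W + W.conj().T
    C = np.outer(z, z.conj()) + sigma * W
    return z, C

def mle_objective(x, C):
    """Objective of (P): x^* C x, real because C is Hermitian."""
    return np.real(np.vdot(x, C @ x))
```

The MLE is any maximizer of mle_objective over vectors whose entries all have unit modulus; the rest of the paper discusses how to find it.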

In the theorem statement, “up to phase” refers to the fact that the measurements are relative: the distribution of $C$ does not change if $z$ is replaced by $e^{i\phi}z$ for any angle $\phi$, so that if $\hat{x}$ is a solution of (P), then necessarily so is $e^{i\phi}\hat{x}$ for any $\phi$, and $z$ can only be recovered up to a global phase shift.

Theorem ? shows that the non-convex problem enjoys what is sometimes called hidden convexity, that is, in the proper noise regime, it is equivalent to a (tractable) convex problem. As a consequence, it is not a hard problem in that regime, suggesting local solvers may be able to solve it in its natural dimension. This is desirable, since the relaxation , while convex, has the disadvantage of lifting the problem from to dimensions.

And indeed, numerical experiments in [?] suggest that local optimization algorithms applied to directly succeed in the same regime as . This was confirmed theoretically in [?] for , using both a modification of the power method called the generalized power method (GPM) and local optimization algorithms acting directly on the search space of , which is a manifold. Results pertaining to GPM have been rapidly improved to allow for in [?].

In this paper, we consider a version of GPM listed as Algorithm ? and prove that it works in the same regime as the semidefinite relaxation, thus better capturing the empirical observation. Note that GPM, being a local algorithm, is more desirable in practice than the semidefinite relaxation. GPM and its variants are also considered in a number of related problems [?], and can be seen as special cases of the conditional gradient algorithm [?].

To establish both results, we develop an original proof technique based on following separate but closely related sequences of feasible points for , designed so that they will have suitable statistical independence properties. Furthermore, as a necessary step toward proving the main theorems, we prove an $\ell_\infty$ perturbation bound for eigenvectors, which is of independent interest.

It is worth noting that for , it is impossible to reliably detect, with probability tending to 1, whether is of the form or if it is only of the form [?], which suggests that is necessary in order for a good estimator to exist. This can be made precise by considering the simpler problem of synchronization,4 where we have the stronger knowledge that . For the synchronization problem, non-rigorous arguments that use techniques from statistical mechanics show is the information-theoretic threshold for mean squared estimation error (MSE): when is above this threshold, no estimator is able to beat the trivial estimator as [?]. In [?], it was rigorously proved that is the threshold for a different notion of MSE. These results a fortiori imply, for phase synchronization, that is necessary5 in order for an estimator to have nontrivial MSE (better than the trivial estimator ). It is also known that both the eigenvector estimator and the MLE have nontrivial MSE as soon as [?]. Whether the extra logarithmic factor is necessary to compute the MLE efficiently up to the threshold remains to be determined.

To close this introduction, we state the relevance of the MLE as an estimator for .

The bound on the $\ell_2$ error appears in [?], while the bound on the $\ell_\infty$ error improves on [?] as a by-product of the results obtained here. We remark that is necessary for a nontrivial error (smaller than , which is trivially attained by ) due to [?].6

It is important to state that the eigenvector estimator mentioned above is order-wise as good an estimator as the MLE, in that it satisfies the same error bounds as in Theorem ? up to constants. From the perspective of optimization, the main merit of Theorems 1 and 2 is that they rigorously explain the empirically observed tractability of (P) despite non-convexity.

The difficulty: statistical dependence

As will be argued momentarily, the main difficulty in the analysis is proving a sharp bound for $\|W\hat{x}\|_\infty$, which involves two dependent random quantities: the noise matrix $W$ and a solution $\hat{x}$ of (P), which is a nontrivial function of $W$. While in the $\ell_2$-norm the simple bound $\|W\hat{x}\|_2 \le \|W\|_{\mathrm{op}}\|\hat{x}\|_2$ is sharp, no such simple argument is known to bound the $\ell_\infty$-norm. The need to study perturbations in the $\ell_\infty$-norm appears inescapable, as it arises from the entry-wise constraints of (P) and the aim to control errors in $\ell_\infty$ as well as in $\ell_2$.

This issue has already been raised in [?], which focuses on the relaxation (Equation 4). Specifically, in [?], it is shown that the relaxation is tight in particular if

where is an optimum of which is then unique up to phase. From this expression, it is apparent that if , it only remains to show that to conclude that solving is equivalent to solving . This reduces the task to that of carefully bounding this scalar random variable:

where $w_1, \ldots, w_n$ are the columns of the random noise matrix $W$. If $w_m$ and $\hat{x}$ were statistically independent, this would be bounded with high probability by $O(\sqrt{n \log n})$, as desired. Indeed, since the vector $\hat{x}$ contains only phases and since the Gaussian distribution is isotropic (the distribution is invariant under rotation in the complex plane), $w_m^* \hat{x}$ would be distributed identically to a sum of $n$ independent standard complex Gaussians. The modulus of such a variable concentrates close to $\sqrt{n}$. Taking the maximum over $m$ incurs an additional $\sqrt{\log n}$ factor.
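As a quick numerical sanity check of this heuristic (an illustration only; it does not replace the argument needed when the estimator depends on the noise):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1000, 200
vals = []
for _ in range(trials):
    w = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)   # Gaussian column
    x = np.exp(1j * rng.uniform(0, 2 * np.pi, n))                             # phases independent of w
    vals.append(abs(np.vdot(w, x)))                                           # |w^* x|

print(np.sqrt(np.mean(np.square(vals))) / np.sqrt(n))   # close to 1: |w^* x| is of order sqrt(n)
```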

Unfortunately, the intricate dependence between and has not been satisfactorily resolved in previous work, where only suboptimal bounds have been produced for , eventually leading to suboptimal bounds on the acceptable noise levels [?][?][?].

As a key step to overcome this difficulty, we (theoretically) introduce auxiliary problems to transform the question of controlling into one about the sensitivity of the optimum to perturbations of the data. This is outlined next.

Introducing auxiliary problems to reduce dependence

Since the main concern in controlling is the statistical dependence between and , we introduce new optimization problems of the form , where, for each value of in , the cost matrix is replaced by

where is the indicator function. In other terms, is with the th row and column set to 0, so that is statistically independent from . As a result, a global optimum of with set to is also independent from . This usefully informs the following observation, where the global phases of and are chosen so that :

Crucially, independence of and implies the first term is with high probability, by the argument laid out after eq. . In the second term, a standard concentration argument shows with high probability—see Section 3. Hence, to control , it is sufficient to show that, with high probability for all , the solutions and are within distance of each other, in the sense.

This claim about the proximity of and turns out to be a delicate statement about the sensitivity of the global optimum of to perturbations of only measurements which involve the th phase, . To establish it, we need precise control of the properties of the optima of . To this end, we develop a strategy to track the properties of sequences which converge to as well as to for each .
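For intuition, here is a minimal sketch of one natural way to form such an auxiliary cost matrix, consistent with the description above: the noise entries touching the $m$-th phase are zeroed out, so the resulting matrix (and any optimizer computed from it) is statistically independent of the $m$-th column of the noise. The paper's exact indicator-function definition is elided in the text, so the helper below is an assumption-labeled illustration; it also uses the unknown signal and noise separately, which is only possible in the analysis, not in practice.

```python
import numpy as np

def auxiliary_cost(z, W, sigma, m):
    """Auxiliary cost matrix: z z^* + sigma * W^(m), where W^(m) is W with its
    m-th row and column set to zero. It is independent of the m-th column of W.
    (Analysis-only construction: it requires knowing z and W separately.)"""
    Wm = W.copy()
    Wm[m, :] = 0
    Wm[:, m] = 0
    return np.outer(z, z.conj()) + sigma * Wm
```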

To ease further discussion about distances up to phase, consider the following distance-like function:

Restricted to complex vectors of given -norm or to complex vectors with unit-modulus entries, is a true metric on the quotient space induced by the equivalence relation :

Thus, is appropriate as a distance between estimators for and as a distance between candidate eigenvectors, being invariant under global phase shifts. Moreover, the quotient space is a complete metric space with . More details will follow.
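For concreteness, here is a small sketch of how such distances can be computed numerically (our own helpers: the optimal aligning phase for the $\ell_2$ version has the closed form $\mathrm{Arg}(y^* x)$, while the $\ell_\infty$ version is approximated by a grid search over the global phase):

```python
import numpy as np

def dist2(x, y):
    """min over theta of ||x - e^{i theta} y||_2; the minimizer is theta = Arg(y^* x)."""
    inner = np.vdot(y, x)                                  # y^* x
    phase = inner / abs(inner) if inner != 0 else 1.0
    return np.linalg.norm(x - phase * y)

def dist_inf(x, y, grid=10000):
    """min over theta of ||x - e^{i theta} y||_inf, approximated on a grid of angles."""
    thetas = np.linspace(0, 2 * np.pi, grid, endpoint=False)
    return min(np.max(np.abs(x - np.exp(1j * t) * y)) for t in thetas)
```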

Coming back to our problem, the core argument is an analysis of recursive error bounds of GPM, and this analysis leads to the proof that all iterates stay in , where

and are some constants (determined in Section 4). On one hand, we show that, with high probability, the nonlinear mapping iterated by GPM is Lipschitz continuous over with constant . On the other hand, we show that, with high probability, all iterates of GPM are in . Together, these two properties imply that, with high probability, is a contraction mapping over the set of iterates of GPM. By a completeness argument, this implies that the sequence of iterates of GPM converges in . The roadmap of our proof is the following:

  1. (this requires developing new bounds for eigenvector perturbation—see Theorem ?),

  2. (this is done in two stages,7 for small and large —see Theorems ? and ?),

  3. is -Lipschitz with on with respect to (by completeness, this implies —see Lemma ?), and

  4. any fixed point of in is a global optimum of (see Lemma ?).

On top of securing results about GPM, this will imply that admits a solution which is in and hence, a fortiori, satisfies , yielding the announced results about the SDP relaxation as per .

As hinted above, we follow this reasoning not only for the sequence which is expected to converge to , but also for auxiliary sequences expected to converge to . It is only through exploitation of the strong links between these sequences and reduction in statistical dependence they offer that we are able to go through with the proof program above.

Note that might not be a contraction mapping on all of since we do not show that . Nevertheless, is a contraction on the iterates, which is sufficient for our purpose; henceforth, we say the mapping has the local contraction property.

We remark that, in the study of high-dimensional -estimation [?], the idea of introducing auxiliary problems (and associated optimizers) is also used to tackle dependence, and it yields powerful analysis. While sharing similarity with that approach, our analysis relies on studying auxiliary sequences of iterates, as will be discussed soon—also see Figure 1.

As a necessary and useful warm-up, we first focus on the task of showing that (a leading eigenvector of ) is in , via analysis of the related (leading eigenvectors of ). This requires sharp bounds for . The outcome of this analysis is an eigenvector perturbation bound in the $\ell_\infty$-norm, which is another motivation for the introduction of auxiliary problems.

First analysis: an $\ell_\infty$ perturbation bound for eigenvectors

As the initializer of Algorithm ?, the leading eigenvector of has several good properties necessary for analysis, and we will discuss them in depth in Section 3. Theorems and lemmas in this direction are stated separately and proved first, because their proofs are illustrative of the techniques deployed to prove results about the MLE. Most notably, we prove a sharp $\ell_\infty$ perturbation bound for leading eigenvectors.

The crux of the proof lies in a sharp bound on . This is obtained by using (a suitable version of) the Davis–Kahan theorem (Lemma ?): when ,

To reach the first conclusion, we view as the perturbed version of due to perturbation . Note that has nonzero entries only in the th row and th column, and they are independent of . Compared to the full perturbation which perturbs the eigenvector to , the matrix results in a much smaller distance between and . Notice that, as will be detailed later, these results combined with the reasoning of imply that is in , as desired.

Comparing Theorem ? to Theorem ? readily shows that the eigenvector is an excellent estimator for (up to the fact that its entries are not necessarily unit-modulus, which can be easily corrected—see Theorem ?). Further efforts in this paper are dedicated to characterizing the performance and tractability of the MLE .

Analysis of iterations: tracking auxiliary sequences

While analyzing the eigenvector is relatively straightforward, the optima of are more difficult to tame due to the unit-modulus constraints. As hinted above, the novel idea we develop in this paper is to track the sequences produced by Algorithm ? with inputs instead of , for each . These auxiliary sequences—which only serve for the analysis and are not (and could not be) computed in practice—enjoy the crucial proximity property desired in the previous subsection—see Figure 1.

Figure 1: Sequence \{x^t\}_{t=0}^\infty (in black) produced by Algorithm , and n auxiliary sequences \{x^{t,m}\}_{t=0}^\infty (in red) produced (conceptually) by Algorithm  with modified inputs. Crucial properties along the paths: (i) proximity: x^t and x^{t,m} stay close; (ii) local contraction: x^t and x^{t,m} remain in the contraction region \mathcal{N} with high probability and converge in it.

Indeed, we will show by induction that there exist absolute constants such that, for all and for ,

The proximity and local contraction properties8 are both crucial and complementary for the analysis: the proximity property allows us to control quantities in the presence of the random matrix despite statistical dependence (as shown in ), and the local contraction property is used to establish below, making sure remains small.

For the high-level idea, consider the nonlinear operators and implicitly defined by Algorithm ? so that and . If we can show that is -Lipschitz with constant with respect to , then a recursive error bound follows:

This ensures does not accumulate with , provided the discrepancy error—which is caused by the difference between and —is small enough. This is assured with high probability, because is independent of , causing the discrepancy error to be —considerably smaller than . In spirit, this is the same argument as in the analysis of the eigenvector estimator.
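Schematically, in the notation of Figure 1, with $\rho < 1$ the contraction ratio and $\varepsilon_t$ the per-iteration discrepancy caused by replacing the cost matrix with its auxiliary version (constants suppressed; this is a sketch of the shape of the bound, not a verbatim statement from the paper):

\[
d\big(x^{t+1}, x^{t+1,m}\big) \;\le\; \rho\, d\big(x^{t}, x^{t,m}\big) + \varepsilon_t,
\qquad\text{hence}\qquad
d\big(x^{t}, x^{t,m}\big) \;\le\; \rho^{t}\, d\big(x^{0}, x^{0,m}\big) + \frac{\max_{s<t}\varepsilon_s}{1-\rho},
\]

so the proximity error does not accumulate with $t$ as long as the initial proximity and the discrepancies are small.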

The above recursive error bound hinges on the other important property, that is, staying in the contraction region. Crucially, to establish , we need a tight bound on . Fortunately, we have seen how to control this quantity in : for any ,

This, in turn, requires a proximity result for . This insight naturally motivates an analysis of each iteration by induction.

There are two technical issues we briefly address before ending this introduction with a remark.

The first issue concerns the probabilistic argument in the proof. In (Equation 12) and (Equation 13), we invoke concentration inequalities to obtain tight bounds. However, since there is a (small) probability that such inequalities fail, we cannot use union bounds for , which is infinite. To overcome this obstacle, we use concentration inequalities only for the first iterations, and resort to a deterministic analysis for iterations . The critical observation is that decays exponentially for due to contraction, so the amount of update is tiny after iterations. Using another inductive argument, we can secure exponential decay for as well. The rationale is that we already established good properties about , and is tiny for , so we can easily relate to and show also has good properties. Essentially, remains in a contraction region with slightly larger constants.

The second issue is identifying the limit with a solution of . We will verify the optimality and uniqueness (up to phase) of via a known dual optimality certificate [?].

We close with a remark about the initializer (and ). Algorithm ? uses the leading eigenvector of for initialization, and our analysis relies on perturbation bounds to verify the base case of the induction. However, we point out that, even in the absence of such perturbation results, we could set in theory, deduce that and thus prove Theorem ? about the SDP relaxation. In other words, even if we leave out the discussion about the initializer altogether, there is still enough material to secure tightness of the SDP relaxation. The proof augmented with the analysis of the eigenvector perturbation has the advantage of also providing a statement about Algorithm ? which is actually runnable in practice.

2 Main results

In this section we will state our main theorems formally. The assumption on random noise will also be relaxed to a broader class of random matrices. To begin with, let us first clarify the “up to phase” statements in Section 1.

The quotient space. For any , whether the true phases are or does not affect the measurements . As a result, the available data are insufficient to distinguish from . Clearly the program is invariant to global phase shifts as well. It thus makes sense to ignore the global phase in defining distances between estimators. A reasonable notion of error then becomes

where the optimal phase is the phase (Arg) of . Similarly, a notion of $\ell_\infty$ error can be defined:

Formally, one can partition all points in into equivalence classes via the equivalence relation . The resulting quotient space contains the equivalence classes for all . Specifically, the feasible set of ,

reduces to under this equivalence relation. It is easily verified that defines a distance on . In particular, it satisfies the triangle inequality (where is understood to mean ):

Moreover, is a complete metric space under (see Theorem ?). Similarly, is also a distance. As will be shown, the sequence described in Section 1 satisfies the local contraction property (see ) on the metric space , hence converges to a fixed point which is exactly (understood as ).

The noise matrix. In Section 1 we assume that has independent standard complex Gaussian variables above its diagonal. However, this restricted assumption is only for expository convenience, and can be relaxed to the class of Hermitian Wigner matrices with sub-gaussian entries. Statements about Algorithm ? and about tightness of the SDP relaxation continue to hold, although of course the solution of now no longer necessarily corresponds to the MLE.

The class of sub-gaussian variables subsumes Gaussian variables and shares their defining feature: the tail probability decays at least as fast as that of a Gaussian. In our model, each entry of satisfies the tail bound

for both real and imaginary parts, where is an absolute constant. Formally, we assume the Hermitian matrix satisfies the following: are jointly independent, have zero mean, and satisfy the sub-gaussian tail bound (Equation 17); the diagonal elements are zero, and for any . Note there are equivalent definitions of sub-gaussian variables (up to constants) [?].

This random model is a much richer class of noise matrices, containing the Gaussian model introduced in Section 1 as a special case. Each random variable in can be, for example, a symmetric Bernoulli variable, any other centered and bounded variable, or simply zero.

SDP approach. The SDP approach tries to solve (Equation 3) via its convex semidefinite relaxation (Equation 4). It is a relaxation of in the following sense. For any feasible , the corresponding matrix is feasible for . Likewise, any feasible matrix of rank 1 can be factored as such that is feasible for . Thus, the relaxation consists in allowing solutions of rank more than 1 in . Consequently, if admits a solution of rank 1, , then the corresponding is a global optimum for . Furthermore, if the rank-1 solution of is unique, it can be recovered in polynomial time. For this reason, the regime of interest is one where admits a unique solution of rank 1.
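A minimal sketch of this relaxation in CVXPY (an illustration under the standard form of the relaxation: maximize trace(CX) over Hermitian PSD matrices with unit diagonal; the solver defaults and the rank-1 test threshold are our own choices):

```python
import numpy as np
import cvxpy as cp

def sdp_relaxation(C, rank1_tol=1e-3):
    """Solve max trace(C X) s.t. X Hermitian PSD with unit diagonal.
    If the solution is numerically rank 1, return its unit-modulus factor."""
    n = C.shape[0]
    X = cp.Variable((n, n), hermitian=True)
    problem = cp.Problem(cp.Maximize(cp.real(cp.trace(C @ X))),
                         [X >> 0, cp.diag(X) == 1])
    problem.solve()
    eigvals, eigvecs = np.linalg.eigh(X.value)
    if eigvals[-2] < rank1_tol * eigvals[-1]:      # tight: essentially rank 1
        x = np.sqrt(eigvals[-1]) * eigvecs[:, -1]
        return x / np.abs(x)                       # entries projected to phases
    return None
```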

The following theorem—a statement of Theorem ? which holds in the broadened noise model—closes the gap in previous papers [?].

Note that the exponent 2 in the failure probability can be replaced by any positive numerical constant, only affecting other absolute constants in the theorem (and all other theorems).

GPM approach. The generalized power method (Algorithm ?) is an iterative algorithm similar to the classical power method, but instead of projecting vectors onto a sphere after matrix-vector multiplication , it extracts the phases from , which is an entry-wise projection. It is much faster than SDP, and converges linearly to a limit, which is the optimum up to phase (optimality is stated in Theorem ?). The next theorem is a precise version of Theorem ?.
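A minimal sketch of a generalized power iteration of this kind (our own rendering: the initializer is the leading eigenvector scaled to norm $\sqrt{n}$, and each step extracts the entry-wise phases of $Cx$; the exact rescaling of $C$ and the stopping rule in Algorithm ? may differ):

```python
import numpy as np

def phases(u):
    """Entry-wise projection onto the unit circle (entries assumed nonzero)."""
    return u / np.abs(u)

def gpm(C, iters=200, tol=1e-12):
    """Generalized power method for phase synchronization."""
    n = C.shape[0]
    eigvals, eigvecs = np.linalg.eigh(C)
    x = np.sqrt(n) * eigvecs[:, -1]        # leading eigenvector, scaled to norm sqrt(n)
    for _ in range(iters):
        x_new = phases(C @ x)              # entry-wise phase extraction, not sphere projection
        if np.linalg.norm(x_new - x) <= tol * np.sqrt(n):
            return x_new
        x = x_new
    return x
```

For instance, one might run x_hat = gpm(C) on data from the generate_data sketch above and compare x_hat to z with the distance helpers defined earlier.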

The proof is based on induction: in each iteration, we will establish the proximity property and the contraction property for . The proof is simply a rigorous justification of the heuristics we discussed in Section 1. It also leads to $\ell_2$ and $\ell_\infty$ error bounds for the MLE. The following theorem is a precise version of Theorem ?.

Eigenvector estimator. We denote, henceforth, the leading eigenvector of by , and similarly the leading eigenvector of by . Note that and (similarly and ) are identical.9 We highlight the significance of the eigenvector estimator in the following theorem, which is a precise version of Theorem ?.

The eigenvector estimator has been studied extensively in recent years, prominently in the statistics literature [?], under the spiked covariance model. While the perturbation is usually studied under $\ell_2$-type norms, the $\ell_\infty$ norm has received much less attention. A recent perturbation result appeared in [?], but it is a deterministic bound and would produce a suboptimal result here.

3 Proof organization for eigenvector perturbations

We begin with some concentration lemmas, which will also be useful in Section 4. Recall the definition of . We also define , which has nonzero entries only in the th row and th column, given by .

Concentration lemmas

The first concentration result is standard and is a direct consequence of, for example, Proposition 2.4 in [?].

Let be the set of unit vectors in . Suppose for each we have a finite (random) set whose elements are independent of , and the cardinality of is not random. Concentration inequalities enable us to bound uniformly over all with high probability. We state this formally in the next lemma.

A straightforward application of Hoeffding’s inequality for sub-gaussian variables [?] shows with probability . Lemma ? is more general, because , and it will be useful in later proofs.

For the eigenvector problem, we will choose (a singleton) where is a leading eigenvector of scaled to have norm . For problem (Equation 3), for each , the set will be , namely, the first iterates of Algorithm ? with input , where . By construction, elements of the set are independent of .

Introducing auxiliary eigenvector problems

As is well known, the leading eigenvectors of are the solutions to the following optimization problem (note that this problem is a relaxation of ):

We aim to show that a solution of is close to in the sense of . As before, the major difficulty of the analysis is obtaining a sharp bound on . This is apparent when we write for the leading eigenvalue of and use to obtain (choosing the global phase of such that ):

While it is easy to analyze and , bounding requires more work.

For , let be the solution to an auxiliary problem in which is replaced by —thus, is equivalent to . Following the same strategy as in , we can now split into two terms and try to bound separately:

where is the dominant term, and can be easily bounded—see the paragraph below Lemma ?; and is the higher-order discrepancy error, which is the price we pay for replacing with .

The crucial point is that , which is much smaller than . This is because the difference between and results from a sparse perturbation , whose effect on the leading eigenvector is small. This point is formalized in the next lemma, which follows from [?].10

The benefit of this perturbation result is pronounced when is a sparse random matrix: if we set , then the numerator in ( ?) becomes with high probability (by Lemma ? and a bound on ). If, however, we set , then the numerator is with high probability. This is why is so small and (Equation 19) yields a tight bound.

We remark that in many later uses of perturbation results (e.g., [?]), especially in statistics and theoretical computer science, it is common to invoke a variant of the Davis–Kahan theorem in which is replaced by in ( ?), which would lead to a suboptimal result here. This is because with high probability when is a random sparse matrix and is not large. Our analysis here is an example that shows the merit of using the more precise version of the Davis–Kahan theorem.
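Schematically, in our notation ($u$ a leading eigenvector before perturbation, $\tilde{u}$ after adding a perturbation $E$, $\delta$ an eigengap, with constants and the precise gap definition suppressed), the contrast is between

\[
d_2(\tilde{u}, u) \;\lesssim\; \frac{\|E\,u\|_2}{\delta}
\qquad\text{and the cruder}\qquad
d_2(\tilde{u}, u) \;\lesssim\; \frac{\|E\|_{\mathrm{op}}\,\|u\|_2}{\delta};
\]

when $E$ is a sparse random matrix, $\|E\,u\|_2$ can be far smaller than $\|E\|_{\mathrm{op}}\|u\|_2$, which is why the first form gives the sharper result here.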

4 Proof organization for phase synchronization

We begin with some lemmas about the local contraction property. These will prove useful for establishing the desired properties of the iterates of Algorithm ?, by induction. These properties extend to the limit by continuity. Finally, we will use a known optimality certificate for to validate .

Local contraction lemmas

First let us denote the rescaled matrix by , which can also be viewed as a linear operator in :

This is a linear combination of (signal) and (noise). When is small compared to , is Lipschitz continuous on the sphere around , with respect to .

This lemma is instrumental in establishing the key local contraction property (Lemma ?). It is related to the contraction mapping theorem, in which an iteratively defined sequence converges to a fixed point. We could use this lemma to easily show (using [?] for example) that the normalized version (the power method operator) is a contraction mapping in a neighborhood of the signal on the sphere.

However, our problem is more complicated due to the unit-modulus constraints in which call for the entry-wise operator in Algorithm ?. Consequently, an analysis of Algorithm ? requires entry-wise bounds on key quantities in each iteration, which are more involved than bounds. In the next two lemmas, we will see which entry-wise bounds we need in order to establish the local contraction property.

Recall that this operator maps each entry of a vector to the unit circle in the complex plane: an entry $u_k \neq 0$ is mapped to $u_k / |u_k|$. The case $u_k = 0$ will not appear in the proofs because it can be excluded with high probability. Henceforth, we also drop parentheses when applying this operator, for simplicity.

In the next two lemmas, we establish the local contraction property of (the GPM operator) under the distance . Under certain conditions on the input points, shrinks the distance between points by a ratio in . In a rigorous sense, this does not imply that is a contraction mapping, because the output points do not necessarily satisfy the conditions themselves. However, for the sequences of interest, the conditions are satisfied with high probability and this is all we need to ascertain convergence—see Theorems ? and ?.

This lemma says is Lipschitz continuous (in the quotient space ) in a region where and are uniformly lower bounded. The composition , will have the local contraction property as long as the contraction ratio in ( ?) is small enough, and the in ( ?) is not too close to . The next lemma formalizes this result. For later use in the proofs, we also introduce an additional notation: for a Hermitian matrix , we let .

This deterministic lemma states that has the local contraction property in a region where ( ?) and ( ?) are satisfied, as long as . Note that we have to require bounds in because of the entry-wise nature of . In the next subsection, we use this lemma to show that, with high probability, is controlled by where the ratio lies in .

Convergence analysis

Let us denote the rescaled matrix by , where . Also let . Recall that is identical to the leading eigenvector of , and is identical to the leading eigenvector of ; and they are normalized such that . Algorithm ? iterates . The auxiliary sequences defined for theoretical analysis (not implemented in practice) follow a similar update rule: (Figure 1). Note that, for all , each entry of and has unit modulus, i.e., , but are not in in general.

As shown in Lemma ?, the local contraction property of hinges on the condition that the vectors to be updated are in the contraction region , where and are defined in (Equation 11) and ( ?). The absolute constants in their definitions will be specified in Theorem ?.

In order to show the iterates stay in , we analyze the dependence between the random quantities and by making use of the auxiliary sequences , as illustrated by eq. (Equation 13). As shown in the analysis of eigenvectors in Section 3, we know is small. Owing to the local contraction property (Lemma ?), we can prove the recursive error bound (Equation 12), which ensures that is bounded throughout all iterations. The analysis is based on induction.
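As a purely numerical illustration of the proximity that the induction establishes (reusing the hypothetical helpers generate_data, auxiliary_cost, phases, and dist_inf sketched earlier; the parameters are arbitrary):

```python
import numpy as np

n, sigma, m = 500, 2.0, 0
z, C = generate_data(n, sigma)
W = (C - np.outer(z, z.conj())) / sigma          # recover the noise (analysis only)
Cm = auxiliary_cost(z, W, sigma, m)              # auxiliary cost matrix for index m

x = np.sqrt(n) * np.linalg.eigh(C)[1][:, -1]     # main run: leading eigenvector of C
xm = np.sqrt(n) * np.linalg.eigh(Cm)[1][:, -1]   # auxiliary run: leading eigenvector of Cm
for t in range(30):
    x, xm = phases(C @ x), phases(Cm @ xm)

print(dist_inf(x, xm, grid=2000))                # the two runs stay close entry-wise
```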

As discussed in Section 1, a technical issue is that we cannot use concentration results infinitely many times for all , because we use the union bound to achieve a high probability result. We study the first iterates using concentration results, and resort to a deterministic analysis for later iterations. In the following theorems, the constants are those in Lemmas ? and ?.

Note that ( ?) and ( ?) guarantee . Considering from our proof map,

Assuming ( ?) and ( ?) for the case , by the local contraction property (Lemma ?), the first term is bounded by where . By the concentration bounds (Lemma ?), the second term is bounded by with high probability. Therefore, it is expected that ( ?) continues to hold for the case .

This proximity property ( ?) is crucial to show that stays in the local region . To bound , we use the concentration bounds (Lemma ?) and the proximity property ( ?) in the inequality (Equation 13).

To bound , we derive an entry-wise bound on , then use Lemma ?. This is straightforward once we have a bound on .

The next result says decreases geometrically for , which is notably useful to analyze later iterations .

This result is similar to the contraction mapping theorem (though we cannot prove for all ), which says the sequence produced by a contraction mapping is a Cauchy sequence and satisfies an inequality similar to ( ?). After iterations, the update from to is almost negligible. Although we no longer rely on concentration results, we will show, by induction again, that remains very small for all . This ensures that stays in a slightly larger contraction region for all (with larger constants). The next theorem depends crucially on the conclusions of Theorems ? and ?.

Under the stated conditions, this theorem is a deterministic result. By Theorems ? and ?, the conditions hold with high probability (note that when ). This theorem establishes the convergence of and, importantly, the bounds extend to the limit by continuity. This strong characterization of the limit point puts us in a favorable position to verify optimality.

Verifying optimality

To verify optimality of , it is convenient to use a known dual certificate for the SDP relaxation. The following is a combination of Lemmas 4.3 and 4.4 in [?].

To simplify notation, let . Using the same developments as in [?], we verify that is positive semidefinite and has rank under condition on and the conclusions of Theorem ?, namely, inequalities and equation . By construction, . Hence, it is sufficient to verify that for all with and :

(Owing to , we used , with appropriate choice of .) Now using and assuming and the bounds hold, it follows that

Assume satisfies inequality . Then, using and as in Theorem ?,

Thus, is positive semidefinite and has rank —which implies is the unique solution of and is the unique solution of up to phase by Lemma ?—provided satisfies , and the conclusions of Theorem ? hold. Theorem ? follows directly; details for Theorems ?? are in the appendix.
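As a numerical companion to this verification (a sketch under our assumptions: we take the certificate matrix to be $\mathrm{Diag}(\lambda) - C$ with $\lambda_j = \mathrm{Re}((C\hat{x})_j \overline{\hat{x}_j})$, a standard dual certificate for this SDP; the paper's Lemma ? may state it with different conventions):

```python
import numpy as np

def certificate_check(C, x_hat, tol=1e-6):
    """Check a dual certificate S = Diag(lam) - C with lam_j = Re((C x_hat)_j * conj(x_hat_j)).
    If S is PSD and its kernel is one-dimensional (spanned by x_hat), then x_hat x_hat^*
    is the unique solution of the relaxation and x_hat is optimal up to a global phase."""
    lam = np.real((C @ x_hat) * np.conj(x_hat))
    S = np.diag(lam) - C
    eigvals = np.sort(np.linalg.eigvalsh(S))
    return eigvals[0] > -tol and eigvals[1] > tol   # PSD, and rank n - 1

# usage sketch with the hypothetical helpers from earlier:
# z, C = generate_data(500, 2.0)
# print(certificate_check(C, gpm(C)))
```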

5 Conclusions and perspectives

We proved that both semidefinite relaxation and the generalized power method are able to find the global optimum of (Equation 3) under the regime $\sigma = O(\sqrt{n/\log n})$ with high probability. In other words, the maximum likelihood estimator of phase synchronization is computationally feasible under noise level $\sigma = O(\sqrt{n/\log n})$, which (nearly) matches the information-theoretic threshold, thus closing the gap in previous papers. We also derived $\ell_2$ and $\ell_\infty$ bounds on the optimum, and the $\ell_\infty$ bound improves upon previous results. The proof is based on tracking auxiliary sequences, which is a novel technique developed in this paper. As a by-product, we also proved an $\ell_\infty$ bound for the eigenvector estimator, which is of independent interest.

An interesting problem for future work is to prove (or disprove) that second-order necessary optimality conditions are sufficient for (Equation 3). If this is true, then any algorithm that finds a second-order critical point also solves the nonconvex problem (Equation 3). This was proved in [?] for , then in [?] for . Numerical experiments in [?] suggest that a local optimization method (namely, the Riemannian trust-region method) finds the global optimum from a random initialization. The analysis presented here does not apply directly though, because it hinges on a characterization of the limit points of GPM: a priori, this does not allow us to characterize all second-order critical points.

A natural extension of our work is to establish similar results for synchronization over SO(d) [?] and SE(d) [?]. The general synchronization problem is to recover group elements $g_1, \ldots, g_n$ from noisy pairwise measurements of the relative elements $g_j g_k^{-1}$. Our work here addresses synchronization over the group SO(2) (equivalently, the group U(1)), that is, in-plane rotations (equivalently, points on the unit complex circle). Another important group in practice is the rotation group SO(3), which is often used to describe the orientation of an object [?]. It is shown empirically in [?] that the Riemannian trust-region method performs well. The analysis may be complicated by the fact that SO(3) is a non-commutative group.

Another important problem in practice is to handle incomplete measurement sets. In this paper, we suppose all entries of are known. A more realistic setting is that only some pairs of phase differences are measured, forming the edges of a graph. This appears in many applications [?], and is addressed in a number of papers [?]. The effect of an incomplete measurement graph on fundamental bounds is well understood as being related to the Laplacian of the graph [?]. See [?] for robotics applications.

Finally, another problem of practical concern is robustness of estimation methods. Here, (Equation 3) minimizes the sum of squared errors. However, in practice, more robust methods may be required to deal with outliers. In [?] for example, the authors minimize a sum of unsquared errors. A common way to solve such problems is via iterative reweighted least squares (IRLS), which is widely used in statistics [?]. IRLS solves a weighted least squares problem in each step, where the weights depend on the current iterate. In this regard, our analysis could be a first step toward understanding robust methods with IRLS for synchronization problems.

Acknowledgements

The authors thank Afonso Bandeira, Amelia Perry, Alexander Wein, Amit Singer and Jianqing Fan for helpful discussions.


A Proofs

Proofs for Section

Let be the upper triangular part of , i.e., , and . Then are all matrices with independent and sub-gaussian entries, whose sub-gaussian moments are bounded by an absolute constant (see [?] for equivalent definitions of sub-gaussian variables). We can then apply Proposition 2.4 in [?] and obtain the desired concentration bound on . For , we take a union bound over choice of . The bound on follows from those on and . The bound on follows from that on , since

We will prove this lemma in the case where is a deterministic set. The case where is random follows easily from the deterministic case, since we can first condition on and use independence between and . In the proof, notations denote some absolute constants. For a fixed and ,

We will bound the two parts on the right-hand side separately. We can expand into a sum: . By assumption, is a sub-gaussian random variable. By Hoeffding’s inequality for sub-gaussian variables [?],

For sums of random variables of , or over , similar concentration results hold. Thus, we can set , where is some large absolute constant such that