Multi-Frequency Phase Synchronization
We propose a novel formulation for phase synchronization—the statistical problem of jointly estimating alignment angles from noisy pairwise comparisons—as a nonconvex optimization problem that enforces consistency among the pairwise comparisons in multiple frequency channels. Inspired by harmonic retrieval in signal processing, we develop a simple yet efficient two-stage algorithm that leverages the multi-frequency information. We demonstrate in theory and practice that the proposed algorithm significantly outperforms state-of-the-art phase synchronization algorithms, at a mild computational costs incurred by using the extra frequency channels. We also extend our algorithmic framework to general synchronization problems over compact Lie groups.
Keywords: Spectral Methods, Harmonic Retrieval, Phase Synchronization, Group Representation
Angular or phase synchronization [Sin11, Bou16] concerns estimating angles in from a subset of possibly noise-contaminated relative offsets . An instance of phase synchronization can be encoded on an observation graph , where each angle is assigned to a vertex and relative offsets are measured between and if and only if there is an edge in connecting vertices and . Equivalently, the angles can be encoded into a column phase vector , and measurements constitute a Hermitian matrix
where is the adjacency matrix of the observation graph , is the entrywise product, and the Hermitian matrix encodes measurement noise.
As a prototypical example of more general synchronization problems arising from many scientific fields concerning consistent pairwise comparisons within large collections of objects (e.g., cryogenic electron microscopy [SZSH11] and comparative biology [GBM16]), phase synchronization attracted much attention due to its simple yet rich mathematical structure. One mathematical formulation is through nonconvex optimization
where is the Cartesian product of copies of . Depending on the context of the scientific problem, may be assumed to arise from an additive Gaussian noise model [Bou16, BBS17], in which the Hermitian matrix in (1) is a Wigner matrix with i.i.d. complex Gaussian entries above the diagonal, or from a random corruption model [Sin11, CSG16] that assumes
for each edge . Note that the random corruption model can also be cast in the form (1) after proper shifting and scaling. In general, the additive Gaussian noise model is more amenable to analysis, while the random corruption model is better at capturing the behavior of physical or imaging models where many outliers exist.
In this paper, we propose to tackle the phase synchronization problem by solving an alternative nonconvex optimization problem of “multi-frequency” nature, namely,
where is the number of frequency channels, is the entrywise th power of , and is a Hermitian matrix containing information of the phase vector in the th frequency component. Under the random corruption model (3), we construct directly from by entrywise power:
where each is a complex Hermitian random matrix with independent upper diagonal entries, and the scaling is chosen such that the operator norm of is upper bounded by . However, unlike [BCS15, PWBM18], we allow entries of to be general sub-Gaussian random variables rather than restrictively complex Gaussian, and we do not assume independence of the ’s across different . Such generality enables us to treat the two noise models (5) (6) in a unified model, under which we design and analyze our multi-frequency phase synchronization algorithm. We demonstrate surprising theoretical and empirical results that drastically outperform all existing phase synchronization algorithms in their corresponding settings, measured in terms of the correlation between the output and the true phase vector , at a mild increase in the computational cost incurred by parallelizing the computation in frequency channels. As will be demonstrated in Section 4, in the noise regime where phase synchronization is tractable, the number of frequencies needed to outperform single frequency algorithms is at most polylogarithmically dependent on the problem size , while the estimation error decays polynomially in .
The rationale behind the multi-frequency formulation (4) lies at the observation that statistical estimation can often benefit from higher moment estimates, even without introducing new measurements. As a motivating example, let be a complete graph, and consider the following coupled problems:
where are i.i.d. uniform on for , and . Note that and are by no means independent, but for all practical purposes satisfy the same sub-Gaussian bounds since and are identically distributed; we thus assume without loss of generality that . If we can find satisfying
then, by Lemma 1 of [Bou16] (assuming without loss of generality that ), we have for
which gives . This is a tighter bound than one could obtain from (9) alone for lage (so that ).
The idea of incorporating multiple frequency information into phase synchronization was first pursued in the non-unique games (NUG) framework [BCS15] and the message-passing algorithm [PWBM18], for different considerations. In practice, since solutions to (2) are determined only up to a global multiplicative phase, solving the problems separately adds difficulty to postprocessing, since individual solutions are not related to each other by entrywise powers. We thus propose to consider problem (4) which concatenates the problems additively and internalizes the entrywise power constraints. Though (4) is nonconvex and NP-hard in general (same as (2)), we propose an efficient two-stage (initialization and iterative refinement) algorithm for this problem which turns out to produce surprisingly accurate estimates for the phase vector even in high noise regimes where all existing phase synchronization algorithms break down. Our algorithm is motivated by the classical harmonic retrieval techniques in signal processing [SM97, TK82, BM86, ZW88, Sch86, RK89, SD17a, SD17b] and the generalized power method [Bou16]. We extend this algorithm to synchronization over general compact Lie groups in Section 5.
Upper case letters and lower case letters will be used to denote matrices and vectors, respectively. , are the transpose of with or without conjugation, respectively. The entrywise (Hadamard) product of matrix and will be denoted as . Graphs are always undirected and connected. Vertices of the graph will be denoted as integers ; pairs of integers denote edges in . For we write . Norms , stand for matrix or vector norms, depending on the context; , are matrix operator and Frobenius norms, respectively. The Cartesian product of copies of is denoted as . The quotient space is identified with the unit circle.
2 Related Work
Directly solving (2) is NP-hard [ZH06], but many convex and nonconvex methods have been proposed to find high quality approximate solutions. These include spectral and semi-definite programming (SDP) relaxations [Sin11, CSC12, CKS15, BKS16, BBS17]. An alternative approach using generalized power method (GPM) is also studied [Bou16, LYM17, ZB18].
Phase synchronization in multiple frequency channels
[BCS15] proposed the non-unique games (NUG) SDP optimization framework for synchronization over compact Lie groups. The SDP is based on quadratically lifting the irreducible representations of the group elements, and imposing consistency among variables across frequency channels via a Féjer kernel; it is computationally expensive. [PWBM18] introduced an iterative approximate message passing (AMP) algorithm for noise model (6), assuming the noise are Gaussian and independent across frequency channels. Each iteration of the AMP performs matrix-vector multiplication and entrywise nonlinear transformation, followed by an extra Onsager correction term; it is conjectured to be asymptotically optimal.
In this section we formally state the two-stage multi-frequency phase synchronization algorithmic paradigm. Stage One combines phase synchronization outcomes from individual frequency channels with harmonic retrieval, aiming at producing a high-quality initialization; Stage Two iteratively refines an input by an extended generalized power method that works concurrently in multiple frequency channels while striving to maintain entrywise consistency.
3.1 Stage One: Initialization Strategy
Step 1. Individual Frequency Synchronization: Apply any phase synchronization algorithm (spectral/SDP relaxation or GPM) to get phase vector estimate from each , , and form ;
Step 2. Entrywise Harmonic Retrieval: For each , use any harmonic retrieval technique to estimate from , , call the estimators ;
Step 3. Final Phase Synchronization: Construct another Hermitian matrix by , and apply any phase synchronization algorithm to estimate the true phases from matrix .
The flexibility of the multi-frequency phase synchronization framework lies at the various choices to be made in each step. As a concrete example, we detail in Algorithm 1 a simple version that uses spectral relaxation for phase synchronization and periodogram-based harmonic retrieval. We will henceforth refer to Algorithm 1 as the periodogram peak extraction with spectral methods (PPE-SPC). If a different phase synchronization method is used, for instance, SDP relaxation, our nomenclature refers to it as PPE-SDP. We will focus on analyzing PPE-SPC in depth in Section 4, but the analysis strategy can be seamlessly carried in principle to other variants of this algorithmic paradigm.
We briefly motivate the argmax operation in Step 2 as follows. If our measurement matrices are noise-free, then the th entry of from Step 1 should equal to ; in this case, the goal of Step 2 is to reconstruct from its “trigonometric moments,” for which any harmonic retrieval technique can be applied; the periodogram method in Algorithm 1 is among the most naïve approach for this purpose. For clean signal, the periodogram is equal to the modulus of the Dirichlet kernel
which attains its maximum at . Since the peak of becomes sharper and sharper as increases, we expect the periodogram peak identification step to be robust to noise, which will produce a very high quality estimate for Step 3. In fact, our analysis in Section 4 suggests that this initialization stage alone can produce highly accurate phase vectors for sufficiently large , and the estimation error drops inverse-polynomially in .
3.2 Stage Two: Iterative Refinement
In this stage, we use an iterative refinement scheme that takes an initial phase vector and enhances it successively. In our implementation we warm-start this iterative algorithm with the produced from the PPE-SPC Algorithm 1, but any initialization scheme can be applied in principle, including random initialization. This iterative refinement concurrently performs the generalized power method (GPM) [Bou16] in multiple frequency channels consistently: at each frequency , we perform power iteration by multiplication with ; the results are combined across frequency channels to obtain one periodogram for each vertex followed by a “soft harmonic retrieval” step that soft-thresholds [Don95] the periodogram in frequency domain. We pick a relatively lower threshold at the beginning of this iterative scheme, but gradually raise the threshold over to reveal the true peak that persists. Details can be found in Algorithm 2, henceforth referred to as multi-frequency generalized power method (MFGPM).
MFGPM can be viewed as an iterative version of PPE-SPC, except that the stringent peak extraction step is replaced with the more malleable soft-thresholding. Periodograms are virtually the Dirichlet kernels, which truncate a Dirac delta function in the frequency domain; one can also take Cesáro means of these Periodograms, or equivalently, work with the Féjer kernels that are known to converge faster to the Dirac delta function. We omit those results as no significant difference is observed in performance.
As an integral part of our two-stage algorithmic framework, MFGPM works most efficiently with initialization from PPE-SPC, but we also observed empirically that the MFGPM outperforms other methods given identical random initialization, illustrated in Figure 1, in the sense that MFGPM often produces phase vectors that correlate more strongly with the true phase vector . See Section 6 for more comprehensive comparisons results.
The computational complexities of PPE-SPC and MFGPM are and , respectively.
In this section we analyze PPE-SPC in theory, under the general sub-Gaussian noise model (6). We assume the observation graph is generated from a Erdős–Rényi model with edge connectivity independent of the ’s.
For and each , assume
where is the entrywise th power of , and , are complex random Wigner matrices satisfying the following assumptions:
For any fixed , are jointly independent with zero mean, and unit sub-Gaussian norm [Ver18];
for all and ;
for all and .
Furthermore, assume is the adjacency matrix of a Erdős–Rényi random graph independent of all the ’s, with edge connecting probability .
We emphasize again that Assumption 1 assumes no independence for the ’s across frequency channels; only entries within the same are assumed independent. As explained in Introduction, this enables us to unify our discussions on the random corruption model and additive Gaussian model in a single pass (see e.g., (8)). Another advantage for such generality is that we can focus on analyzing complete observation graphs, since
where is the identify matrix of dimension -by-, and thus we can apply the theoretical analysis in this section to
satisfies the same conditions as in Assumption 1 with different absolute constants. Therefore, in the rest of this section we focus on complete observation graph only, i.e.,
Our first goal is to understand the spectral method in PPE-SPC Step 1 and Step 3. Since Step 2 is entrywise, it is crucial to bound the distance between and the leading eigenvector (scaled to ). The proof of the following Lemma 1 uses recent perturbation results of eigenvectors of random matrices [EBW17, AFWZ17, FWZ18, ZB18] and can be found in Appendix A.1.
Assume Assumption 1 is satisfied, and the observation graph is a complete graph. Let be an arbitrarily chosen but fixed absolute constant. For any , denote for the leading eigenvector of scaled such that and . There exist absolute (in particular, independent of and ) constants such that, if , there holds with probability
The inequality (16) is a direct consequence of (15), which is identical to Theorem 8 of [ZB18], but we verify in the proof that the event probability in [ZB18] can be made slightly higher. This is necessary for taking the union bound across all entries in the main Theorem 2.
A quick consequence of Lemma 1 is the uniform proximity of the periodogram to a Dirichlet kernel up to constant scaling and shifts, with high probability. More specifically,
with probability . Clearly, the maximum of is attained at . We thus expect the argmax operation in Step 2 of PPE-SPC to produce high accuracy estimates of as long as the difference between the “optimization landscape” of the periodogram and the Dirichlet kernel is small enough. This is formalized in the following lemma, which exploits the geometry of the Dirichlet kernel.
Under the same conditions as in Lemma 1, if
then with probability at least
It is straightforward to check that (17) holds for sufficiently large as long as is bounded from above by . This can be seen by noticing that the function is differentiable and monotonically decreasing for all , and for sufficiently large it infinitesimally approaches .
The most important message from Lemma 2 is the following: At the beginning of the Step 3 of PPE-SPC, the newly constructed Hermitian matrix is entrywise –close to the ground truth rank-one matrix . We emphasize that this error incurred in is significantly smaller than the noise level in the raw input data, and can be made arbitrarily small by choosing large . We formalize this key observation in the main theorem below, for which the proof is deferred to Appendix A.3.
Under the same conditions as Lemma 1 and Lemma 2, if (17) holds and , then there exists an absolute constant such that, with probability , the correlation between the true phase vector and the leading eigenvector (scaled to ) of in PPE-SPC Step 3 is at least
Moreover, for the phase vector output from PPE-SPC,
Following the discussion after Lemma 2, it is not surprising to see in Theorem 2 that the correlation can be made arbitrarily close to (or equivalently, the distance between the estimated and true phase vectors can be made arbitrarily close to ). Moreover, it doesn’t take excessively large for PPE-SPC to outperform all existing phase synchronization algorithms—in fact, for which is the highest level of noise tolerable to ensure the validity of Lemma 1, it suffices to take to suppress the estimation error below the established near-optimal bound for eigenvector based phase synchronization methods [BBS17, ZB18]. We believe (19) can still be improved by a factor of by leveraging the randomness in the residue error in (16), but such finer analysis relies on more detailed analysis on the perturbation and the change in the optimization landscape, which will be pursued in a future work.
5 Extension to General Synchronization
The algorithmic framework of multi-frequency phase synchronization proposed in this paper can be extended to synchronization over any compact Lie group , by the representation-theoretic analogue of Fourier series — the Peter-Weyl decomposition. In a nutshell, the Peter-Weyl theorem states that, for square integrable functions , we have decomposition
where each is an irreducible, unitary representation of , and is the “Fourier coefficient”
where the integral is take with respect to the Haar measure.
On a connected observation graph , the input data to a synchronization problem over group are pairwise measurements on edges satisfying . The goal is to find group elements , one for each vertex, that satisfy as many constraints as possible. Mathematically, this type of problems can often be formulated as an optimization problem [BCS15]
where each measures the compatibility between the relative alignment and the observation data on edge . The ’s are nonlinear and nonconvex in general. If are bandlimited, we can expand (22) using the Peter-Weyl decomposition
which can be viewed as a generalization of the multi-frequency phase synchronization problem (4).
For simplicity of statement, we assume the observation graph is complete in this section. Since ’s are unitary representations, the matrices ’s are unitary matrices for any , and it is natural to solve for from its irreducible representations . Vertically stacking the th irreducible representations together, the variable can be organized in matrices , defined by
Analogies of the noise models also exist in this more general setting. The additive Gaussian noise model, following [PWBM18], amounts to
where the parameter stands for the signal-to-noise ratio (SNR) at “frequency ,” is a Wigner matrix with i.i.d. standard complex Gaussian entries in the upper triangular part. For the random corruption model, let
and set the th sub-block of to .
As we elaborate in the remainder of this section, all the key ingredients in PPE-SPC and MFGPM can be extended to this more general setting. We demonstrate the efficacy of this algorithm for synchronization in Section 6.
Spectral relaxation: Compute the top eigenvectors and stack them horizontally to form . Approximate with .
Generalized harmonic retrieval: For each , set
Based on these new estimates for the pairwise alignments, we build matrix with blocks with . We then extract the top eigenvectors of , stack them horizontally to form , and project each of its vertical blocks to a unitary matrix through singular value decomposition (SVD)
Iterative refinement: At the th iteration, denoting for the current stacked th representations (23), we construct
and compute the inverse Fourier transform for each of the vertical sub-blocks of :
Note that we only need toe evaluate on a finite number of uniformly sampled elements of , from which the “inverse Fourier transform” can be applied
along with the soft-thresholding . We again project each to the closest unitary matrix by SVD (27), then form by vertically stacking the ’s. The final outputs are for .
6 Numerical Experiments
This section contains detailed numerical results under both additive Gaussian noise and random corruption models, for both and . In all experiments with Gaussian noise, we keep where is the signal-to-noise ratio (SNR); for the random corruption model (3) we set . We fix and vary and to evaluate and compare the performance of different algorithms. When comparing iterative algorithms (AMP, GPM, MFGPM), within each random trial the random initialization is kept identical for all three algorithms and across frequency channels; between trials both data and initialization are redrawn. The remainder of the section contains results for and synchronization with complete observation graphs only; incomplete observation graph results are similar and included in Appendix B.
In Figure 3 and Figure 4, we measure the correlation between the output and the truth phase vector for various single- and multi-frequency synchronization methods, under the additive Gaussian and random corruption noise model, respectively. The SNR varies between and , which is in the extremely noisy regime: under the random corruption model, for instance, with , between and of the pairwise alignments are corrupted with random elements. In each subplot, the vertical axis varies from to , and the horizontal axis marks the change in . The bottom row in each subplot thus represents the single-frequency () version of the algorithm. The methods under comparison are: (a) AMP [PWBM18] with random initialization; (b) PPE-SPC; (c) MFGPM with random initialization; (d) PPE-SDP (replacing the spectral methods in Algorithm 1 with SDP relaxation); (e) PPE-SDP with an additional projection to rank-one matrices in each iteration; (f) Iterating PPE-SPC three times; (g) AMP initialized with PPE-SPC; (h) MFGPM initialized with PPE-SPC.
It is clear from Figure 3 and Figure 4 that leveraging information in multiple frequency channels produces superior results than single-frequency approaches. Most shockingly, in Figure 4 our proposed PPE-SPC method and variants [subplots (b)–(h)] are capable of recovering the true phase vector when the SNR is well below the critical threshold (corresponding to ) determined in [Sin11] by random matrix arguments. This is surprising because, according to [Sin11], for single frequency phase synchronization one can not expect correlation to be much higher than , which is in our experiments. This is confirmed by looking at the bottom row of each subplot of Figure 4, but with suitably large this barrier no longer exists, even though in model (5) our high-frequency measurements are generated from the single frequency data.
In Figure 3 and Figure 4, (d) and (e) illustrates the performance of the SDP variant of PPE-SPC. The difference between (d) and (e) is the following: in (d) we use directly estimated by solving the SDP in [Sin11], but in (e) we apply project the SDP solution to a rank-one matrix using eigen-decomposition. The results from these SDP variants are occasionally slightly better PPE-SPC, but the computational cost is expensive: the runtime is over times longer, and a lot more memory is required. The SDP relaxation in [BCS15] is even more demanding on computation resources so is not included here.
Figures (f)f and (f)f explore another possibility of extending PPE-SPC: After recovering , take entrywise powers of and treat them as multi-frequency data input to another fresh run of PPE-SPC. Unlike the iterative refinement algorithm MFGPM, we observed empirically that the performance boost saturate quickly after just a couple of such repeated calls to PPE-SPC. The result in (f) from both figures are obtained from performing such repetitions. Compared with (b), this strategy improves the estimation accuracy for smaller , but the performance gain is not as significant as using MFGPM for iterative refinements (h).
Initialization turns out to be important for AMP: As shown in Figure (a)a, when the SNR is below the critical threshold predicted in [PWBM18] (), increasing does not lead to performance improvement; the critical threshold appears even higher for random corruption model (Figure (a)a). In contrast, PPE-SPC and MFGPM can always benefit from sufficiently larger .
Comparison results for synchronization under Gaussian noise model and random corruption model are shown in Figure (a)a and (b)b, respectively. In all these experiments, the Fourier transform (28) is numerically evaluated using elements uniformly sampled in . Clearly, the proposed method outperforms single frequency methods and achieve higher accuracy as increases; moreover, the multi-frequency formulation and algorithm lead to drastic performance boost especially at the “low SNR regime.”
In Figure 6 we compare AMP and MFGPM with different initialization strategies–PPE-SPC vs. random initialization–under the additive Gaussian noise model (24) with . We plot the accuracy of using PPE-SPC alone without iterative refinement as a baseline. The results demonstrate the performance boost from using PPE-SPC for initialization, as well as improvements gain from using iterative refinements on top of the initialization PPE-SPC.
In this paper, we propose a novel, mult-frequency formulation for phase synchronization as a nonconvex optimization problem, for which we develop a two-stage algorithm inspired by harmonic retrieval and generalized power method that produces high accuracy approximate solutions. We demonstrate in theory and experiments that the new framework significantly outperform all existing phase synchronization algorithms.
There are many opportunities for future research. We are particularly interested in gaining deeper theoretical understandings for the multi-frequency GPM algorithm, especially its performance guarantees and behavior near local optimum. More general harmonic retrieval techniques can be potentially used in place of the periodogram-based peak extraction. We are also working on extending the algorithmic framework beyond compact Lie groups, such as Euclidean groups and symmetric groups, with applications to object matching [SHSS16, PKS13].
Appendix A Technical Proofs
a.1 Proof of Lemma 1
Proof of Lemma 1.
The conclusion of this lemma is identical to [ZB18, Theorem 8]; the only difference is that the event probability is slightly larger — in [ZB18, Theorem 8] the event probability is . This can be done by straightforwardly modifying the arguments in the proof of [ZB18, Theorem 8], and at the expense of increasing the absolute constant picked in that proof. Actually, this is already stated by the authors of [ZB18] on page 998 of the published version, in the paragraph right below their Theorem 5. We document here how this modification can be done.
The randomness in the proof of [ZB18, Theorem 8] arises only from the dependence of Lemma 9 and Lemma 10 of [ZB18], so it is sufficient to track the failure probability of the events there. These modifications only need to be stated for real sub-Gaussian random variables, as the trivial passage from real to complex cases is the same as detailed in the proof of [ZB18, Lemma 9].
[ZB18, Lemma 9] is based on the well-known concentration results on the maximum singular value of sub-Gaussian random matrices, in particular, [RV10, Proposition 2.4], which states for any sub-Gaussian random matrix of dimension -by- with independent, zero mean sub-Gaussian entries (whose subgaussian moments are bounded by ) that, for any ,
where are positive absolute constants. We take here , so with probability at least . Obviously, there exists sufficiently large absolute constant such that
where is the arbitrarily chosen but fixed constant in the statement of our Lemma 1.
[ZB18, Lemma 10] attains the event probability by taking a union bound, over instances of and instances of , for individual event probabilities of , where is an absolute positive constant. However, note that in the case of eigenvectors, we have (consisting of a singleton, cf. the second paragraph on pp.1000 of [ZB18], right above section title “Introducing auxiliary eigenvector problems”), which is two orders of magnitude smaller than the bound stated in [ZB18, Lemma 10]. The union bound thus yields the success probability of at least , which is .
Combining both ends lead to the success probability of for any .
For the last inequality, note that , , and , and note that for all and . We have
where in the last inequality we used the assumption ∎
a.2 Proof of Lemma 2
Proof of Lemma 2.
The proof starts with some elementary observations for the Dirichlet kernel , defined as
Note the following (cf. Figure 7):
is upper bounded by ;
vanishes at , for ;
A unique local maximum exists between each pair of consecutive zeros on .
Let be the local maximizer attaining the highest “side lobe” of between and in Figure 7. When , by Lemma 1, the periodogram will not exceed
On the other hand, again by Lemma 1, the periodogram stays above