Block-Coordinate Descent on the Riemannian Staircase for Certifiably Correct Distributed Rotation and Pose Synchronization

Block-Coordinate Descent on the Riemannian Staircase for Certifiably Correct Distributed Rotation and Pose Synchronization


This paper presents the first certifiably correct solver for distributed rotation and pose synchronization, the backbone of modern collaborative simultaneous localization and mapping (CSLAM) and camera network localization (CNL) systems. By pursuing a sparse semidefinite relaxation, our approach provides formal performance guarantees that match the state of the art in the centralized setting. In particular, we prove that under “low” noise, the solution to the semidefinite relaxation is guaranteed to provide a globally optimal solution to the original non-convex problem. To solve the resulting large-scale semidefinite programs, we adopt the state-of-the-art Riemannian Staircase framework and develop Riemannian block-coordinate descent (RBCD) as the core distributed local search algorithm. RBCD is well-suited to distributed synchronization problems as it only requires local communication, provides privacy protection, and is easily parallelizable. Furthermore, we prove that RBCD converges to first-order critical points for general Riemannian optimization problems over product of matrix submanifolds, with a global sublinear convergence rate. Extensive evaluations on real and synthetic datasets demonstrate that the proposed solver correctly recovers globally optimal solutions under low-to-moderate noise, and outperforms alternative distributed techniques in terms of solution precision and convergence speed.


1 Introduction

Collaborative simultaneous localization and mapping (CSLAM) is a fundamental capability for multi-robot systems navigating in unknown GPS-denied environments. CSLAM allows robots to leverage the observations acquired by their peers to improve their own spatial awareness. Additionally, it provides a consistent spatial understanding across the team, which is a key perquisite for more complex modes of collaboration in multi-robot missions. In this work, we focus on the back-end stage of modern CSLAM and camera network localization (CNL) systems; see [1, 2] and references therein for recent results on CSLAM front-ends. In the back-end stage, agents collaboratively solve a rotation synchronization or pose synchronization problem, the latter also known as pose-graph optimization (PGO), in order to estimate their orientations or poses based on noisy relative measurements.

Centralized schemes for collaborative PGO, e.g., used in [3], are suitable only for limited scenarios as they require a number of restrictive assumptions: a central node that is capable of solving the entire team’s PGO problem; a sufficiently reliable communication channel that connects the central node to the team; and enough resources (mainly, energy and bandwidth) for regularly relaying the team’s (raw or preprocessed) observations to the central node. Additionally, these schemes do not protect the privacy of their participants (since the central node would have access to the entire team’s observations and trajectories) and are less robust due to having a single point of failure. These critical issues demonstrate the need for decentralized and distributed PGO solvers for CSLAM and CNL.

State-of-the-art decentralized and distributed back-ends [4, 5, 6] have fallen behind recent breakthroughs that have led to certifiably correct centralized PGO solvers based on semidefinite relaxations [7, 8, 9, 10, 11]. Specifiably, existing distributed solvers such as [4] rely on local search algorithms (e.g., Gauss-Newton) to solve PGO, and are therefore susceptible to suboptimal solutions and arbitrarily bad local minima due to the non-convexity of the problem. In contrast, state-of-the-art centralized approaches attain global optimality by relaxing PGO into a tractable semidefinite program (SDP). Among these works, Rosen et al. [7] further show that under prevalent low-to-moderate noise regimes, SDP relaxation is guaranteed to recover the unique global minimizer to PGO up to global gauge symmetry.

The main goal of this paper is to fill the aforementioned technical gap by designing certifiably correct decentralized and distributed PGO solvers. To this end, we pursue an alternative SDP relaxation [8] for pose synchronization that avoids the elimination of translation variables and thus preserves the essential sparsity structure for distributed optimization (see Remark 1). As our first contribution, we provide formal performance guarantees for this SDP relaxation by showing that it shares the same set of core theoretical properties with the original SDP relaxation [7], namely, existence of low-rank solutions (Theorem 1) and exactness guarantees under low noise (Theorem 2).

Our second contribution is the design and analysis of a distributed algorithm for solving the SDP relaxations of rotation and pose synchronization. In practice, the sheer sizes of these SDPs make standard interior-point methods impractical. Consequently, recent works [7, 8] employ a technique known as Riemannian Staircase [12] that seeks low-rank solutions to the SDP by solving a hierarchy of rank-restricted SDPs using the celebrated Burer-Monteiro factorization [13]. At each level of the hierarchy, local search is performed to find critical points of the rank-restricted problem. In the centralized setting, the second-order Riemannian trust-region (RTR) algorithm [14, 15] has emerged as the default method to carry out the local search. Nevertheless, solving the trust-region subproblems in RTR requires extensive coordination among the agents and delicate bookkeeping of parameters, which makes the algorithm unsuitable for distributed computation.

To address this challenge, we propose Riemannian block-coordinate descent (RBCD) as the core distributed procedure to solve the rank-restricted SDP relaxations of PGO inside the Riemannian Staircase framework. RBCD is a general algorithm for optimization over direct product of matrix submanifolds. Under mild conditions, we show that the algorithm converges to first-order critical points with global sublinear convergence rate. Furthermore, by leveraging the sparsity structure and independence relations in the pose graph, RBCD retains the critical features of state-of-the-art distributed solvers [4]:

  • Cheap Iterations: Agents locally perform cheap (and closed-form in the case of CNL) iterations during optimization.

  • Local Communication: Agents only need to communicate with their neighbors (i.e., those connected by loop closures) in the pose graph.

  • Privacy Protection: During optimization, agents do not reveal information about their “private” states and observations.

  • Parallel Execution: At each iteration, multiple agents update their estimates in parallel without compromising the correctness of their solutions.

The rest of this paper is organized as follows. In the remainder of this section, we summarize our contributions and introduce relevant notations and preliminaries. In Section 2, we review state-of-the-art centralized and distributed solvers for PGO, as well as recent advances in block-coordinate optimization methods. In Section 3, we review the problem formulation of rotation and pose synchronization, their SDP relaxations, as well as the Riemannian Staircase framework. We also present formal guarantees for the alternative SDP relaxation of pose synchronization used in this work. The RBCD algorithm is presented in Section 4. In Section 5, we prove convergence of RBCD and provide global convergence rate analysis. We discuss several details related to the distributed implementation of Riemannian Staircase in Section 6. We conclude with extensive experimental evaluations in Section 7.


In this work, we propose a certifiably correct distributed solver for rotation and pose synchronization in the contexts of CSLAM and CNL. Specifically,

  • We prove that the sparse SDP relaxation [8] used in this work enjoys the same theoretical guarantees as the original SDP relaxation [7], namely, existence of low-rank solutions and exactness under low noise.

  • To solve the large-scale SDPs, we develop RBCD as the core distributed local search procedure within Riemannian Staircase [12]. RBCD is naturally well-suited to distributed synchronization problems (or similar optimization problems over sparse graphs) as it has cheap iterations, requires only local communications, provides privacy protection, and is easily parallelizable.

  • Under mild conditions, we prove that RBCD converges to first-order critical points for general optimization problems over product manifolds, with global sublinear convergence rate. Furthermore, we show that the required conditions are satisfied in CSLAM and CNL.

Notations and Preliminaries

General Notations

Unless stated otherwise, lowercase and uppercase letters are generally used for vectors and matrices, respectively. We use to denote the set of natural numbers up to and including . Unless specified otherwise, letters refer to indices of single poses or rotations, and refer to indices of robots.

Linear Algebra

and denote the set of symmetric and symmetric positive semidefinite matrices, respectively. is the identity matrix, and represent the vectors of all zeros and all ones, respectively. For a matrix , we use to index its -th entry. Given a ()-block-structured matrix , refers to its -th block. Following [7], we define as the linear operator that extracts the diagonal blocks of and zeros out all remaining blocks, and as its symmetric version; see [7, Equations (4)-(5)]. Finally, denotes the projection operator onto a given set .

Differential Geometry and Matrix Lie Groups

The orthogonal group is defined as,


The special orthogonal group is defined as,


The special Euclidean group is defined as,


The Stiefel manifold is defined as,


In general, we use to denote a smooth matrix submanifold. For two manifolds , denotes their product manifold. denotes the -th power manifold of . Similar to [7], we represent the product manifold and power manifold in matrix form as,


On a manifold , (or for brevity) denotes the tangent space at . The tangent space is endowed with the standard Riemannian metric induced from the ambient (Euclidean) space, i.e., , and the induced norm is . denotes a retraction operator, with being its restriction to . For a function , we use and to denote the Euclidean and Riemannian gradients of at . We call a first-order critical point if the corresponding Riemannian gradient is zero. Readers are referred to [15] for an excellent and comprehensive review of relevant differential geometry concepts for optimization on matrix submanifolds.

2 Related Work

2.1 Centralized Solvers

Rosen et al. [7] developed SE-Sync, a state-of-the-art certifiably correct solver for PGO. SE-Sync solves an SDP relaxation of PGO after analytically eliminating translation variables [16]. It is shown that under low noise, the SDP relaxation is guaranteed to be exact and hence can be used to extract a globally optimal solution to the original PGO problem. In addition to the theoretical low-noise guarantee, it is also empirically demonstrated that global optimality holds under typical noise levels encountered in robotic applications.

Despite the need to solve a large-scale SDP, SE-Sync often outperforms conventional sparse nonlinear least squares solvers in terms of runtime. This is mainly attributed to the Riemannian Staircase algorithm [12] which leverages the so-called Burer-Monteiro factorization [13] to search for low-rank solutions of the SDP. The Riemannian Staircase requires a numerical optimization algorithm to search for (preferably, second-order) critical points of a non-convex optimization problem over the product of Stiefel manifolds. By default, SE-Sync uses the second-order Riemannian trust-region (RTR) method [14, 15]. RTR is a popular method due to its useful features such as provable global convergence to second-order critical points and superlinear local convergence rate. In order to avoid inverting the Riemannian Hessian at each iteration, “inverse-free” techniques such as truncated conjugate gradient (tCG) are frequently used inside RTR to solve each trust-region subproblem. Unfortunately, in practice RTR is limited to the centralized setting since solving the trust-region subproblems requires extensive coordination among the agents, e.g., due to delicate bookkeeping required to update the descent direction in tCG.

A similar centralized solver, named Cartan-Sync, is proposed by Briales and Gonzalez-Jimenez [8]. The main difference between SE-Sync and Cartan-Sync is that the latter directly performs SDP relaxation over PGO without first analytically eliminating the translations. As a result, the rank-restricted problems solved inside the Riemannian Staircase are defined over the direct product of Stiefel manifolds and the Euclidean space. Because of the non-compactness of this search space, the low-noise global optimality guarantees in [7] have not been extended to this case.

Similar SDP relaxations [9, 11, 17] have also been proposed for the closely related problem of angular synchronization and rotation synchronization (also known as rotation averaging [18] or synchronization over the special orthogonal group). This fundamental problem arises in a number of applications such as CNL [19, 20, 21], structure from motion [18], and other domains such as cryo-EM in structural biology [22]. In this paper, we study distributed rotation synchronization alongside pose synchronization as our algorithm is applicable to both problems.1

This work follows a similar path as [7, 8] and considers solving the SDP relaxations of rotation and pose synchronization problems distributedly. Due to the need to preserve sparsity for distributed computation, we choose to pursue the alternative (sparse albeit non-compact) SDP relaxation used in Cartan-Sync [8]; see Remark 1. As one of our main contributions, we address the open problem concerning the theoretical properties of this SDP relaxation. In Section 3, we show that despite non-compactness, the alternative SDP relaxation enjoys the same performance guarantees as the original SDP relaxation employed by SE-Sync [7]. This result serves as a strong theoretical foundation that motivates us to design efficient distributed solvers in Section 4.

2.2 Decentralized Solvers

The work by Choudhary et al. [4] is currently the state of the art in distributed PGO solvers and has been recently used by modern decentralized CSLAM systems [1, 23]. Choudhary et al. [4] propose a two-stage approach for finding approximate solutions to PGO. The first stage estimates the rotation variables by solving a linear least squares problem after relaxing the non-convex rotation constraints. The resulting solution is then projected back to the special orthogonal group. The rotation estimates are then used in the second stage to initialize a single Gauss-Newton iteration on the full pose synchronization problem. In both stages, classical distributed techniques such as Jacobi over-relaxation (JOR) and successive over-relaxation (SOR) [24] are used to solve the normal equations of the linear least squares problems. The experimental evaluations presented in [4] demonstrate that this approach significantly outperforms prior techniques [5, 6]. Nonetheless, the proposed approach is still performing incomplete local search on a non-convex problem and thus does not offer any performance guarantees.

In another line of work, Tron [19], Tron and Vidal [20], Tron et al. [21] propose a multi-stage distributed Riemannian consensus protocol for CNL based on distributed execution of Riemannian gradient descent over where and is the number of cameras (agents). CNL can be seen as a special instance of collaborative PGO where each agent owns a single pose rather than an entire trajectory. In these works, the authors establish convergence to critical points and, under perfect (noiseless) measurements, convergence to globally optimal solutions. We present a specialized form of our distributed PGO algorithm for CNL in Sections 4.3 and 4.4.

2.3 Block-Coordinate Descent

Block-coordinate descent (BCD) methods (also known as Gauss-Seidel-type methods) are classical techniques [24] that have recently regained popularity in large-scale machine learning and numerical optimization [25, 26, 27, 28]. These methods are popular due to their simplicity, cheap iteration complexity, and flexibility in the parallel and distributed settings [24].

BCD is a natural choice for solving PGO (among other optimization problems over graphs) in the distributed setting due to the graphical decomposition of the underlying optimization problems. In SLAM, BCD-type techniques have been applied in the past [29, 30]. In computer vision, variants of the Weiszfeld algorithm have also been used for robust rotation averaging [18, 31]. The abovementioned works, however, use BCD for local search and thus cannot guarantee global optimality in rotation or pose synchronization problems. More recently, Eriksson et al. [9] propose a BCD-type algorithm for solving the SDP relaxation of rotation synchronization. Their row-by-row (RBR) solver extends the approach of Wen et al. [32] from SDPs with diagonal constraints to block-diagonal constraints. In small problems with up to rotations, RBR is shown to be comparable or better than the Riemannian Staircase approach [12] in terms of runtime. This approach, however, needs to store and manipulate a dense matrix which is not sustainable in SLAM where in typical moderate-size problems, is one to two orders of magnitude larger than the problems considered in [9]. We provide a runtime comparison between RBR and our algorithm in Section 7.2. Finally, although in principle this algorithm can be executed distributedly, the resulting scheme would not preserve the privacy of the participants.

This work is originally inspired by recent block-coordinate minimization algorithms for solving SDPs with diagonal constraints via the Burer-Monteiro approach [33, 34]. Our recent technical report [35] extends these algorithms and the global convergence rate analysis provided by Erdogdu et al. [34] from the unit sphere (SDPs with diagonal constraints) to the Stiefel manifold (SDPs with block-diagonal constraints). In this work, we further extend our initial results by providing a unified Riemannian BCD algorithm and its global convergence rate analysis, as well as proposing specialized scheme for collaborative rotation and pose synchronization.

3 Certifiably Correct Pose-Graph Optimization

Rotation Sync. (Problem 1)

SDP Relaxation for Rotation Sync. (Problem 3)

Rank-restricted SDP for Rotation Sync. (Problem 6)

Pose Sync. (Problem 2)

SDP Relaxation for Pose Sync. (Problem 4)

Rank-restricted SDP for Pose Sync. (Problem 7)

SDP relaxation

low-noise guarantees

BM factorization


SDP relaxation

low-noise guarantees

(Theorem 2)

BM factorization


Figure 1: Relations between problems considered in this work. From the original PGO problem (rotation or pose synchronization), applying semidefinite relaxation yields the corresponding SDPs. Applying Burer-Monteiro (BM) factorization [13] on the SDPs then yields the rank-restricted SDPs which can be solved with Riemannian local search algorithms. Solutions to the SDPs can be recovered from solutions to their rank-restricted surrogates via post-hoc verification (Section 6.2). Finally, under sufficiently low noise, SDP relaxations are guaranteed to find global minimizers to PGO (e.g., see Theorem 2).

In this section, we formally introduce the pose-graph optimization (PGO) problem. We also review state-of-the-art certifiably correct PGO solvers based on SDP relaxations, together with how these SDPs are solved in practice using the Riemannian Staircase framework. Figure 1 summarizes the problems we introduce in this section and how they relate to each other. As our first technical contribution, we establish formal guarantees for the alternative SDP relaxation of PGO [8] used in this work; see Theorem 1 and Theorem 2.

3.1 Pose-Graph Optimization (PGO)

In PGO, we need to estimate unknown rotations or poses from a set of noisy relative measurements. In graph terms, PGO can be modeled with a directed graph (pose graph) , where and correspond to the sets of unknown poses and relative measurements, respectively. In the rest of this paper, we make the standard assumption that is weakly connected.

Rotation Synchronization

In some applications, the unknown variables consist only of rotations rather than of full poses. Examples include CNL [20, 21], structure from motion pipelines [9], and initialization techniques for SLAM [36]. In these cases, the estimation problem is more frequently referred to as rotation synchronization or rotation averaging. Let denote the set of rotation variables. Following [7], we assume that for each edge , the corresponding relative rotation measurement is generated from a Langevin distribution,


where denotes the ground truth relative rotation.

Under noise model (7), it can be shown that the maximum likelihood estimate (MLE) corresponds to the minimizer of the following non-convex optimization problem,

Problem 1 (Rotation Synchronization).
subject to (8b)

Pose Synchronization

Next, we consider estimation of full pose variables which is the default setup in pose-graph SLAM. Analogous to Problem 1, this problem is also known as pose synchronization or motion averaging. Each variable is now an element from the special Euclidean group, represented (with a slight abuse of notation) as , where represents the translation component of pose . In addition to the relative rotation measurements (7), we also obtain relative translation measurements corrupted by additive Gaussian noise,


where denotes the ground truth relative translation. The MLE of pose synchronization corresponds to the minimizer of the following problem,

Problem 2 (Pose Synchronization).
subject to (10b)

Distributed PGO







(a) CNL




Figure 2: Distributed PGO scenarios considered in this work. (a) In CNL, a group of cameras need to localize each other in a common reference frame. Each vertex in the pose graph denotes the pose of a single camera. Cameras that share overlapping fields of view are connected by relative measurements. (b) In CSLAM, multiple robots need to jointly estimate their trajectories in the same frame. Each robot has multiple pose variables that are connected by odometric measurements and loop closures. We refer to poses that have inter-robot loop closures (dashed edges) as public (marked in red), and all other poses as private (marked in black).

In this work, we focus on solving Problem 1 and 2 in the distributed setting. More specifically, we consider two important real-world applications: CNL and CSLAM. In CNL, a network of cameras need to localize each other with respect to a common reference. In this case, each vertex in the pose graph represents the rotation or pose of a single camera. Relative measurements are extracted between camera pairs with overlapping fields of view using standard two-view geometry techniques and subject to scale ambiguity in the case of monocular cameras. See Figure 1(a) for a simple illustration.

In CSLAM, multiple robots need to jointly estimate their trajectories in a common reference frame. In the pose graph, each vertex represents the pose of a robot at a certain time step. Odometric measurements and intra-robot loop closures connect poses within a single robot’s trajectory. When two robots visit the same part of the environment, they establish inter-robot loop closures that link their respective poses. See Figure 1(b) for a simple example. In this case, we can further divide vertices into two classes based on whether they have inter-robot loop closures.

Definition 1 (Public and private poses in CSLAM).

In CSLAM, all poses that share inter-robot loop closures with poses of other robots are called public poses. All other poses are private poses.

In the literature, public poses are also known as separators [6]. These poses play an important role in distributed PGO as they allow information to flow across the entire team. Specifically, during optimization, each agent (camera or robot) update its own poses after receiving the updated public poses from its neighbors (i.e., those connected by loop closures in the pose graph). In Figure 1(b), public poses are marked in red and private poses are marked in black. We note that with this definition, all poses in CNL would be characterized as public since each agent (camera) only has a single pose.

3.2 SDP Relaxation for PGO

Traditionally, Problem 1 and 2 are solved with local search algorithms such as Gauss-Newton. However, depending on the noise level and the quality of initialization, local search algorithms are susceptible to local minima [37]. To address this critical issue, recent works aim to develop certifiably correct global PGO solvers. In particular, techniques based on SDP relaxation demonstrate state-of-the-art performance and furthermore provide strong theoretical guarantees under low noise regimes [7, 9, 11]. In this section, we present SDP relaxations for Problems 1 and 2.

We begin with rotation synchronization (Problem 1). Let denote the stacked rotation variables. It can be shown that the cost function (8a) in Problem 1 can be equivalently written as , where is the so-called connection Laplacian matrix [7]. Consider the “lifted” variable . Treating as a -block-structured matrix, we can rewrite the constraints (8b) in Problem 1 as , for , , and for all . Dropping the last two non-convex constraints yields the following SDP relaxation for Problem 1.

Problem 3 (SDP Relaxation for Rotation Synchronization).
subject to (11b)

Following similar steps, one can derive the SDP relaxtion for the pose synchronization problem [8]. Denote the connection Laplacian of Problem 2 as , and treat the SDP variable as a -block-structured matrix. Then, the SDP relaxation is given by,

Problem 4 (SDP Relaxation for Pose Synchronization [8]).
subject to (12b)

The original SE-Sync algorithm [7] employs a different SDP relaxation for Problem 2 by first using the separable structure of PGO [16] to analytically eliminate the translation variables, and subsequently performing convex relaxation over the reduced rotation-only problem. The resulting SDP has the form,

Problem 5 (Rotation-only SDP Relaxation for Pose Synchronization [7]).
subject to (13b)

In (13a), is a dense cost matrix, and is essentially the generalized Schur complement of the sparse connection Laplacian (see Appendix A). As the first technical contribution in this section, we establish the following theorem which characterizes key relations between Problem 4 and Problem 5.

Theorem 1 (Relations between Problem 4 and Problem 5).

Problem 4 admits a minimizer with if and only if Problem 5 admits a minimizer with the same rank. Furthermore, for all minimizers of Problem 4 and Problem 5.

Theorem 1 suggests that, for pose synchronization, the elimination of translation variables does not affect the optimal value or the rank of solutions in the SDP relaxation. This result has a far-reaching impact, as it further allows us to establish equivalent low-noise guarantees for the two SDP relaxations. Specifically, Rosen et al. [7] show that under low noise, the rotation-only SDP (Problem 5) is exact, i.e., from its solution one can recover a global minimizer to the original non-convex pose synchronization problem [7, Proposition 2]. Using this result and Theorem 1, we show that the same a priori guarantee can be established for our SDP relaxation (Problem 4), which is first used in [8] albeit without exactness guarantees. We give an informal statement below, and provide the formal theorem and its proof in Appendix B.

Theorem 2 (Exact recovery via Problem 4 (Informal)).

Under sufficiently low measurement noise, every minimizer to Problem 4 has its first block row given by , where is an optimal solution to Problem 2.

Theorem 2 provides a strong theoretical justification on why we solve Problem 4 — under low noise (which we characterize in Appendix B), one can directly read off a global minimizer to Problem 2 from the first block row of any . We note that this global optimality result usually holds on real-world datasets (see Section 7). Similar guarantees can be established for rotation synchronization, e.g., using a subset of the machinery in [7]; see also [9] for a similar result.

Remark 1.

Here we explain why we refrain from pursuing the original SDP relaxation (Problem 5) used by SE-Sync [7] in this work. Problem 5 enjoys several benefits including having a compact search space and better numerical conditioning. However, the cost matrix in Problem 5 is dense due to the Schur complement operation (see Appendix A). In graph terms, eliminating the translation variables makes the underlying dependency graph fully connected. This is a major drawback in the distributed setting since robots’ public and private poses become fully dependent on other robots’ trajectories, which increases the communication costs substantially. As we shall see in the following sections, our proposed algorithms rely on and exploit the sparse graphical structure of the problem to achieve computational and communication efficiency, and to preserve the privacy of participating robots. For this reason, in this work we adopt the alternative approach of Briales and Gonzalez-Jimenez [8] which directly relaxes the original problem and thus preserves the essential sparsity structure (e.g., as in Figure 1(a) and 1(b)) for distributed optimization. As shown by Theorem 2, the alternative approach preserves the essential theoretical guarantees.

3.3 The Riemannian Staircase Algorithm

In typical CSLAM scenarios, the dimension of the SDP relaxation can be quite large (e.g., ), and thus it is often impractical to solve Problems 3 and 4 with interior-point methods. To address this issue, Burer and Monteiro in their seminal work [13] propose to solve rank-restricted versions of the original SDPs. This approach is justified by a theorem of Barvinok [38] and Pataki [39], which guarantees the existence of low-rank solutions for SDPs with compact search spaces. Specifically, applying this theorem to Problem 3 and 5 guarantees that both SDPs admit solutions with rank no greater than [12]. As a direct consequence of Theorem 1, the same result holds for our non-compact SDP (Problem 4).

Corollary 1.

Problem 4 admits a minimizer with .

For SDPs with block-diagonal constraints, Boumal [12] extends the general approach of Burer and Monteiro [13] by further exploiting the geometric structure within the problem. The result is an elegant algorithm known as Riemannian Staircase, which has been used as the back-end SDP solver in [7, 8]. In the Riemannian Staircase, we search for the SDP solution by solving a hierarchy of rank-restricted surrogates. At each level, we impose a rank- factorization () of the original SDP variable, i.e., by letting,


in Problems 3 and 4, respectively. It can be shown that the resulting rank-restricted SDPs are smooth (albeit non-convex) optimization problems on the Cartesian product of Stiefel manifolds; see Problems 6 and 7 below.

Problem 6 (Rank-restricted SDP for Rotation Synchronization).
Problem 7 (Rank-restricted SDP for Pose Synchronization).

In principle, one may attempt to solve Problems 6 and 7 via Riemannian local search algorithms. Due to the non-convex constraints in both problems, however, there is no a priori guarantee that the obtained solution is a global minimizer, or that it can be used to extract the SDP solution, at least for small values of . Nevertheless, given any first-order critical point, we can obtain a post-hoc certificate of global optimality by verifying the KKT conditions of the original SDP [12]. We discuss the details of this verification procedure in Section 6.2.

Algorithm 1 summarizes the Riemannian Staircase algorithm for pose synchronization. At level of the hierarchy, we first solve the rank-restricted SDP using local search algorithm (line 7). If the obtained first-order critical point passes the verification procedure, we use it to extract the solution to SDP and the algorithm terminates (line 8). If not, the current solution is used to warm-start the local search at the next level of the hierarchy (line 11). Although Riemannian Staircase is an iterative procedure, in practice it typically identifies the SDP solution at the first level (e.g., with ); see Section 7.

2:- Initial rank .
3:- Initial estimate (Section 6.1).
5:- Global minimizer of Problem 7 with a corresponding solution to Problem 4.
6:for  do
7:     Starting at , apply distributed local search (Section 4) to find a first-order critical point for Problem 7.
8:     if  passes verification (Section 6.2then return .
9:     else
10:         Set .
11:         Update via line search along the escape direction (Section 6.2).
12:     end if
13:end for
Algorithm 1 Riemannian Staircase for Pose Synchronization

We conclude this section by discussing the implications of using a first-order local search method within Riemannian Staircase (Algorithm 1, line 7). Boumal et al. [40] show that for almost all SDPs with compact search spaces and linearly independent constraints, if where is the number of SDP constraints, any second-order critical points of the corresponding rank-restricted SDPs are globally optimal. Such result motivates the use of second-order local search algorithms (e.g., RTR) within Algorithm 1, since for sufficiently large , these algorithms are guaranteed to converge to global minimizers at the first iteration. Nonetheless, in the distributed setting, it is challenging to design algorithms with second-order convergence guarantees. For this reason, we decide to pursue first-order convergence (Section 4-5) and employ verification (Section 6.2) as post-hoc certificate of global optimality.

We note that in reality, there are also several practical issues when applying the theorem of Boumal et al. [40] to problems considered in this work. First, the theorem in its current form does not hold for pose synchronization, since the SDP relaxation (Problem 4) has a non-compact search space. Second, the bound on (which translates to ) is too conservative in typical PGO problems. As we mentioned earlier, in practice global optimality often holds for much smaller ; see Section 7 and [7, 8].

4 Riemannian Block-Coordinate Descent

In this section, we propose a distributed local search algorithm to find first-order critical points of the rank-restricted SDPs inside the Riemannian Staircase framework. For both CNL and CSLAM, the vertex set in the pose graph admits a natural disjoint partition . In CNL, each block corresponds to a single camera, i.e., and . In CSLAM, each block corresponds to the trajectory of a single robot , i.e, is the number of robots and contains the indices of pose variables owned by robot .

To leverage the abovementioned natural partitions (as well as more generalized blocking schemes discussed in Section 4.5), we propose Riemannian block-coordinate descent (RBCD) as the distributed local search algorithm within Riemannian Staircase. At an abstract level, RBCD solves a general optimization problem over the direct product of matrix submanifolds,


Given a disjoint partitioning of indices , each iteration of RBCD optimizes variables in a single block . For now, this means that in each iteration only a single agent (e.g., camera or robot) updates its decision variables, while other agents remain idle. We address this limitation in Section 4.5 by providing highly effective parallel execution schemes for RBCD. Algorithm 2 provides the pseudocode for RBCD.

The rest of this section is organized to discuss details of each step in Algorithm 2. We begin with the discussion of block selection rules in Section 4.1 to determine how blocks should be selected at each iteration of RBCD (Line 8). Then, in Section 4.2, we propose a general block update rule for an arbitrary manifold optimization problem. In Section 4.3-4.4, we focus on the special cases of rotation and pose synchronizations in CNL, and derive optimal block update rules in these contexts. Finally, in Section 4.5, we discuss parallel execution schemes that further accelerate RBCD in practice.

2:- Global cost function .
3:- Blocks .
4:- Initial solution (Section 6.1).
6:- First-order critical point .
7:for  do
8:     Select block with denoting the corresponding component in .
9:     Update by (approximately) solving .
10:     Carry over all other blocks .
11:end for
Algorithm 2 Riemannian Block-Coordinate Descent (RBCD)

4.1 Block Selection Rules

In this section, we describe three mechanisms for selecting which block to update at each iteration of RBCD (line 8). The first two block selection rules are based on random sampling [41, 34]. At each iteration, a block is selected with probability . The simplest among such rules is uniform sampling, in which all blocks are selected with an equal probability, i.e., .

In practice, it is often the case that selecting certain blocks leads to significantly larger decrement of the cost function compared to others. Therefore, it is natural to assign these blocks higher weights during the sampling process. We refer to this block selection rule as importance sampling. In this work, we design the probability of selecting each block to be proportional to the squared norm of Riemannian gradient, i.e., . Under Lipschitz-type conditions, it can be shown that the squared gradient norm provides a lower bound on the cost decrement along the direction of negative gradient [25].

We can also modify importance sampling into a deterministic strategy. At each iteration, we can directly choose the block with the largest squared gradient norm, i.e., . We refer to this strategy as greedy selection and more specifically the Gauss-Southwell (GS) rule [25]. Recent works also propose other variants of greedy selection such as Gauss-Southwell-Lipschitz (GSL) and Gauss-Southwell-Quadratic (GSQ) [25]. However, such rules require additional knowledge about the block Lipschitz constants that are often hard to obtain in practice. For this reason, we restrict our deterministic selection rule to GS. Despite its simpler nature, empirically the GS rule already demonstrates satisfactory performance; see Section 7.

In practice, although importance sampling and greedy selection tend to produce more effective iterations, they also incur additional coordination and communication overhead in the distributed scenario. For example, with greedy selection, at the beginning of each iteration agents need to coordinate and find the block with maximum squared gradient norm. In contrast, uniform sampling incurs minimal overhead in the block selection stage.

4.2 Block update with Riemannian Trust-Region (RTR)

In this section, we describe our default method to perform block update at each iteration of RBCD (line 9). Let be the component of corresponding to the selected block, and let be the complement of in whose values are fixed at this iteration. Define the reduced cost function . To perform block update, we solve the reduced problem,


As concrete examples, consider solving the rank-restricted SDP of rotation synchronization (Problem 6) in the context of CSLAM. Recall that in this case, each block corresponds to variables owned by robot in Problem 6. Fixing all other robots’ variables , it can be shown that the reduced problem defined over is of the form,


where is the submatrix of formed with the rows and columns that correspond to robot ’s variables, and is a constant matrix that depends on the (fixed) public variables of robot ’s neighbors.

Similarly, for pose synchronization (Problem 7), let be the set of variables corresponding to the trajectory of robot . Fixing the variables of all other robots, the reduced problem over is of the form,


where the constant matrices , have similar interpretations as in (20).

The general problem (19) and its particular instances (20) and (21) only involve local variables of each agent. Therefore, after receiving the fixed public variables of its neighbors over the network, each agent can solve its reduced problem locally and independently. Nevertheless, due to the manifold constraints, these problems are in general non-convex and thus hard to solve to global optimality.

A natural alternative is to perform inexact update to in order to sufficiently decrease the reduced cost. For this purpose, there is a variety of optimization algorithms on matrix manifolds that one can consider. In this work, we select the popular second-order Riemannian trust-region (RTR) algorithm [15, Chapter 7] to solve the reduced problems. Compared to alternative first-order methods such as Riemannian gradient descent, RTR uses second-order information of the (local) reduced problem to speed up optimization. Empirically, we observe that in most cases, a single iteration of RTR yields sufficient descent on the cost function. As mentioned above, since by design each block corresponds to the decision variables of a single agent, RTR iterations (within RBCD) are executed locally by the selected agent. More details and extensive analysis of the algorithm are provided in Section 5.

Accelerating RTR via Preconditioning

In practice, most instances of (20) and (21) in CSLAM are poorly conditioned, i.e., the condition numbers of and are quite large. In these cases, suitable preconditioning can significantly speed up numerical optimization. In general, a Riemannian preconditioner for the reduced problem is a linear, symmetric, and positive definite operator that approximates the inverse of the Riemannian Hessian. In the particular cases of SLAM, Rosen and Carlone [42], Briales and Gonzalez-Jimenez [8] have already proposed empirically effective preconditioners for problems similar in nature to (20) and (21).2 The main idea is to approximate the directional derivatives of the Riemannian gradient with that of the Euclidean gradient in the ambient space. More specifically, we let our preconditioners be,


for problem (20) and (21), respectively. The small constant ensures that the proposed preconditioners are positive definite. It is straightforward to verify that (22) and (23) are linear and symmetric; e.g., see [8, Appendix VI]. In practice, each robot can compute and reuse the Cholesky decomposition of and for improved numerical efficiency.

4.3 Optimal Update for Single Rotation

The RTR-based block update described in the previous section works for arbitrary instances of the reduced problem (19). Nevertheless, for Problem 6 in the context of CNL, it is possible to perform optimal update by optimizing each reduced problem exactly. Recall that in CNL, each block in Problem 6 contains a single variable that corresponds to the “lifted” rotation of a single camera. After fixing all other variables for , the reduced problem over is,


where and distinguish neighbors of in the pose graph based on edge orientations.

Problem (24) is similar to an instance of the single rotation averaging problem [18]. After a series of algebraic manipulations [35], one can show that (24) is equivalent to minimizing the cost function for a constant matrix defined as,


Thus the solution to this problem is given by,


where the projection to Stiefel manifold can be implemented in closed-form via singular value decomposition (SVD). Specifically, let be a thin, rank- SVD of , i.e., and . Then, the projection is given by [43, Theorem 2.1]. In practice, is a small matrix (e.g., and in our experiments). The small size of implies that the SVD operation is very cheap, which makes each optimal update very efficient.

4.4 Optimal Update for Single Pose

Similarly, we show that optimal update can be derived for Problem 7 in CNL. Similar to the previous section, each block contains a single variable corresponding to the “lifted” pose of a single camera. After fixing all other blocks , the reduced optimization problem over is,

subject to

In (27), the optimization with respect to is an (unconstrained) linear least squares problem. We can thus analytically eliminate from (27) and form a further reduced problem involving only . Specifically, given any value of , the optimal value of (as a function of ) is given by the following weighted average in ,


To derive the reduced problem over , we substitute (28) into the original cost function (27). Using the fact that , we can simplify the resulting problem into the form of minimizing , where is a constant matrix of the form,


In (30), is the same cost matrix as defined in (25), and summarizes costs induced by the translation measurements. We give the exact expression for in Appendix G. Similar to Section 4.3, the optimal is given by the projection,


Afterwards, the optimal is recovered by substituting into (28).







(a) PABS with 3 colors (CNL)




(b) An aggregate block for CSLAM




(c) An aggregate block for CSLAM
Figure 3: Illustrating PABS (Definition 2) for CNL and CSLAM. (a) A -coloring of CNL. The corresponding PABS consists of three aggregate blocks associated to each color, i.e., , , , and . In each iteration of RBCD, one aggregate block is selected from and all blocks in are updated independently and in parallel. (b) and (c) illustrate two aggregate blocks in a PABS for CSLAM. First, robots are colored with two colors. Then is a PABS where each aggregate block consists of the “star” nodes. Note that the two aggregate blocks have a non-empty overlap.

4.5 Parallel Execution

So far we have assumed that the decision variables are partitioned into (disjoint) blocks <