Block-Coordinate Descent on the Riemannian Staircase for
Certifiably Correct Distributed Rotation and Pose Synchronization
Abstract
This paper presents the first certifiably correct solver for distributed rotation and pose synchronization, the backbone of modern collaborative simultaneous localization and mapping (CSLAM) and camera network localization (CNL) systems. By pursuing a sparse semidefinite relaxation, our approach provides formal performance guarantees that match the state of the art in the centralized setting. In particular, we prove that under “low” noise, the solution to the semidefinite relaxation is guaranteed to provide a globally optimal solution to the original nonconvex problem. To solve the resulting large-scale semidefinite programs, we adopt the state-of-the-art Riemannian Staircase framework and develop Riemannian block-coordinate descent (RBCD) as the core distributed local search algorithm. RBCD is well-suited to distributed synchronization problems as it only requires local communication, provides privacy protection, and is easily parallelizable. Furthermore, we prove that RBCD converges to first-order critical points for general Riemannian optimization problems over products of matrix submanifolds, with a global sublinear convergence rate. Extensive evaluations on real and synthetic datasets demonstrate that the proposed solver correctly recovers globally optimal solutions under low-to-moderate noise, and outperforms alternative distributed techniques in terms of solution precision and convergence speed.
Contents
 1 Introduction
 2 Related Work
 3 Certifiably Correct Pose-Graph Optimization
 4 Riemannian Block-Coordinate Descent
 5 Convergence Analysis for RBCD
 6 Distributed Initialization, Verification, and Rounding
 7 Experiments
 8 Conclusion
 A Proof of Theorem 1
 B Proof of Theorem 2
 C Proof of Corollary 1
 D Proof of Lemma 1
 E Proof of Lemma 2
 F Proof of Theorem 3
 G Optimal Update for Single Pose
1 Introduction
Collaborative simultaneous localization and mapping (CSLAM) is a fundamental capability for multirobot systems navigating in unknown GPS-denied environments. CSLAM allows robots to leverage the observations acquired by their peers to improve their own spatial awareness. Additionally, it provides a consistent spatial understanding across the team, which is a key prerequisite for more complex modes of collaboration in multirobot missions. In this work, we focus on the backend stage of modern CSLAM and camera network localization (CNL) systems; see [1, 2] and references therein for recent results on CSLAM frontends. In the backend stage, agents collaboratively solve a rotation synchronization or pose synchronization problem, the latter also known as pose-graph optimization (PGO), in order to estimate their orientations or poses based on noisy relative measurements.
Centralized schemes for collaborative PGO, e.g., used in [3], are suitable only for limited scenarios as they require a number of restrictive assumptions: a central node that is capable of solving the entire team’s PGO problem; a sufficiently reliable communication channel that connects the central node to the team; and enough resources (mainly, energy and bandwidth) for regularly relaying the team’s (raw or preprocessed) observations to the central node. Additionally, these schemes do not protect the privacy of their participants (since the central node would have access to the entire team’s observations and trajectories) and are less robust due to having a single point of failure. These critical issues demonstrate the need for decentralized and distributed PGO solvers for CSLAM and CNL.
State-of-the-art decentralized and distributed backends [4, 5, 6] have fallen behind recent breakthroughs that have led to certifiably correct centralized PGO solvers based on semidefinite relaxations [7, 8, 9, 10, 11]. Specifically, existing distributed solvers such as [4] rely on local search algorithms (e.g., Gauss-Newton) to solve PGO, and are therefore susceptible to suboptimal solutions and arbitrarily bad local minima due to the nonconvexity of the problem. In contrast, state-of-the-art centralized approaches attain global optimality by relaxing PGO into a tractable semidefinite program (SDP). Among these works, Rosen et al. [7] further show that under prevalent low-to-moderate noise regimes, the SDP relaxation is guaranteed to recover the unique global minimizer of PGO up to global gauge symmetry.
The main goal of this paper is to fill the aforementioned technical gap by designing certifiably correct decentralized and distributed PGO solvers. To this end, we pursue an alternative SDP relaxation [8] for pose synchronization that avoids the elimination of translation variables and thus preserves the essential sparsity structure for distributed optimization (see Remark 1). As our first contribution, we provide formal performance guarantees for this SDP relaxation by showing that it shares the same set of core theoretical properties with the original SDP relaxation [7], namely, the existence of low-rank solutions (Theorem 1) and exactness guarantees under low noise (Theorem 2).
Our second contribution is the design and analysis of a distributed algorithm for solving the SDP relaxations of rotation and pose synchronization. In practice, the sheer sizes of these SDPs make standard interior-point methods impractical. Consequently, recent works [7, 8] employ a technique known as the Riemannian Staircase [12] that seeks low-rank solutions to the SDP by solving a hierarchy of rank-restricted SDPs using the celebrated Burer-Monteiro factorization [13]. At each level of the hierarchy, local search is performed to find critical points of the rank-restricted problem. In the centralized setting, the second-order Riemannian trust-region (RTR) algorithm [14, 15] has emerged as the default method to carry out the local search. Nevertheless, solving the trust-region subproblems in RTR requires extensive coordination among the agents and delicate bookkeeping of parameters, which makes the algorithm unsuitable for distributed computation.
To address this challenge, we propose Riemannian block-coordinate descent (RBCD) as the core distributed procedure to solve the rank-restricted SDP relaxations of PGO inside the Riemannian Staircase framework. RBCD is a general algorithm for optimization over direct products of matrix submanifolds. Under mild conditions, we show that the algorithm converges to first-order critical points with a global sublinear convergence rate. Furthermore, by leveraging the sparsity structure and independence relations in the pose graph, RBCD retains the critical features of state-of-the-art distributed solvers [4]:

Cheap Iterations: Agents locally perform cheap (and, in the case of CNL, closed-form) iterations during optimization.

Local Communication: Agents only need to communicate with their neighbors (i.e., those connected by loop closures) in the pose graph.

Privacy Protection: During optimization, agents do not reveal information about their “private” states and observations.

Parallel Execution: At each iteration, multiple agents update their estimates in parallel without compromising the correctness of their solutions.
The rest of this paper is organized as follows. In the remainder of this section, we summarize our contributions and introduce relevant notations and preliminaries. In Section 2, we review state-of-the-art centralized and distributed solvers for PGO, as well as recent advances in block-coordinate optimization methods. In Section 3, we review the problem formulations of rotation and pose synchronization, their SDP relaxations, and the Riemannian Staircase framework. We also present formal guarantees for the alternative SDP relaxation of pose synchronization used in this work. The RBCD algorithm is presented in Section 4. In Section 5, we prove convergence of RBCD and provide a global convergence rate analysis. We discuss several details related to the distributed implementation of the Riemannian Staircase in Section 6. We conclude with extensive experimental evaluations in Section 7.
Contributions
In this work, we propose a certifiably correct distributed solver for rotation and pose synchronization in the contexts of CSLAM and CNL. Specifically,

We establish formal performance guarantees for the sparse semidefinite relaxation of pose synchronization used in this work [8]; namely, we prove the existence of low-rank solutions (Theorem 1) and exactness under low noise (Theorem 2).

To solve the large-scale SDPs, we develop RBCD as the core distributed local search procedure within the Riemannian Staircase [12]. RBCD is naturally well-suited to distributed synchronization problems (or similar optimization problems over sparse graphs) as it has cheap iterations, requires only local communication, provides privacy protection, and is easily parallelizable.

Under mild conditions, we prove that RBCD converges to first-order critical points for general optimization problems over product manifolds, with a global sublinear convergence rate. Furthermore, we show that the required conditions are satisfied in CSLAM and CNL.
Notations and Preliminaries
General Notations
Unless stated otherwise, lowercase and uppercase letters are generally used for vectors and matrices, respectively. We use $[n] \doteq \{1, \dots, n\}$ to denote the set of natural numbers up to and including $n$. Unless specified otherwise, letters $i$ and $j$ refer to indices of single poses or rotations, and $\alpha$ and $\beta$ refer to indices of robots.
Linear Algebra
$\mathbb{S}^n$ and $\mathbb{S}^n_+$ denote the sets of $n \times n$ symmetric and symmetric positive semidefinite matrices, respectively. $I_d$ is the $d \times d$ identity matrix, and $0$ and $1$ represent the vectors of all zeros and all ones, respectively. For a matrix $A$, we use $[A]_{ij}$ to index its $(i,j)$-th entry. Given a block-structured matrix $A$, $[A]_{ij}$ refers to its $(i,j)$-th block. Following [7], we define $\operatorname{BlockDiag}$ as the linear operator that extracts the diagonal blocks of its argument and zeros out all remaining blocks, and $\operatorname{SymBlockDiag}$ as its symmetric version; see [7, Equations (4)-(5)]. Finally, $\Pi_{\mathcal{S}}$ denotes the projection operator onto a given set $\mathcal{S}$.
Differential Geometry and Matrix Lie Groups
The orthogonal group is defined as,
$\mathrm{O}(d) \doteq \{ R \in \mathbb{R}^{d \times d} : R^\top R = I_d \}$.  (1)
The special orthogonal group is defined as,
$\mathrm{SO}(d) \doteq \{ R \in \mathrm{O}(d) : \det(R) = +1 \}$.  (2)
The special Euclidean group is defined as,
$\mathrm{SE}(d) \doteq \mathrm{SO}(d) \times \mathbb{R}^d$.  (3)
The Stiefel manifold is defined as,
$\mathrm{St}(d, r) \doteq \{ Y \in \mathbb{R}^{r \times d} : Y^\top Y = I_d \}$.  (4)
In general, we use $\mathcal{M}$ to denote a smooth matrix submanifold. For two manifolds $\mathcal{M}_1, \mathcal{M}_2$, $\mathcal{M}_1 \times \mathcal{M}_2$ denotes their product manifold. $\mathcal{M}^n$ denotes the $n$-th power manifold of $\mathcal{M}$. Similar to [7], we represent the product manifold and power manifold in matrix form as,
$\mathcal{M}_1 \times \mathcal{M}_2 \doteq \{ [X_1 \;\; X_2] : X_1 \in \mathcal{M}_1, \, X_2 \in \mathcal{M}_2 \}$,  (5)
$\mathcal{M}^n \doteq \{ [X_1 \;\; X_2 \;\; \cdots \;\; X_n] : X_i \in \mathcal{M}, \, \forall i \in [n] \}$.  (6)
On a manifold $\mathcal{M}$, $T_X\mathcal{M}$ (or $T_X$ for brevity) denotes the tangent space at $X \in \mathcal{M}$. The tangent space is endowed with the standard Riemannian metric induced from the ambient (Euclidean) space, i.e., $\langle V, W \rangle \doteq \operatorname{tr}(V^\top W)$, and the induced norm is $\|V\| \doteq \sqrt{\langle V, V \rangle}$. $\operatorname{Retr}$ denotes a retraction operator, with $\operatorname{Retr}_X$ being its restriction to $T_X\mathcal{M}$. For a function $f$, we use $\nabla f(X)$ and $\operatorname{grad} f(X)$ to denote the Euclidean and Riemannian gradients of $f$ at $X$. We call $X$ a first-order critical point if the corresponding Riemannian gradient is zero. Readers are referred to [15] for an excellent and comprehensive review of relevant differential geometry concepts for optimization on matrix submanifolds.
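These two primitives, projecting an ambient gradient onto a tangent space and retracting a tangent step back onto the manifold, are straightforward to realize in code. The sketch below assumes the $r \times d$ matrix representation of the Stiefel manifold and uses the standard tangent-space projection and a thin-QR retraction; the function names are illustrative, not from the paper.

```python
import numpy as np

def proj_stiefel_tangent(Y, G):
    """Project an ambient (Euclidean) gradient G onto the tangent space of
    the Stiefel manifold at Y, where Y is r x d with Y^T Y = I_d."""
    YtG = Y.T @ G
    return G - Y @ (0.5 * (YtG + YtG.T))

def qr_retraction(Y, V):
    """Retract the tangent vector V at Y back onto the Stiefel manifold
    via a thin QR factorization of Y + V."""
    Q, R = np.linalg.qr(Y + V)
    # Fix signs so that R has a positive diagonal (continuous retraction).
    return Q * np.sign(np.diag(R))
```

The projected vector satisfies the tangent-space condition (the symmetric part of $Y^\top V$ vanishes), and the retraction output is again a point on the manifold.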
2 Related Work
2.1 Centralized Solvers
Rosen et al. [7] developed SE-Sync, a state-of-the-art certifiably correct solver for PGO. SE-Sync solves an SDP relaxation of PGO after analytically eliminating the translation variables [16]. It is shown that under low noise, the SDP relaxation is guaranteed to be exact and hence can be used to extract a globally optimal solution to the original PGO problem. In addition to the theoretical low-noise guarantee, it is also empirically demonstrated that global optimality holds under typical noise levels encountered in robotic applications.
Despite the need to solve a large-scale SDP, SE-Sync often outperforms conventional sparse nonlinear least squares solvers in terms of runtime. This is mainly attributed to the Riemannian Staircase algorithm [12], which leverages the so-called Burer-Monteiro factorization [13] to search for low-rank solutions of the SDP. The Riemannian Staircase requires a numerical optimization algorithm to search for (preferably, second-order) critical points of a nonconvex optimization problem over the product of Stiefel manifolds. By default, SE-Sync uses the second-order Riemannian trust-region (RTR) method [14, 15]. RTR is a popular method due to its useful features such as provable global convergence to second-order critical points and a superlinear local convergence rate. In order to avoid inverting the Riemannian Hessian at each iteration, “inverse-free” techniques such as truncated conjugate gradient (tCG) are frequently used inside RTR to solve each trust-region subproblem. Unfortunately, in practice RTR is limited to the centralized setting since solving the trust-region subproblems requires extensive coordination among the agents, e.g., due to the delicate bookkeeping required to update the descent direction in tCG.
A similar centralized solver, named Cartan-Sync, is proposed by Briales and Gonzalez-Jimenez [8]. The main difference between SE-Sync and Cartan-Sync is that the latter directly performs SDP relaxation over PGO without first analytically eliminating the translations. As a result, the rank-restricted problems solved inside the Riemannian Staircase are defined over the direct product of Stiefel manifolds and the Euclidean space. Because of the noncompactness of this search space, the low-noise global optimality guarantees in [7] have not been extended to this case.
Similar SDP relaxations [9, 11, 17] have also been proposed for the closely related problems of angular synchronization and rotation synchronization (also known as rotation averaging [18] or synchronization over the special orthogonal group). This fundamental problem arises in a number of applications such as CNL [19, 20, 21], structure from motion [18], and other domains such as cryo-EM in structural biology [22]. In this paper, we study distributed rotation synchronization alongside pose synchronization, as our algorithm is applicable to both problems (see Section 4.3 for a specialized form of our algorithm applied to rotation synchronization).
This work follows a similar path as [7, 8] and considers solving the SDP relaxations of rotation and pose synchronization problems distributedly. Due to the need to preserve sparsity for distributed computation, we choose to pursue the alternative (sparse albeit noncompact) SDP relaxation used in Cartan-Sync [8]; see Remark 1. As one of our main contributions, we address the open problem concerning the theoretical properties of this SDP relaxation. In Section 3, we show that despite noncompactness, the alternative SDP relaxation enjoys the same performance guarantees as the original SDP relaxation employed by SE-Sync [7]. This result serves as a strong theoretical foundation that motivates us to design efficient distributed solvers in Section 4.
2.2 Decentralized Solvers
The work by Choudhary et al. [4] is currently the state of the art in distributed PGO solvers and has been recently used by modern decentralized CSLAM systems [1, 23]. Choudhary et al. [4] propose a two-stage approach for finding approximate solutions to PGO. The first stage estimates the rotation variables by solving a linear least squares problem after relaxing the nonconvex rotation constraints. The resulting solution is then projected back to the special orthogonal group. The rotation estimates are then used in the second stage to initialize a single Gauss-Newton iteration on the full pose synchronization problem. In both stages, classical distributed techniques such as Jacobi over-relaxation (JOR) and successive over-relaxation (SOR) [24] are used to solve the normal equations of the linear least squares problems. The experimental evaluations presented in [4] demonstrate that this approach significantly outperforms prior techniques [5, 6]. Nonetheless, the proposed approach still performs incomplete local search on a nonconvex problem and thus does not offer any performance guarantees.
In another line of work, Tron [19], Tron and Vidal [20], and Tron et al. [21] propose a multistage distributed Riemannian consensus protocol for CNL based on distributed execution of Riemannian gradient descent over $(\mathrm{SO}(3) \times \mathbb{R}^3)^n$, where $n$ is the number of cameras (agents). CNL can be seen as a special instance of collaborative PGO where each agent owns a single pose rather than an entire trajectory. In these works, the authors establish convergence to critical points and, under perfect (noiseless) measurements, convergence to globally optimal solutions. We present a specialized form of our distributed PGO algorithm for CNL in Sections 4.3 and 4.4.
2.3 BlockCoordinate Descent
Block-coordinate descent (BCD) methods (also known as Gauss-Seidel-type methods) are classical techniques [24] that have recently regained popularity in large-scale machine learning and numerical optimization [25, 26, 27, 28]. These methods are popular due to their simplicity, cheap iteration complexity, and flexibility in the parallel and distributed settings [24].
BCD is a natural choice for solving PGO (among other optimization problems over graphs) in the distributed setting due to the graphical decomposition of the underlying optimization problems. In SLAM, BCD-type techniques have been applied in the past [29, 30]. In computer vision, variants of the Weiszfeld algorithm have also been used for robust rotation averaging [18, 31]. The abovementioned works, however, use BCD for local search and thus cannot guarantee global optimality in rotation or pose synchronization problems. More recently, Eriksson et al. [9] propose a BCD-type algorithm for solving the SDP relaxation of rotation synchronization. Their row-by-row (RBR) solver extends the approach of Wen et al. [32] from SDPs with diagonal constraints to block-diagonal constraints. In small problems, RBR is shown to be comparable or better than the Riemannian Staircase approach [12] in terms of runtime. This approach, however, needs to store and manipulate a dense matrix, which is not sustainable in SLAM, where typical moderate-size problems are one to two orders of magnitude larger than the problems considered in [9]. We provide a runtime comparison between RBR and our algorithm in Section 7.2. Finally, although in principle this algorithm can be executed distributedly, the resulting scheme would not preserve the privacy of the participants.
This work is originally inspired by recent block-coordinate minimization algorithms for solving SDPs with diagonal constraints via the Burer-Monteiro approach [33, 34]. Our recent technical report [35] extends these algorithms and the global convergence rate analysis provided by Erdogdu et al. [34] from the unit sphere (SDPs with diagonal constraints) to the Stiefel manifold (SDPs with block-diagonal constraints). In this work, we further extend our initial results by providing a unified Riemannian BCD algorithm and its global convergence rate analysis, as well as proposing specialized schemes for collaborative rotation and pose synchronization.
3 Certifiably Correct PoseGraph Optimization
In this section, we formally introduce the pose-graph optimization (PGO) problem. We also review state-of-the-art certifiably correct PGO solvers based on SDP relaxations, together with how these SDPs are solved in practice using the Riemannian Staircase framework. Figure 1 summarizes the problems we introduce in this section and how they relate to each other. As our first technical contribution, we establish formal guarantees for the alternative SDP relaxation of PGO [8] used in this work; see Theorem 1 and Theorem 2.
3.1 PoseGraph Optimization (PGO)
In PGO, we need to estimate $n$ unknown rotations or poses from a set of noisy relative measurements. In graph terms, PGO can be modeled with a directed graph (pose graph) $G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} \doteq [n]$ and $\mathcal{E}$ correspond to the sets of unknown poses and relative measurements, respectively. In the rest of this paper, we make the standard assumption that $G$ is weakly connected.
3.1.1 Rotation Synchronization
In some applications, the unknown variables consist only of rotations rather than of full poses. Examples include CNL [20, 21], structure from motion pipelines [9], and initialization techniques for SLAM [36]. In these cases, the estimation problem is more frequently referred to as rotation synchronization or rotation averaging. Let $R_1, \dots, R_n \in \mathrm{SO}(d)$ denote the set of rotation variables. Following [7], we assume that for each edge $(i,j) \in \mathcal{E}$, the corresponding relative rotation measurement is generated from a Langevin distribution,
$\widetilde{R}_{ij} = \overline{R}_{ij} R^{\epsilon}_{ij}, \quad R^{\epsilon}_{ij} \sim \operatorname{Langevin}(I_d, \kappa_{ij}),$  (7)
where $\overline{R}_{ij}$ denotes the ground truth relative rotation.
Under the noise model (7), it can be shown that the maximum likelihood estimate (MLE) corresponds to the minimizer of the following nonconvex optimization problem.
Problem 1 (Rotation Synchronization).
$\min_{R_1, \dots, R_n} \; \sum_{(i,j) \in \mathcal{E}} \kappa_{ij} \, \| R_j - R_i \widetilde{R}_{ij} \|_F^2$  (8a)
subject to $R_i \in \mathrm{SO}(d), \; \forall i \in [n]$.  (8b)
3.1.2 Pose Synchronization
Next, we consider the estimation of full pose variables, which is the default setup in pose-graph SLAM. Analogous to Problem 1, this problem is also known as pose synchronization or motion averaging. Each variable is now an element of the special Euclidean group, represented (with a slight abuse of notation) as $T_i = (R_i, t_i) \in \mathrm{SE}(d)$, where $t_i \in \mathbb{R}^d$ represents the translation component of pose $i$. In addition to the relative rotation measurements (7), we also obtain relative translation measurements corrupted by additive Gaussian noise,
$\widetilde{t}_{ij} = \overline{t}_{ij} + t^{\epsilon}_{ij}, \quad t^{\epsilon}_{ij} \sim \mathcal{N}(0, \tau_{ij}^{-1} I_d),$  (9)
where $\overline{t}_{ij}$ denotes the ground truth relative translation. The MLE of pose synchronization corresponds to the minimizer of the following problem.
Problem 2 (Pose Synchronization).
$\min_{T_1, \dots, T_n} \; \sum_{(i,j) \in \mathcal{E}} \kappa_{ij} \, \| R_j - R_i \widetilde{R}_{ij} \|_F^2 + \tau_{ij} \, \| t_j - t_i - R_i \widetilde{t}_{ij} \|_2^2$  (10a)
subject to $T_i \in \mathrm{SE}(d), \; \forall i \in [n]$.  (10b)
3.1.3 Distributed PGO
In this work, we focus on solving Problems 1 and 2 in the distributed setting. More specifically, we consider two important real-world applications: CNL and CSLAM. In CNL, the cameras in a network need to localize each other with respect to a common reference. In this case, each vertex in the pose graph represents the rotation or pose of a single camera. Relative measurements are extracted between camera pairs with overlapping fields of view using standard two-view geometry techniques, and are subject to scale ambiguity in the case of monocular cameras. See Figure 1(a) for a simple illustration.
In CSLAM, multiple robots need to jointly estimate their trajectories in a common reference frame. In the pose graph, each vertex represents the pose of a robot at a certain time step. Odometric measurements and intra-robot loop closures connect poses within a single robot’s trajectory. When two robots visit the same part of the environment, they establish inter-robot loop closures that link their respective poses. See Figure 1(b) for a simple example. In this case, we can further divide vertices into two classes based on whether they have inter-robot loop closures.
Definition 1 (Public and private poses in CSLAM).
In CSLAM, all poses that share inter-robot loop closures with poses of other robots are called public poses. All other poses are private poses.
In the literature, public poses are also known as separators [6]. These poses play an important role in distributed PGO as they allow information to flow across the entire team. Specifically, during optimization, each agent (camera or robot) updates its own poses after receiving the updated public poses from its neighbors (i.e., those connected to it by loop closures in the pose graph). In Figure 1(b), public poses are marked in red and private poses in black. We note that with this definition, all poses in CNL would be characterized as public since each agent (camera) only has a single pose.
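The classification in Definition 1 is easy to compute locally from the edge list. A minimal sketch (the function and variable names are illustrative, not from the paper): a pose is public iff it is incident to at least one inter-robot edge.

```python
def classify_poses(edges, owner):
    """Split poses into public and private sets.

    edges: iterable of (i, j) pose-index pairs (pose-graph edges).
    owner: dict mapping each pose index to the id of the robot that owns it.
    A pose is public iff it appears in at least one inter-robot edge."""
    public = set()
    for i, j in edges:
        if owner[i] != owner[j]:
            public.update((i, j))
    private = set(owner) - public
    return public, private
```

For example, with two robots owning poses {0, 1} and {2, 3} and a single inter-robot loop closure (1, 2), poses 1 and 2 are public and poses 0 and 3 are private.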
3.2 SDP Relaxation for PGO
Traditionally, Problems 1 and 2 are solved with local search algorithms such as Gauss-Newton. However, depending on the noise level and the quality of initialization, local search algorithms are susceptible to local minima [37]. To address this critical issue, recent works aim to develop certifiably correct global PGO solvers. In particular, techniques based on SDP relaxation demonstrate state-of-the-art performance and, furthermore, provide strong theoretical guarantees under low-noise regimes [7, 9, 11]. In this section, we present SDP relaxations for Problems 1 and 2.
We begin with rotation synchronization (Problem 1). Let $R \doteq [R_1 \; \cdots \; R_n] \in \mathrm{SO}(d)^n$ denote the stacked rotation variables. It can be shown that the cost function (8a) in Problem 1 can be equivalently written as $\operatorname{tr}(\widetilde{Q} R^\top R)$, where $\widetilde{Q} \in \mathbb{S}^{dn}$ is the so-called connection Laplacian matrix [7]. Consider the “lifted” variable $Z \doteq R^\top R \in \mathbb{S}^{dn}_+$. Treating $Z$ as a $d \times d$-block-structured matrix, we can rewrite the constraints (8b) in Problem 1 as $[Z]_{ii} = I_d$ for all $i \in [n]$, $Z \succeq 0$, $\operatorname{rank}(Z) = d$, and $[Z]_{ij} \in \mathrm{SO}(d)$ for all $i, j \in [n]$. Dropping the last two nonconvex constraints yields the following SDP relaxation for Problem 1.
Problem 3 (SDP Relaxation for Rotation Synchronization).
$\min_{Z \in \mathbb{S}^{dn}} \; \operatorname{tr}(\widetilde{Q} Z)$  (11a)
subject to $[Z]_{ii} = I_d, \; \forall i \in [n], \quad Z \succeq 0$.  (11b)
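The equivalence between the least-squares cost (8a) and the quadratic form in the connection Laplacian is straightforward to check numerically. The sketch below assembles the connection Laplacian for a toy graph and compares the two cost expressions; the helper names and the block layout are illustrative assumptions, not code from the paper.

```python
import numpy as np

def random_rotation(d, rng):
    """Sample a random rotation via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, [0, 1]] = Q[:, [1, 0]]  # swap two columns to flip the sign
    return Q

def connection_laplacian(n, d, edges):
    """Assemble the d x d-block connection Laplacian of a rotation
    synchronization problem; `edges` holds tuples (i, j, kappa, R_tilde)."""
    L = np.zeros((n * d, n * d))
    for i, j, kap, Rt in edges:
        L[i*d:(i+1)*d, i*d:(i+1)*d] += kap * np.eye(d)
        L[j*d:(j+1)*d, j*d:(j+1)*d] += kap * np.eye(d)
        L[i*d:(i+1)*d, j*d:(j+1)*d] -= kap * Rt
        L[j*d:(j+1)*d, i*d:(i+1)*d] -= kap * Rt.T
    return L

def sync_cost(rotations, edges):
    """Least-squares rotation synchronization cost, as in (8a)."""
    return sum(kap * np.linalg.norm(rotations[j] - rotations[i] @ Rt, 'fro')**2
               for i, j, kap, Rt in edges)
```

For any stacked rotation matrix R and the Laplacian L built this way, tr(R L R^T) coincides with the edge-wise least-squares cost, which is exactly the rewriting that leads to the lifted variable and to Problem 3.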
Following similar steps, one can derive the SDP relaxation for the pose synchronization problem [8]. Denote the connection Laplacian of Problem 2 as $Q \in \mathbb{S}^{(d+1)n}$, and treat the SDP variable $Z \in \mathbb{S}^{(d+1)n}$ as a $(d+1) \times (d+1)$-block-structured matrix. Then, the SDP relaxation is given by,
Problem 4 (SDP Relaxation for Pose Synchronization [8]).
$\min_{Z \in \mathbb{S}^{(d+1)n}} \; \operatorname{tr}(Q Z)$  (12a)
subject to the rotation (upper-left $d \times d$) sub-block of $[Z]_{ii}$ being equal to $I_d$ for all $i \in [n]$, and $Z \succeq 0$.  (12b)
The original SE-Sync algorithm [7] employs a different SDP relaxation for Problem 2 by first using the separable structure of PGO [16] to analytically eliminate the translation variables, and subsequently performing convex relaxation over the reduced rotation-only problem. The resulting SDP has the form,
Problem 5 (Rotation-only SDP Relaxation for Pose Synchronization [7]).
$\min_{Z \in \mathbb{S}^{dn}} \; \operatorname{tr}(\widehat{Q} Z)$  (13a)
subject to $[Z]_{ii} = I_d, \; \forall i \in [n], \quad Z \succeq 0$.  (13b)
In (13a), $\widehat{Q} \in \mathbb{S}^{dn}$ is a dense cost matrix, and is essentially the generalized Schur complement of the sparse connection Laplacian $Q$ (see Appendix A). As the first technical contribution in this section, we establish the following theorem, which characterizes key relations between Problem 4 and Problem 5.
Theorem 1 (Informal).
Problems 4 and 5 attain the same optimal value, and their solution sets are in one-to-one correspondence, with corresponding minimizers having equal rank.
Theorem 1 suggests that, for pose synchronization, the elimination of translation variables does not affect the optimal value or the rank of solutions in the SDP relaxation. This result has a far-reaching impact, as it further allows us to establish equivalent low-noise guarantees for the two SDP relaxations. Specifically, Rosen et al. [7] show that under low noise, the rotation-only SDP (Problem 5) is exact, i.e., from its solution one can recover a global minimizer to the original nonconvex pose synchronization problem [7, Proposition 2]. Using this result and Theorem 1, we show that the same a priori guarantee can be established for our SDP relaxation (Problem 4), which was first used in [8] albeit without exactness guarantees. We give an informal statement below, and provide the formal theorem and its proof in Appendix B.
Theorem 2 (Exact Recovery via Problem 4, Informal).
Under sufficiently low measurement noise, Problem 4 is exact: a global minimizer to Problem 2 can be recovered directly from any solution of Problem 4.
Theorem 2 provides a strong theoretical justification for solving Problem 4: under low noise (which we characterize in Appendix B), one can directly read off a global minimizer to Problem 2 from the first block row of any minimizer of Problem 4. We note that this global optimality result usually holds on real-world datasets (see Section 7). Similar guarantees can be established for rotation synchronization, e.g., using a subset of the machinery in [7]; see also [9] for a similar result.
Remark 1.
Here we explain why we refrain from pursuing the original SDP relaxation (Problem 5) used by SE-Sync [7] in this work. Problem 5 enjoys several benefits, including a compact search space and better numerical conditioning. However, the cost matrix in Problem 5 is dense due to the Schur complement operation (see Appendix A). In graph terms, eliminating the translation variables makes the underlying dependency graph fully connected. This is a major drawback in the distributed setting, since robots’ public and private poses become fully dependent on other robots’ trajectories, which increases the communication costs substantially. As we shall see in the following sections, our proposed algorithms rely on and exploit the sparse graphical structure of the problem to achieve computational and communication efficiency, and to preserve the privacy of participating robots. For this reason, in this work we adopt the alternative approach of Briales and Gonzalez-Jimenez [8], which directly relaxes the original problem and thus preserves the essential sparsity structure (e.g., as in Figures 1(a) and 1(b)) for distributed optimization. As shown by Theorem 2, the alternative approach preserves the essential theoretical guarantees.
3.3 The Riemannian Staircase Algorithm
In typical CSLAM scenarios, the dimension of the SDP relaxation can be quite large, and thus it is often impractical to solve Problems 3 and 4 with interior-point methods. To address this issue, Burer and Monteiro in their seminal work [13] propose to solve rank-restricted versions of the original SDPs. This approach is justified by a theorem of Barvinok [38] and Pataki [39], which guarantees the existence of low-rank solutions for SDPs with compact search spaces. Specifically, applying this theorem to Problems 3 and 5 guarantees that both SDPs admit low-rank solutions [12]. As a direct consequence of Theorem 1, the same result holds for our noncompact SDP (Problem 4).
Corollary 1.
Problem 4 admits a minimizer $Z^\star$ satisfying the same rank bound as Problems 3 and 5.
For SDPs with block-diagonal constraints, Boumal [12] extends the general approach of Burer and Monteiro [13] by further exploiting the geometric structure within the problem. The result is an elegant algorithm known as the Riemannian Staircase, which has been used as the backend SDP solver in [7, 8]. In the Riemannian Staircase, we search for the SDP solution by solving a hierarchy of rank-restricted surrogates. At each level, we impose a rank-$r$ factorization ($r \geq d$) of the original SDP variable, i.e., by letting,
$Z = Y^\top Y, \quad Y \doteq [Y_1 \; \cdots \; Y_n] \in \mathrm{St}(d, r)^n$,  (14)
$Z = X^\top X, \quad X \doteq [Y_1 \; p_1 \; \cdots \; Y_n \; p_n], \; Y_i \in \mathrm{St}(d, r), \; p_i \in \mathbb{R}^r$,  (15)
in Problems 3 and 4, respectively. It can be shown that the resulting rank-restricted SDPs are smooth (albeit nonconvex) optimization problems on Cartesian products of Stiefel manifolds (and, in the case of (15), Euclidean spaces); see Problems 6 and 7 below.
Problem 6 (Rank-restricted SDP for Rotation Synchronization).
$\min_{Y \in \mathrm{St}(d, r)^n} \; \operatorname{tr}(\widetilde{Q} \, Y^\top Y)$  (16)
Problem 7 (Rank-restricted SDP for Pose Synchronization).
$\min_{X = [Y_1 \; p_1 \; \cdots \; Y_n \; p_n], \; Y_i \in \mathrm{St}(d, r), \; p_i \in \mathbb{R}^r} \; \operatorname{tr}(Q \, X^\top X)$  (17)
In principle, one may attempt to solve Problems 6 and 7 via Riemannian local search algorithms. Due to the nonconvex constraints in both problems, however, there is no a priori guarantee that the obtained solution is a global minimizer, or that it can be used to extract the SDP solution, at least for small values of $r$. Nevertheless, given any first-order critical point, we can obtain a post-hoc certificate of global optimality by verifying the KKT conditions of the original SDP [12]. We discuss the details of this verification procedure in Section 6.2.
Algorithm 1 summarizes the Riemannian Staircase algorithm for pose synchronization. At level $r$ of the hierarchy, we first solve the rank-restricted SDP using a local search algorithm (line 7). If the obtained first-order critical point passes the verification procedure, we use it to extract the solution to the SDP and the algorithm terminates (line 8). If not, the current solution is used to warm-start the local search at the next level of the hierarchy (line 10). Although the Riemannian Staircase is an iterative procedure, in practice it typically identifies the SDP solution at the first level; see Section 7.
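The control flow just described can be sketched in a few lines, with the local search and verification steps left as placeholder callbacks, since their actual implementations are the subject of Sections 4 and 6.2. The function and callback names below are illustrative, not taken from Algorithm 1.

```python
import numpy as np

def riemannian_staircase(Y0, local_search, certify, r_max):
    """Sketch of the Riemannian Staircase loop. `local_search(Y)` returns a
    first-order critical point of the rank-restricted SDP at the current
    rank (the number of rows of Y), and `certify(Y)` checks the KKT
    conditions of the full SDP. Both callbacks are placeholders."""
    Y = local_search(Y0)
    while not certify(Y) and Y.shape[0] < r_max:
        # Not yet certified: move to the next level of the hierarchy by
        # embedding Y at rank r + 1 (append a zero row), then re-solve.
        Y = local_search(np.vstack([Y, np.zeros((1, Y.shape[1]))]))
    return Y
```

The zero-row embedding preserves the current factorization $Y^\top Y$, so the next level starts from a point whose cost equals that of the previous critical point.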
We conclude this section by discussing the implications of using a first-order local search method within the Riemannian Staircase (Algorithm 1, line 7). Boumal et al. [40] show that for almost all SDPs with compact search spaces and linearly independent constraints, if $r(r+1)/2 > m$, where $m$ is the number of SDP constraints, any second-order critical point of the corresponding rank-restricted SDP is globally optimal. This result motivates the use of second-order local search algorithms (e.g., RTR) within Algorithm 1, since for sufficiently large $r$, these algorithms are guaranteed to converge to global minimizers at the first level. Nonetheless, in the distributed setting, it is challenging to design algorithms with second-order convergence guarantees. For this reason, we decide to pursue first-order convergence (Sections 4-5) and employ verification (Section 6.2) as a post-hoc certificate of global optimality.
We note that in reality, there are also several practical issues when applying the theorem of Boumal et al. [40] to the problems considered in this work. First, the theorem in its current form does not hold for pose synchronization, since the SDP relaxation (Problem 4) has a noncompact search space. Second, the resulting bound on $r$ is too conservative for typical PGO problems. As we mentioned earlier, in practice global optimality often holds for much smaller $r$; see Section 7 and [7, 8].
4 Riemannian Block-Coordinate Descent
In this section, we propose a distributed local search algorithm to find first-order critical points of the rank-restricted SDPs inside the Riemannian Staircase framework. For both CNL and CSLAM, the vertex set V of the pose graph admits a natural disjoint partition V = V_1 ∪ ⋯ ∪ V_N. In CNL, each block corresponds to a single camera, i.e., N = |V| and V_i = {i}. In CSLAM, each block corresponds to the trajectory of a single robot, i.e., N is the number of robots and V_i contains the indices of the pose variables owned by robot i.
To leverage these natural partitions (as well as the more general blocking schemes discussed in Section 4.5), we propose Riemannian block-coordinate descent (RBCD) as the distributed local search algorithm within the Riemannian Staircase. At an abstract level, RBCD solves a general optimization problem over the direct product of matrix submanifolds,
\min_{x \in \mathcal{M}} f(x), \qquad \mathcal{M} = \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_N. \quad (18)
Given a disjoint partition of the variables into blocks, each iteration of RBCD optimizes the variables in a single block b. For now, this means that in each iteration only a single agent (e.g., camera or robot) updates its decision variables, while the other agents remain idle. We address this limitation in Section 4.5 by providing highly effective parallel execution schemes for RBCD. Algorithm 2 gives the pseudocode for RBCD.
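As a structural sketch (not the paper's Algorithm 2 verbatim), the RBCD loop can be written in a few lines; here `update_block` stands for any block update that does not increase the cost, e.g., the RTR step of Section 4.2 or the closed-form updates of Sections 4.3-4.4. All names are illustrative.

```python
import numpy as np

def rbcd(blocks, update_block, num_iters=100, seed=0):
    """Riemannian block-coordinate descent (sketch).

    blocks:       list of per-agent variables (points on matrix submanifolds)
    update_block: update_block(blocks, b) -> new value for block b
    """
    rng = np.random.default_rng(seed)
    for _ in range(num_iters):
        b = int(rng.integers(len(blocks)))    # uniform block selection (Sec. 4.1)
        blocks[b] = update_block(blocks, b)   # only agent b is active this round
    return blocks

# Toy usage: each "manifold" is just the real line, and the optimal block
# update snaps block b to the minimizer of a separable quadratic.
target = [1.0, -2.0, 0.5]
result = rbcd([0.0, 0.0, 0.0], lambda xs, b: target[b], num_iters=50)
```

In the distributed setting the loop body is executed by the selected agent alone, after it has received its neighbors' (fixed) public variables over the network.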
The rest of this section discusses the details of each step in Algorithm 2. We begin with block selection rules in Section 4.1, which determine how blocks are chosen at each iteration of RBCD (line 8). Then, in Section 4.2, we propose a general block update rule for an arbitrary manifold optimization problem. In Sections 4.3-4.4, we focus on the special cases of rotation and pose synchronization in CNL, and derive optimal block update rules in these contexts. Finally, in Section 4.5, we discuss parallel execution schemes that further accelerate RBCD in practice.
4.1 Block Selection Rules
In this section, we describe three mechanisms for selecting the block to update at each iteration of RBCD (line 8). The first two block selection rules are based on random sampling [41, 34]. At each iteration, block b is selected with probability p_b. The simplest such rule is uniform sampling, in which all blocks are selected with equal probability, i.e., p_b = 1/N.
In practice, it is often the case that selecting certain blocks leads to a significantly larger decrement of the cost function than others. It is therefore natural to assign these blocks higher weights during sampling. We refer to this block selection rule as importance sampling. In this work, we set the probability of selecting each block to be proportional to the squared norm of the corresponding block of the Riemannian gradient, i.e., p_b ∝ ‖grad_b f(x)‖². Under Lipschitz-type conditions, it can be shown that the squared gradient norm provides a lower bound on the cost decrement along the direction of the negative gradient [25].
We can also turn importance sampling into a deterministic strategy: at each iteration, we directly choose the block with the largest squared gradient norm, i.e., b = argmax_k ‖grad_k f(x)‖². We refer to this strategy as greedy selection, and more specifically the Gauss-Southwell (GS) rule [25]. Recent works also propose other variants of greedy selection, such as Gauss-Southwell-Lipschitz (GSL) and Gauss-Southwell-Quadratic (GSQ) [25]. However, these rules require additional knowledge of the block Lipschitz constants, which is often hard to obtain in practice. For this reason, we restrict our deterministic selection rule to GS. Despite its simpler nature, the GS rule already demonstrates satisfactory empirical performance; see Section 7.
In practice, although importance sampling and greedy selection tend to produce more effective iterations, they also incur additional coordination and communication overhead in the distributed setting. For example, with greedy selection, at the beginning of each iteration the agents need to coordinate to find the block with the maximum squared gradient norm. In contrast, uniform sampling incurs minimal overhead in the block selection stage.
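The three selection rules above can be summarized in a few lines. Assuming each agent reports its squared Riemannian gradient norm, a coordinator (or any agreed-upon protocol) can pick the next block as follows; the function and rule names are illustrative.

```python
import numpy as np

def select_block(grad_sq_norms, rule="uniform", rng=None):
    """Choose the next block to update (Section 4.1).

    grad_sq_norms: per-block squared Riemannian gradient norms
    """
    rng = rng or np.random.default_rng()
    g = np.asarray(grad_sq_norms, dtype=float)
    if rule == "uniform":      # p_b = 1/N, minimal coordination overhead
        return int(rng.integers(len(g)))
    if rule == "importance":   # p_b proportional to the squared gradient norm
        return int(rng.choice(len(g), p=g / g.sum()))
    if rule == "greedy":       # Gauss-Southwell: deterministic argmax
        return int(np.argmax(g))
    raise ValueError(f"unknown rule: {rule}")
```

Note that both importance sampling and the GS rule require all current block gradient norms, which is exactly the communication overhead discussed above.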
4.2 Block Update with Riemannian Trust-Region (RTR)
In this section, we describe our default method for performing the block update at each iteration of RBCD (line 9). Let x_b be the component of x corresponding to the selected block b, and let x_c be the complement of x_b in x, whose value is fixed at this iteration. Define the reduced cost function f_b(x_b) ≜ f(x_b, x_c). To perform the block update, we solve the reduced problem,
\min_{x_b \in \mathcal{M}_b} f_b(x_b). \quad (19)
As a concrete example, consider solving the rank-restricted SDP of rotation synchronization (Problem 6) in the context of CSLAM. Recall that in this case, each block Y_i corresponds to the variables owned by robot i in Problem 6. Fixing all other robots' variables, it can be shown that the reduced problem defined over Y_i is of the form,
\min_{Y_i} \; \mathrm{tr}(Q_i Y_i^{\top} Y_i) + 2\,\mathrm{tr}(F_i^{\top} Y_i), \quad (20)
where Q_i is the submatrix of Q formed by the rows and columns that correspond to robot i's variables, and F_i is a constant matrix that depends on the (fixed) public variables of robot i's neighbors.
Similarly, for pose synchronization (Problem 7), let X_i be the set of variables corresponding to the trajectory of robot i. Fixing the variables of all other robots, the reduced problem over X_i is of the form,
\min_{X_i} \; \mathrm{tr}(G_i X_i^{\top} X_i) + 2\,\mathrm{tr}(H_i^{\top} X_i), \quad (21)
where the constant matrices G_i and H_i have interpretations similar to those of Q_i and F_i in (20).
The general problem (19) and its particular instances (20) and (21) involve only the local variables of each agent. Therefore, after receiving the fixed public variables of its neighbors over the network, each agent can solve its reduced problem locally and independently. Nevertheless, due to the manifold constraints, these problems are in general nonconvex and thus hard to solve to global optimality.
A natural alternative is to perform an inexact update that sufficiently decreases the reduced cost. For this purpose, a variety of optimization algorithms on matrix manifolds can be considered. In this work, we select the popular second-order Riemannian trust-region (RTR) algorithm [15, Chapter 7] to solve the reduced problems. Compared to first-order alternatives such as Riemannian gradient descent, RTR uses second-order information about the (local) reduced problem to speed up optimization. Empirically, we observe that in most cases a single iteration of RTR yields sufficient descent on the cost function. As mentioned above, since by design each block corresponds to the decision variables of a single agent, RTR iterations (within RBCD) are executed locally by the selected agent. More details and extensive analysis of the algorithm are provided in Section 5.
Accelerating RTR via Preconditioning
In practice, most instances of (20) and (21) in CSLAM are poorly conditioned, i.e., the condition numbers of Q_i and G_i are quite large. In these cases, suitable preconditioning can significantly speed up numerical optimization. In general, a Riemannian preconditioner for the reduced problem is a linear, symmetric, and positive-definite operator that approximates the inverse of the Riemannian Hessian. In the particular case of SLAM, Rosen and Carlone [42] and Briales and Gonzalez-Jimenez [8] have already proposed empirically effective preconditioners for problems similar in nature to (20) and (21).² The main idea is to approximate the directional derivatives of the Riemannian gradient with those of the Euclidean gradient in the ambient space. More specifically, we let our preconditioners be,

²The only difference is the additional linear terms in our cost functions, which result from anchoring variables owned by other robots.
\mathrm{Precon}_{Y_i}(\eta) = \mathrm{Proj}_{T_{Y_i}}\!\big( \eta \, (Q_i + \lambda I)^{-1} \big), \quad (22)
\mathrm{Precon}_{X_i}(\eta) = \mathrm{Proj}_{T_{X_i}}\!\big( \eta \, (G_i + \lambda I)^{-1} \big), \quad (23)
for problems (20) and (21), respectively. The small constant λ > 0 ensures that the proposed preconditioners are positive definite. It is straightforward to verify that (22) and (23) are linear and symmetric; e.g., see [8, Appendix VI]. In practice, each robot can compute and reuse the Cholesky decompositions of Q_i + λI and G_i + λI for improved numerical efficiency.
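A minimal sketch of this caching scheme follows, assuming the preconditioner right-multiplies a tangent vector η by the regularized inverse (Q_i + λI)^{-1} and omitting the tangent-space projection step for brevity; the class and variable names are ours.

```python
import numpy as np

class CachedPreconditioner:
    """Caches the Cholesky factor of (Q + lam*I) and reuses it every call.

    Sketch of the preconditioners (22)-(23); the tangent-space
    projection step is omitted here.
    """

    def __init__(self, Q, lam=1e-3):
        n = Q.shape[0]
        self.A = Q + lam * np.eye(n)          # regularized, hence positive definite
        self.L = np.linalg.cholesky(self.A)   # computed once, reused below

    def apply(self, eta):
        # Right-multiply eta by A^{-1}: solve A Z^T = eta^T using L L^T = A.
        z = np.linalg.solve(self.L, eta.T)
        return np.linalg.solve(self.L.T, z).T
```

Reusing the factor turns each preconditioner application into two cheap triangular solves, which matters because the preconditioner is applied many times per RTR call while Q_i stays fixed.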
4.3 Optimal Update for Single Rotation
The RTR-based block update described in the previous section works for arbitrary instances of the reduced problem (19). Nevertheless, for Problem 6 in the context of CNL, it is possible to perform an optimal update by solving each reduced problem exactly. Recall that in CNL, each block in Problem 6 contains a single variable Y_i ∈ St(d, r) that corresponds to the "lifted" rotation of a single camera. After fixing all other variables Y_j for j ≠ i, the reduced problem over Y_i is,
\min_{Y_i \in \mathrm{St}(d,r)} \; \sum_{j \in N_{\mathrm{out}}(i)} \kappa_{ij} \| Y_j - Y_i \widetilde{R}_{ij} \|_F^2 + \sum_{j \in N_{\mathrm{in}}(i)} \kappa_{ji} \| Y_i - Y_j \widetilde{R}_{ji} \|_F^2, \quad (24)
where N_out(i) and N_in(i) distinguish the neighbors of i in the pose graph based on edge orientation.
Problem (24) is similar to an instance of the single rotation averaging problem [18]. After a series of algebraic manipulations [35], one can show that (24) is equivalent to minimizing the cost function −⟨S_i, Y_i⟩ for a constant matrix S_i defined as,
S_i = \sum_{j \in N_{\mathrm{out}}(i)} \kappa_{ij} Y_j \widetilde{R}_{ij}^{\top} + \sum_{j \in N_{\mathrm{in}}(i)} \kappa_{ji} Y_j \widetilde{R}_{ji}. \quad (25)
The solution to this problem is thus given by,
Y_i^{\star} = \mathrm{Proj}_{\mathrm{St}(d,r)}(S_i), \quad (26)
where the projection onto the Stiefel manifold can be computed in closed form via the singular value decomposition (SVD). Specifically, let S_i = U Σ V^⊤ be a thin, rank-d SVD of S_i. Then the projection is given by Proj_{St(d,r)}(S_i) = U V^⊤ [43, Theorem 2.1]. In practice, S_i is a small r × d matrix, so the SVD operation is very cheap, which makes each optimal update very efficient.
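The SVD-based projection in (26) is a one-liner in practice. The sketch below assumes the convention that elements of St(d, r) are r × d matrices with orthonormal columns; the example sizes r = 5, d = 3 are illustrative.

```python
import numpy as np

def project_to_stiefel(S):
    """Closed-form projection onto St(d, r) via a thin SVD: Proj(S) = U V^T."""
    U, _, Vt = np.linalg.svd(S, full_matrices=False)
    return U @ Vt

# The result has orthonormal columns, and points already on the
# manifold are fixed by the projection.
S = np.random.default_rng(1).normal(size=(5, 3))  # small r x d matrix
Y = project_to_stiefel(S)
```

Because the SVD is taken of an r × d matrix with tiny r and d, its cost is negligible compared to the communication round that precedes each update.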
4.4 Optimal Update for Single Pose
Similarly, an optimal update can be derived for Problem 7 in CNL. As in the previous section, each block contains a single variable, corresponding here to the "lifted" pose (Y_i, p_i) of a single camera, with Y_i ∈ St(d, r) and p_i ∈ R^r. After fixing all other blocks, the reduced optimization problem is,
\min_{Y_i,\, p_i} \; \sum_{j \in N_{\mathrm{out}}(i)} \Big( \kappa_{ij} \| Y_j - Y_i \widetilde{R}_{ij} \|_F^2 + \tau_{ij} \| p_j - p_i - Y_i \widetilde{t}_{ij} \|_2^2 \Big) + \sum_{j \in N_{\mathrm{in}}(i)} \Big( \kappa_{ji} \| Y_i - Y_j \widetilde{R}_{ji} \|_F^2 + \tau_{ji} \| p_i - p_j - Y_j \widetilde{t}_{ji} \|_2^2 \Big) \quad (27)
subject to \; Y_i \in \mathrm{St}(d, r), \; p_i \in \mathbb{R}^r.
In (27), the optimization with respect to p_i is an (unconstrained) linear least-squares problem. We can thus analytically eliminate p_i from (27) and form a further reduced problem involving only Y_i. Specifically, given any value of Y_i, the optimal value of p_i (as a function of Y_i) is given by the following weighted average in R^r,
p_i^{\star}(Y_i) = \frac{1}{\tau_i} \Big( \sum_{j \in N_{\mathrm{out}}(i)} \tau_{ij} \big( p_j - Y_i \widetilde{t}_{ij} \big) + \sum_{j \in N_{\mathrm{in}}(i)} \tau_{ji} \big( p_j + Y_j \widetilde{t}_{ji} \big) \Big), \quad (28)
\tau_i \triangleq \sum_{j \in N_{\mathrm{out}}(i)} \tau_{ij} + \sum_{j \in N_{\mathrm{in}}(i)} \tau_{ji}. \quad (29)
To derive the reduced problem over Y_i, we substitute (28) into the original cost (27). Using the fact that Y_i^⊤ Y_i = I_d, we can simplify the resulting problem into the form of minimizing −⟨Ŝ_i, Y_i⟩, where Ŝ_i is a constant matrix of the form,
\widehat{S}_i = S_i + T_i. \quad (30)
In (30), S_i is the same cost matrix as defined in (25), and T_i summarizes the costs induced by the translation measurements. We give the exact expression for T_i in Appendix G. As in Section 4.3, the optimal Y_i is given by the projection,
Y_i^{\star} = \mathrm{Proj}_{\mathrm{St}(d,r)}(S_i + T_i). \quad (31)
Afterwards, the optimal p_i is recovered by substituting Y_i^⋆ into (28).
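The single-pose update, projecting for Y_i as in (31) and then recovering p_i by the weighted average (28), takes only a few lines. The edge containers, measurement conventions, and the precomputed matrix `S_pose` (standing for S_i + T_i) below are illustrative assumptions, not the paper's exact notation.

```python
import numpy as np

def single_pose_update(S_pose, out_edges, in_edges):
    """Closed-form single-pose block update (Section 4.4), as a sketch.

    S_pose:    r x d matrix S_i + T_i from (30), assumed precomputed
    out_edges: iterable of (tau_ij, p_j, t_ij) for edges i -> j
    in_edges:  iterable of (tau_ji, p_j, Y_j, t_ji) for edges j -> i
    """
    # Optimal lifted rotation: projection onto the Stiefel manifold, cf. (31).
    U, _, Vt = np.linalg.svd(S_pose, full_matrices=False)
    Y = U @ Vt
    # Optimal translation: weighted average of neighbor predictions, cf. (28).
    num, den = 0.0, 0.0
    for tau, p_j, t_ij in out_edges:      # edge i -> j predicts p_i = p_j - Y_i t_ij
        num = num + tau * (p_j - Y @ t_ij)
        den += tau
    for tau, p_j, Y_j, t_ji in in_edges:  # edge j -> i predicts p_i = p_j + Y_j t_ji
        num = num + tau * (p_j + Y_j @ t_ji)
        den += tau
    return Y, num / den
```

Note that both steps are closed-form, so a single camera's update requires no inner iterations at all, unlike the RTR-based update of Section 4.2.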
4.5 Parallel Execution
So far we have assumed that the decision variables are partitioned into N (disjoint) blocks, where each block contains the variables owned by a single agent. At each iteration of RBCD, only one agent updates its block while the others remain idle. However, the natural graphical decomposition of the objectives in Problems 6 and 7 (inherited from Problems 1 and 2, respectively) allows us to update multiple blocks in parallel, provided that every pair of updated blocks is disconnected in the graphical representation. For example, in CNL a subset of cameras can be updated in parallel if no inter-camera measurements connect any pair of them. Similarly, in CSLAM a subset of trajectories (corresponding to multiple robots) can be updated in parallel if there are no inter-robot loop closures between them.
Definition 2 (Proper Aggregate Blocking Scheme).
Let . We call