Distributed Mapping with Privacy and Communication Constraints: Lightweight Algorithms and Objectbased Models
Abstract
We consider the following problem: a team of robots is deployed in an unknown environment and it has to collaboratively build a map of the area without a reliable infrastructure for communication. The backbone for modern mapping techniques is pose graph optimization, which estimates the trajectory of the robots, from which the map can be easily built. The first contribution of this paper is a set of distributed algorithms for pose graph optimization: rather than sending all sensor data to a remote sensor fusion server, the robots exchange very partial and noisy information to reach an agreement on the pose graph configuration. Our approach can be considered as a distributed implementation of the twostage approach of Carlone et al. (2015b), where we use the Successive OverRelaxation (SOR) and the Jacobi OverRelaxation (JOR) as workhorses to split the computation among the robots. We also provide conditions under which the proposed distributed protocols converge to the solution of the centralized twostage approach. As a second contribution, we extend the proposed distributed algorithms to work with objectbased map models. The use of objectbased models avoids the exchange of raw sensor measurements (e.g., point clouds or RGBD data) further reducing the communication burden. Our third contribution is an extensive experimental evaluation of the proposed techniques, including tests in realistic Gazebo simulations and field experiments in a military test facility. Abundant experimental evidence suggests that one of the proposed algorithms (the Distributed GaussSeidel method or DGS) has excellent performance. The DGS requires minimal information exchange, has an anytime flavor, scales well to large teams (we demonstrate mapping with a team of 50 robots), is robust to noise, and is easy to implement. Our field tests show that the combined use of our distributed algorithms and objectbased models reduces the communication requirements by several orders of magnitude and enables distributed mapping with large teams of robots in realworld problems.
itemizestditemize \newrefformatprobProblem LABEL:#1 \newrefformatdefDefinition LABEL:#1 \newrefformatsecSection LABEL:#1 \newrefformatsubSection LABEL:#1 \newrefformatpropProposition LABEL:#1 \newrefformatappAppendix LABEL:#1 \newrefformatalgAlgorithm LABEL:#1 \newrefformatcorCorollary LABEL:#1 \newrefformatthmTheorem LABEL:#1 \newrefformatlemLemma LABEL:#1 \newrefformatfigFig. LABEL:#1 \newrefformattabTable LABEL:#1
Siddharth Choudhary, School of Interactive Computing, College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
1 Introduction
The deployment of large teams of cooperative autonomous robots has the potential to enable fast information gathering, and more efficient coverage and monitoring of vast areas. For military applications such as surveillance, reconnaissance, and battle damage assessment, multirobot systems promise more efficient operation and improved robustness in contested spaces. In civil applications (e.g., pollution monitoring, precision agriculture, search and rescue, disaster response), the use of several inexpensive, heterogeneous, agile platforms is an appealing alternative to monolithic single robot systems.
The deployment of multi robot systems in the real world poses many technical challenges, ranging from coordination and formation control, to task allocation and distributed sensor fusion. In this paper we tackle a specific instance of the sensor fusion problem. We consider the case in which a team of robots explores an unknown environment and each robot has to estimate its trajectory from its own sensor data and by leveraging information exchange with the teammates. Trajectory estimation, also called pose graph optimization, is relevant as it constitutes the backbone for many estimation and control tasks (e.g., geotagging sensor data, mapping, positionaware task allocation, formation control). Indeed, in our application, trajectory estimation enables distributed 3D mapping and localization (Fig. 1).
We consider a realistic scenario, in which the robots can only communicate when they are within a given distance. Moreover, also during a rendezvous (i.e., when the robots are close enough to communicate) they cannot exchange a large amount of information, due to bandwidth constraints. Our goal is to design a technique that allows each robot to estimate its own trajectory, while asking for minimal knowledge of the trajectory of the teammates. This “privacy constraint” has a clear motivation in a military application: in case one robot is captured, it cannot provide sensitive information about the areas covered by the other robots in the team. Similarly, in civilian applications, one may want to improve the localization of a device (e.g., a smart phone) by exchanging information with other devices, while respecting users’ privacy. Ideally, we want our distributed mapping approach to scale to very large teams of robots. Our ultimate vision is to deploy a swarm of agile robots (e.g., micro aerial vehicles) that can coordinate by sharing minimal information and using onboard sensing and computation. The present paper takes a step in this direction and presents distributed mapping techniques that are shown to be extremely effective in large simulations (with up to 50 robots) and in realworld problem (with up to 4 robots).
Contribution. We consider a distributed maximumlikelihood (ML) trajectory estimation problem in which the robots have to collaboratively estimate their trajectories while minimizing the amount of exchanged information. We focus on a fully 3D case, as this setup is of great interest in many robotics applications (e.g., navigation on uneven terrain, underwater and aerial vehicles). We also consider a fully distributed setup, in which the robots communicate and acquire relative measurements only during rendezvous.
We present two key contributions to solve the distributed mapping problem. The first contribution are a set of distributed algorithms that enable distributed inference at the estimation backend. This contribution is presented in Section 3. Our approach can be understood as a distributed implementation of the chordal initialization discussed in Carlone et al. (2015b). The chordal initialization (recalled in Section 3.2) consists in approximating the ML trajectory estimate by solving two quadratic optimization subproblems. The insight of the present work is that these quadratic subproblems can be solved in a distributed fashion, leveraging distributed linear system solvers. In particular, we investigate distributed implementations of the Jacobi OverRelaxation and the Successive OverRelaxation. These distributed solvers imply a communication burden that is linear in the number of rendezvous among the robots. Moreover, they do not rely on the availability of an accurate initial guess as in related work (see Section 2). In Section 3.3 we discuss conditions under which the distributed algorithms converge to the same estimate of the chordal initialization (Carlone et al., 2015b), which has been extensively shown to be accurate and resilient to measurement noise.
The second contribution is the use of highlevel objectbased models at the estimation frontend and as map representation. This contribution is presented in Section 4. Traditional approach for multi robot mapping rely on featurebased maps which are composed of low level primitives like points and lines (Davison et al., 2007). These maps become memoryintensive for longterm operation, contain a lot of redundant information (e.g., it is unnecessary to represent a planar surface with thousands of points), and lack the semantic information necessary for performing wider range of tasks (e.g., manipulation tasks, humanrobot interaction). To solve these issues, we present an approach for multi robot SLAM which uses object landmarks (SalasMoreno et al., 2013) in a multi robot mapping setup. We show that this approach further reduces the information exchange among robots, results in compact humanunderstandable maps, and has lower computational complexity as compared to lowlevel featurebased mapping.
The third contribution is an extensive experimental evaluation including realistic simulations in Gazebo and field tests in a military facility. This contribution is presented in Section 5. The experiments demonstrate that one of the proposed algorithms, namely the Distributed GaussSeidel method, provides accurate trajectory estimates, is more parsimonious, communicationwise, than related techniques, scales to large tea, and is robust to noise. Finally, our field tests show that the combined use of our distribute algorithms and objectbased models reduces the communication requirements by several orders of magnitude and enables distributed mapping with large teams of robots.
Section 6 concludes the paper and discusses current and future work towards realworld robust mapping with large swarms of flying robots.
2 Related Work
Multi Robot Localization and Mapping. Distributed estimation in multi robot systems is currently an active field of research, with special attention being paid to communication constraints (Paull et al., 2015), heterogeneous teams (Bailey et al., 2011; Indelman et al., 2012), estimation consistency (Bahr et al., 2009), and robust data association (Indelman et al., 2014; Dong et al., 2015). Robotic literature offers distributed implementations of different estimation techniques, including Extended Kalman filters (Roumeliotis and Bekey, 2002; Zhou and Roumeliotis, 2006), information filters (Thrun and Liu, 2003), and particle filters (Howard, 2006; Carlone et al., 2011). More recently, the community reached a large consensus on the use of maximum likelihood (ML) estimation (maximum aposteriori, in presence of priors), which, applied to trajectory estimation, is often referred to as pose graph optimization or posebased SLAM. ML estimators circumvent wellknown issues of Gaussian filters (e.g., buildup of linearizion errors) and particle filters (e.g., particle depletion), and frame the estimation problem in terms of nonlinear optimization. In multi robot systems, ML trajectory estimation can be performed by collecting all measurements at a centralized inference engine, which performs the optimization (Andersson and Nygards, 2008; Kim et al., 2010; Bailey et al., 2011). Variants of these techniques invoke partial exchange of raw or preprocessed sensor data (Lazaro et al., 2011; Indelman et al., 2014).
In many applications, however, it is not practical to collect all measurements at a single inference engine. When operating in hostile environment, a single attack to the centralized inference engine (e.g., one of the robot) may threaten the operation of the entire team. Moreover, the centralized approach requires massive communication and large bandwidth. Furthermore, solving trajectory estimation over a large team of robots can be too demanding for a single computational unit. Finally, the centralized approach poses privacy concerns as it requires to collect all information at a single robot; if an enemy robot is able to deceive the other robots and convince them that it is part of the team, it can easily gather sensitive information (e.g., trajectory covered and places observed by every robot). These reasons triggered interest towards distributed trajectory estimation, in which the robots only exploit local communication, in order to reach a consensus on the trajectory estimate. Nerurkar et al. (2009) propose an algorithm for cooperative localization based on distributed conjugate gradient. Franceschelli and Gasparri (2010) propose a gossipbased algorithm for distributed pose estimation and investigate its convergence in a noiseless setup. Aragues et al. (2011) use a distributed Jacobi approach to estimate a set of 2D poses, or the centroid of a formation (Aragues et al., 2012a). Aragues et al. (2012b) investigate consensusbased approaches for map merging. Knuth and Barooah (2013) estimate 3D poses using distributed gradient descent. Cunningham et al. (2010) use Gaussian elimination, and develop an approach, called DDFSAM, in which each robot exchange a Gaussian marginal over the separators (i.e., the variables shared by multiple robots); the approach is further extended in Cunningham et al. (2013), to avoid storage of redundant data.
Another related body of work is the literature on parallel and hierarchical approaches for mapping. Also in this case, Gaussian elimination and Schur complement have been used as a key component for hierarchical approaches for largescale mapping (Ni and Dellaert, 2010; Grisetti et al., 2010; Suger et al., 2014). Decoupled stochastic mapping was one of the earliest approach for submapping proposed by Leonard and Feder (2001). Leonard and Newman (2003) propose a constanttime SLAM solution which achieves nearoptimal results under the assumption that the robot makes repeated visits to all regions of the environment. Frese et al. (2005) use multilevel relaxations resulting in a linear time update. Frese (2006) propose the TreeMap algorithm which divides the environment according to a balanced binary tree. Estrada et al. (2005) present a hierarchical SLAM approach which consist of a set of local maps and enforces loop consistency when calculating the optimal estimate at the global level. Ni et al. (2007) present an exact submapping approach within a ML framework, and propose to cache the factorization of the submaps to speedup computation. Grisetti et al. (2010) propose hierarchical updates on the map: whenever an observation is acquired, the highest level of the hierarchy is modified and only the areas which are substantially modified are changed at lower levels. Ni and Dellaert (2010) extend their previous approach (Ni et al., 2007) to include multiple levels and use nested dissection to minimize the dependence between two subtrees. Grisetti et al. (2012) compute a good initial estimate for global alignment through a submapping approach. Zhao et al. (2013) propose an approximation for largescale SLAM by solving for a sequence of submaps and joining them in a divideandconquer manner using linear least squares. Suger et al. (2014) present an approximate SLAM approach based on hierarchical decomposition to reduce the memory consumption for pose graph optimization.
While Gaussian elimination has become a popular approach it has two major shortcomings. First, the marginals to be exchanged among the robots are dense, and the communication cost is quadratic in the number of separators. This motivated the use of sparsification techniques to reduce the communication cost (Paull et al., 2015). The second reason is that Gaussian elimination is performed on a linearized version of the problem, hence these approaches require good linearization points and complex bookkeeping to ensure consistency of the linearization points across the robots (Cunningham et al., 2013). The need of a linearization point also characterizes gradientbased techniques (Knuth and Barooah, 2013). In many practical problems, however, no initial guess is available, and one has to develop adhoc initialization techniques, e.g., (Indelman et al., 2014).
Related Work in Other Communities. Distributed position and orientation estimation is a fertile research area in other communities, including sensor networks, computer vision, and multi agent systems. In these works, the goal is to estimate the state (e.g. position or orientation) of an agent (e.g., a sensor or a camera) from relative measurements among the agents. A large body of literature deals with distributed localization from distance measurements, see Anderson et al. (2010); Calafiore et al. (2012); Simonetto and Leus (2014); Wei et al. (2015) and the references therein. The case of position estimation from linear measurements is considered in Barooah and Hespanha (2005, 2007); Russell et al. (2011); Carron et al. (2014); Todescato et al. (2015); Freris and Zouzias (2015); the related problem of centroid estimation is tackled in Aragues et al. (2012a). Distributed rotation estimation has been studied in the context of attitude synchronization (Thunberg et al., 2011; Hatanaka et al., 2010; OlfatiSaber, 2006), camera network calibration (Tron and Vidal, 2009; Tron et al., 2012a), sensor network localization (Piovan et al., 2013), and distributed consensus on manifolds (Sarlette and Sepulchre, 2009; Tron et al., 2012b).
HighLevel Map Representations. Semantic mapping using highlevel objectbased representation has gathered a large amount of interest from the robotics community. Kuipers (2000) model the environment as a spatial semantic hierarchy, where each level expresses states of partial knowledge corresponding to different level of representations. Ranganathan and Dellaert (2007) present a 3D generative model for representing places using objects. The object models are learned in a supervised manner. Civera et al. (2011) propose a semantic SLAM algorithm that annotates the lowlevel 3D point based maps with precomputed object models. Rogers et al. (2011) recognize door signs and read their text labels (e.g., room numbers) which are used as landmarks in SLAM. Trevor et al. (2012) use planar surfaces corresponding to walls and tables as landmarks in a mapping system. Bao et al. (2012) model semantic structure from motion as a joint inference problem where they jointly recognize and estimate the location of highlevel semantic scene components such as regions and objects in 3D. SLAM++, proposed by SalasMoreno et al. (2013), train domainspecific object detectors corresponding to repeated objects like tables and chairs. The learned detectors are integrated inside the SLAM framework to recognize and track those objects resulting in a semantic map. Similarly, Kim et al. (2012) use learned object models to reconstruct dense 3D models from single scan of the indoor scene. Choudhary et al. (2014) proposed an approach for for online object discovery and object modeling, and extend a SLAM system to utilize these discovered and modeled objects as landmarks to help localize the robot in an online manner. Pillai and Leonard (2015) develop a SLAMaware object recognition system which result in a considerably stronger recognition performance as compared to related techniques. GálvezLópez et al. (2016) present a realtime monocular objectbased SLAM using a large database of 500 3D objects and show exploiting object rigidity both improve the map and find its real scale. Another body of related work is in the area of dense semantic mapping where the goal is to categorize each voxel or 3D point with a category label. Related work in dense semantic mapping include Nüchter and Hertzberg (2008); Koppula et al. (2011); Pronobis and Jensfelt (2012); Finman et al. (2013); Kundu et al. (2014); Vineet et al. (2015); Valentin et al. (2015); McCormac et al. (2016) and the references therein.
3 Dealing with Bandwidth Constraints I: Distributed Algorithms
The first contribution of this paper is to devise distributed algorithms that the robots can implement to reach consensus on a globally optimal trajectory estimate using minimal communication. Section 3.1 introduces the mathematical notation and formalizes the problem. Section 3.2 presents a centralized algorithm, while Section 3.3 presents the corresponding distributed implementations.
3.1 Problem Formulation: Distributed Pose Graph Optimization
We consider a multi robot system and we denote each robot with a Greek letter, such that the set of robots is . The goal of each robot is to estimate its own trajectory using the available measurements, and leveraging occasional communication with other robots. The trajectory estimation problem and the nature of the available measurements are made formal in the rest of this section.
We model each trajectory as a finite set of poses (triangles in Fig. 2); the pose assumed by robot at time is denoted with (we use Roman letters to denote time indices). We are interested in a 3D setup, i.e., , where is the Special Euclidean group of 3D rigid transformations; when convenient, we write , making explicit that each pose includes a rotation , and a position . The trajectory of robot is then denoted as .
Measurements. We assume that each robot acquires relative pose measurements. In practice these are obtained by postprocessing raw sensor data (e.g., scan matching on 3D laser scans). We consider two types of measurements: intrarobot and interrobot measurements. The intrarobot measurements involve the poses of a single robot at different time instants; common examples of intrarobot measurements are odometry measurements (which constrain consecutive robot poses, e.g., and in Fig. 2) or loop closures (which constrain nonconsecutive poses, e.g., and in Fig. 2). The interrobot measurements are the ones relating the poses of different robots. For instance, during a rendezvous, robot (whose local time is ), observes a second robot (whose local time is ) and uses onboard sensors to measure the relative pose of the observed robot in its own reference frame. Therefore, robot acquires an interrobot measurement, describing the relative pose between and (red links in Fig. 2). We use the term separators to refer to the poses involved in an interrobot measurement.
While our classification of the measurements (inter vs intra) is based on the robots involved in the measurement process, all relative measurements can be framed within the same measurement model. Since all measurements correspond to noisy observation of the relative pose between a pair of poses, say and , a general measurement model is:
(1) 
where the relative pose measurement includes the relative rotation measurements , which describes the attitude with respect to the reference frame of robot at time , “plus” a random rotation (measurement noise), and the relative position measurement , which describes the position in the reference frame of robot at time , plus random noise . According to our previous definition, intra robot measurements are in the form , for some robot and for two time instants ; interrobot measurements, instead, are in the form for two robots .
In the following, we denote with the set of intrarobot measurements for robot , while we call the set of intrarobot measurements for all robots in the team, i.e., . The set of interrobot measurements involving robot is denoted with ( is the mnemonic for “separator”). The set of all interrobot measurements is denoted with . The set of all available measurements is then . Note that each robot only has access to its own intra and interrobot measurements and .
ML trajectory estimation. Let us collect all robot trajectories in a single (tobeestimated) set of poses . The ML estimate for is defined as the maximum of the measurement likelihood:
(2) 
where we took the standard assumption of independent measurements. The expression of the likelihood function depends on the distribution of the measurements noise, i.e., in (1). We follow the path of Carlone et al. (2015a) and assume that translation noise is distributed according to a zeromean Gaussian with information matrix , while the rotation noise follows a VonMises distribution with concentration parameter .
Under these assumptions, it is possible to demonstrate (Carlone et al., 2015a) that the ML estimate can be computed as solution of the following optimization problem:
(3)  
The peculiarity of (3) is the use of the chordal distance to quantify rotation errors, while the majority of related works in robotics uses the geodesic distance, see (Carlone et al., 2015b) for details.
A centralized approach to solve the multi robot pose graph optimization problem (3) works as follows. A robot collects all measurements . Then, the optimization problem (3) is solved using iterative optimization on manifold (Dellaert, 2012), fast approximations (Carlone et al., 2015b), or convex relaxations (Rosen et al., 2016).
In this paper we consider the more interesting case in which it is not possible to collect all measurements at a centralized estimator, and the problem has to be solved in a distributed fashion. More formally, the problem we solve is the following.
Problem 1 (Distributed Trajectory Estimation).
Design an algorithm that each robot can execute during a rendezvous with a subset of other robots , and that

takes as input: (i) the intrarobot measurements and (ii) the subset of interrobot measurements , (iii) partial estimates of the trajectory of robots ;

returns as output: the ML estimate , which is such that is a minimizer of (3).
While the measurements and are known by robot , gathering the estimates from robots requires communication, hence we want our distributed algorithm to exchange a very small portion of the trajectory estimates.
The next sections present our solution to Problem 1. To help readability, we start with a centralized description of the approach, which is an adaptation of the chordal initialization of Carlone et al. (2015b) to the multi robot case. Then we tailor the discussion to the distributed setup in Section 3.3.
3.2 TwoStage Pose Graph Optimization: Centralized Description
The present work is based on two key observations. The first one is that the optimization problem (3) has a quadratic objective; what makes (3) hard is the presence of nonconvex constraints, i.e., . Therefore, as already proposed in Carlone et al. (2015b) (for the single robot, centralized case), we use a twostage approach: we first solve a relaxed version of (3) and get an estimate for the rotations of all robots, and then we recover the full poses and topoff the result with a GaussNewton (GN) iteration. The second key observation is that each of the two stages can be solved in distributed fashion, exploiting existing distributed linear system solvers. In the rest of this section we review the twostage approach of Carlone et al. (2015b), while we discuss the use of distributed solvers in Section 3.3.
The twostage approach of Carlone et al. (2015b) first solves for the unknown rotations, and then recovers the full poses via a single GN iteration. The two stages are detailed in the following.
Stage 1: rotation initialization via relaxation and projection. The first stage computes a good estimate of the rotations of all robots by solving the following rotation subproblem:
(4) 
which amounts to estimating the rotations of all robots in the team by considering only the relative rotation measurements (the second summand in (3)).
While problem (4) is nonconvex (due to the nonconvex constraints ), many algorithms to approximate its solution are available in literature. Here we use the approach proposed in Martinec and Pajdla (2007) and reviewed in Carlone et al. (2015b). The approach first solves the quadratic relaxation obtained by dropping the constraints , and then projects the relaxed solution to . In formulas, the quadratic relaxation is:
(5) 
which simply rewrites (4) without the constraints. Since (5) is quadratic in the unknown rotations , we can rewrite it as:
(6) 
where we stacked all the entries of the unknown rotation matrices into a single vector , and we built the (known) matrix and (known) vector accordingly (the presence of a nonzero vector follows from setting one of the rotations to be the reference frame, e.g., ).
Since (5) is a linear leastsquares problem, its solution can be found by solving the normal equations:
(7) 
Let us denote with the solution of (7). Rewriting in matrix form, we obtain the matrices , . Since these rotations were obtained from a relaxation of (4), they are not guaranteed to satisfy the constraints ; therefore the approach (Martinec and Pajdla, 2007) projects them to , and gets the rotation estimate , . The projection only requires to perform an SVD of and can be performed independently for each rotation (Carlone et al., 2015b).
Stage 2: full pose recovery via single GN iteration. In the previous stage we obtained an estimate for the rotations . In this stage we use this estimate to reparametrize problem (3). In particular, we rewrite each unknown rotation as the known estimate “plus” an unknown perturbation; in formulas, we rewrite each rotation as , where is the exponential map for , and (this is our new parametrization for the rotations). With this parametrization, eq. (3) becomes:
(8)  
Note that the reparametrization allowed to drop the constraints (we are now trying to estimate vectors in ), but moved the nonconvexity to the objective ( is nonlinear in its argument). In order to solve (8), we take a quadratic approximation of the cost function. For this purpose we use the following firstorder approximation of the exponential map:
(9) 
where is a skew symmetric matrix whose entries are defined by the vector . Substituting (9) into (8) we get the desired quadratic approximation:
(10)  
Rearranging the unknown of all robots into a single vector , we rewrite (10) as a linear leastsquares problem:
(11) 
whose solution can be found by solving the linear system:
(12) 
From the solution of (12) we can build our trajectory estimate: the entries of directly define the positions , ; moreover, includes the rotational corrections , from which we get our rotation estimate as: .
Remark 1 (Advantage of Centralized TwoStage Approach).
The approach reviewed in this section has three advantages. First, as shown in Carlone et al. (2015b), in common problem instances (i.e., for reasonable levels of measurement noise) it returns a solution that is very close to the ML estimate. Second, the approach only requires to solve two linear systems (the cost of projecting the rotations is negligible), hence it is computationally efficient. Finally, the approach does not require an initial guess, therefore, it is able to converge even when the initial trajectory estimate is inaccurate (in those instances, iterative optimization tends to fail (Carlone et al., 2015b)) or is unavailable.
3.3 Distributed Pose Graph Optimization
In this section we show that the twostage approach described in Section 3.2 can be implemented in a distributed fashion. Since the approach only requires solving two linear systems, every distributed linear system solver can be used as workhorse to split the computation among the robots. For instance, one could adapt the Gaussian elimination approach of Cunningham et al. (2010) to solve the linear systems (7) and (12). In this section we propose alternative approaches, based on the Distributed Jacobi OverRelaxation and the Distributed Successive OverRelaxation algorithms, and we discuss their advantages.
In both (7) and (12) we need to solve a linear system where the unknown vector can be partitioned into subvectors, such that each subvector contains the variables associated to a single robot in the team. For instance, we can partition the vector in (7), as , such that describes the rotations of robot . Similarly, we can partition in (12), such that describes the trajectory of robot . Therefore, (7) and (12) can be framed in the general form:
(13) 
where we want to compute the vector given the (known) block matrix and the (known) block vector ; on the right of (13) we partitioned the square matrix and the vector according to the blockstructure of .
In order to introduce the distributed algorithms, we first observe that the linear system (13) can be rewritten as:
Taking the contribution of out of the sum, we get:
(14) 
The set of equations (14) is the same as the original system (13), but clearly exposes the contribution of the variables associated to each robot. The equations (14) constitute the basis for the Successive OverRelaxation (SOR) and the Jacobi OverRelaxation (JOR) methods that we describe in the following sections.
Distributed Jacobi OverRelaxation (JOR):
The distributed JOR algorithm (Bertsekas and Tsitsiklis, 1989) starts at an arbitrary initial estimate and solves the linear system (13) by repeating the following iterations:
(15) 
where is the relaxation factor. Intuitively, at each iteration robot attempts to solve eq. (14) (the second summand in (15) is the solution of (14) with the estimates of the other robots kept fixed), while remaining close to the previous estimate (first summand in (15)). If the iterations (15) converge to a fixed point, say , then the resulting estimate solves the linear system (14) exactly (Bertsekas and Tsitsiklis, 1989, page 131). To prove this fact we only need to rewrite (15) after convergence:
(15) 
which can be easily seen to be identical to (14).
In our multi robot problem, the distributed JOR algorithm can be understood in a simple way: at each iteration, each robot estimates its own variables () by assuming that the ones of the other robots are constant (); iterating this procedure, the robots reach an agreement on the estimates, and converge to the solution of eq. (13). Using the distributed JOR approach, the robots solve (7) and (12) in a distributed manner. When , the distributed JOR method is also known as the distributed Jacobi (DJ) method.
We already mentioned that when the iterations (15) converge, then they return the exact solution of the linear system. So a natural question is: when do the Jacobi iteration converge? A general answer is given by the following proposition.
Proposition 2 (Convergence of JOR (Bertsekas and Tsitsiklis, 1989)).
Consider the linear systems (13) and define the block diagonal matrix . Moreover, define the matrix:
(16) 
where is the identity matrix of suitable size. Then, the JOR iterations (15) converge from any initial estimate if and only if , where denotes the spectral radius (maximum of absolute value of the eigenvalues) of a matrix.
The proposition is the same of Proposition 6.1 in (Bertsekas and Tsitsiklis, 1989) (the condition that is invertible is guaranteed to hold as noted in the footnote on page 144 of (Bertsekas and Tsitsiklis, 1989)).
It is nontrivial to establish whether our linear systems (7) and (12) satisfy the condition of Proposition 2. In the experimental section, we empirically observe that the Jacobi iterations indeed converge whenever . For the SOR algorithm, presented in the next section, instead, we can provide stronger theoretical convergence guarantees.
Distributed Successive OverRelaxation (SOR)
The distributed SOR algorithm (Bertsekas and Tsitsiklis, 1989) starts at an arbitrary initial estimate and, at iteration , applies the following update rule, for each :
(17) 
where is the relaxation factor, is the set of robots that already computed the th estimate, while is the set of robots that still have to perform the update (17), excluding node (intuitively: each robot uses the latest estimate). As for the JOR algorithm, by comparing (17) and (14), we see that if the sequence produced by the iterations (17) converges to a fixed point, then such point satisfies (14), and indeed solves the original linear system (13). When , the distributed SOR method is known as the distributed GaussSeidel (DGS) method.
The following proposition, whose proof trivially follows from (Bertsekas and Tsitsiklis, 1989, Proposition 6.10, p. 154) (and the fact that the involved matrices are positive definite), establishes when the distributed SOR algorithm converges to the desired solution.
Proposition 3 (Convergence of SOR).
According to (Bertsekas and Tsitsiklis, 1989, Proposition 6.10, p. 154), for , the SOR iterations (17) do not converge to the solution of the linear system in general, hence also in practice, we restrict the choice of in the open interval . In the experimental section, we show that the choice ensures the fastest convergence.
Communication Requirements for JOR and SOR
In this section we observe that to execute the JOR and SOR iterations (15)(17), robot only needs its intra and interrobot measurements and , and an estimate of the separators, involved in the interrobot measurements in . For instance, in the graph of Fig. 3 robot only needs the estimates of and , while does not require any knowledge about the other poses of .
To understand this fact, we note that both (7) and (12) model an estimation problem from pairwise relative measurements. It is well known that the matrix (sometimes called the Hessian (Dellaert, 2005)) underlying these problems has a block structure defined by the Laplacian matrix of the underlying graph (Barooah and Hespanha, 2007). For instance, Fig. 3 (right) shows the block sparsity of the matrix describing the graph on the left: offdiagonal blockelements in position are non zero if and only if there is an edge (i.e., a measurement) between and .
By exploiting the block sparsity of , we can further simplify the JOR (15) iterations as:
(18) 
where we simply removed the contributions of the zero blocks from the sum in (15).
Similarly we can simplify the SOR (17) iterations as:
(19) 
where we removed the contributions of the zero blocks from the sum in (17); the sets and satisfy , and are such that includes the interrobot measurements involving robots which already performed the th iteration, while is the set of measurements involving robots that have not performed the iteration yet (as before: each robot simply uses its latest estimate).
Eqs. (18) and (19) highlight that the JOR and SOR iterations (at robot ) only require the estimates for poses involved in its interrobot measurements . Therefore both JOR and SOR involve almost no “privacy violation”: every other robot in the team does not need to communicate any other information about its own trajectory, but only sends an estimate of its rendezvous poses.
Flagged Initialization
As we will see in the experimental section and according to Proposition 3, the JOR and SOR approaches converge from any initial condition when is chosen appropriately. However, starting from a “good” initial condition can reduce the number of iterations to converge, and in turns reduces the communication burden (each iteration (18) or (19) requires the robots to exchange their estimate of the separators).
In this work, we follow the path of Barooah and Hespanha (2005) and adopt a flagged initialization. A flagged initialization scheme only alters the first JOR or SOR iteration as follows. Before the first iteration, all robots are marked as “uninitialized”. Robot performs its iteration (18) or (19) without considering the interrobot measurements, i.e., eqs. (18)(19) become ; then the robot marks itself as “initialized”. When the robot performs its iteration, it includes only the separators from the robots that are initialized; after performing the JOR or SOR iteration, also marks itself as initialized. Repeating this procedure, all robots become initialized after performing the first iteration. The following iterations then proceed according to the standard JOR (18) or SOR (19) update. Barooah and Hespanha (2005) show a significant improvement in convergence using flagged initialization. As discussed in the experiments, flagged initialization is also advantageous in our distributed pose graph optimization problem.
4 Dealing With Bandwidth Constraints II: Compressing Sensor Data via Objectbased Representations
The second contribution of this paper is the use of highlevel objectbased models at the estimation frontend and as a map representation for multi robot SLAM. Objectbased abstractions are crucial to further reduce the memory storage and the information exchange among the robots.
Previous approaches for multi robot mapping rely on featurebased maps which become memoryintensive for longterm operation, contain a large amount of redundant information, and lack the semantic understanding necessary to perform a wider range of tasks (e.g., manipulation, humanrobot interaction). To solve these issues, we present an approach for multi robot SLAM which uses object landmarks (SalasMoreno et al., 2013) in a multi robot mapping setup.
Section 4.1 introduces the additional mathematical notation and formalizes the problem of distributed objectbased SLAM. Section 4.2 presents the implementation details of our distributed objectbased SLAM system.
4.1 Distributed Objectbased SLAM
We consider a multi robot system as defined in Section 3.1. Each robot, in addition to estimating its own trajectory using local measurements and occasional communication with other robots, also estimates the pose of a set of objects in the environment. We model each trajectory as a finite set of poses; the trajectory of robot is . In addition, we denote with the pose of the object in the coordinate frame of robot (Fig. 4).
Measurements. Similar to distributed pose graph optimization (Section 3.1), we assume that each robot acquires two types of relative pose measurements: intrarobot and interrobot measurements. The intrarobot measurements consist of the odometry measurements, which constrain consecutive robot poses (e.g., and in Fig. 4), and object measurements which constrains robot poses with the corresponding visible object landmarks (e.g., and in Fig. 4). Contrarily to Section 3.1, the interrobot measurements relate the object poses observed by different robots. During a rendezvous between robot and robot , each robot shares the label and pose of detected object landmarks with the other robot. Then, for each object observed by both robots, the teammates add an interrobot measurements, enforcing the object pose estimate to be consistent across the teammates. For instance, if and in Fig. 4 model the pose of the same object, then the two poses should be identical. For this reason, intrarobot measurement between a pair of associated object poses is zero.
The intrarobot object measurements follow the same measurements model of eq. (1). For instance, if the robot at time and at pose observes an object at pose , then the corresponding measurement measures the relative pose between and . Consistently with our previous notation, we denote intrarobot object measurement between and as , and interrobot measurement between object poses and as .
In the following, we denote with the set of intrarobot odometry for robot , while we call the set of intrarobot odometry measurements for all robots in the team, i.e., . Similarly the set of intrarobot object measurements for robot is denoted as , whereas the set of all intrarobot object measurements is denoted as . Similar to Section 3.1, the set of interrobot measurements involving robot is denoted with . The set of all interrobot measurements is denoted with . The set of all available measurements is then . Note that each robot only has access to its own intra and interrobot measurements , and .
ML trajectory and objects estimation. Let us collect all robot trajectories and object poses in a (tobeestimated) set of robot poses and set of object poses . The ML estimate for and is defined as the maximum of the measurement likelihood:
(20) 
where we used the same assumptions on measurement noise as in Section 3.1. Defining , we rewrite eq. (20) as:
(21) 
Since the optimization problem in eq. (21) has the same structure of the one in eq. (2), we follow the same steps to solve it in a distributed manner using the Distributed GaussSeidel method.
The next section presents the implementation details of our distributed objectbased SLAM system.
4.2 Objectbased SLAM Implementation
Object detection and pose estimation. Each robot collects RGBD data using a depth camera, and measures its egomotion through wheel odometry. In our approach, each RGB frame (from RGBD) is passed to the YOLO object detector (Redmon et al., 2015), which detects objects at 45 frames per second. Compared to objectproposalbased detectors, YOLO is fast, since it avoids the computation burden of extracting object proposals, and is less likely to produce false positives in the background. We finetune the YOLO detector on a subset of objects from the BigBird dataset (Singh et al. (2014)). The training dataset contains the object images in a clean background taken from different viewpoints and labeled images of the same objects taken by a robot in an indoor environment. During testing, we use a probability threshold of 0.3 to avoid false detections.
Each detected object bounding box is segmented using the organized point cloud segmentation (Trevor et al., 2013). The segmented object is then matched to the 3D template of the detected object class to estimate its pose. We extract PFHRGB features (Rusu et al., 2008) in the source (object segment) and target (object model) point clouds and register the two point clouds in a Sample Consensus Initial Alignment framework (Rusu, 2009). If we have at least 12 inlier correspondences, GICP (generalized iterative closest point Segal et al. (2009) is performed to further refine the registration and the final transformation is used as the object pose estimate. If less than 12 inlier correspondences are found, the detection is considered to be a false positive and the corresponding measurement is discarded. In hindsight, this approach verifies the detection both semantically and geometrically.
Data Association. If object pose estimation is successful, it is dataassociated with other instances already present in the map by finding the object landmark having the same category label within distance of the newly detected object. If there are multiple objects with the same label within that distance, the newly detected object is matched to the nearest object instance. If there exists no object having the same label, a new object landmark is created.
Before the first rendezvous event, each robot performs standard singlerobot SLAM using OmniMapper (Trevor et al., 2012). Both wheel odometry and relative pose measurements to the observed objects are fed to the SLAM backend. A flowchart of the approach is given in Fig. 5.
Robot Communication.
During a rendezvous between robots and , robot communicates the category labels (class) and poses (in robot ’s frame) of all the detected objects to robot .
We assume that the initial pose of each robot is known to all the robots, hence,
given the initial pose of robot , robot is able to transform the communicated object
poses from robot ’s frame to its own frame.
We remark that, while before the first rendezvous the robots and have different reference frames, the objectobject factors enforce both robots to have a single shared reference frame, facilitating future data association.
Next we show the experimental evaluation which includes realistic Gazebo simulations and field experiments in a military test facility.
5 Experiments
We evaluate the distributed JOR and SOR along with DJ and DGS approaches (with and without using objects) in largescale simulations (Section 5.1 and 5.2) and field tests (Section 5.3 and 5.4). The results demonstrate that (i) the DGS dominates the other algorithms considered in this paper in terms of convergence speed, (ii) the DGS algorithm is accurate, scalable, and robust to noise, (iii) the DGS requires less communication than techniques from related work (i.e., DDFSAM), and (iv) in real applications, the combination of DGS and objectbased mapping reduces the communication requirements by several orders of magnitude compared to approaches exchanging raw measurements.
5.1 Simulation Results: Multi Robot Pose Graph Optimization
In this section, we characterize the performance of the proposed approaches in terms of convergence, scalability (in the number of robots and separators), and sensitivity to noise.
Simulation setup and performance metrics. For our tests, we created simulation datasets in six different configurations with increasing number of robots: 4, 9, 16, 25, 36 and 49 robots. The robots are arranged in a 3D grid with each robot moving on a cube, as shown in Fig. 6. When the robots are at contiguous corners, they can communicate (gray links). Unless specified otherwise, we generate measurement noise from a zeromean Gaussian distribution with standard deviation for the rotations and m for the translations. Results are averaged over 10 Monte Carlo runs.
In our problem, JOR or SOR are used to sequentially solve two linear systems, (7) and (12), which return the minimizers of (6) and (11), respectively. Defining, , we use the following metric, named the rotation estimation error, to quantify the error in solving (7):
(22) 
quantifies how far is the current estimate (at the th iteration) from the minimum of the quadratic cost. Similarly, we define the pose estimation error as:
(23) 
with . Ideally, we want and to quickly converge to zero for increasing .
Ultimately, the accuracy of the proposed approach depends on the number of iterations, hence we need to set a stopping condition for the JOR or SOR iterations. We use the following criterion: we stop the iterations if the change in the estimate is sufficiently small. More formally, the iterations stop when (similarly, for the second linear system ). We use as stopping condition unless specified otherwise.
Comparisons among the distributed algorithms. In this section we consider the scenario with 49 robots. We start by studying the convergence properties of the JOR and SOR algorithms in isolation. Then we compare the two algorithms in terms of convergence speed. Fig. 7 shows the rotation and the pose error versus the number of iterations for different choices of the parameter for the JOR algorithm. Fig. 7a confirms the result of Proposition 2: JOR applied to the rotation subproblem converges as long as . Fig. 7a shows that for any the estimate diverges, while the critical value (corresponding to the DJ method) ensures the fastest convergence rate. Fig. 8 shows the rotation and the pose error versus the number of iterations for different choices of the parameter for the SOR algorithm. The figure confirms the result of Proposition 3: the SOR algorithm converges for any choice of . Fig. 8a shows that choices of close to 1 ensure fast convergence rates, while Fig. 8b established (corresponding to the DGS method) as the parameter selection with faster convergence. In summary, both JOR and SOR have top performance when . Later in this section we show that is the best choice independently on the number of robots and the measurement noise.
Let us now compare JOR and SOR in terms of convergence. Fig. 9 compares the convergence rate of SOR and JOR for both the rotation subproblem (Fig. 9a) and the pose subproblem (Fig. 9b). We set in JOR and SOR since we already observed that this choice ensures the best performance. The figure confirms that SOR dominates JOR in both subproblems. Fig. 10 shows the number of iterations for convergence (according to our stopping conditions) and for different choices of the parameter . Once again, the figure confirms that the SOR with is able to converge in the smallest number of iterations, requiring only few tens of iterations in both the rotation and the pose subproblem.
We conclude this section by showing that setting in SOR ensure faster convergence regardless the number of robots and the measurement noise. Fig. 11 compares the number of iterations required to converge for increasing number of robots for varying values. Similarly Fig. 12 compares the number of iterations required to converge for increasing noise for varying value. We can see that in both the cases has the fastest convergence (required the least number of iterations) irrespective of the number of robots and measurement noise. Since SOR with , i.e., the DGS method, is the top performer in all test conditions, in the rest of the paper we restrict our analysis to this algorithm.
Flagged initialization. In this paragraph we discuss the advantages of the flagged initialization. We compare the DGS method with flagged initialization against a naive initialization in which the variables ( and , respectively) are initialized to zero. The results, for the dataset with 49 robots, are shown in Fig. 13. In both cases the estimation errors go to zero, but the convergence is faster when using the flagged initialization. The speedup is significant for the second linear system (Fig. 13b). We noticed a similar advantage across all tested scenarios. Therefore, in the rest of the paper we always adopt the flagged initialization.
Stopping conditions and anytime flavor. This section provides extra insights on the convergence of the DGS method. Fig. 14ab show the evolution of the rotation and pose error for each robot in the 49robot grid: the error associated to each robot (i.e., to each subgraph corresponding to a robot trajectory) is not monotonically decreasing and the error for some robot can increase to bring down the overall error. Fig. 14cd report the change in the rotation and pose estimate for individual robots. Estimate changes become negligible within few tens of iterations. As mentioned at the beginning of the section, we stop the DGS iterations when the estimate change is sufficiently small (below the thresholds and ).
Fig. 15 shows the estimated trajectory after 10 and 1000 iterations of the DGS algorithm for the 49robot grid. The odometric estimate (Fig. 15a) is shown for visualization purposes, while it is not used in our algorithm. We can see that the estimate after 10 iterations is already visually close to the estimate after 1000 iterations. The DGS algorithm has an anytime flavor: the trajectory estimates are already accurate after few iterations and asymptotically converge to the centralized estimate.
Scalability in the number of robots. Fig. 16 shows the average rotation and pose errors for all the simulated datasets (4, 9, 16, 25, 36 and 49 robots). In all cases the errors quickly converge to zero. For large number or robots the convergence rate becomes slightly slower, while in all cases the errors is negligible in few tens of iterations.
#Robots  Distributed GaussSeidel  Centralized  
TwoStage  GN  
#Iter  Cost  #Iter  Cost  Cost  Cost  
4  10  1.9  65  1.9  1.9  1.9 
9  14  5.3  90  5.2  5.2  5.2 
16  16  8.9  163  8.8  8.8  8.7 
25  17  16.2  147  16.0  16.0  15.9 
36  28  22.9  155  22.7  22.6  22.5 
49  26  35.1  337  32.9  32.7  32.5 
While so far we considered the errors for each subproblem ( and ), we now investigate the overall accuracy of the DGS algorithm to solve our original problem (3). We compare the proposed approach against the centralized twostage approach of Carlone et al. (2015b) and against a standard (centralized) GaussNewton (GN) method, available in gtsam (Dellaert (2012)). We use the cost attained in problem (3) by each technique as accuracy metric (the lower the better). Table 1 reports the number of iterations and the cost attained in problem (3), for the compared techniques. The number of iterations is the sum of the number of iterations required to solve (7) and (12). The cost of the DGS approach is given for two choices of the thresholds and . As already reported in Carlone et al. (2015b), the last two columns of the table confirm that the centralized twostage approach is practically as accurate as a GN method. When using a strict stopping condition (), the DGS approach produces the same error as the centralized counterpart (difference smaller than ). Relaxing the stopping conditions to implies a consistent reduction in the number of iterations, at a small loss in accuracy (cost increase is only significant for the scenario with 49 robots). In summary, the DGS algorithm (with ) ensures accurate estimation within few iterations, even for large teams.
Sensitivity to measurement noise. Fig. 17 shows the average rotation and pose errors for increasing levels of noise in the scenario with 49 robots. Also in this case, while larger noise seems to imply longer convergence tails, the error becomes sufficiently small after few tens of iterations.
Table 2 evaluates the performance of the DGS method in solving problem (3) for increasing levels of noise, comparing it against the centralized twostage approach of Carlone et al. (2015b) and the GaussNewton method. The DGS approach is able to replicate the accuracy of the centralized twostage approach, regardless the noise level, while the choice of thresholds ensures accurate estimation within few tens of iterations.
Measurement  Distributed GaussSeidel  Centralized  
noise  TwoStage  GN  
#Iter  Cost  #Iter  Cost  Cost  Cost  
8.5  2.1  51.0  1.8  1.8  1.8  
21.8  14.8  197.8  14.0  14.0  13.9  
35.6  58.4  277.7  56.6  56.6  56.0  
39.8  130.5  236.8  128.4  129.3  126.0 
Scalability in the number of separators. In order to evaluate the impact of the number of separators on convergence, we simulated two robots moving along parallel tracks for 10 time stamps. The number of communication links were varied from 1 (single communication) to 10 (communication at every time), hence the number of separators (for each robot) ranges from 1 to 10. Fig. 18a shows the number of iterations required by the DGS algorithm (), for increasing number of communication links. The number of iterations is fairly insensitive to the number of communication links.
Fig. 18b compares the
information exchanged in the DJ algorithm against a stateoftheart algorithm,
DDFSAM (Cunningham et al. (2010)). In DDFSAM, each robot sends
bytes,
where is the number of iterations required by a GN method applied to problem (3)
(we consider the best case ),
is the number of separators and is the size of a pose in bytes.
In the DGS algorithm, each robots sends
bytes,
where and are the number of iterations required by the
DGS algorithm to solve the linear systems (7)
and (12), respectively, and is the size of a rotation (in bytes).
We assume doubles (72 bytes)
Realistic simulations in Gazebo. We tested our DGSbased approach in two scenarios in Gazebo simulations as shown in Fig. 19. The robots start at fixed locations and explore the environment by moving according to a random walk. Each robot is equipped with a 3D laser range finder, which is used to intrarobot and interrobot measurements via scan matching. In both scenarios, two robots communicate only when they are within close proximity of each other (m in our tests). Results are average over 100 MonteCarlo runs.
Fig. 19 shows the aggregated point cloud corresponding to the DGS trajectory estimate, for one of the runs. The point cloud closely resembles the ground truth environment shown in the same figure. Fig. 20a shows that number of steps required to explore the whole environment quickly decreases with increasing number of robots. This intuitive observation motivates our interest towards mapping techniques that can scale to large teams of robots. Fig. 20b reports trajectory samples for different robots in our Monte Carlo analysis.
5.2 Simulation Results: Multi Robot Object based SLAM
In this section we characterize the performance of the DGS algorithms associated with our objectbased model described in Section 4. We test the resulting multi robot objectbased SLAM approach in terms of scalability in the number of robots and sensitivity to noise.
Simulation setup and performance metrics. We consider two scenarios, the 25 Chairs and the House, which we simulated in Gazebo. In the 25 Chairs scenario, we placed 25 chairs as objects on a grid, with each chair placed at a random angle. In the House scenario, we placed furniture as objects in order to simulate a living room environment. Fig. 21 shows the two scenarios. Unless specified otherwise, we generate measurement noise from a zeromean Gaussian distribution with standard deviation for the rotations and m for the translations. Six robots are used by default. Results are averaged over 10 Monte Carlo runs.
We use the absolute translation error (ATE*) and absolute rotation error (ARE*) of the robot and landmark poses with respect to the centralized estimate as error metric. These two metrics are formally defined below.
Absolute Translation Error (ATE*). Similar to the formulation by Sturm et al. Sturm et al. (2012), the average translation error measures the absolute distance between the trajectory and object poses estimated by our approach versus the centralized GN method. The ATE* is defined as:
(24) 
where is the position estimate for robot at time , is the corresponding estimate from GN, and is the number of poses in the trajectory of . A similar definition holds for the object positions.
Absolute Rotation Error (ARE*). The average rotation error is computed by evaluating the angular mismatch between the (trajectory and objects) rotations produced by the proposed approach versus a centralized GN method:
(25) 
where is the rotation estimate for robot at time , is the corresponding estimate from GN. A similar definition holds for the object rotations.
#Robots  Distributed GaussSeidel  Centralized  ATE* (m)  ARE* (deg)  

GN  Poses  Lmrks. 