Continuous Preintegration Theory for Graph-based Visual-Inertial Navigation
In this paper we propose a new continuous preintegration theory for graph-based sensor fusion with an inertial measurement unit (IMU) and a camera (or other aiding sensors). Rather than using discrete sampling of the measurement dynamics as in current methods, we analytically derive the closed-form solutions to the preintegration equations, yielding improved accuracy in state estimation. We advocate two new different inertial models for preintegration: (i) the model that assumes piecewise constant measurements, and (ii) the model that assumes piecewise constant local true acceleration. We show through extensive Monte-Carlo simulations the effect that the choice of preintegration model has on estimation performance. To validate the proposed preintegration theory, we develop both direct and indirect visual-inertial navigation systems (VINS) that leverage our preintegration. In the first, within a tightly-coupled, sliding-window optimization framework, we jointly estimate the features in the window and the IMU states while performing marginalization to bound the computational cost. In the second, we loosely couple the IMU preintegration with a direct image alignment that estimates relative camera motion by minimizing the photometric errors (i.e., raw image intensity difference), allowing for efficient and informative loop closures. Both systems are extensively tested in real-world experiments and are shown to offer competitive performance to state-of-the-art methods.
Accurate localization for autonomous systems is a prerequisite in many robotic applications such as planetary exploration (Mourikis, Trawny, Roumeliotis, Johnson, and Matthies, 2007), search and rescue (Ellekilde, 2007), and autonomous driving (Geiger, Lenz, and Urtasun, 2012). In many of these scenarios, access to global information such as from a Global Positioning System (GPS), motion capture system, or a prior map of the environment is unavailable. Instead, one has to estimate the robot state and its surroundings based on noisy, local measurements from onboard sensors, by performing simultaneous localization and mapping (SLAM), which has witnessed significant research efforts in the past three decades (Cadena, Carlone, Carrillo, Latif, Scaramuzza, Neira, Reid, and Leonard, 2016).
Of many possible sensors used in SLAM, Micro-Electro-Mechanical System (MEMS) inertial measurement units (IMUs) have become ubiquitous. These low-cost and light-weight sensors typically provide local linear acceleration and angular velocity readings, and are well suited for many applications such as Unmanned Aerial Vehicles (UAVs) (Ling, Liu, and Shen, 2016) and mobile devices (Wu, Ahmed, Georgiou, and Roumeliotis, 2015). IMUs provide information only about the derivatives of the state, so estimation must be performed by integrating over these noisy measurements. This may lead to large drifts over long periods of time, making the use of a low-cost IMU alone an unreliable solution. However, IMU readings are highly-informative about short-term motion which is ideal for fusion with measurements from exteroceptive aiding sensors, such as LiDAR and cameras. These sensors compensate for the drift issue inherent in IMUs, while high-rate inertial measurements are useful in tracking aggressive motion which may be difficult for exteroceptive low-rate sensors alone.
The canonical way of fusing IMU measurements in aided inertial navigation is to use an extended Kalman filter (EKF) (e.g., see Mourikis and Roumeliotis (2007)). In this method, the inertial measurements are used to predict to the next time instance, while measurements from exteroceptive sensors are used to update the state estimate. More recently, the development of preintegration has allowed for the efficient inclusion of high-rate IMU measurements in graph-based SLAM (Lupton and Sukkarieh, 2012; Forster, Carlone, Dellaert, and Scaramuzza, 2015, 2017). In this paper, building upon our prior conference publication (Eckenhoff, Geneva, and Huang, 2016a), we investigate in-depth the optimal use of preintegration by providing models and their analytical solutions for the preintegrated measurement dynamics, allowing for more accurate computation of the inertial factors for use in the graph optimization of visual-inertial navigation systems (VINS).
In particular, the main contributions of this work include:
We advocate two new preintegration models (i.e., piecewise constant measurements and piecewise constant local true acceleration, instead of piecewise constant global acceleration as assumed in the existing methods) to better capture the underlying motion dynamics and offer the analytical solutions to the preintegration equations. We have open sourced the proposed preintegration to better contribute to our research community.111The open source of the proposed continuous preintegration is available at: https://github.com/rpng/cpi
Using the proposed continuous preintegration, we develop an indirect, tightly-coupled, sliding-window optimization based visual-inertial odometry (VIO), which marginalizes out features from the state vector when moving to the next time window to enable real-time performance of bounded computational cost.
With the proposed continuous IMU preintegration, we further develop a loosely-coupled, direct VINS, which fuses the preintegrated inertial measurements with the direct image alignment results.
We conduct thorough Monte-Carlo simulation analysis of different preintegration models by varying motion dynamics and IMU sampling rates. We perform extensive real-world experiments to validate the proposed VINS using our preintegration by comparing with a state-of-the-art method.
The reminder of the paper is organized as follows: After a brief overview of related work in the next section and estimation preliminaries in Section III, we present in detail the proposed continuous preintegration in Section IV. The direct and indirect VINS that use the proposed preintegration are described in Section V. In Sections VI and VII, we validate the proposed VINS algorithms through both simulations and experiments. Finally, Section VIII concludes the work in this paper, as well as the possible future research directions.
Ii Related Work
Ii-a Visual-Inertial Navigation
*Mourikis2007ICRA proposed one of the earliest successful VINS algorithms, known as the Multi-State Constraint Kalman Filter (MSCKF). This filtering approach used quaternion-based inertial dynamics (Trawny and Roumeliotis, 2005) for state propagation coupled with a novel update step. Rather than adding features seen in the camera images to the state vector, their visual measurements were projected onto the nullspace of the feature Jacobian matrix (akin to feature marginalization Yang, Maley, and Huang (2017)), thereby retaining motion constraints that only related to the stochastically cloned camera poses in the state vector (Roumeliotis and Burdick, 2002). While reducing the computational cost by removing the need to co-estimate features, this nullspace projection prevents the relinearization of the processed features’ nonlinear measurements at later time steps.
The standard MSCKF recently has been extended in various directions. For example, *Hesch2013TRO improved the filter consistency by enforcing the correct observability properties of the linearized EKF VINS; *Guo2013iros showed that the inclusion of plane features increases the estimation accuracy; *Guo2014RSS extended to the case of rolling-shutter cameras with inaccurate time synchronization; and *Wu2015RSS reformulated the VINS problem within a square-root inverse filtering framework for improved computational efficiency and numerical stability without sacrificing estimation accuracy. While the filtering based methods have shown to exhibit high-accuracy state estimation, they theoretically suffer from a limitation – that is, nonlinear measurements must have a one-time linearization before processing, possibly introducing large linearization errors into the estimator.
Batch optimization methods, by contrast, solve a nonlinear least-squares (or bundle adjustment) problem over a set of measurements, allowing for the reduction of error through relinearization (Kummerle, Grisetti, Strasdat, Konolige, and Burgard, 2011). The incorporation of tightly-coupled VINS in batch optimization methods requires overcoming the high frequency nature and computational complexity of the inertial measurements.
*LeuteneggerIJRR2014 introduced a keyframe-based VINS approach (i.e., OKVIS), whereby a set of non-sequential past camera poses and a series of recent inertial states, connected with inertial measurements, was used in nonlinear optimization for accurate trajectory estimation. These inertial factors took the form of a state prediction: every time that the linearization point for the starting inertial state changed, it is required to reintegrate the IMU dynamics. This presents inefficiencies in the inertial processing, and makes incorporating a large number of inertial factors practically infeasible.
Ii-B Visual Processing
A key component to any VINS algorithm is the visual processing pipeline, responsible for transforming dense imagery data to motion constraints that can be incorporated into the estimation problem. Seen as the classical technique, indirect methods of visual SLAM extract and track features in the environment, while using geometric reprojection constraints during estimation. An example of a current state-of-the-art indirect method is ORB-SLAM2 (Mur-Artal and Tardós, 2017), which performs graph-based optimization of camera poses using information from 3D feature point correspondences.
In contrast, direct methods utilize raw pixel intensities in their formulation and allow for inclusion of a larger percentage of the available image information. LSD-SLAM is an example of a state-of-the-art direct visual-SLAM method which optimizes the transformation between pairs of camera keyframes based on minimizing their intensity error (Engel, Schöps, and Cremers, 2014). Note that this approach also optimizes a separate graph containing keyframe constraints to allow for the incorporation of highly informative loop-closures to correct drift over long trajectories. This work was later extended from a monocular sensor to stereo and omnidirectional cameras for improved accuracy (Engel, Stückler, and Cremers, 2015; Caruso, Engel, and Cremers, 2015). Other popular direct methods include *Engel2018TPAMI and *Wang2017ICCV which estimate keyframe depths along with the camera poses in a tightly-coupled manner, offering low-drift results.
Application of direct methods to the visual-inertial problem has seen recent attention due to their ability to robustly track dynamic motion even in low-texture environments. For example, *Bloesch2015IROS,Bloesch2017IJRR used a patch-based direct method to provide updates with an iterated EKF; *Usenko2016ICRA introduced a sliding-window VINS based on the discrete preintegration and direct image alignment; *Ling2016ICRA employed loosely-coupled direct alignment with preintegration factors for tracking aggressive quadrotor motions. While these methods have shown the feasibility of incorporating IMU measurements with direct methods, they employed the discrete form of inertial preintegration.
Ii-C Inertial Preintegration
First introduced by *Lupton2012TOR, preintegration is a computationally efficient alternative to the standard inertial measurement integration. The authors proposed the use of discrete integration of the inertial measurement dynamics in a local frame of reference, preventing the need to reintegrate the state dynamics at each optimization step. While this addresses the computational complexity issue, this method suffers from singularities due to the use of Euler angles in the orientation representation. To improve the stability of this preintegration, an on-manifold representation was introduced by *Forster2015RSS,Forster2017TRO which presents a singularity-free orientation representation on the manifold, incorporating the IMU preintegration into an efficient graph-based VINS algorithm.
While *Shen2015ICRA introduced preintegration in the continuous form, they still discretely sampled the measurement dynamics without offering closed-form solutions. This left a significant gap in the theoretical completeness of preintegration theory from a continuous-time perspective. Albeit, Qin, Li, and Shen (2017) later extended to a robust tightly-coupled monocular visual-inertial localization system. As compared to the discrete approximation of the preintegrated measurement and covariance calculations used in previous methods, in our prior work (Eckenhoff, Geneva, and Huang, 2016a), we have derived the closed-form solutions to both the measurement and covariance preintegration equations and showed that these solutions offer improved accuracy over the discrete methods, especially in the case of highly dynamic motion.
In this work, based on our preliminary results (Eckenhoff, Geneva, and Huang, 2016a, 2017), we provide a solid theoretical foundation for continuous-time preintegration and show that it can be easily incorporated into different graph-based sensor fusion methods. We investigate the improved accuracy afforded by two different models of continuous-time preintegration and scenarios in which they exhibit superior performance. We further develop both indirect and direct graph-based VINS and demonstrate their competitive performance to state-of-the-art methods.
Iii Estimation Preliminaries
The IMU state of an aided inertial navigation system at time step is given by (Mourikis and Roumeliotis, 2007):
where is the unit quaternion of JPL form parameterizing the rotation from the global frame to the current local frame (Trawny and Roumeliotis, 2005), and are the gyroscope and accelerometer biases, and and are the velocity and position of the IMU expressed in the global frame, respectively.
We note that while the state is embedded in , there are actually 15 degrees of freedom (DOF), due to the constraint that must have unit length. In fact, the state lies on the manifold defined as the product of the unit quaternions with (i.e., ). In order to represent the estimation problem on manifold, we employ the “boxplus” update operation, , which maps an element from a manifold, , and an error vector into a new element on (Hertzberg, Wagner, Frese, and Schröder, 2013). As illustrated in Figure 1, for a manifold of dimension , we can define the following operation:
Similarly, the inverse “boxminus” operation is given by:
In the case of a state in a vector space, , these operations are the standard addition and subtraction:
In the case of a unit quaternion expressed using the JPL convention, , we have (Trawny and Roumeliotis, 2005):
where refers to the vector portion of the quaternion argument (i.e., ). The quaternion multiplication, , is given by:
where for :
In state estimation, these operations allow us to model the state on manifold using a Gaussian distribution on its error state vector. In particular, the random variable with mean value takes the form:
where is the covariance of the zero-mean error state. The error state corresponding to Equation (1) is given by:
Iii-a Batch Optimization
In the case of graph SLAM, the graph nodes can correspond to historical robot states and features in the environment, while the edges represent collected measurements from sensors which relate the incident nodes. As an example, a robot measuring a feature would add an edge between the feature and the robot state node. Using this graph formulation and under the assumption of independent zero-mean Gaussian noise, we can find a maximum a posteriori (MAP) estimate of all states by solving the following nonlinear least-squares problem (Kummerle, Grisetti, Strasdat, Konolige, and Burgard, 2011):
where is the error/residual of measurement , is the associated information matrix (inverse covariance), and represents the squared energy norm. Optimization is typically performed iteratively by linearizing the nonlinear measurements about the current estimate, , and defining a new linear weighted least squares problem in terms of the error state :
We can see that the original optimization problem has been converted into finding the optimal correction vector, , to the current state estimate. The optimal solution can be found by solving the following normal equation:
After obtaining the optimal correction, , we update our current estimate, , and repeat the optimization process. After convergence, we will be left with the following distribution:
where the measurement Jacobians, , are evaluated at the final estimate.
In a naive graph SLAM formulation, nodes are continuously added to the graph as time progresses without consideration to the computational burden. For example, as a robot moves through an unknown environment we would add robot state nodes at every measurement time. This becomes a problem due to the high computational complexity, , of batch optimization, in the worst case. In order to bound the computational complexity of the system, marginalization is often performed to remove a set of nodes, called marginal states, from the graph, while retaining the information contained in their incident edges (see Figure 2 for an example) (Huang, Kaess, and Leonard, 2013; Eckenhoff, Paull, and Huang, 2016b). Partitioning the optimization variables into states remaining after marginalization, , and the to-be marginalized states, , we can write Equation (17) as the solution of the following minimization (Huang, Mourikis, and Roumeliotis, 2011):
The second subcost, , is associated with the measurements incident to the marginal states, and is a function of both these states and the remaining ones. The first, , refers to all other edges in the graph. The optimal estimate for the remaining nodes can be written as:
That is, minimizing with respect to yields a cost that is a function only of the remaining states. This minimization is performed as in Equation (20), where we write out the linear system for only the measurements involved in :
The optimal subcost , up to an irrelevant constant, is given by (Nerurkar, Wu, and Roumeliotis, 2014):222Throughout the paper, we reserve the symbol to denote the current estimate of state variable in optimization, while refers to the (inferred) measurement mean value.
where is the linearization point used to build the system (in practice, the current state estimate at the time of marginalization), and and are the marginalized Hessian and gradient, respectively.
In future optimization, this marginalization creates both a new quadratic and linear cost in terms of the error between the remaining states and their linearization points. This then replaces the marginal measurements in the original graph, and we can write this new cost up to a constant in the form of Equation (17):
where for the Jacobian of a vector (i.e., if ):
and for a quaternion , with , we have:
Iv Continuous Preintegration
In this section, we present in detail the proposed continuous IMU preintegration based on two different realistic inertial models, which is expected to be readily used in any graph-based aided inertial navigation, thus providing an essential building block for visual-inertial state estimation.
An IMU attached to the robot collects inertial readings of the underlying state dynamics. In particular, the sensor receives angular velocity and local linear acceleration measurements which relate to the corresponding true values and as follows:
where is the global gravity333Note that the gravity is slightly different in different parts of the globe. and is the rotation from the global frame to the instantaneous local inertial frame. The measurements are corrupted both by the time-varying biases and (which must be co-estimated with the state), and the zero-mean white Gaussian noises and . The standard dynamics of the IMU state is given by (Chatfield, 1997):
Iv-a Standard IMU Processing
Given a series of IMU measurements, , collected over a time interval , the standard (graph-based) IMU processing considers the following propagation function:
That is, the future state at time step is a function of the current state at step , the IMU measurements , and the corresponding measurement noise . Conditioning on the current state, the expected value of the next state is found by evaluating the propagation function with zero noise:
which implies that we perform integration of the state dynamics in the absence of noise. The residual for use in batch optimization of this propagation now constrains the start and end states of the interval and is given by (see Equation (17)):
where is the linearized, discrete-time noise covariance computed from the IMU noise characterization and is a function of the state. This noise covariance matrix and the propagation function can be found by the integration of Equations (37)-(41) and their associated error state dynamics, to which we refer the reader to (Trawny and Roumeliotis, 2005; Mourikis and Roumeliotis, 2007). It is clear from Equation (44) that we need constantly re-evaluate the propagation function and the residual covariance whenever the linearization point (state estimate) changes. The high frequency nature of the IMU sensors and the complexity of the propagation function and the noise covariance can make direct incorporation of IMU data in real-time graph-based SLAM prohibitively expensive.
Iv-B Model 1: Piecewise Constant Measurements
IMU preintegration seeks to directly reduce the computational complexity of incorporating inertial measurements by removing the need to re-integrate the propagation function and noise covariance. This is achieved by processing IMU measurements in a local frame of reference, yielding measurements that are, in contrast to Equation (44), independent of the state (Lupton and Sukkarieh, 2012).
Specifically, by denoting , we have the following relationship between a series of IMU measurements, the start state, and the resulting end state (Eckenhoff, Geneva, and Huang, 2016a):
From the above, we define the following preintegrated IMU measurements:444 Note that along with the preintegrated inertial measurements in Equations (52) and (53), the preintegrated relative-orientation measurement (or ) can be obtained from the integration of the gyro measurements.
To remove the dependencies of the above preintegrated measurements on the true biases, we linearize about the current bias estimates at time step , and . Defining , we have:
Note that Equations (54) and (55) are simple Taylor series expansions for our and measurements, while Equation (56) models an additional rotation induced due to a change of the linearization point (estimate) of the gyro bias (Forster, Carlone, Dellaert, and Scaramuzza, 2015; Eckenhoff, Geneva, and Huang, 2016a).
The preintegrated measurement’s mean values, , , and , must be computed for use in graph optimization. It is important to note that current preintegration methods (Lupton and Sukkarieh, 2012; Forster, Carlone, Dellaert, and Scaramuzza, 2015; Ling, Liu, and Shen, 2016) are all based on discrete integration of the measurement dynamics through Euler or midpoint integration. In particular, the discrete approximation used by *Forster2015RSS in fact corresponds to a piecewise constant global acceleration model. By contrast, we here offer closed-form solutions for the measurement means under the assumptions of piecewise constant measurements and of piecewise constant local acceleration (in Section IV-C).
Iv-B1 Computing preintegration mean
Between two image times, and , the IMU receives a series of inertial measurements. We denote as the step at which an IMU measurement is received, and as the step of the next IMU reading. The time associated with each of these steps is given by and , respectively. The relative orientation between the interval, , can be found using successive applications of the zeroth order quaternion integrator (Trawny and Roumeliotis, 2005). Based on the definitions of and (see Equations (52) and (53)), we have the following continuous-time dynamics at every step with :
From these governing differential equations, we formulate the following linear system that describes the evolution of the measurements by taking the expectation operation:
Given and sampled at time and assuming that these IMU measurements are piecewise constant during , we analytically solve the above linear time-varying (LTV) system to obtain the updated preintegration mean values, which are computed as follows
where we have employed the definitions: , , and . Clearly, these closed-form expressions reveal the higher order affect of the angular velocity on the preintegrated measurements due to the evolution of the orientation over the IMU samping interval.
Iv-B2 Computing preintegration covariance
In order to derive the preintegrated measurement covariance, we first write the linearized measurement error state system as follows (Eckenhoff et al., 2018):
which is akin to the standard VINS error state propagation equations in a local frame of reference (Mourikis and Roumeliotis, 2007).
It is important to note that in contrast to our previous work (Eckenhoff et al., 2016a, 2017), we here couple the preintegration bias and measurement evolution for improved accuracy. Note also that the bias error terms in Equation (61), and , describe the deviation of the bias over the interval due to the random-walk drift, rather than the error of the current bias estimate. Covariance propagation can be carried out using integration to obtain the discrete state transition matrix :
The propagation of the measurement covariance, , takes the following form:
where is the continuous-time IMU noise covariance. To keep presentation concise, the discrete-time noise covariance can be computed similarly as in (Trawny and Roumeliotis, 2005).
Iv-B3 Preintegration measurement residuals and Jacobians
For use in optimization, we form the associated measurement cost and residual as follows:
where we have employed and . In the above expression, and , are the Jacobian matrices of the pertinent residuals with respect to the biases, which are used to correct the measurements due to a change in the initial bias estimate , thus compensating for the fact that preintegrated measurements have been linearized about and without having to recompute the required integrals whenever the bias estimates change. In particular, using the fact that our preintegrated measurement means are linear in the acceleration bias (see Equation (60)), we have the following dynamics of its Jacobians:
Similarly, for the gyroscope bias Jacobians, we have:
where and are the terms in the second matrix of Equation (60). Moreover, the measurement Jacobians of these preintegrated measurements with respect to the error state, Equation (16), can also be found in Appendix A.1, which are essential for the batch optimization solver. For the detailed derivations and closed-form expressions of the preintegrated measurements and Jacobians, the reader is referred to our companion technical report (Eckenhoff et al., 2018).
Iv-C Model 2: Piecewise Constant Local Acceleration
The previous preintegration (Model 1) assumes that noiseless IMU measurements can be approximated as remaining constant over a sampling interval, which, however, might not always be a good approximation (see Figure 3). In this section, we propose a new preintegration model that instead assumes piecewise constant true local acceleration during the sampling time interval, which may better approximate motion dynamics in practice.
Note that we have moved the effect of gravity back inside the integrals. We then define the following vectors:
which essentially are the true local position displacement and velocity change during , and yields:
In particular, between two IMU measurement times inside the preintegration interval, , we assume that the local acceleration will be constant:
Using this sampling model we can rewrite Equation (75) as:
It is important to note that, since and are functions of both the biases and the initial orientation, we perform the following linearization with respect to these states:
where is the rotation angle change associated with the change of the linearization point of quaternion .
Iv-C1 Computing preintegration mean
To compute the new preintegrated measurement mean values, we first determine the continuous-time dynamics of the expected preintegration vectors by taking expectations of Equations (74) and (77), given by:
As in the case of Model 1 (see Section IV-B1), we can formulate a linear system of the new preintegration measurement vectors and find the closed-from solutions. Specifically, we can integrate these differential equations and obtain the solution similar to Equation (60), while using the new definition: , which serves as the estimate for the piecewise constant local acceleration over the sampling interval.