Scene-Aware Error Modeling of LiDAR/Visual Odometry for Fusion-based Vehicle Localization
Localization is an essential technique in mobile robotics. In complex environments, different localization modules must be fused to obtain robust results, and the error model plays a paramount role in this fusion. However, exteroceptive sensor-based odometries (ESOs), such as LiDAR/visual odometry, often deliver results with scene-related errors that are difficult to model accurately. To address this problem, this research designs a scene-aware error model for ESO, based on which a multimodal localization fusion framework is developed. In addition, an end-to-end learning method is proposed to train this error model using sparse global poses such as GPS/IMU results. The proposed method is realized for error modeling of LiDAR/visual odometry, and the results are fused with dead reckoning to examine the performance of vehicle localization. Experiments are conducted using both simulation and real-world data of experienced and unexperienced environments, and the results demonstrate that with the learned scene-aware error models, vehicle localization accuracy is substantially improved and the method adapts to unexperienced scenes.
Recent years have witnessed considerable progress in developing autonomous systems, where highly accurate vehicle localization is the key to achieving safe and efficient autonomy in a complex real world.
GNSSs (global navigation satellite systems) have been widely used in vehicle localization in outdoor environments and are usually combined with proprioceptive sensors such as IMUs (inertial measurement units) and wheel encoders for interpolating positions during satellite signal outages. However, such systems are restricted by GNSS conditions, and the IMU maintains accuracy only for short periods due to accelerometer biases and gyro drifts. Therefore, exteroceptive sensor-based approaches such as LiDAR odometry or visual odometry have been studied to assist in highly accurate localization. Hereinafter, we refer to exteroceptive sensor-based odometry as ESO and proprioceptive sensor-based odometry as PSO.
However, the performance of exteroceptive sensor-based localization is strongly related to the scene. When ESO results are fused with other localization approaches, precise error modeling is essential to fusion efficiency. Covariance is a widely used measure for error estimation. Many existing methods correlate pose uncertainty with the covariance of data matching, which can be an ill-posed problem in many situations. In addition, most existing works model only the error of their measurements or features, whereas few studies address the error of the final localization result.
This research proposes a method of scene-aware ESO error modeling for fusion-based localization, formulated as a mapping from given scene data to a prediction of the ESO error as an information matrix, the dual form of a covariance matrix. A CNN (convolutional neural network) is used to model the mapping, and a vehicle localization framework is devised to incorporate the scene-aware error model when fusing ESO for pose estimation. An end-to-end method is developed to train the error model using reliable global localization, such as GPS, as supervision, which may be available only sporadically. Accordingly, at each iteration, the vehicle is localized by forward propagation with the current parameters over a number of frames, and when a reliable GPS measurement is found, the global localization error is backpropagated along the pipeline to correct the CNN parameters.
The proposed method is realized for error modeling of LiDAR odometry and visual odometry, and the results are fused with dead reckoning to examine the performance of vehicle localization. Experiments are conducted using both simulation and real-world data. The former validates the adaptability of the proposed method in simple but typical scenes, while the latter examines its performance in a complex real world that contains experienced and unexperienced environments. The method is applied to several popular LiDAR and visual odometry methods and compared with traditional fusion approaches that use covariance-based error modeling. The experimental results demonstrate that with the learned scene-aware error models, vehicle localization accuracy is substantially improved and the method adapts to unexperienced scenes.
The paper is organized as follows. A literature review about ESO and the corresponding error model is presented in section II. An overview of the proposed error model learning method is given in section III. Experiments on LiDAR/visual odometry and the analysis of their results are illustrated in sections IV and V, respectively. Finally, section VI describes the conclusions and the direction of our future works.
Table I: Summary of ESO-related literature and their error models

| Type | Research | Category | Optimizing Method | Feature | Objective | Error Model |
|---|---|---|---|---|---|---|
| LO | Censi, 2008 | F-b | Lagrange's Multiplier | RP&Line | Feature Distance | |
| | Bosse, 2009 | F-b | WLS | Shape Info. | Match&Smoothness Func. | - |
| | Armesto, 2010 | F-b | LS | RP, Facet | Metric Distance | |
| | Zhang, 2014 | F-b | LM | Line, Plane | Feature Distance | - |
| | Velas, 2016 | F-b | SVD | Line | Feature Distance | - |
| | Wang, 2018 | F-b | EM, IRLS | RP | Likelihood Func. | - |
| | Magnusson, 2007 | Direct | Newton | RP | Joint Prob. | - |
| | Olson, 2009 | Direct | Search | RP | Posterior Observation Prob. | Cov. |
| | Olson, 2015 | Direct | Search | RP | Correlative Cost Func. | - |
| | Jaimez, 2016 | Direct | IRLS | RP (Range Flow) | Geometric Residual | - |
| | Ramos, 2007 | Heuristic/F-b | WLS | Local Feature | CRF Inference Error | - |
| | Diosi, 2007 | Heuristic/F-b | WLS, Parabola Fitting | RP | Polar Range Distance | Cov. |
| | Censi, 2009 | Heuristic/Direct | LS | RP (Hough) | Spectrum Func. | - |
| VO | Howard, 2008 | F-b | LM | Harris/FAST | RE | - |
| | Kitt, 2010 | F-b | ISPKF | Harris, etc. | RE | Cov. |
| | Mouats, 2014 | F-b | GN | Log-Gabor Wavelets | RE | - |
| | Gomez-Ojeda, 2016 | F-b | GN | ORB&LSD | RE | Cov. |
| | Zhang, 2012 | F-b | PHD | SIFT | RE | Cov. |
| | Engel, 2013 | Direct | RGN | - | PE | Cov. |
| | Kerl, 2013 | Direct | IRLS | - | PE | - |
| | Wang, 2017 | Direct | GN | - | PE | - |
| | Li, 2018 | Direct | GN | - | PE | - |
| | Engel, 2017 | Direct | GN | - | PE | - |
| | Forster, 2014 | Semi-Direct | GN | Sparse Feature Patches | PE&RE | - |
| | Wang, 2017 | Deep Learning | BP | - | Pose MSE | - |
| Others | Tanskanen, 2015 | Visual-Inertial | EKF | - | PE | Cov. |
| | Usenko, 2016 | Visual-Inertial | LM | - | Photometric-Inertial Energy | - |
| | Qin, 2018 | Visual-Inertial | LS | Harris | Feature&IMU Residual | - |
| | Zhang, 2015 | Visual-LiDAR | LM | Harris&RP | Feature Distance | - |
| | Hemann, 2016 | LiDAR-Inertial | KF | RP&DEM | Cross-correlation Func. | Cov. |
| | Barjenbruch, 2015 | Radar | Gradient-based | Spatial&Doppler Info. | Metric Func. | - |

*Abbreviations used in this table, in alphabetical order:
BP: Back Propagation
Cov.: Covariance
CRF: Conditional Random Field
DEM: Digital Elevation Model
EKF: Extended Kalman Filter
EM: Expectation Maximization
F-b: Feature-based
GN: Gauss-Newton
IMU: Inertial Measurement Unit
IRLS: Iteratively Reweighted Least Squares
ISPKF: Iterated Sigma Point Kalman Filter
KF: Kalman Filter
LM: Levenberg-Marquardt
LO: LiDAR Odometry
LS: Least Squares
MSE: Mean-square Error
PE: Photometric Error
PHD: Probability Hypothesis Density
RE: Reprojection Error
RGN: Reweighted Gauss-Newton
RP: Raw Point
SVD: Singular Value Decomposition
VO: Visual Odometry
WLS: Weighted Least Squares
II Related Works
II-A LiDAR Odometry
LiDAR odometry performs relative positioning by comparing laser measurements from successive LiDAR scans, a task more commonly known as scan matching. Following the conventional taxonomy of visual odometry, this paper divides LiDAR odometries into feature-based methods and direct methods according to whether explicit feature correspondence is needed.
Feature-based methods. A typical approach to scan matching is to build feature correspondences between successive LiDAR scans and then calculate the motion from the reference frame to the target frame from the matching results. For feature selection, various definitions of features, such as points, lines, planes and other self-defined local features, can be used alone or in combination. For the optimization of feature matching, many works are variants of the ICP (iterative closest point) algorithm, which iteratively minimizes the feature matching error using an optimizer such as least squares. However, feature association in such indirect matching methods incurs considerable computational cost and is prone to mismatches, which can make the estimate overconfident.
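A minimal sketch of the point-to-point ICP loop described above may make the two alternating steps concrete. It works in 2D and assumes brute-force nearest-neighbor association and the closed-form least-squares rigid alignment for planar point sets (rotation from the cross and dot sums of centered points); `icp_2d`, `align_2d` and `apply` are illustrative names, not code from any of the surveyed methods.

```python
import math

def apply(points, theta, tx, ty):
    """Rotate points by theta and translate them by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def align_2d(src, dst):
    """Closed-form least-squares rigid transform mapping src[i] onto dst[i]."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    s_cos = s_sin = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        s_cos += ax * bx + ay * by      # sum of dot products
        s_sin += ax * by - ay * bx      # sum of cross products
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def icp_2d(src, dst, iters=20):
    """Point-to-point ICP: alternate nearest-neighbor association and alignment."""
    theta, tx, ty = 0.0, 0.0, 0.0
    cur = list(src)
    for _ in range(iters):
        # Association: pair each current point with its closest target point.
        pairs = [min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                 for p in cur]
        dth, dtx, dty = align_2d(cur, pairs)
        cur = apply(cur, dth, dtx, dty)
        # Compose the increment with the running estimate: T <- dT * T.
        c, s = math.cos(dth), math.sin(dth)
        tx, ty = c * tx - s * ty + dtx, s * tx + c * ty + dty
        theta += dth
    return theta, tx, ty
```

For example, `icp_2d(scan, apply(scan, 0.05, 0.1, -0.05))` on a well-spaced scan returns approximately `(0.05, 0.1, -0.05)`. Because the association step trusts the nearest neighbor, a poor initial guess produces wrong pairs that the least-squares step still fits tightly, which is one way the overconfidence mentioned above can arise.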
Direct methods. To overcome the efficiency problem of feature association, some researchers have attempted to avoid building such explicit correspondences. One approach transforms the scan-to-scan matching problem into a correlation evaluation under a probabilistic framework, and it was later extended to 3D applications. Correlative scan matching employs a Monte Carlo sampling strategy, and its efficiency has been improved by multiresolution matching. A range flow-based approach has also been designed in the fashion of dense 3D visual odometry, performing scan alignment using scan gradients.
In addition, many heuristic methods have been proposed to compensate for the flaws of previous work, such as poor convergence or dependence on initialization. One method matches LiDAR points with the same bearing in polar coordinates so that it runs faster than ICP. A CRF (conditional random field)-based scan matching method takes high-level shape information into account. Another work uses the Hough transform to decompose the 6DoF search into a series of fast one-dimensional cross-correlations.
II-B Visual Odometry
Similar to LiDAR odometry, visual odometry retrieves camera motion from images taken at different poses. Visual odometries can likewise be divided into two classes: feature-based methods and direct methods.
Feature-based methods. These methods require feature extraction and association and mostly aim to minimize the reprojection error of the matched features. For feature extraction, typical point features such as corners are widely used. Line features and other novel features can also be used for different image scenes or camera sensors: ORB and LSD features have been combined to obtain more stable tracking in low-textured scenes, and with multispectral cameras, log-Gabor wavelets have been used to obtain interest points at different orientations and scales. In the optimization process, most works employ a nonlinear optimizer for feature matching between consecutive images. In addition, some works exploit filtering methods to track features over an image sequence: the iterated sigma point Kalman filter has been used to track the ego-motion trajectory and feature observations, and image features have been treated as group targets whose states are tracked with a probability hypothesis density filter. Like feature-based LiDAR odometries, most feature-based visual odometries face the problem of computational efficiency and accuracy in data association. Moreover, feature-based visual odometries use only the extracted features and ignore the remaining image information, which places a strong requirement on feature abundance.
Direct methods. To eliminate these shortcomings of feature-based visual odometries, direct visual odometries have appeared in recent studies. They use camera sensor measurements directly, without feature precomputation, and minimize the photometric error for pose estimation. For instance, a direct method has been presented for RGB-D cameras; direct sparse odometry combines photometric error minimization with the joint optimization of camera model parameters; and direct line guidance odometry uses lines to guide keypoint selection. There are also hybrid methods, such as semidirect visual odometry, and deep learning-based methods.
II-C Other Odometries
To further improve the accuracy of the aforementioned odometries, many studies have attempted to incorporate inertial sensors. Several works propose visual-inertial odometries, and an IMU has also been used to improve the performance of LiDAR odometry in long-range navigation. Using LiDAR and a camera together is another direction: visual-LiDAR odometry has been implemented with better robustness when visual features are lacking or motion is aggressive. There are also some works using radar sensors.
II-D Error Model
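As Table I shows, only a small portion of the ESO-related literature presents error models for uncertainty estimation. In these studies, the Hessian method and the sampling method are two representative routes for error modeling. Consider a simple odometry model as an example:

$$\hat{x} = \arg\min_{x} f(x, z) \tag{1}$$

where $x$ is the pose to be estimated, written as a column vector, $z$ is the corresponding observation set consisting of sensor measurements with i.i.d. noise of variance $\sigma^2$, and $f$ is the objective function that measures the matching error between $x$ and $z$. The error modeling methods can be formulated as follows.

Hessian method. When $f$ is designed to be analytical and differentiable as

$$f(x, z) = \sum_i \left\| z_i - h_i(x) \right\|^2 \tag{2}$$

where $h_i$ represents a measurement model mapping $x$ to $z_i$, Eq. 1 can be solved using least squares. The solution for $x$ can then be approached iteratively as

$$x_{k+1} = x_k + \Delta x, \qquad \Delta x = -\left(\frac{\partial^2 f}{\partial x^2}\right)^{-1} \frac{\partial f}{\partial x}\bigg|_{x_k} \tag{3}$$

in the Newton method, so that the conditional covariance of $\hat{x}$ can be derived as

$$\operatorname{cov}(\hat{x}) \approx \left(\frac{\partial^2 f}{\partial x^2}\right)^{-1} \frac{\partial^2 f}{\partial z\, \partial x}\, \sigma^2 I \left(\frac{\partial^2 f}{\partial z\, \partial x}\right)^{T} \left(\frac{\partial^2 f}{\partial x^2}\right)^{-1} \tag{4}$$

where $\partial^2 f / \partial x^2$ is the Hessian matrix of $f$. The measurement uncertainty $\sigma^2$ is thus propagated to $\hat{x}$ through the inverse Hessian matrix of $f$, which is why this approach is named the Hessian method.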
Hessian methods are widely used in various odometries. However, several problems restrict their usage. First, for feature matching-based odometry, Hessian methods depend on the strong assumption that the feature correspondence is established correctly. Second, in some cases, the inverse Hessian matrix is difficult to calculate and is instead approximated from a Jacobian matrix, which decreases the accuracy of the covariance. Third, it is difficult to ensure that the step in Eq. 3 is infinitesimal, as required by the covariance calculation for nonlinear least squares. Due to these problems, many studies attempt to extend the Hessian method case by case, but the results are still far from accurate.
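As a small worked instance of Eq. 4, consider a hypothetical linear model $h_i(x) = a t_i + b$ with $x = (a, b)$ and $f$ the sum of squared residuals. Because the model is linear, the Hessian is exactly $2J^TJ$ and Eq. 4 collapses to the familiar $\sigma^2 (J^T J)^{-1}$. The sketch below evaluates Eq. 4 term by term in plain Python so the two forms can be compared; the function names are illustrative.

```python
def matmul(A, B):
    """Dense matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def hessian_covariance(ts, sigma2):
    """Eq. 4 for f(x, z) = sum_i (z_i - a*t_i - b)^2, x = (a, b), cov(z) = sigma2*I."""
    n = len(ts)
    # d^2f/dx^2: exactly 2 J^T J, because h_i(x) = a*t_i + b is linear in x.
    H = [[2.0 * sum(t * t for t in ts), 2.0 * sum(ts)],
         [2.0 * sum(ts), 2.0 * n]]
    # d^2f/(dz dx): a 2 x n matrix whose i-th column is -2 * [t_i, 1]^T.
    M = [[-2.0 * t for t in ts], [-2.0] * n]
    # sigma2 * I * M^T, written directly as an n x 2 matrix.
    cov_z_MT = [[sigma2 * M[0][i], sigma2 * M[1][i]] for i in range(n)]
    H_inv = inv2(H)
    return matmul(matmul(matmul(H_inv, M), cov_z_MT), H_inv)
```

For `ts = [0, 1, 2, 3]` and `sigma2 = 0.04`, the result is `[[0.008, -0.012], [-0.012, 0.028]]`, i.e. $0.04\,(J^TJ)^{-1}$. The covariance does not depend on the observed values $z_i$ here; for a nonlinear $h$, the same expression is only a local approximation around the optimum, which relates to the third limitation above.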
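Sampling method. For a nonanalytical objective $f$, the covariance of the pose $x$ can be calculated by sampling poses according to a distribution $q(x)$, such as the prediction from the motion model. Given the sample set $\{x^{(i)}\}_{i=1}^{M}$, the weighted mean of these samples can be regarded as an estimate of $\hat{x}$:

$$\hat{x} \approx \frac{\sum_i p(z \mid x^{(i)})\, x^{(i)}}{\sum_i p(z \mid x^{(i)})} \tag{5}$$

where $p(z \mid x)$ denotes the probabilistic measurement model of the observations $z$, so that the covariance can be calculated as

$$\operatorname{cov}(\hat{x}) \approx \frac{\sum_i p(z \mid x^{(i)})\, \left(x^{(i)} - \hat{x}\right)\left(x^{(i)} - \hat{x}\right)^{T}}{\sum_i p(z \mid x^{(i)})} \tag{6}$$

where the superscript $T$ represents the transpose operator here and later.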
For instance, the sampling method has been used to calculate the covariance for scan matching. ROS package
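A sketch of Eqs. 5-6 follows. It uses a hypothetical Gaussian score as a stand-in for the measurement model $p(z \mid x)$, and it evaluates the score on a regular grid of candidate poses, as correlative scan matching does, instead of drawing random samples; both choices are illustrative assumptions.

```python
import math

def sampled_covariance(samples, likelihood):
    """Eqs. 5-6: likelihood-weighted mean and covariance of candidate poses."""
    w = [likelihood(x) for x in samples]
    total = sum(w)
    d = len(samples[0])
    mean = [sum(wi * x[k] for wi, x in zip(w, samples)) / total for k in range(d)]
    cov = [[sum(wi * (x[r] - mean[r]) * (x[c] - mean[c]) for wi, x in zip(w, samples)) / total
            for c in range(d)] for r in range(d)]
    return mean, cov

def gaussian_likelihood(x, center=(1.0, -0.5), var=(0.04, 0.25)):
    """Hypothetical stand-in for p(z | x): an axis-aligned Gaussian score."""
    return math.exp(-0.5 * sum((xi - ci) ** 2 / vi for xi, ci, vi in zip(x, center, var)))

# Candidate poses (x, y) on a regular 0.05 m grid covering the likely region.
grid = [(-1.0 + 0.05 * i, -3.0 + 0.05 * j) for i in range(81) for j in range(101)]
mean, cov = sampled_covariance(grid, gaussian_likelihood)
```

Because the stand-in score is Gaussian with variances 0.04 and 0.25, `mean` comes out close to (1.0, -0.5) and `cov` close to diag(0.04, 0.25). With a real matching score, the covariance scale is set entirely by how the score is defined, which is the sensitivity noted for Eq. 5 in the experiments.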
Overall, existing error modeling methods strongly depend on a series of definitions and assumptions, which may have a negative influence on uncertainty estimation. For example, the measurement model and objective function in the Hessian method and the sampling distribution in the sampling method need to be manually designed or approximated, and thus may not objectively reflect the true relationship between the target parameters and the sensor observations. In addition, for fusion-based localization, two other important characteristics of ESO error are often overlooked. First, the error model should be compatible with other localization modules so that their uncertainties can be compared. More importantly, the performance of ESO is sensitive to the scene. Therefore, a scene-aware error model is needed to capture the relationship between odometry performance and the environment.
III-A Fusion-based Localization with Scene-Aware Error Modeling
Assume that a PSO such as dead reckoning has a system error whose model parameters are calibrated and not correlated with the scene. Using such a PSO as a reference, a scene-aware error model of an ESO such as LiDAR or visual odometry can be learned, and multimodal fusion-based localization can be achieved as described in Fig. 1.
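At time $t$, let $u_t$ and $z_t$ be the relative poses estimated by PSO and ESO, respectively:

$$u_t = x_t + \epsilon_u,\ \ \epsilon_u \sim \mathcal{N}(0, \Sigma_u); \qquad z_t = x_t + \epsilon_z,\ \ \epsilon_z \sim \mathcal{N}(0, \Sigma_{z,t})$$

where $x_t$ is the true relative pose, and $\epsilon_u$ and $\epsilon_z$ are the Gaussian noise of $u_t$ and $z_t$ with respective covariances $\Sigma_u$ and $\Sigma_{z,t}$. $\Sigma_u$ is a system error, while $\Sigma_{z,t}$ is predicted by a scene-aware error model from data $s_t$ that describe the scene at that moment.

An information filter is used to find an MAP (maximum a posteriori) estimate of the relative pose $x_t$, which is represented by a mean pose $\mu_t$ and a covariance matrix $\Sigma_t$. In Fig. 1, we denote the fusion module by $F$, i.e., $(\mu_t, \Sigma_t) = F(u_t, \Sigma_u, z_t, \Sigma_{z,t})$, which operates recurrently and thereby keeps a memory of past frames. The process is detailed in Section III-B.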
Since accurate relative poses are difficult to obtain as ground truth, we use global poses from RTK-GPS instead. Supervision is conducted sporadically, when the following two conditions are met: 1) a reliable GPS measurement is obtained, and 2) the relative pose error accumulated over $N$ frames exceeds the error level of the GPS.
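Representing the mean relative pose $\mu_t$ estimated by the fusion module as a homogeneous matrix,

$$T_t = \begin{bmatrix} R_t & p_t \\ 0^{T} & 1 \end{bmatrix}$$

where $R_t$ and $p_t$ are the rotation matrix and translation vector, the vehicle's mean pose $X_t$ in a global coordinate system can be estimated by accumulating the relative motions sequentially from an initial global pose $X_0$:

$$X_t = X_{t-1} T_t = X_0 T_1 T_2 \cdots T_t$$

In Fig. 1, we denote the pose accumulator module by $A$, i.e., $X_t = A(X_{t-1}, \mu_t)$, which also operates recurrently and keeps a memory of past frames.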
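The accumulation $X_t = X_{t-1} T_t$ can be sketched for the planar case, assuming each relative pose is given as $(\Delta x, \Delta y, \Delta\phi)$ in the previous vehicle frame; `se2`, `accumulate` and `to_pose` are illustrative helpers, not the authors' code.

```python
import math

def se2(dx, dy, dphi):
    """Homogeneous matrix T = [[R, p], [0, 1]] for a planar relative pose."""
    c, s = math.cos(dphi), math.sin(dphi)
    return [[c, -s, dx], [s, c, dy], [0.0, 0.0, 1.0]]

def matmul3(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def accumulate(X0, relative_poses):
    """X_t = X_{t-1} T_t: chain relative motions onto an initial global pose."""
    X = X0
    trajectory = [X0]
    for dx, dy, dphi in relative_poses:
        X = matmul3(X, se2(dx, dy, dphi))   # right-multiply: motion in the vehicle frame
        trajectory.append(X)
    return trajectory

def to_pose(X):
    """Recover (x, y, heading) from a homogeneous matrix."""
    return X[0][2], X[1][2], math.atan2(X[1][0], X[0][0])
```

Four steps of (1 m, 0, 90°) from the identity trace a unit square, passing through (1, 1) after two steps and returning to the origin with zero heading. This is a quick check that right-multiplication applies each motion in the current vehicle frame.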
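When the supervision conditions are met, given a reliable GPS measurement $\hat{X}_t$, the localization error is backpropagated along the pipeline to optimize the parameters $\theta$ of the error model. Hence, the major pipeline of fusion-based localization with scene-aware error modeling can be summarized by the following formulas, where the subscript $t$ is omitted for conciseness:

$$\Sigma_z^{-1} = g(s; \theta), \qquad (\mu, \Sigma) = F(u, \Sigma_u, z, \Sigma_z), \qquad X \leftarrow A(X, \mu), \qquad \theta^{*} = \arg\min_{\theta} e(X, \hat{X})$$

where $g$ denotes the scene-aware error model applied to the scene data $s$, $F$ the fusion module, $A$ the pose accumulator, and $e$ the localization error between the estimated pose $X$ and the GPS pose $\hat{X}$.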
III-B PSO and ESO Fusion
Relative Pose Fusion Using an Information Filter
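The maximum a posteriori (MAP) estimate of the vehicle's relative motion state can be formulated as below, consisting of two successive steps in each iteration, i.e., prediction using the vehicle control $u_t$ and updating using the measurement $z_t$:

$$\overline{bel}(x_t) = \int p(x_t \mid u_t, x_{t-1})\, bel(x_{t-1})\, dx_{t-1}, \qquad bel(x_t) = \eta\, p(z_t \mid x_t)\, \overline{bel}(x_t)$$

where $\eta$ is a normalizer and the MAP estimate is the mode of $bel(x_t)$. When each relative motion $x_t$ occurs within a very short time, this recursive estimation can also be regarded as a tracking process over a series of velocity measurements.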
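An information filter can be used to estimate the vehicle pose, with the multivariate Gaussian distribution represented in canonical form by an information vector $\xi$ and an information matrix $\Omega$:

$$\Omega = \Sigma^{-1}, \qquad \xi = \Sigma^{-1}\mu, \qquad p(x) \propto \exp\left(-\tfrac{1}{2}\, x^{T} \Omega\, x + x^{T} \xi\right)$$

where $\Sigma$ denotes the covariance matrix of this distribution. Obviously, $\Omega$ and $\Sigma$ are dual measures of uncertainty.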
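The conversion between the moment form $(\mu, \Sigma)$ and the canonical form $(\xi, \Omega)$ can be sketched with a small Gauss-Jordan inverse; the helper names are illustrative and suited only to the small matrices used in relative pose estimation.

```python
def inverse(A):
    """Gauss-Jordan inverse with partial pivoting for small dense matrices."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def to_canonical(mu, Sigma):
    """(mu, Sigma) -> (xi, Omega) with Omega = Sigma^-1 and xi = Omega mu."""
    Omega = inverse(Sigma)
    return matvec(Omega, mu), Omega

def from_canonical(xi, Omega):
    """(xi, Omega) -> (mu, Sigma); fails if Omega has a zero-information direction."""
    Sigma = inverse(Omega)
    return matvec(Sigma, xi), Sigma
```

An information matrix with a zero eigenvalue, e.g. no information along a featureless corridor, is a valid canonical description but has no finite covariance; `from_canonical` raises `ValueError` on it, which illustrates one advantage of carrying uncertainty in canonical form.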
By extending the original information filter, a relative pose fuser (RPF) is developed in this research, as listed in Algorithm 1. At each frame $t$, given $u_t$ and $z_t$ from the PSO and ESO modules, as well as their covariance matrices $\Sigma_u$ and $\Sigma_{z,t}$ as uncertainty estimates, the RPF estimates the mean relative motion $\mu_t$ at time $t$, together with the information vector $\xi_t$ and information matrix $\Omega_t$. Here, we use the information matrix $\Omega_{z,t} = \Sigma_{z,t}^{-1}$ instead of $\Sigma_{z,t}$ as the input of the RPF. The reasons are twofold: 1) taking $\Omega_{z,t}$ as input avoids the cost of matrix inversion; and 2) the numerical estimation of $\Omega_{z,t}$ is more stable when the ESO carries little or no information along some direction, so the corresponding formula in Eq. 10 is rewritten in terms of $\Omega_{z,t}$.
Step 1 of Algorithm 1 is a coordinate transformation. Since this research estimates the vehicle's relative pose using the information filter, at each frame $t$ the converted previous mean is $\mu'_{t-1} = \mathbf{0}$, denoting the zero position, and consequently $\xi'_{t-1} = \mathbf{0}$, where $\mathbf{0}$ represents the zero vector or matrix here and later. On the other hand, the information matrix $\Omega_{t-1}$ obtained in the previous iteration needs to be transformed to compensate for the rotation between consecutive vehicle coordinate systems. Assuming that the previous relative motion can be decoupled into a rotation factor and a translation factor, only the rotation factor changes the information matrix, since a translation shifts the mean but not the spread of the distribution.

Here, $\mu_{t-1}$, $\xi_{t-1}$ and $\Omega_{t-1}$ denote the results of the estimation at time $t-1$, expressed in the vehicle's coordinate system at time $t-2$, whereas $\mu'_{t-1}$, $\xi'_{t-1}$ and $\Omega'_{t-1}$ are the converted results used in the estimation at time $t$, expressed in the vehicle's coordinate system at time $t-1$.
Steps 2-3 predict the information matrix and vector by incorporating the outputs of the reference PSO module, steps 4-5 perform the measurement update using results from the target ESO, and step 6 converts from the canonical representation to find the mean relative pose $\mu_t$.
Extension to Multimodal Fusion
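This model can be easily extended to a system with other mutually independent odometry modules. The output of each additional target ESO module can be treated as an independent observation, as in the two-module system. Assuming that there are $K$ target ESO modules and that the $k$th observation and its covariance are $z_t^{k}$ and $\Sigma_{z,t}^{k}$, the probabilistic formulation can be extended, with predicted belief $\overline{bel}(x_t)$ and normalizer $\eta$, as

$$bel(x_t) = \eta\, \overline{bel}(x_t) \prod_{k=1}^{K} p(z_t^{k} \mid x_t)$$

so that in the information filter framework, steps 4 and 5 in Algorithm 1 can be rewritten as

$$\Omega_t = \bar{\Omega}_t + \sum_{k=1}^{K} \Omega_{z,t}^{k}, \qquad \xi_t = \bar{\xi}_t + \sum_{k=1}^{K} \Omega_{z,t}^{k}\, z_t^{k}$$

where $\bar{\Omega}_t$ and $\bar{\xi}_t$ are the predicted information matrix and vector, and $\Omega_{z,t}^{k} = (\Sigma_{z,t}^{k})^{-1}$.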
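The multimodal update can be sketched as a simplified single-frame fuser. Assumptions: the PSO estimate $(u_t, \Sigma_u)$ is used directly as the prediction (the history term and the step-1 rotation are omitted), and every ESO directly observes the relative pose. `fuse_relative_pose` and its arguments are hypothetical names, not the authors' Algorithm 1.

```python
def inverse(A):
    """Gauss-Jordan inverse with partial pivoting for small dense matrices."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def diag(*values):
    return [[v if i == j else 0.0 for j in range(len(values))] for i, v in enumerate(values)]

def fuse_relative_pose(u, Sigma_u, observations):
    """Prediction from the PSO, then one information update per ESO.

    observations: list of (z_k, Omega_k); Omega_k may be singular.
    """
    Omega = inverse(Sigma_u)                 # predicted information matrix
    xi = matvec(Omega, u)                    # predicted information vector
    for z, Omega_z in observations:          # steps 4-5, summed over K modules
        Omega = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(Omega, Omega_z)]
        xi = [a + b for a, b in zip(xi, matvec(Omega_z, z))]
    mu = matvec(inverse(Omega), xi)          # step 6: back to the mean pose
    return mu, Omega
```

In a usage example with `u = [1.0, 0.0, 0.1]`, `Sigma_u = diag(0.04, 0.04, 0.01)`, a corridor-like ESO `([1.2, 0.05, 0.12], diag(0, 100, 100))` and a second ESO `([1.1, 0.0, 0.0], diag(25, 0, 0))`, the fused mean is about `[1.05, 0.04, 0.11]`: the first ESO contributes nothing along x, where its information is zero, yet sharpens y and heading.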
III-C Scene-Aware Error Model Learning
The pipeline is shown in Fig. 2. For an ESO such as LiDAR/visual odometry, the scene data $s$ can be obtained from its input, such as a camera image or a LiDAR point cloud, and the next step is to map the scene data to the pose error, namely, the information matrix $\Omega_z$ in the RPF. An information matrix is symmetric and positive definite when the corresponding covariance matrix is positive definite; hence, it can be factorized by Cholesky decomposition

$$\Omega_z = L L^{T}$$

where $L$ is the unique lower-triangular matrix of $\Omega_z$ with positive diagonal entries. Define an information descriptor $d$ consisting of all the independent elements of $L$. The neural network is customized for different types of scene information; it takes $s$ as input and outputs $d$, from which $\Omega_z$ can be uniquely determined as the predicted error.
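A sketch of the descriptor-to-information mapping follows. It assumes the descriptor fills the lower triangle row by row and that diagonal entries pass through `exp` to stay positive, which is one common way to make the factor unique and $\Omega_z$ positive definite; the source does not specify this detail. For a 3-DoF pose, the descriptor has $3 \cdot 4 / 2 = 6$ entries.

```python
import math

def descriptor_to_information(d, n=3):
    """Map an information descriptor d (n(n+1)/2 values) to (L, Omega = L L^T)."""
    if len(d) != n * (n + 1) // 2:
        raise ValueError("descriptor must have n(n+1)/2 entries")
    L = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            # Assumption: exp() keeps the diagonal positive, so L is the unique
            # Cholesky factor and Omega is symmetric positive definite.
            L[i][j] = math.exp(d[k]) if i == j else d[k]
            k += 1
    Omega = [[sum(L[i][m] * L[j][m] for m in range(n)) for j in range(n)] for i in range(n)]
    return L, Omega

def cholesky(A):
    """Standard Cholesky factorization, used here only to check uniqueness."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L
```

Any real-valued network output therefore yields a valid information matrix, and factorizing that matrix again recovers the same $L$, so the descriptor and $\Omega_z$ determine each other one to one.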
To learn the parameter set $\theta$ of the neural network in Fig. 2, standard supervised learning is not applicable, as the ground truth of neither the information descriptor nor the information matrix is available. However, the vehicle's ground truth position can be obtained under certain conditions, e.g., measured by a GPS/IMU suite or a loop closure detector. A measured pose $\hat{X}_t$ is considered a ground truth location if and only if it is reliable and the error accumulated in $X_t$ exceeds $\delta$, where $X_t$ is the estimate at time $t$ obtained by fusing scan matching and dead reckoning outputs, and $\delta$ is the error level of the measurement $\hat{X}_t$.
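Given a parameter set $\theta$, the localization module starts from a ground truth $\hat{X}_{t-N}$ and estimates the vehicle pose for $N$ steps. The localization error caused by $\theta$ accumulates during these steps and is evaluated at time $t$ as an error $e_t(\theta)$ between the estimate $X_t$ and the ground truth $\hat{X}_t$ that combines the location error and the heading angle error, where the hyperparameter $\lambda$ weights the two terms.
Given a pair of ground truth positions $(\hat{X}_{t-N}, \hat{X}_t)$, the objective is to learn a parameter set $\theta$ that minimizes the error $e_t(\theta)$, subject to the localization pipeline that produces $X_t$ from $\hat{X}_{t-N}$.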
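As an illustration, the error below assumes a squared form, $e = \|p_t - \hat{p}_t\|^2 + \lambda\, \Delta\phi^2$, with the heading difference wrapped to $(-\pi, \pi]$. The squared form and the name `pose_error` are assumptions; the default $\lambda = 100$ is the value used in the real data experiment.

```python
import math

def pose_error(est, gt, lam=100.0):
    """Assumed squared error between (x, y, phi) estimate and ground truth."""
    dx, dy = est[0] - gt[0], est[1] - gt[1]
    # Wrap the heading difference so that +179 deg vs -179 deg counts as 2 deg.
    dphi = math.atan2(math.sin(est[2] - gt[2]), math.cos(est[2] - gt[2]))
    return dx * dx + dy * dy + lam * dphi * dphi
```

With $\lambda = 100$, a heading error of 0.1 rad weighs as much as a position error of 1 m, which is why the weight matters when errors are measured in meters and radians.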
$\theta$ is refined iteratively whenever a pair of ground truth vehicle positions is obtained, and learning is conducted in two successive steps, forward prediction and backpropagation, which are described in Fig. 3 and Algorithm 2.
Forward prediction estimates a sequence of vehicle poses with the current parameter set $\theta$ for $N$ steps, where the process of each step is described in lines 5-13 of Algorithm 2. Starting from $\hat{X}_{t-N}$, forward prediction produces an estimate $X_t$ of the vehicle pose at time $t$.
Backpropagation refines $\theta$ to minimize the error between the estimated vehicle pose $X_t$ and its ground truth $\hat{X}_t$. The fusion function $F$, the pose accumulator $A$, and the neural network are differentiable, so the error can be backpropagated from time $t$ to $t-N$ and $\theta$ updated by stochastic gradient descent. The gradient estimation and the backpropagation process are described in lines 14-23 of Algorithm 2.
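The forward-prediction and backpropagation loop can be illustrated with a deliberately tiny scalar model. All of the following are hypothetical simplifications: poses are 1D, each frame belongs to one of two scene classes, the "network" is one learnable log-information value per class, the PSO information is fixed at 1, and the gradient is derived by hand with the chain rule, which is what backpropagation would compute automatically through the network, $F$ and $A$.

```python
import math

def forward(a, steps, info_pso=1.0):
    """Forward prediction over N steps.

    a[s]  : learned log-information of the ESO in scene class s (hypothetical
            stand-in for the network output).
    steps : list of (scene_class, u, z) with PSO and ESO step estimates.
    Returns the accumulated position and d(position)/d(a[s]) for each s.
    """
    pos = 0.0
    dpos = [0.0] * len(a)
    for scene, u, z in steps:
        info_eso = math.exp(a[scene])
        k = info_eso / (info_pso + info_eso)      # 1D information-filter gain
        pos += u + k * (z - u)                    # fuse the step, then accumulate
        dpos[scene] += k * (1.0 - k) * (z - u)    # chain rule: dk/da = k(1 - k)
    return pos, dpos

def train(steps, gt, a, lr=0.5, epochs=200):
    """Repeat forward prediction and a gradient step on e = (pos - gt)^2."""
    for _ in range(epochs):
        pos, dpos = forward(a, steps)
        residual = pos - gt
        a = [ai - lr * 2.0 * residual * g for ai, g in zip(a, dpos)]
    return a

# Biased PSO (1.1 m per true 1.0 m step), accurate ESO in scene 0,
# degenerate ESO in scene 1; one ground-truth position after 10 steps.
steps = [(i % 2, 1.1, 1.0 if i % 2 == 0 else 1.5) for i in range(10)]
gt = 10.0
a0 = [0.0, 0.0]
a_learned = train(steps, gt, a0)
loss_before = (forward(a0, steps)[0] - gt) ** 2
loss_after = (forward(a_learned, steps)[0] - gt) ** 2
```

Training raises the learned information for scene 0 and lowers it for scene 1, so the model learns which scene's ESO output to trust from a single sparse ground-truth position, without ever seeing a ground-truth covariance.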
IV Experiment on LiDAR Odometry
An overview of the processing flow for fusing LiDAR odometry is given in Fig. 4. A simple dead reckoning (DR) is used as the reference PSO. For the target LiDAR odometry, two classical 2D scan matching algorithms, CSM and PLICP, are selected for error model learning, and their traditional error models, which correspond to the aforementioned sampling method and Hessian method, respectively, are used for comparison with our method.
Three sets of experimental results are presented. First, simulation data from specifically designed environments are used to verify the proposed method and demonstrate that the predicted error models capture scene properties. Second, real-world data from an instrumented vehicle are used, where training and testing are conducted in the same campus environment to compare performance. Third, experiments in unexperienced environments are conducted, where training and testing are performed at different sites to demonstrate the generality of the proposed method in unexperienced scenes.
IV-A State Definition and Network Design
For intelligent vehicles, 2D localization is generally sufficient in structured urban environments, so the 2D case of Eq. 7 is used. More specifically, for relative positioning, any pose state is defined as a column vector of 3 independent elements $(x, y, \phi)^{T}$, where $x$ and $y$ are the displacements relative to the zero position in the local coordinate system, and $\phi$ is the heading change (Euler angle).
For local relative localization, the scene does not change considerably over such short travel distances (several meters). Therefore, one of the two frames in scan matching suffices to represent the local scene and provides sufficient scene information as network input. Given a LiDAR scan $s$, a neural network is designed to map it to an information matrix $\Omega_z$ that models the error of the scan matching result on $s$. A CNN (convolutional neural network) is used because of its strong performance demonstrated in the literature. To this end, a LiDAR scan is first converted to a binary image by regularly tessellating an ego-centered horizontal space: each pixel value is 0 or 1, where 0 means that no LiDAR observation falls into the grid cell and 1 indicates that at least one LiDAR beam hit the cell. In this research, considering learning efficiency and the sparsity of LiDAR points, an image of fixed dimension and pixel size is generated for each scan. The detailed network structure is given in Fig. 5.
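A minimal sketch of the scan-to-binary-image conversion, assuming points are given in the vehicle frame in meters with the vehicle at the image center; `size` and `cell` are hypothetical parameters, not the values used in this research.

```python
import math

def scan_to_binary_image(points, size, cell):
    """Rasterize 2D LiDAR points into a size x size ego-centered 0/1 image.

    points: (x, y) hits in the vehicle frame, meters (vehicle at image center).
    size:   number of pixels per side (hypothetical value).
    cell:   pixel edge length in meters (hypothetical value).
    """
    image = [[0] * size for _ in range(size)]
    half = size * cell / 2.0
    for x, y in points:
        col = math.floor((x + half) / cell)
        row = math.floor((y + half) / cell)
        if 0 <= row < size and 0 <= col < size:   # drop hits outside the area
            image[row][col] = 1                   # at least one beam hit the cell
    return image
```

Several hits in one cell still give a single 1, and returns beyond the covered area are discarded, so the image encodes only which nearby cells are occupied, which is the scene structure the CNN needs.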
IV-B Simulation Data Experiment
Gazebo and ROS are used to build an artificial environment and collect simulated sensor data. Since corridor scenes with two parallel featureless walls are very challenging for scan matching and the fusion-based approaches built on it, such an environment is built, as shown in the first column of Fig. 6. The sensor set of the simulated car model includes a 360-degree horizontal LiDAR for scan matching, and a wheel encoder and a yaw rate sensor for dead reckoning. To make the simulated data more realistic, Gaussian noise is added to these sensor readings.
In data collection, the simulated car traveled along the corridor with a series of steering operations so that the direction of the LiDAR frame changed continuously. Two sets of data are collected for training and testing by driving the car along a rectangular and a circular trajectory, respectively.
Learning Result of Scene-Aware Error Model
Because the corridor walls are straight and parallel, the point features are repetitive, and the error distribution of scan matching in such a scene usually has its principal direction along the passage. For comparison, the covariance can also be estimated by a conventional solution. Several typical cases in the testing process are shown in Fig. 6, where the results of our method and those of the conventional method are shown side by side. The covariances are represented by 2-standard-deviation ovals and sampled scatter points drawn in the plane. Because it depends on sampling, the conventional solution recovers the correct principal direction of the covariance but not an accurate scale, which may degrade localization fusion.
In contrast, our model recovers not only the correct principal direction of the covariance but also a more accurate covariance scale that matches the actual error scale of the odometry. The position error statistics of every 40-meter-long trajectory segment in the testing process are shown in Fig. 9, in which Fig. 9(a) gives the distribution of the Euclidean distance error at the end of each segment, and Fig. 9(b) shows the corresponding yaw error distribution. Our method achieves clearly better localization accuracy with both LiDAR odometry algorithms.
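The 2-standard-deviation ovals in such plots can be computed from a 2×2 position covariance through its eigenvalues. The sketch below uses the closed-form eigendecomposition of a symmetric 2×2 matrix; the helper name is illustrative.

```python
import math

def two_sigma_ellipse(cov):
    """Semi-axes (2 sigma) and major-axis orientation for a 2x2 covariance."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    mean_eig = (a + c) / 2.0
    spread = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    major, minor = mean_eig + spread, mean_eig - spread   # eigenvalues
    angle = 0.5 * math.atan2(2.0 * b, a - c)               # major-axis direction
    return 2.0 * math.sqrt(major), 2.0 * math.sqrt(max(minor, 0.0)), angle
```

For a corridor running at 45° with four times more variance along the passage, `[[2.5, 1.5], [1.5, 2.5]]`, the function returns semi-axes 4 and 2 with the major axis at π/4. The direction and the size of the oval are separate outputs, matching the distinction drawn above between a correct principal direction and a correct scale.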
IV-C Real Data Experiment
An instrumented vehicle, as shown in Fig. 13, is used to collect data at a real-world scene to evaluate the performance of the proposed method. The following sensors are used: 1) LiDARs are horizontally mounted in the front and rear of the car profile for scan matching; 2) a wheel encoder and a yaw rate sensor are used for dead reckoning; and 3) a highly accurate GPS/IMU suite is used to obtain ground truth locations of the vehicle for model training and localization result evaluation.
For experiments on experienced scenes, two sets of data are collected in the same region of the Peking University campus on different days for training and testing. For experiments on unexperienced scenes, we collect a large-scale dataset in several different regions with a total mileage of approximately 10 km. Testing data account for 40% of the dataset, and most of the test scenes are not seen by the LiDAR in the training data. To avoid a large accuracy disparity between the reference PSO and the target ESO, which may leave no complementary information for fusion, we manually adjusted the accuracy settings of the sensors in the experiments for different control groups.
During the experiment, a new scan matching is triggered once the car moves ahead by 1.0 meter or the heading angle changes by 30 degrees since the last operation. In the training process, we use a constant step $N$, meaning that the program conducts forward prediction with the current parameter set $\theta$ over every $N$ steps. Then, a ground truth $\hat{X}_t$ is obtained from the GPS/IMU suite and used to adjust $\theta$ through backpropagation along the sequence to minimize the error between $\hat{X}_t$ and the predicted location $X_t$. The hyperparameter $\lambda$ of the loss function is set to 100.0 in this research to weight the errors in distance and angle.
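The scan-matching trigger can be sketched as follows, assuming poses are $(x, y, \phi)$ from dead reckoning and that reaching either threshold counts as moving far enough; the function name is hypothetical.

```python
import math

def should_trigger(last, current, dist_thresh=1.0, angle_thresh=math.radians(30.0)):
    """True if the vehicle moved >= dist_thresh meters or turned >= angle_thresh."""
    dx, dy = current[0] - last[0], current[1] - last[1]
    dphi = current[2] - last[2]
    dphi = math.atan2(math.sin(dphi), math.cos(dphi))  # wrap to (-pi, pi]
    return math.hypot(dx, dy) >= dist_thresh or abs(dphi) >= angle_thresh
```

Wrapping the heading difference matters near ±180°: a change from 3.1 rad to -3.1 rad is a turn of about 5 degrees, not 355 degrees, and should not trigger a new scan match.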
Learning Result of Scene-Aware Error Model
CSM is used to examine the performance of our error model at different training stages. During training, a new $\theta$ is learned every $N$ steps when a ground truth location is obtained, and this procedure iterates until a stopping condition is reached. Below, we use “epoch” to denote a single pass through the full training set, and let $\theta_k$ represent the parameter set learned at epoch $k$. For each specific scene, the predicted error covariance of LiDAR scan matching changes with $k$. This is analyzed in Fig. (a), where three scenes are selected and the predicted covariance is represented by 2-standard-deviation ovals and sampled scatter points. With the initial parameter set $\theta_0$, the predicted covariances of all three scenes have quite similar shapes. As the number of epochs increases, the shapes vary differently but tend to converge to their own stable states. We use four parameter sets from successive training stages to estimate the sequences of vehicle poses, which are drawn in Fig. (b) as trajectories A, B, C and D, respectively. The localization error decreases progressively from trajectory A to D, demonstrating the effectiveness of the learning procedure, through which the accuracy of the predicted error model is greatly improved.
Localization Accuracy in Experienced Environments
The localization accuracy of the proposed method is compared with dead reckoning, the LiDAR odometries CSM and PLICP, and their conventional fusion using covariance estimates from the traditional error models. The sample trajectories estimated by these methods on testing data are shown in Fig. (a). The trajectories of our fusion method (solid lines) are closer to the ground truth than those of the traditional fusion methods (dash-dotted lines). With the GPS/IMU output as the ground truth, the position and heading error statistics of every 100-meter-long trajectory segment are plotted in Fig. (b) and Fig. (c). For the fusion of DR and CSM, our method reduces the average Euclidean distance error by 12.7% and the average yaw error by 49.5%; for the fusion of DR and PLICP, it reduces them by 48.1% and 75.1%, respectively.
Localization Accuracy in Unexperienced Environments
Similarly, localization trajectories on testing data from unexperienced regions are compared with those of the other methods in Fig. (a), and the position and heading error statistics of every 100-meter-long trajectory segment are plotted in Fig. (b) and (c). As Fig. (a) shows, the trajectories of our fusion methods (solid lines) are closer to the ground truth than the traditional trajectories (dash-dotted lines). Statistically, our methods reduce the average Euclidean error by 18.5% (DR+CSM) and 27.4% (DR+PLICP) and the average yaw error by 33.9% (DR+CSM) and 45.8% (DR+PLICP).
When comparing with conventional error modeling, note that the measurement noise parameter in Eq. 4 of the Hessian method and the likelihood in Eq. 5 of the sampling method have a strong effect on the covariance scale, which can lead to different fusion accuracies in RPF. Therefore, in the real data experiments, we perform a grid search to rescale the covariance produced by each conventional method, so that each is compared with our method at its best-performing covariance scale.
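The covariance rescaling can be illustrated with a toy one-dimensional example. In this sketch (our own simplification, not the RPF pipeline), per-step DR and ESO increments are fused by inverse-variance weighting, the ESO variance comes from an overconfident "conventional" estimate, and a log-spaced grid of scale factors is searched for the value minimizing the RMS error of the fused increments. All names, noise levels, and the error metric are assumptions.

```python
import numpy as np

def fuse_increments(dr, eso, var_dr, var_eso):
    """Inverse-variance (information-weighted) fusion of two independent estimates."""
    w_dr, w_eso = 1.0 / var_dr, 1.0 / var_eso
    return (w_dr * dr + w_eso * eso) / (w_dr + w_eso)

def grid_search_scale(true_inc, dr, eso, var_dr, var_eso_est, scales):
    """Return the scale factor on var_eso_est with the lowest RMS fused error."""
    errors = np.array([
        np.sqrt(np.mean((fuse_increments(dr, eso, var_dr, s * var_eso_est) - true_inc) ** 2))
        for s in scales
    ])
    return scales[int(np.argmin(errors))], errors

def make_toy_data(n=5000, seed=0):
    """Toy increments: DR noise std 0.1, ESO noise std 0.05 (assumed values)."""
    rng = np.random.default_rng(seed)
    true_inc = np.ones(n)
    dr = true_inc + rng.normal(0.0, 0.1, n)
    eso = true_inc + rng.normal(0.0, 0.05, n)
    return true_inc, dr, eso

if __name__ == "__main__":
    true_inc, dr, eso = make_toy_data()
    var_eso_est = 0.05 ** 2 / 50.0          # "conventional" estimate, 50x overconfident
    scales = np.logspace(-1, 3, 41)
    best, _ = grid_search_scale(true_inc, dr, eso, 0.1 ** 2, var_eso_est, scales)
    print(f"best covariance scale: {best:.1f}")
```

Because the ESO variance is underestimated by a factor of 50, the selected scale should land near 50, where the fusion weights match the true noise ratio; the error curve is fairly flat near the optimum, so neighboring grid values perform almost as well.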
V Experiment on Visual Odometry
This supplementary experiment shows that our method is also effective for odometry of modalities other than LiDAR. An overview of the processing flow for fusing visual odometry is given in Fig. 22. As in the LiDAR odometry experiment, dead reckoning (DR) serves as the reference PSO module. For the target visual odometry, three representative algorithms are selected for error model learning: LIBVISO (feature-based), DSO (direct), and PL-SVO (a semidirect variant of SVO). Because no error model is available for LIBVISO or DSO, the comparison with a conventional error model can only be performed on PL-SVO. Owing to sensor equipment limitations, all of these visual odometries run in monocular mode in our experiments.
V-A State Transition Model and Network Design
Here, we use the same state definition as in the LiDAR odometry experiments, and the 6DoF results of visual odometry are projected to 3DoF in the same coordinate frame as the GPS/IMU. However, because it is difficult for monocular visual odometry to output reliable scale information, we customize our state transition model (Eq. 7) as follows:
and the added function computes the translation scale of the relative movement. Because the resulting state update is nonlinear, we track it with an extended information filter, and Steps 4-5 of Algorithm 1 are modified as
where , .
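The customized Eq. 7 is not reproduced in this section, so the sketch below only illustrates a generic extended information filter prediction step (Thrun et al., Probabilistic Robotics, Table 3.6) for a planar pose (x, y, yaw). As stated assumptions, the visual odometry's relative motion (dx, dy, dθ) is expressed in the vehicle frame, its translation is multiplied by a scale factor s supplied from elsewhere (a stand-in for the paper's scale function), and R is the motion noise covariance. The function names are hypothetical.

```python
import numpy as np

def motion(mu, u, s):
    """Apply body-frame relative motion u = (dx, dy, dtheta), translation scaled by s."""
    x, y, th = mu
    dx, dy, dth = u
    c, si = np.cos(th), np.sin(th)
    return np.array([x + s * (dx * c - dy * si),
                     y + s * (dx * si + dy * c),
                     th + dth])

def motion_jacobian(mu, u, s):
    """Jacobian G of motion() with respect to the previous pose (x, y, theta)."""
    th = mu[2]
    dx, dy, _ = u
    c, si = np.cos(th), np.sin(th)
    return np.array([[1.0, 0.0, -s * (dx * si + dy * c)],
                     [0.0, 1.0,  s * (dx * c - dy * si)],
                     [0.0, 0.0, 1.0]])

def eif_predict(xi, omega, u, s, R):
    """EIF prediction: recover the mean, linearize, and return (xi_bar, omega_bar)."""
    mu = np.linalg.solve(omega, xi)                    # mean = Omega^-1 xi
    G = motion_jacobian(mu, u, s)
    omega_bar = np.linalg.inv(G @ np.linalg.inv(omega) @ G.T + R)
    xi_bar = omega_bar @ motion(mu, u, s)
    return xi_bar, omega_bar
```

Here G is taken with respect to the previous pose only; if the scale were itself part of the filter state, G would gain an extra column.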
For visual odometry, the real-time images can serve as scene information, so a network architecture similar to that shown in Fig. 5 is used in this experiment. For training efficiency, we resize the grayscale images to the size of (pixels), and the network structural parameters that depend on the input image size are modified accordingly.
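How the network output is turned into a covariance is not shown in this section. One common way to guarantee a valid (symmetric positive definite) 3×3 covariance from unconstrained CNN outputs is a Cholesky parameterization, sketched below in NumPy as an assumption rather than the paper's design: three outputs become the positive diagonal of a lower-triangular factor L via softplus, three fill its strictly lower part, and Σ = LLᵀ + εI. The function name and ε are hypothetical.

```python
import numpy as np

def softplus(v):
    """Numerically stable log(1 + exp(v)); always positive."""
    return np.logaddexp(0.0, v)

def covariance_from_outputs(raw, eps=1e-6):
    """Map 6 unconstrained outputs to a 3x3 SPD covariance for (x, y, yaw).

    raw[:3] -> positive diagonal of a lower-triangular factor L (via softplus),
    raw[3:] -> strictly lower entries of L.  Sigma = L L^T + eps * I is SPD.
    """
    raw = np.asarray(raw, dtype=float)
    L = np.zeros((3, 3))
    L[np.diag_indices(3)] = softplus(raw[:3])
    L[np.tril_indices(3, k=-1)] = raw[3:]
    return L @ L.T + eps * np.eye(3)
```

In a deep learning framework the same mapping would sit after the final layer, so gradients of the fusion loss flow through L into the network.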
V-B Real Data Experiment
This dataset was also collected using the platform shown in Fig. 13, with the following sensors: 1) a monocular camera mounted above the windshield for visual odometry; 2) a wheel encoder and a lower-precision IMU for dead reckoning; and 3) a highly accurate GPS/IMU suite that provides ground truth locations of the vehicle for model training and localization evaluation.
Given the good performance in the LiDAR odometry experiments, we evaluate visual odometry only in unexperienced environments, the more challenging setting, to verify the extensibility of our method. As before, a large-scale dataset is collected in several different regions, with a total mileage of approximately 10 km. The training data account for 60% of the dataset and have almost no scene overlap with the remaining 40% used for testing.
To compare trajectories from different methods synchronously, we keep DSO's own trigger behavior and align the trigger times of the other visual odometries, LIBVISO and PL-SVO, with those of DSO. In training, we use a constant step, and the hyperparameter of the loss function is set to 100.0 to balance the errors in distance and angle.
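The role of the distance/angle weighting can be illustrated with a hypothetical form of the loss. Eq. 20 is not reproduced in this section, so we assume here that it sums the squared position error and β times the squared, wrapped yaw error over ground-truth poses; `pose_loss` is an illustrative name, and β = 100.0 matches the value stated above.

```python
import numpy as np

def pose_loss(pred, gt, beta=100.0):
    """Hypothetical loss: squared position error + beta * squared wrapped yaw error."""
    pred, gt = np.atleast_2d(pred), np.atleast_2d(gt)
    pos_sq = np.sum((pred[:, :2] - gt[:, :2]) ** 2, axis=1)
    # Wrap the yaw difference to [-pi, pi) so that e.g. +179 deg vs -179 deg counts as 2 deg.
    dyaw = (pred[:, 2] - gt[:, 2] + np.pi) % (2.0 * np.pi) - np.pi
    return float(np.mean(pos_sq + beta * dyaw ** 2))
```

Under this assumed form, β = 100 makes a yaw error of 0.1 rad (about 5.7°) cost as much as a 1 m position error.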
The localization accuracy of the proposed method is compared with that of dead reckoning, DSO, LIBVISO, PL-SVO, and the conventional fusion method using the covariance estimation of PL-SVO from the authors’ open-source code.
In contrast, the conventional PL-SVO error model performs poorly. In addition to the reasons analyzed in subsection II-D, another important reason lies in its derivation. PL-SVO optimizes the SE(3) pose using a left multiplicative perturbation model on the Lie group, so its covariance, which is defined on the perturbation (tangent) space, must be mapped to SE(3) and SE(2) before it can be used in our state transition model (Eq. 22). This process requires linearizing several nonlinear mappings, each of which introduces approximation error and makes this error model less accurate.
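The cost of linearizing a nonlinear covariance mapping can be seen in a small experiment. The sketch below is our own illustration, not PL-SVO's derivation: it pushes a Gaussian in se(2) tangent coordinates (vx, vy, ω) through the SE(2) exponential map, propagates the covariance to first order as JΣJᵀ with a numerical Jacobian, and compares the result with a Monte Carlo estimate. With small rotational uncertainty the two agree; with large rotational uncertainty they diverge noticeably.

```python
import numpy as np

def se2_exp(v):
    """Exponential map of se(2): tangent vector (vx, vy, w) -> pose (x, y, theta)."""
    vx, vy, w = v
    if abs(w) < 1e-9:                      # pure-translation limit
        return np.array([vx, vy, w])
    s, c = np.sin(w), np.cos(w)
    return np.array([(s * vx - (1.0 - c) * vy) / w,
                     ((1.0 - c) * vx + s * vy) / w,
                     w])

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2.0 * eps)
                            for e in np.eye(x.size)])

def linearized_cov(f, mean, cov):
    """First-order propagation: Sigma_y ~= J Sigma_x J^T."""
    J = numerical_jacobian(f, mean)
    return J @ cov @ J.T

def monte_carlo_cov(f, mean, cov, n=20000, seed=0):
    """Reference covariance: sample the input Gaussian and map every sample."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, cov, size=n)
    return np.cov(np.array([f(p) for p in samples]).T)

if __name__ == "__main__":
    mean = np.array([2.0, 0.0, 0.3])
    for var_w in (1e-4, 0.25):             # small vs. large rotational variance
        cov = np.diag([0.01, 0.01, var_w])
        lin = linearized_cov(se2_exp, mean, cov)
        mc = monte_carlo_cov(se2_exp, mean, cov)
        print(f"var_w={var_w}: var_x linearized={lin[0, 0]:.4f}, Monte Carlo={mc[0, 0]:.4f}")
```

In the large-variance case the first-order estimate of the x variance falls well below the sampled one, because the curvature of the map in ω is ignored; chaining several such linearizations compounds the error.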
In this research, a scene-aware error model is designed for LiDAR/visual odometry, and a localization fusion framework is developed to fuse the results using such an error model. Moreover, an end-to-end learning method is devised to train the error model in the proposed localization fusion framework.
We thoroughly evaluate the proposed method on simulation data to verify its adaptability in various simple but typical scenes, and on real data to examine its effectiveness in real-world situations. The experimental results demonstrate that the proposed method learns the CNN-based error model effectively and that the localization accuracy based on such models is superior to that of the traditional fusion methods.
Future work will focus on the following limitations of the proposed method.
1) Gradient vanishing. This is a general problem in training RNNs (recurrent neural networks), and the error model learning process of our method resembles that of a typical RNN. When training on a long trajectory with many iterations, we expect the error information to be propagated and amplified through successive iterations, but it is sometimes overwhelmed instead.
2) Optimization of the training trigger strategy. In our experiments, every backpropagation is performed after forward prediction over a fixed number of time steps, which is convenient for offline batch training. However, this scheme does not fit Eq. 19 properly and is difficult to extend efficiently to online learning.
3) Hyperparameter setting. The hyperparameter in the loss function (Eq. 20) is another important factor in training performance. The value used in the experiments above was selected manually after many attempts, and this troublesome but necessary procedure must be repeated for each new dataset.
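Limitation 1) follows the standard RNN argument (Pascanu et al.): the gradient of a late state with respect to an early one is a product of per-step Jacobians, so its norm tends to shrink or grow geometrically with trajectory length. The toy sketch below uses hypothetical scalar-times-identity Jacobians rather than the actual filter Jacobians, purely to make this concrete.

```python
import numpy as np

def chained_gain(jacobians):
    """Spectral norm of J_T @ ... @ J_1, the sensitivity of the last state to the first."""
    P = np.eye(jacobians[0].shape[0])
    for J in jacobians:
        P = J @ P
    return np.linalg.norm(P, 2)

if __name__ == "__main__":
    for rho in (0.9, 1.0, 1.1):            # per-step contraction / identity / expansion
        gain = chained_gain([rho * np.eye(3)] * 50)
        print(f"per-step factor {rho}: gain after 50 steps = {gain:.3g}")
```

A per-step factor of 0.9 leaves a gain of 0.9^50 ≈ 0.005 after 50 steps, whereas 1.1 gives 1.1^50 ≈ 117, which is why long training trajectories can either wash out or blow up the error signal.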
Appendix A The Original Information Filter
Assume that the state transition and measurement probabilities are governed by the following linear Gaussian equations:
where and are the control and measurement at time , and denote their Gaussian noise with covariances and , respectively.
Probabilistic estimation of the state with a Gaussian filter finds a mean pose and a covariance matrix. An information filter instead represents the Gaussian distribution in canonical form, so the problem becomes estimating an information vector and an information matrix, as described in Algorithm A-1.
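Algorithm A-1 itself is not reproduced in this section. Assuming it follows the standard linear information filter of Thrun et al. (Probabilistic Robotics, Table 3.2), with the model x_t = A_t x_{t-1} + B_t u_t + ε_t and z_t = C_t x_t + δ_t, where ε_t ~ N(0, R_t) and δ_t ~ N(0, Q_t), one step can be sketched in NumPy as follows. The symbols A, B, C, R, Q are that textbook's notation and may differ from the paper's.

```python
import numpy as np

def information_filter(xi, omega, u, z, A, B, C, R, Q):
    """One predict/update step of the linear information filter (Thrun et al., Table 3.2)."""
    # Prediction: convert to moment form internally, then back to canonical form.
    sigma = np.linalg.inv(omega)
    omega_bar = np.linalg.inv(A @ sigma @ A.T + R)
    xi_bar = omega_bar @ (A @ sigma @ xi + B @ u)
    # Measurement update: information contributions simply add.
    Q_inv = np.linalg.inv(Q)
    omega_new = C.T @ Q_inv @ C + omega_bar
    xi_new = C.T @ Q_inv @ z + xi_bar
    return xi_new, omega_new
```

The measurement update is a plain addition in information form, which makes this representation convenient for fusing several sources; the prediction step, by contrast, requires matrix inversions. The result is equivalent to one Kalman filter step in moment form.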
Appendix B Derivation of the RPF Algorithm
- “amcl” is a probabilistic localization system for a robot moving in 2D, http://wiki.ros.org/amcl
- M. H. Hebert, C. E. Thorpe, and A. Stentz, Intelligent unmanned ground vehicles: autonomous navigation research at Carnegie Mellon. Springer Science & Business Media, 2012, vol. 388.
- J. L. Jones, “Robots at the tipping point: the road to irobot roomba,” IEEE Robotics & Automation Magazine, vol. 13, no. 1, pp. 76–78, 2006.
- S. Thrun, M. Montemerlo, H. Dahlkamp, D. Stavens, A. Aron, J. Diebel, P. Fong, J. Gale, M. Halpenny, G. Hoffmann et al., “Stanley: The robot that won the darpa grand challenge,” Journal of Field Robotics, vol. 23, no. 9, pp. 661–692, 2006.
- G. Nelson, A. Saunders, and R. Playter, “The petman and atlas robots at boston dynamics,” Humanoid Robotics: A Reference, pp. 169–186, 2019.
- J. Georgy, T. Karamat, U. Iqbal, and A. Noureldin, “Enhanced mems-imu/odometer/gps integration using mixture particle filter,” GPS solutions, vol. 15, no. 3, pp. 239–252, 2011.
- A. N. Ndjeng, D. Gruyer, S. Glaser, and A. Lambert, “Low cost imu–odometer–gps ego localization for unusual maneuvers,” Information Fusion, vol. 12, no. 4, pp. 264–274, 2011.
- P. Biber and W. Straßer, “The normal distributions transform: A new approach to laser scan matching,” in Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), vol. 3. IEEE, 2003, pp. 2743–2748.
- E. B. Olson, “Real-time correlative scan matching,” in 2009 IEEE International Conference on Robotics and Automation. IEEE, 2009, pp. 4387–4393.
- J. Zhang and S. Singh, “Loam: Lidar odometry and mapping in real-time.” in Robotics: Science and Systems, vol. 2, 2014, p. 9.
- B. Kitt, A. Geiger, and H. Lategahn, “Visual odometry based on stereo image sequences with ransac-based outlier rejection scheme,” in Intelligent Vehicles Symposium (IV), 2010.
- C. Forster, M. Pizzoli, and D. Scaramuzza, “Svo: Fast semi-direct monocular visual odometry,” in 2014 IEEE international conference on robotics and automation (ICRA). IEEE, 2014, pp. 15–22.
- J. Engel, V. Koltun, and D. Cremers, “Direct sparse odometry,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 3, pp. 611–625, 2017.
- H.-P. Chiu, X. S. Zhou, L. Carlone, F. Dellaert, S. Samarasekera, and R. Kumar, “Constrained optimal selection for multi-sensor robot navigation using plug-and-play factor graphs,” in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 663–670.
- S. R. Sukumar, H. Bozdogan, D. L. Page, A. F. Koschan, and M. A. Abidi, “Sensor selection using information complexity for multi-sensor mobile robot localization,” in Proceedings 2007 IEEE International Conference on Robotics and Automation. IEEE, 2007, pp. 4158–4163.
- G. Bresson, M.-C. Rahal, D. Gruyer, M. Revilloud, and Z. Alsayed, “A cooperative fusion architecture for robust localization: Application to autonomous driving,” in 2016 IEEE 19th international conference on intelligent transportation systems (ITSC). IEEE, 2016, pp. 859–866.
- S. Thrun, W. Burgard, and D. Fox, Probabilistic robotics. MIT press, 2005.
- S. Shen, Y. Mulgaonkar, N. Michael, and V. Kumar, “Multi-sensor fusion for robust autonomous flight in indoor and outdoor environments with a rotorcraft mav,” in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 4974–4981.
- D. P. Koch, T. W. McLain, and K. M. Brink, “Multi-sensor robust relative estimation framework for gps-denied multirotor aircraft,” in 2016 International Conference on Unmanned Aircraft Systems (ICUAS). IEEE, 2016, pp. 589–597.
- O. Bengtsson and A.-J. Baerveldt, “Robot localization based on scan-matching – estimating the covariance matrix for the idc algorithm,” Robotics and Autonomous Systems, vol. 44, no. 1, pp. 29–40, 2003.
- S. Bonnabel, M. Barczyk, and F. Goulette, “On the covariance of icp-based scan-matching techniques,” in 2016 American Control Conference (ACC). IEEE, 2016, pp. 5498–5503.
- M. Bloesch, S. Omari, M. Hutter, and R. Siegwart, “Robust visual inertial odometry using a direct ekf-based approach,” in 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 2015, pp. 298–304.
- T. N. N. Hossein, S. Mita, and H. Long, “Multi-sensor data fusion for autonomous vehicle navigation through adaptive particle filter,” in 2010 IEEE Intelligent Vehicles Symposium. IEEE, 2010, pp. 752–759.
- D. Gulati, F. Zhang, D. Clarke, and A. Knoll, “Vehicle infrastructure cooperative localization using factor graphs,” in 2016 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2016, pp. 1085–1090.
- G. Hemann, S. Singh, and M. Kaess, “Long-range gps-denied aerial inertial navigation with lidar localization,” in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 1659–1666.
- S. Lynen, M. W. Achtelik, S. Weiss, M. Chli, and R. Siegwart, “A robust and modular multi-sensor fusion approach applied to mav navigation,” in 2013 IEEE/RSJ international conference on intelligent robots and systems. IEEE, 2013, pp. 3923–3929.
- A. Censi, “An icp variant using a point-to-line metric,” in 2008 IEEE International Conference on Robotics and Automation, May 2008, pp. 19–25.
- R. Gomez-Ojeda and J. Gonzalez-Jimenez, “Robust stereo visual odometry through a probabilistic combination of points and line segments,” in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 2521–2526.
- M. Bosse and R. Zlot, “Continuous 3d scan-matching with a spinning 2d laser,” in 2009 IEEE International Conference on Robotics and Automation. IEEE, 2009, pp. 4312–4319.
- L. Armesto, J. Minguez, and L. Montesano, “A generalization of the metric-based iterative closest point technique for 3d scan matching,” in 2010 IEEE International Conference on Robotics and Automation. IEEE, 2010, pp. 1367–1372.
- M. Velas, M. Spanel, and A. Herout, “Collar line segments for fast odometry estimation from velodyne point clouds,” in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 4486–4495.
- D. Wang, J. Xue, Z. Tao, Y. Zhong, D. Cui, S. Du, and N. Zheng, “Accurate mix-norm-based scan matching,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 1665–1671.
- M. Magnusson, A. Lilienthal, and T. Duckett, “Scan registration for autonomous mining vehicles using 3d-ndt,” Journal of Field Robotics, vol. 24, no. 10, pp. 803–827, 2007.
- E. Olson, “M3rsm: Many-to-many multi-resolution scan matching,” in 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2015, pp. 5815–5821.
- M. Jaimez, J. G. Monroy, and J. Gonzalez-Jimenez, “Planar odometry from a radial laser scanner. a range flow-based approach,” in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 4479–4485.
- F. T. Ramos, D. Fox, and H. F. Durrant-Whyte, “Crf-matching: Conditional random fields for feature-based scan matching.” in Robotics: Science and Systems, 2007.
- A. Diosi and L. Kleeman, “Fast laser scan matching using polar coordinates,” The International Journal of Robotics Research, vol. 26, no. 10, pp. 1125–1153, 2007.
- A. Censi and S. Carpin, “Hsm3d: feature-less global 6dof scan-matching in the hough/radon domain,” in 2009 IEEE International Conference on Robotics and Automation. IEEE, 2009, pp. 3899–3906.
- A. Howard, “Real-time stereo visual odometry for autonomous ground vehicles,” in 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2008, pp. 3946–3952.
- T. Mouats, N. Aouf, A. D. Sappa, C. Aguilera, and R. Toledo, “Multispectral stereo odometry,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 3, pp. 1210–1224, 2014.
- F. Zhang, H. Stähle, A. Gaschler, C. Buckl, and A. Knoll, “Single camera visual odometry based on random finite set statistics,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 559–566.
- J. Engel, J. Sturm, and D. Cremers, “Semi-dense visual odometry for a monocular camera,” in The IEEE International Conference on Computer Vision (ICCV), December 2013.
- C. Kerl, J. Sturm, and D. Cremers, “Robust odometry estimation for rgb-d cameras,” in 2013 IEEE International Conference on Robotics and Automation. IEEE, 2013, pp. 3748–3754.
- R. Wang, M. Schworer, and D. Cremers, “Stereo dso: Large-scale direct sparse visual odometry with stereo cameras,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3903–3911.
- S.-J. Li, B. Ren, Y. Liu, M.-M. Cheng, D. Frost, and V. A. Prisacariu, “Direct line guidance odometry,” in 2018 IEEE international conference on Robotics and automation (ICRA). IEEE, 2018, pp. 1–7.
- S. Wang, R. Clark, H. Wen, and N. Trigoni, “Deepvo: Towards end-to-end visual odometry with deep recurrent convolutional neural networks,” in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 2043–2050.
- P. Tanskanen, T. Naegeli, M. Pollefeys, and O. Hilliges, “Semi-direct ekf-based monocular visual-inertial odometry,” in 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 2015, pp. 6073–6078.
- V. Usenko, J. Engel, J. Stückler, and D. Cremers, “Direct visual-inertial odometry with stereo cameras,” in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 1885–1892.
- T. Qin, P. Li, and S. Shen, “Vins-mono: A robust and versatile monocular visual-inertial state estimator,” IEEE Transactions on Robotics, vol. 34, no. 4, pp. 1004–1020, 2018.
- J. Zhang and S. Singh, “Visual-lidar odometry and mapping: Low-drift, robust, and fast,” in 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2015, pp. 2174–2181.
- M. Barjenbruch, D. Kellner, J. Klappstein, J. Dickmann, and K. Dietmayer, “Joint spatial-and doppler-based ego-motion estimation for automotive radars,” in 2015 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2015, pp. 839–844.
- P. J. Besl and N. D. McKay, “Method for registration of 3-d shapes,” in Sensor Fusion IV: Control Paradigms and Data Structures, vol. 1611. International Society for Optics and Photonics, 1992, pp. 586–607.
- F. Lu and E. Milios, “Robot pose estimation in unknown environments by matching 2d range scans,” Journal of Intelligent and Robotic systems, vol. 18, no. 3, pp. 249–275, 1997.
- A. Censi, “An accurate closed-form estimate of icp’s covariance,” in Proceedings 2007 IEEE international conference on robotics and automation. IEEE, 2007, pp. 3167–3172.
- Y. Aksoy and A. A. Alatan, “Uncertainty modeling for efficient visual odometry via inertial sensors on mobile devices,” in 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014, pp. 3397–3401.
- M. Brenna, “Scan matching covariance estimation and slam: models and solutions for the scanslam algorithm,” Ph.D. dissertation, Politecnico di Milano, 2009.
- O. Bengtsson and A.-J. Baerveldt, “Localization in changing environments-estimation of a covariance matrix for the idc algorithm,” in Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No. 01CH37180), vol. 4. IEEE, 2001, pp. 1931–1937.
- G. Kantor and S. Singh, “Preliminary results in range-only localization and mapping,” in Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No. 02CH37292), vol. 2. IEEE, 2002, pp. 1818–1823.
- M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “Ros: an open-source robot operating system,” in ICRA workshop on open source software, vol. 3, no. 3.2. Kobe, Japan, 2009, p. 5.
- A. G. Buch, D. Kraft et al., “Prediction of icp pose uncertainties using monte carlo simulation with synthetic depth images,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 4640–4647.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
- S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99.
- A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, “Cnn features off-the-shelf: an astounding baseline for recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2014, pp. 806–813.
- S. Gidaris and N. Komodakis, “Object detection via a multi-region and semantic segmentation-aware cnn model,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1134–1142.
- N. P. Koenig and A. Howard, “Design and use paradigms for gazebo, an open-source multi-robot simulator.” in IROS, vol. 4. Citeseer, 2004, pp. 2149–2154.
- H. Strasdat, J. Montiel, and A. J. Davison, “Scale drift-aware large scale monocular slam,” Robotics: Science and Systems VI, vol. 2, no. 3, p. 7, 2010.
- T. Mikolov, M. Karafiát, L. Burget, J. Černockỳ, and S. Khudanpur, “Recurrent neural network based language model,” in Eleventh annual conference of the international speech communication association, 2010.
- R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” in International conference on machine learning, 2013, pp. 1310–1318.