A fast, complete, point cloud based loop closure for LiDAR odometry and mapping

Jiarong Lin and Fu Zhang. J. Lin and F. Zhang are with the Department of Mechanical Engineering, The University of Hong Kong, Hong Kong SAR, China. {jiarong.lin, fuzhang}@hku.hk
Abstract

This paper presents a loop closure method to correct the long-term drift in LiDAR odometry and mapping (LOAM). Our proposed method computes the 2D histogram of each keyframe, a local map patch, and uses the normalized cross-correlation of the 2D histograms as the similarity metric between the current keyframe and those in the map. We show that this method is fast, invariant to rotation, and produces reliable and accurate loop detection. The proposed method is implemented with careful engineering and integrated into the LOAM algorithm, forming a complete and practical system ready to use. To benefit the community by serving as a benchmark for loop closure, the entire system is made open source on GitHub (https://github.com/hku-mars/loam_livox).

I Introduction

With the capability of estimating the 6 degrees of freedom (DOF) state and meanwhile building high-precision maps of the surrounding environment, SLAM methods using LiDAR sensors have been regarded as an accurate and reliable way of robotic perception. In the past years, LiDAR odometry and mapping (LOAM) has been successfully applied in the field of robotics, such as self-driving cars [13], autonomous drones [4, 9], field robot surveying and mapping [18, 24], etc. In this paper, we focus on the problem of developing a fast and complete loop closure system for laser-based SLAM systems.

Loop closure is an essential part of a SLAM system; it corrects the long-term accumulating drift caused by local feature matching. In a common loop closure paradigm, the successful detection of loops plays an essential role. Loop detection is the ability to recognize previously visited places by measuring the similarity between any two places. For visual SLAM methods, loop closure benefits considerably from the wide availability of computer vision algorithms. For example, by utilizing the bag-of-words model [8, 7] and clustering feature descriptors into words, the similarity between observations can be computed in the word space. This kind of method has been used in most state-of-the-art visual SLAM systems (e.g., [17, 22]) and has achieved great success in the past years.

Unlike visual SLAM, research on laser-based loop closure is relatively rare, and it is surprisingly hard to find any open-sourced solution addressing this problem. We attribute this to two main reasons. Firstly, compared to cameras, LiDAR sensors are extremely expensive, which prevents their wider use; in most robotic navigation and perception tasks, LiDAR is not always the first choice. Secondly, the problem of place recognition on point clouds is very challenging: unlike 2D images containing rich information such as textures and colors, the only information available in a point cloud is the geometric shape in 3D space.

In this paper, we develop a fast, complete loop closure system for LiDAR odometry and mapping (LOAM), consisting of fast loop detection, map alignment, and pose graph optimization. We integrate the proposed loop closure method into a LOAM algorithm with the Livox MID-40 (https://www.livoxtech.com/mid-40-and-mid-100), a high-performance, low-cost LiDAR sensor that is easily available. Some of the results we obtain are shown in Fig. 1 and Fig. 2. To contribute to the development of laser-based SLAM methods, we open source all the datasets and code on GitHub.

Fig. 1: An example of loop closure around the Main Building of the University of Hong Kong (HKU). (a) The RGB image of the mapped area; (b) the red and white points are the map before and after loop closure, respectively; (c) the red dashed line indicates the detected loop, and the green and blue solid lines are the trajectories before and after loop closure, respectively.
Fig. 2: The large scale loop closure of the Hong Kong University of Science and Technology (HKUST) campus. We align the point cloud map after loop closure with the satellite image. Our video is available at https://youtu.be/fOSTJ_yLhFM .
Fig. 3: The overview of our system.

II Related Work

Loop closure is widely used in visual SLAM to correct its long-term drift. The commonly used pipeline mainly consists of three steps. First, local features of a 2D image are extracted using handcrafted descriptors such as the Scale-Invariant Feature Transform (SIFT) [15], Binary Robust Independent Elementary Features (BRIEF) [5], Oriented FAST and Rotated BRIEF (ORB) [23], etc. Then, a bag-of-words model [8, 7] is used to cluster these features and build a dictionary to achieve loop detection. Finally, with the detected loop, a pose graph optimization is formulated and solved to update the historic poses in the map.

Unlike visual SLAM, loop detection on point cloud data is still an open, challenging problem in laser-based SLAM. Bosse et al. [3] achieve place recognition by directly extracting keypoints from the 3D point cloud and describing them with handcrafted 3D Gestalt descriptors. The keypoints then vote for their nearest points, and the scores are used to determine whether a loop is detected. A similar method is also used in [10]. Magnusson et al. [16] describe the appearance of a 3D point cloud by utilizing the normal distributions transform (NDT) and compute the similarity of two scans from the histogram of the NDT descriptors.

Besides handcrafted descriptors, learning-based methods have also been used for loop detection (or place recognition) in recent years. For example, SegMatch, proposed by Dubé et al. [6], achieves place recognition by matching semantic features like buildings, trees, vehicles, etc. Angelina et al. [1] realize place recognition by extracting a global descriptor from an end-to-end network, which is trained by combining PointNet [21] and NetVLAD [2]. Learning-based methods are usually computationally expensive, and their performance greatly relies on the dataset used in the training process.

Despite the works reviewed above, to the best of our knowledge there is no open-sourced code or dataset that benchmarks the problem of loop closure for LOAM, which makes it difficult for readers to reproduce these works. To this end, we propose a complete loop closure system. The loop detection of our work is mainly inspired by the method of [16], with some adjustments made for our scenarios. Due to the small FoV and special scanning pattern of our considered LiDAR sensor, we perform loop detection on an accumulated number of scans (i.e., a keyframe). To summarize, our contributions are threefold: (1) we develop a fast loop detection method to quickly measure the similarity of two keyframes; (2) we integrate our loop closure system, consisting of loop detection, map alignment, and pose optimization, into a LiDAR odometry and mapping algorithm (LOAM) [14], leading to a complete and practical system ready to use; (3) we provide an available solution and paradigm for point cloud based loop closure by open-sourcing our system and datasets on GitHub.

III System Overview

The workflow of our system is shown in Fig. 3. Each new frame (i.e., scan) input from the LiDAR is registered to the global map by the LOAM algorithm [14]. Once a specified number of frames have been received (e.g., 100 frames), a keyframe is created, which forms a small local map patch. The raw points corresponding to the new keyframe, which were registered to the cells of the global map (Section IV) by LOAM, are retrieved to compute its 2D histogram (see Section V). The computed 2D histogram is compared against a database containing the 2D histograms of all past keyframes in the global map to detect a possible loop (see Section VI). Meanwhile, the new keyframe's 2D histogram is added to the database for use by subsequent keyframes. Once a loop is detected, the keyframe is aligned with the global map, and a pose graph optimization is performed to correct the drift in the global map.

IV Map and Cell

In this section, we introduce two key elements of our algorithm: the map and the cell. For convenience, we use $\mathbf{M}$ and $\mathbf{C}$ to denote the map and the cell, respectively.

IV-A Cell

A cell is a small cube of fixed size (i.e., $d_x$, $d_y$, and $d_z$ in the $x$, $y$, and $z$ directions) obtained by partitioning the 3D space. A cell is represented by its center location $\mathbf{c}$ and is created by the first point $\mathbf{p} = (p_x, p_y, p_z)^T$ that falls in it:

$$\mathbf{c} = \left( \left\lfloor \frac{p_x}{d_x} \right\rfloor d_x + \frac{d_x}{2},\; \left\lfloor \frac{p_y}{d_y} \right\rfloor d_y + \frac{d_y}{2},\; \left\lfloor \frac{p_z}{d_z} \right\rfloor d_z + \frac{d_z}{2} \right)^T \tag{1}$$

Let $n$ denote the number of points located in a cell $\mathbf{C}$; the mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$ of this cell are:

$$\boldsymbol{\mu} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{p}_i \tag{2}$$
$$\boldsymbol{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} \left( \mathbf{p}_i - \boldsymbol{\mu} \right) \left( \mathbf{p}_i - \boldsymbol{\mu} \right)^T \tag{3}$$

Notice that a cell is a fixed partition of the 3D space and is constantly populated with new points. To speed up the computation of the mean (2) and covariance (3), we derive their recursive forms as follows. Let $\mathbf{p}$ denote the new point and $n$ the number of existing points in a cell with mean $\boldsymbol{\mu}_n$ and covariance $\boldsymbol{\Sigma}_n$. The mean and covariance of all the points in the cell are:

$$\boldsymbol{\mu}_{n+1} = \frac{n \boldsymbol{\mu}_n + \mathbf{p}}{n+1} \tag{4}$$
$$\boldsymbol{\Sigma}_{n+1} = \frac{n}{n+1} \boldsymbol{\Sigma}_n + \frac{n}{(n+1)^2} \left( \mathbf{p} - \boldsymbol{\mu}_n \right) \left( \mathbf{p} - \boldsymbol{\mu}_n \right)^T \tag{5}$$

Therefore, a cell is composed of its static center $\mathbf{c}$, the dynamically updated mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$, and the collection of raw points $\mathcal{P}$: $\mathbf{C} = \{\mathbf{c}, \boldsymbol{\mu}, \boldsymbol{\Sigma}, \mathcal{P}\}$.
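To make the recursive update concrete, below is a minimal Python sketch of a cell that maintains (4) and (5) incrementally; the class and field names are illustrative rather than taken from the released implementation.

```python
import numpy as np

class Cell:
    """Illustrative container mirroring C = {c, mu, Sigma, P}; names are ours."""

    def __init__(self, center):
        self.center = np.asarray(center, dtype=float)  # static center c
        self.n = 0                                     # number of points so far
        self.mu = np.zeros(3)                          # running mean, Eq. (4)
        self.sigma = np.zeros((3, 3))                  # running covariance, Eq. (5)
        self.points = []                               # raw point collection P

    def add_point(self, p):
        p = np.asarray(p, dtype=float)
        n = self.n
        d = (p - self.mu).reshape(3, 1)
        # Eq. (4): mu_{n+1} = (n * mu_n + p) / (n + 1)
        self.mu = (n * self.mu + p) / (n + 1)
        # Eq. (5): Sigma_{n+1} = n/(n+1) * Sigma_n + n/(n+1)^2 * d d^T
        self.sigma = n / (n + 1) * self.sigma + n / (n + 1) ** 2 * (d @ d.T)
        self.points.append(p)
        self.n += 1
```

Note that the same formulas also handle the first point correctly: with $n = 0$, (4) reduces to $\boldsymbol{\mu}_1 = \mathbf{p}$ and (5) to $\boldsymbol{\Sigma}_1 = \mathbf{0}$.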

IV-B Map

The map $\mathbf{M}$ is the collection of all raw points saved in cells. More specifically, $\mathbf{M}$ consists of a hash table $\mathcal{H}$ and a global octree $\mathcal{T}$. The hash table enables quickly finding a specific cell according to its center $\mathbf{c}$, while the octree enables finding all cells located within a given range. Both are of significant importance in speeding up map alignment.

For any newly added cell $\mathbf{C}$, we compute its hash index using the XOR operation over the hash indices of the individual components of its center: $h(\mathbf{c}) = h(c_x) \oplus h(c_y) \oplus h(c_z)$. The computed hash index is then added to the hash table of the map $\mathbf{M}$. Since a cell is a fixed partition of the 3D space, its center location is static, so existing entries in the hash table never need updating (though the hash table grows dynamically).

The newly added cell is also added to the octree according to its center location, similar to the OctoMap in [12]. Algorithm 1 illustrates the procedure of incrementally creating cells and maps from new frames.

Input: Points $\{\mathbf{p}_j\}$ of the $k$-th frame, current map $\mathbf{M}$, the pose $\mathbf{T}_k$ estimated by the LOAM algorithm
for each $\mathbf{p}_j$ do
       Transform $\mathbf{p}_j$ to the global frame by $\mathbf{T}_k$.
       Compute the cell center $\mathbf{c}$ from (1).
       Compute the hash index $h(\mathbf{c})$.
       if $h(\mathbf{c})$ is not in the hash table $\mathcal{H}$ then
              Create a new cell $\mathbf{C}$ with center $\mathbf{c}$.
              Insert $\mathbf{C}$ into the hash table $\mathcal{H}$ of map $\mathbf{M}$.
              Insert $\mathbf{C}$ into the octree $\mathcal{T}$ of map $\mathbf{M}$.
       Add $\mathbf{p}_j$ to $\mathbf{C}$.
       Update the mean of $\mathbf{C}$ using (4).
       Update the covariance of $\mathbf{C}$ using (5).
Algorithm 1: Registration of a new frame
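A compact Python sketch of Algorithm 1, assuming the Cell class sketched above, is given below. The cell size, the use of Python's built-in hash, and the helper names are our illustrative choices; the octree insertion is noted but omitted for brevity.

```python
import numpy as np

CELL_SIZE = 1.0  # illustrative cell edge length d; the real value is a tunable parameter

def cell_center(p, d=CELL_SIZE):
    # Eq. (1): center of the cube that contains point p (uniform size d here)
    return np.floor(np.asarray(p, dtype=float) / d) * d + d / 2

def cell_hash(center):
    # XOR of the hashes of the individual center coordinates (Sect. IV-B)
    cx, cy, cz = (round(float(v), 6) for v in center)
    return hash(cx) ^ hash(cy) ^ hash(cz)

def register_frame(points, hash_table, pose):
    """Algorithm 1: register the points of a new frame into the map."""
    R, t = pose  # rotation matrix and translation from the LOAM odometry
    for p in points:
        p_w = R @ np.asarray(p, dtype=float) + t   # transform to the global frame
        c = cell_center(p_w)
        key = cell_hash(c)
        if key not in hash_table:                  # create the cell on first hit
            hash_table[key] = Cell(c)
            # (the new cell would also be inserted into the global octree here)
        hash_table[key].add_point(p_w)             # updates mean (4) and covariance (5)
```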

V 2D Histogram with Rotation Invariance

The main idea of our fast loop detection is to use a 2D image-like histogram to roughly describe a keyframe. The 2D histogram describes the distribution of the Euler angles of the feature directions in a keyframe.

V-A The Feature Type and Direction in a Cell

As mentioned previously, each keyframe consists of a number of (e.g., 100) frames, and each frame (i.e., scan) is partitioned into cells. For each cell, we determine the shape formed by its points and the associated feature direction (denoted as $\boldsymbol{\omega}$). Similar to [16], we perform an eigenvalue decomposition of the covariance matrix $\boldsymbol{\Sigma}$ in (3):

$$\boldsymbol{\Sigma} = \mathbf{E} \boldsymbol{\Lambda} \mathbf{E}^T \tag{6}$$

where $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3)$ is a diagonal matrix with eigenvalues in descending order (i.e., $\lambda_1 \geq \lambda_2 \geq \lambda_3$) and $\mathbf{E}$ is the matrix of corresponding eigenvectors. In practice, we only consider cells with 5 or more points to increase robustness. Cells are then classified as follows (a minimal code sketch of this classification is given after the list below):

  • Cell of plane shape: If $\lambda_2$ is significantly larger than $\lambda_3$, we regard this cell as a plane shape and take the plane normal as the feature direction, i.e., $\boldsymbol{\omega} = \mathbf{e}_3$, where $\mathbf{e}_3$ is the third column of the matrix $\mathbf{E}$.

  • Cell of line shape: If the cell is not a plane and $\lambda_1$ is significantly larger than $\lambda_2$, we regard this cell as a line shape and take the line direction as the feature direction, i.e., $\boldsymbol{\omega} = \mathbf{e}_1$, the first column of $\mathbf{E}$.

  • Cell with no feature: A cell that is neither a line nor a plane shape is not considered.
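The following Python sketch illustrates this classification. The ratio thresholds standing in for "significantly larger" are illustrative values, not those of the released code.

```python
import numpy as np

# Illustrative thresholds for "significantly larger"; the released code may
# use different values.
PLANE_RATIO = 3.0
LINE_RATIO = 3.0

def classify_cell(sigma):
    """Return ('plane' | 'line' | None, feature direction) from a cell covariance, Eq. (6)."""
    # eigh returns eigenvalues in ascending order; flip to descending
    w, v = np.linalg.eigh(sigma)
    lam = w[::-1]                 # lambda_1 >= lambda_2 >= lambda_3
    e = v[:, ::-1]                # matching eigenvectors as columns of E
    if lam[1] > PLANE_RATIO * lam[2]:
        return "plane", e[:, 2]   # normal = eigenvector of the smallest eigenvalue
    if lam[0] > LINE_RATIO * lam[1]:
        return "line", e[:, 0]    # line direction = eigenvector of the largest eigenvalue
    return None, None
```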

Fig. 4: The Euler angles of a feature direction and their contribution to the 2D histogram. Each element of the 2D histogram is the number of feature directions whose pitch and yaw fall in the corresponding bin.

V-B Rotation Invariance

In order to make our feature descriptors invariant to arbitrary rotations of the keyframe, we rotate each feature direction by left-multiplying an additional rotation matrix $\mathbf{R}$, such that most of the feature directions lie along one coordinate axis and the secondary most along another. Since plane features are more reliable than line features (e.g., the edge of a plane feature may be treated as a line feature), we use the feature directions of the plane cells to determine the rotation matrix $\mathbf{R}$. Similar to the previous sections, we compute the covariance of all plane feature directions in a keyframe:

$$\boldsymbol{\Sigma}_{\omega} = \frac{1}{N_p} \sum_{i=1}^{N_p} \left( \boldsymbol{\omega}_i - \bar{\boldsymbol{\omega}} \right) \left( \boldsymbol{\omega}_i - \bar{\boldsymbol{\omega}} \right)^T \tag{7}$$

where $N_p$ is the number of plane cells, $\boldsymbol{\omega}_i$ denotes the feature direction (i.e., plane normal) of the $i$-th plane cell, and $\bar{\boldsymbol{\omega}}$ is the mean of all plane feature directions. Similarly, the eigenvalue decomposition of $\boldsymbol{\Sigma}_{\omega}$ is:

$$\boldsymbol{\Sigma}_{\omega} = \mathbf{E}_{\omega} \boldsymbol{\Lambda}_{\omega} \mathbf{E}_{\omega}^T \tag{8}$$

where $\boldsymbol{\Lambda}_{\omega}$ is a diagonal matrix with eigenvalues in descending order ($\lambda_1 \geq \lambda_2 \geq \lambda_3$) and $\mathbf{E}_{\omega}$ is the eigenvector matrix. Then, the rotation matrix $\mathbf{R}$ is determined as:

$$\mathbf{R} = \mathbf{E}_{\omega}^T \tag{9}$$

After computing the rotation matrix $\mathbf{R}$, we apply the rotation to all feature directions (both plane and line).
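A minimal sketch of this computation follows, assuming the covariance in (7) is centered on the mean direction and reading (9) as $\mathbf{R} = \mathbf{E}_{\omega}^T$; sign conventions may differ from the released code.

```python
import numpy as np

def rotation_from_plane_normals(normals):
    """Compute the aligning rotation R from all plane-cell normals, Eqs. (7)-(9).

    `normals` is an (N, 3) array of plane feature directions.
    """
    w = np.asarray(normals, dtype=float)
    mu = w.mean(axis=0)
    sigma = (w - mu).T @ (w - mu) / len(w)      # Eq. (7)
    vals, vecs = np.linalg.eigh(sigma)           # Eq. (8), ascending eigenvalues
    E = vecs[:, ::-1]                            # eigenvectors, descending order
    if np.linalg.det(E) < 0:                     # keep a proper rotation (det = +1)
        E[:, 2] *= -1
    return E.T                                   # Eq. (9): R = E^T
```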

Input: Current keyframe $\mathbf{K}$
Output: 2D histogram $\mathbf{H}_l$ of line cells; 2D histogram $\mathbf{H}_p$ of plane cells
Start: $\mathbf{H}_l \leftarrow \mathbf{0}$, $\mathbf{H}_p \leftarrow \mathbf{0}$.
Compute the rotation matrix $\mathbf{R}$ from Sect. V-B.
for each cell $\mathbf{C} \in \mathbf{K}$ do
       if $\mathbf{C}$ is a line shape then
              $\boldsymbol{\omega} \leftarrow \mathbf{R}\,\boldsymbol{\omega}$
              Compute the pitch $\theta$ and yaw $\phi$ angles of $\boldsymbol{\omega}$.
              $\mathbf{H}_l(\theta, \phi)$ += 1
       if $\mathbf{C}$ is a plane shape then
              $\boldsymbol{\omega} \leftarrow \mathbf{R}\,\boldsymbol{\omega}$
              Compute the pitch $\theta$ and yaw $\phi$ angles of $\boldsymbol{\omega}$.
              $\mathbf{H}_p(\theta, \phi)$ += 1
Gaussian blur $\mathbf{H}_l$ and $\mathbf{H}_p$.
Algorithm 2: Computing the 2D histograms of a keyframe
Fig. 5: A keyframe consists of a number of (e.g., 100) frames (not shown), which in turn contain many cells. Each keyframe has two 2D histograms, one for line cells and the other for plane cells.

V-C 2D Histogram of a Keyframe

(a) The visualization of two keyframes (frame A and frame B), their 2D histograms, and the contained cells. Fig. A1 is the RGB image of the first scene. Fig. A2 and Fig. A3 are the side view and bird's-eye view of the keyframe, respectively. In Fig. A2 and Fig. A3, the red points denote the raw point cloud in the keyframe, the white cubes denote the cells, the blue lines are the feature directions of plane cells, and the yellow lines are the feature directions of line cells. Fig. A4 and Fig. A5 are the 2D histograms (pixels colored by their values) of plane and line features, respectively. The arrangement of Fig. B1-B5 is the same as Fig. A1-A5.
(b) The similarity in plane features between frame A and itself ("P_A2A") and between frame A and frame B ("P_A2B"), and in line features between frame A and itself ("L_A2A") and between frame A and frame B ("L_A2B"). The polar distance is the similarity level, while the polar angle is the magnitude of the random rotations.
Fig. 6: The visualization of keyframe, cells, and 2D histograms (a) and the evaluation of rotation invariance (b).

With the rotation-invariant feature directions of all cells in a keyframe, we compute the 2D histogram as follows:

Firstly, since a feature direction $\boldsymbol{\omega} = (\omega_x, \omega_y, \omega_z)^T$ is only defined up to sign, we choose between $\boldsymbol{\omega}$ and $-\boldsymbol{\omega}$ the direction with positive components. Then, the Euler angles of the feature direction are computed (see Fig. 4):

$$\theta = \arctan\left( \frac{\omega_z}{\sqrt{\omega_x^2 + \omega_y^2}} \right) \tag{10}$$
$$\phi = \arctan\left( \frac{\omega_y}{\omega_x} \right) \tag{11}$$

The 2D histogram we use is a matrix with a fixed angular resolution on both the pitch and yaw angles; each element of this matrix counts the number of line/plane cells whose pitch and yaw fall in the corresponding bin. For example, the element in the $i$-th row and $j$-th column, $\mathbf{H}(i, j)$, is the number of cells whose feature direction satisfies $\theta \in [\theta_i, \theta_i + \Delta\theta)$ and $\phi \in [\phi_j, \phi_j + \Delta\phi)$, where $\Delta\theta$ and $\Delta\phi$ are the bin resolutions.

To increase the robustness of the 2D histogram against noise, we apply a Gaussian blur to each computed 2D histogram.

The complete algorithm for computing the rotation-invariant 2D histogram is shown in Algorithm 2.
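As a companion to Algorithm 2, here is a minimal Python sketch of the histogram construction. The bin resolution, the blur strength, and the sign rule (keeping a non-negative $x$ component, one possible reading of "positive components") are our illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

RES_DEG = 3.0  # illustrative angular bin resolution; the actual value is a parameter

def direction_to_bins(omega, res_deg=RES_DEG):
    w = np.asarray(omega, dtype=float)
    if w[0] < 0:                   # resolve the sign ambiguity (illustrative rule)
        w = -w
    pitch = np.degrees(np.arctan2(w[2], np.hypot(w[0], w[1])))  # Eq. (10), in [-90, 90]
    yaw = np.degrees(np.arctan2(w[1], w[0]))                    # Eq. (11), in [-90, 90]
    n = int(round(180.0 / res_deg))
    i = min(int((pitch + 90.0) / res_deg), n - 1)
    j = min(int((yaw + 90.0) / res_deg), n - 1)
    return i, j

def keyframe_histograms(cells, R, res_deg=RES_DEG, blur_sigma=1.0):
    """Algorithm 2: rotation-invariant plane/line 2D histograms of a keyframe.

    `cells` is an iterable of (shape, direction) pairs as produced by
    classify_cell above; R is the rotation matrix from Sect. V-B.
    """
    n = int(round(180.0 / res_deg))
    hist_p, hist_l = np.zeros((n, n)), np.zeros((n, n))
    for shape, omega in cells:
        if shape is None:
            continue
        i, j = direction_to_bins(R @ omega, res_deg)
        (hist_p if shape == "plane" else hist_l)[i, j] += 1
    # Gaussian blur for robustness to noise (Sect. V-C)
    return gaussian_filter(hist_p, blur_sigma), gaussian_filter(hist_l, blur_sigma)
```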

VI Fast Loop Detection

VI-A Procedure of Loop Detection

As mentioned previously, we group a number of frames (e.g., 100) into a keyframe $\mathbf{K}$. A keyframe can be viewed as a local patch of the global map $\mathbf{M}$ and contains all of the cells appearing in the last frames, as shown in Fig. 5. We compute the 2D histogram of a new keyframe and its similarity (Section VI-B) with all past keyframes to detect a loop. The keyframe with a detected loop is then matched to the map (Section VI-C), and the map is updated with a pose graph optimization (Section VI-D).

VI-B Similarity of Two Keyframes

For each newly added keyframe, we measure its similarity to all history keyframes. In this work, we use the normalized cross-correlation of the 2D histograms to compute their similarity, a metric widely used in computer vision (e.g., template matching, image tracking, etc.). The similarity of two 2D histograms $\mathbf{H}_1$ and $\mathbf{H}_2$ is computed as:

$$\mathrm{sim}(\mathbf{H}_1, \mathbf{H}_2) = \frac{\sum_{i} \left( \mathbf{H}_1(i) - \bar{\mathbf{H}}_1 \right) \left( \mathbf{H}_2(i) - \bar{\mathbf{H}}_2 \right)}{\sqrt{\sum_{i} \left( \mathbf{H}_1(i) - \bar{\mathbf{H}}_1 \right)^2 \sum_{i} \left( \mathbf{H}_2(i) - \bar{\mathbf{H}}_2 \right)^2}} \tag{12}$$

where $\bar{\mathbf{H}}$ is the mean of $\mathbf{H}$ and $i$ is the index of the elements in $\mathbf{H}$. If the similarity between two keyframes is higher than a threshold (set separately for the plane and line histograms), a loop is considered detected.
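A direct implementation of (12) is only a few lines of Python; the function name is ours.

```python
import numpy as np

def similarity(h1, h2):
    """Normalized cross-correlation of two 2D histograms, Eq. (12)."""
    a = h1 - h1.mean()
    b = h2 - h2.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

In use, the current keyframe's plane and line histograms would be scored against every past keyframe, and a loop is reported only when both scores exceed their respective thresholds.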

VI-C Map Alignment

After successfully detecting a loop, we perform map alignment to compute the relative pose between the two keyframes. The problem of map alignment can be viewed as a registration between a target point cloud and a source point cloud, as in [20].

Since we have already classified the cells into line shapes and plane shapes in our LOAM algorithm [14], we use edge-to-edge and plane-to-plane feature correspondences to iteratively solve for the relative pose.

After the alignment, if the average distance of the points of the edge/plane features in one map to the corresponding edge/plane features in the other is small enough (less than a given threshold), we regard the two maps as aligned.
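The actual alignment iterates edge-to-edge and plane-to-plane correspondences as in [14]. As a hedged, self-contained stand-in, the sketch below solves the rigid transform for one set of matched points in closed form (the Kabsch solution) and applies the average-distance acceptance check; the threshold value is illustrative.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform mapping src points onto dst (Kabsch).

    A simplified stand-in for one inner step of the edge-to-edge /
    plane-to-plane registration; the real system iterates feature
    correspondences as in the LOAM implementation [14].
    """
    src, dst = np.asarray(src, dtype=float), np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)                  # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                                 # proper rotation, det = +1
    t = mu_d - R @ mu_s
    return R, t

def maps_aligned(src, dst, R, t, max_avg_dist=0.1):
    # Acceptance check: average residual distance below a threshold (illustrative)
    res = np.linalg.norm((np.asarray(src) @ R.T + t) - np.asarray(dst), axis=1)
    return res.mean() < max_avg_dist
```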

VI-D Pose Graph Optimization

As shown in the workflow of Fig. 3, once the two keyframes are aligned, we perform a pose graph optimization following the methods in [19, 11]. We implement the graph optimization using the Google Ceres solver (http://ceres-solver.org/). After optimizing the pose graph, we update all the cells in the entire map by recomputing the contained points and the points' mean and covariance.
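The released system formulates the pose graph in 3D and solves it with Ceres; as a self-contained illustration of the same idea, the sketch below optimizes a toy 2D pose graph ($x$, $y$, yaw per node) with SciPy, where the loop closure enters as one more relative-pose edge. All names are ours.

```python
import numpy as np
from scipy.optimize import least_squares

def optimize_pose_graph(poses, edges):
    """Toy 2D pose-graph optimization; a minimal stand-in for the Ceres-based
    3D pose graph used in the system. `edges` holds (i, j, dx, dy, dtheta)
    relative-pose measurements, including the loop closure constraint."""

    def residuals(x):
        p = x.reshape(-1, 3)
        r = [p[0]]                                   # anchor the first pose at the origin
        for i, j, dx, dy, dth in edges:
            xi, yi, thi = p[i]
            c, s = np.cos(thi), np.sin(thi)
            # predicted relative pose of node j expressed in node i's frame
            dxp = c * (p[j, 0] - xi) + s * (p[j, 1] - yi)
            dyp = -s * (p[j, 0] - xi) + c * (p[j, 1] - yi)
            dthp = p[j, 2] - thi
            r.append([dxp - dx, dyp - dy,
                      np.arctan2(np.sin(dthp - dth), np.cos(dthp - dth))])
        return np.concatenate([np.ravel(v) for v in r])

    sol = least_squares(residuals, np.ravel(poses))
    return sol.x.reshape(-1, 3)
```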

Fig. 7: We test our algorithm on four datasets, all sampled with a Livox MID-40. The first is a large-scale dataset sampled on the HKUST campus; the second is sampled around a square building (the Main Building of HKU); the third is sampled indoors, consisting of two long corridors on two neighboring floors; the fourth is sampled around a rectangular building (the Chong Yuet Ming Cultural Center at HKU) with many natural objects, such as trees, stairs, sculptures, etc.

VII Results

VII-A Visualization of Keyframe, Cells, and 2D Histograms

We visualize two keyframes, their associated 2D histograms, local maps, and the contained cells in Fig. 6(a). The figure shows that the 2D histograms of the two different scenes are very distinctive.

VII-B Rotation Invariance

We evaluate the rotation invariance of our loop detection method by computing the similarity of the two scenes in Fig. 6(a) under random rotations. For each level of rotation, we generate 50 random rotation matrices of random directions but the same magnitude, rotate one of the two scenes by each generated rotation matrix, and compute the average similarity over all 50 rotations of the same magnitude. The similarity of keyframe A to itself, keyframe B to itself, and keyframe A to keyframe B is shown in Fig. 6(b). It can be seen that the similarity of plane features remains almost the same under different rotation magnitudes, and the similarity of a keyframe to itself (under arbitrary rotation) is constantly higher than the similarity between different keyframes. For line features, although the self-similarity of a keyframe slightly drops when rotation takes place, it is still significantly higher than the similarity between different keyframes.

VII-C Time of Computation

We evaluate the time consumption of each step of our system on two platforms: a desktop PC (with an i7-9700K) and an onboard computer (DJI Manifold 2, https://www.dji.com/cn/manifold-2, with an i7-8550U). The average running time of our algorithm on the HKUST large-scale dataset (the first column of Fig. 7) is shown in Table I, where we can see that our proposed method is fast and suitable for real-time scenarios on both platforms.

                    2D histogram    Maps        Similarity of
                    computing       alignment   two maps
Desktop PC          1.18            621         13
Onboard computer    1.48            931         16
TABLE I: The running time of our system on the two platforms.

VII-D Large Scale Loop Closure Results

We test our algorithm on the four datasets in Fig. 7, where the first row compares the trajectory before (green solid line) and after (blue solid line) loop closure, and the red dashed lines indicate the detected loops. The second row compares the point cloud map before (red) and after (white) loop closure, where we can see that loop closure effectively reduces the drift of LiDAR odometry and mapping (especially in the areas inside the yellow circles). In the third row we align our point cloud after loop closure with Google Maps; the alignment is very accurate, demonstrating the high precision of our system.

VIII Conclusion

This paper presented a fast, complete, point cloud based loop closure for LiDAR odometry and mapping. We developed a loop detection method that can quickly evaluate the similarity of two keyframes from the 2D histograms of the plane and line features in each keyframe. We tested our system on four datasets, and the results demonstrate that our loop closure can significantly reduce the long-term drift error. We open-sourced all of our datasets and code on GitHub to serve as an available solution and paradigm for future point cloud based loop closure research.

References

  • [1] M. Angelina Uy and G. Hee Lee (2018) PointNetVLAD: deep point cloud based retrieval for large-scale place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4470–4479. Cited by: §II.
  • [2] R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic (2016) NetVLAD: cnn architecture for weakly supervised place recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5297–5307. Cited by: §II.
  • [3] M. Bosse and R. Zlot (2013) Place recognition using keypoint voting in large 3d lidar datasets. In 2013 IEEE International Conference on Robotics and Automation, pp. 2677–2684. Cited by: §II.
  • [4] A. Bry, A. Bachrach, and N. Roy (2012) State estimation for aggressive flight in gps-denied environments using onboard sensing. In 2012 IEEE International Conference on Robotics and Automation, pp. 1–8. Cited by: §I.
  • [5] M. Calonder, V. Lepetit, C. Strecha, and P. Fua (2010) Brief: binary robust independent elementary features. In European conference on computer vision, pp. 778–792. Cited by: §II.
  • [6] R. Dubé, D. Dugas, E. Stumm, J. Nieto, R. Siegwart, and C. Cadena (2017) Segmatch: segment based place recognition in 3d point clouds. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 5266–5272. Cited by: §II.
  • [7] D. Filliat (2007) A visual bag of words method for interactive qualitative localization and mapping. In Proceedings 2007 IEEE International Conference on Robotics and Automation, pp. 3921–3926. Cited by: §I, §II.
  • [8] D. Gálvez-López and J. D. Tardos (2012) Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics 28 (5), pp. 1188–1197. Cited by: §I, §II.
  • [9] F. Gao, W. Wu, W. Gao, and S. Shen (2019) Flying on point clouds: online trajectory generation and autonomous navigation for quadrotors in cluttered environments. Journal of Field Robotics 36 (4), pp. 710–733. Cited by: §I.
  • [10] A. Gawel, T. Cieslewski, R. Dubé, M. Bosse, R. Siegwart, and J. Nieto (2016) Structure-based vision-laser matching. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 182–188. Cited by: §II.
  • [11] G. Grisetti, R. Kummerle, C. Stachniss, and W. Burgard (2010) A tutorial on graph-based slam. IEEE Intelligent Transportation Systems Magazine 2 (4), pp. 31–43. Cited by: §VI-D.
  • [12] A. Hornung, K. M. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard (2013) OctoMap: an efficient probabilistic 3d mapping framework based on octrees. Autonomous robots 34 (3), pp. 189–206. Cited by: §IV-B.
  • [13] J. Levinson, J. Askeland, J. Becker, J. Dolson, D. Held, S. Kammel, J. Z. Kolter, D. Langer, O. Pink, V. Pratt, et al. (2011) Towards fully autonomous driving: systems and algorithms. In 2011 IEEE Intelligent Vehicles Symposium (IV), pp. 163–168. Cited by: §I.
  • [14] J. Lin and F. Zhang (2019) Loam_livox: a fast, robust, high-precision lidar odometry and mapping package for lidars of small fov. arXiv preprint. Cited by: §II, §III, §VI-C.
  • [15] D. G. Lowe et al. (1999) Object recognition from local scale-invariant features. In ICCV, Vol. 99, pp. 1150–1157. Cited by: §II.
  • [16] M. Magnusson, H. Andreasson, A. Nuchter, and A. J. Lilienthal (2009) Appearance-based loop detection from 3d laser data using the normal distributions transform. In 2009 IEEE International Conference on Robotics and Automation, pp. 23–28. Cited by: §II, §II, §V-A.
  • [17] R. Mur-Artal and J. D. Tardós (2017) Orb-slam2: an open-source slam system for monocular, stereo, and rgb-d cameras. IEEE Transactions on Robotics 33 (5), pp. 1255–1262. Cited by: §I.
  • [18] A. Nüchter, K. Lingemann, J. Hertzberg, and H. Surmann (2007) 6D slam—3d mapping outdoor environments. Journal of Field Robotics 24 (8-9), pp. 699–722. Cited by: §I.
  • [19] E. Olson, J. Leonard, and S. Teller (2006) Fast iterative alignment of pose graphs with poor initial estimates. In Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006., pp. 2262–2269. Cited by: §VI-D.
  • [20] K. Pulli (1999) Multiview registration for large data sets. In Second International Conference on 3-D Digital Imaging and Modeling (Cat. No. PR00062), pp. 160–168. Cited by: §VI-C.
  • [21] C. R. Qi, H. Su, K. Mo, and L. J. Guibas (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660. Cited by: §II.
  • [22] T. Qin, P. Li, and S. Shen (2018) Vins-mono: a robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics 34 (4), pp. 1004–1020. Cited by: §I.
  • [23] E. Rublee, V. Rabaud, K. Konolige, and G. R. Bradski (2011) ORB: an efficient alternative to SIFT or SURF. In ICCV, Vol. 11, pp. 2. Cited by: §II.
  • [24] B. Schwarz (2010) LIDAR: mapping the world in 3d. Nature Photonics 4 (7), pp. 429. Cited by: §I.