SPLODE: Semi-Probabilistic Point and Line Odometry with Depth Estimation from RGB-D Camera Motion
Abstract
Active depth cameras suffer from several limitations, which cause incomplete and noisy depth maps and may consequently affect the performance of RGB-D odometry. To address this issue, this paper presents a visual odometry method based on point and line features that leverages both measurements from a depth sensor and depth estimates from camera motion. Depth estimates are generated continuously by a probabilistic depth estimation framework for both types of features to compensate for the lack of depth measurements and inaccurate feature depth associations. The framework explicitly models the uncertainty of triangulating depth from both point and line observations in order to validate and obtain precise estimates.
Furthermore, depth measurements are exploited by propagating them through a depth map registration module and by using a frame-to-frame motion estimation method that considers 3D-to-2D and 2D-to-3D reprojection errors independently. Results on RGB-D sequences captured in large indoor and outdoor scenes, where depth sensor limitations are critical, show that the combination of depth measurements and estimates through our approach is able to overcome the absence and inaccuracy of depth measurements.
I Introduction
Modern active depth cameras, i.e., structured-light and time-of-flight (ToF) cameras, are capable of capturing dense 640×480 depth maps of poorly textured scenes at 30 fps. Therefore, the combination of these depth cameras with colour cameras, in a stereo configuration, has become a popular setup to capture both the geometry and appearance of indoor scenes. The calibration of such a stereo setup allows the direct registration of depth with RGB images. To leverage this type of fused data, known as RGB-D data, many visual odometry and SLAM methods [1, 2, 3, 4, 5, 6, 7] have emerged over the last few years. However, these depth sensors suffer from several limitations that cause missing and erroneous depth readings, such as: limited field of view and range (typically between 0.4 and 5 m), near-infrared (NIR) interference (e.g. sunlight and multiple devices) and non-Lambertian reflections. These limitations may severely affect the performance of pure RGB-D based methods [4, 5, 6, 7], as these only make use of RGB pixels that have associated depth measurements. On the other hand, depth estimation from the motion of a single camera is less affected by the above issues, but purely monocular SLAM [8, 9] suffers from scale drift and degenerate motion.
To address these limitations, we propose a robust visual odometry method based on image point and line features, called SPLODE, which exploits not only the depth measurements from a depth camera but also depth estimates from camera motion. SPLODE overcomes temporarily unavailable depth measurements by propagating depth through depth map registration and by estimating frame-to-frame pose using loosely both the 2D-to-3D and 3D-to-2D reprojection errors of feature matches, while it continuously estimates depth for tracked features by recursively triangulating and fusing depth from temporal stereo. The key contributions of this paper are the following:

- The introduction of a depth estimation framework that models explicitly the triangulation uncertainty of points and lines by taking into account their particular geometries and the pose uncertainty.
- Showing how to minimize the impact of depth measurement errors on the pose estimation by comparing two different depth estimation schemes.
Our experiments show that the combination of depth camera measurements with estimates through the proposed framework allows SPLODE to operate in challenging environments (i.e. beyond room-sized environments), where the limitations of depth cameras become too problematic to rely on RGB-D odometry alone.
II Related Work
Recently, several works have shown the benefit of combining different depth sensing modalities for egomotion estimation. Stereo cameras have been integrated in monocular SLAM methods based on either direct image alignment [10] or point features [11] to help resolve scale ambiguity and degenerate motions. RGB-D cameras have been used with monocular techniques in [1, 2, 3] to deal with a lack of depth sensor measurements. Hu et al. [1] proposed a heuristic switching algorithm to choose between RGB-D SLAM and monocular SLAM based on epipolar geometry; however, with such a hard switch, all depth measurements during the monocular SLAM mode are ignored, thus a map optimization is required to estimate the scale. Zhang et al. [2] proposed estimating motion by combining 3D-to-2D and 2D-to-2D feature point matches in a joint optimization. Additionally, a depth map registration module is proposed to incorporate depth information from past frames and extend the method to LIDAR; nevertheless, during the frame-to-frame pose estimation, depth from the second frame (i.e. 2D-to-3D matches) is neglected. Alternatively, Ataer-Cansizoglu et al. [3] proposed using a RANSAC framework to combine hybrid features. The framework estimates pose by trying different triplets comprised of either plane matches and 3D-to-3D point matches or just 2D-to-3D point matches, whereas the rest of the matches, including point matches without depth, are used to check the hypothesis consensus; yet the case of 3D-to-2D matches is not considered. Both [2] and [3] triangulate points without depth after they have been tracked for longer than a certain temporal stereo baseline. However, assessing the stereo baseline by itself is not enough, especially for lines: e.g., camera translation parallel to a line or in the direction of a point renders triangulation impossible.
More convincing depth estimation is found among monocular odometry methods [8, 9]. Engel et al. [8] estimated and refined a semi-dense inverse depth map for direct image alignment by modelling the inverse depth uncertainty of pixels with three criteria: an approximation to the geometric disparity error, a photometric disparity error and a pixel-to-inverse-depth conversion. Forster et al. [9] introduced a fast monocular odometry, where the inverse depth of point features is modelled as a Gaussian-uniform mixture model by employing a Bayesian filter to fit good depth observations and account for outliers. Such a model suggests robustness to repetitive textures; however, unlike our approach, the depth filter needs to undergo multiple observations before it converges, since it does not take into account the uncertainty of each feature observation. This method was further extended in [12] to line features to address textureless scenes.
RGB-D data also has the potential to be used in poorly textured and illuminated scenes due to the use of active light. In [4], the combination of point features with ICP based on the depth data proved to be advantageous in dark areas. Similarly, in [5], the direct minimization of both photometric and depth error showed better performance in low-textured scenes than relying on the photometric error alone. Robustness in RGB-D methods has also been achieved [13, 6] by exploiting other geometric primitives besides points. Planes have been widely used [13, 3]. More closely related to our work is that of Lu and Song [6], who combined point and line segment features under a framework that models the uncertainties of the 3D features based on the low resolution and noise of depth measurements obtained by structured-light sensors. Despite their use of a 3D line fitting method that tolerates missing depth, the frame-to-frame motion estimation itself requires feature matches to have depth measurements in both frames, as it minimizes the Mahalanobis distance between the 3D feature locations. Consequently, the framework does not allow the inclusion of either features with missing depth or triangulated features.
III Method
Our proposed visual odometry system, outlined in Fig. 2, leverages active depth sensing and temporal stereo to retrieve the 3D location of point and line features, which are projected on the RGB camera. Once a new RGB-D frame is captured, 2D features are detected in the current RGB image and depth measurements are sampled from the current depth map to attempt to obtain the respective 3D coordinates. These features are then matched against the ones extracted from the last frame, and pose is estimated by minimizing their bidirectional reprojection errors, employing an M-estimator to reduce the impact of spurious matches and wrong depth associations on the estimated pose. Given the pose estimate, past depth measurements can be combined with the current depth map via point cloud registration to achieve a denser depth map, from which depth is resampled and used to recover the 3D coordinates of features that do not yet have valid depth, so that they can be used in the next frame-to-frame pose estimation. For the purpose of depth triangulation, features are tracked by frame-to-frame feature matching since their first frame observation to allow a wide-baseline stereo. As outlined by the feature tracking algorithm in Fig. 3, we have experimented with two different schemes to integrate the depth estimation from temporal stereo:

- Mode A estimates depth only for features that lack a valid depth measurement, so that triangulated depth merely fills in the gaps left by the depth sensor.
- Mode B estimates depth for features regardless of them having depth measurements, such that, even if a feature already has a 3D position recovered from the depth map, a depth estimate may be applied to initialize a new 3D position hypothesis to be used along with the other hypothesis in the next frame's pose estimation as two 3D-to-2D feature match residuals. Since the pose estimation implements an M-estimator, the spurious depth hypothesis can be downweighted.
While mode A gives priority to measurements coming from the depth sensor, estimating depth only when necessary, mode B relies less on depth measurements by treating measurements and estimates equally. The principle behind mode B is that depth measurements may be inaccurate, and depth estimates from temporal stereo can be used as an alternative hypothesis. The depth estimation module uses the uncertainty of the stereo triangulation, given the pose uncertainty, to fuse estimates from multiple views and assess the precision of the estimates. Whenever the uncertainty of a fused depth estimate drops below a certain threshold, that depth estimate is applied to initialize the 3D position of the feature through backprojection, so that it can be used in the next camera pose estimation. To combat wrong feature matches, the consistency of both triangulated and fused depth estimates is continuously checked based on the reprojection errors. In the remainder of this section, we detail each individual module of our framework.
III-A Preliminaries
The projection of a 3D point \(\mathbf{P} = (X, Y, Z)^\top\) on the image plane is denoted as \(\mathbf{p} = \pi(\mathbf{P}) = (X/Z, Y/Z)^\top\), and \(\mathbf{h}(\mathbf{p}) = (X/Z, Y/Z, 1)^\top\) corresponds to its homogeneous vector representation. The projected point can also be expressed in pixels by \(\mathbf{u} = K\,\mathbf{h}(\mathbf{p})\), where \(K\) is known as the camera calibration matrix.
Both in 2D and 3D, a line segment is represented by its endpoints. For the purpose of measuring point-to-line distances, a 2D line is also parameterized in the Hessian normal form, i.e. by its normal angle \(\phi\) and its distance \(d\) from the origin, such that a point \(\mathbf{p} = (x, y)\) on the line satisfies \(x\cos\phi + y\sin\phi - d = 0\).
Furthermore, a point can be expressed with respect to the reference frame of a different camera view through a 3D rigid body transformation \(\{R, \mathbf{t}\}\), such that \(\mathbf{P}' = R\,\mathbf{P} + \mathbf{t}\). When necessary, rotation is expressed as a unit quaternion \(q = (q_w, q_x, q_y, q_z)\) to achieve minimal pose parameters \(\theta = \{t_x, t_y, t_z, q_x, q_y, q_z\}\), such that \(q_w = \sqrt{1 - q_x^2 - q_y^2 - q_z^2}\) for rotation angles between \(-\pi\) and \(\pi\).
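For illustration, the notation above can be condensed into a short NumPy sketch (the helper names are ours, not the paper's):

```python
import numpy as np

def project(P):
    """Perspective projection pi: maps a 3D point to normalized image coordinates."""
    X, Y, Z = P
    return np.array([X / Z, Y / Z])

def to_pixels(p, K):
    """Convert a normalized image point to pixel coordinates via the calibration matrix K."""
    u = K @ np.array([p[0], p[1], 1.0])  # homogeneous representation h(p)
    return u[:2]

def transform(P, R, t):
    """Rigid-body transformation of a 3D point into another camera frame."""
    return R @ P + t
```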
III-B Pose Estimation
Like several works in RGB-D odometry [2, 4], we avoid working in the Euclidean space, due to the depth sensor error at long distances, and instead estimate the frame-to-frame pose by jointly minimizing the reprojection errors of both line and point correspondences. To exploit the depth measurements available in both frames, the bidirectional error is used loosely, such that 3D-to-2D correspondences are used if depth is available in the first frame and 2D-to-3D correspondences are used if depth is available in the second frame. Given a 3D-to-2D point correspondence \(\{\mathbf{P}, \mathbf{p}'\}\), we express the reprojection error in vector form as follows:
\[ \mathbf{r}(\mathbf{P}, \mathbf{p}') = \pi(R\,\mathbf{P} + \mathbf{t}) - \mathbf{p}' \qquad (1) \]
where \(\{R, \mathbf{t}\}\) corresponds to the relative pose estimate. Similarly, the reprojection error of a 2D-to-3D point match \(\{\mathbf{p}, \mathbf{P}'\}\) can be expressed by using the inverse pose. For a 3D-to-2D correspondence of line segments, the residual is defined as the point-to-line distance between a 2D line \(\mathbf{l}'\) and the two corresponding 3D line endpoints \(\{\mathbf{P}_1, \mathbf{P}_2\}\) as follows:
\[ \mathbf{r}(\mathbf{P}_1, \mathbf{P}_2, \mathbf{l}') = \begin{bmatrix} \mathbf{l}'^\top\, \mathbf{h}(\pi(R\,\mathbf{P}_1 + \mathbf{t})) \\ \mathbf{l}'^\top\, \mathbf{h}(\pi(R\,\mathbf{P}_2 + \mathbf{t})) \end{bmatrix} \qquad (2) \]
Additionally, 2D-to-2D point correspondences without depth can also be taken into account by using the residual introduced in [2]:
\[ r(\mathbf{p}, \mathbf{p}') = \gamma\, \mathbf{h}(\mathbf{p}')^\top [\mathbf{t}]_\times R\, \mathbf{h}(\mathbf{p}) \qquad (3) \]
Here, we use a factor \(\gamma\) to downweight these residuals, whereas in [2], the reprojection errors of the 3D-to-2D matches are in fact scaled by the depth, therefore their impact on the overall optimization depends on the scene depth and metric unit. The value of \(\gamma\) is set heuristically to 0.01.
These heterogeneous residuals are then stacked together and minimized by using a nonlinear least squares algorithm (i.e. Levenberg-Marquardt). Formally, let \(\mathcal{A}\) be a set of 3D-to-2D point matches; \(\mathcal{B}\), a set of 2D-to-3D point matches; \(\mathcal{C}\), a set of 3D-to-2D line matches; \(\mathcal{D}\), a set of 2D-to-3D line matches; and \(\mathcal{E}\), a set of 2D-to-2D point matches; then the optimal pose is given by the minimization of the following joint cost function:
\[ \theta^* = \arg\min_{\theta} \sum_{k \in \{\mathcal{A}, \mathcal{B}, \mathcal{C}, \mathcal{D}, \mathcal{E}\}} \mathbf{r}_k^\top\, W_k\, \mathbf{r}_k \qquad (4) \]
where \(\mathbf{r}_k\) stacks the residuals of set \(k\) and \(W_k\) are diagonal matrices containing the Tukey weights, which are computed separately per residual type in an iteratively reweighted least-squares fashion.
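As an illustration of the reweighting step, the following sketch computes Tukey biweights for one residual type; the MAD-based scale estimate is our assumption, as the paper does not specify how the residuals are normalized:

```python
import numpy as np

def tukey_weights(residuals, c=4.685):
    """Tukey biweight: residuals are scaled by a robust estimate of the
    standard deviation (1.4826 * median absolute deviation); residuals
    beyond the cutoff c receive zero weight."""
    scale = 1.4826 * np.median(np.abs(residuals - np.median(residuals)))
    scale = max(scale, 1e-12)            # guard against all-equal residuals
    u = residuals / (c * scale)
    w = (1.0 - u**2)**2
    w[np.abs(u) >= 1.0] = 0.0            # outliers are rejected entirely
    return w
```

Inliers keep weights close to 1, while a gross outlier is cut off, which is what lets the M-estimator downweight spurious matches and wrong depth hypotheses.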
Degenerate feature configurations are addressed by imposing a minimum of 3 total matches with depth and by checking, after the optimization, the uncertainty of the pose parameters \(\Sigma_\theta\), which is approximated as the inverse of the Gauss-Newton approximation to the Hessian of the squared Mahalanobis distance (also known as uncertainty backpropagation [14]):
\[ \Sigma_\theta = \left( J^\top \Sigma_r^{-1} J \right)^{-1} \qquad (5) \]
where \(J\) is the combined Jacobian matrix of the residual functions with respect to the estimated pose parameters, evaluated at the solution, and \(\Sigma_r\) is a diagonal matrix containing the uncertainties of the residuals. Although \(\Sigma_r\) can be derived by propagating the uncertainty of the 3D feature coordinates, in this work, for simplicity, we assume homoscedasticity by letting the uncertainty of a residual be the variance of the residuals of the same type.
If the largest eigenvalue of the 3×3 block of \(\Sigma_\theta\) corresponding to the translation exceeds a fixed threshold, the configuration is considered degenerate and a decaying velocity model is applied instead to estimate the pose.
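The backpropagation of (5) and the degeneracy check can be sketched as follows (a minimal sketch assuming the translation occupies the first three pose parameters; the threshold value is hypothetical):

```python
import numpy as np

def pose_covariance(J, res_var):
    """Uncertainty backpropagation: Sigma = (J^T W J)^-1 with W = diag(1/var)."""
    W = np.diag(1.0 / res_var)
    return np.linalg.inv(J.T @ W @ J)

def is_degenerate(Sigma, max_eig=1e-3):
    """Flag a degenerate configuration from the largest eigenvalue of the
    3x3 translation block of the pose covariance."""
    t_block = Sigma[:3, :3]  # assumes translation parameters come first
    return np.linalg.eigvalsh(t_block).max() > max_eig
```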
III-C Point and Line Image Features
At every frame, point and line features are extracted from the RGB image. Specifically, for points we make use of SURF features, whereas for lines we use the LSD [15] method to detect line segments and then extract binarized LBD [16] descriptors (implemented in OpenCV) from the detected segments. Feature correspondences are then established by matching the descriptors between consecutive frames. Small frame-to-frame motion is assumed to prune away matches that are geometrically far: after performing a nearest-neighbour descriptor search for a given point, we select the closest match whose point coordinates lie in a circular region defining the neighbourhood of the original point, whereas, for a given line, we accept the closest line match that has a similar slope angle and distance from the origin, according to their line Hessian normal equations.
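The geometrically gated matching for points can be sketched as below (a simplified brute-force version with hypothetical names; descriptor distances are plain Euclidean here, whereas binary descriptors would use Hamming distance):

```python
import numpy as np

def match_points(desc1, pts1, desc2, pts2, radius=50.0):
    """Nearest-neighbour descriptor matching gated by a circular
    neighbourhood around each point (small frame-to-frame motion assumed)."""
    matches = []
    for i, (d1, p1) in enumerate(zip(desc1, pts1)):
        # geometric gate: candidates must lie within the circular region
        near = [j for j, p2 in enumerate(pts2)
                if np.linalg.norm(p2 - p1) < radius]
        if not near:
            continue
        dists = [np.linalg.norm(desc2[j] - d1) for j in near]
        matches.append((i, near[int(np.argmin(dists))]))
    return matches
```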
III-D Point Depth Sampling and 3D Line Fitting
For a calibrated RGBD sensor, the 3D location of feature points is directly obtained from the depth map by backprojecting the corresponding 2D coordinates.
However, a 2D line segment contains multiple depth samples across its length, which suggests that one should exploit them instead of simply backprojecting the two depth pixels corresponding to the line endpoints, as these are noisy, may be missing, or may correspond instead to the background, since lines tend to be detected on object contours where depth is discontinuous. Hence, to obtain the 3D lines, we propose to use the robust method of [6] with two simple modifications to address the computational cost of the original framework:

- We use the Euclidean point-to-line distance within the RANSAC outlier filtering instead of the originally proposed Mahalanobis distance, since the two metrics have not shown significant differences in terms of pose estimation accuracy in our experiments (Section IV), whereas the computation of the Mahalanobis distance requires either inverting the uncertainty matrices of the line depth samples or applying whitening transformations to them [17]. In our implementation, using the Eigen library, computing the Mahalanobis distances makes the process roughly 3 times slower, thus we resort to simply using the Euclidean distance with an inlier threshold of 3 cm.

- The final line fitting step in [6] is cast as a nonlinear least squares problem in order to incorporate the line uncertainty. Instead, we perform PCA on the consensus set of line samples \(\mathcal{S}\) given by RANSAC. Let the covariance of \(\mathcal{S}\) be factorized as \(\Sigma = V \Lambda V^\top\); then we parameterize the line, according to the spread of samples, with the two 3D points \(\bar{\mathbf{X}} \pm \sigma\,\mathbf{v}\), where \(\bar{\mathbf{X}}\) is the mean of the samples, \(\sigma\) is the square root of the largest eigenvalue in \(\Lambda\) and \(\mathbf{v}\) is the corresponding eigenvector.
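The PCA-based endpoint parameterization above can be sketched as follows (a minimal sketch; the RANSAC consensus set is assumed given):

```python
import numpy as np

def fit_line_pca(samples):
    """Parameterize a 3D line from a consensus set of depth samples:
    endpoints are placed at mean +/- sqrt(largest eigenvalue) along the
    principal direction, matching the spread of the samples."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples.T)                  # 3x3 sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    sigma = np.sqrt(eigvals[-1])             # spread along the principal axis
    v = eigvecs[:, -1]                       # principal direction
    return mean - sigma * v, mean + sigma * v
```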
III-E Line Depth Estimation
As depicted in Fig. 4.b, given a pair of line segment matches, the depth of one line segment endpoint can be derived from the point-to-plane distance:
\[ \rho = -\frac{\mathbf{n}^\top \mathbf{t}}{\mathbf{n}^\top R\, \mathbf{h}(\mathbf{p})} \qquad (6) \]
where \(\mathbf{n}\) is the normal of the projective plane, obtained as the cross product of the homogeneous vectors of the two endpoints in the second image. This expression is ill-posed when the rotated point lies on the projective plane of the other frame, and it may therefore result in a negative \(\rho\) even for a correct line match. Thus, to detect outliers, endpoints with invalid depth estimates are assigned an arbitrary positive depth and then reprojected on the second frame to check the point-to-line distance.
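This endpoint depth recovery can be sketched as follows (a minimal sketch under our reconstruction of (6); inputs are normalized homogeneous image points and the relative pose \(\{R, \mathbf{t}\}\) mapping the first frame into the second):

```python
import numpy as np

def endpoint_depth(p, q1, q2, R, t):
    """Depth of a line-segment endpoint p (first frame, homogeneous normalized
    coordinates) from the projective plane spanned by the matched segment
    endpoints q1, q2 in the second frame."""
    n = np.cross(q1, q2)              # normal of the projective plane
    denom = n @ (R @ p)
    if abs(denom) < 1e-12:            # ill-posed: rotated ray lies on the plane
        return None
    return -(n @ t) / denom
```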
III-F Point Depth Estimation
In practice, the feature point matches resulting from the descriptor matching do not meet exactly the epipolar constraint, meaning that their projection rays do not intersect. This issue is addressed by employing the commonly used linear triangulation method (described in [14], p. 312) to estimate the 3D point coordinates, which in turn we use to obtain the corrected projected point coordinates. The projection errors between the original and the corrected coordinates are checked to detect outliers. Additionally, for invalid depth estimates, we use an outlier detection procedure equivalent to the one described in the previous section. Since the problem of point triangulation is overdetermined, the triangulation method [14] relies on a least-squares solution, which makes the derivation of the depth uncertainty less straightforward; however, given the corrected coordinates, we can now arrive, through the trigonometry represented in Fig. 4.a, at the following formula for the depth of the point in the first frame:
\[ \rho = \frac{\|\mathbf{t}\|\, \sin\beta}{\sin(\alpha + \beta)} \qquad (7) \]
where \(\alpha\) and \(\beta\) are the angles between the baseline and the backprojected rays of the corrected point in, respectively, the first and the second frame:
\[ \alpha = \arccos\!\left(\frac{\mathbf{r}_1^\top \mathbf{b}}{\|\mathbf{r}_1\|\,\|\mathbf{b}\|}\right), \qquad \beta = \arccos\!\left(-\frac{\mathbf{r}_2^\top \mathbf{b}}{\|\mathbf{r}_2\|\,\|\mathbf{b}\|}\right) \qquad (8) \]
with \(\mathbf{b}\) denoting the baseline vector from the first to the second camera centre and \(\mathbf{r}_1, \mathbf{r}_2\) the ray directions expressed in a common frame.
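The triangle geometry behind (7) and (8) follows the law of sines: the parallax angle at the point is \(\pi - \alpha - \beta\), so the depth along the first ray equals \(\|\mathbf{b}\|\sin\beta / \sin(\alpha+\beta)\). A minimal sketch (helper name ours):

```python
import numpy as np

def triangulate_depth(r1, r2, b):
    """Depth of a point along ray r1 (first camera), given the matching ray r2
    (second camera) and the baseline vector b from the first to the second
    camera centre, all expressed in a common frame."""
    r1 = r1 / np.linalg.norm(r1)
    r2 = r2 / np.linalg.norm(r2)
    bn = b / np.linalg.norm(b)
    alpha = np.arccos(np.clip(r1 @ bn, -1.0, 1.0))   # angle at the first camera
    beta = np.arccos(np.clip(r2 @ -bn, -1.0, 1.0))   # angle at the second camera
    gamma = np.pi - alpha - beta                      # parallax angle at the point
    if gamma < 1e-6:                                  # near-parallel rays: ill-posed
        return None
    return np.linalg.norm(b) * np.sin(beta) / np.sin(gamma)
```

Note that \(\sin(\alpha+\beta) = \sin\gamma\), so the code matches (7) term by term.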
III-G Uncertainty of Depth and Pose Estimates
The uncertainty of the depth estimation is modelled in terms of inverse depth variance to allow the initialization of 3D features that are far away from the camera. Based on first-order error propagation, the uncertainty of the inverse depth, denoted here as \(\rho^{-1}\), is given by:
\[ \sigma^2_{\rho^{-1}} = J_{\rho^{-1}}\, \Sigma_x\, J_{\rho^{-1}}^\top \qquad (9) \]
where \(J_{\rho^{-1}}\) is the Jacobian of the inverse depth function with respect to the input variables and \(\Sigma_x\) is the variables' uncertainty matrix. Specifically, this inverse depth function corresponds to the inverse of (6) and (7) for, respectively, a line endpoint and a point.
Regarding the variables' uncertainties, depth estimation for a point correspondence depends on two image points \(\{\mathbf{p}, \mathbf{p}'\}\) and on the stereo pose parameters \(\theta\), whereas the depth estimation of a line endpoint depends on three image points and \(\theta\). While the uncertainty of each point coordinate is set to 1 pixel, the block of \(\Sigma_x\) corresponding to the pose is derived from the propagation of the pose uncertainty as follows.
Let the parameters \(\theta_{k,j}\) express the camera pose at frame \(k\) with respect to frame \(j\), with the uncertainty \(\Sigma_{k,j}\) obtained by (5); then the pose \(\theta_{k,i}\) with respect to an earlier frame \(i\) is given by the state transition function:
\[ \theta_{k,i} = f(\theta_{k,j}, \theta_{j,i}) = \begin{bmatrix} R(q_{k,j})\, \mathbf{t}_{j,i} + \mathbf{t}_{k,j} \\ q_{k,j} \otimes q_{j,i} \end{bmatrix} \qquad (10) \]
with \(\otimes\) denoting the quaternion multiplication; the respective pose uncertainty is given according to the EKF propagation equation as:
\[ \Sigma_{k,i} = J_{k,j}\, \Sigma_{k,j}\, J_{k,j}^\top + J_{j,i}\, \Sigma_{j,i}\, J_{j,i}^\top \qquad (11) \]
where \(J_{k,j}\) and \(J_{j,i}\) are the Jacobians of the first six components of (10) with respect to \(\theta_{k,j}\) and \(\theta_{j,i}\), respectively.
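The pose composition of (10) can be sketched as below (a minimal sketch with full quaternions in (w, x, y, z) order; propagating the covariance as in (11) would additionally require the Jacobians of this function):

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product of two quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_to_R(q):
    """Rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def compose(t_kj, q_kj, t_ji, q_ji):
    """Compose two relative poses: the earlier translation is rotated and
    accumulated, while rotations compose by quaternion multiplication."""
    return quat_to_R(q_kj) @ t_ji + t_kj, quat_mul(q_kj, q_ji)
```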
III-H Fusion of Depth Estimates
Sequential filtering is performed in order to fuse the estimates from multiple views. Each point being tracked has an associated 1D Gaussian PDF \(\mathcal{N}(\mu, \sigma^2)\) representing its inverse depth state estimate. This is initialized with zero inverse depth (i.e. infinite depth) and a large variance, and it is updated whenever there is a new triangulation. Given the PDF of a newly triangulated inverse depth estimate \(\mathcal{N}(\mu_o, \sigma_o^2)\), the prior depth estimate state \(\mathcal{N}(\mu, \sigma^2)\) is updated by the product of the two Gaussians:
\[ \mu \leftarrow \frac{\mu\,\sigma_o^2 + \mu_o\,\sigma^2}{\sigma^2 + \sigma_o^2}, \qquad \sigma^2 \leftarrow \frac{\sigma^2\, \sigma_o^2}{\sigma^2 + \sigma_o^2} \qquad (12) \]
For point features, when \(\sigma^2\) drops below a fixed threshold, we use \(\mu\) to backproject a 3D point, so that it can be used in the next frame. For line features, the 3D endpoints are initialized only when both endpoint uncertainties are sufficiently low.
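The Gaussian product update of (12) amounts to precision-weighted averaging, as in the following sketch (helper name ours):

```python
def fuse_inverse_depth(mu_prior, var_prior, mu_new, var_new):
    """Product of two 1-D Gaussians: precision-weighted fusion of the prior
    inverse-depth state with a newly triangulated estimate. The variance
    always shrinks, so repeated observations sharpen the estimate."""
    denom = var_prior + var_new
    mu = (mu_prior * var_new + mu_new * var_prior) / denom
    var = (var_prior * var_new) / denom
    return mu, var
```

Starting from a near-uninformative prior (large variance), the first precise triangulation dominates the fused mean, which matches the infinite-depth initialization described above.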
III-I Depth Map Registration
Depth measurements from past frames are propagated as a registered point cloud to recover the missing depth measurements of the current depth map. A denser depth map can then be obtained by transforming and projecting the registered point cloud to the current camera view. At each frame, the point cloud is filtered and updated with new measurements, as shown in Fig. 5, according to the following criteria:

- Transformation uncertainty: to avoid inaccurate point registration due to the growing pose error, points measured at frames whose transformation to the current frame has an uncertainty indicator higher than a certain limit are removed from the point cloud. Specifically, a sliding window of transformation uncertainties is maintained and their largest eigenvalues are used as an uncertainty indicator, as described in Section III-B.
- Field of view: only points within the FOV of the color camera are maintained.
- One point per pixel: each point must have a unique image projection pixel. In case of conflict, the most recent point is kept.
While one could devise more sophisticated methods for combining the depth measurements, e.g., averaging depth observations, an important advantage of this simple method is its efficiency, since the size of the point cloud is bounded by the image resolution due to the last two criteria.
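The last two criteria can be sketched as a simple z-buffer-style rendering pass (a minimal sketch with hypothetical names; the real system additionally applies the transformation-uncertainty filter):

```python
import numpy as np

def render_depth(points, timestamps, K, R, t, width, height):
    """Project a registered point cloud into the current view, keeping only
    points inside the FOV and one point per pixel (the most recent wins)."""
    depth = np.zeros((height, width))
    newest = np.full((height, width), -np.inf)
    for P, ts in zip(points, timestamps):
        Pc = R @ P + t                       # into the current camera frame
        if Pc[2] <= 0:                       # behind the camera
            continue
        u = K @ (Pc / Pc[2])                 # pixel coordinates
        col, row = int(round(u[0])), int(round(u[1]))
        if 0 <= col < width and 0 <= row < height and ts > newest[row, col]:
            newest[row, col] = ts            # most recent measurement wins
            depth[row, col] = Pc[2]
    return depth
```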
IV Experiments
The performance of our visual odometry method is evaluated on the TUM RGB-D dataset [18] and on an author-collected RGB-D dataset. All sequences have been captured with a structured-light camera and a color camera operating at 30 fps with a resolution of 640×480.
TABLE I: Relative pose error per second for three 3D line fitting alternatives.

Sequence | Direct sampling | Euclidean dist. | Mahalanobis dist.
fr1/desk | 42 mm / 2.2° | 41 mm / 2.2° | 40 mm / 2.2°
fr2/large_no_loop | 85 mm / 1.2° | 74 mm / 1.2° | 75 mm / 1.2°
TABLE II: Relative pose error per second of SPLODE (mode A) on the TUM fr1 and fr2 sequences (including fr1/desk and fr1/360), using point, line and combined (all) features, with and without depth estimates. [Numeric entries are not recoverable from the source.]
IV-A TUM RGB-D Dataset
We have conducted experiments on two distinct types of environments captured by the TUM RGB-D dataset: a small office and a large industrial hall, denoted respectively as fr1 and fr2. While the sequences recorded in the former are commonly used for benchmarking RGB-D odometry and SLAM methods, few works have addressed the fr2 sequences. We believe this is mostly due to the large scene depth (see Fig. 8), which leads to insufficient depth measurements and, in turn, to tracking failures. Consequently, only fractions of these sequences were evaluated in [3]. In contrast, our results are reported for the full sequences, except for fr2/large_no_loop, which only has ground truth for about 20% of the sequence duration. As described in Section III-B, tracking or pose estimation failures are handled by using a velocity model.
Performance was evaluated in terms of pose drift as the relative pose error (RPE) per second. First, Table I shows how the choice of 3D line fitting affects the pose error. We have observed that when the depth information is rich, simple backprojection of the endpoints seems enough; however, when the depth measurements are severely compromised, the robust line fitting becomes necessary. No significant differences have been observed, however, between using the Mahalanobis and the Euclidean distance.
As can be seen in Table II, including depth estimates reduces the overall odometry drift on the fr2 scene. Moreover, the number of frames where tracking fails is reduced: e.g., in fr2/360_hemisphere, while point odometry without depth estimates misses 29 frames, with depth estimates tracking is not interrupted. Since no significant differences were observed between the two depth estimation schemes on this dataset, we only report results for mode A, which is preferred in terms of computational cost.
IV-B Author-Collected Dataset
To further evaluate the robustness of the proposed method, we have captured two RGB-D sequences that are challenging for depth sensors: a large foyer and a park under sun exposure, shown in Figs. 1 and 8. The respective trajectories estimated by SPLODE are shown in Fig. 6, the final error of the trajectories is reported in Table III, and a video is available at: https://youtu.be/6lMwPCiCXZc.
As shown in Fig. 6, while pure RGB-D odometry with points and lines (i.e. PLO) fails, even with depth map registration, by frequently making gross errors or failing to estimate pose, the integration of depth estimates allows the method to perform well in both environments. In terms of features (see Table III), point-based odometry performs worse than line-based odometry in the foyer sequence due to the predominance of lines; on the other hand, point-based odometry performs well in the park sequence, whereas using just lines, as expected, is not sufficient, since straight lines are less common in nature.
In this dataset, mode B yields generally lower errors than mode A. This can be explained by the existence of spurious depth measurements due to NIR interference (shown in Fig. 8.d) and non-Lambertian effects (shown in Fig. 7). These depth errors are further propagated and accumulated by the depth map registration, which can severely affect the pose estimation, as shown in Fig. 7, especially for mode A, which relies more on the depth measurements.


TABLE III: Final trajectory error on the foyer and park sequences for SPLODE with point, line and combined (all) features, without depth estimates and with depth estimation modes A and B. Line-only odometry fails on the park sequence. [Numeric entries are not recoverable from the source.]
Fig. 8: Sample frames from (a, b) fr2/360_hemisphere, (c) the foyer and (d) the park sequences.
V Conclusions
This paper presents an RGB-D based odometry method that achieves robustness to poorly textured scenes and depth sensor failures by combining point and line features, and depth sensor measurements with temporal stereo.
Our results show that the depth estimation framework introduced by this method is beneficial in large indoor environments (e.g. warehouses) and outdoor environments, where the depth information captured by depth cameras is too sparse. Moreover, our experiments indicate that gross depth errors occur typically outdoors, due to NIR interference from sunlight, and that their impact on the pose estimation can be minimized by adopting a more active depth estimation approach (referred to as mode B) rather than just estimating depth when measurements are missing (mode A). However, because mode B is more computationally intensive than mode A, adaptive switching between the two is a worthwhile research direction. Another possible improvement of the system is to make the pose estimation also probabilistic by weighting the residuals according to the uncertainties of both depth measurements and estimates: in this work, SPLODE relies on a finely tuned maximum uncertainty threshold to accept sufficiently precise depth estimates, but by using the depth estimate uncertainty in the pose estimation, such a threshold could be relaxed to exploit more features with higher uncertainties.
References
 [1] G. Hu, S. Huang, L. Zhao, A. Alempijevic, and G. Dissanayake, “A robust RGB-D SLAM algorithm,” in International Conference on Intelligent Robots and Systems (IROS), 2012.
 [2] J. Zhang, M. Kaess, and S. Singh, “Real-time depth enhanced monocular odometry,” in International Conference on Intelligent Robots and Systems (IROS), 2014.
 [3] E. Ataer-Cansizoglu, Y. Taguchi, and S. Ramalingam, “Pinpoint SLAM: A hybrid of 2D and 3D simultaneous localization and mapping for RGB-D sensors,” in IEEE International Conference on Robotics and Automation (ICRA), 2016.
 [4] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, “RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments,” The International Journal of Robotics Research, vol. 31, no. 5, pp. 647–663, 2012.
 [5] C. Kerl, J. Sturm, and D. Cremers, “Dense visual SLAM for RGB-D cameras,” in International Conference on Intelligent Robots and Systems (IROS), pp. 2100–2106, IEEE, 2013.
 [6] Y. Lu and D. Song, “Robust RGB-D odometry using point and line features,” in IEEE International Conference on Computer Vision (ICCV), 2015.
 [7] T. Whelan, H. Johannsson, M. Kaess, J. J. Leonard, and J. McDonald, “Robust real-time visual odometry for dense RGB-D mapping,” in IEEE International Conference on Robotics and Automation (ICRA), 2013.
 [8] J. Engel, J. Sturm, and D. Cremers, “Semi-dense visual odometry for a monocular camera,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1449–1456, 2013.
 [9] C. Forster, M. Pizzoli, and D. Scaramuzza, “SVO: Fast semi-direct monocular visual odometry,” in IEEE International Conference on Robotics and Automation (ICRA), 2014.
 [10] J. Engel, J. Stueckler, and D. Cremers, “Large-scale direct SLAM with stereo cameras,” in International Conference on Intelligent Robots and Systems (IROS), 2015.
 [11] R. Mur-Artal and J. D. Tardós, “ORB-SLAM2: An open-source SLAM system for monocular, stereo and RGB-D cameras,” arXiv preprint arXiv:1610.06475, 2016.
 [12] R. Gomez-Ojeda, J. Briales, and J. González-Jiménez, “PL-SVO: Semi-direct monocular visual odometry by combining points and line segments,” in International Conference on Intelligent Robots and Systems (IROS), 2016.
 [13] Y. Taguchi, Y.-D. Jian, S. Ramalingam, and C. Feng, “Point-plane SLAM for hand-held 3D sensors,” in IEEE International Conference on Robotics and Automation (ICRA), pp. 5182–5189, 2013.
 [14] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
 [15] R. Grompone von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall, “LSD: A fast line segment detector with a false detection control,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 4, pp. 722–732, 2010.
 [16] L. Zhang and R. Koch, “An efficient and robust line segment matching approach based on LBD descriptor and pairwise geometric consistency,” Journal of Visual Communication and Image Representation, vol. 24, no. 7, pp. 794–805, 2013.
 [17] Z. Lu, S. Baek, and S. Lee, “Robust 3D line extraction from stereo point clouds,” in IEEE Conference on Robotics, Automation and Mechatronics, 2008.
 [18] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of RGB-D SLAM systems,” in International Conference on Intelligent Robots and Systems (IROS), 2012.
 [19] D. Gutierrez-Gomez, W. Mayol-Cuevas, and J. J. Guerrero, “Dense RGB-D visual odometry using inverse depth,” Robotics and Autonomous Systems, vol. 75, pp. 571–583, 2016.