Robust Shape Registration using Fuzzy Correspondences
Shape registration is the process of aligning one 3D model to another. Most previous methods that align shapes without known correspondences solve for both the transformation and the correspondences iteratively. We present a shape registration approach that solves for the transformation using fuzzy correspondences to maximize the overlap between the given shape and the target shape. A coarse-to-fine approach with the Levenberg-Marquardt method is used for optimization. Real and synthetic experiments show that our approach is robust and outperforms other state-of-the-art methods when point clouds are noisy, sparse, and have non-uniform density. Experiments also show that our method is more robust to initialization and can handle larger scale changes and rotations than other methods. We also show that the approach can be used for 2D-3D alignment via ray-point alignment.
Registration is a common problem in many 3D tasks and has wide-ranging applications in scene understanding, modeling, robotics, and many other research areas. The goal is to find the transformation that aligns two objects. In this paper we focus on model fitting, where we align a given partial point set to a model represented as a point set or a mesh. While there are many approaches to registration, they often perform poorly when the source and target models differ substantially due to scale changes, noise, outliers, and missing data.
We propose a novel registration method that solves for the transformation that maximizes the proximity of a given shape to a target shape while also maximizing coverage of the target shape. Proximity measures the distance between the two shapes. Coverage is the area of the target shape that corresponds to the given shape. This approach does not require correspondences, is robust to scale changes and large transformations, and works well on non-uniformly distributed points, making it well suited to many real-world applications. A sample of alignment results using our method is shown in Figure 1. We demonstrate our technique on a variety of 3D benchmarks, including a standard registration dataset, a dataset of low-resolution, non-uniform, and noisy LiDAR scans with corresponding CAD models, and a series of 3D to 2D alignment experiments. The technique outperforms traditional registration approaches on standard 3D alignment, and we demonstrate its wide-ranging applicability by using it for 3D to 2D projective alignment to calibrate a camera system and a LiDAR.
The paper is organized as follows. Section 1.1 describes related registration algorithms. Section 2 describes our energy function, the proximity and coverage metrics, and fuzzy correspondences. Section 2.2 describes 2D to 3D alignment through point-to-ray registration. Section 3 describes our experiments, including a direct comparison to relevant methods on the Automatic Correspondence and Registration of Range Images for 3D Modeling dataset, the Ford Campus dataset, and 2D-3D alignment on our own dataset. Finally, we conclude the paper in Section 4.
1.1 Related Works
3D registration is a well-studied problem with many different approaches; we direct the reader to the survey literature for a more comprehensive review. ICP [6, 2] is the most widely used method for rigid registration of point sets. It is an iterative method that alternates between two steps: (i) computing correspondences with the transformation fixed and (ii) solving for the transformation with the correspondences fixed. The closest points between the point sets are assigned as correspondences. ICP works well when the transformation between the point sets is small and the data has low noise, few outliers, and uniform point density. Many variants have been proposed to overcome the limitations of ICP [21, 19, 9, 11, 13]: one minimizes a point-to-plane distance, another uses a hierarchical approach with a heuristic closest-point search to speed up ICP, and another samples geometrically stable points from which to solve for the transformation.
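The two-step alternation of ICP can be sketched as follows. This is a minimal point-to-point version in NumPy, assuming brute-force closest-point search and the standard SVD (Kabsch) solution for the rigid transform; the function names are ours, and the efficient variants cited above change one or both of these steps.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t with dst ~ src @ R.T + t (Kabsch / SVD solution)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, dst_c - R @ src_c

def icp(src, dst, iters=30):
    """Point-to-point ICP; returns the accumulated R, t mapping src onto dst."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # (i) correspondences with the transformation fixed: closest points
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matches = dst[d2.argmin(axis=1)]
        # (ii) transformation with the correspondences fixed
        R, t = best_rigid_transform(cur, matches)
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

The closest-point step is exactly why ICP needs a good initialization: if the initial displacement exceeds roughly half the point spacing, many matches are wrong and the method can converge to a local minimum.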
To increase robustness to outliers and noise, soft assignment of correspondences based on Gaussian weights has been proposed, as has EM-ICP, a probabilistic approach that uses matches weighted by normalized Gaussian weights. Although these methods replace discrete matching with a continuous function, they still alternate between finding correspondences and solving for the transformation. LM-ICP instead uses a nonlinear optimization technique, the Levenberg-Marquardt method, to solve for both correspondences and transformation in the same step, replacing discrete matching with a kernel function of the distance transform.
Another line of work represents point sets continuously using a mixture model and minimizes the KL divergence between them, using an improved Euler scheme for numerical integration. While these methods are robust to noise and outliers, they remain best suited to point sets of similar scale. Myronenko and Song proposed Coherent Point Drift (CPD), a probabilistic approach for rigid and non-rigid registration. They represent the point set using a Gaussian mixture model (GMM) and force the GMM centroids to move coherently to preserve topology. They also derive a closed-form solution for the transformation within the EM framework.
Our approach to aligning shapes represented as point sets, without explicit correspondences, solves for the transformation of a given shape that maximizes its proximity to a model while maximizing its coverage of the model. In this paper, we solve for rigid and similarity transformations. We hypothesize that a given shape is aligned to a model if every point of the shape lies within a distance threshold of the model (proximity) and every point of the model lies within a distance threshold of the shape (coverage). When the two point sets are identical, this amounts to a mapping between them. In most real-world scenarios, however, the sets are not identical: they differ in cardinality and point density, and the data is corrupted by noise, outliers, and missing regions. In such scenarios we propose to maximize proximity and coverage by minimizing the following energy function:
where the two terms are measures of proximity and coverage, respectively, and a weighting factor in the range [0, 1] balances them.
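To make the discrete measures concrete, the sketch below counts shape points within a threshold eps of the model (proximity) and model points within eps of the shape (coverage). The normalized combination alpha * (1 - proximity / |shape|) + (1 - alpha) * (1 - coverage / |model|) is an assumed form chosen for illustration, not necessarily the exact form of equation (1); all function names are hypothetical.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distances between every point of a and every point of b."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))

def discrete_proximity_coverage(shape, model, eps):
    d = pairwise_distances(shape, model)
    proximity = int(np.count_nonzero(d.min(axis=1) < eps))  # shape points near the model
    coverage = int(np.count_nonzero(d.min(axis=0) < eps))   # model points near the shape
    return proximity, coverage

def discrete_energy(shape, model, eps, alpha=0.5):
    """Assumed form: 0 when every point of both sets has a partner within eps."""
    p, c = discrete_proximity_coverage(shape, model, eps)
    return alpha * (1 - p / len(shape)) + (1 - alpha) * (1 - c / len(model))
```

For example, aligning the first half of a ten-point model exactly gives full proximity but half coverage, and shifting the shape by any amount that keeps every distance below eps leaves the energy unchanged. That flatness (zero gradient almost everywhere) is what prevents direct use of gradient-based optimizers.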
The discrete proximity and coverage functions do not allow equation (1) to be minimized with standard numerical nonlinear optimization methods. Note that, for a given distance threshold, minimizing equation (1) optimizes the mapping (correspondences) between the point sets rather than directly minimizing the distance between them, and the discrete function does not measure the quality of that mapping. Inspired by [18, 10, 8, 3], which use weighted matching of points to make the correspondence matrix continuous, we replace the discrete function with a summation of Gaussian functions. We define a fuzzy correspondence matrix between the two point sets as
Each matrix element gives a measure of the quality of the mapping between a point of the given shape and a point of the model, and the Gaussian scale here plays a role analogous to the distance threshold. Using the fuzzy correspondence matrix, we redefine proximity and coverage as
Since the cardinalities of the two sets are generally not equal (M is not a square matrix), the inner summation in equations (6) and (7) may bias the energy in equation (1) towards one term or the other, depending on which set is larger. Replacing the inner summation with an averaging operation eliminates this bias, but it can produce small values that cause numerical instability during optimization. We circumvent these issues by introducing normalized fuzzy correspondences: we define two normalized fuzzy correspondence matrices, one used to compute proximity and the other to compute coverage.
A constant in these definitions is set so that the gradients are well scaled for optimization; its value is fixed across our experiments. The Gaussian scale controls the capture range of the model: points outside the capture range contribute negligibly to the energy function. Therefore, a large scale makes the method robust to initialization, while a small scale makes it robust to outliers. We describe a multi-scale approach to registration in the following section.
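The sketch below illustrates fuzzy correspondences and one possible normalization. We assume a Gaussian of the form exp(-d^2 / (2 sigma^2)) and replace each inner sum s by the bounded ratio s / (s + K), with K = 0.1; both choices are our own assumptions for illustration rather than the paper's exact normalized matrices. The ratio approaches 1 once a point has any nearby partner, so a dense set no longer inflates its term, and K keeps the denominator away from zero.

```python
import numpy as np

def fuzzy_correspondences(shape, model, sigma):
    """M[i, j] = exp(-||shape_i - model_j||^2 / (2 sigma^2)), a value in [0, 1]."""
    d2 = ((shape[:, None, :] - model[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def raw_scores(shape, model, sigma):
    """Unnormalized inner sums; they grow with the density of the other set."""
    M = fuzzy_correspondences(shape, model, sigma)
    return M.sum(axis=1), M.sum(axis=0)

def normalized_energy(shape, model, sigma, alpha=0.5, K=0.1):
    """Hypothetical normalized energy in (0, 1]; K is an assumed constant."""
    row, col = raw_scores(shape, model, sigma)
    proximity = np.mean(row / (row + K))   # soft fraction of shape points near the model
    coverage = np.mean(col / (col + K))    # soft fraction of model points near the shape
    return alpha * (1.0 - proximity) + (1.0 - alpha) * (1.0 - coverage)
```

With this Gaussian, a point at distance 3 sigma contributes exp(-4.5), about 0.011, which is what "outside the capture range" means in practice: such points barely move the energy, so a large sigma lets distant points pull the shape in, and a small sigma lets far outliers be ignored.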
The energy function in equation (1) is minimized using the Levenberg-Marquardt (LM) method, which is designed for least-squares fitting problems. The energy function (1) can therefore also be written as a sum of residuals
where the residuals are collected into a single residual vector.
The error function (12) is minimized to solve for the transformation. The Jacobian required by the method is computed with respect to the transformation parameters, which differ for rigid and similarity transformations (the latter adds a scale factor). We use unit quaternions to represent rotation. To be robust to initialization, we start the optimization at a large Gaussian scale and reduce it iteratively. We scale the point sets so that they fit within a unit cube, and after every optimization step we reduce the scale by a constant factor (the initial and final scale values are set based on the application). Finally, the point sets are scaled back to the original scale of the target shape. We also use a multi-resolution approach in which we iteratively increase the cardinality of the two point sets. Initially, when the scale is large, a point in one set can be mapped to multiple points in the other. As the scale is reduced, the mapping moves towards a one-to-one mapping between the point sets, with outliers having no mapping. This builds an outlier rejection mechanism into the registration process while remaining robust to initialization.
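The coarse-to-fine loop can be sketched as below. Several choices here are our own assumptions, not the paper's exact formulation: the residuals are those of an illustrative energy, alpha times the mean of K / (s + K) over shape points plus (1 - alpha) times the same over model points, where s is a Gaussian row or column sum (K = 0.1); the Jacobian is approximated by forward differences; the parameter vector is [quaternion, translation, log-scale]; and sigma halves each stage from 0.2 until it falls below 0.02. Inputs are assumed to be already scaled into the unit cube, and the multi-resolution step is omitted.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a quaternion [w, x, y, z] (normalized here)."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def apply(params, pts):
    """Similarity transform; params = [qw, qx, qy, qz, tx, ty, tz, log_scale]."""
    return np.exp(params[7]) * pts @ quat_to_rot(params[:4]).T + params[4:7]

def residuals(params, source, target, sigma, alpha=0.5, K=0.1):
    """Residuals whose squared sum is the illustrative normalized energy."""
    moved = apply(params, source)
    d2 = ((moved[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    M = np.exp(-d2 / (2.0 * sigma ** 2))
    row, col = M.sum(axis=1), M.sum(axis=0)
    # K / (s + K) = 1 - s / (s + K): small when a point has a nearby partner
    r_prox = np.sqrt(alpha / len(source) * K / (row + K))
    r_cov = np.sqrt((1 - alpha) / len(target) * K / (col + K))
    return np.concatenate([r_prox, r_cov])

def levenberg_marquardt(fun, x, iters=30, lam=1e-3, h=1e-6):
    r = fun(x)
    for _ in range(iters):
        J = np.empty((r.size, x.size))
        for k in range(x.size):               # forward-difference Jacobian
            dx = np.zeros_like(x)
            dx[k] = h
            J[:, k] = (fun(x + dx) - r) / h
        A, g = J.T @ J, J.T @ r
        step = np.linalg.solve(A + lam * np.eye(x.size), -g)
        r_new = fun(x + step)
        if r_new @ r_new < r @ r:             # accept: move towards Gauss-Newton
            x, r, lam = x + step, r_new, lam * 0.5
        else:                                 # reject: move towards gradient descent
            lam *= 10.0
    return x

def register(source, target, sigma0=0.2, sigma_min=0.02, factor=0.5):
    """Optimize at a large sigma first, then shrink sigma by `factor` each stage."""
    x = np.array([1.0, 0, 0, 0, 0, 0, 0, 0])  # identity rotation, no shift, scale 1
    sigma = sigma0
    while sigma >= sigma_min:
        x = levenberg_marquardt(lambda p, s=sigma: residuals(p, source, target, s), x)
        sigma *= factor
    return x
```

Parameterizing scale as log_scale keeps it positive, and normalizing the quaternion inside quat_to_rot lets the optimizer work on an unconstrained 4-vector; the damping term regularizes the one direction (the quaternion's length) that does not change the residuals.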
2.2 2D-3D alignment
Here we extend the point set registration scheme to 2D to 3D alignment through point-to-ray registration. Given a set of 2D image points and a set of 3D scene points that represent objects in the scene, we wish to solve for the transformation that aligns the projection of the 3D points with the 2D image points. We formulate this as solving for the rigid transformation that aligns the 3D points to rays. Using the camera projection parameters, we compute from the 2D image points a set of normalized rays that originate at the center of projection and pass through the 2D image points on the image plane. Assuming that the center of projection is at the origin, we modify the fuzzy correspondence matrix by replacing the point-to-point distance with the point-to-ray distance.
Optimization of the alignment is then performed as in Section 2.1. In the experiments section we show the application of this method to aligning LiDAR points to image points.
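A minimal sketch of the point-to-ray distance, assuming unit ray directions and the center of projection at the origin as stated above. Clamping the projection at zero, so that a point behind the camera is measured to the origin rather than to the backward extension of the line, is our own choice; the Gaussian form of the fuzzy matrix is likewise an assumption, and the function names are hypothetical.

```python
import numpy as np

def unit(v):
    """Normalize vectors along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def point_to_ray_distances(points, rays):
    """(n_points, n_rays) distances from 3D points to unit rays leaving the origin.

    For a unit ray d and point x, the closest ray point is t * d with
    t = max(x . d, 0), so ||x - t d||^2 = ||x||^2 - 2 t (x . d) + t^2."""
    dot = points @ rays.T                          # signed length along each ray
    t = np.maximum(dot, 0.0)                       # a ray, not an infinite line
    sq = (points ** 2).sum(axis=1)[:, None] - 2.0 * t * dot + t ** 2
    return np.sqrt(np.maximum(sq, 0.0))            # guard tiny negative round-off

def ray_fuzzy_correspondences(points, rays, sigma):
    """Gaussian fuzzy correspondences using point-to-ray distances."""
    d = point_to_ray_distances(points, rays)
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))
```

Only the distance changes; the transformation is still applied to the 3D points, so the same optimization loop can be reused unchanged.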
The proposed approach requires that both shapes be represented as point sets. When a shape is given as a mesh with low vertex density, we voxelize it to obtain a point set. To voxelize a mesh, we initialize a voxel grid of the desired resolution within the bounding box of the shape and use triangle-cube intersection tests to find the voxels that intersect a face of the mesh. Voxels that intersect any face are set to 1 and the others are set to 0, giving a binary voxel representation of the surface of the shape. We then apply morphological closing to remove small gaps. To mask out voxels that are not part of the outer surface of the shape, we set to 0 all voxels that are not reachable from the edge of the volume. The centers of all voxels that remain set to 1 form the point set representation of the shape.
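A minimal sketch of the last two steps (masking out voxels that are not on the outer surface, then taking voxel centers), assuming the binary occupancy grid has already been produced by triangle-cube intersection and closing. We interpret "reachable from the edge of the volume" as follows: flood-fill the empty voxels from the grid boundary with 6-connectivity, then keep only occupied voxels that touch a reached empty voxel (or the grid border). This interpretation and all helper names are ours.

```python
from collections import deque
import numpy as np

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def exterior_empty(occ):
    """Empty voxels reachable from the edge of the volume through empty voxels."""
    last = np.array(occ.shape) - 1
    ext = np.zeros(occ.shape, dtype=bool)
    queue = deque()
    for idx in np.argwhere(~occ):
        if idx.min() == 0 or np.any(idx == last):   # seed: empty voxel on the edge
            ext[tuple(idx)] = True
            queue.append(tuple(int(i) for i in idx))
    while queue:                                     # breadth-first flood fill
        x, y, z = queue.popleft()
        for dx, dy, dz in STEPS:
            n = (x + dx, y + dy, z + dz)
            inside = all(0 <= n[k] < occ.shape[k] for k in range(3))
            if inside and not occ[n] and not ext[n]:
                ext[n] = True
                queue.append(n)
    return ext

def outer_surface_points(occ, origin, voxel_size):
    """Centers of occupied voxels that touch the exterior (6-neighborhood)."""
    padded = np.pad(exterior_empty(occ), 1, constant_values=True)  # beyond grid = outside
    touches = np.zeros(occ.shape, dtype=bool)
    for axis in range(3):
        for shift in (1, -1):
            touches |= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    keep = occ & touches
    return np.asarray(origin, dtype=float) + (np.argwhere(keep) + 0.5) * voxel_size
```

For a solid 3 x 3 x 3 block, this keeps the 26 outer voxels and drops the fully enclosed center; internal mesh parts, such as the seats in the CAD models discussed later, are dropped the same way because no empty path reaches them from the volume edge.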
To validate our method we conducted three sets of experiments on diverse data types. First, we compare our method directly to common registration schemes on the Automatic Correspondence and Registration of Range Images for 3D Modeling dataset [15, 14], a publicly available 3D dataset for object recognition, segmentation, and registration. Second, to demonstrate the efficacy of our approach on noisy and sparse data with scale differences, we use the Ford Campus Dataset, which features noisy LiDAR scans of a parking lot, and align these scans to high-quality CAD models from 3D Warehouse. Finally, to demonstrate the generality of our technique, we present results of 2D to 3D registration in the form of LiDAR to camera alignment. The result is an automatic or semi-automatic approach to LiDAR to camera calibration that does not require a calibration object.
3.1 Registration comparison
The Automatic Correspondence and Registration of Range Images for 3D Modeling dataset [15, 14] features four different 3D models in various scenes that have been scanned with a non-contact laser scanner. The dataset includes high-resolution scans of each object on its own, as well as scans of scenes containing combinations of the objects in different arrangements and poses. The dataset was developed for a number of tasks, including segmentation and detection, and the cluttered scenes come with ground truth alignment parameters, which allows us to validate our approach by registering the high-resolution target shape to the occluded or translated objects in the scanned scenes.
Since we are only concerned with registration and not segmentation or detection, we semi-automatically segment each shape out of the scenes containing multiple objects. Starting from an aligned shape, we rotate the source shape in steps of five degrees about each axis. We then align the centroids of both models, which introduces a translation because the target shape is incomplete. We compare our approach to a number of common registration schemes, including ICP [6, 2] with point-to-plane distance, CPD, efficient variants of ICP [21, 19, 9, 11, 13], ICP Fast [15, 14], and Finite-ICP, and compare the resulting alignment parameters against the ground truth by measuring the mean vertex-to-vertex distance.
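The evaluation protocol can be sketched as follows: rotate the aligned source about one axis, move its centroid onto the target's centroid, register, and score the result by the mean distance between corresponding vertices of the estimated and ground-truth poses. The helper names are hypothetical and the registration call itself is left out.

```python
import numpy as np

def axis_rotation(axis, degrees):
    """Rotation matrix about the x, y, or z axis."""
    a = np.deg2rad(degrees)
    c, s = np.cos(a), np.sin(a)
    mats = {
        "x": [[1, 0, 0], [0, c, -s], [0, s, c]],
        "y": [[c, 0, s], [0, 1, 0], [-s, 0, c]],
        "z": [[c, -s, 0], [s, c, 0], [0, 0, 1]],
    }
    return np.array(mats[axis], dtype=float)

def perturbed_start(source, target, axis, degrees):
    """Rotate the aligned source, then move its centroid onto the target's centroid."""
    rotated = source @ axis_rotation(axis, degrees).T
    return rotated - rotated.mean(axis=0) + target.mean(axis=0)

def mean_vertex_distance(estimated, ground_truth):
    """Mean Euclidean distance between corresponding vertices."""
    return float(np.linalg.norm(estimated - ground_truth, axis=1).mean())

angles = np.arange(0, 360, 5)   # five-degree steps about each axis
```

Because the target is a partial scan, its centroid differs from that of the complete model, so the centroid step deliberately leaves a residual translation for the registration to recover.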
Figure 2 shows a series of graphs for different models in the Range Images for 3D Modeling dataset, where we have introduced rotations about different axes in steps of five degrees. In these graphs our algorithm (in orange) remains stable over a wide range of rotations about different axes, and it consistently outperforms the other techniques in robustness to initialization. We have included a failure case in which our algorithm converges to a local minimum for one of the models about one of the axes (bottom right corner of Figure 2). In this case the target shape is more radially symmetric than the other objects in the scene. Additional results are shown in Figure 3 and Table 1.
3.2 LiDAR to CAD model Registration
One of the main applications of registration techniques is aligning high-quality CAD models to noisy sensor data for scene representation and robotics. Registering objects such as valves and door handles is a key component of tasks like those in the DARPA Robotics Challenge, and this task is complicated by noise in the sensor data. To illustrate the robustness of our method, we align CAD models to LiDAR scans. The LiDAR scans come from the Ford Campus Dataset [1, 17], which was captured with a Velodyne 3D LiDAR scanner. The CAD models were downloaded from the public 3D model repository 3D Warehouse and were produced by different individuals at different scales, resolutions, and levels of detail. These models are not uniform: some contain internal mesh components such as seats, and some have additional body components such as roof racks and larger wheels that may not reflect their real-world counterparts. These differences complicate the task of alignment.
To test our method, we manually identified vehicle models in the Ford Campus Dataset by inspecting the photographs provided with the LiDAR scans. We downloaded corresponding models from 3D Warehouse to the best of our ability (within 2-3 model years for the selected vehicles). We again segment the portion of the LiDAR scan that corresponds to the vehicle and run our registration method. As this data has no ground truth alignment parameters, we report the cloud-to-mesh distance computed with the CloudCompare utility. The downloaded models are at different scales (some in millimeters, some in feet and inches, etc.). Since the registration must handle scale, we compare only with CPD and fast CPD, as other ICP-based methods do not handle scale well when the data is incomplete and sparse. We also compare to CPD applied after the voxelization process described in Section 2.3. Results are shown in Table 2.
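| Vehicle mesh | Ours: mean dist | Ours: max dist | CPD: mean dist | CPD: max dist | CPD (Fast): mean dist | CPD (Fast): max dist | CPD + Voxelization: mean dist | CPD + Voxelization: max dist |
|---|---|---|---|---|---|---|---|---|
| Ford Focus 1 | 0.0598908 | 0.490146 | - | - | - | - | 0.069064 | 0.697694 |
| Ford Focus 2 | 0.0657001 | 0.526913 | - | - | - | - | 0.0649599 | 0.536582 |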
These results show that our method outperforms CPD in most circumstances: after applying our alignment we generally achieve a lower mean and maximum cloud-to-mesh distance. Figures 4 and 5 show the input point cloud, our alignment, and the alignment achieved by CPD. The mean cloud-to-mesh distance from our method is approximately the same as the 5 cm error range of the LiDAR used to capture the data.
3.3 LiDAR to Camera Alignment
We have extended our registration technique from 3D to 3D alignment to projective 2D to 3D alignment by ray-point registration. This extension allows for automatic and semi-automatic LiDAR to camera calibration and has applications in sensor fusion and 3D reconstruction. We demonstrate this technique by aligning and re-projecting LiDAR points onto both color and thermal images. We developed this technique to validate stereo reconstruction in two modalities, using LiDAR to generate ground truth.
We used a Trimble GX Advanced TLS LiDAR, which captures 5000 3D points per second with its ranging error specified at 50 m. The LiDAR is mounted adjacent to a color and thermal camera system. The color camera is a Point Grey Flea2 capturing at 1280 x 960. The thermal camera is a long-wave infrared Xenics Gobi 640-GigE, which captures at 640 x 480. The cameras have similar fields of view, and we scan an area slightly larger than what either camera sees. The cameras are calibrated with a previously published method that uses a heated, ceramic-backed calibration pattern, which allows both cameras to be calibrated simultaneously.
Calibrating the LiDAR to the camera is the process of identifying the transformation from the camera center to the coordinate center of the LiDAR. We treat this as registering 3D rays to 3D points and solve for the alignment using the technique outlined in Section 2.2. To do this, we generate a set of rough correspondences, either as points generated automatically from edges or as roughly drawn regions for semi-automatic registration.
To automatically generate rough correspondences, we use image edges and depth discontinuities. We project the 3D LiDAR points to a 2D depth image using the camera parameters. We then compute edge masks of both the depth image and a bilateral-filtered camera image using Canny edge detection. This process works particularly well on thermal images, because texture has little effect on thermal intensity, so most edges correspond to object boundaries and material changes. These edges are used to generate a set of 3D points and rays for alignment. 2D points in the camera edge map are used to create a set of rays that originate at the camera center and pass through the edge pixels on the image plane. The depth edge map is dilated, and all 3D points that project onto the dilated edge map are collected. These rays and 3D points are then used for registration. The resulting transformation allows us to project 3D points from the LiDAR onto the camera image and generate new depth images from the camera's perspective, as shown in Figure 6.
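The projection and back-projection steps can be sketched with a pinhole model. We assume the LiDAR points have already been moved into the camera frame by the current transformation estimate, that K is the 3 x 3 intrinsic matrix, and that lens distortion is ignored; the function names are hypothetical, and edge detection and dilation are not shown.

```python
import numpy as np

def project_to_depth_image(points_cam, K, width, height):
    """Sparse depth image from camera-frame points; nearest point wins per pixel."""
    depth = np.full((height, width), np.inf)
    front = points_cam[points_cam[:, 2] > 0]           # drop points behind the camera
    uvw = front @ K.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    for ui, vi, zi in zip(u[ok], v[ok], front[ok, 2]):
        depth[vi, ui] = min(depth[vi, ui], zi)         # simple z-buffer
    return depth

def pixels_to_rays(pixels, K):
    """Unit rays from the center of projection (origin) through pixels (u, v)."""
    homog = np.column_stack([pixels, np.ones(len(pixels))])
    dirs = homog @ np.linalg.inv(K).T
    return dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
```

Pixels that receive no LiDAR point stay at infinity, so depth discontinuities can be found only where the scan is dense enough; this is one reason the depth edge map is dilated before the 3D points are collected.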
In scenes with dense, high-contrast texture, the camera edge image can be very noisy, which can produce a large number of local minima. To address this, we extended the above method to include manual rough correspondences. We developed a small GUI that allows users to draw regions on both the camera image and the depth image to create rough correspondences. The hand-drawn regions need not cover exactly the same area, nor must the two images contain the same number of regions. Figure 8 shows the GUI, which allows users to highlight regions in the camera image and the depth image to generate rough correspondences.
To validate this approach, we measure the pixel distance from each reprojected 3D point to its closest labeled point in the 2D image. We report the errors over 5 different alignments as a histogram in Figure 7. The large majority of points have an error under 3 pixels. The outliers with large errors are caused by points that have no correspondence and (correctly) lie outside the image when projected.
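The error measure can be sketched as a nearest-neighbor distance in pixel space followed by a histogram. The bin edges and the 3-pixel threshold used in the summary below are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def reprojection_errors(projected_px, labeled_px):
    """Pixel distance from each reprojected point to its closest labeled 2D point."""
    d = np.linalg.norm(projected_px[:, None, :] - labeled_px[None, :, :], axis=2)
    return d.min(axis=1)

def summarize(errors, threshold=3.0, bins=(0, 1, 2, 3, 5, 10, 50)):
    """Fraction of points under `threshold` and histogram counts per bin."""
    counts, _ = np.histogram(errors, bins=bins)
    return float(np.mean(errors < threshold)), counts
```

Since each error is measured to the closest labeled point, a point with no true counterpart can still receive a small error if some unrelated labeled point happens to be nearby; the metric therefore rewards, but does not prove, a correct alignment.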
In this work we have presented a robust registration technique that does not require correspondences, effectively handles scale, and is robust to many transformations that cause traditional approaches to fail. The proposed approach, based on fuzzy correspondences, maximizes the overlap and proximity of points between the target and source shapes. We demonstrated this technique on data from public datasets and compared it against widely used approaches, which our technique outperforms in registration tasks. On noisy sensor data, we showed that our technique can align models at different scales to within a close range of the sensor's error margin. We also extended the technique to ray-point registration for LiDAR to camera alignment, and developed fully automatic and semi-automatic calibration approaches that do not require a calibration object and work across different image modalities.
-  S. Y. Bao, M. Bagra, Y.-W. Chao, and S. Savarese. Semantic structure from motion with points, regions, and objects. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2012.
-  P. J. Besl and N. D. McKay. A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256, Feb 1992.
-  D. Breitenreicher and C. Schnörr. Robust 3d object registration without explicit correspondence using geometric integration. Machine Vision and Applications, 21(5):601–611, 2010.
-  J. Canny. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell., 8(6):679–698, June 1986.
-  U. Castellani and A. Bartoli. 3d shape registration. In 3D Imaging, Analysis and Applications, pages 221–264. Springer, 2012.
-  Y. Chen and G. G. Medioni. Object modelling by registration of multiple range images. Image Vision Comput., 10(3):145–155, 1992.
-  EDF R&D, Telecom ParisTech. CloudCompare [GPL software], 2014. Retrieved from http://www.cloudcompare.org/.
-  A. W. Fitzgibbon. Robust registration of 2d and 3d point sets. Image and Vision Computing, 21(13):1145–1153, 2003.
-  N. Gelfand, S. Rusinkiewicz, L. Ikemoto, and M. Levoy. Geometrically stable sampling for the icp algorithm. In 3DIM, pages 260–267. IEEE Computer Society, 2003.
-  S. Granger and X. Pennec. Multi-scale em-icp: A fast and robust approach for surface registration. In Proceedings of the 7th European Conference on Computer Vision-Part IV, ECCV ’02, pages 418–432, London, UK, 2002. Springer-Verlag.
-  T. Jost and H. Hügli. A multi-resolution icp with heuristic closest point search for fast and robust 3d registration of range images. In 3DIM, pages 427–433. IEEE Computer Society, 2003.
-  D.-J. Kroon. Segmentation of the Mandibular Canal in Cone-Beam CT Data. PhD thesis, Universiteit Twente, 2011.
-  K.-L. Low. Linear least-squares optimization for point-to-plane icp surface registration. Chapel Hill, University of North Carolina, 4, 2004.
-  A. S. Mian, M. Bennamoun, and R. Owens. Three-dimensional model-based object recognition and segmentation in cluttered scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1584–1601, Oct 2006.
-  A. S. Mian, M. Bennamoun, and R. A. Owens. A novel representation and feature matching algorithm for automatic pairwise registration of range images. International Journal of Computer Vision, 66(1):19–40, 2006.
-  A. Myronenko and X. Song. Point set registration: Coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12):2262–2275, Dec 2010.
-  G. Pandey, J. R. Mcbride, and R. M. Eustice. Ford campus vision and lidar data set. Int. J. Rob. Res., 30(13):1543–1552, Nov. 2011.
-  A. Rangarajan, H. Chui, E. Mjolsness, S. Pappu, L. Davachi, P. Goldman-Rakic, and J. Duncan. A robust point-matching algorithm for autoradiograph alignment. Medical Image Analysis, 1(4):379 – 398, 1997.
-  S. Rusinkiewicz and M. Levoy. Efficient variants of the icp algorithm. In Proceedings of the Third Intl. Conf. on 3D Digital Imaging and Modeling, pages 145–152, 2001.
-  P. Saponaro, S. Sorensen, S. Rhein, and C. Kambhamettu. Improving calibration of thermal stereo cameras using heated calibration board. In Image Processing (ICIP), 2015 IEEE International Conference on, pages 4718–4722, Sept 2015.
-  T. Zinßer, J. Schmidt, and H. Niemann. A refined icp algorithm for robust 3-d correspondence estimation. In ICIP (2), pages 695–698, 2003.