Model-free Consensus Maximization for Non-Rigid Shapes

Thomas Probst1, Ajad Chhatkuli1, Danda Pani Paudel1, and Luc Van Gool1,2
1 Computer Vision Lab, ETH Zürich, Switzerland
2 VISICS, ESAT/PSI, KU Leuven, Belgium
email: {probstt,ajad.chhatkuli,paudel,vangool}

Many computer vision methods rely on consensus maximization to relate measurements containing outliers with a reliable transformation model. In the context of matching rigid shapes, this is typically done using Random Sample Consensus (RANSAC) to estimate an analytical model that agrees with the largest number of measurements, which constitute the inliers. However, such models are either not available or too complex for non-rigid shapes. In this paper, we formulate the model-free consensus maximization problem as an Integer Program in a graph using ‘rules’ on measurements. We then provide a method to solve such a formulation optimally using the Branch and Bound (BnB) paradigm. In the context of non-rigid shapes, we apply the method to filter out outlier 3D correspondences and achieve performance superior to the state-of-the-art. Our method works with outlier ratios as high as 80%. We further derive a similar formulation for 3D template to image correspondences. Our approach achieves similar or better performance compared to the state-of-the-art.

Acknowledgements. Research funded by the EU’s Horizon 2020 programme under grant agreement No 687757 – REPLICATE and supported by NVIDIA Corporation through the Academic Hardware Grant.

1 Introduction

Consensus maximization is a powerful tool in computer vision that has enabled practical applications of highly complex algorithms such as Structure-from-Motion (SfM) [1, 2, 3] to work despite outlying measurements and noise. Apart from heuristic strategies such as RANSAC [4], globally optimal consensus maximizers have been widely studied in the context of SfM with rigid shapes in [5, 6, 7, 8, 9, 10, 11], where the transformation model between two measurements exists. In contrast, such tools have not been explored in earnest for the model-free scenario. An important field where model-free approaches are needed is non-rigid shape registration. Consensus maximization in non-rigid shapes has applications in augmented reality, object animation and shape analysis.

A large number of works have tackled the non-rigid registration problem between images or shapes [12, 13, 14, 15, 16]. On the other hand, very little attention has been given to identifying outliers among a given set of matched correspondences. A few methods solve such problems in the images of non-rigid shapes [17, 18] and between a template shape and an image [19] through locally optimal approaches. The difficulty of assigning a suitable minimal parameter model to non-rigid transformations makes it highly challenging to devise a consensus maximizer. Furthermore, problem solving strategies that guarantee global optimality, such as Integer Programming and Branch-and-Bound (BnB), have not been explored for non-rigid shapes.

In this paper, we propose a common framework for seeking consensus in a model-free one-to-one correspondence set. Our key idea is that despite lacking a model which can explain each instance in a matching set individually, one can consider the agreement between two or more instances through certain rules to formulate constraints. In non-rigid shapes, a common rule widely applied for reconstruction and registration is the isometric deformation prior. Isometry implies that the geodesic distances on the surfaces do not change with deformations. Using these theoretical understandings, we provide our contributions in three different aspects. First, we show how a model-free consensus maximization problem can be posed as a graph problem and, more concretely, solved as an integer program if we have inlier/outlier rule priors on the matching sets. Such an integer program can be solved optimally using a BnB approach. Second, we apply this formulation to the problem of establishing inlier matches in non-rigid shape correspondences under the isometry prior. We show that our method can handle as many as 80% outlier correspondences on isometric surfaces. We provide extensive experiments on several isometric and partial shapes, as well as ‘loosely’ isometric partial inter-subject human datasets, where we obtain results that improve over state-of-the-art methods. Third, to show the generic nature of the introduced consensus maximizer, we also formulate a 3D template-to-image outlier removal problem using the piecewise rigidity and smoothness prior. We conduct extensive experiments in order to analyse the behaviour of the proposed algorithms and to compare with state-of-the-art methods.

2 Related Work

We briefly highlight the works that are relevant to the non-rigid registration problems. The first problem we tackle is that of maximizing consensus between matched 3D surface points in non-rigid 3D shapes using the isometry prior. This is a widely used prior in registration [14, 20, 21, 15, 22] as well as 3D reconstruction [23, 24]. Most non-rigid shape registration methods [20, 21, 15, 22] start with a 3D descriptor such as the SHOT descriptor [25] or heat kernels [26] and establish correspondences between shapes through some energy minimization. Others compute the registration directly through conformal maps [27, 14]. This process in general results in some good matches and some outlying matches. In our work, we are interested in whether the outlying matches given by state-of-the-art feature matching methods can be removed in both easy and difficult cases, including complete, partial and inter-subject scenarios.

3D template-to-image matching is yet another important problem in non-rigid shapes that can be used for camera localization [28] or template-based 3D reconstruction [29, 24, 30, 19]. Eliminating outlier matches in such cases is addressed in [19] by using a local iterative approach. Other methods which solve image registration [12, 18] do not use a 3D geometric prior explicitly. We address the problem of outlier removal with consensus maximization in this setting using the piece-wise rigidity and smoothness prior. A recent method [16] solves the combinatorial matching problem with the same constraints but does not focus on the problem of identifying outlying matches.

3 Background and Theory

3.1 Notations

We represent sets and graphs as special Latin characters. Members of the sets are written as lowercase Latin letters. We use lowercase Latin letters to represent indices or sets of indices. We also write known or unknown scalars in lowercase Latin letters. We use uppercase bold Latin letters to represent matrices and lowercase bold Latin letters to represent vectors. We use lowercase Greek letters to represent thresholds and uppercase Greek letters to represent mappings or functions. We use ‖·‖ to denote the ℓ2-norm and |·| to denote the ℓ1-norm of a vector or the cardinality of a set. Unless stated otherwise, we write primed letters to represent quantities related to the transformed set.

3.2 Outliers

Consider a bijective map between two sets of equal cardinality. The map induces an element-to-element matching such that each member of the first set is mapped to one member of the second set. This defines a set of matches. In practice, the map may be a rigid or non-rigid transformation function (or a composition of such functions). The outlier set (1) is defined with the help of a distance function and a threshold:


In other words, a matched pair is an outlier if the distance between the image of its first member under the map and its corresponding second member is greater than the threshold. Note that the outlier set is a collection of outlier indices, where each index represents one matched pair.
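As a sketch, definition (1) can be written as follows. The symbols are chosen here for illustration, following the conventions of section 3.1, and are assumptions: $\Phi$ is the map, $p_i$ and $p'_i$ are the two members of the $i$-th match, $n$ is the number of matches, $\Delta$ is the distance function, $\epsilon$ is the threshold, and $\mathcal{O}$ is the outlier set.

```latex
\mathcal{O} \;=\; \bigl\{\, i \in \{1,\dots,n\} \;:\; \Delta\bigl(\Phi(p_i),\, p'_i\bigr) > \epsilon \,\bigr\}
\tag{1}
```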

3.3 Consensus Maximization

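Using the definition of outliers in (1), the problem of consensus maximization is defined as the minimization of the cardinality of the outlier set over the unknown map, such that every match outside the outlier set satisfies the distance threshold of (1). We refer to this as problem (2).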


Problem (2) implies that we wish to find the mapping which results in the least number of disagreements, given by the cardinality of the outlier set, in the given matching set. In rigid SfM related problems, the map can often be expressed as a linear or non-linear function of a small, fixed number of parameters. This means that (1) can be evaluated point-wise and that the map can be estimated from very small sets of point correspondences, known as minimal sets. (In some cases, such as that of the Fundamental Matrix, the map cannot be determined point-wise, but it can still be estimated from a minimal set, so a RANSAC problem can be formulated.) Such problems can be solved in practice using RANSAC or the globally optimal methods highlighted in section 1. However, even when the map can be parameterized, problem (2) was very recently shown to be NP-hard and W[1]-hard [31, 32], meaning that solving it optimally is very expensive. We refer to the problem in which the map can be parameterized (with a reasonable number of parameters) as model-based consensus maximization. In the sections below we focus on the model-free problem. Note that most formulations of consensus maximization are written as a maximization of the inlier set cardinality rather than a minimization of the outlier set cardinality. However, these definitions are equivalent, and we choose the minimization of the outlier set cardinality for convenience.
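For reference, problem (2) can be sketched in symbols as below. The notation is an assumption made for illustration ($\Phi$ for the map, $p_i$ and $p'_i$ for the members of the $i$-th match, $\Delta$ for the distance function, $\epsilon$ for the threshold, $\mathcal{O}$ for the outlier set); it follows the conventions of section 3.1 but is not guaranteed to match the original symbols.

```latex
\min_{\Phi,\ \mathcal{O}} \; |\mathcal{O}|
\quad \text{s.t.} \quad
\Delta\bigl(\Phi(p_i),\, p'_i\bigr) \le \epsilon
\quad \forall\, i \notin \mathcal{O}
\tag{2}
```

Minimizing $|\mathcal{O}|$ is the same as maximizing the number of inliers $n - |\mathcal{O}|$, where $n$ is the number of matches, which is why the two views of the problem are equivalent.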

3.4 Generic Rules-based Consensus Maximization

In contrast to the case of rigid structures, for many problems such as those related to non-rigid shapes, the map cannot be represented with a small number of parameters and therefore cannot be estimated using a minimal point set. As a consequence, the outlier definition (1) cannot be evaluated point-wise. For example, consider the case when the map relates two instances of a non-rigid surface. Such a map may be represented by Free-Form Deformation (FFD) [33, 18] or specialized latent space models such as SMPL [34] for the human body, which require a large number of points to fit the latent parameters.

Problem (2) is difficult to solve in its original form for the model-free case. Therefore, we offer an alternative consensus maximization formulation which is easier to solve for a special class of problems. A problem belongs to this special class if the two sets have a common underlying structure which can be measured using subsets of the match set. To obtain a tractable formulation, we define a set of binary variables, one per match, such that a variable equals 1 if the corresponding match is an outlier and 0 otherwise. Let a binary-valued function measure the agreement between two small subsets of matches. It evaluates to 1 if the two subsets agree up to some threshold and to 0 otherwise. Then the following problem (3) is an alternative to the original problem (2): minimize the number of matches labeled as outliers, subject to the constraint that whenever two subsets disagree, at least one match in their union is labeled as an outlier.


The agreement function can be thought of as a rule which uses priors on the two sets to measure the agreement on the match subsets. The subsets sampled from the match set are the minimal sets on which the agreement function can be evaluated. Problem (3) simply means that, in case two subsets chosen on the basis of some prior do not agree with each other, at least one member from the union of those subsets must be an outlier. This is the key idea of our work. It should be noted here that although solving problem (3) optimally does not guarantee an optimal solution for problem (2), the former is an alternative formulation of the latter. Therefore, solving problem (3) amounts to solving the model-free consensus maximization. However, problem (3) is still a combinatorial problem and is NP-hard. In the next section we give more insight into the problem using a graph structure and provide a globally optimal method to solve it with integer programming. We then show how such a formulation can be applied to non-rigid shape problems.
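One way to write problem (3) in symbols is sketched below. The notation is assumed for illustration: $z_i \in \{0,1\}$ is the binary variable of match $i$ (1 for an outlier), $n$ is the number of matches, $s_j$ and $s_k$ are two sampled subsets of match indices, and $\Psi(s_j, s_k) \in \{0,1\}$ is the agreement rule.

```latex
\min_{z_1,\dots,z_n} \; \sum_{i=1}^{n} z_i
\quad \text{s.t.} \quad
\sum_{i \,\in\, s_j \cup s_k} z_i \;\ge\; 1 - \Psi(s_j, s_k)
\quad \text{for all sampled pairs } (s_j, s_k),
\qquad z_i \in \{0, 1\}
\tag{3}
```

When the two subsets agree, $\Psi = 1$ and the constraint is trivially satisfied; when they disagree, $\Psi = 0$ and at least one match in the union must be labeled as an outlier.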

4 Consensus Maximization with a Graph

In section 3.4, we introduced pairs of match subsets to formulate the agreement function. For clarity, we represent the collection of all sampled subsets and the agreement function as a graph. The nodes of the graph are all the unique sampled subsets, and the edges connect the pairs of nodes on which the agreement function is evaluated. We use one index to denote the nodes and another to denote the edges. Figure 1 illustrates this representation of the problem.

Figure 1: Graph formulation for consensus maximization. The selected point sets are drawn as apricot and purple nodes in the graph, connected by edges representing the compatibility between the sets. The point clouds are taken from [35].

4.1 Graph Formulation

Given the graph, we would still like to compute the original set of binary variables. We can think of these variables as integer attributes of the graph nodes. For compactness and simplicity, we use each node index to refer to the quantities of the corresponding minimal subset in the original points. For example, the binary variable set of a node contains the binary variables of the matches in its subset. Similarly, we define the binary variable set of an edge as the union of the binary variable sets of its two nodes. The constraint on the binary variables can then be compactly expressed as constraint (4): for every edge, the sum of all the elements in its binary variable set must be at least one minus the value of the agreement function on that edge.

Problem (3) with constraint (4) is an example of graph optimization where we need to compute the node properties for each node using all the edge measurements.
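To make constraint (4) concrete, the following minimal sketch checks it for a candidate labelling and finds a smallest outlier set by exhaustive enumeration on a toy graph. It is illustrative only: the function names and the edge representation (two tuples of match indices plus a 0/1 agreement value) are assumptions, and enumeration is only viable for a handful of matches.

```python
from itertools import product


def satisfies_constraints(z, edges):
    """Return True if every edge fulfils constraint (4).

    z     -- sequence of 0/1 labels; z[i] == 1 marks match i as an outlier
    edges -- iterable of (node_a, node_b, agree), where each node is a tuple
             of match indices and agree is the 0/1 value of the rule
    """
    for node_a, node_b, agree in edges:
        members = set(node_a) | set(node_b)
        # A disagreeing edge (agree == 0) needs at least one outlier.
        if sum(z[i] for i in members) < 1 - agree:
            return False
    return True


def brute_force_min_outliers(n_matches, edges):
    """Enumerate all labellings and keep a feasible one with the fewest outliers."""
    best = None
    for z in product((0, 1), repeat=n_matches):
        if satisfies_constraints(z, edges) and (best is None or sum(z) < sum(best)):
            best = z
    return list(best)


# Toy graph: matches 0, 1, 2 agree with each other, match 3 agrees with none.
toy_edges = [
    ((0,), (1,), 1), ((0,), (2,), 1), ((1,), (2,), 1),
    ((0,), (3,), 0), ((1,), (3,), 0), ((2,), (3,), 0),
]
print(brute_force_min_outliers(4, toy_edges))  # [0, 0, 0, 1]
```

In this toy case the only labelling with a single outlier that satisfies all three disagreeing edges is the one that removes match 3.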

4.2 Integer Programming

Using constraint (4), we propose an efficient way to solve the consensus maximization problem under the framework of Integer Programming. Problem (5) minimizes the sum of all binary variables subject to constraint (4) on every edge, with each variable restricted to the values 0 and 1.

Problem (5) can be efficiently solved using any off-the-shelf solver for Integer Programming. This is done using the popular BnB method. Often, problems in consensus maximization are solved with integer programming using the so-called big-M method [36]. Such a formulation of (5) is needed when a binary decision function cannot be defined for a given edge. In that case, the integer inequality in problem (5) is written using a scalar-valued function, a scalar threshold and a constant M. Here, M is a chosen large scalar that keeps the problem feasible when the value of the scalar function is large. However, in this work we consider only those problems that can be expressed with a binary rule. In the next section, we describe two different problems in non-rigid shapes which can be expressed in the form of problem (5).
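The following is a small hand-written branch-and-bound sketch for a problem of the form (5), given only the disagreeing edges. It is not the off-the-shelf solver used in the experiments; the function name, the representation of each disagreeing edge as a set of match indices, and the simple packing-based lower bound are all assumptions made for illustration.

```python
def min_outliers_bnb(n_matches, conflicts):
    """Minimise the number of outliers so that every conflict contains one.

    conflicts -- list of non-empty sets of match indices; each set is the
                 union of the two nodes of an edge whose rule evaluated to 0
    Returns a 0/1 list z with z[i] == 1 for matches labeled as outliers.
    """
    conflicts = [frozenset(c) for c in conflicts]
    best = {"cost": n_matches + 1, "outliers": None}  # incumbent (upper bound)

    def lower_bound(chosen, open_conflicts):
        # Pairwise-disjoint open conflicts each need their own outlier.
        used, extra = set(), 0
        for c in open_conflicts:
            if used.isdisjoint(c):
                used |= c
                extra += 1
        return len(chosen) + extra

    def branch(chosen):
        open_conflicts = [c for c in conflicts if chosen.isdisjoint(c)]
        if not open_conflicts:  # feasible labelling reached
            if len(chosen) < best["cost"]:
                best["cost"], best["outliers"] = len(chosen), set(chosen)
            return
        if lower_bound(chosen, open_conflicts) >= best["cost"]:
            return  # prune: this branch cannot beat the incumbent
        pivot = min(open_conflicts, key=len)
        for i in sorted(pivot):  # some member of the pivot must be an outlier
            branch(chosen | {i})

    branch(frozenset())
    return [1 if i in best["outliers"] else 0 for i in range(n_matches)]


print(min_outliers_bnb(4, [{0, 3}, {1, 3}, {2, 3}]))  # [0, 0, 0, 1]
```

The search stops exploring a branch as soon as its lower bound reaches the best cost found so far, which is the pruning idea behind BnB; production solvers use much stronger bounds and branching rules.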

Relaxed alternatives and BnB.

Problems of integer programming with binary or integer variables are non-convex in nature. Such problems can be solved with graph cuts by further relaxing the binary variable constraint with real bounds. In contrast to the relaxed framework, we opt for the BnB approach in order to obtain a globally optimal solution even in the case of high outlier ratios. Such an approach computes the lower and upper bounds of the cost iteratively and terminates with a certificate of optimality when they are equal. We compare these two approaches in the experiments of section 6.

5 Non-Rigid Shapes

Non-rigid objects have deformations that cannot be parameterized with a small fixed set of parameters. It is also impractical to design a point-wise function as in problem (2). Nevertheless, non-rigid objects do obey some shape priors, either strictly or loosely. One such prior is the isometric deformation. In an isometric deformation, the geodesic distance between any pair of points is preserved despite the change in shape.

5.1 Shape Matching with Isometry

We consider two different shapes related by an unknown isometric deformation. We want to establish the set of outlier points using the deformation prior on the matching set. Such problems may arise, for example, when registering 3D non-rigid surfaces using image matches [28] or when registering different shapes with a 3D feature point descriptor [25, 26]. Our goal is to filter the image or 3D point matches using the 3D surfaces. For that purpose, we propose the following graph attributes and agreement function (6): each node is a single match, and for an edge between two matches the rule evaluates to 1 if the geodesic distance between the two points on the first surface agrees, up to a threshold, with the geodesic distance between their matched points on the second surface, and to 0 otherwise.

The geodesic distances are measured on the respective surfaces between the two points. Constraint (6) expresses the agreement of geodesics on the two surfaces as a graph. Each node of the graph consists of only one matching pair. This means that each constraint obtained from an edge involves only two binary variables. This makes the problem sparse and relatively easy to solve. One could also solve such a problem by relaxing each binary variable to a bounded real number that is thresholded to obtain the final result after optimization. As we will see in the results section, such a relaxation can work fairly well for low to moderate percentages of outliers but fails when the number of outliers increases. Although we only show the problem formulation using isometry, other deformation priors such as conformality can be used in problem (6).
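The isometry rule can be sketched as below, assuming the geodesic distances have already been computed and stored in two matrices indexed by match number. Comparing the absolute difference against a threshold `eps` is an assumption made here for illustration, as are all function and variable names.

```python
from itertools import combinations


def isometry_agreement(geo_src, geo_dst, i, j, eps):
    """Return 1 if matches i and j preserve their geodesic distance up to eps."""
    return 1 if abs(geo_src[i][j] - geo_dst[i][j]) <= eps else 0


def isometry_conflicts(geo_src, geo_dst, pairs, eps):
    """Collect the edges whose rule evaluates to 0, as sets of two match indices."""
    return [{i, j} for i, j in pairs
            if isometry_agreement(geo_src, geo_dst, i, j, eps) == 0]


# Toy data: geodesic distances between four matched points on each surface.
# Matches 0, 1, 2 preserve their mutual distances; match 3 does not.
geo_src = [[0.0, 1.0, 2.0, 3.0],
           [1.0, 0.0, 1.0, 2.0],
           [2.0, 1.0, 0.0, 1.0],
           [3.0, 2.0, 1.0, 0.0]]
geo_dst = [[0.0, 1.0, 2.0, 1.0],
           [1.0, 0.0, 1.0, 0.5],
           [2.0, 1.0, 0.0, 2.0],
           [1.0, 0.5, 2.0, 0.0]]
pairs = list(combinations(range(4), 2))
conflicts = isometry_conflicts(geo_src, geo_dst, pairs, eps=0.1)
```

Each returned set corresponds to one inequality with two binary variables, which is what makes the resulting integer program sparse.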

Practical considerations.

While the method works perfectly for isometric surfaces, objects which are undergoing topological changes such as a tearing piece of paper or a human body crossing arms would pose additional difficulty. This is because the geodesic distances are not preserved between all the points. In such cases, one should use the local isometric property and evaluate only close neighbors as explained in algorithm 1.

  1. Cluster the initial matches into disjoint clusters.
  2. For each cluster:
     (a) Compute nearest neighbors and establish edges.
     (b) For each edge, compute the agreement function.
     (c) Formulate constraints (6) for the cluster.
  3. Aggregate the results from all clusters.
Algorithm 1: shapeRegistration

The algorithm also addresses the non-linear time complexity of the integer programming problem. This is done by first grouping the matches into separate clusters of points based on neighborhood in both shapes, and then applying the method to each cluster separately. This further allows us to use the method on densely sampled surfaces, as the time complexity is linear in the number of clusters. In order to have geodesics that are accurate enough to evaluate the agreement function, we compute the geodesics by subdivision of the mesh. The mesh is computed by a simple Delaunay triangulation of the 3D points.
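As a rough illustration of computing geodesics on a triangulated point set, the sketch below runs Dijkstra's algorithm over the mesh edges. This is a swapped-in simplification, not the subdivision-based computation described above: shortest paths along mesh edges overestimate true geodesics, and subdividing the mesh reduces that error. All names are assumptions.

```python
import heapq
import math


def mesh_edge_graph(vertices, triangles):
    """Build a weighted adjacency map from 3D vertices and index triples."""
    graph = {v: {} for v in range(len(vertices))}
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            w = math.dist(vertices[u], vertices[v])  # Euclidean edge length
            graph[u][v] = w
            graph[v][u] = w
    return graph


def approx_geodesics(graph, source):
    """Dijkstra shortest-path lengths from source to every vertex."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


# Unit square split into two triangles along the diagonal 0-2.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
square_graph = mesh_edge_graph(square, [(0, 1, 2), (0, 2, 3)])
square_dist = approx_geodesics(square_graph, 0)
```

Running this once per matched point on each shape fills distance matrices of the kind needed to evaluate the isometry rule within a cluster.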

5.2 Image to Template Matching

Template-based reconstruction is a well-studied problem [24, 23, 30, 19] which relies upon having a set of matches between the template shape and the deformed shape projected onto an image. Again, matches established using feature descriptors may contain outliers, in which case the reconstruction obtained can be of poor quality. In such cases, we propose the use of piece-wise rigidity and surface smoothness as the prior to define the agreement function. Despite non-rigidity, surface smoothness has been successfully used in state-of-the-art template-based reconstruction methods [30, 19]. We use a similar approach by considering that the relative camera-to-object pose changes smoothly over the surface.

We build a pose graph over the surface, in which each node is a triangle of three matched points, and define the agreement function (7) as follows: for an edge between two neighboring nodes, the rule evaluates to 1 if the distance between the two estimated rotations and the difference between the two estimated translations are within their respective thresholds, and to 0 otherwise.

Here, the rotations are those of the absolute poses estimated from each of the two nodes for the given image, and the translations are the corresponding camera translations. A set-valued neighborhood function gives the neighboring nodes in the graph. The rule measures how consistently the pose has been estimated from each of the triangles. To that end, a distance function is used to measure the distance between two rotations. We use two hyperparameters to threshold the change in rotation and translation, respectively. Local rigidity and surface smoothness imply that the poses should also change smoothly. The absolute pose problem can be solved using any of the so-called PnP methods [37, 38, 39]. Many of these methods require four or more points. We consider only the minimal problem, which uses three non-collinear matched points and is also known as the P3P problem [37]. The solutions obtained with P3P have a 4-fold ambiguity. This can be resolved either by using an additional matching point pair or by simply choosing the solution that minimizes the pose disagreement. The nodes are sampled such that each edge requires only four unique point matches, and therefore each inequality constraint consists of four binary variables.
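A possible form of the pose rule is sketched below, assuming the two poses have already been estimated (for instance with a P3P solver, which is not shown). The choice of the relative rotation angle as rotation distance, the Euclidean distance between translations, and the requirement that both thresholds hold are assumptions made for illustration; the names `eps_rot` and `eps_trans` are hypothetical.

```python
import math


def rotation_angle(R_a, R_b):
    """Angle (radians) of the relative rotation between two 3x3 rotation matrices."""
    # trace(R_a^T R_b) equals the sum of element-wise products.
    trace = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for safety
    return math.acos(cos_theta)


def pose_agreement(R_a, t_a, R_b, t_b, eps_rot, eps_trans):
    """Return 1 if two neighboring triangles yield similar camera poses."""
    rot_ok = rotation_angle(R_a, R_b) <= eps_rot
    trans_ok = math.dist(t_a, t_b) <= eps_trans
    return 1 if (rot_ok and trans_ok) else 0


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
quarter_turn_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

similar = pose_agreement(identity, (0.0, 0.0, 0.0),
                         identity, (0.0, 0.0, 0.01), eps_rot=0.1, eps_trans=0.05)
different = pose_agreement(identity, (0.0, 0.0, 0.0),
                           quarter_turn_z, (0.0, 0.0, 0.0), eps_rot=0.1, eps_trans=0.05)
```

An edge whose rule evaluates to 0 then contributes one inequality over the four unique matches of its two triangles.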

Practical considerations.

Piecewise rigidity is a stronger prior compared to isometry. For non-rigid shapes, this holds true only for close neighbors. Also, in contrast to the shape matching problem of section 5.1, each edge here requires four point matches instead of two. For that reason, this only works if the matching set is dense enough so that the points obey rigidity at least locally. Algorithm 2 describes the implementation of the method for the image-to-template matching problem.

  1. Cluster the initial matches into disjoint clusters.
  2. For each cluster:
     (a) Compute various triangulations of the point clusters and establish edges with two triangles.
     (b) For each such pair of triangles with a shared edge, evaluate the agreement function.
     (c) Formulate constraints (7) for the cluster.
  3. Aggregate the results from all clusters.
Algorithm 2: templateImageRegistration

A very naive simplification of algorithm 2 can be made by considering all points that produce a high number of 1’s in the agreement function to be inliers. We term such a voting method as local filtering which can find obvious inliers in the image-template matching problem.
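The voting idea behind local filtering could look like the following sketch. The vote threshold `min_votes` and the edge representation (a tuple of match indices plus the 0/1 rule value) are assumptions; the text above only states that points with a high number of 1's are kept.

```python
def local_filtering(n_matches, edges, min_votes):
    """Keep matches that take part in at least min_votes agreeing edges.

    edges -- iterable of (members, agree); members are the match indices
             touched by the edge, agree is the 0/1 value of the rule
    """
    votes = [0] * n_matches
    for members, agree in edges:
        if agree:
            for i in set(members):
                votes[i] += 1
    return [i for i in range(n_matches) if votes[i] >= min_votes]


# Toy example: two agreeing edges over matches 0..3, one disagreeing edge with match 4.
toy_votes_edges = [((0, 1, 2, 3), 1), ((1, 2, 3, 4), 0), ((0, 1, 2, 3), 1)]
kept = local_filtering(5, toy_votes_edges, min_votes=2)
```

Unlike the integer program, this vote count ignores the interaction between constraints, which is consistent with it finding only the obvious inliers.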

6 Experimental Results

We present the results and analysis of our proposed methods in this section using standard datasets of shapes and images. We refer to the integer program based method as exact or the proposed method. We also compare with the alternative method in which the binary variables of our integer problem (5) have been relaxed to real variables, which we refer to as relaxed or simply the relaxed method. We compare against and use several matching or outlier removal methods. We write the spline warp based image outlier removal method [18] as featds. We write the graph matching method [12] as maxpoolm. We test the image-template outlier removal method based on the mesh Laplacian [19] as laplacian. Apart from these image-based methods we also use shape matching methods. We write the recent method of deformable shape kernel matching [15] as KM. We write the deep functional map [22] as DFM and the blended intrinsic maps [14] as BIM. For all experiments we use a MATLAB implementation with YALMIP and the MOSEK solver for integer programming and linear programming. Below we describe in detail the experiments for each of the non-rigid registration problems we discussed in the paper.

6.1 Non-rigid Shape Matching

We begin by analyzing the behavior of the proposed method in a synthetic environment where ground truth correspondences are available. This contains two mocap generated cloth-capture datasets [35] with synthetically generated outliers. We then move to several datasets with real outlier statistics. We investigate the behavior on SIFT matches using a dataset obtained with KINECT [40] and the SfM generated hand deformation pair [41]. Furthermore, we conduct experiments on partial matching of 3D human body scans from the FAUST [42] dataset.

6.1.1 Mocap data.

We test with two cloth-capture datasets [35]. The datasets consist of a cloth falling (toss) and a moving pair of trousers (stepping trousers). The datasets are generated with mocap and consist of real points registered in all the frames. To evaluate our methods, we synthetically generate outliers by randomly re-assigning matches.

Figure 2 (a) compares the relaxed and exact versions of the proposed method. We observe that, for low outlier ratios, it is possible to remove all the outliers using the relaxed solution. However, the relaxed solution breaks down as the percentage of outliers increases beyond 50%, while the exact solution still correctly detects all inliers up to 80% of outliers. Note that the proposed method does not detect any false positive inliers. Figure 2 (b) shows how the exact method behaves with the number of iterations. We observe that the method quickly computes the upper bound cost, or the pessimistic outlier set, while it takes a while to obtain the certificate of optimality of the set. We find this behavior to be consistent across many other experimental setups. Figure 2 (c) shows the number of open nodes at each iteration, describing how BnB evaluates and prunes branches. To investigate time complexity, we also plot the execution time for the exact method in figure 2 (d). It can be observed that the execution time increases with the number of points. However, this is not a problem in practice thanks to the clustering framework presented in algorithm 1.

Figure 2: Analysis of our method. Number of inliers detected, convergence of the proposed method, and time taken for the mocap cloth dataset [35] under various setups. Panels: (a) exact vs. relaxed; (b) BnB convergence; (c) BnB open nodes (50% outliers); (d) run time of our method with increasing number of points and outlier percentage. Note that the number of iterations in (b) and (c) is in log-scale.

6.1.2 KINECT Newspaper dataset.

The RGB-D data obtained from depth-camera sensors such as KINECT make an important field of application for the method. We investigated our method on the Newspaper dataset [41] (the dataset was provided by the authors). It consists of a double-sided sheet of newspaper being torn in two parts. Figure 3 shows the inliers and outliers for a part of the template image with our method. Due to the local neighborhood computed using both point sets, the exact method can robustly handle the topological changes. On the other hand, due to the limited number of constraints from the local neighborhood, the relaxed method is unable to identify the outliers. The complete set of results is provided in the supplementary document.

Figure 3: Newspaper dataset. Visualization of inlier and outlier matches from our exact method and the two next best performing methods for an example pair of the Newspaper dataset. Panels: (a) exact; (b) relaxed; (c) laplacian. The left column shows the inlier detection and the right column shows the outlier detection.

6.1.3 Hand dataset.

The hand dataset [41] consists of two different instances of the deformations of a hand and their 3D ground truth obtained with SfM. We obtain correspondences via SIFT matching between two frames. Due to the non-rigid deformation, SIFT detects very few matches, with a large percentage (more than 70%) of outliers. Similarly, the shape matching methods [15, 14] completely fail on this dataset and we do not use them here. We show the results of our exact method in figure 4 and the other best performing methods in figure 5. These qualitative results clearly show that the relaxed method as well as the compared methods cannot correctly identify the outliers in such difficult cases.

Figure 4: SfM Hand dataset. Inlier detections (left) and outlier detections (right) of our exact method.
Figure 5: Inlier detections with laplacian (left) and relaxed (right).

6.1.4 Human body shapes.

In the next set of experiments, we use our method on human body scans from the FAUST [42] dataset. To introduce challenging outliers, we consider a partial matching scenario by cutting one arm and one leg from the mesh and matching it to the full one. Thanks to the mesh registrations provided by the dataset, we can exactly evaluate inliers and outliers based on geodesic deviations from the ground truth correspondences (deviations greater than 15 cm are considered as outliers). We compare our relaxed and exact methods against matches estimated by DFM [22], KM [15], and BIM [14]. Although BIM [14] produced visually good correspondences, it suffered from ambiguity issues that could not be resolved. Therefore we compare to BIM only where proper evaluations were possible.

Since our method is designed for isometric deformations, we conducted the first experiment in the intra-subject case (same subject in 9 different poses). We observe that our method can successfully eliminate more than 90% outliers produced by DFM as well as KM without removing a significant number of true inliers, as shown in the first column of Table 6.1.4.

In cross body shape matching applications, however, the isometry assumption only holds to some extent. Therefore, it is interesting to investigate the robustness of our approach by experimenting on non-isometric shape deformations. We used two challenging datasets to this end. The first one is an inter-subject matching on the FAUST data, again in the partial matching setting. Since the body shape varies across subjects, isometry does not hold anymore. We therefore employ slightly higher thresholds on the geodesics-based outlier rule. The results presented in the second column of Table 6.1.4 demonstrate that this problem is significantly harder than the isometric matching. We can see that matches provided by BIM contain outliers that are very hard to detect, and only 15% can be removed without sacrificing many inliers. For DFM and KM, we can reliably detect more than 90% and 80% of the outliers, respectively, and therefore improve the matching robustness for subsequent tasks.

Our third experiment with human body shapes involves dense correspondence estimation from a depth map to the 3D model. We rendered synthetic depth maps mimicking the projection and noise properties of KINECT from an articulated MPII Human Shape model [43] using variations of upright poses and body shapes. To compute the geodesics on this modality, we triangulated the point cloud using 2D Delaunay triangulation. Applying DFM and KM on the raw input does not work well, since SHOT [44] and HKS [45] are not reliable features for depth maps. We therefore took initial matches from a metric regression forest [46] trained on this specific task of dense correspondence estimation. We then compare our methods, KM and ICP on top of these matches in the third column of Table 6.1.4. We can conclude that, although provided with initial matches, KM fails to robustly match the two modalities. Our method, however, shows promising results even though the matching is non-isometric and geodesics are computed on the triangulated point cloud. Interestingly, our result is comparable to that of articulated non-rigid ICP, which exploits additional information such as the kinematics and a shape prior, but suffers from local minima. Fig. 6.1.4 shows one qualitative example from our test set.

In summary, we demonstrated that our method can be used on top of generic matching methods to robustly detect outliers for isometric deformations, and even in some classes of non-isometric registration such as inter-subject human body shapes. Moreover, we can confirm our results on synthetic data and conclude that even the relaxed version of our method provides similarly good results if the number of outliers is low, while being slightly faster than the exact method.

Method                intra-subject                  inter-subject                  from rendered depth map
                      Inliers / Outliers  Time [s]   Inliers / Outliers  Time [s]   Inliers / Outliers  Time [s]
BIM                   -                   -          3381 / 1602         3          -                   -
BIM+Ours (relaxed)    -                   -          3269 / 1362         10         -                   -
BIM+Ours (exact)      -                   -          3267 / 1395         32.9       -                   -
DFM                   4211 / 772          1          3756 / 1227         1          272 / 3728          1
DFM+Ours (relaxed)    3918 / 31           19         3437 / 93           15         -                   -
DFM+Ours (exact)      3918 / 31           24         3437 / 93           19.4       -                   -
KM                    4736 / 181          89         4051 / 860          92         572 / 3387          53
KM+Ours (relaxed)     4554 / 18           104        3634 / 161          107        -                   -
KM+Ours (exact)       4556 / 17           110        3634 / 161          115        -                   -
RF                    -                   -          -                   -          3220 / 780          1
RF+KM                 -                   -          -                   -          1162 / 269          3
RF+Ours (relaxed)     -                   -          -                   -          2800 / 137          14
RF+Ours (exact)       -                   -          -                   -          2800 / 137          15
RF+ICP                -                   -          -                   -          3166 / 159          301
Deformation type: mostly isometric (intra-subject); non-isometric (inter-subject, rendered depth map).

Table 6.1.4: Non-rigid 3D shape matching. Comparing methods on FAUST [42] intra- and inter-subject, as well as matching depth maps to the MPII HumanShape [43] model.
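The outlier removal percentages quoted in the text can be recomputed from the table entries with a few lines of arithmetic. The sketch below uses the inlier / outlier counts of the exact method; the helper names are hypothetical.

```python
def outlier_removal_rate(outliers_before, outliers_after):
    """Fraction of the initial outliers that the filter removed."""
    return (outliers_before - outliers_after) / outliers_before


def inlier_retention(inliers_before, inliers_after):
    """Fraction of the initial inliers that survived the filter."""
    return inliers_after / inliers_before


# (inliers, outliers) before and after filtering with the exact method.
rows = {
    "intra-subject DFM": ((4211, 772), (3918, 31)),
    "intra-subject KM": ((4736, 181), (4556, 17)),
    "inter-subject BIM": ((3381, 1602), (3267, 1395)),
    "inter-subject DFM": ((3756, 1227), (3437, 93)),
    "inter-subject KM": ((4051, 860), (3634, 161)),
}
for name, ((in0, out0), (in1, out1)) in rows.items():
    print(f"{name}: removed {outlier_removal_rate(out0, out1):.1%} of outliers, "
          f"kept {inlier_retention(in0, in1):.1%} of inliers")
```

With these counts, the removal rate is roughly 96% (DFM) and 91% (KM) in the intra-subject case, and roughly 92% (DFM), 81% (KM) and 13% (BIM) in the inter-subject case; the relaxed variant on BIM (1602 to 1362 outliers) gives about 15%.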


Figure 6.1.4: Qualitative results. Non-isometric shape matching from depth map. Left to right: body mesh model [43], RF [46], RF+KM [15], RF+Ours, RF+ICP, input depth map. Correspondences are color-coded; gray indicates removed matches.

6.2 Template to Image Matching

Matching a 3D template structure to an image is an important problem in non-rigid geometry. Most reconstruction methods [19, 30] are sensitive to outlying correspondences and proceed by first removing outliers from the matches. We use problem (7) to formulate the template to image outlier removal method with the help of piece-wise absolute pose. We test our method on three datasets: KINECTPaper [47], T-Shirt [48] and MPI Sintel [49], all of which contain ground-truth 3D data and images. The KINECTPaper dataset consists of VGA resolution images and depth of a large piece of paper smoothly deforming over time. The T-Shirt dataset consists of high-resolution wide-baseline images and 3D data of a deforming t-shirt. The Sintel data comes from an animated movie with ground-truth depth. We select a single pair for each dataset and compute the SIFT matches. We manually count the inlier and outlier matches in each method's output. We compare our exact method with three other state-of-the-art methods for outlier removal or point matching: laplacian, featds and maxpoolm. As discussed in section 5.2, we also report the results of the relaxed solution. We further report the results of a local-filtering method as another baseline, where the inliers are decided by local neighborhood voting.

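| Methods | Kinect: Inliers | Kinect: Time (s) | T-shirt: Inliers | T-shirt: Time (s) | Sintel: Inliers | Sintel: Time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| Local-filtering | 46/142 | 4.22 | 95/351 | 6.10 | 17/68 | 2.03 |
| Relaxed solution | 99/142 | 5.56 | 291/351 | 7.52 | 44/68 | 3.51 |
| exact | 114/142 | 7.59 | 309/351 | 9.66 | 53/68 | 5.01 |
| laplacian | 126/142 | 1.15 | 301/351 | 7.84 | 44/68 | 0.53 |
| featds | 76/142 | 3.93 | 304/351 | 1.46 | 42/68 | 0.32 |
| maxpoolm | 3/142 | 159.96 | 6/351 | 608.55 | 16/68 | 7.88 |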
Table 1: 3D template to image matching. Comparison on three different real datasets.
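The comparison in Table 1 can be summarized with a few lines of arithmetic. The sketch below stores each entry as the reported inlier count over the per-dataset total shown in the table (reading the second number as a total is an assumption), and looks up the method with the highest count per dataset. The names `table`, `inlier_fraction` and `best_method` are hypothetical helpers.

```python
# (reported inliers, per-dataset total) copied from Table 1.
table = {
    "Local-filtering":  {"Kinect": (46, 142),  "T-shirt": (95, 351),  "Sintel": (17, 68)},
    "Relaxed solution": {"Kinect": (99, 142),  "T-shirt": (291, 351), "Sintel": (44, 68)},
    "exact":            {"Kinect": (114, 142), "T-shirt": (309, 351), "Sintel": (53, 68)},
    "laplacian":        {"Kinect": (126, 142), "T-shirt": (301, 351), "Sintel": (44, 68)},
    "featds":           {"Kinect": (76, 142),  "T-shirt": (304, 351), "Sintel": (42, 68)},
    "maxpoolm":         {"Kinect": (3, 142),   "T-shirt": (6, 351),   "Sintel": (16, 68)},
}

def inlier_fraction(method, dataset):
    """Reported inliers divided by the per-dataset total."""
    found, total = table[method][dataset]
    return found / total

def best_method(dataset):
    """Method with the highest reported inlier count on a dataset."""
    return max(table, key=lambda method: table[method][dataset][0])

for dataset in ("Kinect", "T-shirt", "Sintel"):
    winner = best_method(dataset)
    print(f"{dataset}: {winner} ({inlier_fraction(winner, dataset):.1%})")
```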

We ran all the methods under their favorable conditions, and the reported inliers were manually validated. The results show that our method performs on par with the state-of-the-art method [19] designed for template-based outlier removal. Note that the exact method detects the most inliers on the T-shirt and Sintel datasets, while laplacian detects more on the Kinect dataset. Our method performs better than featds in multi-body situations, as featds uses a single spline-based warp and computes the residuals to identify outliers. We visualize the results of outlier removal in figure 6 for the proposed method and two other methods: featds and laplacian.

Figure 6: Inliers (left) vs. Outliers (right) for the T-shirt dataset using the exact method. The performance of our method is on average better than that of the two compared methods designed for non-rigid matching. More results are provided in the supplementary material.

7 Conclusion

In this paper we brought forward a theory of model-free consensus maximization using integer programming and its optimal solution with a Branch and Bound approach. We formulated two different registration problems using our consensus maximizer: isometric shape outlier removal and image-template outlier removal. We showed optimal outlier removal at up to 80% mismatches in non-rigid shape registration and 25% mismatches in image-template registration. These results were obtained by tackling a close relaxation of the original problem with optimality. We showed with extensive experiments that our method consistently performs on par with or better than existing methods. In the future, we plan to use the formulation with other priors suitable for non-rigid applications, such as conformal deformation and local affine invariance.


References

  • [1] Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision. Second edn. Cambridge University Press, ISBN: 0521540518 (2004)
  • [2] Longuet-Higgins, H.: A computer algorithm for reconstructing a scene from two projections. Nature 293 (1981) 133–135
  • [3] Nistér, D.: An efficient solution to the five-point relative pose problem. IEEE Trans. Pattern Anal. Mach. Intell. 26(6) (2004) 756–777
  • [4] Fischler, M.A., Bolles, R.C.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6) (1981) 381–395
  • [5] Chin, T.J., Kee, Y.H., Eriksson, A., Neumann, F.: Guaranteed outlier removal with mixed integer linear programs. In: CVPR. (2016)
  • [6] Speciale, P., Paudel, D.P., Oswald, M.R., Kroeger, T., Gool, L.V., Pollefeys, M.: Consensus maximization with linear matrix inequality constraints. In: CVPR. (2017)
  • [7] Bazin, J.C., Li, H., Kweon, I.S., Demonceaux, C., Vasseur, P., Ikeuchi, K.: A branch-and-bound approach to correspondence and grouping problems. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(7) (2013) 1565–1576
  • [8] Hartley, R.I., Kahl, F.: Global optimization through rotation space search. IJCV 82(1) (2009) 64–79
  • [9] Bazin, J.C., Seo, Y., Hartley, R.I., Pollefeys, M.: Globally optimal inlier set maximization with unknown rotation and focal length. In: ECCV. (2014)
  • [10] Li, H.: Consensus set maximization with guaranteed global optimality for robust geometry estimation. In: ICCV. (2009)
  • [11] Zheng, Y., Sugimoto, S., Okutomi, M.: Deterministically maximizing feasible subsystem for robust model fitting with unit norm constraint. In: CVPR. (2011)
  • [12] Cho, M., Sun, J., Duchenne, O., Ponce, J.: Finding matches in a haystack: A max-pooling strategy for graph matching in the presence of outliers. In: CVPR. (2013)
  • [13] Collins, T., Mesejo, P., Bartoli, A.: An analysis of errors in graph-based keypoint matching and proposed solutions. In: ECCV. (2014)
  • [14] Kim, V.G., Lipman, Y., Funkhouser, T.: Blended intrinsic maps. In: ACM Transactions on Graphics (TOG). Volume 30. (2011)  79
  • [15] Lähner, Z., Vestner, M., Boyarski, A., Litany, O., Slossberg, R., Remez, T., Rodolà, E., Bronstein, A.M., Bronstein, M.M., Kimmel, R., Cremers, D.: Efficient deformable shape correspondence via kernel matching. In: 3DV. (2017)
  • [16] Bernard, F., Schmidt, F.R., Thunberg, J., Cremers, D.: A combinatorial solution to non-rigid 3d shape-to-image matching. In: CVPR. (2017)
  • [17] Pilet, J., Lepetit, V., Fua, P.: Fast non-rigid surface detection, registration and realistic augmentation. International Journal of Computer Vision 76(2) (2008) 109–122
  • [18] Pizarro, D., Bartoli, A.: Feature-based deformable surface detection with self-occlusion reasoning. International Journal of Computer Vision 97(1) (2012) 54–70
  • [19] Ngo, T.D., Östlund, J.O., Fua, P.: Template-based monocular 3D shape recovery using laplacian meshes. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(1) (2016) 172–187
  • [20] Aflalo, Y., Dubrovina, A., Kimmel, R.: Spectral generalized multi-dimensional scaling. International Journal of Computer Vision 118(3) (2016) 380–392
  • [21] Vestner, M., Litman, R., Rodolà, E., Bronstein, A., Cremers, D.: Product manifold filter: Non-rigid shape correspondence via kernel density estimation in the product space. In: CVPR. (2017)
  • [22] Litany, O., Remez, T., Rodola, E., Bronstein, A.M., Bronstein, M.M.: Deep functional maps: Structured prediction for dense shape correspondence. In: ICCV. (2017)
  • [23] Salzmann, M., Fua, P.: Linear local models for monocular reconstruction of deformable surfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(5) (2011) 931–944
  • [24] Bartoli, A., Gérard, Y., Chadebecq, F., Collins, T., Pizarro, D.: Shape-from-template. IEEE Trans. Pattern Anal. Mach. Intell. 37(10) (2015) 2099–2118
  • [25] Salti, S., Tombari, F., Di Stefano, L.: Shot: Unique signatures of histograms for surface and texture description. Computer Vision and Image Understanding 125 (2014) 251–264
  • [26] Ovsjanikov, M., Mérigot, Q., Mémoli, F., Guibas, L.: One point isometric matching with the heat kernel. In: Computer Graphics Forum. (2010)
  • [27] Le, H., Chin, T.J., Suter, D.: Conformal surface alignment with optimal mobius search. In: CVPR. (2016)
  • [28] Innmann, M., Zollhöfer, M., Nießner, M., Theobalt, C., Stamminger, M.: Volumedeform: Real-time volumetric non-rigid reconstruction. In: ECCV. (2016)
  • [29] Wandt, B., Ackermann, H., Rosenhahn, B.: 3d reconstruction of human motion from monocular image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(8) (2016) 1505–1516
  • [30] Chhatkuli, A., Pizarro, D., Bartoli, A., Collins, T.: A stable analytical framework for isometric shape-from-template by surface integration. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(5) (2017) 833–850
  • [31] Chin, T.J., Suter, D.: The maximum consensus problem: recent algorithmic advances. Volume 7. Morgan & Claypool Publishers (2017)
  • [32] Chin, T.J., Cai, Z., Neumann, F.: Robust fitting in computer vision: Easy or hard? arXiv preprint arXiv:1802.06464 (2018)
  • [33] Brunet, F., Bartoli, A., Hartley, R.: Monocular template-based 3D surface reconstruction: Convex inextensible and nonconvex isometric methods. Computer Vision and Image Understanding 125 (2014) 138–154
  • [34] Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: A skinned multi-person linear model. ACM Trans. Graphics (Proc. SIGGRAPH Asia) 34(6) (2015) 248:1–248:16
  • [35] White, R., Crane, K., Forsyth, D.: Capturing and animating occluded cloth. In: SIGGRAPH. (2007)
  • [36] McCormick, G.P.: Computability of global solutions to factorable nonconvex programs: Part i—convex underestimating problems. Mathematical programming 10(1) (1976) 147–175
  • [37] Kneip, L., Li, H., Seo, Y.: Upnp: An optimal o(n) solution to the absolute pose problem with universal applicability. In Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., eds.: Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I. (2014)
  • [38] Lepetit, V., Moreno-Noguer, F., Fua, P.: Epnp: An accurate o(n) solution to the pnp problem. Int. J. Comput. Vision 81(2) (2009) 155–166
  • [39] Urban, S., Leitloff, J., Hinz, S.: Mlpnp - a real-time maximum likelihood solution to the perspective-n-point problem. In: ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences. Volume 3. (2016) 131–138
  • [40] Chhatkuli, A., Pizarro, D., Collins, T., Bartoli, A.: Inextensible non-rigid structure-from-motion by second-order cone programming. IEEE Transactions on Pattern Analysis and Machine Intelligence, pre-print (2017)
  • [41] Chhatkuli, A., Pizarro, D., Collins, T., Bartoli, A.: Inextensible non-rigid shape-from-motion by second-order cone programming. In: CVPR. (2016)
  • [42] Bogo, F., Romero, J., Loper, M., Black, M.J.: FAUST: Dataset and evaluation for 3D mesh registration. In: Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Piscataway, NJ, USA, IEEE (2014)
  • [43] Pishchulin, L., Wuhrer, S., Helten, T., Theobalt, C., Schiele, B.: Building statistical shape spaces for 3d human modeling. Pattern Recognition (2017)
  • [44] Tombari, F., Salti, S., Di Stefano, L.: Unique signatures of histograms for local surface description. In: Proc. ECCV. Volume 6313. (09 2010) 356–369
  • [45] Sun, J., Ovsjanikov, M., Guibas, L.: A concise and provably informative multi-scale signature based on heat diffusion. In: Comput. Graph. Forum. Volume 28. (07 2009) 1383–1392
  • [46] Pons-Moll, G., Taylor, J., Shotton, J., Hertzmann, A., Fitzgibbon, A.: Metric regression forests for correspondence estimation. International Journal of Computer Vision (2015) 1–13
  • [47] Varol, A., Salzmann, M., Fua, P., Urtasun, R.: A constrained latent variable model. In: CVPR. (2012)
  • [48] Chhatkuli, A., Pizarro, D., Bartoli, A.: Non-rigid shape-from-motion for isometric surfaces using infinitesimal planarity. In: BMVC. (2014)
  • [49] Butler, D.J., Wulff, J., Stanley, G.B., Black, M.J.: A naturalistic open source movie for optical flow evaluation. In: ECCV. (2012)