# Efficient Outlier Removal for Large Scale Global Structure-from-Motion

###### Abstract

This work addresses the outlier removal problem in large-scale global structure-from-motion. In such applications, global outlier removal is very useful to mitigate the deterioration caused by mismatches in the feature point matching step. Unlike existing outlier removal methods, we exploit the structure in multiview geometry problems to propose a dimension reduced formulation, based on which two methods have been developed. The first method considers a convex relaxed minimization and is solved by a single linear programming (LP), whilst the second one approximately solves the ideal minimization by an iteratively reweighted method. The dimension reduction results in a significant speedup of the new algorithms. Further, the iteratively reweighted method can significantly reduce the possibility of removing true inliers. Realistic multiview reconstruction experiments demonstrated that, compared with state-of-the-art algorithms, the new algorithms are much more efficient and meanwhile can give improved solution. Matlab code for reproducing the results is available at https://github.com/FWen/OUTLR.git.

## I Introduction

Robust model fitting is a fundamental problem in many computer vision applications. Important application examples include robust fundamental matrix estimation and homography matrix estimation, vision-based localization in robotics navigation, global outlier removal in structure from motion [1], [2].

Robust model fitting is also known as the maximum consensus problem. Given a set of measurements , an important problem arises in computer vision is to remove the outliers in this data set. This problem is referred to as the maximum consensus problem [3], which aims to find a model, parameterized by , that is consistent with as many of the input data as possible, i.e., has the largest consensus set as

(1) |

where is the inlier threshold, is the index set. For a solution with size , denotes the index set of the true inliers, and denotes the index set of the true outliers.

On the one hand, due to the intractability of the robust geometric fitting problem, global optimum can only be found by searching [25], which makes globally optimal algorithms only suitable for low-dimensional problems. On the other hand, the class of randomized hypothesize-and-verify algorithms are more efficient and popular, e.g., RANSAC [4] and its many variants [26]. Although such algorithms are efficient, they can usually produce only approximate solution and do not guarantee a good estimate due to their randomized nature. Very recently, methods fill the gap between these two classes have been proposed in [3], [24]. Such methods are much more efficient than globally optimal algorithms, while be able to achieve better solution than hypothesize-and-verify algorithms.

The main application interest of this work is large-scale multiview reconstruction. Typically, in such applications, feature points are first extracted from the images and matched between image pairs, then, the geometry is estimated by the matched corresponding points. The algorithms introduced above are usually used in the matching step to achieve robust matching. However, since the automatic matching is performed locally on a subset of the images, mismatches in longer point tracks often go undetected and may degrade the accuracy of the solution [19]. To address this problem, a post-processing step can be employed after automatic matching on the entire sequence, which aims to remove outliers globally.

Due to the high-dimensionality of the global outlier removal problem in practical multiview geometry, more efficient methods than [3]–[5], [24], [25] have to be used. The most efficient algorithm suitable for this problem is the method [6], which solves a convex relaxation of (1). Although there is no theoretical guarantee of success, it works very well in practice and can yield a significant drop in reprojection error.

In this work, we propose two algorithms which are more efficient than the method [6] and suitable for large-scale 3D reconstruction. We use a dimension reduced formulation to reduce the computational complexity. The first one solves a convex relaxed minimization by linear programming (LP). The second one approximately solves the ideal formulation by an iteratively reweighted algorithm. The proposed algorithms are not only much more efficient, but also can achieve better solution in practical 3D reconstruction.

## Ii Proposed Formulation and Connection to Existing Works

### Ii-a Proposed Formulation

Using a nonnegative auxiliary variable , , problem (1) can be recast into the following formulation

(2) |

where denotes the norm which counts the number of nonzero elements in . For a solution of (2), it holds , which is the index set of the outliers, and . denotes the support set (the index set of nonzero) of a vector. Equivalently, problem (2) can be expressed as the constrained minimization

(3) |

### Ii-B Multiview Geometry

The above formulations are designed for the linear regression residual , which can be extended to handle geometric residuals, e.g., in multiview reconstruction. In multiview geometry, the goal is to estimate the structure of the scene and the camera motion from image projections. For instance, let be a measurement in one of the images and be its corresponding 3D-point. Given the camera matrix , the squared projection error is

(6) |

where denotes the -th row of , denotes the translation of the camera. In the general framework, a reformulation of the squared error residuals is of the form

(7) |

In many geometry problems in computer vision, if either or is known, the residual formulation (6) can be rewritten in the form of (7). For example, for the triangulation problem, and are kept fixed, and denotes the position parameters of 3D-points. For the multiview reconstruction problem, both the positions of the 3D-points and the positions of the cameras are unknown. In this case, in (7) contains the parameters of the 3D-points and the camera translations, which is also called the known rotation problem [7].

To handle such geometric residuals, in addition to the Euclidian distance, the coordinate-wise max distance ( norm) and the absolute distance ( norm) are also popular due to their convenience. Specifically, a general formulation of the error residuals is of the form [7], [8]

where is quasi-convex for any [16]. With an inlier threshold , the constraint for outlier removal is

(8) |

where is naturally satisfied as .

For the norm, i.e., , the constraint becomes

which can be expressed as the following linear inequalities

Meanwhile, in practical applications, a depth constraint can be additionally considered as

where and denote the minimal and maximal depth, respectively. It can be converted into two linear constraints

In this case, in the constraint of (5).

### Ii-C Connection to Existing Works

A reformulation of (1) has been considered in [5] as

(9) |

where is a large positive constant. For a solution of (9), it is easy to see that , which implies the equivalence of the formulations (2), (3), (5) and (9) for solving the maximum consensus problem (1). To solve the maximum consensus problem exactly and efficiently, a guaranteed outlier removal (GORE) approach based on mixed integer linear programming has been proposed in [5] to reduce the runtime of exact algorithms. But it does not scale to high-dimensional problems, e.g., large-scale multiview reconstruction.

Very recently, deterministic approximate methods have been proposed in [3], [24]. These methods reformulate the consensus maximization problem with linear complementarity constraints, and employ the Frank-Wolfe optimization scheme and alternating direction method of multipliers (ADMM) to efficiently solve the reformulations. These algorithms are efficient and effective for low-dimensional problems such as fundamental matrix and homography estimation, but they still do not scale to large-scale multiview reconstruction problems.

It is popular to solve convex relaxed formulations of (1), e.g., approximation [6], [14], [15]. The most efficient method [6] considers a formulation as

(10) |

This minimization formulation can be viewed as a convex relaxation of (5), where the nonconvex norm is replaced by its convex envelop, i.e., the norm. Meanwhile, the structure in in the constraint is ignored and, hence, the number of unknown parameters in is while that in is . Sine in our formulation the dimension of variables is significantly reduced, it can be solved more efficiently with lower computational complexity (see section IV). The dimension reduction does not only reduce the computational complexity, but also yields improved result.

The relaxation is convenient due to its convexity and that well-developed LP solvers can be directly applied. However, the convex relaxation may degrade the performance. It has been demonstrated in the sparse recovery researches that, the or () norm can usually yield a sparser solution than the norm [12]. Empirical results have shown that the minimization (10) is likely to remove inliers in some conditions [6], [24]. Since the and norm penalties tend to yield a sparser solution, they can be expected to reduce the possibility of removing true inliers.

## Iii Proposed Algorithms

Generally, it is difficult to directly solve the nonconvex minimization problem (5). In this section, we first propose an algorithm to solve a convex relaxed version of it. Then, we develop an iteratively reweighted algorithm to approximately solve the ideal minimization problem (5).

### Iii-a Algorithm with Reduced Dimension

We consider a convex relaxation of (5) via replacing the -norm by its convex envelop, the -norm, as

(11) |

This formulation is similar to that considered in [6], except for that the number of slack variables in (11) is while that in the algorithm [6] is . The dimension reduction would result in a speedup. The dual problem of (11) is given by

(12) |

where and are the dual variables, and . is an identity matrix of size . The dual problem (12) can be solved by well-developed LP solvers. The algorithm is summarized as follows.

Algorithm 1: algorithm with reduced dimension |
---|

Input: The set of measurements , inlier threshold . |

Begin: |

Construct and from the measurements . |

Solve the LP problem (12) to obtain . |

Remove the residuals for which . |

End |

Output: A subset of the measurements for which such that , . |

### Iii-B Iteratively Reweighted Algorithm

As shown in [6], [24], the method probably removes true inliers in practical applications. This is explained in the last section, as the minimization may yield a solution not sparse enough. Ideally, we should minimize the number of nonzero elements in , i.e., the minimization problem (5). However, exact solving of (5) is difficult. Inspired by the success of iteratively reweighted methods in sparse recovery researches [13], we propose an iteratively reweighted algorithm to approximately solve the intractable problem (5).

First, we approximate the norm by the norm with a small value of (e.g., in the experiments). The norm usually yields a sparser solution than norm [20], [21]. Then, at the -th iteration, the norm is approximated via first-order expansion (linearization) at obtained at the -th iteration as

where is a small positive constant. Let

(13) |

denote the updated weighting vector based on of the -th iteration, then, the iteratively reweighted algorithm update the parameters at the -th iteration as

(14) |

where denotes the Hadamard product.

Similar to (12), the problem (14) can be solved by the duality approach. The dual problem of (14) is given by

(15) |

The iteratively reweighted algorithm is summarized as follows.

Algorithm 2: Iteratively reweighted algorithm |
---|

Input: The set of measurements , inlier threshold , , , an initialization . |

Begin: |

Construct and from the measurements . |

For |

Solve (15) with to obtain . |

Update via (13) based on . |

End For |

Set and remove the residuals for which . End |

Output: A subset of the measurements for which such that , . |

This algorithm iteratively updates the weighting vector and solves the LP problem (15) times. In practical applications, a small value of can yield sufficiently good performance, as shown in the experiments.

## Iv Experiments

This section evaluates the proposed algorithms via experiments in comparison with the following methods:

(a) RANSAC [4]: is used for stopping criterion.

(b) Exact penalty (EP) method [24]^{1}^{1}1Code at: https://www.researchgate.net/publication/320707327_demo_pami.:
initialized by the solution of the least squares method.
It deterministically solves a reformulation of the consensus maximization problem with linear complementarity constraints by the Frank-Wolfe algorithm.

(c) method [6]^{2}^{2}2Code is available at: http://www.maths.lth.se/matematiklth/personal/calle/.:
it solves the dual problem of (10) using an LP solver,
which is one of the most efficient algorithms that suitable for large-scale 3D-reconstruction.

(d) method [11]: solved using Gugat’s algorithm [22]. This method also use slack variables and minimizes the maximum slack value. It repeatedly removes the data with the largest slack value, until the largest slack value is not greater than zero.

We use and for Algorithm 2. SeDuMi [23] is used to solve the involved LP problems in all these algorithms. The experiments were implemented in MATLAB and run on a laptop with 2 GHz Intel I7 CPU and 16 GB RAM.

### Iv-a Robust Linear Regression with Synthetic Data

(Gugat’s) [11] | [6] | Alg. 1 | Alg. 2 () | ||

House | Removed outliers | 2396 | 862 | 722 | 498 |

Remaining inliers | 99936 | 101470 | 101610 | 101834 | |

RMSE (pixels) | 0.9119 | 0.5982 | 0.6007 | 0.6063 | |

Runtime (seconds) | 50878 | 975 | 322 | 477 |

(Gugat’s) [11] | [6] | Alg. 1 | Alg. 2 () | ||

Cathedral | Removed outliers | 1594 | 652 | 541 | 428 |

Remaining inliers | 44451 | 45393 | 45504 | 45617 | |

RMSE (pixels) | 1.1343 | 0.8036 | 0.8075 | 0.8178 | |

Runtime (seconds) | 2354 | 259 | 76 | 134 |

(Gugat’s) [11] | [6] | Alg. 1 | Alg. 2 () | ||

College | Removed outliers | 966 | 459 | 319 | 179 |

Remaining inliers | 26756 | 27263 | 27403 | 27543 | |

RMSE (pixels) | 1.0754 | 0.5134 | 0.5197 | 0.5562 | |

Runtime (seconds) | 4880 | 162 | 51 | 79 |

Before proceeding to the main interest of 3D-reconstruction, we repeat a synthetic experiment in [24] on a small linear regression problem with synthetic data. We generated points with and . The elements of follow a uniform distribution in . is perturbed by white Gaussian noise with standard deviation of 0.1. To simulate outliers, a part of the elements in are corrupted by much higher Gaussian noise with standard deviation of 1.

Fig. 1 shows the performance of the iteratively reweighted algorithm (Algorithm 2) versus the iteration number for different values of , e.g., . It can be seen that, Algorithm 2 converges within a few iterations, e.g., . Fig. 2 shows the results of the algorithms for different outlier ratio. Each result is an average over 100 independent run. Compared with the method, Algorithm 1 removes a same number of outliers in all cases, while Algorithm 2 removes fewer outliers. Algorithm 2 can achieve sufficiently good performance within a few iterations, e.g. . The EP method yields the largest consensus size. In terms of runtime, Algorithm 1 is the fastest while Algorithm 2 is much faster than EP.

### Iv-B Structure and Motion Estimation on Real Datasets

We consider the full 3D-reconstruction experiments in [6] and use the known-rotation formulation for outlier removal, which is a procedure of RANSAC followed by outlier removal and bundle adjustment [18]. The initial camera rotation and image data are obtained using RANSAC for pairwise images. The global outlier removal is achieved over the structure and translation of the cameras. The inlier threshold is selected corresponding to a reprojection error tolerance of 5 pixels.

Three open datasets^{3}^{3}3Available online at http://www.maths.lth.se/matematiklth/personal/calle/.
are used, including a house (consists of 23 cameras and 29220 3D points projected into 35470 image points), a cathedral (consists of 17 cameras and 16961 3D points projected into 46045 image points), and a college (consists of 57 cameras and 8990 3D points projected into 27722 image points). In these datasets, SIFT descriptors [17] are used to generate point correspondences and, then, RANSAC is used to discard outliers and determine orientations between pairs of cameras.

Since RANSAC and EP are not suitable for large-scale problems, only the [6] and [11] methods are compared here. We use and for Algorithm 2. Fig. 3 illustrates the resulting reconstruction on the three datasets. Table 1 compares the reconstruction results, including removed outliers, remaining inliers, root-mean-squared-error (RMSE) of reprojection in pixels, and runtime in seconds. Without removal of any outliers, the reprojection RMSEs in reconstructing the house, cathedral and college are respectively 4.76, 3.24, and 4.16 pixels. It can be seen that each algorithm can significantly reduce the RMSE, which implies the effective removal of some outliers. A significant drop in the reprojection RMSE implies the removed data are true outliers.

Algorithm 1 is more than three times faster than the method, which is due to the reduced dimension. For example, in the cathedral experiment, the number of variables in the method is 327201 ( contains the parameters of the 3D points and camera translations, and contains the slack variables). Whereas, in our formulation the number of variables is 96976, with and . In the experiments, as a depth constraint is also considered.

Compared with the method, Algorithm 1 removes fewer outliers while yields a comparable RMSE. The removal of fewer outliers is in contrast to the results in the linear regression experiment with synthetic data. Algorithm 2 removes much fewer outliers compared with the other algorithms, while being much faster than the and methods.

## V Conclusion

In multiview reconstruction, automatic feature point matching often generates some mismatches, which gives rise to outliers. This work developed two methods to detect and remove such outliers. Compared with existing methods, the new methods use a dimension reduced formulation, which significantly reduces the computational complexity. Realistic multiview reconstruction experiments demonstrated that, the new algorithms are much faster than state-of-the-art algorithms while give improved solution. Due to their efficiency and effectiveness, the new methods may be favored in practical large scale 3D reconstruction applications.

## References

- [1] P. Meer, “Robust techniques for computer vision,” in G. Medioni and S. B. Kang eds.: Emerging topics in computer vision. Prentice Hall, pp. 107–190, 2004.
- [2] D. Zou and P. Tan, “Coslam: Collaborative visual slam in dynamic environments,” IEEE Transactions Pattern Analysis and Machine Intelligence, vol. 35, no. 2, pp. 354–366, 2012.
- [3] H. Le, T. J. Chin, and D. Suter, “An exact penalty method for locally convergent maximum consensus,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1888–1896.
- [4] M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
- [5] T. J. Chin, Y. H. Kee, A. Eriksson, and F. Neumann, “Guaranteed outlier removal with mixed integer linear programs,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5858–5866.
- [6] C. Olsson, A. P. Eriksson, and R. Hartley, “Outlier removal using duality,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2010ï¼ pp. 1450–1457.
- [7] F. Kahl and R. Hartley, “Multiple view geometry under the -norm,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 9, pp. 1603–1617, 2008.
- [8] P. J. Huber, “Robust estimation of a location parameter,” The Annals of Mathematical Statistics, vol. 35, no. 1, pp.73–101, 1964.
- [9] S. Agarwal, N. Snavely, and S. Seitz, “Fast algorithms for problems in multiview geometry,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Anchorage, USA, 2008.
- [10] K. Astrom, O. Enqvist, C. Olsson, and F. Kahl, “An approach to structure and motion problems in 1d-vision,” in Int. Conf. Computer Vision, Rio de Janeiro, Brazil, 2007.
- [11] K. Sim and R. Hartley, “Removing outliers using the l-infinity norm,” in Conf. Computer Vision and Pattern Recognition (CVPR), pp. 485–494, New York City, USA, 2006.
- [12] F. Wen, P. Liu, and R. Qiu, “Nonconvex regularization based sparse and low-rank recovery in signal processing, statistics, and machine learning,” submitted for publication, 2018.
- [13] E. J. Candes, M. B. Wakin, S. P. Boyd, “Enhancing sparsity by reweighted l1 minimization,” Journal of Fourier Analysis and Applications, vol. 14, pp. 877–905, 2008
- [14] Y. Seo, H. Lee, and S. W. Lee, “Outlier removal by convex optimization for l-infinity approaches,” in PSIVTâ09: Pacific Rim Symposium on Advances in Image and Video Technology, 2009.
- [15] A. Dalalyan and R. Keriven, “L1-penalized robust estimation for a class of inverse problems arising in multiview geometry,” in Annual Conference on Neural Information Processing Systems, Vancouver, Canada, 2009.
- [16] Q. Ke and T. Kanade, “Quasiconvex optimization for robust geometric reconstruction,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1834–1847, 2007.
- [17] D. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. Journal Computer Vision, 2004.
- [18] D. Martinec and T. Pajdla, “Robust rotation and translation estimation in multiview reconstruction,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Minneapolis, USA, 2007.
- [19] C. Olsson, O. Enqvist, and F. Kahl, “A polynomial-time bound for matching and registration with ouliers,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Anchorage, USA, 2008.
- [20] F. Wen, L. Pei, Y. Yang, W. Yu, and P. Liu, “Efficient and robust recovery of sparse signal and image using generalized nonconvex regularization,” IEEE Trans. Computational Imaging, vol. 3, no. 4, pp. 566–579, 2017.
- [21] F. Wen, P. Liu. Y. Liu, R. C. Qiu, and W. Yu, “Robust sparse recovery in impulsive noise via Lp-L1 optimization,” IEEE Trans. Signal Process., vol. 65, no. 1, pp. 105–118, Jan. 2017.
- [22] M. Gugat, “A fast algorithm for a class of generalized fractional programs,” Man. Sci., vol. 42, no. 10, pp. 1493–1499, 1996.
- [23] J. F. Sturm. Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. Optimization Methods and Software, vol. 11–12, pp. 625–653, 1999.
- [24] H. Le, T. J. Chin, A. Eriksson, and D. Suter, “Deterministic approximate methods for maximum consensus robust fitting,” arXiv preprint, arXiv:1710.10003, 2017.
- [25] T. J. Chin, P. Purkait, A. Eriksson, and D. Suter, “Efficient globally optimal consensus maximisation with tree search,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2015, pp. 2413–2421.
- [26] O. Chum, J. Matas, and J. Kittler, “Locally optimized ransac,” in DAGM. Springer, 2003.