MAGSAC: marginalizing sample consensus
A method called σ-consensus is proposed to eliminate the need for a user-defined inlier-outlier threshold in RANSAC. Instead of estimating the noise σ, it is marginalized over a range of noise scales using a Bayesian estimator, i.e. the optimized model is obtained as the weighted average using the posterior probabilities as weights. Applying σ-consensus, two methods are proposed: (i) a post-processing step which always improved the model quality on a wide range of vision problems without noticeable deterioration in processing time, i.e. at most 1-2 milliseconds; and (ii) a locally optimized RANSAC, called LO-MAGSAC, which includes σ-consensus in the local optimization of LO-RANSAC. The method is superior to the state-of-the-art in terms of geometric accuracy on publicly available real-world datasets for epipolar geometry (F and E), homography and affine transformation estimation.
Keywords: RANSAC, robust estimation, Bayesian estimation
The RANSAC (RANdom SAmple Consensus) algorithm proposed by Fischler and Bolles [1] in 1981 has become the most widely used robust estimator in computer vision. RANSAC and its variants have been successfully applied to a wide range of vision tasks, e.g. motion segmentation [2], short baseline stereo [2, 3], wide baseline stereo matching [4, 5, 6], detection of geometric primitives [7], image mosaicing [8], and to perform [9] or initialize multi-model fitting [10, 11]. In short, the RANSAC approach repeatedly selects random subsets of the input point set to which it fits a model, e.g. a plane to three 3D points or a homography to four 2D point correspondences. Next, the quality of the estimated model is measured, for instance by the size of its support, i.e. the number of inliers. Finally, the model with the highest quality, polished e.g. by a least squares fit on its inliers, is returned.
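To make the loop above concrete, the following sketch fits a 2D line with plain RANSAC. The function names, the line-fitting choice, and the omission of the final least-squares polish are ours, for illustration only.

```python
import numpy as np

def ransac_line(points, threshold=1.0, iterations=1000, rng=None):
    """Fit a 2D line a*x + b*y + c = 0 to noisy points with plain RANSAC.

    `threshold` is the user-set inlier-outlier residual bound that
    sigma-consensus aims to remove. Returns the best minimal-sample
    model and its inlier mask (least-squares polish omitted for brevity).
    """
    rng = np.random.default_rng(rng)
    best_inliers, best_model = np.zeros(len(points), bool), None
    for _ in range(iterations):
        # 1. Minimal sample: two distinct points define a line.
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        a, b = y2 - y1, x1 - x2                 # unnormalized line normal
        norm = np.hypot(a, b)
        if norm < 1e-12:                        # degenerate sample
            continue
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        # 2. Support: points whose point-to-line distance is below the bound.
        residuals = np.abs(points @ np.array([a, b]) + c)
        inliers = residuals < threshold
        # 3. Keep the model with the largest support.
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (a, b, c)
    return best_model, best_inliers
```
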
Since the publication of RANSAC, a number of modifications have been proposed. NAPSAC [12], PROSAC [13] and EVSAC [14] modify the sampling strategy to increase the probability of selecting an all-inlier sample early. NAPSAC assumes the inliers to be spatially coherent, PROSAC exploits an a priori predicted inlier probability of the points and EVSAC estimates a confidence for each point. MLESAC [15] estimates the model quality by a maximum likelihood process which, albeit under certain assumptions about the inlier and outlier distributions, brings all the beneficial properties of maximum likelihood estimation. In practice, MLESAC results are often superior to the inlier counting of plain RANSAC and they are less sensitive to the user-defined inlier-outlier threshold. In MSAC [16], the robust estimation is formulated as a process that estimates both the parameters of the data distribution and the quality of the model in terms of a maximum a posteriori estimate.
One of the highly attractive properties of RANSAC is its small number of control parameters.
The termination is controlled by a manually set confidence value: the sampling stops as soon as the probability of finding a model with higher support than the current best falls below the allowed failure probability.
As the major contribution of this paper, we propose an approach, σ-consensus, that eliminates the need for σ, the noise scale parameter, while keeping the processing time acceptable. Instead of σ, only a range within which σ lies is required. The range can be fairly wide, say one order of magnitude. The final outcome is obtained by marginalizing over σ, using Bayesian estimation and the data likelihood given the model and σ. Besides finessing the need for a precise scale parameter, the novel method, called MAGSAC, is more precise than previously published RANSACs.
As a second contribution, we replace the maximization in RANSAC with Bayesian estimation, which takes the form of robust model averaging weighted by the data likelihood of each so-far-the-best model. We build on the observation of Rais et al. [19], who showed that the accuracy of RANSAC, which depends on a single sample where the model attained the maximum score, can be improved by averaging other all-inlier models generated during the RANSAC procedure. Their Random Sample Aggregated Consensus (RANSAAC) algorithm [19] takes the weighted mean (or median) of all the generated models, using a power of the inlier count as weight. We replace this ad hoc weighting by the a posteriori likelihood, justified under Bayesian estimation, and propose an approach for averaging over σ.
They however identified a problem: a significant number of models has to be generated and averaged to achieve stable results when the inlier ratio is low, which is prohibitive in practice. Dealing with this issue, they applied the locally optimized RANSAC proposed by Chum et al. [20], who observed that RANSAC requires more samples in practice than theory predicts, since not all all-inlier samples are "good", i.e. lead to a model accurate enough to distinguish all inliers. In [19], LO-RANSAC is used to select the models worth including in the averaging: the locally optimized ones.
As a third contribution, we propose a new local optimization, called LO-MAGSAC, which polishes each so-far-the-best model over a range of noise scales σ. Since the number of LO runs is close to the logarithm of the number of verifications, the procedure is often faster than plain RANSAC due to the early termination. More importantly, the obtained results were superior to those of the state-of-the-art RANSAC variants on a wide range of vision problems and datasets. We however acknowledge that being often faster than plain RANSAC does not mean that the time demand is never increased by the optimization over σ. Thus we also propose a post-processing step applying σ-consensus to the final so-far-the-best model without noticeable deterioration in processing time, i.e. at most 1-2 milliseconds. In our experiments, this step always improved the input model (coming from RANSAC, MSAC or LO-RANSAC) on a wide range of problems. Thus we see no reason for not applying it after the robust estimation finished.
2 Bayesian optimization
In this paper, the input point set is denoted as 𝒫 = {p₁, …, p_n} ⊆ ℝ^d, where d is the dimension of the points, e.g. d = 2 for 2D points. The inlier set is ℐ ⊆ 𝒫. The model to fit is represented by its parameter vector θ ∈ Θ, where q is the dimension of the model, e.g. q = 2 for 2D lines (angle and offset), and Θ is the manifold, e.g. of all possible 2D lines. Fitting function F calculates the model parameters from at least m data points, where m is the minimum point number for fitting a model, e.g. m = 2 for 2D lines. Note that F is a combined function applying different estimators on the basis of the input set, for instance, a minimal method if the sample size equals m and least-squares fitting otherwise. Function D(θ, p) is the point-to-model residual function. Function I selects the inlier set given model θ and threshold σ; for example, if the original RANSAC approach is considered, I(θ, σ, 𝒫) = {p ∈ 𝒫 | D(θ, p) < σ}, whilst for truncated quadratic distance, I(θ, σ, 𝒫) = {p ∈ 𝒫 | D²(θ, p) < σ²}. The quality function is Q, where higher quality is interpreted as a better model. For RANSAC, Q(θ, σ, 𝒫) = |I(θ, σ, 𝒫)| and for MSAC it is Q(θ, σ, 𝒫) = Σ_i (1 − D²(θ, p_i)/σ²), where p_i is the i-th inlier.
|𝒫 – set of data points|σ – noise standard deviation|
|θ – model parameters|D – residual function|
|I – inlier selector function|Q – model quality function|
|F – fitting function|m – minimal sample size|
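As an illustration of the two quality functions above, a minimal sketch; the function names are ours, and the MSAC score is written in the normalized truncated-quadratic form used in this paper.

```python
import numpy as np

def ransac_quality(residuals, sigma):
    """Plain RANSAC quality: the support size, i.e. the inlier count."""
    return int(np.sum(residuals < sigma))

def msac_quality(residuals, sigma):
    """MSAC-style quality: each inlier contributes 1 - D^2 / sigma^2,
    outliers contribute nothing (truncated quadratic loss as a score)."""
    r2 = np.asarray(residuals, float) ** 2
    inl = r2 < sigma**2
    return float(np.sum(1.0 - r2[inl] / sigma**2))
```

Note that, unlike plain inlier counting, the MSAC score still rewards inliers that lie close to the model, which is why it is less sensitive to the exact threshold.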
Suppose that we are given a model θ estimated from a minimal sample and a point set 𝒫. The objective is to estimate the optimal model marginalized over the noise scale σ, given θ and 𝒫, minimizing the Bayesian quadratic loss. The problem is formulated as follows:

θ* = argmin_θ̂ ∫ ‖θ̂ − θ_σ‖² p(σ | θ, 𝒫) dσ,
where θ_σ = F(I(θ, σ, 𝒫)) is the model implied by the inlier set selected using σ around the input model θ. It can be solved using the joint probability rule:

p(σ | θ, 𝒫) = p(σ, θ, 𝒫) / p(θ, 𝒫).
Since p(θ, 𝒫) is a non-negative constant, it is enough to minimize the integral as follows:

argmin_θ̂ ∫ ‖θ̂ − θ_σ‖² p(σ, θ, 𝒫) dσ.
The solution is obtained by simply differentiating the previous integral w.r.t. θ̂ and setting the derivative to zero:

∂/∂θ̂ ∫ ‖θ̂ − θ_σ‖² p(σ, θ, 𝒫) dσ = 2 ∫ (θ̂ − θ_σ) p(σ, θ, 𝒫) dσ = 0.
After rearranging the equation,

θ̂ = (∫ θ_σ p(σ, θ, 𝒫) dσ) / (∫ p(σ, θ, 𝒫) dσ).
Note that ∫ p(σ, θ, 𝒫) dσ = p(θ, 𝒫), and thus θ̂ = ∫ θ_σ p(σ | θ, 𝒫) dσ.
To discretize the problem, let us notice that, due to having a finite point set (|𝒫| = n < ∞), the set of σs leading to models with different parameters cannot be infinite. The set of possible models given 𝒫 and θ is as follows:

Θ_σ = { θ_σ | σ ∈ (0, ∞) }.
Due to |𝒫| < ∞, it holds that |Θ_σ| < ∞: increasing σ can only add points to the inlier set one by one, so at most n distinct inlier sets, and therefore models, can arise. Therefore the integral can be replaced by a summation as follows:

θ̂ = Σ_{i=1}^{|Θ_σ|} θ_{σ_i} p(σ_i | θ, 𝒫),

where σ_i is a representative noise scale implying the i-th distinct model.
Consequently, the model minimizing the loss is the weighted mean of the models implied by a finite set of σs. The weights are the posterior probabilities of those σs.
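The discrete marginalization above amounts to a weighted mean of the per-σ models. A sketch, under the assumption that the models and their (unnormalized) log posteriors have already been computed:

```python
import numpy as np

def marginalize_over_sigma(theta_for_sigma, log_weights):
    """Weighted mean of the models implied by a finite set of sigmas.

    theta_for_sigma: (k, d) array; row i is the model fitted to the
                     inliers selected with the i-th noise scale.
    log_weights:     length-k array of unnormalized log posteriors
                     log p(sigma_i | theta, P).
    Returns the marginalized model, i.e. the quadratic-loss minimizer.
    """
    log_weights = np.asarray(log_weights, float)
    # Normalize in log space for numerical stability (log-sum-exp trick).
    w = np.exp(log_weights - log_weights.max())
    w /= w.sum()
    return w @ np.asarray(theta_for_sigma, float)
```

Working in log space matters here, because the raw likelihoods are products of many per-point terms and underflow easily.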
2.3 Posterior probabilities
To calculate the Bayesian posterior probability p(σ | θ, 𝒫) of the model implied by a given σ, required for the averaging, we need to model the input data as a mixture of probability distributions. Note that p(𝒫 | θ, σ) is the likelihood of the points given θ and σ, and p(σ) encapsulates the prior knowledge about σ, and thus about the inlier set as well. It can be easily seen that p(θ) can be omitted due to the division and the fact that it does not depend on σ. Let us start from the original RANSAC scheme, where a fitness score is assigned to every data point interpreting it as a member of the outlier (O) or inlier (I) classes. In that approach, both classes are supposed to have uniform distributions and thus the input point set is described by a mixture of those. The error distribution given the ground truth model is as follows:

p(D(θ, p) = r | σ) = 1 / (2√3 σ) if p ∈ I, and 1 / e_max if p ∈ O,
where σ is the standard deviation of the inlier residuals (a uniform distribution on [0, 2√3 σ] has standard deviation σ) and e_max is a boundary for the outlier residuals, for instance, the image diagonal size for 2D points. The implied likelihood of 𝒫 given θ and σ is as follows:

p(𝒫 | θ, σ) = Π_{i=1}^{n} [ δ_i / (2√3 σ) + (1 − δ_i) / e_max ],
where δ_i ∈ {0, 1} is an indicator variable – determined by function I – which is equal to 1 if p_i is an inlier, and 0 otherwise.
Leaving the RANSAC approach and assuming that the inliers have a normal distribution (the MSAC or MLESAC scheme), similarly as in [15], the residuals are described by the following mixed distribution:

p(D(θ, p) = r | σ) = 2 / (σ√(2π)) exp(−r² / (2σ²)) if p ∈ I, and 1 / e_max if p ∈ O.
Therefore p(𝒫 | θ, σ) is written as follows:

p(𝒫 | θ, σ) = Π_{i=1}^{n} [ δ_i · 2 / (σ√(2π)) exp(−D²(θ, p_i) / (2σ²)) + (1 − δ_i) / e_max ].
Note that the prior probability p(σ), without any prior knowledge about the desired model or the actual noise scale, can be chosen flat, or it can be updated step by step inside the RANSAC procedure. Also note that, to avoid the extremely small weights to which a product of many small numbers leads, it is beneficial to use the logarithm of the weight. It is as follows:

log [p(𝒫 | θ, σ) p(σ)] = log p(σ) + Σ_{i=1}^{n} log [ δ_i · 2 / (σ√(2π)) exp(−D²(θ, p_i) / (2σ²)) + (1 − δ_i) / e_max ].
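A sketch of the log-likelihood under the Gaussian-inlier / uniform-outlier mixture above. For illustration, a soft mixture weight `inlier_ratio` replaces the hard indicator δ_i, and all names are ours, not the paper's notation:

```python
import numpy as np

def log_likelihood(residuals, sigma, outlier_bound, inlier_ratio=0.5):
    """Log-likelihood log p(P | theta, sigma) under a two-component mixture.

    Inliers: half-normal with scale sigma on the non-negative residuals;
    outliers: uniform on [0, outlier_bound]. inlier_ratio is an assumed
    mixing prior standing in for the hard inlier indicator.
    """
    r = np.asarray(residuals, float)
    inl = inlier_ratio * (2.0 / (sigma * np.sqrt(2.0 * np.pi))) \
        * np.exp(-r**2 / (2.0 * sigma**2))
    out = (1.0 - inlier_ratio) / outlier_bound
    # Sum of per-point log terms; summing logs avoids underflow.
    return float(np.sum(np.log(inl + out)))
```

Evaluating this function over a grid of σ values yields exactly the log-weights needed for the marginalization step.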
Example distributions estimated by the proposed method are shown in Fig. 1 for the synthetic homography fitting discussed later in the experimental section.
3 Algorithms using σ-consensus
In this section, we propose two algorithms using σ-consensus: LO-MAGSAC and a post-processing approach. LO-MAGSAC is a locally optimized RANSAC aiming to minimize the quadratic loss whenever a so-far-the-best model is found. Even though it is often faster than plain RANSAC, there are RANSAC variants promising a faster procedure. The reason for the post-processing step is thus to improve the output without noticeably increasing the processing time by applying σ-consensus only once: to the final so-far-the-best model, i.e. the output of the robust estimation. In the proposed algorithms, we define the quality function as follows:

Q(θ, σ, 𝒫) = Σ_{p ∈ I(θ, σ, 𝒫)} (1 − D²(θ, p) / σ²).
Function I, i.e. the one returning the inlier set, is I(θ, σ, 𝒫) = {p ∈ 𝒫 | D(θ, p) < σ}.
3.1 Post-processing by σ-consensus
In order to use the previously described optimization without high computational overhead, we propose to apply it as a post-processing step refining only the RANSAC output. The proposed algorithm is summarized in Alg. 1.
Assuming that the given σs are ordered decreasingly, σ_1 > σ_2 > … > σ_k, it is a straightforward solution for speeding up the process to get the inliers implied by the largest σ (σ_1; line 4) first. Then each next step just needs to update the inlier set by removing the points farther than the current σ_i (line 6) instead of checking all elements in 𝒫. For each σ_i, a model is fitted to the implied inlier set (line 7) and its quality is computed (line 8). To achieve maximum accuracy, the estimated model is also compared to the current so-far-the-best model and stored if it has higher quality. The averaged model is then updated (line 12) by adding the new model with its quality as weight. We use a running average to be able to keep the results of intermediate steps as well if they lead to better quality (line 13) than the previous best.
It can be easily seen that the time complexity of the algorithm is not significant for a small number k of noise scales. Only the first iteration requires checking all the points; the remaining ones work with the shrinking inlier set ℐ_i ⊆ 𝒫. The time complexity of function F, considering that it is an SVD decomposition, is O(n q²), where q is the dimension of the unknowns, e.g. q = 2 for line fitting. The worst case is when all points are at zero distance from the model and thus no point is ever removed: F then has to run on all n data points k times, leading to O(k n q²) complexity. The complexity of computing the score k times is O(k n) and that of the running average is O(k q). Therefore the overall worst-case complexity is O(k n q²), which is linear in the number of points. Moreover, the solution can straightforwardly be parallelized by computing the k scores concurrently, further reducing the wall-clock time.
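The post-processing loop of Alg. 1 can be sketched as follows. The callback names are hypothetical, and, as in the algorithm description above, the model qualities stand in as averaging weights:

```python
import numpy as np

def sigma_consensus(points, model, sigmas, residual_fn, fit_fn, quality_fn):
    """Post-processing sketch in the spirit of Alg. 1.

    sigmas must be sorted decreasingly so the inlier set only shrinks.
    residual_fn(model, points) -> residuals, fit_fn(points) -> model,
    quality_fn(model, points) -> score are problem-specific callbacks.
    """
    # Start from the inliers of the widest threshold (line 4 of Alg. 1).
    inliers = points[residual_fn(model, points) < sigmas[0]]
    avg, weight_sum = None, 0.0
    best, best_q = model, quality_fn(model, points)
    for s in sigmas:
        # Shrink the inlier set w.r.t. the input model (line 6).
        inliers = inliers[residual_fn(model, inliers) < s]
        if len(inliers) < 2:
            break
        theta = fit_fn(inliers)                 # refit (line 7)
        q = quality_fn(theta, points)           # score (line 8)
        if q > best_q:                          # keep best single model
            best, best_q = theta, q
        # Running weighted average with the quality as weight (line 12).
        weight_sum += q
        avg = theta * q if avg is None else avg + theta * q
    if avg is not None and weight_sum > 0:
        theta_avg = avg / weight_sum
        q_avg = quality_fn(theta_avg, points)
        if q_avg > best_q:                      # line 13: prefer the average
            best = theta_avg
    return best
```
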
3.2 LO-MAGSAC: Locally optimized marginalizing sample consensus
We propose a locally optimized RANSAC applying the proposed σ-consensus inside the LO procedure. The algorithm is summarized in Alg. 2. First, it selects the inliers of the model to be optimized (line 1). Next, a RANSAC-like procedure is applied to the inlier set, selecting larger-than-minimal samples in each iteration (as proposed in [21]), where m is the size of a minimal sample. Using guided sampling like PROSAC [13] instead of uniform sampling is a straightforward choice. For PROSAC, the points have to be ordered by their feature scores; thus it requires no additional computation to use PROSAC in the embedded RANSAC, since the inlier selector function does not change the point ordering. Note that it could also be a justifiable approach to use the inlier probabilities of the points w.r.t. the given model; however, this would introduce bias into the estimation. A model is then estimated from the selected sample (line 4), its score is computed (line 5) and σ-consensus is performed (line 8).
The time complexity of F in this case, considering that it is still an SVD decomposition, is O(|ℐ| q²), where q is the number of unknowns. The score computation has O(n) complexity and, as discussed in the previous section, the time demand of the σ-consensus step in the worst case is O(k n q²). Thus the overall complexity of the local optimization step is O(k n q²), which parallelizing the score computations reduces further. Since k, q and m are usually small numbers and do not depend on n, the time demand is linear in the number of points.
4 Experimental Results
In order to evaluate the effect of the proposed post-processing step, we tested several approaches with and without it. The compared algorithms are: RANSAC, MSAC, LO-RANSAC, and LO-MSAC. Being the first method which averages the estimated models, LO-RANSAAC [19] is also included in the comparison. PROSAC sampling, the same random seed and the same algorithmic components were used for all methods, and all performed a final least-squares fit on the obtained inlier set. Thus the difference between RANSAC – MSAC and LO-RANSAC – LO-MSAC is solely the scoring (i.e. quality) function. Moreover, the methods with the LO prefix, except LO-MAGSAC, run the original local optimization step proposed by Chum et al. [20] with an inner RANSAC applied to the inliers. The parameters were set as follows: the inlier-outlier threshold for the RANSAC loop was the value proposed in [21], which also suited our experiments; the number of inner RANSAC iterations, the required confidence, and the minimum number of iterations required before the first LO step and before termination were fixed identically for all methods. The reported error values are root mean square (RMS) errors. The σ set used for the optimization was chosen since it led to accurate results with negligible time demand.
4.1 Synthesized Tests
For testing the proposed method in a fully controlled environment, two cameras were generated by their projection matrices P₁ and P₂. The first camera was located in the origin and its image plane was parallel to the XY plane. The position of the second camera was a random point inside a unit sphere around the first one, thus the baseline was at most of unit length. Its orientation was determined by three random rotations around the principal directions as follows:

R = R_X(α) R_Y(β) R_Z(γ),
where α, β and γ are random angles drawn from a fixed interval and R_X, R_Y and R_Z are rotations around the principal axes. Both cameras had a common intrinsic camera matrix with fixed focal length and principal point. A 3D plane was generated with random tangent directions and origin. It was sampled at random locations, thus generating three-dimensional points at most one unit far from the plane origin. These points were projected into the cameras. All of the random parameters were selected using uniform distributions. Zero-mean Gaussian noise with standard deviation σ was added to the projected point coordinates. Finally, outliers, i.e. uniformly distributed random point correspondences, were added, keeping the total number of correspondences fixed.
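The three-axis random rotation used in the synthetic setup above can be generated as below; the angle range and the composition order are assumptions for illustration.

```python
import numpy as np

def random_rotation(rng, max_angle=np.pi / 4):
    """Compose rotations about the three principal axes with random
    angles drawn uniformly from [-max_angle, max_angle]."""
    ax, ay, az = rng.uniform(-max_angle, max_angle, 3)
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax),  np.cos(ax)]])
    Ry = np.array([[ np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az),  np.cos(az), 0],
                   [0, 0, 1]])
    # The product of rotations is itself a rotation (orthonormal, det = 1).
    return Rx @ Ry @ Rz
```
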
The mean results of multiple runs are reported in Fig. 11. The competitor algorithms are: RANSAC (RSC), MSAC (MSC), LO-RANSAC (LO-RSC), LO-MSAC (LO-MSC) and LO-MAGSAC. Suffix "+σ" means that σ-consensus was applied to the obtained model. The first row (a–c) reports the geometric accuracy (in pixels) as a function of the noise σ for three increasing outlier ratios. A fixed, and same, number of samples was used for all methods in these tests, calculated from the ground truth inlier ratio with a fixed required confidence. By looking at the differences between methods with and without the proposed post-processing step ("+σ"), it can be seen that it always improved the results on these tests; e.g. the geometric error of RSC is higher than that of RSC + σ for every noise σ. LO-MAGSAC led to the estimates with the lowest errors for the medium and high outlier ratios; it was the second best for the low outlier ratio, where LO-MSAC + σ was first. For the second row (plots d–f), not a fixed number of samples but an iteratively updated one was used, given a required confidence. The geometric errors are plotted as a function of the noise σ. Even though the differences become slightly smaller, the same trend can be observed as for the previous row. This change in the differences can easily be understood by noticing that the proposed local optimization more often led to earlier termination than the others – see plots (g) and (h), where the required sample number is shown as a function of the noise scale (g) and of the outlier ratio (h). Since methods with suffix "+σ" use a post-processing step, their number of generated samples does not change; thus they are not shown in these plots. It can be seen that LO-MAGSAC needed the fewest samples for termination in most of the cases. The last plot (i) shows the sensitivity of σ-consensus to the used range of σs, with the results averaged over the tested outlier ratios.
For this test, LO-MAGSAC was run optimizing over σ sets of different cardinality covering the same interval. It can be seen that the results get better as the number of σs in the interval increases; after a certain point, the results have fairly similar quality.
Fig. 1 shows the normalized likelihood of each σ, i.e. the probabilities of the most and least likely σs are 1 and 0, respectively. The results are the mean of multiple runs for each outlier ratio (0.2, 0.5 and 0.8). The ground truth σs are shown by black vertical lines. It can be seen that, using the proposed approach, the peaks are close to the ground truth even for a high outlier ratio.
4.2 Real World Tests
Estimation of Fundamental Matrix.
To evaluate the performance on fundamental matrix estimation we used the kusvod2 dataset.
The first three blocks of Table 1, each consisting of four rows, report the quality of the estimation on each dataset as the average of 1000 runs on every image pair. The first two columns show the names of the tests and the investigated properties: (1–2) the mean and median RMS geometric errors in pixels of the obtained model w.r.t. the manually annotated inliers – for fundamental matrices and homographies, defined as the average Sampson distance and re-projection error, respectively, and for essential matrices as the mean Sampson distance of the implied F and the correspondences; (3) the mean processing time in milliseconds; and (4) the mean number of samples, i.e. RANSAC iterations, which had to be drawn until termination. Note that, as expected, the methods applied with or without the proposed post-processing step do not differ in that last aspect.
It can be clearly seen that for F estimation the proposed post-processing step improved the results in nearly all of the tests, with a negligible increase in the processing time of at most 1-2 milliseconds on average, and the errors were noticeably reduced compared with the methods without σ-consensus. Applying LO-MAGSAC led to results superior to those of the competitor algorithms in all test cases in terms of geometric accuracy. Comparing its processing time to that of the fastest method (LO-MSAC), LO-MAGSAC was several times slower; relative to plain RANSAC, however, the slowdown was mitigated by the earlier termination.
Estimation of Homography.
In order to test homography estimation we downloaded the homogr dataset.
Estimation of Essential Matrix. To estimate essential matrices, we used the strecha dataset [25] consisting of image sequences of buildings, with the ground truth projection matrices provided. The methods were applied to all possible image pairs in each sequence. The SIFT detector [26] was used to obtain correspondences. For each image pair, a reference point set of ground truth inliers was obtained by calculating the fundamental matrix from the projection matrices. Correspondences were considered inliers if the symmetric epipolar distance was smaller than one pixel, and all image pairs with too few inliers found were discarded. The results are reported in the fourth block of Table 1. The trend is similar to the previous cases; the only difference is that the post-processing step did not improve plain RANSAC and MSAC, and its average improvement was marginal. The most accurate essential matrices were obtained by LO-MAGSAC.
Estimation of Affine Transformation.
The SZTAKI Earth Observation dataset [27] was used to evaluate affine transformation estimation.
A robust approach, called σ-consensus, was proposed for eliminating the need for a user-defined threshold by marginalizing over a range of noise scales using a Bayesian estimator. Also, it was shown that, having a finite set of data points, the scale range can be replaced by a finite set of σs without loss of generality. The optimized model, not depending on a threshold parameter, is obtained as the weighted average using the posterior probabilities as weights. Applying σ-consensus, we proposed two methods. The first is a locally optimized RANSAC, called LO-MAGSAC, which includes σ-consensus in the local optimization of LO-RANSAC; it is superior to the state-of-the-art in terms of geometric accuracy on publicly available real-world datasets for epipolar geometry (both F and E), homography and affine transformation estimation, and, while often faster than plain RANSAC due to the early termination, it is sometimes significantly slower. We therefore also proposed, as a second method, a post-processing step applying σ-consensus only once: to the final so-far-the-best model. This step always improved the model quality on a wide range of vision problems without noticeable deterioration in processing time, i.e. at most 1-2 milliseconds. We see no reason for not applying it after the robust estimation finished.
- Note that the probabilistic interpretation holds only for the standard cost function.
- Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM (1981)
- Torr, P.H.S., Murray, D.W.: Outlier detection and motion segmentation. In: Optical Tools for Manufacturing and Advanced Automation, International Society for Optics and Photonics (1993)
- Torr, P.H.S., Zisserman, A., Maybank, S.J.: Robust detection of degenerate configurations while estimating the fundamental matrix. Computer Vision and Image Understanding (1998)
- Pritchett, P., Zisserman, A.: Wide baseline stereo matching. In: International Conference on Computer Vision, IEEE (1998)
- Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing (2004)
- Mishkin, D., Matas, J., Perdoch, M.: MODS: Fast and robust method for two-view matching. Computer Vision and Image Understanding (2015)
- Sminchisescu, C., Metaxas, D., Dickinson, S.: Incremental model-based estimation using geometric constraints. Pattern Analysis and Machine Intelligence (2005)
- Ghosh, D., Kaabouch, N.: A survey on image mosaicking techniques. Journal of Visual Communication and Image Representation (2016)
- Zuliani, M., Kenney, C.S., Manjunath, B.S.: The multiransac algorithm and its application to detect planar homographies. In: International Conference on Image Processing, IEEE (2005)
- Isack, H., Boykov, Y.: Energy-based geometric multi-model fitting. International Journal of Computer Vision (2012)
- Pham, T.T., Chin, T.J., Schindler, K., Suter, D.: Interacting geometric priors for robust multimodel fitting. Transactions on Image Processing (2014)
- Myatt, D.R., Torr, P.H.S., Nasuto, S.J., Bishop, J.M., Craddock, R.: NAPSAC: High noise, high dimensional robust estimation - it's in the bag. In: British Machine Vision Conference (2002)
- Chum, O., Matas, J.: Matching with PROSAC-progressive sample consensus. In: Computer Vision and Pattern Recognition, IEEE (2005)
- Fragoso, V., Sen, P., Rodriguez, S., Turk, M.: EVSAC: accelerating hypotheses generation by modeling matching scores with extreme value theory. In: International Conference on Computer Vision. (2013)
- Torr, P.H.S., Zisserman, A.: MLESAC: A new robust estimator with application to estimating image geometry. Computer Vision and Image Understanding (2000)
- Torr, P.H.S.: Bayesian model estimation and selection for epipolar geometry and generic manifold fitting. International Journal of Computer Vision 50(1) (2002) 35–61
- Stewart, C.V.: Minpran: A new robust estimator for computer vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(10) (1995) 925–938
- Moisan, L., Moulon, P., Monasse, P.: Automatic homographic registration of a pair of images, with a contrario elimination of outliers. Image Processing On Line 2 (2012) 56–73
- Rais, M., Facciolo, G., Meinhardt-Llopis, E., Morel, J., Buades, A., Coll, B.: Accurate motion estimation through random sample aggregated consensus. CoRR abs/1701.05268 (2017)
- Chum, O., Matas, J., Kittler, J.: Locally optimized RANSAC. In: Joint Pattern Recognition Symposium, Springer (2003)
- Lebeda, K., Matas, J., Chum, O.: Fixing the locally optimized RANSAC. In: British Machine Vision Conference, Citeseer (2012)
- Hartley, R., Zisserman, A.: Multiple view geometry in computer vision. Cambridge university press (2003)
- Hartley, R.I.: In defense of the eight-point algorithm. Transactions on Pattern Analysis and Machine Intelligence (1997)
- Chum, O., Werner, T., Matas, J.: Epipolar geometry estimation via RANSAC benefits from the oriented epipolar constraint. In: International Conference on Pattern Recognition. (2004)
- Strecha, C., Fransens, R., Van Gool, L.: Wide-baseline stereo from multiple views: a probabilistic account. In: Conference on Computer Vision and Pattern Recognition, IEEE (2004)
- Lowe, D.G.: Object recognition from local scale-invariant features. In: International Conference on Computer vision, IEEE (1999)
- Benedek, C., Szirányi, T.: Change detection in optical aerial images by a multilayer conditional mixed markov model. Transactions on Geoscience and Remote Sensing (2009)