Closed-Form Optimal Two-View Triangulation Based on Angular Errors

Closed-Form Optimal Two-View Triangulation Based on Angular Errors

Seong Hun Lee          Javier Civera
I3A, University of Zaragoza, Spain
{seonghunlee, jcivera}@unizar.es
Abstract

In this paper, we study closed-form optimal solutions to two-view triangulation with known internal calibration and pose. By formulating the triangulation problem as and minimization of angular reprojection errors, we derive the exact closed-form solutions that guarantee global optimality under respective cost functions. To the best of our knowledge, we are the first to present such solutions. Since the angular error is rotationally invariant, our solutions can be applied for any type of central cameras, be it perspective, fisheye or omnidirectional. Our methods also require significantly less computation than the existing optimal methods. Experimental results on synthetic and real datasets validate our theoretical derivations.

1 Introduction

Recovering the position of a 3D point given its projections in two or more cameras is called triangulation. It constitutes a fundamental building block in stereo vision [33], simultaneous localization and mapping (SLAM) [22] and structure-from-motion (SfM) pipelines [29]. For large problems, reconstructing thousands (or millions) of points is not uncommon, so achieving fast and accurate triangulation is important for the performance of such systems.

If one assumes the exact knowledge of camera matrices and noiseless feature measurements, triangulation amounts to intersecting two backprojected rays that correspond to the same 3D point. In practice, however, this assumption is unrealistic, and the rays do not necessarily intersect. Therefore, a nontrivial method is required even for just two views.

The standard approach is to find the 3D point that minimizes a chosen cost function given the feature measurements. The most common are the norm (sum of magnitude), norm (sum of squares) and norm (maximum) of image reprojection errors. While reasonable for perspective cameras, the image reprojection error does not generalize well to different camera types (e.g., omnidirectional or fully spherical panoramic cameras). This motivates the use of angular reprojection error, a rotationally invariant alternative to the image reprojection error that is generic and independent of the projection geometry [21, 24, 27].

In this work, we derive, for the first time to our knowledge, the exact closed-form solutions to the and optimal triangulation from two views based on the angular reprojection error. Unlike iterative methods (e.g., [13, 17]), the proposed methods guarantee global optimality without any iterations, and unlike polynomial methods (e.g., [10, 23, 32]), they do not involve finding the roots of a higher-degree polynomial. Hence, our methods simultaneously provide the global optimality, speed and simplicity. We also present our own derivation of the optimal solution that is much more compact and geometrically intuitive than the existing one [24]. Since all three methods are based on the angular error, they are not limited to standard perspective cameras and can also be used for fisheye, omnidirectional and fully spherical panoramic cameras.

The paper is organized as follows. In the next three sections, we discuss the related work and preliminaries. Section 5, 6 and 7 respectively present the closed-form solutions to the , and optimal triangulation. To make our paper compact and easily accessible, we separated the proofs from our main findings and put them in the appendix. Section 8 addresses the cheirality constraint. Finally, experimental results are provided in Section 9, followed by the conclusions in Section 10.

2 Related Work

The most widespread approach to triangulation is to find the 3D point that minimizes the norm of image reprojection errors [8]. Assuming that image points are perturbed by Gaussian noise, the optimal solution gives the maximum likelihood estimate (MLE). This can be obtained in closed form by solving a polynomial of degree 6 for two views [10] and degree 47 for three views [32]. Such polynomial methods are, however, computationally expensive and susceptible to ill-conditioning [17]. Besides, an iterative search for the roots may converge to a local minimum [10].

Another two-view method by Kanatani et al. [13] iteratively corrects the 2D projections of the points. Although this method was shown to be faster than the one by Hartley and Sturm [10], it does not satisfy the epipolar constraint [19] in each iteration. Lindstrom [17] solved this problem with an improved iterative algorithm that is even more stable and faster. However, neither his method nor Kanatani’s guarantees global optimality. Oliensis [24] showed that by formulating the problem as minimization of the sine of angular reprojection errors, an exact closed-form solution can be derived for two-view triangulation.

Instead of minimizing the norm, one may choose to minimize the norm of reprojection errors. The advantage of norm is that it is more robust to outliers as it places less emphasis on large errors [10, 11]. For two views, Hartley and Sturm [10] showed that the optimal solution can be obtained in closed form by solving a polynomial of degree 8. They also found that the optimization gives slightly more accurate 3D results than the optimization.

In geometric problems, another popular norm is the norm. The optimal solution corresponds to the MLE under the assumption of uniform noise in the image points [6]. The advantage of the cost function over the cost is that it is relatively simpler and has a single minimum [9]. For the case of two views, Níster [23] showed that the optimal solution can be obtained in closed form by keeping the reprojection errors equal in the two views and solving the resulting quartic equation. A main drawback of the cost is that it is relatively more sensitive to outliers [9]. This being said, such sensitivity was shown to be useful for outlier removal [30, 26, 16].

While most of the aforementioned works formulate their optimization problem in terms of the image reprojection error, the angular reprojection error is another popular choice. It embodies a better noise model for fisheye or omnidirectional cameras [24, 20]. Even for perspective cameras, the assumption of Gaussian noise is not justified [6], and the angular reprojection error is just as valid as the image reprojection error, if not more so. In the literature, it has been proposed to minimize the sine of angular reprojection errors in norm [24], the tangent in or norm [7, 9, 12], and the cosine in negative norm [28, 3]. In contrast to these methods, our and optimization do not involve trigonometric functions.

3 Preliminaries on 3D Geometry

Throughout the paper, we adopt the following notation: We use bold letters for vectors and matrices, and light letters for scalars. The Euclidean norm of a vector is denoted by , and the unit vector by . The angle between two lines and is denoted by .

The following vector identities will come in handy later:

(1)
(2)

We also make frequent use of the following formulas:

  1. The distance between a point and a plane is given by where is the projection of onto . This is computed as follows:

    (3)
  2. The distance between two skew lines and is given by where and are the points on each line that form the closest pair. Letting and , this is computed as follows:

    (4)

    The two points can also be obtained individually [14]:

    (5)
    (6)

Equation (4) can be interpreted as the minimum amount of translation required for the two lines to intersect. In this work, it will be also important to know the minimum amount of rotation (or pivot) required for the two lines to intersect. We answer this question in the following lemma:

Lemma 1 (Minimum Pivot Angle for Intersection)

Given two skew lines and , let be the line that forms the smallest angle to among all possible lines that intersect both point and line . Then, is the projection of onto the plane that contains and . Furthermore, letting and ,

(7)

We call the minimum pivot angle for intersection, as it represents the smallest angle required for pivoting line at to make it intersect .

Proof. Refer to Appendix A.

4 Preliminaries on Two-View Triangulation

Figure 1: The difference between the observed features (, ) and the triangulation result (, ) can be quantified by either image reprojection errors (, ) or angular reprojection errors (, ).

Consider two cameras and observing the same 3D world point . Let and be their positions in the world frame, and let and be the rotation matrix and translation vector that together transform a point from the camera frame to , i.e., , where and correspond to in camera frame and , respectively. Since triangulation is impossible for zero translation, we set without loss of generality. Let and be the homogeneous pixel coordinates of the estimated correspondence to in each frame. Given the camera calibration matrix , the normalized image coordinates and are related to and by and .

The two backprojected rays in frame , i.e., and , do not necessarily intersect due to inaccuracies in the image measurements and camera matrices. For the rays to intersect, and must be corrected to and such that the epipolar constraint [19] is satisfied. It enforces the coplanarity of , and , and is given by

(8)

The goal of the optimal triangulation is to minimally correct the feature rays so that they satisfy (8) and intersect at some point in frame . What is meant by “minimal” depends on the chosen cost function and error criterion. Fig. 1 illustrates two most popular error criteria, namely the image reprojection error and the angular reprojection error. Formally, they are defined as follows:

(9)
(10)

In this work, we minimize the latter in , and norms. Once we have the optimal and , the point of intersection can be obtained using either (5) or (6):

(11)

where equals the depth multiplied by for .

Figure 2: Example scenarios satisfying the epipolar constraint (8). The epipolar plane (shown in green) contains both the rays and the camera centers. Cheirality condition is violated in case (b) and (c).

Note that the epipolar constraint (8) is a necessary condition for intersecting the two rays, but not a sufficient one. Fig. 2 illustrates scenarios where the two rays are coplanar, but do not intersect. This happens when the intersection requires negative depth(s), violating the cheirality constraint [8]. In the following analysis (until Section 8), we will temporarily assume that satisfying the epipolar constraint (8) is sufficient for intersecting the rays.

5 Closed-Form Triangulation

The triangulation based on the angular reprojection error (10) finds the feature rays and that minimize subject to the epipolar constraint (8). The following lemma reveals a surprising fact that is achieved by correcting either one of or , but not both:

Lemma 2 ( Angle Minimization)

Given two skew lines and , consider any two intersecting lines that also pass and , respectively, i.e., and . Let , , , and . Then, is minimized for the following and :

(12)
(13)

Proof. Refer to Appendix B.

By substituting and into and in the above lemma, the resulting and become the corrected rays and that satisfy the optimality, and (or ) becomes the normal of the corresponding epipolar plane.

6 Closed-Form Triangulation

Considering that the angular errors are small in practice, the “relaxed” triangulation finds the feature rays and that minimize (instead of ) subject to the epipolar constraint (8). Note that the small-angle approximation by is more accurate than by or that have been used in literature [7, 9, 12, 28, 3]. This is easily seen by comparing their Maclaurin expansions. As will be shown in the following lemma (and previously in [24]), the relaxation with the sine function allows us to derive the optimal solution in closed form.

Lemma 3 ( Angle Minimization)

Given two skew lines and , consider any two intersecting lines that also pass and , respectively, i.e., and . Let , and . Then, is minimized for

(14)

where is the second column of the matrix from

(15)

Proof. Refer to Appendix C.

Analogously to the method, substituting and into and in the above lemma gives and that satisfy the optimality.

Input:        Calib. matrix (), relative pose (, ), and a
       match (, ) from two views ().
Output:           Triangulated 3D point () in ref. frame .
1) , , , .
2) For triangulation:
If , use (12) to obtain and . Otherwise, use (13).
For triangulation:
Compute and from (14) and (15).
For triangulation:
Compute and from (16) and (17).
3) and .
4) Check cheirality:
(i)  Obtain and from (11). (ii) Discard the point and terminate if either      or .
5) Check angular reprojection errors:
(i)   and . (ii) Discard the point and terminate if      for some small .
6) Check parallax:
(i)   (ii) Discard the point and terminate if      for some small .
7) Compute and return from (11).
Table 1: Summary of the proposed methods.

7 Closed-Form Triangulation

The triangulation based on the angular reprojection error (10) finds the feature rays and that minimize subject to the epipolar constraint (8). The following lemma states that this is achieved when :

Lemma 4 ( Angle Minimization)

Given two skew lines and , consider any two intersecting lines that also pass and , respectively, i.e., and . Let , , , and . Then, is minimized when . This is achieved for

(16)

where

(17)

Proof. Refer to Appendix D.

Analogously to the previous two methods, substituting and into and gives and that satisfy the optimality.

Figure 3: Top row: Real dataset images. Bottom row: Main segments of the median reconstruction results using the proposed method.
Figure 4: Five scenarios where the optimal solution violates the cheirality constraint, and the possible reattempts for triangulation.

8 Cheirality, Parallax and Outliers

We have used the term lines instead of rays in all lemmas so far, ignoring the cheirality constraint [8]. We argue that if the optimal solution violates the cheirality constraint, the most reasonable choice is to simply discard the result. In the following, we provide the rationale for this choice.

Fig. 4 illustrates five scenarios where the optimal solution violates the cheirality constraint. In case (a), both rays have negative depths at the optimal intersection. Increasing the allowed angular reprojection error, the first intersection with positive depths occurs when the two corrected rays become parallel, resulting in a point at infinity. This point cannot be triangulated, so it should be discarded.

In the remaining cases, the optimal intersection involves only one of the rays having a negative depth. Following the same procedure, the first intersection with positive depths occurs either at infinity (case (b)), at one of the camera centers (case (c)), along the ray parallel to the translation (case (d)), or at a point somewhere else (case (e)).

In case (b), (c) and (d), the newly triangulated point has either infinite, zero or ambiguous depth, so it is reasonable to discard it. In case (e), we found that reattempting the triangulation with positive depths yields either a very large error, a point near the epipole or a low parallax angle. Typically, these are the indicators of low accuracy or an outlier [10, 8], so a reasonable choice is to discard the match. This procedure is outlined in Step 4–6 of Tab. 1.

9 Experimental Results

We evaluate the proposed methods in comparison to the midpoint method [2, 10], Hartley and Sturm’s and method [10], Lindstrom’s method with five iterations [17], and Níster’s method [23]. The evaluation was performed on both synthetic and real datasets. We generated the synthetic datasets as follows: A set of point clouds of 2,500 points each are generated with a Gaussian radial distribution where is the distance from the world origin. Each point cloud is centered at for with , and their image projections are perturbed by Gaussian noise for . The size and the focal length of the images are pixels and pixel, respectively. We have three configurations for the camera poses: (1) “orbital” - the cameras at pointing at the point cloud center, (2) “lateral” - the cameras at pointing at , and (3) “forward” - the cameras at pointing at the point cloud center. The poses are slightly perturbed with uniform noise . For real datasets, we used the Oxford Dinosaur, Model House and Corridor [1], Notre Dame [31] and Fountain [4, 25] dataset. In total, the synthetic and real datasets provide over 5.5 million unique triangulation problems in a wide variety of geometric configurations.

Figure 5: 3D triangulation errors before (top) and after (bottom) discarding the points with the lowest 5% parallax.
Midpoint img img img img ang ang ang
[2, 10] [10] [10] 5 it. [17] [23]

Error Criterion

- - - - - 100 % - -
- - 7e-5 % 5e-5 % - - 99.9999 % -
- - - - - - 100 % -
- - - - - - - 100 %
- 70.84% 0.002% 0.002% - 29.16 % - -
- - 23.14 % 76.86 % - - - -
- - - - 100 % - - -
Table 2: Percentage of the total matches (from all synthetic and real datasets) for which each method yields the lowest error in given criterion. “img/ang”: optimal in the image/angular errors. See the supplementary material for the results from individual datasets.
Midpoint img img img img img ang ang ang
[2, 10] [10] [10] [23] 2 it. [17] 5 it. [17]
Points/sec 42 M 65 K 92 K 270 K 1.4 M 520 K 29 M 670 K 14 M
Relative Speed 1.0 0.0016 0.0022 0.0064 0.033 0.013 0.71 0.016 0.33
Table 3: Speed of computing a 3D point. The relative speed is normalized by that of the midpoint method. Note that this does not take into account Step 4–6 of Tab. 1. All algorithms were implemented in C++ and run on a laptop CPU (Intel i7-4810MQ, 2.8 GHz).

Tab. 2 provides the percentage of the total matches (from both synthetic and real datasets) for which each method yields the lowest error in given criterion. In 100 % of the total triangulation problems, all three of our methods yield the lowest errors in their corresponding optimal criterion. We also see that minimizing is very close to minimizing , as discussed in Section 6. Since our angular method is numerically stable, it sometimes finds better solutions than Hartley-Sturm’s closed-form method [10] even in the image error criterion ().

In Fig. 5, histograms are given for the 3D reconstruction errors on the synthetic datasets. It shows that 1) all methods exhibit similar 3D accuracy, and 2) discarding low-parallax points (Step 6 of Tab. 1) helps to remove large 3D errors. Qualitatively, we also found that the reconstructions of the real datasets look similar for all methods. Fig. 3 shows the reconstruction results using the proposed method.

We compare the speed of each algorithm in Tab. 3. The midpoint method is the fastest, as it directly computes the 3D point using (11) without correcting the feature rays or image points. Among the optimal methods, our and methods are significantly faster than the rest, i.e., at least 1–2 orders of magnitude faster than the state-of-the-art [17].

10 Conclusions

In this work, we derived optimal closed-form solutions to the , and stereo triangulation based on the angular reprojection error. The proposed triangulation methods are extremely simple and fast, and they guarantee global optimality under respective cost functions. We believe that our findings will be particularly useful for large-scale SfM and real-time visual SLAM algorithms.

Acknowledgement

This work was partially supported by the Spanish government (project PGC2018-096367-B-I00) and the Aragón regional government (Grupo DGA-T45_17R/FSE).

Appendix

A Proof of Lemma 1

Figure 6: The angle is the smallest angle required for pivoting line at point to make it intersect .

Consider a right circular cone with apex and axis , lying sideways on a plane that contains and line (see Fig. 6). The equation of the plane is given by

(18)

The line of intersection between the plane and the cone forms the smallest angle to among all possible lines on the plane that pass . That is, it forms the smallest angle to among all possible lines that pass both and . Hence, this line of intersection must be . Now, consider a point located one unit away from along . Let be the projection of onto plane . According to lemma 5 in Appendix E, the point must be located along . Let , i.e., the distance between and plane . Then, we obtain as follows:

B Proof of Lemma 2

One of the following is true when is minimized:

  1. and and .

  2. and and .

  3. and and .

Suppose, for the sake of argument, that one of the first two statements is true. In the first case, lemma 1 states that is obtained by projecting onto the plane with the normal , which leads to (12) and

(19)

Likewise, in the second case, lemma 1 leads to (13), and

(20)

Now, the question is how to determine which of the two statements is true. Comparing the right-hand side of (19) and (20), we find that if , then

(21)

and is equal to the left-hand side of (21), indicating that the first statement is true. Naturally, the second statement is true otherwise. Note that there is an ambiguity if , and the solution is optimal whichever case is considered. This concludes the proof of lemma 2 for the first two cases.

Figure 7: When two cones intersect at a single point on their lateral surface, they are tangent to the same plane containing and the apexes of each cone. For visualization purposes, we show the two cones lying on each side of the plane separately.

We will now prove that the third case never occurs. Given some angle , minimizing is equivalent to minimizing . This is the identical situation as the first case if we replace by . We know from the proof of lemma 1 that pivoting a line to intersect another with minimum angle can be modeled by a cone lying sideways on a plane. The top of Fig. 7 illustrates this. Similarly, the bottom of Fig. 7 illustrates the minimization of with respect to given . Now, since both planes touching each cone contain the same two intersecting lines and , they must be the same plane. Let this plane be . According to lemma 1, is the projection of onto plane . Therefore,

(22)

where is the unit normal of plane . Since contains both and , is perpendicular to . Hence, computing the dot product with on each side of (22) yields

(23)

Note that corresponds to the magnitude of the projection of onto plane for non-zero , so it must be smaller than . Thus,

(24)

Using (2), this inequality can be written as

(25)

Analogously, we can also derive

(26)

Now, suppose that

(27)

Without loss of generality, let us assume that . Then, (25) gives . As we discussed for the first two cases, this means that pivoting to intersect takes smaller angle than pivoting to intersect , i.e., . Thus

(28)

According to lemma 6 in Appendix E, pivoting a line twice for intersection takes equal or greater angle than the single minimum pivot angle. Therefore,

(29)

which contradicts (27). Therefore, is minimized when either or is zero.

C Proof of Lemma 3

Given some angle , is achieved by minimizing and vice versa. As discussed in the proof of lemma 2, this means that the underlying geometry at can be represented by the two cones with apex , and skew axes , , respectively, touching each side of the same plane on their lateral surface. This is visualized in Fig. 7. Let be the normal of plane . From Lemma 1, we know that

(30)

Combining these two equations, we get

(31)

Since plane contains both and , is perpendicular to . Therefore, minimizing is equivalent to solving the following equality-constrained quadratic programming problem:

(32)

In [5], it was shown that this problem can be solved using the method of Lagrange multipliers, and is minimized when is the eigenvector corresponding to the smallest nontrivial eigenvalue of . Letting , it can be easily shown that . Hence, . Note that for any square matrix and , the eigen-decomposition of is the same as that of . This means that the eigenvectors of are the same as those of , i.e., the right-singular vectors of . Therefore, letting with the diagonal entries of in descending order, the optimal is given by the second column of . Finally, projecting and onto plane leads to (14).

D Proof of Lemma 4

First, we show that when is minimized: Consider two cones with apex , and skew axes , . Constrain both their apertures to be . When , the they are simply two skew lines. As we gradually increase , they will grow at the same rate, and eventually, touch one another. Let at this point. Now, suppose

(33)

The definition of implies that setting will make the two cones partially overlap in space (or at least meet at a point). However, the they do not meet when . This is a contradiction, so the inequality in (33) must be false, and must be equal to . That is, in order for to be minimized.

We can now represent the underlying geometry at as two congruent cones with skew axes, touching each side of the same plane on their lateral surface. This is the situation shown in Fig. 7 for . Let be the normal of plane . Then, from lemma 1, we get

(34)

The last equality in (34) can be written as

(35)

where is or , depending on the signs of