An Efficient Algebraic Solution to the Perspective-Three-Point Problem

# An Efficient Algebraic Solution to the Perspective-Three-Point Problem

###### Abstract

In this work, we present an algebraic solution to the classical perspective-3-point (P3P) problem for determining the position and attitude of a camera from observations of three known reference points. In contrast to previous approaches, we first directly determine the camera’s attitude by employing the corresponding geometric constraints to formulate a system of trigonometric equations. This is then efficiently solved, following an algebraic approach, to determine the unknown rotation matrix and subsequently the camera’s position. As compared to recent alternatives, our method avoids computing unnecessary (and potentially numerically unstable) intermediate results, and thus achieves higher numerical accuracy and robustness at a lower computational cost. These benefits are validated through extensive Monte-Carlo simulations for both nominal and close-to-singular geometric configurations.

An Efficient Algebraic Solution to the Perspective-Three-Point Problem

 Tong Ke University of Minnesota Minneapolis, MN 55455 kexxx069@cs.umn.edu
 Stergios Roumeliotis University of Minnesota Minneapolis, MN 55455 stergios@cs.umn.edu

The Perspective-n-Point (PnP) is the problem of determining the 3D position and orientation (pose) of a camera from observations of known point features. The PnP is typically formulated and solved linearly by employing lifting (e.g., [?]), or as a nonlinear least-squares problem minimized iteratively (e.g., [?]) or directly (e.g., [?]). The minimal case of the PnP (for n=3) is often used in practice, in conjunction with RANSAC, for removing outliers [?].

The first solution to the P3P problem was given by Grunert [?] in 1841. Since then, several methods have been introduced, some of which [?, ?, ?, ?, ?, ?] were reviewed and compared, in terms of numerical accuracy, by Haralick et al. [?]. Common to these algorithms is that they employ the law of cosines to formulate a system of three quadratic equations in the features’ distances from the camera. They differ, however, in the elimination process followed for arriving at a univariate polynomial. Later on, Quan and Lan [?] and more recently Gao et al. [?] employed the same formulation but instead used the Sylvester resultant [?] and Wu-Ritz’s zero-decomposition method [?], respectively, to solve the resulting system of equations, and, in the case of [?], to determine the number of real solutions. Regardless of the approach followed, once the feature’s distances have been computed, finding the camera’s orientation, expressed as a unit quaternion [?] or a rotation matrix [?], often requires computing the eigenvectors of a  matrix (e.g., [?]) or performing singular value decomposition (SVD) of a  matrix (e.g., [?]), respectively, both of which are time-consuming. Furthermore, numerical error propagation from the computed distances to the rotation matrix significantly reduces the accuracy of the computed pose estimates.

To the best of our knowledge, the first method111Nister and Stewenius [?] also follow a geometric approach for solving the generalized P3P resulting into an octic univariate polynomial whose odd monomials vanish for the case of the central P3P. that does not employ the law of cosines in its P3P problem formulation is that of Kneip et al. [?], and later on that of Masselli and Zell [?]. Specifically, [?] and [?] follow a geometric approach for avoiding computing the features’ distances and instead directly solve for the camera’s pose. In both cases, however, several intermediate terms (e.g., tangents and cotangents of certain angles) need to be computed, which negatively affect the speed and numerical precision of the resulting algorithms.

Similar to [?] and [?], our proposed approach does not require first computing the features’ distances. Differently though, in our derivation, we first eliminate the camera’s position and the features’ distances to result into a system of three equations involving only the camera’s orientation. Then, we follow an algebraic process for successively eliminating two of the unknown 3-dof and arriving into a quartic polynomial. Our algorithm (summarized in Alg. 1) requires fewer operations and involves simpler and numerically more stable expressions, as compared to either [?] or [?], and thus performs better in terms of efficiency, accuracy, and robustness. Specifically, the main advantages of our approach are:

• Our algorithm’s implementation takes about 40% of the time required by the current state of the art [?]. 222Although Masselli and Zell [?] claim that their algorithm runs faster than Kneip et al.’s [?], our results (see Section An Efficient Algebraic Solution to the Perspective-Three-Point Problem) show the opposite to be true (by a small margin). The reason we arrive at a different conclusion is that our simulation randomly generates a new geometric configuration for each run, while Masselli employs only one configuration during their entire simulation, in which they save time due to caching.

• Our method achieves better accuracy than [?, ?] under nominal conditions. Moreover, we are able to further improve the numerical precision by applying root polishing to the solutions of the quartic polynomial while remaining faster than [?, ?].

• Our algorithm is more robust than [?, ?] when considering close-to-singular configurations (the three points are almost collinear or very close to each other).

The remaining of this paper is structured as follows. Section An Efficient Algebraic Solution to the Perspective-Three-Point Problem presents the definition of the P3P problem, as well as our derivations for estimating first the orientation and then the position of the camera. In Section An Efficient Algebraic Solution to the Perspective-Three-Point Problem, we assess the performance of our approach against  [?] and [?] in simulation for both nominal and singular configurations. Finally, we conclude our work in Section An Efficient Algebraic Solution to the Perspective-Three-Point Problem.

Given the positions, , of three known features , with respect to a reference frame , and the corresponding unit-vector, bearing measurements, , , our objective is to estimate the position, , and orientation, i.e., the rotation matrix , of the camera .

From the geometry of the problem (see Fig. An Efficient Algebraic Solution to the Perspective-Three-Point Problem), we have (for ):

 Gpi =GpC+diGCCCbi (1)

where is the distance between the camera and the feature . Figure \thefigure: The camera {C}, whose position, GpC, and orientation, GCC, we seek to determine, observes unit-vector bearing measurement Cbi of a feature fi, whose position, Gpi, is known.

In order to eliminate the unknown camera position, , and feature distance, , we subtract pairwise the three equations corresponding to (An Efficient Algebraic Solution to the Perspective-Three-Point Problem) for and , and project them on the vector to yield the following system of 3 equations in the unknown rotation :

 (Gp1−Gp2)TGCC(Cb1×Cb2) =0 (2) (Gp1−Gp3)TGCC(Cb1×Cb3) =0 (3) (Gp2−Gp3)TGCC(Cb2×Cb3) =0 (4)

Next, and in order to compute one of the 3 unknown degrees of rotational freedom, we introduce the following factorization of :

 GCC=C(k1,θ1)C(k2,θ2)C(k3,θ3) (5)

where333 denotes the rotation matrix describing the rotation about the unit vector, , by an angle . Note that in the ensuing derivations, all rotation angles are defined using the left-hand rule.

 k1≜Gp1−Gp2∥Gp1−Gp2∥, k3≜Cb1×Cb2∥Cb1×Cb2∥, k2≜k1×k3∥k1×k3∥ (6)

Substituting (An Efficient Algebraic Solution to the Perspective-Three-Point Problem) in (An Efficient Algebraic Solution to the Perspective-Three-Point Problem), yields a scalar equation in the unknown :

 kT1C(k2,θ2)k3 =0 (7)

which we solve by employing Rodrigues’ rotation formula [?]:444 denotes the skew-symmetric matrix corresponding to such that , . Note also that if is a unit vector, then , while for two vectors , , . Lastly, it is easy to show that

 C(k2,θ2)=cosθ2I−sinθ2⌊k2⌋+(1−cosθ2)k2kT2 (8)

to get

 θ2=arccos(kT1k3)±π2 (9)

Note that we only need to consider one of these two solutions [in our case, we select ; see Fig. An Efficient Algebraic Solution to the Perspective-Three-Point Problem], since the other one will result in the same (see Appendix An Efficient Algebraic Solution to the Perspective-Three-Point Problem for a formal proof).
In what follows, we describe the process for eliminating from (An Efficient Algebraic Solution to the Perspective-Three-Point Problem) and (An Efficient Algebraic Solution to the Perspective-Three-Point Problem), and eventually arriving into a quartic polynomial involving a trigonometric function of . To do so, we once again substitute in (An Efficient Algebraic Solution to the Perspective-Three-Point Problem) and (An Efficient Algebraic Solution to the Perspective-Three-Point Problem) the factorization of defined in to get (for ):

 uTiC(k1,θ1)C(k2,θ2)C(k3,θ3)vi=0 (10)

where

 ui≜Gpi−Gp3,  vi ≜Cbi×Cb3,  i=1,2, (11)

and employ the following property of rotation matrices

 C(k1,θ1)C(k2,θ2)CT(k1,θ1)=C(C(k1,θ1)k2,θ2)

to rewrite (An Efficient Algebraic Solution to the Perspective-Three-Point Problem) in a simpler form as

 uTiC(k1,θ1)C(C(k2,θ2)k3,θ3)C(k2,θ2)vi =0 ⇒uTiC(k1,θ1)C(k′3,θ3)v′i =0 (12)

where

 v′i ≜C(k2,θ2)vi, i=1,2 k′3 ≜C(k2,θ2)k3=k2×k1 (13)

The last equality in (13) is geometrically depicted in Fig. An Efficient Algebraic Solution to the Perspective-Three-Point Problem and algebraically derived in Appendix An Efficient Algebraic Solution to the Perspective-Three-Point Problem. Analogously, it is straightforward to show that

 k′1≜CT(k2,θ2)k1=k3×k2 Figure \thefigure: Geometric relation between unit vectors k1, k2, k3, k′3, k′′3, and u1. Note that k1, k3, and k′3 belong to a plane π1 whose normal is k2. Also, k2, k′3, and k′′3 lie on a plane, π2, normal to π1.

Next, by employing Rodrigues’ rotation formula [see (An Efficient Algebraic Solution to the Perspective-Three-Point Problem)], for expressing the product of a rotation matrix and a vector as a linear function of the unknown , i.e.,

 C(k,θ)v =(−cosθ⌊k⌋2−sinθ⌊k⌋+kkT)v =[−⌊k⌋2v−⌊k⌋v][cosθsinθ]+(kTv)k (14)

in (12) yields (for ):

 ([−⌊k1⌋2ui⌊k1⌋ui][cosθ1sinθ1]+(kT1ui)k1)T ⋅ ([−⌊k′3⌋2v′i−⌊k′3⌋v′i][cosθ3sinθ3]+(k′3Tv′i)k′3)=0 (15)

Expanding (15) and rearranging terms, yields (for )

 [cosθ1sinθ1]T[uTi⌊k1⌋2⌊k′3⌋2v′iuTi⌊k1⌋2⌊k′3⌋v′iuTi⌊k1⌋⌊k′3⌋2v′iuTi⌊k1⌋⌊k′3⌋v′i][cosθ3sinθ3] + = (k′3Tv′i)[uTi⌊k1⌋⌊k′3⌋k1uTi⌊k1⌋k′3][cosθ1sinθ1] (16)

Notice that the term appears three times in (An Efficient Algebraic Solution to the Perspective-Three-Point Problem), and

 uT1⌊k1⌋⌊k′3⌋ =uT1k′3kT1 =(Gp1−Gp3)T⌊k1⌋⌊k′3⌋ =(Gp1−Gp2+Gp2−Gp3)T⌊k1⌋⌊k′3⌋ =(Gp2−Gp3)T⌊k1⌋⌊k′3⌋ =uT2k′3kT1=uT2⌊k1⌋⌊k′3⌋ (17)

This motivates to rewrite (12) as (for ):

 0 =uTiC(k1,θ1)C(k′3,θ3)v′i =uTiC(k1,θ1)C(k1,−ϕ)C(k1,ϕ)C(k′3,θ3)v′i =uTiC(k1,θ1−ϕ)C(C(k1,ϕ)k′3,θ3)C(k1,ϕ)v′i =uTiC(k1,θ′1)C(k′′3,θ3)v′′i (18)

where

 θ′1≜θ1−ϕ, v′′i≜C(k1,ϕ)v′i, k′′3≜C(k1,ϕ)k′3 (19)

To simplify the equation analogous to (An Efficient Algebraic Solution to the Perspective-Three-Point Problem) that will result from (18) [instead of (An Efficient Algebraic Solution to the Perspective-Three-Point Problem)], we seek to find a (not necessarily unique) such that , and hence, [see (17)], i.e.,

 0 =uT1k′′3=uT1C(k1,ϕ)k′3 (20) =uT1(cosϕI−sinϕ⌊k1⌋+(1−cosϕ)k1kT1)k′3 =cosϕuT1k′3−sinϕuT1⌊k1⌋k′3 =cosϕuT1k′3−sinϕuT1k2
 ⇒[cosϕsinϕ] =1δ[uT1k2uT1k′3] (21)

where

 δ≜√(uT1k′3)2+(uT1k2)2=∥u1×k1∥ (22)

and thus [from (19) using (An Efficient Algebraic Solution to the Perspective-Three-Point Problem)]

 k′′3 =cosϕk′3−sinϕ⌊k1⌋k′3+(1−cosϕ)k1kT1k′3 =(k′3kT2u1−k2k′3Tu1)/δ=u1×(k′3×k2)/δ =u1×k1∥u1×k1∥ (23)

Now, we can expand (18) using (14) to get an equation analogous to (An Efficient Algebraic Solution to the Perspective-Three-Point Problem):

 (24)

Substituting [see (17)] in (24) and renaming terms, yields (for ):

 = [0¯fi3][cosθ′1sinθ′1] ⇒ [¯fi1cosθ′1+¯fi4¯fi2cosθ′1+¯fi5][cosθ3sinθ3]=¯fi3sinθ′1 (25)

where555The simplified expressions for the following terms, shown after the second equality, require lengthy algebraic derivations which we omit due to space limitations.

 ¯fi1 ≜uTi⌊k1⌋2⌊k′′3⌋2v′′i=δvTik2 ¯fi2 ≜uTi⌊k1⌋2⌊k′′3⌋v′′i=δvTik′1 ¯fi3 ≜(k′′3Tv′′i)uTi⌊k1⌋k′′3=δvTik3 ¯fi4 ≜−(uTik1)kT1⌊k′′3⌋2v′′i=(uTik1)(vTik′1) ¯fi5 ≜−(uTik1)kT1⌊k′′3⌋v′′i=−(uTik1)(vTik2)

For , (25) results into the following system:

 [¯f11cosθ′1+¯f14¯f12cosθ′1+¯f15¯f21cosθ′1+¯f24¯f22cosθ′1+¯f25][cosθ3sinθ3]=[¯f13¯f23]sinθ′1 (26)

Note that since , we can further simplify (26) by introducing , where

 [cosθ′3sinθ′3]≜[¯f11cosθ3+¯f12sinθ3√¯f211+¯f212−¯f14cosθ3+¯f15sinθ3√¯f214+¯f215]T (27)

Replacing by in (26), we have

 (28)

where

 f11 ≜δkT3Cb3 (29) f21 ≜δ(CbT1Cb2)(kT3Cb3) (30) f22 ≜δ(kT3Cb3)∥Cb1×Cb2∥ (31) f13 ≜¯f13=δvT1k3 (32) f23 ≜¯f23=δvT2k3 (33) f24 ≜(uT2k1)(kT3Cb3)∥Cb1×Cb2∥ (34) f15 ≜−(uT1k1)(kT3Cb3) (35) f25 ≜−(uT2k1)(CbT1Cb2)(kT3Cb3) (36)

From (28), we have

 [cosθ′3sinθ′3]= ⋅ (37)

Computing the norm of both sides of (An Efficient Algebraic Solution to the Perspective-Three-Point Problem), results in

which is a 4th-order polynomial in that can be compactly written as:

 4∑j=0αjcosjθ′1=0 (38)

with

 α4 ≜g25+g21+g23 (39) α3 ≜2(g5g6+g1g2+g3g4) (40) α2 ≜g26+2g5g7+g22+g24−g21−g23 (41) α1 ≜2(g6g7−g1g2−g3g4) (42) α0 ≜g27−g22−g24 (43) g1 ≜f13f22 (44) g2 ≜f13f25−f15f23 (45) g3 ≜f11f23−f13f21 (46) g4 ≜−f13f24 (47) g5 ≜f11f22 (48) g6 ≜f11f25−f15f21 (49) g7 ≜−f15f24 (50)

We compute the roots of (38) in closed form to find . Similarly to [?] and [?], we employ Ferrari’s method [?] to attain the resolvent cubic of (38), which is subsequently solved by Cardano’s formula [?]. Once the (up to) four real solutions of (38) have been determined, an optional step is to apply root polishing following Newton’s method, which improves accuracy for minimal increase in the processing cost (see Section \thefigure). Regardless, for each solution of , we will have two possible solutions for , i.e.,

 sinθ′1=±√1−cos2θ′1 (51)

which, in general, will result in two different solutions for . Note though that only one of them is valid if we use the fact that (see Appendix An Efficient Algebraic Solution to the Perspective-Three-Point Problem).

Next, for each pair of , we compute and from (An Efficient Algebraic Solution to the Perspective-Three-Point Problem), which can be written as

 [cosθ′3sinθ′3]=sinθ′1g5cos2θ′1+g6cosθ′1+g7[g1cosθ′1+g2g3cosθ′1+g4] (52)

Lastly, instead of first computing from (19) and from (27) to find using (An Efficient Algebraic Solution to the Perspective-Three-Point Problem), we hereafter describe a faster method for recovering . Specifically, from (An Efficient Algebraic Solution to the Perspective-Three-Point Problem), (12) and (18), we have

 GCC =C(k1,θ1)C(k2,θ2)C(k3,θ3) =C(k1,θ1)C(k′3,θ3)C(k2,θ2) (53)

Since is perpendicular to , we can construct a rotation matrix such that

 ¯C=[k1k′′3k1×k′′3]

and hence

 k1=¯Ce1, k′′3=¯Ce2 (54)

where

Substituting (54) in (53), we have

 GCC =¯CC(e1,θ′1)C(e2,θ3)¯CTC(k1,ϕ)C(k2,θ2) =¯CC(e1,θ′1)C(e2,θ3)C(e2,θ′3−θ3)¯¯C =¯CC(e1,θ′1)C(e2,θ′3)¯¯C (55)

where

 ¯¯C ≜C(e2,θ3−θ′3)¯CTC(k1,ϕ)C(k2,θ2) \lx@stackrel(???)=[Cb1k3Cb1×k3]T

The advantages of (55) are: (i) The matrix product can be computed analytically; (ii) are invariant to the (up to) four possible solutions and thus, we only need to construct them once.

Substituting in (An Efficient Algebraic Solution to the Perspective-Three-Point Problem) the expression for from (62) and rearranging terms, yields

 GpC =Gp3−δsinθ′1kT3Cb3GCCCb3 (56)

Note that we only use (An Efficient Algebraic Solution to the Perspective-Three-Point Problem) for to compute from . Alternatively, we could find the position using a least-squares approach based on (An Efficient Algebraic Solution to the Perspective-Three-Point Problem) for (see Appendix An Efficient Algebraic Solution to the Perspective-Three-Point Problem), if we care more for accuracy than speed. Lastly, the proposed P3P solution is summarized in Alg. 1.

Our algorithm is implemented666Our code is submitted along with the paper as supplemental material. in C++ using the same linear algebra library, TooN [?], as [?]. We employ simulation data to test our code and compare it to the solutions of [?] and [?]. For each single P3P problem, we randomly generate three 3D landmarks, which are uniformly distributed in a cuboid centered around the origin. The position of the camera is , and its orientation is .

We generate simulation data without adding any noise or rounding error to the bearing measurements, and run all three algorithms on 50,000 randomly-generated configurations to assess their numerical accuracy. Note that the position error is computed as the norm of the difference between the estimate and the ground truth of . As for the orientation error, we compute the rotation matrix that transforms the estimated to the true one, convert it to the equivalent axis-angle representation, and use the absolute value of the angle as the error. Since there are multiple solutions to a P3P problem, we compute the errors for all of them and pick the smallest one (i.e., the root closest to the true solution).

The distributions and the means of the position and orientation errors are depicted in Fig.s An Efficient Algebraic Solution to the Perspective-Three-Point Problem - An Efficient Algebraic Solution to the Perspective-Three-Point Problem and Table An Efficient Algebraic Solution to the Perspective-Three-Point Problem. As evident, we get similar results to those presented in [?] for Kneip et al.’s [?] and Masselli and Zell’s methods [?], while our approach outperforms both of them by two orders of magnitude in terms of accuracy. This can be attributed to the fact that our algorithm requires fewer operations and thus exhibits lower numerical-error propagation.

Furthermore, and as shown in the results of Table An Efficient Algebraic Solution to the Perspective-Three-Point Problem, we can further improve the numerical precision by applying root polishing. Typically, two iterations of Newton’s method [?] lead to significantly better results, especially for the orientation, while taking only 0.01 s per iteration, or about 4% of the total processing time.

We use a test program that solves 100,000 randomly generated P3P problems and calculates the total execution time to evaluate the computational cost of the three algorithms considered. We run it on a 2.0 GHz4 Core laptop and the results show that our code takes 0.54 s on average (0.52 s without root polishing) while [?] and [?] take 1.3 s and 1.5 s, respectively. This corresponds to a 2.5 speed up (or 40% of the time of [?]). Note also, in contrast to what is reported in [?], Masselli’s method is actually slower than Kneip’s. As mentioned earlier, Masselli’s results in [?] are based on 1,000 runs of the same features’ configuration, and take advantage of data caching to outperform Kneip.

There are two typical singular cases that lead to infinite solutions in the P3P problem:

• Singular case 1: The 3 landmarks are collinear.

• Singular case 2: Any two of the 3 bearing measurements coincide.

In practice, it is almost impossible for these conditions to hold exactly, but we may still have numerical issues when the geometric configuration is close to these cases. To test the robustness of the three algorithms considered, we generate simulation data corresponding to small perturbations (uniformly distributed within ) of the features’ positions when in singular configurations. The errors are defined as in Section An Efficient Algebraic Solution to the Perspective-Three-Point Problem, while we compute the medians of them to assess the robustness of the three methods. For fairness, we do not apply root polishing to our code here. According to the results shown in Fig.s An Efficient Algebraic Solution to the Perspective-Three-Point Problem - An Efficient Algebraic Solution to the Perspective-Three-Point Problem and Tables An Efficient Algebraic Solution to the Perspective-Three-Point Problem - An Efficient Algebraic Solution to the Perspective-Three-Point Problem, our method achieves the best accuracy in these two close-to-singular cases. The reason is that we do not compute any quantities that may suffer from numerical issues, such as cotangent and tangent in [?] and [?], respectively.