Parallax Bundle Adjustment on Manifold with Convexified Initialization
Abstract
Bundle adjustment (BA) with parallax angle based feature parameterization has been shown to have superior performance over BA using inverse depth or XYZ feature forms. In this paper, we propose an improved version of the parallax BA algorithm (PMBA) by extending it to the manifold domain along with observationray based objective function. With this modification, the problem formulation faithfully mimics the projective nature in a camera’s image formation, BA is able to achieve better convergence, accuracy and robustness. This is particularly useful in handling diverse outdoor environments and collinear motion modes. Capitalizing on these properties, we further propose a posegraph simplification to PMBA, with significant dimensionality reduction. This posegraph model is convex in nature, easy to solve and its solution can serve as a good initial guess to the original BA problem which is intrinsically nonconvex. We provide theoretical proof that our global initialization strategy can guarantee a nearoptimal solution. Using a series of experiments involving diverse environmental conditions and motions, we demonstrate PMBA’s superior convergence performance in comparison to other BA methods. We also show that, without incremental initialization or via thirdparty information, our global initialization process helps to bootstrap the full BA successfully in various scenarios, sequential or outoforder, including some datasets from the “Bundle Adjustment in the Large” database.
1 Introduction
Structure from Motion (SfM) / visual SLAM estimates 3D scene structures and camera poses simultaneously from 2D images. Bundle adjustment is the gold standard method of SfM, in that it finds optimal pose and map in the least squares sense to best explain the data. Solving such a nonlinear least squares problem typically requires iterative Newton methodology: start with an initial guess, repetitively add increments by solving a normal equation until convergence.
GN  LM  DL 

As shown in Table 1, this approach comes in three forms: original GaussNewton (GN) when the equation is easy to solve (the Hessian matrix has a small condition number), Levenberg Macquardt (LM) as a damped GN if Hessian is near singular, and DogLeg (DL) as a combination of GN and the steepest descent method for fast convergence. LM is a favourite of the BA community for its safe handling despite its slowness. GN and DL are both considered risky due to the large step size and are often avoided.
Problematic features
In many modern BA systems [2][3][4], a 3D feature point is parameterized as Euclidean coordinates (XYZ) or inverse depth (IDP). A wellknown problem for these representations is that when far away features exist or when camera poses observing a feature are collinear with the feature, the Hessian becomes illconditioned. A small change in error function leads to a large jump in the state variable, significantly affecting BA’s robustness, efficiency and accuracy. See Fig. 1 and Fig. 8(a) for illustration of failure in conventional BA.
To deal with this problem, several remedies are commonly adopted. The fundamental principle is separate treatment for problematic features and good ones. ORBSLAM [5] uses a prudent feature selection strategy where features with insufficient parallax angles are discarded. A hybrid method was proposed in [6], that first estimates camera orientations with remote features then optimises with poses and near features. The vision smart factor proposed in [7] (implemented in GTSAM [4]) shares the same approach of [6]. It avoids degenerate cases by using a flexiblesize error function. Recently [8] proposed a solution in which less weighting is given to the error terms for “problematic” features.
Compared to the aforementioned methods, our proposed algorithm PMBA treats
the problem with a totally different viewpoint. We argue that the root cause for illconditioned cases is that feature uncertainty for conventional BAs is NOT uniformly bounded. In our previous work [9][1], we used three angles (elevation, azimuth and parallax) to define structure of a feature without involving depth. [1] demonstrated that this parameterization is closer to the measurement space of projective geometry, parallaxbased BA (we call it PBA in this paper) is
more robust and efficient compared to BA’s in XYZ or IDP form. We will present our improved manifold version – PMBA that faithfully complies with projective geometry in computer vision. This results in a nonsingular Hessian and a bounded error function that is suitable for faster implementation.
Initialization methods
BA due to its highly nonconvex nature, requires good initial estimate to converge to global minimum. The common initialization methods include
incremental or global. In incremental strategy, with a simple start, many midlevel BAs are performed on each new pose insertion. Incremental
strategy draws the criticism that it is slow and leads to drifting for long sequences of data. Example systems are VisualSFM [10], Bundler [11] and ORBSLAM [5]. The alternative is global initialization where all camera poses are initialised simultaneously. Global SfM thus bootstrapped shows higher efficiency and accuracy. This strategy exposes many research challenges, and has been studied intensively in [12][13][14][15].
This paper builds on the previous PBA algorithm [9][1] and makes the following improvements: (1) recalls the conventional BA methods and analyzes its limitations (Section 2); (2) an improved PBA on manifold formulation that is able to fully avoid “problematic feature” induced illconditioned cases (Section 3) ; (3) a simple but effective global initialization method using convexified posegraph model that is compatible with PMBA, which can guarantee a nearoptimal solution (Section 4); (4) to demonstrate the two improvements, we provide both theoretical proof and experimental results from a series of largescale datasets, sequential or outoforder (Section 5).
Notations:

is a skew symmetric matrix from vector , equivalent to crossproduct operator,

The term represents the camera pose at timestep .

Subscript indicates frame is local.

Decoration indicates vector is normalized: .
2 Background Knowledge
In this section, we first recall the monocular SLAM problem and conventional BA. We then analyze the potential problems in this formulation.
The visual SLAM problem estimates camera poses and feature positions from a set of images . When the feature is observed from the pose , the monocular sensor intercepts the light ray that passes through its centre to the feature point in the form of image pixel , as shown in Fig. 2(a). Table 2 lists different expressions the observation ray can have.


Global ray  Global ray direction  Local ray  Local ray direction 

The information encodes constitute constraints in a maximum a posterior (MAP) problem for poses and points.
(1) 
In conventional BA, the error function is given by:
(2) 
BA with conventional parameterization and cost function suffers from the issues listed below:

Illconditioned case due to problematic features: Although these features still contain some information, they cause singularity in the Hessian matrix, a main contribution to GN divergence and numerical instability.

Slow convergence: To deal with singularity, slow LM is commonly used for safe increment, DL and GN are avoided, and efficiency is compromised for stability.

Stop criteria: Small changes in the error cost lead to large variation in the state variable, making it difficult to specify a consistent stop criterion.

Local minimum: the error function (2) does not distinguish between infrustum or behind camera features, thus causing many local minima and saddle points. A good example is the twoview geometry problem in which there are multiple global minima for the BA formulation such that further manual intervention is needed to pick the feasible solution.
In light of above discussion, safehandling of illconditioned cases is vitally important for robustness, accuracy and efficiency of visual SLAM.
3 Parallax Bundle Adjustment on Manifold
In this section, we introduce the BA method using parallax angle in manifold domain (PMBA). We provide a thorough theoretical analysis on the boundedness of its information matrix, hence prove its smooth convergence without issues of singularity. We also show the error function is bounded and globally continuous. All these factors lead to possibility of faster optimization method DL, a significant improvement than previous work [9][1].
3.1 Feature parameterization
(a) Feature anchored by and , is parallax angle between anchor rays, is ray direction in main’s frame, is an arbitrary pose position covisible for .  (b) Convexification of PMBA: Rotate about by () becomes . 
A feature’s depth information is implied in the parallax between observations from different viewpoints. For a feature , amongst the set of cameras to which is visible, we choose a main anchor and an associate anchor that form best parallax angle from their observation rays. This geometric relationship among the feature is illustrated in Fig. 3(a). The feature can be overparameterized by the unit observation ray vector in mainanchor frame, and the parallax angle , i.e.,
(3) 
The new parameterization only defines the relative structure of the feature with respect to its two anchors. The scale of the feature is implicitly defined by the relative translation of the two anchors, computed as
(4)  
where

is the local depth of the feature in the main anchor frame, from sine rule.

is the rotation for main anchor frame .

is the direction of observation ray from point to point , local in main anchor frame .

is the parallax angle between the vector and the vector .

is the angle between vector and vector .
Remark 1
In the original PBA parameterization [1], ray direction was defined by an elevation and azimuth angle in the global frame, camera’s orientation in Euler angles. Expressing direction in sinusoids of angles is a potential source of singularity. In PMBA, both and are in the manifold domain. Moreoever, is defined in ’s local frame, for ease of multicamera system application.
3.2 State retraction in manifold
Optimization in manifold follows a 3 step procedure [16]: lift a manifold variable to its tangent space, solve a normal equation to obtain the Euclidean increment, and retract back to manifold. We adopt method in [17][18] for pose retraction. For feature’s ray direction, we give a natural definition of uncertainty as a normally distributed rotational perturbation to the directional vector as shown in Fig. 4. The rotation’s axis constitutes a plane normal to the ray passing through the observing camera, and is the tangent space, summarized in the following equation:
(5) 
where , and . The optimal perturbation is the increment for retraction :
(6) 
where the total increment has same dimensionality as conventional parameterization.
3.3 Error function and optimization formulation
After determining the main anchor and the associated for each feature , we can rewrite the nonlinear least squares problem (1) using the new feature parametrization
(7) 
where and . We now give ray a new definition (with abuse of notation): the original ray vector scaled up by a factor of , for convenience of mathematical manipulation, i.e.,
(8)  
We also introduce a ray direction based error function, different from the conventional pixelbased error function (2) (shown in Fig. 2), i.e.,
(9) 
where is the measured directional vector for the feature in the pose . From now on, we use to refer to for simplicity.
We further simplify (7) by moving measurement to global frame
(10) 
Remark 2
Specifically,

The error equation (9) implies the residual , where is the angle between the estimated and measured ray direction. Thus, the error equation is bounded.

In contrast to conventional 2D cost functions, our error function (9) operates in 3D and thus can handle the case when the feature point lies behind the observing camera.
3.4 Theoretical analysis on behaviour of information matrix
Consider the Hessian matrix of the problem (7)
(11) 
where and . Like the Hessian matrix in conventional BA, is block diagonal. With the Schur’s complement method, the dominant computation in each Newton method’s iteration is about solving the following normal equation:
(12) 
where . In conventional BA, existence of problematic features makes the matrix and the block matrix (with slight abuse of notation) illconditioned at the neighborhood of global minimum. The global minimum locates at a “long flat valley” [1] such that solvers fail or require long iterations to converge, see Fig. 8(a) for illustration.
In comparison, PMBA’s formulation (10), thanks to the redefined retraction (6) and the error function (9), has an uncluttered Hessian, can therefore fully avoid the illconditioned cases caused by “problematic” features.
Theorem 1
Under the formulation (10), is consistently nonsingular for any and .
See Appendix .1. Theorem 1 completely suppresses all illconditioned such that achieving convergence becomes much eaiser. As a result, DL can be safely used for efficiency. One can also appreciate Theorem 1 from an Information Theory perspective: the Hessian matrix at global minimum is the inverse of the covariance matrix (up to a scale) and thus the uncertainty of the parallax angle and the direction is uniformly bounded.
Remark 3
Remark 4
Although the matrices and are denser, compared to those in XYZ or IDP, shows same sparsity. Thus the computational time for each iteration in PMBA is comparable to conventional BA, see [1] for proof.
4 Global Initialization
In this section, we derive a novel initialization strategy. We do this in two steps: An orientations and parallax feature initialization step that involves cheap rotation averaging and anchor selection, without the need of expensive triangulation; then a translationaveraging method using a simplified convex posegraph optimization. We prove that a nearoptimal solution can be obtained by this strategy. This process is illustrated in Fig. 5.
4.1 Orientation and feature initialization
Following the approach in [19][13][20], we first compute an initial guess for orientation . For each pair of pose and with common features above a threshold, we extract its Epipolar Geometry (EG) by Kneip’s 5point algorithm [21]. We then use the stateoftheart chordal initialization [22] to accurately compute . We now feed into OpenGV’s twopt ransac module [23], to obtain translation directions .
Having obtained accurate estimates for orientations and EGpairs, we are ready to perform feature initialization. The default anchor selection strategy was given in [1]. We use the same algorithm for anchor selection, with the small change that covisible pose scanned in pick anchors have to be part of an EGpair. This step ensures best ascanbe parallax angle be given to each feature point. We stress that any problematic features corresponding to low parallax angles do stay in the state and do not affect convergence under PMBA. Good features together with problem ones work together to shape the final solution.
Remark 5
PMBA parameterization does not involve scale calculation, the selection algorithm in [1] utilizes this property and only makes use of camera rotations to compute feature values in a fast and accurate way, we thus completely avoid unreliable/expensive linear triangulation.
4.2 Position initialization
After orientation and feature initialization, we can perform position initialization. We do this by approximating the original nonlinear ray function (10) with a linear relation of positions, helped with a rotation trick, as illustrated in Fig. 3(b). Now we give a new formulation:
(13)  
where

is the rotation axis from the vector to the vector .

is the angle of rotation from vector to the vector .

Both and are locally observable.

is rotation of vector about axis by angle.
Inspired by the translation averaging method in [20], we obtain a “position only” convex cost function, after substituting into the original formulation (10) optimizationm:
(14) 
Remark 6
Considering (14) is still a nonlinear problem, an initial guess for the problem (14) is needed. Since is linear to positions, we further simplify (14) to a linearly constrained Quadratic Programming (QPLC) problem: to minimize the crossproduct between ray and , with a linear constraint to ensure local observation ray lies in front of the camera, as shown in Fig. 2(b), i.e.,
(15) 
4.3 Theoretical analysis
Theorem 2
See Appendix .2. Theorem 2 proves the correctness and robustness of the proposed initialization in theory. Moreover, (14) is a posegraph problem with much reduced size than (10) and the expensive feature retraction operation is also not needed.
Remark 7
Here we do not claim the proposed global initialization is the best one but it is very compatible to PMBA. Note that the proposed method is friendly to robust methods such as pseudo Huber, norm or outlier detection technique. Further, this convexified model is still formulated in a probabilistic framework, different from the “Linear Global Translation Estimation” reported in [12].
5 Evaluation on PMBA performance
5.1 Simulation
We demonstrate PMBA’s ability to handle problem features by running a simple simulation test: 4 poses and 10 features. One of the problem features is a far feature, another is a singular feature that would cause singularity in original PBA, as shown in Fig. 6(a). We run 4 iterations for the BAs under comparison: DL for PMBA and XYZBA; and LM for PBA. At each iteration, we collect the Hessian’s condition number, and at the end report the error between optimization results and ground truth. The results are listed in Table 3. PMBA has normal condition numbers and gave good optimized estimates, PBA and XYZBA show consistently large condition numbers and high final state error. This confirms our prediction that PMBA has wellbehaved information matrix during optimization.


Convergence Properties  PMBA  PBA  XYZBA 

Iter0  9.74  1.46E+4  1.22E+94 
Iter1  5.68  1.46E+4  1.22E+94 
Iter2  8.80  1.46E+4  1.22E+94 
Iter3  5.74  1.46E+4  3.53E+95 
Final  2.58E3  5.37E2  3.43E2 
5.2 Large dataset test
We conducted a series of real datasets to compare performance of the proposed PMBA (10) and original PBA, IDP and XYZ, aiming to address following questions:

Robustness. With illconditioned scenario disappearing, can DL be safely used in PMBA?

Efficiency. If DL were safely applied for PMBA formulation, how fast can the optimization process be?

Accuracy. Since the PMBA formulation employs a different error function (9). Is the global minimum accurate?
All methods are tested against 6 very challenging datasets, which are also accessible from OpenSLAM

Fakepile is collected by the Google tango tablet in normal lab environment [24] with a fake bridge pile in the middle, showing close and far features.

Malaga [25] is collected using an electric car equipped camera facing the road, rich in collinear features.

Village and College are aerial photogrammetric datasets. The low feature to observation ratio implies the existence of many small parallax features

UsydMainquad2 and Victoriacottage are collected at University of Sydney campus, show in Fig. 7.
(a) Images  (b) PMBA output pose and map 
Dataset  Testtype 


Initial Chi2  Final Chi2  Time[sec]  

Fakepile  PMBA  135  9 / 9  6.6E+6  1.7E+2  0.7  
PBA  /12,741  23 / 23  6.6E+6  1.7E+2  1.9  
IDP  /53,878  104 / 102  6.6E+6  1.7E+2  6.0  
XYZ  116 / 108  6.6E+6  1.2E+3  4.7  
Malaga  PMBA  170  44 / 31  3.8E+7  9.1E+3  21.6  
PBA  /305,719  64 / 47  3.8E+7  9.1E+3  35.4  
IDP  /779,268  230 / 170  3.1E+7  5.8E+5  93.8  
XYZ  110 / 85  3.8E+7  3.3E+5  39.0  
Village  PMBA  90  12 / 12  1.0E+10  3.3E+4  31.8  
PBA  /305,719  13 / 13  1.0E+10  3.3E+4  36.0  
IDP  /779,268  19 / 19  1.0E+10  3.3E+4  35.2  
XYZ  18 / 18  1.0E+10  3.3E+4  26.3  
College  PMBA  468  33 / 33  3.0E+11  1.1E+6  334.4  
PBA  /1,236,502  31 / 31  3.0E+11  1.1E+6  370.5  
IDP  /3,107,524  34 / 34  3.0E+11  1.1E+6  255.3  
XYZ  295 / 193  3.0E+11  1.0E+7  1361.0  
Victoria  PMBA  400  19 / 16  6.2E+8  1.1E+6  70.5  
cottage  PBA  /153,632  88 / 66  6.2E+8  1.2E+6  301.4  
IDP  /890,057  49/48  6.2E+8  1.1E+6  157.9  
XYZ  47 / 44  6.2E+8  1.2E+6  124.3  
Usyd  PMBA  424  25 / 25  2.4E+9  2.4E+6  214.5  
Mainquad  PBA  /227,615  101 / 57  2.4E+9  3.6E+6  642.6  
IDP  /1,607,082  301 / 191  2.2E+9  4.6E+6  1994.7  
XYZ  76 / 58  2.4E+9  2.8E+6  423.7 
(a) Malaga  (b) UsydMainquad2 
(c) Victoriacottage  (d) Fakepile 
We use the initialization method from [1] to set all BAs from the same starting point. We find that PBA, IDP and XYZ show unstable behaviour under DL. PMBA, in comparison, has always worked well with DL. This can be explained by our Theorem 1 that Hessian in PMBA does not exhibit singularity yet others can, we therefore list DL results for PMBA and LM for other BA’s.
We use Ceressolver as the optimization engine and test all BAs on an Inteli7 with one thread. We use ray direction cost function for PMBA, and compute its corresponding uvbased Chi2 error at each iteration step with current estimate, to compare with other BAs on a common error metric. This scheme is not fair for PMBA, yet is the only convincing way to evaluate performance amongst all methods. Despite of this treatment, we found PMBA the best performer in all tests, consistent to our expectation. Selected convergence plots are shown in Fig. 8, other details are summarized in Table 4.
Further, in the Malaga dataset that contains numerous problematic features (Fig. 1), we observe that the PMBA estimates and Ground Truth are very close, yet conventional BA gives significant error. This is also seen in Table 4, conventional BA’s converge to a local minimum, whereas both PBA and PMBA can converge to their respective global minimums. Fig. 8 demonstrates the error function (9) is practical, consistent with the claim in [26]. In conclusion, these experiments all give positive answers to the raised questions.
5.3 Evaluation of convexified initialization
In this subsection, we use selected datasets from the “Bundle Adjustment in the Large” (BAL) datebase

Ladybug1370: 1370 images, captured at a regular rate using a Ladybug camera mounted on a moving vehicle.

Trafalgar126: 126 outoforder internet images.

Venice427: 427 outoforder internet images.

College: 468 arial photogrammetric images.
All these datasets include either collinear or far features, exposing challenges for conventional BA. Since camera calibration is beyond the scope of this activity, we apply the reported optimal camera settings from BAL and PBA websites and only test undistorted versions of these data. We stress that our initial pose and feature values are purely generated from the Rotation averaging and Translation averaging method described in Section 4, without using the initial values provided by [27]. We are able to form good initial values at QPLC stage, shown in Fig. 9 as 1st image in each row. And, our convex posegraph stage has a very large convergence region such that imperfect outputs from the QPLC stage can gradually converge to a posegraph with a topology similar to that of Ground Truth. This is especially obvious in “Ladybug1370” and “Venice427”. Moreover, in “Ladybug1370”, BAL’s optimal trajectory (in red) contains an erroneous camera pose, shown as red dot at top right corner of the red trajectory, our method did not encounter this stray pose at all.
6 Conclusion
In this work, we proposed a new bundle adjustment formulation (PMBA) which utilizes parallax angle based feature parametrization on manifold and observationray based objective function. We proved that under the new formulation the illconditioned cases due to problematic features can be theoretically avoided without any manual intervention, which results in much better convergence and robustness properties.
Furthermore, motivated by the strong local observability hidden in the visual SLAM problem, we derived a novel global initialization process for PMBA. We use a simplified model that can guarantee a nearoptimal solution to bootstrap the original problem. Experimental results show that the proposed initialization can provide accurate estimates and is a viable global initialization strategy for many challenging situations including sequential and outoforder images.
The promising results of the global initialization plus PMBA pipeline using publicly available datasets demonstrate that the proposed technique can deal with different challenging data. In the future, we are planning to integrate the proposed pipeline with efficient visual SLAM frontend to develop a robust and efficient SfM system.
.1 The Proof of Theorem 1
Consider the feature and the corresponding subblock matrix in . Denoting for (), we have
(16) 
On the one hand,
(17)  
Denoting and , we have
(18)  
Note that and , thus we have
(19) 
Therefore, and .
.2 The Proof of Theorem 2
Consider the following function
(20) 
where , (). It is a fact that
(21) 
for any and any .
Considering the problem (14) and the linearity of w.r.t. , (14) can be rewritten as
(22)  
where is a directional vector.
Denoting the global minimum of the problem (14) as , we have
(23)  
for any and . The inequality above indicates a fact: if we perform optimization for the problem (14), i.e., from any initial guess , the converged value after optimization will be a nearoptimal solution, i.e., . When is close to the optimal estimate, will be also a nearoptimal solution of the problem (10) clearly. Under noisefree condition, is an exact solution due to thus the problem (14) is convex.
Footnotes
References
 L. Zhao, S. Huang, Y. Sun, L. Yan, and G. Dissanayake, “Parallaxba: bundle adjustment using parallax angle feature parametrization,” The International Journal of Robotics Research, vol. 34, no. 45, pp. 493–516, 2015.
 S. Agarwal, K. Mierle, and Others, “Ceres solver,” http://ceressolver.org.
 R. Kuemmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard.
 F. Dellaert, “Factor graphs and gtsam: A handson introduction,” Georgia Institute of Technology, Tech. Rep., 2012.
 M. J. M. M. MurArtal, Raúl and J. D. Tardós, “ORBSLAM: a versatile and accurate monocular SLAM system,” IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163.
 H. Zhang, K. Hasith, and H. Wang, “A hybrid feature parametrization for improving stereoslam consistency,” in 2017 13th IEEE International Conference on Control Automation (ICCA), 2017, pp. 1021–1026.
 L. Carlone, Z. Kira, C. Beall, and Others, “Eliminating conditionally independent sets in factor graphs: A unifying perspective based on smart factors,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014, pp. 4290–4297.
 C. Lee and K. Yoon, “Exploiting feature confidence for forward motion estimation,” CoRR, vol. abs/1704.07145, 2017.
 L. Zhao, S. Huang, L. Yan, and G. Dissanayake, “Parallax angle parametrization for monocular slam,” in 2011 IEEE International Conference on Robotics and Automation, 2011, pp. 3117–3124.
 C. Wu et al., “Visualsfm: A visual structure from motion system.”
 N. Snavely, S. Seitz, and R. Szeliski, “Photo tourism: Exploring image collections in 3d,” in ACM Transactions on Graphics (Proceedings of SIGGRAPH 2006), 2006.
 Z. Cui, N. Jiang, and P. Tan, “Linear global translation estimation from feature tracks,” CoRR, vol. abs/1503.01832, 2015.
 N. Jiang, Z. Cui, and P. Tan, “A global linear method for camera pose registration,” in 2013 IEEE International Conference on Computer Vision, 2013, pp. 481–488.
 C. Tang, O. Wang, and P. Tan, “Globalslam: Initializationrobust monocular visual slam,” arXiv preprint arXiv:1708.04814, 2017.
 Z. Cui and P. Tan, “Global structurefrommotion by similarity averaging,” in 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 864–872.
 S. T. Smith, “Trustregion methods on riemannian manifolds,” Foundations of Computational Mathematics, no. 3, pp. 303–330, 2007.
 T. Zhang, K. Wu, J. Song, S. Huang, and G. Dissanayake, “Convergence and consistency analysis for a 3d invariantekf slam,” IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 733–740, 2017.
 T. Zhang, K. Wu, D. Su, S. Huang, and G. Dissanayake, “An invariantekf VINS algorithm for improving consistency,” in accepted by 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sep 2017.
 D. Crandall, A. Owens, N. Snavely, and D. Huttenlocher, “Discretecontinuous optimization for largescale structure from motion,” in CVPR 2011, 2011, pp. 3001–3008.
 K. Wilson and N. Snavely, Robust Global Translations with 1DSfM. Cham: Springer International Publishing, 2014, pp. 61–75.
 L. Kneip, R. Siegwart, and M. Pollefeys, Finding the Exact Rotation between Two Images Independently of the Translation. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 696–709.
 L. Carlone, R. Tron, K. Daniilidis, and F. Dellaert, “Initialization techniques for 3d slam: A survey on rotation estimation and its use in pose graph optimization,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, pp. 4597–4604.
 L. Kneip and P. Furgale, “Opengv: A unified and generalized approach to realtime calibrated geometric vision,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), May 2014.
 L. Liu, Y. Wang, L. Zhao, and S. Huang, “Evaluation of different slam algorithms using google tangle data,” in 2017 IEEE Conference on Industrial Electronics and Applications, 2017.
 J.L. Blanco, F.A. Moreno, and J. Gonzalez, “A collection of outdoor robotic datasets with centimeteraccuracy ground truth,” Autonomous Robots, vol. 27, no. 4, p. 327, Aug 2009.
 S. Im, H. Ha, F. Rameau, H.G. Jeon, G. Choe, and I. S. Kweon.
 S. Agarwal, N. Snavely, and Others, “Bundle adjustment in the large,” http://grail.cs.washington.edu/projects/bal/.