Optimization over Geodesics for Exact Principal Geodesic Analysis
In fields ranging from computer vision to signal processing and statistics, increasing computational power allows a move from classical linear models to models that incorporate non-linear phenomena. This shift has created interest in computational aspects of differential geometry, and solving optimization problems that incorporate non-linear geometry constitutes an important computational task. In this paper, we develop methods for numerically solving optimization problems over spaces of geodesics using numerical integration of Jacobi fields and second order derivatives of geodesic families. As an important application of this optimization strategy, we compute exact Principal Geodesic Analysis (PGA), a non-linear version of the PCA dimensionality reduction procedure. By applying the exact PGA algorithm to synthetic data, we exemplify the differences between the linearized and exact algorithms caused by the non-linear geometry. In addition, we use the numerically integrated Jacobi fields to determine sectional curvatures and provide upper bounds for injectivity radii.
Keywords:geometric optimization, principal geodesic analysis, manifold statistics, differential geometry, Riemannian metrics
Manifolds, sets locally modeled by Euclidean spaces, have a long and intriguing history in mathematics, and topological, differential geometric, and Riemannian geometric properties of manifolds have been studied extensively. The introduction of high performance computing in applied fields has widened the use of differential geometry, and Riemannian manifolds, in particular, are now used for modeling a range of problems possessing non-linear structure. Applications include shape modeling (complex projective shape spaces kendall_shape_1984 () and medial representations of surfaces blum_transformation_1967 (); joshi_multiscale_2002 ()), imaging (tensor manifolds in diffusion tensor imaging fletcher_principal_2004 (); fletcher_riemannian_2007 (); pennec_riemannian_2006 () and image segmentation and registration caselles_geodesic_1995 (); pennec_feature-based_1998 ()), and several other fields (forestry huckemann_intrinsic_2010 (), human motion modeling sminchisescu_generative_2004 (); urtasun_priors_2005 (), information geometry and signal processing yang_means_2011 ()).
To fully utilize the power of manifolds in applied modeling, it is essential to develop fast and robust algorithms for performing computations on manifolds, and, in particular, availability of methods for solving optimization problems is paramount. In this paper, we develop methods for numerically solving optimization problems over spaces of geodesics using numerical integration of Jacobi fields and second order derivatives of geodesic families. In addition, the approach allows numerical approximation of sectional curvatures and bounds on injectivity radii huckemann_intrinsic_2010 (). The methods apply to manifolds represented both parametrically and implicitly without preconditions such as knowledge of explicit formulas for geodesics. This fact makes the approach applicable to a range of applications, and it allows exploration of the effect of curvature on non-linear statistical methods.
To exemplify this, we consider the problem of capturing the variation of a set of manifold valued data with the Principal Geodesic Analysis (PGA, fletcher_principal_2004-1 ()) non-linear generalization of Principal Component Analysis (PCA). Until now, there has been no method for numerically computing PGA for general manifolds without linearizing the problem. Because PGA can be formulated as an optimization problem over geodesics, the tools developed here apply to computing it without discarding the non-linear structure. As a result, the paper provides an algorithm for computing exact PGA for a wide range of manifolds.
1.1 Related Work
A vast body of mathematical literature describes manifolds and Riemannian structures; do_carmo_riemannian_1992 (); lee_riemannian_1997 () provide excellent introductions to the field. From an applied point of view, the papers dedieu_symplectic_2005 (); herbert_bishop_keller_numerical_1968 (); noakes_global_1998 (); klassen_geodesics_2006 (); schmidt_shape_2006 (); sommer_bicycle_2009 () address first-order problems such as computing geodesics and solving the exponential map inverse problem, the logarithm map. Certain second-order problems including computing Jacobi fields on diffeomorphism groups younes_evolutions_2009 (); ferreira_newton_2008 () have been considered but mainly on limited classes of manifolds. Different aspects of numerical computation on implicitly defined manifolds are covered in zhang_curvature_2007 (); rheinboldt_manpak:_1996 (); rabier_computational_1990 (), and generalizing linear statistics to manifolds has been the focus of the papers karcher_riemannian_1977 (); pennec_intrinsic_2006 (); fletcher_robust_2008 (); fletcher_principal_2004-1 (); huckemann_intrinsic_2010 ().
Optimization problems can be posed on a manifold in the sense that the domain of the cost function is restricted to the manifold. Such problems are extensively covered in the literature (e.g. luenberger_gradient_1972 (); yang_globally_2007 ()). In contrast, this paper concerns optimization problems over geodesics with the complexity residing in the cost functions and not the optimization domains.
The manifold generalization of linear PCA, PGA, was first introduced in fletcher_statistics_2003 () but it was formulated in the form most widely used in fletcher_principal_2004-1 (). It has subsequently been used for several applications. To mention a few, the authors in fletcher_principal_2004-1 (); fletcher_principal_2004 () study variations of medial atoms, wu_weighted_2008 () uses a variation of PGA for facial classification, said_exact_2007 () presents examples on motion capture data, and sommer_bicycle_2009 () applies PGA to vertebrae outlines. The algorithm presented in fletcher_principal_2004-1 () for computing PGA with tangent space linearization is most widely used. In contrast, said_exact_2007 () computes PGA as defined in fletcher_statistics_2003 () without approximations, exact PGA, on a particular manifold, the Lie group . The paper sommer_manifold_2010 () uses the methods presented here to experimentally assess the effect of tangent space linearization on high dimensional manifolds modeling real-life data.
A recent wave of interest in manifold valued statistics has lead to the development of Geodesic PCA (GPCA, huckemann_intrinsic_2010 ()) and Horizontal Component Analysis (HCA, sommer_horizontal_2013 ()). GPCA is in many respects close to PGA but GPCA optimizes for the placement of the center point and minimizes projection residuals along geodesics. HCA builds low-dimensional orthogonal decompositions in the frame bundle of the manifold that project back to approximating subspaces in the manifold.
1.2 Content and Outline
The paper presents two main contributions: (1) how numerical integration of Jacobi fields and second order derivatives can be used to solve optimization problems over geodesics; and (2) how the approach allows numerical computation of exact PGA. In addition, we use the computed Jacobi fields to numerically approximate geometric properties such as sectional curvatures. After a brief discussion of the geometric background, explicit differential equations for computing Jacobi fields and second derivatives of geodesic families are presented in Section 3. The actual derivations are performed in the appendices due to their notational complexity. In Section 4, the exact PGA algorithm is developed. We end the paper with experiments that illustrate the effect of curvature on the non-linear statistical method and with estimation of sectional curvatures and injectivity radii bounds.
The importance of curvature computations is noted in huckemann_intrinsic_2010 (), which lists the ability to compute sectional curvature as a high importance open problem. The paper presents a partial solution to this problem: we discuss how sectional curvatures can be determined numerically when either a parametrization or implicit representation is available.
In the experiments, we evaluate how the differences between the exact and linearized PGA depend on the curvature of the manifold. This experiment, which to the best of our knowledge has not been made before, is made possible by the generality of the optimization approach that makes the algorithm applicable to a wide range of manifolds with varying curvature.
This section will include brief discussions of relevant aspects of differential and Riemannian geometry. We keep the notation close to the notation used in the book do_carmo_riemannian_1992 (); see in addition Appendix A.
2.1 Manifolds and Their Representations
In the sequel, will denote a Riemannian manifold of finite dimension . We will need to be sufficiently smooth, i.e. of class for or depending on the application. For concrete computational applications, we will represent either using parametrizations or implicitly. A local parametrization is a map from an open subset to . With an implicit representation, is represented as a level set of a differentiable map , e.g. . If the Jacobian matrix has full rank everywhere on , will be an -dimensional manifold. The space is called the embedding space. When dealing with implicitly defined manifolds, we let and denote the dimension of the domain and codomain of , respectively, so that the dimension of the manifold equals . Examples of applications using implicit representations include shape and human poses models sommer_bicycle_2009 (); hauberg_natural_2012 (), and several shape models use parametric representations joshi_multiscale_2002 (); klassen_analysis_2004 ().111Other representations include discrete triangulations used for surfaces and quotients of a larger manifold by a group . The latter is for example the case for Kendall’s shape-spaces kendall_shape_1984 (). Kendall’s shape-spaces for planar points are actually complex projective spaces for which parameterizations are available, and, for points in -dimensional space and higher, the shape-spaces are anomalous and not manifolds. The spaces studied in huckemann_intrinsic_2010 () belong to this class.
2.2 Geodesic Systems
Given a local parametrization , a curve on is a geodesic if the curve in representing , i.e. , satisfies the ODE
Here denotes the Christoffel symbols of the Riemannian metric. Conversely, geodesics can be found by solving the ODE with a starting point and initial velocity as initial conditions. The exponential map maps the initial point and velocity to , the point on the geodesic at time . When defined, the logarithm map is the inverse of , i.e. . For implicitly represented manifolds, the classical ODE describing geodesics is not directly usable because neither parametrizations nor Christoffel symbols are directly available. Instead, the geodesic with initial point and initial velocity can be found as the -part of the solution of the IVP
see dedieu_symplectic_2005 (). Note that is a curve in the embedding space but since is a subset of the embedding space and the starting point is in , will stay in for all . Recall that is the map defining the manifold by e.g. and that denotes the Hessian of the th component of . is map between Euclidean spaces and the Hessian is therefore the ordinary Euclidean Hessian matrix. The map is defined by where the symbol denotes the generalized inverse or pseudo-inverse of the non-square matrix . Since has full-rank , equals . Numerical stability of the geodesic system is treated in dedieu_symplectic_2005 ().
2.3 Geodesic Families and Variations of Geodesics
In the next sections, we will treat optimization problems over geodesics of which the PGA problem (6) constitute a concrete example; in addition, problems such as geodesic regression fletcher_geodesic_2011 () and manifold total least squares belong to this class. For this purpose, we here recall the close connection between variations of geodesics, Jacobi fields, and the differential . Let be a family of geodesics parametrized by , i.e. for each , the curve is a geodesic. By varying the parameter , a vector field is obtained.222 Recall that is a shorthand for , see Appendix A. These Jacobi fields are uniquely determined by the initial conditions and , the variation of the initial points and the covariant derivative of the field at , respectively. Define , , and . If and then equals (do_carmo_riemannian_1992, , Chap. 5). When is constant, i.e. , we have the following connection between and the differential :
Jacobi fields can equivalently be defined as solutions to an ODE that involves the curvature endomorphism of the manifold (do_carmo_riemannian_1992, , Chap. 5). However, the curvature endomorphism is not easily computed when the manifold is represented implicitly, and, therefore, the ODE is hard to use for computational applications in this case. In the next section, we numerically compute Jacobi fields by integrating the differential of the system (2).
Jacobi fields can be used to retrieve various geometric information e.g. sectional curvature. Let denote a Jacobi field along the geodesic with and derivative . Assume the vectors and are orthonormal. These vectors define a plane in , and denotes the sectional curvature of the plane . Because occurs in a Taylor expansion of the length , the sectional curvature can be estimated by
for small . Furthermore, if is a non-zero Jacobi field with along a geodesic and, for some , also then is called a conjugate point to . This can provide an upper bound on the injectivity radius of , that, in general terms, specifies the minimum length of non-minimizing geodesics. Figure 1 illustrates the situation on the sphere . We will explore both these points in the experiments section.
2.4 Principal Geodesic Analysis
Principal Component Analysis (PCA) is widely used to model the variability of data in Euclidean spaces. The procedure provides linear dimensionality reduction by defining a sequence of linear subspaces maximizing the variance of the projection of the data to the subspaces or, equivalently, minimizing the reconstruction errors. The th subspace is spanned by an orthogonal basis of principal components , and the th principal component is defined recursively by
when formulated as to maximize the variance of the projection of the dataset to the subspaces .
PCA is dependent on the vector space structure of the Euclidean space and hence cannot be performed on manifold valued datasets. Principal Geodesic Analysis was developed to overcome this limitation. PGA finds geodesic subspaces centered at point with usually being an intrinsic mean333 The notion of intrinsic mean goes back to Fréchet frechet_les_1948 () and Karcher karcher_riemannian_1977 (). As in fletcher_principal_2004-1 (), an intrinsic mean is here a minimizer of . Uniqueness issues are treated in karcher_riemannian_1977 (). of the dataset , . The th geodesic subspace of is defined as with being the span of the principal directions defined recursively by
The projection of a point onto a geodesic subspace is
The term being maximized in (6) is the sample variance of the projected data, the expected value of the squared distance to , and PGA therefore extends PCA by finding geodesic subspaces in which variance is maximized.444 A slightly different definition that uses one-dimensional subspaces and Lie group structure was introduced in fletcher_statistics_2003 ().
Since both optimization problems (6) and (7) are difficult to optimize, PGA has traditionally been computed using the orthogonal projection in the tangent space of to approximate the true projection. With this approximation, equation (6) simplifies to
which is equivalent to (5), and, therefore, the procedure amounts to performing regular PCA on the vectors . We will refer to PGA with the approximation as linearized PGA, and, following said_exact_2007 (), PGA as defined by (6) will be referred to as exact PGA.555In said_exact_2007 (), the fact that has a closed form solution on the sphere when is a one-dimensional geodesic subspace is used to iteratively compute PGA with the fletcher_statistics_2003 () definition. The ability to iteratively solve optimization problems over geodesics that we will develop in the next sections will allow us to optimize (6) and hence numerically compute exact PGA.
In general, PGA might not be well-defined as the intrinsic mean might not be unique and both existence and uniqueness may fail for the projections (7) and the optimization problem (6). The convexity bounds of Karcher karcher_riemannian_1977 () ensures uniqueness of the mean for sufficiently local data but setting up sufficient conditions to ensure well-posedness of (7) and (6) for general manifolds is difficult because they depend on the global geometry of the manifold.
There is ongoing discussion of when principal components should be constrained to pass the intrinsic mean as in PGA or if other types of means should be used, see huckemann_intrinsic_2010 () with discussions. In Geodesic PCA huckemann_intrinsic_2010 (), the principal geodesics do not necessarily pass the intrinsic mean, and similar optimization that allows the PGA base point to move away from the intrinsic mean can be carried out with the optimization approach used in this paper. PGA can also be modified by replacing maximization of sample variance by minimization of reconstruction error. This alternate definition is not equivalent to the definition above, a fact that again underlines the difference between the Euclidean and the curved situation. We will illustrate differences between the formulations in the experiments section but we mainly use the variance formulation (6).
3 Optimization over Geodesics
Equation (6) and (7) defining PGA are examples of optimization problems over geodesics that in those cases are represented by their starting point and initial velocity . More generally, we here consider problems
where is a function defining the cost of the geodesic here at time .666Even more generally, can be a function of the entire curve , instead of just for the point , Note that for PGA, the initial velocity is in addition constrained to subspaces of .. In order to iteratively solve optimization problems of the form (8), we will need derivatives of since with . Thus, we wish to compute the differential of with respect to initial point and initial velocity . Since (6) is a function of the projection given by (7), we will later see that we need the second order differential of as well.
Only in specific cases where explicit expressions for geodesics are available can the above mention differentials be derived in closed form. Instead, for general manifolds, the ODEs (1) and (2) describing geodesics can be differentiated giving systems that can be numerically integrated to provide the differentials. This approach relies on the fact that sufficiently smooth initial value problems (IVPs) are differentiable with respect to their initial values, see e.g. (hairer_solving_2008, , Chap. I.14).
We will here derive explicit expressions for IVPs describing the differential of the exponential map and Jacobi fields. In addition, we will differentiate the IVPs a second time. The concrete expressions will allow the IVPs to be used for iterative optimization of problems on the form (8). In particular, they will be used for the exact PGA algorithm presented in the next section. The basic strategy is simple: we differentiate the geodesic systems of Section 2.2. Though the resulting equations are notationally complex, their derivation is in principle just repeated application of the chain and product rules for differentiation. MATLAB code for numerical integration of the systems is available at http://image.diku.dk/sommer.
Since the geodesic equations (2) contain the generalized inverse of the Jacobian matrix , we will use the following formula for derivatives of generalized inverses. When an matrix depends on a parameter and has full rank , and if its generalized inverse is differentiable, then the derivative is given by
This result was derived in decell_derivative_1974 (); golub_differentiation_1973 () and hanson_extensions_1969 () for the full-rank case. We will apply (9) with when is an dependent family of curves in the embedding space that are geodesics on and when is fixed. To see that is differentiable with respect to when depends smoothly on , take a frame of the normal space to in a neighborhood of , and note that is a composition of a invertible map onto the frame depending smoothly on and the frame itself.
The explicit expressions for the differential equations are notationally
heavy. Therefore, we only
state the results here and postpone the actual derivation
to Appendix B.
Let be defined as a regular zero level set of a map . Using the embedding, we identify curves in and vectors in with curves and vectors in . Let be a geodesic with and . The Jacobi field along with and can then be found as the -part of the solution of the IVP
with the map given in explicit form in Appendix B.
As previously noted, Jacobi fields can be described using an ODE incorporating the curvature endomorphism in the parameterized case. We can, however, apply a procedure similar to the implicit case and derive and IVP by differentiating the geodesic system (1). We will use the resulting IVP (24) when working with variations of geodesics in the parameterized case, see Appendix B.
Recall the equivalence (3) between Jacobi fields and : if satisfy (10) (or (24)) with initial values then is equal to . Therefore, we can compute the differential with respect to by numerically integrating the system using a basis for the tangent space at . With initial conditions instead, we can similarly compute the derivative with respect to the initial point . Note that implies that , a fact that allows the computation of as well.
Assuming the manifold is sufficiently smooth, we can differentiate the systems
(10) and (24) once more and thereby obtain second order
information that we will need later. The main difficulty is
performing the algebra of the already complicated expressions for the systems,
and, for the implicit case, we will need second
order derivatives of the generalized inverses .
For simplicity, we consider a families of geodesics with stationary initial
point. The derivations are
again postponed to Appendix B.
Let be of class , and let be a family of geodesics. Assume is a local parametrization containing , and let be the curve in representing , i.e. . Let with and . Define , and let . Then, in coordinates defined by , can be found as the -part of the solution of the IVP
with the map given in explicit form in Appendix B.
Now, let instead be defined as a regular zero level set of a map . Then can be found as the -part of the solution of the IVP
3.1 Numerical Considerations
The geodesic systems (1) and (2) can in both the parametrized and implicit case be expressed in Hamiltonian forms. In dedieu_symplectic_2005 (), the authors use this property along with symplectic numerical integrators to ensure the computed curves will be close to actual geodesics. This is possible since the Hamiltonian encodes the Riemannian metric. The usefulness of pursuing a similar approach of expressing the differential systems in Hamiltonian form and using symplectic integrators to preserve the Hamiltonians is limited since there is no direct interpretation of such Hamiltonians in contrast to the case for geodesic systems.
Along the same lines, we would like to use the preservation of quadratic forms for symplectic integrators hairer_geometric_2002 () to preserve quadratic properties of the differential of the exponential map, e.g. the Gauss Lemma do_carmo_riemannian_1992 (). We are currently investigating numerical schemes that could possibly ensure such stability.
4 Exact Principal Geodesic Analysis
As an example of how the IVPs describing differentials allow optimizing over geodesics, we will provide algorithms that allow iterative optimization of (6) and that thus allow PGA as defined in fletcher_principal_2004-1 () to be computed without the traditional linear approximation.
Solving the optimization problem (6) requires the ability to compute the projection . We start with the gradient needed for iteratively computing the projection before deriving the gradient of the cost function of (6). Computing these gradients will require the differentials over geodesic families derived in Section 3. Thereafter, we present the actual algorithms for solving the problems before discussing convergence issues.
The optimization problems (6) and (7) are posed in the tangent space of the manifold at the sample mean and the unit sphere of that tangent space, respectively. These domains have relatively simple geometry, and, therefore, the complexity of the problems is contained in the cost functions. Because of this, we will not need optimizing algorithms that are specialized for domains with complicated geometry. For simplicity, we compute gradients and present steepest descent algorithms but it is straightforward to compute Jacobians instead and use more advanced optimization algorithms such as Gauss-Newton or Levenberg-Marquardt.
The overall approach is similar to the approach used for computing exact PGA in said_exact_2007 (). Our solution differs in that we are able to compute and its differential without restricting to the manifold SO(3) and in that we optimize the functional (6) instead of the cost function used in fletcher_statistics_2003 () that involves one-dimensional subspaces.
4.1 The Geodesic Subspace Projection
We consider the projection of a point on a geodesic subspace . Assume is centered at , let be a -dimensional subspace of such that , and define a residual function by that measures squared distances between and points in . Computing by solving (7) is then equivalent to finding minimizing . To find the gradient of , choose an orthonormal basis for and extend it to a basis for . Furthermore, let and choose an orthonormal basis for the tangent space . Karcher showed in karcher_riemannian_1977 () that the gradient equals , and, using this, we get the gradient of the residual function as
with denoting the first columns of the matrix expressed using the chosen bases.777In coordinates of the bases, the differential becomes a matrix that we write . The notation denotes differentiation along the basis elements of . See Appendix A for additional notation. This matrix can be computed using the IVPs (10) or (24).
4.2 The Differential of the Subspace Projection
In order to optimize (6), we will need to compute gradients of the form
with , and .888 Since in (6) is restricted to the unit sphere, we will not need the gradient in the direction of , and, therefore, we find the gradient in the subspace instead of in the larger space . As noted in Section 2.4, the optimization approach presented here can be extended to include optimization of the base point as well. Here, we use a fixed base point that for PGA is an intrinsic mean of a data set. This will involve the differential of with respect to . Since is defined as a minimizer of (7), its differential cannot be obtained just by applying the chain and product rules. Instead, we use the implicit function theorem to define a map that equals around a neighborhood of in . We then derive the differential of .
For the result below, we extend the domain of residual function defined above from to the entire tangent space . We will a choose basis for , and we let denote the Hessian matrix of with respect to the basis. Similarly, we will choose a basis for , and we let denote the Hessian matrix of restricted to with respect to this basis. Using this notation, we get the following result for the derivative of the projection :
Let be an orthonormal basis for a subspace . For each , let be the subspace , and let be the corresponding geodesic subspace. Fix and define for an . Suppose the matrix has full rank . Extend the orthonormal basis for to an orthonormal basis for . Then
The coordinates of the vector in the basis for are contained in the st column of the matrix , the scalar is the st coordinate of in the basis, and is the matrix
with the last columns of the matrix and the identity matrix.
Before proving the result, we discuss its use for computing the gradient (14). The assumption that the Hessian of the restricted residual must have full rank is discussed below.
Because , we have
which, combined with (15), gives (14). In order to compute the right hand side of (15), it is necessary to compute parts of the Hessian of the non-restricted residual . For doing this, we will use the alternative formulation for the residual function. With let . Working in the chosen orthonormal basis, we have
Using that for a time dependent, invertible matrix 999 See (decell_derivative_1974, , Eq. (2)). and the fact that for all , we get
Proof (Proposition 1)
Extend the basis for to an orthonormal basis for . The argument is not dependent on this choice of basis, but it will make the reasoning and notation easier. Let be an open neighborhood of and define the map by
with the vectors constituting an orthonormal basis for for each and with and denoting the matrices having and in the columns, respectively. Since for all because is a minimizer for among vectors in in , we see that vanishes. Therefore, if is non-singular, the implicit function theorem asserts the existence of a map from a neighborhood of to with the property that for all in the neighborhood. We then compute
For the differentials on the right hand side of (18), we have
With the choice of basis, the above matrix is block triangular,
with equal to . The requirement that is non-singular is fulfilled, because has rank by assumption and has rank .
Since the first rows of are zero, we need only the last columns of in order to compute (18). The vector as defined in the statement of the theorem is equal to the st column. Let be the matrix consisting of the remaining columns. Using the form (20), we have
Assume is chosen such that equals the previously chosen basis for . With this assumption, is the identity matrix . In addition, let denote the st component of , that is, the projection of onto . Since and by choice of , Lemma 1 (see Appendix C) gives
Because , this provides (15).
4.3 Exact PGA Algorithm
The gradients of the cost functions enable us to iteratively solve the optimization problems (6) and (7). We let be an intrinsic mean of a dataset , . The algorithms listed below are essentially steepest descent methods but, as previously noted, Jacobian-based optimization algorithms can be employed as well.
Algorithm 1 for computing updates instead of the actual point that we are interested in. The vector is related to by .
The algorithm for solving (6) is listed in Algorithm 2. Since in (6) is required to be on the unit sphere, the optimization will take place on a manifold, and a natural approach to compute iteration updates will use the exponential map of the sphere. Yet, because of the symmetric geometry of the sphere, we approximate this using the simpler method of adding the gradient to the previous guess and normalizing. When computing the principal direction, we choose the initial guess as the first regular PCA vector of the data projected to in . See Figure 2 for an illustration of an iteration of the algorithm.