Spherical Principal Curves
Jang-Hyun Kim, Jongmin Lee and Hee-Seok Oh
Seoul National University
Seoul 08826, Korea
Abstract: This paper presents a new approach for dimension reduction of data observed on a sphere. Several dimension reduction techniques have recently been developed for the analysis of non-Euclidean data. As pioneering work, Hauberg (2016) attempted to implement principal curves on Riemannian manifolds. However, this approach uses approximations to deal with data on Riemannian manifolds, which causes distorted results. In this study, we propose a new approach that constructs principal curves on a sphere by projecting the data onto a continuous curve. Our approach follows the same line as Hastie and Stuetzle (1989), who proposed principal curves for Euclidean-space data. We further investigate the stationarity of the proposed principal curves, which satisfy self-consistency on a sphere. Results from a real data analysis with earthquake data and from simulation examples demonstrate the promising empirical properties of the proposed approach.
Keywords: Dimension reduction; Feature extraction; Principal geodesic analysis; Principal curve; Spherical domain.
A variety of dimension reduction techniques have been developed for representing and analyzing data, most of them focused on Euclidean-space data. Recently, there has been growing interest in the analysis of non-Euclidean data, and some dimension reduction techniques for such data have been studied for shape analysis or motion analysis. For example, Siddiqi and Pizer (2008) and Cippitelli et al. (2016) considered a Cartesian product involving SO(2) (i.e., sphere) for medial representation and skeleton data analysis, respectively. For these representations, existing dimension reduction methods have been modified by considering geodesic distance. There have been several generalizations of principal component analysis (PCA) suited for non-Euclidean data, including Fletcher et al. (2004), Huckemann and Ziezold (2006), Huckemann et al. (2010), Jung et al. (2011), Jung et al. (2012), Panaretos et al. (2014), and Hauberg (2016). Fletcher et al. (2004) proposed a tangent-plane approach to dimension reduction for manifold data, termed principal geodesic analysis (PGA). However, PGA is somewhat limited in scope because it can only reduce spherical data onto a great circle. Along with this generalization, Jung et al. (2011) proposed a dimension reduction method that extracts a circle on a sphere, and Jung et al. (2012) suggested principal nested spheres, which can be considered a high-dimensional version of Jung et al. (2011). Huckemann et al. (2010) suggested an extension of PCA using the intrinsic manifold structure and obtained a principal geodesic component that best approximates the data but does not necessarily pass through a predetermined intrinsic mean.
As a study closely related to our proposal, Hauberg (2016) developed principal curves on Riemannian manifolds. To be specific, Hauberg (2016) used an approximation that constructs principal curves by projecting the data onto discrete points, unlike the original principal curve of Hastie and Stuetzle (1989), which projects the data onto a continuous curve. This approximation, which finds closest points instead of projections on a curve, may mistakenly project different data points onto a single point. In this study, we propose a new principal curve approach for spherical data that projects the data onto a continuous curve on a sphere instead of using an approximation, which improves dimension reduction and its stability. Our proposal is two-fold. One is the extrinsic approach, which requires an ambient space in addition to the given manifold. In the case of the sphere, there exists a unique natural embedding space (three-dimensional Euclidean space), so the subjectivity of the extrinsic approach is not at issue; moreover, the extrinsic approach is computationally efficient. The other is the intrinsic approach, which requires no embedding space. Although this approach is computationally demanding and often impractical (Srivastava and Klassen, 2002), it is necessary for developing principal curves on a generic manifold. Furthermore, we investigate the stationarity of the spherical principal curves obtained by both proposed approaches.
To sum up, the main contributions of this study can be summarized as follows: (a) We present a new principal circle defined on a sphere for initialization of the proposed principal curves. (b) We propose both extrinsic and intrinsic approaches to construct principal curves on a sphere. (c) We develop median principal curves that might be resistant to outliers, which can be considered as a robust extension of the proposed principal curves. (d) We verify the stationarity of all the proposed principal curves on the sphere.
The remainder of this paper is organized as follows. Section 2 briefly reviews conventional principal curves and intrinsic and extrinsic means on manifolds. In Section 3, a newly developed exact principal circle on a sphere is studied, which is used for the initialization of the proposed principal curves. Section 4 presents the proposed principal curves with a practical algorithm and investigates their stationarity theoretically. In Section 5, experimental results of the proposed method are provided through earthquake data from the U.S. Geological Survey and simulation examples. Section 6 discusses a robust extension of intrinsic principal curves, termed median principal curves. Finally, concluding remarks are given in Section 7. Detailed proofs of the properties of principal curves on the sphere are given in the Appendix.
Before closing this section, we remark that, as a significant extension of the canonical interpretation of principal component analysis on Riemannian manifolds, Panaretos et al. (2014) introduced a smooth curve on the manifold, termed the principal flow, which passes through the mean of the given data and allows for non-geodesic flexibility in fitting a curve to the data. In this respect, the concept of principal flows is shared with that of our method. Nevertheless, in this study, we focus on a direct extension of principal curves from Euclidean space to Riemannian manifolds, such as a sphere.
2.1 Principal Curves
Principal curves of Hastie and Stuetzle (1989) can be considered a nonlinear generalization of PCA, which finds an affine space that maximizes the variance of the projected data. A curve $f$ is a function from a one-dimensional closed interval $\Lambda$ to the given space, and the curve $f$ is said to be self-consistent, or a principal curve of a random vector $X$, if it satisfies
$$f(\lambda) = E\big[X \mid \lambda_f(X) = \lambda\big], \quad (1)$$
where $\lambda_f(x)$ is the projection index of the point $x$ onto the function $f$, i.e., the parameter value whose image $f(\lambda_f(x))$ is closest to $x$. Self-consistency implies that each point $f(\lambda)$ is the average of all data points projected onto it. One of the most important consequences of self-consistency is that a principal curve is a critical point of the reconstruction error under small perturbations (Hastie and Stuetzle, 1989).
The existence of a principal curve satisfying the self-consistency of (1) for a given distribution is not guaranteed, and it is difficult to obtain a principal curve by solving the self-consistency equation directly. Thus, the algorithm of Hastie and Stuetzle (1989) represents the principal curve as a first-order spline connecting a finite set of points, which are initialized by a given rule. The points of the principal curve are updated so that the curve approaches the self-consistency condition by iterating the following two steps, projection and expectation: (a) In the projection step, the given data are projected onto the curve; if the projection index is not unique, it is set to the maximum value among the candidates. (b) In the expectation step, the curve is updated to satisfy self-consistency.
In practice, the projection indices of the data points are likely to be all distinct. Thus, each point of the curve is updated to the local weighted mean of the data points projected near that point of the curve. These two steps are iterated until the reconstruction error converges.
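As an illustration of the projection step on a first-order spline, the following sketch (in Python with NumPy; the function name and the tie-breaking detail are our own illustrative choices, not taken from the original algorithm description) projects a point onto a polyline and returns its arc-length projection index:

```python
import numpy as np

def project_to_polyline(x, vertices):
    """Projection step on a first-order spline: project x onto each
    segment, keep the closest foot point and its arc-length index;
    ties are resolved by taking the larger index, as in the text."""
    best = None
    arclen = 0.0
    for a, b in zip(vertices[:-1], vertices[1:]):
        d = b - a
        t = np.clip(np.dot(x - a, d) / np.dot(d, d), 0.0, 1.0)
        foot = a + t * d                       # closest point on this segment
        dist = np.linalg.norm(x - foot)
        lam = arclen + t * np.linalg.norm(d)   # arc-length projection index
        if (best is None or dist < best[0]
                or (np.isclose(dist, best[0]) and lam > best[1])):
            best = (dist, lam, foot)
        arclen += np.linalg.norm(d)
    return best[1], best[2]                    # projection index, projected point
```

For example, projecting the point (0.5, 0.2) onto the polyline (0,0)-(1,0)-(1,1) yields the foot point (0.5, 0) with projection index 0.5.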
As related works, Banfield and Raftery (1992) proposed a modification of the original algorithm by Hastie and Stuetzle (1989) to reduce bias when segments of function have high curvature. Tibshirani (1992) presented a different definition of principal curves based on a mixture model and suggested an EM algorithm for estimation to reduce bias.
2.2 Means on Manifolds
The self-consistency of principal curves in Euclidean space implies that the conditional expectation of a random variable given a projection index coincides with the point of the curve at that projection index. The concept of the expected value of a distribution extends to a smooth manifold and is called the Fréchet mean. Given a probability distribution $P$ on a smooth Riemannian manifold $M$ with a distance $\rho$, the Fréchet mean is defined as
$$\mu = \operatorname*{argmin}_{p \in M} \int_M \rho^2(p, x)\, dP(x). \quad (2)$$
The natural distance on a manifold is the geodesic distance $\rho_g$. The Fréchet mean of (2) using $\rho_g$ as the distance is termed the intrinsic mean (Bhattacharya and Patrangenaru, 2003). In general, however, it is difficult to compute the geodesic distance and to ensure the uniqueness of the intrinsic mean (Le and Kume, 2000).
On the other hand, by embedding the given manifold $M$ into a higher-dimensional Euclidean space $\mathbb{R}^k$, the Fréchet mean can be calculated using the Euclidean distance in $\mathbb{R}^k$, which is called the extrinsic mean. Letting $j$ denote the embedding from $M$ to $\mathbb{R}^k$, the extrinsic mean is defined as
$$\mu_E = \operatorname*{argmin}_{p \in M} \int_M \|j(p) - j(x)\|^2\, dP(x). \quad (3)$$
It is equivalent to the projection of the mean of the embedded data points in $\mathbb{R}^k$ onto $j(M)$ (Bhattacharya and Patrangenaru, 2003). With a projection mapping $P(u) = \operatorname*{argmin}_{q \in M} \|u - j(q)\|$, the extrinsic mean of (3) can be calculated as
$$\mu_E = P\big(E[\,j(X)\,]\big). \quad (4)$$
We remark that the extrinsic mean obtained with the Euclidean distance has a computational advantage over the intrinsic mean (Bhattacharya et al., 2012). Moreover, the extrinsic mean is practically indistinguishable from the intrinsic mean when the distribution is supported on a small region (Bhattacharya and Patrangenaru, 2005).
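For the unit sphere embedded in $\mathbb{R}^3$, the projection in (4) is simply normalization, so the extrinsic mean reduces to normalizing the Euclidean average. A minimal sketch (the function name is ours):

```python
import numpy as np

def extrinsic_mean(points):
    """Extrinsic mean on the unit sphere: average the embedded
    coordinates in R^3, then project back onto the sphere by
    normalization (the closest-point projection P)."""
    m = np.mean(points, axis=0)
    norm = np.linalg.norm(m)
    if norm == 0:
        raise ValueError("Euclidean mean is the origin; extrinsic mean undefined.")
    return m / norm
```

For instance, two points placed symmetrically about the north pole along a meridian have the north pole as their extrinsic mean.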
2.3 Existing Principal Curves on Riemannian Manifolds
Hauberg (2016) developed principal curves on Riemannian manifolds by extending the previous work of Hastie and Stuetzle (1989). The principal curve of Hauberg (2016) is represented as a set of points $z_1, \ldots, z_T$ joined by geodesic segments, and these points are initialized with the principal geodesic. The fitting algorithm follows that of Hastie and Stuetzle (1989); more specifically, the mean operation in the expectation step is replaced with the intrinsic mean, and in the projection step, each data point $x$ is projected onto the closest of the points as
$$\mathrm{proj}(x) = \operatorname*{argmin}_{z \in \{z_1, \ldots, z_T\}} \rho(x, z).$$
Note that Hauberg’s algorithm does not perform accurate projection on a principal curve.
3 Enhancement of Principal Circle for Initialization
In this section, we improve the principal circle for use as the initialization of the proposed principal curves in Section 4.
3.1 Principal Geodesic and Principal Circle
The principal curve algorithm of Hastie and Stuetzle (1989) uses the first principal component as the initial curve, which is easily calculated by singular value decomposition (SVD) of the data matrix in Euclidean space. Likewise, the proposed principal curve algorithm in Section 4 requires an initial curve.
Principal geodesic analysis (PGA) by Fletcher et al. (2004) can be considered a generalization of PCA that performs dimension reduction of data on a Cartesian product of simple manifolds. For this purpose, Fletcher et al. (2004) projected each manifold component of the data onto the tangent space at the intrinsic mean of that component. As a result of the tangent-space approximation of each component, the given data are approximated by points in Euclidean space; thus, by applying PCA, dimension reduction can be performed through the inverse process of the tangent projection. For the spherical case, the tangent-space projection is performed using the inverse exponential map, called the log map. The log map at $(0, 0, 1)$ of the unit sphere centered at the origin of $\mathbb{R}^3$ is given by Fletcher et al. (2004) as
$$\mathrm{Log}_{(0,0,1)}(x) = \left(x_1 \frac{\theta}{\sin\theta},\; x_2 \frac{\theta}{\sin\theta}\right),$$
where $x = (x_1, x_2, x_3)$ is a point on the unit sphere and $\theta = \arccos(x_3)$.
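The log map above can be sketched directly; this assumes the base point is the north pole $(0, 0, 1)$, as in the displayed formula:

```python
import numpy as np

def log_map_north_pole(x):
    """Log map at the north pole (0, 0, 1) of the unit sphere,
    following Fletcher et al. (2004): a point x on the sphere is
    mapped to the tangent plane at the pole, preserving its
    geodesic distance theta = arccos(x3) to the pole."""
    x = np.asarray(x, dtype=float)
    theta = np.arccos(np.clip(x[2], -1.0, 1.0))   # geodesic distance to the pole
    if np.isclose(theta, 0.0):
        return np.zeros(2)                        # the pole maps to the origin
    scale = theta / np.sin(theta)
    return scale * x[:2]
```

For example, a point on the equator is mapped to a tangent vector of length $\pi/2$, its geodesic distance from the pole.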
Principal curves on manifolds by Hauberg (2016) employ PGA for the initialization of a curve with endpoints and use a circle obtained by tangent approximation for the initialization of a closed curve. The first principal component obtained by projecting the manifold onto the tangent space corresponds to a geodesic on the original manifold (Fletcher et al., 2004). Thus, the principal curves of Hauberg (2016) are initialized as great circles or approximated circles that do not minimize the reconstruction error for representing spherical data. However, a geodesic on a sphere is limited in its ability to represent data on the sphere. Figure 1 shows earthquake data from the U.S. Geological Survey that represent the locations (blue dots) of significant earthquakes of Mb magnitude 8+ around the Pacific Ocean since 1900. The data are analyzed in detail in Section 5. In Figure 1, the result (pink) by PGA does not fit the data appropriately, while our principal circle (red) of Section 3.2 improves the representation of the data.
In the literature, there exists an attempt by Jung et al. (2011) to generalize PGA to a circle on a sphere. The circle on the sphere that minimizes the reconstruction error is called the principal circle, where the reconstruction error is defined as the total sum of squared geodesic distances between the curve and the data. Jung et al. (2011) used a doubly iterated algorithm that finds the principal circle after projecting the data onto the tangent space via the log map. However, this approach suffers from two problems. First, using the tangent approximation in minimizing the distance causes numerical errors. When data points are located far from the mean, the numerical errors increase considerably because there is no local isometry between the sphere and its tangent plane, by Gauss's Theorema Egregium. For instance, the left panel in Figure 2 shows simulated data distributed in the range of longitudes from 0.7 to 0.8, where the underlying structure is a great circle. Even though the structure is a circle, the data points are mainly concentrated around the north pole, so the sample looks like an inhomogeneous great circle. The middle panel shows the projection of the dataset onto the tangent plane at the intrinsic mean. Although the data points around the south pole are dense, they become somewhat scattered after mapping onto the plane, as shown in the middle panel, which causes numerical errors (non-isometricity). Second, owing to a topological difference between the sphere and the plane, the existence of principal circles in the plane is not guaranteed. Even though the sphere is compact, the tangent plane is not compact, merely open. By the compactness of the sphere, the least-squares circle always exists regardless of the data structure; however, this does not hold in the plane. In the right panel of Figure 2, the blue dotted points are the projections onto the plane, which form a non-periodic line.
Because a small arc of a large circle resembles a line, there are infinitely many circles whose reconstruction error is arbitrarily close to 0. Hence, the least-squares circle does not exist in this case.
In this study, for a better initialization of the proposed principal curves in Section 4, we propose a new principal circle that does not depend on the tangent projection. A similar attempt was made earlier by Gray et al. (1980), who embedded a sphere into three-dimensional space and then found a circle minimizing the reconstruction error through an iterative quadratic approximation of geodesic distances in the three-dimensional coordinate system. Specifically, Gray et al. (1980) expressed the center of the circle as three-dimensional coordinates $c = (c_1, c_2, c_3)$ with the constraint $c_1^2 + c_2^2 + c_3^2 = 1$.
In contrast, we obtain a constraint-free optimization problem by expressing the center of the circle in a spherical coordinate system in Section 3.2; hence, our principal circle can be obtained through a simple algorithm.
3.2 Intrinsic Principal Circle
For our principal circle, we consider an intrinsic optimization algorithm that does not use an approximation; thus, it is simple and applicable to situations focused on a two-dimensional sphere rather than a high-dimensional one. Let $\rho(x, y)$ denote the minimal geodesic distance between two points $x$ and $y$ on a sphere. For a given dataset $x_1, \ldots, x_n$ and a circle $C$ on a sphere, let $G$ be the sum of squares of distances between the circle and the data, defined as
$$G(C) = \sum_{i=1}^{n} \rho^2\big(x_i, \pi_C(x_i)\big),$$
where $\pi_C(x_i)$ denotes the projection of $x_i$ on $C$. The goal is to find a circle on the sphere that minimizes $G$. To solve this optimization problem, we represent a circle $C$ by its center $c$ and a radius $r$ that is the geodesic distance between the center and the circle. This representation is not unique: for example, letting $c^*$ be the antipode of $c$, the pairs $(c, r)$ and $(c^*, \pi - r)$ represent the same circle $C$. Nevertheless, this is not crucial to the optimization problem since we simply need one representation of the least-squares circle. See Figure 3. Using a spherical coordinate system, we can parameterize the center as $c = (\phi, \theta)$, where $\phi$ denotes the azimuth angle and $\theta$ is the polar angle. By the symmetry of the circle, $\rho(x_i, \pi_C(x_i))$ can be easily calculated by
$$\rho\big(x_i, \pi_C(x_i)\big) = \big|\rho(x_i, c) - r\big|.$$
Thus, we have
$$G(\phi, \theta, r) = \sum_{i=1}^{n} \big(\rho(x_i, c) - r\big)^2. \quad (5)$$
Letting $c = (\phi, \theta)$ and $x_i = (\phi_i, \theta_i)$ in the spherical coordinate system, the geodesic distance $\rho(x_i, c)$ is given by the spherical law of cosines applied to the three points $x_i$, $c$, and the polar point (see Appendix for details):
$$\rho(x_i, c) = \arccos\big(\cos\theta \cos\theta_i + \sin\theta \sin\theta_i \cos(\phi - \phi_i)\big). \quad (6)$$
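The spherical law of cosines distance above can be sketched as a small helper (the function name is ours):

```python
import numpy as np

def geodesic_distance(phi1, theta1, phi2, theta2):
    """Geodesic distance between two points on the unit sphere given
    in spherical coordinates (azimuth phi, polar angle theta), via the
    spherical law of cosines applied to the triangle with the pole."""
    c = (np.cos(theta1) * np.cos(theta2)
         + np.sin(theta1) * np.sin(theta2) * np.cos(phi1 - phi2))
    return np.arccos(np.clip(c, -1.0, 1.0))   # clip guards rounding error
```

For example, two equator points separated by a quarter turn in azimuth are at geodesic distance $\pi/2$, as is the pole from any equator point.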
By substituting (6) into (5), $G$ is represented as a differentiable function of the three parameters $(\phi, \theta, r)$ on a compact domain. Since the domain is compact, $G$ attains its global minimum; hence, we can apply the gradient descent method to find the solution. The algorithm below finds a principal circle following the above description.
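A minimal sketch of the circle-fitting step follows. It expresses the geodesic distance through the dot product with the center (equivalent to the law-of-cosines form) and uses central-difference numerical gradients; the learning rate, iteration count, and function names are illustrative choices, not the paper's exact algorithm:

```python
import numpy as np

def sph_to_cart(phi, theta):
    """Spherical coordinates (azimuth phi, polar angle theta) to a unit vector."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def circle_loss(params, data):
    """G(phi, theta, r): sum of squared geodesic distances between the
    data (rows of unit vectors) and the circle with center (phi, theta)
    and geodesic radius r."""
    phi, theta, r = params
    c = sph_to_cart(phi, theta)
    d = np.arccos(np.clip(data @ c, -1.0, 1.0))  # geodesic distances to the center
    return np.sum((d - r) ** 2)

def principal_circle(data, init, lr=0.05, n_iter=2000, eps=1e-6):
    """Minimize G by gradient descent with central-difference gradients."""
    p = np.asarray(init, dtype=float)
    for _ in range(n_iter):
        grad = np.zeros(3)
        for k in range(3):
            e = np.zeros(3)
            e[k] = eps
            grad[k] = (circle_loss(p + e, data) - circle_loss(p - e, data)) / (2 * eps)
        p = p - lr * grad
    return p  # (phi, theta, r) of the fitted circle
```

On data lying exactly on the equator, for instance, the fitted circle should approach the center at a pole with radius near $\pi/2$.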
Notice that the algorithm may converge to a local minimum since $G$ is not a convex function. Therefore, the intrinsic mean of the dataset is recommended as the initial center, with its spherical coordinates used as the initial $(\phi, \theta)$. In addition, the step size of the gradient descent works well unless it is set too large.
4 Proposed Principal Curves
In this section, we propose both extrinsic and intrinsic principal curves on a sphere and present practical algorithms for implementing the procedures. We further investigate the stationarity of the proposed principal curves.
4.1 Extrinsic Principal Curves
As mentioned previously, Hauberg's algorithm does not perform exact projection onto a principal curve. On the other hand, when we focus on a sphere, a series of rotations enables exact projection onto the geodesic segments and hence a continuous and exact dimension reduction, which is the main feature distinguishing our method from Hauberg's principal curves. As in the original work of Hastie and Stuetzle (1989), we iterate the projection and expectation steps to obtain principal curves on a sphere.
In the projection step, each data point is projected onto the curve along a geodesic. To project a point onto an arbitrary geodesic segment, we rotate the geodesic segment so that it is aligned on the equator. Next, we project the rotated data point onto the rotated geodesic segment and rotate the result back to obtain the coordinates of the desired projection. Figure 4 shows this procedure, which projects a point onto a geodesic segment by using rotations. In the left panel of Figure 4, the initial position of the point (blue) and the corresponding geodesic segment (red) are displayed. The middle panel shows the result of rotating one of the endpoints of the geodesic segment to the north pole. After this rotation, the geodesic becomes part of a great circle perpendicular to the $xy$-plane. By adding $\pi/2$ to the azimuth angle of the geodesic, we obtain the center (green) of the great circle containing the geodesic segment. The right panel shows the result of rotating this center to the north pole. The geodesic segment is now aligned on the equator, so we can easily calculate the coordinates of the projection of the point (blue) in a spherical coordinate system. By inverting these rotations, we obtain the coordinates of the projection in the original coordinate system. Finally, after projecting the data point onto each geodesic segment of the principal curve, we obtain the projection onto the curve by choosing the closest geodesic segment. Let $\lambda_f(x)$ denote the projection index of a point $x$ onto the curve $f$. Then, the projection of $x$ can be expressed as $f(\lambda_f(x))$.
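The exact projection described above can be sketched with a closed form that is equivalent to the rotation sequence: dropping the point onto the plane of the great circle through the segment's endpoints and renormalizing gives the same foot point, with a clamp to the nearer endpoint when the foot falls outside the arc (helper names are ours):

```python
import numpy as np

def project_to_geodesic_segment(x, a, b):
    """Exact projection of unit vector x onto the geodesic segment
    from a to b (unit vectors in R^3). Equivalent to the rotation
    procedure: project onto the plane of the great circle through
    a and b, renormalize, and clamp to an endpoint if needed."""
    n = np.cross(a, b)
    n = n / np.linalg.norm(n)                 # normal of the great circle
    p = x - np.dot(x, n) * n                  # foot point on the plane
    p = p / np.linalg.norm(p)                 # back onto the sphere
    ang = lambda u, v: np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))
    # p lies inside the arc iff the angles along the arc add up exactly
    if np.isclose(ang(a, p) + ang(p, b), ang(a, b), atol=1e-6):
        return p
    return a if ang(x, a) < ang(x, b) else b  # clamp to the nearer endpoint
```

For example, projecting the direction $(1,1,1)/\sqrt{3}$ onto the equatorial segment from $(1,0,0)$ to $(0,1,0)$ yields $(1,1,0)/\sqrt{2}$.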
In the expectation step, we find the mean of the data points. The mean on a sphere can be obtained from either the intrinsic or the extrinsic perspective. We first focus on the extrinsic approach, which is advantageous in that it has a smaller computational complexity than the intrinsic algorithm and, further, the stationarity of the spherical principal curves still holds. The expectation step follows that of the principal curve of Hauberg (2016), i.e., updating by a weighted average through smoothing, which makes the curve closer to the self-consistency condition. Suppose that we have data points $x_1, \ldots, x_n$ and the corresponding projection indices $\lambda_1, \ldots, \lambda_n$. Let $T$ denote the number of points in an initial curve. The local weighted smoother updates the $t$-th point of the principal curve with the weighted extrinsic mean of the data points, for $t = 1, \ldots, T$. We use a quadratic kernel $K(u) = (1 - u^2)\,\mathbf{1}(|u| \le 1)$, as in Hauberg (2016), and the weights are given by $w_i = K\big((\lambda_i - \lambda)/q\big)$, where $\lambda$ is the projection index of the $t$-th curve point and $q > 0$ is a fixed bandwidth.
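A sketch of the kernel-weighted extrinsic update for one curve point, assuming the quadratic kernel and a bandwidth parameter as described above (variable and function names are ours):

```python
import numpy as np

def quadratic_kernel(u):
    """Quadratic kernel K(u) = 1 - u^2 on |u| <= 1, zero outside,
    as used by Hauberg (2016) for the local smoother."""
    return np.where(np.abs(u) <= 1.0, 1.0 - u ** 2, 0.0)

def smoothing_weights(lam_point, lam_data, q):
    """Normalized weights w_i = K((lam_i - lam) / q) for one curve point;
    lam_data holds the projection indices of the data, q the bandwidth."""
    w = quadratic_kernel((lam_data - lam_point) / q)
    s = w.sum()
    return w / s if s > 0 else w

def extrinsic_update(lam_point, lam_data, data, q):
    """Expectation step (extrinsic): weighted Euclidean mean of the
    data, projected back onto the sphere by normalization."""
    w = smoothing_weights(lam_point, lam_data, q)
    m = (w[:, None] * data).sum(axis=0)
    return m / np.linalg.norm(m)
```

With a small bandwidth, only data projected near the curve point receive nonzero weight, so the update stays local.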
Here are the detailed steps for obtaining the proposed extrinsic principal curve.
We remark that, for principal curves on a sphere, we can choose the curves to have endpoints or not. Through our experiments, we observe no significant difference in empirical performance between the two choices. With Euclidean space as the embedding space, the extrinsic approach is computationally efficient. However, if the data points involved in the expectation step are not contained in a local region, the intrinsic approach may perform better than the extrinsic one. Moreover, the intrinsic approach might be appealing owing to its use of the inherent metric.
4.2 Intrinsic Principal Curves
Intrinsic principal curves can be obtained by using the intrinsic mean in the expectation step of Algorithm 2. In our case, the intrinsic mean is the point that minimizes the sum of squared geodesic distances to the data points; this minimizer cannot be obtained in closed form but can be approximated by an iterative algorithm (Fletcher et al., 2004). Although the intrinsic mean exists due to the compactness of the sphere, its uniqueness is guaranteed only in a restricted region; specifically, the intrinsic mean is determined uniquely for a random vector on the unit sphere whose support is contained in a sufficiently small geodesic ball (Pennec, 2006).
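The iterative intrinsic-mean computation can be sketched with the standard exp/log-map iteration of Fletcher et al. (2004); the function names, starting point, and stopping rule are illustrative choices:

```python
import numpy as np

def exp_map(p, v):
    """Exponential map on the unit sphere at p applied to tangent vector v."""
    t = np.linalg.norm(v)
    if np.isclose(t, 0.0):
        return p
    return np.cos(t) * p + np.sin(t) * (v / t)

def log_map(p, x):
    """Log map at p: tangent vector at p pointing toward x whose length
    equals the geodesic distance between p and x."""
    c = np.clip(np.dot(p, x), -1.0, 1.0)
    t = np.arccos(c)
    if np.isclose(t, 0.0):
        return np.zeros(3)
    u = x - c * p                      # component of x orthogonal to p
    return t * u / np.linalg.norm(u)

def intrinsic_mean(points, n_iter=100, tol=1e-10):
    """Iterative intrinsic (Frechet) mean: average the log-mapped
    points in the tangent space at the current estimate, then move
    there by the exp map; repeat until the update is negligible."""
    mu = points[0]
    for _ in range(n_iter):
        v = np.mean([log_map(mu, x) for x in points], axis=0)
        if np.linalg.norm(v) < tol:
            break
        mu = exp_map(mu, v)
    return mu
```

For two points placed symmetrically about the north pole along a meridian, the iteration converges to the pole, the geodesic midpoint.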
Here are the detailed steps for obtaining the proposed intrinsic principal curve.
4.3 Stationarity of Principal Curves
For a random vector $X$ on $\mathbb{R}^d$ with $d \ge 2$, the stationarity of the principal curve $f$ of $X$ is given by Hastie and Stuetzle (1989) as
$$\frac{d}{dt}\, E\, d^2(X, f + t g)\Big|_{t=0} = 0, \quad (7)$$
where $f$ and $g$ are smooth curves on $\mathbb{R}^d$ satisfying $\|g\|_\infty \le 1$ and $\|g'\|_\infty \le 1$, and $d(x, f)$ denotes the distance from $x$ to the curve $f$.
However, unlike in $\mathbb{R}^d$, it is not easy to define addition on a sphere. Thus, it is necessary to redefine some concepts, such as addition and perturbation, in order to extend the properties of principal curves from Euclidean space to the spherical surface. We therefore consider the perturbed curve $f_t$ directly, instead of constructing it as $f + tg$. Specifically, let $f$ and $g$ be infinitely differentiable smooth curves on a sphere parameterized on $\Lambda$. Under these assumptions, we define the perturbation shown in Figure 5 as follows.
The $t$-internal division of the geodesic from $f(\lambda)$ to $g(\lambda)$ means the point reached by contracting $t$ times along the geodesic from $f(\lambda)$ toward $g(\lambda)$, that is, $\mathrm{Exp}_{f(\lambda)}\big(t\, \mathrm{Log}_{f(\lambda)} g(\lambda)\big)$.
For $t \in [0, 1]$, the $t$-internal division of the geodesic from $f(\lambda)$ to $g(\lambda)$ can be defined in the same manner. We note that when $f(\lambda)$ and $g(\lambda)$ are antipodal, the shortest geodesic is not unique. Thus, we only consider the case in which the distance between $f$ and the perturbation function $g$ is less than $\pi$ when the curves are on the unit sphere. This is reasonable since we aim to show the stationarity for small $t$.
Let $f$ and $g$ be smooth curves on a sphere parameterized on $\Lambda$ satisfying $\rho(f(\lambda), g(\lambda)) < \pi$ for all $\lambda$. Then, we define $f_t(\lambda)$ as the $t$-internal division of the geodesic from $f(\lambda)$ to $g(\lambda)$.
The following two properties can be obtained from the definition.
Under the same conditions as in Definition 2, $f_t$ is a smooth curve on a sphere and $f_t \to f$ uniformly on $\Lambda$ as $t \to 0$.
From the assumption that $f$ and $g$ are smooth functions and the uniqueness assumption of the connecting geodesic, the smoothness of $f_t$ with respect to $\lambda$ follows. Moreover, it is easy to see that $f_t$ is infinitely differentiable on $\Lambda$. In addition, from the definition of the $t$-internal division, the uniform convergence of $f_t$ to $f$ on $\Lambda$ as $t \to 0$ follows. ∎
In Euclidean space, we have $f_t = f + t g$, where $g = f_1 - f$. By analogy, the magnitude of the perturbation on the sphere is measured by the geodesic distance $\rho(f(\lambda), g(\lambda))$, and its boundedness guarantees that the $t$-internal division of the geodesic from $f(\lambda)$ to $g(\lambda)$ converges to $f(\lambda)$ uniformly on $\Lambda$ as $t$ goes to zero. Notice that, from the compactness of the unit sphere, $\rho(f(\lambda), g(\lambda))$ is inherently no greater than $\pi$; thus, the boundedness assumption is mild.
Moreover, the norm of the derivative of the perturbation is defined analogously. It is easy to check that bounding this norm ensures a finite length of $f_t$ and the well-definedness of $f_t'$ for $t \in [0, 1]$. In addition, $\Lambda$ is extended to a small open interval containing it for differentiability at the boundary.
Let $x$ be a point on a sphere. By the continuity of $\rho(x, f(\cdot))$ and the compactness of the domain $\Lambda$, the infimum $\inf_{\lambda \in \Lambda} \rho(x, f(\lambda))$ is attained. Let $d(x, f)$ denote the geodesic distance between $x$ and the curve $f$, i.e., $d(x, f) = \min_{\lambda} \rho(x, f(\lambda))$. By continuity again, the set of minimizers is closed and therefore compact; thus, the projection index $\lambda_f(x) = \max\{\lambda : \rho(x, f(\lambda)) = d(x, f)\}$ is well defined. When the minimizer is not unique, the point $x$ is called an ambiguity point of $f$. The set of ambiguity points of a smooth curve has spherical measure 0; thus, the ambiguity points are negligible when calculating expected values.
Let $X$ be a random vector on a sphere that has a probability density. We say that $f$ is an extrinsic principal curve of $X$ when $f$ satisfies the following self-consistency:
$$f(\lambda) = P\big(E[\,j(X) \mid \lambda_f(X) = \lambda\,]\big),$$
where $j$ denotes the natural embedding from the unit sphere to $\mathbb{R}^3$ and $P$ is a projection mapping from $\mathbb{R}^3$ to the sphere, both of which are used in (4) for the definition of the extrinsic mean.
In the principal curve algorithm, it is natural that the projection is made along a geodesic, and therefore it is reasonable to calculate the distance in (7) as the geodesic distance. However, it is difficult to show directly that equation (7) is valid on the spherical surface: the square is a natural operation in Euclidean space, but not on the sphere. By using the spherical law of cosines, we obtain the following result.
Suppose that $f$ and $g$ are finite-length smooth curves satisfying the boundedness conditions above and that $X$ is a random vector on the unit sphere. Then, $f$ is an extrinsic principal curve of $X$ if and only if
$$\frac{d}{dt}\, E\big[\,1 - \cos d(X, f_t)\,\big]\Big|_{t=0} = 0 \quad (8)$$
for all such perturbations $g$.
A proof is provided in the Appendix. We remark that, since $1 - \cos\theta \approx \theta^2/2$ for small $\theta$, the above result (8) has a meaning similar to (7). Based on this, the sum of $1 - \cos d(x_i, f)$ over the data can be considered the reconstruction error of the extrinsic principal curve, in place of the sum of squared distances. Moreover, $1 - \cos d$ is a monotonically increasing function of $d$ for $d \in [0, \pi]$; hence, it makes no difference which of the two is used as an evaluation measure or a threshold in the algorithm.
Furthermore, we find a necessary and sufficient condition of intrinsic principal curves on a sphere below.
Suppose that $f$ and $g$ are finite-length smooth curves satisfying the boundedness conditions above, and that the random vector $X$ on the unit sphere satisfies an additional distance constraint governed by a small $\epsilon > 0$. Then, $f$ is an intrinsic principal curve of $X$ if and only if
$$\frac{d}{dt}\, E\, d^2(X, f_t)\Big|_{t=0} = 0.$$
In contrast to the extrinsic stationarity, the additional constraint is necessary to guarantee that the projection index is differentiable in $t$ and that its derivative is uniformly bounded, which is verified by Lemma 5 in the Appendix. Furthermore, the measure of the excluded set goes to 0 as $\epsilon \to 0$ by Lemma 4 in the Appendix. Thus, the constraint on the random vector is almost negligible when $\epsilon$ is set infinitesimally small, since $X$ has a probability density.
We remark that the stationarity of principal curves in Euclidean space provides a rationale for the principal curves of Hastie and Stuetzle (1989) as a nonlinear generalization of the linear principal component. Along the same line, the above stationarity results provide theoretical justification that the proposed approaches extend principal curves from Euclidean space to the sphere.
5 Numerical Experiments
In this section, we conduct numerical experiments with a real data analysis and several simulated examples to assess the practical performance of the proposed methods, implemented by the algorithms in Section 4. The Python code used to implement the methods and carry out the experiments is available at https://github.com/Janghyun1230/Spherical-Principal-Curve so that one can reproduce the results.
5.1 Real Data Analysis: U.S. Geological Earthquake Survey
Here we present a real data analysis with the proposed principal curves on a sphere. For this purpose, we consider earthquake data from the U.S. Geological Survey in Figure 6, which represent the distribution of significant earthquakes (Mb magnitude 8+) around the Pacific Ocean since the year 1900. As shown in Figure 6, we observe some features of the data: (a) the observations are scattered on the surface of the Earth, and (b) they form a non-linear one-dimensional structure on the sphere. Thus, the proposed principal curve appears applicable for extracting this feature from the data.
We now fit the proposed extrinsic principal curve over different values of the smoothing parameter $q$. As $q$ decreases, the resulting curve becomes wiggly and overfits the data. Figure 7 shows the results for values of $q$ including 0.2. The determination of the parameter $q$ is not easy. Duchamp and Stuetzle (1996) showed that a principal curve is always a saddle point of the expected squared distance function, which explains why cross-validation is not reliable for choosing $q$. In Euclidean space, Biau and Fischer (2012) suggested parameter selection by empirical risk minimization with penalization of the length and the number of points of the curve. In the spherical domain, this topic is left for future work.
As mentioned earlier, the proposed algorithm requires an initialization. The results in Figure 7 are obtained by using uniformly sampled points from the intrinsic principal circle of Section 3.2 as the initialization.
|         |                           | Extrinsic | Intrinsic | Hauberg (2016) |
|---------|---------------------------|-----------|-----------|----------------|
| T = 77  | # of distinct projections | 74/77     | 72/77     | 22/77          |
|         | # of distinct projections | 76/77     | 76/77     | 9/77           |
|         | # of distinct projections | 74/77     | 73/77     | 16/77          |
|         | # of distinct projections | 75/77     | 75/77     | 27/77          |
| T = 500 | # of distinct projections | 75/77     | 72/77     | 30/77          |
|         | # of distinct projections | 75/77     | 74/77     | 18/77          |
|         | # of distinct projections | 75/77     | 75/77     | 27/77          |
|         | # of distinct projections | 75/77     | 75/77     | 37/77          |
We further compare the proposed extrinsic principal curves with the principal curves of Hauberg (2016). Figure 8 shows both results under the same setting, where the purple lines represent the fitted curves by each method and the blue lines represent the projections of the data onto the curve. As one can see, the proposed extrinsic principal curve represents the given data continuously on the curve, whereas the principal curve of Hauberg (2016) projects some local data points onto a single point. The results are further summarized in Table 1. When the 77 earthquake observations are projected onto the principal curves, the number of distinct projections by our method is much larger than that of Hauberg's method, which implies that our principal curve represents the data continuously while Hauberg's principal curve tends to cluster the data. Moreover, through a reconstruction error defined as $\sum_{i=1}^{n} d^2(x_i, \hat{x}_i)$ with observations $x_i$ and fitted values $\hat{x}_i$, we compare both methods and observe that our method outperforms Hauberg's method.
5.2 Simulation Study
To evaluate the empirical performance of the proposed method, we conduct a simulation study. For this purpose, we consider two types of structure on the unit sphere, described in spherical coordinates $(\phi, \theta)$, where $\phi$ is the azimuthal angle and $\theta$ denotes the polar angle.
Circle: a circle obtained by holding the polar angle constant. It is an ideal case for the extrinsic principal curve as well as the improved principal circle.
Wave: a waveform obtained by varying the polar angle periodically along the azimuth. It has a complicated structure whose frequency and amplitude can vary.
For each type, we generate data points from the underlying curve by sampling equally spaced azimuthal angles and adding Gaussian noise at a given noise level. Figure 9 shows a realization of noisy circular data (marked in blue) together with the extrinsic principal curve obtained by the proposed method, which represents the underlying structure well.
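A simulation of this kind can be sketched as follows. The paper's exact noise mechanism is given by a formula elided above, so this sketch makes an explicit assumption: Gaussian noise is drawn in the tangent plane at each point of the circle and mapped back to the sphere by the exponential map. All function names are ours.

```python
import numpy as np

def sph_to_cart(theta, phi):
    """theta: azimuthal angle, phi: polar angle measured from the north pole."""
    return np.stack([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)], axis=-1)

def exp_map(p, v):
    """Exponential map on the unit sphere: p unit vectors, v tangent vectors at p."""
    nv = np.linalg.norm(v, axis=-1, keepdims=True)
    step = np.cos(nv) * p + np.sin(nv) * v / np.maximum(nv, 1e-12)
    return np.where(nv > 1e-12, step, p)

def noisy_circle(n=100, phi0=np.pi / 4, sigma=0.1, rng=None):
    """Equally spaced points on a circle of polar angle phi0, perturbed by
    tangent-plane Gaussian noise of scale sigma (an assumed noise model)."""
    rng = np.random.default_rng(rng)
    theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
    base = sph_to_cart(theta, np.full(n, phi0))
    noise = rng.normal(scale=sigma, size=(n, 3))
    # project the ambient noise onto the tangent plane at each base point
    tang = noise - np.sum(noise * base, axis=1, keepdims=True) * base
    return exp_map(base, tang)
```

By construction every generated point has unit norm, and setting `sigma=0` recovers the exact circle.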
We now consider a dataset with the wave structure, for which the principal circle cannot reveal the wave feature, while the proposed principal curve may be capable of extracting it. To conduct various experiments, we use several waveforms, parameterized by amplitude and frequency. In addition, the sampling rate of the data points is varied across the experimental settings.
Firstly, a waveform dataset with high amplitude and high frequency is generated. Figure 10 shows the underlying structure, the noisy data, and the extrinsic principal curve, which represents the underlying waveform efficiently. Notice that the amplitude of principal curves is reduced when segments of the underlying structure have high curvature, owing to the locally weighted averaging in the expectation step of the algorithms; this is a general phenomenon of the original principal curves (Banfield and Raftery, 1992). Secondly, we consider a waveform dataset with relatively low frequency in Figure 11. We compare our methods with Hauberg's method under the same condition. As one can see, both the extrinsic and intrinsic principal curves extract the true waveform effectively, while Hauberg's method yields a rather sharp curve. Thirdly, Figure 12 shows a noisy waveform dataset with relatively low amplitude and frequency, together with the true waveform. The result of the proposed method represents the waveform feature well, as shown in the right panel. Lastly, in Figure 13, we use a waveform with high amplitude and high frequency as in the first case, but with a low sampling rate, so that the number of data points is small compared to the case of Figure 10. In other words, the generated data are rather sparse. The result of the proposed method in the right panel takes a sawtooth form, but it still roughly represents the up-down pattern on the sphere.
For further evaluation, we conduct a Monte Carlo simulation. For this purpose, we generate noisy data from the two true curves above with Gaussian noise at two noise levels, one of them 0.1. The data points are equally spaced. We consider three methods for feature extraction: the proposed extrinsic and intrinsic principal curves, and the principal curve of Hauberg (2016). As an evaluation measure, we use a reconstruction error computed from sample values of the true function and the corresponding fitted values. In addition, we use the number of distinct projection points as another evaluation measure: a resulting curve with a large number of distinct projection points can be regarded as representing the data continuously.
For each combination of the underlying curve and noise level, we generate equally spaced data points and then compute the reconstruction error and the number of distinct projection points for each method. Table 2 lists the average values of the reconstruction errors and their standard deviations over 50 simulated data sets. As listed, the proposed principal curves outperform the method of Hauberg (2016). Table 3 provides the average numbers of distinct projection points and their standard deviations. As one can see, the proposed methods provide a much larger number of distinct projection points than that of Hauberg (2016). Overall, as listed in Tables 2 and 3, the proposed methods perform better than Hauberg's method, including the case where the number of points on the curve is much larger than the number of data points, in which Hauberg's method improves. In addition, the results of the intrinsic and extrinsic principal curves are similar in terms of both reconstruction error and the number of distinct projection points, which is consistent with the fact that the intrinsic and extrinsic means are almost identical in a local region (Bhattacharya and Patrangenaru, 2005).
Table 2. Averages (standard deviations) of reconstruction errors over 50 simulated data sets; the value columns correspond to the combinations of noise level and sample size.

| True form | Method | Noise level | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Circle | Proposed (extrinsic) | 0.093 (0.026) | 0.12 (0.026) | 0.095 (0.013) | 0.201 (0.049) | 0.218 (0.042) | 0.138 (0.025) |
| | Proposed (intrinsic) | 0.093 (0.026) | 0.12 (0.027) | 0.095 (0.013) | 0.201 (0.048) | 0.216 (0.046) | 0.137 (0.025) |
| | Hauberg | 0.117 (0.073) | 0.408 (0.149) | 0.298 (0.038) | 0.370 (0.205) | 0.74 (0.208) | 0.494 (0.063) |
| Wave | Proposed (extrinsic) | 0.703 (0.116) | 0.338 (0.092) | 0.084 (0.026) | 0.679 (0.136) | 0.353 (0.108) | 0.124 (0.037) |
| | Proposed (intrinsic) | 0.71 (0.114) | 0.329 (0.097) | 0.084 (0.023) | 0.673 (0.150) | 0.346 (0.113) | 0.124 (0.038) |
| | Hauberg | 2.444 (0.059) | 2.158 (0.155) | 0.568 (0.055) | 2.544 (0.118) | 2.103 (0.563) | 0.796 (0.094) |
| Circle | Proposed (extrinsic) | 0.088 (0.026) | 0.118 (0.023) | 0.091 (0.018) | 0.209 (0.049) | 0.212 (0.043) | 0.129 (0.018) |
| | Proposed (intrinsic) | 0.088 (0.026) | 0.118 (0.023) | 0.091 (0.018) | 0.21 (0.050) | 0.207 (0.043) | 0.129 (0.018) |
| | Hauberg | 0.089 (0.027) | 0.205 (0.079) | 0.269 (0.034) | 0.233 (0.087) | 0.453 (0.177) | 0.397 (0.079) |
| Wave | Proposed (extrinsic) | 0.531 (0.064) | 0.239 (0.058) | 0.072 (0.020) | 0.57 (0.094) | 0.237 (0.081) | 0.110 (0.031) |
| | Proposed (intrinsic) | 0.535 (0.065) | 0.239 (0.056) | 0.072 (0.020) | 0.574 (0.094) | 0.237 (0.082) | 0.110 (0.031) |
| | Hauberg | 2.006 (0.697) | 1.831 (0.146) | 0.529 (0.043) | 1.906 (0.847) | 1.756 (0.696) | 0.688 (0.073) |
Table 3. Averages (standard deviations) of the numbers of distinct projection points over 50 simulated data sets; the value columns correspond to the combinations of noise level and sample size.

| True form | Method | Noise level | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Circle | Proposed (extrinsic) | 99.06 (0.31) | 98.96 (0.40) | 98.80 (0.45) | 99.06 (0.31) | 98.6 (1.12) | 98.10 (1.05) |
| | Proposed (intrinsic) | 99.02 (0.32) | 98.92 (0.34) | 98.84 (0.47) | 99.08 (0.34) | 98.72 (1.11) | 98.20 (1.12) |
| | Hauberg | 87.70 (7.95) | 56.68 (17.99) | 64.70 (3.22) | 69.80 (12.28) | 47.04 (15.44) | 60.42 (2.83) |
| Wave | Proposed (extrinsic) | 93.36 (4.50) | 96.88 (1.96) | 99.30 (0.54) | 94.98 (3.86) | 96.36 (2.34) | 99.00 (0.67) |
| | Proposed (intrinsic) | 93.36 (4.47) | 97.28 (2.13) | 99.32 (0.51) | 95.82 (3.77) | 96.72 (2.22) | 99.10 (0.65) |
| | Hauberg | 22.72 (2.77) | 25.94 (2.65) | 62.14 (2.49) | 24.5 (3.63) | 32.16 (16.72) | 58.84 (3.04) |
| Circle | Proposed (extrinsic) | 99.06 (0.24) | 99.08 (0.40) | 98.72 (0.58) | 99.08 (0.27) | 98.94 (0.47) | 98.14 (1.03) |
| | Proposed (intrinsic) | 99.08 (0.27) | 99.02 (0.25) | 98.76 (0.69) | 99.1 (0.30) | 99.04 (0.49) | 99.30 (1.09) |
| | Hauberg | 97.8 (1.47) | 89 (8.63) | 79.28 (4.60) | 93.64 (5.29) | 78.72 (13.08) | 73.86 (7.28) |
| Wave | Proposed (extrinsic) | 99.16 (0.37) | 98.5 (1.40) | 99.28 (0.64) | 99.18 (0.39) | 98.68 (1.22) | 99.30 (0.74) |
| | Proposed (intrinsic) | 99.18 (0.39) | 98.5 (1.27) | 99.26 (0.56) | 99.22 (0.42) | 98.84 (1.20) | 99.18 (0.66) |
| | Hauberg | 45.2 (24.8) | 43.38 (3.72) | 73.20 (3.42) | 52.04 (26.81) | 50.64 (19.99) | 71.52 (4.38) |
6 Median Principal Curves: A Robust Extension
The intrinsic mean is obtained by minimizing the sum of squared geodesic distances; thus, the intrinsic principal curves might be sensitive to outliers. Following the convention in Euclidean space, it is natural to consider the median as an alternative measure of the centrality of data. In particular, we replace the expectation step in Algorithm 3 with its median counterpart.
The resulting curves then pass through the “median” of the given dataset. Mimicking the principal curve of (1), we now define a median principal curve as
where the points projecting onto a given point of the curve lie on a great circle on the sphere, and a connected proper subset of a great circle is isometric to a line segment of the same length in Euclidean space. Thus, the above median can be found in the same way as in Euclidean space. More specifically, for a given dataset, the point on the sphere that minimizes the sum of geodesic distances to the data is called the geometric median, which has no explicit form (Bajaj, 1988); thus, it must be obtained by a numerical algorithm. In a manner similar to Buss and Fillmore (2001), we obtain an update rule, and Algorithm 4 is a practical algorithm for finding the geometric median on a sphere by gradient descent.
In the case that a dataset on the unit sphere is contained in a convex region in which the geodesic distance between any two points is sufficiently small, the geometric median exists and is unique (Fletcher et al., 2009). In practice, it suffices to compute the geometric median only on a small region in the expectation step of the principal curve algorithm, say Algorithm 3. Meanwhile, notice that we divide the gradient by the iteration count k at each iteration. In contrast to the intrinsic mean, the gradient is a weighted average of unit vectors; thus, its length need not go to 0, and the algorithm could oscillate and fail to converge. Heuristically, to cope with this problem, we divide the gradient by the number of iterations k, so that the step length goes to 0 as k grows. Moreover, the iterates can still travel any required distance because the harmonic series diverges. Algorithm 4 works and is empirically stable. To reduce the computational cost, an alternative denominator growing with the iteration count could be used instead of k.
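The damped gradient descent described above can be sketched as follows. This is not the paper's Algorithm 4 verbatim; it is a minimal version, with helper names of our own, in which the descent direction (the sum of unit tangent vectors toward the data) is divided by the iteration count `k`, so the step lengths shrink like 1/k while their cumulative sum diverges.

```python
import numpy as np

def log_map(p, x):
    """Inverse exponential map on the unit sphere: tangent vector at p toward x."""
    c = np.clip(np.dot(x, p), -1.0, 1.0)
    d = np.arccos(c)                      # geodesic distance from p to x
    if d < 1e-12:
        return np.zeros(3)
    v = x - c * p
    return d * v / np.linalg.norm(v)

def exp_map(p, v):
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p
    return np.cos(nv) * p + np.sin(nv) * v / nv

def geometric_median(X, n_iter=500):
    """Gradient descent for the geometric median on the sphere, with the step
    damped by the iteration count k as in the heuristic described above."""
    m = X.mean(axis=0)
    m /= np.linalg.norm(m)                # start from the projected Euclidean mean
    for k in range(1, n_iter + 1):
        g = np.zeros(3)
        for x in X:
            v = log_map(m, x)
            d = np.linalg.norm(v)
            if d > 1e-12:                 # skip points coinciding with m
                g += v / d                # unit tangent toward x
        m = exp_map(m, g / (len(X) * k))  # damped step toward the data
    return m
```

For points lying on a common great circle, the iterates settle near the one-dimensional geometric median (the middle point for three points on a geodesic), as expected.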
Furthermore, we investigate the stationarity of the median principal curves as follows.
Under the same conditions as in Theorem 2, a curve is a median principal curve of the distribution if and only if
We now present a simulated example of the empirical performance of the proposed median principal curves. For this purpose, we consider circular data on the sphere, formed by holding the polar angle constant. We generate data points by sampling equally spaced azimuthal angles and adding random noise from a Cauchy distribution. As shown in Figure 14, the median principal curve is robust to the outliers and represents the underlying structure well, compared to the extrinsic principal curve.
7 Concluding Remarks
In this paper, we have proposed new principal curves on a sphere for dimension reduction of spherical data. For this purpose, we have considered both extrinsic and intrinsic approaches and investigated the stationarity of the principal curves, which supports that the proposed methods are a direct extension of the principal curves of Hastie and Stuetzle (1989) to the sphere.
When Euclidean space serves as the embedding space, neither the dimension nor the location of the origin affects the distance, and hence the extrinsic mean is unaffected as well. Moreover, the extrinsic approach is computationally efficient. However, it is questionable whether an extrinsic approach remains effective on asymmetric manifolds; for some asymmetric manifolds, an intrinsic approach may yield better performance. The intrinsic mean, defined through the given geodesic distance, is also attractive because it is inherent to the manifold. However, the iterative computation of the intrinsic mean is expensive, since the algorithm takes time to converge.
In this study, the proposed principal curve algorithm is a top-down approach, in the sense that it approximates the structure of the data with an initial curve and then improves the approximation gradually in an iterative manner. Thus, initialization plays a crucial role in the algorithm. The intrinsic principal circle of Section 3.2 has been used as the initialization of the proposed principal curve. It is capable of identifying some meaningful structures in spherical data. However, there exist complex structures that cannot be represented by a circle alone, for example, a structure divided into several pieces or a structure with intersections. To cope with this limitation, it is worthwhile to study a “bottom-up” approach, which is left for future work.
As final remarks, in shape analysis or motion analysis, data have been represented as Cartesian products of simple manifolds such as SO(2). PGA by Fletcher et al. (2004) and the principal arc analysis by Jung et al. (2011) perform dimension reduction on medial representations by mapping the given data space to Euclidean space. If the data are properly projected onto geodesic segments, the proposed principal curve procedure can be applied to medial representations. Since the mean can be computed independently for each component of the Cartesian product, there is no additional difficulty in the expectation step. Problems of projecting from a Cartesian product of simple manifolds onto geodesic segments are left for future study.
Appendix: Stationarity of Principal Curves
Throughout this discussion, we consider a finite-length smooth curve f that does not cross itself on the sphere, including curves with endpoints as well as closed curves, both of which can be parameterized by a closed interval. In the latter case, a boundary condition is needed: the partial derivatives of f of every order must coincide at the two endpoints of the interval. For a random vector on the sphere, we further assume that the curve is not too short, which implies that the random vector is orthogonally projected onto the curve f. Note that f is infinitely differentiable on the closed interval; i.e., f extends smoothly to an open interval containing it, and all of its derivatives are continuous. Our main purpose is to prove the stationarity of the extrinsic, intrinsic, and median principal curves that satisfy equations (8), (9), and (10) in Theorems 1, 2, and 3, respectively.
When moving from Euclidean space to the spherical surface, topological properties such as measurability and continuity are preserved, while formulas involving the specific distance must be modified. This modification can be obtained by embedding the spherical surface into three-dimensional Euclidean space. Specifically, we embed the spherical surface as the unit sphere centered at the origin and carry out the derivations there. For a smooth curve f, suppose that f is parameterized with constant speed and expressed in three-dimensional coordinates. Then the following lemmas hold.
The curve f satisfies ⟨f(t), f(t)⟩ = 1, ⟨f(t), f′(t)⟩ = 0, and ⟨f′(t), f′′(t)⟩ = 0 for all t, where ⟨·, ·⟩ denotes the inner product in ℝ³.
These identities follow directly from the facts that f lies on the unit sphere and is parameterized with constant speed, by differentiating ⟨f, f⟩ = 1 and ⟨f′, f′⟩ = constant. ∎
Suppose that x and f(t) are expressed as three-dimensional unit vectors. Then the geodesic distance between x and f(t) equals the angle between them, so that d(x, f(t)) = arccos⟨x, f(t)⟩. It follows that the projection index of the point x onto the curve f is a parameter value at which this distance attains its minimum.
At the minimizer, the first-order condition obtained by differentiating the distance holds. From the assumption that f is a smooth curve and the fact that the distance attains its minimum at the projection index, the remaining part of the lemma follows by differentiation with respect to t. ∎
(Spherical law of cosines) Let A, B, and C be points on a sphere, and let a, b, and c denote the geodesic distances d(B, C), d(A, C), and d(A, B), respectively. If γ is the angle between the sides of lengths a and b, i.e., the angle of the corner opposite c, then cos c = cos a cos b + sin a sin b cos γ. Further, regarding A, B, and C as three-dimensional vectors, the corner angle γ can be expressed through the cross products of these vectors, where × denotes the cross product in ℝ³.
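The identity cos c = cos a cos b + sin a sin b cos γ can be checked numerically. In the sketch below (helper names are our own), the corner angle is computed by projecting the other two vertices onto the tangent plane at the corner, which is equivalent to the cross-product expression mentioned in the lemma.

```python
import numpy as np

def geod(u, v):
    """Geodesic (great-circle) distance between unit vectors u and v."""
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def vertex_angle(p, u, v):
    """Angle at vertex p between the geodesics from p toward u and toward v."""
    tu = u - np.dot(u, p) * p             # tangent directions at p
    tv = v - np.dot(v, p) * p
    cosang = np.dot(tu, tv) / (np.linalg.norm(tu) * np.linalg.norm(tv))
    return np.arccos(np.clip(cosang, -1.0, 1.0))

def law_of_cosines_residual(x1, x2, x3):
    """cos c - (cos a cos b + sin a sin b cos gamma), with gamma the angle at x3."""
    a, b, c = geod(x2, x3), geod(x1, x3), geod(x1, x2)
    gamma = vertex_angle(x3, x1, x2)
    return np.cos(c) - (np.cos(a) * np.cos(b) + np.sin(a) * np.sin(b) * np.cos(gamma))
```

For any non-degenerate spherical triangle the residual vanishes up to floating-point error.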
The following topological properties of principal curves, established in Euclidean space by Hastie and Stuetzle (1989), remain valid on the spherical surface.
(Measurability of the projection index) For a continuous curve f on a sphere, the projection index is measurable.
(Continuity of the projection index under perturbation) If x is not an ambiguity point for the continuous curve f, then the projection index of x is continuous under perturbations of the curve.
In the proof of Proposition 3, we use the triangle inequality on the sphere. Moreover, by the smoothness and the compact domain of f, we have a more efficient tool for verifying Theorems 2 and 3.
(Uniform continuity of the projection index under perturbation) The convergence of the projection index holds uniformly on the set of non-ambiguity points of f. That is, for every ε > 0, there exists δ > 0 such that, for any non-ambiguity point x, if the perturbation of the curve is smaller than δ, then the change of the projection index of x is smaller than ε.
To guarantee the uniform continuity of the projection index, a boundedness condition is required. A proof is similar to that of Proposition 3; thus, we omit it, and it can be provided on request.
The spherical measure of the set of ambiguity points of a smooth curve is 0.
Detailed steps of the proof of Proposition 5 are similar to those of Hastie and Stuetzle (1989); thus, we omit the proof, which can be provided on request.
Meanwhile, to prove Theorems 2 and 3, it is crucial to verify that the projection index is differentiable and that its derivative is uniformly bounded. Thus, it is necessary to define a suitable subset of the sphere, indexed by a small positive parameter. Note that this subset increases as the parameter goes to 0, by definition.
The image of a smooth function from the parameter interval to the sphere has measure 0. Moreover, the complement of the subset defined above
is a union of the images of two smooth functions from the parameter interval to the sphere, which implies that it has measure 0. Therefore, the measure of the complement goes to 0 as the parameter goes to 0.
Suppose that the function under consideration is smooth on the sphere. The domain and range are second countable differentiable manifolds whose dimensions are 1 and 2, respectively. Since the function is twice continuously differentiable and its differential has rank 1, which is less than the dimension of the range, Sard's theorem implies that the image has measure zero. Next, each point of the set in question is characterized by two equations for some parameter value. Therefore, we define the corresponding functions as follows: for all t,
It is well known that the curvature of a smooth curve lying on the unit sphere is at least 1. This gives a lower bound involving the curvature of f for all t, and hence the relevant denominator is bounded away from zero. We already know the orthogonality relation from Lemma 1. Hence, the two functions defined above are well defined and smooth. Therefore, the image of the set in question has measure zero, which completes the proof. ∎
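The claim that a smooth curve on the unit sphere has curvature at least 1 can be checked numerically for small circles: a circle at polar angle φ has curvature 1/sin φ, which attains its minimum value 1 for a great circle. A quick finite-difference sketch, with helper names of our own:

```python
import numpy as np

def curvature(f, t, h=1e-5):
    """Numerical curvature |f' x f''| / |f'|^3 of a space curve at parameter t."""
    f1 = (f(t + h) - f(t - h)) / (2 * h)                 # central first difference
    f2 = (f(t + h) - 2 * f(t) + f(t - h)) / h ** 2       # central second difference
    return np.linalg.norm(np.cross(f1, f2)) / np.linalg.norm(f1) ** 3

def small_circle(phi):
    """Circle of constant polar angle phi on the unit sphere."""
    return lambda t: np.array([np.sin(phi) * np.cos(t),
                               np.sin(phi) * np.sin(t),
                               np.cos(phi)])
```

The great circle (φ = π/2) gives curvature 1, and smaller circles give strictly larger values, consistent with the bound used in the proof.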
Lemma 4 implies that the constraints on the random vector in Theorems 2 and 3 are almost negligible, since the parameter can be set infinitesimally small. The lemma is also used in verifying Theorem 3. The set of ambiguity points of the smooth curve f on the sphere has measure zero by Proposition 5.
Consider the set of non-ambiguity points of the smooth curve f on the sphere, and suppose that a point belongs to this set for some small positive parameter. Then the projection index is a smooth function of the perturbation parameter on an open interval containing zero. Moreover, its derivative is uniformly bounded; that is, there exist a constant and a threshold such that the derivative is bounded by the constant whenever the point and the perturbation parameter satisfy these conditions.
Since the point is a non-ambiguity point of the curve, its projection is uniquely characterized by the orthogonality between the geodesic from the point to its projection and the tangent of the curve, on a small interval near zero, by the same argument as in Lemma 1. We then define a map encoding this orthogonality condition, which is a smooth function by Proposition 1. It follows from the definition that
By the implicit function theorem, the projection index is, for each such point, a smooth function on an open interval containing zero. Next, in order to prove the uniform boundedness of its derivative, we should verify that