
Simplicial Nonlinear Principal Component Analysis

Abstract.

We present a new manifold learning algorithm that takes as input a set of data points, possibly with noise, lying on or near a lower dimensional manifold, and outputs a simplicial complex that fits the data and the manifold. We have implemented the algorithm in the case where the input data has arbitrary dimension but can be triangulated. We provide triangulations of data sets that fall on the surface of a torus, sphere, swiss roll, and creased sheet embedded in $\mathbb{R}^{50}$. We also discuss the theoretical justification of our algorithm.

Key words and phrases:
nonlinear dimensionality reduction, tangent space, manifold learning, principal component analysis
2000 Mathematics Subject Classification:
62-04
This work was supported by NSF DMS-1007399

1. Introduction

Given a large set of data points in a high dimensional space, the task of a manifold learning algorithm is to discover a lower dimensional manifold that approximates the data reasonably well. Principal component analysis may be interpreted as a manifold learning algorithm when the high dimensional data points lie near a lower dimensional affine subspace, in the sense that it can extract the affine subspace from the data set. Nature does not always serve up inherently linear data sets, though. For example, a set of image vectors generated by photographing a sculpture at different azimuth and altitude angles may intuitively be described by two parameters related to the two angles, but this description is certainly nonlinear [2]. In [1], the authors use principal component analysis to precondition stellar spectra data before passing it to a neural network for classification, and note that a nonlinear preprocessing scheme may help the neural network learn rare or weak features of the data.

Isomap [6], Local Linear Embedding [3, 4], and Local Tangent Space Alignment [7] are recent approaches to reducing the dimension of artificially high dimensional data sets without destroying the geometric characteristics of the original data. Isomap approximates the geodesic distance between every pair of data points by summing the straight line Euclidean distances along the shortest rectilinear path through the data that connects the two points. It then constructs an embedding of the data into a lower dimensional Euclidean space by applying Multidimensional Scaling to the set of pairwise geodesic distances so that the geodesic distances are nearly preserved in the lower dimensional representation. Local Linear Embedding approximates each data point as a weighted average of its neighbors, and then reduces the data set by mapping each data point to a lower dimensional space in such a way that the set of nearest neighbors of a data point in the lower dimensional space is preserved under the mapping, and each data point in the lower dimensional space is approximated by the same weighted average of its neighbors. Given data that lies near the surface of a manifold, Local Tangent Space Alignment estimates the tangent space at each point from its neighbors, and then generates a lower dimensional coordinate system in such a way that the tangent space associated with a point in the lower dimensional space is still aligned with the tangent spaces associated with its neighbors. It appears that all three of these methods work best when the input data lie on a manifold that admits a global coordinate system into which the data can be mapped. In the case of Local Linear Embedding, the authors note that it is an open question how to modify their algorithm to handle input data that lie on the surface of a sphere or torus [4, p. 148]. One of our design motivations was to develop a manifold learning algorithm that can handle this type of data set.

We present Simplicial Nonlinear Principal Component Analysis (SNPCA), an algorithm that shares some of the underlying motivation of Local Tangent Space Alignment, but differs in that it is a manifold learning algorithm whose output is a simplicial complex that acts as a lower dimensional description of the nonlinear input data set. SNPCA reduces the data set in two senses. First, every simplex vertex coincides with a surface data point, and typically there will be far fewer simplex vertices than data points. Second, the subset of data points that lie near a face of the complex are fit by that face, whose dimension is typically much smaller than the dimension of the data points. We have implemented our algorithm in the case where the data can be fit with a complex of two-simplices, that is, when the data can be triangulated. This is the case when the data lies near the surface of a two dimensional manifold embedded in a higher dimensional Euclidean space.

2. Algorithm overview for data that can be triangulated

We initially describe the algorithm for a set of data that can be triangulated, as this is the case for which we have implemented the algorithm. The two fundamental inputs into SNPCA are a set of data vectors and a characteristic length. The output of SNPCA is a simplicial complex represented by a set of vertices, a set of edges, and a set of triangles. Each triangle is a three element set of vertices, and each edge is a two element set of vertices. We assume that the data has been scaled so that changes in the different coordinates of the data vectors are comparably measured. The algorithm is designed so that each vertex coincides with a data point, but the number of vertices is typically much less than the number of data points. There are no isolated edges, meaning that every edge is a subset of some triangle, and there are no isolated vertices: every vertex belongs to at least one edge and one triangle.

The algorithm is composed of two distinct stages. The first is the advancing front stage, where each successful iteration of the main loop yields a new triangle that is added to the triangulation. In an iteration of the advancing front stage, the algorithm first selects an active front edge, which is an edge that belongs to exactly one triangle in the triangulation. A triangle edge can belong to at most two triangles, so intuitively, a front edge is a triangle edge that is exposed. The algorithm attempts to generate a new triangle composed of the active edge's vertices and another vertex, which is typically new but may be an existing vertex in the triangulation. If the new triangle is acceptable, then the triangulation is updated with the new triangle. Otherwise, the front edge is marked unviable for the remainder of the advancing front stage, and the algorithm will not return to it until the second stage. The first stage terminates when every front edge is unviable.

As one can see in the figures in §4, the advancing front stage typically generates an incomplete triangulation of the data. This incomplete triangulation has seams, which are sequences of front edge pairs, where the edges in each pair are both nearly parallel and close to each other. The second stage of the algorithm is the seam sewing stage, where the algorithm attempts to close the remaining gaps in the triangulation. We now describe both stages in greater detail.

2.1. Advancing front stage for data that is locally two dimensional

2.1.1. Generating a candidate triangle from the active front edge

To generate the candidate triangle based on the active front edge, SNPCA requires the active edge's two vertices, the data points located near each of the two vertices, and the characteristic length. Once the algorithm has generated the candidate triangle, it then determines whether to add it to the triangulation based on the criteria in §2.1.2. Given a front edge with two vertices, the algorithm determines the best coordinates for a third vertex so that the new triangle fits a subset of the data, where the coordinates are best in the sense that they solve the nonlinear constrained minimization problem (4), which we now describe.

Let $v_1$ and $v_2$ denote the two vertices belonging to the active front edge. We define the empirical local direction covariance matrix associated with a vertex $v$ as

(1)   $\displaystyle C(v) \;=\; \frac{1}{|N(v)|} \sum_{x \in N(v)} \frac{(x - v)(x - v)^{T}}{\|x - v\|_{2}^{2}}$

where $x$ and $v$ are column vectors of coordinates, $N(v)$ is the subset of data falling in a neighborhood centered at the vertex $v$, and $|N(v)|$ denotes the number of data points in the neighborhood. Two possibilities for $N(v)$ are a Euclidean neighborhood of fixed radius, or a fixed number of the data points nearest $v$.

The definition of the empirical local direction covariance matrix is motivated by ideas from principal component analysis. If the vertex $v$ and the data points in $N(v)$ lie on the surface of a smooth two dimensional manifold, then each member of the set of normalized directions $\{(x - v)/\|x - v\|_2 : x \in N(v)\}$ can be nearly reconstructed from a two dimensional basis, although the set of directions almost certainly spans a much higher dimensional space. Akin to principal component analysis, the dominant eigenvectors of the covariance matrix $C(v)$ furnish a natural basis for the set of normalized directions, and the eigenvalues provide a measure of how well each basis vector reconstructs the data.
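To make the preceding definitions concrete, the sketch below computes the empirical local direction covariance matrix of (1) for a vertex and a Euclidean neighborhood, together with its eigendecomposition; the function name and the use of numpy are our own choices, and the construction simply averages outer products of normalized directions as described above.

```python
import numpy as np

def local_direction_covariance(v, data, radius):
    """Empirical local direction covariance C(v) at a vertex v.

    Averages the outer products of the unit vectors pointing from v to
    every data point inside a Euclidean ball of the given radius; the
    vertex itself (zero displacement) is excluded.
    """
    diffs = data - v                            # displacements to all data points
    dists = np.linalg.norm(diffs, axis=1)
    mask = (dists > 0) & (dists <= radius)      # deleted Euclidean neighborhood of v
    if not mask.any():
        raise ValueError("no data points in the neighborhood of v")
    dirs = diffs[mask] / dists[mask, None]      # normalized directions
    return dirs.T @ dirs / dirs.shape[0]        # average of outer products

# The dominant eigenpairs play the role of a local principal component basis;
# np.linalg.eigh returns eigenvalues in ascending order.
# evals, evecs = np.linalg.eigh(local_direction_covariance(v, data, radius))
```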

Associated with each vertex $v$ is a Riemannian metric induced by the matrix

(2)   $M(v) \;=\; \bigl(C(v) + \epsilon I\bigr)^{-1}$

where the user must set the parameter $\epsilon > 0$ so that, under this metric, the distance from $x$ to $v$ is large when the displacement $x - v$ does not lie in the affine subspace associated with the dominant eigendirections of $C(v)$. Without an efficient method to compute the distance under the metric induced by $M(v)$, our algorithm would require an impractical amount of time to complete. Our assumption that the data can be fit with triangles implies that the numerical rank of $C(v)$ is much less than the dimension of a data vector, so we only need the dominant eigenvalues and eigenvectors of $C(v)$ to measure distances in the induced metric. Suppose $C(v)$ has $k$ nonzero eigenvalues. Let $\Lambda$ denote the $k \times k$ diagonal matrix of nonzero eigenvalues, and let $U$ denote the matrix whose orthonormal columns are eigenvectors of $C(v)$ so that $C(v) = U \Lambda U^{T}$. Let $y = U^{T}(x - v)$ be the coordinate vector of $x - v$ in this eigenbasis; then the distance induced by (2) can be computed as

(3)   $\displaystyle \|x - v\|_{M(v)}^{2} \;=\; (x - v)^{T} M(v)\,(x - v) \;=\; y^{T}(\Lambda + \epsilon I)^{-1} y \;+\; \frac{1}{\epsilon}\Bigl(\|x - v\|_{2}^{2} - \|y\|_{2}^{2}\Bigr)$

Computing the left-hand side of (3) in the naive but straightforward manner requires on the order of $n^{2}$ arithmetic operations, where $n$ is the dimension of a data vector. Forming $y$ requires on the order of $nk$ arithmetic operations, and once we have formed $y$, forming the right-hand side of (3) requires only on the order of $n$ additional arithmetic operations. Of course, the right-hand side requires the eigen information in $\Lambda$ and $U$, but these can be computed efficiently by a Krylov subspace method, since the structure of $C(v)$ allows us to compute a matrix-vector product with $C(v)$ as a linear combination of the normalized directions. Each such product requires on the order of $mn$ arithmetic operations, where $m$ is the number of normalized directions, and typically $m$ is much less than the dimension $n$ of a data vector.
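The sketch below illustrates one way to organize this computation, under the assumption that the metric matrix has the regularized inverse form shown in (2). The routine names are hypothetical, and eigsh from SciPy stands in for the Krylov subspace method; the matrix-vector product with $C(v)$ is formed from the stored normalized directions so that $C(v)$ itself is never assembled.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def dominant_eigenpairs(dirs, k):
    """Top-k eigenpairs of C = dirs.T @ dirs / m without forming C.

    dirs is the m-by-n array of normalized directions; each matrix-vector
    product costs O(m*n) operations instead of the O(n^2) an explicit C
    would require.
    """
    m, n = dirs.shape
    matvec = lambda x: dirs.T @ (dirs @ x) / m
    op = LinearOperator((n, n), matvec=matvec, dtype=np.float64)
    return eigsh(op, k=k, which="LM")

def induced_distance_sq(x, v, evals, evecs, eps):
    """Squared distance from x to v in the metric (C + eps*I)^{-1}, an
    assumed form of the matrix in (2), using only dominant eigenpairs."""
    d = x - v
    y = evecs.T @ d                                  # coordinates in the eigenbasis
    # Contribution of the eigenbasis component plus the orthogonal remainder.
    return float(y @ (y / (evals + eps)) + (d @ d - y @ y) / eps)
```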

Given the front edge with vertices $v_1$ and $v_2$, the algorithm attempts to generate a new triangle with vertices $v_1$, $v_2$, and a third vertex, which is typically new but may be an existing vertex. The bulk of the computational expense required to generate the new triangle is spent solving the constrained minimization problem (4) for the new vertex. The constrained optimization problem is

(4a)
(4b)

The first constraint describes a Euclidean sphere of a prescribed radius centered at the midpoint of the active front edge. We assume that the raw data has been scaled so that all the points on this sphere represent more or less equal changes in the characteristics of the data. The second constraint is the isosceles constraint, so called because it forces the minimizer to be equidistant from both active edge vertices, where the distance to each edge vertex $v_i$ is measured by the metric induced by $M(v_i)$. The shorter the induced length of a displacement from a vertex, the more confident we are that the data lies in that direction. Under the first two constraints in (4b), the minimization problem can have two solutions, and setting the new vertex to one of these minimizers would introduce a triangle that is redundant in the sense that it nearly replicates the triangle that owns the active edge. The third constraint is designed to remove this minimizer from consideration. Let $w$ denote the third vertex in the triangle that owns the active front edge; the vector appearing in the third constraint is the unit normal to the front edge that lies in the plane of that triangle and points away from $w$. That is, it is the unit vector obtained by projecting the displacement from $w$ to the edge midpoint onto the orthogonal complement of the edge direction within the plane of the triangle and normalizing.
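A minimal sketch of solving a problem of this form with an off-the-shelf solver is given below. The objective shown, the squared induced distance to one edge vertex (which equals the distance to the other vertex on the constraint set, by the isosceles constraint), is our own stand-in for the objective (4a), which is not reproduced here; the vertex metric matrices, constraint sphere radius, outward unit normal, and starting guess are supplied by the caller.

```python
import numpy as np
from scipy.optimize import minimize

def place_new_vertex(v1, v2, M1, M2, rho, n_hat, x0):
    """Sketch of a constrained problem in the spirit of (4).

    v1, v2: active edge vertices; M1, M2: their metric matrices;
    rho: constraint sphere radius; n_hat: unit normal to the edge that lies
    in the plane of the owning triangle and points away from it; x0: guess.
    """
    mid = 0.5 * (v1 + v2)

    def dist1(x):                     # squared induced distance to v1
        d = x - v1
        return d @ M1 @ d

    def dist2(x):                     # squared induced distance to v2
        d = x - v2
        return d @ M2 @ d

    cons = [
        {"type": "eq", "fun": lambda x: np.sum((x - mid) ** 2) - rho ** 2},   # sphere
        {"type": "eq", "fun": lambda x: dist1(x) - dist2(x)},                 # isosceles
        {"type": "ineq", "fun": lambda x: n_hat @ (x - mid)},                 # half space
    ]
    res = minimize(dist1, x0, method="SLSQP", constraints=cons)
    return res.x
```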

After solving the constrained optimization problem (4), the algorithm uses the minimizer to generate at most two candidates for the new vertex. The first candidate always exists, and is the data point nearest the minimizer, which may also be a vertex in the existing triangulation. The purpose of the second candidate is to avoid generating an unnecessarily small triangle at some point in the future, which works against the goal of fitting the data with as few triangles as possible. This can happen if the first candidate vertex is close to a vertex belonging to some other front edge that shares a vertex with the active front edge. In this event, the second candidate is the nearby vertex belonging to the front edge adjacent to the active edge, provided that the distance between the two vertices is within a user supplied tolerance. Without the second candidate, the original candidate triangle would have a front edge that forms a small angle with an adjacent front edge. As a result, the algorithm would likely generate a future small triangle that owns the two adjacent front edges with a small angle between them, and a third short edge. Illustrations of the first and second candidate vertices are in Figure 1 in the case where the data vectors are in $\mathbb{R}^3$.

(a) First candidate vertex
(b) Second candidate vertex
Figure 1. First and second candidate vertices. Existing triangles are dark gray; the candidate triangle is light gray.

The user must supply the radius of the constraint sphere in (4b). In our implementation, we set the constraint sphere radius based on a weighted average of the Euclidean length of the active front edge and the characteristic length. A weighted average with a heavy emphasis on the active edge length is more likely to result in an equilateral triangle, which is preferable if the active front edge is not too short. If the active front edge is relatively short, then it is preferable to generate an isosceles triangle with two long edges and a single short edge, since this triangle will fit more data than an equilateral triangle. In our implementation, we set the constraint sphere radius to a fixed multiple of the weighted average of the active edge length and the characteristic length, with both weights equal. In the common case where the active front edge length and the characteristic length are nearly equal, this constraint sphere radius favors a Euclidean equilateral triangle with sides that are approximately the characteristic length.

In our implementation, we use a Euclidean neighborhood for $N(v)$. When calculating $C(v)$, we set the radius of this neighborhood to the Euclidean length of the active edge.

Given the popularity and long history of least squares, one may ask why we did not choose to minimize the sum of squares of the two induced distances instead of the objective function (4a) in the constrained minimization problem. It turns out that when the two metric matrices are equal, as may be the case when the data is planar, the least squares solution is unacceptable. Suppose the origin is the midpoint of the active front edge, so its vertices satisfy $v_2 = -v_1$. Then the sum of squares reduces to a quadratic form in the common metric matrix plus a constant, and it is minimized on the constraint sphere of (4b) by the appropriately scaled eigenvector of that matrix associated with its smallest eigenvalue. In this case, the least squares solution completely ignores the other eigenvalues of the metric matrix, so this information is essentially wasted. Furthermore, there is no reason to believe that the minimizing eigenvector is nearly perpendicular to the active front edge, so the resulting triangle may be highly skewed.

2.1.2. Criteria for accepting the candidate triangle

The main idea behind deciding whether to accept the candidate triangle is that the new triangle should not conflict with the existing triangulation, which we now define. Given two triangles $A$ and $B$, we can always translate them so that, without loss of generality, one of the vertices of $A$ coincides with the origin. Let $A'$ and $B'$ denote the triangles obtained by orthogonally projecting the vertices of $A$ and $B$ into the subspace containing the vertices of $A$. Note that $A$ and $A'$ represent the same triangle, but their vertices are represented in different bases, while $B$ and $B'$ do not represent the same triangle in general. The triangles $A$ and $B$ are said to overlap if the intersection of the open interiors of $A'$ and $B'$ is nonempty, where the open interior of a triangle is the set of points expressible as convex combinations of its vertices with strictly positive weights. The triangles $A$ and $B$ are said to conflict if they overlap and the distance between $B$ and $B'$ is within some tolerance. We use the open rather than the closed interior in the definition of overlap so that two triangles that share a common edge do not necessarily conflict.
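The sketch below implements this conflict test under our own assumptions: the projection is carried out with a QR factorization of the edge vectors of the first triangle, the overlap of open interiors is approximated with a separating axis test in the plane, and the distance in the conflict test is taken to be the largest distance from a vertex of the second triangle to the plane of the first.

```python
import numpy as np

def _interiors_overlap_2d(P, Q, tol=1e-12):
    """Approximate test that the open interiors of 2-D triangles P and Q
    (each a 3x2 array) intersect: the interiors are disjoint when some
    edge normal separates the vertex sets with at most a zero-length overlap."""
    for tri in (P, Q):
        for i in range(3):
            edge = tri[(i + 1) % 3] - tri[i]
            axis = np.array([-edge[1], edge[0]])        # edge normal
            p_proj, q_proj = P @ axis, Q @ axis
            if min(p_proj.max(), q_proj.max()) <= max(p_proj.min(), q_proj.min()) + tol:
                return False                            # separated on this axis
    return True

def triangles_conflict(A, B, dist_tol):
    """A and B are 3-by-n arrays of triangle vertices in R^n.  B is projected
    orthogonally into the plane of A; the triangles conflict when the
    projected open interiors overlap and B lies within dist_tol of that plane."""
    origin = A[0]
    basis, _ = np.linalg.qr((A[1:] - origin).T)         # orthonormal basis of A's plane
    A2 = (A - origin) @ basis                           # A in plane coordinates
    B2 = (B - origin) @ basis                           # projection of B
    out_of_plane = (B - origin) - B2 @ basis.T          # components orthogonal to the plane
    dist = np.max(np.linalg.norm(out_of_plane, axis=1))
    return _interiors_overlap_2d(A2, B2) and dist <= dist_tol
```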

Figure 2 illustrates the two cases where a pair of triangles overlap but do not conflict, and where a pair of triangles do conflict. The illustrations are for data in $\mathbb{R}^3$ for the purpose of visualization, but the definitions of conflict and overlap hold for data in a higher dimensional space. In both cases, one dark triangle sits highest on the vertical axis, the other dark triangle lies in the horizontal coordinate plane, and the lighter triangle is formed by orthogonally projecting the vertices of the first dark triangle into the plane containing the second.

(a) Overlapping but nonconflicting triangles
(b) Conflicting triangles
Figure 2. Conflicting and nonconflicting triangles

If the candidate triangle conflicts with an existing triangle in the triangulation, then the algorithm rejects the candidate triangle and adds no new vertices or edges to the triangulation in the current iteration of the main loop. The algorithm may also fail to generate a new triangle if

  • The matrix $C(v)$ could not be computed for some active front edge vertex $v$ because the deleted neighborhood of $v$ contained no surface data points. This indicates that the radius of the neighborhood may be too small, or that the density of data points near $v$ is too low.

  • The distance from the minimizer to the nearest surface data point was larger than a user supplied tolerance. This can happen when a front edge lies near a boundary of the data set, such as the boundary of the creased sheet.

  • The constraint set in (4b) was empty.

2.1.3. Generating the initial triangulation

The advancing front stage of the algorithm requires an initial triangulation to get started. The algorithm generates the first vertex by setting its coordinates to those of some data point that is chosen randomly, unless the user specifies a specific initial point. The algorithm then finds a second data point so that the distance from the initial vertex to this data point is as close to the characteristic length as possible, and sets the coordinates of the second vertex equal to the coordinates of this second data point. With the edge defined by these two vertices, the algorithm now solves the constrained minimization problem (4) with the third constraint removed, which typically has two solutions. The algorithm generates coordinates for two new vertices from these minimizers by finding the two data points closest to the minimizers, and then generates the initial triangulation containing four vertices, five edges, and two triangles.

2.1.4. Selecting the active front edge in the advancing front stage

The algorithm must select the active front edge at the beginning of each iteration of the advancing front stage. In our implementation, we push any new front edges onto a stack (a last-in, first-out buffer) whenever a new triangle is created. Although an edge may be a front edge when it is pushed onto the stack, it may get covered by a triangle in some subsequent iteration of the advancing front stage. So, at the beginning of each iteration, the algorithm pops edges off the stack until it encounters a front edge, which becomes the active front edge. This approach appears to work fairly well in practice, although a more sophisticated algorithm for choosing the active front edge may yield a better final triangulation.
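A sketch of this selection loop is below; the data structures (a Python list used as the stack and a dictionary counting how many triangles currently contain each edge) are our own assumptions.

```python
def next_active_front_edge(edge_stack, triangle_count):
    """Pop edges until one that still borders exactly one triangle is found.

    edge_stack: list used as a last-in, first-out stack of candidate edges.
    triangle_count: maps an edge (a frozenset of two vertex ids) to the
    number of triangles currently containing it.
    """
    while edge_stack:
        edge = edge_stack.pop()
        if triangle_count.get(edge, 0) == 1:    # still a front edge
            return edge
    return None                                 # stack exhausted: the stage terminates
```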

2.2. Seam sewing stage for data that can be triangulated

The advancing front stage of SNPCA typically produces a triangulation that does not completely fit the data set, as one can see in §4. The seam sewing stage of the algorithm attempts to complete the triangulation with a variation of the advancing front stage algorithm. This stage requires the partial complex generated during the advancing front stage, the metric matrices associated with all vertices belonging to a front edge, and a maximum allowable edge length parameter that is similar in spirit to the characteristic length. Each successful iteration of the seam sewing stage produces a new triangle from an active edge, but unlike the advancing front stage, it never generates a new vertex.

2.2.1. Generating a new triangle from the active front edge

Let $v_1$ and $v_2$ denote the vertices of the active front edge. The algorithm attempts to form a new triangle by finding the existing vertex that solves the discrete optimization problem

(5)

Note that the minimizer comes from the finite set of existing vertices, so the first constraint reduces the problem to a search over a small subset of the vertex set. Since this set is discrete, there is no reason to believe that the isosceles constraint of (4b) can be satisfied. Consequently, we chose the objective function to simply minimize the sum of the squares of the triangle edge lengths opposite the active edge, measured in their respective induced metrics. We imposed the first constraint so that the seam sewing stage of the algorithm does not generate a new triangle that bridges a void in the data. The algorithm marks the active front edge as unviable if no vertex satisfies the constraints, and this information is used by the algorithm when selecting the next active edge in subsequent iterations.
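A sketch of this discrete search is below. The maximum edge length cap and the caller-supplied conflict test stand in for the constraints of (5), whose exact statement is not reproduced here, and the objective is the sum of squared induced lengths of the two edges opposite the active edge.

```python
import numpy as np

def seam_sewing_vertex(v1, v2, M1, M2, vertices, max_edge_len, conflicts):
    """Search the existing vertices for the best third vertex of a new
    triangle on the active edge (v1, v2); return its index, or None if the
    edge should be marked unviable."""
    best, best_val = None, np.inf
    for idx, w in enumerate(vertices):
        if np.allclose(w, v1) or np.allclose(w, v2):
            continue                                         # skip the active edge itself
        if max(np.linalg.norm(w - v1), np.linalg.norm(w - v2)) > max_edge_len:
            continue                                         # would bridge a void in the data
        if conflicts(v1, v2, w):
            continue                                         # candidate conflicts with the complex
        d1, d2 = w - v1, w - v2
        val = d1 @ M1 @ d1 + d2 @ M2 @ d2                    # sum of squared induced edge lengths
        if val < best_val:
            best, best_val = idx, val
    return best
```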

In practice, the seam sewing stage may fail to close holes in the triangulation, in which case it is up to the user to determine if the holes are spurious. If so, they may be closed by omitting the restriction in (5) that the candidate triangle must not conflict with the existing triangulation.

2.2.2. Selecting the active edge in the seam sewing stage

The algorithm that selects the next active edge during the seam sewing stage is similar to the one used during the advancing front stage. The major difference is that the advancing front stage produces a single sequence of front edges that terminates as soon as the edge stack becomes empty, whereas the seam sewing stage may produce more than one such sequence. At the beginning of the seam sewing stage, each front edge is marked as viable, and a front edge is marked unviable if, upon visiting it, the algorithm determines that the minimization problem (5) has no solution.

To initialize a sequence of front edges, the algorithm finds the smallest angle between two adjacent front edges, where at least one of the front edges is viable, and then pushes one of the viable front edges onto the edge stack. The algorithm repeatedly pops edges off the stack until it finds a front edge that has not been marked as unviable, and then sets the active front edge to this edge. It then attempts to generate a new triangle by solving the minimization problem in (5), and adds any new front edges to the edge stack. The sequence of front edges terminates when the edge stack is empty, and the seam sewing stage terminates when there are no front edges in the triangulation or every front edge is unviable.
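The sketch below shows one way to seed a new sequence of front edges; the representation of edges as vertex-id pairs and of viability as a set are our own assumptions.

```python
import numpy as np

def seed_front_edge(front_edges, coords, viable):
    """Return a viable front edge from the adjacent pair meeting at the
    smallest angle; front_edges is a list of (i, j) vertex-id tuples,
    coords maps ids to coordinate arrays, viable is a set of edges."""
    best, best_angle = None, np.inf
    for a in front_edges:
        for b in front_edges:
            if a >= b:
                continue
            shared = set(a) & set(b)
            if len(shared) != 1 or not ({a, b} & viable):
                continue                                  # not adjacent, or neither is viable
            (s,) = shared
            u = coords[a[0] if a[1] == s else a[1]] - coords[s]
            w = coords[b[0] if b[1] == s else b[1]] - coords[s]
            cos_ang = u @ w / (np.linalg.norm(u) * np.linalg.norm(w))
            angle = np.arccos(np.clip(cos_ang, -1.0, 1.0))
            if angle < best_angle:
                best_angle, best = angle, (a if a in viable else b)
    return best
```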

2.3. Advancing front stage outline for data that can be fit with a complex of $d$-simplices

In this section, we give an overview of the general case where the data points are in $\mathbb{R}^n$, but are essentially locally $d$-dimensional. This means that given any data point, the data points that fall in a sufficiently small neighborhood of that data point lie near a $d$-dimensional affine subspace. In this setting, the output of SNPCA is a simplicial $d$-complex whose simplices have at most $d + 1$ vertices. If the user does not know the essential dimension of the data beforehand, it can be estimated at a data point by counting the number of dominant eigenvalues of the empirical local direction covariance matrix computed at that data point.
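A sketch of this dimension estimate is below; the spectral gap threshold used to decide which eigenvalues count as dominant is a heuristic of our own, not a prescription of the algorithm.

```python
import numpy as np

def estimate_local_dimension(p, data, radius, gap=10.0):
    """Count the dominant eigenvalues of the local direction covariance at p.

    An eigenvalue counts as dominant when it exceeds the largest eigenvalue
    divided by the assumed spectral gap factor `gap`.
    """
    diffs = data - p
    dists = np.linalg.norm(diffs, axis=1)
    mask = (dists > 0) & (dists <= radius)
    if not mask.any():
        raise ValueError("no data points in the neighborhood of p")
    dirs = diffs[mask] / dists[mask, None]        # normalized directions
    C = dirs.T @ dirs / dirs.shape[0]             # local direction covariance
    evals = np.linalg.eigvalsh(C)
    return int(np.sum(evals > evals.max() / gap))
```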

Analogous to a front edge, a front face is a $(d-1)$-simplex in the complex whose vertices are a subset of exactly one $d$-simplex in the complex. The fundamental task in the advancing front stage is to generate a new vertex from a front face, so that the new vertex and front face form a new $d$-simplex that fits a subset of the data; this is a generalization of generating a new triangle from a front edge in §2.1.1. Let $v_1, \ldots, v_d$ denote the active front face vertices, which belong to the $d$-simplex with vertices $v_1, \ldots, v_d, w$, and define the local direction covariance matrix $C(v_i)$ and metric matrix $M(v_i)$ as in (1) and (2). The coordinates for the new vertex are generated by first solving the following generalization of the constrained minimization problem (4)

(6a)
(6b)

where the constraint sphere is centered at the centroid of the active front face with a prescribed radius. The second constraint forces the minimizer to be equidistant from all front face vertices, where the distance to each vertex $v_i$ is measured with the metric induced by $M(v_i)$. The purpose of the third constraint in (6b) is to force the minimizer to fall on the side of the active face opposite the interior vertex $w$, which is the intuitively correct region in which to place the vertex. The normal direction in this constraint is the unit normal to the $(d-1)$-simplex that is the active front face, lies in the space spanned by the simplex that owns the front face, and points away from $w$.

The other main component of the advancing front stage is a means to determine whether a candidate $d$-simplex conflicts with an existing $d$-simplex in the complex, which can be naturally generalized from the definition of conflicting triangles in §2.1.2. Let $A$ and $B$ denote two $d$-simplices, where, as in the case of triangles, we may assume that $A$ has one vertex at the origin. Let $A'$ and $B'$ denote the $d$-simplices obtained by orthogonally projecting the vertices of $A$ and $B$ into the subspace containing the vertices of $A$. The simplices $A$ and $B$ are said to overlap if the intersection of the open interiors of $A'$ and $B'$ is nonempty, and to conflict if they overlap and the distance from $B$ to $B'$ is within some tolerance.

If the candidate $d$-simplex formed by the active front face and the new vertex does not conflict with an existing simplex in the complex, then the algorithm accepts the new simplex. Just as in the case where the data could be triangulated, the algorithm pushes all new front faces belonging to the new $d$-simplex onto a face stack, and at the top of each iteration of the advancing front stage main loop, faces are popped off the stack until the algorithm finds a front face.

3. Discussion

3.1. Interpreting the local direction covariance matrix

Although the SNPCA algorithm operates on discrete data, we can gain intuition about its behavior by investigating some of its components in a continuous setting. To this end, suppose we have a smooth manifold that can be parameterized locally by two variables $u$ and $v$. This is the limiting case where the data is distributed uniformly over the manifold with respect to surface area, and the number of data points goes to infinity. We define the continuous version of the local direction covariance matrix as

(7)   $\displaystyle \overline{C}(p) \;=\; \frac{\displaystyle\int_{\Omega} \frac{\bigl(q(u,v) - p\bigr)\bigl(q(u,v) - p\bigr)^{T}}{\|q(u,v) - p\|_{2}^{2}}\, dA}{\displaystyle\int_{\Omega} dA}$

where $p$ is a point on the manifold, $q(u, v)$ denotes the point on the manifold with parameter values $u$ and $v$, and $\Omega$ denotes the region in $u$-$v$ parameter space that maps to the region on the manifold lying inside the search sphere of radius $r$ centered at the point $p$. The area element is given by the formula $dA = \|q_u \times q_v\|_2 \, du \, dv$, where $u$ and $v$ are the manifold parameters and $\|\cdot\|_2$ denotes the standard Euclidean norm. The integral in the denominator is the surface area of the manifold inside the search sphere, which serves as a normalization constant.
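For intuition, the sketch below estimates this continuous matrix by Monte Carlo integration for the unit sphere, assuming the integrand is the outer product of normalized directions as in (1); the sample count and search radius are arbitrary choices of ours.

```python
import numpy as np

def sphere_direction_covariance(p, radius, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the continuous direction covariance (7) on
    the unit sphere at a surface point p, with a search sphere of the given
    radius.  Sampling uniformly with respect to surface area makes the
    sample average converge to the normalized surface integral."""
    rng = np.random.default_rng(seed)
    q = rng.normal(size=(n_samples, 3))               # area-uniform points on the unit sphere
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    d = q - p
    r = np.linalg.norm(d, axis=1)
    mask = (r > 0) & (r <= radius)
    dirs = d[mask] / r[mask, None]
    return dirs.T @ dirs / mask.sum()

# Example: at the north pole the two dominant eigenvalues correspond to the
# tangent directions and the weak eigenvalue to the surface normal.
# C = sphere_direction_covariance(np.array([0.0, 0.0, 1.0]), radius=0.3)
# print(np.linalg.eigvalsh(C))
```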

To elucidate the relationship between the continuous version of the local direction covariance matrix and the curvature of the underlying manifold, we will focus on the case where the underlying manifold is a quadratic surface. Given a quadratic surface in $\mathbb{R}^3$ and a point on the surface, there is always a translation of the surface and an orthonormal change of variables corresponding to a rotation so that we may assume, without loss of generality, that the point under consideration is the origin and that the normal to the surface is aligned with the third coordinate axis. Such a quadratic surface has the form

(8)

where

An expansion in the search sphere radius of the local direction covariance matrix (7) of the quadratic surface (8) at the origin is

(9)

where

For the remainder of the discussion, we work with the local direction covariance matrix to leading order in (9). When the search radius is zero, this leading order matrix is diagonal, and it has two distinct eigenvalues with algebraic multiplicities two and one. For a positive search radius, the eigenvalues are perturbations of these values that depend on the curvature of the quadratic surface and the size of the search radius $r$. If a dominant eigenvalue is expanded in powers of the search radius, its leading order behavior can be determined from the characteristic polynomial of the leading order matrix, and the same argument gives the leading order behavior of the weak eigenvalue; both approximations satisfy the characteristic polynomial computed from the leading order approximation (9) with a small residual. The signed curvature at the origin (see footnote 1) of the curve formed by the intersection of the quadratic surface (8) with the plane perpendicular to the tangent space and containing a given tangent vector determines the principal curvatures, and the two directions in the tangent space associated with the maximum and minimum curvatures are aligned with the first two coordinate axes. In terms of the mean, maximum, and minimum curvatures at the origin and the search radius, the eigenvalues of the leading order matrix are

(10)

When the search radius is zero, the eigenvectors associated with the dominant eigendirections of the leading order matrix are the first two columns of the identity matrix, and they form a basis for the tangent space through the origin of the quadratic surface. When the search radius is positive, the eigenvectors in general no longer span the tangent space, but they span a plane that nearly coincides with the tangent space. To calculate an approximate eigenvector associated with an eigenvalue in (10), we may safely assume that the eigenvector has been scaled so that one fixed entry equals one, and that the other entries have Taylor expansions in the search radius $r$. The coefficients in the Taylor expansions can be calculated by the standard technique of zeroing the coefficients of the leading order terms in $r$ of the eigenvalue equation.

The approximate eigenvectors are

(11)

where each approximate eigenvector is, to leading order, a column of the identity matrix. These formulas indicate that the two approximate eigenvectors are nearly aligned with the directions in which the curvature of the quadratic surface is maximized and minimized, after each has been normalized in the two-norm.

3.2. Motivation for the vertex placement algorithm

We can adapt the constrained minimization problem (4) that underlies the new vertex placement algorithm of §2.1.1 to a continuous setting by treating the underlying manifold as the data. In this setting, the vertices $v_1$ and $v_2$ are points on the manifold, and the metric matrices are computed from the continuous version of the local direction covariance matrix (7). The main requirement of the new vertex placement algorithm is that the vertex must not be too far from the underlying surface; otherwise, the resulting triangle cannot possibly fit the data in a meaningful way. Here, too far means a large distance as measured by either of the two metrics induced by $M(v_1)$ and $M(v_2)$. These metrics are built from curvature information local to the active edge vertices, which can be estimated in the discrete setting where we have only points on the underlying surface, but not the surface itself. With this in mind, the vertex placement algorithm may be interpreted as simultaneously minimizing the distances from the new vertex to the underlying surface as measured by the induced metrics associated with each active edge vertex. There appears to be no reason to favor one metric over the other, and the second constraint in (4b) ensures both are given equal weight in the simultaneous minimization.

If one of the front edge vertices is the origin, then the objective function (4a) is a quadratic form in the metric matrix associated with that vertex, where the matrix is built from the local direction covariance matrix and a user supplied small parameter $\epsilon$. In the context of the minimization problem that underlies the vertex placement algorithm, it is helpful to think of this quadratic form as a penalty function that penalizes points based on their distance to the underlying surface. Under this interpretation, a point that is not near the underlying surface is penalized in the sense that the objective value is large. For a general point in space, the size of the penalty is determined by the eigenstructure of the metric matrix, which is intimately related to the curvature and tangent space of the underlying surface. In the case of the quadratic surface (8), the exact tangent space through the origin is the plane spanned by the first two coordinate axes, and we define the approximate tangent space to be the subspace spanned by the two approximate eigenvectors in (11). The covariance matrix and the metric matrix share the same eigenvectors, so two of the metric matrix's eigenvectors span the approximate tangent space, and its third eigenvector is orthogonal to the approximate tangent space. The largest eigenvalue of the metric matrix is much larger than the other two eigenvalues, assuming the user has not chosen an overly large value of $\epsilon$. Consequently, if a point has a non-negligible component in the direction perpendicular to the approximate tangent space, then the objective function severely penalizes that point in the sense that the objective value is dominated by the contribution from this component. The objective function should exhibit this behavior, since points that are not near the approximate tangent space cannot be near the underlying surface. The remaining two eigendirections of the metric matrix lie in the approximate tangent space, and these eigendirections are nearly aligned with the directions of maximum and minimum curvature of the quadratic surface, as described in §3.1. The eigenvalues of the metric matrix in these directions are

(12)

There is enough freedom in the derivation of the quadratic surface equation (8) that we may assume, for the sake of concreteness, that the first of these two eigendirections is associated with the curvature of greatest magnitude. Thus, for two points with equal Euclidean norm that both lie in the approximate tangent space, the objective function penalizes the point with the greater component in this eigendirection more than it penalizes the other point. This eigendirection nearly coincides with the direction of greatest curvature in magnitude, and it is in this direction that the quadratic surface curves away from the exact tangent space most rapidly. The approximate and exact tangent spaces are nearly aligned, so points with a larger component in the direction of maximum magnitude curvature will be farther from the underlying quadratic surface than points with the same Euclidean norm but a smaller component in that direction. The curvature of the surface is encoded in the metric matrix through its eigenvalues (12), which allows the objective function to penalize points in the approximate tangent space that are far from the underlying surface by using the local curvature information as a proxy.

3.3. Conditions under which the minimization problem has a solution

Although the constrained optimization problem (4) is not guaranteed to have a solution, we can derive mild conditions under which it does, so that the user can be confident that the advancing front stage of the algorithm is robust. We focus on the case where we are triangulating the data. We omit the third constraint in (4b), whose purpose is to make the minimizer unique, so that the simplified constraint set is the intersection of the constraint sphere and the surface that describes the isosceles constraint. Without loss of generality, we may assume that the midpoint of the two active edge vertices falls on the origin, so that $v_2 = -v_1$. First, consider the case where the two metric matrices are equal; then the isosceles constraint reduces to a linear equation that describes a plane through the origin. The constraint set in this case is the intersection of the plane with a sphere centered at the origin. Therefore, the constraint set is a non-empty compact set, which guarantees that a minimizer of the constrained optimization problem exists.

Now consider the more realistic case where the two metric matrices are not equal, but the tangent spaces through $v_1$ and $v_2$ are nearly aligned in the sense that their canonical angles (see footnote 2) are nearly zero, and the eigenvalues of $M(v_1)$ and $M(v_2)$ are nearly equal after they have been sorted by magnitude. The isosceles constraint reduces to

(13)

The first term in equation (13) describes the plane that arises from the isosceles constraint when the two metric matrices are equal, and the remaining terms can be interpreted as perturbations that introduce curvature into the isosceles constraint surface. By continuity, if the perturbation is sufficiently small in norm, then the surface that satisfies the isosceles constraint nearly coincides with this plane. It follows that if the perturbation is sufficiently small, then the constraint set described by the intersection of the constraint sphere with the isosceles constraint surface is a non-empty compact set, and therefore a minimizer of the constrained minimization problem exists. We now show that the magnitude of the perturbation is controlled by the alignment of the two tangent spaces and by the difference in the eigenvalues of $M(v_1)$ and $M(v_2)$. If the data points reside on a smooth manifold, $v_1$ and $v_2$ lie near each other on the manifold, and there are a sufficient number of data points in the neighborhoods of $v_1$ and $v_2$, then the alignment of the two tangent spaces and the difference in the eigenvalues are governed by the curvature of the underlying manifold. Furthermore, as $v_1$ and $v_2$ approach each other and the density of data in their respective neighborhoods increases, the tangent spaces approach perfect alignment and the differences between eigenvalues approach zero. The symmetric matrices $M(v_1)$ and $M(v_2)$ can be diagonalized by orthonormal transformations, and the difference between the corresponding orthonormal factors may be taken, without loss of generality, to be on the order of a canonical angle between the two tangent spaces, or between the orthogonal complements of the tangent spaces. Thus, this difference approaches zero as the two tangent spaces approach perfect alignment. By a standard argument,

which implies that the perturbation in the isosceles constraint is bounded in terms of the misalignment of the two tangent spaces and the differences between the eigenvalues of $M(v_1)$ and $M(v_2)$.

4. Results

We tested the SNPCA algorithm by running it on data sets that fall near the surface of a sphere, torus, swiss roll, and creased sheet. All of these manifolds can be parameterized by two variables, and we embedded the data in a fifty dimensional space as follows. To generate points in $\mathbb{R}^3$ that are distributed uniformly over the manifold with respect to surface area, we sampled parameter values from the distribution whose probability density function is the manifold area element normalized by the manifold surface area. For example, the sphere of radius $R$ has surface area $4\pi R^2$ and is parameterized by the azimuthal angle $\theta \in [0, 2\pi)$ and the polar angle $\phi \in [0, \pi]$. The area element is $R^2 \sin\phi \, d\theta \, d\phi$, so the distribution from which we sample values of $\theta$ and $\phi$ for the sphere has density $\sin\phi / (4\pi)$. We then generated points in $\mathbb{R}^3$ on the manifold from these random parameter values. Finally, we randomly generated three orthogonal unit vectors in $\mathbb{R}^{50}$, and then for each point on the manifold, we generated a data point in $\mathbb{R}^{50}$ by forming the linear combination of the three orthonormal vectors with the manifold point coordinates as weights. The vertex coordinates of the final triangulation are in $\mathbb{R}^{50}$, so to produce the illustrations of the final triangulation, we undid the orthogonal transformation so that the vertex coordinates were in $\mathbb{R}^3$.
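The sketch below reproduces this construction for the sphere; the radius, noise scale, and ambient dimension are parameters rather than the values used for the figures, and sampling the cosine of the polar angle uniformly realizes the area-uniform distribution described above.

```python
import numpy as np

def sphere_dataset(n_points, R=1.0, ambient_dim=50, sigma=0.0, seed=0):
    """Sample points uniformly (with respect to surface area) from a sphere
    of radius R in R^3, optionally add Gaussian noise of scale sigma, and
    embed the points in R^ambient_dim with a random orthonormal 3-frame."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, n_points)      # azimuthal angle
    cos_phi = rng.uniform(-1.0, 1.0, n_points)           # area-uniform polar angle
    sin_phi = np.sqrt(1.0 - cos_phi ** 2)
    pts3 = R * np.column_stack([sin_phi * np.cos(theta),
                                sin_phi * np.sin(theta),
                                cos_phi])
    pts3 += sigma * rng.normal(size=pts3.shape)          # optional surface error
    # Three random orthonormal vectors in the ambient space.
    Q, _ = np.linalg.qr(rng.normal(size=(ambient_dim, 3)))
    data = pts3 @ Q.T                                    # linear combinations with the
    return data, Q                                       # manifold coordinates as weights
```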

We chose these data sets based on attributes of the manifold that underlies the data, namely the curvature and smoothness of the manifold and the presence of a boundary. The creased sheet is smooth with no curvature in every neighborhood away from the crease, and it has a boundary. The sphere is smooth, has uniform curvature, and has no boundary. The swiss roll is smooth, has varying curvature, and has a boundary. The torus is smooth, has varying curvature, and has no boundary.

For each data set, we have included illustrations of the points in the data set, the output of the advancing front stage, and the final triangulation produced by the seam sewing stage. The error tables list the maximum error, the average error, and the RMS error associated with each triangulation generated by SNPCA. The error associated with a data point is the shortest distance from that data point to a point on the triangulation.
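This error can be computed with the standard closest-point-on-triangle construction, which uses only dot products and therefore works in any ambient dimension; the sketch below is one such implementation.

```python
import numpy as np

def point_triangle_distance(p, a, b, c):
    """Euclidean distance from point p to the filled triangle abc in R^n."""
    ab, ac, ap = b - a, c - a, p - a
    d1, d2 = ab @ ap, ac @ ap
    if d1 <= 0 and d2 <= 0:
        return np.linalg.norm(p - a)                     # closest to vertex a
    bp = p - b
    d3, d4 = ab @ bp, ac @ bp
    if d3 >= 0 and d4 <= d3:
        return np.linalg.norm(p - b)                     # closest to vertex b
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        t = d1 / (d1 - d3)
        return np.linalg.norm(p - (a + t * ab))          # closest to edge ab
    cp = p - c
    d5, d6 = ab @ cp, ac @ cp
    if d6 >= 0 and d5 <= d6:
        return np.linalg.norm(p - c)                     # closest to vertex c
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        t = d2 / (d2 - d6)
        return np.linalg.norm(p - (a + t * ac))          # closest to edge ac
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        t = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return np.linalg.norm(p - (b + t * (c - b)))     # closest to edge bc
    denom = va + vb + vc
    v, w = vb / denom, vc / denom
    return np.linalg.norm(p - (a + v * ab + w * ac))     # closest point is interior

def triangulation_errors(data, vertices, triangles):
    """Shortest distance from each data point to the triangulation;
    `triangles` is an iterable of vertex-index triples."""
    return np.array([min(point_triangle_distance(p, *vertices[list(t)])
                         for t in triangles) for p in data])
```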

(a) Data
(b) SNPCA seams
(c) SNPCA sewn
Figure 3. A sphere embedded in $\mathbb{R}^{50}$.
(a) Data
(b) SNPCA seams
(c) SNPCA sewn
Figure 4. A torus embedded in $\mathbb{R}^{50}$.
(a) Data
(b) SNPCA seams
(c) SNPCA sewn
Figure 5. The surface underlying this data set is a torus, and we added error to the surface data points before generating the set of data vectors. We generated the error vectors in $\mathbb{R}^3$ by drawing each coordinate from the standard normal distribution, and then scaling the error vector by a small factor. We then generated the set of data vectors by our standard orthogonal transformation. The final triangulation is not watertight, meaning it has front edges.
(a) Data
(b) SNPCA seams
(c) SNPCA sewn
Figure 6. We parameterized the swiss roll by a spiral in the plane extended along a perpendicular axis, with a curvature parameter that we set to a relatively large value. The notches on the boundary of the final triangulation account for the large value of this parameter.
(a) Data
(b) SNPCA seams
(c) SNPCA sewn
Figure 7. The creased sheet data fall on a surface that resembles a creased sheet of paper folded at a fixed crease angle.

Footnotes

  1. The signed curvature at a point of a plane curve given as the graph of a function $f$ is $f''(x) \,/\, \bigl(1 + f'(x)^2\bigr)^{3/2}$.
  2. Let $Q_1$ and $Q_2$ denote orthonormal matrices whose columns span two subspaces of equal dimension. The cosines of the canonical subspace angles are the singular values of $Q_1^T Q_2$ [5, p. 73].

References

  1. Coryn A. L. Bailer-Jones, Mike Irwin, and Ted von Hippel. Automated classification of stellar spectra — II. Two-dimensional classification with neural networks and principal components analysis. Monthly Notices of the Royal Astronomical Society, 298(2):361–377, 1998.
  2. Hiroshi Murase and Shree K. Nayar. Visual learning and recognition of 3-D objects from appearance. International Journal of Computer Vision, 14(1):5–24, 1995.
  3. Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
  4. Lawrence K. Saul and Sam T. Roweis. Think globally, fit locally: unsupervised learning of low dimensional manifolds. J. Mach. Learn. Res., 4:119–155, December 2003.
  5. G. W. Stewart. Matrix Algorithms: Volume I, Basic Decompositions. Society for Industrial and Applied Mathematics, 1998.
  6. Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
  7. Zhenyue Zhang and Hongyuan Zha. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing, 26(1):313–338, 2004.