Sparse Circular Coordinates

Sparse Circular Coordinates via Principal -Bundles

Jose A. Perea Department of Computational Mathematics, Science & Engineering Department of Mathematics, Michigan State University East Lansing, MI, USA. joperea@msu.edu
Abstract.

We present in this paper an application of the theory of principal bundles to the problem of nonlinear dimensionality reduction in data analysis. More explicitly, we derive, from a 1-dimensional persistent cohomology computation, explicit formulas for circle-valued functions on data with nontrivial underlying topology. We show that the language of principal bundles leads to coordinates defined on an open neighborhood of the data, but computed using only a smaller subset of landmarks. It is in this sense that the coordinates are sparse. Several data examples are presented, as well as theoretical results underlying the construction.

Key words and phrases:
Circular Coordinates, Persistent cohomology, Principal Bundles, Classifying Map
2010 Mathematics Subject Classification:
Primary 55R99, 55N99, 68W05; Secondary 55U99
This work was partially supported by the NSF under grant DMS-1622301 and DARPA under grant HR0011-16-2-003

1. Introduction

The curse of dimensionality refers to a host of phenomena inherent to the increase in the number of features describing the elements of a data sets. For instance, in statistical learning, the number of training data points needs to grown roughly exponentially in the number of features, in order for learning algorithms to generalize correctly in the absence of other priors. A deeper manifestation of the curse of dimensionality is the deterioration of the concept of “nearest neighbors” in high-dimensional Euclidean space; for as the dimension increases, the distance between any two points is roughly the same [18]. One of the most popular priors in data science is the “low intrinsic dimensionality” hypothesis. It contends that while the apparent number of features describing each data point (e.g., the number of pixels in an image) might be large, the effective number of degrees of freedom (i.e., the intrinsic dimensionality) is often much lower. Indeed, images generated at random will hardly depict a cat or a natural scene.

Many dimensionality reduction schemes have been proposed in the literature to leverage the “low intrinsic dimensionality” hypothesis, each making explicit or implicit use of likely characteristics of the data. For instance, Principal Component Analysis [10] and other linear dimensionality reduction methods, rely on the existence of a low-dimensional linear representation accounting for most of the variability in the data. Methods such as Locally Linear Embeddings [19] and Laplacian EingenMaps [2], on the other hand, presuppose the existence of a manifold-like object parametrizing the underlying data space. Other algorithms, like Multidimensional Scaling [11] and Isomap [22], attempt to preserve distances between data points while providing low-dimensional reconstructions.

Recently, several new methods for nonlinear dimensionality reduction have emerged from the field of computational topology [6, 21, 17]. The idea being that if the underlying space from which the data has been sampled has a particular shape, then this information can be used to generate appropriate low-dimensional representations. The circular coordinates of de Silva, Morozov, and Vejdemo-Johansson [6] pioneered the use of persistent cohomology as a way to measure the shape of a data set, and then produce circle-valued coordinates reflecting the underlying nontrivial topology. Their algorithm goes as follows. Given a finite metric space — the data — and a scale so that the Rips complex

has a nontrivial integer cohomology class — this is determined from the persistent cohomology of the Rips filtration — a linear least squares optimization (of size the number of vertices by the number of edges of ) is solved, in order to construct a function which, roughly, puts one of the generators of in correspondence with .

1.1. Our Contribution

Two drawbacks of the perspective presented in [6] are: (1) the method requires a persistent cohomology calculation, as well as a least squares optimization, on the Rips filtration of the entire data set . This is computationally expensive and may limit applicability to small-to-medium-sized data. (2) once the function has been computed, it is only defined on the data points from used for its construction. Here we show that these drawbacks can be addressed effectively with ideas from principal -bundles. In particular, we show that it is possible to construct circular coordinates on from the Rips filtration on a subset of landmarks , Proposition 4.3, with similar classifying properties as in [6], Theorem 3.2, and that said coordinates will be defined on an open neighborhood of containing . We call these functions “sparse circular coordinates”.

1.2. The Sparse Circular Coordinates Algorithm

Let us describe next the steps needed to construct said coordinates. The rest of the paper is devoted to the theory behind these choices:

  1. Let be the input data set; i.e. a finite metric space. Select a set of landmarks , e.g. at random or via maxmin sampling, and let

    be the radius of coverage. In particular, is the Hausdorff distance between and .

  2. Choose a prime at random and compute the 1-dimensional persistent cohomology with coefficients in , for the Rips filtration on the landmark set . Let be the resulting persistence diagram.

  3. If there exists so that , then let

    Let be a cocyle representative for the persistent cohomology class corresponding to . If is closer to 1, then the circular coordinates are defined on a larger domain; however, this makes step (5) below more computationally intensive.

  4. Lift to an integer cocycle . That is, one for which is a coboundary in . An explicit choice (that works in practice for a prime chosen at random) is the integer cochain:

  5. Choose positive weights for the vertices and edges of — e.g. all equal to one — and let be the (weighted) Moore-Penrose pseudoinverse (solving weighted linear least squares problems) for the coboundary map

    If is the inclusion, let

  6. Denote by the value of on the vertex , and by the value of on the oriented edge . If we let

    and denotes the open ball of radius centered at , then the sparse circular coordinates are defined by the formula:

    (1)

    If is a subspace of an ambient metric space , then the ’s can be taken to be ambient metric balls. This is why we call the circular coordinates sparse; is computed using only , but its domain of definition is an open subset of which, by construction, contains all of .

1.3. Organization

We start in Section 2 with a few preliminaries on principal bundles, highlighting the main theorems needed in later parts of the paper. We assume familiarity with persistent cohomology (if not, see [16]), as well as the definition of Čech cohomology with coefficients in a presheaf (see for instance [14]). Section 3 is devoted to deriving the formulas — e.g. (1) above — which turn a 1-dimensional integer cohomology class into a circle-valued function. In Section 4 we describe how to make all this theory applicable to real data sets. We present several experiments in Section 5 with both real and synthetic data, and end in Section 6 with a few final remarks.

2. Preliminaries

2.1. Principal Bundles

We present here a terse introduction to principal bundles, with the main results we will need later in the paper. In particular, the connection between principal bundles and Čech chomology, which allows for explicit computations, and their classification theory via homotopy classes of maps to classifying spaces. The latter description will be used to generate our sparse circular coordinates. We refer the interested reader to [9] for a more thorough presentation.

Let be a connected and paracompact111 So that partitions of unity always exist topological space with basepoint .

Definition 2.1.

A pair , with a topological space and a continuous map, is said to be a fiber bundle over with fiber , if:

  1. is surjective

  2. Every point has an open neighborhood and a homeomorphism , called a local trivialization around , so that for every .

The spaces and are called, respectively, the total and base space of the bundle, and is called the projection map.

Definition 2.2.

Let be an abelian topological group whose operation we write additively. A fiber bundle is said to be a principal -bundle if:

  1. The total space comes equipped with a fiberwise free right -action. That is, a continuous map

    satisfying the right-action axioms, with for every pair , and for which only if is the identity of .

  2. The induced fiberwise -action is transitive for every in the base space.

  3. The local trivializations can be chosen to be -equivariant: that is, so that , for every .

Two principal -bundles , , are said to be isomorphic, if there exists a -equivariant homeomorphism so that . This defines an equivalence relation on principal -bundles over , and the set of isomorphism classes is denoted .

Given a principal -bundle and a system of (-equivariant) local trivializations , we have that

(2)

is a -equivariant homeomorphism whenever . Since the -action on is fiberwise free and fiberwise transitive, then induces a well-defined continuous map

defined by the equation

(3)

The ’s are called the transition functions for the -bundle corresponding to the system of local trivializations . In fact, these transition functions define an element in the Čech cohomology of . Indeed, for each open set let denote the set of continuous maps from to . Since is an abelian group, then so is , and if is another open set, then precomposing with the inclusion yields a restriction map

This defines a sheaf of abelian groups over , with , called the sheaf of -valued continuous functions on . It follows that the transition functions (2) define an element in the Čech 1-cochains of the cover with coefficients in the sheaf . Moreover,

Proposition 2.3.

The transition functions satisfy the cocycle condition

(4)

In other words, is a Čech cocycle.

If is another system of local trivializations with induced Čech cocycle , and

then one can check that the difference is a coboundary in . Since is a refinement for both and , it follows that the -bundle yields a well-defined element . Moreover, after passing to isomorphism classes of -bundles we get that

Lemma 2.4.

The function

is well-defined and injective.

This is in fact a bijection. To check surjectivity, fix an open cover for , and a Čech cocyle

Then one can construct a -principal bundle over with total space

(5)

and projection

taking the class of in the quotient , to the point . Notice that if sends to the class of in , then defines a system of local trivializations for , and that is the associated system of transition functions. Therefore,

Theorem 2.5.

The function

is a natural bijection.

In addition to this characterization of principal -bundles over as Čech cohomology classes, there is another interpretation in terms of classifying maps. We will combine these two views in order to produce coordinates for data in the next sections.

Indeed, to each topological group one can associate a space that is both weakly contractible, i.e. all its homotopy groups are trivial, and which comes equipped with a free right -action

The quotient is a topological space (endowed with the quotient topology), called the classifying space of , and the quotient map

defines a principal -bundle over , called the universal bundle. It is important to note that there are several constructions of , and thus of , but they all have the same homotopy type. One model for is the Milnor construction [13]

(6)

with acting diagonally by right multiplication on each term of the infinite join.

The next Theorem explains the universality of . Given a continuous map , the pullback is the principal -bundle over with total space , and projection map . Moreover,

Theorem 2.6.

Let denote the set of homotopy class of maps from to the classifying space . Then, the function

is a bijection.

Proof.

See [9], Chapter 4: Theorems 12.2 and 12.4. ∎

Theorem 2.6 implies that given a principal -bundle , there exists a continuous map so that is isomorphic to , and that the choice of is unique up to homotopy. Any such choice is called a classifying map for .

3. From Integer Simplicial Cohomology to Circular Coordinates

For an arbitrary topological group , the Milnor construction (6) produces an explicit universal -bundle , but the spaces and tend to be rather large. Indeed, they are often infinite-dimensional CW-complexes. For the case we have the more economical models and , with acting on by right translation: , and projection

Since is discrete, then -valued continuous functions on are in fact locally constant, and hence is exactly the sheaf of locally constant functions with values in , denoted . Combining the definition of the Čech cohomology group with Theorems 2.5 and 2.6, yields a bijection

(7)

where the limit is taken over all locally finite covers of , ordered by refinement, and the groups are the 1-dimensional simplicial cohomology with coefficients of the associated nerve complexes . The goal now is to produce an explicit family of compatible functions realizing the isomorphism from (7). This is done in Theorem 3.2, and an explicit formula is given by (11).

To begin, let be a partition of unity on dominated222 That is, so that for all . by , fix a 1-cocycle , and define for each the map

(8)

Since is locally finite, then all but finitely many terms in this sum are zero. Note that acts on by right translation , and that is equivariant with respect to this action: .

If , then we have that

and hence the ’s can be assembled to induce a continuous map on the quotient space defined by (5); here is regarded as a collection of constant functions . To be more explicit, sends the class of in to . Since each is -equivariant, then so is , and hence it descends to a well defined map at the level of base spaces

(9)
Lemma 3.1.

The map classifies the principal -bundle .

Proof.

Let us see explicitly that the map is well defined; in other words, that the value is independent of the open set containing . Indeed, let be so that . We contend that for every . If , then the equality is trivial since ; if , then and since is a cocycle. Therefore

and given that , then .

Finally, let us check that taking the pullback of the universal -bundle yields a principal -bundle isomorphic to . Indeed, since , then is a morphism of principal -bundles, and the result follows from [9, Chapter 4: Theorem 4.2]. ∎

Theorem 3.2.

Let be the inclusion and

(10)

the induced homomorphism. Given and , let . Denote by the value of on the vertex , and by the value of on the oriented edge ; in particular , and whenever . If

(11)

then is a classifying map for the principal -bundle .

Proof.

Since is a classifying map for , by Lemma 3.1, then it is enough to check that and are homotopic (see Theorem 2.6). For we have that

where . Since factors through :

then is null-homotopic, hence is homotopic to , and the result follows. ∎

Remark 3.3.

We note that the relation from Theorem 3.2 implies that the cochain encodes the degrees of freedom in choosing a cocycle representative for the class , and thus defining the classifying map . This choice will be addressed in the discussion about Harmonic Smoothing in Section 4.

4. Persistent Cohomology and Sparse Circular Coordinates for Data

In this section we show how the theory we have developed thus far can be applied to real data sets. In particular, we explain and justify the choices made in the construction outlined in the Introduction (1.2). Let us begin by fixing an ambient metric space , let be finite, and let

The formulas derived in the previous section, specially (9), imply that each cocycle yields a map . The thing to notice is that is defined on every ; thus, given a large but finite set — the data — sampled around a continuous space , one can select a much smaller set of landmarks and for which . The resulting circular coordinates will thus be defined on all points of , though only the landmark set is used in its construction. As we alluded to in the introduction, this is what we mean when we say that the coordinates are sparse.

Landmark Selection

In practice we select the landmarks either at random, or through maxmin sampling: Given and chosen arbitrarily, assume that have been selected, , and let

(12)

Following this inductive procedure defines a landmark set that is in practice well-separated and well-distributed throughout the data. However, it is important to keep in mind that this process is prone to choosing outliers.

The Subordinated Partition of Unity

As for the choice of partition of unity dominated by , we can use that the cover is via metric balls, and let

(13)

See [17, 3.3 and Fig 6.] for other typical choices of partition of unity in the case of metric spaces, and coverings via metric balls.

The Need for Persistence

Even if the landmark set correctly approximates the underlying topology of , the choice of scale and cocycle might reflect sampling artifacts instead of robust geometric features of the underlying space . This is why we need persistent cohomology. Indeed, a class which is not in the kernel of the homomorphism

induced by the inclusion , is less likely to correspond to spurious features as the difference increases. Note, however, that the efficient computation of persistent cohomology classes relies on using field coefficients. We proceed, following [6] and [17], by choosing a prime and a scale so that (1) contains a class with large persistence, and (2) so that the homomorphism , induced by the quotient map , is surjective.

Lifting Persistence to Integer Coefficients

As stated in [6], one has that:

Proposition 4.1.

Let be a finite simplicial complex, and suppose that does not divide the order of the torsion subgroup of . Then the homomorphism

induced by the quotient map , is surjective.

Proof.

This follows directly from the Bockstein long exact sequence in cohomology, corresponding to the short exact sequence

More generally, let be a filtered simplicial complex with finite. Since each complex is finite, and the cohomology groups change only at finitely many values of , then there exists so that the hypotheses of Proposition 4.1 will be satisfied for each , and all . In practice we choose a prime at random, with the intuition that for scientific data only a few primes are torsion contributors.

Let and for let be defined on each 1-simplex as:

(14)

Thus, takes values in and it satisfies . For the examples we have observed, the cochain defined by (14) produces an integer cocycle. One of the reviewers of an earlier version of this paper remarked that this is not always the case; the outlined procedure tends to fail (in real world-examples) when the cohomology computation involves division by 2. As highlighted in [6, 2.4], solving a Diophantine linear system can be used to fix the problem.

Use Rips, not Nerves

Constructing the filtered complex can be rather expensive for a general ambient metric space . Indeed, the inclusion of an -simplex into the nerve complex is predicated on checking if the intersection of ambient metric balls is nonempty. This is nontrivial on curved spaces. On the other hand, the Rips complex

provides a straightforward alternative, since we can use that

for every . Here is how. Let be a prime so that

is surjective for all , and let

be the homomorphism induced by the inclusion . Moreover, let be so that , and fix an integer lift

That is, one for which is a coboundary, e.g. (14).

The diagram below summarizes the spaces and homomorphisms used thus far:

Since the diagram commutes, then is not in the kernel of , and hence we obtain a nonzero element . This is the class we would use as input for Theorem 3.2.

Harmonic Smoothing

The final step is selecting an appropriate cocycle representative (refer to Figure 1 to see why this matters)

for the class , see (10). Again, since one would hope to never compute the nerve complex, the strategy is to solve the problem in for , and then transfer the solution using .

Inspecting (11) reveals that the choice of which promotes the smallest total variation in , is the one for which the value of on each 1-simplex of is as small as possible. Consequently, we will look for the cocycle representative

of , which in average has the smallest squared value333 That is, we use the harmonic cocycle representative for appropriate inner products on cochains on each 1-simplex of . That said, not all edges in the rips complex are created equal. Some might have just entered the filtration, i.e. , which would make them unstable if is corrupted with noise, or perhaps is a rather small portion of the data, which could happen if and are outliers selected during maxmin sampling.

These observations can be encoded by choosing weights on vertices and edges:

(15)

where is symmetric for all , it satisfies

and only when . Here is the weight of , and is the weight of the edge . For instance, one can take

but we note that we have not yet systematically investigated the effects of this choice. See [20, Apendix D] for a different heuristic.

It follows that defines inner products on and , by letting the indicator functions on -simplices () be orthogonal, and setting

(16)

Using , for , we let be the orthogonal projection of onto the space of 1-coboundaries, and define

(17)

A bit of linear algebra shows that,

Proposition 4.2.

The 1-cocycle defined by (17) is a minimizer for the weighted least squares problem

(18)

Here the sum runs over all 1-simplices , and the minimization is over all 1-cocycles which are cohomologous to .

Similarly, and if

denotes the coboundary map, then we let

in the orthogonal complement of the kernel of , be so that . Hence is the 0-chain with the smallest norm mapping to via . Consequently, if

is the weighted Moore-Penrose pseudoinverse of (see [3, III.3.4]), then

(19)

This is how we compute and in our implementation. Now, let

If we were to be completely rigourous, then and would be the cochains going into (11); this would require the 1-skeleton of the nerve complex. However, as the following proposition shows, this is unnecessary:

Proposition 4.3.

For all , and every , we have that

That is, we can compute sparse circular coordinates using only the Rips filtration on the landmark set.

Proof.

Since and have the same vertex set, namely , then as real-valued functions on . Moreover, for all we have that

for if , then both sides are zero, and if , then the edge is in both and , which shows that . ∎

5. Experiments

In all experiments below, persistent cohomology is computed using a MATLAB wrapper for Ripser [1] kindly provided by Chris Tralie. The Moore-Penrose pseudoinverse was computed via MATLAB’s pinv. In all cases we run the algorithm from the Introduction 1.2 using the indicated persistence classes, or linear combinations thereof as made explicit in each example.

5.1. Synthetic Data

5.1.1. A Noisy Circle

We select 1,000 points from a noisy circle in ; the noise is Gaussian in the direction normal to the unit circle. 50 landmarks were selected via maxmin sampling (5% of the data), and circular coordinates were computed for the two most persistent classes and , using (19) as input to (11) — this is the harmonic cocycle column — or (9) with either or directly — the integer cocycle column. We show the results in Figure 1 below. Computing persistent cohomology took 0.079423 seconds (the Rips filtration is constructed from zero to the diameter of the landmark set); in each case computing the harmonic cocycle takes about 0.037294 seconds. This example highlights the inadequacy of the integer cocycle and of choosing cohomology classes associated to sampling artifacts (i.e., with low persistence). From now on, we only present circular coordinates computed with the relevant harmonic cocycle representative.

Figure 1. A noisy circle. Left: persistence diagrams in dimension 0 (blue) and 1 (red) for the Rips filtration on the landmarks. Right: Circular coordinates from the two most persistent classes (top row) and (bottom row). The columns indicate if the harmonic or integral cocycle was used. The dark rings are the landmarks. The colors are: the domain of definition for the circular coordinate (gray), and its value on each point (dark blue, , through dark red, ). Please refer to an electronic version for colors.

5.1.2. The 2-Dimensional Torus

For this experiment we sample 1,000 points uniformly at random from the square , and for each selected pair we generate a point on the surface of the torus embedded in . The resulting finite set is endowed with the ambient distance from , and 100 landmarks (i.e., 10% of the data) are selected through maxmin sampling. We show the results in Figure 2 below, for the circular coordinates computed with the two most persistent classes, and , and the maps (11) associated to the harmonic cocycle representatives (19). Computing persistent cohomology for the Rips filtration on the Landmarks (from zero to the diameter of the set) takes 0.398252 seconds, and computing the harmonic cocycles takes 0.030832 seconds.

Figure 2. The torus. Left: Persistence in dimensions 0 and 1 for the Rips filtration on the landmark set. Center and Right: the landmark set is depicted with dark rings, and the colors correspond to the circular coordinates computed with (the harmonic representatives from) the two most persistent classes (center) and (right). Please refer to an electronic version for colors.

5.1.3. The Klein Bottle

We model the Klein bottle as the quotient space

and endow it with the quotient metric. Just like in the case of the 2-torus, we sample 1,000 points uniformly at random on (the fundamental domain