Sparse Circular Coordinates via Principal Bundles
Abstract.
We present in this paper an application of the theory of principal bundles to the problem of nonlinear dimensionality reduction in data analysis. More explicitly, we derive, from a 1dimensional persistent cohomology computation, explicit formulas for circlevalued functions on data with nontrivial underlying topology. We show that the language of principal bundles leads to coordinates defined on an open neighborhood of the data, but computed using only a smaller subset of landmarks. It is in this sense that the coordinates are sparse. Several data examples are presented, as well as theoretical results underlying the construction.
Key words and phrases:
Circular Coordinates, Persistent cohomology, Principal Bundles, Classifying Map2010 Mathematics Subject Classification:
Primary 55R99, 55N99, 68W05; Secondary 55U991. Introduction
The curse of dimensionality refers to a host of phenomena inherent to the increase in the number of features describing the elements of a data sets. For instance, in statistical learning, the number of training data points needs to grown roughly exponentially in the number of features, in order for learning algorithms to generalize correctly in the absence of other priors. A deeper manifestation of the curse of dimensionality is the deterioration of the concept of “nearest neighbors” in highdimensional Euclidean space; for as the dimension increases, the distance between any two points is roughly the same [18]. One of the most popular priors in data science is the “low intrinsic dimensionality” hypothesis. It contends that while the apparent number of features describing each data point (e.g., the number of pixels in an image) might be large, the effective number of degrees of freedom (i.e., the intrinsic dimensionality) is often much lower. Indeed, images generated at random will hardly depict a cat or a natural scene.
Many dimensionality reduction schemes have been proposed in the literature to leverage the “low intrinsic dimensionality” hypothesis, each making explicit or implicit use of likely characteristics of the data. For instance, Principal Component Analysis [10] and other linear dimensionality reduction methods, rely on the existence of a lowdimensional linear representation accounting for most of the variability in the data. Methods such as Locally Linear Embeddings [19] and Laplacian EingenMaps [2], on the other hand, presuppose the existence of a manifoldlike object parametrizing the underlying data space. Other algorithms, like Multidimensional Scaling [11] and Isomap [22], attempt to preserve distances between data points while providing lowdimensional reconstructions.
Recently, several new methods for nonlinear dimensionality reduction have emerged from the field of computational topology [6, 21, 17]. The idea being that if the underlying space from which the data has been sampled has a particular shape, then this information can be used to generate appropriate lowdimensional representations. The circular coordinates of de Silva, Morozov, and VejdemoJohansson [6] pioneered the use of persistent cohomology as a way to measure the shape of a data set, and then produce circlevalued coordinates reflecting the underlying nontrivial topology. Their algorithm goes as follows. Given a finite metric space — the data — and a scale so that the Rips complex
has a nontrivial integer cohomology class — this is determined from the persistent cohomology of the Rips filtration — a linear least squares optimization (of size the number of vertices by the number of edges of ) is solved, in order to construct a function which, roughly, puts one of the generators of in correspondence with .
1.1. Our Contribution
Two drawbacks of the perspective presented in [6] are: (1) the method requires a persistent cohomology calculation, as well as a least squares optimization, on the Rips filtration of the entire data set . This is computationally expensive and may limit applicability to smalltomediumsized data. (2) once the function has been computed, it is only defined on the data points from used for its construction. Here we show that these drawbacks can be addressed effectively with ideas from principal bundles. In particular, we show that it is possible to construct circular coordinates on from the Rips filtration on a subset of landmarks , Proposition 4.3, with similar classifying properties as in [6], Theorem 3.2, and that said coordinates will be defined on an open neighborhood of containing . We call these functions “sparse circular coordinates”.
1.2. The Sparse Circular Coordinates Algorithm
Let us describe next the steps needed to construct said coordinates. The rest of the paper is devoted to the theory behind these choices:

Let be the input data set; i.e. a finite metric space. Select a set of landmarks , e.g. at random or via
maxmin
sampling, and letbe the radius of coverage. In particular, is the Hausdorff distance between and .

Choose a prime at random and compute the 1dimensional persistent cohomology with coefficients in , for the Rips filtration on the landmark set . Let be the resulting persistence diagram.

If there exists so that , then let
Let be a cocyle representative for the persistent cohomology class corresponding to . If is closer to 1, then the circular coordinates are defined on a larger domain; however, this makes step (5) below more computationally intensive.

Lift to an integer cocycle . That is, one for which is a coboundary in . An explicit choice (that works in practice for a prime chosen at random) is the integer cochain:

Choose positive weights for the vertices and edges of — e.g. all equal to one — and let be the (weighted) MoorePenrose pseudoinverse (solving weighted linear least squares problems) for the coboundary map
If is the inclusion, let

Denote by the value of on the vertex , and by the value of on the oriented edge . If we let
and denotes the open ball of radius centered at , then the sparse circular coordinates are defined by the formula:
(1) If is a subspace of an ambient metric space , then the ’s can be taken to be ambient metric balls. This is why we call the circular coordinates sparse; is computed using only , but its domain of definition is an open subset of which, by construction, contains all of .
1.3. Organization
We start in Section 2 with a few preliminaries on principal bundles, highlighting the main theorems needed in later parts of the paper. We assume familiarity with persistent cohomology (if not, see [16]), as well as the definition of Čech cohomology with coefficients in a presheaf (see for instance [14]). Section 3 is devoted to deriving the formulas — e.g. (1) above — which turn a 1dimensional integer cohomology class into a circlevalued function. In Section 4 we describe how to make all this theory applicable to real data sets. We present several experiments in Section 5 with both real and synthetic data, and end in Section 6 with a few final remarks.
2. Preliminaries
2.1. Principal Bundles
We present here a terse introduction to principal bundles, with the main results we will need later in the paper. In particular, the connection between principal bundles and Čech chomology, which allows for explicit computations, and their classification theory via homotopy classes of maps to classifying spaces. The latter description will be used to generate our sparse circular coordinates. We refer the interested reader to [9] for a more thorough presentation.
Let be a connected and paracompact^{1}^{1}1 So that partitions of unity always exist topological space with basepoint .
Definition 2.1.
A pair , with a topological space and a continuous map, is said to be a fiber bundle over with fiber , if:

is surjective

Every point has an open neighborhood and a homeomorphism , called a local trivialization around , so that for every .
The spaces and are called, respectively, the total and base space of the bundle, and is called the projection map.
Definition 2.2.
Let be an abelian topological group whose operation we write additively. A fiber bundle is said to be a principal bundle if:

The total space comes equipped with a fiberwise free right action. That is, a continuous map
satisfying the rightaction axioms, with for every pair , and for which only if is the identity of .

The induced fiberwise action is transitive for every in the base space.

The local trivializations can be chosen to be equivariant: that is, so that , for every .
Two principal bundles , , are said to be isomorphic, if there exists a equivariant homeomorphism so that . This defines an equivalence relation on principal bundles over , and the set of isomorphism classes is denoted .
Given a principal bundle and a system of (equivariant) local trivializations , we have that
(2) 
is a equivariant homeomorphism whenever . Since the action on is fiberwise free and fiberwise transitive, then induces a welldefined continuous map
defined by the equation
(3) 
The ’s are called the transition functions for the bundle corresponding to the system of local trivializations . In fact, these transition functions define an element in the Čech cohomology of . Indeed, for each open set let denote the set of continuous maps from to . Since is an abelian group, then so is , and if is another open set, then precomposing with the inclusion yields a restriction map
This defines a sheaf of abelian groups over , with , called the sheaf of valued continuous functions on . It follows that the transition functions (2) define an element in the Čech 1cochains of the cover with coefficients in the sheaf . Moreover,
Proposition 2.3.
The transition functions satisfy the cocycle condition
(4) 
In other words, is a Čech cocycle.
If is another system of local trivializations with induced Čech cocycle , and
then one can check that the difference is a coboundary in . Since is a refinement for both and , it follows that the bundle yields a welldefined element . Moreover, after passing to isomorphism classes of bundles we get that
Lemma 2.4.
The function
is welldefined and injective.
This is in fact a bijection. To check surjectivity, fix an open cover for , and a Čech cocyle
Then one can construct a principal bundle over with total space
(5) 
and projection
taking the class of in the quotient , to the point . Notice that if sends to the class of in , then defines a system of local trivializations for , and that is the associated system of transition functions. Therefore,
Theorem 2.5.
The function
is a natural bijection.
In addition to this characterization of principal bundles over as Čech cohomology classes, there is another interpretation in terms of classifying maps. We will combine these two views in order to produce coordinates for data in the next sections.
Indeed, to each topological group one can associate a space that is both weakly contractible, i.e. all its homotopy groups are trivial, and which comes equipped with a free right action
The quotient is a topological space (endowed with the quotient topology), called the classifying space of , and the quotient map
defines a principal bundle over , called the universal bundle. It is important to note that there are several constructions of , and thus of , but they all have the same homotopy type. One model for is the Milnor construction [13]
(6) 
with acting diagonally by right multiplication on each term of the infinite join.
The next Theorem explains the universality of . Given a continuous map , the pullback is the principal bundle over with total space , and projection map . Moreover,
Theorem 2.6.
Let denote the set of homotopy class of maps from to the classifying space . Then, the function
is a bijection.
Proof.
See [9], Chapter 4: Theorems 12.2 and 12.4. ∎
Theorem 2.6 implies that given a principal bundle , there exists a continuous map so that is isomorphic to , and that the choice of is unique up to homotopy. Any such choice is called a classifying map for .
3. From Integer Simplicial Cohomology to Circular Coordinates
For an arbitrary topological group , the Milnor construction (6) produces an explicit universal bundle , but the spaces and tend to be rather large. Indeed, they are often infinitedimensional CWcomplexes. For the case we have the more economical models and , with acting on by right translation: , and projection
Since is discrete, then valued continuous functions on are in fact locally constant, and hence is exactly the sheaf of locally constant functions with values in , denoted . Combining the definition of the Čech cohomology group with Theorems 2.5 and 2.6, yields a bijection
(7) 
where the limit is taken over all locally finite covers of , ordered by refinement, and the groups are the 1dimensional simplicial cohomology with coefficients of the associated nerve complexes . The goal now is to produce an explicit family of compatible functions realizing the isomorphism from (7). This is done in Theorem 3.2, and an explicit formula is given by (11).
To begin, let be a partition of unity on dominated^{2}^{2}2 That is, so that for all . by , fix a 1cocycle , and define for each the map
(8) 
Since is locally finite, then all but finitely many terms in this sum are zero. Note that acts on by right translation , and that is equivariant with respect to this action: .
If , then we have that
and hence the ’s can be assembled to induce a continuous map on the quotient space defined by (5); here is regarded as a collection of constant functions . To be more explicit, sends the class of in to . Since each is equivariant, then so is , and hence it descends to a well defined map at the level of base spaces
(9) 
Lemma 3.1.
The map classifies the principal bundle .
Proof.
Let us see explicitly that the map is well defined; in other words, that the value is independent of the open set containing . Indeed, let be so that . We contend that for every . If , then the equality is trivial since ; if , then and since is a cocycle. Therefore
and given that , then .
Finally, let us check that taking the pullback of the universal bundle yields a principal bundle isomorphic to . Indeed, since , then is a morphism of principal bundles, and the result follows from [9, Chapter 4: Theorem 4.2]. ∎
Theorem 3.2.
Let be the inclusion and
(10) 
the induced homomorphism. Given and , let . Denote by the value of on the vertex , and by the value of on the oriented edge ; in particular , and whenever . If
(11) 
then is a classifying map for the principal bundle .
Proof.
4. Persistent Cohomology and Sparse Circular Coordinates for Data
In this section we show how the theory we have developed thus far can be applied to real data sets. In particular, we explain and justify the choices made in the construction outlined in the Introduction (1.2). Let us begin by fixing an ambient metric space , let be finite, and let
The formulas derived in the previous section, specially (9), imply that each cocycle yields a map . The thing to notice is that is defined on every ; thus, given a large but finite set — the data — sampled around a continuous space , one can select a much smaller set of landmarks and for which . The resulting circular coordinates will thus be defined on all points of , though only the landmark set is used in its construction. As we alluded to in the introduction, this is what we mean when we say that the coordinates are sparse.
Landmark Selection
In practice we select the landmarks either at random, or through maxmin
sampling:
Given and chosen arbitrarily,
assume that have been selected, ,
and let
(12) 
Following this inductive procedure defines a landmark set that is in practice wellseparated and welldistributed throughout the data. However, it is important to keep in mind that this process is prone to choosing outliers.
The Subordinated Partition of Unity
As for the choice of partition of unity dominated by , we can use that the cover is via metric balls, and let
(13) 
See [17, 3.3 and Fig 6.] for other typical choices of partition of unity in the case of metric spaces, and coverings via metric balls.
The Need for Persistence
Even if the landmark set correctly approximates the underlying topology of , the choice of scale and cocycle might reflect sampling artifacts instead of robust geometric features of the underlying space . This is why we need persistent cohomology. Indeed, a class which is not in the kernel of the homomorphism
induced by the inclusion , is less likely to correspond to spurious features as the difference increases. Note, however, that the efficient computation of persistent cohomology classes relies on using field coefficients. We proceed, following [6] and [17], by choosing a prime and a scale so that (1) contains a class with large persistence, and (2) so that the homomorphism , induced by the quotient map , is surjective.
Lifting Persistence to Integer Coefficients
As stated in [6], one has that:
Proposition 4.1.
Let be a finite simplicial complex, and suppose that does not divide the order of the torsion subgroup of . Then the homomorphism
induced by the quotient map , is surjective.
Proof.
This follows directly from the Bockstein long exact sequence in cohomology, corresponding to the short exact sequence ∎
More generally, let be a filtered simplicial complex with finite. Since each complex is finite, and the cohomology groups change only at finitely many values of , then there exists so that the hypotheses of Proposition 4.1 will be satisfied for each , and all . In practice we choose a prime at random, with the intuition that for scientific data only a few primes are torsion contributors.
Let and for let be defined on each 1simplex as:
(14) 
Thus, takes values in and it satisfies . For the examples we have observed, the cochain defined by (14) produces an integer cocycle. One of the reviewers of an earlier version of this paper remarked that this is not always the case; the outlined procedure tends to fail (in real worldexamples) when the cohomology computation involves division by 2. As highlighted in [6, 2.4], solving a Diophantine linear system can be used to fix the problem.
Use Rips, not Nerves
Constructing the filtered complex can be rather expensive for a general ambient metric space . Indeed, the inclusion of an simplex into the nerve complex is predicated on checking if the intersection of ambient metric balls is nonempty. This is nontrivial on curved spaces. On the other hand, the Rips complex
provides a straightforward alternative, since we can use that
for every . Here is how. Let be a prime so that
is surjective for all , and let
be the homomorphism induced by the inclusion . Moreover, let be so that , and fix an integer lift
That is, one for which is a coboundary, e.g. (14).
The diagram below summarizes the spaces and homomorphisms used thus far:
Since the diagram commutes, then is not in the kernel of , and hence we obtain a nonzero element . This is the class we would use as input for Theorem 3.2.
Harmonic Smoothing
The final step is selecting an appropriate cocycle representative (refer to Figure 1 to see why this matters)
for the class , see (10). Again, since one would hope to never compute the nerve complex, the strategy is to solve the problem in for , and then transfer the solution using .
Inspecting (11) reveals that the choice of which promotes the smallest total variation in , is the one for which the value of on each 1simplex of is as small as possible. Consequently, we will look for the cocycle representative
of , which in average has the smallest squared value^{3}^{3}3 That is, we use the harmonic cocycle representative for appropriate inner products on cochains on each
1simplex of .
That said, not all edges in the rips complex are created equal.
Some might have just entered the filtration, i.e. ,
which would make them unstable if is corrupted with noise,
or perhaps is a rather small
portion of the data, which could happen if and are outliers selected during maxmin
sampling.
These observations can be encoded by choosing weights on vertices and edges:
(15) 
where is symmetric for all , it satisfies
and only when . Here is the weight of , and is the weight of the edge . For instance, one can take
but we note that we have not yet systematically investigated the effects of this choice. See [20, Apendix D] for a different heuristic.
It follows that defines inner products on and , by letting the indicator functions on simplices () be orthogonal, and setting
(16) 
Using , for , we let be the orthogonal projection of onto the space of 1coboundaries, and define
(17) 
A bit of linear algebra shows that,
Proposition 4.2.
The 1cocycle defined by (17) is a minimizer for the weighted least squares problem
(18) 
Here the sum runs over all 1simplices , and the minimization is over all 1cocycles which are cohomologous to .
Similarly, and if
denotes the coboundary map, then we let
in the orthogonal complement of the kernel of , be so that . Hence is the 0chain with the smallest norm mapping to via . Consequently, if
is the weighted MoorePenrose pseudoinverse of (see [3, III.3.4]), then
(19) 
This is how we compute and in our implementation. Now, let
If we were to be completely rigourous, then and would be the cochains going into (11); this would require the 1skeleton of the nerve complex. However, as the following proposition shows, this is unnecessary:
Proposition 4.3.
For all , and every , we have that
That is, we can compute sparse circular coordinates using only the Rips filtration on the landmark set.
Proof.
Since and have the same vertex set, namely , then as realvalued functions on . Moreover, for all we have that
for if , then both sides are zero, and if , then the edge is in both and , which shows that . ∎
5. Experiments
In all experiments below,
persistent cohomology is computed using a MATLAB wrapper for Ripser [1] kindly provided by Chris Tralie.
The MoorePenrose pseudoinverse was computed via MATLAB’s pinv
.
In all cases we
run the algorithm from the Introduction 1.2 using
the indicated persistence classes, or linear combinations thereof as made explicit in each example.
5.1. Synthetic Data
5.1.1. A Noisy Circle
We select 1,000 points from a noisy circle in ;
the noise is Gaussian in the direction normal to the unit circle.
50 landmarks were selected via maxmin
sampling (5% of the data),
and circular coordinates were computed for the two most persistent classes and ,
using (19) as input to (11) — this is the harmonic cocycle column —
or (9) with either or directly — the integer cocycle column.
We show the results in Figure 1 below.
Computing persistent cohomology took 0.079423 seconds (the Rips filtration is constructed from zero to the diameter of the landmark set); in each case computing the
harmonic cocycle takes about 0.037294 seconds.
This example highlights the inadequacy of the integer cocycle and of choosing cohomology classes
associated to sampling artifacts (i.e., with low persistence).
From now on, we only present circular coordinates computed with the relevant harmonic cocycle representative.
5.1.2. The 2Dimensional Torus
For this experiment we sample 1,000 points uniformly at random from the square ,
and for each selected pair we generate a point
on the surface of the torus embedded in .
The resulting finite set is endowed with the ambient distance from ,
and 100 landmarks (i.e., 10% of the data) are selected through maxmin
sampling.
We show the results in Figure 2 below, for the circular coordinates
computed with the two most persistent classes, and , and the maps (11)
associated to the harmonic cocycle representatives (19).
Computing persistent cohomology for the Rips filtration on the Landmarks (from zero to the diameter of the set)
takes 0.398252 seconds, and computing the harmonic cocycles takes 0.030832 seconds.
5.1.3. The Klein Bottle
We model the Klein bottle as the quotient space
and endow it with the quotient metric. Just like in the case of the 2torus, we sample 1,000 points uniformly at random on (the fundamental domain