Homological Coordinatization

Andrew Tausz, Stanford University, Stanford, CA 94305 (atausz@stanford.edu) and Gunnar Carlsson, Stanford University, Stanford, CA 94305 (gunnar@math.stanford.edu)

July 4, 2019
Abstract.

In this paper, we review a method for computing and parameterizing the set of homotopy classes of chain maps between two chain complexes. This is then applied to finding topologically meaningful maps between simplicial complexes, which, in the context of topological data analysis, can be viewed as an extension of conventional unsupervised learning methods to simplicial complexes.

The first author was supported by AFOSR grant FA9550-09-0-1-0531.
The second author was supported by AFOSR grant FA9550-09-0-1-0531, ONR grant N00014-08-1-0931 and NSF grant DMS 0905823.

1. Introduction

One goal of topological data analysis is to adapt algebraic topological methods to the context of point cloud data (i.e. finite metric spaces). The generalization of homology to this setting is called persistent homology [Carlsson_04], [Carlsson_09]. Persistent homology has been successful at providing insight into nonlinear datasets that would not be accessible with more classical methods. Although maps between spaces play a fundamental role in algebraic topology, most of the developments in topological data analysis have focused on the spaces and datasets themselves. In this paper, we propose a method for studying these maps.

Suppose that we compute the homology of a space $X$. By comparing the homology of $X$ with the known homology of other spaces, this computation can suggest that $X$ is homeomorphic to a previously understood space, or that there should be maps exhibiting prescribed homological behavior to model spaces with known homology. One thinks of this process as a kind of non-linear coordinatization. Standard methods for introducing useful coordinates on a point cloud include Principal Component Analysis and Multidimensional Scaling. Both of these methods often work well when the structure of the space is essentially Euclidean. However, when the space carries noncontractible topology, as in the case of a circle, these methods do not provide a way of mapping the data set to a nonlinear model. We describe some examples where the ability to construct maps to nonlinear targets would be useful.

Circular coordinatization: In situations where one finds that the persistent homology of the data set is that of a circle, it is natural to attempt to find a map to the circle. In this special case, there is a natural methodology using the persistent cohomology generator in dimension 1 which allows one to construct the map. The procedure is described in detail in [de_Silva_Johansson].

Natural image example: In [mumford], homological calculations were carried out to confirm that a space of frequently occurring motifs within image patches in natural images had the homology of a Klein bottle. Once one is given the homology, it is of great interest to attempt to construct an actual parametrization by a Klein bottle. This was done by hand in [mumford], but one would like to automate the procedure.

Gene expression data: Gene expression microarrays provide a powerful tool for obtaining information about many biological phenomena, including cancer. They produce high dimensional data, where the coordinates consist of probes representing particular genes. It is very well known that the results of such studies are highly dependent on platforms, procedures within the laboratories performing the studies, as well as many other factors. All this can distort the geometry of the data set, but one can hope that certain topological features would still be preserved, which might permit one to map one data set to another in a nonlinear way, preserving the relevant features. Often [monica], the geometry of these data sets is represented by tree-like shapes with distinguished branches.

In this case, since the tree is contractible, the direct use of homology will not be useful, since homology vanishes on contractible spaces. However, relative homology of the pair $(X, \partial X)$, where $\partial X$ denotes a suitably defined boundary of the space, does capture the existence of branches or flares in the geometric object. There are reasonable ways of defining the boundary of the metric space, and the points in the boundary have significance in that they tend to consist of the most representative phenomena of a particular subclass of samples. For example, type I and type II diabetes can be distinguished in this way, as can various molecular subtypes of cancer. In this case, one could then fix the map on the boundary, and study the relative mapping problem in which one enforces the constraint that the boundary is carried into a small neighborhood of the boundary. This kind of mapping is of great importance, since the problem of reconciling different data sets constructed on the same kinds of tumors or other disorders is a major problem in this area.

From the perspective of topological data analysis, we desire a map between two simplicial complexes $A$ and $B$ to satisfy the following:

  • Such a map must be functorial: it must induce homomorphisms on the homology of $A$ and $B$.

  • Ideally, the map would be simplicial. This means that the image of a simplex in $A$ should be a chain in $B$ containing either 0 or 1 simplices. Simplicial maps are particularly well behaved: they are determined by their values on vertices and can be fully extended by linear interpolation. Additionally, they have a nice geometric interpretation. In general, however, we obtain maps between chains in $A$ and chains in $B$; the typical situation is that the image of a simplex is a linear combination of several simplices.

  • Even though a map might be a chain map (hence inducing homomorphisms, or even isomorphisms, in homology), such a map might be unsatisfactory in the eyes of a practitioner. We want these mappings to reveal some sort of information about the structure of one complex in terms of the other. A common situation might be that $A$ is created from a large and high-dimensional dataset, and $B$ is a simple model with the same homology. In this case, a map can be thought of as performing a kind of topological dimensionality reduction. This leads to the process of geometric regularization through optimization.

Unfortunately, simplicial maps do not always exist; an example is the case of mapping from a triangle to a square. For this reason, we wish to find maps that are as close to simplicial as possible. Our investigation of this problem proceeds as follows:

  • The first step is to compute a parameterization of the homotopy classes of chain maps between two finite simplicial complexes. This is accomplished quite easily using a simple trick from homological algebra (see sections 2 and 3).

  • We wish to develop optimization problems that yield maps that are as close to simplicial as possible. Additionally, we would like these problems to satisfy other properties. For example, when they exist, simplicial maps should be contained in the set of optima, and to preserve tractability the problem should be convex. We discuss these criteria in section 4 and formulate two different optimization problems over the parameterization. The first one is convex and can be formulated as a linear program, while the second is non-convex but has the property that its minima are precisely the set of simplicial maps when they exist.

  • In section 5, we discuss three applications to various scenarios in topological data analysis: manifold-valued coordinates, density maximization, and mappings of contractible spaces. In the third example, we show how to compute mappings between trees by considering their relative homology with respect to a boundary defined by a filter function.

The two main existing investigations into the area of computing maps are [de_Silva_Johansson] and [Ding]. In the paper [de_Silva_Johansson], de Silva and Vejdemo-Johansson present a method based on persistent cohomology for computing circular coordinates on statistical data sets. The fundamental idea behind their work is the Brown representability of cohomology. Since we know that $S^1$ is the Eilenberg-MacLane space $K(\mathbb{Z}, 1)$ for the group $\mathbb{Z}$, we have that

$[X, S^1] \cong H^1(X; \mathbb{Z}).$

We can compute a map from a space $X$ to the circle by choosing a representative cocycle from a cohomology class $\alpha \in H^1(X; \mathbb{Z})$. In practice, they also perform an optimization step where they select the smoothest cocycle. Although this works very well for the case of a circle, it is not practically generalizable to spaces other than $S^1$. For example, $K(\mathbb{Z}, 2)$ is the infinite-dimensional complex projective space $\mathbb{CP}^\infty$, and $K(\mathbb{Z}/2\mathbb{Z}, 1)$ is $\mathbb{RP}^\infty$.

In the PhD thesis of Yi Ding, [Ding], the mapping problem is investigated from a combinatorial perspective. The hom-complex is used, as it is here, to obtain a parameterization of the space of chain maps, but combinatorial optimization techniques are used to select maps that satisfy certain criteria. Our work can be seen as an extension of Ding’s work to the continuous case.

We close the introduction with a remark on how we can think about the simplicial mapping problem as a version of higher-order clustering. Clustering can be thought of as computing a map from a dataset $X$ to the discrete space $\{1, \dots, k\}$: a cluster assignment maps each point of $X$ to one of the $k$ labels.

Suppose that we construct a filtered simplicial complex from $X$ with some maximum filtration parameter $\epsilon$. Examples of such constructions include the Vietoris-Rips complex, the Čech complex, and others. In this case, a clustering assignment is a mapping from the filtered complex to $\{1, \dots, k\}$ that is constant on the path components of the complex. It is easy to verify that this is equivalent to the chain map property in dimension 0: if $[u, v]$ is an edge between vertices $u$ and $v$ in the same path component, then $f(u) = f(v)$. Thus, a higher-dimensional analogue of clustering is the computation of some homotopy representative of a class of chain maps for $n$-simplices, with $n > 0$.

2. Definitions and Basics

In this section we review some basic definitions for completeness. This material can be found in a standard text on algebraic topology such as [HATCHER] or [Munkres]. The material on the hom-complexes can be found in [Maclane].

Definition 1.

A chain map $f$ between two chain complexes $(C_*, \partial^C)$ and $(D_*, \partial^D)$ is a family of homomorphisms $f_n : C_n \to D_n$ such that for each $n$ we have that $\partial^D_n \circ f_n = f_{n-1} \circ \partial^C_n$.

Definition 2.

Given two chain maps $f$ and $g$ between the chain complexes $C_*$ and $D_*$, a chain homotopy between them is a family of homomorphisms $F_n : C_n \to D_{n+1}$ such that for each $n$ we have that

(1) $f_n - g_n = \partial^D_{n+1} \circ F_n + F_{n-1} \circ \partial^C_n.$

The key fact is that if the chain maps $f$ and $g$ are chain homotopic, then they induce the same homomorphism on homology.

Definition 3.

Given two chain complexes $C_*$ and $D_*$, we define a new complex $\mathrm{Hom}(C, D)_*$ as follows. Let

(2) $\mathrm{Hom}(C, D)_n = \prod_{p} \mathrm{Hom}(C_p, D_{p+n}).$

An element of $\mathrm{Hom}(C, D)_n$ is a family of homomorphisms $f_p : C_p \to D_{p+n}$ for $p \in \mathbb{Z}$. Note that even in the case where the chain complexes $C_*$ and $D_*$ have non-negative support, the chain complex $\mathrm{Hom}(C, D)_*$ will have non-trivial negative terms in general.

Now that we have defined the terms in the complex, we need to define connecting homomorphisms

$\partial : \mathrm{Hom}(C, D)_n \to \mathrm{Hom}(C, D)_{n-1}.$

If $f \in \mathrm{Hom}(C, D)_n$, then $\partial f$ will itself be a family of homomorphisms. Let us denote by $f_p$ the component mappings on the individual modules $C_p$. To define $\partial f$, it is sufficient to specify its action on elements of $C_p$ for all $p$. Let $c \in C_p$ be an arbitrary element. We define

$(\partial f)(c) = \partial^D(f_p(c)) - (-1)^n f_{p-1}(\partial^C c).$

It is easy to see that $\partial$ satisfies $\partial \circ \partial = 0$. We summarize the properties of the hom-complex in the proposition below. Each claim can be proved by a simple computation.

Proposition 1.

The hom-complex satisfies the following homological properties:

  1. $f \in \mathrm{Hom}(C, D)_0$ is a cycle ($\partial f = 0$) if and only if $f$ is a chain map.

  2. $f$ is the boundary of some element $F \in \mathrm{Hom}(C, D)_1$ if and only if $f$ is chain homotopic to the zero map via $F$.

  3. The zeroth homology group $H_0(\mathrm{Hom}(C, D))$ consists of the chain-homotopy classes of chain maps.

Thus we have a nice characterization of the homotopy classes of chain maps: all we need to do is compute the homology of the hom-complex. Note that in this paper, when we use the term homotopy we actually mean chain homotopy. In other words, we only deal with the algebraic notion and not the topological one. Furthermore, by a map between chain complexes, we really mean a chain map.

Remark 1.

In our investigation, we will restrict ourselves to field coefficients for two computational reasons. Firstly, the homology computations will be performed in a persistent setting, for which one must work over a field (see [Carlsson_04]). Secondly, working over a field of characteristic 0 greatly enhances the efficiency of the later optimization steps. As a bonus for working with field coefficients, we get a more manageable representation of the individual terms, since for vector spaces $V$ and $W$ we have that $\mathrm{Hom}(V, W) \cong V^* \otimes W$. We denote the vector space dual of $V$ by $V^*$. In this case, the $n$-th term in the hom-complex is

(3) $\mathrm{Hom}(C, D)_n = \prod_{p} C_p^* \otimes D_{p+n}.$

This is very similar to another standard construction in homology theory, the tensor product of two chain complexes. Thus an element of $\mathrm{Hom}(C, D)_n$ may be written as a linear combination of simple tensors $\sigma^* \otimes \tau$ of basis elements. From the computational point of view, this representation is particularly useful since most of the coefficients will be zero and do not need to be stored.

We now close this section with a theorem which gives a useful characterization of the homology of the hom-complex in special cases. A proof may be found in chapter III of [Maclane].

Proposition 2.

Let $C$ and $D$ be chain complexes of free modules over a PID $R$. Then there exists the following exact sequence for each $n$:

(4) $0 \to \prod_{p} \mathrm{Ext}(H_p(C), H_{p+n+1}(D)) \to H_n(\mathrm{Hom}(C, D)) \to \prod_{p} \mathrm{Hom}(H_p(C), H_{p+n}(D)) \to 0$

In the special case when we are dealing with vector spaces over a field $k$, the Ext term vanishes and we have an explicit expression for the Hom term. So we get that

(5) $H_n(\mathrm{Hom}(C, D)) \cong \prod_{p} \mathrm{Hom}(H_p(C), H_{p+n}(D)).$

3. Finding Mappings between Simplicial Complexes

In this section we apply the algebraic techniques of the previous section to computing a parameterization of the homotopy classes of maps between two simplicial complexes.

Suppose we have simplicial complexes $A$ and $B$, with simplices $\{\sigma_1, \dots, \sigma_m\}$ and $\{\tau_1, \dots, \tau_n\}$. These form bases for the chain vector spaces (over a field $k$) $C(A)$ and $C(B)$. We also denote the basis dual to $\{\sigma_i\}$ by $\{\sigma_1^*, \dots, \sigma_m^*\}$. Then, the vector space $\mathrm{Hom}(C(A), C(B))$ has basis $\{\sigma_i^* \otimes \tau_j\}$.

Based on our previous discussion about hom-complexes, we know that the set of homotopy classes of maps between $C(A)$ and $C(B)$ is given by the $n$-th homology with $n = 0$:

(6) $[C(A), C(B)] = H_0(\mathrm{Hom}(C(A), C(B)))$

In practice, we are not interested in the rank or Betti numbers of the homology group, but rather we wish to find representatives for the homotopy classes. As described in [Munkres], homology can be computed via the Smith normal form in the case of coefficients in a PID, or with Gaussian elimination in the case of coefficients in a field.

The result of the homology computation is a set of representative cycles (equivalently, chain maps), which we denote $\phi_1, \dots, \phi_r$. We can also easily obtain the set of chain homotopies to zero by computing the columns (the image) of the matrix representation of $\partial_1 : \mathrm{Hom}(C(A), C(B))_1 \to \mathrm{Hom}(C(A), C(B))_0$. Let us denote these columns by $h_1, \dots, h_s$.

The general parameterization of the affine space of homotopy classes is

$f = \sum_{i=1}^{r} c_i \phi_i + \sum_{j=1}^{s} a_j h_j,$

where fixing the coefficients $c_i$ fixes the class $[f] \in [C(A), C(B)]$ and the coefficients $a_j$ range over the field. Note that the above expression uses the notation $[C(A), C(B)]$ for the chain-homotopy classes of chain maps between $C(A)$ and $C(B)$, and not the homotopy classes of all continuous maps. Let us denote the $i$-th chain map by $\phi_i$, and the $j$-th chain homotopy by $h_j$.
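To make the preceding discussion concrete, the following is a minimal sketch in Python, using sympy for exact linear algebra. (The actual implementation described in section 6 uses javaPlex and Matlab, so everything here, including the vec ordering and the helper names, is an illustration rather than the authors' code.) It assembles the degree-0 and degree-1 boundary matrices of the hom-complex for two 1-dimensional complexes via Kronecker products, and then reads off chain-map representatives and homotopies as described above.

    from sympy import Matrix, eye, zeros

    def kron(A, B):
        # Kronecker product of two sympy matrices
        m, n = A.shape
        p, q = B.shape
        K = zeros(m * p, n * q)
        for i in range(m):
            for j in range(n):
                K[i*p:(i+1)*p, j*q:(j+1)*q] = A[i, j] * B
        return K

    def hom_boundaries(dC, dD):
        # Boundary matrices of Hom(C, D) in degrees 0 and 1, for complexes
        # supported in dimensions 0 and 1 with boundary matrices dC, dD.
        # A degree-0 element is a pair (F0, F1) stored as [vec(F0); vec(F1)]
        # (column-major vec); a degree-1 element is a map G0 : C_0 -> D_1.
        nVC, nEC = dC.shape
        nVD, nED = dD.shape
        # d0 sends (F0, F1) to dD*F1 - F0*dC, a map C_1 -> D_0
        d0 = Matrix.hstack(-kron(dC.T, eye(nVD)), kron(eye(nEC), dD))
        # d1 sends G0 to the pair (dD*G0, G0*dC)
        d1 = Matrix.vstack(kron(eye(nVC), dD), kron(dC.T, eye(nED)))
        return d0, d1

    # boundary matrix of a square: vertices 0..3, edges [0,1],[1,2],[2,3],[0,3]
    d_square = Matrix([[-1,  0,  0, -1],
                       [ 1, -1,  0,  0],
                       [ 0,  1, -1,  0],
                       [ 0,  0,  1,  1]])

    d0, d1 = hom_boundaries(d_square, d_square)
    cycles = d0.nullspace()                            # chain maps phi_i
    homotopies = [d1.col(j) for j in range(d1.cols)]   # homotopies h_j
    # dim H_0 = nullity(d0) - rank(d1); for circle -> circle this should be 2,
    # matching Hom(H_0, H_0) x Hom(H_1, H_1) from Proposition 2
    print(len(cycles) - d1.rank())

One can check directly that $\partial_0 \partial_1 = 0$, so the columns of d1 are indeed cycles, and the representatives returned by the nullspace computation are defined only up to the image of d1, exactly as in the parameterization above.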

3.1. Example

Let $A$ be a triangle with vertices [0], [1], [2], and edges [0, 1], [0, 2], [1, 2]. Let $B = A$. Note that by the homotopy classification theorem of the previous section (Proposition 2), we have that

$H_0(\mathrm{Hom}(C(A), C(B))) \cong \mathrm{Hom}(H_0(A), H_0(B)) \times \mathrm{Hom}(H_1(A), H_1(B)) \cong k^2.$

Thus we have two generating cycles. We denote the two computed representatives by $\phi_1$ and $\phi_2$.

The set of homotopies is given by the image of the 1-dimensional boundary matrix $\partial_1$; there are a total of 9 of them, which we denote $h_1, \dots, h_9$.

We can see that the first generating cycle $\phi_1$ induces an isomorphism $H_0(A) \to H_0(B)$, and induces the zero map on dimension-1 homology. Similarly, the second generating cycle $\phi_2$ induces an isomorphism $H_1(A) \to H_1(B)$ and induces the zero map on dimension-0 homology. A generator for $H_0(A)$ (equivalently $H_0(B)$) is the class of a vertex, and a generator for $H_1(A)$ (equivalently $H_1(B)$) is the class of the cycle $[0,1] + [1,2] - [0,2]$. A quick computation also reveals that adding a chain homotopy does not change the induced action of the generating cycles.

This example also gives us some hints about the selection of the coefficients $c_i$ for the chain maps. From the previous paragraphs, we have chain maps (which are homology cycles) $\phi_1$, which induces an isomorphism on dimension-0 homology, and $\phi_2$, which induces an isomorphism on dimension-1 homology. If we take their sum $\phi_1 + \phi_2$ (by setting $c_1 = c_2 = 1$), then it is easy to see that it induces an isomorphism on both homology groups. In practice this map is the one to start with, since it preserves the homological structure across all dimensions. Thus in this case we would use the parameterization

$f = \phi_1 + \phi_2 + \sum_{j=1}^{9} a_j h_j.$

4. Searching the Parameterization

Given two simplicial complexes and , we now know how to compute a parameterization for the homotopy classes of chain maps between them. However, in a statistical application setting we are interested in selecting only one geometrically meaningful map from this set. Some reasonable criteria for such a map are:

  • Image cardinality (simplicialness): In general, the image of a simplex $\sigma$ under a map $f$ will be some linear combination $f(\sigma) = \sum_j c_j \tau_j$. Ideally, we would want the number of non-zero coefficients to be as few as possible.

  • Preimage cardinality: Likewise, we would prefer that the number of non-zero coefficients in the corresponding expression for the adjoint, $f^*(\tau) = \sum_i c_i \sigma_i$, be as few as possible.

  • Locality: We would like the image of a simplex to be localized in the codomain. This means that the non-zero terms in the expression $f(\sigma)$ should not be spread far apart.

  • Unbiasedness: There should be no a priori reason to prefer one map over another if they achieve the same optimal value. For example, in the case of a triangle mapping to a triangle, there is no geometric way of distinguishing the identity map from either of the two rotations of the vertices (all three are simplicial and optimally localized).

  • Convexity: The optimization problem should be convex.

Unfortunately, the above criteria are mutually incompatible. To see this, it suffices to consider the case where $A$ and $B$ are both triangles. The optimally sparse and localized chain maps include the three rotations of the vertices. However, the unbiasedness property says that each of these three maps should be an element of the set of optima of the optimization problem. If we require the problem to be convex, then it turns out that the set of optima must also be convex and in particular connected. Thus if one takes a non-trivial convex combination (say with coefficients (1/3, 1/3, 1/3)), it will also be an optimum, but it will violate the condition of sparsity. One remedy for this is to require only that the set of optima be the convex hull of the unbiased set of points. Alternatively, one can discard unbiasedness and require that only one sparse point be returned.

4.1. The Combinatorial Approach is Hard: Theory

The purpose of this section is to provide some orientation regarding the complexity of finding nice maps. This assumes that we are taking a combinatorial approach as discussed in [Ding].

Our ultimate goal would be to construct a map that is simplicial in both directions. This means that the image and preimage of each simplex contains zero or one simplices. Let us call such a map bisimplicial. In other words, such a map would minimize the quantity

(7) $c(f) = \max_{\sigma \in A} \|f(\sigma)\|_0 + \max_{\tau \in B} \|f^*(\tau)\|_0,$

where $\|\cdot\|_0$ counts the number of non-zero coefficients of a chain. When a bisimplicial map exists, the above function has a minimum value of 2. However, it turns out that the above optimization problem is hard in a precise sense:

Proposition 3.

Deciding whether or not a bisimplicial map exists is at least as difficult as solving the graph isomorphism problem.

Proof.

We use the standard technique of performing a polynomial-time reduction of an instance of the graph isomorphism problem to the bisimpliciality problem. In other words, suppose that we have an oracle that can tell us in polynomial time whether a bisimplicial map between two complexes exists. Then we must show that we can also determine in polynomial time whether there exists a graph isomorphism between two graphs $G_1$ and $G_2$. Let us construct a machine that solves this problem in polynomial time.

Suppose that we are given two graphs $G_1$ and $G_2$. Note that we can take care of any polynomial-time graph invariants beforehand. For example, this means that we may answer “No” if $G_1$ and $G_2$ have different numbers of vertices or edges. Similarly, since homology over a field may be computed in polynomial time (see [Carlsson_04]), we may also answer “No” if $H_*(G_1) \neq H_*(G_2)$. Thus suppose that using our oracle we have found a bisimplicial map $f$ that induces an isomorphism on homology between $G_1$ and $G_2$.

We claim that $f$ is a graph isomorphism. Let us denote the vertices of $G_1$ and $G_2$ by $V_1$ and $V_2$, and the edges by $E_1$ and $E_2$. By bisimpliciality, $f$ must be a bijection between $V_1$ and $V_2$. Suppose that $u \sim v$ in $G_1$ (i.e. the edge $[u, v]$ exists in $E_1$). Then by the fact that $f$ is a chain map, $\partial f([u, v]) = f(\partial [u, v]) = f(v) - f(u)$. Thus $f([u, v])$ is an edge between $f(u)$ and $f(v)$ in $G_2$. So $f(u) \sim f(v)$.

Conversely, suppose that there are vertices $u, v$ in $G_1$ such that $f(u) = u'$ and $f(v) = v'$ with $u' \sim v'$ in $G_2$, but with $u \not\sim v$. However, we must have an edge $e = [x, y] \in E_1$ such that $f(e) = [u', v']$ (if not, $f$ would not be bisimplicial or would not be a homology isomorphism). Since $f$ is a chain map, $x$ and $y$ are carried to the endpoints of $[u', v']$, say $f(x) = u'$ and $f(y) = v'$. But then at least one of the following holds: $x \neq u$ or $y \neq v$ (since $u \not\sim v$, the edge $e$ cannot be $[u, v]$). Without loss of generality, $x \neq u$. Then we have that $f(x) = f(u) = u'$, contradicting the bisimpliciality of $f$. Thus we must have that $u \sim v$, and hence $f$ is a graph isomorphism. Thus, answering “Yes” when a bisimplicial map exists, and “No” otherwise solves the graph isomorphism decision problem. ∎

Unfortunately, it is not known whether the graph isomorphism problem is NP-complete or in P [Garey]. Thus all of the best-known algorithms are super-polynomial. Complexity theorists have defined a class called GI which consists of those problems that are polynomial-time reducible to the graph isomorphism problem. In this terminology, we have shown that computing bisimplicial maps is GI-hard in general.

4.2. The Combinatorial Approach is Hard: Empirical Findings

Although the bisimpliciality problem is provably hard as previously shown, one may wonder whether it is possible to use heuristic combinatorial optimization techniques (over $\mathbb{F}_2$). Examples include random walks, simulated annealing, or greedy search with randomized restarts. Here, we give some empirical evidence suggesting that these approaches are unlikely to work for finding bisimplicial maps. Although it is possible that many simplicial maps exist for certain artificially constructed cases, the examples below suggest that such maps are “rare”.

Consider one of the simplest conceivable nontrivial cases, where both $A$ and $B$ are squares containing the simplices $[0], [1], [2], [3]$ and $[0,1], [1,2], [2,3], [0,3]$. Suppose that we are looking for a map $f$ that is forward and backward simplicial. As stated before, we wish to find a minimizer for equation (7).

The homotopies in this case consist of the images under the boundary map $\partial_1$ of all simple tensors of the form $\sigma^* \otimes \tau$, where $\sigma$ is a vertex of $A$ and $\tau$ is an edge of $B$. Thus there are a total of 16 homotopies. Since we are dealing with $\mathbb{F}_2$ coefficients, there are $2^{16} = 65{,}536$ possible choices for the homotopy coefficients. Although it is not practical in general, we may simply enumerate all possible sets of coefficients. Doing so, we find that out of the 65,536 possibilities, only 16 choices of coefficients yield bisimplicial maps. Figure 1 shows the set of all coefficients enumerated on the horizontal axis (by using their binary representation) with the simplicial objective function value on the vertical axis.

Figure 1. Enumeration of all of the homotopy representatives of the chain map inducing an isomorphism between two squares. The horizontal axis indicates the binary representation of the set of 16 coefficients for the homotopies, and the vertical axis shows the bisimpliciality penalty.

Similarly, for mappings between a triangle and a square, out of the $2^{12} = 4096$ possibilities only 7 yield minima. In the case where $A$ and $B$ are circles with 8 vertices (octagons), simulated annealing was performed; after 25,300 iterations a minimum value of 11 was reported for the simplicial objective function. Since the identity map is a minimizer, the actual minimum should be 2. This was also the case with the other heuristics (greedy search and random walking): we have yet to find a nontrivial pair for which one of these techniques finds a map that comes close to minimizing (7). While these examples do not constitute a proof, they suggest that bisimplicial maps are extremely rare among chain maps. The square-to-square enumeration is small enough to reproduce directly, as sketched below.
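The sketch below is a reconstruction under stated assumptions: the simplex ordering, the choice of the identity map as the fixed chain-map representative, and the mod-2 arithmetic are choices of this sketch, not taken from the original experiments. It enumerates all $2^{16}$ homotopy coefficient vectors and evaluates the bisimpliciality objective (7).

    import numpy as np
    from itertools import product

    # boundary matrix of a square over GF(2): vertices 0..3,
    # edges [0,1],[1,2],[2,3],[0,3] (signs are irrelevant mod 2)
    d = np.array([[1, 0, 0, 1],
                  [1, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 1, 1]], dtype=np.uint8)
    nV = nE = 4

    # degree-1 hom boundary: vec(G0) -> [vec(d*G0); vec(G0*d)], column-major vec
    D1 = np.vstack([np.kron(np.eye(nV, dtype=np.uint8), d),
                    np.kron(d.T, np.eye(nE, dtype=np.uint8))]) % 2

    # fixed chain-map representative: the identity on vertices and on edges
    X = np.concatenate([np.eye(nV, dtype=np.uint8).reshape(-1, order='F'),
                        np.eye(nE, dtype=np.uint8).reshape(-1, order='F')])

    def objective(vec):
        # objective (7): max column support plus max row support, over both blocks
        F0 = vec[:nV*nV].reshape(nV, nV, order='F')
        F1 = vec[nV*nV:].reshape(nE, nE, order='F')
        col = max(F0.sum(axis=0).max(), F1.sum(axis=0).max())
        row = max(F0.sum(axis=1).max(), F1.sum(axis=1).max())
        return int(col) + int(row)

    best, count = None, 0
    for bits in product([0, 1], repeat=16):
        a = np.array(bits, dtype=np.uint8)
        val = objective((X + D1 @ a) % 2)
        if best is None or val < best:
            best, count = val, 1
        elif val == best:
            count += 1
    print(best, count)   # the text reports 16 bisimplicial choices, of value 2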

4.3. Matrix Representation

For convenience, we fix some notation that will be used for the rest of this document. Let the capital letters $X_1, \dots, X_r$ denote the matrix representations of the chain maps $\phi_1, \dots, \phi_r$, and $H_1, \dots, H_s$ the matrix representations of the homotopies $h_1, \dots, h_s$. We let $X = \sum_i c_i X_i$ be the fixed sum of chain maps. We denote an arbitrary member of the affine family of chain maps by $f$, and its matrix representation by $F$. The basis elements of $C(A)$ correspond to the unit vectors $e_1, \dots, e_m$, and the basis elements of $C(B)$ correspond to the unit vectors $\tilde{e}_1, \dots, \tilde{e}_n$.

4.4. Minimizing Image and Preimage Size

Let us explore the first two criteria stated at the beginning of this section. We said that ideally we would like to minimize the number of non-zero terms in the image and adjoint-image of each simplex. This can be formulated as a special case of the following optimization problem:

(8) minimize $\max_i \|F e_i\|_p + \max_j \|F^T \tilde{e}_j\|_p$
subject to $F = \sum_i c_i X_i + \sum_j a_j H_j$

Note that in the above expression, the coefficients $c_i$ are fixed beforehand. This means that we make an initial selection of which homological features to preserve. If they were not selected and were instead optimization variables, the problem above would be minimized by the zero map.

It is also possible to replace the max terms by sums over the domain and codomain; however, such a replacement yields less meaningful maps (consider the case of the chain map which maps each vertex in the domain to a single vertex in the codomain).

To minimize the maximum cardinality or the sum of the cardinalities, one can use $p = 0$ (although $\|\cdot\|_0$ is not actually a norm). However, we have seen that such a combinatorial approach is difficult, and since the problem dimension we are dealing with is multiplicative in the sizes of the simplicial complexes $A$ and $B$, this is out of the question. As practiced in many optimization settings, one may relax the cardinality minimization problem to a 1-norm minimization problem. The intuition behind this is that the unit ball in the 1-norm is the convex hull of the points $\pm e_i$, and that constrained optima tend to lie on corner points, which are sparse.

In the case of the 1-norm, we can rewrite this as:

(9) minimize $\|F\|_1 + \|F^T\|_1$
subject to $F = \sum_i c_i X_i + \sum_j a_j H_j$

The 1-norm of a matrix is defined to be its maximum absolute column sum, and is equal to the operator norm induced by the vector 1-norm. It turns out that this problem is convex and in fact can be reformulated as a linear program.
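As a sketch of how (9) can be set up in practice, the following uses cvxpy (a stand-in for the Matlab-based optimization described in section 6; the argument names X and Hs refer to the matrices of section 4.3 and must be supplied by the homology computation):

    import cvxpy as cp

    def minimize_l1(X, Hs):
        # solve (9): minimize ||F||_1 + ||F^T||_1 over F = X + sum_j a_j H_j,
        # where ||.||_1 is the maximum absolute column sum
        a = cp.Variable(len(Hs))
        F = X + sum(a[j] * Hs[j] for j in range(len(Hs)))
        induced_1 = cp.max(cp.sum(cp.abs(F), axis=0))    # max column sum of |F|
        induced_inf = cp.max(cp.sum(cp.abs(F), axis=1))  # max column sum of |F^T|
        prob = cp.Problem(cp.Minimize(induced_1 + induced_inf))
        prob.solve()
        F_opt = X + sum(float(a.value[j]) * Hs[j] for j in range(len(Hs)))
        return F_opt, prob.value

The max-of-column-sums expression is convex and piecewise linear, so the solver effectively handles the linear-program reformulation mentioned above.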

Although the optimization problem (8) with $p = 1$ is useful in that it eliminates many solutions which we view as being inadmissible, its set of optima is still closed under convex combinations. This means, for example, that in the case of the two triangles, the map that sends each vertex to the uniform chain $\frac{1}{3}([0] + [1] + [2])$ is an optimum, but is not what we are looking for. Thus, our viewpoint changes from (8) being the answer to our search for a suitable optimization problem, to instead being a definition of admissibility of a map:

Definition 4.

A map is said to be admissible if it is a member of the set of optima of (8) with $p = 1$, or equivalently (9). Denote the admissible set by $\mathcal{A}$.

The next step is to somehow identify the points within $\mathcal{A}$ that satisfy some sort of sparsity requirement. One possibility is to require the coefficients to be integral. This brings us into the realm of integer linear programming, which turns out to be NP-hard (in the absence of the property of integral vertices, which is not satisfied in this situation). In fact, the integer feasibility problem (finding a point with integer coordinates in a polytope defined by $Ax \leq b$) is also NP-hard. Computationally, this approach can be applied to situations where both complexes are small, but it does not scale well.

The second possibility is to optimize some measure of sparsity or peakiness over the points in $\mathcal{A}$. One strategy is to maximize the ratio of the 2-norm to the 1-norm. (To see that this is reasonable, one can compare the vectors $(1, 0)$ and $(\tfrac{1}{2}, \tfrac{1}{2})$: the sparse vector has ratio $1$, while the spread-out vector has ratio $\tfrac{1}{\sqrt{2}}$.) Another possibility is to minimize the function which measures the distance of a point to its nearest integral point (let the objective be $d(x) = \|x - \operatorname{round}(x)\|$). Although both of these objectives have the property that their global minima over $\mathcal{A}$ are the simplicial maps, both of them suffer from being non-convex. In fact, since the vertices of $\mathcal{A}$ are local minima of these functions, global minimization would involve searching all of the corner points. Once again, the task of enumerating the vertices of a convex polytope turns out to be NP-hard [Khachiyan]. A third, randomized heuristic is to select a random search direction $d$ and solve the problem

(10) minimize $d^T a$
subject to $X + \sum_j a_j H_j \in \mathcal{A}$

This will result in the selection of a random corner point of $\mathcal{A}$. This can be repeated until a sufficiently sparse (close to simplicial) map is found.
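A sketch of the random-corner heuristic (10), reusing the hypothetical minimize_l1 from the previous sketch: first compute the optimal value of (9), then minimize a random linear objective over a slight relaxation of the optimal set.

    import numpy as np

    def random_corner(X, Hs, rng=np.random.default_rng(), tol=1e-7):
        # select a (random) extreme point of the admissible set A
        _, opt = minimize_l1(X, Hs)              # optimal value of (9)
        a = cp.Variable(len(Hs))
        F = X + sum(a[j] * Hs[j] for j in range(len(Hs)))
        norm = (cp.max(cp.sum(cp.abs(F), axis=0)) +
                cp.max(cp.sum(cp.abs(F), axis=1)))
        d = rng.standard_normal(len(Hs))         # random search direction
        prob = cp.Problem(cp.Minimize(d @ a), [norm <= opt + tol])
        prob.solve()
        return X + sum(float(a.value[j]) * Hs[j] for j in range(len(Hs)))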

4.5. The Alexander-Whitney Map

It is easy to see that there is no convex objective function that will select all of the simplicial maps, since a convex function cannot have a disconnected set of minima. However, if we discard the requirement of unbiasedness (not favoring one simplicial map over another) or convexity, we can devise other methods for selecting favorable maps. The following method dispenses with convexity.

Define the Alexander-Whitney map $AW : C_*(X) \to C_*(X) \otimes C_*(X)$ on a simplex $\sigma = [v_0, \dots, v_n]$ by

$AW(\sigma) = \sum_{i=0}^{n} {}_{i}\sigma \otimes \sigma_{i},$

where ${}_{i}\sigma$ is the front face $[v_0, \dots, v_i]$, and $\sigma_{i}$ is the back face $[v_i, \dots, v_n]$. It is a routine calculation to show that the following diagram commutes if and only if $f$ is a simplicial chain map:

(11) $\begin{array}{ccc} C(A) & \xrightarrow{\ AW_A\ } & C(A) \otimes C(A) \\ \downarrow f & & \downarrow f \otimes f \\ C(B) & \xrightarrow{\ AW_B\ } & C(B) \otimes C(B) \end{array}$

Thus we can measure the deviation of a chain map from simpliciality by measuring the norm of the difference $AW_B \circ f - (f \otimes f) \circ AW_A$. Suppose we are given a loss function $\ell$ on the space of maps $C(A) \to C(B) \otimes C(B)$; we can define

(12) $L(f) = \ell\left(AW_B \circ f - (f \otimes f) \circ AW_A\right).$

For example, a convenient choice would be the quadratic loss

(13) $\ell(g) = \tfrac{1}{2}\|g\|_2^2,$

where $\|g\|_2$ is the 2-norm of the vector of coefficients of $g$.

Thus given a selection of a loss function, we can solve

(14) minimize $L(f)$
subject to $f = \sum_i c_i \phi_i + \sum_j a_j h_j$

Note that even if the original loss function $\ell$ is convex, $L$ will not be convex on the affine space of chain maps. However, it is clear that the minima of the above optimization problem will be precisely the bisimplicial maps, in the case where one exists.
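The Alexander-Whitney penalty is easy to evaluate directly on a sparse representation of a map. The sketch below (dictionary-based chains; the helper names are this sketch's own) computes $L(f)$ with the quadratic loss; a map induced by a vertex bijection gives a penalty of zero, as expected.

    from collections import defaultdict

    def aw(simplex):
        # Alexander-Whitney diagonal of a simplex (v0,...,vn):
        # sum_i (v0,...,vi) (x) (vi,...,vn)
        n = len(simplex) - 1
        return [((simplex[:i+1], simplex[i:]), 1.0) for i in range(n + 1)]

    def aw_penalty(f, simplices):
        # quadratic loss of AW_B(f(s)) - (f (x) f)(AW_A(s)), summed over s;
        # f maps each simplex (a tuple) to a dict {simplex: coefficient}
        total = 0.0
        for s in simplices:
            diff = defaultdict(float)
            for t, w in f.get(s, {}).items():      # AW of the image chain
                for pair, c in aw(t):
                    diff[pair] += w * c
            for (front, back), c in aw(s):         # minus (f (x) f) of AW(s)
                for tf, wf in f.get(front, {}).items():
                    for tb, wb in f.get(back, {}).items():
                        diff[(tf, tb)] -= c * wf * wb
            total += sum(x * x for x in diff.values())
        return total

    # identity map on a triangle: the penalty vanishes for a simplicial map
    tri = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
    identity = {s: {s: 1.0} for s in tri}
    print(aw_penalty(identity, tri))   # 0.0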

Remark 2.

A naive implementation of the above would be very expensive to compute. Suppose that and , so the matrix representation of a map is a matrix . At some point, we need to compute a product of the form . The simple way would be to form the matrix (of size ), and then perform the matrix-vector multiplication. The cost of this operation is .

However, it is possible to rearrange this product differently. Suppose that is a vector in . We reshape this into the matrix of size by dividing into blocks of length and laying them side by side. It turns out that computing the product is equivalent to computing and then reshaping it into vector form (see [Henderson]). The complexity of this operation is .
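The reshaping identity is easy to check numerically; it is the standard relation $\mathrm{vec}(A V B) = (B^T \otimes A)\,\mathrm{vec}(V)$ for the column-major vec (see [Henderson]), specialized to $A = F$, $B = F^T$:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 6, 4                                   # |B| = n, |A| = m
    F = rng.standard_normal((n, m))
    v = rng.standard_normal(m * m)

    slow = np.kron(F, F) @ v                      # forms an n^2-by-m^2 matrix
    V = v.reshape(m, m, order='F')                # undo the column-major vec
    fast = (F @ V @ F.T).reshape(-1, order='F')   # O(nm(n + m)) instead
    print(np.allclose(slow, fast))                # True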

5. Extensions and Applications

5.1. Coordinatization on a Manifold

In this section we investigate how the method previously described can be used to find manifold-valued coordinates for a given simplicial complex. Suppose that we have the following data:

  • $A$: The domain simplicial complex. This is the “original” data set under investigation.

  • $M$: A manifold on which we wish to compute coordinates.

  • $T$: A triangulation of the manifold $M$.

  • $t : |T| \to M$: A homeomorphism, corresponding to the triangulation of $M$.

  • $\ell$: A “localization” function, which maps zero-dimensional chains on $T$ to points on the manifold $M$.

We enforce a compatibility constraint on the functions $t$ and $\ell$: on a chain consisting of a single vertex $v$, the localization must agree with the triangulation, $\ell(v) = t(v)$. Our objective is to find a coordinate mapping from the vertices of $A$ to $M$.

The first approach to computing coordinates on $M$ is to compute the parameterization of the chain maps from $C(A)$ to $C(T)$ and to perform one of the optimization routines of the previous section to obtain a map $f$. Then, given a vertex $v \in A$, we define its coordinate on $M$ to be $\ell(f(v))$.

A second approach relies on the geometry of the manifold. Suppose that $M$ is equipped with a Riemannian metric $g$. We wish to find a mapping which minimizes the total distortion across the 1-skeleton of $A$. In other words, we wish that vertices connected by an edge in $A$ should be mapped nearby in $M$. This can be precisely formulated as the following optimization problem:

(15) minimize $\sum_{[u,v] \in A_1} d_M(\ell(f(u)), \ell(f(v)))$
subject to $f = \sum_i c_i \phi_i + \sum_j a_j h_j$

The metric $d_M$ on $M$ has the standard definition:

$d_M(x, y) = \inf \{ L(\gamma) : \gamma \text{ a curve from } x \text{ to } y \},$

where $L(\gamma)$ is the length of the curve $\gamma$ defined by the Riemannian metric $g$.

5.2. Example: Mapping to a circle

Let $M$ be the circle $S^1$, and suppose that $T$ is an $n$-polygon homeomorphic to $S^1$. We define the coordinate of the $k$-th vertex to be $\theta_k = 2\pi k / n$, where $k \in \{0, \dots, n-1\}$. We also define $\ell$ to map a chain in $T$ to the weighted sum of the coordinates of its basis elements.

Our distance function on $S^1$ becomes $d(\theta, \varphi) = \min(|\theta - \varphi|, 2\pi - |\theta - \varphi|)$. Given this setup, we can find $S^1$-valued coordinates for a data set of interest by solving the optimization problem (15). An example is shown in Figure 5.
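The ingredients of this example fit in a few lines. In the sketch below (an illustration only: the circular-mean localization is one reasonable reading of the weighted-sum description above, chosen to handle the wraparound at $2\pi$), F0 denotes the vertex-level block of a candidate chain map, with columns indexed by domain vertices:

    import numpy as np

    def localize(chain, n):
        # map a 0-chain on the n-gon (a coefficient vector) to an angle,
        # via a weighted circular mean of the vertex angles 2*pi*k/n
        theta = 2 * np.pi * np.arange(n) / n
        z = np.sum(chain * np.exp(1j * theta))
        return np.angle(z) % (2 * np.pi)

    def d_circle(t1, t2):
        # geodesic distance on S^1 of circumference 2*pi
        diff = np.abs(t1 - t2) % (2 * np.pi)
        return min(diff, 2 * np.pi - diff)

    def distortion(F0, edges, n):
        # objective (15): total distortion across the domain 1-skeleton
        angles = [localize(F0[:, v], n) for v in range(F0.shape[1])]
        return sum(d_circle(angles[u], angles[v]) for (u, v) in edges)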

5.3. Density Maximization

Suppose that we have a set of points $P$ in Euclidean space, and a simplicial model $B$. We wish to find a mapping from the vertices of $B$ to Euclidean space such that the images of these vertices land in regions of high density. The interpretation is that this process produces a clustering of the data that is aware of the topological structure of the data set $P$.

To do this, we begin with two pieces of data:

  • A filtered simplicial complex, $A$, constructed from the points $P$. For example, one may use the Vietoris-Rips or witness constructions discussed in [Carlsson_09].

  • An empirical density estimator on $\mathbb{R}^d$, which we call $\hat{f}$, created from the data points $P = \{p_1, \dots, p_N\}$. A common choice would be a kernel density estimate defined by

$\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} K(x - p_i).$

    A reasonable choice for the kernel function $K$ would be the standard Gaussian density function.

Given a chain map $\varphi$, if $\varphi(v) = \sum_i b_i w_i$ for a vertex $v$ of the model, then we interpret the image of $v$ to be the weighted average of the points in the chain. In other words, define $m(\varphi(v)) = \frac{\sum_i b_i e(w_i)}{\sum_i b_i}$, where $e$ takes a vertex of $A$ to its Euclidean coordinates. Since we wish to move points in the image of the chain map to regions of high density, we form the following optimization problem:

(16) maximize $\sum_{v} \hat{f}(m(\varphi(v)))$
subject to $\varphi = \sum_i c_i \phi_i + \sum_j a_j h_j$
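A sketch of evaluating the objective (16), with scipy's gaussian_kde standing in for the density estimator (the names follow the discussion above and are otherwise this sketch's assumptions; F0 is the vertex-level block of the candidate chain map, with columns indexed by model vertices and rows by vertices of the data complex):

    import numpy as np
    from scipy.stats import gaussian_kde

    def density_objective(F0, coords, data):
        # data: d x N array of sample points; coords: one row of Euclidean
        # coordinates per vertex of the data complex (the map e above)
        f_hat = gaussian_kde(data)              # kernel density estimate
        total = 0.0
        for v in range(F0.shape[1]):
            w = F0[:, v]
            if np.abs(w).sum() == 0:
                continue                        # vertex mapped to the zero chain
            image = (w @ coords) / w.sum()      # weighted average m(phi(v))
            total += f_hat(image)[0]
        return total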

5.4. Mapper and Contractible Data Sets

Another extension of the idea of hom-based mappings is to shape matching or data fusion. The discussion of gene expression data in the introduction provides the motivation for this. Suppose that we have two data sets $X$ and $Y$, which for now are just sets of points. If these data sets arise from sampling a null-homotopic space, then Proposition 2 tells us that the homological mapping technique described in the previous section will not be very helpful. However, there is a way to remedy this.

In the paper [Mapper], the authors describe a multiscale decomposition method called mapper. The idea is that one has a filter function $h : X \to \mathbb{R}$, and one clusters the set $X$ according to the preimages of overlapping intervals. Although we do not fully describe the method here, mapper allows a statistical practitioner to obtain a multiscale representation of the data at different resolutions. The reader is advised to consult [Mapper] for a detailed discussion.

Given outputs of the mapper algorithm, we are interested in constructing maps between different representations of the same dataset, either from different filter functions or from different scale parameters. This problem is relevant in biological settings in which the obtained datasets are highly dependent on the measuring procedures used. We combine our homological mapping procedure with mapper into a structure mapping method as follows:

  • Given two data sets $X$ and $Y$, and two filter functions $h_X : X \to \mathbb{R}$ and $h_Y : Y \to \mathbb{R}$, we run the mapper algorithm to obtain reduced simplicial models $M_X$ and $M_Y$. For the 1-dimensional version of mapper on a contractible data set, $M_X$ and $M_Y$ will be trees.

  • Since the filter functions $h_X$ and $h_Y$ are also defined on $M_X$ and $M_Y$, we take the quotient of each tree by the set of vertices which are local maxima of the filter functions. This yields two graphs $G_X$ and $G_Y$. In general, these two graphs may have cycles.

  • We run the homological mapping algorithm to obtain an optimal chain map between $G_X$ and $G_Y$.

6. Implementation and Results

6.1. Software

The above ideas were implemented as described below in a new version of the JavaPlex software package [javaPlex]. Further optimization and scripting was performed using Matlab. The computation of the homology of the hom-complex was performed in exact arithmetic over a field, and the optimization was performed in floating point.

We also note that this entire mapping procedure can be performed in a persistent setting, where in addition to the natural grading of the chain complex by dimension, we have a grading by filtration. For a good discussion of persistent homology, the reader is invited to look at [Carlsson_09] and [Carlsson_04]. The one modification we make is that we only select representatives $\phi_i$ which correspond to nontrivial persistence intervals. In other words, in the expression $f = \sum_i c_i \phi_i + \sum_j a_j h_j$, we set the coefficients $c_i$ equal to 1 for significant intervals and 0 for nonsignificant intervals.

6.2. Visualization

The visualizations in this section show various examples of mappings between simplicial complexes. The domain complexes are on the left, and the codomains are on the right. Colors of the complexes are computed as follows:

  • The color of the domain complex is fixed. We start with a map $c$ taking the vertices of $A$ to their RGB color values. The color of a $k$-simplex for $k > 0$ is defined to be the average of the colors of its vertices. In other words, $c([v_0, \dots, v_k]) = \frac{1}{k+1} \sum_{i=0}^{k} c(v_i)$.

  • To compute the color of a simplex $\tau$ in $B$ under the map $f$, we define $c(\tau)$ by $c(\tau) = c(f^*(\tau))$, where we extend $c$ linearly over chains in $A$, and where $f^*$ is the adjoint of $f$. This is analogous to the definition of the pushforward of a measure; a short sketch follows.
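Concretely, with F the matrix of $f$ (rows indexed by simplices of $B$, columns by simplices of $A$) and the domain colors stored as rows of an RGB array, the pushforward coloring is a single matrix product (a sketch; any rescaling, such as that mentioned in Figure 3, is applied afterwards):

    import numpy as np

    def pushforward_colors(F, colors_A):
        # c_B(tau) = sum_sigma F[tau, sigma] * c_A(sigma)
        return F @ colors_A      # (n_B x n_A) times (n_A x 3) RGB rows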

6.3. Examples

In Figure 2, we show an example of a homotopy representative between a circle with 8 vertices and a circle with 4 vertices. In order to compute the map, a random corner point of the admissible set $\mathcal{A}$ was selected. In other words, the map was a random extremal point of the polytope of maps minimizing the maximum row and column sums. The computed map is given in (17), where the block in the upper left corner is the map on the 0-skeleton and the block in the lower right corner is the map on the 1-skeleton:

(17)

In Figure 3 we show an example of a map computed by minimizing the Alexander-Whitney function with quadratic loss. In Figure 4, the figure on the left is a simplicial complex created using the lazy-witness construction from a sample of 500 points on a trefoil knot [Witness]. A map to a circle with 4 vertices is shown. In Figures 5-7, we explore the applications described in section 5: computing a map to a manifold, density maximization, and contractible datasets. We refer the reader to the captions for more information regarding these.

Figure 2. Simple example of a visualization of a chain map. The map was computed by selecting a random extremal point of the polytope $\mathcal{A}$. An artefact of the visualization method described in section 6.2 is that the colors in the figure on the right are more intense than those on the left. This is due to the fact that the rows of the matrix in equation (17) sum to greater than 1.
Figure 3. This example shows an icosahedron being mapped to an octahedron. This map was constructed by performing the Alexander-Whitney optimization with quadratic loss. Note that the map was rescaled to prevent the colors in the codomain from being washed-out as in Figure 2.
Figure 4. The shape on the left was created by first randomly sampling 500 points on a trefoil knot. From this, the lazy-witness construction was used to construct a filtered simplicial complex as described in [Witness] on a landmark set of 40 points constructed by sequential max-min selection. The mapping was obtained by randomly selecting 100 vertices of the polytope , and then choosing the one which had greatest 2-norm to 1-norm ratio.
Figure 5. Manifold map example. The domain, , consists of a simplicial circle with 60 vertices, and the codomain, , consists of a circle embedded in the plane. Given a map , we may compute the embedded coordinates for the points in as described in section 5.2. The object here is to find a map from to that minimizes the total distortion across the 1-skeleton of the domain. On the left, the images of the domain points are shown as crosses, whereas the codomain points are shown as circles. On the right, we show the relationship between the original angular coordinates for points in the domain on the horizontal axis, versus the computed angular coordinates on the vertical axis.
Figure 6. Density maximization example. In the above figure, the codomain points consist of a random sample from the unit circle, and the domain complex is an idealized circle with 10 vertices. The locations of the domain points are computed by selecting the homotopy representative that maximizes the density of the image points as described in section 5.3.
Figure 7. This figure shows the homological mapping algorithm applied to mapper outputs. The mapper outputs are quotiented out by maxima of the filtration function. For this example, we used an eccentricity filter.

7. Concluding Remarks

In this paper we have discussed a method for computing maps between two simplicial complexes that respect their homological structure. The computation is done in a two-stage process: first a parameterization is obtained for the homotopy classes of chain maps, and then an optimization procedure is run to select one of the maps from the affine parameterization. We have also demonstrated the method on various examples. Some key distinguishing features in comparison with traditional statistical dimensionality reduction and mapping techniques include:

  • The domain and codomain data sets are not required to be Euclidean spaces, or even metric spaces.

  • Conventional linear and nonlinear dimensionality reduction methods rely on the fact that the data can be somehow unfolded into a convex subset of Euclidean space. The homological method presented is designed to preserve nontrivial topological structure.

  • Unlike the method of circular coordinates, or various other surface mapping algorithms, in principle the method presented in this paper is not restricted by the dimension or structure of either the domain or codomain spaces.

Nevertheless, the mapping technique in its current state suffers from a few shortcomings:

  • There is no universal optimization problem which produces geometrically satisfying maps in all cases. Depending on the application or situation, a practitioner might want to use different objective functions or constraints.

  • Since the computation relies on the construction of the hom-complex, the fundamental problem size, given simplicial complexes with $m$ and $n$ simplices, is the product $mn$. This leads to somewhat poor algorithmic complexity in comparison with first-order methods and has limited the sizes of the examples presented.

A key step to improving the applicability of hom-complex based mappings would be to alleviate the problems with its algorithmic efficiency. It would be interesting to investigate what optimizations would enable this method to scale to datasets of more reasonable size. Despite these shortcomings, the examples in this paper are designed to be a proof-of-concept for hom-complex based mappings.

References
