On \varepsilon Approximations of Persistence Diagrams

On Approximations of Persistence Diagrams

Jonathan Jaquette Department of Mathematics, Hill Center-Busch Campus, Rutgers University, 110 Frelinghusen Rd, Piscataway, NJ 08854-8019, USA jaquette@math.rutgers.edu  and  Miroslav Kramár Department of Mathematics, Hill Center-Busch Campus, Rutgers University, 110 Frelinghusen Rd, Piscataway, NJ 08854-8019, USA miroslav@math.rutgers.edu
Abstract.

Biological and physical systems often exhibit distinct structures at different spatial/temporal scales. Persistent homology is an algebraic tool that provides a mathematical framework for analyzing the multi-scale structures frequently observed in nature. In this paper a theoretical framework for the algorithmic computation of an arbitrarily good approximation of the persistent homology is developed. We study the filtrations generated by sub-level sets of a function , where is a CW-complex. In the special case , we discuss implementation of the proposed algorithms. We also investigate a priori and a posteriori bounds of the approximation error introduced by our method.

The first author’s research was funded in part by AFOSR Grant FA9550-09-1-0148 and NSF Grant DMS-0915019.
The second author’s research was funded in part by NSF Grants DMS-1125174 and DMS-0835621.
copyright: ©: American Mathematical Society

1. Introduction

The formation of complex patterns can be encountered in almost all sciences. Many patterns observed in nature are extremely complicated and it is usually impossible to completely understand them using analytical methods. Often a coarser but computationally tractable description is needed. In recent years computational topology [8, 14] has become widely recognized as an important tool for quantifying complex structures. Computational homology has been used to quantitatively study spatio-temporal chaos [9] as well as to analyze complicated flow patterns generated by Rayleigh-Bénard convection [19, 20], time evolution of the isotropically compressed granular material [15] and complex microstructures generated in binary metal alloys through a process called spinodal decomposition [10, 21]. Unfortunately homology groups can be extremely sensitive to small perturbations [17]. Persistent homology [7] overcomes this problem [4] and provides a more powerful tool for studying evolution of complex patterns in the granular media [17, 16] and fluid dynamics [18].

In the applications mentioned above the pattern is given by a sub- or super-level set of a real valued function . Therefore analyzing the patterns usually involves numerical study based on a suitable discretization. It is important to know when the homology groups of a sub-level set of can be inferred from a discrete approximation of this set. Probabilistic results, for a function , are presented in [24]. An alternative approach is to construct a cubical approximation of the sub-level set with the same homology groups [5]. The algorithm presented in [5] can handle functions defined on a unit square and uses a uniform grid on . If the resolution of a grid is fine enough, then the sub-level set can be approximated by a collection of the grid elements that intersect the sub-level set. This often leads to an unnecessary large complex. The size of the complex can be reduced by using a randomized version of the algorithm [3]. Using a regular CW complex to construct the nodal approximation [6] simplifies the homology computations.

In this paper we develop a method for approximating the sub-level sets of a continuous function , where is a regular CW complex. Note that the sub-level set does not have to be a CW complex. The following definition provides conditions on a CW structure of under which the singular homology groups of the CW complex

are isomorphic to the homology groups . Throughout the paper, we use the notation “” to mean “ for all ”.

Definition 1.1.

Let be a finite regular complex with a CW structure . Suppose that is a continuous function and . We say that the CW structure is compatible with if every cell satisfies one of the following conditions:

  1. ,

  2. ,

  3. there exists a deformation retraction of the set onto .

According to the following theorem, proved in Section 3, the singular homology groups can be obtained by computing the cellular homology groups of the complex .

Theorem 1.2.

Let be a finite regular CW complex. If the CW decomposition of is compatible with , then the map , induced by the inclusion , is an isomorphism.

Due to computational complexity of computing homology, it is extremely important that the number of cells in the complex is as small as possible. To achieve this goal we use a multi-scale CW structure . The diameter (size) of the cells in varies with local complexity of the pattern. Larger cells are used at places with smaller complexity while small cells might be necessary at the places where spacial scales of the pattern are small. To produce a multi-scale CW structure we start from a coarse CW structure of and keep refining the cells that do not satisfy any of the conditions in Definition 1.1. For the number of cells in is smaller than the number of cells in the cubical complex used in [5, 6].

To facilitate the refinement process we introduce self similar grids. The size of the grid elements might vary but all the grid elements of are homeomorphic to . The resolution of the self similar grid can be dynamically adjusted at different parts of . To increase the resolution, each grid element can be refined into a union of smaller complexes homeomorphic to . We represent a self similar grid by a tree structure. The root of this tree corresponds to the whole complex . The -th level nodes represent the complexes obtained by refinements of . Finally, the leaves correspond to the grid elements of .

The tree representation of provides a basis for a memory efficient data structure for storing the complex and its boundary operator. We stress that the boundary operator is not stored as a large matrix. Instead it is computed using the boundary operator corresponding to the coarse (unrefined) CW structure, the tree structure representing the self similar grid and geometric information about intersections of the grid elements.

While constructing a compatible CW structure for an arbitrary regular CW complex might be challenging, it is straight forward when , where . The coarse CW structure of , employed in this paper, is given by standard decomposition of a cube into vertices, edges and higher dimensional faces (cells). A self similar grid on consists of dyadic cubes of different sizes. If every cube (grid element) in satisfies the following definition, then every cell in the CW structure generated by the grid satisfies one of the conditions in Definition 1.1. Moreover, if a cell satisfies Condition (3) in Definition 1.1, then the function can be taken to be non-increasing in .

Definition 1.3.

Let and be an dimensional dyadic cube. We say that is -verified if for some coordinate vectors , where , and for every dimensional cell of the cube which is orthogonal to the vectors , then either or .

A cell is considered to be orthogonal to a collection of vectors if the projection of onto is zero dimensional. The condition in Definition 1.3 can be checked using interval arithmetics. Therefore the homology groups can be evaluated on a computer. However, if is close to a critical value, then using interval arithmetics to guarantee that the cubes surrounding the critical point satisfy Definition 1.3 might require an extremely fine resolution or might not be possible at all. Another problem with this approach is that the topology of the set depends on the choice of the threshold , and small changes of can lead to large changes of the topology as demonstrated in [17]. So the particular choice of the threshold might be hard to justify. Both of these problems can be overcome by using the persistent homology [7, 8].

Persistent homology relates the homology groups of for different values of . It reduces the function to a collection of points in the plane. This collection of points is called a persistence diagram and denoted by . Each point in the persistence diagram encodes a well defined geometric feature of . Hence the persistent homology provides a finer description of the pattern than the homology groups of any one sub-level set. The set of all persistence diagrams can be turned into a complete metric space by using a Bottleneck distance between the diagrams [4, 22].

The main contribution of this paper is a theoretical framework for constructing an approximation of the persistence diagram . For and a continuous function we define an approximate filtration of as follows.

Definition 1.4.

Let be a finite regular CW complex and a continuous function. Suppose that a sequence of real numbers has the following properties:

  1. and ,

  2. and ,

  3. for .

Then the collection of sub-level sets is called an approximate filtration of .

It follows from the stability results [4] that the persistence diagram of any approximate filtration of is close to . More precisely

(1.1)

Construction of the persistence diagram requires determining homology groups of the sub-level sets in some approximate filtration of . Freedom in choosing an approximate filtration of allows us to stay away from the critical values. In order to compute the persistence diagram of an approximate filtration one might be tempted to replace this filtration by its cellular counterpart . However there is no guarantee that and so may not be a filtration. (See Figure 2 for a counterexample.) In order to construct a cellular filtration we have to require that the CW structures are commensurable.

Definition 1.5.

Let be a collection of CW structures on . We say that they are commensurable if for every such that then either or .

The CW structures constructed using a self similar grid are always commensurable. The CW complexes define by form an approximate cellular filtration of . This filtration has the following property.

Theorem 1.6.

Let be an approximate cellular filtration of corresponding to . Suppose that for every cell which satisfies Condition (3) in Definition 1.1, the function is non-increasing in . Then the persistence diagrams of the filtrations and are equal.

Corollary 1.7.

Let be an approximate cellular filtration of . If is the persistence diagram of this filtration, then .

The persistence diagram can be computed even if we fail to construct CW structures compatible with for some of the thresholds . In this case the distance can be evaluated a posteriori. The required resources for computing the persistence homology can be further reduced if zig-zag persistence [1] is used.

The remainder of this paper is organized as follows. In Section 2 we recall the definition of a CW-complex, homology and persistent homology. Theorem 1.2 is proven in Section 3. In Section 4 we address the Inequality (1.1). A precise definition of a self similar grid is given in Section 5. These grids are employed in Section 6 to build a compatible structure. A memory efficient data structure for storing the CW complex and its boundary operator is presented in Section 7. In the last section we use our method to compute the approximate persistence diagram of a two dimensional Fourier series with modes and discuss the a posteriori error estimates.

2. Background

In this section we recall definitions of a CW-complex [12], homology [14] and persistent homology [7]. Notation introduced in this section is used throughout the paper. We start by introducing a regular CW-complex.

Let be a Hausdorff space. A (finite) regular CW decomposition of is a (finite) set of subspaces of with the following properties:

  1. is a partition of , that is and

  2. Every is homeomorphic to some euclidean space

  3. For every -cell there exists a continuous embedding such that where is the closure of , and .

The number is well determined and is called the dimension of . The sets which are homeomorphic to are the -cells. The boundary of a cell is defined by . The union is the -skeleton of the CW decomposition. The map is called a characteristic map for and an attaching map. The space is called a regular CW complex if there exists a regular CW decomposition of .

Cellular homology is useful for computing the homology groups of CW complexes. For a regular CW-complex the -chains are defined by

There is a one-to-one correspondence between the basis elements of and the -cells of . For every dimensional cell the map generates a monomorphism . For a fixed generator the set

is a basis of . For every we denote its corresponding geometric cell by .

The boundary operator is defined by the following composition

where is the connecting homomorphism from the long exact sequence of the pair and is the quotient map. In the above chosen basis of , the boundary operator is completely determined by the values on the basis elements and can be uniquely expressed as

The coefficient is called an incidence number of the cells. Regularity of the complex implies that the incidence number is non-zero only for the pairs of cells such that . Moreover in this case.

The cellular complex corresponding to is defined by . The singular homology of is isomorphic to the cellular homology of , which is given by

We close this section with a short overview of persistent homology, following the presentation given in [7]. Let be a function defined on a CW complex . We denote its sub-level sets by . The function is called tame if the homology groups of every sub-level set have finite ranks and there are only finitely many values across which the homology groups are not isomorphic. Let be the values for which the homology of and differ for all . We define an interleaved sequence such that , and , for . For each there is a natural inclusion of the set into and we denote the induced homomorphism between the corresponding homology groups by

Persistent homology makes use of the above mentioned maps to compare the homology groups of for different values of . We say that the homology class is born at if it does not come from a class in , that is . The class dies at if is in but not in . We denote the birth-death pair for by . The -th persistence diagram is defined to be the multi set of the points containing:

  1. one point for each pair .

  2. infinitely many copies of the points at the diagonal.

If has dimension , then the persistence diagram for the function is the collection . Actually, persistence diagrams can be defined for continuous function , where is the realization of a simplicial complex as a topological space[2]. In this case the persistence diagrams can contain accumulation points at the diagonal. In other words there might be a sequence of off diagonal points converging to the diagonal. Using the same reasoning as in [2] the same is true if is a finite regular CW-complex.

Differences between the persistence diagrams can be assessed using a bottleneck distance defined as follows.

Definition 2.1.

Let and be persistence diagrams. The bottleneck distance between and is defined to be

where and ranges over all bijections.

3. Homology of sub-level sets

Figure 1. (a) A simple function . The sub-level set corresponds to the part of at which the value of is below the semitransparent plane. (b) The set is shown in purple. The black vertices, edges and squares bounded by the edges define a CW structure on . (c) The set corresponding to the decomposition .

In this section we present a method for computing the homology of sub-level sets. Let be a CW-complex and a continuous function; we are interested in the homology of the sub-level set . Suppose that is a CW structure on , then we define the cellular approximation to be

(3.1)

Note that is a closed subset of , and since is a finite union of cells, then it is a sub-complex of . For the function shown in Figure 1 the set is homologous to . According to Theorem 1.2, the homology groups of and are isomorphic if the CW-structure is compatible with . We close this section by proving Theorem 1.2.

Proof.

If (using singular homology), then the fact that is an isomorphism follows from the long exact sequence of the pair. We prove that by induction on the dimension of .

For a zero dimensional complex , the set is equal to and so trivially . Now we show that for an dimensional complex under the assumption that the statement is true for any complex with dimension less than . Let us first suppose that contains a single dimensional cell . Then . Define to be the skeleton of . While is not a CW complex in general, we define an analog of its skeleton by . By the inductive hypothesis, the homology groups are trivial.

The CW structure is compatible with , so the cell satisfies one of the conditions in Definition 1.1. If , then and . Hence we obtain the set equality and so by the inductive assumption the relative homology groups stay trivial.

Now suppose that . In this case both and . In general, we have the equality , so it follows from the inductive assumption that . We would like to apply the excision theorem to obtain . However, the assumptions of the singular excision theorem are not actually satisfied because is not in the interior of . This problem can be overcome by expanding and inside using a small neighborhood of . Note that for small enough the homology groups of and do not change.

If satisfies Condition (3) in Definition 1.1, then there exists a deformation retraction of the set onto . Note that is a continuous map on the closed set , and the identity map on the closed set is also continuous. Since these two maps agree on the intersection of their domains, we can extend to a deformation retraction of onto . Existence of also implies that there exists some such that , hence and . By induction again.

Finally suppose that is an arbitrary dimensional complex. Let be its dimensional cells. We repeat the above process times. At each step we add a cell to

4. Persistent homology

In order to exactly compute the persistence diagrams of a continuous bounded function, one must identify all thresholds for which the homology of and differ for all . If is a Morse function, then the thresholds are the critical values of , which one may identify after locating every point where the Jacobian vanishes. Such a root finding problem is frequently studied in nonlinear optimization [11]. In applications though, it may not always be feasible to rigorously find every root of . In such a situation, it would be desirable to still be able to compute the persistence diagrams of , even at the cost of losing some accuracy.

In this section we present a framework for computing an approximation of the persistence diagram . For a given we construct a cellular approximation of the filtration corresponding to . The persistence diagram of this approximation is close to , that is . We start by choosing an approximate filtration (cf. Definition 1.4). The following lemma shows that the bottleneck distance between the persistent diagram for the approximate filtration of and the persistence diagram of is at most .

Lemma 4.1.

Let be a CW complex and a continuous bounded function. If is a persistence diagram of any approximate filtration of then .

Proof.

To prove the lemma we have to show that for every -th persistence diagram there exists a bijection from to such that for every .

If and , then we define . Clearly in this case. The collection of sets is an filtration. Therefore, if , then there exist indices and such that . So the topological feature corresponding to is not present in the sub-level set but it can be found in . Hence for the approximate filtration there exists a unique topological feature , corresponding to , born at . By the same reasoning the feature dies at and we define . Again the distance , and the number of points in is smaller or equal to the number of points in . So is a bijection. ∎

It is appealing to construct a cellular approximate filtration by replacing the sets with their cellular surrogates introduced in the previous section. However the collection might not be a filtration, as shown in Figure 2. Namely, the set is not a subset of because of different choices of the compatible CW structures for the sub-level sets. To avoid this problem we construct a common refinement of the CW structures. Our refinement process relies on the fact that the CW structures are commensurable (cf. Definition 1.5).

Definition 4.2.

Let be a commensurable collection of CW structures on . Then the set

is called the common refinement of the CW structures .

The CW structures and shown in Figures 2 (b-c) are commensurable and their common refinement is . In this paper we will use the common refinement of all the structures to compute the persistent homology. If zig-zag persistence [1] is used, then it suffices to consider the common refinements of consecutive CW structures and . We stress that the common refinement does not need to be compatible with any of the sub-level sets . Hence we need to develop a more sophisticated approach for building a cellular representation of the approximate filtration. Before introducing a cellular representation of the filtration let us prove that the common refinement is a regular CW structure on .

Lemma 4.3.

Let be a common refinement of the commensurable CW structures on . Then is a CW structure on .

Proof.

First we show that every point is contained in a unique cell . Note that the set is not empty. If , then by Definition 1.5 either or . This implies that the cell is an element of . Moreover for any the cell and if . Therefore . Suppose by contradiction that is not a unique cell in containing . Then there exist such that . It follows from Definition 4.2 that and . So .

We showed that is a disjoint cellular decomposition of . Naturally, every cell is homeomorphic to some euclidean space . Now we need to define the characteristic maps for the cells. Every cell in belongs to at least one . We identify the characteristic map for with the characteristic map given by one (arbitrary but fixed) CW structure . Since every cell in is covered by a collection of smaller cells in , the boundary of a cell is contained in the union of cells in of dimension less than . Hence the set of cells equipped with these maps is a CW structure on . ∎

Figure 2. (a) A contour plot of a function . (b) The sub-level set and its cellular approximation obtained by the compatible CW structure with four squares ( dimensional cells.) (c) The sub-level set and its cellular approximation obtained by the compatible CW structure with a single dimensional cell. (d) The sub-level set with the refined approximation , which is equal to .

Since the common refinement does not have to be compatible with any of the sub-level sets, we cannot use it to generate a cellular approximation of the filtration . Instead we approximate the sub-level sets by

(4.1)
Definition 4.4.

Let be an approximate filtration of a continuous function and a collection of commensurable CW structures, where is compatible with for every . Then the collection of sets is called an approximate cellular filtration of corresponding to .

Remark 4.5.

Every set is a sub-complex of , where the CW structure of is given by the common refinement . Moreover is a filtration, that is

Our goal is to show that the persistence diagrams of the filtrations and are equal. This is true if the diagram shown below in Figure 3, with the maps induced by inclusion, commutes and the maps are isomorphisms. It follows from Remark 4.5 and the functorality of homology that the diagram commutes. To prove that are isomorphisms we will have to strengthen the definition of a compatible CW structure.

Figure 3. The persistence diagrams of the filtrations and are equal if the above diagram commutes and the maps , induced by inclusion, are isomorphisms.

First, we illustrate the idea on a simple example shown in Figure 2. The cells that are not in are inside of the single 2 dimensional cell and additionally . Furthermore because the image contains both values above and below . The fact that the CW structure is compatible with implies that there is a deformation retraction from onto . If is non-increasing in , then it can be restricted to a deformation retraction from onto . By the same argument as in the proof of Theorem 1.2 the inclusion map from to induces an isomorphism on the homology level. We now prove Theorem 1.6, which treats the general case.

Proof.

The persistence diagrams of the filtrations and are equal if the diagram in Figure 3 commutes and the maps , induced by inclusion, are isomorphisms. We already showed that the diagram computes. Now we have to prove that every is an isomorphism. According to Theorem 1.2, the inclusion of each into induces an isomorphism. Hence it suffices to show that the inclusion from into also induces an isomorphism in homology.

The set is a sub-complex of . So

for some cells . We denote . First we divide the cells in into disjoint groups. Then we show that by removing all of the cells in the first group we obtain a CW sub-complex of and the map induced by the inclusion of this complex into is an isomorphism. We prove that the maps are isomorphisms by inductively removing the cells in all the other groups.

Before grouping the cells into disjoint classes let us show that for every cell there exists and , where , such that:

  1. ,

  2. .

By the construction of , each cell is a subset of a unique cell and . The fact that implies for some It follows from the construction of the cellular approximation that . Otherwise we would have . Therefore there exists a cell which satisfies the properties (1) and (2).

Let be the cells with the properties (1) and (2) corresponding to the cells in and let be a set of the maximal cells. That is, a cell is removed from the set if there is another cell such that . We order the cells in by dimension so that for . We say that the cells are in the same group if there exists some such that both and are subsets of .

We will remove the cells from in steps. During the -th step all of the cells such that are removed. Note that . Due to the ordering of the cells in we get a CW-complex at each step. To show this let be contained in some . We show that is not in the boundary of any cell in . The cell for some and . It follows from the construction of that every cell such that is not in . So every cell in whose closure intersects the cell is contained in some cell such that and was removed before . Therefore at each step we get a closed subset of the cells from which is a CW-complex. Moreover, at each step we can express the complex as a finite union of cells from , so it also has a CW structure induced by .

First we suppose that and prove that the map , induced by inclusion, is an isomorphism. To do so we apply the five lemma to the following diagram.

Horizontal maps are given by the cellular Mayer-Vietoris sequences for the CW complexes given by the middle terms. The vertical maps are induced by inclusion. The first two vertical maps are induced by identity maps and are thereby isomorphisms. To satisfy the assumptions of the five lemma we have to show that the map is also an isomorphism.

We recall that the cell