Volume Optimal Cycle

Volume Optimal Cycle: Tightest representative cycle of a generator on persistent homology

Ippei Obayashi
Abstract.

This paper shows a mathematical formalization, algorithms and computation software of volume optimal cycles, which are useful to understand geometric features shown in a persistence diagram. Volume optimal cycles give us concrete and optimal homologous structures, such as rings or cavities, on a given data. The key idea is the optimality on -chain complex for a th homology generator. This optimality formalization is suitable for persistent homology. We can solve the optimization problem using linear programming. For an alpha filtration on , volume optimal cycles on an -th persistence diagram is more efficiently computable using merge-tree algorithm. The merge-tree algorithm also gives us a tree structure on the diagram and the structure has richer information. The key mathematical idea is Alexander duality.

1. Introduction

Topological Data Analysis (TDA) [1, 2], which clarifies the geometric features of data from the viewpoint of topology, is developed rapidly in this century both in theory and application. In TDA, persistent homology and its persistence diagram (PD) [3, 4] are important tool for TDA. Persistent homology enables us to capture multiscale topological features effectively and quantitatively. Fast computation softwares for persistent homology are developed [5, 6] and many applications are achieved such as materials science [7, 8, 9], sensor networks [10], evolutions of virus [11], and so on. From the viewpoint of data analysis, a PD has some significant properties: translation and rotation invariance, multiscalability and robustness to noise. PDs are considered to be compact descriptors for complicated geometric data.

th homology encodes dimensional geometric structures of data such as connected components (), rings (), cavities (), etc. th persistent homology encodes the information about dimensional geometric structures with their scale. A PD, a multiset111A multiset is a set with multiplicity on each point. in , is used to summarize the information. Each point in a PD is called a birth-death pair, which represents a homologous structure in the data, and the scale is encoded on x-axis and y-axis.

Figure 1. The 1st PD for the atomic configuration of amorphous silica in [7], reproduced from the simulation data. The data is provided by Dr. Nakamura.

Typical workflow of the data analysis with persistent homology is as follows:

  1. Construct a filtration from data

    • Typical input data is a point cloud, a finite set of points in and a typical filtration is an alpha filtration

  2. Compute the PD from the filtration

  3. Analyze the PD to investigate the geometric features of the data

In the last part of the above workflow, we often want to inversely reconstruct a geometric structure corresponding each birth-death pair on the PD, such as a ring or a cavity, into the original input data. Such inverse analysis is practically important for the use of PDs. For example, we consider the 1st PD shown in Fig. 1 from the atomic configuration of amorphous silica computed by molecular dynamical simulation [7]. In this PD, there are some characteristic bands , and these bands correspond to typical geometric structures in amorphous silica. To analyze the PD more deeply, we want to reconstruct rings corresponding such birth-death pairs in the original data. In the paper, optimal cycles, one of such inverse analysis methods, are effectively used to clarify such typical structures.

Figure 2. A simplicial complex with one hole.

A representative cycle of a generator of the homology vector space has such information, but it is not unique and we want to find a better cycle to understand the homology generator for the analysis of a PD. For example, Fig. 2(a) has one homology generator on , and cycles , , and shown in Fig. 2 (b), (c), and (d) are the same homologous information. However, we consider that is best to understand the homology. Optimization problems on homology are used to find such a representative cycle. We can find the “tightest” representative cycle under a certain formalization. Such optimization problems have been widely studied under various settings[12, 13, 14], and two concepts, optimal cycles[15] and volume optimal cycles[16], have been successfully applied to persistent homology. The optimal cycle minimizes the size of the cycle, while the volume optimal cycle minimizes the internal volume of the cycle. Both of these two methods give a tightest cycle in different senses. The volume optimal cycles for persistent homology have been proposed in [16] under the restriction of dimension. We can use them only for -th persistent homology embedded in , but under the restriction, there is an efficient computation algorithm using Alexander duality.

In this paper, we generalize the concept of volume optimal cycles on any persistent homology and show the computation algorithm. The idea in [16] is not applied to find a volume optimal ring (a volume optimal cycle for ) in a point cloud in but our method is applicable to such a case. In that case, optimal cycles are also applicable, but our new algorithm is simpler, faster for large data, and gives us better information.

The contributions of this paper are as follows:

  • The concept of volume optimal cycles is proposed to identify good representatives of generators in persistent homology. This is useful to understand a persistence diagram.

    • The concept has been already proposed in [16] in a strongly limited sense about dimension and this paper generalize it.

    • Optimal cycles are also usable for the same purpose, but the algorithm in this paper is easier to implement, faster, and gives better information.

      • Especially, children birth-death pairs shown in Section 6 are available only with volume optimal cycles.

  • Mathematical properties of volume optimal cycles are clarified.

  • Effective computation algorithms for volume optimal cycles are proposed.

  • The algorithm is implemented and some examples are computed by the program to show the usefulness of volume optimal cycles.

The rest of this paper is organized as follows. The fundamental ideas such as persistent homology and simplicial complexes are introduced in Section 2. In Section 3 the idea of optimal cycles is reviewed. Section 4 is the main part of the paper. The idea of volume optimal cycles and the computation algorithm in a general setting are presented in Section 4. Some mathematical properties of volume optimal cycles are also shown in this section. In Section 5 we show some special properties of -th persistent homology in and the faster algorithm. We also explain tree structures in -th persistent homology. In Section 6, we compare volume optimal cycles and optimal cycles. In Section 7 we show some computational examples by the proposed algorithms. In Section 8, we conclude the paper.

2. Persistent homology

In this section, we explain some preliminaries about persistent homology and geometric models. Persistent homology is available on various general settings, but we mainly focus on the persistent homology on a filtration of simplicial complexes, especially an alpha filtration given by a point cloud.

2.1. Persistent homology

Let be a filtration of topological spaces where is a subset of or . That is, holds for every . Then we define th homology vector spaces whose coefficient is a field and homology maps for all induced by inclusion maps . The family is called the th persistent homology. The theory of persistent homology enables us to analyze the structure of this family.

Under some assumptions, is uniquely decomposed as follows [3, 4]:

where with . Here, consists of a family of vector spaces and linear maps:

This means that for each there is a dimensional hole in and it appears at , persists up to and disappears at . In the case of , the dimensional hole never disappears on . is called a birth time, is called a death time, and the pair is called a birth-death pair. When is a filtration of finite simplicial/cell/cubical complexes on with (we call a finite filtration under the condition), such a unique decomposition exists.

When we have the unique decomposition, the th persistence diagram of , , is defined by a multiset

and the 2D scatter plot or the 2D histogram of is often used to visualize the diagram.

We investigate the detailed algebraic structure of persistent homology for the preparation. For simplicity, we assume the following condition on .

Condition 1.

Let be a finite simplicial complex. For any , is a subcomplex of .

Under the condition,

(1)

is a filtration of complexes. For a general finite filtration, we can construct a filtration satisfying Condition 1 by ordering all simplices properly. Let be the boundary operator on and be a boundary operator of . Cycles and boundaries are defined by the kernel of and the image of , and th homology vector spaces are defined by . Condition 1 says that if is a -simplex,

(2)

holds. From the decomposition theorem and (2), for each birth-death pair , we can find such that

(3)
(4)
(5)
(6)
(7)

where . (6) holds only if . This is a homology generator that persists from to . is called the persistence cycles for . An algorithm of computing a PD actually finds persistence cycles from a given filtration. The persistence cycle of is not unique, therefore we want to find a “good” persistence cycle to find out the geometric structure corresponding to each birth-death pair. That is the purpose of the volume optimal cycle, which is the main topic of this paper. We remark that the condition (7) can be easily proved from (3-6) and the decomposition theorem, and hence we only need to show (3-6) to prove that given are persistence cycles.

2.2. Alpha filtration

One of the most used filtrations for data analysis using persistent homology is an alpha filtration [2, 17]. An alpha filtration is defined from a point cloud, a set of finite points . The alpha filtration is defined as a filtration of alpha complexes and they are defined by a Voronoi diagram and a Delaunnay triangulation.

The Voronoi diagram for a point cloud , which is a decomposition of into Voronoi cells , is defined by

The Delaunnay triangulation of , , which is a simplicial complex whose vertices are points in , is defined by

where is the -simplex whose vertices are . Under the assumption of general position in the sense of [17], the Delaunnay triangulation is a simplicial decomposition of the convex hull of and it has good geometric properties. The alpha complex with radius parameter , which is a subcomplex of , is defined as follows:

where is the closed ball whose center is and whose radius is . A significant property of the alpha complex is the following homotopy equivalence to the -ball model.

where is the geometric realization of . The alpha filtration for is defined by . Figure 3 illustrates an example of a filtration by -ball model and the corresponding alpha filtration. The 1st PD of this filtration is . Since there are such that for any , we can treat the alpha filtration as a finite filtration .

Figure 3. An -ball model and the corresponding alpha filtration. Each red simplex in this figure appears at the radius parameter .

We mention an weighted alpha complex and its filtration [18]. An alpha complex is topologically equivalent to the union of -balls, while an weighted alpha complex is topologically equivalent to the union of -balls, where depends on each point. The weighted alpha filtration is useful to study the geometric structure of a point cloud whose points have their own radii. For example, for the analysis of atomic configuration, the square of ionic radii or Van der Waals radii are used as .

3. Optimal cycle

First, we discuss an optimal cycle on normal homology whose coefficient is . Figure 2(a) shows a simplicial complex whose 1st homology vector space is isomorphic to . In Fig. 2(b), (c), and (d), , , and have same information about . That is, . However, we intuitively consider that is the best to represent the hole in Fig. 2 since is the shortest loop in these loops. Since the size of a loop is equal to

and this is “norm”222 For a finite dimensional - or - vector space whose basis is , the norm is defined by . Mathematically this is not a norm since it is not homogeneous, but in information science and statistics, it is called norm. 333 On a -vector space, any norm is not defined mathematically, but it is natural that we call this norm. , we write it . Here, is the solution of the following problem:

The minimizing is called the optimal cycle for . From the definition of homology, we can rewrite the problem as follows:

(8) minimize

Now we complete the formalization of the optimal cycle on a simplicial complex with one hole.

Figure 4. A simplicial complex with two holes.

How about the case if a complex has two or more holes? We consider the example in Fig. 4. From and , we try to find and using a similar formalization. If we apply the optimization (8) to each and , and are found. How can we find from and ? The problem is a hole represented by , therefore we “fill” that hole and solve the minimization problem. Mathematically, filling a hole corresponds to considering instead of and the following optimization problem gives us the required loop .

minimize

When you have a complex that has many holes, you can apply the idea repeatedly to find all optimal cycles. The idea of optimal cycles obviously applied th homology for any .

3.1. How to compute an optimal cycle

Finding a basis of a homology vector space is not a difficult problem for a computer. We prepare a matrix representation of the boundary operator and apply matrix reduction algorithm. Please read [19] for the detailed algorithm. Therefore the problem is how to solve the above minimizing problem.

In general, solving a optimization problem on a linear space is a difficult problem. The problem is a kind of combinatorial optimization problems. They are well studied but it is well known that such a problem is sometimes hard to solve on a computer.

One approach is using linear programming, used in [20]. Since optimization problem on is hard, we use as a coefficient. For coefficient, norm also means the size of loop and optimization is natural for our purpose. However, optimization is also a difficult problem. Therefore we replace norm to norm. It is well known in the fields of sparse sensing and machine learning that optimization gives a good approximation of optimization. That is, we solve the following optimization problem instead of (8).

(9) minimize

This is called a linear programming and we can solve the problem very efficiently by good solvers such as cplex444https://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/ and Clp555https://projects.coin-or.org/Clp.

Another approach is using integer programming, used in [12, 15]. norm optimization gives a good approximation, but maybe the solution is not exact. However, if all coefficients are restricted into or in the optimization problem (9), the norm and norm is identical, and it gives a better solution. This restriction on the coefficients has another advantage that we can understand the optimal solution in more intuitive way. Such an optimization problem is called integer programming. Integer programming is much slower than linear programming, but some good solvers such as cplex and Clp are available for integer programming.

3.2. Optimal cycle for a filtration

Now, we explain optimal cycles on a filtration to analyze persistent homology shown in [15]. We start from the example Fig. 5.

Figure 5. A filtration example for optimal cycles.

In the filtration, a hole appears at and disappear at , another hole appears at and appears at . The 1st PD of the filtration is . The persistence cycles , are computable by the algorithm of persistent homology and we want to find or to analyze the hole corresponding to the birth-death pair . The hole has been already dead at and remains alive at , so we can find or to solve the following optimization problem:

minimize

In this case, is chosen because . By generalizing the idea, we show Algorithm 1 to find optimal cycles for a filtration 666 In fact, in [15], two slightly different algorithms are shown, and this algorithm is one of them. . Of course, to solve the optimization problem in Algorithm 1, we can use the computation techniques shown in Section 3.1.

Compute and persistence cycles
Choose by a user
Solve the following optimization problem
minimize
Algorithm 1 Computation of optimal cycles on a filtration

4. Volume optimal cycle

In this section, we propose volume optimal cycles, a new tool to characterize generators appearing in persistent homology. In this section, we will show the generalized version of volume optimal cycles and the computation algorithm. The limited version of volume optimal cycles shown in [16] will be explained in the next section.

We assume Condition 1 and consider the filtration . A persistent volume for is defined as follows.

Definition 2.

is a persistent volume for if satisfies the following conditions:

(10)
(11)
(12)

where , , and is the dual basis of cochain , i.e. is the linear map on satisfying for any : -simplex.

Note that the persistent volume is defined only if the death time is finite.

The volume optimal cycle for and the optimal volume for the pair are defined as follows.

Definition 3.

is the volume optimal cycle and is the optimal volume for if is the solution of the following optimization problem.

minimize , subject to (10), (11), and (12).

The following theorem ensures that the optimization problem of the volume optimal cycle always has a solution.

Theorem 4.

There is always a persistent volume of any .

The following theorem ensures that the volume optimal cycle is good to represent the homology generator corresponding to .

Theorem 5.

Let be all persistence cycles for . If is a persistent volume of , are also persistence cycles for .

Intuitively say, a homology generator is dead by filling the internal volume of a ring, a cavity, etc., and a persistent volume is such an internal volume. The volume optimal cycle minimize the internal volume instead of the size of the cycle.

Proof of Theorem 4.

Let be a persistence cycle satisfying (3-6). Since

we can write as follows.

(13)

where . Note that the coefficient of in can be normalized as in (13). Now we prove that is a persistent volume. From and , we have and this means that for all . From , we have and therefore , and the right hand side is not zero since . Therefore satisfies all conditions (10-12). ∎

Proof of Theorem 5.

We prove the following arguments. The theorem follows from these arguments.

The condition (11), , means . The condition (12), , means . Since and , we have and this finishes the proof. ∎

4.1. Algorithm for volume optimal cycles

To compute the volume optimal cycles, we can apply the same strategies as optimal cycles. Using linear programming with coefficient and norm is efficient and gives sufficiently good results. Using integer programming is slower, but it gives better results.

Now we remark the condition (12). In fact it is impossible to handle this condition by linear/integer programming directly. We need to replace this condition to for sufficiently small and we need to solve the optimization problem twice for and . However, as mentioned later, we can often remove the constraint (12) to solve the problem and this fact is useful for faster computation.

We can also apply the following heuristic performance improvement technique to the algorithm for an alpha filtration by using the locality of an optimal volume. The simplices which contained in the optimal volume for , are contained in a neighborhood of . Therefore we take a parameter , and we use instead of to reduce the size of the optimization problem, where is the ball of radius whose center is the centroid of . Obviously, we cannot find a solution with a too small . In Algorithm 2, is properly chosen by a user but the computation software can automatically increase when the optimization problem cannot find a solution.

We also use another heuristic for faster computation. To treat the constraint (12), we need to apply linear programming twice for positive case and negative case. In many examples, the optimized solution automatically satisfies (12) even if we remove the constrain. There is an example in which the corner-cutting does not work (shown in 4.2), but it works well in many cases. One way is that we try to solve the linear programming without (12) and check the (12), and if (12) is satisfied, output the solution. Otherwise, we solve the linear programming twice with (12).

The algorithm to compute a volume optimal cycle for an alpha filtration is Algorithm 2.

procedure Volume-Optimal-Cycle()
     Compute the persistence diagram
     Choose a birth-death pair by a user
     Solve the following optimization problem:
minimize
     if we find the optimal solution  then
         if  then
              return and
         else
              Retry optimization twice with the additional constrain:
         
     else
         return the error message to the user to choose larger .      
Algorithm 2 Algorithm for a volume optimal cycle

If your filtration is not an alpha filtration, possibly you cannot use the locality technique. However, in that case, the core part of the algorithm works fine and you can use the algorithm.

4.2. Some properties about volume optimal cycles

In this subsection, we remark some properties about volume optimal cycles.

First, the volume optimal cycle for a birth-death pair is not unique. Figure 6 shows such an example. In this example, and both (b) and (c) is the optimal volumes of the birth-death pair . In this filtration, any weighted sum of (b) and (c) with weight and () in the sense of chain complex is the volume optimal cycle of if we use as a coefficient and norm. However, standard linear programing algorithms choose an extremal point solution, hence the algorithms choose either or and our algorithm outputs either (b) or (c).

Figure 6. An example of non-unique volume optimal cycles.
Figure 7. An example of the failure of the computation of the volume optimal cycle if the constrain (12) is removed.

Second, by the example in Fig 7, we show that the optimization problem for the volume optimal cycle may give a wrong solution if the constrain (12) is removed. In this example, are birth-death pairs in the 1st PD, and the volume optimal cycle for is () in Fig. 7, but the algorithm gives () if the constrain (12) is removed.

5. Volume optimal cycle on -th persistent homology

In this section, we consider a triangulation of a convex set in and its -th persistent homology. More precisely, we assume the following conditions.

Condition 6.

A simplicial complex in satisfies the following conditions.

  • Any -simplex in is a face of an -simplex

  • is convex

For example, an alpha filtration satisfies the above conditions if the point cloud has more than points and satisfies the general position condition. In addition, we assume Condition 1 to simplify the statements of results and algorithms.

The thesis [16] pointed out that -th persistent homology is isomorphic to 0th persistent cohomology of the dual filtration by the Alexander duality under the assumption. By using this fact, the thesis defined volume optimal cycles under different formalization from ours. The thesis defined a volume optimal cycle as an output of Algorithm 3. In fact, the two definitions of volume optimal cycles are equivalent on -th persistent homology. 0th persistent cohomology is deeply related to the connected components, and we can compute the volume optimal cycle by linear computation cost. The thesis also pointed out that -th persistent homology has a tree structure called persistence trees (or PH trees).

In this section, we always use as a coefficient of homology since using makes the problem easier.

The following theorems hold.

Theorem 7.

The optimal volume for is uniquely determined.

Theorem 8.

If and are the optimal volumes for two different birth-death pairs and in , one of the followings holds:

  • ,

  • ,

  • .

Note that we can naturally regard any as a subset of -simplices of , , since we use as a homology coefficient.

From Theorem 8, we know that can be regarded as a forest (i.e. a set of trees) by the inclusion relation. The trees are called persistence trees.

We can compute all optimal volumes and persistence trees on by the merge tree algorithm (Algorithm 3). This algorithm is a modified version of the algorithm in [16]. To describe the algorithm, we prepare a directed graph where is a set of nodes and is a set of edges. In the algorithm, an element of is a -cell in and an element of is a directed edge between two -cells, where is the -cell in the one point compactification space . An edge has extra data in and we write the edge from to with extra data as . Since the graph is always a forest through the whole algorithm, we always find a root of a tree which contains a -cell in the graph by recursively following edges from . We call this procedure Root().

procedure Compute-Tree()
     initialize and
     for  do
         if  is a -simplex then
              add to
         else if  is a -simplex then
              let and are two -cells whose common face is
              
              
              if  then
                  continue
              else if  then
                  Add to
              else
                  Add to                             return
Algorithm 3 Computing persistence trees by merge-tree algorithm

The following theorem gives us the interpretation of the result of the algorithm to the persistence information.

Theorem 9.

Let be a result of Algorithm 3. Then the followings hold.

  1. The optimal volume for is all descendant nodes of in

  2. The persistence trees is computable from . That is, is a child of if and only if there are edges .

The theorems in this section can be proven from the following facts:

  • From Alexander duality, for a simplicial complex in ,

    holds.

    • is required for one point compactification of .

    • More precisely, we use the dual decomposition of .

  • By applying above Alexander duality to a filtration, -th persistent homology is isomorphic to -th persistent cohomology of the dual filtration.

  • On a cell complex , a basis of -th cohomological vector space is given by

    where is the connected component decomposition of 0-cells in .

  • Merge-tree algorithm traces the change of connectivity in the filtration, and it gives the structure of 0-the persistent cohomology.

We prove the theorems in Appendix A.

5.1. Computation cost for merge-tree algorithm

In the algorithm, we need to find the root from its descendant node. The naive way to find the root is following the graph step by step to the ancestors. In the worst case, the time complexity of the naive way is where is the number of of -simplices, and total time complexity of the algorithm becomes . The union-find algorithm [21] is used for a similar data structure, and we can apply the idea of union-find algorithm. By adding a shortcut path to the root in a similar way as the union-find algorithm, the amortized time complexity is improved to almost constant time777 More precisely, the amortized time complexity is bounded by the inverse of Ackermann function and it is less than 5 if the data size is less than . Therefore we can regard the time complexity as constant. . Using the technique, the total time complexity of the Algorithm 3 is .

6. Comparison between volume optimal cycles and optimal cycles

In this section, we compare volume optimal cycles and optimal cycles. In fact, optimal cycles and volume optimal cycles are identical in many cases. However, since we can use optimal volumes in addition to volume optimal cycles, we have more information than optimal cycles. One of the most prominent advantage of volume optimal cycles is children birth-death pairs, explained below.

6.1. Children birth-death pairs

In the above section, we show that there is a tree structure on an -th persistence diagram computed from a triangulation of a convex set in . Unfortunately, such a tree structure does not exist in a general case. However, in the research of amorphous solids by persistent homology[7], a hierarchical structure of rings in is effectively used, and it will be helpful if we can find such a structure on a computer. In [7], the hierarchical structure was found by computing all optimal cycles and searching multiple optimal cycles which have common vertices. However, computing all optimal cycles or all volume optimal cycles is often expensive as shown in Section 7.4 and we require a cheaper method. The optimal volume is available for that purpose. When the optimal volume for a birth-death pair is , the children birth-death pairs of is defined as follows:

This is easily computable from a optimal volume with low computation cost.

Now we remark that if we consider -th persistent homology in , the children birth-death pairs of is identical to all descendants of in the tree structure. This fact is known from Theorem 8. This fact suggests that we can use children birth-death pairs as a good substitute for the tree structure appearing on in . The ability of children birth-death pairs is shown in Section 7.2, the example of amorphous silica.

6.2. Some examples in which volume optimal cycles and optimal cycles are different

We show some differences between optimal cycles and volume optimal cycles on a filtration. In Fig 8, the 1st PD of this filtration is . The optimal cycle of is since but the volume optimal cycle is . In this example, is better than to represent the birth-death pair . The example is deeply related to Theorem 5. Such a theorem does not hold for optimal cycles and it means that an optimal cycle may give misleading information about a birth-death pair. This is one advantage of volume optimal cycles compared to optimal cycles.

Figure 8. A filtration whose optimal cycle and volume optimal cycle are different.

In Fig. 9 and Fig. 10, optimal cycles and volume optimal cycles are also different. In Fig. 9, the optimal cycle is but the volume optimal cycle . In Fig. 10, the optimal cycle for is but the volume optimal cycle is .

Figure 9. Another filtration whose optimal cycle and volume optimal cycle are different.
Figure 10. Another filtration whose optimal cycle and volume optimal cycle are different.

In Fig 11, the 1st PD is and we cannot define the volume optimal cycle but can define the optimal cycle. In general, we cannot define the volume optimal cycle for a birth-death pair with infinite death time. If we use an alpha filtration in , such a problem doest not occur because a Delaunnay triangulation is always acyclic. But if we use another type of a filtration, we possibly cannot use volume optimal cycles. That may be a disadvantage of volume optimal cycles if we use a filtration other than an alpha filtration, such as a Vietoris-Rips filtration.

Figure 11. A filtration without a volume optimal cycle.

One more advantage of the volume optimal cycles is the simplicity of the computation algorithm. For the computation of the optimal cycles we need to keep track of all persistence cycles but for the volume optimal cycles we need only birth-death pairs. Some efficient algorithms implemented in phat and dipha do not keep track of such data, hence we cannot use such softwares to compute the optimal cycles without modification. By contrast we can use such softwares for the computation of the volume optimal cycles.

7. Example

In this section, we will show the example results of our algorithm. In all of these examples, we use alpha or weighted alpha filtrations.

For all of these examples, optimal volumes and volume optimal cycles are computed on a laptop PC with 1.2 GHz Intel(R) Core(TM) M-5Y71 CPU and 8GB memory on Debian 9.1. Dipha [5] is used to compute PDs, CGAL888http://www.cgal.org/ is used to compute (weighted) alpha filtrations, and Clp [22] is used to solve the linear programming. Python is used to write the program and pulp999https://github.com/coin-or/pulp is used for the interface to Clp from python. Paraview101010https://www.paraview.org/ is used to visualize volume optimal cycles.

If you want to use the software, please contact with us. Homcloud111111http://www.wpi-aimr.tohoku.ac.jp/hiraoka_labo/research-english.html, a data analysis software with persistent homology developed by our laboratory, provides the algorithms shown in this paper. Homcloud provides the easy access to the volume optimal cycles. We can visualize the volume optimal cycle of a birth-death pair only by clicking the pair in a PD on Homcloud’s GUI.

7.1. 2-dimensional Torus

The first example is a 2-dimensional torus in . 2400 points are randomly scattered on the torus and PDs are computed. Figure 12 shows the 1st and 2nd PDs. The 1st PD has two birth-death pairs and and the 2nd PD has one birth-death pair far from the diagonal. These birth-death pairs correspond to generators of and .

Figure 12. The 1st and 2nd PDs of the point cloud on a torus.
Figure 13. Volume optimal cycles for and in and in on the torus point cloud.

Figure 13 shows the volume optimal cycles of these three birth-death pairs using Algorithm 2. Blue lines show volume optimal cycles, red lines show optimal volumes, black lines show for each birth death pair (we call this simplex the death simplex). Black dots show the point cloud. By the figure, we understand how homology generators appear and disappear in the filtration of the torus point cloud. The computation times are 25sec, 33sec, and 7sec on our laptop PC.

By using Algorithm 3, we can also compute volume optimal cycles in . In this example, the computation time by Algorithm 3 is about 2sec. This is much faster than Algorithm 2 even if Algorithm 3 computes all volume optimal cycles.

7.2. Amorphous silica

In this example, we use the atomic configuration of amorphous silica computed by molecular dynamical simulation as a point cloud and we try to reproduce the result in [7]. In this example, we use weighted alpha filtration whose weights are the radii of atoms. The number of atoms are 8100, 2700 silicon atoms and 5400 oxygen atoms.

Figure 1 shows the 1st PD. This diagram have four characteristic areas , , , and . These areas correspond to the typical ring structures in the amorphous silica as follows. Amorphous silica consists of silicon atoms and oxygen atoms and the network structure is build by covalent bonds between silicons and oxygens. has rings whose atoms are \ce -Si-O-Si-O- where \ce- is a covalent bond between a silicon atom and a oxygen atom. has triangles consisting of \ceO-Si-O. has triangles consisting of three oxygen atoms appearing alternately in \ce-O-Si-O-Si-O-. has many types of ring structures, but one typical ring is a quadrangle consists of four oxygen atoms appearing alternately in \ce-O-Si-O-Si-O-Si-O-.

Figure 14 shows the volume optimal cycles for birth-death pairs in and . In this figure oxygen (red) and silicon (blue) atoms are also shown in addition to volume optimal cycles, optimal volumes, and death simplices. We can reproduce the result of [7] about ring reconstruction.

Figure 14. Volume optimal cycles in amorphous silica in , and (from left to right).

We also know that the oxygen atom rounded by the green circle in this figure is important to determine the death time. The death time of this birth-death pair is determined by the radius of circumcircle of the black triangle (the death simplex), hence if the oxygen atom moves away, the death time becomes larger. The oxygen atom is contained in another \ce -Si-O-Si-O- ring structure around the volume optimal cycle (the blue ring). By the analysis of the optimal volume, we clarify that such an interaction of covalent bond rings determines the death times of birth-death pairs in . This analysis is impossible for the optimal cycles, and the volume optimal cycles enable us to analyze persistence diagrams more deeply.

Figure 15. Children birth-death pairs. Red circles are children birth-death pairs of the green birth-death pair.

Figure 15 shows the children birth-death pairs of the green birth-death pair. The rings corresponding to these children birth-death pairs are subrings of the large ring corresponding to the green birth-death pair. This computation result shows that a ring in has subrings in , , and . The hierarchical structure of these rings shown in [7]. We can easily find such a hierarchical structure by using our new algorithm.

The computation time is 3 or 4 seconds for each volume optimal cycle on the laptop PC. The computation time for amorphous silica is much less than that for 2-torus even if the number of points in amorphous silica is larger than that in 2-torus. This is because the locality of volume optimal cycles works very fine in the example of amorphous silica.

7.3. Face centered cubic lattice with defects

The last example uses the point cloud of face centered cubic (FCC) lattice with defects. By this example, we show how to use the persistence trees computed by Algorithm 3. The point cloud is prepared by constructing perfect FCC lattice, adding small Gaussian noise to each point, and randomly removing points from the point cloud.

Figure 16. (a) The 2nd PD of the perfect FCC lattice with small Gaussian noise. (b) The 2nd PD of the lattice with defects.

Figure 16(a) shows the 2nd PD of FCC lattice with small Gaussian noise. (i) and (ii) in the figure correspond to octahedron and tetrahedron cavities in the FCC lattice. In materials science, these cavities are famous as octahedron sites and tetrahedron sites. Figure 16(b) shows the 2nd PD of the lattice with defects. In the PD, birth-death pairs corresponding to octahedron and tetrahedron cavities remain ((i) and (ii) in Fig 16(b)), but other types of birth-death pairs appear in this PD. These pairs correspond to other types of cavities generated by removing points from the FCC lattice.

Figure 17(a) shows a tree computed by Algorithm 3. Red markers are nodes of the tree, and lines between two markers are edges of the tree, where upper left nodes are ancestors and lower right nodes are descendants. The tree means that the largest cavity corresponding to most upper-left node has sub cavities corresponding descendant nodes. Figure 17(b) shows the volume optimal cycle of the most upper-left node, (c) shows the volume optimal cycles of pairs in (i), and (d) shows the volume optimal cycles of pairs in (ii). Using the algorithm, we can study the hierarchical structures of the 2nd PH.

Figure 17. A persistence tree and related volume optimal cycles. (a) The persistence tree whose root is . (b) The volume optimal cycle of the root pair. (c) The volume optimal cycles of birth-death pairs in (i) which are descendants of the root pair. (d) The volume optimal cycles of birth-death pairs in (ii) which are descendants of the root pair.

7.4. Computation performance comparison with optimal cycles

We compare the computation performance between optimal cycles and volume optimal cycles. We use OptiPers for the computation of optimal cycles for persistent homology, which is provided by Dr. Escolar, one of the authors of [15]. OptiPers is written in C++ and our software is mainly written in python, and python is much slower than C++, so the comparison is not fair, but suggestive for the readers.

We use two test data. One test data is the atomic configuration of amorphous silica used in the above example. The number of points is 8100. Another data is the partial point cloud of the amorphous silica. The number of points is 881. We call these data the large data and the small data. Table 1 shows the computation time of optimal cycles/volume optimal cycles for all birth-death pairs in the 1st PD by OptiPers/Homcloud.

optimal cycles (OptiPers) volume optimal cycles (Homcloud)
the small data 1min 17sec 3min 9sec
the large data 5hour 46min 4hour 13min
Table 1. Computation time of optimal cycles and volume optimal cycles on the large/small data.

For the small data, OptiPers is faster than Homcloud, but to the contrary, for the large data, Homcloud is faster than OptiPers. This is because the performance improvement technique using the locality of the optimal volume works fine for the large data, but for the small data the technique is not so effective and the overhead cost using python is dominant for Homcloud. This benchmark shows that the volume optimal cycles have an advantage about the computation time when an input point cloud is large.

8. Conclusion

In this paper, we propose the idea of volume optimal cycles to identify good geometric realizations of homology generators appearing in persistent homology. Optimal cycles are proposed for that purpose in [15], but our method is faster for large data and gives better information. Especially, we can reasonably compute children birth-death pairs only from a volume optimal cycle. Volume optimal cycles are already proposed under the limitation of dimension in [16], and this paper generalize the idea.

Our idea and algorithm are widely applicable to and useful for the analysis of point clouds in by using the (weighted) alpha filtrations. Our method gives us intuitive understanding of PDs. In [23], such inverse analysis from a PD to its original data is effectively used to study many geometric data with machine learning on PDs and our method is useful to the combination of persistent homology and machine learning.

In this paper, we only treat simplicial complex, but our method is also applicable to a cell filtration and a cubical filtration. Our algorithms will be useful to study sublevel or superlevel filtrations given by 2D/3D digital images.

Appendix A Proofs of Section 5

The theorems shown in this section are a kind of folklore theorems. Some researchers about persistent homology probably know the fact that the merge-tree algorithm gives a 0th PD, and the algorithm is available to compute an -th PD using Alexander duality, but we cannot find the literature for the complete proof. [16] stated that the algorithm also gives the tree structure on an -th PD, but the thesis does not have the complete proof. Therefore we will show the proofs here.

Alexander duality says that for any good topological subspace of , the -th homology of and -th cohomology of have the same information. In this section, we show Alexander duality theorem on persistent homology. In this section, we always use as a coefficient of homology and cohomology.

a.1. Persistent cohomology

The persistent cohomology is defined on a decreasing sequence of topological spaces. The cohomology vector spaces and the linear maps induced from inclusion maps define the sequence

and this family of maps is called persistent cohomology . The decomposition theorem also holds for persistent cohomology in the same way as persistent homology and we define the th cohomologous persistence diagram using the decomposition.

a.2. Alexander duality

Before explaining Alexander duality, we show the following proposition about the dual decomposition.

Proposition 10.

For any oriented closed -manifold and its simplicial decomposition , there is a decomposition of , , satisfying the followings:

  1. is a cell complex of .

  2. There is a one-to-one correspondence between and . For , we write the corresponding cell in as .

  3. for any .

  4. If a subcomplex of ,

    is a subcomplex of .

  5. We consider the chain complex of and , let and be boundary operators on those chain complexes, and let and be matrix representations of and , i.e. and . Then is the transpose of .

This decomposition is called the dual decomposition of . One example of the dual decomposition is a Voronoi decomposition with respect to a Delaunnay triangulation. Using the dual decomposition, we can define the map from to for as the linear extension of , where are -simplices of , are corresponding -cells of , and is the dual basis of .

The map satisfies the equation

(14)

where is the coboundary operator on from Proposition 10. The map induces the isomorphism , and the isomorphism is called Poincaré duality.

Using the dual decomposition, we show Alexander duality theorem.

Theorem 11.

For an -sphere , its simplicial decomposition