A Topological Study of Functional Data and Fréchet Functions of Metric Measure Spaces
Research supported in part by NSF grants DMS-1722995 and DMS-1723003.
We study the persistent homology of both functional data on compact topological spaces and structural data presented as compact metric measure spaces. One of our goals is to define persistent homology so as to capture primarily properties of the shape of a signal, eliminating otherwise highly persistent homology classes that may exist simply because of the nature of the domain on which the signal is defined. We investigate the stability of these invariants using metrics that downplay regions where signals are weak. The distance between two signals is small if they exhibit high similarity in regions where they are strong, regardless of the nature of their full domains, in particular allowing different homotopy types. Consistency and estimation of persistent homology of metric measure spaces from data are studied within this framework. We also apply the methodology to the construction of multi-scale topological descriptors for data on compact Riemannian manifolds via metric relaxations derived from the heat kernel.
Keywords: persistent homology, functional data, metric-measure spaces.
This paper investigates ways of producing robust, informative summaries of both functional data on topological spaces and structural data in metric spaces, two problems that permeate the sciences and applications. Many problems involving structural data may be formulated in the realm of metric-measure spaces (mm-spaces) $(X,d,\alpha)$, where $(X,d)$ is a metric space and $\alpha$ is a Borel probability measure on $X$. For example, a dataset $\{x_1,\dots,x_n\}\subset X$ may be analyzed via the associated empirical measure $\alpha_n=\frac{1}{n}\sum_{i=1}^{n}\delta_{x_i}$, where $\delta_{x_i}$ is the Dirac measure based at $x_i$.
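The mean of a Euclidean distribution is one of the most basic statistics, and it may be generalized to triples $(X,d,\alpha)$ via the Fréchet function $V_\alpha\colon X\to\mathbb{R}$ defined by
$$V_\alpha(x)=\int_X d^2(x,y)\,d\alpha(y).$$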
In the Euclidean case, it is well known that the mean is the unique minimizer of $V_\alpha$, provided that $\alpha$ has finite second moment. In the more general setting, a Fréchet mean is a minimizer of $V_\alpha$, not necessarily unique, thus leading to the concept of Fréchet mean set. One also may consider the local minima of $V_\alpha$, which often yield valuable additional information about the distribution. The Fréchet mean has received a great deal of attention from many authors (cf. [gk73, patbat03, arn-bar13]). However, Fréchet mean sets may be difficult to estimate or compare quantitatively, limiting their applicability in data analysis.
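To make the non-uniqueness of Fréchet means concrete, the following minimal sketch evaluates $V_\alpha$ on a finite mm-space given by a distance matrix. The example (shortest-path distances on a 4-cycle with the uniform measure) and the helper names `frechet_function` and `frechet_mean_set` are illustrative assumptions, not taken from the paper; minimizing over the points of $X$ is exact here only because $X$ itself is finite.

```python
import numpy as np


def frechet_function(D, weights):
    """V(x_i) = sum_j w_j * D[i, j]**2 on a finite mm-space.

    D is an (n, n) distance matrix and weights is a probability vector."""
    D = np.asarray(D, dtype=float)
    w = np.asarray(weights, dtype=float)
    return (D ** 2) @ w


def frechet_mean_set(D, weights, tol=1e-12):
    """Indices of the points of X that minimize V (the Fréchet mean set)."""
    V = frechet_function(D, weights)
    return np.flatnonzero(V <= V.min() + tol)


# Shortest-path distances on the 4-cycle a-b-c-d-a.
D = np.array([[0, 1, 2, 1],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [1, 2, 1, 0]])
w = np.full(4, 0.25)              # uniform measure on the four vertices
print(frechet_function(D, w))     # [1.5 1.5 1.5 1.5]
print(frechet_mean_set(D, w))     # [0 1 2 3]
```

By symmetry every vertex of the cycle attains the minimum value $1.5$, so the Fréchet mean set is all of $X$.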
Beyond means, a wealth of structural information about $\alpha$ typically resides in $V_\alpha$ (cf. [diazetal18, diazetal18a]). One of our primary goals is to carry out a topological study of the Fréchet function via persistent homology to uncover and summarize properties of the shape of probability distributions on metric spaces. The study is done in the general setting of Fréchet functions of order $p\ge1$, defined as
$$V_{\alpha,p}(x)=\int_X d^p(x,y)\,d\alpha(y).$$
Some authors refer to $V_{\alpha,p}$ as the $p$-eccentricity function of $\alpha$ (cf. [carlsson09]). Closely related to $V_{\alpha,p}$ is the function $\sigma_{\alpha,p}\colon X\to\mathbb{R}$ defined as
$$\sigma_{\alpha,p}(x)=\big(V_{\alpha,p}(x)\big)^{1/p},$$
which we refer to as the $p$-centrality function of $\alpha$. We opt to construct topological summaries for $\alpha$ using $\sigma_{\alpha,p}$, rather than $V_{\alpha,p}$, because barcodes or persistence diagrams derived from $p$-centrality are more amenable to analysis. A key property of Fréchet and centrality functions is that they attenuate the influence of outliers in persistent homology computed from large random samples of a distribution, a fact that is implicit in the stability and consistency results proven in this paper.
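For a discrete measure such as the empirical measure $\alpha_n$, the integrals defining $V_{\alpha,p}$ and $\sigma_{\alpha,p}$ reduce to finite weighted sums. The following is a minimal sketch, assuming Euclidean data in $\mathbb{R}^d$ (uniform weights by default, optional weights for a general discrete measure); the function names are hypothetical.

```python
import numpy as np


def frechet_p(queries, sample, p=2.0, weights=None):
    """Fréchet function of order p, V_{alpha,p}(x) = sum_i w_i |x - x_i|^p,
    for a discrete measure alpha = sum_i w_i delta_{x_i} on R^d."""
    queries = np.atleast_2d(np.asarray(queries, dtype=float))
    sample = np.atleast_2d(np.asarray(sample, dtype=float))
    n = sample.shape[0]
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    # Euclidean distances between every query point and every sample point.
    dist = np.linalg.norm(queries[:, None, :] - sample[None, :, :], axis=-1)
    return (dist ** p) @ w


def centrality_p(queries, sample, p=2.0, weights=None):
    """p-centrality function sigma_{alpha,p} = V_{alpha,p} ** (1/p)."""
    return frechet_p(queries, sample, p, weights) ** (1.0 / p)


S = [[0.0, 0.0], [2.0, 0.0]]               # empirical measure on two points
print(centrality_p([[1.0, 0.0]], S))        # midpoint: both distances equal 1
print(centrality_p([[0.0, 0.0]], S))        # sqrt((0 + 4) / 2) = sqrt(2)
```

Note that scaling the metric by $\lambda>0$ scales $\sigma_{\alpha,p}$ by $\lambda$ but scales $V_{\alpha,p}$ by $\lambda^p$, so $\sigma_{\alpha,p}$ is measured in the same units as the distance.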
We view $\sigma_{\alpha,p}$ as a “signal” on $X$ and first work in the more general setting of functional data on compact topological spaces. Subsequently, we specialize to centrality functions of mm-spaces. In the functional setting, a data object is a triple $(X,\tau,f)$, where $(X,\tau)$ is a topological space and $f\colon X\to\mathbb{R}$ is a continuous function. The collection of all such triples is denoted $\mathcal{F}$. We adopt the convention that large values of $f$ correspond to weak signals. Generally, this is more consistent with the behavior of Fréchet functions, which attain larger values in data deserts, far away from concentrations of probability mass (cf. [diazetal18]).
In defining persistent homology [frosini92, robins99, edelsbrunner-etal, carlsson09, edehar08, cohenetal07] invariants for $(X,f)$, there are a few drawbacks in following the standard procedure of using the filtration of $X$ given by the sublevel sets of $f$ directly. For example, a region of non-trivial topology where the signal is weak might contribute highly persistent homology classes, masking the “real” topology of the signal and producing a confounding effect. A closely related problem is that, once we reach the maximum value of $f$, the sublevel sets of $f$ coincide with the full space $X$, so that the global homology of $X$ is the dominant information captured in a homological barcode regardless of the “support” of the signal. The number of bars of infinite length for $k$-dimensional homology is the $k$th Betti number of $X$. At the barcode level, we could alleviate the problem by trimming barcodes at the maximum value of $f$. However, stability results for truncated barcodes obtained by factoring through the stability of persistent homology of the usual sublevel set filtration would be somewhat weak because they would require similarity of the barcodes prior to truncation. Thus, we introduce a metric on the space $\mathcal{F}$ of functional topological spaces with respect to which small distances indicate similarity where signals are strong. We investigate the stability of persistent homology in this setting. We take the homotopy type distance $d_{HT}$, introduced by Frosini et al. in [frosinietal17], as well as a slight variant of it, as our point of departure. A stability theorem is proven in [frosinietal17] for persistent homology (of sublevel set filtrations) with respect to $d_{HT}$ for functional data defined on domains with the same homotopy type.
We combine these distances with a topological cone construction, used as a counterpart to barcode trimming at the level of functional spaces, that has the virtue of making all domains contractible, so that the stability theorem of [frosinietal17] becomes applicable to functional data with arbitrary domains. Coning also allows us to highlight the topology of regions where signals are strong and downplay topological differences where signals are subdued. Previously, Cerri et al. used a related cone construction in the study of Betti numbers for multidimensional persistent homology [cerrietal08]. We also define a metric on the space of mm-spaces that downplays topological differences in regions far away from sizeable probability mass and prove stability and consistency theorems for persistent homology of centrality functions.
Functional and structural data on domains of different homotopy types are commonplace in practice. For example, functional data acquired through imaging often contain many topological defects in their domains that may require a significant amount of image processing prior to analysis. The proposed method allows us to easily deal with those defects and bypass such pre-processing steps, provided that the defects do not occur near the “support” of the signal. It is also common for analysis of functional data to involve delicate registration steps, especially in situations where images only partially correspond. The present approach circumvents registration steps, still yielding informative data summaries, provided that the “supports” of the signals are fully captured by the images. In summary, if the images in a dataset capture the regions where the important information resides, little image processing is needed.
As an illustration, we simulate this type of data. Fig. 1 shows signals with very similar shapes, but defined on domains of different homotopy types. The first row shows heat maps of (i) a “ground truth” function defined on a rectangle and with a circular shape, (ii) the same function restricted to the rectangle with an open ball removed that is “encircled” by the signal, and (iii) the same function on the rectangle with three noisy holes located in a region where the signal is subdued. The function labeled (i) has global minima along a circle, and the center of that circle is a maximum. (Recall that we adopted the convention that low values correspond to strong signals.) The second row shows the barcodes for 1-dimensional homology obtained from the sublevel set filtrations induced by the images. The barcode for the ground truth image comprises a single bar of finite persistence whose birth and death coordinates are the minimum value $m$ and the maximum value $M$ of the function, respectively. These values are highlighted by vertical dashed lines. For image (ii), we have a single bar with the same birth coordinate $m$, but of infinite length because the signal wraps around the hole in the domain. For image (iii), we have the original bar of finite length in addition to three infinite bars whose birth coordinates are close to $M$. Thus, in spite of the three signals having very similar shapes, the bottleneck distance between any pair of barcodes is infinite. The aforementioned cone construction has the effect of trimming the barcodes at $M$, as shown in the third row. This yields three barcodes that lie close together with respect to the bottleneck distance because the three noisy bars in (iii) are short-lived.
It may be instructive to contrast the present approach to persistent homology of metric measure spaces with the treatment by Blumberg et al. [blumberg14]. A basic philosophical difference is that, whereas we define the persistent homology of an mm-space directly via centrality functions and prove stability and consistency results, Blumberg et al. consider the pushforward of the product measure $\alpha^{\otimes n}$ on the $n$-fold product space $X^n$ to barcode space via the map that associates to a sample of size $n$ the persistent homology of its Vietoris-Rips complex, proving stability and concentration results in this framework. Thus, their approach is based on properties of the distribution of barcodes constructed from independent random draws from the “theoretical” distribution $\alpha$.
As an application, we construct multi-scale persistent homology descriptors for distributions on a compact Riemannian manifold $M$. We denote the geodesic distance on $M$ by $d_M$. Using diffusion distances $d_t$, $t>0$, associated with the heat kernel on $M$ (cf. [coifman, diazetal18]), we obtain metric relaxations of $(M,d_M,\alpha)$ via the 1-parameter family $(M,d_t,\alpha)$, $t>0$, of mm-spaces. (We recall the definition of $d_t$ in Section 5.) We show that this gives rise to a continuous path of persistence diagrams via the persistent homology of their centrality functions. We employ the framework developed for the topological analysis of mm-spaces to prove that this multi-scale topological descriptor is stable with respect to the Wasserstein distance on the space of Borel probability measures on $M$.
The rest of the paper is structured as follows. Section 2 develops the aforementioned metric on the space of functional topological spaces and Section 3 proves the stability of persistent homology with respect to this metric. Section 4 addresses stability and consistency of persistent homology of mm-spaces. Section 5 is devoted to a multi-scale analysis of distributions on Riemannian manifolds and Section 6 closes the paper with some additional discussion.
2 Functional Topological Spaces
Throughout the paper, we assume that all topological spaces are compact. A functional topological space (ft-space) is a triple $(X,\tau,f)$, where $f\colon X\to\mathbb{R}$ is a continuous function on the topological space $(X,\tau)$. Two ft-spaces $(X,\tau_X,f)$ and $(Y,\tau_Y,g)$ are isomorphic if there is a homeomorphism $h\colon X\to Y$ such that $f=g\circ h$. The collection of isomorphism classes of ft-spaces is denoted $\mathcal{F}$. We abuse notation and denote an element of $\mathcal{F}$ as $(X,f)$. We begin by reviewing a special case of the homotopy type distance $d_{HT}$ of [frosinietal17] that will be used in our study of functional data. We also introduce a slight variant $d_{sHT}$ of $d_{HT}$ that has better properties with respect to the proposed cone construction, as explained in detail below. We note that our notation and terminology differ from those of [frosinietal17].
2.1 The Homotopy Type Distance
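A pairing between two topological spaces $X$ and $Y$ is a pair $(\phi,\psi)$ of continuous mappings $\phi\colon X\to Y$ and $\psi\colon Y\to X$. We denote by $P(X,Y)$ the collection of all such pairings.
Let $h_0,h_1\colon X\to X$ be continuous mappings and $H$ a homotopy between $h_0$ and $h_1$; that is, a continuous mapping $H\colon X\times[0,1]\to X$ such that $H(x,0)=h_0(x)$ and $H(x,1)=h_1(x)$, $\forall x\in X$. Given $\epsilon\ge0$ and a continuous function $f\colon X\to\mathbb{R}$, $H$ is said to be an $\epsilon$-homotopy from $h_0$ to $h_1$ over $f$ if
$$f(H(x,t))\le f(h_0(x))+\epsilon,\quad\forall x\in X,\ \forall t\in[0,1].$$
Note that even if $H$ is an $\epsilon$-homotopy from $h_0$ to $h_1$ over $f$, the mapping $\bar H(x,t)=H(x,1-t)$ may not be an $\epsilon$-homotopy from $h_1$ to $h_0$ over $f$.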
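The asymmetry can be checked numerically. The sketch below is an illustrative assumption rather than an example from the paper: it takes $X=[0,1]$ with $f(x)=x$ and the homotopy $H(x,t)=(1-t)x$ from $\mathrm{id}_X$ to the constant map $0$, and it approximates the smallest admissible $\epsilon$ by a maximum over a finite grid in $X\times[0,1]$. The helper `homotopy_excess` is hypothetical.

```python
import numpy as np


def homotopy_excess(H, h0, f, xs, ts):
    """Grid estimate of the smallest eps with f(H(x, t)) <= f(h0(x)) + eps."""
    X, T = np.meshgrid(xs, ts, indexing="ij")
    return float(np.max(f(H(X, T)) - f(h0(X))))


xs = np.linspace(0.0, 1.0, 101)
ts = np.linspace(0.0, 1.0, 101)
f = lambda x: x                       # signal on X = [0, 1]
H = lambda x, t: (1.0 - t) * x        # homotopy from id to the constant map 0
H_rev = lambda x, t: t * x            # the same path traversed backwards

print(homotopy_excess(H, lambda x: x, f, xs, ts))                     # 0.0
print(homotopy_excess(H_rev, lambda x: np.zeros_like(x), f, xs, ts))  # 1.0
```

Thus $H$ is a $0$-homotopy from $\mathrm{id}_X$ to $0$ over $f$, whereas the reversed path is only a $1$-homotopy from $0$ to $\mathrm{id}_X$, since the reference value $f(h_0(x))$ changes with the direction of travel.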
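Definition 2.2. Let $(X,f)$ and $(Y,g)$ be ft-spaces, $\epsilon\ge0$, and $(\phi,\psi)\in P(X,Y)$ a pairing between $X$ and $Y$.
(i) $(\phi,\psi)$ is an $\epsilon$-pairing if $g(\phi(x))\le f(x)+\epsilon$ and $f(\psi(y))\le g(y)+\epsilon$, $\forall x\in X$ and $\forall y\in Y$.
(ii) $(\phi,\psi)$ is a strong $\epsilon$-pairing if $|g(\phi(x))-f(x)|\le\epsilon$ and $|f(\psi(y))-g(y)|\le\epsilon$, $\forall x\in X$ and $\forall y\in Y$.
Clearly, any strong $\epsilon$-pairing is an $\epsilon$-pairing.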
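For finite ft-spaces equipped with the discrete topology (an assumption made here so that every map is continuous), the conditions in Definition 2.2 reduce to finitely many inequalities, and the smallest admissible $\epsilon$ can be read off directly. The sketch below uses hypothetical helper names and a toy example; it also checks the bound $|\max f-\max g|\le\epsilon$ of Lemma 2.6 for strong pairings, but it does not test the homotopy conditions required of a matching.

```python
def pairing_eps(f, g, phi, psi):
    """Smallest eps >= 0 for which (phi, psi) is an eps-pairing.

    f, g map points of X, Y to values; phi: X -> Y and psi: Y -> X are dicts."""
    return max(max(g[phi[x]] - f[x] for x in f),
               max(f[psi[y]] - g[y] for y in g),
               0.0)


def strong_pairing_eps(f, g, phi, psi):
    """Smallest eps for which (phi, psi) is a strong eps-pairing."""
    return max(max(abs(g[phi[x]] - f[x]) for x in f),
               max(abs(f[psi[y]] - g[y]) for y in g))


f = {"a": 0.0, "b": 1.0}          # two-point space X
g = {"u": 0.4}                    # one-point space Y
phi = {"a": "u", "b": "u"}
psi = {"u": "a"}

eps = pairing_eps(f, g, phi, psi)            # 0.4
eps_s = strong_pairing_eps(f, g, phi, psi)   # 0.6
# Bound of Lemma 2.6: |max f - max g| <= eps_s.
print(eps, eps_s, abs(max(f.values()) - max(g.values())) <= eps_s + 1e-12)
```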
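Definition 2.3. Let $(X,f)$ and $(Y,g)$ be ft-spaces and $\epsilon\ge0$.
(i) An $\epsilon$-matching between $(X,f)$ and $(Y,g)$ is an $\epsilon$-pairing $(\phi,\psi)$ that satisfies:
(a) there is a $2\epsilon$-homotopy from $\mathrm{id}_X$ to $\psi\circ\phi$ over $f$;
(b) there is a $2\epsilon$-homotopy from $\mathrm{id}_Y$ to $\phi\circ\psi$ over $g$,
where $\mathrm{id}_X$ and $\mathrm{id}_Y$ denote the identity maps of $X$ and $Y$, respectively.
(ii) A strong $\epsilon$-matching between $(X,f)$ and $(Y,g)$ is a strong $\epsilon$-pairing that satisfies (a) and (b) above.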
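We use the notation $(X,f)\sim_\epsilon(Y,g)$ to indicate that there is an $\epsilon$-matching between $(X,f)$ and $(Y,g)$. The notation $(X,f)\sim^s_\epsilon(Y,g)$ indicates the existence of a strong $\epsilon$-matching. Note that there exists $\epsilon\ge0$ such that $(X,f)\sim_\epsilon(Y,g)$ if and only if $X$ and $Y$ are homotopy equivalent. The same statement holds for strong matchings.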
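Definition 2.4. Let $(X,f)$ and $(Y,g)$ be ft-spaces.
(i) (Frosini et al. [frosinietal17]) The distance $d_{HT}$ between $(X,f)$ and $(Y,g)$ is defined as $d_{HT}((X,f),(Y,g))=\inf\{\epsilon\ge0:(X,f)\sim_\epsilon(Y,g)\}$, if $X$ and $Y$ are homotopy equivalent, and $d_{HT}((X,f),(Y,g))=\infty$, otherwise.
(ii) The distance $d_{sHT}$ is defined as $d_{sHT}((X,f),(Y,g))=\inf\{\epsilon\ge0:(X,f)\sim^s_\epsilon(Y,g)\}$, if $X$ and $Y$ are homotopy equivalent, and $d_{sHT}((X,f),(Y,g))=\infty$, otherwise.
Both $d_{HT}$ and $d_{sHT}$ define extended pseudo-metrics on $\mathcal{F}$, extended meaning that they may attain the value $\infty$. The triangle inequality is easily verified.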
2.2 A Cone Construction
As explained in the Introduction, we introduce a cone construction that gives a counterpart to barcode trimming at the ft-space level. Let $X$ be a compact topological space. On the product $X\times[0,1]$, consider the equivalence relation generated by $(x,1)\sim(x',1)$, $\forall x,x'\in X$. The cone on $X$ is the quotient space $C(X)=(X\times[0,1])/\sim$. The quotient topology is denoted $\hat\tau$. The cone point is the equivalence class of any $(x,1)$, denoted $c_X$.
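Definition 2.5. Let $(X,\tau,f)$ be an ft-space. The cone on $(X,\tau,f)$, denoted $C(X,f)$, is the ft-space $(C(X),\hat\tau,\hat f)$, where $\hat f\colon C(X)\to\mathbb{R}$ is defined by
$$\hat f([x,s])=(1-s)\,f(x)+s\,f^*,$$
for $x\in X$ and $s\in[0,1]$. Here, $f^*=\max_{x\in X}f(x)$ is the maximum value of $f$.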
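Note that for any $t<f^*$, the sublevel set $\hat f^{-1}((-\infty,t])$ strong deformation retracts along cone lines to the copy of $f^{-1}((-\infty,t])$ at the base of the cone, and therefore they have the same homology. Moreover, $\hat f^{-1}((-\infty,t])=C(X)$ for $t\ge f^*$, which is contractible. Thus, any bar in a barcode whose birth coordinate is close to $f^*$ will be short-lived, with the possible exception of a single bar in 0-dimensional homology.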
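The effect of coning on 0-dimensional persistence can be seen on a combinatorial stand-in for an ft-space. The sketch below assumes that the domain is a finite graph with a function on its vertices, extended to each edge by the larger endpoint value (the lower-star convention); the helpers `h0_barcode` and `cone` are hypothetical. Coning adds an apex with value $f^*$ joined to every vertex: for $t<f^*$ the apex and its edges are absent, and for $t\ge f^*$ everything is connected, matching the behavior of $\hat f$ described above. The 2-cells of the cone are omitted because they do not affect 0-dimensional homology.

```python
import math


def h0_barcode(values, edges):
    """0-dimensional sublevel-set barcode of a vertex function on a graph.

    An edge enters at the larger of its endpoint values. Components are
    merged with union-find; by the elder rule, the component with the
    larger minimum dies at the merge value."""
    parent = list(range(len(values)))

    def find(i):
        while parent[i] != i:             # path halving
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    bars = []
    for u, v in sorted(edges, key=lambda e: max(values[e[0]], values[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        if values[ru] > values[rv]:
            ru, rv = rv, ru               # ru is the elder root (smaller minimum)
        death = max(values[u], values[v])
        if values[rv] < death:            # drop zero-length bars
            bars.append((values[rv], death))
        parent[rv] = ru                   # each root stays a minimum of its component
    roots = {find(i) for i in range(len(values))}
    bars += [(values[r], math.inf) for r in roots]
    return sorted(bars)


def cone(values, edges):
    """Add an apex carrying max(values), joined by an edge to every vertex."""
    apex = len(values)
    return values + [max(values)], edges + [(i, apex) for i in range(apex)]


# A path 0-1-2 plus an isolated vertex 3: the domain has two components.
f = [0.0, 2.0, 1.0, 0.5]
E = [(0, 1), (1, 2)]
print(h0_barcode(f, E))          # [(0.0, inf), (0.5, inf), (1.0, 2.0)]
print(h0_barcode(*cone(f, E)))   # [(0.0, inf), (0.5, 2.0), (1.0, 2.0)]
```

Before coning, the isolated vertex contributes a second infinite bar; after coning, that bar dies at $f^*=2$ and a single infinite bar survives.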
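Define the cone operator $C\colon\mathcal{F}\to\mathcal{F}$ by $(X,f)\mapsto C(X,f)$ and let $\hat d_{HT}$ and $\hat d_{sHT}$ be the pseudo-metrics on $\mathcal{F}$ induced by $d_{HT}$ and $d_{sHT}$, respectively, under this operator. In other words,
$$\hat d_{HT}((X,f),(Y,g))=d_{HT}(C(X,f),C(Y,g))\quad\text{and}\quad\hat d_{sHT}((X,f),(Y,g))=d_{sHT}(C(X,f),C(Y,g)).$$
Remark. Since $C(X)$ and $C(Y)$ are contractible, thus homotopy equivalent, we have that $\hat d_{HT}((X,f),(Y,g))<\infty$ and $\hat d_{sHT}((X,f),(Y,g))<\infty$, for any $(X,f),(Y,g)\in\mathcal{F}$.
One of the drawbacks in directly using the distance $d_{HT}$ in our analysis of functional data is that the cone operator, designed to simplify signals, may in fact increase the distance between ft-spaces, making their dissimilarities even more pronounced, as illustrated by the following example.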
Example. Let and with the topology induced by the Euclidean distance in . Let denote projection onto the second coordinate and set and , so that . Then, one may show that and .
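In spite of the obvious inequality $d_{HT}\le d_{sHT}$, coning exhibits a much better behavior with respect to $d_{sHT}$, making it a more natural metric to adopt in some situations. We close this section by showing that the cone operator is non-expansive with respect to $d_{sHT}$.
Lemma 2.6. Let $(X,f)$ and $(Y,g)$ be ft-spaces. If there is a strong $\epsilon$-pairing between $(X,f)$ and $(Y,g)$, then $|f^*-g^*|\le\epsilon$, where $f^*=\max f$ and $g^*=\max g$.
Proof. Let $(\phi,\psi)$ be such that $|g(\phi(x))-f(x)|\le\epsilon$ and $|f(\psi(y))-g(y)|\le\epsilon$, for all $x\in X$ and $y\in Y$. Pick $x_0\in X$ and $y_0\in Y$ that satisfy $f(x_0)=f^*$ and $g(y_0)=g^*$ (they exist by compactness). If $f^*\ge g^*$, then, since $g(\phi(x_0))\le g^*$, we have $0\le f^*-g^*\le f(x_0)-g(\phi(x_0))\le\epsilon$, so that $|f^*-g^*|\le\epsilon$. The same argument, using $y_0$ and $\psi$, applies if $g^*\ge f^*$, proving the lemma. ∎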
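Given a map $\phi\colon X\to Y$, we refer to $C\phi\colon C(X)\to C(Y)$, defined by $C\phi([x,s])=[\phi(x),s]$, as the cone on $\phi$. We use the notation $C\phi$, instead of $\hat\phi$, to distinguish this construction from the cone $\hat f$ on a signal (see Definition 2.5).
Lemma 2.7. Let $(X,f),(Y,g)\in\mathcal{F}$ and $\epsilon\ge0$.
(i) If $(\phi,\psi)$ is a strong $\epsilon$-pairing between $(X,f)$ and $(Y,g)$, then $(C\phi,C\psi)$ is a strong $\epsilon$-pairing between $C(X,f)$ and $C(Y,g)$.
(ii) If $(\phi,\psi)$ is a strong $\epsilon$-matching between $(X,f)$ and $(Y,g)$, then $(C\phi,C\psi)$ is a strong $\epsilon$-matching between $C(X,f)$ and $C(Y,g)$.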
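Proof. (i) From the definitions of cones and Lemma 2.6, we have that
$$\big|\hat g(C\phi([x,s]))-\hat f([x,s])\big|=\big|(1-s)\big(g(\phi(x))-f(x)\big)+s\,(g^*-f^*)\big|\le(1-s)\epsilon+s\epsilon=\epsilon.$$
Similarly, $|\hat f(C\psi([y,s]))-\hat g([y,s])|\le\epsilon$, proving the claim.
(ii) Let $(\phi,\psi)$ be a strong $\epsilon$-matching between $(X,f)$ and $(Y,g)$. By (i), $(C\phi,C\psi)$ is a strong $\epsilon$-pairing, so it suffices to verify the conditions on homotopies. Let $H$ be a $2\epsilon$-homotopy from $\mathrm{id}_X$ to $\psi\circ\phi$ over $f$. We show that $\hat H\colon C(X)\times[0,1]\to C(X)$ defined by $\hat H([x,s],r)=[H(x,r),s]$ is a $2\epsilon$-homotopy from $\mathrm{id}_{C(X)}$ to $C\psi\circ C\phi$ over $\hat f$. (It is well defined because every point with $s=1$ is mapped to the cone point.) Indeed, $\hat H([x,s],0)=[x,s]$, $\hat H([x,s],1)=[\psi(\phi(x)),s]=C\psi\circ C\phi([x,s])$, and
$$\hat f(\hat H([x,s],r))=(1-s)\,f(H(x,r))+s\,f^*\le(1-s)\big(f(x)+2\epsilon\big)+s\,f^*\le\hat f([x,s])+2\epsilon.$$
Similarly, we construct a $2\epsilon$-homotopy from $\mathrm{id}_{C(Y)}$ to $C\phi\circ C\psi$ over $\hat g$. This concludes the proof. ∎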
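The inequality $\hat d_{sHT}((X,f),(Y,g))\le d_{sHT}((X,f),(Y,g))$ holds for any $(X,f),(Y,g)\in\mathcal{F}$.
Proof. The statement is trivial if $d_{sHT}((X,f),(Y,g))=\infty$, so we assume that this distance is finite. Let $\epsilon>d_{sHT}((X,f),(Y,g))$. Then, there is a strong $\epsilon$-matching $(\phi,\psi)$ between $(X,f)$ and $(Y,g)$. By Lemma 2.7 (ii), $(C\phi,C\psi)$ is a strong $\epsilon$-matching between $C(X,f)$ and $C(Y,g)$. Thus, $\hat d_{sHT}((X,f),(Y,g))\le\epsilon$. Taking infimum over $\epsilon$, the claim follows. ∎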
3 Topology of Functional Spaces
We briefly recall the definitions of persistence modules over $\mathbb{R}$ and interleaving distance between two persistence modules. For more details, we refer the reader to [chazaletal, lesnick11]. We regard $\mathbb{R}$ as a category whose objects are the points $t\in\mathbb{R}$, with a single morphism $s\to t$ if $s\le t$ and none otherwise. A persistence module $\mathbb{V}$ (over a fixed field $\mathbb{k}$) is a functor from $\mathbb{R}$ to the category of vector spaces over $\mathbb{k}$. We use the notation $V_t$ for the vector space over $t\in\mathbb{R}$. For $s\le t$, we write $v_s^t\colon V_s\to V_t$ for the linear mapping associated with the morphism $s\to t$.
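Let $\mathbb{U}$ and $\mathbb{V}$ be persistence modules and $\epsilon\ge0$. A morphism $\Phi\colon\mathbb{U}\to\mathbb{V}$ of degree $\epsilon$ is a collection of linear mappings $\Phi_t\colon U_t\to V_{t+\epsilon}$, $t\in\mathbb{R}$, satisfying $v_{s+\epsilon}^{t+\epsilon}\circ\Phi_s=\Phi_t\circ u_s^t$, for all $s,t\in\mathbb{R}$ with $s\le t$. An $\epsilon$-interleaving between $\mathbb{U}$ and $\mathbb{V}$ is a pair $\Phi\colon\mathbb{U}\to\mathbb{V}$ and $\Psi\colon\mathbb{V}\to\mathbb{U}$ of morphisms of degree $\epsilon$ satisfying $\Psi_{t+\epsilon}\circ\Phi_t=u_t^{t+2\epsilon}$ and $\Phi_{t+\epsilon}\circ\Psi_t=v_t^{t+2\epsilon}$, $\forall t\in\mathbb{R}$. We write $\mathbb{U}\approx_\epsilon\mathbb{V}$ to indicate that there is an $\epsilon$-interleaving between the two persistence modules. The interleaving distance is defined as:
$d_I(\mathbb{U},\mathbb{V})=\inf\{\epsilon\ge0:\mathbb{U}\approx_\epsilon\mathbb{V}\}$, if some interleaving exists;
$d_I(\mathbb{U},\mathbb{V})=\infty$, if no interleaving exists.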
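A persistence module $\mathbb{V}$ is tame if $v_s^t$ has finite rank, for any $s<t$. There is a well-defined persistence diagram (or barcode), denoted $\mathrm{dgm}(\mathbb{V})$, associated with any tame $\mathbb{V}$, and the Isometry Theorem for persistence modules states that
$$d_B\big(\mathrm{dgm}(\mathbb{U}),\mathrm{dgm}(\mathbb{V})\big)=d_I(\mathbb{U},\mathbb{V})$$
if $\mathbb{U}$ and $\mathbb{V}$ are tame, where $d_B$ denotes bottleneck distance [cohenetal07, desilvaetal].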
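For intuition about $d_B$, the following brute-force sketch computes the bottleneck distance between two small diagrams consisting of finite bars only (infinite bars are assumed to have been handled separately). It uses the standard reformulation in which each diagram is augmented by copies of the diagonal: a point $(b,d)$ costs $(d-b)/2$ to match to the diagonal, and two off-diagonal points are compared in the $\ell^\infty$ norm. Enumerating all bijections is exponential, so this is for toy inputs only; efficient algorithms solve the same problem as a bottleneck bipartite matching. The function name is hypothetical.

```python
import itertools
import math


def bottleneck(A, B):
    """Bottleneck distance between finite diagrams A, B of (birth, death) pairs."""
    n, m = len(A), len(B)

    def cost(i, j):
        # Left slots 0..n-1 are points of A, the rest are diagonal copies;
        # right slots 0..m-1 are points of B, the rest are diagonal copies.
        if i < n and j < m:
            (a, b), (c, d) = A[i], B[j]
            return max(abs(a - c), abs(b - d))
        if i < n:                      # point of A matched to the diagonal
            a, b = A[i]
            return (b - a) / 2
        if j < m:                      # diagonal matched to a point of B
            c, d = B[j]
            return (d - c) / 2
        return 0.0                     # diagonal matched to diagonal

    best = math.inf
    for perm in itertools.permutations(range(n + m)):
        worst = max((cost(i, perm[i]) for i in range(n + m)), default=0.0)
        best = min(best, worst)
    return best


print(bottleneck([(0, 2)], [(0, 2.5)]))               # 0.5
print(bottleneck([(0, 2)], []))                       # 1.0
print(bottleneck([(0, 1), (3, 4)], [(3, 4.2)]))       # 0.5
```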
3.1 Stability of Persistent Homology
Let $(X,f)$ be an ft-space. For $t\in\mathbb{R}$, let $X_t=f^{-1}((-\infty,t])$ be the corresponding sublevel set of $f$. Clearly, $X_s\subseteq X_t$, if $s\le t$, and $\bigcup_{t\in\mathbb{R}}X_t=X$. Thus, the sublevel sets of $f$ induce a filtration of $X$ by closed subsets, which may be viewed as a functor from $\mathbb{R}$ to the category of topological spaces and continuous mappings. Composition with the $k$-dimensional homology functor (with coefficients in $\mathbb{k}$) yields a persistence module, which we denote by $\mathbb{H}_k(X,f)$. The vector space over $t\in\mathbb{R}$ is $H_k(X_t)$, and the morphism $H_k(X_s)\to H_k(X_t)$, $s\le t$, is the homomorphism on homology induced by inclusion.
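As a discrete illustration of the filtration $\{X_t\}$, the sketch below takes a signal sampled at consecutive points of an interval, treated as a path graph (an assumption made for simplicity), and counts the connected components of each sublevel set, that is, $\dim H_0(X_t)$ in this model. The helper name is hypothetical.

```python
import numpy as np


def sublevel_components(values, t):
    """Number of connected components of {f <= t} for a signal sampled on a
    path graph, where consecutive samples are adjacent."""
    mask = np.asarray(values) <= t
    # A component starts wherever the mask switches from False to True.
    starts = mask & ~np.concatenate(([False], mask[:-1]))
    return int(starts.sum())


vals = [0.0, 2.0, 1.0, 3.0, 0.5]
for t in (0.0, 1.0, 2.0, 3.0):
    print(t, sublevel_components(vals, t))
```

As $t$ grows, components are born at the local minima $0.0$, $0.5$ and $1.0$ and merge at the local maxima $2.0$ and $3.0$, giving the counts $1, 3, 2, 1$.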
A stability theorem for persistent homology of ft-spaces has been proven in [frosinietal17], but we include a proof for the one-parameter persistence case needed in this paper.
Theorem 3.1 (Frosini et al. [frosinietal17]).
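Let $(X,f)$ and $(Y,g)$ be functional topological spaces. Then, for any integer $k\ge0$,
$$d_I\big(\mathbb{H}_k(X,f),\mathbb{H}_k(Y,g)\big)\le d_{HT}\big((X,f),(Y,g)\big).$$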
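Proof. It suffices to consider the case $d_{HT}((X,f),(Y,g))<\infty$. Let $(\phi,\psi)$ be an $\epsilon$-matching between $(X,f)$ and $(Y,g)$, and write $Y_t=g^{-1}((-\infty,t])$. For any $t\in\mathbb{R}$, condition (i) in Definition 2.2 ensures that $\phi(X_t)\subseteq Y_{t+\epsilon}$. Thus, $\phi$ induces a mapping $\phi_t\colon X_t\to Y_{t+\epsilon}$. Similarly, $\psi$ induces mappings $\psi_t\colon Y_t\to X_{t+\epsilon}$. Condition (a) in Definition 2.3(i) implies that $\psi_{t+\epsilon}\circ\phi_t$ is homotopic to the inclusion map $X_t\hookrightarrow X_{t+2\epsilon}$, because a $2\epsilon$-homotopy from $\mathrm{id}_X$ to $\psi\circ\phi$ over $f$ maps $X_t\times[0,1]$ into $X_{t+2\epsilon}$. Analogously, $\phi_{t+\epsilon}\circ\psi_t$ is homotopic to the inclusion map $Y_t\hookrightarrow Y_{t+2\epsilon}$. Thus, the homomorphisms on homology induced by $\phi_t$ and $\psi_t$, $t\in\mathbb{R}$, yield an $\epsilon$-interleaving between $\mathbb{H}_k(X,f)$ and $\mathbb{H}_k(Y,g)$. This implies that $d_I(\mathbb{H}_k(X,f),\mathbb{H}_k(Y,g))\le\epsilon$. Taking infimum over $\epsilon$, the result follows. ∎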
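Let $(X,f)$ and $(Y,g)$ be functional topological spaces. Then, for any integer $k\ge0$,
$$d_I\big(\hat{\mathbb{H}}_k(X,f),\hat{\mathbb{H}}_k(Y,g)\big)\le\hat d_{HT}\big((X,f),(Y,g)\big),$$
where $\hat{\mathbb{H}}_k(X,f)=\mathbb{H}_k(C(X,f))$ and $\hat{\mathbb{H}}_k(Y,g)=\mathbb{H}_k(C(Y,g))$.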
If $X$ is a triangulable space, then $\mathbb{H}_k(X,f)$ is tame (cf. [cohenetal07, desilvaetal]), so that there is a well-defined persistence diagram $\mathrm{dgm}(\mathbb{H}_k(X,f))$ associated with $\mathbb{H}_k(X,f)$.
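Let $(X,f)$ and $(Y,g)$ be triangulable ft-spaces and $k\ge0$ an integer. Then,
$$d_B\big(\mathrm{dgm}(\mathbb{H}_k(X,f)),\mathrm{dgm}(\mathbb{H}_k(Y,g))\big)\le d_{HT}\big((X,f),(Y,g)\big).$$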
3.2 Effect of Coning on Persistent Homology
As explained in the Introduction, one of the practical motivations for coning functional data is the truncation effect it has on persistent homology of their sublevel set filtrations. Here, we show more formally that this is indeed the effect of coning an ft-space.
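Let $\mathbb{V}$ be a persistence module and $c\in\mathbb{R}$. The $c$-truncation of $\mathbb{V}$ is defined as the persistence module $\mathbb{V}^c$, where
$$V^c_t=\begin{cases}V_t, & t<c,\\ 0, & t\ge c,\end{cases}\qquad (v^c)_s^t=\begin{cases}v_s^t, & s\le t<c,\\ 0, & \text{otherwise}.\end{cases}$$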
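At the level of barcodes, $c$-truncation keeps the part of each bar lying below $c$. A minimal sketch, assuming bars are half-open intervals $[b,d)$ with $d$ possibly infinite; the function name is hypothetical.

```python
import math


def truncate_barcode(bars, c):
    """c-truncation of a barcode: [b, d) becomes [b, min(d, c)),
    and bars born at or after c disappear."""
    return [(b, min(d, c)) for b, d in bars if b < c]


bars = [(0.0, math.inf), (0.5, math.inf), (1.0, 2.0), (2.5, 3.0)]
print(truncate_barcode(bars, 2.0))   # [(0.0, 2.0), (0.5, 2.0), (1.0, 2.0)]
```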
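Let $(X,f)$ be an ft-space and $m=\min_{x\in X}f(x)$. We denote by $(P,\mu)$ the ft-space where $P=\{p\}$ is a one-point space and $\mu\colon P\to\mathbb{R}$ is given by $\mu(p)=m$. By the definition of $m$, the (constant) maps $\pi\colon X\to P$ and $\hat\pi\colon C(X)\to P$ have the property that the sublevel sets of $f$, $\hat f$ and $\mu$ satisfy $\pi(X_t)\subseteq P_t$ and $\hat\pi(\hat f^{-1}((-\infty,t]))\subseteq P_t$, $\forall t\in\mathbb{R}$, where $P_t=\mu^{-1}((-\infty,t])$. It is simple to verify that, for each $k\ge0$, $\pi$ and $\hat\pi$ induce homomorphisms $\pi_*\colon\mathbb{H}_k(X,f)\to\mathbb{H}_k(P,\mu)$ and $\hat\pi_*\colon\mathbb{H}_k(C(X,f))\to\mathbb{H}_k(P,\mu)$ of persistence modules. Note that $\mathbb{H}_0(P,\mu)$ is isomorphic to the interval module associated with the interval $[m,\infty)$, and $\mathbb{H}_k(P,\mu)=0$ for $k\ge1$. The kernels of $\pi_*$ and $\hat\pi_*$ give persistence submodules $\tilde{\mathbb{H}}_k(X,f)$ and $\tilde{\mathbb{H}}_k(C(X,f))$ of $\mathbb{H}_k(X,f)$ and $\mathbb{H}_k(C(X,f))$, respectively, which yield direct sum decompositions
$$\mathbb{H}_k(X,f)\cong\tilde{\mathbb{H}}_k(X,f)\oplus\mathbb{H}_k(P,\mu)\quad\text{and}\quad\mathbb{H}_k(C(X,f))\cong\tilde{\mathbb{H}}_k(C(X,f))\oplus\mathbb{H}_k(P,\mu).$$
(These decompositions are non-trivial only for the case $k=0$, since $\mathbb{H}_k(P,\mu)=0$ for $k\ge1$.) One may construct such isomorphisms by splitting the homomorphisms $\pi_*$ and $\hat\pi_*$, as follows. Pick $x_0\in X$ such that $f(x_0)=m$ and let $\iota\colon P\to X$ and $\hat\iota\colon P\to C(X)$ be given by $\iota(p)=x_0$ and $\hat\iota(p)=[x_0,0]$. Then, the induced homomorphisms $\iota_*\colon\mathbb{H}_k(P,\mu)\to\mathbb{H}_k(X,f)$ and $\hat\iota_*\colon\mathbb{H}_k(P,\mu)\to\mathbb{H}_k(C(X,f))$ on persistence modules are well defined and split $\pi_*$ and $\hat\pi_*$, as desired. As usual, the splitting is not natural.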
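Let $k\ge0$ be an integer, $(X,f)\in\mathcal{F}$, and $f^*=\max_{x\in X}f(x)$. Then, $\tilde{\mathbb{H}}_k(C(X,f))$ is isomorphic to the $f^*$-truncation $\tilde{\mathbb{H}}_k(X,f)^{f^*}$.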
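Proof. Let $j\colon X\to C(X)$ be the inclusion $j(x)=[x,0]$. Then, the sublevel sets of $f$ and $\hat f$ satisfy $j(X_t)\subseteq\hat f^{-1}((-\infty,t])$, $\forall t\in\mathbb{R}$, and $j$ induces a homomorphism $j_*\colon\mathbb{H}_k(X,f)\to\mathbb{H}_k(C(X,f))$. The commutativity of the diagram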