A Riemannian Framework for Statistical Analysis of Topological Persistence Diagrams

A Riemannian Framework for Statistical Analysis of Topological Persistence Diagrams

Rushil Anirudh*, Vinay Venkataraman*, Karthikeyan Natesan Ramamurthy, Pavan Turaga*
*Arizona State University, IBM T. J. Watson Research Center, NY
  
Abstract

Topological data analysis is becoming a popular way to study high dimensional feature spaces without any contextual clues or assumptions. This paper concerns itself with one popular topological feature, which is the number of dimensional holes in the dataset, also known as the Betti number. The persistence of the Betti numbers over various scales is encoded into a persistence diagram (PD), which indicates the birth and death times of these holes as scale varies. A common way to compare PDs is by a point-to-point matching, which is given by the -Wasserstein metric. However, a big drawback of this approach is the need to solve correspondence between points before computing the distance; for points, the complexity grows according to n. Instead, we propose to use an entirely new framework built on Riemannian geometry, that models PDs as 2D probability density functions that are represented in the square-root framework on a Hilbert Sphere. The resulting space is much more intuitive with closed form expressions for common operations. The distance metric is 1) correspondence-free and also 2) independent of the number of points in the dataset. The complexity of computing distance between PDs now grows according to , for a discretization of . This also enables the use of existing machinery in differential geometry towards statistical analysis of PDs such as computing the mean, geodesics, classification etc. We report competitive results with the Wasserstein metric, at a much lower computational load, indicating the favorable properties of the proposed approach.

1 Introduction

Topological data analysis (TDA) has emerged as a useful tool to analyze underlying properties of data before any contextual modeling assumptions kick in. Generally speaking, TDA seeks to characterize the shape of high dimensional data (viewed as a point-cloud in some metric space), by quantifying various topological constructs such as connected components, high-dimensional holes, level-sets and monotonic regions of functions defined on the data [9]. In particular, the number of -dimensional holes in a data, also known as the Betti- number, corresponds to the rank of the dimensional homology group. A commonly used and powerful topological feature, the persistence diagram (PD) summarizes the persistence of the Betti numbers with respect to the scale parameter used to analyze the data. A typical machine learning pipeline using TDA features would first estimate the PDs from the given point cloud, and define a metric on them to compare different point clouds. The Wasserstein distance measure has become ubiquitous as a metric between such PDs, as it is stable and has a well-defined metric space associated with it [21, 8]. However, computation of the Wasserstein distance involves finding a one-to-one map between the points in one persistence diagram to those in the other, which is a computationally expensive operation.footnotetext: This work was supported by NSF CAREER grant 1452163.

In this paper, we propose a novel representation for persistence diagrams as points on a hypersphere, by approximating them as 2D probability density functions (pdf). We perform a square-root transform of the constructed pdf, wherein the Hilbert sphere becomes the appropriate space for defining metrics [19]. This insight is used to construct closed form metrics – geodesics on the Hilbert sphere – which can be computed very efficiently, bypassing the correspondence problem entirely. The overall pipeline of operations for computing the proposed representation is given in Figure 1.

Figure 1: The overall sequence of operations leading to the proposed representation, illustrated for the application of activity analysis. Joint position data from motion capture systems are collected as 1D time series. The phase space is reconstructed from these time series via Takens’ embedding theorem. We represent the topological properties of this phase space using the persistence diagram (PD). Next, we use kernel density estimation to represent the PD itself as a 2D probability density function (pdf). Finally, we use the square-root framework to interpret these pdfs as points on a Hilbert sphere.

The biggest advantage of the proposed representation is that it completely works around the computationally expensive step of obtaining one-to-one correspondences between points in persistence diagrams, thereby making the distance computation between PDs extremely efficient. We show that approximating PDs as pdfs results in comparable performances to the popular and best performing -Wasserstein metric, while at the same time being orders of magnitude faster. We also provide a theoretically well-grounded understanding of the geometry of the pdf representation. Additionally, the availability of closed form expressions for many important tools such as the geodesics, exponential maps etc. enables us to adapt powerful statistical tools – such as clustering, PCA, sample mean etc. – opening new possibilities for applications involving large datasets. To the best of our knowledge we are the first to propose this representation for persistence diagrams.

Contributions

  1. We present the first representation of persistence diagrams that are approximated as points on a Hilbert sphere resulting in closed-form expressions to compare two diagrams. This also completely avoids the correspondence problem, which is typically a computational bottleneck.

  2. We demonstrate the ability of the proposed representation for statistical analyses, such as computing the mean persistence diagram, principal geodesic analysis (PGA), and classification using SVMs.

  3. We show promising results for supervised tasks such as action recognition, and assessment of movement quality in stroke rehabilitation.

  4. The space of the representation – the Hilbert sphere – is a geometrically well-understood and intuitive space, which may further promote its use in topological machine learning applications.

The rest of the paper is organized as follows. Section 2 discusses related work in more detail. Section 3 provides the necessary background on persistent homology, the space of persistence diagrams, and the square-root framework on the Hilbert space. Section 4 provides details about the proposed framework for using the new representation of persistence diagrams in a functional Hilbert space for statistical learning tasks. Section 5 describes the experiments and results. Section 6 concludes the paper.

2 Related Work

Persistence diagrams elegantly summarize the topological properties of point clouds, and the first algorithm for computing topological persistence was proposed in [9]. Ever since, there has been an explosion in understanding the properties of PDs, comparing PDs, and computing statistics on them. Since a PD is a multiset of off-diagonal points along with infinite number of points on the diagonal, the cost of optimal bijective assignment, also known as the Wasserstein distance, between the individual points in the PDs is a valid distance measure. The time complexity of computing the distance between two PDs with points each is [4]. It has been shown in [8] that the persistence diagrams of Lipschitz functions are stable with respect to -Wasserstein distance. However, the bottleneck and -Wasserstein distance do not allow for easy computation of statistics. Hence, -Wasserstein metrics have been used to develop approaches for computing statistical summaries such as the Fréchet mean [21]. Computing Fréchet mean of PDs involves iteratively assigning points from the individual diagrams to the candidate mean diagram, recomputing the mean, and repeating till convergence. Since the mean PD obtained is not guaranteed to be unique, the authors in [16] proposed the probabilistic Fréchet mean which itself is a probability measure on the space of PDs. An investigation of the properties of the space of PDs that allow for definition of various probability measures is reported in [15], and a derivation of confidence sets that allow the separation of topological signal from noise is presented in [10].

Since operations with PDs can be computationally expensive, Bubenik proposed a new topological summary - known as the persistence landscape - derived from the PD [5]. It can be thought of as a collection of envelope functions on the points in PD based on the order of their importance. Persistence landscapes allow for fast computation of statistics since they lie in a vector space. However their practical utility has been limited since they provide decreasing importance to secondary and tertiary features in PDs that are usually useful in terms of discriminating between data from different classes. Recently, an approach that defines a stable multi-scale kernel between persistence diagrams has been proposed [17]. The kernel is obtained by creating a surface with a Gaussian centered at each point in the PD along with a negative Gaussian centered at its reflection across the diagonal. In addition, the authors in [2] propose to compute a vector representation - the persistence image - by computing a kernel density estimate of the PD and integrating it locally. The kernel density estimate is weighted such that the points will have increasing weights as they move away from the diagonal. A similar weighting approach is used in [14] to create a persistence-weighted Gaussian kernel.

The square-root representation for 1D probability density functions (pdfs) was proposed in [19] and was used for shape classification. It has since been used in several applications, including activity recognition [22] - where the square-root velocity representation is used to model the space of time warping functions. This representation extends quite easily to arbitrary high-dimensional pdfs as well.

3 Mathematical Preliminaries

In this section we will introduce the concepts of a) persistence homology, b) persistent diagrams, c) our proposed representation of PDs as a 2D pdf, and the square-root framework resulting in the Hilbert sphere geometry.

3.1 Persistent Homology

The homology of point cloud can be computed by first constructing a shape out of the point cloud and estimating its homology. A simplex is imposed on every set of points in the point cloud that satisfy a neighborhood condition based on a scale parameter . The collection of such simplices is known as the simplicial complex . can be thought of as a representation of the underlying shape of the data, and its homology can be inferred. However, homology groups obtained from depend on the scale or time parameter based on which the complexes are constructed [9]. The homological features of the simplicial complex constructed from the point cloud that are stable across scales, i.e., that are persistent, are the ones that provide information about the underlying shape. Topological features that do not persist are considered to be noise. This information is represented using persistence diagrams as a 2D plot of birth versus death times of each homology cycle corresponding to the homology group . The birth time is the scale at which the homological feature is born and the death time is the scale at which it ceases to exist. The homology cycle of dimension is also referred to as a dimensional hole. Therefore, the PD can be considered as an object that represents the number of holes in terms of the range of scales at which they appear. Typically, PDs of the point cloud data are obtained using filtration of simplicial complexes. A well-known filtration is the Vietoris-Rips (VR) filtration, where a simplex is induced between a group of points whenever the distance between each pair of points is less than the given scale [26]. An example of point clouds and their corresponding persistence diagrams for homology groups 0 and 1 are provided in Figure 2.

(a)
(b)
(c) Point cloud data
(d) Persistence diagrams
Figure 2: The above example illustrates the topological features extracted from the point cloud data with two and one one-dimensional holes. These properties are reflected well in their corresponding persistence diagrams on the right.

3.2 The Space of Persistence Diagrams

Every PD is a multiset of 2D points, where each point denotes the birth and death time of a homology group. Furthermore, the diagonal is assumed to contain an infinite number of points with the same birth and death times. For any two PDs and , the distance between the diagrams is usually quantified using the bottleneck distance or the -Wasserstein metric [13]. In this paper, we consider only the - and -Wasserstein ( and ) metrics given respectively as,

(1)

and

(2)

Since each diagram contains an infinite number of points in the diagonal, this distance is computed by pairing each point in one diagram uniquely to another non-diagonal or diagonal point in the other diagram, and then computing the distance. This correspondence can be obtained via the Hungarian algorithm or its more efficient variants [13].

The space of PDs with respect to the -Wasserstein metric is given by

(3)

Turner et. al. [21] show this is a non-negatively curved Alexandrov space. Furthermore, the diagram on the geodesic between the PDs and in this space is given as

(4)

where is a point in the diagram , is a corresponding point in the diagram and parametrizes the geodesic. Clearly, the points in the diagram on the geodesic can be simply obtained by linearly interpolating between the corresponding points on the candidate points and . Furthermore, the Riemannian mean between two PDs is easily computed as .

3.3 Square-root Framework and the Hilbert Sphere

We treat the points in a persistence diagram as samples from an underlying probability distribution function. This representation is inspired from recent work dealing with defining Mercer kernels on PDs [17], where the feature embedding obtained from the Mercer kernel closely resembles a kernel-density estimator from the given points, with an additional reflection term about the diagonal to account for boundary effects while solving a heat-equation. The fact that the feature embedding resembles a kernel density estimate is not further exploited in [17].

In our work, we more directly exploit this pdf interpretation of PDs, further endowing it with a square-root form [19] – and then utilizing the Hilbert spherical geometry that results.

Without loss of generality, we will assume that the supports for each 2D pdf is . The space of pdfs that we will consider is

(5)

It is well-known that is not a vector space [19]. Instead, it is a Riemannian manifold with the Fisher-Rao metric as the natural Riemannian metric. Geodesics under the Fisher-Rao metric are quite difficult to compute. Instead, we adopt a square-root form proposed by Srivastava et. al. [19] which simplifies further analysis enormously. In other words we consider the space,

(6)

For any two tangent vectors , the Fisher-Rao metric is given by,

(7)

The above two pieces of information imply that the square-root form results in the space becoming a unit Hilbert-sphere, endowed with the usual inner-product metric. Geodesics on the unit-Hilbert sphere under the above Riemannian metric are known in closed form. In fact, the differential geometry of the Hilbert sphere results in closed form expressions for computing geodesics, exponential and inverse exponential maps [19]. Further, the square-root form with the above metric has additional favorable properties such as invariance to domain re-parameterization.

Given two points the geodesic distance between them is given by

(8)

where . The exponential map is defined as,

(9)

where is a tangent vector at = . The logarithmic map from to is given by:

(10)

with The geodesic on the sphere is given in closed form as

(11)

or equivalently as

(12)

A comparison of sampling of the geodesics in the Alexandrov space induced by the -Wasserstein metric and the proposed Hilbert sphere are provided in Figure 3.

Figure 3: Comparing geodesics in the Alexandrov space (top) and the proposed Hilbert sphere (bottom), for a simple persistence diagram containing just points. The Alexandrov space requires computing correspondence between points, and the geodesic involves linearly interpolating between corresponding points. The persistence pdf avoids correspondence estimation, hence the geodesics correspond to new local modes appearing and gradually increasing in intensity as the original modes decrease in intensity.

4 Algorithmic Details

Our proposed framework consists of reconstructing the phase space of the time series data, computing the PDs of the reconstructed phase space, transforming each PD to a 2D pdf by kernel density estimation, using the 2D pdfs in the square-root framework to estimate distances or obtain statistical summaries. The distances computed and the features obtained will be used in inference tasks. Additional details for some of the above steps are given below.

Phase-space Reconstruction from Activity Data

In dynamical system understanding, we aim to estimate the underlying states, but we measure functions – usually projections – of the original state-space. e.g. while human movement is influenced by many joints, ligaments, muscles of the body, we might measure the location of only a few joints. One way to deal with this issue is to reconstructing the ‘phase space’ by the method of delays used commonly in non-linear dynamical analysis [20]. Reconstructing the phase-space in this manner only preserves important topological properties of the original dynamical system, and does not recover the true state-space. For a discrete dynamical system with a multidimensional phase-space, time-delay vectors (or embedding vectors) are obtained by the concatenation of delayed versions of the data points,

(13)

where is the embedding dimension, is the embedding delay. For a sufficiently large , the important topological properties of the unknown multidimensional system are reproduced in the reconstructed phase-space [1]. The collection of time-delay vectors is the point cloud data under further consideration, and this is represented as .

Estimating the PDs

After estimating the point cloud in the reconstructed phase space, we will construct a simplicial complex using the point cloud data to compute the persistence diagrams for the Betti numbers using the VR filtration. However, this approach considers only spatial adjacency between the points and ignores the temporal adjacency. We improve upon the existing VR filtration approach by explicitly creating temporal links between , , and in the one-skeleton of , thereby creating a metric space which encodes adjacency in both space and time [23]. The persistence diagrams for homology groups of dimensions and are then estimated.

Square-root Framework for Distance Estimation between PDs

Since each PD is a multiset of points in the 2D space, we start by constructing a 2D pdf from each of them using kernel density estimation using a Gaussian kernel of zero mean and variance . For each PD, we compute the square-root representation using the approach provided in Section 3.3, and the distance between two PDs can be computed using (8).

Dimensionality Reduction with Principal Geodesic Analysis (PGA)

We are able to extract principal components using PGA [11], adapted to our Hilbert sphere – which essentially performs classical PCA on the tangent space of the mean-point on the sphere. Given a dataset of persistence diagrams in their square-root form , we first compute the Riemannian center of mass [12]. We use a simple approximation using an extrinsic algorithm that computes first the Euclidean mean and maps it on to the closest point on the sphere. Next, we represent each by its tangent vector from the mean. We then compute the principal components on the tangent space and represent a persistence diagram as a low-dimensional vector.

5 Experiments

We perform experiments on two datasets for human action analysis. First we perform action recognition on the MoCap dataset [3], next we show the use of the framework in quality assessment of movements in stroke rehabilitation [7]. We will describe the datasets next, followed by the evaluation settings and parameters used. In all our experiments, we performed a discretization of the 2D pdf into a grid. The choice of indirectly affects classification performance as expected, i.e. – a smaller value of results in a reduced ability to identify between nearby points in the PD due to lower resolution. On the other hand, a larger value of improves resolution, but also increases computational requirements. Typical values of used in experiments range from to at most, whereas typical values of – the number of points in a PD – ranges from to .

5.1 Motion Capture Data

We evaluate the performance of the proposed framework using -dimensional motion capture sequences of body joints [3]. The dataset is a collection of five actions: dance, jump, run, sit and walk with and instances respectively.

Method Accuracy (%) Time (sec)
Chaos [3] 52.44 -
VR-Complex [26] 93.68 -
DT2 [24] 93.92 -
T-VR Complex (L1) [23] 96.48
Proposed - 1NN 89.87
Proposed - PGA +SVM 91.68 -
Table 1: Comparison of classification rates for different methods using nearest neighbor classifier on the motion capture dataset. It is observed that on an average, the proposed metric is times faster than the traditional Wasserstein metric, while achieving a comparable recognition accuracy.

We generate random splits having testing examples from each action class and use an SVM classifier on the vector features computed with PGA, we get a performance of . The mean recognition rates for the different methods are given in Table 1. We also compare with a 1-nearest neighbor classifier computed on the Hilbert sphere, which gives a performance of . This is clearly competitive, when compared to approaches that use the -Wasserstein metric as shown in table 1. Clearly, the proposed metric for persistence diagrams is able to capture most of the important information, with the added benefit being free from expensive correspondence estimation. To compute the times taken using the Wasserstein metric and the proposed metric, we average across 3 samples from 5 action classes, across (19 joint trajectories along 3 axes) trajectories – a total of persistence diagrams. We computed a pairwise distance matrix for all these PDs, and computed the average time taken in Matlab 2014 on a standard Intel i7 PC.

Figure 4: Recognition accuracy vs , the standard deviation of the 2D Gaussians used in the kernel density estimates.

In Figure 4, we compare the recognition accuracy with the choice of , the standard deviation used during kernel density estimation. The accuracy is generally higher for a small and drops as the increases. We note that a similar trend is also reported in [17].

(a) Unimpaired Subjects
(b) Impaired Subjects
Figure 5: The mean persistence pdfs visualized as heat maps for unimpaired subjects (left) and stroke survivors (right), for the reach-and-grasp action, are shown. These means are computed as the extrinsic means (we choose extrinsic mean here for simplicity) on the Hilbert sphere using the proposed representation. The locations of the peaks are exactly the same, since the subjects perform the same general movement, however the intensity differs significantly indicating that the quality of movement is captured using topological tools well.

5.2 Stroke Rehabilitation Dataset

Our aim in this experiment is to quantitatively assess the quality of movement performed by the impaired subjects during repetitive task therapy. The experimental data was collected using a heavy marker-based system ( markers on the right hand, arm and torso) in a hospital setting from impaired subjects performing reach and grasp movements to a defined target. The stroke survivors were also evaluated by the Wolf Motor Function Test (WMFT) [25] on the day of recording, which evaluates a subject’s functional ability on a scale of (with being least impaired and being most impaired) based on predefined functional tasks. In our experiments, we only use the data corresponding to the single marker on the wrist from the heavy marker-based hospital system, which allows us to evaluate the framework in a home-based rehabilitation system setting. We utilize the WMFT scores as an approximate high-level quantitative measure for movement quality (ground-truth labels) of impaired subjects performing the reach and grasp movements.

We compute explicit features as described earlier using PGA. This allows us to represent each movement by a low-dimensional feature vector. We use the WMFT scores and split the dataset into train and test to perform regression. Since the dataset is small, we use a leave-one-out approach, where we train on all but one sample, which is used for testing, and repeat this over all the samples in the dataset. The correlation performance with the WMFT score is shown in Table 2, and it can be seen that we are comparable to the state of the art, and much better than traditional features. The predicted scores are shown in Figure 6 compared to the original scores.

The dynamical features proposed in [24] and [3], depend on describing the phase space for each movement. We compute the topological features, which are expected to be much weaker than other characteristics such as shape. However, we are still able to capture subtle information regarding movement quality. This is illustrated in Figure 5, where we see the 3D peaks associated with the average of persistence diagrams across subjects impaired with stroke and those who are unimpaired. It is clearly seen that since they are performing the same kind of movements, the peaks occur at the same location. However, the intensity is significantly different across these diagrams, perhaps indicating information regarding movement quality.

Method Correlation with WMFT
KIM (14 markers) [6] 0.85
Lyapunov exponent (1 marker) [18] 0.60
Proposed method (1 marker) 0.80
Table 2: Comparison of correlation coefficient for different methods using leave-one-subject-out cross-validation scheme and SVM regressor on the stroke rehabilitation dataset. Even with just a single marker, We obtain comparable results to a clinical 14-marker system.
Figure 6: Predicting movement quality scores for reach-and-grasp in stroke survivors using topological features. We obtain a 0.8 correlation score with a p-value of

6 Discussion and Conclusion

Based on the theory and experiments presented so far, it is instructive to compare the space of PDs with respect to the different distance metrics. We will consider only the -Wasserstein metric () and the proposed Hilbert sphere metric . The interpretation of PDs with respect to these two metrics is very different. is the Earth Mover’s Distance between the PDs and considered as scaled discrete probability mass functions (pmfs). However, is the geodesic distance between the square root of kernel density estimates of and . Furthermore, the “length” of the persistence diagrams induced by these metrics are very different. For the Wasserstein metric, this is given by which can be arbitrarily high, whereas with the proposed metric it is constrained as the persistence diagrams live on a unit sphere. The most important distinction arises when the pairwise distances between all the points in the two PDs are sufficiently high. When the 2-dimensional pdfs obtained from kernel density estimates of the two PDs do not overlap anywhere, . This implies that if the variance of the kernel is sufficiently small, two PDs with non-overlapping points will always have close to . This problem can be alleviated by using kernels with multiple scales for smoothing the PDs to obtain and combining the distances obtained at each scale.

The comparison between the geodesics in the persistence space is also illuminating (Figure 3). Sampling of the geodesic in the Alexandrov space shows that the points in one diagram move towards the corresponding points in the other, as we move along the geodesic. Whereas, a similar sampling in the Hilbert sphere shows that the PDs in the middle of the geodesic contain the modes from pdfs corresponding to both the candidate PDs. This results in markedly different interpretations of the means computed from and even for the case of two persistence diagrams.

Several directions of future work emerge from this endeavor. On the theoretical side a more formal understanding of the variety of metrics and their relationship to the proposed one can be considered. On the applied side, the favorable computational load reduction should open new applications of TDA for massive datasets.

References

Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
""
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
   
Add comment
Cancel
Loading ...
45319
This is a comment super asjknd jkasnjk adsnkj
Upvote
Downvote
""
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters
Submit
Cancel

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test
Test description