A Riemannian Framework for Statistical Analysis of Topological Persistence Diagrams
Abstract
Topological data analysis is becoming a popular way to study high dimensional feature spaces without any contextual clues or assumptions. This paper concerns itself with one popular topological feature, which is the number of dimensional holes in the dataset, also known as the Betti number. The persistence of the Betti numbers over various scales is encoded into a persistence diagram (PD), which indicates the birth and death times of these holes as scale varies. A common way to compare PDs is by a pointtopoint matching, which is given by the Wasserstein metric. However, a big drawback of this approach is the need to solve correspondence between points before computing the distance; for points, the complexity grows according to n. Instead, we propose to use an entirely new framework built on Riemannian geometry, that models PDs as 2D probability density functions that are represented in the squareroot framework on a Hilbert Sphere. The resulting space is much more intuitive with closed form expressions for common operations. The distance metric is 1) correspondencefree and also 2) independent of the number of points in the dataset. The complexity of computing distance between PDs now grows according to , for a discretization of . This also enables the use of existing machinery in differential geometry towards statistical analysis of PDs such as computing the mean, geodesics, classification etc. We report competitive results with the Wasserstein metric, at a much lower computational load, indicating the favorable properties of the proposed approach.
1 Introduction
Topological data analysis (TDA) has emerged as a useful tool to analyze underlying properties of data before any contextual modeling assumptions kick in. Generally speaking, TDA seeks to characterize the shape of high dimensional data (viewed as a pointcloud in some metric space), by quantifying various topological constructs such as connected components, highdimensional holes, levelsets and monotonic regions of functions defined on the data [9]. In particular, the number of dimensional holes in a data, also known as the Betti number, corresponds to the rank of the dimensional homology group. A commonly used and powerful topological feature, the persistence diagram (PD) summarizes the persistence of the Betti numbers with respect to the scale parameter used to analyze the data. A typical machine learning pipeline using TDA features would first estimate the PDs from the given point cloud, and define a metric on them to compare different point clouds. The Wasserstein distance measure has become ubiquitous as a metric between such PDs, as it is stable and has a welldefined metric space associated with it [21, 8]. However, computation of the Wasserstein distance involves finding a onetoone map between the points in one persistence diagram to those in the other, which is a computationally expensive operation.^{†}^{†}footnotetext: This work was supported by NSF CAREER grant 1452163.
In this paper, we propose a novel representation for persistence diagrams as points on a hypersphere, by approximating them as 2D probability density functions (pdf). We perform a squareroot transform of the constructed pdf, wherein the Hilbert sphere becomes the appropriate space for defining metrics [19]. This insight is used to construct closed form metrics – geodesics on the Hilbert sphere – which can be computed very efficiently, bypassing the correspondence problem entirely. The overall pipeline of operations for computing the proposed representation is given in Figure 1.
The biggest advantage of the proposed representation is that it completely works around the computationally expensive step of obtaining onetoone correspondences between points in persistence diagrams, thereby making the distance computation between PDs extremely efficient. We show that approximating PDs as pdfs results in comparable performances to the popular and best performing Wasserstein metric, while at the same time being orders of magnitude faster. We also provide a theoretically wellgrounded understanding of the geometry of the pdf representation. Additionally, the availability of closed form expressions for many important tools such as the geodesics, exponential maps etc. enables us to adapt powerful statistical tools – such as clustering, PCA, sample mean etc. – opening new possibilities for applications involving large datasets. To the best of our knowledge we are the first to propose this representation for persistence diagrams.
Contributions

We present the first representation of persistence diagrams that are approximated as points on a Hilbert sphere resulting in closedform expressions to compare two diagrams. This also completely avoids the correspondence problem, which is typically a computational bottleneck.

We demonstrate the ability of the proposed representation for statistical analyses, such as computing the mean persistence diagram, principal geodesic analysis (PGA), and classification using SVMs.

We show promising results for supervised tasks such as action recognition, and assessment of movement quality in stroke rehabilitation.

The space of the representation – the Hilbert sphere – is a geometrically wellunderstood and intuitive space, which may further promote its use in topological machine learning applications.
The rest of the paper is organized as follows. Section 2 discusses related work in more detail. Section 3 provides the necessary background on persistent homology, the space of persistence diagrams, and the squareroot framework on the Hilbert space. Section 4 provides details about the proposed framework for using the new representation of persistence diagrams in a functional Hilbert space for statistical learning tasks. Section 5 describes the experiments and results. Section 6 concludes the paper.
2 Related Work
Persistence diagrams elegantly summarize the topological properties of point clouds, and the first algorithm for computing topological persistence was proposed in [9]. Ever since, there has been an explosion in understanding the properties of PDs, comparing PDs, and computing statistics on them. Since a PD is a multiset of offdiagonal points along with infinite number of points on the diagonal, the cost of optimal bijective assignment, also known as the Wasserstein distance, between the individual points in the PDs is a valid distance measure. The time complexity of computing the distance between two PDs with points each is [4]. It has been shown in [8] that the persistence diagrams of Lipschitz functions are stable with respect to Wasserstein distance. However, the bottleneck and Wasserstein distance do not allow for easy computation of statistics. Hence, Wasserstein metrics have been used to develop approaches for computing statistical summaries such as the Fréchet mean [21]. Computing Fréchet mean of PDs involves iteratively assigning points from the individual diagrams to the candidate mean diagram, recomputing the mean, and repeating till convergence. Since the mean PD obtained is not guaranteed to be unique, the authors in [16] proposed the probabilistic Fréchet mean which itself is a probability measure on the space of PDs. An investigation of the properties of the space of PDs that allow for definition of various probability measures is reported in [15], and a derivation of confidence sets that allow the separation of topological signal from noise is presented in [10].
Since operations with PDs can be computationally expensive, Bubenik proposed a new topological summary  known as the persistence landscape  derived from the PD [5]. It can be thought of as a collection of envelope functions on the points in PD based on the order of their importance. Persistence landscapes allow for fast computation of statistics since they lie in a vector space. However their practical utility has been limited since they provide decreasing importance to secondary and tertiary features in PDs that are usually useful in terms of discriminating between data from different classes. Recently, an approach that defines a stable multiscale kernel between persistence diagrams has been proposed [17]. The kernel is obtained by creating a surface with a Gaussian centered at each point in the PD along with a negative Gaussian centered at its reflection across the diagonal. In addition, the authors in [2] propose to compute a vector representation  the persistence image  by computing a kernel density estimate of the PD and integrating it locally. The kernel density estimate is weighted such that the points will have increasing weights as they move away from the diagonal. A similar weighting approach is used in [14] to create a persistenceweighted Gaussian kernel.
The squareroot representation for 1D probability density functions (pdfs) was proposed in [19] and was used for shape classification. It has since been used in several applications, including activity recognition [22]  where the squareroot velocity representation is used to model the space of time warping functions. This representation extends quite easily to arbitrary highdimensional pdfs as well.
3 Mathematical Preliminaries
In this section we will introduce the concepts of a) persistence homology, b) persistent diagrams, c) our proposed representation of PDs as a 2D pdf, and the squareroot framework resulting in the Hilbert sphere geometry.
3.1 Persistent Homology
The homology of point cloud can be computed by first constructing a shape out of the point cloud and estimating its homology. A simplex is imposed on every set of points in the point cloud that satisfy a neighborhood condition based on a scale parameter . The collection of such simplices is known as the simplicial complex . can be thought of as a representation of the underlying shape of the data, and its homology can be inferred. However, homology groups obtained from depend on the scale or time parameter based on which the complexes are constructed [9]. The homological features of the simplicial complex constructed from the point cloud that are stable across scales, i.e., that are persistent, are the ones that provide information about the underlying shape. Topological features that do not persist are considered to be noise. This information is represented using persistence diagrams as a 2D plot of birth versus death times of each homology cycle corresponding to the homology group . The birth time is the scale at which the homological feature is born and the death time is the scale at which it ceases to exist. The homology cycle of dimension is also referred to as a dimensional hole. Therefore, the PD can be considered as an object that represents the number of holes in terms of the range of scales at which they appear. Typically, PDs of the point cloud data are obtained using filtration of simplicial complexes. A wellknown filtration is the VietorisRips (VR) filtration, where a simplex is induced between a group of points whenever the distance between each pair of points is less than the given scale [26]. An example of point clouds and their corresponding persistence diagrams for homology groups 0 and 1 are provided in Figure 2.
3.2 The Space of Persistence Diagrams
Every PD is a multiset of 2D points, where each point denotes the birth and death time of a homology group. Furthermore, the diagonal is assumed to contain an infinite number of points with the same birth and death times. For any two PDs and , the distance between the diagrams is usually quantified using the bottleneck distance or the Wasserstein metric [13]. In this paper, we consider only the  and Wasserstein ( and ) metrics given respectively as,
(1) 
and
(2) 
Since each diagram contains an infinite number of points in the diagonal, this distance is computed by pairing each point in one diagram uniquely to another nondiagonal or diagonal point in the other diagram, and then computing the distance. This correspondence can be obtained via the Hungarian algorithm or its more efficient variants [13].
The space of PDs with respect to the Wasserstein metric is given by
(3) 
Turner et. al. [21] show this is a nonnegatively curved Alexandrov space. Furthermore, the diagram on the geodesic between the PDs and in this space is given as
(4) 
where is a point in the diagram , is a corresponding point in the diagram and parametrizes the geodesic. Clearly, the points in the diagram on the geodesic can be simply obtained by linearly interpolating between the corresponding points on the candidate points and . Furthermore, the Riemannian mean between two PDs is easily computed as .
3.3 Squareroot Framework and the Hilbert Sphere
We treat the points in a persistence diagram as samples from an underlying probability distribution function. This representation is inspired from recent work dealing with defining Mercer kernels on PDs [17], where the feature embedding obtained from the Mercer kernel closely resembles a kerneldensity estimator from the given points, with an additional reflection term about the diagonal to account for boundary effects while solving a heatequation. The fact that the feature embedding resembles a kernel density estimate is not further exploited in [17].
In our work, we more directly exploit this pdf interpretation of PDs, further endowing it with a squareroot form [19] – and then utilizing the Hilbert spherical geometry that results.
Without loss of generality, we will assume that the supports for each 2D pdf is . The space of pdfs that we will consider is
(5) 
It is wellknown that is not a vector space [19]. Instead, it is a Riemannian manifold with the FisherRao metric as the natural Riemannian metric. Geodesics under the FisherRao metric are quite difficult to compute. Instead, we adopt a squareroot form proposed by Srivastava et. al. [19] which simplifies further analysis enormously. In other words we consider the space,
(6) 
For any two tangent vectors , the FisherRao metric is given by,
(7) 
The above two pieces of information imply that the squareroot form results in the space becoming a unit Hilbertsphere, endowed with the usual innerproduct metric. Geodesics on the unitHilbert sphere under the above Riemannian metric are known in closed form. In fact, the differential geometry of the Hilbert sphere results in closed form expressions for computing geodesics, exponential and inverse exponential maps [19]. Further, the squareroot form with the above metric has additional favorable properties such as invariance to domain reparameterization.
Given two points the geodesic distance between them is given by
(8) 
where . The exponential map is defined as,
(9) 
where is a tangent vector at = . The logarithmic map from to is given by:
(10) 
with The geodesic on the sphere is given in closed form as
(11) 
or equivalently as
(12) 
A comparison of sampling of the geodesics in the Alexandrov space induced by the Wasserstein metric and the proposed Hilbert sphere are provided in Figure 3.
4 Algorithmic Details
Our proposed framework consists of reconstructing the phase space of the time series data, computing the PDs of the reconstructed phase space, transforming each PD to a 2D pdf by kernel density estimation, using the 2D pdfs in the squareroot framework to estimate distances or obtain statistical summaries. The distances computed and the features obtained will be used in inference tasks. Additional details for some of the above steps are given below.
Phasespace Reconstruction from Activity Data
In dynamical system understanding, we aim to estimate the underlying states, but we measure functions – usually projections – of the original statespace. e.g. while human movement is influenced by many joints, ligaments, muscles of the body, we might measure the location of only a few joints. One way to deal with this issue is to reconstructing the ‘phase space’ by the method of delays used commonly in nonlinear dynamical analysis [20]. Reconstructing the phasespace in this manner only preserves important topological properties of the original dynamical system, and does not recover the true statespace. For a discrete dynamical system with a multidimensional phasespace, timedelay vectors (or embedding vectors) are obtained by the concatenation of delayed versions of the data points,
(13) 
where is the embedding dimension, is the embedding delay. For a sufficiently large , the important topological properties of the unknown multidimensional system are reproduced in the reconstructed phasespace [1]. The collection of timedelay vectors is the point cloud data under further consideration, and this is represented as .
Estimating the PDs
After estimating the point cloud in the reconstructed phase space, we will construct a simplicial complex using the point cloud data to compute the persistence diagrams for the Betti numbers using the VR filtration. However, this approach considers only spatial adjacency between the points and ignores the temporal adjacency. We improve upon the existing VR filtration approach by explicitly creating temporal links between , , and in the oneskeleton of , thereby creating a metric space which encodes adjacency in both space and time [23]. The persistence diagrams for homology groups of dimensions and are then estimated.
Squareroot Framework for Distance Estimation between PDs
Since each PD is a multiset of points in the 2D space, we start by constructing a 2D pdf from each of them using kernel density estimation using a Gaussian kernel of zero mean and variance . For each PD, we compute the squareroot representation using the approach provided in Section 3.3, and the distance between two PDs can be computed using (8).
Dimensionality Reduction with Principal Geodesic Analysis (PGA)
We are able to extract principal components using PGA [11], adapted to our Hilbert sphere – which essentially performs classical PCA on the tangent space of the meanpoint on the sphere. Given a dataset of persistence diagrams in their squareroot form , we first compute the Riemannian center of mass [12]. We use a simple approximation using an extrinsic algorithm that computes first the Euclidean mean and maps it on to the closest point on the sphere. Next, we represent each by its tangent vector from the mean. We then compute the principal components on the tangent space and represent a persistence diagram as a lowdimensional vector.
5 Experiments
We perform experiments on two datasets for human action analysis. First we perform action recognition on the MoCap dataset [3], next we show the use of the framework in quality assessment of movements in stroke rehabilitation [7]. We will describe the datasets next, followed by the evaluation settings and parameters used. In all our experiments, we performed a discretization of the 2D pdf into a grid. The choice of indirectly affects classification performance as expected, i.e. – a smaller value of results in a reduced ability to identify between nearby points in the PD due to lower resolution. On the other hand, a larger value of improves resolution, but also increases computational requirements. Typical values of used in experiments range from to at most, whereas typical values of – the number of points in a PD – ranges from to .
5.1 Motion Capture Data
We evaluate the performance of the proposed framework using dimensional motion capture sequences of body joints [3]. The dataset is a collection of five actions: dance, jump, run, sit and walk with and instances respectively.
Method  Accuracy (%)  Time (sec) 

Chaos [3]  52.44   
VRComplex [26]  93.68   
DT2 [24]  93.92   
TVR Complex (L1) [23]  96.48  
Proposed  1NN  89.87  
Proposed  PGA +SVM  91.68   
We generate random splits having testing examples from each action class and use an SVM classifier on the vector features computed with PGA, we get a performance of . The mean recognition rates for the different methods are given in Table 1. We also compare with a 1nearest neighbor classifier computed on the Hilbert sphere, which gives a performance of . This is clearly competitive, when compared to approaches that use the Wasserstein metric as shown in table 1. Clearly, the proposed metric for persistence diagrams is able to capture most of the important information, with the added benefit being free from expensive correspondence estimation. To compute the times taken using the Wasserstein metric and the proposed metric, we average across 3 samples from 5 action classes, across (19 joint trajectories along 3 axes) trajectories – a total of persistence diagrams. We computed a pairwise distance matrix for all these PDs, and computed the average time taken in Matlab 2014 on a standard Intel i7 PC.
In Figure 4, we compare the recognition accuracy with the choice of , the standard deviation used during kernel density estimation. The accuracy is generally higher for a small and drops as the increases. We note that a similar trend is also reported in [17].
5.2 Stroke Rehabilitation Dataset
Our aim in this experiment is to quantitatively assess the quality of movement performed by the impaired subjects during repetitive task therapy. The experimental data was collected using a heavy markerbased system ( markers on the right hand, arm and torso) in a hospital setting from impaired subjects performing reach and grasp movements to a defined target. The stroke survivors were also evaluated by the Wolf Motor Function Test (WMFT) [25] on the day of recording, which evaluates a subject’s functional ability on a scale of (with being least impaired and being most impaired) based on predefined functional tasks. In our experiments, we only use the data corresponding to the single marker on the wrist from the heavy markerbased hospital system, which allows us to evaluate the framework in a homebased rehabilitation system setting. We utilize the WMFT scores as an approximate highlevel quantitative measure for movement quality (groundtruth labels) of impaired subjects performing the reach and grasp movements.
We compute explicit features as described earlier using PGA. This allows us to represent each movement by a lowdimensional feature vector. We use the WMFT scores and split the dataset into train and test to perform regression. Since the dataset is small, we use a leaveoneout approach, where we train on all but one sample, which is used for testing, and repeat this over all the samples in the dataset. The correlation performance with the WMFT score is shown in Table 2, and it can be seen that we are comparable to the state of the art, and much better than traditional features. The predicted scores are shown in Figure 6 compared to the original scores.
The dynamical features proposed in [24] and [3], depend on describing the phase space for each movement. We compute the topological features, which are expected to be much weaker than other characteristics such as shape. However, we are still able to capture subtle information regarding movement quality. This is illustrated in Figure 5, where we see the 3D peaks associated with the average of persistence diagrams across subjects impaired with stroke and those who are unimpaired. It is clearly seen that since they are performing the same kind of movements, the peaks occur at the same location. However, the intensity is significantly different across these diagrams, perhaps indicating information regarding movement quality.
Method  Correlation with WMFT 

KIM (14 markers) [6]  0.85 
Lyapunov exponent (1 marker) [18]  0.60 
Proposed method (1 marker)  0.80 
6 Discussion and Conclusion
Based on the theory and experiments presented so far, it is instructive to compare the space of PDs with respect to the different distance metrics. We will consider only the Wasserstein metric () and the proposed Hilbert sphere metric . The interpretation of PDs with respect to these two metrics is very different. is the Earth Mover’s Distance between the PDs and considered as scaled discrete probability mass functions (pmfs). However, is the geodesic distance between the square root of kernel density estimates of and . Furthermore, the “length” of the persistence diagrams induced by these metrics are very different. For the Wasserstein metric, this is given by which can be arbitrarily high, whereas with the proposed metric it is constrained as the persistence diagrams live on a unit sphere. The most important distinction arises when the pairwise distances between all the points in the two PDs are sufficiently high. When the 2dimensional pdfs obtained from kernel density estimates of the two PDs do not overlap anywhere, . This implies that if the variance of the kernel is sufficiently small, two PDs with nonoverlapping points will always have close to . This problem can be alleviated by using kernels with multiple scales for smoothing the PDs to obtain and combining the distances obtained at each scale.
The comparison between the geodesics in the persistence space is also illuminating (Figure 3). Sampling of the geodesic in the Alexandrov space shows that the points in one diagram move towards the corresponding points in the other, as we move along the geodesic. Whereas, a similar sampling in the Hilbert sphere shows that the PDs in the middle of the geodesic contain the modes from pdfs corresponding to both the candidate PDs. This results in markedly different interpretations of the means computed from and even for the case of two persistence diagrams.
Several directions of future work emerge from this endeavor. On the theoretical side a more formal understanding of the variety of metrics and their relationship to the proposed one can be considered. On the applied side, the favorable computational load reduction should open new applications of TDA for massive datasets.
References
 [1] H. D. Abarbanel. Analysis of observed chaotic data. New York: SpringerVerlag, 1996.
 [2] H. Adams, S. Chepushtanova, T. Emerson, E. Hanson, M. Kirby, F. Motta, R. Neville, C. Peterson, P. Shipman, and L. Ziegelmeier. Persistent images: A stable vector representation of persistent homology. ArXiv preprint arXiv:1507.06217, 2015.
 [3] S. Ali, A. Basharat, and M. Shah. Chaotic invariants for human action recognition. In IEEE International Conference on Computer Vision, pages 1–8, Oct. 2007.
 [4] D. P. Bertsekas. A new algorithm for the assignment problem. Mathematical Programming, 21(1):152–171, 1981.
 [5] P. Bubenik. Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research, 16(1):77–102, 2015.
 [6] Y. Chen, M. Duff, N. Lehrer, H. Sundaram, J. He, S. L. Wolf, and T. Rikakis. A computational framework for quantitative evaluation of movement during rehabilitation. In AIP Conference ProceedingsAmerican Institute of Physics, volume 1371, pages 317–326, 2011.
 [7] Y. Chen, M. Duff, N. Lehrer, H. Sundaram, J. He, S. L. Wolf, T. Rikakis, T. D. Pham, X. Zhou, H. Tanaka, et al. A computational framework for quantitative evaluation of movement during rehabilitation. In AIP Conference ProceedingsAmerican Institute of Physics, volume 1371, page 317, 2011.
 [8] D. CohenSteiner, H. Edelsbrunner, J. Harer, and Y. Mileyko. Lipschitz functions have l pstable persistence. Foundations of Computational Mathematics, 10(2):127–139, 2010.
 [9] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification. Discrete and Computational Geometry, 28(4):511–533, 2002.
 [10] B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan, A. Singh, et al. Confidence sets for persistence diagrams. The Annals of Statistics, 42(6):2301–2339, 2014.
 [11] P. T. Fletcher, C. Lu, S. M. Pizer, and S. C. Joshi. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Transactions on Medical Imaging, 23(8):995–1005, August 2004.
 [12] K. Grove and H. Karcher. How to conjugate Cclose group actions. Math.Z, 132:11–20, 1973.
 [13] M. Kerber, D. Morozov, and A. Nigmetov. Geometry helps to compare persistence diagrams. In Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX), pages 103–112, 2016.
 [14] G. Kusano, K. Fukumizu, and Y. Hiraoka. Persistence weighted gaussian kernel for topological data analysis. arXiv preprint arXiv:1601.01741, 2016.
 [15] Y. Mileyko, S. Mukherjee, and J. Harer. Probability measures on the space of persistence diagrams. Inverse Problems, 27(12):124007, 2011.
 [16] E. Munch, P. Bendich, K. Turner, S. Mukherjee, J. Mattingly, and J. Harer. Probabilistic Fréchet means and statistics on vineyards. ArXiv preprint arXiv:1307.6530, 2013.
 [17] J. Reininghaus, S. Huber, U. Bauer, and R. Kwitt. A stable multiscale kernel for topological machine learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4741–4748, 2015.
 [18] M. Rosenstein, J. Collins, and C. De Luca. A practical method for calculating largest lyapunov exponents from small data sets. Physica D: Nonlinear Phenomena, 65(1):117–134, 1993.
 [19] A. Srivastava, I. Jermyn, and S. Joshi. Riemannian analysis of probability density functions with applications in vision. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007.
 [20] F. Takens. Detecting strange attractors in turbulence. Dynamical Systems and Turbulence, 898:366–381, 1981.
 [21] K. Turner, Y. Mileyko, S. Mukherjee, and J. Harer. Fréchet means for distributions of persistence diagrams. Discrete & Computational Geometry, 52(1):44–70, 2014.
 [22] A. Veeraraghavan, A. Srivastava, A. K. RoyChowdhury, and R. Chellappa. Rateinvariant recognition of humans and their activities. Image Processing, IEEE Transactions on, 18(6):1326–1339, 2009.
 [23] V. Venkataraman, K. N. Ramamurthy, and P. Turaga. Persistent homology of attractors for action recognition. arXiv preprint arXiv:1603.05310, 2016.
 [24] V. Venkataraman and P. Turaga. Shape distributions of nonlinear dynamical systems for videobased inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016 (accepted) ; arXiv:1601.07471.
 [25] S. L. Wolf, P. A. Catlin, M. Ellis, A. L. Archer, B. Morgan, and A. Piacentino. Assessing wolf motor function test as outcome measure for research in patients after stroke. Stroke, 32(7):1635–1639, 2001.
 [26] A. Zomorodian. Fast construction of the VietorisRips complex. Computers & Graphics, 34(3):263–271, 2010.