Ricci Curvature and the Manifold Learning Problem

Antonio G. Ache Mathematics Department, Princeton University, Fine Hall, Washington Road, Princeton New Jersey 08544-1000 USA aache@math.princeton.edu  and  Micah W. Warren Department of Mathematics, University of Oregon, Eugene OR 97403 micahw@oregon.edu
Abstract.

Consider a sample of $n$ points taken i.i.d. from a submanifold $\Sigma$ of Euclidean space. We show that there is a way to estimate the Ricci curvature of $\Sigma$ with respect to the induced metric from the sample. Our method is grounded in the notions of Carré du Champ for diffusion semi-groups, the theory of empirical processes, and local Principal Component Analysis.

The first author was partially supported by a postdoctoral fellowship of the National Science Foundation, award No. DMS-1204742.
The second author was partially supported by NSF Grant DMS-1438359.

1. Introduction

In this paper we are concerned with the structure of large data sets in high dimensions. Even though we deal with sets of points in $\mathbb{R}^N$ for $N$ large, a common assumption when studying large data sets is that the points lie on, or in the vicinity of, an embedded low-dimensional submanifold of $\mathbb{R}^N$. This assumption is often called the manifold assumption, and the study of geometric and topological properties of sets satisfying it is what is now called manifold learning. The interest in the structure of large data sets comes from the need to organize information arising from many different sources, for example images, signals, genomes and other outcomes. Even though there has been significant progress on the manifold learning problem in the last decade, a fundamental problem remains open: to construct an algorithm for learning, or effectively estimating, the curvature of a manifold that is being approximated by a point cloud. In this paper we lay the theoretical foundation for estimating the Ricci curvature of an embedded submanifold of $\mathbb{R}^N$ when one only knows a point cloud approximating the submanifold. In particular, combining our results with the recent work of Singer and Wu [12] and with PCA (Principal Component Analysis), we offer a construction which takes a point cloud and generates geometric information as follows.

  • Input: A sequence of points $\xi_1, \dots, \xi_n \in \mathbb{R}^N$ and a scale parameter $t$.

  • Output: For each point $\xi_i$, an approximate basis for a tangent space at $\xi_i$, and a Ricci curvature matrix approximating the Ricci curvature with respect to this basis.

    If the points are sampled randomly from a smooth submanifold, then for $n$ large and $t$ small there will be an orthogonal connection matrix approximating the connection between the tangent spaces of nearby points.

There is a choice of kernel and of PCA cutoff parameter, which may affect the output but will not affect the limit as $n \to \infty$ when the points are sampled from a smooth submanifold.
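The tangent-space part of the output can be illustrated with a weighted local PCA. This is a minimal sketch under assumptions, not the construction of [12]: the function name `local_pca_basis`, the Gaussian weighting, and the use of a known dimension `d` in place of a PCA cutoff are all choices made here for illustration.

```python
import numpy as np

def local_pca_basis(points, i, t, d):
    """Estimate an orthonormal basis of the tangent space at points[i].

    points : (n, N) array of samples in R^N
    t      : scale; neighbours are weighted by exp(-|x - y|^2 / (2 t))
    d      : assumed intrinsic dimension (hypothetical input; a cutoff on
             the singular values could be used instead)
    """
    x = points[i]
    diffs = points - x                                   # (n, N) displacements from x
    w = np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * t))  # Gaussian weights
    # The top-d right singular vectors of the weighted displacement matrix
    # span the best-fitting d-plane through x.
    _, s, vt = np.linalg.svd(np.sqrt(w)[:, None] * diffs, full_matrices=False)
    return vt[:d].T, s                                   # (N, d) basis, singular values
```

For points on a circle in a coordinate plane, the returned vector at a sample point should be close to the tangent direction of the circle there, up to sign. The decay of the singular values `s` is what a cutoff parameter would inspect when the dimension is not known in advance.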

Some advantages of our method are the following:

  • It generates a weighted Ricci curvature which takes into account the underlying probability density.

  • We do not need to approximate the derivatives of the underlying metric.

  • It generates an approximate Ricci curvature even without assuming that the manifold has a constant dimension.

  • Our method allows one to study the convergence of the sample version of the Ricci curvature to its actual value based on extrinsic information like the reach of the submanifold (see Definition 3.10).

  • When the connection forms are available, it allows one to approximate the Hodge Laplacian on 1-forms.

As we will see, our method is based on the fact that it is possible to estimate the rough Laplacian of the induced metric of an embedded submanifold $\Sigma$ of $\mathbb{R}^N$. Given an embedding $F : \Sigma \to \mathbb{R}^N$, the metric $g$ induced by $F$ is given in coordinates by $g_{ij} = \langle \partial_i F, \partial_j F \rangle$, where $\langle \cdot, \cdot \rangle$ is the Euclidean inner product in $\mathbb{R}^N$ and $\partial_i$ means differentiation with respect to some coordinates on $\Sigma$. By the rough Laplacian of $g$ we mean the operator defined on functions by $\Delta_g f = g^{ij} \nabla_i \nabla_j f$, where $\nabla$ is the Levi-Civita connection of $g$.

Belkin and Niyogi showed in [4] that given a uniformly distributed point cloud on $\Sigma$ there is a 1-parameter family of operators $L_t$ which converge to the Laplace-Beltrami operator $\Delta_g$ on the submanifold. More precisely, the construction of the operators $L_t$ is based on an approximation of the heat kernel of $\Delta_g$, and in particular the parameter $t$ can be interpreted as a choice of scale. In order to learn the rough Laplacian from a point cloud it is necessary to write a sample version $\hat{L}_t$ of the operators $L_t$. Then, supposing we have $n$ data points that are independent and identically distributed (abbreviated i.i.d.), one can choose a scale $t_n$ in such a way that the operators $\hat{L}_{t_n}$ converge almost surely to the rough Laplacian $\Delta_g$. This step follows essentially from applying a quantitative version of the law of large numbers. Thus one can almost surely learn spectral properties of a manifold.

While in [4] it is assumed that the sample is uniform, it was proved by Coifman and Lafon in [5] that if one assumes more generally that the distribution of the data points has a smooth, strictly positive density on $\Sigma$, then it is possible to normalize the operators in [4] to recover the rough Laplacian. More generally, the results in [5] and [12] show that it is possible to recover a whole family of operators that includes the Fokker-Planck operator and the weighted Laplacian $\Delta_\rho$ associated to the smooth metric measure space $(M, g, e^{-\rho}\, d\mathrm{vol})$, where $\rho$ is a smooth function. As the Bakry-Emery Ricci tensor can be obtained by iterating $\Delta_\rho$, it can be approximated by iterating approximations of $\Delta_\rho$. Following [4], Singer and Wu have recently developed methods for learning the rough Laplacian of an embedded submanifold on 1-forms using Vector Diffusion Maps (VDM) (see for example [12]).

Our interest in the Ricci curvature is not arbitrary, but is motivated by very concrete problems in applied mathematics. One example is the following: estimating the spectrum of the Hodge Laplacian on 1-forms with respect to the induced metric of an embedded submanifold $\Sigma$ of $\mathbb{R}^N$. This is important in the study of topological properties of $\Sigma$. Using Singer and Wu’s analysis based on Vector Diffusion Maps, there has been significant progress in the estimation of the spectrum of the so-called connection or rough Laplacian on 1-forms, but unfortunately there does not yet seem to be an effective way to estimate the Hodge Laplacian of an embedded manifold. We remark that an effective algorithm for learning the Ricci curvature of an embedded submanifold could in principle provide us with a method for estimating the Hodge Laplacian on 1-forms, in view of the Weitzenböck formula. More precisely, given a metric $g$ on $\Sigma$ and a 1-form $\omega$ we know that

$$\Delta_H \omega = \nabla^* \nabla \omega + \mathrm{Ric}(\omega),$$

where $\nabla^* \nabla$ is the (non-negative) rough Laplacian and $\mathrm{Ric}(\omega)$ is the Ricci endomorphism applied to $\omega$, i.e., $\mathrm{Ric}(\omega)_i = R_i{}^{j} \omega_j$. In terms of linear algebra, if we have taken $n$ points from a $d$-dimensional manifold, with $n$ sufficiently large, PCA will give us a basis for the tangent space at each point, in which case a 1-form can be described as a vector in $\mathbb{R}^{nd}$, that is, by choosing a $d$-vector for each point. The Hodge Laplacian then becomes a map

$$\mathbb{R}^{nd} \to \mathbb{R}^{nd},$$

which one can analyze. In the current paper we do not attempt to prove spectral convergence; as one can see from [13], this is expected to be somewhat involved.

Another motivation for the problem of learning the Ricci curvature of a submanifold is related to problems in biology. Given a point cloud drawn from a submanifold, one can formally think of the point cloud as a graph approximating the submanifold. In [14], a connection has been established between the Ricci curvature of graphs and the robustness of cancer networks. Moreover, it has been suggested that the robustness of cancer networks is associated to a certain “entropy” and that the Ricci curvature of a graph is closely related to such entropy. The notion of Ricci curvature used in [14] is Ollivier’s coarse Ricci curvature; however, it is well known that the actual evaluation of Ollivier’s Ricci curvature requires explicit computation of Wasserstein distances, which can be a somewhat costly linear program. Our method gives an explicit formula that can be evaluated directly on the data points without having to compute any Wasserstein distances (although we make no claims about the relative computational cost of our procedure).

It is the goal of this paper, together with [2] and [1], to demonstrate that the approximation of the rough Laplacian discussed above can be continued to approximate the Ricci curvature as well. In fact, our approximation method is based on writing sample counterparts of the Ricci curvature. More generally, we will show that one can define sample counterparts of more general objects, for example of the notions of Carré du Champ and iterated Carré du Champ associated to a diffusion semi-group. Our idea for estimating the Carré du Champ (and ultimately the Ricci curvature) from a sample is closely related to the results in [2, 1]. For example, in [1] we define a family of coarse Ricci curvatures which depend on a scale parameter $t$ and show that when taken on a smooth embedded submanifold of Euclidean space, these recover the Ricci curvature as $t \to 0$. We will show that as long as we sample points adequately from the submanifold $\Sigma$, it is possible to choose a scale depending only on the size of the data set (equal to $n$) to obtain almost sure convergence to the actual Ricci curvature of the submanifold at a given point. We will summarize the results in [2, 1] relevant to the present article in Section 2 (for example Theorem 2.1 and Proposition 2.2).

1.1. Background and Definitions

In this section we recall (cf. [2]) how Ricci curvature on general metric spaces can be constructed from an operator, in particular the infinitesimal generator of a diffusion semi-group. When the space is a metric measure space, we use a family of operators $L_t$ which are intended to approximate a Laplace operator on the space at scale $t$. As this definition holds on metric measure spaces constructed from sampling points from a manifold, we can define an empirical or sample version of the Ricci curvature at a given scale $t$. As mentioned above, this last construction will have an application to the Manifold Learning Problem: it will serve to predict the Ricci curvature of an embedded submanifold of $\mathbb{R}^N$ if one only has a point cloud on the manifold and the distribution of the sample has a smooth positive density.

1.1.1. Carré du champ

We now recall how Bakry and Emery [3] related the notion of Carré du Champ to Ricci curvature. Let $P_t$ be a 1-parameter family of operators of the form

(1.1)    $P_t f(x) = \int_M f(y)\, p_t(x, dy),$

where $f$ is a bounded measurable function defined on $M$ and $p_t(x, dy)$ is a non-negative kernel. We assume that $P_t$ satisfies the semi-group property, i.e.

(1.2)    $P_t \circ P_s = P_{t+s},$
(1.3)    $P_0 = \mathrm{Id}.$

In $\mathbb{R}^n$, an example of $P_t$ is Brownian motion, defined by the density

(1.4)

If now $P_t$ is a diffusion semi-group defined on $M$, we let $L$ be the infinitesimal generator of $P_t$, which is densely defined in $L^2$ by

(1.5)    $L f = \lim_{t \to 0} \frac{P_t f - f}{t}.$

We consider a bilinear form $\Gamma$ which was introduced in potential theory by J.P. Roth [10] and in probability theory by Kunita [8], and which measures the failure of $L$ to satisfy the Leibniz rule. This bilinear form is defined as

(1.6)    $\Gamma(f, h) = \frac{1}{2} \bigl( L(fh) - f\, L h - h\, L f \bigr).$

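When $L$ is the rough Laplacian $\Delta_g$ with respect to the metric $g$, then

(1.7)    $\Gamma(f, h) = \langle \nabla f, \nabla h \rangle_g.$

We will also consider the iterated Carré du Champ introduced by Bakry and Emery, denoted by $\Gamma_2$ and defined by

(1.8)    $\Gamma_2(f, h) = \frac{1}{2} \bigl( L\, \Gamma(f, h) - \Gamma(f, L h) - \Gamma(L f, h) \bigr).$

Note that if we restrict our attention to the case $L = \Delta_g$, the Bochner formula yields

(1.9)    $\Gamma_2(f, f) = \frac{1}{2} \Delta_g |\nabla f|^2 - \langle \nabla f, \nabla \Delta_g f \rangle = |\nabla^2 f|^2 + \mathrm{Ric}(\nabla f, \nabla f).$

We observe immediately that if $\nabla f(x) = v$ and $\nabla^2 f(x) = 0$, one can recover the Ricci tensor via

(1.10)    $\mathrm{Ric}(v, v) = \Gamma_2(f, f)(x).$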
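As a short worked check of (1.7), the product rule for the Laplacian can be expanded directly. The sketch below assumes the standard definition $\Gamma(f,h) = \tfrac12\bigl(L(fh) - f\,Lh - h\,Lf\bigr)$ with $L = \Delta_g$; the symbols $f$, $h$ and $\Gamma$ are notation chosen here for illustration.

```latex
\begin{align*}
\Delta_g(fh) &= f\,\Delta_g h + h\,\Delta_g f + 2\,\langle \nabla f, \nabla h\rangle_g, \\
\Gamma(f,h)  &= \tfrac12\bigl(\Delta_g(fh) - f\,\Delta_g h - h\,\Delta_g f\bigr)
              = \langle \nabla f, \nabla h\rangle_g .
\end{align*}
```

In flat $\mathbb{R}^n$ with the linear function $f(x) = \langle a, x \rangle$ this gives $\Gamma(f,f) = |a|^2$, which is constant, so $\Gamma_2(f,f) = \tfrac12 \Delta |a|^2 - \langle a, \nabla \Delta f \rangle = 0$. This agrees with the Bochner formula, since both the Hessian of $f$ and the Ricci tensor of flat space vanish.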

1.1.2. Approximations of the Laplacian, Carré du Champ and its iterate.

Following [4] and [5], we recall how to construct operators which can be thought of as approximations of the Laplacian on metric measure spaces. Consider a metric measure space $(X, d, \mu)$ with a Borel $\sigma$-algebra such that $\mu(X) < \infty$. Given $t > 0$, let $\theta_t(x)$ be given by

(1.11)

We define a 1-parameter family of operators $L_t$ as follows: given a function $f$ on $X$ let

(1.12)

With respect to this $L_t$ one can define a Carré du Champ on appropriately integrable functions by

(1.13)    $\Gamma_t(f, h) = \frac{1}{2} \bigl( L_t(fh) - f\, L_t h - h\, L_t f \bigr),$

which simplifies to

(1.14)
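The simplification from (1.13) to (1.14) does not depend on the particular kernel. As a sketch, assume only that the operator has the form $L_t f(x) = \int_X \bigl(f(y) - f(x)\bigr)\, k_t(x,y)\, d\mu(y)$ for some kernel $k_t$ (a generic placeholder introduced here, standing in for the normalized Gaussian weight) and that the Carré du Champ is $\tfrac12\bigl(L_t(fh) - f\,L_t h - h\,L_t f\bigr)$, written $\Gamma_t(f,h)$ below. Then, at a fixed point $x$:

```latex
\begin{align*}
2\,\Gamma_t(f,h)(x)
 &= \int_X \bigl[f(y)h(y) - f(x)h(x)\bigr]\, k_t(x,y)\, d\mu(y) \\
 &\quad - f(x)\int_X \bigl[h(y) - h(x)\bigr]\, k_t(x,y)\, d\mu(y)
        - h(x)\int_X \bigl[f(y) - f(x)\bigr]\, k_t(x,y)\, d\mu(y) \\
 &= \int_X \bigl[f(y) - f(x)\bigr]\bigl[h(y) - h(x)\bigr]\, k_t(x,y)\, d\mu(y).
\end{align*}
```

In particular $\Gamma_t(f,f) \ge 0$ whenever $k_t \ge 0$, mirroring the identity $\Gamma(f,f) = |\nabla f|^2$ in the smooth case.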

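In a similar fashion we define the iterated Carré du Champ of $L_t$ to be

(1.15)    $\Gamma_{2,t}(f, h) = \frac{1}{2} \bigl( L_t\, \Gamma_t(f, h) - \Gamma_t(f, L_t h) - \Gamma_t(L_t f, h) \bigr).$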
Remark 1.1.

This definition of $L_t$ differs from the Belkin-Niyogi operator in that we normalize by $\theta_t(x)$ instead of by a power of $t$ determined by an assumed manifold dimension $d$. This allows our general discussion to fit into the framework of spaces with a lower Ricci curvature bound, for example the disjoint union of two manifolds of different dimensions, or a sequence of manifolds which may be collapsing.

1.1.3. Empirical Carré du Champ at a given scale

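We can also define empirical versions of the Carré du Champ and its iterate. On a space which consists of $n$ points $\xi_1, \dots, \xi_n$ sampled from a manifold, it is natural to consider the empirical measure defined by

$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^n \delta_{\xi_i},$$

where $\delta_{\xi_i}$ is the atomic point measure at the point $\xi_i$ (also called the $\delta$-mass). For any function $f$ we will use the notation

$$\hat{\mu}_n f = \frac{1}{n} \sum_{i=1}^n f(\xi_i).$$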

Notation 1.2.

We will use the “hat” notation (for example $\hat{L}_t$) to distinguish those operators, measures, or $t$-densities that have been constructed from a finite sample of points.

To be more precise, we define the operator as

(1.16)

where

(1.17)

and of course

(1.18)

The sample version of Carré du Champ will be the bilinear form which from (1.14) takes the form

(1.19)

We denote the iterated Carré du Champ corresponding to by , and by this we mean

(1.20)
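The sample operators of this subsection can be sketched numerically on the sample points themselves. The code below is an illustration under stated assumptions rather than a transcription of (1.16)-(1.20): it assumes the Gaussian kernel $e^{-|x-y|^2/2t}$, the normalization $2/(t\,\hat{\theta}_t)$ for the sample Laplacian (chosen so that the population operator approximates the Laplacian in flat space), and a uniform sample. All function names are hypothetical.

```python
import numpy as np

def kernel_matrix(points, t):
    """W[i, j] = exp(-|xi_i - xi_j|^2 / (2 t)) for an (n, N) array of samples."""
    diff = points[:, None, :] - points[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * t))

def L_hat(W, t, f):
    """Sample operator applied to the vector f of values f(xi_j), at every xi_i."""
    theta = W.mean(axis=1)                       # empirical normalisation
    return 2.0 * (W @ f / len(f) - theta * f) / (t * theta)

def gamma_hat(W, t, f, h):
    """Sample Carre du Champ of f and h at every sample point."""
    theta = W.mean(axis=1)
    df = f[None, :] - f[:, None]                 # df[i, j] = f(xi_j) - f(xi_i)
    dh = h[None, :] - h[:, None]
    return np.mean(W * df * dh, axis=1) / (t * theta)

def gamma2_hat(W, t, f, h):
    """Iterated sample Carre du Champ, built from L_hat and gamma_hat."""
    return 0.5 * (L_hat(W, t, gamma_hat(W, t, f, h))
                  - gamma_hat(W, t, f, L_hat(W, t, h))
                  - gamma_hat(W, t, L_hat(W, t, f), h))
```

Here `f` and `h` are vectors of function values at the sample points, so every quantity is evaluated on the point cloud only. The dense kernel matrix costs $O(n^2)$ memory, which is a simplification made for readability; a truncated or sparse kernel would be the natural refinement for large samples.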

1.2. Statement of Results

1.2.1. Applications to Manifold Learning

We now show how our notion of empirical Carré du Champ at a given scale has applications to the Manifold Learning Problem. For the rest of subsection 1.2.1 we will consider a closed, smooth, embedded submanifold of , and the metric measure space will be , where

  • is the distance function in the ambient space ,

  • is the volume element corresponding to the metric induced by the embedding of into .

In addition we will adopt the following conventions

  • All operators , and will be taken with respect to the distance and the measure .

  • All sample versions , and are taken with respect to the ambient distance .

The choice of the above metric measure space is consistent with the setting of manifold learning, in which no assumption on the geometry of the submanifold is made. In particular, we have no a priori knowledge of the geodesic distance, and therefore we can only hope to use the chordal distance as a reasonable approximation for the geodesic distance. We will show that while our construction at a scale $t$ involves only information from the ambient space, the limit as $t$ tends to $0$ will recover the Ricci curvature of the submanifold. As pointed out by Belkin-Niyogi [4, Lemma 4.3], the squared chordal and intrinsic distance functions on a smooth submanifold first disagree at fourth order near a point, so while much of the analysis is done on submanifolds, the intrinsic geometry will be recovered in the limit.
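The fourth-order agreement between chordal and geodesic distance can be checked by hand on the unit circle, where two points at arc length $s$ are at chordal distance $2\sin(s/2)$. The snippet below is only a numerical illustration of that special case; the function name `chord_sq` is introduced here.

```python
import math

def chord_sq(s):
    """Squared chordal (ambient) distance between two points of the unit
    circle separated by arc length s."""
    return (2.0 * math.sin(s / 2.0)) ** 2

for s in (0.4, 0.2, 0.1):
    gap = s ** 2 - chord_sq(s)         # geodesic^2 minus chordal^2
    print(s, gap, gap / s ** 4)        # the last column approaches 1/12 as s -> 0
```

Since $4\sin^2(s/2) = s^2 - s^4/12 + O(s^6)$, the two squared distances agree through third order, and the leading discrepancy is of order $s^4$.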

We now address the problem of choosing a scale, depending on the size of the data and the dimension of the submanifold, such that the sequence of empirical Ricci curvatures corresponding to the size of the data converges almost surely to the actual Ricci curvature of $\Sigma$ at a point. In order to simplify the presentation of our results, we start by stating the simplest possible case, which corresponds to a uniformly distributed i.i.d. sample. The more general case of distributions with strictly positive density with respect to the Lebesgue measure is the subject of a forthcoming paper of the two authors.

Theorem A (Approximation of the Ricci Curvature).

Consider the metric measure space where is a smooth closed embedded submanifold. Suppose that we have a uniformly distributed i.i.d. sample of points from For let

(1.21)

For there exists a sequence of -tuples of orthogonal vectors, and Ricci matrices representing the Ricci curvature on these vectors, such that if , then

(1.22)

where are the components of the vector projected onto the -tuple of vectors approximating the tangent plane.

As mentioned above, we will see that the sequence of tangent spaces can be constructed using a method known as local Principal Component Analysis (local PCA). Since this construction is a crucial part of the article [12], we will devote Section 4 to a fairly detailed explanation of the construction. Even though the approximation method used to obtain Theorem A is inspired by the notion of Coarse Ricci curvature introduced by the authors in [2, 1], Theorem A relies heavily on a precise estimation of the tangent space at a point by means of local PCA, as opposed to the approximation proposed in [2, 1] based on the construction of an “auxiliary tangent space” by taking segments within the point cloud. One of the advantages of the use of local PCA is that we may drop the assumption that the sample points lie on the submanifold and instead assume that they lie in a vicinity of $\Sigma$, and still obtain a satisfactory approximation of the Ricci curvature at a point, whereas it is not clear if using [2, 1] we can obtain a reasonable approximation of the Ricci curvature when the points lie off the manifold.

Besides local PCA, the proof of Theorem A relies heavily on the following approximation result for the iterated Carré du Champ:

Theorem B.

Consider the metric measure space where is a smooth closed embedded submanifold. For let

(1.23)
  1. If , then

  2. If is the class of linear functions where is the ambient distance in , then

The proof of Theorem B requires ideas from the theory of empirical processes, for which we will provide the necessary background in Section 3. As pointed out earlier in this introduction, since we are interested in recovering an object from its sample version, we are forced to consider a law of large numbers in order to obtain convergence in probability or almost surely. The problem is that the sample version of the iterated Carré du Champ involves a high correlation between the data points, destroying independence and any hope of applying a law of large numbers directly. The idea then is to reduce the convergence of the sample version to the application of a uniform law of large numbers to certain classes of functions. Theorem B is proved in Section 3. In Section 4 we will prove that Theorem B indeed implies Theorem A. This will require results from [1].

1.2.2. Smooth Metric Measure Spaces and non-Uniformly Distributed Samples

Consider a smooth metric measure space and let be the operator

In [5], the authors consider a family of operators which converge to this operator. Note that a standard computation (cf. [16, Page 384]) gives

We adapt [5] to our setting: Recall that

and define, for

(1.24)

We can define the operator

(1.25)

and again obtain bilinear forms and . For the rest of the section we will consider the metric measure space where is an embedded submanifold, is the ambient distance and is a smooth function in . We again take all the operators and and their sample counterparts and with respect to the data of .

Based on estimates in [2] and calculations similar to the proof of Theorem B, we can prove

Theorem C (Non-uniform case).

Consider the metric space where is a smooth closed embedded submanifold. Suppose that we have an i.i.d. sample of points from whose common distribution has density . For let

Then, for any and any there exists a sequence of functions constructed from the data such that

and where

The proof of Theorem C will be the subject of a forthcoming paper of the authors.

1.3. Final remarks

Our results show that one can give a definition of a sample version of Ricci curvature at a scale on general metric measure spaces that converges to the actual Ricci curvature on smooth Riemannian manifolds. Moreover, our definition of empirical coarse Ricci curvature at a scale can be thought of as an extension of Ricci curvature to a class of discrete metric spaces, namely those obtained from sampling points from a smooth closed embedded submanifold of $\mathbb{R}^N$. Note, however, that in order to obtain convergence of the empirical coarse Ricci curvature at a scale to the actual Ricci curvature we need to assume that there is a manifold which fits the distribution of the data. Recently, Fefferman-Mitter-Narayanan in [6] have developed an algorithm for testing the hypothesis that there exists a manifold which fits the distribution of a sample. A problem that remains open, however, is how to estimate the dimension of a submanifold from a sample of points.

In another vein, there is much current interest in a converse problem: the development of algorithms for generating point clouds on manifolds or even on surfaces. Recently, there has been progress in this direction by Karcher-Palais-Palais in [7], specifically on methods for generating point clouds on implicit surfaces using Monte Carlo simulation and the Cauchy-Crofton formula.

1.4. Organization of the paper

This paper is devoted to proving Theorems A and B. In Section 2 we summarize some of the results in [1]. The core of the paper is Section 3, devoted to the proof of Theorem B. In Section 4 we review the construction of local PCA in [12] and show how we can combine this construction with Theorem B to prove Theorem A.

1.5. Acknowledgements

The authors would like to thank Amit Singer, Hau-Tieng Wu and Charles Fefferman for constant encouragement. The first author would like to express gratitude to Adolfo Quiroz for very useful conversations on the topic of empirical processes, and to Richard Palais for bringing his work to the attention of both authors. The second author would like to thank Jan Maas for useful conversations, and Matthew Kahle for stoking his interest in the topic.

2. Summary of previous results. Proof of Theorem A

We now recall the following result proved in [1]:

Theorem 2.1 (See [1]).

Let be a closed embedded submanifold, let be the Riemannian metric induced by the embedding, and let be the metric measure space defined with respect to the ambient distance. Given any where is the class of functions

there exists a constant depending on the geometry of and the function such that

A fundamental step in proving Theorem 2.1 is the following proposition shown in [1]. For simplicity we will assume that the submanifold has unit volume. Recall the definitions (1.11), (1.12), (1.13), (1.14) and (1.15).

Proposition 2.2 (See [1]).

Suppose that is a closed, embedded, unit volume submanifold of . Let also be the metric induced by the embedding of into . For any in and for any functions in we have

(2.1)
(2.2)
(2.3)
(2.4)

and

(2.5)

where each is a locally defined function, which is smooth in its arguments, and is a locally defined -jet of the function . Also, each is a locally defined function of which is uniformly bounded in terms of its arguments.

We will show in Section 4 that Theorem 2.1 and Proposition 2.2 are needed to prove Theorem A.

2.1. Life-Sized Coarse Ricci Curvature

As mentioned above, in [2, 1] the authors have formulated a notion of Coarse Ricci Curvature alternative to Ollivier’s Coarse Ricci curvature. The purpose of this section is to formulate the results of this paper in terms of the notions developed in [2, 1]. In particular, we show how Ricci curvature can be approximated using test functions different from the linear functions in Theorem A. For any we define

(2.6)

We also define for the following function

(2.7)
Definition 2.3.

Given an operator we define the coarse Ricci curvature for as

(2.8)

In principle, the functions in (2.6) and (2.7) serve as a substitute of the linear functions in Theorem A (see also Section 4). Notice that the quantity in (2.8) is the same order as distance squared. To obtain a quantity that does not vanish near the diagonal, we use (2.6):

Definition 2.4.

Given an operator we define the life-sized coarse Ricci curvature for as

(2.9)

From this, one can define notions of empirical coarse Ricci curvature by taking the sample versions of (2.8) and (2.9).

Inspired by Theorems A and B, the results at the end of Section 3 will lead easily to the following.

Corollary A.

Let be an embedded submanifold and consider the metric measure space . Suppose that we have an i.i.d. uniformly distributed sample drawn from . Let

for any . Then

In other words, there is a choice of scale depending on the size of the data and the dimension of the submanifold for which the corresponding empirical life-sized coarse Ricci curvatures converge almost surely to the life-sized coarse Ricci curvature.

The proof is given at the end of Section 3.

Remark 2.5.

The convergence is better if is chosen to go to zero slower than in (3.100). In particular, if one replaces with an upper bound on , then Theorem B and Corollary A still hold.

Another result proved in [1] is

Corollary 2.6.

With the hypotheses of Theorem 2.1 we have

We note that the relation between the coarse Ricci curvature and the Ricci curvature is as follows.

Proposition 2.7 (See [1]).

Suppose that is a smooth Riemannian manifold. Let with  Then

3. Empirical Processes and Convergence. Proof of Theorem B

The goal of this section is to prove Theorem B. This will be done using tools from the theory of empirical processes in order to establish uniform laws of large numbers in a sense that we will explain in Sections 3.2 through 3.6. For a standard reference in the theory of empirical processes, see [15]. See also [13] for further applications of the theory of empirical processes to the recovery of diffusion operators from a sample.

3.1. Estimators of the Carré Du Champ and the Iterated Carré Du Champ in the uniform case

Let us assume that the measure is the volume measure . Recall that our formal definition of the Carré du Champ of with respect to the uniform distribution is given by

(3.1)

It is clear from (3.1) that a sample estimator of the Carré Du Champ at a point is given by

(3.2)

and recall that we defined the -Laplace operator by

(3.3)

and its sample version is

(3.4)

Recall that the iterated Carré du Champ is

(3.5)

For simplicity, we will evaluate at a pair instead of and by symmetry it is clear that we obtain

(3.6)

Combining the sample versions of and we obtain a sample version for

(3.7)
(3.8)
(3.9)
(3.10)

In principle, the convergence analysis for (3.7)-(3.10) can be done using the following standard result in large deviation theory.

Lemma 3.1 (Hoeffding’s Lemma).

Let be i.i.d. random variables on the probability space where is the Borel -algebra of , and let be a Borel measurable function with . Then for the corresponding empirical measure and any we have
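For orientation, the standard two-sided Hoeffding inequality for a function taking values in $[-K, K]$ reads $\Pr\{|\hat{\mu}_n f - \mu f| \ge \varepsilon\} \le 2 e^{-n\varepsilon^2/(2K^2)}$; it is assumed here that this is the form intended in the lemma. The sketch below compares that bound with a small simulation for Uniform$[-1,1]$ variables. The function names, the choice of distribution and the seed are illustrative assumptions.

```python
import math
import random

def hoeffding_bound(n, eps, K):
    """Two-sided Hoeffding bound for the mean of n i.i.d. variables in [-K, K]."""
    return 2.0 * math.exp(-n * eps ** 2 / (2.0 * K ** 2))

def empirical_tail(n, eps, trials, seed=0):
    """Fraction of trials in which the sample mean of n Uniform[-1, 1]
    draws deviates from its expectation 0 by at least eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n
        hits += abs(mean) >= eps
    return hits / trials
```

The bound applies to a single fixed function $f$. The sample Carré du Champ and its iterate evaluate functions that themselves depend on the data, which is why a uniform version over a class of functions is needed.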

Observe, however, that (3.7)-(3.10) is a non-linear expression which will involve non-trivial interactions between the data points . This non-trivial interaction between the points will produce a loss of independence and we will not be able to apply Hoeffding’s Lemma directly to (3.7)-(3.10). In order to address this difficulty we will establish several uniform laws of large numbers which will provide us with a large deviation estimate for (3.7)-(3.10).

Remark 3.2.

We will not use directly the expression (3.7)-(3.10), instead we will write (3.7)-(3.10) schematically in the form

(3.11)

which is clearly equivalent to (3.7)-(3.10).

3.2. Glivenko-Cantelli Classes

A Glivenko-Cantelli class of functions is essentially a class of functions for which a uniform law of large numbers is satisfied.

Definition 3.3.

Let be a fixed probability distribution defined on . A class of functions of the form is Glivenko-Cantelli if

  1. for any ,

  2. For any i.i.d. sample drawn from whose distribution is we have uniform convergence in probability in the sense that for any

    (3.12)
Remark 3.4.

Note that in general we have to consider outer probabilities instead of because the class may not be countable and the supremum may not be measurable. On the other hand, if the class is separable in , then we can replace by . While all of the classes we will encounter in this paper will be separable in we use when we deal with a general class.

Let be a class of functions defined on and totally bounded in . Given we let be the -covering number of , i.e.,

(3.13)
Lemma 3.5.

Let be an equicontinuous class of functions in that satisfies for some . Then for any distribution which is absolutely continuous with respect to the class is -Glivenko-Cantelli. Moreover, if is an i.i.d. sample drawn from with distribution we have

(3.14)
Proof.

By equicontinuity of , it follows from the Arzelà-Ascoli theorem that is precompact in the norm and hence totally bounded in . In particular for every , the number is finite. Let be a finite class such that the union of all balls with center in and radius covers and . For any there exists such that and we obtain

(3.15)

and clearly

(3.16)

Fixing and choosing we observe that

(3.17)

and by Hoeffding’s inequality we have

(3.18)

which implies the lemma. ∎
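The way the covering number enters the final estimate can be sketched as a union bound. The constants below are illustrative assumptions (a net $\mathcal{G} \subset \mathcal{F}$ of radius $\varepsilon/3$ with $|\mathcal{G}| = \mathcal{N}(\mathcal{F}, \varepsilon/3)$, functions bounded by $K$, and Hoeffding's inequality in the form $\Pr\{|\hat{\mu}_n g - \mu g| \ge \delta\} \le 2e^{-n\delta^2/(2K^2)}$); the symbols $\mathcal{F}$, $\mathcal{G}$, $\mathcal{N}$ and the resulting constants are not taken from (3.14)-(3.18) and may differ from them.

```latex
\begin{align*}
&\text{For } f \in \mathcal{F} \text{ pick } g \in \mathcal{G} \text{ with } \|f - g\|_\infty \le \varepsilon/3: \\
&\qquad |\hat{\mu}_n f - \mu f|
   \le |\hat{\mu}_n f - \hat{\mu}_n g| + |\hat{\mu}_n g - \mu g| + |\mu g - \mu f|
   \le |\hat{\mu}_n g - \mu g| + \tfrac{2\varepsilon}{3}, \\
&\Pr\Bigl\{\sup_{f \in \mathcal{F}} |\hat{\mu}_n f - \mu f| > \varepsilon\Bigr\}
   \le \sum_{g \in \mathcal{G}} \Pr\Bigl\{|\hat{\mu}_n g - \mu g| \ge \tfrac{\varepsilon}{3}\Bigr\}
   \le 2\,\mathcal{N}\bigl(\mathcal{F}, \tfrac{\varepsilon}{3}\bigr)\, e^{-n\varepsilon^2/(18K^2)} .
\end{align*}
```

The right-hand side tends to zero as $n \to \infty$ for each fixed $\varepsilon$ because the covering number is finite and does not depend on $n$.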

Recall that as in the statement of Theorem A, we have defined the space of functions with bounded Lipschitz semi-norm in . To be clear, we will be using the following norms and semi-norms:

(3.19)
(3.20)

where

i.e., the norm is the norm in the ambient space , and

(3.21)