Reconstructing Functions from Random Samples

Abstract.

From a sufficiently large point sample lying on a compact Riemannian submanifold of Euclidean space, one can construct a simplicial complex which is homotopy-equivalent to that manifold with high confidence. We describe a corresponding result for a Lipschitz-continuous function between two such manifolds. That is, we outline the construction of a simplicial map which recovers the induced maps on homotopy and homology groups with high confidence using only finite sampled data from the domain and range, as well as knowledge of the image of every point sampled from the domain. We provide explicit bounds on the size of the point samples required for such reconstruction in terms of intrinsic properties of the domain, the co-domain and the function. This reconstruction is robust to certain types of bounded sampling and evaluation noise.

Mathematics Subject Classification: 55U05, 55U10, 55U15, 62-07.

Key words and phrases: Homology, Homotopy, Nonlinear maps.

1. Introduction

The use of algebraic topological methods for the analysis of nonlinear data has become a subject of considerable interest with a wide variety of promising applications [3, 5, 7, 10]. With modern technology, large and high-dimensional datasets are easily collected. However, in many cases the data are generated by a nonlinear system with many fewer degrees of freedom than the ambient dimension, and thus one may expect that the data actually lie on a much lower dimensional manifold. In this setting, it seems reasonable that the geometry generated by the data may provide insight concerning the original system. Given that such data are typically finite and noisy, relatively crude invariants – such as homology or homotopy groups – appear to be natural tools for capturing aspects of their underlying topological structure. The same heuristic arguments extend to transformations from one dataset to another. In such analysis, one is interested not only in the geometry associated with the data, but also in the action of an unknown nonlinear process on the data. This action may be partially characterized by the maps which it induces on homology and homotopy groups.

Given these arguments, an obvious but mostly unresolved mathematical issue is to quantify the extent to which we can extract correct topological features from noisy data. An important first step in this direction is the following result due to Partha Niyogi, Steve Smale, and Shmuel Weinberger [13]. Let denote Euclidean -dimensional space with the standard metric, and consider a compact -dimensional Riemannian submanifold . The condition number of is defined as follows: is the largest positive real number such that for any , the normal bundle of radius about can be embedded in .

Theorem 1.1.

([13, Thm 3.1]) Let be a compact -dimensional Riemannian submanifold of with condition number . Given

  1. some probability parameter ,

  2. a radius , and

  3. a finite set of independent and identically distributed (i.i.d.) uniformly sampled points,

let denote the union of -dimensional open -balls centered at the points in . If the sample size is larger than a bounding value (see Definition 3.1), then is homotopy-equivalent to with probability exceeding .

Thus, the union of balls of a suitably chosen radius around a sufficiently large point sample suffices to recover the homotopy type of that manifold with high confidence. From a computational perspective, recall that the nerve of a cover [12, 15] is the abstract simplicial complex where each -dimensional simplex corresponds to an intersection of sets of that cover. If we let denote the nerve corresponding to the cover of by its constituent open balls, then one obtains an isomorphism between the singular homology of and the simplicial homology of by using the nerve lemma. Thus, it is possible to successfully compute the homology of an unknown manifold with high probability from a sufficiently dense point sample .

An obvious next step is to obtain bounds on the probability of reconstructing, up to homotopy, a continuous function between Riemannian manifolds from images of dense samples. This is the focus of our work with the main result as follows.

Theorem 1.2.

Let and be compact Riemannian submanifolds with condition numbers and respectively and let be a Lipschitz continuous function with Lipschitz constant bounded above by some . Given

  1. probability parameters ,

  2. radii and satisfying , and

  3. finite sets and of independent and identically distributed (i.i.d.) uniformly sampled points,

let and be nerves of the covers generated by open balls of radius and around and respectively. If and , then there exists a simplicial map which

  1. recovers the homotopy class of with probability exceeding , and

  2. can be explicitly constructed using only , , , , and the restriction .

In particular, the morphisms induced by on the simplicial homology as well as homotopy groups of the nerves faithfully capture their singular counterparts induced by with high confidence. Note from Theorem 1.1 that the upper bound on the probability of failing to produce is no larger than the corresponding bound on failing to reconstruct both and from the sample sets and .

The third hypothesis in Theorem 1.1, which requires the sampled points to lie on the underlying manifold, is too strong from the perspective of practical sampling considerations. A more reasonable hypothesis is that the data are noisy and lie near – rather than on – the respective underlying manifolds. This situation is also considered in [13]. Using their framework, we show that Theorem 1.2 can be extended to this more general setting (see Theorem 5.6).

The rest of the paper is organized as follows. In Section 2 we mention relevant definitions and tools from combinatorial algebraic topology. Section 3 describes results from [13] that are used in our work. The simplicial reconstruction of functions and its verification are presented in Section 4. Section 5 demonstrates the robustness of this reconstruction by providing a version of Theorem 1.2 which holds in the case of bounded sampling and evaluation noise. Our argument relies on a controlled version of the nerve lemma whose proof is described in Section 6.

2. Carriers and Nerves

For the sake of completeness and to introduce relevant notation, we review some classical results from the theory of simplicial complexes. A much more complete treatment is available in standard texts, for instance [12, 15].

Let $V$ be any finite set whose elements we call vertices. A simplicial complex $K$ with vertex set $V$ is a collection of nonempty subsets of $V$ – called simplices – which contains all the vertices and is closed under inclusion. More precisely, a collection $K$ of nonempty subsets of $V$ is a simplicial complex if

  • for each $v \in V$, we have $\{v\} \in K$, and

  • if $\sigma \in K$ and $\emptyset \neq \sigma' \subseteq \sigma$, then $\sigma' \in K$.

The dimension of each simplex $\sigma \in K$ is the natural number given by $\dim \sigma = \#\sigma - 1$, where $\#$ indicates cardinality. A subcomplex of $K$ is a sub-collection of simplices which forms a simplicial complex in its own right. The notation $K_d$ is used to indicate all simplices of dimension $d$ in $K$. Without loss of generality, we may identify $K_0$ with $V$ by associating each $0$-dimensional simplex with the unique element of $V$ which it contains. Thus, each simplex is uniquely determined by its constituent vertices. Given simplicial complexes $K$ and $L$, a simplicial map $\phi: K \to L$ associates to each vertex $v \in K_0$ a vertex $\phi(v) \in L_0$ so that for each simplex $\sigma \in K$ the image $\phi(\sigma)$ is a simplex in $L$.
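The two axioms of a simplicial complex and the defining property of a simplicial map are easy to check mechanically on finite data. The following Python sketch does so under the assumption that simplices are given as sets of hashable vertex labels and that a simplicial map is given as a dictionary on vertices; the function names are illustrative and are not taken from [12, 15].

```python
from itertools import combinations


def is_simplicial_complex(vertices, simplices):
    """Check both axioms: every vertex appears as a singleton simplex, and
    every nonempty subset of a simplex is again a simplex."""
    vertex_set = set(vertices)
    K = {frozenset(s) for s in simplices}
    # Simplices must be nonempty subsets of the vertex set.
    if any(not s or not s <= vertex_set for s in K):
        return False
    if any(frozenset([v]) not in K for v in vertex_set):
        return False
    # Closure under inclusion: test all proper nonempty faces.
    return all(
        frozenset(face) in K
        for s in K
        for k in range(1, len(s))
        for face in combinations(s, k)
    )


def is_simplicial_map(vertex_map, K, L):
    """Check that the vertex-wise image of every simplex of K is a simplex of L."""
    target = {frozenset(s) for s in L}
    return all(frozenset(vertex_map[v] for v in s) in target for s in K)
```

For instance, collapsing two vertices of a hollow triangle onto one endpoint of an edge passes the second check, since the image of each simplex is either a vertex or the whole edge.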

2.1. Carriers

The geometric realization of a simplicial complex is defined (in [15, Ch. 3.1], for instance) as the space of all maps called barycentric functions such that for each ,

  1. there exists with , and

  2. the sum equals .

The realization of a simplex is defined as the closed subset consisting of all barycentric functions such that whenever . Observe that if then ; if then is contractible in ; and .

A simplicial map induces a continuous function between geometric realizations defined as follows. For any , the action of the barycentric function on a vertex is given by

It is readily seen from this definition that for each simplicial map and each .

Let be any topological space and a simplicial complex. A contractible carrier from to assigns to each simplex a contractible subset of so that whenever . A function is carried by if for each simplex . The following result is an extremely useful tool in combinatorial algebraic topology. We refer the reader to [1] and the references therein for details.

Lemma 2.1 (Carrier Lemma).

Let be a simplicial complex, a topological space and a contractible carrier. Then, there exists a continuous function from to carried by . Moreover, if two continuous functions are carried by , then

  1. they are homotopic, i.e., , and

  2. a homotopy may be chosen so that for each the section is also carried by .

2.2. Nerves

Let be a topological space equipped with a finite cover consisting of subsets of . The nerve of is the simplicial complex with vertex set where each subcollection constitutes a simplex if and only if the intersection is a non-empty subset of . We call this intersection the support of and denote it by . On the other hand, we also make use of the union . When considering an entire subcomplex rather than a single simplex, one defines to be the union of supports of all simplices in . It is easy to see that , and that .

A cover of a topological space is called contractible if the support of each is a contractible subset of . For instance, if lies in a topological vector space and if each is convex, then all non-empty intersections are automatically convex and hence contractible. One reason to consider contractible nerves is the following classical result (see [2] or [11, Thm. 15.21]).

Lemma 2.2 (Nerve Lemma).

Let be a paracompact topological space equipped with an open cover . If is contractible, then the geometric realization of its nerve is homotopy-equivalent to .

Since we always restrict our attention to finite covers by open balls in Euclidean space, we may obtain the following controlled version of the nerve lemma by strengthening its hypotheses.

Lemma 2.3.

Let be a finite collection of open balls in Euclidean space and let be their union. Then,

  1. is homotopy-equivalent to , and

  2. a homotopy-equivalence may be chosen so that for each simplex .

Note that the first conclusion of Lemma 2.3 follows easily from the traditional nerve lemma; it is the second assertion which plays a fundamental role in the proofs of our main results. A detailed verification of the controlled nerve lemma is presented in Section 6.
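For a cover by open balls of a common radius, membership in the nerve can be decided from the centers alone: finitely many such balls share a point exactly when the smallest ball enclosing their centers has radius strictly less than the common radius. The Python sketch below uses this elementary fact to list the simplices of dimension at most two. It is an illustration only; the names `enclosing_radius` and `ball_nerve` are hypothetical, and the algorithm of [6] for general nerves of ball covers is not reproduced here.

```python
from itertools import combinations
from math import dist, sqrt


def enclosing_radius(points):
    """Radius of the smallest ball containing 1, 2 or 3 points."""
    if len(points) == 1:
        return 0.0
    if len(points) == 2:
        return dist(points[0], points[1]) / 2
    a, b, c = sorted(dist(p, q) for p, q in combinations(points, 2))
    # Right, obtuse or degenerate triangle: the longest edge is a diameter.
    if a * a + b * b <= c * c:
        return c / 2
    # Acute triangle: circumradius = abc / (4 * area), with the area from Heron.
    s = (a + b + c) / 2
    area = sqrt(s * (s - a) * (s - b) * (s - c))
    return a * b * c / (4 * area)


def ball_nerve(centers, radius, max_dim=2):
    """Simplices (as increasing index tuples) of the nerve of the open balls
    of the given radius about the centers, up to dimension max_dim <= 2."""
    if max_dim > 2:
        raise ValueError("this sketch handles simplices of dimension <= 2")
    nerve = []
    for size in range(1, max_dim + 2):
        for idx in combinations(range(len(centers)), size):
            if enclosing_radius([centers[i] for i in idx]) < radius:
                nerve.append(idx)
    return nerve
```

Applied to the four corners of a unit square with radius 0.6, the sketch returns four vertices and the four side edges but no diagonals or triangles, so the nerve is a combinatorial circle.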

3. Recovering Manifolds from Samples

Our main result focuses on recovering – up to homotopy type – a Lipschitz continuous function between finitely sampled unknown compact Riemannian submanifolds of Euclidean space. In order to accomplish this, we first use the finite sampled data to construct a simplicial complex homotopically faithful to the underlying manifold. In this section, we briefly survey the process from [13] which constructs such a simplicial complex.

Throughout this section, let be a compact Riemannian submanifold with condition number .

Definition 3.1.

The bounding function is given by

(1)

where

and denotes the volume of the standard -dimensional Euclidean ball of radius .

Let denote the -dimensional Euclidean open ball of radius centered at . For any subset of and , we denote by the set of open balls and let be their union. We say that is -dense in the manifold if we have the inclusion . The following proposition enables one to recover the homotopy type of from a finite set which is sufficiently dense in relative to .
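When the manifold is the unit circle in the plane, the density condition can be checked exactly, because the point of the circle farthest from the sample is the midpoint of the largest angular gap. The Python sketch below takes the covering radius as an explicit argument `radius`, so it makes no assumption about how that radius is normalized relative to the density parameter; the function name is illustrative.

```python
from math import pi, sin


def is_dense_on_circle(angles, radius):
    """True if every point of the unit circle lies at Euclidean distance
    strictly less than `radius` from some sample point (given by its angle)."""
    if not angles:
        return False
    a = sorted(t % (2 * pi) for t in angles)
    gaps = [hi - lo for lo, hi in zip(a[:-1], a[1:])]
    gaps.append(2 * pi - a[-1] + a[0])  # gap across the angle 0
    worst = max(gaps) / 2  # angular distance from the worst point to the sample
    return 2 * sin(worst / 2) < radius  # chord length subtending that angle
```

With eight equally spaced sample points the worst chord length is 2 sin(π/16), roughly 0.39, so the sample covers the circle with balls of radius 0.4 but not with balls of radius 0.35.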

Proposition 3.2.

([13, Prop 3.1]) Assume and that a finite set is -dense in . Then, the canonical projection map defined by

(2)

is a strong deformation-retraction.

As a corollary to this proposition one obtains a string of isomorphisms on homology:

The first isomorphism comes from the fact that deformation retractions preserve homotopy type and homology is a homotopy-invariant. The second isomorphism results from applying the nerve lemma: since is a convex cover of for each , the associated nerve is contractible. The last isomorphism is simply the equivalence of singular and simplicial homology. We remark here that the nerve of a cover by balls [6] and the homology groups of finite simplicial complexes [4, 9] are eminently computable. Thus, one can actually obtain a finite representation of the homology of up to isomorphism from a sufficiently dense point sample.
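To illustrate the computability remark, the following Python sketch computes Betti numbers of a finite simplicial complex by Gaussian elimination on boundary matrices. As a simplifying assumption it works with coefficients in the field with two elements, which avoids orientation signs; it is not the algorithm of [4, 9], and the function name is hypothetical.

```python
def betti_numbers_mod2(simplices):
    """Betti numbers with Z/2 coefficients of a finite simplicial complex,
    given as a collection of vertex sets that is closed under taking faces."""
    K = {frozenset(s) for s in simplices}
    top = max(len(s) for s in K) - 1
    by_dim = [[s for s in K if len(s) == d + 1] for d in range(top + 1)]

    def boundary_rank(d):
        """Rank over Z/2 of the boundary map from d-chains to (d-1)-chains."""
        if d == 0 or d > top:
            return 0
        index = {s: i for i, s in enumerate(by_dim[d - 1])}
        # Encode the boundary of each d-simplex as an integer bitmask.
        cols = [sum(1 << index[s - {v}] for v in s) for s in by_dim[d]]
        pivots = {}  # leading bit -> reduced column with that leading bit
        for col in cols:
            while col:
                high = col.bit_length() - 1
                if high not in pivots:
                    pivots[high] = col
                    break
                col ^= pivots[high]  # cancel the leading bit and continue
        return len(pivots)

    ranks = [boundary_rank(d) for d in range(top + 2)]
    # dim H_d = dim C_d - rank(boundary_d) - rank(boundary_{d+1})
    return [len(by_dim[d]) - ranks[d] - ranks[d + 1] for d in range(top + 1)]
```

On the boundary of a triangle this yields one connected component and one independent loop, as expected for a circle.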

Adopting the terminology of the proposition, we observe an important property of the projection map which holds even when is allowed to range over the larger interval . Note that for each we have some with . On the other hand, the distance between and is at most as well, since is the nearest point of the manifold to . By the triangle inequality, one has the following estimate for each and whenever :

(3)

This is not the best possible estimate one could obtain, but it suffices for our purposes here.

The following proposition assumes that is obtained by uniform i.i.d. sampling on and provides a lower bound on the sample size which guarantees – with high confidence – the -density needed by the previous proposition.

Proposition 3.3.

([13, Prop 3.2]) Choose and the probability parameter . Assume that is obtained by i.i.d. uniform samplings from . If , then is -dense in with probability exceeding .

Propositions 3.2 and 3.3 lead directly to Theorem 1.1, which is the main result of [13].

4. Recovering Functions from Samples

In this section we provide a proof of our main result, Theorem 1.2. The hypotheses of this theorem consist of a variety of assumptions and a priori choices of parameters. To clarify their respective roles we present them via the following exhaustive list. The notation below will remain fixed throughout this section.

  • Cnd: Assume that and are compact Riemannian submanifolds with condition numbers and , respectively.

  • Prb: Choose probability parameters .

  • Lip: Assume is a Lipschitz continuous function with Lipschitz constant less than . More precisely, we require for any pair of points .

  • Rad: Choose the radii and so that .

  • Smp: Assume knowledge of the finite sets and obtained by i.i.d. uniform sampling from and respectively. Furthermore, require and .

  • Img: Assume knowledge of the restriction of to the point sample .
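The Lipschitz bound in the list above is an a priori assumption and cannot be certified from finitely many evaluations. Sampled data only yield a lower bound on the true Lipschitz constant, which can serve as a sanity check on a proposed upper bound. The Python sketch below computes that lower bound; the function name and the representation of points as coordinate tuples are illustrative choices.

```python
from itertools import combinations
from math import dist


def empirical_lipschitz_lower_bound(points, images):
    """Largest ratio |f(x) - f(x')| / |x - x'| over pairs of distinct sampled
    points, where images[i] is the known value of f at points[i]."""
    best = 0.0
    for (x, fx), (y, fy) in combinations(list(zip(points, images)), 2):
        d = dist(x, y)
        if d > 0:  # skip repeated sample points
            best = max(best, dist(fx, fy) / d)
    return best
```

Any admissible upper bound on the Lipschitz constant must be at least the value returned here.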

It is important to note that neither Smp nor Img implies that sampled points map to sampled points, so in general . Since Rad fixes choices of and , we simplify our notation by declaring that and . Similarly, we denote the unions of balls and by and . Finally, define by

(4)

Note that by Rad, we have .

By Cnd, Prb, Rad and Smp, Theorem 1.1 establishes that the vertical maps in the diagram below induce isomorphisms on homotopy with probability exceeding :

Here, is the canonical projection map (2) and is the inclusion of into the union of balls . Let be the composition . With probability exceeding , the map induced by on homotopy is naturally related by isomorphisms to the corresponding map induced by .

Definition 4.1.

A simplicial reconstruction of is defined to be any simplicial map so that for each .

We know from Lemma 2.3 that it is possible to find continuous maps for which induce isomorphisms on homotopy and satisfy for each simplex . For any choice of such maps, the next proposition establishes that the following diagram commutes up to homotopy whenever is a simplicial reconstruction of :

Proposition 4.2.

If is a simplicial reconstruction of , then and share a contractible carrier and hence are homotopic.

Proof.

For each , define . Note that implies since the latter is the union over a superset. The -image of each is contractible in , being a union of convex sets (in our case, balls in Euclidean space) with a non-empty intersection. Thus, is a contractible carrier. Given any , we have

On the other hand,

Thus, both and are carried by . ∎

With the goal of building a simplicial reconstruction in mind, we consider the correspondence defined on each by

(5)

Proposition 4.3.

With probability exceeding , the following holds. For each , the set is non-empty.

Proof.

Since Cnd, Prb, Rad and Smp satisfy the hypotheses of Proposition 3.3 for , we see that is -dense in with probability exceeding . This density suffices to guarantee a non-empty for each in the following way. For any there exists some with . Since as a consequence of Rad and (4), we have . ∎

Let denote the map taking each simplex to its vertex set which corresponds to (the centers of) those balls whose non-empty intersection determines the support of . Note that in if and only if we have the inclusion . Define similarly. For each , define by

(6)

Proposition 4.4.

With probability exceeding , the following holds. For each , any non-empty subset of determines a simplex in .

Proof.

By Cnd, Prb, Rad and Smp, Proposition 3.3 holds for . Therefore, with probability exceeding we are guaranteed that is sufficiently dense in so that the canonical projection map from (2) induces homotopy-equivalence. Assuming this density, pick and recall that by definition. Setting , we note from (3) that for each . By Lip, for each such we have:

(7)

Given , by (6) there is some so that , whence by (5). Since (7) implies , the triangle inequality yields

Since the point lies in the intersection , this intersection is non-empty and must determine a simplex of . Clearly, any subset of determines a face of this simplex, and hence constitutes a simplex in its own right. ∎

Proposition 4.3 guarantees with probability exceeding that is nonempty for each and so we may choose a selector function so that for each such . By definition of and (4), we have

(8)

Proposition 4.4 guarantees with probability exceeding that for each , the collection determines a simplex of . Therefore, with probability exceeding , there exists a map of points which induces a simplicial map . The following proposition demonstrates that the homotopy type of the induced simplicial map is independent of the choice of .

Proposition 4.5.

Given any pair of selectors, the maps and from to are homotopic.

Proof.

For each , we know that is a simplex of by Proposition 4.4. Moreover, if then by (6) and therefore . For any , note that since and are faces of , both and are subsets of . Now , being the realization of a single simplex, is contractible. Therefore, and share that contractible carrier which associates each to . ∎

For any selector , the induced map is a simplicial reconstruction of . To see this, note that for any we can apply (8) to each element of in order to conclude . Combining this with Proposition 4.2 concludes the proof of Theorem 1.2.
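The construction of this section can be sketched in a few lines of Python. The sketch assumes that the correspondence in (5) collects those range samples lying within a fixed distance of the known image of a domain sample; that distance is passed in as a hypothetical parameter `rho` standing in for the quantity defined in (4). The nearest-point selector is one arbitrary choice among many, which Proposition 4.5 permits, and the nerves are supplied as lists of index tuples.

```python
from math import dist


def candidate_sets(images, Y, rho):
    """For each domain sample (represented by its known image), the indices
    of the range samples within distance rho of that image."""
    return [[j for j, y in enumerate(Y) if dist(fx, y) < rho] for fx in images]


def nearest_selector(images, Y, rho):
    """Choose one candidate per domain sample: the nearest range sample.
    Raises ValueError if some candidate set is empty, which signals that the
    range sample is not dense enough for this choice of rho."""
    selector = []
    for fx, cands in zip(images, candidate_sets(images, Y, rho)):
        if not cands:
            raise ValueError("no range sample within rho of an image point")
        selector.append(min(cands, key=lambda j: dist(fx, Y[j])))
    return selector


def induces_simplicial_map(selector, nerve_X, nerve_Y):
    """Check that the vertex map i -> selector[i] sends each simplex of the
    domain nerve to a simplex of the range nerve."""
    target = {frozenset(s) for s in nerve_Y}
    return all(frozenset(selector[i] for i in s) in target for s in nerve_X)
```

Under the hypotheses of Theorem 1.2 the final check is expected to succeed with high probability; in practice it also serves as a test of whether the chosen radii and sample sizes are adequate.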

5. Robustness to Bounded Noise

As indicated in the Introduction, we would like to extend the results of the previous section to the case where data samples are noisy. That is, we assume that the sampled points lie close to, rather than on, the underlying manifolds. Such sampling discrepancies – aside from being ubiquitous in experimental data – cascade into imprecise knowledge of the images under an unknown function, particularly if the evaluation of that function is also subject to some inherent measurement errors. This generalization requires a framework to describe the noise, and so we adopt the model of [13, Sec. 7]. For any subset of Euclidean space and any , define the tubular neighborhood of radius around as follows:

Definition 5.1.

Given a subset and some , a probability measure on is called -conditioned about if

  1. the support of is contained in , and

  2. for each , there exists some constant so that

    where denotes the -dimensional open ball of radius about .

We write to denote the constant .

As in the noiseless case, the following fundamental results concerning sampling of manifolds are reproduced from [13]. Let be a compact Riemannian submanifold of with condition number . For define the functions

and note that when the quantity under the square root is strictly positive. It is straightforward to check that this positivity holds for . Pick such an and assume that is a finite set lying in . For each , let be the nerve generated by open balls of radius about the points in and let be their union. The following result is the noisy analogue of Proposition 3.2.

Proposition 5.2.

([13, Prop 7.1]) Assume that is -dense in for some and choose a radius satisfying . Then, the canonical projection map as defined in (2) is a strong deformation retraction.

Recall that given , the -covering number of – denoted – is defined to be the minimum possible satisfying the following property: there exists some finite set of cardinality such that the collection of -dimensional open balls covers . Given an -conditioned probability measure about and a probability parameter , define the new bounding function as follows:
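Covering numbers are rarely known exactly, but an upper bound for a finite point cloud is easy to produce. The Python sketch below is a greedy heuristic, offered as an illustration rather than as part of the framework of [13]: it scans the points once and opens a new ball whenever a point is not yet covered. The result is a valid cover of the given points, so its size bounds their covering number from above. It says nothing directly about the covering number of the underlying manifold.

```python
from math import dist


def greedy_cover(points, radius):
    """Greedily choose centers among `points` so that the open balls of the
    given radius about the centers contain every point; returns the centers."""
    centers = []
    for p in points:
        # p becomes a new center only if no existing ball already contains it.
        if all(dist(p, c) >= radius for c in centers):
            centers.append(p)
    return centers
```

The greedy count can exceed the true minimum: for the ten integer points 0 through 9 on a line and radius 1.5 it selects five centers, whereas four balls centered off the sample suffice.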

(9)

The next result replaces Proposition 3.3 in the setting of conditioned noise.

Proposition 5.3.

([13, Prop 7.2]) Assume that the real numbers and satisfy and . Let be any -conditioned probability measure about and assume that a finite set is drawn from in i.i.d. fashion with respect to . Given a parameter , if then is -dense in with probability exceeding .

Combining the preceding propositions yields the main result of [13] as adapted for conditioned noise.

Theorem 5.4.

([13, Thm 7.1]) Let be a compact Riemannian submanifold with condition number . Fix and choose a radius satisfying . Assume that is an -conditioned probability measure about and that is a finite set obtained by -i.i.d. sampling. If for some , then strong deformation retracts onto with probability exceeding .

The introduction of sampling noise requires the following modifications to our assumptions and choices.

  • Lip’: Assume that is a Lipschitz-continuous function whose Lipschitz constant is bounded above by some satisfying .

  • Nse: Choose positive noise bounds where and . Assume that is a -conditioned probability measure about .

  • Rad’: Choose radii satisfying for so that

    (10)

  • Smp’: Assume knowledge of the finite sets and obtained by i.i.d.  and sampling respectively. We require and .

  • Img’: Assume knowledge of so that for each , where is the canonical projection map from (2) and satisfies the following bound:

    (11)

The assumptions Cnd and Prb of Section 4 remain unchanged. As usual, we simplify notation by dropping the fixed quantities and from various subscripts. Thus, the nerve and the union are denoted by and respectively, and similar simplifications are made for the analogues. Note that in the assumption Img’ we allow evaluation noise. That is, we only assume knowledge of the true image of each up to a distance of .

The inequality (10) is a constraint that involves the Lipschitz constant of , the models for conditioned noise, and the radii for the nerves. It guarantees that the restriction (11) is always positive. The following result provides conditions on the manifolds, the noise models and the function under which (10) can be satisfied.

Proposition 5.5.

If , then there exist valid choices of and which satisfy (10).

Proof.

First, we check that on the domain imposed by Nse. Recall that by Rad’, and consider the following function

This function has no local maximum in its domain and attains a maximum value of at the left endpoint, so as desired. Since (10) imposes a lower bound of on , it suffices to show that the over-estimate of this lower bound is smaller than the upper bound imposed on by Rad’. Equivalently, we must show that . Observe that the function

has no local minima on the domain imposed by Nse and attains a minimum value of at the right endpoint. Thus, it is possible to satisfy (10) if . ∎

The main result of this section is the following theorem.

Theorem 5.6.

Assume Cnd, Lip’, Nse, Rad’, Prb, Smp’, and Img’. Then, with probability exceeding there exists a simplicial map which

  1. is a simplicial reconstruction of , and

  2. can be explicitly constructed using only , , , , and .

For the most part, the proof of this theorem is analogous to that of Theorem 1.2. Aside from Proposition 5.5, the only major modification is that the domain’s radius is augmented by the noise bound whereas the range’s radius is diminished by the corresponding bound . The following quantity plays the role of from (4).

(12)

Define as follows: for each ,

(13)

The noisy analogue of Proposition 4.3 is as follows.

Proposition 5.7.

With probability exceeding , the following holds. For each , the set is non-empty.

Proof.

Since Cnd, Prb, Nse, Rad’ and Smp’ satisfy the hypotheses of Proposition 5.3 for with probability measure , the sampled set is -dense in with probability exceeding . Assume that this density holds. Now, by Img’ and there exists some with by the assumed density of . By the triangle inequality, , and hence . ∎

As before, we define maps for taking each simplex to its vertex set. For each , define

(14)

Proposition 5.8.

With probability exceeding , the following is true. For each , any non-empty subset of determines a simplex in .

Proof.

By Cnd, Prb, Rad’, Nse and Smp’, Proposition 5.3 holds for and the probability measure . Therefore, with probability exceeding we are guaranteed that is -dense in via Proposition 5.3 and hence that is a strong deformation retraction via Proposition 5.2. Assuming this density, note that the distance from any to its nearest neighbor in is at most , since is -conditioned about by Nse. Similarly, given any , we have . For any and , we have by definition, so we may make the following estimate:

Write , and observe by Lip’ that

By (11) we know , so we obtain