
Distance Functions, Critical Points, and the Topology of Random Čech Complexes

Omer Bobrowski (omer@math.duke.edu), Department of Mathematics, Duke University, 120 Science Dr., Durham, NC 27708, USA
Robert J. Adler (robert@ee.technion.ac.il), Department of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
Abstract

For a finite set of points $\mathcal{P}$ in $\mathbb{R}^d$, the function $d_{\mathcal{P}}:\mathbb{R}^d\to\mathbb{R}$ measures Euclidean distance to the set $\mathcal{P}$. We study the number of critical points of $d_{\mathcal{P}}$ when $\mathcal{P}$ is a Poisson process. In particular, we study the limit behavior of $N_k$, the number of critical points of $d_{\mathcal{P}}$ with Morse index $k$, as the density of points grows. We present explicit computations for the normalized limiting expectations and variances of the $N_k$, as well as distributional limit theorems. We link these results to recent results in [16, 17] in which the Betti numbers of the random Čech complex based on $\mathcal{P}$ were studied.

Keywords: Distance function, critical points, Morse index, Čech complex, Poisson process, central limit theorem, Betti numbers

Mathematics Subject Classification: 60D05, 60F05, 60G55, 55U10, 58K05

O.B. was supported in part by the Adams Fellowship Program of the Israel Academy of Sciences and Humanities and FP7-ICT-318493-STREP, TOPOSYS. R.A. was supported in part by AFOSR FA8655-11-1-3039 and ERC 2012 Advanced Grant 20120216, URSAT.

1 Introduction

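For a finite set of points $\mathcal{P}$ in $\mathbb{R}^d$, of size $n$, let $d_{\mathcal{P}}:\mathbb{R}^d\to\mathbb{R}$ be the distance function for $\mathcal{P}$, so that

$$d_{\mathcal{P}}(x) := \min_{p\in\mathcal{P}}\|x-p\|, \qquad x\in\mathbb{R}^d, \tag{1.1}$$

where $\|\cdot\|$ denotes the Euclidean distance.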

The main results of this paper provide considerable information about the asymptotic (in $n$) behavior of the critical points (defined below) of $d_{\mathcal{P}}$ when $\mathcal{P}$ is random. The critical points are interesting in their own right. In addition, knowledge of their behavior has immediate implications (via Morse theory) for the study of the topology of Čech complexes built over random point sets.

Throughout, we shall concentrate on the situation in which the points in $\mathcal{P}$ are those of a non-homogeneous Poisson process $\mathcal{P}_n$ with intensity $nf$, where $f$ is a probability density on $\mathbb{R}^d$. The mean number of points is therefore $n$. Virtually identical results hold when $\mathcal{P}$ is made up of $n$ independent samples from $f$, and proofs in this situation can be found in the PhD thesis [6].

Most of what we shall have to say will concentrate on the distance function in neighborhoods of radius $r_n$ around $\mathcal{P}_n$, when $n\to\infty$ and $r_n\to 0$. Our main results give expressions for the normalized, asymptotic means and variances of $N_{k,n}$, the number of critical points with index $k$ appearing within distance $r_n$ from $\mathcal{P}_n$. They also include various distributional limit results. The limit distributions are of different kinds. Depending on delicate relationships between $n$, $r_n$, and $k$, they may be Gaussian, Poisson, or deterministic, and they also exhibit a range of critical phenomena. Since probability theory uses several notions of convergence, Appendix A provides definitions of the ones we need. Our main results on critical points are described in detail in Section 3. Before stating them, however, we must describe precisely how to define the critical points of the distance function, along with their indices. The difficulty lies in the fact that the distance function is not everywhere differentiable. We do this in the following section.

In Section 4 we shall discuss the relationship between $N_{k,n}$ and the Betti numbers of a special simplicial complex, the Čech complex, based on $\mathcal{P}_n$. The homology of the Čech complex is closely related to the neighborhood set, or $r$-tube, around $\mathcal{P}$,

$$U(\mathcal{P},r) := \bigcup_{p\in\mathcal{P}} B_r(p), \tag{1.2}$$

where $B_r(p)$ is the $d$-ball of radius $r$ around $p$. What we shall see in Section 4 is that, if $r_n$ is too small, then the individual balls in (1.2) will generally fail to intersect. The topology will then be approximately that of a large number of disjoint points. This is occasionally referred to as the “dust” or “sparse” regime, although there does not yet seem to be a universally accepted term. If $r_n$ decays too slowly, then the balls will connect and the topology of $U(\mathcal{P}_n,r_n)$ will be that of a single ball. At the (phase) transition, where $nr_n^d\to\lambda\in(0,\infty)$, $U(\mathcal{P}_n,r_n)$ will have a percolative-like structure. We therefore call this the percolation phase; it is also known as the “thermodynamic limit”. Each of these phases exhibits different limit behavior. Even more subtle differences are possible within phases, depending on interactions between parameters.

Translating our results about critical points into statements about the (algebraic) topological structure of $U(\mathcal{P}_n,r_n)$, as $n\to\infty$, will also allow us to compare them to other results currently in the literature (primarily [16, 17]). The one comment that we already make at this stage, however, is that we can provide a much richer set of results for the asymptotic behavior of numbers of critical points than is currently available for the Betti numbers of these Čech complexes. Indeed, we can also provide some topological results via critical points that are not yet available with a direct topological approach. For example, we are able to compute properties of the Euler characteristic $\chi_n$ of the Čech complex built over $\mathcal{P}_n$ with radius $r_n$. We can show (see Corollary 4.2 for details) that there exist functions $\gamma_k$ such that

$$\lim_{n\to\infty}\frac{\mathbb{E}\{\chi_n\}}{n} = \begin{cases}1, & nr_n^d\to 0,\\ 1+\sum_{k=1}^d(-1)^k\gamma_k(\lambda), & nr_n^d\to\lambda\in(0,\infty),\\ 0, & nr_n^d\to\infty.\end{cases}$$

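Moreover, suppose $f$ is lower bounded with convex support and $nr_n^d\ge C\log n$ for some sufficiently large $C$ (cf. Proposition 3.9). Then the Euler characteristic of the Čech complex equals $1$ with probability tending to one.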

The remainder of the paper contains the proofs of the results in Sections 3 and 4. These are organized in a number of sections and appendices so as to make them as user friendly as possible. Many of the proofs rely on techniques in the theory of random geometric graphs as developed in [23].

Finally, a few words on motivation. There is considerable current interest in studying random structures, such as graphs and simplicial complexes, from a topological and homological point of view. Some recent references are [3, 5, 7, 10, 18, 24], with two reviews, from different aspects, in [1] and [12]. Many of these papers find their raison d’être in essentially statistical problems, in which data generates these structures. An important example appears in the papers [22, 21]. They show that the homology of an unknown manifold can be recovered, with high probability, by looking at the homology of the union of balls around the points of random samples, with or without additional noise. Equivalently, one can look at the homology of the Čech complex generated by the sampling points on the manifold. The homological theme of these papers considers manifolds as being ‘close’ if their homologies are the same. This seems particularly promising when the manifold of interest is embedded in a space of much higher dimension than itself, i.e. in dimension reduction problems and in manifold learning.

The approach adopted in this paper shares the motivation of the others listed above, but as already noted, by adopting a Morse theoretic point of view based on critical points of the distance function, obtains a more internally complete theory. Further, as mentioned above and shown later, it often gives some information on global topological invariants, such as Betti numbers. However, being based on critical points, this approach is naturally limited in its ability to reveal the full picture about the global topological invariants of random complexes.

Acknowledgements. We thank Shmuel Weinberger for introducing us to this problem. We also thank Matthew Strom Borman, Yuliy Baryshnikov, Matthew Kahle, and Shmuel for many useful discussions in the earlier stages of our work.

2 Critical Points of the Distance Function

Critical points of smooth functions have been studied since the earliest days of calculus. They took on significant additional importance following the development of Morse theory (e.g. [19, 20]), which tied them closely to the homologies of manifolds; we discuss this briefly in Section 4. At this point we note the following. Let $M$ be a nice (closed, differentiable) $m$-dimensional manifold, and $\varphi:M\to\mathbb{R}$ a nice (Morse) function. A point $c\in M$ is called a critical point if $\nabla\varphi(c)=0$. A non-degenerate critical point is one for which the Hessian matrix $H_\varphi(c)$ is non-singular. The Morse index of a non-degenerate critical point $c$ is then the number of negative eigenvalues of $H_\varphi(c)$. These points, along with their indices, provide one of the main links between differential and algebraic topology.

Classical Morse theory does not directly apply to the distance function, mainly because it is not everywhere differentiable. However, when the set $\mathcal{P}$ is finite, one can still define a notion of non-degenerate critical points for the distance function $d_{\mathcal{P}}$, as well as their Morse index. It turns out that, even in this case, knowledge of the critical points and their indices allows one to deduce topological properties of the related Čech complexes. We shall see how to do this in Section 4, but for now we need some definitions. Our arguments follow from the results presented in [11]. The distance function served as the main motivation in [11], but the results there are given in the more general context of ‘min-type’ functions. Here, we specialize those results to the case of the distance function.

Given a finite set of points $\mathcal{P}\subset\mathbb{R}^d$, and defining the distance function (1.1), we start with the local (and global) minima of $d_{\mathcal{P}}$. These are the points of $\mathcal{P}$ (where $d_{\mathcal{P}}=0$), and we call them critical points with index $0$. For higher indices, we have the following definition.

Definition 2.1.

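A point $c\in\mathbb{R}^d$ is a critical point of $d_{\mathcal{P}}$ with index $1\le k\le d$ if there exists a subset $\mathcal{Y}$ of $k+1$ points in $\mathcal{P}$ such that:

  1. $\forall y\in\mathcal{Y}$: $d_{\mathcal{P}}(c)=\|c-y\|$, and, $\forall p\in\mathcal{P}\setminus\mathcal{Y}$, we have $\|c-p\|>d_{\mathcal{P}}(c)$.

  2. The points in $\mathcal{Y}$ are in general position (i.e. the $k+1$ points of $\mathcal{Y}$ do not lie in a $(k-1)$-dimensional affine space).

  3. $c\in\operatorname{conv}^\circ(\mathcal{Y})$, where $\operatorname{conv}^\circ(\mathcal{Y})$ is the interior of the convex hull of $\mathcal{Y}$ (an open $k$-simplex in this case).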

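The first condition implies that $d_{\mathcal{P}}\equiv d_{\mathcal{Y}}$ in a small neighborhood of $c$. The second condition implies that the points in $\mathcal{Y}$ lie on a unique $(k-1)$-dimensional sphere. We shall use the following notation:

$$S(\mathcal{Y}) := \text{the unique $(k-1)$-dimensional sphere containing } \mathcal{Y}, \tag{2.1}$$
$$C(\mathcal{Y}) := \text{the center of } S(\mathcal{Y}) \text{ in } \mathbb{R}^d, \tag{2.2}$$
$$R(\mathcal{Y}) := \text{the radius of } S(\mathcal{Y}), \tag{2.3}$$
$$B(\mathcal{Y}) := \text{the open ball in } \mathbb{R}^d \text{ with radius } R(\mathcal{Y}) \text{ centered at } C(\mathcal{Y}). \tag{2.4}$$

Note that $S(\mathcal{Y})$ is a $(k-1)$-dimensional sphere, whereas $B(\mathcal{Y})$ is a $d$-dimensional ball. Obviously, $S(\mathcal{Y})\subset\overline{B(\mathcal{Y})}$, but unless $k=d$, $S(\mathcal{Y})$ is not the boundary of $B(\mathcal{Y})$. The critical point $c$ in Definition 2.1 is equidistant from all the points in $\mathcal{Y}$, so $c=C(\mathcal{Y})$. Thus, we say that $c$ is the unique index $k$ critical point generated by the $k+1$ points in $\mathcal{Y}$. The last statement can be rephrased as follows: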

Lemma 2.2.

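A subset $\mathcal{Y}\subset\mathcal{P}$ of $k+1$ points in general position generates an index $k$ critical point if, and only if, the following two conditions hold:

  (CP1) $C(\mathcal{Y})\in\operatorname{conv}^\circ(\mathcal{Y})$,

  (CP2) $\mathcal{P}\cap B(\mathcal{Y})=\emptyset$.

Furthermore, the critical point is $C(\mathcal{Y})$ and the critical value is $R(\mathcal{Y})$.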

Figure 1 depicts the generation of an index $2$ critical point in $\mathbb{R}^2$ by subsets of $3$ points. We shall also be interested in critical points that are within distance $r$ from $\mathcal{P}$, i.e. $R(\mathcal{Y})\le r$. This adds a third condition,

  (CP3) $R(\mathcal{Y})\le r$.

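Figure 1: Generating a critical point of index $2$ in $\mathbb{R}^2$ (i.e. a maximum point). The small blue disks are the points of $\mathcal{P}$. We examine three subsets of $\mathcal{P}$: $\mathcal{Y}_1$, $\mathcal{Y}_2$, and $\mathcal{Y}_3$. The dashed circles are the $S(\mathcal{Y}_i)$, whose centers are the $C(\mathcal{Y}_i)$. The shaded balls are the $B(\mathcal{Y}_i)$, and the interiors of the triangles are the $\operatorname{conv}^\circ(\mathcal{Y}_i)$. (1) Both $C(\mathcal{Y}_1)\in\operatorname{conv}^\circ(\mathcal{Y}_1)$ (CP1) and $\mathcal{P}\cap B(\mathcal{Y}_1)=\emptyset$ (CP2) hold. Hence $C(\mathcal{Y}_1)$ is a critical point of index $2$. (2) $C(\mathcal{Y}_2)\notin\operatorname{conv}^\circ(\mathcal{Y}_2)$, so CP1 does not hold. Therefore $C(\mathcal{Y}_2)$ is not a critical point, as can be observed from the flow arrows. (3) $C(\mathcal{Y}_3)\in\operatorname{conv}^\circ(\mathcal{Y}_3)$, so CP1 holds. However, $\mathcal{P}\cap B(\mathcal{Y}_3)\neq\emptyset$, so CP2 does not hold, and therefore $C(\mathcal{Y}_3)$ is also not a critical point. Note that in a small neighborhood of $C(\mathcal{Y}_3)$ we have $d_{\mathcal{P}}\equiv d_{\mathcal{P}\cap B(\mathcal{Y}_3)}$, which completely ignores the existence of $\mathcal{Y}_3$.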

The following indicator functions, related to CP1–CP3, will appear often.

Definition 2.3.

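Using the notation above,

$$h(\mathcal{Y}) := \mathbb{1}\{C(\mathcal{Y})\in\operatorname{conv}^\circ(\mathcal{Y})\}, \tag{2.5}$$
$$h_\varepsilon(\mathcal{Y}) := h(\mathcal{Y})\,\mathbb{1}\{R(\mathcal{Y})\le\varepsilon\}, \tag{2.6}$$
$$g_\varepsilon(\mathcal{Y},\mathcal{P}) := h_\varepsilon(\mathcal{Y})\,\mathbb{1}\{\mathcal{P}\cap B(\mathcal{Y})=\emptyset\}. \tag{2.7}$$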

3 Main Results

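Let $f$ be a probability density function on $\mathbb{R}^d$, which we assume to be bounded. This assumption will remain in force throughout the paper, without further comment. Let $\mathcal{P}_n$ be a spatial Poisson process on $\mathbb{R}^d$ with intensity function $nf$. Denote by $\mathrm{Crit}_k(\mathcal{P}_n)$ the set of critical points with index $k$ of $d_{\mathcal{P}_n}$. Let $r_n$ be a sequence of positive numbers with $r_n\to 0$, and define

$$N_{k,n} := \#\{c\in\mathrm{Crit}_k(\mathcal{P}_n) : d_{\mathcal{P}_n}(c)\le r_n\} = \sum_{\mathcal{Y}\subset\mathcal{P}_n,\ |\mathcal{Y}|=k+1} g_{r_n}(\mathcal{Y},\mathcal{P}_n),$$

where the second equality holds almost surely, by Lemma 2.2 and (2.7), since $\mathcal{P}_n$ is almost surely in general position.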

Our main goal is to study the limits of $N_{k,n}$ as $n\to\infty$. Since $N_{0,n}=|\mathcal{P}_n|$ (the minima are the points of $\mathcal{P}_n$), we shall only be interested in $1\le k\le d$. The results split into three main regimes, depending on the rate of convergence of $r_n$ to zero; specifically, on the limit of the term $nr_n^d$.
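To experiment with these quantities, one needs samples of $\mathcal{P}_n$. The following is a minimal sketch that assumes, for illustration only, that $f$ is the uniform density on $[0,1]^d$. Because $f$ integrates to one, the number of points is Poisson with mean $n$. Given that number, the points are i.i.d. with density $f$. The helper names are hypothetical.

```python
import numpy as np


def sample_poisson_process(n, d, rng):
    """Poisson process with intensity n * f, for f uniform on [0, 1]^d (illustrative choice).

    Since f integrates to 1, the number of points is Poisson(n); given that number,
    the points are i.i.d. with density f.
    """
    count = rng.poisson(n)
    return rng.random((count, d))


def regime_parameter(n, r, d):
    """n * r^d, whose limit (0, lambda in (0, inf), or inf) selects the regime."""
    return n * r**d


rng = np.random.default_rng(0)
P = sample_poisson_process(1000, 2, rng)
# With r_n = n^{-1/2} in d = 2, n * r_n^d = 1: the critical regime with lambda = 1.
print(len(P), regime_parameter(1000, 1000 ** -0.5, 2))
```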

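A word on notation: in the formulae presented below, for $x\in\mathbb{R}^d$ and $\mathbf{y}=(y_1,\dots,y_k)\in(\mathbb{R}^d)^k$, we write $h(x,\mathbf{y})$ for $h(\{x,y_1,\dots,y_k\})$, and similarly for $h_\varepsilon$ and $R$.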

3.1 The Subcritical Range ($nr_n^d\to 0$)

This range is also known as the ‘dust phase’, for reasons that will become clearer later, when we discuss Čech complexes. We start with the limiting mean.

Theorem 3.1 (Limit mean).

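If $nr_n^d\to 0$, then for $1\le k\le d$,

$$\lim_{n\to\infty}\frac{\mathbb{E}\{N_{k,n}\}}{n^{k+1}r_n^{dk}} = \mu_k, \tag{3.1}$$

where

$$\mu_k := \frac{1}{(k+1)!}\int_{\mathbb{R}^d}f^{k+1}(x)\,dx\int_{(\mathbb{R}^d)^k}h_1(0,\mathbf{y})\,d\mathbf{y}, \tag{3.2}$$

and $h_1$ is defined in (2.6).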

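In general, as is common for results of this nature, it is difficult to express this integral in a more transparent form. However, when $k=1$, $\mathbf{y}$ contains only a single point $y$. Then $h(0,y)\equiv 1$, since the midpoint of a segment always lies in its interior, and $h_1(0,y)=\mathbb{1}\{\|y\|\le 2\}$. Therefore, $\int_{\mathbb{R}^d}h_1(0,y)\,dy=2^d\omega_d$, yielding $\mu_1=2^{d-1}\omega_d\int_{\mathbb{R}^d}f^2(x)\,dx$, where $\omega_d$ is the volume of the unit ball in $\mathbb{R}^d$. Some numerics for other cases are given below.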
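For $k\ge 2$, the integral $\int h_1(0,\mathbf{y})\,d\mathbf{y}$ in (3.2) can be estimated by Monte Carlo. The key observation is that $h_1(0,\mathbf{y})=0$ unless every $y_i$ lies in $B_2(0)$. The sketch below is our own illustration. The function names, the sampling scheme, and the choice of $f$ uniform on $[0,1]^d$ (for which $\int f^{k+1}=1$) are assumptions. For $k=1$ every sample is a hit, so the sketch reproduces $\mu_1=2^{d-1}\omega_d$ exactly, which serves as a sanity check.

```python
import math

import numpy as np


def unit_ball_volume(d):
    """omega_d, the volume of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def h1_origin(ys):
    """h_1(0, y) for y = (y_1, ..., y_k): CP1 for {0, y_1, ..., y_k} and R <= 1."""
    A = np.asarray(ys, dtype=float)                  # k x d; the base point is the origin
    try:
        alpha = np.linalg.solve(2.0 * A @ A.T, np.sum(A * A, axis=1))
    except np.linalg.LinAlgError:                    # degenerate sample (probability zero)
        return 0
    center = A.T @ alpha                             # C(0, y) = sum_i alpha_i y_i
    if np.linalg.norm(center) > 1.0:                 # R(0, y) = |C - 0| must be <= 1
        return 0
    weights = np.append(alpha, 1.0 - alpha.sum())    # barycentric weights of the center
    return int(np.all(weights > 0))


def estimate_h1_integral(k, d, samples, rng):
    """Monte Carlo estimate of the integral of h_1(0, y) over (R^d)^k.

    h_1(0, y) vanishes unless every y_i lies in B_2(0), so sample y uniformly
    from B_2(0)^k and rescale the hit fraction by vol(B_2(0))^k.
    """
    vol = 2.0 ** d * unit_ball_volume(d)
    hits = 0
    for _ in range(samples):
        g = rng.standard_normal((k, d))
        directions = g / np.linalg.norm(g, axis=1, keepdims=True)
        radii = 2.0 * rng.random(k) ** (1.0 / d)     # uniform radii for the ball B_2(0)
        hits += h1_origin(directions * radii[:, None])
    return vol ** k * hits / samples


def mu_k_uniform(k, d, samples, rng):
    """(3.2) for f uniform on [0,1]^d, where the integral of f^{k+1} equals 1."""
    return estimate_h1_integral(k, d, samples, rng) / math.factorial(k + 1)


rng = np.random.default_rng(0)
print(mu_k_uniform(1, 2, 1000, rng), 2 * math.pi)   # equal up to rounding: mu_1 = 2^{d-1} omega_d
print(mu_k_uniform(2, 2, 10000, rng))               # a Monte Carlo estimate of mu_2 for d = 2
```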

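For a specific choice of $r_n$, there is at most one $k$ such that $n^{k+1}r_n^{dk}\to\alpha\in(0,\infty)$. Indeed, $n^{k+1}r_n^{dk}=n\,(nr_n^d)^k$, so consecutive terms differ by the vanishing factor $nr_n^d$. This observation leads to the important fact that there is a ‘critical’ index, $k_c$, such that

$$\lim_{n\to\infty}\mathbb{E}\{N_{k,n}\} = \begin{cases}\infty, & k<k_c,\\ 0, & k>k_c,\end{cases} \tag{3.3}$$

with any value in $[0,\infty]$ possible at $k=k_c$. That is, there is a phase transition occurring within the subcritical regime itself. Similar regimes, with identical limits, appear for asymptotic variances.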

Theorem 3.2 (Limit variance).

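If $nr_n^d\to 0$, then for $1\le k\le d$,

$$\lim_{n\to\infty}\frac{\mathrm{Var}(N_{k,n})}{n^{k+1}r_n^{dk}} = \mu_k.$$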

Not surprisingly, the three regimes also yield different limit distributions.

Theorem 3.3 (Limit distribution).

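Let $nr_n^d\to 0$, and $1\le k\le d$.

  1. If $n^{k+1}r_n^{dk}\to 0$, then $N_{k,n}\xrightarrow{L^2}0$.

  2. If $n^{k+1}r_n^{dk}\to\alpha\in(0,\infty)$, then $N_{k,n}\xrightarrow{\mathcal{L}}\mathrm{Poisson}(\alpha\mu_k)$.

  3. If $n^{k+1}r_n^{dk}\to\infty$, then $\dfrac{N_{k,n}-\mathbb{E}\{N_{k,n}\}}{(n^{k+1}r_n^{dk})^{1/2}}\xrightarrow{\mathcal{L}}\mathcal{N}(0,\mu_k)$.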

As above, for a specific choice of $r_n$, there is going to be at most a single $k$ for which the Poisson limit applies. Otherwise $N_{k,n}$ converges either to zero or to infinity. Thus, in the subcritical regime, the picture is that $N_{1,n}\gg N_{2,n}\gg\cdots\gg N_{k_c,n}$. For $k>k_c$, the value of $N_{k,n}$ will be zero with a probability that tends to one as $n$ increases.

3.2 The Critical and Supercritical Ranges ($nr_n^d\to\lambda\in(0,\infty]$)

We now look at the critical ($nr_n^d\to\lambda\in(0,\infty)$) and supercritical ($nr_n^d\to\infty$) regimes. While there are differences between the two regimes, the general outline of the results is the same. In both, the correct scaling for $N_{k,n}$ is $n$, as opposed to $n^{k+1}r_n^{dk}$ in the subcritical range. Consequently, the limit results are similar for all the indices.

The supercritical regime is significantly more difficult to analyze than either the critical or subcritical, and we shall require an additional assumption for this case, which necessitates a definition.

Definition 3.4.

Let $f$ be a probability density function. We say that $f$ is lower bounded if it has compact support and $\inf_{x\in\operatorname{supp}(f)}f(x)>0$.

Henceforth, when dealing with the supercritical phase, we always assume that $f$ is a lower bounded probability density, and that $\operatorname{supp}(f)$ is convex. It is not clear at this point whether these conditions are necessary or merely a consequence of our proofs.

Theorem 3.5 (Limit mean).

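If $nr_n^d\to\lambda\in(0,\infty]$, then, for $1\le k\le d$,

$$\lim_{n\to\infty}\frac{\mathbb{E}\{N_{k,n}\}}{n} = \gamma_k(\lambda),$$

where, for $\lambda<\infty$,

$$\gamma_k(\lambda) := \frac{\lambda^k}{(k+1)!}\int_{\mathbb{R}^d}\int_{(\mathbb{R}^d)^k}f^{k+1}(x)\,e^{-\omega_d\lambda f(x)R^d(0,\mathbf{y})}\,h_1(0,\mathbf{y})\,d\mathbf{y}\,dx,$$

and

$$\gamma_k(\infty) := \frac{1}{(k+1)!}\int_{(\mathbb{R}^d)^k}e^{-\omega_d R^d(0,\mathbf{y})}\,h(0,\mathbf{y})\,d\mathbf{y}.$$

Here $R^d(0,\mathbf{y}):=(R(0,\mathbf{y}))^d$, $\omega_d$ is the volume of the unit ball in $\mathbb{R}^d$, and $R$, $h$, and $h_1$ are defined in (2.3), (2.5), and (2.6), respectively.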

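Again, these terms can be evaluated for $k=1$, in which case

$$\gamma_1(\lambda) = 2^{d-1}\Big(1-\int_{\mathbb{R}^d}f(x)\,e^{-\omega_d\lambda f(x)}\,dx\Big), \quad \lambda<\infty, \qquad \gamma_1(\infty)=2^{d-1}. \tag{3.4}$$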

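For a uniform distribution on a compact set $D\subset\mathbb{R}^d$ with volume $|D|$, it is easy to show that $\gamma_1$ is given by

$$\gamma_1(\lambda) = 2^{d-1}\big(1-e^{-\omega_d\lambda/|D|}\big), \tag{3.5}$$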

from which it is easy to check that $\gamma_1(\lambda)\to\gamma_1(\infty)=2^{d-1}$ as $\lambda\to\infty$. For higher indices, we have no analytic way to compute $\gamma_k$. However, it can be evaluated numerically. Figure 2 gives an example for a uniform distribution on a compact subset of $\mathbb{R}^2$. Note that, in that example, $1-\gamma_1(\lambda)+\gamma_2(\lambda)\to 0$ as $\lambda\to\infty$. This is not a coincidence. The explanation for this phenomenon will be given in Section 4.3, where we discuss the mean Euler characteristic of Čech complexes.
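A quick simulation makes Theorem 3.5 tangible for $k=1$. For a segment, CP1 always holds. So $N_{1,n}$ simply counts pairs of points at distance at most $2r_n$ whose open diametral ball contains no other point. The Python sketch below is our own. It compares $N_{1,n}/n$ with (3.5) for the illustrative choices of uniform $f$ on $[0,1]^2$, $\lambda=1$, and $n=1000$. At this finite $n$, boundary effects and sampling noise mean the two printed numbers agree only approximately.

```python
import math

import numpy as np


def gamma1_uniform(lam, d, volume=1.0):
    """(3.5): gamma_1(lambda) for f uniform on a compact set of the given volume."""
    omega_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return 2 ** (d - 1) * (1.0 - math.exp(-omega_d * lam / volume))


def count_index1(P, r):
    """Index-1 critical points with critical value <= r (CP1 holds automatically for k = 1)."""
    dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    I, J = np.nonzero(np.triu(dist <= 2 * r, k=1))        # CP3: |p - q| / 2 <= r, with i < j
    count = 0
    for i, j in zip(I, J):
        center, radius = 0.5 * (P[i] + P[j]), 0.5 * dist[i, j]
        d_center = np.linalg.norm(P - center, axis=1)
        if not np.any(d_center < radius * (1.0 - 1e-12)):  # CP2: open diametral ball is empty
            count += 1
    return count


rng = np.random.default_rng(1)
n, d, lam = 1000, 2, 1.0
r = (lam / n) ** (1.0 / d)                   # chosen so that n * r^d = lam
P = rng.random((rng.poisson(n), d))          # Poisson process, intensity n * Uniform([0,1]^2)
print(count_index1(P, r) / n, gamma1_uniform(lam, d))
```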

Figure 2: The functions $\gamma_k(\lambda)$. In this example $d=2$, and $f$ is a uniform density on a compact subset of $\mathbb{R}^2$. For $k=0$ we know that $\gamma_0\equiv 1$, and for $k=1$ we have an explicit formula in (3.5). For $k=2$ we used numerical integration followed by some smoothing.

Recall that, in the subcritical phase, the limit mean and the limit variance were exactly the same. For other phases, this is no longer true.

Theorem 3.6 (Limit variance).

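If $nr_n^d\to\lambda\in(0,\infty]$ and $1\le k\le d$, then

$$\lim_{n\to\infty}\frac{\mathrm{Var}(N_{k,n})}{n} = \sigma_k^2(\lambda).$$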

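The expression defining $\sigma_k^2(\lambda)$ is rather complicated, and can be found in (8.4). Note that, as an immediate corollary of Theorems 3.5 and 3.6, we have the ‘law of large numbers’

$$\frac{N_{k,n}}{n}\xrightarrow{L^2}\gamma_k(\lambda).$$

This holds since $\mathbb{E}\{(N_{k,n}/n-\gamma_k(\lambda))^2\}=n^{-2}\mathrm{Var}(N_{k,n})+(n^{-1}\mathbb{E}\{N_{k,n}\}-\gamma_k(\lambda))^2\to 0$.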

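Theorem 3.7 (CLT).

If $nr_n^d\to\lambda\in(0,\infty]$, then for $1\le k\le d$,

$$\frac{N_{k,n}-\mathbb{E}\{N_{k,n}\}}{n^{1/2}}\xrightarrow{\mathcal{L}}\mathcal{N}\big(0,\sigma_k^2(\lambda)\big). \tag{3.6}$$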

To conclude this section, we note an interesting result which is unique to the supercritical regime. For it we define $\hat N_{k,n}$, the ‘global’ number of index-$k$ critical points of $d_{\mathcal{P}_n}$ in $\mathbb{R}^d$, i.e. counted without requiring CP3. We note first that $N_{k,n}$ and $\hat N_{k,n}$ have identical asymptotic behaviors, at least at the level of their first two moments and CLT:

Theorem 3.8.

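Let $f$ be lower bounded with a convex support. Then, for $1\le k\le d$,

$$\lim_{n\to\infty}\frac{\mathbb{E}\{\hat N_{k,n}\}}{n}=\gamma_k(\infty), \qquad \lim_{n\to\infty}\frac{\mathrm{Var}(\hat N_{k,n})}{n}=\sigma_k^2(\infty),$$

and

$$\frac{\hat N_{k,n}-\mathbb{E}\{\hat N_{k,n}\}}{n^{1/2}}\xrightarrow{\mathcal{L}}\mathcal{N}\big(0,\sigma_k^2(\infty)\big).$$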

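An obvious corollary of Theorem 3.8 (together with Theorem 3.5) is that, in the supercritical regime, $n^{-1}\mathbb{E}\{\hat N_{k,n}-N_{k,n}\}\to 0$. However, much more is true: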

Proposition 3.9.

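Assume the conditions of Theorem 3.8 hold, and that $nr_n^d\ge C\log n$ for a sufficiently large ($f$-dependent) constant $C$. Then, for $1\le k\le d$,

$$\lim_{n\to\infty}\mathbb{P}\big(\hat N_{k,n}=N_{k,n}\big)=1. \tag{3.7}$$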

Thus, in the supercritical phase, the slow decrease of the radii $r_n$ implies that the global and local numbers of critical points are ultimately equal with high probability, even though both grow to infinity with $n$. This is an interesting result. It will turn out to be important when we discuss the Euler characteristic of the Čech complex in the next section. The equality between the local and global counts can be explained by studying how well $\operatorname{supp}(f)$ is covered by the random balls of radius $r_n$. Denote by $\mathrm{Cov}_n$ the event that $\operatorname{supp}(f)\subset U(\mathcal{P}_n,r_n)$. Methods similar to those in [13, 15, 2] can then be applied to show that if $nr_n^d\ge C\log n$ with $C$ large enough, then $\mathbb{P}(\mathrm{Cov}_n)\to 1$. Thus, under the assumptions of Proposition 3.9, the support of $f$ is completely covered by the $r_n$-balls. All the critical points lie within the support, since each lies in some $\operatorname{conv}^\circ(\mathcal{Y})$ with $\mathcal{Y}\subset\mathcal{P}_n$ and the support is convex. Hence they are all within distance $r_n$ of $\mathcal{P}_n$, and so they are all accounted for in $N_{k,n}$. Note that (3.7) relies heavily on the assumed convexity of $\operatorname{supp}(f)$. For example, take $f$ to be the uniform density on an annulus in $\mathbb{R}^2$ centered at the origin. Then, for $n$ large enough, we would expect to have a maximum point (index 2) close to the origin. This critical point will be accounted for in $\hat N_{2,n}$. It will, however, be ignored by $N_{2,n}$, since its distance to $\mathcal{P}_n$ is greater than $r_n$. Thus, we would expect that $N_{2,n}<\hat N_{2,n}$ with high probability, which contradicts (3.7).
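The coverage event can be probed numerically. The sketch below is our own illustration. It samples $\mathcal{P}_n$ with $f$ uniform on $[0,1]^2$, a density that is lower bounded with convex support. It then measures the fraction of a fine lattice lying at distance greater than $r$ from $\mathcal{P}_n$. The constant $C=2$ in $r=\sqrt{C\log n/n}$ is a hypothetical choice. A lattice check is only a proxy for $\operatorname{supp}(f)\subset U(\mathcal{P}_n,r_n)$, not a proof of it.

```python
import math

import numpy as np


def uncovered_fraction(P, r, grid=100):
    """Fraction of a grid x grid lattice in [0,1]^2 lying at distance > r from P.

    A zero value is only evidence (not proof) that [0,1]^2 is contained in U(P, r).
    """
    ticks = (np.arange(grid) + 0.5) / grid
    uncovered = 0
    for gx in ticks:                                   # one lattice column at a time
        column = np.column_stack([np.full(grid, gx), ticks])
        dist = np.linalg.norm(column[:, None, :] - P[None, :, :], axis=-1).min(axis=1)
        uncovered += int(np.count_nonzero(dist > r))
    return uncovered / grid**2


rng = np.random.default_rng(2)
n = 1000
P = rng.random((rng.poisson(n), 2))                  # Poisson process, f uniform on [0,1]^2
C = 2.0                                              # hypothetical constant in n r^d >= C log n
r_big = math.sqrt(C * math.log(n) / n)
r_small = 0.005                                      # far below the coverage threshold
print(uncovered_fraction(P, r_big), uncovered_fraction(P, r_small))
```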

4 Random Čech Complexes

As mentioned already a number of times, the results of the previous section regarding critical points of the distance function have implications for the homology and Betti numbers of certain random Čech complexes. They are therefore related to recent results of [16] and [17]. Our plan in this section is to describe these complexes and then the connections. We shall assume one of two things about the reader. Either the reader has a basic grounding in algebraic topology at the level of the first two chapters of [14]. Or the reader is prepared to accept the following definition: the $k$-th Betti number $\beta_k(X)$ of a topological space $X$ is the number of $k$-dimensional ‘holes’ in $X$, where a $k$-dimensional hole can be thought of as anything that can be continuously transformed into a $k$-dimensional sphere. The zeroth Betti number, $\beta_0(X)$, is merely the number of connected components in $X$.

4.1 Čech Complexes and the Distance Function

The Čech complex generated by a set of points is a simplicial complex, made up of vertices, edges, triangles and higher dimensional faces. While its general definition is quite broad, we focus on the following special case.

Definition 4.1 (Čech complex).

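Let $\mathcal{P}=\{x_1,x_2,\dots\}$ be a collection of points in $\mathbb{R}^d$, and let $r>0$. The Čech complex $\mathcal{C}(\mathcal{P},r)$ is constructed as follows:

  1. The $0$-simplices (vertices) are the points in $\mathcal{P}$.

  2. A $k$-simplex $[x_{i_0},\dots,x_{i_k}]$ is in $\mathcal{C}(\mathcal{P},r)$ if $\bigcap_{j=0}^{k}B_r(x_{i_j})\neq\emptyset$.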

Figure 3 depicts a simple example of a Čech complex in $\mathbb{R}^2$.

Figure 3: The Čech complex $\mathcal{C}(\mathcal{P},r)$, for a set $\mathcal{P}$ of six points in $\mathbb{R}^2$, and some $r>0$. The complex contains 6 vertices, 7 edges, and a single 2-dimensional face.

An important result, known as the ‘nerve theorem’, links Čech complexes and the neighborhood set $U(\mathcal{P},r)$, and states that they are homotopy equivalent (cf. [9]). Thus, for example, they have the same Betti numbers. Furthermore, both are linked to sublevel sets of the distance function. Indeed, it is immediate from the definitions that

$$\mathcal{C}(\mathcal{P},r)\simeq U(\mathcal{P},r)=d_{\mathcal{P}}^{-1}\big([0,r]\big), \tag{4.1}$$

where $\simeq$ denotes homotopy equivalence.

4.2 Critical Points and Betti Numbers

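Classical Morse theory, in particular the version developed in [11] that applies to the distance function, tells us the following. In view of the equivalences in (4.1), there is a connection between the critical points of $d_{\mathcal{P}}$ over the set $U(\mathcal{P},r)$, along with their indices, and the Betti numbers of $\mathcal{C}(\mathcal{P},r)$. As usual, $\mathcal{P}$ is a point set in $\mathbb{R}^d$, and we assume that $\mathcal{P}$ is in general position. Then, for every critical point $c$ of $d_{\mathcal{P}}$ at height $\rho=d_{\mathcal{P}}(c)$ and of index $k$, and for all small enough $\varepsilon>0$, either

$$\beta_k\big(\mathcal{C}(\mathcal{P},\rho+\varepsilon)\big)=\beta_k\big(\mathcal{C}(\mathcal{P},\rho-\varepsilon)\big)+1, \tag{4.2}$$

or

$$\beta_{k-1}\big(\mathcal{C}(\mathcal{P},\rho+\varepsilon)\big)=\beta_{k-1}\big(\mathcal{C}(\mathcal{P},\rho-\varepsilon)\big)-1. \tag{4.3}$$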

Despite this connection, Betti numbers, dealing, as they do, with ‘holes’, are typically determined by global phenomena, and this makes them hard to study directly in the random setting. On the other hand, the structure of critical points is a local phenomenon, which is why, in the random case, we can say more about critical points than what is known for Betti numbers to date.

4.3 Random Čech Complexes

Retaining the notation of the previous section, and defining $\mathcal{C}_n:=\mathcal{C}(\mathcal{P}_n,r_n)$, our aim will be to examine relationships between the random variables $N_{k,n}$ and the Betti numbers $\beta_k(\mathcal{C}_n)$. In addition, we shall compare our results for $N_{k,n}$ to those of [16] and [17] for $\beta_k(\mathcal{C}_n)$, using Morse theory to explain the connections. Note that the results in [16] are phrased in terms of the random samples case, with $n$ points. However, the proofs there can easily be adjusted to fit the Poisson case as well, as in [17] or [6], where both the Poisson and random samples cases are treated.

In direct analogy to the results of Section 3, [16, 17] show that the limiting behavior of $\beta_k(\mathcal{C}_n)$ splits into three main regimes, depending on the limit of $nr_n^d$. The first is the subcritical ($nr_n^d\to 0$) or dust phase, in which the Čech complex consists mostly of small disconnected particles and very few holes. There, Theorem 3.2 in [16] states that for $1\le k\le d-1$,

$$\lim_{n\to\infty}\frac{\mathbb{E}\{\beta_k(\mathcal{C}_n)\}}{n^{k+2}r_n^{d(k+1)}}=D_k,$$

for some constant $D_k$ defined in an integral form and related to the $\mu_{k+1}$ of our Theorem 3.1. In [17] the subcritical phase is explored in more detail, and limit theorems analogous to those of Theorem 3.3 are proved. Combining their results with those in Section 3.1, observe that $\beta_k(\mathcal{C}_n)$ and $N_{k+1,n}$ exhibit similar limiting behavior, and are both of order $n^{k+2}r_n^{d(k+1)}$. Furthermore, based on the expected values, we can informally summarize the relationship between the different $N_{k,n}$ and $\beta_k(\mathcal{C}_n)$ as follows:

(4.4)

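Here $X\gg Y$ means that $\mathbb{E}\{Y\}/\mathbb{E}\{X\}\to 0$, and $X\approx Y$ means that $\mathbb{E}\{X\}$ and $\mathbb{E}\{Y\}$ are of the same order of magnitude; $k_c$ is as in (3.3). For $k>k_c$, all terms are zero with a probability that tends to one as $n$ grows.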

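Recall that Morse theory tells us that each critical point of index $k$ contributes either to $\beta_k$ or to $\beta_{k-1}$ (see (4.2), (4.3)). Split $N_{k,n}$ accordingly as $N_{k,n}=N^+_{k,n}+N^-_{k,n}$. Here $N^+_{k,n}$ counts the critical points that increase $\beta_k$, and $N^-_{k,n}$ counts those that decrease $\beta_{k-1}$. The diagram (4.4) implies that $N^+_{k,n}\ll N^-_{k,n}$. In other words, most of the critical points of index $k$ destroy homology generators rather than create new ones.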

For the other regimes, making statements about the Čech complex becomes extremely difficult, and thus the theory is still incomplete.

In the critical phase ($nr_n^d\to\lambda\in(0,\infty)$), the Čech complex starts to connect and the topology becomes more complex. In addition, once $\lambda$ passes a certain threshold, a giant component emerges (cf. Chapter 9 of [23]); this is the source of the alternative description of this phase as the ‘percolation phase’. Theorem 4.1 in [16] states that for $1\le k\le d-1$,

although the exact limit is not computed. This agrees with the results in Section 3.2 of this paper. The main difference between the two sets of results lies in what we can say about critical points. For them we are able to give a closed-form expression for the limit mean of $N_{k,n}$ (Theorem 3.5), as well as stronger limit results (Theorems 3.7–3.9). This will be useful below, when we discuss Euler characteristics.

In the supercritical regime ($nr_n^d\to\infty$) even less is known about the Čech complex. In general, the Čech complex becomes highly connected, the topology becomes simpler and the Betti numbers decrease. Theorem 6.1 of [16] gives a precise result of this kind. Suppose $f$ is a uniform density with a compact and convex support, and $nr_n^d\ge C\log n$ for a sufficiently large $C$. Then

$$\lim_{n\to\infty}\mathbb{P}\big(\beta_0(\mathcal{C}_n)=1,\ \beta_k(\mathcal{C}_n)=0 \text{ for all } k\ge 1\big)=1, \tag{4.5}$$

which is described in [16] by saying that $\mathcal{C}_n$ is “asymptotically almost surely contractible”. We have no analogous result about critical points, nor could we. By Section 3.2, $\mathbb{E}\{N_{k,n}\}$ grows linearly in $n$, and thus $N_{k,n}\to\infty$ in probability. However, Corollary 4.2 below gives information about the Euler characteristic of the Čech complex which is different from, but related to, (4.5). (Note that (4.5) requires that the underlying probability density is lower bounded with convex support, the same assumption we adopted in Section 3.2.)

To conclude this section, we present a novel statement about the Čech complex which can be made based on the results in Section 3. The Euler characteristic $\chi$ of a simplicial complex has a number of equivalent definitions, and a number of important applications. One of the definitions, via Betti numbers, is

$$\chi(\mathcal{C})=\sum_{k\ge 0}(-1)^k\beta_k(\mathcal{C}). \tag{4.6}$$

However, $\chi$ also has a definition via indices of critical points of appropriately defined functions supported on $\mathcal{C}$. Applied to $d_{\mathcal{P}_n}$ through (4.1), it gives $\chi(\mathcal{C}(\mathcal{P}_n,r_n))=\sum_{k=0}^d(-1)^kN_{k,n}$, and this leads to

Corollary 4.2.

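Let $\chi_n$ be the Euler characteristic of $\mathcal{C}(\mathcal{P}_n,r_n)$. Then

$$\lim_{n\to\infty}\frac{\mathbb{E}\{\chi_n\}}{n}=\begin{cases}1, & nr_n^d\to 0,\\ 1+\sum_{k=1}^d(-1)^k\gamma_k(\lambda), & nr_n^d\to\lambda\in(0,\infty),\\ 0, & nr_n^d\to\infty.\end{cases} \tag{4.7}$$

Moreover, suppose $f$ is lower bounded with convex support and $nr_n^d\ge C\log n$, with $C$ as in Proposition 3.9. Then $\lim_{n\to\infty}\mathbb{P}(\chi_n=1)=1$.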

The proof of Corollary 4.2 is presented in Section 9. Note that (4.7) cannot be proven using only the existing results on Betti numbers, since the values of the limiting mean in the critical and supercritical regimes are not available. This demonstrates one of the advantages of studying the homology of the Čech complex via the distance function.
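In the plane, the identity $\chi(\mathcal{C}(\mathcal{P},r))=\sum_k(-1)^kN_k$ behind Corollary 4.2 can be checked with a short brute-force computation. The sketch below is our own illustration, restricted to $d=2$ and to point sets in general position. It uses the following characterizations:

- Index-1 critical points are pairs with an empty open diametral disk and half-length at most $r$.
- Index-2 critical points are acute triangles with an empty open circumdisk and circumradius at most $r$.

On the acute triangle $\{(0,0),(4,0),(1,3)\}$, the sketch reproduces the Euler characteristics $3$, $0$ and $1$ of three stages of the filtration. For any $r$ beyond all critical values, it should return $\chi(\mathbb{R}^2)=1$.

```python
import itertools

import numpy as np


def circumcenter_2d(a, b, c):
    """Circumcenter of a non-degenerate triangle in R^2 (standard closed-form formula)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    D = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    sa, sb, sc = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / D
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / D
    return np.array([ux, uy])


def euler_char_via_critical_points(P, r):
    """chi(C(P, r)) = N_0 - N_1 + N_2 for a planar point set in general position."""
    P = np.asarray(P, dtype=float)
    n = len(P)
    n1 = 0
    for i, j in itertools.combinations(range(n), 2):
        center, R = 0.5 * (P[i] + P[j]), 0.5 * np.linalg.norm(P[i] - P[j])
        others = np.delete(P, [i, j], axis=0)
        # CP1 is automatic for a segment; CP2: open diametral disk empty; CP3: R <= r.
        if R <= r and np.all(np.linalg.norm(others - center, axis=1) >= R):
            n1 += 1
    n2 = 0
    for i, j, l in itertools.combinations(range(n), 3):
        a, b, c = P[i], P[j], P[l]
        # CP1: the circumcenter lies in the open triangle iff all three angles are acute.
        if min(np.dot(b - a, c - a), np.dot(a - b, c - b), np.dot(a - c, b - c)) <= 0:
            continue
        center = circumcenter_2d(a, b, c)
        R = np.linalg.norm(a - center)
        others = np.delete(P, [i, j, l], axis=0)
        if R <= r and np.all(np.linalg.norm(others - center, axis=1) >= R):
            n2 += 1
    return n - n1 + n2


tri = np.array([[0.0, 0.0], [4.0, 0.0], [1.0, 3.0]])   # critical values sqrt(10)/2, 2, sqrt(18)/2, sqrt(5)
print([euler_char_via_critical_points(tri, r) for r in (1.0, 2.2, 3.0)])   # [3, 0, 1]
pts = np.random.default_rng(3).random((25, 2))
print(euler_char_via_critical_points(pts, 10.0))   # r exceeds every critical value here
```

For the random configuration, $r=10$ exceeds every possible critical value of points in the unit square. The alternating count then reflects the contractible sublevel set, in line with the supercritical discussion above.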

In closing, we note some of the implications of Corollary 4.2. In the subcritical phase, we have $n^{-1}\mathbb{E}\{\chi_n\}\to 1$. This agrees with the intuition developed so far that, in this range, the Čech complex consists mostly of small disconnected particles and very few holes. In the critical range we have a non-trivial limit, resulting from the fact that the Čech complex has many holes of all possible dimensions. In the supercritical range, $n^{-1}\mathbb{E}\{\chi_n\}\to 0$, which is exactly what we get when $\chi_n=1$ (cf. (4.6), (4.5)). For $d=2$, $n^{-1}\mathbb{E}\{\chi_n\}\to 1-\gamma_1(\infty)+\gamma_2(\infty)$ in this regime. This explains why the numerics of Figure 2 showed that $1-\gamma_1(\lambda)+\gamma_2(\lambda)\to 0$ as $\lambda\to\infty$. Finally, note that in a sequel [8], we explore the Čech complex when the samples are generated by a distribution supported on a closed manifold $M$. In this case, with an appropriate choice of radius $r_n$, we can make a much more concrete statement. A different direction, in which the underlying samples are generated by stochastic processes with dependence between the points, can be found in [26].

5 Some Notation and Elementary Considerations

The remaining sections of the paper are devoted to proofs of the results in Sections 3 and 4, and are organized according to situations: sub-critical (dust), critical (percolation), and super-critical. In this section we list some common notation and note some simple facts that will be used in many of them.

  • Henceforth, $k$ will be fixed. Whenever we use $\mathcal{Y}$ or $\mathbf{y}$, we implicitly assume that $|\mathcal{Y}|=k+1$ and $\mathbf{y}\in(\mathbb{R}^d)^k$, unless stated otherwise.

  • Usually, finite subsets of $\mathbb{R}^d$ will be denoted calligraphically ($\mathcal{X}$, $\mathcal{Y}$). Inside integrals, however, we use boldface lower case ($\mathbf{x}$, $\mathbf{y}$).

  • For , and , we use the shorthand

  • A single generic symbol is used for constant values. Such a constant might depend on $d$ (ambient dimension), $f$ (the probability density of the samples), and $k$ (the Morse index), but on neither $n$ nor $r_n$. Its actual value may change between and even within lines.

  • While not exactly a notational issue, we shall often use the fact that, for every , as , and there is a such that .

Lemma 5.1.

Let be a set of i.i.d. points in sampled from a bounded density . Then there exists a constant such that

Proof.

If is bounded by a ball with radius , then are all within distance from , thus

where , and is the volume of the unit ball in . ∎

6 Means for the Subcritical Range ($nr_n^d\to 0$)

We start by proving Theorem 3.1 (the limit expectation), which requires the following important lemma. The lemma has two implications. First, it gives a precise order of magnitude, with constant, for the probability that $k+1$ points in the $r_n$-neighborhood of a point in $\mathbb{R}^d$ generate an index-$k$ critical point. Second, suppose an additional, high-density set of Poisson points is added to the picture. The lemma implies that the probability that any of these falls in the ball containing the original points is of a smaller order of magnitude.

Lemma 6.1.

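Let $\mathcal{Y}$, $|\mathcal{Y}|=k+1$, be a set of random points with density function $f$, independent of the Poisson process $\mathcal{P}_n$. Then,

$$\lim_{n\to\infty}r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\}=(k+1)!\,\mu_k, \qquad \lim_{n\to\infty}r_n^{-dk}\,\mathbb{E}\big\{h_{r_n}(\mathcal{Y})\,\mathbb{1}\{\mathcal{P}_n\cap B(\mathcal{Y})\neq\emptyset\}\big\}=0.$$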

Proof.

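Note that from the definition of $h_\varepsilon$, it follows that

$$h_\varepsilon(x+\varepsilon\mathcal{Y})=h_1(\mathcal{Y}) \qquad \text{for all } x\in\mathbb{R}^d,\ \varepsilon>0.$$

Thus, using the change of variables $x_0=x$, $x_i=x+r_ny_i$ ($1\le i\le k$),

$$\mathbb{E}\{h_{r_n}(\mathcal{Y})\}=r_n^{dk}\int_{\mathbb{R}^d}\int_{(\mathbb{R}^d)^k}f(x)\prod_{i=1}^kf(x+r_ny_i)\,h_1(0,\mathbf{y})\,d\mathbf{y}\,dx. \tag{6.1}$$

Now, for $h_1(0,\mathbf{y})$ to be nonzero, all the elements $y_1,\dots,y_k$ must lie inside $B_2(0)$, the ball of radius $2$ around the origin. Therefore,

$$f(x)\prod_{i=1}^kf(x+r_ny_i)\,h_1(0,\mathbf{y})\le f_{\max}^k\,f(x)\,\mathbb{1}\{\mathbf{y}\in(B_2(0))^k\}, \qquad f_{\max}:=\sup_x f(x),$$

and the right-hand side is integrable over $\mathbb{R}^d\times(\mathbb{R}^d)^k$,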

and applying the dominated convergence theorem (DCT) to (6.1) yields

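$$\lim_{n\to\infty}r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\}=\int_{\mathbb{R}^d}f^{k+1}(x)\,dx\int_{(\mathbb{R}^d)^k}h_1(0,\mathbf{y})\,d\mathbf{y}, \tag{6.2}$$

from which follows, by (3.2),

$$\lim_{n\to\infty}r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\}=(k+1)!\,\mu_k, \tag{6.3}$$

completing the proof of the first statement.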

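Next, the definition of $\mathcal{P}_n$ as a Poisson process with intensity $nf$ implies

$$\mathbb{P}\big(\mathcal{P}_n\cap B(\mathcal{Y})\neq\emptyset\,\big|\,\mathcal{Y}\big)=1-e^{-nF(B(\mathcal{Y}))}, \qquad F(A):=\int_A f(z)\,dz.$$

Thus, writing $B_n(x,\mathbf{y}):=B(\{x,x+r_ny_1,\dots,x+r_ny_k\})$,

$$\mathbb{E}\big\{h_{r_n}(\mathcal{Y})\,\mathbb{1}\{\mathcal{P}_n\cap B(\mathcal{Y})\neq\emptyset\}\big\}=r_n^{dk}\int_{\mathbb{R}^d}\int_{(\mathbb{R}^d)^k}f(x)\prod_{i=1}^kf(x+r_ny_i)\,h_1(0,\mathbf{y})\big(1-e^{-nF(B_n(x,\mathbf{y}))}\big)\,d\mathbf{y}\,dx.$$

The integrand here is smaller than or equal to the one in (6.1), so we can safely apply the DCT to it. To find the limit, first note that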