## Introduction

Manifold learning is the problem of recovering a manifold, or properties of it, from sampled data points. A variety of problems in human and machine learning are characterized as manifold learning, including perception \cite{tenenbaum1998mapping,jansen2006intrinsic,chen2018sparse}, active learning \cite{slama2015accurate}, and deep learning \cite{fefferman2016testing}. Yet learning the manifold underlying a dataset is known to be extremely difficult \cite{niyogi2008finding}.

There are two principal challenges in manifold learning. In theory, the ability to reconstruct a manifold is bounded by the reach, which, roughly speaking, is the distance between the narrowest parts of the manifold. When learning, one does not know the true reach of the manifold, and conservative assumptions about the reach require vastly more data. In practice, data rarely adhere to the uniform random sampling assumption used to prove bounds, and non-random sampling can render the problem much more difficult. Topological data analysis (TDA; \cite{carlsson2008local}) removes the need to know the reach, but does not address the amount of data or the need for uniform random sampling.

Central to the formulation of the problem is that learners must draw inferences from the data alone. This differs from human learning problems, where learners may receive examples \cite{Shafto2008,Shafto2014} or demonstrations \cite{kuhl1997cross,brand2002evidence} from a more knowledgeable teacher, in addition to their own observations. In this paper, we consider how having teachers who select structured data may affect manifold learning.

Our goal is to understand theoretical bounds on learning manifolds and their topology from teaching via structured data, which we expect to inform debates in machine learning and human learning. We investigate manifolds because this is the learning problem as most frequently posed. We particularly investigate the topology of manifolds for several reasons. First, the decomposition of learning into grounded and more abstract aspects parallels common wisdom across human and machine learning, which have converged on hierarchical (“deep”) models of learning. Second, teaching topology will prove to be data-efficient for manifold learning applications such as clustering, where only information about the global structure of the manifold (e.g., number of connected components, number of holes) is needed. Third, teaching will be able to proceed without full knowledge of the geometry: the requirement on the teacher can be relaxed to knowing only the homotopy type of the manifold.

Our contributions are: (1) We propose two styles of teaching a manifold: teaching from individual data points and teaching from sequences of data (demonstrations) (Section \ref{sec:teachtopology}); (2) We provide bounds on the minimum number of data points needed to teach the correct topology of a closed orientable surface of genus $g$ (Propositions \ref{prop:closedsurface} and \ref{prop:seq_surface} in Section \ref{sec:teachtopology}); (3) We introduce mechanisms for a TDA learner to interpret demonstrations by a teacher, and we qualitatively analyze levels of confidence in a teacher (Section \ref{sec: HL}); (4) We analyze teaching topology and its implications for learning the true geometry of the manifold in two examples (Section \ref{sec: examples}).

## Preliminaries \label{sec:prelim}

In this section, we provide a brief overview of the necessary background, and refer the reader to \cite{hatcher2000algebraic,munkres2000topology} for a complete introduction.

In machine learning, the manifold assumption states that high dimensional data in the real world are typically concentrated on a much lower dimensional manifold rather than spread over every region of the possible domain \cite{zhu2009introduction}. For instance, natural images do not occupy the entire space of possible pixel configurations. Therefore, learning the manifold on or near which the data lie is an important task. Because the difficulty of inferring the geometry of an arbitrary manifold is bounded by its worst local feature, quantified as the \textit{reach} (defined below) of the manifold, we may only aim to reduce the sample complexity of learning a manifold by focusing on its global properties, which are encoded by the topology.

An -dim manifold is a topological object that locally resembles Euclidean space near each point. Such manifolds naturally arise as solution sets of systems of equations \cite{lee2010introduction}. In this paper, is an orientable compact sub-manifold in . We mainly focus on low dimensional manifolds such as curves (1-dim) and surfaces (2-dim). However, our teaching methods can be used to convey low dimensional topological features of any manifold.

We will use the classical classification of closed surfaces, which states that any connected orientable closed surface is homeomorphic to either the sphere or the connected sum of $g$ tori, where $g$ represents the genus.

Algebraic topology provides powerful methods to study topological features of a space using algebraic tools. One main idea is that two topological spaces $X$ and $Y$ are considered to have ‘the same shape’ if one space can continuously deform into the other. Formally, two continuous maps $f, g: X \to Y$ are \textbf{homotopic} if there exists a continuous function $H: X \times [0,1] \to Y$ from the product of the space $X$ with the unit interval $[0,1]$ to $Y$ such that $H(x, 0) = f(x)$ and $H(x, 1) = g(x)$ hold for any $x \in X$. The spaces $X$ and $Y$ are said to be \textbf{homotopy equivalent}, or to have the same \textbf{homotopy type}, if there exist two maps $\phi: X \to Y$ and $\psi: Y \to X$ such that $\psi \circ \phi$ and $\phi \circ \psi$ are homotopic to the identity maps on $X$ and $Y$ respectively. A space is said to be \textbf{contractible} if it is homotopy equivalent to a point.

Among all the topological invariants shared by spaces with the same homotopy type, \textbf{homology} is of the greatest interest to manifold learning, because it captures abstract topological properties of the underlying data space in simple algebraic notions such as numbers and groups. There are several models of homology theory. Throughout this paper, we will use \textit{simplicial homology} with coefficient . For each dimension , the -th homology group of , denoted by is a commutative group in form of . Roughly speaking, each copy of represents a -dim ‘hole’ of and the number of copies represents the total number of independent -dim ‘holes’ of . For example, indicates that has two connected components, and a non-trivial first homology group suggests that contains 1-dim hole(s) and is thus not contractible.
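As a concrete instance tying the classification of surfaces to homology, the block below records the standard homology groups of a closed orientable surface of genus $g$. The symbols $\Sigma_g$ (the surface) and $R$ (the coefficient ring) are notation assumed here for illustration. Because these groups are free, the ranks do not depend on the choice of $R$.

```latex
\[
H_0(\Sigma_g; R) \cong R, \qquad
H_1(\Sigma_g; R) \cong R^{2g}, \qquad
H_2(\Sigma_g; R) \cong R .
\]
% g = 0 (sphere): no 1-dim holes and one 2-dim hole.
% g = 1 (torus):  two independent 1-dim holes and one 2-dim hole.
```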

Another important characteristic of a manifold that has been extensively used in manifold learning is the \textit{reach}, which reflects the geometric aspect of \cite{fefferman2016testing,aamari2017estimating}. The \textbf{reach} of a manifold is the largest number such that any point at distance less than from has a unique nearest point on . Intuitively, around one can freely roll a ball of radius less than its reach . The reach measures the narrowest bottleneck-like width of , which also quantifies the curvature of .
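The verbal definition above can be written as a formula. The following is a sketch in assumed notation: $M$ for the manifold, $\mathbb{R}^N$ for the ambient space, and $\tau(M)$ for the reach are labels chosen here and need not match the symbols used elsewhere in the paper.

```latex
\[
\tau(M) \;=\; \sup\bigl\{\, r \ge 0 \;:\; \text{every } x \in \mathbb{R}^N \text{ with } d(x, M) < r
\text{ has a unique nearest point on } M \,\bigr\},
\qquad
d(x, M) = \inf_{p \in M} \lVert x - p \rVert .
\]
% Example: a round circle of radius R in the plane has reach R.
% Every point at distance less than R from the circle has a unique nearest point,
% while the center, at distance exactly R, is equidistant from all points of the circle.
```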

For the formalism of teaching-learning algorithms, we consider as a class of \textbf{learning algorithms} that construct approximations of and/or identify the homotopy type of from a set of data points sampled from . Examples of such algorithms are available in \cite{cheng2005manifold,niyogi2008finding,boissonnat2014manifold}.

Given a manifold , a collection of data points is called a \textbf{teaching set} with respect to if there exists a learning algorithm that recovers the homotopy type of using . denotes the size of . A teaching set is said to be \textit{minimal} w.r.t. if for any teaching set of . Further, is a \textit{minimal} teaching set w.r.t. the homotopy type of if is a teaching set for some of the same homotopy type as and for any teaching set of a manifold homotopy equivalent to . The size of a minimal teaching set is called the \textbf{minimal teaching number}.

## Structured data and manifold teaching \label{sec:teachtopology}

Here we propose two corresponding styles of teaching the topology of a manifold using structured data: isolated data points (individual examples) and sequential data points (demonstrations). In particular, we provide lower bounds for the teaching complexity of each method.

### Manifold teaching from sample points \label{sec:points}

\cite{niyogi2008finding} introduced a framework to reconstruct manifolds from randomly sampled data. Their work can be rephrased as a manifold teaching problem. Suppose two agents, which we call a teacher and a learner, wish to communicate a manifold . In their setting, the teacher passes a collection of randomly sampled data points to the learner, who then builds a manifold by a learning algorithm in the \textbf{class} : the learner first picks a parameter , then for each , makes an n-dimensional ball centered at of radius . Here is the dimension of the ambient space, which can be inferred from the data points’ coordinate size. The union of these balls constitutes the learned space.

The main result in \cite{niyogi2008finding} provides an estimate of the number of data points needed to guarantee that the learned space and the target manifold are homotopy equivalent with high confidence. depends on the confidence level, the volume and the reach of , and also the learner’s choice of .

Considering as a sufficient bound on the minimal teaching number of , we seek a necessary condition. The calculation of proposed in \cite{niyogi2008finding} requires knowledge about the critical features of (volume and reach), which, translated to our context, implies that either the teacher knows the true or the teacher has observed a large number of data points, which allows one to make good estimates of these critical features using sophisticated algorithms such as \cite{aamari2017estimating}. Hence we will start our analysis by assuming that the teacher has access to the true manifold. In Section \ref{sec: HL}, we will show that this assumption can be relaxed in many practical cases.

Suppose that the learner uses the class of algorithms . What is a \textit{minimal} teaching set to convey the homotopy type of a manifold ? The case when is a non-contractible 1-dim manifold is extremely neat. Since every such is homotopy equivalent to a circle, at least three points are needed, as explained below.

Let be a unit circle embedded in , and be the class of learning algorithms described above. It is clear that any data set with only one or two points will result in a contractible learned space for any choice of . However, as illustrated in Figure Document(a), with three equidistant points sampled on , any learner with will recover the correct topology of from the union of three connected disks with a hole in the middle. Thus the minimal teaching number for a circle is three.

Now suppose that is a closed orientable surface. Two basic examples are given below.

Example. Let be a unit sphere. Only four points are needed: the four vertices of an inscribed regular tetrahedron. Any learner with , recovers the correct topology of .
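The three-point circle example above can be checked numerically. The sketch below is an illustration, not the construction used in the paper. It assumes closed balls and coefficients in GF(2), and it relies on the nerve theorem, by which the union of balls is homotopy equivalent to its Čech complex. The function names `triangle_meb_radius`, `gf2_rank` and `betti_0_1` are hypothetical helpers introduced here. Only the zeroth and first Betti numbers are computed, so simplices above dimension two are not needed.

```python
import itertools
import math


def triangle_meb_radius(p, q, r):
    """Radius of the smallest ball enclosing three points (any dimension)."""
    a, b, c = sorted((math.dist(p, q), math.dist(q, r), math.dist(p, r)))
    if c * c >= a * a + b * b:
        # Right, obtuse or degenerate triangle: the longest side is a diameter.
        return c / 2.0
    s = (a + b + c) / 2.0
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))  # Heron's formula
    return a * b * c / (4.0 * area)  # circumradius of an acute triangle


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low_bit = pivot & -pivot
        rows = [row ^ pivot if row & low_bit else row for row in rows]
    return rank


def betti_0_1(points, eps):
    """Betti numbers (b0, b1) of the union of closed eps-balls around points."""
    n = len(points)
    # Two balls meet iff their centers are at most 2 * eps apart.
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= 2.0 * eps]
    edge_id = {edge: k for k, edge in enumerate(edges)}
    # Three balls share a point iff the minimal enclosing ball has radius <= eps.
    triangles = [(i, j, k) for i, j, k in itertools.combinations(range(n), 3)
                 if (i, j) in edge_id and (i, k) in edge_id and (j, k) in edge_id
                 and triangle_meb_radius(points[i], points[j], points[k]) <= eps]
    boundary_1 = [(1 << i) | (1 << j) for i, j in edges]
    boundary_2 = [(1 << edge_id[(i, j)]) | (1 << edge_id[(i, k)]) | (1 << edge_id[(j, k)])
                  for i, j, k in triangles]
    rank_1 = gf2_rank(boundary_1)
    rank_2 = gf2_rank(boundary_2)
    return n - rank_1, len(edges) - rank_1 - rank_2


# Three equidistant points on the unit circle.
circle_points = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3))
                 for k in range(3)]
```

Under these assumptions, `betti_0_1(circle_points, 0.9)` returns `(1, 1)`, i.e. one component with one 1-dim hole. A radius of `0.5` leaves three separate balls, and a radius of `1.1` fills the hole. Any two-point set gives a first Betti number of zero for every radius, consistent with the claim that fewer than three points cannot teach a circle.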

Example. Let be a torus embedded in as shown in Figure Document. can be obtained by rotating the \textcolor{red}{red} circle around the \textcolor{green}{green} circle . Denote the radii of and by and respectively. Two 1-dim holes of are represented by and . As in Example Document, each needs at least three teaching points. Since the learner picks one for all data points, more data points are needed for when , where and . Hence to find the minimal teaching set for the homotopy type of , we may assume that . Suppose that any three data points sharing a circle in Figure Document are equidistant points. Then can be used to teach and . To recover the only 2-dim hole of , it is natural to add into to complete the red dotted circles going through and . Ideally, -balls centered at these points should form a torus. However, there are large undesirable gaps left open between the red circles because the learner is restricted to pick . We now compute how many extra points are needed to fill in all these gaps. Direct calculation shows that the radius of the dashed blue circle is and nine equidistant data points on are needed to teach it with . If we rotate around nine times with each step , then the trace of produces data points (including all 9 points in ). With these points, we almost form a torus but still have many small gaps. One may count that in total there are such gaps. So points are enough. Moreover, notice that the inner green circle is over-taught; one may check that 3 teaching points can be removed from . Hence we may teach with points.

The approach used in Example Document can be generalized to all orientable surfaces.

Proposition. Let be a closed orientable surface with genus . Then the minimal teaching number for the homotopy type of with respect to is bounded by .

Proof. We will proceed by induction. When , is homotopy equivalent to . So the homotopy type of can be taught by 51 points. Suppose that the claim holds for any with . Then when , , surface with genus and boundary component, can be taught by points. Notice that there exists a which can be obtained by gluing a with a . Hence we may teach with data points.

Remark. The teaching set prescribed in Proposition Document for is robust to parameter . For different choices of , if the learned space is not homotopy equivalent to the target manifold , then is either contractible or disconnected. Therefore, if the learner and the teacher agree that the target manifold is connected and not contractible, then the learner is able to learn the correct manifold (homology) using any proper choice of .

Remark. Let be a genus orientable surface with boundary components. Note that can be obtained from by removing disjoint disks. Therefore, the minimal teaching number of with respect to is bounded by .

The above analysis suggests that teaching manifolds by isolated data is not always efficient: the examples show that even a manifold as simple as a regular torus requires a large set of teaching points. Based on the manifold’s topological features, below we propose two new classes of teaching algorithms.

### Manifold teaching from demonstrations

A main task in manifold teaching is passing on the correct topology. Because manifolds are locally Euclidean, any local-to-global teaching procedure needs a large amount of data to accomplish this task. We describe a method that teaches the topology directly from demonstrations, where each demonstration is a sequence of data points describing a loop. Teaching with sequences of data points is efficient because the topology of a manifold is intuitively captured by loops in various dimensions.
As in Example Document, a unit circle can be taught by three points. In fact, any (oriented) ^{1}^{2}^{3}^{4}

To make the learning algorithm robust to the choice of curves and planes, we further assume that the teacher and the learner agree that: (1) there is no intersection between different connecting curves except at end points; (2) two points are connected by at most one curve. For instance, a pair of pants can be taught by as shown in Figure Document(b). With the teaching set , the learner needs to connect and multiple times. If the learner makes the connection by the \textcolor{red}{red} curves the first time, assumption (2) ensures that the learner always picks the red curves during the entire learning process.

Example. The torus in Example Document can be taught by a sequence of four sequences: as shown in Figure Document. This teaching set contains only the 9 basic points, which fits our initial intuition.

Proposition. Let be a closed orientable surface with genus . Then the minimal teaching number of with respect to is bounded by sequences, where each sequence consists of at most points.

Proof. A classical result on surfaces states that for any , there is a system of disjoint simple closed curves which cut into pairs of pants (see, for example, [farb2011primer]). Note that each simple closed curve can be taught by a sequence of points; each pair of pants can be taught by a sequence that consists of three sequences representing its boundary curves. Moreover, two legs of a pair of pants can be glued along their boundary curves through sequential data. For instance, the two \textcolor{blue}{blue} boundary curves in Figure Document(b) can be glued by . Hence the claim holds.
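The pants decomposition used in the proof comes with a standard count, which can be recovered from the Euler characteristic. The derivation below is a sketch in assumed notation ($\Sigma_g$ for the closed orientable surface of genus $g \ge 2$, $P$ for a pair of pants, $p$ and $c$ for the numbers of pants and cutting curves). It states the classical numbers, which are not necessarily the exact constants appearing in the proposition.

```latex
\begin{align*}
\chi(\Sigma_g) &= 2 - 2g, \qquad \chi(P) = 2 - 3 = -1
  && \text{(a sphere with three open disks removed)} \\
\chi(\Sigma_g) &= p \cdot \chi(P) \;\Longrightarrow\; p = 2g - 2
  && \text{(cutting along circles, which have } \chi = 0\text{, preserves } \chi\text{)} \\
c &= \tfrac{3p}{2} = 3g - 3
  && \text{(each pair of pants has 3 boundary curves, and each curve bounds two sides)}
\end{align*}
% Check for g = 2: two pairs of pants glued along 3 curves.
```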

## Teaching with partial knowledge

In this section, we show how a teacher who may have knowledge of only some parts of the manifold can help the learner improve its estimate of the relevant topological and geometrical information from the data. We consider teaching in the framework of Topological Data Analysis (TDA) [carlsson2009topology, chazal2017introduction]. We assume the learner’s data set is sampled from the true manifold . Also, there exists a teacher who is able to mark a sequence of data points in as a demonstration in Sec. Document which represents a loop in the part of the manifold known to the teacher. Using an algorithm in class (Sec. \ref{sec:points}) with different ’s, the learner obtains a summary of estimates of in the form of persistent homology (sketched below; see [edelsbrunner2010computational] for details). Roughly speaking, persistent homology tracks topological changes as the learner’s approximation of varies with . Based on algorithm , for each , the learner builds a union of balls , centered at with radii equal to . Consider the nested family of . Given a non-negative integer , the inclusion , for , naturally induces a linear map between their -th homology groups and . The set of all -th homology groups together with all the linear maps induced by inclusions forms a persistence module, which can be intuitively viewed as

[chazal2016structure] show that when is finite, the persistence module obtained for a fixed can be decomposed into a direct sum of interval modules of the form:

where is the identity map. Recall that each -th homology group is a direct sum of , with each copy of representing a -dim loop. Hence, essentially each interval module records the lifespan of a loop , which can be depicted by an interval from the birth radius to the death radius of the loop. Therefore, each persistence module forms a collection of intervals called the persistence barcode. Conventionally, the longer an interval in the barcode, the more persistent the corresponding topological feature. Each pair of birth and death times of a cycle can be represented by a point in the plane, and together these points form a persistence diagram. Heuristically, more persistent features, which are interpreted as more relevant, lie farther from the diagonal in the persistence diagram. One of the quirks of TDA is that persistence is only a heuristic. As follows from [niyogi2008finding], in a modest sized dataset, there need not be any for which the true topology is obtained.
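For concreteness, one common way to write a persistence module and an interval module is sketched below. The notation is assumed here for illustration and may differ from the paper's: $U_{\epsilon}$ is the union of $\epsilon$-balls, $k$ the homological dimension, $\mathbb{F}$ the coefficient field, and $[b, d)$ the lifespan of a feature born at scale $b$ and dying at scale $d$.

```latex
\[
H_k(U_{\epsilon_1}) \longrightarrow H_k(U_{\epsilon_2}) \longrightarrow \cdots \longrightarrow H_k(U_{\epsilon_m}),
\qquad \epsilon_1 \le \epsilon_2 \le \cdots \le \epsilon_m ,
\]
\[
I_{[b,d)} :\quad
0 \to \cdots \to 0 \to \mathbb{F} \xrightarrow{\ \mathrm{id}\ } \mathbb{F}
\xrightarrow{\ \mathrm{id}\ } \cdots \xrightarrow{\ \mathrm{id}\ } \mathbb{F} \to 0 \to \cdots \to 0 ,
\]
% In I_{[b,d)}, the copies of F sit exactly at the scales eps with b <= eps < d.
% The corresponding bar in the barcode is [b, d), and the diagram point is (b, d).
```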

### Learning from demonstrations

There are two types of challenges (which may occur simultaneously) in TDA for which we believe teaching can be helpful. First, for manifolds with small reach, TDA is prone to over-identification of loops, i.e., proposing loops that are not legitimate in the actual manifold from which the data are sampled (Example Document). Second, TDA usually focuses only on loops that persist for a long time, i.e., those far from the diagonal in the persistence diagram. However, many short-lived loops are of interest given background knowledge, and they may be disregarded because of insufficient samples or non-uniform sampling from the manifold (Example Document).

Teaching through demonstration addresses these problems by letting loops of interest persist longer. For the first type of problem, a demonstration identifying a cycle that is homotopy equivalent to a loop already learned by a TDA learner suggests that the birth of such a cycle might have happened sooner, and the learner is allowed to assume an earlier birth time for it. In terms of the persistence diagram, this is equivalent to shifting the point corresponding to this cycle horizontally to the left, which moves it farther from the diagonal and increases its chance of selection. Geometrically, the teacher’s background knowledge of the manifold and of the uniform sampling process allows the learner to assume the existence of other unobserved data from that lie between the demonstrated points, so that smaller -balls suffice for the formation of that cycle, hence the earlier birth time.

Teaching also justifies the learner persisting a loop for a longer time, which is helpful for problems with fast-dying loops. This can be achieved by increasing the death time of the loop, which topologically means letting the desired loop live longer by obstructing its collapse in the formation of as grows. Consequently, the point corresponding to the taught cycle in the persistence diagram shifts vertically, farther from the diagonal.
Integrating the teaching data requires specifying how the learner interprets the teacher’s marking of a sequence of points. In the case of complete trust in the teacher, the learner could infer that a teacher’s demonstration implicitly picks out the range of under which can be properly approximated. This interpretation implies that the teacher knows the entire manifold, which is a rather strong assumption that may not be practically useful. We consider two relaxations toward a notion of teaching in which the teacher has partial knowledge. Suppose the teacher’s demonstration is consistent with a feature (an n-dim loop), , that appears on the persistence barcode with lifespan .

First, if the teacher had full access to that feature, then the learner could assume that the demonstration indicated that should exist from the lowest scale, i.e. . However, for a teacher who is assumed to have more, but not perfect, knowledge, the learner may sample a new lower bound for that loop uniformly in the range .

Second, if the data observed by the learner were uniformly sampled and sufficiently dense, there would exist a (range of) wherein the true features would be observed together. However, in reality the points observed by the learner may be non-uniformly sampled (Example Document). In this case, the teacher’s demonstration would be interpreted as indicating a long-living feature at . Hence, the learner may sample a new upper bound uniformly in the range .
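The two relaxations above amount to simple updates of a bar's endpoints. The sketch below is one possible reading, not the paper's implementation. The sampling ranges are assumptions made here because the exact ranges are not stated in the text: a new birth is drawn from `[0, birth]` and a new death from `[death, horizon]`, where `horizon` is a hypothetical largest scale considered. The names `extend_birth`, `extend_death` and `persistence` are illustrative.

```python
import random


def extend_birth(birth, death, trust="full", rng=random):
    """Move a demonstrated loop's birth earlier (left shift in the diagram).

    trust="full":    the loop is assumed to exist from the lowest scale, 0.
    trust="partial": a new birth is sampled uniformly in [0, birth]
                     (assumed range, see the lead-in).
    """
    if trust == "full":
        return 0.0, death
    if trust == "partial":
        return rng.uniform(0.0, birth), death
    raise ValueError("trust must be 'full' or 'partial'")


def extend_death(birth, death, horizon, rng=random):
    """Move a demonstrated loop's death later (upward shift in the diagram).

    The new death is sampled uniformly in [death, horizon] (assumed range).
    """
    if horizon < death:
        raise ValueError("horizon must be at least the current death scale")
    return birth, rng.uniform(death, horizon)


def persistence(interval):
    """Length of a bar, i.e. vertical distance of its point above the diagonal."""
    birth, death = interval
    return death - birth
```

Either update can only lengthen the bar, so a demonstrated loop never becomes less persistent than it was in the learner's original barcode.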

## Simulations

In this section, we analyze the methods proposed in the previous section in two examples where teaching is helpful.

Example. Let the true manifold be the \textcolor{blue}{blue} barbell shaped annulus shown in Figure Document with reach . Assume that the learner analyzes randomly sampled data by TDA and the teacher knows that contains a 1-dim hole. As described in Section Document, three distinct points are required to form a teaching sequence for this hole. When fewer than three data points are observed by the learner, the teacher would simply wait until more data were collected. Suppose that the learner gets three data points as shown in Fig. Document. The corresponding persistence barcode of is empty for (no 1-dim loop is ever formed for any choice of ). With , the teacher may teach by marking these points sequentially as for example . Comparing the teacher’s demonstration with the barcode, the learner would realize that contains a 1-dim loop containing and that the current dataset is not sufficient to extract any accurate geometrical information.

Further, suppose that the learner intends to estimate the geometry of and continues to sample more points. A given data set is feasible if the learner is able to derive the true geometry of from with some , i.e. for this example, if there exists such that is homotopy equivalent to .
^{5}^{6}^{7}

Figure Document(c-left) depicts the mean and variance of the learning accuracy of ’s geometry for different learners. The blue curve represents the learner who has high confidence in the teacher’s demonstration and therefore extends the birth time of the bar corresponding to the loop containing (the top green bar in Fig. Document(a) in this case) to the lowest possible , shown as in Figure Document(b-left) in the persistence barcode and (b-right) in the persistence diagram. The orange curve represents the learner who has limited confidence in the teacher’s demonstration and extends the birth of the corresponding bar to a uniformly sampled smaller . The green curve represents the learner who chooses uniformly over the entire possible range. The red curve represents the learner who approximates by with the most persistent homology (two -dim loops in this case), and is incorrect about the geometry even with increasing data size. Notice that the gap between the blue (or orange) and the green curves decreases as the number of data points increases. This implies that the smaller the data set is, the more effective the teacher’s demonstration is.

As discussed before, the difficulty of learning a manifold increases dramatically as the reach of drops. We further investigated the role of reach in this setting. Figure Document(c-right) plots the average increase in the probability of learning the geometry with high confidence in the teacher for different magnitudes of reach. It indicates that the teacher’s demonstration is most effective for smaller reach and sparse data sets. Thus, the example illustrates how a learner’s acquisition of geometry is accelerated by teaching topology.
Example. As discussed in Sec. Document, another type of problem where teaching is helpful arises when the data points are not sampled uniformly and their distribution constrains the TDA learner to identify important cycles at different non-overlapping ranges of . We analyze a dataset of taxi pickup locations around Central Park in Manhattan. ^{8}

Persistent homology has started to attract attention in machine learning [carlsson2008local, chazal2013persistence, li2014persistence, reininghaus2015stable]. However, leveraging these topological features for learning poses considerable challenges because the relevant topological information is not carried by the whole persistence barcode but is concentrated in a small region that may not be obvious [hofer2017deep]. Teaching by demonstration resolves these challenges by allowing the learner to extract the most suitable topological information after the correct homology appears in the persistence barcode, and by focusing the analysis of ’s geometry on the most appropriate range of with high data efficiency.

More importantly, teaching by demonstration allows accumulation of information across learners. As pointed out in Sec. \ref{sec:points}, the method of teaching by sampling points essentially assumes that the teacher knows the true manifold . However, given the intractability of manifold learning in general, there is no plausible way for the teacher to have access to . On such accounts, teaching merely passes the problem off to a teacher for whom the learning problem does not exist. The key advantage of teaching from demonstrations is that it allows the teacher to convey critical information about without knowing the entire manifold.

In addition, from a teacher’s perspective, much less data is needed to learn the topology of an irregular manifold than its geometry. For instance, let be the 1-dim manifold shown in Figure Document(b). Denote the reach of by and the radius of the left arc in by . Note that the teacher only needs -dense data to learn the topology of , whereas -dense data are needed to learn the geometry. In fact, for any manifold , we may define its topological reach to be the largest number such that is homotopy equivalent to for any , where . According to Proposition 3.2 in [niyogi2008finding], for the same confidence level, the number of points needed for the data to be -dense increases polynomially with . Therefore, when is irregular, i.e. is significantly less than , far fewer data points are needed to achieve -dense data than -dense data. Since the topology of remains the same for data beyond -dense, it requires much less data to learn the topology of an irregular manifold than its geometry.

## Related work and discussion

There are two levels, topology and geometry, in learning a manifold . At the topological level, the goal is to convey the homotopy type of , whereas at the geometric level, one aims to minimize the Hausdorff distance between the learned space and the target manifold. Teaching by points as in Sec. \ref{sec:points} combines these two objectives. According to the main result in [niyogi2008finding], the learned space is topologically the same as the true manifold with high confidence only if is chosen close to the reach of . This also indicates that every point in is close to . In contrast, teaching by demonstration in Sec. Document prioritizes topology, which leads to a large reduction in the minimal teaching number. However, the distance between the learned space and could be large. Moreover, the teacher may have only limited access to . For instance, the teacher could be a local manifold landmarker [silva2006selecting, xu2018unified], who only has a dense sample from a particular region of . To close this gap, we show in Sec. Document and Sec. Document that even learners interested in the geometry of would benefit from learning the topological information from a teacher with partial knowledge.

There are two main areas of related work: formal approaches to manifold learning and machine teaching. [niyogi2008finding] describes a PAC learning framework for learning the homology of a manifold, which we directly build upon in Section \ref{sec:points}. Extensions have, for example, directly tested the manifold hypothesis [fefferman2016testing] and estimated the reach of a manifold [aamari2017estimating]. This line of work assumes that data are isolated sample points that are not selected by a teacher. The literature on machine teaching, algorithmic teaching, and Bayesian teaching investigates the implications of having a teacher for machine learning algorithms. Machine teaching formalizes teaching standard machine learning algorithms with the single best set of teaching points [Zhu2015, Liu2016]. Algorithmic teaching uses the deterministic algorithmic learning framework [doliwa2014recursive]. Bayesian teaching formalizes teaching standard probabilistic machine learning algorithms [eaves2016toward, yangexplainable]. All of these assume that the relevant data are points rather than more structured data, and all require that the teacher knows the correct answer.

## Conclusions

Manifold learning is challenging due to problems of small reach and non-uniform sampling. We analyze the possibility that teaching, by examples and demonstrations, may address these challenges. Theoretical bounds are provided for perfect knowledge teachers teaching by examples and demonstrations. We then extend to partial knowledge teachers by integrating teaching into a Topological Data Analysis (TDA) learning framework and illustrate how partial knowledge teachers can help learners overcome problems of small reach and non-uniform sampling. Important future directions include elaborating the connections between topological and geometric interpretations of teaching to constrain inferences about the exact manifold, and elaborating the interpretation of teaching as a form of multidimensional persistence.

### Footnotes

- Without considering the orientation, a sequence with two points can also describe a loop.
- A curve is simple if it has no self-intersection.
- If and are the same point, do nothing.
- and .
- It is possible that is homotopy equivalent to for . However, in this case the top and the bottom of the narrow middle part of may be connected up in such , which leads to wrong geometry.
- The barcode was constructed using the GUDHI library [maria2014gudhi].
- The end of the horizon is set to
- Test dataset at www.kaggle.com/c/new-york-city-taxi-fare-prediction