A kernel-based method for coarse graining complex dynamical systems
Abstract
We present a novel kernel-based machine learning algorithm for identifying the low-dimensional geometry of the effective dynamics of high-dimensional multiscale stochastic systems. Recently, the authors developed a mathematical framework for the computation of optimal reaction coordinates of such systems that is based on learning a parametrization of a low-dimensional transition manifold in a certain function space. In this article, we enhance this approach by embedding and learning this transition manifold in a reproducing kernel Hilbert space, exploiting the favorable properties of kernel embeddings. Under mild assumptions on the kernel, the manifold structure is shown to be preserved under the embedding, and distortion bounds can be derived. This leads to a more robust and more efficient algorithm compared to previous parametrization approaches.
1 Introduction
Many of the dynamical processes investigated in the sciences today are characterized by the existence of phenomena on multiple, interconnected time scales that determine the long-term behavior of the process. Examples include the inherently multiscale dynamics of atmospheric vortex and current formation, which needs to be considered for effective weather prediction [24, 29], or the vast difference in time scales on which bounded atomic interactions, side-chain interactions, and the resulting formation of structural motifs occur in biomolecules [19, 12, 10]. An effective approach to analyzing these systems is often the identification of a low-dimensional observable of the system that captures the interesting behavior on the longest time scale. However, the computerized identification of such observables from simulation data poses a significant computational challenge, especially for high-dimensional systems.
Recently, the authors have developed a novel mathematical framework for identifying such essential observables for the slowest time scale of a system [6]. The method—called the transition manifold approach—was primarily motivated by molecular dynamics, where the dynamics is typically described by a thermostated Hamiltonian system or by diffusive motion in a molecular energy landscape. In these systems, local minima of the potential energy landscape induce metastable behavior, i.e., on long time scales the dynamics is characterized by rare transitions between certain sets that happen roughly along interconnecting transition pathways [35, 41, 17]. The sought-after essential observables should thus resolve these transition events, and are called reaction coordinates in this context [46, 4], a notion that we will adopt here. Despite its origins, the transition manifold approach is also applicable to other classes of reducible systems (which will also be demonstrated in this article).
At the heart of this approach is the insight that good reaction coordinates can be found by parametrizing a certain transition manifold in the function space . This manifold has strong connections to the aforementioned transition pathways [5], but is not equivalent to them. Its defining property is that, for times that fall between the fastest and slowest time scales, the transition density functions with relaxation time concentrate around . The algorithmic strategy to parametrize can then be summarized as follows:
1. Choose test points in the dynamically relevant regions of the state space.

2. Sample the transition densities for each test point by Monte Carlo simulation.

3. Embed the transition densities into a Euclidean space by a generic embedding function.

4. Parametrize the embedded transition densities with the help of established manifold learning techniques.
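The four steps above can be sketched in a few lines of numpy. Everything concrete in the sketch — the double-well potential, the monomial feature map, and PCA as the manifold learner — is our own illustrative choice under stated assumptions, not the specific components prescribed by the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2D system: slow double-well coordinate x[0], fast coordinate x[1].
def grad_V(X):
    return np.column_stack([4 * X[:, 0] ** 3 - 4 * X[:, 0], 2 * X[:, 1]])

def sample_endpoints(x0, t=1.0, dt=1e-3, n_samples=50, beta=1.0):
    """Step 2: endpoints of n_samples short bursts started in x0 (Euler-Maruyama)."""
    X = np.tile(x0, (n_samples, 1))
    for _ in range(int(t / dt)):
        X = X - dt * grad_V(X) + np.sqrt(2 * dt / beta) * rng.standard_normal(X.shape)
    return X

# Step 1: test points on a coarse grid over the relevant part of state space.
test_points = [np.array([x, y]) for x in np.linspace(-1.5, 1.5, 7)
                                for y in np.linspace(-1.0, 1.0, 5)]

# Step 3: embed each sampled transition density via empirical means of a
# generic feature map (monomials -- a stand-in choice).
def embed(X):
    return np.array([X[:, 0].mean(), X[:, 1].mean(), (X[:, 0] ** 2).mean()])

embedded = np.array([embed(sample_endpoints(x)) for x in test_points])

# Step 4: parametrize the embedded point cloud; PCA stands in for a general
# manifold learning method.
centered = embedded - embedded.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
reaction_coordinate = centered @ Vt[0]  # one RC value per test point
print(reaction_coordinate.shape)
```

In this toy setting the leading principal direction of the embedded cloud recovers (up to sign) the slow horizontal coordinate of the test points.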
The result is a reaction coordinate evaluated at the test points. This reaction coordinate has been shown to be as expressive as the dominant eigenfunctions of the transfer operator associated with the system [6], which can be considered an “optimal” reaction coordinate [20, 11, 30]. One decisive advantage of our method, however, is the ability to compute the reaction coordinate locally (by choosing the test points), whereas with conventional methods, the inherently global computation of transfer operator eigenfunctions quickly becomes infeasible due to the curse of dimensionality. Kernel-based methods for the computation of eigenfunctions alleviate this problem to some extent [43, 25]. Nevertheless, the number of dominant eigenfunctions critically depends on the number of metastable states, which can be significantly larger than the natural dimension of the reaction coordinate [6].
However, the algorithm proposed originally had several shortcomings related to the choice of the embedding function. First, in order to ensure the preservation of the manifold’s topology under the embedding, the dimension of had to be known in advance. Second, the particular way of choosing the embedding functions allowed no control over the distortion of , which may render the parametrization problem numerically ill-conditioned.
The goal of this article is to overcome the aforementioned problems by kernelizing the transition manifold embedding. That is, we present a method to implicitly embed the transition manifold into a reproducing kernel Hilbert space (RKHS) with a proper kernel. The RKHS is—depending on the kernel—a high- or even infinite-dimensional function space with the crucial property that inner products between points embedded into it can be computed by cheap kernel evaluations, without ever explicitly computing the embedding [48, 38]. In machine learning, this so-called kernel trick is often used to derive nonlinear versions of originally linear algorithms, by interpreting the RKHS embedding of a data set as a high-dimensional, nonlinear transformation, and (implicitly) applying the linear algorithm to the transformed data. This approach has been successfully applied to methods such as principal component analysis (PCA) [40], canonical correlation analysis (CCA) [31], and time-lagged independent component analysis (TICA) [43], to name but a few.
Due to their popularity, the metric properties of the kernel embedding are well-studied [45, 21, 47, 22, 34]. In particular, for characteristic kernels, the RKHS is “large” in an appropriate sense, and geometrical information is well-preserved under the embedding. For our application, this means that distances between points on the transition manifold are approximately preserved, and thus the distortion of under the embedding can be bounded. This will guarantee that the final manifold learning problem is well-posed. Moreover, if the transformation induced by the kernel embedding is able to approximately linearize the transition manifold, there is hope that efficient linear manifold learning methods can be used to parametrize the embedded transition manifold.
The main contributions of this work are as follows:

We develop a kernel-based algorithm to approximate transition manifolds and compare it with the Euclidean embedding counterpart.

We derive measures for the distortion of the embedding and associated error bounds.

We illustrate the efficiency of the proposed approach using academic and molecular dynamics examples.
In Section 2, we will formalize the definition of transition manifolds and derive conditions under which systems possess such manifolds. Section 3 introduces kernels and the induced RKHSs. Furthermore, we show that the algorithm to compute transition manifolds numerically can be written purely in terms of kernel evaluations, and we derive measures for the distortion of the manifold caused by the embedding into the RKHS. Numerical results illustrating the benefits of the proposed kernel-based methods are presented in Section 4; a conclusion and future work follow in Section 5.
2 Reaction coordinates based on transition manifolds
In what follows, let (abbreviated as ) be a reversible, thus ergodic, stochastic process on a compact state space and its unique invariant density. That is, if , then for all . For and , let denote the transition density function of the system, i.e., describes the probability of finding the system in at time , after starting in at time .
2.1 Reducibility of dynamical systems
We assume the state space dimension to be large. The main objective of this work is the identification of good low-dimensional reaction coordinates (RCs) or order parameters of the system. An dimensional RC is a smooth map from the full state space to a lower-dimensional space , . Informally, we call such an RC good if the long-term dynamics of the projected process —i.e., the process observed through the RC—resembles the long-term dynamics of the full process . A quantitative measure for the quality of RCs will follow in Section 2.2.
A geometric characterization of good reaction coordinates based on the so-called transition manifold was derived in [6].
Definition 2.1.
Let be fixed. The set
is called the fuzzy transition manifold of the system.
The definition is motivated by the following observation: If the system is in a certain way separable into a slowly equilibrating dimensional part and a quickly equilibrating dimensional part, and is chosen to lie in a certain intermediate time scale range (intermediate meaning large enough that the fast processes have essentially equilibrated, but the slow processes are still present and distinguishable), then approximately forms an dimensional manifold in . Let us illustrate this with the aid of a simple example.
Example 2.2.
Consider the process to be described by overdamped Langevin dynamics
(1) 
with the energy potential , the inverse temperature , and Brownian motion . The potential depicted in Figure 1 possesses two metastable states, located around the local energy minima. Any realization of this system that started in one of the metastable sets will likely remain in that set for a long time. Moreover, a realization that started outside of the wells will with high probability quickly leave the transition region and move into one of the two metastable sets. Whether the trajectory will be trapped in the left or the right well depends almost exclusively on the horizontal coordinate of the starting point , or, in other words, on the progress of along the transition path. Thus, for times that allow typical trajectories to move into one of the wells, the transition densities also depend only on the horizontal but not the vertical coordinate of . This means that the fuzzy transition manifold essentially forms a one-dimensional manifold in the space of probability densities. Also, any parametrization of this manifold corresponds to a parametrization of the horizontal coordinate, and thus to a good reaction coordinate.
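The observation in Example 2.2 can be checked numerically. The potential below is a hypothetical double-well stand-in for the one in Figure 1 (wells near the horizontal positions -1 and +1, harmonic confinement in the vertical direction), not the potential actually used in the article. Two starting points that share the horizontal coordinate but differ vertically yield nearly the same probability of ending up in the right well:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in potential V(x, y) = x^4 - 2 x^2 + y^2: wells at x = -1 and x = +1.
def grad_V(X):
    return np.column_stack([4 * X[:, 0] ** 3 - 4 * X[:, 0], 2 * X[:, 1]])

def endpoints(x0, t=2.0, dt=1e-3, n=500, beta=3.0):
    """Endpoints of n Euler-Maruyama realizations of (1) started in x0."""
    X = np.tile(x0, (n, 1))
    for _ in range(int(t / dt)):
        X = X - dt * grad_V(X) + np.sqrt(2 * dt / beta) * rng.standard_normal(X.shape)
    return X

# Same horizontal coordinate x = 0.3, different vertical coordinates.
p_upper = (endpoints(np.array([0.3,  0.8]))[:, 0] > 0).mean()
p_lower = (endpoints(np.array([0.3, -0.8]))[:, 0] > 0).mean()
print(abs(p_upper - p_lower) < 0.15, p_upper > 0.5)
```

Both empirical probabilities are well above one half (the starting point lies on the right side of the barrier) and agree up to Monte Carlo error, illustrating that the transition density depends essentially only on the horizontal coordinate.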
The concept of (fuzzy) transition manifolds can be made rigorous by the following definition:
Definition 2.3.
The process is called reducible if there exists an dimensional manifold such that for a fixed lag time , it holds that
Any such is called a transition manifold of the system.
Two remarks are in order:

The fact that we require closeness with respect to the norm, instead of, for example, the norm, is indeed important here, as measuring the quality of a given RC will require a certain Hilbert space structure. This will be detailed below; for now, simply note that for all it holds that since .

The original definition of reducibility (see [6, Definition 4.4]) is marginally different from the definition above: instead of , we here require . The introduction of this slightly stronger technical requirement allows us to later control a certain embedding error; see Proposition 2.8. Note that the proofs in [6] regarding the optimality of the final reaction coordinate are not affected by this change.
In what follows, we always assume that the process is reducible with small and .
Remark 2.4.
In addition to metastable systems, many other relevant problems possess such transition manifolds. For instance, one can show that, under mild conditions on and , systems with explicit time scale separation, given by
also possess a transition manifold since essentially depends only on for and .
2.2 A measure for the quality of reaction coordinates
We will now derive a measure for evaluating the quality of reaction coordinates that is based on transfer operators. The Perron–Frobenius operator associated with the process is defined by
This operator can be seen as the pushforward of arbitrary starting densities, i.e., if , then .
The operator can be naturally extended to the inner product space and then has particularly advantageous properties (see [2, 37, 26]): Defined on , the operator is self-adjoint due to the reversibility of . Moreover, under relatively mild conditions, it does not exhibit any essential spectrum [41]. Hence, its eigenfunctions form an orthonormal basis of and the associated eigenvalues are real. Now, the significance of the dominant eigenpairs for the system’s time scales is well-known [41]. This is the primary reason for the choice of the norm in Definition 2.3.
Let be the eigenvalues of , sorted by decreasing absolute value, and the corresponding eigenfunctions, where . It holds that is independent of , isolated, and the sole eigenvalue with absolute value . Furthermore, . The subsequent eigenvalues decrease monotonically to zero, both with increasing index and with increasing time. That is,
The associated eigenfunctions can be interpreted as subprocesses of decreasing longevity in the following sense: Let , with , , then
since for the lag time as defined above, there exists an index such that for all and all . Hence, most of the information about the long-term density propagation of is encoded in the dominant eigenpairs.
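The role of the dominant eigenpairs can be made concrete in the simplest possible setting, a finite state space, where the transfer operator becomes a stochastic matrix. The four-state chain below is our own toy construction: two metastable blocks with fast intra-block mixing and rare inter-block hopping. The gap after the second eigenvalue separates the slow block-hopping process from the fast mixing:

```python
import numpy as np

# Symmetric (hence reversible) 4-state chain: blocks {0,1} and {2,3} mix
# quickly (rate 0.30), transitions between the blocks are rare (rate 0.01).
P = np.array([[0.69, 0.30, 0.01, 0.00],
              [0.30, 0.69, 0.00, 0.01],
              [0.01, 0.00, 0.69, 0.30],
              [0.00, 0.01, 0.30, 0.69]])

evals, evecs = np.linalg.eigh(P)            # real spectrum by reversibility
order = np.argsort(np.abs(evals))[::-1]     # sort by decreasing |eigenvalue|
evals, evecs = evals[order], evecs[:, order]

print(np.round(evals, 2))  # eigenvalues 1, 0.98, 0.40, 0.38
# The second eigenfunction is constant within each block and changes sign
# between the blocks -- it resolves exactly the slow transition process.
print(np.sign(evecs[:, 1]))
```

The two dominant eigenvalues (1 and 0.98) encode the stationary density and the slow inter-block exchange; the remaining eigenpairs decay quickly under repeated application of P, matching the truncation argument in the text.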
In addition to the Perron–Frobenius operator for the full process , we can define the Perron–Frobenius operator for the projected process , denoted by . Formally, it can be computed by applying the Zwanzig projection operator to ; for details see [50]. Assume now that we are interested in reproducing the long-term dynamics of , i.e., the action of , for bigger than some fixed lag time . A natural requirement for good reaction coordinates then is
(2) 
for and all densities . Therefore, a necessary condition is
That is, the eigenfunctions must be almost constant along the level sets of . For a formal proof of this statement, see [6]. This motivates the following definition:
Definition 2.5.
Let be the eigenpairs of the Perron–Frobenius operator. Let and such that for all and . We call a function a good reaction coordinate if for all there exist functions such that
(3) 
If condition (3) is fulfilled, we say that (approximately) parametrizes the dominant eigenfunctions.
2.3 Optimal reaction coordinates
We now justify why reaction coordinates that are based on parametrizations of the transition manifold indeed fulfill the optimality condition (3). Let be the nearest-point projection onto , i.e.,
Assume further that some parametrization of is known, i.e., is one-to-one on and its image in . The reaction coordinate defined by
(4) 
is good in the sense of Definition 2.5.
Theorem 2.6 ([6, Corollary 3.8]).
Let the system be reducible and defined as in (4). Then for all , there exist functions such that
(5) 
Let us add two remarks:
The main task for the rest of the paper is now the numerical computation of an (approximate) parametrization of .
2.4 Whitney embedding of the transition manifold
One approach to find a parametrization of , proposed by the authors in [6], is to first embed into a more accessible Euclidean space and to parametrize the embedded manifold. In order to later compare it with our new method, we will briefly describe this approach here.
To construct an embedding that preserves the topological structure of , without prior knowledge about , a variant of the Whitney embedding theorem can be used. It extends the classic Whitney theorem to arbitrary Banach spaces and was proven by Hunt and Kaloshin in [23].
Theorem 2.7 (Whitney embedding theorem, [23]).
Let be a Banach space, let be a smooth manifold of dimension and let . Then almost every (in the sense of prevalence) bounded linear map is one-to-one on and its image in .
In our case, and is the transition manifold. Assuming that the dimension of the transition manifold is known in advance, this result states that we can effectively choose the embedding randomly, and the image will again be an dimensional manifold in , provided that .
We restrict ourselves to embeddings of the form
with arbitrarily chosen feature maps that we require to be bounded on . The dynamical embedding of a point is then defined by
(6) 
This is the Euclidean representation of the density , and the set is the Euclidean representation of the fuzzy transition manifold. It again clusters around an dimensional manifold in , namely the image of the transition manifold under :
Proposition 2.8.
Let the process be reducible with transition manifold , and and defined as above. Then
Proof.
Let . Then there is an such that . As , i.e., consists of transition densities, we have
Remark 2.9.
Together, Theorem 2.7 and Proposition 2.8 guarantee at least a minimal degree of well-posedness of the embedding problem: The embedded manifold has the same topological structure as , and clusters closely around it (if is small). However, guarantees on the condition of the problem cannot be made. The manifold will in general be distorted by , to a degree that might pose problems for numerical manifold learning algorithms. This problem is illustrated in Figure 2. Such a situation typically occurs if some of the components of the embedding are strongly correlated.
Additionally, the Whitney embedding theorem cannot guarantee that the fuzzy transition manifold will be preserved under the embedding, as analytically is not a manifold. Thus, is in general not injective on .
2.5 Datadriven algorithm for parametrizing the transition manifold
Due to its implicit definition, the transition manifold is hard to analyze and parametrize without further knowledge of the system. However, as and concentrates closely around , we can assume that any parametrization of the dominant directions of is also a good parametrization of . The embedding can be efficiently sampled from simulation data in the following way: Let denote the endpoint of the time realization of starting in and outcome , where is the sample space underlying the process . For , fixed as in Definition 2.3 and arbitrarily chosen , let . In practice, the will be generated by multiple runs of a numerical SDE solver starting in with different random seeds.
The empirical dynamical embedding is defined by
(7) 
and is an estimator for .
Let be a finite sample of state space points, which covers the “dynamically relevant” part of state space, i.e., the regions of of substantial measure . The exact distribution of the sampling points is not important here. If is bounded or periodic, could be drawn from the uniform distribution or chosen to form a regular grid. In practice, it often consists of a sample of the system’s equilibrium measure . Due to Proposition 2.8, the point cloud then clusters around an dimensional manifold in . Parametrizing this manifold is a classical manifold learning task for which a wide variety of algorithms can be deployed. Algorithm 2.1 summarizes the overall procedure of computing good reaction coordinates based on parametrizing the embedded fuzzy transition manifold.
3 Kernelbased parametrization of the transition manifold
The approach described above for learning a parametrization of the transition manifold by embedding it into Euclidean spaces requires a priori knowledge of the dimension of . Also, more importantly, might be strongly distorted by the embedding, as described in Section 2.4. The kernel-based parametrization, which is the main novelty of this work, will address both of these shortcomings by embedding into reproducing kernel Hilbert spaces.
3.1 Kernel reformulation of the embedding algorithm
Dimensionality reduction algorithms that can be used in Algorithm 2.1 include Diffusion Maps [14], Multidimensional Scaling [49, 28], and Locally Linear Embedding [36]. These, and many others, require only a notion of distance between pairs of data points. In our case, this amounts to the Euclidean distances between embedded points, i.e., , which can be computed by the Euclidean inner products , as
Other compatible algorithms such as Principal Component Analysis are based directly on the inner products. The inner products can be written as
and the empirical counterpart is
However, rather than explicitly computing the inner product between the features on the righthand side, we now assume that it can be computed implicitly by using a kernel function , i.e.,
(8) 
This assumption, called the kernel trick, is commonly used to avoid the costly computation of inner products between high-dimensional features. However, instead of defining the kernel based on previously chosen features, one typically considers kernels that implicitly define high- and possibly infinite-dimensional feature spaces. In this way, we are able to avoid the choice of feature maps altogether.
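Identity (8) can be verified directly in a small example. For one common variant of the degree-2 polynomial kernel on R^2 (with an additive constant — our illustrative choice, not a kernel prescribed by the text), an explicit six-dimensional feature map reproduces the kernel value exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(2), rng.standard_normal(2)

def phi(v):
    """Explicit feature map with (v . w + 1)^2 = <phi(v), phi(w)>."""
    a, b = v
    return np.array([1.0,
                     np.sqrt(2) * a, np.sqrt(2) * b,
                     a * a, b * b,
                     np.sqrt(2) * a * b])

k_val = (x @ y + 1) ** 2         # one cheap kernel evaluation
f_val = phi(x) @ phi(y)          # the same value via explicit features
print(np.isclose(k_val, f_val))  # True
```

One kernel evaluation replaces the construction of, and inner product between, the two feature vectors; for a Gaussian kernel the feature space is infinite-dimensional, so the left-hand side is the only computable route.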
Kernels with this property span a so-called reproducing kernel Hilbert space:
Definition 3.1 (Reproducing kernel Hilbert space [45]).
Let be a positive definite function. The set of functions , together with the corresponding inner product, which fulfills

1. 

and

2. 

for all ,

is called the reproducing kernel Hilbert space (RKHS) associated with the kernel .
Requirement 2 implies that
(9) 
thus can be regarded as a function-valued feature map (the so-called canonical feature map). However, each positive definite kernel is also guaranteed to possess a feature map of at most countable dimension.
Theorem 3.2 (Mercer’s theorem [32]).
Let be a positive definite kernel and be a finite Borel measure with support . Define the integral operator by
(10) 
Then there is an orthonormal basis of consisting of eigenfunctions of rescaled with the square root of the corresponding nonnegative eigenvalues such that
(11) 
The above formulation of Mercer’s theorem has been taken from [34]. The Mercer features thus fulfill (8) for their corresponding kernel. In what follows, if not stated otherwise, will be the standard Lebesgue measure.
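The integral operator (10) and its Mercer eigenvalues can be approximated numerically by discretizing the kernel on a grid (a Nyström-type approximation; the interval, bandwidth, and grid size below are arbitrary choices of ours):

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 400)
dx = x[1] - x[0]

# Gaussian kernel Gram matrix on the grid; K * dx discretizes the integral
# operator (10) with the Lebesgue measure on [-1, 1].
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.3 ** 2))
evals = np.sort(np.linalg.eigvalsh(K * dx))[::-1]

# The eigenvalues are nonnegative, decay rapidly, and sum to the trace,
# i.e., the integral of k(x, x) = 1 over the (discretized) interval.
print(round(evals.sum(), 3), evals[0] > evals[5] > 0)
```

The rapid eigenvalue decay is what makes the Mercer feature vector effectively finite in practice: only the leading features carry appreciable weight.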
Example 3.3.
Examples of commonly used kernels are:

Linear kernel: . One sees immediately that (8) is fulfilled by choosing , (also spanning the Mercer feature space).

Polynomial kernel of degree : . It can be shown that the Mercer feature space is spanned by the monomials in up to degree .

Gaussian kernel: where is called the bandwidth of the kernel. Let and with be a multiindex. The Mercer features of then take the form
see [48], where
Let denote the density embedding based on the Mercer features of the kernel , i.e.,
(12) 
and let . The amount of information about preserved by the embedding depends on the choice of the kernel . For the first two kernels in Example 3.3, the information preserved has a familiar stochastic interpretation (see, e.g., [34, 39, 47]):

Let be the linear kernel. Then
i.e., the means of and coincide.

Let be the polynomial kernel of degree . Then
i.e., the first moments of and coincide.
Remark 3.4.
In practice, comparing the first moments is often enough to sufficiently distinguish the transition densities that constitute the transition manifold. However, densities that differ only in higher moments cannot be distinguished by , which means that for the above two kernels, is not injective. Therefore, does not belong to the prevalent class of maps at the heart of the Whitney embedding theorem (Theorem 2.7). We therefore cannot invoke the Whitney embedding theorem to argue that the topology of is preserved under . Instead, in Section 3.2, we will use a different argument to show that the embedding is indeed injective for the Gaussian kernel (and others).
Still, by formally using the Mercer dynamical embedding in (6) (abusing notation if there are countably infinitely many such features), and using the kernel trick, we can now reformulate Algorithm 2.1 as a kernel-based method that does not require the explicit computation of any feature vector. This is summarized in Algorithm 3.1.
3.2 Condition of the kernel embedding
We will now investigate to what extent the kernel embedding preserves the topology and geometry of the transition manifold.
3.2.1 Kernel mean embedding
We derived the kernel-based algorithm by considering the embedding of the transition manifold into the image space of the Mercer features in order to highlight the similarity to the Whitney embedding based on randomly drawn features. Of course, the Mercer features never had to be computed explicitly.
However, in order to investigate the quality of this embedding procedure it is advantageous to consider a different, yet equivalent embedding map: The transition manifold can be directly embedded into the RKHS by means of the kernel mean embedding operator.
Definition 3.5.
Let be a positive definite kernel and the associated RKHS. Let be a probability density over . Define the kernel mean embedding of by
and the empirical kernel mean embedding by  
Note that and are again elements of and that for in (10) being the Lebesgue measure we obtain . Further, one sees that
Thus, for investigating whether the embedding preserves distances or inner products between densities, we can equivalently investigate the embedding . This is advantageous as injectivity and isometry properties of the kernel mean embedding are well-studied.
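In particular, the RKHS distance between two embedded densities can be estimated from samples using kernel evaluations only, without touching any feature map; this is the well-known (biased) sample estimate of the maximum mean discrepancy. The Gaussian samples below are an illustration of ours, not data from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_k(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rkhs_dist2(X, Y):
    """||mu_p - mu_q||_H^2 estimated from samples X ~ p, Y ~ q."""
    return gauss_k(X, X).mean() - 2 * gauss_k(X, Y).mean() + gauss_k(Y, Y).mean()

X = rng.standard_normal((300, 2))             # samples from p
Y = rng.standard_normal((300, 2)) + [2.0, 0]  # samples from q (shifted mean)
Z = rng.standard_normal((300, 2))             # fresh samples from p

# Different densities sit far apart in the RKHS, equal ones close together.
print(rkhs_dist2(X, Y) > 0.2, rkhs_dist2(X, Z) < 0.05)
```

This is exactly the quantity the kernelized algorithm feeds to the manifold learning step: pairwise RKHS distances between empirically embedded transition densities, obtained from endpoint samples alone.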
3.2.2 Injectivity of the kernel mean embedding
A first important result is that can be chosen such that is injective. Such kernels are called characteristic [21]. In [47], several conditions for characteristic kernels are listed, including the following:
Theorem 3.6 ([47, Theorem 7]).
The kernel is characteristic if for all it holds that
(13) 
Condition (13) is known as the Mercer condition, which is, for example, fulfilled by the Gaussian kernel from Example 3.3. The Mercer features of such a kernel are particularly rich.
Theorem 3.7.
Assume that the kernel satisfies the Mercer condition (13). Then the eigenfunctions of form an orthonormal basis of .
For more details, see, e.g., [38, 48]. It is easy to see that for kernels fulfilling (13), as a map from to is Lipschitz continuous:
Lemma 3.8.
Let be a characteristic kernel with Mercer eigenvalues , . Then is Lipschitz continuous with constant
(14) 
Proof.
Thus, if the kernel is characteristic, the structure of the transition manifold and of the fuzzy transition manifold are qualitatively preserved under the embedding.
Corollary 3.9.
Let be a characteristic kernel and let be reducible. Then is an dimensional manifold in , and for all it holds that
Proof.
Remark 3.10.
This result should be seen as an analogue of Proposition 2.8 for the Whitney-based embedding of the transition manifold. In short, for characteristic kernels, the injectivity and continuity of guarantee that the image of under is again an dimensional manifold in , and Corollary 3.9 guarantees that the embedded fuzzy transition manifold still clusters closely around it (if and in Corollary 3.9 are small). This again guarantees a minimal degree of well-posedness of the problem.
3.2.3 Distortion under the kernel mean embedding
Unlike the Whitney embedding, the kernel embedding now allows us to derive conditions under which the distortion of is bounded. We have to show that the distance between points on is not overly decreased or increased by the kernel mean embedding. To formalize this, we consider measures for the manifold’s internal distortion, following the notions of metric embedding theory [1].
Definition 3.11.
Let be the embedding corresponding to a characteristic kernel. The contraction of is defined by
(15) 
The expansion of is defined by
(16) 
Finally, the distortion of is defined by
(17) 
It follows that and if and only if is isometric. We call the embedding well-conditioned if is small.
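These quantities can be estimated numerically for a concrete family of densities. Below, a family of 1D Gaussians (our own toy choice) is embedded via the Gaussian-kernel mean embedding, discretized on a grid; the ratio of RKHS to L2 distances over all pairs yields empirical expansion, contraction, and distortion:

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 601)
dx = x[1] - x[0]
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2)  # Gaussian kernel on the grid

def density(m):
    """Unit-variance Gaussian density with mean m, normalized on the grid."""
    p = np.exp(-(x - m) ** 2 / 2)
    return p / (p.sum() * dx)

ps = [density(m) for m in np.linspace(-1, 1, 9)]

def dist_L2(p, q):
    return np.sqrt(((p - q) ** 2).sum() * dx)

def dist_H(p, q):
    """||E p - E q||_H via kernel quadrature: (p-q)^T K (p-q) * dx^2."""
    d = p - q
    return np.sqrt(d @ K @ d) * dx

ratios = np.array([dist_H(p, q) / dist_L2(p, q)
                   for i, p in enumerate(ps) for q in ps[i + 1:]])
expansion, contraction = ratios.max(), 1.0 / ratios.min()
distortion = expansion * contraction
print(distortion >= 1.0, distortion < 10.0)
```

For this smooth, low-frequency family the empirical distortion stays moderate, in line with the bounds developed in the remainder of this section; densities differing only in high-frequency components would drive the contraction, and hence the distortion, up.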
Proposition 3.12.
Assume is the largest and the smallest bound such that
(18) 
holds for all . Then the distortion of the kernel embedding is .
Proof.
The second inequality in (18) is just Lipschitz continuity of , for which conditions have been derived in Lemma 3.8, so it can be assumed to hold with the corresponding Lipschitz constant.
However, one can show that, even for characteristic kernels, it is not possible to derive a bound for the first inequality that holds uniformly for all :
Proposition 3.13.
Assume the kernel embedding operator has absolutely bounded orthonormal eigenfunctions with corresponding nonnegative eigenvalues (arranged in nonincreasing order). Assume increases superlinearly with . Then there exist functions such that
for any arbitrarily small .
Proof.
See Appendix A. ∎
A similar, but non-quantitative, result has been derived in [47, Theorem 19]. The idea behind the proof is that, if and vary only in higher eigenfunctions of the embedding operator (see also Theorem 3.2), the distance can become arbitrarily small. If, however, we can reasonably restrict our considerations to the subclass of functions whose variation in the higher is small compared to the variation in the lower , a favorable bound can be derived. Let the expansion of be given by
with the sequence . Now, for any such that there exists an index such that , define the factor
(19) 
This factor bounds the contribution of the higher Mercer eigenfunctions to by the contribution of the lower ones:
Thus, for an individual , we can bound the distortion of the norm under with the help of .
Lemma 3.14.
Let , , and be defined as in (19). Then
Proof.
See Appendix A. ∎
This already shows that for ,
However, this is only an intermediate result, as the relevant distance measure in Definition 2.5 is the norm, for reasons detailed in Section 2.2. Unfortunately, a naive estimation yields
(20) 
While due to ergodicity is indeed defined on , it becomes large in regions of small invariant measure , i.e., in “dynamically irrelevant” regions. This would lead to a very small factor in (18), and thus to a large predicted distortion. For general , a more favorable estimate is difficult to obtain. In practice, however, these “dynamically irrelevant” regions are almost never visited by the system, so any practically relevant bound would not suffer from this problem.
Recall that our task is to parametrize the embedded transition manifold, in order to find a parametrization of the dominant transfer operator eigenfunctions . Therefore, differences of and in higher eigenfunctions , do not need to be resolved. Thus, we only need to preserve the distance
where is the orthogonal projection onto the dominant eigenfunctions:
This allows us to derive a more favorable bound:
Lemma 3.15.
Let denote the th dominant eigenfunction of the Perron–Frobenius operator and the associated eigenvalues. Let denote the orthogonal projection onto the dominant eigenfunctions. Then for any ,
where denotes the Lebesgue measure of .
Proof.
See Appendix A. ∎
Remark 3.16.
Note that unlike in (20), the factor is not problematic here, as the dominant eigenfunctions are small whenever is small.
Corollary 3.17.
Assume there exists and a constant such that
for all . Then
where
Remark 3.18.
In summary, if we can assume that the differences of all , , in the higher Mercer eigenfunctions of are uniformly bounded by the differences in the lower Mercer eigenfunctions, then the first inequality in (18) is fulfilled with the above constant . Combined with the Lipschitz constant in (14), we have therefore derived an upper bound for the distortion factor of :
(21) 
Most importantly, this bound guarantees the well-posedness of the embedding and parametrization problem, as the distortion of even the fuzzy transition manifold cannot become arbitrarily large. We will support this statement with numerical evidence (see Section 4.2) showing that the distortion is, in practice, indeed small. Note, however, that we do not expect (21) to perform well as a quantitative error estimate, as many of the estimates that led to it are rather rough.
4 Illustrative examples and applications
4.1 Reaction coordinate of the Müller–Brown potential
As a first illustrative example, we compute the reaction coordinate of the two-dimensional Müller–Brown potential [33] via the new kernel-based Algorithm 3.1. The potential energy surface (see Figure 3 (a)) possesses three local minima, where the two bottom minima are separated only by a rather shallow energy barrier. Correspondingly, the system’s characteristic long-term behavior is determined by the rare transitions between the minima. These transitions happen predominantly along the potential’s minimum energy pathway (MEP), which is shown as a white dashed line and was computed using the zero-temperature string method [15, 16].
For two sets and a starting point , the committor function is defined as the probability that the process hits set before hitting set , provided it started in at time zero; for a precise definition, see [41]. For the Müller–Brown potential, the committor function associated with the top left and bottom right energy minima, shown in Figure 3 (b), can be considered an optimal reaction coordinate [18]. Therefore, we use a (qualitative) comparison with the committor function as a benchmark for our reaction coordinate. Note that the computation of the committor function requires global knowledge of the metastable sets and is often not a practical option for the identification of reaction coordinates.
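As a minimal illustration of the committor concept (not the Müller–Brown computation itself), consider a symmetric nearest-neighbor random walk on the states 0, …, N, a toy choice of ours: the committor between A = {0} and B = {N} solves a linear system with boundary values 0 and 1, and comes out linear in the state index:

```python
import numpy as np

N = 10
# Symmetric nearest-neighbor random walk on {0, ..., N}, absorbing at 0 and N.
P = np.zeros((N + 1, N + 1))
for i in range(1, N):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 0] = P[N, N] = 1.0

# Committor q(i) = P(hit N before 0 | start in i): (I - P) q = 0 on the
# interior states, with boundary conditions q(0) = 0 and q(N) = 1.
A = np.eye(N + 1) - P
A[0, :], A[N, :] = 0.0, 0.0
A[0, 0] = A[N, N] = 1.0
b = np.zeros(N + 1)
b[N] = 1.0
q = np.linalg.solve(A, b)
print(np.round(q, 2))  # linear ramp from 0 to 1
```

Note that this already requires global knowledge of A and B, which is exactly why the committor serves here only as a benchmark and not as a practical identification method.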
The governing dynamics is given by an overdamped Langevin equation (1), which we solve numerically using the Euler–Maruyama scheme. At inverse temperature , the lag time falls between the slow and fast time scales and is thus chosen as the intermediate lag time. The test points required by Algorithm 3.1 are given by a regular grid discretization of the domain . For the embedding, the Gaussian kernel
(22) 
with bandwidth is used, and for the manifold learning task in Algorithm 3.1 we employ the diffusion maps algorithm with bandwidth parameter . The reaction coordinate for the test points is shown in Figure