Pseudo generators of spatial transfer operators
Abstract
Metastable behavior in dynamical systems may be a significant challenge for a simulation based analysis. In recent years, transfer operator based approaches to problems exhibiting metastability have matured. In order to make these approaches computationally feasible for larger systems, various reduction techniques have been proposed: For example, Schütte introduced a spatial transfer operator which acts on densities on configuration space, while Weber proposed to avoid trajectory simulation (like Froyland et al.) by considering a discrete generator.
In this manuscript, we show that even though the family of spatial transfer operators is not a semigroup, it possesses a well defined generating structure. What is more, the pseudo generators up to order 4 in the Taylor expansion of this family have particularly simple, explicit expressions involving no momentum averaging. This makes collocation methods particularly easy to implement and computationally efficient, which in turn may open the door for further efficiency improvements in, e.g., the computational treatment of conformation dynamics. We experimentally verify the predicted properties of these pseudo generators by means of two academic examples.
1 Introduction
Conformations of molecular systems
The properties of many biomolecular systems such as proteins or enzymes depend heavily on their molecular configuration, i.e. the position of single atoms relative to each other. It is often observed that the system tends to ”cluster” around certain key configurations. Transitions between these socalled conformations can be considered rare events, as the time scale on which they occur and the characteristic dynamic time scale of the atoms in the molecule typically lie orders of magnitude apart. Nevertheless, these transitions play an essential role for the biological function of these molecules [15, 50, 16, 35, 33]. The reliable identification of these conformations and the probabilities (and rates) of transitions between them via direct numerical simulation is computationally very demanding if not infeasible for larger molecules.
Transfer operator based methods
Molecular systems as described above are typically modeled as Hamiltonian systems, possibly including stochastic perturbations. Conformations then are almostinvariant (metastable) subsets of position space, corresponding to local minima of the potential energy surface. The ultimate goal of conformation analysis is to obtain a reduced model of the given system which accurately depicts these sets and the proper statistics of the transitions between them. This field of research, also called Markov state modeling, attracted a lot of interest in the last decade [4, 5, 21, 36, 39, 43, 48].
Pioneering work of Deuflhard, Dellnitz et al [10, 42] exploits that these almost invariant sets can in principle be identified through eigenfunctions of a certain linear operator, the transfer operator, which describes the evolution of distributions under the dynamics.
A direct application of this approach considers the operator acting on densities on the entire state space (i.e. position and momentum), while the conformational changes of interest are only observed in the position coordinate. Moreover, the approach is subject to the curse of dimension for all but the smallest systems, since a discretization of state space has to be constructed.
To remedy this, one might consider the “overdampled” or Smoluchowski dynamics [22], which acts on position space only, but this is a physically acceptable model of molecular motion only in the case that random collisions with the solvent overwhelm the effect of inertia in the molecule of interest. Schütte [41] came up with a physically justifiable solution as he introduced the socalled spatial dynamics, whose metastable sets still bear the interpretation of molecular dynamical conformations. The associated spatial transfer operator acts on densities on position space and can be seen as a momentumaveraged version of the full Hamiltonian or Langevin dynamics.
Commonly, the transfer operator is finitely approximated by a stochastic matrix, whose entries can be interpreted as transition probabilities of single system instances in the canonical ensemble from some subset (in state or position space) to another. These transition probabilities in turn are computed by short time integration of a number of trajectories starting in each subset. Thus, the computation of a few long simulations is replaced by the computation of many short trajectories for an ensemble of adequately distributed initial conditions. Still, the momentum averaging has to be done explicitely by additionally sampling the momentum space for each of these initial conditions.
Over the years, different techniques for a finite approximation of transfer operators have been proposed, we refer to [9, 7] and the references there. More recently, Weber [47] used meshfree approximation techniques and showed that for a given approximation error, the number of basis elements scales with the number of metastable sets, not necessarily with the dimension of the sytem. In [25], an approximation by a sparse Haar space was proposed in order to mollify the curse of dimension, while in [17] a tensorproduct construction was used in combination with a mean field approach. A novel approach to coarsegrain a multiscale system by discretizing its transfer operator without using a full partition of the phase space [43] excels especially in efficiently reproducing the dominant time scales of the original system, but relies heavily on long trajectory simulations.
Simulationfree and generatorbased methods
All these methods rely on the numerical integration of trajectories. Only recently, methods have been proposed that require no time integration, albeit imposing further requirements on the system [27, 48, 18]: Under these conditions and provided that the system’s transfer operator forms a continuous time semigroup, one can exploit that the eigenvalues and functions are the same as those of the semigroup’s infinitesimal generator. Discretizing this generator requires no timeintegration and is thus computationally considerably cheaper than classical methods.
This manuscript
Unfortunately, Schütte’s spatial transfer operator is not a time semigroup. In this contribution, we define suitable pseudo generators of nonsemigroup families of operators which inherit desirable properties of the spatial transfer operator as well as “restored” operators which approximate the spatial operator at least for small times. The appeal of these constructions from a numerical perspective is threefold:

no numerical time integration is needed,

momentum averaging is accomplished analytically, i.e. momentum sampling is completely avoided,

the pseudo generators can be discretized by collocation methods, avoiding costly boundary integrals.
We establish theoretical asymptotic estimates on the error of density propagation, and validate them numerically. The numerical experiments indicate that the information on metastable sets (i.e. conformations) gained from the restored operators remains close to the “original” one gained from the exact spatial transfer operator, even for times beyond those guaranteed by our estimates. A quantitative understanding of this phenomenon (also observed by Schütte [41]) is still lacking; some steps towards a theoretical explanation have been made in [3].
The manuscript is structured as follows. In Section 2, we introduce the basic dynamical models we are working with and describe their action on propagating (probability) densities by transfer operators. This necessitates the discussion of operator semigroups, given at the end of the section. Section 3 is concerned with fluctuations in the spatial distribution of the system governed by its dynamics, leading to the concept of spatial transfer operators as well as metastability in position space. In Section 4, we introduce the concept of pseudo generators, the corresponding restored operators, and give asymptotic error estimates on their approximation quality. Section 5 includes numerical experiments. We conclude our work in Section 6, and discuss future directions to make the method applicable for realistic (bio)molecular systems. Three appendices are given: Appendix A gives a detailed derivation of the pseudo generators up to order 3; Appendix B gives a complete and selfcontained proof of the applicability of Huisinga’s theory [22, 24] on the quantitavive identification of metastable components from spectral analysis of transfer operators for the spatial transfer operator based on Langevin dynamics (especially reversibility and ergodicity of the spatial dynamics, which are probably known or at least anticipated, however we could not find neither a statement, nor even a partial derivation of these properties); in Appendix C, we show that the eigenfunctions of the spatial transfer operator are smooth, i.e. infinitely differentiable, if the potential is a smooth function.
2 Transfer operators and their generators
In this section we introduce the dynamical systems of interest as well as the concept of transfer operators for describing statistical transport under these dynamics.
2.1 Stochastic dynamics
Broadly speaking, we will be studying continuous time stochastic dynamical systems on a phase space . They will be described by valued random variables , , following an Itô diffusion equation, i.e. a stochastic differential equation of the form
(1) 
Here, denotes the differentiation with respect to , and is a valued “white noise” term, see e.g. [34, p. 61]. The functions and are assumed to be globally Lipschitz and growing at most linearly at infinity, such that (1) has unique solutions (cf. [34, Theorem 5.2.1]). Here and in the following boldface lower case letters denote random variables.
Molecular dynamics
Consider a molecular system described by positional degrees of freedom. Typically, these correspond to either internal coordinates or particle positions in , being the number of particles. Let denote the configuration space.
Let a potential , describing the relative energy of a given configuration , be continously differentiable^{1}^{1}1Later on, we will impose stronger assumptions on .. Under the assumption of strict total energy, the movement of the system is then described by classical deterministic Hamiltonian dynamics:
(2)  
The phase space is thus , with the position and the momentum coordinate. For simplicity, we set the mass matrix to be the identity, otherwise the first equation in (2) would be .
Equation (2) models a system “in vacuo”, independent from external influence. Of more physical relevance, however, are systems which are stochastically coupled to their surroundings, physically motivated by the presence of a heat bath or implicit solvent not modeled explicitly. A prominent way of doing this is via a driftdiffusion perturbation of (2), known as the Langevin equations, which can be formally derived using averaging techniques from the MoriZwanzig formalism [51],
(3)  
These can be written in the form of (1) with
The term mimics the drag through the implicitly present solvent, accounts for random collisions with the solvent particles. To balance damping and excitation and to keep the system at a constant average internal energy^{2}^{2}2Actually, is the inverse temperature, , with Boltzmann’s constant and the system temperature. , we set . The choice of is problem dependent, and mimics the viscosity of the aforementioned implicit solvent. For further details on the modeling see [6].
An even further model reduction leads to the socalled Smoluchowski dynamics. In the secondorder form of (3),
(4) 
we consider a highfriction situation . After appropriate rescaling of the time, (in order to be able to observe movement under the now extremely slow dynamics), (4) becomes
In the limit , this yields the Smoluchowski equation
(5) 
This conceptual derivation can be made precise by considering stochastic convergence notions. The interested reader is referred either to [32], where the physical intuition and mathematical rigor are both kept at a high level^{3}^{3}3A reader with a physicists view might be irritated by the fact that the terms with and act as forces (or accelerations) in (3) and as velocities in (5). By eliminating the damping coefficient , the physical dimensions of the terms changed. Since the mathematical statement is not affected by this, we shall not go into details., or to [37], where homogenization techniques for the transfer operators of the underlying equations are exploited. Next, we will consider this latter, operatorbased characterization of stochastic processes.
2.2 Transfer operators
We shall now examine how phase space density functions evolve under the dynamics induced by (1). That is, given a probability density at time , what is the probability to find the system in a certain region at time ? More precisely, given (a random variable distributed according to the density ), find with , , where the evolution of is governed by (1).
To this end, let be a probability space with denoting the Borel algebra, and consider the stochastic transition function ,
and denote by
(6) 
the transition probabilities between and , where indicates that ; i.e. the initial condition is distributed according to . For the long term macroscopic behavior of the system, sets play an important role for which for some physically relevant measure and times .
Now assume that an initial distribution ^{4}^{4}4In the literature, sometimes denotes the “preLebesgue space”, i.e. the Lebesgue space before equivalence class formation, and usually denotes the actual Lebesgue space. Due to clash of notation, however, we call the actual Lebesgue space and use to denote the standard norm. is given. We then have that with
(7) 
Under mild conditions^{5}^{5}5See e.g. [29]., satisfied by the systems considered here, is uniquely defined by (7). This yields the transfer operator with lag time via
where we extend the definition of from densities to arbitrary integrable functions using linearity. In the deterministic case, this is the socalled Perron–Frobenius operator.
For some of the following results, it will be necessary to distinguish between the transfer operators of the general Itô diffusion (1), and the special cases of the Langevin (3) and Smoluchowski dynamics (5). We will then refer to them as , and , respectively. Of course, statements concerning hold for and as well.
Some properties of
can be considered a timeparametrized family of linear operators, which then possesses the Chapman–Kolmogorov (or semigroup) property:

,

for all .
While a stochastic interpretation only makes sense in the preceding setting, the formal extension of to the spaces , , is well defined for proper choices of (see Corollary 2.1). We thus have that and for .
Using the standard scalar product on , for , the transition probabilities can be expressed via the transfer operator:
with being the indicator function.
2.3 Infinitesimal generators
The semigroup property basically means that is “memoryless” (in other terms, (1) generates a Markov process). The identity suggests that all information about the density transport is contained in for arbitrarily small .
This is formalized by looking at an operator given by
(8) 
where is the linear subspace of where the above limit exists. is called the infinitesimal generator of the semigroup , and the field of operator semigroup theory [38] answers the question in which sense is a solution operator to the Cauchy problem . Essentially, the power of the infinitesimal generator lies in the fact that all the relevant information about for all times is already encoded in . We will discuss this below.
Invariant density
Having its interpretation in mind, it is not surprising that the infinitesimal generator is exactly the right hand side of the parabolic partial differential equation describing the flow of sufficiently regular densities, the Kolmogorov forward equation or Fokker–Planck equation, see [26, p. 282]:
(9) 
In the Langevin case, this simplifies to
(10) 
where the dot denotes the Euclidean inner product, and are the gradient and Laplace operators with respect to , respectively. For Smoluchowski, it is
(11) 
Densities which are invariant under the dynamics play a naturally prominent role. Since they are fixed under for any , by (8) they lie in the kernel of ; see also Corollary 2.3 below. For the stochastic processes considered here the invariant density can be shown to be unique (cf. [34]), and—using the term from statistical mechanics—we call it the canonical density, . For the Langevin dynamics
(12) 
where
with and . For the Smoluchowski dynamics, is the canonical density. is a density with respect to the Lebesgue measure , and its existence requires the integrability of ; which is from now on assumed to hold.
To understand their relevance, note that the canonical density not just describes the statistical equilibrium of the system, but the system also tends to this equilibrium as time grows, i.e. according to whatever the system is distributed initially, as in .
Domain and Spectral properties
The results of this paragraph hold true for the transfer operators of both the Langevin and the Smoluchowski dynamics, by taking the corresponding phase space and invariant measure (i.e. the measure having the canonical density as Radon–Nikodým derivative with respect to the Lebesgue measure).
Let be the invariant measure of . Note that for arbitrary , can be defined on due to the following corollary:
Corollary 2.1 ([2, Corollary to Lemma 1]).
Let be a transfer operator associated with a transition function having the invariant measure . Then is a welldefined contraction on for every .
For our purposes the main connection between a semigroup of operators and their generator is given by the following
Theorem 2.2 (Spectral mapping theorem [38]).
Let be a Banach space, , , a semigroup of bounded linear operators (i.e. as for every , and bounded for every ), and let be its infinitesimal generator. Then
with denoting the point spectrum. The corresponding eigenvectors are identical.
We can immediately deduce the following statements.
Corollary 2.3.
A function is an invariant density of for all , if and only if .
Corollary 2.4.
Since is a contraction in , the eigenvalues of lie in the left complex halfplane.
Theorem 2.2 suggests that . This intuition is false in general, as may be unbounded and for any . However, can be approximated by a truncated “Taylor series”, at least pointwise, in the function space
(13) 
We require and to be times differentiable, as this is the highest derivative occuring in , cf. (9).
The following convergence result also holds true if choosing instead of in the definition of , and correspondingly regarding the norm . However, we state it for , as this is the space we are ultimately operating in.
Proposition 2.5.
Let . Then
Proof.
Let . Then, is times differentiable in because exist as per choice of . The Taylor series expansion for Banach space valued linear operators can for example be found in [49, Section 4.5]. Application to yields
We estimate the remainder:
As we are interested in the limit we can assume . In that case, , and therefore  
is the solution of the Fokker–Planck equation (9), and thus and, by extension, . Moreover, due to Pazy [38, Corollary 1.4], the transfer operator and generator commute: . Therefore,  
In the last line, because is a contraction (Corollary 2.1). As by the choice of , it is bounded. This completes the proof. ∎
3 Spatial dynamics and metastability
In order to analyze the behaviour of molecular systems in regard of configurational stability, we have to restrict our view to the dynamics on position space . For this purpose, Schütte in [41] proposed a reduction of the classical Hamiltonian dynamics, called Hamiltonian dynamics with randomized momenta, while Weber [48] proposed the corresponding generalized version for a stochastic evolution. Following Schütte and Weber, we formulate the extension to Langevin dynamics and state the appropriate definition of metastability.
3.1 The spatial transfer operator
Consider an infinitely large number of identical systems of form (3) in thermodynamic equilibrium, i.e. identically and independently distributed according to (called an ensemble in classical statistical mechanics literature). To determine which portion of these systems undergo a certain configurational change, i.e. leave a subset , we have to track the evolution of all these systems starting from . Due to the product structure (12) of , their momenta are still distributed according to and so the whole coordinates are initially distributed according to , with the indicator function of on . This phase space density now evolves under , but as we are only interested in the position portion of the evolving density, we form the marginal distribution with respect to . The resulting spatial transfer operator on with is
(14) 
cf. Corollary 2.1 applied to the operator and the invariant measure .
Intuitively, one can think of with normalized as transporting a positional portion of the canonical density.
3.2 Metastability on position space
Langevin dynamics with randomized momenta
Considering as an operator on and using the standard associated scalar product gives us access to certain transition probabilities on , which fit our intuition of metastability. For we call
the ”slice“ of phase space corresponding to . It represents a subensemble in position space associated to all possible momenta. It is easy to see that the transition probabilities between slices and can be expressed in terms of :
(15) 
where is the stochastic transition function under Langevin dynamics with respect to the Lebesgue measure. We now call a disjoint decomposition of position space metastable if
The meaning of “” will become apparent later.
The connection between eigenvalues close to one of some transfer operator and metastable sets was first observed in [9] and applied in conformation dynamics in [10]. An extension to a broader class of transfer operators (satisfying an assumption related to selfadjointness) was provided by Huisinga and Schmidt [24]. Our falls into that class, which is shown in Appendix B.
Theorem 3.1 (Application of [24, Theorem 2]).
Let with and
be the largest eigenvalues of , with eigenvectors .
Let be a measurable decomposition of and be the orthogonal projection onto , i.e.
The metastability of the decomposition can then be bounded from above by
while it is bounded from below by
where and .
Thus, the lower the projection error of , the better the lower bound matches the upper bound. We choose in accordance to the the sign structure of as a heuristic to the optimal decomposition (i.e. we treat the eigenfunctions as onestep functions of the form ).
However, more sophisticated strategies for extracting metastable sets are available, most notably the linear optimizationbased PCCAalgorithm and its extensions, developed by Deuflhard et. al. ([11], [40]). Let it be noted that it is applicable to all the operators developed herein, as PCCA does not depend on the underlying dynamical model.
Smoluchowski dynamics
As described in section 2.1, another way to restrict the molecular dynamics to position space is via the highfriction limit and the arising transition from Langevin to Smoluchowski dynamics. As this limit may represent a considerable deviation from physical reality, it is initially unclear how metastability in system (5) can be interpreted in the context of the original system. As the transition also involves a rescaling of time, especially the transition probabilities have to be treated with caution.
Nevertheless, metastability under Smoluchowski dynamics can formally be defined as above, and it holds
Here, is the stochastic transition function with respect to Smoluchowski dynamics. Using the same reasoning as in the previous paragraph, we seek eigenpairs of with .
However, in the case of Smoluchowski dynamics, these are somewhat more accessible. Due to the Spectral Mapping Theorem 2.2, eigenvalues of near coincide with eigenvalues of the infinitesimal generator near , and the associated eigenvectors are identical. An efficient method for metastability analysis based on was developed in [18]. There it is shown that for an eigenvalue of , the corresponding eigenvector and the sets holds
Unfortunately, lacks the semigroup property, and so cannot be the solution operator of an autonomous PDE, such as the Fokker–Planck equation. Equivalently, spatial dynamics is not induced by an Itô diffusion process, and thus has no infinitesimal generator in the sense of (11).
4 The generating structure of spatial transfer operators
Formally, the timederivatives of can still be defined, in analogy to (8). We will see in the following how the resulting operators can play the role of the infinitesimal generator in the context of metastability analysis.
4.1 Pseudo generators
We first define these time derivatives for general timeparameterized operators:
Definition 4.1.
Let be a Banach space, be a timeparametrized family of bounded linear operators.
Define the operator by
and call it the timederivative of . here is the subspace of where the above limit exists. Iteratively, we define by the th timederivative on . Finally,
is called the th pseudo generator of .
For , the transfer operator of an Itô process, the pseudo generators are the iterated infinitesimal generators:
Proposition 4.2.
On , the th pseudo generator of takes the form
with the infinitesimal generator of the respective dynamics.
With this, the pseudo generators of the spatial transfer operator can be expressed by the generator of the full Langevin transfer operator:
Lemma 4.3.
On , the th pseudo generator of takes the form
Proof.
The first timederivative of is
Inductively, this gives the th time derivative
As , the th pseudo generator is
∎
From now on, when speaking of pseudo generators, we always mean pseudo generators of . Note that, in general, is not simply a power of , as the integral and the power of do not commute. We thus take a closer look at the first few :
Proposition 4.4.
Let be the spatial transfer operator for the Langevin dynamical process. On their respective domain, its first three pseudo generators take the form

,

. Notably, is independent of .

.
The proof can be found in Appendix A.
Connection to Smoluchowski dynamics
We can draw a perhaps surprising connection between the second pseudo generator and Smoluchowski dynamics. Recall that (11) and thus deals with densities with respect to Lebesgue measure . However, to compare it to , we have to track the transport of densities with respect to . This transport is described by
(16) 
Note, however, that the transition from to is merely a basis transformation: Let, for a brief moment, and be the transfer operators generated by and , respectively. Then,
(17) 
exists on every , (choosing in Corollary 2.1), and thus because of (17), exists on . As we only work on weighted spaces, we drop the second subscript from now on and set
Corollary 4.5.
The pseudo generator (of the spatial transfer operator) is the infinitesimal generator of the Smoluchowski dynamics:
In conclusion, the density transport under spatial dynamics is similar to that under Smoluchowski dynamics, but on different timescales: The Taylor expansion of (in spirit of Proposition 2.5) gives
(18)  
while that of gives  
(19) 
Thus formally rescaling in (18) equals (19) up to second order terms in . We will make this rigorous in the following section.
4.2 Local reconstruction of the spatial transfer operator
In this section we aim to approximate in a way that is suitable for subsequent numerical metastability analysis.
To ensure that the various operators constructed with the pseudo generators are welldefined, we introduce the spaces
(20) 
Note that, despite a similar notation, is not the usual Sobolev space. is the highest derivative appearing in , so we require the corresponding differentiability. The choice of in is motivated by the definition of transition probabilities via the scalar product on , (15). However, all error estimates in this section hold for as well, if the definition of is also changed accordingly. The requirement that is mostly technical in nature. Arbitrary and will only appear in Theorem 4.6 and Lemma 4.10. Later on, only the space will be of interest, due to the simple structure of and . We will see later (Section 5.1) that in fact contains our objects of interest, namely the eigenvectors of our approximations to . Moreover, it is big enough to allow for a sensible discretization basis (Section 5.2).
Taylor reconstruction
Theorem 4.6.
Let . Then,
Proof.
Unfortunately, the for are not readily available. In those pseudo generators, higher derivatives of the potential appear, whose analytic or numerical evaluation can be costly (they are dimensional tensors).
In practice, however, the gradient (the ”force field“) typically is available, as it would be needed for numerical simulation of the system anyway. If we thus truncate the Taylorlike sum from Theorem 4.6 after the third term, higher derivatives of are avoided, as in the computation of and only occurs. We call
(21)  
the 3rd order Taylor approximation^{6}^{6}6As we never work with higher orders, we refer to simply as “the Taylor approximation” from now on. of . This yields the convergence result
Corollary 4.7.
Let . Then
Exponential reconstruction
We expect that approximates well (for ) and can be computed cheaply, provided and are available. However, unlike , is not normpreserving and positive for densities with respect to , i.e. for . Therefore, when transporting , we lose the interpretation of as a physical density.
Moreover, for sufficiently large, is not even a contraction on . With , ,
and so . We will see in the numerical experiments that this quickly (i.e. already for small to moderate ) destroys the interpretation of the eigenvalues of as metastability quantifiers.
Therefore we mainly use an alternative approximation to , called the exponential approximation , which is normpreserving and positive for densities, further contractive on . One has to be careful with notation, however. As is an unbounded operator on , an operator exponential of form cannot be defined by an infinite series. However, considerations of e.g. Pazy [38] allow us to define over a bounded operator approximating , the socalled Yosida approximation:
Lemma 4.8.
is a bounded linear operator on , and
Proof.
With this, we define
which has the desired properties:
Proposition 4.9.
Let . Then and Moreover, is a contraction on .
Proof.
The following statements describe the approximation quality of for small , analogous to the Taylor approximation:
Lemma 4.10.
Let . Then for ,
The proof can be found in Appendix A.
Corollary 4.11.
Let . Then
for .