Pseudo generators of spatial transfer operators

Pseudo generators of spatial transfer operators

Andreas Bittracher Center for Mathematics, Technische Universität München    Péter Koltai Mathematics Institute, Freie Universität Berlin    Oliver Junge11footnotemark: 1

Metastable behavior in dynamical systems may be a significant challenge for a simulation based analysis. In recent years, transfer operator based approaches to problems exhibiting metastability have matured. In order to make these approaches computationally feasible for larger systems, various reduction techniques have been proposed: For example, Schütte introduced a spatial transfer operator which acts on densities on configuration space, while Weber proposed to avoid trajectory simulation (like Froyland et al.) by considering a discrete generator.

In this manuscript, we show that even though the family of spatial transfer operators is not a semigroup, it possesses a well defined generating structure. What is more, the pseudo generators up to order 4 in the Taylor expansion of this family have particularly simple, explicit expressions involving no momentum averaging. This makes collocation methods particularly easy to implement and computationally efficient, which in turn may open the door for further efficiency improvements in, e.g., the computational treatment of conformation dynamics. We experimentally verify the predicted properties of these pseudo generators by means of two academic examples.

1 Introduction

Conformations of molecular systems

The properties of many biomolecular systems such as proteins or enzymes depend heavily on their molecular configuration, i.e. the position of single atoms relative to each other. It is often observed that the system tends to ”cluster” around certain key configurations. Transitions between these so-called conformations can be considered rare events, as the time scale on which they occur and the characteristic dynamic time scale of the atoms in the molecule typically lie orders of magnitude apart. Nevertheless, these transitions play an essential role for the biological function of these molecules [15, 50, 16, 35, 33]. The reliable identification of these conformations and the probabilities (and rates) of transitions between them via direct numerical simulation is computationally very demanding if not infeasible for larger molecules.

Transfer operator based methods

Molecular systems as described above are typically modeled as Hamiltonian systems, possibly including stochastic perturbations. Conformations then are almost-invariant (metastable) subsets of position space, corresponding to local minima of the potential energy surface. The ultimate goal of conformation analysis is to obtain a reduced model of the given system which accurately depicts these sets and the proper statistics of the transitions between them. This field of research, also called Markov state modeling, attracted a lot of interest in the last decade [4, 5, 21, 36, 39, 43, 48].

Pioneering work of Deuflhard, Dellnitz et al [10, 42] exploits that these almost invariant sets can in principle be identified through eigenfunctions of a certain linear operator, the transfer operator, which describes the evolution of distributions under the dynamics.

A direct application of this approach considers the operator acting on densities on the entire state space (i.e. position and momentum), while the conformational changes of interest are only observed in the position coordinate. Moreover, the approach is subject to the curse of dimension for all but the smallest systems, since a discretization of state space has to be constructed.

To remedy this, one might consider the “overdampled” or Smoluchowski dynamics [22], which acts on position space only, but this is a physically acceptable model of molecular motion only in the case that random collisions with the solvent overwhelm the effect of inertia in the molecule of interest. Schütte [41] came up with a physically justifiable solution as he introduced the so-called spatial dynamics, whose metastable sets still bear the interpretation of molecular dynamical conformations. The associated spatial transfer operator acts on densities on position space and can be seen as a momentum-averaged version of the full Hamiltonian or Langevin dynamics.

Commonly, the transfer operator is finitely approximated by a stochastic matrix, whose entries can be interpreted as transition probabilities of single system instances in the canonical ensemble from some subset (in state or position space) to another. These transition probabilities in turn are computed by short time integration of a number of trajectories starting in each subset. Thus, the computation of a few long simulations is replaced by the computation of many short trajectories for an ensemble of adequately distributed initial conditions. Still, the momentum averaging has to be done explicitely by additionally sampling the momentum space for each of these initial conditions.

Over the years, different techniques for a finite approximation of transfer operators have been proposed, we refer to [9, 7] and the references there. More recently, Weber [47] used meshfree approximation techniques and showed that for a given approximation error, the number of basis elements scales with the number of metastable sets, not necessarily with the dimension of the sytem. In [25], an approximation by a sparse Haar space was proposed in order to mollify the curse of dimension, while in [17] a tensor-product construction was used in combination with a mean field- approach. A novel approach to coarse-grain a multi-scale system by discretizing its transfer operator without using a full partition of the phase space [43] excels especially in efficiently reproducing the dominant time scales of the original system, but relies heavily on long trajectory simulations.

Elaborate schemes to extract the metastability information from the eigenvectors of the (approximate) transfer operator were developed in [22, 24, 12, 23].

Simulation-free and generator-based methods

All these methods rely on the numerical integration of trajectories. Only recently, methods have been proposed that require no time integration, albeit imposing further requirements on the system [27, 48, 18]: Under these conditions and provided that the system’s transfer operator forms a continuous time semigroup, one can exploit that the eigenvalues and -functions are the same as those of the semigroup’s infinitesimal generator. Discretizing this generator requires no time-integration and is thus computationally considerably cheaper than classical methods.

This manuscript

Unfortunately, Schütte’s spatial transfer operator is not a time semigroup. In this contribution, we define suitable pseudo generators of non-semigroup families of operators which inherit desirable properties of the spatial transfer operator as well as “restored” operators which approximate the spatial operator at least for small times. The appeal of these constructions from a numerical perspective is threefold:

  1. no numerical time integration is needed,

  2. momentum averaging is accomplished analytically, i.e. momentum sampling is completely avoided,

  3. the pseudo generators can be discretized by collocation methods, avoiding costly boundary integrals.

We establish theoretical asymptotic estimates on the error of density propagation, and validate them numerically. The numerical experiments indicate that the information on metastable sets (i.e. conformations) gained from the restored operators remains close to the “original” one gained from the exact spatial transfer operator, even for times beyond those guaranteed by our estimates. A quantitative understanding of this phenomenon (also observed by Schütte [41]) is still lacking; some steps towards a theoretical explanation have been made in [3].

The manuscript is structured as follows. In Section 2, we introduce the basic dynamical models we are working with and describe their action on propagating (probability) densities by transfer operators. This necessitates the discussion of operator semigroups, given at the end of the section. Section 3 is concerned with fluctuations in the spatial distribution of the system governed by its dynamics, leading to the concept of spatial transfer operators as well as metastability in position space. In Section 4, we introduce the concept of pseudo generators, the corresponding restored operators, and give asymptotic error estimates on their approximation quality. Section 5 includes numerical experiments. We conclude our work in Section 6, and discuss future directions to make the method applicable for realistic (bio-)molecular systems. Three appendices are given: Appendix A gives a detailed derivation of the pseudo generators up to order 3; Appendix B gives a complete and self-contained proof of the applicability of Huisinga’s theory [22, 24] on the quantitavive identification of metastable components from spectral analysis of transfer operators for the spatial transfer operator based on Langevin dynamics (especially reversibility and ergodicity of the spatial dynamics, which are probably known or at least anticipated, however we could not find neither a statement, nor even a partial derivation of these properties); in Appendix C, we show that the eigenfunctions of the spatial transfer operator are smooth, i.e. infinitely differentiable, if the potential is a smooth function.

2 Transfer operators and their generators

In this section we introduce the dynamical systems of interest as well as the concept of transfer operators for describing statistical transport under these dynamics.

2.1 Stochastic dynamics

Broadly speaking, we will be studying continuous time stochastic dynamical systems on a phase space . They will be described by -valued random variables , , following an Itô diffusion equation, i.e. a stochastic differential equation of the form


Here, denotes the differentiation with respect to , and  is a -valued “white noise” term, see e.g. [34, p. 61]. The functions and are assumed to be globally Lipschitz and growing at most linearly at infinity, such that (1) has unique solutions (cf. [34, Theorem 5.2.1]). Here and in the following boldface lower case letters denote random variables.

Molecular dynamics

Consider a molecular system described by positional degrees of freedom. Typically, these correspond to either internal coordinates or particle positions in , being the number of particles. Let denote the configuration space.

Let a potential , describing the relative energy of a given configuration , be continously differentiable111Later on, we will impose stronger assumptions on .. Under the assumption of strict total energy, the movement of the system is then described by classical deterministic Hamiltonian dynamics:


The phase space is thus , with the position and the momentum coordinate. For simplicity, we set the mass matrix  to be the identity, otherwise the first equation in (2) would be .

Equation (2) models a system “in vacuo”, independent from external influence. Of more physical relevance, however, are systems which are stochastically coupled to their surroundings, physically motivated by the presence of a heat bath or implicit solvent not modeled explicitly. A prominent way of doing this is via a drift-diffusion perturbation of (2), known as the Langevin equations, which can be formally derived using averaging techniques from the Mori-Zwanzig formalism [51],


These can be written in the form of (1) with

The term mimics the drag through the implicitly present solvent, accounts for random collisions with the solvent particles. To balance damping and excitation and to keep the system at a constant average internal energy222Actually, is the inverse temperature, , with Boltzmann’s constant and the system temperature. , we set . The choice of  is problem dependent, and mimics the viscosity of the aforementioned implicit solvent. For further details on the modeling see [6].

An even further model reduction leads to the so-called Smoluchowski dynamics. In the second-order form of (3),


we consider a high-friction situation . After appropriate rescaling of the time, (in order to be able to observe movement under the now extremely slow dynamics), (4) becomes

In the limit , this yields the Smoluchowski equation


This conceptual derivation can be made precise by considering stochastic convergence notions. The interested reader is referred either to [32], where the physical intuition and mathematical rigor are both kept at a high level333A reader with a physicists view might be irritated by the fact that the terms with  and  act as forces (or accelerations) in (3) and as velocities in (5). By eliminating the damping coefficient , the physical dimensions of the terms changed. Since the mathematical statement is not affected by this, we shall not go into details., or to [37], where homogenization techniques for the transfer operators of the underlying equations are exploited. Next, we will consider this latter, operator-based characterization of stochastic processes.

2.2 Transfer operators

We shall now examine how phase space density functions evolve under the dynamics induced by (1). That is, given a probability density at time , what is the probability to find the system in a certain region at time ? More precisely, given (a random variable distributed according to the density ), find with , , where the evolution of is governed by (1).

To this end, let be a probability space with denoting the Borel -algebra, and consider the stochastic transition function ,

and denote by


the transition probabilities between and , where indicates that ; i.e. the initial condition is distributed according to . For the long term macroscopic behavior of the system, sets play an important role for which  for some physically relevant measure  and times .

Now assume that an initial distribution 444In the literature, sometimes denotes the “pre-Lebesgue space”, i.e. the Lebesgue space before equivalence class formation, and usually denotes the actual Lebesgue space. Due to clash of notation, however, we call the actual Lebesgue space and use to denote the standard norm. is given. We then have that with


Under mild conditions555See e.g. [29]., satisfied by the systems considered here, is uniquely defined by (7). This yields the transfer operator with lag time  via

where we extend the definition of from densities to arbitrary integrable functions using linearity. In the deterministic case, this is the so-called Perron–Frobenius operator.

For some of the following results, it will be necessary to distinguish between the transfer operators of the general Itô diffusion (1), and the special cases of the Langevin (3) and Smoluchowski dynamics (5). We will then refer to them as , and , respectively. Of course, statements concerning hold for and as well.

Some properties of

can be considered a time-parametrized family of linear operators, which then possesses the Chapman–Kolmogorov (or semigroup) property:

  1. ,

  2. for all .

While a stochastic interpretation only makes sense in the preceding setting, the formal extension of to the spaces , , is well defined for proper choices of  (see Corollary 2.1). We thus have that and for .

Using the standard scalar product on , for , the transition probabilities can be expressed via the transfer operator:

with  being the indicator function.

2.3 Infinitesimal generators

The semigroup property basically means that  is “memoryless” (in other terms, (1) generates a Markov process). The identity suggests that all information about the density transport is contained in for arbitrarily small .

This is formalized by looking at an operator given by


where is the linear subspace of where the above limit exists. is called the infinitesimal generator of the semigroup , and the field of operator semigroup theory [38] answers the question in which sense is a solution operator to the Cauchy problem . Essentially, the power of the infinitesimal generator lies in the fact that all the relevant information about  for all times is already encoded in . We will discuss this below.

Invariant density

Having its interpretation in mind, it is not surprising that the infinitesimal generator is exactly the right hand side of the parabolic partial differential equation describing the flow of sufficiently regular densities, the Kolmogorov forward equation or Fokker–Planck equation, see [26, p. 282]:


In the Langevin case, this simplifies to


where the dot denotes the Euclidean inner product, and are the gradient and Laplace operators with respect to , respectively. For Smoluchowski, it is


Densities which are invariant under the dynamics play a naturally prominent role. Since they are fixed under  for any , by (8) they lie in the kernel of ; see also Corollary 2.3 below. For the stochastic processes considered here the invariant density can be shown to be unique (cf. [34]), and—using the term from statistical mechanics—we call it the canonical density. For the Langevin dynamics



with and . For the Smoluchowski dynamics, is the canonical density. is a density with respect to the Lebesgue measure , and its existence requires the integrability of ; which is from now on assumed to hold.

To understand their relevance, note that the canonical density not just describes the statistical equilibrium of the system, but the system also tends to this equilibrium as time grows, i.e. according to whatever the system is distributed initially, as in .

Domain and Spectral properties

The results of this paragraph hold true for the transfer operators of both the Langevin and the Smoluchowski dynamics, by taking the corresponding phase space and invariant measure (i.e. the measure having the canonical density as Radon–Nikodým derivative with respect to the Lebesgue measure).

Let be the invariant measure of . Note that for arbitrary , can be defined on due to the following corollary:

Corollary 2.1 ([2, Corollary to Lemma 1]).

Let be a transfer operator associated with a transition function having the invariant measure . Then is a well-defined contraction on for every .

For our purposes the main connection between a semigroup of operators and their generator is given by the following

Theorem 2.2 (Spectral mapping theorem [38]).

Let be a Banach space, , , a semigroup of bounded linear operators (i.e.  as for every , and  bounded for every ), and let be its infinitesimal generator. Then

with denoting the point spectrum. The corresponding eigenvectors are identical.

We can immediately deduce the following statements.

Corollary 2.3.

A function is an invariant density of for all , if and only if .

Corollary 2.4.

Since is a contraction in , the eigenvalues of lie in the left complex half-plane.

Theorem 2.2 suggests that . This intuition is false in general, as may be unbounded and for any . However, can be approximated by a truncated “Taylor series”, at least pointwise, in the function space


We require  and  to be -times differentiable, as this is the highest derivative occuring in , cf. (9).

The following convergence result also holds true if choosing instead of in the definition of , and correspondingly regarding the norm . However, we state it for , as this is the space we are ultimately operating in.

Proposition 2.5.

Let . Then


Let . Then, is times differentiable in  because exist as per choice of . The Taylor series expansion for Banach space valued linear operators can for example be found in [49, Section 4.5]. Application to yields

We estimate the remainder:

As we are interested in the limit we can assume . In that case, , and therefore
is the solution of the Fokker–Planck equation (9), and thus and, by extension, . Moreover, due to Pazy [38, Corollary 1.4], the transfer operator and generator commute: . Therefore,

In the last line, because is a contraction (Corollary 2.1). As by the choice of , it is bounded. This completes the proof. ∎

3 Spatial dynamics and metastability

In order to analyze the behaviour of molecular systems in regard of configurational stability, we have to restrict our view to the dynamics on position space . For this purpose, Schütte in [41] proposed a reduction of the classical Hamiltonian dynamics, called Hamiltonian dynamics with randomized momenta, while Weber [48] proposed the corresponding generalized version for a stochastic evolution. Following Schütte and Weber, we formulate the extension to Langevin dynamics and state the appropriate definition of metastability.

3.1 The spatial transfer operator

Consider an infinitely large number of identical systems of form (3) in thermodynamic equilibrium, i.e. identically and independently distributed according to (called an ensemble in classical statistical mechanics literature). To determine which portion of these systems undergo a certain configurational change, i.e. leave a subset , we have to track the evolution of all these systems starting from . Due to the product structure (12) of , their momenta are still distributed according to and so the whole coordinates are initially distributed according to , with the indicator function of on . This phase space density now evolves under , but as we are only interested in the position portion of the evolving density, we form the marginal distribution with respect to . The resulting spatial transfer operator on with is


cf. Corollary 2.1 applied to the operator and the invariant measure .

Intuitively, one can think of with normalized as transporting a positional portion of the canonical density.

3.2 Metastability on position space

Langevin dynamics with randomized momenta

Considering as an operator on and using the standard associated scalar product gives us access to certain transition probabilities on , which fit our intuition of metastability. For we call

the ”slice“ of phase space corresponding to . It represents a sub-ensemble in position space associated to all possible momenta. It is easy to see that the transition probabilities between slices and can be expressed in terms of :


where  is the stochastic transition function under Langevin dynamics with respect to the Lebesgue measure. We now call a disjoint decomposition of position space metastable if

The meaning of “” will become apparent later.

The connection between eigenvalues close to one of some transfer operator and metastable sets was first observed in [9] and applied in conformation dynamics in [10]. An extension to a broader class of transfer operators (satisfying an assumption related to self-adjointness) was provided by Huisinga and Schmidt [24]. Our falls into that class, which is shown in Appendix B.

Theorem 3.1 (Application of [24, Theorem 2]).

Let with and
be the largest eigenvalues of , with eigenvectors . Let be a measurable decomposition of and be the orthogonal projection onto , i.e.

The metastability of the decomposition can then be bounded from above by

while it is bounded from below by

where and .

Thus, the lower the projection error of , the better the lower bound matches the upper bound. We choose in accordance to the the sign structure of as a heuristic to the optimal decomposition (i.e. we treat the eigenfunctions as one-step functions of the form ).

However, more sophisticated strategies for extracting metastable sets are available, most notably the linear optimization-based PCCA-algorithm and its extensions, developed by Deuflhard et. al. ([11], [40]). Let it be noted that it is applicable to all the operators developed herein, as PCCA does not depend on the underlying dynamical model.

Smoluchowski dynamics

As described in section 2.1, another way to restrict the molecular dynamics to position space is via the high-friction limit and the arising transition from Langevin to Smoluchowski dynamics. As this limit may represent a considerable deviation from physical reality, it is initially unclear how metastability in system (5) can be interpreted in the context of the original system. As the transition also involves a rescaling of time, especially the transition probabilities have to be treated with caution.

Nevertheless, metastability under Smoluchowski dynamics can formally be defined as above, and it holds

Here, is the stochastic transition function with respect to Smoluchowski dynamics. Using the same reasoning as in the previous paragraph, we seek eigenpairs  of  with .

However, in the case of Smoluchowski dynamics, these are somewhat more accessible. Due to the Spectral Mapping Theorem 2.2, eigenvalues of near coincide with eigenvalues of the infinitesimal generator near , and the associated eigenvectors are identical. An efficient method for metastability analysis based on was developed in [18]. There it is shown that for an eigenvalue of , the corresponding eigenvector and the sets holds

Unfortunately, lacks the semi-group property, and so cannot be the solution operator of an autonomous PDE, such as the Fokker–Planck equation. Equivalently, spatial dynamics is not induced by an Itô diffusion process, and thus has no infinitesimal generator in the sense of (11).

4 The generating structure of spatial transfer operators

Formally, the time-derivatives of can still be defined, in analogy to (8). We will see in the following how the resulting operators can play the role of the infinitesimal generator in the context of metastability analysis.

4.1 Pseudo generators

We first define these time derivatives for general time-parameterized operators:

Definition 4.1.

Let be a Banach space, be a time-parametrized family of bounded linear operators.

Define the operator by

and call it the time-derivative of . here is the subspace of where the above limit exists. Iteratively, we define by the -th time-derivative on . Finally,

is called the -th pseudo generator of .

For , the transfer operator of an Itô process, the pseudo generators are the iterated infinitesimal generators:

Proposition 4.2.

On , the -th pseudo generator of takes the form

with the infinitesimal generator of the respective dynamics.

With this, the pseudo generators of the spatial transfer operator can be expressed by the generator of the full Langevin transfer operator:

Lemma 4.3.

On , the -th pseudo generator of takes the form


The first time-derivative of is

Inductively, this gives the -th time derivative

As , the -th pseudo generator is

From now on, when speaking of pseudo generators, we always mean pseudo generators of . Note that, in general, is not simply a power of , as the integral and the power of do not commute. We thus take a closer look at the first few :

Proposition 4.4.

Let be the spatial transfer operator for the Langevin dynamical process. On their respective domain, its first three pseudo generators take the form

  1. ,

  2. . Notably, is independent of .

  3. .

The proof can be found in Appendix A.

Connection to Smoluchowski dynamics

We can draw a perhaps surprising connection between the second pseudo generator and Smoluchowski dynamics. Recall that (11) and thus deals with densities with respect to Lebesgue measure . However, to compare it to , we have to track the transport of densities with respect to . This transport is described by


Note, however, that the transition from to is merely a basis transformation: Let, for a brief moment, and be the transfer operators generated by and , respectively. Then,


exists on every (choosing in Corollary 2.1), and thus because of (17), exists on . As we only work on weighted spaces, we drop the second subscript from now on and set

Comparing equation (16) to Proposition 4.4, we immediately see

Corollary 4.5.

The pseudo generator (of the spatial transfer operator) is the infinitesimal generator of the Smoluchowski dynamics:

In conclusion, the density transport under spatial dynamics is similar to that under Smoluchowski dynamics, but on different timescales: The Taylor expansion of (in spirit of Proposition 2.5) gives

while that of gives

Thus formally rescaling in (18) equals (19) up to second order terms in . We will make this rigorous in the following section.

4.2 Local reconstruction of the spatial transfer operator

In this section we aim to approximate in a way that is suitable for subsequent numerical metastability analysis.

To ensure that the various operators constructed with the pseudo generators are well-defined, we introduce the spaces


Note that, despite a similar notation, is not the usual Sobolev space. is the highest derivative appearing in , so we require the corresponding differentiability. The choice of in is motivated by the definition of transition probabilities via the scalar product on , (15). However, all error estimates in this section hold for as well, if the definition of is also changed accordingly. The requirement that is mostly technical in nature. Arbitrary and will only appear in Theorem 4.6 and Lemma 4.10. Later on, only the space will be of interest, due to the simple structure of and . We will see later (Section 5.1) that in fact contains our objects of interest, namely the eigenvectors of our approximations to . Moreover, it is big enough to allow for a sensible discretization basis (Section 5.2).

Taylor reconstruction

Combining Proposition 2.5 and Lemma 4.3 gives the following natural Taylor reconstruction of :

Theorem 4.6.

Let . Then,


By definition of and Lemma 4.3, we can write

However, the integrand is of order by Proposition 2.5. ∎

Unfortunately, the for are not readily available. In those pseudo generators, higher derivatives of the potential appear, whose analytic or numerical evaluation can be costly (they are -dimensional tensors).

In practice, however, the gradient (the ”force field“) typically is available, as it would be needed for numerical simulation of the system anyway. If we thus truncate the Taylor-like sum from Theorem 4.6 after the third term, higher derivatives of are avoided, as in the computation of and only occurs. We call


the 3rd order Taylor approximation666As we never work with higher orders, we refer to simply as “the Taylor approximation” from now on. of . This yields the convergence result

Corollary 4.7.

Let . Then

Exponential reconstruction

We expect that approximates well (for ) and can be computed cheaply, provided and are available. However, unlike , is not norm-preserving and positive for densities with respect to , i.e. for . Therefore, when transporting , we lose the interpretation of as a physical density.

Moreover, for  sufficiently large, is not even a contraction on . With ,  ,

and so . We will see in the numerical experiments that this quickly (i.e. already for small to moderate ) destroys the interpretation of the eigenvalues of as metastability quantifiers.

Therefore we mainly use an alternative approximation to , called the exponential approximation , which is -norm-preserving and positive for densities, further contractive on . One has to be careful with notation, however. As is an unbounded operator on , an operator exponential of form cannot be defined by an infinite series. However, considerations of e.g. Pazy [38] allow us to define over a bounded operator approximating , the so-called Yosida approximation:

Lemma 4.8.

is a bounded linear operator on , and


As the infinitesimal generator of the Smoluchowski dynamics, fullfills the assumptions of [38, Theorem 3.1]. Thus, the statement holds due to [38, Lemma 3.3]. ∎

With this, we define

which has the desired properties:

Proposition 4.9.

Let . Then and Moreover, is a contraction on .


Due to Corollary 4.5 and Lemma 4.8, is simply a time-scaled version of the transfer operator of the Smoluchowski dynamics:

As such, it inherits the desired properties from . ∎

The following statements describe the approximation quality of for small , analogous to the Taylor approximation:

Lemma 4.10.

Let . Then for ,

The proof can be found in Appendix A.

Corollary 4.11.

Let . Then

for .