
# Dimensional Reduction of Markov State Models from Renormalization Group Theory

S. Orioli and P. Faccioli, Dipartimento di Fisica, Università degli Studi di Trento, Via Sommarive 14, Povo (Trento), 38123 Italy; Trento Institute for Fundamental Physics and Applications (INFN-TIFPA), Via Sommarive 14, Povo (Trento), 38100 Italy.
###### Abstract

Renormalization Group (RG) theory provides the theoretical framework to define Effective Theories (ETs), i.e. systematic low-resolution approximations of arbitrary microscopic models. Markov State Models (MSMs) are shown to be rigorous ETs for Molecular Dynamics (MD). Based on this fact, we use Real Space RG to vary the resolution of a MSM and define an algorithm for clustering microstates into macrostates. The result is a lower-dimensional stochastic model which, by construction, provides the optimal coarse-grained Markovian representation of the system's relaxation kinetics. To illustrate and validate our theory, we analyze a number of test systems of increasing complexity, ranging from synthetic toy models to two realistic applications, built from all-atom MD simulations. The computational cost of computing the low-dimensional model remains affordable on a desktop computer even for thousands of microstates.

## I Introduction

MSMs – for recent reviews see Pande0 (); Noe0 (); Prinz0 (); Noe1 (), for a discussion of mathematical aspects see MSMfund1 (); MSMfund2 (); MSMfund3 (); MSMfund4 () – provide a coarse-grained description of the conformational dynamics of macromolecules, in which the probability for the system to be found in a discrete set of metastable states (hereby termed microstates) evolves in time according to a master equation. The microstates and all the parameters of the master equation are obtained by reducing the data generated by MD simulations.

Coarse-graining the dynamics at the MSM level makes it possible to extract the relevant kinetic and thermodynamical information from many short independent MD trajectories. The larger the number of microstates in the model, the shorter the MD trajectories which need to be run in order to specify the master equation. By choosing a sufficiently large number of microstates, the computational effort can be massively distributed and it becomes possible to investigate the dynamics over time intervals inaccessible to straightforward MD simulations. For example, in applications to protein dynamics Pande1 (); Bowman0 (), a very large number of microstates is usually required to achieve an efficient sampling of the configuration space.

Unfortunately, with such a large number of states it is hard to gain insight into the conformational dynamics by directly inspecting the transition pathways in the space of microstates. A way around this problem consists in further coarse-graining the representation of the dynamics, by clustering (lumping) the microstates into supergroups that are usually referred to as macrostates. Physically, macrostates are to be interpreted as larger metastable regions of the configuration space, which comprise many smaller and fast-interconverting microstates.

Several algorithms have been developed to perform the dimensional reduction of MSMs (see e.g. Huang0 (); Schutte0 (); Bowman0 ()), and some of these have been compared and assessed on a few benchmark systems in Ref. Bowman1 (). It was found that the structure and the optimal number of macrostates are quite sensitive to the specific algorithm adopted to lump the microstates. Furthermore, all the considered algorithms were found to fail in reproducing the kinetics of the original MSM, underestimating the relaxation timescales by a sizable factor. Alternative lumping schemes which provide relaxation timescales in much better agreement with the original model have been developed using Hidden Markov Models hidden () or by means of projection techniques pcca (); HummerSzabo (); DeLosRios1 (); DeLosRios2 ().

In this work, we frame the theoretical foundation of MSMs and of their dimensional reduction within the ET formalism. ETs – for an excellent pedagogical introduction see Lepage () – are systematic low-resolution approximations of arbitrary, more microscopic theories. Unlike coarse-graining approaches inspired by heuristic or phenomenological arguments, ETs can be rigorously derived starting from the underlying microscopic theory using RG theory. Consequently, under certain well-defined conditions to be discussed below, ETs are guaranteed to approximate the long-distance (or long-time) dynamics within a degree of accuracy which can be estimated a priori and systematically improved.

The physical picture behind the ET formalism is very familiar: any experimental probe with a given wavelength (or frequency) is insensitive to the details of the physics at much shorter lengthscales (or much faster timescales). As a consequence, as long as one is interested in the infra-red (IR) physics, i.e. in the behaviour of observables at large distances or long times, all the so-called ultra-violet (UV) details of a microscopic theory are irrelevant, and can be accurately mimicked by a set of effective parameters. These parameters have to be computed from the underlying microscopic theory, or extracted from experimental data. Clearly, ETs are only applicable to physical systems which display a gap in their characteristic scales, i.e. for which it is possible to separate the IR from the UV scales.

A familiar example of an ET is the multi-pole expansion of classical electrodynamics: the electric field generated by an arbitrary localized charge distribution of finite size is systematically approximated, at distances much larger than that size, by the series of multi-poles. In the terms of this series, the fine-grained structure of the charge distribution is mimicked by effective constants such as the total charge, the dipole moment, etc. This example illustrates that ETs are usually much simpler than the corresponding underlying microscopic theories. On the other hand, the accuracy of any ET breaks down at scales comparable with the UV cut-off.

The identification of MSMs with ETs opens the way to using RG methods in order to systematically lower their time- and space- resolution. In fact, one of the main results of this work is the derivation of a physically sound and mathematically rigorous dimensional reduction scheme for MSMs based on the so-called Real-Space RG formalism. This approach was inspired by the work of Degenhard and Rodriguez-Laguna RG1 (); RG2 (), who used the RG to lower the computational cost of integrating non-linear partial differential equations.

In the following sections, we shall first review how MSMs emerge as rigorous ETs for MD. Then, in section III we develop our RG scheme for dimensional reduction and describe the practical implementation of the corresponding algorithm. In sections IV and V we present a number of illustrative applications of increasing complexity and assess the accuracy of our scheme. Finally, our main results are summarized in section VI.

## II Theoretical Framework of Markov State Models

MSMs can be defined without explicitly referring to a particular type of MD Sarich (). However, from a mathematical standpoint, the connection between MSMs and rigorous ETs is particularly manifest for systems which obey the over-damped Langevin equation. In turn, in view of Zwanzig-Mori projection formalism Zwanzig0 (); Mori (), such a stochastic differential equation can be regarded as the low-energy approximation of an underlying Hamiltonian dynamics.

Let us therefore consider a system composed of $N_a$ atoms, which obeys the equation

$$\dot{x}_i = -\frac{D}{k_B T}\, \nabla_i U(x) + \eta_i(t), \qquad (1)$$

where $x$ is a point in configuration space, $U(x)$ is the potential energy and $D$ is the diffusion coefficient (assumed to be the same for all atoms, for the sake of notational simplicity). $\eta_i(t)$ is a white Gaussian noise obeying the fluctuation-dissipation relationship,

$$\langle \eta_i(t)\cdot \eta_j(t')\rangle = 6 D\, \delta_{ij}\, \delta(t-t'), \qquad i,j = 1,\dots,N_a. \qquad (2)$$

The probability distribution $P(x,t)$ sampled by the stochastic differential Eq. (1) then satisfies the Fokker-Planck (FP) equation

$$\frac{\partial}{\partial t}\, P(x,t) = -H_{FP}\, P(x,t), \qquad (3)$$

where $H_{FP}$ is the non-hermitian operator

$$H_{FP} = -D \sum_{i=1}^{N_a} \nabla_i \cdot \left(\nabla_i + \beta\, \nabla_i U(x)\right), \qquad \left(\beta \equiv 1/k_B T\right). \qquad (4)$$

From $H_{FP}$ it is possible to define a hermitian operator $H_h$ by performing the following non-unitary transformation:

$$\begin{cases} H_h = e^{\frac{\beta}{2}U(x)}\, H_{FP}\, e^{-\frac{\beta}{2}U(x)} = -D\nabla^2 + \frac{D\beta^2}{4}\left[\left(\nabla U(x)\right)^2 - \frac{2}{\beta}\, \nabla^2 U(x)\right] \\[4pt] \psi(x,t) = e^{\frac{\beta}{2}U(x)}\, P(x,t) \end{cases} \qquad (5)$$

This transformation turns the FP equation into a Schrödinger equation in imaginary time:

$$-\frac{\partial}{\partial t}\, \psi(x,t) = H_h\, \psi(x,t). \qquad (6)$$

In the following, the function $\psi(x,t)$ will be referred to as the hermitian component of the probability density $P(x,t)$.

It is straightforward to show that $H_{FP}$ and $H_h$ have the same spectrum, which is non-negative and contains a null eigenvalue, $\lambda_1 = 0$. The right zero-mode is the Gibbs distribution:

$$H_{FP}\, f^R_1(x) = 0, \qquad f^R_1(x) = \frac{1}{Z}\, e^{-\beta U(x)}, \qquad (7)$$

where $Z$ is the system's canonical partition function. The left- and right- eigenstates of $H_{FP}$ are related to the eigenstates $\phi_i(x)$ of the corresponding hermitian operator $H_h$:

$$f^R_i(x) = e^{-\frac{\beta}{2}U(x)}\, \phi_i(x), \qquad f^L_i(x) = e^{\frac{\beta}{2}U(x)}\, \phi_i(x). \qquad (8)$$

Finally, we note that the probability density entering the FP equation (3) can be expanded in a series of right eigenfunctions of $H_{FP}$ or, equivalently, of eigenfunctions of $H_h$:

$$P(x,t) = \sum_{i=1}^{\infty} c_i\, f^R_i(x)\, e^{-\lambda_i t} = e^{-\frac{\beta}{2}U(x)} \sum_{i=1}^{\infty} c_i\, \phi_i(x)\, e^{-\lambda_i t}, \qquad \text{where} \quad c_i \equiv \int dx\, f^L_i(x)\, P(x,0). \qquad (9)$$

### II.1 Definition of Microstates

Let us now specialize to molecular systems in which the typical relaxation times required to attain local thermal equilibrium within all metastable states are much smaller than the timescales associated to transitions between such metastable states. This condition is realized if the thermal energy is much lower than the energy barriers between local minima, and it implies that the spectrum of the operator $H_h$ is gapped. In the following, we shall always assume such a low-temperature regime. Consequently, a MSM can only describe the dynamics over time intervals much larger than the inverse of the lowest eigenvalue above the gap, $\lambda_{N+1}$. Thus, $\lambda_{N+1}$ represents a typical value for the UV cut-off scale of the ET, i.e. $\Lambda \sim \lambda_{N+1}$.

To explicitly construct such an ET, starting from the microscopic Fokker-Planck dynamics, we observe (see e.g. the discussion in Ref. MSMfund4 ()) that there exist exactly $N$ linear combinations of the right-eigenstates below the gap,

$$p^{(i)}(x) = \sum_{j=1}^{N} C_{ij}\, f^R_j(x), \qquad (10)$$

which simultaneously satisfy the following three properties: (i) non-negativity, $p^{(i)}(x) \ge 0$; (ii) disjointness, $p^{(i)}(x)\, p^{(j)}(x) \simeq 0$ for $i \ne j$; and (iii) local Gibbsianity,

$$p^{(i)}(x) = \frac{1}{z_i}\, e^{-\beta U(x)}\, h^{(i)}(x), \qquad \text{with} \quad z_i = \int dx\, h^{(i)}(x)\, e^{-\beta U(x)}, \qquad (11)$$

where $h^{(i)}(x)$ is equal to 1 in the vicinity of one and only one of the local metastable states and is negligible everywhere else. $p^{(i)}(x)$ is then interpreted as the probability distribution associated to the $i$-th so-called Markov state, and $h^{(i)}(x)$ denotes the corresponding characteristic function (see Fig. 1).

It is also relevant to consider the following linear combinations of the left eigenstates $f^L_j(x)$, with $j = 1,\dots,N$:

$$a^{(i)}(x) = \sum_{j=1}^{N} C_{ij}\, f^L_j(x). \qquad (12)$$

The interest in these linear combinations resides in the fact that the functions $a^{(i)}(x)$ are proportional to the characteristic functions of the microstates: they are approximately constant where $p^{(i)}(x)$ is non-negligible, and negligible elsewhere. To see this, it is sufficient to isolate the hermitian component of the distributions:

$$a^{(i)}(x) = \sum_{j=1}^{N} C_{ij}\, \phi_j(x)\, e^{\frac{\beta}{2}U(x)} \simeq p^{(i)}(x)\, e^{\beta U(x)} \simeq \frac{1}{z_i}\, h^{(i)}(x). \qquad (13)$$

In the hermitian formalism, the left- and right- state distributions are replaced by a single distribution $\pi^{(i)}(x)$:

$$\pi^{(i)}(x) \equiv \frac{h^{(i)}(x)}{\sqrt{z_i}}\, e^{-\frac{\beta}{2}U(x)} = \sqrt{z_i}\, \times \begin{cases} e^{-\frac{\beta}{2}U(x)}\, a^{(i)}(x) \\[2pt] e^{+\frac{\beta}{2}U(x)}\, p^{(i)}(x). \end{cases} \qquad (14)$$

In practical applications, MSMs are never defined by directly diagonalizing $H_{FP}$, as in this purely theoretical discussion. Instead, they are built by analyzing an ensemble of MD trajectories by means of dimensional reduction methods such as Time-lagged Independent Component Analysis (TICA) TICA1 (); TICA2 (); TICA3 (); TICA4 () or Principal Component Analysis (PCA) TICA1 (); PCA (), followed by geometric clustering of the projected configurations. In this case, the cut-off scale is identified with the frequency of the slowest spectral component which is projected out, e.g. the inverse of the TICA timescale associated to the first excluded independent component.

Once the dynamics has been coarse-grained at the level of microstates, the explicit dependence on the $3N_a$-dimensional configuration point $x$ becomes redundant. It is then convenient to introduce a formalism in which such a dependence is removed altogether. Hence, from this point on we shall use the hermitian formulation of the Fokker-Planck dynamics (6) and adopt Dirac's "bra-ket" formalism, in which the normalized microstates are identified with points in a Hilbert space.

To construct the microstates $|i\rangle$, we begin by introducing position eigenstates $|x\rangle$ and the corresponding hermitian position operator $\hat X$:

$$\hat X\, |x\rangle = x\, |x\rangle, \qquad (15)$$

where the eigenvalues of $\hat X$ are points in the $3N_a$-dimensional configuration space. Notice that the position eigenstates obey the normalization condition

$$\langle x | x' \rangle = \delta(x - x'). \qquad (16)$$

The ket- and bra-microstates, $|i\rangle$ and $\langle i|$, are defined from the hermitian components of the state distributions $\pi^{(i)}(x)$:

$$|i\rangle = \int dx\, \pi^{(i)}(x)\, |x\rangle, \qquad (17)$$
$$\langle i| = \int dx\, \pi^{(i)}(x)\, \langle x| \qquad (i = 1,\dots,N). \qquad (18)$$

Note that they approximately form an orthonormal set:

$$\langle i | j \rangle = \int dx\, \pi^{(i)}(x)\, \pi^{(j)}(x) = \delta_{ij}. \qquad (19)$$

At finite temperature, orthogonality is weakly violated by the exponentially small overlaps between the microstates.

The instantaneous configuration of a system is described by a time-dependent state $|\psi(t)\rangle$, defined by

$$|\psi(t)\rangle = \int dx\, \psi(x,t)\, |x\rangle, \qquad (20)$$

where $\psi(x,t)$ is the solution of the imaginary-time Schrödinger equation (6). In appendix A, we show that if the distribution $P(x,t)$ can be expressed through a linear combination of eigenstates below the gap – like in Eq. (9) – then the state $|\psi(t)\rangle$ can be expressed as a time-dependent linear combination of the ket-microstates $|i\rangle$:

$$|\psi(t)\rangle = \sum_i \frac{n_i(t)}{\sqrt{z_i}}\, |i\rangle. \qquad (21)$$

The time-dependent coefficients

$$n_i(t) = \sqrt{z_i}\, \langle i|\psi(t)\rangle = \sqrt{z_i} \int dx\, \pi^{(i)}(x)\, \psi(x,t) = \int dx\, h^{(i)}(x)\, P(x,t) \qquad (22)$$

are interpreted as the probabilities of observing the molecule in the different microstates at time $t$ and will be called the instantaneous microstate populations. In the low-temperature limit, the time spent by the system in crossing high-energy regions is exponentially small compared to the time spent inside the microstates. Consequently, the following sum-rule holds:

$$\sum_{i=1}^{N} \sqrt{z_i}\, \langle i|\psi(t)\rangle = \sum_{i=1}^{N} n_i(t) \simeq 1. \qquad (23)$$

Note that the normalization factor $\sqrt{z_i}$ ensures that the correct equilibrium populations are attained in the long-time limit:

$$\lim_{t\to\infty} n_i(t) = \lim_{t\to\infty} \int dx\, P(x,t)\, h^{(i)}(x) = \frac{1}{Z} \int dx\, e^{-\beta U(x)}\, h^{(i)}(x) = \frac{z_i}{Z}, \qquad (24)$$

where $Z \simeq \sum_i z_i$ is the system partition function in the low-temperature limit.

The action of operators on the arbitrary state $|\psi(t)\rangle$ is defined by their action on the microstates $|i\rangle$, i.e.

$$\hat O\, |\psi(t)\rangle \equiv \sum_{i,j=1}^{N} O_{ij}\, |i\rangle \langle j|\psi(t)\rangle = \sum_{i,j=1}^{N} \frac{n_j(t)}{\sqrt{z_j}}\, O_{ij}\, |i\rangle, \qquad (25)$$

where $O_{ij}$ is an $N \times N$ matrix.

### II.2 Dynamics in the Space of Microstates

To determine the dynamics of the state $|\psi(t)\rangle$ it is convenient to introduce a time-evolution operator,

$$|\psi(t+\tau)\rangle = \hat U_\tau\, |\psi(t)\rangle. \qquad (26)$$

The "wave-function" $\psi(x,t)$ evolves according to

$$\psi(x,t+\tau) = \langle x|\psi(t+\tau)\rangle = \int dx'\, \langle x|\hat U_\tau|x'\rangle\, \langle x'|\psi(t)\rangle = \int dx'\, \langle x|\hat U_\tau|x'\rangle\, \psi(x',t), \qquad (27)$$

where $\langle x|\hat U_\tau|x'\rangle$ is the imaginary-time propagator

$$\langle x|\hat U_\tau|x'\rangle = \langle x|e^{-\tau \hat H}|x'\rangle, \qquad (28)$$

and $\hat H$ will be called the (effective) Hamiltonian operator.

We are now in a condition to derive the time-evolution of the microstate populations $n_i(t)$:

$$n_i(t+\tau) = \sqrt{z_i}\, \langle i|\psi(t+\tau)\rangle = \sum_{j=1}^{N} \sqrt{\frac{z_i}{z_j}}\, \langle i|e^{-\hat H \tau}|j\rangle\, n_j(t) \equiv \sum_{j=1}^{N} T_{ij}(\tau)\, n_j(t). \qquad (29)$$

The so-called transition probability matrix

$$T_{ij}(\tau) = T_{|j\rangle \to |i\rangle}(\tau) \equiv \sqrt{\frac{z_i}{z_j}}\, \langle i|e^{-\hat H \tau}|j\rangle \qquad (30)$$

expresses the probability that a system prepared in the microstate $|j\rangle$ is found in the microstate $|i\rangle$ after a time interval $\tau$. In the MSM literature, the time interval $\tau$ is often referred to as the lag-time. We emphasize that the matrix (30) manifestly satisfies the detailed balance condition, and that $T(\tau)$ is a left stochastic matrix:

$$\sum_{i=1}^{N} T_{ij}(\tau) = 1, \qquad \forall \tau, \quad \forall j = 1,\dots,N. \qquad (31)$$

If the lag-time is finite, Eq. (29) defines a so-called discrete-time master equation. Conversely, if $\tau$ represents an infinitesimal time interval, Eq. (29) defines a so-called continuous-time master equation (see e.g. Sriraman () for related discussions),

$$\dot n_i(t) = \sum_{j=1}^{N} K_{ij}\, n_j(t), \qquad K_{ij} = \begin{cases} k_{ij} \ge 0 & \text{if } i \ne j \\[2pt] -\sum_{k \ne j} k_{kj} & \text{if } i = j \end{cases} \qquad (32)$$

The matrix $K$ is called the transition rate matrix (sometimes also termed kinetic matrix or Markov generator). Its relationship with the transition probability matrix is simply given by

$$K_{ij} = \lim_{\tau \to 0} \frac{T_{ij}(\tau) - \delta_{ij}}{\tau}. \qquad (33)$$

We emphasize again that, in the RG perspective, the symbol $\dot n_i$ in Eq. (32) does not represent a real derivative, but rather a finite increment: the time step cannot be set smaller than the inverse of the UV cut-off frequency. From this discussion it follows that the lag-time $\tau$ of discrete-time MSMs must also obey this bound. Models which violate this inequality cannot be interpreted as rigorous ETs of MD. The hierarchy of frequency scales which must be obeyed for a MSM to represent an ET is schematically illustrated in Fig. 2.
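Relations (31) and (33) are easy to verify numerically. The sketch below is our own illustration, assuming NumPy and SciPy are available: a hypothetical 3-state rate matrix is exponentiated to obtain the discrete-time transition matrix, and the generator is recovered by finite differences:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 3-state rate matrix: off-diagonals >= 0, columns sum to zero
K = np.array([[-1.0,  0.5,  0.0],
              [ 1.0, -1.5,  0.2],
              [ 0.0,  1.0, -0.2]])

tau = 1e-4
T = expm(K * tau)                 # discrete-time transition matrix for lag-time tau
K_est = (T - np.eye(3)) / tau     # finite-difference estimate of K, cf. Eq. (33)

assert np.allclose(T.sum(axis=0), 1.0)      # T is left stochastic, cf. Eq. (31)
assert np.allclose(K_est, K, atol=1e-3)
```

For a lag-time comparable to the relaxation times, the finite-difference estimate degrades, which is precisely why the continuous-time notation must be read as a finite increment.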

Finally, the equation expressing the evolution of the states in the Hilbert space is

$$-\frac{d}{dt}|\psi(t)\rangle = \hat H\, |\psi(t)\rangle \equiv \left[-\sum_{ij} \sqrt{\frac{z_j}{z_i}}\, K_{ij}\, |i\rangle\langle j|\right] |\psi(t)\rangle. \qquad (34)$$

Notice that this equation provides the discrete formulation of the effective Schrödinger equation (SE), and that the symmetry of the following matrices (as introduced in Sriraman ())

$$H_{ij} \equiv \langle i|\hat H|j\rangle = -\sqrt{\frac{z_j}{z_i}}\, K_{ij}, \qquad (35)$$
$$\Xi_{ij}(\tau) \equiv \sqrt{\frac{z_j}{z_i}}\, T_{ij}(\tau) \qquad (36)$$

expresses the detailed balance condition. It is straightforward to show that $H$ and $K$ share the same spectrum, up to an overall sign. Indeed,

$$\sum_j H_{ij}\, v_j = -\sum_j \sqrt{\frac{z_j}{z_i}}\, K_{ij}\, v_j = \lambda\, v_i. \qquad (37)$$

By defining $w_i \equiv \sqrt{z_i}\, v_i$, one has $\sum_j K_{ij}\, w_j = -\lambda\, w_i$. The same argument applies also to the relationship between $\Xi(\tau)$ and $T(\tau)$.
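This equivalence of spectra can be checked directly. In the sketch below (our own example with randomly generated matrices, NumPy assumed), a reversible rate matrix is built from a symmetric coupling matrix $A$ and a population vector $z$, so that the matrix $H$ of Eq. (35) comes out symmetric and shares the spectrum of $-K$:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
z = rng.random(N) + 0.1
z /= z.sum()                                   # equilibrium populations z_i

A = rng.random((N, N))
A = 0.5 * (A + A.T)                            # symmetric couplings
K = A * np.sqrt(np.outer(z, 1.0 / z))          # off-diagonal K_ij = A_ij sqrt(z_i/z_j)
np.fill_diagonal(K, 0.0)
np.fill_diagonal(K, -K.sum(axis=0))            # columns sum to zero

H = -np.sqrt(np.outer(1.0 / z, z)) * K         # Eq. (35): H_ij = -sqrt(z_j/z_i) K_ij
assert np.allclose(H, H.T)                     # detailed balance makes H symmetric
assert np.allclose(np.sort(np.linalg.eigvalsh(H)),
                   np.sort(-np.linalg.eigvals(K).real))
```

The construction of $K$ from a symmetric $A$ is just a convenient way to enforce detailed balance in the toy example; it is not part of the formalism itself.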

## III Even Coarser-Grained ETs: Dimensional Reduction of MSMs

In the previous sections we have seen how it is possible to derive a MSM which provides the rigorous ET for the system's slow relaxation dynamics. Now, we further assume that the spectrum of the associated master equation is also gapped – see Fig. 2b – which enables a separation between its slow and fast relaxation processes. Under such circumstances it is possible to use RG theory to further coarse-grain the dynamics, i.e. to build a lower-dimensional MSM, in which the number $M$ of macrostates is set by the number of eigenvalues below the gap.

To introduce such a dimensional reduction, let us begin by considering the case of a continuous-time MSM. The dynamics of the populations $n_i(t)$ is determined by the master equation (32), which can be re-written in the following form:

$$n_i(t+dt) = \sum_{j=1}^{N} \left(\delta_{ij} + dt\, K_{ij}\right) n_j(t). \qquad (38)$$

We stress once again that $dt$ plays the role of the inverse UV cut-off of this ET; hence it must be chosen longer than the inverse UV cut-off frequency, yet shorter than all relaxation times predicted by the rate matrix.

Geometrically, the set of microstate populations can be identified with the coordinates of a unit-norm vector belonging to the positive quadrant of an $N$-dimensional vector space – see Fig. 3. Let us now introduce the so-called truncation matrix $G$, which maps vectors of this $N$-dimensional space onto some as-yet unspecified $M$-dimensional linear subspace. In particular, the time-dependent population vector is transformed by $G$ as follows:

$$n_i \mapsto n'_I \equiv \sum_{j=1}^{N} G_{Ij}\, n_j, \qquad (I = 1,\dots,M;\; j = 1,\dots,N). \qquad (39)$$

From here on, we shall always denote with uppercase letters the indices ranging from $1$ to $M$, and with lowercase letters those ranging from $1$ to $N$. Since $M < N$, the truncation matrix is obviously not invertible. However, it is possible to define its so-called pseudo-inverse matrix $G^P$, which coincides with $G^{-1}$ for $M = N$ and satisfies the so-called Moore-Penrose relationships (see Appendix B). In practice, the pseudo-inverse of the truncation matrix (which we shall refer to as the embedding matrix) can be explicitly constructed from the singular value decomposition of $G$:

$$G_{Ij} = (U \Sigma V^T)_{Ij} \quad \Rightarrow \quad G^P_{iJ} \equiv (V \Sigma^{-1} U^T)_{iJ}, \qquad (40)$$

where $U$ and $V$ are the orthogonal matrices of the left- and right- singular vectors of $G$, and $\Sigma$ is the rectangular matrix with the singular values on the diagonal.
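The construction of $G^P$ from the SVD can be sketched directly with NumPy; the 2×4 truncation matrix below is a made-up example of ours:

```python
import numpy as np

# Hypothetical truncation matrix G: M = 2 macrostates, N = 4 microstates
G = np.array([[0.7, 0.7, 0.0, 0.0],
              [0.0, 0.0, 0.6, 0.8]])

U, s, Vt = np.linalg.svd(G, full_matrices=False)
GP = Vt.T @ np.diag(1.0 / s) @ U.T      # embedding matrix, cf. Eq. (40)

# GP coincides with the Moore-Penrose pseudo-inverse
assert np.allclose(GP, np.linalg.pinv(G))
assert np.allclose(G @ GP @ G, G)       # Moore-Penrose relation
assert np.allclose(GP @ G @ GP, GP)     # Moore-Penrose relation
```

In practice `np.linalg.pinv` performs the same SVD-based construction internally; the explicit form is shown only to mirror Eq. (40).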

Using the embedding and truncation matrices, we can define a new lower-dimensional master equation:

$$n'_I(t+dt) = \sum_{J=1}^{M} \left(\delta_{IJ} + dt\, K'_{IJ}\right) n'_J(t), \qquad (41)$$

where

$$K'_{IJ} = (G K G^P)_{IJ} \qquad (42)$$

is an effective transition rate matrix.

Our goal is to identify the truncation and embedding matrices $G$ and $G^P$ for which the lower-dimensional master equation (41) provides the best possible Markovian low-dimensional approximation of the dynamics defined by the original $N$-dimensional master equation (38). In this case, the components of the $M$-dimensional vector $n'$ are interpreted as the probabilities of observing the system in each of the different macrostates, and the matrix $K'$ contains the interconversion rates between macrostates.

Our general strategy is schematically represented in Fig. 3: we first introduce a norm which quantifies the difference between the dynamics described by the original master equation (38) and the reduced master equation (41), and thus defines the error introduced by the dimensional reduction. Next, we vary the choice of the subspace until such a difference is reduced to a minimum. The vector space which corresponds to the least difference will be called the relevant subspace.

To implement this scheme in practice, we begin by embedding the effective dynamics given by Eq. (41) in the larger vector space where the original dynamics given by (32) is defined. This can be done by applying the embedding matrix to Eq. (41):

$$n_i(t+dt) = \sum_{j=1}^{N} \left(G^P (\mathbb{1} + dt\, K') G\right)_{ij} n_j(t) = \sum_j R_{ij}\, n_j(t) + dt \sum_{j=1}^{N} (R K R)_{ij}\, n_j(t), \qquad (43)$$

where the projector

$$R_{ij} \equiv (G^P G)_{ij} = \sum_{K=1}^{M} v^{(K)}_i\, v^{(K)}_j \qquad (44)$$

is called the reduction matrix. In this equation, $v^{(K)}_i$ is the $i$-th component of the $K$-th right-singular vector of the truncation matrix $G$.
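Equation (43) is an exact matrix identity for any choice of $G$, since $K' = GKG^P$ and $R = G^P G$. A quick NumPy check with randomly generated matrices (our own example) reads:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, dt = 5, 2, 1e-2
G = rng.random((M, N))                    # arbitrary truncation matrix
GP = np.linalg.pinv(G)                    # embedding matrix
K = rng.random((N, N))                    # arbitrary generator for the identity check
np.fill_diagonal(K, 0.0)
np.fill_diagonal(K, -K.sum(axis=0))       # columns sum to zero

Kp = G @ K @ GP                           # effective rate matrix, Eq. (42)
R = GP @ G                                # reduction matrix, Eq. (44)
lhs = GP @ (np.eye(M) + dt * Kp) @ G      # embedded effective step, Eq. (43)
assert np.allclose(lhs, R + dt * R @ K @ R)
```

The physics enters only in the next step, when the subspace (i.e. the singular vectors building $R$) is chosen so as to minimize the difference between this embedded step and the original one.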

It is now convenient to switch to the quantum-mechanical notation introduced in the previous section. We define the reduction operator $\hat R$ acting on the Hilbert space:

$$\hat R = \sum_{K=1}^{M} |v_K\rangle\langle v_K| = \hat{\mathbb{1}} - \sum_{k=M+1}^{N} |v_k\rangle\langle v_k|, \qquad (45)$$

where the states $|v_k\rangle$, with $k = M+1,\dots,N$, are called the target states and are related to the components of the right singular vectors of the truncation matrix $G$, i.e.

$$v^{(k)}_i \equiv \langle i|v_k\rangle. \qquad (46)$$

Notice that, with this definition, the reduction operator and the reduction matrix are related by $R_{ij} = \langle i|\hat R|j\rangle$. Thus, finding the optimal dimensional reduction is equivalent to identifying the set of target states whose elimination from the Hilbert space minimally affects the time evolution of the probability state $|\psi(t)\rangle$.

It is most convenient to perform this elimination by projecting out one target state at a time. Let us therefore discuss the elimination of the first one, $|v\rangle$:

$$\hat R = \hat{\mathbb{1}} - |v\rangle\langle v| = \hat{\mathbb{1}} - \hat P_{|v\rangle}. \qquad (47)$$

The difference between the evolution of the population vectors in the original and in the effective dynamics is written component-wise as follows

$$n^{\rm original}_i(t) - n^{\rm effective}_i(t) = \sqrt{z_i}\, \langle i|\psi(t+dt)\rangle_{\rm original} - \sqrt{z_i}\, \langle i|\psi(t+dt)\rangle_{\rm effective} = \sqrt{z_i}\, \langle i| \left[\left(\hat{\mathbb{1}} - dt\, \hat H\right) - \left(\hat R - dt\, \hat R \hat H \hat R\right)\right] |\psi(t)\rangle. \qquad (48)$$

Hence, the term in square brackets on the right-hand side expresses the error which is introduced in the dynamics by projecting out the target state $|v\rangle$, and is thus called the error operator. Recalling definition (47), it can be written as follows:

$$\hat \varepsilon = \hat P_{|v\rangle} - dt \left(\left\{\hat H, \hat P_{|v\rangle}\right\} - \langle v|\hat H|v\rangle\, \hat P_{|v\rangle}\right), \qquad (49)$$

where $\{\cdot,\cdot\}$ denotes the anti-commutator. Our goal is then to minimize the Frobenius norm (among the many possible norms one could a priori choose from, we follow the choice of Refs. RG1 (); RG2 ()) of the error operator with respect to the choice of the target state $|v\rangle$:

$$|\hat\varepsilon|^2_F \equiv \sum_{ij} \left|\langle i|\hat\varepsilon|j\rangle\right|^2 = \sum_{ij} \left[ v_i v_j - dt \left( \sum_l H_{il}\, v_l\, v_j + \sum_l v_i\, v_l\, H_{lj} - \langle v|\hat H|v\rangle\, v_i v_j \right) \right]^2. \qquad (50)$$

Expanding the square and retaining only the leading-order terms in $dt$ we find:

$$|\hat\varepsilon|^2_F = 1 - 2\, dt \sum_{ij} v_i\, v_j\, H_{ij} + O(dt^2). \qquad (51)$$

The target vector components are found by extremizing the bilinear form $\sum_{ij} v_i v_j H_{ij}$ under the normalisation constraint $\sum_i v_i^2 = 1$. Introducing a Lagrange multiplier $\lambda$ and imposing stationarity with respect to variations of $v_k$, we obtain:

$$0 = \frac{\partial}{\partial v_k} \left[ \sum_{ij} v_i\, v_j\, H_{ij} + \lambda \left(1 - \sum_i v_i^2\right) \right], \qquad (52)$$

which yields

$$\sum_j H_{ij}\, v_j = \lambda\, v_i = |\lambda|\, v_i. \qquad (53)$$

Thus, we have found that all eigenvectors of the symmetric matrix $H$ locally extremize the target function inside the square brackets of Eq. (52). In particular, if $v^{(k)}_i$ are the components of the $k$-th eigenvector, then $\lambda = \lambda_k$ and the error introduced by projecting out this vector is

$$|\hat\varepsilon|^2_F \simeq 1 - 2\, dt\, |\lambda_k|. \qquad (54)$$

The global minimum of our error estimator is realized by projecting out the eigenvector of $H$ with the largest eigenvalue. This procedure should be repeated to project out all eigenstates with eigenvalues above the gap (i.e. all of the system's eigenmodes with fast relaxation frequencies). Indeed, from Eq. (51) it follows that the elimination of all these target vectors generates errors which are small and comparable. The elimination of all these states results in the lowering of the cut-off: the new UV scale is set by the frequency of the slowest mode which was projected out – see Fig. 2a.
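The ranking implied by Eq. (54) can be verified numerically: for each eigenvector of a symmetric $H$ (randomly generated here, purely for illustration), the Frobenius norm of the error operator (49) decreases as the eigenvalue grows, so the fast modes are the cheapest to eliminate. A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((4, 4))
H = A @ A.T                     # symmetric, positive semi-definite "hamiltonian"
dt = 1e-3
lam, V = np.linalg.eigh(H)      # ascending eigenvalues

def frob_error(v):
    # Error operator of Eq. (49) for the candidate target state v
    P = np.outer(v, v)
    eps = P - dt * (H @ P + P @ H - (v @ H @ v) * P)
    return np.linalg.norm(eps) ** 2

errors = [frob_error(V[:, k]) for k in range(4)]
# Eq. (54): |eps|^2 ~ 1 - 2 dt lambda_k, so the largest eigenvalue
# (the fastest mode) yields the smallest elimination error
assert np.argmin(errors) == np.argmax(lam) == 3
assert np.allclose(errors, 1.0 - 2.0 * dt * lam, atol=1e-3)
```

For an eigenvector the error operator collapses to $(1 - dt\,\lambda_k)\hat P_{|v\rangle}$, which is why the computed norms match the linearized estimate up to $O(dt^2)$ terms.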

The renormalization of a discrete-time MSM is completely analogous and is reported in appendix C. We recall that, in the discrete-time formulation, one needs to specify the lag-time $\tau$. For the dimensionally reduced MSM to represent a rigorous ET, such a lag-time must remain larger than the new UV time cut-off scale. In other words, the RG transformation of a discrete-time MSM is intrinsically consistent only if the lag-time of the original MSM is much longer than the inverse of the smallest relaxation frequency projected out during the dimensional reduction – see Fig. 2a.

### III.1 Identifying the Macrostates and Computing the Reduced Kinetic Matrix

So far we have developed a procedure to identify the subspace of the original vector space where the relevant dynamics takes place. Let us now address the problem of identifying the macrostates. We recall that we are working under the hypothesis that the spectrum of the effective hamiltonian operator $\hat H$ is also gapped, namely that the first $M$ right eigenstates

$$\hat H\, |v_K\rangle = \lambda_K\, |v_K\rangle \qquad (55)$$

have eigenvalues well separated from all other eigenvalues $\lambda_k$, with $k > M$. In the previous section, we discussed how the existence of a gap in the spectrum of the Fokker-Planck operator leads to the definition of the continuous state distributions $\pi^{(i)}(x)$ associated to the system's microstates. The same arguments can be repeated at the discrete level and lead to the definition of macrostates. We expect to find $M$ linear combinations of the lowest $M$ right-eigenvectors of the hermitian Hamiltonian operator,

$$|J\rangle \equiv \sum_{K=1}^{M} T_{JK}\, |v_K\rangle, \qquad J = 1,\dots,M, \qquad (56)$$

which obey the following properties:

1. The coefficients expressing the ket-macrostates as linear combinations of the microstates,

$$|J\rangle = \sum_{i=1}^{N} \Pi^{(J)}_i\, |i\rangle \qquad (57)$$

are non-negative, $\Pi^{(J)}_i \ge 0$. These coefficients are the analog of the characteristic functions entering Eq. (11).

2. The states $|J\rangle$ are disjoint, that is $\Pi^{(I)}_i\, \Pi^{(J)}_i \simeq 0$ for all $I \ne J$. Disjointness implies that each microstate belongs to one and only one of the $|J\rangle$ states.

One can immediately notice that Eq. (56) guarantees that the projector $\hat R$ operates as the identity in the relevant space:

$$\hat R\, |J\rangle = \sum_{H=1}^{M} |v_H\rangle\langle v_H| \sum_{K=1}^{M} T_{JK}\, |v_K\rangle = \sum_{H=1}^{M} T_{JH}\, |v_H\rangle = |J\rangle. \qquad (58)$$

This means that the projector operator can be rewritten as a projector on the macrostates:

$$\hat R = \sum_{J=1}^{M} |J\rangle\langle J|. \qquad (59)$$

Given Eq. (59), let us compute the projector matrix elements in the microstate basis:

$$(G^P G)_{ij} = R_{ij} = \langle i|\hat R|j\rangle = \sum_{J=1}^{M} \langle i|J\rangle\langle J|j\rangle = \sum_{J=1}^{M} \Pi^{(J)}_i\, \langle J|j\rangle. \qquad (60)$$

From this we deduce that

$$G^P_{iJ} = \Pi^{(J)}_i. \qquad (61)$$

Furthermore, to make sure that the relation $(G\, G^P)_{IJ} = \delta_{IJ}$ is satisfied, one necessarily has to define the bra-macrostates as

$$\langle J| = \sum_{i=1}^{N} \left(\Pi^{(J)}_i\right)^P \langle i| = \sum_{i=1}^{N} G_{Ji}\, \langle i|. \qquad (62)$$

Thus, the expansion coefficients of the macrostates in the microstate basis naturally provide the truncation and embedding matrices. We recall that the condition of minimum Frobenius norm of the error operator poses a constraint only on the reduction matrix $R$, thus leaving some arbitrariness in the choice of $G$ and $G^P$. Within the manifold of $G$ and $G^P$ matrices which satisfy $G^P G = R$, only the choice given by (61) and (62) leads to the correct probabilistic interpretation, i.e. ensures that $\Pi^{(J)}_i$ is related to the probability of observing the microstate $|i\rangle$ inside the macrostate $|J\rangle$.

Let us finally tackle the problem of how to construct the macrostates starting from the set of microstates of the system. To this goal, we consider a new set of states $|M_i\rangle$ defined by:

$$|M_i\rangle = \hat R\, |i\rangle, \qquad \langle M_i| = \langle i|\, \hat R. \qquad (63)$$

Then,

$$\langle j|M_i\rangle = \langle j|\hat R|i\rangle = R_{ji}, \qquad \langle M_i|j\rangle = \langle i|\hat R|j\rangle = R_{ij} = R_{ji}, \qquad (64)$$

where the last equality in Eq. (64) follows from the fact that $\hat R$ is a hermitian projector, so $R_{ij} = R_{ji}$. Let us now suppose that some microstate $|i\rangle$ is contained in the macrostate $|I\rangle$ (i.e. $\Pi^{(I)}_i > 0$). Then, it is immediate to prove that the state $|M_i\rangle$ has non-zero overlap with $|I\rangle$:

$$\langle I|M_i\rangle = \langle I|\hat R|i\rangle = \langle I|i\rangle = \Pi^{(I)}_i > 0, \qquad (65)$$

where we used the fact that $|I\rangle$ belongs to the relevant subspace. Conversely, if the microstate $|i\rangle$ is not contained in the macrostate $|I\rangle$, then $|I\rangle$ and $|M_i\rangle$ have no overlap (up to corrections of order $\lambda/\Lambda$):

$$\langle I|M_i\rangle = \langle I|i\rangle \sim O\!\left(\frac{\lambda}{\Lambda}\right). \qquad (66)$$

Hence, if the state $|M_i\rangle$ overlaps with $|I\rangle$, then it has no relevant overlap with any other macrostate. This implies that each of the states $|M_i\rangle$ is either null or parallel to one and only one macrostate.

Given this consideration, we can compute the coefficients of the generic macrostate by imposing the same normalization adopted for the microstates:

$$\langle M_i|M_j\rangle = \langle i|\hat R\, \hat R|j\rangle = \langle i|\hat R|j\rangle = R_{ij}. \qquad (67)$$

This means that the coefficients $\Pi^{(I)}_i$ are simply provided by

$$\Pi^{(I)}_i = G^P_{iI} = \frac{R_{iI}}{\sqrt{R_{II}}}, \qquad (68)$$

where the macrostate index $I$ labels the $M$ linearly independent rows of $R$ corresponding to the arbitrary choice of $M$ linearly independent states $|M_i\rangle$. In Appendix D we show that $G = (G^P)^T$, which implies

$$G_{Ii} = \frac{R^T_{Ii}}{\sqrt{R_{II}}}. \qquad (69)$$
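For an idealized block-structured reduction matrix these relations are easy to verify. The sketch below (a made-up example of ours with 3 microstates and 2 macrostates) builds $G^P$ and $G$ from representative columns of $R$ and checks that they reproduce the projector:

```python
import numpy as np

# Idealized reduction matrix: projector onto two "block" slow modes
v1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)   # macrostate A spans microstates 0, 1
v2 = np.array([0.0, 0.0, 1.0])                  # macrostate B is microstate 2
R = np.outer(v1, v1) + np.outer(v2, v2)

cols = [0, 2]                                    # one representative column per macrostate
GP = R[:, cols] / np.sqrt(np.diag(R)[cols])      # cf. Eq. (68)
G = GP.T                                         # cf. Eq. (69)

assert np.allclose(GP @ G, R)                    # G^P G reproduces the projector R
assert np.allclose(G @ GP, np.eye(2))            # macrostates are orthonormal
assert (GP >= 0).all()                           # probabilistic interpretation holds
```

In a realistic case the two chosen columns must belong to different macrostates; here the block structure makes the choice obvious.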

The last ingredient of our Renormalization Group procedure is the computation of the effective kinetic matrix. We start from the effective Hamiltonian operator represented in the macrostate basis,

$$H'_{IJ} = \langle I|\hat H|J\rangle = -\sqrt{\frac{Z_J}{Z_I}}\, K'_{IJ}, \qquad (70)$$

where $Z_I$ is the partition function of the $I$-th macrostate. On the other hand,

$$H'_{IJ} = \langle I|\hat H|J\rangle = \sum_{i,j=1}^{N} \langle I|i\rangle\langle i|\hat H|j\rangle\langle j|J\rangle = \sum_{i,j=1}^{N} G_{Ii}\, H_{ij}\, G^T_{jJ}. \qquad (71)$$

Equating Eq. (70) with Eq. (71), one finds that the effective kinetic matrix is the solution of the following equation:

$$K'_{IJ} = -\sqrt{\frac{Z_I}{Z_J}} \sum_{i,j=1}^{N} G_{Ii}\, H_{ij}\, G^T_{jJ}. \qquad (72)$$

We emphasize that this is an implicit relationship: the partition functions $Z_I$ on the right-hand side are the components of the lowest right-eigenvector of the effective kinetic matrix $K'$ itself.

All the proofs given in this section also hold for discrete-time MSMs, since $H$ and $\Xi(\tau)$ share the same eigenvectors. Thus, the macrostates of a system described by a transition probability matrix are simply provided by Eqs. (61) and (69), and the corresponding effective transition probability matrix is readily obtained as

$$T'_{IJ}(\tau) = \sqrt{\frac{Z_I}{Z_J}} \sum_{i,j=1}^{N} G_{Ii}\, \Xi_{ij}(\tau)\, G^T_{jJ}, \qquad (73)$$

where $\Xi_{ij}(\tau)$ is defined in Eq. (36).

To summarize, Eqs. (61), (72) and (73) are the most important results of this paper. Indeed, Eq. (61) provides the definition of the macrostates in terms of the microstates, while Eqs. (72) and (73) provide the new continuous- and discrete-time master equations, which approximate the MD in the reduced space of macrostates.

### III.2 The Renormalization Group Clustering Algorithm

We now describe our Renormalization Group Clustering (RGC) algorithm, which implements the RG theory developed so far. Here we illustrate it for a continuous-time MSM. The formulation of the same algorithm for a discrete-time MSM is analogous and is reported in Appendix C.

1. Compute the spectrum of relaxation frequencies from the kinetic matrix $K$. Retain only the $M$ frequencies below the gap, i.e. up to $\lambda_M$, and the corresponding set of right eigenvectors. Let us call $v^{(K)}$ the eigenvector corresponding to $\lambda_K$, and define

$$\Delta = \left|\frac{\lambda_M}{\lambda_{M+1}}\right| \ll 1, \qquad (74)$$

which quantifies the extent of the spectral gap.

2. Compute the hamiltonian matrix $H_{ij}$ defined in Eq. (35);

3. Build the projector $R_{ij} = \sum_{K=1}^{M} v^{(K)}_i v^{(K)}_j$, where $v^{(K)}_i$ is the $i$-th component of the $K$-th lowest eigenstate of the hamiltonian matrix;

4. Extract $M$ linearly independent vectors from $R$ to build the matrix $G^P$, according to Eq. (68);

5. The matrix $G$ may contain entries of order $\Delta$, some of which may be negative. To preserve only the relevant terms and the probabilistic interpretation, filter out all such terms using the following rule: a matrix element $G_{Ij}$ is set to $0$ if there exists an element $G_{Ik}$ such that

$$\left|\frac{G_{Ij}}{G_{Ik}}\right| \le \Delta. \qquad (75)$$

Once the irrelevant terms have been deleted from the matrix, normalize its rows by dividing them by their sums (which is equivalent to normalizing to 1 the sum of the elements in each of the rows of the matrix);

6. Solve the self-consistent equation (72) to obtain the reduced kinetic matrix $K'$. To this goal, one may use the fixed-point algorithm described in Appendix E;

7. Compute the spectrum of $K'$ and isolate the smallest eigenvalue $\lambda'_1$. In general, $\lambda'_1 \neq 0$. To ensure the existence of a stationary state, redefine the spectrum as

\[ \lambda'_K \to \lambda'_K - \lambda'_1 \tag{76} \]

and recompute the effective kinetic matrix as

\[ K' = V\,\mathrm{diag}\!\left[\lambda'_1, \lambda'_2, \dots, \lambda'_M\right] V^{-1} \tag{77} \]

where $V$ is the orthogonal matrix which diagonalizes $K'$.
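The linear-algebra steps of the procedure above can be sketched in a few lines of NumPy. The fragment below covers only the gap ratio of step 1, the row filtering of step 5 and the spectral shift of step 7; the self-consistent step 6 and the construction of the Hamiltonian matrix are omitted, and all function names are our own illustration, not the authors' released code.

```python
import numpy as np

def gap_ratio(K, M):
    # step 1: sort relaxation frequencies by magnitude and evaluate Eq. (74)
    lam = np.sort(np.abs(np.linalg.eigvals(K)))
    return lam[M - 1] / lam[M]

def filter_rows(G, Delta):
    # step 5: an entry is O(Delta) relative to the largest entry of its row
    # iff it satisfies Eq. (75) for some k; zero such entries, then
    # normalize each row so its elements sum to 1
    G = G.copy()
    for I in range(G.shape[0]):
        G[I, np.abs(G[I]) <= Delta * np.max(np.abs(G[I]))] = 0.0
    return G / G.sum(axis=1, keepdims=True)

def shift_spectrum(Kp):
    # step 7, Eqs. (76)-(77): shift all eigenvalues by the smallest one,
    # so the effective kinetic matrix has an exact stationary state
    lam, V = np.linalg.eig(Kp)
    order = np.argsort(np.abs(lam))
    lam, V = lam[order], V[:, order]
    return (V @ np.diag(lam - lam[0]) @ np.linalg.inv(V)).real
```

On a toy two-well rate matrix (fast intra-well rates, one slow inter-well rate) the gap ratio is small and the shifted matrix has an exactly null eigenvalue, as required.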

A Python code implementing this algorithm and the equivalent version for discrete-time MSMs can be made available by the authors upon request.

We emphasize that the effective transition rate matrix $K'$ (or, equivalently, the effective transition probability matrix $T'$) is expected to satisfy the microscopic reversibility condition only up to corrections of order $\Delta$. On the other hand, to solve the self-consistent equation and to determine the equilibrium population it is important to consider effective models which obey such a condition exactly. In the RGC algorithm this is done at step 7, by shifting the spectrum of the effective matrix by $\lambda'_1$ so that the lowest eigenvalue of $K'$ is null. We also stress that the results of the renormalized MSM are only expected to be accurate up to relative corrections which scale according to the gap ratio $\Delta$.

## IV Illustrative Examples

In this section we provide two illustrative applications of the RGC algorithm, based on simple toy models.

### iv.1 MSM with Multiple Gaps in the Spectrum of Relaxation Frequencies

As a first example of application of the RGC algorithm, we consider the MSM represented in Fig. 4a. The arrows represent the connectivity between the different states and the numbers near the arrows express the corresponding transition rates (measured in units of some reference rate). In particular, we notice that the transition rates cluster around three well-separated orders of magnitude: most rates are of the same order, a few are much smaller, and two are smaller still. Physically, this model represents a system in which the relaxation to thermal equilibrium is slowed down by two energy barriers playing the role of kinetic bottlenecks. The $K$ and $T$ matrices are shown in Eqs. (109) and (110) of Appendix F, respectively.

Fig. 4b shows that the spectrum of the $K$ matrix displays two well-separated gaps. The eigenvalues above the first gap, those lying between the two gaps, and the two below the second gap cluster at three different orders of magnitude; the smallest eigenvalue is the null one associated with the stationary state. Consequently, the original continuous-time MSM must be defined using a cut-off smaller than the inverse of the highest frequency shown in Fig. 4b.

In the presence of two gaps it is possible to define two ETs, characterized by different levels of resolution, i.e. different degrees of dimensional reduction. In the finer-grained effective MSM, only the fast relaxation modes above the first gap are projected out. The number of macrostates of this effective model is set by the number of eigenvalues below that gap, and the new cut-off is smaller than the smallest relaxation frequency above it. In the even coarser-grained effective MSM, all the relaxation modes with frequency above the second gap are projected out and a new cut-off is defined correspondingly. In this ET, there are only two macrostates.
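Choosing between the two resolutions amounts to locating the gaps in the spectrum. This can be automated once the relaxation frequencies are available; the helper below, including its threshold, is an illustrative sketch of ours and not part of the paper's procedure.

```python
import numpy as np

def find_spectral_gaps(freqs, threshold=0.1):
    """Return candidate numbers of macrostates, one per spectral gap.

    freqs: relaxation frequencies of the kinetic matrix (any order).
    An index M is reported when |freqs[M-1] / freqs[M]| <= threshold,
    skipping the trivial gap after the null (stationary) eigenvalue.
    """
    f = np.sort(np.abs(np.asarray(freqs, dtype=float)))
    return [M for M in range(2, len(f))
            if f[M] > 0 and f[M - 1] / f[M] <= threshold]
```

For a spectrum with three well-separated clusters of frequencies, the function reports two candidate truncations, matching the two ETs discussed above.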

From the block-diagonal structure of the kinetic matrix in Eq. (109) one can immediately guess the structure of the macrostates in the finer- and coarser-grained effective MSMs, which are graphically highlighted by the ellipses in Fig. 4c and Fig. 4d, respectively. In the following we show how these structures emerge from the RGC analysis.

The first step consists in computing the reduction matrix, i.e. the projector $P$ onto the lowest 6 eigenstates of the $H$ matrix. The result is shown in Eq. (111) of Appendix F. We note that some rows of this matrix differ only by relative corrections of order $\Delta$, which represents the estimate of the intrinsic relative uncertainty of the RGC approach. To this level of accuracy, it is legitimate to set to zero all entries of relative order $\Delta$ and to consider two vectors equal if their relative difference is smaller than $\Delta$. The new matrix is reported in Eq. (112) of Appendix F. It is worth noticing that such a filtering procedure preserves the symmetry of the original projector. Furthermore, in this form it is clear that only 6 rows are linearly independent. The truncation operator $G$ is obtained by normalizing these rows and arranging them as the rows of a matrix, and is provided in Eq. (113) of Appendix F.

Finally, the corresponding effective kinetic matrix $K'$ is calculated by solving the self-consistent equation (72). The result obtained using the fixed-point algorithm described in Appendix E is reported in Eq. (114) of Appendix F. We note that the rows of this matrix do not sum exactly to 0, but to a number of order $\Delta$, smaller than the intrinsic error of the method. Within such an accuracy it is legitimate to shift the diagonal elements by an amount given by the lowest eigenvalue of the matrix, so that the new $K'$ satisfies the detailed balance condition. This spectral shift procedure yields the matrix reported in Eq. (115) of Appendix F.

The same procedure can be repeated to build the coarser-grained effective MSM. In this case, we need to build a new projector matrix which maps onto the space spanned by the 2 lowest eigenvectors of $H$. To build the truncation operator we discard all relative differences which are of order $\Delta$ or lower and obtain the truncation operator shown in Eq. (116) of Appendix F. Through this procedure one finds the two macrostates shown in Fig. 4d. The kinetic matrix obtained after enforcing detailed balance is reported in Eq. (117) of Appendix F.

Let us now assess the accuracy of these two effective descriptions of the kinetics. To this goal, we compare the relaxation rates of the two effective models with the slowest rates calculated from the eigenvalues of the kinetic matrix of the original MSM. We expect the finer-grained MSM to reproduce well the lowest 6 relaxation frequencies of the original model, while the coarser-grained model should reproduce only the thermal relaxation rate (we recall that the first eigenvalue is set to 0 by construction).
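This comparison amounts to diagonalizing both kinetic matrices and matching their slowest relaxation frequencies. A schematic check might look as follows, with small toy matrices standing in for the actual matrices of Appendix F (function names and the error tolerance are ours):

```python
import numpy as np

def slowest_rates(K, M):
    # the M smallest relaxation frequencies (absolute eigenvalues) of K
    return np.sort(np.abs(np.linalg.eigvals(K)))[:M]

def relative_errors(K_full, K_eff):
    # relative error of each nonzero effective rate against the
    # corresponding slow rate of the original model
    M = K_eff.shape[0]
    full, eff = slowest_rates(K_full, M), slowest_rates(K_eff, M)
    return np.abs(eff[1:] - full[1:]) / full[1:]  # skip the null mode
```

For a two-well microscopic model reduced to a two-state effective one, the single nonzero effective rate should match the slowest microscopic relaxation frequency to within a few percent.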