# A Graph-Algorithmic Approach for the Study of Metastability in Markov Chains

Tingyue Gan (tgan@math.umd.edu) and Maria Cameron (cameron@math.umd.edu), Department of Mathematics, University of Maryland, College Park, MD 20742
July 5, 2019
###### Abstract

Large continuous-time Markov chains with exponentially small transition rates arise in modeling complex systems in physics, chemistry and biology. We propose a constructive graph-algorithmic approach to determine the sequence of critical timescales at which the qualitative behavior of a given Markov chain changes, and give an effective description of the dynamics on each of them. This approach is valid for both time-reversible and time-irreversible Markov processes, with or without symmetry. Central to this approach are two graph algorithms, Algorithm 1 and Algorithm 2, for obtaining the sequences of the critical timescales and the hierarchies of Typical Transition Graphs or T-graphs indicating the most likely transitions in the system without and with symmetry respectively. The sequence of critical timescales includes the subsequence of the reciprocals of the real parts of eigenvalues. Under a certain assumption, we prove sharp asymptotic estimates for eigenvalues (including prefactors) and show how one can extract them from the output of Algorithm 1. We discuss the relationship between Algorithms 1 and 2, and explain how one needs to interpret the output of Algorithm 1 if it is applied in the case with symmetry instead of Algorithm 2. Finally, we analyze an example motivated by R. D. Astumian’s model of the dynamics of kinesin, a molecular motor, by means of Algorithm 2.

## 1 Introduction

Phase transitions in non-equilibrium systems, conformational changes in molecules or atomic clusters, financial crises, global climate changes, and genetic mutations exemplify the phenomenon where a seemingly stable behavior of a system at hand, persisting for a long time, undergoes a sudden qualitative change. Such systems are often referred to as metastable. A popular choice of mathematical model for investigating such systems is a Markov jump process with a finite number of states and exponentially distributed holding times. The dynamics of the process is described by the generator matrix $L$. Each off-diagonal entry $L_{ij}$ of $L$ is the transition rate from state $i$ to state $j$. Often, it takes the form [25, 18]

$$L_{ij} = \kappa_{ij}\exp(-U_{ij}/\varepsilon), \tag{1}$$

where $\kappa_{ij}$ is the pre-factor, the number $U_{ij}$ is the exponential factor or order, and $\varepsilon$ is a small parameter. In many cases, the pre-factors are not available, and the rates are determined only up to the exponential order:

$$L_{ij} \asymp \exp(-U_{ij}/\varepsilon), \quad \text{i.e.,} \quad \lim_{\varepsilon\to 0}\varepsilon\log L_{ij}(\varepsilon) = -U_{ij}. \tag{2}$$

The reciprocal $L_{ij}^{-1}$ is the expected waiting time for a jump from state $i$ to state $j$. The diagonal entries are defined so that the row sums of $L$ are zeros, i.e., $L_{ii} = -\sum_{j\neq i} L_{ij}$. Hence, $-L_{ii}$ is the escape rate from state $i$.

A Markov chain with pairwise rates of the form (1) or (2) can be associated with a weighted directed graph $G(S, A, U)$, where the set of vertices $S$ is the set of states, the set of arcs $A$ includes only those arcs $(i\to j)$ such that $L_{ij} > 0$, and $U$ is the set of arc weights. An arc $(i\to j)$ has weight $U_{ij}$ if $L_{ij} > 0$. We set $U_{ij} = \infty$ if $L_{ij} = 0$, i.e., if there is no arc $(i\to j)$ in the graph.

In this work, we will consider continuous-time Markov chains with finite numbers of states and pairwise transition rates of the form (1) or (2). Under this framework, the timescales on which various transition processes take place in the system are well-separated as $\varepsilon$ tends to zero. Our goal is to find a constructive way to calculate the sequence of critical timescales at which the behavior of such a Markov chain undergoes qualitative changes, and give effective descriptions of its behavior on the whole range of timescales from zero to infinity. Imagine that the time evolution of a given Markov chain is observed for some not very large fixed number of time units. We want to be able to predict what transitions the observer will see depending on the initial state and the size of the time unit. Such a prediction is easy if the time unit tends to zero: then it is extremely unlikely to observe any transitions. If the time unit tends to infinity, then an equilibrium probability distribution will be observed. However, even in this case, the determination of the arcs along which the transitions will be most likely observed remains a nontrivial problem. On timescales between these two extremes, the problem of giving an effective description of the observable transitions is difficult. Prior to presenting our solution to it, we give an account of works that provided us with necessary background and/or inspiration.

### 1.1 Background

Long-time behavior of stochastic systems with rare transitions has been attracting the attention of mathematicians for the last fifty years. Freidlin and Wentzell developed the Large Deviation Theory [18] in the 1970s. They showed that the long-time behavior of a system evolving according to the SDE

$$dX_t = b(X_t)\,dt + \sqrt{\varepsilon}\,dW_t$$

can be modeled by means of continuous-time Markov chains. The states of the Markov chain correspond to the attractors of the corresponding ODE $\dot{x} = b(x)$. The pairwise rates are logarithmically equivalent to exponentials of the form (2) with exponents given by the quasi-potential, the quantity characterizing the difficulty of the passage from one attractor to another. Recently, Bouchet and Reygner [5] calculated the pre-factor in the case where a state represents a stable equilibrium point separated from the attractor corresponding to the other state by a Morse index one saddle.

In the early 1970s, Freidlin proposed to describe the long-time behavior of the system via a hierarchy of cycles (we refer to them as Freidlin’s cycles) [15, 16, 18] in the case where each cycle has a unique main state, a unique exit arc, and the exit rates of all cycles are distinct. This hierarchy can be mapped onto a tree. In the time-reversible case, this tree is a complete binary tree [10]. Later, in 2014, Freidlin extended this approach to the case with symmetry [17], replacing the hierarchy of cycles with a hierarchy of Markov chains. Each cycle/Markov chain in Freidlin’s hierarchy is born at a specific critical timescale, which is the reciprocal of its rotation rate. The corresponding hierarchy of timescales has only a partial but not a complete order: cycles/Markov chains of the same order typically have different timescales. The birth timescales of Freidlin’s cycles/Markov chains constitute an important subset of the critical timescales of the system.

The other important subset of critical timescales is given by the reciprocals of the absolute values of the real parts of the nonzero eigenvalues of the generator matrix $L$. Using classic potential theory as a tool, Bovier and collaborators [6, 7, 8, 9] derived sharp estimates for the low-lying spectra of time-reversible Markov chains (all their eigenvalues are real) with pairwise rates not necessarily of the form (1), defined a hierarchy of metastable sets, and identified the link between eigenvalues and expected exit times. A more general study, utilizing almost degenerate perturbation theory, was conducted by Gaveau and Schulman [19], in which a spectral definition of metastability was given for a broad class of Markov chains. The Transition Path Theory, proposed by E and Vanden-Eijnden and extended to Markov chains by Metzner et al. [23], can be viewed as an extension of the potential-theoretic approach exercised by Bovier et al. to the time-irreversible case. It is focused on the study of transition pathways between any particular pair of metastable sets.

Asymptotic estimates of the exponential orders of the real parts of the eigenvalues of the generator matrix of any Markov chain with pairwise rates of the form (2) were developed by Wentzell in the early 1970s [30]. These estimates are given in terms of the optimal W-graphs, i.e., solutions of certain combinatorial optimization problems on so-called W-graphs. Assuming the more concrete form (1) of the pairwise transition rates, we have derived asymptotic estimates for eigenvalues including pre-factors for the case where all optimal W-graphs are unique. Greedy graph algorithms to solve these optimization problems in the case where all optimal W-graphs are unique were introduced in [11] and [12]. The one in [11] assumes time-reversibility and finds the sequence of asymptotic estimates for eigenvalues starting from the smallest ones. The greedy/dynamic-programming “single-sweep algorithm” in [12] does not require time-reversibility and computes the sequence of asymptotic estimates for eigenvalues starting from the largest ones. Sharp estimates for eigenvalues in the time-reversible case with symmetry were obtained by Berglund and Dutercq using tools from group representation theory [4].

### 1.2 Summary of main results

The starting point of this work is the single-sweep algorithm. In [12], we introduced it in a form convenient for programming and used it only for the purpose of finding asymptotic estimates for eigenvalues and eigenvectors of a large time-reversible Markov chain with 169523 states representing the energy landscape of the Lennard-Jones-75 cluster.

In this work, we extend the single-sweep algorithm [12] to find the whole sequence of critical timescales corresponding to both the births of Freidlin’s cycles and the reciprocals of the absolute values of the real parts of eigenvalues. Besides the set of critical timescales, the output includes the hierarchy of Typical Transition Graphs (T-graphs), introduced in this work, which indicate the transitions most likely to be observed up to certain timescales. Wentzell’s optimal W-graphs are readily extracted from the T-graphs. We will refer to this extension of the single-sweep algorithm as Algorithm 1. The presentation of Algorithm 1 is significantly different from the one in [12]. Each step of Algorithm 1 is motivated by the consideration of the dynamics of the system at an appropriate timescale.

Algorithm 1 offers a constructive way to simultaneously solve both Freidlin’s problem of building the hierarchy of cycles and Wentzell’s problem of finding asymptotic estimates for eigenvalues. In contrast to [15, 16, 18], Freidlin’s cycles are found in decreasing order of their rotation rates. Asymptotic estimates for eigenvalues, containing pre-factors (provided the input data contain them), are found during the run of Algorithm 1. We provide a proof of the sharp asymptotic estimates for the eigenvalues of the generator matrix of time-irreversible Markov chains for which all optimal W-graphs are unique.

From the programming point of view, Algorithm 1 and the single-sweep algorithm in [12] differ in their stopping criteria. Algorithm 1 stops once it has computed all Freidlin’s cycles. The costs of the single-sweep algorithm and Algorithm 1, and the difference between them, depend on the structure of the graph. For a graph with $n$ vertices and a given maximal vertex degree, both algorithms share the same worst-case cost bound.

Algorithm 1 is designed for the case with no symmetry, i.e., where all critical timescales are distinct and all Freidlin’s cycles have unique exit arcs. Markov chains arising in modeling complex physical systems might or might not be symmetric. For example, the networks representing energy landscapes of biomolecules and clusters of particles interacting according to a long-range potential [26, 27, 28, 29] are mostly non-symmetric, while the networks representing the dynamics of particles interacting according to a short-range potential [2, 22, 3, 21] are highly symmetric.

To handle the case with symmetry, we have developed a modification of Algorithm 1 and called it Algorithm 2. Algorithm 2 computes the sequence of distinct values of critical timescales and the hierarchy of T-graphs.

The presence of symmetry is not necessarily apparent in a Markov chain. Algorithm 1 can run in the case with symmetry and produce an output that does not reflect its presence. However, the output will be inaccurate in some aspects. The relationship between the outputs of Algorithms 1 and 2 and a recipe for the interpretation of the output of Algorithm 1 used in a symmetric case is summarized in a theorem proven in this work.

Algorithms 1 and 2 constitute a graph-algorithmic approach to the study of metastability in continuous-time Markov chains with exponentially small transition rates.

Algorithms 1 and 2 are illustrated with examples. The case with symmetry is not as graphic as the one without it; however, it is important, as symmetry often occurs in Markov chains modeling natural systems. One such example, motivated by Astumian’s model [1] of the directed motion of the kinesin protein, a molecular motor, is analyzed by means of Algorithm 2. The stochastic switch between two chemical states, which breaks the detailed balance in this system, enables the directed motion (walking). The rate of the chemical switch is treated as a parameter. The most likely walking style is indicated for each rate value.

The rest of the paper is organized as follows. Some necessary background on continuous-time Markov chains and optimal W-graphs is provided in Section 2. The T-graphs are introduced in Section 3. Algorithm 1 is presented and discussed in Section 4. In Section 5, we address the case with symmetry and introduce Algorithm 2. The interpretation of the output of Algorithm 1 applied in the case with symmetry is given in Section 6. The molecular motor example is investigated in Section 7. We summarize our work in Section 8. Appendices A and B contain proofs of theorems.

## 2 Significance and nested property of optimal W-graphs

In this Section, we provide some necessary background and discuss some of our recent results that are essential for the presentation of our new developments.

### 2.1 Continuous-time Markov chains

Let $G(S, A, U)$ be a weighted directed graph associated with a given continuous-time Markov chain with pairwise transition rates of the form (1) or (2). Throughout the rest of the paper, we adopt the following assumptions.

###### Assumption 1.

The set of states $S$ is finite and $|S| = n$.

###### Assumption 2.

The graph $G(S, A, U)$ has a unique closed communicating class.

We recall (see, e.g., [24]) that a subset of vertices $C \subseteq S$ is called a communicating class if there is a directed path leading from any vertex $i \in C$ to any vertex $j \in C$. A communicating class $C$ is called closed if for any vertex $i \in C$ the following condition holds: if there is a directed path from $i$ to some vertex $j$, then $j \in C$. A trivial example of a closed communicating class is an absorbing state, i.e., a state with no outgoing arcs. All states of an irreducible Markov chain belong to the same closed communicating class.

In a weighted directed graph with a single closed communicating class $C$, all states in $C$ are recurrent, while the rest of the states are transient. Assumption 2 guarantees that the invariant distribution $\pi$, satisfying $\pi L = 0$, $\pi_i \ge 0$ for all $i \in S$, and $\sum_{i\in S}\pi_i = 1$, is unique and supported only on the recurrent states. Note that $\pi$ is the left eigenvector of $L$ corresponding to the zero eigenvalue. Due to the zero row sum property of $L$, the corresponding right eigenvector is $(1, 1, \ldots, 1)^\top$.

Due to the weak diagonal dominance of $L$ and the non-positivity of its diagonal entries, all its eigenvalues have non-positive real parts. This can be shown, e.g., by applying Gershgorin’s circle theorem [20]. Let $-\lambda_m + i\mu_m$, $m = 1, \ldots, n-1$, be the nonzero eigenvalues of $L$, ordered so that $\lambda_1 \le \ldots \le \lambda_{n-1}$, and let the column vectors $\phi_m$ and the row vectors $\psi_m$ be the corresponding right and left eigenvectors. If $L$ is diagonalizable, one can write the probability distribution $p(t)$, governed by the Fokker-Planck equation $\dot{p} = pL$, in the following form:

$$p(t) = p(0)e^{tL} = \pi + \sum_{m=1}^{n-1} e^{-\lambda_m t}e^{i\mu_m t}\left(p(0)\phi_m\right)\psi_m. \tag{3}$$

Considering the time $t$ as a function of $\varepsilon$, one can infer from Eq. (3) that the $m$th eigen-component of $p(t)$ is significant only for times $t$ of an exponential order not greater than that of $\lambda_m^{-1}$.
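As a concrete illustration of this setup, the sketch below builds the generator matrix $L$ of a hypothetical 3-state chain with rates of the form (1) (all pre-factors set to one), extracts the invariant distribution $\pi$ as the left eigenvector of the zero eigenvalue, and checks that all other eigenvalues have non-positive real parts. The weights are assumptions made up for the example.

```python
import numpy as np

eps = 0.1
U = np.array([[np.inf, 1.0, 2.0],
              [1.5, np.inf, 1.0],
              [2.5, 0.5, np.inf]])     # arc weights U_ij; inf = no arc
kappa = np.ones((3, 3))                # pre-factors kappa_ij = 1

L = kappa * np.exp(-U / eps)           # L_ij = kappa_ij exp(-U_ij / eps); exp(-inf) = 0
np.fill_diagonal(L, -L.sum(axis=1))    # zero row sums: L_ii = -sum_{j != i} L_ij

w, V = np.linalg.eig(L.T)              # left eigenvectors of L = right eigenvectors of L^T
k = int(np.argmin(np.abs(w)))          # locate the zero eigenvalue
pi = np.real(V[:, k])
pi /= pi.sum()                         # invariant distribution: pi L = 0, sum(pi) = 1

assert np.allclose(pi @ L, 0.0, atol=1e-15)
assert np.all(np.real(w) <= 1e-15)     # all eigenvalues have non-positive real parts
```

For this irreducible chain the invariant distribution is unique, in agreement with Assumption 2.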

### 2.2 Optimal W-graphs

The W-graphs were introduced by Wentzell [30] in order to obtain asymptotic estimates for the exponential factors of the real parts of the eigenvalues of the generator matrices of Markov chains with pairwise rates of the form (2). The W-graphs generalize the graphs introduced by Freidlin [15, 16] for building the hierarchy of cycles describing the long-term behavior of such Markov chains.

###### Definition 2.1.

(W-graph) Let $G(S, A, U)$ be a weighted directed graph satisfying Assumptions 1 and 2. A W-graph with $m$ sinks is a subgraph $g$ of $G$ with the same set of vertices $S$ such that

1. $n - m$ vertices have exactly one outgoing arc in $g$, while the other $m$ vertices, called sinks, have no outgoing arcs in $g$;

2. $g$ contains no cycles.

Assumptions 1 and 2 guarantee the existence of a W-graph with any number $m$ of sinks, $1 \le m \le n$. If $m = n$, there is a unique W-graph, and it has no arcs. A W-graph with $m$ sinks is a forest consisting of $m$ connected components. Each connected component is a directed in-tree, i.e., a rooted tree with arcs pointing towards its root (sink). The collection of all possible W-graphs with $m$ sinks will be denoted by $G(m)$.

Asymptotic estimates for eigenvalues are defined by a collection of W-graphs with the minimal possible sums of the weights of their arcs [30]. We call such graphs optimal W-graphs and denote them by $g^*_m$. Precisely,

###### Definition 2.2.

(Optimal W-graph) $g^*_m$ is an optimal W-graph with $m$ sinks if and only if

$$g^*_m = \arg\min_{g\in G(m)} V(g), \quad \text{where} \quad V(g) := \sum_{(i\to j)\in g} U_{ij}. \tag{4}$$

For brevity, we write $(i\to j)\in g$ meaning $(i\to j)\in A(g)$, where $A(g)$ is the set of arcs of $g$. We emphasize that, in the optimal W-graph, the sum of weights needs to be minimized simultaneously with respect to the choices of sinks and arcs.
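Definitions 2.1 and 2.2 can be made concrete by a brute-force search. The sketch below, for a hypothetical 4-state graph, enumerates all subgraphs in which each non-sink vertex keeps exactly one outgoing arc, discards the cyclic ones, and returns the W-graph with $m$ sinks minimizing $V(g)$; the graph and weights are assumptions made up for the example.

```python
from itertools import product

INF = float("inf")
U = {(0, 1): 1.0, (1, 0): 3.0, (1, 2): 2.0,   # arc weights U_ij of a small graph
     (2, 1): 1.5, (2, 3): 4.0, (3, 2): 0.5}
n = 4

def is_acyclic(arcs):
    """Follow the unique outgoing arc from each vertex; a revisit means a cycle."""
    succ = dict(arcs)
    for start in range(n):
        seen, v = set(), start
        while v in succ:
            if v in seen:
                return False
            seen.add(v)
            v = succ[v]
    return True

def optimal_w_graph(m):
    """Brute force over all W-graphs with m sinks; return (V(g*), arcs of g*)."""
    out = {i: [j for (a, j) in U if a == i] for i in range(n)}
    best_val, best_arcs = INF, None
    # each vertex picks one of its outgoing arcs, or None (making it a sink)
    for choice in product(*[[None] + out[i] for i in range(n)]):
        if sum(c is None for c in choice) != m:
            continue
        arcs = [(i, c) for i, c in enumerate(choice) if c is not None]
        if not is_acyclic(arcs):
            continue
        val = sum(U[arc] for arc in arcs)
        if val < best_val:
            best_val, best_arcs = val, arcs
    return best_val, best_arcs

assert optimal_w_graph(1)[0] == 3.0
assert optimal_w_graph(2)[0] == 1.5
assert optimal_w_graph(3)[0] == 0.5
```

For these weights, $V(g^*_1) = 3.0$, $V(g^*_2) = 1.5$, $V(g^*_3) = 0.5$, so the differences $V(g^*_m) - V(g^*_{m+1})$ are $1.5 \ge 1.0 \ge 0.5 > 0$, consistent with the monotonicity stated in Eq. (6) below.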

### 2.3 Asymptotic estimates for eigenvalues

Wentzell established the following asymptotic estimates for eigenvalues in terms of the optimal W-graphs [30]:

###### Theorem 2.3.

(Wentzell, 1972)¹ (¹No proof of this theorem was provided in [30]; we are not aware of any published proof of this result.) Let $-\lambda_m + i\mu_m$, $m = 1, \ldots, n-1$, be the nonzero eigenvalues of the generator matrix $L$ with off-diagonal entries of the order of $e^{-U_{ij}/\varepsilon}$, ordered so that $\lambda_1 \le \ldots \le \lambda_{n-1}$. Then as $\varepsilon \to 0$,

$$\lambda_m \asymp \exp(-\Delta_m/\varepsilon), \quad \text{where} \quad \Delta_m := V(g^*_m) - V(g^*_{m+1}), \quad m = 1, 2, \ldots, n-1. \tag{5}$$

Theorem 2.3 implies [30] that

$$\Delta_1 \ge \Delta_2 \ge \cdots \ge \Delta_{n-1} > 0. \tag{6}$$

Assumption 3 below allows us to derive a number of important facts.

###### Assumption 3.

For all $m = 1, \ldots, n-1$, the optimal W-graphs $g^*_m$ are unique.

It is, in essence, a genericity assumption, because if the weights are random real numbers in an interval, then it holds with probability one.

Under Assumption 3, the inequalities in Eq. (6) are strict. Hence, if $\varepsilon$ is sufficiently small, all eigenvalues are real and distinct [30]. In this case, for a generator matrix with off-diagonal entries of the form (1), the estimates given by Theorem 2.3 can be made more precise.

###### Theorem 2.4.

Suppose a Markov chain with pairwise rates of the form (1) satisfies Assumptions 1, 2, and 3. Let $-\lambda_m$, $m = 1, \ldots, n-1$, be the nonzero eigenvalues of the corresponding generator matrix $L$. Then as $\varepsilon \to 0$, for $m = 1, \ldots, n-1$ we have

$$\lambda_m = \alpha_m\exp(-\Delta_m/\varepsilon)\,(1+o(1)), \quad \text{where} \tag{7}$$
$$\Delta_m := V(g^*_m) - V(g^*_{m+1}), \qquad \alpha_m := \frac{\prod_{(i\to j)\in g^*_m}\kappa_{ij}}{\prod_{(i\to j)\in g^*_{m+1}}\kappa_{ij}}.$$

Originally, we formulated Theorem 2.4 in [12] but did not provide a detailed proof in order to stay focused on practical matters. Here, we provide a proof of Theorem 2.4 in Appendix A, as promised in [12]. The key point of this proof is the relationship between the coefficients of the characteristic polynomial of $L$ and the W-graphs with the corresponding numbers of sinks.
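Theorem 2.4 can be sanity-checked numerically. The sketch below uses a hypothetical 4-state birth-death chain with $\kappa_{ij} = 1$, so that $\alpha_m = 1$ and $\lambda_m \approx \exp(-\Delta_m/\varepsilon)$; the values $\Delta_m = 1.5, 1.0, 0.5$ were obtained for these weights by brute-force minimization over W-graphs and are an assumption of the example, not data from the paper.

```python
import numpy as np

eps = 0.05
U = {(0, 1): 1.0, (1, 0): 3.0, (1, 2): 2.0,   # hypothetical arc weights
     (2, 1): 1.5, (2, 3): 4.0, (3, 2): 0.5}
n = 4

L = np.zeros((n, n))
for (i, j), u in U.items():
    L[i, j] = np.exp(-u / eps)                 # kappa_ij = 1
np.fill_diagonal(L, -L.sum(axis=1))            # zero row sums

# For these weights V(g*_1)=3.0, V(g*_2)=1.5, V(g*_3)=0.5, V(g*_4)=0,
# hence Delta = (1.5, 1.0, 0.5) and the predicted nonzero eigenvalues are:
predicted = np.exp(-np.array([1.5, 1.0, 0.5]) / eps)

lam = np.sort(-np.real(np.linalg.eigvals(L)))  # 0 = lam_0 <= lam_1 <= ...
assert abs(lam[0]) < 1e-16                     # the zero eigenvalue
assert np.allclose(lam[1:], predicted, rtol=0.05)
```

At $\varepsilon = 0.05$ the relative corrections $o(1)$ are already far below the 5% tolerance used above.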

### 2.4 Weak nested property of optimal W-graphs

In [12], under Assumption 3, we proved a weak nested property of optimal W-graphs, a pivotal property that serves as a stepping stone in the design of the single-sweep algorithm [12] and Algorithm 1, and facilitates a deeper understanding of the metastable behavior of continuous-time Markov chains. The weak nested property can be compared to a stronger nested property of optimal W-graphs taking place if the underlying Markov chain is time-reversible [11].

A connected component of a W-graph will be identified with its set of vertices. Given a W-graph $g$ and a set of vertices $C$, we will denote by $A(g, C)$ the set of arcs of $g$ with tails in $C$.

###### Theorem 2.5.

(The weak nested property of optimal W-graphs [12]) Suppose that Assumptions 1, 2, and 3 hold. Then the following statements are true.

1. There exists a unique connected component $C$ of $g^*_{m+1}$ whose sink is not a sink of $g^*_m$.

2. $A(g^*_m, S\setminus C) = A(g^*_{m+1}, S\setminus C)$, i.e., the arcs in $g^*_m$ with tails not in $C$ coincide with those in $g^*_{m+1}$. However, $A(g^*_m, C)$ and $A(g^*_{m+1}, C)$ do not necessarily coincide.

3. There exists a single arc of $g^*_m$ with tail in $C$ and head in another connected component $C'$ of $g^*_{m+1}$.

Theorem 2.5 is illustrated in Fig. 1. The optimal W-graph $g^*_m$ is obtained from $g^*_{m+1}$ by adding an arc with tail and head lying in different connected components of $g^*_{m+1}$, denoted by $C$ and $C'$ respectively, and possibly rearranging some arcs with tails and heads in $C$. We say that $C$ is absorbed by $C'$ to form $g^*_m$. All arcs of $g^*_{m+1}$ with tails not in $C$ are inherited by $g^*_m$.

## 3 Freidlin’s hierarchies, the critical timescales, and the T-graphs

In this Section, we introduce the Typical Transition Graphs or T-graphs for continuous-time Markov chains with pairwise rates of the form of Eq. (1) or (2). Originally, the T-graphs arose as a byproduct of the single-sweep algorithm [12]. In this work, we connect the T-graphs to Freidlin’s construction of the hierarchy of cycles/Markov chains [15, 16, 18, 17].

A timescale is a function of $\varepsilon$ defined up to the exponential order: we say that $t(\varepsilon)$ is logarithmically equivalent to $e^{\theta/\varepsilon}$ and write

$$t(\varepsilon) \asymp e^{\theta/\varepsilon} \quad \text{if} \quad \lim_{\varepsilon\to 0}\varepsilon\log t(\varepsilon) = \theta.$$

For brevity, we write

$$t > e^{\theta_1/\varepsilon} \quad \text{implying that} \quad \lim_{\varepsilon\to 0}\varepsilon\log t(\varepsilon) > \theta_1,$$

and adopt analogous meanings for all other inequality signs for comparing timescales.

For every vertex $i \in S$, let $U_{\min}(i) := \min_{j\neq i} U_{ij}$. The outgoing arcs from $i$ of the minimal weight will be called min-arcs from $i$, and the set of all min-arcs from $i$ will be denoted by $A_{\min}(i)$. If a Markov jump process starts at state $i$, then the probability that the first jump occurs along an arc $(i\to j)$ is given by

$$P_i(i\to j) = \frac{L_{ij}}{\sum_{l\neq i} L_{il}} = \frac{\kappa_{ij}e^{-U_{ij}/\varepsilon}}{\sum_{l\neq i}\kappa_{il}e^{-U_{il}/\varepsilon}} = \left[\frac{\kappa_{ij}}{\sum_{(i\to l)\in A_{\min}(i)}\kappa_{il}}\right]e^{-(U_{ij}-U_{\min}(i))/\varepsilon}\,(1+o(1)). \tag{8}$$

Eq. (8) shows that $P_i(i\to j)$ remains positive as $\varepsilon \to 0$ if and only if the arc $(i\to j)$ is a min-arc from $i$. The expected exit time from $i$ is logarithmically equivalent to $e^{U_{\min}(i)/\varepsilon}$.
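The first-jump asymptotics of Eq. (8) are easy to verify directly. In the sketch below (rates hypothetical), vertex "a" has two min-arcs of weight 1.0 with pre-factors 2 and 1; as $\varepsilon \to 0$ the jump probabilities concentrate on the min-arcs in proportion to their pre-factors.

```python
import math

kappa = {("a", "b"): 2.0, ("a", "c"): 1.0, ("a", "d"): 3.0}  # pre-factors
U = {("a", "b"): 1.0, ("a", "c"): 1.0, ("a", "d"): 2.5}      # arc weights from "a"

def jump_probs(i, eps):
    """P_i(i->j) = L_ij / sum_l L_il with L_ij = kappa_ij exp(-U_ij/eps)."""
    rates = {arc: kappa[arc] * math.exp(-U[arc] / eps)
             for arc in U if arc[0] == i}
    total = sum(rates.values())
    return {arc: r / total for arc, r in rates.items()}

def min_arcs(i):
    """A_min(i): outgoing arcs from i of minimal weight U_min(i)."""
    u_min = min(U[arc] for arc in U if arc[0] == i)
    return {arc for arc in U if arc[0] == i and U[arc] == u_min}

# As eps -> 0, only the min-arcs keep positive probability, split in
# proportion to their pre-factors kappa: here 2/3 and 1/3.
p = jump_probs("a", eps=0.01)
assert min_arcs("a") == {("a", "b"), ("a", "c")}
assert abs(p[("a", "b")] - 2 / 3) < 1e-6
assert abs(p[("a", "c")] - 1 / 3) < 1e-6
```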

For simplicity of the presentation below, we adopt

###### Assumption 4.

The Markov chain associated with the graph $G(S, A, U)$ is irreducible.

Freidlin’s construction of the hierarchy of cycles/Markov chains [15, 16, 17] can be outlined as follows (with minor terminological modifications). We begin with the graph $G_0$ with the set of states $S$ and no arcs. All vertices are called the zeroth order Markov chains. Then all min-arcs from all vertices are added to $G_0$, resulting in the graph

$$G_1(S, A_1, U_1), \quad \text{where} \quad A_1 := \bigcup_{i\in S} A_{\min}(i), \quad U_1 = \{U_{ij}\;|\;(i\to j)\in A_1\}.$$

All nontrivial closed communicating classes in $G_1$ (i.e., those consisting of more than one state) are called the first order cycles/Markov chains. The birth timescale of each first order cycle/Markov chain (the reciprocal of its rotation rate) is the maximal expected holding time among its vertices. If there is only one first order cycle/Markov chain, and it contains the whole set of states $S$, the construction is complete. Otherwise, each first order cycle/Markov chain is treated as a macro-state. For each macro-state, one finds the exit rate and the set of exit arcs along which the process escapes from it with probability that remains positive as $\varepsilon \to 0$. These exit arcs are the min-arcs from the macro-states. Their weights are modified to make them equal to the exponential factors of their exit rates. Contracting the macro-states into single super-vertices, one obtains a graph which is a directed forest. The graph $G_2$ is obtained from it by adding all min-arcs from the macro-states with their modified weights. Then one checks for nontrivial closed communicating classes in $G_2$. If there is a unique closed communicating class comprising the whole set of states, the process terminates. Otherwise, it is continued in the same manner as it was done for the graph $G_1$. This recursive procedure terminates in a finite number of steps with a graph $G_Q$. It produces the hierarchy of Freidlin’s cycles/Markov chains of orders $1, \ldots, Q$. The one of order $Q$ comprises the whole set of states $S$. Recursively restoring all contracted closed communicating classes while keeping the modified arc weights, one obtains the graph $G^*(S, A^*, U^*)$. We will call the transitions corresponding to arcs of the graph $G^*$ typical transitions and associate them with the corresponding timescales.
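Each level of this construction requires finding the nontrivial closed communicating classes of the current graph of min-arcs. A minimal sketch (graph hypothetical): a vertex belongs to a closed communicating class exactly when every vertex reachable from it can reach it back, in which case the class is its reachable set.

```python
def reachable(adj, i):
    """DFS: the set of vertices reachable from i (including i itself)."""
    seen, stack = {i}, [i]
    while stack:
        v = stack.pop()
        for w in adj.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def closed_classes(adj, vertices):
    """Closed communicating classes: i is in one iff every vertex reachable
    from i can reach i back; the class is then the reachable set of i."""
    classes = []
    for i in vertices:
        R = reachable(adj, i)
        if all(i in reachable(adj, j) for j in R) and R not in classes:
            classes.append(R)
    return classes

# Hypothetical graph of min-arcs: {0, 1} form a 2-cycle (a first order cycle);
# vertices 2 and 3 drain into it and are transient.
adj = {0: [1], 1: [0], 2: [0], 3: [2]}
assert closed_classes(adj, range(4)) == [{0, 1}]
```

This quadratic-time check is only for illustration; in practice one would use a linear-time strongly-connected-components algorithm.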

We remark that the birth timescales and exit timescales of the cycles/Markov chains are only partially ordered, in the following sense. If a closed communicating class $C_1$ in some $G_q$ contains a super-vertex comprising a closed communicating class $C_2$, then the birth timescale of $C_1$ is greater than that of $C_2$, and the holding time in $C_1$ is greater than that in $C_2$. However, if $C_1$ and $C_2$ are closed communicating classes in $G_{q_1}$ and $G_{q_2}$ respectively, where $q_1 > q_2$ and $C_1$ does not contain $C_2$, it is possible that the birth and exit timescales of $C_2$ are greater than those of $C_1$. Furthermore, the order in which arcs are added to $G_0$ to form $G^*$ eventually is, in general, neither the increasing nor the decreasing order of their modified or original weights.

In this work, we propose an alternative construction that builds the graph $G^*$ from $G_0$ by adding arcs to it in the increasing order of their modified weights. In our construction, the procedure of finding the exit rates from the closed communicating classes is simpler than that in Freidlin’s construction due to the order in which the arcs are added.

###### Definition 3.1.

(Critical exponents and T-graphs) Let $G^*(S, A^*, U^*)$ be the graph obtained as a result of the construction of the hierarchy of Freidlin’s cycles/Markov chains as described above. The ordered subset of $U^*$ defined by

$$\{\gamma_1 \le \ldots \le \gamma_K\} := \bigcup_{q=0}^{Q-1}\left\{U^{(q)}_{\min}(i)\;\middle|\;i\in S^{(q)}\right\} \tag{9}$$

is the set of critical exponents. The corresponding timescales $e^{\gamma_1/\varepsilon}, \ldots, e^{\gamma_K/\varepsilon}$ are the critical timescales.

Let $\theta_1 < \ldots < \theta_P$ be the ordered set of distinct numbers in $\{\gamma_1, \ldots, \gamma_K\}$. The Typical Transition Graph or T-graph $T_p$ is the subgraph of $G^*$ containing all arcs of weights up to $\theta_p$, i.e., $T_p(S, A(T_p), U(T_p))$, where

$$A(T_p) = \{(i\to j)\in A^*\;|\;U^*_{ij}\le\theta_p\}, \quad U(T_p) = \{U^*_{ij}\in U^*\;|\;U^*_{ij}\le\theta_p\}, \quad p = 0, 1, \ldots, P. \tag{10}$$

The T-graph $T_p$ is associated with the range of timescales $e^{\theta_p/\varepsilon} \le t < e^{\theta_{p+1}/\varepsilon}$ for $1 \le p \le P-1$. The T-graphs $T_0$ and $T_P$ are associated with the ranges of timescales $t < e^{\theta_1/\varepsilon}$ and $t \ge e^{\theta_P/\varepsilon}$ respectively.
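Building the hierarchy of T-graphs from $G^*$ amounts to thresholding its arc weights at each distinct critical exponent. A minimal sketch with hypothetical weights (note that in general the exponents $\theta_p$ come from the construction above; for this toy example they coincide with the distinct arc weights of $G^*$):

```python
# hypothetical modified arc weights U* of the graph G*
U_star = {("a", "b"): 0.5, ("b", "a"): 1.0, ("b", "c"): 2.0, ("c", "b"): 0.5}

thetas = sorted(set(U_star.values()))   # distinct critical exponents theta_p
T = [{arc for arc, u in U_star.items() if u <= th} for th in thetas]

# Arcs enter in increasing order of weight, so the T-graphs are nested:
assert thetas == [0.5, 1.0, 2.0]
assert T[0] == {("a", "b"), ("c", "b")}
assert all(T[p - 1] <= T[p] for p in range(1, len(T)))
```

The nesting $T_{p-1} \subseteq T_p$ reflects the fact that transitions typical on a shorter timescale remain typical on longer ones.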

Algorithm 1 in Section 4 builds the hierarchy of the critical exponents and the T-graphs in the case where all critical exponents $\gamma_k$, $k = 1, \ldots, K$, are distinct and the min-arcs from each vertex and each closed communicating class are unique. In this case, $P = K$, $\theta_k = \gamma_k$ for $k = 1, \ldots, K$, and all closed communicating classes in all graphs $G_q$ are simple directed cycles. Each of these cycles has a unique main state, where the system is found with probability close to one for small enough $\varepsilon$ provided that it is in the cycle, and a unique exit arc, via which the system escapes from the cycle with probability close to one for small enough $\varepsilon$. Furthermore, the exit rates from all cycles of all orders are distinct. We refer to such a case as a case with no symmetry. In this case, one can subdivide the set of critical exponents into two subsets associated with the set of nonzero eigenvalues and with the births of cycles respectively. Besides, if the pairwise rates are of the form (1), one can obtain the sharp estimates for the eigenvalues from the output of Algorithm 1. Finally, the unique hierarchy of the optimal W-graphs can be easily extracted from the found T-graphs.

Algorithm 2, presented in Section 5.2, is designed to handle the case with symmetry, i.e., the one where at least two critical exponents coincide, or at least one vertex or closed communicating class has a non-unique min-arc. It computes the set of numbers $\theta_p$, $p = 1, \ldots, P$, and the hierarchy of the T-graphs $T_p$.

Algorithm 1 is designed to handle the case with no symmetry. However, it might be impossible to determine the presence of symmetry from the input data before running Algorithm 1. This can happen if one of the closed communicating classes has more than one min-arc. Hence the symmetry test should be made at each step of the while-loop in the code. Suppose the code implementing Algorithm 1 detects some symmetry while running, but continues running and terminates normally. In this case, its output will contain the correct set of distinct values of the critical exponents, while the set of graphs will not be the correct set of T-graphs. Nevertheless, one can still extract some important facts about the true T-graphs from its output. Section 6 contains a discussion and a theorem on this subject.

## 4 Algorithm 1 for the study of the metastable behavior

In this Section, we design Algorithm 1, which (a) finds the set of critical exponents and (b) builds the hierarchy of T-graphs. Throughout this Section, we adopt Assumptions 1-4 and

###### Assumption 5.

All min-arcs are unique at every stage of Algorithm 1 presented below, and all critical exponents are distinct.

The while-loop in Algorithm 1 is essentially the same as the one in the “single-sweep algorithm” introduced in [12] for finding asymptotic estimates for eigenvalues and the hierarchy of optimal W-graphs. However, Algorithm 1 has a different stopping criterion and is used for a broader, more ambitious purpose. More output data are extracted. Furthermore, the presentations of these algorithms are very different. Here, the description of Algorithm 1 serves the purpose of understanding the metastable behavior of the Markov process. Its recursive structure and its Contract and Expand operators reflect the processes taking place on various timescales. In contrast, the single-sweep algorithm was presented in [12] as a recipe for programming, where the key procedures were manipulations with various sets of outgoing arcs. It was not recursive and involved no Contract and Expand operators.

### 4.1 The recursive structure of the algorithm

Assumption 5 implies that a Markov process starting at a vertex $i$ leaves it via the unique min-arc from $i$, denoted by $(i\to\mu(i))$, with probability approaching one as $\varepsilon \to 0$. For every vertex $i \in S$, we select the unique min-arc $(i\to\mu(i))$, sort the set of min-arcs according to their weights in increasing order, and call the sorted set the bucket $B$.

We start with the bucket $B$ containing $n$ arcs and a graph containing all the vertices of $G$ and no arcs. At each step of the main cycle of Algorithm 1, the minimum weight arc in the bucket $B$ is transferred to this growing graph.

At step one, the arc of the minimum weight in the whole graph is transferred to the growing graph, which then coincides with the optimal W-graph with $n-1$ sinks. Theorem 2.4 implies that the eigenvalue $-\lambda_{n-1}$ is approximated by minus the rate of this arc; its weight is the first critical exponent.

At step two, we transfer the next arc from the bucket $B$ to the growing graph. If the graph contains no cycles, it coincides with the optimal W-graph with $n-2$ sinks, the eigenvalue $-\lambda_{n-2}$ is approximated by minus the rate of the transferred arc, and its weight is the second critical exponent.

We continue transferring arcs from $B$ to the growing graph in this manner until the addition of a new arc creates a cycle. This will happen sooner or later because, if all $n$ arcs in $B$ are transferred to the graph with $n$ vertices, a cycle must appear (see Lemma B.2 in Section 6).
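Since each vertex contributes at most one outgoing arc, detecting the moment a transferred arc closes a cycle only requires following the chain of unique successors. A minimal sketch of this step of the main loop (arcs hypothetical, listed in increasing order of weight):

```python
def add_arcs_until_cycle(sorted_arcs):
    """Transfer arcs one by one; stop when an arc closes a cycle.
    Returns (succ, cycle): succ maps each vertex to its unique successor,
    cycle lists the vertices of the first cycle found (or None)."""
    succ = {}
    for (i, j) in sorted_arcs:
        v, chain = j, [j]
        while v in succ:            # follow the chain of unique successors from j
            v = succ[v]
            chain.append(v)
        succ[i] = j                 # transfer the arc to the growing graph
        if v == i:                  # j already reaches i, so (i -> j) closed a cycle
            return succ, chain      # chain = [j, ..., i] is the cycle
    return succ, None

# Hypothetical min-arcs sorted by increasing weight; the third arc (1 -> 0)
# closes the cycle 0 -> 1 -> 0 and the main loop pauses to contract it.
arcs = [(2, 1), (0, 1), (1, 0), (3, 2)]
succ, cycle = add_arcs_until_cycle(arcs)
assert cycle == [0, 1]
assert succ == {2: 1, 0: 1, 1: 0}
```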

Suppose a cycle $C$ is formed after the arc of weight $\gamma_{\rm last}$ was added to the growing graph. Abusing notation, we treat $C$ as a graph or as its set of vertices depending on the context. We need to identify the exit arc from $C$, i.e., the arc $(i\to j)$ with $i \in C$ and $j \notin C$ such that the probability to exit $C$ via $(i\to j)$ tends to one as $\varepsilon \to 0$. We discard all arcs with tails and heads both in $C$ and modify the weights and pre-factors of all remaining outgoing arcs from $C$ according to the following rule:

$$U^{\rm new}_{ij} = U_{ij} + \gamma_{\rm last} - U_{\mu(i)}, \qquad \kappa^{\rm new}_{ij} = \frac{\kappa_{ij}\,\kappa_{\mu_{\rm last}}}{\kappa_{\mu(i)}}. \tag{11}$$

Note that none of the arcs with modified weight is in the bucket $B$ at this moment, and all min-arcs with tails in $C$ have already been removed from $B$ and added to the growing graph. The update rule (11) is of crucial importance for Algorithm 1. It is consistent with the one in [16]. A simple explanation for it is provided in Section 4.3 below. The number $U^{\rm new}_{ij}$ shows what would be the total weight of the graph obtained from the last found optimal W-graph by replacing the arc $(i\to\mu(i))$ with $(i\to j)$ and adding the arc of weight $\gamma_{\rm last}$ (see [12] for more details). Then we contract the cycle $C$ to a single vertex (a super-vertex). Finally, among the arcs with tails in $C$, we find the arc with minimal modified weight and add it to the bucket $B$. By Assumption 5, such an arc is unique. The idea of such a contraction is borrowed from the Chu-Liu/Edmonds algorithm [13, 14] for finding the optimal branching when the root is given a priori.
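The contraction step with the update rule (11) can be sketched as follows (arcs, weights, and the cycle are hypothetical; pre-factors are omitted for brevity). The arcs inside the cycle are discarded, the outgoing arcs are reweighted, and the min-arc from the new super-vertex is selected:

```python
cycle = {0, 1}                          # vertices of the cycle being contracted
gamma_last = 2.0                        # weight of the arc that closed the cycle
U = {(0, 2): 5.0, (1, 3): 4.0,          # outgoing arcs from the cycle
     (1, 0): 1.0, (0, 1): 0.5}          # arcs inside the cycle (to be discarded)
U_min = {0: 0.5, 1: 1.0}                # U_{mu(i)}: weight of the min-arc from i

# Rule (11): discard internal arcs, reweight the outgoing ones.
U_new = {(i, j): u + gamma_last - U_min[i]
         for (i, j), u in U.items()
         if i in cycle and j not in cycle}

exit_arc = min(U_new, key=U_new.get)    # min-arc from the contracted super-vertex
assert U_new == {(0, 2): 6.5, (1, 3): 5.0}
assert exit_arc == (1, 3)
```

The selected min-arc is the one added to the bucket $B$ on behalf of the super-vertex; under Assumption 5 it is unique.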

We continue adding arcs and contracting cycles in this manner. Note that the indices of the numbers equal the number of steps made or, equivalently, the number of arcs removed from the bucket and added to the current T-graph, where is the current recursion level, i.e., the number of cycles contracted into super-vertices. All arc additions not leading to cycles are associated with eigenvalues. The indices of the numbers and at step are .

The main cycle terminates when the bucket becomes empty. This stopping criterion allows us to obtain the whole collection of critical timescales and the whole hierarchy of T-graphs. Suppose the main cycle terminates at recursion level . Then Algorithm 1 returns to the previous recursion level and expands the super-vertex back into the cycle . Then, if , Algorithm 1 returns to recursion level and expands the super-vertex back into the cycle , and so on, until recursion level zero is reached. After that, one can extract the optimal W-graphs out of the appropriate T-graphs.

Below, Algorithm 1 is given in the form of a pseudocode. The operators Contract and Expand are described in Section 4.2 below.

Algorithm 1
Initialization: Set the step counter , the cycle (recursion depth) counter , and the eigenvalue index . Prepare the bucket : for every state , find the min-arc , and then sort the set according to the arc weights in increasing order:

 U_{\mu_1} < U_{\mu_2} < \cdots < U_{\mu_n}.

The graph is the original graph .
Set the graph . Set .

The main body of the Algorithm: Call the function FindTgraphs with arguments , , , , and .
Function FindTgraphs
while is not empty and has no cycles
(1) Increase the step counter: ;
(2) Transfer the minimum weight arc from the bucket to the graph ;
(3) Set ;
(4) Set ;
(5) Check whether has a cycle; if so, denote the cycle by ;
if contains no cycle
(6) Decrease the eigenvalue index and set ; ;
(7) Set and denote by the sink of

the connected component of containing the tail of ;

end if
end while
if a cycle was detected at step (5)
(8) Save the index at which the cycle arose: set ;
(9) Remove the arcs with both tails and heads in (if any);
if the set of arcs of with tails in and heads not in is not empty
(10) Modify weights and pre-factors of all arcs

with tails in and heads not in according to Eq. (11);

(11) Contract into a super-vertex :

= Contract; = Contract;

(12) Denote the min-arc from by and add it to ;
(13) Call the function FindTgraphs;
(14) Expand the super-vertex back into the cycle :

for = Expand; end for

end if
end if
end
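To make the structure of the main while-cycle concrete, here is a deliberately simplified, runnable Python sketch covering steps (1), (2), and (5) only — no recursion, no contraction, and no pre-factors. Since every vertex contributes at most one outgoing arc to the growing T-graph, the graph can be stored as a successor map, and a cycle appears exactly when the successor chain starting at the head of a newly added arc returns to its tail. All names are ours, not the paper's.

```python
# Simplified skeleton of the main while-loop of Algorithm 1: transfer
# arcs from the sorted bucket until a cycle forms. The T-graph is a
# functional graph, stored as a map tail -> head.

def run_until_cycle(bucket):
    """bucket: list of (weight, tail, head), sorted by increasing weight.
    Transfers arcs into T until a cycle forms; returns (T, cycle or None),
    where the cycle is listed starting from the tail of the closing arc."""
    succ = {}                          # the growing T-graph
    for _, i, j in bucket:
        # walk the successor chain from j; stop at a dead end or at i
        k, seen = j, set()
        while k in succ and k != i and k not in seen:
            seen.add(k)
            k = succ[k]
        succ[i] = j                    # step (2): transfer the arc
        if k == i:                     # step (5): adding (i -> j) closed a cycle
            cycle, v = [i], j
            while v != i:
                cycle.append(v)
                v = succ[v]
            return succ, cycle
    return succ, None                  # bucket exhausted, no cycle
```

In the full algorithm this loop would be restarted on the contracted graph; the sketch only exposes the cycle-detection mechanics.
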

The flow chart of Algorithm 1 is shown in Fig. 2. At each recursion level, the T-graphs obtained in the while-cycle are placed into the green boxes, while the ones obtained by the Expand operator are placed into the yellow boxes. The number is the index of the last step of Algorithm 1. At , a cycle is detected in . Hence steps (8) and (9) are performed. However, since this is the last step, the set of arcs in with tails in and heads not in is empty. Therefore, steps (10)–(14) are not executed. Instead, the algorithm returns to the previous recursion level and lands at step (14), where the function Expand is called.

The output of Algorithm 1 consists of several datasets.

• The set of critical exponents defining the critical timescales .

• The hierarchy of T-graphs , indicating the most likely transitions to observe up to timescales , where , , …, respectively.

• The set of exponents and pre-factors determining sharp estimates of eigenvalues according to . Note that .

• The set of sinks and . The optimal W-graph has the sinks , where the sink is defined as the sink . The sinks are the sinks of those connected components of the optimal W-graphs that absorb the connected components with sinks as the graphs are created. The sets of vertices in the connected components of the optimal W-graph coincide with those of the T-graph . An explanation of how one can extract the optimal W-graphs from the graphs is given in Section 4.4.

• The set of step numbers at which the cycles are created.

###### Remark 4.1.

The number of arcs in the bucket is reduced by one after each step at which no cycle is created, and remains the same otherwise. Let be the smallest step index such that the number of arcs in the bucket at the end of step reaches one. The eigenvalue index switches to during step . A new cycle is obtained at every step such that . For all , the T-graphs are connected, while for all , the graphs are not connected.

###### Remark 4.2.

The total number of steps and the total number of cycles satisfy the relationship

 K - N_c = n - 1. (12)

The maximal recursion level . Hence .

###### Remark 4.3.

In the case of a time-reversible Markov chain, the total number of Freidlin’s cycles of nonzero orders is [10]. Therefore, the total number of steps made by Algorithm 1 in this case will be . In the general case, the total number of cycles satisfies

 1 \le N_c \le n - 1. (13)

Indeed, let us construct a directed rooted tree of cycles, whose leaves are the vertices, the root is the cycle , the other nodes correspond to the cycles , , and whose arcs are directed toward the root. Each node of this tree except for the root has exactly one outgoing arc. Hence, the total number of arcs is . On the other hand, the nodes corresponding to the cycles , have at least two incoming arcs. Therefore, the number of arcs must satisfy

 N_a = n + N_c - 1 \ge 2 N_c.

This inequality implies that . Besides, Assumption 4 implies that there must be at least one cycle. This proves Eq. (13). Hence, the maximal possible recursion depth is achieved if and only if the Markov chain is time-reversible.

###### Remark 4.4.

The stopping criterion of Algorithm 1 can be modified depending on the goal. If one would like to use Algorithm 1 only for finding estimates for eigenvalues and optimal W-graphs, it suffices to stop the while-cycle as soon as . If one would like to find all T-graphs for the timescales , one needs to stop the while-cycle as soon as .

###### Remark 4.5.

For programming purposes, if a cycle is encountered, it is more convenient to merge the sets of outgoing arcs from the vertices in rather than contract into a single super-vertex, as is done in the single-sweep algorithm in [12].
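The bookkeeping suggested in this remark can be sketched as follows: rather than contracting the cycle into a super-vertex, one merges the out-arc lists of its vertices (with weights assumed to be already updated via Eq. (11)), drops the arcs staying inside the cycle, and keeps the arc of minimal weight for the resulting macro-state. A hypothetical Python sketch, with names of our own choosing:

```python
# Merge the outgoing arc lists of the cycle's vertices into one list for
# the macro-state, discarding arcs whose heads remain inside the cycle,
# and report the min-arc of the macro-state.

def merge_out_arcs(out_arcs, cycle_vertices):
    """out_arcs: dict vertex -> list of (weight, head) leaving that vertex.
    Returns (merged list, min-arc) for the macro-state formed by the cycle."""
    Sc = set(cycle_vertices)
    merged = [(w, j) for i in Sc
              for (w, j) in out_arcs.get(i, []) if j not in Sc]
    return merged, (min(merged) if merged else None)
```
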

### 4.2 The functions Contract and Expand

The function = Contract maps the graph onto the graph as follows. All vertices of belonging to the cycle are mapped onto a single vertex of . More formally, let and . Let be the subset of vertices lying in the cycle , and be the subset of arcs of defined by

 A_c := \{(i \to j) \mid i, j \in S_c\}.

Then , , if and , and , if and .

The function = Expand is the inverse function of = Contract. If , , then , , if and , and , if and .
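The two operators can be illustrated in Python on a graph stored as a vertex set and an arc set. To make Expand an exact inverse, our contract also returns a bookkeeping map `origin` from each contracted arc back to the original arcs it represents; this map is our own device, not part of the paper's notation.

```python
# Illustrative Contract/Expand on (vertex set, arc set); `s` denotes the
# super-vertex. All names are ours.

def contract(vertices, arcs, cycle_vertices, s):
    """Collapse cycle_vertices into the single super-vertex s."""
    Sc = set(cycle_vertices)
    new_vertices = (vertices - Sc) | {s}
    new_arcs, origin = set(), {}
    for (i, j) in arcs:
        if i in Sc and j in Sc:
            continue                  # arcs inside the cycle are dropped
        a = (s if i in Sc else i, s if j in Sc else j)
        new_arcs.add(a)
        origin.setdefault(a, []).append((i, j))
    return new_vertices, new_arcs, origin

def expand(new_arcs, origin):
    """Inverse of contract on the arc set: restore original endpoints."""
    return {orig for a in new_arcs for orig in origin.get(a, [a])}
```
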

### 4.3 An explanation of the update rules for arc weights and pre-factors

In this Section, we explain the update rules for arc weights and pre-factors given by Eq. (11) (also see [15, 16, 18]). A cycle

 c = \{i_1 \to i_2 \to \cdots \to i_q \to i_1\}

appears in Algorithm 1 if and only if the min-arcs from the vertices i_l, l = 1, …, q−1, are (i_l → i_{l+1}), and the min-arc from i_q is (i_q → i_1). Let us restrict the dynamics of the Markov chain to the cycle c and, for each state, neglect the transition rates of smaller exponential order than that of its min-arc, which dominates for sufficiently small ε. Then we obtain the following generator matrix approximately describing the dynamics within the cycle c:

 L_c = \begin{bmatrix} -L^c_{\mu(i_1)} & L^c_{\mu(i_1)} & & & \\ & -L^c_{\mu(i_2)} & L^c_{\mu(i_2)} & & \\ & & \ddots & \ddots & \\ & & & -L^c_{\mu(i_{q-1})} & L^c_{\mu(i_{q-1})} \\ L^c_{\mu(i_q)} & & & & -L^c_{\mu(i_q)} \end{bmatrix}, \quad \text{where } L^c_{\mu(i_l)} = \kappa_{\mu(i_l)}\, e^{-U_{\mu(i_l)}/\varepsilon}. (14)

Solving π_c L_c = 0 subject to ∑_{l=1}^{q} π_c(i_l) = 1, we find the approximation to the invariant distribution in the cycle c:

 \pi_c(i_l) \approx \frac{\dfrac{1}{\kappa_{\mu(i_l)}}\, e^{U_{\mu(i_l)}/\varepsilon}}{\sum_{j=1}^{q} \dfrac{1}{\kappa_{\mu(i_j)}}\, e^{U_{\mu(i_j)}/\varepsilon}}, \qquad l = 1, \ldots, q. (15)

Without loss of generality, we assume that the last arc added to the cycle is (i_q → i_1). Then U_{μ(i_q)} > U_{μ(i_l)} for l = 1, …, q−1. Multiplying the numerator and the denominator in Eq. (15) by κ_{μ(i_q)} e^{−U_{μ(i_q)}/ε} and neglecting the small summands in the denominator, we obtain

 \pi_c \approx \left[ \frac{\kappa_{\mu(i_q)}}{\kappa_{\mu(i_1)}}\, e^{-(U_{\mu(i_q)} - U_{\mu(i_1)})/\varepsilon}, \; \ldots, \; \frac{\kappa_{\mu(i_q)}}{\kappa_{\mu(i_{q-1})}}\, e^{-(U_{\mu(i_q)} - U_{\mu(i_{q-1})})/\varepsilon}, \; 1 \right]. (16)
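Eqs. (14)–(15) are easy to verify numerically. The sketch below, with illustrative numbers of our own choosing, builds the cyclic generator L_c and checks that the distribution π_c(i_l) ∝ e^{U_{μ(i_l)}/ε}/κ_{μ(i_l)} is stationary; for a one-directional cycle this holds exactly, not only asymptotically, since the probability flux π_c(i_l) L^c_{μ(i_l)} is the same constant for every l.

```python
# Numerical check of Eqs. (14)-(15): cyclic generator and its
# invariant distribution. The numbers U, kappa, eps are illustrative.
import numpy as np

def cycle_generator(U, kappa, eps):
    """Eq. (14): each state i_l jumps only to i_{l+1 mod q}."""
    q = len(U)
    rates = kappa * np.exp(-U / eps)
    L = np.zeros((q, q))
    for l in range(q):
        L[l, l] = -rates[l]
        L[l, (l + 1) % q] = rates[l]
    return L

def quasi_invariant(U, kappa, eps):
    """Eq. (15): pi(i_l) proportional to e^{U_l/eps}/kappa_l."""
    w = np.exp(U / eps) / kappa
    return w / w.sum()

U = np.array([1.0, 2.0, 3.0])
kappa = np.array([1.0, 0.5, 2.0])
eps = 0.1
L = cycle_generator(U, kappa, eps)
pi = quasi_invariant(U, kappa, eps)
print(np.abs(pi @ L).max())   # residual of pi L = 0, tiny up to rounding
```
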

The quasi-invariant distribution (i.e., the approximation to the invariant distribution for dynamics restricted to the subset of states ) allows us to obtain sharp estimates for the exit rates from the cycle via arcs with tails in . For any arc where and , the exit rate from through the arc is approximated by

 \pi_c(i_l)\, L_{i_l j} = \frac{\kappa_{\mu(i_q)}\, \kappa_{i_l j}}{\kappa_{\mu(i_l)}}\, e^{-(U_{i_l j} + U_{\mu(i_q)} - U_{\mu(i_l)})/\varepsilon}. (17)

If one treats the cycle as a macro-state, i.e., contracts it into a single vertex, then the effective exit rate from it is given by Eq. (17). Recalling that is the last added arc, one readily reads off the update rule for the arc weights and the pre-factors given by Eq. (11).
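The consistency between Eqs. (11) and (17) can be checked directly: the updated weight and pre-factor satisfy κ_new e^{−U_new/ε} = π_c(i_l) L_{i_l j}, with γ_last being the weight of the last added arc. A short sketch with illustrative numbers (all variable names are ours):

```python
# Check that the update rule (11) reproduces the exit rate (17).
import math

eps = 0.5
# data for one exiting arc (i_l -> j); all values are illustrative
U_lj, kappa_lj = 4.0, 1.5          # original arc weight and pre-factor
U_mu_l, kappa_mu_l = 1.0, 2.0      # min-arc out of i_l
U_mu_q, kappa_mu_q = 3.0, 0.7      # last arc added to the cycle

# Eq. (11): updated weight and pre-factor after contracting the cycle
U_new = U_lj + U_mu_q - U_mu_l
kappa_new = kappa_lj * kappa_mu_q / kappa_mu_l

# Eq. (17): quasi-invariant probability of i_l times the rate of (i_l -> j)
pi_l = (kappa_mu_q / kappa_mu_l) * math.exp(-(U_mu_q - U_mu_l) / eps)
rate_lj = kappa_lj * math.exp(-U_lj / eps)

print(abs(kappa_new * math.exp(-U_new / eps) - pi_l * rate_lj))
```

Analytically the two expressions are identical term by term, so the printed residual is only floating-point rounding.
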

### 4.4 Extraction of optimal W-graphs

Let us imagine that we dispense with the Contract and Expand operators in Algorithm 1 and manipulate the arc sets instead, as is done in the single-sweep algorithm [12] and as suggested in Remark 4.5 in Section 4.1. That is, instead of contracting a cycle into a super-vertex, we discard all arcs with both tails and heads in that are not in the current graph , modify the weights and pre-factors of all outgoing arcs with tails in and heads not in according to Eq. (11), denote the set of such arcs by , find the arc of minimal weight in , and add it to the bucket ; this arc becomes the min-arc for all vertices in . The weight of any arc modified according to the update rule Eq. (11) equals the increment of the total weight of the graph obtained from the current optimal W-graph by replacing the arc with and adding the arc that led to the current cycle. This fact and the weak nested property of the optimal W-graphs (Theorem 2.5) guarantee that the whole hierarchy of the optimal W-graphs , …, can be extracted from the T-graphs , …, built by Algorithm 1.

Recall that is the step number at which the eigenvalue counter switches to : , where is the recursion depth at step in Algorithm 1. For convenience, we denote the unique absorbing state of the T-graph by , i.e., . The optimal W-graphs can be extracted from the corresponding T-graphs for . We emphasize that is a fully expanded graph, i.e., its set of vertices is . The set of sinks of is