A Graph-Algorithmic Approach
for the Study of Metastability
in Markov Chains
Abstract
Large continuous-time Markov chains with exponentially small transition rates arise in modeling complex systems in physics, chemistry, and biology. We propose a constructive graph-algorithmic approach to determine the sequence of critical timescales at which the qualitative behavior of a given Markov chain changes, and give an effective description of the dynamics on each of them. This approach is valid for both time-reversible and time-irreversible Markov processes, with or without symmetry. Central to this approach are two graph algorithms, Algorithm 1 and Algorithm 2, for obtaining the sequences of the critical timescales and the hierarchies of Typical Transition Graphs, or T-graphs, indicating the most likely transitions in the system without and with symmetry, respectively. The sequence of critical timescales includes the subsequence of the reciprocals of the real parts of eigenvalues. Under a certain assumption, we prove sharp asymptotic estimates for eigenvalues (including prefactors) and show how one can extract them from the output of Algorithm 1. We discuss the relationship between Algorithms 1 and 2, and explain how one needs to interpret the output of Algorithm 1 if it is applied in the case with symmetry instead of Algorithm 2. Finally, we analyze an example motivated by R. D. Astumian’s model of the dynamics of kinesin, a molecular motor, by means of Algorithm 2.
1 Introduction
Phase transitions in nonequilibrium systems, conformational changes in molecules or atomic clusters, financial crises, global climate changes, and genetic mutations exemplify the phenomenon where a seemingly stable behavior of a system at hand, persisting for a long time, undergoes a sudden qualitative change. Such systems are often referred to as metastable. A popular choice of mathematical model for investigating such systems is a Markov jump process with a finite number of states and exponentially distributed holding times. The dynamics of the process is described by the generator matrix $L$. Each off-diagonal entry $L_{ij}$ of $L$ is the transition rate from state $i$ to state $j$. Often, it takes the form [25, 18]
$L_{ij} = \kappa_{ij}\, e^{-U_{ij}/\epsilon}, \qquad (1)$
where $\kappa_{ij}$ is the prefactor, the number $U_{ij}$ is the exponential factor or order, and $\epsilon$ is a small parameter. In many cases, the prefactors are not available, and the rates are determined only up to the exponential order:
$L_{ij} \asymp e^{-U_{ij}/\epsilon}. \qquad (2)$
The reciprocal $L_{ij}^{-1}$ is the expected waiting time for a jump from state $i$ to state $j$. The diagonal entries are defined so that the row sums of $L$ are zeros, i.e., $L_{ii} = -\sum_{j \neq i} L_{ij}$. Hence, $-L_{ii}$ is the escape rate from state $i$.
A Markov chain with pairwise rates of the form (1) or (2) can be associated with a weighted directed graph $G(S, A, U)$, where the set of vertices $S$ is the set of states, the set of arcs $A$ includes only those arcs $(i \to j)$ for which $L_{ij} > 0$, and $U$ is the set of arc weights. An arc $(i \to j)$ has weight $U_{ij}$ if $L_{ij} > 0$. We set $U_{ij} = \infty$ if $L_{ij} = 0$, i.e., if there is no arc $(i \to j)$ in the graph.
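As a concrete illustration of this correspondence, the rates of the form (1) and the associated weighted digraph can be sketched as follows (the dictionary-based representation and the function names are ours, introduced only for illustration):

```python
import math

def rate(kappa_ij, U_ij, eps):
    """Pairwise rate of the form (1): L_ij = kappa_ij * exp(-U_ij / eps)."""
    return kappa_ij * math.exp(-U_ij / eps)

def build_graph(U):
    """Build (S, A, U) from a dict mapping the pair (i, j) to the
    exponential factor U_ij; U_ij = inf encodes L_ij = 0 (no arc)."""
    S = sorted({i for (i, _) in U} | {j for (_, j) in U})
    A = [(i, j) for (i, j), u in U.items() if math.isfinite(u)]
    return S, A, U

# Example: three states; the arc (0 -> 2) is absent.
U = {(0, 1): 1.0, (1, 0): 2.0, (1, 2): 3.0, (2, 1): 1.5, (0, 2): math.inf}
S, A, _ = build_graph(U)
```

As $\epsilon$ decreases, the rate of an arc with a larger exponential factor becomes negligible relative to one with a smaller factor, which is the mechanism behind the timescale separation discussed below.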
In this work, we will consider continuous-time Markov chains with finite numbers of states and pairwise transition rates of the form (1) or (2). Under this framework, the timescales on which various transition processes take place in the system are well-separated as $\epsilon$ tends to zero. Our goal is to find a constructive way to calculate the sequence of critical timescales at which the behavior of such a Markov chain undergoes qualitative changes, and to give effective descriptions of its behavior on the whole range of timescales from zero to infinity. Imagine that the time evolution of a given Markov chain is observed for some not very large fixed number of time units. We want to be able to predict what transitions the observer will see depending on the initial state and the size of the time unit. Such a prediction is easy if the time unit tends to zero: then it is extremely unlikely to observe any transitions. If the time unit tends to infinity, then an equilibrium probability distribution will be observed. However, even in this case, the determination of the arcs along which transitions will most likely be observed remains a nontrivial problem. On timescales between those two extremes, the problem of giving an effective description of the observable transitions is difficult. Prior to presenting our solution to it, we give an account of works that provided us with necessary background and/or inspiration.
1.1 Background
Long-time behavior of stochastic systems with rare transitions has been attracting the attention of mathematicians for the last fifty years. Freidlin and Wentzell developed the Large Deviation Theory [18] in the 1970s. They showed that the long-time behavior of a system evolving according to the SDE
$dX_t = b(X_t)\,dt + \sqrt{\epsilon}\,dW_t$
can be modeled by means of continuous-time Markov chains. The states of the Markov chain correspond to attractors of the corresponding ODE $\dot{x} = b(x)$. The pairwise rates are logarithmically equivalent to $e^{-V_{ij}/\epsilon}$, where $V_{ij}$ is the quasipotential, the quantity characterizing the difficulty of the passage from attractor $i$ to attractor $j$. Recently, Bouchet and Reygner [5] calculated the prefactor in the case where state $i$ represents a stable equilibrium point separated from the attractor corresponding to state $j$ by a Morse index one saddle.
In the early 1970s, Freidlin proposed to describe the long-time behavior of the system via a hierarchy of cycles (we refer to them as Freidlin’s cycles) [15, 16, 18] in the case where each cycle has a unique main state, a unique exit arc, and the exit rates for all cycles are distinct. This hierarchy can be mapped onto a tree. In the time-reversible case, this tree is a complete binary tree [10]. Later, in 2014, Freidlin extended this approach to the case with symmetry [17], replacing the hierarchy of cycles with a hierarchy of Markov chains. Each cycle/Markov chain in Freidlin’s hierarchy is born at a specific critical timescale, which is the reciprocal of its rotation rate. The corresponding hierarchy of timescales has only a partial but not a complete order: cycles/Markov chains of the same order typically have different timescales. The birth timescales of Freidlin’s cycles/Markov chains constitute an important subset of the critical timescales of the system.
The other important subset of critical timescales is given by the reciprocals of the absolute values of the real parts of the nonzero eigenvalues of the generator matrix $L$. Using classic potential theory as a tool, Bovier and collaborators [6, 7, 8, 9] derived sharp estimates for the low-lying spectra of time-reversible Markov chains (all their eigenvalues are real) with pairwise rates not necessarily of the form (1), defined a hierarchy of metastable sets, and identified the link between eigenvalues and expected exit times. A more general study, utilizing almost degenerate perturbation theory, was conducted by Gaveau and Schulman [19], in which a spectral definition of metastability was given for a broad class of Markov chains. The Transition Path Theory, proposed by E and Vanden-Eijnden and extended to Markov chains by Metzner et al. [23], can be viewed as an extension of the potential-theoretic approach exercised by Bovier et al. to the time-irreversible case. It is focused on the study of transition pathways between any particular pair of metastable sets.
Asymptotic estimates of the exponential orders of the real parts of the eigenvalues of the generator matrix of any Markov chain with pairwise rates of the form (2) were developed by Wentzell in the early 1970s [30]. These estimates are given in terms of the optimal W-graphs, i.e., solutions of certain combinatorial optimization problems on so-called W-graphs. Assuming the more concrete form (1) of the pairwise transition rates, we have derived asymptotic estimates for eigenvalues, including prefactors, for the case where all optimal W-graphs are unique. Greedy graph algorithms to solve these optimization problems in the case where all optimal W-graphs are unique were introduced in [11] and [12]. The one in [11] assumes time-reversibility and finds the sequence of asymptotic estimates for eigenvalues starting from the smallest ones. The greedy/dynamic programming “single-sweep algorithm” in [12] does not require time-reversibility and computes the sequence of asymptotic estimates for eigenvalues starting from the largest ones. Sharp estimates for eigenvalues in the time-reversible case with symmetry were obtained by Berglund and Dutercq using tools from group representation theory [4].
1.2 Summary of main results
The starting point of this work is the single-sweep algorithm. In [12], we introduced it in a form convenient for programming and used it only for the purpose of finding asymptotic estimates for the eigenvalues and eigenvectors of the large time-reversible Markov chain with 169,523 states representing the energy landscape of the Lennard-Jones-75 cluster.
In this work, we extend the single-sweep algorithm [12] to find the whole sequence of critical timescales corresponding to both the births of Freidlin’s cycles and the reciprocals of the absolute values of the real parts of eigenvalues. Besides the set of critical timescales, the output includes the hierarchy of Typical Transition Graphs (T-graphs) introduced in this work, which mark the transitions most likely to be observed up to certain timescales. Wentzell’s optimal W-graphs are readily extracted from the T-graphs. We will refer to this extension of the single-sweep algorithm as Algorithm 1. The presentation of Algorithm 1 is significantly different from the one in [12]. Each step of Algorithm 1 is motivated by the consideration of the dynamics of the system at an appropriate timescale.
Algorithm 1 offers a constructive way to simultaneously solve both Freidlin’s problem of building the hierarchy of cycles and Wentzell’s problem of finding asymptotic estimates for eigenvalues. Contrary to [15, 16, 18], Freidlin’s cycles are found in decreasing order of their rotation rates. Asymptotic estimates for eigenvalues, containing prefactors (if the input data do), are found during the run of Algorithm 1. Our proof of sharp asymptotic estimates for the eigenvalues of the generator matrix of time-irreversible Markov chains for which all optimal W-graphs are unique is provided.
From the programming point of view, Algorithm 1 and the single-sweep algorithm in [12] differ in their stopping criteria. Algorithm 1 stops once it has computed all Freidlin’s cycles. The costs of the single-sweep algorithm and Algorithm 1, and the difference between them, depend on the structure of the graph. For a graph with $n$ vertices and bounded maximal vertex degree, both algorithms have the same worst-case cost.
Algorithm 1 is designed for the case with no symmetry, i.e., where all critical timescales are distinct and all Freidlin’s cycles have unique exit arcs. Markov chains arising in modeling complex physical systems may or may not be symmetric. For example, the networks representing energy landscapes of biomolecules and clusters of particles interacting according to a long-range potential [26, 27, 28, 29] are mostly nonsymmetric, while the networks representing the dynamics of particles interacting according to a short-range potential [2, 22, 3, 21] are highly symmetric.
To handle the case with symmetry, we have developed a modification of Algorithm 1 and called it Algorithm 2. Algorithm 2 computes the sequence of distinct values of the critical timescales and the hierarchy of T-graphs.
The presence of symmetry is not necessarily apparent in a Markov chain. Algorithm 1 can run in the case with symmetry and produce an output that does not reflect its presence. However, the output will be inaccurate in some aspects. The relationship between the outputs of Algorithms 1 and 2 and a recipe for the interpretation of the output of Algorithm 1 used in a symmetric case is summarized in a theorem proven in this work.
Algorithms 1 and 2 constitute a graph-algorithmic approach to the study of metastability in continuous-time Markov chains with exponentially small transition rates.
Algorithms 1 and 2 are illustrated with examples. The case with symmetry is not as generic as the one without it; however, it is important, as symmetry often occurs in Markov chains modeling natural systems. One such example, motivated by Astumian’s model [1] of the directed motion of the kinesin protein, a molecular motor, is analyzed by means of Algorithm 2. The stochastic switch between two chemical states, breaking the detailed balance in this system, enables the directed motion (walking). The rate of the chemical switch is treated as a parameter. The most likely walking style is indicated for each rate value.
The rest of the paper is organized as follows. Some necessary background on continuous-time Markov chains and optimal W-graphs is provided in Section 2. The T-graphs are introduced in Section 3. Algorithm 1 is presented and discussed in Section 4. In Section 5, we address the case with symmetry and introduce Algorithm 2. The interpretation of the output of Algorithm 1 applied in the case with symmetry is given in Section 6. The molecular motor example is investigated in Section 7. We summarize our work in Section 8. Appendices A and B contain proofs of theorems.
2 Significance and nested property of optimal W-graphs
In this Section, we provide some necessary background and discuss some of our recent results that are essential for the presentation of our new developments.
2.1 Continuous-time Markov chains
Let $G(S, A, U)$ be the weighted directed graph associated with a given continuous-time Markov chain with pairwise transition rates of the form (1) or (2). Throughout the rest of the paper we adopt the following assumptions.
Assumption 1.
The set of states $S$ is finite, $|S| = n$.
Assumption 2.
The graph $G(S, A, U)$ has a unique closed communicating class.
We remind the reader (see e.g. [24]) that a subset of vertices $C \subseteq S$ is called a communicating class if there is a directed path leading from any vertex $i \in C$ to any vertex $j \in C$. A communicating class $C$ is called closed if for any vertex $i \in C$ the following condition holds: if there is a directed path from $i$ to $j$, then $j \in C$. A trivial example of a closed communicating class is an absorbing state, i.e., a state with no outgoing arcs. All states of an irreducible Markov chain belong to the same closed communicating class.
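For small graphs, the communicating classes and the closedness condition can be checked directly from reachability; a brute-force sketch (the function name and representation are ours):

```python
def closed_communicating_classes(S, A):
    """Return the closed communicating classes of the digraph (S, A).
    reach[i] is the set of vertices reachable from i; a class is closed
    iff nothing outside it is reachable from it."""
    reach = {i: {i} for i in S}
    changed = True
    while changed:                     # iterate to the transitive closure
        changed = False
        for (i, j) in A:
            new = {j} | reach[j]
            if not new <= reach[i]:
                reach[i] |= new
                changed = True
    classes, seen = [], set()
    for i in S:
        if i in seen:
            continue
        # the communicating class of i: mutual reachability
        cls = {j for j in S if j in reach[i] and i in reach[j]}
        seen |= cls
        if all(reach[j] <= cls for j in cls):
            classes.append(cls)
    return classes
```

For the graph 0 <-> 1 -> 2 <-> 3, the class {0, 1} is not closed (the arc 1 -> 2 leaves it), while {2, 3} is closed; an absorbing state is recovered as a one-element closed class.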
In a weighted directed graph with a single closed communicating class $C$, all states in $C$ are recurrent, while the rest of the states are transient. Assumption 2 guarantees that the invariant distribution $\pi$, satisfying $\pi L = 0$, $\pi_i \ge 0$ for all $i \in S$, and $\sum_{i \in S} \pi_i = 1$, is unique and supported only on the recurrent states. Note that $\pi$ is the left eigenvector of $L$ corresponding to the zero eigenvalue $\lambda_0 = 0$. Due to the zero row sum property of $L$, the corresponding right eigenvector is $(1, \ldots, 1)^\top$.
Due to the weak diagonal dominance of $L$ and the nonpositivity of its diagonal entries, all its eigenvalues have nonpositive real parts. This can be shown, e.g., by applying Gershgorin’s circle theorem [20]. Let $\lambda_k$, $1 \le k \le n-1$, be the nonzero eigenvalues of $L$, ordered so that $0 \ge \mathrm{Re}\,\lambda_1 \ge \ldots \ge \mathrm{Re}\,\lambda_{n-1}$, and let the column vectors $\phi_k$ and row vectors $\psi_k$ be the corresponding right and left eigenvectors. If $L$ is diagonalizable, one can write the probability distribution $p(t)$, governed by the Fokker-Planck equation $\frac{dp}{dt} = pL$, in the following form:
$p(t) = \pi + \sum_{k=1}^{n-1} e^{\lambda_k t}\, (p(0)\,\phi_k)\, \psi_k. \qquad (3)$
Considering the time $t$ as a function of $\epsilon$, one can infer from Eq. (3) that the $k$th eigencomponent of $p(t)$ is significant only for times $t$ of an exponential order not greater than that of $|\mathrm{Re}\,\lambda_k|^{-1}$.
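The resulting separation of eigenvalue timescales can be observed numerically on a toy three-state chain; a sketch assuming illustrative weights, unit prefactors, and $\epsilon = 0.1$ (all values are ours, chosen only for demonstration):

```python
import numpy as np

def generator(U, kappa, eps, n):
    """Generator matrix with off-diagonal entries of form (1) and zero row sums."""
    L = np.zeros((n, n))
    for (i, j), u in U.items():
        L[i, j] = kappa[(i, j)] * np.exp(-u / eps)
    np.fill_diagonal(L, -L.sum(axis=1))
    return L

# Toy 3-state chain 0 <-> 1 <-> 2 with illustrative exponential factors
U = {(0, 1): 1.0, (1, 0): 2.0, (1, 2): 3.0, (2, 1): 1.5}
kappa = {a: 1.0 for a in U}
lam = np.sort(np.linalg.eigvals(generator(U, kappa, eps=0.1, n=3)).real)
# lam[2] is the zero eigenvalue; the two nonzero eigenvalues differ by
# orders of magnitude, i.e., the corresponding timescales are well-separated
```

This chain is time-reversible, so all eigenvalues are real; the spacing between their magnitudes grows without bound as $\epsilon \to 0$.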
2.2 Optimal W-graphs
The W-graphs were introduced by Wentzell [30] in order to obtain asymptotic estimates for the exponential factors of the real parts of the eigenvalues of the generator matrices of Markov chains with pairwise rates of the form (2). The W-graphs generalize the graphs introduced by Freidlin [15, 16] for building the hierarchy of cycles describing the long-term behavior of such Markov chains.
Definition 2.1.
Assumptions 1 and 2 guarantee the existence of a W-graph with any number $k$ of sinks, $1 \le k \le n$. If $k = n$, there is a unique W-graph, and it has no arcs. A W-graph with $k < n$ sinks is a forest consisting of $k$ connected components. Each connected component is a directed in-tree, i.e., a rooted tree with arcs pointing towards its root (sink). The collection of all possible W-graphs with $k$ sinks will be denoted by $\mathcal{G}_k$.
Asymptotic estimates for the eigenvalues are defined by a collection of W-graphs with minimal possible sums of weights of their arcs [30]. We call such graphs optimal W-graphs and denote them by $g^*_k$. Precisely,
Definition 2.2.
(Optimal W-graph) $g^*_k$ is an optimal W-graph with $k$ sinks if and only if
$\sum_{(i \to j) \in g^*_k} U_{ij} = \min_{g \in \mathcal{G}_k} \sum_{(i \to j) \in g} U_{ij}, \qquad (4)$
where $\mathcal{G}_k$ denotes the collection of all W-graphs with $k$ sinks.
For brevity, we write $U(g)$ meaning $\sum_{(i \to j) \in A(g)} U_{ij}$, where $A(g)$ is the set of arcs of $g$. We emphasize that, in the optimal W-graph, the sum of weights needs to be minimized simultaneously with respect to the choices of sinks and arcs.
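On very small graphs, the minimization (4) can be solved by exhaustive search, which is useful for building intuition and for checking the output of the algorithms below; a sketch (the naming and the example weights are ours, and the complexity is exponential in the graph size):

```python
from itertools import combinations, product
import math

def optimal_w_graph(S, U, k):
    """Brute-force optimal W-graph with k sinks: pick k sinks, give every
    non-sink vertex exactly one outgoing arc, reject configurations with
    cycles, and minimize the total arc weight."""
    out = {i: [j for (a, j) in U if a == i] for i in S}
    best = (math.inf, None, None)
    for sinks in combinations(S, k):
        nonsinks = [i for i in S if i not in sinks]
        for heads in product(*(out[i] for i in nonsinks)):
            g = dict(zip(nonsinks, heads))
            if any(_hits_cycle(i, g, sinks) for i in nonsinks):
                continue
            w = sum(U[(i, g[i])] for i in nonsinks)
            if w < best[0]:
                best = (w, set(sinks), g)
    return best

def _hits_cycle(i, g, sinks):
    """True iff following the arcs of g from i never reaches a sink."""
    seen = set()
    while i not in sinks:
        if i in seen:
            return True
        seen.add(i)
        i = g[i]
    return False

U = {(0, 1): 1.0, (1, 0): 2.0, (1, 2): 3.0, (2, 1): 1.5}
```

For this toy $U$, the optimal W-graph with one sink has sink 1 and weight 2.5 (arcs 0 -> 1 and 2 -> 1), while the one with two sinks has sinks {1, 2} and weight 1.0 (the single arc 0 -> 1); both are unique, so Assumption 3 below holds.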
2.3 Asymptotic estimates for eigenvalues
Wentzell established the following asymptotic estimates for eigenvalues in terms of the optimal W-graphs [30]:
Theorem 2.3.
(Wentzell [30]) For each $k = 1, \ldots, n-1$, $\mathrm{Re}\,\lambda_k \asymp -e^{-\Delta_k/\epsilon}$, where $\Delta_k := U(g^*_k) - U(g^*_{k+1})$.
Assumption 3 below allows us to derive a number of important facts.
Assumption 3.
For each $k = 1, \ldots, n$, the optimal W-graph with $k$ sinks is unique.
It is, in essence, a genericness assumption: if the weights are random real numbers drawn from a continuous distribution on an interval, then it holds with probability one.
Under Assumption 3, strict inequalities take place in Eq. (6). Hence, if $\epsilon$ is sufficiently small, all eigenvalues are real and distinct [30]. In this case, for a generator matrix with off-diagonal entries of the form (1), the estimates given by Theorem 2.3 can be made more precise.
Theorem 2.4.
Originally, we formulated Theorem 2.4 in [12] but did not provide a detailed proof in order to stay focused on practical matters. Here, we provide a proof of Theorem 2.4 in Appendix A, as promised in [12]. The key point of this proof is the relationship between the coefficients of the characteristic polynomial of $L$ and the W-graphs with the corresponding numbers of sinks.
2.4 Weak nested property of optimal W-graphs
In [12], under Assumption 3, we proved a weak nested property of optimal W-graphs, a pivotal property that serves as a stepping stone in the design of the single-sweep algorithm [12] and Algorithm 1, and facilitates a deeper understanding of the metastable behavior of continuous-time Markov chains. The weak nested property can be compared to the stronger nested property of optimal W-graphs that takes place if the underlying Markov chain is time-reversible [11].
A connected component of a W-graph will be identified with its set of vertices. For a W-graph $g$ and a set of vertices $C$, $A(g; C)$ will denote the set of arcs of $g$ with tails in $C$.
Theorem 2.5.
(The weak nested property of optimal W-graphs [12]) Suppose that Assumptions 1, 2, and 3 hold. Then the following statements are true.

1. There exists a unique connected component $C$ of $g^*_{k+1}$ whose sink is not a sink of $g^*_k$.

2. $A(g^*_k; S \setminus C) = A(g^*_{k+1}; S \setminus C)$, i.e., the arcs in $g^*_k$ with tails in $S \setminus C$ coincide with those in $g^*_{k+1}$. However, $A(g^*_k; C)$ and $A(g^*_{k+1}; C)$ do not necessarily coincide.

3. There exists a single arc of $g^*_k$ with tail in $C$ and head in another connected component of $g^*_{k+1}$.
Theorem 2.5 is illustrated in Fig. 1. The optimal W-graph $g^*_k$ is obtained from $g^*_{k+1}$ by adding an arc with tail and head lying in different connected components of $g^*_{k+1}$, denoted by $C_1$ and $C_2$ respectively, and possibly rearranging some arcs with tails and heads in $C_1$. We say that $C_1$ is absorbed by $C_2$ to form $g^*_k$. All arcs of $g^*_{k+1}$ with tails not in $C_1$ are inherited by $g^*_k$.
3 Freidlin’s hierarchies, the critical timescales, and the T-graphs
In this Section, we introduce the Typical Transition Graphs, or T-graphs, for continuous-time Markov chains with pairwise rates of the form (1) or (2). Originally, the T-graphs arose as a byproduct of the single-sweep algorithm [12]. In this work, we connect the T-graphs to Freidlin’s construction of the hierarchy of cycles/Markov chains [15, 16, 18, 17].
A timescale is a function of $\epsilon$ defined up to the exponential order: we say that a timescale $t(\epsilon)$ is logarithmically equivalent to $e^{T/\epsilon}$ and write
$t \asymp e^{T/\epsilon}$ if $\lim_{\epsilon \to 0} \epsilon \log t(\epsilon) = T$.
For brevity, we write
$t_1 \preceq t_2$ if $\lim_{\epsilon \to 0} \epsilon \log t_1(\epsilon) \le \lim_{\epsilon \to 0} \epsilon \log t_2(\epsilon)$,
and adopt analogous meanings for all other inequality signs for comparing timescales.
For every vertex $i \in S$, let $u_i := \min_j U_{ij}$. The outgoing arcs from $i$ of the minimal weight $u_i$ will be called min-arcs from $i$, and the set of all min-arcs from $i$ will be denoted by $\mathrm{Min}(i)$. If a Markov jump process starts at state $i$, then the probability of the first jump along an arc $(i \to j)$ is given by
$\mathbb{P}(i \to j) = \frac{L_{ij}}{\sum_{l \neq i} L_{il}}. \qquad (8)$
Eq. (8) shows that $\mathbb{P}(i \to j)$ remains positive as $\epsilon \to 0$ if and only if the arc $(i \to j)$ is a min-arc from $i$. The expected exit time from $i$ is logarithmically equivalent to $e^{u_i/\epsilon}$.
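A minimal sketch of the min-arcs and of formula (8), assuming rates of the form (1) with unit prefactors and illustrative weights (all of our choosing):

```python
import math

def min_arcs(U, i):
    """Heads of all outgoing arcs from i of minimal weight (min-arcs from i)."""
    out = {j: u for (a, j), u in U.items() if a == i}
    u_min = min(out.values())
    return {j for j, u in out.items() if u == u_min}

def first_jump_prob(U, kappa, eps, i, j):
    """Probability (8) that the first jump out of i goes along (i -> j)."""
    num = kappa[(i, j)] * math.exp(-U[(i, j)] / eps)
    den = sum(kappa[(i, l)] * math.exp(-U[(i, l)] / eps)
              for (a, l) in U if a == i)
    return num / den

U = {(0, 1): 1.0, (0, 2): 2.0, (1, 0): 1.0, (2, 0): 0.5}
kappa = {a: 1.0 for a in U}
```

As $\epsilon \to 0$, the probability of jumping from 0 along the min-arc (0 -> 1) tends to one, while the probability of the non-min-arc (0 -> 2) vanishes.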
For simplicity of the presentation below, we adopt
Assumption 4.
The Markov chain associated with the graph $G(S, A, U)$ is irreducible.
Freidlin’s construction of the hierarchy of cycles/Markov chains [15, 16, 17] can be outlined as follows (with minor terminological modifications). We begin with the graph $G_0$ with the set of states $S$ and no arcs. All vertices are called the zeroth order Markov chains. Then all min-arcs from all vertices are added to $G_0$, resulting in the graph $G_1$.
All nontrivial closed communicating classes in $G_1$ (i.e., those consisting of more than one state) are called the first order cycles/Markov chains. The birth timescale of each first order cycle/Markov chain (the reciprocal of its rotation rate) is the maximal expected holding time among its vertices. If there is only one first order cycle/Markov chain, and it contains the whole set of states $S$, the construction is complete. Otherwise, each first order cycle/Markov chain is treated as a macrostate. For each macrostate, one finds the exit rate and the set of exit arcs along which the process escapes from it with probability that remains positive as $\epsilon \to 0$. These exit arcs are the min-arcs from the macrostates. Their weights are modified to make them equal to the exponential factors of the corresponding exit rates. Contracting the macrostates into single supervertices, one obtains a directed forest. The graph $G_2$ is obtained from it by adding all min-arcs from the macrostates with their modified weights. Then one checks for nontrivial closed communicating classes in $G_2$. If there is a unique closed communicating class comprising the whole set of states, the process terminates. Otherwise, it is continued in the same manner as it was done for the graph $G_1$. This recursive procedure will terminate in a finite number of steps with a graph $G_m$. It produces the hierarchy of Freidlin’s cycles/Markov chains of orders $1, \ldots, m$. The one of order $m$ comprises the whole set of states $S$. Recursively restoring all contracted closed communicating classes while keeping the modified arc weights, one obtains the graph $G^\infty$. We will call the transitions corresponding to the arcs of $G^\infty$ typical transitions and associate with each such arc the timescale determined by its (modified) weight.
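In the case with no symmetry, each vertex contributes a single min-arc, so the graph obtained at the first stage is a functional graph, and its nontrivial closed communicating classes are simple directed cycles that can be found by pointer chasing. A sketch (the naming and example weights are ours):

```python
def unique_min_arc_map(U):
    """succ[i] = head of the unique min-arc from i (no-symmetry case assumed)."""
    succ = {}
    for (i, j), u in U.items():
        if i not in succ or u < U[(i, succ[i])]:
            succ[i] = j
    return succ

def first_order_cycles(succ):
    """Cycles of the functional graph i -> succ[i]."""
    cycles, color = [], {i: 0 for i in succ}  # 0 unvisited, 1 on path, 2 done
    for start in succ:
        path, i = [], start
        while color[i] == 0:
            color[i] = 1
            path.append(i)
            i = succ[i]
        if color[i] == 1:                     # the walk closed a new cycle at i
            cycles.append(set(path[path.index(i):]))
        for v in path:
            color[v] = 2
    return cycles
```

For succ = {0: 1, 1: 0, 2: 1}, the only first order cycle is {0, 1}; vertex 2 is attached to it by its min-arc.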
We remark that the birth timescales and exit timescales of the cycles/Markov chains are only partially ordered in the following sense: if $C'$ is a closed communicating class in some $G_k$ containing a supervertex obtained by contracting a closed communicating class $C$, then the birth timescale of $C'$ is greater than that of $C$, and the holding time in $C'$ is greater than that in $C$. However, if $C$ and $C'$ are closed communicating classes in $G_k$ and $G_{k'}$ respectively, where $k < k'$, it is possible that the birth and exit timescales of $C$ are greater than those of $C'$. Furthermore, the order in which arcs are added to $G_0$ to eventually form $G^\infty$ is, in general, neither the increasing nor the decreasing order of their modified or original weights.
In this work, we propose an alternative construction that builds the graph $G^\infty$ from $G_0$ by adding arcs to it in the increasing order of their modified weights. In our construction, the procedure of finding the exit rates from the closed communicating classes is simpler than in Freidlin’s construction due to the order in which the arcs are added.
Definition 3.1.
(Critical exponents and T-graphs) Let $G^\infty$ be the graph obtained as a result of the construction of the hierarchy of Freidlin’s cycles/Markov chains as described above, with (modified) arc weights $U'$. The ordered set of these weights,
$T_1 \le T_2 \le \ldots \le T_M, \qquad (9)$
is the set of critical exponents. The corresponding timescales $e^{T_1/\epsilon}$, …, $e^{T_M/\epsilon}$ are the critical timescales.
Let $t_1 < t_2 < \ldots < t_J$ be the ordered set of distinct numbers in $\{T_1, \ldots, T_M\}$. The Typical Transition Graph or T-graph $\mathcal{T}_k$ is the subgraph of $G^\infty$ containing all arcs of weights up to $t_k$, i.e.,
$\mathcal{T}_k = \left(S, \{(i \to j) \in A(G^\infty) : U'_{ij} \le t_k\}\right). \qquad (10)$
The T-graph $\mathcal{T}_k$ is associated with the range of timescales $e^{t_k/\epsilon} \preceq t \prec e^{t_{k+1}/\epsilon}$ for $1 \le k \le J-1$. The T-graphs $\mathcal{T}_0$ (containing no arcs) and $\mathcal{T}_J$ are associated with the ranges of timescales $t \prec e^{t_1/\epsilon}$ and $t \succeq e^{t_J/\epsilon}$, respectively.
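Given the (modified) weights of the arcs of the final graph, the thresholding in Definition 3.1 yields a nested hierarchy; a direct sketch (the weights below are illustrative):

```python
def t_graphs(U_mod):
    """Distinct critical exponents t_1 < ... < t_J and the T-graphs:
    the k-th T-graph keeps all arcs of (modified) weight <= t_k."""
    thresholds = sorted(set(U_mod.values()))
    return thresholds, [{a for a, u in U_mod.items() if u <= t}
                        for t in thresholds]

U_mod = {(0, 1): 1.0, (1, 0): 2.0, (2, 1): 1.0, (1, 2): 3.5}
t, T = t_graphs(U_mod)
```

Here t = [1.0, 2.0, 3.5]; the first T-graph contains the arcs (0 -> 1) and (2 -> 1), and each T-graph is contained in the next one, which is the nesting used throughout this Section.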
Algorithm 1 in Section 4 builds the hierarchy of the critical exponents and the T-graphs in the case where all critical exponents are distinct and the min-arcs from each vertex and each closed communicating class are unique. In this case, $J = M$, and all closed communicating classes in all graphs $G_k$ are simple directed cycles. Each of these cycles has a unique main state, where the system is found with probability close to one for small enough $\epsilon$ provided that it is in the cycle, and a unique exit arc, via which the system escapes from the cycle with probability close to one for small enough $\epsilon$. Furthermore, the exit rates from all cycles of all orders are distinct. We refer to such a case as a case with no symmetry. In this case, one can subdivide the set of critical exponents into two subsets associated with the set of nonzero eigenvalues and the births of cycles, respectively. Besides, if the pairwise rates are of the form (1), one can obtain sharp estimates for the eigenvalues from the output of Algorithm 1. Finally, the unique hierarchy of the optimal W-graphs can be easily extracted from the found T-graphs.
Algorithm 2, presented in Section 5.2, is designed to handle the case with symmetry, i.e., the one where at least two critical exponents coincide or at least one vertex or closed communicating class has a nonunique min-arc. It computes the set of distinct critical exponents and the hierarchy of the T-graphs.
Algorithm 1 is designed to handle the case with no symmetry. However, it might be impossible to determine the presence of symmetry from the input data and output of Algorithm 1. This can happen if one of the closed communicating classes has more than one min-arc. Hence, the symmetry test should be made at each step of the while-loop in the code. Suppose the code implementing Algorithm 1 detects some symmetry while running, but continues running and terminates normally. In this case, its output will contain a correct set of distinct values of the critical exponents, while the set of graphs will not be the correct set of T-graphs. Nevertheless, one can still extract some important facts about the true T-graphs from its output. Section 6 contains a discussion and a theorem on this subject.
4 Algorithm 1 for the study of the metastable behavior
In this Section, we design Algorithm 1, which (a) finds the set of critical exponents and (b) builds the hierarchy of T-graphs. Throughout this Section, we adopt Assumptions 1–4 and
Assumption 5.
All min-arcs are unique at every stage of Algorithm 1 presented below, and all critical exponents are distinct.
The while-loop in Algorithm 1 is essentially the same as the one in the “single-sweep algorithm” introduced in [12] for finding asymptotic estimates for eigenvalues and the hierarchy of optimal W-graphs. However, Algorithm 1 has a different stopping criterion and is used for a broader, more ambitious purpose. More output data are extracted. Furthermore, the presentations of these algorithms are very different. Here, the description of Algorithm 1 serves the purpose of understanding the metastable behavior of the Markov process. Its recursive structure and its Contract and Expand operators reflect the processes taking place on various timescales. On the contrary, the single-sweep algorithm was presented in [12] as a recipe for programming, where the key procedures were manipulations with various sets of outgoing arcs. It was not recursive and involved no Contract and Expand operators.
4.1 The recursive structure of the algorithm
Assumption 5 implies that a Markov process starting at a vertex $i$ leaves it via the unique min-arc from $i$, denoted by $a_i$, with probability approaching one as $\epsilon \to 0$. For every vertex $i \in S$, we select the unique min-arc $a_i$, sort the set of min-arcs according to their weights in increasing order, and call the sorted set the bucket $B$.
We start with the bucket $B$ containing $n$ arcs and a graph $\mathcal{T}$ containing all vertices of $G$ and no arcs. At each step of the main cycle of Algorithm 1, the minimum weight arc in the bucket is transferred to the graph $\mathcal{T}$.
At step one, the arc $(i \to j)$ of the minimum weight in the whole graph is transferred to $\mathcal{T}$. Set $T_1 = U_{ij}$ and save the current graph as the T-graph $\mathcal{T}_1$. The graph $\mathcal{T}_1$ coincides with the optimal W-graph with $n-1$ sinks. Theorem 2.4 implies that the eigenvalue $\lambda_{n-1}$ is approximated by $-\kappa_{ij} e^{-U_{ij}/\epsilon}$. Hence, the corresponding exponent and prefactor are $U_{ij}$ and $\kappa_{ij}$.
At step two, we transfer the next arc from the bucket to the graph $\mathcal{T}$, set $T_2$ equal to its weight, and save the current graph as the T-graph $\mathcal{T}_2$. If $\mathcal{T}_2$ contains no cycles, the optimal W-graph with $n-2$ sinks coincides with $\mathcal{T}_2$, and the eigenvalue $\lambda_{n-2}$ is approximated analogously by Theorem 2.4.
We continue transferring arcs from $B$ to $\mathcal{T}$ in this manner until the addition of a new arc creates a cycle in the graph $\mathcal{T}$. This will happen sooner or later because, if all $n$ arcs in $B$ are transferred to the graph with $n$ vertices, a cycle must appear in it (see Lemma B.2 in Section 6).
Suppose a cycle $C$ is formed after the arc of weight $w^*$ was added to the graph $\mathcal{T}$. Abusing notation, we treat $C$ as a graph or as its set of vertices depending on the context. We need to identify the exit arc from $C$, i.e., the arc $(i \to j)$ with $i \in C$ and $j \notin C$ such that the probability to exit $C$ via $(i \to j)$ tends to one as $\epsilon \to 0$. We discard all arcs with tails and heads both in $C$ and modify the weights and prefactors of all remaining outgoing arcs from $C$ according to the following rule:
$U_{ij} \leftarrow U_{ij} - u_i + w^*, \qquad i \in C, \; j \notin C, \qquad (11)$
with the prefactors updated correspondingly.
Note that none of the arcs with modified weights is in the bucket at this moment, and all min-arcs with tails in $C$ have already been removed from $B$ and added to the graph $\mathcal{T}$. The update rule (11) is of crucial importance for Algorithm 1. It is consistent with the one in [16]. A simple explanation for it is provided in Section 4.3 below. The modified weight shows what would be the total weight of the graph obtained from the last found optimal W-graph by removing the min-arc from $i$ and adding the arc $(i \to j)$ (see [12] for more details). Then we contract the cycle $C$ to a single vertex (a supervertex). Finally, among the arcs with tails in $C$, we find the arc with the minimal modified weight and add it to the bucket $B$. By Assumption 5, such an arc is unique. The idea of such a contraction is borrowed from the Chu-Liu/Edmonds algorithm [13, 14] for finding the optimal branching when the root is given a priori.
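The contraction step can be sketched as follows. We take the update rule in the form $U'_{ij} = U_{ij} - u_i + w^*$, where $u_i$ is the min-arc weight from $i$ and $w^*$ is the weight of the arc that closed the cycle; this reading of Eq. (11) is an assumption of the sketch (it reproduces the two-state computation explained in Section 4.3) and should be checked against the equation itself:

```python
def contract_cycle(U, cycle, w_star):
    """Contract `cycle` (a set of vertices) into a single supervertex.
    Arcs inside the cycle are discarded; an outgoing arc (i -> j), i in
    cycle, gets the modified weight U_ij - u_i + w_star (assumed rule)."""
    c = frozenset(cycle)
    u_min = {i: min(u for (a, _), u in U.items() if a == i) for i in cycle}
    U_new = {}
    for (i, j), u in U.items():
        if i in cycle and j in cycle:
            continue
        tail = c if i in cycle else i
        head = c if j in cycle else j
        w = u - u_min[i] + w_star if i in cycle else u
        # if contraction creates parallel arcs, keep the lighter one
        if (tail, head) not in U_new or w < U_new[(tail, head)]:
            U_new[(tail, head)] = w
    return c, U_new
```

For the two-state cycle a <-> b with weights U_ab = 1, U_ba = 2 (so w* = 2) and exits U_aj = 3, U_bj = 5, the modified exit weights are 4 and 5; the minimal modified weight identifies a -> j as the exit arc.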
We continue adding arcs and contracting cycles in this manner. Note that the index of the number $T_k$ is equal to the number of steps performed or, equivalently, to the number of arcs removed from the bucket and added to the current graph; the current recursion level is the number of cycles contracted into supervertices. All arc additions not leading to cycles are associated with eigenvalues, the index of the eigenvalue estimate decreasing by one at each such step.
The main cycle terminates when the bucket becomes empty. This stopping criterion allows us to obtain the whole collection of the critical timescales and the whole hierarchy of T-graphs. Suppose the main cycle terminates at recursion level $r$. Then Algorithm 1 returns to the previous recursion level, $r-1$, and expands the last contracted supervertex back into its cycle. Then, if $r-1 > 0$, Algorithm 1 returns to recursion level $r-2$ and expands the corresponding supervertex back into its cycle. And so on, until recursion level zero is reached. After that, one can extract the optimal W-graphs out of the appropriate T-graphs.
Below, Algorithm 1 is given in the form of pseudocode. The operators Contract and Expand are described in Section 4.2 below.
Algorithm 1
Initialization: Set the step counter, the cycle (recursion depth) counter, and the eigenvalue index. Prepare the bucket $B$, i.e., for every state $i \in S$, find the min-arc $a_i$, and then sort the set $\{a_i\}_{i \in S}$ according to the arc weights in increasing order.
The graph $G$ is the original graph $G(S, A, U)$.
Set the graph $\mathcal{T}$ to contain all vertices of $G$ and no arcs. Set the recursion level to zero.
The main body of the Algorithm: Call the function FindTgraphs with the bucket $B$, the graphs $G$ and $\mathcal{T}$, and the counters as arguments.
Function FindTgraphs
while $B$ is not empty and $\mathcal{T}$ has no cycles
(1) Increase the step counter: $k \leftarrow k + 1$;
(2) Transfer the minimum weight arc from the bucket $B$ to the graph $\mathcal{T}$;
(3) Set $T_k$ equal to the weight of the transferred arc;
(4) Save the current graph $\mathcal{T}$ as the T-graph with index $k$;
(5) Check whether $\mathcal{T}$ has a cycle; if so, denote the cycle by $C$;
if $\mathcal{T}$ contains no cycle
(6) Decrease the eigenvalue index and record the corresponding eigenvalue estimate given by Theorem 2.4;
(7) Denote by $s$ the sink of the connected component of $\mathcal{T}$ containing the tail of the transferred arc;
end if
end while
if a cycle was detected at step (5)
(8) Save the index at which the cycle arose: set ;
(9) Remove the arcs with both tails and heads in (if any);
if the set of arcs of with tails in and heads not in is not empty
(10) Modify weights and prefactors of all arcs with tails in and heads not in according to Eq. (11);
(11) Contract into a supervertex :
= Contract; = Contract;
(12) Denote the min-arc from by and add it to ;
(13) Call the function FindTgraphs;
(14) Expand the supervertex back into the cycle : for = Expand; end for
end if
end if
end
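The cycle check in step (5) is cheap because every vertex of a T-graph has at most one outgoing arc: a newly added arc closes a cycle if and only if following successors from its head leads back to its tail. A hedged Python sketch (names are illustrative, not from the paper):

```python
# Cycle detection for a graph in which each vertex has at most one
# outgoing arc, as in the T-graphs of Algorithm 1. Adding arc (i, j)
# closes a cycle iff the successor chain starting at j returns to i.
def cycle_through(succ, i, j):
    """succ: dict mapping each tail to its unique head.
    Returns the list of cycle vertices created by adding arc (i, j),
    or None if no cycle arises."""
    path = [i]
    v = j
    while v in succ:
        if v == i:              # returned to the tail: cycle closed
            return path
        path.append(v)
        v = succ[v]
    return path if v == i else None

succ = {2: 3, 3: 4}
print(cycle_through(succ, 4, 2))   # adding (4, 2) closes the cycle [4, 2, 3]
print(cycle_through(succ, 1, 2))   # adding (1, 2) creates no cycle: None
```

Since the successor chain is followed at most once per vertex, each check costs at most linear time in the current graph size.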
The flow chart of Algorithm 1 is shown in Fig. 2. At each recursion level, the T-graphs obtained in the while-cycle are placed into the green boxes, while the ones obtained by the Expand operator are placed into the yellow boxes. The number is the index of the last step of Algorithm 1. At , cycle is detected in . Hence steps (8) and (9) are performed. However, since it is the last step, the set of arcs in with tails in and heads not in is empty. Therefore, steps (10)–(14) are not executed. Instead, the algorithm returns to the previous recursion level and lands at step (14), where the function Expand is called.
The output of Algorithm 1 consists of several datasets.
- The set of critical exponents defining the critical timescales .
- The hierarchy of T-graphs , indicating the most likely transitions to observe up to the timescales , where , , …, respectively.
- The set of exponents and prefactors determining sharp estimates of eigenvalues according to . Note that .
- The set of sinks and . The optimal W-graph has the sinks , where the sink is defined as the sink . The sinks are the sinks of those connected components of the optimal W-graphs that absorb the connected components with sinks as the graphs are created. The sets of vertices in the connected components of the optimal W-graph coincide with those of the T-graph . An explanation of how one can extract the optimal W-graphs from the graphs is given in Section 4.4.
- The set of step numbers at which the cycles are created.
Remark 4.1.
The number of arcs in the bucket is reduced by one after each step at which no cycle is created, and remains the same otherwise. Let be the smallest step index such that the number of arcs in the bucket at the end of step reaches one. The eigenvalue index switches to during step . A new cycle will be obtained at every step such that . For all , the T-graphs are connected, while for all , the are not connected.
Remark 4.2.
The total number of steps and the total number of cycles satisfy the relationship
(12) 
The maximal recursion level . Hence .
Remark 4.3.
In the case of a time-reversible Markov chain, the total number of Freidlin’s cycles of nonzero orders is [10]. Therefore, the total number of steps made by Algorithm 1 in this case is . In the general case, the total number of cycles satisfies
(13) 
Indeed, let us construct a directed rooted tree of cycles, whose leaves are the vertices, whose root is the cycle , whose other nodes correspond to the cycles , , and whose arcs are directed toward the root. Each node of this tree except the root has exactly one outgoing arc. Hence, the total number of arcs is . On the other hand, the nodes corresponding to the cycles , have at least two incoming arcs. Therefore, the number of arcs must satisfy
This inequality implies that . Besides, Assumption 4 implies that there must be at least one cycle. This proves Eq. (13). Hence, the maximal possible recursion depth is achieved if and only if the Markov chain is time-reversible.
Remark 4.4.
The stopping criterion of Algorithm 1 can be modified depending on the goal. If one would like to use Algorithm 1 only for finding estimates for the eigenvalues and the optimal W-graphs, it suffices to stop the while-cycle as soon as . If one would like to find all T-graphs for the timescales , one needs to stop the while-cycle as soon as .
Remark 4.5.
For programming purposes, if a cycle is encountered, it is more convenient to merge the sets of outgoing arcs from the vertices in instead of contracting into a single supervertex, as is done in the single-sweep algorithm in [12].
4.2 The functions Contract and Expand
The function = Contract maps the graph onto the graph as follows. All vertices of belonging to the cycle are mapped onto a single vertex of . More formally, let and . Let be the subset of vertices lying in the cycle , and let be the subset of arcs of defined by
Then , , if and , and , if and .
The function = Expand is the inverse function of = Contract. If , , then , , if and , and , if and .
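The pair Contract/Expand can be sketched directly on arc sets. The sketch below is illustrative only: it keeps, for each contracted arc, a memo of its original endpoints so that Expand can invert the contraction exactly; the supervertex label and all names are assumptions, not notation from the paper.

```python
# A minimal sketch of Contract and Expand on a list of arcs (tail, head).
def contract(arcs, cycle, s):
    """Map every endpoint lying in the cycle onto the supervertex s.
    Arcs internal to the cycle become self-loops at s and are dropped.
    Each contracted arc is paired with its original, so expand() can
    restore the graph exactly."""
    f = lambda v: s if v in cycle else v
    return [((f(i), f(j)), (i, j)) for (i, j) in arcs if f(i) != f(j)]

def expand(pairs):
    """Inverse of contract: restore the original endpoints."""
    return [orig for (_, orig) in pairs]

pairs = contract([(1, 2), (2, 3), (3, 1), (3, 4), (5, 1)], {1, 2, 3}, "s")
# the internal cycle arcs (1,2), (2,3), (3,1) are dropped;
# (3,4) becomes ("s",4) and (5,1) becomes (5,"s")
```

Keeping the memo of original endpoints is what makes Expand a true inverse: contraction alone forgets which cycle vertex an outgoing or incoming arc was attached to.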
4.3 An explanation of the update rules for arc weights and prefactors
In this Section, we explain the update rules for arc weights and prefactors given by Eq. (11) (also see [15, 16, 18]). A cycle
appears in Algorithm 1 if and only if the min-arcs from the vertices , are , and the min-arc from is . Let us restrict the dynamics of the Markov chain to the cycle and, for each state, neglect the transition rates of smaller exponential orders than , which is for sufficiently small . Then we obtain the following generator matrix approximately describing the dynamics within the cycle :
(14) 
Solving , , we find the approximation to the invariant distribution in :
(15) 
Without loss of generality, we assume that the last arc added to the cycle is . Then for . Multiplying the numerator and the denominator in Eq. (15) by and neglecting small summands in the denominator, we obtain
(16) 
The quasi-invariant distribution (i.e., the approximation to the invariant distribution for the dynamics restricted to the subset of states ) allows us to obtain sharp estimates for the exit rates from the cycle via arcs with tails in . For any arc with and , the exit rate from through the arc is approximated by
(17) 
If one treats the cycle as a macrostate, i.e., contracts it into a single vertex, then the effective exit rate from it is given by Eq. (17). Recalling that is the last added arc, one readily reads off the update rule for the arc weights and the prefactors given by Eq. (11).
4.4 Extraction of optimal W-graphs
Let us imagine that we have abolished the Contract and Expand operators in Algorithm 1 and manipulate the arc sets instead, as is done in the single-sweep algorithm [12], i.e., as suggested in Remark 4.5 in Section 4.1. That is, instead of contracting a cycle into a supervertex, we discard all arcs with both tails and heads in that are not in the current graph , modify the weights and the prefactors of all outgoing arcs with tails in and heads not in according to Eq. (11) and denote the set of such arcs by , and find the arc of minimal weight in and add it to the bucket ; this arc becomes the min-arc for all vertices in . The weight of any arc modified according to the update rule Eq. (11) is equal to the increment of the total weight of the graph obtained from the current optimal W-graph by replacing the arc with and adding the arc that led to the current cycle. This fact and the weak nested property of the optimal W-graphs (Theorem 2.5) guarantee that the whole hierarchy of the optimal W-graphs , …, can be extracted from the T-graphs , …, built by Algorithm 1.
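The merge-based alternative described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: arcs are a dictionary mapping `(tail, head)` to weight, and the Eq. (11) weight update (whose exact form also involves the cycle's exit data) is abstracted as a user-supplied function, since only its role in the procedure matters here.

```python
# A hedged sketch of merging a cycle's arc sets instead of contracting
# it into a supervertex (cf. Remark 4.5). The Eq. (11) update is passed
# in as update_weight; here it is a stand-in taking only the old weight.
def merge_cycle(arcs, cycle, update_weight):
    """Discard arcs internal to the cycle, apply the weight update to
    arcs leaving the cycle, and return (remaining arcs, min exit arc).
    The min exit arc is the one added to the bucket; it becomes the
    min-arc for every vertex of the cycle."""
    out = {}
    for (i, j), w in arcs.items():
        if i in cycle and j in cycle:
            continue                          # drop internal arcs
        out[(i, j)] = update_weight(w) if i in cycle else w
    exits = {a: w for a, w in out.items() if a[0] in cycle}
    min_exit = min(exits, key=exits.get) if exits else None
    return out, min_exit

arcs = {(1, 2): 1.0, (2, 1): 2.0, (1, 3): 5.0, (2, 3): 4.0, (4, 1): 3.0}
remaining, min_exit = merge_cycle(arcs, {1, 2}, lambda w: w - 1.0)
# internal arcs (1,2) and (2,1) are discarded; the cheapest updated
# exit arc from the cycle {1, 2} is (2, 3)
```

When `min_exit` is placed into the bucket, the recursion proceeds exactly as with a contracted supervertex, but no vertex relabeling is ever needed.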
Recall that is the step number at which the eigenvalue counter switches to : , where is the recursion depth at step in Algorithm 1. For convenience, we denote the unique absorbing state of the T-graph by , i.e., . The optimal W-graphs can be extracted from the corresponding T-graphs for . We emphasize that is a fully expanded graph, i.e., its set of vertices is . The set of sinks of is