# Path statistics, memory, and coarse-graining of continuous-time random walks on networks

###### Abstract

Continuous-time random walks (CTRWs) on discrete state spaces, ranging from regular lattices to complex networks, are ubiquitous across physics, chemistry, and biology. Models with coarse-grained states, for example those employed in studies of molecular kinetics, and models with spatial disorder can give rise to memory and non-exponential distributions of waiting times and first-passage statistics. However, existing methods for analyzing CTRWs on complex energy landscapes do not address these effects. We therefore use statistical mechanics of the nonequilibrium path ensemble to characterize first-passage CTRWs on networks with arbitrary connectivity, energy landscape, and waiting time distributions. Our approach is valuable for calculating higher moments (beyond the mean) of path length, time, and action, as well as statistics of any conservative or non-conservative force along a path. For homogeneous networks we derive exact relations between length and time moments, quantifying the validity of approximating a continuous-time process with its discrete-time projection. For more general models we obtain recursion relations, reminiscent of transfer matrix and exact enumeration techniques, to efficiently calculate path statistics numerically. We have implemented our algorithm in PathMAN, a Python script that users can easily apply to their model of choice. We demonstrate the algorithm on a few representative examples which underscore the importance of non-exponential distributions, memory, and coarse-graining in CTRWs.

## I Introduction

We can model many dynamical systems in physics, chemistry, and biology as random walks on discrete state spaces or network structures. For example, random walks can represent proteins folding on a coarse-grained network of conformational states Noe2009 (); Lane2011 (), particles diffusing in disordered, fractal-like media benAvraham2000 (); Redner2001 (), populations evolving in DNA or protein sequence space Weinreich2006 (); Manhart2015 (), and cells differentiating across epigenetic landscapes of regulatory states Enver2009 (); Lang2014 (). One can also use random walks to probe the structure of empirical complex networks, such as protein-protein interaction networks or the World Wide Web Albert2002 (); Gallos2007 (); Condamin2007 (). The central problem in these models is characterizing the statistical properties of paths taken by the system as it evolves from one state to another, especially for systems out of equilibrium. This entails understanding not only the distribution of lengths and times for these paths, but also their distribution in the state space, which may reveal bottlenecks and indicate the diversity of intermediate pathways.

There is extensive literature for random walks on lattices Montroll1965 (); Weiss1994 (); Redner2001 (), fractals Redner2001 (); benAvraham2000 (), and random and scale-free networks Albert2002 (); Bollt2005 (); Condamin2007 () in the absence of an energy landscape or other objective function. Much of this work has focused especially on the scaling behavior of first-passage times and the mean square displacement, the latter being important to identifying anomalous diffusion benAvraham2000 (). More complex models, especially those with energy landscapes derived from empirical models or experimental data, generally require numerical approaches. One such approach is transition path theory Metzner2009 (), which relies on numerical solutions of the backward equation. This technique has been used to study Markov state models of molecular kinetics such as protein folding Noe2009 (); Lane2011 ().

Standard transition path theory, however, is not applicable to general continuous-time random walks Weiss1994 () (CTRWs) where states may have non-exponential waiting time distributions, nor does it address the complete distribution of first-passage times beyond the mean. These problems are important in many systems. For example, molecular Markov state models require grouping large numbers of microscopic conformations of molecules into a small number of effective states Prinz2011 (); the stochastic dynamics are then analyzed on this effective model Noe2009 (); Lane2011 (). However, this coarse-graining is known to lead to qualitative differences with the underlying microscopic dynamics Prinz2011 (). In particular, the loss of information due to coarse-graining can lead to the appearance of memory, manifested as non-exponential waiting time distributions, in the coarse-grained states. Indeed, there is evidence of non-exponential distributions of time in protein conformation dynamics Sabelko1999 (); Yang2003 () and enzyme kinetics Flomenbom2005b (); Reuveni2014 (). Non-exponential distributions can also arise from spatial disorder, as in glassy systems Campbell1988 (); Angelani1998 (). Other linear algebra-based methods have been developed to treat general CTRWs Hunter1969 (); Yao1985 (); Harrison2002 (), but such methods are complicated and provide relatively little physical insight.

An alternative, more intuitive approach to CTRWs uses the path representation: statistical properties of CTRWs are decomposed into averages over the ensemble of all possible stochastic paths through state space. Some analytical results with this approach for arbitrary energy landscapes and waiting time distributions have been obtained, but only for 1D lattices, due to the difficulty of enumerating paths Flomenbom2005a (); Flomenbom2007a (); Flomenbom2007b (). On the other hand, path sampling methods Harland2007 () are able to treat arbitrary network topologies, but these methods have not been developed for non-exponential waiting time distributions, and in any case sampling is likely to be inefficient for calculating higher moments of path statistics, which are crucial when non-exponential distributions are expected.

Here we develop a generalized formalism for the path ensemble of a CTRW on a network of discrete states, regardless of their connectivity, energy landscape, or intermediate state waiting time distributions. In Sec. II we use statistical mechanics of the nonequilibrium path ensemble for a CTRW to obtain expressions for arbitrary moments of path statistics including path length, time, action, and any conservative or nonconservative force along a path. We use this formalism to deduce general relationships among the distributions of path time, length, and action, as well as several exact relationships for the case of homogeneous networks. In Sec. III we derive recursion relations, reminiscent of transfer matrix and exact enumeration techniques, to efficiently calculate various path statistics numerically, including distributions of paths in the state space. We have implemented our approach in a user-friendly Python script called PathMAN (Path Matrix Algorithm for Networks), freely available at https://github.com/michaelmanhart/pathman, that users can apply to their own models.

In Sec. IV we demonstrate the numerical algorithm on a few examples. After illustrating some basic concepts on a simple 1D random walk, we apply our method to a 1D comb to show how coarse-graining can lead to the appearance of memory, in the form of non-exponential waiting time distributions. We quantify the effect of the memory on the distribution of total path times. We further demonstrate the effect of coarse-graining in a 2D double-well potential, from which we deduce some general properties of memory arising from coarse-graining. Lastly, we use our method to show how spatial disorder can also lead to non-exponential distributions of path statistics in the 2D random barrier model.

## Ii Distributions in the path ensemble

### ii.1 Continuous-time random walks and memory

Consider a stochastic process on a finite set of states: the process makes discrete jumps between states with continuous-time waiting at each state in between jumps. Such a process is known as a continuous-time random walk Weiss1994 () (CTRW), and it can describe many physical or biological systems, such as a protein traversing a coarse-grained network of conformations toward its folded state Noe2009 (); Lane2011 () or a particle traveling through a disordered material benAvraham2000 (); Redner2001 (). The time the system waits in a state before making a jump to is distributed according to . In many models this distribution depends only on the current state and not on the destination , so that ; such waiting time distributions are known as “separable” Haus1987 (). We will mostly assume separable distributions throughout this paper. However, since non-separable distributions arise crucially in coarse-grained models, we will also briefly discuss how to extend our results to the non-separable case. Let the raw moments of the waiting time distributions be denoted as

(1) |

We assume every state has at least one finite moment for ; the zeroth moment is always by normalization. In the special case of a discrete time process, and the moments are .

Given the system has finished waiting in and makes a jump out, the probability of jumping to is given by the matrix element , where is an matrix and denotes an -dimensional vector with at the position corresponding to the state and 0 everywhere else. The jump probabilities out of each state are therefore normalized according to , with by definition (since a jump must leave the current state). The matrix imposes a network structure over the states in , with edges directed and weighted by the entries in . We can think of the jump process alone as a discrete-time projection of the model, since it describes the system’s dynamics if we integrate out the continuous waiting times.

An ordinary Markov process is a special case of the above CTRW construction. A continuous-time Markov process is typically defined by a rate matrix such that in a small time interval , the probability of making a jump is . Therefore the probability of making the jump , given that the system makes any jump out of during , is

(2) |

which defines the relation between the Markov rate matrix and the jump matrix . The probability per unit of waiting time in and then making a jump out is given by

(3) |

The waiting time distribution is then the continuous limit of Eq. 3:

(4) |

where

(5) |

is the mean waiting time in . Hence waiting times in a Markov process always have an exponential distribution. The higher moments of exponential waiting times are completely determined by the mean: .

In general, processes with exponential distributions of times are important because they are memoryless in the following sense: the probability of taking at least time , given the process has already taken at least time , is the same as taking at least in the first place. That is, the system “forgets” the time it has already taken. Mathematically this means that

(6) |

where is the complementary cumulative distribution function. The only function satisfying Eq. 6 is a simple exponential , from which it follows that . For waiting time distributions , non-exponential functions are therefore indicative of memory within a state: how much longer the system tends to wait in that state depends on how long it has already waited. In contrast to ordinary Markov models where is always exponential, models where may be non-exponential are sometimes known as semi-Markov processes.

### ii.2 The ensemble of first-passage paths

We approach CTRWs using the ensemble of first-passage paths Flomenbom2005a (); Sun2006 (); Flomenbom2007a (); Flomenbom2007b (); Harland2007 (); Manhart2013 (); Manhart2014 (), which first reach a particular final state or a set of final states from some initial conditions. We are interested in statistical properties of this ensemble such as its distributions in length, time, and space. In addition to situations where first-passage properties themselves are of interest, first-passage paths constitute fundamental building blocks of a stochastic process since the full propagator and steady state can in principle be derived from them Redner2001 ().

Let be the set of final states, which we will treat as absorbing ( for all and ) so that the first-passage condition is satisfied. Define a path of length to be an ordered sequence of states: . Denote the probability distribution over initial states as . Then the probability density of starting in a state and completing the path at exactly time is given by

(7) |

where are the intermediate waiting times and is the Dirac delta function. The probability of completing the path irrespective of how much time it takes is then

(8) |

The time-independent path probability is convenient because we can express many path statistics of interest as averages over this distribution, analogous to averages over the Boltzmann distribution in ordinary statistical mechanics Harland2007 (); Manhart2013 (); Manhart2014 (). For example, let be a functional that measures some property of the path . We use angle brackets to denote the average of this quantity over the path ensemble:

(9) |

where the sum is over all first-passage paths of any length ending at states in . Note that the partition function of the first-passage path ensemble, , always equals 1, since the process must reach one of the final states eventually. In this manner we can calculate nonequilibrium (first-passage) properties of the system as equilibrium properties of the path ensemble, which is time-independent by construction.

### ii.3 Distribution of path lengths

The simplest path property is its length , i.e., the discrete number of jumps along the path. The mean path length is then

(10) |

Functionals for the higher moments of path length are simply powers of the length functional,

(11) |

and the path length probability distribution is

(12) |

where is the Kronecker delta. Note that the distribution of path lengths depends only on the jump matrix and not on the waiting time distributions , and hence it can be thought to characterize the discrete-time projection of the underlying continuous-time stochastic process. In Appendix A we show that the distribution of path lengths is typically exponential asymptotically:

(13) |

where is a constant of order 1 and is the mean path length.

### ii.4 Distribution of path times

In contrast to the discrete length of a path, there is also the continuous time of the path that accounts for the variable waiting times at the intermediate states. The distribution of total path times (first-passage time distribution) is

(14) |

Unlike the path length distribution, the path time distribution depends on both the jump matrix as well as the waiting time distributions . We cannot evaluate for arbitrary waiting time distributions ; however, we can express its moments as simple averages over the time-independent path ensemble using path functionals (cf. Eq. 9). That is, using Eqs. 7 and 14, we obtain

(15) |

where the functional for the th moment of path time is

(16) |

Each summation in the multinomial expansion is from to subject to the constraint . For example, the first few moments are

(17) |

Note that Eq. 16 implies that if any accessible intermediate state has a divergent waiting time moment of order , then all path time moments of order and higher must be divergent as well.

### ii.5 Path action and a general class of path functionals

For many systems it is important to determine whether their dynamics are highly predictable or highly stochastic; that is, whether the system is likely to take one of a few high-probability paths every time, or whether there is a more diverse ensemble of paths with similar probabilities. One way to quantify this notion uses the path action, defined as

(18) |

so that path probability is . As in classical and quantum mechanics, paths with minimum action dominate while paths of large action are suppressed. Note that like path lengths, action depends only on the jump probabilities and not on the waiting time distributions.

The mean path action is the Shannon entropy of the path distribution Filyukov1967 () (we ignore the path-independent contribution from the initial condition):

(19) |

This is consistent with the idea that the path action distribution tells us about the diversity of paths in the ensemble: low entropy (small mean action) means that a few paths with large probability dominate the process, while large entropy (large mean action) means that a diverse collection of low-probability paths contribute. The distribution of actions around this mean may be non-trivial, however. For instance, even if the mean action is large, the variance around it could either be small (the system must traverse one of the low-probability paths) or large (the system may traverse paths with a wide range of probabilities). We can characterize the action distribution by considering its higher moments. The functional for the th moment of path action is

(20) |

so the total moments of the path action distribution are

(21) |

The action functionals (Eq. 20) share a similar multinomial form with the time functionals (Eq. 16). This leads us to consider a more general class of path functionals with this form. Consider a path functional that sums some property over jumps in a path (or edges in the network), so that for a path of length ,

(22) |

In the case of action, . Equation 22 is a discretized line integral along the path , which suggests thinking of as representing a force acting on the random walker as it traverses a path. The statistics of such forces over paths are especially interesting when the force is nonconservative, i.e., the line integral depends on the whole path and not just on the end points. Non-transitive landscapes or non-gradient forces with this property have been considered in evolutionary theory Mustonen2010 () and biochemical networks Wang2008 (). However, even for conservative forces, the distribution of the line integral may be non-trivial over the path ensemble if there are multiple initial and final states. The moment functionals for any such quantity are again of the multinomial form:

(23) |

This suggests that methods for calculating time or action moments can be applied to any path property in this general class.

### ii.6 Path statistics on a homogeneous network

We now consider these statistics of path length, time, and action in the simple case of a network with homogeneous properties. We first assume that the waiting time distributions are identical for all states, with raw moments and cumulant moments . We do not assume anything about the jump matrix (i.e., the network connectivity). In Appendix B we derive an exact relation between path time and length moments for arbitrary :

(24) |

where are the partial Bell polynomials Comtet1974 (). For example, the first few moments are

(25) |

Note that the th time moment depends on all length moments up to . Equation 24 holds for the cumulants and as well (Appendix B). In Appendix C we present an alternative argument for the first two moments of Eq. 25 using the central limit theorem, and in Appendix D we study the special case when is exponential.

As these results show, we can think of path time as a convolution between path length and the intermediate waiting times: the variation in total path times arises from both variation in path lengths as well as variation in the waiting times. If is a delta function (discrete-time process), then for (no variation in waiting times), and exactly: path lengths and times are identical up to an overall scale. This is consistent with our previous notion that the path length distribution fully describes the discrete-time projection of the process. However, even with continuous-time distributions , the approximation , and therefore the approximate equivalence of the discrete- and continuous-time processes, may still hold if the waiting times are not too broadly dispersed (so the higher moments of are not too large). We can make this observation more quantitative by expanding Eq. 24 as

(26) |

where

(27) |

is the waiting time coefficient of variation (CV), i.e., the standard deviation divided by the mean. The CV measures the relative dispersion of a distribution; it always equals 1 for exponential distributions. As with Eq. 24, Eq. 26 holds for the cumulants and as well.

Equation 26 implies that path length and time moments will be approximately proportional, and hence the whole distributions should be similar, if

(28) |

The quantity is typically of the order of the inverse mean path length ; in particular this is true when path lengths have an exponential distribution, which is generally the case asymptotically (Appendix A). An important exception, however, is if lengths have a Poisson distribution, so that in the cumulant version of Eq. 26. Apart from this special case, the condition of Eq. 28 is equivalent to

(29) |

that is, the waiting time distribution must be sufficiently narrow compared to the mean path length. In many cases we expect this to hold, since for exponential-like waiting time distributions and the mean path length is usually very large. We will investigate the validity of this condition in later examples.

We also consider path action on a homogeneous network. Path action depends only on the jump matrix and not on the waiting time distributions , so as a simple example we take all states in to have the same number of outgoing jumps (nearest neighbors on the network) and all such jumps to have equal probability . Therefore the probability of a path is , and the action is . This means that the distribution of path actions is exactly equivalent to that of path lengths (rescaled by a factor of ), and the moments are

(30) |

## Iii Matrix formulation and numerical algorithm

Besides gaining general insights into the relationships between distributions of path lengths, times, and actions, the path ensemble formalism is convenient because we can efficiently calculate many ensemble averages using recursion relations that implicitly sum over all paths. We now derive these relations and show how to implement them numerically.

### iii.1 Recursion relations

We reformulate the problem in terms of matrices to express the sums over paths more explicitly. Let be an matrix ( is the number of states in ) such that the matrix element is the th time moment of all paths of exactly length from to . In particular, the zeroth-order matrix gives the total probability of all paths going from to in exactly jumps. The initial condition is

(31) |

where is an identity matrix. If our path ensemble is the set of first-passage paths to final states with an initial distribution vector , then

(32) |

is the th time moment for all paths of exactly length , and

(33) |

is the moment averaged over paths of all lengths. This expression illustrates how to express the previous path ensemble averages in the matrix formulation.

The key advantage of the matrices is that they obey the following recursion relation (Appendix E):

(34) |

where is an matrix with waiting time moments for each state along the diagonal:

(35) |

Appendix E also shows how this recursion relation for the path time moments generalizes to the case of non-separable waiting time distributions . For the total probability (), the recursion relation of Eq. 34 is simply multiplication by the jump matrix: , since the total probability of going from one state to another in exactly jumps must be given by the product of the jump matrices .

Owing to the similar multinomial form of their path functionals (compare Eqs. 16 and 20), the path action moments obey a similar recursion relation. Define to be the th action moment of all paths of length from to , so that

(36) |

In Appendix E we show that these matrices obey the recursion relation

(37) |

where the matrix is defined so that

(38) |

In fact, if is the matrix such that

(39) |

for any path functional in Eq. 22, it obeys the recursion relation

(40) |

where (Appendix E). Therefore recursion relations of this form extend to a wide class of path statistics.

### iii.2 Transfer matrices

To calculate the th moment of time or action, we must carry out the recursion relation of Eq. 34 or 37 for all moments up to . We can unify all these steps into a single transfer matrix operation convenient for numerical use. Let be the maximum moment of interest. Define the -dimensional column vector as a concatenation of for all :

(41) |

Define the basis vectors for and so that the entry of is the th time moment at state at the th jump: . We similarly define the action vector

(42) |

Now define the matrices

(43) |

(44) |

These recursion relations have the solutions

(45) |

where the initial conditions are

(46) |

Here each represents a zero column vector of length . We can think of and as transfer matrices that iteratively generate sums over the path ensemble to calculate moments. This is analogous to transfer matrices in spin systems that generate the sums over spin configurations to calculate the partition function Yeomans1992 (). The zeroth-order version of this formalism, which simply calculates powers of the jump matrix , is equivalent to the exact enumeration method for discrete-time random walks Majid1984 (); benAvraham2000 ().

We can obtain most path statistics of interest by various matrix and inner products on these vectors. Define the cumulative moment vectors

(47) |

(48) |

These represent the total th moments of time and action for all paths through each state , but weighed by the number of visits to that state since the sum over counts a path’s contribution each time it visits . In the case of ,

(49) |

since (Eqs. 34 and 37). This is actually the average number of visits to a state , since the probability of a path is counted each time it visits . For an intermediate state , multiplying the mean number of visits by the mean waiting time gives the average time spent in . When is a final state in , the random walk can only visit it once (if it absorbs at that final state) or zero times (if it absorbs at a different final state), and thus the average number of visits equals the probability of reaching that final state (commitment probability).

If there are multiple final states in , we often wish to sum path statistics over all of them. Define the -dimensional row vector (with 1 at the position for each final state and 0 everywhere else) and the matrix

(50) |

where each is a zero row vector of length . Multiplying this matrix on a corresponding vector will sum over all final states for each moment, leaving an -dimensional vector with the total moments. For example,

(51) |

where we use the shorthand for the total th time moment absorbed at the th jump. Note that is the probability of reaching any of the final states in exactly jumps. Thus this method allows us to calculate the entire path length distribution. On the cumulative time vector , returns the total time moments:

(52) |

where is total time moment over all paths. The matrix similarly acts on the action vectors and :

(53) |

where is the th action moment absorbed in all final states at the th jump, and is the total th action moment.

Finally, for any function of state , we can calculate the average value of that function at the th intermediate jump. Define the -dimensional row vector

(54) |

where there are zero row vectors , each of length . The row vector acts on to return the value of averaged over the probability distribution across all intermediate states at the th jump:

(55) |

For example, if , tells us the unconditional mean time spent at the th intermediate jump. If is set to a position in space corresponding to state (for systems that allow embedding of states into physical space), is the average position at the th intermediate jump, which over all traces the average path of the system.

### iii.3 Convergence and asymptotic behavior of path sums

To numerically calculate the foregoing matrix quantities, we must truncate the sums over path lengths at some suitable cutoff . If there are no loops in the network, then the jump matrix is nilpotent, meaning there is a maximum possible path length such that for all . In this case all sums converge exactly after jumps. If the network has loops, paths of arbitrarily long length have nonzero probability. We must then choose a desired precision and truncate the sums at when

(56) |

The first condition guarantees that the total probability has converged: all remaining paths have total probability less than . The second condition indicates that the th contribution to the maximum moment is sufficiently small relative to the total moment calculated so far.

A potential problem with the second convergence condition arises when the state space is periodic, so the final states can only be reached in a number of jumps that is an integer multiple of the periodicity (plus a constant). For instance, square lattices have a periodicity of 2. In that case, will alternate between zero and nonzero values as alternates between even and odd values. To prevent these zero values of from trivially satisfying the second condition in Eq. 56, we also require that be nonzero. A more subtle problem can arise if there are very low-probability paths with very large contributions to the higher time moments. For example, one can construct a model where there are extremely long paths with probabilities much smaller than but which make arbitrarily large contributions to the total time moments due to the waiting time moments at those states. The algorithm will satisfy the convergence criteria before these paths are summed and therefore miss their contributions. This is an extreme example, but in general one may need to reconsider the convergence test depending on the properties of the model at hand.

How does the cutoff depend on the maximum moment ? To address this we must determine the asymptotic behavior of for different . As long as the network has loops, the path length probability distribution is asymptotically exponential (Eq. 13; see Appendix A). To estimate the asymptotic dependence of the higher time moments on path length, we consider the special case of identical waiting time distributions as in Sec. II.6. Since

(57) |

we can use the approximation from Eq. 26 (valid when the waiting time distributions are not too dispersed) to obtain

(58) |

Since , the higher moments decay nearly exponentially (up to a logarithmic correction) with the same rate as the probability, set by the mean path length . We expect this asymptotic behavior to remain valid even when the waiting time distributions are not all the same, as long as the length and time moments are approximately proportional; we will empirically verify this expectation in later examples.

Although all asymptotically decay with exponential dependence on , the logarithmic correction in the exponent shifts the exponential regime toward larger for higher moments; this is why we test convergence on the maximum moment in Eq. 56. Indeed, in Eq. 58 is maximized by , after which exponential decay sets in. Thus we expect scaling for the cutoff to be to leading order.

The approximate exponential dependence of the moments also enables a convergence scheme more sophisticated than Eq. 56. Since we know the asymptotic dependence of all moments will be approximately exponential for long paths, we can simply calculate the moments for path lengths until all have reached their exponential tails, and then fit exponential functions and extrapolate to infer the contributions of the longer paths. Conceptually, this means that all long path behavior is contained in the statistics of shorter paths, since long paths are simply short paths with many loops Sun2006 (); Manhart2013 (). In practice this procedure can help to avoid calculating extremely long paths unnecessarily.

### iii.4 Numerical implementation in PathMAN

We have implemented the aforementioned matrix formulation in a Python script called PathMAN (Path Matrix Algorithm for Networks), available at https://github.com/michaelmanhart/pathman, with additional scripts for generating examples and analyzing output. Figure 1 shows the pseudocode. Since the jump matrix is typically very sparse, we can store all matrices in sparse formats for efficient storage and computation using SciPy’s sparse linear algebra module SciPy (). The script is general enough to treat any CTRW on a finite discrete space given a list of states, their jump probabilities, and at least their first waiting time moments. The current implementation assumes separable waiting time distributions, but modifying it to run the calculations for non-separable distributions (Appendix E) is straightforward. The user can specify any path boundary conditions (initial distribution and final states) and functions of state to average over. The script reads all input data from plain text files in a simple format (see GitHub repository for documentation).

Define initial distribution and set of final states |

Define jump and waiting time matrices , (Eq. 35), (Eq. 38) |

Define transfer matrices and (Eq. LABEL:eq:Kb_Gb_def) |

Define final state sum matrix (Eq. 50) |

Define row vector (Eq. 54) for each state function |

Define initial transfer vectors (Eq. 46) |

Define initial cumulative vectors , |

For |