Faster Algorithms for Computing the Stationary Distribution, Simulating Random Walks, and More
In this paper, we provide faster algorithms for computing various fundamental quantities associated with random walks on a directed graph, including the stationary distribution, personalized PageRank vectors, hitting times, and escape probabilities. In particular, on a directed graph with $n$ vertices and $m$ edges, we show how to compute each quantity in time $\tilde{O}(m^{3/4}n + mn^{2/3})$, where the $\tilde{O}$ notation suppresses polylogarithmic factors in $n$, the desired accuracy, and the appropriate condition number (i.e. the mixing time or restart probability).
Our result improves upon the previous fastest running times for these problems; previous results either invoke a general-purpose linear system solver on an $n \times n$ matrix with $m$ non-zero entries, or depend polynomially on the desired accuracy or the natural condition number associated with the problem (i.e. the mixing time or restart probability). For sparse graphs, we obtain a running time of $\tilde{O}(n^{7/4})$, breaking the $O(n^{2})$ barrier of the best running time one could hope to achieve using fast matrix multiplication.
We achieve our result by providing a similar running time improvement for solving directed Laplacian systems, a natural directed or asymmetric analog of the well-studied symmetric or undirected Laplacian systems. We show how to solve such systems in time $\tilde{O}(m^{3/4}n + mn^{2/3})$, and efficiently reduce a broad range of problems to solving directed Laplacian systems on Eulerian graphs. We hope these results and our analysis open the door for further study into directed spectral graph theory.
The application and development of spectral graph theory has been one of the great algorithmic success stories of the past three decades. By exploiting the relationship between the combinatorial properties of a graph, the linear algebraic properties of its Laplacian, and the probabilistic behavior of the random walks they induce, researchers have obtained landmark results ranging across multiple areas in the theory of algorithms, including Markov chain Monte Carlo techniques for counting [DBLP:journals/iandc/SinclairJ89, karp1989monte, jerrum2004polynomial] and volume estimation [lovasz1990mixing, dyer1991random, vempala2005geometric, lovasz2006fast, lovasz2006hit, lee2016geodesic], approximation algorithms for clustering and graph partitioning problems [alon1985lambda1, SpielmanTengSolver:journal, kannan2004clusterings, AndersenCL06, orecchia2012approximating], derandomization [hoory2006expander, reingold2008undirected], error correcting codes [spielman1995linear, sipser1996expander], and the analysis of random processes [lovasz1993random], among others. In addition to their theoretical impact, spectral techniques have found broad applications in practice, forming the core of Google’s PageRank algorithm and playing a ubiquitous role in practical codes for machine learning, computer vision, clustering, and graph visualization. Furthermore, they have enabled the computation of fundamental properties of various Markov chains, such as stationary distributions, escape probabilities, hitting times, mixing times, and commute times.
More recently, spectral graph theory has been driving an emerging confluence of algorithmic graph theory, numerical scientific computing, and convex optimization. This recent line of work began with a sequence of papers that used combinatorial techniques to accelerate the solution of linear systems in undirected graph Laplacians, eventually leading to algorithms that solve these systems in nearly-linear time [SpielmanTengSolver:journal, KoutisMP10:journal, KoutisMP10, KoutisMP11, KelnerOSZ13, lee2013efficient, Cohen:2014:SSL:2591796.2591833, PengS14, LeePS15, KyngLPSS15:arxiv]. This was followed by an array of papers in the so-called “Laplacian Paradigm” [Teng10], which either used this nearly-linear-time algorithm as a primitive or built on the structural properties underlying it to obtain faster algorithms for problems at the core of algorithmic graph theory, including finding maximum flows and minimum cuts [christiano2011electrical, lee2013new, sherman2013nearly, kelner2014almost, peng2016approximate], solving traveling salesman problems [asadpour2010log, anari2015effective], sampling random trees [kelner2009faster, madry2015fast], sparsifying graphs [SpielmanS08, SpielmanS11, allen2015spectral, LeeS15], computing multicommodity flows [kelner2012faster, kelner2014almost], and approximately solving a wide range of general clustering and partitioning problems [alon1985lambda1, kannan2004clusterings, AndersenCL06, orecchia2012approximating].
While these recent algorithmic approaches have been very successful at obtaining algorithms running in close to linear time for undirected graphs, the directed case has conspicuously lagged its undirected counterpart. With a small number of exceptions involving graphs with particularly nice properties and a line of research in using Laplacian system solvers inside interior point methods for linear programming [DBLP:conf/focs/Madry13, LeeS14, DBLP:journals/corr/CohenMSV16], the results in this line of research have centered almost entirely on the spectral theory of undirected graphs. While there have been interesting steps toward a directed spectral graph theory [ref1, AndersenCL06, GuoM15:arxiv], their algorithmic ramifications have been less clear.
One problem that particularly well illustrates the discrepancy between the directed and undirected settings is the computation of the stationary distribution of a random walk. Computing this is a primary goal in the analysis of Markov chains, constitutes the main step in the PageRank algorithm, remains the missing piece in derandomizing randomized log-space computations, and is necessary to obtain the appropriate normalization for any of the theoretical or algorithmic results in one of the few instantiations of directed spectral graph theory [ref1, AndersenCL06].
In the undirected setting, the stationary distribution of a vertex is proportional to its degree, so it can be computed trivially. However, despite extensive study in the mathematics, computer science, operations research, and numerical scientific computing communities, the best previously known asymptotic guarantees for this problem are essentially what one gets by applying general-purpose linear algebra routines. Given a directed graph with $n$ vertices and $m$ edges, these previous algorithms fall into two broad classes:
Iterative Methods: These aim to compute the stationary distribution by either simulating the random walk directly or casting it as a linear system or eigenvector computation and applying either a global or coordinate-wise iterative method to find it. The running times of these methods either depend polynomially on the relevant numerical conditioning property of the instance, which in this case is, up to polynomial factors, the mixing time of the random process; or they compute a distribution that only approximately satisfies the defining equations of the stationary distribution, with a running time that is polynomial in $1/\epsilon$. There has been extensive work on tuning and specializing these methods to efficiently compute the stationary distribution, particularly in the special case of PageRank. However, all such methods that we are aware of retain a polynomial dependence on either the mixing time, which can be arbitrarily large as a function of the number of edges of the graph, or on $1/\epsilon$.111A possible exception is the algorithm that invokes conjugate gradient in a black-box manner to solve the requisite linear system to compute the stationary distribution. At best this analysis would suggest an $O(mn)$ running time. However, it is not known how to realize even this running time in the standard word-RAM model of computation.
Fast Matrix Multiplication: By using a direct method based on fast matrix multiplication, one can find the stationary distribution in time $O(n^{\omega})$, where $\omega < 2.373$ [DBLP:conf/stoc/Williams12] is the matrix multiplication exponent. These methods neglect the graph structure and cannot exploit sparsity. As such, even if one found a matrix multiplication algorithm matching the lower bound of $\omega = 2$, this approach cannot give a running time lower than $O(n^{2})$, even when the graph is sparse.
Another problem that well demonstrates the gap between directed and undirected graph problems is that of solving linear systems involving graph Laplacians. For undirected graphs, as we have discussed, there are multiple algorithms that solve the associated Laplacian systems in nearly linear time. For directed graphs, however, natural extensions of solving Laplacian systems are closely related to computing the stationary distribution, and thus all known algorithms either depend polynomially on the condition number of the matrix or the desired accuracy, or they require time $O(n^{\omega})$. Moreover, many of the techniques, constructions, and properties used to solve undirected Laplacian systems either have no known analogues for directed graphs or can be explicitly shown not to exist. This gap in our ability to solve Laplacian systems is one of the primary reasons (perhaps the primary reason) that the recent wave of graph algorithms based on the “Laplacian Paradigm” has not produced directed results to match the undirected ones.
Given that, despite several decades of work on designing specialized methods for this problem, no methods are known that asymptotically improve upon general linear algebra routines, and given the structural difficulties in translating techniques from the undirected case, it would not be unreasonable to expect that the best one can hope for is heuristic improvements in special cases, and that the worst-case asymptotics for graph Laplacians are no better than the $O(n^{\omega})$ that is known for general matrices.
In this paper, we show that this is not the case by providing an algorithm that solves directed graph Laplacian systems—a natural generalization of undirected graph Laplacian systems—in time $\tilde{O}(m^{3/4}n + mn^{2/3})$, where here and throughout the paper the $\tilde{O}$ notation hides polylogarithmic factors in $n$, the desired accuracy, and the natural condition numbers associated with the problem. Consequently, we obtain the first asymptotic improvement for these systems over solving general linear systems.222In follow-up work, the authors of this paper, in collaboration with Anup Rao, have improved the running time to almost linear in the number of edges in the graph, meaning the running time is linear if we ignore contributions to the running time that are smaller than any polynomial. This paper will be made available online as soon as possible. In particular, when the graph is sparse, i.e. $m = O(n)$, our algorithm runs in time $\tilde{O}(n^{7/4})$, breaking the $O(n^{2})$ barrier that would be achieved by algorithms based on fast matrix multiplication even if $\omega = 2$. We then leverage this result to obtain improved running times for a host of problems in algorithmic graph theory, scientific computing, and numerical linear algebra, including:
Computing the Stationary Distribution: We compute a vector within $\ell_{2}$ distance $\epsilon$ of the stationary distribution of a random walk on a strongly connected directed graph in time $\tilde{O}(m^{3/4}n + mn^{2/3})$, where the natural condition number of this problem is the mixing time. (See Section 7.2.)
Solving Large Classes of Linear Systems: We provide algorithms that solve a large class of well-studied linear systems. Compared with prior algorithms capable of solving this class, ours are the first that are asymptotically faster than solving general linear systems, and the first that break the $O(n^{\omega})$ barrier for sufficiently sparse instances. Our methods solve directed Laplacian systems and systems where the matrix is row- or column-diagonally dominant. The running time is $\tilde{O}(m^{3/4}n + mn^{2/3})$. (See Section 7.3.)
Computing Personalized PageRank: We compute a vector within $\ell_{2}$ distance $\epsilon$ of the personalized PageRank vector for a directed graph with restart probability $\beta$, in time $\tilde{O}(m^{3/4}n + mn^{2/3})$. Here the natural condition number is $1/\beta$. In the case of small $\beta$ and $\epsilon$, this improves upon local methods that take time $O(\frac{1}{\beta\epsilon})$ [PageBMW99, jeh2003scaling, fogaras2004towards, AndersenCL06, andersen2007local, lofgren2014fast]. (See Section 7.1.)
Simulating Random Walks: We show how to compute a wide range of properties of random walks on directed graphs, including escape probabilities, commute times, and hitting times. (See Section 7.4 and Section 7.5.) We also show how to efficiently estimate the mixing time of a lazy random walk on a directed graph up to polynomial factors in $n$ and the mixing time. (See Section 7.2.) The runtime for all these algorithms is $\tilde{O}(m^{3/4}n + mn^{2/3})$.
Estimating All Commute Times: We show how to build an $\tilde{O}(n\epsilon^{-2})$-size data structure in time $\tilde{O}((m^{3/4}n + mn^{2/3})\epsilon^{-2})$ that, when queried with any two vertices $u$ and $v$, outputs a $(1+\epsilon)$-multiplicative approximation to the expected commute time between $u$ and $v$, i.e. the expected amount of time for a random walk starting at $u$ to reach $v$ and return to $u$. Our data structure is similar to the data structure known for computing all-pairs effective resistances in undirected graphs [SpielmanS08, SpielmanS11]. (See Section 7.6.)
It is important to note that the $\tilde{O}$-notation hides factors that are polylogarithmic in both the condition number (equivalently, the mixing time) and the ratio of maximum to minimum stationary probability. As such, the natural parameter regime for our algorithms is when these quantities are subexponential or polynomial. For all the above problems, the best prior algorithms had worst-case runtimes no better than $O(n^{\omega})$ in this regime. We hope that our results open the door for further research into directed spectral graph theory, and serve as a foundation for the development of faster algorithms for directed graphs.
Our approach for solving these problems centers around solving linear systems in a class of matrices we refer to as directed (graph) Laplacians, a natural generalization of undirected graph Laplacians. A directed Laplacian $\mathcal{L} \in \mathbb{R}^{n \times n}$ is simply a matrix with non-positive off-diagonal entries such that each diagonal entry is equal to the sum of the absolute values of the other off-diagonal entries in that column, i.e. $\mathcal{L}_{ij} \le 0$ for $i \neq j$ and $\mathcal{L}_{ii} = -\sum_{j \neq i} \mathcal{L}_{ji}$ (equivalently, $\mathbf{1}^{\top}\mathcal{L} = \mathbf{0}^{\top}$). As with undirected Laplacians, every directed Laplacian $\mathcal{L}$ is naturally associated with a directed graph $G_{\mathcal{L}}$, where the vertices correspond to the columns of $\mathcal{L}$ and there is an edge from vertex $i$ to vertex $j$ of weight $w > 0$ if and only if $\mathcal{L}_{ji} = -w$.
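As a concrete illustration of this definition, the following NumPy sketch (names are illustrative, not from the paper) builds a directed Laplacian from a weighted edge list and checks the defining column-sum property:

```python
import numpy as np

def directed_laplacian(n, edges):
    """Build a directed Laplacian L = D - A^T from a list of weighted
    edges (u, v, w): the diagonal accumulates the weighted out-degrees,
    and L[v, u] = -w for each edge u -> v, so every column sums to zero."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[v, u] -= w        # off-diagonal entries are non-positive
        L[u, u] += w        # diagonal entry is the out-degree of u
    return L

# A directed 3-cycle: every vertex has in- and out-degree 1, so the
# resulting Laplacian is in fact Eulerian (rows also sum to zero).
L = directed_laplacian(3, [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)])
assert np.allclose(L.sum(axis=0), 0)   # defining property: 1^T L = 0
assert np.allclose(L.sum(axis=1), 0)   # Eulerian: L 1 = 0 as well
```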
Another shared feature of directed and undirected Laplacians is the close connection between random walks on the associated graph and solutions to linear systems in $\mathcal{L}$. We ultimately show that solving a small number of directed Laplacian systems suffices to obtain all of our desired applications (see Section 5 and Section 7). Unfortunately, solving linear systems in $\mathcal{L}$ directly is quite challenging. Particularly troubling is the fact that, while we know $\mathcal{L}$ has a non-trivial kernel (since $\mathbf{1}^{\top}\mathcal{L} = \mathbf{0}^{\top}$), we do not have a simple method to compute it efficiently. Moreover, $\mathcal{L}$ is not symmetric, complicating the analysis of standard iterative algorithms. Furthermore, the standard approach of multiplying on the left by the transpose, so that we are solving linear systems in $\mathcal{L}^{\top}\mathcal{L}$, would destroy the combinatorial structure of the problem and cause an intolerably large condition number. A natural idea is to try to work with a symmetrization of this matrix, $\mathbf{U}_{\mathcal{L}} = \frac{1}{2}(\mathcal{L} + \mathcal{L}^{\top})$, but it turns out that this may not even be positive semidefinite (PSD).333Consider the directed edge Laplacian $\mathcal{L} = \begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix}$. Then, $\mathbf{U}_{\mathcal{L}}$ has an eigenvector $(\sqrt{2} - 1, 1)^{\top}$ with a corresponding eigenvalue of $\frac{1}{2}(1 - \sqrt{2}) < 0$. Consequently, it is not clear a priori how to define an efficient iterative method for computing the stationary distribution or for solving systems in $\mathcal{L}$ without depending polynomially on the condition number of $\mathcal{L}$.
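The footnote's counterexample is easy to verify numerically; this small NumPy check confirms that the symmetrization of the single-edge directed Laplacian has a negative eigenvalue:

```python
import numpy as np

# The directed edge Laplacian from the footnote: a single edge 0 -> 1.
L = np.array([[ 1., 0.],
              [-1., 0.]])
U = (L + L.T) / 2                       # the natural symmetrization
eigs = np.linalg.eigvalsh(U)            # eigenvalues in ascending order
assert eigs[0] < 0                      # U is not PSD
assert np.isclose(eigs[0], (1 - np.sqrt(2)) / 2)
```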
Fortunately, we do know how to characterize the kernel of $\mathcal{L}$, even if computing it is difficult a priori. If we let $\mathbf{D}$ denote the diagonal matrix consistent with $\mathcal{L}$, i.e. $\mathbf{D}_{ii} = \mathcal{L}_{ii}$, then we see that $\mathcal{L} = (\mathbf{I} - \mathbf{W})\mathbf{D}$, where $\mathbf{I}$ is the identity matrix and $\mathbf{W}$ is the random walk matrix associated with $G_{\mathcal{L}}$. In other words, for any distribution $p$, we have that $\mathbf{W}p$ is the resulting distribution of one step of the random walk where, at a vertex $i$, we pick a random outgoing edge with probability proportional to its weight and follow that edge. The Perron-Frobenius Theorem implies that as long as the graph is strongly connected there is some stationary distribution $\pi$ such that $\mathbf{W}\pi = \pi$. Consequently, the kernel of $\mathcal{L}$ is spanned by $\mathbf{D}^{-1}\pi$, the stationary distribution of the natural random walk on $G_{\mathcal{L}}$ multiplied by $\mathbf{D}^{-1}$.
Consequently, we can show that for every directed Laplacian $\mathcal{L}$ that corresponds to a strongly connected graph, there is always a positive vector $x$ such that $\mathcal{L}\,\mathbf{diag}(x)$ is an Eulerian Laplacian (see Lemma 1). In other words, letting $\mathbf{X} = \mathbf{diag}(x)$ denote the diagonal matrix associated with $x$, the directed Laplacian $\mathcal{L}\mathbf{X}$ satisfies $\mathcal{L}\mathbf{X}\mathbf{1} = \mathbf{0}$ and $\mathbf{1}^{\top}\mathcal{L}\mathbf{X} = \mathbf{0}^{\top}$. This says that the total weight of incoming edges to a vertex is the same as the total weight of outgoing edges from that vertex, i.e., that $\mathcal{L}\mathbf{X}$ corresponds to the Laplacian of an Eulerian graph. We call such a vector $x$ an Eulerian scaling of $\mathcal{L}$.
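To make the Eulerian scaling concrete, here is a small NumPy sketch (an illustration, not the paper's algorithm) that finds the stationary distribution by lazy power iteration and verifies that the induced diagonal scaling makes the Laplacian Eulerian:

```python
import numpy as np

def stationary_distribution(L, iters=200):
    """Stationary distribution of the walk encoded by L = D - A^T.

    Forms the column-stochastic random walk matrix W = A^T D^{-1} and
    runs a lazy power iteration, which converges whenever the associated
    graph is strongly connected."""
    d = np.diag(L)
    W = (np.diag(d) - L) / d            # divide column j by d[j]: A^T D^{-1}
    s = np.full(L.shape[0], 1.0 / L.shape[0])
    for _ in range(iters):
        s = 0.5 * s + 0.5 * (W @ s)     # lazy step sidesteps periodicity
    return s

# Strongly connected, non-Eulerian example: edges 0->1, 0->2, 1->2, 2->0.
L = np.array([[ 2.,  0., -1.],
              [-1.,  1.,  0.],
              [-1., -1.,  1.]])
s = stationary_distribution(L)
assert np.allclose(s, [0.4, 0.2, 0.4], atol=1e-8)
x = s / np.diag(L)                      # Eulerian scaling x = D^{-1} s
assert np.allclose((L * x).sum(axis=1), 0)  # rows of L diag(x) sum to zero
assert np.allclose((L * x).sum(axis=0), 0)  # columns already summed to zero
```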
Now, solving systems in an Eulerian Laplacian (i.e., a Laplacian corresponding to an Eulerian graph) seems easier than solving an arbitrary directed Laplacian. In particular, we know the kernel of an Eulerian Laplacian $\mathcal{L}$, since it is just the all-ones vector. In addition, we have that $\mathbf{U}_{\mathcal{L}} = \frac{1}{2}(\mathcal{L} + \mathcal{L}^{\top})$ is symmetric and PSD—in fact, it is just the Laplacian of an undirected graph! Unfortunately, this does not immediately yield an algorithm, as it is not known how to use the ability to solve systems in such a symmetrization to solve systems in the original matrix.
Ultimately, this line of reasoning leaves us with two fundamental questions:
Can we solve Eulerian Laplacian systems in time $o(n^{\omega})$?
Can we use an Eulerian Laplacian system solver for more than solving Eulerian Laplacian systems?
The major contribution of this paper is answering both of these questions in the affirmative. We show the following:
We show that we can solve Eulerian Laplacian systems in time $\tilde{O}(m^{3/4}n + mn^{2/3})$.
We show that using Eulerian Laplacian system solvers we can solve systems in broader classes of matrices, which we refer to as RCDD Z-matrices and, more generally, RCDD matrices.
We show that using solvers for RCDD Z-matrices, we can estimate an Eulerian scaling of a directed Laplacian.
Putting these components together, we achieve our desired applications. Some of these applications are straightforward, whereas others require significant work.
A serious question that arises throughout these results is the degree of precision we need to carry out our arithmetic operations. This arises both in using undirected Laplacian system solvers to solve Eulerian Laplacian systems, and then again in using Eulerian Laplacian system solvers to derive the rest of our results. These numerical issues are not merely technicalities; they crucially affect the algorithms we can use to solve our problem. In fact, we will see in Section 4 that, if we disregarded the numerics and relied on frequently quoted assertions relating the behavior of conjugate gradient to the existence of polynomials with certain properties, we would actually obtain a better running time, but these assertions do not carry over to any reasonable finite-precision setting.
Given these subtleties, we discuss numerical issues throughout the paper, and in Appendix LABEL:sec:numerical_stability we cover particular details of the stability of our algorithms and the precision they require, showing that we can achieve all our results in the standard unit cost RAM model (or any other reasonable model of computation).
In the remainder of this overview we briefly comment on the key technical ingredients of each of these results.
1.1.1 Solving Eulerian Laplacian Systems
To solve an Eulerian Laplacian system $\mathcal{L}x = b$, we first precondition, multiplying both sides by $\mathcal{L}^{\top}\mathbf{U}^{+}$, where $\mathbf{U} = \frac{1}{2}(\mathcal{L} + \mathcal{L}^{\top})$ is the Laplacian of an undirected graph corresponding to $\mathcal{L}$, and $\mathbf{U}^{+}$ is its Moore-Penrose pseudoinverse. This shows that it suffices to instead solve $\mathcal{L}^{\top}\mathbf{U}^{+}\mathcal{L}x = \mathcal{L}^{\top}\mathbf{U}^{+}b$. Now, using a nearly-linear-time Laplacian system solver, we can apply $\mathbf{U}^{+}$ to a vector efficiently. As such, we simply need to show that we can efficiently solve systems in the symmetric matrix $\mathcal{L}^{\top}\mathbf{U}^{+}\mathcal{L}$.
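The preconditioning step can be sanity-checked on a toy example. In the sketch below, a dense pseudoinverse stands in for the fast undirected Laplacian solver, and the directed 4-cycle is a hypothetical example; it verifies that $\mathcal{L}^{\top}\mathbf{U}^{+}\mathcal{L}$ is symmetric and that solving the preconditioned system recovers a solution of the original one:

```python
import numpy as np

# Eulerian Laplacian of the directed 4-cycle 0 -> 1 -> 2 -> 3 -> 0.
L = np.array([[ 1.,  0.,  0., -1.],
              [-1.,  1.,  0.,  0.],
              [ 0., -1.,  1.,  0.],
              [ 0.,  0., -1.,  1.]])
U = (L + L.T) / 2                  # undirected Laplacian (halved cycle weights)
Uinv = np.linalg.pinv(U)           # stand-in for a fast Laplacian solver
M = L.T @ Uinv @ L                 # the preconditioned system matrix

b = np.array([1., -1., 0., 0.])    # right-hand side orthogonal to ker(L^T) = 1
x = np.linalg.pinv(M) @ (L.T @ (Uinv @ b))
assert np.allclose(M, M.T)         # symmetric, unlike L itself
assert np.allclose(L @ x, b)       # x solves the original system
```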
Next, we show that the matrix $\mathcal{L}^{\top}\mathbf{U}^{+}\mathcal{L}$ is, in an appropriate sense, approximable by $\mathbf{U}$. Formally, we show that $\mathbf{U}$ is smaller in the sense that $\mathbf{U} \preceq \mathcal{L}^{\top}\mathbf{U}^{+}\mathcal{L}$, and that $\mathcal{L}^{\top}\mathbf{U}^{+}\mathcal{L}$ is not too much larger in the sense of a bound on the trace of the relative condition matrix. While the first proof holds for a broad class of asymmetric matrices, to prove the second fact we exploit the structure of Eulerian Laplacians, particularly the fact that an Eulerian graph can be decomposed into simple cycles.
Unfortunately, this property doesn’t immediately yield a fast algorithm for solving such systems. The natural approach would be to use preconditioned Krylov methods, such as the Chebyshev method or conjugate gradient, which essentially apply a polynomial of the preconditioned matrix to the right-hand side. Unfortunately, Chebyshev iterations yield too slow a running time with this approach. For conjugate gradient, it can be shown that the trace bound leads to a faster running time in exact arithmetic, but, unfortunately, this analysis does not appear to be numerically stable, and we do not know how to show that it yields this running time in our computational model.
Instead, we implement an approach based on preconditioning and subsampling. We precondition with a matrix we tune, reducing the problem to solving a small number of simpler linear systems. To solve these systems, we use the factorization of $\mathbf{U}$ into its edges to subsample the inner matrix while preserving it spectrally. Ultimately, this means we only need to solve systems in $\mathbf{U}$ plus a low-rank matrix, which we can do efficiently using the explicit formula for low-rank updates (the Sherman-Morrison-Woodbury matrix identity). Trading off the number of such systems to solve, the preprocessing needed to solve them, and the time to solve each one gives us our desired running time. We show in the appendix that we can, with some care, stably implement a preconditioned Chebyshev method and the low-rank update formulas. This allows us to circumvent the issues in using conjugate gradient and achieve our running time in the desired computational model.
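The low-rank update step relies on the Sherman-Morrison-Woodbury identity. As a self-contained illustration of why such updates are cheap (a rank-one instance with an illustrative helper name, not the paper's routine):

```python
import numpy as np

def solve_diag_plus_rank_one(d, u, v, b):
    """Solve (diag(d) + u v^T) x = b via the Sherman-Morrison formula.

    Uses only O(n) work beyond the diagonal solves, in contrast to the
    O(n^3) cost of forming and factoring the dense matrix."""
    Dinv_b = b / d
    Dinv_u = u / d
    return Dinv_b - Dinv_u * (v @ Dinv_b) / (1.0 + v @ Dinv_u)

rng = np.random.default_rng(0)
n = 50
d = rng.uniform(1.0, 2.0, n)       # positive diagonal
u = rng.standard_normal(n)
v = u                              # symmetric update keeps 1 + v^T D^{-1} u > 1
b = rng.standard_normal(n)
x = solve_diag_plus_rank_one(d, u, v, b)
assert np.allclose((np.diag(d) + np.outer(u, v)) @ x, b)
```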
1.1.2 Solving RCDD Z-matrices
A row-column diagonally dominant (RCDD) matrix is simply a matrix $\mathbf{A}$ where $\mathbf{A}_{ii} \ge \sum_{j \neq i} |\mathbf{A}_{ij}|$ and $\mathbf{A}_{ii} \ge \sum_{j \neq i} |\mathbf{A}_{ji}|$ for all $i$, and a Z-matrix is a matrix whose off-diagonal entries are non-positive. We show how to solve systems in such matrices by directly reducing them to Eulerian Laplacian systems. Given an RCDD Z-matrix $\mathbf{A}$, we add an additional row and column, filling in the entries in the natural way so that the resulting matrix is an Eulerian Laplacian. We show that, from the solution to such a linear system, we can immediately glean the solution to systems in $\mathbf{A}$. This reduction is analogous to the reduction from solving symmetric diagonally dominant (SDD) systems to solving undirected Laplacian systems. In the appendix we show that this method is stable.
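A minimal sketch of this reduction, under stated assumptions (a dense pseudoinverse plays the role of the Eulerian Laplacian solver, and the matrix and right-hand side are hypothetical examples):

```python
import numpy as np

def solve_rcdd_via_eulerian(M, b):
    """Solve M x = b for a strictly RCDD Z-matrix M.

    Appends one row and column whose entries absorb the row/column
    slacks, producing an Eulerian Laplacian; a (pseudo-inverse)
    Laplacian solve on the augmented system then yields the solution
    to the original system."""
    n = M.shape[0]
    row_slack = M.sum(axis=1)          # positive since M is strictly RDD
    col_slack = M.sum(axis=0)          # positive since M is strictly CDD
    L = np.zeros((n + 1, n + 1))
    L[:n, :n] = M
    L[:n, n] = -row_slack
    L[n, :n] = -col_slack
    L[n, n] = row_slack.sum()          # total slack balances both sums
    y = np.linalg.pinv(L) @ np.append(b, -b.sum())  # Laplacian solve
    return y[:n] - y[n]                # shift by the auxiliary coordinate

# A 2x2 strictly RCDD Z-matrix (hypothetical example).
M = np.array([[ 3., -1.],
              [-2.,  4.]])
b = np.array([1., 2.])
x = solve_rcdd_via_eulerian(M, b)
assert np.allclose(M @ x, b)
assert np.allclose(x, [0.6, 0.8])
```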
1.1.3 Computing the Stationary Distribution
Given an RCDD Z-matrix solver, we use it to compute the scaling that makes a directed Laplacian Eulerian, i.e., we compute the stationary distribution. To do this, we pick an initial positive diagonal scaling and an initial non-negative diagonal perturbation such that the resulting matrix is $\alpha$-RCDD, that is, each diagonal entry is larger in absolute value, by a factor of at least $1 + \alpha$, than the sum of the absolute values of the off-diagonal entries in both the corresponding row and column.
We then iteratively decrease the perturbation and update the scaling while maintaining that the matrix is $\alpha$-RCDD. The key to this reduction is the fact that there is a natural way to perform a rank-1 update of an $\alpha$-RCDD Z-matrix to obtain an Eulerian Laplacian, and that the stationary distribution of this Laplacian can be obtained by solving a single system in the $\alpha$-RCDD matrix. Ultimately, this method yields a sequence of stationary distributions that, when multiplied together entrywise, yield a good approximate stationary distribution for the original directed Laplacian. For a more detailed overview of this approach and the intuition underlying it, see Section 3.1.
1.1.4 Additional Applications
Our algorithms for computing personalized PageRank vectors, solving linear systems in arbitrary RCDD matrices, and solving directed Laplacian linear systems are all obtained in a similar fashion. We compute an approximate stationary distribution, rescale the system to make it strictly RCDD, and then solve it—all using algorithms from the previous sections in a black-box fashion. Therefore, the running times for these applications—and in fact all our applications—depend solely (up to polylogarithmic factors) on the black-box costs of computing the stationary distribution and solving RCDD systems.
However, our algorithms must determine how much accuracy to request when they invoke these two black-box routines. For computing personalized PageRank, one can determine the accuracy to request based solely on the restart probability. However, for our other applications, the accuracy our algorithms request depends on the condition number $\kappa(\mathcal{L})$ and on the ratio of maximum to minimum stationary probability. In order to get an unconditional running time, and out of intrinsic interest, we show how to efficiently compute reasonable upper bounds on these quantities. We use an approach motivated by the close relationship between $\kappa(\mathcal{L})$ and the mixing time. Specifically, we formulate a notion of personalized PageRank mixing time, then get a polynomially good estimate of this quantity using our ability to solve personalized PageRank. Finally, we show that $\kappa(\mathcal{L})$ and the personalized PageRank mixing time are equivalent up to factors that are good enough444They are equivalent up to factors polynomial in $n$ and in the quantities themselves. Since these quantities appear only inside logarithms in our runtimes, and these logarithms already contain factors of $n$, this polynomial equivalence changes our runtimes by only a constant factor compared to having exact estimates. A similar relationship also holds between $\kappa(\mathcal{L})$ and the mixing time of lazy random walks on the graph. (See Section 6 for details.) for our purposes. With a reasonable bound on $\kappa(\mathcal{L})$, we are then able to choose a restart probability that is small enough to guarantee that solving personalized PageRank gives a good approximation of the stationary distribution.
Our algorithms for computing hitting times, escape probabilities, and all-pairs commute times all start by taking a natural definition of the quantity of interest and massaging it into an explicit formula involving the pseudoinverse of an associated Laplacian. Then, they use various methods to approximately evaluate that formula. In the case of hitting times, we simply plug everything into the formula and invoke our approximate Laplacian solver with appropriately small error. Escape probabilities are handled similarly, except that there are also two unknown parameters, which we show we can estimate to reasonable accuracy and plug in.
Perhaps the most sophisticated application is computing all-pairs commute times. We show that the commute time between $u$ and $v$ is given by a simple formula involving the quadratic form of $(\mathcal{L}_{\pi}^{\top}\mathbf{U}_{\pi}^{+}\mathcal{L}_{\pi})^{+}$ evaluated at $\mathbf{1}_u - \mathbf{1}_v$, where $\mathcal{L}_{\pi}$ is the matrix obtained by performing the diagonal rescaling of $\mathcal{L}$ that turns its diagonal into the stationary distribution, which also makes the graph Eulerian, and $\mathbf{U}_{\pi} = \frac{1}{2}(\mathcal{L}_{\pi} + \mathcal{L}_{\pi}^{\top})$. An interesting feature of this formula is that it involves applying the pseudo-inverse of a matrix of the very same form as the preconditioned system that our Eulerian Laplacian solver uses. Another interesting feature is that when $\mathcal{L}$ is symmetric, this formula simplifies to the quadratic form of the pseudoinverse of the Laplacian itself. Thus, it is a generalization of the well-known characterization of commute times in terms of effective resistances in undirected graphs. In undirected graphs, all-pairs commute times can be computed efficiently via Johnson-Lindenstrauss sketching [SpielmanS08]. We show that a similar approach extends to directed Laplacians as well. While the general approach is similar, the error analysis is complicated by the fact that we only have access to an approximate stationary distribution. If this were used naively, one would have to deal with an approximate version of $\mathcal{L}_{\pi}$ that, importantly, is only approximately Eulerian. We bypass this issue by showing how to construct an Eulerian graph whose stationary distribution is exactly known and whose commute times approximate the commute times of the original graph. This may be of independent interest.
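As a sanity check of the symmetric special case, the classical identity that the commute time equals $2m$ times the effective resistance can be verified directly (a small undirected example, not the directed algorithm):

```python
import numpy as np

# Undirected path 0 - 1 - 2 with unit weights: m = 2 edges, and the
# effective resistance between the endpoints is 1 + 1 = 2 (resistors
# in series).
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
m = 2
chi = np.array([1., 0., -1.])              # indicator difference 1_u - 1_v
reff = chi @ np.linalg.pinv(L) @ chi       # effective resistance R(0, 2)
commute = 2 * m * reff                     # commute time C(0, 2) = 2 m R
assert np.isclose(reff, 2.0)
assert np.isclose(commute, 8.0)
```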
The fact that a matrix of the form $\mathcal{L}^{\top}\mathbf{U}^{+}\mathcal{L}$ comes up both in solving Eulerian Laplacians and in sketching commute times indicates that it is an even more natural object than it might appear at first.
1.2 Paper Organization
The rest of our paper is organized as follows:
Section 2 - we cover preliminary information
Section 3 - we show how to compute the stationary distribution
Section 4 - we provide our fast Eulerian Laplacian system solver
Section 5 - we reduce strict RCDD linear systems to solving Eulerian systems
Section 6 - we provide condition number quantities for applications and prove equivalences
Section 7 - we provide our applications
Appendix LABEL:sec:numerical_stability - we discuss numerical stability of our solvers
Appendix LABEL:sec:matrix_facts - we provide facts about matrices we use throughout
Appendix LABEL:sec:Deriving-the-Identities - we derive identities for hitting times, commute times, and escape probabilities
In this section we introduce notation and provide basic machinery we use throughout this paper.
Matrices: We use bold to denote matrices and let $\mathbf{I}$ and $\mathbf{0}$ denote the identity matrix and zero matrix, respectively. For symmetric matrices $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{n \times n}$ we use $\mathbf{A} \preceq \mathbf{B}$ to denote the condition that $x^{\top}\mathbf{A}x \le x^{\top}\mathbf{B}x$ for all $x$, and we define $\prec$, $\succeq$, and $\succ$ analogously. We call a symmetric matrix $\mathbf{A}$ positive semidefinite (PSD) if $\mathbf{A} \succeq \mathbf{0}$. For any norm $\|\cdot\|$ defined on vectors in $\mathbb{R}^{n}$ we define the operator norm it induces on $\mathbb{R}^{n \times n}$ by $\|\mathbf{A}\| = \max_{x \neq \mathbf{0}} \|\mathbf{A}x\| / \|x\|$ for all $\mathbf{A} \in \mathbb{R}^{n \times n}$.
Diagonals: For $x \in \mathbb{R}^{n}$ we let $\mathbf{diag}(x) \in \mathbb{R}^{n \times n}$ denote the diagonal matrix with $\mathbf{diag}(x)_{ii} = x_{i}$, and when it is clear from context we more simply write $\mathbf{X} = \mathbf{diag}(x)$. For $\mathbf{A} \in \mathbb{R}^{n \times n}$ we let $\mathrm{diag}(\mathbf{A})$ denote the vector corresponding to the diagonal of $\mathbf{A}$, and we let $\mathbf{diag}(\mathbf{A}) = \mathbf{diag}(\mathrm{diag}(\mathbf{A}))$, i.e. $\mathbf{A}$ with the off-diagonal entries set to $0$.
Vectors: We let $\mathbf{0}$ and $\mathbf{1}$ denote the all-zeros and all-ones vectors, respectively. We use $\mathbf{1}_{i}$ to denote the indicator vector for coordinate $i$, i.e. $(\mathbf{1}_{i})_{i} = 1$ and $(\mathbf{1}_{i})_{j} = 0$ for $j \neq i$. Occasionally we apply scalar operations to vectors with the interpretation that they should be applied coordinate-wise; e.g., for $x \in \mathbb{R}^{n}_{> 0}$ we let $x^{-1}$ denote the vector with $(x^{-1})_{i} = x_{i}^{-1}$, and we use $x \le y$ to denote the condition that $x_{i} \le y_{i}$ for all $i$.
Condition Numbers: Given an invertible matrix $\mathbf{A}$, we let $\kappa(\mathbf{A}) = \|\mathbf{A}\| \cdot \|\mathbf{A}^{-1}\|$ denote the condition number of $\mathbf{A}$. Note that if $\mathbf{D}$ is a nonzero diagonal matrix then $\kappa(\mathbf{D}) = \max_{i} |\mathbf{D}_{ii}| / \min_{i} |\mathbf{D}_{ii}|$.
Sets: We let $[n] = \{1, \ldots, n\}$ and $\Delta^{n} = \{x \in \mathbb{R}^{n}_{\ge 0} : \mathbf{1}^{\top}x = 1\}$, i.e. the $n$-dimensional simplex.
2.2 Matrix Classes
Diagonal Dominance: We call a possibly asymmetric matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ $\alpha$-row diagonally dominant ($\alpha$-RDD) if $\mathbf{A}_{ii} \ge (1 + \alpha)\sum_{j \neq i} |\mathbf{A}_{ij}|$ for all $i \in [n]$, $\alpha$-column diagonally dominant ($\alpha$-CDD) if $\mathbf{A}_{ii} \ge (1 + \alpha)\sum_{j \neq i} |\mathbf{A}_{ji}|$ for all $i \in [n]$, and $\alpha$-RCDD if it is both $\alpha$-RDD and $\alpha$-CDD. For brevity, we call $\mathbf{A}$ RCDD if it is $0$-RCDD and strictly RCDD if it is $\alpha$-RCDD for some $\alpha > 0$.
Z-matrix: A matrix $\mathbf{A}$ is called a Z-matrix if $\mathbf{A}_{ij} \le 0$ for all $i \neq j$, i.e. every off-diagonal entry is non-positive.
Directed Laplacian: A matrix $\mathcal{L} \in \mathbb{R}^{n \times n}$ is called a directed Laplacian if it is a Z-matrix with $\mathbf{1}^{\top}\mathcal{L} = \mathbf{0}^{\top}$, that is, $\mathcal{L}_{ij} \le 0$ for all $i \neq j$ and $\mathcal{L}_{ii} = -\sum_{j \neq i} \mathcal{L}_{ji}$ for all $i$. To every directed Laplacian $\mathcal{L}$ we associate a graph $G_{\mathcal{L}} = (V, E, w)$ with vertices $V = [n]$ and an edge $(i, j)$ of weight $w_{ij} = -\mathcal{L}_{ji}$ for all $i \neq j \in [n]$ with $\mathcal{L}_{ji} \neq 0$. Occasionally we write $\mathcal{L} = \mathbf{D} - \mathbf{A}^{\top}$ to denote that we decompose $\mathcal{L}$ into the diagonal matrix $\mathbf{D}$, where $\mathbf{D}_{ii} = \mathcal{L}_{ii}$ is the out-degree of vertex $i$ in $G_{\mathcal{L}}$, and the weighted adjacency matrix $\mathbf{A}$, with $\mathbf{A}_{ij} = w_{ij}$ if $(i, j) \in E$ and $\mathbf{A}_{ij} = 0$ otherwise. We call $\mathbf{W} = \mathbf{A}^{\top}\mathbf{D}^{-1}$ the random walk matrix associated with $G_{\mathcal{L}}$. We call $\mathcal{L}$ Eulerian if additionally $\mathcal{L}\mathbf{1} = \mathbf{0}$, as in this case the associated graph $G_{\mathcal{L}}$ is Eulerian.
(Symmetric) Laplacian: A matrix $\mathbf{U} \in \mathbb{R}^{n \times n}$ is called a Symmetric Laplacian, or just a Laplacian, if it is symmetric and a directed Laplacian. This coincides with the standard definition of a Laplacian, and in this case note that the associated graph is symmetric (i.e., undirected). For a Laplacian $\mathbf{U}$ we also associate a matrix $\mathbf{B} \in \mathbb{R}^{E \times V}$ known as the weighted incidence matrix. Each row of $\mathbf{B}$ corresponds to an edge $e = (i, j)$, and for a canonical orientation ordering of $E$ we have $\mathbf{B}_{ei} = \sqrt{w_{e}}$, $\mathbf{B}_{ej} = -\sqrt{w_{e}}$, and $\mathbf{B}_{ek} = 0$ if $k \notin \{i, j\}$. Note that $\mathbf{U} = \mathbf{B}^{\top}\mathbf{B}$, and thus $\mathbf{U}$ is always PSD.
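For concreteness, a small NumPy sketch (illustrative names) of the weighted incidence matrix and the factorization $\mathbf{U} = \mathbf{B}^{\top}\mathbf{B}$:

```python
import numpy as np

def incidence_matrix(n, edges):
    """Weighted incidence matrix B of an undirected graph: one row per
    edge e = (i, j, w), with B[e, i] = sqrt(w) and B[e, j] = -sqrt(w)
    under a canonical orientation, so that U = B^T B."""
    B = np.zeros((len(edges), n))
    for e, (i, j, w) in enumerate(edges):
        B[e, i] = np.sqrt(w)
        B[e, j] = -np.sqrt(w)
    return B

B = incidence_matrix(3, [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 1.0)])
U = B.T @ B                                    # the weighted triangle Laplacian
assert np.allclose(U, U.T)                     # symmetric
assert np.allclose(U @ np.ones(3), 0)          # 1 lies in the kernel
assert np.min(np.linalg.eigvalsh(U)) > -1e-9   # PSD, as U = B^T B must be
```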
Random Walk Matrix: A matrix $\mathbf{W} \in \mathbb{R}^{n \times n}$ is called a random walk matrix if $\mathbf{W}_{ij} \ge 0$ for all $i, j$ and $\mathbf{1}^{\top}\mathbf{W} = \mathbf{1}^{\top}$. To every random walk matrix $\mathbf{W}$ we associate a directed graph $G_{\mathbf{W}}$ with vertices $[n]$ and an edge from $j$ to $i$ of weight $\mathbf{W}_{ij}$ for all $i, j$ with $\mathbf{W}_{ij} \neq 0$. Note that if $\mathcal{L} = \mathbf{D} - \mathbf{A}^{\top}$ is a directed Laplacian, then $\mathbf{W} = \mathbf{A}^{\top}\mathbf{D}^{-1}$ is a random walk matrix, and the directed graphs associated with $\mathcal{L}$ and $\mathbf{W}$ are identical.
Lazy Random Walk Matrix: Given a random walk matrix $\mathbf{W}$, the $\beta$-lazy random walk matrix associated with $\mathbf{W}$ for $\beta \in [0, 1]$ is given by $\mathbf{W}_{\beta} = \beta\mathbf{I} + (1 - \beta)\mathbf{W}$. When $\beta = \frac{1}{2}$ we call this a lazy random walk matrix for short and typically denote it $\widetilde{\mathbf{W}}$.
Personalized PageRank Matrix: Given a random walk matrix $\mathbf{W}$, the personalized PageRank matrix with restart probability $\beta \in (0, 1]$ is given by $\mathbf{M}_{\beta} = \beta(\mathbf{I} - (1 - \beta)\mathbf{W})^{-1}$. Given any probability vector $s$, the personalized PageRank vector with restart probability $\beta$ and restart vector $s$ is the vector $p$ which satisfies $p = \beta s + (1 - \beta)\mathbf{W}p$. Rearranging terms, we see that $p = \beta(\mathbf{I} - (1 - \beta)\mathbf{W})^{-1}s = \mathbf{M}_{\beta}s$, hence justifying our naming of $\mathbf{M}_{\beta}$ as the personalized PageRank matrix.
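The defining equation $p = \beta s + (1 - \beta)\mathbf{W}p$ can be solved directly for small examples; the sketch below (illustrative, using a dense solve rather than the paper's methods) computes a personalized PageRank vector on a directed 3-cycle:

```python
import numpy as np

def personalized_pagerank(W, s, beta):
    """Personalized PageRank vector p for column-stochastic W:
    p = beta*s + (1-beta)*W p, i.e. (I - (1-beta) W) p = beta*s."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - (1 - beta) * W, beta * s)

# Column-stochastic walk matrix of the directed 3-cycle 0 -> 1 -> 2 -> 0.
W = np.array([[0., 0., 1.],
              [1., 0., 0.],
              [0., 1., 0.]])
s = np.array([1., 0., 0.])             # restart at vertex 0
p = personalized_pagerank(W, s, beta=0.15)
assert np.isclose(p.sum(), 1.0)        # the PPR vector is a distribution
assert p[0] > p[1] > p[2]              # mass decays along the cycle
```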
2.3 Directed Laplacians of Strongly Connected Graphs
Here we give properties of a directed Laplacian, $\mathbf{L} = \mathbf{D} - \mathbf{A}^\top$, whose associated graph, $G_{\mathbf{L}}$, is strongly connected that we use throughout the paper. Formally, we provide Lemma 1, which shows that $\mathbf{L}$ always has a positive Eulerian scaling, that is an $x \in \mathbb{R}^n_{>0}$ with $\mathbf{L} \operatorname{diag}(x)$ Eulerian, and that this scaling is given by the stationary distribution of the associated random walk matrix. Using this we completely characterize the kernel of $\mathbf{L}$ and $\mathbf{L}^\top$.
For a directed Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{A}^\top$ whose associated graph is strongly connected there exists a positive vector $s \in \mathbb{R}^n_{>0}$ (unique up to scaling) such that the following equivalent conditions hold.
$\mathbf{W} s = s$ for the random walk matrix $\mathbf{W} = \mathbf{A}^\top \mathbf{D}^{-1}$ associated with $\mathbf{L}$.
$\mathbf{L} \operatorname{diag}(x)$ for $x = \mathbf{D}^{-1} s$ is an Eulerian Laplacian.
If we scale $s$ so that $\|s\|_1 = 1$ then we call $s$ the stationary distribution associated with the random walk on the associated graph $G_{\mathbf{L}}$. We call any vector $x$ such that $\mathbf{L} \operatorname{diag}(x)$ is an Eulerian Laplacian an Eulerian scaling for $\mathbf{L}$. Furthermore, $\ker(\mathbf{L}) = \operatorname{span}(\mathbf{D}^{-1} s)$ and $\ker(\mathbf{L}^\top) = \operatorname{span}(\mathbf{1})$.
Since the graph associated with $\mathbf{L} = \mathbf{D} - \mathbf{A}^\top$ is strongly connected, note that for any $\beta \in (0, 1)$ the lazy random walk matrix $\mathbf{W}_\beta = \beta \mathbf{I} + (1 - \beta) \mathbf{W}$ is non-negative, irreducible, and aperiodic with all columns having $\ell_1$ norm of $1$. Therefore, by the Perron-Frobenius Theorem we know $\rho(\mathbf{W}_\beta) = 1$ and therefore there exists a positive vector $s$, unique up to scaling, such that $\mathbf{W}_\beta s = s$, or equivalently $\mathbf{W} s = s$. Therefore, by definition $\mathbf{A}^\top \mathbf{D}^{-1} s = s$ and $\mathbf{L} \mathbf{D}^{-1} s = 0$. Furthermore, this implies that for $x = \mathbf{D}^{-1} s$ we have $\mathbf{L} \operatorname{diag}(x) \mathbf{1} = \mathbf{L} x = 0$, and as $\mathbf{L}$ is a Z-matrix so is $\mathbf{L} \operatorname{diag}(x)$, and therefore $\mathbf{L} \operatorname{diag}(x)$ is an Eulerian Laplacian. Furthermore, clearly $\mathbf{L} \operatorname{diag}(x)$ being an Eulerian Laplacian implies $\mathbf{L} x = 0$, and therefore we see that the conditions are all equivalent. Lastly, we know that $\frac{1}{2}\left(\mathbf{L} \operatorname{diag}(x) + \operatorname{diag}(x) \mathbf{L}^\top\right)$ is a symmetric Laplacian associated with a connected graph. Therefore it is PSD, having only $\operatorname{span}(\mathbf{1})$ as its kernel, yielding our characterization of the kernels of $\mathbf{L}$ and $\mathbf{L}^\top$. ∎
Note that Lemma 1 immediately implies that any random walk matrix $\mathbf{W}$ associated with a strongly connected graph has a unique stationary distribution $s$, i.e. $\mathbf{W} s = s$ and $\|s\|_1 = 1$. Furthermore, we see that all $\beta$-lazy random walks $\mathbf{W}_\beta$ for $\beta \in [0, 1)$ associated with $\mathbf{W}$ have the same stationary distribution.
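As a concrete illustration of Lemma 1 (a hypothetical toy graph, with a dense eigendecomposition in place of the fast algorithms of this paper), one can compute the stationary distribution of a small strongly connected graph and check that $x = \mathbf{D}^{-1} s$ is an Eulerian scaling:

```python
import numpy as np

# Hypothetical weighted adjacency matrix A of a strongly connected digraph.
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])
D = np.diag(A.sum(axis=1))           # diagonal matrix of out degrees
L = D - A.T                          # directed Laplacian: 1^T L = 0
W = A.T @ np.linalg.inv(D)           # associated random walk matrix

# Stationary distribution s: the Perron eigenvector of W (eigenvalue 1).
vals, vecs = np.linalg.eig(W)
s = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
s = s / s.sum()                      # normalize to a probability vector

# x = D^{-1} s is an Eulerian scaling: L diag(x) has zero row sums
# (zero column sums hold for any directed Laplacian).
x = np.linalg.solve(D, s)
E = L @ np.diag(x)
assert np.allclose(E @ np.ones(3), 0)
assert np.allclose(np.ones(3) @ E, 0)
```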
3 Computing the Stationary Distribution
Here we show how to compute the stationary distribution given an $\alpha$-RCDD Z-matrix linear system solver. Throughout this section, we let $\mathbf{L} = \mathbf{D} - \mathbf{A}^\top \in \mathbb{R}^{n \times n}$ denote a directed Laplacian and our primary goal in this section is to compute an approximate stationary vector $s$ such that $\mathbf{L} \operatorname{diag}(s)$ is approximately Eulerian. The main result of this section is the following:
Theorem 2 (Stationary Computation Using a RCDD Solver).
Given $\varepsilon \in (0, 1)$ and $\mathbf{L} = \mathbf{D} - \mathbf{A}^\top$, a directed Laplacian with $m$ nonzero entries, in time $O\left((m + \mathcal{T}_{\mathrm{solve}}) \log(n \kappa / \varepsilon)\right)$ Algorithm LABEL:alg:stationary_computation computes an approximate stationary distribution $s$ such that $\operatorname{diag}(f) + \mathbf{L} \operatorname{diag}(s)$ is $\alpha$-RCDD where $\alpha = O(\varepsilon)$, $f \leq \alpha\, \mathbf{D} s$ entrywise, and $\mathcal{T}_{\mathrm{solve}}$ is the cost of computing an $\varepsilon$-approximate solution to an $\alpha$-RCDD Z-matrix linear system with $O(m)$ non-zero entries, i.e. computing $x$ such that $\|x - \mathbf{M}^{-1} b\| \leq \varepsilon \|\mathbf{M}^{-1} b\|$ for an $\alpha$-RCDD Z-matrix $\mathbf{M}$ with $O(m)$ non-zero entries. Furthermore, $\kappa(s) = O(\kappa)$, where $\kappa(s)$ is the ratio between the largest and smallest elements of $s$, and the diagonals of all intermediate RCDD systems also have condition number $O(\kappa)$.
The remainder of this section is divided as follows:
3.1 The Approach
Our approach to computing the stationary distribution is broadly related to the work of Daitch and Spielman [DaitchS08] for computing the diagonal scaling that makes a symmetric M-matrix diagonally dominant. However, the final rescaling that we are trying to calculate is guaranteed to exist by the uniqueness of the stationary distribution, which in turn follows from the Perron-Frobenius theorem. The fact that we are rescaling and computing on the same objective leads to a more direct iterative process.
As in [DaitchS08] we use an iterative method that brings a matrix increasingly close to being RCDD. However, our convergence argument is through potential functions instead of through combinatorial entry counts; instead of making parts of $\mathbf{L}$ RCDD incrementally, we instead start with a relaxation of $\mathbf{L}$ that is RCDD and iteratively bring this matrix closer to $\mathbf{L}$ while maintaining that it is RCDD. We remark that this scheme can also be adapted to find rescalings of symmetric M-matrices.
Our algorithm hinges on two key insights. The first is that if we have positive vectors $d$ and $f$ so that $\mathbf{M} = \operatorname{diag}(f) + \mathbf{L} \operatorname{diag}(d)$ is $\alpha$-RCDD, then for any vector $z > 0$ with $\|z\|_1 = 1$ we can compute the stationary distribution of the directed Laplacian $\widetilde{\mathbf{L}} = \mathbf{M} - z f^\top$ by solving a single linear system in $\mathbf{M}$. Note that $\mathbf{M}$ is clearly invertible (Lemma 4) whereas $\widetilde{\mathbf{L}}$ has a kernel (Lemma 1). Furthermore, $\widetilde{\mathbf{L}}$ is simply a rank-1 update of $\mathbf{M}$ and consequently its kernel has a closed form, namely $\operatorname{span}(\mathbf{M}^{-1} z)$ (Lemma 3). Since the stationary distribution of $\widetilde{\mathbf{L}}$ is given by its kernel (Lemma 1) we obtain the result.
This first insight implies that if we have an RCDD Z-matrix $\mathbf{M}$ we can compute the stationary distribution of a related directed Laplacian $\widetilde{\mathbf{L}}$; however, a priori it is unclear how this allows us to compute the stationary distribution of $\mathbf{L}$. The second insight is that if we compute the stationary distribution of $\widetilde{\mathbf{L}}$, e.g. we compute some positive $x$ such that $\widetilde{\mathbf{L}} \operatorname{diag}(x)$ is an Eulerian Laplacian, then $\mathbf{M} \operatorname{diag}(x)$ is strictly RCDD. Since $\widetilde{\mathbf{L}} \operatorname{diag}(x)$ is Eulerian, $z f^\top \operatorname{diag}(x)$ is an all positive matrix whose entries are less than the off-diagonal entries of $\widetilde{\mathbf{L}} \operatorname{diag}(x)$ in absolute value. In other words, adding $z f^\top \operatorname{diag}(x)$ back to $\widetilde{\mathbf{L}} \operatorname{diag}(x)$ strictly decreases the absolute value of the off-diagonal entries and increases the value of the diagonal entries, causing $\mathbf{M} \operatorname{diag}(x) = \widetilde{\mathbf{L}} \operatorname{diag}(x) + z f^\top \operatorname{diag}(x)$ to be strictly RCDD. Consequently, given $x$ we can hope to decrease $f$ to obtain an $\alpha$-RCDD Z-matrix $\operatorname{diag}(f') + \mathbf{L} \operatorname{diag}(d')$ where $d' = \operatorname{diag}(x)\, d$.
Combining these insights naturally yields our algorithm. We compute an initial $\alpha$-RCDD Z-matrix $\mathbf{M}^{(0)} = \operatorname{diag}(f^{(0)}) + \mathbf{L} \operatorname{diag}(d^{(0)})$ and compute $x^{(0)} = (\mathbf{M}^{(0)})^{-1} z^{(0)}$ for $\mathbf{M}^{(0)}$ with $z^{(0)} = f^{(0)} / \|f^{(0)}\|_1$ so that $(\mathbf{M}^{(0)} - z^{(0)} (f^{(0)})^\top) \operatorname{diag}(x^{(0)})$ is Eulerian. We then compute the smallest $f^{(1)}$ so that $\operatorname{diag}(f^{(1)}) + \mathbf{L} \operatorname{diag}(d^{(1)})$, with $d^{(1)} = \operatorname{diag}(x^{(0)})\, d^{(0)}$, is again $\alpha$-RCDD and repeat. What remains to show is that for a proper choice of the initial $d^{(0)}$ and $f^{(0)}$, as well as a proper choice of $z$ in each iteration, this algorithm converges quickly provided we use a sufficiently accurate RCDD linear system solver.
The only missing ingredient for such an analysis is how to measure the progress of the algorithm. Note that $\operatorname{diag}(f) + \mathbf{L} \operatorname{diag}(d)$ is $\alpha$-CDD if and only if $f \geq \alpha\, \mathbf{D} d$ (Lemma 6) and consequently $\alpha\, \mathbf{D} d$ is the smallest value of $f$ we can hope for to keep the matrix $\alpha$-RCDD. We show that each iteration of the procedure outlined above brings $f$ rapidly closer to $\alpha\, \mathbf{D} d$. In particular, we show that in each iteration we can ensure that the gap between $f$ and $\alpha\, \mathbf{D} d$ decreases by a multiplicative constant up to a small additive factor depending only on $\alpha$ (Lemma 10). Putting this all together we achieve an algorithm (see Section 3.2) that in $O(\log(n \kappa / \varepsilon))$ iterations finds the desired $s$.
3.2 The Algorithm
Following the approach outlined in the previous section, our iterative algorithm for computing the stationary distribution is given by Algorithm LABEL:alg:stationary_computation. First, we compute positive vectors $d^{(0)}$ and $f^{(0)}$ such that $\mathbf{M}^{(0)} = \operatorname{diag}(f^{(0)}) + \mathbf{L} \operatorname{diag}(d^{(0)})$ is $\alpha$-RCDD. In particular, we let $d^{(0)} = \mathbf{1}$ and we let $f^{(0)}$ be the entry-wise minimal non-negative vector such that $\operatorname{diag}(f^{(0)}) + \mathbf{L}$ is $\alpha$-RCDD. This choice of $f^{(0)}$ allows us to reason about the size of $\|f^{(0)}\|_1$ easily (see Lemma 7).
Next, we iterate, computing $f^{(k)}$ such that $\mathbf{M}^{(k)} = \operatorname{diag}(f^{(k)}) + \mathbf{L} \operatorname{diag}(d^{(k)})$ is $\alpha$-RCDD. We pick the $z$ discussed in the previous section to be $z^{(k)} = f^{(k)} / \|f^{(k)}\|_1$ as this naturally corresponds to the relative amount of each entry of $f^{(k)}$ we want to remove. We then let $d^{(k+1)} = \operatorname{diag}(x^{(k)})\, d^{(k)}$ for $x^{(k)} = (\mathbf{M}^{(k)})^{-1} z^{(k)}$, so that $\mathbf{L} \operatorname{diag}(d^{(k+1)})$ is a directed Laplacian which is nearly Eulerian. We let $f^{(k+1)}$ be the entry-wise minimal vector such that $\operatorname{diag}(f^{(k+1)}) + \mathbf{L} \operatorname{diag}(d^{(k+1)})$ is $\alpha$-RCDD and then we repeat. Outputting the final computed scaling completes the result.
To complete the specification of the algorithm, all that remains is to discuss the precision with which we need to carry out the operations of the algorithm. There are two places in particular where one might worry that error from inexact arithmetic could hinder the progress of the algorithm. The first is the degree of precision to which we solve the linear system in $\mathbf{M}^{(k)}$, i.e. compute $x^{(k)} = (\mathbf{M}^{(k)})^{-1} z^{(k)}$. We show that computing instead an approximate $\tilde{x}^{(k)}$ such that $\|\tilde{x}^{(k)} - x^{(k)}\| \leq \varepsilon' \|x^{(k)}\|$ for sufficiently small $\varepsilon'$ suffices. The second is in computing $d^{(k)}$. Unrolling the iterations of the algorithm, we see that $d^{(k)}$ is the entry-wise product of all previous vectors $x^{(j)}$ with $d^{(0)}$. Consequently, one might worry that, since the number of iterations can be large, errors could accumulate. We show that computing a bounded precision $\tilde{d}^{(k)}$ such that $\|\tilde{d}^{(k)} - d^{(k)}\| \leq \varepsilon'' \|d^{(k)}\|$ for sufficiently small $\varepsilon''$ suffices.
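The iteration just described can be sketched in dense arithmetic. In the sketch below, `np.linalg.solve` stands in for the fast RCDD solver, and the graph, the value of $\alpha$, and the iteration count are hypothetical illustrative choices, not the paper's actual parameters or termination rule:

```python
import numpy as np

# Hypothetical weighted adjacency matrix of a strongly connected digraph.
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])
n = A.shape[0]
D = A.sum(axis=1)                    # out degrees (as a vector)
L = np.diag(D) - A.T                 # directed Laplacian, 1^T L = 0
alpha = 0.1

def minimal_f(d):
    # Entry-wise minimal f making diag(f) + L diag(d) alpha-RCDD
    # (cf. Lemma 8): f = max(alpha * D d, (1 + alpha) A^T d - D d).
    return np.maximum(alpha * D * d, (1 + alpha) * (A.T @ d) - D * d)

d = np.ones(n) / n
for _ in range(300):
    f = minimal_f(d)
    M = np.diag(f) + L @ np.diag(d)  # alpha-RCDD Z-matrix
    z = f / f.sum()                  # relative amount of excess to remove
    x = np.linalg.solve(M, z)        # spans the kernel of M - z f^T
    d = d * x                        # fold the computed scaling into d
    d = d / d.sum()                  # renormalize for numerical stability

# d now approximates an Eulerian scaling of L, and s = D d (normalized)
# approximates the stationary distribution of the associated random walk.
s = D * d
s = s / s.sum()
W = A.T / D                          # random walk matrix A^T D^{-1}
assert np.linalg.norm(L @ d, np.inf) < 1e-6
assert np.allclose(W @ s, s, atol=1e-5)
```

With exact solves this toy instance converges linearly to the Eulerian scaling; the paper's analysis quantifies this progress and the solve accuracy required.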
3.3 The Analysis
Here we prove Theorem 2 by showing the correctness of our algorithm for computing the stationary distribution, Algorithm LABEL:alg:stationary_computation. We split our proof into multiple parts:
Section 3.3.1: we show that $x = \mathbf{M}^{-1} z$ is indeed a positive vector that makes $\widetilde{\mathbf{L}} \operatorname{diag}(x)$ Eulerian
Section 3.3.2: we provide bounds on $f^{(k)}$
Section 3.3.3: we show how much the gap between $f$ and $\alpha\, \mathbf{D} d$ decreases between iterations
Section 3.3.4: we bound the error induced by solving linear systems approximately
Note that much of the analysis is done in slightly greater generality than required. In particular, much of our analysis works so long as $z > 0$ with $\|z\|_1 = 1$; however, we constrain $z = f / \|f\|_1$ so we can easily reason about the stability of the procedure, measure the quality of the output stationary distribution, and simplify our analysis. Also note that minimal effort was taken to control the actual value of $\alpha$ beyond what is needed to simplify the proof that this choice suffices.
3.3.1 Properties of $\mathbf{M}^{-1} z$
Here we prove that $x = \mathbf{M}^{-1} z$ is indeed a positive vector such that $\widetilde{\mathbf{L}} \operatorname{diag}(x)$ is Eulerian. First we provide two general lemmas: on the kernel of a matrix after a rank one update, Lemma 3, and on the invertibility of strictly RCDD matrices, Lemma 4. Then, using these lemmas, we prove the main result of this subsection, Lemma 5, which yields our desired properties of $x$.
$\ker(\mathbf{M} - u v^\top) \subseteq \operatorname{span}(\mathbf{M}^{-1} u)$ for all invertible $\mathbf{M} \in \mathbb{R}^{n \times n}$ and $u, v \in \mathbb{R}^n$.
If $\ker(\mathbf{M} - u v^\top) = \{0\}$ the claim follows trivially. Otherwise, there is $x$ with $x \neq 0$ such that $\mathbf{M} x = (v^\top x) u$. Since $\mathbf{M}$ is invertible it follows that $x = (v^\top x)\, \mathbf{M}^{-1} u \in \operatorname{span}(\mathbf{M}^{-1} u)$. ∎
Every strictly RCDD matrix is invertible.
Let $\mathbf{M}$ be an arbitrary strictly RCDD matrix. By definition, $\mathbf{M}$ is $\alpha$-RCDD for some $\alpha > 0$. Consequently, there exists $\delta > 0$ such that $\mathbf{M} - \delta \mathbf{I}$ is $\beta$-RCDD for some $\beta \geq 0$. Now, for all $x$,
$x^\top \mathbf{M} x = x^\top (\mathbf{M} - \delta \mathbf{I}) x + \delta \|x\|_2^2\,.$
However, $\frac{1}{2}\left((\mathbf{M} - \delta \mathbf{I}) + (\mathbf{M} - \delta \mathbf{I})^\top\right)$ is clearly a $\beta$-RCDD symmetric matrix and therefore is PSD. Consequently $x^\top \mathbf{M} x \geq \delta \|x\|_2^2 > 0$ for all $x \neq 0$, and therefore $\mathbf{M}$ doesn't have a non-trivial kernel and is invertible. ∎
Let $\mathbf{M}$ be a Z-matrix that is strictly RCDD and let $\widetilde{\mathbf{L}} = \mathbf{M} - z f^\top$ with $f^\top = \mathbf{1}^\top \mathbf{M}$ and $z > 0$, $\|z\|_1 = 1$. Then $x = \mathbf{M}^{-1} z$ is all positive and $\widetilde{\mathbf{L}} \operatorname{diag}(x)$ is an Eulerian Laplacian.
Since $\mathbf{M}$ is strictly RCDD, by Lemma 4 it is invertible. Furthermore, since $\mathbf{M}$ is an invertible RCDD Z-matrix, by Lemma LABEL:lem:inverse-alphacdd_bound we know that $\mathbf{M}^{-1}$ is entrywise non-negative and $x = \mathbf{M}^{-1} z$ is positive. Now $\widetilde{\mathbf{L}}$ is a directed Laplacian and therefore has a non-trivial right kernel. Consequently, by Lemma 3 and the fact that $\ker(\widetilde{\mathbf{L}}) \subseteq \operatorname{span}(\mathbf{M}^{-1} z)$ we know that $x$ is in the kernel of $\widetilde{\mathbf{L}}$. Consequently, $\widetilde{\mathbf{L}} \operatorname{diag}(x) \mathbf{1} = \widetilde{\mathbf{L}} x = 0$, yielding the result. ∎
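The rank-1 construction and Lemma 5 can be checked numerically on a toy instance (hypothetical weights; dense numpy in place of an RCDD solver):

```python
import numpy as np

# Toy alpha-RCDD Z-matrix M = diag(f) + L diag(d) built from a
# hypothetical strongly connected digraph.
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])
D = A.sum(axis=1)
L = np.diag(D) - A.T
d = np.ones(3)
alpha = 0.5
f = np.maximum(alpha * D * d, (1 + alpha) * (A.T @ d) - D * d)
M = np.diag(f) + L @ np.diag(d)

# Rank-1 update: Ltil = M - z f^T with z = f / ||f||_1 is a directed
# Laplacian (columns sum to zero, off-diagonals non-positive).
z = f / f.sum()
Ltil = M - np.outer(z, f)
assert np.allclose(np.ones(3) @ Ltil, 0)
off = Ltil - np.diag(np.diag(Ltil))
assert np.all(off <= 1e-12)

# x = M^{-1} z is positive and spans the kernel of Ltil, so
# Ltil diag(x) is Eulerian (Lemma 5).
x = np.linalg.solve(M, z)
assert np.all(x > 0)
assert np.allclose(Ltil @ x, 0, atol=1e-10)
```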
3.3.2 Properties of $f^{(k)}$
Here we provide bounds on what $f$ needs to be for $\operatorname{diag}(f) + \mathbf{L} \operatorname{diag}(d)$ to be $\alpha$-RCDD. First, in Lemma 6 we give necessary and sufficient conditions for $\operatorname{diag}(f) + \mathbf{L} \operatorname{diag}(d)$ to be $\alpha$-CDD. This provides a lower bound on all the $f^{(k)}$. Then in Lemma 7 we upper bound $\|f^{(0)}\|_1$. We conclude with Lemma 8, which yields a closed formula for the $f^{(k)}$.
Lemma 6 (Conditions for $\alpha$-CDD).
For all vectors $d, f \in \mathbb{R}^n_{\geq 0}$ and directed Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{A}^\top$, the matrix $\operatorname{diag}(f) + \mathbf{L} \operatorname{diag}(d)$ is $\alpha$-CDD if and only if $f \geq \alpha\, \mathbf{D} d$ entrywise.
By definition of $\mathbf{L} = \mathbf{D} - \mathbf{A}^\top$ we have that $\operatorname{diag}(f) + \mathbf{L} \operatorname{diag}(d)$ is column $\alpha$-CDD if and only if entrywise $f + \mathbf{D} d \geq (1 + \alpha)\, \mathbf{D} d$. Subtracting $\mathbf{D} d$ from each side yields the result. ∎
Lemma 7 (Bound on $f^{(0)}$). We have $\|f^{(0)}\|_1 \leq (1 + 2\alpha)\, \mathbf{1}^\top \mathbf{D} \mathbf{1}$.
Since $f^{(0)}$ is the entry-wise minimal vector such that $\operatorname{diag}(f^{(0)}) + \mathbf{L}$ is $\alpha$-RCDD and since $d^{(0)} = \mathbf{1}$ we have entrywise $f^{(0)} = \max\left\{\alpha\, \mathbf{D} \mathbf{1},\ (1 + \alpha)\, \mathbf{A}^\top \mathbf{1} - \mathbf{D} \mathbf{1}\right\}\,.$
However, since $\max\{a, b\} \leq \max\{a, 0\} + \max\{b, 0\}$ we have that $\|f^{(0)}\|_1 \leq \alpha\, \mathbf{1}^\top \mathbf{D} \mathbf{1} + (1 + \alpha)\, \mathbf{1}^\top \mathbf{A}^\top \mathbf{1} = (1 + 2\alpha)\, \mathbf{1}^\top \mathbf{D} \mathbf{1}\,,$
where we used that $\mathbf{1}^\top \mathbf{A}^\top \mathbf{1} = \mathbf{1}^\top \mathbf{D} \mathbf{1}$, i.e. the total edge weight equals the total out degree. ∎
Lemma 8 (Formula for $f$).
For $\alpha \geq 0$ and $d \in \mathbb{R}^n_{\geq 0}$, let $\mathbf{M} = \operatorname{diag}(f) + \mathbf{L} \operatorname{diag}(d)$ for directed Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{A}^\top$. Then $f$, the entry-wise minimal vector such that $\mathbf{M}$ is $\alpha$-RCDD, is given by $f = \max\left\{\alpha\, \mathbf{D} d,\ (1 + \alpha)\, \mathbf{A}^\top d - \mathbf{D} d\right\}$ where the maximum is taken entrywise.
By Lemma 6 we know that $\mathbf{M}$ is $\alpha$-CDD if and only if $f \geq \alpha\, \mathbf{D} d$. Furthermore, $\mathbf{M}$ is $\alpha$-RDD if and only if $f + \mathbf{D} d \geq (1 + \alpha)\, \mathbf{A}^\top d$, which happens if and only if $f \geq (1 + \alpha)\, \mathbf{A}^\top d - \mathbf{D} d$.
Taking the maximum of the two lower bounds on $f$ yields the result. ∎
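This formula is easy to verify numerically. The sketch below (hypothetical weights, scaling $d$, and $\alpha$) checks both that the resulting matrix is $\alpha$-RCDD and that one of the two constraints is tight in every coordinate, confirming minimality:

```python
import numpy as np

# Toy directed Laplacian L = D - A^T and a positive scaling d.
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])
D = A.sum(axis=1)
L = np.diag(D) - A.T
d = np.array([0.5, 1.0, 2.0])
alpha = 0.25

# Entry-wise minimal f making M = diag(f) + L diag(d) alpha-RCDD.
f = np.maximum(alpha * D * d, (1 + alpha) * (A.T @ d) - D * d)
M = np.diag(f) + L @ np.diag(d)

off = M - np.diag(np.diag(M))
row = np.abs(off).sum(axis=1)     # off-diagonal row sums
col = np.abs(off).sum(axis=0)     # off-diagonal column sums
assert np.all(np.diag(M) >= (1 + alpha) * row - 1e-12)   # alpha-RDD
assert np.all(np.diag(M) >= (1 + alpha) * col - 1e-12)   # alpha-CDD
# Minimality: one of the two constraints is tight in every coordinate.
assert np.allclose(np.minimum(np.diag(M) - (1 + alpha) * row,
                              np.diag(M) - (1 + alpha) * col), 0, atol=1e-12)
```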
3.3.3 Progress from Updating $d$
For and with we have .
Let . Then we have that
We can bound trivially by
We can bound the first term by Cauchy-Schwarz