Perron-Frobenius Theory in Nearly Linear Time: Positive Eigenvectors, M-matrices, Graph Kernels, and Other Applications


AmirMahdi Ahmadinejad
Stanford University
ahmadi@stanford.edu
   Arun Jambulapati
Stanford University
jmblpati@stanford.edu
   Amin Saberi
Stanford University
saberi@stanford.edu
   Aaron Sidford
Stanford University
sidford@stanford.edu
Abstract

In this paper we provide nearly linear time algorithms for several problems closely associated with the classic Perron-Frobenius theorem, including computing Perron vectors, i.e. entrywise non-negative eigenvectors of non-negative matrices, and solving linear systems in asymmetric M-matrices, a generalization of Laplacian systems. The running times of our algorithms depend nearly linearly on the input size and polylogarithmically on the desired accuracy and problem condition number.

Leveraging these results we also provide improved running times for a broader range of problems including computing random walk-based graph kernels, computing Katz centrality, and more. The running times of our algorithms improve upon previously known results which either depended polynomially on the condition number of the problem, required quadratic time, or only applied to special cases.

We obtain these results by providing new iterative methods for reducing these problems to solving linear systems in Row-Column Diagonally Dominant (RCDD) matrices. Our methods are related to the classic shift-and-invert preconditioning technique for eigenvector computation and constitute the first alternative to the result in Cohen et al. (2016) for reducing stationary distribution computation and solving directed Laplacian systems to solving RCDD systems.

1 Introduction

The Perron-Frobenius theorem of O. Perron (1907) and G. Frobenius (1912) is a fundamental result in linear algebra that has had far reaching implications over the past century. Notable applications include the study of Markov chains in probability theory, the theory of dynamical systems, economic analysis (such as Leontief’s input-output model), and modeling and analysis of social networks.

In its simplest form, the Perron-Frobenius theorem states that every positive square matrix $\mathbf{A}$ has a unique largest real eigenvalue $\lambda > 0$ such that $\mathbf{A}\vec{v} = \lambda\vec{v}$ for some positive vector $\vec{v}$. This vector, known as the Perron-Frobenius vector or Perron vector, plays a central role in understanding positive matrices and a broad class of linear systems known as M-matrices. Through the natural representation of directed graphs with non-negative edge weights via their adjacency matrices, the Perron vector provides a critical tool to understand numerous graph related properties.

Given its far-reaching applications and its role as a cornerstone to numerous disciplines, providing faster algorithms for computing Perron vectors and positive eigenvectors of non-negative matrices has been a central problem studied extensively for decades [24, 39, 28, 14, 9, 21, 7, 13, 29]. However, despite extensive research on this problem and numerous proposed algorithms (see [11] for a survey), all previously known algorithms either run in quadratic time, are applicable only in special cases [8, 3, 4], or depend more than polylogarithmically on the desired accuracy and the relevant condition numbers (see Section 1.2 for further discussion).

In this paper, we provide the first nearly linear time algorithm for computing Perron vectors and prove the following theorem. (We use the term "nearly linear time" to refer to any running time of the form $O(m^{1+o(1)})$, where $m$ is a measure of the size of the input. This is in contrast to certain works [4, 35] which use the phrase "almost linear" to refer to this running time and reserve the phrase "nearly linear" for running times which hide only polylogarithmic factors. We adopt this convention as our running times depend on the best running time for solving row-column diagonally dominant systems [3]. While the state of the art for those is currently almost linear, an expected improvement to nearly linear time would yield the same improvement for our running times.)

Theorem 1 (Perron Vector Nearly Linear Time).

Given a non-negative irreducible matrix $\mathbf{A}$ and $\epsilon > 0$, there is an algorithm (see Algorithm 9) which in nearly linear time, with high probability in $n$, computes a real number and positive vectors approximating $\rho(\mathbf{A})$ and the left and right Perron vectors of $\mathbf{A}$, respectively, to within an $\epsilon$ error.

To achieve this result we consider the more general problem of solving linear systems for a broad class of matrices, known as M-matrices. A matrix $\mathbf{M}$ is an M-matrix if it can be written in the form $\mathbf{M} = s\mathbf{I} - \mathbf{B}$ where $s > 0$ and $\mathbf{B}$ is a non-negative matrix with spectral radius (i.e. largest eigenvalue magnitude) at most $s$. These systems generalize symmetric diagonally dominant systems [33], row-column diagonally dominant systems [3], and factor width 2 systems [8] and have numerous applications.
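To make this definition concrete, the following minimal numpy sketch checks it directly on a small dense matrix. The helper names, tolerance, and example matrix are ours, and the dense eigenvalue computation is used purely for illustration; the whole point of the algorithms in this paper is to avoid such computations on large sparse inputs.

```python
import numpy as np

def is_invertible_m_matrix(M, tol=1e-9):
    """Check the definition directly: M = s*I - B with B >= 0 entrywise and rho(B) < s.

    Taking s to be the largest diagonal entry of M always yields a valid split
    when the off-diagonal entries of M are non-positive.
    """
    if np.any(M - np.diag(np.diag(M)) > tol):    # off-diagonals must be <= 0
        return False
    s = np.max(np.diag(M))
    B = s * np.eye(M.shape[0]) - M                # entrywise non-negative
    rho = np.max(np.abs(np.linalg.eigvals(B)))    # dense spectral radius (illustration only)
    return rho < s - tol

# A strictly diagonally dominant matrix with non-positive off-diagonals is an M-matrix.
M = np.array([[ 2.0, -1.0,  0.0],
              [-0.5,  2.0, -1.0],
              [ 0.0, -0.5,  2.0]])
print(is_invertible_m_matrix(M))   # True
```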

The previously known algorithms for solving linear systems in M-matrices suffer from the same issues as we mentioned for computing the Perron vector. In this paper, we present the first nearly linear time algorithm for solving this problem and prove the following theorem:

Theorem 2 (M-Matrices in Nearly Linear Time).

Let $\mathbf{M} = s\mathbf{I} - \mathbf{A}$ be an invertible M-matrix with $m$ nonzero entries. For all $\epsilon > 0$ there is an algorithm (see Algorithm 5) which runs in nearly linear time and with high probability computes an operator which, applied to any vector $\vec{b}$, returns an $\epsilon$-approximate solution to $\mathbf{M}\vec{x} = \vec{b}$.

As an immediate corollary of this theorem, we can show that if a matrix $\mathbf{A}$ has the property that after negating its off-diagonal entries it becomes an M-matrix, then we can also solve systems in $\mathbf{A}$ in nearly linear time (see Section D). We provide a specialization of this result for symmetric matrices which shows that we can solve factor-width 2 matrices, that is matrices of the form $\mathbf{M} = \mathbf{B}^{\top}\mathbf{B}$ where each row of $\mathbf{B}$ has at most 2 non-zero entries (see Section D). Further, in Section C we prove a specialization of Theorem 1 for the case where the input matrix is symmetric; in this case we are able to give a shorter analysis with a tighter running time bound.

To prove Theorem 1 we build upon and enhance an exciting line of work on developing nearly linear time algorithms for computing the stationary distribution of Markov chains, a special case of the Perron vector problem, and solving linear systems in directed Laplacians, a special case of the problem of solving M-matrices [3, 4]. This line of work achieved these results in two steps. First, [3] provided a reduction from these problems to solving linear systems in Eulerian Laplacians or, more broadly, row-column diagonally dominant (RCDD) systems. Second, [4] provided a nearly linear time algorithm for solving these systems.

These results suggest a natural line of attack to achieve our results: extend the reduction from [3] to reduce solving M-matrices to solving RCDD matrices. While there are reasons to hope this might be possible (see Section 1.1), there are two significant barriers to this approach (see Section 1.2). First, the analysis in [3] leveraged the fact that directed Laplacians are always M-matrices, and there is a simple linear time procedure to check that a matrix is a directed Laplacian (and therefore an M-matrix). However, in general even deciding whether or not a matrix is an M-matrix involves computing the spectral radius or top eigenvector of a non-negative matrix, and prior to our work there was no nearly linear time algorithm for this. Consequently, extending [3] would require a different type of analysis that leverages properties of M-matrices that cannot be easily tested. Second, while the algorithm and analysis of the reduction in [3] are fairly simple and short, they do not follow any previously known analysis of an iterative method, and thus it is not clear how to modify the algorithm and analysis in a principled manner.

To circumvent these issues, we provide a new algorithm for reducing solving M-matrices to solving RCDD systems that provides the first known alternative to the reduction in [3] for solving directed Laplacians. Interestingly, our algorithm is a close relative of a classic and natural method for computing eigenvectors of a matrix known as shift-and-invert preconditioning [32, 10] and can be easily analyzed by a simple bound on how well different regularized M-matrices approximate or precondition each other. While the careful error analysis of our method is detailed (see Section 4), its core fits on just two pages and can be found in Section 3. We believe this is one of the primary technical contributions of our paper (see Section 1.3 and Section 3 for further discussion).

Finally, we show how to leverage these results to achieve faster running times for a wide range of problems involving Perron vectors, M-matrices, and random walks. In particular, we show our method can be applied to the analysis of Leontief input-output models, computing Katz centrality [17], computing the top singular value and its corresponding vectors of a non-negative matrix, and computing random walk based graph kernels. While some of these are direct applications, others, e.g. computing the top singular value and its corresponding singular vectors of a non-negative matrix, require opening up our framework and leveraging its flexibility (see Section 6).

Just as Laplacian systems have been crucial for solving a broad range of combinatorial problems [18, 22, 27, 34, 35], we hope that the work in this paper will ultimately open the door to the development of even faster and simpler linear algebraic primitives for processing graphs and probing their random walk structure and find even broader applications than those provided in this paper.

1.1 M-Matrices, Positive Matrix Sums, and Perron Vectors

Our results leverage a tight connection between invertible M-matrices, matrix series, random walks, Perron vectors, and RCDD matrices. Here we elaborate on this connection to motivate previous work on these problems and our approach to solving them.

To demonstrate this connection, first observe that a matrix $\mathbf{M}$ is an invertible M-matrix if and only if it is of the form $\mathbf{M} = s\mathbf{I} - \mathbf{A}$ where $\mathbf{A}$ is non-negative and $s > \rho(\mathbf{A})$. In this case it is easy to see that
$$\mathbf{M}^{-1} = \frac{1}{s}\Big(\mathbf{I} - \tfrac{1}{s}\mathbf{A}\Big)^{-1} = \frac{1}{s}\sum_{i \geq 0}\Big(\tfrac{1}{s}\mathbf{A}\Big)^{i}.$$

Consequently, the question of testing whether a matrix is an M-matrix and solving linear systems in it is closely related to the following problems:

Definition 1 (M-Matrix Problem).

Given a matrix $\mathbf{M}$ and a vector $\vec{b}$, either solve the linear system $\mathbf{M}\vec{x} = \vec{b}$ or prove that $\mathbf{M}$ is not an M-matrix.

Definition 2 (Geometric Non-negative Matrix Series Problem).

Given a non-negative matrix $\mathbf{A}$ and a vector $\vec{b}$, compute $\sum_{i \geq 0} \mathbf{A}^{i}\vec{b}$ or prove that it diverges.

Definition 3 (Perron Vector Problem).

Given a non-negative matrix $\mathbf{A}$, compute $\rho(\mathbf{A})$, its largest eigenvalue, and a Perron vector $\vec{v} > \vec{0}$ such that $\mathbf{A}\vec{v} = \rho(\mathbf{A})\vec{v}$.

We provide the first nearly linear time algorithms for the above three problems. This is done by leveraging the somewhat classic fact that the Perron vector problem is also intimately connected to the M-matrix problem. If $\mathbf{A}$ is a non-negative irreducible matrix and $\vec{y}$ is any positive vector, then as $s$ approaches $\rho(\mathbf{A})$ from above, the normalized vectors $(s\mathbf{I} - \mathbf{A})^{-1}\vec{y} / \|(s\mathbf{I} - \mathbf{A})^{-1}\vec{y}\|_{1}$ converge to a non-negative Perron vector of $\mathbf{A}$. Consequently, given a solution of the M-matrix problem we can obtain arbitrarily good approximations to the Perron vector problem.

Second (and perhaps more surprisingly), recent work on solving RCDD systems [3, 4] implies that solutions to the Perron vector problem lead to nearly linear time algorithms for the M-matrix problem. Suppose we have an invertible M-matrix $\mathbf{M} = s\mathbf{I} - \mathbf{A}$ with $s > \rho(\mathbf{A})$ and we have computed positive left and right Perron vectors $\vec{u}$ and $\vec{v}$ of $\mathbf{A}$ respectively; then it is easy to see (see Section 3) that $\mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}}$ is RCDD, where $\mathbf{D}_{\vec{u}}$ and $\mathbf{D}_{\vec{v}}$ are the diagonal matrices associated with $\vec{u}$ and $\vec{v}$ respectively. Consequently, given Perron vectors and RCDD solvers we can solve M-matrices.
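The following small numpy sketch (not the paper's algorithm; the example matrix and iteration counts are arbitrary choices of ours) illustrates this fact: approximate Perron vectors obtained by power iteration yield diagonal scalings that make $s\mathbf{I} - \mathbf{A}$ RCDD.

```python
import numpy as np

# A non-negative irreducible (and aperiodic) matrix.
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])

# Approximate left/right Perron vectors of A by power iteration (illustration only).
u = np.ones(3)
v = np.ones(3)
for _ in range(2000):
    v = A @ v; v /= v.sum()
    u = A.T @ u; u /= u.sum()

rho = (A @ v / v).mean()          # near rho(A) once v has converged
s = 1.01 * rho                    # any s > rho(A) makes M = sI - A an invertible M-matrix
M = s * np.eye(3) - A

# Scale M by the Perron vectors on the left and right.
N = np.diag(u) @ M @ np.diag(v)

# N has non-positive off-diagonal entries, so it is RCDD exactly when all of its
# row sums and column sums are non-negative.
print(np.all(N.sum(axis=1) >= 0), np.all(N.sum(axis=0) >= 0))   # True True
```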

These two results together present a chicken-and-egg problem for solving M-matrices. We can solve M-matrices with Perron vectors and compute Perron vectors with M-matrix solvers; however, it is unclear how to achieve either individually. While there have been previous results for escaping this conundrum to provide solvers for special cases of M-matrices, they are only applicable in settings where it is possible to certify that the input matrix is an M-matrix in nearly linear time. A key result of this paper is that a variant of shift-and-invert preconditioning, carefully analyzed leveraging properties of M-matrices, lets us efficiently escape this problem and obtain nearly linear time algorithms.

1.2 Previous Work

Here we briefly review previous work on solving M-matrices and computing Perron vectors. These problems have been studied extensively and there are too many results to list. Here we briefly survey various areas of research relevant to our theoretical improvements. For a more comprehensive treatment of these problems see [11]. Previous work on the additional applications of these results is deferred to Section 6.

Each of the problems described in Section 1.1 can easily be solved using an algorithm for solving square linear systems. Moreover, given a matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ and a vector $\vec{b}$, it is well-known that the linear system $\mathbf{A}\vec{x} = \vec{b}$ can be solved in $O(n^{\omega})$ time, where $\omega < 2.373$ is the matrix multiplication constant [40], or in $O(n \cdot \mathrm{nnz}(\mathbf{A}))$ time where $\mathrm{nnz}(\mathbf{A})$ is the number of nonzero entries of $\mathbf{A}$. (Note that conjugate gradient achieves the latter running time only using exact arithmetic. It is open how to achieve a comparable running time where arithmetic operations can only be performed on numbers with a polylogarithmic number of bits. See [26] for further discussion.) However, in both theory and practice $\mathbf{A}$ is often sparse, e.g. $\mathrm{nnz}(\mathbf{A}) = O(n)$, in which case these algorithms run in quadratic, i.e. $\Omega(n^{2})$, time, which may be prohibitively expensive for many purposes.

To improve upon the performance of these generic linear system solvers a wide array of iterative methods and practical heuristics have been proposed. However, even in the special case of computing the stationary distribution of Markov chains, until the work of [3, 4], the previously best known running times either only applied in special cases or depended polynomially on the desired accuracy or conditioning of the problem.

Recently, in an important special case of the problems considered in Section 1.1, namely computing the stationary distribution, and solving directed Laplacians (i.e. for these ) it was shown how to solve these problems in nearly linear time [4]. This result was achieved by a result in [4] showing that RCDD matrices can be solved in nearly linear time and a result in [3], reducing each problem to solving RCDD systems with only polylogarithmic overhead.

These works created hope for achieving the results of this paper, e.g. by extending the reduction in [3]; however, there are significant issues with this approach. First, the analysis in [3] leveraged that it is easy to certify that a matrix is a directed Laplacian. The method incrementally computed the associated stationary distribution for this matrix using the fact that a directed Laplacian has a non-trivial kernel. However, as discussed, even certifying that a matrix is an M-matrix involves computing the spectral radius of a non-negative matrix, for which there was no previously known nearly linear time algorithm. Second, the reduction in [3] was not known to be an instance of any previously studied iterative framework and therefore it is difficult to adapt it to new cases.

The only other work directly relevant to the problems we consider was the exciting work of [8] showing that symmetric M-matrices and more broadly factor-width 2 matrices can be solved in nearly linear time, provided an explicit factorization is given, i.e. for the input matrix $\mathbf{M}$ there is a known $\mathbf{B}$ with at most 2 non-zero entries per row such that $\mathbf{M} = \mathbf{B}^{\top}\mathbf{B}$. Note that all symmetric M-matrices are factor-width 2, and though [8] provided a nearly linear time algorithm which given $\mathbf{B}$ could reduce solving $\mathbf{M}$ to solving symmetric diagonally dominant (SDD) matrices, i.e. symmetric RCDD matrices, our result is the first to actually compute this factorization in nearly linear time when it is not given.

1.3 Overview of Approach

We achieve the results of our paper by leveraging the RCDD solvers of [4]. As discussed in Section 1.1 this would suffice to solve M-matrices in nearly linear time provided we could compute Perron vectors in nearly linear time. However, as discussed in Section 1.2 there are barriers to applying the previous methods of [3, 8] for computing Perron vectors in our more general setting.

To overcome this difficulty, we provide a new reduction from solving M-matrices to solving RCDD systems that can serve as an alternative for the reductions in both [3] and [8]. Our method is loosely related to a method known as shift-and-invert preconditioning for computing eigenvectors of matrices [32, 10]. The crux of this method is that if a matrix $\mathbf{A}$ has top eigenvalue $\lambda_{1}$ with top eigenvector $\vec{v}$, then the matrix $(s\mathbf{I} - \mathbf{A})^{-1}$ for $s > \lambda_{1}$ also has top eigenvector $\vec{v}$, with eigenvalue $1/(s - \lambda_{1})$. Consequently, by performing the power method on $(s\mathbf{I} - \mathbf{A})^{-1}$, i.e. solving linear systems in $s\mathbf{I} - \mathbf{A}$, we can compute a top eigenvector of $\mathbf{A}$. Shift-and-invert preconditioning does precisely this, leveraging that the power method may converge faster for this matrix, as bringing $s$ towards $\lambda_{1}$ may accentuate the gap between the top eigenvalue and the rest of the spectrum.
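Below is a brief sketch of this classic shift-and-invert idea in numpy, with a dense `np.linalg.solve` standing in for a fast linear system solver; the function name, the shift choice, and the iteration count are our own illustrative choices.

```python
import numpy as np

def shift_and_invert_perron(A, s, iters=50):
    """Power iteration on (sI - A)^{-1} for s > rho(A): a sketch of classic
    shift-and-invert, using a dense solve as a stand-in for a fast solver."""
    n = A.shape[0]
    M = s * np.eye(n) - A
    v = np.ones(n)
    for _ in range(iters):
        v = np.linalg.solve(M, v)     # one step of the power method on M^{-1}
        v /= np.linalg.norm(v)
    lam = v @ (A @ v) / (v @ v)       # Rayleigh quotient recovers rho(A)
    return lam, v

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])
rho_true = np.max(np.abs(np.linalg.eigvals(A)))
lam, v = shift_and_invert_perron(A, s=1.05 * rho_true)
print(round(lam, 6), round(rho_true, 6))   # the two values agree (approximately)
```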

Now, as we have discussed, computing the left and right Perron vectors of $\mathbf{A}$ suffices to solve linear systems in the M-matrix $s\mathbf{I} - \mathbf{A}$. Furthermore, similar to shift-and-invert, we know that if $s$ is sufficiently close to $\rho(\mathbf{A})$, solving linear systems in this matrix suffices to compute the left and right Perron vectors. Our method works by considering the matrices $\mathbf{M}_{t} = (1 + t)\mathbf{I} - \mathbf{A}$ for $t \geq 0$ (after normalizing so that $s = 1$). By a very minor strengthening of a not-so-well-known fact about M-matrices, it can be shown that solving just two linear systems in $\mathbf{M}_{t}$ (one on each side) suffices to obtain left and right scalings that make it RCDD. Consequently, if we have a solver for $\mathbf{M}_{t}$ for some $t$, we can get the scalings for $\mathbf{M}_{t}$ and thereby an RCDD-based solver for that matrix.

Now, we can always pick $t$ so large that $\mathbf{M}_{t}$ is RCDD and therefore we can compute its scalings. However, this argument does not yet suffice to get a scaling for any smaller $t$. To circumvent this we show that in fact for any $t$ and $t'$ that are multiplicatively close, the matrix $\mathbf{M}_{t}^{-1}\mathbf{M}_{t'}$ is close to the identity in an appropriate norm and therefore a solver for one can be used through preconditioning to yield a solver for the other.

Combining these insights yields a natural algorithm: compute a scaling for $\mathbf{M}_{t}$ for large $t$, use this to solve linear systems in $\mathbf{M}_{t}$ and thereby $\mathbf{M}_{t/2}$; use this solver to get a scaling for $\mathbf{M}_{t/2}$ and repeat. We provide the basic mathematical analysis for this assuming exact solvers in Section 3 and then show how to build upon this to solve our problem in the inexact case in Section B. In Section 5 we then show how to carefully binary search to solve the Perron problem and certify whether matrices are M-matrices.

1.4 Overview of Results

In this paper we give the first nearly linear time algorithms for Perron Vector computation and solving linear systems in M-matrices. Our main results are the following:

Theorem 1 (Perron Vector Nearly Linear Time).
Theorem 2 (M-Matrices in Nearly Linear Time).

As discussed, we achieve these results by providing reductions from solving these problems to solving linear systems in RCDD matrices. Our reduction is the first alternative to the reduction in [3] for reducing stationary distribution computation and solving directed Laplacians to solving RCDD matrices and the first alternative to the reduction in [8] from solving symmetric M-matrices given a factorization to solving symmetric Laplacians. Our main theorem corresponding to this reduction is the following:

Theorem 5 (M-Matrix Scaling in Nearly Linear Time).

Let $\mathbf{M} = s\mathbf{I} - \mathbf{A}$ be an invertible M-matrix with $m$ nonzero entries. For all $\epsilon > 0$ there is an algorithm (see Algorithm 4) which runs in nearly linear time and with high probability computes a pair of diagonal matrices $\mathbf{L}, \mathbf{R}$ such that $\mathbf{L}\mathbf{M}\mathbf{R}$ is RCDD.

Leveraging these results we obtain faster algorithms for a host of problems (See Section 6 for a more detailed discussion). Key results from this section include

  • Leontief economies: these are models capturing interdependencies between different sectors within a national economy and how the output of one sector can be used as input to another, so that changes in the production of one sector affect the others. We give a nearly linear time algorithm for checking the Hawkins-Simon condition [12] that guarantees the existence of a non-negative output vector that solves the equilibrium relation in which demand equals supply.

  • Katz centrality: we give a nearly linear time algorithm for computing Katz centrality, which defines the relative influence of a node within a network as a linear function of the influence of its neighbors (a short illustrative sketch follows this list).

  • Left and right top singular vectors: we can also compute the top singular value and associated top left and right singular vectors of a non-negative matrix in nearly linear time. For this application we need additional ingredients to solve linear systems in a matrix when it is RCDD (see Oracle 1).

  • Graph kernels: graph kernels capture the similarity between two graphs with applications in social networks, studying chemical compounds, comparison and function prediction of protein structures, and analysis of semantic structures in natural language processing [15, 16, 38]. One key obstacle in applying the existing algorithms for computing the kernel function between two graphs is their large running time [38]. Using our methodology, we obtain improved running times for computing canonical kernel functions known as random walk kernels.
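As a concrete instance of the Katz centrality application mentioned above, the following sketch solves the M-matrix system behind one common formulation of Katz centrality, $(\mathbf{I} - \alpha\mathbf{A}^{\top})\vec{x} = \vec{1}$ with $\alpha < 1/\rho(\mathbf{A})$. The example graph is ours, and the dense solve stands in for a nearly linear time M-matrix solver.

```python
import numpy as np

# Adjacency matrix of a small directed graph (ours, for illustration).
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

rho = np.max(np.abs(np.linalg.eigvals(A)))
alpha = 0.5 / rho                     # Katz centrality requires alpha < 1/rho(A)

# Katz centrality solves the M-matrix system (I - alpha*A^T) x = 1; a dense
# solve stands in for the nearly linear time M-matrix solver.
M = np.eye(4) - alpha * A.T
x = np.linalg.solve(M, np.ones(4))
print(x)                              # entrywise positive centrality scores
```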

1.5 Paper Organization

The rest of the paper is organized as follows. In Section 2 we cover preliminaries including notation and facts about M-matrices and Perron vectors that we use throughout the paper. Then, in Section 3 we provide a semi-rigorous technical overview of our approach ignoring errors from approximate numerical computations to make the insights of our results clear.

In Section 4 we provide the algorithm for computing RCDD scalings of M-matrices and in Section 5 we show how to use this method to compute Perron vectors and achieve efficient running times for solving and certifying M-matrices with unknown condition number. In Section 6 we provide our applications.

Finally, in Appendix A we provide proofs of facts about M-matrices and Perron vectors that we use throughout the paper, and in Section B we provide deferred proofs omitted from the main text.

2 Preliminaries

In this section we provide our notation (Section 2.1) as well as facts about M-matrices (Section 2.2), the Perron-Frobenius theorem (Section 2.3), and RCDD matrices (Section 2.4) that we use throughout the paper.

2.1 Notation

Variables: We use bold to denote matrices and arrows to denote vectors. We use $\mathbf{I}$ and $\mathbf{0}$ to denote the identity matrix and the all-zeros matrix respectively. We use $\vec{0}$ and $\vec{1}$ to denote the all-zeros and all-ones vectors respectively. We use $\vec{e}_{i}$ to denote the $i$-th standard basis vector, i.e. the vector where $[\vec{e}_{i}]_{j} = 0$ for $j \neq i$ and $[\vec{e}_{i}]_{i} = 1$.

Matrix Operations: For a square matrix $\mathbf{A}$ we use $\mathbf{A}^{\top}$ to denote its transpose, $\mathbf{A}^{-1}$ to denote its inverse, and $\mathbf{A}^{+}$ to denote its Moore-Penrose pseudoinverse. For notational convenience we use $\mathbf{A}^{-\top}$ to denote the inverse of the transpose of a matrix, i.e. $\mathbf{A}^{-\top} = (\mathbf{A}^{\top})^{-1}$. Given a vector $\vec{x}$, we use $\mathbf{diag}(\vec{x})$ to denote the diagonal matrix with $[\mathbf{diag}(\vec{x})]_{ii} = x_{i}$ and, whenever it is unambiguous, we will refer to $\mathbf{diag}(\vec{x})$ as $\mathbf{D}_{\vec{x}}$.

Matrix Ordering: For symmetric matrices $\mathbf{A}, \mathbf{B}$ we use $\mathbf{A} \preceq \mathbf{B}$ to denote the condition that $\vec{x}^{\top}\mathbf{A}\vec{x} \leq \vec{x}^{\top}\mathbf{B}\vec{x}$ for all $\vec{x}$, and define $\prec$, $\succeq$, and $\succ$ analogously. We say a symmetric matrix $\mathbf{A}$ is positive semidefinite (PSD) if $\mathbf{A} \succeq \mathbf{0}$ and positive definite (PD) if $\mathbf{A} \succ \mathbf{0}$. For any PSD $\mathbf{A}$, we define $\mathbf{A}^{1/2}$ as the unique symmetric PSD matrix where $\mathbf{A}^{1/2}\mathbf{A}^{1/2} = \mathbf{A}$ and $\mathbf{A}^{+/2}$ as the unique PSD matrix where $\mathbf{A}^{+/2}\mathbf{A}^{+/2} = \mathbf{A}^{+}$.

Matrix Norms: Given a positive semidefinite (PSD) matrix $\mathbf{H}$, we define the $\mathbf{H}$-norm (or seminorm) $\|\vec{x}\|_{\mathbf{H}} = \sqrt{\vec{x}^{\top}\mathbf{H}\vec{x}}$. We define the norm $\|\vec{x}\|_{2} = \sqrt{\vec{x}^{\top}\vec{x}}$. We define the $2$-norm of a matrix to be the matrix norm induced by the $2$-norm on vectors: for a matrix $\mathbf{A}$, $\|\mathbf{A}\|_{2} = \max_{\vec{x} \neq \vec{0}} \|\mathbf{A}\vec{x}\|_{2} / \|\vec{x}\|_{2}$. Similarly, for any PSD $\mathbf{H}$ we define $\|\mathbf{A}\|_{\mathbf{H}} = \max_{\vec{x} \neq \vec{0}} \|\mathbf{A}\vec{x}\|_{\mathbf{H}} / \|\vec{x}\|_{\mathbf{H}}$. We observe that $\|\mathbf{A}\|_{\mathbf{H}} = \|\mathbf{H}^{1/2}\mathbf{A}\mathbf{H}^{+/2}\|_{2}$ when $\mathbf{H}$ is PD. We further define the induced $\infty$- and $1$-matrix norms as $\|\mathbf{A}\|_{\infty} = \max_{i} \sum_{j} |\mathbf{A}_{ij}|$ and $\|\mathbf{A}\|_{1} = \max_{j} \sum_{i} |\mathbf{A}_{ij}|$. We observe that $\|\mathbf{A}\|_{1} = \|\mathbf{A}^{\top}\|_{\infty}$ and $\|\mathbf{A}\|_{2} \leq \sqrt{\|\mathbf{A}\|_{1}\|\mathbf{A}\|_{\infty}}$.

Spectral Quantities: For a square matrix $\mathbf{A}$, we let $\lambda_{\max}(\mathbf{A})$ and $\lambda_{\min}(\mathbf{A})$ denote its largest and smallest eigenvalues respectively, we let $\kappa(\mathbf{A})$ denote its condition number, and we let $\rho(\mathbf{A})$ denote its spectral radius, i.e. the largest magnitude of an eigenvalue of $\mathbf{A}$.

2.2 M-Matrices

This paper deals extensively with a broad and prevalent class of matrices known as M-matrices. We present some basic definitions and prove some basic facts about M-matrices here. We start by giving the formal definition of an M-matrix:

Definition 4 (M-Matrix).

A matrix $\mathbf{M} \in \mathbb{R}^{n \times n}$ is an M-matrix if it can be expressed as $\mathbf{M} = s\mathbf{I} - \mathbf{B}$ where $s > 0$ and $\mathbf{B}$ is an entrywise non-negative matrix with $\rho(\mathbf{B}) \leq s$.

M-matrices have many useful properties, some of which can be found in [2]. We state and prove the properties that are needed in our analysis in the appendix (see Section A).

2.3 Perron-Frobenius Theorem

Here we give several definitions and lemmas related to the Perron-Frobenius theorem.

Definition 5.

A matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ is called an irreducible matrix if for every pair of row/column indices $i$ and $j$ there exists a natural number $k$ such that $[\mathbf{A}^{k}]_{ij}$ is non-zero.

Theorem 6 (Perron-Frobenius Theorem).

Let $\mathbf{A}$ be a non-negative irreducible matrix with spectral radius $\rho = \rho(\mathbf{A})$. Then the following facts hold:

  • $\rho$ is a positive real number and it is an eigenvalue of $\mathbf{A}$.

  • The eigenvalue $\rho$ is simple, which means the left and right eigenspaces associated with $\rho$ are one-dimensional.

  • $\mathbf{A}$ has a right eigenvector $\vec{v}_{r}$ and a left eigenvector $\vec{v}_{l}$ with eigenvalue $\rho$ which are component-wise positive, i.e. $\vec{v}_{r} > \vec{0}$ and $\vec{v}_{l} > \vec{0}$.

  • The only eigenvectors whose components are all positive are the ones associated with $\rho$.

Lemma 1 (Collatz - Wielandt formula).

Given a non-negative irreducible matrix $\mathbf{A}$, for all non-negative non-zero vectors $\vec{x}$, let $f(\vec{x})$ be the minimum value of $[\mathbf{A}\vec{x}]_{i} / x_{i}$ taken over all those $i$'s such that $x_{i} \neq 0$. Then $f$ is a real-valued function whose maximum over all non-negative non-zero vectors $\vec{x}$ is the Perron-Frobenius eigenvalue $\rho(\mathbf{A})$.
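A quick numerical illustration of the Collatz-Wielandt characterization (the helper name and example matrix are ours; the dense eigendecomposition is only used to produce reference values):

```python
import numpy as np

def collatz_wielandt(A, x):
    """f(x) = min over i with x_i > 0 of (Ax)_i / x_i; always a lower bound on rho(A)."""
    x = np.asarray(x, dtype=float)
    idx = x > 0
    return np.min((A @ x)[idx] / x[idx])

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])
w, V = np.linalg.eig(A)
rho = np.max(np.abs(w))
v = np.abs(V[:, np.argmax(np.abs(w))])                 # Perron vector (entrywise positive)

print(collatz_wielandt(A, np.ones(3)) <= rho)          # True: any x gives a lower bound
print(np.isclose(collatz_wielandt(A, v), rho))         # True: the maximum is attained at the Perron vector
```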

2.4 Row Column Diagonally Dominant Matrices

A key result in our paper is to reduce the problem of solving M-matrices to solving a sequence of row column diagonally dominant (RCDD) matrices and then leverage previous work on solving such systems. Here we provide the results and notation that we use in the remainder of the paper. We begin by defining RCDD matrices.

Definition 6 (Row Column Diagonally Dominant (RCDD) Matrices).

A matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ is RCDD if $\mathbf{A}_{ii} \geq \sum_{j \neq i} |\mathbf{A}_{ij}|$ and $\mathbf{A}_{ii} \geq \sum_{j \neq i} |\mathbf{A}_{ji}|$ for all $i$. It is strictly RCDD if each of these inequalities holds strictly.
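This definition translates directly into a simple check; the helper name and example below are ours:

```python
import numpy as np

def is_rcdd(A, strict=False):
    """Check row-column diagonal dominance: A[i,i] >= sum_{j != i} |A[i,j]| and
    A[i,i] >= sum_{j != i} |A[j,i]| for every i (with > for the strict version)."""
    d = np.diag(A)
    off = np.abs(A - np.diag(d))
    row_ok = d > off.sum(axis=1) if strict else d >= off.sum(axis=1)
    col_ok = d > off.sum(axis=0) if strict else d >= off.sum(axis=0)
    return bool(np.all(row_ok) and np.all(col_ok))

L = np.array([[ 2.0, -1.0, -1.0],
              [-1.0,  3.0, -1.0],
              [ 0.0, -2.0,  2.0]])
print(is_rcdd(L), is_rcdd(L, strict=True))   # True False
```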

Our algorithms make extensive use of recent exciting results showing that linear systems in RCDD matrices can be solved efficiently. Since our algorithms only use RCDD solvers as a black box, throughout this paper we simply assume that we have oracle access to an RCDD solver defined as follows.

Oracle 1 (RCDD Solver).

Let be an RCDD matrix, and let be a vector. Then there is a procedure, called which can generate an operator where

Further, we can compute and apply to vectors in time.

Designing faster RCDD solvers is an active area of research and therefore we parameterize our running times in terms of so that should faster RCDD solvers be developed our running times immediately improve. We achieve our claims about nearly linear time algorithms by leveraging the following results of [3, 4] that RCDD systems can be solved in nearly linear time.

Theorem 7 (Nearly Linear Time RCDD Solvers [3, 4]).

It is possible to implement Oracle 1 with

3 A Technical Overview of Approach

In this section we give a technical overview of the key results in our paper. We provide several of the core proofs and algorithms we use in the rest of the paper. Our focus in this section is to provide a semi-rigorous demonstration of our results. While the complete analysis of our results is rather technical due to the careful accounting of the numerical precision of various algorithmic subroutines that can only be carried out approximately, e.g. solving various linear systems, here we provide a vastly simplified analysis that ignores these issues but conveys the core of our arguments.

Formally, we make the following assumption throughout this section (and only this section) to simplify our analysis. In the remainder of the paper we carry out our analysis without it.

Assumption 1 (Exact Numerical Operations Assumption Made Only in Section 3).

For any routine that can compute a vector with $\epsilon$-approximate error in some norm $\|\cdot\|$, i.e. compute $\vec{x}$ with $\|\vec{x} - \vec{x}^{*}\| \leq \epsilon$, in time $T$, we say there is a routine that can compute $\vec{x}^{*}$ exactly in $\widetilde{O}(T)$ time, where $\widetilde{O}(\cdot)$ hides factors polylogarithmic in $1/\epsilon$ as well as natural parameters of the problem, e.g. the dimension and condition number.

Note that this assumption is obviously not met in various settings and can easily be abused by performing arithmetic operations on overly large numbers for free. However, it certainly simplifies the analysis greatly; for instance, under this assumption, Theorem 7 implies that an RCDD matrix with $m$ non-zero entries can be solved exactly in nearly linear time. However, in many cases (and as is formally the case in this paper), analysis under the assumption loses only polylogarithmic factors in the overall running time.

Again, we make this assumption purely to simplify exposition in this section. In the remainder of the paper we carry on our analysis without it.

3.1 From M-Matrices to RCDD Matrices

We begin our technical overview by formally stating the connections between M-matrices and diagonal scaling that make such matrices RCDD.

Given a non-negative irreducible matrix $\mathbf{A}$, suppose $\mathbf{M} = s\mathbf{I} - \mathbf{A}$ is an invertible M-matrix for some $s > \rho(\mathbf{A})$. As we discussed in Section 1.1 we know that if $\vec{u}$ and $\vec{v}$ are the left and right Perron vectors of $\mathbf{A}$ respectively, then $\vec{u}^{\top}\mathbf{M} = (s - \rho(\mathbf{A}))\vec{u}^{\top} > \vec{0}^{\top}$ and $\mathbf{M}\vec{v} = (s - \rho(\mathbf{A}))\vec{v} > \vec{0}$.

Consequently, for $\mathbf{D}_{\vec{u}} = \mathbf{diag}(\vec{u})$ and $\mathbf{D}_{\vec{v}} = \mathbf{diag}(\vec{v})$ this property implies that $\mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}}$ is RCDD, since its off-diagonal entries are non-positive and its row and column sums are positive: $\mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}}\vec{1} = \mathbf{D}_{\vec{u}}\mathbf{M}\vec{v} > \vec{0}$ and $\vec{1}^{\top}\mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}} = \vec{u}^{\top}\mathbf{M}\mathbf{D}_{\vec{v}} > \vec{0}^{\top}$.

Thus any invertible M-matrix can be scaled with positive diagonal matrices to become RCDD. Furthermore, since systems of linear equations with RCDD coefficient matrices can be solved in nearly linear time, this implies that given these scalings, M-matrices can be solved in nearly linear time: to solve $\mathbf{M}\vec{x} = \vec{b}$, one can use an RCDD solver to find $\vec{y}$ with $(\mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}})\vec{y} = \mathbf{D}_{\vec{u}}\vec{b}$ and set $\vec{x} = \mathbf{D}_{\vec{v}}\vec{y}$ in nearly linear time.

More interestingly, the reduction also works in the other direction. This means that if one can solve a linear system described by an M-matrix efficiently, then it is possible to find positive vectors $\vec{u}, \vec{v}$ such that $\mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}}$ is RCDD, where $\mathbf{D}_{\vec{u}} = \mathbf{diag}(\vec{u})$ and $\mathbf{D}_{\vec{v}} = \mathbf{diag}(\vec{v})$. Again from M-matrix theory we know that if $\mathbf{M}$ is an invertible M-matrix then $\mathbf{M}^{-1}$ is entrywise non-negative. Consequently, if we apply $\mathbf{M}^{-1}$ (or $\mathbf{M}^{-\top}$) to any positive vector, the resulting vector remains positive. Given this observation, let $\vec{v} = \mathbf{M}^{-1}\vec{1} > \vec{0}$ and $\vec{u} = \mathbf{M}^{-\top}\vec{1} > \vec{0}$. Now $\mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}}\vec{1} = \mathbf{D}_{\vec{u}}\mathbf{M}\vec{v} = \mathbf{D}_{\vec{u}}\vec{1} = \vec{u} > \vec{0}$ and similarly $\vec{1}^{\top}\mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}} = \vec{u}^{\top}\mathbf{M}\mathbf{D}_{\vec{v}} = \vec{1}^{\top}\mathbf{D}_{\vec{v}} = \vec{v}^{\top} > \vec{0}^{\top}$. Thus, $\mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}}$ is RCDD, and we see that one can look at either the pair $(\vec{u}, \vec{v})$ or the scaled matrix as a certificate for the matrix $\mathbf{M}$ to be an M-matrix.
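The following numpy sketch illustrates this direction: two solves (dense solves standing in for a fast M-matrix solver) produce positive vectors whose diagonal scalings certify that the matrix is an M-matrix. The example matrix is ours.

```python
import numpy as np

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])
s = 1.1 * np.max(np.abs(np.linalg.eigvals(A)))   # ensure sI - A is an invertible M-matrix
M = s * np.eye(3) - A

# One solve per side (dense solves stand in for the fast solver): both vectors are positive.
v = np.linalg.solve(M, np.ones(3))       # v = M^{-1} 1 > 0
u = np.linalg.solve(M.T, np.ones(3))     # u = M^{-T} 1 > 0

N = np.diag(u) @ M @ np.diag(v)
# Row sums of N equal u (since M v = 1) and column sums equal v (since M^T u = 1),
# so N is RCDD; this certifies that M is an M-matrix.
print(np.allclose(N.sum(axis=1), u), np.allclose(N.sum(axis=0), v))   # True True
```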

We use both of these directions in our algorithm to either find the desired scalings or solve linear systems fast. We further use the following lemma, which is a formal statement of what we discussed above. The proof is deferred to the appendix (Section A).

Lemma 2.

Let $\mathbf{A}$ be a non-negative matrix, and consider $\mathbf{M} = s\mathbf{I} - \mathbf{A}$ for some $s > 0$. If $\mathbf{M}$ is an M-matrix, then for every pair of positive vectors $\vec{u}$ and $\vec{v}$ where $\vec{u}^{\top}\mathbf{M} \geq \vec{0}^{\top}$ and $\mathbf{M}\vec{v} \geq \vec{0}$, we have that $\mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}}$ is RCDD. Further, if there exist positive vectors $\vec{u}, \vec{v}$ where $\mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}}$ is RCDD, then $\mathbf{M}$ is an M-matrix.

Also, the following observation about M-matrices and these scalings is important in our analysis (see Section A for the proof).

Lemma 3.

Given a non-negative matrix $\mathbf{A}$, where $\mathbf{M} = s\mathbf{I} - \mathbf{A}$ is an invertible M-matrix, let $\mathbf{D}_{\vec{u}}, \mathbf{D}_{\vec{v}}$ be the left and right scalings that make $\mathbf{M}$ RCDD and let $\mathbf{N} = \mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}}$. Then $\mathbf{N} + \mathbf{N}^{\top}$ is PD.

3.2 Algorithm for Solving Linear Systems in M-matrices

Relying on the connection between M-matrices and RCDD matrices discussed in Section 3.1 and the fact that linear systems described by RCDD matrices can be solved in nearly linear time (Section 2.4), in this section we present one of our central results: a simple iterative algorithm that can be used to solve systems of linear equations described by M-matrices. In particular, we give a nearly linear time algorithm for solving the following problem.

Definition 7 (M-matrix Scaling Problem).

Given a non-negative matrix $\mathbf{A}$ and a value $s$ where $\mathbf{M} = s\mathbf{I} - \mathbf{A}$ is an invertible M-matrix, i.e. $s > \rho(\mathbf{A})$, compute scaling vectors $\vec{u}, \vec{v} > \vec{0}$ such that $\mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}}$ is strictly RCDD, where $\mathbf{D}_{\vec{u}} = \mathbf{diag}(\vec{u})$ and $\mathbf{D}_{\vec{v}} = \mathbf{diag}(\vec{v})$.

The main statement we will prove in this section is Theorem 8 given below. The fully rigorous version of this theorem (Theorem 5) is one of the main results of the paper and provides a tool for solving linear systems in M-matrices in nearly linear time. We also use this result later to give our nearly linear time algorithm for finding the top eigenvalue and eigenvectors of non-negative matrices as described by the Perron-Frobenius theorem. We remark that the following statement is written assuming Assumption 1; the full version of this theorem is stated in Theorem 5.

Theorem 8 (M-matrix Scaling, Informal).

Given a value $s > 0$ and a non-negative matrix $\mathbf{A}$ with $m$ nonzero entries where $s > \rho(\mathbf{A})$, let $\mathbf{M} = s\mathbf{I} - \mathbf{A}$. Algorithm 2 finds positive diagonal matrices $\mathbf{D}_{\vec{u}}, \mathbf{D}_{\vec{v}}$ such that $\mathbf{D}_{\vec{u}}\mathbf{M}\mathbf{D}_{\vec{v}}$ is RCDD in nearly linear time.

Note that in solving the above problem we can assume without loss of generality that $s = 1$: observe that $\frac{1}{s}\mathbf{M} = \mathbf{I} - \frac{1}{s}\mathbf{A}$ is an M-matrix and additionally note that $\rho(\frac{1}{s}\mathbf{A}) = \frac{1}{s}\rho(\mathbf{A}) < 1$. Thus any scaling of $\frac{1}{s}\mathbf{M}$ is a scaling of $\mathbf{M}$. Note that the running time is unaffected by such scalings since the nonzero structure is unchanged. Thus from now on whenever we discuss Theorem 8 we assume $s = 1$ for simplicity. To solve the M-matrix scaling problem with $s = 1$ we take a simple iterative approach of solving and computing scalings for the matrices $\mathbf{M}_{t}$ defined for all $t \geq 0$ by $\mathbf{M}_{t} = (1 + t)\mathbf{I} - \mathbf{A}$. In particular, we consider the scaling vectors $\vec{v}_{t} = \mathbf{M}_{t}^{-1}\vec{1}$ and $\vec{u}_{t} = \mathbf{M}_{t}^{-\top}\vec{1}$ and let $\mathbf{D}_{\vec{u}_{t}} = \mathbf{diag}(\vec{u}_{t})$ and $\mathbf{D}_{\vec{v}_{t}} = \mathbf{diag}(\vec{v}_{t})$ (see Algorithm 2). As discussed in Section 3.1, $\mathbf{D}_{\vec{u}_{t}}\mathbf{M}_{t}\mathbf{D}_{\vec{v}_{t}}$ is RCDD. Note that to solve $\mathbf{M}_{t}\vec{x} = \vec{b}$, we can instead solve $(\mathbf{D}_{\vec{u}_{t}}\mathbf{M}_{t}\mathbf{D}_{\vec{v}_{t}})\vec{y} = \mathbf{D}_{\vec{u}_{t}}\vec{b}$, which is equivalent via $\vec{x} = \mathbf{D}_{\vec{v}_{t}}\vec{y}$. This means solving an equation in $\mathbf{M}_{t}$ reduces to solving an RCDD system.
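A small sketch of this reduction, with `np.linalg.solve` standing in for the RCDD solver; the example matrix and right-hand side are ours:

```python
import numpy as np

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])
s = 1.1 * np.max(np.abs(np.linalg.eigvals(A)))
M = s * np.eye(3) - A
b = np.array([1.0, 2.0, 3.0])

# Suppose we already hold positive scaling vectors u, v with diag(u) M diag(v) RCDD.
v = np.linalg.solve(M, np.ones(3))
u = np.linalg.solve(M.T, np.ones(3))
Du, Dv = np.diag(u), np.diag(v)

# Solve M x = b through the RCDD matrix N = Du M Dv:
#   N y = Du b, then x = Dv y.  (np.linalg.solve stands in for the RCDD solver.)
N = Du @ M @ Dv
y = np.linalg.solve(N, Du @ b)
x = Dv @ y
print(np.allclose(M @ x, b))   # True
```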

The above discussion shows that computing $\vec{u}_{t}$ and $\vec{v}_{t}$ gives us access to $\mathbf{M}_{t}^{-1}$ in nearly linear time. We show that computing $\vec{u}_{t}$ and $\vec{v}_{t}$ for one value of $t$ suffices to compute them for another nearby value. To do this, we show that for $t'$ constantly close to $t$, $\mathbf{M}_{t}$ is a good preconditioner of $\mathbf{M}_{t'}$, i.e. $\mathbf{M}_{t}^{-1}\mathbf{M}_{t'}$ is close to the identity in a specific norm. It is known that when this happens a simple procedure known as preconditioned Richardson (Algorithm 1) can solve a linear system in $\mathbf{M}_{t'}$ using just a small number of applications of $\mathbf{M}_{t}^{-1}$ and $\mathbf{M}_{t'}$, i.e. in nearly linear time.

1:function PreconRichardson($\mathbf{A}$, $\mathbf{P}$, $\vec{b}$, $\vec{x}_{0}$, $\epsilon$, $N$)
2:     Input: $\mathbf{A}, \mathbf{P} \in \mathbb{R}^{n \times n}$ where $\mathbf{P}$ is a preconditioner for $\mathbf{A}$, $\vec{b} \in \mathbb{R}^{n}$, $\vec{x}_{0} \in \mathbb{R}^{n}$, $\epsilon \geq 0$
3:     Input: $N$ is an upper bound on the number of iterations to run the algorithm.
4:     Output: Either $\vec{x}$ with $\|\mathbf{A}\vec{x} - \vec{b}\| \leq \epsilon$ or failure after $N$ iterations
5:     $\vec{x} \leftarrow \vec{x}_{0}$
6:     repeat
7:         $\vec{x} \leftarrow \vec{x} + \mathbf{P}^{-1}(\vec{b} - \mathbf{A}\vec{x})$
8:     until $\|\mathbf{A}\vec{x} - \vec{b}\| \leq \epsilon$ or number of iterations exceeds $N$
9:     return $\vec{x}$
10:end function
Algorithm 1 Exact Preconditioned Richardson
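Under Assumption 1 the listing above terminates with an exact solution; the following numpy sketch implements the same iteration with an explicit residual tolerance. The function name, stopping rule, and example (preconditioning $\mathbf{M}_{t/2}$ by $\mathbf{M}_{t}$) are our own illustrative choices.

```python
import numpy as np

def preconditioned_richardson(A, P_solve, b, x0, max_iters, tol=1e-12):
    """Richardson iteration x <- x + P^{-1}(b - A x); P_solve applies P^{-1}.
    Converges when I - P^{-1}A is a contraction in a suitable norm."""
    x = x0.copy()
    for _ in range(max_iters):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + P_solve(r)
    return x

# Example: precondition M_{t/2} = (1 + t/2)I - A with M_t = (1 + t)I - A.
A = np.array([[0.0, 0.4, 0.3],
              [0.2, 0.0, 0.5],
              [0.3, 0.1, 0.0]])          # rho(A) < 1, so I - A is an M-matrix
t = 1.0
M_t = (1 + t) * np.eye(3) - A
M_half = (1 + t / 2) * np.eye(3) - A
b = np.ones(3)
x = preconditioned_richardson(M_half, lambda r: np.linalg.solve(M_t, r), b, np.zeros(3), 100)
print(np.allclose(M_half @ x, b))        # True
```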

The following lemma provides the formal bound that allows this procedure to work.

Lemma 4.

Let be a symmetric PSD matrix such that is PSD. Then for all the following holds

Proof.

Note that . Consequently, and therefore

yielding the first equality of the claim. For the inequality, note that

Furthermore, we see that

Since and by assumption, we have

yielding the result. ∎

By Lemma 4, any choice which makes the required matrix PSD lets us bound the preconditioning error. Furthermore, in Lemma 3 we showed that such a choice exists for our scaled M-matrices. Therefore, we already know an appropriate scaling.

Having all these observations, we are ready to give an efficient algorithm for the M-matrix scaling problem. Our algorithm (Algorithm 2) starts with a large $t$ (e.g. $t$ at least the largest row or column sum of $\mathbf{A}$) such that $\mathbf{M}_{t}$ trivially becomes RCDD, then halves $t$ in each iteration and finds the scaling vectors for the new $t$. Using Lemmas 3 and 4, we just showed there exists a norm in which $\mathbf{M}_{t}$ is a good preconditioner for $\mathbf{M}_{t/2}$. Provided we can solve linear systems in $\mathbf{M}_{t}$, we compute $\vec{u}_{t/2}$ and $\vec{v}_{t/2}$ using preconditioned Richardson (see Algorithm 1). By Assumption 1, this requires solving only a small number of linear systems in $\mathbf{M}_{t}$. Since we already have access to $\vec{u}_{t}$ and $\vec{v}_{t}$, we can solve systems in $\mathbf{M}_{t}$ exactly in nearly linear time (again by Assumption 1). Therefore $\vec{u}_{t/2}$ and $\vec{v}_{t/2}$ can be computed in nearly linear time. Now to find scalings for $\mathbf{M}$ itself, we iterate logarithmically many times, giving an overall nearly linear running time. This proves the statement in Theorem 8.

1:function  (, )
2:     Input: is an irreducible matrix with , and is a real number.
3:     output: Positive diagonal scalings and such that is RCDD
4:     
5:     
6:     while $t$ is not yet sufficiently small do
7:         //.
8:         //In the following lines, we assume access to an efficient solver for as , are already computed.
9:          (, , , , 0, )
10:          (, , , , 0, )
11:         
12:     end while
13:     return with being RCDD
14:end function
Algorithm 2 Compute left and right scalings of an M-matrix to make it RCDD.
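For concreteness, here is a heavily simplified numpy sketch of the halving strategy behind Algorithm 2. It is our own simplification under the exact-arithmetic spirit of Assumption 1: dense solves stand in for the RCDD solver, the initial shift, inner iteration count, and stopping threshold are arbitrary choices of ours, and none of the paper's careful error control is reproduced.

```python
import numpy as np

def mmatrix_scalings(A, eps=1e-3, inner_iters=200):
    """Sketch of the halving strategy: maintain v_t ~ M_t^{-1} 1 and u_t ~ M_t^{-T} 1
    for M_t = (1 + t)I - A, halving t and refining via Richardson steps
    preconditioned by the previous M_t.  Assumes rho(A) < 1."""
    n = A.shape[0]
    t = max(np.abs(A).sum(axis=1).max(), np.abs(A).sum(axis=0).max())  # M_t trivially RCDD
    M = lambda t: (1 + t) * np.eye(n) - A
    u, v = np.ones(n) / (1 + t), np.ones(n) / (1 + t)
    one = np.ones(n)
    while t > eps:
        t_new = t / 2
        # Refine v and u for M_{t/2}, preconditioning with M_t (dense solve as stand-in).
        for _ in range(inner_iters):
            v = v + np.linalg.solve(M(t), one - M(t_new) @ v)
            u = u + np.linalg.solve(M(t).T, one - M(t_new).T @ u)
        t = t_new
    return u, v, t

A = np.array([[0.0, 0.4, 0.3],
              [0.2, 0.0, 0.5],
              [0.3, 0.1, 0.0]])
u, v, t = mmatrix_scalings(A)
N = np.diag(u) @ ((1 + t) * np.eye(3) - A) @ np.diag(v)
print(np.all(N.sum(axis=1) > 0) and np.all(N.sum(axis=0) > 0))   # True: scaled matrix is RCDD
```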

We used Assumption 1 twice in the above argument. In Section 4, through a careful analysis, we show how to remove each of these uses of the assumption in the approximate setting while increasing the running time by no more than factors polylogarithmic in the natural parameters of the problem.

3.3 Finding the top eigenvalue and eigenvectors of positive matrices

In this section, we present an overview of our result on computing the largest eigenvalue and corresponding eigenvectors of a non-negative matrix as characterized by the Perron-Frobenius theorem. In the last section we showed how to find the left and right scalings for an M-matrix to make it RCDD. Here we discuss how to use this idea to come up with a nearly linear time algorithm for the Perron vector problem (see Definition 3), or equivalently the M-matrix decision problem, defined below.

Definition 8 (M-matrix decision problem).

Given a non-negative irreducible matrix $\mathbf{A}$ and a real number $s > 0$, decide whether $s\mathbf{I} - \mathbf{A}$ is an M-matrix, or equivalently whether $\rho(\mathbf{A}) \leq s$.

If we know the top eigenvalue $\rho(\mathbf{A})$ of $\mathbf{A}$, then to solve the M-matrix decision problem we only need to check whether $\rho(\mathbf{A}) \leq s$. Thus a solver for the Perron problem gives an immediate solution for the M-matrix decision problem. On the other hand, if we have a solver for the M-matrix decision problem, to get a solver for the Perron problem we need to use a search strategy to find the smallest $s$ such that $s\mathbf{I} - \mathbf{A}$ is an M-matrix. We use the latter approach combined with the ideas from Algorithm 2 to come up with a nearly linear time algorithm for the Perron problem. Theorem 1 demonstrates this result. A fully detailed analysis of this theorem is given in Section 5. Below we discuss the main ideas and steps that are taken to prove Theorem 1, and point to the corresponding subsections in Section 5.

3.3.1 The M-matrix Decision Problem

Here we show how a slightly modified version of Algorithm 2 can be used to solve the M-matrix decision problem. Note that if Algorithm 2 succeeds in finding a scaling for $s\mathbf{I} - \mathbf{A}$, this is also a proof that $\rho(\mathbf{A}) \leq s$. However, Algorithm 2 only works under the premise that $s\mathbf{I} - \mathbf{A}$ is an M-matrix. If we remove this assumption, we are no longer able to apply Lemmas 3 and 4. In other words, it might no longer be the case that $\mathbf{M}_{t}$ is a good preconditioner for $\mathbf{M}_{t/2}$ (i.e. satisfies the required norm bound) or that $\mathbf{M}_{t/2}$ is an M-matrix. Therefore, preconditioned Richardson might fail to terminate within a reasonable time, or fail to find an appropriate scaling. To give an example of what might go wrong, note that in our analysis we rely on the fact that the computed scaling vectors are positive, but this is not necessarily true when the matrix is not an M-matrix.

Remember that we used Assumption 1 and Lemmas 3 and 4 to show that each invocation of preconditioned Richardson in Algorithm 2 generates an exact solution after a bounded number of iterations. Interestingly, we can use this to test whether $s\mathbf{I} - \mathbf{A}$ is an M-matrix or not. Given that we have access to an exact solver for $\mathbf{M}_{t}$, if preconditioned Richardson fails to find an appropriate scaling after this many iterations, it is a proof that $s\mathbf{I} - \mathbf{A}$ is not an M-matrix. On the other hand, if we limit the number of iterations for preconditioned Richardson to this bound and our algorithm still succeeds in finding appropriate scalings, this is a proof that $s\mathbf{I} - \mathbf{A}$ is actually an M-matrix. This way we can come up with a nearly linear time algorithm which either reports that $s\mathbf{I} - \mathbf{A}$ is not an M-matrix, or provides a proof that it is an M-matrix.

For a full discussion of the M-matrix decision problem and the bounds we get for solving this problem, refer to Section 5.1. In this section there are further complications that we address to deal with the fact that the condition number of the top eigenvectors of $\mathbf{A}$ is not known. The main issue comes up in choosing the number of iterations to run preconditioned Richardson. We discuss this issue briefly here, since it impacts the running time of the algorithm in a way other than tuning the internal solvers' precisions. For a detailed discussion we refer the reader to Section 5.3.

In the aforementioned scheme, given that $\mathbf{M}_{t}$ is a preconditioner for $\mathbf{M}_{t/2}$, we assumed (by Assumption 1) that there exists a bound such that preconditioned Richardson (see Algorithm 1) should terminate with an exact solution after that many iterations. To make the above strategy actually work, we need to know such an upper bound, which turns out to have a polylogarithmic dependence on the condition numbers of the left and right Perron vectors of $\mathbf{A}$. Unfortunately, we do not know these vectors in advance.

One potential fix to this problem is to guess an upper bound on the condition numbers of the left and right eigenvectors and double it until it exceeds the true value. This simple doubling scheme does not quite work, as it is not clear when we should stop doubling. Instead we provide a more complex doubling scheme which can solve the problem. For this we refer the reader to Section 5.3.

3.3.2 Perron problem

We use our algorithm for the M-matrix decision problem to provide an algorithm for the Perron problem. Our approach for solving the Perron problem consists of two stages. First, we compute the largest eigenvalue, namely $\rho(\mathbf{A})$, within a multiplicative error. Second, we use this computed value to find approximate eigenvectors corresponding to $\rho(\mathbf{A})$. Our method for finding the largest eigenvalue of $\mathbf{A}$ is depicted in Algorithm 7 (Section 5.2). By using the algorithm for the M-matrix decision problem, for a given $s$, we can either determine that $\rho(\mathbf{A}) > s$ (i.e. $s\mathbf{I} - \mathbf{A}$ is not an M-matrix) or that $\rho(\mathbf{A}) \leq s$ (i.e. $s\mathbf{I} - \mathbf{A}$ is an M-matrix). We use this fact to apply a binary search which finds $\rho(\mathbf{A})$ within a multiplicative error. For a detailed discussion and analysis of this algorithm refer to Section 5.2.
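The binary search itself is easy to sketch; below the M-matrix decision oracle is replaced by a dense eigenvalue computation purely for illustration (the paper implements it via the scaling algorithm instead), and the names and tolerance are ours.

```python
import numpy as np

def is_m_matrix(A, s):
    """Decision oracle stand-in: is sI - A an M-matrix, i.e. rho(A) <= s?
    (Here a dense eigenvalue computation is used purely for illustration.)"""
    return np.max(np.abs(np.linalg.eigvals(A))) <= s

def approx_perron_value(A, eps=1e-6):
    """Binary search for rho(A) up to multiplicative error using the decision oracle."""
    lo, hi = 0.0, np.abs(A).sum(axis=1).max()    # rho(A) is at most the max row sum
    while hi - lo > eps * hi:
        mid = (lo + hi) / 2
        if is_m_matrix(A, mid):
            hi = mid          # rho(A) <= mid
        else:
            lo = mid          # rho(A) > mid
    return hi

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])
print(approx_perron_value(A), np.max(np.abs(np.linalg.eigvals(A))))   # nearly equal
```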

Next, in Section 5.2.1 we show how to find an approximate eigenvector of $\mathbf{A}$ corresponding to its largest eigenvalue. We define an approximate eigenvector as follows.

Definition 9 (Approximate Eigenvector).

Given a matrix $\mathbf{A}$ and a vector norm $\|\cdot\|$, let $\lambda$ be a non-zero eigenvalue of $\mathbf{A}$; then a non-zero vector $\vec{v}$ is an $\epsilon$-approximate eigenvector of $\mathbf{A}$ corresponding to $\lambda$ if $\|\mathbf{A}\vec{v} - \lambda\vec{v}\| \leq \epsilon\,|\lambda|\,\|\vec{v}\|$.

In Lemma 10, we show that the scaling vectors computed by our algorithm (for a suitably small shift) are approximate eigenvectors corresponding to $\rho(\mathbf{A})$ (in the infinity norm). Note that we are able to compute these vectors in nearly linear time with our machinery for the matrix scaling problem (refer to their definitions in Section 3.3.1). Therefore if we have $\rho(\mathbf{A})$ we can compute its corresponding approximate eigenvectors efficiently. We then show similar bounds hold if instead of $\rho(\mathbf{A})$ we use an approximation of it. This completes our two-stage approach for finding both the top eigenvalue and eigenvectors of a non-negative irreducible matrix. This simple approach could work under Assumption 1 if the terms we are hiding in the $\widetilde{O}(\cdot)$ notation in the algorithm did not depend on the condition numbers of the Perron vectors, or if we knew a tight upper bound for them. However, as discussed in the previous subsection, our algorithm for the M-matrix decision problem depends on knowing this bound. We show a fix for this in Section 5.3.

4 Numerically Stable M-Matrix Scaling

In this section, we analyze our M-matrix scaling algorithm in full detail. The algorithm can be viewed as a cousin of shift-and-invert preconditioning methods for the recovery of the top eigenvector of a matrix. Given an input M-matrix $\mathbf{M} = \mathbf{I} - \mathbf{A}$, our algorithm maintains a left-right scaling (as a reminder, a pair of diagonal matrices $(\mathbf{L}, \mathbf{R})$ is a left-right scaling of a matrix $\mathbf{N}$ if $\mathbf{L}\mathbf{N}\mathbf{R}$ is RCDD) of the matrix $\mathbf{M}_{t} = (1 + t)\mathbf{I} - \mathbf{A}$ for progressively smaller values of $t$. These scalings will be obtained by approximately solving the linear systems $\mathbf{M}_{t}\vec{v} = \vec{1}$ and $\mathbf{M}_{t}^{\top}\vec{u} = \vec{1}$.

As discussed in the previous section, we observe that we can assume the input is of the form $\mathbf{M} = \mathbf{I} - \mathbf{A}$, as for any $s > 0$ we have $\frac{1}{s}(s\mathbf{I} - \mathbf{A}) = \mathbf{I} - \frac{1}{s}\mathbf{A}$: solving the scaling problem for one is identical to solving the scaling problem for the other. We assume this throughout the rest of the section. We will choose our initial choice of $t$ to be large enough that $\mathbf{M}_{t}$ is itself RCDD: thus our initial scalings can simply be the identity. In every step of our procedure, we will use our computed left-right scaling of $\mathbf{M}_{t}$ to obtain a scaling of $\mathbf{M}_{t/2}$. We do this in three parts. We first use the recent result on solving RCDD linear systems to compute a good preconditioner for the matrix $\mathbf{M}_{t}$ given its left-right scaling. We then show that under a certain measure of error this preconditioner for $\mathbf{M}_{t}$ also acts as a preconditioner for $\mathbf{M}_{t/2}$. Finally, we argue that by applying a standard iterative procedure preconditioned with it to solve $\mathbf{M}_{t/2}\vec{v} = \vec{1}$ and $\mathbf{M}_{t/2}^{\top}\vec{u} = \vec{1}$ we can solve these linear systems, in a small number of iterations, precisely enough to recover a scaling of $\mathbf{M}_{t/2}$. Once we have obtained our scaling of $\mathbf{M}_{t/2}$ we repeatedly halve $t$ until our additive identity factor becomes small enough.

As discussed we begin by showing that solving M-matrices reduces to solving RCDD linear systems if we are given the left-right scaling of our desired M-matrix.

1:Input: is an M-matrix, are diagonal matrices where is RCDD, is a real number.
2:output: Operators satisfying Theorem 5.
3:;
4: generated by Oracle 1;
5: