Perron–based algorithms for the multilinear PageRank
We consider the multilinear PageRank problem studied in [Gleich, Lim and Yu, Multilinear PageRank, 2015], which is a system of quadratic equations with stochasticity and nonnegativity constraints. We use the theory of quadratic vector equations to prove several properties of its solutions and suggest new numerical algorithms. In particular, we prove the existence of a certain minimal solution, which does not always coincide with the stochastic one that is required by the problem. We use an interpretation of the solution as a Perron eigenvector to devise new fixed-point algorithms for its computation, and pair them with a continuation strategy based on a perturbative approach. The resulting numerical method is more reliable than the existing alternatives, being able to solve a larger number of problems.
Gleich, Lim and Yu  consider the following problem, arising as an approximate computation of the stationary measure of an order-2 Markov chain: given , with and
solve for the equation
The solution of interest is stochastic, i.e., and . Here denotes a column vector of all ones (with an optional subscript to specify its length), and inequalities between vectors and matrices are intended in the componentwise sense. In the paper , they prove some theoretical properties, consider several solution algorithms, and evaluate their performance. In the more recent paper , the authors improve some results concerning the uniqueness of the solution.
This problem originally appeared in , and is a variation of problems related to tensor eigenvalue problems and Perron–Frobenius theory for tensors; see, e.g., [4, 5, 6]. However, it also fits in the framework of quadratic vector equations derived from Markovian binary tree models introduced in  and later considered in [8, 9, 10]. Indeed, the paper  considers a more general problem, which is essentially (2) without the hypotheses (1). Hence, all of its results apply here, and they can be used in the context of multilinear PageRank. In particular,  considers the minimal nonnegative solution of (2) (in the componentwise sense), which is not necessarily stochastic as the one sought in .
In this paper, we use the theory of quadratic vector equations in [8, 9, 10] to better understand the behavior of the solutions of (2) and suggest new algorithms for computing the stochastic solution. More specifically, we show that if one considers the minimal nonnegative solution of (2) as well, the theoretical properties of (2) become clearer, even if one is only interested in stochastic solutions. Indeed we prove that there always exists a minimal nonnegative solution, which is the unique stochastic solution when . When , the minimal nonnegative solution is not stochastic and there is at least one stochastic solution . Note that [1, Theorem 4.3] already proves that when the stochastic solution is unique and [2, Theorem 1] slightly improves this bound; our results give a broader characterization. All this is in Section 2.
When , as already pointed out in , computing the stochastic solution of (2) is easy. Indeed, this is also due to the fact that the stochastic solution is the minimal solution, and for instance the numerical methods proposed in [7, 8, 9] perform very well. The most difficult case is when , in particular when . Since the minimal solution of (2) can be easily computed, the idea is to compute and deflate it, with a similar strategy to the one developed in [8, 9], hence allowing us to compute stochastic solutions even when they are not minimal. The main tool in this approach is rearranging (2) to show that (after a change of variables) a solution corresponds to the Perron eigenvector of a certain matrix that depends on itself. This interpretation in terms of Perron vector allows to devise new algorithms based on fixed point iteration and on Newton’s method. Sections 3 and 4 describe this deflation technique and the algorithms based on the Perron vector computation.
Finally, we propose in Section 5 a continuation strategy based on a perturbative approach that allows one to solve the problem for values in order to obtain better starting values for the more challenging cases when .
2 Properties of the nonnegative solutions
In this section, we show properties of the nonnegative solutions of the equation (2). In particular, we prove that there always exists a minimal nonnegative solution, which is stochastic when . These properties can be derived by specializing the results of , which apply to more general vector equations defined by bilinear forms.
We introduce the map
and its Jacobian
We have the following result.
2.1 Sum of entries and criticality
Let . Then, for any .
This fact has important consequences for the sum of the entries of the solutions of (2).
For each solution of (2), is one of the two solutions of the quadratic , i.e., or .
Let be one of the solutions of and define the level set . Since is convex and compact, and since maps to itself by Lemma 2, then the Brouwer fixed-point theorem implies the following result.
There exists at least a solution to (2) with and a solution with .
Hence we can have two different settings, for which we borrow the terminology from .
- Subcritical case
, hence the minimal nonnegative solution is the unique stochastic solution.
- Supercritical case
, hence the minimal nonnegative solution satisfies and there is at least one stochastic solution .
Note that [1, Theorem 4.3] already proves that when the stochastic solution is unique; these results give a broader characterization.
The tools that we have introduced can already be used to determine the behavior of simple iterations such as (3).
Thanks to Lemma 2, the quantity evolves according to . So the result follows by the theory of scalar fixed-point iterations, since this iteration converges to for , to for , and diverges for . ∎
An analogous result holds for the subcritical case.
The papers [8, 9, 10] describe several methods to compute the minimal solution . In particular, all the ones described in  exhibit monotonic convergence, that is, . Due to the uniqueness and the monotonic convergence properties, computing the minimal solution is typically simple, fast, and free of numerical issues. Hence in the subcritical case the multilinear PageRank problem is easy to solve. The supercritical case is more problematic.
Among all available algorithms to compute the minimal solution , we recall Newton’s method, which is one of the most efficient ones. The Newton–Raphson method applied to the function generates the sequence
The following result holds [10, Theorem 13].
Suppose that , and that is irreducible. Then, Newton’s method (4) starting from is well defined and converges monotonically to (i.e., ).
Algorithm (1) shows a straightforward implementation of Newton’s method as described above.
Note that the theory in [10, Section 9] shows how one can predict where zeros appear in and eliminate them reducing the problem to a smaller one. Indeed, in view of the probabilistic interpretation of multilinear PageRank, zero entries in can appear only when the second-order Markov chain associated to is not irreducible. So we can assume in the following that , and that the nonnegative matrix is irreducible. In particular, in this case also substochastic (as proved in [10, Theorem 6]).
3 Deflating the minimal solution
From now on, we assume to be in the supercritical case, i.e., , and that has been already computed and is explicitly available.
We wish adapt to this setting the deflation strategy introduced in . Since all solutions to (2) satisfy , it makes sense to change variable to obtain an equation in . After a few manipulations, using bilinearity of and the fact that , one gets
We set . Note that for each , hence it is irreducible. In addition, if is chosen such that as in (6), then
so is a stochastic matrix.
4 Perron-based algorithms
starting from a given nonnegative vector such that . This iteration is implemented in Algorithm 2
Let , with such that . Then, its Jacobian is given by
Let us compute its directional derivative along the direction . We set ; hence, . Since is irreducible, its Perron eigenvalue is a simple eigenvalue, and hence we can define locally smooth functions as the Perron eigenvalue of and . A computation analogous to (7) shows that , hence and .
By differentiating and evaluating at , we get
or, rearranging terms,
where . Since we defined so that , we have , and hence also
The matrix in the left-hand side is nonsingular, since it can be obtained by replacing the eigenvalue with in the eigendecomposition of the singular irreducible M-matrix . Thus we can write
Since , we can simplify this expression further to
from which we get (10). ∎
From the above result, the Jacobian of the function is given by
and Newton’s method consists of generating the sequence of vectors
The Perron–Newton method is described in Algorithm 3.
The standard theorems  on local convergence of Newton’s method imply the following result.
Suppose that the matrix is nonsingular. Then the Perron–Newton method is locally convergent to a stochastic solution of (2).
Since , the matrix has an eigenvalue 1 with left eigenvector , for each value of .
5 Continuation techniques
The above algorithms, as well as those in , are sufficient to solve most of the test problems that are explored in . However, especially when , the algorithms may converge very slowly or stagnate far away from the minimal solution. For this reason, we explore an additional technique based on a perturbation expansion of the solution as a function of , inspired loosely by the ideas of homotopy continuation , which is a well-known strategy to derive approximate solutions for parameter-dependent equations.
The main result is the following.
Let us make the dependence of the various quantities on the parameter explicit, i.e., we write to denote a stochastic solution of (2) for a certain value of the parameter , and similarly and .
We apply the implicit function theorem [13, Theorem 9.28] to
Differentiating (13), we get
The function obtained by the theorem must satisfy for each : indeed, by Corollary 3 for a solution of the equation it must hold either or , and the continuous function cannot jump from to without taking any intermediate value.
The formula (12) now follows by Taylor expansion. ∎
This result suggests a strategy to improve convergence for : we start solving the problem with a small value of , e.g., , then we solve it repeatedly for increasing values ; at each step we use as a starting point for our iterative algorithms an approximation of constructed from .
We tried several different strategies to construct this approximation.
Second-order Taylor expansion ;
First-order Taylor expansion ;
Linear extrapolation from the previous two values ;
An attempt at building a ‘semi-implicit method’ with only the values available: ideally, one would like to set (which would correspond to expressing with a Taylor expansion around ); however, we do not know . Instead, we take with
i.e., we use the new value of and the old value of in the expression for the first derivative.
The only remaining choice is designing an effective strategy to choose the next point . We adopt the following one. Note that for T1,EXT,IMP we expect . Hence one can expect
Thus we choose a ‘target’ value (e.g., ) for , and then we choose by solving the equation
The resulting continuation algorithm is described in Algorithm 4. Note that we start from , to steer clear of the double solution for . Convergence for such a value of is typically not problematic.
Note that for strategy T2 one would expect instead, but for simplicity (and to ensure a fair comparison between the methods) we use the same step size selection strategy.
6 Numerical experiments
We have implemented the described methods using Matlab, and compared them to some of the algorithms in  on the same benchmark set of small-size models () used in . We have used tolerance , where is the machine precision, as an initial value unless differently specified, and for Algorithm 4. To compute Perron vectors, we have used the output of Matlab’s eig. For problems with larger , different methods (eigs, inverse iteration, Gaussian elimination for kernel computation, etc.) can be considered .
The results, for various values of , are reported in Tables 1 to 3. Each of the 29 rows represents a different experiment in the benchmark set, and the columns stand for the different algorithms, according to the following legend.
The fixed-point iteration (3), from an initial value such that (and with renormalization to at each step).
The inner-outer iteration method, as described in .
The Perron method (Algorithm 2).
The Perron–Newton method (Algorithm 3).
- N-xxx, PN-xxx
N or PN methods combined with continuation with one of the four strategies described in Section 5.
For each experiment, we report the number of iterations required, and the CPU times in seconds (obtained on Matlab R2017a on a computer equipped with a 64-bit Intel core i5-6200U processor). The value NaN inside a table represents failure to converge after 10,000 iterations. For P and PN, the number of iterations is defined as the sum of the number of Newton iterations required to compute with Algorithm 1 and the number of iterations in the main algorithm. For the continuation algorithms, we report the number of inner iterations, that is, the total number of iterations performed by PN or N (respectively), summing along all calls to the subroutine.
Note that neither the number of iterations nor the CPU time are particularly indicative of the true performance of the algorithms: indeed, the iterations in the different methods amount to different quantity of work, since some require merely linear system solutions and some require eigenvalue computations. Moreover, Matlab introduces overheads which are difficult to predict for several instructions (such as variable access and function calls). In order to make the time comparisons more fair, for methods IO and N we did not use the code in , but rather rewrote it without making use of Matlab’s object-oriented features, which made the original functions significantly slower. The only change introduced with respect to their implementations is that we changed the stopping criterion to , so that the stopping criterion is the same one for all tested methods.
In any case, the performance comparison between the various methods should be taken with a pinch of salt.
We comment briefly on the results. Newton-based methods typically require a constant number of iterations to converge, but on some of the benchmarks they fail to converge or require a much larger number of iterations. From the point of view of reliability, combining the Perron–Newton algorithm with continuation strategies gives a definite advantage: the resulting methods are the only ones (among the ones considered) that can solve successfully all the problems in the original benchmark set in , which contained the experiments with all values of up to . Raising to the more extreme value of reveals failure cases for these methods as well. (However, if one reduces the step-size selection parameter to , methods PN-EXT and PN-T1 succeed on all examples also for .)
Our results confirm the findings of  that problem R6_3 (the third to last one) for is the hardest problem of the benchmark set; Perron-based algorithms can solve it successfully in the end, but (like the algorithms in ) they stagnate for a large number of iterations around a point which is far away from the true solution. The analysis in [1, End of Section 6.3 and caption of Figure 9] also attributes this difficulty to the presence of a ‘pseudo-solution’ that is an attraction basin for a large set of starting points.
To understand the specific difficulties encountered by continuation methods on problem R6_3, it is instructive to consider the plot in Figure 4.
This figure, which has been generated with the help of the software Bertini , shows the first entry of each valid stochastic solution to (2) (recall that there may be more than one solution), for varying values of . It turns out that the problem has a single solution for all values of up to , then two more solutions appear, initially coincident and then slowly separating one from the other. Two of these three solutions merge at about and then disappear (i.e., they are replaced by a pair of complex conjugate solutions), so for there is only one real stochastic solution remaining. Note that the points where the number of solutions changes are points in which the Jacobian is singular, in accordance with the implicit function theorem.
Continuation methods ‘track’ the upper solution, and hence fail to provide a good starting point for , when it disappears suddenly and is replaced by a completely different solution. Moreover, for , the numerical behavior of these iterations is affected by the presence of a point with a very small residual. For an intuition, the reader may consider the behavior of Newton’s method (on a function of a single variable) in presence of a local minimum very close to , e.g., on for a very small .
We have used the theory of quadratic vector equations in [8, 9, 10] to attack the multilinear PageRank problem described in . The tools that we have introduced improve the theoretical understanding of this equation and of its solutions. In particular, considering the minimal solution in addition to the stochastic ones allows one to use a broader array of algorithms, with computational advantage. The new algorithms achieve better results when , which is the most interesting and computationally challenging case.
Possible future improvements to these algorithms include experimenting with further step-size selection strategies, and more importantly integrating them with inexact eigenvalue solvers that are more suitable to computation with large sparse matrices.
-  Gleich DF, Lim LH, Yu Y. Multilinear PageRank. SIAM J. Matrix Anal. Appl. 2015; 36(4):1507–1541, doi:10.1137/140985160. URL http://dx.doi.org/10.1137/140985160.
-  Li W, Liu D, Ng MK, Vong SW. The uniqueness of multilinear pagerank vectors. Numerical Linear Algebra with Applications 2017; 24(6):e2107–n/a, doi:10.1002/nla.2107. URL http://dx.doi.org/10.1002/nla.2107, e2107 nla.2107.
-  Li W, Ng MK. On the limiting probability distribution of a transition probability tensor. Linear Multilinear Algebra 2014; 62(3):362–385, doi:10.1080/03081087.2013.777436. URL http://dx.doi.org/10.1080/03081087.2013.777436.
-  Lim LH. Singular values and eigenvalues of tensors: A variational approach. Proceedings of the 1st IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, 2005; 129–132, doi:10.1109/CAMAP.2005.1574201.
-  Chang KC, Pearson K, Zhang T. Perron-Frobenius theorem for nonnegative tensors. Commun. Math. Sci. 2008; 6(2):507–520. URL http://projecteuclid.org/euclid.cms/1214949934.
-  Friedland S, Gaubert S, Han L. Perron-Frobenius theorem for nonnegative multilinear forms and extensions. Linear Algebra Appl. 2013; 438(2):738–749, doi:10.1016/j.laa.2011.02.042. URL http://dx.doi.org/10.1016/j.laa.2011.02.042.
-  Hautphenne S, Latouche G, Remiche MA. Newton’s iteration for the extinction probability of a Markovian binary tree. Linear Algebra Appl. 2008; 428(11-12):2791–2804, doi:10.1016/j.laa.2007.12.024. URL http://dx.doi.org/10.1016/j.laa.2007.12.024.
-  Bini DA, Meini B, Poloni F. On the solution of a quadratic vector equation arising in Markovian binary trees. Numer. Linear Algebra Appl. 2011; 18(6):981–991, doi:10.1002/nla.809. URL http://dx.doi.org/10.1002/nla.809.
-  Meini B, Poloni F. A Perron iteration for the solution of a quadratic vector equation arising in Markovian binary trees. SIAM J. Matrix Anal. Appl. 2011; 32(1):248–261, doi:10.1137/100796765. URL http://dx.doi.org/10.1137/100796765.
-  Poloni F. Quadratic vector equations. Linear Algebra and its Applications 2013; 438(4):1627–1644, doi:10.1016/j.laa.2011.05.036. URL http://dx.doi.org/10.1016/j.laa.2011.05.036.
-  Ortega JM, Rheinboldt WC. Iterative solution of nonlinear equations in several variables, Classics in Applied Mathematics, vol. 30. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2000, doi:10.1137/1.9780898719468. URL http://dx.doi.org/10.1137/1.9780898719468, reprint of the 1970 original.
-  Li TY. Numerical solution of multivariate polynomial systems by homotopy continuation methods. Acta numerica, 1997, Acta Numer., vol. 6. Cambridge Univ. Press, Cambridge, 1997; 399–436, doi:10.1017/S0962492900002749. URL http://dx.doi.org/10.1017/S0962492900002749.
-  Rudin W. Principles of mathematical analysis. Third edn., McGraw-Hill Book Co., New York-Auckland-Düsseldorf, 1976. International Series in Pure and Applied Mathematics.
-  Gleich DF, Yu Y. mlpagerank. Github repository 2015. URL https://github.com/dgleich/mlpagerank.
-  Stewart WJ. Introduction to the numerical solution of Markov chains. Princeton University Press, Princeton, NJ, 1994.
-  Higham DJ, Higham NJ. MATLAB Guide. Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2000.
-  Bates DJ, Hauenstein JD, Sommese AJ, Wampler CW. Bertini: Software for numerical algebraic geometry. Available at http://bertini.nd.edu, doi:10.7274/R0H41PB5.