
# Perron–based algorithms for the multilinear PageRank

Beatrice Meini (Dipartimento di Matematica, Università di Pisa, Italy) and Federico Poloni (Dipartimento di Informatica, Università di Pisa, Italy; federico.poloni@unipi.it)
###### Abstract

We consider the multilinear PageRank problem studied in [Gleich, Lim and Yu, Multilinear PageRank, 2015], which is a system of quadratic equations with stochasticity and nonnegativity constraints. We use the theory of quadratic vector equations to prove several properties of its solutions and suggest new numerical algorithms. In particular, we prove the existence of a certain minimal solution, which does not always coincide with the stochastic one that is required by the problem. We use an interpretation of the solution as a Perron eigenvector to devise new fixed-point algorithms for its computation, and pair them with a continuation strategy based on a perturbative approach. The resulting numerical method is more reliable than the existing alternatives, being able to solve a larger number of problems.

Multilinear PageRank; Perron vector; fixed point iteration; Newton’s method

## 1 Introduction

Gleich, Lim and Yu [1] consider the following problem, arising as an approximate computation of the stationary measure of an order-2 Markov chain: given $\alpha\in(0,1)$, a nonnegative vector $v\in\mathbb{R}^n$ and a nonnegative matrix $R\in\mathbb{R}^{n\times n^2}$ such that

$$\mathbf{1}_n^\top v=1,\qquad \mathbf{1}_n^\top R=\mathbf{1}_{n^2}^\top, \tag{1}$$

solve for $x\in\mathbb{R}^n$ the equation

$$x=\alpha R(x\otimes x)+(1-\alpha)v. \tag{2}$$

The solution of interest is stochastic, i.e., $x\ge 0$ and $\mathbf{1}^\top x=1$. Here $\mathbf{1}$ denotes a column vector of all ones (with an optional subscript to specify its length), and inequalities between vectors and matrices are intended in the componentwise sense. In the paper [1], they prove some theoretical properties, consider several solution algorithms, and evaluate their performance. In the more recent paper [2], the authors improve some results concerning the uniqueness of the solution.
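To fix ideas, the following NumPy sketch builds a hypothetical two-dimensional instance of (2); the data $R$, $v$ and the value of $\alpha$ are toy choices of ours, not taken from the paper's benchmark set. It checks the hypotheses (1) and illustrates that $G$ maps stochastic vectors to stochastic vectors.

```python
# Hypothetical toy instance of (2): n = 2, with R, v, alpha chosen by hand
# (not taken from the paper's benchmark set).
import numpy as np

alpha = 0.8
R = np.array([[1.0, 0.0, 0.0, 0.5],
              [0.0, 1.0, 1.0, 0.5]])   # n x n^2, columns sum to 1
v = np.array([1.0, 0.0])               # stochastic vector

# Hypotheses (1): 1^T v = 1 and 1^T R = 1^T.
assert np.isclose(v.sum(), 1.0)
assert np.allclose(R.sum(axis=0), np.ones(4))

def G(x):
    # G(x) = alpha * R (x kron x) + (1 - alpha) * v
    return alpha * R @ np.kron(x, x) + (1 - alpha) * v

# The map G sends stochastic vectors to stochastic vectors:
x = np.array([0.3, 0.7])
print(G(x), G(x).sum())   # the image is again nonnegative with sum 1
```

Note that `np.kron(x, x)` uses the same ordering as $x\otimes x$ in (2).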

This problem originally appeared in [3], and is a variation of problems related to tensor eigenvalue problems and Perron–Frobenius theory for tensors; see, e.g., [4, 5, 6]. However, it also fits in the framework of quadratic vector equations derived from Markovian binary tree models, introduced in [7] and later considered in [8, 9, 10]. Indeed, the paper [10] considers a more general problem, which is essentially (2) without the hypotheses (1). Hence, all of its results apply here, and they can be used in the context of multilinear PageRank. In particular, [10] considers the minimal nonnegative solution of (2) (in the componentwise sense), which is not necessarily stochastic, unlike the solution sought in [1].

In this paper, we use the theory of quadratic vector equations in [8, 9, 10] to better understand the behavior of the solutions of (2) and suggest new algorithms for computing the stochastic solution. More specifically, we show that if one considers the minimal nonnegative solution of (2) as well, the theoretical properties of (2) become clearer, even if one is only interested in stochastic solutions. Indeed, we prove that there always exists a minimal nonnegative solution $m$, which is the unique stochastic solution when $\alpha\le 1/2$. When $\alpha>1/2$, the minimal nonnegative solution is not stochastic and there is at least one stochastic solution $s\ge m$. Note that [1, Theorem 4.3] already proves that the stochastic solution is unique when $\alpha<1/2$, and [2, Theorem 1] slightly improves this bound; our results give a broader characterization. All this is in Section 2.

When $\alpha\le 1/2$, as already pointed out in [1], computing the stochastic solution of (2) is easy. Indeed, this is also due to the fact that the stochastic solution is the minimal solution, and, for instance, the numerical methods proposed in [7, 8, 9] perform very well. The most difficult case is $\alpha>1/2$, in particular when $\alpha$ is close to $1$. Since the minimal solution $m$ of (2) can be easily computed, the idea is to compute and deflate it, with a strategy similar to the one developed in [8, 9], hence allowing us to compute stochastic solutions even when they are not minimal. The main tool in this approach is rearranging (2) to show that (after a change of variables) a solution corresponds to the Perron eigenvector of a certain matrix that depends on the solution itself. This interpretation in terms of Perron vectors allows us to devise new algorithms based on fixed-point iteration and on Newton's method. Sections 3 and 4 describe this deflation technique and the algorithms based on the Perron vector computation.

Finally, we propose in Section 5 a continuation strategy based on a perturbative approach: one first solves the problem for smaller values of $\alpha$, in order to obtain better starting values for the more challenging cases when $\alpha$ is close to $1$.

We report several numerical experiments in Section 6, to show the effectiveness of these new techniques on the set of small-scale benchmark problems introduced in [1], and draw some conclusions in Section 7.

## 2 Properties of the nonnegative solutions

In this section, we show properties of the nonnegative solutions of the equation (2). In particular, we prove that there always exists a minimal nonnegative solution, which is stochastic when $\alpha\le 1/2$. These properties can be derived by specializing known results on more general vector equations defined by bilinear forms [8, 9, 10].

We introduce the map

$$G(x):=\alpha R(x\otimes x)+(1-\alpha)v,$$

and its Jacobian

$$G'_x:=\alpha R(x\otimes I_n)+\alpha R(I_n\otimes x).$$

We have the following result.

###### Lemma 1.

Consider the fixed-point iteration

$$x_{k+1}=G(x_k),\qquad k=0,1,\dots, \tag{3}$$

started from $x_0=0$. Then the sequence of vectors $x_k$ is such that $x_k\le x_{k+1}$, the limit $m:=\lim_{k\to\infty}x_k$ exists, and $m$ is the minimal nonnegative solution of (2), i.e., equation (2) has a (unique) solution $m$ such that $m\le x$ for any other possible solution $x\ge 0$.

###### Proof.

The map $G$ is weakly positive, i.e., $G(x)\ge 0$ whenever $x\ge 0$. Moreover, if $0\le x\le y$ then $G(x)\le G(y)$. Therefore Condition A1 of the theory of quadratic vector equations [8, 9, 10] is satisfied which, according to the convergence results in that theory, implies that the sequence of vectors $x_k$ is bounded and converges monotonically to a vector $m$, which is the minimal nonnegative solution of (2). ∎
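As an illustration, the following NumPy sketch runs iteration (3) on hypothetical toy data of ours (not from the paper's benchmarks), with $\alpha=0.8$; the iterates increase monotonically, and the limit $m$ turns out not to be stochastic (its entries sum to $(1-\alpha)/\alpha=0.25$, a fact explained by the results of the next subsection).

```python
import numpy as np

alpha = 0.8   # supercritical: alpha > 1/2
R = np.array([[1.0, 0.0, 0.0, 0.5],
              [0.0, 1.0, 1.0, 0.5]])   # toy data, columns sum to 1
v = np.array([1.0, 0.0])

def G(x):
    return alpha * R @ np.kron(x, x) + (1 - alpha) * v

# Iteration (3), started from x_0 = 0; the iterates increase monotonically.
x = np.zeros(2)
for _ in range(200):
    x_next = G(x)
    assert np.all(x_next >= x - 1e-14)   # monotonicity (Lemma 1), up to roundoff
    x = x_next

m = x
print(m, m.sum())   # the limit m is NOT stochastic: 1^T m = (1-alpha)/alpha
```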

### 2.1 Sum of entries and criticality

In this specific problem, the hypotheses (1) enforce a stronger structure on the iterates of (3): the sum of the entries of $G(x)$ is a function of the sum of the entries of $x$ only.

###### Lemma 2.

For any $x\in\mathbb{R}^n$, it holds that $\mathbf{1}^\top G(x)=\alpha(\mathbf{1}^\top x)^2+(1-\alpha)$.

###### Proof.
$$\begin{aligned}\mathbf{1}^\top G(x)&=\mathbf{1}^\top\bigl(\alpha R(x\otimes x)+(1-\alpha)v\bigr)=\alpha\mathbf{1}^\top R(x\otimes x)+(1-\alpha)\mathbf{1}^\top v\\&=\alpha\mathbf{1}^\top(x\otimes x)+(1-\alpha)=\alpha(\mathbf{1}^\top x)^2+(1-\alpha).\end{aligned}\qquad\qed$$

This fact has important consequences for the sum of the entries of the solutions of (2).

###### Corollary 3.

For each solution $s\ge 0$ of (2), the sum $\mathbf{1}^\top s$ is one of the two solutions of the quadratic equation $\gamma=\alpha\gamma^2+(1-\alpha)$, i.e., $\mathbf{1}^\top s=1$ or $\mathbf{1}^\top s=\frac{1-\alpha}{\alpha}$.
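The two roots can be read off from the factorization of the associated quadratic (a routine check):

```latex
% gamma = alpha*gamma^2 + (1-alpha)  <=>  alpha*gamma^2 - gamma + (1-alpha) = 0
\alpha\gamma^2-\gamma+(1-\alpha)
  =\alpha(\gamma-1)\left(\gamma-\frac{1-\alpha}{\alpha}\right)=0
  \quad\Longleftrightarrow\quad
  \gamma=1 \ \text{ or } \ \gamma=\frac{1-\alpha}{\alpha}.
```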

Let $\gamma$ be one of the two solutions of $\gamma=\alpha\gamma^2+(1-\alpha)$, and define the level set $\mathcal{S}_\gamma:=\{x\ge 0:\mathbf{1}^\top x=\gamma\}$. Since $\mathcal{S}_\gamma$ is convex and compact, and since $G$ maps $\mathcal{S}_\gamma$ to itself by Lemma 2, the Brouwer fixed-point theorem implies the following result.

###### Corollary 4.

There exists at least a solution $x\ge 0$ to (2) with $\mathbf{1}^\top x=1$ and a solution with $\mathbf{1}^\top x=\frac{1-\alpha}{\alpha}$.

Hence we can have two different settings, for which we borrow the terminology from the literature on branching processes and Markovian binary trees [8, 9].

Subcritical case

$\alpha\le 1/2$: then $\frac{1-\alpha}{\alpha}\ge 1$, hence the minimal nonnegative solution $m$ is the unique stochastic solution (indeed, any stochastic solution $x$ satisfies $x\ge m$ and $\mathbf{1}^\top x=\mathbf{1}^\top m=1$, hence $x=m$).

Supercritical case

$\alpha>1/2$: then the minimal nonnegative solution satisfies $\mathbf{1}^\top m=\frac{1-\alpha}{\alpha}<1$, and there is at least one stochastic solution $s\ge m$.

Note that [1, Theorem 4.3] already proves that the stochastic solution is unique when $\alpha<1/2$; these results give a broader characterization.

The tools that we have introduced can already be used to determine the behavior of simple iterations such as (3).

###### Theorem 5.

Consider the fixed-point iteration (3), with a certain initial value $x_0\ge 0$, for the problem (2) with $\alpha>1/2$. Define $\gamma_k:=\mathbf{1}^\top x_k$. Then,

• If $\gamma_0<1$, then $\gamma_k\to\frac{1-\alpha}{\alpha}$, and the iteration (3) can converge only to the minimal solution $m$ (if it converges).

• If $\gamma_0=1$, then $\gamma_k=1$ for each $k$, hence the iteration (3) can converge only to a stochastic solution (if it converges).

• If $\gamma_0>1$, then $\gamma_k\to\infty$, hence the iteration (3) diverges.

###### Proof.

Thanks to Lemma 2, the quantity $\gamma_k=\mathbf{1}^\top x_k$ evolves according to the scalar iteration $\gamma_{k+1}=\alpha\gamma_k^2+(1-\alpha)$. So the result follows from the theory of scalar fixed-point iterations, since this scalar iteration converges to $\frac{1-\alpha}{\alpha}$ for $0\le\gamma_0<1$, is stationary at $1$ for $\gamma_0=1$, and diverges for $\gamma_0>1$. ∎
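The scalar recurrence in the proof is easy to experiment with. The following sketch (with the hypothetical value $\alpha=0.8$, so the fixed points are $(1-\alpha)/\alpha=0.25$, attracting, and $1$, repelling) illustrates the three regimes of Theorem 5 at the level of the sums $\gamma_k$:

```python
# The scalar recurrence gamma_{k+1} = alpha*gamma_k^2 + (1-alpha) that
# governs the sums of the iterates of (3), with alpha = 0.8 (toy value).
alpha = 0.8

def f(g):
    return alpha * g * g + (1 - alpha)

# gamma_0 < 1: the sums converge to (1-alpha)/alpha = 0.25.
g_low = 0.9
for _ in range(200):
    g_low = f(g_low)
print(g_low)            # ~0.25

# gamma_0 = 1: the sums stay equal to 1 (a fixed point, but a repelling one).
assert f(1.0) == 1.0

# gamma_0 > 1: the sums blow up.
g_high, steps = 1.01, 0
while g_high < 1e6 and steps < 1000:
    g_high, steps = f(g_high), steps + 1
print(steps, g_high)    # escapes past 1e6 after a few dozen steps
```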

An analogous result holds for the subcritical case.

The papers [8, 9, 10] describe several methods to compute the minimal solution $m$. In particular, all the methods described there exhibit monotonic convergence, that is, $x_k\le x_{k+1}\le m$. Due to the uniqueness and the monotonic convergence properties, computing the minimal solution is typically simple, fast, and free of numerical issues. Hence, in the subcritical case, the multilinear PageRank problem is easy to solve. The supercritical case is more problematic.

Among all available algorithms to compute the minimal solution $m$, we recall Newton's method, which is one of the most efficient ones. The Newton–Raphson method applied to the function $F(x):=x-\alpha R(x\otimes x)-(1-\alpha)v$ generates the sequence

$$(I-G'_{x_k})x_{k+1}=(1-\alpha)v-\alpha R(x_k\otimes x_k),\qquad k=0,1,\dots. \tag{4}$$

The following result holds [10, Theorem 13].

###### Lemma 6.

Suppose that $m>0$, and that $G'_m$ is irreducible. Then, Newton's method (4) starting from $x_0=0$ is well defined and converges monotonically to $m$ (i.e., $x_k\le x_{k+1}\le m$).

Algorithm 1 shows a straightforward implementation of Newton's method as described above.
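A minimal NumPy sketch of the Newton iteration (4), on hypothetical toy data of ours (not the paper's benchmarks), reads as follows; the iterates converge quadratically to the minimal solution $m$.

```python
import numpy as np

alpha = 0.8
R = np.array([[1.0, 0.0, 0.0, 0.5],
              [0.0, 1.0, 1.0, 0.5]])   # toy data, columns sum to 1
v = np.array([1.0, 0.0])
n = 2
I = np.eye(n)

def Gprime(x):
    # G'_x = alpha*R*(x kron I) + alpha*R*(I kron x)
    xc = x.reshape(-1, 1)
    return alpha * R @ np.kron(xc, I) + alpha * R @ np.kron(I, xc)

# Newton iteration (4), started from x_0 = 0.
x = np.zeros(n)
for _ in range(30):
    rhs = (1 - alpha) * v - alpha * R @ np.kron(x, x)
    x = np.linalg.solve(I - Gprime(x), rhs)

print(x, x.sum())   # the minimal solution m, with 1^T m = (1-alpha)/alpha
```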

Note that the theory in [10, Section 9] shows how one can predict where zeros appear in $m$ and eliminate them, reducing the problem to a smaller one. Indeed, in view of the probabilistic interpretation of multilinear PageRank, zero entries in $m$ can appear only when the second-order Markov chain associated to $R$ is not irreducible. So we can assume in the following that $m>0$, and that the nonnegative matrix $G'_m$ is irreducible. In particular, in this case $G'_m$ is also substochastic (as proved in [10, Theorem 6]).

## 3 Deflating the minimal solution

From now on, we assume to be in the supercritical case, i.e., $\alpha>1/2$, and that the minimal solution $m$ has already been computed and is explicitly available.

We wish to adapt to this setting the deflation strategy introduced in [8, 9]. Since all stochastic solutions to (2) satisfy $s\ge m$, it makes sense to change variable to $y:=s-m\ge 0$, to obtain an equation in $y$. After a few manipulations, using the bilinearity of $(u,w)\mapsto R(u\otimes w)$ and the fact that $m=\alpha R(m\otimes m)+(1-\alpha)v$, one gets

$$y=\alpha R(y\otimes y)+\alpha R(y\otimes m)+\alpha R(m\otimes y)=\alpha R(y\otimes y)+G'_m y=(\alpha R(y\otimes I_n)+G'_m)\,y. \tag{5}$$

Moreover,

$$\mathbf{1}^\top y=\mathbf{1}^\top(s-m)=1-\frac{1-\alpha}{\alpha}=\frac{2\alpha-1}{\alpha}. \tag{6}$$

We set $P_y:=\alpha R(y\otimes I_n)+G'_m$. Note that $P_y\ge G'_m$ for each $y\ge 0$, hence it is irreducible. In addition, if $y\ge 0$ is chosen such that $\mathbf{1}^\top y=\frac{2\alpha-1}{\alpha}$ as in (6), then

$$\begin{aligned}\mathbf{1}^\top P_y&=\alpha\mathbf{1}^\top R(y\otimes I)+\alpha\mathbf{1}^\top R(m\otimes I)+\alpha\mathbf{1}^\top R(I\otimes m)\\&=\alpha\mathbf{1}^\top(y\otimes I)+\alpha\mathbf{1}^\top(m\otimes I)+\alpha\mathbf{1}^\top(I\otimes m)\\&=\alpha\frac{2\alpha-1}{\alpha}\mathbf{1}^\top+\alpha\frac{1-\alpha}{\alpha}\mathbf{1}^\top+\alpha\frac{1-\alpha}{\alpha}\mathbf{1}^\top=\mathbf{1}^\top,\end{aligned} \tag{7}$$

so $P_y$ is a stochastic matrix.

Let us introduce the map $u=\mathrm{PV}(M)$ that gives the Perron vector of an irreducible nonnegative matrix $M$, normalized such that $\mathbf{1}^\top u=1$. Then, since $P_y$ is irreducible and stochastic, (5) and (6) show that

$$y=\frac{2\alpha-1}{\alpha}\,\mathrm{PV}(P_y), \tag{8}$$

i.e., the sought vector $y$ is the Perron vector of the matrix $P_y$, multiplied by the constant $\frac{2\alpha-1}{\alpha}$.

## 4 Perron-based algorithms

Equation (8) suggests a new fixed-point iteration for computing $y$, which is analogous to the one appearing in [8, 9],

$$y_{k+1}=\frac{2\alpha-1}{\alpha}\,\mathrm{PV}(P_{y_k}), \tag{9}$$

starting from a given nonnegative vector $y_0$ such that $\mathbf{1}^\top y_0=\frac{2\alpha-1}{\alpha}$. This iteration is implemented in Algorithm 2.

We may also apply Newton's method to find a solution to (8), following the same approach. To this end, we first compute the Jacobian of the function $w(y):=\mathrm{PV}(P_y)$.

###### Lemma 7.

Let $w(y):=\mathrm{PV}(P_y)$, with $y\ge 0$ such that $\mathbf{1}^\top y=\frac{2\alpha-1}{\alpha}$. Then, its Jacobian is given by

$$w'(y)=\alpha\left((I-P_y+w(y)\mathbf{1}^\top)^{-1}R(I\otimes w(y))-w(y)\mathbf{1}^\top\right). \tag{10}$$
###### Proof.

Let us compute the directional derivative of $w$ along a direction $z$. We set $v(h):=\mathrm{PV}(P_{y+hz})$; hence, $w(y)=v(0)$. Since $P_y$ is irreducible, its Perron eigenvalue is a simple eigenvalue, and hence we can define locally smooth functions $\lambda(h)$ and $v(h)$ as the Perron eigenvalue and (normalized) Perron eigenvector of $P_{y+hz}$. A computation analogous to (7) shows that $\mathbf{1}^\top P_{y+hz}=(1+\alpha h\,\mathbf{1}^\top z)\mathbf{1}^\top$, hence $\lambda(h)=1+\alpha h\,\mathbf{1}^\top z$ and $\frac{d\lambda}{dh}(0)=\alpha\,\mathbf{1}^\top z$.

By differentiating $P_{y+hz}v(h)=\lambda(h)v(h)$ and evaluating at $h=0$, we get

$$\alpha R(z\otimes I)v(0)+P_y\frac{dv}{dh}(0)=\alpha(\mathbf{1}^\top z)v(0)+\frac{dv}{dh}(0),$$

or, rearranging terms,

$$(I-P_y)\frac{dv}{dh}(0)=\alpha\bigl(R(I\otimes v(0))-v(0)\mathbf{1}^\top\bigr)z,$$

where we have used $R(z\otimes I)v(0)=R(z\otimes v(0))=R(I\otimes v(0))z$. Since we defined $v(h)$ so that $\mathbf{1}^\top v(h)=1$, we have $\mathbf{1}^\top\frac{dv}{dh}(0)=0$, and hence also

$$(I-P_y+v(0)\mathbf{1}^\top)\frac{dv}{dh}(0)=\alpha\bigl(R(I\otimes v(0))-v(0)\mathbf{1}^\top\bigr)z.$$

The matrix in the left-hand side is nonsingular, since it can be obtained by replacing the eigenvalue $0$ with $1$ in the eigendecomposition of the singular irreducible M-matrix $I-P_y$. Thus we can write

$$\frac{dv}{dh}(0)=\alpha(I-P_y+v(0)\mathbf{1}^\top)^{-1}\bigl(R(I\otimes v(0))-v(0)\mathbf{1}^\top\bigr)z.$$

Since $(I-P_y+v(0)\mathbf{1}^\top)^{-1}v(0)=v(0)$, we can simplify this expression further to

$$\frac{dv}{dh}(0)=\alpha\bigl((I-P_y+v(0)\mathbf{1}^\top)^{-1}R(I\otimes v(0))-v(0)\mathbf{1}^\top\bigr)z,$$

from which we get (10). ∎

From the above result, the Jacobian of the function $H(y):=y-\frac{2\alpha-1}{\alpha}w(y)$ is given by

$$H'_y=I_n+(2\alpha-1)w(y)\mathbf{1}^\top-(2\alpha-1)(I-P_y+w(y)\mathbf{1}^\top)^{-1}R(I\otimes w(y)), \tag{11}$$

and Newton's method consists of generating the sequence of vectors

$$y_{k+1}=y_k-(H'_{y_k})^{-1}H(y_k).$$

The Perron–Newton method is described in Algorithm 3.

The standard theorems on local convergence of Newton's method imply the following result.

###### Theorem 8.

Suppose that the matrix $H'_y$ is nonsingular at $y=s-m$, where $s$ is a stochastic solution of (2). Then the Perron–Newton method is locally convergent to this stochastic solution of (2).

###### Remark 9.

Since $\mathbf{1}^\top P_y=\mathbf{1}^\top$ and $\mathbf{1}^\top w(y)=1$, the matrix $H'_y$ has an eigenvalue $1$ with left eigenvector $\mathbf{1}^\top$, for each admissible value of $y$.

## 5 Continuation techniques

The above algorithms, as well as those in [1], are sufficient to solve most of the test problems that are explored in [1]. However, especially when $\alpha$ is close to $1$, the algorithms may converge very slowly or stagnate far away from the sought solution. For this reason, we explore an additional technique based on a perturbation expansion of the solution as a function of $\alpha$, inspired loosely by the ideas of homotopy continuation, a well-known strategy to derive approximate solutions for parameter-dependent equations.

The main result is the following.

###### Lemma 10.

Let $s$ be a stochastic solution of problem (2) (for a certain value of $\alpha$), and suppose that $I-G'_s$ is nonsingular. Then, for $\hat\alpha$ sufficiently close to $\alpha$, there is a stochastic solution $s_{\hat\alpha}$ to (2) with the parameter $\alpha$ replaced by $\hat\alpha$ such that

$$s_{\hat\alpha}=s+s^{(1)}(\hat\alpha-\alpha)+s^{(2)}(\hat\alpha-\alpha)^2+O((\hat\alpha-\alpha)^3), \tag{12}$$

with $s^{(1)}=(I-G'_s)^{-1}\bigl(R(s\otimes s)-v\bigr)$, $s^{(2)}=(I-G'_s)^{-1}\bigl(R(t\otimes s^{(1)})+R(s^{(1)}\otimes t)\bigr)$, where $t:=s+\frac{\alpha}{2}s^{(1)}$.

###### Proof.

Let us make the dependence of the various quantities on the parameter explicit, i.e., we write $s_\alpha$ to denote a stochastic solution of (2) for a certain value of the parameter $\alpha$, and similarly $F_\alpha$ and $G'_{\alpha,x}:=\alpha R(x\otimes I)+\alpha R(I\otimes x)$.

We apply the implicit function theorem [13, Theorem 9.28] to

$$0=F_\alpha(s_\alpha)=s_\alpha-\alpha R(s_\alpha\otimes s_\alpha)-(1-\alpha)v,$$

obtaining

$$\begin{aligned}\frac{\partial F_\alpha}{\partial s_\alpha}&=I-\alpha R(s_\alpha\otimes I)-\alpha R(I\otimes s_\alpha)=I-G'_{\alpha,s_\alpha},\\ \frac{\partial F_\alpha}{\partial\alpha}&=v-R(s_\alpha\otimes s_\alpha),\\ s^{(1)}=\frac{ds_\alpha}{d\alpha}&=-\left(\frac{\partial F_\alpha}{\partial s_\alpha}\right)^{-1}\frac{\partial F_\alpha}{\partial\alpha}=-(I-G'_{\alpha,s_\alpha})^{-1}\bigl(v-R(s_\alpha\otimes s_\alpha)\bigr).\end{aligned} \tag{13}$$

Differentiating (13) with respect to $\alpha$, we get

$$\begin{aligned}\frac{d}{d\alpha}\frac{\partial F_\alpha}{\partial s_\alpha}&=-R(s_\alpha\otimes I)-R(I\otimes s_\alpha)-\alpha R(s^{(1)}\otimes I)-\alpha R(I\otimes s^{(1)}),\\ \frac{d}{d\alpha}\frac{\partial F_\alpha}{\partial\alpha}&=-R(s^{(1)}\otimes s_\alpha)-R(s_\alpha\otimes s^{(1)}),\\ s^{(2)}=\frac12\frac{d^2s_\alpha}{d\alpha^2}&=\frac12\left(\frac{\partial F_\alpha}{\partial s_\alpha}\right)^{-1}\left(\frac{d}{d\alpha}\frac{\partial F_\alpha}{\partial s_\alpha}\right)\left(\frac{\partial F_\alpha}{\partial s_\alpha}\right)^{-1}\frac{\partial F_\alpha}{\partial\alpha}-\frac12\left(\frac{\partial F_\alpha}{\partial s_\alpha}\right)^{-1}\frac{d}{d\alpha}\frac{\partial F_\alpha}{\partial\alpha}\\&=-\frac12\left(\frac{\partial F_\alpha}{\partial s_\alpha}\right)^{-1}\left(\frac{d}{d\alpha}\frac{\partial F_\alpha}{\partial s_\alpha}\right)s^{(1)}-\frac12\left(\frac{\partial F_\alpha}{\partial s_\alpha}\right)^{-1}\frac{d}{d\alpha}\frac{\partial F_\alpha}{\partial\alpha}\\&=(I-G'_{\alpha,s_\alpha})^{-1}\bigl(R(t_\alpha\otimes s^{(1)})+R(s^{(1)}\otimes t_\alpha)\bigr),\qquad t_\alpha:=s_\alpha+\tfrac{\alpha}{2}s^{(1)}.\end{aligned}$$

The function $\hat\alpha\mapsto s_{\hat\alpha}$ obtained by the theorem must satisfy $\mathbf{1}^\top s_{\hat\alpha}=1$ for each $\hat\alpha$: indeed, by Corollary 3, for a solution of the equation it must hold either $\mathbf{1}^\top s_{\hat\alpha}=1$ or $\mathbf{1}^\top s_{\hat\alpha}=\frac{1-\hat\alpha}{\hat\alpha}$, and the continuous function $\hat\alpha\mapsto\mathbf{1}^\top s_{\hat\alpha}$ cannot jump from one branch to the other without taking intermediate values.

The formula (12) now follows by Taylor expansion. ∎

This result suggests a strategy to improve convergence for $\alpha$ close to $1$: we start solving the problem with a smaller value of $\alpha$, then we solve it repeatedly for increasing values $\alpha_0<\alpha_1<\dots<\alpha_N=\alpha$; at each step we use as a starting point for our iterative algorithms an approximation of $s_{\alpha_{h+1}}$ constructed from the previously computed solutions.

We tried several different strategies to construct this approximation.

T2

Second-order Taylor expansion: $s_{\alpha_{h+1}}\approx s_{\alpha_h}+s^{(1)}(\alpha_{h+1}-\alpha_h)+s^{(2)}(\alpha_{h+1}-\alpha_h)^2$;

T1

First-order Taylor expansion: $s_{\alpha_{h+1}}\approx s_{\alpha_h}+s^{(1)}(\alpha_{h+1}-\alpha_h)$;

EXT

Linear extrapolation from the previous two values: $s_{\alpha_{h+1}}\approx s_{\alpha_h}+\frac{\alpha_{h+1}-\alpha_h}{\alpha_h-\alpha_{h-1}}(s_{\alpha_h}-s_{\alpha_{h-1}})$;

IMP

An attempt at building a 'semi-implicit method' with only the values available: ideally, one would like to set $s_{\alpha_{h+1}}\approx s_{\alpha_h}+s^{(1)}_{\alpha_{h+1}}(\alpha_{h+1}-\alpha_h)$ (which would correspond to expressing $s_{\alpha_h}$ with a Taylor expansion centered in $\alpha_{h+1}$); however, we do not know $s^{(1)}_{\alpha_{h+1}}$. Instead, we take $s_{\alpha_{h+1}}\approx s_{\alpha_h}+\tilde s^{(1)}(\alpha_{h+1}-\alpha_h)$ with

$$\tilde s^{(1)}=\bigl(I-\alpha_{h+1}R(s_{\alpha_h}\otimes I)-\alpha_{h+1}R(I\otimes s_{\alpha_h})\bigr)^{-1}\bigl(R(s_{\alpha_h}\otimes s_{\alpha_h})-v\bigr),$$

i.e., we use the new value of $\alpha$ and the old value of $s_\alpha$ in the expression (13) for the first derivative.

The only remaining choice is designing an effective strategy to choose the next point $\alpha_{h+1}$. We adopt the following one. Note that for T1, EXT, IMP we expect a prediction error $\|s_{\alpha_{h+1}}-\tilde s_{\alpha_{h+1}}\|=O((\alpha_{h+1}-\alpha_h)^2)$, where $\tilde s_{\alpha_{h+1}}$ denotes the predicted starting value. Hence one can expect

$$\frac{\|s_{\alpha_h}-\tilde s_{\alpha_h}\|}{(\alpha_h-\alpha_{h-1})^2}\approx\frac{\|s_{\alpha_{h+1}}-\tilde s_{\alpha_{h+1}}\|}{(\alpha_{h+1}-\alpha_h)^2}.$$

Thus we choose a 'target' value $\tau$ for the new prediction error, and then we choose $\alpha_{h+1}$ by solving the equation

$$\frac{\|s_{\alpha_h}-\tilde s_{\alpha_h}\|}{(\alpha_h-\alpha_{h-1})^2}=\frac{\tau}{(\alpha_{h+1}-\alpha_h)^2}.$$

The resulting continuation algorithm is described in Algorithm 4. Note that we start from a value $\alpha_0$ slightly larger than $1/2$, to steer clear of the double solution at $\alpha=1/2$. Convergence for such a value of $\alpha$ is typically not problematic.

Note that for strategy T2 one would expect a prediction error $O((\alpha_{h+1}-\alpha_h)^3)$ instead, but for simplicity (and to ensure a fair comparison between the methods) we use the same step-size selection strategy.
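A minimal NumPy sketch of the continuation idea, on hypothetical toy data of ours: for brevity it uses a fixed grid of values $\alpha_h$ instead of the adaptive step-size rule, the T1 predictor, and a plain Newton corrector on $F(s)=s-\alpha R(s\otimes s)-(1-\alpha)v$ rather than the Perron–Newton method; the initial solve by Newton from the barycenter is also a simplification of ours.

```python
import numpy as np

R = np.array([[1.0, 0.0, 0.0, 0.5],
              [0.0, 1.0, 1.0, 0.5]])   # toy data, columns sum to 1
v = np.array([1.0, 0.0])
n = 2
I = np.eye(n)

def F(s, a):
    return s - a * R @ np.kron(s, s) - (1 - a) * v

def J(s, a):
    # Jacobian I - G'_s of F at s, for the parameter value a
    sc = s.reshape(-1, 1)
    return I - a * R @ np.kron(sc, I) - a * R @ np.kron(I, sc)

def newton(s, a, iters=30):
    for _ in range(iters):
        s = s - np.linalg.solve(J(s, a), F(s, a))
    return s

# Solve once for an easy value alpha_0 slightly above 1/2 ...
a = 0.55
s = newton(np.full(n, 1.0 / n), a)

# ... then continue towards alpha ~ 1 with the T1 predictor from (13).
for a_next in [0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]:
    s1 = np.linalg.solve(J(s, a), R @ np.kron(s, s) - v)   # s^(1)
    s = newton(s + s1 * (a_next - a), a_next)              # predict + correct
    a = a_next

print(a, s, s.sum())   # a stochastic solution for the final alpha
```

Each corrector starts from a prediction whose error is $O((\alpha_{h+1}-\alpha_h)^2)$, so a handful of Newton steps suffices at every grid point.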

## 6 Numerical experiments

We have implemented the described methods in Matlab, and compared them to some of the algorithms in [1] on the same benchmark set of small-size models used in [1]. We have used a stopping tolerance proportional to the machine precision, and kept the initial value and the step-size target $\tau$ in Algorithm 4 fixed across all experiments, unless differently specified. To compute Perron vectors, we have used the output of Matlab's eig. For problems with larger $n$, different methods (eigs, inverse iteration, Gaussian elimination for kernel computation, etc.) can be considered.

The results, for various values of $\alpha$, are reported in Tables 1 to 3. Each of the 29 rows represents a different experiment in the benchmark set, and the columns stand for the different algorithms, according to the following legend.

F

The fixed-point iteration (3), from a stochastic initial value (and with renormalization to $\mathbf{1}^\top x_k=1$ at each step).

IO

The inner-outer iteration method, as described in [1].

N

Newton's method with normalization to $\mathbf{1}^\top x_k=1$ at each step, as described in [1]. Note that this is not Algorithm 1, which would converge instead to the minimal solution $m$.

P

The Perron method (Algorithm 2).

PN

The Perron–Newton method (Algorithm 3).

N-xxx, PN-xxx

N or PN methods combined with the continuation strategy of Algorithm 4, using one of the four prediction strategies described in Section 5.

For each experiment, we report the number of iterations required and the CPU times in seconds (obtained on Matlab R2017a on a computer equipped with a 64-bit Intel core i5-6200U processor). The value NaN inside a table represents failure to converge after 10,000 iterations. For P and PN, the number of iterations is defined as the sum of the number of Newton iterations required to compute the minimal solution $m$ with Algorithm 1 and the number of iterations in the main algorithm. For the continuation algorithms, we report the number of inner iterations, that is, the total number of iterations performed by PN or N (respectively), summed over all calls to the subroutine.

In addition to the tables, performance profiles [16, Section 22.4] are reported in Figures 1 to 3.

Note that neither the number of iterations nor the CPU time is particularly indicative of the true performance of the algorithms: indeed, the iterations in the different methods amount to different quantities of work, since some require merely linear system solutions and some require eigenvalue computations. Moreover, Matlab introduces overheads which are difficult to predict for several instructions (such as variable access and function calls). In order to make the time comparisons fairer, for methods IO and N we did not use the code distributed with [1], but rather rewrote it without making use of Matlab's object-oriented features, which made the original functions significantly slower. The only change introduced with respect to their implementations is in the stopping criterion, which we modified so that it is the same for all tested methods.

In any case, the performance comparison between the various methods should be taken with a pinch of salt.

We comment briefly on the results. Newton-based methods typically require a constant number of iterations to converge, but on some of the benchmarks they fail to converge or require a much larger number of iterations. From the point of view of reliability, combining the Perron–Newton algorithm with continuation strategies gives a definite advantage: the resulting methods are the only ones (among the ones considered) that can solve successfully all the problems in the original benchmark set in [1], for all the values of $\alpha$ considered there. Raising $\alpha$ to a more extreme value even closer to $1$ reveals failure cases for these methods as well. (However, if one reduces the step-size selection parameter $\tau$, methods PN-EXT and PN-T1 succeed on all examples also in this more extreme setting.)

Our results confirm the findings of [1] that problem R6_3 (the third to last one), for large values of $\alpha$, is the hardest problem of the benchmark set; Perron-based algorithms can solve it successfully in the end, but (like the algorithms in [1]) they stagnate for a large number of iterations around a point which is far away from the true solution. The analysis in [1, end of Section 6.3 and caption of Figure 9] also attributes this difficulty to the presence of a 'pseudo-solution' that is an attraction basin for a large set of starting points.

To understand the specific difficulties encountered by continuation methods on problem R6_3, it is instructive to consider the plot in Figure 4.

Figure 4: The first entry $s_1$ of each stochastic solution to (2) for the benchmark problem R6_3, plotted for various values of $\alpha$ (on the x-axis). The behavior of all other entries of $s$ is similar.

This figure, which has been generated with the help of the software Bertini, shows the first entry of each valid stochastic solution to (2) (recall that there may be more than one solution), for varying values of $\alpha$. It turns out that the problem has a single stochastic solution for all values of $\alpha$ up to a first threshold, where two more solutions appear, initially coincident and then slowly separating one from the other. Two of these three solutions merge again at a second, larger value of $\alpha$ and then disappear (i.e., they are replaced by a pair of complex conjugate solutions), so for larger $\alpha$ there is only one real stochastic solution remaining. Note that the points where the number of solutions changes are points in which the Jacobian $I-G'_{s_\alpha}$ is singular, in accordance with the implicit function theorem.

Continuation methods 'track' the upper solution branch, and hence fail to provide a good starting point past the second threshold, where this branch disappears suddenly and is replaced by a completely different solution. Moreover, near this value of $\alpha$, the numerical behavior of these iterations is affected by the presence of a point with a very small residual. For an intuition, the reader may consider the behavior of Newton's method (on a function of a single variable) in presence of a local minimum with value very close to $0$, e.g., on $f(x)=x^2+\varepsilon$ for a very small $\varepsilon>0$.

## 7 Conclusions

We have used the theory of quadratic vector equations in [8, 9, 10] to attack the multilinear PageRank problem described in [1]. The tools that we have introduced improve the theoretical understanding of this equation and of its solutions. In particular, considering the minimal solution in addition to the stochastic ones allows one to use a broader array of algorithms, with a computational advantage. The new algorithms achieve better results when $\alpha$ is close to $1$, which is the most interesting and computationally challenging case.

Possible future improvements to these algorithms include experimenting with further step-size selection strategies, and more importantly integrating them with inexact eigenvalue solvers that are more suitable to computation with large sparse matrices.

## References
