Convex Optimization methods for computing
Exponent of matrices
The first author is supported by the RFBR grants No 10-01-00293 and No
11-01-00329, and by the grant of Dynasty foundation. This research was carried out while the first author was visiting
Center of Operation Research and Econometrics (CORE) and Université Catholique de Louvain (Louvain-la-Neuve, Belgium)
in Spring, 2011. That author is grateful to the institute and to the university for their hospitality.
The second author is an F.R.S.-FNRS fellow.
We introduce a new approach to evaluate the largest Lyapunov exponent of a family of nonnegative matrices. The method is based on using special positive homogeneous functionals on , which gives iterative lower and upper bounds for the Lyapunov exponent. They improve previously known bounds and converge to the real value. The rate of converges is estimated and the efficiency of the algorithm is demonstrated on several problems from applications (in functional analysis, combinatorics, and language theory) and on numerical examples with randomly generated matrices. The method computes the Lyapunov exponent with a prescribed accuracy in relatively high dimensions (up to 60). We generalize this approach to all matrices, not necessarily nonnegative, derive a new universal upper bound for the Lyapunov exponent, and show that such a lower bound, in general, does not exist.
Let us consider a family of linear operators acting in . To each operator we associate a positive number so that . In the sequel we assume that every family of operators is equipped with a family of numbers. Consider a random product , where all indices are independent and identically distributed random variables; each takes values with probabilities respectively. According to the Furstenberg-Kesten theorem  the value converges with probability to a number , which depends only on the family , i.e., on the operators and on the probabilities . The number is called the largest Lyapunov exponent of the family . In this paper we do not deal with other Lyapunov exponents, and, for the sake of simplicity, we omit the word “largest”. This number can be defined by the following limit formula
where denotes the mathematical expectation. The results of this paper can be extended to more general matrix distributions, but we restrict ourselves to i.i.d. matrices taking values on a finite set .
We introduce a new approach for computing the Lyapunov exponent based on using special positive homogeneous functionals on . The idea is the following: for any such a functional the minimal and maximal expected value of over all , give a lower and upper bound respectively for . For families of nonnegative matrices those bounds can be effectively computed and then optimized over certain families of functionals , which leads to optimal bounds that, under some general assumptions, rapidly converge to as . For every both and are found by solving unconstrained convex minimization problems. The rate of convergence is proved to be at least linear in , but in most of practical examples it is much faster. For dimensions up to it usually takes less than iterations to estimate with the relative precision . All computations take a few minutes on a standard desktop computer.
In Section II we describe the new approach for operators with a common invariant cone, then in Section III we consider two special families of functionals and the corresponding bounds and . In Section IV it is shown that for nonnegative matrices both those bounds can be found and optimized over the corresponding families as solutions of certain convex minimization problems. In Theorems 2 and 3 we prove that under some general assumptions on matrices we have , where is an effective constant. In Section V this technique is extended to all matrices, without the nonnegativity condition. We derive an upper bound for , which is sharper than the classical upper bound . On the other hand, Theorem 4 proved in that section shows that there are no good lower bounds for the Lyapunov exponent of general matrices. Finally, in Section VI we compute or estimate Lyapunov exponents of special families of matrices arising in problems of functional analysis, combinatorics, and language theory, and also report numerical results with randomly generated matrices.
Lyapunov exponents of matrices have been studied in the literature in great detail due to many applications in probability, ergodic theory, functional analysis, combinatorics, etc. (see [18, 26, 19, 11, 7, 21, 16] and references therein). A special attention has been paid to the case of nonnegative matrices, i.e., matrices with nonnegative entries [14, 12, 23]. The problem of computing or estimating the Lyapunov exponent is known to be extremely hard. It is even algorithmically undecidable in general . Nevertheless, there are several methods for approximate computation of the Lyapunov exponent that work well in most of practical cases. These are methods for special families of matrices arising in applications [7, 21, 16], for general nonnegative matrices [17, 10], and for general families [4, 5, 1]. The method proposed in this paper for nonnegative matrices has several advantages compared with those previously known: 1) it produces upper and lower bounds for the Lyapunov exponent that both converge to with a linear rate as the number of iterations grows. So, the Lyapunov exponent is sandwiched between two values. The rate of convergence is estimated theoretically, but in practice, as we see in numerical examples, it converges much faster. 2) The method works equally well for high dimensions. In examples with randomly generated matrices of dimension it computes with a relative error less than within a few iterations. 3) We relax the assumptions on nonnegative matrices imposed in the previous papers on the subject.
The most popular upper bound used in the literature is
where is the set of all products of matrices of length (with the corresponding probabilities). For any norm this bound converges to as . Usually one takes the Euclidean norm . As for the lower bounds for nonnegative matrices, the most well-known of them comes from the results of Key , which uses the same formula, but with an arbitrary submultiplicative functional instead of the norm. We shall see that our bounds are closer to the real value of and have a guaranteed rate of convergence as . The theoretical reasons for that are the following: 1) In our bounds, we manage to interchange the Expectation- and Maximum-operations, which results in a smaller upper bound and a larger lower bound. 2) We do not restrict ourself to an a priori fixed functional (or norm), but rather we show how to optimize it over a large family of functionals.
In Section V we extend this approach to general matrices, without the nonnegativity assumption, and obtain an upper bound that is better than (2). We also prove that such a lower bound for general matrices does not exist.
2 Operators with a common invariant cone: Bounds for the Lyapunov exponent
Assume that all operators share a common invariant cone , which is supposed to be convex, closed, solid, pointed, and having its apex at the origin. For any points we write if and if . For an operator we write if it leaves the cone invariant. The same notation are used for the dual cone .
Consider a functional . In the sequel we impose the following assumptions on :
1) is positive, i.e., , whenever ;
2) is homogeneous, i.e., for any .
Now we define two values and for any family and for any functional :
We use the short notation , and the same with , if the functional and the family are fixed. Denote also . Thus,
and similarly with . Thus, is the smallest expected value of the logarithm of the ratio over all . Let us first make the following simple observation:
For every natural and we have
which completes the proof for . The proof for is the same. ∎
For an arbitrary functional and for every we have and .
For an arbitrary functional and for any we have . In particular, .
It suffices to prove that . Then applying this inequality to the family and taking into account that one obtains .
By the compactness argument it follows that there are positive constants such that . Actually, these constants are respectively the minimum and the maximum of on the intersection of the unit sphere with the cone . Applying Corollary 1, we obtain for every
Since and as , we see that . On the other hand, the same Corollary 1 implies that
for every such that . Furthermore, for each there is a positive constant such that for every operator leaving the cone invariant, we have (see, for instance, ). Therefore, , and hence
By Fekete’s lemma  for any sequence of nonnegative numbers such that , the limit exists and equals to . Similarly, if , then . Applying Lemma 1 we see that (denote this limit by ), and (denote this limit by ). Invoking now Lemma 2 we obtain the following
For every family and for every functional we have
Thus, to approximate the Lyapunov exponent one can take an arbitrary functional and get the values and as a lower and upper bound respectively. Iterating, one obtains the bounds and , which, by Corollary 1, are, at least, not worse. If the two inner inequalities in (3) become equalities, then and converge from different sides to , which allows us to compute with an arbitrary prescribed accuracy. Sufficient conditions for that are given in Theorem 1 below. Finally, in the ideal case, when , all the inequalities in (3) become equalities. In this case we get a sharp value of immediately, just by evaluating . Such “ideal” functionals are called invariant.
A functional is called invariant for a family if
Thus, is invariant if and only if . In view of Lemma 2 both these values equal to .
For any invariant functional the constant in Definition 1 is equal to the Lyapunov exponent of .
Certainly, invariant functionals do not exist for all families that have invariant cones. For nonnegative matrices sufficient conditions were obtained in , we shall cite that result in Theorem A (Section IV). However, even if an invariant functional exists, it may be very difficult to find or to approximate. Nevertheless, as the following theorem says, the very existence of an invariant functional guarantees that for an arbitrary functional the values and both converge to with the linear rate.
For an arbitrary family and for every functional we have .
If, in addition, there is an invariant functional for this family, then for every functional we have . In this case
where the constant depends only on and on .
There are positive constants such that . Taking the operator norm and using the fact that the mean of maxima is bigger than or equal to the maximum of means, we obtain for every
Taking limit as , we get . Comparing with (3) we see that .
Let be an invariant functional for . By the compactness argument, for an arbitrary functional on there are positive constants such that . Therefore,
In the same way we show that , and hence
Taking limit as , we obtain , which completes the proof.∎
Thus, for every functional we have . If the family possesses an invariant functional on the cone , then for every functional the values and converge from two sides to the Lyapunov exponent , and the distance between them decays as . This provides a theoretical opportunity to compute the Lyapunov exponent with a given precision, using an arbitrary functional . To realize this idea we need to compute the values and for large . Each computation actually requires the resolution of an optimization problem, for which one needs to find a global optimum of the function on the cone . Therefore, the functional should be chosen in a special way, to obtain the objective function convenient for global minimizing/maximizing. In the next section we define two families of functionals (each depending on one -dimensional parameter), and then, in Section IV, we apply them for the cone (i.e., for the case of nonnegative matrices). Those functionals will allow us not only to evaluate the lower and upper bounds for , but also to optimize these bounds over all values of the parameters. This leads to a fast algorithm for computing the Lyapunov exponent of nonnegative matrices (Section IV). Even in relatively high dimensions (up to ) that algorithm computes the Lyapunov exponent with a good precision (the relative error is less than ). The corresponding numerical examples from applications and some results with randomly generated matrices are given in Section VI. Then, making use of the semidefinite lifting technique, we partially extend our technique to general matrices, without the nonnegativity assumption. Applying a special functional on the cone of positive semidefinite matrices, we obtain an upper bound for , which is, at least, not worse than the usual upper bound (2) with the Euclidean norm (Section V). In practice it is much more efficient, which is confirmed by numerical examples in Section VI. As for effective lower bounds, it is shown in Section V that they actually do not exist for general matrices. This explains well-known negative theoretical results on the Lyapunov exponent computation .
We have seen that for every functional the value actually coincides with the Lyapunov exponent. However, for this is, in general, not the case, unless the family possesses an invariant functional. The main problem, therefore, is the lower bound for the Lyapunov exponent. In Section V we shall see examples of matrix families that have no functionals such that as . That is why the existence of an invariant functional is crucial for deriving lower bounds that converge to the Lyapunov exponent.
3 Two special functionals
In this section we define two families of functionals , which then will be applied to compute the Lyapunov exponent of nonnegative matrices.
Let be an arbitrary finite family of matrices sharing an invariant cone . For every point consider the functional defined on as follows:
Geometrically, this functional is a norm on K, whose unit ball is . If then , therefore for any operator , and hence . Consequently, for this functional we have
Let us denote
Similarly we define and . Applying now Lemma 2, we conclude that for each and , and therefore .
To obtain a lower estimate for we take arbitrary and consider the linear functional . Again, this functional is a norm on whose unit ball is the intersection of with a half-space. For this functional we have
Similarly we define and . Lemma 2 now implies that for every and .
Thus, we have the following bounds for the Lyapunov exponent of a matrix possessing an invariant cone:
In general, they are not easy to compute. For instance, to evaluate we need to find the global minimum over of the function
This function is not convex in , it is actually quasiconcave, and hence its minimization may be hard. Nevertheless, we shall see in Section IV that in case the value is not only computable, but can be efficiently optimized over all , and the same is for . If we work with a general cone , then it is more convenient to apply the linear functional to get not a lower bound (as ), but the upper one. Doing so, we write for the functional and obtain
The objective function is concave, and hence its maximum on the convex set can be efficiently found. The same can be done for every :
The shortcoming of this estimate is that it is very hard to minimize over the set even in the case . Nevertheless, choosing appropriate one can obtain good upper bounds that converge fast to as . We use this bound in Section V for approximating Lyapunov exponents of general sets of matrices.
4 The Lyapunov exponent of nonnegative matrices
We are going to see that in case , i.e., when all the operators are written by nonnegative matrices, both estimates and are efficiently computable. We only show here how to evaluate and , since the computation of and is the same with replacing the family of matrices by the family of all their products of length .
We begin with . Let us first note that . Therefore,
Changing the variables, , we get
Interchanging and we write
Observe that the function is convex in . The maximum of convex functions is convex. Therefore, the value is a solution of the following convex minimization problem:
Let us now compute . We have
Thus, is the minimal value of the concave function on the simplex . This minimal value is attained at an extreme point, i.e., at a vertex of the simplex. Since its vertices are the vectors of the canonical basis, we have
Since and , where is the th column of the matrix , we obtain
For any and for any the set of solutions of the inequality
coincides with the set of solutions of the inequality
which is convex, because the function is concave in . Hence, for any the function is quasiconcave in , and therefore is quasiconcave as well, as a minimum of quasiconcave functions. Thus, is the solution of the following quasiconcave maximization problem:
Let us remember that for each the values and are defined by formulas (9) and (11) multiplied by and with replacing by . Similarly for the values and . Thus, for every vectors we have the following inequality:
4.1 Two conditions for nonnegative matrices
The bounds and are derived for all families of nonnegative matrices, and inequality (12) always holds. The question, however, is do they provide really effective estimations for the Lyapunov exponent, i.e., do they converge to as ? It appears that the answer is affirmative, whenever the matrices are not “too sparse”. More precisely, the family satisfies the following two assumptions:
(a) there is at least one strictly positive product of matrices from (with repetitions permitted);
(b) matrices from do not have zero rows nor zero columns.
Note that conditions (a) and (b) are assumed in most of papers studying random products of nonnegative matrices (see [26, 14, 12, 17, 23]). We shall see that condition (a) can always be omitted, but the situation with condition (b) is more complicated.
Let us recall that (Theorem 1), if there is an invariant functional for the family , then for any other functional the bounds and converge linearly to the Lyapunov exponent.
Theorem A . For every family of nonnegative matrices satisfying conditions (a) and (b) there exists an invariant functional on . This functional is, moreover, concave and monotone on the cone .
Applying now Theorem 1 to the functionals and , we arrive at
For every family of nonnegative matrices satisfying (a) and (b), and for every vectors we have and
where the constant depends on and .
The constant can be effectively estimated by entries of the matrices , see .
Since and , we see that the estimates and tend to as well, and . In general, there is no need to evaluate the optimal values in problems (9) and (11) with a good precision. To approximate it suffices to find points and for which the difference is small, say, less than , then the Lyapunov exponent is found with the precision . As we shall see in numerical examples, in practice the value decays much faster than , and it is enough to take a reasonably small (much smaller than ) to compute the Lyapunov exponent with the precision .
Estimates and can be extended to any set of matrices sharing a polyhedral invariant cone . If the cone is spanned by vectors and its dual cone is spanned by vectors , then writing formula for we replace by , and writing formula for we replace by . Here it is important that both sets of vectors are finite, i.e., that the cone is polyhedral. As we shall see in Theorem 4, for families of matrices with a non-polyhedral invariant cone an effective lower bound for may not exist at all. In particular, is not such a bound any more.
4.2 Omitting condition (a)
The lower and upper bounds and respectively can be computed for every family of nonnegative matrices. However, to prove the convergence of these bounds to we essentially used conditions (a) and (b), because Theorem A may fail without them. Moreover, there are simple examples showing that if at least one of the conditions (a) or (b) is not fulfilled, then the difference may not vanish as , in which case our bounds do not provide the Lyapunov exponent computation with a given prescribed accuracy. For every family condition (b) can be, of course, checked immediately. Condition (a) looks more difficult to verify. Nevertheless is can be checked efficiently as well. The corresponding algorithm takes arithmetic operations, where is the dimension, and is the number of matrices . Thus, if both conditions (a) and (b) are satisfied, then we apply Theorem 2 to compute the Lyapunov exponent . Otherwise, if at least one of them fails, one can still use inequality (12), but now there is no guarantee that both parts converge to as . For some families this still gives good numerical estimates for (as for the binomial matrices from Subsection VI.1 below). However, there are examples, when and are too far from each other for all , and these bounds become useless. A question arises: is it possible to obtain effective upper and lower bounds for the Lyapunov exponent (perhaps, different from and ) without those two conditions? We do not know the answer for condition (b). Is it true that if nonnegative matrices are allowed to have zero rows and columns, then the Lyapunov exponent can be sandwiched between two efficiently computable bounds, whose difference tend to zero?
As for condition (a), the answer is affirmative. That condition can be omitted, with a special modification of the upper bound . In this subsection we extend our approach to this case, when matrices of the family do not necessarily have a positive product. To begin with, we need the following key result proved in :
Theorem B . If a family of nonnegative matrices satisfies condition (b), but do not have a positive product, then one of the two following cases takes place:
(1) is reducible;
(2) there is a partition of the set to sets , on which every matrix from acts as a permutation.
In the latter case there exists a product of matrices from that has a block-diagonal form: strictly positive blocks corresponding to the sets .
Property (1) means that there is a nontrivial subspace of spanned by several basis vectors, that is invariant for all matrices from . Property (2) means that for every matrix there is a permutation of the set such that , where is a cone spanned by the vectors . To compute the Lyapunov exponent for a family not satisfying condition (a) one needs to consider both cases of Theorem B.
Case 1. The family is reducible. In this case there is a permutation of basis vectors, after which all matrices from take a block upper-triangular form. This permutation can be found by a combinatorial algorithm that takes arithmetic operations (see, for instance, [15, Lemma 3.1] and references therein). Now it remains to refer to the main result of the work : for a family of block upper-triangular matrices the Lyapunov exponent equals to the largest Lyapunov exponent of the blocks. Hence, in case (1) the problem of computing the Lyapunov exponent is reduced to several analogous problems in smaller dimensions.
Case 2. There is a partition of the set , on which every matrix from acts as a permutation. We first show that in this case, we can restrict our attention to the set . This is a union of faces of the positive orthant , corresponding to the sets of the partition. Clearly, for each . Consider an arbitrary functional on the set , which is positive () and homogeneous (). For this functional we define in the same way as in Section II. Then we establish the following analogue of Lemma 2:
If a family is irreducible, then for an arbitrary functional on and for each we have . In particular, .
The proof is literally the same as the proof of Lemma 2 with only one exception: in order to prove that we need to show that there exists a vector , for which
In the proof of Lemma 2 we established the existence of such a point in the cone , now we need this point in the set . To this end we apply the main result of the work : if a family is irreducible and its matrices have no zero columns and rows, then every nonzero vector satisfies (13). Thus, an arbitrary suffices. The remainder of the proof is the same as for Lemma 2.∎
From [24, theorem 4] it follows that for a family of matrices satisfying assumptions of the case (2) of Theorem B there exists an invariant functional on , for which . Now, precisely as in the proof of Theorem 1, we conclude that for every functional on one has
where the constant depends only on and on . To estimate the Lyapunov exponent it remains only to choose any convenient functional in order to compute the values and .
For we again choose with arbitrary . It appears that for this functional we again have the equality , where is defined by (10). This is not obvious, because now we take minimum not over the whole set , but over a much narrower set . We have
Since the minimum of a concave function on the simplex is attained at an extreme point, i.e., at a basis vector, we have
For we again take the functional , where . For every we have , where is our permutation. Whence,
This value will be denoted by . Interchanging and and writing , we obtain