Analysis of Krylov subspace approximation to
Large Scale Differential Riccati Equations
We consider a Krylov subspace approximation method for the symmetric differential Riccati equation , . The method we consider is based on projecting the large scale equation onto a Krylov subspace spanned by the matrix and the low rank factors of and . We prove that the method is structure preserving in the sense that it preserves two important properties of the exact flow, namely the positivity of the exact flow, and also the property of monotonicity. We also provide a theoretical a priori error analysis which shows a superlinear convergence of the method. This behavior is illustrated in the numerical experiments. Moreover, we derive an efficient a posteriori error estimate as well as discuss multiple time stepping combined with a cut of the rank of the numerical solution.
Key words. Differential Riccati equations, LQR optimal control problems, large scale ordinary differential equations, Krylov subspace methods, matrix exponential, exponential integrators, model order reduction, low rank approximation.
AMS subject classifications. 65F10, 65F60, 65L20, 65M22, 93A15, 93C05
Large scale differential Riccati equations (DREs) arise in the numerical treatment of optimal control problems governed by partial differential equations. This is the case in particular when solving a linear quadratic regulator problem (LQR), a widely studied problem in control theory. We shortly describe the finite dimensional LQR problem. For more details, we refer to [1, 9]. The differential Riccati equation arises in the finite horizon case, i.e., when a finite time integral cost functional is considered. The functional has then the quadratic form
where , () and (). The coefficient matrix of the penalizing term is symmetric, nonnegative and has a low rank. The functional (LABEL:eq:quad_cost) is constrained by the system of differential equations
where the matrix is sparse and . The number of columns of correspond to the number of controls and matrix represent an observation matrix. Under suitable conditions (see [1, 9]), the optimal control minimizing the functional (LABEL:eq:quad_cost) is given by
is the unique solution of
As a result, the central computational problem becomes that of solving the final value problem (LABEL:eq:optimal_DRE) which, with a careful change of variables, can be written as a initial value problem.
We consider a Krylov subspace approximation method for large scale differential Riccati equations of the form (LABEL:eq:optimal_DRE). A similar projection method for DREs has been recently proposed in . Our approach differs from that of  in the fact that the initial value matrix of (LABEL:eq:optimal_DRE) is contained in the Krylov subspace. This allows multiple time stepping. Our approach is also closely related to projection techniques considered for large scale algebraic Riccati equations [27, 33].
Essentially, the method we consider is based on projecting the matrices and on an appropriate Krylov subspace, namely on the block Krylov subspace spanned by and the low rank factors of and . The projected small dimensional system is then solved using existing linearization techniques. We show that when using a Padé approximant to solve the linearized small dimensionals system the total approximation will be structure preserving in the sense that the property of the positivity is preserved and also under certain conditions the property of monotonicity.
The linearization approach for DREs is a well-known method. This allows an efficient integration for dense problems, see e.g. . Another approach, the so called Davison–Maki method , uses the fundamental solution of the linearized system. A modified variant, avoiding some numerical instabilities, is proposed in . However, the application of these methods for large scale problems is impossible due to the high dimensionality of the initial value in the linearized differential equation.
The problem of solving large scale DREs has received recently considerable attention. In [5, 4] the authors proposed efficient BDF and Rosenbrock methods for solving DREs capable of exploiting several of the above described properties: sparsity of , low rank structure of , and , and the symmetry of the solution. However, several difficulties arise when approximating the optimal control (LABEL:eq:control) in the large scale setting. One difficulty is to evaluate the state equation and Riccati equation in the same mesh. In  a refined ADI integration method is proposed which addresses the high storage requirements of large scale DRE integrators. In recent studies an efficient splitting method  and adaptive high-order splitting schemes  for large scale DREs have been proposed.
Our Krylov subspace approach is also strongly related to Krylov subspace techniques used for approximation of the product of an matrix function and a vector, , and to exponential integrators . For an introduction to matrix functions we refer to the monograph . The effectiveness of these Krylov subspace techniques come from the fact that generating Krylov subspaces is mostly based on operations of the form , which are cheap for sparse . The Krylov subspace approximation of products of the form has recently been an active topic of research, and we mention the work on classical Krylov subspaces [14, 15, 23, 29], extended Krylov subspaces , and rational Krylov subspaces [39, 3].
The paper is organized as follows. In Section 2 we describe some preliminaries. Then, in Section 3, the structure preserving method is proposed. In Section 4, the error analysis first for the differential Lyapunov equation (a simplified version of the DRE), and then for the DRE is presented. In Section 5 a posteriori error estimation is described. In Section 6 the rank cut and multiple time stepping are discussed. Numerical examples and conclusions of Sections 7 and 8 conclude the article.
Notation and definitions
Throughout the article will denote the Euclidean norm, or its induced matrix norm, i.e., the spectral norm. By will denote the column space of a matrix . We say that a matrix is nonnegative if it is symmetric positive semidefinite, and write . For symmetric matrices and we write if .
We will repeatedly use the notion of the logarithmic norm of a matrix . It can be defined via the field of values , which is defined as
Then, the logarithmic norm of is defined by
We will also repeatedly use the exponential like function defined by
From now on we consider the time invariant symmetric differential Riccati equation (DRE) written in the form
where and , , . Specifically, we focus on the low rank positive semidefinite case, where
for some and , . Notice that we changed here from to to and from now on is tall and skinny instead of short and fat as in (LABEL:eq:optimal_DRE).
We recall a fact that will be needed later on (see e.g. [1, Thm. 3.1.1.]).
Lemma 1 (Associated linear system)
The DRE (LABEL:eq:DRE) is equivalent to solving the linear system of differential equations
where . If the solution exists on the interval , then the solution of (LABEL:eq:hamdiff) exists, is invertible on , and
Notice also that the matrix is Hamiltonian, i.e., it holds that
This linearization approach for the differential Riccati equation is a standard method for solving finite dimensional DREs, and leads to efficient integration methods for dense problems, see e.g. .
2.2 Integral representation of the exact solution
For the exact solution of (LABEL:eq:DRE) we have the following integral representation (see also [24, Thm. 8]).
Theorem 2 (Exact solution of the DRE)
The exact solution of the DRE (LABEL:eq:DRE) is given by
Proof. Can be verified by elementary differentiation.
2.3 Positivity and monotonicity of the exact flow
We recall two important properties of the symmetric DRE, namely the positivity of the exact solution (see e.g. [12, Prop. 1.1]) and the monotonicity of the solution with relative to the initial data (see e.g. [13, Thm. 2]). By these we mean the following.
Theorem 3 (Positivity and monotonicity of the solution)
For the solution of the symmetric DRE (LABEL:eq:DRE) it holds:
(Positivity) is symmetric positive semidefinite and it exists for all .
(Monotonicity) Consider two symmetric DREs of the (LABEL:eq:DRE) corresponding to the linearized systems of the form (LABEL:eq:hamdiff) with the coefficient matrices
and let be the skew-symmetric matrix (LABEL:eq:Ham_property). Then, if , and if , then for every : .
We will later show that our proposed numerical method preserves the properties of Theorem LABEL:thm:positivity_and_monotonicity.
2.4 Bound for the solution
Using the positivity property of (Thm. LABEL:thm:positivity_and_monotonicity) we obtain the following bound for the norm of the solution. This will be repeatedly needed in the analysis of the proposed method.
Lemma 4 (Bound for the exact solution)
For the solution of (LABEL:eq:DRE) it holds
Proof. Since , and are all symmetric positive definite, we see that the first two terms on the right hand side of (LABEL:eq:exact_solution) are symmetric positive semidefinite and the third term is symmetric negative semidefinite. Moreover, since is symmetric positive definite by Theorem LABEL:thm:positivity_and_monotonicity, and since for every symmetric positive definite matrix it holds that , we see that
Using the well-known bound (see e.g. ), the fact that and that , the claim follows.
From Lemma LABEL:lem:bound_exact we immediately get the following corollary.
The solution satisfies
3 A Krylov subspace method and its structure preserving properties
Given an orthogonal basis matrix , we first show how to carry out the projection of the system (LABEL:eq:DRE) using the Galerkin condition. The choice of the matrix is then motivated by a moment-matching approach. This means that we Taylor expand the exact solution with respect to time variable and inspect which subspaces are sufficient to approximate the expansion.
The fact that needs to contain certain Krylov subspaces in order to obtain a polynomial approximation of can alternatively be seen from the point of view of Krylov subspace approximation of the matrix exponential. This is strongly related to the approach taken by Saad already in  for the algebraic Lyapunov equation. This approach will give some auxiliary results that are needed in the convergence analysis of the method.
3.1 Approximation of the solution in Krylov subspaces
Our strategy is to approximate the symmetric positive semidefinite solution by a low rank matrix
orthogonal. We first describe how to obtain the numerical approximation (LABEL:eq:X_low_rank) by using the Galerkin condition, given an orthonormal matrix . An analogous approach for the algebraic Riccati equation can be found in .
Denote the right hand of the DRE (LABEL:eq:DRE) as
Then, the Galerkin condition
directly gives the projected differential equation
Our choice of the subspace is guided by the following lemma.
Let satisfy the DRE (LABEL:eq:DRE) and let . Then, there exist matrices and such that
Proof. The proof goes by induction. The claim clearly holds for since . Suppose the claim holds for alll , . Then, in particular, for some matrices and such that . Then, from the DRE (LABEL:eq:DRE) we infer that
Clearly, for each of the terms on the right hand side of (LABEL:eq:lin_comb) there exists matrices and such that , and that the term is of the form for some matrix . This holds clearly also for their linear combination . The claim follows then from the fact that is a symmetric matrix.
Lemma LABEL:lem:Taylor_polynomial directly gives the following corollary.
Let satisfy the DRE (LABEL:eq:DRE). Let and consider the Taylor polynomial
Then, there exist matrices and such that
3.2 Block Krylov subspace approximation of the matrix exponential
Block Krylov subspace methods are based on the idea of projecting a high dimensional problem involving a matrix and a block matrix onto a lower dimensional subspace, a block Krylov subspace , which is defined by
Usually, an orthogonal basis matrix for is generated using an Arnoldi type iteration, and this matrix is then used for the projections. There exist several Arnoldi type methods to produce an orthogonal basis matrix for , and we choose here the block Arnoldi iteration given in  which is listed algorithmically as follows.
Input: , and number of iterations .
Start. Compute QR decomposition: .
Iterate. for compute:
As usual, the orthogonalization can be carried out at step 3 in a modified Gram–Schmidt manner and reorthogonalization can be performed if needed.
This algorithm gives an orthogonal basis matrix for the block Krylov subspace and the projected block Hessenberg matrix
This means that the -block of is given by in the above algorithm. Moreover, the following Arnoldi relation holds:
If has its field of values on a line, e.g., is Hermitian or skew-Hermitian, then there exists such that is Hermitian. By (LABEL:eq:block_Hessenberg) this implies that is block tridiagonal. Then, the orthogonalization in the above algorithm has to be done only against two previous blocks, and the second line of step 3 can be replaced by
The resulting algorithm is called the block Lanczos iteration.
Since , we have the following result for the block Krylov approximation of matrix polynomials.
For all polynomials of degree the following holds:
Proof. The proof goes analogously to the proof of [29, Lemma 3.1], where is a vector.
Lemma LABEL:lem:polynomials motivates to carry out the approximation of the product of the matrix exponential and a block matrix as
Since the columns of are orthonormal, we have
and from this it follows that . Clearly, it also holds that . From these bounds and from Lemma LABEL:lem:polynomials we get the following result.
For the approximation (LABEL:eq:exp_approx) holds the following bound:
Proof. The proof goes analogously to the proof of [15, Thm 2.1], where is a vector.
Lemma LABEL:lem:exp_error will be needed repeatedly in the upcoming error analysis.
3.3 The method
Motivated by Corollary LABEL:cor:Taylor_polynomial, we approximate in the block Krylov subspace . To this end, an orthogonal basis matrix for is generated using the block Arnoldi iteration. Then, led by the Galerkin condition (equation (LABEL:eq:small_equation)), we carry out the approximation as listed in Algorithm LABEL:Alg:main.
the orthogonal basis matrix of
the block Hessenberg matrix
the matrices and
3.4 Solving the small dimensional system
To solve the small dimensional system (LABEL:eq:riccati_small_system) we use the modified Davison–Maki method which is presented in . This method is chosen because of its structure preservation properties which are shown in the next subsection. The method and can be described as follows.
By Lemma LABEL:lem:linearization, the solution of the projected system (LABEL:eq:riccati_small_system) is given by
Instead of directly evaluating by (LABEL:eq:exp_H_k), which was the idea of the original Davison–Maki method , we perform substepping in order to avoid numerical instabilities arising from the inversion of the matrix in (LABEL:eq:exp_H_k). This is exactly the modified Davison–Maki method, and it is enlisted in the following pseudocode.
Input: Hamiltonian matrix , ,
time , substep size , .
Iterate. for :
For computing the matrix exponential in Step 3, we use the 13 order diagonal Padé approximant
which is implemented in Matlab as ’expm’ command .
3.5 Structure preserving properties of the approximation
We next inspect the two properties stated in Theorem LABEL:thm:positivity_and_monotonicity. We show that the proposed projection method preserves the property of the positivity of the exact flow, and it also preserves the property of monotonicity under the condition that the matrix used for the projection stays constant when the initial data for the DRE is changed. Notice that these results are not restricted to polynomial Krylov subspace methods.
The numerical approximation given by Algorithm LABEL:Alg:main preserves the property of positivity stated in Theorem LABEL:thm:positivity_and_monotonicity.
Proof. The projected coefficient matrices , and the initial value of the small system (LABEL:eq:riccati_small_system) are clearly all symmetric nonnegative. Thus the small system (LABEL:eq:riccati_small_system) is a symmetric DRE. By Theorem 3.1 of , an application of a symplectic Runge–Kutta scheme with positive weights (see  for details) gives as a result a symmetric nonnegative solution. As the th order diagonal Padé approximant equals the stability function of the -stage Gauss–Legendre method (see e.g. [21, p. 46]), all the substeps of the modified Davison–Maki method (Subsection LABEL:subsec:scaling_and_squaring) give symmetric nonnegative matrices and as a result is symmetric nonnegative as well as .
The numerical approximation given by Algorithm LABEL:Alg:main preserves the property of monotonicity in the following sense. Consider two DREs corresponding to linearizations with the coefficient matrices
Suppose both systems are projected using the same orthogonal matrix , giving as a result small -dimensional systems of the form (LABEL:eq:riccati_small_system) for the matrices and . Then, for the matrices and we have
Proof. Consider the projected systems of the form (LABEL:eq:riccati_small_system) corresponding to and with the projected coefficient matrices , and , and , and , respectively. Consider also the corresponding linearizations of the form (LABEL:eq:linearizations) with the Hamiltonian matrices and .
By the reasoning of the proof of Theorem LABEL:thm:numerical_positivity, the projected systems corresponding to and are symmetric DREs. Moreover, from (LABEL:eq:Ham_ineq) it clearly follows that , where , and also that . By Theorem 6 of , an application of a symplectic Runge–Kutta scheme with positive weights (see  for details) preserves the monotonicity. Thus the Padé approximants of the substeps of the modified Davison–Maki method (Subsection LABEL:subsec:scaling_and_squaring) preserve the monotonicity. Thus, and as a consequence .
As the basis matrix given by Algorithm LABEL:Alg:main is independent of the matrix in the DRE (LABEL:eq:DRE), where is the control matrix in the original linear system (LABEL:eq:lin_op), we see that Algorithm LABEL:Alg:main preserves monotonicity under modifications of . This is important as the property of monotonicity means the ability to modify the costs and the optimal control through modifications in the input data (see ).
4 A priori error analysis
We first consider the approximation of the DRE without the quadratic term , i.e., we consider the differential Lyapunov equation. This clarifies the presentation as the derived bounds will be later needed when considering the approximation of the differential Riccati equation. We note, however, that the bounds for the Lyapunov equation are applicable outside of the scope of the optimal control problems, e.g., when considering time integration of a two dimensional inhomogeneous heat equation.
4.1 Error analysis for the Lyapunov equation
Consider the symmetric Lyapunov differential equation with low rank initial data and constant low rank inhomogeneity,
where and , . Then, the approximation is given by , where is a solution of the projected system (LABEL:eq:riccati_small_system) with . For the error of this approximation we obtain the following bound.
Let , , , and let be the solution of (LABEL:eq:Lyapunov). Let be an orthogonal basis of the block Krylov subspace . Let be the solution of the projected system (LABEL:eq:riccati_small_system) with , and let . Then,
Proof. Using the integral representation of Theorem LABEL:thm:exact_solution for both and , we see that
Adding and substracting to the right hand side of (LABEL:eq:err1) gives
Using Lemma LABEL:lem:exp_error to bound the norm of , and using the fact that and that , gives
Then, similarly, adding and substracting the term to (LABEL:eq:err2) and applying Lemma LABEL:lem:exp_error shows that
which shows the claim.
4.2 Refined error bounds for the Lyapunov equation
Although Theorem LABEL:thm:Lyapunov shows the superlinear convergence speed for the approximation of the Lyapunov equation (LABEL:eq:Lyapunov), sharper bounds can be obtained, e.g., by using the bounds given in . As an example we consider the following. If is symmetric negative semi-definite with its spectrum inside the interval , and is an orthonormal basis matrix for the block Krylov subspace , we have (see [19, Thm. 2]) for the error the bound
Using (LABEL:eq:HL_bounds) and following the proof of Theorem LABEL:thm:Lyapunov, we get the following bound for the case of a symmetric negative semidefinite .
Let , and define the Lyapunov equation LABEL:eq:Lyapunov. Let be an orthogonal basis matrix of the subspace . Let be the solution of the projected (using ) system (LABEL:eq:riccati_small_system) with , and let . Then, for the error it holds that
The bound (LABEL:eq:symm_refined_bound) can be illustrated with the following simple numerical example. Let be the tridiagonal matrix , , and let and be random vectors. Figure LABEL:fig:bound1 shows the convergence of the algorithm vs. the a priori bound (LABEL:eq:symm_refined_bound).
4.3 Error for the approximation of the Riccati equation
Here, we state our main theorem which shows the superlinear convergence property of Algorithm LABEL:Alg:main when applied to the DRE (LABEL:eq:DRE). Its proof, which is essentially based on Lemma LABEL:lem:exp_error and Grönwall’s lemma, is lengthy and is left to the appendix.
First, however, we state a bound for the norm of the numerical solution which will be needed in the proof of the main theorem.
Suppose, , and that is symmetric nonnegative. Then, is symmetric nonnegative, and satisfies the bound
Proof. As , and are symmetric and nonnegative, we see from (LABEL:eq:riccati_small_system) that so are the orthogonally projected matrices , and . Thus, the projected system is a symmetric DRE. Applying Lemma LABEL:lem:bound_exact to the projected system, and using the facts that , and , shows the claim.
From Lemma LABEL:lem:bound_X_k we immediately get the following bound.
The numerical solution satisfies
We are now ready to state an error bound for the DRE. The proof is left to the appendix.
Let , , and defined the DRE (LABEL:eq:DRE). Let be the numerical solution given by Algorithm LABEL:Alg:main. Then, the following bound holds:
5 A posteriori error estimation
We consider next an a posteriori error estimation of the method by using ideas presented in .
Denote the original DRE (LABEL:eq:DRE) as
Using the residual matrix we derive computable error estimates. These derivations are based on the following lemma.
The error satisfies the equation
Proof. We see that the error satisfies the ODE
with the initial value . Applying the variation-of-constants formula to (LABEL:eq:E_k_ode) gives (LABEL:eq:apost_representation).
Next we show the representation (LABEL:eq:R_k_representation). Since