Interpolation of inverse operators for preconditioning parameter-dependent equations This work was supported by the French National Research Agency (Grant ANR CHORUS MONU-0005)
We propose a method for the construction of preconditioners of parameter-dependent matrices for the solution of large systems of parameter-dependent equations. The proposed method is an interpolation of the matrix inverse based on a projection of the identity matrix with respect to the Frobenius norm. Approximations of the Frobenius norm using random matrices are introduced in order to handle large matrices. The resulting statistical estimators of the Frobenius norm yield quasi-optimal projections that are controlled with high probability. Strategies for the adaptive selection of interpolation points are then proposed for different objectives in the context of projection-based model order reduction methods: the improvement of residual-based error estimators, the improvement of the projection on a given reduced approximation space, or the re-use of computations for sampling based model order reduction methods.
This paper is concerned with the solution of large systems of parameter-dependent equations of the form
where takes values in some parameter set . Such problems occur in several contexts such as parametric analyses, optimization, control or uncertainty quantification, where are random variables that parametrize model or data uncertainties. The efficient solution of equation generally requires the construction of preconditioners for the operator , either for improving the performance of iterative solvers or for improving the quality of residual-based projection methods.
A basic preconditioner can be defined as the inverse (or any preconditioner) of the matrix at some nominal parameter value or as the inverse (or any preconditioner) of a mean value of over (see e.g. ). When the operator only slightly varies over the parameter set , these parameter-independent preconditioners behave relatively well. However, for large variabilities, they are not able to provide a good preconditioning over the whole parameter set . A first attempt to construct a parameter-dependent preconditioner can be found in , where the authors compute through quadrature a polynomial expansion of the parameter-dependent factors of a LU factorization of . More recently, a linear Lagrangian interpolation of the matrix inverse has been proposed in . The generalization to any standard multivariate interpolation method is straightforward. However, standard approximation or interpolation methods require the evaluation of matrix inverses (or factorizations) for many instances of on a prescribed structured grid (quadrature or interpolation), that becomes prohibitive for large matrices and high dimensional parametric problems.
In this paper, we propose an interpolation method for the inverse of matrix . The interpolation is obtained by a projection of the inverse matrix on a linear span of samples of and takes the form
where are arbitrary interpolation points in . A natural interpolation could be obtained by minimizing the condition number of over the , which is a Clarke regular strongly pseudoconvex optimization problem . However, the solution of this non standard optimization problem for many instances of is intractable and proposing an efficient solution method in a multi-query context remains a challenging issue. Here, the projection is defined as the minimizer of the Frobenius norm of , that is a quadratic optimization problem. Approximations of the Frobenius norm using random matrices are introduced in order to handle large matrices. These statistical estimations of the Frobenius norm allow to obtain quasi-optimal projections that are controlled with high probability. Since we are interested in large matrices, are here considered as implicit matrices for which only efficient matrix-vector multiplications are available. Typically, a factorization (e.g. LU) of is computed and stored. Note that when the storage of factorizations of several samples of the operator is unaffordable or when efficient preconditioners are readily available, one could similarly consider projections of the inverse operator on the linear span of preconditioners of samples of the operator. However, the resulting parameter-dependent preconditioner would be no more an interpolation of preconditioners. This straightforward extension of the proposed method is not analyzed in the present paper.
The paper then presents several contributions in the context of projection-based model order reduction methods (e.g. Reduced Basis, Proper Orthogonal Decomposition (POD), Proper Generalized Decompositon) that rely on the projection of the solution of on a low-dimensional approximation space. We first show how the proposed preconditioner can be used to define a Galerkin projection-based on the preconditioned residual, which can be interpreted as a Petrov-Galerkin projection of the solution with a parameter-dependent test space. Then, we propose adaptive construction of the preconditioner, based on an adaptive selection of interpolation points, for different objectives: (i) the improvement of error estimators based on preconditioned residuals, (ii) the improvement of the quality of projections on a given low-dimensional approximation space, or (iii) the re-use of computations for sample-based model order reduction methods. Starting from a -point interpolation, these adaptive strategies consist in choosing a new interpolation point based on different criteria. In (i), the new point is selected for minimizing the distance between the identity and the preconditioned operator. In (ii), it is selected for improving the quasi-optimality constant of Petrov-Galerkin projections which measures how far the projection is from the best approximation on the reduced approximation space. In (iii), the new interpolation point is selected as a new sample determined for the approximation of the solution and not of the operator. The interest of the latter approach is that when direct solvers are used to solve equation at some sample points, the corresponding factorizations of the matrix can be stored and the preconditioner can be computed with a negligible additional cost.
The paper is organized as follows. In Section 2 we present the method for the interpolation of the inverse of a parameter-dependent matrix. In Section 3, we show how the preconditioner can be used for the definition of a Petrov-Galerkin projection of the solution of on a given reduced approximation space, and we provide an analysis of the quasi-optimality constant of this projection. Then, different strategies for the selection of interpolation points for the preconditioner are proposed in Section 4. Finally, in Section 5, numerical experiments will illustrate the efficiency of the proposed preconditioning strategies for different projection-based model order reduction methods.
Note that the proposed preconditioner could be also used (a) for improving the quality of Galerkin projection methods where a projection of the solution is searched on a subspace of functions of the parameters (e.g. polynomial or piecewise polynomial spaces) , or (b) for preconditioning iterative solvers for , in particular solvers based on low-rank truncations that require a low-rank structure of the preconditioner . These two potential applications are not considered here.
2Interpolation of the inverse of a parameter-dependent matrix using Frobenius norm projection
In this section, we propose a construction of an interpolation of the matrix-valued function for given interpolation points in . We let , . For large matrices, the explicit computation of is usually not affordable. Therefore, is here considered as an implicit matrix and we assume that the product of with a vector can be computed efficiently. In practice, factorizations of matrices are stored.
2.1Projection using Frobenius norm
We introduce the subspace of . An approximation of in is then defined by
where denotes the identity matrix of size , and is the Frobenius norm such that with . Since , we have the interpolation property , . The minimization of has been first proposed in  for the construction of a preconditioner in a subspace of matrices with given sparsity pattern (SPAI method). The following proposition gives some properties of the operator (see Lemma 2.6 and Theorem 3.2 in ).
For all , we have
where the matrix and the vector are given by
Therefore, the solution of problem is with the solution of . When considering a small number of interpolation points, the computation time for solving this system of equations is negligible. However, the computation of and requires the evaluation of traces of matrices and for all . Since the are implicit matrices, the computation of such products of matrices is not affordable for large matrices. Of course, since , the trace of an implicit matrix could be obtained by computing the product of with the canonical vectors , but this approach is clearly not affordable for large .
Hereafter, we propose an approximation of the above construction using an approximation of the Frobenius norm which requires less computational efforts.
2.2Projection using a Frobenius semi-norm
Here, we define an approximation of in by
where , with . defines a semi-norm on . Here, we assume that the linear map is injective on so that the solution of is unique. This requires and is satisfied when and is the linear span of linearly independent invertible matrices. Then, the solution of is such that the vector satisfies , with
The procedure for the computation of and is given in Algorithm ?. Note that only matrix-vector products involving the implicit matrices are required.
Now the question is to choose a matrix such that provides a good approximation of for any and .
Hadamard matrices for the estimation of the Frobenius norm of an implicit matrix
Let an implicit -by- matrix (consider , with and ). Following , we show how Hadamard matrices can be used for the estimation of the Frobenius norm of an implicit matrix. The goal is to find a matrix such that is a good approximation of . The relation suggests that should be such that is as close as possible to the identity matrix. For example, we would like to minimize
which is the mean square magnitude of the off-diagonal entries of . The bound is known to hold for any whose rows have unit norm . Hadamard matrices can be used to construct matrices such that the corresponding error is close to the bound, see .
A Hadamard matrix is a -by- matrix whose entries are , and which satisfies where is the identity matrix of size . For example,
is a Hadamard matrix of size . The Kronecker product (denoted by ) of two Hadamard matrices is again a Hadamard matrix. Then it is possible to build a Hadamard matrix whose size is a power of 2 using a recursive procedure: . The -entry of this matrix is , where and are the binary vectors such that and . For a sufficiently large , we define the rescaled partial Hadamard matrix as the first rows and the first columns of .
Statistical estimation of the Frobenius norm of an implicit matrix
For the computation of the Frobenius norm of , we can also use a statistical estimator as first proposed in . The idea is to define a random matrix with a suitable distribution law such that provides a controlled approximation of with high probability.
Two distributions will be considered here.
(a) The rescaled Rademacher distribution. Here the entries of are independent and identically distributed with with probability . According to Theorem 13 in , the rescaled Rademacher distribution satisfies the -concentration property for
(b) The subsampled Randomized Hadamard Transform distribution (SRHT), first introduced in . Here we assume that is a power of . It is defined by where
is a diagonal random matrix where are independent Rademacher random variables (i.e. with probability ),
is a Hadamard matrix of size (see Section ?),
is a subset of rows from the identity matrix of size chosen uniformly at random and without replacement.
In other words, we randomly select rows of without replacement, and we multiply the columns by . We can find in  an analysis of the SRHT matrix properties. In the case where is not a power of , we define the partial SRHT (P-SRHT) matrix as the first rows of a SRHT matrix of size , where is the smallest power of such that . The following proposition shows that the (P-SRHT) distribution satisfies the -concentration property.
Let . We define the square matrix of size , whose first diagonal block is , and elsewhere. Then we have . The rest of the proof is similar to the one of Lemma 4.10 in . We consider the events and , where is the -th canonical vector of . The relation holds. Thanks to Lemma 4.6 in  (with ) we have . Now, using the scalar Chernoff bound (Theorem 2.2 in  with ) we have
The condition implies , and then , which ends the proof.
Such statistical estimators are particularly interesting for that they provide approximations of the Frobenius norm of large matrices, with a number of columns for which scales as the logarithm of , see and . However, the concentration property holds only for a given matrix . The following proposition ? extends these concentration results for any matrix in a given subspace. The proof is inspired from the one of Theorem 6 in . The essential ingredient is the existence of an -net for the unit ball of a finite dimensional space (see ).
We consider the unit ball of the subspace . It is shown in  that for any , there exists a net of cardinality lower than such that
In other words, any element of the unit ball can be approximated by an element of with an error less than . Using the -concentration property and a union bound, we obtain
with a probability at least . We now impose the relation , where . To prove , it remains to show that equation implies
We define . Let be such that , and . Then we have and , where is the inner product associated to the Frobenius norm . We have
We now have to bound the three terms in the previous expression. Firstly, since , the relation holds. Secondly, gives . Thirdly, by definition of , we can write and , so that we obtain . Finally, from , we obtain
Since , we have . Then and , so that implies
and then . By definition of , we can write for any , that implies .
Let us introduce the subspace of dimension less than , such that . Then, we note that with the conditions or , the distribution law of the random matrix satisfies the -concentration property. Thanks to Proposition ?, the probability that
holds for any is higher than . Then, by definition of , we have with a probability at least that for any , it holds
Then, taking the minimum over , we obtain .
Similarly to Proposition ?, we obtain the following properties for , with the solution of .
The optimality condition for yields . Since (where is the subspace introduced in the proof of Proposition ), we have
with a probability higher than . Using (which is satisfies for any realization of the rescaled Rademacher or the P-SRHT distribution), we obtain with a probability higher than . Then, yields the right inequality of . Following the proof of Lemma 2.6 in , we have . Together with , it yields the left inequality of . Furthermore, with probability , we have . Since the square of the Frobenius norm of matrix is the sum of squares of its singular values, we deduce
with a probability higher than , where is the largest singular value of . Then follows from the definition of .
Comparison and comments
We have presented different possibilities for the definition of . The rescaled partial Hadamard matrices introduced in section ? have the advantage that the error is close to the theoretical bound , see Figure ? (note that the rows of have unit norm). Furthermore, an interesting property is that has a structured pattern (see Figure ?). As noticed in , when the matrix have non-zero entries only on the -th upper and lower diagonals, with . As a consequence, the error on the estimation of will be induced only by the non-zero off-diagonal entries of that occupy the -th upper and lower diagonals, with . If the entries of vanish away from the diagonal, the Frobenius norm is expected to be accurately estimated. Note that the P-SRHT matrices can be interpreted as a “randomized version” of the rescaled partial Hadamard matrices, and Figure ? shows that the error associated to the P-SRHT matrix behaves almost like the rescaled partial Hadamard matrix. Also, P-SRHT matrices yield a structured pattern for , see Figure ?. The rescaled Rademacher matrices give higher errors and yield matrices with no specific patterns, see Figure ?.
The advantage of using rescaled Rademacher matrices or P-SRHT matrices is that the quality of the resulting projection can be controlled with high probability, provided that has a sufficiently large number of rows (see Proposition ?). Table ? shows the theoretical value for in order to obtain the quasi-optimality result with and . It can be observed that grows very slowly with the matrix size . Also, depends on linearly for the rescaled Rademacher matrices and quadratically for the P-SRHT matrices (see equations and ). However, these theoretical bounds for are very pessimistic, especially for the P-SRHT matrices. In practice, it can be observed that a very small value for may provide very good results (see Section 5). Also, it is worth mentioning that our numerical experiments do not reveal significant differences between the rescaled partial Hadamard, the rescaled Rademacher and the P-SRHT matrices.
2.3Ensuring the invertibility of the preconditioner for positive definite matrix
Here, we propose a modification of the interpolation which ensures that is invertible when is positive definite.
Since is positive definite, is positive definite. We introduce the vectors and whose components
correspond respectively to the lowest and highest eigenvalues of the symmetric part of . Then, for any ,
where and are respectively the positive and negative parts of . As a consequence, if the right hand side of is strictly positive, then is invertible. Furthermore, we have , where is the vector of component , where denotes the operator norm of . If we assume that , the condition number of satisfies
It is then possible to bound by by imposing
which is a linear inequality constraint on and . We introduce two convex subsets of defined by
From , we have that any nonzero element of is invertible, while any nonzero element of is invertible and has a condition number lower than . Under the condition , we have
Then definitions and for the approximation can be replaced respectively by
which are quadratic optimization problems with linear inequality constraints. Furthermore, since for all , all the resulting projections interpolate at the points .
The following proposition shows that properties and still hold for the preconditioned operator.
Since (or ) is a closed and convex positive cone, the solution of is such that for all (or ). Taking and , we obtain that , which implies . We refer to the proof of Lemma 2.6 and Theorem 3.2 in  to deduce and . Using the same arguments, we prove that the solution of satisfies , and then that and hold with a probability higher than .
2.4Practical computation of the projection
Here, we detail how to efficiently compute and given in equation in a multi-query context, i.e. for several different values of . The same methodology can be applied for computing and . We assume that the operator has an affine expansion of the form
where the are matrices in and the are real-valued functions. Then and also have the affine expansions
respectively. Computing the multiple terms of these expansions would require many computations of traces of implicit matrices and also, it would require the computation of the affine expansion of . Here, we use the methodology introduced in  for obtaining affine decompositions with a lower number of terms. These decompositions only require the knowledge of functions in the affine decomposition , and evaluations of and (that means evaluations of ) at some selected points. We briefly recall this methodology.
Suppose that , with a vector space, has an affine decomposition , with and . We first compute an interpolation of under the form , with , where are interpolation points and the associated interpolation functions. Such an interpolation can be computed with the Empirical Interpolation Method  described in Algorithm ?. Then, we obtain an affine decomposition which can be computed from evaluations of at interpolation points .
Applying the above procedure to both and , we obtain
The first (so-called offline) step consists in computing the interpolation functions and and associated interpolation points and using Algorithm ? with input and respectively, and then in computing matrices and vectors using Algorithm ?. The second (so-called online) step simply consists in computing the matrix and the vector for a given value of using .
3Preconditioners for projection-based model reduction
We consider a parameter-dependent linear equation
with and . Projection-based model reduction consists in projecting the solution onto a well chosen approximation space of low dimension . Such projections are usually defined by imposing the residual of to be orthogonal to a so-called test space of dimension . The quality of the projection on depends on the choice of the test space. The latter can be defined as the approximation space itself , thus yielding the classical Galerkin projection. However when the operator is ill-conditioned (for example when corresponds to the discretization of non coercive or weakly coercive operators), this choice may lead to projections that are far from optimal. Choosing the test space as , where is called the “supremizer operator” (see e.g. ), corresponds to a minimal residual approach, which may also results in projections that are far from optimal. In this section, we show how the preconditioner can be used for the definition of the test space. We also show how it can improve the quality of residual-based error estimates, which is a key ingredient for the construction of suitable approximation space in the context of the Reduced Basis method.
is endowed with the norm defined by , where is a symmetric positive definite matrix and is the canonical inner product of . We also introduce the dual norm such that for any we have .
3.1Projection of the solution on a given reduced subspace
Here, we suppose that the approximation space has been computed by some model order reduction method. The best approximation of on is and is characterized by the orthogonality condition
or equivalently by the Petrov-Galerkin orthogonality condition
Obviously the computation of test functions for basis functions of is prohibitive. By replacing by , we obtain the feasible Petrov-Galerkin formulation
Denoting by a matrix whose range is , the solution of is where the vector is the solution of
Note that corresponds to the standard Galerkin projection when replacing by . Indeed, the orthogonality condition becomes for all .
We give now a quasi-optimality result for the approximation . This analysis relies on the notion of -proximality introduced in .
The orthogonality condition yields
for all . Using , we have that for any ,