
# Stability Analysis of Bilinear Iterative Rational Krylov Algorithm

*This material is based upon work supported by Council of Scientific and Industrial Research (India) Grant Number 25/(0220)/13/EMR-II.*

Rajendra Choudhary and Kapil Ahuja, Discipline of Computer Science and Engineering, Indian Institute of Technology Indore
###### Abstract

Models arising in different physical applications are often very large. Simulation with such systems is expensive, so one usually obtains a reduced model (by model reduction) that replicates the input-output behaviour of the original full model. A recently proposed algorithm for model reduction of bilinear dynamical systems, the Bilinear Iterative Rational Krylov Algorithm (BIRKA), does so in a locally optimal way. This algorithm requires solving very large linear systems of equations. Usually these systems are solved by direct methods (e.g., LU factorization), which are very expensive. A better choice is iterative methods (e.g., Krylov subspace methods). However, iterative methods are inexact: they solve the given linear system only up to a certain tolerance, and hence introduce errors in the linear solves. We prove that under some mild assumptions BIRKA is stable with respect to the error introduced by these inexact linear solves. We also analyze the accuracy of the reduced system obtained from using these inexact solves, and support all our results by numerical experiments.

###### keywords:
Bilinear Dynamical Systems, Model Reduction, Iterative Solves, Krylov Subspace Methods, Petrov-Galerkin, Backward Stability.
###### Msc:
[2010] 34C20, 41A05, 65F10, 65G99
journal: Linear Algebra and its Applications

## 1 Introduction

A dynamical system describes a relation between two or more measurable quantities by a set of differential equations. The system may be linear or nonlinear. A bilinear dynamical system is one such weakly nonlinear system. The system can be described both in the time domain and in the frequency domain. In the time domain, a Multiple Input Multiple Output (MIMO) bilinear dynamical system with m inputs and p outputs is represented as follows Dirac3 (); Dirac5 ():

$$\zeta:\ \begin{cases} \dot{x}(t) = A x(t) + \displaystyle\sum_{k=1}^{m} N_k x(t) u_k(t) + B u(t), \\ y(t) = C x(t), \end{cases} \qquad (1)$$

where $x(t) \in \mathbb{R}^{n}$ is the state vector, $u(t) = [u_1(t), \ldots, u_m(t)]^T \in \mathbb{R}^{m}$ is the input vector, and $y(t) \in \mathbb{R}^{p}$ is the output vector. Also, $A,\, N_k \in \mathbb{R}^{n \times n}$ for $k = 1, \ldots, m$, $B \in \mathbb{R}^{n \times m}$, and $C \in \mathbb{R}^{p \times n}$. It is not possible to write a single transfer function for a complete bilinear dynamical system; therefore, in Dirac5 (); Dirac10 () the authors represent the bilinear dynamical system in the frequency domain by a series of subsystem transfer functions, i.e.,

$$H_1(s_1),\ H_2(s_1, s_2),\ \ldots,\ H_k(s_1, s_2, \ldots, s_k),\ \ldots,$$

where $s_1, s_2, \ldots, s_k$ are the frequencies. The transfer function of the $k^{th}$ order subsystem is given as follows Dirac5 ():

$$\begin{aligned} H_k(s_1, s_2, \ldots, s_k) = {}& C (s_k I_n - A)^{-1} \bar{N} \left[ I_m \otimes (s_{k-1} I_n - A)^{-1} \right] (I_m \otimes \bar{N}) \cdots \\ & \cdot \Big[ \underbrace{I_m \otimes \cdots \otimes I_m}_{k-2 \text{ times}} \otimes (s_2 I_n - A)^{-1} \Big] \Big( \underbrace{I_m \otimes \cdots \otimes I_m}_{k-2 \text{ times}} \otimes \bar{N} \Big) \\ & \cdot \Big[ \underbrace{I_m \otimes \cdots \otimes I_m}_{k-1 \text{ times}} \otimes (s_1 I_n - A)^{-1} \Big] \Big( \underbrace{I_m \otimes \cdots \otimes I_m}_{k-1 \text{ times}} \otimes B \Big), \end{aligned} \qquad (2)$$

where $\bar{N} = [N_1,\ N_2,\ \ldots,\ N_m]$; $I_n$ and $I_m$ are the identity matrices of size $n$ and $m$, respectively; and $\otimes$ denotes the Kronecker product (defined later).

If in (1), the matrix $N_k$ is a zero matrix for every $k$, then the system is a linear dynamical system. That is, a MIMO linear dynamical system is represented as

$$\dot{x}(t) = A x(t) + B u(t), \qquad y(t) = C x(t). \qquad (3)$$

The transfer function of the linear dynamical system in the frequency domain is defined as follows:

$$H(s) = C (s I_n - A)^{-1} B. \qquad (4)$$
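As a toy illustration, the transfer function (4) can be evaluated directly; the sketch below (with an arbitrary two-state example, not a system from this paper) computes $H(s)$ by a linear solve rather than an explicit inverse.

```python
import numpy as np

# Arbitrary two-state, single-input, single-output example (illustrative only).
A = np.array([[-1.0, 0.0], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]])

def H(s):
    """Evaluate H(s) = C (s I_n - A)^{-1} B via a linear solve."""
    n = A.shape[0]
    return C @ np.linalg.solve(s * np.eye(n) - A, B)

# For this diagonal A, H(s) = 1/(s + 1) + 1/(s + 2), so H(0) = 1.5.
print(H(0.0))  # [[1.5]]
```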

In general, dynamical systems corresponding to real-world applications are extremely large in size. Simulation and computation with such systems require a large amount of space and time. By using model reduction techniques Dirac6 (), these large dynamical systems are reduced to a smaller size, which makes simulation and computation easier. Model reduction can be done in many ways, e.g., by balanced truncation, Hankel approximations, or Krylov projection Dirac6 (). Projection methods, which obtain the reduced model by projecting the original full model onto a lower dimensional subspace, are quite popular. In the literature, there are several techniques for projecting a dynamical system Dirac6 (); Dirac4 (); serkphd (); zbai (); kahujaphd (); Dirac7 (). The Petrov-Galerkin projection is one such projection technique that yields nice properties in the reduced model. Interpolation is usually used to obtain the subspaces involved in the Petrov-Galerkin projection.

Based upon the theory of Petrov-Galerkin based interpolatory model reduction, the authors in Dirac4 (); inexirka (); mimo () have proposed the Iterative Rational Krylov Algorithm (IRKA) for model reduction of linear dynamical systems. IRKA provides a reduced model that is optimal (the kind of optimality is discussed in the next section). Similar to IRKA, the authors in Dirac3 (); Dirac5 (); doi:10.1137/130947830 (); tobiasphd () have proposed the Bilinear Iterative Rational Krylov Algorithm (BIRKA) for model reduction of bilinear dynamical systems.

The main computational bottleneck in reducing large models (or dynamical systems) is solving large sparse linear systems of equations. The reason is that model reducers typically use direct solvers, e.g., LU factorization, to solve such linear systems, which is expensive. The solution to this scaling problem is to use iterative methods, e.g., Krylov subspace methods.

Krylov subspace methods have been applied to IRKA in ahuja2012recycling (); sarahsms (); sarahsphd (). Iterative methods are inexact, i.e., they solve linear systems of equations only up to a certain stopping tolerance. Hence, it becomes important to check whether the model reduction algorithm (IRKA or BIRKA) is stable with respect to these inexact solves. In other words, we need to check that small errors in the linear solves do not substantially deteriorate the quality of the reduced model. For IRKA, this stability analysis has been done in Dirac2 (). We do the same for BIRKA, i.e., prove that BIRKA is stable with respect to inexact linear solves. This work should give users more confidence in using iterative solvers for BIRKA.

In the next section (Section 2), we discuss model reduction by a Petrov-Galerkin based interpolatory model reduction framework. We discuss stability of BIRKA in Section 3. In Section 4, we analyze invertibility assumptions of all involved matrices as well as the accuracy of the reduced system obtained from a backward stable BIRKA. We support our theory with numerical experiments in Section 5, and give concluding remarks as well as future directions in Section 6. For the rest of this paper we use the terms and notations as listed below.

1. In the literature Dirac3 (), the $\mathcal{H}_2$ norm of a bilinear dynamical system is defined as

$$\|\zeta\|_{\mathcal{H}_2}^2 = \mathrm{vec}(I_p)^T (C \otimes C) \left( -A \otimes I_n - I_n \otimes A - \sum_{k=1}^{m} N_k \otimes N_k \right)^{-1} (B \otimes B)\, \mathrm{vec}(I_m), \qquad (5)$$

where $I_p$ is an identity matrix of size $p$. If the type of norm is not written, then in the case of a functional norm it is an $\mathcal{H}_2$ norm, and in the case of matrices it is a 2-norm.

2. The Kronecker product between two matrices $P$ (of size $m \times n$) and $Q$ (of size $k \times l$) is defined as

$$P \otimes Q = \begin{bmatrix} p_{11} Q & \cdots & p_{1n} Q \\ \vdots & \ddots & \vdots \\ p_{m1} Q & \cdots & p_{mn} Q \end{bmatrix},$$

where $p_{ij}$ is an element of the matrix $P$, and the order of $P \otimes Q$ is $mk \times nl$.

3. The $\mathrm{vec}(\cdot)$ operator on a matrix $P \in \mathbb{R}^{m \times n}$ is defined as

$$\mathrm{vec}(P) = (p_{11}, \ldots, p_{m1},\ p_{12}, \ldots, p_{m2},\ \ldots,\ p_{1n}, \ldots, p_{mn})^T.$$
4. Also, $\mathbb{R}$ denotes the set of real numbers and $\mathbb{D}$ denotes a discrete subset of the real numbers.
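The notation above can be exercised numerically; the sketch below (a tiny, arbitrarily chosen stable bilinear system) evaluates the norm formula (5) using `numpy.kron` and a column-stacking `vec`.

```python
import numpy as np

# A tiny, arbitrarily chosen stable bilinear system (n = 2, m = p = 1).
n, m, p = 2, 1, 1
A = np.array([[-2.0, 0.0], [0.0, -3.0]])
N1 = 0.1 * np.eye(n)
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])

def vec(P):
    # Column-stacking vec(.) operator, matching the definition above.
    return P.reshape(-1, 1, order="F")

In = np.eye(n)
L = -np.kron(A, In) - np.kron(In, A) - np.kron(N1, N1)
X = np.linalg.solve(L, np.kron(B, B) @ vec(np.eye(m)))
h2_sq = (vec(np.eye(p)).T @ np.kron(C, C) @ X).item()
print(h2_sq)  # squared H2 norm; for this diagonal example it equals 1/3.99
```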

## 2 Petrov-Galerkin Based Interpolatory Model Reduction Framework

According to the Petrov-Galerkin projection, the residual of a dynamical system, obtained after projecting onto a lower dimensional subspace, is made orthogonal to some other subspace defined by a test basis. Let $\mathcal{R}$ denote the residual of this dynamical system; then, according to the Petrov-Galerkin condition, $\mathcal{R} \perp \mathcal{Q}$, where $\mathcal{Q}$ denotes any test subspace.

The subspace onto which we project and the test subspace are not known to us. We can pick these subspaces arbitrarily, but then we cannot guarantee a good input-output behaviour from the reduced model. For the reduced model to provide a high fidelity approximation to the input-output behaviour of the original full model, we use interpolation to obtain these subspaces. In Dirac4 (), the authors give an algorithm for model reduction of linear dynamical systems called IRKA (Iterative Rational Krylov Algorithm). IRKA is a Petrov-Galerkin based interpolatory model reduction algorithm. For a certain type of linear dynamical systems, IRKA locally converges to a local minimum of the underlying optimization problem flagg12Convergence (). For the optimality discussion in the linear case we refer the reader to Dirac4 () and flagg12Convergence (). We discuss optimality in the bilinear case below.

Next, we apply Petrov-Galerkin based interpolatory model reduction to a bilinear dynamical system. This is a short summary of the original work in Dirac3 () and Dirac5 (). After reduction, the bilinear system (1) can be represented as Dirac3 ()

$$\zeta_r:\ \begin{cases} \dot{x}_r(t) = A_r x_r(t) + \displaystyle\sum_{k=1}^{m} N_{kr} x_r(t) u_k(t) + B_r u(t), \\ y_r(t) = C_r x_r(t), \end{cases} \qquad (6)$$

where $A_r,\, N_{kr} \in \mathbb{R}^{r \times r}$ for $k = 1, \ldots, m$, $B_r \in \mathbb{R}^{r \times m}$, and $C_r \in \mathbb{R}^{p \times r}$, with $r \ll n$. We want $\zeta_r$ to approximate $\zeta$ in an appropriate norm, and hence, $y_r(t)$ should be nearly equal to $y(t)$ for all admissible inputs. Let the two $r$-dimensional subspaces, $\mathcal{V}_r$ and $\mathcal{W}_r$, be chosen in such a way that $\mathcal{V}_r = \mathrm{Range}(V_r)$ and $\mathcal{W}_r = \mathrm{Range}(W_r)$, where $V_r,\, W_r \in \mathbb{R}^{n \times r}$ are matrices. We project the original full model (1) onto the lower dimensional subspace, i.e., $x(t) \approx V_r x_r(t)$, and enforce the Petrov-Galerkin condition Dirac3 (); Dirac5 ()

Comparing the above equations with (6), we get

$$A_r = (W_r^T V_r)^{-1} W_r^T A V_r, \quad N_{kr} = (W_r^T V_r)^{-1} W_r^T N_k V_r, \quad B_r = (W_r^T V_r)^{-1} W_r^T B, \quad \text{and} \quad C_r = C V_r, \qquad (7)$$

where $W_r^T V_r$ is assumed to be invertible. Obtaining such an invertible matrix is not hard Dirac3 (). Different selections of the subspaces $\mathcal{V}_r$ and $\mathcal{W}_r$ give different reduced models, but we choose these subspaces by enforcing interpolation. In the case of bilinear systems, there are two ways of doing interpolation Dirac5 ().
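The projection formulas (7) are straightforward to realize; the sketch below builds a reduced model from arbitrary random bases $V_r$ and $W_r$ (illustrative only, with no interpolation conditions enforced).

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, m, p = 10, 3, 2, 2

# Random full-model matrices (placeholders for illustration only).
A = rng.standard_normal((n, n))
N1 = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

# Any full-rank bases Vr and Wr define a Petrov-Galerkin reduced model via (7).
Vr = np.linalg.qr(rng.standard_normal((n, r)))[0]
Wr = np.linalg.qr(rng.standard_normal((n, r)))[0]

M = np.linalg.inv(Wr.T @ Vr)  # (Wr^T Vr)^{-1}, assumed invertible here
Ar = M @ Wr.T @ A @ Vr
N1r = M @ Wr.T @ N1 @ Vr
Br = M @ Wr.T @ B
Cr = C @ Vr
print(Ar.shape, N1r.shape, Br.shape, Cr.shape)  # (3, 3) (3, 3) (3, 2) (2, 3)
```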

A bilinear system can be represented by a series of subsystem transfer functions. If we apply certain interpolation conditions on a finite number of subsystems, then it is called subsystem interpolation Dirac5 (). Another way is Volterra series interpolation. Here, interpolation is done on a weighted sum of all Volterra kernel transfer functions of (1). We refer the reader to Dirac5 (); opac-b1090598 () for a detailed discussion on the definition of the Volterra series, the Volterra kernels, and the subsequent derivations.

As the subsystem interpolation approach is unable to satisfy any optimality condition Dirac5 () (i.e., the error between the original full model and the reduced model being minimum in some norm), our focus is on the Volterra series interpolation. We need to know how to build $V_r$ and $W_r$ such that the conditions of the Volterra series interpolation are satisfied. We also need to decide where to interpolate so that we get an optimal reduced model. Here, we focus on optimality.

In a bilinear system, the following error system expression is differentiated to obtain the optimality conditions Dirac3 ():

$$\left( -\begin{bmatrix} A & 0 \\ 0 & \Lambda \end{bmatrix} \otimes \begin{bmatrix} I_n & 0 \\ 0 & I_r \end{bmatrix} - \begin{bmatrix} I_n & 0 \\ 0 & I_r \end{bmatrix} \otimes \begin{bmatrix} A & 0 \\ 0 & \check{A} \end{bmatrix} - \sum_{k=1}^{m} \begin{bmatrix} N_k & 0 \\ 0 & \check{\check{N}}_k^T \end{bmatrix} \otimes \begin{bmatrix} N_k & 0 \\ 0 & \check{N}_k \end{bmatrix} \right)^{-1} \times \left( \begin{bmatrix} B \\ \check{\check{B}}^T \end{bmatrix} \otimes \begin{bmatrix} B \\ \check{B} \end{bmatrix} \right) \mathrm{vec}(I_m), \qquad (8)$$

where $\check{A}$, $\check{N}_k$, $\check{B}$, and $\check{C}$ are the initial guesses for the reduced system, and $\Lambda$ is the diagonal matrix of eigenvalues of $\check{A}$. Also, $\check{\check{N}}_k$, $\check{\check{B}}$, and $\check{\check{C}}$ denote these reduced quantities transformed to the eigenvector basis of $\check{A}$ (see Algorithm 1). Performing interpolation on the inverse images of the reduced system poles helps achieve optimality. Theorem 1 below summarizes this, where the poles of the transfer function of every reduced subsystem (say of $\zeta_r$) are computed (say represented by $\lambda_{l_1}, \ldots, \lambda_{l_k}$), inverted (leading to $-\lambda_{l_1}, \ldots, -\lambda_{l_k}$), and finally, interpolation is performed at these points.

###### Theorem 1.

Dirac5 () Let $\zeta$ be a bilinear system of order $n$. Let $\zeta_r$ be an $\mathcal{H}_2$-optimal approximation of order $r$. Then, $\zeta_r$ satisfies the following multi-point Volterra series interpolation conditions:

$$\sum_{k=1}^{\infty} \sum_{l_1=1}^{r} \cdots \sum_{l_k=1}^{r} \phi_{l_1, l_2, \ldots, l_k} H_k(-\lambda_{l_1}, -\lambda_{l_2}, \ldots, -\lambda_{l_k}) = \sum_{k=1}^{\infty} \sum_{l_1=1}^{r} \cdots \sum_{l_k=1}^{r} \phi_{l_1, l_2, \ldots, l_k} H_{rk}(-\lambda_{l_1}, -\lambda_{l_2}, \ldots, -\lambda_{l_k}), \quad \text{and}$$

$$\sum_{k=1}^{\infty} \sum_{l_1=1}^{r} \cdots \sum_{l_k=1}^{r} \phi_{l_1, l_2, \ldots, l_k} \left( \sum_{j=1}^{k} \frac{\partial}{\partial s_j} H_k(-\lambda_{l_1}, -\lambda_{l_2}, \ldots, -\lambda_{l_k}) \right) = \sum_{k=1}^{\infty} \sum_{l_1=1}^{r} \cdots \sum_{l_k=1}^{r} \phi_{l_1, l_2, \ldots, l_k} \left( \sum_{j=1}^{k} \frac{\partial}{\partial s_j} H_{rk}(-\lambda_{l_1}, -\lambda_{l_2}, \ldots, -\lambda_{l_k}) \right),$$

where $\phi_{l_1, l_2, \ldots, l_k}$ and $\lambda_{l}$ are the residues and poles of the transfer function associated with $\zeta_r$, respectively.

Obtaining the residues and the poles of the optimal reduced model is not possible up front since we do not yet have such a system. In Dirac3 (); doi:10.1137/130947830 (), the authors propose the Bilinear Iterative Rational Krylov Algorithm (BIRKA), which, at convergence, ensures that the conditions of Theorem 1 are satisfied. BIRKA gives a locally optimal reduced model. Algorithm 1 lists BIRKA.

## 3 Backward Stability

In general, numerical algorithms for a problem are continuous in nature, but a digital computer solves them in a discrete manner. The reason is the limited representation of real/complex numbers. Since complex numbers can be represented by real numbers, we focus on the latter only. Let $fd(\cdot)$ be a function giving a finite (floating point) approximation to a real number. It provides the rounded equivalent as Dirac8 ()

$$fd(x) = x(1 + \epsilon), \quad |\epsilon| \leq \epsilon_{machine}, \quad \text{for all } x \in \mathbb{R},$$

where $\epsilon_{machine}$ is the machine precision. Also, for every operation $\oplus$ between any two finite numbers, the result is exact up to a relative error, i.e., for all $x$, $y$

$$fd(x \oplus y) = (x \oplus y)(1 + \epsilon), \quad |\epsilon| \leq \epsilon_{machine},$$

where $\oplus$ can be any of the following operations: $+$, $-$, $\times$, $\div$.
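This floating point model is easy to observe in practice; the snippet below checks that the relative error of one addition is a small multiple of the machine precision.

```python
import numpy as np

eps = np.finfo(float).eps  # machine precision, about 2.22e-16 for doubles

# fd(x + y) = (x + y)(1 + e) with |e| <= eps: check one addition.
x, y = 0.1, 0.2
computed = x + y   # floating point result fd(x + y)
exact = 0.3        # true value (itself stored only approximately)
rel_err = abs(computed - exact) / abs(exact)
print(eps, rel_err)  # rel_err is a small multiple of eps
```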

Consider a continuous mathematics algorithm $f : X \to Y$. Say executing this algorithm on a digital computer (that uses finite precision arithmetic) is represented as $\tilde{f} : X \to Y$. To check how good the approximation $\tilde{f}$ is, one usually computes the accuracy of $\tilde{f}$. We say an algorithm is accurate if Dirac8 ()

$$\frac{\|f(x) - \tilde{f}(x)\|}{\|f(x)\|} = O(\epsilon_{machine}),$$

where $x \in X$. From the above equation, we find that computing the accuracy is usually not possible since we do not know $f(x)$. An easier parameter to check the goodness of $\tilde{f}$ is stability. There are multiple notions of stability. One such notion is backward stability, which says that an algorithm $\tilde{f}$ is backward stable if Dirac8 ()

$$\tilde{f}(x) = f(\tilde{x}) \quad \text{for some } \tilde{x} \text{ with} \quad \frac{\|x - \tilde{x}\|}{\|x\|} = O(\epsilon_{machine}).$$

This notion of backward stability is useful since one can easily compute the accuracy of the result/output for a backward stable algorithm.

###### Theorem 2.

Dirac8 () If $\tilde{f}$ is a backward stable algorithm for the problem $f$, and $\kappa(x)$ is the condition number of the problem, then the relative error satisfies

$$\frac{\|f(x) - \tilde{f}(x)\|}{\|f(x)\|} = O(\kappa(x)\, \epsilon_{machine}),$$

where $\epsilon_{machine}$ is the machine precision (or the perturbation in $x$).
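Theorem 2 can be observed numerically; the sketch below solves a moderately ill-conditioned (Vandermonde) system with a backward stable solver and compares the forward error against the condition number. The matrix is an arbitrary illustration.

```python
import numpy as np

# A backward stable solve (LU with partial pivoting) on an ill-conditioned
# Vandermonde system: the forward error behaves like O(kappa(A) * eps).
n = 8
A = np.vander(np.linspace(1.0, 2.0, n))
x_true = np.ones(n)
b = A @ x_true
x = np.linalg.solve(A, b)

kappa = np.linalg.cond(A)
rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(kappa, rel_err)  # rel_err stays small despite the large kappa
```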

Let us look at lines 3b. and 3c. in BIRKA (Algorithm 1). There we need to solve linear systems to compute $V$ and $W$, respectively. Solving these linear systems by direct methods (such as LU factorization, Gaussian elimination, etc.) is too expensive (time complexity of $O(\tilde{n}^3)$, where $\tilde{n}$ is the system size). Moreover, the linear systems here have sparse matrices. For such systems, iterative methods, e.g., Krylov subspace methods Dirac6 (), are preferred because of their reduced complexity (time cost proportional to the number of nonzeros in the matrix per iteration). In fact, the matrices here are block sparse. Iterative methods for difficult-to-solve linear systems usually require a preconditioner; hence, this block sparsity can be exploited in designing preconditioners here. E.g., in luo2015optimization (), the authors have designed an Incomplete LU (ILU) factorization for efficiently solving block sparse linear systems. The techniques from luo2015optimization () can be used for designing better ILU preconditioned iterative methods for block sparse linear systems.

Iterative methods are inexact in nature, which means they do not solve linear systems, say $Ax = b$, exactly. Instead, $A\tilde{x} = b + r$ is solved, where the residual $r$ is driven below a certain stopping tolerance. Our aim is to determine whether, if one uses an iterative solver (also called an inexact solver from now on) in IRKA or BIRKA, these algorithms are stable with respect to the error introduced by the inexact solves. As earlier, we check for backward stability. For IRKA, the backward stability analysis has been done in Dirac2 ().

Let $V$ and $W$ in BIRKA be calculated exactly, and let $g(\cdot)$ be the functional representation of the interpolation process that uses $V$ and $W$ in BIRKA (i.e., exact BIRKA). Similarly, let $\widetilde{V}$ and $\widetilde{W}$ be calculated inexactly (i.e., by an iterative solver), and let $\tilde{g}(\cdot)$ be the functional representation of the interpolation process that uses $\widetilde{V}$ and $\widetilde{W}$ in BIRKA (i.e., inexact BIRKA). Then, from the backward stability definition, BIRKA is backward stable if

$$\tilde{g}(\zeta) = g(\tilde{\zeta}) \quad \text{for some } \tilde{\zeta} \text{ with} \qquad (9)$$

$$\frac{\|\zeta - \tilde{\zeta}\|_{\mathcal{H}_2 \text{ or } \mathcal{H}_\infty}}{\|\zeta\|_{\mathcal{H}_2 \text{ or } \mathcal{H}_\infty}} = O(\|F\|), \qquad (10)$$

where $\tilde{\zeta}$ is the perturbed full model corresponding to the error in the linear solves for $\widetilde{V}$ and $\widetilde{W}$ in inexact BIRKA. This perturbation is denoted by $F$. Next, we look at the above two conditions for stability in the two different sub-sections below.

### 3.1 Satisfying the First Condition of Backward Stability

Let the original full order model be represented as $\zeta = (A,\ N_k,\ B,\ C)$. Recall from Algorithm 1 the following:

$$\begin{aligned} \left( -\Lambda \otimes I_n - I_r \otimes A - \sum_{k=1}^{m} \check{\check{N}}_k^T \otimes N_k \right) \mathrm{vec}(V) &= \left( \check{\check{B}}^T \otimes B \right) \mathrm{vec}(I_m) \quad \text{and} \\ \left( -\Lambda \otimes I_n - I_r \otimes A^T - \sum_{k=1}^{m} \check{\check{N}}_k \otimes N_k^T \right) \mathrm{vec}(W) &= \left( \check{\check{C}}^T \otimes C^T \right) \mathrm{vec}(I_p). \end{aligned} \qquad (11)$$

Also, let the residuals associated with the iterative solves for computing $\widetilde{V}$ and $\widetilde{W}$ be $R_B$ and $R_C$, respectively. Then, the above equations lead to

$$\left( -\Lambda \otimes I_n - I_r \otimes A - \sum_{k=1}^{m} \check{\check{N}}_k^T \otimes N_k \right) \mathrm{vec}(\widetilde{V}) = \left( \check{\check{B}}^T \otimes B \right) \mathrm{vec}(I_m) + \mathrm{vec}(R_B) \quad \text{and} \qquad (12)$$

$$\left( -\Lambda \otimes I_n - I_r \otimes A^T - \sum_{k=1}^{m} \check{\check{N}}_k \otimes N_k^T \right) \mathrm{vec}(\widetilde{W}) = \left( \check{\check{C}}^T \otimes C^T \right) \mathrm{vec}(I_p) + \mathrm{vec}(R_C). \qquad (13)$$
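The structure of (12) can be mimicked numerically; the sketch below (arbitrary small matrices standing in for the full and reduced quantities) forms the Kronecker-structured coefficient matrix, perturbs the exact solution to imitate an iterative solve, and recovers $\mathrm{vec}(R_B)$ as the resulting residual.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, m = 6, 2, 1

# Illustrative stand-ins for the full model and the reduced-model quantities.
A = -np.eye(n) - 0.1 * rng.standard_normal((n, n))
N1 = 0.05 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Lam = np.diag([-1.0, -2.0])               # Lambda: reduced poles
NN1 = 0.05 * rng.standard_normal((r, r))  # stands in for the double-checked N_1
BB = rng.standard_normal((m, r))          # stands in for the double-checked B

In, Ir = np.eye(n), np.eye(r)
K = -np.kron(Lam, In) - np.kron(Ir, A) - np.kron(NN1.T, N1)
rhs = np.kron(BB.T, B) @ np.eye(m).reshape(-1, 1)

v_exact = np.linalg.solve(K, rhs)
v_tilde = v_exact + 1e-8 * rng.standard_normal(v_exact.shape)  # inexact solve
R_B_vec = K @ v_tilde - rhs  # vec(R_B), the residual appearing in (12)
print(np.linalg.norm(R_B_vec))  # small: on the order of the solve error
```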

Let $\widetilde{V}_r = \mathrm{orth}(\widetilde{V})$ and $\widetilde{W}_r = \mathrm{orth}(\widetilde{W})$, where $\mathrm{orth}(\cdot)$ denotes an orthonormal basis of the range of its argument. The Petrov-Galerkin projection connects the reduced model matrices (obtained by inexact BIRKA) to the original full model matrices as

$$\widetilde{A}_r = (\widetilde{W}_r^T \widetilde{V}_r)^{-1} \widetilde{W}_r^T A \widetilde{V}_r, \quad \widetilde{N}_{kr} = (\widetilde{W}_r^T \widetilde{V}_r)^{-1} \widetilde{W}_r^T N_k \widetilde{V}_r, \quad \widetilde{B}_r = (\widetilde{W}_r^T \widetilde{V}_r)^{-1} \widetilde{W}_r^T B, \quad \text{and} \quad \widetilde{C}_r = C \widetilde{V}_r, \qquad (14)$$

where this reduced model is represented as $\tilde{\zeta}_r = (\widetilde{A}_r,\ \widetilde{N}_{kr},\ \widetilde{B}_r,\ \widetilde{C}_r)$.

By the backward stability definition, next we find a perturbed full model whose exact interpolation gives the same reduced model as obtained by inexact interpolation of the original full model. Let the perturbed full model be represented as $\tilde{\zeta}$ or $(A + F,\ N_k + E_k,\ B + G,\ C + H)$, where $F$, $E_k$, $G$, and $H$ are the constant perturbation matrices. Then, we have

$$\begin{aligned} \left( -\Lambda \otimes I_n - I_r \otimes (A + F) - \sum_{k=1}^{m} \check{\check{N}}_k^T \otimes (N_k + E_k) \right) \mathrm{vec}(\widetilde{V}) &= \left( \check{\check{B}}^T \otimes (B + G) \right) \mathrm{vec}(I_m) \quad \text{and} \\ \left( -\Lambda \otimes I_n - I_r \otimes (A + F)^T - \sum_{k=1}^{m} \check{\check{N}}_k \otimes (N_k + E_k)^T \right) \mathrm{vec}(\widetilde{W}) &= \left( \check{\check{C}}^T \otimes (C + H)^T \right) \mathrm{vec}(I_p), \end{aligned} \qquad (15)$$

or

$$\left( -\Lambda \otimes I_n - I_r \otimes A - \sum_{k=1}^{m} \check{\check{N}}_k^T \otimes N_k \right) \mathrm{vec}(\widetilde{V}) = \left( \check{\check{B}}^T \otimes B \right) \mathrm{vec}(I_m) + \left( \check{\check{B}}^T \otimes G \right) \mathrm{vec}(I_m) + \left( I_r \otimes F + \sum_{k=1}^{m} \check{\check{N}}_k^T \otimes E_k \right) \mathrm{vec}(\widetilde{V}) \quad \text{and} \qquad (16)$$

$$\left( -\Lambda \otimes I_n - I_r \otimes A^T - \sum_{k=1}^{m} \check{\check{N}}_k \otimes N_k^T \right) \mathrm{vec}(\widetilde{W}) = \left( \check{\check{C}}^T \otimes C^T \right) \mathrm{vec}(I_p) + \left( \check{\check{C}}^T \otimes H^T \right) \mathrm{vec}(I_p) + \left( I_r \otimes F^T + \sum_{k=1}^{m} \check{\check{N}}_k \otimes E_k^T \right) \mathrm{vec}(\widetilde{W}). \qquad (17)$$

As earlier, $\widetilde{V}_r = \mathrm{orth}(\widetilde{V})$ and $\widetilde{W}_r = \mathrm{orth}(\widetilde{W})$. Using the Petrov-Galerkin projection to connect the reduced model matrices (obtained by exact BIRKA) with the perturbed full model matrices, we get

$$\hat{A}_r = (\widetilde{W}_r^T \widetilde{V}_r)^{-1} \widetilde{W}_r^T (A + F) \widetilde{V}_r, \quad \hat{N}_{kr} = (\widetilde{W}_r^T \widetilde{V}_r)^{-1} \widetilde{W}_r^T (N_k + E_k) \widetilde{V}_r, \quad \hat{B}_r = (\widetilde{W}_r^T \widetilde{V}_r)^{-1} \widetilde{W}_r^T (B + G), \quad \text{and} \quad \hat{C}_r = (C + H) \widetilde{V}_r, \qquad (18)$$

where this reduced model is represented as $\hat{\zeta}_r = (\hat{A}_r,\ \hat{N}_{kr},\ \hat{B}_r,\ \hat{C}_r)$. To satisfy the first condition of backward stability (9), we equate the reduced models in (14) and (18). That is,

$$\hat{A}_r = \widetilde{A}_r + (\widetilde{W}_r^T \widetilde{V}_r)^{-1} \widetilde{W}_r^T F \widetilde{V}_r.$$

Similarly, $\hat{N}_{kr} = \widetilde{N}_{kr} + (\widetilde{W}_r^T \widetilde{V}_r)^{-1} \widetilde{W}_r^T E_k \widetilde{V}_r$, $\hat{B}_r = \widetilde{B}_r + (\widetilde{W}_r^T \widetilde{V}_r)^{-1} \widetilde{W}_r^T G$, and $\hat{C}_r = \widetilde{C}_r + H \widetilde{V}_r$.

From the above, we note that if $\widetilde{W}_r^T F \widetilde{V}_r = 0$, then $\hat{A}_r = \widetilde{A}_r$. Similarly, if $\widetilde{W}_r^T E_k \widetilde{V}_r = 0$, then $\hat{N}_{kr} = \widetilde{N}_{kr}$; if $\widetilde{W}_r^T G = 0$, then $\hat{B}_r = \widetilde{B}_r$; and if $H \widetilde{V}_r = 0$, then $\hat{C}_r = \widetilde{C}_r$. Using the Petrov-Galerkin framework for the inexact solves in (12) and (13), we can easily achieve some of the above relations. We discuss this next.

#### 3.1.1 The Petrov-Galerkin Framework for Inexact Solves

The Petrov-Galerkin framework, by definition, implies finding the solution of a linear system of equations such that its residual at every point is orthogonal to some other suitable subspace van2003iterative (). In our context, we define the Petrov-Galerkin framework as below:

$$\text{Find } \widetilde{V} \in \mathcal{P}_r \text{ such that } R_B \perp \mathcal{Q}_r \quad \text{and} \quad \text{find } \widetilde{W} \in \mathcal{Q}_r \text{ such that } R_C \perp \mathcal{P}_r, \qquad (19)$$

where $\mathcal{P}_r$ and $\mathcal{Q}_r$ are any two $r$-dimensional subspaces of $\mathbb{R}^n$; $\widetilde{V}$ and $R_B$ satisfy (12); and $\widetilde{W}$ and $R_C$ satisfy (13).

Comparing (12) with (16) and (13) with (17), we get the following equations:

$$\begin{aligned} \mathrm{vec}(R_B) &= \left( \check{\check{B}}^T \otimes G \right) \mathrm{vec}(I_m) + \left( I_r \otimes F + \sum_{k=1}^{m} \check{\check{N}}_k^T \otimes E_k \right) \mathrm{vec}(\widetilde{V}) \quad \text{and} \\ \mathrm{vec}(R_C) &= \left( \check{\check{C}}^T \otimes H^T \right) \mathrm{vec}(I_p) + \left( I_r \otimes F^T + \sum_{k=1}^{m} \check{\check{N}}_k \otimes E_k^T \right) \mathrm{vec}(\widetilde{W}), \quad \text{or} \\ R_B &= G \check{\check{B}} + F \widetilde{V} + \sum_{k=1}^{m} E_k \widetilde{V} \check{\check{N}}_k \quad \text{and} \quad R_C = H^T \check{\check{C}} + F^T \widetilde{W} + \sum_{k=1}^{m} E_k^T \widetilde{W} \check{\check{N}}_k^T. \end{aligned} \qquad (20)$$

Next, we consider perturbations in each of the matrices $A$, $N_k$, $B$, and $C$ individually, and use the Petrov-Galerkin framework discussed above. First, if we take the perturbation in $A$ only (i.e., $E_k = 0$, $G = 0$, and $H = 0$), then (20) is equivalent to

$$R_B = F \widetilde{V} \quad \text{and} \quad R_C^T = \widetilde{W}^T F. \qquad (21)$$

In the above, if we multiply by $\widetilde{W}^T$ from the left in the first equation and by $\widetilde{V}$ from the right in the second equation, then we get

$$\widetilde{W}^T R_B = \widetilde{W}^T F \widetilde{V} \quad \text{and} \quad R_C^T \widetilde{V} = \widetilde{W}^T F \widetilde{V}.$$

From the Petrov-Galerkin framework (19), $\widetilde{W}^T R_B = 0$ and $R_C^T \widetilde{V} = 0$, and hence,

$$\widetilde{W}^T F \widetilde{V} = 0 \quad \text{or} \quad \widetilde{W}_r^T F \widetilde{V}_r = 0. \qquad (22)$$

Similarly, if we take the perturbation in any one $N_k$ matrix only, then (20) is equivalent to

$$R_B = E_k \widetilde{V} \check{\check{N}}_k \quad \text{and} \quad R_C^T = \check{\check{N}}_k \widetilde{W}^T E_k.$$

Again, if we multiply by $\widetilde{W}^T$ from the left in the first equation and by $\widetilde{V}$ from the right in the second equation, then we get

$$\widetilde{W}^T R_B = \widetilde{W}^T E_k \widetilde{V} \check{\check{N}}_k \quad \text{and} \quad R_C^T \widetilde{V} = \check{\check{N}}_k \widetilde{W}^T E_k \widetilde{V}.$$

Using the Petrov-Galerkin framework (19) in the above, we get

$$\widetilde{W}^T E_k \widetilde{V} \check{\check{N}}_k = 0 \quad \text{and} \quad \check{\check{N}}_k \widetilde{W}^T E_k \widetilde{V} = 0.$$

To achieve the desired result, i.e., $\widetilde{W}^T E_k \widetilde{V} = 0$, we need $\check{\check{N}}_k$ to be invertible. This cannot always be guaranteed. Thus, we drop the perturbation analysis with the $N_k$ matrices.

Finally, if we only take the perturbations $G$ and $H$ in the matrices $B$ and $C$, respectively, then (20) is equivalent to

$$R_B = G \check{\check{B}} \quad \text{and} \quad R_C^T = \check{\check{C}}^T H.$$

As in the last two paragraphs, multiplying by $\widetilde{W}^T$ from the left in the first equation above, multiplying by $\widetilde{V}$ from the right in the second equation above, and using the Petrov-Galerkin framework (19), we get

$$\widetilde{W}^T G \check{\check{B}} = 0 \quad \text{and} \quad \check{\check{C}}^T H \widetilde{V} = 0.$$

As above, to achieve the desired results, i.e., $\widetilde{W}^T G = 0$ and $H \widetilde{V} = 0$, we need $\check{\check{B}}$ and $\check{\check{C}}$ to be invertible. This cannot be guaranteed because these are non-square matrices in general. Thus, we drop the perturbation analysis with both the $B$ and $C$ matrices.

Hence, (22) implies that if we consider the perturbation in the $A$ matrix only and use a Petrov-Galerkin framework for the inexact linear solves, then

$$\hat{A}_r = \widetilde{A}_r, \quad \hat{N}_{kr} = \widetilde{N}_{kr}, \quad \hat{B}_r = \widetilde{B}_r, \quad \text{and} \quad \hat{C}_r = \widetilde{C}_r, \quad \text{or} \quad \tilde{g}(\zeta) = g(\tilde{\zeta}).$$

The theorem below summarizes this.

###### Theorem 3.

If the inexact linear solves in BIRKA (lines 3b. and 3c. of Algorithm 1) are performed using the Petrov-Galerkin framework (19), then BIRKA satisfies the first condition of backward stability with respect to these solves, i.e., (9).
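The cancellation behind Theorem 3 can be checked numerically; the sketch below builds a residual $R_B$ orthogonal to the test basis (as (19) enforces) and a perturbation $F$ consistent with $F\widetilde{V} = R_B$, and verifies that $\widetilde{W}^T F \widetilde{V}$ vanishes. All matrices here are arbitrary illustrations, not outputs of an actual BIRKA run.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 12, 3

# Orthonormal bases standing in for those from an inexact BIRKA step.
Vt = np.linalg.qr(rng.standard_normal((n, r)))[0]  # tilde-V
Wt = np.linalg.qr(rng.standard_normal((n, r)))[0]  # tilde-W

# A residual R_B orthogonal to range(Wt), as the framework (19) enforces,
# and a perturbation F with F @ Vt = R_B (pinv(Vt) @ Vt = I for full-rank Vt).
R = rng.standard_normal((n, r))
R_B = R - Wt @ (Wt.T @ R)     # project out range(Wt), so Wt.T @ R_B = 0
F = R_B @ np.linalg.pinv(Vt)

# The key cancellation: Wt^T F Vt = Wt^T R_B = 0.
print(np.linalg.norm(Wt.T @ F @ Vt))  # zero up to roundoff
```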

### 3.2 Satisfying the Second Condition of Backward Stability

Next, we show that the second condition of backward stability, given in (10), is also satisfied. According to (10), the difference between the original full model and the perturbed full model should be of the order of the perturbation, i.e.,

$$\frac{\|\zeta - \tilde{\zeta}\|_{\mathcal{H}_2 \text{ or } \mathcal{H}_\infty}}{\|\zeta\|_{\mathcal{H}_2 \text{ or } \mathcal{H}_\infty}} = O(\|F\|).$$

We satisfy the above condition in the absolute sense, since $\|\zeta\|$ is independent of $F$. That is,

$$\|\zeta - \tilde{\zeta}\|_{\mathcal{H}_2}^2 = O(\|F\|^2).$$

Consider the error system $\zeta_{err}$, whose matrices are defined as follows Dirac3 (); Dirac5 ():

$$A_{err} = \begin{bmatrix} A & 0 \\ 0 & A + F \end{bmatrix}, \quad N_{k,err} = \begin{bmatrix} N_k & 0 \\ 0 & N_k \end{bmatrix}, \quad B_{err} = \begin{bmatrix} B \\ B \end{bmatrix}, \quad \text{and} \quad C_{err} = \begin{bmatrix} C & -C \end{bmatrix}.$$

The norm of this error system is

 ∥ζerr∥2H2 =vec(I2p)T([C−C]⊗[C−C]) × (−[A00