Model reduction of Fokker–Planck and quantum Liouville equations

Model reduction of controlled Fokker–Planck and Liouville–von Neumann equations

Abstract.

Model reduction methods for bilinear control systems are compared by means of practical examples of Liouville–von Neumann and Fokker–Planck type. Methods based on balancing generalized system Gramians and on minimizing an -type cost functional are considered. The focus is on the numerical implementation and a thorough comparison of the methods. Structure and stability preservation are investigated, and the competitiveness of the approaches is shown for practically relevant, large-scale examples.

Key words and phrases:
Bilinear systems, model order reduction, balanced truncation, averaging method, Hankel singular values, generalized Lyapunov equations, stochastic control.

1. Introduction

Due to the growing ability to accurately manipulate single molecules by spectroscopic techniques, numerical methods for the control of molecular systems have recently attracted a lot of attention [4, 30, 41, 61]. Key applications involve probing of mechanical properties of biomolecules by force microscopy and optical tweezers [23, 32], or the control of chemical reaction dynamics by temporally shaped femtosecond laser pulses in femtochemistry [53, 59]. A key feature of these small systems is that they are open systems, in that they are subject to noise and dissipation induced by the interaction with their environment, as a consequence of which the dynamics are inherently random and the description is on the level of probability distributions or measures rather than trajectories [47].

Depending on whether or not quantum effects play a role, the evolution of the corresponding probability distributions is governed by parabolic partial differential equations of either Liouville–von Neumann or Fokker–Planck type. The fact that the dynamics are controlled implies that the equations are bilinear as the control acts as an advection term that is coupled linearly to the probability distribution, but the main computational bottleneck clearly is that the equations, in spatially semi-discretized form, are high-dimensional which explains why model reduction is an issue; for example, in catalysis, optimal shaping of laser pulses requires the iterated integration of the dissipative Liouville–von Neumann (LvN) equation for reduced quantum mechanical density matrices, the spatial dimension of which grows quadratically with the number of quantum states involved [18]; cf. [36].

Many nonlinear control systems can be represented as bilinear systems by a suitable change of coordinates (as well as linear parametric systems), and it therefore does not come as a surprise that model reduction of bilinear control systems has recently been a field of intense research; see [7, 12] and the references therein. In recent years, various model reduction techniques that were only available for linear systems have been extended to the bilinear case, among which are Krylov subspace techniques [44, 6, 16, 39, 45], interpolation-based approaches [1, 8, 24, 25], balanced model reduction [2, 9, 49, 29], empirical POD [20, 21, 34], or -optimal model reduction [8, 25, 60]. The downside of many available methods is their lack of structure preservation, most importantly, regarding asymptotic stability. In our case, positivity is an issue too, as we are dealing with probability distributions.

In this paper we compare two different model reduction techniques that represent different philosophies of model order reduction, with the focus being on practical computations and numerical tests rather than a theoretical analysis. The first approach is based on the interpolation of the Volterra series representation of the system’s transfer function and gives a local -optimal approximation, because the interpolation is chosen so that the system satisfies the necessary -optimality conditions upon convergence of the algorithm; see [8] for details. The second approach is based on balancing the controllable and observable subspaces, and exploits the properties of the underlying dynamical system in that it uses the properties of the controllability and observability Gramians to identify suitable small parameters that are sent to 0 to yield a reduced-order system; for details, we refer to [29]. Both methods require the solution of large-scale matrix Sylvester or Lyapunov equations. While the computational effort of balanced model reduction is essentially determined by the solution of two generalized Lyapunov equations for controllability and observability Gramians, the effort of the -optimal interpolation method is mainly due to the solution of two generalized Sylvester equations in each step of the bilinear iterative rational Krylov algorithm (B-IRKA). We stress that both generalized Lyapunov or Sylvester equations can be solved iteratively at comparable numerical cost (for a given accuracy), but they all require the dynamics of the uncontrolled system to be asymptotically stable [55]. However, as both the dissipative LvN and Fokker-Planck operators have a simple eigenvalue zero, stability has to be enforced before solving Lyapunov or Sylvester equations, and in this paper we systematically compared stabilization techniques for both approaches.

The outline of the article is as follows: In Section 2 we briefly discuss the basic properties of bilinear systems and set the notation for the remainder of the article. Model reduction by -norm minimization and balancing are reviewed in Sections 3 and 4, along with some details regarding the numerical implementation for the specific applications considered in this paper in Section 5. Finally, in Section 6 we study model reduction of the Fokker-Planck equation comparing balancing and -norm minimization, and in Section 7 we carry out a similar study for the dissipative Liouville–von Neumann equation. We discuss our observations in Section 8. The article contains an appendix, Appendix A, that records some technical lemmas related to the asymptotic stability of bilinear systems.

2. Bilinear control systems

We start by setting the notation that will be used throughout this article. Let be governed by the time-inhomogeneous differential equation

(2.1)

with coefficients , and being a vector of bounded measurable controls . We assume that not all state variables are relevant or observable, so we augment (2.1) by a linear output equation

(2.2)

with , . The systems of equations (2.1)–(2.2) is called a bilinear control system with inputs and outputs .

As is well-known, see e.g. [48, 60], an explicit output representation for (2.2) can be obtained by means of successive approximations. The resulting so-called Volterra series is given as

(2.3)

Moreover, based on a multivariate Laplace transform of these integrands, the system can alternatively be analyzed in a generalized frequency domain by means of generalized transfer functions. Since this will not be essential for the results presented here, we refrain from a more detailed discussion and refer to, e.g., [48].

2.1. Reduced-order models

We seek coefficients , and with such that

(2.4)

has an input-output behavior that is similar to (2.1)–(2.2). In other words, we seek a reduced-order model with the property that for any admissible control input (to be defined below), the error in the output signal,

(2.5)

is small, relative to (in some norm) and uniformly on bounded time intervals.

As will be outlined below, both model reduction schemes considered in this paper are closely related to the solutions of the following adjoint pair of generalized Lyapunov equations:

(2.6)

and

(2.7)

where, in the first equation, we have introduced the shorthand . The Hermitian and positive semi-definite matrices are called the controllability and observability Gramians associated with (2.1)–(2.2)—assuming well-posedness of the Lyapunov equations and hence existence and uniqueness of and . The relevance of the Gramians for model reduction is related to the fact that the nullspace of the controllability Gramian contains only states that cannot be reached by any bounded measurable control and that the system will not produce any output signal, if the dynamics is initialized in the nullspace of the observability Gramian [33]; as a consequence one can eliminate states that belong to without affecting the input-output behavior of (2.1)–(2.2); cf. [2].

2.2. Standing assumptions

The following assumptions will be used throughout to guarantee existence and uniqueness of the solutions to the generalized Lyapunov equations (Assumption 1) and existence and uniqueness of the solution of the bilinear system (2.1) for all (Assumptions 2 and 3):

Assumption 1: There exists constants , such that

and

where is the matrix 2-norm that is induced by the Euclidean norm .

Assumption 2: The bilinear system (2.1)–(2.2) is bounded-input-bounded-output (BIBO) stable, i.e., there exists , such that for any input with

the output is uniformly bounded.

Assumption 3: The admissible controls are continuous, bounded and square integrable, i.e., with

Specifically, we require that the admissible controls are uniformly bounded by

with as in Assumption 1, which by BIBO stability (Assumptions 2) implies that the output is bounded for all (cf. [52]).

3. optimal model reduction of bilinear systems

In this section, we recall some existing results on -optimal model order reduction for bilinear systems. For a more detailed presentation, see [60, 8, 24].

For a better understanding of the subsequent concepts, let us briefly focus on the linear case, i.e., in (2.1). Here, the Volterra series representation (2.3) simplifies to If the input signal is a Dirac mass at we obtain the impulse response The -norm for linear systems now is simply defined as the -norm of the impulse response, i.e.,

Based on the latter definition and the Volterra series, in [60], the -norm has been generalized for bilinear systems as follows.

Definition 3.1.

Let denote a bilinear system as in (2.1). We then define its -norm by

with

Obviously, for a bilinear system having a finite -norm, it is required that the system is stable in the linear sense, i.e., has only eigenvalues in Moreover, the matrices have to be sufficiently bounded. From [60], let us recall that Assumption 1 ensures that the bilinear system under consideration has a finite -norm, which, moreover, can be computed by means of the solution and of the generalized Lyapunov equations (2.6) and (2.7), respectively. In particular, we have that

Given a fixed system dimension the goal of -optimal model order reduction now is to construct a reduced-order bilinear system such that

Unfortunately, already in the linear case this is a highly nonconvex minimization problem such that finding a global minimizer is out of reach. Instead, we aim at constructing such that first-order necessary conditions for -optimality are fulfilled. In [60], the optimality conditions from [58] are extended to the bilinear case. More precisely, it is shown that an -optimal reduced-order model is defined by a Petrov-Galerkin projection of the original model. Given a reduced-order system let us consider the associated error system

(3.1)

as well as the generalized Lyapunov equations associated with it

(3.2)

Assuming the partitioning

(3.3)

the first-order necessary optimality conditions now are

(3.4)

In [60] the authors have proposed a gradient flow technique to construct a reduced-order model satisfying (3.4). Since here we are interested in computations for large-scale systems for which this technique is not feasible, we instead use the iterative method from [8]. The main idea is inspired by the iterative rational Krylov algorithm from [28] and relies on solving generalized Sylvester equations of the form

Based on a given reduced-order model , the subspaces spanned by columns of the solutions are used to generate an updated reduced-order model. More precisely, given unitary matrices such that and we set

This type of fixed-point iteration is repeated until the reduced-order model is numerically converged up to a prescribed tolerance. For more details on the iteration, we also refer to [8].

4. Balanced model reduction for bilinear systems

We shall briefly explain model reduction based on balancing controllability and observability. To this end we assume that the generalized Gramian matrices are both Hermitian positive definite which is guaranteed by the assumption that the bilinear system (2.1)–(2.2) is completely controllable and observable:

Assumption 4: The matrix pair is controllable, i.e.,

Assumption 5: The matrix pair is observable, i.e.,

4.1. Singularly perturbed bilinear systems

We consider a balancing transformation under which the Gramians transform according to [42]

(4.1)

where the diagonal matrix with contains the real-valued Hankel singular values (HSV) of the system. Under the linear map , the coefficients of (2.1)–(2.2) transform according to

(4.2)

As the Hankel singular values are the square roots of the eigenvalues of the product , they are independent of the choice of coordinates. It can be shown (e.g. [5]) that a balancing transformation that makes the two Gramians and equal and diagonal is given by the matrix with inverse where the matrices are defined by the Cholesky decompositions and of the two Gramians solving (2.6) and (2.7), and their singular value decomposition .

Now suppose that with and corresponding to the splitting of the system states into relevant and irrelevant states. Further assume that in the sense that the smallest entry of is much larger than the largest entry of . The rationale of balanced model reduction is based on a continuity argument: if the space of the uncontrollable and unobservable states is spanned by the singular vectors corresponding to , then, by continuity of the solution of (2.1)–(2.2) on the system’s coefficients, small singular values should indicate hardly controllable and observable states that do not contribute much to the input-output behavior of the system.

Using the notation with and partitioning the balanced coefficients according to the splitting into large and small HSV, then yields the following singularly perturbed system of equations (see [29, 31]):

(4.3)

Here , with , denotes the balanced state vector where the splitting into , is in accordance with the splitting of the HSV into and . The splitting of the balanced coefficients

(4.4)

into , etc. can be understood accordingly.

4.2. An averaging principle for bilinear systems

In order to derive reduced-order models of (2.1)–(2.2), we consider the limit in (4.3). This amounts to the limit of vanishing small HSV in the original bilinear system.

We suppose that Assumptions 1–5 hold for all . As we will show in Appendix A, the results in [11] can be modified to show that the matrices and are Hurwitz, and that their eigenvalues are bounded away from the imaginary axis. In this case, BIBO stability of the system together with the assumptions on the admissible controls imply that pointwise for all as . However, the rate at which tends to zero and hence the limiting bilinear systems clearly depends on the controls , especially when depends on . We give only a formal justification of the different candidate equations that can be obtained in the limit of vanishing small HSV and refer to [29] for further details.

4.3. Balanced truncation

If , we expect that the first two equations in (4.3) decouple as , which implies that the limiting bilinear system will be of the form

(4.5)

The assumption that goes to zero faster than is the basis of the traditional balanced truncation approach in which the weakly controllable and observable degrees of freedom are eliminated by projecting the equations to the linear subspace

The validity of the approximation for all requires that ; cf. Remark 4.2 below.

4.4. Singular perturbation

If the , equations do not decouple as , and the limiting equation turns out to be different from (4.5). To reveal it, it is convenient to introduce scaled variables by by which (4.3) becomes

(4.6)

Equation (4.6) is an instance of a slow-fast system with being the slow variable and being fast, and for non-pathological controls , the averaging principle applies [27]. The idea of the averaging principle is to average the fast variables in the equation for against their invariant measure, because whenever is sufficiently small, the fast variables relax to their invariant measure while the slow variables are effectively frozen, and therefore the slow dynamics move under the average influence of the fast variables. This clearly requires that the convergence of the fast dynamics is sufficiently fast and independent of the initial conditions. The auxiliary fast subsystem for frozen slow variable reads

(4.7)

with

(4.8)

It is obtained from (4.6) by rescaling the equations according to and , and sending . Since the admissible controls decay on time scales that are of order one in (i.e.  in ), it follows that

In other words, for fixed the fast dynamics converge to the Dirac mass at . This can be rephrased by saying that for all admissible controls and in the limit the dynamics (4.6) collapse to the invariant subspace

Averaging the fast variables in (4.6) against their invariant measure , then yields the averaged equation for the slow variables:

(4.9)

with the coefficients

(4.10)

The situation here is special, in that the controls decay sufficiently fast so that the invariant measure of the fast variables is independent of . For other choices of admissible controls, however, the invariant measure may depend on , which then gives rise to averaged equations with measure-valued right hand side [26, 27, 54]. The following approximation result has been proved in [29]; cf. [56].

Theorem 4.1.

Let in (4.6) be admissible, satisfying for some . Further let be the observed solution of (4.6) with consistent initial conditions , and let denote the output of the averaged equation (4.9) on the bounded time interval , starting from the same . Then there exists a constant , such that

We should stress that it is possible to relax the condition on the initial conditions that guarantees that . In this case there will be a transient initial layer of thickness , in which there is a rapid adjustment of the initial conditions to the invariant subspace and during which the averaged dynamics deviates from the original dynamics, with an error. A uniform approximation on can then be obtained by a so called matched asymptotic expansion that matches an initial layer approximation with the averaged dynamics [43].

Remark 4.2.

For single-input systems (), a sufficient condition for BIBO stability of (2.1) is that is Hurwitz, in which case there exists a , such that is Hurwitz for all . The stability of is inherited by the Schur complement , in (4.9) and consequently inherits stability, with a possibly smaller stability region. (See Appendix A for details.) Hence reduced single-input systems are again BIBO stable.

5. Numerical details

Before testing and balanced model reduction for examples from stochastic control and quantum dynamics, see Secs. 6 and 7, respectively, we will first focus on the numerical issues related to the scaling of the controls and the preprocessing of the unstable matrix.

5.1. Structured bilinear systems

The subsequent numerical examples share several special properties that result from a physical interpretation and that require a careful numerical treatment. In this section, we provide some insight in how the model reduction methods are applied to the particularly structured bilinear systems. In fact, in the FPE as well as in the LvNE context, the initial setup leads to a purely bilinear system of the form

(5.1)

In either case, the system exhibits a nontrivial stationary solution corresponding to a simple eigenvalue of the system matrix i.e., For the applications we are interested in the deviation of the state from the stationary solution. Let us therefore introduce the reference state that is governed by the bilinear system

(5.2)

where the term can be interpreted as a constant nonzero feedthrough of the system. For the reduced-order model, we thus may simply set such that we can simply focus on the output operator In accordance with standard model reduction concepts that assume a homogeneous initial condition, here we assume that the initial state of the original system is the equilibrium, i.e., While the system now has been transformed from a purely bilinear into a standard bilinear system, we still have to deal with the problem of a system matrix that is not asymptotically stable. In what follows, we present two different techniques that bypass this problem.

5.2. Sparsity preserving projection

In our examples, the system matrices are mass and positivity preserving. Numerically this is reflected in the fact that the system matrix as well as the bilinear coupling matrices have zero row sum. In other words, the vector satisfies The intuitive idea now is splitting the state into the direct sum of the asymptotically stable subspace and the eigenspace associated with the eigenvalue Since a straightforward implementation in general will destroy the sparsity pattern of the matrices, we suggest to use a particular decomposition that has been introduced in a similar setup in [17]. Define the matrix

where denotes the -th unit vector in An easy calculation now shows that the inverse is given as

where the vector consists of the first components of Assume that the matrices and are partitioned as follows

with and Finally, a state space transformation yields the equivalent bilinear system

(5.3)

Making use of the relations we conclude that the last row of and is zero. This implies that the last component of is constant, and, due to vanishes for all times As a consequence, we can focus on the first components of which, after some calculations, can be shown to satisfy

Typically, the matrices and result from finite difference or finite element discretization, respectively, and thus are sparse. The previous projection in fact only slightly increases the number of nonzero entries. Moreover, the matrices are given as the sum of the original data and a low rank update which can be exploited in a numerical implementation as well.

5.3. Discounting the system state

An ad-hoc alternative to the decomposition of the state space into stable and unstable directions is the “shifting” of the -matrix by a translation for some . If has a simple eigenvalue zero, as in our case, there exists an , such that the matrix is Hurwitz. For linear systems the shifting can be interpreted as a discounting of the controllability and observability functionals that renders the associated Gramians finite [15].

As the controllability and observability Gramians in the bilinear case are lacking a similar interpretation, the shifting has no clear functional analogue (cf. [10]). It is still possible to stabilize the system by a joint state-observable transformation

under which the system (2.1)–(2.2) transforms according to

(5.4)

Even though (5.4) and (2.1)–(2.2) are equivalent as state space systems, the shifting clearly affects the Hankel singular value spectrum and, as a consequence, the reduced system. (As a matter of fact, the Hankel singular values do not even exist in case of the untransformed system.) Hence the parameter should be regarded as a regularization parameters that must chosen as small as possible.

Later on we compare stabilization of the matrix by state space decomposition and shifting in terms of the achievable state space reduction (i.e., decay of Hankel singular values) and fidelity of the reduced models.

5.4. Scaling the control fields

Assumption 1 in Sec. 2.2 deals with the existence and uniqueness of controllability and observability Gramians which are obtained as solutions to the generalized Lyapunov equations. The criterion given there involves an upper bound for the matrix 2-norm of the control matrices . In the examples of model order reduction shown below, this can be achieved by a

suitable scaling with real which leaves the equations of motion invariant but, clearly, not the Gramians. Hence, by increasing , we drive the system to its linear counterpart. For the limit , the system matrices and vanish and we obtain a linear system. For this reason, should not be chosen too large.

5.5. Calculation of the error

To quantify the error introduced by dimension reduction, we use the -norm introduced in Sec. 3. We emphasize that the effort required for computing the -error is negligible when compared to solving the generalized Lyapunov equations arising for balanced truncation and singular perturbation, respectively, which is seen as follows. Given a reduced-order system the associated -error is given as

(5.5)

where solves (3.2). Using the particular structure of the error system, this is obviously the same as

However, the term now can be precomputed since is required for the balancing-based methods anyway. What remains is the computation of the solutions and of the following the generalized Sylvester and Lyapunov equations, respectively

Based on the results from [22], we can compute and as the limits of solutions to standard Sylvester and Lyapunov equations

5.6. Software

All of the numerical tests of the dynamical systems presented in the following have been carried out using the WavePacket software project which encompasses all numerical methods for model order reduction as discussed above. Being hosted at the open–source platform Sourceforge.net, this program package is publicly available, along with many instructions and demonstration examples, see http://sf.net/projects/wavepacket and Refs. [51, 50]. In addition to a mature MATLAB® version, there is also a C++ version currently under development.

6. Fokker–Planck equation

We start off with an example from stochastic control in classical mechanics: a semi-discretized Fokker–Planck equation (FPE) with external forcing. To this end, we consider the stochastic differential equation

(6.1)

that governs the motion of a classical particle with position at time . The motion is influenced by the gradient of a smooth potential , a deterministic control force and a random forcing coming from the increments of the Brownian motion in . For simplicity we assume that the potential is , with

Note that is a random variable for every , and an equivalent characterization of the diffusion process is in terms of its probability distribution

where is any measurable (Borel) subset of , and is the associated probability density whose time evolution is governed by the Fokker–Planck equation

(6.2)