A HJB-POD approach for the control of nonlinear PDEs on a tree structure

A HJB-POD approach for the control of
nonlinear PDEs on a tree structure

Alessandro Alla Luca Saluzzi Department of mathematics, PUC-Rio, Rio de Janeiro, Brazil, alla@mat.puc-rio.br Department of mathematics, Gran Sasso Science Institute, L’Aquila, Italy, luca.saluzzi@gssi.it
Abstract

The Dynamic Programming approach allows to compute a feedback control for nonlinear problems, but suffers from the curse of dimensionality. The computation of the control relies on the resolution of a nonlinear PDE, the Hamilton-Jacobi-Bellman equation, with the same dimension of the original problem. Recently, a new numerical method to compute the value function on a tree structure has been introduced. The method allows to work without a structured grid and avoids any interpolation.

Here, we aim to test the algorithm for nonlinear two dimensional PDEs. We apply model order reduction to decrease the computational complexity since the tree structure algorithm requires to solve many PDEs. Furthermore, we prove an error estimate which guarantees the convergence of the proposed method. Finally, we show efficiency of the method through numerical tests.

keywords:
Optimal control, Hamilton-Jacobi-Bellman equation, Model order reduction, Proper Orthogonal Decomposition, Tree structure, Error Estimates
Msc:
[2010]49L20, 49L25, 49J20,78M34,65N99,62H25
journal: Applied Numerical Mathematics

1 Introduction

The dynamic programming (DP) approach, introduced by Bellman in the late ’50, allows to obtain a feedback control by means of the knowledge of the value function. Thus, we solve a nonlinear Partial Differential Equation (PDE) known as Hamilton-Jacobi-Bellman (HJB) equation that has the same dimension of the optimal control problem. It is well-known that this problem suffers from the curse of dimensionality: typically this equation has to be solved on a space grid and this is the major bottleneck for numerical methods in high-dimension. We refer to BCD97 () and FF13 () for a complete description of theoretical and numerical results, respectively.

The focus of this paper is to solve finite horizon optimal control problems for nonlinear PDEs. It is straightforward to understand the difficulty of the problem when dealing with a DP approach, since the discretization of PDEs leads to a very large system of ODEs, which makes the problem not feasible on a structured grid. In the literature several methods have been introduced to mitigate the curse of dimensionality. Although a complete description of numerical methods for HJB goes beyond the scopes of this work, we distinguish between numerical methods for the control of ODEs and PDEs via the HJB equation. In the former we mention, among others, domain decomposition methods and iterative schemes based on a semi-Lagrangian approach (see e.g. CCFP12 (); AFK15 () and the references therein). On the other hand, to compute feedback control of PDEs, it is very common the use of model order reduction techniques to reduce the complexity of the system and, therefore, the dimension of the corresponding HJB equation. In particular, we refer to the Proper Orthogonal Decomposition (POD, see Vol11 ()) which will constitute one of the building blocks for the current paper. The POD method allows to compute low-rank orthogonal projectors by means of Singular Value Decomposition (SVD) upon snapshots of the dynamical system at given time instances. Here, it comes a serious issue of this approach for optimal control problems since the control input is not known in advance and it is usually necessary to plug a forecast to compute the snapshots. However, on a structure grid, POD has been successfully coupled with the HJB approach for the control of PDEs. We refer to the pioneering work KVX04 () and to AFV17 () for error estimates of the method. We note that this approach is only a mitigation of the curse of dimensionality because it is not possible to work with a reduced space with dimension larger than and the aim of the POD method is to make the problem feasible even for very high dimensional equation such as PDEs. Other approaches to mitigate the curse of dimensionality are built upon the sparse grid method (see e.g. GK17 ()) or the spectral elements method (see KK18 ()). For the sake of completeness, we mention that the control of PDEs can be solved with other methods such as, among others, open loop techniques (see e.g HPUU09 ()) and model predictive control (see e.g GP11 ()).

Recently, in AFS18 () the authors proposed a new method based on a time discretization of the dynamics which allows to mimic the discrete dynamics in high-dimension via a tree structure, considering a discretized control space. The method deals with a finite horizon optimal control problem and, in the discretization, the tree structure replaces the space grid which allows to increase the dimension of the state space. However, the tree structure complexity increases exponentially due to the number of time steps and control inputs. To decrease the complexity and to accelerate, a pruning technique has been implemented to reduce the number of branches in the tree obtaining rather accurate results. Error estimates for the method can be found in the recent work SAF18 (). Therefore, it is clear that the method is expensive when we deal with PDEs since it requires to solve many equations for several control inputs. It is then natural to couple the TSA with POD in order to speed up the method. With the approach studied in the current paper we have four major advantages:

  1. we build the snapshots set upon all the trajectories that appear in the tree, avoiding the selection of a forecast for the control inputs which is always not trivial for model reduction,

  2. the application of POD also allows an efficient pruning since it reduces the dimension of the problem,

  3. we avoid to define the numerical domain for the projected problem, which is a difficult task since we lose the physical meaning of the reduced coordinates,

  4. we are not restricted to consider a reduced space dimension smaller than as in e.g. KVX04 (); AFV17 ().

Finally, we remark that to obtain a low-dimensional problem completely independent from the dimension of the original system, we use the Discrete Empirical Interpolation Method as in CS10 (). To validate our approach we also provide a-priori error estimate for the coupling between TSA and model order reduction.

The paper is organized as follows: we define the optimal control problem and the DP approach in Section 2. We recall the tree structure algorithm in Section 2.1 and the POD method in Section 3. In Section 4 we present, step by step, the coupling between POD and the TSA, and in Section 5 we provide an error estimate for the coupled method. Finally, numerical tests for two-dimensional nonlinear PDEs are shown in Section 6. We give our conclusions and perspectives in Section 7.

2 The optimal control problem

In this section we describe the optimal control problem and the essential features of the DP approach. Let us consider a large system of ordinary differential equations in the following form:

(1)

where is a given initial data, are given matrices and is a continuous function in both arguments and locally Lipschitz-type with respect to the second variable. We will denote by the solution, by the control and by

the set of admissible controls where is a compact set. We will assume that there exists a unique solution for (1) for each .

This wide class of problems arises in many applications, especially from the numerical approximation of PDEs. In such cases, the dimension of the problem is the number of spatial grid points used for the discretization and it can be very large.

To ease the notation we will denote the right hand side as follows:

(2)

To select the optimal trajectory, we consider the following cost functional

(3)

where is the running cost, is the final cost and is the discount factor. We will suppose that the functions and are Lipschitz continuous. The optimal control problem then reads:

(4)

The final goal is the computation of the control in feedback form in terms of the state equation where is the feedback map. To derive optimality conditions, we use the Dynamic Programming Principle (DPP). We first define the value function for an initial condition :

(5)

which satisfies the DPP, i.e. for every :

(6)

Due to (6), we can derive the HJB equation for every :

(7)

Once the value function has been computed, it is possible to obtain the optimal feedback control as:

(8)

2.1 Dynamic Programming on a Tree Structure

In this section we will recall the finite horizon control problem and its approximation by the tree structure algorithm (see AFS18 () for a complete description of the method and SAF18 () for theoretical results). The computation of analytical solutions of Equation (7) is a difficult task due to its nonlinearity and approximation techniques should take in consideration discontinuities in the gradient (see FF13 () and the references therein). Here, we discretize equation (7), only partitioning the time interval with step size , where is the total number of steps. Thus, for and every , we have

(9)

where and

The term is usually computed by interpolation on a grid, since is in general not a grid point. To avoid the use of interpolation, we build a non-structured grid with a tree structure.

We first discretize the control domain into discrete controls, . The tree will be denoted by where each level contains all the nodes of the tree at time . We proceed as follows: first we start from the initial state , which will form the first level . Then, we follow the discrete dynamics, given e.g. by an explicit Euler scheme, inserting the discrete control , obtaining

Therefore, we have . We can characterize the nodes by their th time level as follows

and the tree can be shortly defined as

where the nodes are obtained following the dynamics at time with the controls :

with , and , where is the ceiling function.

The cardinality of tree increases exponentially, i.e. , where is the number of controls and the number of time steps. To mitigate this problem, we consider the following pruning rule: given a threshold , we can cut off a new node , if it verifies the following condition with a certain

(10)

This condition is reasonable, since the numerical value function is Lipschitz continuous, therefore implies for and .

The pruning rule (10) helps to save a huge amount of memory. If we choose the tolerance properly, e.g. , we keep the same accuracy of the approach without pruning, as shown in SAF18 (). To increase the order of convergence, one could use a higher order method for the discretization of the ODE (1). More details can be found in AFS18b ().

The computation of the numerical value function will be done on the tree nodes

(11)

and it follows directly from the DPP. The tree will form the spatial grid and we can write a time discretization for (7) as follows:

(12)

Since the control set is discrete, the minimization is computed by comparison. A detailed comparison and discussion about the classical method and tree structure algorithm can be found in AFS18 (). The computation of the feedback on a tree structure takes advantage of the discrete control set and therefore during the computation of the value function, we can store the indices which provide the optimal trajectory. More details on the computation of the feedback control are given in Section 4.

3 Model order reduction and POD method

In this section first we recall the POD method for the state equation (1) and later how to apply it to reduce the dimension of the optimal control problem (4).

3.1 POD for the state equation

The solution of the system (1) may be very expensive and it is useful to deal with projection techniques to reduce the complexity of the problem. Although a complete description of model order reduction methods goes beyond the scopes of this work, here we recall the POD method. We refer the interested reader to Sir87 (); Vol11 () for more details on the topic and to BGW15 () for a review of different projection techniques.

Let us assume we have computed a numerical (or analytical if possible) solution of (1) on the time grid points , for some given control inputs. Then, we collect the snapshots into the matrix . The aim of the method is to determine a POD basis of rank to describe the set of data collected in time by solving the following minimization problem:

(13)

The associated norm is given by the euclidean inner product . The solution of (13) is given by the SVD of the snapshots matrix , where we consider the first columns of the orthogonal matrix . The selection of the rank of POD basis is based on the error computed in (13) which is related to the singular values neglected. We will choose such that , with

(14)

where are the singular values of .

However, the error strongly depends on the quality of the computed snapshots. This is clearly a limit when dealing with optimal control problems, since the control input is not known a-priori and it is necessary to have a reasonable forecast. In Section 4 we will explain how to select the control input to solve (4).

To ease the notation, in what follows, we will denote by the POD basis of rank . Let us assume that the POD basis have been computed and make use of the following assumption to obtain a reduced dynamical system:

(15)

where is a function from to . If we plug (15) into the full model (1) and exploit the orthogonality of the POD basis, the reduced model reads:

(16)

where and . We also note that and . Error estimates for the reduced system (16) can be found in KV02 ().

In what follows we are going to define the reduced dynamics as:

(17)

Discrete Empirical Interpolation Method

The solution of (16) is still computationally expensive, since the nonlinear term depends on the dimension of the original problem, i.e. the variable . To avoid the expensive, high-dimensional evaluation, the Empirical Interpolation Method (EIM, BMNP04 ()) and Discrete Empirical Interpolation Method (DEIM, CS10 ()) were introduced.

The computation of the POD basis functions for the nonlinear part is related to the set of the snapshots , where are already computed from (1). We denote by the POD basis functions of rank of the nonlinear part. The DEIM approximation of is given in the following form:

(18)

where and . Here, we assume that each component of the nonlinearity is independent from each other, then the matrix can be moved into the nonlinearity. The role of the matrix is to select interpolation points to evaluate the nonlinearity. The selection is made according to the LU decomposition algorithm with pivoting CS10 (), or following the QR decomposition with pivoting DG15 (). We finally note that all the quantities in (18) are independent of the full dimension Typically the dimension is much smaller than the full dimension. This allows the reduced order model to be completely independent of the full dimension as follows:

(19)

In what follows, we are going to define the reduced POD-DEIM dynamics as:

(20)

The DEIM error is given by:

(21)

as shown in CS10 (); DG15 ().

3.2 POD for the optimal control problem

The key ingredient to compute feedback control is the knowledge of the value function expressed in (9), which is a nonlinear PDE whose dimension is given by the dimension of (1). It is clear that its approximation is very expensive. Therefore, we are going to apply the POD method to reduce the dimension of the dynamics and then solve the corresponding (reduced) discrete DPP which is now feasible and defined below. Let us first define the reduced running cost and the reduced final cost as

Next, we introduce the reduced optimal control problem for (4). For a given control , we denote by the unique solution to (19) at time . Then, the reduced cost is given by

(22)

and, the POD approximation for (4) reads as follows:

(23)

Finally, we define the reduced value function as

(24)

and the reduced HJB equation:

(25)

Alternatively, one could further approximate the nonlinear term using DEIM and replace the dynamics (16) with (19) in (23), providing an impressive acceleration of the algorithm as shown in Section 6.

4 HJB-POD method on a tree structure

In this section we explain, step by step, how to use model reduction techniques on a tree structure in order to obtain an efficient approximation of the value function and to deal with complex problems such as PDEs. We also provide an error estimate for the presented method.

Computation of the snapshots

When applying POD for optimal control problems there is a major bottleneck: the choice of the control inputs to compute the snapshots. Thus, we store the tree for a chosen and discrete control set . This set turns out to be a very good candidate for the snapshots matrix since it delivers all the possible trajectories we want to consider. To summarize the snapshots set is .

In the numerical tests, we will use and controls to compute the snapshots that, as shown in Section 6, will be sufficient to catch the main features of the controlled problem.

Computation of the basis functions

The computation of the basis has been described in Section 3. In this context we have no restrictions on the choice of the number of basis , since we will solve the HJB equation on a tree structure. In former works , e.g. KVX04 (); AFV17 (), the authors were restricted to choose to have a feasible reduction of the HJB equation. Here, the dimension of the state variable is not a major issue. On the other hand, the pruning strategy will turn out to be crucial for the feasibility of the problem.

It is well-known that the error in (13) is given by the sum of the singular values neglected. We recall that we will chose such that with defined in (14).

Construction of the reduced tree

Having computed the POD basis, we build a new tree which might consider a different and/or a finer control space with respect to the snapshots set. We will denote the projected tree as with its generic th level given by:

where the reduction of the nonlinear term can be done via POD or POD-DEIM as in (18). The first level of the tree is clearly given by the projection of the initial condition, i.e. . Then, the procedure follows the full dimensional case, but with the projected dynamics. We will show how this approach speeds up the method keeping high accuracy. Even if we have reduced the dimension of the problem, the cardinality of the tree depends on the number of the discrete controls and the time step chosen as in the high-dimensional case. It is clear that each resolution of the PDE will be faster, but it is still necessary to apply a pruning criteria which reads:

(26)

The reduced dynamics has the property that its first components vary more than the remaining ones. This property allows us to check the closeness of two points just looking at the first components, reducing the pruning rule to a lower dimensional problem.

Approximation of the reduced value function

The numerical reduced value function will be computed on the tree nodes in space as

(27)

Then, the computation of the reduced value function follows directly from the DPP. Defined the grid for , we can write a time discretization for (7) as follows:

(28)

Computation of the feedback control

The computation of the feedback control strongly relies on the fact we deal with a discrete control set . Indeed, when we compute the reduced value function, we store the control indices corresponding to the in (28). The optimal trajectory is than obtained by following the path of the tree with the controls chosen such that

(29)

for , where the symbol stands for the connection of two nodes by the control .

Once the control has been computed, we plug it in into the high dimensional problem (1) and compute the optimal trajectory.

5 Error estimates for the HJB-POD method on a TSA

In this section we derive an error estimate for the HJB-POD approximation (28) on a tree structure. In what follows, we assume that the functions are bounded:

(30)

the functions and are Lipschitz-continuous with respect to the first variable

(31)

and the cost is also Lipschitz-continuous:

(32)

Furthermore, let us assume that the functions and are semiconcave

(33)

and assume that verifies the following inequality:

(34)

We also introduce the continuous-time extension of the DDP

(35)

and the POD version for the continuous-time extension (35) which reads:

(36)

Given the exact solution and its POD discrete approximation , we want to prove the following theorem which provides an error estimate for the proposed method.

Theorem 5.1.

Let us assume hold true, then there exists a constant such that

(37)

where is the identity map and is the orthogonal projection onto the subspace such that .

Proof.

We observe that, by triangular inequality, the approximation error can be decomposed in two parts:

(38)

An error estimate for the first term has been already obtained in SAF18 ():

(39)

Let us focus on the second term of the right hand side of (38). Without loss of generality, we consider . For , the estimate follows directly by the assumptions on . Considering and , we can write

(40)

where and are defined as

We define the trajectory path and its POD approximation respectively as

where

with and . Then, iterating (40) we obtain

(41)

We can see that

Then

By the discrete Grönwall’s lemma, we obtain the following estimate

(42)

Combining (41) and (42), we get

(43)

where

Analogously, it is possible to obtain the same estimate for and, defining , we get the desired result. ∎

Remark 5.1.

The error is related to the POD approximation and it can be bounded by the sum of the remaining singular values, obtaining the convergence

Remark 5.2.

The error estimate (37) can be easily extended to the DEIM approach. It is indeed enough to replace the POD dynamics with the POD-DEIM dynamics and use the error estimate (21) in (37).

6 Numerical Tests

In this section we apply our proposed algorithm to show the effectiveness of the method with two test cases. In the first we deal with a parabolic PDE with a polynomial nonlinear term, which is usually not a trivial task when applying open-loop control tools. The second test concerns the bilinear control of the viscous Burgers’ equation.

In order to obtain the PDEs in the form (1), we use a Finite Difference scheme and we integrate in time using an implicit Euler scheme coupled with the Newton’s method with tolerance equal to . We will denote by the discretized set of with equi-distributed controls.

The numerical simulations reported in this paper are performed on a MacBook Pro with 1CPU Intel Core i7, GHz and 16GB RAM. The codes are written in Matlab R2018b.

6.1 Test 1: Nonlinear reaction diffusion equation

In the first example we consider the following bidimensional PDE with polynomial nonlinearity and homogeneous Neumann boundary conditions

(44)

where the control is taken in the admissible set and . In (44) we consider: We discretize the space domain in points in each direction, obtaining a discrete domain with points. As shown in Figure 1, the solution of the uncontrolled equation (44) (i.e. ) converges asymptotically to the stable equilibrium .

Figure 1: Test 1: Uncontrolled solution for equation (44) for time (from left to right).

Our aim is to steer the solution to the unstable equilibrium . For this reason, we introduce the following cost functional

(45)

Case 1: Full TSA

We first consider the results using the TSA without model order reduction. In Figure 2 we report the optimal trajectory obtained using the full tree structure algorithm with controls.

Figure 2: Test 1: Controlled solution with TSA for equation (44) with full tree for time (from left to right) with .

As one can see, we steer the solution to the unstable equilibrium using as discrete control set. For the given tolerance , the cardinality of the pruned tree is , whereas without is .

In the left panel of Figure 3, we show the control policy obtained with , and discrete controls. In the right panel we show the behaviour of the cost functional, and it is easy to check that the optimal trajectories are very similar. An analysis of the CPU time is provided in Table 1 and discussed below.

Figure 3: Test 1: Control policy (left) and cost functional (right) for , and .

Case 2: TSA with POD

The computation of the full TSA is already expensive with only controls. For this reason, we replace the dynamics with its reduced order modeling. Then, we set the number of POD basis such that . Similarly, we consider DEIM basis for the nonlinear term. In what follows, whenever we will talk about POD, we will refer to POD-DEIM approach.

The snapshots matrix is computed with a full TSA using the discrete control space . It is possible to observe in Figure 4 that dealing with , and controls, we obtain the same results as for the high dimensional equation (compare with left panel of Figure 3). We remind that the optimal trajectory is obtained plugging the suboptimal control into the high dimensional model. The temporal step size chosen for these simulations is , as in the snapshots set. However, we are able to work using a different control space. In the online phase we perform the pruning criteria with . Finally, in the right panel of Figure 3 we show a zoom of the cost functional which considers the whole optimal trajectories and it is possible to see the improvement obtained using more controls.

Figure 4: Test 1: Optimal policy (left), cost functional (middle) and (right) for with .

The CPU time, expressed in seconds, is shown in Table 1. The online phase of the TSA-POD is always faster than the full TSA. We tried to compute the full TSA with controls and we stopped the comp