Polynomial approximation of highdimensional HamiltonJacobiBellman equations and applications to feedback control of semilinear parabolic PDEs
Abstract.
A procedure for the numerical approximation of highdimensional HamiltonJacobiBellman (HJB) equations associated to optimal feedback control problems for semilinear parabolic equations is proposed. Its main ingredients are a pseudospectral collocation approximation of the PDE dynamics, and an iterative method for the nonlinear HJB equation associated to the feedback synthesis. The latter is known as the Successive Galerkin Approximation. It can also be interpreted as Newton iteration for the HJB equation. At every step, the associated linear Generalized HJB equation is approximated via a separable polynomial approximation ansatz. Stabilizing feedback controls are obtained from solutions to the HJB equations for systems of dimension up to fourteen.
1. Introduction
Optimal feedback controls for evolutionary control systems are of significant practical importance. Differently from openloop optimal controls, they do not rely on knowledge of the initial condition and they can achieve design objectives, as for instance stabilisation, also in the presence of perturbations. Furthermore, the online synthesis of feedback control can be implemented in a realtime setting. It is wellknown that their construction relies on special HamiltonJacobiBellman (HJB) equations, see for instance [4, 17]. The solution of the HJB equation is the value function associated to the optimal control problem, and its gradient is used to construct the optimal feedback control. In the very special, but important case of a linear control system with quadratic cost without constraints on the control or the state variables, the HJB equation reduces to a Riccati equation which has received a tremendous amount of attention, both for the cases when the control system is related to ordinary or to partial differential equations. Otherwise one has to deal with the HJB equation which is a partial differential equation whose spatial dimension is that of the control system. Thus optimal feedback control for partial differential equations leads to HJB equations in infinite dimensions [16]. After semidiscretization in space of the controlled partial differential equation (PDE), the HJB equation is posed in a space of dimension corresponding to the spatial discretization of the PDE [18]. For standard finite element or finite difference discretizations this leads to highdimensional HJB equations. This is one of the instances which is referred to as the curse of dimensionality [9].
Many attempts to tackle the difficulties posed for numerically solving the HJB equations arising in optimal control have been made in the past or are currently being investigated. We refer, for instance, to [17], which mainly focuses on semiLagrangian schemes, and further references given there. A related approach to numerical optimal feedback control of PDEs is to semidiscretize the dynamics and to add a model order reduction step, either with Balanced Truncation or Proper Orthogonal Decomposition, in order to reduce the dimension of the dynamics to a number that is tractable for gridbased, semiLagrangian schemes. This approach has been successfully explored, for instance, in [1, 26, 29] and references therein. It strongly relies on a trustworthy representation of the dynamics via lowdimensional manifolds. Such a lowdimensional representation may deteriorate when nonlinear and/or advection effects are relevant. Thus, it is important to strive for techniques, or combinations of techniques, which allow to solve higher dimensional problems.
Another direction of research evolves around generalizing the Riccatibased approach to allow for nonlinearities in the state equation. One such technique is termed statedependent Riccati equation [15]. Here the coefficients in the ’ordinary’ Riccati equation are functions of the state rather than constants as in the case of linear state equations. Another approach realizes the fact that the Riccati equation can be interpreted as the equation satisfied by the first term arising in the power series expansion of the value function, and attempts to improve by realizing also higher order terms in the expansion. These methods are succinctly explained in [7].
Yet another technique which has received a considerable amount of attention is termed Successive Galerkin Approximation. Roughly speaking, the nonlinear HJB equation associated to the continuoustime optimal control problem is solved by means of a Newton method. At each iteration, the control law is fixed. This leads to a Generalized HamiltonJacobi equation (GHJB) which is linear. The iteration is closed by an update of the control law based on the gradient of the value function. This method was intensively investigated in [5, 6], see also [7], and the references given in these citations. It is worth to mention that the discretetime counterpart of this method corresponds to the wellknown policy iteration or Howards’ algorithm [25, 11, 2].
The numerical examples in [5, 6, 7] do not go beyond dimension five, and most, if not all, of the published numerical results for nonlinear HJB equations do not exceed dimension eight [10, 20, 22]. An alternative sparse grid approach for highdimensional approximation of HJB equations based on openloop optimal control has been presented in [27], with tests up to dimension six. Numerical methods relying on tensor calculus have been shown to perform well in highdimensional settings where the associated HJB equation is a linear PDE [34].
In the present paper, to solve optimal control problems for certain classes of semilinear parabolic equations we shall proceed as follows. To accommodate the curse of dimensionality, the discretization of the PDE is based on a pseudospectral collocation method, allowing a higher degree of accuracy with relatively few collocation points. To solve the resulting HJB we utilize a Newton method based on the GHJB equation as described above. Next, the discretization of the GHJB equation is addressed through a Galerkin approximation with polynomial, globally supported, ansatz functions. While this mitigates the curse of dimensionality in terms of removing the mesh structure, it leads to highdimensional integrals. We therefore resort to separable representations for the system dynamics and for the basis set of the polynomial approximation. The separability assumption reduces the computation of the Galerkin residual equation to products of onedimensional integrals. The combination of these procedures allowed us to solve HJB equations related to nonlinear control systems up to dimension fourteen by means of basic parallelization tools. The successful use of the Newton procedure requires to provide a feasibly initialization, i.e. a suboptimal, stabilizing control. Since we do not consider constraints, this is not restrictive for finite horizon problem, but can be challenging for infinite horizon problems, and specifically for the stabilization problems which are considered in the present paper. In this respect we developed a continuation procedure based on the use of a discount factor. Specifically, we consider a nested iterative procedure: within the outer loop the value of a positive discount factor is driven to zero, within the inner loop the HJB equation is solved approximately for a fixed discount factor. With this approach, which, is summarized in Algorithms 1 and 2 below, we managed to solve optimal feedback stabilization problems for semilinear parabolic equations with different stability behavior of the desired steady state.
Let us give a brief outline of the paper. Section 2 sets the stage and provides the discussion of a special case to facilitate the understanding of the following material. In Section 3 the solution process of the HJB equation is detailed. In Section 4 we provide the formulas which are needed to numerically realize the discretized HJB equation after a separable basis has been chosen. Numerical experiments are documented in Section 5. There we can also find comparisons to suboptimal feedback strategies based on Riccati and asymptotic expansion techniques.
2. Infinite horizon optimal feedback control
We consider the following undiscounted infinite horizon optimal control problem:
subject to the nonlinear dynamical constraint
where we denote the state , the control , with , the state running cost , and the control penalization . Furthermore, we assume the running cost and the system dynamics and to be . Throughout it is assumed that and . Our focus is therefore asymptotic stabilization to the origin.
It is wellknown that the optimal value function
characterizing the solution of this infinite horizon control problem is the unique viscosity solution of the HamiltonJacobiBellman equation
(1) 
with . Here we follow the convention of dropping the subscript of . We study this equation in the unconstrained case, i.e., , where the explicit minimizer of (1) is given by
(2) 
note that by inserting this expression for the optimal control in (1), we obtain the equivalent HJB equation
(3) 
which under further assumptions can be simplified to the Riccati equation associated to linearquadratic infinite horizon optimal feedback control.
The methodology we present in this work is applicable to systems fitting the aforedescribed setting, although for the sake of simplicity we restrict the presentation by the following choices:

the control is a scalar variable, i.e. .

the running cost is quadratic, i.e. , with positivedefinite,

the control term is a constant vector in .
At this point, our setting differs from the linearquadratic case as it allows nonlinear dynamics, and nonquadratic state costs. For the numerical scheme that we develop, the following assumption is crucial:
Assumption 1.
The free dynamics are separable in every coordinate
where is a tensorvalued function. In the case , then we shall also assume a similar separable structure for .
2.1. Towards optimal feedback control of semilinear parabolic equations
In the following, we illustrate how the presented framework sets the grounds for a computational approach for approximate optimal feedback controllers for nonlinear PDEs. We consider the following optimal stabilization problem:
(4) 
subject to the semilinear parabolic equation
(5)  
In this case, the scalar control acts through the indicator function , with . At the abstract level, this corresponds to an infinitedimensional optimal control problem. A first step towards the application of the proposed framework is the space discretization of the system dynamics, leading to finitedimensional state space representation. The use of the pseudospectral collocation methods for parabolic equations has been studied in [31, 33], and leads to a state space representation of the form
where the discrete state corresponds to the approximation of at collocation points , and is the coordinatewise power. The matrices and are finitedimensional approximations of the Laplacian and control operators, respectively. Such a discretization of the dynamics directly fulfills the separability required in Assumption 1, as the ith equation of the dynamics reads
with a separability degree . It is very important to note that semidiscretization in space of a wide class of timedependent PDEs will lead to finitedimensional state space representations of this type, thus the applicability of the presented framework is only limited by the dimensionality of the associated HJB equation. This motivates the choice of a pseudospectral collocation method for the discretization, as it is possible to obtain a meaningful representation of the dynamics with considerably fewer degrees of freedom than classical loworder schemes. However, if pseudospectral collocation is not a suitable discretization method for the dynamics, model reduction procedures such as balanced truncation, proper orthogonal decomposition, or reduced basis techniques shall also lead to separable statespace representations. Once the finitedimensional state state space representation is obtained, we proceed to approximate the solution of the associated HJB equation (1), leading to the optimal feedback controller (2).
We now present a preview of the numerical results of the proposed approach. Further details of the numerical scheme will be developed in the forthcoming sections. The system dynamics in (5), are approximated in 12 collocation points (14 with b.c.s’), and therefore our approximation scheme seeks for a solution of a 12dimensional HJB equation, which allows the computation of online optimal feedback controllers. We compare our HJBbased controller (HJB) to the linearquadratic controller (LQR) obtained by linearization of the system dynamics, and to an approximation method for the HJB equation based on power series expansion (PSE) [21, 35]. In Figure 1 we observe the basic features of the dynamics and the control schemes. The uncontrolled system dynamics (diffusion+dissipative source term) are stable, but stabilization is extremely slow. The control algorithms considerably reduce the transient phase. However, the control signals are different, and the HJBbased controller generates a feedback control with reduced overall cost (4). Observe that at the beginning of the time horizon even the signs of the LQR, PSE, and HJBbased controls differ.
3. Approximate iterative solution of HJB equations
In this section, we construct a numerical scheme for the approximation of the HJB equation
(6) 
where . We recall two additional features in this equation which render the application of classical approximation techniques difficult: the absence of a variational formulation, and the minimization with respect to the control variable , which makes the HJB equation fully nonlinear. The simplest numerical approach to these problems is the use of monotone, gridbased discretizations (finite differences, semiLagrangian), in conjunction with a fixed point iteration for the value function which typically depends on the use of a discount factor. The socalled “value iteration” procedure was first presented by Bellman in [8], and although it has become a standard solution method for lowdimensional HJB equations, it suffers from three major drawbacks. First, the gridbased character of the scheme makes it inapplicable for highdimensional dynamics, as the total number of degrees of freedom scales exponentially with respect to the dimension of the dynamical system. This corresponds to the most classical statement of the socalled curse of dimensionality. Second, the contractive mapping includes a minimization procedure which needs to be solved for every grid point at every iteration. Third, the Lipschitz constant of the contractive mapping goes to 1 when the discretization parameter goes to 0, becoming extremely slow for finemesh solutions. In order to circumvent these limitations, we develop a numerical scheme combining an iteration on the control variable rather than the value function, together with a polynomial expansion for the value function to mitigate the computational burden associated to meshbased schemes.
3.1. Successive approximation of HJB equations
In the following, we revisit the method presented in [5, 6], which is referred as Successive Approximation Algorithm. We begin by defining the set of admissible controls.
Definition 1 (Admissible control).
We say that a feedback mapping is admissible on , denoted as , if , , and for all .
Starting from an admissible initial guess , the Successive Approximation Algorithm (Algorithm 1 below) generates the pair which solves equation (6).
(7) 
Algorithm 1 corresponds to a Newton method for solving equation (6), and in the linearquadratic setting it is equivalent to the NewtonKleinmann iteration for solving the Riccati equation. It can be also directly identified with the policy iteration algorithm for HJB equations (see [2] and references therein), although in this context the usual setting includes a discount factor which relaxes the admissibility assumption, as well as discretetime dynamics. Consequently, it is applied to a Bellman equation with no continuous gradient. In both cases, the core ingredient of the algorithm is to generate a decreasing sequence of values by solving an associated sequence of linear problems. In our case this translates into solving, for a given at each iteration, the Generalized HamiltonJacobiBellman (GHJB) equation
(8)  
The following result from [5] summarizes relevant properties of the GHJB equation.
Proposition 1.
If is a compact subset of , is Lipschitz continuous on and , is strictly increasing in , , and , then:

There exists a unique satisfying (8).

is a Lyapunov function of the controlled system.

, for all .

The update satisfies .

If satisfies , then for all .
3.2. A continuation procedure
A critical aspect of the Successive Approximation Algorithm 1 is its initialization, which requires the existence of an admissible control which in view of (4) means that it asymptotically stabilizes all the initial conditions in . For asymptotically stable dynamics, this is trivially satisfied by . For more general cases, the computation of stabilizing feedback controllers is a challenging task. A partial answer is to consider the stabilizing feedback associated to the linearized system dynamics. However, this feedback is only locally stabilizing, and therefore the identification of a suitable domain where this control law is admissible becomes relevant. For low dimensional dynamics, this has been studied in the context of Zubov’s method in [14]. An alternative solution that we propose is to consider a discounted infinite horizon control problem
where the inclusion of the discount factor relaxes the admissibility condition. Recently, in [19, 32], the link between discounted optimal control and asymptotic stabilization has been discussed, and under certain conditions, the discounted control problem can generate optimal controls that are also admissible for the undiscounted problem. We recall that the associated HJB equation for the infinite horizon optimal control problem is given by
(9) 
and the associated GHJB reads
(10)  
We consequently modify the Successive Approximation Algorithm in order to embed it within a pathfollowing iteration with respect to the discount factor:
3.3. Spectral element approximation of the GHJB equation
So far we have discussed the iterative aspects of a computational method for solving HJB equations. We now address the numerical approximation of the GHJB equation.
(12) 
For this purpose, we consider an expansion of the form
where , with belonging to a complete set of basis functions in , and . In particular, we shall often generate from a multidimensional monomial basis as illustrated in Figure 2, which directly satisfies the boundary condition . The coefficients are obtained by imposing the Galerkin residual equation
(13) 
Remark 1.
The convergence of has been studied thoroughly in [5]. It follows a power series argument, and requires conditions for uniform convergence of pointwise convergent series, in order to guarantee that for sufficiently large. In our particular case, we further assume that the dynamics are polynomial (as illustrated in Section 2.1). Therefore, under the assumptions of Theorem 26 in [5], by choosing a multidimensional monomial basis (of degree ) and an admissible control , it can be established that, , such that for , , and .
We now focus on the different terms involved in the approximation of the GHJB equation. Since this equation is meant to be solved within the iterative loop described in the previous section, we assume that can be expressed in the form
(14) 
where corresponds to the value function of the previous iteration, approximated with the expansion
Below we shall write for . We proceed by expanding case by case the different terms of the Galerkin residual equation
(15) 

: it is directly verifiable that

: by inserting the expansion we obtain
and therefore

: we further assume that

: note that
leading to
is given by
After discretization, the GHJB (13) reduces to a parameterdependent linear system for
where is given by the expansion of ( terms 4) and 5) in the list above).
4. Computation of integrals via separable expansions
Under Assumption 1 concerning the separability of the free dynamics , and with the construction of a separable set of basis functions by taking the tensor product of onedimensional basis functions as shown in Figure 2, the calculation of the dimensional inner products of the Galerkin residual equation of the previous section is reduced to the product of onedimensional integrals. In the following, we provide further details of this procedure.
4.1. Generation of a multidimensional basis
The multidimensional basis functions
for the expansion of are generated as follows. We start by choosing a polynomial degree , and a onedimensional polynomial basis . For the sake of simplicity, we consider the monomial basis , but the same ideas apply for other basis, such as orthogonal polynomials. The multidimensional basis is generated as a subset of the dimensional tensor product of onedimensional basis, such that
i.e., we construct a full multidimensional tensorial basis and then we remove elements according to the approximation degree . The elimination step is fundamental and is twofold. If no elimination is performed, the cardinality of would be , and again one would face the curse of dimensionality that also affects gridbased schemes. By reducing the set to multdimensional monomials of degree at most , the cardinality of the set is given by
(16) 
which replaces the exponential dependence on by a combinatorial one. This formula is evaluated in Table 1 for different values of interest for and . By considering globally defined polynomial basis functions, the dependence on the dimension is replaced by the combinatorial expression (16). The dimensional reduction of the basis is particularly significant for low order polynomial approximation (up to degree 6). A second justification for the way in which we generate the basis set has a controltheoretical inspiration. A wellknown result in optimal feedback control is that if the dynamics are linear, and the running cost is quadratic, the value function associated to the infinite horizon control problem (in the unconstrained case and other technical assumptions) is a quadratic form, i.e. is of the form , which fits precisely the elements generated for with a monomial basis when and linear elements are eliminated. Therefore, our basis can be interpreted as a controlled increment, accounting for the nonlinear dynamics, of the basis required to recover the solution of the control problem associated to the linearized dynamics around the equilibrium point.
Full monomial basis  Evendegree monomials  
\  2  4  6  8  2  4  6  8 
6  27  209  923  3002  21  147  609  1896 
8  44  494  3002  12869  36  366  2082  8517 
10  65  1000  8007  43757  55  770  5775  30085 
12  90  1819  18563  125969  78  1443  13819  89401 
14  119  3059  38759  319769  105  2485  29617  233107 
Remark 2.
Theorem 7.1 in [6] states parity conditions to reduce the polynomial basis . Under the assumptions , and , if

is a symmetric rectangle around the origin, i.e.,

the free dynamics are oddsymmetric on , i.e. , for all
then is an evensymmetric function, i.e., , and therefore odddegree monomials are excluded from the basis. A direct corollary is that in the linear quadratic case, where the linear dynamics are trivially oddsymmetric, is a quadratic form.
Finally, for the calculation presented in the following, it is important to note that due to the construction procedure, the basis elements directly admit a separable representation
(17) 
where each component .
4.2. Highdimensional integration
We begin by recalling that
(18) 
where is a tensorvalued function, and that .
As in the previous section, we proceed term by term, to obtain the summands in (15). The integration is carried over the hyperrectangle .

: this term is directly assembled from the calculation of

: This term involves the calculation of
which is expanded by using the separable structure of the free dynamics
where

: In this case, we need to work on the expression
which is obtained directly from the computations for in 5) below.

:
where with a similar argument as in the previous term we expand

: This term requires the computation of the inner product
with given by
By using the separable representation of the basis functions
we expand the inner product
(19)
Initialization
The first iteration, with a stabilizing initial guess , requires special attention. If it is obtained via a Riccatitype argument, then initialization follows directly from (14). Otherwise we shall relax this requirement, and only assume that the initial stabilizing controller is given in separable form
In this case, we must recompute the term:

As for the term , which needs to be computed differently in the first iteration, we can proceed in the same way as for , since both and have the same separable structure, it just takes to assign .
4.3. Computational complexity and implementation
Among the expressions developed in the previous subsection, the overall computational burden is governed by the approximation of
which requires the assembly of the 5dimensional tensor . Each entry of this tensor is a dimensional inner product, which under the aforementioned separability assumptions is computed as the product of , onedimensional integrals. Thus, the total amount of onedimensional integrals required for the proposed implementation is . A positive aspect of our approach is that the assembly of tensors like falls within the category of embarrassingly parallelizable computations, so the CPU time scales down almost directly with respect to the number of available cores. Furthermore, can be entirely computed in an offline phase, before entering the iterative loops in Algorithms 1 and 2. However, for values of interest of and , such as and , Table 1 indicates that is indeed a very large number. A rough estimate of the CPU time required for the assembly of is given by
where corresponds to the time required for the computation of a onedimensional integral. Therefore, it is fundamental for an efficient implementation to reduce to a bare minimum. From closer inspection of the expression (19), we observe that all the terms can be identified as elements of the tensors
Both and can be computed exactly with a Computer Algebra System, or approximated under suitable quadrature rules. We follow this latter approach, implementing an 8point GaussLegendre quadrature rule. After having computed and , the assembly of (19) reduces to calls to properly indexed elements of these tensors. This approach requires a careful bookkeeping of the separable components of each multidimensional basis function . In this way, an entry of takes of the order of 1E7 seconds and the overall CPU time is kept within hours for problems of dimension up to 12.
5. Computational implementation and numerical tests
5.1. Convergence of the polynomial approximation
We assess the convergence of the polynomial approximation in a 1D test, with
such that the exact solution of equation (3) is given by
We implement the pathfollowing version (Algorithm 2), starting with , and a threshold value , a parameter , and an internal tolerance . The relative error for Table 2 is defined as
and number of iterations for different polynomial degree approximations are shown in Table 2 and Figure 3.
Monomial basis  Legendre basis  

(degree)  error  iterations  error  iterations 
2  1.1539  53  1.4127  52 
4  0.2541  49  0.3643  58 
6  0.015  52  0.0206  52 
8  5.01E4  55  6.41E4  53 
10  8.33E6  55  1.072E5  55 
5.2. Optimal feedback control of semilinear parabolic equations
Similarly as in Section 2.1, we consider the following optimal control problem
(20) 
subject to the semilinear dynamics
where the linear operator is of the form with , and is a nonlinear operator such that . The scalar control acts through the indicator function , with The The system is closed under suitable boundary conditions. We choose , , , and . The nonlinearity covers both advective, Burgers’type, and polynomial source terms. In order to generate a lowdimensional state space representation of the dynamics, we resort to a pseudospectral collocation method with Chebyshev polynomials as in [31] (for further details we also refer to [33, p. 107]. By considering collocation points , the continuous state is discretized into , where . The semilinar PDE dynamics are thus approximated by the dimensional nonlinear ODE system
(21) 
where the operators correspond to the finitedimensional realization of through the Chebyshev pseudospectral method. Therefore, the number of collocation points governs the dimension of the resulting nonlinear ODE system (21), and consequently determines the dimension of the domain where the associated HJB equation is solved. In the following, Tests 13 are computed in 14 collocation points, which after including boundary conditions lead to a 12 dimensional domain for the HJB equation. Test 4 is solved in 14 dimensions. The highdimensional solver was implemented in MATLAB, parallelizing the tensors assembly, and tests were run on a muticore architecture 8x Intel Xeon E74870 with 2,4Ghz, 1 TB of RAM. The MATLAB pseudoparallelization distributes the tasks among 20 workers. Representative performance details are shown in Table 3. The assembly of highdimensional tensor that enter the iterative algorithm accounts for over 80% of the total CPU time. This percentage increases when Algorithm 1 is implemented for asymptotically stable dynamics, as it requires a much lower number of iterations. Note that much of the work done during the assembly phase is independent of the dynamics (see for instance (19)), and therefore can be reused in latter problems, mitigating the overall computational burden.
Test  Dimension  CPUassembly  CPUiterative (#) 

1  10  2.061E3[s]  4.221E2[s](32) 
1  12  1.945E4[s]  3.377E3[s](32) 
4  14  1.557E5[s]  3.102E4[s](37) 
We now turn to the specification of parameters for the solution of the HJB equation. We set , and consider a monomial basis up to order 4 as described in Section 4. Depending on the dynamics of every example, we will neglect odddegree basis functions as in Remark 2. All the integrals are approximated with an 8 point GaussLegendre quadrature rule. Whenever system dynamics are stable at the origin, the value function is obtained from the undiscounted Algorithm 1, initialized with . When the dynamics are unstable over , we implement Algorithm 2, with , , and . The initializing controller is given by the solution of the associated linearquadratic optimal feedback, as described below. For both implementations, the tolerance of the algorithm is set to . In the following tests, we compare the HJBbased feedback control with respect to the uncontrolled dynamics (), the linearquadratic optimal feedback (LQR), and the power series expansion type of controller (PSE). We briefly describe these controllers. The wellknown LQR feedback controller corresponds to the HJB synthesis applied over the linearized system around the origin
(22) 
and results in the optimal feedback control law given by
where is the unique selfadjoint, positivedefinite solution of the algebraic Riccati equation