Linear structures in nonlinear optimal control

Jakob Löber jakob@physik.tu-berlin.de Institut für Theoretische Physik, EW 7-1, Hardenbergstraße 36, Technische Universität Berlin, 10623 Berlin
July 5, 2019
Abstract

We investigate optimal control of dynamical systems which are affine, i.e., linear in control, but nonlinear in state. The control task is to enforce the system state to follow a prescribed desired trajectory as closely as possible, a task also known as optimal trajectory tracking. To obtain well-behaved solutions to optimal control, a regularization term with coefficient $\epsilon^2$ must be included in the cost functional. Assuming $\epsilon$ to be small, we reinterpret affine optimal control problems as singularly perturbed differential equations. Performing a singular perturbation expansion, approximations for the optimal tracking of arbitrary desired trajectories are derived. For $\epsilon \to 0$, the state trajectory may become discontinuous, and the control may diverge. On the other hand, the analytical treatment becomes exact. We identify the conditions leading to linear evolution equations. These result in exact analytical solutions for an entire class of nonlinear trajectory tracking problems. The class comprises, among others, mechanical control systems in one spatial dimension and the FitzHugh-Nagumo model with a control acting on the activator.

pacs:
02.30.Yy, 05.45.-a, 07.05.Dz, 45.80.+r

Introduction.

In principle, a controlled dynamical system can be considerably simpler than the corresponding uncontrolled system. Consider Newton's equation of motion (EOM), cast in the form of a two-dimensional dynamical system, for a point particle with unit mass, position $x(t)$, and velocity $y(t)$. Under the influence of an external force $F(x, y)$ and a control force $u(t)$, the EOM read

$\dot{x}(t) = y(t), \qquad \dot{y}(t) = F(x(t), y(t)) + u(t).$   (1)

While no simple analytical expression for the solution exists for the uncontrolled system with $u(t) \equiv 0$, assuming that $u(t)$ is a feedback control signal renders a linearization of Eq. (1) possible. Indeed, for $u(t) = -F(x(t), y(t)) + \tilde{u}(t)$, we may introduce a new control signal $\tilde{u}(t)$ such that the new controlled system is linear, $\dot{x} = y$, $\dot{y} = \tilde{u}$. This technique is called feedback linearization and is applicable, in a more sophisticated fashion involving an additional state transformation, to a huge class of dynamical systems Khalil (2001). The resulting linear structure facilitates the application of a multitude of solution and analysis techniques which are not available in the nonlinear case Chen (1998). In contrast to the more familiar approximate linearizations applied, e.g., to the study of the linear stability of attractors, feedback linearization is an example of an exact linearization. The question emerges whether exact linearizations are exclusive to systems with feedback control.
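
To see feedback linearization at work numerically, the following minimal Python sketch simulates Eq. (1); the pendulum-type force $F$, the desired trajectory, and all parameter values are illustrative assumptions, not taken from the letter.

```python
import numpy as np
from scipy.integrate import solve_ivp

gamma = 0.1                              # damping coefficient (assumed)

def F(x, y):
    # External force of a damped pendulum (illustrative choice).
    return -np.sin(x) - gamma * y

def u_tilde(t):
    # New control for the linearized system x'' = u_tilde; choosing
    # u_tilde = -sin(t) makes x(t) = sin(t) an exact solution.
    return -np.sin(t)

def controlled_eom(t, state):
    x, y = state
    u = -F(x, y) + u_tilde(t)            # feedback cancels the nonlinearity
    return [y, F(x, y) + u]              # identical to [y, u_tilde(t)]

sol = solve_ivp(controlled_eom, (0.0, 10.0), [0.0, 1.0], rtol=1e-9, atol=1e-9)
print(sol.y[0, -1], np.sin(10.0))        # controlled x(t) tracks sin(t)
```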

In this letter, we demonstrate the possibility of linear structures in optimal control, which may or may not be a feedback control. As a result, the controlled state as well as the control signal are given as the solution to linear differential and algebraic equations. The finding enables the exact analytical solution of an entire class of nonlinear optimally controlled dynamical systems, including but not limited to all models of the form of Eq. (1). While stabilizing feedback control of unstable attractors has received much attention from the physics community Bechhoefer (2005), especially in the context of chaos control Ott et al. (1990), the optimality of these methods is rarely investigated. Numerical solutions of optimal control problems are computationally expensive, which often prevents real-time applications. Optimal feedback control in particular suffers from the 'curse of dimensionality' Bellman (2003). On the other hand, analytical methods are largely restricted to linear systems, which lack e.g. limit cycles and chaos. The approach presented here opens up a way to circumvent such problems.

Trajectory tracking aims at enforcing, via a vector of control signals $\mathbf{u}(t)$, a system state $\mathbf{x}(t)$ to follow a prescribed desired trajectory $\mathbf{x}_d(t)$ as closely as possible within a time interval $t_0 \le t \le t_1$. The distance between $\mathbf{x}(t)$ and $\mathbf{x}_d(t)$ in function space can be measured by the functional $J[\mathbf{x}] = \frac{1}{2}\int_{t_0}^{t_1} dt\, \left(\mathbf{x}(t) - \mathbf{x}_d(t)\right)^T \mathcal{S} \left(\mathbf{x}(t) - \mathbf{x}_d(t)\right)$. $J$ vanishes if and only if $\mathbf{x}(t) = \mathbf{x}_d(t)$, in which case we call $\mathbf{x}_d(t)$ an exactly realizable desired trajectory Löber (2016, 2015). If $\mathbf{x}_d(t)$ is not exactly realizable, we can formulate trajectory tracking as an optimization problem. The control task is to minimize the quadratic cost functional

$\mathcal{J}\left[\mathbf{x}(t), \mathbf{u}(t)\right] = J[\mathbf{x}] + \frac{\epsilon^2}{2} \int_{t_0}^{t_1} dt\, \mathbf{u}^T(t)\, \mathbf{u}(t),$   (2)

subject to the dynamic constraint that $\mathbf{x}(t)$ evolves according to the controlled dynamical system

$\dot{\mathbf{x}}(t) = \mathbf{R}(\mathbf{x}(t)) + \mathcal{B}(\mathbf{x}(t))\, \mathbf{u}(t), \qquad \mathbf{x}(t_0) = \mathbf{x}_0, \quad \mathbf{x}(t_1) = \mathbf{x}_1.$   (3)

Here, $\mathcal{S}$ is a constant symmetric positive definite matrix of weights. While the nonlinearity $\mathbf{R}(\mathbf{x})$ is known from uncontrolled dynamical systems, the $n \times m$ input matrix $\mathcal{B}(\mathbf{x})$ is exclusive to controlled systems. We assume that the rank of the matrix $\mathcal{B}(\mathbf{x})$ equals the number $m$ of independent components of $\mathbf{u}(t)$ for all $\mathbf{x}$.
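
For concreteness, Newton's EOM (1) is of this form with

$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad \mathbf{R}(\mathbf{x}) = \begin{pmatrix} y \\ F(x, y) \end{pmatrix}, \qquad \mathcal{B} = \begin{pmatrix} 0 \\ 1 \end{pmatrix},$

so that $n = 2$, $m = 1$, and $\mathcal{B}$ has full column rank.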

The term involving $\epsilon^2$ in Eq. (2) favors controls with small amplitude and serves as a regularization term. The idea is to use $\epsilon$ as the small parameter for a perturbation expansion. Note that $\epsilon$ has its sole origin in the formulation of the control problem. Assuming it to be small does not involve simplifying assumptions about the system dynamics. The dynamics is taken into account without approximations in the subsequent perturbation expansion. Furthermore, among all optimal controls, the unregularized ($\epsilon = 0$) one brings the controlled state closest to the desired trajectory $\mathbf{x}_d(t)$. For a given dynamical system, the unregularized optimal control solution can be seen as the limit of realizability of a prescribed desired trajectory $\mathbf{x}_d(t)$.

Following a standard procedure Pontryagin and Boltyanskii (1962); Bryson and Ho (1975), the constrained optimization problem Eqs. (2) and (3) is converted to an unconstrained optimization problem by introducing the vector of Lagrange multipliers $\boldsymbol{\lambda}(t)$, also called the co-state or adjoint state. Minimizing the constrained functional with respect to $\mathbf{x}$, $\mathbf{u}$, and $\boldsymbol{\lambda}$ yields the necessary optimality conditions,

$0 = \epsilon^2\, \mathbf{u}(t) - \mathcal{B}^T(\mathbf{x}(t))\, \boldsymbol{\lambda}(t),$   (4)
$\dot{\mathbf{x}}(t) = \mathbf{R}(\mathbf{x}(t)) + \mathcal{B}(\mathbf{x}(t))\, \mathbf{u}(t),$   (5)
$\dot{\boldsymbol{\lambda}}(t) = \mathcal{S}\left(\mathbf{x}(t) - \mathbf{x}_d(t)\right) - \left(\nabla\mathbf{R}(\mathbf{x}(t)) + \mathcal{K}(\mathbf{x}(t), \mathbf{u}(t))\right)^T \boldsymbol{\lambda}(t),$   (6)

with matrix $\mathcal{K}(\mathbf{x}, \mathbf{u})$, entries $\mathcal{K}_{ij} = \partial_{x_j}\left(\mathcal{B}(\mathbf{x})\, \mathbf{u}\right)_i$, and Jacobi matrix $\nabla\mathbf{R}(\mathbf{x})$ of $\mathbf{R}(\mathbf{x})$. The dynamics of an optimal control system takes place in the combined space of state and co-state with dimension $2n$. Starting from Eqs. (4)-(6), the usual procedure is to eliminate $\mathbf{u}(t)$ and solve the system of $2n$ coupled ordinary differential equations (ODEs) for $\mathbf{x}(t)$ and $\boldsymbol{\lambda}(t)$.
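
Eliminating $\mathbf{u}(t)$ via Eq. (4) makes the singular character explicit: in the notation introduced above, Eq. (5) becomes

$\epsilon^2\, \dot{\mathbf{x}}(t) = \epsilon^2\, \mathbf{R}(\mathbf{x}(t)) + \mathcal{B}(\mathbf{x}(t))\, \mathcal{B}^T(\mathbf{x}(t))\, \boldsymbol{\lambda}(t),$

so the small parameter multiplies the time derivative, the hallmark of a singularly perturbed problem.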

To take advantage of the small parameter $\epsilon$, we proceed differently. We rearrange the necessary optimality conditions such that $\epsilon^2$ multiplies the highest-order derivative of the system. The rearrangement admits an interpretation of a singular optimal control problem as a singularly perturbed system of ODEs. Setting $\epsilon = 0$ yields the outer equations but changes the differential order of the system. Remarkably, the outer equations may become linear even if the original system is nonlinear. We collect the conditions leading to linear outer equations under the name linearizing assumption.

Because $\epsilon = 0$ decreases the differential order, not all initial and terminal conditions can be satisfied. Consequently, the outer solutions are not uniformly valid over the entire time domain. The non-uniformity manifests itself in initial and terminal boundary layers of width $\epsilon$ for certain components of the state $\mathbf{x}(t)$. The boundary layers are resolved by the left and right inner equations valid near the beginning and the end of the time interval, respectively. For $\epsilon > 0$, inner and outer solutions can be composed to an approximate composite solution valid over the entire time domain. See Bender and Orszag (1978) for analytical methods based on singular perturbations. The control signal is given in terms of the composite solution. The control exhibits a maximum amplitude at the positions of the boundary layers. In general, the inner equations are nonlinear even if the linearizing assumption holds. However, for $\epsilon \to 0$, the boundary layers degenerate into discontinuous jumps located at the beginning and end of the time interval. The jump heights are independent of any details of the inner equations, and the exact solution for $\epsilon = 0$ is governed exclusively by the outer equations. Consequently, the linearizing assumption together with $\epsilon = 0$ renders the nonlinear optimal control system linear. On the downside, the optimally controlled state becomes discontinuous at the time domain boundaries, and the control diverges. It is in this sense that we are able to speak about linear structures in nonlinear optimal control.
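
For orientation, the standard composition rule of matched asymptotics used here (see Bender and Orszag (1978)) reads, for a single state component $x$ with boundary layers at both ends,

$x_{\mathrm{comp}}(t) = X(t) + x^{\mathrm{L}}\!\left(\frac{t - t_0}{\epsilon}\right) - \lim_{\tau \to \infty} x^{\mathrm{L}}(\tau) + x^{\mathrm{R}}\!\left(\frac{t_1 - t}{\epsilon}\right) - \lim_{\sigma \to \infty} x^{\mathrm{R}}(\sigma),$

where the subtracted limits are the values shared by the inner and outer solutions in the matching region, and the superscripts L and R denote the left and right inner solutions.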

Rearranging the necessary optimality conditions.

The rearrangement is based on two complementary projection matrices

$\mathcal{P}(\mathbf{x}) = \mathcal{B}(\mathbf{x})\, \mathcal{M}(\mathbf{x})\, \mathcal{B}^T(\mathbf{x})\, \mathcal{S}, \qquad \mathcal{Q}(\mathbf{x}) = \mathbb{1} - \mathcal{P}(\mathbf{x}),$   (7)

with symmetric matrix

$\mathcal{M}(\mathbf{x}) = \left(\mathcal{B}^T(\mathbf{x})\, \mathcal{S}\, \mathcal{B}(\mathbf{x})\right)^{-1}.$   (8)

We drop the dependence on the state $\mathbf{x}(t)$ in the following. It is understood that $\mathbf{R}$ and all matrices, except $\mathcal{S}$, may depend on $\mathbf{x}(t)$. The ranks of the projectors $\mathcal{P}$ and $\mathcal{Q}$ are $m$ and $n - m$, respectively, such that $\mathcal{P}\mathbf{x}$ has only $m$ linearly independent components. The projectors satisfy idempotence, $\mathcal{P}^2 = \mathcal{P}$ and $\mathcal{Q}^2 = \mathcal{Q}$, and $\mathcal{P}\mathcal{B} = \mathcal{B}$, $\mathcal{Q}\mathcal{B} = 0$. The idea is to separate the state and adjoint state as well as their evolution equations with the help of $\mathcal{P}$ and $\mathcal{Q}$. The controlled state equation (5) is split as
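
These properties are easy to check numerically. The following sketch assumes the weighted projector form of Eqs. (7) and (8), $\mathcal{P} = \mathcal{B}\left(\mathcal{B}^T\mathcal{S}\mathcal{B}\right)^{-1}\mathcal{B}^T\mathcal{S}$, for a random example:

```python
# Numerical check of the projector properties stated above, assuming the
# weighted form P = B (B^T S B)^{-1} B^T S of Eqs. (7)-(8).
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2
B = rng.standard_normal((n, m))            # input matrix, full column rank m
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)                # symmetric positive definite weights

M = np.linalg.inv(B.T @ S @ B)             # Eq. (8), symmetric
P = B @ M @ B.T @ S                        # Eq. (7), idempotent, not symmetric
Q = np.eye(n) - P

assert np.allclose(P @ P, P) and np.allclose(Q @ Q, Q)   # idempotence
assert np.allclose(P @ B, B) and np.allclose(Q @ B, 0)   # P B = B, Q B = 0
assert np.allclose(S @ P, P.T @ S)                       # S P = P^T S
print(np.linalg.matrix_rank(P), np.linalg.matrix_rank(Q))  # m and n - m
```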

$\mathcal{P}\, \dot{\mathbf{x}} = \mathcal{P}\, \mathbf{R}(\mathbf{x}) + \mathcal{B}\, \mathbf{u}, \qquad \mathcal{Q}\, \dot{\mathbf{x}} = \mathcal{Q}\, \mathbf{R}(\mathbf{x}).$   (9)

After multiplication with $\mathcal{M}\, \mathcal{B}^T\, \mathcal{S}$, the first equation yields an expression for the control signal in terms of the state $\mathbf{x}(t)$,

$\mathbf{u}(t) = \mathcal{M}\, \mathcal{B}^T\, \mathcal{S}\left(\dot{\mathbf{x}}(t) - \mathbf{R}(\mathbf{x}(t))\right).$   (10)

Inserting $\mathbf{u}(t)$ from Eq. (10) in Eq. (4), multiplying by $\mathcal{S}\, \mathcal{B}\, \mathcal{M}$ from the left and using $\mathcal{P}^T = \mathcal{S}\, \mathcal{B}\, \mathcal{M}\, \mathcal{B}^T$ yields

$\mathcal{P}^T \boldsymbol{\lambda}(t) = \epsilon^2\, \mathcal{E}\left(\dot{\mathbf{x}}(t) - \mathbf{R}(\mathbf{x}(t))\right),$   (11)

with symmetric matrix $\mathcal{E} = \mathcal{S}\, \mathcal{B}\, \mathcal{M}^2\, \mathcal{B}^T\, \mathcal{S}$ of rank $m$. The adjoint state equation (6) is split analogously to Eq. (9). Subsequently, Eq. (11) is used to eliminate $\mathcal{P}^T\boldsymbol{\lambda}$ as well as $\mathbf{u}$ from all equations. See the Supplemental Material (SM) sup () for a detailed derivation. The rearrangement results in the following, singularly perturbed system of $2(n - m)$ linearly independent ODEs and $m$ linearly independent second order differential equations,

(12)
(13)
(14)

We introduced the matrix with entries and the vector

(15)

We emphasize that the rearrangement requires no approximation and is valid for all affine control systems with a cost functional of the form Eq. (2).

Outer equations and linearizing assumption.

The outer equations are obtained by setting $\epsilon = 0$ in Eqs. (12)-(14) and subsequently multiplying Eq. (13) with $\mathcal{Q}^T$ from the left. Denoting the outer solutions by upper-case letters, $\mathbf{X}(t)$ and $\boldsymbol{\Lambda}(t)$, we obtain

(16)
(17)
(18)

In general, Eqs. (16)-(18) are nonlinear because $\mathcal{P}$, $\mathcal{Q}$, $\mathcal{B}$, $\mathcal{M}$, and $\mathbf{R}$ may all depend on $\mathbf{X}$.

The linearizing assumption consists of two parts. First, the matrix $\mathcal{B}\, \mathcal{M}\, \mathcal{B}^T$ is assumed to be independent of $\mathbf{x}$. This assumption implies constant projectors $\mathcal{P}$ and $\mathcal{Q}$ but not a constant input matrix $\mathcal{B}$. Second, the nonlinearity $\mathbf{R}(\mathbf{x})$ is assumed to have the following structure with respect to the input matrix,

$\mathcal{Q}\, \mathbf{R}(\mathbf{x}) = \mathcal{Q}\left(\mathcal{A}\, \mathbf{x} + \mathbf{b}\right),$   (19)

with constant $n \times n$ matrix $\mathcal{A}$ and constant $n$-component vector $\mathbf{b}$. Equation (17) together with the linearizing assumption yields an explicit expression for $\mathcal{P}\mathbf{X}$,

(20)

Using Eqs. (19) and (20) in Eq. (18), we obtain $2(n - m)$ linearly independent ODEs for $\mathcal{Q}\mathbf{X}$ and $\mathcal{Q}^T\boldsymbol{\Lambda}$,

(21)

with

(22)

Inner equations and matching.

The initial inner equation, valid at the beginning, or left side, of the time domain, is obtained by introducing functions $\mathbf{x}^{\mathrm{L}}(\tau)$ and $\boldsymbol{\lambda}^{\mathrm{L}}(\tau)$ with a rescaled time $\tau = (t - t_0)/\epsilon$. Performing a perturbation expansion of Eqs. (12)-(14) up to leading order in $\epsilon$ together with the linearizing assumption yields an ODE for $\mathcal{P}\mathbf{x}^{\mathrm{L}}$,

(23)

and constant $\mathcal{Q}\mathbf{x}^{\mathrm{L}}$, $\mathcal{P}^T\boldsymbol{\lambda}^{\mathrm{L}}$, and $\mathcal{Q}^T\boldsymbol{\lambda}^{\mathrm{L}}$. An analogous procedure yields the right inner equations for $\mathbf{x}^{\mathrm{R}}$ and $\boldsymbol{\lambda}^{\mathrm{R}}$ with rescaled time $\sigma = (t_1 - t)/\epsilon$, valid at the right end of the time domain. They are identical in form to the left inner equations. All inner and outer equations must be solved with the matching conditions Bender and Orszag (1978),

$\lim_{\tau \to \infty} \mathbf{x}^{\mathrm{L}}(\tau) = \mathbf{X}(t_0), \qquad \lim_{\tau \to \infty} \boldsymbol{\lambda}^{\mathrm{L}}(\tau) = \boldsymbol{\Lambda}(t_0),$   (24)
$\lim_{\sigma \to \infty} \mathbf{x}^{\mathrm{R}}(\sigma) = \mathbf{X}(t_1), \qquad \lim_{\sigma \to \infty} \boldsymbol{\lambda}^{\mathrm{R}}(\sigma) = \boldsymbol{\Lambda}(t_1),$   (25)

and the boundary conditions for the state, $\mathbf{x}(t_0) = \mathbf{x}_0$, $\mathbf{x}(t_1) = \mathbf{x}_1$, see Eq. (3). From the constancy of $\mathcal{P}$ and $\mathcal{Q}$ follow immediately the boundary conditions for the outer equations (21), $\mathcal{Q}\mathbf{X}(t_0) = \mathcal{Q}\mathbf{x}_0$ and $\mathcal{Q}\mathbf{X}(t_1) = \mathcal{Q}\mathbf{x}_1$. The only non-constant inner solutions are $\mathcal{P}\mathbf{x}^{\mathrm{L}}$ and $\mathcal{P}\mathbf{x}^{\mathrm{R}}$. They provide a connection from the boundary conditions $\mathcal{P}\mathbf{x}_0$ and $\mathcal{P}\mathbf{x}_1$ to the outer solution given by Eq. (20). The connection is in the form of a steep transition of width $\epsilon$ known as a boundary layer Bender and Orszag (1978). Note that the inner equations may still be nonlinear due to a possible dependence of $\mathcal{B}$ and $\mathcal{M}$ on $\mathbf{x}$. This nonlinearity originates from the input matrix $\mathcal{B}(\mathbf{x})$, and vanishes for constant $\mathcal{B}$. See the SM sup () for details of the derivation.
The approximate solution to the necessary optimality conditions depends on the initial state $\mathbf{x}_0$. The system can either be prepared in $\mathbf{x}_0$, or $\mathbf{x}_0$ is obtained by measurements. Once $\mathbf{x}_0$ is known, no further information about the controlled system's state is necessary to compute the control, resulting in an open-loop control. Measurements of the state performed at a later time can be used to update the control with the measured state as the new initial condition, leading to a sampled-data feedback law. In the limit of continuous monitoring of the system's state, the initial conditions become functions of the current state of the controlled system itself, and the control becomes a continuous-time feedback law Bryson and Ho (1975).
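
Schematically, the sampled-data feedback described above is a simple loop; in the following Python sketch, optimal_control is a placeholder for the analytical approximation, and the plant, sampling period, and all parameter values are assumptions for illustration.

```python
import numpy as np

def optimal_control(x0, t0):
    # Placeholder for the analytical approximation: returns the open-loop
    # control law u(t) computed for initial state x0 at time t0.
    return lambda t: np.array([0.0])

def plant_rhs(x, u):
    # True plant, here a pendulum of the type of Eq. (26) (illustrative).
    return np.array([x[1], -np.sin(x[0]) + u[0]])

dt, t_sample, t_end = 1e-3, 0.5, 10.0
x, t = np.array([0.1, 0.0]), 0.0
u = optimal_control(x, t)
steps_per_sample = int(t_sample / dt)

for k in range(int(t_end / dt)):
    if k % steps_per_sample == 0:
        x_measured = x.copy()                 # measure the current state
        u = optimal_control(x_measured, t)    # update with new initial condition
    x = x + dt * plant_rhs(x, u(t))           # explicit Euler step of the plant
    t += dt
```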

Figure 1: Optimal trajectory tracking for the damped mathematical pendulum. Top: While the numerically obtained, optimally controlled angular velocity $y(t)$ (red dashed) resembles the desired velocity $y_d(t)$ (blue solid) in shape (right), the angular displacement $x(t)$ (left) is far off its desired counterpart $x_d(t)$. Bottom: The difference between analytical and numerical solution for the optimally controlled state reveals excellent agreement except close to the time domain boundaries. To leading order in the small parameter $\epsilon$, the analytical result predicts a boundary layer of width $\epsilon$ for $y$ (right inset) but not for $x$ (left inset). As indicated by the red dots, the temporal resolution is just fine enough to resolve the boundary layer.

Results for a two-dimensional dynamical system.

We compare the approximate analytical solution for small but finite $\epsilon$ with numerical results for the optimally controlled damped mathematical pendulum. The model is of the form of Eq. (1) and satisfies the linearizing assumption. The EOM are

$\dot{x}(t) = y(t), \qquad \dot{y}(t) = -\sin(x(t)) - \gamma\, y(t) + u(t),$   (26)

with $x$ denoting the angular displacement and $y$ the angular velocity. The desired trajectory is prescribed by $x_d(t)$ and $y_d(t)$. The terminal conditions $x(t_1) = x_d(t_1)$ and $y(t_1) = y_d(t_1)$ are chosen to lie on the desired trajectory, while the initial conditions $x(t_0) = x_0$ and $y(t_0) = y_0$ do not. Explicit analytical expressions for the optimally controlled composite state solution and the optimal control signal can be found in the SM sup (). The prescribed desired trajectory is not exactly realizable, which can be understood from physical reasoning. A point mass governed by Newton's EOM can only trace out desired trajectories in phase space, spanned by $x$ and $y$, which satisfy $\dot{x}_d(t) = y_d(t)$. This relation simply defines the velocity, and no control force of whatever amplitude can change that definition Löber (2016). Numerical computations are performed with the ACADO Toolkit Houska et al. (2011), an open-source program package for solving optimal control problems. The problem is solved on a time interval of length $t_1 - t_0$ with a time step width $\Delta t$. A straightforward numerical solution of the necessary optimality conditions Eqs. (4)-(6) is usually impossible due to the mixed boundary conditions. While half of the boundary conditions are imposed at the initial time, the other half are imposed at the terminal time. This typically requires iterative algorithms such as shooting, and renders optimal trajectory tracking computationally expensive.
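
For reference, the same two-point BVP structure can also be handed to a generic collocation solver. The following sketch uses the optimality conditions in the form reconstructed in Eqs. (4)-(6) with $\mathcal{S} = \mathbb{1}$; the desired trajectory, damping, $\epsilon$, and boundary values are assumed and do not reproduce Fig. 1.

```python
# Two-point BVP for the pendulum optimality system; eps, gamma, the desired
# trajectory, and all boundary values are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_bvp

eps, gamma, T = 0.1, 0.1, 10.0
xd = lambda t: np.sin(t)            # assumed desired displacement
yd = lambda t: 0.5 * np.cos(t)      # deliberately not realizable: xd' != yd

def rhs(t, w):
    x, y, l1, l2 = w
    u = l2 / eps**2                                  # stationarity, Eq. (4)
    return np.vstack([
        y,                                           # state, Eq. (5)
        -np.sin(x) - gamma * y + u,
        (x - xd(t)) + np.cos(x) * l2,                # adjoint, Eq. (6), S = 1
        (y - yd(t)) - l1 + gamma * l2,
    ])

def bc(w0, w1):
    # Mixed boundary conditions: state fixed at both ends, co-state free.
    return np.array([w0[0] - 0.5, w0[1] - 0.0,       # initial state off x_d
                     w1[0] - xd(T), w1[1] - yd(T)])  # terminal state on x_d

t = np.linspace(0.0, T, 2000)
w = np.zeros((4, t.size))
sol = solve_bvp(rhs, bc, t, w, max_nodes=100000)
print(sol.status, sol.message)
```
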
Figure 1, top, compares the desired trajectory for angular displacement (left) and velocity (right) with the numerically computed, optimally controlled state. While the controlled velocity $y(t)$ looks similar in shape to the desired trajectory $y_d(t)$, the controlled displacement $x(t)$ is far off. Even though the terminal conditions comply with the desired trajectory, the solution for $y(t)$ exhibits very steep transients at both ends of the time interval. These transitions are the boundary layers of width $\epsilon$ described by the inner solutions. On this scale, there is no visible difference between the analytically and numerically obtained controlled state. We plot the difference between them in Fig. 1, bottom. Apart from the boundary layer regions, the agreement is excellent. Deviations in the boundary layer regions can be understood from the relatively poor numerical resolution, with a time step equal to the boundary layer width, $\Delta t \approx \epsilon$; see the insets for closeups of the boundary layer regions. Increasing $\Delta t$ leads to larger numerical errors, but decreasing $\Delta t$ quickly increases computation time. While the analytical leading order result predicts a boundary layer for $y$ but not for $x$, the numerical solution for $x$ exhibits a tiny boundary layer, which we expect to arise from higher order contributions in the perturbation expansion.

Exact solution for $\epsilon = 0$.

For $\epsilon = 0$, the inner solutions do not play a role because the boundary layers exhibited by $\mathcal{P}\mathbf{x}$ degenerate to jumps,

(27)

The jump heights are fully determined by the outer solution and the boundary conditions. The parts $\mathcal{P}^T\boldsymbol{\Lambda}$ vanish identically for all times. The parts $\mathcal{Q}\mathbf{X}$ and $\mathcal{Q}^T\boldsymbol{\Lambda}$ are obtained from the linear outer equations (21). Cast in terms of the state transition matrix $\Phi(t, t_0)$, the solutions become

(28)

The parameter is determined from the terminal condition $\mathcal{Q}\mathbf{X}(t_1) = \mathcal{Q}\mathbf{x}_1$. The control is given in terms of $\mathbf{X}$ by Eq. (10). We obtain (see the SM sup () for a derivation),

(29)

with

(30)

Because the control Eq. (10) involves the time derivative $\dot{\mathbf{x}}$, the jumps of $\mathcal{P}\mathbf{x}$ at the time domain boundaries lead to divergences in the form of Dirac delta functions located at the time domain boundaries.
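
Under the linearizing assumption, the outer equations (21) have constant coefficients, so the state transition matrix reduces to a matrix exponential. A minimal numerical sketch (the example matrix is an assumption):

```python
# State transition matrix for a constant-coefficient linear system
# dz/dt = A z, as arises from the outer equations under the linearizing
# assumption. The matrix A below is an assumed example.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, -0.1]])

def phi(t, t0):
    # State transition matrix Phi(t, t0) = exp(A (t - t0)).
    return expm(A * (t - t0))

# Homogeneous solution z(t) = Phi(t, t0) z0, and the composition property
# Phi(t2, t0) = Phi(t2, t1) Phi(t1, t0):
z0 = np.array([1.0, 0.0])
print(phi(2.0, 0.0) @ z0)
print(np.allclose(phi(2.0, 0.0), phi(2.0, 1.0) @ phi(1.0, 0.0)))
```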

Conclusions.

For $\epsilon > 0$, the dynamics of an optimal control system takes place in the combined space of state and co-state of dimension $2n$. For $\epsilon = 0$, the dynamics is restricted by algebraic equations, $\mathcal{P}^T\boldsymbol{\lambda} = 0$ and Eq. (27), to a hypersurface of dimension $2(n - m)$, also called a singular surface Bryson and Ho (1975). At the initial and terminal time, kicks in the form of Dirac delta functions mediated by the control induce an instantaneous transition from $\mathbf{x}_0$ onto the singular surface and from the singular surface to $\mathbf{x}_1$, respectively. These instantaneous transitions render the state components $\mathcal{P}\mathbf{x}$ discontinuous at the initial and terminal time, respectively. For $\epsilon > 0$, the discontinuities are smoothed out in the form of boundary layers, i.e., continuous transition regions with a slope and width controlled by $\epsilon$. The control signals are finite and exhibit a sharp peak at the time domain boundaries, with an amplitude inversely proportional to $\epsilon$. This general picture of optimal trajectory tracking for small $\epsilon$ is independent of the linearizing assumption and remains true for arbitrary affine control systems Löber (2015).

In experiments and numerical simulations, it is impossible to generate diverging control signals. In practice, a finite value $\epsilon > 0$ in Eq. (2) is indispensable. Nevertheless, understanding the behavior of control systems in the limit $\epsilon \to 0$ is very useful for applications. For example, to prevent, or at least minimize, steep transitions and large control amplitudes, we may choose the initial state $\mathbf{x}_0$ to lie on the singular surface. Furthermore, a faithful numerical representation of the solution to optimal control requires a temporal resolution $\Delta t \lesssim \epsilon$ to resolve the initial and terminal boundary layers.

The linearizing assumption yields linear algebraic and differential outer equations. While the inner equations may still be nonlinear for $\epsilon > 0$, they reduce to jumps, with jump heights independent of any details of the inner equations. Consequently, for $\epsilon = 0$, the exact solution for $\mathbf{x}(t)$, $\boldsymbol{\lambda}(t)$, and $\mathbf{u}(t)$ is entirely given in terms of the linear outer equations. For models of the form of Eq. (1), this culminates in the following, surprising conclusion. The controlled state becomes independent of the external force $F$ as $\epsilon \to 0$. Furthermore, the dependence on the nonlinearity is restricted to the boundary layers, which become irrelevant for $\epsilon \to 0$. Consequently, for a given desired trajectory $\mathbf{x}_d(t)$, and given initial and terminal conditions, $\mathbf{x}_0$ and $\mathbf{x}_1$, all optimally controlled states of Eq. (1) trace out the same trajectories in phase space, independent of $F$. However, note that the control signal still depends on $F$.

Intuitively, the result can be understood as follows. The state components which are and which are not acted upon directly by the control are encoded in the matrices $\mathcal{P}$ and $\mathcal{Q}$, respectively. Assumption (19) of the linearizing assumption demands that the control acts on the nonlinear state equations, and all other equations are linear. For $\epsilon = 0$, the control amplitude is unrestricted. The control dominates over the nonlinearity and can absorb an arbitrarily large part of $\mathbf{R}(\mathbf{x})$, resulting in linear evolution equations. The linearizing assumption is satisfied by all models for which only the directly controlled components evolve nonlinearly, which includes Eq. (1) and the FitzHugh-Nagumo model with a control acting on the activator as special cases. Another example is the SIR model for disease transmission with the transmission rate serving as the control Löber (2015).
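
To spell out the FitzHugh-Nagumo case in its standard form (written here for concreteness, with the usual model constants $\delta$, $\beta$, $\gamma$ and the control $u(t)$ added to the activator equation):

$\dot{x}_1 = x_1 - \frac{x_1^3}{3} - x_2 + u(t), \qquad \dot{x}_2 = \delta\left(x_1 + \beta - \gamma\, x_2\right).$

Only the directly controlled activator equation is nonlinear, while the inhibitor equation is linear, so the structure demanded by assumption (19) holds with $\mathcal{B} = (1, 0)^T$.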

Generalizations of the results presented here are possible. One example is noisy controlled systems, for which fundamental results exist for linear optimal control in the form of the linear-quadratic regulator Bryson and Ho (1975). In this context, we mention Ref. Kappen (2005), which presents a linear theory for the control of nonlinear stochastic systems. However, the method in Kappen (2005) is restricted to systems with identical numbers of control signals and state components, $m = n$. Other possibilities are generalizations to spatio-temporal systems such as reaction-diffusion systems Löber and Engel (2014).

Acknowledgements.
We thank the DFG for funding via GRK 1558.

References

  • Khalil (2001) H. K. Khalil, Nonlinear Systems, 3rd ed. (Prentice Hall, 2001).
  • Chen (1998) C.-T. Chen, Linear System Theory and Design, 3rd ed. (Oxford University Press, 1998).
  • Bechhoefer (2005) J. Bechhoefer, Rev. Mod. Phys. 77, 783 (2005).
  • Ott et al. (1990) E. Ott, C. Grebogi,  and J. A. Yorke, Phys. Rev. Lett. 64, 1196 (1990).
  • Bellman (2003) R. Bellman, Dynamic Programming, Reprint ed. (Dover Publications, 2003).
  • Löber (2016) J. Löber, “Exactly realizable desired trajectories,”  (2016), arXiv:1603.00611 .
  • Löber (2015) J. Löber, Optimal trajectory tracking, Ph.D. thesis, Technical University Berlin (2015).
  • Pontryagin and Boltyanskii (1962) L. S. Pontryagin and V. G. Boltyanskii, The Mathematical Theory of Optimal Processes, 1st ed. (John Wiley & Sons Inc, 1962).
  • Bryson and Ho (1975) A. E. Bryson and Y.-C. Ho, Applied Optimal Control: Optimization, Estimation and Control, Revised ed. (Taylor & Francis Group, New York, 1975).
  • Bender and Orszag (1978) C. M. Bender and S. A. Orszag, Advanced Mathematical Methods for Scientists and Engineers (McGraw-Hill, New York, 1978).
  • (11) See Supplemental Material at [URL will be inserted by publisher] for additional derivations.
  • Houska et al. (2011) B. Houska, H. Ferreau,  and M. Diehl, Optim. Contr. Appl. Met. 32, 298 (2011).
  • Kappen (2005) H. J. Kappen, Phys. Rev. Lett. 95, 200201 (2005).
  • Löber and Engel (2014) J. Löber and H. Engel, Phys. Rev. Lett. 112, 148305 (2014).

Supplemental Material: Linear structures in nonlinear optimal trajectory tracking

Jakob Löber*

Institut für Theoretische Physik, EW 7-1, Hardenbergstraße 36, Technische Universität Berlin, 10623 Berlin

Appendix A: Rearranging the necessary optimality conditions

Here, we give a detailed version of the rearrangement of the necessary optimality conditions,

$0 = \epsilon^2\, \mathbf{u}(t) - \mathcal{B}^T(\mathbf{x}(t))\, \boldsymbol{\lambda}(t),$   (S1)
$\dot{\mathbf{x}}(t) = \mathbf{R}(\mathbf{x}(t)) + \mathcal{B}(\mathbf{x}(t))\, \mathbf{u}(t),$   (S2)
$\dot{\boldsymbol{\lambda}}(t) = \mathcal{S}\left(\mathbf{x}(t) - \mathbf{x}_d(t)\right) - \left(\nabla\mathbf{R}(\mathbf{x}(t)) + \mathcal{K}(\mathbf{x}(t), \mathbf{u}(t))\right)^T \boldsymbol{\lambda}(t),$   (S3)

with the initial and terminal conditions

$\mathbf{x}(t_0) = \mathbf{x}_0, \qquad \mathbf{x}(t_1) = \mathbf{x}_1.$   (S4)

Together with the assumption $\epsilon \ll 1$, the rearrangement will enable the reinterpretation of Eqs. (S1)-(S3) as a singularly perturbed system of differential equations. To shorten the notation, the time argument of $\mathbf{x}$, $\mathbf{u}$, and $\boldsymbol{\lambda}$ is suppressed in the following and some abbreviating matrices are introduced. Let the $m \times m$ matrix $\mathcal{M}$ be defined by

$\mathcal{M}(\mathbf{x}) = \left(\mathcal{B}^T(\mathbf{x})\, \mathcal{S}\, \mathcal{B}(\mathbf{x})\right)^{-1}.$   (S5)

A simple calculation shows that $\mathcal{M}$ is symmetric,

$\mathcal{M}^T = \left(\left(\mathcal{B}^T \mathcal{S} \mathcal{B}\right)^{-1}\right)^T = \left(\left(\mathcal{B}^T \mathcal{S} \mathcal{B}\right)^T\right)^{-1} = \left(\mathcal{B}^T \mathcal{S} \mathcal{B}\right)^{-1} = \mathcal{M}.$   (S6)

Note that $\mathcal{B}^T \mathcal{S} \mathcal{B}$ is a symmetric matrix because $\mathcal{S}$ is symmetric by assumption, and the inverse of a symmetric matrix is symmetric. Let the two projection matrices $\mathcal{P}$ and $\mathcal{Q}$ be defined by

$\mathcal{P}(\mathbf{x}) = \mathcal{B}(\mathbf{x})\, \mathcal{M}(\mathbf{x})\, \mathcal{B}^T(\mathbf{x})\, \mathcal{S}, \qquad \mathcal{Q}(\mathbf{x}) = \mathbb{1} - \mathcal{P}(\mathbf{x}).$   (S7)

$\mathcal{P}$ and $\mathcal{Q}$ are idempotent, $\mathcal{P}^2 = \mathcal{P}$ and $\mathcal{Q}^2 = \mathcal{Q}$. Furthermore, $\mathcal{P}$ and $\mathcal{Q}$ satisfy the relations

$\mathcal{P}\, \mathcal{B} = \mathcal{B}, \qquad \mathcal{Q}\, \mathcal{B} = 0.$   (S8)

Computing the transpose of $\mathcal{P}$ and $\mathcal{Q}$ yields

$\mathcal{P}^T = \mathcal{S}\, \mathcal{B}\, \mathcal{M}\, \mathcal{B}^T \neq \mathcal{P},$   (S9)

and analogously for $\mathcal{Q}$. Equation (S9) shows that $\mathcal{P}$, and therefore also $\mathcal{Q}$, is not symmetric. However, $\mathcal{P}$ satisfies the convenient property

$\mathcal{S}\, \mathcal{P} = \mathcal{P}^T \mathcal{S},$   (S10)

which implies

$\mathcal{S}\, \mathcal{Q} = \mathcal{Q}^T \mathcal{S},$   (S11)

and similarly for . The product of with yields

(S12)

and

(S13)

Let the matrix $\mathcal{E}$ be defined by

$\mathcal{E} = \mathcal{S}\, \mathcal{B}\, \mathcal{M}^2\, \mathcal{B}^T\, \mathcal{S}.$   (S14)

$\mathcal{E}$ is symmetric,

$\mathcal{E}^T = \left(\mathcal{S}\, \mathcal{B}\, \mathcal{M}^2\, \mathcal{B}^T\, \mathcal{S}\right)^T = \mathcal{S}\, \mathcal{B}\, \mathcal{M}^2\, \mathcal{B}^T\, \mathcal{S} = \mathcal{E},$   (S15)

and satisfies

(S16)

Transposing yields

(S17)

The projectors $\mathcal{P}$ and $\mathcal{Q}$ are used to partition the state as

$\mathbf{x} = \mathcal{P}\, \mathbf{x} + \mathcal{Q}\, \mathbf{x}.$   (S18)

The controlled state equation (S2) is split in two parts by multiplying with $\mathcal{P}$ and with $\mathcal{Q}$ from the left,

$\mathcal{P}\, \dot{\mathbf{x}} = \mathcal{P}\, \mathbf{R}(\mathbf{x}) + \mathcal{B}\, \mathbf{u}, \qquad \mathcal{Q}\, \dot{\mathbf{x}} = \mathcal{Q}\, \mathbf{R}(\mathbf{x}).$   (S19)

The initial and terminal conditions are split up as well,

$\mathcal{P}\, \mathbf{x}(t_0) = \mathcal{P}\, \mathbf{x}_0, \qquad \mathcal{Q}\, \mathbf{x}(t_0) = \mathcal{Q}\, \mathbf{x}_0,$   (S20)
$\mathcal{P}\, \mathbf{x}(t_1) = \mathcal{P}\, \mathbf{x}_1, \qquad \mathcal{Q}\, \mathbf{x}(t_1) = \mathcal{Q}\, \mathbf{x}_1.$   (S21)

Multiplying the first equation of Eq. (S19) by $\mathcal{M}\, \mathcal{B}^T\, \mathcal{S}$ from the left and using $\mathcal{M}\, \mathcal{B}^T\, \mathcal{S}\, \mathcal{B} = \mathbb{1}$ yields an expression for the control in terms of the controlled state trajectory $\mathbf{x}(t)$,

$\mathbf{u} = \mathcal{M}\, \mathcal{B}^T\, \mathcal{S}\left(\dot{\mathbf{x}} - \mathbf{R}(\mathbf{x})\right).$   (S22)
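
As a quick numerical check of Eq. (S22) (assuming the reconstructed definitions of $\mathcal{M}$ and $\mathcal{S}$ above), any velocity generated by the controlled state equation returns the control exactly:

```python
# Numerical check of Eq. (S22): for any xdot generated by Eq. (S2),
# u = M B^T S (xdot - R) recovers the control. Matrices as in Eqs. (S5), (S7).
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2
B = rng.standard_normal((n, m))
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)       # symmetric positive definite weights
M = np.linalg.inv(B.T @ S @ B)    # Eq. (S5)

R = rng.standard_normal(n)        # nonlinearity evaluated at some state x
u = rng.standard_normal(m)        # arbitrary control vector
xdot = R + B @ u                  # controlled state equation (S2)

u_recovered = M @ B.T @ S @ (xdot - R)
print(np.allclose(u_recovered, u))   # True
```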

The matrix is defined by

(S23)

The matrix can be used to rewrite the matrices and as

(S24)

respectively.
The solution for $\mathbf{u}$ is inserted in the stationarity condition Eq. (S1) to yield

$\epsilon^2\, \mathcal{M}\, \mathcal{B}^T\, \mathcal{S}\left(\dot{\mathbf{x}} - \mathbf{R}(\mathbf{x})\right) = \mathcal{B}^T \boldsymbol{\lambda}.$   (S25)

Equation (S25) is utilized to eliminate any occurrence of the part $\mathcal{P}^T\boldsymbol{\lambda}$ in all equations. In contrast to the state $\mathbf{x}$, cf. Eq. (S18), the co-state $\boldsymbol{\lambda}$ is split up with the transposed projectors $\mathcal{P}^T$ and $\mathcal{Q}^T$,

$\boldsymbol{\lambda} = \mathcal{P}^T \boldsymbol{\lambda} + \mathcal{Q}^T \boldsymbol{\lambda}.$   (S26)

Multiplying the transpose of Eq. (S25) with $\mathcal{M}\, \mathcal{B}^T\, \mathcal{S}$ from the right and using Eq. (S24) yields an expression for $\boldsymbol{\lambda}^T \mathcal{P}$,

$\boldsymbol{\lambda}^T \mathcal{P} = \epsilon^2 \left(\dot{\mathbf{x}} - \mathbf{R}(\mathbf{x})\right)^T \mathcal{E}.$   (S27)

Transposing the last equation and exploiting the symmetry of $\mathcal{E}$, Eq. (S15), yields

$\mathcal{P}^T \boldsymbol{\lambda} = \epsilon^2\, \mathcal{E}\left(\dot{\mathbf{x}} - \mathbf{R}(\mathbf{x})\right).$   (S28)

Equation (S28) is valid for all times, so that we can apply the time derivative to get

(S29)

The shorthand notations

(S30)

were introduced in Eq. (S29). Splitting the co-state as in Eq. (S26) and using Eq. (S28) to eliminate $\mathcal{P}^T\boldsymbol{\lambda}$ leads to

(S31)

Equation (S31) is an expression for $\frac{d}{dt}\left(\mathcal{P}^T\boldsymbol{\lambda}\right)$ independent of $\mathcal{P}^T\boldsymbol{\lambda}$.
A similar procedure is performed for the adjoint equation (S3). Eliminating the control signal $\mathbf{u}$ from Eq. (S3) gives

(S32)

To shorten the notation, the matrix

(S33)

with entries

(S34)

was introduced. With the help of the projectors and , Eq. (S32) is split up in two parts,

(S35)
(S36)

Using Eq. (S28) to eliminate $\mathcal{P}^T\boldsymbol{\lambda}$ in Eqs. (S35) and (S36) results in

(S37)
(S38)

with the abbreviation defined as the vector

(S39)

Equations (S31) and (S37) are two independent expressions for $\frac{d}{dt}\left(\mathcal{P}^T\boldsymbol{\lambda}\right)$. Combining them yields a second order differential equation independent of $\mathcal{P}^T\boldsymbol{\lambda}$ and $\mathbf{u}$,

(S40)

Equation (S40) contains several time-dependent matrices which can be simplified. From Eq. (S17) follows for the time derivative of

(S41)

Furthermore, from

(S42)

follows

(S43)

due to the idempotence of projectors. Using Eqs. (S41) and (S43) in Eq. (S40) yields

(S44)

The form of Eq. (S44) together with

(S45)

makes it obvious that it contains no component in the "direction" $\mathcal{Q}$. Equation (S44) constitutes $m$ linearly independent second order differential equations for the $m$ linearly independent state components $\mathcal{P}\mathbf{x}$. The boundary conditions necessary to solve Eq. (S44) are given by Eqs. (S20) and (S21).
To summarize the derivation, the rearranged necessary optimality conditions are