# Convexity and monotonicity in nonlinear optimal control under uncertainty

Kevin J. Kircher and K. Max Zhang

The authors are with the Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY 14853. {kjk82,kz33}@cornell.edu

###### Abstract

We consider the problem of finite-horizon optimal control design under uncertainty for imperfectly observed discrete-time systems with convex costs and constraints. It is known that this problem can be cast as an infinite-dimensional convex program when the dynamics and measurements are linear, uncertainty is additive, and the risks associated with constraint violations and excessive costs are measured in expectation or in the worst case. In this paper, we extend this result to systems with convex or concave dynamics, nonlinear measurements, more general uncertainty structures and other coherent risk measures. In this setting, the optimal control problem can be cast as an infinite-dimensional convex program if (1) the costs, constraints and dynamics satisfy certain monotonicity properties, and (2) the measured outputs can be reversibly ‘purified’ of the influence of the control inputs through Q- or Youla-parameterization. The practical value of this result is that the finite-dimensional subproblems arising in a variety of suboptimal control methods, notably including model predictive control and the Q-design procedure, are also convex for this class of nonlinear systems. Subproblems can therefore be solved to global optimality using convenient modeling software and efficient, reliable solvers. We illustrate these ideas in a numerical example.

Index Terms: Nonlinear systems, optimal control, convex optimization, model predictive control, Q-design, Youla parameterization, scenario optimization, sample-average approximation.

## I Introduction

We consider the problem of operating a system that evolves in discrete time steps over a finite horizon under uncertainty. Our goal is to design an output feedback control policy that minimizes a convex cost while satisfying convex constraints. This problem has been studied in the stochastic setting, where uncertainty is modeled probabilistically, the expected cost is minimized, and constraints are enforced in expectation or in probability. [1, 2, 3, 4, 5, 6] It has also been studied in the robust setting, where uncertainty is assumed to come from a given set, the worst-case cost is minimized, and constraints are enforced for all possible realizations of the uncertain influences on the system. [7, 8, 9, 10, 11] We accommodate both approaches in this paper, as well as others, adopting the flexible view of risk developed in [12, 13, 14, 15, 16, 17].

Optimal control under uncertainty is a hard problem, even when the costs and constraints are convex. This is due in part to the fact that the optimization variable is a policy, a collection of functions that map measured system outputs into control inputs at each time step. In the stochastic setting, optimal policies can, in principle, be found analytically using stochastic dynamic programming. [18] In practice, analytical solution is typically limited to systems of very low dimension. Notable exceptions are linear systems with additive uncertainty, no constraints, and costs that are either quadratic [19, 20] or exponential-of-quadratic [21, 22].

When analytical solution is impractical, various methods can generate suboptimal solutions that often perform well. Some examples are classical linear feedback control design, reinforcement learning [23, 24, 25, 26] and approximate dynamic programming [4, 27, 28, 29], approximation methods for multistage stochastic programming [6, 30, 31, 32, 33] and robust optimization with recourse [33, 34, 35, 36], the Q-design procedure [37, 38, 39, 40, 41, 42], and model predictive control (MPC) in its certainty-equivalent [43, 44, 45], stochastic [46, 47, 48, 49, 50, 51, 52], and robust [53, 54, 55, 56, 57] variants. When perfect state information is not available, control methods may be paired with a state estimator such as a linear [58, 59], extended [60, 61, 62] or unscented [63, 64] Kalman filter or a particle filter [65, 66].

These methods vary widely in their scope, scalability and performance. A common theme, however, is that they tend to work best for linear systems. This is due in part to the fact that for linear systems, (an equivalent transformation of) the optimal control problem is convex. [41] Suboptimal control methods often involve numerically solving optimization subproblems generated by the original optimal control problem. The convexity of the original problem typically carries over to the subproblems, allowing them to be efficiently and reliably solved to global optimality. When subproblems are nonconvex, however, guarantees of global optimality are generally unavailable, and solvers and initial guesses may need to be carefully tailored to the applications at hand.

In [41], Skaf and Boyd demonstrate that for linear systems with additive uncertainty, the control design problem can be transformed to an infinite-dimensional convex program. Their method hinges on a change of variables related to the Q- or Youla-parameterization [67, 68, 69] and to purified output feedback control [2, 10, 34]. When perfect state information is available, this change of variables parameterizes state feedback policies by equivalent disturbance feedback policies. Similar arguments have justified the use of (typically affine) disturbance feedback policies in robust and stochastic MPC of perfectly observed linear systems. [42, 47, 70, 71, 72, 73, 74]

For nonlinear systems, the optimal control problem is widely understood to be nonconvex due to nonlinear equality constraints introduced by nonlinear dynamics. Equality constraints can be eliminated, however, by iteratively applying the dynamics to express the state trajectory in terms of the control and exogenous input trajectories. In [75], Rantzer and Bernhardsson observe that in convex-monotone systems, where the dynamics are convex and nondecreasing in the state and control input, the state trajectory is convex in the control input trajectory. In [76], Schmitt et al. generalize this result to convex-state-monotone systems, where the dynamics need not be monotone in the control input.

An immediate consequence of the observations in [75, 76] is that for some nonlinear systems, open-loop optimal control (where the decision variable is a fixed control trajectory, rather than a feedback policy) is a convex optimization problem. This holds for convex-state-monotone systems in particular, provided the cost and constraints are nondecreasing in the states. This raises two further questions: Are there other nonlinear systems for which open-loop optimal control is convex? What about closed-loop optimal control, where we optimize over policies?

This discussion motivates the definition of a convex system as one for which open-loop optimal control is a convex optimization problem. After setting the stage in §II, we establish three results for convex systems in this paper.

1. Characterization (§III): Systems with mixed convex and linear dynamics are convex systems, provided (a) any linear dynamics are independent of states with nonlinear dynamics, and (b) the cost, constraints and nonlinear dynamics are nondecreasing in the states with nonlinear dynamics. Concave dynamics can also be accommodated.

2. Convex closed-loop design (§IV): If the measured outputs can be reversibly ‘purified’ of the influence of the control inputs, then the closed-loop optimal control problem for convex systems can be transformed to an equivalent infinite-dimensional convex program. The transformation involves changing variables to policies in the purified outputs via Q- or Youla-parameterization.

3. Approximate solution (§V and §VI): The finite-dimensional subproblems arising in a variety of suboptimal control methods, notably including MPC and the Q-design procedure, are convex for convex systems. Subproblems can therefore be solved to global optimality using convenient modeling software and efficient, reliable solvers. We illustrate this in a numerical example.

## II Problem statement

### II-A System

We consider a system to be operated over a finite discrete time span $t = 0, \dots, T$. We can influence the system through the control inputs $u_0, \dots, u_{T-1}$. The system is also influenced by exogenous inputs $\delta_0, \dots, \delta_T$, which are generally uncertain. (Footnote 1: We view the exogenous input as a random vector defined on an underlying probability space. We do not assume that the joint distribution of the exogenous input trajectory is known, but we do assume this distribution is independent of the control input trajectory.) The exogenous inputs may include process disturbances, sensor noise, initial states, uncertain model parameters, prices, command or reference signals, etc. The exogenous inputs could come from bounded or unbounded sets, and need not be independent or identically distributed over time.

The control and exogenous inputs determine the states through the system dynamics:

$$
\begin{aligned}
x_0 &= f_0(\delta_0) \\
x_t &= f_t(x_{t-1}, u_{t-1}, \delta_t), \quad t = 1, \dots, T.
\end{aligned} \tag{1}
$$

We observe the system through the measured outputs

$$
\begin{aligned}
y_0 &= g_0(x_0, \delta_0) \\
y_t &= g_t(x_t, u_{t-1}, \delta_t), \quad t = 1, \dots, T-1.
\end{aligned} \tag{2}
$$

We assume that the dynamics mappings $f_t$ and measurement mappings $g_t$ are known. At each time $t$, the controller receives the measured output $y_t$. It decides the control input

$$
u_t = \pi_t(y_0, \dots, y_t)
$$

by evaluating an output feedback control law $\pi_t$. The control policy $\pi = (\pi_0, \dots, \pi_{T-1})$ is designed in advance; this design problem is the subject of this paper.

To simplify specification of the policy design problem, we work with the input, state and output trajectories. By iteratively applying the dynamics, the state trajectory can be expressed in terms of the control and exogenous input trajectories as

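$$
x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_T \end{bmatrix}
= \begin{bmatrix} \phi_0(\delta_0) \\ \phi_1(u_0, \delta_{0:1}) \\ \vdots \\ \phi_T(u_{0:T-1}, \delta_{0:T}) \end{bmatrix}
= \phi(u, \delta).
$$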

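Here the subscript $s{:}t$ denotes a trajectory from time $s$ to $t$. For example, $u_{0:t-1} = (u_0, \dots, u_{t-1})$ and $\delta_{0:t} = (\delta_0, \dots, \delta_t)$. The input-state mappings $\phi_t$ are defined by the recursion

$$
\begin{aligned}
\phi_0(\delta_0) &= f_0(\delta_0) \\
\phi_1(u_0, \delta_{0:1}) &= f_1(\phi_0(\delta_0), u_0, \delta_1) \\
\phi_t(u_{0:t-1}, \delta_{0:t}) &= f_t(\phi_{t-1}(u_{0:t-2}, \delta_{0:t-1}), u_{t-1}, \delta_t), \quad t = 2, \dots, T.
\end{aligned}
$$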

Similarly, the measured output trajectory can be written as

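$$
y = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{T-1} \end{bmatrix}
= \begin{bmatrix} \psi_0(\delta_0) \\ \psi_1(u_0, \delta_{0:1}) \\ \vdots \\ \psi_{T-1}(u_{0:T-2}, \delta_{0:T-1}) \end{bmatrix}
= \psi(u, \delta),
$$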

where the input-output mappings are defined recursively by

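$$
\begin{aligned}
\psi_0(\delta_0) &= g_0(f_0(\delta_0), \delta_0) \\
\psi_t(u_{0:t-1}, \delta_{0:t}) &= g_t(\phi_t(u_{0:t-1}, \delta_{0:t}), u_{t-1}, \delta_t), \quad t = 1, \dots, T-1.
\end{aligned}
$$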
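The two recursions above amount to a forward simulation. The following is a minimal Python sketch of that simulation, not the authors' code: the callables `f0`, `f`, `g0`, `g` and the toy scalar system are hypothetical stand-ins for the dynamics and measurement mappings.

```python
def rollout(f0, f, g0, g, u, delta):
    """Compute x = phi(u, delta) and y = psi(u, delta) by forward recursion.

    f0(d), f(t, x, u, d), g0(x, d) and g(t, x, u, d) are hypothetical
    callables standing in for the dynamics and measurement mappings.
    u holds u_0..u_{T-1}; delta holds delta_0..delta_T.
    """
    T = len(u)
    x = [f0(delta[0])]                 # x_0 = f_0(delta_0)
    y = [g0(x[0], delta[0])]           # y_0 = g_0(x_0, delta_0)
    for t in range(1, T + 1):
        x.append(f(t, x[t - 1], u[t - 1], delta[t]))
        if t <= T - 1:                 # outputs are defined up to time T-1
            y.append(g(t, x[t], u[t - 1], delta[t]))
    return x, y


# Toy scalar example (illustrative numbers only).
x, y = rollout(
    f0=lambda d: d,
    f=lambda t, x, u, d: 0.5 * x + u + d,
    g0=lambda x, d: x,
    g=lambda t, x, u, d: x,
    u=[1.0, 2.0],
    delta=[1.0, 0.0, 0.0],
)
```

The state list has $T + 1$ entries and the output list has $T$ entries, matching the index ranges in (1) and (2).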

We also write the control input trajectory as

$$
u = \pi(y) = \begin{bmatrix} \pi_0(y_0) \\ \pi_1(y_{0:1}) \\ \vdots \\ \pi_{T-1}(y_{0:T-1}) \end{bmatrix}.
$$

Figure 1 illustrates the system.

### II-B Cost, constraints and risk measures

We are interested in designing the output feedback control policy $\pi$. The policy must be causal, meaning each control law can depend on outputs from the past or present, but not the future. We denote the set of causal output feedback policies by $\Pi$. Our goal is to find a $\pi \in \Pi$ that minimizes a cost

$$
c_0(x, u, \delta) \tag{3}
$$

while satisfying constraints

$$
c_j(x, u, \delta) \le 0, \quad j = 1, \dots, J. \tag{4}
$$

We assume that each scalar-valued function $c_j$ is convex in $(x, u)$ for all $\delta$.

Because the exogenous inputs are uncertain, the goals of minimizing cost and satisfying constraints are ambiguous. Should they be accomplished on average with respect to the distribution of $\delta$, for all possible realizations of $\delta$, or somewhere in between? Resolving these ambiguities amounts to choosing measures of risk. (Footnote 2: For more on measuring risk, a rich subject that has received much recent attention, we refer the reader to [12, 13, 14, 15, 16, 17].) A risk measure is a functional that maps uncertain scalars into deterministic scalars (or possibly $\infty$). It quantifies the risk associated with positive realizations of its uncertain argument. In the language of [12, 13], the risk measure $\mathcal{R}_j$ should be associated with constraint $j$ if the designer views the risk of constraint violation as acceptable whenever

$$
\mathcal{R}_j c_j(x, u, \delta) \le 0.
$$

The risks associated with excessive costs can be treated similarly to the risks associated with constraint violations, because minimizing $\mathcal{R}_0 c_0(x, u, \delta)$ is equivalent to minimizing a deterministic scalar subject to the constraint that $\mathcal{R}_0 c_0(x, u, \delta)$ does not exceed that scalar.

Table I describes five risk measures commonly used in optimization under uncertainty. The expected value is risk-neutral: it weighs constraint violations and slacks equally, modeling the risk of constraint violation as acceptable if slacks balance out violations on average. The worst-case value, by contrast, is maximally risk-averse: it models any possibility of constraint violation as unacceptable. The value-at-risk at confidence level $\beta \in (0, 1)$, denoted $\mathrm{VaR}_\beta$, is the smallest threshold that, with $100\beta$% confidence, will not be exceeded. Typical values of $\beta$ are 0.9, 0.95 and 0.99. $\mathrm{VaR}_\beta$ generates chance constraints of the form

$$
\mathbf{P}\{c_j(x, u, \delta) \le 0\} \ge \beta.
$$

The conditional value-at-risk at confidence level $\beta$, denoted $\mathrm{CVaR}_\beta$, is the conditional expectation, conditioned on the event that $\mathrm{VaR}_\beta$ is exceeded. [14, 15] $\mathrm{CVaR}_\beta$ always upper bounds $\mathrm{VaR}_\beta$. The $\mathrm{CVaR}_\beta$ of $c_j(x, u, \delta)$ can be constrained by minimizing a scalar $\alpha$ subject to

$$
\mathbf{E}\,(c_j(x, u, \delta) - \alpha)_+ \le -\alpha (1 - \beta),
$$

where $(\cdot)_+$ denotes the positive part, $(z)_+ = \max\{z, 0\}$. An optimal $\alpha$ upper bounds the $\mathrm{VaR}_\beta$ of $c_j(x, u, \delta)$; this bound is usually tight. [15]
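The constraint above can be traced to the standard minimization representation of the conditional value-at-risk. The following worked step is a sketch that assumes this representation; $X$ is a shorthand, introduced here, for the uncertain scalar $c_j(x, u, \delta)$, and $\beta$ is the confidence level.

```latex
\mathrm{CVaR}_\beta(X)
  = \min_{\alpha \in \mathbf{R}}
    \left\{ \alpha + \frac{1}{1 - \beta}\, \mathbf{E}\,(X - \alpha)_+ \right\}
\quad \Longrightarrow \quad
\mathrm{CVaR}_\beta(X) \le 0
  \iff
  \exists\, \alpha : \ \mathbf{E}\,(X - \alpha)_+ \le -\alpha (1 - \beta).
```

The equivalence follows by multiplying the inequality $\alpha + \mathbf{E}\,(X - \alpha)_+ / (1 - \beta) \le 0$ by $1 - \beta > 0$. Because the left side of the final inequality is nonnegative, any feasible $\alpha$ is nonpositive.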

In this paper, we leave the choice of risk measures to the designer. This flexibility makes the results in the following sections applicable in a range of settings, including the robust and stochastic frameworks and others. We do, however, impose one restriction on the risk measures: we assume that they are coherent in the sense of [12, 13, 16, 17]. By this, we mean that each risk measure $\mathcal{R}$ is convex, nondecreasing, lower semicontinuous, and preserves certainty (i.e., $\mathcal{R} a = a$ for constant $a$). We make this assumption primarily because coherent risk measures preserve convexity: if $\mathcal{R}_j$ is coherent and $c_j(x, u, \delta)$ is convex in $(x, u)$ for all $\delta$, then $\mathcal{R}_j c_j(x, u, \delta)$ is convex in $(x, u)$. [16]

All but one of the risk measures mentioned above are coherent. The exception is $\mathrm{VaR}_\beta$, which is convex only in structured special cases. We note, however, that $\mathrm{VaR}_\beta$ can be approximated in a convexity-preserving fashion using sampling [77, 78, 79, 80, 81, 82] or conservative upper bounds such as $\mathrm{CVaR}_\beta$ [6, 50].

In summary, we are interested in the following problem of optimal control under uncertainty:

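$$
\begin{array}{ll}
\underset{\pi \in \Pi}{\text{minimize}} & \mathcal{R}_0 c_0(x, u, \delta) \\
\text{subject to} & \mathcal{R}_j c_j(x, u, \delta) \le 0, \quad j = 1, \dots, J \\
& x = \phi(u, \delta) \\
& u = \pi(y) \\
& y = \psi(u, \delta).
\end{array} \tag{OC}
$$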

This problem is intractable in its current form. This is due in part to the fact that the decision variable is infinite-dimensional, an obstacle that we will address approximately in §V. Problem (OC) is also complicated by the interdependence between the state, output and control trajectories. We will unravel this interdependence in §IV using a nonlinear change of variables. First, we will consider the case of open-loop control, where the control laws are restricted to be constant.

## III Convex systems

To understand when closed-loop control design is a convex optimization problem, we begin by building intuition in the simpler context of open-loop control. In this context, a (finite-dimensional) vector $\tilde{u}$ of control inputs is decided in advance and implemented without feedback. This amounts to drastically restricting the search space from the set of all causal output feedback policies to the subspace of constant policies. With this restriction, the open-loop optimal control problem is to

$$
\begin{array}{ll}
\underset{\tilde{u}}{\text{minimize}} & \mathcal{R}_0 c_0(\phi(\tilde{u}, \delta), \tilde{u}, \delta) \\
\text{subject to} & \mathcal{R}_j c_j(\phi(\tilde{u}, \delta), \tilde{u}, \delta) \le 0, \quad j = 1, \dots, J.
\end{array} \tag{OLOC}
$$

This is a finite-dimensional optimization problem in the vector $\tilde{u}$. Because coherent risk measures preserve convexity [16], Problem (OLOC) is convex if each cost and constraint function $c_j$, when composed with the input-state mapping $\phi$, is convex in $\tilde{u}$ for all $\delta$. This motivates the following definition.

###### Definition 1 (Convex system).

We call the system with dynamics (1), cost (3) and constraints (4) a convex system if the functions

$$
u \mapsto c_j(\phi(u, \delta), u, \delta), \quad j = 0, \dots, J
$$

are convex for all $\delta$.

By definition, open-loop optimal control of convex systems is a convex optimization problem. In §IV, we will show that closed-loop optimal control of convex systems is also convex, provided the outputs can be reversibly ‘purified’ of the influence of the control inputs. First, we will characterize a class of convex systems.

### III-A Characterizing convex systems

To understand when a system is convex, we need to understand how the states and controls propagate through the dynamics (1) into the cost (3) and constraints (4). If the dynamics are linear, then this process is straightforward. The state trajectory $x = \phi(u, \delta)$ is affine in the control trajectory $u$ for all $\delta$. Convexity is preserved under composition with an affine mapping [83], so for linear systems $c_j(\phi(u, \delta), u, \delta)$ is convex in $u$ for all $\delta$.

When the system is nonlinear, the process is less straightforward. We do not explore it in general here. Instead, we consider a class of systems with the following dynamics:

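$$
x_t = \begin{bmatrix} x_t^{\mathrm{aff}} \\ x_t^{\mathrm{cvx}} \end{bmatrix}
= \begin{bmatrix} A_t(\delta_t) x_{t-1}^{\mathrm{aff}} + B_t(\delta_t) u_{t-1} + w_t(\delta_t) \\ h_t(x_{t-1}^{\mathrm{aff}}, x_{t-1}^{\mathrm{cvx}}, u_{t-1}, \delta_t) \end{bmatrix}. \tag{5}
$$

For this class of systems, convexity can be established using simple composition rules. An important restriction is that the states $x_t^{\mathrm{aff}}$ with linear dynamics are independent of the nonlinear states $x_t^{\mathrm{cvx}}$. This ensures that the trajectory $x^{\mathrm{aff}}$ is affine in the control trajectory. Any states with linear dynamics that depend on nonlinear states are included in $x_t^{\mathrm{cvx}}$.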

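###### Theorem 1.

The system with dynamics (5), cost (3) and constraints (4) is a convex system if the following conditions hold for all $\delta$.

1. For $t = 1, \dots, T$, each row of the nonlinear dynamics mapping $h_t$ is

   1. jointly convex in $(x_{t-1}^{\mathrm{aff}}, x_{t-1}^{\mathrm{cvx}}, u_{t-1})$, and

   2. nondecreasing in each element of $x_{t-1}^{\mathrm{cvx}}$.

2. For $j = 0, \dots, J$, the function

   $$
   (x^{\mathrm{aff}}, x^{\mathrm{cvx}}, u) \mapsto c_j((x^{\mathrm{aff}}, x^{\mathrm{cvx}}), u, \delta)
   $$

   is nondecreasing in each element of $x^{\mathrm{cvx}}$.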

Appendix A contains a proof of Theorem 1. The proof hinges on the fact that the composition of a convex nondecreasing function with a convex function is convex. [83]
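As a numerical sanity check of Theorem 1, the sketch below samples random pairs of control trajectories and tests the midpoint convexity inequality for a toy scalar system. The dynamics `h`, the cost and all numerical values are illustrative assumptions, chosen here so that `h` is jointly convex in the state and input and nondecreasing in the state; they are not taken from the paper.

```python
import numpy as np


def h(x, u, d):
    # Jointly convex in (x, u) and nondecreasing in x (illustrative choice).
    return max(0.5 * x, x + u) + u ** 2 + d


def cost(u, delta):
    """Open-loop cost: sum of states plus sum of squared inputs."""
    x = delta[0]                       # x_0 = f_0(delta_0) = delta_0
    total = x
    for t in range(1, len(u) + 1):
        x = h(x, u[t - 1], delta[t])
        total += x + u[t - 1] ** 2
    return total


def max_midpoint_violation(T=5, trials=1000, seed=0):
    """Largest observed value of c(midpoint) minus the endpoint average."""
    rng = np.random.default_rng(seed)
    worst = -np.inf
    for _ in range(trials):
        delta = rng.normal(size=T + 1)
        u, v = rng.uniform(-1, 1, size=(2, T))
        gap = cost(0.5 * (u + v), delta) - 0.5 * (cost(u, delta) + cost(v, delta))
        worst = max(worst, gap)
    return worst


violation = max_midpoint_violation()
```

A nonpositive `violation` (up to rounding error) is consistent with convexity of the composed cost. A random test of this kind can expose a violation but cannot prove convexity.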

Three remarks are in order. First, system (5) is a straightforward generalization of the convex-state-monotone system studied in [76], which in turn generalizes the convex-monotone system studied in [75]. A precise name for system (5) would be convex-nonlinear-state-monotone, since the monotonicity requirement applies only to the states with nonlinear dynamics. The purpose of this generalization is to absorb the class of linear systems for which optimal control was shown in [41] to be convex. In particular, the linear-quadratic system is not convex-monotone or convex-state-monotone in general, but is a special case of system (5) with linear dynamics, additive uncertainty, no constraints and quadratic cost.

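Second, concave dynamics can be included in system (5) after a sign change. For example, we consider a system with

$$
\begin{aligned}
z_t^{\mathrm{cvx}} &= h_t^{\mathrm{cvx}}(x_{t-1}^{\mathrm{aff}}, z_{t-1}^{\mathrm{cvx}}, z_{t-1}^{\mathrm{ccv}}, u_{t-1}, \delta_t) \\
z_t^{\mathrm{ccv}} &= h_t^{\mathrm{ccv}}(x_{t-1}^{\mathrm{aff}}, z_{t-1}^{\mathrm{cvx}}, z_{t-1}^{\mathrm{ccv}}, u_{t-1}, \delta_t),
\end{aligned}
$$

where each row of $h_t^{\mathrm{cvx}}$ is nondecreasing in $z_{t-1}^{\mathrm{cvx}}$, nonincreasing in $z_{t-1}^{\mathrm{ccv}}$ and convex. Similarly, each row of $h_t^{\mathrm{ccv}}$ is nondecreasing in $z_{t-1}^{\mathrm{ccv}}$, nonincreasing in $z_{t-1}^{\mathrm{cvx}}$ and concave. If we define the nonlinear state as

$$
x_t^{\mathrm{cvx}} = \begin{bmatrix} z_t^{\mathrm{cvx}} \\ -z_t^{\mathrm{ccv}} \end{bmatrix}
= \begin{bmatrix} h_t^{\mathrm{cvx}}(x_{t-1}^{\mathrm{aff}}, z_{t-1}^{\mathrm{cvx}}, z_{t-1}^{\mathrm{ccv}}, u_{t-1}, \delta_t) \\ -h_t^{\mathrm{ccv}}(x_{t-1}^{\mathrm{aff}}, z_{t-1}^{\mathrm{cvx}}, z_{t-1}^{\mathrm{ccv}}, u_{t-1}, \delta_t) \end{bmatrix}
= h_t(x_{t-1}^{\mathrm{aff}}, x_{t-1}^{\mathrm{cvx}}, u_{t-1}, \delta_t),
$$

then each row of $h_t$ is nondecreasing in $x_{t-1}^{\mathrm{cvx}}$ and convex, as required by Theorem 1.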

Third, system (5) includes a rich class of nonlinear systems. Provided careful attention is paid to curvature and monotonicity, it can admit exponentials and logarithms, quadratic forms, roots, powers, nonsmooth functions such as maxima, minima and absolute values, and sums and compositions of the above. Many more examples of nonlinear convex and concave functions can be found in §3 of [83]. Systems with nonlinear convex dynamics have arisen in applications ranging from cancer and HIV treatment scheduling [75] to voltage control in power systems [75] to freeway congestion management [76] to energy storage control [84, 85]. The drug treatment model in [75] is a special case of a more general class of bilinear systems that, through a logarithmic transformation, can be cast as convex systems.

## IV Convex closed-loop optimal control

By contrast to the open-loop optimal control problem (OLOC), the equality constraints in the closed-loop optimal control problem (OC) cannot be easily eliminated. This is due to the complicated interdependence between the state, output and control input trajectories. Under some conditions on the uncertainty structure, however, this interdependence can be disentangled through a nonlinear change of variables related to the Q- or Youla-parameterization [67, 68, 69] and purified output feedback control [2, 10, 34]. We now establish sufficient conditions for this change of variables to be possible.

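### IV-A Purifiability and Q-parameterization

###### Definition 2 (Purifiable).

We call the system with dynamics (1) and measurements (2) purifiable if for each $t = 0, \dots, T-1$, there exist mappings $p_t$, $q_t$ and $\xi_t$ such that for all $u_{0:t-1}$ and $\delta_{0:t}$,

$$
p_t(\psi_{0:t}(u_{0:t-1}, \delta_{0:t}), u_{0:t-1}) = \xi_t(\delta_{0:t})
\iff
q_t(\xi_{0:t}(\delta_{0:t}), u_{0:t-1}) = \psi_t(u_{0:t-1}, \delta_{0:t}).
$$

We call $e_t = \xi_t(\delta_{0:t})$ the purified output and $p_t$ the purifier.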

We note for future reference that $y_0$ is a function of $\delta_0$ only:

$$
y_0 = g_0(x_0, \delta_0) = g_0(f_0(\delta_0), \delta_0).
$$

Without loss of generality, therefore, we define $p_0$, $q_0$ and $\xi_0$ such that

$$
p_0(y_0) = y_0, \quad q_0(e_0) = e_0, \quad \xi_0(\delta_0) = g_0(f_0(\delta_0), \delta_0).
$$

###### Theorem 2.

If the system with dynamics (1) and measurements (2) is purifiable, then there exists a one-to-one correspondence between causal output feedback policies $\pi$ and causal policies $Q$ in the purified output $e$. Furthermore, given a causal $Q$ in $e$, the unique corresponding causal $\pi$ in $y$ can be constructed from the following recursion:

$$
\begin{aligned}
\pi_0(y_0) &= Q_0(y_0) \\
\pi_t(y_{0:t}) &= Q_t(p_{0:t}(y_{0:t}, \pi_{0:t-1}(y_{0:t-1}))), \quad t = 1, \dots, T-1.
\end{aligned} \tag{6}
$$

Theorem 2 is closely related to the nonlinear discrete-time Youla parameterization presented by Wu and Lall in [69]. We include a proof in Appendix B for completeness. We give some examples of purifiable nonlinear systems in §IV-B.
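The recursion in Theorem 2 can be run online by storing the output and input histories. The sketch below is one possible Python rendering under stated assumptions: the callables `Q` and `p`, the dynamics `f` and the numbers are hypothetical, and the purifier shown is the one for perfect state information with additive disturbances, a special case listed among the examples in §IV-B.

```python
def make_policy(Q, p):
    """Build an output feedback policy pi from a purified output policy Q.

    Q(t, e) maps the purified outputs e_0..e_t to u_t; p(t, y, u) maps the
    outputs y_0..y_t and past inputs u_0..u_{t-1} to e_t. Both are
    hypothetical callables.
    """
    ys, us, es = [], [], []

    def pi(y_t):
        t = len(ys)
        ys.append(y_t)
        es.append(p(t, ys, us))        # purify the newest output
        us.append(Q(t, es))            # evaluate Q on the purified history
        return us[-1]

    return pi, es


def f(x, u):
    return 0.5 * abs(x) + u            # illustrative nonlinear dynamics


def p(t, y, u):
    # Purifier for perfect state information and additive disturbances.
    return y[0] if t == 0 else y[t] - f(y[t - 1], u[t - 1])


delta = [1.0, 0.2, -0.3, 0.1]          # delta_0..delta_T with T = 3
pi, es = make_policy(Q=lambda t, e: -0.5 * e[-1], p=p)
x = delta[0]                           # x_0 = delta_0
for t in range(len(delta) - 1):
    u_t = pi(x)                        # perfect state information: y_t = x_t
    x = f(x, u_t) + delta[t + 1]
```

In this closed-loop run the stored purified outputs `es` reproduce the disturbances that have been realized so far, which is what allows a policy in the purified outputs to be implemented from measurements alone.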

Theorem 2 establishes that if the system is purifiable, then we can optimize over policies in the purified output $e$. We interpret $e$ as what remains of the output when the effect of the control inputs has been removed. Given a causal $Q$, the unique corresponding causal $\pi$ can be recovered. We denote the set of causal $Q$ by $\mathcal{Q}$.

Under this change of variables, an equivalent reformulation of the closed-loop optimal control problem (OC) is to

$$
\begin{array}{ll}
\underset{Q \in \mathcal{Q}}{\text{minimize}} & \mathcal{R}_0 c_0(x, u, \delta) \\
\text{subject to} & \mathcal{R}_j c_j(x, u, \delta) \le 0, \quad j = 1, \dots, J \\
& x = \phi(u, \delta) \\
& u = Q(e) \\
& e = \xi(\delta).
\end{array}
$$

The equality constraints can now be eliminated, giving another equivalent problem:

$$
\begin{array}{ll}
\underset{Q \in \mathcal{Q}}{\text{minimize}} & \mathcal{R}_0 c_0(\phi(Q(\xi(\delta)), \delta), Q(\xi(\delta)), \delta) \\
\text{subject to} & \mathcal{R}_j c_j(\phi(Q(\xi(\delta)), \delta), Q(\xi(\delta)), \delta) \le 0, \quad j = 1, \dots, J.
\end{array} \tag{OC-Q}
$$

Problem (OC-Q) is structurally identical to the open-loop optimal control problem (OLOC), except that the optimization is over (infinite-dimensional) purified output feedback policies $Q$ rather than (finite-dimensional) control input trajectories $\tilde{u}$. It follows that the two problems are convex under the same conditions, namely for convex systems. This observation, together with Theorem 2, gives the following result.

###### Corollary 3.

For purifiable systems, the output feedback policy design problem (OC) and the purified output feedback policy design problem (OC-Q) are equivalent. For convex systems, Problem (OC-Q) is convex.

We will see in §V that Corollary 3 establishes the basic tractability of a variety of suboptimal control schemes for nonlinear convex systems. First, we provide a few examples of purifiable systems.

### IV-B Purifiability examples

Purifiability is essentially an invertibility property on the input-output mapping with respect to the exogenous inputs. The requirements for purifiability are (1) at each time step, the influence of the control input history can be removed from the current output, possibly using the output history; and (2) this process must be reversible, in the sense that the current output can be reconstructed from the purified output history and the control input history. To ground this notion, we now provide some concrete examples of purifiable systems. This list is not exhaustive.

##### Measured exogenous inputs

If the exogenous inputs are measured exactly ($y_t = \delta_t$), then the system is purifiable with

$$
\begin{aligned}
e_t &= \delta_t \\
p_t(y_{0:t}, u_{0:t-1}) &= y_t \\
q_t(e_{0:t}, u_{0:t-1}) &= e_t.
\end{aligned}
$$

##### Pure estimation

If the controller can observe the system but not influence it, then the states and outputs can be expressed as

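$$
\begin{aligned}
x_t &= f_t(x_{t-1}, \delta_t) = \phi_t(\delta_{0:t}) \\
y_t &= g_t(x_t, \delta_t) = g_t(\phi_t(\delta_{0:t}), \delta_t) = \psi_t(\delta_{0:t}).
\end{aligned}
$$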

In this case, the system is trivially purifiable with $e_t = y_t$. This implies that various constrained estimation problems can be put in the form of Problem (OC-Q). For example, we consider the problem of designing a state estimator that minimizes the mean squared error in estimating the state, with prior knowledge of a condition that holds almost surely. This can be put in the form of Problem (OC-Q) by an appropriate choice of the control input, cost, constraints and risk measures.

##### Perfect state information, invertible dynamics

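If the states are measured exactly ($y_t = x_t$) and the dynamics are invertible in the exogenous inputs, i.e., there exist mappings $f_t^{-1}$ such that

$$
f_t^{-1}(f_t(x_{t-1}, u_{t-1}, \delta_t), x_{t-1}, u_{t-1}) = \delta_t,
$$

then the system is purifiable with

$$
\begin{aligned}
e_t &= \delta_t \\
p_t(y_{0:t}, u_{0:t-1}) &= f_t^{-1}(y_t, y_{t-1}, u_{t-1}) \\
q_t(e_{0:t}, u_{0:t-1}) &= \phi_t(u_{0:t-1}, e_{0:t}).
\end{aligned}
$$

In the special case of additive disturbances, $x_t = f_t(x_{t-1}, u_{t-1}) + \delta_t$, the purifier reduces to

$$
p_t(y_{0:t}, u_{0:t-1}) = y_t - f_t(y_{t-1}, u_{t-1}).
$$

##### Deterministic dynamics, invertible measurements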

If the initial state is measured exactly ($y_0 = x_0$), the dynamics are deterministic,

$$
x_t = f_t(x_{t-1}, u_{t-1}) = \phi_t(u_{0:t-1}, x_0),
$$

and the measurements are invertible in the exogenous inputs, i.e., there exist mappings $g_t^{-1}$ such that

$$
g_t^{-1}(g_t(x_t, u_{t-1}, \delta_t), x_t, u_{t-1}) = \delta_t,
$$

then the system is purifiable with

$$
\begin{aligned}
e_t &= \delta_t \\
p_t(y_{0:t}, u_{0:t-1}) &= g_t^{-1}(y_t, \phi_t(u_{0:t-1}, y_0), u_{t-1}) \\
q_t(e_{0:t}, u_{0:t-1}) &= g_t(\phi_t(u_{0:t-1}, e_0), u_{t-1}, e_t).
\end{aligned}
$$

In the special case of additive noise, $y_t = g_t(x_t, u_{t-1}) + \delta_t$, the purifier reduces to

$$
p_t(y_{0:t}, u_{0:t-1}) = y_t - g_t(\phi_t(u_{0:t-1}, y_0), u_{t-1}).
$$

##### State-affine dynamics and measurements, additive uncertainty

If the dynamics and measurements have the form

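$$
\begin{aligned}
f_t(x_{t-1}, u_{t-1}, \delta_t) &= A_t x_{t-1} + f_t^u(u_{t-1}) + w_t(\delta_t) \\
g_t(x_t, u_{t-1}, \delta_t) &= C_t x_t + g_t^u(u_{t-1}) + v_t(\delta_t),
\end{aligned}
$$

then it can be shown that the input-output mappings are additive:

$$
y_t = \psi_t(u_{0:t-1}, \delta_{0:t}) = \psi_t^u(u_{0:t-1}) + \psi_t^\delta(\delta_{0:t}).
$$

In this case, the system is purifiable with

$$
\begin{aligned}
e_t &= \psi_t^\delta(\delta_{0:t}) \\
p_t(y_{0:t}, u_{0:t-1}) &= y_t - \psi_t^u(u_{0:t-1}) \\
q_t(e_{0:t}, u_{0:t-1}) &= e_t + \psi_t^u(u_{0:t-1}).
\end{aligned}
$$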
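The additivity of the input-output mappings in the state-affine case can be checked by splitting the state into a control-driven part and a disturbance-driven part. The derivation below is a sketch; the symbols $x_t^u$ and $x_t^\delta$ are introduced here for illustration and do not appear elsewhere in the text.

```latex
\begin{aligned}
x_0^u &= 0, & x_t^u &= A_t x_{t-1}^u + f_t^u(u_{t-1}), \\
x_0^\delta &= f_0(\delta_0), & x_t^\delta &= A_t x_{t-1}^\delta + w_t(\delta_t), \\
x_t &= x_t^u + x_t^\delta && \text{(by induction on } t\text{)}, \\
y_t &= \underbrace{C_t x_t^u + g_t^u(u_{t-1})}_{\psi_t^u(u_{0:t-1})}
     + \underbrace{C_t x_t^\delta + v_t(\delta_t)}_{\psi_t^\delta(\delta_{0:t})},
     && t = 1, \dots, T-1.
\end{aligned}
```

The induction step uses only the linearity of $x_{t-1} \mapsto A_t x_{t-1}$; the mappings $f_t^u$, $g_t^u$, $w_t$ and $v_t$ may be nonlinear.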

## V Approximate solution methods

Although Problem (OC-Q) is convex for convex systems, it remains challenging for two reasons. First, the decision variable is infinite-dimensional. Second, the risk measures may be difficult to compute, or even ill-defined if some distributional information is lacking. We discuss methods for addressing the infinite-dimensionality in §V-A and for approximating risk measures in §V-B. The upshot of this discussion is that several suboptimal control methods that perform well for linear systems can be applied to nonlinear convex systems using finite-dimensional convex optimization.

### V-A Finite-dimensional restrictions

#### V-A1 Open-loop model predictive control

A simple, effective method for overcoming the challenge of infinite dimensionality is open-loop MPC. In this method, at each time step we solve a version of the (finite-dimensional) open-loop optimal control problem (OLOC) over a truncated, receding horizon. This generates a planned control input trajectory. We implement the first step in this plan, the system evolves, we update the state estimate and the process repeats. As discussed in §III, open-loop optimal control is convex for convex systems, even those with nonlinear dynamics.

In one common variant of open-loop MPC, typically called certainty-equivalent MPC, the subproblem at each time step is solved under a single prediction of the disturbance trajectory. [43, 44, 45] This amounts to measuring risk with the predicted value risk measure discussed in §II-B. Other risk measures can be used in open-loop MPC, however, and can significantly improve performance. [46, 48, 49, 50, 51, 52, 53, 54, 55, 57] Risk measures can be approximated if necessary, as discussed in §V-B.

Open-loop MPC has several advantages. First, it often performs well in practice. Second, the design process is straightforward and intuitive; the primary design decisions are the prediction horizon, the terminal cost and/or constraints with which to augment the subproblems, and the algorithms for prediction and state estimation. A third advantage, and perhaps an underappreciated one, is that open-loop MPC admits very general uncertainty structures, including additive, multiplicative and others. In particular, since it does not require the Youla-type change of variables discussed in §IV-A, open-loop MPC can be applied to systems that are not purifiable.

Open-loop MPC also has several disadvantages. First, its implementation is computationally intensive due to the use of online optimization; this can limit its scalability. (We note, however, that for linearly constrained linear systems with quadratic costs, open-loop MPC can be implemented efficiently using custom solvers that exploit the problem structure. [45]) Second, open-loop MPC yields no closed-form expression for the control policy $\pi$; this complicates analysis of closed-loop stability, robustness, etc. Third, the method’s performance can suffer somewhat due to the open-loop structure of the optimal control subproblems. This structure ignores the controller’s opportunity to respond to future information as it becomes available, i.e., the controller’s recourse. The Q-design procedure partially addresses these disadvantages.

#### V-A2 Q-design

In open-loop MPC, the policy design procedure is straightforward (choosing a few parameters and subroutines), but implementation is computationally intensive due to the use of online optimization. In the Q-design procedure [37, 38, 39, 40, 41, 42, 73], by contrast, policy design is computationally intensive, but implementation is extremely efficient. In particular, no online optimization is needed. Another important distinction is that unlike MPC, Q-design yields a closed-form control policy. This facilitates analysis and simulation of the closed-loop system.

In Q-design, we design a suboptimal causal purified output feedback policy

$$
Q_\theta = \sum_{k=1}^K \theta_k Q^{(k)}.
$$

Here $Q^{(1)}, \dots, Q^{(K)}$ are causal basis policies selected by the designer and $\theta \in \mathbf{R}^K$ is a parameter vector. The parameters are decided by solving a finite-dimensional analogue of Problem (OC-Q), with $Q$ replaced by $Q_\theta$:

$$
\begin{array}{ll}
\underset{\theta}{\text{minimize}} & \mathcal{R}_0 c_0(\phi(Q_\theta(\xi(\delta)), \delta), Q_\theta(\xi(\delta)), \delta) \\
\text{subject to} & \mathcal{R}_j c_j(\phi(Q_\theta(\xi(\delta)), \delta), Q_\theta(\xi(\delta)), \delta) \le 0, \quad j = 1, \dots, J \\
& Q_\theta(\xi(\delta)) = \sum_{k=1}^K \theta_k Q^{(k)}(\xi(\delta)).
\end{array} \tag{OC-$Q_\theta$}
$$

Convex constraints on $\theta$ can also be added, e.g., to cultivate a particular structure in the control policy. Given $Q_\theta$, the unique corresponding causal output feedback policy can be constructed from the recursion in Theorem 2.

Because $Q_\theta$ is affine in $\theta$ and convexity is preserved under composition with an affine mapping, the Q-design problem (OC-$Q_\theta$) inherits the convexity of Problem (OC-Q) for convex systems. It follows that Q-design can be applied to convex systems, even nonlinear ones, using finite-dimensional convex optimization. Risk measures in Problem (OC-$Q_\theta$) can be approximated if necessary, as discussed in §V-B.
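To make the parameterization concrete, the sketch below carries out Q-design for a deliberately simple case in which the design problem reduces to linear least squares: a scalar linear system with perfect state information and additive disturbances, a quadratic cost, no constraints, and the expected value approximated by a sample average. The system, the cost weight `rho`, the function name `q_design_linear` and the choice of one basis policy per causal pair of times are all assumptions made here for illustration. For a nonlinear convex system, the least-squares step would be replaced by a general convex solver.

```python
import numpy as np


def q_design_linear(a, T, rho, deltas):
    """Fit u_t = sum_{s <= t} theta[t, s] * e_s with e_s = delta_s.

    System (assumed): x_0 = delta_0, x_t = a x_{t-1} + u_{t-1} + delta_t.
    Cost (assumed): sum_t x_t^2 + rho * sum_t u_t^2, averaged over samples.
    deltas: array of shape (N, T + 1) holding sampled trajectories.
    """
    N = deltas.shape[0]
    # Stack the dynamics as x = G u + H delta.
    G = np.zeros((T + 1, T))
    H = np.zeros((T + 1, T + 1))
    for t in range(T + 1):
        for s in range(t + 1):
            H[t, s] = a ** (t - s)
        for s in range(t):
            G[t, s] = a ** (t - 1 - s)
    # One basis policy per causal pair (t, s): it outputs e_s at time t.
    pairs = [(t, s) for t in range(T) for s in range(t + 1)]
    rows, rhs = [], []
    for d in deltas:
        U = np.zeros((T, len(pairs)))          # u = U @ theta for this sample
        for k, (t, s) in enumerate(pairs):
            U[t, k] = d[s]
        rows.append(np.vstack([G @ U, np.sqrt(rho) * U]))
        rhs.append(np.concatenate([-(H @ d), np.zeros(T)]))
    A, b = np.vstack(rows), np.concatenate(rhs)
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)

    def avg_cost(th):
        return float(np.sum((A @ th - b) ** 2) / N)

    return theta, pairs, avg_cost


rng = np.random.default_rng(0)
deltas = rng.normal(size=(200, 4))             # N = 200 samples, T = 3
theta, pairs, avg_cost = q_design_linear(a=0.9, T=3, rho=0.1, deltas=deltas)
```

Comparing `avg_cost(theta)` with `avg_cost(np.zeros(len(theta)))` shows how much the fitted policy improves on applying no control for the sampled disturbances.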

In principle, Q-design can solve the optimal control problem (OC-Q) to any degree of accuracy for a sufficiently large basis. In practice, Q-design has two disadvantages. First, its performance depends intimately on the choice of basis policies. In [42], Skaf and Boyd explore the natural choice of affine policies in the context of linear systems. While affine Q-design performs well in a number of problems, the performance is typically not as good as open-loop MPC. To our knowledge, the general problem of finding good nonlinear basis policies has not yet been solved. A second disadvantage of Q-design is that, as discussed in §IV-A, it can be applied only to purifiable systems. If the system is not purifiable, then an implementable output feedback policy may not be recoverable from the Q-designed policy.

#### V-A3 Closed-loop model predictive control

The final approximate solution method we discuss, closed-loop MPC, can be viewed as receding horizon Q-design. As in open-loop MPC, in closed-loop MPC we solve an optimal control subproblem at each time step over a truncated, receding horizon. The key difference is that the closed-loop MPC subproblems include a recourse model, i.e., a model of the controller’s response to future information as it becomes available. More precisely, the open-loop MPC subproblems are truncated versions of the open-loop optimal control problem (OLOC), while the closed-loop MPC subproblems are truncated versions of the Q-design problem (OC-$Q_\theta$). Closed-loop MPC with affine recourse models is developed for linear systems in [42, 47, 50, 51, 56, 57].

Closed-loop MPC addresses the two disadvantages of Q-design. Like open-loop MPC, closed-loop MPC can be applied to systems that are not purifiable. To see this, we note that the first step of the recursion in Theorem 2,

$$\pi_0(y_0) = Q_0(y_0),$$

can always be implemented, even if later steps cannot. Closed-loop MPC also tends to be less sensitive than Q-design to the choice of basis policies; in closed-loop MPC the policy designed at each time step is merely a model of future recourse, while in Q-design the policy is the actual source of the implemented control inputs.

Closed-loop MPC can outperform open-loop MPC due to the inclusion of a recourse model. [42] Its design process is similarly straightforward; the only additional step is choosing basis policies. Like open-loop MPC, however, closed-loop MPC involves computationally intensive, online optimization. A closed-form expression for the closed-loop MPC policy is generally not available.

### V-B Risk measure approximation

The control methods discussed in §V-A all require solving finite-dimensional convex optimization subproblems under uncertainty. These subproblems can be solved exactly in a few special cases. In general, however, we must resort to approximate solution methods. [3, 6, 10]

We now describe one approximate solution method based on sampling. This method can accommodate each of the risk measures discussed in §II-B. We begin by obtaining $N$ samples $\delta^{(1)}, \dots, \delta^{(N)}$ from the distribution of the exogenous input trajectory $\delta$, or in the robust setting, from the corresponding uncertainty set. Samples could be obtained, e.g., from historical data or a pseudorandom number generator. Each risk measure is then replaced by an approximation $\hat{R}_j^N$ based on the samples $\delta^{(1)}, \dots, \delta^{(N)}$. This results in a convex optimization problem (with random input data) that can be solved using off-the-shelf software. Depending on the underlying risk measures, this sample-based approximate solution scheme is known as scenario optimization [77, 78, 79, 80, 81, 82] or sample-average approximation [6, 86, 87, 88].

Table II shows sample-based approximations for the five risk measures in Table I. The worst-case value and value-at-risk (which generate robust and chance constraints, respectively) can be approximated by maxima over the full sample. Expectation-type risk measures, including conditional value-at-risk, can be approximated using sample averages. Both of these approximations preserve the convexity of the underlying cost or constraint function. Theoretical bounds on violation probabilities for robust and chance constraints are available. [77, 78, 79, 80, 81, 82] Some convergence results for sample-average approximation can be found in [6, 86, 87, 88]. Variance reduction methods and hypothesis tests of solution quality can also be applied. [6]
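As a rough illustration of these sample-based approximations, the following Python sketch estimates the worst case, the expectation and the conditional value-at-risk of a scalar quantity from a list of samples. It is not the paper's MATLAB code: the function names and the toy data are assumptions made here. The CVaR estimate uses the standard Rockafellar-Uryasev form, minimizing $\alpha + \frac{1}{N(1-\beta)}\sum_i (z_i - \alpha)_+$ over $\alpha$, which may be written differently in Table II.

```python
def worst_case(z):
    """Sample approximation of the worst-case value: maximum over the full sample."""
    return max(z)


def sample_mean(z):
    """Sample-average approximation of the expectation."""
    return sum(z) / len(z)


def sample_cvar(z, beta):
    """Sample-average approximation of CVaR at confidence level beta (0 <= beta < 1).

    The objective alpha + mean((z - alpha)_+) / (1 - beta) is convex and
    piecewise linear in alpha with kinks at the samples, so its minimum is
    attained at one of the sample values.
    """
    n = len(z)

    def objective(alpha):
        excess = sum(max(zi - alpha, 0.0) for zi in z)
        return alpha + excess / (n * (1.0 - beta))

    return min(objective(alpha) for alpha in z)


# Hypothetical data: ten equally likely outcomes.
z = [float(k) for k in range(1, 11)]
print(sample_mean(z), sample_cvar(z, 0.8), worst_case(z))
```

On this toy data the three estimates are ordered as expected: the sample mean is no larger than the sample CVaR, which is no larger than the sample maximum.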

To illustrate this method, we consider the open-loop optimal control problem (OLOC). The sample-based approximation to this problem is to

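$$
\begin{aligned}
&\underset{\tilde{u}}{\text{minimize}} && \hat{R}_0^N\big(c_0(\phi(\tilde{u},\delta^{(1)}),\tilde{u},\delta^{(1)}),\dots,c_0(\phi(\tilde{u},\delta^{(N)}),\tilde{u},\delta^{(N)})\big) \\
&\text{subject to} && \hat{R}_j^N\big(c_j(\phi(\tilde{u},\delta^{(1)}),\tilde{u},\delta^{(1)}),\dots,c_j(\phi(\tilde{u},\delta^{(N)}),\tilde{u},\delta^{(N)})\big) \le 0, \quad j = 1,\dots,J.
\end{aligned}
$$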

The Q-design problem (OC-) can be approximated similarly. The decision variables in these problems ( and , respectively) can be regularized to avoid overfitting the training data $\delta^{(1)}, \dots, \delta^{(N)}$. The computational complexity of the sampled problem generally scales linearly with $N$, so relatively large sample sizes can often be used. Different sample sizes can be used to approximate the different risk measures.

## VI Numerical example

In this section, we demonstrate the new capabilities developed in this paper through a numerical example. The example includes nonlinear dynamics, non-additive uncertainty and the conditional value-at-risk risk measure.

We consider a single-input, single-output system with dynamics

$$x_{t+1} = \big(1 - w(\delta_{t+1})/20\big)(x_t)_+^2 - u_t + w(\delta_{t+1}). \tag{7}$$

We assume the controller has perfect state information ($y_t = x_t$), so that we can directly compare control methods in isolation from the state estimation problem. The unforced dynamics have a stable equilibrium at the origin and an unstable equilibrium at unity. Our primary control objective is to maintain the state within the basin of attraction of the stable equilibrium, i.e., to satisfy the constraint $c_1(x) \le 0$, where

$$c_1(x) = \max\{x_1, \dots, x_T\} - 1.$$
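The equilibrium structure and the constraint function can be checked numerically. The Python sketch below is illustrative only and is not the paper's code: the function names are hypothetical, and $(x_t)_+^2$ in (7) is read as $\max(x_t, 0)^2$. It iterates the unforced, undisturbed dynamics from initial states just below and just above unity.

```python
def step(x, u, w):
    """One step of dynamics (7), reading (x)_+^2 as max(x, 0)**2."""
    return (1.0 - w / 20.0) * max(x, 0.0) ** 2 - u + w


def unforced_states(x0, horizon):
    """States x_1, ..., x_T with zero input and zero disturbance."""
    states, x = [], x0
    for _ in range(horizon):
        x = step(x, 0.0, 0.0)
        states.append(x)
    return states


def c1(states):
    """Constraint function: nonpositive when every state is at most one."""
    return max(states) - 1.0


inside = unforced_states(0.9, 5)   # starts inside the basin of attraction
outside = unforced_states(1.1, 5)  # starts outside it
print(inside[-1], c1(inside))      # state decays toward 0, constraint holds
print(outside[-1], c1(outside))    # state grows, constraint is violated
```

Both 0 and 1 are fixed points of the unforced map, and trajectories starting on either side of 1 move away from it, which is why the constraint is phrased in terms of the largest state over the horizon.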

We accept that this constraint may occasionally be violated due to the uncertain initial state and disturbances, but tolerate violations only if they are both infrequent and small. For this reason, we measure the risk of constraint violations by the conditional value-at-risk (CVaR) at confidence level $\beta$. This risk measure addresses both the probability and (conditionally) expected magnitude of constraint violations. As discussed in §II-B, CVaR can be constrained by minimizing a scalar $\alpha$ subject to

$$\mathbf{E}\,(c_1(x) - \alpha)_+ \le -\alpha(1 - \beta).$$
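The link between this inequality and a CVaR constraint can be sketched as follows, assuming the standard Rockafellar-Uryasev variational form of CVaR (the notation in §II-B may differ):

$$
\text{CVaR}_\beta\big(c_1(x)\big) = \min_{\alpha \in \mathbf{R}} \left\{ \alpha + \frac{1}{1-\beta}\,\mathbf{E}\,(c_1(x) - \alpha)_+ \right\}.
$$

Under this form, $\text{CVaR}_\beta(c_1(x)) \le 0$ holds if and only if some scalar $\alpha$ satisfies

$$
\alpha + \frac{1}{1-\beta}\,\mathbf{E}\,(c_1(x) - \alpha)_+ \le 0,
$$

and multiplying through by $1 - \beta > 0$ gives $\mathbf{E}\,(c_1(x) - \alpha)_+ \le -\alpha(1-\beta)$. Since the left-hand side is nonnegative, any feasible $\alpha$ is nonpositive.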

We would also like to minimize the total cost of control effort,

$$c_0(u, \delta) = p(\delta_1)|u_0| + \cdots + p(\delta_T)|u_{T-1}|,$$

where $p(\delta_1), \dots, p(\delta_T)$ are uncertain prices. We choose the expectation $\mathbf{E}$ as the cost risk measure.

The dynamics (7), while nonlinear, are nondecreasing in $x_t$ and convex in $(x_t, u_t)$ for all $\delta_{t+1}$. (We assume $w(\delta_{t+1}) \le 20$, so the coefficient multiplying $(x_t)_+^2$ is nonnegative.) The cost function $c_0$ is convex in $u$ for all $\delta$, and the constraint function $c_1$ is convex and nondecreasing in each of $x_1, \dots, x_T$. By Theorem 1, therefore, the system is convex. It is also purifiable, since the controller has perfect state information and the dynamics are invertible in the disturbance $w(\delta_{t+1})$. The purified output at time $t$ is $w(\delta_t)$, with purifier

$$p_t(y_{0:t}, u_{0:t-1}) = \frac{y_t - (y_{t-1})_+^2 + u_{t-1}}{1 - (y_{t-1})_+^2/20}.$$
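A quick numerical check of the purifier is sketched below in Python. The purifier here is obtained by solving (7) for the disturbance under perfect state information, so it is an assumption-laden reconstruction rather than the authors' code, and the function names are hypothetical. It applies one step of the dynamics and then recovers the disturbance from the measured states and the applied input.

```python
def step(x, u, w):
    """One step of dynamics (7), reading (x)_+^2 as max(x, 0)**2."""
    return (1.0 - w / 20.0) * max(x, 0.0) ** 2 - u + w


def purify(y_t, y_prev, u_prev):
    """Recover the disturbance that produced y_t from y_prev and u_prev.

    Solves (7) for w; undefined when max(y_prev, 0)**2 equals 20,
    where the denominator vanishes.
    """
    sq = max(y_prev, 0.0) ** 2
    return (y_t - sq + u_prev) / (1.0 - sq / 20.0)


# Hypothetical values for the previous state, input and disturbance.
x_prev, u_prev, w_true = 0.7, 0.1, 0.3
x_next = step(x_prev, u_prev, w_true)
print(purify(x_next, x_prev, u_prev))  # approximately w_true
```

The recovered value depends only on measured outputs and past inputs, which is what allows a disturbance feedback policy to be rewritten as a state feedback policy.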

The inverse purifier can be straightforwardly constructed from the system dynamics. Theorem 2 therefore applies, establishing a bijection between disturbance feedback policies and state feedback policies for this system. This gives our (infinite-dimensional, convex) control design problem:

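$$
\begin{aligned}
&\underset{Q \in \mathcal{Q},\ \alpha}{\text{minimize}} && \alpha + \mathbf{E}\, c_0(Q(\xi(\delta)), \delta) \\
&\text{subject to} && \mathbf{E}\,\big(c_1(\phi(Q(\xi(\delta)), \delta)) - \alpha\big)_+ \le -\alpha(1 - \beta).
\end{aligned} \tag{8}
$$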

We compare five suboptimal controllers for this system. The first is the optimal affine disturbance feedback policy, computed via the Q-design procedure. The second is a nonlinear Q-designed policy with piecewise quadratic basis policies tuned through trial and error. The third is the certainty-equivalent variant of open-loop MPC discussed in §V-A1; subproblems are solved under a single prediction of the exogenous input trajectory (in this case, its conditional expectation). The fourth is open-loop MPC with the same risk measures as Problem (8). The fifth is closed-loop MPC with the same risk measures and an affine disturbance feedback recourse model. We refer to the fourth and fifth controllers as open-loop and closed-loop scenario MPC, respectively, as is common in the literature. [51, 89]

Implementing each of these controllers involves numerically solving subproblems generated by Problem (8). As discussed in §V-B, we replace the expected values in these subproblems by sample averages over a training set of 1,000 sample exogenous input trajectories. We use a validation set of another 1,000 samples to tune parameters such as the basis policies in nonlinear Q-design and the MPC prediction horizon. We then compare the policies’ performance in a test set of another 2,000 samples.

In simulations, the initial state has a half-normal distribution on with mean two. The disturbances are identically Beta distributed, shifted and scaled to have mean zero and support . Each price of control effort is exponentially distributed with mean ; prices are lowest (on average) in the middle of the control horizon. The prices, disturbances and initial state are mutually independent.

Optimization is done in MATLAB using the Gurobi solver and the CVX modeling toolbox [90], which makes specifying our problems very easy. For example, the following code computes the optimal open-loop control input trajectory over $N$ sample exogenous input trajectories stored in x0 (), w () and p ():

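```matlab
cvx_begin
    variables a u(T,1)      % CVaR auxiliary scalar and open-loop inputs
    expression x(T,N)       % T-by-N array of sampled state trajectories
    x = stateTrajectories(x0,u,w,f,T);
    minimize( a + sum(c0(u,p))/N )            % a plus sample-average cost
    subject to
        sum(pos(c1(x) - a))/N <= -a*(1 - b)   % sampled CVaR constraint
cvx_end
```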

We wrote the vectorized function stateTrajectories to play the role of the input-state mapping $\phi$, recursively building the state trajectory in each of the Monte Carlo simulations from the dynamics function f. It typically speeds up CVX modeling by an order of magnitude compared to a loop over Monte Carlo runs. This function, along with all other code used in this paper, is online at [91].
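For readers without MATLAB, the quantities that appear in the sampled problem can be evaluated for a fixed candidate solution with a few lines of Python. The sketch below is a hypothetical stand-in, not the authors' stateTrajectories or their CVX model: it does not optimize, the function and variable names are assumptions, and the data are made up. It only evaluates the objective and checks the sampled CVaR constraint for given values of the scalar a and the open-loop input trajectory u.

```python
def step(x, u, w):
    """One step of dynamics (7), reading (x)_+^2 as max(x, 0)**2."""
    return (1.0 - w / 20.0) * max(x, 0.0) ** 2 - u + w


def state_trajectories(x0, u, w):
    """Roll out the states x_1..x_T for each sample under a shared open-loop u.

    x0: N initial states; u: T inputs; w: N disturbance trajectories of length T.
    """
    trajectories = []
    for x_init, w_traj in zip(x0, w):
        x, xs = x_init, []
        for u_t, w_t in zip(u, w_traj):
            x = step(x, u_t, w_t)
            xs.append(x)
        trajectories.append(xs)
    return trajectories


def evaluate_sampled_problem(a, u, x0, w, p, b):
    """Return (objective value, constraint satisfied?) for fixed a and u."""
    n = len(x0)
    xs = state_trajectories(x0, u, w)
    # Objective: a plus the sample-average price-weighted control effort.
    effort = sum(sum(p_t * abs(u_t) for p_t, u_t in zip(p_traj, u)) for p_traj in p)
    cost = a + effort / n
    # Constraint: sample average of (c1(x) - a)_+ must not exceed -a * (1 - b).
    excess = sum(max(max(traj) - 1.0 - a, 0.0) for traj in xs) / n
    return cost, excess <= -a * (1.0 - b)


# Made-up data: N = 2 samples, T = 2 steps, no disturbances.
x0 = [0.5, 0.8]
w = [[0.0, 0.0], [0.0, 0.0]]
p = [[1.0, 1.0], [2.0, 2.0]]
print(evaluate_sampled_problem(-0.5, [0.1, 0.0], x0, w, p, 0.9))
```

A solver such as the one called through CVX searches over a and u jointly; this sketch only shows what is being averaged over the samples.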

Figure 2 shows the sample-average cost and constraint conditional value-at-risk of each controller in the test set of exogenous input trajectories. Table III shows the frequency with which each controller satisfies the constraint $c_1(x) \le 0$ in the test set. The five controllers strike different trade-offs between cost and robust constraint satisfaction. Certainty-equivalent MPC is aggressive, achieving low cost but poor robustness. The optimal affine controller satisfies constraints more robustly at the cost of increased control effort. The Q-designed nonlinear controller performs significantly better than the optimal affine controller, achieving similar cost and robustness to both of the scenario MPC variants. Including an affine recourse model in scenario MPC reduces cost slightly over open-loop scenario MPC, but also increases the risk of constraint violation. In this example, the robustness improvement of scenario MPC over certainty-equivalent MPC comes mainly from the use of samples, rather than the recourse model.

This numerical example demonstrates the key message of this paper: Suboptimal control methods that perform well for linear convex systems can be applied directly to nonlinear convex systems using the same software, and can perform similarly well. In this example, the Q-designed and predictive controllers show similar performance trends to the trends demonstrated for linear systems in [41, 42] and elsewhere. In particular, (a) scenario MPC is more robust than certainty-equivalent MPC, while achieving lower cost than affine controllers, and (b) nonlinear Q-designed controllers with good basis policies can be competitive with scenario MPC.

## VII Conclusion

In this paper, we explored the class of nonlinear systems for which optimal control design under uncertainty can be cast as a convex optimization problem. We adopted a flexible approach to the risks associated with constraint violations and excessive costs, accommodating both the robust and stochastic views. We showed that open-loop optimal control is convex for nonlinear systems with convex or concave dynamics, provided the dynamics, cost and constraint functions satisfy certain monotonicity properties. We then showed that under the same conditions, closed-loop control design can be reformulated as a convex program if, in addition, the measured outputs can be reversibly ‘purified’ of the influence of the past control inputs.

The practical value of these results is the guarantee that, for a class of nonlinear systems, the subproblems solved in various suboptimal control methods are convex. This removes concerns about solvers getting stuck in local minima, and enables the use of convenient modeling software and reliable, efficient solvers. We illustrated this numerically for two methods, the Q-design procedure and model predictive control.

There are a number of opportunities to extend this work. First, it would be interesting to survey applications in which nonlinear convex systems arise. We are aware of several examples, including cancer and HIV treatment scheduling [75], voltage control in electricity distribution networks [75], freeway traffic congestion management [76] and energy storage control [84, 85], but this list is likely incomplete.

Second, the requirement of purifiability appears to significantly restrict the uncertainty structures and nonlinearities admissible for Q-design. Exploring purifiability further could enable convex control design for a richer class of nonlinear systems. We note, however, that other methods, notably including model predictive control, do not require purifiability.

Third, the class of nonlinear systems for which optimal control is convex could be further explored. We characterized only a subset of this class of systems; our method relied on the application of simple composition rules. While this method is constructive and compatible with convenient modeling software such as CVX [90] and YALMIP [92], it is not exhaustive.

## Appendix A Proof of Theorem 1

Our task is to show that the function

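$$u \mapsto c_j\big((C(\delta)u + v(\delta),\ \eta(u, \delta)),\ u,\ \delta\big)$$

is convex for all $\delta$. Here we have expressed the state trajectory as

$$
x = \begin{bmatrix} x^{\text{aff}} \\ x^{\text{cvx}} \end{bmatrix} = \begin{bmatrix} C(\delta)u + v(\delta) \\ \eta(u, \delta) \end{bmatrix}.
$$

We will define the mappings $C$, $v$ and $\eta$ shortly.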

We recall that convexity is preserved under composition with an affine mapping and under the composition of a convex nondecreasing function with a convex function [83], and that $c_j$ is nondecreasing in each element of $x^{\text{cvx}}$ by assumption. It therefore suffices to show that each row of $\eta(\cdot, \delta)$ is convex for all $\delta$.

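To do this, we need the recursive definitions of $C$, $v$ and $\eta$. They are initialized by

$$
\begin{aligned}
C_1(\delta_1) &= B_1(\delta_1) \\
v_1(\delta_{0:1}) &= A_1(\delta_1) w_0(\delta_0) + w_1(\delta_1) \\
\eta_1(u_0, \delta_{0:1}) &= h_1(w_0(\delta_0), h_0(\delta_0), u_0, \delta_1).
\end{aligned}
$$

For $t = 2, \dots, T$,

$$
\begin{aligned}
C_t(\delta_{1:t}) &= [A_t(\delta_t) C_{t-1}(\delta_{1:t-1}),\ B_t(\delta_t)] \\
v_t(\delta_{0:t}) &= A_t(\delta_t) v_{t-1}(\delta_{0:t-1}) + w_t(\delta_t) \\
\eta_t(u_{0:t-1}, \delta_{0:t}) &= h_t\big(C_{t-1}(\delta_{1:t-1}) u_{0:t-2} + v_{t-1}(\delta_{0:t-1}),\ \eta_{t-1}(u_{0:t-2}, \delta_{0:t-1}),\ u_{t-1},\ \delta_t\big).
\end{aligned}
$$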
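To make the recursion concrete, it can be unrolled by hand for $t = 2$. This is only a worked instance of the definitions above, under the assumption (inferred from the recursion rather than quoted from the theorem statement) that the affine part of the state evolves as $x^{\text{aff}}_t = A_t x^{\text{aff}}_{t-1} + B_t u_{t-1} + w_t$ with $x^{\text{aff}}_0 = w_0$. The arguments $\delta$ are suppressed for brevity.

$$
\begin{aligned}
C_2 &= [A_2 C_1,\ B_2] = [A_2 B_1,\ B_2], \\
v_2 &= A_2 v_1 + w_2 = A_2 A_1 w_0 + A_2 w_1 + w_2, \\
x^{\text{aff}}_2 &= C_2 u_{0:1} + v_2 = A_2 B_1 u_0 + B_2 u_1 + A_2 A_1 w_0 + A_2 w_1 + w_2.
\end{aligned}
$$

Substituting $x^{\text{aff}}_1 = A_1 w_0 + B_1 u_0 + w_1$ into the assumed dynamics gives the same expression, which illustrates that $x^{\text{aff}}_t$ is affine in the past inputs $u_{0:t-1}$ for each fixed $\delta$.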

The proof now proceeds by induction. At $t = 1$, $\eta_1$ is convex in $u_0$ by the convexity of $h_1$. For the inductive step, we suppose that each row of $\eta_{t-1}$ is convex in $u_{0:t-2}$. By assumption, each row of $h_t$ is nondecreasing in $x^{\text{cvx}}_{t-1}$ and convex. Therefore, $\eta_t$ involves compositions of convex functions with the affine mapping