Convexity and monotonicity in nonlinear optimal control under uncertainty
Abstract
We consider the problem of finite-horizon optimal control design under uncertainty for imperfectly observed discrete-time systems with convex costs and constraints. It is known that this problem can be cast as an infinite-dimensional convex program when the dynamics and measurements are linear, uncertainty is additive, and the risks associated with constraint violations and excessive costs are measured in expectation or in the worst case. In this paper, we extend this result to systems with convex or concave dynamics, nonlinear measurements, more general uncertainty structures and other coherent risk measures. In this setting, the optimal control problem can be cast as an infinite-dimensional convex program if (1) the costs, constraints and dynamics satisfy certain monotonicity properties, and (2) the measured outputs can be reversibly ‘purified’ of the influence of the control inputs through Q- or Youla-parameterization. The practical value of this result is that the finite-dimensional subproblems arising in a variety of suboptimal control methods, notably including model predictive control and the Q-design procedure, are also convex for this class of nonlinear systems. Subproblems can therefore be solved to global optimality using convenient modeling software and efficient, reliable solvers. We illustrate these ideas in a numerical example.
I Introduction
We consider the problem of operating a system that evolves in discrete time steps over a finite horizon under uncertainty. Our goal is to design an output feedback control policy that minimizes a convex cost while satisfying convex constraints. This problem has been studied in the stochastic setting, where uncertainty is modeled probabilistically, the expected cost is minimized, and constraints are enforced in expectation or in probability. [1, 2, 3, 4, 5, 6] It has also been studied in the robust setting, where uncertainty is assumed to come from a given set, the worst-case cost is minimized, and constraints are enforced for all possible realizations of the uncertain influences on the system. [7, 8, 9, 10, 11] We accommodate both approaches in this paper, as well as others, adopting the flexible view of risk developed in [12, 13, 14, 15, 16, 17].
Optimal control under uncertainty is a hard problem, even when the costs and constraints are convex. This is due in part to the fact that the optimization variable is a policy, a collection of functions that map measured system outputs into control inputs at each time step. In the stochastic setting, optimal policies can, in principle, be found analytically using stochastic dynamic programming. [18] In practice, analytical solution is typically limited to systems of very low dimension. Notable exceptions are linear systems with additive uncertainty, no constraints, and costs that are either quadratic [19, 20] or exponential-of-quadratic [21, 22].
When analytical solution is impractical, various methods can generate suboptimal solutions that often perform well. Some examples are classical linear feedback control design, reinforcement learning [23, 24, 25, 26] and approximate dynamic programming [4, 27, 28, 29], approximation methods for multistage stochastic programming [6, 30, 31, 32, 33] and robust optimization with recourse [33, 34, 35, 36], the Q-design procedure [37, 38, 39, 40, 41, 42], and model predictive control (MPC) in its certainty-equivalent [43, 44, 45], stochastic [46, 47, 48, 49, 50, 51, 52], and robust [53, 54, 55, 56, 57] variants. When perfect state information is not available, control methods may be paired with a state estimator such as a linear [58, 59], extended [60, 61, 62] or unscented [63, 64] Kalman filter or a particle filter [65, 66].
These methods vary widely in their scope, scalability and performance. A common theme, however, is that they tend to work best for linear systems. This is due in part to the fact that for linear systems, (an equivalent transformation of) the optimal control problem is convex. [41] Suboptimal control methods often involve numerically solving optimization subproblems generated by the original optimal control problem. The convexity of the original problem typically carries over to the subproblems, allowing them to be efficiently and reliably solved to global optimality. When subproblems are nonconvex, however, guarantees of global optimality are generally unavailable, and solvers and initial guesses may need to be carefully tailored to the applications at hand.
In [41], Skaf and Boyd demonstrate that for linear systems with additive uncertainty, the control design problem can be transformed to an infinite-dimensional convex program. Their method hinges on a change of variables related to the Q- or Youla-parameterization [67, 68, 69] and to purified output feedback control [2, 10, 34]. When perfect state information is available, this change of variables parameterizes state feedback policies by equivalent disturbance feedback policies. Similar arguments have justified the use of (typically affine) disturbance feedback policies in robust and stochastic MPC of perfectly observed linear systems. [42, 47, 70, 71, 72, 73, 74]
For nonlinear systems, the optimal control problem is widely understood to be nonconvex due to nonlinear equality constraints introduced by nonlinear dynamics. Equality constraints can be eliminated, however, by iteratively applying the dynamics to express the state trajectory in terms of the control and exogenous input trajectories. In [75], Rantzer and Bernhardsson observe that in convex-monotone systems, where the dynamics are convex and nondecreasing in the state and control input, the state trajectory is convex in the control input trajectory. In [76], Schmitt et al. generalize this result to convex-state-monotone systems, where the dynamics need not be monotone in the control input.
An immediate consequence of the observations in [75, 76] is that for some nonlinear systems, open-loop optimal control (where the decision variable is a fixed control trajectory, rather than a feedback policy) is a convex optimization problem. This holds for convex-state-monotone systems in particular, provided the cost and constraints are nondecreasing in the states. This raises two further questions: Are there other nonlinear systems for which open-loop optimal control is convex? What about closed-loop optimal control, where we optimize over policies?
This discussion motivates the definition of a convex system as one for which open-loop optimal control is a convex optimization problem. After setting the stage in §II, we establish three results for convex systems in this paper.

Characterization (§III): Systems with mixed convex and linear dynamics are convex systems, provided (a) any linear dynamics are independent of states with nonlinear dynamics, and (b) the cost, constraints and nonlinear dynamics are nondecreasing in the states with nonlinear dynamics. Concave dynamics can also be accommodated.

Convex closed-loop design (§IV): If the measured outputs can be reversibly ‘purified’ of the influence of the control inputs, then the closed-loop optimal control problem for convex systems can be transformed to an equivalent infinite-dimensional convex program. The transformation involves changing variables to policies in the purified outputs via Q- or Youla-parameterization.

Approximate solution (§V–VI): The finite-dimensional subproblems arising in a variety of suboptimal control methods, notably including MPC and the Q-design procedure, are convex for convex systems. Subproblems can therefore be solved to global optimality using convenient modeling software and efficient, reliable solvers. We illustrate this in a numerical example.
II Problem statement
II-A System
We consider a system to be operated over a finite discrete time span. We can influence the system through the control inputs. The system is also influenced by exogenous inputs, which are generally uncertain. (We view the exogenous input as a random vector defined on an underlying probability space. We do not assume that the joint distribution of the exogenous input trajectory is known, but we do assume this distribution is independent of the control input trajectory.) The exogenous inputs may include process disturbances, sensor noise, initial states, uncertain model parameters, prices, command or reference signals, etc. The exogenous inputs could come from bounded or unbounded sets, and need not be independent or identically distributed over time.
The control and exogenous inputs determine the states through the system dynamics:
(1)   x_{t+1} = f_t(x_t, u_t, w_t),   t = 0, …, T − 1
We observe the system through the measured outputs
(2)   y_t = g_t(x_t, w_t),   t = 0, …, T
We assume that the dynamics mappings and measurement mappings are known. At each time , the controller receives the measured output . It decides the control input
by evaluating an output feedback control law . The control policy is designed in advance; this design problem is the subject of this paper.
To simplify specification of the policy design problem, we work with the input, state and output trajectories. By iteratively applying the dynamics, the state trajectory can be expressed in terms of the control and exogenous input trajectories as
Here the subscript denotes a trajectory from time to . For example, and . The inputstate mappings are defined by the recursion
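Concretely, the recursion that expresses the state trajectory in terms of the control and exogenous input trajectories can be sketched as follows. This is a minimal illustration: the dynamics `f` and all numerical values are hypothetical, not taken from the paper.

```python
import numpy as np

def rollout(f, x0, u_traj, w_traj):
    """Iteratively apply the dynamics x_{t+1} = f(t, x_t, u_t, w_t) to express
    the state trajectory as a function of the input trajectories."""
    x = [x0]
    for t, (u, w) in enumerate(zip(u_traj, w_traj)):
        x.append(f(t, x[-1], u, w))
    return x

# Hypothetical scalar dynamics, convex and nondecreasing in the state x.
f = lambda t, x, u, w: np.maximum(0.5 * x + u, 0.0) + w

x_traj = rollout(f, x0=1.0, u_traj=[0.2, -0.1, 0.0], w_traj=[0.0, 0.1, 0.0])
print(x_traj)  # the states x_0, ..., x_3
```

Once the rollout is in hand, the cost and constraints can be evaluated directly on the input trajectories, which is the elimination of equality constraints discussed above.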
Similarly, the measured output trajectory can be written as
where the inputoutput mappings are defined recursively by
We also write the control input trajectory as
Figure 1 illustrates the system.
II-B Cost, constraints and risk measures
We are interested in designing the output feedback control policy . The policy must be causal, meaning each control law can depend on outputs from the past or present, but not the future. We denote the set of causal output feedback policies by . Our goal is to find a that minimizes a cost
(3) 
while satisfying constraints
(4) 
We assume that each scalarvalued function is convex in for all .
Because the exogenous inputs are uncertain, the goals of minimizing cost and satisfying constraints are ambiguous. Should they be accomplished on average with respect to the distribution of the exogenous inputs, for all possible realizations, or somewhere in between? Resolving these ambiguities amounts to choosing measures of risk. (For more on measuring risk, a rich subject that has received much recent attention, we refer the reader to [12, 13, 14, 15, 16, 17].) A risk measure is a functional that maps uncertain scalars into deterministic scalars (or possibly +∞). It quantifies the risk associated with positive realizations of its uncertain argument. In the language of [12, 13], the risk measure should be associated with constraint if the designer views the risk of constraint violation as acceptable whenever
The risks associated with excessive costs can be treated similarly to the risks associated with constraint violations, because minimizing is equivalent to minimizing a deterministic scalar subject to the constraint .
Risk measure  Definition of R(Z)  Risk acceptable if…
Expected value  E[Z]  Z ≤ 0 on average
Predicted value  a point prediction of Z  prediction of Z is ≤ 0
Worst-case value  ess sup Z  Z ≤ 0 almost surely
Value-at-risk (VaR_η)  inf{α : Prob(Z ≤ α) ≥ η}  Prob(Z ≤ 0) ≥ η
Conditional value-at-risk (CVaR_η)  E[Z ∣ Z ≥ VaR_η(Z)]  CVaR_η(Z) ≤ 0
Table I describes five risk measures commonly used in optimization under uncertainty. The expected value is risk-neutral: it weighs constraint violations and slacks equally, modeling the risk of constraint violation as acceptable if slacks balance out violations on average. The worst-case value, by contrast, is maximally risk-averse: it models any possibility of constraint violation as unacceptable. The value-at-risk at confidence level η (VaR_η) is the smallest threshold that, with 100η% confidence, will not be exceeded. Typical values of η are 0.9, 0.95 and 0.99. VaR_η generates chance constraints of the form Prob(Z ≤ 0) ≥ η.
The conditional value-at-risk at confidence level η (CVaR_η) is the conditional expectation of the uncertain argument, conditioned on the event that VaR_η is exceeded. [14, 15] CVaR_η always upper bounds VaR_η. The CVaR_η of Z can be constrained by minimizing a scalar α subject to α + E[(Z − α)_+]/(1 − η) ≤ 0,
where (·)_+ denotes the positive part, (z)_+ = max{z, 0}. An optimal α upper bounds VaR_η(Z); this bound is usually tight. [15]
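The minimization over α above admits a simple sample-based evaluation, since for a finite sample the objective is piecewise linear with its minimum attained at a sample point. The sketch below is illustrative (the function name and sample values are our own, not the paper's):

```python
import numpy as np

def cvar(z, eta):
    """Sample-based CVaR_eta via the formula
    CVaR_eta(Z) = min_alpha  alpha + E[(Z - alpha)_+] / (1 - eta).
    The objective is piecewise linear in alpha with breakpoints at the
    samples, so it suffices to evaluate it at each sample point."""
    z = np.asarray(z, dtype=float)
    objective = lambda a: a + np.mean(np.maximum(z - a, 0.0)) / (1.0 - eta)
    return min(objective(a) for a in z)

z = [-1.0, 0.0, 1.0, 2.0]
print(cvar(z, eta=0.75))  # average of the worst 25% of outcomes -> 2.0
```

Note that `cvar` on a constant sample returns that constant, consistent with the certainty-preservation property of coherent risk measures discussed below.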
In this paper, we leave the choice of risk measures to the designer. This flexibility makes the results in the following sections applicable in a range of settings, including the robust and stochastic frameworks and others. We do, however, impose one restriction on the risk measures: we assume that they are coherent in the sense of [12, 13, 16, 17]. By this, we mean that each risk measure is convex, nondecreasing, lower semicontinuous, and preserves certainty (i.e., it maps constants to themselves). We make this assumption primarily because coherent risk measures preserve convexity: if the risk measure is coherent and its uncertain argument is convex in the decision variable for every realization of the uncertainty, then the composition is convex in the decision variable. [16]
All but one of the risk measures mentioned above are coherent. The exception is VaR_η, which is convex only in structured special cases. We note, however, that VaR_η can be approximated in a convexity-preserving fashion using sampling [77, 78, 79, 80, 81, 82] or conservative upper bounds such as CVaR_η [6, 50].
In summary, we are interested in the following problem of optimal control under uncertainty:
(OC) 
This problem is intractable in its current form. This is due in part to the fact that the decision variable is infinite-dimensional, an obstacle that we will address approximately in §V. Problem (OC) is also complicated by the interdependence between the state, output and control trajectories. We will unravel this interdependence in §IV using a nonlinear change of variables. First, we will consider the case of open-loop control, where the control laws are restricted to be constant.
III Convex systems
To understand when closed-loop control design is a convex optimization problem, we begin by building intuition in the simpler context of open-loop control. In this context, a (finite-dimensional) vector of control inputs is decided in advance and implemented without feedback. This amounts to drastically restricting the search space from the set of all causal output feedback policies to the subspace of constant policies. With this restriction, the open-loop optimal control problem is to
(OLOC) 
This is a finite-dimensional optimization problem in the control input vector. Because coherent risk measures preserve convexity [16], Problem (OLOC) is convex if each cost and constraint function, when composed with the input-state mapping, is convex in the control inputs for all realizations of the uncertainty. This motivates the following definition.
Definition 1 (Convex system).
By definition, open-loop optimal control of convex systems is a convex optimization problem. In §IV, we will show that closed-loop optimal control of convex systems is also convex, provided the outputs can be reversibly ‘purified’ of the influence of the control inputs. First, we will characterize a class of convex systems.
III-A Characterizing convex systems
To understand when a system is convex, we need to understand how the states and controls propagate through the dynamics (1) into the cost (3) and constraints (4). If the dynamics are linear, then this process is straightforward. The state trajectory is affine in the control trajectory for all . Convexity is preserved under composition with an affine mapping [83], so for linear systems is convex for all .
When the system is nonlinear, the process is less straightforward. We do not explore it in general here. Instead, we consider a class of systems with the following dynamics:
(5) 
For this class of systems, convexity can be established using simple composition rules. An important restriction is that the states with linear dynamics are independent of the nonlinear states . This ensures that the trajectory is affine in the control trajectory. Any states with linear dynamics that depend on nonlinear states are included in .
Theorem 1 ().
Appendix A contains a proof of Theorem 1. The proof hinges on the fact that the composition of a convex nondecreasing function with a convex function is convex. [83]
Three remarks are in order. First, system (5) is a straightforward generalization of the convex-state-monotone system studied in [76], which in turn generalizes the convex-monotone system studied in [75]. A precise name for system (5) would be convex-nonlinear-state-monotone, since the monotonicity requirement applies only to the states with nonlinear dynamics. The purpose of this generalization is to absorb the class of linear systems for which optimal control was shown in [41] to be convex. In particular, the linear-quadratic system is not convex-monotone or convex-state-monotone in general, but is a special case of system (5) with linear dynamics, additive uncertainty, no constraints and quadratic cost.
Second, concave dynamics can be included in system (5) after a sign change. For example, we consider a system with
where each row of is nondecreasing in , nonincreasing in and convex. Similarly, each row of is nondecreasing in , nonincreasing in and concave. If we define the nonlinear state as
then each row of is nondecreasing in and convex, as required by Theorem 1.
Third, system (5) includes a rich class of nonlinear systems. Provided careful attention is paid to curvature and monotonicity, it can admit exponentials and logarithms, quadratic forms, roots, powers, nonsmooth functions such as maxima, minima and absolute values, and sums and compositions of the above. Many more examples of nonlinear convex and concave functions can be found in §3 of [83]. Systems with nonlinear convex dynamics have arisen in applications ranging from cancer and HIV treatment scheduling [75] to voltage control in power systems [75] to freeway congestion management [76] to energy storage control [84, 85]. The drug treatment model in [75] is a special case of a more general class of bilinear systems that, through a logarithmic transformation, can be cast as convex systems.
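As a numerical sanity check on this composition argument, one can verify that the state trajectory of such a system is midpoint-convex in the control input trajectory. The sketch below uses hypothetical scalar dynamics that are convex and nondecreasing in the state; all names and values are illustrative.

```python
import numpy as np

def final_state(u, x0=0.5):
    """Final state after T steps of hypothetical dynamics
    x_{t+1} = exp(0.3*x_t) - 1 + u_t, which are convex and nondecreasing
    in the state x_t (so the rollout is convex in the input trajectory)."""
    x = x0
    for ut in u:
        x = np.exp(0.3 * x) - 1.0 + ut
    return x

# Midpoint convexity check over random pairs of input trajectories.
rng = np.random.default_rng(0)
for _ in range(100):
    u1, u2 = rng.normal(size=3), rng.normal(size=3)
    mid = final_state(0.5 * (u1 + u2))
    assert mid <= 0.5 * (final_state(u1) + final_state(u2)) + 1e-12
print("midpoint convexity holds on all sampled pairs")
```

A failed assertion here would indicate dynamics violating the curvature or monotonicity hypotheses of Theorem 1 (e.g., a convex but decreasing dependence on the state).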
IV Convex closed-loop optimal control
By contrast to the open-loop optimal control problem (OLOC), the equality constraints in the closed-loop optimal control problem (OC) cannot be easily eliminated. This is due to the complicated interdependence between the state, output and control input trajectories. Under some conditions on the uncertainty structure, however, this interdependence can be disentangled through a nonlinear change of variables related to the Q- or Youla-parameterization [67, 68, 69] and purified output feedback control [2, 10, 34]. We now establish sufficient conditions for this change of variables to be possible.
IV-A Purifiability and Q-parameterization
Definition 2 (Purifiable).
We note for future reference that is a function of only:
Without loss of generality, therefore, we define , and such that
Theorem 2 ().
If the system with dynamics (1) and measurements (2) is purifiable, then there exists a onetoone correspondence between causal output feedback policies and causal policies in the purified output . Furthermore, given a causal in , the unique corresponding causal in can be constructed from the following recursion:
(6)  
Theorem 2 is closely related to the nonlinear discrete-time Youla parameterization presented by Wu and Lall in [69]. We include a proof in Appendix B for completeness. We give some examples of purifiable nonlinear systems in §IV-B.
Theorem 2 establishes that if the system is purifiable, then we can optimize over policies in the purified output . We interpret as what remains of the output when the effect of the control inputs has been removed. Given a causal , the unique corresponding causal can be recovered. We denote the set of causal by .
Under this change of variables, an equivalent reformulation of the closedloop optimal control problem (OC) is to
The equality constraints can now be eliminated, giving another equivalent problem:
(OC) 
Problem (OC) is structurally identical to the open-loop optimal control problem (OLOC), except that the optimization is over (infinite-dimensional) purified output feedback policies rather than (finite-dimensional) control input trajectories. It follows that the two problems are convex under the same conditions, namely for convex systems. This observation, together with Theorem 2, gives the following result.
Corollary 3 ().
IV-B Purifiability examples
Purifiability is essentially an invertibility property of the input-output mapping with respect to the exogenous inputs. The requirements for purifiability are (1) at each time step, the influence of the control input history can be removed from the current output, possibly using the output history; and (2) this process must be reversible, in the sense that the current output can be reconstructed from the purified output history and the control input history. To ground this notion, we now provide some concrete examples of purifiable systems. This list is not exhaustive.
Measured exogenous inputs
If the exogenous inputs are measured exactly (), then the system is purifiable with
Pure estimation
If the controller can observe the system but not influence it, then the states and outputs can be expressed as
In this case, the system is trivially purifiable with . This implies that various constrained estimation problems can be put in the form of Problem (OC). For example, we consider the problem of designing a state estimator such that minimizes the mean squared error in estimating , with the prior knowledge that almost surely. This can be put in the form of Problem (OC) by setting , , and .
Perfect state information, invertible dynamics
If the states are measured exactly () and the dynamics are invertible in the exogenous inputs, i.e., there exist mappings such that
then the system is purifiable with
In the special case of additive disturbances, , the purifier reduces to
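For this special case, purification can be sketched in a few lines. The dynamics `f` and all numerical values below are hypothetical; the point is that the recovered sequence equals the disturbances, regardless of which controls were applied.

```python
def purify(f, x_traj, u_traj):
    """With perfect state information and additive disturbances,
    x_{t+1} = f(t, x_t, u_t) + w_t, the purified output recovers the
    disturbance: v_t = x_{t+1} - f(t, x_t, u_t)."""
    return [x_traj[t + 1] - f(t, x_traj[t], u_traj[t])
            for t in range(len(u_traj))]

# Round-trip check on a hypothetical scalar system.
f = lambda t, x, u: 0.9 * x + u
w = [0.1, -0.2, 0.05]                 # true disturbances
x, u = [1.0], [0.3, 0.0, -0.1]
for t in range(3):
    x.append(f(t, x[-1], u[t]) + w[t])
print(purify(f, x, u))                # recovers w, independent of the controls u
```

Changing `u` changes the state trajectory but leaves the purified outputs unchanged, which is precisely the property exploited by the change of variables in §IV-A.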
Deterministic dynamics, invertible measurements
If the initial state is measured exactly (), the dynamics are deterministic,
and the measurements are invertible in the exogenous inputs, i.e., there exist mappings such that
then the system is purifiable with
In the special case of additive noise, , the purifier reduces to
State-affine dynamics and measurements, additive uncertainty
If the dynamics and measurements have the form
then it can be shown that the input-output mappings are additive:
In this case, the system is purifiable with
V Approximate solution methods
Although Problem (OC) is convex for convex systems, it remains challenging for two reasons. First, the decision variable is infinite-dimensional. Second, the risk measures may be difficult to compute, or even ill-defined if some distributional information is lacking. We discuss methods for addressing the infinite-dimensionality in §V-A and for approximating risk measures in §V-B. The upshot of this discussion is that several suboptimal control methods that perform well for linear systems can be applied to nonlinear convex systems using finite-dimensional convex optimization.
V-A Finite-dimensional restrictions
V-A1 Open-loop model predictive control
A simple, effective method for overcoming the challenge of infinite dimensionality is open-loop MPC. In this method, at each time step we solve a version of the (finite-dimensional) open-loop optimal control problem (OLOC) over a truncated, receding horizon. This generates a planned control input trajectory. We implement the first step in this plan, the system evolves, we update the state estimate and the process repeats. As discussed in §III, open-loop optimal control is convex for convex systems, even those with nonlinear dynamics.
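The receding-horizon loop just described can be sketched as follows. The subproblem solver here is a deliberately crude grid search for a hypothetical scalar linear system with quadratic cost, standing in for the convex solver that would be used in practice; all names and values are illustrative.

```python
import numpy as np

def open_loop_mpc_step(solve_subproblem, x_est, horizon):
    """One step of open-loop MPC: solve a truncated open-loop optimal
    control subproblem from the current state estimate, then implement
    only the first planned control input."""
    u_plan = solve_subproblem(x_est, horizon)
    return u_plan[0]

def solve_subproblem(x0, horizon, a=0.9, r=0.1):
    """Certainty-equivalent subproblem for hypothetical dynamics
    x_{t+1} = a*x_t + u_t with stage cost x_t^2 + r*u_t^2, 'solved' by a
    crude grid search over the first input (a stand-in for a convex solver)."""
    best_u, best_cost = 0.0, np.inf
    for u0 in np.linspace(-2.0, 2.0, 81):
        x, cost, u = x0, 0.0, u0
        for _ in range(horizon):
            cost += x * x + r * u * u
            x = a * x + u
            u = 0.0                      # plan zero input after the first step
        if cost < best_cost:
            best_u, best_cost = u0, cost
    return [best_u]

x = 1.0
for t in range(5):                       # plan, apply first input, repeat
    u = open_loop_mpc_step(solve_subproblem, x, horizon=3)
    x = 0.9 * x + u                      # true (disturbance-free) system evolves
print(abs(x) < 0.5)                      # the state is regulated toward zero
```

The loop structure is unchanged when the grid search is replaced by a proper convex solver and the certainty-equivalent prediction by one of the risk measures discussed below.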
In one common variant of open-loop MPC, typically called certainty-equivalent MPC, the subproblem at each time step is solved under a single prediction of the disturbance trajectory. [43, 44, 45] This amounts to measuring risk with the predicted value risk measure discussed in §II-B. Other risk measures can be used in open-loop MPC, however, and can significantly improve performance. [46, 48, 49, 50, 51, 52, 53, 54, 55, 57] Risk measures can be approximated if necessary, as discussed in §V-B.
Open-loop MPC has several advantages. First, it often performs well in practice. Second, the design process is straightforward and intuitive; the primary design decisions are the prediction horizon, the terminal cost and/or constraints with which to augment the subproblems, and the algorithms for prediction and state estimation. A third advantage, and perhaps an underappreciated one, is that open-loop MPC admits very general uncertainty structures, including additive, multiplicative and others. In particular, since it does not require the Youla-type change of variables discussed in §IV-A, open-loop MPC can be applied to systems that are not purifiable.
Open-loop MPC also has several disadvantages. First, its implementation is computationally intensive due to the use of online optimization; this can limit its scalability. (We note, however, that for linearly constrained linear systems with quadratic costs, open-loop MPC can be implemented efficiently using custom solvers that exploit the problem structure. [45]) Second, open-loop MPC yields no closed-form expression for the control policy; this complicates analysis of closed-loop stability, robustness, etc. Third, the method’s performance can suffer somewhat due to the open-loop structure of the optimal control subproblems. This structure ignores the controller’s opportunity to respond to future information as it becomes available, i.e., the controller’s recourse. The Q-design procedure partially addresses these disadvantages.
V-A2 Q-design
In open-loop MPC, the policy design procedure is straightforward (choosing a few parameters and subroutines), but implementation is computationally intensive due to the use of online optimization. In the Q-design procedure [37, 38, 39, 40, 41, 42, 73], by contrast, policy design is computationally intensive, but implementation is extremely efficient. In particular, no online optimization is needed. Another important distinction is that unlike MPC, Q-design yields a closed-form control policy. This facilitates analysis and simulation of the closed-loop system.
In Q-design, we construct a suboptimal causal purified output feedback policy
Here the building blocks are causal basis policies selected by the designer, combined according to a parameter vector. The parameters are decided by solving a finite-dimensional analogue of Problem (OC):
(OC) 
Convex constraints on can also be added, e.g., to cultivate a particular structure in the control policy. Given , the unique corresponding causal output feedback policy can be constructed from the recursion in Theorem 2.
Because the policy is affine in the parameters and convexity is preserved under composition with an affine mapping, the Q-design problem (OC) inherits the convexity of Problem (OC) for convex systems. It follows that Q-design can be applied to convex systems, even nonlinear ones, using finite-dimensional convex optimization. Risk measures in Problem (OC) can be approximated if necessary, as discussed in §V-B.
In principle, Q-design can solve the optimal control problem (OC) to any degree of accuracy for a sufficiently large basis. In practice, Q-design has two disadvantages. First, its performance depends intimately on the choice of basis policies. In [42], Skaf and Boyd explore the natural choice of affine policies in the context of linear systems. While affine Q-design performs well in a number of problems, the performance is typically not as good as open-loop MPC. To our knowledge, the general problem of finding good nonlinear basis policies has not yet been solved. A second disadvantage of Q-design is that, as discussed in §IV-A, it can be applied only to purifiable systems. If the system is not purifiable, then an implementable output feedback policy may not be recoverable from the designed policy.
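The structure of a Q-designed policy, a fixed linear combination of causal basis policies evaluated on the purified output history, can be sketched as follows. The affine basis and all numbers here are illustrative choices of our own, not the paper's.

```python
import numpy as np

def control_input(theta, v_hist, basis):
    """Q-design style control input: a linear combination of causal basis
    policies, each evaluated on the purified output history v_hist. The
    parameters theta are what the (convex) Q-design problem optimizes."""
    return sum(th * q(v_hist) for th, q in zip(theta, basis))

# An illustrative affine basis in the purified outputs: a constant term
# plus one linear term per recent purified output.
basis = [lambda v: 1.0,
         lambda v: v[-1] if len(v) >= 1 else 0.0,
         lambda v: v[-2] if len(v) >= 2 else 0.0]

theta = np.array([0.1, -0.5, 0.2])
print(control_input(theta, v_hist=[0.3, -0.4], basis=basis))
```

Because the control input is linear in `theta`, the cost and constraints composed with this policy keep the convexity properties discussed above; nonlinear basis policies `q` change the policy class but not this affine dependence on the parameters.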
V-A3 Closed-loop model predictive control
The final approximate solution method we discuss, closed-loop MPC, can be viewed as receding-horizon Q-design. As in open-loop MPC, in closed-loop MPC we solve an optimal control subproblem at each time step over a truncated, receding horizon. The key difference is that the closed-loop MPC subproblems include a recourse model, i.e., a model of the controller’s response to future information as it becomes available. More precisely, the open-loop MPC subproblems are truncated versions of the open-loop optimal control problem (OLOC), while the closed-loop MPC subproblems are truncated versions of the Q-design problem (OC). Closed-loop MPC with affine recourse models is developed for linear systems in [42, 47, 50, 51, 56, 57].
Closed-loop MPC addresses the two disadvantages of Q-design. Like open-loop MPC, closed-loop MPC can be applied to systems that are not purifiable. To see this, we note that the first step of the recursion in Theorem 2,
can always be implemented, even if later steps cannot. Closed-loop MPC also tends to be less sensitive than Q-design to the choice of basis policies; in closed-loop MPC the policy designed at each time step is merely a model of future recourse, while in Q-design the policy is the actual source of the implemented control inputs.
Closed-loop MPC can outperform open-loop MPC due to the inclusion of a recourse model. [42] Its design process is similarly straightforward; the only additional step is choosing basis policies. Like open-loop MPC, however, closed-loop MPC involves computationally intensive online optimization. A closed-form expression for the closed-loop MPC policy is generally not available.
V-B Risk measure approximation
The control methods discussed in §V-A all require solving finite-dimensional convex optimization subproblems under uncertainty. These subproblems can be solved exactly in a few special cases. In general, however, we must resort to approximate solution methods. [3, 6, 10]
We now describe one approximate solution method based on sampling. This method can accommodate each of the risk measures discussed in §II-B. We begin by obtaining samples from the distribution of the exogenous input trajectory, or in the robust setting, from the corresponding uncertainty set. Samples could be obtained, e.g., from historical data or a pseudorandom number generator. Each risk measure is then replaced by an approximation based on the samples. This results in a convex optimization problem (with random input data) that can be solved using off-the-shelf software. Depending on the underlying risk measures, this sample-based approximate solution scheme is known as scenario optimization [77, 78, 79, 80, 81, 82] or sample-average approximation [6, 86, 87, 88].
Table II shows sample-based approximations for the five risk measures in Table I. The worst-case value and value-at-risk (which generate robust and chance constraints, respectively) can be approximated by maxima over the full sample. Expectation-type risk measures, including conditional value-at-risk, can be approximated using sample averages. Both of these approximations preserve the convexity of the underlying cost or constraint function. Theoretical bounds on violation probabilities for robust and chance constraints are available. [77, 78, 79, 80, 81, 82] Some convergence results for sample-average approximation can be found in [6, 86, 87, 88]. Variance reduction methods and hypothesis tests of solution quality can also be applied. [6]
Risk measure  Sample-based approximation
Expected value  (1/N) Σ_i z^{(i)}
Predicted value  a point prediction based on z^{(1)}, …, z^{(N)}
Worst-case value  max_i z^{(i)}
Value-at-risk (VaR_η)  max_i z^{(i)}
Conditional value-at-risk (CVaR_η)  min_α { α + Σ_i (z^{(i)} − α)_+ / ((1 − η)N) }
To illustrate this method, we consider the open-loop optimal control problem (OLOC). The sample-based approximation to this problem is to
The Q-design problem (OC) can be approximated similarly. The decision variables in these problems can be regularized to avoid overfitting the training data. The computational complexity of the sampled problem generally scales linearly with the number of samples, so relatively large sample sizes can often be used. Different sample sizes can be used to approximate the different risk measures.
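The sampled open-loop problem can be sketched as follows for a hypothetical scalar convex system: the expected cost is replaced by a sample average and a worst-case state constraint by a sample maximum. All dynamics, costs and numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 1000, 4
w = rng.normal(0.0, 0.1, size=(N, T))   # sampled disturbance trajectories

def state_traj(u, wi):
    """Hypothetical scalar convex system x_{t+1} = max(0.8*x_t + u_t + w_t, 0),
    convex and nondecreasing in the state."""
    x = np.zeros(len(u) + 1)
    for t in range(len(u)):
        x[t + 1] = max(0.8 * x[t] + u[t] + wi[t], 0.0)
    return x

def sampled_cost(u):
    """Sample-average approximation of an expected quadratic cost."""
    costs = [np.sum(state_traj(u, wi) ** 2) + np.sum(np.square(u)) for wi in w]
    return float(np.mean(costs))

def sampled_worst_case(u, limit=2.0):
    """Sample-maximum approximation of the worst-case constraint x_t <= limit;
    nonpositive values indicate the sampled constraint is satisfied."""
    return max(np.max(state_traj(u, wi)) for wi in w) - limit

u = np.zeros(T)
print(sampled_cost(u), sampled_worst_case(u) <= 0.0)
```

Both approximations are convex in `u` because each sampled trajectory is a fixed convex function of the inputs, so the sampled problem can be passed directly to a convex solver.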
VI Numerical example
In this section, we demonstrate the new capabilities developed in this paper through a numerical example. The example includes nonlinear dynamics, nonadditive uncertainty and the conditional value-at-risk measure.
We consider a singleinput, singleoutput system with dynamics
(7) 
We assume the controller has perfect state information, so that we can directly compare control methods in isolation from the state estimation problem. The unforced dynamics have a stable equilibrium at the origin and an unstable equilibrium at unity. Our primary control objective is to maintain the state within the basin of attraction of the stable equilibrium, i.e., to satisfy a constraint that confines the state to this basin.
We accept that this constraint may occasionally be violated due to the uncertain initial state and disturbances, but tolerate violations only if they are both infrequent and small. For this reason, we measure the risk of constraint violations by the conditional value-at-risk (CVaR) at a given confidence level. This risk measure addresses both the probability and the conditional expected magnitude of constraint violations. As discussed in §II-B, the CVaR can be constrained by minimizing an auxiliary scalar subject to a convex constraint.
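For reference, the standard Rockafellar–Uryasev form underlying this construction can be stated as follows (the notation here is ours, since the paper's symbols were not recoverable; Z is the constraint value and β the confidence level):

```latex
\operatorname{CVaR}_\beta(Z)
  \;=\; \min_{\alpha \in \mathbb{R}}
  \left\{ \alpha + \frac{1}{1-\beta}\,\mathbb{E}\big[(Z - \alpha)_+\big] \right\},
\qquad
\operatorname{CVaR}_\beta(Z) \le 0
  \;\Longleftrightarrow\;
  \exists\, \alpha:\ \alpha + \frac{1}{1-\beta}\,\mathbb{E}\big[(Z - \alpha)_+\big] \le 0 .
```

Because the inner expression is jointly convex in α and the decision variables (whenever Z is convex in them), the auxiliary scalar can simply be added to the optimization.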
We would also like to minimize the total cost of control effort, which depends on uncertain, time-varying prices. We choose the expected value as the cost risk measure.
The dynamics (7), while nonlinear, are nondecreasing in the state and convex in the state and input for every disturbance realization. (Our assumptions ensure the coefficient multiplying the state is nonnegative.) The cost function is convex in the input for every price realization, and the constraint function is convex and nondecreasing in each of its arguments. By Theorem 1, therefore, the system is convex. It is also purifiable, since the controller has perfect state information and the dynamics are invertible in the disturbance. The purified output at each time is the disturbance, recovered from consecutive states by inverting the dynamics.
The inverse purifier can be straightforwardly constructed from the system dynamics. Theorem 2 therefore applies, establishing a bijection between disturbance feedback policies and state feedback policies for this system. This gives our (infinite-dimensional, convex) control design problem:
(8) 
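The convexity asserted by Theorem 1 can be checked numerically on a toy instance (our own example, not the system (7): the dynamics are built as a max of affine functions, hence convex in the state and input and nondecreasing in the state):

```python
import numpy as np

def f(x, u, w):
    # Convex in (x, u): a max of affine functions. Nondecreasing in x:
    # both slopes on x are nonnegative.
    return max(0.9 * x + u + w, 0.2 * x)

def final_state(u, w, x0=1.0):
    # roll the scalar state forward through the dynamics
    x = x0
    for ut, wt in zip(u, w):
        x = f(x, ut, wt)
    return x

rng = np.random.default_rng(1)
T = 5
w = rng.normal(size=T)                 # one fixed disturbance realization
u1, u2 = rng.normal(size=T), rng.normal(size=T)
mid = final_state(0.5 * (u1 + u2), w)  # state under the averaged inputs
avg = 0.5 * (final_state(u1, w) + final_state(u2, w))
# Midpoint convexity in u: mid <= avg for every realization of w.
```

The inequality holds for every disturbance realization, which is exactly the mechanism (composition of convex, monotone maps) used in the proof of Theorem 1.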
We compare five suboptimal controllers for this system. The first is the optimal affine disturbance feedback policy, computed via the Q-design procedure. The second is a nonlinear Q-designed policy with piecewise quadratic basis policies tuned through trial and error. The third is the certainty-equivalent variant of open-loop MPC discussed in §V-A1; subproblems are solved under a single prediction of the exogenous input trajectory (in this case, its conditional expectation). The fourth is open-loop MPC with the same risk measures as Problem (8). The fifth is closed-loop MPC with the same risk measures and an affine disturbance feedback recourse model. We refer to the fourth and fifth controllers as open-loop and closed-loop scenario MPC, respectively, as is common in the literature. [51, 89]
Implementing each of these controllers involves numerically solving subproblems generated by Problem (8). As discussed in §V-B, we replace the expected values in these subproblems by sample averages over a training set of 1,000 sample exogenous input trajectories. We use a validation set of another 1,000 samples to tune parameters such as the basis policies in nonlinear Q-design and the MPC prediction horizon. We then compare the policies’ performance on a test set of another 2,000 samples.
In simulations, the initial state has a half-normal distribution with mean two. The disturbances are identically Beta distributed, shifted and scaled to have mean zero and bounded support. Each price of control effort is exponentially distributed with a time-varying mean; prices are lowest (on average) in the middle of the control horizon. The prices, disturbances and initial state are mutually independent.
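A sketch of how such a training set might be generated follows. The distribution parameters here are illustrative guesses of ours (the Beta shape parameters, disturbance support, and price profile are not specified above); only the qualitative structure matches the description:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 8, 1000  # horizon length and number of samples (illustrative)

# Initial state: half-normal with mean two. If Z ~ N(0, s^2), then |Z|
# has mean s*sqrt(2/pi), so s = 2*sqrt(pi/2) gives mean 2.
s = 2.0 * np.sqrt(np.pi / 2.0)
x0 = np.abs(rng.normal(0.0, s, size=N))

# Disturbances: Beta(2, 2), shifted and scaled to mean zero on [-c, c].
c = 0.5
w = c * (2.0 * rng.beta(2.0, 2.0, size=(T, N)) - 1.0)

# Prices: exponential with a time-varying mean that dips mid-horizon.
mean_p = 1.0 + 0.5 * np.cos(2.0 * np.pi * np.arange(T) / T)
p = rng.exponential(mean_p[:, None], size=(T, N))
```

Arranging the samples with one column per Monte Carlo run matches the array shapes used in the CVX code below.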
Optimization is done in MATLAB using the Gurobi solver and the CVX modeling toolbox [90], which makes specifying our problems very easy. For example, the following code computes the optimal open-loop control input trajectory over N sample exogenous input trajectories stored in the arrays x0, w and p:
variables a u(T,1)
expression x(T,N)
x = stateTrajectories(x0,u,w,f,T);  % simulate all N state trajectories at once
minimize( a + sum(c0(u,p))/N )      % CVaR auxiliary scalar a plus sample-average cost
subject to
    sum(pos(c1(x) - a))/N <= -a*(1 - b)  % sampled CVaR constraint at confidence level b
We wrote the vectorized function stateTrajectories to play the role of the input-state mapping, recursively building the state trajectory in each of the N Monte Carlo simulations from the dynamics function f. It typically speeds up CVX modeling by an order of magnitude compared to a loop over Monte Carlo runs. This function, along with all other code used in this paper, is online at [91].
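A Python analog of this pattern might look as follows (a sketch under assumed array shapes; the MATLAB signature stateTrajectories(x0,u,w,f,T) is taken from the listing above, but the body is ours):

```python
import numpy as np

def state_trajectories(x0, u, w, f, T):
    """Roll out all N Monte Carlo state trajectories at once.

    x0: (N,) sampled initial states
    u:  (T,) control input trajectory, shared across samples
    w:  (T, N) sampled disturbance trajectories
    f:  vectorized one-step dynamics, f(x, u_t, w_t) -> (N,) next states
    """
    xt = np.asarray(x0, dtype=float)
    x = np.empty((T, xt.size))
    for t in range(T):
        xt = f(xt, u[t], w[t])  # one dynamics step for all N samples
        x[t] = xt
    return x
```

Vectorizing over the sample index, rather than looping over Monte Carlo runs, is what yields the order-of-magnitude modeling speedup reported above.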
Figure 2 shows the sample-average cost and constraint conditional value-at-risk of each controller on the test set of exogenous input trajectories. Table III shows the frequency with which each controller satisfies the constraint in the test set. The five controllers strike different tradeoffs between cost and robust constraint satisfaction. Certainty-equivalent MPC is aggressive, achieving low cost but poor robustness. The optimal affine controller satisfies constraints more robustly at the cost of increased control effort. The designed nonlinear controller performs significantly better than the optimal affine controller, achieving similar cost and robustness to both of the scenario MPC variants. Including an affine recourse model in scenario MPC reduces cost slightly over open-loop scenario MPC, but also increases the risk of constraint violation. In this example, the robustness improvement of scenario MPC over certainty-equivalent MPC comes mainly from the use of samples, rather than the recourse model.
This numerical example demonstrates the key message of this paper: Suboptimal control methods that perform well for linear convex systems can be applied directly to nonlinear convex systems using the same software, and can perform similarly well. In this example, the designed and predictive controllers show performance trends similar to those demonstrated for linear systems in [41, 42] and elsewhere. In particular, (a) scenario MPC is more robust than certainty-equivalent MPC, while achieving lower cost than affine controllers, and (b) nonlinear designed controllers with good basis policies can be competitive with scenario MPC.
Table III. Frequency of constraint satisfaction in the test set.

Controller               | Frequency
Optimal affine           | 99.5%
Open-loop scenario MPC   | 99.3%
Nonlinear design         | 98.9%
Closed-loop scenario MPC | 98.7%
Certainty-equivalent MPC | 74.6%

VII Conclusion
In this paper, we explored the class of nonlinear systems for which optimal control design under uncertainty can be cast as a convex optimization problem. We adopted a flexible approach to the risks associated with constraint violations and excessive costs, accommodating both the robust and stochastic views. We showed that openloop optimal control is convex for nonlinear systems with convex or concave dynamics, provided the dynamics, cost and constraint functions satisfy certain monotonicity properties. We then showed that under the same conditions, closedloop control design can be reformulated as a convex program if, in addition, the measured outputs can be reversibly ‘purified’ of the influence of the past control inputs.
The practical value of these results is the guarantee that, for a class of nonlinear systems, the subproblems solved in various suboptimal control methods are convex. This removes concerns about solvers getting stuck in local minima, and enables the use of convenient modeling software and reliable, efficient solvers. We illustrated this numerically for two methods, the Q-design procedure and model predictive control.
There are a number of opportunities to extend this work. First, it would be interesting to survey applications in which nonlinear convex systems arise. We are aware of several examples, including cancer and HIV treatment scheduling [75], voltage control in electricity distribution networks [75], freeway traffic congestion management [76] and energy storage control [84, 85], but this list is likely incomplete.
Second, the requirement of purifiability appears to significantly restrict the uncertainty structures and nonlinearities admissible for Q-design. Exploring purifiability further could enable convex control design for a richer class of nonlinear systems. We note, however, that other methods, notably including model predictive control, do not require purifiability.
Third, the class of nonlinear systems for which optimal control is convex could be further explored. We characterized only a subset of this class of systems; our method relied on the application of simple composition rules. While this method is constructive and compatible with convenient modeling software such as CVX [90] and YALMIP [92], it is not exhaustive.
Appendix A Proof of Theorem 1
Our task is to show that the relevant cost or constraint function is convex for every realization of the exogenous inputs. Here we have expressed the state trajectory through a composition of mappings that we will define shortly.
We recall that convexity is preserved under composition with an affine mapping, and under composition of a convex nondecreasing function with a convex function [83]; the outer function here is nondecreasing in each element of its argument by assumption. It therefore suffices to show that each row of the input-state mapping is convex.
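For reference, the two composition rules invoked here are standard [83] and can be stated as follows (in our notation):

```latex
% Affine pre-composition: if g is convex, then
x \mapsto g(Ax + b) \quad \text{is convex.}
% Monotone composition: if h : \mathbb{R}^k \to \mathbb{R} is convex and
% nondecreasing in each argument, and g_1, \dots, g_k are convex, then
x \mapsto h\big(g_1(x), \dots, g_k(x)\big) \quad \text{is convex.}
```

The induction below applies these rules one time step at a time.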
To do this, we use the recursive definitions of these mappings, which are initialized at the first time step and propagated forward through the dynamics. The proof now proceeds by induction. In the base case, the first state is convex in the decision variables by the convexity of the dynamics. For the inductive step, we suppose that each row of the input-state mapping is convex up to the current time. By assumption, each row of the dynamics is nondecreasing in the state and convex. Therefore, the next state involves compositions of convex functions with the affine mapping