Convex Relaxations for Nonlinear Stochastic Optimal Control Problems
Abstract
This article presents a new method for computing guaranteed convex and concave relaxations of nonlinear stochastic optimal control problems with finaltime expectedvalue cost functions. This method is motivated by similar methods for deterministic optimal control problems, which have been successfully applied within spatial branchandbound (B&B) techniques to obtain guaranteed global optima. Relative to those methods, a key challenge here is that the expectedvalue cost function cannot be expressed analytically in closed form. Nonetheless, the presented relaxations provide rigorous lower and upper bounds on the optimal objective value with no samplebased approximation error. In principle, this enables the use of spatial B&B global optimization techniques, but we leave the details of such an algorithm for future work.
I Introduction
This article concerns the guaranteed global solution of the stochastic optimal control problem stated informally as
(1)  
where and denote the expected value and probability over continuous random variables , respectively, and is the solution of the nonlinear ordinary differential equations (ODEs)
(2)  
The decision vector may represent a parameterized openloop control trajectory, parameters in an explicit feedback controller embedded in (2), etc. Such problems arise in stochastic model predictive control [1], renewable energy systems [2], trajectory planning [3], chemical process control [4], and many other applications.
For optimal control problems with deterministic objectives and constraints, a number of algorithms have recently been developed that can provide guaranteed global solutions [5, 6, 7, 8]. In brief, these methods are predicated on effective algorithms for enclosing the reachable set of the dynamics on subintervals of the decision space. These enclosures can take the form of fixed interval bounds or other fixed sets [7], but are more commonly described by bounds that depend affinely or convexly on the decisions [9, 10, 11, 12, 13, 8]. With such an enclosure, it is possible to construct convex relaxations of the optimal control problem on arbitrary subintervals of the decision space, and using these, to compute bounds on the optimal objective value on such subintervals. Finally, these bounds can be used within a generic spatial branchandbound (B&B) algorithm [14] to obtain a rigorous global solution.
To the best of our knowledge, no such guaranteed global optimization algorithm is available for the stochastic problem (1). In this case, a critical new challenge is that the expectedvalue and probability appearing in the objective and constraints cannot be evaluated analytically in closed form, and must be evaluated instead by sampling. In the context of the optimization approach outlined above, this is problematic because it is no longer possible to obtain guaranteed bounds on the optimal objective value. In fact, using only samplebased approximations, it is not even possible to bound the objective and constraint values at a given feasible point with finitely many computations.
In practice, this problem is most commonly addressed by replacing the objective and constraints in (1) by sampleaverage approximations (SAA), resulting in a deterministic optimal control problem that can be solved using existing methods. However, SAA has several critical limitations that can lead to inaccurate solutions or excessive computational cost. First, it only guarantees convergence to a global solution as the sample size tends to infinity [15]. Moreover, the number of samples required to achieve a highquality solution in practice is unknown and can be quite large [16, 17]. More importantly, a sufficient sample size is not known in advance. Thus, it is often necessary to solve several SAA problems with independent samples to assess solution accuracy, and to repeat the entire process if a larger sample size is deemed necessary [17]. This is clearly problematic for nonconvex optimal control problems, where solving a single instance to global optimality is already demanding.
In this article, we take a first step towards extending the rigorous global optimization methods outlined above to the stochastic problem (1). Specifically, our main contribution is a new method for computing guaranteed convex and concave relaxations of the finaltime expectedvalue objective function. As with the deterministic methods above, we rely on an existing method for computing timevarying bounds on the solutions of (2) [10]. However, we modify the method here to obtain lower and upper bounds that are convex and concave, respectively, with respect to both and . Through an application of Jensen’s inequality, we then obtain a timevarying, dependent convex enclosure of the mapping . This relaxation method is similar in spirit to the socalled probability bounds for dynamic systems in [18, 19], but these works do not consider expectedvalue bounds and have not been applied in the context of optimization. Our method is also related to our recent work in [20] (Preprint available at osf.io/qab6n), which described convex and concave relaxations of expectedvalue functions that do not depend on the solution of a dynamic system.
In the absence of chance constraints, the relaxation technique presented here can provide both lower and upper bounds on the optimal objective value of (1), without resorting to samplebased approximations. In principle, this enables the application of spatial B&B to solve (1) to guaranteed global optimality with no approximation error. However, we leave the details of such a B&B algorithm, as well as the treatment of chance constraints, for future work.
The remainder of this article is organized as follows. First, Section II gives a formal problem statement. Our new relaxation theory is then developed in two steps in Sections III and IV. In Section V, we apply the developed relaxations to obtain computable upper and lower bounds on the optimal objective value of (1) in the absence of chance constraints. In Section VI, we demonstrate the proposed relaxation technique on a simple case study. Finally, Section VII provides concluding remarks.
Ii Problem Statement
Let be a time horizon of interest, let be a compact dimensional interval of decision variables , and let be a random vector with probability density function (PDF) . We assume that is zero outside of a compact interval . Let and be locally Lipschitz continuous functions defining the dynamics (2). We assume that (2) has a unique solution on all of for every . Finally, let and define
(3)  
(4) 
which is assumed to exist for every .
We are interested in computing convex and concave relaxations, defined as follows.
Definition 1.
Let be convex and . Functions are convex and concave relaxations of on , respectively, if is convex on , is concave on , and
The objective of this article is to develop a method for computing convex and concave relaxations of on any given subinterval of . Specifically, we are interested in relaxations of itself, rather than any finite approximation of via sampling, quadrature, etc. At the same time, the relaxations themselves must be finitely computable to be of value in the context of spatial B&B.
The following general notation is used in the remainder of the article. For any with , let denote the compact dimensional interval . Moreover, for , let denote the set of all compact interval subsets of . In particular, let denote the set of all compact interval subsets of .
Iii Relaxing the Dynamics on
The first step in our relaxation procedure is to compute convex and concave relaxations of the function defined by
(5) 
Specifically, we will show in §IV that the desired relaxations of the expected value can be readily computed from convex and concave relaxations of jointly with respect to and . Assuming that is known in closed form, and hence amenable to standard relaxation techniques [21, 22], the only complication in computing such joint relaxations of is the presence of the terminal time state vector in (5), which we naturally assume is not known in closed form. To deal with this, we will construct joint state relaxations defined as follows.
Definition 2.
Choose any intervals and . Two functions are called state relaxations for (2) on if and are, respectively, convex and concave relaxations of on , for every .
Several methods have been developed for computing state relaxations in the deterministic case [9, 10, 11, 12, 13, 8]. Here, we extend the method in [10] to produce joint relaxations on . However, because Definition 2 makes no mathematical distinction between the decisions and the RVs , the extension is direct (our overall relaxation procedure treats and differently beginning in §IV).
The method in [10] computes state relaxations as the solutions of an auxiliary system of ODEs. Defining this system requires particular kinds of relaxations of the functions , and to be available, which we now assume. Although the following assumptions may seem restrictive, the required relaxations can be automatically constructed for nearly any functions , , and through a generalization of McCormick’s relaxation technique, as discussed in detail in [10, 23].
Assumption 3.
Assume that the following functions are available for any intervals :

are continuous convex and concave relaxations of on .

are Lipschitz continuous and satisfy the following condition: For any continuous and any fixed , the functions
(6) (7) are respectively convex and concave relaxations of
(8) on , provided that and are respectively convex and concave relaxations of on .

are Lipschitz continuous and satisfy the following condition: For any continuous , the functions
(9) (10) are respectively convex and concave relaxations of
(11) on , provided that and are respectively convex and concave relaxations of on .
Under Assumption 3, the following theorem provides state relaxations for (2) and the desired relaxations of .
Theorem 4.
Choose any and define the auxiliary system ODEs:
(12) 
for all . This system has unique solutions , and these solutions are state relaxations for (2) on . Moreover, the functions
are convex and concave relaxations of on .
Proof.
Iv Relaxing the Expected Value on
In this section, we develop a method for computing convex and concave relaxations of the expected cost function
(13) 
on any given . In light of Theorem 4, we assume throughout this section that relaxations and of are available on any desired subinterval .
To begin, note that for any and , is bounded from above and below by the values and , respectively, as a trivial consequence of integral monotonicity. Moreover, these functions can readily be shown to be concave and convex on , respectively. However, relaxations defined in this way are of no value for B&B global optimization since they must be evaluated by sampling in general, and so guaranteed bounds cannot be computed from such relaxations finitely. To overcome this limitation, we follow the technique recently proposed for standard stochastic programs (rather than optimal control problems) in [20]. Namely, we apply Jensen’s inequality to pass the expectation operator inside the relaxation functions and .
Lemma 5 (Jensen’s inequality).
Let be convex and let . If is convex and exists, then . If is concave, then .
Proof.
See Proposition 1.1 in [24]. ∎
Although we could apply Jensen’s inequality directly to the relaxations and on the whole uncertainty set , this introduces conservatism in the resulting relaxations that cannot be controlled. In particular, relaxations defined in this way may not converge to as the interval tends towards a singleton which is required for the convergence of spatial B&B algorithms [14]. Thus, we instead apply Jensen’s inequality on a partition of that can be refined as needed.
Definition 6.
A collection of intervals is called an interval partition of if and for all distinct and .
Definition 7.
For any measurable , let denote the probability of the event , and let denote the conditional expected value conditioned on the event .
The following theorem provides the desired relaxations of .
Theorem 8.
Let be an interval partition of . For every and every , define
(14)  
(15) 
Then and are convex and concave relaxations of on , respectively.
Proof.
By the law of total expectation (Proposition 5.1 in [25]), can be expressed for any as
(16) 
Thus, for any and any , integral monotonicity and Jensen’s inequality give,
(17)  
(18)  
(19) 
Noting that is a sum of convex functions on , it must be convex itself, and is therefore a convex relaxation of on . The proof for is analogous. ∎
By considering an exhaustive partition of , Theorem 8 provides relaxations that are valid for the true expected value , rather than a finite approximation obtained via sampling or otherwise. Moreover, in contrast to samplebased approaches, the relaxations in Theorem 8 can be evaluated finitely provided that the probabilities and conditional expectations are computable. This is clearly true if is uniformly distributed. On the other hand, directly evaluating these quantities for more general RVs often requires difficult multidimensional integrations. However, the article [20] presents an approach that avoids these computations for a variety of common distributions by using wellknown changeofvariables formulas to reformulate the RVs of interest as uniform RVs (e.g., using the inverse CDF transform). Since our relaxation theory does not require linearity or convexity assumptions, using such transformations poses no additional difficulties.
Clearly, the use of an exhaustive partition of in Theorem 8 is also a drawback, since in practice it limits our approach to models with a modest number of RVs. However, note that Theorem 8 provides valid relaxations on any partition , no matter how coarse. Thus, one can use partitions appropriate for any desired level of accuracy. In particular, in the context of spatial B&B, coarse partitions may be sufficient to eliminate large regions of the search space from consideration, with fine partitions being required only in the vicinity of global optimizers. We leave the issue of effective partitioning rules for future work. We also leave for future work a formal analysis of the convergence of and to as tends towards a singleton . However, note that Theorem 5.4 in [20] shows that the expectedvalue relaxation strategy used in Theorem 8 inherits the convergence properties of the integrand relaxations and , provided that is partitioned sufficiently quickly as the width of diminishes. In turn, the convergence properties of and depend on those of the relaxations defined in Assumption 3, and have been studied in detail in [26, 27].
V Rigorous Bounds on Stochastic Optimal Control Problems
In this section, we apply the results of Sections III and IV to establish computable upper and lower bounds on the optimal objective value of the stochastic optimal control problem
(20) 
Specifically, in order to solve (20) to gauranteed global optimality using spatial B&B, it is necessary to provide the B&B routine with upper and lower bounds for (20) restricted to any given interval . To obtain a lower bound, the standard approach is to minimize a convex relaxation over , which is available via Theorem 8. To obtain an upper bound, one common approach is simply to evaluate the objective at any feasible . However, this is not possible for (20) because cannot be evaluated finitely. Thus, a different approach is required to obtain a valid upper bound. In the following corollary, we employ a special case of the concave relaxation defined in Theorem 8.
Corollary 9.
Let be an interval partition of and define the shorthand . For any , a lower bound on the optimal objective value of (20) restricted to is given by the optimal objective value of the following deterministic convex optimal control problem:
(21) 
where and are the unique solutions of the independent systems of ODEs given for each by
(22) 
Moreover, for any , an upper bound on the optimal objective value of (20) restricted to is given by the value
(23) 
where and now denote the unique solutions of the independent systems of ODEs given for each by
(24) 
Vi Numerical Example
The following nonlinear ODEs describe a negative resistance circuit consisting of an inductor, a capacitor, and a resistive element in parallel, where is the current through the inductor and is the voltage across the capacitor [28]:
(25)  
We take the initial conditions to be independent random variables, both following a truncated normal distribution with mean , standard deviation , and truncation range . The parameters and are the inverses of the inductance and capacitance, respectively, and are scaled so that (25) is dimensionless.
Consider the problem of relaxing the expectedvalue function on the set with . Clearly, this function cannot be evaluated analytically, so standard relaxation techniques are not applicable. To apply Theorems 4 and 8, we considered interval partitions of consisting of 1, 16, and 64 uniform subintervals. We computed relaxations of the initial condition functions and righthand side functions in (25) satisfying Assumption 3 using generalized McCormick relaxations [23] as described in [10]. For every interval , the state relaxation system defined in Theorem 4 was solved at the point using the code CVODE
in the Sundials Matlab Toolbox [29] with default tolerances. Finally, relaxations of were constructed by summing the resulting relaxations of at weighted by the probabilities , as described in Theorem 8.
Figure 1 shows the resulting convex and concave relaxations, along with several final time solutions of (25) computed for random samples of and a sampleaverage approximation of computed using 200 samples. Note that the relaxations enclose the true expected value as per Theorem 8, but need not enclose all solutions of (25) for sampled . Figure 1 shows that the relaxations become tighter as the partition is refined, but the improvement from 16 to 64 subintervals is minor. This suggests that the proposed method can provide reasonably tight relaxations using fairly coarse partitions of the uncertainty space.
Vii Conclusions
The main contribution of this article is a new approach for computing convex and concave relaxations of nonlinear stochastic optimal control problems with finaltime expectedvalue cost functions. These relaxations can be used to compute rigorous upper and lower bounds on the optimal objective value of such problems restricted to any given subinterval of the decision space, as required for global optimization via spatial branchandbound. In this context, the key features of the presented relaxations are: (i) they provide valid bounds on the true objective function, without resorting to discrete approximations of either the expected value or the embedded dynamic model; (ii) they can be computed finitely, even when the objective function itself can only be approximated by sampling. Yet, both of these properties result from the use of an exhaustive partition of the uncertainty space (assumed compact), which limits the applicability of these relaxations to problems with a modest number of random variables. The presented case study suggests that the proposed relaxations can be made tight with a modest partition of the uncertainty space.
References
 [1] A. Mesbah. Stochastic model predictive control: An overview and perspectives for future research. IEEE Control Systems, 36(6):30–44, Dec 2016.
 [2] A. Hakizimana and J.K. Scott. Differentiability conditions for stochastic hybrid systems arising in the optimal design of microgrids. J. Optim. Theor. Appl., 173(2):658–682, 2017.
 [3] L. Blackmore, M. Ono, and B.C. Williams. Chanceconstrained optimal path planning with obstacles. IEEE Trans. Robotics, 27(6):1080–1094, Dec 2011.
 [4] Hesam Ahmadian Behrooz. Robust design and control of extractive distillation processes under feed disturbances. Industrial & Engineering Chemistry Research, 56(15):4446–4462, 2017.
 [5] J.K. Scott and P.I. Barton. Reachability analysis and deterministic global optimization of DAE models. In Achim Ilchman and Timo Reis, editors, Surveys in Differential Algebraic Equations III, volume 3, pages 61–116. Springer International Publishing, 2015.
 [6] B. Houska and B. Chachuat. Branchandlift algorithm for deterministic global optimization in nonlinear optimal control. J. Optim. Theor. Appl., 162(1):208–248, 2014.
 [7] Yao Zhao and Mark A. Stadtherr. Rigorous global optimization for dynamic systems subject to inequality path constraints. Industrial & Engineering Chemistry Research, 50(22):12678–12693, 2011.
 [8] I. Papamichail and C.S. Adjiman. A rigorous global optimization algorithm for problems with ordinary differential equations. J. Glob. Optim., 24(1):1–33, 2002.
 [9] A.B. Singer and P.I. Barton. Global optimization with nonlinear ordinary differential equations. J. Glob. Optim., 34:159–190, 2006.
 [10] J.K. Scott, B. Chachuat, and P.I. Barton. Nonlinear convex and concave relaxations for the solutions of parametric ODEs. Optimal Control Applications and Methods, 34(2):145–163, 2013.
 [11] J.K. Scott and P.I. Barton. Improved relaxations for the parametric solutions of ODEs using differential inequalities. J. Glob. Optim., 57(1):143–176, 2013.
 [12] A.M. Sahlodin and B. Chachuat. Convex/concave relaxations of parametric ODEs using Taylor models. Computers and Chemical Engineering, 35:844–857, 2011.
 [13] A.M. Sahlodin and B. Chachuat. Discretizethenrelax approach for convex/concave relaxations of the solutions of parametric ODEs. Applied Numerical Mathematics, 61:803–820, 2011.
 [14] Reiner Horst and Hoang Tuy. Global Optimization: Deterministic Approaches. Springer, New York, third edition, 1996.
 [15] Alexander Shapiro. Stochastic programming approach to optimization under uncertainty. Mathematical Programming, 112(1):183–220, 2008.
 [16] Bram Verweij, Shabbir Ahmed, Anton J. Kleywegt, George Nemhauser, and Alexander Shapiro. The sample average approximation method applied to stochastic routing problems: A computational study. Computational Optimization and Applications, 24(2):289–333, 2003.
 [17] Jeff Linderoth, Alexander Shapiro, and Stephen Wright. The empirical behavior of sampling methods for stochastic programming. Annals of Operations Research, 142(1):215–241, 2006.
 [18] J.A. Enszer, Y. Lin, S. Ferson, G.F. Corliss, and M.A. Stadtherr. Probability bounds analysis for nonlinear dynamic process models. AIChE Journal, 57(2):404–422, 2011.
 [19] Joshua A. Enszer, D. Andrei Maces, and Mark A. Stadtherr. Probability bounds analysis for nonlinear population ecology models. Mathematical Biosciences, 267:97–108, 2015.
 [20] Y. Shao and J. K. Scott. Convex relaxations for global optimization under uncertainty described by continuous random variables (preprint: arxiv:1709.08780). 2017.
 [21] G.P. McCormick. Computability of global solutions to factorable nonconvex programs: Part I  convex underestimating problems. Mathematical Programming, 10:147–175, 1976.
 [22] Mohit Tawarmalani and Nikolaos V. Sahinidis. Convexification and Global Optimization in Continuous and MixedInteger Nonlinear Programming. Kluwer Academic Publishers, 2002.
 [23] J.K. Scott, M.D. Stuber, and P.I. Barton. Generalized McCormick relaxations. Journal of Global Optimization, 51:569–606, 2011.
 [24] Michael D. Perlman. Jensen’s inequality for a convex vectorvalued function on an infinitedimensional space. Journal of Multivariate Analysis, 4(1):52–65, 1974.
 [25] S.M. Ross. A First Course in Probability. Prentice Hall, 2002.
 [26] S.D. Schaber, J.K. Scott, and P.I. Barton. Convergenceorder analysis for differentialinequalitiesbased bounds and relaxations of the solutions of ODEs. Submitted, pages 1–39, 2017.
 [27] Spencer D. Schaber. Tools for dynamic model development. PhD thesis, Massachusetts Institute of Technology, 2014.
 [28] K. Hassan Khalil. Nonlinear Systems. Prentice Hall, Upper Saddle River, NJ, third edition, 2002.
 [29] A. C. Hindmarsh, P. N. Brown, K. E. Grant, S. L. Lee, R. Serban, D. E. Shumaker, and C. S. Woodward. SUNDIALS, suite of nonlinear and differential/algebraic equation solvers. ACM Transactions on Mathematical Software, 31:363–396, 2005.