On Stochastic Model Predictive Control with Bounded Control InputsThis research was partially supported by the Swiss National Science Foundation under grant 200021-122072

# On Stochastic Model Predictive Control with Bounded Control Inputs††thanks: This research was partially supported by the Swiss National Science Foundation under grant 200021-122072

Peter Hokayem, Debasish Chatterjee, John Lygeros The authors are with the Automatic Control Laboratory, Electrical Engineering, ETH Zurich, Switzerland hokayem,chatterjee,lygeros@control.ee.ethz.ch
###### Abstract

This paper is concerned with the problem of Model Predictive Control and Rolling Horizon Control of discrete-time systems subject to possibly unbounded random noise inputs, while satisfying hard bounds on the control inputs. We use a nonlinear feedback policy with respect to noise measurements and show that the resulting mathematical program has a tractable convex solution in both cases. Moreover, under the assumption that the zero-input and zero-noise system is asymptotically stable, we show that the variance of the state, under the resulting Model Predictive Control and Rolling Horizon Control policies, is bounded. Finally, we provide some numerical examples on how certain matrices in the underlying mathematical program can be calculated off-line.

## I Introduction

Model Predictive Control (MPC) for deterministic systems has received a considerable amount of attention over the last few decades, and significant advancements have been realized in terms of theoretical analysis as well as industrial applications. The motivation for such research thrust comes primarily from tractability of calculating optimal control laws for constrained systems. In contrast, the counterpart of this development for stochastic systems is still in its infancy.

The deterministic setting is dominated by worst-case analysis relying on robust control methods. The central idea is to synthesize a controller based on the bounds of the noise such that a certain target set becomes invariant with respect to the closed-loop dynamics. However, such an approach usually leads to rather conservative controllers and to large infeasibility regions, and although disturbances are not likely to be unbounded in practice, assigning an a priori bound to them seems to demand considerable insight. A stochastic model of the disturbance is a natural alternative approach to this problem: the conservatism of the worst-case analysis may be circumvented, and one need not impose any a priori bounds on the maximum magnitude of the noise. However, since in practice control inputs are almost always bounded, it is of great importance to consider hard bounds on the control inputs as essential ingredients of the controller synthesis; probabilistic constraints on the controllers naturally raise difficult questions on what actions to take when such constraints are violated (see however [1] for one possible approach to answer these questions).

In this paper we aim to provide answers to the following questions: Given a linear system that is affected by (possibly unbounded) stochastic noise, to be controlled by applying predictive-type bounded control inputs, (i) is the associated optimization problem tractable? (ii) under what conditions is stability (in a suitable stochastic sense) of the closed-loop system guaranteed? (iii) is stability retained both in the case of MPC implementation and the case of Rolling Horizon Control (RHC) implementation?

In the deterministic setting, there exists a plethora of literature that settles tractability and stability of model-based predictive control, see, for example, [2, 3, 4, 5] and the references therein. However, there are fewer results in the stochastic case, some of which we outline next. In [6], the authors reformulate the stochastic programming problem as a deterministic one with bounded noise and solve a robust optimization problem over a finite horizon, followed by estimating the performance when the noise can take unbounded values, i.e., when the noise is unbounded, but takes high values with low probability (as in the Gaussian case). In [7, 8] a slightly different problem is addressed in which the noise enters in a multiplicative manner into the system, and hard constraints on the state and control input are relaxed to probabilistic ones. Similar relaxations of hard constraints to soft probabilistic ones have also appeared in [9] for both multiplicative and additive noise inputs, as well as in [10]. There are also other approaches, for example those employing randomized algorithms as in [11, 12]. Finally, a related line of research can be found in [13], and a novel convex analysis dealing with chance and integrated chance constraints can be found in [14].

In this paper we restrict attention to linear time-invariant controlled systems with affine stochastic disturbance inputs. Our approach has three main features. Firstly, for the finite-horizon optimal control subproblem we adopt a feedback control strategy that is affine in certain bounded nonlinear functions of the past noise inputs. Secondly, instead of following the usual trend of adding element-wise constraints to the control input in the optimization, we propose a new approach that entails saturating the utilized noise measurements first and then optimizing over the feedback gains, ensuring that the hard constraints on the input will be satisfied by construction. This novel approach does not require artificially relaxing the hard constraints on the control input to soft probabilistic ones to ensure large feasible sets, and still provides a solution to the problem for a wide class of noise input distributions. In fact, we demonstrate that our strategy (without state constraints) leads to global feasibility. The effect of the noise appears in the finite-horizon optimal control problem as certain covariance matrices, and these matrices may be computed off-line and stored. Thirdly, the measurement saturation functions are only required to be elementwise bounded in order to ensure tractability of the optimization problem while maintaining hard constraints on the control input; therefore, these measurement saturation functions may be picked from among the wide class of saturation functions, the standard sigmoidal functions and their piecewise affine approximations, etc.

Once tractability of the finite-horizon underlying optimization problem is insured, it is possible to implement the resulting optimal solution using an MPC approach or an RHC approach. In the former case [2], the optimization problem is resolved at each step and only the first control input is implemented. In the latter case [15], the optimization problem is resolved every steps (with being the horizon length) and the entire sequence of input vectors is implemented. Both of these approaches are shown to provide stability under the assumption that the zero-input and zero-noise system is asymptotically stable, which translates into the condition that the state matrix is Schur stable. At a first glance, this assumption might seem restrictive. However, the problem of ensuring bounded variance of linear Gaussian systems with bounded control inputs is, to our knowledge, still open, and here we are considering the problem of controlling a linear system with bounded control input and possibly unbounded noise. It is known that for discrete-time systems without any noise acting on the system it is possible to achieve global stability if and only if the matrix is neutrally stable [16].

This paper unfolds as follows. In §II we state the main problem to be tackled with the underlying assumptions. In §III, we provide a tractable approach to the finite horizon optimization problem with hard constraints on the control input, as well as some examples in §III-A. Stability of the MPC and RHC implementations is shown in §IV, and hints onto the input-to-state stable properties of this result are provided in §IV-C. Finally, we provide a numerical example in §V and conclude in §VI.

### Notation

Hereafter, is the set of natural numbers, , and is the set of nonnegative real numbers. We let denote the indicator function of a set , and and denote the -dimensional identity and zeros matrices, respectively. Also, let denote the expected value given , and denote the trace of a matrix. For a given symmetric -dimensional matrix with real entries, let be the set of eigenvalues of , and let and . Let denote standard norm. Finally, the mean and covariance matrix of any vector are denoted by and , respectively.

## Ii Problem Statement

Consider the following general affine discrete-time stochastic dynamical model:

 xt+1=Axt+But+Fwt+r,t∈N0, (1)

where is the state, is the control input, is a stochastic noise input vector, , and are known matrices, and is a known constant vector. We assume that the initial condition is given and that, at any time , is observed exactly. We shall assume further that the noise vectors are i.i.d. and that the control input vector is bounded at each instant of time , i.e.,

 (2)

where is some given element-wise saturation bound. Note that the model (1) with constraints (2) can handle a wide range of convex polytopic constraints. In particular, any system

 xt+1=Axt+^Bvt+F^wt+^r (3)

with input constraints that can be transformed to the form (2) by an affine transformation

 vt=Sut+l

is amenable to our approach by setting and in (1). Note that the set need not necessarily be a hypercube, or even contain the origin. Note also that we can assume that is zero mean in (1) without loss of generality; given a system of the form (3) where is not zero mean, we can replace it by a system in the form (1) with zero mean in which

 wt=^wt−E[wt]

by setting .

Fix a horizon and set . The MPC procedure can be described as follows.

• Determine an admissible optimal feedback control policy, say , for an -stage cost function starting from time , given the (measured) initial condition ;

• increase to , and go back to step (a).

On the other hand, the RHC procedure simply replaces (b) above by

• apply the entire sequence of control inputs, update the state at the -th step, increase to and go back to step (a).

Accordingly, the -th step of this procedure consists of minimizing the stopped -period cost function starting at time , namely, the objective is to find a feedback control policy that attains

 infπ∈ΠVt,t+N−1(π,x):= infπ∈ΠEπxt[t+N−1∑i=t(xTiQixi+uTiRiui) +xTt+NQt+Nxt+N]. (4)

Since both the system (1) and cost (4) are time-invariant, it is enough to consider the problem of minimizing the cost for , i.e., the problem of minimizing over .
In view of the above we consider the problem

 minπ∈Π Ex0[N−1∑t=0(xTtQtxt+uTtRtut)+xTNQNxN], (5) s.t. dynamics(???),andconstraints(???)

where and are some given symmetric matrices of appropriate dimension. If feasible with respect to (2), Problem (5) generates an optimal sequence of feedback control laws .

The evolution of the system (1) over a single optimization horizon can be described in compact form as follows:

 ¯x=¯Ax0+¯B¯u+¯D¯F¯w+¯D¯r, (6)

where

 ¯x:=⎡⎢ ⎢ ⎢ ⎢⎣x0x1⋮xN⎤⎥ ⎥ ⎥ ⎥⎦,¯u:=⎡⎢ ⎢ ⎢ ⎢⎣u0u1⋮uN−1⎤⎥ ⎥ ⎥ ⎥⎦,¯r:=⎡⎢ ⎢⎣r⋮r⎤⎥ ⎥⎦,¯w:=⎡⎢ ⎢ ⎢ ⎢⎣w0w1⋮wN−1⎤⎥ ⎥ ⎥ ⎥⎦,
 ¯A:=⎡⎢ ⎢ ⎢ ⎢⎣In×nA⋮AN⎤⎥ ⎥ ⎥ ⎥⎦,¯B:=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣0n×m⋯⋯0n×mB⋱⋮ABB⋱⋮⋮⋱0n×mAN−1B⋯ABB⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦,

where the input

 ¯u∈¯U:={ξ∈RNm∣∣∥ξ∥∞≤Umax}. (7)

Using the compact notation above, the optimization Problem (5) can be rewritten as follows:

 minπ∈Π Ex0[¯xT¯Q¯x+¯uT¯R¯u], (8) s.t. dynamics(???),andconstraints(???),

where

The solution to Problem (8) is difficult to obtain in general. In order to obtain an optimal solution to Problem (8) over the class of feedback policies, we need to solve the Dynamic Programming equations. This generally requires using some gridding technique, making the problem extremely difficult to solve computationally. Another approach is to restrict attention to a specific class of state feedback policies. This will result in a suboptimal solution to our problem, but may yield a tractable optimization problem. It is the track we pursue in the next section.

## Iii Tractable Solution under Bounded Control Inputs

By the hypothesis that the state is observed without error, one may reconstruct the noise sequence from the sequence of observed states and inputs by the formula

 Fwt=xt+1−Axt−But−r,t∈N0. (9)

In the light of this, and inspired by the works [17, 18], we shall consider feedback policies of the form:

 ut=t−1∑i=0Gt,iFwi+dt, (10)

where the feedback gains and the affine terms must be chosen based on the control objective, while observing the constraints (2). With this definition, the value of at time depends on the values of up to time . Using (9) we see that is a function of the observed states up to time . It was shown in [18] that there exists a one-to-one (nonlinear) mapping between control policies in the form (10) and the class of affine state feedback policies. That is, provided one is interested in affine state feedback policies, parametrization (9) constitutes no loss of generality. Of course, this choice is generally suboptimal, but it will ensure the tractability of a large class of optimal control problems. In compact notation, the control sequence up to time is given by

 ¯u=¯G¯F¯w+¯d, (11)

where , and

 ¯G:=⎡⎢ ⎢ ⎢ ⎢ ⎢⎣0m×nG1,00m×n⋮⋱⋱GN−1,0⋯GN−1,N−20m×n⎤⎥ ⎥ ⎥ ⎥ ⎥⎦.

Since the elements of the noise vector are not assumed to be bounded, there can be no guarantee that the control input (11) will meet the constraint (7). This is a problem in practical applications, and has traditionally been circumvented by assuming that the noise input lies within a compact set [18], and designing a worst-case controller. In this article we propose to use the controller

 ¯u=¯G¯φ(¯F¯w)+¯d, (12)

 ¯φ(¯F¯w)=⎡⎢ ⎢⎣φ0(Fw0)⋮φN−1(FwN−1)⎤⎥ ⎥⎦,

is a shorthand for the vector , is the -th row of the matrix , and is any function with . In other words, we have chosen to saturate the measurements that we obtain from the noise input vector before inserting them into our control vector. This way we do not assume that the noise distribution is defined over a compact domain, which is an advantage over other approaches [6, 18]. Moreover, the choice of element-wise saturation functions is left open. As such, we can accommodate standard saturation, piecewise linear, and sigmoidal functions, to name a few.

###### Remark 1

Our choice of saturating the measurement from the noise vectors renders the optimization problem tractable as opposed to just calculating the whole input vector and then saturating it afterwards, which tends to an intractable optimization problem.

###### Remark 2

Note that the choices of control inputs in (11) and (12) are both non Markovian; however, they differ in the fact that the former depends affinely on previous noise inputs , whereas the latter is a nonlinear feedback due to passing noise measurements through the function .

###### Proposition 3

Assume that , . Then, Problem (8) with the input (12) is a convex optimization problem, with respect to the decision variables , which is given by

 min(¯G,¯d) bT¯d+¯dTM1¯d+tr(¯GTM1¯GΛ1+M2¯GΛ2) (13) s.t. |¯di|+∥∥¯Gi∥∥1ϕmax≤Umax,∀i=1,⋯,Nm

where is the -th row of ,

 bT =2(¯Ax0+¯D¯Fμ¯w+¯r)T¯Q¯B,M1=¯R+¯BT¯Q¯B, M2 =2¯FT¯DT¯Q¯B, Λ1 =diag{E[φ0(Fw0)φ0(Fw0)T],⋯, E[φN−1(FwN−1)φN−1(FwN−1)T]}, Λ2 =diag{E[φ0(Fw0)wT0],⋯, E[φN−1(FwN−1)wTN−1]}.
{proof}

Let us first consider the cost function in Problem 8. After substituting the system equations, we obtain

 Ex0[¯xT¯Q¯x+¯uT¯R¯u]= (14) Ex0[(¯Ax0+¯B¯u+¯D¯F¯w+¯r)T¯Q(¯Ax0+¯B¯u+¯D¯F¯w+¯r) +¯uT¯R¯u] =(¯Ax0+¯r)T¯Q(¯Ax0+¯r)+2(¯Ax0+¯r)T¯Q¯D¯FEx0[¯w] +2(¯Ax0+¯r)T¯Q¯BEx0[¯u]+2Ex0[¯wT¯FT¯DT¯Q¯B¯u] +Ex0[¯wT¯FT¯DT¯Q¯D¯F¯w]+Ex0[¯uT(¯R+¯BT¯Q¯B)¯u].

Note that since , we have that . Accordingly, using the definitions of , , , and ,

 Ex0[¯xT¯Q¯x+¯uT¯R¯u] =bT¯d+tr(M2¯GΛ2)+c +Ex0[¯uTM1¯u], (15)

where is a constant that we omit as it does not change the optimization problem, and we have used the following intermediate step

 Ex0[¯wT¯FT¯DT¯Q¯B¯u]=Ex0[¯wT¯FT¯DT¯Q¯B(¯G¯φ(¯F¯w)+¯d)] =tr(¯FT¯DT¯Q¯B¯GΛ2)+μT¯w¯FT¯DT¯Q¯B¯d.

Using again the assumption that , we have that

 Ex0[¯uT M1¯u]=Ex0[(¯G¯φ(¯F¯w)+¯d)TM1(¯G¯φ(¯F¯w)+¯d)] =tr(¯GTM1¯GEx0[¯φ(¯F¯w)¯φ(¯F¯w)T])+¯dTM1¯d =tr(¯GTM1¯GΛ1)+¯dTM1¯d. (16)

Finally, combining (15) and (16), we obtain the cost in Problem 13, which is convex.

Let us look at the constraints in Problem 8. The proposed control input (12) satisfies the hard constraints (7) as long as the following condition is satisfied: , such that . This is equivalent to the following conditions: , , such that . As these conditions should hold for any permissible value of the function , we can eliminate the dependence of the constraints on through the following optimization problems . It is straightforward now to show, using Hölder’s inequality [19, p. 29], that , and the result follows.

###### Remark 4

Problem (13) is a quadratic program in the optimization parameters  [20, p. 111], and can be solved efficiently by standard solvers such as cvx [21].

### Iii-a Examples

An important step in the solvability of Problem (13) is being able to calculate the matrices and . In general, these matrices can be calculated off-line by numerical integration. However, in some instances these matrices can be given in terms of explicit formulas; two of these instances are given in the following examples.

Recall the following standard special mathematical functions: the standard error function and the complementary error function [22, p. 297] defined by for , the incomplete Gamma function [22, p. 260] defined by for , the confluent hypergeometric function [22, p. 505] defined by for and is the standard Gamma function.

We collect a few facts in the following

For we have

1. ;

2. ;

3. ;

4. ;

5. .

###### Example 6

Let us consider (1) with Gaussian noise and sigmoidal bounds on the control input. More precisely, suppose that the noise process is an independent and identically distributed (i.i.d) sequence of Gaussian random vectors of mean and covariance . Let the components of be mutually independent, which implies that is a diagonal matrix . Suppose further that the matrix and that the function is a standard sigmoid, i.e., . Then from Proposition 5 we have for and ,

 E[φ(wij)2] =2⋅1√2π\vrule height 6.999893pt depth -5.5% 99915ptσi∫∞0t21+t2e−t22σ2i =√2π\vrule height 6.999893pt depth -5.599915ptσi−πe−12σ2ierfc(1√2\vrule height 6.999% 893pt depth -5.599915ptσi).

This shows that the matrix in Proposition 3 is equal to , where Similarly, since

 E[φ(wij)wij] =σi√2\vrule height 6.999893pt depth -5.5999% 15ptU(12,0,12σ2i),

the matrix in Proposition 3 is , where Therefore, given the system (1), the control policy (10), and the description of the noise input as above, the matrices and derived above complete the set of hypotheses of Proposition 3. The problem (5) can now be solved as a quadratic program (13).

Note that we have chosen to use the standard sigmoidal functions in Example 6. However, the result still holds for more general sigmoidal functions of the form , where is some given magnitude and is some given slope. This slight change is reflected in the entries of the matrices and , i.e., for and ,

 E[φ(wij)2]

and .

###### Example 7

Consider the system (1) as in Example 6, with being the standard saturation function defined as . From Proposition 3 we have for and ,

 ξ′i

and

 ξ′′i :=E[φ(wij)wij]=1√2π\vrule h% eight 6.999893pt depth -5.599915ptσi∫∞−∞tφ(t)e−t22σ2idt

Therefore, in this case the matrix in Proposition 3 is with , and the matrix is with . These information complete the set of hypotheses of Proposition 3, and the problem (5) can now be solved as a quadratic program (13).

## Iv Stability Analysis

In this section, we assume that the matrix is Schur stable, i.e., , . Accordingly, and since the control is bounded, it is intuitively evident that the closed-loop system is stable in some sense. Indeed, we shall show that the variance of the state is uniformly bounded both in the MPC and RHC cases, the only difference being a choice of implementation based on available memory.

First we need the following Lemma. It is a standard variant of the Foster-Lyapunov condition [23]; we include a proof here for completeness. The hypotheses of this Lemma are stronger than usual, but are sufficient for our purposes; see e.g., [24] for more general conditions.

###### Lemma 8

Let be an -valued Markov process. Let be a continuous positive definite and radially unbounded function, integrable with respect to the probability distribution function of . Suppose that there exists a compact set and a number such that

 E[V(x1)∣∣x0=x]⩽λV(x),∀x∉K.

Then .

{proof}

From the conditions it follows immediately that

 Ex[V(x1)]⩽λV(x)+b1K(x),∀x∈Rn

where . We then have

 Ex[V(xt)] =Ex[E[V(xt)∣∣xt−1]] (17) ⩽Ex[E[λV(xt−1)+b1K(xt−1)]] ⩽λtV(x)+t−1∑i=0λt−1−ibEx[1K(xi)] ⩽λtV(x)+b(1−λt)1−λ, (18)

which shows that as claimed.

We shall utilize Lemma 8 in order to show that the implementation of either the MPC or the RHC strategy generated by the solution of Problem (13) results in a uniformly bounded state variance.

### Iv-a MPC Case

The MPC implementation corresponding to our input (12) and optimization program (13) consists of the following steps: Given a fixed optimization horizon , set the initial time , calculate the optimal control gains using the program (13), apply the first optimal control input , increase to , and iterate. Of course, the optimal gain depends implicity on the current given initial state, i.e., , which in turn gives rise to a stationary infinite horizon optimal policy given by . The closed-loop system is thus given by

 xt+1=Axt+B¯d∗0|t+Fwt+r,t∈N0. (19)
###### Proposition 9

Assume that the matrix is Schur stable and the assumptions of Proposition 3 hold. Then, under the control policy defined above, the closed loop system (19) satisfies .

{proof}

Since by assumption the matrix is Schur stable, there exists a positive definite and symmetric matrix with real entries, say , such that . Using the system (19), at each time instant we have

 Ext[xTt+1Pxt+1]= Ext[(Axt+B¯d∗0|t+Fwt+r)TP(Axt+B¯d∗0|t+Fwt+r)] =xTtATPAxt+2xTtATP(B¯d∗0|t+Fμwt+r) +¯d∗T0|tBTPB¯d∗0|t+rTPr+2(Fμwt+r)TPB¯d∗0|t +2rTPFμwt+tr(FTPFΣwt).

Using the fact that (from  (13)), we obtain the following bound

 Ext[xTt+1Pxt+1] ≤xTtATPAxt+2c1∥xt∥∞+c2,

where and . Since , we have that

 (20)

For we know that

where . From (20) it now follows that whence

We see that the hypotheses of Lemma 8 are satisfied with , , and . Since , it follows that

which completes the proof.

### Iv-B RHC Case

In the RHC implementation is also iterative in nature, however instead of recalculating the gains at each time instant the optimization problem is solved every steps, where . The resulting optimal control policy (applied over a horizon ) is given by , where again the control gains depend implicitly on the initial condition , i.e., and . Therefore, the optimal policy is given by . For , the resulting closed-loop system over horizon is given by

 xkN+ℓ=AℓxkN+¯Bℓ¯G∗kN¯φ(¯F¯w)+¯Bℓ¯d∗kN+¯Dℓ¯F¯w+¯Dℓ¯r, (21)

where , and and are suitably defined matrices that are extracted from and , respectively.

###### Proposition 10

Assume that the matrix is Schur stable and the assumptions of Proposition 3 hold. Then, under the control policy defined above, the closed loop system (21) satisfies .

{proof}

Using (21) and the fact that , , we have that

 ExkN[xTkN+ℓPℓxkN+l]=xTkN(Aℓ)TPℓAℓxkN +2xTkN(Aℓ)TPℓ(¯Bℓ¯d∗kN+¯Dℓ¯Fμ¯w+¯Dℓ¯r)+¯rT¯DTℓPℓ¯Dℓ¯r +(¯d∗kN)T¯BTℓPℓ¯Bℓ¯d∗kN+2(¯d∗kN)T¯BTℓ