Stochastic Model Predictive Control for Constrained Linear Systems Using Optimal Covariance Steering

# Stochastic Model Predictive Control for Constrained Linear Systems Using Optimal Covariance Steering

Kazuhide Okamoto   Panagiotis Tsiotras K. Okamoto is with the School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0150, USA. Email: kazuhide@gatech.eduP. Tsiotras is with the School of Aerospace Engineering, and also with the Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA 30332-0150, USA. Email: tsiotras@gatech.edu
###### Abstract

This work develops a stochastic model predictive controller (SMPC) for uncertain linear systems with additive Gaussian noise subject to state and control constraints. The proposed approach is based on the recently developed finite-horizon optimal covariance steering control theory, which steers the mean and the covariance of the system state to prescribed target values at a given terminal time. We show that the proposed approach has several advantages over traditional SMPC approaches in the literature. Specifically, it is shown that the newly developed algorithm can deal with unbounded Gaussian additive noise while ensuring stability and recursive feasibility, and with less computational cost than previous approaches. In addition, we demonstrate the recursive feasibility and guaranteed stability of the proposed CS-SMPC algorithm. The effectiveness of the proposed approach is verified and compared to traditional approaches using numerical simulations.

## 1 Introduction

Model predictive control (MPC), often also referred to as receding horizon control, has been an active research topic both in academia [1] and industry [2, 3], because it provides robustness, deals with complex constraints, and yields near-optimal performance. In the standard MPC framework, one solves a finite-horizon optimal control problem and executes the first element of the computed optimal control sequence. At the next time step, one solves another finite-horizon optimal control problem with the updated initial condition. By doing so, MPC implicitly closes the loop and achieves stability, assuming certain additional conditions hold [4].

A special case of MPC is learning-based MPC (LBMPC), in which the system parameters are identified on-line, and they are used in the optimal control sequence in order to guarantee safety, robustness, and convergence. Bouffard et al. [5] used an extended Kalman filter (EKF) to estimate the system state and update the model parameters and formulated the optimal control problem as a convex optimization problem. The authors of [5] experimentally demonstrated their approach using a small unmanned aerial vehicle (UAV) trying to catch a ball. Rosolia and Borrelli [6] applied LBMPC for iterative tasks, in which the system parameters and the safe state sets are identified from previous iterations, from which the optimal control sequence is then computed. This approach was later applied to autonomous vehicle racing in [7], where the information obtained from the previous laps was utilized to better control the vehicle. In addition, explicit MPC [8, 9] has been developed to avoid solving an optimal control problem online, by explicitly representing the optimal control actions as a function of the state and reference values. Furthermore, the approach proposed in [10] used a hybrid MPC approach to lower the computational cost of MPC for nonlinear plants.

The above mentioned MPC approaches have been developed for deterministic systems, and thus, they do not systematically deal with system uncertainty. In order to overcome this difficulty, robust MPC (RMPC) was developed with the assumption that the disturbances lie in a compact set. Unlike the case of deterministic MPC, asymptotic stability to the origin is not achievable [11], and instead RMPC tries to achieve asymptotic stability to a set. Another approach to deal with system uncertainty is tube-MPC. The control policy of the tube-MPC [12] separates the controller into a mean controller and a feedback controller that is proportional to the deviation from the expected state value with stabilizing state feedback gain, and thus achieves asymptotic stability to a set. In addition to the previous works, Magni et al. [13] analyzed the domain of attraction of nonlinear MPC with disturbances using input-to-state stability (ISS).

RMPC only considers the range of the uncertain parameter values and does not consider their distribution. In order to explicitly deal with the probability distribution of the system uncertainty, stochastic MPC (SMPC) has been developed. See [14] for a recent overview of SMPC. Since SMPC explicitly deals with the probability distribution of the uncertainty, it uses chance constraints, which impose a maximum probability of state/output constraint violation [15, 16]. Some applications of SMPC include building climate control [17], autonomous vehicle control [18], and bacterial fermentation control [19].

Stochastic MPC approaches can be categorized into stochastic tube approach [20], disturbance-feedback control [21], and scenario-based approach [22]. Our proposed approach is most closely related to the first and the second approaches. As open-loop controllers have difficulty in dealing with disturbances, generally, feedback policies are optimized instead of the control sequence alone for SMPC problems. However, it is difficult to optimize over arbitrary feedback policies especially when some constraints have to be satisfied. Thus, a popular approach is the use of “pre-stabilizing” control policies to construct a stabilizing linear feedback control law of the form , which can be computed off-line. The resulting control command takes the form of , where is computed on-line. Although this approach reduces computational complexity, it is difficult to compute a-priori a state feedback gain that is not too conservative.

Many research works on SMPC achieve recursive feasibility and convergence by assuming a bounded probability distribution of the disturbance, e.g., a truncated Gaussian distribution. SMPC approaches dealing with unbounded disturbance distributions have been proposed by several researchers [19, 23, 24]. For example, Farina et al. [23] considered a control policy that consists of a feed-forward controller and a feedback controller that takes the state deviation from the expectation as its input. Unlike stochastic tube approaches, the feedback gain in [23] was time-varying, and the authors of [23] tried to simultaneously optimize the feed-forward controller and feedback gains. Although this approach can overcome the difficulty of stochastic tube approaches in the design of the feedback gain, the non-convex covariance dynamics needs to be relaxed. Thus, this approach may have difficulty in computing the system covariance at each time step. The approach we propose in this article overcomes this difficulty by utilizing the results from the newly developed finite horizon optimal covariance steering control theory [25, 26].

An optimal covariance steering controller steers the covariance of a stochastic linear system to a target terminal value, while minimizing a state and control expectation-dependent cost. While the infinite horizon case has been researched since the late ‘80s [27, 28], the finite horizon case has not been investigated until recently [29, 30, 31, 32]. In our previous work [25], we introduced state chance constraints into the optimal covariance steering problem, and in [26], we proposed a path planner that is based on optimal covariance steering. To our knowledge, the current paper is the first work to incorporate optimal covariance steering into the SMPC framework.

The main contribution of this manuscript is the utilization of the optimal covariance steering theory to the SMPC framework. Although optimal covariance steering and SMPC have been developed independently from each other, they have many theoretical similarities, and by combining them as a covariance steering-based stochastic model predictive controller (CS-SMPC) we can achieve computational efficiency along with the ability to deal with unbounded Gaussian disturbances while also guaranteeing feasibility and stability. The key advantage of directly controlling the covariance of the state is illustrated in Fig. 3. While standard methods (e.g., stochastic tube MPC) try to control the mean state so that the end of the horizon the covariance is within the given bounds, CS-SMPC directly, and simultaneously, controls the covariance and the mean so obtain feasible solutions that satisfy the constraints. As a result, CS-SMPC leads to less conservative controllers that tend to operate closer to the boundaries, thus increasing performance. The benefit of the proposed CS-SMPC approach is also described with a vehicle control example in Fig. 16 in Section 5.

The remainder of this paper is organized as follows. Section 2 formulates the problem and introduces the general SMPC problem setup. In Section 3, before introducing the CS-SMPC algorithm, we review some mathematical preliminaries and briefly discuss the newly developed finite horizon optimal covariance steering algorithm [26], which is the basis for the CS-SMPC. Section 4 introduces the proposed CS-SMPC approach, followed by the proof of recursive feasibility and guaranteed stability. In Section 5 we validate the effectiveness of the CS-SMPC approach using numerical simulations. Finally, Section 6 summarizes the current work and discusses future research directions.

The notation used in this paper is quite standard. We use and to denote the fact that the matrix is positive definite and semidefinite, respectively. Also, we use and to denote element-wise inequalities. denotes the trace of the square matrix , and denotes the block-diagonal matrix with block-diagonal matrices . is the 2-norm of the vector . is the Frobenius norm of the matrix . denotes the range space of the matrix . denotes a random variable sampled from a Gaussian distribution with mean and (co)variance . denotes the expected value, or the mean, of the random variable . is the identity matrix of size .

## 2 Problem Statement

In this section we formulate the general SMPC problem and introduce the necessary mathematical preliminaries along with the optimal covariance steering background theory used in the proposed approach.

### 2.1 Problem Formulation

We consider the following discrete-time stochastic linear time-invariant (LTI) system with additive noise,

 xk+1=Axk+Buk+Dwk, (1)

where is the time-step index, is the state, is the control input, and is a zero-mean independently and identically distributed (i.i.d.) Gaussian noise. Thus, has the following properties,

 E[wk]=0,E[wk1w⊤k2]=Inwδk1,k2, (2)

where is the Kronecker’s delta function. In addition, the following holds

 E[xk1w⊤k2]=0,0≤k1≤k2. (3)

It is assumed that the state and control inputs are subject to the constraints

 xk∈X,uk∈U, (4)

for all , where and are convex sets containing the origin. Throughout this work, we assume that the sets and are convex polytopes, represented as the intersection of a finite number of linear inequality constraints as follows

 X ≜Ns−1⋂i=0{x:α⊤x,ix≤βx,i}, (5) U ≜Nc−1⋂j=0{u:α⊤u,ju≤βu,j}, (6)

where and are constant vectors, and and are constant scalars. In (5) and (6), and denote the number of state and control constraints defining the polytopes, respectively. Notice that, since the system noise in (1) is possibly unbounded, the state may be unbounded as well. Thus, we formulate the state constraints probabilistically, as chance constraints

 Pr(xk∈X)≥1−ϵx, (7)

where . Using Boole’s inequality, (5) and (7) can be satisfied, assuming

 Pr(α⊤x,ixk≤βx,i) ≥1−px,i, (8)

for all , where are such that

 Ns−1∑i=0px,i≤ϵx. (9)

Similarly, we replace the second inclusion in (4) with the chance constraint

 Pr(uk∈U)≥1−ϵu, (10)

and along with (6), we impose the following conditions

 Pr(α⊤u,juk≤βu,j) ≥1−pu,j, (11) Nc−1∑j=0pu,j ≤ϵu. (12)

Finally, the distribution of the initial state is assumed to be normal, according to

 x0∼N(μ0,Σ0). (13)

In this work we aim to design a control sequence that solves the following infinite horizon optimal control problem with state and control expectation-dependent quadratic cost.

###### Remark 1.

As discussed in [33, 19], SMPC with input hard constraints is not possible if the disturbance is unbounded and the system is not Schur stable [21] nor Lyapunov stable [34]. Thus, we use input chance constraints here, as indicated in (14d).

### 2.2 SMPC Formulation

The SMPC aims to solve the infinite-horizon optimal control problem (14) by solving, at each time step , the following finite horizon optimal control problem.

The variables and are the mean and the covariance of the state , and assumed to be given. The notation denotes the state at time step predicted at time step .

We denote the optimal solution to (15) as . At time step , we apply to the system (1), i.e., . Then, at time step , we solve the finite horizon optimal control problem (15) again, with the new initial condition

 xk+1|k+1=xk+1=Axk+Bu∗k|k+Dwk, (16)

which leads to a receding horizon control strategy that solves the original infinite horizon optimal control problem (14). The function is a terminal cost that needs to be designed properly to ensure stability [4]. In this work, we show that optimal covariance steering theory helps us choose an appropriate expression for and solves Problem (15) efficiently and robustly.

## 3 Mathematical Preliminaries of Optimal Covariance Steering

In this section, we introduce the optimal covariance steering controller under state and control chance constraints, which will be applied to solve the SMPC problem in the following sections.

### 3.1 Optimal Covariance Steering Problem

In the discrete-time optimal covariance steering problem setup [25], we steer the state distribution of system (1) from an initial distribution (13) to a prescribed Gaussian distribution at a given time step . Specifically, given an initial distribution (13), we solve the following problem.

Henceforth, we make the following assumptions.

###### Assumption 1.

The pair in (1) is controllable.

###### Assumption 2.

All control channels are corrupted by noise, that is, .

###### Assumption 3.

The horizon .

Note that from Assumption 1 and Assumption 2 it follows that the pair is also controllable. Furthermore, by choosing , along with Assumption 1,we ensure that is reachable from for any , provided that for with no state and control constraints. This assumption implies that, given any and , there exists a sequence of control inputs that steers to in the absence of disturbances or any constraints.

In order to proceed, we first rewrite the system dynamics in a more convenient form. Following [32], it is straightforward to obtain the following equivalent form of the system dynamics (17b),

 X=Ax0+BU+DW, (18)

where

 X=⎡⎢ ⎢ ⎢ ⎢⎣x0x1⋮xN⎤⎥ ⎥ ⎥ ⎥⎦∈R(N+1)nx,U=⎡⎢ ⎢ ⎢ ⎢⎣u0u1⋮uN−1⎤⎥ ⎥ ⎥ ⎥⎦∈RNnu,W=⎡⎢ ⎢ ⎢ ⎢⎣w0w1⋮wN−1⎤⎥ ⎥ ⎥ ⎥⎦∈RNnw, (19)

where the matrices , , and are defined accordingly. Note that

 E[x0x⊤0] =Σ0+μ0μ⊤0, (20a) E[x0W⊤] =0, (20b) E[WW⊤] =INnw (20c)

The following lemma provides an equivalent form of Problem (17).

###### Lemma 1.

Given (13), one can derive the following equivalent form of Problem (17) using (18), (19), and (20).

###### Proof.

The proof is straightforward from the discussion in [25]. Note also that, because and , it follows that and . ∎

The following theorem shows that Problem (21) can be relaxed to a convex programming problem.

###### Theorem 1.

Given (13), (18), (19), (20), and the relaxation of (21f)

 Σf⪰EN(E[XX⊤]−E[X]E[X]⊤)E⊤N, (22)

along with the control law

 uk =vk+Kkyk, (23)

where , , and from

 yk+1 =Ayk+Dwk, (24a) y0 =x0−μ0, (24b)

reformulates Problem (21) as the following convex programming problem.

###### Proof.

All the steps to convert Problem (21) to Problem (25) have already been discussed in our previous works [25, 26], except for the conversion from (21d) to (25c). Thus, we only need to outline the step of converting (21d) to (25c).

Using the control law (23), the control sequence in (19) is represented as

 U=V+KY, (26)

where . It follows from (24) that

 Y=Ay0+DW, (27)

and thus, using the facts that , , and , one obtains

 E[Y]=0,E[YY⊤]=Σy. (28)

Therefore,

 E[U]=V,E[~U~U⊤]=KΣyK⊤, (29)

where . The inequality (21d) can be rewritten as

 Pr(α⊤u,jFk(V+KY)≤βu,j)≥1−pu,j. (30)

Notice that is a Gaussian distributed random scalar with mean and variance . Thus, inequality (30) becomes

 Pr(α⊤u,jFk(V+KY)≤βu,j) =1√2πα⊤u,jFkKΣyK⊤F⊤kαu,j∫βu,j−∞exp(−(ξ−α⊤u,jFkV)22α⊤u,jFkKΣyK⊤F⊤kαu,j)dξ, =Φ⎛⎜ ⎜⎝βu,j−α⊤u,jFkV√α⊤u,jFkKΣyK⊤F⊤kαu,j⎞⎟ ⎟⎠≥1−pu,j. (31)

Using the inverse function of , we obtain

 α⊤u,jFkV−βu,j+√α⊤u,jFkKΣyK⊤F⊤kαu,jΦ−1(1−pu,j)≤0, (32)

which can be readily converted to (25c). ∎

As Problem (25) is convex, one can efficiently solve the problem using a convex programming solver.

## 4 CS-SMPC Design

In the previous section, we introduced the optimal covariance steering controller. We are now ready to discuss the main result of this paper, namely the CS-SMPC algorithm, followed by a proof of recursive feasibility and guaranteed stability.

### 4.1 CS-SMPC Formulation

In this section, we solve Problem (14) by solving Problem (15) in a receding horizon manner. Specifically, at time step , as shown in Fig. 4, we wish to solve the following finite horizon stochastic optimal control problem.

Problem (33) results from Problem (15) by setting

 Jf(x) =E[x]⊤PmeanE[x], (34a)

along with the state terminal constraints (33e) and (33f). Adding terminal constraints is a common methodology to ensure recursive feasibility and stability for MPC [4]. In this section, we show that, by properly designing the terminal parameters of Problem (33), i.e., , , and , we can achieve recursive feasibility and guaranteed stability.

We start from the following theorem that converts Problem (33) to a more convenient form to solve.

###### Theorem 2.

Given , , and , and using the following control law

 ut|k =vt|k+Kt|kyt|k, (35a) where vt|k∈Rnu, Kt|k∈Rnu×nx, and yt∈Rnx from yt+1|k =Ayt|k+Dwt, (35b) yk|k =xk|k−μk|k, (35c)

for , Problem (33) can be cast as a convex programming problem as follows.

###### Proof.

The proof follows directly from the discussion in Section 3. ∎

As discussed in Section 3, Problem (36) can be efficiently solved using a convex programming solver. The remaining issue is the design of , , and . To this end, we first introduce the following results.

###### Definition 1 (Assignable Covariance [35, 36]).

The state covariance is assignable to the closed-loop system

 xk+1=(A+B~K)xk+Dwk, (37)

if satisfies

 Σ=(A+B~K)Σ(A+B~K)⊤+DD⊤. (38)

Since and is controllable, it follows that the pair is controllable as well. Since the pair is controllable, if is stabilizing, the matrix in (38) is positive definite. Conversely, if is pre-specified, from Lyapunov stability theory, any that satisfies (38) is stabilizing. Such and can be computed as follows.

###### Theorem 3 ([35]).

The set of assignable state covariances can be parameterized by the following set of linear matrix inequalities (LMIs)

 (I−BB+)(Σ−AΣA⊤ −DD⊤)(I−BB+)=0, (39a) Σ ≻0, (39b) Σ ⪰DD⊤, (39c)

where denotes the Moore-Penrose pseudoinverse of .

###### Theorem 4 ([35]).

If is an assignable covariance matrix, then all state-feedback gains that satisfy (38) are parametrized by

 ~K=B+((Σ−DD⊤)1/2G1[Ir00T]G⊤2S−1−A)+(Inu−B+B)Z, (40)

where is an arbitrary orthogonal matrix, , is an arbitrary matrix, and and are defined from the singular-value decompositions

 (I−BB+)(Σ−DD⊤)1/2 =LΛG⊤1, (41a) (I−BB+)AS =LΛG⊤2, (41b)

where , , and are orthogonal matrices, and with .

###### Remark 2.

It is worth noticing from [35] that, if is nonsingular and is full column rank, then the rank in (41) is computed from . In addition, , and thus the second term in (40) vanishes.

We are now ready to prove the recursive feasibility and guaranteed stability of the closed-loop system with the proposed CS-SMPC algorithm in (36). Let us denote the optimal cost of Problem (36) at time step by and the associated optimal control sequence by

 {u∗k|k,…,u∗k+N−1|k}={v∗k|k+K∗k|kyk|k,…,v∗k+N−1|k+K∗k+N−1|kyk+N−1|k}, (42)

which generates the corresponding optimal state sequence . Since we are dealing with systems having additive uncertainty, it is difficult to design a control law that ensures the mean square stability of the state [37]. Instead, and similarly to [20], we will show that the average of the stage cost value is bounded from above.

###### Theorem 5.

Suppose that satisfies (39), satisfies , where the set is a control invariant set such that, for any ,

 (A+B~K)μ∈Xμf, (43a) α⊤x,iμ+∥Σ1/2fαx,i∥Φ−1(1−px,i)−βx,i≤0, i=0,…,Ns−1 (43b) α⊤u,j~Kμ+∥Σ1/2f~K⊤αu,j∥Φ−1(1−pu,j)−βu,j≤0, j=0,…,Nc−1, (43c)

where is derived from (40), and is the solution of the following discrete-time Lyapunov equation

 (A+B~K)⊤Pmean(A+B~K)−Pmean+Q+~K⊤R~K=0. (44)

Then, the solution of Problem (36) ensures recursive feasibility and stability. Namely, the following two properties hold:

• If Problem (36) is feasible at time step , i.e., the control sequence (42) satisfies (36b), (36c), (36d), and (36e), then Problem (36) is feasible for all , where .

• The average stage cost is bounded from above. Specifically,

 limn→∞1nn−1∑t=0E[x∗⊤k+t|kQx∗k+t|k+u∗⊤k+t|kRu∗k+t|k]≤ℓmax, (45)

where .

###### Proof.

In order to simplify notation, henceforth we will rewrite the cost function in (33a) as

 JN(xk|k;uk|k,…,uk+N−1|k)=k+N−1∑t=kℓ(xt|k,ut|k)+Jf(xk+N|k), (46)

where