Stochastic Model Predictive Control:
Output Feedback, Duality and Guaranteed Performance
Abstract
A new formulation of Stochastic Model Predictive Output Feedback Control is presented and analyzed as a translation of Stochastic Optimal Output Feedback Control into a receding horizon setting. This requires lifting the design into a framework involving propagation of the conditional state density, the information state, via the Bayesian Filter and solution of the Stochastic Dynamic Programming Equation for an optimal feedback policy, both stages of which are computationally challenging in the general, nonlinear setup. The upside is that the clearance of three bottleneck aspects of Model Predictive Control is connate to the optimality: output feedback is incorporated naturally; dual regulation and probing of the control signal is inherent; closed-loop performance relative to infinite-horizon optimal control is guaranteed. While the methods are numerically formidable, our aim is to develop an approach to Stochastic Model Predictive Control with guarantees and, from there, to seek a less onerous approximation. To this end, we discuss in particular the class of Partially Observable Markov Decision Processes, to which our results extend seamlessly, and demonstrate applicability with an example in healthcare decision making, where duality and associated optimality in the control signal are required for satisfactory closed-loop behavior.
Department of Mechanical & Aerospace Engineering, University of California, San Diego, La Jolla, CA 92093-0411, USA
Key words: stochastic control, predictive control, information state, performance analysis, dual optimal control.
1 Introduction
MPC, in its original formulation, is a full-state feedback law. This underpins two theoretical limitations of MPC: accommodation of output feedback, and extension to include a cogent robustness theory, since the state dimension is fixed. This paper addresses the first question. There have been a number of approaches, mostly hinging on replacement of the measured true state by a state estimate, which is computed via Kalman filtering [26, 33], moving-horizon estimators [5, 31], tube-based minimax estimators [20], etc. Apart from [5], these designs, often for linear systems, separate the estimator design from the control design. The control problem may be altered to accommodate the state estimation error by methods such as constraint tightening [33], chance/probabilistic constraints [25], and so forth.
In this paper, we first consider Stochastic Model Predictive Control (SMPC), formulated as a variant of Stochastic Optimal Output Feedback Control (SOOFC), without regard to computational tractability restrictions. By taking this route, we establish a formulation of SMPC which possesses central features: accommodation of output feedback and duality/probing; examination of the probabilistic requirements of deterministic and probabilistic constraints; guaranteed performance of the SMPC controller applied to the system. Performance bounds are stated in relation to the infinite-horizon optimally controlled closed-loop performance. We next particularize our performance results to the class of Partially Observable Markov Decision Processes (POMDPs), as is discussed explicitly in [28]. For this special class of systems, application of our results and verification of the underlying assumptions are computationally tractable, as we demonstrate using a numerical example in healthcare decision making from [29].
This paper does not seek to provide a comprehensive survey of the myriad alternative approaches proposed for Stochastic Model Predictive Control (SMPC). For that, we recommend the numerous available references such as [11, 16, 19, 21]. Rather, we present a new algorithm for SMPC based on SOOFC and prove, particularly, performance properties relative to optimality. As a byproduct, we acquire a natural treatment of output feedback via the Bayesian Filter and of the associated controller duality required to balance probing for observability enhancement and regulation. The price we pay for general nonlinear systems is the suspension of disbelief in computational tractability. However, the approach delineates a target controller with assured properties. Approximating this intractable controller by a more computationally amenable variant, as opposed to identifying soluble but indirect problems without guarantees, holds the prospect of approximately attracting the benefits. Such a strategy, using a particle implementation of the Bayesian filter and scenario methods at the cost of losing duality of the control inputs, is discussed in [27]. Alternatively, as suggested in [29], one may approximate the nonlinear SMPC problem by POMDPs and apply the methods of the current paper directly, resulting in optimality and duality on the approximate POMDP system.
Comparison with Other Performance Results
Our work is related to four central papers discussing performance bounds linking the achieved cost of MPC on the infinite horizon with the cost of infinite-horizon optimal control:
Grüne & Rantzer [13] study the deterministic, full-state feedback situation and provide a comparison between the infinite-horizon optimal cost and the achieved infinite-horizon MPC cost. In particular, the achieved MPC cost is bounded in terms of the computed finite-horizon MPC cost.
Hernández-Lerma & Lasserre [14] consider the stochastic case with full-state feedback and average as well as discounted costs. Their results yield a comparison between the infinite-horizon stochastic optimal cost and the achieved infinite-horizon MPC cost in terms of the unknown true optimal cost.
Chatterjee & Lygeros [3] also treat the stochastic case with full-state feedback and average cost function. They establish and quantify a bound on the expected long-run average MPC performance related to the terminal cost function and its associated monotonicity requirement.
 Riggs & Bitmead
 The current paper
Each of these works relies on a sequence of assumptions concerning the wellposedness of the underlying optimization problems and specific monotonicity conditions on certain value functions which admit the establishment of stability and performance bounds.
We summarize the main contribution of this paper, Corollary 2, for stochastic MPC with state estimation. Subject to cost monotonicity Assumption 10, which is testable in terms of a known terminal policy and the terminal cost function, an upper bound is computable for the achieved infinite-horizon MPC cost in terms of the computed finite-horizon MPC cost and other parameters of the monotonicity condition. As in [3], we provide an example – here a POMDP from healthcare – in which the assumptions are verified, indicating the substance of the assumptions and the nature of the conclusion regarding closed-loop output-feedback stochastic MPC.
Organization of this Paper
The structure of the paper is as follows. Section 2 briefly formulates SOOFC, as used in Section 3 to present a new SMPC algorithm. After discussing recursive feasibility of this algorithm in Section 4, we proceed by establishing conditions for boundedness of the infinite-horizon discounted cost of the SMPC-controlled nonlinear system in Section 5. Section 6 ties the performance of SMPC to the infinite-horizon SOOFC performance. Section 7 provides a brief encapsulation and post-analysis of the set of technical assumptions in the paper. The results are particularized for POMDPs in Section 8, followed by discussion of our numerical example in Section 9. We conclude the paper in Section 10. To aid the development, all proofs are relegated to the Appendix.
Notation
and are real and nonnegative real numbers, respectively. The set of nonnegative integers is denoted and the set of positive integers by . We write sequences as , where ; is an infinite sequence of the same form. denotes the probability density function of random variable while denotes the conditional probability density function of random variable given jointly distributed random variable . The acronyms a.s., a.e. and i.i.d. stand for almost sure, almost everywhere and independent and identically distributed, respectively.
2 Stochastic Optimal Output-Feedback Control
We consider stochastic optimal control of nonlinear timeinvariant dynamics of the form
(1)  
(2) 
where , denotes the state with initial value , the control input, the measurement output, the process noise and the measurement noise. We denote by
(3) 
the known a priori density of the initial state and by
the data available at time . We make the following standing assumptions on the random variables and system dynamics.
Assumption 1
Assumption 2
The control input at time instant is a function of the data and .
As there is no direct feedthrough from to , Assumptions 1 and 2 assure that system (1)–(2) is a controlled Markov process [17]. Assumption 1 further ensures that and enjoy the Ponomarev 0-property [22] and hence that and possess joint and marginal densities.
2.1 Information State & Bayesian Filter
Definition 1
2.2 Cost and Constraints
Definition 3
and are expected value and probability with respect to state – with conditional density – and i.i.d. random variables .
Given the available data , we aim to select nonanticipatory (i.e. subject to Assumption 2) control inputs to minimize
(8) 
where is the control horizon, the stage cost, the terminal cost and a discount factor. Drawing from the literature (e.g. [1, 17]), optimal controls in (8) must inherently be separated feedback policies. That is, control input depends on data and initial density solely through the current information state . Optimality thus requires propagating and policies , where
(9) 
Cost (8) then reads
(10) 
Extending stochastic optimal control problems with cost (10) to the infinite horizon (see [1, 2]) typically requires and omitting the terminal cost term , leading to
(11) 
In addition to minimizing the expected value cost (10), we impose probabilistic state constraints of the form
(12) 
for . That is, we enforce constraints with respect to the known distributions of the future noise variables and the conditional density of the current state , captured by the information state . Moreover, we consider input constraints of the form
(13) 
When discussing infinitehorizon optimal control with cost (11), we replace the state constraints (12) by the stationary probabilistic state constraints
(14) 
for and the input constraints (13) by
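Because the probabilistic state constraints (12) and (14) are probabilities taken with respect to the information state and the future noise, they can be checked approximately by Monte Carlo sampling when no closed form is available. The following sketch is our illustration, not the paper's algorithm: `step_fn` stands in for the dynamics (1), `in_X` for the state constraint set indicator, and the information state is approximated by weighted particles — all hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)

def chance_constraint_satisfied(particles, weights, step_fn, in_X,
                                u_seq, p_level, n_noise=200):
    """Estimate P(x_k in X) under the current information state
    (weighted particles) and i.i.d. future noise; compare with p_level."""
    count = 0.0
    for x0, w in zip(particles, weights):
        for _ in range(n_noise):
            x = x0
            for u in u_seq:                    # propagate through dynamics (1)
                x = step_fn(x, u, rng.standard_normal())
            count += w * in_X(x) / n_noise     # weighted indicator average
    return count >= p_level
```

The estimate converges to the true constraint probability as the particle and noise sample counts grow, but any finite sample gives only an approximate certificate.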
2.3 Stochastic Optimal Control
Definition 5
Definition 6
Definition 7
is feasible for if there exists a sequence of policies such that, a.s., satisfy the constraints and is finite. Define feasibility likewise for .
In Stochastic Optimal Control, feasibility entails the existence of policies such that for any , and
Even though the state constraints (12) are probabilistic, this condition results in an equivalent almost sure constraint on the conditional state densities. The stochastic optimal feedback policies in may now be computed in principle by solving the Stochastic Dynamic Programming Equation (SDPE),
(15)  
s.t.  
for and . The equation is solved backwards in time, from its terminal value
(16) 
Solution of the SDPE is the primary source of the restrictive computational demands in Stochastic Optimal Control. The reason for this difficulty lies in the dependence of the future information state in each step of (15)–(16) on the current and future control inputs. While the dependence on future control inputs is limiting even in deterministic control, the computational burden is drastically worsened in the stochastic case because of the complexity of the operator in (7). On the other hand, optimality via the SDPE leads to a control law of dual nature. Dual optimal control connotes the compromise in optimal control between the control signal’s function to reveal the state and its function to regulate that state. These dual actions are typically antagonistic [9]. The duality of stochastic optimal control is a generic feature, although there exist some problems – called neutral – where the probing nature of the control evanesces, linear Gaussian control being one such case.
Notice that, while the Bayesian Filter (5)–(6) can be approximated to arbitrary accuracy using a Particle Filter [30], the SDPE cannot be easily simplified without loss of optimal probing in the control inputs. While control laws generated without solution of the SDPE can be modified artificially to include certain excitation properties, as discussed for instance in [10, 18], such approaches are suboptimal and do not generally enjoy the theoretical guarantees discussed below. For the stochastic optimal control problems considered here, excitation of the control signal is incorporated automatically and as necessary through the optimization. The optimal control policies, , will inherently inject excitation into the control signal depending on the quality of state knowledge embodied in .
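As a concrete instance of the particle approximation mentioned above, one predict/update/resample cycle of the bootstrap particle filter can be sketched as follows. This is our illustration under stated assumptions, not the paper's implementation: `step_fn` and `likelihood` are hypothetical stand-ins for the dynamics (1) and the measurement density implied by (2).

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, u, y, step_fn, likelihood):
    """One bootstrap particle filter cycle approximating the Bayesian Filter."""
    # Predict: push each particle through the dynamics with sampled noise.
    pred = np.array([step_fn(x, u, rng.standard_normal()) for x in particles])
    # Update: weight each predicted particle by the likelihood p(y | x').
    w = np.array([likelihood(y, x) for x in pred])
    w = w / w.sum()
    # Resample: draw a fresh, equally weighted particle set.
    idx = rng.choice(len(pred), size=len(pred), p=w)
    return pred[idx]
```

The resampled set approximates the information state; as noted above, this recursion can be made arbitrarily accurate, while no comparably simple surrogate exists for the SDPE.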
3 Stochastic Model Predictive Control
Notice how this algorithm differs from common practice in SMPC [15, 21] in that we explicitly use the information states . Throughout the literature, these information states – conditional densities – are replaced by best available, or certainty-equivalent, state estimates in . While this makes the problem more tractable, one no longer solves the underlying stochastic optimal control problem. As we shall demonstrate in this paper, using the information state and optimal policy resulting from solution of Problem at each time instant leads to a number of results regarding closed-loop performance on the infinite horizon.
4 Recursive Feasibility
Assumption 3
yields feasible for , a.s.
Assumption 4
The constraints in and , for , satisfy
Assumption 5
For all densities , there exists a policy satisfying
5 Convergence and Stability
Assumption 6
For a given , the terminal feedback policy specified in Assumption 5 satisfies
(18) 
for all densities of with . The expectation is with respect to state – with density – and .
For , Assumption 6 can be interpreted as the existence of a stochastic Lyapunov function on the terminal set of densities, . If (18) holds for , it naturally holds for all .
While the discount factor may not seem to play a major role in this result, notice that small values of may be required to satisfy Assumption 6. For , (19) implies almost sure convergence to 0 of the achieved stage cost.
Assumption 7
State is detectable via the stage cost:
While (20) holds only for , notice that SMPC for with recursive feasibility possesses the default stability property (17). For zero terminal cost , Assumption 8 replaces Assumption 6 to guarantee (19), a finite discounted infinite-horizon SMPC cost.
Assumption 8
6 InfiniteHorizon Performance Bounds
In the following, we establish performance bounds for SMPC, implemented on the infinite horizon as a proxy to solving the infinite-horizon stochastic optimal control problem . These bounds are in the spirit of previously established bounds reported for deterministic MPC in [13] and the stochastic full-state feedback case in [24].
Assumption 9
There exist and such that
(21) 
for all densities of which are feasible in .
Definition 8
Denote by the SMPC implementation of policy on the infinite horizon, i.e.
Similarly, and are the optimal sequences of policies in Problems and , respectively.
In the special case , we impose the following assumption on the terminal cost to obtain an insightful corollary to Theorem 4.
Assumption 10
For , there exists such that the terminal policy specified in Assumption 5 satisfies
for all densities of with . The expectation is with respect to state – with density – and .
This corollary relates the following quantities: the design cost, which is known as part of the SMPC calculation; the optimal cost, which is unknown (otherwise we would use ); and the unknown infinite-horizon SMPC achieved cost .
7 Analysis of Assumptions
The sequence of assumptions becomes more inscrutable as our study progresses. However, they deviate only slightly from standard assumptions in MPC, suitably tweaked for stochastic applications. Assumptions 1 and 2 are regularity conditions permitting the development of the Bayesian filter via densities and restricting the controls to causal policies. Assumptions 3 and 4 limit the constraint sets and initial state density to admit treatment of recursive feasibility.
Assumptions 5, 6, 8 and 10 each concern a putative terminal control policy, . Assumption 5 implies positive invariance of the terminal constraint set under . Using the martingale analysis of the proof of Theorem 3, Assumption 6 ensures that the extant achieves finite cost-to-go on the terminal set. The cost-detectability Assumption 7 is familiar in Optimal Control to make the implication that finite cost forces state convergence. Assumption 8 temporarily replaces Assumption 6 only to consider the zero terminal cost case. Assumptions 9 and 10 presume monotonicity of the finite-horizon cost with increasing horizon, firstly for the optimal policy and then for the putative terminal policy, on the terminal set. These monotonicity assumptions mirror those of, for example, [13] for deterministic MPC and [24] for full-state stochastic MPC. They underpin the deterministic Lyapunov analysis and the stochastic martingale analysis based on the cost-to-go. These assumptions are validated for a POMDP example in Section 9.
8 Dual Optimal Stochastic MPC for POMDPs
We now proceed by particularizing the performance results from Section 6 for the special class of POMDPs, as suggested for instance in [28, 29, 32]. This class of problems is characterized by probabilistic dynamics on a finite state space , finite action space , and finite observation space . POMDP dynamics are defined by the conditional state transition and observation probabilities
(23)  
(24) 
where , , , . The state transition dynamics (23) correspond to a conventional Markov Decision Process (MDP, e.g. [23]). However, the control actions are to be chosen based on the known initial state distribution and the sequences of observations, , and controls , respectively. That is, we are choosing our control actions in a Hidden Markov Model (HMM, e.g. [8]) setup. Notice that, while POMDPs conventionally do not have an initial observation in (24), as is commonly assumed in nonlinear system models of the form (1)–(2), one can easily modify this basic setup without altering the following discussion.
Given control action and measured output , the information state in a POMDP is updated via
where denotes the entry of the row vector . To specify the cost functionals (10) and (11) in the POMDP setup, we write the stage cost as if and , summarized in the column vectors of the same dimension as row vectors . Similarly, the terminal cost terms are if , summarized in the column vector . The infinite horizon cost functional defined in Section 2 then follows as
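In code, this information-state update is a single predict/correct step on the belief row vector. The sketch below is ours; `T[u]` and `O[u]` denote the transition and observation matrices of (23)–(24), with `O[u][:, y]` the column of probabilities of observing `y` in each successor state.

```python
import numpy as np

def belief_update(alpha, u, y, T, O):
    """HMM filter / POMDP information-state update: given row-vector belief
    alpha, action u and observation y, return the normalized next belief."""
    unnorm = (alpha @ T[u]) * O[u][:, y]   # predict through T, correct by O
    return unnorm / unnorm.sum()
```

The normalizing constant `unnorm.sum()` is exactly the predictive probability of observing `y`, which is what weights each observation branch in the dynamic programming recursion below.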
with corresponding finitehorizon variant
Extending (15)–(16), optimal control decisions may then be computed via
(25) 
for , from terminal value function
(26) 
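For finite horizons the recursion (25)–(26) can be carried out exactly, since the value function is piecewise linear and concave/convex in the information state; the classical alpha-vector enumeration below sketches the backup. This is our illustrative code, with hypothetical argument names: it performs no pruning, so the vector set grows exponentially and the sketch is only suitable for very small problems.

```python
import itertools
import numpy as np

def pomdp_value_vectors(T, O, c, cN, gamma, N):
    """Enumerative alpha-vector backup for the finite-horizon POMDP
    recursion: V_k(belief) = min over Gamma_k of <belief, vec>.
    T[a], O[a]: transition/observation matrices; c[a]: stage-cost vector;
    cN: terminal-cost vector (terminal value function)."""
    n_y = next(iter(O.values())).shape[1]
    Gamma = [cN]                                   # terminal stage
    for _ in range(N):
        new = []
        for a in T:
            # For each observation y, map each next-stage vector g back to
            # the current state space: T[a] (O[a][:, y] * g).
            backs = [[T[a] @ (O[a][:, y] * g) for g in Gamma]
                     for y in range(n_y)]
            # Cross-sum over one choice of vector per observation.
            for combo in itertools.product(*backs):
                new.append(c[a] + gamma * sum(combo))
        Gamma = new
    return Gamma

def value(Gamma, belief):
    """Evaluate the piecewise-linear value function at a belief."""
    return min(float(belief @ g) for g in Gamma)
```

Point-based approximations replace the full cross-sum with backups at sampled beliefs, trading exactness for tractability while retaining the dual, probing character of the resulting policy.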
Assumption 11
For , there exist and a policy such that
(27) 
for all densities of .
9 An Example in Healthcare Decision Making
9.1 Problem Setup
Consider a patient treated for a specific disease which can be managed but not cured. For simplicity, we assume that the patient does not die under treatment. While this transition would have to be added in practice, it results in a timevarying model, which we avoid in order to keep the following discussion compact.
The example, introduced in [29], is set up as follows. The disease encompasses three stages with severity increasing from Stage 1 through Stage 2 to Stage 3, transitions between which are governed by a controlled Markov chain, where is the transition probability matrix with values at row and column and is the observation matrix with elements . All transition and observation probability matrices below are defined similarly. Once our patient enters Stage 3, Stages 1 and 2 are inaccessible for all future times. However, Stage 3 can only be entered through Stage 2, a transition from which to Stage 1 is possible only under costly treatment. The same treatment inhibits transitions from Stage 2 to Stage 3. We have access to the patient state only through imprecise tests, which will result in one of three possible values, each of which is representative of one of the three disease stages. However, these tests are imperfect, with nonzero probability of returning an incorrect disease stage. All possible state transitions and observations are illustrated in Figure 1.
At each point in time, the current information state is available to make one of four possible decisions/actions:

Skip next appointment slot.

Schedule new appointment.

Order rapid diagnostic test.

Apply available treatment.
Skipping an appointment slot results in the patient progressing through the Markov chain describing the transition probabilities of the disease without medical intervention, without new information being available after the current decision epoch. Scheduling an appointment does not alter the patient transition probabilities but provides a low-quality assessment of the current disease stage, which is used to refine the next information state. The third option, ordering a rapid diagnostic test, allows for a high-quality assessment of the patient’s state, leading to a more reliable refinement of the next information state than otherwise possible when choosing the previous decision option. The results from this diagnostic test are considered available sufficiently fast so that the patient state remains unchanged under this decision. The remaining option entails medical intervention, allowing probabilistic transition from Stage 2 to Stage 1 while preventing transition from Stage 2 to Stage 3. Transition probabilities , observation probabilities , and stage cost vectors for each decision are summarized in Table 1. Additionally, we impose the terminal cost
Decision  Transition Probabilities  Observation Probabilities  Cost 

1: Skip next appointment slot  
2: Schedule new appointment  
3: Order rapid diagnostic test  
4: Apply available treatment 
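The qualitative structure of this model can be encoded directly. The numbers below are illustrative placeholders only — the values of Table 1 are not reproduced here — chosen solely to respect the structure stated in the prose: Stage 3 is absorbing and reachable only from Stage 2, treatment enables the 2-to-1 transition while blocking 2-to-3, the rapid test leaves the state unchanged, and observation quality increases from skip through appointment to diagnostic test.

```python
import numpy as np

# Placeholder probabilities and costs (illustrative, not the paper's values).
p12, p23, q21 = 0.2, 0.3, 0.6

T_disease = np.array([[1 - p12, p12,     0.0],
                      [0.0,     1 - p23, p23],
                      [0.0,     0.0,     1.0]])   # Stage 3 absorbing
T_treat   = np.array([[1 - p12, p12,     0.0],
                      [q21,     1 - q21, 0.0],    # 2 -> 1 enabled, 2 -> 3 blocked
                      [0.0,     0.0,     1.0]])
# Actions: 0 skip, 1 appointment, 2 rapid test, 3 treatment.
T = {0: T_disease, 1: T_disease, 2: np.eye(3), 3: T_treat}

O_none = np.full((3, 3), 1 / 3)                   # skip: uninformative
O_low  = 0.6 * np.eye(3) + 0.2 * (1 - np.eye(3))  # appointment: low quality
O_high = 0.9 * np.eye(3) + 0.05 * (1 - np.eye(3)) # rapid test: high quality
O = {0: O_none, 1: O_low, 2: O_high, 3: O_none}

# Stage costs increase with disease severity and action expense (placeholders).
c = {0: np.array([0.0, 1.0, 5.0]),
     1: np.array([0.5, 1.5, 5.5]),
     2: np.array([1.0, 2.0, 6.0]),
     3: np.array([2.0, 3.0, 7.0])}
```

Any consistent set of row-stochastic matrices with this sparsity pattern reproduces the decision trade-offs discussed below.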
In the solution for the optimal feedback control, the selection of a diagnostic test comes at a cost to the objective criterion and, evidently, serves to refine the information state of the system/patient. It does so without effect on the regulation of the patient other than to improve the information state. Clearly, testing to resolve the state of the patient is part of an optimal strategy in this stochastic setting; but it does take resources. A certainty-equivalent feedback control would assign treatment on the supposition that the patient’s state is precisely known. Such a controller would never order a test. The decision to apply a test in the following numerical solution is evidence of duality in receding-horizon stochastic optimal control, viz. SMPC.
9.2 Computational Results
The trade-off between the two principal decision categories – testing versus treatment, probing versus regulating, exploration versus exploitation – is precisely what is encompassed by duality, which we can include in an optimal sense by solving (25)–(26) and applying the resulting initial policy in receding horizon fashion. This is demonstrated in Figure 2, which shows simulation results for SMPC with control horizon and discount factor . As anticipated, the stochastic optimal receding horizon policy shows a structure not drastically different from the decision structure motivated above. In particular, diagnostic tests are used effectively to decide on medical intervention.
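The receding-horizon loop itself is compact: at each time, solve the finite-horizon problem from the current information state, apply the first optimal action, and update the belief with the realized observation. The brute-force sketch below is our illustration for small POMDPs, not the paper's implementation; `T`, `O`, `c`, `cN` are hypothetical containers for the model data.

```python
import numpy as np

rng = np.random.default_rng(2)

def V(alpha, k, N, T, O, c, cN, gamma):
    """Brute-force belief-space dynamic programming; returns (cost, action)."""
    if k == N:
        return float(alpha @ cN), None
    best, best_a = np.inf, None
    n_y = next(iter(O.values())).shape[1]
    for a in T:
        cost = float(alpha @ c[a])
        for y in range(n_y):
            unnorm = (alpha @ T[a]) * O[a][:, y]
            py = unnorm.sum()                    # predictive prob. of y
            if py > 1e-12:
                cost += gamma * py * V(unnorm / py, k + 1,
                                       N, T, O, c, cN, gamma)[0]
        if cost < best:
            best, best_a = cost, a
    return best, best_a

def smpc_step(alpha, s, N, T, O, c, cN, gamma):
    """One receding-horizon step: optimize, act, sample, filter."""
    _, a = V(alpha, 0, N, T, O, c, cN, gamma)    # first optimal policy action
    s_next = rng.choice(len(alpha), p=T[a][s])   # true (hidden) state moves
    y = rng.choice(O[a].shape[1], p=O[a][s_next])
    unnorm = (alpha @ T[a]) * O[a][:, y]
    return unnorm / unnorm.sum(), s_next, a
```

Because the optimization is redone from the updated information state at every step, probing actions (tests) are selected exactly when the belief is too diffuse to justify treatment or inaction.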
In order to apply Theorem 5 to this particular example, we choose the policy in Assumption 11 always to apply medical intervention. Using the worst-case scenario for the expectations in (27), which entails transition from Stage 1 to Stage 2 under treatment, we can satisfy Assumption 11 with . The computed cost in our simulation is . Combined with the discount factor , we thus have the upper bound
via application of Theorem 5. Denoting by the row-vector with entry in element and zeros elsewhere, the observed (finite-horizon) cost corresponding with Figure 2 is
10 Conclusions
The central contribution of the paper is the presentation of an SMPC algorithm based on SOOFC. This yields a number of theoretical properties of the controlled system, some of which are simply recognized as the stochastic variants of results from deterministic full-state feedback MPC with their attendant assumptions, including for instance Theorem 1 for recursive feasibility. Theorem 2 is the main stability result in establishing the finiteness of the discounted cost of the SMPC-controlled system. Theorem 3 and Corollary 1 deal with consequent convergence of the state in special cases.
Performance guarantees of SMPC are made in comparison to performance of the infinite-horizon stochastically optimally controlled system and are presented in Theorem 4 and Corollary 2. These results extend those of [24], which pertain to full-state feedback stochastic optimal control and which therefore do not accommodate duality. Other examples of stochastic performance bounds are mostly restricted to linear systems and, while computable, do not relate to the optimal constrained control. While the formal stochastic results are traceable to deterministic predecessors, the divergence from earlier work is also notable. It concentrates on the use of the information state to accommodate measurements and the exploration of control policy functionals stemming from the Stochastic Dynamic Programming Equation. The resulting output feedback control possesses duality and optimality properties which are either artificially imposed in or absent from earlier approaches.
We have further suggested two potential strategies to ameliorate the computational intractability of the Bayesian filter and the SDPE, famous for its curse of dimensionality. Firstly, one may use the Particle Filter implementation of the Bayesian filter, which has many examples of fast execution for small state dimensions and which, at the cost of losing duality, can be combined with scenario methods. This approach is discussed in [27] as an approximation of the algorithm in this paper. Secondly, we point out that our algorithm becomes computationally tractable for the special case of POMDPs, which may be used either to approximate a nonlinear model or to model a given system in the first place. This strategy inherits the dual nature of our SMPC algorithm for general nonlinear systems.
A Proofs
a.1 Theorem 2
Denote by the discounted cost-to-go,
where , , are the optimal feedback policies in Problem . Moreover, define as the σ-algebra generated by the initial state with density and the i.i.d. noise sequences and for . Then is measurable and by nonnegativity of stage and terminal cost. Then,
and, by optimality of the policies in ,
where denotes the terminal feedback policy, specified by Assumptions 5 and 6, and feasibility follows as in the proof of Theorem 1. Given that
is measurable, we then have
By Assumption 6, this yields
(A.1) 
Taking expectations in (A.1) further gives
where via feasibility of for . By positivity of the stage cost, this yields
(A.2) 
Inequalities (A.1) and (A.2) with nonnegativity of the stage cost show that is a nonnegative supermartingale on its filtration and thus, by Doob’s Martingale Convergence Theorem (see [7]), converges almost surely to a finite random variable,
(A.3) 
Now define to be the discounted sample cost-to-go plus the achieved MPC cost at time ,
Then,
That is, recognizing that so that , also is a nonnegative supermartingale and converges almost surely to a finite random variable
a.2 Theorem 3
First proceed as in the proof of Theorem 2. By Doob’s Decomposition Theorem (see [6]) on (A.3), there exists a martingale and a decreasing sequence such that , where a.s. by (A.3). Using this decomposition, (A.1) yields
Taking limits as and reinvoking nonnegativity of the stage cost then leads to a.s., which by the detectability condition on the stage cost (Assumption 7) verifies (20).