Stochastic Model Predictive Control:
Output-Feedback, Duality and Guaranteed Performance
A new formulation of Stochastic Model Predictive Output Feedback Control is presented and analyzed as a translation of Stochastic Optimal Output Feedback Control into a receding horizon setting. This requires lifting the design into a framework involving propagation of the conditional state density, the information state, via the Bayesian Filter and solution of the Stochastic Dynamic Programming Equation for an optimal feedback policy, both stages of which are computationally challenging in the general, nonlinear setup. The upside is that the clearance of three bottleneck aspects of Model Predictive Control is connate to the optimality: output feedback is incorporated naturally; dual regulation and probing of the control signal is inherent; closed-loop performance relative to infinite-horizon optimal control is guaranteed. While the methods are numerically formidable, our aim is to develop an approach to Stochastic Model Predictive Control with guarantees and, from there, to seek a less onerous approximation. To this end, we discuss in particular the class of Partially Observable Markov Decision Processes, to which our results extend seamlessly, and demonstrate applicability with an example in healthcare decision making, where duality and associated optimality in the control signal are required for satisfactory closed-loop behavior.
Department of Mechanical & Aerospace Engineering, University of California, San Diego, La Jolla, CA 92093-0411, USA
Key words: stochastic control, predictive control, information state, performance analysis, dual optimal control.
MPC, in its original formulation, is a full-state feedback law. This underpins two theoretical limitations of MPC: the accommodation of output feedback, and the extension to a cogent robustness theory, since the state dimension is fixed. This paper addresses the first question. There have been a number of approaches, mostly hinging on replacement of the measured true state by a state estimate, computed via Kalman filtering [26, 33], moving-horizon estimation [5, 31], tube-based minimax estimators , etc. Apart from , these designs, often for linear systems, separate the estimator design from the control design. The control problem may be altered to accommodate the state estimation error by methods such as constraint tightening , chance/probabilistic constraints , and so forth.
In this paper, we first consider Stochastic Model Predictive Control (SMPC), formulated as a variant of Stochastic Optimal Output Feedback Control (SOOFC), without regard to computational tractability restrictions. By taking this route, we establish a formulation of SMPC which possesses central features: accommodation of output feedback and duality/probing; examination of the probabilistic requirements of deterministic and probabilistic constraints; guaranteed performance of the SMPC controller applied to the system. Performance bounds are stated in relation to the infinite-horizon-optimally controlled closed-loop performance. We next particularize our performance results to the class of Partially Observable Markov Decision Processes (POMDPs), as is discussed explicitly in . For this special class of systems, application of our results and verification of the underlying assumptions are computationally tractable, as we demonstrate using a numerical example in healthcare decision making from .
This paper does not seek to provide a comprehensive survey of the myriad alternative approaches proposed for Stochastic Model Predictive Control (SMPC). For that, we recommend the numerous available references such as [11, 16, 19, 21]. Rather, we present a new algorithm for SMPC based on SOOFC and prove, particularly, performance properties relative to optimality. As a by-product, we acquire a natural treatment of output feedback via the Bayesian Filter and of the associated controller duality required to balance probing for observability enhancement and regulation. The price we pay for general nonlinear systems is the suspension of disbelief in computational tractability. However, the approach delineates a target controller with assured properties. Approximating this intractable controller by a more computationally amenable variant, as opposed to identifying soluble but indirect problems without guarantees, holds the prospect of approximately attracting the benefits. Such a strategy, using a particle implementation of the Bayesian filter and scenario methods at the cost of losing duality of the control inputs, is discussed in . Alternatively, as suggested in , one may approximate the nonlinear SMPC problem by POMDPs and apply the methods of the current paper directly, resulting in optimality and duality on the approximate POMDP system.
Comparison with Other Performance Results
Our work is related to four central papers discussing performance bounds linking the achieved cost of MPC on the infinite horizon with the cost of infinite-horizon optimal control:
- Grüne & Rantzer study the deterministic, full-state feedback situation and provide a comparison between the infinite-horizon optimal cost and the achieved infinite-horizon MPC cost. In particular, the achieved MPC cost is bounded in terms of the computed finite-horizon MPC cost.
- Hernández & Lasserre consider the stochastic case with full-state feedback and average as well as discounted costs. Their results yield a comparison between the infinite-horizon stochastic optimal cost and the achieved infinite-horizon MPC cost in terms of the unknown true optimal cost.
- Chatterjee & Lygeros also treat the stochastic case with full-state feedback and average cost function. They establish and quantify a bound on the expected long-run average MPC performance related to the terminal cost function and its associated monotonicity requirement.
- Riggs & Bitmead
- The current paper
Each of these works relies on a sequence of assumptions concerning the well-posedness of the underlying optimization problems and specific monotonicity conditions on certain value functions which admit the establishment of stability and performance bounds.
We summarize the main contribution of this paper, Corollary 2, for stochastic MPC with state estimation. Subject to cost monotonicity Assumption 10, which is testable in terms of a known terminal policy and the terminal cost function, an upper bound is computable for the achieved infinite-horizon MPC cost in terms of the computed finite-horizon MPC cost and other parameters of the monotonicity condition. As in , we provide an example – here a POMDP from healthcare – in which the assumptions are verified, indicating the substance of the assumptions and the nature of the conclusion regarding closed-loop output-feedback stochastic MPC.
Organization of this Paper
The structure of the paper is as follows. Section 2 briefly formulates SOOFC, as used in Section 3 to present a new SMPC algorithm. After discussing recursive feasibility of this algorithm in Section 4, we proceed by establishing conditions for boundedness of the infinite-horizon discounted cost of the SMPC-controlled nonlinear system in Section 5. Section 6 ties the performance of SMPC to the infinite-horizon SOOFC performance. Section 7 provides a brief encapsulation and post-analysis of the set of technical assumptions in the paper. The results are particularized for POMDPs in Section 8, followed by discussion of our numerical example in Section 9. We conclude the paper in Section 10. To aid the development, all proofs are relegated to the Appendix.
and are real and non-negative real numbers, respectively. The set of non-negative integers is denoted and the set of positive integers by . We write sequences as , where ; is an infinite sequence of the same form. denotes the probability density function of random variable while denotes the conditional probability density function of random variable given jointly distributed random variable . The acronyms a.s., a.e. and i.i.d. stand for almost sure, almost everywhere and independent and identically distributed, respectively.
2 Stochastic Optimal Output-Feedback Control
We consider stochastic optimal control of nonlinear time-invariant dynamics of the form
where , denotes the state with initial value , the control input, the measurement output, the process noise and the measurement noise. We denote by
the known a-priori density of the initial state and by
the data available at time . We make the following standing assumptions on the random variables and system dynamics.
The control input at time instant is a function of the data and .
As there is no direct feedthrough from to , Assumptions 1 and 2 assure that system (2-2) is a controlled Markov process . Assumption 1 further ensures that and enjoy the Ponomarev 0-property  and hence that and possess joint and marginal densities.
2.1 Information State & Bayesian Filter
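In standard form, the Bayesian filter propagates the information state, the conditional density of the current state given the available data, via a prediction step through the dynamics followed by a measurement update; the symbols below ($\pi_t$ for the information state and $\Phi$ for the filter update operator) are notational assumptions for this sketch.

```latex
% One step of the Bayesian filter (notation assumed: \pi_t is the
% conditional density of x_t given the data Y_t):
\pi_{t+1}(x_{t+1})
  = \big(\Phi(\pi_t,u_t,y_{t+1})\big)(x_{t+1})
  = \frac{p(y_{t+1}\mid x_{t+1})\displaystyle\int p(x_{t+1}\mid x_t,u_t)\,
          \pi_t(x_t)\,\mathrm{d}x_t}
         {\displaystyle\int p(y_{t+1}\mid \xi)\int p(\xi\mid x_t,u_t)\,
          \pi_t(x_t)\,\mathrm{d}x_t\,\mathrm{d}\xi}
```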
2.2 Cost and Constraints
and are expected value and probability with respect to state – with conditional density – and i.i.d. random variables .
Given the available data , we aim to select non-anticipatory (i.e. subject to Assumption 2) control inputs to minimize
where is the control horizon, the stage cost, the terminal cost and a discount factor. Drawing from the literature (e.g. [1, 17]), optimal controls in (8) must inherently be separated feedback policies. That is, control input depends on data and initial density solely through the current information state . Optimality thus requires propagating and policies , where
Cost (8) then reads
In addition to minimizing the expected value cost (10), we impose probabilistic state constraints of the form
for . That is, we enforce constraints with respect to the known distributions of the future noise variables and the conditional density of the current state , captured by the information state . Moreover, we consider input constraints of the form
for and the input constraints (13) by
2.3 Stochastic Optimal Control
is feasible for if there exists a sequence of policies such that, -a.s., satisfy the constraints and is finite. Define feasibility likewise for .
In Stochastic Optimal Control, feasibility entails the existence of policies such that for any , and
Even though the state constraints (12) are probabilistic, this condition results in an equivalent almost sure constraint on the conditional state densities. The stochastic optimal feedback policies in may now be computed in principle by solving the Stochastic Dynamic Programming Equation (SDPE),
for and . The equation is solved backwards in time, from its terminal value
Solution of the SDPE is the primary source of the restrictive computational demands in Stochastic Optimal Control. The reason for this difficulty lies in the dependence of the future information state in each step of (15-16) on the current and future control inputs. While the dependence on future control inputs is limiting even in deterministic control, the computational burden is drastically worsened in the stochastic case because of the complexity of the operator in (7). On the other hand, optimality via the SDPE leads to a control law of dual nature. Dual optimal control connotes the compromise in optimal control between the control signal’s function to reveal the state and its function to regulate that state. These dual actions are typically antagonistic . The duality of stochastic optimal control is a generic feature, although there exist some problems – called neutral – where the probing nature of the control evanesces, linear Gaussian control being one such case.
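The backward recursion just described can be written generically as follows; the notation here ($\ell$ for the stage cost, $m$ for the terminal cost, $\alpha$ for the discount factor and $\Phi$ for the Bayesian filter update) is an assumption of this sketch rather than a reproduction of the paper's own display.

```latex
% Generic form of the SDPE over information states:
V_k(\pi) \;=\; \min_{u\in\mathbb{U}}
  \Big\{ \mathbb{E}_{\pi}\big[\ell(x,u)\big]
       + \alpha\,\mathbb{E}\big[\,V_{k+1}\big(\Phi(\pi,u,y)\big)\;\big|\;\pi,u\,\big]
  \Big\},
\qquad
V_N(\pi) \;=\; \mathbb{E}_{\pi}\big[m(x)\big]
```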
Notice that, while the Bayesian Filter (5-6) can be approximated to arbitrary accuracy using a Particle Filter , the SDPE cannot be easily simplified without loss of optimal probing in the control inputs. While control laws generated without solution of the SDPE can be modified artificially to include certain excitation properties, as discussed for instance in [10, 18], such approaches are suboptimal and do not generally enjoy the theoretical guarantees discussed below. For the stochastic optimal control problems considered here, excitation of the control signal is incorporated automatically and as necessary through the optimization. The optimal control policies, , will inherently inject excitation into the control signal depending on the quality of state knowledge embodied in .
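To make the particle approximation of the Bayesian filter concrete, the following is a minimal sketch of one bootstrap-particle-filter step. The scalar model, noise levels, and function names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def particle_filter_step(particles, u, y, f, h, q_sample, likelihood, rng):
    """One bootstrap-particle-filter step approximating the Bayesian filter.

    particles : (N,) array of samples from the current information state.
    f, h      : state transition and output maps, x' = f(x, u) + w, y = h(x) + v.
    q_sample  : draws N process-noise samples; likelihood(y, x) evaluates p(y | x).
    """
    # Prediction: propagate each particle through the dynamics.
    pred = f(particles, u) + q_sample(particles.shape[0])
    # Update: weight by the measurement likelihood and normalize.
    w = likelihood(y, pred)
    w = w / w.sum()
    # Resample to combat weight degeneracy (multinomial resampling).
    idx = rng.choice(pred.shape[0], size=pred.shape[0], p=w)
    return pred[idx]

# Illustrative scalar linear-Gaussian system (all values are assumptions):
rng = np.random.default_rng(0)
f = lambda x, u: 0.9 * x + u
h = lambda x: x
q_sample = lambda n: rng.normal(0.0, 0.1, n)
lik = lambda y, x: np.exp(-0.5 * ((y - h(x)) / 0.2) ** 2)

particles = rng.normal(0.0, 1.0, 500)      # samples from the prior density
particles = particle_filter_step(particles, u=0.0, y=0.5, f=f, h=h,
                                 q_sample=q_sample, likelihood=lik, rng=rng)
estimate = particles.mean()                # conditional-mean approximation
```

With a small measurement-noise level relative to the prior spread, the posterior sample mean is pulled from the prior mean toward the measurement, as expected of the exact filter.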
3 Stochastic Model Predictive Control
Notice how this algorithm differs from common practice in SMPC [15, 21] in that we explicitly use the information states . Throughout the literature, these information states – conditional densities – are replaced by best available, or certainty-equivalent state estimates in . While this makes the problem more tractable, one no longer solves the underlying stochastic optimal control problem. As we shall demonstrate in this paper, using information state and optimal policy resulting from solution of Problem at each time instance leads to a number of results regarding closed-loop performance on the infinite horizon.
4 Recursive Feasibility
yields feasible for , -a.s.
The constraints in and , for , satisfy
For all densities , there exists a policy satisfying
5 Convergence and Stability
For a given , the terminal feedback policy specified in Assumption 5 satisfies
for all densities of with . The expectation is with respect to state – with density – and .
While the discount factor may not seem to play a major role in this result, notice that small values of may be required to satisfy Assumption 6. For , (19) implies almost sure convergence to 0 of the achieved stage cost.
State is detectable via the stage cost:
While (20) holds only for , notice that SMPC for with recursive feasibility possesses the default stability property (17). For zero terminal cost , Assumption 8 replaces Assumption 6 to guarantee (19), a finite discounted infinite-horizon SMPC cost.
The terminal feedback policy specified in Assumption 5 satisfies
for all densities of with .
6 Infinite-Horizon Performance Bounds
In the following, we establish performance bounds for SMPC, implemented on the infinite horizon as a proxy to solving the infinite-horizon stochastic optimal control problem . These bounds are in the spirit of previously established bounds reported for deterministic MPC in  and the stochastic full state-feedback case in .
There exist and such that
for all densities of which are feasible in .
Denote by the SMPC implementation of policy on the infinite horizon, i.e.
Similarly, and are the optimal sequences of policies in Problems and , respectively.
In the special case , we impose the following assumption on the terminal cost to obtain an insightful corollary to Theorem 4.
For , there exists such that the terminal policy specified in Assumption 5 satisfies
for all densities of with . The expectation is with respect to state – with density – and .
This corollary relates the following quantities: the design cost , which is known as part of the SMPC calculation; the optimal cost , which is unknown (otherwise we would use ); and the unknown infinite-horizon achieved SMPC cost .
7 Analysis of Assumptions
The sequence of assumptions becomes more inscrutable as our study progresses. However, they deviate only slightly from standard assumptions in MPC, suitably tweaked for stochastic applications. Assumptions 1 and 2 are regularity conditions permitting the development of the Bayesian filter via densities and restricting the controls to causal policies. Assumptions 3 and 4 limit the constraint sets and initial state density to admit treatment of recursive feasibility.
Assumptions 5, 6, 8 and 10 each concerns a putative terminal control policy, . Assumption 5 implies positive invariance of the terminal constraint set under . Using the martingale analysis of the proof of Theorem 3, Assumption 6 ensures that the extant achieves finite cost-to-go on the terminal set. The cost-detectability Assumption 7 is familiar in Optimal Control to make the implication that finite cost forces state convergence. Assumption 8 temporarily replaces Assumption 6 only to consider the zero terminal cost case. Assumptions 9 and 10 presume monotonicity of the finite-horizon cost with increasing horizon, firstly for the optimal policy and then for the putative terminal policy, on the terminal set. These monotonicity assumptions mirror those of, for example,  for deterministic MPC and  for full-state stochastic MPC. They underpin the deterministic Lyapunov analysis and the stochastic Martingale analysis based on the cost-to-go. These assumptions are validated for a POMDP example in Section 9.
8 Dual Optimal Stochastic MPC for POMDPs
We now proceed by particularizing the performance results from Section 6 for the special class of POMDPs, as suggested for instance in [28, 29, 32]. This class of problems is characterized by probabilistic dynamics on a finite state space , finite action space , and finite observation space . POMDP dynamics are defined by the conditional state transition and observation probabilities
where , , , . The state transition dynamics (23) correspond to a conventional Markov Decision Process (MDP, e.g. ). However, the control actions are to be chosen based on the known initial state distribution and the sequences of observations, , and controls , respectively. That is, we are choosing our control actions in a Hidden Markov Model (HMM, e.g. ) setup. Notice that, while POMDPs conventionally do not have an initial observation in (24), as is commonly assumed in nonlinear system models of the form (2-2), one can easily modify this basic setup without altering the following discussion.
Given control action and measured output , the information state in a POMDP is updated via
where denotes the entry of the row vector . To specify the cost functionals (10) and (11) in the POMDP setup, we write the stage cost as if and , summarized in the column vectors of the same dimension as row vectors . Similarly, the terminal cost terms are if , summarized in the column vector . The infinite horizon cost functional defined in Section 2 then follows as
with corresponding finite-horizon variant
for , from terminal value function
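The POMDP information-state update described above is the standard HMM filter and admits a few-line implementation. The three-state matrices below are illustrative assumptions chosen for the sketch, not values from the paper.

```python
import numpy as np

def belief_update(b, u, y, T, O):
    """HMM-filter update of the POMDP information state (row vector b).

    T[u] : transition matrix under action u, T[u][i, j] = P(x' = j | x = i, u).
    O[u] : observation matrix under action u, O[u][j, y] = P(y | x' = j, u).
    """
    pred = b @ T[u]              # time update through the Markov chain
    post = pred * O[u][:, y]     # measurement update
    return post / post.sum()     # normalize to a probability vector

# Illustrative three-state example (matrices are assumptions):
T = {0: np.array([[0.8, 0.2, 0.0],
                  [0.0, 0.7, 0.3],
                  [0.0, 0.0, 1.0]])}
O = {0: np.array([[0.9, 0.1, 0.0],
                  [0.1, 0.8, 0.1],
                  [0.0, 0.2, 0.8]])}
b = np.array([1.0, 0.0, 0.0])    # initial state known exactly
b = belief_update(b, u=0, y=1, T=T, O=O)
```

After one step with an ambiguous observation, the belief spreads over the states consistent with both the dynamics and the measurement.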
For , there exist and a policy such that
for all densities of .
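For small POMDPs and short horizons, the finite-horizon recursion over information states can be solved by direct enumeration of action and observation branches, which is what yields the dual (probing) behavior automatically. The sketch below computes the first action of such a policy; the two-state "wait vs. treat" example and all numerical values are illustrative assumptions.

```python
import numpy as np

def smpc_action(b, T, O, c, m, N, alpha):
    """First action of a finite-horizon stochastic-optimal policy on beliefs.

    Solves the dynamic programming recursion over the information state by
    enumerating action/observation branches. c[u], m are stage- and
    terminal-cost column vectors indexed by state.
    """
    def value(b, k):
        if k == N:
            return float(b @ m)                     # terminal value
        best = np.inf
        for u in T:
            pred = b @ T[u]
            cost = float(b @ c[u])                  # expected stage cost
            for y in range(O[u].shape[1]):
                p_y = float(pred @ O[u][:, y])      # probability of observing y
                if p_y > 1e-12:
                    post = pred * O[u][:, y] / p_y  # Bayesian filter update
                    cost += alpha * p_y * value(post, k + 1)
            best = min(best, cost)
        return best

    def q(u):                                       # action-value at the root
        pred = b @ T[u]
        cost = float(b @ c[u])
        for y in range(O[u].shape[1]):
            p_y = float(pred @ O[u][:, y])
            if p_y > 1e-12:
                cost += alpha * p_y * value(pred * O[u][:, y] / p_y, 1)
        return cost

    return min(T, key=q)

# Tiny illustrative POMDP (all numbers are assumptions for this sketch):
# state 0 = "healthy", state 1 = "sick"; action 0 = wait, action 1 = treat.
T = {0: np.eye(2), 1: np.array([[1.0, 0.0], [1.0, 0.0]])}
O = {0: np.full((2, 2), 0.5), 1: np.full((2, 2), 0.5)}  # uninformative tests
c = {0: np.array([0.0, 4.0]), 1: np.array([3.0, 3.0])}
m = np.array([0.0, 4.0])

act_healthy = smpc_action(np.array([1.0, 0.0]), T, O, c, m, N=2, alpha=1.0)
act_sick = smpc_action(np.array([0.0, 1.0]), T, O, c, m, N=2, alpha=1.0)
```

In this toy instance the recursion waits when the belief is concentrated on the healthy state and treats when it is concentrated on the sick state; with informative observation matrices, the same enumeration is what trades off probing against regulation.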
9 An Example in Healthcare Decision Making
9.1 Problem Setup
Consider a patient treated for a specific disease which can be managed but not cured. For simplicity, we assume that the patient does not die under treatment. While this transition would have to be added in practice, it results in a time-varying model, which we avoid in order to keep the following discussion compact.
The example, introduced in , is set up as follows. The disease encompasses three stages with severity increasing from Stage 1 through Stage 2 to Stage 3, transitions between which are governed by a controlled Markov chain, where is the transition probability matrix with values at row and column and is the observation matrix with elements . All transition and observation probability matrices below are defined similarly. Once our patient enters Stage 3, Stages 1 and 2 are inaccessible for all future times. However, Stage 3 can only be entered through Stage 2, a transition from which to Stage 1 is possible only under costly treatment. The same treatment inhibits transitions from Stage 2 to Stage 3. We have access to the patient state only through imprecise tests, which will result in one of three possible values, each of which is representative of one of the three disease stages. However, these tests are imperfect, with non-zero probability of returning an incorrect disease stage. All possible state transitions and observations are illustrated in Figure 1.
At each point in time, the current information state is available to make one of four possible decisions/actions:
Skip next appointment slot.
Schedule new appointment.
Order rapid diagnostic test.
Apply available treatment.
Skipping an appointment slot results in the patient progressing through the Markov chain describing the transition probabilities of the disease without medical intervention, without new information being available after the current decision epoch. Scheduling an appointment does not alter the patient transition probabilities but provides a low-quality assessment of the current disease stage, which is used to refine the next information state. The third option, ordering a rapid diagnostic test, allows for a high-quality assessment of the patient’s state, leading to a more reliable refinement of the next information state than otherwise possible when choosing the previous decision option. The results from this diagnostic test are considered available sufficiently fast so that the patient state remains unchanged under this decision. The remaining option entails medical intervention, allowing probabilistic transition from Stage 2 to Stage 1 while preventing transition from Stage 2 to Stage 3. Transition probabilities , observation probabilities , and stage cost vectors for each decision are summarized in Table 1. Additionally, we impose the terminal cost
| Decision | Transition Probabilities | Observation Probabilities | Cost |
| 1: Skip next appointment slot |  |  |  |
| 2: Schedule new appointment |  |  |  |
| 3: Order rapid diagnostic test |  |  |  |
| 4: Apply available treatment |  |  |  |
In the solution for the optimal feedback control, the selection of a diagnostic test comes at a cost to the objective criterion and, evidently, serves to refine the information state of the system/patient. It does so without effect on the regulation of the patient other than to improve the information state. Clearly, testing to resolve the state of the patient is part of an optimal strategy in this stochastic setting; but it does take resources. A certainty-equivalent feedback control would assign treatment on the supposition that the patient’s state is precisely known. Such a controller would never order a test. The decision to apply a test in the following numerical solution is evidence of duality in receding-horizon stochastic optimal control, viz. SMPC.
9.2 Computational Results
The trade-off between the two principal decision categories – testing versus treatment, probing versus regulating, exploration versus exploitation – is precisely what is encompassed by duality, which we can include in an optimal sense by solving (25-26) and applying the resulting initial policy in receding-horizon fashion. This is demonstrated in Figure 2, which shows simulation results for SMPC with control horizon and discount factor . As anticipated, the stochastic optimal receding-horizon policy shows a structure not drastically different from the decision structure motivated above. In particular, diagnostic tests are used effectively to decide on medical intervention.
In order to apply Theorem 5 to this particular example, we choose the policy in Assumption 11 always to apply medical intervention. Using the worst-case scenario for the expectations in (27), which entails transition from Stage 1 to Stage 2 under treatment, we can satisfy Assumption 11 with . The computed cost in our simulation is . Combined with the discount factor , we thus have the upper bound
The central contribution of the paper is the presentation of an SMPC algorithm based on SOOFC. This yields a number of theoretical properties of the controlled system, some of which are simply recognized as the stochastic variants of results from deterministic full-state feedback MPC with their attendant assumptions, including for instance Theorem 1 for recursive feasibility. Theorem 2 is the main stability result in establishing the finiteness of the discounted cost of the SMPC-controlled system. Theorem 3 and Corollary 1 deal with consequent convergence of the state in special cases.
Performance guarantees of SMPC are made in comparison to performance of the infinite-horizon stochastically optimally controlled system and are presented in Theorem 4 and Corollary 2. These results extend those of , which pertain to full-state feedback stochastic optimal control and which therefore do not accommodate duality. Other examples of stochastic performance bounds are mostly restricted to linear systems and, while computable, do not relate to the optimal constrained control. While the formal stochastic results are traceable to deterministic predecessors, the divergence from earlier work is also notable. This concentrates on the use of the information state to accommodate measurements and the exploration of control policy functionals stemming from the Stochastic Dynamic Programming Equation. The resulting output feedback control possesses duality and optimality properties which are either artificially imposed in or absent from earlier approaches.
We have further suggested two potential strategies to ameliorate the computational intractability of the Bayesian filter and the SDPE, famous for its curse of dimensionality. Firstly, one may use the particle-filter implementation of the Bayesian filter, for which fast execution has been demonstrated in many examples with small state dimension; at the cost of losing duality, this can be combined with scenario methods. This approach is discussed in  as an approximation of the algorithm in this paper. Secondly, we point out that our algorithm becomes computationally tractable for the special case of POMDPs, which may be used either to approximate a nonlinear model or to model a given system in the first place. This strategy inherits the dual nature of our SMPC algorithm for general nonlinear systems.
a.1 Theorem 2
Denote by the discounted -cost-to-go,
where , , are the optimal feedback policies in Problem . Moreover, define as the -algebra generated by the initial state with density and the i.i.d. noise sequences and for . Then is -measurable and by non-negativity of stage and terminal cost. Then,
and, by optimality of the policies in ,
is -measurable, we then have
By Assumption 6, this yields
Taking expectations in (A.1) further gives
where via feasibility of for . By positivity of the stage cost, this yields
Inequalities (A.1) and (A.2) with non-negativity of the stage cost show that is a non-negative -supermartingale on its filtration and thus, by Doob’s Martingale Convergence Theorem (see ), converges almost surely to a finite random variable,
Now define to be the discounted sample cost-to-go plus the achieved MPC cost at time ,
That is, recognizing that so that , also is a non-negative -supermartingale and converges almost surely to a finite random variable
a.2 Theorem 3
First proceed as in the proof of Theorem 2. By Doob’s Decomposition Theorem (see ) on (A.3), there exists a martingale and a decreasing sequence such that , where a.s. by (A.3). Using this decomposition, (A.1) yields