Stochastic Model Predictive Control:
Output-Feedback, Duality and Guaranteed Performance

Martin A. Sehr    Robert R. Bitmead

A new formulation of Stochastic Model Predictive Output Feedback Control is presented and analyzed as a translation of Stochastic Optimal Output Feedback Control into a receding horizon setting. This requires lifting the design into a framework involving propagation of the conditional state density, the information state, via the Bayesian Filter and solution of the Stochastic Dynamic Programming Equation for an optimal feedback policy, both stages of which are computationally challenging in the general, nonlinear setup. The upside is that the clearance of three bottleneck aspects of Model Predictive Control is connate to the optimality: output feedback is incorporated naturally; dual regulation and probing of the control signal is inherent; closed-loop performance relative to infinite-horizon optimal control is guaranteed. While the methods are numerically formidable, our aim is to develop an approach to Stochastic Model Predictive Control with guarantees and, from there, to seek a less onerous approximation. To this end, we discuss in particular the class of Partially Observable Markov Decision Processes, to which our results extend seamlessly, and demonstrate applicability with an example in healthcare decision making, where duality and associated optimality in the control signal are required for satisfactory closed-loop behavior.


Department of Mechanical & Aerospace Engineering, University of California, San Diego, La Jolla, CA 92093-0411, USA 

Key words:  stochastic control, predictive control, information state, performance analysis, dual optimal control.


¹ Corresponding author R. R. Bitmead. The material in this paper was not presented at any conference.

1 Introduction

MPC, in its original formulation, is a full-state feedback law. This underpins two theoretical limitations of MPC: the accommodation of output feedback, and the extension to a cogent robustness theory, since the state dimension is fixed. This paper addresses the first question. There have been a number of approaches, mostly hinging on replacement of the measured true state by a state estimate, computed via Kalman filtering [26, 33], moving-horizon estimation [5, 31], tube-based minimax estimators [20], etc. Apart from [5], these designs, often for linear systems, separate the estimator design from the control design. The control problem may be altered to accommodate the state estimation error by methods such as constraint tightening [33], chance/probabilistic constraints [25], and so forth.

In this paper, we first consider Stochastic Model Predictive Control (SMPC), formulated as a variant of Stochastic Optimal Output Feedback Control (SOOFC), without regard to computational tractability restrictions. By taking this route, we establish a formulation of SMPC which possesses three central features: accommodation of output feedback and duality/probing; examination of the probabilistic requirements of deterministic and probabilistic constraints; guaranteed performance of the SMPC controller applied to the system. Performance bounds are stated in relation to the infinite-horizon-optimally controlled closed-loop performance. We next particularize our performance results to the class of Partially Observable Markov Decision Processes (POMDPs), as is discussed explicitly in [28]. For this special class of systems, application of our results and verification of the underlying assumptions are computationally tractable, as we demonstrate using a numerical example in healthcare decision making from [29].

This paper does not seek to provide a comprehensive survey of the myriad alternative approaches proposed for Stochastic Model Predictive Control (SMPC). For that, we recommend the numerous available references such as [11, 16, 19, 21]. Rather, we present a new algorithm for SMPC based on SOOFC and prove, particularly, performance properties relative to optimality. As a by-product, we acquire a natural treatment of output feedback via the Bayesian Filter and of the associated controller duality required to balance probing for observability enhancement and regulation. The price we pay for general nonlinear systems is the suspension of disbelief in computational tractability. However, the approach delineates a target controller with assured properties. Approximating this intractable controller by a more computationally amenable variant, as opposed to identifying soluble but indirect problems without guarantees, holds the prospect of approximately retaining these benefits. Such a strategy, using a particle implementation of the Bayesian filter and scenario methods at the cost of losing duality of the control inputs, is discussed in [27]. Alternatively, as suggested in [29], one may approximate the nonlinear SMPC problem by POMDPs and apply the methods of the current paper directly, resulting in optimality and duality on the approximate POMDP system.

Comparison with Other Performance Results

Our work is related to four central papers discussing performance bounds linking the achieved cost of MPC on the infinite horizon with the cost of infinite-horizon optimal control:

  Grüne & Rantzer

[13] study the deterministic, full-state feedback situation and provide a comparison between the infinite-horizon optimal cost and the achieved infinite-horizon MPC cost. In particular, the achieved MPC cost is bounded in terms of the computed finite-horizon MPC cost.

  Hernández & Lasserre

[14] consider the stochastic case with full-state feedback and average as well as discounted costs. Their results yield a comparison between the infinite-horizon stochastic optimal cost and the achieved infinite-horizon MPC cost in terms of the unknown true optimal cost.

  Chatterjee & Lygeros

[3] also treat the stochastic case with full-state feedback and average cost function. They establish and quantify a bound on the expected long-run average MPC performance related to the terminal cost function and its associated monotonicity requirement.

  Riggs & Bitmead

[24] consider stochastic full-state feedback as an extension of [13], via a discounted infinite-horizon cost function. Similarly to [13], they establish a performance bound on the achieved infinite-horizon MPC cost in terms of the computed finite-horizon MPC cost.

  The current paper

extends [13, 24] to include output feedback stochastic MPC. Achieved performance is bounded in terms of the computed finite-horizon MPC cost. The incorporation of state estimation into the problem is the central contribution.

Each of these works relies on a sequence of assumptions concerning the well-posedness of the underlying optimization problems and specific monotonicity conditions on certain value functions which admit the establishment of stability and performance bounds.

We summarize the main contribution of this paper, Corollary 2, for stochastic MPC with state estimation. Subject to cost monotonicity Assumption 10, which is testable in terms of a known terminal policy and the terminal cost function, an upper bound is computable for the achieved infinite-horizon MPC cost in terms of the computed finite-horizon MPC cost and other parameters of the monotonicity condition. As in [3], we provide an example – here a POMDP from healthcare – in which the assumptions are verified, indicating the substance of the assumptions and the nature of the conclusion regarding closed-loop output-feedback stochastic MPC.

Organization of this Paper

The structure of the paper is as follows. Section 2 briefly formulates SOOFC, as used in Section 3 to present a new SMPC algorithm. After discussing recursive feasibility of this algorithm in Section 4, we proceed by establishing conditions for boundedness of the infinite-horizon discounted cost of the SMPC-controlled nonlinear system in Section 5. Section 6 ties the performance of SMPC to the infinite-horizon SOOFC performance. Section 7 provides a brief encapsulation and post-analysis of the set of technical assumptions in the paper. The results are particularized for POMDPs in Section 8, followed by discussion of our numerical example in Section 9. We conclude the paper in Section 10. To aid the development, all proofs are relegated to the Appendix.


Notation

and are real and non-negative real numbers, respectively. The set of non-negative integers is denoted and the set of positive integers by . We write sequences as , where ; is an infinite sequence of the same form. denotes the probability density function of random variable while denotes the conditional probability density function of random variable given jointly distributed random variable . The acronyms a.s., a.e. and i.i.d. stand for almost sure, almost everywhere and independent and identically distributed, respectively.

2 Stochastic Optimal Output-Feedback Control

We consider stochastic optimal control of nonlinear time-invariant dynamics of the form


where , denotes the state with initial value , the control input, the measurement output, the process noise and the measurement noise. We denote by


the known a-priori density of the initial state and by

the data available at time . We make the following standing assumptions on the random variables and system dynamics.

Assumption 1

The dynamics (2-2) satisfy

  1. is differentiable a.e. with full rank Jacobian .

  2. is differentiable a.e. with full rank Jacobian.

  3. and are i.i.d. sequences with known densities.

  4. are mutually independent for all .

Assumption 2

The control input at time instant is a function of the data and .

As there is no direct feedthrough from to , Assumptions 1 and 2 assure that system (2-2) is a controlled Markov process [17]. Assumption 1 further ensures that and enjoy the Ponomarev 0-property [22] and hence that and possess joint and marginal densities.

2.1 Information State & Bayesian Filter

Definition 1

The conditional density of state given data ,


is the information state of system (2-2).

For a Markov system such as (2-2), the information state is propagated via the Bayesian Filter (e.g. [4, 30]):


for and density as in (2). For linear dynamics and Gaussian noise, the recursion (5-6) yields the Kalman Filter.
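Although the general recursion (5-6) operates on densities, its finite-state specialization is compact enough to sketch. The following Python fragment performs one combined time-and-measurement update; the names `P`, `B`, and the dictionary layout are our own conventions for illustration, not the paper's notation:

```python
import numpy as np

def bayes_filter_step(pi, u, y, P, B):
    """One step of the Bayesian filter on a finite state space.

    pi : current information state (vector of state probabilities)
    u  : applied control action (indexes the transition/observation models)
    y  : new measurement (index into a finite observation alphabet)
    P  : P[u][i, j] = Prob(x' = j | x = i, u)   (transition matrices)
    B  : B[u][j, y] = Prob(y | x' = j, u)       (observation likelihoods)
    """
    predicted = pi @ P[u]              # time update (Chapman-Kolmogorov)
    unnorm = predicted * B[u][:, y]    # measurement update (Bayes' rule)
    return unnorm / unnorm.sum()       # normalize to a probability vector
```

For linear-Gaussian dynamics the same two-step structure, carried out on Gaussian densities, reduces to the Kalman Filter's predict and update equations.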

Definition 2

The recursion (5-6) defines the mapping


2.2 Cost and Constraints

Definition 3

and are expected value and probability with respect to state – with conditional density – and i.i.d. random variables .

Given the available data , we aim to select non-anticipatory (i.e. subject to Assumption 2) control inputs to minimize


where is the control horizon, the stage cost, the terminal cost and a discount factor. Drawing from the literature (e.g. [1, 17]), optimal controls in (8) must inherently be separated feedback policies. That is, control input depends on data and initial density solely through the current information state . Optimality thus requires propagating and policies , where


Cost (8) then reads


Extending stochastic optimal control problems with cost (10) to the infinite horizon (see [1, 2]) typically requires and omitting the terminal cost term , leading to


In addition to minimizing the expected value cost (10), we impose probabilistic state constraints of the form


for . That is, we enforce constraints with respect to the known distributions of the future noise variables and the conditional density of the current state , captured by the information state . Moreover, we consider input constraints of the form


When discussing infinite-horizon optimal control with cost (11), we replace the state constraints (12) by the stationary probabilistic state constraints


for and the input constraints (13) by

Definition 4

Denote by the set of all densities on . Further define to be the set of all of satisfying the probabilistic constraint (12). Define likewise for (14).

2.3 Stochastic Optimal Control

Definition 5

Given dynamics (2-2), and horizon , define the finite-horizon stochastic optimal control problem

Definition 6

Given dynamics (2-2) and , define the infinite-horizon stochastic optimal control problem

Definition 7

is feasible for if there exists a sequence of policies such that, -a.s., satisfy the constraints and is finite. Define feasibility likewise for .

In Stochastic Optimal Control, feasibility entails the existence of policies such that for any , and

Even though the state constraints (12) are probabilistic, this condition results in an equivalent almost sure constraint on the conditional state densities. The stochastic optimal feedback policies in may now be computed in principle by solving the Stochastic Dynamic Programming Equation (SDPE),


for and . The equation is solved backwards in time, from its terminal value


Solution of the SDPE is the primary source of the restrictive computational demands in Stochastic Optimal Control. The reason for this difficulty lies in the dependence of the future information state in each step of (15-16) on the current and future control inputs. While the dependence on future control inputs is limiting even in deterministic control, the computational burden is drastically worsened in the stochastic case because of the complexity of the operator in (7). On the other hand, optimality via the SDPE leads to a control law of dual nature. Dual optimal control connotes the compromise in optimal control between the control signal’s function to reveal the state and its function to regulate that state. These dual actions are typically antagonistic [9]. The duality of stochastic optimal control is a generic feature, although there exist some problems – called neutral – where the probing nature of the control evanesces, linear Gaussian control being one such case.

Notice that, while the Bayesian Filter (5-6) can be approximated to arbitrary accuracy using a Particle Filter [30], the SDPE cannot be easily simplified without loss of optimal probing in the control inputs. While control laws generated without solution of the SDPE can be modified artificially to include certain excitation properties, as discussed for instance in [10, 18], such approaches are suboptimal and do not generally enjoy the theoretical guarantees discussed below. For the stochastic optimal control problems considered here, excitation of the control signal is incorporated automatically and as necessary through the optimization. The optimal control policies, , will inherently inject excitation into the control signal depending on the quality of state knowledge embodied in .

3 Stochastic Model Predictive Control

0:   and
1:  Offline: Solve for via (15-16)
2:  Online:
3:  for  do
4:     Measure
5:     Compute
6:     Apply first optimal control policy,
7:     Compute
8:  end for
(Dual Optimal) SMPC

Notice how this algorithm differs from common practice in SMPC [15, 21] in that we explicitly use the information states . Throughout the literature, these information states – conditional densities – are replaced by best available, or certainty-equivalent, state estimates in . While this makes the problem more tractable, one no longer solves the underlying stochastic optimal control problem. As we shall demonstrate in this paper, using the information state and optimal policy resulting from solution of Problem at each time instant leads to a number of results regarding closed-loop performance on the infinite horizon.
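The online portion of the algorithm can be sketched as a closed loop on a toy two-state example. All matrices below are illustrative assumptions, and `mpc_policy` is a placeholder threshold rule standing in for the first dual-optimal policy computed offline from the SDPE (15-16):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-state model (not from the paper)
P = {0: np.array([[0.9, 0.1], [0.3, 0.7]]),   # transitions under action 0
     1: np.array([[0.5, 0.5], [0.1, 0.9]])}   # transitions under action 1
B = np.array([[0.8, 0.2],                      # B[x, y] = Prob(y | x)
              [0.25, 0.75]])

def filter_step(pi, u, y):
    """Bayesian filter update of the information state."""
    pred = pi @ P[u]
    unnorm = pred * B[:, y]
    return unnorm / unnorm.sum()

def mpc_policy(pi):
    # Placeholder for the first optimal SDPE policy applied to pi
    return 0 if pi[0] >= 0.5 else 1

pi = np.array([0.5, 0.5])        # a-priori density of the initial state
x = rng.choice(2, p=pi)          # hidden true state
for k in range(20):
    u = mpc_policy(pi)                   # apply first optimal control policy
    x = rng.choice(2, p=P[u][x])         # true (hidden) state transition
    y = rng.choice(2, p=B[x])            # noisy measurement
    pi = filter_step(pi, u, y)           # propagate the information state
```

The controller never sees `x` directly: every decision is a function of the propagated information state `pi` alone, which is the separated-policy structure the algorithm requires.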

4 Recursive Feasibility

Assumption 3

yields feasible for , -a.s.

Assumption 4

The constraints in and , for , satisfy

Assumption 5

For all densities , there exists a policy satisfying

Theorem 1

Given Assumptions 3-5, SMPC yields feasible for , -a.s., for all .

The proof of this result follows directly as a stochastic version of the corresponding result in deterministic MPC, e.g. [12]. Notice that recursive feasibility and compact immediately imply a stability result independent of the cost (10), i.e.


for .

5 Convergence and Stability

Assumption 6

For a given , the terminal feedback policy specified in Assumption 5 satisfies


for all densities of with . The expectation is with respect to state – with density – and .

For , Assumption 6 can be interpreted as the existence of a stochastic Lyapunov function on the terminal set of densities, . If (18) holds for , it naturally holds for all .

Theorem 2

Given Assumptions 3-6, SMPC yields


While the discount factor may not seem to play a major role in this result, notice that small values of may be required to satisfy Assumption 6. For , (19) implies almost sure convergence to 0 of the achieved stage cost.

Assumption 7

State is detectable via the stage cost:

Theorem 3

Given Assumptions 3-7, SMPC with yields



While (20) holds only for , notice that SMPC for with recursive feasibility possesses the default stability property (17). For zero terminal cost , Assumption 8 replaces Assumption 6 to guarantee (19), a finite discounted infinite-horizon SMPC cost.

Assumption 8

The terminal feedback policy specified in Assumption 5 satisfies

for all densities of with .

Corollary 1

Given Assumptions 3-5 and 8, SMPC with zero terminal cost yields

Moreover, if and Assumption 7 is added, we have

6 Infinite-Horizon Performance Bounds

In the following, we establish performance bounds for SMPC, implemented on the infinite horizon as a proxy to solving the infinite-horizon stochastic optimal control problem . These bounds are in the spirit of previously established bounds reported for deterministic MPC in [13] and the stochastic full state-feedback case in [24].

Assumption 9

There exist and such that


for all densities of which are feasible in .

Definition 8

Denote by the SMPC implementation of policy on the infinite horizon, i.e.

Similarly, and are the optimal sequences of policies in Problems and , respectively.

Theorem 4

Given Assumptions 3-5 and 9, SMPC with yields


In the special case , we impose the following assumption on the terminal cost to obtain an insightful corollary to Theorem 4.

Assumption 10

For , there exists such that the terminal policy specified in Assumption 5 satisfies

for all densities of with . The expectation is with respect to state – with density – and .

Corollary 2

Given Assumptions 3-5 and 10, SMPC with yields

This Corollary relates the following quantities: design cost, which is known as part of the SMPC calculation, optimal cost which is unknown (otherwise we would use ), and unknown infinite-horizon SMPC achieved cost .

7 Analysis of Assumptions

The sequence of assumptions becomes more inscrutable as our study progresses. However, they deviate only slightly from standard assumptions in MPC, suitably tweaked for stochastic applications. Assumptions 1 and 2 are regularity conditions permitting the development of the Bayesian filter via densities and restricting the controls to causal policies. Assumptions 3 and 4 limit the constraint sets and initial state density to admit treatment of recursive feasibility.

Assumptions 5, 6, 8 and 10 each concerns a putative terminal control policy, . Assumption 5 implies positive invariance of the terminal constraint set under . Using the martingale analysis of the proof of Theorem 3, Assumption 6 ensures that the extant achieves finite cost-to-go on the terminal set. The cost-detectability Assumption 7 is familiar in Optimal Control to make the implication that finite cost forces state convergence. Assumption 8 temporarily replaces Assumption 6 only to consider the zero terminal cost case. Assumptions 9 and 10 presume monotonicity of the finite-horizon cost with increasing horizon, firstly for the optimal policy and then for the putative terminal policy, on the terminal set. These monotonicity assumptions mirror those of, for example, [13] for deterministic MPC and [24] for full-state stochastic MPC. They underpin the deterministic Lyapunov analysis and the stochastic Martingale analysis based on the cost-to-go. These assumptions are validated for a POMDP example in Section 9.

8 Dual Optimal Stochastic MPC for POMDPs

We now proceed by particularizing the performance results from Section 6 for the special class of POMDPs, as suggested for instance in [28, 29, 32]. This class of problems is characterized by probabilistic dynamics on a finite state space , finite action space , and finite observation space . POMDP dynamics are defined by the conditional state transition and observation probabilities


where , , , . The state transition dynamics (23) correspond to a conventional Markov Decision Process (MDP, e.g. [23]). However, the control actions are to be chosen based on the known initial state distribution and the sequences of observations, , and controls , respectively. That is, we are choosing our control actions in a Hidden Markov Model (HMM, e.g. [8]) setup. Notice that POMDPs conventionally do not include an initial observation in (24), as is assumed in nonlinear system models of the form (2-2); one can easily modify this basic setup without altering the following discussion.

Given control action and measured output , the information state in a POMDP is updated via

where denotes the entry of the row vector . To specify the cost functionals (10) and (11) in the POMDP setup, we write the stage cost as if and , summarized in the column vectors of the same dimension as row vectors . Similarly, the terminal cost terms are if , summarized in the column vector . The infinite horizon cost functional defined in Section 2 then follows as

with corresponding finite-horizon variant

Extending (15-16), optimal control decisions may then be computed via


for , from terminal value function

Assumption 11

For , there exist and a policy such that


for all densities of .

Theorem 5 ([28])

Given Assumption 11, SMPC for POMDPs with yields

for all densities of .

A special case of Corollary 2, this result allows us to bound the achieved infinite-horizon cost of SMPC on POMDPs. In this special case, we can compute the dual optimal control policies and verify Assumption 11 numerically, as is demonstrated for a particular example below.
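For small POMDPs, the backward recursion (25-26) can be evaluated exactly by enumerating actions and observations from a given information state. The sketch below does so for a hypothetical two-state, two-action, two-observation POMDP; all numerical data are illustrative assumptions, not the paper's problem data:

```python
import numpy as np

# Illustrative POMDP data (not from the paper)
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),    # transition matrices P_u
     1: np.array([[0.6, 0.4], [0.05, 0.95]])}
B = {0: np.array([[0.7, 0.3], [0.3, 0.7]]),    # observation matrices B_u
     1: np.array([[0.7, 0.3], [0.3, 0.7]])}
c = {0: np.array([0.0, 1.0]),                  # stage-cost vectors c_u
     1: np.array([0.2, 0.6])}
cT = np.array([0.0, 2.0])                      # terminal-cost vector
gamma = 0.9                                    # discount factor

def update(pi, u, y):
    """Information-state update; also returns Prob(y | pi, u)."""
    unnorm = (pi @ P[u]) * B[u][:, y]
    py = unnorm.sum()
    return unnorm / py, py

def V(pi, k, N):
    """SDPE recursion on the simplex of information states."""
    if k == N:
        return float(pi @ cT)
    best = np.inf
    for u in P:
        val = float(pi @ c[u])                 # expected stage cost
        for y in (0, 1):
            post, py = update(pi, u, y)
            val += gamma * py * V(post, k + 1, N)
        best = min(best, val)                  # minimize over actions
    return best
```

The exponential branching over observation sequences is exactly the computational burden discussed for the SDPE; for short horizons and small POMDPs it remains tractable, which is what makes this class amenable to the paper's results.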

9 An Example in Healthcare Decision Making

9.1 Problem Setup

Consider a patient treated for a specific disease which can be managed but not cured. For simplicity, we assume that the patient does not die under treatment. While this transition would have to be added in practice, it results in a time-varying model, which we avoid in order to keep the following discussion compact.

The example, introduced in [29], is set up as follows. The disease encompasses three stages with severity increasing from Stage 1 through Stage 2 to Stage 3, transitions between which are governed by a controlled Markov chain, where is the transition probability matrix with values at row and column and is the observation matrix with elements . All transition and observation probability matrices below are defined similarly. Once our patient enters Stage 3, Stages 1 and 2 are inaccessible for all future times. However, Stage 3 can only be entered through Stage 2, a transition from which to Stage 1 is possible only under costly treatment. The same treatment inhibits transitions from Stage 2 to Stage 3. We have access to the patient state only through imprecise tests, which will result in one of three possible values, each of which is representative of one of the three disease stages. However, these tests are imperfect, with non-zero probability of returning an incorrect disease stage. All possible state transitions and observations are illustrated in Figure 1.

Fig. 1: Feasible state transitions and possible test results in healthcare example. Solid arrows for feasible state transitions and observations. Dashed arrows for transitions conditional on treatment and diagnosis decisions.

At each point in time, the current information state is available to make one of four possible decisions/actions:

  1. Skip next appointment slot.

  2. Schedule new appointment.

  3. Order rapid diagnostic test.

  4. Apply available treatment.

Skipping an appointment slot results in the patient progressing through the Markov chain describing the transition probabilities of the disease without medical intervention, without new information being available after the current decision epoch. Scheduling an appointment does not alter the patient transition probabilities but provides a low-quality assessment of the current disease stage, which is used to refine the next information state. The third option, ordering a rapid diagnostic test, allows for a high-quality assessment of the patient’s state, leading to a more reliable refinement of the next information state than otherwise possible when choosing the previous decision option. The results from this diagnostic test are considered available sufficiently fast so that the patient state remains unchanged under this decision. The remaining option entails medical intervention, allowing probabilistic transition from Stage 2 to Stage 1 while preventing transition from Stage 2 to Stage 3. Transition probabilities , observation probabilities , and stage cost vectors for each decision are summarized in Table 1. Additionally, we impose the terminal cost

Decision Transition Probabilities Observation Probabilities Cost
  1: Skip next appointment slot
  2: Schedule new appointment
  3: Order rapid diagnostic test
  4: Apply available treatment
Table 1: Problem data for healthcare decision making example.
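Since Table 1's numerical values are not reproduced above, the following stand-in data (entirely hypothetical) illustrate the structure described in the text: Stage 3 absorbing and reachable only from Stage 2, treatment enabling the 2→1 transition while blocking 2→3, the rapid test leaving the state unchanged, and observation quality increasing from appointment to rapid test:

```python
import numpy as np

# Hypothetical stand-ins for the Table 1 data; values are illustrative only.
# States: Stage 1, Stage 2, Stage 3 (absorbing, reachable only from Stage 2).
P_disease = np.array([[0.85, 0.15, 0.00],
                      [0.00, 0.80, 0.20],
                      [0.00, 0.00, 1.00]])     # untreated progression
P = {
    "skip":    P_disease,                      # no intervention, no measurement
    "appoint": P_disease,                      # same dynamics as skipping
    "test":    np.eye(3),                      # rapid test: state unchanged
    "treat":   np.array([[0.85, 0.15, 0.00],
                         [0.60, 0.40, 0.00],   # 2 -> 1 enabled, 2 -> 3 blocked
                         [0.00, 0.00, 1.00]]),
}
B = {
    "skip":    np.full((3, 3), 1/3),           # uninformative: no observation
    "appoint": np.array([[0.6, 0.3, 0.1],
                         [0.2, 0.6, 0.2],
                         [0.1, 0.3, 0.6]]),    # low-quality assessment
    "test":    np.array([[0.90, 0.10, 0.00],
                         [0.05, 0.90, 0.05],
                         [0.00, 0.10, 0.90]]), # high-quality assessment
    "treat":   np.full((3, 3), 1/3),           # treatment yields no observation
}
c = {
    "skip":    np.array([0.0, 1.0, 4.0]),      # stage cost rises with severity
    "appoint": np.array([0.2, 1.2, 4.2]),      # appointment adds a small cost
    "test":    np.array([0.5, 1.5, 4.5]),      # testing is more expensive
    "treat":   np.array([1.0, 2.0, 5.0]),      # treatment is most expensive
}
```

Any such model can be fed directly to the Bayesian filter and SDPE recursion of the preceding sections; only the genuine Table 1 values would reproduce the paper's simulation.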

In the solution for the optimal feedback control, the selection of a diagnostic test comes at a cost to the objective criterion and, evidently, serves to refine the information state of the system/patient. It does so without effect on the regulation of the patient other than to improve the information state. Clearly, testing to resolve the state of the patient is part of an optimal strategy in this stochastic setting; but it does take resources. A certainty-equivalent feedback control would assign treatment on the supposition that the patient’s state is precisely known. Such a controller would never order a test. The decision to apply a test in the following numerical solution is evidence of duality in receding-horizon stochastic optimal control, viz. SMPC.

9.2 Computational Results

The trade-off between the two principal decision categories – testing versus treatment, probing versus regulating, exploration versus exploitation – is precisely what is encompassed by duality, which we can include in an optimal sense by solving (25-26) and applying the resulting initial policy in receding-horizon fashion. This is demonstrated in Figure 2, which shows simulation results for SMPC with control horizon and discount factor . As anticipated, the stochastic optimal receding-horizon policy shows a structure not drastically different from the decision structure motivated above. In particular, diagnostic tests are used effectively to decide on medical intervention.

In order to apply Theorem 5 to this particular example, we choose the policy in Assumption 11 always to apply medical intervention. Using the worst-case scenario for the expectations in (27), which entails transition from Stage 1 to Stage 2 under treatment, we can satisfy Assumption 11 with . The computed cost in our simulation is . Combined with the discount factor , we thus have the upper bound

via application of Theorem 5. Denoting by the row-vector with entry in element and zeros elsewhere, the observed (finite-horizon) cost corresponding with Figure 2 is

Fig. 2: Simulation results for SMPC with horizon and discount factor . Top plot displays patient state and transitions, with optimal SMPC decisions based on current information state: appointment (pluses); diagnostic test (crosses); treatment (circles). Bottom plot shows information state evolution. Dashed vertical lines mark time instances of state transitions.

10 Conclusions

The central contribution of the paper is the presentation of an SMPC algorithm based on SOOFC. This yields a number of theoretical properties of the controlled system, some of which are simply recognized as the stochastic variants of results from deterministic full-state feedback MPC with their attendant assumptions, including for instance Theorem 1 for recursive feasibility. Theorem 2 is the main stability result in establishing the finiteness of the discounted cost of the SMPC-controlled system. Theorem 3 and Corollary 1 deal with consequent convergence of the state in special cases.

Performance guarantees of SMPC are made in comparison to performance of the infinite-horizon stochastically optimally controlled system and are presented in Theorem 4 and Corollary 2. These results extend those of [24], which pertain to full-state feedback stochastic optimal control and which therefore do not accommodate duality. Other examples of stochastic performance bounds are mostly restricted to linear systems and, while computable, do not relate to the optimal constrained control. While the formal stochastic results are traceable to deterministic predecessors, the divergence from earlier work is also notable: it centers on the use of the information state to accommodate measurements and on the exploration of control policy functionals stemming from the Stochastic Dynamic Programming Equation. The resulting output-feedback control possesses duality and optimality properties which are either artificially imposed in or absent from earlier approaches.

We have further suggested two potential strategies to ameliorate the computational intractability of the Bayesian filter and the SDPE, famous for its curse of dimensionality. Firstly, one may use a particle implementation of the Bayesian filter, which admits fast execution for small state dimensions and, at the cost of losing duality, can be combined with scenario methods. This approach is discussed in [27] as an approximation of the algorithm in this paper. Secondly, we point out that our algorithm becomes computationally tractable for the special case of POMDPs, which may be used either to approximate a nonlinear model or to model a given system in the first place. This strategy inherits the dual nature of our SMPC algorithm for general nonlinear systems.

A Proofs

A.1 Theorem 2

Denote by the discounted -cost-to-go,

where , , are the optimal feedback policies in Problem . Moreover, define as the -algebra generated by the initial state with density and the i.i.d. noise sequences and for . Then is -measurable and by non-negativity of stage and terminal cost. Then,

and, by optimality of the policies in ,

where denotes the terminal feedback policy, specified by Assumptions 5 and 6, and feasibility follows as in the proof of Theorem 1. Given that

is -measurable, we then have

By Assumption 6, this yields


Taking expectations in (A.1) further gives

where via feasibility of for . By positivity of the stage cost, this yields


Inequalities (A.1) and (A.2) with non-negativity of the stage cost show that is a non-negative -supermartingale on its filtration and thus, by Doob’s Martingale Convergence Theorem (see [7]), converges almost surely to a finite random variable,


Now define to be the discounted sample cost-to-go plus the achieved MPC cost at time ,


That is, recognizing that so that , also is a non-negative -supermartingale and converges almost surely to a finite random variable

However, by definition of and (A.3), this implies (19).   

A.2 Theorem 3

First proceed as in the proof of Theorem 2. By Doob’s Decomposition Theorem (see [6]) on (A.3), there exists a martingale and a decreasing sequence such that , where a.s. by (A.3). Using this decomposition, (A.1) yields

Taking limits as and re-invoking non-negativity of the stage cost then leads to a.s., which by the detectability condition on the stage cost (Assumption 7) verifies (20).   

A.3 Theorem 4

The optimal value function in the SDPE (15) satisfies , so that optimality of policy in Problem implies

which by Assumption 9 yields