Self-reflective model predictive control
This paper proposes a novel control scheme, named self-reflective model predictive control, which takes its own limitations in the presence of process noise and measurement errors into account. In contrast to existing output-feedback MPC and persistently exciting MPC controllers, the proposed self-reflective MPC controller not only propagates a matrix-valued state forward in time in order to predict the variance of future state estimates, but also propagates a matrix-valued adjoint state backward in time. This adjoint state is used by the controller to compute and minimize a second order approximation of its own expected loss of control performance in the presence of random process noise and inexact state estimates. The properties of the proposed controller are illustrated with a small but non-trivial case study.
Key words. Optimal Control, Model Predictive Control, Optimal Experiment Design, Dual Control
AMS subject classifications. 49K21, 49K30, 93B07, 93B52
Historically, online optimization based receding horizon controllers have been developed by separating the overall control problem into two parts, namely, a state estimation problem and a state feedback control problem [1, 49]. Nowadays, there exists a mature theoretical foundation as well as software for both moving horizon estimation (MHE) [12, 48], realizing optimized state estimation, and model predictive control (MPC) [6, 7, 14, 28], realizing optimized state feedback control.
Unfortunately, if the dynamic system is nonlinear or if state- or control constraints are present, a separation of the overall control problem into an estimation and a feedback phase can be sub-optimal. For example, if a nonlinear system is not observable at its desired operating point, it might be necessary to excite the system from time to time on purpose in order to be able to estimate the states. One way to address this problem is by using a so-called output-feedback model predictive controller, as suggested in [18, 41, 42]. During the last decade, there has been significant progress in the field of output-feedback MPC. In particular, in  a way of approaching the output-feedback MPC problem for linear systems in the presence of convex state- and control constraints has been developed. The corresponding method uses convex optimization techniques in order to design a feedback law that is affine in the disturbance sequence by solving an online optimization problem whose complexity scales quadratically with the prediction horizon. Other approaches for conservatively solving the output-feedback problem based on set propagation techniques can be found in [5, 11, 33, 36, 57].
Notice that the output-feedback model predictive control problem is related to the so-called dual control problem . Dual control tries to find an optimal control input under the assumption that learning is possible while the model is unknown. A complete overview of the historical developments related to the dual control problem is beyond the scope of this article, but there exist a number of recommendable survey articles [16, 17, 37, 55, 56], where reviews and comments about the history and importance of the dual control problem can be found. An early article about the limitations of predictive control from an adaptive control perspective can be found in . During the last decade, a number of articles have appeared, which all try in one way or another to include dual control aspects in MPC, as reviewed below. Here, one way to predict future state- and parameter estimation errors is to use an extended Kalman filter. This idea has been introduced by Hovd and Bitmead , who propose to augment standard MPC by an additional matrix-valued state that predicts the variance of future state estimates. This matrix-valued state is propagated forward in time by a Riccati differential equation, which computes the variance state of an extended Kalman filter . It can be shown that the corresponding controller excites the system in order to improve the state estimates. A related approach toward dual MPC has been proposed in [24, 25], where the MPC objective is augmented by a term that—similar to optimal experiment design [19, 46, 54]—penalizes the predicted parameter error variance. In  the same authors develop a way to predict and optimize future parameter estimation errors for single-input single-output finite impulse response systems based on non-convex quadratically constrained quadratic programming formulations.
A different way of taking predicted parameter estimation errors into account has been suggested in , where an MPC scheme is augmented by an adaptive parameter estimation algorithm. Yet another related control problem formulation is proposed in , where it is suggested to add an experiment design constraint to the MPC formulation. Lucia and Paulen  suggest combining guaranteed parameter estimation and multi-stage MPC techniques in order to reduce the uncertainty about future estimation errors. Moreover, in  and  various extensions of application-oriented experiment design are developed in order to generate excitation signals in the context of receding horizon control. Another recent article  on online experiment design suggests using a modified experiment design framework for generating “least costly” excitations, which try to avoid perturbing the nominal plant operation as much as possible. Other approaches for persistently exciting closed-loop MPC can be found in [27, 39, 51, 58].
After introducing the mathematical notation and background in Section LABEL:sec::prelim, the paper starts with Section LABEL:sec::motivating_example by outlining an example that illustrates why MPC may fail to stabilize nonlinear control systems if the MPC controller does not take the properties of the state estimator into account. Sections LABEL:sec::mpc and LABEL:sec::mhe review existing concepts from the fields of MPC, dynamic programming, and state estimation based on extended Kalman filters and MHE. The main theoretical contribution of this paper is presented in Section LABEL:sec::loss, which proposes a novel way of analyzing the loss of optimality that is associated with implementing certainty equivalent MPC controllers in the presence of process disturbances and measurement noise. This analysis is related to a recently proposed method for formulating economic optimal experiment design problems . However,  analyzes a static optimization problem, while this paper is concerned with the loss of optimality of a dynamic two-player multi-stage game, as the MPC controller re-optimizes the control input whenever new state estimates become available. A major contribution in this context is presented in Theorem LABEL:thm::expansion, which proposes a procedure to compute second order moment expansions of the expected loss of optimality of certainty equivalent MPC in the presence of random process noise as well as random measurement errors. The corresponding method scales linearly with the prediction horizon of the MPC controller despite the fact that the future state estimates are correlated in time.
In Section LABEL:sec::rhc this second order moment approximation of the expected sub-optimality of certainty equivalent MPC is used to design a novel “self-reflective” model predictive controller, which not only optimizes its nominal performance but also includes a term that approximates its own expected sub-optimality in the presence of process noise and measurement errors. This leads—similar to the dual MPC schemes reviewed above—to a controller that persistently excites the states in order to reduce the variance of future state estimates. However, the existing dual MPC schemes reviewed above are all based on additional matrix-valued hyperstates that are propagated forward in time in order to predict the accuracy of future state estimates. The proposed receding horizon controller also propagates states forward in time, but, in contrast to existing dual MPC schemes, it is based on additional matrix-valued adjoint states, which are propagated backward in time. By coupling the information of these forward and adjoint states, the control scheme obtains the ability to be “self-reflective”, in the sense that it is capable of taking its own limitations into account. The novelty and differences of this approach compared to the numerous existing articles on persistently exciting MPC are further discussed toward the end of Section LABEL:sec::rhc. Section LABEL:sec::numerics presents a numerical case study, which illustrates the properties of self-reflective MPC. Section LABEL:sec::conclusions concludes the paper.
For any matrix , the notation indicates that is positive definite. The notation is used to denote the pseudo-inverse of a matrix . If is an integrable function and a random variable with integrable probability distribution and support , the expected function value of is denoted by
Notice that the index is occasionally omitted, i.e., we simply write , if it is clear from the context that is the random variable with respect to which the expected value is computed. If the random variable has a block structure,
and , the conditional expectation over is denoted by
Notice that is a function of . In particular, this notation is such that
In addition, the notation denotes either the Euclidean norm, if the argument is a vector, or the spectral norm, if the argument is a square matrix. Moreover, we use the upper index notation
to denote the sub-matrix that consists of the first columns of a given matrix with being column vectors. This upper index notation is well-defined for all integers . Additionally, we allow the upper index “”, which should be read as “none”. For example, if we write an expression like
this means that the -th element of the function sequence depends on the first columns of if , but the function depends on none of the elements of . Similarly, the lower index notation
is used to denote the sub-matrix that consists of all columns of with index larger than or equal to . Notice that throughout this paper denotes a state sequence of a discrete-time system, whose index typically runs from to . The associated control and disturbance sequences are denoted by and and have one element less. For example,
denotes the sequences of controls and disturbances from to . Analogous to the upper index notation, we allow an “index overflow”, i.e., the lower index “” should be read as “none”.
This paper concerns the design of control laws for nonlinear discrete-time systems of the form
Here, denotes the state, the control input, and the process noise at time . The variable denotes the output measurement and the measurement error. The functions
denote the right-hand side function of the discrete-time dynamic system as well as the measurement function. Moreover, the stage- and terminal cost are denoted by
Throughout this paper, the following blanket assumption is used.
The functions , and are three times continuously differentiable with respect to all arguments.
At many places in the paper, shorthands are used to denote derivatives of the functions , , and . For the first order derivatives the following shorthands are used
Notice that this notation will only be used, if it is clear from the context at which point the derivatives of the functions , , and are evaluated. Similarly, the associated second order derivatives of these functions are denoted by
All technical derivations are based on the following assumptions about the process noise and measurement errors.
The process noise variables and the measurement errors , , are independent random variables, whose expectation values and variances and are given,
The process noise variables and the measurement errors have bounded support such that and for a given radius such that
Notice that Assumptions LABEL:ass::variances and LABEL:ass::boundedSupport together allow us to compute second order moment expansions of nonlinear functions. For example, if is a three times continuously differentiable function of with , we have 
This equation can also be written in the form
which holds independently of the choice of the point , at which the derivative is evaluated, as long as . Similarly, the equation
with implies that we have
This result also holds independently of the choice of the point as long as we ensure that . Similar expansion results hold for three times continuously differentiable vector-valued functions. For example, if is a three times continuously differentiable function, we have
for any with . We will use equations of the form (LABEL:eq::expansion1), (LABEL:eq::expansion2), and (LABEL:eq::expansion3) in this or very similar versions at many places in this paper (without always mentioning this explicitly).
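As a quick numerical sanity check of these expansions, the following sketch (a hypothetical scalar instance, not taken from the paper) compares the exact expected value of a nonlinear function under a bounded uniform distribution with its second order moment prediction; the gap shrinks rapidly as the support radius decreases.

```python
import math

def exact_mean(a):
    # E[exp(w)] for w ~ Uniform(-a, a), computed in closed form
    return math.sinh(a) / a

def second_order_prediction(a):
    # f(0) + (1/2) * f''(0) * Var(w), with f = exp and Var(w) = a**2 / 3
    return 1.0 + 0.5 * (a**2 / 3.0)

for a in (0.3, 0.1, 0.03):
    err = abs(exact_mean(a) - second_order_prediction(a))
    # here the error even shrinks like O(a**4), since the distribution
    # is symmetric and all odd moments vanish
    print(f"support radius {a}: expansion error = {err:.2e}")
```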
In practice, random variables are often modelled by Gaussian probability distributions. Unfortunately, Assumption LABEL:ass::boundedSupport is violated in this case, as Gaussian distributions do not have bounded support. It is possible to rescue (LABEL:eq::expansion1) and (LABEL:eq::expansion2) for Gaussian distributions with small variance, , if we impose more restrictive assumptions on the global properties of the function . For example, if has uniformly bounded third derivatives on , (LABEL:eq::expansion1) and (LABEL:eq::expansion2) hold. However, for general nonlinear functions the equations (LABEL:eq::expansion1) and (LABEL:eq::expansion2) are wrong. Therefore, this paper suggests modeling the random variables with probability distributions that closely approximate Gaussian distributions but have bounded support. One might argue that this is a realistic modeling assumption when dealing with nonlinear systems, as noise terms are—due to physical limitations—in practice almost always bounded anyhow, which justifies Assumption LABEL:ass::boundedSupport.
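A distribution of the kind advocated above is easy to realize in practice. The sketch below (an illustrative construction, not prescribed by the paper) draws from a Gaussian truncated to a bounded support interval by rejection sampling; for a truncation radius of three standard deviations, fewer than 0.3% of the draws are rejected, so the distribution remains close to the Gaussian while the bounded-support assumption holds.

```python
import random

random.seed(0)

def truncated_gauss(sigma, c):
    # rejection sampling from a Gaussian conditioned on the bounded
    # support [-c*sigma, c*sigma]; for c = 3 the rejection rate is < 0.3%
    while True:
        w = random.gauss(0.0, sigma)
        if abs(w) <= c * sigma:
            return w

samples = [truncated_gauss(1.0, 3.0) for _ in range(20000)]
print(max(abs(s) for s in samples))   # never exceeds the support radius
print(sum(samples) / len(samples))    # sample mean close to zero
```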
3 Motivating Example
Let us consider a nonlinear discrete-time system of the form (LABEL:eq::nonlinearSystem) with dimensions as well as , given by
and . Here, is a small time step. This is a nonlinear system as the system matrix depends on the control input . The stage cost is in this example assumed to be a tracking term of the form
It can be checked that this system is unstable in open-loop mode. Since only the second state component, , can be measured, the only way to gather information about the first component is to choose such that the first state component has an influence on the propagation of the second state component via the term in the lower left corner of the system matrix . If we choose such that for all , the first component of the state vector cannot be observed. Unfortunately, a standard tracking MPC controller, whose aim is to minimize the stage cost under the assumption that full-state measurements are available, will choose for all . Thus, the above nonlinear system cannot be stabilized if the MPC controller does not cooperate with the state estimator. For example, a naive cascade consisting of an extended Kalman filter (or MHE) and MPC leads to unstable closed-loop trajectories. The goal of this paper is to develop a receding horizon controller, which resolves this problem while maintaining economic performance.
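Since the concrete system matrices are given in the displayed equations above (which are not reproduced here), the following sketch uses a hypothetical instance with the same structure: an open-loop unstable first state, a measured second state, and a lower left entry of the system matrix that is proportional to the control input. A determinant test on the two-step observability matrix confirms that observability is lost exactly when the input is zero.

```python
h = 0.1  # hypothetical time step

def A(u):
    # hypothetical system matrix with the structure described in the text:
    # the lower left entry, proportional to u, is the only coupling from
    # the unmeasured first state into the measured second state
    return [[1.0 + h, 0.0],
            [h * u,   1.0]]

C = [0.0, 1.0]  # only the second state component is measured

def observability_det(u):
    # determinant of the two-step observability matrix [C; C*A(u)]
    a = A(u)
    CA = [C[0] * a[0][0] + C[1] * a[1][0],
          C[0] * a[0][1] + C[1] * a[1][1]]
    return C[0] * CA[1] - C[1] * CA[0]

print(observability_det(0.0))  # 0.0: first state unobservable for u = 0
print(observability_det(0.5))  # nonzero: excitation restores observability
```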
4 Model Predictive Control
This section briefly reviews the main idea of model predictive control (MPC) and its associated cost-to-go function . For this aim, we introduce for all the notation
to denote the parametric optimal control problem that is associated with the nonlinear system , stage cost and terminal cost . The function is called the cost-to-go function that is associated with the time horizon . Here, denotes the last elements of the disturbance sequence as explained in Section LABEL:sec::notation. Notice that the last cost-to-go function, , is equal to the Mayer term . The standard implementation of an MPC loop can be outlined as follows.
Wait for an estimate of the current state.
Solve Problem (LABEL:eq::mpc) for , , and .
Send the first element of the optimal control sequence to the real process.
Shift the time index and continue with Step 1.
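As a minimal illustration of the four steps above, the following sketch (a hypothetical scalar linear-quadratic instance; none of the constants come from the paper) runs a certainty equivalent MPC loop with a one-step prediction horizon, a gridded nominal problem in Step 2, and bounded random noise acting on the true process.

```python
import random

# hypothetical scalar instance: x+ = a*x + b*u + w, stage cost q*x**2 + r*u**2,
# terminal cost qT*x**2
a, b, q, r, qT = 1.2, 1.0, 1.0, 0.1, 5.0
U_GRID = [i / 100.0 - 3.0 for i in range(601)]  # control grid on [-3, 3]

def mpc_input(x_est):
    # Step 2: solve the nominal one-step problem on the grid (w = 0 assumed)
    return min(U_GRID, key=lambda u: q * x_est**2 + r * u**2
                                     + qT * (a * x_est + b * u)**2)

random.seed(1)
x = 4.0                                            # true (unstable) initial state
for t in range(50):
    x_est = x + random.uniform(-0.02, 0.02)        # Step 1: inexact state estimate
    u = mpc_input(x_est)                           # Steps 2-3: apply first input
    x = a * x + b * u + random.uniform(-0.02, 0.02)  # Step 4: true process moves on
print(abs(x))  # settles near the noise level despite a > 1
```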
Notice that MPC is a certainty equivalent controller. This means that the future control input is optimized as if there were no uncertainty, , and as if the state estimate were accurate. Comments on how to choose the terminal cost and how to ensure stability of MPC can be found in [14, 40, 49].
In the following discussion, control and/or state constraints are not analyzed explicitly, as this article focuses on how to design controllers for the case that the functions , , and are nonlinear. The considerations in the following sections can, however, be generalized for the case that control constraints are present, as explained at the end of Section LABEL:sec::rhc. State constraints are beyond the scope of this paper.
which are defined for all , all , and all . The cost-to-go function sequence satisfies a functional recursion, which is known as dynamic programming. It starts with and iterates backwards
for all . Notice that this functional recursion is useful for analyzing the properties of MPC , although numerical implementations of MPC typically solve problem (LABEL:eq::mpc) directly instead of using dynamic programming. Assumption LABEL:ass::differentiable implies that the function as well as the function are three times continuously differentiable. The following proposition introduces a regularity condition under which this property is inherited by all cost-to-go functions through the dynamic programming recursion.
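The functional recursion can be checked against brute-force enumeration on a toy problem. The sketch below (an illustrative tabular instance with integer states and controls, not from the paper) builds the cost-to-go functions backward from the Mayer term and verifies that the resulting value coincides with the minimum over all enumerated control sequences.

```python
from itertools import product

# tiny tabular instance of J_k(x) = min_u [ l(x,u) + J_{k+1}(f(x,u)) ]
N = 3
X = range(-3, 4)                             # state grid
U = (-1, 0, 1)                               # control grid

def f(x, u): return max(-3, min(3, x + u))   # clipped integer dynamics
def l(x, u): return x * x + u * u            # stage cost
def m(x):    return 2 * x * x                # terminal (Mayer) cost

J = {x: m(x) for x in X}                     # J_N equals the Mayer term
for _ in range(N):                           # iterate backwards in time
    J = {x: min(l(x, u) + J[f(x, u)] for u in U) for x in X}

def rollout(x, us):                          # cost of one open-loop sequence
    c = 0
    for u in us:
        c += l(x, u)
        x = f(x, u)
    return c + m(x)

brute = min(rollout(3, us) for us in product(U, repeat=N))
print(J[3], brute)  # dynamic programming matches brute-force enumeration
```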
Let Assumption LABEL:ass::differentiable be satisfied and let be given. If the function is locally three times continuously differentiable in a neighborhood of , then the function is locally three times continuously differentiable in all arguments. Moreover, if the minimizer of problem (LABEL:eq::dynProg) is unique and satisfies the first order necessary and second order sufficient optimality conditions,
for all , then the function is locally three times continuously differentiable in a neighborhood of .
Proof. The first statement follows trivially from the definition of , as the composition and sum of three times continuously differentiable functions remains three times continuously differentiable. The second statement is a well-known result from the field of parametric nonlinear programming , which can be established by applying the implicit function theorem to the first order optimality condition (LABEL:eq::stationarity).
Notice that Proposition LABEL:prop::diff can be applied recursively in order to show that the cost-to-go functions as well as the functions are—under the regularity assumptions from Proposition LABEL:prop::diff—for all locally three times continuously differentiable. In this context, the word “locally” means that the functions and are three times continuously differentiable for small disturbance sequences . This is sufficient, as we are in this paper interested in analyzing the influence of small disturbances, i.e., the case that Assumption LABEL:ass::boundedSupport is satisfied for a sufficiently small .
We assume that the regularity conditions from Proposition LABEL:prop::diff are satisfied recursively for all such that all cost-to-go functions are locally three times continuously differentiable for all .
In the following, we introduce a shorthand notation for the first and second order derivatives of the cost-to-go functions ,
Notice that the sequence satisfies an algebraic Riccati recursion  of the form
Here, the function
is for all given by the associated first and second order derivative of the Mayer term . The expression for the right-hand side function can be worked out explicitly by differentiating the recursion (LABEL:eq::dynProg) twice. This yields
Here, , and are shorthands for the second order derivatives of the function with respect to and ,
Recall that the shorthands , , , , , , , and denote first and second order derivatives of the functions and with respect to states and controls as defined in Section LABEL:sec::prelim. The first two arguments of the function on the left hand side of (LABEL:eq::backwardpP) indicate that all derivatives are evaluated at the point , although this dependence is not highlighted explicitly in the right-hand side of (LABEL:eq::backwardpP). Finally, a second order expansion of the cost-to-go functions is given by
for all .
If is affine in while and are quadratic forms, we have , , and , the functions are quadratic forms in , and the second order expansion (LABEL:eq::secondOrderJ) is exact. In this case, the algebraic Riccati recursion (LABEL:eq::forwardpP) for the sequence has the form
and the corresponding MPC controller is equivalent to LQR control .
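For a scalar instance of this linear-quadratic case (with hypothetical constants), the backward Riccati recursion can be iterated directly; it converges to the stationary solution of the discrete algebraic Riccati equation, and the associated LQR gain stabilizes the closed loop:

```python
# scalar instance of the backward Riccati recursion in the LQR case;
# the constants a, b, q, r are hypothetical
a, b, q, r = 1.2, 1.0, 1.0, 0.1

P = 1.0                          # terminal weight P_N
for _ in range(200):             # iterate backwards in time
    P = q + a * P * a - (a * P * b) ** 2 / (r + b * P * b)

K = a * b * P / (r + b * P * b)  # associated LQR feedback gain
print(P, K, abs(a - b * K))      # fixed point reached; |a - b*K| < 1
```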
5 State Estimation
This section briefly reviews the extended Kalman filter (EKF)  and its relation to moving horizon estimation (MHE) . Recall that denotes the measurements at time . The EKF maintains a state estimate and a variance by performing updates of the form
where denotes the point at which the nonlinear functions and are linearized. In detail, the right-hand side functions and of the EKF are given by the explicit expressions 
The most common variant of the EKF evaluates the derivatives
at the current state estimate . However, in the following, we keep our notation general, remarking that the first two arguments of the functions and indicate at which point the first order derivatives and of the functions and are computed. This notation is needed if we want to use the extended Kalman filter in order to estimate how accurate our future state estimates will be without knowing the measurements yet. In such a situation, we can compute the iterates of the EKF by evaluating the recursion (LABEL:eq::ekf2) at predicted states in place of the state estimates , which are in this case not available yet. This is possible, as the function does not depend on . In the following, we denote the true state sequence by . This sequence is defined by the recursion
Notice that this recursion depends on the disturbance sequence as well as the true state of the system at time , which are all unknown.
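The prediction and measurement-update structure of the EKF recursions can be illustrated on a scalar example. The sketch below (hypothetical dynamics and measurement function, chosen only for illustration) propagates the state estimate and its variance through the linearization and uses bounded uniform noise, in line with Assumption LABEL:ass::boundedSupport.

```python
import math
import random

random.seed(2)
h = 0.1                            # step size (hypothetical)
cw, cv = 0.05, 0.1                 # bounded noise radii
sw2, sv2 = cw**2 / 3, cv**2 / 3    # variances of the uniform noise terms

def f(x, u):  return x + h * (u - x**3)     # hypothetical nonlinear dynamics
def g(x):     return math.sin(x)            # hypothetical measurement function
def fx(x):    return 1.0 - 3.0 * h * x**2   # derivative of f w.r.t. x
def gx(x):    return math.cos(x)            # derivative of g w.r.t. x

x_true, x_est, P = 1.0, 0.0, 1.0
for t in range(200):
    u = 0.5 * math.sin(0.1 * t)             # some exciting input signal
    x_true = f(x_true, u) + random.uniform(-cw, cw)
    # prediction step: mean and variance through the linearization
    A = fx(x_est)
    x_pred = f(x_est, u)
    P = A * P * A + sw2
    # measurement update
    y = g(x_true) + random.uniform(-cv, cv)
    Cx = gx(x_pred)
    K = P * Cx / (Cx * P * Cx + sv2)
    x_est = x_pred + K * (y - g(x_pred))
    P = (1.0 - K * Cx) * P
print(abs(x_true - x_est), P)  # small estimation error, positive variance
```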
Let Assumptions LABEL:ass::differentiable, LABEL:ass::variances, and LABEL:ass::boundedSupport be satisfied, and let the sequences and be given by the extended Kalman filter recursions (LABEL:eq::ekf1) and (LABEL:eq::ekf2), where the linearization points satisfy while the sequence is given and exactly known. If the initial state estimate and the initial variance satisfy as well as
then the following statements hold on any finite time horizon, .
We have .
The expectation of the state estimate satisfies .
The variance of the state estimate satisfies
Proof. The statement of Lemma LABEL:lem::EKF is—at least in very similar versions—well known in the literature . The main idea is to use induction. Firstly, all three statements of Lemma LABEL:lem::EKF are true for . Next, under the assumption that these three statements are satisfied for a given , they can be established for : if for a given , then we also have
i.e., the first statement holds for all . The second and third statements follow from the fact that the EKF computes the first and second order moments exactly if the dynamic system and the measurement function are affine in . Thus, we can apply the Taylor expansion techniques from Section LABEL:sec::prelim to bound the contribution of higher order terms in order to show that the second and third statements also hold for . The details of this argument can be found in .
The statement of Lemma LABEL:lem::EKF relies on the assumption that the horizon is finite; without further observability assumptions, the statement is wrong in general for infinite horizons, as the EKF updates may diverge.
Lemma LABEL:lem::EKF can also be applied for the case that the state estimates and variances are computed with a moving horizon estimator (MHE) , whose arrival cost is computed by extended Kalman filter updates. This leads to a refinement of the state estimates for nonlinear systems. However, MHE with suitable arrival cost updates is, in a first order approximation, equivalent to the EKF. A proof of this statement can be found in Chapter 3 of .
6 Expected Loss of Optimality
In Section LABEL:sec::motivating_example we have discussed an example where certainty equivalent MPC leads to sub-optimal performance, as it tries to bring the system to a non-observable steady state. The goal of this section is to analyze in detail how much performance is actually lost when running a certainty equivalent MPC controller on a general nonlinear system in the presence of disturbances and inexact state estimates. Let denote the true (but unknown) value of the state at time . If we knew the future disturbance sequence in advance, we could compute the optimal input sequence as the solution of the optimal control problem
In practice, we neither know the true initial value nor can we predict the future disturbance sequence . Thus, the optimal input sequence cannot be computed. Recall that the control law which is associated with the certainty equivalent MPC controller (LABEL:eq::mpc) is given by
where the functions have been defined in (LABEL:eq::Gdef) in Section LABEL:sec::mpc. The true state of the closed loop MPC system at time , denoted by , is a function of the state estimate sequence
It can be obtained recursively as the solution of the certainty equivalent MPC recursion
In the following, we denote with
the actual performance of the MPC controller. If uncertainties are present, the performance of the MPC controller can be expected to be worse than the best possible performance, given by
We call the associated difference, given by
the “loss of optimality”. In the following, we use the notation
to denote the loss of optimality that is associated with the -th suboptimal decision. The lemma below establishes an important relation between the function and the function sequence .
Let Assumption LABEL:ass::differentiable be satisfied. We have .
Proof. Recall that the functions are defined by (LABEL:eq::Gdef). Together with (LABEL:eq::trueMPC), this definition implies
Moreover, since is, by definition, a minimizer of the function the dynamic programming equation (LABEL:eq::dynProg) yields
Subtracting (LABEL:eq::aux1) and (LABEL:eq::aux2) yields
for all . Consequently, the sum over the terms turns out to be a telescoping series, which can be simplified as follows:
This is the statement of the lemma.
An MPC controller that operates under inexact state measurements and process noise can never perform better than the utopian controller (LABEL:eq::mpcRep), which knows the true current state and all future process disturbances. Thus, the loss of optimality function must be non-negative. The following corollary summarizes this result and provides a formal proof.
Let Assumption LABEL:ass::differentiable be satisfied. We have for all measurement sequences .
Proof. Definition (LABEL:eq::StageLoss) implies that the functions are non-negative, as is a minimizer of the function . Thus Lemma LABEL:lem::sum implies that the function is non-negative, too.
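The non-negativity of the loss can also be observed numerically. In the sketch below (a hypothetical scalar two-stage instance, solved by grid search so that both controllers optimize over exactly the same input set), the utopian controller knows the true initial state and the disturbance realization, while the certainty-equivalent controller plans with an inexact estimate and zero nominal disturbances; the realized loss is then non-negative by construction.

```python
# hypothetical scalar two-stage instance: x+ = a*x + u + w,
# cost x**2 + r*u**2 per stage plus terminal x**2
a, r = 1.3, 0.1
U = [i / 100.0 - 2.0 for i in range(401)]   # control grid on [-2, 2]
w = (0.07, -0.04)                           # fixed disturbance realization
x0, x0_est = 1.0, 1.08                      # true initial state and estimate

def total_cost(x, u0, u1, w0, w1):
    c = x * x + r * u0 * u0
    x = a * x + u0 + w0
    c += x * x + r * u1 * u1
    x = a * x + u1 + w1
    return c + x * x

# utopian controller: knows x0 and the future disturbances exactly
J_opt = min(total_cost(x0, u0, u1, w[0], w[1]) for u0 in U for u1 in U)

# certainty-equivalent MPC: plans with the estimate and zero disturbances,
# but the resulting inputs act on the true system
u0_ce, _ = min(((u0, u1) for u0 in U for u1 in U),
               key=lambda p: total_cost(x0_est, p[0], p[1], 0.0, 0.0))
x1 = a * x0 + u0_ce + w[0]                  # true state after the first input
x1_est = x1 + 0.03                          # new (again inexact) estimate
u1_ce = min(U, key=lambda u: r * u * u + (a * x1_est + u) ** 2)

loss = total_cost(x0, u0_ce, u1_ce, w[0], w[1]) - J_opt
print(loss)  # non-negative: the utopian controller is never worse
```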
In the following, we focus on the derivation of a tractable second order approximation of the expected loss of optimality, given by
Here, the expected value is computed under the assumption that the state estimates are computed by the EKF (or locally equivalent MHE) that is described in Section LABEL:sec::mhe. At first sight, one might expect that the computation of such a second order expansion is rather cumbersome and computationally expensive, as the state estimates are in general correlated in time. Nevertheless, the following theorem provides a surprisingly simple recursion technique for computing a second order approximation of the expected loss of optimality, requiring operations only. It is based on the following notation.
The matrices denote the solution of the backward Riccati recursion
for , which has been introduced in Section LABEL:sec::mpc.
We use the shorthand notation