Separated design of encoder and controller for
networked linear quadratic optimal control
For a networked control system, we consider the problem of encoder and controller design. We study a discrete-time linear plant with a finite horizon performance cost, comprising a quadratic function of the states and controls, and an additive communication cost. We study separation in design of the encoder and controller, along with related closed-loop properties such as the dual effect and certainty equivalence. We consider three basic formats for encoder outputs: quantized samples, real-valued samples at event-triggered times, and real-valued samples over additive noise channels. If the controller and encoder are dynamic, then we show that the performance cost is minimized by a separated design: the controls are updated at each time instant as per a certainty equivalence law, and the encoder is chosen to minimize an aggregate quadratic distortion of the estimation error. This separation is shown to hold even though a dual effect is present in the closed-loop system. We also show that this separated design need not be optimal when the controller or encoder is to be chosen from within restricted classes.
We consider discrete-time sequential decision problems for a control loop that has a communication bottleneck between the sensor and the controller (Figure 1). The design problem is to choose in concert an encoder and a controller. The encoder maps the sensor’s raw data into a causal sequence of channel inputs. Depending on the channel model adopted in this paper, the encoder performs either sequential quantization, sampling, or analog companding. The controller maps channel outputs into a causal sequence of control inputs to the plant. Such two-agent problems are generally hard because the information pattern is non-classical, as the controller has less information than the sensor. This gives scope for the controller to exploit any dual effect present in the loop, even when the plant is linear. These two-agent problems are at the simpler end of a range of design problems arising in networked control systems [11, 3, 21, 1]. Naturally, one seeks formulations of these design problems as stochastic optimization problems whose solutions are tractable in some suitable sense.
The classical partially observed linear quadratic Gaussian (LQG) optimal control problem is a one-agent decision problem. Given a linear, Gauss-Markov plant, one is asked for a causal controller, as a function of noisy linear measurements of the state, to minimize a quadratic cost function of states and controls. This problem has a simple and explicit solution, where the optimal controller ‘separates’ into two policies: one to generate a minimum mean-squared error estimate of the state from the noisy measurements, and the other to control the fully observed Gauss-Markov process corresponding to the estimate. A networked version of this problem is the following two-agent LQG optimal control problem. Given a linear Gauss-Markov plant and a channel model, one is asked for an encoder and controller to minimize a performance cost which is a sum of a communication cost and a quadratic cost on states and controls. The communication cost is charged on decisions at the encoder, which are chosen to satisfy constraints imposed by the channel model. No causal encoding or control policies are, in general, excluded from consideration. As in the one-agent version, a certain ‘separated’ design is optimal, as has been suggested in various settings since the sixties [27, 42, 16, 5, 33, 45, 31, 53, 35, 6, 34, 56]. Precisely, the following combination is optimal: certainty equivalence controls with a minimum mean-squared estimator of the state, and an encoder that minimizes a distortion for state estimation at the controller. The distortion is the average of a sum of squared estimation errors with time-varying coefficients depending on the coefficients of the performance cost. This separation is different from that obtained in the classical LQG problem, but it is still due to a linear evolution of the state, and the statistical independence of noises from all other current and past variables. As in the classical one-agent version [43, 41], the random variables need not be Gaussian.
1.1. Previous works
In the long history of the two-agent networked LQG problem, different channel models have been treated, leading to different types of encoders. We find in these works that the encoder is either a quantizer, an analog time-dependent compander, or an event-based sampler.
When a discrete alphabet channel is treated, the encoder is a time-dependent quantizer. Quantized control has been explored since the sixties, and structural results for this problem have seen spirited discussions over the years [27, 32, 16]. This problem was revisited by Borkar et al.  in recent years, setting off a new wave of interest. Surveys can be found in [35, 18]. For an additive noise channel, the encoder is a time-dependent, possibly non-linear, compander. The corresponding networked LQG problem has been studied in , and more recently in [17, 19]. Analog channels with channel use restrictions lead to an encoder being an event-triggered sampler . The networked LQG problem for event-triggered sampling is studied in .
The above papers suggested separated designs for the two-agent LQG problem with dynamic encoder and controller, and certainty equivalence controls. This is despite other results [13, 15] confirming the dual effect in the two-agent networked control problem. Thus, there can be an incentive for the controller to influence the estimation error, and yet the optimal controller chooses to ignore this incentive. Furthermore, for the two-agent LQG problem with event-triggered sampling, and with zero order hold control between samples, Rabi et al. showed through numerical computations that it is suboptimal to apply controls affine in the minimum mean square error (MMSE) estimate. The optimal controls are nonlinear functions of the received samples. Thus, the literature does not tell us when separation holds, and when it does not, for the general class of two-agent problems.
1.2. Our contributions
We make three main contributions. Firstly, we show that for the combination of a linear plant and nonlinear encoder, the dual effect is present. This confirms the results of Curry and others [13, 15], by establishing through a counterexample that there is a dual effect in the closed-loop system. In fact, each of the three models we allow for the channel endows the loop with the dual effect. The dual role of the controller lies in reducing the estimation error in the future, using the predicted statistics of the future state and knowledge of the encoding policy. Due to this dual role, we show that, in general, separated designs need not be optimal for linear plants with non-linear measurements, even with independent and identically distributed (IID) Gaussian noise and quadratic costs. Examples 5 and 6 show instances where the dual effect matters. Example 3 shows how the dual effect in the two-agent networked LQ problem renders useless the techniques that work for the classical, single-agent, partially observed LQ problem. These examples illustrate the insufficiency of arguments offered in [27, 42, 16, 5, 33, 45, 31, 53, 35, 6, 34, 56] for the optimality of separation and certainty equivalent controls.
Our second contribution is a proof for separation in one specific design problem. We prove that for the dynamic encoder-controller design problem, it is optimal to apply separation and certainty equivalence. A key instrument in our proof is the class of ‘controls-forgetting encoders’ (introduced in section 4.2) which we show to be optimal despite it being a strict subset of the general class of state-based encoders. We also notice that the result holds under a variety of schemes for charging communication costs. For example, it holds even when the encoder is an analog compander with hard amplitude limits. Our proof does not require the dual effect to be absent. Hence there is no contradiction with the fact that separation and certainty equivalence are not optimal for other design problems concerning the same plant-sensor combination. Our work also provides a direct insight to explain separation, or the lack of it, in the form of a property of the optimal cost-to-go function (Example 4 in Section 6). Furthermore, we show that when this property does not hold, separation is no longer optimal.
Our third contribution points out some subtleties that arise when dynamic policies are involved. We explicitly demonstrate that with dynamic encoders for LQ optimal control, one cannot extend and apply a result of Bar-Shalom and Tse which mandates the absence of a dual effect for certainty equivalence to be optimal. The classical notion of a dual effect was introduced for static measurement policies, and the dual role of the controls has been motivated through the notion of a probing incentive. We ask if the concept of probing applies unchanged for dynamic measurement policies and point out some subtleties in answering this question.
In recent years, there has been a resurgence of interest in problems related to dynamic and decentralized decision making in stochastic control. Old problems and results have been reexamined and reinterpreted to find new insights and develop new methods, such as the common information approach [30, 36]. Others, such as , have sought to reinterpret the proof techniques used in . Following in the path of , many new counterexamples have been identified that show optimality of nonlinear strategies for control problems under non-classical information patterns [29, 57]. Similarly, drawing from the many works on two-agent networked LQG problems [13, 15, 10, 35, 18], we have sought to understand why a structural simplification can be found in some dynamic decision problems, despite the non-classical information pattern and the consequent presence of a dual effect.
The remainder of the paper is organized as follows. In Section 2, we present a basic problem formulation, pertaining to encoder and controller design for data-rate limited channels. In Section 3, we discuss the notion of a dual effect and certainty equivalence, and present a counterexample to establish that there is a dual effect in the considered networked control system. In Section 4, we present a proof for separation in the two-agent networked LQG problem. In Section 5, we extend our results to other channel models, including event-triggered samples and additive noise channels. In Section 6, we present a number of examples to illustrate that in general, separation does not hold for constrained design problems, followed by the conclusions in Section 7.
2. Problem formulation
In this section, we describe a version of the two-agent networked LQG problem, corresponding to a rate-limited channel model. We consider an instantaneous, error-free, discrete-alphabet channel; the logarithm of the size of the alphabet is the bit rate. A control system that uses such a channel to communicate between its sensor and controller is depicted in Figure 1, and comprises four blocks. Each of these blocks, along with the performance cost, is described below, followed by a description of the design problems under consideration.
2.1. Plant
The plant state process {x_t} is scalar, and its evolution law is linear:
x_{t+1} = a x_t + u_t + w_t, for t = 0, 1, …, T − 1.
Here {u_t} is the controls process, and {w_t} is the plant noise process, which is a sequence of independent random variables with zero means and constant variance. The initial state x_0 has a distribution with given mean and variance. At any time t, the noise w_t is independent of all state, control, channel input, and channel output data up to and including time t. We assume that the state process is perfectly observed by the sensor.
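The closed loop described so far can be sketched in a few lines of simulation code. This is only an illustrative sketch: the coefficient a, the policy, and the noise parameters below are our own stand-in choices, not values prescribed by the paper.

```python
import random

def simulate_plant(a=0.9, x0=1.0, sigma_w=0.5, T=10,
                   policy=lambda x: 0.0, seed=0):
    """Simulate a scalar linear plant x_{t+1} = a*x_t + u_t + w_t.

    The plant noise w_t is a sequence of independent, zero-mean draws
    with constant variance sigma_w**2, matching the assumptions in the
    text; the sensor sees the state perfectly.
    """
    rng = random.Random(seed)
    x, xs, us = x0, [x0], []
    for _ in range(T):
        u = policy(x)                  # causal state feedback for this sketch
        w = rng.gauss(0.0, sigma_w)    # zero-mean plant noise
        x = a * x + u + w
        xs.append(x)
        us.append(u)
    return xs, us
```

With the noise switched off and no control, the state simply decays geometrically, which is a quick sanity check of the recursion.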
2.2. Performance cost
The performance cost is a sum of the quadratic cost charged on states and controls, and a communication cost charged on encoder decisions:
J = E[ sum_{t=0}^{T} q x_t^2 + sum_{t=0}^{T-1} r u_t^2 ] + J_C,
where q and r are suitably chosen scalar weights for the squares of the states and controls, respectively. The communication cost J_C is an average quantity that depends on the encoding and control policies, and the channel model adopted.
2.3. Channel model
The channel model refers to an input-output description of the communication link from the sensor to the controller. We denote the channel input at time by , the corresponding output by , and the encoding map generating by . In Figure 1, we consider an ideal, discrete alphabet channel that faithfully reproduces inputs, and thus, . The encoder’s job is to pick, at every time, the encoding map producing a channel output letter from the pre-assigned finite alphabet, whose size is a fixed non-negative integer. Since the alphabet is fixed, we have a hard data-rate constraint at every time. Hence there is no explicit cost attached to communication, and the communication cost is zero in this case. In Section 5, we consider other channel models that permit the data-rate or energy needed for each transmission to be chosen causally by the encoder.
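One concrete admissible encoding map under such a hard rate constraint is a uniform quantizer over a bounded interval. The interval and the cell count below are arbitrary illustrative choices; the paper leaves the quantizer shape free, subject only to the fixed alphabet size.

```python
def make_uniform_quantizer(lo, hi, n_letters):
    """Partition [lo, hi) into n_letters equal cells; return (encode, decode).

    encode() plays the role of one encoding map f_t producing a letter from
    a finite alphabet of size n_letters; the channel is ideal, so the
    controller receives the cell index unchanged.
    """
    width = (hi - lo) / n_letters
    def encode(x):
        # clamp out-of-range states into the two edge cells
        i = int((x - lo) // width)
        return max(0, min(n_letters - 1, i))
    def decode(i):
        # cell midpoint as the reproduction value at the controller
        return lo + (i + 0.5) * width
    return encode, decode
```

A time-varying encoder would simply choose a new (lo, hi, n_letters) triple, or a non-uniform cell layout, at each step as a function of past channel outputs.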
2.4. Controller
The control signal is real-valued and is to be computed by a causal policy based on the sequence of channel outputs. The controller has perfect memory, and thus remembers all of its past actions, and the causal sequence of channel outputs. Thus, in general, at every time the controller’s map takes the form:
2.5. Encoder
At all times, the encoder knows the entire set of control policies employed by the controller and the statistical parameters of the plant. With this prestored knowledge, the encoder works as a causal quantizer mapping the sequence of plant outputs. Thus, the encoder’s map takes the form:
Notice that we do not allow the encoder to directly view the sequence of inputs to the plant. This subtle point plays an important role in the examples we present in Section 6.
2.6. Design problems
For a given information pattern, different design spaces may arise due to engineering heuristics, hardware or software limitations, etc. Any such design space is a subset of the set of all admissible encoder and controller pairs. We identify four design problems, each associated with its own design space. For these design problems, an adopted channel model can be either the one described in Section 2.3, or any of the models from Section 5. First, we pose a single-agent design problem which has a classical information pattern.
Design problem 1 (Controller-only Design).
Next we pose a design problem where the design space is the largest possible non-randomized set of admissible encoder-controller pairs. We consider every causally time-dependent encoder and controller. In other words, for this type of design problem, regardless of the choices one makes for channel and communication cost, at any time, the controller can update the control signal using all of the channel outputs received up to that time.
Design problem 2 (Dynamic Encoder-Controller Design).
Next we pose a design problem where the controller and encoder must respect a restriction on selecting the control signals or encoding maps. At every time, the control values must be chosen from a restricted set , such as the interval or the finite set . Likewise, the encoding maps have to be chosen from within restricted sets. For example, the encoding maps may be constrained to consist of two quantization cells , where the encoder threshold must be chosen from a restricted set , say the interval . Subject to these constraints, the controller and encoder policies are still to be dynamically chosen.
Design problem 3 (Constrained Encoder-Controller Design).
Next we pose a design problem where the controller must respect not only the information pattern in the dynamic encoder-controller design problem (Design problem 2), but must also respect a restriction on updating controls. Basically, the control waveform is generated in a piece-wise ‘open-loop’ way, while epochs and encoding maps are picked using dynamic policies. Let , be two random integers such that . Then the two epochs are and . These epochs are chosen by the controller respecting the inequalities: and , and hence have to be adapted to all the data available at the controller. Within an epoch, the controller must pick controls depending only on data at the start of the epoch. Precisely, given the condition that , and given the initial observation , the controls must be a fixed function of regardless of the data .
Design problem 4 (Hold-Waveform-Controller and Encoder Design).
For the linear plant (1), and the adopted channel model, the hold-waveform-controller and encoder design problem is to pick a causal sequence of encoding policies in concert with a causal sequence of policies for epochs and controls to minimize the performance cost (2). The controls are restricted to depend on the controller’s data in the specific form:
A special case of a hold-waveform controller is that of zero order hold (ZOH) control, where an additional restriction forces the control waveform to be held constant over each epoch.
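The ZOH restriction can be made concrete with a small sketch. Here the epoch start times and the held values are supplied exogenously for illustration; in Design problem 4 they would be chosen causally by the controller from its channel data.

```python
def zoh_controls(epoch_starts, held_values, T):
    """Zero-order-hold control: hold held_values[k] constant over epoch k.

    epoch_starts lists the (sorted) times at which a new epoch begins; the
    epoch starting at time 0 is implicit, so len(held_values) must equal
    len(epoch_starts) + 1. Within an epoch, the control ignores all data
    arriving after the epoch started, which is the restriction in the text.
    """
    assert len(held_values) == len(epoch_starts) + 1
    us = []
    for t in range(T):
        k = sum(1 for s in epoch_starts if s <= t)  # index of current epoch
        us.append(held_values[k])
    return us
```

A general hold-waveform controller would replace the constant held_values[k] by any fixed function of time computed from the data available at the start of epoch k.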
For all four design problems presented above, we assume the existence of measurable policies minimizing the associated costs. We avoid investigating the necessary technical qualifications except to say that if need be, one may allow randomized policies, or even reject the class of merely measurable policies in favour of the class of universally measurable policies.
3. Dual effect and certainty equivalence
We begin by presenting a definition of dual effect  and certainty equivalence . We then present an example to establish that there is a dual effect of the controls in the networked control system introduced in Section 2.
3.1. Dual effect
In a feedback control loop, the dual effect refers to the influence the controller may have on the quality of its own state information. When it is present, the control laws affect not just the first moment, but also the second, third, and higher central moments of the controller’s nonlinear filter for the state. Below, we state this formally for a controlled Markov process with partial observations available to the controller:
where the sequences and are the real-valued plant state and control processes, respectively, see Figure 2. The sequence is the observation process and the sequences and are the plant noise and observation noise processes, respectively. Assume that all the primitive random variables are defined on a suitable probability triple, . Now, consider two arbitrary admissible sets of control policies: . Once we pick one such set of control policies, they together with the measure define the states, observations and controls as random processes. The choice of policies fixes their statistics. We can advertise this relationship by (1) specifying random variables, for example, in the form , (2) specifying a filtration, for example, the one generated by the -process as , or (3) specifying an expected value of a functional, for example, in the form
where stands for any element of the sample space of the primitive random variables. To minimize the notational burden, we advertise the dependence on the set of control policies only as needed. We now define the dual effect by defining its absence.
Definition 1 (Dual effect).
The networked control system in Figure 2 is said to have no dual effect of second-order if
for any two sets of admissible control policies, and
for any two time instants ,
we have for every , and that for any given event ,
Thus, we require equality of the two sets of covariances of filtering/prediction/smoothing errors, corresponding to any two choices of control strategies. In the definition above, by choosing one set of control policies, say as resulting in , for all , we obtain the definition of Bar-Shalom and Tse .
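This definition can be probed numerically for a loop in which the controller only sees a one-bit quantization of the state. Everything below is an illustrative choice of ours (standard Gaussian noise, threshold at zero, two constant controls being compared); the point is that the averaged conditional variance changes with the control, so the second-order dual effect is present.

```python
import random, statistics

def post_quantizer_error_var(u, threshold=0.0, n=100_000, seed=1):
    """Monte-Carlo estimate of E[Var(x1 | y)] when x1 = u + w, w ~ N(0, 1),
    and the controller only observes y = 1{x1 > threshold}.

    If this quantity varies with u, then two admissible control choices
    yield different error covariances, violating the no-dual-effect
    condition of Definition 1.
    """
    rng = random.Random(seed)
    cells = {0: [], 1: []}
    for _ in range(n):
        x1 = u + rng.gauss(0.0, 1.0)
        cells[int(x1 > threshold)].append(x1)
    # conditional variance in each quantizer cell, weighted by cell probability
    return sum(len(c) / n * statistics.pvariance(c)
               for c in cells.values() if len(c) > 1)
```

For u = 0 the state sits at the threshold and the one-bit observation is most informative (the value tends to 1 − 2/π for a standard Gaussian); pushing the mean away from the threshold with u = 2 leaves the observation nearly uninformative and the error variance much larger.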
3.2. Certainty equivalence
For the controlled Markov process (3), consider the general cost
where is a given non-negative cost function. Imagine that a muse could at time supply to the controller the exact values of all primitive random variables by informing the controller of the exact element of the sample space . With such complete and acausal information, the controller could, in principle, solve the deterministic optimization problem
Let be an optimal control law for this deterministic optimization problem. We now state the definition of certainty equivalence from van de Water and Willems :
Clearly, this law is causal. Notice also that its form is tied to the performance cost, and to the statistics of the state and observation processes. It is possible for certainty equivalence control laws to be nonlinear, and such laws can be optimal even when separated designs may not be. For linear plants, they can sometimes be linear or affine, as indicated by the following proposition from  adapted to our problem.
Lemma 1 (Affine certainty equivalence laws for linear plants).
Definition 3 (Certainty equivalence property).
The certainty equivalence property holds for a stochastic control problem if it is optimal to apply the certainty equivalence control law.
For the stochastic control problem described in Lemma 1, with non-linear measurements that do not result in a dual effect of the controls, Bar-Shalom and Tse  showed that the certainty equivalence property holds.
We now consider a simple example, and show that there is a dual effect of the control signal in the closed-loop system presented in Section 2.
For the plant (1), let , , and . Let this information be known to the encoder and the controller, which simply means that . Let the variance . For the objective function, let the horizon end at , and let . Let the channel alphabet be the discrete set .
For the given threshold , let the encoder at be:
The optimal control law at is , where . Using the encoding policy and the optimal control signal , the performance cost with can be written as a function of the control at :
In the above expression, is the quantization distortion, which is thus proportional to the conditional variance of the controller’s minimum mean-squared estimation error of . Notice that is a function of , thus resulting in a dual effect of the control signal in the plant-encoder-channel combination. Figure 3 shows how the quantization distortion depends on . The total cost is also plotted and the optimal value is shown to be different from the certainty equivalent control .
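The flavour of this computation can be reproduced with a small grid search. All numbers below are our own illustrative choices (initial state 1, unit Gaussian noise, control weight r = 1, a one-bit sign quantizer), not the parameters of Example 1. For this two-step cost the certainty equivalence first control is u0 = −x0/3 when r = 1, and the grid minimizer departs from it because the quantization distortion term E[Var(x1 | y)] varies with u0.

```python
import random, statistics

def avg_cond_var(mean, ws, threshold=0.0):
    """E[Var(x1 | sign of x1)] for x1 = mean + w, reusing the same noise
    draws ws for every mean (common random numbers keep J(u0) smooth)."""
    cells = {0: [], 1: []}
    for w in ws:
        x1 = mean + w
        cells[int(x1 > threshold)].append(x1)
    n = len(ws)
    return sum(len(c) / n * statistics.pvariance(c)
               for c in cells.values() if len(c) > 1)

def best_first_control(r=1.0, x0=1.0, n=100_000, seed=2):
    """Grid-search the two-step cost E[r*u0^2 + r*u1^2 + x2^2] with
    x1 = x0 + u0 + w, x2 = x1 + u1, and y the one-bit channel output.
    After optimizing u1 = -xhat1/(1+r) at the second stage, the cost is
        J(u0) = r*u0^2 + r/(1+r)*E[x1^2] + E[Var(x1|y)]/(1+r),
    whose last term carries the dual effect."""
    rng = random.Random(seed)
    ws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    def J(u0):
        m = x0 + u0
        return r*u0**2 + r/(1+r)*(m*m + 1.0) + avg_cond_var(m, ws)/(1+r)
    grid = [i/50 - 0.8 for i in range(41)]   # u0 from -0.8 to 0.0 in steps of 0.02
    return min(grid, key=J)
```

With these choices the minimizer lies strictly below −1/3: the optimal first control ‘probes’ by pushing the state toward the quantizer threshold, where the one-bit observation is most informative.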
4. Dynamic encoder-controller design
In this section we solve the dynamic encoder-controller design problem (Design problem 2) which allows both controls and encoders to be dynamic. We work out the details for the discrete alphabet channel with the fixed alphabet size . We begin by examining a known structural property of optimal encoders. This states that it is optimal for the encoder to apply a quantizer on the state , with the shape of the quantizer depending only on past quantizer outputs. Next, we present a structural property for encoders called controls-forgetting, which leads to separation. Finally, we show that one optimal encoder for Design problem 2 does indeed possess this property, which leads to separation and certainty equivalence for this problem.
4.1. Known structural properties of optimal encoders
Let us now formulate the encoder’s Markov decision problem. Fix the control policies to be the arbitrary, but admissible laws:
Then the optimization problem reduces to one of picking encoding policies. This is a single-agent, sequential decision problem, and hence one with a classical information pattern. The action space for this decision problem is the infinite dimensional function space of discrete-valued encoders. At time , the encoder takes as input: the current and previous states, all previous outputs, and all previous encoding maps. For convenience, we can view this encoding map as a function of only the current state but with the rest of the inputs considered as parameters determining the form of this function. Thus, without loss of generality the encoder can be described as the function
having as its argument the current state, with its shape determined by the remaining data. Hence the action space at each time can be described accordingly. Identifying encoders as decisions to be picked is not enough, as the signal need not be Markov. We utilize the following property.
Lemma 2 (Striebel’s sufficient statistics).
For every design problem we have set up, the signals
form sufficient statistics for the encoding decision at time .
See Striebel . ∎
Hence, at every time , performance is not degraded by the encoder choosing to quantize just the current state instead of quantizing the entire waveform of past and present states. Of course the shape of the quantizer is allowed to vary with past encoder shapes, past encoder outputs, and past control inputs. But given the sufficient statistics, the encoder can forget the data: .
Denote by the data at the controller just after it has read the channel output and just before it has generated the control value . Similarly denote by the data at the controller just after it has generated the control value . Then
The problem we consider has two decision makers that jointly minimize a given cost function. The information available to these decision makers is not the same, and neither is the information available to each agent a subset of the information available to the agent downstream in the loop. Thus, the information pattern here is neither classical nor nested. We apply the common information approach to our problem. (This approach was first proposed by Witsenhausen, as a conjecture in , to deal with multiple decision makers and non-classical information patterns in a general setting. The conjecture was shown to be true by Varaiya and Walrand in  for a special case. Our terminology is derived from , where the conjecture has been studied in detail.) This approach allows a designer to treat a problem with multiple decision makers as a classical control problem with a single decision maker that has access to partial state information. When applied to our setup, this approach leads to the following structural result at the encoder. The encoding policy is selected based on the information available to the controller at the previous time instant, namely . At times respectively, the data comprise the common information in this problem. The encoding map is applied to the state , which is private information available to the encoder. A similar approach has been used by others for problems of quantized control [12, 49, 55].
4.2. Controls-forgetting encoders and separation
At the encoder, the change of variables
is causal and causally invertible. Hence the statistics are also sufficient statistics at the encoder. We now introduce the innovation encoding of Borkar and Mitter .
Definition 4 (Innovation encoder ).
An encoder with the inputs and outputs:
is admissible and is called an ‘innovation’ encoder.
The networked control system in Figure 1 redrawn with an innovation encoder is shown in Figure 4. Note that with innovation encoding, the control-free part of the state is not affected by the control policies, but obeys the recursion
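That control-free recursion can be checked numerically. Assuming the illustrative plant form x_{t+1} = a*x_t + u_t + w_t (our notation), the state splits exactly into a noise-driven part plus a control contribution that both agents can compute offline, which is what lets an innovation encoder ignore the applied controls.

```python
def split_state(a, x0, us, ws):
    """Decompose x_{t+1} = a*x_t + u_t + w_t into a control-free part
    (driven only by the noise, as in the innovation-encoder recursion)
    plus a deterministic control contribution d_t.

    Both the encoder and the controller can reconstruct d_t, since the
    encoder knows the control policies; so quantizing the control-free
    part is equivalent to quantizing the state with shifted cells.
    """
    x, x_free, d = x0, x0, 0.0
    xs, frees, ds = [x0], [x0], [0.0]
    for u, w in zip(us, ws):
        x = a * x + u + w
        x_free = a * x_free + w    # control-free recursion
        d = a * d + u              # control contribution, computable offline
        xs.append(x); frees.append(x_free); ds.append(d)
    return xs, frees, ds
```

By linearity, x_t = x_free_t + d_t holds path by path, for any control sequence and any noise realization.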
For any sequence of causal encoders, one can find an equivalent sequence of innovation encoders such that when these two sets operate on the same sequence of plant outputs, they produce two sequences of channel inputs that are equal with probability one. Hence, if for a plant and channel, the dual effect is present in a certain class of causal encoders, then the dual effect is also present in the equivalent class of innovation encoders . This is what the following example illustrates:
Example 2 (Dual effect in a loop with fixed innovation encoder).
We use the same setup as in Example 1, with the encoder replaced by an innovation encoder. For the given threshold , let the encoder at time be the following innovation encoder:
The optimal control law at is still , where . For the control , notice that (4) and (6) tell us that this innovation encoder is equivalent to the causal encoder of Example 1. For the same applied control policy , and for the same realizations of primitive random variables, we get . Hence, with probability one the two nonlinear filters for the state given are the same. Thus for an event , we have:
Hence the results in Figure 3 apply also to this example.
The encoder (quantizer) in the loop causes the dual effect. Furthermore, the encoder’s presence renders useless the techniques that worked in the case of the classical, single-agent, partially observed LQ control problem. The next example illustrates this.
We examine a scalar system as it evolves from time step 0 to time step 1. We have:
where is the process noise variable, which is independent of , and . We adopt the specific quantizing strategy given below (on the left, in the form of an encoder for , and on the right, in the equivalent, innovation form):
Since the encoder at time is binary, the general control law at time has the form:
where are arbitrary real numbers. The process is fully observed at the controller. We have , and as noted in , one can write:
where the noise-like random variable is given by: Then one can treat the problem as the control of the fully observed process to minimize the given cost, which can be rewritten as the following sum of two terms:
Such a treatment actually works for the case of the classical, single-agent partially observed LQ control problem. There, two special things happen: (1) the random process is statistically independent of the control process and of the ‘state’ process , and (2) because the dual effect is absent, the second term on the RHS of (8) does not vary with . Therefore, by considering as the process to be controlled, we get a single-agent, fully observed LQ control problem.
In the two-agent problems considered in this paper, neither of the above-mentioned special things may happen. For this specific example, we have calculated, and then plotted in Figure 5 how the second moments of and vary with . The calculations are presented in Appendix A.
Next we define a class of encoders for which, at prescribed times, the statistics of are independent of the controls.
Definition 5 (Controls-forgetting encoder).
Denote by the conditional density of given the data . An admissible encoding strategy is controls-forgetting from time if it takes the form:
where (1) is any admissible policy for encoding at time , (2) for the policies are adapted to the data
and (3) for fixed values of the data , the map produces the same output regardless of both the controls and the control policies
Clearly such controls-forgetting encoders exist. For example, consider a set of encoders that quantize in sequence to minimize the estimation distortion , where . Let the non-negative function represent some notion of cost. For example, .
Lemma 3 (Distortions incurred by controls-forgetting encoders also forget controls).
Fix the time and the distortion measure . If the encoder is controls-forgetting from time , then for times , the distortions are statistically independent of the partial set of controls .
The unconditional statistics of are independent of the entire control waveform, no matter what the encoder is. For times and for sets , is independent of because the encoding maps are controls-forgetting from time . Since , for all , the lemma follows. ∎
Definition 6 (Controls affine from time ).
A controller affine from time takes the following form:
where the controls are generated by an admissible strategy , the controls are generated by an affine strategy , with the gains and offsets computed offline, and
4.3. Preliminary lemmas
The main result ahead is Theorem 1, which states that it is optimal for Design problem 2 to apply a separated design and certainty equivalence controls. In this subsection, we do some necessary groundwork towards proving that result.
Once we are prescribed an admissible encoder, the controls affect only the cost-to-go: . In the classical single-agent LQ problem, the ‘prescribed encoder’ is simply the linear observation process with prescribed signal-to-noise ratios. There, this cost-to-go can be expressed as a quadratic function of and . But in our two-agent LQ problem, because of the dual effect, the cost-to-go may have a non-quadratic dependence on the controls . However, we show that by restricting attention to controls-forgetting encoders and affine controls, the cost-to-go does acquire a quadratic dependence on the controls. We use this reasoning and dynamic programming to show that, for time going backwards from the horizon, the following conclusions fall out:
it is optimal at time to apply as control a linear function of , and,
it is optimal at time to apply an encoding map that is controls-forgetting from time .
Lemma 4 (Optimal control at time ).
The optimal control policy at time is the linear law: , and the optimum cost-to-go is the expected value of a quadratic in and .
At time , one is given , and is asked to pick to minimize the cost-to-go
and this lets us prove the Lemma. ∎
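The last-stage minimization behind Lemma 4 can be written out explicitly for a scalar plant in our illustrative notation (state coefficient a, terminal state weight q, control weight r, all assumed). The estimation error variance and the plant noise enter the last-stage cost only as additive constants, which is why the minimizer is linear in the conditional mean.

```python
def last_step_gain(a, q, r):
    """Gain L of the linear last-stage law u = -L * xhat that minimizes
    r*u**2 + q*E[(a*x + u + w)**2] given the conditional mean xhat.

    Completing the square over u gives L = q*a/(q + r); the symbols are
    our reconstruction of Lemma 4's setting, not the paper's notation.
    """
    return q * a / (q + r)

def expected_last_cost(u, xhat, err_var, a, q, r, sigma_w2):
    """r*u^2 + q*E[(a*x + u + w)^2] with x = xhat + estimation error.

    The error variance err_var and noise variance sigma_w2 appear only as
    additive constants, so the optimal u is certainty equivalent.
    """
    return r*u*u + q*((a*xhat + u)**2 + a*a*err_var + sigma_w2)
```

A local grid check around the closed-form minimizer confirms that perturbing the control in either direction only increases the expected last-stage cost.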
Lemma 5 (Optimal for separated, quadratic cost-to-go).
Fix the time . Consider the dynamic encoder-controller design problem (Design problem 2), for the linear plant (1), and the performance cost (2). Suppose that we apply an admissible controller along with an encoder that is controls-forgetting from time . Furthermore, suppose that the partial sets of policies:
are chosen such that the following three properties hold:
the cost-to-go at time takes the separated form:
where, and the term is a weighted sum of future distortions and depends only on the random sequence ,
the coefficients of the quadratic may depend on the control policies but not on the partial set of encoding maps and,
the term depends on the encoding maps but not on the partial set of control policies .
Then, it is optimal to apply an encoding map at time that does not depend on the data: . It also follows that the shapes of the encoding maps and their performance do not depend on the control .
The proof exploits three facts. Firstly, the special form of makes the encoder’s performance cost at time a sum of a quadratic distortion between and , and a term gathering distortions at later times. Secondly, the minimum of the sum distortion depends only on the intrinsic shape of the conditional density and not on its mean. Thirdly, these facts and the controls-forgetting nature of later encoding maps allow the encoder to ‘ignore’ the control . We now start by writing the cost-to-go as: