Optimal Energy Allocation for Kalman Filtering over Packet Dropping Links with Imperfect Acknowledgments and Energy Harvesting Constraints
A preliminary version of this paper was presented at the 4th IFAC NecSys workshop, Koblenz, Germany, Sep. 2013.
This paper presents a design methodology for optimal transmission energy allocation at a sensor equipped with energy harvesting technology for remote state estimation of linear stochastic dynamical systems. In this framework, the sensor measurements, which are noisy versions of the system states, are sent to the receiver over a packet dropping communication channel. The packet dropout probabilities of the channel depend on both the sensor’s transmission energies and time varying wireless fading channel gains. The sensor has access to an energy harvesting source which is an everlasting but unreliable energy source compared to conventional batteries with fixed energy storage. The receiver performs optimal state estimation with random packet dropouts to minimize the estimation error covariances based on received measurements. The receiver also sends packet receipt acknowledgments to the sensor via an erroneous feedback communication channel which is itself packet dropping.
The objective is to design optimal transmission energy allocation at the energy harvesting sensor to minimize either a finite-time horizon sum or a long term average (infinite-time horizon) of the trace of the expected estimation error covariance of the receiver’s Kalman filter. These problems are formulated as Markov decision processes with imperfect state information. The optimal transmission energy allocation policies are obtained by the use of dynamic programming techniques. Using the concept of submodularity, the structure of the optimal transmission energy policies is studied. Suboptimal solutions that are far less computationally intensive than the optimal solutions are also discussed. Numerical simulation results are presented illustrating the performance of the energy allocation algorithms.
Wireless sensor network (WSN) technologies arise in a wide range of applications such as environmental data gathering, mobile robots and autonomous vehicles, and monitoring of smart electricity grids, among many others. In these applications one of the important challenges is to improve system performance and reliability under resource (e.g., energy/power, computation and communication) constraints.
A considerable amount of research has recently been devoted to the concept of energy harvesting  (see also  among other papers). This is motivated by energy limited WSN applications where sensors may need to operate continuously for years on a single battery. In the energy harvesting paradigm the sensors can recharge their batteries by collecting energy from the environment, e.g. solar, wind, water, thermal or mechanical vibrations. However, the amount of energy harvested is random as most renewable energy sources are unreliable. In this work we will consider the remote Kalman filtering problem with random packet dropouts and imperfect receipt acknowledgments when the sensors are equipped with energy harvesting technology, and as a result, are subject to energy harvesting constraints.
Since the seminal work of , the problem of state estimation or Kalman filtering over packet dropping communication channels has been studied extensively (see for example among others). The reader is also referred to the comprehensive survey  for some of the research on the area of control and estimation over lossy networks up to 2007. In these problems sensor measurements (or state estimates in the case of ) are grouped into packets which are transmitted over a packet dropping link such that either the entire packet is received or lost in a random manner. The focus in these works is on deriving conditions on the packet arrival rate in order to guarantee the stability of the Kalman filter.
There are other works which are concerned with estimation performance (e.g. minimizing the expected estimation error covariance) rather than just stability. For instance, power allocation techniques
In conventional wireless communication systems, the sensors either have access to a fixed energy supply or have batteries that are easily rechargeable or replaceable. Therefore, a sum energy/power constraint is used to model the energy limitations of battery-powered devices (see ). However, in the context of WSNs the use of energy harvesting is more practical, e.g., in remote locations with restricted access to an energy supply, and even essential where it is dangerous or impossible to change the batteries . In these situations it is possible to have communication devices with on-board energy harvesting capability which may recharge their batteries by collecting energy from the environment, including solar, thermal or mechanical vibration sources.
Typically, the harvested energy is stored in an energy storage such as a rechargeable battery which then is used for communications or other processing. Even though the energy harvesters provide an everlasting energy source for the communication devices, the amount of energy expenditure at every time slot is constrained by the amount of stored energy currently available. This is unlike the conventional communication devices that are subject only to a sum energy constraint. Therefore, a causality constraint is imposed on the use of the harvested energy . Communication schemes for optimizing throughput for transmitters with energy harvesting capability have been studied in , while a remote estimation problem with an energy harvesting sensor was considered in  which minimized a cost consisting of both the distortion and number of sensor transmissions.
In this paper we study the problem of optimal transmission energy allocation at an energy harvesting sensor for remote state estimation of linear stochastic dynamical systems. In this model, the sensor’s measurements, which are noisy versions of the system’s states, are sent to the receiver over a packet dropping communication channel. Similar to the channel models in , the packet dropout probabilities depend on both the sensor’s transmission energies and time varying wireless fading channel gains. The sensor has access to an energy harvesting source which is an everlasting but unreliable energy source compared to conventional batteries with fixed energy storage. The receiver performs optimal state estimation via Kalman filtering with random packet dropouts to minimize the estimation error covariances based on received measurements. In general, the sensor learns whether its transmissions have been received at the receiver via some feedback mechanism. Here, in contrast to the models in  the feedback channel from receiver to sensor is also an erroneous packet dropping channel, leading to a more realistic formulation. The energy consumed in transmission of a packet is assumed to be much larger than that for sensing or processing at the sensor, and thus energy consumed in sensing and processing is not taken into account in our formulation.
The objective of this work is to design optimal transmission energy allocation (per packet) at the energy harvesting sensor to minimize either a finite-time horizon sum or a long term average (infinite-time horizon) of the trace of the expected estimation error covariance of the receiver’s Kalman filter. The important issue in this problem formulation is to address the trade-off between the use of available stored energy to improve the current transmission reliability and thus state estimation accuracy, or storing of energy for future transmissions which may be affected by higher packet loss probabilities due to severe fading.
These optimization problems are formulated as Markov decision processes with imperfect state information. The optimal transmission energy allocation policies are obtained by the use of dynamic programming techniques. Using the concept of submodularity , the structure of the optimal transmission energy policies is studied. Suboptimal solutions which are far less computationally intensive than the optimal solutions are also discussed. Numerical simulation results are presented illustrating the performance of the energy allocation algorithms.
An earlier presentation of the model considered in this paper appears in  which investigates the case with perfect acknowledgments at the sensor. Here, we address the more difficult problem where the feedback channel from receiver to sensor is imperfect and is modelled as an erasure channel with errors.
In summary, the main contributions of this paper are as follows:
Unlike a large number of papers focusing on the stability of Kalman filtering with packet loss, e.g. , we focus on the somewhat neglected issue of estimation error performance (noting that stability only guarantees bounded estimation error) in the presence of packet loss and how to optimize it via power/energy allocation at the sensor transmitter. Note that it is quite common to study optimal power allocation in the context of a random stationary source estimation in fading wireless sensor networks , but this issue has received much less attention in the context of Kalman filtering over packet dropping links which are randomly time-varying. In particular, we consider minimization of a long-term average of the error covariance of the Kalman filter by optimally allocating energy for individual packet transmissions over packet dropping links with randomly varying packet loss probability due to fading. While a version of this problem was considered in our earlier conference paper , we extend the problem setting and the analysis along multiple directions as described below.
Unlike , we consider an energy harvesting sensor that is not constrained by a fixed initial battery energy, but rather the randomness of the harvested energy pattern. Energy harvesting is a promising solution to the important problem of energy management in wireless sensor networks. Furthermore, recent advances in hardware have made energy harvesting technology a practical reality .
We provide a new sufficient stability condition for bounded long term average estimation error, which depends on the packet loss probability (which is a function of the channel gain, harvested energy and the maximum battery storage capacity) and the statistics of the channel gain and harvested energy process. Although this condition is difficult to verify in general, we provide simpler forms of it when the channel gains and harvested energy processes follow familiar statistical models such as independent and identically distributed processes or finite state Markov chains.
We consider the case of imperfect feedback acknowledgements, which is more realistic but more difficult to study than the case of perfect feedback acknowledgements. We model the feedback channel by a general erasure channel with errors.
It is well known that computing the optimal stationary control policy that minimizes the infinite horizon control cost is computationally prohibitive. Thus motivated, we provide structural results on the optimal energy allocation policy which lead to threshold policies which are optimal and yet very simple to implement in some practical cases, e.g. when the sensor is equipped with binary transmission energy levels. Note that most sensors usually have a finite number of transmission energy/power levels and for simplicity, sensors can be programmed to only have two levels.
Finally, also motivated by the computational burden for the optimal control solution in the general case of imperfect acknowledgments, we provide a sub-optimal solution based on an estimate of the error covariance at the receiver. Numerical results are presented to illustrate the performance gaps between the optimal and sub-optimal solutions.
The organization of the paper is as follows. The system model is given in Section 2. The optimal energy allocation problems subject to energy harvesting constraints are formulated in Section 3. In Section 4 the optimal transmission energy allocation policies are derived by the use of dynamic programming techniques. Section 5 presents suboptimal policies which are less computationally demanding. The structure of the optimal transmission energy allocation policies is studied in Section 6. Section 7 presents the numerical simulation results. Finally, concluding remarks are stated in Section 8.
A diagram of the system architecture is shown in Figure 1. The description of each part of the system is given in detail below.
2.1 Process Dynamics and Sensor Measurements
We consider a linear time-invariant stochastic dynamical process
where is the process state at time , , and is a sequence of independent and identically distributed (i.i.d.) Gaussian noises with zero mean and positive definite covariance matrix . The initial state of the process is a Gaussian random vector, independent of the process noise sequence , with mean and covariance matrix .
The sensor measurements are obtained in the form
where is the observation at time , , and is a sequence of i.i.d. Gaussian noises, independent of both the initial state and the process noise sequence , with zero mean and a positive semi-definite covariance matrix .
We enunciate the following assumption:
(A1) We assume that is stabilizable and is detectable.
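For concreteness, the process and measurement model described above can be written as follows. The symbols $x_k$, $A$, $w_k$, $M$, $y_k$, $C$, $v_k$, $N$, $\bar{x}_0$ and $P_0$ are notation assumed here for illustration; only the structure (linear time-invariant dynamics, i.i.d. Gaussian noises, Gaussian initial state) is taken from the description in this subsection.

```latex
\begin{align*}
x_{k+1} &= A x_k + w_k, & w_k &\sim \mathcal{N}(0, M), \quad M > 0, \\
y_k     &= C x_k + v_k, & v_k &\sim \mathcal{N}(0, N), \quad N \geq 0, \\
x_0     &\sim \mathcal{N}(\bar{x}_0, P_0).
\end{align*}
```

In this assumed notation, Assumption (A1) would read: $(A, M^{1/2})$ is stabilizable and $(A, C)$ is detectable.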
2.2 Forward Communication Channel
The measurement is then sent to a receiver over a packet dropping communication channel such that (considered as a packet) is either exactly received or the packet gets lost due to corrupted data or substantial delay. The packet dropping channel is modelled by
where is the observation obtained by the receiver at time , and denotes that the measurement packet is received, while denotes that the packet containing the measurement is lost.
Similar to , we adopt a model for the packet loss process that is governed by the time-varying wireless fading channel gains and sensor transmission energy allocation (per packet) over this channel. In this model, the conditional packet reception probabilities are given by
where is a monotonically increasing continuous function. The form of will depend on the particular digital modulation scheme being used .
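As a minimal sketch of this channel model, the snippet below draws the packet reception indicator from a reception probability that depends on the product of channel gain and transmission energy. The particular function `1 - exp(-g*u)` is a hypothetical stand-in chosen only because it is continuous and monotonically increasing; the actual form depends on the modulation scheme. The names `reception_probability` and `transmit` are likewise illustrative.

```python
import math
import random


def reception_probability(gain, energy):
    """Hypothetical h(g*u) = 1 - exp(-g*u): continuous and increasing in g*u."""
    return 1.0 - math.exp(-gain * energy)


def transmit(measurement, gain, energy, rng):
    """Simulate one use of the packet dropping channel.

    Returns (gamma, received): gamma = 1 means the packet arrived, and
    received = gamma * measurement, so a lost packet is seen as 0.
    """
    gamma = 1 if rng.random() < reception_probability(gain, energy) else 0
    return gamma, gamma * measurement
```

With zero transmission energy the packet is always lost under this assumed function, and a large gain-energy product makes reception almost certain.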
We consider the case where the set of fading channel gains is a first-order stationary and homogeneous Markov fading process (see ) where the channel remains constant over a fading block (representing the coherence time of the channel ). Note that the stationary first-order Markovian modelling includes the case of independent and identically distributed (i.i.d.) processes as a special case.
We assume that channel state information is available at the transmitter such that it knows the values of the channel gains at time . In practice, this can be achieved by channel reciprocity between the sensor-to-receiver and receiver-to-sensor channels (such as in typical time-division-duplex (TDD) based transmissions). In this scenario, the sensor can estimate the channel gain based on pilot signals transmitted from the remote receiver at the beginning of each fading block. Another possibility (if channel reciprocity does not hold) is to estimate the channel at the receiver based on pilot transmissions from the sensor and send it back to the sensor by channel state feedback. However, transmitting pilot signals consumes energy which should then be taken into account. To conform with our problem formulation, we therefore assume that channel reciprocity holds.
2.3 Energy Harvester and Battery Dynamics
Let the unpredictable energy harvesting process be denoted by which is also modelled as a stationary first-order homogeneous Markov process, and which is independent of the fading process . This modelling for the harvested energy process is justified by empirical measurements in the case of solar energy .
We assume that the dynamics of the stored battery energy is given by the following first-order Markov model
with given , where is the maximum stored energy in the battery.
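The battery recursion can be sketched in a few lines. The update used here, next level = min(current level - energy spent + newly harvested energy, capacity), is an assumed form consistent with the description of a capacity-limited battery and with the requirement that the energy spent cannot exceed the stored energy; the function and argument names are illustrative.

```python
def battery_step(battery, energy_used, harvested_next, battery_max):
    """One step of an assumed battery recursion with a storage cap.

    The energy spent must lie between 0 and the currently stored energy
    (the energy harvesting constraint).
    """
    if not 0.0 <= energy_used <= battery:
        raise ValueError("transmission energy must satisfy 0 <= u <= battery")
    return min(battery - energy_used + harvested_next, battery_max)
```

For example, starting from 3 units, spending 1 and harvesting 2 leaves 4 units, while any surplus above the capacity is discarded.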
2.4 Kalman Filter at Receiver
The receiver performs the optimal state estimation by the use of Kalman filtering based on the history which is the -field generated by the available information at the receiver up to time . We use the convention .
The optimal Kalman filtering and prediction estimates of the process state are given by and , respectively. The corresponding Kalman filter error covariances are defined as
The Kalman recursion equations for and are given in . In this paper we focus on the estimation error covariance which satisfies the random Riccati equation
for where (see ). Note that appears as a random coefficient in the Riccati equation (Equation 3). Since (i) the derivation in  allows for time-varying packet reception probabilities, and (ii) in the model of this paper the energy allocation only affects the probability of packet reception via (Equation 1) and not the system state that is being estimated, the estimation error covariance recursion is of the form (Equation 3) as given in . This is in contrast to the work  where the control signal can affect the states at future times which leads to a dual effect.
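To make the role of the random coefficient concrete, here is a sketch of one step of the random Riccati recursion for a scalar system. The scalar restriction and the parameter names `a`, `c`, `m`, `n` (system gain, measurement gain, process noise variance, measurement noise variance) are assumptions made for illustration; the recursion itself is the standard Kalman prediction error covariance update in which the measurement correction is applied only when the packet is received.

```python
def riccati_step(p, gamma, a, c, m, n):
    """One step of the scalar random Riccati recursion.

    p     : current prediction error variance
    gamma : 1 if the measurement packet was received, 0 if it was lost
    """
    predicted = a * a * p + m          # open-loop propagation
    if gamma == 0:
        return predicted               # packet lost: no measurement update
    correction = (a * c * p) ** 2 / (c * c * p + n)
    return predicted - correction      # packet received: Kalman correction
```

With `a = 2`, `c = m = n = 1` and `p = 1`, a lost packet gives 5 while a received packet gives 3, which shows how each reception reduces the error variance.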
2.5 Erroneous Feedback Communication Channel
In the case of unreliable acknowledgments, the packet loss process is not known to the sensor, instead, the sensor receives an imperfect acknowledgment process from the receiver. It is assumed that after the transmission of and before transmitting the sensor has access to the ternary process where
with given dropout probability for the binary process , i.e., for all . In case (i.e., ), no signal is received on the feedback link and this results in an erasure. In case , a transmission error may occur, independent of all other random processes, with probability . This transmission error results in the reception of when , and when . We may model the erroneous feedback channel as a homogeneous Markov process with transition probability matrix
where for and . This channel model refers to a generalized erasure channel, namely, a binary erasure channel with errors (see Exercise 7.13 in ). This model is general in the sense that if we let then the ternary acknowledgement process reduces to a binary process with the possibility of only transmission errors, and a standard erasure channel when we set . Finally, the case of perfect packet receipt acknowledgments studied in  is a special case when and above are both set to zero.
The present situation encompasses, as special cases, situations where no acknowledgments are available (UDP-case) and also cases where acknowledgments are always available (TCP-case), see also for a discussion in the context of closed loop control with packet dropouts.
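The ternary acknowledgment process can be sketched as follows. The symbols `eta` (feedback dropout probability) and `eps` (feedback error probability), the encoding of an erasure as the value 2, and the function names are assumptions for illustration. The transition matrix has one row per value of the true reception indicator and one column per acknowledgment value.

```python
import random

ERASURE = 2  # assumed encoding of "no acknowledgment received"


def ack_transition_matrix(eta, eps):
    """Rows: true indicator gamma in {0, 1}; columns: ack in {0, 1, ERASURE}."""
    return [
        [(1 - eta) * (1 - eps), (1 - eta) * eps, eta],
        [(1 - eta) * eps, (1 - eta) * (1 - eps), eta],
    ]


def send_ack(gamma, eta, eps, rng):
    """Pass the true indicator gamma through the erasure channel with errors."""
    if rng.random() < eta:
        return ERASURE          # feedback packet dropped
    if rng.random() < eps:
        return 1 - gamma        # feedback received but flipped
    return gamma                # feedback received correctly
```

Setting `eta = 0` gives a binary channel with errors only, `eps = 0` gives a standard erasure channel, and setting both to zero recovers perfect acknowledgments.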
3 Optimal Transmission Energy Allocation Problems Subject to Energy Harvesting Constraints
In this section we formulate optimal transmission energy allocation problems in order to minimize the trace of the receiver’s expected estimation error covariances (Equation 3) subject to energy harvesting constraints. Unlike the problem formulation in , in the model of this paper the optimal energy policies are computed at the sensor which has perfect information about the energy harvesting and instantaneous battery levels but has imperfect state information about the packet receipt acknowledgments.
We consider the realistic causal information scenario, in which the unpredictable future wireless fading channel gains and energy harvesting information are not known a priori to the transmitter. More precisely, the information available at the sensor at any time is given by
where is the initial condition.
The information is used at the sensor to decide the amount of transmission energy for the packet loss process. A policy for is feasible if the energy harvesting constraint is satisfied. The admissible control set is then given by
The optimization problems are now formulated as Markov decision processes with imperfect state information for the following two cases:
(i) Finite-time horizon:
and (ii) Long term average (infinite-time horizon):
where is the stored battery energy available at time which satisfies the battery dynamics (Equation 2). It is evident that the transmission energy at time , , affects the amount of stored energy available at time which in turn affects the transmission energy since by (Equation 2). In the special case of perfect packet receipt acknowledgments from receiver to sensor, the reader is referred to  for a similar long term average cost formulation under an average transmission power constraint which is a soft constraint unlike the energy harvesting constraint considered here, which is a hard constraint in an almost sure sense.
We note that the expectations in (Equation 4) and (Equation 5) are computed over random variables , and for given initial condition . Since these expectations are conditioned on the transmission success process of the feedback channel instead of the packet loss process of the forward channel , these formulations fall within the general framework of stochastic control problems with imperfect state information.
It is known that Kalman filtering with packet losses may have unbounded expected estimation error covariances in certain situations (see ). We now aim to provide sufficient conditions under which the infinite horizon stochastic control problem (Equation 5) is well-posed in the sense that an exponential boundedness condition for the expected estimation error covariance is satisfied. The reader is referred to  for the problem of determining the minimum average energy required for guaranteeing the stability of the Kalman filtering with the packet reception probabilities (Equation 1) subject to an average sum energy constraint.
Let and be the time-invariant probability transition laws of the Markovian channel fading process and the Markovian harvested energy process , respectively.
We introduce the following assumption:
(A2) The channel fading process , harvested energy process and the maximum battery storage satisfy the following:
for some .
Proof: Based on Theorem 1 in , a sufficient condition for exponential stability in the sense of ( ?) is that
for some . We now consider a suboptimal solution scheme to the stochastic optimal control problem (Equation 5) where the full amount of energy harvested at each time step is used, i.e., and for . Then (Equation 6) will be a sufficient condition in terms of the channel fading process, harvested energy process and the maximum battery storage. Therefore, Assumption (A2) provides a sufficient condition for the exponential boundedness ( ?) of the expected estimation error covariance.
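As an illustration of how such a condition might be checked numerically, the sketch below treats the simplest setting: a scalar system with i.i.d. channel gains and harvested energies taking finitely many values, under the scheme from the proof in which all harvested energy (capped by the battery capacity) is spent immediately. It uses the well-known fact that, for a scalar system with i.i.d. packet losses, the expected error covariance stays bounded when the squared system gain times the loss probability is below one. The function names, the dictionary representation of the distributions and the reception function `h` are assumptions.

```python
import math


def expected_loss_probability(gain_pmf, harvest_pmf, battery_max, h):
    """Average packet loss probability when all harvested energy is spent.

    gain_pmf, harvest_pmf : dicts mapping a value to its probability
    h                     : reception probability as a function of gain*energy
    """
    total = 0.0
    for gain, p_gain in gain_pmf.items():
        for harvested, p_harvest in harvest_pmf.items():
            energy = min(harvested, battery_max)
            total += p_gain * p_harvest * (1.0 - h(gain * energy))
    return total


def stability_margin(a, gain_pmf, harvest_pmf, battery_max, h):
    """Scalar i.i.d. check: a value below 1 indicates bounded expected error."""
    return a * a * expected_loss_probability(gain_pmf, harvest_pmf, battery_max, h)
```

For instance, with a harvest that is zero half of the time, a system gain of 1.2 gives a margin of about 0.72, whereas a gain of 1.5 gives about 1.125 and the check fails.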
4 Solutions to the Optimal Transmission Energy Allocation Problems Via Dynamic Programming
The stochastic control problems (Equation 4) and (Equation 5) can be regarded as Markov Decision Process (MDP)  problems with imperfect state information . In these formulations the energy harvesting sensor does not have perfect knowledge about whether its transmissions have been received at the receiver or not due to the existence of an imperfect feedback communication channel. Hence, at time the sensor has only “imperfect state information” about via the acknowledgment process . In this section we reduce the stochastic control problems with imperfect state information (Equation 4) and (Equation 5) to ones with perfect state information by using the notion of information-state .
as all observations about the receiver’s Kalman filtering state estimation error covariance at the sensor after the transmission of and before transmitting . We set . The so-called information-state is defined by
which is the conditional probability of estimation error covariance given , and . The following lemma shows how can be determined from together with , and .
Proof: See the Appendix.
It is important to note that the information-state dynamics ( ?) depends on the fading channel gains and sensor transmission energy allocation policies via the packet reception probabilities (Equation 1). Hence, we may write ( ?) as
for . Note that in (Equation 7) depends on the entire function and not just its value at any particular .
In the following sections the stochastic control problems with imperfect state information (Equation 4) and (Equation 5) are reduced to problems with perfect state information where the state is given by the information-state . The resulting stochastic problems with perfect information are approached via the dynamic programming principle.
We establish some notation. Let the binary random variable be defined akin to in (Equation 3), then for a given denote
as the random Riccati equation operator. Let be the set of all nonnegative definite matrices. Then, we denote the space of all probability density functions on as where for any . Let the ternary random variable be defined akin to in Section 2.5. Then, based on the information-state recursion (Equation 7) denote
for given , fading channel gain and sensor transmission energy allocation .
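The information-state recursion is a Bayes filter: the belief over the receiver's error covariance is propagated through both branches of the Riccati recursion and reweighted by the likelihood of the observed acknowledgment. The sketch below illustrates this for a scalar system with a belief supported on finitely many values. The representation of the belief as a dictionary, the scalar Riccati parameters, and the symbols `eta` and `eps` for the feedback dropout and error probabilities are all assumptions made for illustration.

```python
def riccati(p, gamma, a=2.0, c=1.0, m=1.0, n=1.0):
    """Scalar random Riccati step with hypothetical system parameters."""
    predicted = a * a * p + m
    return predicted - gamma * (a * c * p) ** 2 / (c * c * p + n)


def ack_likelihood(ack, gamma, eta, eps):
    """Probability of observing ack (0, 1, or 2 for erasure) given gamma."""
    if ack == 2:
        return eta
    return (1.0 - eta) * ((1.0 - eps) if ack == gamma else eps)


def update_belief(belief, ack, p_receive, eta, eps):
    """Bayes update of a discrete belief {covariance value: probability}.

    p_receive is the packet reception probability for the current channel
    gain and transmission energy.
    """
    unnormalised = {}
    for p, weight in belief.items():
        for gamma, prior in ((1, p_receive), (0, 1.0 - p_receive)):
            mass = weight * prior * ack_likelihood(ack, gamma, eta, eps)
            if mass > 0.0:
                p_next = round(riccati(p, gamma), 9)
                unnormalised[p_next] = unnormalised.get(p_next, 0.0) + mass
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("acknowledgment has zero probability under this belief")
    return {p: mass / total for p, mass in unnormalised.items()}
```

With perfect feedback the belief stays a point mass, while an erasure carries no information about the reception outcome, so the updated belief is the prior prediction weighted by the reception probability.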
4.2 Dynamic Programming Principle
In this section, the transmission energy allocation policy is computed offline from the Bellman dynamic programming equations given below.
Some notation is now presented. Given the fading channel gain and the harvested energy at time we denote the corresponding fading channel gain and the harvested energy at time by and , respectively. We recall that both fading channel gains and harvested energies are modelled as first-order homogeneous Markov processes (see Section 2).
Finite-Time Horizon Bellman Equation
The imperfect state information stochastic control problem (Equation 4) is solved in the following Theorem.
Proof: The proof follows from the dynamic programming principle for stochastic control problems with imperfect state information (see Theorem 7.1 in ).
Based on Remark ?, it is important to note that in the special case of perfect packet receipt acknowledgments, where and in Section 2.5 are set to zero, the Bellman equation ( ?) is written with respect to Dirac delta functions in space , i.e., (see Section 4 in ).
The solution to the imperfect state information stochastic control problem (Equation 4) is then given by
with , where is the solution to the Bellman equation ( ?).
For computational purposes, we now simplify the terms in ( ?). First, we have
with the constraint that . Since the mutually independent processes and are independent of other processes and random variables, we may write
where , and and are the probability transition laws of the Markovian processes and , respectively. But,
where the function is defined in (Equation 9).
Note that the solution to the dynamic programming equation can only be obtained numerically and there is no closed form solution. In fact, even for a horizon 2 problem with causal information and perfect feedback acknowledgment, it can be shown that the optimal solution cannot be obtained in closed form. It can be observed however that for a fixed battery level, the energy allocation generally increases with the channel gain and when the channel gain is above some threshold, all of the available battery energy is used for transmission. Similarly, when the channel gain is kept fixed, the energy allocation initially equals the available energy and increases with the battery energy level; beyond some point, however, the energy allocated for transmission becomes less than the available energy and some energy is saved for future transmissions.
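A toy numerical illustration of the backward recursion is sketched below for the special case of perfect acknowledgments, where the state reduces to the error covariance itself. Everything in it is an assumption chosen to keep the example small: a scalar system, i.i.d. channel gains and harvested energies with two values each, integer energy units, a horizon of three steps, and the reception function `1 - exp(-g*u)`. It is a sketch of finite-horizon dynamic programming under these assumptions, not the discretization used for the numerical results.

```python
import math
from functools import lru_cache

A, C, M, N = 1.2, 1.0, 1.0, 1.0        # hypothetical scalar system
GAINS = ((0.5, 0.5), (2.0, 0.5))        # (channel gain, probability), i.i.d.
HARVEST = ((0, 0.5), (2, 0.5))          # (harvested units, probability), i.i.d.
B_MAX, HORIZON = 4, 3                   # battery capacity and horizon length


def h(x):
    """Hypothetical packet reception probability."""
    return 1.0 - math.exp(-x)


def riccati(p, gamma):
    predicted = A * A * p + M
    return predicted - gamma * (A * C * p) ** 2 / (C * C * p + N)


@lru_cache(maxsize=None)
def value(t, p, gain, battery):
    """Return (optimal cost-to-go, optimal energy) at stage t."""
    if t == HORIZON:
        return 0.0, 0
    best_cost, best_u = math.inf, 0
    for u in range(battery + 1):        # energy harvesting constraint
        q = h(gain * u)
        cost = 0.0
        for gamma, prob in ((1, q), (0, 1.0 - q)):
            p_next = round(riccati(p, gamma), 9)
            future = 0.0
            for g_next, w_g in GAINS:
                for e_next, w_e in HARVEST:
                    b_next = min(battery - u + e_next, B_MAX)
                    future += w_g * w_e * value(t + 1, p_next, g_next, b_next)[0]
            cost += prob * (p_next + future)
        if cost < best_cost - 1e-12:
            best_cost, best_u = cost, u
    return best_cost, best_u
```

Calling `value(0, 1.0, 2.0, 3)` returns the optimal expected cost and first-stage energy for an initial covariance of 1, a channel gain of 2 and 3 stored energy units. At the final stage there is no reason to save energy, so the recursion spends the whole battery.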
Long Term Average (Infinite-Time Horizon) Bellman Equation
We present the solution to the imperfect state information stochastic control problem (Equation 5) in the following Theorem.
Proof: See the Appendix.
The stationary solution to the imperfect state information stochastic control problem (Equation 5) is then given by
where is the solution to the average cost Bellman equation ( ?).
We note that discretized versions of the Bellman equations ( ?) or ( ?), which in particular include the discretization of the space of probability density functions , are used for the numerical computation of suboptimal solutions to the stochastic control problems (Equation 4) and (Equation 5). As the number of discretization levels increases, it is expected that these discretized (suboptimal) solutions converge to the optimal solutions . We solve the Bellman equations ( ?) and ( ?) by the use of value iteration and relative value iteration algorithms, respectively (see Chapter 7 in ).
5 Suboptimal Transmission Energy Allocation Problems and Their Solutions
The optimal solutions presented in Section 4 require us to compute the solution of Bellman equations in the space of probability densities . In this section we consider the design of suboptimal policies which are computationally much less intensive than the optimal solutions of Section 4.
Here, we only present suboptimal solutions to the finite-time horizon stochastic control problem (Equation 4). Following the same arguments one can design similar suboptimal solutions to the infinite-time horizon problem.
In this case we formulate the problem of minimizing the expected estimation error covariance as
where is an estimate of computed by the sensor based on the following recursive equations (with ):
(i) In the case we have
(ii) In the case we have
(iii) In the case we have
The reason that the solution to the stochastic control problem (Equation 13) is called suboptimal is that the true estimation error covariance matrix in (Equation 3) is replaced by its estimate . The intuition behind these recursive equations can be explained as follows. Note that in the case of perfect feedback acknowledgements, the error covariance is updated as in case , and in case . In our imperfect acknowledgement model, even when it is received, errors can occur such that is received when , and is received when . Thus the recursions given in (i) and (ii) are the weighted (by the corresponding error event probabilities) combinations of the error covariance recursions in the case of perfect feedback acknowledgements. In the case where an erasure occurs, taking the average of the error covariances in the cases and is intuitively a reasonable thing to do, which motivates the recursion in (iii).
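The intuition above can be sketched for a scalar system as follows. The exact weights are assumptions: when an acknowledgment value of 0 or 1 is received, the two perfect-feedback recursions are mixed using the feedback error probability `eps`, and when an erasure (encoded as 2) occurs, they are averaged using the packet reception probability `p_receive` for the current channel gain and transmission energy. The parameter names are illustrative.

```python
def covariance_estimate_step(p_hat, ack, p_receive, eps, a, c, m, n):
    """One step of an assumed recursion for the sensor's covariance estimate.

    ack is 0 or 1 for a received acknowledgment and 2 for an erasure.
    """
    no_update = a * a * p_hat + m                              # packet lost
    update = no_update - (a * c * p_hat) ** 2 / (c * c * p_hat + n)  # received
    if ack == 1:
        weight_update = 1.0 - eps      # acknowledgment is probably correct
    elif ack == 0:
        weight_update = eps            # reception possible only via an error
    elif ack == 2:
        weight_update = p_receive      # erasure: fall back on the prior
    else:
        raise ValueError("ack must be 0, 1 or 2")
    return weight_update * update + (1.0 - weight_update) * no_update
```

With `eps = 0` the first two cases reduce to the perfect-feedback recursions, which is the consistency property the text relies on.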
Note that where the conditional probabilities are given in Section 2.5. This together with the recursive equations of yields
Since the expression is of the same form as when is replaced by , the Bellman equation for problem (Equation 13) is given by a similar equation to the case of perfect feedback communication channel considered in  which is presented in the following theorem.
with , where is the solution to the Bellman equation ( ?).
6 Some Structural Results on the Optimal Energy Allocation Policies
In this section the structure of the optimal transmission energy allocation policies (Equation 10) is studied for the case of the finite-time horizon stochastic control problem (Equation 4). Following the same arguments one can show similar structural results for the infinite-time horizon problem (Equation 5).
Proof: We let . First, note that, for given and , the final time value function
is a convex function in due to the fact that is a concave function in given (see Lemma 2 in ). Now assume that is convex in for given and . Then, for given and , the function
is convex in , since it is the minimum of which is a constant independent of , and by the induction hypothesis the convex function in . Since the expectation operator preserves convexity,
We now present the main Theorem of this section which gives structural results on the optimal energy allocation policies (Equation 10).
Proof: Assume and are fixed. We define
from ( ?). We aim to show that is submodular in , i.e., for every and ,
It is evident that is submodular in since it is independent of . Denote
Since is convex in (by Lemma ?) we have
(see Proposition 2.2.6 in ). Now let , and . Then, we have the submodularity condition (Equation 14) for . Therefore, is submodular in . Note that submodularity is a sufficient condition for optimality of monotone increasing policies, i.e., since is submodular in then is non-decreasing in (see ).
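For reference, the standard definition of submodularity (decreasing differences) for a function of two scalar arguments is recalled below. The symbols $F$, $u$ and $B$ are placeholders assumed here for the function being minimized, the transmission energy and the battery level.

```latex
F(u_2, B_2) - F(u_1, B_2) \;\le\; F(u_2, B_1) - F(u_1, B_1)
\qquad \text{for all } u_1 \le u_2,\ B_1 \le B_2 .
```

Under this condition, the benefit of increasing $u$ is at least as large at the higher battery level, which is why a minimizer over $u$ can be chosen to be non-decreasing in $B$.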
For fixed , and , let be the unique solution to the convex unconstrained minimization problem
which can be easily solved using numerical techniques such as a bisection search. Then, the structural result of Theorem ? implies that the solution to the constrained problem (Equation 10) where will be of the form
This also helps to reduce the search space by restricting the search to be in one direction for different (see the discussion in Section III.C of ).
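A minimal sketch of this two-step procedure is given below: a bisection search on the sign of a finite-difference slope locates the minimizer of a convex function, and the result is then clipped to the available battery energy. The function names, the search interval upper bound `search_max` and the finite-difference step are assumptions for illustration.

```python
def unconstrained_minimizer(f, lo, hi, tol=1e-6):
    """Bisection on the sign of a central-difference slope of a convex f."""
    step = 1e-6
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        slope = f(mid + step) - f(mid - step)
        if slope > 0:
            hi = mid      # minimizer lies to the left of mid
        else:
            lo = mid      # minimizer lies to the right of mid
    return 0.5 * (lo + hi)


def constrained_energy(f, battery, search_max):
    """Clip the unconstrained minimizer to the energy harvesting constraint."""
    u0 = unconstrained_minimizer(f, 0.0, search_max)
    return min(u0, battery)
```

For a cost with unconstrained minimizer at 3, a battery level of 5 gives an allocation of about 3, while a battery level of 2 gives an allocation of exactly 2.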
6.1 Threshold Policy for Binary Energy Allocation Levels
Note that solving for the optimal energy allocation level in the Bellman equation requires discretization not only of the state space but also of the action space. However, discretizing the action space to a finite number of energy allocation levels is rarely an issue since, in practice, a sensor transmitter can be programmed to have only a finite number of transmission power/energy levels. In fact, for simplicity of implementation, a sensor can often be equipped with only two power/energy levels for transmission. Thus it is perfectly natural to consider the scenario where the energy allocation space is binary. In this section therefore we consider the optimal solution of Section 4 with the assumption that the transmission energy allocation control belongs to a two element set where . The monotonicity of Theorem ? yields a threshold structure such that the optimal transmission energy allocation policy is of the form
for , where is the battery storage threshold depending on , , and . This threshold structure simplifies the implementation of the optimal energy allocation significantly. However, it requires the knowledge of the optimal battery energy threshold above. In general, there is no closed form expression for , but it can be found via iterative search algorithms. Here we present a gradient-estimate algorithm based on Algorithm 1 in  (after ) to find the threshold in the case of the infinite-time horizon formulation (Equation 5) with perfect packet receipt acknowledgments where and in Section 2.5 are set to zero. A similar algorithm can be devised for the imperfect feedback case, albeit with increased computational complexity.
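The threshold structure is simple enough to state in a few lines. In the sketch below, `energy_low` and `energy_high` denote the two assumed transmission energy levels and `threshold` the battery storage threshold; the names and the convention that the lower level is used at or below the threshold are illustrative assumptions.

```python
def threshold_policy(battery, threshold, energy_low, energy_high):
    """Binary energy allocation: spend more only above the battery threshold."""
    return energy_low if battery <= threshold else energy_high
```

Once the threshold is known, the sensor only needs to compare its battery level against a single number at each transmission time.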
First, we establish some notation. Let be the -th iteration of the relative value algorithm for solving Bellman equation ( ?) in the case of perfect feedback. Then, for given and fixed , , and denote
where the threshold policy is defined as
For , and we denote and . The term in (Equation 16) is the right hand side expression of ( ?) without the minimization, where the threshold policy , depending on the threshold policy , is used in the relative value iteration.
Gradient algorithm for computing the threshold. For fixed , , and in the -th iteration of the relative value algorithm the following steps are carried out:
Step 1) Choose the initial battery storage threshold .
Step 2) For iterations
Compute the gradient:
Update the battery storage threshold via
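The two steps above follow the usual pattern of a stochastic gradient search. The sketch below is one possible instance under stated assumptions: the gradient is estimated by a two-sided finite difference, the step size decreases as `step / n`, and the threshold is projected back onto the interval between zero and the battery capacity. The callable `average_cost` is a hypothetical stand-in for the evaluation of the cost under a given threshold; none of these choices is claimed to match the referenced algorithm exactly.

```python
def find_threshold(average_cost, initial_threshold, battery_max,
                   iterations=200, step=0.5, perturbation=0.1):
    """Gradient-estimate search for a battery storage threshold.

    average_cost : hypothetical callable returning the cost of a threshold
    """
    threshold = initial_threshold
    for n in range(1, iterations + 1):
        # Two-sided finite-difference estimate of the cost gradient.
        gradient = (average_cost(threshold + perturbation)
                    - average_cost(threshold - perturbation)) / (2.0 * perturbation)
        # Gradient step with a decreasing step size.
        threshold -= (step / n) * gradient
        # Keep the threshold within the physically meaningful range.
        threshold = min(max(threshold, 0.0), battery_max)
    return threshold
```

As a sanity check on a smooth toy cost with its minimum at a threshold of 2, the search started from 0.5 settles at about 2.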