Power Allocation for Energy Harvesting Transmitter with Causal Information
Abstract
We consider power allocation for an access-controlled transmitter with energy harvesting capability based on causal observations of the channel fading state. We assume that the system operates in a time-slotted fashion and that the channel gain in each slot is a random variable which is independent across slots. Further, we assume that the transmitter is solely powered by a renewable energy source and that the energy harvesting process can be predicted in practice. With the additional access control for the transmitter and the maximum power constraint, we formulate the stochastic optimization problem of maximizing the achievable rate as a Markov decision process (MDP) with continuous state. To efficiently solve the problem, we define an approximate value function based on a piecewise linear fit in terms of the battery state. We show that with the approximate value function, the update in each iteration consists of a group of convex problems with a continuous parameter. Moreover, we derive the optimal solution to these convex problems in closed form. Further, we propose power allocation algorithms for both the finite- and infinite-horizon cases, whose computational complexity is significantly lower than that of the standard discrete MDP method but with improved performance. Extension to the case of a general payoff function and imperfect energy prediction is also considered. Finally, simulation results demonstrate that the proposed algorithms closely approach the optimal performance.
I Introduction
The utilization of renewable energy is an important characteristic of green wireless communication systems [1]. Renewable-energy-powered transmitters can be deployed in wireless sensor networks or cellular networks, reducing the reliance on traditional batteries and prolonging the transmitter's lifetime [2][3]. However, the fluctuation of the energy harvesting process, together with the variation of the channel fading, brings many challenges to the design of energy-harvesting communication systems [4][5].
Wireless transmission schemes for energy-harvesting transmitters have been investigated in a number of recent works [6][7][8][9]. In order to achieve the optimal throughput, a "shortest path" based energy scheduling algorithm was proposed in [6] for a static channel with finite battery capacity and noncausal knowledge of the energy harvesting state. The authors of [7] discussed an MDP model for the case where the energy harvesting and channel fading are known causally and there is no maximum power constraint. A staircase water-filling algorithm was also proposed in [7] for the case where the battery capacity is infinite and the energy harvesting and fading channel states are known noncausally. With a finite battery capacity and noncausal energy harvesting and fading channel states, a water-filling procedure was studied in [8], and with an additional maximum power constraint a dynamic water-filling algorithm was proposed in [9]. The authors of [10] developed an online, approximately optimal algorithm based on Lyapunov optimization, designed to maximize a utility function based on the number of packet transmissions in energy harvesting networks. In [11], using the discrete MDP model, a reinforcement learning based approach was used to optimize the number of packet transmissions without prior knowledge of the statistics of the energy harvesting process and the channel fading process. The authors of [12] considered a static channel with causal knowledge of the stationary Poisson energy arrival process and gave an MDP-based solution to maximize the average throughput with unconstrained transmission power. On the other hand, the throughput optimization problem with causal information on the energy harvesting state and the fading channel state, and under a maximum power constraint, remains open. In this paper, we tackle this problem.
Specifically, we first consider power allocation for an access-controlled transmitter, which is powered by a renewable energy source, equipped with a finite-capacity battery, and subject to a maximum power constraint. The channel fading is assumed to be a random variable within a slot and independent across different slots. For energy harvesting, we first assume that it can be predicted accurately for the scheduling period, which can be realized in practice [13][14], and later introduce prediction error variables. Furthermore, we assume that a control center can temporarily suspend the transmitter's access due to channel congestion. Such channel access control for the transmitter is modeled as a first-order Markov process. Under the above setting, this paper finds the approximately optimal power allocation for both the finite- and infinite-horizon cases.
To obtain the power allocation, we formulate the stochastic optimization problem as a discrete-time, continuous-state Markov decision process (MDP), with the objective of maximizing the sum of the payoff in the current slot and the discounted expected payoffs in the future slots, where the payoff function is the achievable channel rate. Since the state variables in the MDP problem, including the battery state and the channel state, are continuous, to avoid the prohibitively high complexity of updating the value function caused by the continuous states, this paper introduces an approximate value function. We show that the approximate value function is concave and nondecreasing in the variable corresponding to the energy stored in the battery, which further enables the approximate value function to be updated in closed form. This is then used to find the approximately optimal power allocation for both the finite- and infinite-horizon cases.
The proposed algorithms provide approximate solutions, whose performance is lower bounded by that of the standard discrete MDP method. Also, to obtain the solution, we solve at most convex optimization problems, where is the battery capacity, is the approximation precision, and is the horizon length for the finite-horizon case or the maximum number of iterations for the infinite-horizon case. In particular, for the infinite-horizon case, given a convergence tolerance , the converged solution can be obtained within iterations, where is the discount factor.
The remainder of the paper is organized as follows. In Section II, we describe the system model, formulate the energy scheduling problem as a continuous-state MDP problem, and define the value function. In Section III, we define an approximate value function and prove that it is nondecreasing and concave with respect to the continuous battery state. In Section IV, we derive the optimal closed-form procedure for updating the approximate value function and develop the power allocation algorithms for both the finite- and infinite-horizon cases. The proposed algorithms are extended to the model with a general payoff function and imperfect energy prediction in Section V. Section VI provides simulation results and Section VII concludes the paper.
II Problem Formulation
II-A System Model
We consider a point-to-point communication system with one transmitter and one receiver, as shown in Fig. 1. We assume a slow fading channel model where the channel gain is constant for a coherence time of (corresponding to a time slot) and changes independently across slots. The signal model for slot is given by
(1) 
where is the received signal, is the transmitted signal, is the channel gain in slot and is the additive white Gaussian noise consisting of elements.
At the beginning of each slot, the transmitter is informed of the channel access status for the current slot from the control center, where indicates that the channel access is not permitted for slot while indicates otherwise. We assume that follows a stationary firstorder Markov process, whose transition probabilities are given as and . If , then the transmit power in slot is . On the other hand, if , then the transmitter needs to decide its transmit power .
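Since the access state follows a stationary first-order Markov chain, its evolution is straightforward to simulate. The sketch below is illustrative only: `p01` and `p10` are hypothetical names for the two transition probabilities, which are not written out explicitly in the text.

```python
import random

def simulate_access(T, p01, p10, a0=1, seed=0):
    """Simulate the two-state first-order Markov access process.
    p01 = P(next state 1 | current state 0), p10 = P(next state 0 |
    current state 1); both names are assumptions for illustration."""
    rng = random.Random(seed)
    a, path = a0, []
    for _ in range(T):
        path.append(a)
        if a == 0:
            a = 1 if rng.random() < p01 else 0
        else:
            a = 0 if rng.random() < p10 else 1
    return path
```

With `p01 = p10 = 0` the chain never leaves its initial state; with `p01 = p10 = 1` it alternates deterministically.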
The transmitter is powered by an energy harvesting device, e.g., a solar panel, and a battery. The battery, which buffers the harvested energy, has a finite capacity, denoted by . Since the energy harvesting process is steady or can be well predicted, we assume that the energy harvested over the next slots can be noncausally known, denoted as (the causal energy harvesting model will be considered in Section V). We assume is independent across slots (i.i.d. when ).
In slot , the transmitter transmits at a power level of ( if ), which is constrained by the maximum transmission power and the available energy , i.e.,
(2) 
The battery level at the beginning of slot is given as
(3) 
with the constraint that the battery level is nonnegative for all slots, i.e.,
(4) 
Further, the transmitter receives a payoff based on the transmission power and channel gain. In this paper, we use the achievable channel rate as the payoff, i.e., . Also, in Section V, we consider a general payoff function which is continuous, nondecreasing, and concave with respect to given .
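As a concrete illustration, here is a minimal sketch of the per-slot payoff and the battery recursion in (3). It assumes the common AWGN rate form log2(1 + p*h/noise), since the exact constants are not shown in the extracted text, and uses `capacity` for the finite battery limit.

```python
import numpy as np

def payoff(p, h, noise_power=1.0):
    """Achievable channel rate used as the per-slot payoff: continuous,
    nondecreasing, and concave in the power p (assumed AWGN form)."""
    return np.log2(1.0 + p * h / noise_power)

def battery_update(b, p, e, capacity):
    """Battery recursion (3): spend power p, harvest energy e, and clip
    at the finite capacity; p must not exceed the available energy b."""
    return min(b - p + e, capacity)
```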
II-B Problem Formulation
We assume that can be predicted noncausally while all other variables are only known causally to the transmitter (we will relax this assumption in Section V, where we assume that is predicted with a random error ). Denote , , and a discount factor . We assume that all the side information, e.g., the distributions of all random variables and the predictions of the harvested energy, is known before the first slot. Then the power allocation policy needs to be calculated to maximize the expected total payoff over the next slots, where consists of the observations available at the beginning of slot . Since and are continuous variables, it is not possible to store in a lookup table. Instead, we only store some of the intermediate results, i.e., the approximate value function introduced in Section III, in an efficient way, and then calculate the power allocation when is observed. Specifically, at the beginning of slot , given , if channel access is permitted, i.e., , the transmitter calculates the power level . If the channel access is not permitted, i.e., , then . To that end, we formulate the following optimization problem for defining the optimal policy
(5) 
Note that by (3), the battery level forms a continuousstate firstorder Markov chain, whereas the channel access state is a discretestate Markov chain by assumption. Then, we can convert the problem in (5) to its equivalent MDP recursive form [15] in terms of the value function, which represents the total payoff received in the current slot and expected to be received in the future slots.
Specifically, in the MDP model we treat the battery level and the channel access state , i.e., , as the state, the channel as the observation, and the transmit power as the decision. Then, the state space becomes ; and the corresponding decision space is and , corresponding to and , respectively. The value function is then recursively defined as
(6) 
where
(7) 
and
(8) 
Note that, represents the expected maximum discounted payoff between slots and given the side information and . Due to the causality and the backward recursion, the observation in slot does not affect the value function for slot . Also, when , given the value function for slot , the optimal power allocation for slot can be obtained by
(9) 
where is calculated using (7). Moreover, when , we always have
(10) 
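For reference, the discrete-MDP baseline that the proposed method is later compared against can be sketched as a backward recursion of (6)-(8) on a quantized battery grid. All names here are illustrative assumptions: `rate` is the expected payoff over the fading state, `harvest` holds the predicted energy arrivals, `powers` enumerates the feasible discrete power levels up to the maximum power, and `q01`, `q10` stand in for the access-state transition probabilities.

```python
import numpy as np

def snap(grid, x):
    """Index of the grid point nearest to x."""
    return int(np.argmin(np.abs(np.asarray(grid) - x)))

def discrete_backward_dp(T, grid, powers, harvest, gamma, q01, q10, rate):
    """Standard discrete-MDP approach: backward value recursion over a
    quantized battery grid with exhaustive search over power levels."""
    grid = np.asarray(grid, dtype=float)
    nB, cap = len(grid), grid[-1]
    V = np.zeros((T + 1, nB, 2))      # V[t, battery index, access state]
    for t in range(T - 1, -1, -1):
        for i, b in enumerate(grid):
            # Access denied (state 0): power is 0, the battery only charges.
            j = snap(grid, min(b + harvest[t], cap))
            V[t, i, 0] = gamma * ((1 - q01) * V[t + 1, j, 0]
                                  + q01 * V[t + 1, j, 1])
            # Access granted (state 1): search the feasible power levels.
            best = -np.inf
            for p in powers:
                if p > b:          # cannot spend more than the stored energy
                    continue
                j = snap(grid, min(b - p + harvest[t], cap))
                ev = q10 * V[t + 1, j, 0] + (1 - q10) * V[t + 1, j, 1]
                best = max(best, rate(p) + gamma * ev)
            V[t, i, 1] = best
    return V
```

Note the exhaustive search over `powers` for every grid point; avoiding this search is exactly what the closed-form update of Section IV provides.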
III Approximate Value Function
By recursively computing the value function defined in (6), in theory we can obtain the optimal solution to (9) for each . However, a closed-form expression for is hard to obtain when is large, e.g., . A typical approach is to quantize the continuous variables to a finite number of discrete levels, i.e., to convert the original problem to a discrete MDP problem [15]. However, with such discretization, solving the corresponding discrete MDP problem involves an exhaustive search on for all discretized , and we can only obtain discrete power levels.
In order to efficiently solve the MDP problem and obtain the continuous power allocation, in this section, we will define an approximate value function by using a piecewise linear approximation based on some discrete samples of where is an approximation precision. This approximate value function is shown to be concave and nondecreasing in the variable corresponding to the energy stored in the battery, making the optimal power allocation problem in (9) (or (18)) a convex optimization problem.
III-A Value Function Approximation
With an approximation precision parameter , we define a piecewise linear approximation operator:
(11) 
and for any , as shown in Fig. 3.
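The piecewise linear operator can be sketched with a linear interpolation over a uniform grid, where `delta` plays the role of the approximation precision:

```python
import numpy as np

def pwl_approx(f, capacity, delta):
    """Piecewise linear approximation operator in the spirit of (11):
    sample f at the uniform grid {0, delta, 2*delta, ..., capacity} and
    interpolate linearly between the samples."""
    knots = np.arange(0.0, capacity + delta / 2, delta)
    samples = np.array([f(b) for b in knots])
    return lambda b: float(np.interp(b, knots, samples))
```

The approximation is exact at the grid points and linear in between, so only the sample values need to be stored.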
Initially, we define
(12) 
which is a linear approximation to . Then, recursively from to , we use the approximate value function to replace the original value function in (7), i.e., , and define
(13) 
By setting in (6), we further define
(14) 
Finally, we write the approximate value function as
(15) 
Note that in (13)-(15), we made the substitutions and in (7) and (6), respectively. Thus, we can treat the approximate value function , which is updated by (13)-(15), as an approximation to the value function , which is updated by (6)-(7).
We consider the approximation error at slot (or iteration ). In each iteration, the error is produced by the piecewise linear approximation in (15) and propagated through solving the problem in (14). Then, at the end of each iteration, the total error accumulated by the obtained approximate value function is the sum of the newly produced error and the discounted propagated error, which grows with the iteration number. Since the update rules for both and start from the same initial value function , the total error in the th iteration (we use the subscript to denote the th iteration, which represents slot ) can be bounded by
(16) 
where
(17) 
is the new error produced by (15) in the th iteration.
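To make the accumulation argument concrete: if each iteration introduces at most a new approximation error $\epsilon$ through (15), and the propagated error is discounted by the factor $\gamma$, then unrolling the recursion gives a geometric bound. This is a reconstruction under these stated assumptions, writing $E_n$ for the total error after $n$ iterations:

```latex
E_n \;\le\; \epsilon + \gamma E_{n-1}
    \;\le\; \sum_{k=0}^{n-1} \gamma^k \epsilon
    \;=\; \frac{1-\gamma^n}{1-\gamma}\,\epsilon
    \;\le\; \frac{\epsilon}{1-\gamma},
```

so the accumulated error stays bounded for any discount factor $\gamma < 1$.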
With the approximate value function for each slot , when , the power allocation given can be obtained by
(18) 
Define . Note that the approximate value function is linearly recovered from the sample set and for all . We can consider the standard dynamic programming with the discretized state space as a special case of the update rules in (13)-(15). Then, the performance achieved with the approximate value function can be characterized as follows.
Proposition 1
Proof:
Moreover, in the standard discrete dynamic programming, we discretize all continuous variables, i.e., , and then perform the dynamic programming with an exhaustive search on for all possible combinations of ; in contrast, with the proposed approximate value function, we only discretize the battery state and then obtain the approximate value function for each discretized in closed form.
III-B Concavity of the Approximate Value Function
In (13)-(15), we note that the approximate value function is based on the solution to the optimization problem (14). To facilitate solving (14), in this subsection we will show that the approximate value function given in (15) is concave for given . Then (14) is a convex optimization problem given and .
First, we introduce the following lemma, which can be easily shown and is illustrated in Fig. 2.
Lemma 1
If a function is nondecreasing, for any , is also nondecreasing. Further, if the nondecreasing function is concave, then is concave for .
We have the following nondecreasing property of .
Proposition 2
For any , if the approximate value function is nondecreasing with respect to given , so is .
Proof:
If is nondecreasing with respect to for , by Lemma 1, we have that is also nondecreasing with respect to . Then, we have that , which is a linear combination of the terms of the form , is also nondecreasing with respect to , given and .
Given any battery level , channel fading , the power such that , and such that , we have
(19) 
and
(20)  
(21) 
Since is a nonnegative linear combination of the terms of the form , is nondecreasing with respect to . Then, by (15), we have that is also nondecreasing with respect to . \qed
The next result is on the concavity of .
Proposition 3
For any , if the approximate value function is nondecreasing and concave with respect to given , so is .
Proof:
Since is nondecreasing and concave with respect to given , by Lemma 1, we have is nondecreasing and concave with respect to given . Since is a linear combination of and , then is jointly concave with respect to and . Moreover, it follows that is also jointly concave with respect to and given [16].
Since the feasible domain is different under and , we consider the two cases separately.
When , since , can be written as
(22) 
Since is concave with respect to given and , so is [16]. Then, by (15), is nondecreasing with respect to .
When , the feasible domain of the objective function in (6) is given by . It can be verified that is a convex set. Then, for any , their convex combination , where and .
From Propositions 2 and 3, we have that if is nondecreasing and concave so is for any . Since is nondecreasing and concave with respect to , it is easily verified by (6) that is also nondecreasing and concave with respect to given . By induction, we obtain the following theorem.
Theorem 1
For , the approximate value function is nondecreasing and concave with respect to given . Further, the problem in (14) is a convex optimization problem given and .
Since both and are concave and nondecreasing, where is the iteration number, we can further bound the approximation error in (17) as follows.
Proposition 4
For any iteration , given , we have
(27) 
Proof:
By Theorem 1, is nondecreasing and concave with respect to given . As illustrated in Fig. 3, for , the value of is smaller than the value on line (*) but larger than , and therefore the distance between the value on line (*) and can also be considered as an upper bound on the approximation error, i.e., for . According to the second-order derivative property of a concave function, we have that
(28) 
for all . Then, we further have that , where . \qed
IV Power Allocation with Perfect Energy Prediction
Note that in (14), we need to solve the following optimization problem for a given and :
(29) 
When , . On the other hand, when , we obtain the optimal solution in closed form.
Since the approximate value function in (15) is a piecewise linear function of given , it follows that in (13) is also a piecewise linear function with respect to given , which is differentiable everywhere except at . By Theorem 1 and Lemma 1, is also concave and nondecreasing with respect to .
Since is a piecewise linear function, we denote as the set of the nondifferentiable points, where , , and is the th smallest element in . Also, we denote as the set of the corresponding slopes, where is the slope of the segment , given by
(30) 
which is derived from (13) and (15). Hence, the derivative of for is
(31) 
Since is concave and nondecreasing with respect to , we have . Fig. 4 is a sketch of the staircase function .
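The breakpoints and slopes in (30)-(31) can be extracted directly from the stored samples; for a concave, nondecreasing piecewise linear function the slopes form a nonnegative, nonincreasing staircase, as in Fig. 4. A minimal sketch:

```python
import numpy as np

def pwl_slopes(knots, values):
    """Segment slopes, in the spirit of (30), of a piecewise linear
    function given its breakpoints; the derivative (31) is the staircase
    that takes the k-th slope on the k-th segment."""
    knots = np.asarray(knots, dtype=float)
    values = np.asarray(values, dtype=float)
    return np.diff(values) / np.diff(knots)
```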
In this section, we first obtain the closed-form solution to (29), and then use it to obtain the optimal power allocation for both the finite- and infinite-horizon cases.
IV-A The Optimal Solution to (29)
In this subsection, for simplicity, we drop the superscript and denote the objective function in (29) as
(32) 
We note that is differentiable for with
(33) 
On the other hand, at the nondifferentiable points in , the right-derivative and the left-derivative of can be written as
(34) 
and
(35) 
respectively.
Theorem 2
In Fig. 5 we give a sketch of . To prove Theorem 2, we first give the necessary and sufficient conditions for the optimal solution as follows [16].
Lemma 2
is the optimal solution to (29) given , if and only if,

, when and ;

, when ;

, when .
Note that Condition 1 corresponds to the case where is in the interior of . In this case, the left-derivative and the right-derivative should have opposite signs or both be zero at , so that either increasing or decreasing leads to a decrease of the objective function. Condition 2 and Condition 3 correspond to the cases where is at either boundary of , where the objective function is nondecreasing and nonincreasing for all , respectively.
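Although the paper derives the solution in closed form, the conditions of Lemma 2 can also be met numerically: for a concave objective on a closed interval, a simple ternary search converges to a point satisfying Conditions 1-3. This generic sketch is a stand-in for, not a reproduction of, the closed-form solution of (29):

```python
def maximize_concave(f, lo, hi, tol=1e-9):
    """Ternary search for a maximizer of a concave function f on [lo, hi].
    At convergence the returned point satisfies the sign conditions of
    Lemma 2: an interior near-stationary point, or the appropriate
    endpoint when f is monotone over the whole interval."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1       # maximizer lies to the right of m1
        else:
            hi = m2       # maximizer lies to the left of m2
    return 0.5 * (lo + hi)
```

For a nondecreasing objective the search drifts to the upper endpoint (Condition 2's case); for a nonincreasing one it drifts to the lower endpoint (Condition 3's case).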
The following proposition gives a sufficient condition for the optimality of given .
Proposition 5
Proof:
Then it is easy to verify that for , the solution given by (36) satisfies the optimality condition in Proposition 5.
For , we use the next proposition to prove the optimality of (36).
Proposition 6
For any nondifferentiable point , is the optimal solution to (29) for any .