# Power Allocation for Energy Harvesting Transmitter with Causal Information

Zhe Wang, Vaneet Aggarwal, Xiaodong Wang

Z. Wang and X. Wang are with the Electrical Engineering Department, Columbia University, New York, NY 10027 (e-mail: {zhewang, wangx}@ee.columbia.edu). V. Aggarwal is with AT&T Labs-Research, Bedminster, NJ 07921 USA (e-mail: vaneet@research.att.com).
###### Abstract

We consider power allocation for an access-controlled transmitter with energy harvesting capability based on causal observations of the channel fading state. We assume that the system operates in a time-slotted fashion and that the channel gain in each slot is a random variable that is independent across slots. Further, we assume that the transmitter is solely powered by a renewable energy source and that the energy harvesting process can be predicted in practice. With the additional access control for the transmitter and the maximum power constraint, we formulate the stochastic optimization problem of maximizing the achievable rate as a Markov decision process (MDP) with continuous state. To solve the problem efficiently, we define an approximate value function based on a piecewise linear fit in terms of the battery state. We show that with the approximate value function, the update in each iteration consists of a group of convex problems with a continuous parameter. Moreover, we derive the optimal solution to these convex problems in closed form. Further, we propose power allocation algorithms for both the finite- and infinite-horizon cases, whose computational complexity is significantly lower than that of the standard discrete MDP method while achieving improved performance. Extensions to the case of a general payoff function and imperfect energy prediction are also considered. Finally, simulation results demonstrate that the proposed algorithms closely approach the optimal performance.

Index Terms: Causal information, energy harvesting, fading channel, Markov decision process, power allocation.

## I Introduction

The utilization of renewable energy is an important characteristic of green wireless communication systems [1]. Renewable-energy-powered transmitters can be deployed in wireless sensor networks or cellular networks, reducing the reliance on traditional batteries and prolonging the transmitter’s lifetime [2][3]. However, the fluctuation of the harvested energy together with the variation of the channel fading brings many challenges to the design of energy-harvesting communication systems [4][5].

Wireless transmission schemes for energy-harvesting transmitters have been investigated in a number of recent works [6][7][8][9]. To achieve the optimal throughput, a “shortest path” based energy scheduling algorithm was proposed in [6] for a static channel with finite battery capacity and non-causal energy harvesting state. The authors of [7] discussed an MDP model for the case when the energy harvesting and channel fading states are known causally and there is no maximum power constraint. A staircase water-filling algorithm was proposed in [7] for the case when the battery capacity is infinite and the energy harvesting and fading channel states are known non-causally. With a finite battery capacity and non-causal energy harvesting and fading channel states, a water-filling procedure was studied in [8], and with an additional maximum power constraint, a dynamic water-filling algorithm was proposed in [9]. The authors of [10] developed an online approximately optimal algorithm based on Lyapunov optimization, designed to maximize a utility function based on the number of packet transmissions in energy harvesting networks. In [11], a reinforcement learning approach built on the discrete MDP model was used to optimize the number of packet transmissions without prior knowledge of the statistics of the energy harvesting and channel fading processes. The authors of [12] considered a static channel with causal knowledge of a stationary Poisson energy arrival process and gave an MDP-based solution that maximizes the average throughput with unconstrained transmission power. However, the throughput optimization problem with causal information on both the energy harvesting state and the fading channel state, and under a maximum power constraint, remains open. In this paper, we tackle this problem.

Specifically, we consider power allocation for an access-controlled transmitter that is powered by a renewable energy source, equipped with a finite-capacity battery, and subject to a maximum power constraint. The channel fading gain is assumed to be constant within a slot and independent across slots. For energy harvesting, we first assume that it can be predicted accurately over the scheduling period, which can be realized in practice [13][14], and later introduce prediction error variables. Furthermore, we assume that a control center can temporarily suspend the transmitter’s access due to channel congestion. This channel access control is modeled as a first-order Markov process. Under this setting, we find the approximately optimal power allocation for both the finite- and infinite-horizon cases.

To obtain the power allocation, we formulate the stochastic optimization problem as a discrete-time, continuous-state Markov decision process (MDP) whose objective is to maximize the sum of the payoff in the current slot and the discounted expected payoffs in future slots, where the payoff function is the achievable channel rate. Since the state variables of the MDP, including the battery state and the channel state, are continuous, updating the value function directly has prohibitively high complexity; to avoid this, we introduce an approximate value function. We show that the approximate value function is concave and non-decreasing in the variable corresponding to the energy stored in the battery, which further enables the approximate value function to be updated in closed form. This is then used to find the approximately optimal power allocation for both the finite- and infinite-horizon cases.

The proposed algorithms provide approximate solutions whose performance is lower bounded by that of the standard discrete MDP method. Moreover, obtaining the solution requires solving a number of convex optimization problems that scales as $O(K b_{\max}/\delta)$, where $b_{\max}$ is the battery capacity, $\delta$ is the approximation precision, and $K$ is the length of the horizon for the finite-horizon case or the maximum number of iterations for the infinite-horizon case. In particular, for the infinite-horizon case, given a convergence tolerance $\epsilon$, an $\epsilon$-converged solution can be obtained within a number of iterations that scales as $\log(1/\epsilon)/\log(1/\gamma)$, where $\gamma$ is the discount factor.

The remainder of the paper is organized as follows. In Section II, we describe the system model, formulate the energy scheduling problem as a continuous-state MDP problem and define the value function. In Section III, we define an approximate value function and prove that the approximate value function is non-decreasing and concave with respect to the continuous battery state. In Section IV, we derive the optimal closed-form procedure for updating the approximate value function and develop the power allocation algorithms for both finite- and infinite-horizon cases. The proposed algorithms are extended to deal with the model with a general payoff function and imperfect energy prediction in Section V. Section VI provides simulation results and Section VII concludes the paper.

## II Problem Formulation

### II-A System Model

We consider a point-to-point communication system with one transmitter and one receiver, as shown in Fig. 1. We assume a slow fading channel model where the channel gain is constant for a coherence time of $T_c$ (corresponding to one time slot) and changes independently across slots. The signal model for slot $k$ is given by

$$y_k = H_k x_k + w_k, \tag{1}$$

where $y_k$ is the received signal, $x_k$ is the transmitted signal, $H_k$ is the channel gain in slot $k$, and $w_k$ is the additive white Gaussian noise consisting of i.i.d. elements.

At the beginning of each slot $k$, the transmitter is informed by the control center of the channel access status $A_k\in\{0,1\}$ for the current slot, where $A_k=0$ indicates that channel access is not permitted in slot $k$ and $A_k=1$ indicates otherwise. We assume that $A_k$ follows a stationary first-order Markov process whose transition probabilities are given by $\Pr(A_{k+1}=1\mid A_k=0)$ and $\Pr(A_{k+1}=1\mid A_k=1)$. If $A_k=0$, then the transmit power in slot $k$ is $p_k=0$. On the other hand, if $A_k=1$, then the transmitter needs to decide its transmit power $p_k$.
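The access model is a two-state Markov chain. The minimal sketch below only illustrates that model and is not part of the proposed algorithm; the parameter names `p01` for $\Pr(A_{k+1}=1\mid A_k=0)$ and `p11` for $\Pr(A_{k+1}=1\mid A_k=1)$, as well as both function names, are hypothetical. It samples an access sequence and computes the stationary probability that access is permitted.

```python
import random


def simulate_access(p01: float, p11: float, num_slots: int, a0: int = 1, seed: int = 0) -> list[int]:
    """Sample A_1, ..., A_K from a two-state first-order Markov chain.

    p01 = Pr(A_{k+1} = 1 | A_k = 0) and p11 = Pr(A_{k+1} = 1 | A_k = 1)
    (hypothetical parameter names).
    """
    rng = random.Random(seed)
    states = [a0]
    for _ in range(num_slots - 1):
        prob_one = p11 if states[-1] == 1 else p01
        states.append(1 if rng.random() < prob_one else 0)
    return states


def stationary_prob_one(p01: float, p11: float) -> float:
    # Solves pi1 = (1 - pi1) * p01 + pi1 * p11 for the long-run fraction of permitted slots.
    return p01 / (1.0 - p11 + p01)


a = simulate_access(0.3, 0.8, 100_000)
print(sum(a) / len(a), stationary_prob_one(0.3, 0.8))  # both close to 0.6
```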

The transmitter is powered by an energy harvesting device, e.g., a solar panel, and a battery. The battery, which buffers the harvested energy, has a finite capacity denoted by $b_{\max}$. Since the energy harvesting process is steady or can be well predicted, we assume that the energy harvested over the next $K$ slots, denoted by $e_1,\dots,e_K$, is known non-causally (the causal energy harvesting model will be considered in Section V). We assume that the channel gain is independent across slots (and i.i.d. in the infinite-horizon case).

In slot $k$, the transmitter transmits at power level $p_k$ ($p_k=0$ if $A_k=0$), which is constrained by the maximum transmission power $p_{\max}$ and the available energy $b_k$, i.e.,

$$0 \le p_k \le \min\{p_{\max},\, b_k/T_c\}. \tag{2}$$

The battery level at the beginning of slot $k+1$ is given by

$$b_{k+1} = \min\{b_{\max},\, b_k + e_k - p_k T_c\}, \tag{3}$$

with the constraint that the battery level is non-negative for all slots, i.e.,

$$b_k \ge 0. \tag{4}$$
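A minimal sketch of constraints (2)-(4) follows; the function names `feasible_power` and `next_battery` are hypothetical. Clipping a requested power to (2) keeps (4) satisfied automatically, because $b_k-p_kT_c\ge0$ and $e_k\ge0$; the small tolerance only absorbs floating-point rounding.

```python
def feasible_power(p_requested: float, b: float, p_max: float, T_c: float, access: int) -> float:
    """Clip a requested power to the feasible set of (2); zero when access is denied."""
    if access == 0:
        return 0.0
    return max(0.0, min(p_requested, p_max, b / T_c))


def next_battery(b: float, e: float, p: float, b_max: float, T_c: float) -> float:
    """Battery update (3): harvested energy beyond b_max is lost."""
    b_next = min(b_max, b + e - p * T_c)
    # (4) holds whenever p satisfies (2) and e >= 0, up to floating-point rounding.
    assert b_next >= -1e-12, "power violated constraint (2)"
    return max(b_next, 0.0)


# A request of 5 is capped by the stored energy b / T_c = 2.
p = feasible_power(5.0, b=2.0, p_max=3.0, T_c=1.0, access=1)
print(p, next_battery(2.0, e=0.5, p=p, b_max=4.0, T_c=1.0))  # 2.0 0.5
```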

Further, the transmitter receives a payoff based on the transmission power and the channel gain. In this paper, we use the achievable channel rate as the payoff, i.e., $\log(1+p_k h_k)$. In Section V, we also consider a general payoff function that is continuous, non-decreasing, and concave with respect to $p_k$ for a given $h_k$.

### II-B Problem Formulation

We assume that $e_k$ can be predicted non-causally while all other variables are known only causally to the transmitter (we relax this assumption in Section V, where $e_k$ is predicted with a random error). Denote $\mathbf{H}\triangleq[h_1,\dots,h_K]$, $\mathbf{A}\triangleq[A_1,\dots,A_K]$, and a discount factor $\gamma$. We assume that all the side information, e.g., the distributions of all random variables and the predictions of the harvested energy, is known before the first slot. Then the power allocation policy $\{p_k(\Gamma_k)\}_{k=1}^{K}$ needs to be calculated to maximize the expected total payoff in the next $K$ slots, where $\Gamma_k$ consists of the observations available at the beginning of slot $k$ (i.e., $b_k$, $A_k$, and $h_k$). Since $b_k$ and $h_k$ are continuous variables, it is not possible to store $p_k(\Gamma_k)$ in a look-up table. Instead, we only store some intermediate results, i.e., the approximate value function introduced in Section III, in an efficient way, and then calculate the power allocation when $\Gamma_k$ is observed. Specifically, at the beginning of slot $k$, given $\Gamma_k$, if channel access is permitted, i.e., $A_k=1$, the transmitter calculates the power level $p_k(\Gamma_k)$; if channel access is not permitted, i.e., $A_k=0$, then $p_k(\Gamma_k)=0$. To that end, we formulate the following optimization problem to define the optimal policy:

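$$\mathcal{P}^*\triangleq\arg\max_{p_k(\cdot),\,k=1,2,\dots,K}\left\{\mathbb{E}_{\mathbf{H},\mathbf{A}}\left[\sum_{k=1}^{K}\gamma^{k-1}\log\big(1+p_k(\Gamma_k)h_k\big)\right]\right\}, \tag{5}$$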

subject to the constraints in (2), (3), and (4) for $k=1,2,\dots,K$.

Note that by (3), the battery level $b_k$ forms a continuous-state first-order Markov chain, whereas the channel access state $A_k$ is a discrete-state Markov chain by assumption. We can therefore convert the problem in (5) into its equivalent recursive MDP form [15] in terms of the value function, which represents the total payoff received in the current slot plus the payoff expected to be received in future slots.

Specifically, in the MDP model we treat the battery level $b_k$ and the channel access state $A_k$, i.e., $(b_k,A_k)$, as the state, the channel gain $h_k$ as the observation, and the transmit power $p_k$ as the decision. The state space is then $[0,b_{\max}]\times\{0,1\}$, and the corresponding decision spaces are $\mathcal{D}_0(b_k)=\{0\}$ and $\mathcal{D}_1(b_k)=[0,\min\{p_{\max},b_k/T_c\}]$ for $A_k=0$ and $A_k=1$, respectively. The value function is then recursively defined as

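$$v_k(b_k,A_k)\triangleq\mathbb{E}_{h_k}\left[\max_{p_k(\Gamma_k)\in\mathcal{D}_{A_k}(b_k)}\left\{\log\big(1+p_k(\Gamma_k)h_k\big)+\gamma\, u_k\big(b_k,p_k(\Gamma_k),A_k\big)\right\}\right],\quad k=1,2,\dots,K, \tag{6}$$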

where

$$u_k(b_k,p_k,A_k)\triangleq\mathbb{E}_{A_{k+1}\mid A_k}\left[v_{k+1}\big(\min\{b_{\max},\,b_k+e_k-p_kT_c\},A_{k+1}\big)\right], \tag{7}$$

and

$$v_{K+1}(b,A)=0,\quad\text{for all } b\in[0,b_{\max}],\ A\in\{0,1\}. \tag{8}$$

Note that $v_k(b_k,A_k)$ represents the expected maximum discounted payoff between slots $k$ and $K$ given the side information $b_k$ and $A_k$. Due to causality and the backward recursion, the observation $h_k$ in slot $k$ does not affect the value function $v_{k+1}$ for slot $k+1$. Also, when $A_k=1$, given the value function $v_{k+1}$ for slot $k+1$, the optimal power allocation for slot $k$ can be obtained by

$$p_k^*(\Gamma_k)=\arg\max_{p\in\mathcal{D}_1(b_k)}\left\{\log(1+ph_k)+\gamma\, u_k(b_k,p,1)\right\}, \tag{9}$$

where $u_k(b_k,p,1)$ is calculated using (7). Moreover, when $A_k=0$, we always have

$$p_k^*(\Gamma_k)=0. \tag{10}$$

## III Approximate Value Function

By recursively computing the value function defined in (6), in theory we can obtain the optimal solution to (9) for each $k$. However, a closed-form expression for $v_k(b,A)$ is hard to obtain in general. A typical approach is to quantize the continuous variables into a finite number of discrete levels, i.e., to convert the original problem into a discrete MDP problem [15]. However, with such discretization, solving the discrete MDP problem involves an exhaustive search over the power levels for every discretized combination of battery level and channel gain, and only discrete power levels can be obtained.

In order to solve the MDP problem efficiently and obtain a continuous power allocation, in this section we define an approximate value function using a piecewise linear approximation based on the discrete samples of $v_k(b,A)$ at $b=0,\delta,2\delta,\dots$, where $\delta$ is an approximation precision. This approximate value function is shown to be concave and non-decreasing in the variable corresponding to the energy stored in the battery, which makes the optimal power allocation problem in (9) (or (18)) a convex optimization problem.

### III-A Value Function Approximation

With an approximation precision parameter $\delta>0$, we define a piecewise linear approximation operator:

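$$\mathcal{L}[v_k(b,A),\delta]\triangleq v_k(\lfloor b/\delta\rfloor\delta,A)+\frac{b-\lfloor b/\delta\rfloor\delta}{\delta}\Big(v_k(\lceil b/\delta\rceil\delta,A)-v_k(\lfloor b/\delta\rfloor\delta,A)\Big),\quad b\in[0,b_{\max}], \tag{11}$$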

and for any , as shown in Fig. 3.
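For a fixed access state $A$, the operator in (11) is ordinary linear interpolation between the neighboring grid points $\lfloor b/\delta\rfloor\delta$ and $\lceil b/\delta\rceil\delta$. A minimal sketch follows; the helper name `pl_approx` is hypothetical, and $\log(1+b)$ merely stands in for a concave value function.

```python
import math
from typing import Callable


def pl_approx(v: Callable[[float], float], delta: float) -> Callable[[float], float]:
    """Return b -> L[v(b), delta] as in (11), for a fixed access state A.

    v must be defined at every multiple of delta touched by floor/ceil.
    """
    def approx(b: float) -> float:
        lo = math.floor(b / delta) * delta
        hi = math.ceil(b / delta) * delta
        frac = (b - lo) / delta  # fraction of the way from lo to hi; 0 on a grid point
        return v(lo) + frac * (v(hi) - v(lo))
    return approx


W = pl_approx(math.log1p, 0.5)       # log(1 + b) as a stand-in concave value function
print(W(1.0) == math.log1p(1.0))     # True: exact on the grid
print(W(0.25) <= math.log1p(0.25))   # True: the chord of a concave function lies below it
```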

Initially, we define

$$W^{K}_{\delta}(b,A)\triangleq\mathcal{L}[v_K(b,A),\delta], \tag{12}$$

which is a piecewise linear approximation to $v_K(b,A)$. Then, recursively from $k=K-1$ down to $k=1$, we use the approximate value function $W^{k+1}_\delta$ in place of the original value function $v_{k+1}$ in (7) and define

$$U_k(b_k,p_k,A_k)\triangleq\mathbb{E}_{A_{k+1}\mid A_k}\left[W^{k+1}_{\delta}\big(\min\{b_{\max},\,b_k+e_k-p_kT_c\},A_{k+1}\big)\right]. \tag{13}$$

By setting $u_k=U_k$ in (6), we further define

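$$V_k(b_k,A_k)\triangleq\mathbb{E}_{h_k}\left[\max_{p_k(\Gamma_k)\in\mathcal{D}_{A_k}(b_k)}\left\{\log\big(1+p_k(\Gamma_k)h_k\big)+\gamma\, U_k\big(b_k,p_k(\Gamma_k),A_k\big)\right\}\right]. \tag{14}$$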

Finally, we write the approximate value function as

$$W^{k}_{\delta}(b,A)\triangleq\mathcal{L}[V_k(b,A),\delta]. \tag{15}$$

Note that in (13)-(15) we made the substitutions $v_{k+1}\to W^{k+1}_\delta$ and $u_k\to U_k$ in (7) and (6), respectively. Thus we can treat the approximate value function $W^{k}_\delta(b,A)$, updated by (13)-(15), as an approximation to the value function $v_k(b,A)$, updated by (6)-(7).
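To make the recursion (13)-(15) concrete, here is a minimal NumPy sketch of one backward step evaluated on the grid $b=n\delta$. It rests on assumptions not in the text: $b_{\max}$ is a multiple of $\delta$; $h_k$ is exponential with unit mean (Rayleigh fading) and the expectation in (14) is replaced by an average over fixed samples; and the inner maximization is a brute-force search over a power grid rather than the closed form of Section IV. Because $\mathcal{L}$ is exact on the grid, the returned samples $V_k(n\delta,A)$ fully describe $W^{k}_\delta$, and `np.interp` reproduces (11) between grid points. The function name and the transition matrix `P` are hypothetical.

```python
import numpy as np


def backward_step(W_next, delta, b_max, e_k, p_max, T_c, gamma, P, h_samples, n_power=201):
    """One iteration of (13)-(15) on the grid b = n * delta (illustrative sketch).

    W_next[a, n] = W_delta^{k+1}(n * delta, a); P[a][a2] = Pr(A_{k+1} = a2 | A_k = a).
    Returns V[a, n] = V_k(n * delta, a), which equals W_delta^k on the grid.
    """
    grid = np.arange(W_next.shape[1]) * delta

    def U(b, p, a):
        # (13): expected next-slot approximate value; np.interp plays the role of (11)
        b_next = np.minimum(b_max, b + e_k - p * T_c)
        return sum(P[a][a2] * np.interp(b_next, grid, W_next[a2]) for a2 in (0, 1))

    V = np.zeros_like(W_next, dtype=float)
    for n, b in enumerate(grid):
        # A = 0: the only feasible power is 0, see (22)
        V[0, n] = gamma * U(b, 0.0, 0)
        # A = 1: brute-force maximization over a grid of D_1(b), then average over h
        p_grid = np.linspace(0.0, min(p_max, b / T_c), n_power)
        objective = np.log1p(np.outer(h_samples, p_grid)) + gamma * U(b, p_grid, 1)
        V[1, n] = objective.max(axis=1).mean()
    return V


rng = np.random.default_rng(0)
h = rng.exponential(1.0, size=500)   # assumed Rayleigh fading with E[h] = 1
P = [[0.7, 0.3], [0.2, 0.8]]         # hypothetical access transition matrix
W = np.zeros((2, 5))                 # W_delta^{K+1} = 0 on b in {0, 0.5, ..., 2}
for _ in range(3):                   # three backward steps with a constant e_k = 0.3
    W = backward_step(W, 0.5, 2.0, 0.3, 1.0, 1.0, 0.9, P, h)
```

In the finite-horizon case the loop would run from $k=K$ down to $k=1$ with the predicted $e_k$ of each slot.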

We now consider the approximation error at slot $k$ (or iteration $K-k+1$). In each iteration, error is produced by the piecewise linear approximation in (15) and propagated through solving the problem in (14). Hence, at the end of each iteration, the total error accumulated in the approximate value function is the sum of the newly produced error and the discounted propagated error, and it grows with the iteration number. Since the update rules for both $W_\delta$ and $v$ start from the same initial value function $v_{K+1}(b,A)=0$, the total error in the $i$-th iteration (we use the superscript $(i)$ to denote the $i$-th iteration, which corresponds to slot $K-i+1$) can be bounded by

$$\big\|W^{(i)}_{\delta}(b,A)-v^{(i)}(b,A)\big\|_{\infty}\le\sum_{j=1}^{i}\gamma^{i-j}\epsilon_j(\delta), \tag{16}$$

where

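$$\epsilon_j(\delta)\triangleq\max_{b\in[0,b_{\max}],\,A\in\{0,1\}}\left\{V^{(j)}(b,A)-W^{(j)}_{\delta}(b,A)\right\}=\big\|V^{(j)}(b,A)-W^{(j)}_{\delta}(b,A)\big\|_{\infty} \tag{17}$$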

is the new error produced by (15) in the $j$-th iteration.
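The right-hand side of (16) obeys the one-line recursion $E_i=\epsilon_i(\delta)+\gamma E_{i-1}$ with $E_0=0$, which is how errors add up across iterations. The short helper below (hypothetical name) evaluates it; with a constant per-iteration error $\epsilon$, the bound never exceeds $\epsilon/(1-\gamma)$.

```python
def accumulated_error_bounds(eps, gamma):
    """Right-hand side of (16) for i = 1, ..., len(eps).

    Uses E_i = eps_i + gamma * E_{i-1} with E_0 = 0, which unrolls to
    sum_{j=1}^{i} gamma**(i - j) * eps_j.
    """
    bounds, acc = [], 0.0
    for e in eps:
        acc = e + gamma * acc
        bounds.append(acc)
    return bounds


print(accumulated_error_bounds([0.1, 0.1, 0.1], 0.5))  # approximately [0.1, 0.15, 0.175]
```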

With the approximate value function $W^{k+1}_\delta$ for each slot $k$, when $A_k=1$, the power allocation given $\Gamma_k$ can be obtained by

$$p_k^*(\Gamma)=\arg\max_{p\in\mathcal{D}_1(b)}\left\{\log(1+ph)+\gamma\, U_k(b,p,1)\right\}. \tag{18}$$

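Define $\mathcal{B}_\delta\triangleq\{n\delta: n=0,1,\dots,\lfloor b_{\max}/\delta\rfloor\}$. Note that the approximate value function $W^{k}_\delta(b,A)$ is linearly recovered from the sample sets $\{V_k(b,0): b\in\mathcal{B}_\delta\}$ and $\{V_k(b,1): b\in\mathcal{B}_\delta\}$ for all $k$. We can therefore view standard dynamic programming with the discretized state space $\mathcal{B}_\delta\times\{0,1\}$ as a special case of the update rules in (13)-(15). The performance achieved with the approximate value function can then be characterized as follows.

###### Proposition 1

The approximate value function obtained by recursively solving (13)-(15) is no less than the discrete value function obtained by the standard dynamic programming method with the state space $\mathcal{B}_\delta\times\{0,1\}$, where $\delta$ is the approximation precision.

###### Proof:

Given the discrete state space $\mathcal{B}_\delta\times\{0,1\}$, since $W^{k}_\delta(b,A)=V_k(b,A)$ for any $b\in\mathcal{B}_\delta$, the standard dynamic programming follows the same update rule as (13)-(15), but with a discrete feasible power set in the optimization problem (14), which is a subset of $\mathcal{D}_A(b)$. Since maximizing over a subset cannot give a larger value, the claim follows by induction over the iterations. \qed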

Moreover, in standard discrete dynamic programming, we discretize all continuous variables, i.e., $b_k$, $h_k$, and $p_k$, and then perform dynamic programming with an exhaustive search over $p_k$ for all possible combinations of $(b_k,h_k)$; with the proposed approximate value function, we only discretize the battery state $b$ and then obtain the approximate value function for each discretized $b$ in closed form.

### III-B Concavity of Approximate Value Function

In (13)-(15), the approximate value function is based on the solution to the optimization problem (14). To facilitate solving (14), in this subsection we show that the approximate value function $W^{k}_\delta(b,A)$ given in (15) is concave in $b$ for a given $A$. Then (14) is a convex optimization problem given $b_k$ and $h_k$.

First, we introduce the following lemma, which can be easily shown and illustrated in Fig. 2.

###### Lemma 1

If a function $f(x)$ is non-decreasing, then for any constant $c$, $f(\min\{c,x\})$ is also non-decreasing. Further, if the non-decreasing function $f(x)$ is concave, then $f(\min\{c,x\})$ is also concave.
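Lemma 1 is what lets the cap $\min\{b_{\max},\cdot\}$ in (13) preserve monotonicity and concavity. The numerical sanity check below (not a proof) assumes $f(x)=\log(1+x)$ and $c=2$ for illustration and inspects sampled first and second differences of $f(\min\{c,x\})$. The second function shows why the non-decreasing assumption matters: capping a concave but partly decreasing function creates a convex kink. The helper name `capped` is hypothetical.

```python
import numpy as np


def capped(f, c):
    """Return x -> f(min(c, x)), the composition used in Lemma 1."""
    return lambda x: f(np.minimum(c, x))


x = np.linspace(0.0, 5.0, 501)
g = capped(np.log1p, 2.0)(x)
print(np.all(np.diff(g) >= 0))           # True: non-decreasing
print(np.all(np.diff(g, 2) <= 1e-12))    # True: second differences are non-positive

bad = capped(lambda t: -(t - 1.0) ** 2, 2.0)(x)  # concave, but decreasing for t > 1
print(np.all(np.diff(bad, 2) <= 1e-12))  # False: the cap creates a convex kink at t = 2
```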

We have the following non-decreasing property of $W^{k}_\delta(b,A)$.

###### Proposition 2

For any $k$, if the approximate value function $W^{k+1}_\delta(b,A)$ is non-decreasing with respect to $b$ given $A$, so is $W^{k}_\delta(b,A)$.

###### Proof:

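If $W^{k+1}_\delta(b,A)$ is non-decreasing with respect to $b$ for $A\in\{0,1\}$, then by Lemma 1, $W^{k+1}_\delta(\min\{b_{\max},b+e_k-pT_c\},A)$ is also non-decreasing with respect to $b$. Then $U_k(b,p,A)$, which is a non-negative linear combination of terms of the form $W^{k+1}_\delta(\min\{b_{\max},b+e_k-pT_c\},A')$, is also non-decreasing with respect to $b$, given $p$ and $A$.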

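Given any battery level $b$, channel gain $h$, power $p_0\in\mathcal{D}_A(b)$, and $\epsilon>0$ such that $b+\epsilon\le b_{\max}$, we have, because $\mathcal{D}_A(b)\subseteq\mathcal{D}_A(b+\epsilon)$,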

$$p_0\in\mathcal{D}_A(b+\epsilon), \tag{19}$$

and

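$$\begin{aligned}\log(1+p_0h)+\gamma U_k(b,p_0,A)&\le\log(1+p_0h)+\gamma U_k(b+\epsilon,p_0,A) &&(20)\\&\le\max_{p\in\mathcal{D}_A(b+\epsilon)}\left\{\log(1+ph)+\gamma U_k(b+\epsilon,p,A)\right\}. &&(21)\end{aligned}$$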

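Since (20)-(21) hold for every $p_0\in\mathcal{D}_A(b)$, maximizing the left-hand side over $p_0$ shows that the inner maximum in (14) is non-decreasing with respect to $b$ for every $h$. Since $V_k(b,A)$ is a non-negative linear combination (an expectation over $h_k$) of such maxima, $V_k(b,A)$ is non-decreasing with respect to $b$. Then, by (15), $W^{k}_\delta(b,A)$ is also non-decreasing with respect to $b$. \qed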

The next result concerns the concavity of $W^{k}_\delta(b,A)$.

###### Proposition 3

For any $k$, if the approximate value function $W^{k+1}_\delta(b,A)$ is non-decreasing and concave with respect to $b$ given $A$, so is $W^{k}_\delta(b,A)$.

###### Proof:

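Since $W^{k+1}_\delta(b,A)$ is non-decreasing and concave with respect to $b$ given $A$, by Lemma 1, $W^{k+1}_\delta(\min\{b_{\max},x\},A)$ is non-decreasing and concave with respect to $x$ given $A$. Since $x=b+e_k-pT_c$ is an affine combination of $b$ and $p$, $W^{k+1}_\delta(\min\{b_{\max},b+e_k-pT_c\},A)$ is jointly concave with respect to $b$ and $p$. Moreover, it follows that $U_k(b,p,A)$, a non-negative linear combination of such terms, is also jointly concave with respect to $b$ and $p$ given $A$ [16].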

Since the feasible domain differs for $A=0$ and $A=1$, we consider the two cases separately.

When $A=0$, since $\mathcal{D}_0(b)=\{0\}$, $V_k(b,0)$ can be written as

$$V_k(b,0)=\mathbb{E}_{h_k}\left[\gamma U_k(b,0,0)\right]. \tag{22}$$

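Since $U_k(b,0,0)$ is concave with respect to $b$ given $p=0$ and $A=0$, so is $V_k(b,0)$ [16]. Then, by (15), $W^{k}_\delta(b,0)$ is concave with respect to $b$, because the piecewise linear interpolation of a concave function on a uniform grid is concave.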

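When $A=1$, the feasible domain of the objective function in (14) is $\mathcal{S}\triangleq\{(b,p): 0\le b\le b_{\max},\ 0\le p\le\min\{p_{\max},b/T_c\}\}$. It can be verified that $\mathcal{S}$ is a convex set. Then, for any $(b_1,p_1),(b_2,p_2)\in\mathcal{S}$, their convex combination $(\theta b_1+\bar\theta b_2,\ \theta p_1+\bar\theta p_2)\in\mathcal{S}$, where $\theta\in[0,1]$ and $\bar\theta\triangleq1-\theta$.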

Moreover, since $\mathcal{D}_1(b_1)$ and $\mathcal{D}_1(b_2)$ are non-empty and compact, we can denote

$$p_1=\arg\max_{p\in\mathcal{D}_1(b_1)}\left\{\log(1+ph)+\gamma U_k(b_1,p,1)\right\}, \tag{23}$$

and

$$p_2=\arg\max_{p\in\mathcal{D}_1(b_2)}\left\{\log(1+ph)+\gamma U_k(b_2,p,1)\right\}. \tag{24}$$

Then

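$$\begin{aligned}&\max_{p\in\mathcal{D}_1(\theta b_1+\bar\theta b_2)}\left\{\log(1+ph)+\gamma U_k(\theta b_1+\bar\theta b_2,p,1)\right\}\\&\quad\ge\log\big(1+(\theta p_1+\bar\theta p_2)h\big)+\gamma U_k(\theta b_1+\bar\theta b_2,\theta p_1+\bar\theta p_2,1)\\&\quad\ge\theta\log(1+p_1h)+\bar\theta\log(1+p_2h)+\theta\gamma U_k(b_1,p_1,1)+\bar\theta\gamma U_k(b_2,p_2,1) &&(25)\\&\quad=\theta\big(\log(1+p_1h)+\gamma U_k(b_1,p_1,1)\big)+\bar\theta\big(\log(1+p_2h)+\gamma U_k(b_2,p_2,1)\big)\\&\quad=\theta\max_{p\in\mathcal{D}_1(b_1)}\left\{\log(1+ph)+\gamma U_k(b_1,p,1)\right\}+\bar\theta\max_{p\in\mathcal{D}_1(b_2)}\left\{\log(1+ph)+\gamma U_k(b_2,p,1)\right\}, &&(26)\end{aligned}$$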

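where the first inequality holds because $\theta p_1+\bar\theta p_2\in\mathcal{D}_1(\theta b_1+\bar\theta b_2)$ by the convexity of the feasible domain, (25) follows from the concavity of $\log(1+ph)$ in $p$ and the joint concavity of $U_k(b,p,1)$ in $(b,p)$, and (26) follows from the definitions in (23) and (24).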

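Therefore, the inner maximum in (14) is concave with respect to $b$ for every $h$. Since taking the expectation over $h_k$ and applying the piecewise linear interpolation both preserve concavity, by (14) and (15), $W^{k}_\delta(b,1)$ is concave with respect to $b$ [16]. \qed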

From Propositions 2 and 3, if $W^{k+1}_\delta(b,A)$ is non-decreasing and concave, so is $W^{k}_\delta(b,A)$, for any $k$. Since $v_{K+1}(b,A)=0$ is non-decreasing and concave with respect to $b$, it is easily verified by (6) that $v_K(b,A)$, and hence $W^{K}_\delta(b,A)=\mathcal{L}[v_K(b,A),\delta]$, is also non-decreasing and concave with respect to $b$ given $A$. By induction, we obtain the following theorem.

###### Theorem 1

For $k=1,2,\dots,K$, the approximate value function $W^{k}_\delta(b,A)$ is non-decreasing and concave with respect to $b$ given $A$. Further, the problem in (14) is a convex optimization problem given $b_k$ and $h_k$.

Since both $V^{(i)}(b,A)$ and $W^{(i)}_\delta(b,A)$ are concave and non-decreasing in $b$, where $i$ is the iteration number, we can further bound the approximation error in (17) as follows.

###### Proposition 4

For any iteration $i$, given $A$, we have

$$0\le\epsilon_i(\delta)\le 2V^{(i)}(\delta,A)-V^{(i)}(2\delta,A)-V^{(i)}(0,A). \tag{27}$$
###### Proof:

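By Theorem 1, $V^{(i)}(b,A)$ is non-decreasing and concave with respect to $b$ given $A$. As illustrated in Fig. 3, for $b\in[n\delta,(n+1)\delta]$, the value of $V^{(i)}(b,A)$ is smaller than the value on line (*) but larger than $W^{(i)}_\delta(b,A)$; therefore, the distance between the value on line (*) and $W^{(i)}_\delta(b,A)$ is also an upper bound on the approximation error, i.e., $V^{(i)}(b,A)-W^{(i)}_\delta(b,A)\le 2V^{(i)}((n+1)\delta,A)-V^{(i)}((n+2)\delta,A)-V^{(i)}(n\delta,A)$ for $b\in[n\delta,(n+1)\delta]$. By the second-order difference property of the concave function, we have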

$$V^{(i)}((n+1)\delta,A)-V^{(i)}(n\delta,A)-\big(V^{(i)}((n+2)\delta,A)-V^{(i)}((n+1)\delta,A)\big)\ge V^{(i)}((n+2)\delta,A)-V^{(i)}((n+1)\delta,A)-\big(V^{(i)}((n+3)\delta,A)-V^{(i)}((n+2)\delta,A)\big) \tag{28}$$

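for all $n\ge0$. Then we further have $V^{(i)}(b,A)-W^{(i)}_\delta(b,A)\le 2V^{(i)}(\delta,A)-V^{(i)}(2\delta,A)-V^{(i)}(0,A)$ for all $b\in[0,b_{\max}]$, where the right-hand side is the $n=0$ case of the per-interval bound. \qed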
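A numerical check of (27) can be run on a stand-in value function. The sketch below assumes $V^{(i)}(b,A)=\log(1+b)$ (an illustration only; for this function the differences compared in (28) indeed shrink as $n$ grows) and that $b_{\max}$ is a multiple of $\delta$. It measures the largest gap between $V$ and its piecewise linear fit on a dense grid and compares it with the bound; the function name is hypothetical.

```python
import numpy as np


def max_interp_error_and_bound(V, delta, b_max, n_dense=4001):
    """Largest gap V - L[V, delta] on [0, b_max] and the bound 2V(delta) - V(2 delta) - V(0) of (27).

    Assumes b_max is an integer multiple of delta.
    """
    grid = np.arange(int(round(b_max / delta)) + 1) * delta
    b = np.linspace(0.0, b_max, n_dense)
    W = np.interp(b, grid, V(grid))  # piecewise linear fit (11)
    max_err = float(np.max(V(b) - W))
    bound = float(2 * V(delta) - V(2 * delta) - V(0.0))
    return max_err, bound


err, bound = max_interp_error_and_bound(np.log1p, delta=0.5, b_max=4.0)
print(err, bound)  # roughly 0.02 and 0.12
```

Halving $\delta$ shrinks both the measured error and the bound, as expected from (27).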

## IV Power Allocation with Perfect Energy Prediction

Note that in (14), we need to solve the following optimization problem for a given battery level $B$ and access state $A$:

$$p^*(h)=\arg\max_{p(h)\in\mathcal{D}_A(B)}\left\{\log\big(1+p(h)h\big)+\gamma U_k\big(B,p(h),A\big)\right\},\quad h\ge0. \tag{29}$$

When $A=0$, $p^*(h)=0$. On the other hand, when $A=1$, we obtain the optimal solution in closed form.

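Since the approximate value function $W^{k+1}_\delta(b,A)$ in (15) is a piecewise linear function of $b$ given $A$, it follows that $U_k(B,p,1)$ in (13) is also a piecewise linear function of $p$ given $B$, which is differentiable everywhere except at finitely many breakpoints. By Theorem 1 and Lemma 1, $U_k(B,p,1)$ is also concave with respect to $p$ and non-decreasing with respect to the remaining energy $B+e_k-pT_c$, i.e., non-increasing in $p$.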

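Since $U_k(B,p,1)$ is piecewise linear in $p$, we denote by $\mathcal{I}\triangleq\{p_0,p_1,\dots,p_N\}$ the set of its non-differentiable points together with the endpoints of $\mathcal{D}_1(B)$, where $p_0=0$, $p_N=\min\{p_{\max},B/T_c\}$, and $0=p_0<p_1<\dots<p_N$. Also, we denote by $\{w_1,\dots,w_N\}$ the set of corresponding slopes, where $w_i$ is the slope of $\gamma U_k(B,p,1)$ on the segment $(p_{i-1},p_i)$, given by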

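$$w_i\triangleq-\frac{\gamma T_c}{\delta}\,\mathbb{E}_{A_{k+1}\mid A_k=1}\Big\{V_{k+1}\big(\lceil x_i/\delta\rceil\delta,A_{k+1}\big)-V_{k+1}\big(\lfloor x_i/\delta\rfloor\delta,A_{k+1}\big)\Big\},\quad x_i\triangleq\min\Big\{b_{\max},\,B+e_k-\tfrac{p_{i-1}+p_i}{2}T_c\Big\}, \tag{30}$$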

which is derived from (13) and (15). Hence, the derivative of $\gamma U_k(B,p,1)$ with respect to $p$ for $p\notin\mathcal{I}$ is

$$w(p)=w_i,\quad\text{if } p\in(p_{i-1},p_i). \tag{31}$$

Since $U_k(B,p,1)$ is concave and non-increasing in $p$, we have $0\ge w_1\ge w_2\ge\cdots\ge w_N$. Fig. 4 is a sketch of the staircase function $w(p)$.
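The breakpoints and slopes can be computed directly from the grid samples of $V_{k+1}$. The sketch below assumes $b_{\max}$ is a multiple of $\delta$, takes `q[a]` as $\Pr(A_{k+1}=a\mid A_k=1)$, and evaluates each slope at the midpoint of its segment as in (30); the function name and the sample values are hypothetical. A breakpoint occurs wherever $B+e_k-pT_c$ crosses a grid point or $b_{\max}$.

```python
import numpy as np


def breakpoints_and_slopes(V_next, q, B, e_k, T_c, gamma, p_max, delta, b_max):
    """Breakpoints I = [p_0, ..., p_N] and slopes [w_1, ..., w_N] of gamma * U_k(B, p, 1).

    V_next[a, n] = V_{k+1}(n * delta, a) on the grid; q[a] = Pr(A_{k+1} = a | A_k = 1).
    Assumes b_max is an integer multiple of delta.
    """
    p_N = min(p_max, B / T_c)
    s = B + e_k  # remaining energy is min(b_max, s - p * T_c)
    # Powers at which s - p * T_c reaches b_max or a grid point n * delta.
    candidates = [(s - b_max) / T_c] + [(s - n * delta) / T_c for n in range(V_next.shape[1])]
    I = [0.0] + sorted({p for p in candidates if 0.0 < p < p_N}) + [p_N]
    w = []
    for lo, hi in zip(I[:-1], I[1:]):
        x = s - 0.5 * (lo + hi) * T_c  # uncapped remaining energy at the segment midpoint
        if x >= b_max:
            w.append(0.0)  # battery stays full: extra power costs no future value
            continue
        n = int(np.floor(x / delta))
        gain = sum(q[a] * (V_next[a, n + 1] - V_next[a, n]) for a in (0, 1))
        w.append(-gamma * T_c / delta * gain)
    return I, w


delta, b_max = 0.5, 3.0
grid = np.arange(7) * delta
V_next = np.vstack([np.log1p(grid), 2 * np.log1p(grid)])  # hypothetical concave samples
I, w = breakpoints_and_slopes(V_next, [0.2, 0.8], B=2.2, e_k=0.4, T_c=1.0,
                              gamma=0.9, p_max=1.5, delta=delta, b_max=b_max)
print(I)  # approximately [0.0, 0.1, 0.6, 1.1, 1.5]
print(w)  # non-positive and non-increasing
```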

In this section we first obtain the closed-form solution to (29), and then use it to obtain the optimal power allocation for both finite- and infinite-horizon cases.

### IV-A The Optimal Solution to (29)

In this subsection, for simplicity, we drop the slot index $k$ and denote the objective function in (29) as

$$g_h(p)\triangleq\log(1+ph)+\gamma U(B,p,1),\quad p\in\mathcal{D}_1(B). \tag{32}$$

We note that $g_h(p)$ is differentiable for $p\in\mathcal{D}_1(B)\setminus\mathcal{I}$ with

$$g_h'(p)=\frac{1}{1/h+p}+w(p). \tag{33}$$

On the other hand, at the non-differentiable points in $\mathcal{I}$, the right-derivative and the left-derivative of $g_h(p)$ can be written as

$$g_h'(p^+)\triangleq\frac{1}{1/h+p}+w(p^+), \tag{34}$$

and

$$g_h'(p^-)\triangleq\frac{1}{1/h+p}+w(p^-), \tag{35}$$

respectively.

###### Theorem 2

The optimal solution to (29) is given by

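$$p^*(h)=\begin{cases}-\dfrac{1}{w_i}-\dfrac{1}{h}, & \dfrac{1}{h}\in\left[-\dfrac{1}{w_i}-p_i,\,-\dfrac{1}{w_i}-p_{i-1}\right]\cap[0,+\infty),\ i=1,2,\dots,N,\\[2mm] p_i, & \dfrac{1}{h}\in\left(-\dfrac{1}{w_{i+1}}-p_i,\,-\dfrac{1}{w_i}-p_i\right)\cap[0,+\infty),\ i=1,2,\dots,N-1,\\[2mm] 0, & \dfrac{1}{h}\in\left(-\dfrac{1}{w_1}-p_0,\,+\infty\right),\\[2mm] p_N, & \dfrac{1}{h}\in\left[0,\,-\dfrac{1}{w_N}-p_N\right),\end{cases} \tag{36}$$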

where and .
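Theorem 2 reads as water-filling against a staircase of levels $-1/w_i$. A minimal pure-Python sketch of (36) follows. It assumes the breakpoints $[p_0,\dots,p_N]$ and slopes $w_1,\dots,w_N$ are already available (for example, from (30)), treats $-1/w_i$ as $+\infty$ when $w_i=0$ (our assumption), and returns 0 for $h\le0$; the function name is hypothetical.

```python
import math


def optimal_power(h, I, w):
    """Evaluate the closed form (36) for p*(h).

    I = [p_0 = 0, p_1, ..., p_N] (breakpoints), w = [w_1, ..., w_N] with 0 >= w_1 >= ... >= w_N.
    """
    if h <= 0:
        return 0.0
    inv_h = 1.0 / h
    level = [math.inf if wi == 0 else -1.0 / wi for wi in w]  # water levels -1/w_i
    N = len(w)
    if inv_h > level[0] - I[0]:
        return 0.0                                # third case of (36)
    for i in range(1, N + 1):
        if level[i - 1] - I[i] <= inv_h <= level[i - 1] - I[i - 1]:
            return level[i - 1] - inv_h          # first case: interior of segment i
        if i < N and level[i] - I[i] < inv_h < level[i - 1] - I[i]:
            return I[i]                           # second case: stay at breakpoint p_i
    return I[N]                                   # fourth case: full power p_N


print(optimal_power(2.0, [0.0, 0.4, 1.0, 1.5], [-0.3, -0.8, -2.0]))  # 0.75
```

At $h=2$ the solution lies inside the second segment, where $g_h'(p)=1/(0.5+p)-0.8=0$ gives $p=0.75$.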

In Fig. 5 we give a sketch of $p^*(h)$. To prove Theorem 2, we first give the necessary and sufficient conditions for the optimal solution as follows [16].

###### Lemma 2

$p^*$ is the optimal solution to (29) given $h$ if and only if

1. $g_h'(p^{*+})\le0\le g_h'(p^{*-})$, when $p^*>0$ and $p^*<p_N$;

2. $g_h'(p^{*-})\ge0$, when $p^*=p_N$;

3. $g_h'(p^{*+})\le0$, when $p^*=0$.

Note that Condition 1 corresponds to the case where $p^*$ lies in the interior of $\mathcal{D}_1(B)$. In this case, the left-derivative and the right-derivative at $p^*$ must have opposite signs or both be zero, so that moving $p$ away from $p^*$ in either direction cannot increase the objective function. Conditions 2 and 3 correspond to the cases where $p^*$ lies on the upper and lower boundaries of $\mathcal{D}_1(B)$, where the objective function is non-decreasing and non-increasing, respectively, for all $p\in\mathcal{D}_1(B)$.
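Lemma 2 translates directly into a test on one-sided derivatives. The sketch below uses hypothetical function names, takes the breakpoints $[p_0,\dots,p_N]$ and slopes $w_i$ of (30)-(31) as lists, evaluates (34)-(35), and checks Conditions 1-3; it assumes $h>0$ and a candidate $p$ inside $\mathcal{D}_1(B)$.

```python
import bisect


def one_sided_derivatives(p, h, I, w):
    """Return (g_h'(p-), g_h'(p+)) from (34)-(35); None where that side leaves D_1(B)."""
    base = 1.0 / (1.0 / h + p)           # derivative of log(1 + p h)
    i_left = bisect.bisect_left(I, p)    # I[i_left - 1] < p <= I[i_left]
    i_right = bisect.bisect_right(I, p)  # I[i_right - 1] <= p < I[i_right]
    left = base + w[i_left - 1] if p > I[0] else None
    right = base + w[i_right - 1] if p < I[-1] else None
    return left, right


def satisfies_lemma2(p, h, I, w, tol=1e-12):
    """Check Conditions 1-3 of Lemma 2 for a candidate p in [I[0], I[-1]]."""
    left, right = one_sided_derivatives(p, h, I, w)
    if I[0] < p < I[-1]:
        return left >= -tol and right <= tol  # Condition 1 (interior)
    if p == I[-1]:
        return left >= -tol                   # Condition 2 (p = p_N)
    return right <= tol                       # Condition 3 (p = 0)


print(satisfies_lemma2(0.75, 2.0, [0.0, 0.4, 1.0, 1.5], [-0.3, -0.8, -2.0]))  # True
```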

The following proposition gives a sufficient condition for the optimality of $p^*(h)$ given $h$.

###### Proposition 5

Given any $h$, for $p^*(h)\in\operatorname{int}\mathcal{D}_1(B)$, if the power allocation satisfies

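$$p^*(h)=\begin{cases}-\dfrac{1}{w(p^*(h))}-\dfrac{1}{h}, & \text{when } p^*(h)\in\operatorname{int}\mathcal{D}_1(B)\setminus\mathcal{I},\\[2mm] -\dfrac{1}{w(p^*(h)^-)}-\dfrac{1}{h}\ \text{ or }\ -\dfrac{1}{w(p^*(h)^+)}-\dfrac{1}{h}, & \text{when } p^*(h)\in\mathcal{I},\end{cases} \tag{37}$$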

then $p^*(h)$ is the optimal solution to (29).

###### Proof:

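Substituting (37) into (33)-(35), we have $g_h'(p^*(h)^-)=0$ or $g_h'(p^*(h)^+)=0$ when $p^*(h)\in\mathcal{I}$, and $g_h'(p^*(h))=0$ when $p^*(h)\notin\mathcal{I}$. Moreover, since $g_h(p)$ is concave, $g_h'(p^-)\ge g_h'(p^+)$; hence, in either case, $g_h'(p^*(h)^-)\ge0$ and $g_h'(p^*(h)^+)\le0$. By Lemma 2 (Condition 1), we conclude optimality. \qed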

It is then easy to verify that, for $p^*(h)\notin\mathcal{I}$, the solution given by (36) satisfies the optimality condition in Proposition 5.

For $p^*(h)\in\mathcal{I}$, we use the next proposition to prove the optimality of (36).

###### Proposition 6

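For any non-differentiable point $p_i\in\mathcal{I}$, $i=1,\dots,N-1$, $p_i$ is the optimal solution to (29) for any $h$ such that $\frac{1}{h}\in\left(-\frac{1}{w_{i+1}}-p_i,\,-\frac{1}{w_i}-p_i\right)\cap[0,+\infty)$.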

###### Proof:

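From (34)-(35), $g_h'(p_i^+)$ and $g_h'(p_i^-)$ are functions of $h$ for a given $p_i$. If the interval $\left(-\frac{1}{w_{i+1}}-p_i,\,-\frac{1}{w_i}-p_i\right)\cap[0,+\infty)$ is not empty, it is easy to verify that when $\frac{1}{h}$ lies in this interval, $g_h'(p_i^+)=\frac{1}{1/h+p_i}+w_{i+1}<0$ and $g_h'(p_i^-)=\frac{1}{1/h+p_i}+w_i>0$, and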