# Distributed Opportunistic Scheduling for Energy Harvesting Based Wireless Networks: A Two-Stage Probing Approach

Hang Li, Chuan Huang, Ping Zhang, Shuguang Cui, and Junshan Zhang

Part of this work appeared in the Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), Toronto, ON, Canada, April 27 - May 2, 2014.

H. Li and S. Cui are with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843 USA (e-mail: david_lihang@tamu.edu; cui@ece.tamu.edu). S. Cui is also a Distinguished Adjunct Professor at King Abdulaziz University in Saudi Arabia and a Visiting Professor at ShanghaiTech University, China.

C. Huang is with the National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China, Chengdu, Sichuan 610051 China (e-mail: huangch@uestc.edu.cn).

P. Zhang is with the School of Information and Communication and also with the State Key Lab. of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876 China (e-mail: pzhang@bupt.edu.cn).

J. Zhang is with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, Arizona, 85287 USA (e-mail: junshan.zhang@asu.edu).

###### Abstract

This paper considers a heterogeneous ad hoc network with multiple transmitter-receiver pairs, in which all transmitters are capable of harvesting renewable energy from the environment and compete for one shared channel by random access. In particular, we focus on two different scenarios: the constant energy harvesting (EH) rate model where the EH rate remains constant within the time of interest and the i.i.d. EH rate model where the EH rates are independent and identically distributed across different contention slots. To quantify the roles of both the energy state information (ESI) and the channel state information (CSI), a distributed opportunistic scheduling (DOS) framework with two-stage probing and save-then-transmit energy utilization is proposed. Then, the optimal throughput and the optimal scheduling strategy are obtained via one-dimension search, i.e., an iterative algorithm consisting of the following two steps in each iteration: First, assuming that the stored energy level at each transmitter is stationary with a given distribution, the expected throughput maximization problem is formulated as an optimal stopping problem, whose solution is proven to exist and then derived for both models; second, for a fixed stopping rule, the energy level at each transmitter is shown to be stationary and an efficient iterative algorithm is proposed to compute its steady-state distribution. Finally, we validate our analysis by numerical results and quantify the throughput gain compared with the best-effort delivery scheme.

Index Terms: Distributed opportunistic scheduling, energy harvesting, optimal stopping.

## I Introduction

Conventional wireless communication devices are usually powered by batteries that can provide stable energy supplies. However, the battery lifetime limits the operation time of such devices. Recently, energy harvesting (EH) techniques, which convert renewable energy from the environment into electrical energy, have been proposed as a promising alternative to the conventional constant power supplies [2, 3]. In this way, the node lifetime can be prolonged significantly. Compared with transmitters fed by conventional constant energy supplies, transmitters powered by energy harvesters are restricted by a new class of EH constraints, i.e., the energy consumed up to any time is bounded by the energy harvested until that point [4]. Therefore, to meet certain performance requirements, such as throughput, stability, and delay, these EH constraints should be carefully taken into account in the design of EH-based communication systems.

### I-A Related Works and Motivations

Communication systems powered by energy harvesters have been investigated in recent years. For point-to-point wireless systems, the authors in [4, 5] considered the throughput maximization problem over a finite horizon for the cases in which the harvested energy information is non-causally and causally known to the transmitter, where the optimal solutions were obtained by the proposed one-dimension search algorithm and dynamic programming (DP) techniques, respectively. In [6], the authors extended the results to the classic three-node Gaussian relay channel with EH source and relay nodes, where the optimal power allocation algorithms were proposed. With a more practical circuit model that accounts for the half-duplex constraint of the battery, the authors in [7] proposed a save-then-transmit protocol, which divides each transmission frame into two parts: the first one for harvesting energy and the other for data transmission. For wireless networks with EH constraints, the authors in [8] investigated the performance of some standard medium access control protocols, e.g., TDMA, framed-Aloha, and dynamic-framed-Aloha.

In related works on ad hoc networking, opportunistic scheduling has been known as an effective method to utilize the wireless resource [9, 10, 11]. In particular, a distributed opportunistic scheduling (DOS) scheme was introduced in [12, 13], where only local channel state information (CSI) is available to each transmitter. By applying optimal stopping theory [14], it has been shown in [12, 13] that the optimal solution for the expected throughput maximization problem has a threshold-based structure. When channel estimation is imperfect, the authors in [15] proposed a two-level channel probing framework that allows the accessing transmitter to perform one more round of channel estimation before data transmission to improve the quality of estimated CSI and possibly increase the system throughput. The optimal scheduling policy of the two-level probing framework was proven to be threshold-based as well by referring to the optimal stopping with two-level incomplete information [16].

Different from the traditional energy supplies (e.g., non-rechargeable batteries, power grid) in the conventional networks [9, 10, 11, 12, 13, 15], we consider the network powered by energy harvesters that could generate electric energy from different renewable energy sources. Among various types of renewable energy sources, we consider the two typical energy harvesting rate models listed below. [Footnote 1: A more general case is that the transmitter only has causal information about EH rates, which could be modeled as a Markov process. This model has been used in the point-to-point wireless system [4, 5].]

1. Constant energy harvesting rate model: The EH rate (specifically, the amount of harvested energy per unit time) can be approximated as a constant within the entire time duration of interest. For example, the power variation coherence time of wind and solar EH systems is on the order of multiple seconds [17, 18], while the duration of one communication block is about several milliseconds. Thus, over thousands of communication blocks, the EH rate remains almost unchanged.

2. Independent and identically distributed (i.i.d.) energy harvesting rate model: Compared to the constant rate model, the EH rate in this case changes much faster, i.e., on a time scale comparable to the duration of one communication block. For example, the energy from light, thermal, kinetic, or ambient-radiation sources usually changes every few milliseconds. Accordingly, the EH rates can be modeled as an i.i.d. random process [5, 8].

With the above two EH models, we investigate the DOS problem for a heterogeneous EH-based network, where the channel gains across different links and the EH rates across different transmitters are non-identical. The system works in a two-stage pattern as follows. In the first stage, all transmitters adopt random access and perform channel probing (CP), during which the successful link can obtain the CSI via channel contentions, similar to those in [12, 13, 15]. In the second stage, the transmitter that succeeded in the first stage has the option to spend a certain amount of time harvesting more energy, i.e., to execute energy probing (EP); then, with the updated energy state information (ESI), it decides either to transmit in the rest of the transmission block, or to stop probing and give up the channel. Since the total duration of the transmission block is fixed, spending more time on harvesting energy increases the energy level but decreases the portion of time left for data transmission, which leads to a tradeoff to optimize.

### I-B Summary of Contributions

We propose a DOS framework for an ad hoc network powered by energy harvesters, which efficiently utilizes both the CSI and the ESI at each transmitter. In this framework, we adopt a “save-then-transmit” scheme, i.e., the transmitter keeps harvesting energy before it initiates the transmission that uses up all the available energy in the battery. Note that such a greedy power utilization scheme is suboptimal in general, while it is sensible when the number of transmitters is large.

The main contributions of this paper are summarized as follows:

1. First, by assuming that the battery state at each transmitter is stationary with a certain distribution, the throughput maximization problem for the considered network is cast as a rate-of-return problem. We prove the existence of the optimal stopping rules for both EP and CP, and further obtain:

• For the constant EH model, the optimal stopping rule of EP is determined by maximizing the throughput over the transmission block before starting EP, and it is either zero or a finite value according to the given CSI and ESI. Then, based on the stopping rule of EP, the optimal stopping rule of CP is shown to be a pure threshold policy (the threshold does not change over time) and the transmission decision is made right after each round of CP.

• For the i.i.d. EH model, the optimal stopping rule for EP is shown to be dynamic and threshold based, which is obtained by solving a stopping problem over a finite-time horizon. The stopping rule of CP is also threshold based and obtained based on the decision of EP, i.e., either transmit or start a new CP. Unlike the constant case, the transmission decision under i.i.d. EH model is made during the process of EP.

2. Next, with a fixed stopping rule, we show the existence of the steady-state distribution of the battery state by constructing a “super” Markov chain with its states being jointly determined by all transmitters. Moreover, we propose an efficient iterative algorithm to compute the steady-state distribution, executed at each transmitter in parallel. Particularly, it is shown that with the constant EH model, if the network consists of transmitters and each one is with possible energy states, the computational complexity for one iteration of the proposed algorithm is on the order of , which is more efficient (when and are large) than that of the super Markov chain case, whose complexity for one iteration is on the order of .

3. Finally, by exploiting the structure of the rate-of-return problem, we show that the maximum throughput and the optimal scheduling strategy of the DOS framework can be obtained for both EH rate models via one-dimension search, by repeating the above two steps.

The rest of this paper is organized as follows. Section II introduces the system model. In Section III, the throughput maximization problem is formulated and solved under the assumption that the stationary distribution of the battery at each transmitter is known. Then, with the obtained stopping rule, we prove in Section IV the existence of the steady-state distribution for each transmitter, and propose an iterative algorithm to compute it. Section V discusses the computation for the optimal throughput. In Section VI, numerical results are provided to validate our analysis and evaluate the throughput gain of our proposed scheduling scheme against the best-effort delivery. Finally, Section VII concludes the paper.

## II System Model

We consider a heterogeneous single-hop ad hoc network, where all the transmitter-receiver pairs have independent but not necessarily identical statistical information of CSI and ESI. All pairs contend for one shared channel by random access. For each link, the transmitter is powered by a renewable energy source and utilizes a small rechargeable battery to temporarily store the harvested energy. Note that the transmitter could keep harvesting energy until it initiates a data transmission. In addition, we consider neither the inefficiency in energy storage and retrieval nor the energy consumed for purposes other than data transmission, both of which can be approximately neglected by properly adjusting the energy model [4, 5, 6, 8]. Denote the duration of one channel contention as $l$, and the length of one transmission block as $L$, which is an integer multiple of $l$.

As illustrated in Fig. 1, the DOS procedure of the whole network takes place in two stages: First, each transmitter probes the channel via random access and harvests energy at the same time; and then the successful transmitter may start the EP (to potentially increase the average transmission rate over the transmission block) before the data transmission process. [Footnote 2: If the successful transmitter experiences a bad channel condition and a low energy level, it may skip the transmission.]

#### II-1 Channel Probing

In the first stage, a successful channel contention is defined as follows: All transmitters first independently contend for the channel until there is only one contending in a particular time slot. Furthermore, one round of CP is defined as the process to achieve one successful channel contention. Denote the probability that transmitter contends for the channel as , , with . As such, the probability that the -th transmitter successfully occupies the channel is given by . Then, the probability to achieve one successful channel contention at each time slot is given by , and it is easy to check that [19]. Accordingly, for the -th round of CP, , we use to denote the number of time slots needed to achieve a successful channel contention, which is a random variable and satisfies the geometric distribution with parameter [12, 13, 15]. In this way, the expected duration of one round of CP is given as . Denote the transmitted signal at transmitter as , and the received signal is thus given by , where is the complex channel gain and is the circularly symmetric complex Gaussian (CSCG) noise with zero mean and variance at the receiver. Across different links, are independent with finite mean and variance, while not necessarily identically distributed. After one round of CP, the successful transmitter can perfectly estimate the corresponding channel gain via certain feedback mechanisms, and thus is assumed a known constant during the whole transmission block. After CP, the successful transmitter chooses one of the following actions based on its local CSI and ESI:

(a) releases the channel (if the CSI and ESI indicate that the transmission rate is lower than a threshold) and lets all links re-contend; or

(b) directly transmits until the end of the transmission block; or

(c) holds the channel and starts EP.

Note that to complete one data transmission, it may take $N$ rounds of CP, as depicted in Fig. 1. It is worth noting that each transmitter keeps harvesting energy until it starts a transmission, and after each round of CP, only the successful transmitter chooses among the three actions listed above.
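The contention quantities used in this stage can be illustrated numerically. The sketch below is not taken from the paper: it assumes the standard slotted random-access model, in which link $i$ wins a slot when it contends and all other links stay silent, and the names `contention_stats`, `q`, and `slot_len` are hypothetical.

```python
from math import prod

def contention_stats(q, slot_len):
    """Per-link success probabilities, overall success probability,
    and expected duration of one round of channel probing.

    q[i] is the (assumed) probability that link i contends in a slot.
    """
    n = len(q)
    # Link i wins a slot when it contends and every other link stays silent.
    Q_i = [q[i] * prod(1 - q[j] for j in range(n) if j != i) for i in range(n)]
    Q = sum(Q_i)                       # probability that some link wins the slot
    expected_duration = slot_len / Q   # mean of a geometric(Q) slot count, times the slot length
    return Q_i, Q, expected_duration

Q_i, Q, dur = contention_stats([0.5, 0.5], slot_len=1.0)
```

With two links that each contend with probability one half, every slot is won by a given link with probability one quarter, so a round of CP lasts two slots on average.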

#### II-2 Energy Probing

When the successful transmitter decides not to take action (a) or (b) defined above, it starts the second stage EP, i.e., action (c), to obtain more energy. During this stage, the transmitter continues harvesting energy slot by slot, and then ends EP by action (a) or (b), i.e., either releasing the channel or transmitting over the rest of the transmission block. As depicted in Fig. 1, one transmission is fulfilled with $N$ rounds of CP and $M_N$ extra slots of EP.

For transmitter , let denote the energy level of the battery after the -th round of CP and additional time slots for EP, where is the set of all possible energy states, with being the minimum energy unit and the capacity of the battery. We use to denote the EH rate of transmitter at time . As noted in the previous section, we consider the following two types of scenarios:

1. Constant EH rate model: are constants for each , i.e., for all , and can thus be learned and assumed non-causally known before transmissions.

2. I.i.d EH rate model: The EH rates among different transmitters are independent. For transmitter , are i.i.d. across , with finite mean and the probability mass function (PMF) , where .

Under the save-then-transmit scheme, the energy level is non-decreasing until a transmission starts and drops to zero after the transmission, which forms a Markov chain (as described in Section IV later). Thus, the energy level can be written as

$$B^i_{n,m} = \min\left\{ B^i_{n,0} + l\sum_{k=0}^{m} E^i_k,\; B_{\max}\delta \right\}, \tag{1}$$

where , , and denotes the smaller value between two real numbers and . Note that indicates the energy level after the successful contention round before taking any action. If , i.e., transmitter does not do EP, we let .
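The capped accumulation in (1) can be sketched in a few lines. This is an illustrative helper rather than the authors' code; the function name and argument names are hypothetical, and the result is not rounded to a multiple of the energy unit.

```python
def battery_level(b0, harvest_rates, slot_len, b_max, delta):
    """Energy level after EP, following the capped accumulation in (1).

    b0            -- energy level right after the successful contention
    harvest_rates -- EH rates of the EP slots (empty list: no EP)
    slot_len      -- duration of one slot
    b_max * delta -- battery capacity
    """
    harvested = slot_len * sum(harvest_rates)
    return min(b0 + harvested, b_max * delta)

# Three EP slots at unit rate and half-length slots add 1.5, but the cap is 3.0.
level = battery_level(b0=2.0, harvest_rates=[1.0, 1.0, 1.0],
                      slot_len=0.5, b_max=3, delta=1.0)
```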

## III Transmission Scheduling

In this section, we target to derive the optimal scheduling policy that maximizes the average throughput for the considered network with the proposed two-stage access strategy, conditioned on the given battery state distribution. We point out that the results obtained in this section are based on the assumption that the energy level at transmitter is stationary with a given distribution , for , which will be validated in Section IV.

### III-A Problem Formulation

After the -th round of CP and additional time slots, the CSI and the ESI at the successful transmitter are given as . Note that the channel gain is now indexed by , which is determined at the end of the -th round of CP and assumed fixed during the whole data transmission block. In particular, denotes the initial information right after the -th round of CP. For convenience, we omit the index for either the CSI or the ESI in the sequel, and retrieve it when necessary.

By adopting the save-then-transmit scheme at the transmitters to fully take advantage of each channel use, the transmission rate over time slots with state is defined as

$$R_n(m) = \left(1 - \frac{ml}{L}\right)\log\left(1 + \frac{|h_n|^2 B_{n,m}}{(L - ml)\sigma^2}\right). \tag{2}$$

When $m = L/l$, we set $R_n(m) = 0$ since there is no transmission in this case.
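A direct evaluation of (2) is sketched below. The natural logarithm is an assumption (the base is not stated in this section), and the function and argument names are hypothetical.

```python
import math

def rate(m, h_gain_sq, energy, L, l, noise_var):
    """Normalized rate R(m) from (2) after m EP slots (natural log assumed).

    h_gain_sq -- squared channel gain |h|^2
    energy    -- battery level available when transmission starts
    """
    if m * l >= L:            # the whole block was spent on EP: nothing is sent
        return 0.0
    frac = 1.0 - m * l / L    # share of the block left for data
    snr = h_gain_sq * energy / ((L - m * l) * noise_var)
    return frac * math.log(1.0 + snr)

r0 = rate(0, h_gain_sq=1.0, energy=10.0, L=10.0, l=1.0, noise_var=1.0)
```

The two factors pull in opposite directions as `m` grows: the fraction of the block left for data shrinks, while the power available per unit of transmission time increases.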

###### Remark III.1

Some important properties of are listed as follows.

• and , which results from the fact that has finite mean and variance and the energy level is also finite.

• are approximately independent random variables over . To see this, recall that the channel gains and the battery states are independent across different transmitters at a given time slot; moreover, the probability is small for a transmitter to occupy the channel in two consecutive contentions when the number of user pairs is large. For example, in an ad hoc network with pairs where each pair fairly competes for the channel use with probability , such a probability is [19], which is as small as 0.0625 even when . Thus, are nearly independent over , which implies that are independent over .
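The numerical value quoted in the second property can be reproduced under one plausible reading. The expression itself is not legible in this copy, so the sketch assumes that every pair contends with probability equal to the reciprocal of the number of pairs and that "occupying the channel in two consecutive contentions" means winning two given slots in a row; `repeat_win_probability` is a hypothetical name.

```python
def repeat_win_probability(num_pairs):
    """Probability that one given link wins two consecutive slots when every
    link contends with probability 1/num_pairs (assumed model)."""
    q = 1.0 / num_pairs
    win_once = q * (1.0 - q) ** (num_pairs - 1)  # contend while all others stay silent
    return win_once ** 2

for n in (2, 4, 8, 16):
    print(n, repeat_win_probability(n))
```

Under this assumption the value 0.0625 is obtained for two pairs, and the probability keeps decreasing as more pairs join the network.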

Let be the stopping rule for CP, and be the stopping rule for EP associated with the -th CP for , which together tell the transmitter when to start the data transmission. Then, under these stopping rules, the transmission rate would be , and we let be the total time duration for completing one data transmission. Here, contains the duration of rounds of CP, which is given by , and time slots in which the transmitter probes the energy but gives up the channel after EP. Also, after the -th round of CP with the time , the transmitter may use slots for the EP and transmit within the duration afterwards. Accordingly, we obtain

$$T_N = l\sum_{n=1}^{N-1} M_n + l\sum_{n=1}^{N} K_n + L. \tag{3}$$

If such a process is executed times with bits transmitted at each transmission, , we obtain the average throughput per transmission of the network:

$$\frac{L\sum_{j=1}^{J} R_{N_j}(M_{N_j})}{\sum_{j=1}^{J} T_{N_j}} \longrightarrow \lambda = \frac{L\,E[R_N(M_N)]}{E[T_N]} \quad \text{a.s.}$$

as by the renewal theory [20]. Again, we point out that the energy level is stationary at the -th round of CP for , as we assumed.

Our target is to maximize by adjusting the stopping rule and . It is easy to see that maximizing is in fact a “rate-of-return” stopping problem [14, 21] (for which the specific definition is given later). Instead of directly solving this problem, we examine the “net reward” of the considered network, which is given as

$$r_N(\lambda) = R_N(M_N)L - \lambda T_N = \left(R_N(M_N) - \lambda\right)L - \lambda l\left[K_N + \sum_{n=1}^{N-1}(K_n + M_n)\right], \tag{4}$$

for some . The term can be interpreted as the reward of transmission, as the cost of CP, and as the cost of failed EP for . We set since it is irrational that the system does not send any data forever. Then, we define the maximum value of the expected net reward with as

$$S^*(\lambda) \triangleq \sup_{N \in \mathcal{N}} E\left[r_N(\lambda)\right], \tag{5}$$

where denotes the least upper bound for a set of real numbers, and

$$\mathcal{N} \triangleq \left\{N : N \ge 1,\ E[T_N] < \infty,\ \text{for } M_n \in [0, L/l] \text{ with } 1 \le n \le N\right\}. \tag{6}$$

###### Remark III.2

One important property of problem (5) is time invariance. We observe that before the system starts the -th round of CP, the accumulated cost over the past rounds of CP has already been finalized, with no need to be further considered in the remaining decision process. Moreover, are independent over as we mentioned before; it follows that the expected optimal reward before the -th round of CP is the same as that of any previous round of CP. In other words, the system can obtain the expected optimal reward whenever a new round of CP is about to start. Therefore, we conclude that problem (5) is time invariant.
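To make the bookkeeping behind (3) and (4) concrete, the following sketch evaluates the net reward of one completed transmission in both forms given in (4) and checks that they agree. The list-based interface and the name `net_reward` are hypothetical.

```python
def net_reward(rate_final, lam, L, l, K, M):
    """Net reward (4) for one transmission completed after N = len(K) rounds.

    K[n], M[n] -- CP and EP slot counts of round n + 1; the last entry of M
                  is the EP that precedes the transmission and lies inside
                  the transmission block.
    """
    total_time = l * sum(M[:-1]) + l * sum(K) + L               # T_N from (3)
    direct = rate_final * L - lam * total_time                  # first form of (4)
    failed = sum(k + m for k, m in zip(K[:-1], M[:-1]))         # rounds that ended without transmission
    split = (rate_final - lam) * L - lam * l * (K[-1] + failed)  # second form of (4)
    assert abs(direct - split) < 1e-9
    return direct

# Two rounds: the first costs 2 CP slots and 1 failed EP slot, the second 3 CP slots.
reward = net_reward(rate_final=1.2, lam=0.5, L=10.0, l=1.0, K=[2, 3], M=[1, 4])
```

In this example the total duration is 16 time units, so the reward is 12 minus 8.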

Recall from Section II that after each round of CP, the successful transmitter will choose one of three actions (i.e., transmitting, giving up the channel, or starting EP) according to the stopping rule of CP, which needs the expected reward of EP depending on the stopping rule of EP. Thus, we will first introduce the formulation and the optimal stopping rule for EP, and then for CP.

#### III-A1 Formulation for EP

When the successful transmitter starts EP after the -th round of CP, where , it will end up with one of the two actions: transmitting or giving up the channel without transmission. Specifically, we define the expected optimal reward at the -th slot of EP, , as

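$$U_k(\mathcal{F}_{n,k}) = \max_{k \le M_n \le L/l} E\left[\max\left\{(R_n(M_n) - \lambda)L,\ -\lambda l M_n + S^*(\lambda)\right\} \mid \mathcal{F}_{n,k}\right], \tag{7}$$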

where is the expected value of giving up the channel after slots of EP. If , denotes the maximum of the expected net reward right after the -th round of CP. In other words, we want to find the optimal stopping rule of EP which attains

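$$U_0(\mathcal{F}_{n,0}) = \max_{0 \le M_n \le L/l} E\left[\max\left\{(R_n(M_n) - \lambda)L,\ -\lambda l M_n + S^*(\lambda)\right\} \mid \mathcal{F}_{n,0}\right]. \tag{8}$$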

Note that exists since problem (III-A1) is an optimal stopping problem over a finite time horizon [14, 22].

#### III-A2 Formulation for CP

By choosing , we define

$$\lambda^* = \sup_{N \in \mathcal{N}} \frac{L\,E[R_N(M_N^*)]}{E[T_N]}, \qquad N^* = \arg\sup_{N \in \mathcal{N}} \frac{L\,E[R_N(M_N^*)]}{E[T_N]}. \tag{9}$$

Note that if the optimal stopping rule , we would claim that does not exist. Thus, is the optimal average throughput of the original rate-of-return problem.

The connection between the transformed problem (5) and the original problem (9) is introduced in the following lemma. It is worth noticing that with the optimal stopping rule for EP, problem (5) boils down to a one-level stopping problem with stopping rule .

###### Lemma III.1

(i) If there exists such that , this is the optimal throughput defined in (9). Moreover, if is attained at , the stopping rule defined in (9) is the same as , i.e., .

(ii) Conversely, if (9) is true, there is , which is attained at given by (9).

This lemma directly follows Theorem 1 in Chapter 6 of [14].

The next proposition secures the existence of the optimal stopping rule for CP.

###### Proposition III.1

With the EP stopping rule , the optimal stopping rule for problem (5) exists. Moreover, for , the following equation holds

$$S^*(\lambda) = U_0(\mathcal{F}_{N,0}) - \lambda l K_N. \tag{10}$$

The proof is given in Appendix A.

###### Remark III.3

The equation (10) is obtained from the optimality equation of the CP. The calculation of the optimal throughput relies on this equation, which will be shown in Section V.

Now, we are ready to derive the optimal stopping rules and that jointly maximize the expected value of for the two different EH models. As we mentioned above, the stopping rule for CP relies on the form of (the stopping rule for EP). We will find the optimal stopping rule before . After obtaining the forms of the optimal stopping rules, the calculation for the optimal throughput will be discussed.

### III-B Optimal Stopping Rule for Constant EH Model

For notational simplicity, we omit the index $n$ of CP when we derive the stopping rule $M^*$ in this subsection. Then, we will derive the stopping rule $N^*$ based on the results of EP.

When the EH rate is constant, the transmission rate is deterministic for a given over the transmission block. Then, we obtain a simplified version of (III-A1) as

$$U_0(\mathcal{F}_0) = \max_{0 \le M \le L/l} \max\left\{(R(M) - \lambda)L,\ -\lambda l M + S^*(\lambda)\right\}.$$

The value of can be obtained simply by comparing and , whose values can be computed individually. Clearly, the first one achieves its maximum at . For the second term, only is changing over with a given . Therefore, we settle down to the following auxiliary problem:

$$V^* = \arg\max_{0 \le V \le L/l} R(V). \tag{11}$$

Then, we could use the optimal to find without difficulty. Note that when , it follows that according to our definition in Section II, which implies that cannot be optimal, and thus we take . We first consider a related continuous version of by relaxing as , :

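$$\max_{0 \le \rho < 1} R(\rho) = \max_{0 \le \rho < 1} (1 - \rho) \cdot \log\left(1 + \frac{|h|^2 \min\{B_0 + \rho L E,\ B_{\max}\delta\}}{(1 - \rho)L\sigma^2}\right). \tag{12}$$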

After solving (12), we will show how to obtain the optimal solution of problem (11).

First, we establish some properties for the objective function of problem (12).

###### Proposition III.2

For arbitrary , we have that

1. the function is concave over , and ;

2. the function is concave and non-increasing over .

{proof}

Since , when , is simply concave over on according to part 1) of Proposition III.2. When , according to Proposition III.2, is concave over , and is non-increasing on . Thus, cannot achieve its maximum on . Therefore, we treat this fact as a new constraint over , and rewrite problem (III-B) as

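$$\begin{aligned} \max\ G(\rho) = \max\ & (1 - \rho)\log\left(1 + \frac{|h|^2 (B_0 + \rho L E)}{(1 - \rho)L\sigma^2}\right) \\ \text{s.t.}\ & B_0 + \rho L E \le B_{\max}\delta,\quad 0 \le \rho < 1. \end{aligned} \tag{13}$$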

Next, we establish the following proposition to solve problem (13), where the obtained solution is optimal for problem (12) as well.

###### Proposition III.3

The optimal solution for problem (13) is given by:

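$$\rho^* = \begin{cases} \min\left\{\rho_0,\ \dfrac{B_{\max}\delta - B_0}{LE}\right\}, & \text{when } \dfrac{C + D}{1 + C} \ge \log(1 + C); \\ 0, & \text{otherwise,} \end{cases}$$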

where , , and is the unique solution for the equation when .

{proof}
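The condition in Proposition III.3 can be traced to the slope of the objective of (13) at $\rho = 0$. The following sketch assumes, since the definitions are not legible in this copy, that $C = |h|^2 B_0 / (L\sigma^2)$ and $D = |h|^2 E / \sigma^2$, so that the objective reads $G(\rho) = (1-\rho)\log\left(1 + \frac{C + D\rho}{1-\rho}\right)$. Differentiating gives

$$G'(\rho) = -\log\left(1 + \frac{C + D\rho}{1 - \rho}\right) + \frac{C + D}{1 - \rho + C + D\rho}, \qquad G'(0) = \frac{C + D}{1 + C} - \log(1 + C).$$

Because $G$ is concave (Proposition III.2), a negative slope at the origin means that $G$ is decreasing on the whole interval, so no EP is the best choice. Otherwise the maximizer is presumably the stationary point where $G'(\rho) = 0$, capped at $(B_{\max}\delta - B_0)/(LE)$ by the battery constraint in (13).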

Based on the optimal solution , the optimal for in (11) can be obtained easily: We only need to compare against , and should attain the larger value. Specifically, we have the following result.

###### Proposition III.4

The optimal of the problem (11) is given by

 (14)

where is obtained by Proposition III.3. Thus, the optimal stopping rule is given by

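$$M^* = \begin{cases} 0, & \text{if } (R(V^*) - \lambda)L < S^*(\lambda); \\ V^*, & \text{otherwise.} \end{cases} \tag{15}$$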

The optimal reward with constant EH rate model is

$$U_0(\mathcal{F}_0) = \max\left\{(R(V^*) - \lambda)L,\ S^*(\lambda)\right\}. \tag{16}$$
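As a cross-check on the closed-form route through Proposition III.3, problem (11) can also be solved by plain enumeration, since only a finite number of EP lengths exist. The sketch below assumes a natural logarithm; all function and argument names are hypothetical.

```python
import math

def rate_const(V, h_gain_sq, b0, E, L, l, noise_var, cap):
    """R(V) for a constant EH rate E: harvest for V slots, then transmit."""
    if V * l >= L:
        return 0.0
    energy = min(b0 + V * l * E, cap)   # battery capacity limits the stored energy
    frac = 1.0 - V * l / L
    return frac * math.log(1.0 + h_gain_sq * energy / ((L - V * l) * noise_var))

def best_ep_length(h_gain_sq, b0, E, L, l, noise_var, cap):
    """Enumerate V = 0, ..., L/l and return the maximizer of R(V), as in (11)."""
    slots = int(round(L / l))
    return max(range(slots + 1),
               key=lambda V: rate_const(V, h_gain_sq, b0, E, L, l, noise_var, cap))

# Empty battery with unit EH rate: some harvesting is needed before sending.
v_empty = best_ep_length(1.0, 0.0, 1.0, 10.0, 1.0, 1.0, 100.0)
# Charged battery with no harvesting: waiting only wastes transmission time.
v_full = best_ep_length(1.0, 5.0, 0.0, 10.0, 1.0, 1.0, 100.0)
```

Enumeration costs one rate evaluation per slot, which is negligible for typical block lengths but gives no structural insight, whereas the closed form makes explicit how the EP length depends on the channel gain and the initial energy.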

Next, the following proposition formally quantifies the optimal stopping rule and the equation to compute the optimal throughput .

###### Proposition III.5

The optimal stopping rule to solve problem (5) is given by

$$N^* = \min\left\{n \ge 1 : R_n(V^*) \ge \lambda^*\right\}, \tag{17}$$

with given in Proposition III.4. Moreover, satisfies the following equation

$$\sum_{i=1}^{I} Q_i\, E\left[\left(R_i(V^*) - \lambda^*\right)^+\right] = \frac{\lambda^* l}{L}, \tag{18}$$

where the function means for some real number , and is the probability of a successful channel contention at transmitter , defined in Section II. The index for in (18) is removed since are ergodic for .

Proof:

Following (16) in Proposition III.4, the stopping rule has the form

$$N^* = \min\left\{n \ge 1 : (R_n(V^*) - \lambda^*)L \ge S^*(\lambda^*)\right\}. \tag{19}$$

Thus, we can obtain by plugging into (19), which results in (17). Finally, equation (18) can be obtained by plugging into (10) and taking the expectation on both sides.
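Equation (18) lends itself to a simple numerical solution: its left-hand side is non-increasing in the threshold while its right-hand side is increasing, so the root can be bracketed and found by bisection. The sketch below is one possible implementation, not the authors' algorithm; it replaces each expectation by a sample mean over user-supplied draws of the rate at the optimal EP length, and all names are hypothetical.

```python
def solve_lambda(samples, Q, l, L, tol=1e-10):
    """Bisection on (18): sum_i Q_i * E[(R_i - lam)^+] = lam * l / L.

    samples[i] -- draws of the rate of link i at its optimal EP length
    Q[i]       -- probability that link i wins a contention slot
    """
    def gap(lam):
        lhs = sum(q * sum(max(r - lam, 0.0) for r in rs) / len(rs)
                  for q, rs in zip(Q, samples))
        return lhs - lam * l / L

    # gap(0) >= 0 and gap(largest sample) <= 0, so the root is bracketed.
    lo, hi = 0.0, max(max(rs) for rs in samples)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam_star = solve_lambda(samples=[[1.0, 2.0], [0.5, 3.0]], Q=[0.25, 0.25], l=1.0, L=10.0)
```

For this toy input the equation reduces to a linear one on the interval between 1 and 2, with root 50/28.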

###### Remark III.4

Note that the stopping rule (19) implies that each transmitter has the same threshold that is globally determined even when all transmitters have different statistics of the CSI and ESI. The intuition is similar to that in [13]: In order to guarantee the overall system performance, the transmitter with a bad channel condition and a low energy level should “sacrifice” its own reward, while the one with good conditions should transmit more data.

Directly following Propositions III.4 and III.5, the next proposition gives the DOS under the constant EH model.

###### Proposition III.6

After the -th round of CP, it is optimal for the successful transmitter to take one of the following two options:

1. release the channel immediately if (which is equivalent to ), and let all transmitters perform the next round of CP;

2. otherwise, transmit after slots for EH, where is given by Proposition III.4.
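Read together with the threshold rule (17), Proposition III.6 amounts to a single comparison per round of CP. A minimal sketch follows, with hypothetical names and with the rate at the optimal EP length assumed to be computed beforehand:

```python
def constant_eh_action(rate_at_v_star, v_star, lam_star):
    """Decision after one round of CP under the constant EH model.

    Returns ("release", 0) when the best achievable rate is below the
    threshold, and ("transmit", v_star) otherwise, meaning that the
    transmitter harvests for v_star slots and then sends.
    """
    if rate_at_v_star < lam_star:
        return ("release", 0)
    return ("transmit", v_star)

action = constant_eh_action(rate_at_v_star=0.7, v_star=3, lam_star=0.5)
```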

### III-C Optimal Stopping Rule for i.i.d. EH Model

As in the previous subsection, we first consider problem (8) to find the optimal stopping rule $M^*$, and then derive the optimal stopping rule $N^*$.

Under the i.i.d. EH model, has the form in (III-A1). As we mentioned in Section III-A, it is a finite-horizon stopping problem [14, 22], and the solution of problem (III-A1) could be directly generalized in the next proposition.

###### Proposition III.7

For and some , the optimality equation for problem (III-A1) is given by

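$$U_k(\mathcal{F}_k) = \max\left\{(R(k) - \lambda)L,\ -\lambda k l + S^*(\lambda),\ E\left[U_{k+1}(\mathcal{F}_{k+1}) \mid \mathcal{F}_k\right]\right\}, \tag{20}$$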

and the optimal stopping rule has the following form:

$$M^* = \min\left\{0 \le k \le L/l : U_k(\mathcal{F}_k) = \max\left\{(R(k) - \lambda)L,\ -\lambda k l + S^*(\lambda)\right\}\right\}. \tag{21}$$
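The optimality equation (20) can be evaluated by backward induction over the EP slots. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the battery is tracked in whole energy units, `pmf` gives the distribution of units gained per slot (the slot length is already folded in), the channel gain is fixed, a natural logarithm is used, and the values of `lam` and `S` are supplied by the caller. All names are hypothetical.

```python
import math

def ep_backward_induction(h_gain_sq, pmf, lam, S, L, l, noise_var, b_max, delta):
    """Backward induction for (20) with the battery tracked in units of delta.

    pmf maps energy units gained per slot to probabilities (i.i.d. slots).
    Returns U[k][b] and stop[k][b], where stop is True when ending EP at
    slot k with b units stored is optimal.
    """
    K = int(round(L / l))

    def rate(k, b):
        if k >= K:                      # no time left for data
            return 0.0
        snr = h_gain_sq * b * delta / ((L - k * l) * noise_var)
        return (1 - k / K) * math.log(1 + snr)

    U = [[0.0] * (b_max + 1) for _ in range(K + 1)]
    stop = [[True] * (b_max + 1) for _ in range(K + 1)]
    for k in range(K, -1, -1):
        for b in range(b_max + 1):
            # Best of transmitting now and giving up the channel now.
            stop_value = max((rate(k, b) - lam) * L, -lam * k * l + S)
            if k == K:
                U[k][b] = stop_value
                continue
            # Expected value of probing one more slot.
            cont = sum(p * U[k + 1][min(b + e, b_max)] for e, p in pmf.items())
            U[k][b] = max(stop_value, cont)
            stop[k][b] = stop_value >= cont
    return U, stop

# Without any harvesting, waiting never pays off.
U_a, stop_a = ep_backward_induction(1.0, {0: 1.0}, 0.3, 0.0, 4.0, 1.0, 1.0, 3, 1.0)
# With one unit harvested per slot, an empty battery is worth waiting for.
U_b, stop_b = ep_backward_induction(1.0, {1: 1.0}, 0.1, 0.0, 4.0, 1.0, 1.0, 3, 1.0)
```

The table `stop` plays the role of the rule in (21): EP ends at the first slot whose entry is true for the current battery level.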

The stopping rule given in (21) suggests that the EP would stop at $M^*$ by either transmitting or giving up the channel, which also indicates the final decision for the current round of CP. Thus, the optimal stopping rule $N^*$ could be obtained by reorganizing (21).

###### Proposition III.8

The optimal stopping rule of CP under the i.i.d. EH model has the form as:

$$N^* = \min\left\{n \ge 1 : U_{M^*}(\mathcal{F}_{n,M^*}) = (R_n(M^*) - \lambda^*)L\right\}, \tag{22}$$

where is the optimal stopping rule of EP given in Proposition III.7. The optimal throughput satisfies the following equation

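$$\sum_{i=1}^{I} Q_i\, E\left[E\left[\max\left\{R_i(M^*) - \lambda^*,\ -\lambda^* M^* l / L\right\} \mid \mathcal{F}_0\right]^+\right] = \frac{\lambda^* l}{L}. \tag{23}$$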

The proof is analogous to that of the constant EH rate case and is omitted here.

The next proposition, which directly follows from Propositions III.7 and III.8, summarizes the overall DOS under the i.i.d. EH model.

###### Proposition III.9

After the -th round of CP, it is optimal for the successful transmitter to take one of the following two options:

1. if , release the channel immediately and let all transmitters start the next round of CP.

2. otherwise, start EP following the optimal stopping rule given in Proposition III.7.

###### Remark III.5

Propositions III.6 and III.9 summarize the DOS under the constant and i.i.d. EH models, respectively. We observe that under the constant EH model, the EP could be “forecasted” by finding the optimal $V^*$; then the decision of transmission would be made before starting EP. On the contrary, when the EH rates are i.i.d., such a decision can only be made step by step during the EP.

## IV Battery Dynamics

In this section, we validate the assumption made in Section III that the energy level at each transmitter is stationary with some distribution. First, we show that under the constant EH model, the energy level stored at each transmitter forms a Markov chain over time, although the state transition probabilities for different transmitters are coupled together. We then propose an iterative algorithm to compute the corresponding steady-state distribution, which is shown to converge to the global optimal point. Finally, we extend our analysis to the case with the i.i.d. EH rate model.

### IV-A Battery with Constant EH Model

Note that after CP, if the successful transmitter releases the channel immediately, then the next round of CP starts, and the battery continues to be charged. If the transmitter starts the transmission, its energy level becomes zero at the end of the transmission block according to Section II, while all other transmitters keep harvesting energy during this block. Thus, the energy level transition over the transmission block can be determined. To simplify our analysis, the transmission block is treated as one time slot with length for the purpose of counting battery state transitions. In addition, we assume that the battery works in half-duplex mode, i.e., it cannot be charged while the transmitter transmits data.

For transmitter with EH rate , , the set of its energy states is given by , where is the slot index. The state transition is depicted in Fig. 2. In addition, we denote the distribution of the energy level for transmitter at time as .

Next, we consider the state transition probability. Suppose that transmitter $i$ is at energy level $u_i$. Three events may happen at time slot $t$:

(i) It occupies the channel and transmits. According to Section II, transmitter $i$ consumes all of its energy for the transmission and moves to energy level 0 afterwards. Thus, the transition probability is given by

$$p^i_{u_i,0} = Q_i \, p^i_{\mathrm{tr}}(u_i), \tag{24}$$

where $Q_i$ is the probability that the $i$-th transmitter occupies the channel, and $p^i_{\mathrm{tr}}(u_i)$ is the probability that it transmits with the energy level $u_i$. Furthermore, according to (17), $p^i_{\mathrm{tr}}(u_i)$ can be computed as

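$$p^i_{\mathrm{tr}}(u_i) = P\left\{ R_i(V^*) \ge \lambda^* \right\} = P\left\{ \log\left( 1 + |h_i|^2 \frac{u_i + V^* l E_i}{(L/l - V^*)\, l\, \sigma_i^2} \right) \ge \frac{\lambda^*}{1 - \frac{V^* l}{L}} \right\}, \tag{25}$$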

where $V^*$ is defined by (14) in Proposition III.4. Note that in (25), $|h_i|^2$ is the only random variable and its distribution is known.

(ii) Other transmitters occupy the channel and transmit. If any of the other transmitters sends data, transmitter $i$ will harvest units of energy during this period, and then attain level $v_i$. Suppose that the $j$-th transmitter transmits. Similar to the first case, the probability of transmission performed by the $j$-th transmitter is given by , where and thus . Since there are in total transmitters, the transition probability for transmitter $i$ from level $u_i$ to $v_i$ is given by

$$p^i_{u_i,v_i} = \sum_{j \ne i} Q_j \sum_{b=0}^{B_{\max}} \pi^j_{t,b} \, p^j_{\mathrm{tr}}(b E_j l). \tag{26}$$

(iii) No transmission happens. In this case, transmitter $i$ just harvests units of energy and goes into state $w_i$. The probability of this event can be directly obtained as

$$p^i_{u_i,w_i} = 1 - p^i_{u_i,0} - p^i_{u_i,v_i}. \tag{27}$$

Note that when , the transition probability is just given by

$$p^i_{u_i,\tilde{u}_i} = p^i_{u_i,v_i} + p^i_{u_i,w_i} = p^i_{u_i,v_i} + 1 - p^i_{u_i,0} - p^i_{u_i,v_i} = 1 - p^i_{u_i,0}. \tag{28}$$
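As an illustration of (25), the transmission probability has a closed form when the channel power gain is exponentially distributed, which is the distribution used in the simulations of Section VI. The sketch below assumes a natural logarithm in the rate expression and an exponential gain with mean `mean_gain`; the function name and argument names are hypothetical and not taken from the paper.

```python
import math

def p_transmit(u, V, E, l, L, sigma2, lam, mean_gain):
    """Closed form of (25) under an exponential channel power gain (assumption).

    u: stored energy, V: number of EP slots, E: EH rate, l: slot length,
    L: block length, sigma2: noise variance, lam: throughput threshold.
    """
    tx_time = L - V * l                      # equals (L/l - V) * l
    if tx_time <= 0:
        raise ValueError("EP must leave a positive transmission time")
    energy = u + V * l * E                   # stored plus harvested energy
    if energy <= 0:
        return 0.0 if lam > 0 else 1.0
    snr_per_unit_gain = energy / (tx_time * sigma2)
    rate_needed = lam / (1.0 - V * l / L)    # right-hand side of (25)
    # The event log(1 + g * snr) >= rate_needed is g >= g_min.
    g_min = max(0.0, math.expm1(rate_needed) / snr_per_unit_gain)
    # For g ~ Exp(mean_gain): P{g >= g_min} = exp(-g_min / mean_gain).
    return math.exp(-g_min / mean_gain)
```

For other fading distributions the same threshold `g_min` applies, and only the final tail probability changes.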

In this way, we can compute all for , where and . The transition probability matrix is nothing but with dimension . Obviously, is a stochastic matrix, i.e., a square matrix in which all elements are nonnegative and each row sums to 1. However, depends on since depends on the state distribution for all . Therefore, is a non-homogeneous Markov chain, whose state evolution is given by

$$\Pi^i_{t+1} = \Pi^i_t P^i_t, \quad t \ge 0. \tag{29}$$

We propose Algorithm I, which is summarized in Table I, to compute the steady-state distribution for all transmitters. Here, the infinity norm is applied, which is defined as for .
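The coupled update in (24), (26), (27), and (29) can be sketched as a fixed-point iteration. This is only a sketch in the spirit of Algorithm I, whose exact steps are given in Table I and are not reproduced here. It assumes that battery levels are indexed 0 to $B_{\max}$, that the per-level transmission probabilities `p_tr[i][b]` are already available, and that a transmitter gains `gain_busy` levels while another transmitter uses the channel and `gain_idle` levels otherwise; these two parameters and all function names are hypothetical.

```python
import numpy as np

def transition_matrix(i, Q, p_tr, pi, gain_busy, gain_idle):
    """Transition matrix of transmitter i given the current distributions pi."""
    n = len(p_tr[i])                          # levels 0..B_max
    # Probability that another transmitter wins the channel and transmits, cf. (26).
    p_other = sum(Q[j] * float(np.dot(pi[j], p_tr[j]))
                  for j in range(len(Q)) if j != i)
    P = np.zeros((n, n))
    for u in range(n):
        p_self = Q[i] * p_tr[i][u]            # cf. (24): transmit, drop to level 0
        v = min(u + gain_busy, n - 1)         # level after another user's block
        w = min(u + gain_idle, n - 1)         # level after an idle slot
        P[u, 0] += p_self
        P[u, v] += p_other
        P[u, w] += 1.0 - p_self - p_other     # cf. (27); merges with v if v == w
    return P

def steady_state(Q, p_tr, gain_busy=2, gain_idle=1, tol=1e-10, max_iter=10000):
    """Iterate (29) for all transmitters until the infinity-norm change is small."""
    p_tr = [np.asarray(p, dtype=float) for p in p_tr]
    pi = [np.full(len(p), 1.0 / len(p)) for p in p_tr]   # uniform start
    for _ in range(max_iter):
        new_pi = [pi[i] @ transition_matrix(i, Q, p_tr, pi, gain_busy, gain_idle)
                  for i in range(len(Q))]
        change = max(np.max(np.abs(a - b)) for a, b in zip(new_pi, pi))
        pi = new_pi
        if change < tol:
            break
    return pi
```

Each transmitter's update only needs the scalar `p_other` from the others, so the per-transmitter updates inside one iteration are independent and could be computed in parallel.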

###### Proposition IV.1

For any given initial state distribution , that is generated by Algorithm I, converges to a unique steady-state distribution for all .

The proof is given in Appendix D.

###### Remark IV.1

The steady-state distribution for all transmitters can be obtained by the iterative computation over the “super” Markov system as well, which is constructed in Appendix D. However, this is not as efficient as Algorithm I. From the computational complexity point of view, suppose that each transmitter has energy levels, and there are transmitters in total. The number of the states in the “super” Markov chain is . If there is only one processor, the floating-point calculation for one iteration of the state distribution for the “super” Markov chain is approximately on the order of . On the contrary, by using Algorithm I, (26) requires calculations, and updating requires about calculations according to (27). In addition, requires calculations. Overall, one iteration for all transmitters is approximately on the order of , which is more efficient than the case for the “super” Markov chain especially when and are large. Moreover, our algorithm can also be operated in a parallel way, i.e., computing for at the same time over different cores.

### IV-B Battery with i.i.d. EH Model

The argument that the battery state evolves as a Markov process in the i.i.d. case is analogous to that of the constant case in the previous subsection. The main difference is that the transmission probability defined by (25) changes and needs to be rederived under the i.i.d. EH rate model.

We now consider the calculation of . When transmitter grabs the channel with energy level , according to the stopping rule (III.7) and (22), the transmitter checks the condition . If it is true, the transmitter starts EP until the -th slot and transmits when according to (22). Specifically, given , the transmitter continues EP at slot for , which is equivalent to , where . Then, at slot , the transmitter stops EP and transmits when . Thus, we obtain

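$$p^i_{\mathrm{tr}}(u_i) = \int_0^{\infty} P\left\{ \text{Transmits at } M^* \mid U_0(u_i, d|h_i|^2) \ge 0 \right\} \cdot P\left\{ U_0(u_i, d|h_i|^2) \ge 0 \right\} f(|h_i|^2) \, d|h_i|^2, \tag{30}$$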

where $f(|h_i|^2)$ is the probability density function (PDF) of the channel power gain. The probability can be computed based on Proposition III.7. For notational simplicity, we omit the condition , and the first term in the integral of (30) can be expanded as

 (31)

where , and . Note that in , and are random since they are functions of , where are i.i.d. with a known distribution and . Thus, can be computed. Using a similar argument, it is easy to see that can be computed as well. Therefore, the probability given in (31) is computable. Overall, we obtain $p^i_{\mathrm{tr}}(u_i)$ after plugging (31) into (30).

After obtaining $p^i_{\mathrm{tr}}(u_i)$, the transition probability , where , and , can be calculated as in the constant EH rate case. In addition, Algorithm I and Proposition IV.1 can be modified to suit the i.i.d. EH model; the details are omitted in this paper.

## V Computation of the Optimal Throughput

The optimal throughput hinges upon the optimal stopping rules in (17) and (22). Thus, to fully obtain the optimal scheduling policy of the proposed DOS, we next turn our attention to computing the value of $\lambda^*$.

By Propositions III.5 and III.8, $\lambda^*$ can be obtained by solving (18) or (23) under the constant or i.i.d. EH model, respectively. Next, we briefly explain why there exists a $\lambda^*$ such that equation (18) or (23) holds, and how to search for it. For brevity, we focus on the constant EH rate case.

Note that is a function of random variables and ; we could calculate the expectation on the left-hand side of (18) for each given . Such expectation requires the distribution of , i.e., the steady-state distribution , which could be approximately computed as shown in Section IV. In addition, for a given , an upper bound of this expectation can be obtained by fixing . As increases from zero to infinity, this upper bound decreases to zero at some . Since the right-hand side of (18) is strictly increasing over within the range , there exists at least one satisfying (18). Therefore, an exhaustive one-dimension search can be applied to obtain the optimal throughput over the range . Note that during each iteration of the exhaustive search, Algorithm I (given in Section IV) is used to obtain the steady-state distribution for a given , and then we check whether equation (18) or (23) holds. Finally, $\lambda^*$ should be the largest one in that makes equation (18) or (23) hold.
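The one-dimension search described above can be sketched as a grid scan followed by bisection on the difference between the two sides of the optimality equation. In the sketch below, `g` stands for "left-hand side minus right-hand side" as a function of the threshold; in the actual framework, each evaluation of `g` would involve recomputing the steady-state distribution, which is not modeled here. The toy function in the usage example is a stand-in chosen only because its left-hand side, $e^{-\lambda}$, is the closed form of $\mathbb{E}[(R-\lambda)^+]$ for a unit-mean exponential $R$; the names `largest_root` and `g` are hypothetical.

```python
import math

def largest_root(g, lo, hi, grid=1000, iters=60):
    """Return the largest root of g on [lo, hi] found by scan plus bisection.

    Returns None if no sign change (or exact zero) is found on the grid.
    """
    xs = [lo + (hi - lo) * k / grid for k in range(grid + 1)]
    root = None
    for a, b in zip(xs[:-1], xs[1:]):
        ga, gb = g(a), g(b)
        if ga == 0.0:
            root = a
        elif ga * gb < 0.0:
            left, right = a, b
            for _ in range(iters):            # plain bisection on [a, b]
                mid = 0.5 * (left + right)
                if g(left) * g(mid) <= 0.0:
                    right = mid
                else:
                    left = mid
            root = 0.5 * (left + right)
    if g(xs[-1]) == 0.0:
        root = xs[-1]
    return root                               # later intervals overwrite earlier ones

# Toy stand-in: exp(-lam) = lam * l / L with l / L = 0.1 (illustrative values only).
toy_g = lambda lam: math.exp(-lam) - 0.1 * lam
lam_star = largest_root(toy_g, 0.0, 10.0)
```

Keeping the last root found implements the "largest one" selection, at the cost of scanning the whole range even after a root has been located.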

In summary, the above search can characterize the optimal stopping rules given in Propositions III.5 and III.8, which completes the proposed DOS framework.

## VI Numerical Results

In this section, we first validate Propositions III.5 and III.8 to show that the optimal throughput exists and can be found via one-dimension search. Second, we investigate the throughput gain of our proposed DOS with two-stage probing over the best-effort delivery method, where the data is transmitted whenever the channel contention is successful. Note that such a method can be realized in the proposed DOS framework by fixing and setting in (17) and (22). Let $\lambda_0$ denote the throughput obtained by the best-effort scheme, which can be calculated as

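$$\lambda_0 = \frac{\sum_{i=1}^{I} \frac{Q_i}{Q} \, \mathbb{E}\left[ L \log\left( 1 + \frac{|h^i_n|^2 B^i_{n,0}}{L \sigma^2} \right) \right]}{\frac{l}{Q} + L}. \tag{32}$$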

In general, a typical button cell battery has a capacity of 150 mAh with an end-point voltage of 0.9 V, which is equal to 0.15 Ah × 3600 s/h × 0.9 V = 486 J. A thin-film rechargeable battery can offer 50 µAh at 3.3 V, which is equal to 0.594 J. Since a typical transmission time interval is on the time scale of milliseconds, we let the energy unit be  J in the simulation. Accordingly, we set the capacity of the battery , which falls between the capacity of a thin-film battery and that of a button cell battery. Also, the current commercial solar panel can provide power from 1 W to about 400 W, which is equivalent to ms ms. According to this fact, in our simulation, we let the EH rate vary within the range . In addition, the channel gains are i.i.d. across different links and the channel power gains follow an exponential distribution with mean 5. The variance of the noise is set to be 10 mW. The length of one time slot is unified as  ms and the length of a transmission block is .
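The battery energies quoted above follow from multiplying capacity, the 3600 s/h conversion, and voltage. The short check below reproduces both figures; the 0.594 J value corresponds to a capacity of 50 µAh at 3.3 V. The helper name is hypothetical.

```python
import math

def battery_energy_joules(capacity_ah, voltage_v):
    """Energy in joules of a battery with the given capacity (Ah) and voltage (V)."""
    return capacity_ah * 3600.0 * voltage_v   # Ah -> coulombs, then times volts

button_cell = battery_energy_joules(150e-3, 0.9)   # 150 mAh at 0.9 V
thin_film = battery_energy_joules(50e-6, 3.3)      # 50 uAh at 3.3 V

# Both values match the figures in the text up to floating-point rounding.
assert math.isclose(button_cell, 486.0)
assert math.isclose(thin_film, 0.594)
```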

#### VI-1 Validation of Propositions III.5 and III.8

In Fig. 3, we illustrate the variation of the average throughput as the “threshold” changes. Without loss of generality, we first consider a homogeneous network with user pairs, i.e., all pairs are identical. For the constant EH model, the EH rate is set to be for all transmitters. For the i.i.d. EH case, we choose the Bernoulli model [25, 26]: The EH rate is either zero or of a finite value with probability 0.5. In our simulation, we consider three cases for the mean values in i.i.d. EH model: , , and .

First, we observe in Fig. 3 that as increases from zero, the average throughput first increases and then decreases. Then, the optimal point is achieved at , where the average throughput is at its apex that is also approximately of the same value as . Taking the case of i.i.d. EH model with mean as an example in Fig. 3, the value of the optimal throughput is approximately 4.5, and the actual optimal average throughput is about 4.5 as well. Therefore, this observation validates Propositions III.5 and III.8 and the discussion in Section V. Second, we observe that the average throughput is almost the same when the mean of the EH rate in the i.i.d. EH model is equal to the EH rate in the constant EH model. Thus, the type of EH rate model does not directly determine the average throughput performance.

#### VI-2 Throughput Gain

We use to denote the throughput where only EP is adopted, i.e., setting and , and to denote the throughput where only CP is adopted, i.e., setting and . Thus, the throughput gains are defined as:

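$$\begin{cases} G_{\mathrm{EP}} = \dfrac{\lambda_{\mathrm{EP}} - \lambda_0}{\lambda_0}, & \text{gain from EP;} \\[2ex] G_{\mathrm{CP}} = \dfrac{\lambda_{\mathrm{CP}} - \lambda_0}{\lambda_0}, & \text{gain from CP;} \\[2ex] G_{\mathrm{DOS}} = \dfrac{\lambda^* - \lambda_0}{\lambda_0}, & \text{gain from CP + EP.} \end{cases} \tag{33}$$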
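All three gains in (33) are relative improvements over the same best-effort baseline, so they can be computed with one helper. The throughput values in the usage lines are placeholders chosen for illustration, not results from the paper, and the function name is hypothetical.

```python
def relative_gain(throughput, baseline):
    """Relative throughput gain over the best-effort baseline, as in (33)."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (throughput - baseline) / baseline

# Placeholder throughputs (illustrative only).
lam_0, lam_ep, lam_cp, lam_dos = 2.0, 2.4, 2.6, 3.0
gains = {
    "EP": relative_gain(lam_ep, lam_0),
    "CP": relative_gain(lam_cp, lam_0),
    "DOS": relative_gain(lam_dos, lam_0),
}
```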

In Fig. 4, we evaluate the above throughput gains for the network with user pairs. Recall from Section II that our analysis is applicable for . Since the constant and i.i.d. EH rate models could attain the same throughput performance over , we only consider the constant EH model in this case. Particularly, we study a heterogeneous case where the first two transmitters have the same EH rates , while the EH rate of the third transmitter varies from to .

We observe in Fig. 4 that as the EH rate of the third transmitter increases, almost keeps constant and can achieve a gain about 19%. It implies that after the channel contention, the successful transmitter with any EH rate could do EP to enhance its average transmission rate over the transmission block. Thus, the ESI of the successful transmitter does not have obvious impact on the throughput. However, we notice that achieves its maximum when all transmitters are identical (with the same EH rate ) and then decreases slowly as the EH rate of the third transmitter increases. The intuition is that when the difference among EH rates becomes larger, the stopping rule of CP will more likely let the transmitter with relatively low energy level to give up the channel, which results in a longer time on CP and then the throughput gain is lower than the case when all transmitters are identical. Regarding , our proposed DOS with two-stage probing can achieve the highest throughput gain among three schemes. It is worth noticing that as the EH rate of the third transmitter increases, the efficiency of DOS becomes more apparent, although slowly, than the scheme with pure CP, which implies that the second stage probing brings more benefits. Our intuition is that a larger differen