
# Age-Minimal Transmission for Energy Harvesting Sensors with Finite Batteries: Online Policies

Ahmed Arafa, Jing Yang, Sennur Ulukus, and H. Vincent Poor

Ahmed Arafa and H. Vincent Poor are with the Electrical Engineering Department, Princeton University, NJ 08544. Jing Yang is with the School of Electrical Engineering and Computer Science, The Pennsylvania State University, PA 16802. Sennur Ulukus is with the Department of Electrical and Computer Engineering, University of Maryland College Park, MD 20742. Emails: aarafa@princeton.edu, yangjing@psu.edu, ulukus@umd.edu, poor@princeton.edu. This work was presented in part at the 2018 Information Theory and Applications Workshop [1], and at the 2018 International Conference on Communications [2].
###### Abstract

An energy-harvesting sensor node that is sending status updates to a destination is considered. The sensor is equipped with a battery of finite size to save its incoming energy, and consumes one unit of energy per status update transmission, which is delivered to the destination instantly over an error-free channel. The setting is online: the harvested energy is revealed to the sensor causally over time, after it arrives, and the goal is to design status update transmission times (a policy) such that the long term average age of information (AoI) is minimized. The AoI is defined as the time elapsed since the latest update has reached the destination. Two energy arrival models are considered: a random battery recharge (RBR) model, and an incremental battery recharge (IBR) model. In both models, energy arrives according to a Poisson process with unit rate, with values that completely fill up the battery in the RBR model, and with values that fill up the battery incrementally in a unit-by-unit fashion in the IBR model. The key approach to characterizing the optimal status update policy for both models is showing the optimality of renewal policies, in which the inter-update times follow a renewal process in a certain manner that depends on the energy arrival model and the battery size. It is then shown that the optimal renewal policy has an energy-dependent threshold structure, in which the sensor sends a status update only if the AoI grows above a certain threshold that depends on the energy available in its battery. For both the random and the incremental battery recharge models, the optimal energy-dependent thresholds are characterized explicitly, i.e., in closed form, in terms of the optimal long term average AoI. It is also shown that the optimal thresholds are monotonically decreasing in the energy available in the battery, and that the smallest threshold, which comes into effect when the battery is full, is equal to the optimal long term average AoI.


## 1 Introduction

Real-time sensing applications, in which time-sensitive measurement status updates of some physical phenomenon are sent to a destination (receiver), call for careful transmission scheduling policies under proper metrics that assess the updates' timeliness and freshness. The age of information (AoI) metric has recently acquired attention as a suitable candidate for such a purpose. The AoI is defined as the time elapsed since the latest measurement update has reached the destination, and hence it captures delay from the destination's perspective. When sensors (transmitters) rely on energy harvested from nature to transmit their status updates, they cannot transmit all the time; otherwise they would run out of energy and risk having overly stale status updates at the destination. Therefore, the fundamental question of how to optimally manage the harvested energy to send timely status updates needs to be addressed. In this work, we provide an answer to this question by deriving optimal status update policies for energy harvesting sensors with finite batteries in an online setting where the harvested energy is only revealed causally over time.

The online energy harvesting communication literature, in which energy arrival information is only revealed causally over time, is mainly studied via Markov decision process modeling and dynamic programming techniques, see, e.g., [3, 4, 5, 6, 7, 8, 9], and also via specific analyses of the involved stochastic processes, as in [10, 11, 12]. A different approach is introduced in [13], and then extended in [14, 15, 16, 17, 18, 19, 20, 21, 22, 23] for various system models, in which an online fixed fraction policy, where transmitters use a fixed fraction of their available energy for transmission in each time slot, is shown to perform within a constant gap from the optimal online policy. Such fixed fraction online policies are simpler than the usual online policies introduced in the literature, and have provable near-optimal performance. In the online setting of this work, we also investigate a relatively simple online policy, and show its exact optimality.

The AoI metric has been studied in the literature under various settings; mainly through modeling the update system as a queuing system and analyzing the long term average AoI, and through using optimization tools to characterize optimal status update policies, see, e.g., [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36], and also the recent survey in [37]. In this work, we employ tools from optimization theory to devise age-minimal online status update policies in systems where sensors are energy-constrained, and rely on energy harvested from nature to transmit status updates. Some related works to this problem are summarized next.

AoI minimization in energy harvesting communications has been recently considered in [38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48] under various service time (time for the update to take effect), battery capacity, and channel assumptions. With the exception of [41], an underlying assumption in these works is that energy expenditure is normalized, i.e., sending one status update consumes one energy unit. References [38, 39] consider a sensor with infinite battery, with [38] focusing on online policies under stochastic service times, and [39] focusing on both offline and online policies with zero service times, i.e., with updates reaching the destination instantly. The offline policy in [39] is extended to non-zero, but fixed, service times in [40] for both single and multi-hop settings, and in [41] for energy-controlled variable service times. The online policy in [39] is analyzed through a dynamic programming approach in a discretized time setting, and is shown to have a threshold structure, i.e., an update is sent only if the age grows above a certain threshold and energy is available for transmission. Motivated by such results for the infinite battery case, [42] then studies the performance of online threshold policies for the finite battery case under zero service times, yet with no proof of optimality. Reference [43] proves the optimality of online threshold policies under zero service times for the special case of a unit-sized battery, via tools from renewal theory. It also shows the optimality of best effort online policies, where updates are sent over uniformly-spaced time intervals if energy is available, for the infinite battery case. Such a best effort online policy is also shown to be optimal, for the infinite battery case, when updates are subject to erasures in [44, 45]; with no erasure error feedback in [44] and with perfect erasure error feedback in [45].
Under the same system model of [44], reference [46] analyzes the performances of the best effort online policy and the save-and-transmit online policy, where the sensor saves some energy in its battery before attempting transmission, under coding to combat channel erasures. A slightly different system model is considered in [47], in which status updates are externally arriving, i.e., their measurement times are not controlled by the sensor. With a finite battery, and stochastic service times, reference [47] employs tools from stochastic hybrid systems to analyze the long term average AoI. An interesting approach is followed in [48] where the idea of sending extra information, on top of the measurement status updates, is introduced and analyzed for unit batteries and zero service times.

In this paper, we show the optimality of online threshold policies under a finite battery setting, with zero service times. We consider two energy arrival (recharging) models, namely, a random battery recharge (RBR) model, and an incremental battery recharge (IBR) model. In both models, energy arrives according to a Poisson process with unit rate, yet with the following difference: in the RBR model, energy arrivals completely fill up the battery, while in the IBR model, energy arrivals fill up the battery incrementally in a unit-by-unit fashion. We invoke tools from renewal theory to show that the optimal status update policy, the one minimizing the long term average AoI, is such that specific update times, depending on the recharging model, follow a renewal process with independent inter-update delays. Then, we follow a Lagrangian approach to show that the optimal renewal-type policy, for both recharging models, has an energy-dependent threshold structure, in which an update is sent only if the AoI grows above a certain threshold that depends on the energy available in the battery, the specifics of which vary according to the recharging model. Our approach enables characterizing the optimal thresholds explicitly, i.e., in closed-form, in terms of the optimal long term average AoI, which is in turn found by a bisection search over an interval that is strictly smaller than the unit interval. We also show that, for both recharging models, the optimal thresholds are monotonically decreasing in the available energy, i.e., the higher the available energy the smaller the corresponding threshold, and that the smallest threshold, corresponding to a full battery, is equal to the optimal long term average AoI.

We acknowledge an independent and concurrent work [49] that considers the same setting as the IBR model considered in this work, and also shows the optimality of online threshold policies. There, tools from the theory of optimal stopping, from the stochastic control literature, are invoked to show this result, along with some structural properties, and the optimal thresholds are found numerically. Different from [49], however, and as mentioned above, the approach followed in this work for the IBR model, namely, proving the renewal structure of the optimal policy followed by the Lagrangian approach, allows characterizing the optimal energy-dependent thresholds in closed form in terms of the optimal AoI.

## 2 System Model and Problem Formulation

We consider a sensor node that collects measurements from a physical phenomenon and sends updates to a destination over time. The sensor relies on energy harvested from nature to acquire and send its measurement updates, and is equipped with a battery of finite size $B$ to save its incoming energy. The sensor consumes one unit of energy to measure and send out an update to the destination. We assume that updates are sent over an error-free link with negligible transmission times as in [39, 42, 43]. Energy arrives (is harvested) according to a Poisson process of rate 1. Our setting is online: energy arrival times are revealed causally over time, and only the arrival rate is known a priori. We consider two models for the amount of harvested energy at each arrival time. The first model, denoted random battery recharge (RBR), is when energy arrives in amounts of $B$ units. This models, e.g., situations where the battery size is relatively small with respect to the amounts of harvested energy, and hence energy arrivals fully recharge the battery. We note that this RBR model has been previously considered in the online scheduling literature in [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23] and in the information-theoretic approach considered in [50]. The second model, denoted incremental battery recharge (IBR), is when energy arrives in single units, i.e., when the battery is recharged incrementally in a unit-by-unit fashion. We mathematically illustrate the difference between the two models below.

Let $s_i$ denote the time at which the sensor acquires (and transmits) the $i$th measurement update, and let $\mathcal{E}(t)$ denote the amount of energy in the battery at time $t$. We then have the following energy causality constraint [51]:

$$\mathcal{E}(s_i^-) \geq 1, \quad \forall i \tag{1}$$

We assume that we begin with an empty battery at time 0. For the RBR model, the battery evolves as follows over time:

$$\mathcal{E}(s_i^-) = \min\left\{\mathcal{E}(s_{i-1}^-) - 1 + B\cdot\mathcal{A}(x_i),\; B\right\} \tag{2}$$

where $x_i \triangleq s_i - s_{i-1}$, and $\mathcal{A}(x_i)$ denotes the number of energy arrivals in $[s_{i-1}, s_i)$. Note that $\mathcal{A}(x_i)$ is a Poisson random variable with parameter $x_i$. We denote by $\mathcal{F}_B$ the set of feasible transmission times $\{s_i\}$ described by (1) and (2) in addition to an empty battery at time 0, i.e., $\mathcal{E}(0) = 0$. Similarly, for the IBR model, the battery evolves as follows over time:

$$\mathcal{E}(s_i^-) = \min\left\{\mathcal{E}(s_{i-1}^-) - 1 + \mathcal{A}(x_i),\; B\right\} \tag{3}$$

We denote by $\mathcal{F}$ the set of feasible transmission times $\{s_i\}$ described by (1) and (3) in addition to an empty battery at time 0, i.e., $\mathcal{E}(0) = 0$.
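The difference between the recursions (2) and (3) can be sketched in a few lines of Python. This is only an illustration of the two update rules; the function name `battery_before_update`, its arguments, and the string labels `"RBR"` and `"IBR"` are hypothetical choices made for this sketch.

```python
def battery_before_update(prev_level, arrivals, B, model):
    """Battery level just before the current update, following (2) or (3).

    prev_level: level just before the previous update (must satisfy (1)).
    arrivals:   number of energy arrivals since the previous update.
    B:          battery size.
    model:      "RBR" (each arrival refills the battery) or
                "IBR" (each arrival adds one unit); hypothetical labels.
    """
    if prev_level < 1:
        raise ValueError("energy causality (1): an update needs one unit")
    if model not in ("RBR", "IBR"):
        raise ValueError("model must be 'RBR' or 'IBR'")
    gain = B * arrivals if model == "RBR" else arrivals
    # One unit is spent on the previous update; the battery caps at B.
    return min(prev_level - 1 + gain, B)


if __name__ == "__main__":
    B = 4
    for model in ("RBR", "IBR"):
        # Same arrival history for both models: one arrival, then none.
        level = battery_before_update(3, 1, B, model)
        print(model, level, battery_before_update(level, 0, B, model))
```

With the same arrival history, a single arrival refills the battery to $B$ under RBR, while under IBR it only offsets the unit spent on the previous update.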

For either recharging model, the goal is to choose an online feasible transmission policy $\{s_i\}$ (or equivalently $\{x_i\}$) such that the long term average of the AoI experienced at the destination is minimized. The AoI is defined as the time elapsed since the latest update has reached the destination, which is formally defined as follows at time $t$:

$$a(t) \triangleq t - u(t) \tag{4}$$

where $u(t)$ is the time stamp of the latest update received before time $t$. Let $n(t)$ denote the total number of updates sent by time $t$. We are interested in minimizing the area under the age curve representing the total cumulative AoI, see Fig. 1 for a possible sample path. At time $t$, this area is given by

$$r(t) \triangleq \frac{1}{2}\sum_{i=1}^{n(t)} x_i^2 + \frac{1}{2}\left(t - s_{n(t)}\right)^2 \tag{5}$$

and therefore the goal is to characterize the following quantity for the RBR model:

$$\rho_B \triangleq \min_{\boldsymbol{x}\in\mathcal{F}_B} \limsup_{T\rightarrow\infty} \frac{1}{T}\mathbb{E}\left[r(T)\right] \tag{6}$$

representing the long term average AoI, where $\mathbb{E}(\cdot)$ is the expectation operator. Similarly, for the IBR model, the goal is to characterize

$$\rho \triangleq \min_{\boldsymbol{x}\in\mathcal{F}} \limsup_{T\rightarrow\infty} \frac{1}{T}\mathbb{E}\left[r(T)\right] \tag{7}$$
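As a concrete reading of (5), the following Python sketch computes the cumulative age for a given list of update times. It assumes, for illustration only, that the age is zero at time 0 and that the update times are sorted; the function names are hypothetical.

```python
def cumulative_age(update_times, t):
    """Area under the age curve on [0, t], following (5).

    Assumes the age is zero at time 0 (an assumption of this sketch) and
    that update_times is sorted in increasing order.
    """
    area, prev = 0.0, 0.0
    for s in update_times:
        if s > t:
            break
        x = s - prev              # inter-update time x_i
        area += 0.5 * x * x       # triangle closed by the i-th update
        prev = s
    area += 0.5 * (t - prev) ** 2   # unfinished triangle after s_{n(t)}
    return area


def average_age(update_times, t):
    """Time-average AoI over [0, t], i.e. r(t) / t."""
    return cumulative_age(update_times, t) / t
```

For example, updates at times 1 and 3 observed up to $t = 4$ give three triangles of areas $0.5$, $2$, and $0.5$, so `cumulative_age([1.0, 3.0], 4.0)` evaluates to `3.0`.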

We discuss problems (6) and (7) in Sections 4 and 5, respectively. In the next section, we discuss the special case of $B=1$, in which the two models are equivalent.

## 3 Unit Battery Case: A Review

In this section, we review the $B=1$ case studied in [43]. Observe that for $B=1$ the RBR and IBR models coincide, and problems (6) and (7) are identical. In studying this problem, reference [43] first shows that renewal policies, i.e., policies with update times forming a renewal process, outperform any other uniformly bounded policy, where uniformly bounded policies are defined as follows (see [43, Definition 3]).

###### Definition 1 (Uniformly Bounded Policy)

An online policy is uniformly bounded if its inter-update times, as a function of the energy arrival times, have a bounded second moment.

Then, reference [43] shows that the optimal renewal policy is a threshold policy, where an update is sent only if the AoI grows above a certain threshold. We review this latter result in this section.

Let $\tau_i$ denote the time until the next energy arrival since the $(i-1)$th update time, $s_{i-1}$. Since the arrival process is Poisson with rate 1, the $\tau_i$'s are independent and identically distributed (i.i.d.) exponential random variables with parameter 1. Under renewal policies, the $i$th inter-update time $x_i$ should not depend on the events before $s_{i-1}$; it can only be a function of $\tau_i$. Moreover, under any feasible policy, $x_i(\tau_i)$ cannot be smaller than $\tau_i$, since the battery is empty at $s_{i-1}$. Next, note that whenever an update occurs, both the battery and the age drop to 0, and hence the system resets. This constitutes a renewal event, and therefore using the strong law of large numbers of renewal processes [52], problem (6) (or equivalently problem (7)) reduces to

$$\rho_1 = \min_{x(\tau)\geq\tau} \frac{\mathbb{E}\left[x(\tau)^2\right]}{2\,\mathbb{E}\left[x(\tau)\right]} \tag{8}$$

where the expectation is over the exponential random variable $\tau$.

In order to make problem (8) more tractable to solve, we introduce the following parameterized problem:

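$$p_1(\lambda) \triangleq \min_{x(\tau)\geq\tau} \frac{1}{2}\mathbb{E}\left[x(\tau)^2\right] - \lambda\,\mathbb{E}\left[x(\tau)\right] \tag{9}$$

This approach was discussed in [53]. We now have the following lemma, which is a restatement of the results in [53], and we provide a proof for completeness.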

###### Lemma 1

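$p_1(\lambda)$ is decreasing in $\lambda$, and the optimal solution of problem (8) is given by $\lambda^*$ that solves $p_1(\lambda^*)=0$.

Proof:  Let $x^{(j)}$ denote the solution of problem (9) for $\lambda=\lambda_j$, $j=1,2$. Now for some $\lambda_2 > \lambda_1$, one can write

$$\begin{aligned} p_1(\lambda_1) &= \frac{1}{2}\mathbb{E}\left[\left(x^{(1)}\right)^2\right] - \lambda_1\mathbb{E}\left[x^{(1)}\right] \\ &> \frac{1}{2}\mathbb{E}\left[\left(x^{(1)}\right)^2\right] - \lambda_2\mathbb{E}\left[x^{(1)}\right] \\ &\geq p_1(\lambda_2) \end{aligned} \tag{10}$$

where the last inequality follows since $x^{(1)}$ is also feasible in problem (9) for $\lambda=\lambda_2$.

Next, note that both problems (9) and (8) have the same feasible set. In addition, if $p_1(\lambda)=0$, then the objective function of (8), evaluated at the corresponding solution of (9), is equal to $\lambda$. Hence, the objective function of (8) is minimized by finding the minimum $\lambda\geq 0$ such that $p_1(\lambda)=0$. Finally, by the first part of the lemma, there can only be one such $\lambda$, which we denote $\lambda^*$.

By Lemma 1, one can simply use a bisection method to find $\lambda^*$ that solves $p_1(\lambda^*)=0$. This $\lambda^*$ certainly exists since $p_1(0)>0$ and $\lim_{\lambda\rightarrow\infty}p_1(\lambda)=-\infty$. We focus on problem (9) in the rest of this section, for which we introduce the following Lagrangian [54]: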

$$\mathcal{L} = \frac{1}{2}\int_0^\infty x^2(\tau)e^{-\tau}d\tau - \lambda\int_0^\infty x(\tau)e^{-\tau}d\tau - \int_0^\infty \mu(\tau)\left(x(\tau)-\tau\right)d\tau \tag{11}$$

where $\mu(\tau)$ is a non-negative Lagrange multiplier. Taking the derivative with respect to $x(t)$ and equating to 0 we get

$$x(t) = \lambda + \frac{\mu(t)}{e^{-t}} \tag{12}$$

Now if $t<\lambda$, then $x(t)$ has to be larger than $t$, for if it were equal, the right hand side of the above equation would be larger than the left hand side. By complementary slackness [54], we conclude that in this case $\mu(t)=0$, and hence $x(t)=\lambda$. On the other hand if $t\geq\lambda$, then $x(t)$ has to be equal to $t$, for if it were larger, then by complementary slackness $\mu(t)=0$ and the right hand side of the above equation would be smaller than the left hand side. In conclusion, we have

$$x(t) = \begin{cases} \lambda, & t<\lambda \\ t, & t\geq\lambda \end{cases} \tag{13}$$

This means that the optimal inter-update time is threshold-based: if an energy arrival occurs before $\lambda$ amount of time since the last update time, i.e., if $\tau<\lambda$, then the sensor should not use this energy amount right away to send an update. Instead, it should wait for an extra $\lambda-\tau$ amount of time before updating. Else, if an energy arrival occurs after $\lambda$ amount of time since the last update time, i.e., if $\tau\geq\lambda$, then the sensor should use that amount of energy to send an update right away. We call this kind of policy a $\lambda$-threshold policy. Substituting this into problem (9) we get

$$p_1(\lambda) = e^{-\lambda} - \frac{1}{2}\lambda^2 \tag{14}$$

which admits a unique solution of $\lambda^*\approx 0.9012$ when equated to 0. In the next two sections, we extend the above approach to characterize optimal policies for larger (general) battery sizes under both RBR and IBR models.
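The bisection step for the unit battery case can be sketched as follows. The code solves $p_1(\lambda)=0$ using the closed form (14), and then checks numerically that the objective of (8) under the $\lambda$-threshold policy $x(\tau)=\max\{\lambda,\tau\}$ equals $\lambda$ at the root. The two moments used in `threshold_policy_age` are worked out here for illustration by integrating against the unit-rate exponential density; all function names are hypothetical.

```python
import math


def p1(lam):
    """Closed form (14) of the parameterized problem (9)."""
    return math.exp(-lam) - 0.5 * lam * lam


def bisect_root(g, lo, hi, tol=1e-10):
    """Root of a decreasing function g with g(lo) > 0 > g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def threshold_policy_age(lam):
    """E[x^2] / (2 E[x]) for x(tau) = max(lam, tau), tau ~ Exp(1)."""
    mean_x = lam + math.exp(-lam)
    mean_x2 = lam * lam + 2.0 * (lam + 1.0) * math.exp(-lam)
    return mean_x2 / (2.0 * mean_x)


# p1(0) = 1 > 0 and p1(1) < 0, so the root lies in the unit interval.
lam_star = bisect_root(p1, 0.0, 1.0)
print(round(lam_star, 4), round(threshold_policy_age(lam_star), 4))
```

Bisection is sufficient here because, by Lemma 1, $p_1(\lambda)$ is decreasing and therefore has a single sign change.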

## 4 Random Battery Recharge (RBR) Model

### 4.1 Renewal-Type Policies

In this section, we focus on problem (6) in the general case of $B>1$ energy units. Let $l_i$ denote the $i$th time that the battery level falls down to $B-1$ energy units. We use the term epoch to denote the time duration between two consecutive such events, and define $x_{B,i}\triangleq l_i-l_{i-1}$ as the length of the $i$th epoch. The main reason behind choosing this specific event to determine the epoch's start/end times is that the epoch would then contain at most $B$ updates, whereas any other choice leads to having a possibly infinite number of updates in a single epoch, which is clearly more complex to analyze. Let $\tau_i$ denote the time until the next energy arrival after $l_{i-1}$. One scenario for the update process in the $i$th epoch would be that starting at time $l_{i-1}$, the sensor sends an update only after the battery recharges, i.e., at some time after $l_{i-1}+\tau_i$, causing the battery state to fall down from $B$ to $B-1$ again. Another scenario would be that the sensor sends $j\leq B-1$ updates before the battery recharges, i.e., at some times before $l_{i-1}+\tau_i$, and then submits one more update after the recharge occurs, making in total $j+1$ updates in the $i$th epoch.

Let us now define $x_{j,i}$, $1\leq j\leq B-1$, to be the time it takes the sensor to send $B-j$ updates in the $i$th epoch before a battery recharge occurs. That is, starting at time $l_{i-1}$, and assuming that the $i$th epoch contains $B$ updates, the sensor sends the first update at $l_{i-1}+x_{B-1,i}$, followed by the second update at $l_{i-1}+x_{B-2,i}$, and so on, until it submits the $(B-1)$st update at $l_{i-1}+x_{1,i}$, using up all the energy in its battery. The sensor then waits until it gets a recharge at $l_{i-1}+\tau_i$ before sending its final $B$th update in the epoch. See Fig. 2 for an example run of the AoI curve during the $i$th epoch.

In general, under any feasible status updating online policy, $\{x_{j,i}\}_{j=1}^{B-1}$ and $x_{B,i}$ may depend on all the history of status updating and energy arrival information up to $l_{i-1}$, which we denote by $\mathcal{H}_{i-1}$. In addition to that, the value of $x_{B,i}$ can also depend on $\tau_i$. However, by the energy causality constraint (1), the values of $\{x_{j,i}\}_{j=1}^{B-1}$ cannot depend on $\tau_i$. This is due to the fact that if the sensor updates $j+1$ times in the same epoch, then the first $j$ updates should occur before the battery recharges. Focusing on uniformly bounded policies, we now have the following theorem. The proof is in Appendix 8.1.

###### Theorem 1

The optimal status update policy for problem (6) in the case $B>1$ is a renewal policy, i.e., the sequence $\{l_i\}$ forms a renewal process. Moreover, the optimal $\{x_{j,i}\}_{j=1}^{B-1}$ are constants, and the optimal $x_{B,i}$ only depends on $\tau_i$.

### 4.2 Threshold Policies

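Theorem 1 indicates that the sensor should let its battery level fall down to $B-1$ at times that constitute a renewal process. Next, we characterize the optimal renewal policy by which the sensor sends its updates. Using the strong law of large numbers of renewal processes [52], problem (6) reduces to an optimization over a single epoch as follows:

$$\begin{aligned} \rho_B = \min_{\boldsymbol{x}} \quad & \frac{\mathbb{E}\left[R(\boldsymbol{x})\right]}{\mathbb{E}\left[x_B(\tau)\right]} \\ \text{s.t.} \quad & x_{B-1}\geq 0 \\ & x_{j-1}\geq x_j, \quad 2\leq j\leq B-1 \\ & x_B(\tau)\geq\tau, \quad \forall\tau \end{aligned} \tag{15}$$

where $\boldsymbol{x}\triangleq\{x_1,\dots,x_{B-1},x_B(\tau)\}$, with $x_B(\tau)$ denoting the length of an epoch in which the battery recharge occurs $\tau$ time units after its beginning, and $R(\boldsymbol{x})$ denotes the area under the age curve during an epoch. Note that the expectation is over the exponential random variable $\tau$. Similar to the $B=1$ case, we define $p_B^{\text{rbr}}(\lambda)$ as follows:

$$\begin{aligned} p_B^{\text{rbr}}(\lambda) \triangleq \min_{\boldsymbol{x}} \quad & \mathbb{E}\left[R(\boldsymbol{x})\right] - \lambda\,\mathbb{E}\left[x_B(\tau)\right] \\ \text{s.t.} \quad & \text{constraints of (15)} \end{aligned} \tag{16}$$

As in Lemma 1, one can show that $p_B^{\text{rbr}}(\lambda)$ is decreasing in $\lambda$, and that the optimal solution of problem (15) is given by $\lambda^*$ satisfying $p_B^{\text{rbr}}(\lambda^*)=0$.

Since the optimal solution for the $B>1$ case cannot be larger than that of the $B=1$ case, which is approximately $0.9012$ (the root of (14)), one can use, e.g., a bisection search over $(0,0.9012]$ to find the optimal $\lambda$ for $B>1$. We now write the following Lagrangian for problem (16) after expanding the objective function: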

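$$\begin{aligned} \mathcal{L} = {} & \frac{1}{2}x_{B-1}^2e^{-x_{B-1}} + \frac{1}{2}\sum_{j=1}^{B-2}\left(x_j-x_{j+1}\right)^2e^{-x_j} + \frac{1}{2}\int_0^{x_{B-1}}x_B(\tau)^2e^{-\tau}d\tau \\ & + \frac{1}{2}\sum_{j=2}^{B-1}\int_{x_j}^{x_{j-1}}\left(x_B(\tau)-x_j\right)^2e^{-\tau}d\tau + \frac{1}{2}\int_{x_1}^{\infty}\left(x_B(\tau)-x_1\right)^2e^{-\tau}d\tau - \lambda\int_0^\infty x_B(\tau)e^{-\tau}d\tau \\ & - \mu_{B-1}x_{B-1} - \sum_{j=1}^{B-2}\mu_j\left(x_j-x_{j+1}\right) - \int_0^\infty\mu_B(\tau)\left(x_B(\tau)-\tau\right)d\tau \end{aligned} \tag{17}$$

where $\{\mu_1,\dots,\mu_{B-1},\mu_B(\tau)\}$ are non-negative Lagrange multipliers. Taking the derivative with respect to $x_B(t)$ and equating to 0 we get

$$x_B(t) = \lambda + \sum_{j=1}^{B-1}x_j\,\mathbb{1}_{x_j\leq t<x_{j-1}} + \frac{\mu_B(t)}{e^{-t}} \tag{18}$$

where $x_0\triangleq\infty$, and $\mathbb{1}_A$ equals 1 if the event $A$ is true, and 0 otherwise. Now let us assume that $\lambda$ is smaller than all the inter-update delays, i.e., $\lambda<x_{B-1}$ and $\lambda<x_j-x_{j+1}$ for $1\leq j\leq B-2$, and verify this assumption later on. Proceeding similarly to the analysis of the $B=1$ case, we get

$$x_B(t) = \begin{cases} \lambda, & t<\lambda \\ t, & \lambda\leq t<x_{B-1} \\ x_{B-1}+\lambda, & x_{B-1}\leq t<x_{B-1}+\lambda \\ t, & x_{B-1}+\lambda\leq t<x_{B-2} \\ \;\vdots & \\ x_1+\lambda, & x_1\leq t<x_1+\lambda \\ t, & t\geq x_1+\lambda \end{cases} \tag{19}$$

A depiction of the above policy is shown in Fig. 3.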

Thus, the optimal update policy has the following structure. Starting with a battery of $B-1$ energy units and zero age, if the next battery recharge occurs at any time before $\lambda$ time units, then the sensor updates at exactly $\lambda$. If instead it occurs at any time between $\lambda$ and $x_{B-1}$, then the sensor updates right away. This is the same as the $\lambda$-threshold policy, the solution of the $B=1$ case, except that it has a cut-off at $x_{B-1}$. This cut-off value has the following interpretation: if the battery recharge does not occur until $x_{B-1}$, then the sensor updates at $x_{B-1}$, causing the battery level and the age to fall down to $B-2$ and $0$, respectively. The sensor then repeats the $\lambda$-threshold policy described above with a new cut-off value of $x_{B-2}-x_{B-1}$, i.e., if the recharge does not occur until $x_{B-2}$, then the sensor updates again at $x_{B-2}$, causing the battery level and the age to fall down to $B-3$ and $0$, respectively. This technique repeats up to $x_1$, when the sensor updates for the $(B-1)$th time, emptying its battery. At this time, the sensor waits for the battery recharge and applies the $\lambda$-threshold policy one last time, with no cut-off value, to submit the last $B$th update in the epoch. Note that if the battery recharge occurs at some time $\tau<x_{B-1}$, then there would be one update in the epoch. On the other hand, if $x_j\leq\tau<x_{j-1}$, for some $2\leq j\leq B-1$, then there would be $B-j+1$ updates. Finally, if $\tau\geq x_1$ then there would be $B$ updates.
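The epoch structure described above can be sketched as a short Python function that returns the update times within one epoch, measured from the start of the epoch. The function name `epoch_update_times` is hypothetical, and the sketch assumes, as in the text, that $\lambda$ is smaller than every gap between consecutive cut-offs.

```python
def epoch_update_times(tau, lam, cutoffs):
    """Update times in one RBR epoch, measured from the epoch start.

    tau:     time of the battery recharge within the epoch.
    lam:     the threshold lambda.
    cutoffs: the cut-off values x_1, ..., x_{B-1} in any order; the sketch
             assumes lam is smaller than every gap between them.
    """
    # Cut-off updates happen only while the recharge has not yet occurred.
    times = [c for c in sorted(cutoffs) if c <= tau]
    last = times[-1] if times else 0.0
    # Final update: lambda-threshold policy restarted at the last update,
    # which reproduces the epoch length x_B(tau) in (19).
    times.append(max(tau, last + lam))
    return times


if __name__ == "__main__":
    # Hypothetical values for B = 3: x_1 = 3.0, x_2 = 2.0, lambda = 0.5.
    for tau in (0.2, 1.0, 2.1, 5.0):
        print(tau, epoch_update_times(tau, 0.5, [3.0, 2.0]))
```

The last entry of the returned list is the epoch length $x_B(\tau)$, and the length of the list is the number of updates in the epoch.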

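In the sequel, we find the optimal values of $\{x_j\}_{j=1}^{B-1}$ (and $\lambda$) by taking derivatives of the Lagrangian with respect to the $x_j$'s and equating to 0. Before doing so, we simplify the objective function of problem (16) by evaluating the expectations involved and using (19). After some simplifications we get

$$\mathbb{E}\left[R(\boldsymbol{x})\right] - \lambda\,\mathbb{E}\left[x_B(\tau)\right] = e^{-\lambda} - \frac{1}{2}\lambda^2 + f_1(\lambda)\sum_{j=1}^{B-1}e^{-x_j} - \left(x_{B-1}+1\right)e^{-x_{B-1}} - \sum_{j=1}^{B-2}\left(x_j-x_{j+1}+1\right)e^{-x_j} \tag{20}$$

where $f_1(\lambda)\triangleq e^{-\lambda}-\frac{1}{2}\lambda^2+\lambda$. Using the above in the Lagrangian and taking derivatives we get

$$x_1 = x_2 + f_1(\lambda) + \mu_1e^{x_1} \tag{21}$$

$$x_j = x_{j+1} + f_1(\lambda) + \left(\mu_j-\mu_{j-1}-e^{-x_{j-1}}\right)e^{x_j}, \quad 2\leq j\leq B-2 \tag{22}$$

$$x_{B-1} = f_1(\lambda) + \left(\mu_{B-1}-\mu_{B-2}-e^{-x_{B-2}}\right)e^{x_{B-1}} \tag{23}$$

Now let us assume that $x_{B-1}>0$ and $x_j>x_{j+1}$, $1\leq j\leq B-2$, and verify this later on. Hence, by complementary slackness we have $\mu_j=0$, $1\leq j\leq B-1$. One can then substitute $x_1-x_2=f_1(\lambda)$ from (21) in (22) for $j=2$ to find $x_2-x_3$, and proceed recursively to get

$$x_j - x_{j+1} = f_j(\lambda), \quad 1\leq j\leq B-2 \tag{24}$$

$$x_{B-1} = f_{B-1}(\lambda) \tag{25}$$

where we have defined

$$f_j(\lambda) \triangleq f_1(\lambda) - e^{-f_{j-1}(\lambda)}, \quad 2\leq j\leq B-1 \tag{26}$$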

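We have the following result on the structure of $\{f_j(\lambda)\}$.

###### Lemma 2

For a fixed $\lambda$, the sequence $\{f_j(\lambda)\}$ is decreasing in $j$; and for a fixed $j$, $f_j(\lambda)$ is decreasing in $\lambda$.

Proof:  The proofs of the two statements follow by induction. Clearly, we have $f_2(\lambda)<f_1(\lambda)$. Now assume $f_j(\lambda)<f_{j-1}(\lambda)$ for some $j\geq 2$. Therefore $f_{j+1}(\lambda)=f_1(\lambda)-e^{-f_j(\lambda)}<f_1(\lambda)-e^{-f_{j-1}(\lambda)}=f_j(\lambda)$. This shows the first statement.

Next, direct first derivative analysis shows that $f_1(\lambda)$ is decreasing in $\lambda$. Now assume that $f_{j-1}(\lambda)$ is decreasing in $\lambda$ for some $j\geq 2$, and observe that $\frac{df_j(\lambda)}{d\lambda}=\frac{df_1(\lambda)}{d\lambda}+\frac{df_{j-1}(\lambda)}{d\lambda}e^{-f_{j-1}(\lambda)}$, which is negative by the induction hypothesis. This shows the second statement, and completes the proof of the lemma.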

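Note that, by (24) and (25), $f_j(\lambda)$ represents the delay between the $(B-j-1)$th and $(B-j)$th updates of an epoch, with $f_{B-1}(\lambda)$ being the delay from the start of the epoch to its first update. With this in mind, Lemma 2 has an intuitive explanation: it shows that when the amount of energy in the battery is relatively low, the sensor becomes less eager to send the next update, so that it does not run out of energy, and conversely, when the amount of energy in the battery is relatively high, the sensor becomes more eager to send the next update so that it makes use of the available energy before the next recharge overflows the battery. Next, by equations (24) and (25), we proceed recursively from $j=B-1$ to $j=1$ to find the values of the $x_j$'s in terms of $\lambda$. This gives

$$x_j = \sum_{m=j}^{B-1}f_m(\lambda), \quad 1\leq j\leq B-1 \tag{27}$$

Finally, we substitute the above in (20) to get

$$p_B^{\text{rbr}}(\lambda) = e^{-\lambda} - \frac{1}{2}\lambda^2 + \sum_{j=1}^{B-1}\left(f_1(\lambda)-f_j(\lambda)-1\right)e^{-\sum_{m=j}^{B-1}f_m(\lambda)} \tag{28}$$

$$\phantom{p_B^{\text{rbr}}(\lambda)} = e^{-\lambda} - \frac{1}{2}\lambda^2 - e^{-f_{B-1}(\lambda)} \tag{29}$$

and perform a bisection search over $(0,0.9012]$ to find the optimal $\lambda^*$ that solves $p_B^{\text{rbr}}(\lambda^*)=0$. We note that for $B=1$, the summation in (28) vanishes and we directly get (14). Finally, observe that $p_B^{\text{rbr}}(\lambda^*)=0$ implies $f_1(\lambda^*)-\lambda^*=e^{-f_{B-1}(\lambda^*)}>0$. Combining this with the recursion (26) and the monotonicity of $\{f_j(\lambda^*)\}$ in Lemma 2 gives $f_{B-1}(\lambda^*)>\lambda^*>0$. By Lemma 2, the above argument shows that: 1) $f_j(\lambda^*)>0$, $1\leq j\leq B-1$, which further implies by (21)-(23) that all Lagrange multipliers are zero, as previously assumed; and 2) $\lambda^*<f_j(\lambda^*)$, $1\leq j\leq B-1$, which verifies the previous assumption regarding the optimal age being smaller than all inter-update delays.

In summary, given the functions $\{f_j(\lambda)\}$ through the recursive formulas in (26) with $f_1(\lambda)=e^{-\lambda}-\frac{1}{2}\lambda^2+\lambda$, the optimal solution of problem (6) is given by a bisection search for $\lambda^*$ that satisfies $p_B^{\text{rbr}}(\lambda^*)=0$ in (29), and the thresholds of the optimal policy in (19) are given by (27).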
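The summary above translates into a short numerical routine. The following Python sketch evaluates the recursion (26), the closed form (29), and the cut-offs (27), taking $f_1(\lambda)=e^{-\lambda}-\frac{1}{2}\lambda^2+\lambda$. The function names are hypothetical, and the search interval $[0,1]$ is a convenient superset of the interval mentioned in the text.

```python
import math


def f_sequence(lam, B):
    """[f_1(lam), ..., f_{B-1}(lam)] from the recursion (26)."""
    f1 = math.exp(-lam) - 0.5 * lam * lam + lam
    f = []
    for _ in range(B - 1):
        f.append(f1 if not f else f1 - math.exp(-f[-1]))
    return f


def p_rbr(lam, B):
    """Closed form (29); reduces to (14) when B = 1."""
    f = f_sequence(lam, B)
    tail = math.exp(-f[-1]) if f else 0.0
    return math.exp(-lam) - 0.5 * lam * lam - tail


def solve_rbr(B, tol=1e-10):
    """Bisection for lam* with p_rbr(lam*) = 0, then cut-offs via (27)."""
    lo, hi = 0.0, 1.0   # p_rbr is positive at 0 and negative at 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_rbr(mid, B) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    f = f_sequence(lam, B)
    # x_j = f_j + f_{j+1} + ... + f_{B-1}, returned as [x_1, ..., x_{B-1}]
    x = [sum(f[j:]) for j in range(len(f))]
    return lam, f, x


if __name__ == "__main__":
    for B in (1, 2, 3, 4):
        lam, f, x = solve_rbr(B)
        print(B, round(lam, 4), [round(v, 4) for v in x])
```

The returned `lam` is the candidate optimal long term average AoI for battery size `B`, and `x` lists the cut-offs $x_1,\dots,x_{B-1}$ used in (19).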

## 5 Incremental Battery Recharge (IBR) Model

### 5.1 Renewal-Type Policies

In this section, we focus on problem (7) in the general case of $B>1$. Similar to what we have shown in the previous section, we first show that the optimal update policy that solves problem (7) has a renewal structure. Namely, we show that it is optimal to transmit updates in such a way that the inter-update delays are independent over time; and that the time durations in between two consecutive events of transmitting an update and having $j$ units of energy left in the battery, for any fixed $j\leq B-1$, are i.i.d., i.e., these events occur at times that constitute a renewal process. We first introduce some notation.

Let the pair $(a(t),\mathcal{E}(t))$ represent the state of the system at time $t$. Fix $j\in\{0,1,\dots,B-1\}$, and consider the state $(0,j)$, which means that the sensor has just submitted an update and has $j$ units of energy remaining in its battery. Consider the sequence of times at which the system visits $(0,j)$. We use the term epoch to denote the time in between two consecutive visits to $(0,j)$. Observe that there can possibly be an infinite number of updates occurring in an epoch, depending on the energy arrival pattern and the update time decisions. For instance, in the $i$th epoch, one energy unit may arrive at some time after the epoch starts, at which the battery level goes to $j+1$, and then the sensor updates afterwards to get the system state back to $(0,j)$ again. Another possibility (if $j\geq 1$) is that the sensor first updates at some time, at which the system goes to state $(0,j-1)$, and then two consecutive energy units arrive, at which the battery level goes to $j+1$, and then the sensor updates afterwards to get the system state back to $(0,j)$ again. Depending on how many energy arrivals occur in the $i$th epoch, how far apart from each other they are, and the status update times, one can determine the length of the $i$th epoch and how many updates it has. Observe that the update policy in the $i$th epoch may depend on the history of events (energy arrivals and transmission updates) that occurred in previous epochs. Our first main result in this section shows that this is not the case, under uniformly bounded policies as per Definition 1, and that epoch lengths should be i.i.d. Our next theorem formalizes this. The proof is in Appendix 8.2.

###### Theorem 2

The optimal status update policy for problem (7) in the case $B>1$ is a renewal policy, i.e., the sequence of times at which the system visits state $(0,j)$, for some fixed $0\leq j\leq B-1$, forms a renewal process.

Based on Theorem 2, the next corollary now follows.

###### Corollary 1

In the optimal solution of problem (7), the inter-update times are independent.

Proof:  Observe that whenever an update occurs the system enters state $(0,j)$ for some $j\leq B-1$. The system then starts a new epoch with respect to state $(0,j)$. Since the choice of $j$ energy units in Theorem 2 is arbitrary, the results of the theorem now tell us that the update policy in that epoch, and therefore its length, is independent of the past history, in particular the past inter-update lengths.

Based on Corollary 1, we have the following observation. Let us assume that the optimal policy is such that the state at time $t$ is $(a,j)$. This means that the previous status update occurred at time $t-a$. By Corollary 1, the policy at time $t$ is independent of the events before time $t-a$. However, it may depend on the events occurring in $[t-a,t)$. For instance, for $j\geq 1$, it may be the case that at time $t-a$ the sensor had $j-1$ energy units in its battery, and then received another energy unit at some time in $[t-a,t)$; or, it may have already started with $j$ energy units at time $t-a$ and received no extra energy units in $[t-a,t)$. The question now is whether the optimal policy at time $t$ is the same in either of the two scenarios. The following result concludes that it is indeed the same.

###### Lemma 3

The optimal status update policy of problem (7) is such that at any time $t$ the next scheduled update time is only a function of the system state $(a(t),\mathcal{E}(t))$.

Proof:  Let us assume that the optimal policy is such that the state at time $t$ is $(a,j)$. Then this means that the previous status update occurred at time $t-a$. By Corollary 1, the optimal policy at time $t$ in this case is independent of the events before $t-a$. Starting from time $t$, the sensor then solves a shifted problem defined as follows. We use the same terminology and random variables that constitute (5) to characterize the area under the age curve starting from time $t$ until time $t+T$ (instead of starting from time 0 to time $T$), and denote it by $r_t(T)$. We also characterize a shifted feasible set $\mathcal{F}_t$, in which the battery evolves exactly as in (3) and starts with $j$ energy units at time $t$. Therefore, given a state of $(a,j)$ at time $t$, the sensor solves the following shifted problem:

$$\min_{\mathbf{x}\in\mathcal{F}_t}\ \limsup_{T\to\infty}\ \frac{1}{T}\,\mathbb{E}\left[r_t(T)\right] \tag{30}$$

to find the optimal solution from time onwards (cost-to-go). The above solution depends only on future energy arrivals after time , which are, by the memoryless property of the exponential distribution, independent of the events in . Only the age and the battery state at time are needed to solve this problem. This concludes the proof.

By Theorem 2, focusing on state for some and defining the epochs with respect to this state, problem (7) reduces to an optimization over a single epoch. Based on Corollary 1 (and Lemma 3), we introduce the following notation, which is slightly different from that used in Section 4.

Once the system goes into state , for , at some time , the sensor schedules its next update after time. Since does not depend on the history before time , and cannot depend on the future energy arrivals by the energy causality constraint, we conclude that it is a constant. Now if the first energy arrival in that epoch occurs at time with , the sensor transmits the update at , whence the state becomes , and if the sensor schedules its next update after time, i.e., at . On the other hand, if the first energy unit arrives relatively early, i.e., , the state becomes at , and the sensor reschedules the update to be at instead of . Note that only depends on , since it does not depend on the history before time . If the second energy arrival in that epoch occurs at time with , the sensor transmits the update at , whence the state returns to . On the other hand, if the second energy arrival occurs relatively early as well, i.e., , and if , the state becomes at , and the sensor reschedules the update at instead of .

In summary, the optimal update policy is completely characterized by constants: , and functions: , where represents the scheduled update time after entering state , and represents the scheduled update time after entering state at some time . We emphasize the fact that by Corollary 1, the constants neither depend on each other, nor on the functions .
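The scheduling behaviour described above can be explored numerically. The following is a minimal simulation sketch, assuming the energy-dependent threshold structure stated in the abstract: the sensor updates as soon as the age reaches a threshold that depends on the number of stored energy units. The function name `simulate_average_aoi`, the `thresholds` list, and all numerical values are hypothetical and are not taken from the paper.

```python
import math
import random


def simulate_average_aoi(thresholds, horizon=200_000.0, seed=0):
    """Estimate the long-term average AoI of a threshold policy (illustrative).

    thresholds[e - 1] is a hypothetical age threshold used while the battery
    holds e energy units, e = 1..B, where B = len(thresholds). Energy arrives
    one unit at a time as a unit-rate Poisson process; each update consumes
    one unit and resets the age to zero.
    """
    rng = random.Random(seed)
    capacity = len(thresholds)
    t = 0.0      # current time
    age = 0.0    # current age of information
    energy = 0   # energy units currently stored
    area = 0.0   # area under the age curve so far

    while t < horizon:
        # Time to the next arrival; resampling after every event is valid
        # because exponential inter-arrival times are memoryless.
        wait_arrival = rng.expovariate(1.0)
        if energy == 0:
            wait_update = math.inf  # cannot update with an empty battery
        else:
            wait_update = max(thresholds[energy - 1] - age, 0.0)

        if wait_arrival < wait_update:
            # An energy unit arrives before the scheduled update.
            dt = wait_arrival
            area += age * dt + 0.5 * dt * dt
            age += dt
            t += dt
            energy = min(energy + 1, capacity)  # overflow is lost
        else:
            # The age reaches the threshold first: send the update.
            dt = wait_update
            area += age * dt + 0.5 * dt * dt
            t += dt
            age = 0.0
            energy -= 1

    return area / t
```

For example, `simulate_average_aoi([1.2, 0.9])` estimates the average AoI for a battery of size two with decreasing (hypothetical) thresholds. Sweeping the threshold values gives a rough numerical feel for how the policy constants affect the objective.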

### 5.2 Renewal State Analysis

To analyze the optimal solution of our problem, in view of Theorem 2, we now need to choose some renewal state , , and define the epoch with respect to that state. Unfortunately, unlike the random battery recharge problem in Section 4, there is no choice of that guarantees a finite number of updates in an epoch; for all choices of there can possibly be an infinite number of updates in a single epoch. In the sequel, we continue our analysis with state as the renewal state and define the epochs with respect to it, i.e., an epoch from now onwards denotes the time between two consecutive visits to state . We note, however, that any other renewal state choice yields the same results with equivalent complexity. We use the notation $R(\mathbf{x},\mathbf{y})$ and $L(\mathbf{x},\mathbf{y})$ to denote the area under the age curve in a given epoch and its length, respectively, as a function of the constants and the functions . Using the strong law of large numbers for renewal processes [52], problem (7) now reduces to:

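$$\begin{aligned}
\rho=\min_{\mathbf{x},\mathbf{y}}\quad & \frac{\mathbb{E}\left[R(\mathbf{x},\mathbf{y})\right]}{\mathbb{E}\left[L(\mathbf{x},\mathbf{y})\right]} \\
\text{s.t.}\quad & x_k\ge 0,\quad 1\le k\le B-1 \\
& y_k(\tau)\ge\tau,\quad 1\le k\le B
\end{aligned} \tag{31}$$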

As in the previous section we introduce the auxiliary parameterized problem:

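$$\begin{aligned}
p^{ibr}_B(\lambda)\triangleq\min_{\mathbf{x},\mathbf{y}}\quad & \mathbb{E}\left[R(\mathbf{x},\mathbf{y})\right]-\lambda\,\mathbb{E}\left[L(\mathbf{x},\mathbf{y})\right] \\
\text{s.t.}\quad & \text{constraints of (31)}
\end{aligned} \tag{32}$$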

In view of Lemma 1, we solve for the unique $\lambda^*$ such that $p^{ibr}_B(\lambda^*)=0$.
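The root-finding step can be illustrated on a toy instance. The sketch below assumes a unit battery ($B=1$), so that an epoch consists of a single update with area $\frac{1}{2}y_1(\tau_1)^2$ and length $y_1(\tau_1)$, and it further restricts $y_1$ to the hypothetical threshold form $y_1(\tau)=\max(\tau,\gamma)$, which satisfies the constraint $y_1(\tau)\ge\tau$. It also assumes that the parameterized objective is decreasing in $\lambda$, so that bisection applies. All function names are illustrative.

```python
import math


def expected_area(gamma):
    # E[0.5 * max(tau, gamma)^2] for tau ~ Exp(1), evaluated in closed form.
    return 0.5 * gamma * gamma + (1.0 + gamma) * math.exp(-gamma)


def expected_length(gamma):
    # E[max(tau, gamma)] for tau ~ Exp(1), evaluated in closed form.
    return gamma + math.exp(-gamma)


def p_of_lambda(lam, gamma_max=20.0, iters=200):
    """Minimize E[R] - lam * E[L] over gamma by ternary search.

    Returns (minimum value, minimizing gamma). The objective is unimodal
    in gamma for this toy instance, so ternary search is adequate.
    """
    def objective(gamma):
        return expected_area(gamma) - lam * expected_length(gamma)

    lo, hi = 0.0, gamma_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    gamma = 0.5 * (lo + hi)
    return objective(gamma), gamma


def solve_lambda(lo=0.0, hi=2.0, tol=1e-10):
    """Bisection for the root of p(lambda) = 0, assuming p is decreasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        value, _ = p_of_lambda(mid)
        if value > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In this toy instance `solve_lambda()` returns a value of roughly 0.90, and the minimizing threshold reported by `p_of_lambda` at that point coincides with it numerically. For general $B$ the inner minimization runs over all constants and functions in the constraint set, which this sketch does not attempt.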

One main goal now is to express $\mathbb{E}[R]$ and $\mathbb{E}[L]$ explicitly in terms of $\mathbf{x}$ and $\mathbf{y}$ in order to proceed with the optimization. In our previous work [1], we do so for the case through some involved analysis. We note, however, that the analysis approach in [1] does not directly extend to general $B$, as it is of a complex combinatorial nature. In what follows, we introduce a novel technique that expresses the objective function of problem (31) explicitly in terms of $\mathbf{x}$ and $\mathbf{y}$ for general $B$, and in fact shortens the analysis in [1] for .

For convenience, we remove the dependency on in the sequel. Observe that starting from state the system can go to any other state , , by the next status update, i.e., after only one update, each with some probability. Then, from state , , the system can only go to one of the following states by the next update: , each with some probability. We denote by $p_{i,j}$ the probability of going from state to state after one update. Clearly for . We also denote by $r_{i,j}$ and $\ell_{i,j}$ the area under the age curve and the time taken when the system goes from state to state in one update, respectively. Finally, since the goal is to compute the area under the age curve in an epoch together with the epoch length, we define $R_j$ and $L_j$ as the area under the age curve and the time taken to go from state back to again (in however many updates that takes). See Fig. 4, where we depict the relationships between the previous variables in the form of a tree graph. The graph represents the transitions between different system states (nodes on the graph) after only one update, which occur with the probabilities indicated on the arrows that connect the nodes. We emphasize that, for instance, state in the first column of the graph is no different from state in the second column, and that the arrow connecting them merely represents a loop connecting a state to itself; we chose to expand such a loop horizontally for clarity of presentation. From the graph, one can write the following equations:

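\begin{align}
\mathbb{E}[R] &= p_{0,0}\,\mathbb{E}[r_{0,0}]+\sum_{j=1}^{B-1}p_{0,j}\left(\mathbb{E}[r_{0,j}]+\mathbb{E}[R_j]\right) \tag{33} \\
\mathbb{E}[L] &= p_{0,0}\,\mathbb{E}[\ell_{0,0}]+\sum_{j=1}^{B-1}p_{0,j}\left(\mathbb{E}[\ell_{0,j}]+\mathbb{E}[L_j]\right) \tag{34}
\end{align}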

Next, we evaluate the above equations. We use the following short-hand notation for nested integrals:

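$$\int_{[a_1,a_2,\dots,a_n]}d\tau_1^n\triangleq\int_{\tau_1=0}^{a_1}\int_{\tau_2=0}^{a_2}\cdots\int_{\tau_n=0}^{a_n}d\tau_1\,d\tau_2\cdots d\tau_n \tag{35}$$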

We begin with the terms $p_{0,j}$, $\mathbb{E}[r_{0,j}]$, and $\mathbb{E}[\ell_{0,j}]$, $0\le j\le B-1$, which are directly computable as follows. Without loss of generality, let us assume that we start at state $(0,0)$ at time $0$. To go from state $(0,0)$ to $(0,0)$ after one update means that the sensor receives the first energy arrival in the epoch after time $\tau_1$ and then updates after time $y_1(\tau_1)$. This occurs if and only if the second energy arrival after the start of the epoch, arriving at time $\tau_1+\tau_2$, occurs relatively late, i.e., $\tau_2>y_1(\tau_1)-\tau_1$. Note that $\tau_1$ and $\tau_2$ are i.i.d. exponential random variables with parameter $1$. Thus,

$$p_{0,0}=\mathbb{P}\left(\tau_2>y_1(\tau_1)-\tau_1\right)=\int_{\tau_1=0}^{\infty}e^{-y_1(\tau_1)}\,d\tau_1 \tag{36}$$

The area under the age curve and the time taken to go from state $(0,0)$ to state $(0,0)$ after one update are respectively given by the expectation of $\frac{1}{2}y_1(\tau_1)^2$ and $y_1(\tau_1)$ conditioned on the event $\tau_2>y_1(\tau_1)-\tau_1$. Hence,

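\begin{align}
p_{0,0}\,\mathbb{E}[r_{0,0}] &= p_{0,0}\,\mathbb{E}\left[\tfrac{1}{2}y_1(\tau_1)^2\,\middle|\,\tau_2>y_1(\tau_1)-\tau_1\right]=\int_{\tau_1=0}^{\infty}\tfrac{1}{2}y_1(\tau_1)^2e^{-y_1(\tau_1)}\,d\tau_1 \tag{37} \\
p_{0,0}\,\mathbb{E}[\ell_{0,0}] &= p_{0,0}\,\mathbb{E}\left[y_1(\tau_1)\,\middle|\,\tau_2>y_1(\tau_1)-\tau_1\right]=\int_{\tau_1=0}^{\infty}y_1(\tau_1)e^{-y_1(\tau_1)}\,d\tau_1 \tag{38}
\end{align}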
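As a sanity check, the three integrals for the return-in-one-update event can be compared against a Monte Carlo estimate. The sketch below assumes the hypothetical threshold form $y_1(\tau)=\max(\tau,\gamma)$, for which the integrals have simple closed forms. The function names and the value of $\gamma$ are illustrative and not taken from the paper.

```python
import math
import random


def closed_forms(gamma):
    """Integrals of e^{-y}, 0.5*y^2*e^{-y} and y*e^{-y} over tau_1 in [0, inf),
    with y = max(tau_1, gamma) (an assumed threshold form)."""
    e = math.exp(-gamma)
    prob = (1.0 + gamma) * e
    area = 0.5 * (gamma ** 3 + gamma ** 2 + 2.0 * gamma + 2.0) * e
    length = (gamma ** 2 + gamma + 1.0) * e
    return prob, area, length


def monte_carlo(gamma, samples=200_000, seed=1):
    """Estimate the same quantities by sampling two i.i.d. Exp(1) arrivals."""
    rng = random.Random(seed)
    hits = 0
    area = 0.0
    length = 0.0
    for _ in range(samples):
        tau1 = rng.expovariate(1.0)
        tau2 = rng.expovariate(1.0)
        y1 = max(tau1, gamma)
        # The second arrival is "late": the update happens at y1 with one
        # energy unit in the battery, so the battery is empty afterwards.
        if tau2 > y1 - tau1:
            hits += 1
            area += 0.5 * y1 * y1
            length += y1
    return hits / samples, area / samples, length / samples
```

Calling `closed_forms(1.0)` and `monte_carlo(1.0)` should give values that agree up to sampling noise. The Monte Carlo averages are taken over all samples rather than only over the hits, which is why they estimate the products of the probability with the conditional expectations.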

Next, to go from state to , , after one update means that the sensor receives energy units consecutively before updating. This occurs if and only if each of the energy units arrive relatively early. That is, after the first arrival at time the sensor receives the second arrival at with , and then the third energy arrival occurs at with , and so on. Only the th arrival occurs relatively late so that the sensor updates exactly after arrivals, i.e., . Thus, for

 p0,n =P(τ2≤y1(τ1)−τ1,τ3≤y2(τ1+τ2)−(τ1+τ2),…, τn+1≤yn(τ1+⋯+τn)−(τ1+⋯+τn),τn+2>yn+1(τ1+⋯+τn+1)−(τ1+⋯+τn+1)) =∫∞τ1=0∫y1(τ1)−τ1τ2=0…∫yn(τ1+⋯+τn)−(τ1+⋯+τn)τn+1=0e−yn+1(τ1+⋯+τn+1)dτ1dτ2…dτn+1 =∫[∞, y1(τ1)−τ1, …, yn(τ1+⋯+τ