
# Energy-Neutral Source-Channel Coding with Battery and Memory Size Constraints

Paolo Castiglione and Gerald Matz

Paolo Castiglione is with AKG Acoustics GmbH, Vienna, Austria. Gerald Matz is with Vienna University of Technology, Vienna, Austria. Funding by FWF projects S10606 and S10607. The work of P. Castiglione was also supported by FTW Forschungszentrum Telekommunikation Wien project I0. The authors thank Prof. Osvaldo Simeone for his valuable help in improving the quality of this work.

###### Abstract

We study energy management policies for the compression and transmission of source data collected by an energy-harvesting sensor node with a finite energy buffer (e.g., rechargeable battery) and a finite data buffer (memory) between source encoder and channel encoder. The sensor node can adapt the source and channel coding rates depending on the observation and channel states. In such a system, the absence of precise information about the amount of energy available in the future is a key challenge. We provide analytical bounds and scaling laws for the average distortion that depend on the size of the energy and data buffers. We furthermore design a resource allocation policy that achieves almost optimal distortion scaling. Our results demonstrate that the energy leakage of state-of-the-art energy management policies can be avoided by jointly controlling the source and channel coding rates.

## I Introduction

Energy harvesting techniques [1] enable the design of completely autonomous wireless sensor networks (WSN). However, fluctuations in the amount of the energy being harvested call for resource management policies that achieve a trade-off between short-term metrics like delay and data queue length and long-term performance indicators like throughput and average distortion (see [2] and references therein).

In a WSN, an additional challenge is the fact that the energy consumption of source compression is of the same order as that of transmission. Even without energy harvesting, this allows for energy savings via a joint energy management for source coding and transmission [3, 4, 5]. These results have been extended to fluctuating energy sources in [6, 7].

In this paper, we consider a single sensor node and adopt the model from [6], where an energy buffer, e.g., a rechargeable battery, stores the harvested energy. In each time slot, the node acquires and compresses an observation with a suitably adapted rate. The observation is characterized by a time-varying state, e.g., observation signal-to-noise ratio (SNR). Then, the node stores the source coder output bits in a data buffer (memory). Furthermore, it transmits to the destination a certain number of bits from the data buffer, using a suitably adapted channel coding rate. The transmission channel is characterized by an instantaneous channel SNR.

In previous work [6], we characterized optimal energy management policies that achieve minimum distortion for the extreme cases where the energy and data buffer are either infinite or very small. For infinite buffer size, where the stability of the data queue needs to be guaranteed, the optimal policies independently allocate energy to the source and channel encoders. On the other hand, for the case of very small buffer size, a joint energy allocation by means of dynamic programming was found to be optimal.

In this paper, we consider finite buffers and use large deviation tools for our analysis that were developed in the seminal work of Tse [8]. Compared to dynamic programming, these tools have the advantage of not suffering from the curse of dimensionality. In [8], only the source coding is taken into account in the sensor policy, which thus amounts to choosing a point on the rate-distortion curve. Neither the problem of maintaining energy-neutrality, nor fluctuations of the available energy, nor optimal resource allocation among source and channel encoder have been addressed in [8].

In this work, we claim that distortion optimality can be achieved via a joint energy management for the source and channel encoders. In particular, we provide analytical bounds on the average distortion achievable with an energy-harvesting sensor, and on the scaling laws of the achievable average distortion with respect to buffer size. We further propose a joint energy management for source and channel encoding that asymptotically achieves the distortion lower bound and scales almost optimally with buffer size. We emphasize that in related work [9, 10] on this topic a joint adaptation of the source code and of the channel code has not been considered since the bit stream entering the data buffer was modeled as exogenous (i.e., uncontrollable).

Other recent contributions [7, 11] for multi-hop systems have shown that a good trade-off between performance and buffer sizes can be found by using Lyapunov optimization techniques that do not require knowledge of the statistics of the system states. In particular, [7] addresses the problem of jointly controlling distributed source coding and data transmission and develops policies that achieve a distortion optimality gap that is inversely proportional to buffer size. In contrast to our work, the optimality of such policies is not discussed in [7].

## II System Model

We consider a system in which a single energy-harvesting sensor node communicates with a single receiver. A block diagram of the sensor node is depicted in Fig. 1. It essentially consists of a source encoder, a transmitter, an energy buffer, a data buffer, and an energy management unit (EMU).

Energy buffer. In our model, the sensor operation is structured in time slots (indexed by $k$). The energy harvested in slot $k$, denoted $E_{h,k}$, is accumulated in an energy buffer of finite size $B$, henceforth also referred to as battery. For convenience, all energies are normalized by the number of channel uses per slot (i.e., the number of symbols transmitted per time slot). The harvested energy $E_{h,k}$ is assumed to be a discrete stationary irreducible aperiodic Markov process. The steady-state probability density function (pdf) of $E_{h,k}$ is denoted . The energy $E_k$ available in the battery for use in slot $k$ evolves as

$$E_{k+1} = \min\left\{B,\ \left[E_k - (E_{s,k} + E_{t,k})\right]^{+} + E_{h,k}\right\}, \tag{1}$$

where $[x]^{+} = \max\{x, 0\}$. Here, $[E_k - (E_{s,k} + E_{t,k})]^{+}$ is the residual energy from the previous slot, with $E_{s,k}$ and $E_{t,k}$ denoting the energies allocated in slot $k$ for source encoding and transmission, respectively. We do not take into account the energy consumed by channel encoding and channel state acquisition, since they are typically small compared to the transmit energy in the scenario considered. Energy-neutrality amounts to the constraint $E_{s,k} + E_{t,k} \le E_k$.
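The recursion (1) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `battery_step`, the numeric values, and the rule of spending a quarter of the stored energy on each task are assumptions made for the example, and the check `e_s + e_t <= e_k` is how this sketch reads the energy-neutrality constraint.

```python
def battery_step(e_k, e_s, e_t, e_h, capacity):
    """One slot of the energy-buffer recursion (1).

    e_k: energy available at the start of the slot
    e_s, e_t: energies spent on source encoding and transmission
    e_h: energy harvested during the slot
    capacity: battery size B
    """
    if e_s < 0 or e_t < 0:
        raise ValueError("allocated energies must be non-negative")
    if e_s + e_t > e_k:
        # Energy neutrality as assumed in this sketch: never spend more than is stored.
        raise ValueError("energy neutrality violated")
    residual = max(e_k - (e_s + e_t), 0.0)
    # Harvested energy beyond the capacity is lost (clipping at B).
    return min(capacity, residual + e_h)


# Hypothetical run: spend 25% of the stored energy on each task in every slot.
levels = [2.0]
for e_h in [1.0, 0.0, 3.0, 5.0]:
    e_k = levels[-1]
    levels.append(battery_step(e_k, 0.25 * e_k, 0.25 * e_k, e_h, capacity=4.0))
```

In the last slot the harvested energy exceeds the free capacity, so the battery saturates and part of the harvest is wasted, which is the energy leakage that an energy management policy tries to avoid.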

Source encoder. The sensor takes measurements per time slot. The quality of these measurements is characterized by a parameter sequence , which is assumed to form a discrete stationary irreducible aperiodic Markov process. As an example, could model the measurement SNR, which may change over time due to source movement or environmental factors. The set is assumed to be discrete and finite. The steady-state probability mass function (pmf) for is denoted , . Due to sampling, analog-to-digital conversion, and compression the sensor acquires the source in a lossy fashion. The loss is captured by the distortion , obtained from a given distortion metric such as the mean square error (MSE). The bit stream resulting at the source encoder output is stored in a data buffer, subsequently also referred to as memory.

The number of bits produced by the source encoder within slot $k$ is given by $f(D_k, E_{s,k}, Q_k)$. Here, the rate-distortion-energy function $f$ models the dependence of the source encoder output on the distortion level $D_k$, the allocated energy $E_{s,k}$, and the observation state $Q_k$. The function $f$ is assumed (for any observation state) to be continuous, differentiable, and separately strictly convex and non-increasing in $D_k$ and $E_{s,k}$. Example rate-distortion-energy functions are provided in [6]. Conventional rate-distortion functions are special cases without dependence on $E_{s,k}$.

Transmitter. The channel between sensor and destination is characterized by a discrete stationary irreducible aperiodic Markov process $H_k$ that changes slowly over time (e.g., block-fading). The pmf of $H_k$ is given by , . The transmitter uses the channel times per slot. A maximum number of bits per slot can be communicated successfully to the destination. For any , the channel rate function is assumed to be continuous, differentiable, strictly concave, and non-decreasing in ; furthermore, . We consider rate-adaptive transmission schemes that achieve arbitrarily small block error probabilities. An example for is given by the Shannon capacity of the additive white Gaussian noise (AWGN) channel with SNR . However, the channel-rate function can also model the rate of channel codes with a non-zero gap to Shannon capacity. The number of bits actually transmitted using the allocated energy is given by .

Data buffer. The size and queue length of the data buffer are denoted by $A$ and $X_k$, respectively. The data queue length evolves as

$$X_{k+1} = \min\left\{A,\ \left[X_k - g(H_k, E_{t,k})\right]^{+} + f(D_k, E_{s,k}, Q_k)\right\}. \tag{2}$$

The source encoder increases the data queue length by $f(D_k, E_{s,k}, Q_k)$ bits while the transmitter decreases the queue length by transmitting $g(H_k, E_{t,k})$ bits. When all parameters except $E_{s,k}$ and $E_{t,k}$ are fixed, (2) captures the trade-off that results from splitting the available energy between the source encoder and the transmitter. Ideally, it is desirable to decrease $f$ by increasing $E_{s,k}$ and simultaneously increase $g$ by increasing $E_{t,k}$. However, due to energy neutrality, $E_{s,k}$ and $E_{t,k}$ cannot be simultaneously increased without bound.
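The trade-off described by (2) can be made concrete with a small Python sketch. The two rate functions below are hypothetical stand-ins chosen only to satisfy the stated convexity and concavity assumptions (a Gaussian-style rate inflated by a `1 + 1/e_s` energy penalty, and a Shannon-style channel rate); they are not the functions used in the paper, and the observation and channel states are held fixed for simplicity.

```python
import math


def queue_step(x_k, bits_in, bits_out, size):
    """Data-buffer recursion (2): drain first, add the new bits, clip at the size A."""
    return min(size, max(x_k - bits_out, 0.0) + bits_in)


def source_bits(d, e_s, n_samples=100):
    """Hypothetical f: convex and non-increasing in both d and e_s (unit-variance source)."""
    return n_samples * 0.5 * math.log2(1.0 / d) * (1.0 + 1.0 / e_s)


def channel_bits(e_t, gain=1.0, n_uses=100):
    """Hypothetical g: concave and non-decreasing in e_t, with g(0) = 0."""
    return n_uses * math.log2(1.0 + gain * e_t)


# Split a fixed energy budget: a fraction alpha goes to the transmitter.
budget, d = 4.0, 0.25
net = {}
for alpha in (0.25, 0.5, 0.75):
    e_t, e_s = alpha * budget, (1.0 - alpha) * budget
    # Positive values mean the queue grows on average, negative values mean it drains.
    net[alpha] = source_bits(d, e_s) - channel_bits(e_t)
```

With these made-up numbers, giving too little energy to the transmitter lets the queue grow, while an even split drains it. Which split is best depends on the rate functions and on the target distortion.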

## III Problem Statement and Main Results

### III-A Problem Statement

The results obtained in what follows are based on the assumption that the buffer sizes $A$ and $B$ are much larger than the maximum variation of the respective buffer states, i.e., ( and ). We further note that the Markov assumption for the energy harvesting, for the observation state, and for the channel state generalizes the memoryless assumption that was used in related work [9, 2], and is inspired by recent models for real harvesting processes [12] as well as by well-established models for the wireless channel [13]. An extension to even more general models is beyond the scope of this paper.

The EMU has to prescribe the distortion and the energies and to be allocated to the source encoder and the transmitter, respectively. It does so using the combined state of the energy buffer, the data buffer, the source, and the channel, formally . More specifically, the EMU uses a policy where determines the parameters in the th time slot based on the present and past states .

To decrease the distortion, the sensor can either use more compression energy, thereby faster discharging the energy buffer, or compress less, thereby faster filling the data buffer (which necessitates an increase of the transmission energy to empty the buffer). Therefore, any policy amounts to a trade-off between the distortion performance and the risk of an energy buffer drain or data buffer overflow. If the battery is empty or the memory is full, a packet is lost and the maximum distortion is accrued. The long-term average distortion achieved with policy is defined as

$$\bar{D}^{\pi} = \limsup_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}^{\pi}[D_k]. \tag{3}$$

The optimal EMU policy achieves the minimum distortion ; hence, for all .

### III-B Lower bound on the achievable distortion

We define, conditional on the observation state , the long-term average source encoder energy and the long-term average distortion . Furthermore, the long-term average transmit energy conditional on the channel state is defined as . Our main results are based on the following convex optimization problem.

###### Definition 1.

The convex optimization problem is defined as follows:

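$$
\begin{aligned}
\min_{D^{(q)},\,E_s^{(q)},\,E_t^{(h)},\,\alpha}\quad & \sum_{q} \Pr(q)\, D^{(q)} \\
\text{subject to}\quad & \sum_{q} \Pr(q)\, f^{(q)}\big(D^{(q)}, E_s^{(q)}\big) - \sum_{h} \Pr(h)\, g^{(h)}\big(E_t^{(h)}\big) \le \delta_d, && (4)\\
& \sum_{q} \Pr(q)\, E_s^{(q)} \le (1-\alpha)\big(\mathbb{E}[E_{h,k}] + \delta_e\big), && (5)\\
& \sum_{h} \Pr(h)\, E_t^{(h)} \le \alpha\big(\mathbb{E}[E_{h,k}] + \delta_e\big), && (6)\\
& D^{(q)} \ge 0,\quad E_s^{(q)} \ge 0,\quad E_t^{(h)} \ge 0,\quad 0 < \alpha < 1.
\end{aligned}
$$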

Here, $\delta_d$ and $\delta_e$ can be interpreted as the incremental and decremental drifts¹ for the data buffer and for the energy buffer, respectively. The problem minimizes the inferior limit of the long-term expected distortion, thereby identifying the values of the above-defined long-term expectations $D^{(q)}$, $E_s^{(q)}$, and $E_t^{(h)}$ and of the associated parameter $\alpha$. The latter parameter denotes the ratio between the long-term expected energy spent for transmission and the long-term overall expected energy spent for transmission and source coding. For the problem with $\delta_d = 0$ and $\delta_e = 0$, condition (4) is necessary for the mean rate stability of the data queue [14], and conditions (5)-(6) are necessary to meet the energy neutrality requirement. We note that the problems with $\delta_d > 0$ and $\delta_e > 0$ can be viewed as relaxations of the zero-drift problem.

¹ The drift $\delta_d$ is the long-term expected difference between the size of the data buffer input and the size of the data buffer output. Conversely, the drift $\delta_e$ is the long-term expected difference between the size of the energy buffer output and the size of the energy buffer input. For the definition of drift, the buffer is assumed to be unbounded. See the Appendix for a more formal definition.
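To illustrate how the drifts act as slack in constraints (4)-(6), the following Python sketch evaluates the objective and checks feasibility for a given candidate point. It does not solve the optimization problem. The helper name `evaluate_candidate`, the rate functions `f` and `g`, the two-state distributions, and all numbers are hypothetical choices for this example.

```python
import math


def evaluate_candidate(pr_q, pr_h, D, Es, Et, alpha, f, g,
                       mean_harvest, delta_d=0.0, delta_e=0.0, tol=1e-9):
    """Return (objective, feasible) for one candidate point of the problem in Definition 1."""
    objective = sum(pr_q[q] * D[q] for q in pr_q)
    data_drift = (sum(pr_q[q] * f(q, D[q], Es[q]) for q in pr_q)
                  - sum(pr_h[h] * g(h, Et[h]) for h in pr_h))
    budget = mean_harvest + delta_e
    values = list(D.values()) + list(Es.values()) + list(Et.values())
    feasible = (
        data_drift <= delta_d + tol                                             # (4)
        and sum(pr_q[q] * Es[q] for q in pr_q) <= (1 - alpha) * budget + tol   # (5)
        and sum(pr_h[h] * Et[h] for h in pr_h) <= alpha * budget + tol         # (6)
        and 0.0 < alpha < 1.0
        and all(v >= 0.0 for v in values)
    )
    return objective, feasible


def f(q, d, e_s):
    """Hypothetical rate-distortion-energy function; q plays the role of a source variance."""
    return 0.5 * math.log2(q / d) * (1.0 + 1.0 / e_s)


def g(h, e_t):
    """Hypothetical channel rate function; h plays the role of a channel gain."""
    return math.log2(1.0 + h * e_t)


pr_q = {1.0: 0.5, 4.0: 0.5}
pr_h = {0.5: 0.5, 2.0: 0.5}
Es = {1.0: 1.0, 4.0: 1.0}
Et = {0.5: 2.0, 2.0: 2.0}
low_D = {1.0: 0.25, 4.0: 0.25}
high_D = {1.0: 1.0, 4.0: 1.0}

tight = evaluate_candidate(pr_q, pr_h, low_D, Es, Et, 0.6, f, g, mean_harvest=5.0)
relaxed = evaluate_candidate(pr_q, pr_h, low_D, Es, Et, 0.6, f, g,
                             mean_harvest=5.0, delta_d=1.5)
safe = evaluate_candidate(pr_q, pr_h, high_D, Es, Et, 0.6, f, g, mean_harvest=5.0)
```

In this toy instance the low-distortion candidate violates (4) at zero drift but becomes feasible once a positive data-queue drift is allowed, which is the sense in which positive drifts relax the problem. A full solution would hand the same objective and constraints to a convex solver.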

Using the results in [6], we establish a lower bound on the achievable long-term distortion. The proofs of this and the subsequent results are provided in the Appendix.

###### Proposition 1.

Let denote the minimum of the problem (i.e., with zero drift, , ). Then, the minimum distortion is lower bounded as .

We note that it suffices to prove this result for the special case of infinite data and energy buffer, i.e., $A = \infty$ and $B = \infty$. This also establishes the bound for finite buffer sizes since the assumption $A < \infty$ and $B < \infty$ is more restrictive (an infinite buffer can always mimic a finite buffer) and hence cannot lead to a smaller achievable distortion. The proof essentially shows, by means of Jensen's inequality, that condition (4) is necessary for the mean rate stability of the infinite data queue [14].

The next result provides a lower bound on the scaling behavior of the difference $\bar{D}_{\min} - \bar{D}^{*}$, showing how $\bar{D}_{\min}$ converges to $\bar{D}^{*}$ when the buffer size increases.²

² We use the following notation to compare the growth of two sequences and as increases: if for large enough and some constant ; if .

###### Proposition 2.

For any EMU policy, we have

$$\bar{D}_{\min} - \bar{D}^{*} = \Omega(A^{-2}) + \Omega(B^{-2}).$$

This result states that the optimality gap $\bar{D}_{\min} - \bar{D}^{*}$ is asymptotically bounded below by a weighted sum of $A^{-2}$ and $B^{-2}$ with constant weights. Thus, with increasing buffer size, $\bar{D}_{\min}$ cannot converge to $\bar{D}^{*}$ at a rate faster than the inverse square of the buffer sizes. This result is not intuitive. The key idea behind the proof provided in the Appendix entails the manipulation of appropriate balance equations for each buffer, as done in [8, Proposition 2.4.1].

### III-C Distortion achievable with finite buffer size

We now present a stationary EMU policy that only depends (in a deterministic manner) on the current state and performs close to the lower bound established in Proposition 2. This policy enforces drifts depending on hyper states that indicate whether the queues are more or less than half full. These hyper states are captured by the indices and .³ The data queue drift and the energy buffer drift then equal and , respectively, with and sufficiently large constants (see Appendix). The sign of these drifts ensures that the buffer states are pushed towards the respective center levels $A/2$ and $B/2$. Furthermore, the drift magnitude depends on the size of the respective buffer. For example, the data queue drift decreases with increasing buffer size $A$. This is intuitive since smaller buffers tend to become full faster and hence require a stronger drift to avoid overflow. The same reasoning applies to the battery drift.

³ The indicator function equals 1 if the argument is true and 0 otherwise.

###### Definition 2.

We define the policy by

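$$
\begin{aligned}
D_k &= D^{(q)}_{n,m}, && \text{for } Q_k = q,\\
E_{s,k} &= \min\big\{(1-\alpha_{n,m})\,E_k,\ E^{(q)}_{s,n,m}\big\}, && \text{for } Q_k = q, \qquad (7)\\
E_{t,k} &= \min\big\{\alpha_{n,m}\,E_k,\ E^{(h)}_{t,n,m}\big\}, && \text{for } H_k = h,
\end{aligned}
$$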

where the parameters , , , and are obtained by solving the optimization problem from Definition 1 with the drifts chosen as and .

For given rate functions and and given statistics for and , the above problem must be solved for all four possible hyper states using standard convex optimization tools and the resulting parameters of the policy can then be stored in a lookup table. As illustrated in Fig. 2, the source code is determined by the distortion and the energy consumption depending on the state of the source . On the other hand, the channel code is determined by the energy consumption depending on the channel state . If the energy in the battery is not sufficient to provide and , policy (7) assigns the residual energy to the source encoder and to the channel encoder according to parameter . The next result assesses the performance of the EMU policy defined above.
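The lookup-table structure of policy (7) can be sketched as follows in Python. The hyper-state definition in `hyper_state` (data buffer more than half full, battery more than half empty), the table layout, and every number in `make_entry` are assumptions for illustration. In an actual design the table entries would come from solving the problem of Definition 1 once per hyper state.

```python
def hyper_state(x_k, e_k, data_size, battery_size):
    """Assumed hyper state: n = 1 if the data buffer is more than half full,
    m = 1 if the battery is more than half empty."""
    n = int(x_k > data_size / 2)
    m = int(e_k < battery_size / 2)
    return n, m


def emu_policy(table, x_k, e_k, q, h, data_size, battery_size):
    """Evaluate policy (7) for the current buffer, source, and channel states."""
    entry = table[hyper_state(x_k, e_k, data_size, battery_size)]
    alpha = entry["alpha"]
    d_k = entry["D"][q]
    # If the battery cannot supply the targets, split what is left according to alpha.
    e_s = min((1.0 - alpha) * e_k, entry["Es"][q])
    e_t = min(alpha * e_k, entry["Et"][h])
    return d_k, e_s, e_t


def make_entry(n, m):
    """Made-up table entry: spend less when the battery is low (m = 1),
    accept more distortion when the data buffer is filling up (n = 1)."""
    scale = 0.5 if m else 1.0
    return {
        "alpha": 0.5,
        "D": {"q0": 0.2 + 0.1 * n, "q1": 0.1 + 0.1 * n},
        "Es": {"q0": 1.0 * scale, "q1": 2.0 * scale},
        "Et": {"h0": 3.0 * scale, "h1": 1.0 * scale},
    }


table = {(n, m): make_entry(n, m) for n in (0, 1) for m in (0, 1)}

plenty = emu_policy(table, x_k=10.0, e_k=9.0, q="q1", h="h0",
                    data_size=100.0, battery_size=10.0)
scarce = emu_policy(table, x_k=10.0, e_k=2.0, q="q1", h="h0",
                    data_size=100.0, battery_size=10.0)
```

With a well-charged battery the policy returns the table targets unchanged, whereas with little stored energy the `min` terms cap the allocation so that the total never exceeds the available energy.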

###### Proposition 3.

The policy achieves a long-term average distortion that approaches as .

Proposition 3 states that the scaling behavior of the long-term average distortion achieved with the policy is almost optimal (cf. Proposition 2). More specifically, the optimality gap converges to zero as . This scaling behavior can be interpreted as the truncation of the Taylor representation of the optimality gap to the second-order derivative with respect to the drifts in each buffer. The second-order component dominates the performance because the first-order component cancels out: the drifts in each buffer have opposite signs, are equal in magnitude, and occur with asymptotically equal probability (see Appendix). Moreover, the data and energy buffer drifts imposed by keep the probabilities of battery depletion and memory overflow small (but different from zero), such that they become asymptotically negligible. More complex EMU policies, for instance policies that adapt the source-channel code and the associated energies online, might force these probabilities to zero. However, according to Proposition 2, any other policy, even if more complex and adaptive, cannot perform substantially better than .

We additionally observe that the source and channel encoder parameters are jointly adapted over time. This is consistent with the dynamic programming solution in [6] for small memory/battery sizes. It is also interesting to note that the source encoder and the transmitter are separately controlled as long as the hyper state of the buffers remains the same, for instance, as long as the data queue length is less than and the available energy is larger than . Hence, as , a separate energy management for source encoding and transmission remains optimal, which is consistent with the results in [6].

## IV Numerical Results

We present Monte Carlo simulations in order to numerically assess the performance of the proposed EMU policy. We consider a system with slot duration ms. In each, the sensor acquires noisy samples with SNR . The source encoder output bits are passed through a channel encoder and transmitted over an AWGN channel with channel uses (the transmission bandwidth thus is kHz).

Using the model in [3], the source encoder is characterized by the rate-distortion-energy function Here, the first term is the information-theoretic rate-distortion limit for a zero-mean white Gaussian source with variance and with minimum distortion (minimum MSE) . The function accounts for the rate increase incurred by practical (energy-limited) compression schemes. For our simulations we chose , , and a maximum energy consumption per slot of dBm. The transmitter is characterized by the channel rate function , where accounts for the path loss, dB is the SNR gap to Shannon capacity of the (rate-adaptive) channel code, and is the noise power (measured in J, here dBm).

The source SNR dB and the path loss dB are two-state Markov chains with uniform steady-state pmf and transitions from one state to the other happening with probability . The harvested energy is a uniformly distributed Markov chain with nine states, uniformly spaced in the interval and with transition probability from each state to any of the other eight states equal to .

Fig. 3 shows the optimality gap of the policy versus data and energy buffer size. It is seen that the optimality gap can be decreased to 0.2% of the source variance by simultaneously increasing the energy buffer size and the data buffer size to realistic values of 50 J and 75 kB, respectively.⁴ Moreover, these results confirm the validity of the scaling behavior stated in Proposition 3.

⁴ These are typical values for the capacitor and for the memory of a low-power sensor node.

## V Concluding Remarks

In this paper we have proposed an energy management policy for energy-harvesting sensor nodes that achieves a close-to-optimal distortion scaling with respect to battery and memory size. Our large deviations results substantially differ from [9, 10], which assumed the bits arriving in the data buffer to be exogenous (uncontrolled). In these papers, the average harvested energy was assumed to be strictly larger than the average energy required to achieve the optimal utility. The energy buffer is therefore constantly filled, which implies that only the data buffer needs to be controlled, at the price that not all harvested energy is used. In contrast, our proposed energy management policy jointly adapts the source code and the channel code, which leads to a separate control of the energy buffer and the data buffer and achieves the distortion lower bound without any detrimental energy leakage.

## Appendix

### Proof of Proposition 1

We prove this result for the special case of infinite data and energy buffer, i.e., $A = \infty$ and $B = \infty$. This also establishes the bound for finite buffer sizes since the assumption $A < \infty$ and $B < \infty$ is more restrictive (an infinite buffer can always mimic a finite buffer) and hence cannot lead to a smaller achievable distortion.

Without loss of generality we assume mean rate stability, i.e., and , both of which are always satisfied for the case of finite data buffers. Using [14, Theorem 3], we now prove that mean rate stability implies the same necessary stability conditions as in [6, Proposition 1].

Since is convex non-increasing in and and is concave non-decreasing in , Jensen’s inequality implies

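$$\frac{1}{n} \sum_{k=1}^{n} \mathbb{E}\big[f(D_k, E_{s,k}, Q_k) - g(H_k, E_{t,k})\big] \ge \sum_{q} \Pr(q)\, f^{(q)}\big(\bar{D}^{(q)}_{n}, \bar{E}^{(q)}_{s,n}\big) - \sum_{h} \Pr(h)\, g^{(h)}\big(\bar{E}^{(h)}_{t,n}\big),$$

with

$$\bar{D}^{(q)}_{n} = \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}[D_k \mid Q_k = q], \qquad \bar{E}^{(q)}_{s,n} = \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}[E_{s,k} \mid Q_k = q], \qquad \bar{E}^{(h)}_{t,n} = \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}[E_{t,k} \mid H_k = h].$$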

Furthermore, we have

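$$
\begin{aligned}
\limsup_{n\to\infty} \sum_{q} \Pr(q)\, f^{(q)}\big(\bar{D}^{(q)}_{n}, \bar{E}^{(q)}_{s,n}\big) &= \sum_{q} \Pr(q)\, f^{(q)}\big(D^{(q)}, E_s^{(q)}\big),\\
\limsup_{n\to\infty} \Big(-\sum_{h} \Pr(h)\, g^{(h)}\big(\bar{E}^{(h)}_{t,n}\big)\Big) &= -\sum_{h} \Pr(h)\, g^{(h)}\big(E_t^{(h)}\big),
\end{aligned}
$$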

with , , and . Using the fact that mean rate stability and [14, (12)] imply , we finally arrive at

$$\sum_{q} \Pr(q)\, f^{(q)}\big(D^{(q)}, E_s^{(q)}\big) \le \sum_{h} \Pr(h)\, g^{(h)}\big(E_t^{(h)}\big),$$

which is the same necessary stability condition as in [6, Proposition 1]. The proof of Proposition 1 thus follows from [6, Proposition 1].

### Proof of Proposition 2

We first define some quantities that are instrumental for the proofs of Propositions 2 and 3. The probabilities that a policy results in an empty energy buffer or a full data queue are respectively defined as

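$$
\begin{aligned}
p^{\pi}_{EB} &= \limsup_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \Pr\big([E_k - E_{s,k} - E_{t,k}]^{+} = 0\big), && (8)\\
p^{\pi}_{FQ} &= \limsup_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \Pr(X_k = A). && (9)
\end{aligned}
$$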

The decremental drift of the unbounded energy buffer queue process , and the incremental drift of the unbounded data queue process are respectively defined as

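$$
\begin{aligned}
\delta_e &= \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}\big[\tilde{E}_k - \tilde{E}_{k+1}\big],\\
\delta_d &= \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}\big[\tilde{X}_{k+1} - \tilde{X}_k\big].
\end{aligned}
$$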

The drift $\delta_d$ can be viewed as the long-term expected difference between the size of the data buffer input and the size of the data buffer output, whereas the drift $\delta_e$ can be viewed as the long-term expected difference between the size of the energy buffer output and the size of the energy buffer input.

We denote the event of normal operation by (i.e., the data buffer is not full and the energy buffer is not empty). The complementary event is denoted as (i.e., either the data buffer is full or the energy buffer is empty). The instantaneous expected distortion can now be written as . With , , and the union bound , we obtain the following upper bound on the long-term average distortion :

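$$\bar{D}^{\pi} = \limsup_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}^{\pi}[D_k] \le D_{\mathrm{op}} + D_{\max}\big(p^{\pi}_{FQ} + p^{\pi}_{EB}\big). \tag{10}$$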

Here, we have used (8), (9), and the long-term average distortion during normal operation,

$$D_{\mathrm{op}} = \limsup_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}^{\pi}[D_k \mid \mathcal{E}_k]. \tag{11}$$

Note that here we assume that if the energy buffer is empty or the data buffer is full, a packet is lost and the maximum distortion is accrued (i.e., the decoder treats this missing packet as an arbitrary vector, e.g., the mean of the source distribution). An energy buffer discharge or data buffer overflow could be handled in a more sophisticated manner, but this does not bear on our asymptotic analysis (see also [8] for a similar reasoning). The bound (10) indicates that studying the optimal convergence of to the lower bound (see Proposition 1) is equivalent to finding the optimal scaling laws for i) the probabilities and approaching zero and ii) the operational distortion approaching .

In order to prove Proposition 2, we need the following two lemmas.⁵

⁵ The following notations are used to compare two sequences and as grows: if for all and some constant ; if ; if and ; if .

###### Lemma 1.

Let and consider an arbitrary control scheme that achieves . Then, .

Lemma 1 states that for an infinitely large energy buffer no control scheme can make both and converge at a rate faster than . The proof of Lemma 1 is based on [8, Proposition 2.4.1]. Let us consider the optimization problem , i.e., the problem formulated in Def. 1 with the specific choice . Denote the minimum of by .

According to Proposition 1, the solution to this problem with equals the lower bound on the minimum achievable distortion. Notice that, by the convexity [15] of the problem, is convex and non-increasing in . Moreover, using the same arguments as in Proposition 1, it can be proved that there exists no policy that is able to achieve a long-term average distortion smaller than with a data queue drift smaller than or equal to . Thus, can be viewed as the lower limit of the distortion-drift region. This observation allows us to directly apply the proof of [8, Proposition 2.4.1] to Lemma 1.

###### Lemma 2.

Let , and consider an arbitrary control scheme that achieves . Then, .

Lemma 2 is the counterpart of Lemma 1; it states that, for an infinitely large data buffer size, no control scheme can achieve a convergence rate faster than for both and . The proof parallels that of Lemma 1.

We now prove Proposition 2 by contradiction. Assume that there exists a policy with and that achieves , i.e., is asymptotically bounded above by (where and are constant factors). Such a policy would violate Lemma 1 (or Lemma 2) as (or ) tends to infinity and hence cannot exist. This implies that , which concludes the proof of Proposition 2.

### Proof of Proposition 3

In order to prove Proposition 3, we first need to recall some known results. Let us define the random walk , , with and a stationary, irreducible, and aperiodic Markov chain with states , . The transition probability from state to state is denoted and is the invariant distribution of . The drift of the random walk is assumed negative. The moment generating function of is given by

$$\rho(r) = \mathbb{E}[\exp(r W_k)] = \sum_{i=1}^{I} \Pr(w_i) \exp(r w_i).$$

It can be shown that the function $\log \rho(r)$ (i.e., the cumulant-generating function) has a unique positive zero at $r = r^{*}$.

###### Theorem 1.

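(Wald’s identity) Let $K$ be the first $k$ for which $Z_k \ge L$ or $Z_k \le 0$. Then

$$\mathbb{E}[\exp(r^{*} Z_K)] = 1,$$

where $r^{*}$ is the unique positive root of $\log \rho(r)$. Furthermore,

$$\mathbb{E}[K]\, \mathbb{E}[W_k] = \mathbb{E}[Z_K].$$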

Using Theorem 1 we can compute the probability that a negative-drift random walk that starts at zero will cross the barrier before returning to the origin,

$$p\, \mathbb{E}[\exp(r^{*} Z_K) \mid Z_K \ge L] + (1-p)\, \mathbb{E}[\exp(r^{*} Z_K) \mid Z_K \le 0] = 1.$$

Since and in the regime of large , we have

$$p = \Theta\big(\exp(-r^{*} L)\big), \qquad \mathbb{E}[Z_K] = p\, \mathbb{E}[Z_K \mid Z_K \ge L] + (1-p)\, \mathbb{E}[Z_K \mid Z_K \le 0] = \Theta(1).$$

Hence, by Theorem 1, also the expected crossing time is dominated by the return to the origin, i.e., .
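The exponential decay of the crossing probability can be checked numerically in a simplified setting. The sketch below assumes an i.i.d. walk with steps of +1 and -1 (a special case chosen for this example; the text considers Markov-modulated increments), finds the positive root of the cumulant-generating function by bisection, and compares it with the exact gambler's-ruin crossing probability. The function names and the choice `p_up = 0.4` are hypothetical.

```python
import math


def mgf(r, p_up):
    """Moment generating function of a +/-1 step with Pr(+1) = p_up."""
    return p_up * math.exp(r) + (1.0 - p_up) * math.exp(-r)


def positive_root(p_up, lo=1e-6, hi=50.0):
    """Bisection for the positive zero of log mgf (requires negative drift, p_up < 0.5)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.log(mgf(mid, p_up)) < 0.0:
            lo = mid  # still left of the root
        else:
            hi = mid
    return 0.5 * (lo + hi)


def crossing_probability(p_up, barrier):
    """Exact probability that the walk started at 0 reaches the barrier
    before returning to 0 or below (first step up, then gambler's ruin)."""
    rho = (1.0 - p_up) / p_up
    return p_up * (rho - 1.0) / (rho ** barrier - 1.0)


p_up = 0.4
r_star = positive_root(p_up)
# If p = Theta(exp(-r* L)), this ratio should settle to a constant as L grows.
ratios = [crossing_probability(p_up, L) * math.exp(r_star * L) for L in (20, 40)]
```

For this walk the root equals the logarithm of the ratio between the down and up probabilities, and the ratio `p * exp(r* L)` stabilizes at a constant, in line with the scaling stated above.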

Let us now define , as the minimum distortion in the convex problem (Def. 1). According to Proposition 1, is a lower bound for the minimum achievable distortion . The convexity of implies that is convex and non-increasing in . Moreover, it can be proved using the same arguments as in Proposition 1 that there exists no policy that is able to achieve a long-term average distortion smaller than with a data queue drift and energy buffer drift less than or equal to and , respectively. Thus, can be viewed as a lower bound for the distortion-drift region.

Consider now the policy in Def. 2 and recall that the hyper states and indicate, respectively, whether the data buffer is more than half full and the energy buffer is more than half empty. Both and can be shown to be irreducible and aperiodic Markov chains. Furthermore, the data queue increment process in each half of the data buffer is a function of the aggregate Markov chain . Hence, in each half of the data queue is itself an irreducible and aperiodic Markov chain whose mean equals the drift . Similarly, the energy buffer decrement process in each half of the energy buffer is a function of the aggregate Markov chain and hence is itself an irreducible and aperiodic Markov chain whose mean equals the drift (as defined above), i.e., . We next state a lemma that follows from [10] and relates the unique positive roots and of the cumulant-generating functions of and , respectively.

###### Lemma 3.

Assume that the policy is the unique optimal policy for the drifts and . Then

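$$
\begin{aligned}
\left.\frac{d r^{*}_{d}(\hat{X}_k)}{d\delta_d}\right|_{\delta_d = \delta_e = 0} &= -\frac{2}{\operatorname{var}\big(W^{d}_{k}(\hat{X}_k) \mid \delta_d = \delta_e = 0\big)}, && \text{(12a)}\\
\left.\frac{d r^{*}_{e}(\hat{E}_k)}{d\delta_e}\right|_{\delta_d = \delta_e = 0} &= -\frac{2}{\operatorname{var}\big(W^{e}_{k}(\hat{E}_k) \mid \delta_d = \delta_e = 0\big)}. && \text{(12b)}
\end{aligned}
$$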

The proof of Lemma 3 is based on the fact that the policy depends smoothly on the drifts in the neighborhood of . This implies that and are smooth functions of the means of the respective increment/decrement processes, i.e., and , too. Since the supports of and are finite and the policy is almost always continuous and differentiable (like the waterfilling-like policy in [16]), the respective second-order derivatives around almost always exist and are continuous.

To obtain (we omit the argument in what follows for notational simplicity), we need to find the root of the cumulant generating function of with :

$$\Lambda(r^{*}_{d}) = \log \rho(r^{*}_{d}) = \log\left(\sum_{i=1}^{I} \Pr(w^{d}_{i}) \exp(r^{*}_{d} w^{d}_{i})\right) = 0;$$

here, is the invariant distribution of , which depends on . Denoting by the th cumulant of (i.e., the th derivative of the cumulant generating function at ), the Maclaurin expansion of reads

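$$\Lambda(r^{*}_{d}) = \kappa_0 + \sum_{n=1}^{\infty} \kappa_n \frac{(r^{*}_{d})^{n}}{n!} = \delta_d\, r^{*}_{d} + \sum_{n=2}^{\infty} \kappa_n \frac{(r^{*}_{d})^{n}}{n!},$$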

where the second equality is obtained with and . Setting this expression equal to zero, dividing both sides by and differentiating with respect to yields

$$\sum_{n=2}^{\infty} \kappa_n \frac{(n-1)\,(r^{*}_{d})^{n-2}}{n!}\, \frac{d r^{*}_{d}}{d\delta_d} = -1.$$

Since as (due to well-known properties of the moment generating function, see, e.g., [8]), the above expression becomes

$$\frac{\kappa_2}{2} \left.\frac{d r^{*}_{d}}{d\delta_d}\right|_{\delta_d = 0} = -1.$$

By substituting , we obtain (12a) in Lemma 3. The proof of (12b) is analogous.
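The relation between the slope of the root and the variance, $d r^{*}/d\delta = -2/\kappa_2$ at zero drift, can be sanity-checked on a toy example. The sketch below again assumes an i.i.d. walk with steps of +1 and -1 and mean $\delta$ (a simplification chosen for this example, not the Markov-modulated processes of the lemma), for which the positive root has the closed form $\log\frac{1-\delta}{1+\delta}$ and the variance at zero drift equals 1.

```python
import math


def root(delta):
    """Nonzero root of the cumulant-generating function of a +/-1 walk with mean delta.

    With Pr(+1) = (1 + delta) / 2, solving p*exp(r) + (1 - p)*exp(-r) = 1
    gives exp(r) = (1 - delta) / (1 + delta); the root is positive for delta < 0.
    """
    return math.log((1.0 - delta) / (1.0 + delta))


eps = 1e-3
# One-sided finite difference from the negative-drift side; the root vanishes at delta = 0.
slope = (root(-eps) - 0.0) / (-eps)

variance_at_zero_drift = 1.0  # var of a fair +/-1 step
predicted = -2.0 / variance_at_zero_drift
```

The finite-difference slope agrees with the predicted value of -2 up to the discretization error, which supports reading (12a) and (12b) with the variance in the denominator.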

Let us next consider the probability of a full data buffer