Staying Alive: System Design for Self-Sufficient Sensor Networks
Abstract
Self-sustainability is a crucial step for modern sensor networks. Here, we offer an original and comprehensive framework for autonomous sensor networks powered by renewable energy sources. We decompose our design into two nested optimization steps: the inner step characterizes the optimal network operating point subject to an average energy consumption constraint, while the outer step provides online energy management policies that make the system energetically self-sufficient in the presence of unpredictable and intermittent energy sources. Our framework sheds new light on the design of pragmatic schemes for the control of energy harvesting sensor networks and makes it possible to gauge the impact of key sensor network parameters, such as the battery capacity, the harvester size, the information transmission rate and the radio duty cycle. We analyze the robustness of the obtained energy management policies in the cases where the nodes have differing energy inflow statistics and where topology changes may occur, devising effective heuristics. Finally, we evaluate our energy management policies on real solar radiation traces, validating them against state-of-the-art solutions and describing the impact of relevant design choices in terms of achievable network throughput and battery level dynamics.
Categories and Subject Descriptors: G.1.6 [Optimization]: Stochastic Programming
General Terms: Design, Algorithms, Performance
ACM Reference Format: Nicola Bui and Michele Rossi. 2014. Staying Alive: System Design for Self-Sufficient Sensor Networks.
The research leading to these results has received funding from the Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 251557 (Project SWAP).
Authors’ addresses: Nicola Bui, IMDEA Networks Institute, Av. Mar del Mediterraneo, 22, 28918, Madrid, Spain. Email: nicola.bui@imdea.org. Michele Rossi, Department of Information Engineering (DEI), University of Padova, Via Gradenigo 6/B, 35131 Padova, Italy. Email: rossi@dei.unipd.it.
1 Introduction
The operation of wireless sensor networks powered by renewable sources is a very lively area of research, both theoretical and applied. This is due to the increasing inclination toward green systems and to the need for Wireless Sensor Networks (WSNs) that can last unattended indefinitely. In fact, despite the advances in microprocessor fabrication and protocol design, batteries are expected to last for less than ten years for many applications and their replacement is in some cases prohibitively expensive. This problem is particularly severe for urban sensing applications, e.g., sensors placed below the street level, where the installation of new power cables is impractical. Other examples include body sensor networks or WSNs deployed in remote geographic areas [Wang and Liu (2011)]. In contrast, WSNs powered by energy scavenging devices provide potentially maintenance-free perpetual networks, which are particularly appealing, especially for a highly pervasive Internet of Things [Atzori et al. (2010)].
In the past few years, a vast literature has emerged on energy harvesting WSNs. These networks are made of tiny sensor devices with communication capabilities, that also have an onboard rechargeable battery (also referred to as energy buffer) and are capable of scavenging energy from the surrounding physical environment. Most of the research papers that have been published so far deal with the energy neutral design of transmission policies, where the concept of energy neutrality accounts for the fact that the energy used, in the long term, should be equal to that harvested. Within this body of work, two well established approaches have been adopted to find energy neutral policies, namely, offline and online. Offline solutions are concerned with finding optimal packet transmission schedules, assuming that the nodes have full knowledge of the harvesting and information generation processes. Although this is unrealistic, it provides useful insights into the design of online strategies. On the other hand, online approaches only assume some prior statistical knowledge about the energy arrival and the input data processes.
Offline approaches: [Ozel et al. (2011)] considers a single sensor node transmitting data over a wireless fading channel with additive Gaussian noise and causal channel state information at the transmitter. The authors of this paper obtain optimal policies considering two objectives: maximizing the throughput by a deadline and minimizing the transmission completion time. [Yang and Ulukus (2012)] generalizes the results of [Ozel et al. (2011)] by relaxing the assumption on packet arrivals, which can now occur during transmissions. Also, this paper derives fast search algorithms leveraging structural properties of the solution. Another recent work [Gregori and Payaró (2013)] relaxes the assumption that the battery is infinite, obtaining optimal transmission policies for given Quality of Service (QoS) constraints, while fulfilling data and energy causality constraints. To the best of our knowledge, no papers in this category have studied energy management policies for networks of devices.
Online approaches: these approaches differ in the stochastic model considered for the energy arrival process and in the optimization objective. Notably, only a few contributions have addressed aspects related to multiple access and routing in distributed networks. [Vigorito et al. (2007)] presents a decentralized strategy for the control of an energy buffer with stochastic replenishment, through the adaptation of the transmission duty cycle. This paper models the optimal buffer management as an online optimization problem, estimating the system dynamics using a gradient descent update rule and implementing energy-centric policies. Similarly, [Hsu et al. (2006)] presents an adaptive duty cycling algorithm for energy harvesting sensor nodes.
The authors of [Kansal et al. (2007)] study fundamental properties of energy harvesting processes and utilize them to devise an algorithm which maximizes the throughput based on energy prediction. [Fan et al. (2008)] proposes a solution for high throughput with fairness guarantees, devising centralized and distributed algorithms that compute the optimal lexicographic rate assignment for all nodes. [Lei et al. (2009)] develops a Markov decision analysis for a sensor node with i.i.d. stochastic replenishments (i.e., fixed energy arrival rate) and a finite energy buffer. The authors of this paper devise optimal online policies that depend on the importance of packets, which is modeled through a generic probability distribution function (pdf). The authors of [Sharma et al. (2010)] propose throughput-optimal as well as delay-optimal online policies for a sensor node with infinite data and energy queues. This paper considers stationary and ergodic arrival processes for data and energy, and transmission over fading channels. [Michelusi et al. (2013)] generalizes the results of [Lei et al. (2009)]: it models energy replenishment through a two-state Markov model and associates a cost with data transmission. Optimal and heuristic policies are characterized considering the long-term importance of the transmitted data through a dynamic programming formulation. The focus of [Luo et al. (2013)] is instead on practical circuits for energy harvesting wireless transmitters and on their impact on the design of optimal transmission policies for TDMA channel access. The paper optimizes the time spent in storing energy and transmitting, while accounting for QoS constraints and a TDMA access scheme.
Other approaches dealing with multiple access channels and, in turn, considering the simultaneous interaction of multiple sensor nodes are [Gatzianas et al. (2010), Huang and Neely (2013)], [Michelusi and Zorzi (2013)] and [Tapparello et al. (2013)]. To our knowledge, [Gatzianas et al. (2010)] is the first contribution that has dealt with the distributed control of energy harvesting WSNs. There, the authors present an online and adaptive policy for the stabilization and optimal control of these networks using tools from Lyapunov optimization. This line of work has been continued by [Huang and Neely (2013)], which tackles the distributed routing problem using Lyapunov optimization theory combined with the idea of weight perturbation, see, e.g., [Neely et al. (2008)]. The authors of [Michelusi and Zorzi (2013)] consider a single-hop WSN where each node harvests energy from the environment and randomly accesses the channel to transmit packets of random importance to a sink node. Optimal distributed policies, based on a game-theoretic formulation of the random access problem, are then proposed. [Tapparello et al. (2013)] presents a theoretical framework which extends [Gatzianas et al. (2010), Huang and Neely (2013)] by proposing joint transmission, data compression (distributed source coding, DSC) and routing policies that minimize the long-term expected distortion of the signal reconstructed at the sink, while assuring the energetic stability of the network.
Other research directions deal with energy sharing networks [Zhu et al. (2010)] and laser power beaming [Bhatti et al. (2014)]. However, in the present contribution we consider neither the possibility of exchanging energy among nodes nor that of performing wireless energy transfer. Further extensions may involve the adoption of energy-aware programming languages [Sorber et al. (2007)].
Our contribution: our present work belongs to the online category and considers networks of energy harvesting devices. Specifically, we propose a framework based on the dynamic adaptation of two key protocol parameters, namely, the radio duty cycle and the transmission frequency for the node's own generated traffic. This framework makes it possible to assess the performance of energy harvesting sensor networks, while shedding new light on the pragmatic design of energy management solutions.
Toward this end, we account for: 1) the network topology, 2) the transmission of endogenous (own packets) data, 3) the relaying of exogenous (forwarded) data, 4) the amount of energy consumed for transmission, reception, idling, processing, etc., 5) the channel access mechanism and 6) the harvested energy inflow dynamics. For the channel access, we consider the Low Power Listening (LPL) MAC [Buettner et al. (2006), Bonetto et al. (2012)], whereas routing dynamics are modeled through the IETF Routing Protocol for Low-Power and Lossy Networks (RPL) [Ko et al. (2011), Bui et al. (2012)].
Technically, our first contribution is a model that, for any pair of duty cycle and transmission frequency, returns the associated average energy consumption of a sensor node, taking 1)–5) as input. We obtain (in closed form) the pair that maximizes the node throughput subject to a given energy constraint. We subsequently locate the bottleneck node in the network (the one suffering the highest amount of interference) and we carry out a further optimization step based on 6), taking this worst case into account. The resulting policies dynamically select the pair considering the state of the bottleneck node along with the stochastic model of the harvested energy. Being dimensioned for the worst case, the obtained policies can be applied at all nodes, leading to the self-sufficient operation of the entire WSN. We then comment on the behavior of the obtained energy management policies and compare their performance against that of competing solutions from the state of the art. Finally, we relax each of the model assumptions, showing that the solutions so obtained are still robust.
In summary, the main contributions of the present paper are:

a model for the energy consumption of a network of embedded wireless devices;

a closed form formula for the optimal operating point of the network;

a mathematical framework to maximize the throughput performance, while allowing the perpetual operation of the entire sensor network;

a performance evaluation of the proposed energy management policies;

a validation of the proposed solution when the model assumptions are relaxed.
In Table 1, we introduce the notation used in the rest of the paper. Additional definitions will be given at the beginning of each section.
The remainder of this paper is organized as follows. In Section 2, we describe the workflow of the paper, detailing the objectives of our design and how these are accomplished by the analyses that follow. In Section 3 and Section 4, we characterize the energy consumption of a sensor node according to the network properties and we derive the optimal operating point for the network subject to input energy constraints. In Section 5 we present a stochastic semi-Markov model for the harvested energy and in Section 6 we obtain energy management policies for self-sufficient networks of embedded devices. In Section 7 and Section 8, we evaluate the proposed policies and, in Section 9, we present our closing remarks.
2 Problem Formulation
In this section we describe the problem formulation as two nested optimization problems. The list of used symbols is given in Table 2.
We consider a wireless sensor network composed of homogeneous embedded devices, where sensor nodes transmit their readings to a data collector node (referred to as the sink). The nodes are deployed according to a certain multi-hop topology, and the data packets are routed toward the sink through a predetermined collection tree, as detailed in Section 3. Each sensor node is described through the diagram in Fig. 1. Specifically:

Energy source (S): this block accounts for the presence of some energy scavenging circuitry that feeds a storage unit. The amount of harvested current is described by the variable . A detailed description of a stochastic semiMarkov model of S is provided in Section 5. Note that, while the energy scavenged is stochastic across time, we initially assume that it is described by the same Markov source for all nodes. The extension to heterogeneous energy sources is provided in Section 8.1.

Battery (B): the storage unit (e.g., either a rechargeable battery or a supercapacitor) provides an average current to the following block N, see Section 6.

Sensor node (N): this block models the aggregate energy consumption of a sensor node, which is referred to as . This accounts for the energy drained by the sensor node hardware, including the network protocol stack (e.g., routing, channel access and physical layer), the onboard sensors and the CPU. The energy consumption of block N is characterized in Section 3.
The overall objective of our analysis is to provide dynamic and energy-dependent (i.e., depending on the state of S and B) configurations for the sensor nodes in the network, so that the entire network will be energetically self-sufficient.
To accomplish this, for a given network setup, we first identify the so-called bottleneck node, which is the node experiencing the highest traffic load. This node is by definition the one subject to the highest energy consumption (more precise details will be given in Section 3 and in Appendix D).
Our analysis develops along the following two optimization steps:

We first characterize the energy consumption of the bottleneck node for the given routing topology and channel access technology. In detail, we relate its average energy consumption (assumed constant for this first analysis) to two key parameters: the radio duty cycle and the transmission frequency for the endogenous traffic. Given this, we solve a first optimization problem P1 (the inner problem in Fig. 1), where we seek the operating point (i.e., the pair of duty cycle and transmission frequency) for which the throughput is maximized, taking the average energy consumption as the constraint. To solve P1, we model the interaction of the bottleneck node with the other sensors in the network, accounting for the transmission behavior of all nodes within range (e.g., the amount of relay traffic from the children nodes, the total traffic that these forward on behalf of their children, the number of interferers and their transmission rate, etc.). Subsequently, we derive in closed form the optimal protocol configuration for a given average energy consumption constraint.

In the second optimization step (problem P2), we additionally account for the presence of blocks S and B. Block S is modeled through a stochastic time-correlated Markov model, in which the harvested current is a time-varying, correlated stochastic process, and the current drained by the node is now the control variable. Problem P2 consists of dynamically selecting this control (or, equivalently, the corresponding operating point, where the mapping between the two follows from the solution of P1), for the given energy source model, so that the bottleneck maximizes its own throughput while being energetically self-sufficient.
At this point, we combine the results of P1 and P2: P1 decides the optimal operating point for the bottleneck as a function of the drained current, whereas P2 dictates how this current should vary as a function of the battery state and of some statistical knowledge of the energy harvesting process. This combined optimization amounts to a dynamic selection of the current level that has to be drained by the node, depending on the state of S and B, so that the throughput is maximized (P1) and the node is energetically self-sufficient (P2).
After solving this combined problem, the self-sufficiency of all network nodes can be assured by the following scheme. Time is divided into a number of slots, which depends on the temporal characterization of the energy scavenging process, see Section 5. A decision epoch occurs at the beginning of each slot, i.e., when the source model transitions to a new state. At each epoch, the sink collects the information about the state of the battery of the bottleneck node, computes the optimal actions (using P1 and P2) for the next time slot for this node, and sends back a description of the computed optimal policy to all network nodes. Thus, all nodes will implement, in the next time slot, the policy that is optimal for the bottleneck. Consequently, the energetic stability at all nodes is assured. This can be conveniently implemented through a practical network management and routing protocol such as RPL [Winter et al. (2010)].
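The per-epoch scheme described above can be sketched as follows. This is only an illustrative outline under stated assumptions: the names `Config`, `quantize`, `decision_epoch`, the table-based `policy` (standing in for the solution of P2) and the callable `operating_point` (standing in for the closed-form solution of P1) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Protocol configuration broadcast by the sink (hypothetical fields)."""
    drained_current: float  # target average current, chosen by the P2 policy
    duty_cycle: float       # radio duty-cycle period, from the P1 solution
    tx_frequency: float     # own-traffic transmission frequency, from P1


def quantize(level, n_levels):
    """Map a normalized battery level in [0, 1] to an index 0..n_levels-1."""
    return min(int(level * n_levels), n_levels - 1)


def decision_epoch(policy, operating_point, source_state,
                   bottleneck_battery, node_ids, n_levels=4):
    """One decision epoch, run at the sink when the source changes state.

    policy: dict (source_state, battery_index) -> drained current (assumed
            to be precomputed by solving P2).
    operating_point: callable current -> (duty_cycle, tx_frequency)
            (assumed to implement the solution of P1).
    Returns the configuration sent to every node for the next slot.
    """
    battery_index = quantize(bottleneck_battery, n_levels)
    current = policy[(source_state, battery_index)]
    duty_cycle, tx_frequency = operating_point(current)
    config = Config(current, duty_cycle, tx_frequency)
    # Every node applies the configuration that is optimal for the bottleneck.
    return {node: config for node in node_ids}
```

The design choice mirrored here is that only the bottleneck's battery state enters the decision, while the resulting configuration is applied network-wide.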
In this paper we look at a coarse-grained control of the protocol behavior of the nodes. In fact, one control command has to be sent out to the nodes at the beginning of every time slot, whose duration depends on the number of states that are used to model the energy inflow during a typical day. While our mathematical analysis holds for any number of energy states, practical considerations related to the network overhead incurred in sending control actions to the nodes, and to the number of states that is sufficient to accurately model, e.g., typical solar sources, lead to slot durations of the order of hours.
In Section 3, for a given network scenario (i.e., transmission model, topology and data collection tree), we characterize the energy consumption of the bottleneck node. Thus, in the next Sections 4 and 6 we respectively solve problems P1 and P2 for this node, assuming that all the remaining nodes in the network behave in the same exact manner as the bottleneck does.
In Section 8.1 we extend our analysis to the case where the sensor nodes harvest different amounts of energy.
3 Node Consumption Model
The symbols used in this section are listed in the following Table 3.
In this section, we discuss the sensor node block of our architecture: this entails the definition of a tractable framework to model the interactions among nodes, including routing and channel access (MAC). We require the model to track network characteristics such as the topology, the adopted MAC protocol, channel errors and internal processing (assembling data packets, etc.). Although our framework develops along the lines of [Fischione et al. (2013)], we aim at obtaining simple and meaningful relationships that will make it possible to compute the optimal throughput in closed form.
For tractability, we make the following assumptions:

there exists a node that consumes more energy than any other sensor. This node is referred to as the bottleneck node;

every sensor operates as the bottleneck node in terms of information generation rate, (expressed in packets per second), and duty cycle, , where , whereas and are the durations of the active and sleeping portions of the duty cycle, respectively;

the sensor nodes maintain the same behavior for long enough to justify the use of average energy consumption figures. Specifically, the time scale at which the sink takes control actions is much coarser than that related to the radio duty cycling.
To start with, we identify the operational states of a sensor node and, for each of them, the associated energy expenditure (expressed here in terms of the current drained in each state ):

TX: this is the transmission state. Here, both the microprocessor and the radio transceiver are active and the current drained in these states is and , respectively.

RX: in this state a node receives and decodes a radio frame. As for the TX state, both the microprocessor and the radio transceiver are on and, in this case, their energy drainage is and , respectively.

INT: in this state the node receives a frame that is neither intended for it nor has to be forwarded by it. Here, the node drains exactly the same current as in state RX. In the following analysis, we track this state separately from RX as the rate of interfering and successful transmissions may differ.

CPU: the node is busy with operations that do not require any radio activity (e.g., sensing, data processing, encoding, etc.). In this state, the radio transceiver is off or in a power saving state, thus the consumption is just .

IDLE: the node is idle and can switch to some low-power state. However, since preamble-sampling MAC protocols, such as X-MAC [Buettner et al. (2006)] or Low-Power Listening (LPL) [Moss et al. (2007)], need to periodically sample the radio channel while idling, it is convenient to split this state into two substates:

CCA: in this state, the node samples the channel (Clear Channel Assessment). Hence, it drains the same current as in RX.

OFF: this is the state with the lowest energy consumption. Here, the microprocessor and the radio transceiver are in power saving mode and the total current drained by the device is , which is much smaller than all the other energy consumption figures ().

We now formally introduce the system state set
(1) 
where for the IDLE state it holds . The main idea behind our model consists of computing the average current drained by the bottleneck node for each state , for the given protocol and network parameters. Note that, in our model computing average currents is equivalent to computing powers, as we assume that the sensors operate according to a fixed supply voltage. For each , we have that: , where , and correspond to the drained current, the average permanence time (duration) in state and the average rate (frequency) at which state is entered, respectively. In addition, we use the quantity to indicate the average fraction of time the node spends in state . Hence, the average output current is obtained by the sum of the average currents:
(2) 
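As a numerical illustration of this bookkeeping, the sketch below computes the average drained current from per-state figures: each non-idle state contributes its current times the fraction of time spent in it (duration times entry rate), and the remaining time is attributed to idling. The function name and arguments are hypothetical, and lumping the CCA and OFF substates into a single effective idle current is a simplifying assumption made here, not the paper's model.

```python
def average_current(active_states, idle_current):
    """Average drained current of a node.

    active_states: dict name -> (current, duration, rate), i.e. the current
        drained in the state, the mean time spent per visit [s] and the mean
        number of visits per second.
    idle_current: effective current while idling (assumption: CCA and OFF
        are merged into one figure for this sketch).
    """
    busy_fraction = 0.0
    total = 0.0
    for current, duration, rate in active_states.values():
        fraction = duration * rate       # fraction of time in this state
        busy_fraction += fraction
        total += current * fraction      # average current of this state
    if busy_fraction > 1.0:
        raise ValueError("states occupy more than 100% of the time")
    # The idle fraction is one minus the time spent in all other states.
    return total + idle_current * (1.0 - busy_fraction)
```

For instance, with a TX state of 20 mA lasting 10 ms twice per second and an RX state of 18 mA lasting 5 ms four times per second, the node is busy 4% of the time and idles for the remaining 96%.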
To find and , we make the following choices:

the main function of the nodes is that of sensing environmental data and sending them to the sink (Section 8 describes how to account for event-driven WSNs);

at the channel access, we adopt a preamble-based, transmitter-initiated MAC protocol, such as X-MAC (that exploits a Low Power Listening strategy) [Buettner et al. (2006)];

network configuration and maintenance is managed via a distributed protocol, such as RPL (IPv6 Routing Protocol for Low-Power and Lossy Networks) [Winter et al. (2010)].
From the first choice, we assume that the nodes periodically sense the environment and generate their data at a constant rate of packets per second, where is the average inter-packet generation time (practical details on how to deal with non-periodic traffic are provided in Section 8). Also, each data packet is assembled considering the data from sensor readings; can be used to account for additional processing of data and other operations that do not involve radio activity. Note that is the nominal transmission rate, which is only obtained for a collision-free and error-free channel. In practice, given that multiple nodes share the same transmission medium, packets can be lost due to, e.g., collisions or transmission errors. Taking some error recovery into account (retransmissions), the actual transmission rate will be .
For the routing, each node forwards its data packets either to the sink or to its next-hop node (referred to as its parent node). Also, each node sends its own information packets (this is referred to as endogenous traffic), as well as the packets generated by other nodes (exogenous traffic, in case the node acts as a relay for its children nodes).
To illustrate our network setting we refer to the topology example of Fig. 2, where the bottleneck node is represented as a black dot, while the sink is placed in the center of the network. In this figure, a possible realization of the routing tree is also shown. In particular, the links represented with solid lines belong to the subtree rooted at the bottleneck. White filled dots indicate the nodes that use the bottleneck to forward their data to the sink (these are referred to as children nodes), while white triangles indicate the nodes whose traffic can interfere with that of the bottleneck (interfering nodes). Crosses indicate the position of all the other nodes.
For our model, we consider the topology, the data gathering tree and the coverage range as given. Also, we only track the number of children and interfering nodes, disregarding their actual position. Given this, next we refer to the following quantities as the input parameters for our analysis:

: is the number of children nodes, i.e., the total number of nodes in the subtree rooted at the bottleneck. governs the total traffic that has to be relayed by the bottleneck node.

: is the number of interfering nodes (white triangles of Fig. 2). These are within the transmission range of the bottleneck (i.e., within one hop from it), but the latter is not their intended next hop. Any transmission from one of these nodes can either be a spurious reception or a collision for the bottleneck.

: this corresponds to the total number of packets the bottleneck may be interfered from, i.e., the sum of the traffic load (endogenous and exogenous) from all the interfering nodes. Note that in general .
Note that especially depends on the size of the network in terms of number of communication hops, while and increase with the node density. Finally, in the analysis that follows, we assume that no node in the network has larger , , and than the bottleneck node and, for each node but the bottleneck, at least one of the three parameters is strictly smaller than that of the bottleneck.
We are now ready to compute the various quantities needed to calculate (2) for the bottleneck node. We start with states TX and RX. Note that packet transmissions and receptions depend on . In fact, given that all the nodes generate a packet every seconds (homogeneous network behavior), on average, the bottleneck will receive packets from its children nodes and will transmit packets (the exogenous traffic plus its own endogenous) every seconds. This leads to:
(3)  
(4) 
where and are the data gathering components of the transmission and reception frequencies, disregarding for the moment the traffic due to RPL.
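The counting argument above (the bottleneck receives one packet per child and transmits those packets plus its own, once per generation period) can be written as a short sketch. The names `n_children` and `t_u` are illustrative labels for the number of children and the inter-packet generation time; they are assumptions of this example.

```python
def data_gathering_rates(n_children, t_u):
    """Data-gathering TX and RX frequencies of the bottleneck [packets/s].

    n_children: number of nodes in the subtree rooted at the bottleneck.
    t_u: inter-packet generation time of every node [s] (homogeneous network).
    RPL control traffic is deliberately left out, as in the text.
    """
    if t_u <= 0:
        raise ValueError("t_u must be positive")
    f_u = 1.0 / t_u                    # nominal per-node generation rate
    f_tx = (n_children + 1) * f_u      # relayed packets plus own packet
    f_rx = n_children * f_u            # one packet from each child
    return f_tx, f_rx
```

With 4 children and one packet every 10 s, the bottleneck transmits 0.5 packets/s and receives 0.4 packets/s on an error-free channel.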
To account for the impact of the MAC protocol, we summarize here its basic functionalities. The X-MAC LPL protocol specifies that each idling node periodically wakes up to perform a clear channel assessment (CCA) operation. The duty-cycle period lasts seconds and is composed of a sleeping phase of seconds and a wake-up phase lasting seconds, during which CCA is performed. A node wanting to send a unicast packet transmits a burst of short request to send (RTS) preambles, for long enough so that the intended receiver will detect at least one of these RTSs in its next wake-up period. Since the nodes are in general not synchronized, to be sure of hitting the intended receiver, a node will be sending preambles for the duration of an entire duty cycle . Due to the lack of synchronization, the receiver can detect an RTS at any time within this period. Whenever a node detects an incoming RTS destined to itself, it sends a clear to send (CTS) message back to the sender and waits for the transmission of the actual data packet. After the complete reception of the data, the receiver sends an acknowledgment (ACK) to the sender. This channel access mechanism is illustrated in Fig. 3 (where we omit the transmission of the ACK for simplicity). In this figure, the sixth RTS from the sender is detected by the intended receiver, which immediately replies with a CTS. The node at the top of the diagram also detects the RTS, but it does not take any action as it is not the intended destination.
For this channel access scheme, the average time needed to carry out a successful transmission is , where the term follows from the fact that the time needed for the receiver to detect an incoming RTS is assumed to be uniformly distributed in . The terms , , and correspond to the durations associated with the transmission of a data packet, a CTS and an ACK, respectively. The reception time is . Note that the RTS time is not considered in nor in , because it is accounted for by the CCA state. Also, to simplify the notation in the following analysis we include and in .
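A minimal sketch of these timing figures follows, assuming the reading suggested by the text: the sender waits on average half a duty-cycle period before the receiver detects an RTS (uniform detection time), after which the CTS, data and ACK are exchanged. The reception time is taken here as the CTS, data and ACK durations only, which is an assumption of this example, as are all parameter names.

```python
def mean_tx_time(t_dc, t_data, t_cts, t_ack):
    """Mean time of one successful transmission [s].

    t_dc / 2 is the expected wait until the receiver detects an RTS,
    assuming the detection instant is uniform over one duty-cycle period.
    """
    return t_dc / 2.0 + t_cts + t_data + t_ack


def rx_time(t_data, t_cts, t_ack):
    """Reception time [s]; assumed to be the CTS/data/ACK exchange only."""
    return t_cts + t_data + t_ack
```

For a 0.5 s duty cycle, a 4 ms data packet and 1 ms CTS and ACK frames, the mean transmission time is dominated by the 0.25 s preamble wait, which is why the duty-cycle period is a key parameter on the transmitter side.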
Now, if is the transmission rate (packets/second) for an error-free channel, in the presence of packet collisions and transmission errors the actual transmission rate becomes . For the sake of clarity, the complete characterization of the channel access problem in this case is provided in Appendices A and B.
Thus, the average transmission time can be expressed as:
(5) 
where the factor represents the average number of retransmissions. Note that (5) implies a stop-and-wait retransmission policy, where an infinite number of retransmissions is allowed for each data packet. Instead, we assume that the impact of channel errors and collisions on spurious receptions and interfering packets is negligible as, in these cases, the intended receiver does not stay awake to receive the data packet and, thus, its energy expenditure is already accounted for by the CCA state.
We now model the energy expenditure associated with the maintenance of the routing topology. The selected routing algorithm, RPL, consists of a proactive technique that periodically disseminates network information through DODAG (destination-oriented directed acyclic graph) information objects (DIO) and, subsequently, builds a routing tree by sending destination advertisement objects (DAO) toward the sink. RPL timing is governed by the trickle timer, which exponentially increases up to a maximum value for a static topology. In this paper, we analyze the steady-state phase of RPL, considering static networks. This implies the following operations: for every trickle timer epoch, which lasts seconds, the bottleneck node must send its own DIO message, its own DAO and has to forward DAOs for its children. This leads to a transmission frequency for RPL messages of:
(6) 
In addition, the bottleneck node will receive DAOs from its children and DIOs from its interfering nodes (note that DIOs are not treated as interference, as they are broadcast). Thus the reception frequency for RPL messages is:
(7) 
where and are the contributions of RPL to the transmission and reception frequencies, respectively.
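The RPL message counts described above can be turned into rates with a short sketch. It assumes one plausible reading of the text: per trickle epoch, the bottleneck sends one DIO, one own DAO and one forwarded DAO per child, and receives one DAO per child plus one DIO per interfering node. This reading and the parameter names are assumptions of the example.

```python
def rpl_rates(n_children, n_interferers, t_rpl):
    """Steady-state RPL TX and RX message frequencies [messages/s].

    n_children: nodes in the subtree rooted at the bottleneck.
    n_interferers: nodes in range whose next hop is not the bottleneck.
    t_rpl: duration of one trickle-timer epoch [s].
    """
    if t_rpl <= 0:
        raise ValueError("t_rpl must be positive")
    # Own DIO + own DAO + one forwarded DAO per child (assumed reading).
    f_tx = (n_children + 2) / t_rpl
    # One DAO per child + one broadcast DIO per interfering node.
    f_rx = (n_children + n_interferers) / t_rpl
    return f_tx, f_rx
```

With 4 children, 3 interfering nodes and a 60 s trickle epoch, this gives 0.1 transmitted and about 0.117 received RPL messages per second, typically a small addition to the data-gathering traffic.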
Finally, our model accounts for the energy expenditure due to the reception of messages that are detected during CCA but are not destined to the receiver. In this case, the receiver behaves as during a reception, but, as soon as it decodes the packet header, it recognizes that the message is not intended for itself. At this point, the node drops the message and goes back to sleep. Interfering messages can be either due to data gathering or to networking traffic and occur at a rate proportional to . Thus, we have:
(8) 
Also, we refer to as the time needed to decode the packet header and therefore detect whether a node is the intended destination for that message.
From the above reasoning, we are able to express the average current consumption for each state:
(9)  
(10)  
(11)  
(12)  
(13)  
(14) 
where is the average time spent in operations that do not involve the radio and is the fraction of time that the node spends in the IDLE state, which is computed as one minus the fraction of time spent in the remaining states:
(15) 
The total energy consumption is finally given by:
(16) 
4 Node Consumption Analysis
In this section, we present the solution of problem P1: identifying the network’s optimal operating point given a target consumption . The symbols used in this section are listed in Table 4.
Problem P1 can be formally written as:
Problem P1:
subject to:  (17)  
P1 (4) amounts to finding the optimal pair that maximizes the node throughput, , subject to the maximum allowed consumption and to time and frequency constraints. The problem can be numerically solved through two nested dichotomic searches (as shown in [Bui and Rossi (2013)]): the inner search looks for the optimal given , while the outer search looks for the optimal . (Note that in this paper we consider as a constant that depends on the considered sensor architecture, whereas the nodes can adapt the duration of their off phase, , of the duty cycle. Hence, optimizing over , or is equivalent.) Here, instead, our objective is to obtain the solution in closed form. This will make it possible to solve problem P2 in a reasonable amount of time, while also facilitating the implementation of optimal energy management policies on constrained sensor devices.
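As a rough illustration of the numerical procedure mentioned above, the following Python sketch runs two nested dichotomic searches on a toy consumption model. The function `toy_current`, its coefficients, the search intervals and all variable names are assumptions made for this example only; they are not the consumption model derived in this paper. The sketch further assumes that the current is unimodal in the duty-cycle period and that the minimized current decreases as the inter-packet time grows.

```python
def toy_current(t_dc, t_u, i_idle=0.01, a=0.5, b=2.0, c=1.0):
    """Hypothetical average current for duty-cycle period t_dc and inter-packet time t_u."""
    return i_idle + a / t_dc + b * t_dc / t_u + c / t_u


def inner_search(t_u, lo=1e-3, hi=100.0, tol=1e-9):
    """Inner dichotomic search: duty-cycle period minimizing the current for a fixed t_u."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if toy_current(m1, t_u) < toy_current(m2, t_u):
            hi = m2  # the minimizer cannot lie to the right of m2
        else:
            lo = m1  # the minimizer cannot lie to the left of m1
    return 0.5 * (lo + hi)


def min_current(t_u):
    """Smallest current achievable for a given inter-packet time."""
    return toy_current(inner_search(t_u), t_u)


def outer_search(i_target, lo=1e-2, hi=1e4, iterations=100):
    """Outer dichotomic search: smallest t_u (highest rate) meeting the current budget."""
    if min_current(hi) > i_target:
        raise ValueError("target current is infeasible on the search interval")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if min_current(mid) <= i_target:
            hi = mid  # feasible: try a shorter inter-packet time
        else:
            lo = mid  # infeasible: the node must transmit less often
    return hi, inner_search(hi)


if __name__ == "__main__":
    t_u, t_dc = outer_search(0.5)
    print(t_u, t_dc, toy_current(t_dc, t_u))
```

Each evaluation of the outer search triggers a full inner search, which is the cost that a closed-form solution avoids.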
Despite the simple problem formulation, (5) introduces a polynomial of th degree in the independent variable , which makes it difficult to express the solution through tractable and still meaningful equations. Thus, we solve the problem for a collision-free channel and we subsequently adapt the results to take collisions into account through a heuristic.
In fact, removing collisions allows for a simpler expression for , i.e., , which removes the th degree polynomial on . In order to illustrate that this approach is reasonable within the solution space, in Fig. 4, we show some preliminary results.
Fig. 4 shows contour lines in the plane for different output current levels ( mA): dotted lines represent the numerical solution for the complete problem, while dash-dotted lines represent the solution for a collision-free channel for the same levels. The locations of the optimal operating points in these two cases are also plotted for comparison (white squares and white circles for the complete problem and that without collisions, respectively). For a given the maximum throughput is achieved for a unique value of the duty cycle . Hence, it is not possible to find a feasible solution with higher throughput nor one with the same throughput and a different duty cycle.
From Fig. 4 we deduce the following facts:

the impact of collisions increases with which implies that the difference between the optimal working points with and without collisions is an increasing function of the energy consumption .

the maximum allowed increases with , which is expected and means that the transmission rate for the endogenous data is an increasing function of the energy consumption .

the duty cycle has a critical point beyond which the throughput suddenly drops; this implies that has a critical point too.

the search for the optimal operating point involves the joint optimization of the transmission rate () and the duty-cycle period () as these two quantities are intertwined.
For the sake of readability, the full derivation of the closed-form solution in the collision-free case is given in Appendix C. In what follows, we confine ourselves to the discussion of the adopted approach and of the main results. First, has been rewritten as a function of and , which makes it possible to find the mathematical expression of (as a function of , which is still a free parameter). This is achieved by taking the partial derivative of with respect to , equating it to zero and solving for . In doing so, we observe that , , and , as they do not depend on . This leads to:
(18)  
where coefficients , , and are given in Table C.
To illustrate the behavior of (18), in Fig. 5 we show by varying and keeping fixed in the set seconds (see dashed lines). The locus of the optimal solutions , obtained through (18), is plotted as a solid line. The closed form for the optimal crosses (without collisions) where the latter is minimized, as required.
At this point, it is possible to replace with in (see (16)) expressing the output current as , which becomes a function of the single independent variable . Since increases with , the maximum achievable for a given target current is obtained at the equality point .
Also, cannot be increased indefinitely because, beyond a given threshold, the problem becomes bound by the frequency constraint . In this region, the system drains the maximum current , which cannot be further increased as the channel is saturated. is the smallest feasible inter-packet transmission time for the considered system and can be analytically derived by observing that the optimality condition, see (18), and the frequency constraint must concurrently hold for . Thus, from we obtain the relationship between and , i.e., , whereas replacing with in (18) leads to . Using in place of in the latter equation returns a third-order polynomial in the single variable , which allows the calculation of and, in turn, of . The coefficients are given in Table C, whereas the involved mathematical derivations are detailed in Appendix C. Computing for returns the maximum current that can be consumed by the bottleneck node using an optimal configuration, i.e., . The maximum control is therefore given by .
Conversely, there is a minimum current that has to be drained in order to keep the system running and operational. is found as , which amounts to solely considering the energy consumption due to the periodic transmission of control traffic (taken into account through ). The minimum energy consumption also corresponds to the smallest control action .
Finally, the optimal working point, , is found as the solution of with , which can be expressed as:
(19) 
where is the positive solution of the quadratic equation and is the largest solution of the cubic equation . The reader is referred again to Appendix C for mathematical insights and the definition of the coefficients (see Table C).
Fig. 6 shows the optimal operating point by varying the control as the independent parameter. The dashed line corresponds to the result of (4) for a collision-free channel, the white-filled circles represent the numerical results of the complete problem with collisions and the solid line shows the results achieved from the closed-form solution, which has been adapted through a heuristic to take collisions into account. In addition, the crosses and the dash-dotted line illustrate the solution of obtained for the complete problem and using the heuristically modified closed form, respectively.
The adopted heuristic is a rigid translation of the closed form for a collision-free channel so that the latter equals the numerical solution with collisions for the maximum allowed control . The error introduced through this approach is very small for high values of and increases for decreasing . However, this error is negligible throughout most of the solution space, as it grows more slowly than does, and the heuristic always provides a feasible solution for the system.
Finally, in Fig. 7 we plot the reward function:
(20) 
corresponds to the maximum achievable throughput for the given multi-hop network. In Fig. 7, we show results for dense, medium and sparse networks (represented with squares, circles and triangles, respectively) of and hops (solid and dashed lines, respectively). The parameters of these networks are given in Table 4, where is the total number of nodes and is the network density. Increasing the number of hops has a much larger impact on the reward function than increasing the node density. All the graphs of this paper have been obtained considering a sensor platform characterized by the energy consumption and timing parameters of Table 4. The optimal throughput of (20) will be used in Section 6 as the reward function for problem P2, which considers a stochastic energy source.
5 Optimization Framework
The objective of the following sections is to solve problem P2, which translates into finding optimal and online energy consumption strategies for the sensor nodes, given the energy consumption model (see problem P1), their current energy reserve and a statistical characterization of future energy arrivals (i.e., of the energy source S). This requires linking the energy consumed to that harvested and to the instantaneous energy buffer state. In the analysis that follows, we assume that the amount of charge in the energy buffer is a known quantity or, equivalently, that it can be reliably estimated at the sensor nodes. Based on this, we formulate our optimal control as a Markov Decision Process (MDP). We observe that heuristic approaches, which base their energy consumption policies on energy estimates, are also possible but are not considered here and are left as future work. Nevertheless, in Section 7.2 the performance of the obtained policies is compared against that of heuristic solutions from the literature.
Here, we present the stochastic model that will be used to describe the source S, as per our sensor diagram of Fig. 1. This will be used in Section 6 to solve problem P2. The resulting energy management policies are validated in Section 7.
In Table 5 we define the symbols used in this section.
Energy source: the energy source dynamics are captured by a continuous-time Markov chain with states . We refer to , with , as the time instant where the source transitions between states and to as the time elapsed between two subsequent transitions. Also, the system between and is said to be in stage , and its duration is described by a r.v. , depending on the source state in the stage. has an associated probability density function (pdf) . Moreover, during stage , the source provides a constant current that is fed into the battery and is assumed to remain constant until the next transition, occurring at time . This input current is described by the r.v. with pdf . We assume that and have bounded support. with are the transition probabilities of the associated embedded Markov chain, which are invariant with respect to .
Discrete-Time Formulation: we describe the energy source model through an equivalent discrete-time Markov process. This will make it possible to conveniently characterize the optimal policies through a Discrete-Time Constrained Markov Decision Process (DTCMDP), in Section 6. For improved clarity of exposition and conciseness, in the remainder of this paper we omit the time index from the symbols, unless explicitly stated otherwise.
To describe the energy source through a discrete-time model, for any given , we map the random nature of the stage duration into the corresponding variation of charge during the stage. To do this, we define the two r.v.s and that respectively describe the amount of charge that enters the system during the stage (stored into the energy buffer) and the amount of charge consumed by the sensor node. is the r.v. describing the overall variation of charge during the stage. We recall that is our control variable, corresponding to the current drained by the sensor node during the stage. for a given policy is a known quantity and it will be considered as a constant in the following derivations. We have that:
(21) 
Hence, the r.v. is obtained as the product of the two r.v.s and . From the theory in [Papoulis and Pillai (2002)], the pdf of when the source is in state and the control is , , is obtained as:
(22) 
Henceforth, the energy source is equivalently characterized by a discrete-time Markov chain with states and transition probabilities , . Moreover, when the current state is and the control is , the corresponding variation of charge during a stage is accounted for by the r.v. with pdf given by (22).
6 Markov Decision Process Analysis
This section presents our analysis of the outer optimization problem P2, which is framed as a Markov Decision Process. For improved clarity, this analysis is split into four subsections: in Section 6.1, we define the basic ingredients of the MDP, in Section 6.2 we formulate the optimal policy, discussing its properties and detailing an algorithm for its computation (see Section 6.3). Finally, in Section 6.4 we report our considerations on computational complexity and on the usage model for the computed policies. The list of symbols used in the MDP analysis is given in Table 6.
6.1 Definitions
We consider the sensor system of Fig. 1 and we assume without loss of generality that the system evolves in discrete time. Hereafter, at time , the system is said to be in stage and the terms “time” and “stage” will be used interchangeably in the following analysis. The source S feeds energy into the energy buffer B and is modeled according to the discrete-time Markov chain presented in the previous section. At any time , the source S is in a certain state , whereas the energy buffer hosts an amount of charge , where is the buffer capacity. At the generic time , we define the system state as , where . The system state at the following time , defined as , depends on the dynamics of , on the control for the current stage and on the total variation of charge during stage . For the battery at the beginning of the next stage , , we have:
(23) 
where is expressed in (21) and depends on the control for the current stage , whereas is defined as , with .
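The buffer recursion can be sketched in a few lines of Python. This is only an illustration under two stated assumptions: that the charge variation of (21) equals the net current (input current minus control) multiplied by the stage duration, and that the operator in (23) clips the result to the interval between zero and the buffer capacity. The function and parameter names are hypothetical.

```python
def next_buffer(b, i_in, u, tau, b_max):
    """Charge at the start of the next stage (assumed form of the update).

    b     : charge at the start of the current stage
    i_in  : input current from the source during the stage
    u     : control, i.e., current drained by the node during the stage
    tau   : stage duration
    b_max : buffer capacity
    """
    delta_q = (i_in - u) * tau  # assumed charge variation during the stage
    return min(max(b + delta_q, 0.0), b_max)  # clip to [0, b_max]


if __name__ == "__main__":
    print(next_buffer(5.0, 2.0, 1.0, 3.0, 10.0))  # buffer charges
    print(next_buffer(5.0, 0.0, 3.0, 4.0, 10.0))  # buffer empties
    print(next_buffer(9.0, 3.0, 1.0, 2.0, 10.0))  # buffer saturates
```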
We model the sensor system through a discrete-time MDP. At every stage a decision has to be made based on the current state .
In addition to the system state and its dynamics, a Markov decision process is characterized by a control set , where and . contains all the feasible current consumption levels for the sensor (see Section 4).
In this paper, we consider mixed and stationary Markov (i.e., history independent) policies. The term mixed means that there exists a mapping that, for any possible state , returns a vector of pairs , of size , with .
This vector represents the decision to be made when the system state is and indicates that control must be implemented with the associated probability .
A mixed policy is a collection of such mappings for all stages.
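To make the notion of a mixed decision concrete, the following Python sketch stores, for each state, a list of (control, probability) pairs and draws the control to apply. The dictionary layout, the state labels and the numerical values are hypothetical and chosen only for illustration.

```python
import random


def check_decision(decision, tol=1e-9):
    """Verify that the probabilities of a mixed decision are valid and sum to one."""
    probs = [p for _, p in decision]
    return all(p >= 0.0 for p in probs) and abs(sum(probs) - 1.0) <= tol


def sample_control(policy, state, rng):
    """Draw one control for `state` according to the mixed decision stored in `policy`."""
    decision = policy[state]
    controls = [u for u, _ in decision]
    probs = [p for _, p in decision]
    return rng.choices(controls, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical stationary policy: state = (source state, buffer level).
    policy = {("day", 3): [(0.5, 0.25), (1.0, 0.75)]}
    rng = random.Random(0)
    assert check_decision(policy[("day", 3)])
    print(sample_control(policy, ("day", 3), rng))
```

A pure policy is the special case in which each list contains a single control with probability one.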
Our problem belongs to the class of MDPs with unichain structure, bounded costs and rewards. For these, it is sufficient to consider the set of admissible Markov policies as the optimal policy can always be found within this class, see [Derman and Strauch (1966)], [Altman (1999)] or Theorem 13.2 of [Feinberg and Shwartz (1995)].
The boundedness of rewards and costs follows from the finite support of , and from the fact that the instantaneous reward function is also bounded. Thus, for the problem addressed in this paper it is sufficient to restrict our attention to Markov stationary policies, which means that only depends on the system state at time (past stages are not considered) and that the mapping functions do not depend on , i.e., .
Reward: the reward function takes into account the throughput of the system. Specifically, from the derivations in Section 4, we know that for a given control the optimal instantaneous throughput of a sensor node is given by , as defined in (20). Now, let , with , be the system state at the beginning of a generic decision stage . Moreover, let and respectively represent the realization of the r.v. , describing the duration of the stage, and the realization of the r.v. , quantifying the input current from the source. Taking (21) into account and recalling that the input current and the control are both constant during the stage, we have that the amount of charge varies linearly within a stage until it either hits the buffer capacity or drops to , depending on the sign of . Hence, during the stage, the total variation of charge is (see (21)) and the amount of time the level of charge in the energy buffer is greater than zero is given by the following function:
(24) 
Furthermore, as long as the buffer level is above zero, the throughput remains constant and equal to , whereas it drops to zero if the energy buffer becomes empty. Given this, the single-stage expected reward, when the system state at the beginning of the stage is and the control is , is computed as:
(25)  
where represents the average amount of time the energy buffer contains a positive amount of charge during the stage.
In the previous equation remains constant during a stage when is given. The actual average throughput is then modulated through the average amount of time the energy buffer state is greater than zero in the stage, i.e., .
Cost: for the cost, we account for a penalty whenever the energy buffer drops below a given threshold . This threshold is a design parameter that may be related to the minimum energy reserve that is required to keep the system operational and responsive. Also, is in general implementation dependent: besides depending on application requirements, it depends on hardware constraints. In fact, too low a charge may not be sufficient to guarantee the correct operation of the sensor nodes.
The cost is obtained as the average time spent with the energy buffer level below . The amount of time the energy buffer level is below is given by the following function:
(26) 
Hence, the single-stage expected cost when the system state at the beginning of the stage is and the control is , is obtained as:
(27)  
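The timing quantities behind the reward and the cost can be illustrated with a short Python sketch. It assumes, as described in the text, that the charge varies linearly within a stage at a rate given by the input current minus the control, and it assumes that the threshold does not exceed the buffer capacity, so that saturation does not affect the two durations. The function names are hypothetical, and taking the reward as the throughput multiplied by the mean time with positive charge is an assumption about the form of (25), not a restatement of it.

```python
def stage_times(b0, i_in, u, tau, b_th):
    """Return (time with charge > 0, time with charge < b_th) for one stage realization."""
    net = i_in - u  # net current, constant during the stage
    if net > 0:
        t_pos = tau
        t_below = min(tau, max(0.0, (b_th - b0) / net))
    elif net == 0:
        t_pos = tau if b0 > 0 else 0.0
        t_below = tau if b0 < b_th else 0.0
    else:
        drain = -net
        t_pos = min(tau, b0 / drain)              # time before the buffer empties
        t_cross = max(0.0, (b0 - b_th) / drain)   # instant the threshold is crossed
        t_below = max(0.0, tau - t_cross)
    return t_pos, t_below


def sample_reward_and_cost(b0, u, throughput, b_th, samples):
    """Average reward and cost over (input current, duration) samples of a stage."""
    times = [stage_times(b0, i_in, u, tau, b_th) for i_in, tau in samples]
    mean_pos = sum(t for t, _ in times) / len(times)
    mean_below = sum(t for _, t in times) / len(times)
    return throughput * mean_pos, mean_below


if __name__ == "__main__":
    print(stage_times(10.0, 1.0, 3.0, 10.0, 4.0))
    print(sample_reward_and_cost(10.0, 3.0, 2.0, 4.0, [(1.0, 10.0), (3.0, 10.0)]))
```

Averaging over samples drawn from the source statistics approximates the expectations; the integrals over the pdfs of the stage duration and input current would replace the sample means in an exact computation.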
6.2 Optimal Policy - Formulation
We now formulate our optimal control problem as a DTCMDP. The total expected reward that is earned over an infinite horizon by a feasible policy is expressed as:
(28) 
where is the discount factor, and are respectively the system state and the control at stage and is the initial state. If we disregard the cost, having the sole objective of maximizing the throughput (reward), the optimal policy is the one that solves the following Bellman optimality equation:
(29) 
where, if the current state is , represents the optimal expected reward from the current stage onwards and is obtained by maximizing, over the admissible controls, the sum of the single-stage expected reward (the immediate reward, accrued in the present stage) and the expected optimal reward from the next stage onwards (where future rewards are weighted accounting for the system dynamics, i.e., and ). (6.2) can be solved through Value Iteration (VI), as detailed in Section 1.3.1 of [Bertsekas (2012)]. In short, VI amounts to using (6.2) as an update rule, which is iterated for all states starting from an initial estimate of . (Setting , in the first iteration of the algorithm also assures convergence.) It can be shown that the optimality equation is a contraction mapping. This property assures that the VI iterations converge, at which point the optimal estimates computed in the previous step equal the new ones, which are obtained using the right-hand side (RHS) of (6.2). Hence, the optimal policy, for any given , is given by the control that maximizes the RHS of (6.2). Note that the optimal control corresponding to (6.2) is a pure policy whereby a single control is associated with each state , i.e., there exists a mapping function such that, for each state and is unique for each .
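The VI procedure described above can be sketched in Python on a deliberately small toy problem. Everything in this example is an assumption introduced for illustration: a two-state source, an integer battery with five levels, three control levels, deterministic harvesting within a stage, a reward equal to the charge actually consumed and a discount factor of 0.9. It is not the sensor model of this paper, only an instance of the update rule and of the extraction of a pure policy.

```python
ALPHA = 0.9                                   # discount factor (assumed value)
B_MAX = 4                                     # battery capacity in charge units
SOURCE_P = {0: {0: 0.7, 1: 0.3},              # source transition probabilities
            1: {0: 0.4, 1: 0.6}}
HARVEST = {0: 0, 1: 2}                        # charge harvested per stage in each source state
CONTROLS = (0, 1, 2)                          # admissible consumption levels
STATES = [(s, b) for s in (0, 1) for b in range(B_MAX + 1)]


def step(state, u):
    """Return the immediate reward and the next battery level for control u."""
    s, b = state
    available = b + HARVEST[s]
    reward = min(u, available)                # cannot consume more than is available
    b_next = min(max(available - u, 0), B_MAX)
    return reward, b_next


def q_value(state, u, J):
    """Right-hand side of the Bellman equation for a fixed control u."""
    reward, b_next = step(state, u)
    future = sum(p * J[(s_next, b_next)] for s_next, p in SOURCE_P[state[0]].items())
    return reward + ALPHA * future


def value_iteration(tol=1e-10, max_iter=10000):
    """Iterate the Bellman update from a zero initial estimate until it stabilizes."""
    J = {x: 0.0 for x in STATES}
    for _ in range(max_iter):
        J_new = {x: max(q_value(x, u, J) for u in CONTROLS) for x in STATES}
        gap = max(abs(J_new[x] - J[x]) for x in STATES)
        J = J_new
        if gap < tol:
            break
    # Pure policy: one maximizing control per state.
    policy = {x: max(CONTROLS, key=lambda u: q_value(x, u, J)) for x in STATES}
    return J, policy


if __name__ == "__main__":
    J, policy = value_iteration()
    for x in STATES:
        print(x, round(J[x], 4), policy[x])
```

Because the update is a contraction with modulus equal to the discount factor, the gap between successive estimates shrinks geometrically, which is why a simple tolerance test suffices as a stopping rule.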
Analogously, solely taking the cost into account, the total expected and discounted cost of a given policy for an initial state is obtained as the solution of the following Bellman equation:
(30) 
The DTCMDP problem for our controlled sensor node is thus written as:
Problem P2: