# Delay-Limited Cooperative Communication with Reliability Constraints in Wireless Networks

## Abstract

We investigate optimal resource allocation for delay-limited cooperative communication in time varying wireless networks. Motivated by real-time applications that have stringent delay constraints, we develop a dynamic cooperation strategy that makes optimal use of network resources to achieve a target outage probability (reliability) for each user subject to average power constraints. Using the technique of Lyapunov optimization, we first present a general framework to solve this problem and then derive quasi-closed form solutions for several cooperative protocols proposed in the literature. Unlike earlier works, our scheme does not require prior knowledge of the statistical description of the packet arrival, channel state and node mobility processes and can be implemented in an online fashion.

## 1 Introduction

There is growing interest in the idea of utilizing cooperative communication [1] to improve the performance of wireless networks with time varying channels. The motivation comes from the work on MIMO systems [25] which shows that employing multiple antennas on a wireless node can offer substantial benefits. However, this may be infeasible in small-sized devices due to space limitations. Cooperative communication has been proposed as a means to achieve the benefits of traditional MIMO systems using *distributed single antenna* nodes. Much recent work in this area promises significant gains in several metrics of interest (such as diversity [3][4], capacity [5], energy efficiency [10], etc.) over conventional methods. We refer the interested reader to a recent comprehensive survey [1] and its references.

The main idea behind cooperative communication can be understood by considering a simple 2-hop network consisting of a source, its destination, and a set of relay nodes, as shown in Figure 1. Suppose the source has a packet to send to the destination in a given timeslot. The channel gains for all links in this network are shown in the figure. In direct communication, the source uses the full slot to transmit its packet to the destination over the direct link, as shown in Figure 1(a). In conventional multi-hop relaying, the source uses the first half of the slot to transmit its packet to a particular relay node over the source-relay link, as shown in Figure 1(b). If the relay can successfully decode the packet, it re-encodes and transmits it to the destination in the second half of the slot over the relay-destination link. In both scenarios, to ensure reliable communication, the source and/or the relay must transmit at high power levels when the channel quality of any of the links involved is poor. However, note that due to the broadcast nature of wireless transmissions, other relay nodes may receive the signal from the transmission by the source and can cooperatively relay it to the destination. The destination now receives multiple copies/signals and can use all of them jointly to decode the packet. Since these signals have been transmitted over independent paths, the probability that all of them have poor quality is significantly smaller. Cooperative communication protocols take advantage of this *spatial diversity gain* by making use of multiple relays for cooperative transmissions to increase reliability and/or reduce energy costs. This is different from traditional multi-hop relaying in which only one node is responsible for forwarding at any time and in which the destination does not use multiple signals to decode a packet.

Because of the half-duplex nature of wireless devices, a relay node cannot send and receive on the same channel simultaneously. Therefore, such cooperative communication protocols typically operate over a two phase slot structure as shown in Figures 1(c) and 1(d). In the first phase, the source transmits its packet to the set of relay nodes. In the second phase, a subset of these relays transmit their signals to the destination. Note that the destination may receive the source signal from the first phase as well. At the end of the second phase, the destination appropriately combines all of these received signals to decode the packet. The exact slot structure as well as the signals transmitted by the relays depend on the cooperative protocol being used.^{1}

In this work, we consider a mobile ad-hoc network with *delay-limited* traffic and cooperative communication. Many real-time applications (e.g., voice) have stringent delay constraints and fixed rate requirements. In slow fading environments (where decoding delay is of the order of the channel coherence time), it may not be possible to meet these delay constraints for every packet. However, these applications can often tolerate a certain fraction of lost packets or outages. A variety of techniques are used to combat fading and meet this target outage probability (including exploiting diversity, channel coding, ARQ, power control, etc.). Cooperative communication is a particularly attractive technique to improve reliability in such delay-limited scenarios since it can offer significant spatial diversity gains in addition to these techniques.

Much prior work on cooperative communication considers physical layer resource allocation for a static network, particularly in the case of a single source. Objectives such as minimizing sum power, minimizing outage probability, meeting a target SNR constraint, etc., are treated in this context [9]. We draw on this work in the development of *dynamic* resource allocation in a stochastic network with fading channels, node mobility, and random packet arrivals, where *opportunistic cooperation decisions* are required. Dynamic cooperation was also considered in the prior work [18], which investigates throughput optimality and queue stability in a multi-user network with static channels and randomly arriving traffic using the framework of Lyapunov drift. Our formulation is different and does not involve issues of queue stability. Rather, we consider a delay-limited scenario where each packet must either be transmitted in one slot, or dropped. This is similar to the concept of *delay-limited capacity* [19]. Also related to such scenarios is the notion of *minimum outage probability* [20]. These quantities are also investigated in the recent work [14], which considers a static network with Rayleigh fading and shows that opportunistic cooperation significantly improves the delay-limited capacity.

In this work, we use techniques of both Lyapunov drift and Lyapunov optimization [24] to develop a control algorithm that takes dynamic decisions for each new slot. Different from most work that applies this theory, our solution involves a 2-stage stochastic shortest path problem due to the cooperative relaying structure. This problem is non-convex and combinatorial in nature and does not admit closed form solutions in general. However, under several important and well known classes of physical layer cooperation models, we develop techniques for reducing the problem exactly to a set of convex programs. The convex programs themselves are shown to have quasi-closed form solutions and can be computed in real time for each slot, often involving simple water-filling strategies that also arise in related static optimization problems.

## 2 Basic Network Model

We consider a mobile ad-hoc network with delay-limited communication over time varying fading channels. The network contains a set of nodes, all potentially mobile. All nodes are assumed to be within range of each other, and any node pair can communicate either through direct transmission or through a 2-phase cooperative transmission that makes use of other nodes as relays. The system operates in slotted time, and each pair of nodes has an associated channel coefficient in every slot. We assume a block fading model [25] for the channel coefficients, so that their value remains fixed during a slot and changes from one slot to the next according to the distribution of the underlying fading and mobility processes.

For simplicity, we assume that the network contains a single source node and its destination node and that all other nodes act simply as cooperative relays. This is similar to the single-source assumption treated in [12] for static networks. We derive a dynamic cooperation strategy for this single source problem in Section 4 that optimizes a weighted sum of reliability and power expenditure subject to individual reliability and average power constraints at the source and at all relays. This highlights the decisions involved from the perspective of a source node, and these decisions and the resulting solution structure are similar to those of the multi-source scenario operating under an orthogonal medium access scheme (such as TDMA or FDMA) studied later in Section 7. In the following, we denote the set of relay nodes by and the set by . All nodes have both long-term average and instantaneous peak power constraints.

We consider two models for the availability of the channel state information (CSI). The first is the *known channels, unknown statistics* model. Under this model, we assume that the channel gains between the source node and its relay set and destination as well as the channel gains between the relays and the destination are known every slot. These could be obtained by sending pilot signals and via feedback. This model has also been considered in prior works [12] on power allocation in static networks where, in addition to the current channel gains, a knowledge of the distribution governing the fading process is assumed. In our work, under this *known channels, unknown statistics* model, we do not assume any knowledge of the distributions governing the evolution of the channel states, mobility processes, or traffic. Thus, our algorithm and its optimality properties hold for a very general class of channel and mobility models that satisfy certain ergodicity requirements (to be made precise later). We note that the channel gain could represent just the amplitude of the channel coefficient if an orthogonal cooperative scheme is being used. However, in case of cooperative schemes such as beamforming, this could represent the complete description of the fading coefficient that includes the phase information.

The second model we consider is the *unknown channels, known statistics* model. In this case, we assume that the current set of potential relay nodes is known on each slot, but the exact channel realizations between the source and these relays, and the relays and the destination, are unknown. Rather, we assume only that the *statistics* of the fading coefficients are known between the source and current relays, and the current relays and destination. However, we still do not require knowledge of the distributions governing the arriving traffic or the mobility pattern (which affects the set of relays we will see in future slots). This is in contrast to prior works that have considered resource allocation in the presence of partial CSI only for static networks.

For both models, we use a single channel state variable to represent the collection of all channel state information known on a given slot. For the known channels, unknown statistics model, this channel state represents the collection of channel coefficients between the source and relays and relays and destination. For the unknown channels, known statistics model, it represents the set of all nodes that are available on the slot for relaying and the distribution of the fading coefficients. We assume that the channel state lies in a space of finite but arbitrarily large size and evolves according to an ergodic process with a well defined steady state distribution. This variation in channel state information affects the reliability and power expenditure associated with the direct and cooperative transmission modes that are discussed in Section 2.2.

### 2.1 Example of Channel State Information Models

As an example of these models, suppose the nodes move in a cell-partitioned network according to a Markovian random walk (see also Figure 2 in Section 8 on Simulations). Each slot, a node may decide to stay in its current cell or move to an adjacent cell according to the probability distribution governing the random walk. Suppose that each slot, the set of potential relays consists only of nodes in either the same cell as the source or an adjacent cell. Suppose channel gains between nodes in the same cell are distributed according to a Rayleigh fading model with a particular mean and variance, while gains for nodes in adjacent cells are Rayleigh with a different mean and variance. Under the known channels, unknown statistics model, the channel state information is the set of current gains, and the Rayleigh distribution is not needed. Under the unknown channels, known statistics model, the channel state information is the set of nodes currently in the same and adjacent cells of the source, and we assume we know that the fading distribution is Rayleigh, and we know the corresponding means and variances. However, neither model requires knowledge of the mobility model or the traffic rates.

### 2.2 Control Options

Suppose the slot size is normalized to integer slots. In each slot, the source receives new packets for its destination according to an i.i.d. Bernoulli process of a given rate. Each packet is assumed to have a fixed length in bits and has a *strict* delay constraint of 1 slot. Thus, a packet not served within 1 slot of its arrival is dropped. Further, packets that are not successfully received by their destinations due to channel errors are not retransmitted. The source node has a minimum time-average reliability requirement, specified as the fraction of packets that must be transmitted successfully. In any slot, if the source has a new packet for transmission, it can use one of the following transmission modes (Figure 1):

1. Transmit directly to the destination using the full slot
2. Transmit to the destination using traditional relaying over two hops
3. Transmit cooperatively with the set of relay nodes using the two phase slot structure
4. Stay idle (so that the packet gets dropped)

We consider all of these transmission modes because, depending on the current channel conditions and energy costs in a slot, it might be better to choose one over the other. For example, due to the half-duplex constraint, direct transmission using the full slot might be preferable to cooperative transmission over two phases on slots when the source-destination link quality is good. Note that this is similar to the much studied framework of opportunistic transmission scheduling in time varying channels. Further, even in the special case of static channels, the optimal strategy may involve a mixture of these modes of operation to meet the target reliability and average power constraints.

Consider the collective control action taken in a slot under some policy. It consists of the choice of the transmission mode at the source, the power allocations for the source and all relevant relays, and any additional physical layer choices such as modulation and coding.

Here, the mode choice refers to one of the four transmission modes for the source, and the power allocation is a collection of coefficients, one for each node. Note that all power allocations are zero under the idle mode. If the source chooses direct transmission, the power allocation is zero for all relay nodes, whereas if it chooses traditional relaying, the power allocation is nonzero for at most one relay. Under any feasible policy, the power allocation of every node must satisfy its instantaneous peak power constraint in every slot. Also note that under the cooperative transmission option, the power allocations for the source node and the relays correspond to the first and second phase respectively. Thus, the source is active in the first phase while the relays are active in the second phase. The set of all valid control actions consists of every mode choice paired with a valid power allocation for that mode.

The success/failure outcome of the control action is represented by an indicator random variable that depends on the current control action and channel state. Successful transmission of a packet is usually a complicated function of the transmission mode chosen, the associated power allocations and channel states, as well as physical layer details like modulation, coding/decoding scheme, etc. In this work, the particular physical layer actions are included in the decision variable. Specifically, given a control action and a channel state, the outcome (Equation 1) takes the value 1 if a packet is transmitted successfully and 0 otherwise.

Note that this outcome is a random variable, and its conditional expectation given the control action and the channel state is equal to the success probability under the given physical layer channel model. Use of this abstract indicator variable allows a unified treatment that can include a variety of physical layer models. Under the known channels, unknown statistics model (where the channel state includes the full channel realizations between source and relays and relays and destination on the current slot), the outcome can be a deterministic function of the known channel state and control action. Specific examples for this model are considered in Section 5. Under the unknown channels, known statistics model (where the channel state represents only the set of current possible relays and the fading statistics), we assume we know the success probability under each possible control action. This model is considered in Section 6. Under both models, we assume that explicit ACK/NACK information is received at the end of each slot, so that the source knows the outcome. For notational convenience, in the rest of the paper, we use a shorthand for the outcome in each slot, noting that its dependence on the control action and channel state is implicit.

### 2.3 Discussion of Basic Model

The basic model described above extends prior work on 2-phase cooperation in static networks to a mobile environment, and treats the important example scenario where a team of nodes moves in a tight cluster but with possible variation in the relative locations of nodes within the cluster. We note that our model and results are applicable to the special case of a static network as well. Another example scenario captured by our model is an OFDMA-based cellular network with multiple users that have both inter-cell and intra-cell mobility. In each slot, a set of transmitters is determined in each orthogonal channel (for example, based on a predetermined TDMA schedule, or dynamically chosen by the base station). The remaining nodes can potentially act as cooperative relays in that slot.

The basic model treats scenarios in which a source node can transmit to its destination, possibly with the help of multiple relay nodes, in 2 stages. While this is a simplifying assumption, the framework developed here can be applied to more general scenarios in which, in a single slot, cooperative relaying over more than 2 stages is performed using multi-hop cooperative techniques (e.g., [21]).

## 3 Control Objective

Consider a collection of non-negative weights: one for the reliability of the source and one for the power usage of each node. Our objective is to design a policy that solves the following *stochastic optimization problem* (Equation 2): optimize the weighted combination of time average reliability and time average power expenditure, subject to the minimum reliability requirement at the source and the average power constraint at every node.

Here, the time average reliability of the source under a policy (Equation 3) is the long-term average of the expected transmission outcomes, and the time average power usage of a node under the policy (Equation 4) is the long-term average of its expected power allocations.

The expectation is with respect to the possibly randomized control actions that the policy might take. The weights allow us to consider several different objectives. For example, setting the reliability weight to 0 and every power weight to 1 reduces (Equation 2) to the problem of minimizing the average sum power expenditure subject to minimum reliability and average power constraints. This objective can be important in the multiple source scenario when the resources of the relays must be shared across many users. Setting all of these weights to 0 reduces (Equation 2) to a feasibility problem where the objective is to provide minimum reliability guarantees subject to average power constraints.
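As a reading aid, the problem described above can be written out as follows. This is only a sketch under assumed notation, none of which is taken from the original equations: $\alpha$ and $\beta_i$ are the weights, $\Phi(\tau)$ is the success indicator, $P_i(\tau)$ is the power used by node $i$, $\rho$ is the required reliability fraction, $\lambda$ is the arrival rate, and $P_i^{avg}$ is the average power budget of node $i$.

```latex
% Hypothetical notation; a sketch of the structure of (Equation 2)-(Equation 4).
\begin{aligned}
\text{Maximize:}\quad   & \alpha\,\bar{r} \;-\; \sum_{i} \beta_i\,\bar{e}_i \\
\text{Subject to:}\quad & \bar{r} \;\ge\; \rho\,\lambda, \\
                        & \bar{e}_i \;\le\; P_i^{avg} \quad \text{for every node } i, \\
                        & \text{a valid control action in every slot,}
\end{aligned}
\qquad\text{where}\qquad
\bar{r} = \lim_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{\Phi(\tau)\},
\quad
\bar{e}_i = \lim_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{P_i(\tau)\}.
```

With $\alpha = 0$ and $\beta_i = 1$ for all $i$, this form reduces to minimizing the average sum power, matching the special case discussed above.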

Problem (Equation 2) is similar to the general stochastic utility maximization problem presented in [24]. Suppose (Equation 2) is feasible and consider the optimal value of the objective function, potentially achieved by some arbitrary policy. Using the techniques developed in [24], it can be shown that to solve (Equation 2) it is sufficient to consider only the class of stationary, randomized policies that take control decisions purely as a (possibly random) function of the channel state every slot. However, computing the optimal stationary, randomized policy explicitly can be challenging and often impractical as it requires knowledge of arrival distributions, channel probabilities and mobility patterns in advance. Further, as pointed out earlier, even in the special case of a static channel, the optimal strategy may involve a mixture of direct transmission, multi-hop, and cooperative modes of operation, and the relaying modes must select different relay sets over time to achieve the optimal time average mixture.

However, the technique of Lyapunov optimization [24] can be used to construct an alternate dynamic policy that overcomes these challenges and is provably optimal. Unlike the stationary, randomized policy, this policy does not need to be computed beforehand and can be implemented in an online fashion. In the known channels model, it does not need a-priori statistics of the traffic, channels, or mobility. In the unknown channels model, it does not need a-priori statistics of the traffic or mobility. We present this policy in the next section.

## 4 Optimal Control Algorithm

In this section, we present a dynamic control algorithm that achieves the optimal solution to the stochastic optimization problem presented earlier. This algorithm is similar in spirit to the backpressure algorithms proposed in [24] for problems of throughput and energy optimal networking in time varying wireless ad-hoc networks.

The algorithm makes use of a “reliability queue” for the source. Specifically, this queue is a value that is initialized to zero and updated at the end of every slot according to an update equation (Equation 5) that depends on the arrivals and the transmission outcome in that slot.

Here, the number of arrivals to the source in a slot is either 0 or 1, and the transmission outcome is 1 if and only if a packet that arrived was successfully delivered (recall that ACK/NACK information gives the value of the outcome at the end of every slot). Additionally, the algorithm uses a virtual power queue for each node (Equation 6).

All these queues are also initialized to 0 and updated at the end of every slot according to the equation above. We note that these queues are virtual in that they do not represent any real backlog of data packets. Rather, they facilitate the control algorithm in achieving the time average reliability and energy constraints of (Equation 2) as follows. If a policy stabilizes (Equation 5), then its service rate must be no smaller than its input rate, which yields the minimum reliability constraint.

Similarly, stabilizing (Equation 6) yields the average power constraint for each node,

where we have used definitions (Equation 3), (Equation 4). This technique of turning time-average constraints into queueing stability problems was first used in [23].
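The virtual queue mechanism can be illustrated with a minimal Python sketch. The exact update rules are an assumption here: they follow the usual max-plus form used for virtual queues in Lyapunov optimization, with the reliability queue fed by a fraction `rho` of each arrival and served by successes, and each power queue fed by the power used and served by the average power budget. The function and parameter names are hypothetical.

```python
def update_reliability_queue(z, success, arrival, rho):
    """Assumed form: the queue is served by a success (0 or 1) and fed by
    rho times the arrival indicator (0 or 1)."""
    return max(z - success, 0.0) + rho * arrival


def update_power_queue(x, power_used, p_avg):
    """Assumed form: the queue is served by the average power budget and
    fed by the power actually spent in the slot."""
    return max(x - p_avg, 0.0) + power_used
```

Under these assumed rules, a queue grows whenever its constraint is being violated on average, so keeping it stable forces the corresponding time average constraint to hold.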

To stabilize these virtual queues and optimize the objective function in (Equation 2), the algorithm operates as follows. Every slot, given the current values of all virtual queues and the current channel state, it chooses a control action that minimizes the following stochastic metric (Equation 7) for a given control parameter:

After implementing this control action and observing the outcome, the virtual queues are updated using (Equation 5), (Equation 6). Recall that there are no actual queues in the system. Our algorithm enforces a strict 1-slot delay constraint, so that the outcome is 0 if the packet is not successfully delivered after 1 slot. The virtual queues are maintained only in software and act as known weights in the optimization (Equation 7) that guide decisions towards achieving our time average power and reliability goals. The control action that optimizes (Equation 7) affects the powers allocated and the value of the outcome according to (Equation 1).

The above optimization is a 2-stage *stochastic shortest path* problem [26] where the two stages correspond to the two phases of the underlying cooperative protocol. Specifically, when the source decides to use the option of transmitting cooperatively, the cost incurred in the first stage is given by the first term of (Equation 7), which depends on the power used by the source. The cost incurred during the second stage depends on the powers used by the relays, and at the end of this stage, we get a reward that depends on the transmission outcome. The transmission outcome depends on the power allocation decisions in *both* phases, which makes this problem different from greedy strategies (e.g., [18], [23]). In order to determine the optimal strategy in a slot, the source computes the minimum cost of (Equation 7) for all transmission modes described earlier and chooses one with the least cost.
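The per-slot decision can be sketched in Python as follows. The form of the metric is an assumption modeled on the standard drift-plus-penalty expression from Lyapunov optimization (queue-weighted power cost minus a queue-weighted success reward); it is not copied from (Equation 7). All names (`slot_metric`, `choose_mode`, `x`, `beta`, `z`, `alpha`, `v`) are hypothetical.

```python
import math


def slot_metric(powers, x, beta, z, alpha, v, success_prob):
    """Assumed metric: sum_i (X_i + V*beta_i) * P_i - (Z + V*alpha) * Pr[success].

    powers, x and beta are dicts keyed by node name.
    """
    power_cost = sum((x[i] + v * beta[i]) * p for i, p in powers.items())
    return power_cost - (z + v * alpha) * success_prob


def choose_mode(mode_costs):
    """Pick the transmission mode with the smallest minimum cost.

    mode_costs maps a mode name to its minimum metric value; an infeasible
    mode can be given the cost math.inf, and the idle mode has cost 0.
    """
    return min(mode_costs, key=mode_costs.get)
```

For example, `choose_mode({"idle": 0.0, "direct": -2.0, "relay": math.inf, "cooperative": -1.0})` selects the direct mode, since its cost is the smallest.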

Note that this problem is unconstrained since the long term time average reliability and power constraints do not appear explicitly as in the original problem. These are implicitly captured by the virtual queue values. Further, its solution uses the value of the *current* channel state and does not require knowledge of the statistics that govern the evolution of the channel state process. Thus, the control strategy involves implementing the solution to the sequence of such unconstrained problems every slot and updating the queue values according to (Equation 5), (Equation 6). Assuming i.i.d. channel states, the following theorem characterizes the performance of this dynamic control algorithm. A similar statement can be made for more general Markov modulated channel states using the techniques of [24]. For simplicity, here we consider the i.i.d. case.

Theorem 1 (Algorithm Performance): Suppose all queues are initialized to 0. Then, implementing the dynamic algorithm (Equation 7) every slot stabilizes all queues, thereby satisfying the minimum reliability and time-average power constraints, and guarantees the following performance bounds (for some constant that depends on the slackness of the feasibility constraints):

Further, the time average utility achieved for any satisfies:

where

Proof: See Appendix A.

Thus, one can get closer to the optimal values by increasing the control parameter, at the cost of an increase in the virtual queue backlogs. The size of these queues affects the time required for the time average values to converge to the desired performance.

In the following sections, we investigate the basic 2-stage resource allocation problem (Equation 7) in detail and present solutions for two widely studied classes of cooperative protocols proposed in the literature: Decode-and-Forward (DF) and Amplify-and-Forward (AF) [3]. These protocols differ in the way the transmitted signal from the first phase is processed by the cooperating relays. In DF, a relay fully decodes the signal. If the packet is received correctly, it is re-encoded and transmitted in the second phase. In AF, a relay simply retransmits a scaled version of the received analog signal. We refer to [3] for further details on the working of these protocols as well as derivation of expressions for the mutual information achieved by them. Let . In the following, we assume a Gaussian channel model with a given total bandwidth and unit noise power per dimension. We use the information theoretic definition of a transmission failure (an outage event) as discussed in [19], [20]. Here, an outage occurs when the total instantaneous mutual information is smaller than the rate at which data is being transmitted.

We first consider the case when the channel gains are known at the source (Section 5). In this scenario, (Equation 7) becomes a 2-stage *deterministic shortest path problem* because the outcome due to any control decision and its power allocation can be computed beforehand. Specifically, the outcome is 1 when the resulting total mutual information exceeds the transmission rate and 0 otherwise. Further, this outcome is a function of control actions taken over two stages when cooperative transmission is used. The resulting problem is combinatorial and non-convex and does not admit closed-form solutions in general. However, for these protocols, we can reduce it to a set of simpler convex programs for which we can derive quasi-closed form solutions. Then in Section 6, we consider the case when only the statistics of the channel gains are known. In this case, the outcome is a random function of the control actions (taken over the two stages in case of cooperative transmission) and (Equation 7) becomes a 2-stage *stochastic dynamic program*. While standard dynamic programming techniques can be used to compute the optimal solution, they are typically computationally intensive. Therefore, for this case, we present a Monte Carlo simulation based technique to efficiently solve the resulting dynamic program.
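To make the role of simulation concrete, the following Python sketch estimates a success probability by sampling when only fading statistics are known. It is not the technique of Section 6; it only illustrates the basic building block for the simplest case of direct transmission. It assumes Rayleigh fading (so the squared channel gain is exponentially distributed), unit noise power, and an SNR threshold `theta` above which the packet is decoded. All names are hypothetical.

```python
import random


def estimate_success_probability(power, mean_gain, theta, samples=20000, seed=0):
    """Monte Carlo estimate of Pr[power * |h|^2 >= theta].

    Assumes |h|^2 is exponential with mean mean_gain (Rayleigh fading).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # expovariate takes the rate, i.e. the inverse of the mean.
        gain = rng.expovariate(1.0 / mean_gain)
        if power * gain >= theta:
            hits += 1
    return hits / samples
```

For this simple case the exact answer is `exp(-theta / (power * mean_gain))`, which gives a convenient sanity check on the estimate.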

## 5 2-Stage Resource Allocation Problem with Known Channels, Unknown Statistics

Recall that in order to determine the optimal control action in any slot, we must choose between the four modes of operation discussed in Section 2: direct transmission, multi-hop relay, cooperative, and idle. For each mode, consider the optimal cost of the metric (Equation 7) and the corresponding action that achieves it, assuming that this mode is chosen in the current slot. Every slot, the algorithm computes this cost and action for each mode and then implements the mode and the resulting action that minimizes cost. Note that the cost for the idle mode is trivially 0. The minimum cost for direct transmission can be computed as follows. When the source transmits directly, no relay is allocated any power. The minimum cost associated with a *successful* direct transmission can be obtained by solving the following convex problem:^{2}

where the constraint represents the fact that, for the transmission to succeed, the mutual information must exceed the transmission rate. It is easy to see that if there is a feasible solution to the above, then for minimum cost, this constraint must be met with equality. Using this, the minimum cost corresponding to the direct transmission mode is obtained from the source power that meets the constraint with equality, provided that this power satisfies the peak power constraint. Otherwise, direct transmission is infeasible and so we set its cost to infinity. In this case, direct transmission will not be considered as the idle mode cost is strictly better, but we must also compare with the costs of the multi-hop and cooperative modes.
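The direct transmission cost can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the paper's exact expression: the mutual information is taken to be `bandwidth * log2(1 + power * gain)` with unit noise power, `gain` is the squared magnitude of the source-destination coefficient, and the cost uses the same assumed queue-weighted form as a standard drift-plus-penalty metric. All names are hypothetical.

```python
import math


def direct_mode_cost(gain, rate, bandwidth, p_max, x_s, beta_s, z, alpha, v):
    """Minimum cost of a successful direct transmission, or math.inf.

    The rate constraint is met with equality at the smallest feasible power:
    bandwidth * log2(1 + p * gain) = rate  =>  p = (2**(rate/bandwidth) - 1) / gain.
    """
    if gain <= 0.0:
        return math.inf  # no power level can deliver the packet
    p_min = (2.0 ** (rate / bandwidth) - 1.0) / gain
    if p_min > p_max:
        return math.inf  # peak power constraint makes this mode infeasible
    # Queue-weighted power cost minus the reward for a success.
    return (x_s + v * beta_s) * p_min - (z + v * alpha)
```

A returned value of `math.inf` means the mode is never selected, since the idle mode always has cost 0.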

To compute the minimum cost associated with multi-hop transmission, note that in this case, the slot is divided into two parts (Figure 1(b)) and at most one relay is allocated nonzero power. This strategy is a special case of the Regenerative DF protocol (to be discussed next) that uses only 1 relay and in which the destination does not use signals received from the first stage for decoding. Therefore, the optimal cost for this can be calculated using the procedure for the Regenerative DF case by imposing the single relay constraint and setting the source-destination channel gain to 0.

Below we present the computation of the minimum cost for the cooperative transmission mode under several protocols. In what follows, we drop the time subscript for notational convenience.

### 5.1 Regenerative DF, Orthogonal Channels

Here, the source and relays are each assigned an orthogonal channel of equal size. An example slot structure is shown in Figure 1(c) in which the entire slot is divided into equal mini-slots, one for the source and one for each relay. In the first phase of the protocol, the source transmits the packet in its mini-slot. In the second phase, the subset of relays that were successful in reliably decoding the packet re-encode it using the *same* code book and transmit to the destination on their channels. Given such a set, the total mutual information under this protocol is given by [3]:

This is derived by assuming that the receiver uses Maximal Ratio Combining to process the signals. As seen in the expression for the mutual information, such an orthogonal structure increases the SNR, but utilizes only a fraction of the available degrees of freedom leading to reduced multiplexing gain.

Define a binary variable for each relay that is 1 if the relay can reliably decode the packet after the first stage and 0 otherwise. Then, for this protocol, (Equation 7) is equivalent to the following optimization problem (Equation 9):

These binary variables capture the requirement that a relay can cooperatively transmit in the second stage only if it was successful in reliably decoding the packet using the first stage transmission. A similar setup is considered in [12], but it treats only a limiting case in which a parameter goes to infinity. Because of the integer constraints on these variables, (Equation 9) is non-convex. However, we can exploit the structure of this protocol to reduce the above to a set of subproblems as follows. We first order the relays in decreasing order of their source-relay channel gains. Each candidate relay set then consists of the first few relays from this ordering, with one candidate set for each possible prefix length. For each candidate set, consider the minimum source power required to ensure that all relays in the set can reliably decode the packet after the first stage. We note that for all values of the source power between the minimum powers of two consecutive candidate sets, the relay set that can reliably decode remains the same. Thus, we need to consider only one subproblem for each candidate set. The subproblem for any candidate set is given by (Equation 10):

This can easily be expressed as the following LP:

where . The solution to the LP above has a greedy structure: we allocate increasing power to the nodes (including ) in decreasing order of the value of (where ) until a constraint is met.
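The greedy structure can be sketched as follows for a generic covering LP of this type: minimize a weighted sum of powers subject to a linear received-SNR constraint and per-node power bounds. The LP itself is not reproduced in this section, so the problem form, the ranking by gain-to-weight ratio, and every name below are assumptions made for illustration.

```python
import math

def greedy_power_lp(costs, gains, lower, upper, snr_target):
    """Greedy solution of: min sum_i costs[i]*P_i
    subject to sum_i gains[i]*P_i >= snr_target and lower[i] <= P_i <= upper[i].

    Returns the list of powers, or None if the target cannot be reached.
    """
    powers = list(lower)  # start every node at its minimum power
    deficit = snr_target - sum(g * p for g, p in zip(gains, powers))

    def ratio(i):
        # SNR bought per unit of weighted power; zero-cost nodes rank first.
        return gains[i] / costs[i] if costs[i] > 0 else math.inf

    for i in sorted(range(len(costs)), key=ratio, reverse=True):
        if deficit <= 0:
            break
        if gains[i] <= 0:
            continue  # this node cannot contribute to the constraint
        # Raise power until the SNR target is met or the node hits its cap.
        step = min(upper[i] - powers[i], deficit / gains[i])
        powers[i] += step
        deficit -= gains[i] * step
    return powers if deficit <= 1e-12 else None
```

Because the objective and the constraint are both linear, filling the most efficient node first is optimal for this assumed form, as in a fractional knapsack problem.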

Therefore, for this protocol, the optimal solution to finding the cost associated with the cooperative transmission mode in (Equation 7) can be computed by solving (Equation 11) for each and picking the one with the least cost. It is interesting to note that if we impose a constraint on the sum total power of the relays instead of individual node constraints, then due to the greedy nature of the solution to (Equation 11), it is optimal to select at most one relay for cooperation. Specifically, this relay is the one that has the highest value of .
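The reduction to one subproblem per relay set can be sketched as follows. The sketch assumes that a relay decodes when its received SNR (source power times source-relay gain) reaches a threshold `decode_snr`, so that the minimum source power for the first k relays in the ordering is set by the weakest of them. The names and the callback `solve_subproblem` are hypothetical placeholders for whatever solver handles each subproblem.

```python
def candidate_relay_sets(src_relay_gains, decode_snr):
    """Return [(min_source_power, relay_indices)] for the nested sets U_1, ..., U_m.

    Relays are ordered by decreasing source-relay gain; U_k holds the first k.
    """
    order = sorted(range(len(src_relay_gains)),
                   key=lambda i: src_relay_gains[i], reverse=True)
    candidates = []
    for k in range(1, len(order) + 1):
        weakest = order[k - 1]
        if src_relay_gains[weakest] <= 0:
            break  # remaining relays can never decode
        # The weakest relay in U_k determines the required source power.
        candidates.append((decode_snr / src_relay_gains[weakest], order[:k]))
    return candidates

def best_candidate(candidates, solve_subproblem):
    """Solve every subproblem and keep the cheapest feasible one.

    solve_subproblem(min_source_power, relays) must return a cost or None.
    """
    best = None
    for p_min, relays in candidates:
        cost = solve_subproblem(p_min, relays)
        if cost is not None and (best is None or cost < best[0]):
            best = (cost, p_min, relays)
    return best
```

Only as many subproblems as there are relays need to be solved, since raising the source power between two consecutive thresholds does not change which relays decode.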

### 5.2Non-Regenerative DF, Orthogonal Channels

This protocol is similar to the Regenerative DF protocol discussed in Section 5.1. The only difference is that here, in the second stage, the subset of relays that were successful in reliably decoding the packet re-encode it using *independent* code books. In this case, the total mutual information is given by [4]:

Using the same definition of binary variables as in Section 5.1, we can express (Equation 7) for this protocol as an optimization problem that resembles (Equation 9). Similar to the Regenerative DF case, we can then reduce this to a set of subproblems, one for each . The subproblem for set is given by:

The above problem is convex and we can use the KKT conditions to get the optimal solution (see Appendix B for details). Define . Then the solution to the subproblem for set is given by:

where is chosen so that the total mutual information constraint is met with equality. Therefore, the optimal solution for the cost in (Equation 7) for this protocol can be computed by solving (Equation 13) for each and picking the one with the least cost. We note that the solution above has a water-filling type structure that is typical of related resource allocation problems in static settings.
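A water-filling solution of this kind can be sketched numerically. The sketch assumes a generic form of the subproblem: minimize a weighted sum of powers subject to a sum of logarithmic rate terms meeting a target, with per-node power caps. Under that assumption the KKT conditions give each power as a clipped "water level over weight minus inverse gain", and the level is found by bisection. The problem form, the use of natural logarithms, and all names are assumptions; weights and gains are taken to be strictly positive.

```python
import math

def waterfill_min_power(weights, gains, upper, info_target_nats):
    """min sum_i w_i*P_i  s.t.  sum_i ln(1 + g_i*P_i) >= target, 0 <= P_i <= upper_i.

    Returns the power list, or None if the target is unreachable at full power.
    """
    def powers_at(level):
        # KKT stationarity gives P_i = level/w_i - 1/g_i, clipped to the box.
        return [min(max(level / w - 1.0 / g, 0.0), u)
                for w, g, u in zip(weights, gains, upper)]

    def info(powers):
        return sum(math.log1p(g * p) for g, p in zip(gains, powers))

    if info(upper) < info_target_nats:
        return None
    # At this level every node sits at its cap, so the target is met.
    lo, hi = 0.0, max(w * (u + 1.0 / g) for w, g, u in zip(weights, gains, upper))
    for _ in range(200):  # bisection on the water level
        mid = 0.5 * (lo + hi)
        if info(powers_at(mid)) >= info_target_nats:
            hi = mid
        else:
            lo = mid
    return powers_at(hi)
```

For two identical nodes with unit weight and unit gain and a target of 2 ln 2 nats, the sketch returns a power of about 1 for each node, which meets the constraint with equality.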

### 5.3AF, Orthogonal Channels

In this protocol, the source and relays are again assigned an orthogonal channel of equal size. An example slot structure is shown in Figure 1(c). However, instead of trying to decode the packet, the relays amplify and forward the received signal from the first stage. The total mutual information under this protocol is given by [13] [16]:

where . Using this, we can express (Equation 7) for this model as follows.

This problem is non-convex. However, if we fix the source power , then it becomes convex in the other variables. This reduction has been used in [16] as well, although it considers a static scenario with the objective of minimizing instantaneous outage probability. After fixing , we can compute the optimal relay powers for this value of by solving the following:

where . The first constraint can be simplified as:

Since we have fixed , we can express (Equation 15) as:

where . Using the KKT conditions, the solution to the above convex optimization problem is given by (see Appendix C for details):

where is chosen so that the second constraint is met with equality. We note that this solution has a water-filling type structure as well. Therefore, to compute the optimal solution to (Equation 7) for this protocol, we would have to solve the above for each value of . In practice, this computation can be simplified by considering only a discrete set of values for . Because we have derived a simple closed form expression for each , it is easy to compare these values over, say, a discrete list of options in to pick the best one, which enables a very accurate approximation to optimality in real time.
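The search over a discrete set of source powers can be sketched as follows. The sketch assumes the commonly used AF expression in which relay i contributes an SNR of a_i b_i P_i / (1 + a_i + b_i P_i) at the destination, with a_i the source power times the source-relay gain and b_i the relay-destination gain. The inner solution comes from setting the derivative of that term equal to the weight divided by a multiplier, which is then found by bisection. This form, the SNR target, and all names are assumptions and need not match the closed form of Appendix C term by term; weights and relay-destination gains are taken to be strictly positive.

```python
import math

def af_relay_powers(p_src, weights, g_sr, g_rd, upper, need):
    """Inner problem for a fixed source power: min sum_i w_i*P_i
    s.t. sum_i a_i*b_i*P_i/(1 + a_i + b_i*P_i) >= need, 0 <= P_i <= upper_i."""
    a = [p_src * g for g in g_sr]

    def total(powers):
        return sum(a[i] * g_rd[i] * p / (1.0 + a[i] + g_rd[i] * p)
                   for i, p in enumerate(powers))

    def powers_at(nu):
        out = []
        for i, w in enumerate(weights):
            # Stationarity: (1 + a_i + b_i*P_i)^2 = nu*a_i*b_i*(1 + a_i)/w_i.
            root = math.sqrt(nu * a[i] * g_rd[i] * (1.0 + a[i]) / w)
            out.append(min(max((root - (1.0 + a[i])) / g_rd[i], 0.0), upper[i]))
        return out

    if need <= 0:
        return [0.0] * len(weights)  # the direct link already suffices
    if total(upper) < need:
        return None  # infeasible even at full relay power
    lo, hi = 0.0, 1.0
    while total(powers_at(hi)) < need:
        hi *= 2.0
    for _ in range(200):  # bisection on the multiplier
        mid = 0.5 * (lo + hi)
        if total(powers_at(mid)) >= need:
            hi = mid
        else:
            lo = mid
    return powers_at(hi)

def af_grid_search(src_weight, weights, g_sd, g_sr, g_rd, upper, p_src_grid, snr_target):
    """Try each candidate source power and keep the cheapest feasible allocation."""
    best = None
    for p_src in p_src_grid:
        relay = af_relay_powers(p_src, weights, g_sr, g_rd, upper,
                                snr_target - p_src * g_sd)
        if relay is None:
            continue
        cost = src_weight * p_src + sum(w * p for w, p in zip(weights, relay))
        if best is None or cost < best[0]:
            best = (cost, p_src, relay)
    return best
```

A finer grid trades computation for accuracy; each inner solve is cheap, so the grid size is the main tuning knob in this sketch.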

### 5.4DF with DSTC

In this protocol, all the cooperating relays in the second stage use an appropriate distributed space-time code (DSTC) [4] so that they can transmit simultaneously on the same channel. The slot structure under this scheme is shown in Figure 1(d). Suppose in the first phase of the protocol, the source transmits the packet in the first half of the slot using power . In the second phase, a subset of relays that were successful in reliably decoding the packet re-encode it using a DSTC and transmit to the destination with power (where ) in the second half of the slot. Given such a set , the total mutual information under this protocol is given by [3]:

The factor of 1/2 appears because only half of the slot is being used for transmission. As seen in the expression above, unlike the earlier examples, this protocol does not suffer from reduced multiplexing gains due to orthogonal channels.

Define binary variables to be 1 if relay can reliably decode the packet after the first stage and 0 otherwise. Then, for this protocol, (Equation 7) is equivalent to the following optimization problem:

By comparing the above with (Equation 9), it can be seen that the computation of minimum cost under this protocol follows the same procedure as described in Section 5.1 of solving subproblems, each an LP, by ordering the relays greedily and hence we do not repeat it.

### 5.5AF with DSTC

Here, all cooperating relays use amplify and forward along with DSTC. The total mutual information under this protocol is given by:

where . Using this, we can express (Equation 7) for this model as follows.

This is similar to (Equation 14) and thus, we fix the source power and use a similar reduction to get a convex optimization problem whose solution can be derived using KKT conditions and is given by:

where is chosen so that the constraint on the total mutual information at the destination is met with equality.

## 6Two-Stage Resource Allocation Problem with Unknown Channels, Known Statistics

We next consider the solution to (Equation 7) when the source does not know the current channel gains and is only aware of their statistics. In this case, (Equation 7) becomes a two-stage stochastic dynamic program. For brevity, here we focus on its solution for the cooperative transmission mode.

Suppose the source uses power in the first stage. Let denote the outcome of this transmission. This lies in a space of possible network states which is assumed to be of a finite but arbitrarily large size. For example, in the DF protocol, might represent the set of relay nodes that received the packet successfully after the first stage as well as the mutual information accumulated so far at the destination. For AF, can represent the SNR value at each relay node and at the destination.

Let be the optimal cost-to-go function for the two-stage dynamic program (Equation 7) given that the source uses power in the first stage and the network state is at the beginning of the second stage. Let denote the optimal cost-to-go function starting from the first stage. Also, let denote the set of relay nodes that can take part in cooperative transmission when the network state is . We define the following probabilities. Let be the probability that the outcome of the first stage is when the source uses power . Also, let be the probability that the receiver gets the packet successfully when relays in use a power allocation and the source uses power . Note that these probabilities are obtained by taking expectation over all channel state realizations. We assume these are obtained from the knowledge of the channel statistics.

Using these definitions, we can now write the Bellman optimality equations [26] for this dynamic program:

While this can be solved using standard dynamic programming techniques, it has a computational complexity that grows with the state space size and can be prohibitive when this is large. We therefore present an alternate method based on the idea of Monte Carlo simulation.

### 6.1Simulation Based Method

Suppose the transmitter performs the following simulation. Fix a source power . Define as the optimal cost-to-go function *given* that the source uses power . Note that this is simply the expression on the right hand side of (Equation 19) with fixed. Simulate the outcome of a transmission at this power times independently using the values of . Let denote the outcome of the simulation. For each generated outcome , compute the optimal cost-to-go function by solving the corresponding second-stage problem (this could be done using the knowledge of either analytically or numerically). Use this to update , which is an *estimate* of for a given after iterations and is defined as follows:

We now show that, for a given , can be pushed arbitrarily close to the optimal cost-to-go function by increasing . Since we have fixed , from (Equation 19), we have:

Define the following indicator random variables for each simulation and :

Note that by definition . Therefore, we can express in terms of these indicator variables as follows:

We note that are i.i.d. random variables with mean and variance . Using Chebyshev’s inequality, we get for any :

This shows that the estimate converges in probability to the optimal cost-to-go value, with the probability of a deviation of any fixed size decaying at least inversely with the number of simulations. Thus, this method can be used to get a good estimate of the optimal cost-to-go function for a fixed value of in a reasonable number of steps.
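The simulation procedure can be sketched as a running average over sampled first-stage outcomes. The sketch assumes a finite list of outcomes with known probabilities for the fixed source power, and a caller-supplied function returning the cost-to-go for each outcome. It also includes the sample count implied by Chebyshev's inequality: for i.i.d. samples with variance sigma^2, the probability that the sample mean deviates from the true mean by at least eps is at most sigma^2 / (n eps^2). All names are hypothetical.

```python
import math
import random

def estimate_cost_to_go(outcomes, probs, cost_given_outcome, num_samples, seed=0):
    """Monte Carlo estimate of sum_w Pr[w] * cost_given_outcome(w)."""
    rng = random.Random(seed)  # local generator so runs are reproducible
    estimate = 0.0
    for n in range(1, num_samples + 1):
        # Simulate one first-stage outcome for the fixed source power.
        omega = rng.choices(outcomes, weights=probs, k=1)[0]
        # Incremental form of the sample mean after n simulations.
        estimate += (cost_given_outcome(omega) - estimate) / n
    return estimate

def chebyshev_samples(variance, eps, delta):
    """Smallest n with variance / (n * eps^2) <= delta."""
    return math.ceil(variance / (delta * eps ** 2))
```

The Chebyshev count is conservative; in practice far fewer samples often suffice, but the bound requires nothing beyond a finite variance.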

## 7Multi-Source Extensions

In this section, we extend the basic model of Section 2 to the case when there are multiple sources in the network. Let the set of source nodes be given by . We consider the case when all source nodes have orthogonal channels.^{3}

Let be the source node that gets a transmission opportunity in slot . Then, the optimal resource allocation framework developed in Section 4 can be applied as follows. A virtual reliability queue is defined for each source node and is updated as in (Equation 5). Note that in slots where a source node does not get a transmission opportunity, . We assume that each incoming packet gets one transmission opportunity so that the delay constraint of one slot per packet only measures the transmission delay and not the queueing delay that would be incurred due to contention. Similarly, a virtual power queue is maintained for each node as in (Equation 6) including the source nodes and relay nodes. Note that in this model, it is possible for a source node to act as a relay for another source node when it is not transmitting its own data. We denote the set of relay nodes (that includes such source nodes) in slot as .
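The per-node bookkeeping can be sketched with a generic Lyapunov-style virtual queue. The exact updates of (Equation 5) and (Equation 6) are not reproduced in this section, so the max-plus form below and the interpretation of "arrival" and "service" are assumptions. For a power queue the arrival would be the power spent in the slot and the service the average power budget; for a reliability queue the arrival would be the required fraction of arriving packets and the service the number of packets delivered. All names are hypothetical.

```python
def update_virtual_queue(backlog, arrival, service):
    """Generic virtual queue update: serve first, then add the new arrival."""
    return max(backlog - service, 0.0) + arrival

def update_power_queues(backlogs, powers_used, avg_budgets):
    """Update one power queue per node (sources and relays alike).

    Nodes that stay silent in the slot use zero power, so their backlog
    can only shrink.
    """
    return {node: update_virtual_queue(backlogs[node],
                                       powers_used.get(node, 0.0),
                                       avg_budgets[node])
            for node in backlogs}
```

A backlog that stays bounded over time indicates that the corresponding time-average constraint is being met, which is the standard reason for tracking such queues.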

Then the optimal control algorithm operates as follows. Let denote the collection of all virtual queues in timeslot . Every slot, given and any channel state , it chooses a control action that minimizes the following stochastic metric (for a given control parameter ):