Transmitting important bits and sailing high radio waves: a decentralized cross-layer approach to cooperative video transmission
Abstract
We investigate the impact of cooperative relaying on uplink and downlink multiuser (MU) wireless video transmissions. The objective is to maximize the long-term sum of utilities across the video terminals in a decentralized fashion, by jointly optimizing the packet scheduling, the resource allocation, and the cooperation decisions, under the assumption that some nodes are willing to act as cooperative relays. A pricing-based distributed resource allocation framework is adopted, where the price reflects the expected future congestion in the network. Specifically, we formulate the wireless video transmission problem as an MU Markov decision process (MDP) that explicitly considers the cooperation at the physical layer and the medium access control sublayer, the video users’ heterogeneous traffic characteristics, the dynamically varying network conditions, and the coupling among the users’ transmission strategies across time due to the shared wireless resource. Although MDPs notoriously suffer from the curse of dimensionality, our study shows that, with appropriate simplifications and approximations, the complexity of the MU-MDP can be significantly mitigated. Our simulation results demonstrate that integrating cooperative decisions into the MU-MDP optimization can increase the resource price in networks that only support low transmission rates and can decrease the price in networks that support high transmission rates. Additionally, our results show that cooperation allows users with feeble direct signals to achieve improvements in video quality on the order of dB peak signal-to-noise ratio (PSNR), with less than 0.8 dB quality loss by users with strong direct signals, and with a moderate increase in total network energy consumption that is significantly less than the energy that a distant node would require to achieve an equivalent PSNR without exploiting cooperative diversity.
Cooperative communications, cross-layer optimization, decode-and-forward relaying, Markov decision process (MDP), multiuser scheduling, resource allocation, wireless video transmission.
I Introduction
Existing wireless networks provide dynamically varying resources with only limited support for the Quality of Service (QoS) required by delay-sensitive, bandwidth-intense, and loss-tolerant multimedia applications. This problem is exacerbated in multiuser (MU) settings because multiple video streams, with heterogeneous traffic characteristics, must share the scarce wireless resources. To address these challenges, considerable research has focused on MU wireless communication [2, 3, 4, 5, 6] and, in particular, MU video streaming over wireless networks [7, 8, 9, 10, 11]. The majority of this research relies on cross-layer adaptation to match available system resources (e.g., bandwidth, power, or transmission time) to application requirements (e.g., delay or source rate), and vice versa. In MU video streaming applications [7, 8, 9, 10, 11], for example, cross-layer optimization is deployed to strike a balance between scheduling lucky users who experience very good fades and serving users who have the highest-priority video data to transmit. This tradeoff is important because rewarding a few lucky participants, as opportunistic multiple-access policies do [3, 4, 5], does not translate to providing good quality to the application (APP) layer. Unfortunately, with the exception of [6, 12], the aforementioned research assumes that wireless users are noncooperative. This leads to a basic inefficiency in the way that the network resources are assigned: good fades experienced by some nodes can go to waste because users with higher-priority video data, but worse fades, get access to the shared wireless channel.
One way to avoid wasting good fades is to enlist the nodes that experience them as cooperative helpers, using one of the techniques available for cooperative coding [13, 14, 15]. As mentioned above, this idea has been considered in [6, 12]. In [12], for example, a cross-layer optimization is proposed involving the physical (PHY) layer, the medium access control (MAC) sublayer, and the APP layer, where layered video coding is integrated with randomized cooperation to enable efficient video multicast in a cooperative wireless network. However, because it is a multicast system, there is no need for an optimal multiple-access strategy, and no need to account for heterogeneous traffic characteristics. In [6], a centralized network utility maximization (NUM) framework is proposed for jointly optimizing relay strategies and resource allocations in a cooperative orthogonal frequency-division multiple-access (OFDMA) network. In both [6, 12], it is assumed that each user has a static utility function of the average transmission rate, where the utility derived by each user in [12] is a function of the average received rate of the base and enhancement layer video bitstreams.
Unlike the aforementioned solutions, we take a dynamic optimization approach to the cooperative MU video streaming problem. In particular, unlike [6, 12], the solution that we adopt explicitly considers packet-level video traffic characteristics (instead of flow-level) and dynamic network conditions (instead of average-case conditions). Our solution is inspired by the cross-layer resource allocation and scheduling solution in [11], in which the MU wireless video streaming problem is modeled and solved as an MU Markov decision process (MDP) that allows the users, via a uniform resource pricing solution, to obtain long-term optimal video quality in a distributed fashion. However, although we use the traffic model and dual decomposition proposed in [11], cooperation renders our PHY/MAC model completely different from that studied in [11]. This opens additional research issues with respect to [11], such as how the cooperation decision should be made, how cooperation affects the resource price, and how cooperation affects the total network energy consumption. Moreover, as recently shown in [16], augmenting the framework developed in [11] to also account for cooperation is challenging because of the complexity of the resulting cross-layer MU-MDP optimization.
The contributions of this paper are fourfold. First, we formulate the cooperative wireless video transmission problem as an MU-MDP using a time-division multiple-access (TDMA)-like network, randomized space-time block coding (STBC) [17], and a decode-and-forward cooperation strategy. To the best of our knowledge, we are the first to consider cooperation in a dynamic optimization framework. We show analytically that the decision to cooperate can be made opportunistically, independently of the MU-MDP. Consequently, each user can determine its optimal scheduling policy by only keeping track of its experienced cooperative transmission rates, rather than tracking the channel statistics throughout the network. Second, in light of the fact that opportunistic cooperation is optimal, we propose a low-complexity opportunistic cooperative strategy for exploiting good fades in an MU wireless network. The key idea is that nodes can, in a distributed manner, self-select to act as cooperative relays. The proposed self-selection strategy requires a number of message exchanges that is linear in the number of video sources, and selects sets of cooperative relays in such a way that cooperation can be guaranteed to be better than direct transmission. Third, we show experimentally that users with feeble direct signals to the access point (AP) are conservative in their resource usage when cooperation is disabled. In contrast, when cooperation is enabled, users with feeble direct signals to the AP use cooperative relays and utilize resources more aggressively. Consequently, the uniform resource price that is designed to manage resources in the network tends to increase when cooperation is enabled in a network that only supports low transmission rates, but tends to decrease when it is enabled in a network that supports high transmission rates. Fourth, we study the impact of cooperation on the total network energy consumption. We show that the increased transmission rate afforded by cooperation requires an increase in total network energy relative to the lower-rate direct transmission; however, this increase is moderate compared to the amount of power required to transmit directly to the access point at a transmission rate equivalent to the cooperative rate.
The remainder of the paper is organized as follows. We introduce the system and application models in Section II. In Section III, we provide expressions for the transmission rate, packet error rate, and network energy consumption in both direct and cooperative transmission modes. In Section IV, we present the proposed MU cross-layer PHY/MAC/APP optimization. In Section V, we propose a distributed protocol for opportunistically recruiting cooperative relays. Finally, we report numerical results in Section VI and conclude in Section VII.
II System Model
We consider a network composed of users streaming video content over a shared wireless channel to a single AP (see Fig. 1). Such a scenario is typical of many uplink media applications, such as remote monitoring and surveillance, wireless video sensors, and mobile video cameras. The proposed optimization framework can also be used for downlink applications, where the relays can be recruited for streaming video to a certain user in the network in exactly the same way that they can be recruited to transmit to the AP in the uplink scenario. In Subsection II-A, we introduce the MAC and PHY layer models. Then, in Subsection II-B, we describe the deployed APP layer model.
II-A MAC and PHY layer models
We assume that time is slotted into discrete timeintervals of length seconds and each time slot is indexed by .^{1}^{1}1 The fields of complex, real, and nonnegative integer numbers are denoted with , , and , respectively; matrices [vectors] are denoted with upper [lower] case boldface letters (e.g., or ); the field of complex [real] matrices is denoted as [], with [] used as a shorthand for []; the superscript denotes the transpose of a vector; denotes the magnitude of a complex number; is the norm of the vector , which for positive realvalued vectors is simply the sum of the components, whereas is the Euclidean norm of ; indicates the th element of the matrix , with and ; a circular symmetric complex Gaussian random variable with mean and variance is denoted as ; and denote flooring and ceilinginteger, respectively; stands for ensemble averaging; and, finally, . At the MAC sublayer, the users access the shared channel using a TDMAlike protocol. In each time slot , the AP endows the th user, for , with the resource fraction , where , such that the user can use the amount of channel time for transmission. Let denote the resource allocation vector at time slot , which must satisfy the stage resource constraint , where the inequality accounts for possible signaling overhead.
Each node’s PHY layer is assumed to be a singlecarrier singleinput singleoutput system designed to handle quadrature amplitude modulation (QAM) square constellations, with a (fixed) symbol rate of symbols per second. The PHY layer can support a set of data rates (bits/second), where is the number of bits that are sent every symbol period, with , and is the number of signals in the QAM constellation. Hence, form the basic rate set and is the base rate at which the nodes exchange control messages. Let be the minimum distance of the QAM constellation, the average transmitter energy per symbol is given by
(II.1) 
which is assumed to be fixed for all the nodes and data rates, i.e., it does not depend on the indices and . Consequently, the average power per symbol expended by each transmitter is (Watts). We consider a frequency nonselective block fading model, where denotes the fading coefficient over the link in time slot , with , and or corresponding to the AP. It is assumed that all the channels are dual, i.e., , and that the fading coefficients are independent and identically distributed (i.i.d.) with respect to . Moreover, we define as the matrix collecting the fading coefficients among all of the nodes and the AP, i.e., , for .
At the PHY layer, there are two transmission modes to choose from: direct and cooperative. In the direct transmission mode, as shown in Fig. 1, the th source node transmits directly to the AP at the data rate (bits/second) for the assigned transmission time of seconds. In the cooperative transmission mode, some nodes serve as decodeandforward relays. Specifically, in the cooperative mode, the assigned transmission time is divided into two phases as illustrated in Fig. 1: in Phase I, the th source node directly broadcasts its own data to all the nodes in the network at the data rate for seconds, where is the Phase I time fraction; in Phase II, some of the nodes overhearing the source transmission, belonging to a certain subset , demodulate the data received in Phase I, remodulate the original source bits, and then cooperatively transmit towards the AP, along with the original source , at the data rate for the remaining seconds. In the sequel, we denote with (bits/second) the cooperative data rate over the two phases, i.e., the amount of bits that are transmitted in a single phase divided by the overall length of the two phases, which depends on the data rates and attainable in each of the two hops. The decision to transmit in the direct or cooperative transmission mode depends on fading coefficients throughout the network in time slot and on the target packet error rate (PER). Thus, the actual transmission rate of the th source in time slot is dictated by the cooperation decision , where if cooperation is chosen, and if direct transmission is chosen. In Section III, we compute the transmission parameters and as functions of a subset of the entries in , as well as the time fraction , and, in Section V, we describe how to determine the set of cooperative relays and the cooperation decision .
II-B APP layer model and packet scheduling
The source traffic can be modeled using any Markovian traffic model (e.g., [11, 20]). However, to accurately capture the characteristics of the video packets, we adopt the video traffic model proposed in [11], which accounts for the fact that video packets have different deadlines, distortion impacts, and source-coding dependencies (whereas the model in [20] does not consider these characteristics). In this section, we describe the key features of this model. Because the problem formulation and novelty of this paper do not depend on the deployed traffic model (so long as the model is Markovian), we refer the interested reader to [11] for complete details.
For , the traffic state represents the video data that the th user can potentially transmit in time slot , and comprises the following two components: the schedulable frame set and the buffer state . In time slot , we assume that the th user can transmit packets belonging to the set of video frames whose deadlines are within the scheduling time window (STW) . The buffer state represents the number of packets of each frame in the STW that are awaiting transmission at time . The th component of denotes the number of packets of frame remaining for transmission at time . We assume that each packet has size bits. Fig. 2 illustrates how the traffic states are defined for a simple IBPB GOP structure.^{2}^{2}2In a typical hybrid video coder like H.264/AVC or MPEG2, I, P, and B indicate the type of motion prediction used to exploit temporal correlations between video frames. Iframes are compressed independently of the other frames, Pframes are predicted from previous frames, and Bframes are predicted from previous and future frames.
We now define the packet scheduling action. In each time slot , the th user takes scheduling action , which determines the number of packets to transmit out of . Specifically, the th component of represents the number of packets of the th frame within the STW that are scheduled to be transmitted in time slot . Importantly, the scheduling action is constrained to be in the feasible scheduling action set , which depends on the traffic state and the transmission rate supported by the PHY layer . In particular, the following three constraints must be met:

Buffer: Every component of must satisfy .

Packet: The total number of transmitted packets must satisfy , where in the direct transmission mode, i.e., when , and in the cooperative transmission mode, i.e., when . Note that depends on a subset of the elements in as described later in Section III.^{3}^{3}3We do not include in the packet constraint because is not known at the time the scheduling decision is determined. Once the scheduling decision is determined, the resource allocation is determined as (see (IV.5)). Importantly, the stage resource constraint ensures that the scheduling decisions , are selected such that .

Dependency: If there exists a frame that has not been transmitted, and frame depends on frame (denoted by ), then . In other words, all packets associated with must be transmitted before transmitting any packets associated with .
The sequence of traffic states can be modeled as a controllable Markov chain with transition probability function .
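The three feasibility constraints above (buffer, packet, dependency) can be illustrated with a short check. This is only a sketch under stated assumptions, not the formulation of [11]: the function name and all parameter names are hypothetical, the packet limit is assumed to be the number of whole packets that fit in one full slot at the current rate, and a parent frame is assumed to count as transmitted when the current action drains its buffer.

```python
from math import floor

def is_feasible(action, buffer, deps, rate_bps, slot_s, packet_bits):
    """Return True if a scheduling action meets all three constraints.

    action[j] : packets of frame j scheduled in this slot
    buffer[j] : packets of frame j still awaiting transmission
    deps[j]   : indices of the frames that frame j depends on
    """
    # Buffer: a frame cannot send more packets than it has queued.
    if any(y < 0 or y > b for y, b in zip(action, buffer)):
        return False
    # Packet (assumed form): the total is capped by what the current
    # rate could carry if the whole slot were granted to this user.
    if sum(action) > floor(rate_bps * slot_s / packet_bits):
        return False
    # Dependency: frame j may be served only if every frame it depends
    # on is fully drained once this action has been applied.
    for j, y in enumerate(action):
        if y > 0 and any(buffer[k] - action[k] > 0 for k in deps.get(j, ())):
            return False
    return True
```

For example, with three frames where the second depends on the first and the third on both, an action that sends part of the first frame together with packets of the second is rejected, while an action that drains the first frame before serving the second is accepted.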
III Cooperative PHY layer transmission
In this subsection, with reference to the uplink scenario, we describe how the direct transmission rate and the cooperative transmission rate depend on a subset of the elements in the channel state matrix .
Let us first consider the direct link with instantaneous channel gain and data rate (bits/second) corrupted by additive white Gaussian noise. The bit error probability (BEP) at the output of the maximum likelihood (ML) detector of node , under the assumption that a Gray code is used to map the information bits into QAM symbols and the signaltonoise ratio (SNR) is sufficiently high, can be upper bounded as (see [21])
(III.1) 
where is the average SNR per symbol expended by the transmitter and is the noise power spectral density. Each direct transmission is subject to a PER threshold at the MAC sublayer, which leads to a BEP constraint at the PHY layer. Consequently, the achievable data rate under the BEP constraint is
(III.2) 
The data rate over the link between the source and the AP is obtained using (III.2) by setting . In this case, the number of symbols required to transmit a packet of bits is equal to . Thus, neglecting receive and processing energy consumption, the energy required for a direct transmission of one packet is equal to
(III.3) 
It is worth noting that the energy expended in direct mode is inversely proportional to the achievable data rate .
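The direct-mode reasoning above (pick the largest rate in the basic rate set that satisfies the BEP constraint, then spend power for the resulting airtime) can be sketched as follows. The function names are hypothetical, the BEP bound is passed in as a caller-supplied function rather than reproduced from (III.1), and returning a rate of zero when no rate is admissible is an assumption of this sketch.

```python
def pick_direct_rate(rates_bps, bep_bound, bep_target):
    """Largest rate whose BEP bound meets the target (0.0 if none does).

    bep_bound is a caller-supplied function mapping a rate to an upper
    bound on the bit error probability for the current channel gain.
    """
    admissible = [r for r in rates_bps if bep_bound(r) <= bep_target]
    return max(admissible, default=0.0)

def direct_packet_energy(tx_power_w, packet_bits, rate_bps):
    """Transmit energy (joules) for one packet: power times airtime."""
    airtime_s = packet_bits / rate_bps
    return tx_power_w * airtime_s
```

Because the transmit power is fixed, halving the achievable rate doubles the airtime and therefore the energy per packet, which is the inverse proportionality noted above.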
At this point, let us consider the cooperative mode. Because of possible error propagation, the endtoend BEP for a twohop cooperative transmission is cumbersome to calculate exactly with decodeandforward relays; therefore, the relationship that ties , and the relevant channel state information, and that guarantees a certain reliability of the overall link, is not as simple as (III.2). To significantly simplify the computation of and , we use two different BEP thresholds and for the first and second hops, respectively. The threshold is typically a large percentage of the total error rate budget, say , and , since the first link is the bottleneck in decodeandforward relaying. Indeed, the performance at each relay is that of a singleinput singleoutput system transmitting over a fading channel. On the other hand, the transmission over the second link (from the recruited relays to the destination) can be regarded as a distributed multipleinput singleoutput system operating over a fading channel; consequently, the performance at the destination, which can take advantage from cooperative diversity, is significantly better than that of each sourcetorelay link, even when a small number of relays are recruited. Moreover, due to this fact and since the exponential function in (III.1) decays fast as a function of its argument, we reasonably assume that the endtoend BEP at the output of the ML detector of the AP is dominated by the BEP over the worst sourcetorelay channel, i.e., the link for which is the smallest one. Under this assumption, accounting for (III.2), we can estimate in Phase I as
(III.4) 
where is obtained from by replacing with . In this phase, which lasts seconds, the number of symbols needed to transmit a packet of bits is equal to and, thus, it must result that
(III.5) 
Supposing that a subset of the available nodes are recruited to serve as relays in Phase II, these nodes, along with the th user, cooperatively forward the source message by using a randomized STBC rule [17]. More specifically, assuming errorfree demodulation at the decodeandforward relays, if gathers the block of i.i.d. QAM source symbols to be transmitted in Phase II of time slot , then at the th node, for each , the vector is mapped onto an orthogonal spacetime code matrix [22], where is the block length and denotes the number of antennas in the underlying spacetime code. During Phase II, the th node transmits a linear weighted combination of the columns of , with the weights of the columns of contained in the vector . We denote with the weight matrix of all the cooperating nodes, where is the cardinality of .^{4}^{4}4One specific code of the STBC matrix is always assigned to the source itself, which transmits over the cooperative link every time cooperation is activated. This can be accounted for by simply setting and replacing the first row of with , whereas the remaining entries of are identically and independently generated random variables with zero mean and variance . Under the randomized STBC rule, the AP observes the spacetime coded signal with equivalent channel vector , where collects all the channel coefficients between the relay nodes and the AP (see Fig. 1). Note that the AP only needs to estimate for coherent ML decoding and that the randomized coding is decentralized since the th relay chooses locally. By capitalizing on the orthogonality of the underlying STBC matrix , the BEP over the second hop at the output of the ML detector of the AP using data rate (bits/second) can be upper bounded as in (III.1) by replacing and with and , respectively. By imposing the BEP constraint , the data rate attainable on the second hop of the cooperating link is given by
(III.6) 
where is obtained from in (III.2) by replacing with . In this phase, which lasts seconds, the number of symbols needed to transmit a packet of bits is equal to and, thus, it must result that
(III.7) 
where is the rate of the orthogonal STBC rule. From (III.5) and (III.7), the transmission time for the two phase communication mode is
(III.8) 
which also unveils what is the functional dependence of on and . Moreover, from (III.5) and (III.7), it is required that
(III.9) 
which shows that, given the STBC rule, the time fraction is determined by the data rates in Phase I and II. The cooperative mode is activated only if the cooperative transmission is more datarate efficient than the direct communication, i.e., only if , which from (III.8) leads to the following condition
(III.10) 
If condition (III.10) is fulfilled, then the opportunistically optimal cooperation decision is ; otherwise, the th source transmits to the AP in direct mode and .
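The opportunistic cooperation rule can be illustrated numerically. The sketch below assumes, consistent with the description of the two phases, that the airtime per bit is the reciprocal of the Phase I rate plus the reciprocal of the Phase II rate scaled by the STBC code rate; the function and parameter names are hypothetical and are not the paper's notation.

```python
def cooperative_rate(rate1_bps, rate2_bps, stbc_rate):
    """Effective two-phase rate and the Phase I time fraction."""
    t1 = 1.0 / rate1_bps                # Phase I airtime per bit
    t2 = 1.0 / (stbc_rate * rate2_bps)  # Phase II airtime per bit
    return 1.0 / (t1 + t2), t1 / (t1 + t2)

def cooperation_decision(direct_bps, rate1_bps, rate2_bps, stbc_rate):
    """Return (decision, rate): decision is 1 for cooperative mode and
    0 for direct mode, whichever yields the higher rate."""
    coop_bps, _ = cooperative_rate(rate1_bps, rate2_bps, stbc_rate)
    if coop_bps > direct_bps:
        return 1, coop_bps
    return 0, direct_bps
```

With equal hop rates and a rate-one code, the effective cooperative rate is half the per-hop rate, so in this sketch cooperation pays off only when the direct link supports less than half of what each hop can carry.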
It is interesting to evaluate the energy consumption in the case of a cooperative transmission. Neglecting receive and processing energy consumption, the energy expended by the source for transmission of one packet is equal to
(III.11) 
whereas the energy expended by each recruited relay node for transmitting one packet of the th source is given by
(III.12) 
It is noteworthy from (III.3) and (III.11) that, since cooperation is activated only when , the energy expended by the source node for a cooperative transmission is smaller than that required by the same node for a direct transmission. On the other hand, the energy (III.12) expended by the relays is inversely proportional to the achievable data rate in Phase II. Therefore, provided that , over a sufficiently long period, the energy expenditure in relaying another node’s data can be partially compensated for when the recruited relay acts as a source in the network. The total energy expended in the network to transmit packets for user can be expressed as
(III.13) 
The energy consumption in the direct and cooperative modes is numerically compared in Section VI.
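A rough per-packet energy budget for the cooperative mode can be sketched as follows. The sketch assumes that the source transmits during both phases (so its energy is power times the full two-phase airtime), that each relay transmits only during Phase II, and that `n_relays` counts the recruited relays other than the source; all names are hypothetical.

```python
def cooperative_energy(tx_power_w, packet_bits, coop_bps,
                       rate2_bps, stbc_rate, n_relays):
    """Return (source, per-relay, total) energy in joules per packet."""
    # Source is on the air for the whole two-phase transmission.
    source_j = tx_power_w * packet_bits / coop_bps
    # Each relay is on the air only for Phase II.
    relay_j = tx_power_w * packet_bits / (stbc_rate * rate2_bps)
    return source_j, relay_j, source_j + n_relays * relay_j
```

Under these assumptions the source spends less energy than in direct mode whenever the cooperative rate exceeds the direct rate, while the total network energy grows with the number of recruited relays.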
IV Cooperative Multi-User Video Transmission
Recall that denotes the th user’s traffic state and collects the channel coefficients among all the nodes and the AP. Hence, the global state can be defined as , where is a discrete set of all possible states.^{5}^{5}5To have a discrete set of network states, the individual link states in are quantized into a finite number of bins (see [25] for details). Since: (i) the th user’s traffic state evolves as a Markov process controlled by its scheduling action ; (ii) the th user’s traffic state transition is conditionally independent of the other users’ traffic state transitions given ; and (iii) the state of each link is assumed to be i.i.d. with respect to time; the sequence of global states can be modeled as a controlled Markov process with transition probability function
(IV.1) 
where collects the scheduling actions of all the video users.
Under the scheduling action , the th user obtains the immediate utility
(IV.2) 
which is the total video quality improvement experienced by the th user by taking scheduling action in traffic state under the assumption that quality is incrementally additive [18].
The objective of the MU optimization is the maximization of the expected discounted sum of utilities with respect to the joint scheduling action and the cooperation decision vector taken in each state . Due to the stationary Markovian transition probability function, the optimization can be formulated as an MDP that satisfies the following dynamic programming equation^{6}^{6}6In this section, since we model the problem as a stationary MDP, we omit the time index when it does not create confusion. In place of the time index, we use the notation to denote a state variable in the next time step (e.g. , , ).
(IV.3) 
subject to
(IV.4) 
where is the timefraction allocated to the th user given its scheduling action and transmission rate , i.e.,
(IV.5) 
the parameter is the “discount factor”, which accounts for the relative importance of the present and future utility, and is the set of feasible scheduling actions given the traffic state and channel state matrix . From Theorem 6.2.5 in [27], we know that there exists a stationary optimal policy that is the global optimal solution to (IV.3) .
Given the distributions and for all , the above MUMDP can be solved by the AP using value iteration or policy iteration [19]. However, there are two challenges associated with solving the above MUMDP. First, the complexity of solving an MDP is proportional to the cardinality of its statespace , which, in the above MUMDP, scales exponentially with the number of users, i.e., , and with the number of links in , i.e., . Hence, even for moderate sized networks, it is impractical to compute, or even to encode, . In subsection IVA, we show that the exponential dependence on the number of links in can be eliminated. Second, in the uplink scenario, the traffic state information is local to the users, so neither the AP nor the users have enough information to solve the above MUMDP. In subsection IVB, we summarize the findings in [11] that show that the considered optimization can be approximated to make it amenable to a distributed solution. Additionally, this distributed solution eliminates the exponential dependence on the number of users. Note that the simplification in subsection IVA is very important, because only after obtaining this result does it become possible to use the solution in [11].
IV-A Reformulation with simplified network state
The only reason to include the detailed network state information and the cooperation decision in the MU-MDP is to make foresighted cooperation decisions, which take into account the impact of the immediate cooperation decision on the expected future utility of the users. However, if we can show that the optimal opportunistic (i.e., myopic) cooperation decision is also long-term optimal, then the detailed network state information does not need to be included in the MU-MDP. The following theorem shows that the optimal opportunistic cooperation decision, which maximizes the immediate transmission rate, is also long-term optimal.
Theorem 1 (Opportunistic cooperation is optimal)
If utilizing cooperation incurs zero cost to the source and relays, then the optimal opportunistic cooperation decision, which maximizes the immediate throughput, is also long-term optimal.
Proof: See Appendix I.
To intuitively understand why maximizing the immediate transmission rate at the PHY layer is long-term optimal, consider what happens when a user chooses not to maximize its immediate transmission rate (i.e., does not utilize the optimal opportunistic cooperation decision). Two things can happen: either fewer packets are transmitted overall because of packet expirations; or the same number of packets are transmitted overall, but their transmission incurs additional resource costs because transmitting the same number of packets at a lower rate requires more resources [see (IV.5)]. In either case, the long-term utility is suboptimal. A consequence of Theorem 1 is that the cooperation decision vector does not need to be included in the MU-MDP. Instead, it can be determined opportunistically by selecting to maximize the immediate transmission rate. Most importantly, this means that the MU-MDP does not need to include the high-dimensional network state.
We now make two remarks regarding Theorem 1 so that its consequences are not misinterpreted. First, in the introduction, we noted that maximizing throughput is a suboptimal multiple access strategy for wireless video. This does not contradict Theorem 1 because it only states that the cooperation decision should be made opportunistically to maximize the immediate transmission rate. Indeed, myopic (opportunistic) resource allocation and scheduling is suboptimal because it does not take into account the dynamic video data attributes (i.e., deadlines, priorities, and dependencies). Second, although the users’ MDPs do not need to include the highdimensional network state, the optimal resource allocation and scheduling strategies still depend on it; however, instead of tracking , it is sufficient to track the users’ optimal opportunistic transmission rates provided by the PHY layer, i.e., for all . Under the assumption that the channel coefficients are i.i.d. random variables with respect to , can also be modeled as an i.i.d. random variable with respect to . We let denote the probability mass function (pmf) from which is drawn. We note that depends on and the deployed PHY layer cooperation algorithm.
Based on the second remark, we can simplify the maximization problem in (IV.3). Let us define the th user’s state as and redefine the global state as . In Section V, we describe how is determined, but for now we will take for granted that it is known. Because the optimization does not need to include the cooperation decision, the maximization of the expected sum of discounted utilities in (IV.3) can be simplified by only maximizing with respect to the scheduling action in each state , that is,
(IV.6) 
subject to
(IV.7) 
where .
IV-B Distributed solution
Similar to [11], (IV.6) can be reformulated as an unconstrained MDP using Lagrangian relaxation. The key idea is to introduce a Lagrange multiplier associated with the stage resource constraint in each global state because every global state has a different resource-quality tradeoff. The resulting dual solution has zero duality gap compared to the primal problem [i.e., (IV.6)], but it still depends on the global state so it is not amenable to a distributed solution. However, by imposing a uniform resource price , , which is independent of the multiuser state, the resulting MU-MDP can be decomposed into MDPs, one for each user [11]. (Footnote 7: The resource price is only used to efficiently allocate the limited wireless resources among the users; it is not used to generate revenue for the AP. In other words, it is a congestion price rather than a real price.) These local MDPs satisfy the following dynamic programming equation
(IV.8) 
(IV.9) 
subject to . Importantly, the th user’s dynamic programming equation defines the optimal scheduling action as a function of the th user’s state, rather than the global state . In this paper, the th user solves (IV.8) offline using value iteration; however, it can also be solved online using reinforcement learning as in [11] and [20]. Note that, due to the distributed nature of the proposed algorithm, the stage resource constraint is not guaranteed to be satisfied during convergence or at steady state. Because the stage resource constraint may be violated, it must be enforced separately by the AP, which we assume normalizes the requested resource allocations and, subsequently, has the users recompute their scheduling policies to satisfy the new allocations.
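As a rough illustration of the per-user value iteration described above, the following Python sketch solves a toy local MDP for a fixed resource price. Every modeling detail in it is an assumption made for illustration, not the model of this paper: the state is a (buffer occupancy, transmission rate) pair, the rate is drawn i.i.d. from a given pmf, each transmitted packet earns a fixed `gain`, and sending `a` packets at a rate of `r` packets per slot consumes the fraction `a / r` of the slot, which is charged at `price`.

```python
def solve_local_mdp(price, rate_pmf, buffer_cap=4, arrivals=1,
                    gain=1.0, gamma=0.9, tol=1e-9, max_iter=10000):
    """Value iteration for one user's toy MDP at a fixed resource price.

    rate_pmf maps an integer rate (packets per full slot) to its probability.
    All model details here are hypothetical simplifications.
    """
    states = [(b, r) for b in range(buffer_cap + 1) for r in rate_pmf]
    value = {s: 0.0 for s in states}

    def feasible(b, r):
        # Cannot send more than the backlog or more than one full slot allows.
        return range(min(b, r) + 1)

    def q_value(b, r, a, v):
        next_b = min(buffer_cap, b - a + arrivals)
        future = sum(p * v[(next_b, r2)] for r2, p in rate_pmf.items())
        # Immediate utility minus priced resource use, plus discounted future.
        return gain * a - price * a / r + gamma * future

    for _ in range(max_iter):
        new_value = {(b, r): max(q_value(b, r, a, value) for a in feasible(b, r))
                     for (b, r) in states}
        delta = max(abs(new_value[s] - value[s]) for s in states)
        value = new_value
        if delta < tol:
            break

    # Greedy policy with respect to the converged value function.
    policy = {(b, r): max(feasible(b, r), key=lambda a: q_value(b, r, a, value))
              for (b, r) in states}
    return value, policy


value, policy = solve_local_mdp(price=0.0, rate_pmf={1: 0.5, 2: 0.5})
```

In this toy model a higher price makes each packet more expensive to send, so the resulting policy transmits less; this is the lever that the AP's price update relies on.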
Although the optimization can be decomposed across the users, the optimal resource price still depends on all of the users’ resource demands. Hence, must be determined by the AP in both the uplink and downlink scenarios. Specifically, the resource price can be numerically computed by the AP using the subgradient method. The subgradient with respect to is given by , where is the th user’s expected discounted accumulated resource consumption, which can be calculated as described in [11]. Importantly, can be computed locally by the th user in the uplink scenario and by the AP in the downlink scenario. Using the subgradient method, the resource price is updated as
(IV.10) 
where is a diminishing step size. Since the focus of this paper is on the interaction between the multiuser video transmission and the cooperative PHY layer, we refer the interested reader to [11] for complete details on the dual decomposition outlined in this subsection, and the derivation of the subgradient with respect to .
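The price iteration outlined in this subsection can be sketched in Python as a projected subgradient update with a diminishing step size. This is a minimal sketch under stated assumptions: the closed-form demand curves, the `budget` of 1, and the step rule `step_scale / (k + 1)` are hypothetical stand-ins, whereas in the framework of [11] each user's demand is its expected discounted accumulated resource consumption under its current policy.

```python
def update_price(price, demands, budget, step):
    """One projected subgradient step: raise the price when total demand
    exceeds the budget, lower it otherwise, and never let it go negative."""
    subgradient = sum(demands) - budget
    return max(0.0, price + step * subgradient)


def find_price(demand_fns, budget=1.0, step_scale=4.0, iterations=2000):
    """Iterate the price update with the diminishing step step_scale / (k + 1)."""
    price = 0.0
    for k in range(iterations):
        demands = [fn(price) for fn in demand_fns]
        price = update_price(price, demands, budget, step_scale / (k + 1))
    return price


# Hypothetical demand curves: each user's resource demand falls as the price rises.
weights = [0.8, 0.7, 0.5]
demand_fns = [lambda p, w=w: w / (1.0 + p) for w in weights]
price = find_price(demand_fns)
```

For these toy curves, total demand equals the budget at a price of 1, so the iterate should settle near that value.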
We note that a similar decomposition has recently been proposed for energy-efficient uplink scheduling with delay constraints in multiuser wireless networks using a different MU-MDP framework [20]. Besides the fact that [20] does not consider physical layer cooperation or heterogeneous traffic characteristics, there is one significant difference between the decomposition in [20] and the one adopted in this paper. Specifically, the TDMA-like protocol in [20] assumes that only one user can transmit in each time slot, whereas we consider a TDMA-like protocol in which each time slot is divided into different length transmission opportunities for each user. Moreover, in [20], every user has a unique Lagrange multiplier associated with its average buffer delay constraint. In contrast, in our decomposition, all users have the same Lagrange multiplier, which regulates the resource division among the users, rather than their individual delay constraints. Note that, in this paper, delay constraints are included in the application model. Importantly, Theorem 1 applies to the MU-MDP formulation in [20] and therefore the recruitment protocol proposed in Section V can be used to integrate cooperation into [20]. In other words, the novelty and technical contributions of this paper are independent of the dual decomposition in [11], which we only use for illustrative purposes.
V Recruitment protocol
With reference to the uplink scenario, we define an opportunistic cooperative strategy in which the set of cooperative relays is selected in a distributed manner and the cooperation decision is made at the AP. The downlink case is a minor variation.
Importantly, the AP can exactly evaluate in (III.6) because it can estimate and via training as mentioned in Section III. However, the difficulty in recruiting relays on the fly is that the AP and the relays cannot directly compute given by (III.4), since they cannot estimate the channel coefficients , for all . Some randomized MAC protocols have recently been proposed [23, 24], which get around the problem that the AP and the relays do not have the necessary channel state information to determine . However, such protocols require the exchange and/or the tracking of a large number of network parameters, which may incur unacceptable delays in a wireless video network. In particular, the first- and second-hop data rates are computed in [24] by the source node using the average PER evaluated by simulations. To quickly set up the cooperative transmission and, thus, reduce the delays, we propose a much simpler recruitment scheme that is based on the closed-form formulas (III.4) and (III.6). The proposed four-way protocol is reminiscent of the request-to-send (RTS) and clear-to-send (CTS) handshaking used in carrier sense multiple access with collision avoidance (CSMA/CA), which is extended to include a helper-ready-to-send (HTS) control message that is cooperatively transmitted by the relays using randomized STBC and a cooperative recruitment signal (CRS) that is sent by the AP to recruit relays. The idea of sending the HTS frame in cooperative mode was originally proposed in [24]. However, apart from the use of the HTS control message, the proposed protocol differs from that of [24] because we use a completely different recruitment policy.
All the control frames are transmitted at the base rate such that they can be decoded correctly, and the thresholds and , as well as and , are fixed parameters that are known at all the nodes. Fig. 3 illustrates the signaling protocol for time slot , which consists of the nine steps detailed in Table I. We would like to highlight that, similar to the data transmitted in Phase II, the HTS message is a cooperative signal, i.e., all relays jointly deliver the HTS frame using randomized STBC at the same time and, hence, simultaneous transmissions do not cause a collision. With reference to Table I, the key observation is that the selection of the set by virtue of (.4) is done in a distributed way and, moreover, by simply having access to the channel state from the source to itself, i.e., , the th candidate cooperative node can autonomously determine if, by cooperating, it can improve the data rate of node . Another important observation is that the recruitment of the cooperative nodes and the assignment of the data rates require only four control messages for each source. In particular, the control information exchange is independent of the number of recruited relays thanks to the randomization of the cooperative transmission. Moreover, the two parameters and need to be chosen appropriately. The best choice for and requires global network information. A learning framework would be well suited to their selection, but we defer the treatment of this aspect to future work. Finally, as for the impact of on the network performance, it should be noted that randomized channels tend to behave statistically like their non-randomized counterparts [17], with deep-fade events that become as frequent as those of independent channels, as long as the number of cooperative nodes .
VI Numerical Results
We consider a network with 50 potential relay nodes placed randomly and uniformly throughout the 100 m coverage range of a single AP as illustrated in Fig. 4. We specify the placement of the video source(s) separately for each experiment. Let denote the distance in meters between the th and th nodes. The fading coefficient over the link is modeled as an i.i.d. random variable, where is the pathloss exponent. Additionally, we assume that the entries of , defined in Section III, are i.i.d. random variables, where is the length of the STBC. If an error occurs in the packet transmission, then the packet remains in the frame buffer to be retransmitted in a future time slot (assuming the packet’s deadline has not passed).
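The simulated topology described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the fading law is taken to be zero-mean circularly symmetric complex Gaussian with variance equal to the distance raised to minus the path-loss exponent, and the exponent value 3.0 is a placeholder; neither is confirmed by the text here. The function names are hypothetical.

```python
import math
import random


def place_nodes(count, radius_m, rng):
    """Drop `count` nodes uniformly at random over a disc centred on the AP."""
    nodes = []
    for _ in range(count):
        # sqrt() makes the density uniform over area rather than over radius.
        r = radius_m * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        nodes.append((r * math.cos(theta), r * math.sin(theta)))
    return nodes


def distance_m(a, b):
    """Euclidean distance in meters between two (x, y) positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def draw_fading(dist_m, path_loss_exp, rng):
    """Assumed model: zero-mean complex Gaussian with variance dist**(-exp)."""
    if dist_m <= 0.0:
        raise ValueError("distance must be positive")
    sigma = math.sqrt(dist_m ** (-path_loss_exp) / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


rng = random.Random(0)
relays = place_nodes(50, 100.0, rng)   # 50 relays within 100 m of the AP
ap = (0.0, 0.0)
h = draw_fading(distance_m(relays[0], ap), 3.0, rng)  # exponent 3.0 is a placeholder
```

A fresh coefficient would be drawn for every link in every time slot to match the i.i.d. assumption.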
Due to space constraints, and because cooperation has the same impact in both uplink and downlink scenarios, we only present results for cooperative uplink video transmission. In particular, we consider four uplink scenarios:

Single source: In this scenario, we assume that a single source node is placed between 10 and 100 m directly to the right of the AP in Fig. 4. We use this scenario to evaluate the transmission rates in the direct and cooperative transmission modes at different distances from the AP, and to determine a good self-selection parameter .

Homogeneous video sources: This scenario mimics a surveillance application in which three cameras capture correlated video content in an outdoor environment and transmit it to the AP. The video sources are placed to the right of the AP as illustrated in Fig. 7. To simulate correlated content, we assume that each of the three cameras streams the Foreman sequence (CIF resolution, 30 Hz frame rate, encoded at 1.5 Mb/s) offset by several frames. Using homogeneous sources allows us to isolate the impact of cooperation on the video streaming performance by removing the additional layer of complexity introduced by heterogeneous video sources (e.g., different packet priorities and bit rates among the video users).

Heterogeneous video sources 1: This scenario mimics a network in which users deploy entertainment applications such as video sharing or video conferencing. To simulate this, we assume that the three video sources illustrated in Fig. 7 transmit heterogeneous video content to the AP. Specifically, we assume that video user 1 streams the Coastguard sequence (CIF, 30 Hz, 1.5 Mb/s), video user 2 streams the Mobile sequence (CIF, 30 Hz, 2.0 Mb/s), and video user 3 streams the Foreman sequence (CIF, 30 Hz, 1.5 Mb/s).

Heterogeneous video sources 2: This is the same as the previous scenario, but with video user 2 streaming the Foreman sequence and video user 3 streaming the Mobile sequence.
We note that the proposed framework can be applied using any video coder to compress the video data. However, for illustration, we use a scalable video coding scheme [26], which is attractive for wireless streaming applications because it provides on-the-fly application adaptation to channel conditions, support for a variety of wireless receivers with different resource and power constraints, and easy prioritization of video packets.
In our results, we deploy the proposed randomized STBC cooperation protocol outlined in Table I and determine the optimal resource allocation and scheduling decisions using the distributed optimization introduced in Section IV-B. The relevant simulation parameters are given in Table II. Note that, in the homogeneous and heterogeneous scenarios described above, we simulate a network with a “high” transmission rate, using the symbol rate , and a network with a “low” transmission rate, using the symbol rate symbols/second.
VI-A Transmission rates and energy consumption
In this subsection, we consider the single source scenario described above. Fig. 5 illustrates the performance of the proposed cooperation protocol for time-invariant self-selection parameter values , and the performance of direct transmission, given a single source transmitting to the AP. Note that these results hold regardless of the symbol rate. In particular, the “transmission rate” in Fig. 5(b) is presented in terms of the spectral efficiency (bits/second/Hz); the probability of cooperation in Fig. 5(a) and the average number of recruited relays in Fig. 5(c) only depend on the spectral efficiency; and the energy results reported in Figs. 5(d-f) are normalized by setting the symbol energy (or, equivalently, ) in (III.3), (III.11), and (III.12).
From Fig. 5(a), it is clear that nodes further from the AP utilize cooperation more frequently than nodes closer to the AP. This is because, on average, distant nodes have the feeblest direct signals to the AP due to path loss and, therefore, have the most to gain from the channel diversity afforded to them by cooperation. It is also clear from Fig. 5(a) that cooperation is utilized more frequently as the self-selection parameter increases. This is because, as illustrated in Fig. 5(c), more relays satisfy the self-selection condition in step 5 of Table I for larger values of . However, larger values of yield relay nodes for which is large, which leads to a poor transmission rate over the bottleneck hop-1 cooperative link. Due to this poor bottleneck rate and the large number of recruited relays, the average transmission rate shown in Fig. 5(b) declines for even while the total energy consumption increases as illustrated in Fig. 5(d). In contrast, lower values of the self-selection parameter (e.g., ) lead to too few nodes being recruited to achieve large cooperative gains, but yield lower energy consumption. Interestingly, the same properties of relay nodes that are desirable for achieving the best transmission rate (a balance between the number and quality of relays) are also important for achieving a high throughput-to-energy ratio. For example, Fig. 5(e) shows that at 100 m from the AP, the average throughput-to-energy ratio for cooperative transmission with is a little less than 0.8, which is close to the throughput-to-energy ratio of a direct transmission, which is 1 at 100 m.
Although the average network energy required to support a cooperative transmission is larger than that required for a direct transmission, this increase is moderate compared to the amount of energy the source node would have to expend in order to achieve the same transmission rate as the cooperative transmission, i.e., to attain requires a large increase in the transmission power with respect to the cooperative case. This is illustrated in Fig. 5(f), where, for example, it is shown that transmitting in the direct mode at the rate attainable under cooperative transmission with requires approximately 13.5 normalized Joules/Packet compared to approximately 3.5 normalized Joules/Packet in the cooperative case shown in Fig. 5(d). (Footnote 8: The results in Fig. 5(f) were obtained by fixing the transmission rate and adapting the symbol energy, which is in contrast to the current problem formulation in which we fix the symbol energy and adapt the transmission rate. Specifically, we calculated the symbol energy required to set by rearranging (III.2). Note that we could also force to achieve lower energy consumption at the same transmission rate as the direct mode.)
In the remainder of our experiments, we let the self-selection parameter because, as illustrated in Figs. 5(b,e), this value provides a large average transmission rate over the AP’s entire coverage range and a high throughput-to-energy ratio. With , Fig. 7 illustrates the activation frequencies for different relays and Fig. 6 illustrates the average energy consumed by the source and relay nodes. Notice that, under a cooperative transmission, the source node actually uses less power than under a direct transmission, which partially compensates for the extra energy it may expend acting as a relay for other nodes.
VI-B Transmission rate, resource price, and resource utilization
Fig. 8 illustrates the average transmission rates achieved by the video users in the homogeneous and heterogeneous scenarios in networks that support high and low transmission rates. Recall that the resource cost incurred by user is inversely proportional to the transmission rate [see (IV.5)], which decreases as the distance to the AP increases due to path loss. Hence, when only direct transmission is available, user 3 tends to resign itself to a low average transmission rate because the cost of using resources is too high. Cooperation increases the average transmission rate, thereby providing user 3 lower cost access to the channel to transmit more data.
In the homogeneous scenario illustrated in Fig. 8(a), cooperation tends to equalize the resource allocations to the three users (this is especially evident in the cooperative case with a high transmission rate). This is because the homogeneous users have identical utility functions; thus, when sufficient resources are available, it is optimal for them to all operate at the same point of their resource-utility curves. In contrast, when heterogeneous users with different utility functions are introduced, the transmission rates change to reflect the priorities of the different users’ video data. Observing Figs. 8(b,c), it is clear that the additional resources afforded by cooperation tend to go to the highest priority video user, who, in our simulations, is the user streaming the Mobile sequence.
Recall that users autonomously optimize their resource allocation and scheduling actions given the resource price announced by the AP. Table III illustrates the optimal resource prices in the homogeneous and heterogeneous scenarios along with the average network resource utilization, i.e. the average of . There are several interesting results in Table III. First, the average network resource utilization is often considerably less than the total available resources. This is due to the distributed nature of the resource allocation and scheduling algorithm, which requires users to be conservative in their resource usage to ensure feasible allocations. Second, in the cooperative transmission mode, the resource price tends to increase and the utilization tends to decrease when going from a high rate to a low rate network, regardless of the streaming scenario. The resource price increases because the network supports lower rates, but the demand stays the same, which increases congestion. The utilization decreases because lower rates yield a coarser set of feasible resource allocations for each user (see (IV.5)). Third, in the high rate network, the resource price tends to decrease and the utilization tends to increase when going from the direct to the cooperative transmission mode, regardless of the streaming scenario. The resource price decreases because cooperation floods the network with resources without significantly impacting demand, which reduces congestion. The utilization increases because the cooperative transmission mode supports higher transmission rates, which yield a finer set of feasible resource allocations for each user (see (IV.5)). Finally, in the low rate network, the resource price and utilization tend to increase when going from the direct to the cooperative transmission mode. 
In contrast to the high rate network, the resource price increases because users that resigned themselves to very low transmission rates in the direct scenario suddenly demand resources when cooperation is enabled. The resource price increases in our simulations because the enlarged demand pool exceeds the additional supply of resources that is introduced by cooperation. In other words, users that would like to transmit video, but are too far from the AP for a direct transmission, are essentially absent from the network when only direct transmission is available, and therefore do not significantly impact the resource price and resource utilization; however, when cooperation is enabled, these users are suddenly within range of the AP, and will therefore demand resources, which increases congestion. As in the other cases, the utilization increases because the transmission rate increases.
VI-C Discounted utility and video quality comparison
Table IV compares the expected value of the objective function in (IV.9) (with respect to the stationary distribution over the states) obtained in the homogeneous and heterogeneous scenarios. Because the objective function includes a Lagrangian cost term, it is not always indicative of the corresponding video quality. For this reason, we also include Table V to compare the video quality obtained in the homogeneous and heterogeneous scenarios, where video quality is measured in terms of the peak signal-to-noise ratio (PSNR in dB) of the luminance channel. In the network that supports a high transmission rate, the user furthest from the AP (user 3) benefits on the order of 5 to 10 dB PSNR from cooperation, while the video user closest to the AP (user 1) is penalized by less than 0.4 dB PSNR. In the network that only supports low transmission rates, user 3 goes from transmitting too little data to decode the video (denoted by “”) to transmitting enough data to decode at low quality, while penalizing user 1 by less than 0.8 dB PSNR. Note that these PSNR results implicitly reflect the end-to-end delay from the source, through the relays, to the destination. This is because the traffic model in subsection II-B accounts for the fact that frames that are not entirely received before their deadlines, and frames that depend on them, cannot be decoded and therefore do not contribute to the received video quality.
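For reference, the luminance PSNR used as the quality measure above follows the standard definition, 10 log10(peak^2 / MSE). The Python sketch below assumes 8-bit luminance samples (peak value 255) supplied as flat sequences; the function name and input layout are illustrative only and are not taken from the paper's simulation code.

```python
import math


def psnr_luma(reference, decoded, peak=255.0):
    """PSNR in dB between two equally sized sequences of luminance samples."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 2x2 "frame" flattened to a list; every sample is off by 10 gray levels.
quality_db = psnr_luma([100, 100, 100, 100], [110, 110, 110, 110])
```

A sequence-level figure is usually obtained by averaging the per-frame values, which is an additional convention not specified here.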
VII Conclusion
We introduced a cooperative multiple access strategy that enables nodes with high priority video data to be serviced while simultaneously exploiting the diversity of channel fading states in the network using a randomized STBC cooperation protocol. We formulated the dynamic multiuser video transmission problem with cooperation as an MU-MDP and we used Lagrangian relaxation with a uniform resource price to decompose the MU-MDP into local MDPs at each user. We analytically proved that opportunistic (myopic) cooperation strategies are optimal, and therefore the users’ local MDPs only need to determine their optimal resource allocation and scheduling policies based on their experienced cooperative transmission rates. Subsequently, we proposed a randomized STBC cooperation protocol that enables nodes to opportunistically and distributively self-select as cooperative relays. Finally, we experimentally showed that the proposed cooperation strategy significantly improves the video quality of nodes with feeble direct links to the AP, without significantly penalizing other users, and with only moderate increases in total network energy consumption.
Appendix I: proof of Theorem 1
The transmission rate is a function of the cooperation decision and the channel state , i.e., we can write . Thus, the cooperation decision impacts the immediate utility because it constrains the set of feasible scheduling actions through the packet constraint .
Let and denote the optimal opportunistic cooperation decision and the maximum transmission rate, respectively. Selecting the cooperation decision that maximizes the immediate transmission rate enlarges the set of feasible scheduling actions, i.e., , for all . We now show that the optimal opportunistic cooperation decision enables a user to maximize its long-term utility for any . Let denote the utility less the cost, where is given by (IV.5). Under the optimal opportunistic cooperation decision, we have
(VII.1)  
(VII.2) 
where the inequality is due to the fact that for all . Thus, the optimal opportunistic cooperative decision maximizes the long-term utility.
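The set-inclusion step at the heart of this proof can be illustrated with a small Python sketch. It is only a toy instance under stated assumptions: the packet constraint is modeled as "the number of scheduled packets must fit in one slot at the chosen rate", and the rates, slot length, and packet size are made-up values.

```python
def feasible_packet_counts(rate_bps, slot_ms, packet_bits, backlog):
    """Packet counts that fit in one slot at the given rate (toy packet constraint)."""
    bits_per_slot = rate_bps * slot_ms // 1000
    max_packets = bits_per_slot // packet_bits
    return set(range(min(backlog, max_packets) + 1))


def opportunistic_decision(direct_rate_bps, coop_rate_bps):
    """Pick whichever mode offers the larger immediate transmission rate."""
    if coop_rate_bps > direct_rate_bps:
        return "cooperative", coop_rate_bps
    return "direct", direct_rate_bps


# Hypothetical rates: 1 Mb/s direct, 2.5 Mb/s cooperative; 10 ms slots, 8000-bit packets.
mode, best_rate = opportunistic_decision(1_000_000, 2_500_000)
with_direct = feasible_packet_counts(1_000_000, 10, 8000, backlog=5)
with_best = feasible_packet_counts(best_rate, 10, 8000, backlog=5)
```

Because every action that is feasible at the lower rate remains feasible at the higher rate, maximizing over the larger set can never yield a smaller value, which is the inequality used above.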
References
 [1]
 [2] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, “Layering as optimization decomposition: A mathematical theory of network architectures,” Proceedings of the IEEE, vol. 95, no. 1, pp. 255–312, Jan. 2007.
 [3] R. Knopp and P. A. Humblet, “Information capacity and power control in single-cell multiuser communications,” Proc. IEEE ICC, vol. 1, pp. 331–335, June 1995.
 [4] P. Viswanath, D. N. C. Tse, and R. Laroia, “Opportunistic beamforming using dumb antennas,” IEEE Trans. on Information Theory, vol. 48, no. 6, pp. 1277–1294, June 2002.
 [5] D. N. C. Tse and P. Viswanath, Fundamentals of wireless communication. Cambridge, U.K.: Cambridge Univ. Press, 2005.
 [6] T. C.-Y. Ng and W. Yu, “Joint optimization of relay strategies and resource allocations in cooperative cellular networks,” IEEE Trans. on Selected Areas in Communications, vol. 25, no. 2, pp. 328–339, Feb. 2007.
 [7] X. Zhang and Q. Du, “Cross-Layer Modeling for QoS-Driven Multimedia Multicast/Broadcast Over Fading Channels in Mobile Wireless Networks,” IEEE Communications Magazine, pp. 62–70, August 2007.
 [8] J. Huang, Z. Li, M. Chiang, and A.K. Katsaggelos, “Joint Source Adaptation and Resource Allocation for Multi-User Wireless Video Streaming,” IEEE Trans. Circuits and Systems for Video Technology, vol. 18, issue 5, pp. 582–595, May 2008.
 [9] E. Maani, P. Pahalawatta, R. Berry, T.N. Pappas, and A.K. Katsaggelos, “Resource Allocation for Downlink Multiuser Video Transmission over Wireless Lossy Networks,” IEEE Transactions on Image Processing, vol. 17, issue 9, pp. 1663–1671, September 2008.
 [10] G.-M. Su, Z. Han, M. Wu, and K.J.R. Liu, “Joint Uplink and Downlink Optimization for Real-Time Multiuser Video Streaming Over WLANs,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 2, pp. 280–294, August 2007.
 [11] F. Fu and M. van der Schaar, “A Systematic Framework for Dynamically Optimizing Multi-User Video Transmission,” IEEE J. Sel. Areas Commun., vol. 28, pp. 308–320, Apr. 2010.
 [12] O. Alay, P. Liu, Z. Guo, L. Wang, Y. Wang, E. Erkip, and S. Panwar, “Cooperative layered video multicast using randomized distributed space time codes,” in Proc. of IEEE INFOCOM MOVID Workshop, pp. 1–6, Apr. 2009.
 [13] J.N. Laneman and G.W. Wornell, “Distributed space-time block coded protocols for exploiting cooperative diversity in wireless networks,” IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2415–2425, Oct. 2003.
 [14] A. Sendonaris, E. Erkip, and B. Aazhang, “User cooperation diversity – Part I and II,” IEEE Trans. Commun., vol. 51, no. 11, pp. 1927–1948, Nov. 2003.
 [15] J.N. Laneman, D. Tse, and G.W. Wornell, “Cooperative diversity in wireless networks: efficient protocols and outage behavior,” IEEE Trans. Inf. Theory, vol. 50, no. 12, pp. 3062–3080, Sept. 2004.
 [16] N. Mastronarde, M. van der Schaar, A. Scaglione, F. Verde, and D. Darsena, “Sailing good radio waves and transmitting important bits: a case for cooperation at the physical layer in wireless video transmission,” in Proc. IEEE International Conf. Acoustics, Speech and Signal Proc., Dallas, Texas, USA, pp. 5566–5569, Mar. 2010.
 [17] B. Sirkeci-Mergen and A. Scaglione, “Randomized space-time coding for distributed cooperative communication,” IEEE Trans. Signal Process., vol. 55, pp. 5003–5017, Oct. 2007.
 [18] P. Chou and Z. Miao, “Rate-distortion optimized streaming of packetized media,” IEEE Trans. Multimedia, vol. 8, no. 2, pp. 390–404, Apr. 2006.
 [19] D. P. Bertsekas, Dynamic programming and optimal control, 3rd ed., Athena Scientific, Massachusetts, 2005.
 [20] N. Salodkar, A. Karandikar, and V. S. Borkar, “A stable online algorithm for energy-efficient multiuser scheduling,” IEEE Trans. on Mobile Computing, vol. 9, no. 10, pp. 1391–1406, Oct. 2010.
 [21] J.G. Proakis, Digital Communications. New York: McGraw-Hill, 2001.
 [22] V. Tarokh, H. Jafarkhani, and A. Calderbank, “Space-time block codes from orthogonal designs,” IEEE Trans. Inf. Theory, vol. 45, no. 5, pp. 1456–1467, July 1999.
 [23] F. Verde, T. Korakis, E. Erkip, and A. Scaglione, “A simple recruitment scheme of multiple nodes for cooperative MAC,” IEEE Trans. on Communications, vol. 58, no. 9, pp. 2667–2682, Sept. 2010.
 [24] P. Liu, C. Nie, T. Korakis, E. Erkip, S. Panwar, F. Verde, and A. Scaglione, “STiCMAC: A MAC Protocol for Robust Space-Time Coding in Cooperative Wireless LANs.” Available online: http://arxiv.org/abs/1105.3977.
 [25] H. Wang and N. Mandayam, “A Simple Packet Transmission Scheme for Wireless Data over Fading Channels,” IEEE Trans. on Communications, vol. 52, no. 7, pp. 1055–1059, July 2004.
 [26] J.R. Ohm, “Three-dimensional subband coding with motion compensation,” IEEE Trans. Image Processing, vol. 3, no. 5, pp. 559–571, Sept. 1994.
 [27] , Finite Markov Decision Processes. New York: Wiley, 1994.