Stochastic Throughput Optimization for Two-hop Systems with Finite Relay Buffers
Abstract
Optimal queueing control of multi-hop networks remains a challenging problem even in the simplest scenarios. In this paper, we consider a two-hop half-duplex relaying system with random channel connectivity. The relay is equipped with a finite buffer. We focus on stochastic link selection and transmission rate control to maximize the average system throughput subject to a half-duplex constraint. We formulate this stochastic optimization problem as an infinite horizon average cost Markov decision process (MDP), which is well-known to be a difficult problem. By using sample-path analysis and exploiting the specific problem structure, we first obtain an equivalent Bellman equation with reduced state and action spaces. By using the relative value iteration algorithm, we analyze the properties of the value function of the MDP. Then, we show that the optimal policy has a threshold-based structure by characterizing the supermodularity in the optimal control. Based on the threshold-based structure and Markov chain theory, we further simplify the original complex stochastic optimization problem to a static optimization problem over a small discrete feasible set, and propose a low-complexity algorithm to solve the simplified static optimization problem by making use of its special structure. Furthermore, we obtain the closed-form optimal threshold for the symmetric case. The analytical results obtained in this paper also provide design insights for two-hop relaying systems with multiple relays equipped with finite relay buffers.
I Introduction
The demand for communication services has been shifting from traditional voice telephony services to mixed voice, data, and multimedia services. When data and real-time services are considered, it is necessary to jointly consider both physical layer issues such as coding and modulation as well as higher layer issues such as network congestion and delay. It is also important to model these services using queueing concepts [1, 2]. On the other hand, to meet the explosive demand for these services, relaying has been shown to be effective for providing higher wireless data rates and better quality of service. Therefore, relaying has been included in the LTE-A [3] and WiMAX [4] standards, where both technologies support fixed and two-hop relays [5].
Consider a two-hop relaying system with one source node (S), one half-duplex relay node (R) and one destination node (D) under i.i.d. on-off fading. Under the conventional decode-and-forward (DF) relay protocol, a scheduling slot is divided into two transmission phases, i.e., the listening phase (S-R) and the retransmission phase (R-D). The S-R phase must be followed by the R-D phase [6]. Under the instantaneous flow balance constraint, the system throughput is the minimum of the throughput from S to R and from R to D. Therefore, under random link connectivity, to achieve a nonzero system throughput (from S to D) within a scheduling slot, both the S-R and R-D links should be connected [7].
Now consider a finite buffer at R and apply the cross-layer buffered decode-and-forward (BDF) protocol to exploit the random channel connectivity and queueing [7]. Under BDF, due to buffering at R, a scheduling slot can be adaptively allocated to the S-R transmission or the R-D transmission, according to the R queue length and link quality. Then, the throughput to D can be made nonzero provided that the R-D link is connected. While the buffer at R appears to offer obvious advantages, it is not clear how to design the optimal control to maximize the average system throughput given a finite relay buffer. Buffering a certain number of bits at R can capture R-D transmission opportunities (when only the R-D link is on) and improve the throughput in the future. However, buffering too many bits at R may waste S-R transmission opportunities (when only the S-R link is on) due to R buffer overflow. Therefore, it remains unclear how to take full advantage of the finite buffer at R to balance the transmission rates of the S-R and R-D links so as to maximize the average system throughput.
Recently, the idea of cross-layer design using queueing concepts has been considered in the context of multi-hop networks with buffers. In [7] and [8], the authors consider delay-optimal control for two-hop networks with infinite buffers at the source and relay. Specifically, in [8], the authors obtain a delay-optimal link selection policy for non-fading channels. Then, in [7], the authors extend the analysis to i.i.d. on/off fading channels and show that a threshold-based link selection policy is asymptotically delay-optimal as the scheduling slot duration tends to zero. However, it is not known whether the delay-optimal policy still has a threshold-based structure in general. In [9], the authors consider a two-hop relaying system with an infinite backlog at the source and an infinite buffer at the relay. Optimal link selection policies are obtained to maximize the average system throughput. In the aforementioned references, the relay is assumed to be equipped with an infinite buffer, and the proposed algorithms cannot guarantee that the instantaneous relay queue length stays below a certain threshold. However, in practical systems, buffers are finite. Optimal designs for systems with infinite buffers do not necessarily lead to good performance for systems with finite buffers. In addition, in several practical networks, such as wireless sensor networks, wireless body area networks and wireless networks-on-chip, the buffer size is limited. This is because using large buffers would introduce practical issues, such as larger on-chip board space, increased memory-access latency and higher power consumption [10, 11]. Therefore, it is very important to consider finite relay buffers in designing optimal resource controls for multi-hop networks to support data and real-time services [12, 13, 14, 15, 16].
The Lyapunov drift approach represents a systematic way to address queue stabilization problems for general multi-hop networks with infinite buffers [17, 18]. Specifically, the Lyapunov drift approach mainly relies on quadratic Lyapunov functions, and can be used to obtain stochastic control algorithms whose achieved utilities are arbitrarily close to optimal. The derived control algorithms usually do not require prior knowledge of system statistics and can be easily implemented online. However, the traditional Lyapunov drift approach cannot properly handle systems with finite buffers. References [13] and [14] extend the traditional Lyapunov drift approach in [17] and [18] to design stochastic control algorithms for multi-hop networks with infinite source buffers and finite relay buffers. In particular, [13] and [14] employ a new type of Lyapunov function that multiplies the queue backlogs of the infinite buffers with the quadratic term of the queue backlogs of the finite buffers. Specifically, in [13], the authors propose scheduling algorithms to stabilize source queues under a fixed routing design. In [14], the authors propose joint flow control, routing and scheduling algorithms to maximize the throughput. References [15] and [16] adopt approaches similar to those in [13] and [14], and design control algorithms to optimize network utilities for multi-hop networks with finite source and relay buffers. However, the gap between the utility of each algorithm proposed in [14, 15, 16] and the optimal utility is inversely proportional to the buffer size. In other words, for the finite buffer case, the performance gap is always positive. Therefore, in contrast to the algorithms for the infinite buffer case in [17] and [18], the algorithms for the finite buffer case in [14, 15, 16] cannot achieve utilities that are arbitrarily close to optimal.
On the other hand, dynamic programming represents a systematic approach to optimal queueing control problems [2, 19, 20]. Generally, there exist only numerical solutions, which do not typically offer many design insights and are usually impractical for implementation due to the curse of dimensionality [20]. For example, in [21, 22], the authors consider delay-aware control problems for two-hop relaying systems with multiple relay nodes and propose suboptimal distributed numerical algorithms using approximate Markov decision process (MDP) techniques and stochastic learning [20]. However, the obtained numerical algorithms may still be too complex for practical systems and do not offer many design insights. Several existing works focus on characterizing structural properties of optimal policies to obtain design insights for simple queueing networks. However, most existing analytical results are for a single queue with either a controlled arrival rate or a controlled departure rate [23, 24, 25, 26]. To the best of our knowledge, structural results for a single queue with both controlled arrival and departure rates are still unknown. Furthermore, if the single queue has a finite buffer, the analytical results are limited. For example, [26] characterizes structural properties of a single finite queue only for part of the queue state space. The challenge of structural analysis for finite-buffer systems stems from the reflection effect (when a finite buffer is almost full) [11].
In general, stochastic throughput maximization for multi-hop systems with fading channels and finite relay buffers remains an open problem, even for the case of a simple two-hop relaying system. In this paper, we shall tackle some of the technical challenges. We consider a two-hop relaying system with one source node, one half-duplex relay node and one destination node, as well as random link connectivity. S has an infinite backlog and R is equipped with a finite buffer. We consider stochastic link selection and transmission rate control to maximize the average system throughput subject to a half-duplex constraint. We formulate the stochastic average throughput optimization problem as an infinite horizon average cost MDP, which is well-known to be a difficult problem in general. By using sample-path analysis and exploiting the specific problem structure, we first obtain an equivalent Bellman equation with reduced state and action spaces. By the relative value iteration algorithm, we analyze properties of the value function of the MDP. Then, based on these properties and the concept of supermodularity, we show that the optimal policy has a threshold-based structure. By the structural properties of the optimal policy and Markov chain theory, we further simplify the original complex stochastic optimization problem to a static optimization problem over a small discrete feasible set. We propose a low-complexity algorithm to solve the static optimization problem by making use of its special structure. Furthermore, we obtain the closed-form optimal threshold for the symmetric case. Numerical results verify the theoretical analysis and demonstrate the performance gain of the derived optimal policy over existing solutions.
Notations: Boldface uppercase letters denote matrices and boldface lowercase letters denote vectors. denotes an identity matrix, the th column of which is denoted as . and denote the inverse and the transpose of matrix , respectively. denotes the norm of vector . The important notations used in this paper are summarized in Table I.
II System Model
As illustrated in Fig. 1, we consider a two-hop relaying system with one source node (S), one relay node (R) and one destination node (D). S cannot transmit packets to D directly due to its limited coverage and has to communicate with D with the help of R via the S-R link and the R-D link.^1 (^1 This two-hop relaying model can be used to model the Type 1 relay in LTE-Advanced and the non-transparent relay in WiMAX [5].) R is half-duplex and equipped with a finite buffer. We consider a discrete-time system, in which the time axis is partitioned into scheduling slots with unit slot duration. The slots are indexed by .
II-A Physical Layer Model
We model the channel fading of the S-R link and the R-D link with i.i.d. random link connectivity.^2 (^2 This channel fading model is widely used in the literature [27, 28].) Let denote the link connectivity state information (CSI) of the S-R link and the R-D link at slot , respectively, where 1 denotes connected and 0 not connected. Let denote the joint CSI at the th slot, where denotes the joint CSI state space.
Assumption 1 (Random Link Connectivity Model): and are both i.i.d. over time, where in each slot , the probabilities of being 1 for and are and , respectively, i.e., and . Furthermore, and are independent of each other.
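As a concrete illustration of Assumption 1, the following sketch draws i.i.d. on/off link states slot by slot. The names `p_sr` and `p_rd` are illustrative stand-ins for the connectivity probabilities of the S-R and R-D links, not the paper's symbols.

```python
import random

def draw_link_states(p_sr, p_rd, rng):
    """Draw one slot's joint CSI under Assumption 1: each link is
    connected (1) independently with its own probability, i.i.d.
    across slots."""
    c_sr = 1 if rng.random() < p_sr else 0
    c_rd = 1 if rng.random() < p_rd else 0
    return c_sr, c_rd

# Empirically, the fraction of "on" slots approaches the configured
# probabilities over a long horizon.
rng = random.Random(0)
samples = [draw_link_states(0.6, 0.4, rng) for _ in range(100000)]
```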
We assume fixed transmission powers at S and R, and consider packet transmission. The maximum transmission rates (i.e., the maximum numbers of packets transmitted within a slot) of the S-R link when and of the R-D link when are given by and , respectively. Note that, to avoid overflow of the finite R buffer, the actual transmission rate of S may be smaller than . In addition, the actual transmission rate of R may be smaller than , subject to the availability of packets in the R buffer. These points will be further illustrated in Section III-A.
II-B Queueing Model
We assume that S has an infinite backlog (i.e., it always has data to transmit) and consider a finite buffer of size (in number of packets) at R. Note that can be arbitrarily large. Assume . The finite buffer at R is used to hold the packet flow from S. We consider the buffered decode-and-forward (BDF) protocol [7] to exploit the potential benefit of buffering at R under random channel connectivity. Specifically, according to BDF, (i) S can transmit packets to R when the S-R link is connected, and R decodes and stores the packets from S in its buffer; (ii) R can transmit the packets in its buffer to D when the R-D link is connected. Using the buffer at R and BDF, we can dynamically select the S-R link or the R-D link to transmit and choose the corresponding transmission rate at each slot based on the channel fading and queue states, according to a link selection and transmission rate control policy defined in Section III-A.
Therefore, as illustrated in Fig. 1, the simple two-hop relaying system with on/off channel connectivity can be modeled as a single queue with controlled arrival and departure rates. Let denote the queue state information (QSI) (in number of packets) at the R buffer at the beginning of the th slot, where denotes the QSI state space. The queue dynamics under the control policy will be illustrated in Section III-B.
III Problem Formulation
III-A Control Policy
For notational convenience, we denote as the system state at the th slot, where denotes the system state space. Let and denote whether the S-R link or the R-D link is scheduled, respectively, in the th slot, where 1 denotes scheduled and 0 otherwise. Let and denote the transmission rates of S and R in the th slot, respectively. Given an observed system state , the link selection action and the transmission rate control action are determined according to a stationary policy defined below.
Definition 1 (Stationary Policy)
A stationary link selection and transmission rate control policy is a mapping from the system state to the link selection action and the transmission rate control action , where and satisfy the following constraints:
1) ;
2) (orthogonal link selection);
3) (at least one link is not connected);
4) (departure rate at S);
5) (departure rate at R).
Note that our focus for the link selection control is on the design of when . Moreover, the departure rates (actual transmission rates) of S and R, i.e., and , may be smaller than and , respectively, for the following reasons. When the finite R buffer does not have enough space, to avoid buffer overflow and the resulting packet loss, is smaller than . When the R buffer does not have enough packets to transmit, is smaller than . Thus, we have the constraints in 4) and 5).
III-B MDP Formulation
Given a stationary control policy defined in Definition 1, the queue dynamics at R are given by:
(1) 
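In words, the update in (1) adds the admitted arrivals from S and removes the departures to D, with the rates clipped by constraints 4) and 5) of Definition 1. A minimal sketch of this one-slot update, with illustrative names (`q` for the relay queue length, `buffer_size` for the buffer capacity) rather than the paper's symbols:

```python
def step_queue(q, schedule_sr, schedule_rd, rate_sr, rate_rd, buffer_size):
    """One-slot relay queue update under the BDF protocol.

    Constraint 4): S cannot send more than the free buffer space.
    Constraint 5): R cannot send more than the packets it holds.
    """
    assert not (schedule_sr and schedule_rd)  # orthogonal link selection
    arrivals = min(rate_sr, buffer_size - q) if schedule_sr else 0
    departures = min(rate_rd, q) if schedule_rd else 0
    return q + arrivals - departures
```

Because of the two clipping operations, the queue length can never exceed the buffer size or drop below zero.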
From Assumption 1 and the queue dynamics in (1), we can see that the random process induced under policy is a Markov chain with the following transition probability:
(2) 
In this paper, we restrict our attention to stationary unichain policies.^3 (^3 A unichain policy is a policy under which the induced Markov chain has a single recurrent class (and possibly some transient states) [20].) For a given stationary unichain policy , the average system throughput is given by:
(3) 
where is the per-stage reward (i.e., the departure rate at R at slot , indicating the number of packets delivered by the two-hop relaying system) and the expectation is taken w.r.t. the measure induced by policy .
We wish to find an optimal link selection and transmission rate control policy to maximize the average system throughput in (3).^4 (^4 By Little's law, maximizing the average system throughput in Problem 1 is equivalent to minimizing an upper bound on the average delay in the relay with a finite buffer.)
Problem 1 (Stochastic Throughput Optimization)
(4) 
where is a stationary unichain policy satisfying the constraints in Definition 1.
Note that, in Problem 1, we assume the existence of a stationary unichain policy achieving the maximum in (4). Later, in Theorem 1, we shall prove the existence of such a policy. Problem 1 is an infinite horizon average cost MDP, which is well-known to be a difficult problem [20]. While dynamic programming represents a systematic approach to MDPs, there generally exist only numerical solutions, which do not typically offer many design insights and are usually not practical due to the curse of dimensionality [20].
Fig. 2 illustrates how we shall address the above challenges to solve Problem 1 in the remainder of this paper. Specifically, in Sections IV and V, we shall analyze the properties of the optimal policy. Based on these properties, we shall simplify Problem 1 to a static optimization problem (Problem 2) and develop a low-complexity algorithm (Algorithm 3) to solve it. Finally, we shall obtain the corresponding static optimization problem (Problem 3) for the symmetric case and derive its closed-form optimal solution.
IV Structure of Optimal Policy
In this section, we first obtain an equivalent Bellman equation based on reduced state and action spaces. Then, we show that the optimal policy has a threshold-based structure.
IV-A Optimality Equation
By exploiting some special structures in our problem, we obtain the following equivalent Bellman equation with reduced state and action spaces. By solving this Bellman equation, we can obtain the optimal policy for Problem 1.
Theorem 1 (Equivalent Bellman Equation)
(i) The optimal transmission rate control policy is given by:
(5) 
(ii) There exists satisfying the following equivalent Bellman equation:
(6) 
where , , and . Here, is the optimal value of Problem 1 for any initial state, and is called the value function.
(iii) The optimal link selection policy is given by:
(7) 
where
(8) 
and .
Please see Appendix A.
Note that the four terms on the R.H.S. of (6) correspond to the per-stage reward plus the value function of the updated queue state for and , respectively, under the optimal transmission rate control policy in (5), the link selection policy in Definition 1 for and , and the optimal link selection policy for . Therefore, the R.H.S. of (6) gives the expectation of the per-stage reward plus the value function of the updated queue state under the optimal policy, where the expectation is taken over the channel state .
Remark 1 (Reduction of State and Action Spaces)
Note that the closed-form optimal transmission rate control policy has already been obtained in (5). The optimal link selection policy is determined by the policy in (8). Thus, we only need to consider the optimal link selection for . In the following, we also refer to as the optimal link selection policy. To obtain the optimal policy, it remains to characterize . From Theorem 1, we can see that depends on the QSI state through the value function . Obtaining involves solving the equivalent Bellman equation in (6) for all . There is no closed-form solution in general [20]. Brute-force solutions such as value iteration and policy iteration are usually impractical for implementation and do not yield many design insights [20]. Therefore, it is desirable to study the structure of .
IV-B Threshold Structure of Optimal Link Selection Policy
To further simplify the problem and obtain design insights, we study the structure of the optimal link selection policy. In the existing literature, structural properties of optimal policies are characterized for simple networks by studying properties of the value function. For example, most existing works consider the structural analysis of a single queue with either a controlled arrival rate or a controlled departure rate [23, 24, 25, 26]. However, we control both the arrival and departure rates of the relay queue. Moreover, we consider a finite buffer, which has a reflection effect when the buffer is almost full [11], and general system parameters, i.e., and . Therefore, it is more challenging to explore the properties of the value function in our system.
First, by the relative value iteration algorithm (RVIA)^5 (^5 RVIA is a commonly used numerical method for iteratively computing the value function, which is the solution to the Bellman equation of an infinite horizon average cost MDP [20, Chapter 4.3]. The details of RVIA can be found in Appendix B.), we can inductively prove the following properties of the value function.
Lemma 1 (Properties of Value Function)
The value function satisfies the following properties:
1) is monotonically non-decreasing in ;
2) , ;
3) , .
Please see Appendix B.
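For readers unfamiliar with RVIA, the following is a generic, self-contained sketch of relative value iteration for a finite-state average-reward MDP. It illustrates the type of recursion used in Appendix B, but it is not the paper's exact update, whose per-stage reward and transition law are specified by (6).

```python
def relative_value_iteration(P, r, tol=1e-9, max_iter=100000):
    """Relative value iteration for an average-reward MDP.

    P[a] is the transition matrix and r[a] the reward vector of
    action a.  State 0 serves as the reference state: its value is
    subtracted each iteration to keep the recursion bounded.
    Returns (average-reward estimate, relative value function)."""
    n = len(r[0])
    V = [0.0] * n
    g = 0.0
    for _ in range(max_iter):
        V_new = []
        for s in range(n):
            V_new.append(max(
                r[a][s] + sum(P[a][s][t] * V[t] for t in range(n))
                for a in range(len(r))))
        g = V_new[0]                        # average-reward estimate
        V_new = [v - g for v in V_new]      # normalize by reference state
        if max(abs(V_new[s] - V[s]) for s in range(n)) < tol:
            return g, V_new
        V = V_new
    return g, V
```

On a toy single-action chain with stationary distribution (1/2, 1/2) and rewards (1, 0), the recursion returns an average reward of 1/2, as expected.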
Remark 2 (Interpretation of Lemma 1)
Property 1 generally holds for single-queue systems and is widely studied in the existing literature. Property 2 results from the throughput maximization problem considered in this work. This property does not hold for the sum queue length minimization problems considered in most of the existing literature. Property 3 indicates that is concave^6 (^6 A function : is concave (where ) if .) with . This stems from the relay queue having both controlled arrival and departure rates. In contrast, most existing works consider a single queue with either a controlled arrival rate or a controlled departure rate, and the corresponding value function is 1-concave.
Next, we define the state-action reward function as follows [24]:
(9) 
Note that is related to the R.H.S. of the Bellman equation in (6). The R.H.S. of (9) gives the expectation of the per-stage reward plus the value function of the updated queue state under the optimal transmission rate control policy in (5), the link selection policy in Definition 1 for and , and any link selection policy satisfying and for .
By Lemma 1 and (9), we can show that the state-action reward function is supermodular^7 (^7 A function : is supermodular in if [29].) in , i.e.,
(10) 
By [29, Lemma 4.7.1], supermodularity is a sufficient condition for a monotone policy to be optimal. Thus, we have the following theorem.
Theorem 2 (Threshold Structure of Optimal Policy)
There exists such that the optimal link selection policy for has a threshold-based structure, i.e.,
(11) 
where is the optimal threshold.
Please see Appendix C.
Remark 3 (Interpretation of Theorem 2)
By Theorem 2, we know that when , it is optimal to schedule the R-D link if and to schedule the S-R link otherwise. The intuition is as follows. When the relay queue length is large (), S-R transmission opportunities may be wasted when due to overflow of the finite R buffer. Therefore, when , we should reduce the relay queue length at . When the relay queue length is small (), R-D transmission opportunities may be wasted when , as there may not be enough packets left to transmit. Therefore, when , we should schedule the S-R link at . These design insights also hold for two-hop relaying systems with multiple relays equipped with finite relay buffers.
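The threshold structure of Theorem 2 is easy to exercise numerically. The sketch below Monte-Carlo-estimates the throughput of a threshold-based link selection policy under the on/off model of Assumption 1. The parameter names and the tie-breaking convention when both links are connected are illustrative assumptions, not taken from the paper.

```python
import random

def simulate_threshold_policy(threshold, p_sr, p_rd, rate_sr, rate_rd,
                              buffer_size, slots=200000, seed=0):
    """Estimate the average throughput of a threshold policy:
    when both links are on, serve R-D if the relay queue is at or
    above the threshold, otherwise serve S-R; when only one link
    is on, serve that link.  Rates are clipped as in Definition 1."""
    rng = random.Random(seed)
    q, delivered = 0, 0
    for _ in range(slots):
        c_sr = rng.random() < p_sr
        c_rd = rng.random() < p_rd
        serve_rd = (q >= threshold) if (c_sr and c_rd) else c_rd
        if serve_rd and c_rd:
            d = min(rate_rd, q)   # cannot send more than the queue holds
            q -= d
            delivered += d
        elif c_sr:
            q += min(rate_sr, buffer_size - q)  # no buffer overflow
    return delivered / slots
```

For example, with symmetric connectivity probability 0.5, unit rates, buffer size 4 and threshold 2, the estimate lands near 0.35 packets per slot, well below the upper bound of 0.5 set by the R-D on-probability.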
V Optimal Solution for General Case
In this section, we first obtain a simplified static optimization problem for Problem 1 by making use of the structural properties of the optimal policy in Theorems 1 and 2. Then, based on its special structure, we develop a low-complexity algorithm to solve the static optimization problem.
V-A Recurrent Class
By the structural properties of the optimal policy in Theorems 1 and 2, we can restrict our attention to the optimal transmission rate control in (5) and a threshold-based link selection policy for , i.e.,
(12) 
where is the threshold. In the following, we use to denote the relay queue state process under the policies in (5) and (12). is a stationary discrete-time Markov chain (DTMC) [30], whose transition probabilities are determined by the threshold and the statistics of the CSI (i.e., and ). Fig. 3 and Fig. 4 illustrate the transition from any state and the transition diagram for , respectively, under the optimal transmission rate control in (5) and the threshold-based link selection control in (12).
Next, we study the steady-state probabilities of . Note that the steady-state probability of each transient state is zero [30], and hence the throughput of the transient states does not contribute to the ergodic throughput. In other words, the ergodic system throughput is equal to the average throughput over the recurrent class of . Thus, to calculate the ergodic throughput, we first characterize the recurrent class of . Let , where and are two positive integers with no common factors. Denote
(13) 
Since , there exist and such that
(14) 
Using Bézout's identity, we characterize the recurrent class of in the following lemma.
Lemma 2 (Recurrent Class)
Please see Appendix D.
Note that and is the same for any .
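The Bézout coefficients in (14) exist because the two integers in question are coprime, and they can be computed with the extended Euclidean algorithm, e.g.:

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y
```

For coprime a and b, this yields a*x + b*y = 1, which is the reachability argument underlying the recurrent class characterization: integer combinations of the two step sizes can reach every residue.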
V-B Equivalent Problem
We first consider a threshold-based policy in (12) with the threshold chosen from instead of . Denote this threshold as . We wish to find the optimal threshold to maximize the ergodic system throughput (i.e., the ergodic reward of ). Later, in Lemma 3, we shall show the relationship between and .
As illustrated in Section V-A, we focus on the computation of the average throughput over the recurrent class . Given , we can express the transition probability from to as , where . Let and denote the transition probability matrix and the steady-state probability row vector of the recurrent class , respectively. Note that is fully determined by and the statistics of the CSI (i.e., and ), and can be easily obtained, as illustrated in Fig. 3. By the Perron-Frobenius theorem [30], can be computed from the following system of linear equations:
(15) 
Let denote the average departure rate at state under the threshold . According to the threshold-based link selection policy in (12), we know that: (i) if the queue state , the R-D link is selected when or ; (ii) if the queue state , the R-D link is selected only when . Thus, we have:
(16) 
Let denote the average departure rate column vector of the recurrent class . Therefore, the ergodic system throughput can be expressed as .
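Numerically, solving a system like (15) amounts to replacing one balance equation of the singular system with the normalization constraint and solving the result. A generic sketch using plain Gaussian elimination (the paper's specific transition matrix is not reproduced here, so any irreducible chain can be passed in):

```python
def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for an irreducible chain.

    Builds (P^T - I), replaces the last balance equation with the
    normalization constraint, then runs Gaussian elimination with
    partial pivoting."""
    n = len(P)
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    A[-1] = [1.0] * n                 # normalization row
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):              # forward elimination
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    pi = [0.0] * n                    # back substitution
    for r in range(n - 1, -1, -1):
        pi[r] = (b[r] - sum(A[r][c] * pi[c]
                            for c in range(r + 1, n))) / A[r][r]
    return pi
```

On a two-state chain with transition matrix [[0.9, 0.1], [0.5, 0.5]], this returns (5/6, 1/6), matching the detailed balance check 0.1 * 5/6 = 0.5 * 1/6.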
Now, we formulate a static optimization problem to maximize the ergodic system throughput as below.
Problem 2 (Equivalent Optimization Problem)
(17) 
Note that is the optimal solution to Problem 2, and is the optimal threshold for Problem 1. The following lemma summarizes the relationship between and .
Lemma 3 (Relationship between Problem 1 and Problem 2)
The optimal values to Problems 1 and 2 are the same, i.e., . If , then . If , then any threshold is optimal to Problem 1, where .
Please see Appendix E.
V-C Algorithm for Problem 2
Problem 2 is a discrete optimization problem over the feasible set . It can be solved in a brute-force way by computing for each separately. The brute-force method has high complexity and fails to exploit the structure of the problem. In this part, we develop a low-complexity algorithm to solve Problem 2 by computing for all iteratively, based on the special structure of .
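For reference, the brute-force baseline described above can be sketched as follows: for each candidate threshold, build the chain restricted to its recurrent class, compute the steady state, and take the inner product with the average departure-rate vector. The chain builder `build_chain` is a caller-supplied stand-in for (15)-(16), since the explicit matrices depend on the CSI statistics and are not reproduced here.

```python
def ergodic_throughput(P, rbar, iters=5000):
    """Steady-state throughput pi . rbar of a finite irreducible
    chain.  The lazy update pi <- (pi + pi P) / 2 has the same
    stationary distribution as P but converges even when the
    chain is periodic."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        pi = [(p + s) / 2 for p, s in zip(pi, step)]
    return sum(p * r for p, r in zip(pi, rbar))

def best_threshold(candidates, build_chain):
    """Brute-force scan of a discrete feasible set of thresholds.

    build_chain(threshold) must return (P, rbar) for that
    threshold, i.e., the transition matrix and average
    departure-rate vector of its recurrent class."""
    return max(candidates, key=lambda t: ergodic_throughput(*build_chain(t)))
```

The iterative algorithm developed next avoids recomputing the steady state from scratch for each threshold, which is exactly the waste this baseline exhibits.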
We sort the elements of in ascending order, i.e., , where denotes the th smallest element in . For notational simplicity, we use and to represent and , respectively, where . In other words, each variable in is indexed by . Denote
(18) 
Note that the size of is . The system of linear equations in (15) can be transformed to the following system of linear equations:
(19) 
The steady-state probability vector in (19) can be obtained using the partition factorization method [31] as follows. By removing the th column and the th row of , we obtain a submatrix of , denoted as . Note that the size of is . Accordingly, let denote the permutation matrix such that
(20) 
In addition, let denote the solution to the following subsystem:
(21) 
Then, based on , we can compute by the partition factorization method[31] in Algorithm 1.
Remark 4 (Computational Complexity of Gaussian elimination)
The computation of each using Gaussian elimination in step 3 of Algorithm 1 requires flops.^8 (^8 The computational complexity is measured in the number of floating-point operations (flops), where a flop is defined as one addition, subtraction, multiplication or division of two floating-point numbers [32].) Thus, the computation of using Gaussian elimination requires flops, i.e., is of complexity .
On the other hand, for each , can also be obtained by multiplying both sides of (21) with .^9 (^9 exists because is a nonsingular matrix [31].) This involves matrix inversion. To reduce the complexity, instead of computing for each separately, we shall compute iteratively (i.e., compute based on ) by exploiting the relationship between and . Specifically, for two adjacent thresholds and , the corresponding transition probability matrices and differ only in the th row, as illustrated in Fig. 5. The following lemma summarizes the relationship between and , which results directly from the special structure of .
Lemma 4 (Relationship between and )
Let denote the permutation matrix obtained by exchanging the th and th columns of and let and denote the column of and the column of , respectively. Then, and satisfy:
(22) 
where
(23)  
(24) 
Please see Appendix F.
Remark 5 (Computational Complexity of Algorithm 2)
By comparing Remarks 4 and 5, we can see that the complexity of computing using Algorithm 2 is lower than that of using Gaussian elimination in step 3 of Algorithm 1. This is because Gaussian elimination in step 3 of Algorithm 1 cannot make use of the special structure of , and hence has higher computational complexity.
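The iterative update rests on a standard piece of linear algebra: if two matrices differ only in a single row, they differ by a rank-one term, so the inverse can be refreshed in O(n^2) via the Sherman-Morrison formula instead of recomputed from scratch in O(n^3). The sketch below shows only this generic rank-one machinery; the paper's Lemma 4 is a problem-specific variant whose exact expressions are given in (22)-(24).

```python
def sherman_morrison_update(A_inv, u, v):
    """Given A^{-1}, return (A + u v^T)^{-1} via Sherman-Morrison.

    When B and A differ only in row k, write B = A + e_k d^T with
    d the row difference, i.e., a rank-one perturbation; the
    update below then refreshes the inverse in O(n^2)."""
    n = len(A_inv)
    Au = [sum(A_inv[i][j] * u[j] for j in range(n)) for i in range(n)]
    vA = [sum(v[i] * A_inv[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + sum(v[i] * Au[i] for i in range(n))
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)]
            for i in range(n)]
```

For example, updating the inverse of diag(2, 4) by u = e_0, v = (1, 1) reproduces the inverse of [[3, 1], [0, 4]] exactly.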
VI Optimal Solution for Special Case
In this section, we first obtain the corresponding static optimization problem for the symmetric case ( and ). Then, we derive its closed-form optimal solution.
By Lemma 2, the recurrent class of is given by . Fig. 6 illustrates the corresponding transition diagram. By applying the Perron-Frobenius theorem and the detailed balance equations [30], we obtain the steady-state probabilities:
(25a)  
(25b) 
where . Then, in the symmetric case, Problem 2 is equivalent to the following optimization problem.
Problem 3 (Optimization for Symmetric Case)
(26) 
By a change of variables, we can equivalently transform the discrete optimization problem in Problem 3 into a continuous optimization problem and obtain the optimal threshold for Problem 1, which is summarized in the following lemma.
Lemma 5 (Optimal Threshold for Symmetric Case)
Please see Appendix G.
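In the symmetric case the chain restricted to its recurrent class behaves like a birth-death chain, so the steady state in (25a)-(25b) follows from detailed balance: each probability is the previous one scaled by the ratio of up-rate to down-rate. A generic sketch of that computation (the up/down rates are placeholders, not the paper's specific values):

```python
def birth_death_stationary(birth, death):
    """Stationary distribution of a birth-death chain via detailed
    balance: pi[k+1] = pi[k] * birth[k] / death[k].

    birth[k] is the up-rate out of state k (k = 0..n-2) and
    death[k] is the down-rate out of state k+1."""
    pi = [1.0]
    for b, d in zip(birth, death):
        pi.append(pi[-1] * b / d)
    total = sum(pi)
    return [p / total for p in pi]
```

When all up-rates equal all down-rates, as happens under symmetric connectivity, the distribution is uniform over the recurrent states, which is consistent with the closed-form structure exploited in Lemma 5.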
VII Numerical Results and Discussion
In this section, we verify the analytical results and evaluate the performance of the proposed optimal solution via numerical examples. In the simulations, we choose .
VII-A Threshold Structure of Optimal Policy
Fig. 7 illustrates the value function versus . is computed numerically using RVIA [20]. It can be seen that is increasing with and , which verifies Properties 1) and 2) in Lemma 1, respectively. The third property in Lemma 1 can also be verified by checking the simulation points. Fig. 7 also illustrates the function versus . Note that is a function of , which is computed numerically using RVIA (a standard numerical MDP technique). According to the Bellman equation in Theorem 1, indicates that it is optimal to schedule the R-D link for state ; indicates that it is optimal to schedule the S-R link for state . Hence, from Fig. 7, we know that the optimal policy (obtained using RVIA) has a threshold-based structure and are the optimal thresholds for the three cases. We have also calculated the optimal thresholds for the three cases using Algorithm 3 for the general case ( and ) and Lemma 5 for the symmetric case (). The obtained thresholds are equal to the optimal values obtained by the numerical MDP technique.
VII-B Throughput Performance
We compare the throughput performance of the proposed optimal policy (given in Theorems 1 and 2) with five baseline schemes: DOPN, ADOP, TOP, OLSP and NOP.^10 (^10 Detailed descriptions of DOPN, ADOP, TOP and OLSP are given in Section I.) In particular, DOPN refers to the Delay-Optimal Policy for Non-fading channels in [8], and ADOP refers to the Asymptotically Delay-Optimal Policy for on/off fading channels in [7], both of which are designed for two-hop networks with infinite buffers at the source and relay. TOP refers to the Throughput-Optimal Policy for a multi-hop network with infinite source buffers and finite relay buffers in [14]. OLSP refers to the Optimal Link Selection Policy for a two-hop system with an infinite relay buffer in [9, Theorem 2]. NOP refers to the Near-Optimal Policy obtained based on approximate value iteration using aggregation [20, Chapter 6.3], which is similar to the approximate MDP technique used in [21] and [22]. Note that OLSP depends on the CSI only, while the other four baseline schemes depend on both the CSI and the QSI. In addition, the threshold in DOPN () is fixed; the threshold in ADOP () depends on ; the threshold in TOP () depends on ; NOP adapts to and .
Fig. 8 and Fig. 8 illustrate the average system throughput versus the maximum transmission rate and the relay buffer size, respectively, in the asymmetric case (). Since DOPN, ADOP, TOP, NOP and the proposed optimal policy depend on both the CSI and the QSI, they achieve better throughput performance than OLSP in most cases. Moreover, as the threshold in the proposed optimal policy also depends on and , it outperforms all the baseline schemes. In summary, the proposed optimal policy makes better use of the system information and system parameters, and hence achieves the optimal throughput. Specifically, the performance gains of the proposed policy over DOPN, ADOP, TOP, OLSP and NOP are up to , , , and , respectively. Besides, the performance of TOP relies heavily on the choice of the parameter (the maximum admitted rate), which is not specified in [14].
Fig. 9 and Fig. 9 illustrate the average system throughput versus the maximum transmission rate and the relay buffer size, respectively, in the symmetric case (). Similar observations can be made in the symmetric case. The proposed optimal policy outperforms all the baseline schemes, and its performance gains over DOPN, ADOP, TOP, OLSP and NOP are up to , , , and , respectively.