Stochastic Throughput Optimization for Two-hop Systems with Finite Relay Buffers

Bo Zhou, Ying Cui, and Meixia Tao

This paper was presented in part at IEEE Globecom 2014. B. Zhou, Y. Cui and M. Tao are with the Department of Electronic Engineering at Shanghai Jiao Tong University, Shanghai, 200240, P. R. China. Email: {b.zhou, cuiying, mxtao}@sjtu.edu.cn.

Abstract

Optimal queueing control of multi-hop networks remains a challenging problem even in the simplest scenarios. In this paper, we consider a two-hop half-duplex relaying system with random channel connectivity. The relay is equipped with a finite buffer. We focus on stochastic link selection and transmission rate control to maximize the average system throughput subject to a half-duplex constraint. We formulate this stochastic optimization problem as an infinite horizon average cost Markov decision process (MDP), which is well known to be a difficult problem. By using sample-path analysis and exploiting the specific problem structure, we first obtain an equivalent Bellman equation with reduced state and action spaces. By using the relative value iteration algorithm, we analyze the properties of the value function of the MDP. Then, we show that the optimal policy has a threshold-based structure by characterizing the supermodularity in the optimal control. Based on the threshold-based structure and Markov chain theory, we further simplify the original complex stochastic optimization problem to a static optimization problem over a small discrete feasible set, and propose a low-complexity algorithm that solves the simplified static optimization problem by making use of its special structure. Furthermore, we obtain the closed-form optimal threshold for the symmetric case. The analytical results obtained in this paper also provide design insights for two-hop relaying systems with multiple relays equipped with finite relay buffers.

Index Terms: Wireless relay system, finite buffer, throughput optimization, Markov decision process, Markov chain theory, matrix update, structural results.

I Introduction

The demand for communication services has been changing from traditional voice telephony services to mixed voice, data, and multimedia services. When data and realtime services are considered, it is necessary to jointly consider both physical layer issues such as coding and modulation and higher layer issues such as network congestion and delay. It is also important to model these services using queueing concepts [1, 2]. On the other hand, to meet the explosive demand for these services, relaying has been shown to be effective for providing higher wireless data rates and better quality of service. Therefore, relaying has been included in the LTE-A [3] and WiMAX [4] standards, where both technologies support fixed and two-hop relays [5].

Consider a two-hop relaying system with one source node (S), one half-duplex relay node (R) and one destination node (D) under i.i.d. on-off fading. Under the conventional decode-and-forward (DF) relay protocol, a scheduling slot is divided into two transmission phases, i.e., the listening phase (S-R) and the retransmission phase (R-D). The S-R phase must be followed by the R-D phase [6]. Under the instantaneous flow balance constraint, the system throughput is the minimum of the throughput from S to R and from R to D. Therefore, under random link connectivity, to achieve a non-zero system throughput (from S to D) within a scheduling slot, both the S-R and R-D links should be connected [7].

Now consider a finite buffer at R and apply the cross-layer buffered decode-and-forward (BDF) protocol to exploit the random channel connectivity and queueing [7]. Under BDF, due to buffering at R, a scheduling slot can be adaptively allocated to the S-R transmission or the R-D transmission, according to the R queue length and link quality. Then, the throughput to D can be made non-zero provided that the R-D link is connected. While the buffer at R appears to offer obvious advantages, it is not clear how to design the optimal control to maximize the average system throughput given a finite relay buffer. Buffering a certain amount of bits at R can capture R-D transmission opportunities (when only the R-D link is on) and improve the throughput in the future. However, buffering too many bits at R may waste S-R transmission opportunities (when only the S-R link is on) due to R buffer overflow. Therefore, it remains unclear how to take full advantage of the finite buffer at R to balance the transmission rates of the S-R and R-D links so as to maximize the average system throughput.

Recently, the idea of cross-layer design using queueing concepts has been considered in the context of multi-hop networks with buffers. In [7] and [8], the authors consider the delay-optimal control for two-hop networks with infinite buffers at the source and relay. Specifically, in [8], the authors obtain a delay-optimal link selection policy for non-fading channels. Then, in [7], the authors extend the analysis to i.i.d. on/off fading channels and show that a threshold-based link selection policy is asymptotically delay-optimal when the scheduling slot duration tends to zero. However, it is not known whether the delay-optimal policy is still of a threshold-based structure. In [9], the authors consider a two-hop relaying system with an infinite backlog at the source and an infinite buffer at the relay. The optimal link selection policies are obtained to maximize the average system throughput. In the aforementioned references, the relay is assumed to be equipped with an infinite buffer and the proposed algorithms cannot guarantee that the instantaneous relay queue length is below a certain threshold. However, in practical systems, buffers are finite. The optimal designs for systems with infinite buffers do not necessarily lead to good performance for systems with finite buffers. In addition, in several practical networks, such as wireless sensor networks, wireless body area networks and wireless networks-on-chip, buffer size is limited. This is because using large buffers would introduce practical issues, such as larger on-chip board space, increased memory-access latency and higher power consumption [10, 11]. Therefore, it is very important to consider finite relay buffers in designing optimal resource controls for multi-hop networks to support data and realtime services [12, 13, 14, 15, 16].

The Lyapunov drift approach represents a systematic way to address queue stabilization problems for general multi-hop networks with infinite buffers [17, 18]. Specifically, the Lyapunov drift approach mainly relies on quadratic Lyapunov functions, and can be used to obtain stochastic control algorithms with achieved utilities that are arbitrarily close to optimal. The derived control algorithms usually do not require system statistics to be known beforehand and can be easily implemented online. However, the traditional Lyapunov drift approach cannot properly handle systems with finite buffers. References [13] and [14] extend the traditional Lyapunov drift approach in [17] and [18] to design stochastic control algorithms for multi-hop networks with infinite source buffers and finite relay buffers. In particular, [13] and [14] employ a new type of Lyapunov function obtained by multiplying the queue backlogs of infinite buffers with the quadratic term of the queue backlogs of finite buffers. Specifically, in [13], the authors propose scheduling algorithms to stabilize source queues under a fixed routing design. In [14], the authors propose joint flow control, routing and scheduling algorithms to maximize the throughput. References [15] and [16] adopt similar approaches to those in [13] and [14], and design control algorithms to optimize network utilities for multi-hop networks with finite source and relay buffers. However, the gap between the utility of each algorithm proposed in [14, 16, 15] and the optimal utility is inversely proportional to the buffer size. In other words, for the finite buffer case, the performance gap is always positive. Therefore, in contrast to the algorithms for the infinite buffer case in [17] and [18], the algorithms for the finite buffer case in [14, 16, 15] cannot achieve utilities that are arbitrarily close to optimal.

On the other hand, dynamic programming represents a systematic approach to optimal queueing control problems [2, 19, 20]. Generally, there exist only numerical solutions, which do not typically offer many design insights and are usually impractical for implementation due to the curse of dimensionality [20]. For example, in [21, 22], the authors consider delay-aware control problems for two-hop relaying systems with multiple relay nodes and propose suboptimal distributed numerical algorithms using approximate Markov Decision Process (MDP) and stochastic learning [20]. However, the obtained numerical algorithms may still be too complex for practical systems and do not offer many design insights. Several existing works focus on characterizing structural properties of optimal policies to obtain design insights for simple queueing networks. However, most existing analytical results are for a single queue with either controlled arrival rate or departure rate [23, 24, 25, 26]. To the best of our knowledge, structural results for a single queue with both controlled arrival and departure rates are still unknown. Furthermore, if the single queue has a finite buffer, the analytical results are limited. For example, [26] characterizes structural properties of a single finite queue only for part of the queue state space. The challenge of structural analysis for finite-buffer systems stems from the reflection effect (when a finite buffer is almost full) [11].

In general, the solution to stochastic throughput maximization for multi-hop systems with fading channels and finite relay buffers is still unknown, even for the case of a simple two-hop relaying system. In this paper, we shall tackle some of the technical challenges. We consider a two-hop relaying system with one source node, one half-duplex relay node and one destination node as well as random link connectivity. S has an infinite backlog and R is equipped with a finite buffer. We consider stochastic link selection and transmission rate control to maximize the average system throughput subject to a half-duplex constraint. We formulate the stochastic average throughput optimization problem as an infinite horizon average cost MDP, which is well known to be a difficult problem in general. By using sample-path analysis and exploiting the specific problem structure, we first obtain an equivalent Bellman equation with reduced state and action spaces. By the relative value iteration algorithm, we analyze properties of the value function of the MDP. Then, based on these properties and the concept of supermodularity, we show that the optimal policy has a threshold-based structure. By the structural properties of the optimal policy and Markov chain theory, we further simplify the original complex stochastic optimization problem to a static optimization problem over a small discrete feasible set. We propose a low-complexity algorithm to solve the static optimization problem by making use of its special structure. Furthermore, we obtain the closed-form optimal threshold for the symmetric case. Numerical results verify the theoretical analysis and demonstrate the performance gain of the derived optimal policy over the existing solutions.

Notations: Boldface uppercase letters denote matrices and boldface lowercase letters denote vectors. denotes an identity matrix, the -th column of which is denoted as . and denote the inverse and the transpose of matrix , respectively. denotes the norm of vector . The important notations used in this paper are summarized in Table I.

TABLE I: List of important notations

slot index

maximum transmission rates of S and R

relay buffer size

probabilities of “ON” for S-R and R-D links

joint CSI

QSI

system state

joint CSI state space

QSI state space

system state space

link selection actions for S-R and R-D links

transmission rates of S and R

link selection and transmission rate control policy

value function

state-action reward function

recurrent class of relay queue state process

threshold

II System Model

As illustrated in Fig. 1, we consider a two-hop relaying system with one source node (S), one relay node (R) and one destination node (D). S cannot transmit packets to D due to the limited coverage and has to communicate with D with the help of R via the S-R link and the R-D link. (Footnote 1: This two-hop relaying model can be used to model the Type 1 relay in LTE-Advanced and the non-transparent relay in WiMAX [5].) R is half-duplex and equipped with a finite buffer. We consider a discrete-time system, in which the time axis is partitioned into scheduling slots with unit slot duration. The slots are indexed by .

Fig. 1: System model.

II-A Physical Layer Model

We model the channel fading of the S-R link and the R-D link with i.i.d. random link connectivity. (Footnote 2: This channel fading model is widely used in the literature [27, 28].) Let denote the link connectivity state information (CSI) of the S-R link and the R-D link at slot , respectively, where 1 denotes connected and 0 not connected. Let denote the joint CSI at the -th slot, where denotes the joint CSI state space.

Assumption 1 (Random Link Connectivity Model): and are both i.i.d. over time, where in each slot , the probabilities of being 1 for and are and , respectively, i.e., and . Furthermore, and are independent of each other.

We assume fixed transmission powers of S and R, and consider packet transmission. The maximum transmission rates (i.e., the maximum numbers of packets transmitted within a slot) of the S-R link when and the R-D link when are given by and , respectively. Note that, to avoid the overflow of the finite R buffer, the actual transmission rate of S may be smaller than . In addition, the actual transmission rate of R may be smaller than , subject to the availability of packets in the R buffer. These will be further illustrated in Section III-A.

II-B Queueing Model

We assume that S has an infinite backlog (i.e., always has data to transmit) and consider a finite buffer of size (in number of packets) at R. Note that can be arbitrarily large. Assume . The finite buffer at R is used to hold the packet flow from S. We consider the buffered decode-and-forward (BDF) protocol [7] to exploit the potential benefit of buffering at R under random channel connectivity. Specifically, according to BDF, (i) S can transmit packets to R when the S-R link is connected, and R decodes and stores the packets from S in its buffer; (ii) R can transmit the packets in its buffer to D when the R-D link is connected. Using the buffer at R and BDF, we can dynamically select the S-R link or the R-D link to transmit and choose the corresponding transmission rate at each slot based on the channel fading and queue states, according to a link selection and transmission rate control policy defined in Section III-A.

Therefore, as illustrated in Fig. 1, the simple two-hop relaying system with on/off channel connectivity can be modeled as a single queue with controlled arrival rate and departure rate. Let denote the queue state information (QSI) (in number of packets) at the R buffer at the beginning of the -th slot, where denotes the QSI state space. The queue dynamics under the control policy will be illustrated in Section III-B.

III Problem Formulation

III-A Control Policy

For notational convenience, we denote as the system state at the -th slot, where denotes the system state space. Let and denote whether the S-R link or the R-D link is scheduled, respectively, in the -th slot, where 1 denotes scheduled and 0 otherwise. Let and denote the transmission rates of S and R in the -th slot, respectively. Given an observed system state , the link selection action and the transmission rate control action are determined according to a stationary policy defined below.

Definition 1 (Stationary Policy)

A stationary link selection and transmission rate control policy is a mapping from the system state to the link selection action and the transmission rate control action , where and satisfy the following constraints:

  1. ;

  2. (orthogonal link selection);


  3. (at least one link is not connected);

  4. (departure rate at S);

  5. (departure rate at R).

Note that, our focus for the link selection control is on the design of when . Moreover, the departure rates (actual transmission rates) of S and R, i.e., and , may be smaller than and , respectively, due to the following reasons. When the finite R buffer does not have enough space, to avoid buffer overflow and the resulting packet loss, is smaller than . When the R buffer does not have enough packets to transmit, is smaller than . Thus, we have the constraints in 4) and 5).

III-B MDP Formulation

Given a stationary control policy defined in Definition 1, the queue dynamics at R is given by:

(1)

From Assumption 1 and the queue dynamics in (1), we can see that the induced random process under policy is a Markov chain with the following transition probability

(2)

In this paper, we restrict our attention to stationary unichain policies. (Footnote 3: A unichain policy is a policy under which the induced Markov chain has a single recurrent class (and possibly some transient states) [20].) For a given stationary unichain policy , the average system throughput is given by:

(3)

where is the per-stage reward (i.e., the departure rate at R at slot , indicating the number of packets delivered by the two-hop relaying system) and the expectation is taken w.r.t. the measure induced by policy .
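The queue dynamics in (1) and the time-average reward in (3) can be illustrated with a small Monte Carlo sketch. It is only an illustration under stated assumptions: the update rule (arrivals capped by the free buffer space, departures capped by the current queue length) is read off constraints 4) and 5) of Definition 1, and the names `simulate_throughput`, `greedy_rd`, `p_s`, `p_r`, `R_s`, `R_r` and `N` are hypothetical labels for the ON probabilities, the maximum rates and the buffer size. The `greedy_rd` policy is a simple baseline, not the optimal policy derived in this paper.

```python
import random


def simulate_throughput(policy, p_s, p_r, R_s, R_r, N, T=100_000, seed=0):
    """Estimate the average number of packets delivered to D per slot.

    policy(q, g_s, g_r) returns "SR", "RD" or None for queue length q and
    link states g_s, g_r (True = connected). Assumed model, not taken
    verbatim from the paper: i.i.d. on/off links, half-duplex relay.
    """
    rng = random.Random(seed)
    q = 0            # relay queue length at the start of the slot
    delivered = 0    # cumulative departures from R (the per-stage rewards)
    for _ in range(T):
        g_s = rng.random() < p_s      # S-R link connected in this slot?
        g_r = rng.random() < p_r      # R-D link connected in this slot?
        choice = policy(q, g_s, g_r)
        if choice == "SR" and g_s:
            # Arrival rate is capped by the free space to avoid overflow.
            q += min(R_s, N - q)
        elif choice == "RD" and g_r:
            # Departure rate is capped by the packets available at R.
            u_r = min(R_r, q)
            q -= u_r
            delivered += u_r
    return delivered / T


def greedy_rd(q, g_s, g_r):
    """Hypothetical baseline: serve R-D whenever possible, else S-R."""
    if g_r and q > 0:
        return "RD"
    if g_s:
        return "SR"
    return None


if __name__ == "__main__":
    est = simulate_throughput(greedy_rd, p_s=0.5, p_r=0.5, R_s=2, R_r=3, N=10)
    print(f"estimated average throughput: {est:.4f} packets/slot")
```

Any other stationary policy with the same signature can be passed in place of `greedy_rd`, which makes it easy to compare candidate link selection rules empirically.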

We wish to find an optimal link selection and transmission rate control policy to maximize the average system throughput in (3). (Footnote 4: By Little’s law, maximizing the average system throughput in Problem 1 is equivalent to minimizing the upper bound of the average delay in the relay with a finite buffer.)

Problem 1 (Stochastic Throughput Optimization)
(4)

where is a stationary unichain policy satisfying the constraints in Definition 1.

Note that, in Problem 1, we assume the existence of a stationary unichain policy achieving the maximum in (4). Later, in Theorem 1, we shall prove the existence of such a policy. Problem 1 is an infinite horizon average cost MDP, which is well known to be a difficult problem [20]. While dynamic programming represents a systematic approach for MDPs, there generally exist only numerical solutions, which do not typically offer many design insights and are usually not practical due to the curse of dimensionality [20].

Fig. 2 illustrates how we shall address the above challenges to solve Problem 1 in the remainder of this paper. Specifically, in Sections IV and V, we shall analyze the properties of the optimal policy. Based on these properties, we shall simplify Problem 1 to a static optimization problem (Problem 2) and develop a low-complexity algorithm (Algorithm 3) to solve it. Finally, we shall obtain the corresponding static optimization problem (Problem 3) for the symmetric case and derive its closed-form optimal solution.

Fig. 2: Proposed solution to Problem 1.

IV Structure of Optimal Policy

In this section, we first obtain an equivalent Bellman equation based on reduced state and action spaces. Then, we show that the optimal policy has a threshold-based structure.

IV-A Optimality Equation

By exploiting some special structures in our problem, we obtain the following equivalent Bellman equation by reducing the state and action spaces. By solving the Bellman equation, we can obtain the optimal policy to Problem 1.

Theorem 1 (Equivalent Bellman Equation)

(i) The optimal transmission rate control policy is given by:

(5)

(ii) There exists satisfying the following equivalent Bellman equation:

(6)

where , , and . is the optimal value to Problem 1 for all initial state and is called the value function.

(iii) The optimal link selection policy is given by:

(7)

where

(8)

and .

Proof: Please see Appendix A.

Note that the four terms in the R.H.S. of (6) correspond to the per-stage reward plus the value function of the updated queue state for and , respectively, under the optimal transmission rate control policy in (5), the link selection policy in Definition 1 for and , and the optimal link selection policy for . Therefore, the R.H.S. of (6) indicates the expectation of the per-stage reward plus the value function of the updated queue state under the optimal policy, where the expectation is over the channel state .

Remark 1 (Reduction of State and Action Spaces)

The Bellman equation in (6) is defined over the QSI state space . Thus, the system state space in Definition 1 is reduced to the QSI state space . The action space reduction can be observed by comparing Definition 1 with (5) and (7).

Note that the closed-form optimal transmission rate control policy has already been obtained in (5). The optimal link selection policy is determined by the policy in (8). Thus, we only need to consider the optimal link selection for . In the following, we also refer to as the optimal link selection policy. To obtain the optimal policy, it remains to characterize . From Theorem 1, we can see that depends on the QSI state through the value function . Obtaining involves solving the equivalent Bellman equation in (6) for all . There is no closed-form solution in general [20]. Brute-force solutions such as value iteration and policy iteration are usually impractical for implementation and do not yield many design insights [20]. Therefore, it is desirable to study the structure of .
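For small buffers, the numerical route mentioned above can still be sketched in a few lines. The following relative value iteration is a minimal illustration, not the authors' implementation. It assumes a reduced Bellman operator with four terms, one per joint CSI state: no link on (queue unchanged), only S-R on (queue grows by at most `R_s`, capped at `N`), only R-D on (reward `min(q, R_r)`), and both links on (the better of the two moves). This form is inferred from the verbal description of the equivalent Bellman equation, and all names are hypothetical.

```python
def relative_value_iteration(p_s, p_r, R_s, R_r, N, tol=1e-9, max_iter=100_000):
    """Return (theta, V): average throughput estimate and relative values.

    Assumed operator (a reading of the reduced Bellman equation):
      T V(q) = (1-p_s)(1-p_r) V(q)
             + p_s (1-p_r)   V(min(q + R_s, N))
             + (1-p_s) p_r   [min(q, R_r) + V(q - min(q, R_r))]
             + p_s p_r       max(of the two previous bracketed values)
    """
    V = [0.0] * (N + 1)
    theta = 0.0
    for _ in range(max_iter):
        W = []
        for q in range(N + 1):
            up = min(q + R_s, N)          # queue after an S-R transmission
            u_r = min(q, R_r)             # packets sent on the R-D link
            val_sr = V[up]                # value of scheduling S-R
            val_rd = u_r + V[q - u_r]     # reward plus value of scheduling R-D
            W.append((1 - p_s) * (1 - p_r) * V[q]
                     + p_s * (1 - p_r) * val_sr
                     + (1 - p_s) * p_r * val_rd
                     + p_s * p_r * max(val_sr, val_rd))
        theta = W[0]                      # state 0 is the reference state
        V_new = [w - theta for w in W]    # subtract to keep values bounded
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return theta, V_new
        V = V_new
    return theta, V


if __name__ == "__main__":
    theta, V = relative_value_iteration(p_s=0.6, p_r=0.4, R_s=2, R_r=3, N=8)
    print("average throughput estimate:", round(theta, 6))
    # Link choice when both links are on: R-D if its value is at least S-R's.
    choose_rd = [min(q, 3) + V[q - min(q, 3)] >= V[min(q + 2, 8)]
                 for q in range(9)]
    print("schedule R-D when both links are on:", choose_rd)
```

The cost per sweep is linear in the buffer size, but the output is only a table of numbers, which is why a structural characterization of the policy is still useful.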

IV-B Threshold Structure of Optimal Link Selection Policy

To further simplify the problem and obtain design insights, we study the structure of the optimal link selection policy. In the existing literature, structural properties of optimal policies are characterized for simple networks by studying properties of the value function. For example, most existing works consider the structural analysis of a single queue with either controlled arrival or departure rates [23, 24, 25, 26]. However, we control both the arrival and departure rates of the relay queue. Moreover, we consider a finite buffer, which has reflection effect when the buffer is almost full [11], and general system parameters, i.e., and . Therefore, it is more challenging to explore the properties of the value function in our system.

First, by the relative value iteration algorithm (RVIA), we can iteratively prove the following properties of the value function. (Footnote 5: RVIA is a commonly used numerical method for iteratively computing the value function, which is the solution to the Bellman equation for the infinite horizon average cost MDP [20, Chapter 4.3]. The details of RVIA can be found in Appendix B.)

Lemma 1 (Properties of Value Function)

The value function satisfies the following properties:

  1. is monotonically non-decreasing in ;

  2. , ;

  3. , .

Proof: Please see Appendix B.

Remark 2 (Interpretation of Lemma 1)

Property 1 generally holds for single-queue systems and is widely studied in the existing literature. Property 2 results from the throughput maximization problem considered in this work. This property does not hold for sum queue length minimization problems considered in most existing literature. Property 3 indicates that is -concave with . (Footnote 6: A function : is -concave (where ) if .) This stems from the relay queue with both controlled arrival and departure rates. In contrast, most existing works consider a single queue with either controlled arrival rate or departure rate, and the corresponding value function is 1-concave.

Next, define the state-action reward function as follows [24]:

(9)

Note that is related to the R.H.S. of the Bellman equation in (6). The R.H.S. of (9) indicates the expectation of the per-stage reward plus the value function of the updated queue state under the optimal transmission rate control policy in (5), the link selection policy in Definition 1 for and , and any link selection policy satisfying and for .

By Lemma 1 and (9), we can show that the state-action reward function is supermodular (Footnote 7: A function : is supermodular in if [29].) in , i.e.,

(10)

By [29, Lemma 4.7.1], supermodularity is a sufficient condition for the monotone policies to be optimal. Thus, we have the following theorem.

Theorem 2 (Threshold Structure of Optimal Policy)

There exists such that the optimal link selection policy for has the threshold-based structure, i.e.,

(11)

is the optimal threshold.

Proof: Please see Appendix C.

Remark 3 (Interpretation of Theorem 2)

By Theorem 2, we know that when , it is optimal to schedule the R-D link if and to schedule the S-R link otherwise. The intuition is as follows. When the relay queue length is large (), the S-R transmission opportunities may be wasted when due to the overflow of the finite R buffer. Therefore, when , we should reduce the relay queue length at . When the relay queue length is small (), the R-D transmission opportunities may be wasted when , as there may not be enough packets left to transmit. Therefore, when , we should schedule the S-R link at . These design insights also hold for two-hop relaying systems with multiple relays which are equipped with finite relay buffers.

V Optimal Solution for General Case

In this section, we first obtain a simplified static optimization problem for Problem 1 by making use of the structural properties of the optimal policy in Theorems 1 and 2. Then, based on the special structure, we develop a low-complexity algorithm to solve the static optimization problem.

V-A Recurrent Class

By the structural properties of the optimal policy in Theorems 1 and 2, we can restrict our attention to the optimal transmission rate control in (5) and a threshold-based link selection policy for , i.e.,

(12)

where is the threshold. In the following, we use to denote the relay queue state process under the policies in (5) and (12). is a stationary Discrete-Time Markov Chain (DTMC) [30], the transition probabilities of which are determined by the threshold and the statistics of the CSI (i.e., and ). Fig. 3 and Fig. 4 illustrate the transition from any state and the transition diagram for , respectively, under the optimal transmission rate control in (5) and the threshold-based link selection control in (12).

Fig. 3: Illustration of transitions from state (panels (a) and (b)).

Fig. 4: Illustration of the transition diagram of (panels (a) and (b)).

Next, we study the steady-state probabilities of . Note that, the steady-state probability of each transient state is zero [30], and hence the throughput of the transient states will not contribute to the ergodic throughput. In other words, the ergodic system throughput is equal to the average throughput over the recurrent class of . Thus, to calculate the ergodic throughput, we first characterize the recurrent class of . Let where and are two positive integers having no factors in common. Denote

(13)

Since , there exist and such that

(14)

Using Bézout’s identity, we characterize the recurrent class of in the following lemma.

Lemma 2 (Recurrent Class)

For any , under the optimal transmission rate control in (5) and a threshold-based link selection policy in (12) with any , the recurrent class of is given by

where is given by (13) and satisfy (14). The size of is .

Proof: Please see Appendix D.

Note that and is the same for any .
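Bézout's identity, which this subsection relies on, is constructive: the extended Euclidean algorithm returns integer coefficients together with the greatest common divisor. The sketch below is generic number theory rather than the paper's expressions (13) and (14), whose exact form is not reproduced here, and the function name `extended_gcd` is hypothetical. One fact it illustrates with certainty is that, ignoring buffer boundaries, queue lengths reachable from an empty buffer by steps of `+R_s` and `-R_r` are multiples of `gcd(R_s, R_r)`.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        k = old_r // r
        # Standard extended Euclid update of remainders and coefficients.
        old_r, r = r, old_r - k * r
        old_x, x = x, old_x - k * x
        old_y, y = y, old_y - k * y
    return old_r, old_x, old_y


if __name__ == "__main__":
    R_s, R_r = 6, 4   # example maximum rates (hypothetical values)
    g, x, y = extended_gcd(R_s, R_r)
    print(f"gcd = {g}, coefficients x = {x}, y = {y}")
    # After dividing both rates by g, the reduced pair is coprime, so the
    # right-hand side of the Bezout identity for that pair equals 1.
    g1, x1, y1 = extended_gcd(R_s // g, R_r // g)
    print(f"reduced pair: {(R_s // g) * x1 + (R_r // g) * y1} == {g1}")
```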

V-B Equivalent Problem

We first consider a threshold-based policy in (12) with the threshold chosen from instead of . Denote this threshold as . We wish to find the optimal threshold to maximize the ergodic system throughput (i.e., the ergodic reward of ). Later, in Lemma 3, we shall show the relationship between and .

As illustrated in Section V-A, we focus on the computation of the average throughput over the recurrent class . Given , we can express the transition probability from to as , where . Let and denote the transition probability matrix and the steady-state probability row vector of the recurrent class , respectively. Note that is fully determined by and the statistics of the CSI (i.e., and ), and can be easily obtained, as illustrated in Fig. 3. By the Perron-Frobenius theorem [30], can be computed from the following system of linear equations:

(15)

Let denote the average departure rate at state under the threshold . According to the threshold-based link selection policy in (12), we know that: (i) if queue state , the R-D link is selected when or ; (ii) if queue state , the R-D link is selected only when . Thus, we have:

(16)

Let denote the average departure rate column vector of the recurrent class . Therefore, the ergodic system throughput can be expressed as .

Now, we formulate a static optimization problem to maximize the ergodic system throughput as below.

Problem 2 (Equivalent Optimization Problem)
(17)

Note that is the optimal solution to Problem 2 and is the optimal threshold to Problem 1. The following lemma summarizes the relationship between and .

Lemma 3 (Relationship between Problem 1 and Problem 2)

The optimal values to Problems 1 and 2 are the same, i.e., . If , then . If , then any threshold is optimal to Problem 1, where .

Proof: Please see Appendix E.

By Lemma 3, instead of solving Problem 1, which is a complex stochastic optimization problem, we can solve Problem 2, which is a static problem over the smaller feasible set .
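The static formulation can be made concrete with a brute-force sketch: build the transition matrix induced by a threshold, compute its steady-state distribution, and take the inner product with the average departure rates. This is an illustration under assumptions, not the low-complexity method of this paper. It works on the full queue state space rather than the recurrent class (transient states simply receive zero probability), uses plain power iteration instead of solving (15) directly, and assumes the R-D link is chosen when both links are on and the queue length is strictly above the threshold. All function and parameter names are hypothetical.

```python
def transition_matrix(q_th, p_s, p_r, R_s, R_r, N):
    """Row-stochastic matrix of the relay queue under threshold q_th."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for q in range(N + 1):
        up = min(q + R_s, N)            # next state after an S-R slot
        down = q - min(q, R_r)          # next state after an R-D slot
        both = down if q > q_th else up # assumed rule when both links are on
        P[q][q] += (1 - p_s) * (1 - p_r)
        P[q][up] += p_s * (1 - p_r)
        P[q][down] += (1 - p_s) * p_r
        P[q][both] += p_s * p_r
    return P


def stationary_distribution(P, tol=1e-12, max_iter=1_000_000):
    """Power iteration pi <- pi P, started from the empty-buffer state."""
    n = len(P)
    pi = [1.0] + [0.0] * (n - 1)
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def throughput(q_th, p_s, p_r, R_s, R_r, N):
    """Ergodic throughput: steady-state average of the departure rates."""
    pi = stationary_distribution(transition_matrix(q_th, p_s, p_r, R_s, R_r, N))
    # Above the threshold R-D is used whenever it is on; otherwise only
    # when the S-R link is off.
    rate = [min(q, R_r) * (p_r if q > q_th else (1 - p_s) * p_r)
            for q in range(N + 1)]
    return sum(p * r for p, r in zip(pi, rate))


def best_threshold(p_s, p_r, R_s, R_r, N):
    """Exhaustive search over all thresholds 0..N."""
    return max(range(N + 1), key=lambda t: throughput(t, p_s, p_r, R_s, R_r, N))


if __name__ == "__main__":
    t = best_threshold(0.6, 0.4, 2, 3, 8)
    print("best threshold:", t, "throughput:", throughput(t, 0.6, 0.4, 2, 3, 8))
```

Each threshold is evaluated independently here, so nothing is reused between neighboring thresholds.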

V-C Algorithm for Problem 2

Problem 2 is a discrete optimization problem over the feasible set . It can be solved in a brute-force way by computing for each separately. The brute-force method has high complexity and fails to exploit the structure of the problem. In this part, we develop a low-complexity algorithm to solve Problem 2 by computing for all iteratively based on the special structure of .

We sort the elements of in ascending order, i.e., , where denotes the -th smallest element in . For notation simplicity, we use and to represent and , respectively, where . In other words, each variable in is indexed by . Denote

(18)

Note that the size of is . The system of linear equations in (15) can be transformed to the following system of linear equations:

(19)

The steady-state probability vector in (19) can be obtained using the partition factorization method [31] as follows. By removing the -th column and the -th row of , we obtain a submatrix of , denoted as . Note that the size of is . Accordingly, let denote the permutation matrix such that

(20)

In addition, let denote the solution to the following subsystem:

(21)

Then, based on , we can compute by the partition factorization method [31] in Algorithm 1.

Algorithm 1 Algorithm to Compute
1:  Obtain and in (18).
2:  Find and partition into the form (20) to obtain and .
3:  Compute using Gaussian elimination.
4:  Let and normalize to obtain , i.e.,

Remark 4 (Computational Complexity of Gaussian Elimination)

The computation of each using Gaussian elimination in step 3 of Algorithm 1 requires flops. (Footnote 8: The computational complexity is measured as the number of floating-point operations (flops), where a flop is defined as one addition, subtraction, multiplication or division of two floating-point numbers [32].) Thus, the computation of using Gaussian elimination requires flops, i.e., is of complexity .

Fig. 5: Illustration of . , , , . =.

On the other hand, for each , can also be obtained by multiplying both sides of (21) with . (Footnote 9: exists because is a nonsingular matrix [31].) This involves matrix inversion. To reduce the complexity, instead of computing for each separately, we shall compute iteratively (i.e., compute based on ) by exploiting the relationship between and . Specifically, for two adjacent thresholds and , the corresponding transition probability matrices and differ only in the -th row, as illustrated in Fig. 5. The following lemma summarizes the relationship between and , which directly results from the special structure of .

Lemma 4 (Relationship between and )

Let denote the permutation matrix obtained by exchanging the -th and -th columns of and let and denote the -column of and the -column of , respectively. Then, and satisfy:

(22)

where

(23)
(24)
Proof: Please see Appendix F.

Based on Lemma 4, we can compute by Algorithm 2.

1:  if  then
2:     Compute using Gaussian elimination.
3:  else
4:     Obtain , and in Lemma 4.
5:     Compute based on according to (22).
6:  end if
7:  Compute .
Algorithm 2 Algorithm to Compute
Remark 5 (Computational Complexity of Algorithm 2)

By Algorithm 2, for , the computation of requires flops. For each , steps 4, 5 and 7 require , and flops, respectively, and hence the computation of requires flops. Therefore, the computation of using Algorithm 2 requires flops, i.e., is of complexity .

By comparing Remarks 4 and 5, we can see that the complexity of computing using Algorithm 2 is lower than that using Gaussian elimination in step 3 of Algorithm 1. This is because Gaussian elimination in step 3 of Algorithm 1 cannot make use of the special structure of , and hence has higher computational complexity.
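The complexity saving comes from updating an inverse instead of recomputing it when only one row of a matrix changes. The sketch below illustrates this idea with the generic Sherman-Morrison identity for a single-row change. It is not the update in (22), which additionally involves a permutation matrix and the specific vectors in (23) and (24); the function name `update_inverse_after_row_change` and its arguments are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def update_inverse_after_row_change(
    a_inv: Matrix, row: int, delta: List[float]
) -> Matrix:
    """Return the inverse of A' = A + e_row * delta^T, given a_inv = A^{-1}.

    Sherman-Morrison identity:
        A'^{-1} = A^{-1} - (A^{-1} e_row)(delta^T A^{-1}) / (1 + delta^T A^{-1} e_row)
    The cost is O(n^2) operations, versus O(n^3) for a fresh elimination.
    """
    n = len(a_inv)
    # A^{-1} e_row is simply column `row` of the current inverse.
    col = [a_inv[r][row] for r in range(n)]
    # w = delta^T A^{-1}, a row vector.
    w = [sum(delta[k] * a_inv[k][c] for k in range(n)) for c in range(n)]
    denom = 1.0 + w[row]
    if abs(denom) < 1e-12:
        raise ValueError("the updated matrix is singular or nearly singular")
    return [[a_inv[r][c] - col[r] * w[c] / denom for c in range(n)] for r in range(n)]
```

For example, with `A = [[2, 0], [0, 4]]` and row 0 changed by `delta = [0, 1]`, the updated inverse is `[[0.5, -0.125], [0.0, 0.25]]`, which matches the direct inverse of `[[2, 1], [0, 4]]`.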

By replacing step 3 in Algorithm 1 with Algorithm 2, we can compute for all iteratively. Therefore, we can develop Algorithm 3 to solve Problem 2.

1:  initialize , .
2:  for  do
3:     .
4:     Compute by (16).
5:     Compute by Algorithm 1 wherein step 3 is replaced with Algorithm 2.
6:     if  then
7:        , .
8:     end if
9:  end for
Algorithm 3 Algorithm to Compute for Problem 2
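The search in Algorithm 3 can be read as an exhaustive scan over the candidate thresholds that keeps the best objective value found so far. The following sketch shows that loop structure only. It assumes that the objective for a threshold is the steady-state average of a per-state reward, and the callables `steady_state_for` and `reward_for` are hypothetical placeholders for the computations referred to in steps 4 and 5.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def best_threshold(
    candidates: Iterable[int],
    steady_state_for: Callable[[int], List[float]],
    reward_for: Callable[[int], List[float]],
) -> Tuple[Optional[int], float]:
    """Scan all candidate thresholds and return the best one with its value."""
    best_q: Optional[int] = None
    best_value = float("-inf")
    for q in candidates:
        pi = steady_state_for(q)   # placeholder for the steady-state computation
        reward = reward_for(q)     # placeholder for the per-state reward
        # Assumed objective: average reward under the steady-state distribution.
        value = sum(p * r for p, r in zip(pi, reward))
        # Strict comparison keeps the first maximizer when there are ties.
        if value > best_value:
            best_q, best_value = q, value
    return best_q, best_value
```

In the setting of this section, `steady_state_for` would be evaluated iteratively across adjacent thresholds rather than from scratch, which is where the complexity saving arises.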

VI Optimal Solution for Special Case

In this section, we first obtain the corresponding static optimization problem for the symmetric case ( and ). Then, we derive its closed-form optimal solution.

By Lemma 2, the recurrent class of is given by . Fig. 6 illustrates the corresponding transition diagram. By applying the Perron-Frobenius theorem and the detailed balance equations[30], we obtain the steady-state probability:

(25a)
(25b)

where . Then, in the symmetric case, Problem 2 is equivalent to the following optimization problem.

Problem 3 (Optimization for Symmetric Case)
(26)
Fig. 6: The transition diagram of for the symmetric case. State represents state and .
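As a generic illustration of how detailed balance yields a steady-state distribution in product form, the sketch below handles a finite birth-death chain. This structure is an assumption made for illustration and may differ from the chain in Fig. 6; the function name and the arguments `up` and `down` are hypothetical.

```python
from typing import List


def birth_death_steady_state(up: List[float], down: List[float]) -> List[float]:
    """Steady state of a birth-death chain on states 0, 1, ..., len(up).

    up[i]   = probability of moving from state i to state i + 1
    down[i] = probability of moving from state i + 1 to state i
    Detailed balance: pi[i] * up[i] = pi[i + 1] * down[i].
    """
    weights = [1.0]  # unnormalized pi[0]
    for u, d in zip(up, down):
        # Each step applies the detailed balance equation once.
        weights.append(weights[-1] * u / d)
    total = sum(weights)
    return [w / total for w in weights]
```

When all upward and downward probabilities are equal, every ratio is 1 and the distribution is uniform over the states.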

By change of variables, we can equivalently transform the discrete optimization problem in Problem 3 to a continuous optimization problem and obtain the optimal threshold to Problem 1, which is summarized in the following lemma.

Lemma 5 (Optimal Threshold for Symmetric Case)

In the symmetric case, any threshold

achieves the optimal value to Problem 1.

Proof: Please see Appendix G.

VII Numerical Results and Discussion

In this section, we verify the analytical results and evaluate the performance of the proposed optimal solution via numerical examples. In the simulations, we choose .

VII-A Threshold Structure of Optimal Policy

Fig. 7(a) illustrates the value function versus . is computed numerically using RVIA [20]. It can be seen that is increasing with and , which verify Properties 1) and 2) in Lemma 1, respectively. The third property of Lemma 1 can also be verified by checking the simulation points. Fig. 7(b) illustrates the function versus . Note that is a function of , which is computed numerically using RVIA (a standard numerical MDP technique). According to the Bellman equation in Theorem 2, indicates that it is optimal to schedule the R-D link for state ; indicates that it is optimal to schedule the S-R link for state . Hence, from Fig. 7(b), we know that the optimal policy (obtained using RVIA) has a threshold-based structure and are the optimal thresholds for the three cases. We have also calculated the optimal threshold for the three cases, using Algorithm 3 for the general case ( and ) and Lemma 5 for the symmetric case (). The obtained thresholds are equal to the optimal values obtained by the numerical MDP technique.

(a)
(b)
Fig. 7: Verification of analytical results. packets.
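For readers unfamiliar with RVIA, the following sketch shows relative value iteration for a generic finite average-reward MDP. It is not the reduced Bellman equation of Theorem 2: the array layout `P[a][s][t]` (transition probabilities) and `r[s][a]` (per-stage rewards), the reference state `ref`, and the stopping rule are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def relative_value_iteration(
    P: List[List[List[float]]],
    r: List[List[float]],
    ref: int = 0,
    tol: float = 1e-9,
    max_iter: int = 10000,
) -> Tuple[float, List[float], List[int]]:
    """Return (gain estimate, relative value function, greedy policy)."""
    n, n_actions = len(r), len(P)
    h = [0.0] * n
    gain = 0.0
    q = [[0.0] * n_actions for _ in range(n)]
    for _ in range(max_iter):
        # One-step lookahead value of each action in each state.
        q = [
            [r[s][a] + sum(P[a][s][t] * h[t] for t in range(n)) for a in range(n_actions)]
            for s in range(n)
        ]
        th = [max(row) for row in q]
        gain = th[ref]                    # current estimate of the average reward
        new_h = [v - gain for v in th]    # subtract the reference value to keep h bounded
        diff = max(abs(x - y) for x, y in zip(new_h, h))
        h = new_h
        if diff < tol:
            break
    policy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n)]
    return gain, h, policy
```

As a small check, consider two states with action 0 ("stay", rewards 1 and 2) and action 1 ("switch", reward 0). The iteration returns a gain of 2 with the policy "switch in state 0, stay in state 1". A threshold structure such as the one observed in Fig. 7(b) would show up as a single switching point in the returned `policy` when the states are ordered by queue length.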

VII-B Throughput Performance

We compare the throughput performance of the proposed optimal policy (given in Theorems 1 and 2) with five baseline schemes: DOPN, ADOP, TOP, OLSP and NOP. The detailed descriptions of DOPN, ADOP, TOP and OLSP are given in Section I. In particular, DOPN refers to the Delay-Optimal Policy for Non-fading channels in [8], and ADOP refers to the Asymptotically Delay-Optimal Policy for on/off fading channels in [7], both of which are designed for two-hop networks with infinite buffers at the source and relay. TOP refers to the Throughput-Optimal Policy for a multi-hop network with infinite source buffers and finite relay buffers in [14]. OLSP refers to the Optimal Link Selection Policy for a two-hop system with an infinite relay buffer in [9, Theorem 2]. NOP refers to the Near-Optimal Policy obtained based on approximate value iteration using aggregation [20, Chapter 6.3], which is similar to the approximate MDP technique used in [21] and [22]. Note that OLSP depends on the CSI only, while the other four baseline schemes depend on both the CSI and QSI. In addition, the threshold in DOPN () is fixed; the threshold in ADOP () depends on ; the threshold in TOP () depends on ; NOP adapts to and .

Fig. 8(a) and Fig. 8(b) illustrate the average system throughput versus the maximum transmission rate and the relay buffer size, respectively, in the asymmetric case (). Since DOPN, ADOP, TOP, NOP and the proposed optimal policy depend on both the CSI and QSI, they can achieve better throughput performance than OLSP in most cases. Moreover, as the threshold in the proposed optimal policy also depends on and , it outperforms all the baseline schemes. In summary, the proposed optimal policy can make better use of the system information and system parameters, and hence achieves the optimal throughput. Specifically, the performance gains of the proposed policy over DOPN, ADOP, TOP, OLSP and NOP are up to , , , and , respectively. Besides, the performance of TOP relies heavily on the choice of the parameter (the maximum admitted rate), which is not specified in [14].

(a) Throughput versus . packets.
(b) Throughput versus . packets/slot.
Fig. 8: Throughput for different schemes in the asymmetric case (). . The unit of is packet/slot.

Fig. 9(a) and Fig. 9(b) illustrate the average system throughput versus the maximum transmission rate and the relay buffer size, respectively, in the symmetric case (). Similar observations can be made for the symmetric case. The proposed optimal policy outperforms all the baseline schemes and its performance gains over DOPN, ADOP, TOP, OLSP and NOP are up to , , , and , respectively.

(a) Throughput versus . packets.
(b) Throughput versus .