Abstract
We design a dynamic rate scheduling policy of Markov type via the solution (a social optimal Nash equilibrium point) to a utility-maximization problem over a randomly evolving capacity set for a class of generalized processor-sharing queues living in a random environment, whose job arrivals to each queue follow a doubly stochastic renewal process (DSRP). Both the random environment and the random arrival rate of each DSRP are driven by a finite-state continuous-time Markov chain (FSCTMC). The scheduling policy optimizes greedily with respect to each queue and environmental state, and a closed-form solution for the performance of such a queueing system under the policy is difficult to obtain. We therefore establish a reflecting diffusion with regime-switching (RDRS) model for its performance measures and justify its asymptotic optimality in two steps. First, we derive the stochastic fluid and diffusion limits for the corresponding system under heavy traffic. Second, we identify a cost function related to the utility function, which is minimized through minimizing the workload process in the diffusion limit. More importantly, our queueing model includes both the user multi-input multi-output (MIMO) multiple access channel (MAC) and the broadcast channel (BC) with cooperation and admission control as special cases. In these wireless systems, data from the users in the MAC or data to the users in the BC are transmitted over a common channel that fades according to the FSCTMC. The user capacity region for the MAC or the BC is a set-valued stochastic process that switches with the FSCTMC fading. In any particular channel state, we show that each of the user capacity regions is a convex set bounded by a number of linear or smooth curved facets. The random arrival rate to each user in these systems is designed to switch with the FSCTMC fading via admission control. At the transmit end, packets to each user are queued and served under the policy. Therefore, our queueing model matches the dynamics of these wireless systems.
Key words: Processor-Sharing Queues, Random Environment, Multi-Input Multi-Output, Multiple Access Channel, Broadcast Channel, Shannon Capacity Region, Markov Fading, Doubly Stochastic Renewal Process, Utility-Maximization Scheduling, Nash Equilibrium, Concave Game, Heavy Traffic, Asymptotic Optimality, Fluid Limit, Diffusion Limit, Reflecting Diffusion with Regime-Switching
Optimal Rate Scheduling via Utility-Maximization for User MIMO Markov Fading Wireless Channels with Cooperation¹
¹The author gratefully acknowledges the support from the National Natural Science Foundation of China under grant No. 10971249.
Wanyang Dai
Department of Mathematics and State Key Laboratory of Novel Software Technology
Nanjing University, Nanjing 210093, China
Email: nan5lu8@netra.nju.edu.cn
Originally submitted on June 17, 2010
Revised version submitted on December 24, 2010
1 Introduction
In current cellular systems, each base station is treated as a separate entity with no cooperation among base stations. Infrastructure cooperation among base stations has been proposed in the literature (see, e.g., [1, 33, 48]). This approach treats the base stations as one end of a MIMO system, an architecture that has received a great deal of attention as a means of achieving high data rates over wireless links. Thus, in this paper, we study a user MIMO MAC uplink system and a user MIMO BC downlink system. Both can be seen as a cellular system with multiple users and multiple cooperating base-station antennas. These antennas may belong to multiple cooperating base stations, each with a single antenna, to a single-cell cellular system with a multi-antenna base station, or to a combination thereof. In the MAC or the BC, data is buffered at the transmit end, and the channel is time-varying due to multipath fading. Multipath fading is a typical feature of wireless channels and adds complexity to system design and performance analysis. We suppose that the fading process is an FSCTMC, whose discrete-time version is widely used in modeling wireless channels (see, e.g., [49, 43, 48, 22], and references therein). Therefore, the user capacity regions of the MAC and the BC are both time-varying set-valued stochastic processes driven by the FSCTMC. In each state of the Markov chain, it is well known that cooperation improves capacity. For example, the sum of the rates at which data can be served for the users is greater than the single-user capacity of any user (see, e.g., [3]). Moreover, due to the random environmental fading and the cooperative design, the service rates of the corresponding queueing system for the users in the MAC or the BC are also random processes driven by the FSCTMC.
Motivated by the above observations, we consider a type of generalized processor-sharing queues living in a random environment, whose job arrivals to each queue follow a DSRP. Both the random environment and the random arrival rate of each DSRP are driven by an FSCTMC. For such a queueing system, it is presently not known how to choose a reasonable online rate scheduling policy that minimizes the average delay for a given load. Exact solutions for the average delay are not available even for many simple policies, so any meaningful comparison has to be done by simulation. To partially bridge the gap between dynamic rate scheduling and performance optimization for the system, we design a dynamic rate scheduling policy of Markov type. The policy is given by the solution (a social optimal Nash equilibrium point) to an optimization problem that maximizes a general utility function over each of the randomly evolving capacity regions, obtained through the Karush-Kuhn-Tucker (KKT) optimality conditions (see, e.g., [35]). Moreover, to overcome the intractability of performance evaluation for the system under the designed policy, we develop stochastic fluid and diffusion models through suitable scaling of time and space. We then prove related limit theorems for a heavily loaded queueing system operating under this policy. The fluid and diffusion limit models for the queue lengths (or workloads) are, respectively, a random process driven by the FSCTMC and an RDRS (i.e., a reflecting stochastic differential equation (SDE) with regime-switching). In addition, we identify a cost function related to the utility function. This cost is minimized through minimizing the workload process in the diffusion limit, which shows that our policy is asymptotically optimal.
Finally, to incorporate the user MIMO MAC and MIMO BC into our general queueing framework, we show that the user capacity region for the MAC or the BC in any particular channel state is a convex set formed by a number of linear or smooth curved facets. The proof uses convex optimization, the implicit function theorem, and the duality of capacity regions between the MAC and the BC. Moreover, to realize the DSRP in the MAC or the BC, we adopt a cross-layer design methodology that switches the arrival rates with the FSCTMC channel fading process through admission control, according to the current channel state information (CSI).
Literature Review
The randomly evolving capacity region used in designing our utility-maximization rate scheduling policy is a generalization of the so-called MIMO channel capacity region in the Shannon-theoretic sense. For a single-user time-invariant channel, the Shannon capacity is defined as the maximum mutual information between input and output. By Shannon's capacity theorem, this is the maximum data rate that can be transmitted over the channel with arbitrarily small error probability. For a user time-invariant MIMO channel, the corresponding capacity region is a dimensional set of all rate vectors simultaneously achievable by all users. In particular, the region for the Gaussian MAC is a convex set that is the union of rate regions corresponding to every product input distribution satisfying the user-by-user power constraints (see, e.g., [14], [12], [56], [25]). The Gaussian BC differs from the Gaussian MAC in two fundamental aspects (see, e.g., [31]). In the MAC, each transmitter has an individual power constraint, whereas in the BC there is only a single power constraint on the transmitter. Moreover, in the MAC, signal and interference come from different transmitters and are multiplied by different channel gains (known as the near-far effect) before being received. In the BC, by contrast, the entire received signal comes from the same source and therefore has the same channel gain. Nevertheless, the capacity region for the Gaussian BC can be obtained through the duality between the Gaussian MAC and the Gaussian BC (see, e.g., [31] and [25]). That is, it is the convex hull of the union of the capacity regions of the dual Gaussian MACs, taken over all MAC power allocations whose total power equals the power in the BC. Moreover, the authors in [34] provide an analytical and numerical characterization of the shape of the capacity boundaries for both the MAC and the BC.
However, for both the Gaussian MAC and the Gaussian BC, an exact characterization of the piecewise smoothness of the capacity boundaries has not been available until now. This motivates us to give a more precise analysis of the capacity region so that our utility-maximization rate scheduling algorithm can be applied to these wireless systems. In addition, when the user MIMO channels are stochastic, time-varying fading channels, the capacity region has multiple definitions (see, e.g., [25]). To capture the exact capacity region at each time instant for the MAC or the BC, we treat the capacity region as a set-valued stochastic process evolving with the FSCTMC. We do not treat it as a fixed region in an average sense, such as the ergodic capacity region (see, e.g., [25]).
Concerning scheduling algorithms, the authors in [1, 3, 4] considered a quasi-static downlink channel, where the channel is assumed to be fixed for all transmissions over the period of interest. In this case, the FSCTMC and the random packet arrival rates assumed in the current paper reduce to constants. Moreover, without considering utility and cost optimization, the authors in [1, 3, 4] designed a simple rate scheduling policy of Markov type. This policy was shown in [1] to be throughput-optimal for a fixed convex capacity region. A limit theorem justifying the diffusion approximation of the queue length process for a heavily loaded system operating under this policy was proved for two users in [3] and for multiple users in [4]. Their approximating model is a reflecting Brownian motion (RBM) living in the two-dimensional positive quadrant or in the general-dimensional positive orthant.
In [43, 41, 22], scheduling policies were studied for certain heavily loaded wireless systems with a finite-state discrete-time Markov fading process. In particular, [43] considered a MaxWeight scheduling policy for a generalized switch. It showed that the workload process converges to a one-dimensional RBM and that the MaxWeight policy asymptotically minimizes the workload under certain conditions. Moreover, an exponential scheduling rule was designed for wireless channels in [41] and for a generalized switch in [22]. This rule was proved to be throughput-optimal, and results on the workload process similar to those in [43] were obtained. In addition, [54] designed a utility-maximizing resource allocation policy for a class of stochastic networks with concurrent occupancy of resources and established its asymptotic optimality for the associated heavily loaded queueing system. Their policy covers the generalized rule in [36] and the MaxWeight policy in [43] as special cases.
The current study differs from those in [43, 41, 22, 54] in the following three aspects.
First, the scheduling policies in [43, 41, 22, 54] depend only on a fixed capacity region that is a convex polyhedron. Ours depends on a time-varying, stochastically evolving capacity region process (a random environment), which at each time instant is a more general convex region rather than a convex polyhedron.
Second, the packet arrival rates to the users are random processes rather than constants as in [43, 41, 22, 54]. Hence, our input traffic to each user is a DSRP. A particular case of the DSRP is the well-known doubly stochastic Poisson process (see, e.g., [8]), which is known as the Markov-modulated Poisson process (MMPP) or ON/OFF source. It is widely used to model voice, video, and data source traffic in telecommunication systems (see, e.g., [30], [37], [44], [20], and [21]).
Third, our discussion is based on a continuous time horizon rather than a discrete one as in [43, 41, 22]. Therefore, our vector-valued random service rate process depends on the FSCTMC, and the holding time in each environmental state has an important impact on the limiting processes. For example, the limiting fluid model is a random process driven by the FSCTMC rather than a deterministic function of time. Likewise, the limiting diffusion model is a more general RDRS rather than an RBM as derived in [43, 41, 22]. To directly generalize the studies in [43, 41, 22] to a discrete-time random environment, one may impose a geometric distribution on the holding time in each environmental state.
Finally, CTMCs have been used to model random environments in studies of some queueing systems under certain static service disciplines. These studies do not aim at optimal dynamic scheduling or at utility/cost and performance optimization; see, e.g., [13] and references therein for more details.
The rest of the paper is organized as follows. In Section 2, we introduce our generalized processor-sharing queues in a random environment and design our optimal rate scheduling policy. In Section 3, we introduce our heavy traffic condition and present our main asymptotic optimality theorem. In Section 4, we illustrate the use of our optimal policy and main results in the user MIMO uplink and downlink wireless channels. We also present the associated results on the piecewise smoothness of the capacity boundaries of the user MIMO MAC and MIMO BC. In Sections 5 and 6, we prove our main theorem and the associated lemmas.
2 Optimizing Processor-Sharing Queues under Random Environment
2.1 Primitive Data
The queueing system under consideration is a type of generalized processor-sharing queues that live in a random environment. This environment evolves according to a stationary FSCTMC , which takes values in a finite state space with generator matrix () and
(2.1) 
where is the holding rate for the chain in an environmental state and is the transition matrix of its embedded discrete-time Markov chain (see, e.g., [39]). Moreover, the queueing system has queues in parallel, which correspond to users for a given positive integer . Each queue has infinite buffer capacity and buffers the packets (jobs) arriving for a given user. The queues can be served simultaneously by a single server with rate allocation vector , which takes values in a time-varying, randomly evolving capacity set .
Concretely, for each state , is a convex set that contains the origin and has boundary pieces. Of these, are dimensional linear facets along the coordinate axes. The remaining ones lie in the interior of and form the so-called capacity surface, denoted by , which consists of linear or smooth curved facets on for , i.e.,
(2.2) 
Moreover, if we let denote the sum capacity upper bound for the capacity region, then the facet in the center of the capacity surface is linear and can be expressed by
(2.3) 
where is the index corresponding to . Moreover, we suppose that each of the linear facets along the coordinate axes forms a user capacity region corresponding to a particular group of users who are the only users in the system. Similarly, we can define the user capacity region for each . Examples of such capacity sets in two- and three-dimensional spaces for a particular state are shown in Figures 1 and 2.
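The environment in (2.1) is specified through holding rates and an embedded jump chain. The sketch below assumes the standard relation q_ij = λ_i p_ij for i ≠ j and q_ii = −λ_i, that is, Q = diag(λ)(P − I) with p_ii = 0. This is presumably what (2.1) expresses. The function name and the three-state channel are hypothetical illustrations only.

```python
def generator_from_embedded(holding_rates, P):
    """Generator Q of a finite-state CTMC from its holding rates and embedded chain.

    Assumes the standard relation q_ij = lambda_i * p_ij for i != j and
    q_ii = -lambda_i (equivalently Q = diag(lambda) (P - I) with p_ii = 0).
    """
    n = len(holding_rates)
    for i in range(n):
        if P[i][i] != 0.0:
            raise ValueError("the embedded jump chain must have p_ii = 0")
        if abs(sum(P[i]) - 1.0) > 1e-9:
            raise ValueError("each row of P must sum to one")
    # Off-diagonal: rate of leaving i times probability of jumping to j.
    return [[-holding_rates[i] if i == j else holding_rates[i] * P[i][j]
             for j in range(n)] for i in range(n)]

# Hypothetical three-state fading channel ("good", "medium", "bad").
lam = [0.5, 1.0, 2.0]
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
Q = generator_from_embedded(lam, P)
print(Q)  # [[-0.5, 0.5, 0.0], [0.5, -1.0, 0.5], [0.0, 2.0, -2.0]]
```

Each row of the resulting generator sums to zero. A longer mean sojourn in a state corresponds to a smaller holding rate λ_i.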
In addition, we suppose that the system starts empty and that there is a dimensional packet arrival process , where with and is the number of packets arriving at the th queue during , and the prime denotes the transpose of a vector or a matrix. For each , is assumed to be a DSRP with random arrival rate process and squared coefficient of variation process . The packet interarrival times are assumed to be i.i.d. during the time interval corresponding to a specific environmental state . Moreover, let denote the sequence of times between the arrivals of the th and the th packets to the th queue. Let denote the sequence of packet lengths (in bits) for the successive arrivals to queue , which is assumed to be a sequence of strictly positive i.i.d. random variables with average packet length and squared coefficient of variation . In addition, we suppose that all interarrival and service time processes are mutually (conditionally) independent when the environmental state is fixed. For each and each nonnegative constant (in bits), we use to denote the renewal counting process associated with , i.e.,
(2.4) 
The DSRP assumption on the packet arrivals and the i.i.d. assumption on the packet sizes in a communication system are supported by the large-scale computer experiments and statistical analysis conducted by Bell Labs scientists [10] and by the recent findings in [20] and [21].
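The MMPP is the special case of a DSRP with exponential interarrival times. The Python sketch below simulates the arrival epochs to a single queue when the arrival rate switches with the environment. The generator, rates, horizon, and function name are hypothetical. A general DSRP would replace the exponential draws with state-dependent renewal interarrivals and would then also need a rule for a pending interarrival at a state switch. The memoryless exponential case avoids that issue.

```python
import random

def simulate_mmpp(Q, rates, T, state0=0, seed=0):
    """Simulate arrival epochs on [0, T) of a Markov-modulated Poisson process.

    Q     -- generator of an irreducible environment CTMC (rows sum to zero)
    rates -- rates[k] is the Poisson arrival rate while the environment is in state k
    Exponential interarrival times are memoryless, so the pending interarrival
    time can simply be redrawn whenever the environment switches state.
    """
    rng = random.Random(seed)
    t, k, arrivals = 0.0, state0, []
    while t < T:
        end = min(t + rng.expovariate(-Q[k][k]), T)   # sojourn in state k
        s = t
        while rates[k] > 0:                             # arrivals during [t, end)
            s += rng.expovariate(rates[k])
            if s >= end:
                break
            arrivals.append(s)
        t = end
        if t < T:                                       # jump of the embedded chain
            u = rng.random() * (-Q[k][k])
            nxt = k
            for j, q_kj in enumerate(Q[k]):
                if j != k and q_kj > 0:
                    nxt = j
                    u -= q_kj
                    if u < 0:
                        break
            k = nxt
    return arrivals

# Hypothetical two-state environment switching at rate 1 in each direction.
Q = [[-1.0, 1.0], [1.0, -1.0]]
T = 20000.0
arr = simulate_mmpp(Q, rates=[2.0, 6.0], T=T, seed=1)
print(len(arr) / T)  # near the stationary mean rate 0.5 * 2 + 0.5 * 6 = 4
```

The empirical rate settles near the stationary average of the state-dependent rates. Counts over shorter windows fluctuate more than for a Poisson process with the same mean, which is the burstiness that motivates the DSRP model.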
2.2 A Utility-Maximization Scheduling Algorithm and Queueing Dynamics
First of all, we remark that the service discipline used in this paper is the so-called head-of-line discipline. Under this discipline, service goes to the packet at the head of the line of a serving queue, whose packets are stored in the order of their arrival. The service rates are determined by a function of the environmental state and the number of packets in each queue. At each state and for a given queue length vector , let denote the corresponding rate vector (in bps) for serving the queues. This vector is a solution of the following utility maximization problem
(2.5) 
where is a dimensional vector and, for each , is a utility function defined on , which is second-order differentiable and satisfies the following conditions
(2.6)  
(2.7)  
(2.8)  
(2.9) 
Due to condition (2.7), we know that there must exist an optimal solution in the following form for a given ,
(2.10) 
where for each and if . Moreover, denotes the set of all that have exactly components () equal to zero. The components of corresponding to () constitute the optimal solution to (2.5) with the capacity region replaced by the corresponding user capacity region, and all other components of are zero. For example, when there are only two users in the system, (2.10) takes the following form,
Remark 2.1
The optimal solution to (2.5) may not be unique when for some . However, if with for some , we can reset to zero without violating the constraints or decreasing the objective value (see (2.7)). Hence, whenever the solution to (2.5) is concerned, we always suppose that is true. Moreover, for each (and similarly for a lower-dimensional case), it follows from (2.7) that every point on the capacity surface defined in (2.2) is a Nash equilibrium point of a concave game in the sense of [40]. Therefore, the solution to (2.5) is a social optimal Nash equilibrium point of the concave game.
In addition, we assume that satisfies the so-called radial homogeneity condition, i.e., for any scalar , each and each , its maximizer satisfies
(2.11) 
Interested readers are referred to [54] for numerous examples of utility functions that satisfy conditions (2.6)-(2.9) and (2.11), such as the so-called proportional fair allocation, minimal delay allocation, and proportionally fair allocation, which are widely used in communication protocols.
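The following sketch gives one concrete instance of (2.5) in a single environmental state. It assumes the weighted proportional-fair objective Σ_i q_i log c_i. This form is an assumption for illustration, since the general utility in (2.5) is not restricted to it. The capacity set is simplified to per-user caps plus one sum-rate cap, which coincides with the pentagon of a two-user MAC. The KKT conditions give c_i = min(cap_i, q_i/ν), and the multiplier ν is found by bisection. The function name `pf_rates` is hypothetical.

```python
def pf_rates(q, caps, sum_cap, rel_tol=1e-12):
    """Weighted proportional-fair rates on a simplified capacity set.

    Maximizes sum_i q_i * log(c_i) subject to c_i <= caps[i] and
    sum_i c_i <= sum_cap. KKT gives c_i = min(caps[i], q_i / nu), where the
    multiplier nu > 0 makes the sum constraint tight when it binds.
    Queues with q_i = 0 receive c_i = 0 (cf. Remark 2.1).
    """
    active = [i for i, qi in enumerate(q) if qi > 0]
    c = [0.0] * len(q)
    if sum(caps[i] for i in active) <= sum_cap:
        for i in active:          # sum constraint slack: every active user at its cap
            c[i] = caps[i]
        return c

    def total(nu):                # nonincreasing in nu
        return sum(min(caps[i], q[i] / nu) for i in active)

    lo = hi = 1.0
    while total(hi) > sum_cap:    # bracket: total(hi) <= sum_cap < total(lo)
        hi *= 2.0
    while total(lo) <= sum_cap:
        lo /= 2.0
    while hi - lo > rel_tol * hi: # bisection on the multiplier nu
        mid = 0.5 * (lo + hi)
        if total(mid) > sum_cap:
            lo = mid
        else:
            hi = mid
    for i in active:
        c[i] = min(caps[i], q[i] / hi)
    return c

# Hypothetical two-user pentagon: single-user caps (2, 3), sum-rate cap 4.
print(pf_rates([3.0, 1.0], [2.0, 3.0], 4.0))  # approximately [2.0, 2.0]
```

Scaling every queue length by the same factor scales ν by that factor and leaves the rates unchanged. This is the radial homogeneity property (2.11) for this particular utility.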
2.3 The Dual Cost Minimization Problem
In this subsection, we consider the following cost minimization problem for each , a given and a given parameter ,
(2.12)  
where the function is defined by
(2.13) 
and is the cost function associated with the utility function in (2.5), i.e.,
(2.14) 
In other words, when the environment is in state , we try to identify a queue state corresponding to a given and a given parameter such that the total cost over the system is minimized and the (average) workload meets or exceeds .
2.4 Performance Measure Processes
Let denote the queue length for the th queue with at each time , i.e.,
(2.15) 
where is the number of packet departures from the th queue in , i.e., , where
(2.16) 
which denotes the cumulative amount of service (measured in bits) given to the th queue up to time . Moreover, let denote the (expected) workload at time and denote the unused capacity up to time , i.e.,
(2.17) 
where, for each , is a given point on the capacity surface and it is chosen to satisfy
(2.18) 
Here we remark that the second condition in (2.18) and the separability condition in (2.8) are required in proving Lemmas 5.5 and 5.6. However, when only a constant environment (e.g., a pseudo channel in a wireless system) is concerned, these two conditions can be removed. Obviously,
(2.19) 
since, for each , we have
(2.20) 
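The sketch below assumes that (2.15)-(2.16) take the usual form, in which the departures from a queue equal the renewal counting process (2.4) of the packet lengths evaluated at the cumulative service (in bits) given to that queue. Under this assumption, the departures and the queue length for one queue under head-of-line service can be computed as follows. The helper names and numbers are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

def departures(packet_lengths, served_bits):
    """Renewal counting process of the packet lengths evaluated at the cumulative
    service: the number of head-of-line packets whose total length fits within
    served_bits, i.e. max{n : v_1 + ... + v_n <= served_bits}."""
    partial_sums = list(accumulate(packet_lengths))   # v_1, v_1 + v_2, ...
    return bisect_right(partial_sums, served_bits)

def queue_length(arrived_lengths, served_bits):
    """Queue length = packets arrived minus packets fully served (FIFO order)."""
    return len(arrived_lengths) - departures(arrived_lengths, served_bits)

lengths = [100, 200, 50]          # hypothetical packet lengths in bits
print(departures(lengths, 299))   # 1: the second packet still needs 1 bit
print(queue_length(lengths, 300)) # 1
```

A packet leaves only when the cumulative service covers its full length, so a partially served head-of-line packet is still counted in the queue.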
3 Main Theorem: Asymptotic Optimality
In this section, we present the optimality result for our scheduling policy by considering the operation of the queueing system in the asymptotic regime where it is heavily loaded. Concretely, we define three sequences of diffusion-scaled processes , and by
(3.21) 
for each and , which are associated with a sequence of independent Markov processes . These systems indexed by all have the same basic structure as described in the last section, except for the arrival rates and the holding time rates for all . These may vary with and satisfy the following heavy traffic condition
(3.22) 
for each , where are some constants and are the nominal average packet arrival rates when the channel is in state .
Note that, due to the heavy traffic condition in (3.22) for the th environmental state process with , and are equal in distribution, since they have the same generator matrix (see, e.g., the definition on pages 384-388 of [39]). Hence, in the sense of distribution, all of the systems indexed by in (3.21) share the same random environment over any time interval .
Moreover, let and denote two independent dimensional standard Brownian motions, and for each , let
(3.23)  
(3.24)  
(3.25)  
(3.26)  
(3.27)  
(3.28)  
(3.29) 
In addition, let and denote the diffusion-scaled queue-length and workload processes under an arbitrary feasible rate scheduling policy . Examples include a simple Markovian policy as studied in [4] or a policy that may not be the optimal solution to the utility maximization problem (2.5). Then we have the following theorem.
Theorem 3.1
Suppose that for all and that the heavy traffic condition (3.22) holds. Then, under the scheduling policy (2.10), the claims stated in the following two parts hold:
Part A: Along , the following convergence in distribution is true,
(3.30) 
and the limits and are continuous a.s., which satisfy the following RDRS
(3.31) 
where
(3.32) 
Moreover, is the unique solution of (3.31) with the following complementary property:

,

is nondecreasing,

can increase only at a time that .
In addition, we have
(3.33) 
with being the solution to the cost minimization problem (2.12) in terms of each given and .
Part B: The workload and the cost are minimal with probability one in the sense that, for all ,
(3.34)  
(3.35) 
Remark 3.1
Compared with the RBM widely studied in the queueing literature, the RDRS model derived in (3.31) exhibits a new feature. It corresponds to a more realistic FSCTMC fading process in applications such as wireless systems. It also indicates that the random process is a non-negligible environmental factor for system performance, even in the limiting approximation model. From the model, we can also see that when a constant environment (e.g., a quasi-static channel in a wireless system) is concerned, the model in (3.31) reduces to an RBM, since the state process remains constant. Moreover, by the discussions in [11], [18], [17], and [26], the unique solution to (3.31) can be represented by , where and are Lipschitz continuous mappings. In addition, an RDRS differs from a conventional SDE because its drift and diffusion coefficients are not adapted to the filtration generated by the driving Brownian motions. SDEs of this type without boundary reflection have received considerable attention in financial engineering (see, e.g., [57]).
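The complementary property in Theorem 3.1 is that of a Skorohod problem. In one dimension, the reflection map has the explicit form y(t) = max(0, sup_{s≤t} (−x(s))) and w = x + y, which is Lipschitz in the sup norm. The Python sketch below applies a discretized version of this map to an Euler path whose drift and volatility switch with a CTMC. Because the coefficients depend only on the regime, reflecting the free path is enough. The sketch assumes a one-dimensional workload. The coefficients, generator, step size, and function names are hypothetical, and the per-step regime switching is only a crude approximation of the CTMC.

```python
import math
import random

def skorohod_1d(x):
    """One-dimensional Skorohod map on a discretized path with x[0] >= 0.

    Returns (w, y) with w = x + y >= 0, y nondecreasing with y[0] = 0,
    and y increasing only at steps where w hits 0:
    y[n] = max(0, max_{m <= n} -x[m]).
    """
    w, y, running = [], [], 0.0
    for xn in x:
        running = max(running, -xn)
        w.append(xn + running)
        y.append(running)
    return w, y

def simulate_rdrs(Q, drift, sigma, w0, T, n_steps, state0=0, seed=0):
    """Crude Euler sketch of a 1-D reflected diffusion with regime switching.

    The free path is dX = drift[k] dt + sigma[k] dB, where k is the current
    state of a CTMC with generator Q; since the coefficients depend only on k,
    the reflected path is obtained by applying the Skorohod map to X.
    The CTMC is approximated on the time grid (at most one jump per step).
    """
    rng = random.Random(seed)
    dt = T / n_steps
    x, k = [w0], state0
    for _ in range(n_steps):
        x.append(x[-1] + drift[k] * dt + sigma[k] * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        rate = -Q[k][k]
        if rng.random() < 1.0 - math.exp(-rate * dt):   # a regime switch occurs
            u, nxt = rng.random() * rate, k
            for j, q_kj in enumerate(Q[k]):
                if j != k and q_kj > 0:
                    nxt = j
                    u -= q_kj
                    if u < 0:
                        break
            k = nxt
    return skorohod_1d(x)

# Hypothetical two regimes: negative drift in both, different volatility.
w, y = simulate_rdrs([[-0.5, 0.5], [2.0, -2.0]], drift=[-1.0, -0.2],
                     sigma=[1.0, 2.0], w0=0.0, T=10.0, n_steps=10000, seed=3)
```

When the generator has a single state, the regime never changes and the same code produces a discretized one-dimensional RBM. This mirrors the reduction described in the remark above.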
4 Applications to User MIMO Uplink and Downlink Wireless Channels
In this section, we apply the discussions in the previous sections to a cellular system where base stations cooperate over noise-free, infinite-capacity links. We make no distinction between a single-cell cellular system having multiple base-station antennas and a traditional cellular system with cooperating single-antenna base stations. Here, cooperation means that the base stations can perform joint beamforming and/or power control, subject to a constraint on the total power that the base stations share. Therefore, our wireless system can be considered as consisting of a base station having antennas and users (mobiles), each of which has antennas. Thus, the uplink channel can be modeled as a user MIMO MAC and the downlink channel as a user MIMO BC (see, e.g., Figure 3).
The channel fading is assumed to follow the stationary FSCTMC described in the previous sections. Moreover, we suppose that the receive or transmit end (the cooperating base stations) has perfect CSI. For each channel state , we let () denote the downlink channel matrix from the base station to user . Assuming that the same channel is used on the uplink and the downlink, the uplink matrix of user is , the conjugate transpose of .
Moreover, at the transmit end, arriving packets for each user are buffered before transmission, and the arrival rate is a random process that switches with the FSCTMC channel fading through admission control. Therefore, the processor-sharing queues presented in the previous sections can be used to model the channel dynamics for both the user MIMO MAC and the user MIMO BC. The remaining issue is how to characterize the MAC and BC capacity region processes, which is also a central topic in the information theory literature.
4.1 The MIMO MAC Capacity Region
In the MAC and for each channel state , let be the transmitted signal of user , where denotes the complex matrix. Let denote the received signal and denote the noise vector, where is circularly symmetric complex Gaussian with identity covariance. Note that the notation here has a different meaning from the workload process defined in (2.17). Then the received signal at the base station is equal to
(4.36) 
where and (see, e.g., Figure 3). Moreover, each user is subject to an individual power constraint . The transmit covariance matrix of user is defined to be . The power constraint implies that Tr for . During the period of each channel state , it follows from [25] and [56] that the MAC capacity region is a dimensional closed convex set in , i.e.,
(4.37)  
where is a subset of and denotes the determinant of a matrix. Moreover, every point in can be achieved by Shannon's channel coding theorem and successive decoding (see, e.g., [24] and [25]). However, designing a utility-maximization based rate scheduling policy requires a more detailed characterization of the boundary of the MAC capacity region, since such a policy relies on the KKT optimality conditions (see, e.g., [35] and [34]). Thus we have the following lemma.
Lemma 4.1
For the user MIMO MAC and each channel state , contains the origin and has linear or smooth curved facets with given by
(4.38) 
Moreover, of these pieces are dimensional linear facets along the coordinate axes while the remaining ones are in the interior of and form , which are linear or smooth curved facets on for , i.e.,
(4.39) 
Moreover, if is used to denote the sum capacity upper bound for the MAC capacity region, then
(4.40) 
where is the index corresponding to .
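The constraint structure in (4.37) can be made concrete in the single-transmit-antenna case. Each user then has a scalar transmit power, and the region reduces to the standard polymatroid {R ≥ 0 : Σ_{i∈S} R_i ≤ log det(I + Σ_{i∈S} P_i h_i h_i^H) for every nonempty user subset S}. We assume this is the form behind Example 4.1. The sketch writes h_i for user i's uplink receive vector, the conjugate transpose of its downlink row, and uses log base 2 (bits per channel use). The function names are hypothetical, and the determinant is computed by plain Gaussian elimination to keep the block dependency-free.

```python
import math
from itertools import combinations

def det_complex(A):
    """Determinant of a square complex matrix via Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[complex(v) for v in row] for row in A]
    det = complex(1.0)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[piv][col] == 0:
            return complex(0.0)
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            det = -det
        det *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
    return det

def mac_constraints(h, P):
    """Sum-rate bounds (bits/channel use) of a MIMO MAC with single-antenna users.

    h[i] is user i's M x 1 uplink channel vector, P[i] its transmit power, and
    the receiver noise has identity covariance. Returns, for every nonempty
    subset S of users, log2 det(I + sum_{i in S} P_i h_i h_i^H).
    """
    K, M = len(h), len(h[0])
    bounds = {}
    for size in range(1, K + 1):
        for S in combinations(range(K), size):
            A = [[(1.0 if r == c else 0.0)
                  + sum(P[i] * h[i][r] * complex(h[i][c]).conjugate() for i in S)
                  for c in range(M)] for r in range(M)]
            # I + sum P_i h_i h_i^H is Hermitian positive definite: det is real > 0
            bounds[S] = math.log2(det_complex(A).real)
    return bounds

# Two hypothetical single-antenna users and a single receive antenna:
# the bounds are R1 <= 1, R2 <= 1, R1 + R2 <= log2(3) bits.
print(mac_constraints([[1.0], [1.0]], [1.0, 1.0]))
```

The bound for the full user set is the sum capacity. The constraint for the full set corresponds to the central linear facet in (4.40), and the constraints for subsets give the remaining facets.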
Example 4.1
For the MAC and each , when and (i.e., each user's mobile has only a single transmit antenna), it follows from [25] that
4.2 The MIMO BC Capacity Region
In the MIMO BC and for each channel state , let denote the transmitted vector signal from the base station and let be the received signal at user . The noise at user is represented by and is assumed to be circularly symmetric complex Gaussian noise . The received signal of user (see, e.g., Figure 3) is equal to
The transmit covariance matrix of the input signal is . The base station is subject to an average power constraint, which implies that Tr(. During each channel state , the user MIMO BC capacity region, denoted by , can be obtained via the duality between the MAC and the BC in [31] and [25]. Specifically, the BC capacity region is the convex hull of the union of the capacity regions of the dual MIMO MACs, taken over all MAC power allocations whose total power equals the power in the BC, i.e.,
(4.41) 
Moreover, the dirty paper coding (DPC) scheme proposed in [15] achieves the capacity region of the MIMO BC (see, e.g., [50]). In particular, if each user has only a single receive antenna, we have the following lemma.
Lemma 4.2
For the user MIMO BC with , each and given in (4.38), contains the origin and has boundary pieces. Of these, are dimensional linear facets along the coordinate axes. The remaining ones lie in the interior of and form , which are linear or smooth curved facets on for , i.e.,
(4.42) 
Moreover, if denotes the sum capacity upper bound (called the Sato upper bound) for the BC capacity region, then
(4.43) 
where is the index corresponding to .
Remark 4.1
5 Proof of Theorem 3.1
For the reader's convenience, we first outline the proof of Theorem 3.1, which consists of the following five parts.
First, in Subsection 5.1, we establish a dual relationship between the utility-maximization problem in (2.5) and the cost-minimization problem in (2.12), which is summarized in Lemma 5.2. Then, in Lemma 5.3, we show that the capacity of the system is fully utilized when the system state is close to the unique optimal solution to the cost minimization problem (called a fixed point). The claims stated in Lemmas 5.2 and 5.3 are similar to their counterparts in [54]. Nevertheless, their proofs differ because the problem formulations and the capacity constraints differ between the two studies.
Second, in Subsection 5.2, we present an equivalent queueing model based on the assumption (3.22) imposed on the FSCTMC. We also justify a functional central limit theorem (Lemma 5.4) for a DSRP whose arrival rate process is driven by the FSCTMC. The main idea used in proving Lemma 5.4 stems from the related discussions in [18] and [17]. The specific proof techniques include the conventional functional central limit theorem (see, e.g., [28] and [38]), the random change of time lemma (see, e.g., [5]), the establishment of an oscillation inequality (see, e.g., [18], [17]), equivalent conditions for relative compactness, and the Skorohod representation theorem (see, e.g., [23]).
Third, in Subsection 5.3, we derive the fluid limit processes for the physical processes under fluid scaling in Lemma 5.5 and study the asymptotic behavior of the fluid limit processes as time evolves in Lemma 5.6. Fluid limits are widely used as an intermediate step in justifying diffusion approximations (see, e.g., [7], [43], [54], [19], [3], [4], and references therein). Nevertheless, our fluid limit is a random process driven by the FSCTMC rather than a deterministic function of time as obtained in existing studies. This new feature makes the proofs of Lemma 5.5 and Lemma 5.6 more complex. For example, compared with the study in [54], more technical treatment is required to handle the FSCTMC-based jumps of the constructed Lyapunov function. Therefore, taking into account this new feature and the difference between our optimal scheduling policy and the one in [54], we combine and generalize the discussions in [54], [16], [3], and [4] to complete the proofs of Lemma 5.5 and Lemma 5.6.
Fourth, in Subsection 5.4, we study the convergence of the workload and queue length processes on a finer timescale, which is an important step in justifying the main result of the paper. This method has appeared in the queueing literature for some time (see, e.g., [6], [54], [43], [36], [41]). The main difference between our work and existing works is that all the processes in our study involve jumps introduced by the random environment, whereas the processes in existing studies do not. Therefore, we develop a scheme and incorporate it into the framework used in [54] to prove the convergence properties of the processes on a finer timescale.
Finally, in Subsection 5.5, we prove Theorem 3.1 by combining the results obtained in the previous subsections with the uniqueness of the solution to an associated Skorohod problem and the minimality of the Skorohod problem. Techniques of this type have been used in studies of network scheduling (see, e.g., [54], [43], [36], [41]). Nevertheless, our justification logic and technical treatment are somewhat different.
5.1 Preliminary Lemmas on the Utility-Maximization and Dual Cost Minimization Problems
Lemma 5.1
Proof. For each fixed state , the proof is similar to that of Lemma 6.2 in [53] and is therefore omitted.
Lemma 5.2
For each state , the following claims are true.
Proof. First of all, without loss of generality, we suppose that . Then it follows from the KKT optimality conditions (see, e.g., [35]) that the solution to the utility maximization problem in (2.5) can be obtained through the following equations,
(5.48)  
(5.49) 
where and are defined in (2.2), for all are the Lagrangian multipliers and for each and is defined in (2.2). Similarly, the solution to the cost minimization problem (2.12) can be obtained through the following equations,
(5.50)  
(5.51) 
where is the Lagrangian multiplier. Moreover, it follows from (2.14) that
(5.52) 
Based on the above facts, the first part of the lemma can be proved as follows. By condition (2.7), is strictly concave in for each . Therefore, is the unique optimal solution to the utility maximization problem in (2.5) for the given in the utility function, and it satisfies (5.48)-(5.49). Thus, if we take
then it follows from (5.48) and (5.52) that (5.50) holds. Due to condition (2.9), we know that is strictly convex in for each . So the cost minimization problem in (2.12) has a unique optimal solution when is in the cost function and is in the constraints.
Conversely, the second part of the lemma can be proved as follows. Due to conditions (2.8)-(2.9) and the relationship (2.14), is strictly convex in . Therefore, is the unique optimal solution to the cost minimization problem (2.12) with . We now prove the claim by contradiction.
In fact, without loss of generality, we suppose that there is some with such that with and , where is defined in (2.10). Then we can construct a dimensional line for some constant ,
(5.53) 
such that it passes through the point . Now it follows from (2.14) that the function (), under the constraint for all , has the following derivative with respect to ,
(5.54) 
which is strictly increasing in due to (2.8). Moreover, it follows from (5.54) and (2.9) that
(5.55)  
(5.56) 