Chen-Feng Liu, Mehdi Bennis, Mérouane Debbah, and H. Vincent Poor
(Invited Paper)
This work was supported in part by the U.S. National Science Foundation under Grant CNS-1702808. This paper was presented in part at the IEEE Global Communications Conference Workshops, Singapore, December 2017 [1]. C.-F. Liu and M. Bennis are with the Centre for Wireless Communications, University of Oulu, 90014 Oulu, Finland (e-mail: chen-feng.liu@oulu.fi; mehdi.bennis@oulu.fi). M. Debbah is with the Large Networks and System Group, CentraleSupélec, Université Paris–Saclay, 91192 Gif-sur-Yvette, France, and also with the Mathematical and Algorithmic Sciences Laboratory, Huawei France Research and Development, 92100 Paris, France (e-mail: merouane.debbah@huawei.com). H. V. Poor is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA (e-mail: poor@princeton.edu).
###### Abstract

Index Terms: 5G and beyond, mobile edge computing (MEC), fog networking and computing, ultra-reliable low latency communications (URLLC), extreme value theory.

## I Introduction

### I-B Our Contribution

While conventional communication networks were engineered to boost network capacity, little attention has been paid to reliability and latency performance. Indeed, ultra-reliable and low latency communication (URLLC) is one of the pillars for enabling 5G and is currently receiving significant attention in both academia and industry [23, 24, 25]. Regarding the existing MEC literature, the vast majority considers the average delay as the performance metric or the quality-of-service requirement [13, 14, 15, 16, 17, 18, 19, 20, 21, 22]. In other words, these system designs view latency through the lens of the average. The works that address the stochastic nature of the task arrival process [7, 8, 9, 10, 11, 12] are primarily concerned with maintaining the mean rate stability of task queues, i.e., ensuring a finite average queue length as time evolves [26]. However, focusing merely on average-based performance is not sufficient to guarantee URLLC for mission-critical applications, which mandates a further examination of the bound violation probability, high-order statistics, the characterization of extreme events with very low occurrence probabilities, and so forth [24].

The remainder of this paper is organized as follows. The system model is first specified in Section II. Subsequently, we formulate the latency requirements, reliability constraints, and the studied optimization problem in Section III. In Section IV, we specify in detail the proposed UE-server association mechanism as well as the latency and reliability-aware task offloading and resource allocation framework. The network performance is evaluated numerically and discussed in Section V, which is followed by the conclusions in Section VI. Furthermore, for the sake of readability, we list all notations in Table II in Appendix A. Each notation is defined in detail in the following sections.

## II System Model

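$$
\begin{cases}
\eta_{ij}(n) \in \{0, 1\}, & \forall\, i \in \mathcal{U},\ j \in \mathcal{S}, \\
\sum_{j \in \mathcal{S}} \eta_{ij}(n) = 1, & \forall\, i \in \mathcal{U}.
\end{cases}
\tag{1}
$$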

Subsequently, in each time slot, i.e., the short/fast timescale, within the $n$th frame, each UE dynamically offloads part of its tasks to the associated MEC server and computes the remaining tasks locally. The network architecture and timeline of the considered MEC network are shown in Fig. 1.

### II-A Traffic Model at the UE Side

Each UE uses one application in which tasks arrive in a stochastic manner. Following the data-partition model [5], we assume that each task can be computed locally, i.e., at the UE, or remotely, i.e., at the server. Different tasks are independent and can be computed in parallel. Thus, given the task arrivals $A_i(t)$ in time slot $t$, each UE $i$ divides its arrivals into two disjoint parts: one part is executed locally, while the remaining tasks are offloaded to the server. Task splitting at UE $i$ can be expressed as

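$$
\begin{cases}
A_i(t) = A_i^{\mathrm{L}}(t) + A_i^{\mathrm{O}}(t), \\
A_i^{\mathrm{L}}(t),\, A_i^{\mathrm{O}}(t) \in \{0, A_{\mathrm{unit}}, 2A_{\mathrm{unit}}, \cdots\}.
\end{cases}
\tag{2}
$$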

Here, $A_{\mathrm{unit}}$ represents the unit task, which cannot be further split. Moreover, we assume that task arrivals are independent and identically distributed (i.i.d.) over time with a given average arrival rate.

Each UE has two queue buffers to store the split tasks for local computation and offloading. For UE $i$'s local-computation queue, the queue length (in bits) in time slot $t$ is denoted by $Q_i^{\mathrm{L}}(t)$, which evolves as

$$
Q_i^{\mathrm{L}}(t+1) = \max\Big\{ Q_i^{\mathrm{L}}(t) + A_i^{\mathrm{L}}(t) - \frac{\tau f_i(t)}{L_i},\, 0 \Big\}, \quad \forall\, i \in \mathcal{U}.
\tag{3}
$$

Here, $f_i(t)$ (in cycle/sec) is UE $i$'s allocated CPU-cycle frequency to execute tasks, while $L_i$ accounts for the required CPU cycles per bit for computation, i.e., the processing density. The magnitude of the processing density depends on the performed application. (For example, the six-queen puzzle, 400-frame video game, seven-queen puzzle, face recognition, and virus scanning require processing densities of 1760 cycle/bit, 2640 cycle/bit, 8250 cycle/bit, 31680 cycle/bit, and 36992 cycle/bit, respectively [7].) Furthermore, given a CPU-cycle frequency $f_i(t)$, the UE consumes the power $\kappa [f_i(t)]^3$ for computation, where $\kappa$ is a parameter affected by the device's hardware implementation [10, 29]. For UE $i$'s task-offloading queue, we denote the queue length (in bits) in time slot $t$ as $Q_i^{\mathrm{O}}(t)$. Analogously, the task-offloading queue dynamics are given by

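$$
Q_i^{\mathrm{O}}(t+1) = \max\Big\{ Q_i^{\mathrm{O}}(t) + A_i^{\mathrm{O}}(t) - \sum_{j \in \mathcal{S}} \tau R_{ij}(t),\, 0 \Big\}, \quad \forall\, i \in \mathcal{U},
\tag{4}
$$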

in which

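$$
R_{ij}(t) = W \log_2\bigg( 1 + \frac{\eta_{ij}(n) P_i(t) h_{ij}(t)}{N_0 W + \sum_{i' \in \mathcal{U} \setminus i} \eta_{i'j}(n) P_{i'}(t) h_{i'j}(t)} \bigg), \quad \forall\, i \in \mathcal{U},\ j \in \mathcal{S}.
\tag{5}
$$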
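The queue updates (3) and (4) and the rate expression (5) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the dictionary-based data layout, and the numerical values in the usage note are assumptions made for the example, and $\tau$ is taken to be the slot length in seconds.

```python
import math

def offloading_rate(i, j, eta, P, h, W, N0):
    """Rate (5) in bit/s from UE i to server j.

    eta and h are dicts keyed by (ue, server); P is a dict keyed by ue.
    The data layout is an assumption made for this sketch.
    """
    # Interference from the other UEs associated with the same server j.
    interference = sum(eta[(k, j)] * P[k] * h[(k, j)] for k in P if k != i)
    sinr = eta[(i, j)] * P[i] * h[(i, j)] / (N0 * W + interference)
    return W * math.log2(1.0 + sinr)

def update_local_queue(q_local, a_local, f, L, tau):
    """Local-computation queue update (3), in bits."""
    return max(q_local + a_local - tau * f / L, 0.0)

def update_offload_queue(q_off, a_off, rates, tau):
    """Task-offloading queue update (4); rates holds R_ij(t) for all servers j."""
    return max(q_off + a_off - tau * sum(rates), 0.0)
```

For instance, with a single UE and a single server, `W = 1`, `N0 = 1`, unit channel gain, and a transmit power of 3, the signal-to-noise ratio is 3 and the sketch returns a rate of $\log_2 4 = 2$ bit/s.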

In order to minimize the total power consumption of resource allocation for local computation and task offloading, the UE adopts the dynamic voltage and frequency scaling (DVFS) capability to adaptively adjust its CPU-cycle frequency [5, 29]. Thus, to allocate the CPU-cycle frequency and transmit power, we impose the following constraints at each UE $i \in \mathcal{U}$, i.e.,

$$
\begin{cases}
\kappa [f_i(t)]^3 + P_i(t) \leq P_i^{\max}, \\
f_i(t) \geq 0, \\
P_i(t) \geq 0,
\end{cases}
\tag{6}
$$

where $P_i^{\max}$ is UE $i$'s power budget.
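As a small illustration of how the power budget (6) couples the two UE-side decisions, the following Python sketch checks feasibility of a pair (CPU-cycle frequency, transmit power) and computes the largest frequency that remains affordable once the transmit power is fixed. The function names and the numerical values are hypothetical.

```python
def is_feasible(f, P, kappa, P_max):
    """Return True if (f, P) satisfies all three conditions in (6)."""
    return f >= 0 and P >= 0 and kappa * f**3 + P <= P_max

def max_cpu_frequency(P, kappa, P_max):
    """Largest CPU-cycle frequency allowed by (6) for a given transmit power P."""
    if P < 0 or P > P_max:
        raise ValueError("transmit power must lie in [0, P_max]")
    # Solve kappa * f^3 + P = P_max for f.
    return ((P_max - P) / kappa) ** (1.0 / 3.0)
```

With the assumed values `kappa = 1e-27` and a budget of 1.5 W, spending 0.5 W on transmission leaves 1 W for computation, i.e., a CPU-cycle frequency of about $10^9$ cycle/sec.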

### II-B Traffic Model at the Server Side

We assume that each MEC server has distinct queue buffers to store different UEs' offloaded tasks, where the queue length (in bits) of UE $i$'s offloaded tasks at server $j$ in time slot $t$ is denoted by $Z_{ji}(t)$. The offloaded-task queue length evolves as

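\begin{align}
Z_{ji}(t+1) &= \max\Big\{ Z_{ji}(t) + \min\big\{ Q_i^{\mathrm{O}}(t) + A_i^{\mathrm{O}}(t),\, \tau R_{ij}(t) \big\} - \frac{\tau f_{ji}(t)}{L_i},\, 0 \Big\} \tag{7} \\
&\leq \max\Big\{ Z_{ji}(t) + \tau R_{ij}(t) - \frac{\tau f_{ji}(t)}{L_i},\, 0 \Big\}, \quad \forall\, i \in \mathcal{U},\ j \in \mathcal{S}. \tag{8}
\end{align}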

Here, $f_{ji}(t)$ is server $j$'s allocated CPU-cycle frequency to process UE $i$'s offloaded tasks. Note that the MEC server is deployed to provide a faster computation capability for the UE. Thus, we consider the scenario in which each CPU core of the MEC server is dedicated to at most one UE (i.e., its offloaded tasks) in each time slot, and a UE's offloaded tasks at each server can only be computed by one CPU core at a time [9, 10]. The considered computational resource scheduling mechanism at the MEC server is mathematically formulated as

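$$
\begin{cases}
\sum_{i \in \mathcal{U}} \mathbb{1}\{ f_{ji}(t) > 0 \} \leq N_j, & \forall\, j \in \mathcal{S}, \\
f_{ji}(t) \in \{0, f_j^{\max}\}, & \forall\, i \in \mathcal{U},\ j \in \mathcal{S},
\end{cases}
\tag{9}
$$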

where $N_j$ denotes the total number of CPU cores of server $j$, $f_j^{\max}$ is server $j$'s computation capability of one CPU core, and $\mathbb{1}\{\cdot\}$ is the indicator function. In (9), we account for the allocated CPU-cycle frequencies to all UEs even though some UEs are not associated with this server in the current time frame. The rationale is explained in detail in Section IV-D after formulating the optimization problem. Additionally, in order to illustrate the relationship between the offloaded-task queue length and the transmission rate, we introduce inequality (8), which is further used to formulate the latency and reliability requirements of the considered MEC system and to derive the solution of the studied optimization problem.
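The server-side dynamics (7), the upper bound (8), and the core-scheduling constraint (9) can be illustrated with the following Python sketch. The function names and the dictionary that maps UE indices to allocated frequencies are assumptions made for this example only.

```python
def update_server_queue(z, q_off, a_off, rate, f_server, L, tau):
    """Offloaded-task queue update (7) at the server, in bits."""
    # The UE cannot deliver more bits than its offloading queue holds.
    arrived = min(q_off + a_off, tau * rate)
    return max(z + arrived - tau * f_server / L, 0.0)

def server_queue_upper_bound(z, rate, f_server, L, tau):
    """Right-hand side of inequality (8)."""
    return max(z + tau * rate - tau * f_server / L, 0.0)

def satisfies_core_constraint(f_alloc, n_cores, f_max):
    """Check (9) for one server; f_alloc maps UE index -> allocated frequency."""
    # Each UE gets either no core or one full core.
    on_off = all(f in (0.0, f_max) for f in f_alloc.values())
    # The number of served UEs cannot exceed the number of cores.
    active = sum(1 for f in f_alloc.values() if f > 0)
    return on_off and active <= n_cores
```

The bound (8) is loose exactly when the UE's offloading queue holds fewer bits than the link could carry in one slot, which is why it is convenient for relating $Z_{ji}(t)$ directly to the transmission rate.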

## III Latency Requirements, Reliability Constraints, and Problem Formulation

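\begin{align}
\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \Pr\big( Q_i^{\mathrm{L}}(t) > d_i^{\mathrm{L}} \big) &\leq \epsilon_i^{\mathrm{L}}, \quad \forall\, i \in \mathcal{U}, \tag{10} \\
\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \Pr\big( Q_i^{\mathrm{O}}(t) > d_i^{\mathrm{O}} \big) &\leq \epsilon_i^{\mathrm{O}}, \quad \forall\, i \in \mathcal{U}. \tag{11}
\end{align}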

Here, $d_i^{\mathrm{L}}$ and $d_i^{\mathrm{O}}$ are the queue length bounds, while $\epsilon_i^{\mathrm{L}}$ and $\epsilon_i^{\mathrm{O}}$ are the tolerable bound violation probabilities. Furthermore, a queue length bound violation also undermines the reliability of task computation. For example, if a finite-size queue buffer is overloaded, the incoming tasks will be dropped.

In addition to the bound violation probability, let us look at the complementary cumulative distribution function (CCDF) of the UE's local-computation queue length, i.e., $\bar{F}_{Q_i^{\mathrm{L}}}(q) = \Pr(Q_i^{\mathrm{L}} > q)$, which reflects the queue length profile. If the monotonically decreasing CCDF decays faster as $q$ increases, the probability of having an extreme queue length is lower. Since the prime concern in this work lies in the extreme-case events with very low occurrence probabilities, i.e., $\Pr(Q_i^{\mathrm{L}}(t) > d_i^{\mathrm{L}}) \ll 1$, we resort to principles of extreme value theory to characterize the statistics and tail distribution of the extreme event $Q_i^{\mathrm{L}}(t) > d_i^{\mathrm{L}}$. (Extreme value theory is a powerful and robust framework to study the tail behavior of a distribution. It also provides statistical models for the computation of extreme risk measures.) To this end, we first introduce the Pickands–Balkema–de Haan theorem [27].

###### Theorem 1 (Pickands–Balkema–de Haan theorem).

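Consider a random variable $Q$, with the cumulative distribution function (CDF) $F_Q(q)$, and a threshold value $d$. As the threshold $d$ closely approaches $F_Q^{-1}(1)$, i.e., $d \to \sup\{ q : F_Q(q) < 1 \}$, the conditional CCDF of the excess value $X|_{Q>d} = Q - d > 0$, i.e., $\bar{F}_{X|Q>d}(x) = \Pr(Q - d > x \mid Q > d)$, can be approximated by a generalized Pareto distribution (GPD) $G(x; \sigma, \xi)$, i.e.,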

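\begin{align}
\bar{F}_{X|Q>d}(x) \approx G(x; \sigma, \xi) &= \Big( 1 + \frac{\xi x}{\sigma} \Big)^{-1/\xi}, && \text{where } x \geq 0 \text{ and } \xi > 0, \tag{12a} \\
\bar{F}_{X|Q>d}(x) \approx G(x; \sigma, \xi) &= e^{-x/\sigma}, && \text{where } x \geq 0 \text{ and } \xi = 0, \tag{12b} \\
\bar{F}_{X|Q>d}(x) \approx G(x; \sigma, \xi) &= \Big( 1 + \frac{\xi x}{\sigma} \Big)^{-1/\xi}, && \text{where } 0 \leq x \leq -\sigma/\xi \text{ and } \xi < 0, \tag{12c}
\end{align}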

which is characterized by a scale parameter $\sigma > 0$ and a shape parameter $\xi \in \mathbb{R}$.

In other words, the conditional CCDF of the excess value converges to a GPD as $d \to \infty$. However, from the proof [27] of Theorem 1, we know that the GPD provides a good approximation when $F_Q(d)$ is close to 1. That is, depending on the CDF of $Q$, imposing a very large $d$ might not be necessary for obtaining the approximated GPD. Moreover, for a GPD $G(x; \sigma, \xi)$, its mean and other higher-order statistics such as variance and skewness exist if $\xi < 1$, $\xi < \frac{1}{2}$, and $\xi < \frac{1}{3}$, respectively. Note that the scale parameter $\sigma$ and the domain of $x$ are of the same order. In this regard, we can see that $G(x; \sigma, 0) = e^{-1} \approx 0.37$ at $x = \sigma$ and $G(x; \sigma, 0) = e^{-2} \approx 0.14$ at $x = 2\sigma$ in (12b). We also show the CCDFs of the GPDs for various shape parameters $\xi$ in Fig. 2, where the x-axis is indexed with respect to the normalized value $x/\sigma$. As shown in Fig. 2, the decay speed of the CCDF increases as $\xi$ decreases. In contrast with the curves with $\xi \geq 0$, we can see that the CCDF decays rather sharply when $\xi < 0$.
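To make the role of the shape parameter concrete, the three cases of (12) can be evaluated with a short Python function. This is a minimal sketch; the function name `gpd_ccdf` is hypothetical and only the standard library is used.

```python
import math

def gpd_ccdf(x, sigma, xi):
    """CCDF of the generalized Pareto distribution, following (12a)-(12c)."""
    if x < 0 or sigma <= 0:
        raise ValueError("requires x >= 0 and sigma > 0")
    if xi == 0:
        return math.exp(-x / sigma)           # exponential tail, (12b)
    base = 1.0 + xi * x / sigma
    if base <= 0.0:
        # Only possible for xi < 0: x is at or beyond the endpoint -sigma/xi.
        return 0.0
    return base ** (-1.0 / xi)                # (12a) for xi > 0, (12c) for xi < 0
```

Evaluating the sketch at the normalized value $x/\sigma = 1$ gives 0.25 for a shape parameter of $-0.5$, about 0.37 for 0, and about 0.44 for 0.5, in line with the observation that the CCDF decays faster as the shape parameter decreases.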

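Now, let us denote the excess value (with respect to the threshold $d_i^{\mathrm{L}}$ in (10)) of the local-computation queue of each UE $i \in \mathcal{U}$ in time slot $t$ as $X_i^{\mathrm{L}}(t)|_{Q_i^{\mathrm{L}}(t) > d_i^{\mathrm{L}}} = Q_i^{\mathrm{L}}(t) - d_i^{\mathrm{L}} > 0$. By applying Theorem 1, the excess queue value can be approximated by a GPD $G(x_i; \sigma_i^{\mathrm{L}}, \xi_i^{\mathrm{L}})$ whose mean and variance are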

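\begin{align}
\mathbb{E}\big[ X_i^{\mathrm{L}}(t) \,\big|\, Q_i^{\mathrm{L}}(t) > d_i^{\mathrm{L}} \big] &\approx \frac{\sigma_i^{\mathrm{L}}}{1 - \xi_i^{\mathrm{L}}}, \tag{13} \\
\mathrm{Var}\big( X_i^{\mathrm{L}}(t) \,\big|\, Q_i^{\mathrm{L}}(t) > d_i^{\mathrm{L}} \big) &\approx \frac{(\sigma_i^{\mathrm{L}})^2}{(1 - \xi_i^{\mathrm{L}})^2 (1 - 2\xi_i^{\mathrm{L}})}, \tag{14}
\end{align}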

with the corresponding scale parameter $\sigma_i^{\mathrm{L}}$ and shape parameter $\xi_i^{\mathrm{L}}$. From (13) and (14), we can see that the smaller $\sigma_i^{\mathrm{L}}$ and $\xi_i^{\mathrm{L}}$ are, the smaller the mean value and variance. Since the approximated GPD is fully characterized by the scale and shape parameters, as mentioned previously, we impose thresholds on these two parameters, i.e., $\sigma_i^{\mathrm{L}} \leq \sigma_i^{\mathrm{L,th}}$ and $\xi_i^{\mathrm{L}} \leq \xi_i^{\mathrm{L,th}}$. The threshold values can be selected based on the above discussion of the GPD, Fig. 2, and the magnitude of the metric of interest. Subsequently, applying the two parameter thresholds $\sigma_i^{\mathrm{L,th}}$ and $\xi_i^{\mathrm{L,th}}$ to (13) and (14), we consider the constraints on the long-term time-averaged conditional mean and second moment of the excess value of each UE's local-computation queue length, i.e.,

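\begin{align}
\bar{X}_i^{\mathrm{L}} &= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[ X_i^{\mathrm{L}}(t) \,\big|\, Q_i^{\mathrm{L}}(t) > d_i^{\mathrm{L}} \big] \leq \frac{\sigma_i^{\mathrm{L,th}}}{1 - \xi_i^{\mathrm{L,th}}}, \quad \forall\, i \in \mathcal{U}, \tag{15} \\
\bar{Y}_i^{\mathrm{L}} &= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[ Y_i^{\mathrm{L}}(t) \,\big|\, Q_i^{\mathrm{L}}(t) > d_i^{\mathrm{L}} \big] \leq \frac{2 (\sigma_i^{\mathrm{L,th}})^2}{(1 - \xi_i^{\mathrm{L,th}})(1 - 2\xi_i^{\mathrm{L,th}})}, \quad \forall\, i \in \mathcal{U}, \tag{16}
\end{align}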

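with $Y_i^{\mathrm{L}}(t) := [X_i^{\mathrm{L}}(t)]^2$. Analogously, denoting the excess value, with respect to the threshold $d_i^{\mathrm{O}}$, of UE $i$'s task-offloading queue length in time slot $t$ as $X_i^{\mathrm{O}}(t)|_{Q_i^{\mathrm{O}}(t) > d_i^{\mathrm{O}}} = Q_i^{\mathrm{O}}(t) - d_i^{\mathrm{O}} > 0$, we have the constraints on the long-term time-averaged conditional mean and second moment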

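\begin{align}
\bar{X}_i^{\mathrm{O}} &= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[ X_i^{\mathrm{O}}(t) \,\big|\, Q_i^{\mathrm{O}}(t) > d_i^{\mathrm{O}} \big] \leq \frac{\sigma_i^{\mathrm{O,th}}}{1 - \xi_i^{\mathrm{O,th}}}, \quad \forall\, i \in \mathcal{U}, \tag{17} \\
\bar{Y}_i^{\mathrm{O}} &= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[ Y_i^{\mathrm{O}}(t) \,\big|\, Q_i^{\mathrm{O}}(t) > d_i^{\mathrm{O}} \big] \leq \frac{2 (\sigma_i^{\mathrm{O,th}})^2}{(1 - \xi_i^{\mathrm{O,th}})(1 - 2\xi_i^{\mathrm{O,th}})}, \quad \forall\, i \in \mathcal{U}, \tag{18}
\end{align}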

in which $\sigma_i^{\mathrm{O,th}}$ and $\xi_i^{\mathrm{O,th}}$ are the thresholds for the characteristic parameters of the approximated GPD, and $Y_i^{\mathrm{O}}(t) := [X_i^{\mathrm{O}}(t)]^2$.
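The second-moment bounds of the form $2\sigma^2 / ((1-\xi)(1-2\xi))$, as they appear for instance in (21), follow directly from the GPD mean and variance in (13) and (14). A short derivation, writing $\sigma$ and $\xi$ for a generic scale and shape parameter and assuming $\xi < \frac{1}{2}$ so that the variance exists:

```latex
\begin{align*}
\mathbb{E}[X^2]
  &= \mathrm{Var}(X) + \big(\mathbb{E}[X]\big)^2 \\
  &= \frac{\sigma^2}{(1-\xi)^2 (1-2\xi)} + \frac{\sigma^2}{(1-\xi)^2} \\
  &= \frac{\sigma^2 \big[ 1 + (1 - 2\xi) \big]}{(1-\xi)^2 (1-2\xi)} \\
  &= \frac{2 \sigma^2 (1-\xi)}{(1-\xi)^2 (1-2\xi)}
   = \frac{2 \sigma^2}{(1-\xi)(1-2\xi)}.
\end{align*}
```

Replacing $\sigma$ and $\xi$ by their thresholds then gives the right-hand sides of the second-moment constraints.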

Likewise, the average queuing delay at the server is proportional to the ratio of the average queue length to the average transmission rate. Referring to (8), we consider the probabilistic constraint as follows:

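$$
\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \Pr\bigg( \frac{Z_{ji}(t)}{\tilde{R}_{ij}(t-1)} > d_{ji} \bigg) \leq \epsilon_{ji}, \quad \forall\, i \in \mathcal{U},\ j \in \mathcal{S},
\tag{19}
$$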

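with the threshold $d_{ji}$ and tolerable violation probability $\epsilon_{ji}$, on the offloaded-task queue length at the MEC server. Here, $\tilde{R}_{ij}(t-1)$ is the moving time-averaged transmission rate. Similar to the task queue lengths at the UE side, we further denote the excess value, with respect to the threshold $\tilde{R}_{ij}(t-1) d_{ji}$, of the offloaded-task queue length of UE $i$ at server $j$ in time slot $t$ as $X_{ji}(t)|_{Z_{ji}(t) > \tilde{R}_{ij}(t-1) d_{ji}} = Z_{ji}(t) - \tilde{R}_{ij}(t-1) d_{ji} > 0$ and impose the constraints as follows: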

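\begin{align}
\bar{X}_{ji} &= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[ X_{ji}(t) \,\big|\, Z_{ji}(t) > \tilde{R}_{ij}(t-1) d_{ji} \big] \leq \frac{\sigma_{ji}^{\mathrm{th}}}{1 - \xi_{ji}^{\mathrm{th}}}, \tag{20} \\
\bar{Y}_{ji} &= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[ Y_{ji}(t) \,\big|\, Z_{ji}(t) > \tilde{R}_{ij}(t-1) d_{ji} \big] \leq \frac{2 (\sigma_{ji}^{\mathrm{th}})^2}{(1 - \xi_{ji}^{\mathrm{th}})(1 - 2\xi_{ji}^{\mathrm{th}})}, \tag{21}
\end{align}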

with $Y_{ji}(t) := [X_{ji}(t)]^2$. Here, $\sigma_{ji}^{\mathrm{th}}$ and $\xi_{ji}^{\mathrm{th}}$ are the thresholds for the characteristic parameters of the approximated GPD.

We note that the local computation delay at the UE and the transmission delay while offloading are inversely proportional to the computation speed and the transmission rate as per (3) and (4), respectively. To decrease the local computation and transmission delays, the UE should allocate a higher local CPU-cycle frequency and more transmit power, which, on the other hand, comes at the cost of higher energy consumption. Since allocating a higher CPU-cycle frequency and more transmit power also decreases the queue length, both (local computation and transmission) delays are implicitly taken into account in the queue length constraints (10), (11), and (15)–(18). At the server side, the remote computation delay can be neglected because one CPU core with the better computation capability is dedicated to one UE's offloaded tasks at a time. On the other hand, the server needs to schedule its computational resources, i.e., multiple CPU cores, when the associated UEs outnumber the CPU cores.

Incorporating the aforementioned latency requirements and reliability constraints, the studied optimization problem is formulated as follows:

$$
\begin{aligned}
\textbf{MP:} \quad \underset{\boldsymbol{\eta}(n),\, \mathbf{f}(t),\, \mathbf{P}(t)}{\text{minimize}} \quad & \sum_{i \in \mathcal{U}} \big( \bar{P}_i^{\mathrm{C}} + \bar{P}_i^{\mathrm{T}} \big) \\
\text{subject to} \quad & \text{(1) for UE-server association,} \\
& \text{(2) for task splitting,} \\
& \text{(6) and (9) for resource allocation,} \\
& \text{(10), (11), and (19) for queue length bound violation,} \\
& \text{(15)–(18), (20), and (21) for the GPDs,}
\end{aligned}
$$

where $\bar{P}_i^{\mathrm{C}}$ and $\bar{P}_i^{\mathrm{T}}$ are UE $i$'s long-term time-averaged power consumptions for local computation and task offloading, respectively. $\boldsymbol{\eta}(n)$ and $\mathbf{P}(t)$ denote the network-wide UE-server association and transmit power allocation vectors, respectively. In addition, $\mathbf{f}(t)$ denotes the network-wide computational resource allocation vector, in which $\mathbf{f}_j(t)$ is the computational resource allocation vector of server $j$. To solve problem MP, we utilize techniques from Lyapunov stochastic optimization and propose a dynamic task offloading and resource allocation policy in the next section.

### IV-A Lyapunov Optimization Framework

We first introduce a virtual queue $Q_i^{\mathrm{L},(X)}$ for the long-term time-averaged constraint (15) with the queue evolution as follows:

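$$
Q_i^{\mathrm{L},(X)}(t+1) = \max\bigg\{ Q_i^{\mathrm{L},(X)}(t) + \Big( X_i^{\mathrm{L}}(t+1) - \frac{\sigma_i^{\mathrm{L,th}}}{1 - \xi_i^{\mathrm{L,th}}} \Big) \times \mathbb{1}\{ Q_i^{\mathrm{L}}(t+1) > d_i^{\mathrm{L}} \},\, 0 \bigg\}, \quad \forall\, i \in \mathcal{U},
\tag{22}
$$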

in which the incoming traffic amount and outgoing traffic amount correspond to the left-hand side and right-hand side of the inequality (15), respectively. Note that, as established in [26], ensuring that the introduced virtual queue is mean rate stable, i.e., $\lim_{t \to \infty} \mathbb{E}\big[ |Q_i^{\mathrm{L},(X)}(t)| \big] / t = 0$, is equivalent to satisfying the long-term time-averaged constraint (15). Analogously, for the constraints (16)–(18), (20), and (21), we respectively introduce the virtual queues as follows:

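\begin{align}
Q_i^{\mathrm{L},(Y)}(t+1) &= \max\bigg\{ Q_i^{\mathrm{L},(Y)}(t) + \Big( Y_i^{\mathrm{L}}(t+1) - \frac{2 (\sigma_i^{\mathrm{L,th}})^2}{(1 - \xi_i^{\mathrm{L,th}})(1 - 2\xi_i^{\mathrm{L,th}})} \Big) \notag \\
&\qquad \times \mathbb{1}\{ Q_i^{\mathrm{L}}(t+1) > d_i^{\mathrm{L}} \},\, 0 \bigg\}, \quad \forall\, i \in \mathcal{U}, \tag{23} \\
Q_i^{\mathrm{O},(X)}(t+1) &= \max\bigg\{ Q_i^{\mathrm{O},(X)}(t) + \Big( X_i^{\mathrm{O}}(t+1) - \frac{\sigma_i^{\mathrm{O,th}}}{1 - \xi_i^{\mathrm{O,th}}} \Big) \notag \\
&\qquad \times \mathbb{1}\{ Q_i^{\mathrm{O}}(t+1) > d_i^{\mathrm{O}} \},\, 0 \bigg\}, \quad \forall\, i \in \mathcal{U}, \tag{24} \\
Q_i^{\mathrm{O},(Y)}(t+1) &= \max\bigg\{ Q_i^{\mathrm{O},(Y)}(t) + \Big( Y_i^{\mathrm{O}}(t+1) - \frac{2 (\sigma_i^{\mathrm{O,th}})^2}{(1 - \xi_i^{\mathrm{O,th}})(1 - 2\xi_i^{\mathrm{O,th}})} \Big) \notag \\
&\qquad \times \mathbb{1}\{ Q_i^{\mathrm{O}}(t+1) > d_i^{\mathrm{O}} \},\, 0 \bigg\}, \quad \forall\, i \in \mathcal{U}, \tag{25} \\
Q_{ji}^{(X)}(t+1) &= \max\bigg\{ Q_{ji}^{(X)}(t) + \Big( X_{ji}(t+1) - \frac{\sigma_{ji}^{\mathrm{th}}}{1 - \xi_{ji}^{\mathrm{th}}} \Big) \notag \\
&\qquad \times \mathbb{1}\{ Z_{ji}(t+1) > \tilde{R}_{ij}(t) d_{ji} \},\, 0 \bigg\}, \quad \forall\, i \in \mathcal{U},\ j \in \mathcal{S}, \tag{26} \\
Q_{ji}^{(Y)}(t+1) &= \max\bigg\{ Q_{ji}^{(Y)}(t) + \Big( Y_{ji}(t+1) - \frac{2 (\sigma_{ji}^{\mathrm{th}})^2}{(1 - \xi_{ji}^{\mathrm{th}})(1 - 2\xi_{ji}^{\mathrm{th}})} \Big) \notag \\
&\qquad \times \mathbb{1}\{ Z_{ji}(t+1) > \tilde{R}_{ij}(t) d_{ji} \},\, 0 \bigg\}, \quad \forall\, i \in \mathcal{U},\ j \in \mathcal{S}. \tag{27}
\end{align}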
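The virtual queues for the excess-value constraints all share one update pattern: whenever the physical queue exceeds its threshold, the virtual queue grows by the excess (or its square) and drains by the GPD-based bound. A minimal Python sketch of this pattern, covering updates of the form (22) and (23), is given below. The function names and the `power` argument are assumptions made for the example.

```python
def gpd_mean_bound(sigma_th, xi_th):
    """Right-hand side of the mean constraint, sigma_th / (1 - xi_th)."""
    return sigma_th / (1.0 - xi_th)

def gpd_second_moment_bound(sigma_th, xi_th):
    """Right-hand side of the second-moment constraint."""
    return 2.0 * sigma_th**2 / ((1.0 - xi_th) * (1.0 - 2.0 * xi_th))

def update_excess_virtual_queue(vq, q_next, d, bound, power=1):
    """Virtual queue update of the form (22) (power=1) or (23) (power=2).

    vq is the current virtual queue value, q_next the physical queue length
    in slot t+1, d the queue length threshold, and bound the GPD-based bound.
    """
    if q_next <= d:
        return vq                      # indicator is zero: nothing changes
    excess = q_next - d                # excess value above the threshold
    return max(vq + excess**power - bound, 0.0)
```

The virtual queue therefore only moves in slots where the threshold is violated, and it accumulates whenever the observed excess is larger than what the targeted GPD parameters would allow on average.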

Additionally, given an event $\mathcal{B}$ and the set of all possible outcomes $\Omega$, we can derive $\mathbb{E}[\mathbb{1}\{\mathcal{B}\}] = \Pr(\mathcal{B})$. By applying this identity, constraints (10), (11), and (19) can be equivalently rewritten as

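\begin{align}
\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[ \mathbb{1}\{ Q_i^{\mathrm{L}}(t) > d_i^{\mathrm{L}} \} \big] &\leq \epsilon_i^{\mathrm{L}}, \quad \forall\, i \in \mathcal{U}, \tag{28} \\
\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[ \mathbb{1}\{ Q_i^{\mathrm{O}}(t) > d_i^{\mathrm{O}} \} \big] &\leq \epsilon_i^{\mathrm{O}}, \quad \forall\, i \in \mathcal{U}, \tag{29} \\
\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[ \mathbb{1}\{ Z_{ji}(t) > \tilde{R}_{ij}(t-1) d_{ji} \} \big] &\leq \epsilon_{ji}, \quad \forall\, i \in \mathcal{U},\ j \in \mathcal{S}. \tag{30}
\end{align}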

Following the same steps as above, the corresponding virtual queues of (28)–(30) are expressed as

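\begin{align}
Q_i^{\mathrm{L},(Q)}(t+1) &= \max\big\{ Q_i^{\mathrm{L},(Q)}(t) + \mathbb{1}\{ Q_i^{\mathrm{L}}(t+1) > d_i^{\mathrm{L}} \} - \epsilon_i^{\mathrm{L}},\, 0 \big\}, \quad \forall\, i \in \mathcal{U}, \tag{31} \\
Q_i^{\mathrm{O},(Q)}(t+1) &= \max\big\{ Q_i^{\mathrm{O},(Q)}(t) + \mathbb{1}\{ Q_i^{\mathrm{O}}(t+1) > d_i^{\mathrm{O}} \} - \epsilon_i^{\mathrm{O}},\, 0 \big\}, \quad \forall\, i \in \mathcal{U}, \tag{32} \\
Q_{ji}^{(Z)}(t+1) &= \max\big\{ Q_{ji}^{(Z)}(t) + \mathbb{1}\{ Z_{ji}(t+1) > \tilde{R}_{ij}(t) d_{ji} \} - \epsilon_{ji},\, 0 \big\}, \quad \forall\, i \in \mathcal{U},\ j \in \mathcal{S}. \tag{33}
\end{align}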
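The link between a virtual queue such as (31) and the time-averaged constraint (28) can be seen on a sample path: since each update adds at least the indicator minus the tolerance, the virtual queue divided by the horizon upper-bounds the gap between the empirical violation frequency and the tolerance. The following Python sketch illustrates this on a given sequence of queue lengths; the function names and the sample sequence are hypothetical.

```python
def update_violation_queue(vq, q_next, d, eps):
    """Virtual queue update (31): +1 on a bound violation, minus eps every slot."""
    indicator = 1.0 if q_next > d else 0.0
    return max(vq + indicator - eps, 0.0)

def run_violation_queue(queue_lengths, d, eps):
    """Run the virtual queue over a sample path of physical queue lengths.

    Returns the final virtual queue value and the empirical violation frequency.
    """
    vq = 0.0
    for q in queue_lengths:
        vq = update_violation_queue(vq, q, d, eps)
    violation_freq = sum(1 for q in queue_lengths if q > d) / len(queue_lengths)
    return vq, violation_freq
```

If the virtual queue grows sublinearly in the horizon, the empirical violation frequency cannot exceed the tolerance in the limit, which is the sample-path intuition behind mean rate stability.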

Now problem MP is equivalently transformed into [26]

$$
\begin{aligned}
\textbf{MP':} \quad \underset{\boldsymbol{\eta}(n),\, \mathbf{f}(t),\, \mathbf{P}(t)}{\text{minimize}} \quad & \sum_{i \in \mathcal{U}} \big( \bar{P}_i^{\mathrm{C}} + \bar{P}_i^{\mathrm{T}} \big) \\
\text{subject to} \quad & \text{(1), (2), (6), and (9),} \\
& \text{Stability of (22)–(27) and (31)–(33).}
\end{aligned}
$$

To solve problem MP', we let $\mathbf{Q}(t)$ denote the combined queue vector for notational simplicity and express the conditional Lyapunov drift-plus-penalty for slot $t$ as

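$$
\mathbb{E}\bigg[ L(\mathbf{Q}(t+1)) - L(\mathbf{Q}(t)) + \sum_{i \in \mathcal{U}} V \big( \kappa [f_i(t)]^3 + P_i(t) \big) \,\bigg|\, \mathbf{Q}(t) \bigg],
\tag{34}
$$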

where $L(\mathbf{Q}(t))$ is the Lyapunov function. The term $V \geq 0$ is a parameter which trades off objective optimality and queue length reduction. Subsequently, plugging the inequality $(\max\{x, 0\})^2 \leq x^2$, all physical and virtual queue dynamics, and (8) into (34), we can derive

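\begin{align}
(34) \leq{} & C + \mathbb{E}\bigg[ \sum_{i \in \mathcal{U}} \Big[ \big( Q_i^{\mathrm{L},(X)}(t) + Q_i^{\mathrm{L}}(t) + 2 Q_i^{\mathrm{L},(Y)}(t) Q_i^{\mathrm{L}}(t) + 2 [Q_i^{\mathrm{L}}(t)]^3 \big) \Big( A_i^{\mathrm{L}}(t) - \frac{\tau f_i(t)}{L_i} \Big) \notag \\
& \times \mathbb{1}\{ Q_i^{\mathrm{L}}(t) + A_i(t) > d_i^{\mathrm{L}} \} + Q_i^{\mathrm{L},(Q)}(t) \times \mathbb{1}\big\{ \max\{ Q_i^{\mathrm{L}}(t) + A_i^{\mathrm{L}}(t) - \tau f_i(t)/L_i,\, 0 \} > d_i^{\mathrm{L}} \big\} \Big] \notag \\
& + \sum_{i \in \mathcal{U}} \Big[ \big( Q_i^{\mathrm{O},(X)}(t) + Q_i^{\mathrm{O}}(t) + 2 Q_i^{\mathrm{O},(Y)}(t) Q_i^{\mathrm{O}}(t) + 2 [Q_i^{\mathrm{O}}(t)]^3 \big) \Big( A_i^{\mathrm{O}}(t) - \sum_{j \in \mathcal{S}} \tau R_{ij}(t) \Big) \notag \\
& \times \mathbb{1}\{ Q_i^{\mathrm{O}}(t) + A_i(t) > d_i^{\mathrm{O}} \} + Q_i^{\mathrm{O},(Q)}(t) \times \mathbb{1}\big\{ \max\{ Q_i^{\mathrm{O}}(t) + A_i^{\mathrm{O}}(t) - \textstyle\sum_{j \in \mathcal{S}} \tau R_{ij}(t),\, 0 \} > d_i^{\mathrm{O}} \big\} \Big] \notag \\
& + \sum_{i \in \mathcal{U}} \sum_{j \in \mathcal{S}} \Big[ \big( Q_{ji}^{(X)}(t) + Z_{ji}(t) + 2 Q_{ji}^{(Y)}(t) Z_{ji}(t) + 2 [Z_{ji}(t)]^3 \big) \Big( \tau R_{ij}(t) - \frac{\tau f_{ji}(t)}{L_i} \Big) \notag \\
& \times \mathbb{1}\{ Z_{ji}(t) + \tau R_{ij}^{\max}(t) > \tilde{R}_{ij}(t-1) d_{ji} \} + Q_{ji}^{(Z)}(t) \times \mathbb{1}\big\{ \max\{ Z_{ji}(t) + \tau R_{ij}(t) - \tau f_{ji}(t)/L_i,\, 0 \} > \tilde{R}_{ij}(t-1) d_{ji} \big\} \Big] \notag \\
& + \sum_{i \in \mathcal{U}} V \big( \kappa [f_i(t)]^3 + P_i(t) \big) \,\bigg|\, \mathbf{Q}(t) \bigg]. \tag{35}
\end{align}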

Here, $R_{ij}^{\max}(t)$ is UE $i$'s maximum offloading rate. Since the constant $C$ does not affect the system performance in Lyapunov optimization, we omit its details in (35) for simplicity. Note that a solution to problem MP' can be obtained by minimizing the upper bound (35) in each time slot $t$, in which the optimality of MP' is asymptotically approached by increasing $V$ [26]. To minimize (35), we have three decomposed optimization problems P1, P2, and P3, which are detailed and solved in the following parts.

The first decomposed problem, which jointly associates UEs with MEC servers and allocates UEs’ computational and communication resources, is given by

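$$
\begin{aligned}
\textbf{P1:} \quad \underset{\boldsymbol{\eta}(n),\, \mathbf{f}(t),\, \mathbf{P}(t)}{\text{minimize}} \quad & \sum_{i \in \mathcal{U}} \sum_{j \in \mathcal{S}} \big( \beta_{ji}(t) - \beta_i^{\mathrm{O}}(t) \big) \tau W \log_2\bigg( 1 + \frac{\eta_{ij}(n) P_i(t) h_{ij}(t)}{N_0 W + \sum_{i' \in \mathcal{U} \setminus i} \eta_{i'j}(n) P_{i'}(t) h_{i'j}(t)} \bigg) \\
& - \sum_{i \in \mathcal{U}} \frac{\beta_i^{\mathrm{L}}(t) \tau f_i(t)}{L_i} + \sum_{i \in \mathcal{U}} V \big( \kappa [f_i(t)]^3 + P_i(t) \big) \\
\text{subject to} \quad & \text{(1) and (6),}
\end{aligned}
$$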

with

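\begin{align}
\beta_{ji}(t) &= \big( Q_{ji}^{(X)}(t) + Z_{ji}(t) + 2 Q_{ji}^{(Y)}(t) Z_{ji}(t) + 2 [Z_{ji}(t)]^3 \big) \notag \\
&\quad \times \mathbb{1}\{ Z_{ji}(t) + \tau R_{ij}^{\max}(t) > \tilde{R}_{ij}(t-1) d_{ji} \} + Q_{ji}^{(Z)}(t) + Z_{ji}(t), \tag{36} \\
\beta_i^{\mathrm{O}}(t) &= \big( Q_i^{\mathrm{O},(X)}(t) + Q_i^{\mathrm{O}}(t) + 2 Q_i^{\mathrm{O},(Y)}(t) Q_i^{\mathrm{O}}(t) + 2 [Q_i^{\mathrm{O}}(t)]^3 \big) \notag \\
&\quad \times \mathbb{1}\{ Q_i^{\mathrm{O}}(t) + A_i(t) > d_i^{\mathrm{O}} \} + Q_i^{\mathrm{O},(Q)}(t) + Q_i^{\mathrm{O}}(t), \tag{37} \\
\beta_i^{\mathrm{L}}(t) &= \big( Q_i^{\mathrm{L},(X)}(t) + Q_i^{\mathrm{L}}(t) + 2 Q_i^{\mathrm{L},(Y)}(t) Q_i^{\mathrm{L}}(t) + 2 [Q_i^{\mathrm{L}}(t)]^3 \big) \notag \\
&\quad \times \mathbb{1}\{ Q_i^{\mathrm{L}}(t) + A_i(t) > d_i^{\mathrm{L}} \} + Q_i^{\mathrm{L},(Q)}(t) + Q_i^{\mathrm{L}}(t). \tag{38}
\end{align}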
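To show how the weights combine the physical and virtual queue states, the local-computation weight (38) can be computed as in the following Python sketch. The function name and argument names are hypothetical; the weights (36) and (37) follow the same pattern with the server-side and offloading queues, respectively.

```python
def beta_local(q_x, q_y, q_q, q_local, a_total, d):
    """Weight beta_i^L(t) in (38).

    q_x, q_y, q_q are the virtual queues Q^{L,(X)}, Q^{L,(Y)}, Q^{L,(Q)};
    q_local is the physical queue Q^L; a_total is the arrival A_i(t);
    d is the queue length threshold d^L.
    """
    # The tail-related part only counts if the threshold could be exceeded.
    indicator = 1.0 if q_local + a_total > d else 0.0
    tail_weight = q_x + q_local + 2.0 * q_y * q_local + 2.0 * q_local**3
    return tail_weight * indicator + q_q + q_local
```

When the queue is far below its threshold, the weight reduces to the sum of the bound-violation virtual queue and the physical queue length; once a violation becomes possible, the cubic term in the queue length quickly dominates and pushes the UE to allocate more CPU cycles.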

Note that in P1, the UE's allocated transmit power is coupled with the local CPU-cycle frequency. The transmit power also depends on the wireless channel strength to the associated server and the weight of the corresponding offloaded-task queue. The former depends on the distance between the UE and the server, while the latter is related to the MEC server's computation capability and the number of associated UEs. Therefore, the UEs' geographic configuration and the servers' computation capabilities should be taken into account when we associate the UEs with the servers. Moreover, UE-server association, i.e., $\boldsymbol{\eta}(n)$, and resource allocation, i.e., $\mathbf{f}(t)$ and $\mathbf{P}(t)$, are performed on two different timescales, i.e., at the beginning of each time frame and in every time slot afterwards. We solve P1 in two steps: the UE-server association is decided first; then, given the association results, the UEs' CPU-cycle frequencies and transmit powers are allocated.

### IV-B UE-Server Association using Many-to-One Matching with Externalities

To associate UEs with the MEC servers, let us focus on the wireless transmission part of P1 and, thus, fix $P_i(t)$ and $f_i(t)$, $\forall\, i \in \mathcal{U}$, at this stage. The wireless channel gain $h_{ij}(t)$ and the weight factors $\beta_{ji}(t)$ and $\beta_i^{\mathrm{O}}(t)$ dynamically change in each time slot, whereas the UEs are re-associated with the servers every $T_0$ slots. In order to take the impacts of $h_{ij}(t)$, $\beta_{ji}(t)$, and $\beta_i^{\mathrm{O}}(t)$ into account, we consider the average strength for the channel gain and the empirical averages of the weight factors, i.e.,

$$
\tilde{\beta}_i^{\mathrm{O}}(n) = \frac{1}{(n-1) T_0} \sum_{t=0}^{(n-1) T_0 - 1} \beta_i^{\mathrm{O}}(t), \quad \forall\, i \in \mathcal{U}
$$