Joint Optimization of Radio and Computational Resources for Multicell Mobile-Edge Computing
Abstract
Migrating computationally intensive tasks from mobile devices to more resourceful cloud servers is a promising technique to increase the computational capacity of mobile devices while saving their battery energy. In this paper, we consider a MIMO multicell system where multiple mobile users (MUs) ask for computation offloading to a common cloud server. We formulate the offloading problem as the joint optimization of the radio resources (the transmit precoding matrices of the MUs) and the computational resources (the CPU cycles/second assigned by the cloud to each MU) in order to minimize the overall users’ energy consumption, while meeting latency constraints. The resulting optimization problem is nonconvex (in the objective function and constraints). Nevertheless, in the single-user case, we are able to express the global optimal solution in closed form. In the more challenging multiuser scenario, we propose an iterative algorithm, based on a novel successive convex approximation technique, converging to a local optimal solution of the original nonconvex problem. Then, we reformulate the algorithm in a distributed and parallel implementation across the radio access points, requiring only a limited coordination/signaling with the cloud. Numerical results show that the proposed schemes outperform disjoint optimization algorithms.
I Introduction
Mobile terminals, such as smartphones, tablets and netbooks, are increasingly penetrating into our everyday lives as convenient tools for communication, entertainment, business, social networking, news, etc. Current predictions foresee a doubling of mobile data traffic every year. However, such growth in mobile wireless traffic is not matched by an equally fast improvement in mobile handsets’ batteries, as documented in [3]. The limited battery lifetime is then going to represent the stumbling block to the deployment of computation-intensive applications for mobile devices. At the same time, in the Internet-of-Things (IoT) paradigm, a myriad of heterogeneous devices, with a wide range of computational capabilities, are going to be interconnected. For many of them, the local computation resources are insufficient to run sophisticated applications. In all these cases, a possible strategy to overcome the above energy/computation bottleneck consists in enabling resource-constrained mobile devices to offload their most energy-consuming tasks to nearby more resourceful servers. This strategy has a long history and is reported in the literature under different names, such as cyber foraging [4], or computation offloading [5]. In recent years, cloud computing (CC) has provided a strong impulse to computation offloading through virtualization, which decouples the application environment from the underlying hardware resources and thus enables an efficient usage of available computing resources. In particular, Mobile Cloud Computing (MCC) [6] makes it possible for mobile users to access cloud resources, such as infrastructures, platforms, and software, on demand. Several works addressed mobile computation offloading, such as [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]. Recent surveys are [6], [17], and [18]. Some works addressed the problem of program partitioning and offloading the most demanding program tasks, as e.g. in [7, 8, 9, 10].
Specific examples of mobile computation offloading techniques are: MAUI [19], ThinkAir [20], and Phone2Cloud [21]. The tradeoff between the energy spent for computation and communication was studied in [12, 13, 14, 22]. A dynamic formulation of computation offloading was proposed in [15]. These works optimized offloading strategies, assuming a given radio access, and concentrated on single-user scenarios. In [23], a joint optimization of radio and computational resources was proposed for the single-user case. The joint optimization was then extended to the multiuser case in [24]; see also [25] for a recent survey on joint optimization for computation offloading in a 5G perspective. The optimal joint allocation of radio and computing resources in [24], [25] was assumed to be managed in a centralized way in the cloud. A decentralized solution, based on a game-theoretic formulation of the problem, was recently proposed in [26], [11]. In current cellular networks, the major obstacles limiting an effective deployment of MCC strategies are: i) the energy spent by mobile terminals, especially cell edge users, for radio access; and ii) the latency experienced in reaching the (remote) cloud server through a wide area network (WAN). Indeed, in macrocellular systems, the transmit power necessary for cell edge users to access a remote base station might nullify all potential benefits coming from offloading. Moreover, in many real-time mobile applications (e.g., online games, speech recognition, FaceTime) the user Quality of Experience (QoE) is strongly affected by the system response time. Since controlling latency over a WAN might be very difficult, in many circumstances the QoE associated with MCC could be poor.
A possible way to tackle these challenges is to bring both radio access and computational resources closer to MUs. This idea was suggested in [27, 17], with the introduction of cloudlets, providing proximity radio access to fixed servers through WiFi. However, the lack of available fixed servers could limit the applicability of cloudlets. The European project TROPIC [28] suggested endowing small cell LTE base stations with cloud functionalities, albeit limited. In this way, one can exploit the potential dense deployment of small cell base stations to facilitate proximity access to computing resources and gain advantages over WiFi access in terms of Quality-of-Service guarantees and a single-technology system (no need for the MUs to switch between cellular and WiFi standards). Very recently, the European Telecommunications Standards Institute (ETSI) launched a new standardization group on the so-called Mobile-Edge Computing (MEC), whose aim is to provide information technology and cloud-computing capabilities within the Radio Access Network (RAN) in close proximity to mobile subscribers in order to offer a service environment characterized by proximity, low latency, and high rate access [29].
Merging MEC with the dense deployment of (small cell) Base Stations (BSs), as foreseen in the 5G standardization roadmap, makes possible real-proximity, ultra-low-latency access to cloud functionalities [25]. However, in a dense deployment scenario, offloading becomes much more complicated because of intercell interference. The goal of this paper is to propose a joint optimization of radio and computational resources for computation offloading in a dense deployment scenario, in the presence of intercell interference. More specifically, the offloading problem is formulated as the minimization of the overall energy consumption, at the mobile terminals’ side, under transmit power and latency constraints. The optimization variables are the mobile radio resources (the precoding, or equivalently covariance, matrices of the mobile MIMO transmitters) and the computational resources (the CPU cycles/second assigned by the cloud to each MU). The latency constraint is what couples computation and communication optimization variables. This problem is much more challenging than the (special) cases studied in the literature because of the presence of intercell interference, which introduces a coupling among the precoding matrices of all MUs, while making the optimization problem nonconvex.
In this context, the main contributions of the paper are the following: i) in the single-user case, we first establish the equivalence between the original nonconvex problem and a convex one, and then derive the closed form of its (global optimal) solution; ii) in the multicell case, hinging on recent Successive Convex Approximation (SCA) techniques [30, 31], we devise an iterative algorithm that is proved to converge to local optimal solutions of the original nonconvex problem; and iii) we propose alternative decomposition algorithms to solve the original centralized problem in a distributed form, requiring limited signaling among BSs and cloud; the algorithms differ in convergence speed, computational effort, communication overhead, and a priori knowledge of system parameters, but they are all convergent under a unified set of conditions. Numerical results show that all the proposed schemes converge quite fast to “good” solutions, yielding a significant energy saving with respect to disjoint optimization procedures, for applications requiring intensive computations and limited exchange of data to enable offloading.
The rest of the paper is organized as follows. In Section II we introduce the system model; Section III formulates the offloading optimization problem in the single user case, whereas Section IV focuses on the multicell scenario along with the proposed SCA algorithmic framework. The decentralized implementation is discussed in Section V.
II Computation offloading
Let us consider a network composed of $N_c$ cells; in each cell $n = 1, \ldots, N_c$, there is one Small Cell enhanced Node B (SCeNB in LTE terminology) serving $K_n$ MUs. We denote by $i_n$ the $i$-th user in cell $n$, and by $\mathcal{I} \triangleq \{i_n : i = 1, \ldots, K_n,\ n = 1, \ldots, N_c\}$ the set of all the users. Each MU $i_n$ and SCeNB $n$ are equipped with $n_{T_{i_n}}$ transmit and $n_{R_n}$ receive antennas, respectively. The SCeNBs are all connected to a common cloud provider, able to serve multiple users concurrently. We assume that MUs in the same cell transmit over orthogonal channels, whereas users of different cells may interfere with each other.
In this scenario, each MU $i_n$ is willing to run an application within a given maximum time $T_{i_n}$, while minimizing the energy consumption at the MU’s side.
To offload computations to the remote cloud, the MU has to send all the needed information to the server.
Each module to be executed is characterized by: the number $w_{i_n}$ of CPU cycles necessary to run the module itself; the number $b_{i_n}$ of input bits necessary to transfer the program execution from local to remote sides; and the number $b^{o}_{i_n}$ of output bits encoding the result of the computation, to be sent back from remote to local sides.
The MU can perform its computations locally or offload them to the cloud, depending on which strategy requires less energy, while satisfying the latency constraint. In case of offloading, the latency incorporates the time to transmit the input bits to the server, the time necessary for the server to execute the instructions, and the time to send the result back to the MU.
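More specifically, the overall latency experienced by each MU $i_n$ can be written as
$$\Delta_{i_n} = \Delta^{\mathrm{t}}_{i_n} + \Delta^{\mathrm{exe}}_{i_n} + \Delta^{\mathrm{tx/rx}}_{i_n}, \tag{1}$$
where $\Delta^{\mathrm{t}}_{i_n}$ is the time necessary for the MU to transfer the input bits $b_{i_n}$ to its SCeNB; $\Delta^{\mathrm{exe}}_{i_n}$ is the time for the server to execute $w_{i_n}$ CPU cycles; and $\Delta^{\mathrm{tx/rx}}_{i_n}$ is the time necessary for SCeNB $n$ to send the bits $b_{i_n}$ to the cloud through the backhaul link plus the time necessary to send back the result (encoded in $b^{o}_{i_n}$ bits) from the server to MU $i_n$.
We derive next an explicit expression of $\Delta^{\mathrm{t}}_{i_n}$ and $\Delta^{\mathrm{exe}}_{i_n}$ as a function of the radio and computational resources.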
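Radio resources: The optimization variables at radio level are the users’ transmit covariance matrices $\mathbf{Q}_{i_n}$, subject to power budget constraints
$$\mathcal{Q}_{i_n} \triangleq \left\{ \mathbf{Q}_{i_n} \in \mathbb{C}^{n_{T_{i_n}} \times n_{T_{i_n}}} : \mathbf{Q}_{i_n} \succeq \mathbf{0},\ \mathrm{tr}(\mathbf{Q}_{i_n}) \le P_{i_n} \right\}, \tag{2}$$
where $P_{i_n}$ is the average transmit power of user $i_n$. We will denote by $\mathcal{Q}$ the joint set $\prod_{i_n \in \mathcal{I}} \mathcal{Q}_{i_n}$.
For any given profile $\mathbf{Q} \triangleq (\mathbf{Q}_{i_n})_{i_n \in \mathcal{I}}$, the maximum achievable rate of MU $i_n$ is:
$$r_{i_n}(\mathbf{Q}) = \log_2 \det\left( \mathbf{I} + \mathbf{H}_{i_n n}^{H} \mathbf{R}_n(\mathbf{Q}_{-n})^{-1} \mathbf{H}_{i_n n} \mathbf{Q}_{i_n} \right), \tag{3}$$
where
$$\mathbf{R}_n(\mathbf{Q}_{-n}) \triangleq \mathbf{R}_w + \sum_{j_m \in \mathcal{I},\, m \neq n} \mathbf{H}_{j_m n} \mathbf{Q}_{j_m} \mathbf{H}_{j_m n}^{H} \tag{4}$$
is the covariance matrix of the noise $\mathbf{R}_w$ (assumed to be diagonal w.l.o.g.; otherwise one can always prewhiten the channel matrices) plus the intercell interference at SCeNB $n$ (treated as additive noise); $\mathbf{H}_{i_n n}$ is the channel matrix of the uplink $i_n$ in cell $n$, whereas $\mathbf{H}_{j_m n}$ is the cross-channel matrix between the interferer MU $j_m$ in cell $m$ and the SCeNB of cell $n$; and $\mathbf{Q}_{-n} \triangleq (\mathbf{Q}_{j_m})_{j_m \in \mathcal{I},\, m \neq n}$ denotes the tuple of the covariance matrices of all users interfering with SCeNB $n$.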
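Given each $r_{i_n}(\mathbf{Q})$, the time necessary for user $i$ in cell $n$ to transmit the $b_{i_n}$ input bits of duration $T_b$ to its SCeNB can be written as
$$\Delta^{\mathrm{t}}_{i_n}(\mathbf{Q}) = \frac{c_{i_n}}{r_{i_n}(\mathbf{Q})}, \tag{5}$$
where $c_{i_n} \triangleq b_{i_n} T_b$. The energy consumption due to offloading is then
$$E_{i_n}(\mathbf{Q}) = \mathrm{tr}(\mathbf{Q}_{i_n}) \cdot \Delta^{\mathrm{t}}_{i_n}(\mathbf{Q}), \tag{6}$$
which depends also on the covariance matrices of the users in the other cells, due to the intercell interference.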
Computational resources: The cloud provider is able to serve multiple users concurrently. The computational resources made available by the cloud and shared among the users are quantified in terms of number of CPU cycles/second, set to $f_T$; let $f_{i_n} \ge 0$ be the fraction of $f_T$ assigned to each user $i_n$. All the $f_{i_n}$ are thus nonnegative optimization variables to be determined, subject to the computational budget constraint $\sum_{i_n \in \mathcal{I}} f_{i_n} \le f_T$. Given the resource assignment $f_{i_n}$, the time needed to run $w_{i_n}$ CPU cycles of user $i_n$’s instructions remotely is then
$$\Delta^{\mathrm{exe}}_{i_n} = \frac{w_{i_n}}{f_{i_n}}. \tag{7}$$
The expression of the overall latency [cf. (1), (5), and (7)] clearly shows the interplay between radio access and computational aspects, which motivates a joint optimization of the radio resources (the transmit covariance matrices $\mathbf{Q}$ of the MUs) and the computational resources (the computational rate allocation $\mathbf{f} \triangleq (f_{i_n})_{i_n \in \mathcal{I}}$).
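To make the interplay in (1), (5), (6), and (7) concrete, the following minimal sketch evaluates latency and energy for a single-antenna link, where the covariance matrix collapses to a scalar transmit power and the log-det rate to a scalar logarithm. The function name, argument names, and numerical values are illustrative assumptions and do not come from the paper.

```python
import math

def offload_latency_and_energy(p_tx, gain, c, w, f, delay_backhaul):
    """Scalar (single-antenna) stand-in for the latency and energy model.

    p_tx           -- transmit power, playing the role of tr(Q)
    gain           -- channel gain divided by noise-plus-interference power
    c              -- input bits times bit duration (so that c / rate is a time)
    w              -- CPU cycles to execute remotely
    f              -- CPU cycles/second granted by the cloud
    delay_backhaul -- constant backhaul and downlink delay
    """
    rate = math.log2(1.0 + gain * p_tx)   # scalar analogue of the log-det rate
    t_tx = c / rate                       # uplink transmission time, as in (5)
    t_exe = w / f                         # remote execution time, as in (7)
    latency = t_tx + t_exe + delay_backhaul
    energy = p_tx * t_tx                  # uplink energy spent by the MU, as in (6)
    return latency, energy

latency, energy = offload_latency_and_energy(
    p_tx=3.0, gain=1.0, c=2.0, w=4.0, f=8.0, delay_backhaul=0.25)
```

Raising `p_tx` shortens the transmission time but does not necessarily lower the energy, while raising `f` relaxes the latency budget left for the radio link; this coupling is what the joint formulation exploits.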
III The Single-user case
In the single-user case, there is only one active MU having access to the cloud. In such an interference-free scenario, the maximum achievable rate of the MU and the energy consumption due to offloading reduce to [cf. (3) and (6)]
(8) 
and
(9) 
respectively, with (for notational simplicity, we omit the user index; denotes now the covariance matrix of the MU).
We formulate the offloading problem as the minimization of the energy spent by the MU to run its application remotely, subject to latency and transmit power constraints, as follows:
() 
where a) reflects the user latency constraint [cf. (1)], with capturing all the constant terms, i.e., ; b) imposes a limit on the cloud computational resources made available to the users; and c) is the power budget constraint on the radio resources.
Feasibility: Depending on the system parameters, problem may be feasible or not. In the latter case, offloading is not possible and thus the MU will perform its computations locally. It is not difficult to prove that the following condition is necessary and sufficient for to be nonempty and thus for offloading to be feasible:
(10) 
where is the capacity of the MIMO link of the MU, i.e.,
(11) 
The unique (closed-form) solution of (11) is the well-known MIMO water-filling. Note that condition (10) has an interesting physical interpretation: offloading is feasible if and only if , i.e., the delay on the wired network is less than the maximum tolerable delay, and the overall latency constraint is met (at least) when the wireless and computational resources are fully utilized (i.e., , and ). It is not difficult to check that this worst-case scenario is in fact achieved when (10) is satisfied with equality; in such a case, the (globally optimal) solution to is trivially given by , where is the water-filling solution to (11). Therefore, in the following we will focus w.l.o.g. on under the tacit assumption of strict feasibility [i.e., the inequality in (10) holds strictly].
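The feasibility test described above can be sketched on the eigen-channels of the MIMO link, where water-filling reduces to a scalar power allocation over parallel channels. The sketch below assumes that the condition amounts to a positive residual delay budget plus the latency constraint being met at the water-filling capacity with the full CPU budget; all function and parameter names are hypothetical.

```python
import math

def waterfilling(gains, power):
    """Classical water-filling over parallel channels (gains sorted descending)."""
    g = sorted(gains, reverse=True)
    for k in range(len(g), 0, -1):
        # Hypothesis: exactly the k strongest channels receive power.
        level = (power + sum(1.0 / gj for gj in g[:k])) / k
        if level > 1.0 / g[k - 1]:
            break
    return [max(level - 1.0 / gj, 0.0) for gj in g]

def max_rate(gains, power):
    """Capacity (bits per channel use) achieved by the water-filling allocation."""
    g = sorted(gains, reverse=True)
    p = waterfilling(gains, power)
    return sum(math.log2(1.0 + pj * gj) for pj, gj in zip(p, g))

def offloading_feasible(gains, power, c, w, f_total, t_budget):
    """Assumed reading of the feasibility condition: positive residual budget,
    and latency met when radio and CPU resources are fully used."""
    if t_budget <= 0:
        return False
    return c / max_rate(gains, power) + w / f_total <= t_budget

powers = waterfilling([4.0, 1.0, 0.25], power=1.0)
```

With these gains and a unit power budget the weakest channel stays switched off, which is the usual behaviour of water-filling at low power.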
Solution Analysis: Problem is nonconvex due to the nonconvexity of the energy function. A major contribution of this section is to i) cast into a convex equivalent problem, and ii) compute its global optimal solution (and thus optimal also to ) in closed form. To do so, we introduce first some preliminary definitions.
Let be the following auxiliary convex problem
() 
which corresponds to minimizing the transmit power of the MU under the same latency and power constraints as in . Also, let be the (reduced) eigenvalue decomposition of , with , where is the (semi)unitary matrix whose columns are the eigenvectors associated with the positive eigenvalues of , and is the diagonal matrix, whose diagonal entries are the eigenvalues arranged in decreasing order. We are now ready to establish the connection between and .
Theorem 1.
(b) (and ) has a unique solution , given by
(12) 
where must be chosen so that the latency constraint (a) in is satisfied with equality at , and (intended componentwise).
The water-level can be efficiently computed using the hypothesis-testing-based algorithm described in Algorithm 1.
Proof.
See Appendix A.∎
Theorem 1 is the formal proof that, in the single-user case, the latency constraint has to be met with equality and then the offloading strategy minimizing energy consumption coincides with the one minimizing the transmit power. Note also that has a water-filling-like structure: the optimal transmit “directions” are aligned with the eigenvectors of the equivalent channel . However, differently from the classical water-filling solution [cf. (11)], the water-level is now computed to meet the latency constraints with equality. This means that a transmit strategy using the full power (like ) is no longer optimal. The only case in which is the case where the feasibility condition (10) is satisfied with equality. Note also that the water-level depends now on both communication and computational parameters (the maximum tolerable delay, size of the program state, CPU cycle budget, etc.).
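The difference from classical water-filling can be illustrated on parallel eigen-channels: instead of spending the whole power budget, the water-level is raised only until the rate needed to meet the latency constraint with equality is reached. The sketch below tests hypotheses on the number of active channels, in the spirit of a hypothesis-testing search; it is not claimed to reproduce Algorithm 1, it assumes the full CPU budget is granted to the user, and all names are illustrative.

```python
import math

def required_rate(c, w, f_total, t_budget):
    """Rate that meets the latency constraint with equality when f = f_total."""
    return c / (t_budget - w / f_total)

def min_power_waterfilling(gains, rate_target):
    """Smallest-power allocation over parallel channels achieving rate_target."""
    g = sorted(gains, reverse=True)
    log_g = [math.log2(gj) for gj in g]
    for k in range(len(g), 0, -1):
        # Hypothesis: the k strongest channels are active, so that
        # sum_{j<=k} log2(level * g_j) = rate_target gives level in closed form.
        level = 2.0 ** ((rate_target - sum(log_g[:k])) / k)
        if level > 1.0 / g[k - 1]:
            break
    powers = [max(level - 1.0 / gj, 0.0) for gj in g]
    return level, powers

gains = [4.0, 1.0, 0.25]
level, powers = min_power_waterfilling(gains, rate_target=3.0)
achieved = sum(math.log2(1.0 + p * g) for p, g in zip(powers, gains))
```

The caller still has to check that `sum(powers)` does not exceed the power budget, which is guaranteed whenever the feasibility condition holds.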
IV Computation offloading over multiple cells
In this section we consider the more general multicell scenario described in Sec. II. The overall energy spent by the MUs to remotely run their applications is now given by
(13) 
with defined in (6). If some fairness has to be guaranteed among the MUs, other objective functions of the MUs’ energies can be used, including the weighted sum, the (weighted) geometric mean, etc. As a case study, in the following, we will focus on the minimization of the sum-energy , but the proposed algorithmic framework can be readily applied to the alternative aforementioned functions.
Each MU is subject to the power budget constraint (2) and, in case of offloading, to an overall latency given by
(14) 
The offloading problem in the multicell scenario is then formulated as follows:
() 
where a) represent the users’ latency constraints with ; and the constraint in b) is due to the limited cloud computational resources to be allocated among the MUs.
Feasibility: The following conditions are sufficient for to be nonempty and thus for offloading to be feasible: for all , and there exists a such that
(15) 
Problem is nonconvex, due to the nonconvexity of the objective function and the constraints a). In what follows we exploit the structure of and, building on some recent Successive Convex Approximation (SCA) techniques proposed in [30, 31], we develop a fairly general class of efficient approximation algorithms, all converging to a local optimal solution of . The numerical results will show that the proposed algorithms converge in a few iterations to “good” locally optimal solutions of (that turn out to be quite insensitive to the initialization). The main algorithmic framework, along with its convergence properties, is introduced in Sec. IV-A; alternative distributed implementations are studied in Sec. V.
IV-A Algorithmic design
To solve the nonconvex problem efficiently, we develop an SCA-based method where is replaced by a sequence of strongly convex problems. At the basis of the proposed technique, there is a suitable convex approximation of the nonconvex objective function and the constraints around the iterates of the algorithm, which are preliminarily discussed next.
IV-A1 Approximant of
Let and , with and . Let be any closed convex set containing such that is well-defined on it. Note that such a set exists. For instance, note that at every (feasible) , it must be , , for all and . Hence, condition in can be equivalently rewritten as
so that one can choose .
Following [30, 31], our goal is to build, at each iteration , an approximant, say , of the nonconvex (nonseparable) around the current (feasible) iterate that enjoys the following key properties:
 P1: is uniformly strongly convex on ;
 P2: , ;
 P3: is Lipschitz continuous on ;
where denotes the conjugate gradient of with respect to . Conditions P1-P2 just guarantee that the candidate approximation is strongly convex while preserving the same first-order behaviour of at any iterate ; P3 is a standard continuity requirement.
We build next a satisfying P1-P3. Observe that i) for any given , each term of the sum in [cf. (13)] is the product of two convex functions in [cf. (6)], namely: and ; and ii) the other terms of the sum with are not convex in . Exploiting such a structure, a convex approximation of can be obtained for each MU by convexifying the term and linearizing the nonconvex part . More formally, denoting , for each , let us introduce the “approximation” function :
(16) 
where: the first two terms on the right-hand side are the aforementioned convexification of ; the third term comes from the linearization of , with and denoting the conjugate gradient of with respect to evaluated at , and given by
(17) 
the fourth term in (16) is a quadratic regularization term added to make uniformly strongly convex on .
Based on each , we can now define the candidate sum-energy approximation as: given ,
(18) 
It is not difficult to check that satisfies P1-P3; in particular it is strongly convex on with constant . Note that is also separable in the users’ variables , which is instrumental to obtain distributed algorithms across the SCeNBs; see Sec. V.
IV-A2 Inner convexification of the constraints
We aim at introducing an inner convex approximation, say , of the constraints around , satisfying the following key properties (the proof is omitted for lack of space and reported in Appendix B in the supporting material) [30, 31]:
 C1: is uniformly convex on ;
 C2: , ;
 C3: is continuous on ;
 C4: , and ;
 C5: , ;
 C6: is Lipschitz continuous on .
Conditions C1-C3 are the counterparts of P1-P3 on ; the extra conditions C4-C5 guarantee that is an inner approximation of , implying that any satisfying is feasible also for the original nonconvex problem .
To build a satisfying C1-C6, let us exploit first the concave-convex structure of the rate functions [cf. (3)]:
(19) 
where
(20) 
with defined in (4). Note that and are concave on and convex on , respectively. Using (19), and observing that at any (feasible) , it must be and for all and , the constraints in can be equivalently rewritten as
(21) 
where with a slight abuse of notation we used the same symbol to denote the constraint in the equivalent form.
The desired inner convex approximation is obtained from by retaining the convex part in (21) and linearizing the concave term , resulting in:
(22) 
where each is defined as
(23) 
and .
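The inner-approximation idea can be checked numerically on a scalar analogue. The sketch below assumes the rate of one user splits into a log term that is jointly concave in the own power `p` and the interferer power `q`, minus a log term that is concave in `q`; replacing the subtracted term by its tangent at the current point `q0` gives a concave surrogate that never exceeds the true rate and touches it at `q0`. The gains and function names are illustrative assumptions, and the sign convention may differ from the one used in (19)-(20).

```python
import math

LN2 = math.log(2.0)

def rate(p, q, h=2.0, g=1.0, noise=1.0):
    """Scalar rate of a user with power p under interferer power q."""
    return math.log2(noise + g * q + h * p) - math.log2(noise + g * q)

def rate_lower_bound(p, q, q0, h=2.0, g=1.0, noise=1.0):
    """Concave surrogate: keep the first log term and replace the second
    one by its tangent at q0 (the tangent of a concave function lies above it)."""
    tangent = math.log2(noise + g * q0) + g * (q - q0) / ((noise + g * q0) * LN2)
    return math.log2(noise + g * q + h * p) - tangent

grid = [0.0, 0.5, 1.0, 2.0, 4.0]
gaps = [rate(p, q) - rate_lower_bound(p, q, q0=1.0) for p in grid for q in grid]
```

Because the surrogate is a lower bound on the rate, any point satisfying a minimum-rate (hence maximum-latency) requirement written with the surrogate also satisfies the original requirement, which is the role played by conditions C4-C5.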
IV-A3 Inner SCA algorithm: centralized implementation
We are now ready to introduce the proposed inner convex approximation of the nonconvex problem , which consists in replacing the nonconvex objective function and constraints in with the approximations and , respectively. More formally, given the feasible point , we have
() 
where we denoted by the unique solution of the strongly convex optimization problem.
The proposed solution consists in solving the sequence of problems , starting from a feasible . The formal description of the method is given in Algorithm 2, which is proved to converge to local optimal solutions of the original nonconvex problem in Theorem 2. Note that in Step 3 of the algorithm we include a memory in the update of the iterate . A practical termination criterion in Step 1 is , where is the prescribed accuracy.
Theorem 2.
Given the nonconvex problem , choose and such that
(24) 
Then every limit point of (at least one of such points exists) is a stationary solution of . Furthermore, none of such points is a local maximum of the energy function .
Proof.
The proof is omitted for lack of space and reported in Appendix B of the supporting material. ∎
Theorem 2 offers some flexibility in the choice of the free parameters and while guaranteeing convergence of Algorithm 2. For instance, is positive if all and are positive (but arbitrary); in the case of full-column-rank matrices , one can also set (still resulting in ). Many choices are possible for the step-size ; a practical rule satisfying (24) that we found effective in our experiments is [32]:
(25) 
with .
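The overall mechanism (solve a strongly convex surrogate, then move towards its solution with a diminishing step) can be sketched on a toy scalar problem that is unrelated to the offloading problem itself. The sketch assumes the objective $U(x) = x^4/4 - x^2$ on $[-2, 2]$, a surrogate that keeps the convex part $x^4/4$, linearizes the concave part $-x^2$, and adds a proximal term with weight `tau`, and a step-size recursion of the form $\gamma^{\nu+1} = \gamma^{\nu}(1 - \alpha \gamma^{\nu})$ with $0 < \alpha < 1/\gamma^{0}$; this recursion is an assumed form of the rule referred to above, and all names and constants are hypothetical.

```python
def step_sizes(gamma0=1.0, alpha=0.01):
    """Diminishing rule gamma <- gamma * (1 - alpha * gamma), 0 < alpha < 1/gamma0."""
    gamma = gamma0
    while True:
        yield gamma
        gamma *= 1.0 - alpha * gamma

def solve_surrogate(x_k, tau, lo=-2.0, hi=2.0):
    """Minimize x**4/4 - 2*x_k*(x - x_k) + (tau/2)*(x - x_k)**2 on [lo, hi].

    The derivative x**3 + tau*x - (2 + tau)*x_k is increasing in x, so
    bisection on its sign locates the unique minimizer.
    """
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if mid ** 3 + tau * mid - (2.0 + tau) * x_k > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def sca(x0, tau=1.0, iters=200):
    """Toy SCA loop: surrogate minimization followed by an update with memory."""
    x = x0
    gammas = step_sizes()
    for _ in range(iters):
        x_hat = solve_surrogate(x, tau)
        x += next(gammas) * (x_hat - x)   # convex combination of old and new point
    return x

x_star = sca(0.5)
```

Starting from `x0 = 0.5`, the iterates increase monotonically towards the stationary point $\sqrt{2}$ of $U$; the other stationary point, $x = 0$, is a local maximum and is not approached from this start, in line with the last statement of Theorem 2.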
On the implementation of Algorithm 2: Since the base stations are connected to the cloud through high-speed wired links, a good candidate place to run Algorithm 2 is the cloud itself: The cloud collects first all system parameters needed to run the algorithm from the SCeNBs (MUs’ channel state information, maximum tolerable latency, etc.); then, if the feasibility conditions (15) are satisfied, the cloud solves the strongly convex problems (using any standard nonlinear programming solver), and sends the solutions back to the corresponding SCeNBs; finally, each SCeNB communicates the optimal transmit parameters to the MUs it is serving.
Related works: Algorithm 2 hinges on the idea of successive convex programming, which aims at computing stationary solutions of some classes of nonconvex problems by solving a sequence of convexified subproblems. Some relevant instances of this method that have attracted significant interest in recent years are: i) the basic DCA (Difference-of-Convex Algorithm) [33, 34]; ii) the M(ajorization)-M(inimization) algorithm [35, 36]; iii) alternating/successive minimization methods [37, 38, 39]; and iv) partial linearization methods [40, 41, 32]. The aforementioned methods identify classes of “favorable” nonconvex functions, for which a suitable convex approximation can be obtained and convergence of the associated sequential convex programming method can be proved. However, the sum-energy function in (13) and the resulting nonconvex optimization problem do not belong to any of the above classes. More specifically, what makes current algorithms not readily applicable to Problem is the lack in the objective function of a(n additively) separable convex and nonconvex part [each in (13) is in fact the ratio of two functions, and , of the same set of variables]. Therefore, the proposed approximation function , along with the resulting SCA algorithm, i.e., Algorithm 2, is an innovative contribution of this work.
V Distributed Implementation
To alleviate the communication overhead of a centralized implementation (Algorithm 2), in this section we devise distributed algorithms converging to local optimal solutions of . Following [31], the main idea is to choose the approximation functions and so that (on top of satisfying conditions P1-P3 and C1-C6, needed for convergence) the resulting convexified problems can be decomposed into (smaller) subproblems solvable in parallel across the SCeNBs, with limited signaling between the SCeNBs and the cloud.
Since the approximation function introduced in (18) is (sum) separable in the optimization variables of the MUs in each cell, any choice of ’s enjoying the same decomposability structure leads naturally to convexified problems that can be readily decomposed across the SCeNBs by using standard primal or dual decomposition techniques.
Of course there is more than one choice of meeting the above requirements; all of them lead to convergent algorithms that however differ in convergence speed, complexity, communication overhead, and a priori knowledge of the system parameters. As a case study, in the following, we consider two representative valid approximants. The first candidate is obtained exploiting the Lipschitz property of the gradient of the rate functions , whereas the second one is based on an equivalent reformulation of introducing proper slack variables. The first choice offers a lot of flexibility in the design of distributed algorithms (both primal and dual-based schemes can be invoked), but it requires knowledge of all the Lipschitz constants. The second choice does not need this knowledge, but it involves a higher computational cost at the SCeNBs’ side, due to the presence of the slack variables.
V-A Per-cell distributed dual and primal decompositions
The approximation function in (22) has the desired property of preserving the structure of the original constraint function “as much as possible” by keeping the convex part of unaltered. Numerical results show that this choice leads to fast convergence schemes; see Sec. VI. However, the structure of prevents from being decomposed across the SCeNBs due to the nonadditive coupling among the variables in . To cope with this issue, we lower bound [and thus upper bound in (22)], so that we obtain an alternative approximation of that is separable in all the ’s, while still satisfying C1-C6. Invoking the Lipschitz property of the (conjugate) gradients on , with constant [given in (19) in Appendix B of the supporting material], we have