# Joint Optimization of Radio and Computational Resources for Multicell Mobile-Edge Computing

## Abstract

Migrating computation-intensive tasks from mobile devices to more resourceful cloud servers is a promising technique to increase the computational capacity of mobile devices while saving their battery energy. In this paper, we consider a MIMO multicell system where multiple mobile users (MUs) ask for computation offloading to a common cloud server. We formulate the offloading problem as the *joint* optimization of the radio resources (the transmit precoding matrices of the MUs) and the computational resources (the CPU cycles/second assigned by the cloud to each MU) in order to minimize the overall users' energy consumption, while meeting latency constraints. The resulting optimization problem is nonconvex (in the objective function and constraints). Nevertheless, in the single-user case, we are able to express the global optimal solution in closed form. In the more challenging multiuser scenario, we propose an iterative algorithm, based on a novel successive convex approximation technique, converging to a local optimal solution of the original nonconvex problem. We then reformulate the algorithm in a distributed and parallel implementation across the radio access points, requiring only limited coordination/signaling with the cloud. Numerical results show that the proposed schemes outperform disjoint optimization algorithms.

## 1. Introduction

Mobile terminals, such as smartphones, tablets and netbooks, are increasingly penetrating our everyday lives as convenient tools for communication, entertainment, business, social networking, news, etc. Current predictions foresee a doubling of mobile data traffic every year. However, such growth in mobile wireless traffic is not matched by an equally fast improvement in mobile handsets' batteries, as testified in [3]. The limited battery lifetime is thus going to be the stumbling block to the deployment of computation-intensive applications for mobile devices. At the same time, in the Internet-of-Things (IoT) paradigm, a myriad of heterogeneous devices, with a wide range of computational capabilities, are going to be interconnected. For many of them, the local computation resources are insufficient to run sophisticated applications. In all these cases, a possible strategy to overcome the above energy/computation bottleneck consists in enabling resource-constrained mobile devices to offload their most energy-consuming tasks to nearby, more resourceful servers. This strategy has a long history and is reported in the literature under different names, such as *cyber foraging* [4] or *computation offloading* [5]. In recent years, cloud computing (CC) has provided a strong impulse to computation offloading through virtualization, which decouples the application environment from the underlying hardware resources and thus enables an efficient usage of the available computing resources. In particular, Mobile Cloud Computing (MCC) [6] makes it possible for mobile users to access cloud resources, such as infrastructures, platforms, and software, on demand. Several works addressed mobile computation offloading, such as [7]. Recent surveys are [6], [17], and [18]. Some works addressed the problem of partitioning a program and offloading its most demanding tasks, as e.g. in [7].
Specific examples of mobile computation offloading techniques are *MAUI* [19], *ThinkAir* [20], and *Phone2Cloud* [21]. The trade-off between the energy spent for computation and communication was studied in [12]. A dynamic formulation of computation offloading was proposed in [15]. These works optimized offloading strategies assuming a given radio access, and concentrated on single-user scenarios. A *joint* optimization of radio and computational resources was proposed in [23] for the single-user case. The joint optimization was then extended to the multiuser case in [24]; see also [25] for a recent survey on joint optimization for computation offloading in a 5G perspective. The optimal joint allocation of radio and computing resources in [24], [25] was assumed to be managed in a centralized way in the cloud. A decentralized solution, based on a game-theoretic formulation of the problem, was recently proposed in [26], [11]. In current cellular networks, the major obstacles limiting an effective deployment of MCC strategies are: i) the energy spent by mobile terminals, especially cell-edge users, for radio access; and ii) the latency experienced in reaching the (remote) cloud server through a wide area network (WAN). Indeed, in macro-cellular systems, the transmit power necessary for cell-edge users to access a remote base station might nullify all the potential benefits coming from offloading. Moreover, in many real-time mobile applications (e.g., online games, speech recognition, Facetime), the user Quality of Experience (QoE) is strongly affected by the system response time. Since controlling latency over a WAN might be very difficult, in many circumstances the QoE associated with MCC could be poor.

A possible way to tackle these challenges is to bring *both* radio access and computational resources closer to the MUs. This idea was suggested in [27], with the introduction of *cloudlets*, providing proximity radio access to fixed servers through Wi-Fi. However, the lack of available fixed servers could limit the applicability of cloudlets. The European project TROPIC [28] suggested endowing small cell LTE base stations with, albeit limited, cloud functionalities. In this way, one can exploit the potentially dense deployment of small cell base stations to facilitate proximity access to computing resources, with advantages over Wi-Fi access in terms of Quality-of-Service guarantees and a single-technology system (no need for the MUs to switch between cellular and Wi-Fi standards). Very recently, the European Telecommunications Standards Institute (ETSI) launched a new standardization group on the so-called *Mobile-Edge Computing* (MEC), whose aim is to provide information technology and cloud-computing capabilities within the Radio Access Network (RAN), in close proximity to mobile subscribers, in order to offer a service environment characterized by proximity, low latency, and high-rate access [29].

Merging MEC with the dense deployment of (small cell) Base Stations (BSs), as foreseen in the 5G standardization roadmap, makes possible real proximity, ultra-low-latency access to cloud functionalities [25]. However, in a dense deployment scenario, offloading becomes much more complicated because of intercell interference. The goal of this paper is to propose a *joint* optimization of radio and computational resources for computation offloading in a dense deployment scenario, *in the presence of intercell interference*. More specifically, the offloading problem is formulated as the minimization of the overall energy consumption, at the mobile terminals' side, under transmit power and latency constraints. The optimization variables are the mobile radio resources (the precoding, equivalently covariance, matrices of the mobile MIMO transmitters) and the computational resources (the CPU cycles/second assigned by the cloud to each MU). The latency constraint is what couples the computation and communication optimization variables. This problem is much more challenging than the (special) cases studied in the literature because of the presence of intercell interference, which introduces a coupling among the precoding matrices of all MUs while making the optimization problem nonconvex.
In this context, the main contributions of the paper are the following: i) in the single-user case, we first establish the equivalence between the original nonconvex problem and a *convex one*, and then derive the *closed form* of its (global optimal) solution; ii) in the multi-cell case, hinging on recent Successive Convex Approximation (SCA) techniques [30], we devise an iterative algorithm that is proved to converge to local optimal solutions of the original nonconvex problem; and iii) we propose alternative decomposition algorithms to solve the original centralized problem in a distributed form, requiring limited signaling between the BSs and the cloud; the algorithms differ in convergence speed, computational effort, communication overhead, and a-priori knowledge of the system parameters, but they are all convergent under a unified set of conditions. Numerical results show that all the proposed schemes converge quite fast to "good" solutions, yielding a significant energy saving with respect to disjoint optimization procedures, for applications requiring intensive computations and a limited exchange of data to enable offloading.

The rest of the paper is organized as follows. In Section 2 we introduce the system model; Section 3 formulates the offloading optimization problem in the single user case, whereas Section 4 focuses on the multi-cell scenario along with the proposed SCA algorithmic framework. The decentralized implementation is discussed in Section 5.

## 2. Computation offloading

Let us consider a network composed of cells; in each cell , there is one Small Cell enhanced Node B (SCeNB in LTE terminology) serving MUs. We denote by the -th user in the cell , and by the set of all the users. Each MU and SCeNB are equipped with transmit and receive antennas, respectively. The SCeNBs are all connected to a common cloud provider, able to serve multiple users concurrently. We assume that MUs in the same cell transmit over orthogonal channels, whereas users of different cells may interfere with each other.

In this scenario, each MU is willing to run an application within a given maximum time , while minimizing the energy consumption at the MU’s side. To offload computations to the remote cloud, the MU has to send all the needed information to the server. Each module to be executed is characterized by: the number of CPU cycles necessary to run the module itself; the number of input bits necessary to transfer the program execution from local to remote sides; and the number of output bits encoding the result of the computation, to be sent back from remote to local sides.

The MU can perform its computations locally or offload them to the cloud, depending on which strategy requires less energy, while satisfying the latency constraint. In case of offloading, the latency incorporates the time to transmit the input bits to the server, the time necessary for the server to execute the instructions, and the time to send the result back to the MU. More specifically, the overall latency experienced by each MU can be written as

where is the time necessary for the MU to transfer the input bits to its SCeNB; is the time for the server to execute CPU cycles; and is the time necessary for SCeNB to send the bits to the cloud through the backhaul link plus the time necessary to send back the result (encoded in bits) from the server to MU . We derive next an explicit expression of and as a function of the radio and computational resources.
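As a numerical illustration, the three latency terms can be composed as follows. This is a minimal sketch; the names (`b_in` for the input bits, `rate` for the uplink rate, `w` for the CPU cycles, `f_frac` for the fraction of the cloud's CPU rate `F_T`, `t_backhaul` for the fixed backhaul-plus-downlink time) are illustrative placeholders, not the paper's notation:

```python
def offload_latency(b_in, rate, w, f_frac, F_T, t_backhaul):
    """Total offloading latency for one user (illustrative sketch).

    b_in:       input bits to upload to the SCeNB
    rate:       uplink transmission rate (bits/s)
    w:          CPU cycles the application requires
    f_frac:     fraction of the cloud's CPU rate F_T (cycles/s) assigned
    t_backhaul: backhaul time plus time to send the result back
    """
    t_tx = b_in / rate            # time to transfer the program state
    t_exe = w / (f_frac * F_T)    # remote execution time
    return t_tx + t_exe + t_backhaul
```

The sum makes the coupling explicit: a user with a poor channel (low `rate`) can still meet its deadline if the cloud grants it a larger `f_frac`, and vice versa.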

**Radio resources**: The optimization variables at radio level are the users’ transmit covariance matrices , subject to power budget constraints

where is the average transmit power of user . We will denote by the joint set .

For any given profile , the maximum achievable rate of MU is:

where

is the covariance matrix of the noise (assumed to be diagonal w.l.o.g.; otherwise one can always pre-whiten the channel matrices) plus the inter-cell interference at the SCeNB (treated as additive noise); is the channel matrix of the uplink in the cell , whereas is the cross-channel matrix between the interfering MU in the cell and the SCeNB of cell ; and denotes the tuple of the covariance matrices of all users interfering with the SCeNB .

Given each , the time necessary for user in cell to transmit the input bits of duration to its SCeNB can be written as

where . The energy consumption due to offloading is then

which depends also on the covariance matrices of the users in the other cells, due to the intercell interference.

**Computational resources**. The cloud provider is able to serve multiple users concurrently. The computational resources made available by the cloud and shared among the users are quantified in terms of number of CPU cycles/second, set to ; let be the fraction of assigned to each user . All the are thus nonnegative optimization variables to be determined, subject to the computational budget constraint . Given the resource assignment , the time needed to run CPU cycles of user ’s instructions remotely is then

The expression of the overall latency [cf. (Equation 1), (Equation 5), and (Equation 7)] clearly shows the interplay between radio access and computational aspects, which motivates a *joint* optimization of the radio resources, the transmit covariance matrices of the MUs, and the computational resources, the computational rate allocation .
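The computational side of this interplay can be sketched as follows, assuming (with hypothetical names) per-user cycle demands `w`, assigned fractions `f` of the cloud's CPU rate `F_T`, and the budget constraint that the fractions sum to at most one:

```python
def remote_exec_times(w, f, F_T):
    """Remote execution times under the cloud budget (illustrative sketch).

    w:   list of CPU-cycle demands, one per user
    f:   list of assigned fractions of the cloud rate F_T (must sum to <= 1)
    F_T: total cloud computational rate in CPU cycles/second
    """
    assert sum(f) <= 1 + 1e-12, "computational budget violated"
    return [wi / (fi * F_T) for wi, fi in zip(w, f)]
```

Shrinking one user's fraction lengthens its remote execution time, which must then be compensated on the radio side (a higher uplink rate, hence more transmit energy) to keep the total latency within the deadline.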

We are now ready to formulate the offloading problem rigorously. We focus first on the single-user scenario (cf. Section 3); this will allow us to shed light on the special structure of the optimal solution. Then, we will extend the formulation to the multicell case (cf. Section 4).

## 3. The Single-user case

In the single-user case, there is only one active MU having access to the cloud. In such an interference-free scenario, the maximum achievable rate of the MU and the energy consumption due to offloading reduce to [cf. (Equation 3) and (Equation 6)]

and

respectively, with (for notational simplicity, we omit the user index; denotes now the covariance matrix of the MU).

We formulate the offloading problem as the minimization of the energy spent by the MU to run its application remotely, subject to latency and transmit power constraints, as follows:

where a) reflects the user latency constraint [cf. (Equation 1)], with capturing all the constant terms, i.e., ; b) imposes a limit on the cloud computational resources made available to the users; and c) is the power budget constraint on the radio resources.

**Feasibility:** Depending on the system parameters, problem Equation 10 may be feasible or not. In the latter case, offloading is not possible and thus the MU will perform its computations locally. It is not difficult to prove that the following condition is *necessary* and *sufficient* for to be nonempty and thus for offloading to be feasible:

where is the capacity of the MIMO link of the MU, i.e.,

The unique (closed-form) solution of (Equation 12) is the well-known MIMO water-filling. Note that condition has an interesting physical interpretation: offloading is feasible if and only if , i.e., the delay on the wired network is less than the maximum tolerable delay, and the overall latency constraint is met (at least) when the wireless and computational resources are fully utilized (i.e., , and ). It is not difficult to check that this worst-case scenario is in fact achieved when is satisfied with equality; in such a case, the (globally optimal) solution to Equation 10 is trivially given by , where is the waterfilling solution to . Therefore, in the following we will focus w.l.o.g. on Equation 10 under the tacit assumption of *strict* feasibility [i.e., the inequality in holds strictly].
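The capacity term entering the feasibility condition is the classical MIMO water-filling. The following is a numerical sketch (not the paper's code), assuming a flat MIMO channel `H`, a noise covariance `Rn`, and a power budget `P`; the water level is found by bisection:

```python
import numpy as np

def waterfilling_capacity(H, Rn, P):
    """Capacity (bits/channel use) of a single-user MIMO link (sketch)."""
    # Whiten the noise, then water-fill over the whitened channel's eigenmodes.
    L = np.linalg.cholesky(Rn)
    Hw = np.linalg.solve(L, H)
    g = np.linalg.svd(Hw, compute_uv=False) ** 2   # eigenmode gains

    # Bisection on the water level mu so that sum_i max(mu - 1/g_i, 0) = P.
    lo, hi = 0.0, P + float(np.max(1.0 / g))
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        p = np.maximum(mu - 1.0 / g, 0.0)
        if p.sum() > P:
            hi = mu
        else:
            lo = mu
    p = np.maximum(mu - 1.0 / g, 0.0)
    return float(np.sum(np.log2(1.0 + g * p)))
```

In the feasibility test, this capacity (times the bandwidth) bounds how fast the input bits can possibly be uploaded when the radio resources are fully utilized.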

**Solution Analysis:** Problem Equation 10 is nonconvex due to the nonconvexity of the energy function. A major contribution of this section is to i) cast Equation 10 into an equivalent convex problem, and ii) compute its global optimal solution (which is thus also optimal for Equation 10) in closed form. To do so, we first introduce some preliminary definitions.

Let Equation 13 be the following auxiliary *convex* problem

which corresponds to minimizing the transmit power of the MU under the same latency and power constraints as in Equation 10. Also, let UDU^H be the (reduced) eigenvalue decomposition of , where U is the (semi-)unitary matrix whose columns are the eigenvectors associated with the positive eigenvalues of , and D is the diagonal matrix whose diagonal entries are those eigenvalues arranged in decreasing order. We are now ready to establish the connection between Equation 10 and Equation 13.

See Appendix A.1.

Theorem 1 formally proves that, in the single-user case, the latency constraint has to be met with equality, so that the offloading strategy minimizing the energy consumption coincides with the one minimizing the transmit power. Note also that has a water-filling-like structure: the optimal transmit "directions" are aligned with the eigenvectors U of the equivalent channel . However, differently from the classical waterfilling solution [cf. (Equation 12)], the water level is now computed to meet the latency constraint with equality. This means that a transmit strategy using the full power (like ) is no longer optimal; full-power transmission remains optimal only when the feasibility condition (Equation 11) is satisfied with equality. Note also that the water level now depends on *both* communication and computational parameters (the maximum tolerable delay, the size of the program state, the CPU cycle budget, etc.).
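To illustrate how the water level is set by the rate target rather than by the power budget, the following sketch computes, over parallel eigenmodes with gains `g`, the minimum-power allocation achieving a target rate `R_target` (which, in the paper's setting, would be the rate dictated by the latency constraint); all names are illustrative:

```python
import numpy as np

def min_power_for_rate(g, R_target):
    """Minimum-power water-filling-like allocation hitting a target rate.

    g:        eigenmode gains of the equivalent channel (illustrative)
    R_target: rate (bits/channel use) the latency constraint would impose

    The water level mu is found by bisection on the achieved rate,
    not by saturating a power budget.
    """
    g = np.asarray(g, dtype=float)
    lo, hi = 0.0, 1e6
    for _ in range(200):
        mu = 0.5 * (lo + hi)
        p = np.maximum(mu - 1.0 / g, 0.0)
        rate = float(np.sum(np.log2(1.0 + g * p)))
        if rate < R_target:
            lo = mu
        else:
            hi = mu
    return p, rate
```

Raising `R_target` (a tighter deadline or a smaller share of cloud cycles) raises the water level and hence the transmit power, which is exactly the communication/computation coupling discussed above.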

## 4. Computation offloading over multiple cells

In this section we consider the more general multi-cell scenario described in Section 2. The overall energy spent by the MUs to remotely run their applications is now given by

with defined in (Equation 6). If some fairness has to be guaranteed among the MUs, other objective functions of the MUs' energies can be used, including the weighted sum, the (weighted) geometric mean, etc. As a case study, in the following we will focus on the minimization of the sum-energy , but the proposed algorithmic framework can be readily applied to the aforementioned alternative functions.

Each MU is subject to the power budget constraint and, in case of offloading, to an overall latency given by

The offloading problem in the multi-cell scenario is then formulated as follows:

where a) represents the users' latency constraints, with ; and the constraint in b) is due to the limited cloud computational resources to be allocated among the MUs.

**Feasibility**: The following conditions are sufficient for to be nonempty and thus for offloading to be feasible: for all , and there exists a such that

Problem is nonconvex, due to the nonconvexity of the objective function and the constraints a). In what follows we exploit the structure of and, building on some recent Successive Convex Approximation (SCA) techniques proposed in [30], we develop a fairly general class of efficient approximation algorithms, all converging to a local optimal solution of . The numerical results will show that the proposed algorithms converge in a few iterations to “good” locally optimal solutions of (that turn out to be quite insensitive to the initialization). The main algorithmic framework, along with its convergence properties, is introduced in Section 4.1; alternative distributed implementations are studied in Section 5.

### 4.1. Algorithmic design

To solve the non-convex problem efficiently, we develop a SCA-based method where is replaced by a sequence of *strongly convex* problems. At the basis of the proposed technique, there is a suitable *convex* approximation of the nonconvex objective function and the constraints around the iterates of the algorithm, which are preliminarily discussed next.

#### Approximant of

Let and , with and . Let be any closed convex set containing such that is well-defined on it. Note that such a set exists. For instance, noting that at every (feasible) , it must be , , for all and . Hence, condition in Equation 16 can be equivalently rewritten as

so that one can choose .

Following [30], our goal is to build, at each iteration , an approximant, say , of the nonconvex (nonseparable) around the current (feasible) iterate that enjoys the following key properties:

- P1: is uniformly *strongly convex* on ;
- P2: , ;
- P3: is Lipschitz continuous on ;

where denotes the conjugate gradient of with respect to . Conditions P1-P2 just guarantee that the candidate approximation is strongly convex while preserving the same first order behaviour of at any iterate ; P3 is a standard continuity requirement.

We build next a satisfying P1-P3. Observe that i) for any given , each term of the sum in [cf. (Equation 14)] is the product of two convex functions in [cf. (Equation 6)], namely: and ; and ii) the other terms of the sum with are not convex in . Exploiting such a structure, a convex approximation of can be obtained for each MU by convexifying the term and linearizing the nonconvex part . More formally, denoting , for each , let us introduce the “approximation” function :

where: the first two terms on the right-hand side are the aforementioned convexification of ; the third term comes from the linearization of , with and denoting the conjugate gradient of with respect to evaluated at , and given by

the fourth term in is a quadratic regularization term added to make uniformly strongly convex on .

Based on each , we can now define the candidate sum-energy approximation as: given ,

It is not difficult to check that satisfies P1-P3; in particular, it is strongly convex on with constant . Note that is also separable in the users' variables , which is instrumental in obtaining distributed algorithms across the SCeNBs; see Section 5.
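A scalar analogue of this construction (keep the convex part, linearize the nonconvex part at the current iterate, add a proximal term for uniform strong convexity) can be written as follows; `f_cvx`, `f_ncvx`, `grad_ncvx`, and `tau` are placeholder names, not the paper's symbols:

```python
def surrogate(x, x_k, f_cvx, f_ncvx, grad_ncvx, tau=1.0):
    """Strongly convex surrogate of f = f_cvx + f_ncvx around x_k (sketch).

    The convex part is kept exactly; the nonconvex part is linearized at
    x_k; the quadratic term with tau > 0 enforces strong convexity.
    """
    return (f_cvx(x)
            + f_ncvx(x_k) + grad_ncvx(x_k) * (x - x_k)   # linearized part
            + 0.5 * tau * (x - x_k) ** 2)                # proximal term
```

By construction the surrogate matches the original function's value and first derivative at `x_k` (the scalar analogue of property P2), while being strongly convex everywhere.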

#### Inner convexification of the constraints

We aim at introducing an inner convex approximation, say , of the constraints around , satisfying the following key properties (the proof is omitted for lack of space and reported in Appendix B in the supporting material) [30]:

- C1: is uniformly convex on ;
- C2: , ;
- C3: is continuous on ;
- C4: , and ;
- C5: , ;
- C6: is Lipschitz continuous on .

Conditions C1-C3 are the counterparts of P1-P3 on ; the extra conditions C4-C5 guarantee that is an inner approximation of , implying that any satisfying is also feasible for the original nonconvex problem .

To build a satisfying C1-C6, let us exploit first the concave-convex structure of the rate functions [cf. (Equation 3)]:

where

with defined in (Equation 4). Note that and are concave on and convex on , respectively. Using , and observing that at any (feasible) , it must be and for all and , the constraints in Equation 16 can be equivalently rewritten as

where with a slight abuse of notation we used the same symbol to denote the constraint in the equivalent form.

The desired inner convex approximation is obtained from by retaining the convex part in (Equation 22) and linearizing the concave term , resulting in:

where each is defined as

and .

#### Inner SCA algorithm: centralized implementation

We are now ready to introduce the proposed inner convex approximation of the nonconvex problem , which consists in replacing the nonconvex objective function and constraints in with the approximations and , respectively. More formally, given the feasible point , we have

where we denoted by the unique solution of the strongly convex optimization problem.

The proposed solution consists in solving the sequence of problems Equation 24, starting from a feasible . The formal description of the method is given in Algorithm 2, which is proved to converge to local optimal solutions of the original nonconvex problem in Theorem ?. Note that in Step 3 of the algorithm we include a memory in the update of the iterate . A practical termination criterion in Step 1 is , where is the prescribed accuracy.
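The overall iteration can be sketched generically as follows (a scalar variable is used for simplicity; `best_response` stands for the solver of the strongly convex subproblem built at the current iterate, and the step size implements the memory in the update):

```python
def sca_loop(x0, best_response, step, delta=1e-6, max_iter=1000):
    """Generic SCA iteration with memory (illustrative sketch).

    best_response(x_k): unique minimizer of the surrogate built at x_k
    step(k):            step size gamma^k in (0, 1]
    delta:              prescribed accuracy for the termination criterion
    """
    x = x0
    for k in range(max_iter):
        x_hat = best_response(x)
        if abs(x_hat - x) <= delta:       # termination criterion
            break
        x = x + step(k) * (x_hat - x)     # memory in the iterate update
    return x
```

With `step(k) = 1` the scheme reduces to plainly jumping to the subproblem solution; smaller steps damp the update, which is what enables the convergence guarantees under diminishing step-size rules.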

The proof is omitted for lack of space and reported in Appendix B of the supporting material.

Theorem ? offers some flexibility in the choice of the free parameters and while guaranteeing convergence of Algorithm 2. For instance, is positive if all and are positive (but arbitrary); in the case of full-column-rank matrices , one can also set (still resulting in ). Many choices are possible for the step-size ; a practical rule satisfying ( ?) that we found effective in our experiments is [32]:

with .
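The step-size recursion above can be generated directly as follows (`alpha` stands for the rule's free parameter, assumed in (0, 1)):

```python
def gamma_sequence(gamma0=1.0, alpha=1e-2, n=5):
    """Diminishing step sizes via gamma^{k+1} = gamma^k * (1 - alpha * gamma^k)."""
    g = [gamma0]
    for _ in range(n - 1):
        g.append(g[-1] * (1.0 - alpha * g[-1]))
    return g
```

The sequence is positive and strictly decreasing, decaying slowly enough (roughly like 1/k) to satisfy the standard non-summability requirement of diminishing-step-size convergence proofs.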

*On the implementation of Algorithm 2:* Since the base stations are connected to the cloud through high-speed wired links, a good candidate place to run Algorithm 2 is the cloud itself: the cloud first collects from the SCeNBs all the system parameters needed to run the algorithm (MUs' channel state information, maximum tolerable latency, etc.); then, if the feasibility conditions are satisfied, the cloud solves the strongly convex problems (using any standard nonlinear programming solver) and sends the solutions back to the corresponding SCeNBs; finally, each SCeNB communicates the optimal transmit parameters to the MUs it serves.

*Related works:* Algorithm 2 hinges on the idea of successive convex programming, which aims at computing stationary solutions of some classes of nonconvex problems by solving a sequence of convexified subproblems. Some relevant instances of this method that have attracted significant interest in recent years are: i) the basic DCA (Difference-of-Convex Algorithm) [33]; ii) the M(ajorization)-M(inimization) algorithm [35]; iii) alternating/successive minimization methods [37]; and iv) partial linearization methods [40]. The aforementioned methods identify classes of “favorable” nonconvex functions, for which a suitable convex approximation can be obtained and convergence of the associated sequential convex programming method can be proved. However, the sum-energy function in (Equation 14) and the resulting nonconvex optimization problem do not belong to any of the above classes. More specifically, what makes current algorithms not readily applicable to Problem is the lack in the objective function of a(n additively) separable convex and nonconvex part [each in (Equation 14) is in fact the ratio of two functions, and , of the *same* set of variables]. Therefore, the proposed approximation function , along with the resulting SCA-algorithm, i.e., Algorithm 2, are an innovative contribution of this work.

## 5. Distributed Implementation

To alleviate the communication overhead of a centralized implementation (Algorithm 2), in this section we devise *distributed* algorithms converging to local optimal solutions of . Following [31], the main idea is to choose the approximation functions and so that (on top of satisfying conditions P1-P3 and C1-C6, needed for convergence) the resulting convexified problems can be decomposed into (smaller) subproblems solvable in parallel across the SCeNBs, with limited signaling between the SCeNBs and the cloud.

Since the approximation function introduced in (Equation 19) is (sum) separable in the optimization variables of the MUs in each cell, any choice of ’s enjoying the same decomposability structure leads naturally to convexified problems that can be readily decomposed across the SCeNBs by using standard primal or dual decomposition techniques.

Of course, there is more than one choice of meeting the above requirements; all of them lead to *convergent* algorithms that however differ in convergence speed, complexity, communication overhead, and a-priori knowledge of the system parameters. As a case study, in the following we consider two representative valid approximants. The first candidate is obtained by exploiting the Lipschitz property of the gradient of the rate functions , whereas the second one is based on an equivalent reformulation of introducing proper slack variables. The first choice offers a lot of flexibility in the design of distributed algorithms (both primal and dual-based schemes can be invoked), but it requires knowledge of all the Lipschitz constants. The second choice does not need this knowledge, but it involves a higher computational cost at the SCeNBs' side, due to the presence of the slack variables.

### 5.1. Per-cell distributed dual and primal decompositions

The approximation function in (Equation 23) has the desired property of preserving the structure of the original constraint function "as much as possible" by keeping the convex part of unaltered. Numerical results show that this choice leads to fast-converging schemes; see Section 6. However, the structure of prevents it from being decomposed across the SCeNBs, due to the *nonadditive* coupling among the variables in . To cope with this issue, we lower bound [and thus upper bound in (Equation 23)], so that we obtain an alternative approximation of that is *separable in all* the 's, while still satisfying C1-C6. Invoking the Lipschitz property of the (conjugate) gradients on , with constant [given in (19) in Appendix B of the supporting material], we have

for all , where each and are defined respectively as

with and

Note that is (sum) separable in the MUs’ covariance matrices ’s. The desired approximant of can be then obtained just replacing in with [cf. (Equation 23)], resulting in

with and given by

It is not difficult to check that , on top of being separable in the MUs' covariance matrices, also satisfies the required conditions C1-C6. Using instead of , the convexified subproblem replacing is: given ,

where with a slight abuse of notation we still use to denote the unique solution of Equation 27.

Problem Equation 27 is now (sum) separable in the MUs’ covariance matrices; it can be solved in a distributed way using standard primal or dual decomposition techniques. We briefly show next how to customize standard dual algorithms to Equation 27.
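As a generic illustration of the dual approach, the following sketch dualizes a single coupling (budget) constraint with a multiplier updated by projected subgradient ascent, while the decoupled subproblems are solved in parallel; `solve_cell` is a hypothetical per-cell solver returning that cell's resource request at the current price, not an interface from the paper:

```python
def dual_subgradient(solve_cell, n_cells, steps=50, lr=0.1):
    """Dual-decomposition sketch for a single coupling constraint sum(f) <= 1.

    solve_cell(n, lam): cell n's resource request given the multiplier lam
    lr:                 subgradient step size for the multiplier update
    """
    lam = 0.0
    for _ in range(steps):
        # Per-cell subproblems: independent given lam, solvable in parallel.
        f = [solve_cell(n, lam) for n in range(n_cells)]
        # Cloud side: projected subgradient ascent on the budget constraint.
        lam = max(0.0, lam + lr * (sum(f) - 1.0))
    return lam, f
```

The signaling pattern mirrors the paper's architecture: each SCeNB only reports a scalar request, and the cloud only broadcasts the updated price.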

#### Per-cell optimization via dual decomposition

The subproblems Equation 27 can be solved in a distributed way if the side constraints are dualized (note that there is zero duality gap). The dual problem associated with Equation 27 is: given ,