Generalized Sparse and Low-Rank Optimization for Ultra-Dense Networks
Abstract
Ultra-dense network (UDN) is a promising technology to further evolve wireless networks and meet the diverse performance requirements of 5G networks. With abundant access points, each with communication, computation and storage resources, UDNs bring unprecedented benefits, including significant improvements in network spectral efficiency and energy efficiency, greatly reduced latency to enable novel mobile applications, and the capability of providing massive access for Internet of Things (IoT) devices. However, such great promise comes with formidable research challenges. To design and operate such complex networks with various types of resources, efficient and innovative methodologies will be needed. This motivates the recent introduction of highly structured and generalizable models for network optimization. In this article, we present recently proposed large-scale sparse and low-rank frameworks for optimizing UDNs, supported by various motivating applications. Special attention is paid to algorithmic approaches for dealing with nonconvex objective functions and constraints, as well as to computational scalability.
1 Introduction
As mobile data traffic keeps growing at an exponential rate, and mobile applications pose increasingly stringent and diverse requirements, wireless networks are facing unprecedented pressure. To further evolve wireless networks and maintain their competitiveness, network infrastructure densification stands out as a promising approach. By deploying more radio access points, supplemented with storage and computational capabilities, we can not only increase network capacity, but also improve network energy efficiency, enable low-latency mobile applications, and provide access for massive numbers of mobile devices. Such an ultra-dense network (UDN) provides an ideal platform for developing disruptive proposals to advance wireless information technologies, including cloud radio access networks (CRANs), wireless edge caching, and mobile edge computing. These are achieved by leveraging innovative ideas from different areas, such as software-defined networking, network function virtualization, content-centric networking, and cloud and fog computing.
By enabling the capabilities of cloud computing and software-defined networking, UDNs can readily support CRAN as an effective network architecture that exploits the benefits of network densification via centralized signal processing and interference management [1, 2]. This is achieved by moving the baseband processing functionality to a cloud data center via high-capacity fronthaul links, supported by massively deployed low-cost remote radio heads (RRHs). Meanwhile, the Internet is shifting from the "connection-centric" mode to the "content-centric" mode to support high-volume content delivery [3]. By enabling content caching at radio access points, i.e., wireless edge caching, UDNs can assist the evolution of the Internet architecture and achieve more efficient content delivery for mobile users [4]. Another trend is the increasing computation intensity of mobile applications, which places a heavy burden on resource-constrained mobile devices. Mobile edge computing was recently proposed as a promising solution that offloads the computation tasks of mobile applications to servers at nearby access points. Compared to mobile cloud computing, it avoids excessive propagation delay in the backbone network and thus enables latency-critical applications. All of these systems are built upon the UDN platform, which enables integration of the storage, computing, control and networking functionalities at the ubiquitous access points. In particular, CRANs serve the purpose of providing higher data rates, while mobile edge caching and computing networks enable low-latency content delivery and mobile applications.
Table 1: Summary of the two optimization frameworks for UDNs.

Models:
- Structured sparse optimization, problem (1)
- Generalized low-rank optimization, problem (2)

Applications:
- For (1), large-scale network adaptation: 1) network power minimization; 2) user admission control; 3) active user detection.
- For (2), network optimization with side information: 1) topological interference management; 2) wireless distributed computing; 3) mobile edge caching.

Algorithms:
- For (1), convex optimization solver [5]: 1) O(1/k) convergence rate (k: number of iterations); 2) subspace projection per iteration; 3) parallel cone projection per iteration.
- For (2), Riemannian optimization solver [6]: 1) superlinear convergence rate with conjugate gradient; 2) quadratic convergence rate with trust region; 3) Riemannian gradient and Hessian computation per iteration.
However, all the emerging networking paradigms associated with UDNs bring formidable challenges to network optimization, signal processing and resource allocation, given the highly complex network topology, the massive amount of required side information, and the high computational requirements. Typical design problems are nonconvex in nature and of enormously large scale, i.e., with large numbers of constraints and optimization variables. For example, the uncertainty or estimation error in the available channel state information (CSI) yields nonconvex quality-of-service (QoS) constraints, while network performance metrics such as sum throughput and energy efficiency lead to nonconvex objective functions. Thus, effective and scalable design methodologies, with the capability of handling nonconvex constraints and objectives, will be needed to fully exploit the benefits of UDNs. The aim of this article is to present recent advances in sparse and low-rank techniques for optimizing dense wireless networks [7, 8, 9], with comprehensive coverage of modeling, algorithm design, and theoretical analysis. We identify two representative classes of design problems in UDNs: large-scale network adaptation and side-information-assisted network optimization.
The first class of design problems concerns efficient network adaptation in UDNs, including radio access point selection [7], backhaul data assignment, user admission control, user association [10], and active user detection [9]. Such large-scale network adaptation problems involve both discrete and continuous decision variables, which motivates us to enforce sparsity structures in the solutions. The success of structured sparse optimization for network adaptation comes from the key observation that such adaptation can be achieved by enforcing structured sparsity in the solution, which will be presented in Section 2 in detail. The second class of design problems concerns how to effectively utilize the available side information for network optimization, including topological interference management [11], wireless distributed computing [12], and mobile edge caching [4]. Network side information is critical for designing UDNs, and it can take various forms, such as network connectivity information, cached content placement at access points, and locally computed intermediate values in wireless distributed computing. In Section 3, we will present a general incomplete matrix framework to model various types of network side information, which leads to a unified network performance metric, via the rank of the modeling matrix, for optimizing UDNs.
Although structured sparse and low-rank techniques enjoy modeling flexibility, the sparsity and rank functions are nonconvex, which brings computational challenges [13, 14]. Furthermore, typical optimization problems in UDNs bear complicated structures, which make most existing algorithms and theoretical results inapplicable. To address these algorithmic challenges, we present various convexification procedures for both objectives and constraints throughout our discussion. Moreover, scalable convex optimization algorithms and nonconvex optimization techniques, such as Riemannian optimization, will be presented in Section 4. This article serves to provide network modeling methodologies and scalable computational tools for optimizing complex UDNs, as summarized in Table 1.
2 Structured Sparse Optimization for Large-Scale Network Adaptation
In UDNs, to effectively utilize densely deployed access points to support massive numbers of mobile devices, large-scale network adaptation will play a pivotal role. For various network adaptation problems in UDNs, the solution vector is expected to be sparse in a structured manner; e.g., radio access point selection results in a group sparsity structure. To illustrate the power of generalized sparse representations and scalable optimization paradigms, in this section we present representative examples of group sparse beamforming for green CRANs, and structured sparse optimization for active user detection and user admission control.
2.1 Generalized Structured Sparse Models
In this part, two motivating applications of generalized sparse models for large-scale network adaptation are presented.
2.1.1 Large-Scale Structured Optimization
We take green CRAN as an example to illustrate structured optimization for network adaptation. In CRANs, the network power consumption consists of the transmit power of the active RRHs and the power of the corresponding active fronthaul links. By exploiting spatial and temporal data traffic fluctuations, network adaptation via dynamically switching off RRHs and the associated fronthaul links can significantly reduce the network power consumption. To minimize the network power of a CRAN, we need to optimize over both discrete variables (i.e., the selection of RRHs and fronthaul links) and continuous variables (i.e., downlink beamforming coefficients), yielding a mixed combinatorial optimization problem, which is highly intractable. To support efficient algorithm design and analysis, a principled group sparse beamforming framework was proposed in [7] by enforcing a group sparsity structure in the solution vectors. This is achieved by a group sparsity representation of the discrete optimization variables for RRH selection, as shown in Fig. 1. Specifically, by regarding all the beamforming coefficients of one RRH as a group, switching off this RRH corresponds to setting all the associated beamforming coefficients in that group to zero simultaneously. We thus enforce the group sparsity structure in the aggregated beamforming vector to guide switching off the corresponding RRHs and minimize the network power consumption. Similar to group sparse beamforming for RRH selection, there is a corresponding node selection problem on the user side. With crowded mobile devices, it is critical to maximize the user capacity, i.e., the number of admitted users. This user admission problem is equivalent to minimizing the number of violated QoS constraints (one per user; the full set may be infeasible), which can further be modeled as minimizing the individual sparsity of an auxiliary vector whose entries indicate violations of the QoS constraints.
That is, for the relaxed constraint (which is always feasible thanks to the auxiliary variable), a zero auxiliary entry indicates that the original QoS constraint is feasible, while a nonzero entry indicates that it is infeasible. Therefore, by enforcing this structured sparsity in the solution, user admission can be effectively handled.
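As a toy numerical illustration of this group-sparsity idea (with hypothetical group sizes and coefficient values, not the actual beamforming formulation of [7]), the following sketch evaluates the mixed l1/l2 norm over per-RRH coefficient groups and reads off which RRHs remain active:

```python
import numpy as np

def mixed_l12_norm(v, groups):
    """Mixed l1/l2 norm: sum of the l2 norms of each coefficient group.
    Driving this norm down tends to zero out whole groups at once,
    which models switching off the corresponding RRH."""
    return sum(np.linalg.norm(v[g]) for g in groups)

def active_rrhs(v, groups, tol=1e-8):
    """Indices of RRHs whose beamforming group is not (numerically) zero."""
    return [i for i, g in enumerate(groups) if np.linalg.norm(v[g]) > tol]

# Toy example: 3 RRHs with 2 beamforming coefficients each (hypothetical).
v = np.array([0.5, -0.2,   0.0, 0.0,   0.3, 0.1])
groups = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]

print(active_rrhs(v, groups))   # the middle group is all-zero: RRH 1 is off
```

Minimizing this mixed norm (subject to the QoS constraints) is what induces the group-sparse solutions described above.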
2.1.2 High-Dimensional Structured Estimation
Figure 2: (a) Phase transitions in the noiseless scenario. (b) NMSE in the noisy scenario.
With limited radio resources, it is challenging to support massive device connectivity for applications such as IoT. Fortunately, only a fraction of the massive number of devices will be active at any given time, owing to the sporadic traffic of emerging applications (e.g., machine-type communications and IoT) [9]. Active user detection is thus a key problem for providing massive connectivity in UDNs, and it turns out to be a structured sparse estimation problem. Specifically, suppose we have N single-antenna mobile devices (K of which are active) and one M-antenna base station (BS). The received signal at the BS takes the form Y = S A H + Z, where A is the unknown N × N diagonal activity matrix with K nonzero diagonal entries whose positions are to be estimated, H is the unknown N × M channel matrix from all the devices to the BS, S is the known L × N pilot matrix with training length L, and Z is the additive noise. We thus need to simultaneously estimate the activity matrix A and the channel matrix H, which poses a great challenge. We observe that detecting the active users is equivalent to estimating the group sparsity structure of the combined matrix Θ = (A H)^T, which has a group sparsity structure in its columns induced by the structure of A. That is, when mobile device i is inactive, all the entries in the i-th column of matrix Θ become zero simultaneously. Due to the limited radio resources, the training length L will be much smaller than the number of devices N, and thus the estimation problem is ill-posed and yields a high-dimensional structured estimation problem.
Fortunately, the embedded low-dimensional structure (i.e., the structured sparsity) can be algorithmically exploited to ensure the success of high-dimensional structured estimation, as illustrated in Fig. 2 for the behaviors of the phase transition and the normalized mean square error (NMSE). A phase transition is a sharp change in the behavior of a computational problem as its parameters vary. Convex geometry and conic integral geometry provide principled ways to theoretically predict phase transitions precisely [15]. In particular, the phase transition phenomenon in Fig. 2 (a) reveals the fundamental limits of sparsity recovery in the best case, i.e., without noise. Specifically, such a study reveals that the required training length, or the number of measurements, depends on the number of active devices, and highly accurate user activity detection can be achieved with sufficient measurements. Fig. 2 (b) further demonstrates that the low-dimensional structure can be exploited to significantly reduce the training length for active user detection even in noisy scenarios.
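A common algorithmic route for this kind of joint activity and channel estimation is a group-LASSO formulation solved by proximal gradient iterations. The sketch below is a minimal illustration under an assumed noiseless multiple-measurement model with hypothetical dimensions (20 devices, 2 active, training length 10, 4 BS antennas); it is not the exact estimator analyzed in [9] or [15]:

```python
import numpy as np

def group_soft_threshold(X, tau):
    """Row-wise group soft-thresholding: the prox operator of the mixed
    l1/l2 norm over rows. Rows with small l2 norm are zeroed out, which
    marks the corresponding device as inactive."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * X

def detect_active_users(S, Y, lam=0.02, iters=300):
    """Proximal gradient (ISTA) for the group-LASSO problem
        min_X 0.5 * ||S X - Y||_F^2 + lam * sum_i ||row_i(X)||_2,
    a sketch under the model Y = S X + noise, where row i of X stacks
    the effective channel of device i (hypothetical setup)."""
    X = np.zeros((S.shape[1], Y.shape[1]))
    step = 1.0 / np.linalg.norm(S, 2) ** 2   # 1/L, L = Lipschitz constant
    for _ in range(iters):
        grad = S.T @ (S @ X - Y)
        X = group_soft_threshold(X - step * grad, step * lam)
    return X

# Toy instance: 20 devices, 2 active, 10 pilot symbols, 4 BS antennas.
rng = np.random.default_rng(0)
S = rng.standard_normal((10, 20)) / np.sqrt(10)
X_true = np.zeros((20, 4))
X_true[[3, 7]] = rng.standard_normal((2, 4))
Y = S @ X_true
X_hat = detect_active_users(S, Y)
print(np.nonzero(np.linalg.norm(X_hat, axis=1) > 0.1)[0])
```

Note that only 10 pilot symbols are used for 20 devices; the group sparsity structure is what makes this under-determined problem solvable, mirroring the phase transition discussion above.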
2.2 A Generalized Sparse Optimization Paradigm
We have demonstrated that effective network adaptation can be achieved either by inducing vector sparsity in a structured manner or by estimating the structured sparsity pattern. In this part, we provide a generalized sparse optimization framework to algorithmically exploit the low-dimensional structures in UDNs. This is achieved by optimizing a constrained composite combinatorial objective:
minimize_{v ∈ C}   F(supp(v)) + f(v)                    (1)
where supp(v) denotes the index set of nonzero coefficients of the vector v, F is a combinatorial positive-valued set function that controls the structured sparsity of v, f is a continuous convex function of v that represents the system performance, such as the transmit power consumption, and the constraint set C models system constraints, e.g., transmit power constraints and QoS constraints. The most natural convex surrogate for a nonconvex function is its convex envelope, i.e., its tightest convex lower bound. The main motivation for convexifying the function F is that the convexified optimization problems make it possible to use convex geometry theory [15] to reveal benign properties of the globally optimal solutions, which can then be computed with efficient algorithms. For example, the individual sparsity function, i.e., the ℓ0 norm of v, can be convexified to the ℓ1 norm. The group sparsity function can be convexified by the mixed ℓ1/ℓ2 norm. More general convex relaxation results can be derived based on the principles of convex analysis [15]. Note that it is critical to establish the optimality of various convex relaxation approaches in UDNs. For example, for the nonconvex active user detection problem in Section 2.1.2, the optimality condition can be established via the conic geometry approach in [15].
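To make the ℓ0-to-ℓ1 relaxation concrete, the sketch below solves the basis pursuit problem min ||x||_1 subject to Ax = b as a linear program via the classic split x = u - v with u, v ≥ 0. The instance is a small hypothetical one; the guaranteed properties checked here are only feasibility and that the ℓ1 objective does not exceed that of the true sparse vector:

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, b):
    """Basis-pursuit sketch: min ||x||_1 s.t. A x = b, rewritten as the
    linear program  min sum(u) + sum(v)  s.t.  A(u - v) = b, u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                       # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])                # A u - A v = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v

# Hypothetical under-determined system: 7 measurements, 12 unknowns.
rng = np.random.default_rng(1)
A = rng.standard_normal((7, 12))
x_true = np.zeros(12)
x_true[[2, 9]] = [1.5, -0.8]                 # 2-sparse ground truth
b = A @ x_true
x_hat = l1_min(A, b)
print(np.round(x_hat, 3))
```

With enough random measurements relative to the sparsity level, the ℓ1 solution typically coincides with the sparse ground truth, which is exactly the phase transition phenomenon discussed in Section 2.1.2.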
The constraint set C serves the purpose of modeling various QoS constraints, including those for unicast beamforming, multicast beamforming, and stochastic beamforming, to name a few. For example, the nonconvex QoS constraints for unicast beamforming can be equivalently transformed into convex second-order cone constraints [7]. Furthermore, physical layer integration techniques can effectively improve the network performance by providing multicast services, which, however, yield nonconvex quadratic QoS constraints. The semidefinite relaxation (SDR) technique turns out to be effective for convexifying the nonconvex quadratic constraints by lifting the original vector problem to a higher-dimensional matrix space, followed by dropping the rank-one constraint. For stochastic beamforming with probabilistic QoS constraints due to CSI uncertainty, the probabilistic QoS constraints can be convexified based on the principles of the majorization-minimization procedure, yielding sequential convex approximations. In summary, the general formulation in (1) enables efficient algorithm design and analysis for network adaptation in UDNs.
3 Generalized Low-Rank Optimization with Network Side Information
UDNs are highly complex to optimize, and it is critical to exploit the available network side information. For example, network connectivity information, cached content at the access points, and locally computed intermediate values all serve as exploitable side information for efficiently designing coding and decoding strategies in UDNs. In this section, we provide a generalized low-rank matrix modeling framework to exploit network side information, which helps to optimize efficiently across the communication, computation, and storage resources. To demonstrate the power of this framework, we present topological interference alignment as a concrete example and then extend it to cache-aided interference channels and wireless distributed computing systems. A general low-rank optimization problem is then formulated by incorporating the network side information.
3.1 Network Side Information Modeling via Incomplete Matrix
Figure 3: (a) TIM problem. (b) Cache-aided interference channel. (c) Side information modeling matrix.
To exploit the full performance gains of network densification, recent years have seen progress on interference management under various scenarios, depending on the amount of shared CSI and user messages. Typical interference management strategies include interference alignment, interference coordination, and coordinated multipoint transmission and reception, to name just a few. However, the significant overhead of acquiring global CSI motivates numerous research efforts on CSI overhead reduction strategies, e.g., delayed CSI, alternating CSI and mixed CSI. One of the most promising strategies is topological interference management (TIM) [11], for which only network connectivity information is required. This is based on the fact that most wireless channel propagation links are weak enough to be ignored, thanks to path loss and shadowing. However, the TIM problem turns out to be equivalent to the linear index coding problem [11], which is in general highly intractable and for which only partial results exist for special cases. Recently, a new proposal was made for the TIM problem, which can greatly assist the algorithm design. The main innovation is to model the network connectivity pattern in UDNs as an incomplete matrix. The TIM problem can then be formulated as a generalized matrix completion problem (B. Hassibi, "Topological interference alignment in wireless networks," Smart Antennas Workshop, Aug. 2014), which helps to develop effective linear precoding and decoding strategies. Fig. 3 demonstrates the modeling framework, with Fig. 3 (a) showing a 5-user interference channel as an example and Fig. 3 (c) showing the corresponding modeling matrix. The task of TIM is to complete the side information modeling matrix, which then determines the precoder and decoder [8].
This modeling framework is very powerful and can be adapted to other design problems in UDNs. By equipping the densely deployed radio access points and mobile devices with local cache storage, caching content at the edge of the network provides a promising way to improve throughput and reduce latency, as well as to reduce the load of the core network and radio access networks [4]. In general, content-centric communications consist of two phases, a content placement phase followed by a content delivery phase. However, due to the coupled wireline and wireless communications in cache-aided UDNs, unique challenges arise in the edge caching problem. Fortunately, the incomplete matrix modeling framework can capture the information of the content cached at different nodes. Fig. 3 (b) shows an example of a cache-aided 5-user interference channel, where the side information is represented in the side information modeling matrix of Fig. 3 (c). Similarly, this modeling framework can also be extended to wireless distributed computing networks [12]. In prevalent distributed computing frameworks like MapReduce and Spark, intermediate values computed in the "map" phase from the locally available dataset can be regarded as side information for the "reduce" phase to compute the output value for a given input. This can help reduce the communication overhead of the "shuffle" phase, which obtains the intermediate values that are not computed locally in the "map" phase. The incomplete matrix modeling approach thus helps to formulate the design problems for wireless caching and distributed computing systems.
3.2 A Generalized Low-Rank Optimization Paradigm
We have presented an effective and general framework to model various types of network side information in UDNs. Next we present a low-rank optimization formulation to exploit the available network side information. The side information modeling matrix X shown in Fig. 3 (c) helps cancel interference over multiple channel uses, yielding an interference-free channel and thus a first-order data-rate characterization in terms of degrees-of-freedom (DoF). Observe that the rank of the side information modeling matrix X, denoted rank(X), equals the number of channel uses, whose inverse is the achievable DoF. To maximize the achievable DoF, we thus can minimize the rank of the side information modeling matrix, yielding the following generalized low-rank optimization problem:
minimize   rank(X)   subject to   X ∈ 𝒳                    (2)
where the constraint set 𝒳 encodes the network side information. Low-rank optimization has proved to be a key design tool in machine learning, high-dimensional statistics, signal processing and computational mathematics [14]. The rank function is nonconvex and thus computationally difficult to optimize, but convexifying it leads to efficient algorithms. For example, the nuclear norm (i.e., the sum of the singular values of a matrix) provides a convex surrogate of the rank function, analogous to the ℓ1 norm relaxation of the ℓ0 norm (cardinality) of a vector.
Given the special structure of the side information modeling matrix in UDNs, most existing algorithmic and theoretical results for low-rank optimization are inapplicable. The recent work [8] contributed a novel nonconvex paradigm for solving the generalized low-rank optimization problem (2) by optimizing over the nonconvex rank constraints directly via Riemannian optimization and matrix factorization. Fig. 4 illustrates the phase transition behavior of the generalized low-rank optimization in topological interference management, which characterizes the relationship between the achievable DoF and the average number of connected interference links. For a given rank, representing the achievable DoF, the success probability of recovering the incomplete side information modeling matrix decreases as more interference links are connected. This provides guidelines for network deployment in dense wireless networks, content placement in cache-aided interference channels, and dataset placement in wireless distributed computing systems.
4 Optimization Algorithms and Analysis
Figure 5: (a) Iterates of the Riemannian optimization algorithm. (b) Convergence rates of various algorithms.
We have seen quite a few algorithmic challenges in the sparse and low-rank modeling frameworks for UDNs. In this section, we present some new trends in optimization algorithms for solving the generalized sparse and low-rank optimization problems in the forms of (1) and (2), respectively. Broadly, numerical optimization algorithms can be classified into first-order and second-order methods, depending on whether they use only gradient information or also second derivatives. The convergence rates of second-order methods are usually faster, with the caveat that each iteration is more expensive. In general, there is a trade-off between the per-iteration computation cost and the total number of iterations, though first-order methods often scale better to large-scale high-dimensional statistics problems [13]. While optimization problems in communication systems are typically solved in the convex paradigm with second-order methods, thanks to the ease of use of the CVX toolbox, we have observed the necessity of first-order methods and the importance of the nonconvex paradigm, as will be elaborated in the following parts.
4.1 Convex Optimization Algorithms
We have presented a variety of methodologies to convexify the nonconvex objective functions and nonconvex constraints of the generalized sparse optimization problem (1). Newton-iteration-based interior-point methods, supported by many user-friendly software packages (e.g., CVX), provide a general way to solve constrained convex optimization problems. However, the cubic computational complexity of each Newton step limits their ability to scale to large network sizes in UDNs. This motivates enormous research efforts to improve the computational efficiency of convex programs, including first-order methods, randomization, and parallel and distributed computing.
Parallel and distributed optimization provides a principled way to exploit distributed computing environments to increase scalability while reducing communication costs. To solve general large-scale convex programs, a principled two-stage framework has recently been proposed in [5], with the capability of providing certificates of infeasibility and enabling parallel and scalable computing. This is achieved, in the first stage, by the matrix stuffing technique, which rapidly transforms the original convex program into the standard conic optimization form by updating the associated values in the pre-stored structure of the standard conic program. In the second stage, an ADMM-based algorithm is adopted to solve the standard large-scale conic optimization problem by exploiting the problem structure [5], enabling parallel cone projections at each iteration.
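The per-iteration workhorse of such an ADMM-based conic solver is the Euclidean projection onto each cone, which for the second-order cone has a simple closed form. Below is a minimal standalone sketch of this projection (an illustration of the building block, not the solver of [5]):

```python
import numpy as np

def project_soc(t, x):
    """Euclidean projection of the point (t, x) onto the second-order
    cone {(t, x) : ||x||_2 <= t}. In an ADMM-based conic solver this is
    applied to each cone independently, hence in parallel."""
    nx = np.linalg.norm(x)
    if nx <= t:                       # already inside the cone
        return t, x.copy()
    if nx <= -t:                      # inside the polar cone: project to 0
        return 0.0, np.zeros_like(x)
    alpha = (t + nx) / 2.0            # closest point on the cone boundary
    return alpha, (alpha / nx) * x

t_new, x_new = project_soc(0.0, np.array([3.0, 4.0]))
print(t_new, x_new)
```

Because each cone is projected independently, a problem with thousands of QoS constraints decomposes into thousands of such cheap closed-form projections per iteration.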
Other lines of work have focused on the use of first-order methods and randomization to solve large convex programs. In particular, for sparse convex optimization problems, Frank-Wolfe-type algorithms (a.k.a. conditional gradient methods) have recently gained enormous interest, fueled by their excellent scalability with projection-free operations that exploit well-structured sparsity constraints. The coordinate descent method has gained popularity for its scalability by choosing a single coordinate (or a block of coordinates) to update within each iteration, thereby reducing the per-iteration computing cost. Approximation techniques, including randomization and sketching methods, further provide algorithmic opportunities to enable scalability, in particular for first-order methods, by speeding up numerical linear algebra or reducing problem dimensions. In particular, the stochastic gradient method provides a generic way to stochastically approximate the gradient descent method for solving large-scale machine learning problems. All of the above algorithmic and theoretical results may be leveraged to solve large-scale convex optimization problems in UDNs.
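A minimal Frank-Wolfe sketch for least squares over the ℓ1 ball illustrates the projection-free property: the linear minimization oracle over the ℓ1 ball returns a signed coordinate vertex, so iterates stay sparse and no projection is ever computed. The problem sizes here are hypothetical, and the standard 2/(k+2) step size is used:

```python
import numpy as np

def frank_wolfe_l1(A, b, radius, iters=200):
    """Frank-Wolfe sketch for min 0.5*||Ax - b||^2 over the l1 ball of
    the given radius. The linear minimization oracle (LMO) over the l1
    ball is trivial: pick the coordinate of largest |gradient| and move
    toward the corresponding signed vertex radius * e_i."""
    x = np.zeros(A.shape[1])
    for k in range(iters):
        grad = A.T @ (A @ x - b)
        i = np.argmax(np.abs(grad))                  # LMO over the l1 ball
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(grad[i])
        x += (2.0 / (k + 2.0)) * (s - x)             # standard FW step size
    return x

# Hypothetical instance whose solution is a vertex of the l1 ball.
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 8))
x_true = np.zeros(8)
x_true[1] = 1.0
b = A @ x_true
x_hat = frank_wolfe_l1(A, b, radius=1.0)
print(np.round(x_hat, 2))
```

Each iterate is a convex combination of at most k vertices, so sparsity is maintained throughout, which is exactly the scalability advantage noted above.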
4.2 Nonconvex Optimization Algorithms
Recently, a new line of work has attracted significant attention, focusing on solving nonconvex optimization problems directly via efficient nonconvex procedures, sometimes with optimality guarantees. We have seen recent progress on nonconvex procedures based on various algorithms (e.g., projected/stochastic/conditional gradient methods and Riemannian manifold optimization algorithms) for a class of high-dimensional statistical and machine learning problems, including low-rank matrix completion, phase retrieval and blind deconvolution, to name just a few. In particular, optimization that directly exploits a problem's manifold structure is becoming a general and powerful approach to solving various nonconvex optimization problems. Structured constraints such as rank and orthogonality constraints appear in many machine learning applications, including sensor network localization, dimensionality reduction, low-rank matrix recovery, phase synchronization, and community detection.
From a high-level standpoint, Riemannian optimization extends standard unconstrained optimization, which searches in a Euclidean space, to optimization over a Riemannian manifold, by generalizing concepts such as the gradient and the Hessian [6]. A graphic representation of Riemannian optimization algorithms is illustrated in Fig. 5. Specifically, the Euclidean gradient is projected onto the tangent space of the manifold at the current iterate to define a search direction (which can be computed based on the principles of the conjugate gradient method or the trust-region method), followed by a retraction operator that maps the update, scaled by a step size, back onto the manifold to define the new iterate. In particular, we exploit the manifold geometry of fixed-rank matrices to solve the low-rank optimization problem (2) efficiently. Fig. 5 (b) demonstrates the effectiveness of Riemannian optimization based methods: Riemannian optimization enjoys fast convergence, e.g., compared with an existing approach based on alternating minimization.
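The iterate-then-retract pattern of Fig. 5 can be sketched for fixed-rank matrix completion as follows, using a truncated-SVD retraction and the plain Euclidean gradient in place of proper tangent-space machinery. This is an illustrative simplification of what toolboxes such as Manopt [6] implement, on a hypothetical rank-1 toy instance:

```python
import numpy as np

def retract(X, r):
    """Retraction onto the fixed-rank manifold: a truncated SVD keeps the
    top-r singular triplets (a standard, simple retraction choice)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def fixed_rank_gd(M, mask, r, step=1.0, iters=300):
    """Minimal sketch of gradient descent on the rank-r manifold for
    matrix completion: step along the Euclidean gradient of
    0.5*||mask*(X - M)||_F^2, then retract back to rank r. Real solvers
    use proper tangent-space projections, conjugate gradients, or trust
    regions; this only illustrates the step-and-retract pattern."""
    X = retract(mask * M, r)
    for _ in range(iters):
        G = mask * (X - M)            # Euclidean gradient of the fit term
        X = retract(X - step * G, r)  # retract the update onto the manifold
    return X

# Hypothetical rank-1 toy matrix; entry (0, 2) is unobserved.
M = np.outer([1.0, 2.0, 3.0], [2.0, 1.0, 1.0])
mask = np.ones_like(M)
mask[0, 2] = 0.0
X = fixed_rank_gd(M, mask, r=1)
print(np.round(X, 2))
```

Working directly with the rank constraint keeps the iterates low-dimensional (only r singular triplets are ever stored in a careful implementation), which is the source of the scalability shown in Fig. 5 (b).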
5 Conclusions and Future Directions
This article presented generalized sparse and low-rank optimization techniques for optimizing across communication, computation and storage resources in UDNs by exploiting network structures and side information. Illustrated by important application examples, various structured sparse modeling methods were introduced, and an incomplete matrix representation was presented to model different types of network side information. Methodologies for designing scalable algorithms were discussed, including both convex and nonconvex methods. The presented results and methodologies demonstrate the effectiveness of structured optimization techniques for designing UDNs.
Despite the encouraging progress, a variety of interesting open questions remain. To date, generalized sparse and low-rank optimization techniques have mainly been applied to improve the network energy efficiency and spectral efficiency of UDNs. However, emerging mobile applications have strong demands for user privacy and ultra-low-latency communications, which call for more general mathematical models and formulations. Other interesting problems concern the theoretical analysis of the generalized sparse and low-rank optimization models and algorithms. Although we have seen significant progress in the theoretical understanding of sparse and low-rank optimization problems via convex relaxation approaches [15] and nonconvex procedures, it is challenging to apply existing results to the generalized sparse and low-rank optimization problems (1) and (2) due to their complicated structures. Finally, there are a variety of interesting research directions associated with improving the computational scaling behavior of various algorithms via recent proposals, e.g., randomized algorithms based on sketching.
References
 [1] T. Q. S. Quek, M. Peng, O. Simeone, and W. Yu, Cloud Radio Access Networks: Principles, Technologies, and Applications. Cambridge University Press, 2017.
 [2] H. Zhang, Y. Dong, J. Cheng, M. J. Hossain, and V. C. M. Leung, “Fronthauling for 5G LTE-U ultra dense cloud small cell networks,” IEEE Wireless Commun. Mag., vol. 23, pp. 48–53, Dec. 2016.
 [3] G. Xylomenos, C. N. Ververidis, V. A. Siris, N. Fotiou, C. Tsilopoulos, X. Vasilakos, K. V. Katsaros, and G. C. Polyzos, “A survey of information-centric networking research,” IEEE Commun. Surveys Tuts., vol. 16, pp. 1024–1049, Second Quarter 2014.
 [4] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role of proactive caching in 5G wireless networks,” IEEE Commun. Mag., vol. 52, pp. 82–89, Aug. 2014.
 [5] Y. Shi, J. Zhang, B. O’Donoghue, and K. Letaief, “Large-scale convex optimization for dense wireless cooperative networks,” IEEE Trans. Signal Process., vol. 63, pp. 4729–4743, Sept. 2015.
 [6] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre, “Manopt, a Matlab toolbox for optimization on manifolds,” J. Mach. Learn. Res., vol. 15, pp. 1455–1459, 2014.
 [7] Y. Shi, J. Zhang, and K. B. Letaief, “Group sparse beamforming for green Cloud-RAN,” IEEE Trans. Wireless Commun., vol. 13, pp. 2809–2823, May 2014.
 [8] Y. Shi, J. Zhang, and K. B. Letaief, “Low-rank matrix completion for topological interference management by Riemannian pursuit,” IEEE Trans. Wireless Commun., vol. 15, pp. 4703–4717, Jul. 2016.
 [9] G. Wunder, H. Boche, T. Strohmer, and P. Jung, “Sparse signal processing concepts for efficient 5G system design,” IEEE Access, vol. 3, pp. 195–208, 2015.
 [10] H. Zhang, S. Huang, C. Jiang, K. Long, V. C. M. Leung, and H. V. Poor, “Energy efficient user association and power allocation in millimeter-wave-based ultra dense networks with energy harvesting base stations,” IEEE J. Sel. Areas Commun., vol. 35, pp. 1936–1947, Sept. 2017.
 [11] S. Jafar, “Topological interference management through index coding,” IEEE Trans. Inf. Theory, vol. 60, pp. 529–568, Jan. 2014.
 [12] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “Coding for distributed fog computing,” IEEE Commun. Mag., vol. 55, pp. 34–40, Apr. 2017.
 [13] M. J. Wainwright, “Structured regularizers for high-dimensional problems: Statistical and computational issues,” Annu. Rev. Stat. Appl., vol. 1, pp. 233–253, 2014.
 [14] M. A. Davenport and J. Romberg, “An overview of lowrank matrix recovery from incomplete observations,” IEEE J. Sel. Topics Signal Process., vol. 10, pp. 608–622, Jun. 2016.
 [15] D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp, “Living on the edge: phase transitions in convex programs with random data,” Inf. Inference, vol. 3, pp. 224–294, Jun. 2014.
Biographies
 Yuanming Shi
[S’13, M’15] (shiym@shanghaitech.edu.cn) received the B.S. degree from Tsinghua University in 2011, and the Ph.D. degree from The Hong Kong University of Science and Technology (HKUST) in 2015. He is currently an Assistant Professor at ShanghaiTech University. He received the 2016 IEEE Marconi Prize Paper Award and the 2016 Young Author Best Paper Award from the IEEE Signal Processing Society. His research interests include dense wireless networks, intelligent IoT, mobile AI, machine learning, statistics, and optimization.
 Jun Zhang
[M’10, SM’15] (eejzhang@ust.hk) received the Ph.D. degree from the University of Texas at Austin. He is currently a Research Assistant Professor at The Hong Kong University of Science and Technology. He received the 2016 Marconi Prize Paper Award in Wireless Communications and the 2016 IEEE ComSoc Asia-Pacific Best Young Researcher Award. His research interests include dense wireless cooperative networks, mobile edge caching and computing, cloud computing, and big data analytics systems.
 Wei Chen
[S’05, M’07, SM’13] (wchen@tsinghua.edu.cn) received his B.S. and Ph.D. degrees (Hons.) from Tsinghua University in 2002 and 2007, respectively. Since 2007, he has been on the faculty at Tsinghua University, where he is a tenured full Professor and a member of the University Council. He is a member of the National 10,000-Talent Program and a Cheung Kong Young Scholar. He received the IEEE Marconi Prize Paper Award and the IEEE ComSoc Asia-Pacific Board Best Young Researcher Award.
 Khaled B. Letaief
[S’85, M’86, SM’97, F’03] (eekhaled@ust.hk) received the Ph.D. degree from Purdue University, USA. From 1990 to 1993, he was a faculty member at the University of Melbourne, Australia. He has been with HKUST since 1993, where he served as Dean of Engineering. In September 2015, he joined HBKU in Qatar as Provost. He is a Fellow of the IEEE, an ISI Highly Cited Researcher, and the recipient of many distinguished awards. He has served in many IEEE leadership positions, including ComSoc Vice-President for Technical Activities and Vice-President for Conferences.