# Joint User Selection and Energy Minimization for Ultra-Dense Multi-channel C-RAN with Incomplete CSI

## Abstract

This paper provides a unified framework to deal with the challenges arising in dense cloud radio access networks (C-RAN), which include huge power consumption, limited fronthaul capacity, heavy computational complexity, and the unavailability of full channel state information (CSI). Specifically, we aim to jointly optimize the remote radio head (RRH) selection, user equipment (UE)-RRH associations and beam-vectors to minimize the total network power consumption (NPC) for dense multi-channel downlink C-RAN with incomplete CSI, subject to per-RRH power constraints, each UE’s total rate requirement, and fronthaul link capacity constraints. This optimization problem is NP-hard. In addition, due to the incomplete CSI, the exact expression of the UEs’ rates is intractable. We first conservatively replace the UEs’ rate expression with its lower-bound. Then, based on the successive convex approximation (SCA) technique and the relationship between the data rate and the mean square error (MSE), we propose a single-layer iterative algorithm to solve the NPC minimization problem with a convergence guarantee. In each iteration of the algorithm, the Lagrange dual decomposition method is used to derive the structure of the optimal beam-vectors, which facilitates parallel computation at the baseband unit (BBU) pool. Furthermore, a bisection UE selection algorithm is proposed to guarantee the feasibility of the problem. Simulation results show the benefits of the proposed algorithms and that a limited amount of CSI is sufficient to achieve performance close to that obtained with perfect CSI.

## 1 Introduction

The fifth-generation (5G) wireless system is expected to offer a thousand times the throughput [1] of the current fourth-generation (4G) system [2] and provide ubiquitous service access for a large number of user equipments (UEs) in hot spots such as shopping malls and stadia. To achieve this goal, the heterogeneous and small cell network (HetSNet) is regarded as one of the most promising techniques, as it exploits spatial degrees of freedom by deploying more and more access points (APs) [5]. However, since all APs reuse the same frequency, the interference among the APs is a limiting factor [5] that should be carefully managed. The dense cloud radio access network (C-RAN) was proposed in [6] as one promising architecture to overcome this issue. In dense C-RAN, all the baseband processing is performed at the BBU pool by exploiting recent developments in cloud computing techniques [7], while the RRHs are only responsible for simple radio transmission or reception [8]. Due to their simple functionality, RRHs can be densely deployed in the network with low hardware cost. Due to the centralized architecture of dense C-RAN, the multi-UE interference can be efficiently handled through joint signal processing techniques such as coordinated multi-point (CoMP), leading to significant performance gains. Although C-RAN has been introduced in 4G, it is usually deployed in a large geographical area by connecting macrocell base stations to the BBU pool through fronthaul links. This conventional C-RAN incurs large delays on the fronthaul links due to the long transmission distance between the RRHs and the BBU pool [10], which will violate the stringent latency requirement in 5G [1], i.e., a round-trip latency within 1 ms. In contrast, the dense C-RAN studied in this paper aims to cover hot spots with a much smaller geographical area. Hence, delays can be significantly reduced.

However, there are many technical and deployment issues associated with dense C-RAN. First, the dense deployment of RRHs will require high power consumption if all RRHs are activated even when the network traffic load is low. In addition, if each RRH serves all UEs, significant power will be used on the fronthaul links. As a result, how to activate the RRHs and select the RRHs for serving each UE to minimize the total network power consumption (NPC) is a critical issue. Second, a dense C-RAN needs a large number of fronthaul links, which requires them to be low cost. There may also be a need to use millimeter wave (mmWave) technology for flexible and low-cost deployment. These cost considerations make a capacity constraint on the fronthaul likely. Third, in dense C-RAN, the BBU pool will support a large number of RRHs, so the number of optimization variables for the beam-vectors will become very large, which incurs a computational complexity that may become unaffordable. Finally, dense C-RAN requires more CSI to facilitate the CoMP transmission design, which causes a heavy training overhead. The amount of training overhead increases with the number of RRHs and UEs, and may counteract the cooperative gains provided by CoMP transmission [11]. The most promising way to deal with this issue is to restrict the number of RRHs to which each UE should measure CSI. The remaining CSI values can be regarded as zeros, or only their long-term channel statistics, such as path loss and shadowing, are considered. How to design transmission strategies for this incomplete CSI case becomes an imperative task.

Most of the current work only deals with parts of the above challenges. For example, [12] considered the joint RRH selection and beamforming design to minimize the total NPC subject to UEs’ quality of service (QoS) targets and per-RRH power constraints. These papers ignored the capacity constraints on the fronthaul links and assumed that the fronthaul capacity is unlimited. To address the fronthaul capacity constraints, [16] investigated the problem of minimizing the number of data transfers on the aggregated fronthaul links with UEs’ QoS constraints and power constraints on each RRH. However, [16] did not explicitly impose the fronthaul capacity constraints in the optimization problem. Recently, several papers have addressed the case when the fronthaul capacity constraints are explicitly imposed [17]. However, these papers did not consider the case when the optimization problem is infeasible, in which some UEs have to be removed to make the optimization problem feasible again. The UE admission control and total NPC minimization were jointly optimized in [20], where a single-stage optimization problem was formulated by introducing a weighting factor in the admission control part. Recently, [21] extended the work in [20] to multi-channel heterogeneous C-RAN where the C-RAN is overlaid by a macro-cell. However, for the admission control designs considered in [20] and [21], one has to carefully choose the weighting factor associated with the admission control part to ensure that the selected UEs can satisfy the QoS constraints, which is not easy.

However, the algorithms proposed in [12] were based on the assumption of full CSI at the BBU pool, which is not practical as explained above. Unfortunately, the algorithms designed for perfect CSI cannot be directly extended to the case of incomplete CSI. To the best of our knowledge, only a few papers have considered the incomplete CSI case [22]. [22] proposed a CSI reduction scheme named compressive CSI acquisition, which obtains the instantaneous CSI for a subset of channel links and the large-scale fading gains of the others. Based on the incomplete CSI, [22] solved a transmit power minimization problem while guaranteeing UEs’ QoS requirements by using a stochastic coordinated beamforming technique. However, the method needs to solve a high-dimension semi-definite programming (SDP) problem for each sample, and the number of samples increases with the size of the network, which incurs an unacceptable complexity for dense C-RAN. [23] focused on the beamforming algorithm to maximize the sum-rate for arbitrary UE-centric clustering C-RAN. The “C-cluster method” was introduced in [23] to reduce the channel estimation overhead, where only a subset of the CSI for each UE is measured and the other, unavailable CSI is regarded as zero. Recently, [24] proposed a conservative precoder design with the objective of maximizing the weighted sum-rate of UEs for an arbitrary UE-centric clustering method with incomplete CSI, where the long-term channel statistics were incorporated into the optimization. Finally, [25] designed a clustering scheme maximizing the average net throughput of the dense C-RAN by taking the training overhead into account. The scheme is based on a hybrid CoMP transmission mode and operates over a long time duration, so it may be performed at the medium access control (MAC) layer since only large-scale CSI is required. However, neither the beam directions nor the power allocations were optimized in [25]. None of the papers [22] considered the fronthaul capacity constraints, and they mainly focused on sum-rate maximization problems without incorporating QoS requirements.

The aim of this paper is to provide a complete framework to jointly tackle the above-mentioned challenges. Specifically, we investigate the joint optimization of RRH selection, RRH-UE associations and transmit beamforming to minimize the NPC for downlink multi-channel C-RAN with incomplete CSI, subject to fronthaul link capacity constraints, all UEs’ rate requirements and per-RRH power constraints. The NPC is modeled as the sum of the RRH power consumption and the fronthaul link power consumption. The low-power sleep mode is considered in the RRH power consumption model, and the fronthaul link power consumption is modeled as a linear function of fronthaul traffic. To reduce the computational complexity, each UE is restricted to be served by its nearby RRHs since only nearby RRHs contribute significantly to the UE’s signals. Moreover, to reduce the channel measurement overhead, we introduce, for each UE, the subset of RRHs to which the UE should estimate the CSI, while only the large-scale fading (such as path loss and shadowing) is assumed to be known for the remaining links. In general, the candidate set of RRHs for serving UEs and the CSI estimation set of RRHs for each UE are determined based on the UEs’ locations, which may be the task of the upper layer and is beyond the scope of this paper. The NPC minimization problem is an NP-hard mixed-integer non-linear programming (MINLP) problem due to the indicator functions introduced in both the objective function and the fronthaul capacity constraints, and its optimal solution is intractable. In addition, due to the sum rate constraints and incomplete CSI, the QoS constraints are non-convex and difficult to handle. Furthermore, due to the conflicting constraints, the NPC minimization problem may be infeasible, so the initial solution should be carefully selected. The contributions of this paper can be summarized as follows:

1. Due to the incomplete CSI, it is intractable to derive the exact closed-form expression of the data rate for each UE, and thus stringent QoS requirements for each UE are difficult to guarantee. To alleviate this difficulty, we conservatively replace the data rate of each UE with its lower-bound expression derived by using Jensen’s inequality.

2. To resolve the feasibility issue, we provide a low-complexity UE selection algorithm based on the bisection search method to maximize the number of admitted UEs that can achieve their QoS targets; its complexity increases only logarithmically with the number of UEs. Simulation results show that this algorithm incurs only a marginal performance loss with respect to (w.r.t.) the exhaustive UE search algorithm, whose computational complexity is exponential in the number of admitted UEs.

3. Given the feasible set of UEs from the UE selection algorithm, we provide a low-complexity single-layer iterative algorithm (i.e., Algorithm 1) to solve the NPC minimization problem. Specifically, the non-smooth indicator function is approximated as a non-convex function and the successive convex approximation (SCA) technique [26] is adopted to approximate the non-convex function as a series of convex functions. To deal with the non-convex QoS constraints, we adapt the technique in [27], which was developed for the rate maximization problem, to the NPC minimization problem with rate expressions in the constraints and incomplete CSI. The convergence of the iterative algorithm is rigorously proved.

4. In each iteration of Algorithm 1, there is a subproblem in which the beam-vectors are optimized. We derive the structure of the optimal beam-vectors by employing the Lagrange dual decomposition method. Each beam-vector can then be obtained in parallel for each sub-channel (SC), which facilitates the application of cloud computing techniques in the BBU pool.

This paper is organized as follows. Section 2 presents the system model, and Section 3 formulates the UE selection problem and the NPC minimization problem along with the complexity analysis. The single-layer iterative algorithm to solve the NPC minimization problem, given the set of admitted UEs, is presented in Section 4. Then, in Section 5, the low-complexity UE selection algorithm is provided. Simulation results are presented in Section 6 to evaluate the performance of the proposed algorithms. Finally, conclusions are drawn in Section 7.

Notations: For a set , denotes the cardinality of , while for a complex number , denotes the magnitude of . denotes a vector with all elements equal to ones. ‘s.t.’ is short for ‘subject to’. means the expectation of over . The complex Gaussian distribution is denoted as . We use to represent the complex set. The lower-case bold letters denote vectors and upper-case bold letters denote matrices. denotes the block diagonalization operation.

## 2 System Model

### 2.1 System model

Consider a downlink ultra-dense C-RAN, as shown in Figure 1, consisting of RRHs and UEs, where each RRH is equipped with transmit antennas, and each UE has a single antenna. Denote the set of RRHs and UEs as and , respectively. Each RRH is connected to the BBU pool through wireless (e.g. mmWave communication) fronthaul links. The fronthaul links are represented by dark solid arrows in Figure 1. The BBU pool is assumed to have all UEs’ data and distributes each UE’s data to a carefully selected set of RRHs through the fronthaul links. It is assumed that all the RRHs send their received data using the Orthogonal Frequency Division Multiple Access (OFDMA) technique and then cooperatively transmit to the UEs.

Denote as the subset of UEs that are admitted in the C-RAN. To reduce the computational complexity of the large network, it is assumed that each UE can only be served by its nearby RRHs since only nearby RRHs contribute significantly to the UE’s signal quality due to the severe path loss. Denote and as the candidate set of RRHs that potentially serve UE and the set of UEs that can be potentially served by RRH , respectively. The transmission links from the RRHs in to UE are called the candidate serving links, which are represented in red solid arrows in Figure 1. In this paper, it is assumed that and are predetermined by some well-known user-centric cluster methods [28] determined by the MAC layer. Please refer to [30] for a survey on user-centric cluster methods. Note that since no restrictions are placed on , they can overlap with each other, i.e., there may exist two different UEs and that , for . Moreover, the other-cluster interference due to overlapping coverage can be effectively handled under this user-centric cluster method. For example, UE 4 and UE 5 have one common serving RRH 8. Hence, RRH 8 will transmit useful signals to both UE 4 and UE 5, rather than only interference signals. In addition, the BBU pool has the CSI knowledge from RRH 3 to UE 4. Thus, the interference from RRH 3 to UE 4 will be carefully controlled when RRH 3 is serving UE 5. In contrast to the non-cooperative optimization where each cluster selfishly optimizes its own performance without considering its impact on the other clusters, in dense C-RAN all the signal processing operation is performed at the BBU pool, where the interference among different clusters can be centrally mitigated by resorting to the powerful cloud computing tool.

Denote the set of available sub-channels (SCs) as , where is the total number of SCs. To maximize the spectral efficiency, it is assumed that universal frequency reuse is adopted and the multiuser interference can be efficiently handled by the beamforming technique. Denoting as the beam-vector at RRH for UE on SC , the transmitted signal of RRH on SC is

where is the data symbol for UE on SC . Without loss of generality, it is assumed that and for . The baseband received signal at UE on SC is given by

where is the channel vector from RRH to UE on SC , and is the additive complex white Gaussian noise following the distribution of . The channel vector can be written as , where denotes the large-scale channel gain that includes the path loss and shadowing, and denotes the small-scale fading vector, where all elements are independent of each other and each one has zero mean and unit variance.

To reduce the decoding complexity at the receivers, we do not consider joint decoding of the interfering signals, and the multiuser interference is simply treated as noise at the receivers. In addition, coherent joint transmission is assumed as in most existing papers [12]. Then, the SINR at UE on SC can be obtained from (Equation 2) as

where denotes the collection of all beam-vectors.
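As a minimal sketch of how such an SINR can be evaluated under coherent joint transmission, the snippet below computes the ratio of desired-signal power to interference-plus-noise power on a single SC for a toy network. The function name `sinr`, the dictionary layout keyed by (RRH, UE) pairs, and all numerical values are assumptions made for this illustration; they are not notation from the paper.

```python
from typing import Dict, List, Tuple

Vec = List[complex]
Link = Tuple[int, int]  # (RRH index, UE index), an assumed indexing


def inner(h: Vec, w: Vec) -> complex:
    """Hermitian inner product h^H w."""
    return sum(a.conjugate() * b for a, b in zip(h, w))


def sinr(k: int, channels: Dict[Link, Vec], beams: Dict[Link, Vec],
         noise_power: float) -> float:
    """SINR of UE k on one SC, with multiuser interference treated as noise."""
    ues = {ue for (_, ue) in beams}

    def received(l: int) -> complex:
        # Coherent sum, seen at UE k, over every RRH that transmits UE l's data.
        return sum(inner(channels[(i, k)], w)
                   for (i, ue), w in beams.items() if ue == l)

    signal = abs(received(k)) ** 2
    interference = sum(abs(received(l)) ** 2 for l in ues if l != k)
    return signal / (interference + noise_power)


# Toy data (assumed): 2 single-antenna RRHs and 2 UEs; RRH 1 serves both UEs.
channels = {(0, 0): [1 + 0j], (1, 0): [0.5 + 0j],
            (0, 1): [0.5j], (1, 1): [1 + 0j]}
beams = {(0, 0): [1 + 0j], (1, 0): [1 + 0j], (1, 1): [1 + 0j]}

sinr_ue0 = sinr(0, channels, beams, noise_power=0.25)
sinr_ue1 = sinr(1, channels, beams, noise_power=0.25)
```

In this toy setup UE 0 benefits from the coherent combination of two serving RRHs, whereas UE 1 is served by a single RRH and sees the signals intended for UE 0 as interference.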

As seen in (Equation 3), to design the beam-vectors for all UEs, the overall CSI of all UEs is required. However, it is a formidable task to obtain all CSI for the dense C-RAN due to the limited training resources. To handle this difficulty, we introduce the set for each UE that is defined as the set of RRHs that UE needs to measure CSI from. Also, we define for each RRH as the set of UEs that each RRH knows the CSI to. In general, are the set of UE ’s nearby RRHs and are the set of RRH ’s nearby UEs. Note that at least the CSI from all RRHs in is required for cooperative transmission design. The other CSI from RRHs in to UE is used to coordinate the interference, and the links from RRHs in are called coordinated interference links, which are shown by blue dashed arrows in Figure 1. Also, the UEs in are called RRH ’s coordinated UEs. For the CSI from RRHs in to UE , it is assumed that the BBU pool only knows the large scale gains . This is possible because the large scale gains change much more slowly than the small-scale fading.

Since the CSI in is unknown, we consider the following data rate for UE on SC (bit/s/Hz) [31]

where the expectation operator is performed over the fast fading of the unknown CSI in . Each UE ’s total data rate should be larger than the minimum rate requirement :

In each fronthaul link, the maximum capacity that can be supported is limited. Hence, the following fronthaul capacity constraint follows:

where is an indicator function, defined as

denotes the total transmission power from RRH to UE , is the maximum capacity that can be supported by the th fronthaul link.
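To make the role of the indicator function concrete, the following minimal sketch checks a fronthaul capacity constraint for one RRH. It assumes, for illustration only, that the traffic on a fronthaul link is the sum of the rates of the UEs to which the RRH allocates nonzero transmit power; the function names and numbers are hypothetical.

```python
from typing import Dict


def indicator(x: float) -> int:
    """Non-smooth indicator: 1 if the transmit power x is positive, else 0."""
    return 1 if x > 0 else 0


def fronthaul_load(power_to_ue: Dict[int, float],
                   ue_rate: Dict[int, float]) -> float:
    """Traffic on one fronthaul link (assumed model): only UEs that the
    RRH actually transmits to contribute their rate."""
    return sum(indicator(p) * ue_rate[k] for k, p in power_to_ue.items())


def fronthaul_feasible(power_to_ue: Dict[int, float],
                       ue_rate: Dict[int, float],
                       capacity: float) -> bool:
    """True if the link traffic does not exceed the link capacity."""
    return fronthaul_load(power_to_ue, ue_rate) <= capacity


# Hypothetical example: the RRH allocates no power to UE 1.
power_to_ue = {0: 0.3, 1: 0.0, 2: 1.2}
ue_rate = {0: 2.0, 1: 3.0, 2: 1.5}
load = fronthaul_load(power_to_ue, ue_rate)
```

Because UE 1 receives zero power from this RRH, its data need not be sent over the link, which is how switching off a UE-RRH association relieves the fronthaul.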

### 2.2 Network power consumption model

In this subsection, a practical NPC model is provided that consists of two parts: power consumption at the RRHs and power consumption on the fronthaul links.

As in [32], the power consumption of RRH can be modeled as a piecewise linear function of the transmit power at RRH :

where is the constant accounting for the efficiency of the power amplifier of RRH , is the total transmit power at RRH that should be no larger than , i.e.,

and represent the circuit power consumption when RRH is in active mode and sleep mode, respectively. In general, is much larger than , which motivates us to strategically switch off RRHs to save power when the traffic is very low.

The fronthaul power consumption model is critical for the optimization of the NPC. In [12] and [13], the fronthaul power consumption was simply modeled as a step function, with a larger constant value for active mode and a smaller one for sleep mode. In [33], the power consumption of each fronthaul link is modeled as proportional to the number of UEs that the link supports. However, these papers did not take into account the effect of the data rate carried on each fronthaul link. Intuitively, supporting a higher fronthaul data rate requires more power on the fronthaul links. Compared with [12], we go one step further by modeling the power consumption of each fronthaul link to be proportional to its total fronthaul data rate, as in [34]:

where is a constant scaling factor.

Based on the above analysis and with some simple manipulations, the NPC is modeled as

where is given in (Equation 9), .
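The two-part power model described in this subsection can be sketched as follows. The piecewise form (amplifier term plus active circuit power when transmitting, sleep power otherwise) and the linear fronthaul term follow the text; the names `eta`, `p_active`, `p_sleep`, `rho` and the numerical values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rrh:
    tx_power: float        # total transmit power of the RRH
    fronthaul_rate: float  # traffic carried on its fronthaul link


def rrh_power(tx_power: float, eta: float,
              p_active: float, p_sleep: float) -> float:
    """Piecewise linear RRH power: active if it transmits, asleep otherwise."""
    if tx_power > 0:
        return eta * tx_power + p_active
    return p_sleep


def network_power(rrhs: List[Rrh], eta: float, p_active: float,
                  p_sleep: float, rho: float) -> float:
    """NPC = sum of RRH power and fronthaul power (linear in the traffic)."""
    return sum(rrh_power(r.tx_power, eta, p_active, p_sleep)
               + rho * r.fronthaul_rate for r in rrhs)


# Hypothetical values: one active RRH and one sleeping RRH.
rrhs = [Rrh(tx_power=1.0, fronthaul_rate=4.0), Rrh(tx_power=0.0, fronthaul_rate=0.0)]
total = network_power(rrhs, eta=4.0, p_active=6.8, p_sleep=4.3, rho=0.5)
```

The jump between `p_sleep` and `p_active` at zero transmit power is what makes the objective non-smooth and motivates switching off lightly loaded RRHs.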

## 3 Problem Formulation and Analysis

Based on the above system model, we formulate the user selection problem and the NPC minimization problem in a two-stage form. Then, we provide the complexity analysis for the formulated problems.

### 3.1 Problem Formulation

Due to the limited fronthaul capacity constraints C2 in (Equation 6) and the power constraints C3 in (Equation 9), the system may not be able to support all UEs with their rate requirements of C1 in (Equation 5). Hence, some UEs may be dropped or rescheduled in other orthogonal time slots to make the optimization problem feasible. We therefore consider a two-stage optimization problem. In the first stage, one should find the largest subset of UEs that can be supported by the system, while in the second stage, one should optimize the corresponding beam-vectors to minimize with the selected subset of UEs obtained from the first stage.

As a result, the optimization problem at the first stage is formulated as

Denote as the solution from Stage I and the corresponding becomes . Then, the optimization problem at the second stage is formulated as

In the constraints C1, C2, and C3, and are replaced by and , respectively. Note that the constant term in (Equation 11) has been omitted in the objective function ( ?).

We emphasize that the aim of Stage I is to find the maximum number of admitted UEs with feasible beam-vectors. These obtained beam-vectors are not guaranteed to be optimal in terms of NPC. Hence, we need to perform Stage II to optimize the beam-vectors to reduce the NPC. The beam-vectors obtained from Stage I will be a feasible initial input that is required by the algorithm developed in Stage II.

The incomplete CSI at the BBU pool makes the design of the beam-vectors very difficult, and the exact expression for the data rate is difficult to derive. In the following, we replace the data rate with its lower-bound, which makes the optimization problem more tractable.

We first simplify the SINR expression in (Equation 3). The beam-vectors for each UE on each SC are merged into a single large-dimension vector . Then, we define a set of new channel vectors , representing the aggregated CSI from the RRHs in to UE on SC . The SINR expression in (Equation 3) can be rewritten as

Note that is perfectly known in the BBU pool according to the previous assumption, and only the denominator in (Equation 13) contains the uncertain terms. However, it is difficult to obtain the accurate rate expression. To deal with this challenge, we consider its lower-bound with more tractable form. Specifically, since is a convex function for any positive , by using Jensen’s inequality [35], the lower bound of the data rate in (Equation 4) can be derived as

where . To obtain the closed-form expression of , we define the indices of as . Then, we have

where is the block matrix of at the th row and th column, given by

It can be easily verified that is a positive definite matrix. Note that the derivation of matrix places no restrictions on the channel distributions and only large-scale channel gains are required. Hence, the algorithms developed in the following are applicable to any channel distribution, such as Rayleigh fading, Ricean channels, Nakagami-m fading channels, etc.
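The Jensen step above can be checked numerically. The sketch below is an illustration under stated assumptions: a single interfering link with unknown small-scale fading, whose power gain is drawn from an exponential distribution (Rayleigh fading), and the sample mean used in place of the analytical expectation. Since the rate is a convex function of the interference-plus-noise power, the average rate is never below the rate evaluated at the average interference.

```python
import math
import random
from typing import Tuple


def rate(signal: float, interference: float, noise: float) -> float:
    """Rate in bit/s/Hz for given signal, interference and noise powers."""
    return math.log2(1.0 + signal / (interference + noise))


def jensen_demo(signal: float = 4.0, mean_gain: float = 0.5,
                noise: float = 1.0, samples: int = 20000,
                seed: int = 0) -> Tuple[float, float]:
    """Return (average rate over fading, rate at the average interference)."""
    rng = random.Random(seed)
    # Assumed fading model: exponential power gain with mean `mean_gain`.
    interf = [rng.expovariate(1.0 / mean_gain) for _ in range(samples)]
    avg_rate = sum(rate(signal, x, noise) for x in interf) / samples
    lower_bound = rate(signal, sum(interf) / samples, noise)
    return avg_rate, lower_bound


avg_rate, lower_bound = jensen_demo()
```

The bound only needs the mean interference power, which depends on the large-scale gains alone; this is why it remains usable when the instantaneous CSI of the interfering links is unavailable.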

We now check the tightness of this rate lower-bound. It is difficult to derive the exact data rate expression for the general case. Instead, in Appendix ?, we derive the exact closed-form expression of the data rate for one special case under three assumptions: 1) the RRH serving cluster is the same as the CSI cluster for each UE: ; 2) the RRH serving clusters of different UEs do not overlap: ; 3) the small-scale fading vector follows the distribution of for .

We consider one non-overlapped C-RAN scenario deployed within a square area of coordinates km as shown in Fig. ?. This network area is divided into nine squares. In each square, one UE is located at the center point and three RRHs are randomly placed in the square to exclusively serve this UE. For simplicity, only one SC is considered. The other simulation parameters are the same as in the simulation section. It is assumed that each RRH transmits at its maximum power and that the beam direction is chosen to be the channel direction. The values of and are tested, which correspond to sparse and dense scenarios, respectively. Only UE 5 is considered.

Figure 3 plots three curves for comparison: the lower bound of the data rate derived in ( ?), the exact closed-form data rate expression derived in ( ?) in Appendix ?, and the Monte-Carlo simulation results. It is seen from Figure 3 that the curve of the closed-form expression coincides with that of the Monte-Carlo simulations, which verifies the correctness of the derivations. Furthermore, for the sparse scenario, when the transmit power is low, , the lower-bound is quite tight. As the transmit power increases, the gap increases and becomes constant in the high transmit power regime. Note that only roughly 3% data rate loss is incurred when using the lower-bound instead of the exact data rate, which is negligible. On the other hand, for the dense scenario, the C-RAN becomes interference limited and the data rate remains fixed over the whole range of transmit power, as expected. It is again observed that the gap between the lower-bound and the exact value is small. Hence, considering the complicated data rate expression in ( ?) in Appendix ?, our derived lower-bound expression in ( ?) is much easier to handle and more suitable for algorithm design.

By replacing the data rate in Problems and with its lower-bound given in ( ?) and considering the fact that the minimum rate constraints are met with equality at the optimal point, Problems and can be transformed as

and

respectively.

In the following, we focus on Problems and .

### 3.2 Problem Analysis

By adopting the user-centric clustering method in Section 2, the number of optimization variables in Problems and has been reduced from in fully cooperative transmission scheme to here. By appropriately setting the cluster sizes, the reduction in the number of variables can be very large, which significantly reduces the computational complexity. In addition, some redundant constraints can be removed, which further reduces the computational complexity. For example, in Figure 1, RRH 3 and RRH 5 are not in any UE’s candidate serving set, and thus the power constraints associated with RRH 3 and RRH 5 in C3 can be removed. Moreover, if each link supports at most two UEs, then only link 8 (i.e., RRH 8) should be imposed with the fronthaul capacity constraints. Hence, by employing the user-centric clustering with limited cooperation, the computational complexity can be reduced significantly.

However, Problems and are still difficult to solve for the following reasons. Both the objective functions and constraint C5 contain the non-smooth and non-differentiable indicator function, so the problems involve discrete and continuous variables at the same time; such a problem is usually called an MINLP problem. Although the generalized Benders decomposition method [18] is effective in solving this kind of problem, it is very difficult to apply it directly to Problems and due to the non-convex sum data rate constraints over multiple SCs. An exhaustive search method can be applied to solve Problems and . Specifically, to solve Problem , one should check whether Problem is feasible or not for each given user set and each given set of UE-RRH associations. This requires operations, which becomes prohibitive for large values of and . In addition, even given the selected UE set and the set of UE-RRH associations, it is still difficult to check the feasibility since constraint C4 is non-convex. Hence, for dense C-RAN, the complexity of the exhaustive search method is unaffordable for the BBU pool. Similar difficulties hold for Problem .

In the next section, we first deal with the NPC minimization Problem by assuming that the UEs have been selected with feasible beam-vectors. A low-complexity UE selection algorithm to deal with Problem is then provided in Section 5.

## 4 Low-complexity Algorithm to deal with Problem P4

In this section, we propose a low-complexity algorithm to solve Problem when UEs have been selected by using the UE selection algorithm in Section 5, and denote the selected subset of UEs as . As analyzed in Section 3.2, there are two difficulties in solving Problem : one is the non-convex sum data rate constraint C4 and the other is the non-smooth indicator function.

To deal with the first difficulty, we resort to the relationship between the data rate and weighted mean square error (MSE). In [27], the authors considered the sum rate maximization problem by showing that maximizing the sum rate is equivalent to minimizing the weighted MSE. Unfortunately, there are two hurdles that preclude the direct application of the technique in [27]: First, [27] considered the multiple-antenna UEs with perfect CSI. When each UE has only one antenna with perfect CSI, the rank of channel covariance matrices will be equal to one, i.e., . However, for the incomplete CSI considered in this paper, the rank of channel covariance matrix may be larger than 1 according to (Equation 15), i.e., . Second, in [27], the rate expression is in the objective function, while the rate expressions are in the constraints here.

To resolve the first hurdle, we construct an auxiliary signal transmission model by decomposing each interfering UE into multiple interfering sources. Specifically, for each UE on SC , since , are positive definite matrices, they can be decomposed as

where , with being the rank of . Then, we construct the following auxiliary signal transmission model for UE

where can be regarded as the number of interfering sources from UE , can be treated as the CSI from the th interfering source of UE to UE , and is the corresponding transmitted data. Both and are assumed to obey the distribution of . The data from different interfering sources are mutually independent and independent of . Note that all interfering sources from the same UE use the same beam-vector. By using the receive decoding to decode UE ’s received signal on SC , the estimated signal is given by

Due to the independence of the transmit data and noise, the mean square error (MSE) matrix at UE is given by

where and are the collections of decoding variables and data symbols, respectively, and (Equation 17) has been used to derive (Equation 20).
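The decomposition of a channel covariance matrix into several interfering sources can be illustrated with a small pure-Python sketch. It uses a Cholesky factorization, which is one possible choice for a positive definite matrix and is an assumption of this example (an eigendecomposition would serve equally well and also covers rank-deficient matrices). The check at the end confirms that the interference power written as a quadratic form equals the sum of the powers received from the individual sources.

```python
import math
from typing import List

Vec = List[complex]
Matrix = List[List[complex]]


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with A = L L^H (A Hermitian positive definite)."""
    n = len(a)
    low = [[0j] * n for _ in range(n)]
    for j in range(n):
        diag = a[j][j].real - sum(abs(low[j][p]) ** 2 for p in range(j))
        low[j][j] = complex(math.sqrt(diag))
        for i in range(j + 1, n):
            acc = sum(low[i][p] * low[j][p].conjugate() for p in range(j))
            low[i][j] = (a[i][j] - acc) / low[j][j]
    return low


def interfering_sources(a: Matrix) -> List[Vec]:
    """Columns g_j of L, so that A = sum_j g_j g_j^H."""
    low = cholesky(a)
    n = len(a)
    return [[low[i][j] for i in range(n)] for j in range(n)]


def quad_form(a: Matrix, w: Vec) -> float:
    """Interference power w^H A w."""
    n = len(w)
    return sum(w[i].conjugate() * a[i][j] * w[j]
               for i in range(n) for j in range(n)).real


def power_from_sources(a: Matrix, w: Vec) -> float:
    """The same power written as sum_j |g_j^H w|^2."""
    return sum(abs(sum(g_i.conjugate() * w_i for g_i, w_i in zip(g, w))) ** 2
               for g in interfering_sources(a))


# Hypothetical 2x2 Hermitian matrix with eigenvalues 1 and 3, and a beam-vector.
a = [[2 + 0j, 1j], [-1j, 2 + 0j]]
w = [1 + 1j, 2 - 1j]
```

Each column of the factor plays the role of the channel of one auxiliary interfering source, and all sources of the same UE share the beam-vector `w`, as in the auxiliary model described above.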

To deal with the second hurdle, we find a lower bound of the sum rate for each UE, and this lower bound is tight at a certain point. Then, we replace the sum rate in constraint C4 with its lower bound and iteratively solve for the beam-vectors by using the block coordinate descent method. Specifically, defining the following functions:

where is an introduced variable, we have the following lemma:

Lemma 1: Given the beam-vectors , function is a lower bound for . In addition, the optimal and for to achieve are where is given by

Proof: Please see Appendix ?.
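The kind of rate-MSE relationship that such a lemma relies on can be illustrated in the scalar case. The sketch below is not the paper's exact function, whose expression is not reproduced here; it assumes a single effective channel coefficient `h_eff` (channel combined with the beam-vector), an interference-plus-noise power `q`, a receive decoder `u` and a weight `omega`. For any `u` and positive `omega` the function lower-bounds the rate, and it is tight for the MMSE decoder with `omega` equal to the inverse of the resulting MSE.

```python
import math


def mse(u: complex, h_eff: complex, q: float) -> float:
    """MSE of the estimate conj(u) * y for unit-power data and noise power q."""
    return abs(1 - u.conjugate() * h_eff) ** 2 + abs(u) ** 2 * q


def rate_lower_bound(u: complex, omega: float, h_eff: complex, q: float) -> float:
    """(ln(omega) - omega * MSE + 1) / ln 2, a lower bound on log2(1 + SINR)."""
    return (math.log(omega) - omega * mse(u, h_eff, q) + 1.0) / math.log(2.0)


def optimal_decoder(h_eff: complex, q: float) -> complex:
    """MMSE decoder, which minimizes the MSE."""
    return h_eff / (abs(h_eff) ** 2 + q)


# Hypothetical numbers: SINR = |h_eff|^2 / q = 5.
h_eff, q = 1.5 + 0.5j, 0.5
exact_rate = math.log2(1.0 + abs(h_eff) ** 2 / q)

u_opt = optimal_decoder(h_eff, q)
omega_opt = 1.0 / mse(u_opt, h_eff, q)
tight_value = rate_lower_bound(u_opt, omega_opt, h_eff, q)
loose_value = rate_lower_bound(0.3 + 0.1j, 2.0, h_eff, q)
```

With the decoder and weight fixed, the bound is a concave function of the beam-vector through the MSE term, which is what makes the resulting rate constraint convex.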

By replacing in Problem with its lower-bound , Problem can be transformed into the following optimization problem

where and are the collections of variables and , respectively. Note that given and , constraint C6 is a convex set over beam-vectors, which is more tractable than Problem , wherein constraint C4 is non-convex. Hence, Problem can be solved by using the block coordinate descent method: given , update and in ( ?) and ( ?), respectively; update and with fixed and . We only need to deal with the latter one. Given and , by inserting the MSE expression in (Equation 20) into C6, Problem can be transformed as

where .

Now, we deal with the second difficulty: the non-smooth indicator function in (Equation 7) in the objective function and C5 in Problem . The non-smooth indicator function is approximated as a fractional function , where is a very small positive value that controls the smoothness of the approximation. Then, can be approximated as

Note that for any positive , the fractional function is strictly smaller than one. Hence, is actually the lower bound of . However, this gap is negligible when is very small and is comparatively large. By replacing the indicator function in Problem with , we have
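A quick numerical look at this approximation is given below. The sketch assumes the fractional form x / (x + theta) with a hypothetical default theta of 1e-5 and compares it with the exact indicator at a few power levels.

```python
def indicator(x: float) -> float:
    """Exact indicator of a positive transmit power."""
    return 1.0 if x > 0 else 0.0


def smooth_indicator(x: float, theta: float = 1e-5) -> float:
    """Fractional approximation x / (x + theta) of the indicator, for x >= 0."""
    return x / (x + theta)


# Gap between the indicator and its approximation at several power levels.
gaps = {x: indicator(x) - smooth_indicator(x) for x in (0.0, 1e-6, 1e-3, 1.0)}
```

The gap is never negative, vanishes at zero power, and becomes negligible once the power is large compared with theta; powers comparable to theta are counted only fractionally.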

Problem is much more tractable than Problem since both the objective function and constraints in Problem are differentiable and continuous. Although Problem is still nonconvex due to the concavity of , it is a well-known difference of convex (d.c.) program, which can be efficiently solved by the SCA method [37]. The main idea of this method is to approximate the concave function as its first order Taylor expansion. Specifically, by using the concavity of , one has

where is a collection of beam-vectors at the iteration, and are given by

where denotes the first-order derivative of . By replacing and in Problem with the right hand side (RHS) of (Equation 22) and (Equation 23), respectively, one can solve the following optimization problem in the iteration

where is given by with , , and . Note that some constant terms in the RHS of (Equation 22) and (Equation 23) are omitted in ( ?). Obviously, is a positive definite matrix and all constraints form a convex set. Then Problem is a convex problem. The details to solve it will be given in the next subsection.
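The linearization step can be illustrated on a toy problem that is much simpler than the one treated here. The sketch assumes the concave surrogate f(x) = x / (x + theta), replaces it by its tangent at the current point, and solves the resulting linear program in closed form. The toy constraint (a weighted sum of powers must meet a demand) and all names are hypothetical and chosen only so that each convex subproblem has an explicit solution.

```python
from typing import List

THETA = 1e-3  # smoothing parameter (assumed value)


def f(x: float) -> float:
    """Concave surrogate of the indicator."""
    return x / (x + THETA)


def f_prime(x: float) -> float:
    """First-order derivative of f."""
    return THETA / (x + THETA) ** 2


def tangent(x: float, x_t: float) -> float:
    """First-order Taylor expansion of f around x_t; upper-bounds f."""
    return f(x_t) + f_prime(x_t) * (x - x_t)


def objective(p: List[float]) -> float:
    return sum(f(x) for x in p)


def sca_step(p: List[float], gains: List[float], demand: float) -> List[float]:
    """Minimize sum_i f'(p_i) * x_i  s.t.  sum_i gains_i * x_i >= demand, x >= 0.
    The optimum of this linear program loads the index with the smallest
    weight-to-gain ratio."""
    weights = [f_prime(x) for x in p]
    best = min(range(len(p)), key=lambda i: weights[i] / gains[i])
    nxt = [0.0] * len(p)
    nxt[best] = demand / gains[best]
    return nxt


gains, demand = [1.0, 2.0, 0.5], 2.0
p = [1.0, 1.0, 1.0]                      # feasible starting point
history = [objective(p)]
for _ in range(3):
    p = sca_step(p, gains, demand)
    history.append(objective(p))
```

Because the tangent upper-bounds the concave function and touches it at the current point, the true objective cannot increase from one iteration to the next, which is the mechanism behind the convergence of SCA-type schemes.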

Based on the above analysis, an iterative algorithm is given to solve Problem . A straightforward way to solve Problem would involve two layers: the inner layer to solve Problem by using the SCA method given and ; the outer layer to update and by using ( ?) and ( ?) given . Although the inner layer is guaranteed to converge to a Karush-Kuhn-Tucker (KKT) point of Problem as proved in [38], this two-layer algorithm will incur high computational complexity. Instead, we merge these two layers into one layer and update , , and at the same layer, as given in Algorithm 1. Algorithm 1 is guaranteed to converge, as proved in Theorem 1.

Theorem 1: Given the feasible initial input , Algorithm 1 is guaranteed to converge both in objective value and variables.

Proof: Please see Appendix ?.