Two-Tier Precoding for FDD Multi-Cell Massive MIMO Time-Varying Interference Networks
Abstract
Massive MIMO is a promising technology for future wireless communication networks. However, it raises many implementation challenges, for example, huge pilot-symbol and feedback overheads, the requirement of real-time global CSI, a large number of RF chains, and high computational complexity. We consider a two-tier precoding strategy for multi-cell massive MIMO interference networks, with an outer precoder for inter-cell/inter-cluster interference cancellation and an inner precoder for intra-cell multiplexing. In particular, to combat the computational complexity of the outer precoding, we propose a low-complexity online iterative algorithm to track the outer precoder under time-varying channels. We formulate the problem as an optimization on the Grassmann manifold and develop a low-complexity iterative algorithm that converges to the global optimal solution under static channels. For time-varying channels, we propose a compensation technique to offset the variation of the time-varying optimal solution. We show theoretically that, under some mild conditions, perfect tracking of the target outer precoder using the proposed algorithm is possible. Numerical results demonstrate that two-tier precoding with the proposed iterative compensation algorithm achieves good performance with a significant complexity reduction compared with conventional two-tier precoding techniques in the literature.
Massive MIMO, Two-Tier Precoding, Tracking Algorithm, Optimization, Grassmann Manifold
I Introduction
Massive MIMO is a promising technology to meet the future capacity demand in wireless cellular networks. Equipped with a large number of antennas, the system has a sufficient number of degrees of freedom (DoF) to exploit the spatial multiplexing gains for intra-cell users and to mitigate the inter-cell interference. However, the corresponding beamforming (precoder) designs for such multi-user MIMO (MU-MIMO) interference networks are challenging even in traditional MIMO systems with a small number of antennas. In [1], the inter-cell interference is mitigated by using coherently coordinated transmission (CCT) from multiple base stations (BSs) to each user, based on commonly shared global channel state information (CSI). In [2], the beamformers are jointly optimized among BSs, where uplink-downlink duality is used to obtain the global CSI in a time-division duplex (TDD) system. Using alternating optimization, the WMMSE algorithm is proposed in [3] to maximize the weighted sum rate of multi-cell systems. Moreover, interference alignment (IA) approaches were used in [4, 5] for downlink cellular interference networks.
We consider the beamforming design for frequency-division duplex (FDD)^{1} massive MIMO systems with a large number of antennas. Unlike conventional multi-cell MU-MIMO networks, where the schemes in [1, 2, 3, 4, 5] may be easily implemented, FDD massive MIMO systems raise many practical issues: (i) huge pilot-symbol and feedback overheads, (ii) a large number of RF chains, (iii) real-time global CSI sharing, and (iv) huge computational complexity of the precoders at the BSs. For instance, the number of independent pilot symbols required for transmit-side CSI (CSIT) estimation at the mobile scales linearly with the number of BS antennas, and so does the CSIT feedback overhead. In addition, as the number of antennas scales up, so does the number of RF chains, which incurs a high fabrication cost and power consumption. Although dynamic antenna switching techniques [6, 7] may reduce the required number of RF chains, those solutions do not fully utilize the benefits of the extra antennas. Moreover, there is signaling latency over the backhaul, and it is highly difficult to acquire global real-time CSIT for precoding. Finally, the computational complexity of precoding algorithms grows quickly with the number of antennas, and low-complexity precoding algorithms are needed for massive MIMO systems.
^{1} FDD is expected to remain a major duplexing technique in the near future, especially for macro-coverage applications.
In this paper, we address all the above difficulties by considering two-tier precoding with subspace alignment. This is motivated by the clustering behavior of the user terminals. As illustrated in Fig. 1, users in the same cluster may share the same scattering environment and hence may have similar spatial channel correlations, whereas users in different clusters may have different spatial channel correlations. Therefore, we can decompose the MIMO precoder at the BS into an outer precoder and an inner precoder. The outer precoder mitigates inter-cell and inter-cluster interference based on the statistical spatial channel covariance. Since the spatial correlations vary slowly, the outer precoder can be computed on a slower timescale. On the other hand, the inner precoder performs spatial multiplexing of intra-cluster users on the dimension-reduced subspace spanned by the outer precoder. As a result, the inner precoders adapt to the local real-time CSIT at the BS and are computed on a faster timescale. Using the proposed two-tier precoding structure, we shall illustrate in Section III-B that the aforementioned technical issues (i)-(iii) associated with a large number of antennas can be substantially alleviated.
In [8, 9], a zero-forcing based two-tier precoding has been proposed for single-cell massive MIMO systems, where the outer precoders are computed using a block diagonalization (BD) algorithm. However, computing the outer precoder has high complexity, and the tracking issues for the outer precoder under time-varying channels were not addressed. In fact, computational complexity is a serious concern in massive MIMO systems as the number of antennas becomes very large. For example, in the BD algorithm proposed in [8, 9], a series of matrix manipulations, including SVD, must be applied to a number of channel covariance matrices each time the outer precoder is updated, which is expensive for large antenna arrays. In addition, deriving a low-complexity iterative algorithm for the BD solution in [8, 9] is far from trivial. To address the complexity issue, we consider online tracking solutions that exploit the temporal correlation of the channel matrices. There is a body of literature on iterative subspace tracking algorithms, for example, gradient-based algorithms [10, 11, 12, 13], power-iteration-based algorithms [14], and algorithms based on Krylov subspace approximations [15, 16]. Moreover, the author in [17] proposed an iterative subspace tracking precoder design for MIMO cellular networks. However, these algorithms have not fully exploited the channel temporal correlations to enhance the tracking. In this paper, we propose a compensated subspace tracking algorithm for the online computation of the outer precoder. The algorithm is derived by solving an optimization problem formulated on the Grassmann manifold, and its tracking capability is enhanced by introducing a compensation term that estimates and offsets the motion of the target signal subspace. Using a control-theoretic approach, we also characterize the tracking performance of the online outer precoding algorithm in time-varying massive MIMO systems.
We show that, under mild technical conditions, perfect tracking (with zero convergence error) of the target outer precoder using the proposed compensation algorithm is possible, despite the channel covariance matrix being time-varying. Furthermore, we demonstrate with numerical results that the proposed two-tier precoding algorithm achieves good system performance with low signaling overhead and low complexity.
The rest of the paper is organized as follows. Section II introduces the massive MIMO channel model and the signal model. Section III presents the two-tier precoding techniques. Section IV derives the iterative algorithm for tracking the outer precoder, and the associated convergence analysis is given in Section V. Numerical results are given in Section VI, and Section VII gives the concluding remarks.
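Notations: We use lowercase bold font to denote vectors and uppercase bold font for matrices. $\mathbf{I}_n$ denotes the $n\times n$ identity matrix. For matrices $\mathbf{A}\in\mathbb{C}^{m\times n_1}$ and $\mathbf{B}\in\mathbb{C}^{m\times n_2}$, $[\mathbf{A},\mathbf{B}]$ denotes the concatenated matrix whose first $n_1$ columns are given by $\mathbf{A}$ and whose last $n_2$ columns are given by $\mathbf{B}$.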
II System Model
II-A Massive MIMO Channel Model with Local Spatial Scattering
We consider a cellular network with BSs, and the th BS serves MSs. The MSs are clustered together, and without loss of generality, we assume each BS serves one cluster of MSs^{2}. Each BS has antennas and each user has antennas. The downlink channel from the th BS to the th MS in the th cell is given by . The received signal at the MS in cell is given by
where is the symbol transmitted at BS , is the number of data streams transmitted by BS , and is the additive complex Gaussian noise.
^{2} The extension to the case of multiple clusters is straightforward. Hence, we only focus on the single-cluster case to simplify the notation.
In massive MIMO systems, where the BS has a large number of antennas and is placed on top of a building, there is usually not enough local scattering around the BS. Correspondingly, channel measurements have also shown that most of the signal energy is localized in the azimuth direction [18]. Therefore, we consider the one-ring local scattering model [19, 20] to characterize the massive MIMO channel. As illustrated in Fig. 1, the local scattering around the MS is modeled by a ring with radius , whereas the transmit signal from the BS forms a narrow angular spread (AS) denoted as , where is the distance between BS and MS in cell . Let be the angle of departure (AoD) of a path from BS to MS in cell . We use the von Mises model to characterize the power azimuth spectrum (PAS) w.r.t. [21, 19] as follows:
(1) 
where is the mean angle of the AoD, is the zeroth-order modified Bessel function, and characterizes the AS at BS in the direction of MS in cell . Denote as the corresponding transmit spatial correlation matrix at BS . The th entry of the matrix , which describes the spatial correlation between the th and th antenna elements at BS [22], is defined as:
(2) 
where accounts for the phase difference between the th and th antenna elements over the azimuth direction at BS .
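As a rough numerical illustration of (1) and (2), the Python/NumPy sketch below builds a transmit correlation matrix from a von Mises PAS. It assumes a uniform linear array with half-wavelength spacing, so that the phase difference between antennas m and n is pi(m - n)sin(theta); this geometry, the function names, and the parameter values (64 antennas, mean AoD of 20 degrees, concentration 200) are illustrative assumptions, not the paper's settings. A narrow spread yields a correlation matrix whose energy sits in a few eigenmodes.

```python
import numpy as np


def von_mises_pas(theta, mean_aod, kappa):
    """Von Mises power azimuth spectrum p(theta), normalized over [-pi, pi)."""
    return np.exp(kappa * np.cos(theta - mean_aod)) / (2.0 * np.pi * np.i0(kappa))


def transmit_correlation(num_antennas, mean_aod, kappa, grid_size=4096):
    """Approximate R[m, n] = integral of exp(j*pi*(m - n)*sin(theta)) p(theta) dtheta.

    Assumes a uniform linear array with half-wavelength spacing (hypothetical
    geometry). The periodic integrand is summed on a uniform grid over [-pi, pi).
    """
    theta = -np.pi + 2.0 * np.pi * np.arange(grid_size) / grid_size
    weights = von_mises_pas(theta, mean_aod, kappa) * (2.0 * np.pi / grid_size)
    # Column k of `steering` is the array response a(theta_k), a_m = exp(j*pi*m*sin(theta_k))
    steering = np.exp(1j * np.pi * np.outer(np.arange(num_antennas), np.sin(theta)))
    # Weighted sum of a(theta_k) a(theta_k)^H: Hermitian positive semidefinite
    return (steering * weights) @ steering.conj().T


# Illustrative values only: 64 antennas, mean AoD of 20 degrees, narrow spread
R = transmit_correlation(num_antennas=64, mean_aod=np.deg2rad(20.0), kappa=200.0)
eigvals = np.linalg.eigvalsh(R)[::-1]            # descending order
energy = np.cumsum(eigvals) / eigvals.sum()
num_modes_99 = int(np.searchsorted(energy, 0.99)) + 1
print(np.trace(R).real)   # close to 64: each diagonal entry integrates p(theta) to 1
print(num_modes_99)       # number of eigenmodes carrying 99% of the energy
```

With these values the 99% energy count is a small fraction of the 64 antennas; a smaller `kappa` (wider spread) increases it.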
We assume that MSs within the same cluster have the same channel statistical parameters and , i.e., and , . As a result, the transmit correlation matrices satisfy for all MSs in the scattering cluster of cell . We adopt the following two-timescale, clustered, and spatially correlated massive MIMO channel model.
Assumption 1
(Two-Timescale, Clustered, and Spatially Correlated Channel Model) The time-varying massive MIMO channel on each subframe is given by
(3) 
where and vary on different timescales:

Small Timescale: are independent and identically distributed (i.i.d.) over MSs and time-varying over subframes . Each element of the matrix follows an independent complex Gaussian distribution with zero mean and unit variance.

Large Timescale: The spatial correlation matrix is constant within each superframe , but changes between consecutive superframes (i.e., a block of subframes).
∎
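Assumption 1 can be mimicked in simulation as follows. The form H = W R^{1/2}, with W having i.i.d. CN(0, 1) entries, is an assumption here (the exact expression in (3) is not reproduced in this text), and the sizes and the rank-3 correlation matrix are hypothetical. The sketch freezes R (large timescale), redraws W for each sample (small timescale), and checks that the sample average of H^H H approaches N R, where N is the number of receive antennas.

```python
import numpy as np


def hermitian_sqrt(R):
    """Principal square root of a Hermitian positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(R)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T


def draw_channel(R_sqrt, num_rx, rng):
    """One small-timescale realization H = W R^{1/2} (an assumed form of (3)),
    with W having i.i.d. CN(0, 1) entries."""
    num_tx = R_sqrt.shape[0]
    W = (rng.standard_normal((num_rx, num_tx))
         + 1j * rng.standard_normal((num_rx, num_tx))) / np.sqrt(2.0)
    return W @ R_sqrt


rng = np.random.default_rng(0)
M, N, rank = 8, 2, 3                       # hypothetical sizes
Q, _ = np.linalg.qr(rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)))
R = (Q[:, :rank] * np.array([4.0, 3.0, 1.0])) @ Q[:, :rank].conj().T  # low-rank correlation
R_sqrt = hermitian_sqrt(R)

# Large timescale: R is frozen for the superframe; small timescale: fresh W per sample
samples = [draw_channel(R_sqrt, N, rng) for _ in range(20000)]
R_est = sum(H.conj().T @ H for H in samples) / (N * len(samples))
print(np.max(np.abs(R_est - R)))            # small, since E[H^H H] = N R
```

Because R has rank 3, every realization is orthogonal to the remaining 5 eigen-directions of R, which is the "few active eigenmodes" behavior that the two-tier design relies on.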
II-B Signal Model, Interference Mitigation and Challenges
Denote the precoding matrix for user in cell as and the associated receiver shaping matrix as , where is the number of data streams. Applying the receiver shaping matrix to the signal at MS in cell , the received signal is given by
(4)  
where is still standard complex Gaussian noise and is the data symbol intended for user in cell . The per-BS power budget is .
In the conventional approach, inter-cell interference mitigation and intra-cell spatial multiplexing are achieved by a joint design of the precoders and receiver shaping matrices among all the BSs using, for example, ZF techniques [23, 24], WMMSE [3], IA [25, 5], etc. However, these approaches cannot be directly applied in FDD massive MIMO cellular systems because:

A large number of RF chains (one per antenna) are required to perform RF-to-baseband translation as well as analog-to-digital (A/D) conversion. As a result, there is a huge cost in hardware and power consumption.

A huge number of pilot symbols is needed to estimate the massive MIMO channels (large matrices), and a huge CSI feedback overhead is involved.

Global real-time CSIT is required for computing . However, the cross-link information can only be obtained via message passing over the backhaul links connecting the BSs. This places a huge burden on the backhaul and increases the signaling latency.
Remark 1 (Inter-cell Interference in Massive MIMO)
It has been reported [26] that the inter-cell interference (ICI) in multi-cell massive MIMO systems can be asymptotically ignored using simple per-cell zero-forcing. One key assumption is that the direct links and interference links are spatially uncorrelated. However, this assumption may not hold under local scattering (such as the one-ring scattering model considered in this paper), and hence, ICI coordination may be needed for massive MIMO.
To deal with these challenges, we propose a twotier precoding in the next section.
III Two-Tier Precoding: Joint Signal and Interference Subspace Alignment
In this section, we propose a two-tier precoding structure by exploiting the limited local scattering and the clustering structure of mobile users in cellular systems.
III-A Two-Tier Precoding with Subspace Alignment
The precoder at BS to MS has the two-tier structure given by:
(5) 
where is the outer subspace precoder that adapts to the large-timescale spatial correlations to mitigate the inter-cell interference, and is the inner precoder that utilizes the real-time local CSIT to mitigate the intra-cell interference. The outer precoder is computed on the long timescale (once every superframe), and the inner precoder (together with the corresponding receiver shaping matrices ) is computed on the short timescale (once every subframe). The parameter determines the dimension of the subspace for intra-cell spatial multiplexing. In the massive MIMO scenario, we have .
Specifically, the twotier precoding with subspace alignment is described below:

Long-Timescale Processing: In each superframe, the subspace precoders are chosen as the solution to the following optimization problem
(6) subject to , where the expectations are conditioned on the spatial correlation matrices and is a weight parameter.
Remark 2
Minimizing only the first term in (6) corresponds to the conventional ZF solution, whereas minimizing only the second term corresponds to the matched filter (MF) solution. Formulation (6) strikes a balance between the inter-cell interference leakage and the intra-cell signal energy in a system with a large but finite number of antennas. On the one hand, for a large number of transmit antennas, the second term dominates and the solution approaches the MF solution. On the other hand, for a limited number of antennas (such as in traditional MIMO), the first term is significant and the solution approaches the coordinated ZF solution. The weight adjusts the balance between the inter-cell interference and the direct-link signal.

Short-Timescale Processing: In each subframe, ZF precoding [23] is used.
Step 1: Choose the receiver shaping matrix for MS in cell by solving
(7) subject to , and feed back the equivalent channel to BS .
Step 2: Concatenate the rows of for each MS to form a matrix . The inner precoder is given by^{3}
(8) where denotes the pseudoinverse of . The inner precoder for MS is given by the th to the th columns of .
^{3} We assume that the number of data streams assigned to each user always satisfies . Hence, with probability 1, the matrix has full row rank.
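The short-timescale ZF step can be sketched as follows. The helper name `zf_inner_precoder`, the Frobenius-norm power scaling, and the dimensions (outer-precoder dimension 6, three users with two streams each) are assumptions for illustration; the point is that stacking the equivalent channels into one matrix and taking its pseudoinverse makes each user's block of columns invisible to the other users.

```python
import numpy as np


def zf_inner_precoder(equiv_channels, power):
    """ZF inner precoder from per-user equivalent channels (each d_k x S).

    The rows are stacked into G, the pseudoinverse gives G @ B = I, and B is
    then scaled so that ||B||_F^2 = power (an assumed normalization; with an
    orthonormal outer precoder F, ||F B||_F = ||B||_F).
    """
    G = np.vstack(equiv_channels)                  # (sum_k d_k) x S, full row rank
    B = np.linalg.pinv(G)                          # S x (sum_k d_k)
    B = B * (np.sqrt(power) / np.linalg.norm(B, "fro"))
    # User k receives the block of columns matching its own rows of G
    split_points = np.cumsum([g.shape[0] for g in equiv_channels])[:-1]
    return np.split(B, split_points, axis=1)


rng = np.random.default_rng(1)
S = 6                                              # hypothetical outer-precoder dimension
users = [rng.standard_normal((2, S)) + 1j * rng.standard_normal((2, S)) for _ in range(3)]
blocks = zf_inner_precoder(users, power=1.0)
leak = users[0] @ blocks[1]                        # user 0 seen through user 1's precoder
print(np.max(np.abs(leak)))                        # ~0: intra-cell interference is nulled
```

Each user's own block yields a scaled identity, so after receive shaping the streams arrive free of intra-cell interference.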
III-B Motivation of the Two-Tier MIMO Precoding and the Complexity Issue
The twotier MIMO precoding has the following advantages:

A light demand on pilot symbols and CSI feedback overheads: With the outer precoding, the MS only needs to estimate the effective channel and feed back the channel matrix . Compared with directly working on the channel matrix , this yields a huge saving in the pilot symbols for channel estimation and in the CSI feedback load.

A relatively small number of RF chains required: With the limited local scattering around the BS, there are only a few active eigenmodes for the massive MIMO channel, and only a small number of spatial multiplexing data streams can be supported. Therefore, we do not need to implement RF chains. Instead, only RF chains are required, where , and the outer precoder can be implemented using the RF phase shifting network [27, 28] as illustrated in Fig. 2.

Only statistical global CSI required: The inner precoder only requires the local CSI between the BS and its serving MSs. To update the outer precoder on the long timescale, only knowledge of the channel statistics is required. As a result, the performance is insensitive to backhaul latency among the BSs.
Having addressed the practical issues (i)-(iii) raised in Section I, we now focus on the computational complexity issue (iv). Note that, with the dimension reduction for the inner precoder, the computation of the outer precoder dominates the complexity. We first investigate the properties of the solution to the outer precoding problem (6).
Theorem 1 (Solution to the Outer Precoder)
The optimal solution to the outer precoding problem (6) is given by the eigenvectors corresponding to the smallest eigenvalues of the covariance matrix
(9) 
for each BS . ∎
Please refer to Appendix A for the proof.
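Theorem 1 suggests a direct, brute-force way to compute the outer precoder: a full Hermitian eigendecomposition. In the sketch below, the matrix A is a random Hermitian stand-in for (9), and the trace form tr(F^H A F) of the per-BS cost is an assumption made for illustration. By the Ky Fan minimum principle, the minimum of this cost over orthonormal F equals the sum of the S smallest eigenvalues, which the example checks. A full eigendecomposition of an M x M matrix (M being the array size in the sketch) costs on the order of M^3 operations, which is what motivates iterative tracking.

```python
import numpy as np


def outer_precoder_bruteforce(A, S):
    """Theorem 1 by direct eigendecomposition: return an orthonormal basis of the
    eigenvectors of the Hermitian matrix A for its S smallest eigenvalues."""
    vals, vecs = np.linalg.eigh(A)                 # ascending eigenvalues; O(M^3) work
    return vecs[:, :S], vals[:S]


def trace_cost(F, A):
    """tr(F^H A F): assumed trace form of the per-BS cost in (6)."""
    return np.real(np.trace(F.conj().T @ A @ F))


rng = np.random.default_rng(2)
M, S = 32, 4
X = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
A = (X + X.conj().T) / 2                           # hypothetical stand-in for (9)
F_opt, smallest = outer_precoder_bruteforce(A, S)
F_rand, _ = np.linalg.qr(rng.standard_normal((M, S)) + 1j * rng.standard_normal((M, S)))
print(trace_cost(F_opt, A) <= trace_cost(F_rand, A))   # True: Ky Fan minimum principle
```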
III-C Achievable Per-Cell DoF of Two-Tier Precoding in Massive MIMO
In this subsection, we characterize the performance of two-tier precoding by evaluating its achievable per-cell DoF. The DoF can be interpreted as the number of data streams, or the asymptotic throughput scaling, that can be supported in massive MIMO systems at high SNR [25, 29]. Denote as the sum throughput of cell . The per-cell DoF of the massive MIMO system is defined as .
For simplicity, we consider a symmetric massive MIMO network, where each cell has the same number of MSs, , the same rank of transmit spatial correlation matrices , and for all . We derive the per-cell DoF of two-tier precoding in the symmetric massive MIMO system below.
Theorem 2
(Per-Cell DoF of Symmetric Massive MIMO Systems with Two-Tier Precoding) For a symmetric massive MIMO network , where all the transmit correlation matrices have rank , if , then

the per-cell DoF of the proposed two-tier precoding is given by

the per-cell DoF of conventional one-tier interference alignment (with global real-time CSIT) is given by
∎
Please refer to Appendix B for the proof.
As a result, there is no loss of DoF performance using the proposed two-tier precoding design in a symmetric multi-cell massive MIMO network^{4}.
^{4} Although the DoF result only focuses on a special network topology region specified by , this region does cover the application scenarios of interest for massive MIMO systems, since is usually very large and is relatively small in massive MIMO systems.
IV Iterative Algorithms for the Outer Precoder under Time-Varying Channels
To reduce the complexity of finding the global optimal solution for the outer precoder problem in (6), one approach is to leverage the slowly varying nature of the spatial correlation and to compute the outer precoder iteratively at every superframe. A common technique for such an iterative outer precoder is to apply the gradient descent algorithm to solve problem (6). However, such a “naive” method may have poor convergence performance, because problem (6) is non-convex due to the quadratic equality constraints . Furthermore, problem (6) suffers from uncountably many non-unique and non-isolated local optima. For example, if is one local optimum, then gives another local optimum, where is any unitary matrix. Such a non-isolated property makes it hard to develop iterative algorithms with fast convergence to the global optimal solution under time-varying channels. Hence, we need to tackle the following challenge:
Challenge 1: To derive a low-complexity iterative algorithm that can be shown to converge to the desired solution for the outer precoder under time-varying channels.
To deal with the above challenge, we focus on deriving algorithms on the Grassmann manifold, where all the outer precoders that span the same subspace are considered equivalent and are represented by a single point on the Grassmann manifold. As a result, the local optima become isolated.
IV-A Transformation of Problem (6) onto the Grassmann Manifold
A Grassmann manifold is the set of all dimensional subspaces of : , where denotes the space spanned by the columns of the matrix . The Grassmann manifold can be viewed as a quotient of the set of full-rank matrices in the Euclidean space, with a mapping that maps each such matrix to a point on the manifold . For example, all the matrices ( with full rank ) that span the same subspace as does are mapped to the same element in under the mapping , i.e., . Conversely, the inverse mapping represents the set of matrices in the Euclidean space that span the same subspace.
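The two facts used here, that the cost cannot distinguish a precoder F from F U for a unitary U, and that F and F T (T invertible) span the same subspace, can be checked numerically. The sketch below uses a hypothetical trace-form cost and represents a Grassmann point by its orthogonal projector; the chordal distance between projectors is one common way (an assumption here, not taken from the paper) to measure how far apart two subspaces are.

```python
import numpy as np


def orthonormal_basis(X):
    """Orthonormal basis for span(X) from a reduced QR factorization."""
    Q, _ = np.linalg.qr(X)
    return Q


def projector(X):
    """Orthogonal projector onto span(X): a canonical representative of the
    Grassmann point associated with the full-column-rank matrix X."""
    Q = orthonormal_basis(X)
    return Q @ Q.conj().T


def chordal_distance(X, Y):
    """Chordal distance between span(X) and span(Y) of equal dimension."""
    return np.linalg.norm(projector(X) - projector(Y), "fro") / np.sqrt(2.0)


def trace_cost(F, A):
    """tr(F^H A F): a hypothetical trace-form cost used only for illustration."""
    return np.real(np.trace(F.conj().T @ A @ F))


def complex_gaussian(shape, rng):
    """Matrix with i.i.d. complex Gaussian entries (unnormalized)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


rng = np.random.default_rng(3)
M, S = 10, 3
X = complex_gaussian((M, M), rng)
A = (X + X.conj().T) / 2                                # Hermitian cost matrix (stand-in)
F = orthonormal_basis(complex_gaussian((M, S), rng))    # orthonormal precoder
U = orthonormal_basis(complex_gaussian((S, S), rng))    # unitary S x S rotation
T = complex_gaussian((S, S), rng)                       # invertible with probability 1

print(np.isclose(trace_cost(F, A), trace_cost(F @ U, A)))  # True: same cost
print(chordal_distance(F, F @ T) < 1e-10)                  # True: same subspace
```

Working with projectors (or any other subspace-level representation) is what turns the continuum of equivalent optima F U into a single isolated point.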
Consider as an element on the Grassmann manifold, i.e., . The outer precoding problem (6) (see equation (23) in Appendix A) can be reformulated as an optimization over the Grassmann manifold [30, 31]
(10) 
where the global optimal solution is defined to be the subspace that yields the minimum objective value of (10).
Note that there is a substantial difference between formulation (10) on the Grassmann manifold and formulation (6) in the Euclidean space. While the objective in (6) is to find a matrix in the Euclidean space that minimizes the interference and signal utility function , problem (10) focuses on choosing the right subspace , which is the unique minimizer of ^{5}. With this insight, we can derive algorithms to obtain the subspace precoders more efficiently.
^{5} Problem (10) has a unique solution if, for , the th eigenvalue of the covariance matrix has multiplicity 1.
IV-B Outer Precoder Tracking Algorithm with Compensation
An intuitive way to derive an algorithm that solves problem (10) is to generalize the gradient descent algorithm to the Grassmann manifold:
(11) 
where is the step size, and
(12) 
is the gradient iteration mapping on the Grassmann manifold and is a collection of the covariance matrices at iteration . The notation emphasizes that is the key parameter that determines the optimal solution .
Although the gradient descent algorithm usually finds a stationary point under static parameters, this may not be the case under the time-varying parameter . Under time-varying channels, the channel covariance matrices vary on a similar timescale as the gradient iterations, and there is always a convergence gap between the gradient iterate and the time-varying optimal solution . Taking the precoder for BS as an example: when the gradient iteration gets closer to the previous target at the th iteration, the optimal target has already moved to a new position , which contributes an additional tracking error.
Intuitively, one way to enhance the tracking of the outer precoder under time-varying channels is to estimate the motion of the moving target and compensate for it:
(13) 
where the compensation term is an estimate of the difference .
In the following, we illustrate how to derive the gradient mapping and the compensation term .
IV-B1 Gradient on the Grassmann Manifold
Using calculus on Grassmann manifolds [31, 32], the gradient of in (12) can be derived from , where is the gradient of in the Euclidean space and projects onto the tangent space [32] of on the Grassmann manifold. Moreover, there are no cross terms between and in , and hence the gradient components for different BSs can be computed separately. As a result, the gradient is given by , where is the partial derivative w.r.t. the precoder , ,
(14) 
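A minimal sketch of the gradient iteration (11) on the Grassmann manifold, under the assumption that the per-BS cost takes the trace form tr(F^H A F) with a Hermitian A. Then the Euclidean gradient is 2AF and its tangent-space projection is (I - F F^H) 2AF. Re-orthonormalization uses NumPy's QR factorization, which spans the same subspace as Gram-Schmidt. The matrix A, its spectrum, and the step size 0.25 are hypothetical; with a clear eigengap, the iterate converges to the subspace of the eigenvectors of the S smallest eigenvalues, consistent with Theorem 1 under static parameters.

```python
import numpy as np


def grassmann_gradient(F, A):
    """Tangent-space gradient of f(F) = tr(F^H A F) at an orthonormal F:
    the Euclidean gradient 2 A F projected by (I - F F^H)."""
    euclid = 2.0 * A @ F
    return euclid - F @ (F.conj().T @ euclid)


def gradient_step(F, A, step):
    """One iteration in the spirit of (11): step against the gradient, then
    re-orthonormalize (QR spans the same subspace as Gram-Schmidt)."""
    Q, _ = np.linalg.qr(F - step * grassmann_gradient(F, A))
    return Q


def complex_gaussian(shape, rng):
    """Matrix with i.i.d. complex Gaussian entries (unnormalized)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


rng = np.random.default_rng(5)
M, S = 32, 4
U, _ = np.linalg.qr(complex_gaussian((M, M), rng))
# Hypothetical spectrum with a clear gap after the S smallest eigenvalues
eigs = np.concatenate([[0.0, 0.05, 0.10, 0.15], np.linspace(0.5, 1.0, M - S)])
A = (U * eigs) @ U.conj().T

F, _ = np.linalg.qr(complex_gaussian((M, S), rng))       # random initial subspace
for _ in range(300):
    F = gradient_step(F, A, step=0.25)

target = U[:, :S]          # eigenvectors of the S smallest eigenvalues (Theorem 1)
gap = np.linalg.norm(F @ F.conj().T - target @ target.conj().T, "fro")
print(gap < 1e-6)          # True: the iterate reached the target subspace
```

The comparison is done between projectors, so the arbitrary unitary rotation that QR introduces inside the subspace does not matter.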
IV-B2 Derivation of the Compensation
Consider the parameter in (10) as a discrete-time sample of the continuous-time covariance matrix profile . The optimality condition [33, 34] of problem (10) on the Grassmann manifold is given by:
(15) 
Note that, since the function is nonlinear, we cannot easily solve (15) to get . However, we are only interested in the differential .
Differentiating (15) w.r.t. , we get
(16) 
where is the partial differential of on the covariance matrix profile , and is the partial differential of on on the Grassmann manifold along the direction . Note that as the function in (15) is the gradient of the objective function in (10), represents the Hessian of .
Consider the case where the optimal solution is non-degenerate, i.e., the function in (15) has a unique solution over the neighborhood of . By the implicit function theorem [32], the linear equation (16) has a unique solution . Since the outer precoder obtained from the previous iteration is already a good approximation of , and the objective function is decoupled across the components , we can estimate the differential by , where , , is obtained by solving (17) for ,
(17) 
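The compensation idea in (13) and (15)-(17) can be illustrated on a Euclidean toy problem (a swapped-in example, not the paper's Grassmann setting). For f(x; theta) = 0.5 x^T H x - theta^T x, the optimality condition is H x - theta = 0, and differentiating it gives H dx = d_theta, the analogue of (16). The sketch runs one gradient step per sample while theta drifts linearly, with and without adding the predicted motion H^{-1} d_theta based on the last observed change. The drift, the Hessian H, and the step size are hypothetical.

```python
import numpy as np


def track(H, thetas, step, compensate):
    """Track x*(theta_k) = H^{-1} theta_k, the minimizer of the toy cost
    f(x; theta) = 0.5 x^T H x - theta^T x, with one gradient step per sample.

    With compensate=True, the step is followed by the implicit-function-theorem
    prediction dx = H^{-1} d_theta (from differentiating H x - theta = 0), where
    d_theta is the most recently observed parameter change.
    Returns the tracking error ||x_{k+1} - x*(theta_{k+1})|| after each update.
    """
    x = np.zeros(H.shape[0])
    errors = []
    for k in range(len(thetas) - 1):
        x = x - step * (H @ x - thetas[k])                 # gradient step at theta_k
        if compensate and k >= 1:
            x = x + np.linalg.solve(H, thetas[k] - thetas[k - 1])
        errors.append(np.linalg.norm(x - np.linalg.solve(H, thetas[k + 1])))
    return np.array(errors)


H = np.diag([1.0, 2.0, 4.0])                        # hypothetical Hessian
drift = np.array([0.05, -0.02, 0.03])               # parameter moves linearly in time
thetas = [np.array([1.0, 0.5, -0.3]) + k * drift for k in range(200)]
plain = track(H, thetas, step=0.2, compensate=False)
compensated = track(H, thetas, step=0.2, compensate=True)
print(plain[-1])        # about 0.25: a persistent lag behind the moving optimum
print(compensated[-1])  # essentially zero: the drift is predicted and removed
```

Without compensation, the error settles at a fixed lag (in this toy, (1/step) H^{-2} times the per-sample drift), while the compensated iterate follows the moving optimum with vanishing error, which mirrors, in this simplified setting, the perfect-tracking behavior discussed in Section V.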
IV-B3 Low-Complexity Calculation of the Compensation Term on the Grassmann Manifold
Although the compensation equation (17) is linear in the matrix variable , it is a Sylvester-type equation of the general form , which is computationally expensive to solve directly in high dimensions. However, using the property that is a point on the Grassmann manifold, we can find a low-complexity algorithm to solve the compensation equation (17).
Suppose that are already orthonormalized. Using calculus on the Grassmann manifold [35, 31], the term can be derived as
(18) 
Notice that is linear in (cf. (14)). Multiplying (17) on the right by a unitary matrix , we obtain
(19) 
where diagonalizes , i.e., and . Let . Since is diagonal, equation (19) can be decomposed into parallel linear equations, one for each column of ,
(20) 
where , and are the th columns of and , respectively. The above linear equations can be solved by the conjugate gradient (CG) algorithm with low complexity.
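The diagonalization step in (19)-(20) can be sketched generically. Purely as an assumption (the exact operator in (17) is not reproduced here), take the equation in the form A X + X D = C, with A Hermitian positive definite and D Hermitian positive semidefinite. Writing D = V diag(lam) V^H and Y = X V gives independent systems (A + lam_s I) y_s = (C V)_s, one per column, each solved below with a plain conjugate gradient routine. Running CG only briefly, as discussed in Section IV-C, corresponds to capping `max_iter`.

```python
import numpy as np


def conjugate_gradient(apply_op, b, tol=1e-12, max_iter=None):
    """Solve K x = b for a Hermitian positive-definite K given only x -> K x."""
    if max_iter is None:
        max_iter = 10 * b.size
    x = np.zeros_like(b)
    r = b.copy()                              # residual b - K x at x = 0
    p = r.copy()
    rs_old = np.vdot(r, r).real
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol * b_norm:   # converged
            break
        Kp = apply_op(p)
        alpha = rs_old / np.vdot(p, Kp).real
        x = x + alpha * p
        r = r - alpha * Kp
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


def solve_decoupled(A, D, C, max_iter=None):
    """Solve A X + X D = C (assumed form) by diagonalizing D = V diag(lam) V^H.
    With Y = X V, column s satisfies (A + lam_s I) y_s = (C V)_s."""
    lam, V = np.linalg.eigh(D)
    CV = C @ V
    Y = np.empty_like(CV)
    for s in range(D.shape[0]):
        Y[:, s] = conjugate_gradient(lambda y, s=s: A @ y + lam[s] * y, CV[:, s],
                                     max_iter=max_iter)
    return Y @ V.conj().T


rng = np.random.default_rng(6)
M, S = 20, 3
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
A = B @ B.conj().T / M + np.eye(M)            # Hermitian positive definite
G = rng.standard_normal((S, S)) + 1j * rng.standard_normal((S, S))
D = G @ G.conj().T / S                        # Hermitian positive semidefinite
C = rng.standard_normal((M, S)) + 1j * rng.standard_normal((M, S))
X = solve_decoupled(A, D, C)
print(np.linalg.norm(A @ X + X @ D - C))      # tiny residual
```

Only a small S x S eigendecomposition is needed, and each large system is touched only through matrix-vector products, which is where the complexity saving comes from.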
IV-C Complexity and Implementation Considerations
IV-C1 Computational Complexity
During each superframe , each MS in the th cell estimates the interference covariance matrix . Each BS updates according to (9) and computes the new outer precoder according to the following steps:
Step 1: Compute the compensation estimator

( op.) Let , where is given in (14).

( op.) Find the eigen factorization for the dimension reduced matrix , where .

( op.) Compute the coefficient matrix .

( op.) Solve the equation for using the CG algorithm with step,

Update the compensation .
Step 2: ( op.) Compute the search direction , for .
Step 3: Update , where is the step size and denotes the Gram-Schmidt procedure for the orthonormalization of .
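Step 3 relies on Gram-Schmidt orthonormalization. Below is a modified Gram-Schmidt sketch; the update `F + 0.1 * direction` is only a hypothetical stand-in for the step-size-scaled combination in Step 3, whose exact expression is not reproduced here. The modified variant is used because it is numerically better behaved than classical Gram-Schmidt, although both return a basis of the same subspace in exact arithmetic.

```python
import numpy as np


def gram_schmidt(X):
    """Modified Gram-Schmidt: return orthonormal columns spanning the same
    subspace as the full-column-rank matrix X (the orthonormalization in Step 3)."""
    Q = np.array(X, dtype=complex)              # work on a complex copy
    for j in range(Q.shape[1]):
        for i in range(j):
            # Subtract the component of the (partially updated) column j along q_i
            Q[:, j] -= np.vdot(Q[:, i], Q[:, j]) * Q[:, i]
        Q[:, j] /= np.linalg.norm(Q[:, j])
    return Q


rng = np.random.default_rng(7)
M, S = 12, 4
F = gram_schmidt(rng.standard_normal((M, S)) + 1j * rng.standard_normal((M, S)))
direction = rng.standard_normal((M, S)) + 1j * rng.standard_normal((M, S))  # hypothetical
F_next = gram_schmidt(F + 0.1 * direction)      # stand-in for the Step 3 update
print(np.allclose(F_next.conj().T @ F_next, np.eye(S)))  # True
```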
The computational complexity of the proposed compensation algorithm (13) is dominated by the gradient term and the compensation term in (13). The gradient term requires (omitting lower-order terms) arithmetic operations (additions, multiplications, etc.). The compensation term requires solving the linear equations in (20) with the CG algorithm. Note that, since the CG algorithm converges fast and the norm of is usually small (because is small owing to the slowly time-varying covariance matrix ), computing only step (requiring operations) to obtain is sufficient to yield a good compensation . Therefore, the proposed compensation algorithm (summarized in Algorithm 1) has a total computational complexity of around operations. As discussed in Section III-B, we usually have for massive MIMO channels, and therefore the complexity of the proposed algorithm is substantially lower than that of brute-force computation of Theorem 1 using SVD [36].
IV-C2 Implementation Considerations
Fig. 3 gives a diagram of the associated signaling for two-tier precoding. In stage (a), each BS broadcasts channel training sequences using the outer precoder at each subframe. In stage (b), each MS feeds back the low-dimensional equivalent channel at each subframe and the full-dimension interference covariance matrices only at the end of each superframe. In stage (c), the BSs exchange the covariance matrix profile for each MS cluster through the backhaul only at the end of each superframe. As a result, the pilot symbols for channel estimation, the CSI feedback overhead, and the signaling over the backhaul are greatly reduced in the massive MIMO system.
V Convergence Analysis of the Outer Precoding Algorithm
In this section, we analyze the tracking performance of the proposed iterative outer precoder tracking algorithm under time-varying channels. We are interested in whether algorithm (13) converges to the global optimal solution of problem (6). However, since problem (6) is non-convex and the algorithm has multiple stationary points, existing techniques [37, 38] for convergence analysis under time-varying channels cannot be applied. In general, we shall address the following challenge:
Challenge 2: Analyze the convergence behavior of the outer precoder tracking algorithm under the time-varying massive MIMO channel, despite the optimization problem being non-convex.
To this end, we extend the analysis framework in [37, 38] and characterize the tracking performance of the algorithm by analyzing an equivalent continuous-time virtual dynamic system (VDS), which models the behavior of the algorithm iterations. Please refer to Appendix C for details.
V-A Convergence under Static Channel Covariance
When the channel covariance matrices are static, the compensation term in iteration (13) is always zero. Hence, the compensation algorithm (13) degenerates to a pure gradient descent algorithm (11). To establish the convergence results, we first derive the following uniqueness property for the algorithm.
Lemma 1 (Uniqueness of Global Optimal Point)
Suppose under a given , the covariance matrix has distinct th and th smallest eigenvalues for each . Then there is only one global optimal stationary point for the iteration (13). ∎
Please refer to Appendix C for the proof.
Based on Lemma 1, we shall establish the global convergence result below.
Theorem 3
(Global Convergence under Static Channel Covariance) There exists , such that under the distinct eigenvalue condition in Lemma 1 and choosing step size , the proposed algorithm converges to the global optimal solution . ∎
Please refer to Appendix C for the proof.
Theorem 3 shows that, although the original outer precoding problem (6) is non-convex, the proposed algorithm is guaranteed to converge to the global optimal solution under static channel covariance.
Remark 4 (Global Convergence of a Non-Convex Problem)
As pointed out in Section IV, problem (6) is non-convex. Yet, after the reformulation, the new problem (10) on the manifold has the following structure: there is only one attractive KKT point, the global optimum, while all the other KKT points are repulsive, as illustrated in Fig. 4. As a result, the trajectory of the iterative algorithm converges to the global optimum almost surely.
V-B Convergence under Time-Varying Channel Covariance
We now study the case of time-varying channel covariance. Under time-varying channels, the global optimal solution is also time-varying on a similar timescale as the algorithm iteration (13), and hence it is not clear whether the iterate can converge to .
To analyze the tracking performance of the outer precoder iterations in (13), we approximate the discretetime iterations with the following continuous time iterations^{6}^{6}6The iteration of (13) is a discretization of the compensated virtual dynamic system at (for example, by replacing in (21) with in (13)). , which is defined as the solution of the following differential equations:
(21)  
(22) 
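The link between a discrete iteration and a continuous-time system can be illustrated on a simplified case. With step size γ, one gradient iteration behaves like a forward-Euler step of the gradient flow dU/dt = -grad f(U), and iterate k approximates the flow at time t = kγ. Equations (21) and (22) are not reproduced in this excerpt, so the sketch below ignores the compensation term. It uses the hypothetical static objective f(U) = tr(U^T R U), and a very small step serves as a proxy for the exact flow.

```python
import numpy as np

def riemannian_grad(R, U):
    G = 2.0 * R @ U
    return G - U @ (U.T @ G)

def retract(Y):
    Q, _ = np.linalg.qr(Y)
    return Q

def proj_dist(U1, U2):
    return np.linalg.norm(U1 @ U1.T - U2 @ U2.T)

def discrete_iterates(R, U0, step, horizon):
    # horizon / step iterations: the last iterate approximates the flow at t = horizon
    U = U0
    for _ in range(int(round(horizon / step))):
        U = retract(U - step * riemannian_grad(R, U))
    return U

rng = np.random.default_rng(3)
n, d = 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
R = Q @ np.diag([0.1, 0.3, 1.5, 2.0, 2.5, 3.0]) @ Q.T
U0 = retract(rng.standard_normal((n, d)))

T = 1.0
U_ref = discrete_iterates(R, U0, 1e-4, T)      # fine step: proxy for the continuous-time flow
err_coarse = proj_dist(discrete_iterates(R, U0, 0.05, T), U_ref)
err_fine = proj_dist(discrete_iterates(R, U0, 0.01, T), U_ref)
print(err_coarse, err_fine)
```

The discretization error shrinks roughly in proportion to the step size. This is why conclusions drawn from the continuous-time trajectory carry over to the discrete iteration when the step is small.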
We evaluate the tracking behavior of the outer precoder iteration in the following.
Theorem 4
(Convergence of the Outer Precoder Iteration in Time-Varying Channels) Assume the distinct eigenvalue condition in Lemma 1. In addition, suppose the largest eigenvalue of is bounded w.p.1 for all . Then there exists , such that for , we have , w.p.1, as . ∎
Please refer to Appendix D for the proof.
Theorem 4 implies that perfect tracking of the outer precoder in time-varying channels is possible if the initial iterate is sufficiently close to the global optimal point.
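The benefit of compensating for the drift of the optimum can be seen in a toy tracking experiment. The paper's compensation term in (13) is not shown in this excerpt, so the sketch substitutes a predictor of our own. At each step, it solves a small Sylvester equation for the first-order shift of the optimal subspace caused by the covariance change, and then takes one gradient step. We also assume the stand-in objective tr(U^T R U) and a covariance that rotates at constant speed, R(t) = Q(t) Λ Q(t)^T. The sketch compares the steady-state tracking error with and without this predictor.

```python
import numpy as np

def riemannian_grad(R, U):
    G = 2.0 * R @ U
    return G - U @ (U.T @ G)

def retract(Y):
    Q, _ = np.linalg.qr(Y)
    return Q

def proj_dist(U1, U2):
    return np.linalg.norm(U1 @ U1.T - U2 @ U2.T)

def predictor(R_prev, R_curr, U):
    """Hypothetical compensation: first-order shift of the optimal subspace
    when R_prev changes to R_curr (solves A X - X B = C)."""
    n, d = U.shape
    Q_full, _ = np.linalg.qr(U, mode="complete")
    U_perp = Q_full[:, d:]                     # orthonormal complement of span(U)
    A = U_perp.T @ R_prev @ U_perp
    B = U.T @ R_prev @ U
    C = -U_perp.T @ (R_curr - R_prev) @ U
    # Column-major vectorization: vec(A X - X B) = (I kron A - B^T kron I) vec(X)
    M = np.kron(np.eye(d), A) - np.kron(B.T, np.eye(n - d))
    X = np.linalg.solve(M, C.flatten(order="F")).reshape((n - d, d), order="F")
    return U_perp @ X

rng = np.random.default_rng(4)
n, d, step, omega, T = 6, 2, 0.05, 0.005, 400
Lam = np.diag([0.1, 0.3, 1.5, 2.0, 2.5, 3.0])
G = rng.standard_normal((n, n))
S = (G - G.T) / 2
S /= np.linalg.norm(S, 2)                      # unit-norm skew-symmetric generator
I = np.eye(n)
P = np.linalg.solve(I - 0.5 * omega * S, I + 0.5 * omega * S)  # Cayley rotation per step

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
R_prev = Q @ Lam @ Q.T
U_plain, U_comp = Q[:, :d].copy(), Q[:, :d].copy()
err_plain, err_comp = [], []
for _ in range(T):
    Q = P @ Q
    R = Q @ Lam @ Q.T                          # time-varying covariance
    U_plain = retract(U_plain - step * riemannian_grad(R, U_plain))
    U_pred = retract(U_comp + predictor(R_prev, R, U_comp))
    U_comp = retract(U_pred - step * riemannian_grad(R, U_pred))
    R_prev = R
    err_plain.append(proj_dist(U_plain, Q[:, :d]))   # distance to current optimum
    err_comp.append(proj_dist(U_comp, Q[:, :d]))

mean_plain = np.mean(err_plain[T // 2:])
mean_comp = np.mean(err_comp[T // 2:])
print(mean_plain, mean_comp)
```

The uncompensated iterate settles at a lag proportional to the drift speed divided by the contraction rate of a gradient step. The predictor removes the first-order part of that lag, which qualitatively matches the motivation for the compensation term.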
VI Numerical Results
We consider a cellular network with cells, where each cell has 2 clusters and each cluster has users. Fig. 5 illustrates a realization of the network topology. The inter-site distance is m. The large-scale propagation follows the outdoor evaluation methodology of the LTE standard [39] with path-loss exponent 2.6. Each BS is equipped with antennas and each user is equipped with antenna. We generate the massive MIMO channels according to (3), where the transmit correlation matrices are specified by (2) and the AS parameter is modeled as deg. Moreover, the small-timescale channel variation in (3) is modeled by the widely used autoregressive (AR) model [40] given by , where is a standard complex Gaussian matrix, is the temporal correlation coefficient, is the zeroth-order Bessel function of the first kind, is the maximum Doppler frequency, and is the subframe duration. The length of the superframe is . The noise power is normalized to the smallest direct-link power gain.
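A minimal sketch of the small-timescale variation follows. It assumes the standard first-order Gauss-Markov form H(t+1) = ρ H(t) + sqrt(1 - ρ^2) W(t) with ρ = J_0(2π f_d τ), which is consistent with the quantities listed above, although the exact expression is not reproduced in this excerpt. The 2 GHz carrier frequency and the 1 ms subframe duration are our assumptions. J_0 is evaluated from its integral representation so that only NumPy is needed.

```python
import numpy as np

def bessel_j0(x, num=4096):
    """J_0(x) = (1/pi) * integral_0^pi cos(x sin(theta)) dtheta, via the midpoint
    rule (spectrally accurate because the integrand is periodic on [0, pi])."""
    theta = (np.arange(num) + 0.5) * np.pi / num
    return float(np.mean(np.cos(x * np.sin(theta))))

def temporal_correlation(speed_kmh, carrier_hz, subframe_s):
    # Jakes' model: rho = J_0(2 pi f_d tau), with maximum Doppler f_d = v f_c / c
    f_d = (speed_kmh / 3.6) * carrier_hz / 3e8
    return bessel_j0(2 * np.pi * f_d * subframe_s)

def ar1_step(H, rho, rng):
    """One AR(1) update H(t+1) = rho H(t) + sqrt(1 - rho^2) W(t), W(t) i.i.d. CN(0, 1)."""
    W = (rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape)) / np.sqrt(2)
    return rho * H + np.sqrt(1.0 - rho ** 2) * W

rho_10kmh = temporal_correlation(10.0, 2e9, 1e-3)   # assumed 2 GHz carrier, 1 ms subframe

# Empirical check with rho = 0.5: lag-1 correlation ~ rho, average power ~ 1.
rng = np.random.default_rng(5)
H = (rng.standard_normal((8, 64)) + 1j * rng.standard_normal((8, 64))) / np.sqrt(2)
lag1, power = [], []
for _ in range(200):
    H_next = ar1_step(H, 0.5, rng)
    lag1.append(np.mean(H_next * H.conj()).real)
    power.append(np.mean(np.abs(H_next) ** 2))
    H = H_next
print(rho_10kmh, np.mean(lag1), np.mean(power))
```

The sqrt(1 - ρ^2) scaling keeps the channel power stationary, so ρ alone controls how quickly the instantaneous CSI ages.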
We consider the following baselines:
Baseline 1 (One-tier coordinated MIMO using ZF [23]): In each subframe, full CSI is used to compute the precoder, which zero-forces both the inter-cell and intra-cell interference.
Baseline 2 (Two-tier precoding using the BD algorithm in [8]): The two-tier precoding strategy in [8] is applied, where the outer precoder is computed by the BD algorithm in [8].
Baseline 3 (Two-tier precoding with the conventional gradient algorithm for the outer precoder [13]): The two-tier precoding strategy (equations (6), (7) and (8)) is applied, where the outer precoders given in Theorem 1 are computed iteratively using the gradient algorithm in [13].
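The two precoding structures behind the baselines can be contrasted in a few lines. The sketch shows a one-tier ZF precoder on the stacked channel of all users, in the spirit of Baseline 1. It also shows a BD-style two-tier precoder, whose outer precoder lies in the null space of the other cluster's interference subspace and is followed by ZF on the reduced effective channel. As an assumption, the other cluster's instantaneous channel stands in for that interference subspace. In a two-tier design, this subspace would come from long-term covariance information. The dimensions are illustrative only.

```python
import numpy as np

def zf_precoder(H):
    """Zero-forcing precoder W = H^H (H H^H)^{-1}, so that H W = I
    (H: K x M with K <= M and full row rank)."""
    return H.conj().T @ np.linalg.inv(H @ H.conj().T)

def null_space_basis(H_other, tol=1e-10):
    """Orthonormal basis of the null space of H_other (BD-style outer precoder):
    right singular vectors whose singular values are numerically zero."""
    _, s, Vh = np.linalg.svd(H_other)
    rank = int(np.sum(s > tol * s[0]))
    return Vh[rank:].conj().T

rng = np.random.default_rng(7)

def cn(*shape):
    # i.i.d. CN(0, 1) entries
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

M = 8                                   # BS antennas (illustrative)
H1, H2 = cn(2, M), cn(2, M)             # users of cluster 1 and cluster 2

# One-tier ZF on the stacked channel: cancels all inter-user interference at once.
H_all = np.vstack([H1, H2])
W_all = zf_precoder(H_all)

# Two-tier (BD-style) for cluster 1: outer precoder F1 nulls cluster 2,
# inner ZF acts on the lower-dimensional effective channel H1 F1.
F1 = null_space_basis(H2)               # M x (M - 2)
W1 = F1 @ zf_precoder(H1 @ F1)
```

The inner precoder only needs the effective channel H1 F1, whose dimension is smaller than M. This reduction is what lowers the instantaneous CSI requirement of the two-tier schemes.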
Note that Baseline 1 faces the implementation challenges of massive MIMO systems discussed in Section I, such as huge pilot and feedback overhead and real-time global CSI sharing. Hence, it serves only as a performance benchmark.
VI-A Throughput Performance
Fig. 6 shows the per-cell throughput versus the per-BS transmit power at an MS speed of 10 km/h. The one-tier cooperative ZF scheme (Baseline 1) achieves the highest data rate when there is no signaling latency for the BSs to exchange global CSI over the backhaul. However, Baseline 1 is very sensitive to signaling latency, and its performance degrades significantly when a 5 ms backhaul latency is considered⁷. On the other hand, the two-tier precoding schemes (Baseline 2, Baseline 3 and the proposed scheme) are robust to signaling latency, as they do not require instantaneous global CSI. The proposed scheme achieves slightly better performance than Baseline 2, but with substantially lower complexity (Table II).
⁷ As a benchmark, the X2 interface between eNodeBs in LTE systems usually induces 10-20 ms latency [39].
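The sensitivity of Baseline 1 to backhaul latency can be illustrated by designing a ZF precoder on CSI that is outdated by the backhaul delay. The sketch makes several assumptions. The outdated and current channels are related by the AR(1) form ρ H_old + sqrt(1 - ρ^2) W with ρ = J_0(2π f_d D) for delay D. The carrier frequency is assumed to be 2 GHz. The scenario is a single random realization with 4 users and 16 antennas, and the same innovation W is reused across delays so that the values are directly comparable. The residual inter-user interference then grows with the delay.

```python
import numpy as np

def bessel_j0(x, num=4096):
    # J_0 via its integral representation, midpoint rule on [0, pi]
    theta = (np.arange(num) + 0.5) * np.pi / num
    return float(np.mean(np.cos(x * np.sin(theta))))

def zf_precoder(H):
    return H.conj().T @ np.linalg.inv(H @ H.conj().T)

speed_kmh, carrier_hz = 10.0, 2e9              # carrier frequency is an assumption
f_d = (speed_kmh / 3.6) * carrier_hz / 3e8     # maximum Doppler frequency (Hz)

rng = np.random.default_rng(6)
K, M = 4, 16

def cn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H_old = cn(K, M)            # CSI available when the precoder is computed
W_innov = cn(K, M)          # innovation accumulated over the backhaul delay
F = zf_precoder(H_old)

leakage = {}
for delay_ms in [0, 5, 10, 20]:
    rho = bessel_j0(2 * np.pi * f_d * delay_ms * 1e-3)
    H_now = rho * H_old + np.sqrt(1.0 - rho ** 2) * W_innov
    E = H_now @ F
    off = E - np.diag(np.diag(E))              # inter-user interference terms
    leakage[delay_ms] = float(np.sum(np.abs(off) ** 2))
print(leakage)
```

Because H_old F = I, the interference terms scale with (1 - ρ^2). In this example, ρ falls quickly over delays in the 10-20 ms range quoted for the X2 interface.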
Fig. 7 shows the per-cell throughput versus the MS speed under a per-BS transmit power budget of dB. Similarly, the proposed scheme with compensation achieves good performance with substantially lower complexity. In addition, it significantly outperforms Baseline 3 at high MS speed, which confirms the superior tracking capability of the proposed compensation algorithm under time-varying channels. For comparison, the throughput of Baseline 1 drops quickly as the MS speed increases under ms backhaul latency.
VI-B Feedback Loading and Complexity
Table I shows numerical examples of the CSI feedback amount and signaling loading, in terms of the number of complex numbers per cell per subframe, following the discussion in Section IV-C2. We assume that each cell only needs to exchange CSI with its neighboring cells. Baseline 1 incurs a high feedback cost and signaling loading, whereas the proposed scheme significantly lowers the CSI feedback and signaling overhead among BSs in massive MIMO systems.
Feedback amount  Signaling loading  
BL 1  BL 2, 3 & Prop  BL 1  BL 2, 3 & Prop  