# Two-Tier Precoding for FDD Multi-cell Massive MIMO Time-Varying Interference Networks

Junting Chen and Vincent K. N. Lau. This paper was accepted in the IEEE Journal on Selected Areas in Communications, special issue on 5G wireless communication systems. The authors are with the Department of Electronic and Computer Engineering (ECE), The Hong Kong University of Science and Technology (HKUST), Hong Kong (e-mail: {eejtchen, eeknlau}@ust.hk).
###### Abstract

Massive MIMO is a promising technology for future wireless communication networks. However, it raises many implementation challenges, for example, huge pilot-symbol and feedback overheads, the requirement of real-time global CSI, a large number of RF chains, and high computational complexity. We consider a two-tier precoding strategy for multi-cell massive MIMO interference networks, with an outer precoder for inter-cell/inter-cluster interference cancellation and an inner precoder for intra-cell multiplexing. In particular, to combat the computational complexity of the outer precoding, we propose a low complexity online iterative algorithm to track the outer precoder under time-varying channels. We formulate the outer precoding problem as an optimization on the Grassmann manifold and develop a low complexity iterative algorithm that converges to the global optimal solution under static channels. For time-varying channels, we propose a compensation technique to offset the variation of the time-varying optimal solution. We show theoretically that, under some mild conditions, the proposed algorithm can perfectly track the target outer precoder. Numerical results demonstrate that two-tier precoding with the proposed iterative compensation algorithm achieves good performance with a significant complexity reduction compared with conventional two-tier precoding techniques in the literature.

###### Keywords

Massive MIMO, Two-tier Precoding, Tracking Algorithm, Optimization, Grassmann Manifold

## I Introduction

Massive MIMO is a promising technology to meet the future capacity demand of wireless cellular networks. Equipped with a large number of antennas, the system has enough degrees of freedom (DoF) to exploit the spatial multiplexing gain for intra-cell users and to mitigate inter-cell interference. However, the corresponding beamforming (precoder) design for such multiuser MIMO (MU-MIMO) interference networks is challenging even in traditional MIMO systems with a small number of antennas. In [1], inter-cell interference is mitigated by coherently coordinated transmission (CCT) from multiple base stations (BSs) to each user, using commonly shared global channel state information (CSI). In [2], the beamformers are jointly optimized among BSs, where uplink-downlink duality is used to obtain the global CSI in a time-division duplex (TDD) system. Using alternating optimization, the WMMSE algorithm was proposed in [3] to maximize the weighted sum rate of multi-cell systems. Moreover, interference alignment (IA) approaches were used in [4, 5] for downlink cellular interference networks.

We consider the beamforming design for frequency-division duplex (FDD) massive MIMO systems with a large number of antennas $M$ at each BS. (FDD is expected to remain a major duplexing technique in the near future, especially for macro-coverage applications.) Unlike conventional multi-cell MU-MIMO networks, where the schemes in [1, 2, 3, 4, 5] may be readily implemented, FDD massive MIMO systems raise several practical issues: (i) huge pilot-symbol and feedback overheads, (ii) a large number of RF chains, (iii) real-time global CSI sharing, and (iv) high computational complexity of the precoders at the BSs. For instance, the number of independent pilot symbols required at the mobile for transmit-side CSI (CSIT) estimation scales as $O(M)$, and so does the CSIT feedback overhead. In addition, as $M$ scales up, so does the number of RF chains, which incurs high fabrication cost and power consumption. Although dynamic antenna switching techniques [6, 7] may reduce the required number of RF chains, those solutions do not fully exploit the benefits of the extra antennas. Moreover, the signaling latency over the backhaul makes it very difficult to acquire global real-time CSIT for precoding. Finally, the computational complexity of precoding algorithms grows quickly with $M$, so low complexity precoding algorithms are needed for massive MIMO systems.

In this paper, we address the above difficulties by considering two-tier precoding with subspace alignment. This is motivated by the clustering behavior of user terminals. As illustrated in Fig. 1, users in the same cluster may share the same scattering environment and hence have similar spatial channel correlations, whereas users in different clusters may have different spatial channel correlations. Therefore, we can decompose the MIMO precoder at the BS into an outer precoder and an inner precoder. The outer precoder mitigates inter-cell and inter-cluster interference based on the statistical spatial channel covariance. Since the spatial correlations vary slowly, the outer precoder can be computed on a slower timescale. The inner precoder, on the other hand, performs spatial multiplexing of intra-cluster users on the dimension-reduced subspace spanned by the outer precoder. As a result, the inner precoders adapt to the local real-time CSIT at the BS and are computed on a faster timescale. Using the proposed two-tier precoding structure, we show in Section III-B that the aforementioned technical issues (i)-(iii) associated with a large number of antennas can be substantially alleviated.

In [8, 9], a zero-forcing based two-tier precoding scheme was proposed for single-cell massive MIMO systems, where the outer precoders are computed using a block diagonalization (BD) algorithm. However, computing the outer precoder in this way is highly complex, and the tracking of the outer precoder under time-varying channels was not addressed. In fact, computational complexity is a serious concern in massive MIMO systems as the number of antennas becomes very large. For example, the BD algorithm in [8, 9] applies a series of matrix manipulations, including SVDs, to a number of channel covariance matrices each time the outer precoder is updated, and the associated complexity is $O(M^3)$ for $M$ BS antennas. In addition, deriving a low complexity iterative algorithm for the BD solution in [8, 9] is far from trivial. To address the complexity issue, we consider online tracking solutions that exploit the temporal correlation of the channel matrices. There is a body of literature on iterative subspace tracking algorithms, for example, gradient-based algorithms [10, 11, 12, 13], power-iteration-based algorithms [14], and algorithms based on Krylov subspace approximations [15, 16]. Moreover, an iterative subspace tracking precoder design for MIMO cellular networks was proposed in [17]. However, these algorithms do not fully exploit the temporal correlation of the channels to enhance tracking. In this paper, we propose a compensated subspace tracking algorithm for the online computation of the outer precoder. The algorithm is derived by solving an optimization problem formulated on the Grassmann manifold, and its tracking capability is enhanced by a compensation term that estimates and offsets the motion of the target signal subspace. Using a control-theoretic approach, we also characterize the tracking performance of the online outer precoding algorithm in time-varying massive MIMO systems.
We show that, under mild technical conditions, perfect tracking (with zero convergence error) of the target outer precoder using the proposed compensation algorithm is possible, even though the channel covariance matrix is time-varying. Finally, numerical results demonstrate that the proposed two-tier precoding algorithm achieves good system performance with low signaling overhead and low computational complexity.

The rest of the paper is organized as follows. Section II introduces the massive MIMO channel model and the signal model. Section III presents the two-tier precoding technique. Section IV derives the iterative algorithm for tracking the outer precoder, and Section V gives the associated convergence analysis. Numerical results are given in Section VI, and Section VII concludes the paper.

Notations: We use lower case bold font to denote vectors and upper case bold font for matrices. $\mathbf{I}$ denotes the identity matrix. For matrices $\mathbf{A}$ and $\mathbf{B}$, $[\mathbf{A}, \mathbf{B}]$ denotes the concatenated matrix whose first columns are given by $\mathbf{A}$ and whose last columns are given by $\mathbf{B}$.

## II System Model

### II-A Massive MIMO Channel Model with Local Spatial Scattering

We consider a cellular network with $G$ BSs, where the $b$-th ($b = 1, \dots, G$) BS serves $K_b$ MSs. The MSs are clustered together, and without loss of generality, we assume each BS serves one cluster of MSs (the extension to multiple clusters per cell is straightforward, so we focus on the single-cluster case to simplify notation). Each BS has $M$ antennas and each user has $N$ antennas. The downlink channel from the $l$-th BS to the $k$-th MS in the $b$-th cell is denoted by $\mathbf{H}^{[l]}_{b,k} \in \mathbb{C}^{N\times M}$. The received signal at the $k$-th MS in cell $b$ is given by

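$$\mathbf{y}_{b,k} = \mathbf{H}^{[b]}_{b,k}\mathbf{x}^{[b]} + \sum_{l=1,\, l\neq b}^{G}\mathbf{H}^{[l]}_{b,k}\mathbf{x}^{[l]} + \mathbf{n}_{b,k}$$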

where $\mathbf{x}^{[b]}$ is the signal transmitted by BS $b$, $d_b$ is the number of data streams transmitted by BS $b$, and $\mathbf{n}_{b,k}$ is the additive complex Gaussian noise.

In a massive MIMO system, where the BS has a large number of antennas and is placed on top of a building, there is usually little local scattering around the BS. Consistently, channel measurements have shown that most of the signal energy is concentrated in a narrow range of azimuth directions [18]. Therefore, we adopt the one-ring local scattering model [19, 20] to characterize the massive MIMO channel. As illustrated in Fig. 1, the local scattering around the MS is modeled by a ring of a given radius, whereas the signal transmitted from the BS has a narrow angular spread (AS), which is determined by the ring radius and the distance between the BS and the MS. Let $\theta^{[l]}_{b,k}$ be the angle of departure (AoD) of a path from BS $l$ to MS $k$ in cell $b$. We use the von Mises model to characterize the power azimuth spectrum (PAS) w.r.t. $\theta^{[l]}_{b,k}$ [21, 19] as follows:

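$$P^{[l]}_{\theta,(b,k)}\big(\theta^{[l]}_{b,k}\big) = \frac{\exp\big[\kappa^{[l]}_{b,k}\cos\big(\theta^{[l]}_{b,k} - \bar{\theta}^{[l]}_{b,k}\big)\big]}{2\pi I_0\big(\kappa^{[l]}_{b,k}\big)} \tag{1}$$

where $\bar{\theta}^{[l]}_{b,k}$ is the mean AoD, $I_0(\cdot)$ is the zeroth-order modified Bessel function, and $\kappa^{[l]}_{b,k}$ characterizes the AS at BS $l$ in the direction of MS $k$ in cell $b$ (a larger $\kappa^{[l]}_{b,k}$ corresponds to a smaller AS). Denote $\mathbf{T}^{[l]}_{b,k}$ as the corresponding transmit spatial correlation matrix at BS $l$. The $(p,q)$-th entry of $\mathbf{T}^{[l]}_{b,k}$, which describes the spatial correlation between the $p$-th and $q$-th antenna elements at BS $l$ [22], is defined as: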

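$$\big[\mathbf{T}^{[l]}_{b,k}\big]_{(p,q)} = \int_{-\pi}^{\pi} e^{j[\phi^{[l]}_p(\theta) - \phi^{[l]}_q(\theta)]}\, P^{[l]}_{\theta,(b,k)}(\theta)\, d\theta \tag{2}$$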

where $\phi^{[l]}_p(\theta) - \phi^{[l]}_q(\theta)$ accounts for the phase difference between the $p$-th and $q$-th antenna elements of BS $l$ for a path with azimuth angle $\theta$.
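To make (1) and (2) concrete, the following NumPy sketch evaluates the von Mises PAS and the resulting correlation matrix numerically. It assumes a uniform linear array with phase $\phi_p(\theta) = 2\pi D p \sin\theta$ (element spacing $D$ in wavelengths); this array geometry, the function names, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def von_mises_pas(theta, kappa, theta_bar):
    """PAS of (1): exp(kappa*cos(theta - theta_bar)) / (2*pi*I0(kappa))."""
    return np.exp(kappa * np.cos(theta - theta_bar)) / (2 * np.pi * np.i0(kappa))

def spatial_correlation(M, kappa, theta_bar, spacing=0.5, n_grid=8192):
    """Numerical evaluation of (2) for a hypothetical ULA.

    Assumes phi_p(theta) = 2*pi*spacing*p*sin(theta), with spacing in wavelengths.
    """
    theta = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    # Rectangle rule on a periodic grid (very accurate for smooth periodic integrands)
    weights = von_mises_pas(theta, kappa, theta_bar) * (2 * np.pi / n_grid)
    phase = 2 * np.pi * spacing * np.outer(np.arange(M), np.sin(theta))  # M x n_grid
    A = np.exp(1j * phase)
    # T[p, q] = sum_n w_n exp(j(phi_p - phi_q)) = (A diag(w) A^H)[p, q]
    return (A * weights) @ A.conj().T

M = 32
T = spatial_correlation(M, kappa=200.0, theta_bar=0.3)
eigvals = np.linalg.eigvalsh(T)
eff_rank = int(np.sum(eigvals > 1e-3 * eigvals.max()))
print("effective rank of T:", eff_rank, "out of", M)
```

Writing $\mathbf{T}$ as $\mathbf{A}\,\mathrm{diag}(\mathbf{w})\,\mathbf{A}^\dagger$ keeps it Hermitian and positive semidefinite by construction. A large $\kappa$ (narrow AS) should give an effective rank well below $M$, which is the low-rank structure that the two-tier precoder exploits.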

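We assume that MSs within the same cluster have the same channel statistical parameters $\bar{\theta}^{[l]}_{b,k}$ and $\kappa^{[l]}_{b,k}$, i.e., $\bar{\theta}^{[l]}_{b,k} = \bar{\theta}^{[l]}_{b}$ and $\kappa^{[l]}_{b,k} = \kappa^{[l]}_{b}$, $\forall k$. As a result, the transmit correlation matrices satisfy $\mathbf{T}^{[l]}_{b,k} = \mathbf{T}^{[l]}_{b}$ for all MSs $k$ in the scattering cluster of cell $b$. We adopt the following two-timescale, clustered, and spatially correlated massive MIMO channel model.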

###### Assumption 1

(Two-timescale, Clustered and Spatially Correlated Channel Model) The time-varying massive MIMO channel in subframe $j$ is given by

$$\mathbf{H}^{[l]}_{b,k}(j) = \mathbf{H}^{\omega}_{k}(j)\,\mathbf{T}^{[l]}_{b}(j)^{1/2} \tag{3}$$

where $\mathbf{H}^{\omega}_{k}(j)$ and $\mathbf{T}^{[l]}_{b}(j)$ change on different timescales:

• Small Timescale: $\mathbf{H}^{\omega}_{k}(j)$ is independent and identically distributed (i.i.d.) over MSs $k$ and time-varying over subframes $j$. Each element of $\mathbf{H}^{\omega}_{k}(j)$ follows an independent complex Gaussian distribution with zero mean and unit variance.

• Large Timescale: The spatial correlation matrix $\mathbf{T}^{[l]}_{b}(j)$ is constant within each super-frame (i.e., a block of subframes) but changes between consecutive super-frames.
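A minimal NumPy sketch of Assumption 1 follows: a fixed correlation matrix (standing in for one super-frame) is combined with a fresh i.i.d. $\mathcal{CN}(0,1)$ matrix $\mathbf{H}^{\omega}_k(j)$ in every subframe. The helper names and the random low-rank $\mathbf{T}$ are hypothetical. The sketch also checks numerically that $\mathbb{E}[\mathbf{H}^\dagger\mathbf{H}] = N\mathbf{T}$ for $N$ receive antennas, the identity that turns the expectations used later into traces involving $\mathbf{T}$.

```python
import numpy as np

def psd_sqrt(T):
    """Hermitian square root T^{1/2} of a PSD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(T)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def draw_channel(T_sqrt, N, rng):
    """One realization of (3): H = H_w T^{1/2}, H_w with i.i.d. CN(0, 1) entries."""
    M = T_sqrt.shape[0]
    Hw = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    return Hw @ T_sqrt

rng = np.random.default_rng(1)
M, N = 8, 2
# Hypothetical rank-3 correlation matrix, held fixed for one super-frame
A = rng.standard_normal((M, 3)) + 1j * rng.standard_normal((M, 3))
T = A @ A.conj().T
T = T * (M / np.trace(T).real)          # normalize so that tr(T) = M
T_sqrt = psd_sqrt(T)

# Small timescale: a fresh H_w in every subframe, same T
n_subframes = 20000
acc = np.zeros((M, M), dtype=complex)
for _ in range(n_subframes):
    H = draw_channel(T_sqrt, N, rng)
    acc += H.conj().T @ H
rel_err = np.linalg.norm(acc / n_subframes - N * T) / np.linalg.norm(N * T)
print(f"relative error of the sample mean of H^H H vs N*T: {rel_err:.3f}")
```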

### II-B Signal Model, Interference Mitigation and Challenges

Denote the precoding matrix for user $k$ in cell $b$ as $\mathbf{V}^{[b]}_k$ and the associated receiver shaping matrix as $\mathbf{U}_{b,k}$, where the number of columns of each matrix equals the number of data streams of that user. Applying the receiver shaping matrix to the signal at MS $k$ in cell $b$, the received signal is given by

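$$\hat{\mathbf{y}}_{b,k} = \mathbf{U}^\dagger_{b,k}\mathbf{H}^{[b]}_{b,k}\mathbf{V}^{[b]}_k\mathbf{s}^{[b]}_k + \underbrace{\mathbf{U}^\dagger_{b,k}\mathbf{H}^{[b]}_{b,k}\sum_{j=1,\,j\neq k}^{K_b}\mathbf{V}^{[b]}_j\mathbf{s}^{[b]}_j}_{\text{intra-cell interference}} + \underbrace{\mathbf{U}^\dagger_{b,k}\sum_{l=1,\,l\neq b}^{G}\mathbf{H}^{[l]}_{b,k}\sum_{j=1}^{K_l}\mathbf{V}^{[l]}_j\mathbf{s}^{[l]}_j}_{\text{inter-cell interference}} + \hat{\mathbf{n}}_{b,k} \tag{4}$$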

where $\hat{\mathbf{n}}_{b,k}$ is still standard complex Gaussian noise and $\mathbf{s}^{[b]}_k$ is the data symbol vector intended for user $k$ in cell $b$. The per-BS power budget is $P$.

In the conventional approach, the inter-cell interference mitigation and intra-cell spatial multiplexing are achieved by a joint design of the precoders and receiver shaping matrices among all the BSs using, for example, ZF techniques [23, 24], WMMSE [3], IA [25, 5], etc. However, these approaches cannot be directly applied in FDD massive MIMO cellular systems because:

• A large number of RF chains (one per antenna, i.e., $M$ per BS) is required to perform RF-baseband translation as well as analog-to-digital (A/D) conversion. As a result, the hardware cost and power consumption are huge.

• A huge number of pilot symbols is needed to estimate the massive MIMO channels (each link is described by a large channel matrix), and a huge CSI feedback overhead is involved.

• Global real-time CSIT is required to compute the precoders and receiver shaping matrices. However, the cross-link information can only be obtained via message passing over the backhaul links connecting the BSs. This places a heavy burden on the backhaul and increases the signaling latency.

###### Remark 1 (Inter-cell Interference in Massive MIMO)

It has been reported [26] that the inter-cell interference (ICI) of multi-cell massive MIMO systems can be asymptotically ignored using simple per-cell zero-forcing. One key assumption there is that the direct links and the interference links are spatially uncorrelated. However, this may not hold under local scattering (such as the one-ring scattering model considered in this paper), and hence ICI coordination may still be needed in massive MIMO.

To deal with these challenges, we propose a two-tier precoding in the next section.

## III Two-Tier Precoding: Joint Signal and Interference Subspace Alignment

In this section, we propose a two-tier precoding structure by exploiting the limited local scattering and the clustering structure of mobile users in cellular systems.

### III-A Two-Tier Precoding with Subspace Alignment

The precoder at BS $b$ for MS $k$ has the two-tier structure given by:

$$\mathbf{V}^{[b]}_k = \mathbf{\Phi}^{[b]}\mathbf{F}^{[b]}_k \tag{5}$$

where $\mathbf{\Phi}^{[b]}$ is the outer subspace precoder, which adapts to the large-timescale spatial correlations to mitigate inter-cell interference, and $\mathbf{F}^{[b]}_k$ is the inner precoder, which uses the real-time local CSIT to mitigate intra-cell interference. The outer precoder is computed on the long timescale once every super-frame, and the inner precoder (together with the corresponding receiver shaping matrices $\mathbf{U}_{b,k}$) is computed on the short timescale once every subframe. The number of columns $S$ of $\mathbf{\Phi}^{[b]}$ determines the dimension of the subspace for intra-cell spatial multiplexing. In the massive MIMO scenario, we have $S \ll M$, where $M$ is the number of BS antennas.

Specifically, the two-tier precoding with subspace alignment is described below:

• Long-Timescale Processing: In each super-frame, the subspace precoders are chosen as the solution to the following optimization problem

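$$\min_{\{\mathbf{\Phi}^{[b]}\}}\; \underbrace{\sum_{l,b,k,\; l\neq b}\mathbb{E}\big\|\mathbf{H}^{[l]}_{b,k}\mathbf{\Phi}^{[l]}\big\|_F^2}_{\text{inter-cell interference}} - w\underbrace{\sum_{b,k}\mathbb{E}\big\|\mathbf{H}^{[b]}_{b,k}\mathbf{\Phi}^{[b]}\big\|_F^2}_{\text{intra-cell signal energy}} \tag{6}$$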

subject to $\mathbf{\Phi}^{[b]\dagger}\mathbf{\Phi}^{[b]} = \mathbf{I}$, $\forall b$, where the expectations are conditioned on the spatial correlation matrices and $w$ is a weight parameter.

###### Remark 2

Minimizing the first term in (6) alone corresponds to the conventional ZF solution, whereas minimizing the second term alone corresponds to the matched filter (MF) solution. The formulation (6) strikes a balance between the inter-cell interference leakage and the intra-cell signal energy in a system with a large but finite number of antennas. On one hand, for a large number of transmit antennas, the second term dominates and the solution approaches the MF solution. On the other hand, for a limited number of antennas (as in traditional MIMO), the first term is significant and the solution approaches the coordinated ZF solution. The weight $w$ adjusts the balance between the inter-cell interference and the direct link signal.

• Short-Timescale Processing: In each subframe, ZF precoding [23] is applied in two steps.

Step 1: Choose the receiver shaping matrix $\mathbf{U}_{b,k}$ for MS $k$ in cell $b$ by solving

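$$\min_{\mathbf{U}_{b,k}}\; \underbrace{\sum_{l\neq b}\big\|\mathbf{U}^\dagger_{b,k}\mathbf{H}^{[l]}_{b,k}\mathbf{\Phi}^{[l]}\big\|_F^2}_{\text{remaining inter-cell interference}} - w\underbrace{\big\|\mathbf{U}^\dagger_{b,k}\mathbf{H}^{[b]}_{b,k}\mathbf{\Phi}^{[b]}\big\|_F^2}_{\text{direct link signal}} \tag{7}$$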

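subject to $\mathbf{U}^\dagger_{b,k}\mathbf{U}_{b,k} = \mathbf{I}$, and feed back the equivalent channel $\mathbf{U}^\dagger_{b,k}\mathbf{H}^{[b]}_{b,k}\mathbf{\Phi}^{[b]}$ to BS $b$.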

Step 2: Stack the equivalent channels of all MSs in cell $b$ row-wise to form the matrix $\tilde{\mathbf{H}}^{[b]}$. Assuming that the number of data streams assigned to each user is small enough that $\tilde{\mathbf{H}}^{[b]}$ has full row rank with probability 1, the inner precoder is given by

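$$\mathbf{F}^{[b]} = \sqrt{\frac{P}{d_b}}\,\tilde{\mathbf{H}}^{[b]\ddagger} = \sqrt{\frac{P}{d_b}}\,\tilde{\mathbf{H}}^{[b]\dagger}\big(\tilde{\mathbf{H}}^{[b]}\tilde{\mathbf{H}}^{[b]\dagger}\big)^{-1} \tag{8}$$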

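where $(\cdot)^{\ddagger}$ denotes the pseudo-inverse and $d_b$ is the total number of data streams at BS $b$. The inner precoder $\mathbf{F}^{[b]}_k$ for MS $k$ consists of the columns of $\mathbf{F}^{[b]}$ that correspond to the rows of MS $k$ in $\tilde{\mathbf{H}}^{[b]}$.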

###### Remark 3

Similar to the outer precoding, the problem (7) tries to strike a balance between the (remaining) inter-cell interference and the direct link signal. The inner precoder (8) corresponds to the ZF solution in a single cell multiuser MIMO system [23].
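The two short-timescale steps can be sketched in NumPy as follows. The sketch reads the constraint in (7) as $\mathbf{U}^\dagger_{b,k}\mathbf{U}_{b,k} = \mathbf{I}$, in which case the minimizer consists of the eigenvectors of the smallest eigenvalues of $\sum_{l\neq b}\mathbf{G}_l\mathbf{G}_l^\dagger - w\,\mathbf{G}_b\mathbf{G}_b^\dagger$ with $\mathbf{G}_l = \mathbf{H}^{[l]}_{b,k}\mathbf{\Phi}^{[l]}$. Channels, outer precoders, and dimensions are random placeholders (hypothetical), and only one interfering cell is modeled.

```python
import numpy as np

def crandn(rng, *shape):
    """i.i.d. CN(0, 1) samples (hypothetical helper)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def receiver_shaping(G_direct, G_cross, w, d):
    """Step 1: minimize tr(U^H A U) s.t. U^H U = I, with
    A = sum_l G_l G_l^H - w G_b G_b^H; eigh sorts eigenvalues in ascending order."""
    A = -w * G_direct @ G_direct.conj().T
    for G in G_cross:
        A = A + G @ G.conj().T
    _, vecs = np.linalg.eigh(A)
    return vecs[:, :d]

def inner_zf(H_tilde, P):
    """Step 2, cf. (8): F = sqrt(P / d_b) * H~^H (H~ H~^H)^{-1}."""
    d_b = H_tilde.shape[0]
    gram = H_tilde @ H_tilde.conj().T
    return np.sqrt(P / d_b) * H_tilde.conj().T @ np.linalg.inv(gram)

rng = np.random.default_rng(2)
M, S, N, K, d, w, P = 16, 6, 2, 3, 1, 0.5, 1.0
Phi_b = np.linalg.qr(crandn(rng, M, S))[0]     # placeholder outer precoder, serving BS
Phi_l = np.linalg.qr(crandn(rng, M, S))[0]     # placeholder outer precoder, interfering BS

rows = []
for k in range(K):
    H_direct = crandn(rng, N, M)               # stands in for H_{b,k}^{[b]}
    H_cross = crandn(rng, N, M)                # stands in for H_{b,k}^{[l]}
    U = receiver_shaping(H_direct @ Phi_b, [H_cross @ Phi_l], w, d)
    rows.append(U.conj().T @ H_direct @ Phi_b)  # fed-back d x S equivalent channel
H_tilde = np.vstack(rows)                      # (K*d) x S
F = inner_zf(H_tilde, P)
```

The ZF property is that $\tilde{\mathbf{H}}^{[b]}\mathbf{F}^{[b]}$ is a scaled identity, so each MS sees only its own streams on the $S$-dimensional effective channel.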

### III-B Motivation of the Two-Tier MIMO Precoding and the Complexity Issue

The two-tier MIMO precoding has the following advantages:

1. A light demand on pilot symbols and CSI feedback overheads: With the outer precoding, the MS only needs to estimate the effective channel $\mathbf{H}^{[b]}_{b,k}\mathbf{\Phi}^{[b]}$ and feed back the equivalent channel $\mathbf{U}^\dagger_{b,k}\mathbf{H}^{[b]}_{b,k}\mathbf{\Phi}^{[b]}$. Compared with working directly on the full channel matrix $\mathbf{H}^{[b]}_{b,k}$, this yields a huge saving in pilot symbols for channel estimation and in CSI feedback load.

2. A relatively small number of RF chains required: With limited local scattering around the BS, there are only a few active eigenmodes in the massive MIMO channel, and only a small number of spatially multiplexed data streams can be supported. Therefore, we do not need to implement $M$ RF chains. Instead, only $S$ RF chains are required, where $S \ll M$, and the outer precoder can be implemented using an RF phase-shifting network [27, 28] as illustrated in Fig. 2.

3. Only statistical global CSI required: The inner precoder only requires the local CSI between the BS and its serving MSs. To update the outer precoder in the long-timescale, only the knowledge of channel statistics is required. As a result, the performance is insensitive to backhaul latency among the BSs.
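A back-of-the-envelope count illustrates the savings in items 1 and 2. The accounting below is a hypothetical simplification (one orthogonal pilot per transmit dimension, one complex coefficient per fed-back channel entry, $d$ streams per user), and the values of $M$, $N$, $S$, and $d$ are illustrative only.

```python
def per_ms_overhead(M, N, S, d):
    """Hypothetical per-MS, per-subframe counts: (one-tier, two-tier)."""
    return {
        "pilots": (M, S),            # estimate full H vs. effective channel H Phi
        "feedback": (N * M, d * S),  # H (N x M) vs. U^H H Phi (d x S)
        "rf_chains": (M, S),         # per BS
    }

counts = per_ms_overhead(M=128, N=2, S=8, d=1)
for item, (one_tier, two_tier) in counts.items():
    print(f"{item:10s} one-tier {one_tier:4d}  two-tier {two_tier:4d}")
```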

Having addressed the practical issues (i)-(iii) raised in Section I, we now focus on the computational complexity issue in (iv). Note that, with the dimension reduction for the inner precoder, the computation for the outer precoder dominates the complexity. We first investigate the solution property for the outer precoding problem (6).

###### Theorem 1 (Solution to the Outer Precoder)

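The optimal solution to the outer precoding problem (6) is given by the eigenvectors corresponding to the $S$ smallest eigenvalues of the covariance matrix

$$\mathbf{Q}^{[b]} = \sum_{l\neq b}\sum_{k=1}^{K_l}\mathbb{E}\big[\mathbf{H}^{[b]\dagger}_{l,k}\mathbf{H}^{[b]}_{l,k}\big] - w\sum_{k=1}^{K_b}\mathbb{E}\big[\mathbf{H}^{[b]\dagger}_{b,k}\mathbf{H}^{[b]}_{b,k}\big] \tag{9}$$

for each BS $b$.   ∎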

Proof: Please refer to Appendix A.

Although Theorem 1 gives a closed-form expression for $\mathbf{\Phi}^{[b]}$, computing it still requires a high computational complexity of $O(M^3)$ for $M$ BS antennas. For example, computing the SVD of the $M\times M$ covariance matrix $\mathbf{Q}^{[b]}$ requires $O(M^3)$ arithmetic operations. We address this complexity issue in Sections IV and V.
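Theorem 1 can be checked numerically with a short NumPy sketch. It builds $\mathbf{Q}^{[b]}$ from random low-rank correlation matrices, using $\mathbb{E}[\mathbf{H}^\dagger\mathbf{H}] = N\mathbf{T}$ (which follows from Assumption 1) and the fact that all MSs of a cluster share the same $\mathbf{T}$. It then takes the eigenvectors of the $S$ smallest eigenvalues and compares the objective with that of a random orthonormal $\mathbf{\Phi}$. All dimensions, weights, and helper names are illustrative; the full `eigh` call is exactly the $O(M^3)$ step that the iterative algorithm of Section IV avoids.

```python
import numpy as np

def crandn(rng, *shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def random_corr(M, rank, rng):
    """Hypothetical low-rank spatial correlation matrix with tr(T) = M."""
    A = crandn(rng, M, rank)
    T = A @ A.conj().T
    return T * (M / np.trace(T).real)

def q_matrix(T_direct, T_cross, K_direct, K_cross, w, N):
    """Q^{[b]} of (9) with E[H^H H] = N T; T_cross[i], K_cross[i] describe neighbour cell i."""
    Q = -w * K_direct * N * T_direct
    for T_l, K_l in zip(T_cross, K_cross):
        Q = Q + K_l * N * T_l
    return Q

def outer_precoder(Q, S):
    """Theorem 1: eigenvectors of the S smallest eigenvalues (eigh sorts ascending)."""
    vals, vecs = np.linalg.eigh(Q)
    return vecs[:, :S], vals

rng = np.random.default_rng(3)
M, N, S, w = 32, 2, 4, 1.0
Q = q_matrix(random_corr(M, 6, rng),
             [random_corr(M, 6, rng), random_corr(M, 6, rng)],
             K_direct=4, K_cross=[4, 4], w=w, N=N)
Phi, vals = outer_precoder(Q, S)
obj = np.trace(Phi.conj().T @ Q @ Phi).real
Phi_rand = np.linalg.qr(crandn(rng, M, S))[0]
obj_rand = np.trace(Phi_rand.conj().T @ Q @ Phi_rand).real
print(f"optimal objective {obj:.3f}, random orthonormal objective {obj_rand:.3f}")
```

The optimal objective equals the sum of the $S$ smallest eigenvalues of $\mathbf{Q}^{[b]}$, which no other orthonormal $\mathbf{\Phi}$ can beat (Ky Fan's minimum principle).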

### III-C Achievable Per-cell DoF of Two-Tier Precoding in Massive MIMO

In this section, we characterize the performance of the two-tier precoding by evaluating its achievable per-cell DoF. The DoF can be interpreted as the number of data streams, or the asymptotic throughput performance, that can be supported in the massive MIMO system at high SNR [25, 29]. Denote $R_b$ as the sum throughput of cell $b$. The per-cell DoF of the massive MIMO system is defined as $\lim_{\mathrm{SNR}\to\infty} R_b/\log \mathrm{SNR}$.

For simplicity, we consider a symmetric massive MIMO network, where each cell has the same number of MSs, , the same rank of transmit spatial correlation matrices , and for all . We derive the network DoF of the two-tier precoding in the symmetric massive MIMO system below.

###### Theorem 2

(Per-cell DoF of Symmetric Massive MIMO Systems with Two-tier Precoding) For a symmetric massive MIMO network , where all the transmit correlation matrices have rank , if , then

• the per-cell DoF of the proposed two-tier precoding is given by

• the per-cell DoF of conventional one-tier interference alignment (with global real-time CSIT) is given by

Proof: Please refer to Appendix B.

As a result, there is no loss of DoF performance using the proposed two-tier precoding design in a symmetric multi-cell massive MIMO network. (Although the DoF result only covers a special network topology region, this region includes the application scenarios of interest, since the number of BS antennas is usually very large in massive MIMO systems.)

## IV Iterative Algorithms for Outer Precoder under Time-Varying Channels

To reduce the complexity of finding the global optimal solution of the outer precoder problem (6), one approach is to leverage the slowly varying nature of the spatial correlation and compute the outer precoder iteratively at every super-frame. A common technique for such an iterative outer precoder is to apply gradient descent to problem (6). However, such a naive method may converge poorly, because problem (6) is non-convex due to the quadratic equality constraints $\mathbf{\Phi}^{[b]\dagger}\mathbf{\Phi}^{[b]} = \mathbf{I}$. Furthermore, problem (6) has uncountably many non-unique and non-isolated local optima. For example, if $\mathbf{\Phi}^{[b]}$ is a local optimum, then $\mathbf{\Phi}^{[b]}\mathbf{A}$ is another local optimum for any $S\times S$ unitary matrix $\mathbf{A}$. This non-isolation makes it hard to develop iterative algorithms that converge quickly to the global optimal solution under time-varying channels. Hence, we need to tackle the following challenge:
Challenge 1: Derive a low complexity iterative algorithm that provably converges to the desired outer precoder under time-varying channels.

To deal with this challenge, we derive algorithms on the Grassmann manifold, where all outer precoders that span the same subspace are considered equivalent and are represented by a single point. As a result, the local optima become isolated.

### IV-A Transformation of Problem (6) on Grassmann Manifold

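A Grassmann manifold $\mathcal{G}_{M,S}$ is the set of all $S$-dimensional subspaces of $\mathbb{C}^M$: $\mathcal{G}_{M,S} = \{\operatorname{span}(\mathbf{Y}) : \mathbf{Y}\in\mathbb{C}^{M\times S},\ \operatorname{rank}(\mathbf{Y}) = S\}$, where $\operatorname{span}(\mathbf{Y})$ denotes the space spanned by the columns of $\mathbf{Y}$. The Grassmann manifold can be regarded as a quotient of the Euclidean space under a mapping $\pi$ that maps each full-rank matrix to the subspace it spans. For example, all matrices $\mathbf{Y}\mathbf{A}$ (with $\mathbf{A}\in\mathbb{C}^{S\times S}$ of full rank $S$), which span the same subspace as $\mathbf{Y}$, are mapped to the same element of $\mathcal{G}_{M,S}$, i.e., $\pi(\mathbf{Y}\mathbf{A}) = \pi(\mathbf{Y})$. Conversely, the inverse mapping $\pi^{-1}$ gives the set of matrices in the Euclidean space that span the same subspace.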

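Consider $\tilde{\mathbf{\Phi}} = \big(\tilde{\mathbf{\Phi}}^{[1]}, \dots, \tilde{\mathbf{\Phi}}^{[G]}\big)$ with $\tilde{\mathbf{\Phi}}^{[b]} = \operatorname{span}(\mathbf{\Phi}^{[b]})$ as a point on the (product) Grassmann manifold. The outer precoding problem (6) (see equation (23) in Appendix A) can then be reformulated as an optimization over the Grassmann manifold [30, 31]: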

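$$\min_{\tilde{\mathbf{\Phi}}}\; I(\tilde{\mathbf{\Phi}}) \triangleq \sum_{b=1}^{G}\operatorname{tr}\Big[\big(\mathbf{\Phi}^{[b]\dagger}\mathbf{\Phi}^{[b]}\big)^{-1}\mathbf{\Phi}^{[b]\dagger}\mathbf{Q}^{[b]}\mathbf{\Phi}^{[b]}\Big] \tag{10}$$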

where the global optimal solution $\tilde{\mathbf{\Phi}}^*$ is defined as the subspace that yields the minimum objective value of (10).
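The invariance that motivates (10) is easy to verify numerically: the objective of (10) is unchanged when $\mathbf{\Phi}^{[b]}$ is replaced by $\mathbf{\Phi}^{[b]}\mathbf{A}$ for any invertible $\mathbf{A}$, whereas the trace form $\operatorname{tr}(\mathbf{\Phi}^{[b]\dagger}\mathbf{Q}^{[b]}\mathbf{\Phi}^{[b]})$ from (6) is only invariant under unitary mixing. A NumPy sketch with an arbitrary Hermitian stand-in for $\mathbf{Q}^{[b]}$ (hypothetical data):

```python
import numpy as np

def grassmann_objective(Phi, Q):
    """One summand of (10): tr[(Phi^H Phi)^{-1} Phi^H Q Phi]."""
    gram = Phi.conj().T @ Phi
    return np.trace(np.linalg.solve(gram, Phi.conj().T @ Q @ Phi)).real

def euclidean_objective(Phi, Q):
    """The corresponding summand of (6), written as tr(Phi^H Q Phi)."""
    return np.trace(Phi.conj().T @ Q @ Phi).real

rng = np.random.default_rng(4)
M, S = 10, 3
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Q = (B + B.conj().T) / 2                                    # Hermitian stand-in for Q^{[b]}
Phi = np.linalg.qr(rng.standard_normal((M, S)) + 1j * rng.standard_normal((M, S)))[0]
A = rng.standard_normal((S, S)) + 1j * rng.standard_normal((S, S))  # invertible (a.s.)
U_mix = np.linalg.qr(A)[0]                                  # an S x S unitary matrix

print(grassmann_objective(Phi, Q), grassmann_objective(Phi @ A, Q))
print(euclidean_objective(Phi, Q), euclidean_objective(Phi @ A, Q))
```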

Note that there is a substantial difference between formulation (10) on the Grassmann manifold and (6) in the Euclidean space. While (6) seeks a matrix in the Euclidean space that minimizes the interference and signal utility function, problem (10) seeks the right subspace $\tilde{\mathbf{\Phi}}$, which is the unique minimizer of $I(\tilde{\mathbf{\Phi}})$ (problem (10) has a unique solution if, for each $b$, the $S$-th smallest eigenvalue of the covariance matrix $\mathbf{Q}^{[b]}$ has multiplicity 1). With this insight, we can derive algorithms that obtain the subspace precoders more efficiently.

### IV-B Outer Precoder Tracking Algorithm with Compensation

An intuitive way to derive an algorithm for problem (10) is to generalize gradient descent to the Grassmann manifold:

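$$\tilde{\mathbf{\Phi}}[n+1] = \tilde{\mathbf{\Phi}}[n] + \gamma_n\,\mathbf{F}\big(\tilde{\mathbf{\Phi}}[n];\mathbf{Q}[n]\big) \tag{11}$$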

where $\gamma_n$ is the step size, and

$$\mathbf{F}(\tilde{\mathbf{\Phi}};\mathbf{Q}) \triangleq \nabla I(\tilde{\mathbf{\Phi}};\mathbf{Q}) \tag{12}$$

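is the gradient mapping on the Grassmann manifold, and $\mathbf{Q}[n] = \{\mathbf{Q}^{[b]}[n]\}$ is the collection of covariance matrices at iteration $n$. The notation $I(\tilde{\mathbf{\Phi}};\mathbf{Q})$ emphasizes that $\mathbf{Q}$ is the key parameter that determines the optimal solution $\tilde{\mathbf{\Phi}}^*$.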

Although gradient descent usually finds a stationary point when the parameters are static, this may not be the case when the parameter $\mathbf{Q}[n]$ is time-varying. Under time-varying channels, the covariance matrices vary on a timescale similar to that of the gradient iterations, and there is always a convergence gap between the iterate $\tilde{\mathbf{\Phi}}[n]$ and the time-varying optimal solution $\tilde{\mathbf{\Phi}}^*[n]$. Taking the precoder for BS $b$ as an example: when the iterate $\mathbf{\Phi}^{[b]}[n+1]$ approaches the previous target $\mathbf{\Phi}^{[b]*}[n]$, the optimal target has already moved to a new position $\mathbf{\Phi}^{[b]*}[n+1]$, which contributes an additional tracking error.

Intuitively, one way to enhance the tracking of the outer precoder under time-varying channels is to estimate the motion of the moving target and compensate for it:

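$$\tilde{\mathbf{\Phi}}[n+1] = \tilde{\mathbf{\Phi}}[n] + \gamma_n\,\mathbf{F}\big(\tilde{\mathbf{\Phi}}[n];\mathbf{Q}[n]\big) + \Delta\hat{\tilde{\mathbf{\Phi}}}^*[n] \tag{13}$$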

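where the compensation term $\Delta\hat{\tilde{\mathbf{\Phi}}}^*[n]$ is an estimate of the difference $\tilde{\mathbf{\Phi}}^*[n+1] - \tilde{\mathbf{\Phi}}^*[n]$ between consecutive optimal solutions.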

In the following, we derive the gradient mapping $\mathbf{F}$ and the compensation term $\Delta\hat{\tilde{\mathbf{\Phi}}}^*[n]$.

#### IV-B1 Gradient on the Grassmann Manifold

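Using calculus on Grassmann manifolds [31, 32], the gradient of $I(\tilde{\mathbf{\Phi}})$ in (12) can be derived as $\mathbf{F} = \mathbf{P}_{\mathbf{\Phi}}\nabla_{\mathbf{\Phi}}I$, where $\nabla_{\mathbf{\Phi}}I$ is the gradient of $I$ in the Euclidean space and $\mathbf{P}_{\mathbf{\Phi}}$ projects it onto the tangent space [32] of $\tilde{\mathbf{\Phi}}$ on the Grassmann manifold. Moreover, there is no cross term between $\mathbf{\Phi}^{[b]}$ and $\mathbf{\Phi}^{[l]}$ ($l \neq b$) in $I(\tilde{\mathbf{\Phi}})$, and hence each component can be computed separately. As a result, the gradient is given by $\mathbf{F} = \big(\mathbf{F}^{[1]}, \dots, \mathbf{F}^{[G]}\big)$, where $\mathbf{F}^{[b]}$ is the partial derivative w.r.t. the precoder $\mathbf{\Phi}^{[b]}$, $b = 1, \dots, G$: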

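$$\mathbf{F}^{[b]}\big(\mathbf{\Phi}^{[b]};\mathbf{Q}^{[b]}\big) = \underbrace{\Big[\mathbf{I} - \mathbf{\Phi}^{[b]}\big(\mathbf{\Phi}^{[b]\dagger}\mathbf{\Phi}^{[b]}\big)^{-1}\mathbf{\Phi}^{[b]\dagger}\Big]}_{\text{projection } \mathbf{P}_{\mathbf{\Phi}^{[b]}}}\;\underbrace{\mathbf{Q}^{[b]}\mathbf{\Phi}^{[b]}}_{\text{gradient } \nabla_{\mathbf{\Phi}^{[b]}} I} \tag{14}$$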
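A direct NumPy transcription of (14), with a few sanity checks: the gradient lies in the tangent space ($\mathbf{\Phi}^\dagger\mathbf{F} = \mathbf{0}$ for orthonormal $\mathbf{\Phi}$); it transforms consistently under $\mathbf{\Phi}\mapsto\mathbf{\Phi}\mathbf{A}$ because the projector depends only on the subspace; and it vanishes at any set of $S$ eigenvectors of $\mathbf{Q}^{[b]}$, not only at the optimal ones, which is why (10) has several stationary points. Real-valued random data and the helper names are illustrative choices.

```python
import numpy as np

def tangent_projector(Phi):
    """P_Phi = I - Phi (Phi^H Phi)^{-1} Phi^H, the projection in (14)."""
    M = Phi.shape[0]
    return np.eye(M) - Phi @ np.linalg.solve(Phi.conj().T @ Phi, Phi.conj().T)

def grassmann_gradient(Phi, Q):
    """F(Phi; Q) = P_Phi Q Phi, as in (14)."""
    return tangent_projector(Phi) @ Q @ Phi

rng = np.random.default_rng(5)
M, S = 8, 2
B = rng.standard_normal((M, M))
Q = (B + B.T) / 2                              # symmetric stand-in for Q^{[b]}
Phi = np.linalg.qr(rng.standard_normal((M, S)))[0]
A = rng.standard_normal((S, S))                # generic invertible S x S matrix
F = grassmann_gradient(Phi, Q)
eigvecs = np.linalg.eigh(Q)[1]
print("||Phi^T F|| =", np.linalg.norm(Phi.T @ F))
```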

#### IV-B2 Derivation of the Compensation

Consider the parameter $\mathbf{Q}[n]$ in (10) as a discrete-time sampling of the continuous-time covariance matrix profile $\mathbf{Q}(t)$. The optimality condition [33, 34] of problem (10) on the Grassmann manifold is given by:

$$\mathbf{F}\big(\tilde{\mathbf{\Phi}}^*;\mathbf{Q}(t)\big) = \mathbf{0} \tag{15}$$

Since $\mathbf{F}$ is nonlinear in $\tilde{\mathbf{\Phi}}$, we cannot easily solve (15) for $\tilde{\mathbf{\Phi}}^*(t)$. However, we are only interested in the differential $d\tilde{\mathbf{\Phi}}^*$.

Differentiating (15) w.r.t. $t$, we get

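$$\operatorname{Hess}_{\tilde{\mathbf{\Phi}}^*}\big(d\tilde{\mathbf{\Phi}}^*\big) + \mathbf{F}\big(\tilde{\mathbf{\Phi}}^*; d\mathbf{Q}(t)\big) = \mathbf{0} \tag{16}$$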

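where $\mathbf{F}\big(\tilde{\mathbf{\Phi}}^*; d\mathbf{Q}(t)\big)$ is the partial differential of $\mathbf{F}$ w.r.t. the covariance matrix profile $\mathbf{Q}(t)$ (recall from (14) that $\mathbf{F}$ is linear in $\mathbf{Q}$), and $\operatorname{Hess}_{\tilde{\mathbf{\Phi}}^*}\big(d\tilde{\mathbf{\Phi}}^*\big)$ is the partial differential of $\mathbf{F}$ w.r.t. $\tilde{\mathbf{\Phi}}$ on the Grassmann manifold along the direction $d\tilde{\mathbf{\Phi}}^*$. Since $\mathbf{F}$ in (15) is the gradient of the objective function in (10), $\operatorname{Hess}_{\tilde{\mathbf{\Phi}}^*}$ is the Hessian of $I(\tilde{\mathbf{\Phi}})$.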

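Consider the case where the optimal solution $\tilde{\mathbf{\Phi}}^*$ is non-degenerate, i.e., equation (15) has a unique solution in a neighborhood of $\tilde{\mathbf{\Phi}}^*$. By the implicit function theorem [32], the linear equation (16) then has a unique solution $d\tilde{\mathbf{\Phi}}^*$. Since the outer precoder $\mathbf{\Phi}^{[b]}$ obtained from the previous iteration is already a good approximation of $\mathbf{\Phi}^{[b]*}$, and the objective function decouples across the components $\mathbf{\Phi}^{[b]}$, we can estimate the differential by $\Delta\hat{\tilde{\mathbf{\Phi}}}^* = \big(\hat{\boldsymbol{\xi}}^{[1]}, \dots, \hat{\boldsymbol{\xi}}^{[G]}\big)$, where $\hat{\boldsymbol{\xi}}^{[b]}$, $b = 1, \dots, G$, is obtained by solving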

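$$\operatorname{Hess}_{\mathbf{\Phi}^{[b]}}\big(\hat{\boldsymbol{\xi}}^{[b]}\big) + \mathbf{F}^{[b]}\big(\mathbf{\Phi}^{[b]}; d\mathbf{Q}^{[b]}(t)\big) = \mathbf{0} \tag{17}$$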

#### IV-B3 Low Complexity Calculation on Grassmann Manifold for the Compensation Term

Although the compensation equation (17) is linear in the matrix variable $\hat{\boldsymbol{\xi}}^{[b]}$, it is a Sylvester-type equation of the general form $\mathbf{A}\mathbf{X} + \mathbf{X}\mathbf{B} = \mathbf{C}$, which is expensive to solve directly for large $M$. However, by exploiting the fact that $\tilde{\mathbf{\Phi}}^{[b]}$ is a point on the Grassmann manifold, we can find a low complexity algorithm to solve the compensation equation (17).

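Suppose that $\mathbf{\Phi}^{[b]}$ has already been orthonormalized. Using calculus on the Grassmann manifold [35, 31], the term $\operatorname{Hess}_{\mathbf{\Phi}^{[b]}}\big(\hat{\boldsymbol{\xi}}^{[b]}\big)$ can be derived as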

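$$\operatorname{Hess}_{\mathbf{\Phi}^{[b]}}\big(\hat{\boldsymbol{\xi}}^{[b]}\big) = \mathbf{P}_{\mathbf{\Phi}^{[b]}}\Big\{\lim_{t\to 0}\frac{1}{t}\Big[\mathbf{F}\big(\mathbf{\Phi}^{[b]} + t\hat{\boldsymbol{\xi}}^{[b]};\mathbf{Q}^{[b]}\big) - \mathbf{F}\big(\mathbf{\Phi}^{[b]};\mathbf{Q}^{[b]}\big)\Big]\Big\} = \mathbf{P}_{\mathbf{\Phi}^{[b]}}\Big(\mathbf{Q}^{[b]}\hat{\boldsymbol{\xi}}^{[b]} - \hat{\boldsymbol{\xi}}^{[b]}\mathbf{\Phi}^{[b]\dagger}\mathbf{Q}^{[b]}\mathbf{\Phi}^{[b]}\Big) \tag{18}$$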
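The closed form (18) can be validated against a central finite difference of (14) along a tangent direction $\hat{\boldsymbol{\xi}}$. The NumPy sketch below uses random complex data and hypothetical helper names, and assumes, as in the text, an orthonormal $\mathbf{\Phi}^{[b]}$ and a tangent direction with $\mathbf{\Phi}^{[b]\dagger}\hat{\boldsymbol{\xi}} = \mathbf{0}$; the identity itself does not require $\mathbf{\Phi}^{[b]}$ to be optimal.

```python
import numpy as np

def tangent_projector(Phi):
    M = Phi.shape[0]
    return np.eye(M) - Phi @ np.linalg.solve(Phi.conj().T @ Phi, Phi.conj().T)

def F(Phi, Q):
    """Gradient mapping (14); valid for any full-rank Phi."""
    return tangent_projector(Phi) @ Q @ Phi

def hessian(Phi, Q, xi):
    """Closed form (18) for orthonormal Phi and tangent xi."""
    return tangent_projector(Phi) @ (Q @ xi - xi @ (Phi.conj().T @ Q @ Phi))

rng = np.random.default_rng(6)
M, S, t = 8, 3, 1e-5
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Q = (B + B.conj().T) / 2
Phi = np.linalg.qr(rng.standard_normal((M, S)) + 1j * rng.standard_normal((M, S)))[0]
P = tangent_projector(Phi)
xi = P @ (rng.standard_normal((M, S)) + 1j * rng.standard_normal((M, S)))  # tangent direction

# Central difference of F along xi, projected onto the tangent space at Phi
fd = P @ (F(Phi + t * xi, Q) - F(Phi - t * xi, Q)) / (2 * t)
closed = hessian(Phi, Q, xi)
rel = np.linalg.norm(fd - closed) / np.linalg.norm(closed)
print(f"relative mismatch: {rel:.1e}")
```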

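Notice from (14) that $\mathbf{F}^{[b]}$ is linear in $\mathbf{Q}^{[b]}$ and, since $\mathbf{P}_{\mathbf{\Phi}^{[b]}\mathbf{M}} = \mathbf{P}_{\mathbf{\Phi}^{[b]}}$ for any unitary $\mathbf{M}$, satisfies $\mathbf{F}^{[b]}(\mathbf{\Phi}^{[b]};\cdot)\,\mathbf{M} = \mathbf{F}^{[b]}(\mathbf{\Phi}^{[b]}\mathbf{M};\cdot)$. Multiplying (17) on the right by an $S\times S$ unitary matrix $\mathbf{M}$, we obtain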

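$$\mathbf{P}_{\mathbf{\Phi}^{[b]}}\Big[\mathbf{Q}^{[b]}\big(\hat{\boldsymbol{\xi}}^{[b]}\mathbf{M}\big) - \big(\hat{\boldsymbol{\xi}}^{[b]}\mathbf{M}\big)\underbrace{\mathbf{M}^\dagger\mathbf{\Phi}^{[b]\dagger}\mathbf{Q}^{[b]}\mathbf{\Phi}^{[b]}\mathbf{M}}_{\mathbf{\Psi}}\Big] + \mathbf{F}^{[b]}\big(\mathbf{\Phi}^{[b]}\mathbf{M}; d\mathbf{Q}^{[b]}\big) = \mathbf{0} \tag{19}$$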

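where $\mathbf{M}$ diagonalizes $\mathbf{\Phi}^{[b]\dagger}\mathbf{Q}^{[b]}\mathbf{\Phi}^{[b]}$, i.e., $\mathbf{\Psi} = \mathbf{M}^\dagger\mathbf{\Phi}^{[b]\dagger}\mathbf{Q}^{[b]}\mathbf{\Phi}^{[b]}\mathbf{M}$ is diagonal and $\mathbf{M}^\dagger\mathbf{M} = \mathbf{I}$. Let $\mathbf{Y} = \hat{\boldsymbol{\xi}}^{[b]}\mathbf{M}$. Since $\mathbf{\Psi}$ is diagonal, equation (19) decouples into $S$ parallel linear equations, one for each column of $\mathbf{Y}$: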

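$$\mathbf{P}_{\mathbf{\Phi}^{[b]}}\big(\mathbf{Q}^{[b]} - \beta_i\mathbf{I}\big)\mathbf{Y}_i + \mathbf{F}^{[b]}\big(\mathbf{\Phi}^{[b]}\mathbf{M}; d\mathbf{Q}^{[b]}\big)_i = \mathbf{0} \tag{20}$$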

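where $\beta_i$ is the $i$-th diagonal element of $\mathbf{\Psi}$, and $\mathbf{Y}_i$ and $\mathbf{F}^{[b]}(\cdot)_i$ are the $i$-th ($i = 1, \dots, S$) columns of $\mathbf{Y}$ and $\mathbf{F}^{[b]}\big(\mathbf{\Phi}^{[b]}\mathbf{M}; d\mathbf{Q}^{[b]}\big)$, respectively. These linear equations can be solved by the conjugate gradient (CG) algorithm, whose cost per iteration is dominated by matrix-vector products with $\mathbf{Q}^{[b]}$, i.e., $O(M^2)$ operations per column.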
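The decoupled system (20) can be solved column by column with CG, because the operator $\mathbf{y}\mapsto\mathbf{P}_{\mathbf{\Phi}^{[b]}}(\mathbf{Q}^{[b]} - \beta_i\mathbf{I})\mathbf{P}_{\mathbf{\Phi}^{[b]}}\mathbf{y}$ is Hermitian and, near the optimum, positive definite on the tangent space (its eigenvalues there are differences between eigenvalues of $\mathbf{Q}^{[b]}$ outside and inside the target subspace). The NumPy sketch below is an illustrative implementation, not the authors' code: it runs CG to convergence rather than truncating it as discussed in Section IV-C, and uses random data. As a check, $\hat{\boldsymbol{\xi}}\mathbf{\Phi}^\dagger + \mathbf{\Phi}\hat{\boldsymbol{\xi}}^\dagger$ should match the first-order change of the eigenspace projector under $\mathbf{Q}^{[b]} \to \mathbf{Q}^{[b]} + d\mathbf{Q}^{[b]}$.

```python
import numpy as np

def tangent_projector(Phi):
    M = Phi.shape[0]
    return np.eye(M) - Phi @ np.linalg.solve(Phi.conj().T @ Phi, Phi.conj().T)

def conjugate_gradient(apply_A, b, n_iter, tol=1e-10):
    """Plain CG for a Hermitian positive-definite operator given as a function."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = np.vdot(r, r).real
    for _ in range(n_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = apply_A(p)
        alpha = rs / np.vdot(p, Ap).real
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def compensation(Phi, Q, dQ, n_cg):
    """Estimate xi from (17) via (20); Phi must have orthonormal columns."""
    P = tangent_projector(Phi)
    beta, Mrot = np.linalg.eigh(Phi.conj().T @ Q @ Phi)   # Psi = diag(beta)
    rhs = -(P @ dQ @ Phi @ Mrot)                          # -F(Phi M; dQ), column-wise
    Y = np.zeros_like(rhs)
    for i, b_i in enumerate(beta):
        op = lambda y, b_i=b_i: P @ (Q @ (P @ y) - b_i * (P @ y))
        Y[:, i] = conjugate_gradient(op, rhs[:, i], n_cg)
    return Y @ Mrot.conj().T                              # xi = Y M^H

rng = np.random.default_rng(7)
M, S = 8, 2
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Q = (B + B.conj().T) / 2
D = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
dQ = 0.01 * (D + D.conj().T) / 2                          # small Hermitian change of Q
Phi = np.linalg.eigh(Q)[1][:, :S]                         # optimum of (10) for this Q
xi = compensation(Phi, Q, dQ, n_cg=50)

def eigenspace_projector(Qm, S):
    V = np.linalg.eigh(Qm)[1][:, :S]
    return V @ V.conj().T

eps = 1e-6
fd = (eigenspace_projector(Q + eps * dQ, S) - eigenspace_projector(Q - eps * dQ, S)) / (2 * eps)
first_order = xi @ Phi.conj().T + Phi @ xi.conj().T
print("projector mismatch:", np.linalg.norm(fd - first_order))
```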

### IV-C Complexity and Implementation Considerations

#### IV-C1 Computational Complexity

The computational complexity of the proposed compensation algorithm (13) is dominated by the gradient term and the compensation term. Let $M$ be the number of BS antennas and $S$ the dimension of the outer precoder. The gradient term requires $O(M^2 S)$ arithmetic operations (additions, multiplications, etc.), omitting lower-order terms. The compensation term requires solving the linear equations in (20) with the CG algorithm. Since CG converges quickly and the norm of $\hat{\boldsymbol{\xi}}^{[b]}$ is usually small (because $d\mathbf{Q}^{[b]}$ is small when the covariance matrix $\mathbf{Q}^{[b]}$ varies slowly), computing only one CG step (which requires $O(M^2 S)$ operations) to obtain $\hat{\boldsymbol{\xi}}^{[b]}$ is sufficient to yield a good compensation. Therefore, the proposed compensation algorithm (summarized in Algorithm 1) has a total computational complexity of around $O(M^2 S)$ operations. As discussed in Section III-B, we usually have $S \ll M$ for massive MIMO channels, and therefore the complexity of the proposed algorithm is substantially lower than the $O(M^3)$ complexity of brute-force computation of Theorem 1 using SVD [36].
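The gap between the two complexity orders is a factor of $M/S$. A few illustrative values follow; the counts are order-of-magnitude only (constants dropped), assuming a per-iteration cost of order $M^2 S$ for (13), dominated by the products $\mathbf{Q}^{[b]}\mathbf{\Phi}^{[b]}$, versus order $M^3$ for an SVD.

```python
def cost_svd(M):
    """Order-of-magnitude cost of a dense M x M eigendecomposition/SVD."""
    return M ** 3

def cost_compensated(M, S):
    """Assumed order-of-magnitude per-iteration cost of (13): M^2 * S."""
    return M ** 2 * S

S = 8
for M in (64, 128, 256):
    print(M, cost_svd(M), cost_compensated(M, S), cost_svd(M) // cost_compensated(M, S))
```

For fixed $S$, the advantage grows linearly with the number of antennas.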

#### IV-C2 Implementation Considerations

Fig. 3 shows the signaling associated with the two-tier precoding. In stage (a), each BS broadcasts channel training sequences through the outer precoder in each subframe. In stage (b), each MS feeds back the low-dimensional equivalent channel in each subframe, and the full-dimension ($M\times M$) interference covariance matrices only at the end of each super-frame. In stage (c), the BSs exchange the covariance matrix profile of each MS cluster over the backhaul only at the end of each super-frame. As a result, the pilot symbols for channel estimation, the CSI feedback overhead, and the signaling over the backhaul are greatly reduced in the massive MIMO system.

## V Convergence Analysis of the Outer Precoding Algorithm

In this section, we analyze the tracking performance of the proposed iterative outer precoder tracking algorithm under time-varying channels. We are interested in whether algorithm (13) converges to the global optimal solution of problem (6). However, since problem (6) is non-convex and the algorithm has multiple stationary points, existing techniques [37, 38] for convergence analysis under time-varying channels cannot be applied. We therefore address the following challenge:
Challenge 2: Analyze the convergence behavior of the outer precoder tracking algorithm under time-varying massive MIMO channels, despite the non-convexity of the optimization problem.

To this end, we extend the analysis framework of [37, 38] and characterize the tracking performance of the algorithm by analyzing an equivalent continuous-time virtual dynamic system (VDS) that models the algorithm iterations. Please refer to Appendix C for details.

### V-A Convergence under Static Channel Covariance

When the channel covariance matrices are static, the compensation term in iteration (13) is always zero. Hence, the compensation algorithm (13) degenerates to a pure gradient descent algorithm (11). To establish the convergence results, we first derive the following uniqueness property for the algorithm.

###### Lemma 1 (Uniqueness of Global Optimal Point)

Suppose under a given , the covariance matrix has distinct -th and -th smallest eigenvalues for each . Then there is only one global optimal stationary point for the iteration (13).   ∎

*Proof:* Please refer to Appendix C for the proof.

Based on Lemma 1, we shall establish the global convergence result below.

###### Theorem 3

(Global Convergence under Static Channel Covariance) There exists , such that under the distinct eigenvalue condition in Lemma 1 and choosing step size , the proposed algorithm converges to the global optimal solution .   ∎

*Proof:* Please refer to Appendix C for the proof.

The above theorem concludes that, although the original outer precoding problem (6) is non-convex, the proposed algorithm is guaranteed to converge to the global optimal solution under static channel covariance.
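A toy problem with the same key structure illustrates Theorem 3. As an assumption for illustration only, consider minimizing the surrogate cost f(Phi) = trace(Phi^T Q Phi) over S-dimensional subspaces. The cost in (10) and the iteration in (11) are not reproduced here. The surrogate's global minimizer is spanned by the eigenvectors of the S smallest eigenvalues of Q, in the spirit of the distinct-eigenvalue condition of Lemma 1. The sketch runs Riemannian gradient descent on the Grassmann manifold with a QR retraction from a random starting point. It then compares the result with the closed-form eigenvector solution. All dimensions, eigenvalues, and the step size are hypothetical.

```python
import numpy as np


def qr_retract(X):
    """Orthonormal basis of span(X) via thin QR, with column signs fixed so diag(R) > 0."""
    Q, R = np.linalg.qr(X)
    return Q * np.where(np.diag(R) < 0, -1.0, 1.0)


def riemannian_grad(Phi, Q):
    """Grassmann gradient of f(Phi) = trace(Phi^T Q Phi), up to a factor of 2:
    the component of Q Phi orthogonal to span(Phi)."""
    QPhi = Q @ Phi
    return QPhi - Phi @ (Phi.T @ QPhi)


def projector_distance(A, B):
    """Frobenius distance between orthogonal projectors; invariant to the choice of basis."""
    return np.linalg.norm(A @ A.T - B @ B.T)


rng = np.random.default_rng(0)
M, S = 8, 2
eigvals = np.linspace(0.1, 2.0, M)              # distinct eigenvalues (cf. Lemma 1)
U, _ = np.linalg.qr(rng.standard_normal((M, M)))
Q = U @ np.diag(eigvals) @ U.T                  # static "covariance" matrix

step = 0.4                                      # step * largest eigenvalue = 0.8 < 1
Phi = qr_retract(rng.standard_normal((M, S)))   # random initial point
for _ in range(1000):
    Phi = qr_retract(Phi - step * riemannian_grad(Phi, Q))

_, V = np.linalg.eigh(Q)                        # eigenvalues in ascending order
Phi_star = V[:, :S]                             # reference: eigenvectors of the S smallest eigenvalues
print(projector_distance(Phi, Phi_star))
```

In this example, the projector distance decays geometrically near the solution. The rate is set by the gap between the S-th and (S+1)-th smallest eigenvalues, which is why that gap must be nonzero.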

###### Remark 4 (Global Convergence of Non-Convex Problem)

As pointed out in Section IV-A, problem (6) is non-convex. Yet, after reformulation, the new problem (10) on the manifold has the following structure. Among all KKT points, only one is a maximum point, and it is attractive. All the other KKT points are repulsive, as illustrated in Fig. 4. As a result, the trajectory of the iterative algorithm converges to the maximum point almost surely.
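The attractive/repulsive structure described in Remark 4 can be checked explicitly for a surrogate cost. This is an assumption for illustration, not problem (10): maximize f(Phi) = -trace(Phi^T Q Phi) on the Grassmann manifold. Every subspace spanned by S eigenvectors of Q is a stationary point. With distinct eigenvalues, the Hessian at such a point has eigenvalues proportional to lambda_i - lambda_j, for i inside and j outside the chosen index set. The sketch enumerates all stationary points and labels each one.

```python
from itertools import combinations


def classify_stationary_points(eigvals, S):
    """Classify the stationary points of f(Phi) = -trace(Phi^T Q Phi) on the Grassmann manifold.

    Each S-subset I of eigenvectors of Q spans a stationary point. For distinct
    eigenvalues, the Hessian of f there has eigenvalues proportional to
    eigvals[i] - eigvals[j] for i in I and j not in I. The point is attractive for
    gradient ascent only if all of them are negative; otherwise it is repulsive.
    """
    M = len(eigvals)
    if len(set(eigvals)) != M:
        raise ValueError("this sketch assumes distinct eigenvalues")
    labels = {}
    for subset in combinations(range(M), S):
        outside = [j for j in range(M) if j not in subset]
        curvatures = [eigvals[i] - eigvals[j] for i in subset for j in outside]
        labels[subset] = "attractive" if all(c < 0 for c in curvatures) else "repulsive"
    return labels


labels = classify_stationary_points([0.3, 1.2, 0.1, 2.0, 0.7], S=2)
attractive = [I for I, kind in labels.items() if kind == "attractive"]
print(len(labels), attractive)   # -> 10 [(0, 2)]
```

Only the subspace of the S smallest eigenvalues has all-negative curvature. Every other stationary point has at least one ascent direction, so a gradient-ascent trajectory started from a generic point does not settle there.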

### V-B Convergence under Time-Varying Channel Covariance

We now study the case of time-varying channel covariance. Under time-varying channels, the global optimal solution also varies over time, on a timescale similar to that of the algorithm iteration (13). Hence, it is not clear whether the iterate can track this time-varying optimal solution.

To analyze the tracking performance of the outer precoder iteration (13), we approximate the discrete-time iteration with a continuous-time compensated virtual dynamic system. Conversely, iteration (13) is a discretization of this system, obtained, for example, by replacing $dt$ in (21) with the step size in (13). The continuous-time trajectory $\tilde{\Phi}_c(t)$ is defined as the solution of the following differential equations:

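$$
d\tilde{\Phi}_c = F\big(\tilde{\Phi}_c; Q(t)\big)\,dt + \hat{d}\tilde{\Phi}^*, \qquad \tilde{\Phi}_c(0) = \tilde{\Phi}_0, \tag{21}
$$

$$
0 = \mathrm{Hess}_{\tilde{\Phi}_c}\big(\hat{d}\tilde{\Phi}^*\big) + F\big(\tilde{\Phi}_c; dQ(t)\big). \tag{22}
$$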

We evaluate the tracking behavior of the outer precoder iteration in the following.

###### Theorem 4

(Convergence of the Outer Precoder Iteration in Time-varying Channels) Assume the distinct eigenvalue condition in Lemma 1. In addition, suppose the largest eigenvalue of is bounded w.p.1 for all . Then there exists , such that for , we have , w.p.1, as .   ∎

*Proof:* Please refer to Appendix D for the proof.

The result in Theorem 4 implies that perfect tracking of the outer precoder in time-varying channels is possible if the initial iterate is sufficiently close to the global optimal point.
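The same toy surrogate illustrates the effect of the compensation term. This is an assumption, not the paper's problem: the cost f(Phi) = trace(Phi^T Q Phi) on the Grassmann manifold, with a slowly rotating covariance Q_n. The sketch discretizes (21)-(22) as follows:

- Take a gradient step at Q_n.
- Add a compensation D that solves the linearized optimality condition Hess(D) = -(I - Phi Phi^T)(Q_{n+1} - Q_n) Phi, computed with CG.

As a simplification, the covariance change Q_{n+1} - Q_n is assumed to be known exactly. The rotation speed, step size, and eigenvalues are hypothetical values chosen for illustration. Both trackers start at the optimum, matching the premise of Theorem 4.

```python
import numpy as np


def retract(X):
    """Orthonormal basis of span(X) via thin QR, with column signs fixed so diag(R) > 0."""
    Q, R = np.linalg.qr(X)
    return Q * np.where(np.diag(R) < 0, -1.0, 1.0)


def grad(Phi, Q):
    """Horizontal gradient of f(Phi) = trace(Phi^T Q Phi), factor 2 dropped: (I - Phi Phi^T) Q Phi."""
    QPhi = Q @ Phi
    return QPhi - Phi @ (Phi.T @ QPhi)


def hess(Phi, Q, D):
    """Riemannian Hessian of the same cost applied to a horizontal direction D (Phi^T D = 0)."""
    QD = Q @ D
    return QD - Phi @ (Phi.T @ QD) - D @ (Phi.T @ Q @ Phi)


def cg_solve(apply_op, B, max_iters, rel_tol=1e-24):
    """Conjugate gradient for apply_op(X) = B with the Frobenius inner product, starting at X = 0."""
    X = np.zeros_like(B)
    R = B.copy()
    P = R.copy()
    rs0 = rs = np.sum(R * R)
    for _ in range(max_iters):
        if rs <= rel_tol * rs0:      # converged (also handles B = 0)
            break
        AP = apply_op(P)
        alpha = rs / np.sum(P * AP)
        X = X + alpha * P
        R = R - alpha * AP
        rs_new = np.sum(R * R)
        P = R + (rs_new / rs) * P
        rs = rs_new
    return X


M, S = 6, 2
LAM = np.array([0.1, 0.2, 1.0, 1.1, 1.2, 1.3])   # hypothetical eigenvalues with a clear gap
U0, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((M, M)))


def covariance(n, omega=0.01):
    """Q_n: eigenvectors 0 and 1 rotate toward eigenvectors 2 and 3 by n * omega radians."""
    c, s = np.cos(n * omega), np.sin(n * omega)
    G = np.eye(M)
    for a, b in [(0, 2), (1, 3)]:
        G[a, a], G[a, b], G[b, a], G[b, b] = c, -s, s, c
    U = U0 @ G
    return U @ np.diag(LAM) @ U.T


def optimum(Q):
    """Eigenvectors of the S smallest eigenvalues (np.linalg.eigh sorts them in ascending order)."""
    return np.linalg.eigh(Q)[1][:, :S]


def track(num_steps, step, compensate):
    Phi = optimum(covariance(0))  # start at the optimum, as in the premise of Theorem 4
    errors = []
    for n in range(num_steps):
        Q_now, Q_next = covariance(n), covariance(n + 1)
        update = -step * grad(Phi, Q_now)
        if compensate:
            rhs = -grad(Phi, Q_next - Q_now)  # -(I - Phi Phi^T) dQ Phi
            update = update + cg_solve(lambda D: hess(Phi, Q_now, D), rhs, (M - S) * S)
        Phi = retract(Phi + update)
        P_star = optimum(Q_next)
        errors.append(np.linalg.norm(Phi @ Phi.T - P_star @ P_star.T))
    return np.array(errors)


plain = track(300, step=0.05, compensate=False)
comp = track(300, step=0.05, compensate=True)
print(f"mean error, last 150 steps: plain {plain[150:].mean():.2e}, compensated {comp[150:].mean():.2e}")
```

The uncompensated iteration lags behind the moving optimum. The lag grows with the rotation speed relative to the step size times the eigenvalue gap. The compensated iteration removes the first-order part of that lag, leaving only a residual error of second order in the per-step rotation.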

## VI Numerical Results

We consider a cellular network with cells, where each cell has 2 clusters and each cluster has users. Fig. 5 illustrates a realization of the network topology. The inter-site distance is m. The large-scale propagation follows the outdoor evaluation methodology of the LTE standard [39] with a pathloss exponent of 2.6. Each BS is equipped with antennas, and each user is equipped with a single antenna. We generate the massive MIMO channels according to (3), where the transmit correlation matrices are specified by (2) and the AS parameter is modeled as deg. Moreover, the small-timescale channel variation in (3) is modeled by the widely used autoregressive (AR) model [40]. In this model, the channel in each subframe is obtained from the channel in the previous subframe and an independent standard complex Gaussian matrix. The temporal correlation coefficient is determined through the zeroth-order Bessel function by the maximum Doppler frequency and the subframe duration. The length of the super-frame is . The noise power is normalized to the smallest direct-link power gain.
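The AR channel evolution can be sketched as follows. We assume the common unit-power form H_{n+1} = rho H_n + sqrt(1 - rho^2) W_n. We also take rho = J0(2 pi f_D tau), the standard Jakes-based choice consistent with the quantities listed above, where f_D is the maximum Doppler frequency and tau the subframe duration. The carrier frequency (2 GHz) and subframe duration (1 ms) are hypothetical values. The spatial correlation of (2)-(3) is omitted, so the sketch only generates the i.i.d. small-timescale component. If SciPy is available, `scipy.special.j0` could replace the series `bessel_j0`.

```python
import math
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def bessel_j0(x, terms=40):
    """Zeroth-order Bessel function J0(x) from its power series (accurate for moderate |x|)."""
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= -(x / 2.0) ** 2 / (k * k)   # ratio of consecutive series terms
        total += term
    return total


def ar1_coefficient(speed_kmh, carrier_hz, subframe_s):
    """rho = J0(2 * pi * f_D * tau), with maximum Doppler frequency f_D = v * f_c / c."""
    f_doppler = (speed_kmh / 3.6) * carrier_hz / SPEED_OF_LIGHT
    return bessel_j0(2.0 * math.pi * f_doppler * subframe_s)


def ar1_channels(num_subframes, shape, rho, rng):
    """H_{n+1} = rho * H_n + sqrt(1 - rho^2) * W_n, with i.i.d. CN(0, 1) entries in H_0 and W_n."""
    def cn(size):
        return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / math.sqrt(2.0)
    H = cn(shape)
    channels = [H]
    for _ in range(num_subframes - 1):
        H = rho * H + math.sqrt(1.0 - rho ** 2) * cn(shape)
        channels.append(H)
    return channels


rng = np.random.default_rng(0)
rho = ar1_coefficient(speed_kmh=10.0, carrier_hz=2e9, subframe_s=1e-3)   # hypothetical f_c, tau
channels = ar1_channels(50, (200, 100), rho, rng)
print(rho, np.mean(np.abs(channels[-1]) ** 2))
```

The sqrt(1 - rho^2) weighting keeps the per-entry channel power at one in every subframe. The empirical power of the last channel snapshot therefore stays close to one.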

We consider the following baselines:

- Baseline 1 (One-tier coordinated MIMO using ZF [23]): In each subframe, full CSI is used to compute a precoder that zero-forces both the inter-cell and intra-cell interference.
- Baseline 2 (Two-tier precoding using the BD algorithm in [8]): The two-tier precoding strategy in [8] is applied, where the outer precoder is computed by the BD algorithm in [8].
- Baseline 3 (Two-tier precoding with the conventional gradient algorithm for the outer precoder [13]): The two-tier precoding strategy (equations (6), (7), and (8)) is applied, where the outer precoder solutions given in Theorem 1 are computed iteratively using the gradient algorithm in [13].

Note that Baseline 1 faces serious implementation challenges in massive MIMO systems, as discussed in Section I: the huge pilot and feedback overhead, and the need for real-time global CSI sharing. Hence, it serves only as a performance benchmark.

### VI-A Throughput Performance

Fig. 6 shows the per-cell throughput versus the per-BS transmit power at an MS speed of 10 km/h. The one-tier cooperative ZF scheme (Baseline 1) achieves the highest data rate when there is no signaling latency for the BSs to exchange global CSI over the backhaul. However, Baseline 1 is very sensitive to signaling latency, and its performance degrades significantly when a 5 ms backhaul latency is considered. (As a benchmark, the X2 interface between eNodeBs in LTE systems usually incurs a latency of 10-20 ms [39].) On the other hand, the two-tier precoding schemes (Baseline 2, Baseline 3, and the proposed scheme) are robust to signaling latency, as they do not require instantaneous global CSI. The proposed scheme achieves slightly better performance than Baseline 2, with substantially lower complexity (Table II).

Fig. 7 shows the per-cell throughput versus the MS speed under a per-BS transmit power budget of dB. Again, the proposed scheme with compensation achieves good performance with substantially lower complexity. In addition, it significantly outperforms Baseline 3 at high MS speeds, which confirms the superior tracking capability of the proposed compensation algorithm under time-varying channels. In comparison, the throughput of Baseline 1 drops quickly as the MS speed increases under ms backhaul latency.
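A back-of-the-envelope estimate under the same assumed AR model makes Baseline 1's sensitivity to backhaul latency and MS speed concrete. Suppose the precoder at subframe n+k is computed from the channel of subframe n. The per-entry normalized mean squared error of the stale CSI is then 2(1 - rho^k), with rho = J0(2 pi f_D tau). The sketch treats the 5 ms backhaul latency considered for Fig. 6 as k = 5 subframes, which assumes a 1 ms subframe. The 2 GHz carrier is likewise hypothetical. This indicates CSI staleness only and does not predict throughput.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def bessel_j0(x, terms=40):
    """Zeroth-order Bessel function J0(x) from its power series."""
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= -(x / 2.0) ** 2 / (k * k)
        total += term
    return total


def stale_csi_nmse(speed_kmh, delay_subframes, carrier_hz=2e9, subframe_s=1e-3):
    """Normalized MSE of using H_n in place of H_{n+k} under a unit-power AR(1) model.

    With rho = J0(2*pi*f_D*tau), E|h_{n+k} - h_n|^2 = 2 * (1 - rho**k).
    """
    f_doppler = (speed_kmh / 3.6) * carrier_hz / SPEED_OF_LIGHT
    rho = bessel_j0(2.0 * math.pi * f_doppler * subframe_s)
    return 2.0 * (1.0 - rho ** delay_subframes)


# 5 ms backhaul latency = 5 subframes if tau = 1 ms (hypothetical).
for speed in (3, 10, 30, 60):
    print(f"{speed:>3} km/h: NMSE of 5-subframe-old CSI = {stale_csi_nmse(speed, 5):.3f}")
```

Under the AR(1) model, the k-step correlation is rho^k. The staleness error therefore grows with both the latency (through k) and the MS speed (through rho), consistent with the trend reported for Baseline 1.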