# Linear Precoding Based on Polynomial Expansion: Large-Scale Multi-Cell MIMO Systems

## Abstract

Large-scale MIMO systems can yield substantial improvements in spectral efficiency for future communication systems. Due to the finer spatial resolution and array gain achieved by a massive number of antennas at the base station, these systems have been shown to be robust to inter-user interference, and the use of linear precoding appears to be asymptotically optimal. However, from a practical point of view, most precoding schemes exhibit prohibitively high computational complexity as the system dimensions increase. For example, the near-optimal regularized zero-forcing (RZF) precoding requires the inversion of a large matrix. To solve this issue, we propose in this paper to approximate the matrix inverse by a truncated polynomial expansion (TPE), where the polynomial coefficients are optimized to maximize the system performance. This technique has recently been applied in single-cell scenarios, where it was shown that a small number of coefficients is sufficient to reach performance similar to that of RZF, while it was not possible to surpass RZF.

In a realistic multi-cell scenario involving large-scale multi-user MIMO systems, the optimization of RZF precoding has, thus far, not been feasible. This is mainly attributed to the high complexity of the scenario and the non-linear impact of the necessary regularizing parameters. On the other hand, the scalar coefficients in TPE precoding give hope for possible throughput optimization. To this end, we exploit random matrix theory to derive a deterministic expression of the asymptotic signal-to-interference-and-noise ratio for each user based on channel statistics. We also provide an optimization algorithm to approximate the coefficients that maximize the network-wide weighted max-min fairness. The optimization weights can be used to mimic the user throughput distribution of RZF precoding. Using simulations, we compare the network throughput of the proposed TPE precoding with that of the suboptimal RZF scheme and show that our scheme can achieve higher throughput using a TPE order of only 5.

## 1 Introduction

A typical multi-cell communication system consists of base stations (BSs) that each serve user terminals (UTs). The conventional way of mitigating inter-user interference in the downlink of such systems has been to assign orthogonal time/frequency resources to UTs within the cell and across neighboring cells. By deploying an array of antennas at each BS, one can turn each cell into a multi-user multiple-input multiple-output (MIMO) system and enable flexible spatial interference mitigation [1]. The essence of downlink multi-user MIMO is *precoding*, which means that the antenna arrays are used to direct each data signal spatially towards its intended receiver. The throughput of multi-cell multi-user MIMO systems ideally scales linearly with . Unfortunately, the precoding design in multi-user MIMO requires very accurate instantaneous channel state information (CSI) [2], which can be cumbersome to achieve in practice [3]. This is one of the reasons why only rudimentary multi-user MIMO techniques have found their way into current wireless standards, such as LTE-Advanced [4].

Large-scale multi-user MIMO systems (with ) have received massive attention lately [5], partially because these systems are less vulnerable to inter-user interference. An exceptional spatial resolution is achieved when the number of antennas, , is large; thus, the leakage of signal power caused by having imperfect CSI is less likely to arrive as interference at other users. Interestingly, the throughput of these systems becomes highly predictable in the large-() regime; random matrix theory can provide simple deterministic approximations of the otherwise stochastic achievable rates [9]. These so-called *deterministic equivalents* are tight as due to channel hardening, but are often very accurate also at small/practical values of and . The deterministic equivalents can, for example, be utilized for optimization of various system parameters [8].

Many of the issues that made small-scale MIMO difficult to implement in practice appear to be solved by large-scale MIMO [6]; for example, simple linear precoding schemes achieve (when and is fixed) high performance in some multi-cell systems [6] and are robust to CSI imperfections [5]. The complexity of computing most of the state-of-the-art linear precoding schemes is, nevertheless, prohibitively high in the large-() regime. For example, the optimal precoding parametrization in [13] and the near-optimal *regularized zero-forcing (RZF)* precoding [14] require inversion of the Gram matrix of the joint channel of all users; this matrix operation has cubic complexity in . A notable exception is the matched filter, also known as *maximum ratio transmission (MRT)* [15], which has only quadratic complexity. This scheme is, however, not very appealing from a throughput perspective since it does not actively suppress inter-user interference and thus requires an order of magnitude more antennas to achieve performance close to that of RZF [7].

In this paper, we propose to solve the precoding complexity issue by a new family of precoding schemes called truncated polynomial expansion (TPE) precoding. This family can be obtained by approximating the matrix inverse in RZF by a -degree matrix polynomial which admits a low-complexity multistage hardware implementation. By changing , one achieves a smooth transition in performance between MRT () and RZF (). The hardware complexity of TPE precoding is proportional to , so it can be tailored to the deployment scenario. Furthermore, the TPE order need not scale with the system dimensions and to maintain a fixed per-user rate gap to RZF, but it is desirable to increase it with the signal-to-noise ratio (SNR) and the quality of the CSI.

Building on the proof-of-concept provided by our work in [16] and the independent concurrent work of [17], this paper applies TPE precoding in a large-scale multi-cell scenario with realistic characteristics, such as user-specific channel covariance matrices, imperfect CSI, pilot contamination (due to pilot reuse in neighboring cells), and cell-specific power constraints. The th BS serves its UTs using TPE precoding with an order that can be different between cells and thus tailored to factors such as cell size, performance requirements, and hardware resources.

In this paper, we derive new deterministic equivalents for the achievable user rates. The derivation of these expressions is the main analytical contribution and required major analytical advances related to the powers of stochastic Gram matrices with arbitrary covariances. The deterministic equivalents are tight when and grow large with a fixed ratio, but provide close approximations at small parameter values as well. Due to the inter-cell and intra-cell interference, the effective signal-to-interference-and-noise ratios (SINRs) are functions of the TPE coefficients in all cells. However, the deterministic equivalents only depend on the channel statistics, and not the instantaneous realizations, and can thus be optimized beforehand/offline. The joint optimization of all the polynomial coefficients is shown to be mathematically similar to the problem of multi-cast beamforming optimization considered in [18]. We can therefore adapt the state-of-the-art optimization procedures from the multi-cast area and use these for offline optimization. We provide a simulation example that reveals that the optimized coefficients can provide even higher network throughput than RZF precoding at relatively low TPE orders, where the TPE order refers to the number of matrix polynomial terms.

### 1.1 Notation

Boldface (lower case) is used for column vectors, , and (upper case) for matrices, . Let , , and denote the transpose, conjugate transpose, and conjugate of , respectively, while denotes the matrix trace function. Moreover, denotes the set of matrices with size , whereas is the set of vectors with size . The identity matrix is denoted by and the stands for the vector with all entries equal to zero. The expectation operator is denoted and denotes the variance. The spectral norm is denoted by and equals the norm when applied to a vector. A circularly symmetric complex Gaussian random vector is denoted , where is the mean and is the covariance matrix. For an infinitely differentiable monovariate function , the th derivative at (i.e., ) is denoted by and more concisely by when . The big notation and little notation mean that is bounded or approaches zero, respectively, as .

## 2 System Model

This section defines the multi-cell system with flat-fading channels, linear precoding, and channel estimation errors.

### 2.1 Transmission Model

We consider the downlink of a multi-cell system consisting of cells. Each cell consists of an -antenna BS and single-antenna UTs. We consider a time-division duplex (TDD) protocol where the BS acquires instantaneous CSI in the uplink and uses it for the downlink transmission by exploiting channel reciprocity. We assume that the TDD protocols are synchronized across cells, such that pilot signaling and data transmission take place simultaneously in all cells.

The received complex baseband signal at the th UT in the th cell is

where is the transmit signal from the th BS and is the channel vector from that BS to the th UT in the th cell, and is additive white Gaussian noise (AWGN), with variance , at the receiver’s input.

The small-scale channel fading is modeled as follows.

The two technical conditions on in Assumption A- ? enable asymptotic analysis and follow from the law of energy conservation and from increasing the physical size of the array with ; see [21] for a detailed discussion.

Based on this assumption, the BS in the th cell transmits the signal

The latter is obtained by letting be the precoding matrix of the th BS and be the vector containing all the data symbols for UTs in the th cell. The transmission at BS is subject to a total transmit power constraint

where is the average transmit power per user in the th cell.

The received signal can now be expressed as

A well-known feature of large-scale MIMO systems is channel hardening, which means that the effective useful channel of a UT converges to its average value when grows large. Hence, it is sufficient for each UT to have only statistical CSI and the performance loss vanishes as [7]. An ergodic achievable information rate can be computed using a technique from [22], which has been applied to large-scale MIMO systems in [5] (among many others). The main idea is to decompose the received signal as

and assume that the channel gain is known at the corresponding UT, along with its variance and the average sum interference power caused by simultaneous transmissions to other UTs in the same and other cells. By treating the inter-user interference (from the same and other cells) and channel uncertainty as worst-case Gaussian noise, UT in cell can achieve the ergodic rate

without knowing the instantaneous values of its channel [22]. The parameter is given in and can be interpreted as the effective average SINR of the th UT in the th cell.

The last expression in is obtained by using the following identities:

The achievable rates only depend on the statistics of the inner products of the channel vectors and precoding vectors. The precoding vectors should ideally be selected to achieve a strong signal gain and little inter-user and inter-cell interferences. This requires some instantaneous CSI at the BS, as described next.
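To make the structure of the effective SINR concrete, the following minimal sketch estimates it by Monte Carlo averaging for a single cell. It assumes the standard form used with the technique of [22]: the squared mean of the useful inner product in the numerator, and noise variance, the variance of the useful inner product, and the average interference power in the denominator. The function names, the array layout `(T, M, K)`, and the single-cell simplification are illustrative assumptions rather than the notation of this paper.

```python
import numpy as np

def effective_sinr(H, G, k, noise_var):
    """Monte Carlo estimate of the effective SINR of user k (single-cell sketch).

    H, G: complex arrays of shape (T, M, K) holding T realizations of the
    channel vectors and precoding vectors, one column per user (assumed layout).
    """
    # inner[t, m] = h_k^H g_m for realization t
    inner = np.einsum("tm,tmn->tn", H[:, :, k].conj(), G)
    signal = np.abs(inner[:, k].mean()) ** 2       # |E[h_k^H g_k]|^2
    uncertainty = inner[:, k].var()                # var[h_k^H g_k], real-valued
    power = (np.abs(inner) ** 2).mean(axis=0)      # E|h_k^H g_m|^2 for every m
    interference = power.sum() - power[k]          # all users except k
    return signal / (noise_var + uncertainty + interference)

def ergodic_rate(sinr):
    """Achievable rate in bit/s/Hz for a given effective SINR."""
    return np.log2(1.0 + sinr)

# Tiny deterministic example: one antenna, two users, two identical realizations.
H = np.ones((2, 1, 2), dtype=complex)
G = np.zeros((2, 1, 2), dtype=complex)
G[:, 0, 0] = 1.0                                   # only user 0 is served
sinr_no_interference = effective_sinr(H, G, k=0, noise_var=0.5)
G[:, 0, 1] = 1.0                                   # now user 1 interferes
sinr_with_interference = effective_sinr(H, G, k=0, noise_var=0.5)
```

In the multi-cell setting, the interference term would additionally include the inner products with the precoding vectors of all other cells.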

### 2.2 Model of Imperfect Channel State Information at BSs

Based on the TDD protocol, uplink pilot transmissions are utilized to acquire instantaneous CSI at each BS. Each UT in a cell transmits a mutually orthogonal pilot sequence, which allows its BS to estimate the channel to this user. Due to the limited channel coherence interval of fading channels, the same set of orthogonal sequences is reused in each cell; thus, the channel estimate is corrupted by pilot contamination emanating from neighboring cells [5]. When estimating the channel of UT in cell , the corresponding BS takes its received pilot signal and correlates it with the pilot sequence of this UT. This results in the processed received signal

where and is the effective training SNR [7]. The MMSE estimate of is given as [24]:

where

and is the channel covariance matrix of vector , as described in Assumption A- ?. The estimated channels from the th BS to all UTs in its cell are denoted

and will be used in the precoding schemes considered herein.

For notational convenience, we define the matrices

and note that since the channels are Rayleigh fading and the MMSE estimator is used.
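The effect of pilot contamination on the MMSE estimate can be illustrated numerically. The sketch below assumes a two-cell observation model in which the processed pilot signal is the sum of the desired channel, one interfering channel that reuses the same pilot, and noise with variance equal to the inverse training SNR; the estimator is the textbook linear MMSE filter for this model. The covariance matrices, the variable names, and the model itself are assumptions made for illustration and are not taken from the expressions of this paper.

```python
import numpy as np

def cn_samples(R, T, rng):
    """Draw T samples of CN(0, R) as rows, using a Cholesky factor of R."""
    M = R.shape[0]
    Lc = np.linalg.cholesky(R)
    w = (rng.standard_normal((T, M)) + 1j * rng.standard_normal((T, M))) / np.sqrt(2)
    return w @ Lc.T                      # row t is (Lc w_t)^T, with covariance Lc Lc^H = R

rng = np.random.default_rng(1)
M, T, rho_tr = 4, 20000, 10.0            # antennas, realizations, training SNR (assumed values)
idx = np.arange(M)
R_own = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # hypothetical covariance, desired UT
R_other = 0.3 * np.eye(M)                            # hypothetical covariance, contaminating UT

h_own = cn_samples(R_own, T, rng)
h_other = cn_samples(R_other, T, rng)
noise = cn_samples(np.eye(M) / rho_tr, T, rng)
y = h_own + h_other + noise              # processed pilot signal (assumed model)

# Linear MMSE filter: h_hat = R_own (I / rho_tr + R_own + R_other)^{-1} y
S = np.linalg.inv(np.eye(M) / rho_tr + R_own + R_other)
W = R_own @ S
h_hat = y @ W.T

Phi = R_own @ S @ R_own                  # theoretical covariance of the estimate
Phi_emp = h_hat.T @ h_hat.conj() / T     # empirical covariance of the estimate
mse_estimate = np.mean(np.sum(np.abs(h_own - h_hat) ** 2, axis=1))
mse_raw = np.mean(np.sum(np.abs(h_own - y) ** 2, axis=1))
```

Under these assumptions, the empirical covariance of the estimate should match `Phi` up to sampling error, and the MMSE estimate should have a smaller error than the unprocessed pilot signal.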

## 3 Review on Regularized Zero-Forcing Precoding

The optimal linear precoding (in terms of maximal weighted sum rate or other criteria) is unknown under imperfect CSI and requires extensive optimization procedures under perfect CSI [25]. Therefore, only heuristic precoding schemes are feasible in fading multi-cell systems. Regularized zero-forcing (RZF) is a state-of-the-art heuristic scheme with a simple closed-form precoding expression [14]. The popularity of this scheme is easily seen from its many alternative names: transmit Wiener filter [26], signal-to-leakage-and-noise ratio maximizing beamforming [27], generalized eigenvalue-based beamformer [28], and virtual SINR maximizing beamforming [29]. This section provides a brief review of prior performance results on RZF precoding in large-scale multi-cell MIMO systems. We also explain why RZF is computationally intractable to implement in practical large systems.

Based on the notation in [7], the RZF precoding matrix used by the BS in the th cell is

where the scaling parameter is set so that the power constraint in is fulfilled. The regularization parameters and have the following properties.

Several prior works have considered the optimization of the parameter in the single-cell case [8] when . This parameter provides a balance between maximizing the channel gain at each intended receiver (when is large) and suppressing the inter-user interference (when is small), and thus depends on the SNRs, channel uncertainty at the BSs, and the system dimensions [14]. Similarly, the deterministic matrix describes a subspace where interference will be suppressed; for example, this can be the joint subspace spanned by (statistically) strong channel directions to users in neighboring cells, as proposed in [30]. The optimization of these two regularization parameters is a difficult problem in general multi-cell scenarios. To the authors’ knowledge, previous works dealing with the multi-cell scenario have been restricted to considering intuitive choices of the regularizing parameters and . For example, this was recently done in [7], where the performance of the RZF precoding was analyzed in the following asymptotic regime.

In particular, it was shown in [7] that the SINRs perceived by the users tend to deterministic quantities in the large- regime. These quantities depend only on the statistics of the channels and are referred to as *deterministic equivalents*.

In the sequel, by deterministic equivalent of a sequence of random variables , we mean a deterministic sequence which approximates such that

Before reviewing some results from [7], we shall recall some deterministic equivalents that play a key role in the next analysis. They are introduced in the following theorem.^{1}

Theorem ? shows how to approximate quantities with only one occurrence of the resolvent matrix . For many situations, this kind of result is sufficient to entirely characterize the asymptotic SINR, in particular when dealing with the performance of linear receivers [31]. However, when precoding is considered, random terms involving two resolvent matrices arise, a case which is outside the scope of Theorem ?. For that, we recall the following result from [8], which establishes deterministic equivalents for quantities of this kind.

The performance of RZF precoding depends on a sequence of deterministic equivalents which we denote by and . These are defined as

We are now in position to state the result establishing the convergence of the SINRs with RZF precoding.

### 3.1 Complexity Issues of RZF Precoding

The SINRs achieved by RZF precoding converge in the large- regime to the deterministic equivalents in Theorem ?. However, the precoding matrices are still random quantities that need to be recomputed at the same pace as the channel knowledge is updated. With the typical coherence time of a few milliseconds, we thus need to compute the large-dimensional matrix inverse in hundreds of times per second. The number of arithmetic operations needed for matrix inversion scales cubically in the rank of the matrix, so this matrix operation is intractable in large-scale systems; we refer to [16] for detailed complexity discussions. To reduce the implementation complexity and maintain most of the RZF performance, the low-complexity TPE precoding was proposed in [16] and [17] for single-cell systems. This new precoding scheme has two main benefits over RZF precoding: 1) the precoding matrix is not precomputed at the beginning of each coherence interval, so there are no computational delays and the computational operations are spread out uniformly over time; 2) the precoding computation is divided into a number of simple matrix-vector multiplications which can be highly parallelized and can be implemented using a multitude of simple application-specific circuits. The next section extends this class of precoding schemes to practical multi-cell scenarios.

## 4 Truncated Polynomial Expansion Precoding

Building on the concept of truncated polynomial expansion (TPE), we now provide a new class of low-complexity linear precoding schemes for the multi-cell case. We recall that the TPE concept originates from the Cayley-Hamilton theorem which states that the inverse of a matrix of dimension can be written as a weighted sum of its first powers:

where are the coefficients of the characteristic polynomial. A simplified precoding could, hence, be obtained by taking only a truncated sum of the matrix powers. We refer to it as TPE precoding.
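The Cayley-Hamilton identity recalled above can be checked numerically. The following minimal sketch inverts a small positive definite matrix using only its powers and the coefficients of its characteristic polynomial, which are obtained here from `numpy.poly`. The function name and the test matrix are illustrative assumptions; the sketch is meant to verify the identity, not to serve as a practical inversion method.

```python
import numpy as np

def cayley_hamilton_inverse(W):
    """Invert W via its powers and characteristic-polynomial coefficients."""
    n = W.shape[0]
    # np.poly returns c[0..n] with p(x) = c[0] x^n + c[1] x^(n-1) + ... + c[n], c[0] = 1
    c = np.poly(W)
    # Cayley-Hamilton gives p(W) = 0, hence
    # W^{-1} = -(1 / c[n]) * sum_{k=0}^{n-1} c[k] W^(n-1-k)
    acc = np.zeros((n, n), dtype=complex)
    power = np.eye(n, dtype=complex)       # holds W^(n-1-k) inside the loop
    for k in range(n - 1, -1, -1):
        acc += c[k] * power
        power = power @ W
    return -acc / c[n]

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
W = H @ H.conj().T / 6 + np.eye(4)         # Hermitian positive definite test matrix
W_inv = cayley_hamilton_inverse(W)
```

The sum runs over all powers up to the matrix dimension minus one, which is exactly what becomes costly for large arrays and motivates truncating the sum.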

For and truncation order , the proposed TPE precoding is given by the precoding matrix:

where

and are the scalar coefficients that are used in cell . While RZF precoding only has the design parameter , the proposed TPE precoding scheme offers a larger set of design parameters. These polynomial coefficients define a parameterized class of precoding schemes ranging from MRT (if ) to RZF precoding when and given by the coefficients based on the characteristic polynomial of . We refer to as the *TPE order* corresponding to the th cell and note that the corresponding polynomial degree in is . For any , the polynomial coefficients have to be treated as design parameters that should be selected to maximize some appropriate system performance metric [16]. An initial choice is

where and are as in RZF precoding, while the parameter can take any value such that . This expression is obtained by calculating a Taylor expansion of the matrix inverse. The coefficients in give performance close to that of RZF precoding when [16]. However, the optimization of the RZF precoding has not, thus far, been feasible. Therefore, we can obtain even better performance than the suboptimal RZF, using only small TPE orders (e.g., ), if the coefficients are optimized with the system performance metric in mind. This optimization of the polynomial coefficients in multi-cell systems is dealt with in Section 4.2 and the results are evaluated in Section 5.

A fundamental property of TPE is that need not scale with and , because is equivalent to inverting each eigenvalue of and the polynomial expansion effectively approximates each eigenvalue inversion by a Taylor expansion with terms [34]. More precisely, this means that the approximation error per UT is only a function of (and not the system dimensions), which was proved for multiuser detection in [35] and validated numerically in [16] for TPE precoding.
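The truncation idea can be sketched with a plain Neumann (Taylor) series for the regularized inverse, evaluated using matrix-vector products only. This sketch uses untuned series coefficients rather than the optimized coefficients proposed in this paper, and the channel layout (an `M x K` matrix with one column per user), the function name `tpe_apply`, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def tpe_apply(H, s, xi, J, kappa):
    """Approximate (H H^H / K + xi I)^{-1} H s with J Neumann-series terms.

    Uses (A)^{-1} = kappa * sum_l (I - kappa A)^l, truncated after J terms,
    where A = H H^H / K + xi I. Only matrix-vector products are needed.
    """
    K = H.shape[1]
    b = H @ s
    term = b.copy()                        # (I - kappa A)^l b, starting at l = 0
    acc = b.copy()
    for _ in range(1, J):
        a_term = H @ (H.conj().T @ term) / K + xi * term   # A @ term
        term = term - kappa * a_term
        acc = acc + term
    return kappa * acc

rng = np.random.default_rng(0)
M, K, xi = 64, 16, 0.5                     # assumed dimensions and regularization
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
s = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)

A = H @ H.conj().T / K + xi * np.eye(M)
exact = np.linalg.solve(A, H @ s)          # RZF-style reference (unnormalized)
kappa = 1.0 / np.linalg.norm(A, 2)         # ensures the series converges
orders = (1, 2, 4, 8, 16, 64)
errors = [
    np.linalg.norm(tpe_apply(H, s, xi, J, kappa) - exact) / np.linalg.norm(exact)
    for J in orders
]
```

With `J = 1` the result is a scaled matched filter, and the relative error shrinks as `J` grows, which mirrors the transition from MRT to RZF described above. Each additional term costs two matrix-vector products, so the loop maps naturally onto a multistage implementation.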

Next, we provide an asymptotic analysis of the SINR for TPE precoding.

### 4.1 Large-Scale Approximations of the SINRs

In this section, we show that in the large-() regime, defined by Assumption A- ?, the SINR experienced by the th UT served by the th cell, can be approximated by a deterministic term, depending solely on the channel statistics. Before stating our main result, we shall cast in a simpler form by introducing some extra notation.

Let and let and be given by

Then, the SINR experienced by the th user in the th cell is

Since and are of finite dimensions, it suffices to determine an asymptotic approximation of the expected value of each of their elements. For that, similarly to our work in [16], we link their elements to the resolvent matrix

by introducing the functionals and

it is straightforward to see that:

where and . Higher order moments of the spectral distribution of appear when taking derivatives of or . The asymptotic convergence of these moments requires an extra assumption ensuring that the spectral norm of is almost surely bounded. This assumption is expressed as follows.

Before stating our main result, we shall define (in a similar way, as in the previous section) the deterministic equivalents that will be used:

As it has been shown in [36], the computation of the first derivatives of and at , which we denote by and , can be performed using the iterative Algorithm 1, which we provide in Appendix Section 10. These derivatives and play a key role in the asymptotic expressions for the SINRs. We are now in a position to state our main results.

The proof is given in Appendix Section 8.

The proof is given in Appendix Section 9.

Theorem ? provides the tools to calculate the derivatives of and at , in a recursive manner.

Now, denote by and the deterministic quantities given by

We can now iteratively compute the deterministic sequences and as

Then, from Theorem ?, we have

Plugging the deterministic equivalent of Theorem ? into and , we get the following corollary.

This corollary gives asymptotic equivalents of and , which are the random quantities that appear in the SINR expression in . Hence, we can use these asymptotic equivalents to obtain an asymptotic equivalent of the SINR for all UTs in every cell.

### 4.2 Optimization of the System Performance

The previous section developed deterministic equivalents of the SINR at each UT in the multi-cell system, as a function of the polynomial coefficients of the TPE precoding applied in each of the cells. These coefficients can be selected arbitrarily, but should not be functions of any instantaneous CSI—otherwise the low complexity properties are not retained. Furthermore, the coefficients need to be scaled such that the transmit power constraints

are satisfied in each cell . By plugging the TPE precoding expression from into , this implies

In this section, we optimize the coefficients to maximize a general metric of the system performance. To facilitate the optimization, we use the asymptotic equivalents of the SINRs developed in this paper and apply the corresponding asymptotic analysis in order to replace the constraint with its asymptotically equivalent condition

where for all and .

The performance metric in this section is the weighted max-min fairness, which can provide a good balance between system throughput, user fairness, and computational complexity [25].^{2}

This problem has a structure similar to that of the *joint max-min fair beamforming* problem previously considered in [19] within the area of multi-cast beamforming communications with several separate user groups. The analogy is the following: the users in cell in our work correspond to the th multi-cast group in [19], while the coefficients in correspond to the multi-cast beamforming to group in [19]. The main difference is that our problem is more complicated due to the structure of the power constraints, the negative sign of the second term in the denominators of the SINRs, and the user weights. Nevertheless, the tight mathematical connection between the two problems implies that is an NP-hard problem because of [19]. One should therefore focus on finding a sensible approximate solution to , instead of the global optimum.

Approximate solutions to can be obtained by well-known techniques from the multi-cast beamforming literature (e.g., [18]). For the sake of brevity, we only describe the approximation approach of semi-definite relaxation in this section. To this end, we write in its equivalent epigraph form

where the auxiliary variable represents the minimal weighted rate among the users. If we substitute the positive semi-definite rank-one matrix for a positive semi-definite matrix of arbitrary rank, we obtain the following tractable relaxed problem

This is a so-called semi-definite relaxation of the original problem . Interestingly, for any fixed value of , is a convex semi-definite optimization problem because the power constraints are convex and the SINR constraints can be written in the convex form . Hence, we can solve by standard techniques from convex optimization theory for any fixed [39]. In order to also find the optimal value of , we note that the SINR constraints become stricter as grows and thus we need to find the largest value for which the SINR constraints are still feasible. This solution process is formalized by the following theorem.

Based on Theorem ?, we devise the following algorithm based on conventional bisection line search.
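The bisection line search can be sketched generically as follows. The sketch assumes that feasibility is monotone in the auxiliary variable (feasible below the optimum, infeasible above it), as argued above. The feasibility check, which in the actual procedure would be a convex semi-definite feasibility problem solved for a fixed target, is replaced here by a hypothetical callback `is_feasible`; the toy threshold is chosen purely for illustration.

```python
from typing import Callable

def bisect_max_feasible(
    is_feasible: Callable[[float], bool],
    lower: float,
    upper: float,
    tol: float = 1e-6,
) -> float:
    """Return (within tol) the largest value in [lower, upper] that is feasible.

    Assumes is_feasible(lower) is True and feasibility is monotone:
    once a value is infeasible, every larger value is infeasible as well.
    """
    if not is_feasible(lower):
        raise ValueError("the lower end of the interval must be feasible")
    while upper - lower > tol:
        candidate = 0.5 * (lower + upper)
        if is_feasible(candidate):
            lower = candidate      # the optimum lies in [candidate, upper]
        else:
            upper = candidate      # the optimum lies in [lower, candidate)
    return lower

# Toy stand-in for the convex feasibility problem: feasible up to a threshold.
TOY_OPTIMUM = 1.2345
best = bisect_max_feasible(lambda gamma: gamma <= TOY_OPTIMUM, lower=0.0, upper=10.0)
```

The number of feasibility problems to be solved grows only logarithmically with the ratio between the initial interval length and the tolerance, which is why a finite upper bound on the optimum is needed before the search starts.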

In order to apply Algorithm ?, we need to find a finite upper bound on the optimum of . This is achieved by further relaxation of the problem. For example, we can remove the inter-cell interference and maximize the SINR of each user in each cell by solving the problem

This is essentially a generalized eigenvalue problem and therefore solved by scaling the vector to satisfy the power constraint. We obtain a computationally tractable upper bound by taking the smallest of the relaxed SINR among all the users:

The solution to the relaxed problem in is a set of matrices that, in general, can have ranks greater than one. In our experience, the rank is indeed one in many practical cases, but when the rank is larger than one we cannot apply the solution directly to the original problem formulation in . A standard approach to obtain rank-one approximations is to select the principal eigenvectors of and scale each one to satisfy the power constraints in with equality.
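The rank-one approximation step can be sketched as follows. The sketch assumes that the power constraint of a cell takes the quadratic form `w^H C w = P` for some positive definite matrix `C`; this form, the function name, and the example matrices are illustrative assumptions.

```python
import numpy as np

def rank_one_from_psd(A, C, P):
    """Principal eigenvector of the PSD matrix A, scaled so that w^H C w = P."""
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    v = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
    quad = np.real(v.conj() @ C @ v)       # positive because C is positive definite
    return np.sqrt(P / quad) * v

# Example: a nearly rank-one relaxed solution dominated by the direction u.
u = np.array([1.0, 2.0, -1.0])
A = 3.0 * np.outer(u, u) + 0.1 * np.eye(3)
C = np.diag([1.0, 2.0, 0.5])               # hypothetical power-constraint matrix
w = rank_one_from_psd(A, C, P=2.0)
alignment = abs(w @ u) / (np.linalg.norm(w) * np.linalg.norm(u))
```

When the relaxed solution already has rank one, this procedure recovers it exactly up to the scaling; otherwise the result is a feasible but generally suboptimal point of the original problem.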

As mentioned in the proof of Theorem ?, the optimization problem in belongs to the class of quasi-convex problems. As such, the computational complexity scales polynomially with the number of UTs and the TPE orders . It is important to note that the number of base station antennas has no impact on the complexity. The exact number of arithmetic operations depends strongly on the choice of the solver algorithm (e.g., interior-point methods [40]) and on whether the implementation is problem-specific or designed for general purposes. As a rule of thumb, polynomial complexity means that the scaling is between linear and cubic in the parameters [41]. In any case, the complexity is prohibitively large for real-time computation, but this is not an issue since the coefficients are only functions of the statistics and not the instantaneous channel realizations. In other words, the coefficients for a given multi-cell setup can be computed offline, e.g., by a central node or distributively using decomposition techniques [42]. Even if the channel statistics change with time, this happens at a relatively slow rate (as compared to the channel realizations), which makes the complexity negligible compared to the precoding computations [16]. Furthermore, we note that the same coefficients can be used for each subcarrier in a multi-carrier system, as the channel statistics are essentially the same across all subcarriers, even though the channel realizations are different due to the frequency-selective fading.

## 5 Simulation Example

This section provides a numerical validation of the proposed TPE precoding in a practical deployment scenario. We consider a three-sector site composed of cells and BSs; see Figure 1. Similar to the channel model presented in [38], we assume that the UTs in each cell are divided into groups. UTs of a group share approximately the same location and statistical properties. We assume that the groups are uniformly distributed in an annulus with an outer radius of and an inner radius of , which is compliant with a future LTE urban macro deployment [43].

The pathloss between UT in group of cell and cell follows the same expression as in [38] and is given by

where is the pathloss exponent and is the reference distance. Each base station is equipped with a horizontal linear array of antennas. The radiation pattern of each antenna is

where degrees and is measured with respect to the BS boresight. We consider a similar channel covariance model as the one-ring model described in Remark ?. The only difference is that we scale the covariance matrix in by the pathloss and the antenna gain:

We assume that each BS has acquired imperfect CSI from uplink pilot transmissions with . In the downlink, we assume for simplicity that all BSs use the same normalized transmit power of with .

The objective of this section is to compare the network throughput of the proposed TPE precoding with that of conventional RZF precoding. To make a fair comparison, the coefficients of the TPE precoding are optimized as described in Remark ?. More specifically, each user weight in the semi-definite relaxation problem is set to the asymptotic rate that the same user would achieve using RZF precoding. Consequently, the relative differences in network throughput that we will observe in this section hold approximately also for the achievable rate of each UT.

Using Monte-Carlo simulations, we show in Figure 2 the average rate per UT, which is defined as

We consider a scenario with users in each cell and different numbers of antennas at each BS: . The TPE order is the same in all cells: . As expected, the user rates increase drastically with the number of antennas, due to the higher spatial resolution. The throughput also increases monotonically with the TPE order , as the number of degrees of freedom becomes larger. Note that, if is equal to , increasing leads to a negligible performance improvement that might not justify the increased complexity of having a greater . TPE orders of less than can be relevant in situations when the need for interference suppression is smaller than usual, for example, if is large (so that the user channels are likely to be near-orthogonal) or when the UTs anticipate small SINRs, due to low performance requirements or large cell sizes. The TPE order is limited only by the available hardware resources and we recall from [16] that increasing corresponds solely to duplicating already employed circuitry.

Contrary to the single-cell case analyzed in [16], where TPE precoding was merely a low-complexity approximation of the optimal RZF precoding, we observe in Figure 2 that TPE precoding achieves higher user rates for all than the suboptimal RZF precoding (obtained for ). This is due to the optimization of the polynomial coefficients in Section 4.2, which enables a certain amount of inter-cell coordination, a feature which could not be implemented easily for RZF precoding in multi-cell scenarios.

From the results of our work in [16], we expected that RZF precoding would provide the highest performance if the regularization coefficient is optimized properly. To confirm this intuition, we consider the case where all BSs employ the same regularization coefficient . Figure 3 shows the performance of the RZF and TPE precoding schemes as a function of , when , , and . We remind the reader that the TPE precoding scheme indirectly depends on the regularization coefficient , since while solving the optimization problem , we choose the user weights as the asymptotic rates that are achieved by RZF precoding. Figure 3 shows that RZF precoding provides the highest performance if the regularization coefficient is chosen very carefully, but TPE precoding is generally competitive in terms of both user performance and implementation complexity.

In an additional experiment, we investigate how the performance depends on the effective training SNR (). Figure 4 shows the average rate per UT for , , , and . Note that, as expected, both precoding schemes achieve higher performance as the effective training SNR increases.

The observed high performance of our TPE precoding scheme is essentially due to the good accuracy of the asymptotic deterministic equivalents. To assess how accurate our asymptotic results are, we show in Figure 5 the empirical and theoretical UT rates with TPE precoding () and RZF precoding with respect to , when . We see that the deterministic equivalents yield a good accuracy even for finite system dimensions. Similar accuracies are also achieved for other regularization factors (recall from Figure 2 that the value is not optimal), but we chose to visualize a case where the differences between TPE and RZF are large so that the curves are non-overlapping.

## 6 Conclusion

This paper generalizes the recently proposed TPE precoder to multi-cell large-scale MIMO systems. This class of precoders originates from the high-complexity RZF precoding scheme by approximating the regularized channel inversion by a truncated polynomial expansion.

The model includes important multi-cell characteristics, such as user-specific channel statistics, pilot contamination, different TPE orders in different cells, and cell-specific power constraints. We derived asymptotic SINR expressions, which depend only on channel statistics, that are exploited to optimize the polynomial coefficients in an offline manner.

The effectiveness of the proposed TPE precoding is illustrated numerically. Contrary to the single-cell case, where RZF leads to a near-optimal performance when the regularization coefficient is properly chosen, the use of the RZF precoding in the multi-cell scenario is more delicate. Until now, there is no general rule for the selection of its regularization coefficients. This enabled us to achieve higher throughput with our TPE precoding for certain scenarios. This is a remarkable result, because TPE precoding therefore has *both* lower complexity and better throughput. This is explained by the use of optimal polynomial coefficients in TPE precoding, while the corresponding optimization of the regularization matrix in RZF precoding has not been obtained so far.

## 7 Some Useful Results

Applying Lemma ? to the function , we obtain the following result.

## 8 Proof of Theorem

The objective of this section is to find deterministic equivalents for and . These quantities involve the resolvent matrix

For technical reasons, the resolvent matrix , which is obtained by removing the contribution of vector , will be used extensively. In particular, if denotes the matrix after removing the th column, is given by

With this notation at hand, we are now in a position to prove Theorem ?. In the sequel, by “controlling a certain quantity” we mean studying its behaviour in the asymptotic regime.

### 8.1 Controlling and

Next, we study sequentially the random quantities and . Using Lemma ?, the matrix can be written as