Joint Pilot Design and Uplink Power Allocation in Multi-Cell Massive MIMO Systems
Abstract
This paper considers pilot design to mitigate pilot contamination and provide good service for everyone in multi-cell Massive multiple-input multiple-output (MIMO) systems. Instead of modeling the pilot design as a combinatorial assignment problem, as in prior works, we express the pilot signals using a pilot basis and treat the associated power coefficients as continuous optimization variables. We compute a lower bound on the uplink capacity for Rayleigh fading channels with maximum ratio detection that applies with arbitrary pilot signals. We further formulate the max-min fairness problem under power budget constraints, with the pilot signals and data powers as optimization variables. Because this optimization problem is non-deterministic polynomial-time hard due to signomial constraints, we then propose an algorithm to obtain a local optimum with polynomial complexity. Our framework serves as a benchmark for pilot design in scenarios with either ideal or non-ideal hardware. Numerical results demonstrate that the proposed optimization algorithms perform close to the optimal solution obtained by exhaustive search over different pilot assignments, and that the new pilot structure and optimization bring large gains over the state-of-the-art suboptimal pilot design.
I Introduction
The demands on capacity and reliability in wireless cellular networks are continuously increasing. It is known that multiple-input multiple-output (MIMO) techniques can improve both capacity and reliability [1, 2, 3], but current systems only support up to eight antennas per base station (BS). While codebook-based channel acquisition is attractive in such small-scale MIMO systems, these methods are not scalable and are unable to support the fifth generation (5G) demands on spectral efficiency (SE) in non-line-of-sight conditions [4]. Massive MIMO was proposed in [5] as a possible solution and it has emerged as a key 5G technology, because it offers significant improvements in both SE and energy efficiency [6, 5, 4, 7, 8]. By equipping the BSs with hundreds of antennas, mutual interference, thermal noise, and small-scale fading can be almost eliminated by virtue of the channel hardening and favorable propagation properties [6]. The BSs only need to use linear detection schemes, such as maximum ratio (MR) or zero forcing, to achieve nearly optimal performance [9]. In addition, the SE only depends on the large-scale fading coefficients, thus power control algorithms are easier to deploy than in small-scale MIMO systems, which are greatly affected by small-scale fading [10].
The uplink (UL) detection and downlink precoding in Massive MIMO are based on instantaneous channel state information (CSI), which the BSs obtain from UL pilot signals. Mutually orthogonal pilots are desirable, but this is impractical in multi-cell scenarios since the pilot overhead would be proportional to the total number of users in the entire system. The consequence is that the pilot signals need to be reused across cells. This leads to pilot contamination [11, 12], where users sending the same pilot degrade each other's channel estimation and cause large mutual interference. Hence, the pilot design is of key importance in Massive MIMO and should be optimized to mitigate the pilot contamination effects.
The baseline scheme for mitigating pilot contamination is to introduce a pilot reuse factor $f$, such that each pilot is only reused in $1/f$ of the cells. This approach, which was studied in [13, 14, 15, 16], can greatly reduce the pilot contamination, even if the pilots are randomly assigned within each cell. However, this gain comes at the cost of using $f$ times more pilots than in a system reusing the pilots in every cell. For any given cell, only a few users in the neighboring cells cause most of the potential pilot contamination, thus it is most important that these potential contaminators are assigned different pilots from the users in the given cell. Algorithms for coordinated pilot assignment were proposed in [17, 18, 19, 20]. A pilot reuse dictionary was defined in [17] and the corresponding pilot assignment problem was shown to be non-deterministic polynomial-time hard (NP-hard), which motivates the design of heuristic assignment mechanisms. Although [17] proposed several greedy algorithms, the optimized SE was far from that with exhaustive search over all pilot assignments. Graph theory was used for pilot assignment in [18], by exploiting variations in the large-scale fading coefficients. A method called “smart pilot assignment” was proposed in [20] to enhance the max-min fairness SE level, by optimizing a heuristic mutual interference metric. Alternatively, [19] formulated the pilot assignment problem as a potential game. The numerical results in [18, 19, 20] show performance that is similar to an exhaustive search, but with a substantially lower computational complexity. Moreover, the authors of [21, 22] utilized particular channel properties to reduce channel estimation errors and mitigate pilot contamination. In particular, [21] utilized the orthogonality among different channels and an assumed low-rankness of the channel covariance matrices. An adjustable phase shift pilot construction was suggested in [22] based on the relationship between channel correlations in the frequency domain and their power angle-delay spectrum. However, all these algorithms rely on the assumption of fixed pilot and data power.
The pilot and payload data powers are usually treated as constants in the Massive MIMO literature, but it is known from [23, 12] that the performance can be much improved by using the optimal power allocation, which balances the mutual interference levels. To improve the channel estimation quality, more power might also be assigned to the pilots than to the data transmissions [24, 25]. For single-cell systems, [24] showed that a pilot-data power imbalance is especially important for cell-edge users. Moreover, the power allocation that maximizes the sum SE is very different from the one that maximizes the max-min SE. Similar behaviors for multi-cell systems were observed in [25]. The authors in [26] considered power optimization problems with pilot reuse factors. To the best of our knowledge, no prior work analyzes joint pilot design and power control in Massive MIMO systems.
In this paper, we propose a novel pilot design and optimize the UL performance in multi-cell Massive MIMO systems, using the max-min fairness utility. Our main contributions are:

We propose a new pilot design where the pilot signals are treated as continuous variables. We demonstrate that previous pilot designs are special cases of our proposal.

Based on the proposed pilot design, we derive closed-form expressions of the SE with Rayleigh fading channels and MR detection, for the cases of ideal hardware and with hardware impairments. These expressions explicitly demonstrate how the SE is affected by mutual interference, noise, and pilot contamination.

We formulate the max-min fairness problem for the proposed pilot design, by treating the pilot signals, pilot powers, and data powers as optimization variables. This is an NP-hard signomial program, so we propose an algorithm that finds a local optimum in polynomial time. For comparison, the optimal solution obtained by an exhaustive search over different pilot assignments is also investigated.

The proposed algorithms are evaluated numerically, with either ideal hardware or hardware impairments. The results show that our local solution is close to the global optimum by exhaustive search over different pilot assignments and demonstrate significant improvements over the heuristic algorithms in prior works.
A preliminary version of this work, focusing only on pilot optimization with fixed data powers, was presented in [27].
The rest of this paper is organized as follows: Section II presents our proposed pilot structure and compares it with prior works. Lower bounds on the UL ergodic SE for arbitrary pilots are derived in Section III, while Section IV formulates the max-min fairness optimization problems and provides the global and local solutions. Sections V and VI extend our research to the case of hardware impairments and correlated Rayleigh fading, respectively. Finally, Section VII gives extensive numerical results and some conclusions are provided in Section VIII.
Notations: Lowercase bold letters are used for vectors and uppercase letters are used for matrices. $(\cdot)^T$ and $(\cdot)^H$ stand for the regular transpose and Hermitian transpose, respectively. The superscript $(\cdot)^*$ denotes the complex conjugate of a complex number. $\mathbf{I}_n$ is the identity matrix of size $n \times n$. $\mathbb{C}^{m \times n}$ ($\mathbb{R}^{m \times n}$) is the space of complex (real) $m \times n$ matrices, while $\mathbb{C}^{n}$ denotes the space of length-$n$ complex vectors. $\mathbb{R}_+$ is the set of nonnegative real numbers. $\mathbb{E}\{\cdot\}$ denotes the expectation of a random variable and $\|\cdot\|$ is the Euclidean norm. Finally, $\mathcal{CN}(\cdot,\cdot)$ is the circularly symmetric complex Gaussian distribution, while $\mathcal{N}(\cdot,\cdot)$ is the normal distribution.
II Pilot Designs for Massive MIMO Systems
We consider the UL of a multi-cell Massive MIMO system with $L$ cells. Each cell consists of a BS equipped with $M$ antennas that serves $K$ single-antenna users. All tuples of cell and user indices belong to a set defined as
(1) 
The radio channels vary over time and frequency. We divide the time-frequency plane into coherence intervals, each containing $\tau_c$ samples, such that the channel between each user and each BS is static and frequency flat. In each coherence block, the pilot signaling utilizes $\tau_p$ symbols and the remaining $\tau_c - \tau_p$ symbols are dedicated to data transmission. In this paper, we focus on the UL, so the fraction $1 - \tau_p/\tau_c$ of the coherence interval is dedicated to UL data transmission. However, it is straightforward to extend our work to the downlink by using time division duplex (TDD) and channel reciprocity. We assume to keep the training process feasible and stress that the case is of practical importance since it gives rise to pilot contamination and since is large in practice.
II-A Proposed Pilot Design
Let us denote the mutually orthonormal basis vectors , where is a vector whose th element has unit magnitude, and all other elements are equal to zero. The corresponding basis matrix is
(2) 
We assume that the pilot signals of the users can span arbitrarily over the above basis vectors. We aim at designing a pilot signal collection comprising the pilot signals used by all users in the network and each of them has the length of symbols. The pilot signal of user in cell is and the power that this user assigns to the th pilot basis is denoted as . Thus, the pilot of user in cell is
(3) 
We stress that the pilot construction in (3) can be used to create any set of orthogonal pilot signals (up to a unitary transformation) and many different sets of nonorthogonal signals. (Footnote 1: The pilot signals in (3) are formed as linear combinations of basis vectors in the complex field.) The new pilot design allows the use of nonorthogonal pilot signals even within a cell in order to get extra degrees of freedom to minimize the interference in the network. The total pilot power consumption utilized by user in cell is and we assume that it satisfies the power constraint
(4) 
where is the maximum pilot power for user in cell . The inner product of two pilot signals and is
(5) 
These pilot signals are orthogonal if the product is zero, which only happens when they allocate their powers to different subsets of basis vectors. Otherwise, they are nonorthogonal and then the two users cause pilot contamination to each other. If the square roots of the powers allocated to the users in cell are gathered in matrix form as
(6) 
then the users in cell utilize a pilot matrix defined as
(7) 
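The construction in (3) to (7) can be illustrated with a minimal Python sketch. It assumes the basis is the identity matrix (each basis vector has a single entry equal to one) and uses a hypothetical table `powers` whose entry `[u, b]` is the power that user `u` places on basis vector `b`; the function names are illustrative and not part of the paper.

```python
import numpy as np

def build_pilots(powers):
    """Build one pilot per row from a table of basis powers.

    powers[u, b] is the (nonnegative) power user u puts on basis vector b.
    Assumption: the basis is the identity matrix, so each pilot is the
    element-wise square root of the corresponding row of `powers`.
    """
    powers = np.asarray(powers, dtype=float)
    if np.any(powers < 0):
        raise ValueError("pilot powers must be nonnegative")
    tau_p = powers.shape[1]
    basis = np.eye(tau_p)                 # columns are the basis vectors
    # Row u equals sum_b sqrt(powers[u, b]) * basis[:, b].
    return np.sqrt(powers) @ basis.T

def inner_products(pilots):
    """Gram matrix G[u, v] = pilot_u^H pilot_v."""
    return pilots.conj() @ pilots.T

# Three users sharing a length-3 basis (toy numbers).
powers = np.array([[2.0, 0.0, 0.0],
                   [0.0, 1.0, 1.0],
                   [1.0, 1.0, 0.0]])
pilots = build_pilots(powers)
gram = inner_products(pilots)
total_power = np.diag(gram).real          # power spent by each user
```

In this toy case the first two users place their power on disjoint basis vectors, so their inner product is zero, while the third user overlaps with both and therefore contaminates them.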
We now describe the difference between this new pilot structure and the prior works, for example [20, 18, 24, 25].
II-B Other Pilot Designs
The works [20, 18] considered the assignment of orthogonal pilot signals under the assumption of fixed equal pilot power. Using our notation, the pilot matrix in cell is
(8) 
where is the equal power level of all users. is a permutation matrix, that assigns the pilot signals to each user in cell . The assignment is optimized in [20, 18] to minimize a heuristic mutual interference metric. Note that these works assume orthogonal pilot signals and equal power allocation, which are simplifications compared to (7). These assumptions are generally suboptimal. Apart from this, the selection of the optimal permutation matrices for cell is a combinatorial problem, so to limit the computational complexity [20, 18] and the references therein only study the special case of .
The previous work [24] optimized the pilot powers to maximize functions of the SE, but the paper only considered a single cell without pilot contamination. The authors of [25] optimized the pilot powers to minimize the UL transmit power for a multi-cell system. This work assumed and a fixed pilot assignment. If is the pilot power of user in cell , the square root of the power matrix allocated to the users in cell is a diagonal matrix defined as
(9) 
where denotes the diagonal matrix with the vector on the diagonal. The pilot matrix in cell is then formulated as
(10) 
Similar to (4), the pilot power at user in cell is limited as
(11) 
Since orthogonal pilots and fixed pilot assignment are assumed, this is also a special case of (7). We can combine the pilot structure in (10) and the idea of selecting a permutation matrix in (8) to jointly optimize the power allocation and pilot assignment. In particular, the pilot signals of the users in cell are now defined as
(12) 
This modified pilot design is a special case of (7) and has not been studied in prior works, but will be considered herein. In order to analyze the channel estimation, we define a pilot reuse set including all tuples of cell and user indices that cause pilot contamination to user in cell :
(13) 
We stress that designing an exhaustive search to obtain the best pilot assignment strategy is extremely computationally expensive. (Footnote 2: For the first user in the first cell , there are possibilities of . There are then possible and so on.) As in prior works, we only consider the case when using (12) and we further assume that orthogonal pilots are used within each cell; that is, for any user indices in cell . To perform an exhaustive search, we need to construct a dictionary , see Fig. 1, with all the possible combinations of pilot assignments in the network. Let denote the index of the pilot signal assigned to user in cell . It follows that for since all users within a cell use different pilots. The pilot assignment matrix containing the pilot indices of the users is
(14) 
Each row of contains to and there are different combinations, each defining a permutation matrix for the pilot signals in (8) and (12). The dictionary contains all the pilot assignment matrices. (Footnote 3: Each collection of pilot reuse sets is generated by different . By eliminating the copies, the size of the dictionary can be reduced to , which still grows rapidly with and .) For each , we can extract the pilot reuse sets as
(15) 
The dictionary will be later used to obtain the pilot assignment that maximizes the SE performance.
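As a rough illustration of how such a dictionary could be enumerated, the sketch below assumes that every cell has as many orthogonal pilots as users, so each row of an assignment matrix is a permutation of the pilot indices. The helper names `assignment_dictionary` and `reuse_sets` are hypothetical, and the reuse set of a user is taken here to exclude the user itself, which is a convention chosen for this example only.

```python
from itertools import permutations, product

def assignment_dictionary(num_cells, num_users):
    """Iterate over all pilot assignment matrices (one row per cell).

    Assumption: each row is a permutation of the pilot indices
    0..num_users-1, so users in the same cell never share a pilot.
    """
    rows = list(permutations(range(num_users)))
    return product(rows, repeat=num_cells)

def reuse_sets(assignment):
    """Map each (cell, user) to the other (cell, user) pairs on the same pilot."""
    holders = {}
    for cell, row in enumerate(assignment):
        for user, pilot in enumerate(row):
            holders.setdefault(pilot, []).append((cell, user))
    return {
        (cell, user): [other for other in holders[pilot] if other != (cell, user)]
        for cell, row in enumerate(assignment)
        for user, pilot in enumerate(row)
    }

# Toy network: 2 cells with 3 users each.
dictionary = list(assignment_dictionary(num_cells=2, num_users=3))
example = reuse_sets(((0, 1), (1, 0)))
```

Under this assumption the number of entries equals the number of row permutations raised to the number of cells, which is why the enumeration is only practical for very small networks.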
III Uplink Massive MIMO Transmission
This section provides ergodic SE expressions with arbitrary pilot signals, which are later used for pilot optimization.
III-A Channel Estimation with Arbitrary Pilots
During the UL pilot transmission, the received signal at the BS of cell is
(16) 
where denotes the channel between user in cell and BS . is the additive noise with independent elements distributed as . Correlating in (16) with the pilot of user in cell , we obtain
(17) 
We consider uncorrelated Rayleigh fading since the results obtained with this tractable model closely match the results obtained in non-line-of-sight measurements [28]. The channel between user in cell and BS is distributed as
(18) 
where the variance determines the large-scale fading, including geometric attenuation and shadowing. By using minimum mean squared error (MMSE) estimation, the distributions of the channel estimate and estimation error when using the pilot structure in (7) are given in Lemma 1.
Lemma 1.
Proof.
The proof follows directly from standard MMSE estimation techniques in [29]. ∎
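To make the role of the pilot inner products concrete, the following Python sketch evaluates the textbook linear MMSE estimate of one user's channel from the pilot-correlated observation, for uncorrelated Rayleigh fading. It is a sketch under stated assumptions rather than a transcription of Lemma 1: the received pilot signal is modeled as the sum of each channel times the Hermitian transpose of that user's pilot plus white noise, `betas` holds the large-scale fading variances toward the receiving BS, and the function name is hypothetical.

```python
import numpy as np

def lmmse_from_correlated_pilot(pilots, betas, target, noise_var=1.0):
    """Scaling factor and per-antenna variances of a linear MMSE estimate.

    pilots[u] is the pilot of user u, betas[u] its large-scale fading
    variance toward the receiving BS, and `target` the user of interest.
    The observation is the received pilot block multiplied by pilots[target].
    """
    pilots = np.asarray(pilots, dtype=complex)
    betas = np.asarray(betas, dtype=float)
    psi = pilots[target]
    corr = pilots.conj() @ psi            # corr[u] = pilot_u^H pilot_target
    energy = corr[target].real            # squared norm of the target pilot
    if energy <= 0:
        raise ValueError("the target user must transmit a nonzero pilot")
    # Second moment of one element of the correlated observation.
    denom = np.sum(betas * np.abs(corr) ** 2) + noise_var * energy
    scale = betas[target] * energy / denom        # estimate = scale * observation
    est_var = betas[target] ** 2 * energy ** 2 / denom
    err_var = betas[target] - est_var
    return scale, est_var, err_var

betas = [1.0, 0.5]
orthogonal = lmmse_from_correlated_pilot([[1, 0], [0, 1]], betas, target=0)
shared = lmmse_from_correlated_pilot([[1, 0], [1, 0]], betas, target=0)
```

With orthogonal pilots only the noise limits the estimate, whereas sharing the pilot adds the interfering user's variance to the denominator and raises the estimation error.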
Lemma 1 provides the MMSE estimator for the pilot design in (7). The pilot powers as well as inner products between pilot signals appear explicitly in the expressions. We now compute the channel estimate and estimation error of when using the pilot structure in (12).
Corollary 1.
Proof.
This follows from replacing the terms and in Lemma 1 by and , and then doing some algebra. ∎
Corollary 1 reveals that the quality of the estimated channel heavily depends on both the pilot power control and the pilot reuse set . A proper selection of mitigates channel estimation errors, and will also reduce the coherent interference during data transmission. Aligned with prior works, in the special case of , the channel estimate and estimation error are obtained for the pilot structure in (8). We now use the distributions in Lemma 1 and Corollary 1 to derive lower bounds on the UL ergodic capacity.
III-B Uplink Data Transmission
(32) 
(35) 
In the UL data transmission, user in cell transmits the signal . The received signal vector at BS is the superposition of the transmitted signals
(26) 
where is the transmit power corresponding to the signal and the additive noise is . To detect the transmitted signal, BS selects a detection vector and applies it to the received signal as
(27) 
A general lower bound on the UL ergodic capacity of user in cell is computed in [9] as
(28) 
where the effective SINR value, , is
(29) 
The lower bound on the UL ergodic capacity in (28) is computed by using the use-and-then-forget bounding technique [6] and its tightness compared to the other possible bounds is discussed in Appendix D in [6]. Although the channel capacity for Massive MIMO in the case of imperfect CSI is unknown, we believe that the lower bound in (28) is quite close to the actual capacity. This is because the effective noise is a sum of many uncorrelated terms and is therefore close to Gaussian, which agrees with the worst-case-is-Gaussian assumption made when obtaining the bound. As a contribution of this paper, we compute a closed-form expression for this lower bound in the case of MR detection with
(30) 
Lemma 2.
Proof.
The proof is available in Appendix A. ∎
From (32), we notice that it is always advantageous to add more BS antennas since the numerator grows linearly with the number of BS antennas $M$ (and only some terms in the denominator have the same scaling). The first term in the denominator represents noncoherent interference that only depends on the number of BSs and users, while it is independent of $M$. The second term in the denominator represents coherent interference caused by pilot contamination and it grows linearly with $M$. As a consequence, as $M \to \infty$, we have
(33) 
This limit depends only on the pilot design (i.e., inner products between pilot signals) and data power. An optimized selection of the power terms improves the SE by enhancing the channel estimation quality and reducing the coherent interference.
We also consider the achievable SE for the modified pilot structure in (12) as shown in Corollary 2.
Corollary 2.
Proof.
This follows as a special case of Lemma 2. ∎
The SE in Corollary 2 depends explicitly on the choice of pilot assignment; thus, the optimization of the pilot assignment is a combinatorial problem. We stress that the SINR expressions reflect the joint effects of pilot design, channel estimation quality, pilot contamination, and data power control, in contrast to the MSE that cannot distinguish between pilot contamination and noise. Hence, the SINR is a good metric to consider in the max-min fairness optimization, as shown in the next section.
IV Max-min Fairness Optimization
In this section, we first utilize the SE expressions in Lemma 2 and Corollary 2 to formulate max-min fairness problems with joint pilot and data optimization. We demonstrate that these optimization problems are NP-hard and propose an algorithm to find the globally optimal solution with the pilot design in (12) by making an exhaustive search over all pilot assignments. In addition, instead of looking for the global optimum, an algorithm to obtain a locally optimal solution in polynomial time is presented when using the new pilot design in (7).
IV-A Problem Formulation
(47) 
A key vision of Massive MIMO is to provide uniformly good quality of service for everyone in the network. We will investigate how to optimize the pilots and powers towards this goal. We consider the pilot and data powers as optimization variables. (Footnote 4: The optimization problem (36) requires coordination among the cells to be solved, but the main target in this paper is to investigate how much the max-min fairness SE can be improved in multi-cell Massive MIMO by joint pilot design and UL power control. One potential way to deal with practical limitations such as backhaul signaling, delays, and scalability is to implement the optimization problem in a distributed manner using dual/primal decomposition [30].) The max-min fairness optimization problem is first formulated for the proposed pilot design in (7) as
(36)  
where is the maximum power that users can provide for each data symbol. Note that this optimization problem jointly generates the pilot signals and performs power control on the pilot and data transmission. The epigraphform representation of (36) is
(37a)  
subject to  (37b)  
(37c)  
(37d) 
From the expression of the SINR constraints in (37b), we realize that the proposed optimization problem is a signomial program. (Footnote 5: A function defined in is signomial with terms () if the exponents are real numbers and the coefficients are also real but at least one must be negative. In case all are positive, is a posynomial function.) Therefore, the max-min fairness optimization problem is NP-hard in general and seeking the optimal solution has very high complexity in any nontrivial setup [31]. However, the power constraints (37c) and (37d) ensure a compact feasible domain and make the SINRs continuous functions of the optimization variables. According to Weierstrass' theorem [32], an optimal solution always exists.
For the alternative pilot design in (12), the max-min fairness optimization problem is formulated as
(38)  
The optimization problem (38) is nonconvex since it contains a combinatorial pilot assignment selection. Fortunately, the optimal solution to this problem can be obtained by looking up every instance in the dictionary . For each we attain the pilot reuse sets , and then convert (38) to a convex problem as shown in Corollary 3.
Corollary 3.
For a given pilot assignment matrix , (38) reduces to the geometric program
(39)  
The optimal solution to (39) is obtained in polynomial time due to its convexity. By checking every instance in the dictionary and solving the corresponding problem (39), the global optimum to (38) is obtained as the highest objective value to (39).
In more detail, the globally optimal solution to (38) is obtained as shown in Algorithm 1. The th iteration seeks the optimal solution , and for given by considering (39) as the main cost function. The algorithm is terminated when the iteration index equals . The global optimum to the pilot and data power control together with the pilot reuse set are obtained from the maximum values of all . The exhaustive search raises a practical issue: the solution can indeed be found, but doing so takes a very long time.
Input: Set ; Select the initial values of and for ; Set up the dictionary .

Iteration :

Assign the reuse pilot set index by an instance .

Solve the following geometric program to obtain and
(40) subject to


If Stop. Otherwise, go to Step 3.

Restore ,, and . Set , then go to Step 1.
Output: Set , then the optimal solutions: , , and
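The control flow of Algorithm 1 can be sketched in a few lines of Python. The geometric program itself is not implemented here: `solve_for_assignment` is a hypothetical callable, supplied by the user, that is assumed to return the optimal objective value and the corresponding powers for one fixed pilot assignment.

```python
def exhaustive_search(dictionary, solve_for_assignment):
    """Keep the assignment whose per-assignment optimum is the largest.

    dictionary: iterable of candidate pilot assignments.
    solve_for_assignment: hypothetical solver returning (value, solution)
        for a fixed assignment, e.g. by solving a geometric program.
    """
    best_value = float("-inf")
    best_assignment = None
    best_solution = None
    for assignment in dictionary:
        value, solution = solve_for_assignment(assignment)
        if value > best_value:            # store the best result seen so far
            best_value = value
            best_assignment = assignment
            best_solution = solution
    return best_value, best_assignment, best_solution

# Toy stand-in for the solver: a lookup table of objective values.
toy_values = {"a": 1.0, "b": 3.0, "c": 2.0}
value, assignment, solution = exhaustive_search(
    toy_values, lambda a: (toy_values[a], {"tag": a})
)
```

The loop makes the cost structure explicit: one solver call per dictionary entry, so the total runtime scales with the dictionary size.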
Algorithm 1 is computationally heavy since the number of iterations grows rapidly with and , but it obtains the global optimum to the max-min SE problem (38). Specifically, the main cost of each iteration in Algorithm 1 is the geometric program (40) which includes optimization variables and constraints. Based on [33], in general, the computational complexity of this algorithm is of the order of
(41) 
where is the cost of evaluating the first and second derivatives of the objective and constraint functions in (40). Therefore, this approach will serve as a benchmark for comparison in Section VII. For the sake of completeness, we also include another benchmark in which the data powers are fixed at their maximum value and Algorithm 1 is run with respect to the remaining pilot power variables, as was done in our previous work [27].
IV-B Local Optimality Algorithm
This subsection provides a method to obtain a local optimum to the optimization problem (37). To this end, the signomial SINR constraints are converted to monomial ones by using the weighted arithmetic mean-geometric mean inequality [34] stated in Lemma 3. (Footnote 6: A function defined in is monomial if the coefficient and the exponents are real numbers.)
Lemma 3.
[34, Lemma 1] Assume that a posynomial function is defined from the set of monomials as
(42) 
then it is lower bounded by a monomial function as
(43) 
where is a nonnegative weight corresponding to . We say that is the best approximation to near the point in the sense of the first order Taylor expansion, if the weight is selected as
(44) 
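The weighted arithmetic mean-geometric mean bound behind Lemma 3 can be checked numerically with a short Python sketch. The posynomial is represented by hypothetical arrays `coeffs` (positive coefficients) and `exponents` (one row per monomial), and the weights are taken as each monomial's share of the posynomial value at the expansion point, which is the standard choice that makes the bound tight there.

```python
import numpy as np

def monomial_values(coeffs, exponents, x):
    """Evaluate each monomial u_k(x) = coeffs[k] * prod_n x[n] ** exponents[k, n]."""
    return coeffs * np.prod(x ** exponents, axis=1)

def amgm_weights(coeffs, exponents, x0):
    """Weights: share of each monomial in the posynomial value at x0."""
    u0 = monomial_values(coeffs, exponents, x0)
    return u0 / u0.sum()

def amgm_lower_bound(coeffs, exponents, weights, x):
    """Monomial lower bound prod_k (u_k(x) / weights[k]) ** weights[k]."""
    u = monomial_values(coeffs, exponents, x)
    return np.prod((u / weights) ** weights)

# Toy posynomial f(x) = x1 + 2*x2 + x1*x2 (positive coefficients only).
coeffs = np.array([1.0, 2.0, 1.0])
exponents = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [1.0, 1.0]])
x0 = np.array([1.0, 1.0])
weights = amgm_weights(coeffs, exponents, x0)

f_at_x0 = monomial_values(coeffs, exponents, x0).sum()
bound_at_x0 = amgm_lower_bound(coeffs, exponents, weights, x0)
```

At the expansion point the bound coincides with the posynomial, and at any other positive point it stays below it, which is the property the successive approximation relies on.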
By using this lemma, the max-min fairness optimization problem (37) is converted to a geometric program by bounding the term in the numerators of the SINR constraints:
(45) 
where is the weight value corresponding to . This leads to a lower bound on the SINR value for user in cell obtained as
(46) 
where the value is presented in (47).
The optimal solution to the max-min SE optimization problem (37) is lower bounded by solving the geometric program
(48)  
subject to  
By virtue of the successive approximation technique [35], a locally optimal Karush-Kuhn-Tucker (KKT) point to the max-min fairness optimization problem (37) can be obtained if we solve (48) iteratively as shown in Theorem 1.
Theorem 1.
In particular, we first select the initial powers that satisfy , . Then the corresponding weight values are computed as in (44). Furthermore, in each iteration, the SINR constraints are converted to the corresponding monomials by bounding the pilot power of user in cell as in (47), by using the weight values computed from the optimal pilot powers in the previous iteration. The pilot and data allocation solution is obtained by solving the geometric program (48) before the weight values are updated again at the end of each iteration. We repeat the procedure until this algorithm has converged to a KKT point. The convergence can be declared, for example, when the variation between two consecutive iterations is sufficiently small. The proposed algorithm for obtaining a locally optimal solution is summarized in Algorithm 2. Note that one can also fix the data powers and only optimize the pilot signals in Algorithm 2, as was done in our previous work [27]. (Footnote 7: The exact complexity or the runtime of the proposed algorithms are not suitable metrics since they depend significantly on the computer configuration and how much time is spent to optimize the implementations. However, (41) and (49) give basic insights into the general computational complexity scaling.) Algorithm 2 involves optimization with variables and constraints, and it has a computational complexity of the order of [33]
(49) 
where is the cost of evaluating the first and second derivatives of the objective and constraint functions in (48). is the number of iterations needed for this algorithm to converge to the KKT point. Even though each iteration in Algorithm 2 is more costly than in Algorithm 1 since we carefully design powers for all pilot signals, the successive approximation approach converges after only a few iterations.