Quantized Precoding for Massive MU-MIMO
Abstract
Massive multiuser (MU) multiple-input multiple-output (MIMO) is foreseen to be one of the key technologies in fifth-generation wireless communication systems. In this paper, we investigate the problem of downlink precoding for a narrowband massive MU-MIMO system with low-resolution digital-to-analog converters (DACs) at the base station (BS). We analyze the performance of linear precoders, such as maximal-ratio transmission and zero-forcing, subject to coarse quantization. Using Bussgang’s theorem, we derive a closed-form approximation of the rate achievable under such coarse quantization. Our results reveal that the performance attainable with infinite-resolution DACs can be approached using DACs having only to bits of resolution, depending on the number of BS antennas and the number of user equipments (UEs). For the case of 1-bit DACs, we also propose novel nonlinear precoding algorithms that significantly outperform linear precoders at the cost of an increased computational complexity. Specifically, we show that nonlinear precoding incurs only a dB penalty compared to the infinite-resolution case for an uncoded bit error rate of , in a system with BS antennas that uses 1-bit DACs and serves single-antenna UEs. In contrast, the penalty for linear precoders is about dB.
I Introduction
Massive multiuser (MU) multiple-input multiple-output (MIMO) wireless systems, where the base station (BS) is equipped with several hundreds of antenna elements, promise significant improvements in spectral efficiency, energy efficiency, reliability, and coverage compared to traditional cellular systems [1, 2, 3]. Increasing the number of radio frequency (RF) chains at the BS could, however, lead to significant increases in hardware complexity, system costs, and circuit power consumption. Therefore, practical massive MU-MIMO systems may require low-cost and power-efficient hardware components at the BS. In this paper, we consider the downlink of a massive MU-MIMO system, where the BS is equipped with low-resolution digital-to-analog converters (DACs) and transmits data to multiple, independent user equipments (UEs) in the same time-frequency resource.
For the quantization-free case (infinite-resolution DACs), the capacity region of the MU downlink Gaussian channel has been characterized in [4, 5, 6, 7]. When channel state information (CSI) is known noncausally at the BS, dirty-paper coding (DPC) [8] is known to achieve the sum-rate capacity [6]. Several precoding algorithms to approach the DPC performance have been proposed (see, e.g., [9, 10, 11, 12]). Most of these precoding methods are, however, computationally demanding, and their complexity scales unfavorably with the number of BS antennas, preventing their use in massive MU-MIMO. Linear precoding, on the other hand, is an attractive low-complexity approach to massive MU-MIMO downlink precoding, which offers performance competitive with DPC for large antenna arrays [13, 14].
These results assume that the RF circuitry connected to each antenna port at the BS is ideal. The impact of RF hardware impairments at the transmit side has been investigated in, e.g., [15, 16, 17, 18]. Some of these results indicate that massive MU-MIMO exhibits a certain degree of resilience against RF impairments. The crude aggregate models used for characterizing such hardware impairments, however, are unable to accurately capture the distortion caused by low-resolution DACs.
I-A What are the Benefits of Quantized Massive MU-MIMO?
One of the dominant sources of power consumption in massive MU-MIMO systems is the data converters at the BS. In the downlink, the transmit baseband signal at each RF chain is generated by a pair of DACs. The power consumption of these DACs increases exponentially with the resolution (in bits) and linearly with the bandwidth [19, 20]. In traditional multi-antenna systems, each RF port is connected to a pair of high-resolution DACs (e.g., 10-bit or more). For massive MU-MIMO systems with hundreds or even thousands of antenna elements, this would lead to prohibitively high power consumption due to the large number of required DACs. Hence, the DAC resolution must be limited to keep the power budget within tolerable levels. Furthermore, an often overlooked issue in massive MU-MIMO is the vast amount of data that must be exchanged between the baseband-processing unit and the radio unit (where the DACs are located). To make matters worse, in many deployment scenarios, these two units are separated by a large distance. Hence, lowering the DAC resolution is a potential solution to mitigate the data-rate bottleneck on the fronthaul.
I-B Relevant Prior Art
I-B1 Quantized Receivers
Reducing the fronthaul throughput at the BS can be achieved by using low-resolution DACs in the downlink and low-resolution analog-to-digital converters (ADCs) in the uplink. Several recent contributions have studied the use of low-resolution ADCs in the massive MU-MIMO uplink. In particular, there has been significant interest in the 1-bit ADC case. For frequency-flat channels, the performance of 1-bit ADCs followed by linear detectors was analyzed in, e.g., [21, 22, 23, 24], where it was shown that large achievable sum rates are supported. Similar conclusions were drawn in [25] for the frequency-selective case. Nonlinear detection algorithms for frequency-selective channels were studied in, e.g., [26]. These results suggest that the number of ADC bits can be reduced significantly compared to today’s systems.
I-B2 Quantized Precoding
In contrast to the uplink case, only a small number of contributions consider the massive MU-MIMO downlink with low-resolution DACs at the BS. In [27], the authors design a linear-quantized precoder based on the minimum mean-square error (MMSE) criterion, taking into account the distortion caused by the DACs. For DACs with to bits resolution, the precoder proposed in [27] is shown to outperform conventional linear-quantized precoders for small-to-moderate-sized MIMO systems at high signal-to-noise ratio (SNR). Massive MU-MIMO systems with 1-bit DACs are investigated in [28], where it is shown that maximal-ratio transmission (MRT) precoding results in manageable distortion levels. Again for the case of 1-bit DACs, the authors of [29] analyze the performance of zero-forcing (ZF) precoding on a Rayleigh-fading channel. Interestingly, it is shown that the received signal can be made proportional to the transmitted signal when the number of BS antennas tends to infinity. This implies that the severe per-antenna distortion caused by the 1-bit DACs averages out when many transmit antennas are available. A linear precoder where the 1-bit quantized outcomes are rescaled in the analog domain was presented in [30]. There, the authors use the gradient projection algorithm to find a precoder that yields improved performance over the one reported in [27]. In [31], it is shown that, in the presence of transceiver nonlinearities (e.g., finite-resolution DACs), the achievable rate can be improved by minimizing the MSE between the transmitted symbols and the received signal prior to decoding. This result, which, as we shall see, is related to the approach taken in this paper, relies on the assumptions of Gaussian inputs and nearest-neighbor decoding.
I-B3 Low-PAR and Constant-Envelope Precoding
Other types of hardware-aware precoding have previously been considered for massive MU-MIMO systems, with the goal of reducing the linearity requirements at the BS. In [32], joint MU precoding and peak-to-average power ratio (PAR) reduction was achieved by solving a convex optimization problem. Constant-envelope precoding, which minimizes the PAR by transmitting constant-modulus signals only, was studied in [33, 34]. Note that the 1-bit DAC precoding problem can be seen as a special (or extreme) case of constant-envelope precoding, where the phase of the transmitted signal is limited to only four different values.
I-C Contributions
We consider quantized precoding for the massive MU-MIMO downlink over frequency-flat channels. Similarly to [30, 28, 29], we consider DACs operating at symbol-rate sampling frequency. However, in contrast to [30, 28, 29], we do not restrict ourselves to 1-bit DACs and linear precoding. Specifically, we consider both linear-quantized precoders, where a linear precoder is followed by a finite-resolution DAC, and nonlinear precoders, where the data vector together with the CSI is used to directly generate the DAC outputs. Our contributions can be summarized as follows.

• We formulate the MMSE-optimal linear-quantized precoding problem and present low-complexity, suboptimal linear-quantized precoders that yield approximate solutions to this problem. We use Bussgang’s theorem to develop simple closed-form approximations for the rate achievable with linear-quantized precoding and low-resolution DACs. Through numerical simulations, we validate the accuracy of these approximations, and we show that only a small number of quantization bits is sufficient to close the performance gap to the infinite-resolution case. For the special case of 1-bit DACs, we obtain a firm lower bound on the achievable rate with linear precoding.

• For the 1-bit case, we develop a variety of low-complexity nonlinear precoders that achieve near-optimal performance. We show that the MMSE-optimal downlink precoding problem can be relaxed to a convex problem that can be solved in a computationally efficient manner. We propose computationally efficient algorithms based on semidefinite relaxation, squared $\ell_\infty$-norm relaxation, and sphere decoding, and discuss advantages and limitations of each of these methods. Through numerical simulations, we demonstrate the superiority of nonlinear precoding over linear-quantized precoding.

• We investigate the sensitivity of the proposed precoders to channel-estimation errors and demonstrate that the proposed precoders are robust to imperfect CSI at the BS.
Our results reveal that massive MU-MIMO enables the use of low-resolution DACs at the BS without a significant performance loss in terms of error-rate performance and information-theoretic rates.
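I-D Notation
Lowercase and uppercase boldface letters designate column vectors and matrices, respectively. For a matrix $\mathbf{A}$, we denote its complex conjugate, transpose, and Hermitian transpose by $\mathbf{A}^*$, $\mathbf{A}^T$, and $\mathbf{A}^H$, respectively. The entry on the $k$th row and on the $\ell$th column of the matrix $\mathbf{A}$ is denoted as $[\mathbf{A}]_{k,\ell}$. For a vector $\mathbf{a}$, the $k$th entry is denoted as $[\mathbf{a}]_k = a_k$. We use $\mathbf{A} \succeq \mathbf{0}$ to indicate that the matrix $\mathbf{A}$ is positive semidefinite. The trace and the main diagonal of $\mathbf{A}$ are $\mathrm{tr}(\mathbf{A})$ and $\mathrm{diag}(\mathbf{A})$, respectively. The $M \times M$ identity matrix and the $M \times N$ all-zeros matrix are denoted by $\mathbf{I}_M$ and $\mathbf{0}_{M \times N}$, respectively. The real and imaginary parts of a complex vector $\mathbf{a}$ are $\Re\{\mathbf{a}\}$ and $\Im\{\mathbf{a}\}$, respectively. We use $\|\mathbf{a}\|_2$ and $\|\mathbf{a}\|_\infty$ to denote the $\ell_2$-norm and the $\ell_\infty$-norm of $\mathbf{a}$, respectively. We use $\mathrm{sgn}(\cdot)$ to denote the signum function, which is applied entry-wise to vectors and defined as $\mathrm{sgn}(a) = +1$ if $a \ge 0$ and $\mathrm{sgn}(a) = -1$ if $a < 0$. We further use $\mathbb{1}_{\mathcal{A}}(a)$ to denote the indicator function, which is defined as $\mathbb{1}_{\mathcal{A}}(a) = 1$ for $a \in \mathcal{A}$ and $\mathbb{1}_{\mathcal{A}}(a) = 0$ for $a \notin \mathcal{A}$. The multivariate complex-valued circularly-symmetric Gaussian probability density function (PDF) with covariance matrix $\mathbf{K}$ is denoted by $\mathcal{CN}(\mathbf{0}, \mathbf{K})$. We use $f(\cdot)$ to denote PDFs and $\mathbb{E}_{\mathbf{x}}[\cdot]$ to denote expectation with respect to the random vector $\mathbf{x}$. The mutual information between two random vectors $\mathbf{x}$ and $\mathbf{y}$ is written as $I(\mathbf{x}; \mathbf{y})$.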
I-E Paper Outline
The rest of the paper is organized as follows. In Section II, we introduce the system model and formulate the MMSE-optimal quantized precoding problem. In Section III, we investigate linear-quantized precoders for massive MU-MIMO systems. Section IV deals with nonlinear precoding algorithms for the case of 1-bit DACs. In Section V, we provide numerical simulation results and analyze the robustness of the developed algorithms to channel-estimation errors. We conclude the paper in Section VI.
II System Model and Quantized Precoding
II-A System Model
We consider the downlink of a single-cell massive MU-MIMO system as illustrated in Fig. 1. The system consists of a BS with $B$ antennas that serves $U$ single-antenna UEs simultaneously and in the same time-frequency resource. For simplicity, we assume that all RF hardware (e.g., local oscillators, mixers, power amplifiers, etc.) is ideal and that the ADCs at the UEs have infinite resolution. We also assume that the sampling rate of the DACs at the BS is equal to the sampling rate of the ADCs at the UEs and that the system is perfectly synchronized. Finally, we assume that the reconstruction stage (see, e.g., [35]) of the DACs consists only of a zero-order hold circuit (no filtering stage). (Footnote 1: Symbol-rate sampling combined with low-resolution DACs may yield undesired out-of-band emissions, which may be mitigated by using analog filters. Such filters, however, may in turn cause intersymbol interference. In this work, we shall ignore the out-of-band emissions caused by the low-resolution DACs and no filter will be considered.) Under these assumptions, the input-output relation of the downlink channel can be modeled as
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}. \tag{1}$$
Here, the vector $\mathbf{y} = [y_1, \dots, y_U]^T$ contains the received signals at all users, with $y_u \in \mathbb{C}$ denoting the signal received at the $u$th UE. The matrix $\mathbf{H} \in \mathbb{C}^{U \times B}$ models the downlink channel, and it is assumed to be perfectly known to the BS. (Footnote 2: In Section V-B, we will relax this assumption by investigating the impact of imperfect CSI on the robustness of the proposed quantized precoding algorithms.) We shall also assume that the entries of $\mathbf{H}$ are independent circularly-symmetric complex Gaussian random variables with unit variance, i.e., $[\mathbf{H}]_{u,b} \sim \mathcal{CN}(0, 1)$, for $u = 1, \dots, U$ and $b = 1, \dots, B$. The vector $\mathbf{n} \in \mathbb{C}^U$ in (1) models additive noise. We assume the noise to be i.i.d. circularly-symmetric complex Gaussian with variance $N_0$ per complex entry, i.e., $n_u \sim \mathcal{CN}(0, N_0)$, for $u = 1, \dots, U$. We shall also assume that the noise level $N_0$ is known perfectly at the BS. (Footnote 3: Knowledge of $N_0$ at the BS can be obtained by explicit feedback from the UEs to the BS.)
The precoded vector is denoted by $\mathbf{x} \in \mathcal{X}^B$, where the set $\mathcal{X}$ is the transmit alphabet; this set coincides with the set of complex numbers $\mathbb{C}$ in the case of infinite-resolution DACs. In real-world BS architectures with finite-resolution DACs, the set $\mathcal{X}$ is, however, a finite-cardinality alphabet. Specifically, we denote the set of possible real-valued DAC outputs (quantization labels) as $\mathcal{L} = \{\ell_0, \dots, \ell_{L-1}\}$. We refer to $L = |\mathcal{L}|$ and $Q = \log_2 L$ as the number of quantization levels and the number of quantization bits per real dimension, respectively. For each BS antenna, we assume the same quantization alphabet for the real part and the imaginary part. Hence, the set of complex-valued DAC outputs at each antenna is $\mathcal{X} = \mathcal{L} \times \mathcal{L}$. Under these assumptions, the $b$th entry of the precoded vector is $x_b = \ell_R + j\ell_I$, where $\ell_R, \ell_I \in \mathcal{L}$.
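The input-output relation in (1) can be sketched in a few lines of Python. The snippet below is only an illustration under assumptions that are not taken from the paper: the dimensions `U`, `B`, the power `P`, and the noise level `N0` are arbitrary, and the transmit vector is drawn at random from the 1-bit alphabet (each entry equal to $\sqrt{P/(2B)}(\pm 1 \pm j)$) rather than produced by a precoder. The helper names `crandn` and `downlink` are hypothetical.

```python
import math
import random


def crandn(rng, var=1.0):
    """Draw one circularly-symmetric complex Gaussian sample with variance `var`."""
    std = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def downlink(H, x, N0, rng):
    """Return y = H x + n for a U x B channel H given as a list of rows."""
    return [sum(h * xb for h, xb in zip(row, x)) + crandn(rng, N0) for row in H]


rng = random.Random(0)
U, B, P, N0 = 2, 8, 1.0, 0.1  # illustrative values (assumed)

# Rayleigh-fading channel with i.i.d. unit-variance entries.
H = [[crandn(rng) for _ in range(B)] for _ in range(U)]

# Random vector from the 1-bit alphabet: sqrt(P/(2B)) * (+-1 +- j) per antenna.
a = math.sqrt(P / (2 * B))
x = [complex(rng.choice((-a, a)), rng.choice((-a, a))) for _ in range(B)]

y = downlink(H, x, N0, rng)
tx_power = sum(abs(xb) ** 2 for xb in x)  # equals P for every 1-bit vector
```

Every vector in the 1-bit alphabet has the same total power, which is why `tx_power` does not depend on the random draw.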
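II-B Precoding
Let $s_u \in \mathcal{O}$ for $u = 1, \dots, U$ be the constellation point at the BS intended for the UE $u$; here, $\mathcal{O}$ is the set of constellation points (e.g., QPSK). The BS uses the available CSI, namely the knowledge of the realization of the channel matrix $\mathbf{H}$, to precode the symbol vector $\mathbf{s} = [s_1, \dots, s_U]^T$ into a $B$-dimensional precoded vector $\mathbf{x} = \mathcal{P}(\mathbf{s}, \mathbf{H})$. Here, the function $\mathcal{P}(\cdot, \cdot)$ represents the precoder. The precoded vector $\mathbf{x}$ must satisfy the average power constraint
$$\mathbb{E}_{\mathbf{s}}\big[\|\mathbf{x}\|_2^2\big] \le P. \tag{2}$$
We define $\rho = P/N_0$ as the SNR.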
Coherent transmission of data using multiple BS antennas leads to an array gain, which depends on the realization of the fading channel. We shall assume that the $u$th UE is able to rescale the received signal $y_u$ by a factor $\beta_u$ to compute an estimate $\hat{s}_u$ of the transmitted symbol $s_u$ as follows:
$$\hat{s}_u = \beta_u y_u. \tag{3}$$
The problem of downlink precoding has been studied extensively in the literature. Broadly speaking, the goal is to increase the array gain to the intended UE while simultaneously reducing MU interference (MUI) [36]. There exist multiple formulations of this optimization problem based on different performance metrics (e.g., sum-rate throughput, worst-case throughput, error probability, etc.). We refer the interested reader to the tutorial [37] for a comprehensive overview.
Our specific goal is to design a precoder that minimizes the MSE between the received signal and the transmitted symbol vector under the power constraint (2). This problem has been studied extensively for the case of infinite-resolution DACs (see, e.g., [38, 39, 40]). If the BS is equipped with finite-resolution DACs, then the UEs will experience additional distortion compared to the infinite-resolution case, due to the finite cardinality of the set of possible precoder outputs.
Finding the MMSE-optimal precoder for BS architectures with finite-resolution DACs is a formidable task due to the finite cardinality of $\mathcal{X}^B$. In what follows, we present novel algorithms that efficiently compute approximate solutions to the quantized precoding problem. More specifically, we investigate two approaches: linear-quantized precoding in Section III and nonlinear precoding for the special case of 1-bit DACs in Section IV. As illustrated in Fig. 2, linear-quantized precoders perform linear processing (matrix-vector multiplication) followed by quantization; in contrast, nonlinear precoders use the transmit vector $\mathbf{s}$ together with the available CSI in order to directly compute the precoded vector $\mathbf{x}$. As will be shown in Section V, nonlinear precoders outperform (often significantly) linear-quantized precoders in terms of error-rate performance at the cost of higher computational complexity.
III Linear-Quantized Precoders
In the infinite-resolution case, linear precoders multiply the $U$-dimensional symbol vector $\mathbf{s}$ with a precoding matrix $\mathbf{P} \in \mathbb{C}^{B \times U}$ so that $\mathbf{x} = \mathbf{P}\mathbf{s}$. This approach is particularly attractive for massive MU-MIMO systems due to (i) the relatively low computational complexity and (ii) the fact that even the simplest linear precoder, namely the MRT precoder, achieves virtually optimal performance in the large-antenna limit (see, e.g., [1]). Linear-quantized precoders inherit the first of these two advantages. Indeed, quantizing the precoded vector implies no additional computational complexity. For linear-quantized precoders, the precoded vector is given by
$$\mathbf{x} = \mathcal{Q}(\mathbf{P}\mathbf{s}). \tag{4}$$
Here, $\mathcal{Q}(\cdot)$ denotes the quantizer-mapping function, which is a nonlinear function that describes the joint operation of the DACs at the BS.
The remainder of this section is organized as follows. We start by formulating the MMSE quantized precoding problem for linear-quantized precoders. We then describe the operation of the DACs and define the quantizer-mapping function. We then use Bussgang’s theorem [41] to derive a lower bound on the sum-rate capacity for the case of 1-bit DACs at the BS. Finally, we derive a simple closed-form approximation of the rate achievable with Gaussian inputs for the more general case of multi-bit DACs.
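III-A The Linear-Quantized Precoding Problem
By restricting ourselves to linear-quantized precoding (LQP), we can formulate the quantized precoding problem as follows:
$$\text{(LQP)} \quad \begin{cases} \underset{\mathbf{P} \in \mathbb{C}^{B \times U},\, \beta \in \mathbb{R}}{\text{minimize}} & \mathbb{E}_{\mathbf{s}}\big[\|\mathbf{s} - \beta\mathbf{H}\mathbf{x}\|_2^2\big] + \beta^2 U N_0 \\ \text{subject to} & \mathbf{x} = \mathcal{Q}(\mathbf{P}\mathbf{s}),\ \mathbb{E}_{\mathbf{s}}\big[\|\mathbf{x}\|_2^2\big] \le P,\ \text{and}\ \beta > 0. \end{cases} \tag{5}$$
The resulting precoding matrix $\mathbf{P}^{\text{LQP}}$ and the associated precoding factor $\beta^{\text{LQP}}$ will be referred to as the optimal solution to the problem (LQP). Here, we have introduced the scalar $\beta$ to account for the array gain at the UEs (as commonly done in the MMSE precoding literature; see, e.g., [39, 27]). By solving (5), we find the precoded vector that minimizes the per-channel MSE between the transmitted symbols $\mathbf{s}$ and the vector $\beta\mathbf{y}$. Indeed, note that
$$\mathbb{E}_{\mathbf{s},\mathbf{n}}\big[\|\mathbf{s} - \beta\mathbf{y}\|_2^2\big] = \mathbb{E}_{\mathbf{s}}\big[\|\mathbf{s} - \beta\mathbf{H}\mathbf{x}\|_2^2\big] + \beta^2 U N_0 \tag{6}$$
and recall that $\mathbf{x} = \mathcal{Q}(\mathbf{P}\mathbf{s})$. Next, we provide more insights on the role of the precoding factor $\beta$. We seek a precoded vector $\mathbf{x}$ that makes the received signal proportional to the transmitted symbol vector $\mathbf{s}$, i.e., $\beta\mathbf{H}\mathbf{x} \approx \mathbf{s}$. To lessen the adverse impact of the noise vector $\mathbf{n}$ in (1), we look for a design that maximizes the received signal power at the UEs. The cost function in (5) accomplishes exactly this goal by favoring solutions with a smaller $\beta$. Unfortunately, the introduction of the precoding factor $\beta$ (which is not known to the UEs) may complicate decoding at the UEs. (Footnote 4: We shall elaborate on this point in Section IV-D.)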
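The split of the MSE into a signal term and a noise term can be checked directly for a fixed symbol vector. The short derivation below is a sketch that assumes the noise is zero-mean, independent of the symbols, and has $\mathbb{E}[\|\mathbf{n}\|_2^2] = U N_0$; the symbols $U$ and $N_0$ (number of UEs and per-entry noise variance) follow the system model. Averaging over the symbol vector then gives the corresponding identity for the symbol-averaged MSE.

```latex
\begin{align*}
\mathbb{E}_{\mathbf{n}}\!\left[\|\mathbf{s}-\beta\mathbf{y}\|_2^2\right]
&= \mathbb{E}_{\mathbf{n}}\!\left[\|(\mathbf{s}-\beta\mathbf{H}\mathbf{x})-\beta\mathbf{n}\|_2^2\right] \\
&= \|\mathbf{s}-\beta\mathbf{H}\mathbf{x}\|_2^2
 - 2\beta\,\Re\!\left\{(\mathbf{s}-\beta\mathbf{H}\mathbf{x})^{H}\,
   \mathbb{E}_{\mathbf{n}}[\mathbf{n}]\right\}
 + \beta^2\,\mathbb{E}_{\mathbf{n}}\!\left[\|\mathbf{n}\|_2^2\right] \\
&= \|\mathbf{s}-\beta\mathbf{H}\mathbf{x}\|_2^2 + \beta^2 U N_0 .
\end{align*}
```

The cross term vanishes because the noise has zero mean, so a larger $\beta$ amplifies the noise contribution quadratically.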
Solving (5) in closed form is challenging due to the nonlinear operation of the DACs, which is captured by the quantizer-mapping function $\mathcal{Q}(\cdot)$. An approximate solution to (5) was given in [27]. This solution is obtained by approximating the statistics of the distortion caused by the DACs. We shall consider here a different approach. Specifically, we design linear precoders that assume infinite-resolution DACs at the BS, and then quantize the resulting precoded vector. Such linear-quantized precoders have the advantage that the precoding matrix does not depend on the resolution of the DACs. Furthermore, as we shall see in Section V-A, the difference in error-rate performance between the precoders found using our approach and the precoder presented in [27] is negligible. We next review a selection of linear precoding algorithms for the case of infinite-resolution DACs.
III-A1 WF precoding
III-A2 ZF precoding
With ZF precoding, the BS nulls the MUI by choosing as precoding matrix the pseudoinverse of the channel matrix. The ZF precoding matrix is obtained from (7) by setting the noise variance to zero, which yields , where is the pseudoinverse of the channel matrix , and . The resulting precoded vector is .
III-A3 MRT precoding
The MRT precoder maximizes the power directed towards each UE, ignoring MUI. The precoding matrix can be obtained from (7) by letting the noise variance tend to infinity, which yields and . The resulting precoded vector is .
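To make the ZF and MRT descriptions concrete, the following NumPy sketch builds both precoding matrices for a random channel. It is a minimal illustration under stated assumptions rather than the paper's exact expressions: the symbols are assumed to be uncorrelated with unit power, each matrix is simply rescaled so that $\mathrm{tr}(\mathbf{P}\mathbf{P}^H) = P$, and the function names `normalize`, `zf_precoder`, and `mrt_precoder` are hypothetical. The scaling constants of the paper's equation (7) are not reproduced here.

```python
import numpy as np


def normalize(Q, P):
    """Scale Q so that tr(Q Q^H) = P (power constraint for unit-power symbols)."""
    return Q * np.sqrt(P / np.trace(Q @ Q.conj().T).real)


def zf_precoder(H, P):
    """Zero-forcing: right pseudo-inverse H^H (H H^H)^{-1}, power-normalized."""
    return normalize(H.conj().T @ np.linalg.inv(H @ H.conj().T), P)


def mrt_precoder(H, P):
    """Maximal-ratio transmission: matched filter H^H, power-normalized."""
    return normalize(H.conj().T, P)


rng = np.random.default_rng(0)
U, B, P = 4, 32, 1.0  # illustrative sizes (assumed)
H = (rng.standard_normal((U, B)) + 1j * rng.standard_normal((U, B))) / np.sqrt(2)

P_zf = zf_precoder(H, P)
P_mrt = mrt_precoder(H, P)

# With infinite-resolution DACs, ZF removes all multiuser interference:
# H @ P_zf is a scaled identity, whereas H @ P_mrt keeps off-diagonal terms.
eff_zf = H @ P_zf
eff_mrt = H @ P_mrt
```

In a linear-quantized precoder, the product of either matrix with the symbol vector would then be passed through the quantizer-mapping function before transmission.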
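III-B Uniform Quantization of a Complex-Valued Vector
For simplicity, we shall model the DACs as symmetric uniform quantizers with step size $\Delta$. When a signal is quantized, the average power in the signal is in general not preserved. Therefore, we further assume that the output of the quantizer is scaled by a constant $\alpha$, to ensure that the transmit power constraint (2) is satisfied. We start by defining a set of quantization labels $\mathcal{L} = \{\ell_0, \dots, \ell_{L-1}\}$ with entries
$$\ell_i = \alpha\Delta\left(i - \frac{L-1}{2}\right), \quad i = 0, \dots, L-1. \tag{9}$$
Furthermore, let $\mathcal{T} = \{\tau_0, \dots, \tau_L\}$, where $-\infty = \tau_0 < \tau_1 < \dots < \tau_{L-1} < \tau_L = \infty$ specify the set of quantization thresholds. For uniform quantizers, the quantization thresholds are given by
$$\tau_i = \Delta\left(i - \frac{L}{2}\right), \quad i = 1, \dots, L-1. \tag{10}$$
The quantizer-mapping function $\mathcal{Q}(\cdot)$ can be uniquely described by the set of quantization labels $\mathcal{L}$ and the set of quantization thresholds $\mathcal{T}$. The DACs map $\mathbf{z} \in \mathbb{C}^B$ with entries $z_b = z_b^R + j z_b^I$ into the quantized output $\mathbf{x} \in \mathcal{X}^B$ with entries $x_b = x_b^R + j x_b^I$ in the following way: if $\tau_k \le z_b^R < \tau_{k+1}$ and $\tau_l \le z_b^I < \tau_{l+1}$, then $x_b = \ell_k + j\ell_l$.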
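The label-and-threshold description above maps directly onto a small lookup routine. The sketch below assumes a symmetric uniform quantizer whose labels are `scale * step * (i - (L - 1) / 2)` and whose finite thresholds are `step * (i - L / 2)`; the function names and the numerical values in the example are hypothetical and chosen only for illustration.

```python
import bisect


def make_uniform_quantizer(num_levels, step, scale=1.0):
    """Return (labels, thresholds) of a symmetric uniform quantizer.

    labels[i]  = scale * step * (i - (num_levels - 1) / 2), i = 0..L-1
    thresholds = finite thresholds step * (i - num_levels / 2), i = 1..L-1;
    the outer thresholds -inf and +inf are implicit.
    """
    labels = [scale * step * (i - (num_levels - 1) / 2) for i in range(num_levels)]
    thresholds = [step * (i - num_levels / 2) for i in range(1, num_levels)]
    return labels, thresholds


def quantize_real(value, labels, thresholds):
    """Pick the label whose cell [tau_k, tau_{k+1}) contains `value`."""
    return labels[bisect.bisect_right(thresholds, value)]


def quantize_complex(z, labels, thresholds):
    """Quantize real and imaginary parts independently with the same alphabet."""
    return complex(quantize_real(z.real, labels, thresholds),
                   quantize_real(z.imag, labels, thresholds))


# Example: a 2-bit (4-level) quantizer with unit step size and no rescaling.
labels, thresholds = make_uniform_quantizer(num_levels=4, step=1.0)
x_b = quantize_complex(0.3 - 2.7j, labels, thresholds)
```

Using `bisect_right` makes each cell closed on the left and open on the right, so an input that sits exactly on a threshold is assigned to the upper cell.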
The step size $\Delta$ of the quantizers should be chosen to minimize the distortion between the quantized and nonquantized vector. The optimal step size depends on the distribution of the input [42], which in our case depends on both the precoder and the signaling scheme. For simplicity, we set the step size so as to minimize the distortion under the assumption that the per-antenna input to the quantizers is $\mathcal{CN}(0, P/B)$ distributed. This step size can be found numerically (see, e.g., [43] for details).
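In the extreme case of 1-bit DACs, the quantizer-mapping function reduces to
$$\mathcal{Q}(\mathbf{z}) = \sqrt{\frac{P}{2B}}\,\big(\mathrm{sgn}(\Re\{\mathbf{z}\}) + j\,\mathrm{sgn}(\Im\{\mathbf{z}\})\big). \tag{11}$$
Here, we have chosen the set of possible complex-valued quantization outcomes per antenna to be $\mathcal{X} = \big\{\sqrt{P/(2B)}\,(\pm 1 \pm j)\big\}$, which ensures that the power constraint in (2) is satisfied with equality.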
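As a quick check of the 1-bit mapping, the sketch below quantizes an arbitrary complex vector and verifies that the output power equals $P$ regardless of the input. The amplitude $\sqrt{P/(2B)}$ and the convention $\mathrm{sgn}(0) = +1$ are the assumptions used here; the function name `one_bit_quantize` and the sample input are hypothetical.

```python
import math


def one_bit_quantize(z, P, B):
    """Map each entry of z to sqrt(P/(2B)) * (sgn(Re z) + j sgn(Im z))."""
    amplitude = math.sqrt(P / (2 * B))

    def sgn(t):
        # Signum convention: sgn(t) = +1 for t >= 0 and -1 otherwise.
        return 1.0 if t >= 0 else -1.0

    return [complex(amplitude * sgn(v.real), amplitude * sgn(v.imag)) for v in z]


z = [0.3 - 1.2j, -0.7 + 0.1j, 2.0 + 0.0j, -0.01 - 5.0j]
x = one_bit_quantize(z, P=1.0, B=len(z))
power = sum(abs(v) ** 2 for v in x)  # always equals P, whatever z is
```

Because only the signs of the real and imaginary parts survive, all amplitude information in the input is discarded.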
III-C Signal Decomposition using Bussgang’s Theorem
Quantizing the precoded signal causes a distortion that is correlated with the input to the DACs. For Gaussian inputs, Bussgang’s theorem [41] allows us to decompose the quantized signal into a linear function of the input to the quantizers and a distortion term that is uncorrelated with the input to the quantizers [44, 31]. This allows us to characterize the rates achievable with Gaussian inputs. We start by stating Bussgang’s theorem [41, 44].
Theorem 1
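Consider two zero-mean jointly complex Gaussian random variables $a$ and $b$. Assume that $a$ is passed through a nonlinear function $f(\cdot)$ that acts independently on the real and the imaginary components of $a$. The covariance between $f(a)$ and $b$ is given by
$$\mathbb{E}\big[f(a)\,b^*\big] = \frac{\mathbb{E}\big[f(a)\,a^*\big]}{\mathbb{E}\big[|a|^2\big]}\,\mathbb{E}\big[a\,b^*\big]. \tag{12}$$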
Bussgang’s theorem has recently been used to analyze the massive MU-MIMO uplink with 1-bit ADCs (see, e.g., [24, 25]). It has also been used in [28] to approximate the distortion levels caused by MRT precoding and 1-bit quantization in the massive MIMO downlink. We shall use Theorem 1 to characterize the performance of linear-quantized precoders for the case of multi-bit uniform DACs. As a first step, we establish the following result, whose proof is given in Appendix A.
Theorem 2
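Let $\mathbf{x} = \mathcal{Q}(\mathbf{z})$ denote the output from a set of uniform quantizers described by the quantizer-mapping function $\mathcal{Q}(\cdot)$. Assume that $\mathbf{z} = \mathbf{P}\mathbf{s}$ and that $\mathbf{s} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_U)$. The quantized vector $\mathbf{x}$ can be decomposed as
$$\mathbf{x} = \mathbf{G}\mathbf{P}\mathbf{s} + \mathbf{d}, \tag{13}$$
where the distortion $\mathbf{d}$ and the signal $\mathbf{s}$ are uncorrelated. Furthermore, $\mathbf{G}$ is the following diagonal matrix:
$$\mathbf{G} = \frac{\alpha\Delta}{\sqrt{\pi}}\,\mathrm{diag}\big(\mathbf{P}\mathbf{P}^H\big)^{-1/2} \sum_{i=1}^{L-1} \exp\!\left(-\Delta^2\left(i - \frac{L}{2}\right)^{2} \mathrm{diag}\big(\mathbf{P}\mathbf{P}^H\big)^{-1}\right). \tag{14}$$
Here, $L$ and $\Delta$ denote the number of levels and the step size of the DACs, respectively.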
The following corollary provides a well-known result for the case of 1-bit quantization (see, e.g., [24, 25]). Its proof follows by setting $L = 2$ and $\alpha\Delta = \sqrt{2P/B}$ in (14) to satisfy the power constraint (2) with equality.
Corollary 3
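For the case of 1-bit DACs, the matrix $\mathbf{G}$ in (14) reduces to
$$\mathbf{G} = \sqrt{\frac{2P}{\pi B}}\,\mathrm{diag}\big(\mathbf{P}\mathbf{P}^H\big)^{-1/2}. \tag{15}$$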
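The 1-bit Bussgang gain can be checked numerically for a single antenna. The sketch below draws samples of a circularly-symmetric complex Gaussian input with variance `sigma2` (standing in for one diagonal entry of the precoded signal's covariance), quantizes them with amplitude $\sqrt{P/(2B)}$, and estimates the linear gain $\mathbb{E}[\mathcal{Q}(z)z^*]/\mathbb{E}[|z|^2]$ by Monte Carlo. The parameter values, the sample size, and the function name `bussgang_gain_mc` are assumptions made for illustration only.

```python
import math
import random


def bussgang_gain_mc(sigma2, P, B, num_samples=200_000, seed=1):
    """Monte Carlo estimate of E[Q(z) z^*] / E[|z|^2] for a 1-bit quantizer."""
    rng = random.Random(seed)
    amplitude = math.sqrt(P / (2 * B))
    std = math.sqrt(sigma2 / 2)  # std of the real and of the imaginary part
    num = 0.0
    den = 0.0
    for _ in range(num_samples):
        zr, zi = rng.gauss(0.0, std), rng.gauss(0.0, std)
        qr = amplitude if zr >= 0 else -amplitude
        qi = amplitude if zi >= 0 else -amplitude
        num += qr * zr + qi * zi  # real part of Q(z) z^*; the imaginary part averages to zero
        den += zr * zr + zi * zi
    return num / den


sigma2, P, B = 0.5, 1.0, 16  # illustrative values (assumed)
estimate = bussgang_gain_mc(sigma2, P, B)
closed_form = math.sqrt(2 * P / (math.pi * B)) / math.sqrt(sigma2)
```

For these settings the estimate should agree with the closed-form value to within Monte Carlo error, which shrinks as the number of samples grows.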
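Let now $\mathbf{h}_u^T$ denote the $u$th row of the channel matrix $\mathbf{H}$, let $\mathbf{p}_u$ be the $u$th column of the precoding matrix $\mathbf{P}$, and $n_u$ be the $u$th entry of the noise vector $\mathbf{n}$. Using (13), we can express the received signal at UE $u$ as follows:
$$y_u = \mathbf{h}_u^T\mathbf{x} + n_u \tag{16}$$
$$\phantom{y_u} = \mathbf{h}_u^T\mathbf{G}\mathbf{p}_u s_u + \sum_{v \ne u} \mathbf{h}_u^T\mathbf{G}\mathbf{p}_v s_v + \mathbf{h}_u^T\mathbf{d} + n_u \tag{17}$$
$$\phantom{y_u} = \mathbf{h}_u^T\mathbf{G}\mathbf{p}_u s_u + e_u. \tag{18}$$
Here, the error term $e_u = \sum_{v \ne u} \mathbf{h}_u^T\mathbf{G}\mathbf{p}_v s_v + \mathbf{h}_u^T\mathbf{d} + n_u$ captures both the MUI and the distortion caused by the finite-resolution DACs (together with the additive noise). Note that $s_u$ and $e_u$ are uncorrelated. Indeed,
$$\mathbb{E}\big[s_u^* e_u\big] = \sum_{v \ne u} \mathbf{h}_u^T\mathbf{G}\mathbf{p}_v\,\mathbb{E}\big[s_u^* s_v\big] + \mathbf{h}_u^T\,\mathbb{E}\big[s_u^*\mathbf{d}\big] + \mathbb{E}\big[s_u^* n_u\big] = 0. \tag{19}$$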
We shall next use the decomposition in (16) to analyze the performance of linearquantized precoders.
III-D Achievable Rate Lower Bound for 1-bit DACs
We assume that each UE scales its received signal by the scalar $\beta_u$ (which is assumed to be known at the $u$th UE) to obtain the following estimate:
(20) 
The nonlinearity introduced by the DACs prevents one from characterizing the probability distribution of the error term in closed form, which makes it difficult to compute the achievable rates. One can, however, lower-bound the achievable rate using the so-called “auxiliary-channel lower bound” [45, p. 3503], which gives the rates achievable with a mismatched decoder (see [46, ch. ] for a recent review on the subject). As auxiliary channel, we take the one with output
(21) 
where has the same variance as the actual error term but is Gaussian distributed. Assuming Gaussian inputs, by standard manipulations of the mutual information, we can bound the achievable rate for UE as follows:
(22)  
(23)  
(24)  
(25) 
where
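$$\gamma_u = \frac{\big|\mathbf{h}_u^T\mathbf{G}\mathbf{p}_u\big|^2}{\sum_{v \ne u}\big|\mathbf{h}_u^T\mathbf{G}\mathbf{p}_v\big|^2 + \mathbf{h}_u^T\mathbf{C}_{\mathbf{dd}}\mathbf{h}_u^* + N_0} \tag{26}$$
is the signal-to-interference-noise-and-distortion ratio (SINDR) at the $u$th UE. (Footnote 5: One can establish (25) also by noting that Gaussian noise is the worst noise for Gaussian inputs [47].) Here, $\mathbf{C}_{\mathbf{dd}} = \mathbb{E}[\mathbf{d}\mathbf{d}^H]$ denotes the covariance of the distortion $\mathbf{d}$. It is worth pointing out that the choice of the auxiliary channel (21) corresponds to the use of mismatched nearest-neighbor decoding at the UEs [48, 49].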
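Next, we use (13) to write the covariance matrix $\mathbf{C}_{\mathbf{dd}}$ in (26) as
$$\mathbf{C}_{\mathbf{dd}} = \mathbf{C}_{\mathbf{xx}} - \mathbf{G}\mathbf{P}\mathbf{P}^H\mathbf{G}^H, \tag{27}$$
where $\mathbf{C}_{\mathbf{xx}} = \mathbb{E}[\mathbf{x}\mathbf{x}^H]$ is the covariance matrix of the quantized signal $\mathbf{x}$. In the special case of 1-bit DACs, $\mathbf{C}_{\mathbf{xx}}$ can be written in closed form as [50, 51]
$$\mathbf{C}_{\mathbf{xx}} = \frac{2P}{\pi B}\Big(\arcsin\!\big(\mathbf{D}^{-1/2}\,\Re\{\mathbf{P}\mathbf{P}^H\}\,\mathbf{D}^{-1/2}\big) + j\,\arcsin\!\big(\mathbf{D}^{-1/2}\,\Im\{\mathbf{P}\mathbf{P}^H\}\,\mathbf{D}^{-1/2}\big)\Big), \quad \mathbf{D} = \mathrm{diag}\big(\mathbf{P}\mathbf{P}^H\big), \tag{28}$$
where $\arcsin(\cdot)$ is applied entry-wise. Thus, using (27) and (28), we can express the SINDR in (26) in closed form for the case of 1-bit DACs. Substituting (26) in (25), one obtains a lower bound on the per-user achievable rate with Gaussian signaling for the 1-bit DAC case. Unfortunately, no closed-form expression for $\mathbf{C}_{\mathbf{xx}}$ is available for the multi-bit DAC case. We address this problem in the next section.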
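The chain from the precoding matrix to a per-user rate bound in the 1-bit case can be sketched numerically. The NumPy code below is a hedged illustration that assumes unit-power Gaussian symbols, the 1-bit Bussgang gain $\sqrt{2P/(\pi B)}\,\mathrm{diag}(\mathbf{P}\mathbf{P}^H)^{-1/2}$, the arcsine law for the covariance of the quantized signal, and a rate bound of the form $\log_2(1 + \text{SINDR})$. The function name, the power-normalized ZF precoder, and the parameter values are assumptions for the example.

```python
import numpy as np


def one_bit_rate_lower_bound(H, P_mat, P, N0):
    """Per-UE SINDR and log2(1 + SINDR) for 1-bit DACs, assuming s ~ CN(0, I)."""
    B = H.shape[1]
    C_zz = P_mat @ P_mat.conj().T                    # covariance of z = P s
    d = np.sqrt(np.real(np.diag(C_zz)))              # per-antenna std of z
    G = np.diag(np.sqrt(2 * P / (np.pi * B)) / d)    # 1-bit Bussgang gain
    R = C_zz / np.outer(d, d)                        # normalized covariance
    C_xx = (2 * P / (np.pi * B)) * (
        np.arcsin(np.clip(R.real, -1, 1)) + 1j * np.arcsin(np.clip(R.imag, -1, 1))
    )                                                # arcsine law
    C_dd = C_xx - G @ C_zz @ G.conj().T              # distortion covariance
    A = H @ G @ P_mat                                # effective U x U channel
    signal = np.abs(np.diag(A)) ** 2
    mui = np.sum(np.abs(A) ** 2, axis=1) - signal    # multiuser interference
    dist = np.real(np.einsum("ub,bc,uc->u", H, C_dd, H.conj()))  # h_u^T C_dd h_u^*
    sindr = signal / (mui + dist + N0)
    return sindr, np.log2(1 + sindr), C_xx


rng = np.random.default_rng(0)
U, B, P, N0 = 4, 64, 1.0, 0.1  # illustrative values (assumed)
H = (rng.standard_normal((U, B)) + 1j * rng.standard_normal((U, B))) / np.sqrt(2)

# Power-normalized ZF precoder (normalization chosen for this sketch only).
Q = H.conj().T @ np.linalg.inv(H @ H.conj().T)
P_zf = Q * np.sqrt(P / np.trace(Q @ Q.conj().T).real)

sindr, rate, C_xx = one_bit_rate_lower_bound(H, P_zf, P, N0)
```

The diagonal of `C_xx` equals `P / B` for every antenna, reflecting the constant per-antenna power of 1-bit outputs.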
III-E Achievable Rate Approximation for Multi-Bit DACs
In this section, we provide an approximation of (26) for the multi-bit DAC case, which is derived under the assumption that both $B$ and $U$ are large and that the error term $e_u$ in (16) is a Gaussian random variable. The approximation relies on standard random matrix theory arguments. Specifically, let
(29) 
where the normalization by , given by
(30)  
ensures that the power constraint (2) is satisfied. In (29), the function is the cumulative distribution function of a Gaussian random variable. Let also be defined as follows:
(31) 
Following the same approach as in [52, 53, 54], one can show that, for the three linear-quantized precoders (WF, ZF, and MRT) introduced in Section III-A, the SINDR in (26) can be approximated for large $B$ and $U$ by
(32)  
(33)  
(34) 
Substituting (32)–(34) into (25), one gets an approximation of the achievable rate with Gaussian signaling and nearest-neighbor decoding that is valid for large $B$ and $U$. In Section V, we verify through numerical simulations that this approximation is accurate already for realistic values of $B$ and $U$.
IV Nonlinear Precoders for 1-Bit DACs
We now investigate nonlinear precoders that seek approximate solutions to the MMSE-optimal problem detailed in Section II-B. We shall focus on the extreme case of 1-bit DACs, for which the problem simplifies and efficient numerical algorithms can be developed.
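We start by noting that, in the 1-bit case, all DAC outcomes have equal amplitude, and that $\|\mathbf{x}\|_2^2 = P$ if one sets $\alpha\Delta = \sqrt{2P/B}$ in (9). This observation allows us to formulate the 1-bit quantized precoding (QP) problem as follows:
$$\text{(QP)} \quad \begin{cases} \underset{\mathbf{x} \in \mathcal{X}^B,\, \beta \in \mathbb{R}}{\text{minimize}} & \|\mathbf{s} - \beta\mathbf{H}\mathbf{x}\|_2^2 + \beta^2 U N_0 \\ \text{subject to} & \beta > 0. \end{cases} \tag{35}$$
Here, $\mathcal{X} = \big\{\sqrt{P/(2B)}\,(\pm 1 \pm j)\big\}$. The resulting precoded vector $\mathbf{x}^{\text{QP}}$ and the associated precoding factor $\beta^{\text{QP}}$ are referred to as the optimal solution to the problem (35).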
Compared to the problem (LQP) in (5), where we minimize the MSE averaged over both the symbol vector $\mathbf{s}$ and the noise vector $\mathbf{n}$ (for a given $\mathbf{H}$), in (QP) we minimize the MSE averaged over the noise vector $\mathbf{n}$ (for a given $\mathbf{s}$ and $\mathbf{H}$). Since the optimization problem is solved for a given $\mathbf{s}$, the precoding factor $\beta$ depends on $\mathbf{s}$; this is in contrast to the linear-quantized case, where $\beta$ depends only on $\mathbf{H}$. (Footnote 6: We shall discuss how the dependence of $\beta$ on $\mathbf{s}$ affects decoding at the receiver side in Section IV-D.)
We note that (QP) in (35) resembles an $\ell_2$-norm regularized closest-vector problem (CVP), with the unique feature that the discrete set of vectors is parametrized by the continuous precoding factor $\beta$. This prevents the straightforward use of conventional algorithms to approximate CVPs [55, 56]. Since the objective function in (35) is a quadratic function in $\beta$, we can compute the optimal value of $\beta$ as
$$\hat{\beta}(\mathbf{x}) = \frac{\Re\{\mathbf{s}^H\mathbf{H}\mathbf{x}\}}{\|\mathbf{H}\mathbf{x}\|_2^2 + U N_0}, \tag{36}$$
which depends on $\mathbf{x}$. Inserting (36) into the objective function in (35), we obtain the following equivalent formulation of the QP problem:
(37) 
To obtain $\beta^{\text{QP}}$, we can then simply evaluate (36) for the optimal vector $\mathbf{x}^{\text{QP}}$. We emphasize that a straightforward exhaustive search to solve (QP) requires the evaluation of $|\mathcal{X}|^B = 4^B$ candidate vectors, a quantity that grows exponentially with the number of BS antennas $B$. For a system with antennas at the BS, this approach would require us to evaluate the objective function more than times (more than 10 quattuorvigintillion times). In fact, for a fixed value of $\beta$, the problem (QP) is a closest vector problem that is NP-hard [57]. This implies that there are no known algorithms to solve such problems efficiently for large values of $B$. (Footnote 7: As we will show in Section IV-C, we can, in some cases, design branch-and-bound methods (such as sphere-decoding methods) that allow us to solve the quantized precoding problem efficiently for moderately-sized problems. For massive MU-MIMO systems with hundreds of antennas, however, such methods still exhibit prohibitive computational complexity.) Hence, alternative algorithms that solve a lower-complexity version of the QP problem are required for massive MU-MIMO systems.
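For very small arrays, the exhaustive search described above can be written out directly. The sketch below is an illustration under assumptions: the cost is taken to be $\|\mathbf{s} - \beta\mathbf{H}\mathbf{x}\|_2^2 + \beta^2 U N_0$, the per-candidate scaling is the minimizer of this quadratic in $\beta$, i.e., $\Re\{\mathbf{s}^H\mathbf{H}\mathbf{x}\}/(\|\mathbf{H}\mathbf{x}\|_2^2 + U N_0)$, and the 1-bit alphabet is $\sqrt{P/(2B)}(\pm 1 \pm j)$. The function names and the toy dimensions are hypothetical; with $B = 4$ antennas there are only $4^4 = 256$ candidates.

```python
import itertools
import math
import random


def matvec(H, x):
    """Multiply a U x B matrix (list of rows) with a length-B vector."""
    return [sum(h * xb for h, xb in zip(row, x)) for row in H]


def optimal_beta(s, H, x, N0):
    """Minimizer over beta of ||s - beta H x||^2 + beta^2 U N0 for fixed x."""
    Hx = matvec(H, x)
    num = sum((si.conjugate() * v).real for si, v in zip(s, Hx))
    den = sum(abs(v) ** 2 for v in Hx) + len(s) * N0
    return num / den


def objective(s, H, x, beta, N0):
    Hx = matvec(H, x)
    return sum(abs(si - beta * v) ** 2 for si, v in zip(s, Hx)) + beta ** 2 * len(s) * N0


def exhaustive_qp(s, H, P, N0):
    """Try all 4^B one-bit vectors and keep the best one with beta > 0."""
    B = len(H[0])
    a = math.sqrt(P / (2 * B))
    alphabet = [complex(re * a, im * a) for re in (-1, 1) for im in (-1, 1)]
    best = None
    for x in itertools.product(alphabet, repeat=B):
        beta = optimal_beta(s, H, x, N0)
        if beta <= 0:
            continue  # the problem requires a strictly positive precoding factor
        cost = objective(s, H, x, beta, N0)
        if best is None or cost < best[0]:
            best = (cost, x, beta)
    return best


rng = random.Random(0)
U, B, P, N0 = 2, 4, 1.0, 0.1  # toy sizes (assumed)
H = [[complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
      for _ in range(B)] for _ in range(U)]
s = [complex(1, 1) / math.sqrt(2), complex(-1, 1) / math.sqrt(2)]  # QPSK symbols

cost, x_best, beta_best = exhaustive_qp(s, H, P, N0)
```

The loop makes the exponential growth tangible: every additional antenna multiplies the number of candidates by four, which is why this approach is limited to toy problems.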
In order to develop such computationally efficient algorithms, we start by defining the auxiliary vector and rewrite (35) in the following equivalent form:
(38) 
Here, . To obtain (38), we have used that . Let be the solution to (38). The resulting precoding vector is obtained by scaling each entry of so that it belongs to the set . Clearly, is the scaling parameter.
It turns out to be convenient to transform the complex-valued problem (38) into an equivalent real-valued problem using the following definitions:
These definitions enable us to rewrite (38) as
(39) 
where is the set of scaled antipodal outputs of each 1-bit DAC. We shall next develop a variety of nonlinear precoding methods that find approximate solutions to the problem (39).
IV-A Semidefinite Relaxation
Semidefinite relaxation (SDR) is a well-established technique to develop approximate algorithms for a variety of discrete programming problems [58]. For example, SDR is commonly used to find near-ML solutions for the MU-MIMO detection problem (see, e.g., [58, 59]). For the case when the BS is equipped with infinite-resolution DACs, SDR has been used for downlink precoding in [60, 61]. We next show how SDR can be used to find approximate solutions to (35).
In our context, SDR involves relaxing (39) to a semidefinite program (SDP) as follows. We start by writing the real-valued problem (39) in the following equivalent form [58]:
(40) 
If then is the solution to (40); if , the solution is . Next, let the matrix be defined as follows:
(41) 
Also, let . Following steps similar to those in [58], we rewrite the objective function in (40) as
(42) 
The problem (40) can now be reformulated as
(43) 
Here, denotes the set of real and symmetric matrices. To see why (40) and (43) are equivalent, remember that , which implies that has rank 1, and that for , and .
Unfortunately, the rank-1 constraint in (43) is nonconvex, which makes this problem just as hard to solve as the original QP problem in (35). Nevertheless, we can use SDR to relax the problem in (43) by omitting the rank-1 constraint, which results in the following SDP:
(44) 
This problem can be solved efficiently using standard methods from convex optimization [62]. If the solution matrix has rank one, then (SDR-QP) finds the exact solution to the problem (QP) in (39). If, however, the rank exceeds one, we have to extract a precoding vector that belongs to the discrete set . As is commonly done, one can obtain such a vector by first performing an eigenvalue decomposition of and by then quantizing the first entries of the leading eigenvector . To this end, let denote the real-valued counterpart of , whose th entry () is given by
(45) 
The multiplication by takes into account the potential sign change caused by . The th entry of the resulting complex-valued precoded vector is obtained as follows:
(46) 
for . We refer to this approach as SDR with a rank-one approximation (SDR1). Alternatively, we can obtain a precoding vector in using more sophisticated randomized procedures; see the survey article [58] for more details. We refer to this approach as SDR with randomization (SDRr).
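The rank-one extraction step can be sketched as follows. The sketch assumes that the leading eigenvector has already been computed by some SDP and eigenvalue solver, that its last entry carries the sign ambiguity, that real parts are stacked before imaginary parts, and that each real component has amplitude sqrt(P/(2B)). The function names and the example vector `u` are hypothetical.

```python
import math

def sgn(v):
    """Sign with the convention sgn(0) = +1, so outputs are always antipodal."""
    return 1.0 if v >= 0 else -1.0

def extract_precoder(u, power=1.0):
    """Map a leading eigenvector u of length 2B+1 to B complex 1-bit outputs."""
    num_antennas = (len(u) - 1) // 2
    flip = sgn(u[-1])                            # resolve the sign ambiguity via the last entry
    signs = [sgn(flip * v) for v in u[:-1]]      # quantize the first 2B entries
    amp = math.sqrt(power / (2 * num_antennas))  # assumed per-component amplitude
    # Recombine the stacked real and imaginary parts into complex entries.
    return [amp * complex(signs[k], signs[k + num_antennas])
            for k in range(num_antennas)]

u = [0.5, -0.2, 0.1, -0.7, -1.0]   # hypothetical eigenvector for B = 2
x = extract_precoder(u)
print(x)
print("total power:", sum(abs(v) ** 2 for v in x))
```

With this amplitude every antenna transmits power P/B, so the quantized vector meets the total power P regardless of the eigenvector that produced it.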
SDR enables the computation of approximate solutions to the NP-hard problem (QP) in polynomial time. Specifically, the worst-case complexity scales with [58]. However, SDR lifts the problem to a higher dimension: from dimensions to dimensions. Furthermore, implementing the corresponding numerical solvers entails high hardware complexity [63]. Recently, a hardware-friendly approximate SDR solver for problems of dimension up to was proposed in [63]. However, the complexity of this solver still prevents its use for massive MU-MIMO systems with hundreds of antennas. Hence, we conclude that SDR is a suitable technique only for small to moderately sized systems (e.g., 16 BS antennas or fewer). For larger antenna arrays, alternative methods are necessary. One such method is described next.
IV-B Squared ℓ∞-Norm Relaxation
We next present a novel method for approximately solving (35), which avoids lifting the problem to a higher dimension and has low complexity. We start by rewriting the real-valued optimization problem (39) as
(47) 
where we used that under the constraint that for . By dropping the nonconvex constraints for , we obtain the following convex relaxation of (47):
(48) 
which, as we shall see, can be solved efficiently. To extract a feasible precoding vector from the solution to the problem (48), we quantize the entries of the vector to the quaternary set by computing
(49) 
for , where is the real-valued counterpart of . As in (46), we then obtain the complex-valued precoded vector as follows:
(50) 
for . There exist several numerical optimization methods that are capable of solving problems of the form (48) in a computationally efficient manner. The most prominent methods are forward-backward splitting (FBS) [64, 65] and Douglas-Rachford (DR) splitting [66, 67]. In what follows, we develop a DR splitting method, which we refer to as squared-infinity norm Douglas-Rachford splitting (SQUID). We define the two convex functions and , and solve
(51) 
Let
(52) 
define the proximal operator for the function [64]. By initializing and , SQUID performs the following iterative procedure for until convergence or until a maximum number of iterations has been reached:
(53)  
(54)  
(55) 
The proximal operator in (53) has the following simple expression (Footnote 8: One can further accelerate the evaluation of this proximal operator by using the Woodbury matrix identity, which reduces the dimension of the matrix inverse, and by precomputing certain constant quantities, such as ):
(56) 
While the proximal operator for the ℓ∞-norm is well known in the literature [64], the proximal operator for the squared ℓ∞-norm, needed in (54), appears to be novel. The following theorem details an efficient procedure for computing this proximal operator. The proof is given in Appendix B.
Theorem 4
Let . Then, the squared ℓ∞-norm proximal operator
(57) 
can be computed using the procedure summarized in Algorithm 1.
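A sort-and-clip procedure of this kind can be sketched in a few lines. The sketch assumes the parametrization argmin over b of lam·‖b‖∞² + ½‖b − w‖² for a real-valued input w and a weight lam > 0; the weighting in (57) and the exact steps of Algorithm 1 may differ. The names `prox_sq_inf_norm` and `cost` are hypothetical.

```python
def prox_sq_inf_norm(w, lam):
    """Return argmin_b lam * max_i |b_i|^2 + 0.5 * sum_i (b_i - w_i)^2 for real w."""
    mags = sorted((abs(v) for v in w), reverse=True)
    alpha, running = 0.0, 0.0
    for k, m in enumerate(mags, start=1):
        running += m                                   # sum of the k largest magnitudes
        alpha = max(alpha, running / (2.0 * lam + k))  # candidate clipping level
    # The minimizer clips every entry of w to the interval [-alpha, alpha].
    return [max(-alpha, min(alpha, v)) for v in w]

def cost(b, w, lam):
    """Objective that the proximal operator minimizes (used only for checking)."""
    return lam * max(abs(v) for v in b) ** 2 + 0.5 * sum((x - y) ** 2 for x, y in zip(b, w))

w = [3.0, -1.0, 0.5]
lam = 0.5
b = prox_sq_inf_norm(w, lam)
print(b, cost(b, w, lam))
```

The clipping level follows from the optimality condition 2·lam·alpha = Σ max(|w_i| − alpha, 0), whose unique root equals the largest of the running ratios computed in the loop. The cost of the procedure is dominated by the sort.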
In summary, SQUID enables us to solve the relaxed problem in (48) in a computationally efficient manner. Indeed, each iteration requires only simple matrix and vector operations, and the evaluation of the proximal operator in Algorithm 1. The performance of SQUID is investigated in Section V, where we demonstrate that this low-complexity algorithm achieves performance comparable to SDR, which is a far more demanding algorithm in terms of computational complexity.
IV-C Sphere Precoding
Sphere decoding (SD) is a common method to solve CVPs exactly but at lower average computational complexity than a naïve exhaustive search [56, 55, 68]. The idea of SD is to constrain the search for possible optimal solutions to a hypersphere of radius . By transforming the optimal CVP into a tree-search problem, one can then perform a depth-first branch-and-bound procedure and prune branches that exceed the radius constraint to reduce the number of candidate vectors. While SD reduces (often significantly) the average complexity compared to an exhaustive search, it was shown to exhibit exponential complexity in the number of variables for data detection in multi-antenna wireless systems [69, 70].
To adapt SD to 1-bit quantized precoding (we call this adaptation sphere precoding (SP)), we proceed as follows. Assume that the optimal precoding factor is known. Then, we can rewrite the objective function in (37) as follows:
(58)  
(59) 
In (58), we used that in the 1-bit case; in (59) we set and . Hence, we can write the precoding problem as
(60) 
which can be solved using SD. More specifically, by computing the QR factorization , where with and is upper triangular with nonnegative diagonal entries, we obtain the equivalent problem
(61) 
The triangular structure of this problem allows us to deploy standard SD methods, such as the one in [55].
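The way the triangular structure enables pruning can be illustrated with a minimal depth-first branch-and-bound sketch. It assumes a real-valued problem with an unscaled antipodal alphabet {−1, +1}, an upper-triangular matrix `R`, and a target vector `y`, all of which are hypothetical toy values. It is a generic sphere search and not the specific method of [55].

```python
from itertools import product

def sq_cost(R, y, b):
    """Squared Euclidean norm of y - R b."""
    return sum((y_i - sum(r * v for r, v in zip(row, b))) ** 2
               for y_i, row in zip(y, R))

def sphere_search(R, y, alphabet=(-1.0, 1.0)):
    """Minimize ||y - R b||^2 over b in alphabet^n for upper-triangular R."""
    n = len(y)
    best = {"cost": float("inf"), "b": None}   # squared radius shrinks as leaves are found
    b = [0.0] * n

    def descend(level, partial_cost):
        if level < 0:                          # reached a leaf: all entries are assigned
            best["cost"], best["b"] = partial_cost, list(b)
            return
        for v in alphabet:
            b[level] = v
            # Row `level` only involves entries level..n-1, which are already fixed.
            resid = y[level] - sum(R[level][j] * b[j] for j in range(level, n))
            cost = partial_cost + resid * resid
            if cost < best["cost"]:            # prune branches outside the current sphere
                descend(level - 1, cost)

    descend(n - 1, 0.0)
    return best["b"], best["cost"]

# Hypothetical toy instance.
R = [[2.0, 0.5, -0.3, 0.1],
     [0.0, 1.5, 0.4, -0.2],
     [0.0, 0.0, 1.2, 0.6],
     [0.0, 0.0, 0.0, 0.9]]
y = [1.1, -0.7, 0.4, -1.3]

b_hat, best_cost = sphere_search(R, y)
brute = min(sq_cost(R, y, cand) for cand in product((-1.0, 1.0), repeat=len(y)))
print(b_hat, best_cost, brute)
```

Because partial costs never decrease along a path from the root, any branch whose partial cost already exceeds the best leaf found so far can be discarded without losing the exact minimizer. The worst case still visits all leaves.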
In practice, the optimal precoding factor is unknown. We therefore propose the following alternating optimization approach. At iteration , we initialize the algorithm with the precoding factor obtained from WF precoding. Specifically, we use (36) and set . We then solve (SP) to obtain and compute an improved precoding factor using (36). We repeat this procedure for until convergence or until a maximum number of iterations is reached. Our simulations have shown that this procedure usually converges in only to iterations and achieves near-optimal performance for small to moderately sized MIMO systems (in Section V-A, we present numerical results for the case of antennas). We note that a plethora of SD-related methods can be used to solve SP. However, the exponential complexity of SD prevents its use for massive MIMO systems with hundreds of antennas.
IV-D Decoding at the UEs
As for the case of linear-quantized precoders, we assume that the th UE is able to scale the received signal by some scaling factor . Note that the scaling factor cannot directly be chosen to be equal to the precoding factor , since depends in the nonlinear case on the instantaneous transmit vector and cannot be estimated at the UEs. It is worth noting that for the special case in which the entries of are taken from a constant-modulus constellation (e.g., PSK) and the receiver employs symbol-wise nearest-neighbor decoding (i.e., each UE maps its estimate in (3) to the nearest constellation point, which implies that both the residual MUI and the quantization error are treated as Gaussian noise, although they are not Gaussian), the scaling factor chosen by the receiver does not affect performance because the decision regions are circular sectors in the complex plane. In the simulation results in Section V, we shall focus on QPSK modulation, for which no scaling is needed. In a follow-up work [71], we presented simulation results for the case of higher-order constellations that do not satisfy the constant-modulus assumption (e.g., 16-QAM). In this case, it is sufficient to modify the precoding problem (35) so that a single value of is chosen for a block of transmit symbols whose length does not exceed the channel coherence time. This allows the UEs to estimate through pilot transmissions or blind estimation techniques.
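The claim that the receiver scaling does not matter for constant-modulus constellations can be checked with a short sketch. It assumes unit-energy QPSK and positive real scaling factors. The observations in `received` and the values of `beta` are made up for illustration.

```python
import math

# Unit-energy QPSK constellation (assumed normalization).
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]

def nearest_symbol(estimate, constellation=QPSK):
    """Symbol-wise nearest-neighbor decision."""
    return min(constellation, key=lambda c: abs(estimate - c))

received = [0.31 + 0.05j, -0.02 - 0.4j, -0.9 + 1.3j]  # hypothetical noisy observations

# Scaling by any positive factor keeps each observation in the same quadrant,
# so the decisions are identical for every beta below.
for beta in (0.1, 1.0, 7.5):
    decisions = [nearest_symbol(beta * y) for y in received]
    print(beta, decisions)
```

The same experiment with 16-QAM would give decisions that change with `beta`, which is why a common scaling factor per block is needed for constellations that are not constant modulus.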
V Numerical Results
We now present numerical simulations for the quantized precoders introduced in Section III and Section IV. Due to space constraints, we shall focus on a limited set of system parameters. (Footnote 9: Our simulation framework is available for download from GitHub (https://github.com/quantizedmassivemimo/1bit_precoding). The purpose is to enable interested readers to perform their own simulations with different system parameters and also to test alternative algorithms.)
V-A Error-Rate Performance
We start by comparing the performance of the developed precoders in terms of uncoded bit error rate (BER). In what follows, we assume that the UEs perform symbol-wise nearest-neighbor decoding.
In Fig. 3, we compare the BER with QPSK signaling and 1-bit DACs for the linear precoders presented in Section III (namely, WF, ZF, and MRT) and the nonlinear precoding algorithms presented in Section IV (namely, SDR1, SDRr, SQUID, and SP). For comparison, we also report the performance of the WF-quantized (WFQ) precoder proposed in [27], and the performance of the WF precoder for the infinite-resolution case.
In Fig. 3a, we consider the case BS antennas and