Hybrid Vector Perturbation Precoding: The Blessing of Approximate Message Passing
Vector perturbation (VP) precoding is a promising technique for multiuser communication systems operating in the downlink. In this work, we introduce a hybrid framework to improve the performance of lattice reduction (LR) aided (LRA) VP. First, we perform a simple precoding using zero forcing (ZF) or successive interference cancellation (SIC) based on a reduced lattice basis. The signal space after LR-ZF or LR-SIC precoding can be shown to be bounded to a small range, and LR guarantees sufficient orthogonality of the lattice basis. Together, these two properties pave the way for the subsequent application of an approximate message passing (AMP) algorithm, which further boosts the performance of any suboptimal precoder. Our work shows that the AMP algorithm in compressed sensing can be beneficial for a lattice decoding problem whose signal constraint lies in the integers and whose input lattice basis does not necessarily have i.i.d. Gaussian entries. Numerical results show that the developed hybrid scheme can provide performance enhancement with a negligible increase in complexity.
The broadband mobile internet of the next generation is expected to deliver high-volume data to a large number of users simultaneously. To meet this demand in the broadcast network, it is desirable to precode the transmit symbols according to the channel state information (CSI) with improved time efficiency while retaining reliability. It has been indicated that plain channel inversion performs poorly at all signal-to-noise ratios (SNRs), and further regularization cannot improve the performance substantially. In [1, 2], the authors proposed a precoding scheme called vector perturbation (VP), based on Tomlinson-Harashima precoding, which modifies the transmitted data by modulo-lattice operations. The scheme has been shown to achieve near-sum-capacity performance without requiring explicit dirty-paper techniques.
The optimization target in a VP problem represents a closest vector problem (CVP) from a lattice perspective, which has been proved NP-complete by a reduction from the decision version of CVP. Therefore, the sphere decoding technique adopted in [1, 2] (referred to as sphere precoding) is computationally prohibitive for large-scale systems. This hardness is especially pronounced in the VP precoding problem because there is no prior on the distance from a target to the lattice, and the lattice bases in VP are not Gaussian random, so that Hassibi’s expected complexity analysis no longer suits this setting. The complexity issue is one of the three main challenges associated with VP; the other two concern its power scaling factor and its large signal space [6, 7].
To bypass the complexity issue of sphere precoding, much work has been done in recent years to explore low-complexity CVP algorithms in multiuser (MU) multiple-input multiple-output (MIMO) communications, e.g., cf. [6, 8, 9, 10, 11, 12, 13]. The spirit of these results is to address CVP on a sub-lattice or to impose a constraint on the signal space of sphere precoding. E.g., in , the authors proposed to instead solve a CVP on a selected sub-basis of smaller dimension, so that the associated complexity of VP depends on the size of the new basis. The sparse vector perturbation technique in  also belongs to the class of selective vector perturbation, where it selects only two vectors. The reduction on the target vector in  is then applied to all basis vectors sequentially, which resembles a special case of the sequential lattice reduction . There is, however, no theoretical performance guarantee for these simplified methods, so we resort to a lattice reduction (LR) aided (LRA) precoding scheme [10, 15, 16], which has been shown to be diversity achieving. In addition to their theoretical guarantees, LRA methods particularly suit slow fading channels, where the lattice basis is fixed during a large number of time slots and only the CVP targets change.
We investigate VP by using LRA methods in this work. LR has become quite popular in both MIMO precoding and decoding, especially after the pioneering work of Lenstra–Lenstra–Lovász (LLL). In recent years, in addition to the polynomial-time LLL algorithm, more researchers are showing interest in strong lattice reduction algorithms such as Minkowski’s reduction [18, 19], Korkine-Zolotarev’s (KZ) reduction and its boosted version. The performance of LRA precoding is not well understood except in [10, 16], so our primary motivation is to investigate how far LRA methods can go, especially with the blessing of algorithms from compressed sensing.
We propose to use a message passing algorithm to explore the vicinity of sub-optimal solutions under the LRA framework. The approximate message passing (AMP) algorithm was initially proposed by Donoho, Montanari and Maleki in [22, 23, 24] to solve the least-absolute shrinkage and selection operator (Lasso) problem in compressed sensing, and it has much lower complexity than previous benchmark algorithms. Researchers have been adopting message passing algorithms to solve problems in MIMO detection [25, 26, 27] with small constellation sizes, where the assumed Rayleigh fading channel helps to model the input lattice basis with i.i.d. Gaussian entries. It is noteworthy that directly applying AMP in MIMO detection problems cannot be diversity achieving, because a general discrete prior renders the AMP threshold function not Lipschitz continuous at high SNR, so channel coding is often required (e.g., cf. ). If we want to embrace the low-complexity advantage of AMP, several practical issues must be overcome. i) The lattice basis in VP is not a Gaussian random one, nor is its dual, while  shows the entries have to be at least sub-Gaussian and the generalized AMP (GAMP) [29, 30] only shows convergence of the algorithm with the aid of damping. ii) The problem size may not be infinitely large, and we should make AMP feasible in the non-asymptotic region (say, the base station is equipped with antennas to serve users). iii) The constellation in AMP cannot be integers. Fortunately, AMP in conjunction with a reduced lattice basis can alleviate all these concerns.
The contributions of this paper are twofold:
1) After showing that boosted LLL/KZ (b-LLL/b-KZ) reduced bases are good for AMP, we analyze the energy efficiency of LRA precoding with zero-forcing (ZF) or successive-interference-cancellation (SIC) precoding. b-LLL/b-KZ suit compressed sensing scenarios because they yield bases with small coherence parameters, and an orthogonality metric in lattice theory reflects the same goodness. The proved bound on LR-ZF/SIC not only shows that a sub-optimal estimator has a power scaling factor not far from that of sphere precoding, but also reveals that we can subtract the LR-ZF/SIC estimate to arrive at another estimation problem with a bounded constellation size. Since the bound on the constellation size is derived from a worst-case analysis, we also show empirically that a small constellation size suffices for our new problem.
2) For the first time, the AMP algorithm is successfully deployed to address a lattice decoding problem with an arbitrary input basis and an integer prior. A reduced lattice basis may still not suit the basis assumption of AMP, so we derive a new one based on the exposition of Montanari  and Maleki . This derivation can be associated with a state evolution equation, where the impacts of lattice reduction and parameter selections are revealed explicitly. We propose to impose a ternary prior for AMP, so that the threshold functions have closed forms and the whole algorithm has relatively low complexity. This design helps to explore all the adjacent Voronoi cells of an LR-SIC/ZF one. Numerical results show that we can obtain a gain of a few dB after concatenating AMP to the previous LRA-ZF/SIC.
The rest of this paper is organized as follows. We review some basic concepts about lattices and VP in Sec. II. The hybrid scheme is explained in Sec. III, which includes demonstrations of why we have reached another problem with a finite constellation size. Sec. IV presents our AMP algorithm. Lastly, we present simulation results and conclusions.
Notation: Matrices and column vectors are denoted by uppercase and lowercase boldface letters. denotes rounding, denotes the absolute value, denotes the Euclidean norm, and stands for pseudoinverse. denotes the vector space spanned by . and denote the projection of onto and the orthogonal complement of . stands for equality up to a normalization constant. denotes , . In the message passing algorithms, we take , to index the rows and columns of , respectively. We use the standard asymptotic notation when .
An -dimensional lattice is a discrete additive subgroup in . A -lattice with basis can be represented by
The th successive minimum of is the smallest real number such that contains linearly independent vectors of length at most :
in which denotes a ball centered at with radius .
It is necessary to distinguish whether a lattice basis is good or not. Good means all the lattice vectors are short and nearly orthogonal, and this property is measured by the orthogonality defect (OD):
We have due to Hadamard’s inequality.
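As an illustration of the orthogonality defect, the following minimal sketch assumes the standard definition, namely the product of the basis vector norms divided by the lattice volume sqrt(det(B^T B)), with basis vectors stored as the columns of `B`. The function name and the two example bases are hypothetical and only serve to show that an orthogonal basis attains the Hadamard lower bound of 1 while a skewed basis does not.

```python
import numpy as np

def orthogonality_defect(B):
    """Product of column norms divided by the lattice volume sqrt(det(B^T B)).

    Assumes the columns of B are linearly independent basis vectors.
    """
    B = np.asarray(B, dtype=float)
    col_norms = np.linalg.norm(B, axis=0)
    # slogdet is used instead of det to avoid overflow in larger dimensions.
    _, logdet = np.linalg.slogdet(B.T @ B)
    return float(np.prod(col_norms) / np.exp(0.5 * logdet))

# An orthogonal basis meets Hadamard's bound with equality.
orthogonal = np.diag([2.0, 3.0])
# A skewed basis of the integer lattice Z^2 has a larger defect.
skewed = np.array([[1.0, 4.0],
                   [0.0, 1.0]])

od_orthogonal = orthogonality_defect(orthogonal)
od_skewed = orthogonality_defect(skewed)
```

Both example bases are square, but the Gram-matrix form of the volume also covers tall matrices with full column rank.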
Lattice reduction is the process to transform a bad lattice basis into a good one. Depending on what type of goodness we are pursuing, and how much complexity we can afford, there are many well developed reduction algorithms. Here we review the polynomial time LLL  reduction and the exponential time KZ  reduction because most reduction algorithms can be interpreted as variants of these two.
We shall present the definitions of LLL/KZ reduction, whose algorithmic routines can be found in . Let be the R matrix of a QR decomposition on , with elements ’s, and be a Lovász constant.
Definition 1.
A basis is called LLL reduced if it satisfies the size reduction conditions of for , , and Lovász conditions of for .
Define . If is LLL reduced, it has 
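To make the LLL conditions concrete, the sketch below assumes their textbook form in terms of the R factor of a QR decomposition: size reduction |r_ij| <= |r_ii| / 2 for i < j, and the Lovász condition delta * r_ii^2 <= r_{i,i+1}^2 + r_{i+1,i+1}^2. The reduction routine is the plain textbook LLL with Gram-Schmidt recomputed at every step for readability, not the boosted variants discussed in this paper; the function names, the value delta = 0.75, and the example basis are assumptions.

```python
import numpy as np

def gram_schmidt(B):
    """Return the orthogonalized columns Q and the coefficients mu (no normalization)."""
    n = B.shape[1]
    Q = np.zeros_like(B)
    mu = np.eye(n)
    for i in range(n):
        Q[:, i] = B[:, i]
        for j in range(i):
            mu[i, j] = (B[:, i] @ Q[:, j]) / (Q[:, j] @ Q[:, j])
            Q[:, i] -= mu[i, j] * Q[:, j]
    return Q, mu

def lll_reduce(B, delta=0.75):
    """Textbook LLL on the columns of B (slow but simple)."""
    B = np.array(B, dtype=float)
    n = B.shape[1]
    k = 1
    while k < n:
        # Size-reduce column k against columns k-1, ..., 0.
        for j in range(k - 1, -1, -1):
            _, mu = gram_schmidt(B)
            B[:, k] -= np.rint(mu[k, j]) * B[:, j]
        Q, mu = gram_schmidt(B)
        lhs = Q[:, k] @ Q[:, k]
        rhs = (delta - mu[k, k - 1] ** 2) * (Q[:, k - 1] @ Q[:, k - 1])
        if lhs >= rhs:              # Lovász condition holds, move on
            k += 1
        else:                       # otherwise swap and step back
            B[:, [k - 1, k]] = B[:, [k, k - 1]]
            k = max(k - 1, 1)
    return B

def is_lll_reduced(B, delta=0.75, tol=1e-9):
    """Check size reduction and Lovász conditions on the R factor of B."""
    R = np.linalg.qr(np.asarray(B, dtype=float), mode="r")
    n = R.shape[1]
    size_ok = all(abs(R[i, j]) <= 0.5 * abs(R[i, i]) + tol
                  for i in range(n) for j in range(i + 1, n))
    lovasz_ok = all(delta * R[i, i] ** 2 <= R[i, i + 1] ** 2 + R[i + 1, i + 1] ** 2 + tol
                    for i in range(n - 1))
    return size_ok and lovasz_ok

# Hypothetical skewed basis of Z^3 (columns are basis vectors).
skewed = np.array([[1.0, 4.0, 9.0],
                   [0.0, 1.0, 7.0],
                   [0.0, 0.0, 1.0]])
reduced = lll_reduce(skewed)
```

Because every update is an integer column operation or a swap, the reduced basis spans the same lattice and the absolute determinant is preserved.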
Definition 2.
A basis is called KZ reduced if it satisfies the size reduction conditions, and the projection conditions of being the shortest vector of the projected lattice for .
If is KZ reduced, it has 
It has been shown in  that the boosted version of LLL/KZ can produce shorter and more orthogonal basis vectors.
Definition 3.
A basis is called boosted LLL (b-LLL) reduced if it satisfies diagonal reduction conditions of for , and all for are reduced by an approximate CVP oracle with list size along with a rejection operation.
Although the definition of b-LLL ensures that it performs no worse than LLL, only the same bound on OD has been proved: .
Definition 4.
A basis is called boosted KZ (b-KZ) reduced if it satisfies the projection conditions as KZ, and the length reduction conditions of for , where is a lattice quantizer.
If is b-KZ reduced, it has
II-B Vector Perturbation and Optimization
Vector perturbation is a precoding technique that aims to minimize the transmitted power that is associated with the transmission of a certain data vector [1, 2]. Assume the MIMO system is equipped with transmit antennas and individual users, and each user has only one receive antenna. The observed signal at users to can be collectively expressed as
where denotes a channel matrix whose entries admit , is a transmitted signal, and is an additive Gaussian noise.
With perfect channel knowledge at transmitter’s side, the transmitted signal is designed to be a truncation of the channel inversion precoding :
where is an integer vector to be optimized, is the symbol vector. We set , because any quadrature amplitude modulation (QAM) constellation can be transformed to this format after adjusting (6), which means has an equivalent QAM size of
Assume the transmitted signal has unit power, and is a normalization factor. Then the received data at users can be represented as
Let , , since , the receive equation can be transformed to
From (8), we can see that if , where , then can be faithfully recovered.
To decrease the decoding error probability which is dominated by , we have to address the following optimization problem at the transmitter:
Define , , then (9) represents a closest vector problem (CVP) of lattice :
This CVP is different from the CVP in MIMO detection  because the distance distribution from to lattice is not known, the lattice basis is generally not admitting Gaussian distributions, and the optimization domain of is in rather than a finite constellation.
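The following toy sketch illustrates how VP leads to a CVP and how the receiver removes the perturbation with a modulo operation. It is a simplified model under stated assumptions, not the exact system of this paper: a real-valued square channel `H`, 4-PAM symbols with modulo base `tau = 4`, no noise, no power normalization, and an exhaustive search over perturbations in {-1, 0, 1}^n in place of a proper CVP solver. All names are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 4                                   # users = transmit antennas (assumed)
tau = 4.0                               # assumed modulo base for 4-PAM
H = rng.standard_normal((n, n))         # assumed real-valued channel
s = rng.choice([-1.5, -0.5, 0.5, 1.5], size=n)

H_pinv = np.linalg.pinv(H)
y = H_pinv @ s                          # CVP target
B = tau * H_pinv                        # lattice basis

def vp_brute_force(B, y, radius=1):
    """Search x in {-radius..radius}^n minimizing ||y - B x||^2."""
    best_x, best_cost = None, np.inf
    for cand in itertools.product(range(-radius, radius + 1), repeat=B.shape[1]):
        x = np.array(cand, dtype=float)
        cost = np.sum((y - B @ x) ** 2)
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost

x_hat, power_vp = vp_brute_force(B, y)
power_zf = np.sum(y ** 2)               # plain channel inversion (x = 0)

t = y - B @ x_hat                       # perturbed transmit vector
r = H @ t                               # noiseless receive signal: s - tau * x_hat
s_rec = r - tau * np.rint(r / tau)      # modulo-tau receiver strips the perturbation
```

Since the zero perturbation is among the candidates, the perturbed transmit power can never exceed that of plain channel inversion in this sketch.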
III The hybrid scheme
Apply lattice reduction to a not necessarily good input basis of to get , , and use this new basis to obtain a sub-optimal candidate , e.g., .
Let and define a finite constraint .
Use our AMP algorithm to solve:
In order to show the hybrid scheme is valid, we try to answer the following three questions in this paper:
To make the reduced basis good for AMP, which lattice reduction algorithm should we adopt? Answer: We should use b-LLL/b-KZ. These algorithms excel in the “short and orthogonal” metrics. See Appx. A for more details.
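The steps of the hybrid scheme can be sketched as follows, assuming an already reduced basis `B_red` is given. The ZF candidate is obtained by rounding the least-squares solution, and the residual problem over a ternary constraint is solved here by exhaustive search purely as a stand-in for the AMP stage, which is only feasible for toy dimensions. The basis, the target and all function names are hypothetical.

```python
import itertools
import numpy as np

def zf_round(B, y):
    """ZF candidate: round the real-valued least-squares solution."""
    return np.rint(np.linalg.pinv(B) @ y)

def refine_ternary(B, y, x0):
    """Exhaustive stand-in for AMP: search x0 + d with d in {-1, 0, 1}^n."""
    residual = y - B @ x0               # new CVP target after subtracting the candidate
    best_d = np.zeros_like(x0)
    best_cost = residual @ residual
    for cand in itertools.product((-1.0, 0.0, 1.0), repeat=B.shape[1]):
        d = np.array(cand)
        e = residual - B @ d
        if e @ e < best_cost:
            best_d, best_cost = d, e @ e
    return x0 + best_d

# Hypothetical reduced basis (columns) and target.
B_red = np.array([[1.0, 0.4, 0.3],
                  [0.0, 0.9, 0.5],
                  [0.0, 0.0, 0.8]])
y = np.array([2.3, -1.7, 0.45])

x_zf = zf_round(B_red, y)
x_hybrid = refine_ternary(B_red, y, x_zf)
dist_zf = np.linalg.norm(y - B_red @ x_zf)
dist_hybrid = np.linalg.norm(y - B_red @ x_hybrid)
```

The refinement includes the zero correction, so the second stage can only keep or shrink the distance achieved by the ZF candidate.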
III-A The bound of
In the application to precoding, we show in this section that the estimation range is bounded after LRA precoding. We first analyze the energy efficiency of b-LLL/b-KZ aided ZF/SIC, and then address the bound for based on . (Footnote 1: In , is referred to as the proximity factor in the CVP context. To avoid confusion with the proximity factor in , we simply call it “energy efficiency”.)
The energy efficiency of an algorithm providing is the smallest in the constraint
where , and we say this algorithm solves -CVP.
The practical implication of is to describe how far a suboptimal perturbation is from an optimal one.
For the serial SIC algorithm (footnote 2: readers unfamiliar with SIC routines may consult ), if the lattice basis is reduced by b-LLL, then
where ; and if the basis is reduced by b-KZ, then
Since the lower bounds of for b-LLL/b-KZ are no worse than those of LLL/KZ, we can use the of classic LLL/KZ if they exist. So Eq. (13) is adapted from LLL in [10, Lem. 1]. Since no result about the of KZ is known, we prove a sharp bound for b-KZ in Appx. B, where the technique involved is essentially due to . ∎
For the parallel ZF algorithm, if the lattice basis is reduced by b-LLL, then
and if the basis is reduced by b-KZ, then
see Appx. C. ∎
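For readers who want to experiment with the quantities in these theorems, the sketch below implements the serial SIC estimator through a QR decomposition and measures an empirical distance ratio against a local exhaustive search. Two assumptions apply: the energy efficiency is taken to be the ratio of squared distances between the estimate and the best lattice point, and the local search around the SIC solution only stands in for an exact CVP solver. The basis and all names are hypothetical.

```python
import itertools
import numpy as np

def sic_estimate(B, y):
    """Serial SIC: back-substitution with rounding on the R factor."""
    Q, R = np.linalg.qr(B)
    y_rot = Q.T @ y
    n = B.shape[1]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Cancel the already decided layers, then round the current one.
        x[i] = np.rint((y_rot[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i])
    return x

def empirical_efficiency(B, y, radius=2):
    """Squared-distance ratio of SIC to the best point in a box around it."""
    x_sic = sic_estimate(B, y)
    cost_sic = np.sum((y - B @ x_sic) ** 2)
    best = cost_sic
    for cand in itertools.product(range(-radius, radius + 1), repeat=B.shape[1]):
        x = x_sic + np.array(cand, dtype=float)
        best = min(best, np.sum((y - B @ x) ** 2))
    return cost_sic / best

# Hypothetical basis (columns) and random targets.
B = np.array([[1.0, 0.4, 0.3],
              [0.0, 0.9, 0.5],
              [0.0, 0.0, 0.8]])
rng = np.random.default_rng(1)
ratios = [empirical_efficiency(B, rng.uniform(-5.0, 5.0, size=3)) for _ in range(50)]
worst_ratio = max(ratios)
```

A ratio of 1 means SIC already found the best point in the search box; larger values indicate how much transmit energy the sub-optimal estimate costs on that target.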
Unfortunately, (15) is no better than that of LLL in [10, Lem. 1]. The hardness in the analysis is to incorporate the effect of length reduction of b-LLL/b-KZ, while Thm. 2 only employs their projection conditions or diagonal reduction conditions. Since our empirical survey strongly suggests using b-LLL/b-KZ for the ZF estimator (see e.g., Figs. 6 and 9), we need Thm. 2 to claim their bounds on .
is related to in the following way. denotes the symbol bound of . For a reduced basis, we have the following relations: , , and the values of and can be found in .
is bounded by the covering radius of , so that from the triangle inequality,
With unitary transform, we have . Then it is reminiscent of evaluating an equality from sphere decoding:
In the th layer, one has
Similarly in the th layer,
By induction, we can obtain the upper bounds of . The concrete values of these bounds are easily evaluated by plugging in the values of , and based on the chosen LR aided ZF/SIC algorithms.
The theoretical bound of represents a worst case scenario. Although we have proved the existence of these upper bounds, it is not necessary to evaluate these values because, in practice, LR aided ZF/SIC are quite close to the optimal one.
In Fig. 2, we plot the maximal error for the ZF estimator under Monte Carlo runs. Four groups of simulations are tested with system size or and the size of constellations set as or . Among the four histograms inside Fig. 2, we can see that is the worst case behavior, and the probability of correct decoding decreases from around to when the system size increases from to .
With similar settings, we have also plotted histograms of the SIC estimator. The probability of correct decoding with is about , and it slightly decreases to when . The maximal symbol errors are most likely to occur with , and there exists a small probability that we have when . Changes in the constellation size have almost no impact on these histograms.
III-C Things are ready for AMP?
Regarding the constellation of , the previous discussions have demonstrated that the error of the ZF/SIC estimator is bounded by a function of the system dimension and some inherent lattice metrics. This means we are not facing an infinite lattice decoding problem with constellations in Eq. (11), whence the application of AMP becomes possible. Moreover, the bound of can be made very small when designing our AMP algorithm.
Regarding the channel matrix , it is short and nearly orthogonal after lattice reduction. If the basis satisfies a sub-Gaussian assumption, then one can adopt the well-developed AMP [22, 23, 24] or GAMP [29, 30] algorithms to solve our problem in Eq. (11). Indeed, the rigorous proofs [28, 36] showing that the AMP algorithm can track the symbol-wise estimation errors rely on this foundation. We assume that a reduced basis can approximately take the blessing of this sub-Gaussian assumption. Rigorously proving this equivalence seems technically complicated, but our simulation results support the plausibility of modeling reduced bases as sub-Gaussian. To improve the accuracy of this approximation, we slightly modify the AMP algorithm to make it work with column-wise i.i.d. basis entries. To be specific, even in the cases where the successive minima constitute a basis, the entries of the channel matrix cannot be uniformly normalized to . This motivates us to adjust the classic AMP algorithm in [22, 23, 24], aiming for the simplest routines.
As for the distribution of noise , it is not known a priori. We can equip with a Gaussian distribution whose radius is in the order . To be specific, let , . This is crucial for obtaining a non-informative likelihood function of , i.e., .
IV AMP algorithm for Eq. (11)
By combining the non-informative likelihood function with the signal prior , we obtain a Maximum-a-Posteriori (MAP) function for Bayesian estimation:
where , and the prior is to be designed in subsection IV-E. The MAP function is not discrete, so the measure events are extended from a power set (e.g., message passing decoding of LDPC ) to a field in the Lebesgue measure space.
The simplified belief propagation (BP)  in IV-A is folklore and can be found in some pioneering literature [22, 23, 24, 31]; it is nevertheless included to help in understanding the derivation in the following subsections. After deriving our AMP algorithm, we will present the threshold functions of certain priors and characterize the symbol-wise estimation errors in Thm. 3.
IV-A Simplified BP
In the BP algorithm, there are factor nodes and variable nodes, indexed by and respectively. The message from variable to factor is given by
where the message from to is
These messages are impractical to evaluate in the Lebesgue measure space, and thus often simplified by many techniques. We attempt to remove the complexities from an expectation propagation  perspective. Suppose the message in Eq. (19) is estimated by a Gaussian function with mean and variance , then
In the other direction, we work out messages with Gaussian functions through matching their first and second order moments by the following constraints:
where and are referred to as threshold functions. From Eq. (24), inferring and its variance from by using the MAP principle yields:
Comparing Eq. (29) with the previously defined mean and variance , we have
Our derivation equips with a density function that can be fully described by its first and second moments, so that one obtains their moment equations when passing back. In [31, Lem. 5.3.1], Maleki applied the Berry–Esseen theorem to prove that approximating with a Gaussian is tight. Although our variance of looks different from his, they are indeed equivalent if we set the variance of as . Moreover, [31, Lem. 5.5.4] also justifies the correctness on the other side of our approximation.
IV-B Reaching scalars
For a reduced lattice basis , we denote Then the variance of entries in can be equipped, e.g., , so one can employ this knowledge to further simplify the algorithm in IV-A. Here we define
By equipping all the with equal magnitude , referred to as , as well as using due to the law of large numbers, it yields
For the moment, we can expand the local estimations about and as , , so the techniques in [24, 23] can be employed. The crux of these transformations is to neglect elements whose amplitudes are no larger than . Subsequently, Eqs. (32) and (33) become
In (35), terms with indexes are mutually related while others are not, so that
Further expand the r.h.s. of (36) with the first order Taylor expression of at , in which
then it yields
Distinguishing terms that are dependent on index leads to
IV-C Further simplification
From (42), the estimated variance for each now becomes
As , (31) tells
According to (46), we denote as , then the whole algorithm can be described by the following four steps:
After lattice reduction and transforming the constraint to a finite set, we recognize that the AMP/GAMP algorithms in [26, 22, 29] can be employed for our problem after further regularizing the channels (i.e., let and update the prior ). However, Algo. 1 still gives valuable insights in the following aspects:
i) We can explicitly study the impact of channel power on the state evolution equation based on our derivation, as shown in Sec. IV-F. Moreover, in Algo. 1 reveals the averaged estimation variance of and its convergence behavior, which is computationally advantageous if one needs to observe the convergence behavior and choose a candidate in the set that has the best fitness value (in that the last corresponding to a stable fixed point may not have the best fitness). ii) The estimated in Algo. 1 reflects the MAP estimation, while AMP with needs additional steps to scale back. Regularizing AMP with can be detrimental in a finite-accuracy processor. For instance, in the single-precision floating-point arithmetic defined in the IEEE-754 standard, if in is operating in a scaled range where, e.g., only bits of the mantissa are effective, then the other bits in the mantissa are wasted.
IV-E Associating discrete priors
Algo. 1 needs to work with specifically designed threshold functions. From Secs. III-A and III-B, a dominant portion of “errors” would be corrected if we impose a ternary prior for . We present its threshold functions and in Lem. 1, which can be proved after a simple algebra exercise. These threshold functions have closed forms and are easy to compute. The AMP algorithm using (51) and (52) due to ternary priors is referred to as AMPT.
Let , with , . Then the conditional mean and conditional variance of on are:
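A minimal sketch of ternary threshold functions and their use inside an AMP loop is given below. It makes several assumptions that are not taken from this paper: the prior puts mass `eps / 2` on each of -1 and +1 and `1 - eps` on 0, the observation is `u = x + w` with Gaussian `w` of variance `sigma2`, and the loop is the textbook Bayesian AMP for a matrix with i.i.d. Gaussian entries and unit-norm columns rather than Algo. 1. The posterior mean and variance follow from Bayes' rule and are evaluated in the log domain for numerical stability; all names are hypothetical.

```python
import numpy as np

def ternary_denoiser(u, sigma2, eps):
    """Posterior mean and variance of x in {-1, 0, 1} given u = x + N(0, sigma2)."""
    u = np.asarray(u, dtype=float)
    # Log posterior weights, up to the common term -u^2 / (2 * sigma2).
    log_w = np.stack([
        np.log(eps / 2) + (-u - 0.5) / sigma2,   # x = -1
        np.full_like(u, np.log(1 - eps)),        # x =  0
        np.log(eps / 2) + (u - 0.5) / sigma2,    # x = +1
    ])
    log_w -= log_w.max(axis=0)                   # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum(axis=0)
    mean = w[2] - w[0]
    var = (w[2] + w[0]) - mean ** 2
    return mean, var

def amp_ternary(A, y, eps, n_iter=30):
    """Textbook Bayesian AMP with the ternary denoiser (assumed setting)."""
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()
    for _ in range(n_iter):
        tau2 = max(z @ z / m, 1e-12)             # effective noise variance estimate
        r = x + A.T @ z                          # pseudo-observation of x
        x, var = ternary_denoiser(r, tau2, eps)
        # Onsager correction: the denoiser derivative equals var / tau2.
        z = y - A @ x + z * (n / m) * np.mean(var) / tau2
    return x

# Toy experiment on an i.i.d. Gaussian matrix (not a reduced lattice basis).
rng = np.random.default_rng(0)
m, n, eps = 300, 400, 0.1
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = rng.choice([-1.0, 0.0, 1.0], size=n, p=[eps / 2, 1 - eps, eps / 2])
y = A @ x_true + 0.01 * rng.standard_normal(m)
x_hat = np.rint(amp_ternary(A, y, eps))
error_rate = np.mean(x_hat != x_true)
```

Working with log weights keeps the denoiser finite when the effective noise variance becomes very small in later iterations, where a direct evaluation of the exponentials would overflow.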