Hybrid Vector Perturbation Precoding: The Blessing of Approximate Message Passing


Shanxiang Lyu and Cong Ling,  S. Lyu and C. Ling are with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, United Kingdom (e-mail: s.lyu14@imperial.ac.uk, cling@ieee.org).
Abstract

Vector perturbation (VP) precoding is a promising technique for multiuser communication systems operating in the downlink. In this work, we introduce a hybrid framework to improve the performance of lattice reduction (LR) aided (LRA) VP. Firstly, we perform a simple precoding using zero forcing (ZF) or successive interference cancellation (SIC) based on a reduced lattice basis. The signal space after LR-ZF or LR-SIC precoding can be shown to be bounded to a small range, and LR guarantees sufficient orthogonality of the lattice basis; together, these two properties pave the way for the subsequent application of an approximate message passing (AMP) algorithm, which further boosts the performance of any suboptimal precoder. Our work shows that the AMP algorithm from compressed sensing can be beneficial for a lattice decoding problem whose signal constraint lies in $\mathbb{Z}^n$ and whose input lattice basis need not have i.i.d. Gaussian entries. Numerical results show that the developed hybrid scheme provides performance enhancement with a negligible increase in complexity.

Vector perturbation, lattice reduction, approximate message passing

I Introduction

The broadband mobile internet of the next generation is expected to deliver high-volume data to a large number of users simultaneously. To meet this demand in the broadcast network, it is desirable to precode the transmit symbols according to the channel state information (CSI) with improved time efficiency while retaining reliability. It has been shown that plain channel inversion performs poorly at all signal-to-noise ratios (SNRs), and further regularization cannot improve the performance substantially. In [1, 2], the authors proposed a precoding scheme called vector perturbation (VP), based on Tomlinson-Harashima precoding, which modifies the transmitted data by modulo-lattice operations; the scheme has been shown to achieve near-sum-capacity of the system without requiring explicit dirty-paper techniques.

The optimization target in a VP problem represents a closest vector problem (CVP) from a lattice perspective, which has been proved NP-complete by a reduction from the decision version of CVP [3]. Therefore, the sphere decoding technique [4] adopted in [1, 2] (referred to as sphere precoding) is computationally prohibitive for large-scale systems. This hardness especially looms in the VP precoding problem because there is no prior on the distance from a target to the lattice, and the lattice bases in VP are not Gaussian random, so that Hassibi's expected complexity analysis [5] no longer suits this setting. The complexity issue is indeed one of the three main challenges associated with VP; the other two concern its power scaling factor and large signal space [6, 7].

To bypass the complexity issue of sphere precoding, much work has been done in recent years to explore low-complexity CVP algorithms in multiuser (MU) multiple-input multiple-output (MIMO) communications, e.g., cf. [6, 8, 9, 10, 11, 12, 13]. The spirit of these results is to address CVP on a sub-lattice or to impose a constraint on the signal space of sphere precoding. For example, in [11], the authors proposed to instead solve a CVP over a selective sub-basis of smaller dimension, where the associated complexity of VP depends on the size of the new basis. As for the sparse vector perturbation technique in [13], it also belongs to the class of selective vector perturbation, in that it only selects two vectors. The reduction on the target vector in [13] is then applied to all basis vectors sequentially, which resembles a special case of sequential lattice reduction [14]. There is, however, no theoretical performance guarantee for these simplified methods, so we resort to a lattice reduction (LR) aided (LRA) precoding scheme [10, 15, 16], which has been shown to achieve full diversity [16]. In addition to their theoretical guarantees, LRA methods particularly suit slow fading channels, where the lattice basis is fixed over a large number of time slots and only the CVP targets change.

We investigate VP by using LRA methods in this work. LR has become quite popular in both MIMO precoding and decoding, especially after the pioneering work of Lenstra–Lenstra–Lovász (LLL) [17]. In recent years, in addition to the polynomial-time LLL algorithm, researchers have shown growing interest in strong lattice reduction algorithms such as Minkowski reduction [18, 19], Korkine–Zolotarev (KZ) reduction [20] and its boosted version [21]. The performance of LRA precoding is not well understood except in [10, 16], so our primary motivation is to investigate how far LRA methods can go, especially with the blessing of algorithms from compressed sensing.

We propose to use a message passing algorithm to explore the vicinity of sub-optimal solutions under the LRA framework. The approximate message passing (AMP) algorithm was initially proposed by Donoho, Montanari and Maleki in [22, 23, 24] to solve the least-absolute shrinkage and selection operator (Lasso) problem in compressed sensing, with much lower complexity than previous benchmark algorithms. Researchers have been adopting message passing algorithms to solve MIMO detection problems [25, 26, 27] with small constellation sizes, where the assumed Rayleigh fading channel helps to model the input lattice basis with i.i.d. Gaussian entries. It is noteworthy that directly applying AMP to MIMO detection cannot achieve full diversity, because a general discrete prior renders the AMP threshold function not Lipschitz continuous at high signal-to-noise ratio (SNR), so channel coding is often required (e.g., cf. [25]). If we want to embrace the low-complexity advantage of AMP, several practical issues must be overcome: i) the lattice basis in VP is not Gaussian random, nor is its dual, while [28] shows the entries have to be at least sub-Gaussian, and the generalized AMP (GAMP) [29, 30] only shows convergence of the algorithm with the aid of damping; ii) the problem size may not be infinitely large, so we should make AMP feasible in the non-asymptotic regime (say, a base station equipped with a moderate number of antennas serving as many users); iii) the constellation in AMP cannot be the full integer set $\mathbb{Z}$. Fortunately, AMP in conjunction with a reduced lattice basis can alleviate all these concerns.

The contributions of this paper are twofold:

1) After showing that boosted LLL/KZ (b-LLL/b-KZ) reduced bases are good for AMP, we analyze the energy efficiency of LRA precoding with zero-forcing (ZF) or successive-interference-cancellation (SIC) precoding. b-LLL/b-KZ suit compressed sensing scenarios because they yield bases with small coherence parameters, and an orthogonality metric in lattice theory indeed reflects the same goodness. The proven bound on LR-ZF/SIC not only shows that a sub-optimal estimator has a power scaling factor not far from that of sphere precoding, but also reveals that we can subtract the LR-ZF/SIC estimate to arrive at another estimation problem with a bounded constellation size. Since the bound on the constellation size is derived from a worst-case analysis, we also empirically show that a small constellation size suffices for our new problem.

2) For the first time, the AMP algorithm is successfully deployed to address a lattice decoding problem with an arbitrary input basis and an integer prior. A reduced lattice basis may still not satisfy the basis assumptions of AMP, so we derive a new algorithm based on the expositions of Montanari [24] and Maleki [31]. This derivation can be associated with a state evolution equation, where the impacts of lattice reduction and parameter selection are revealed explicitly. We propose to impose a ternary prior for AMP, so that the threshold functions have closed forms and the whole algorithm has relatively low complexity. This design helps to explore all the Voronoi cells adjacent to the LR-SIC/ZF one. Numerical results show a gain of a few dB after concatenating AMP with the previous LRA-ZF/SIC stage.

The rest of this paper is organized as follows. We review basic concepts about lattices and VP in Sec. II. The hybrid scheme is explained in Sec. III, including a demonstration of why the resulting problem has a finite constellation size. Sec. IV presents our AMP algorithm. Finally, we present simulation results and conclusions.

Notation: Matrices and column vectors are denoted by uppercase and lowercase boldface letters. $\lfloor \cdot \rceil$ denotes rounding, $|\cdot|$ denotes the absolute value, $\|\cdot\|$ denotes the Euclidean norm, and $(\cdot)^{\dagger}$ stands for the pseudoinverse. $\operatorname{span}(\mathbf{B})$ denotes the vector space spanned by $\mathbf{B}$. $\pi_{S}(\cdot)$ and $\pi_{S^{\perp}}(\cdot)$ denote the projection onto $S$ and onto the orthogonal complement of $S$, respectively. $\propto$ stands for equality up to a normalization constant. $[n]$ denotes $\{1, \ldots, n\}$. In the message passing algorithms, we take $a, b \in [m]$ to index the rows and $i, j \in [n]$ to index the columns of the basis matrix, respectively. We use the standard asymptotic notation as $n \to \infty$.

II Preliminaries

II-A Lattices

An $n$-dimensional lattice $\Lambda$ is a discrete additive subgroup of $\mathbb{R}^{n}$. A lattice with basis $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_n]$ can be represented by

$\Lambda(\mathbf{B}) = \{ \mathbf{B}\mathbf{u} : \mathbf{u} \in \mathbb{Z}^{n} \}.$

The $i$th successive minimum $\lambda_i(\Lambda)$ of $\Lambda$ is the smallest real number $r$ such that $\Lambda$ contains $i$ linearly independent vectors of length at most $r$:

$\lambda_i(\Lambda) = \inf \{ r : \dim(\operatorname{span}(\Lambda \cap \mathcal{B}(\mathbf{0}, r))) \geq i \},$

in which $\mathcal{B}(\mathbf{c}, r)$ denotes a ball centered at $\mathbf{c}$ with radius $r$.

It is necessary to distinguish whether a lattice basis is good or not. A good basis consists of lattice vectors that are short and nearly orthogonal, a property measured by the orthogonality defect (OD):

$\operatorname{OD}(\mathbf{B}) = \frac{\prod_{i=1}^{n} \|\mathbf{b}_i\|}{\sqrt{\det(\mathbf{B}^{\top} \mathbf{B})}}.$ (1)

We have $\operatorname{OD}(\mathbf{B}) \geq 1$ due to Hadamard's inequality.
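For concreteness, the orthogonality defect in (1) can be evaluated numerically; the following is a minimal Python/NumPy sketch (our illustration, not part of the original exposition):

import numpy as np

def orthogonality_defect(B):
    # OD(B) = (product of column norms) / lattice volume; OD >= 1 by
    # Hadamard's inequality, with equality iff the columns are orthogonal.
    col_norms = np.linalg.norm(B, axis=0)
    volume = np.sqrt(np.linalg.det(B.T @ B))
    return float(np.prod(col_norms) / volume)

# Example: a nearly orthogonal basis has OD close to 1.
B = np.array([[2.0, 0.1], [0.1, 2.0]])
print(orthogonality_defect(B))  # ~1.005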

Lattice reduction is the process of transforming a bad lattice basis into a good one. Depending on what type of goodness we pursue, and how much complexity we can afford, there are many well-developed reduction algorithms. Here we review the polynomial-time LLL [17] reduction and the exponential-time KZ [20] reduction, because most reduction algorithms can be interpreted as variants of these two.

We shall present the definitions of LLL/KZ reduction, whose algorithmic routines can be found in [32]. Let $\mathbf{R}$ be the R-factor of a QR decomposition of $\mathbf{B}$, with elements $r_{i,j}$, and let $\delta \in (1/4, 1]$ be a Lovász constant.

Definition 1 ([17]).

A basis $\mathbf{B}$ is called LLL reduced if it satisfies the size reduction conditions $|r_{i,j}| \leq \frac{1}{2} |r_{i,i}|$ for $1 \leq i < j \leq n$, and the Lovász conditions $\delta\, r_{i,i}^{2} \leq r_{i+1,i+1}^{2} + r_{i,i+1}^{2}$ for $1 \leq i \leq n-1$.

Define $\beta = 1/(\delta - 1/4)$. If $\mathbf{B}$ is LLL reduced, it has [17]

$\operatorname{OD}(\mathbf{B}) \leq \beta^{n(n-1)/4}.$ (2)
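As a sanity check of Definition 1 (our illustrative sketch; the tolerance and the value of δ are our choices), one can test both LLL conditions on the R-factor of a QR decomposition:

import numpy as np

def is_lll_reduced(B, delta=0.75, tol=1e-12):
    # Size reduction: |r_{i,j}| <= |r_{i,i}|/2 for all i < j.
    # Lovasz: delta * r_{i,i}^2 <= r_{i+1,i+1}^2 + r_{i,i+1}^2.
    R = np.linalg.qr(B)[1]
    n = B.shape[1]
    for j in range(n):
        for i in range(j):
            if abs(R[i, j]) > 0.5 * abs(R[i, i]) + tol:
                return False
    for i in range(n - 1):
        if delta * R[i, i] ** 2 > R[i + 1, i + 1] ** 2 + R[i, i + 1] ** 2 + tol:
            return False
    return True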
Definition 2 ([33]).

A basis $\mathbf{B}$ is called KZ reduced if it satisfies the size reduction conditions, and the projection conditions that $\pi_{\operatorname{span}(\mathbf{b}_1, \ldots, \mathbf{b}_{i-1})^{\perp}}(\mathbf{b}_i)$ is a shortest nonzero vector of the correspondingly projected lattice for $1 \leq i \leq n$.

If $\mathbf{B}$ is KZ reduced, it has [33]

$\frac{\|\mathbf{b}_i\|^{2}}{\lambda_i^{2}(\Lambda)} \leq \frac{i+3}{4}, \quad 1 \leq i \leq n.$ (3)

It has been shown in [21] that the boosted version of LLL/KZ can produce shorter and more orthogonal basis vectors.

Definition 3 ([21]).

A basis $\mathbf{B}$ is called boosted LLL (b-LLL) reduced if it satisfies the diagonal reduction conditions $\delta\, r_{i,i}^{2} \leq r_{i+1,i+1}^{2} + r_{i,i+1}^{2}$ for $1 \leq i \leq n-1$, and every basis vector is reduced by an approximate CVP oracle with a prescribed list size, along with a rejection operation.

Although the definition of b-LLL ensures that it performs no worse than LLL, only the same bound on OD as in (2) has been proved [21].

Definition 4 ([21]).

A basis $\mathbf{B}$ is called boosted KZ (b-KZ) reduced if it satisfies the same projection conditions as KZ, and the length reduction conditions that no $\mathbf{b}_i$ can be shortened by subtracting $Q(\mathbf{b}_i)$, where $Q(\cdot)$ is a lattice quantizer for the sublattice generated by the preceding basis vectors.

If $\mathbf{B}$ is b-KZ reduced, it satisfies

(4)

II-B Vector Perturbation and Optimization

Vector perturbation is a precoding technique that aims to minimize the transmit power associated with the transmission of a certain data vector [1, 2]. Assume the MIMO system is equipped with $n_T$ transmit antennas and serves $K$ individual users, each having a single receive antenna. The observed signals at users $1$ to $K$ can be collectively expressed as

$\mathbf{y} = \mathbf{H} \mathbf{s} + \mathbf{w},$ (5)

where $\mathbf{H} \in \mathbb{C}^{K \times n_T}$ denotes a channel matrix whose entries admit $\mathcal{CN}(0, 1)$, $\mathbf{s}$ is the transmitted signal, and $\mathbf{w}$ is an additive Gaussian noise vector.

With perfect channel knowledge at the transmitter's side, the transmitted signal is designed to be a scaled channel inversion of a perturbed data vector:

$\mathbf{s} = \frac{1}{\sqrt{\gamma}} \mathbf{H}^{\dagger} (\mathbf{x} + \tau \mathbf{u}),$ (6)

where $\mathbf{u}$ is an integer vector to be optimized, $\mathbf{x}$ is the symbol vector, and $\tau$ is a positive modulo base chosen larger than the constellation span. Any quadrature amplitude modulation (QAM) constellation can be transformed to this format after adjusting (6), which means the scheme supports an equivalent QAM size determined by the normalization.

Assume the transmitted signal has unit power, where $\gamma = \|\mathbf{H}^{\dagger}(\mathbf{x} + \tau \mathbf{u})\|^{2}$ is a normalization factor. Then the received data at the users can be represented as

$\mathbf{y} = \frac{1}{\sqrt{\gamma}} (\mathbf{x} + \tau \mathbf{u}) + \mathbf{w}.$ (7)

Let $\tilde{\mathbf{y}} = \sqrt{\gamma}\, \mathbf{y}$ and $\tilde{\mathbf{w}} = \sqrt{\gamma}\, \mathbf{w}$; since the perturbation $\tau \mathbf{u}$ vanishes under the modulo-$\tau$ operation, the receive equation can be transformed to

$\tilde{\mathbf{y}} \bmod \tau = (\mathbf{x} + \tilde{\mathbf{w}}) \bmod \tau.$ (8)

From (8), we can see that if the effective noise $\tilde{\mathbf{w}}$ stays within the decision regions of the constellation, then $\mathbf{x}$ can be faithfully recovered.
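For intuition, the receiver-side modulo operation in (8) can be sketched as follows (our illustration; the real-valued model and the value of τ are assumptions):

import numpy as np

def modulo_receive(y_scaled, tau):
    # Fold into the fundamental interval [-tau/2, tau/2); the perturbation
    # tau*u added at the transmitter vanishes under this folding.
    return y_scaled - tau * np.round(y_scaled / tau)

# A symbol x = 0.5 perturbed by tau*u with u = 3 survives the modulo:
tau = 2.0
received = 0.5 + tau * 3 + 0.01   # scaled observation with small noise
print(modulo_receive(np.array([received]), tau))  # [0.51] -> x recovered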

To decrease the decoding error probability, which is dominated by the power scaling factor $\gamma$, we have to address the following optimization problem at the transmitter:

$\mathbf{u}^{\star} = \arg\min_{\mathbf{u}} \| \mathbf{H}^{\dagger} (\mathbf{x} + \tau \mathbf{u}) \|^{2}.$ (9)

Define $\mathbf{B} = \tau \mathbf{H}^{\dagger}$ and $\mathbf{t} = -\mathbf{H}^{\dagger} \mathbf{x}$ (in the equivalent real-valued model); then (9) represents a closest vector problem (CVP) on the lattice $\Lambda(\mathbf{B})$:

$\mathbf{u}^{\star} = \arg\min_{\mathbf{u} \in \mathbb{Z}^{n}} \| \mathbf{B} \mathbf{u} - \mathbf{t} \|^{2}.$ (10)

This CVP is different from the CVP in MIMO detection [34] because the distance distribution from $\mathbf{t}$ to the lattice $\Lambda(\mathbf{B})$ is not known, the lattice basis generally does not admit a Gaussian distribution, and the optimization domain of $\mathbf{u}$ is $\mathbb{Z}^{n}$ rather than a finite constellation.
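The simplest approximate solver for (10) is zero-forcing, i.e., Babai's rounding: express the target in the basis coordinates and round componentwise. A minimal sketch (our illustration) follows; its quality depends heavily on how well-reduced the basis is, which is exactly why lattice reduction enters the picture:

import numpy as np

def babai_rounding(B, t):
    # Approximate CVP: round the coordinates of t in the basis B.
    # On a reduced basis this is the LR-ZF precoder used in Sec. III.
    return np.round(np.linalg.solve(B, t)).astype(int)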

III The hybrid scheme

Our hybrid scheme to solve the CVP in (10) is described as follows; the rationale is illustrated in Fig. 1.

Fig. 1: Exploring the vicinity of a good candidate, whose ZF parallelepiped is the cyan cube. After updating the target vector, the subsequent optimization locates all the blue lattice points inside the white cubes (some cubes are not plotted to avoid shading).
  1. Apply lattice reduction to a not necessarily good input basis $\mathbf{B}$ of $\Lambda$ to get $\tilde{\mathbf{B}} = \mathbf{B}\mathbf{U}$ with $\mathbf{U}$ unimodular, and use this new basis to obtain a sub-optimal candidate $\hat{\mathbf{u}}$, e.g., by LR-ZF or LR-SIC.

  2. Let $\tilde{\mathbf{t}} = \mathbf{t} - \tilde{\mathbf{B}}\hat{\mathbf{u}}$ and define a finite constraint set $\mathcal{D}^{n}$ for the residual coefficients.

  3. Use our AMP algorithm to solve:

    $\hat{\mathbf{d}} = \arg\min_{\mathbf{d} \in \mathcal{D}^{n}} \| \tilde{\mathbf{B}} \mathbf{d} - \tilde{\mathbf{t}} \|^{2}.$ (11)

  4. Return $\mathbf{U}(\hat{\mathbf{u}} + \hat{\mathbf{d}})$; a compact sketch of this pipeline is given after this list.
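As a compact illustration of the four steps above (our sketch only: reduce_fn and amp_fn are hypothetical stand-ins for a b-LLL/b-KZ reducer and for Algorithm 1, and the symbol names are ours):

import numpy as np

def hybrid_vp(B, t, reduce_fn, amp_fn, D=(-1, 0, 1)):
    # Step 1: lattice reduction, B_red = B @ U with U unimodular.
    B_red, U = reduce_fn(B)
    # Step 1 (cont.): sub-optimal LR-ZF candidate by Babai rounding.
    u_hat = np.round(np.linalg.solve(B_red, t))
    # Step 2: residual target for the finite constraint set D.
    t_res = t - B_red @ u_hat
    # Step 3: AMP refinement restricted to entries in D.
    d_hat = amp_fn(B_red, t_res, D)
    # Step 4: map the coefficients back to the original basis.
    return U @ (u_hat + d_hat)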

In order to show the hybrid scheme is valid, we answer the following three questions in this paper:

  • To make the reduced basis good for AMP, which lattice reduction algorithm should we adopt? Answer: We should use b-LLL/b-KZ. These algorithms excel in the “short and orthogonal” metrics; see Appx. A for more details.

  • Is there any theoretical/practical guarantee for transforming (10) to (11)? Answer: See Secs. III-A and III-B.

  • The AMP algorithms in [22, 23, 24] assume at least that the entries of the measurement matrix are sub-Gaussian with suitably normalized variance. Can we tune an AMP algorithm that suits problem (11), preferably with simple routines and closed-form expressions? Answer: See Sec. IV.

III-A The bound of $\mathcal{D}$

In the application to precoding, we show in this section that the estimation range is bounded after LRA precoding. We first analyze the energy efficiency $\rho$ of b-LLL/b-KZ aided ZF/SIC (in [10], $\rho$ is referred to as a proximity factor in the CVP context; to avoid confusion with the proximity factor in [34], we simply call it “energy efficiency”), and then address the bound for $\mathcal{D}$ based on $\rho$.

Definition 5.

The energy efficiency $\rho$ of an algorithm providing $\hat{\mathbf{u}}$ is the smallest constant in the constraint

$\| \mathbf{B} \hat{\mathbf{u}} - \mathbf{t} \|^{2} \leq \rho\, \| \mathbf{B} \mathbf{u}^{\star} - \mathbf{t} \|^{2},$ (12)

where $\mathbf{u}^{\star}$ is the exact CVP solution, and we say this algorithm solves $\rho$-CVP.

The practical implication of $\rho$ is to describe how far a suboptimal perturbation is from an optimal one.
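For small dimensions, $\rho$ can be estimated empirically by comparing a suboptimal precoder against (near-)exhaustive CVP, as in the following sketch (our illustration; the enumeration radius is an assumption that must exceed the true offset for the estimate to be exact):

import itertools
import numpy as np

def empirical_rho(B, targets, radius=2):
    # Worst-case ratio of Babai-rounding distance to brute-force CVP
    # distance over a set of targets; feasible only for tiny n.
    n, worst = B.shape[1], 1.0
    for t in targets:
        u_zf = np.round(np.linalg.solve(B, t))
        d_zf = np.linalg.norm(B @ u_zf - t) ** 2
        d_opt = min(
            np.linalg.norm(B @ (u_zf + np.array(dd)) - t) ** 2
            for dd in itertools.product(range(-radius, radius + 1), repeat=n)
        )
        if d_opt > 0:
            worst = max(worst, d_zf / d_opt)
    return worst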

Theorem 1.

For the serial SIC algorithm (readers unfamiliar with SIC routines may consult [34]), if the lattice basis is reduced by b-LLL, then

(13)

and if the basis is reduced by b-KZ, then

(14)
Proof:

Regarding b-LLL/b-KZ, their lower bounds on the diagonal entries $r_{i,i}$ are no worse than those of LLL/KZ, so we can use the $\rho$ of classic LLL/KZ whenever it exists. Eq. (13) is thus adapted from the LLL result in [10, Lem. 1]. Since no result about the $\rho$ of KZ is known, we prove a sharp bound for b-KZ in Appx. B, where the technique involved is essentially due to [35]. ∎

Theorem 2.

For the parallel ZF algorithm, if the lattice basis is reduced by b-LLL, then

(15)

and if the basis is reduced by b-KZ, then

(16)
Proof:

See Appx. C. ∎

Remark 1.

Unfortunately, (15) is no better than the bound for LLL in [10, Lem. 1]. The hardness in the analysis lies in incorporating the effect of the length reduction of b-LLL/b-KZ, while Thm. 2 only employs their projection conditions or diagonal reduction conditions. Since our empirical study strongly suggests using b-LLL/b-KZ for the ZF estimator (see, e.g., Figs. 6 and 9), we need Thm. 2 to claim their bounds on $\rho$.

$\rho$ is related to the bound of $\mathcal{D}$ in the following way. Let $d_{\max}$ denote the symbol bound on the entries of $\hat{\mathbf{d}}$ in (11). For a reduced basis, the basis lengths $\|\mathbf{b}_i\|$ and the diagonal entries $r_{i,i}$ obey bounds that depend only on the reduction; the concrete values can be found in [21].

The optimal distance $\| \mathbf{B} \mathbf{u}^{\star} - \mathbf{t} \|$ is bounded by the covering radius of $\Lambda(\mathbf{B})$, so that from the triangle inequality the residual distance after LR-ZF/SIC is bounded as well. After a unitary transform (the QR decomposition), the problem takes the upper-triangular form that is reminiscent of evaluating an inequality from sphere decoding: in the $n$th layer, $|\hat{d}_n|$ is bounded by the residual radius divided by $r_{n,n}$; similarly, in the $(n-1)$th layer, $|\hat{d}_{n-1}|$ is bounded once $\hat{d}_n$ is fixed. By induction, we can obtain upper bounds on all the $|\hat{d}_i|$. The concrete values of these bounds are easily evaluated by plugging in the values of $\rho$, the covering radius, and the length bounds of the chosen LR aided ZF/SIC algorithms.

The theoretical bound on $\mathcal{D}$ represents a worst-case scenario. Although we have proved the existence of these upper bounds, it is not necessary to evaluate them exactly because, in practice, LR aided ZF/SIC are quite close to the optimal precoder.

III-B Empirical $\mathcal{D}$

In Fig. 2, we plot the maximal error of the ZF estimator over a large number of Monte Carlo runs. Four groups of simulations are tested, combining two system sizes with two constellation sizes. Among the four histograms in Fig. 2, we can see that small symbol errors dominate the worst-case behavior, and the probability of correct decoding decreases when the system size increases.

Fig. 2: The error histogram of ZF. The x-axis is the maximal symbol error.

With similar settings, we have also plotted histograms of the SIC estimator in Fig. 3. The probability of correct decoding is high for the smaller system size and slightly decreases for the larger one. The maximal symbol errors are most likely of magnitude one, and there exists a small probability of larger errors for the larger system size. The change of constellation size has almost no impact on these histograms.

Fig. 3: The error histogram of SIC. The x-axis is the maximal symbol error.

III-C Are things ready for AMP?

Regarding the constellation of the residual, the previous discussions have demonstrated that the error of the ZF/SIC estimator is bounded by a function of the system dimension and some inherent lattice metrics. This means we are not facing an infinite lattice decoding problem in Eq. (11), whence the application of AMP becomes possible. Moreover, the bound on $\mathcal{D}$ can be made very small when designing our AMP algorithm.

Regarding the channel matrix, it is short and nearly orthogonal after lattice reduction. If the basis satisfied a sub-Gaussian assumption, one could adopt the well-developed AMP [22, 23, 24] or GAMP [29, 30] algorithms to solve our problem in Eq. (11). Indeed, the rigorous proofs [28, 36] showing that the AMP algorithm can track the symbol-wise estimation errors rely on this foundation. We assume a reduced basis can approximately enjoy the blessing of this sub-Gaussian assumption. Rigorously proving this equivalence seems technically complicated, but our simulation results confirm the plausibility of modeling reduced bases as sub-Gaussian. To improve the accuracy of this approximation, we slightly modify the AMP algorithm to make it work with column-wise i.i.d. basis entries. To be precise, even in the cases where the successive minima constitute a basis, the entries of the channel matrix cannot be uniformly normalized; this motivates us to adjust the classic AMP algorithm in [22, 23, 24], aiming at the simplest possible routines.
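A minimal sketch of the column-wise treatment just described (our illustration): scale each column of the reduced basis to unit norm and absorb the scales into the prior of the corresponding coefficient:

import numpy as np

def normalize_columns(B_red):
    # B_red @ d == A @ (scales * d), so the per-column scales can be
    # absorbed into the prior of d rather than into the matrix.
    scales = np.linalg.norm(B_red, axis=0)
    A = B_red / scales
    return A, scales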

As for the distribution of the effective noise in (11), it is not known a priori. We can equip it with a Gaussian distribution whose radius is of the order of the residual distance; this choice is crucial for obtaining a non-informative likelihood function.

IV AMP algorithm for Eq. (11)

By combining the non-informative likelihood function with the signal prior, we obtain a maximum-a-posteriori (MAP) function for Bayesian estimation:

(17)

where the prior is to be designed in Subsection IV-E. The MAP function is not discrete, so the measurable events are extended from a power set (e.g., as in message passing decoding of LDPC codes [37]) to a σ-field in the Lebesgue measure space.

The simplified belief propagation (BP) [38] in Sec. IV-A is folklore and can be found in the pioneering literature [22, 23, 24, 31]; it is nevertheless included to help understand the derivations in the subsequent subsections. After deriving our AMP algorithm, we will present the threshold functions of certain priors and characterize the symbol-wise estimation errors in Thm. 3.

IV-A Simplified BP

In the BP algorithm, there are factor nodes and variable nodes, indexed by $a \in [m]$ and $i \in [n]$, respectively. The message from variable $i$ to factor $a$ is given by

(18)

where the message from factor $a$ to variable $i$ is

(19)

These messages are impractical to evaluate in the Lebesgue measure space, and are thus often simplified by various techniques. We attempt to remove the complexity from an expectation propagation [39] perspective. Suppose the message in Eq. (19) is approximated by a Gaussian function with a certain mean and variance; then

(20)

By substituting Eq. (20) into Eq. (18), we have

(21)

where

(22)
(23)

In the other direction, we approximate the messages with Gaussian functions by matching their first- and second-order moments through the following constraints:

(24)
(25)
(26)

where $\eta(\cdot)$ and $\eta'(\cdot)$ are referred to as threshold functions. From Eq. (24), inferring the estimate and its variance by using the MAP principle yields:

(27)
(28)

By plugging the approximation of Eq. (24) into Eq. (19), which becomes a multidimensional Gaussian expectation with respect to the underlying probability measure, the integration over Gaussian functions becomes

(29)

Comparing Eq. (29) with the previously defined mean and variance, we have

(30)
(31)

Thus far, Eqs. (22), (23), (27), (28), (30), and (31) define a simplified version of BP, where the tracking of functions in Eqs. (18) and (19) has been replaced by the tracking of scalars.

Remark 2.

Our derivation equips the messages with a density function that can be fully described by its first and second moments, and one obtains their moment equations when passing back. In [31, Lem. 5.3.1], Maleki applied the Berry–Esseen theorem to prove that the Gaussian approximation of the factor-to-variable message is tight. Although our variance expression looks different from his, they are indeed equivalent under a suitable choice of the noise variance. Moreover, [31, Lem. 5.5.4] also justifies the correctness of the other side of our approximation.

IV-B Reaching scalars

For a reduced lattice basis, the variance of its entries can be estimated, so one can employ this knowledge to further simplify the algorithm in Sec. IV-A. Here we define

(32)

By equipping all the entry variances with an equal magnitude, as well as invoking the law of large numbers, it yields

(33)
(34)

For the moment, we can expand the local estimates around their means, so the techniques in [24, 23] can be employed. The crux of these transformations is to neglect elements whose amplitudes are asymptotically negligible. Subsequently, Eqs. (32) and (33) become

(35)
(36)

In (35), terms sharing the same index are mutually dependent while others are not, so that

(37)
(38)

Further expanding the r.h.s. of (36) with a first-order Taylor expansion, in which

(39)

then it yields

Distinguishing the terms that depend on the index leads to

(40)
(41)

Then we substitute (38) into (40), and (41) into (37), to obtain

(42)
(43)

where

(44)

IV-C Further simplification

From (42), the estimated variance for each coordinate now becomes

(45)

As the dimension grows large, (31) tells

(46)

According to (46), we denote the common variance as $\gamma$; then the whole algorithm can be described by the following four steps:

(47)
(48)
(49)
(50)

The iterations in (47) to (50) can be summarized in Algo. 1.

Input: Lattice basis $\tilde{\mathbf{B}} \in \mathbb{R}^{m \times n}$, target $\tilde{\mathbf{t}}$, number of iterations $T$, threshold functions $\eta$ and $\eta'$, the minimum symbol error $\epsilon$.
Output: estimated coefficient $\hat{\mathbf{d}}$.
1 $\mathbf{d}^{0} = \mathbf{0}$, $\mathbf{z}^{0} = \tilde{\mathbf{t}}$, $t = 1$;
2 $\gamma^{0} = \|\tilde{\mathbf{t}}\|^{2}/m$;
3 for $t = 1, \ldots, T$ do
4       $\mathbf{r}^{t} = \mathbf{d}^{t-1} + \tilde{\mathbf{B}}^{\top} \mathbf{z}^{t-1}$;
5       $\mathbf{d}^{t} = \eta(\mathbf{r}^{t}; \gamma^{t-1})$, and update $\gamma^{t}$ from $\eta'$;
6       $\mathbf{z}^{t} = \tilde{\mathbf{t}} - \tilde{\mathbf{B}} \mathbf{d}^{t} + \frac{1}{m} \mathbf{z}^{t-1} \sum_{i=1}^{n} \eta'(r_{i}^{t}; \gamma^{t-1})$;
7
Algorithm 1 The AMP algorithm.
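The following Python/NumPy sketch mirrors our reconstruction of Algorithm 1 (the initialization, the variance update, and all variable names are our assumptions rather than verbatim routines):

import numpy as np

def amp(B, t, denoiser, T=30, sigma2=1e-3):
    # B: (m, n) reduced basis with roughly unit-norm columns;
    # t: residual target; denoiser: the threshold functions (eta, eta'),
    # returning the posterior mean and variance under the chosen prior.
    m, n = B.shape
    d = np.zeros(n)
    z = t.copy()
    gamma = float(t @ t) / m                 # initial effective noise level
    for _ in range(T):
        r = d + B.T @ z                      # pseudo-data
        d, d_var = denoiser(r, gamma)        # componentwise threshold step
        eta_prime = d_var / gamma            # derivative of the MMSE denoiser
        z = t - B @ d + z * (n / m) * eta_prime.mean()  # Onsager correction
        gamma = sigma2 + d_var.sum() / m     # state-evolution style update
    return np.round(d).astype(int)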

IV-D Discussions

After lattice reduction and transforming the constraint to a finite set, we recognize that the AMP/GAMP algorithms in [26, 22, 29] could also be employed for our problem after further regularizing the channels (i.e., normalizing the basis and updating the prior accordingly). However, Algo. 1 still gives valuable insights in the following aspects:

i) We can explicitly study the impact of the channel power on the state evolution equation based on our derivation, as shown in Sec. IV-F. Moreover, $\gamma^{t}$ in Algo. 1 reveals the averaged estimation variance and its convergence behavior, which is computationally advantageous if one needs to observe the convergence behavior and choose the candidate in the iterate set that has the best fitness value (in that the last iterate, corresponding to a stable fixed point, may not have the best fitness). ii) The estimate in Algo. 1 reflects the MAP estimation directly, while AMP on a normalized basis needs additional steps to scale back. Such regularization can be detrimental on a finite-precision processor. For instance, in the single-precision floating-point arithmetic defined in the IEEE-754 standard, if the algorithm operates in a scaled range where only a few bits of the mantissa are effective, then the remaining bits of the mantissa are wasted.

IV-E Associating discrete priors

Algo. 1 needs to work with specifically designed threshold functions. From Secs. III-A and III-B, a dominant portion of the “errors” would be corrected if we impose a ternary prior on the entries of $\mathbf{d}$, i.e., each entry takes values in $\{-1, 0, 1\}$. We present the threshold functions $\eta$ and $\eta'$ in Lem. 1, which can be proved after a simple algebra exercise. These threshold functions have closed forms and are easy to compute. The AMP algorithm using (51) and (52) due to the ternary prior is referred to as AMPT.

Lemma 1.

Let $d$ admit the symmetric ternary prior with $\Pr(d = \pm 1) = p$ and $\Pr(d = 0) = 1 - 2p$, and let $r = d + w$ with $w \sim \mathcal{N}(0, \gamma)$. Then the conditional mean and conditional variance of $d$ given $r$ are:
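For concreteness, the closed-form threshold functions under the symmetric ternary prior above can be sketched as follows (our derivation of the standard posterior mean and variance, corresponding to the roles of (51) and (52); the prior weight p is a tunable assumption):

import numpy as np

def ternary_denoiser(r, gamma, p=0.1):
    # Posterior mean/variance of d in {-1, 0, +1} given r = d + N(0, gamma),
    # with prior P(d = +1) = P(d = -1) = p and P(d = 0) = 1 - 2p.
    w_pos = p * np.exp(-((r - 1) ** 2) / (2 * gamma))
    w_neg = p * np.exp(-((r + 1) ** 2) / (2 * gamma))
    w_zero = (1 - 2 * p) * np.exp(-(r ** 2) / (2 * gamma))
    norm = w_pos + w_neg + w_zero
    mean = (w_pos - w_neg) / norm             # E[d | r]
    var = (w_pos + w_neg) / norm - mean ** 2  # E[d^2 | r] - (E[d | r])^2
    return mean, var

# This plugs directly into the `amp` sketch above:
# d_hat = amp(B_red, t_res, ternary_denoiser)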