Low-Complexity Design of Generalized Block Diagonalization Precoding Algorithms for Multiuser MIMO Systems
Block diagonalization (BD) based precoding techniques are well-known linear transmit strategies for multiuser MIMO (MU-MIMO) systems. By employing BD-type precoding algorithms at the transmit side, the MU-MIMO broadcast channel is decomposed into multiple independent parallel single-user MIMO (SU-MIMO) channels, which achieve the maximum diversity order at high data rates. The main computational complexity of BD-type precoding algorithms comes from two singular value decomposition (SVD) operations, whose cost depends on the number of users and the dimensions of each user’s channel matrix. In this work, low-complexity precoding algorithms are proposed to reduce the computational complexity and improve the performance of BD-type precoding algorithms. We devise a strategy based on a common channel inversion technique, QR decompositions, and lattice reductions to decouple the MU-MIMO channel into equivalent SU-MIMO channels. Analytical and simulation results show that the proposed precoding algorithms achieve a sum-rate performance comparable to that of BD-type precoding algorithms, substantial bit error rate (BER) performance gains, and a simplified receiver structure, while requiring a much lower complexity.
Keke Zu, Rodrigo C. de Lamare, Senior Member, IEEE and Martin Haardt, Senior Member, IEEE
Part of this work has been submitted to the IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, USA, Oct. 2012.
Index Terms: Multiuser MIMO (MU-MIMO), block diagonalization (BD), regularized block diagonalization (RBD), low-complexity, lattice reduction (LR).
Multiple-input multiple-output (MIMO) systems have attracted considerable research effort in recent years because they can greatly increase the spectral efficiency of wireless communications [2, 3]. To meet the continuously growing data traffic, a downlink peak spectral efficiency of 30 bps/Hz and an uplink peak spectral efficiency of 15 bps/Hz are proposed in LTE-Advanced, and a configuration of up to eight transmit antennas for the downlink is suggested. A new amendment for the WLAN standard, IEEE 802.11ac, also recommends up to eight MIMO spatial streams. Configurations with dozens of antennas are now being considered. High-dimensional MIMO systems, or large MIMO systems, are very promising for the next generation of wireless communication systems due to their potential to improve rate and reliability dramatically. However, it is a challenge to design a precoding algorithm for high-dimensional MIMO systems that offers good overall performance and low computational complexity at the same time.
Unlike the received signal in single user MIMO (SU-MIMO) systems, the received signals of different users in multiuser MIMO (MU-MIMO) systems not only suffer from the noise and the inter-antenna interference but are also affected by the multiuser interference (MUI). Channel inversion based precoding or linear precoding algorithms such as zero forcing (ZF) and minimum mean squared error (MMSE) precoding  can still be used to cancel the MUI, but they result in a reduced throughput or require a higher power at the transmitter in the MU-MIMO scenarios. As a generalization of the ZF precoding algorithm, block diagonalization (BD) based precoding algorithms have been proposed in [11, 12] for MU-MIMO systems. However, BD based precoding algorithms only take the MUI into account and thus suffer a performance loss at low signal to noise ratios (SNRs) when the noise is the dominant factor. Therefore, a regularized block diagonalization (RBD) precoding algorithm which introduces a regularization factor to take the noise term into account has been proposed in . We term the BD and RBD based precoding schemes as BD-type precoding algorithms in this work for convenience.
The main steps of the BD-type precoding algorithms are two SVD operations, which need to be implemented for each user. Therefore, the computational complexity of the BD-type precoding algorithms depends on the number of users and the dimensions of each user’s channel matrix. For MU-MIMO systems with a large number of users and multiple receive antennas, this could result in a considerable computational cost. Another distinctive aspect of the BD-type precoding algorithms is that they need a decoding matrix obtained from the second SVD operation to orthogonalize each user’s streams. The requirement of this decoding matrix brings extra control overhead or computational complexity .
Recent work on BD-type precoding algorithms has focused on how to equivalently implement the BD-type precoding algorithms with less computational complexity. A low-complexity generalized ZF channel inversion (GZI) method has been proposed in  to equivalently implement the first SVD operation of the original BD precoding, and a generalized MMSE channel inversion (GMI) method is also developed in  for the original RBD precoding. In  the first SVD operation of the RBD precoding is replaced with a less complex QR decomposition . We term the work in  as GMI-type precoding and the work in  as QR/SVD-type precoding. For the second SVD operation, however, both the GMI-type and QR/SVD-type precoding schemes employ it in a similar way as the conventional BD-type precoding algorithms to parallelize each user’s streams. Therefore, the second SVD operation needs to be implemented multiple times and the decoding matrix for the effective channel still needs to be known or estimated at the receiver of each user for the GMI-type or QR/SVD-type precoding algorithms.
The GMI-type and QR/SVD-type techniques are solely low-complexity equivalent implementations of the BD-type precoding algorithms. As an improvement of the BD-type precoding algorithms, low-complexity lattice reduction-aided RBD (LC-RBD-LR) type precoding algorithms have been proposed in [19, 20] based on the QR decomposition scheme. The LC-RBD-LR-type precoding algorithms achieve not only much lower complexity but also considerable BER gains. However, the QR decomposition in LC-RBD-LR-type precoding algorithms still needs to be implemented for each user, which could result in a high complexity for large MIMO systems.
A new category of low-complexity high performance precoding algorithms based on the channel inversion scheme is proposed in this work. A simplified GMI (S-GMI) precoding scheme which employs a common channel inversion for all users is developed first. Equivalent parallel SU-MIMO channels are obtained from the S-GMI precoding process. Then, these effective channels are transformed into the lattice space by utilizing the lattice reduction (LR) technique , whose complexity is dictated by a QR decomposition. Linear precoding strategies are applied in the lattice space to parallelize each user’s streams. Finally, the proposed lattice reduction-aided simplified GMI (LR-S-GMI) precoding algorithms are obtained. According to the specific linear precoding constraint used, the proposed LR-S-GMI-type precoding algorithms are categorized as LR-S-GMI-ZF and LR-S-GMI-MMSE, respectively.
The algorithm structure of the proposed LR-S-GMI-type precoding is different from the LC-RBD-LR-type precoding since the channel inversion is only implemented once for all users, while the QR decomposition needs to be implemented multiple times in LC-RBD-LR-type precoding. Therefore, the computational complexity can be reduced considerably by the proposed LR-S-GMI-type precoding. A comprehensive mathematical analysis is developed to analyze and predict the performance of the proposed LR-S-GMI-type precoding algorithms. The simulation results verify that the proposed LR-S-GMI-type precoding algorithms have the lowest computational complexity compared to BD-type [11, 13], GMI-type , QR/SVD-type  and LC-RBD-LR-type  precoding algorithms, a sum-rate performance comparable to that of BD-type precoding algorithms, and substantial BER performance gains over prior art.
The main contributions of the work are summarized below:
A simplified GMI (S-GMI) precoding is developed in this work as an improvement of the original RBD in . A mathematical analysis is given to show that the S-GMI has a better BER performance and much less complexity than that of RBD, which is a clear difference compared to the GMI in  which only provides an equivalent implementation of RBD.
A new category of low-complexity high-performance LR-S-GMI-type precoding algorithms is proposed for MU-MIMO systems based on a channel inversion technique, QR decompositions, and lattice reductions.
The BD-type precoding algorithms are systematically analyzed and summarized. We show that the computational complexity of the BD-type precoding depends on the number of users and the system dimensions.
A comprehensive performance analysis is carried out in terms of BER performance, achievable sum-rate, and computational complexity.
A simulation study of the proposed algorithms under imperfect channel situations is also conducted, which completes this paper.
The proposed and existing precoding techniques are all performed with the help of downlink channel state information (CSI). The assumption that full CSI is available at the transmit side is valid in time-division duplex (TDD) systems because the uplink and downlink share the same frequency band. For frequency-division duplex (FDD) systems, however, the CSI needs to be estimated at the receiver and fed back to the transmitter.
This paper is organized as follows. The system model is given in Section II. A brief review of the BD-type precoding algorithms is presented in Section III. The proposed LR-S-GMI-type precoding algorithms are described in detail in Section IV and the performance analysis is developed in Section V. Simulation results and conclusions are displayed in Section VI and Section VII, respectively.
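Notation: Matrices and vectors are denoted by upper and lowercase boldface letters, and the transpose, Hermitian transpose, inverse, and pseudo-inverse of a matrix $\mathbf{B}$ are described by $\mathbf{B}^T$, $\mathbf{B}^H$, $\mathbf{B}^{-1}$, and $\mathbf{B}^{\dagger}$, respectively. The trace, determinant, Frobenius norm, and round function are denoted as $\mathrm{Tr}(\cdot)$, $\det(\cdot)$, $\|\cdot\|_F$, and $\lceil\cdot\rfloor$, respectively. The operator $\mathrm{diag}\{\cdot\}$ creates a block diagonal matrix with the given matrices on the main diagonal.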
II System Model
We consider an uncoded MU-MIMO downlink channel with $N_T$ transmit antennas at the base station (BS) and $N_i$ receive antennas at the $i$th user equipment (UE). With $K$ users in the system, the total number of receive antennas is $N_R = \sum_{i=1}^{K} N_i$. A block diagram of such a system is illustrated in Fig. 1.
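From the system model, the combined channel matrix $\mathbf{H}$ and the joint precoding matrix $\mathbf{P}$ are given by
$$\mathbf{H} = [\mathbf{H}_1^T \ \mathbf{H}_2^T \ \cdots \ \mathbf{H}_K^T]^T \in \mathbb{C}^{N_R \times N_T},$$
$$\mathbf{P} = [\mathbf{P}_1 \ \mathbf{P}_2 \ \cdots \ \mathbf{P}_K] \in \mathbb{C}^{N_T \times N_R},$$
where $\mathbf{H}_i \in \mathbb{C}^{N_i \times N_T}$ is the $i$th user’s channel matrix. The quantity $\mathbf{P}_i \in \mathbb{C}^{N_T \times N_i}$ is the $i$th user’s precoding matrix. We assume a flat fading MIMO channel and the received signal $\mathbf{y}_i \in \mathbb{C}^{N_i}$ at the $i$th user is given by
$$\mathbf{y}_i = \mathbf{H}_i \mathbf{x}_i + \mathbf{H}_i \sum_{j=1,\, j \neq i}^{K} \mathbf{x}_j + \mathbf{n}_i,$$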
where the quantity is the th user’s transmitted signal, and is the th user’s Gaussian noise with independent and identically distributed (i.i.d.) entries of zero mean and variance . Assuming that the average transmit power for each user is , then, the power constraint is imposed. We construct an unnormalized signal such that
where with being the transmit data vector and is the average energy of with . The physical meaning of dividing by the scalar is to make sure the average transmit power is still the same after the precoding process. With this normalization, obeys .
The received signal is weighted by the scalar to form the estimate
Note that it is necessary to cancel out at the receiver to get the correct amplitude of the desired signal part. The average energy is independent from the channel and the data, which means the receivers do not need to know the instantaneous CSI for the precoding techniques to work . As analyzed and illustrated in , however, the performance difference between the average and the instantaneous is very small. Therefore, we follow the strategy developed in  and  to assume the receivers need to know only but use instead of for simulation convenience as is simpler to compute. The simulation results represent the performance of either normalization method. In this case, we can replace (4) and (5) with the instantaneous in the simulations and employ
III Review of BD-type Precoding Algorithms
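The design of BD-type precoding algorithms is performed in two steps [11, 13]. The first precoding filter is used to completely eliminate the MUI (by BD) or to balance the MUI against the noise (by RBD), so that exact (by BD) or approximate (by RBD) parallel SU-MIMO channels are obtained. The second precoding filter is implemented to parallelize each user’s streams. Correspondingly, the precoding matrix $\mathbf{P}_i$ for the $i$th user can be rewritten in two parts as
$$\mathbf{P}_i = \mathbf{P}_i^a \mathbf{P}_i^b,$$
where $\mathbf{P}_i^a \in \mathbb{C}^{N_T \times L_i}$ and $\mathbf{P}_i^b \in \mathbb{C}^{L_i \times N_i}$. The parameter $L_i$ is dependent on the specific choice of the precoding algorithm. We exclude the $i$th user’s channel matrix and define $\tilde{\mathbf{H}}_i$ as
$$\tilde{\mathbf{H}}_i = [\mathbf{H}_1^T \ \cdots \ \mathbf{H}_{i-1}^T \ \mathbf{H}_{i+1}^T \ \cdots \ \mathbf{H}_K^T]^T, \qquad (8)$$
where $\tilde{\mathbf{H}}_i \in \mathbb{C}^{(N_R - N_i) \times N_T}$. Then, the interference generated to the other users is determined by $\tilde{\mathbf{H}}_i \mathbf{P}_i^a$.
In order to eliminate all the MUI, we impose the constraint that
$$\tilde{\mathbf{H}}_i \mathbf{P}_i^a = \mathbf{0}, \quad \forall\, i \in \{1, \ldots, K\}. \qquad (9)$$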
We term (9) as the BD constraint. Note that the BD constraint is actually an extension of the ZF constraint in  for MU-MIMO with multiple receive antennas. In order to take the noise term into account as well, an RBD constraint is developed in  and given by
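Assuming that the rank of $\tilde{\mathbf{H}}_i$, the stacked channel matrix of all users except the $i$th, is $\tilde{L}_i$, define the SVD of $\tilde{\mathbf{H}}_i$ as
$$\tilde{\mathbf{H}}_i = \tilde{\mathbf{U}}_i \tilde{\boldsymbol{\Sigma}}_i \tilde{\mathbf{V}}_i^H = \tilde{\mathbf{U}}_i \tilde{\boldsymbol{\Sigma}}_i [\tilde{\mathbf{V}}_i^{(1)} \ \tilde{\mathbf{V}}_i^{(0)}]^H, \qquad (11)$$
where $\tilde{\mathbf{U}}_i \in \mathbb{C}^{(N_R - N_i) \times (N_R - N_i)}$ and $\tilde{\mathbf{V}}_i \in \mathbb{C}^{N_T \times N_T}$ are unitary matrices. The diagonal matrix $\tilde{\boldsymbol{\Sigma}}_i \in \mathbb{C}^{(N_R - N_i) \times N_T}$ contains the singular values of the matrix $\tilde{\mathbf{H}}_i$. Factorizing $\tilde{\mathbf{V}}_i$ into two parts, $\tilde{\mathbf{V}}_i^{(1)} \in \mathbb{C}^{N_T \times \tilde{L}_i}$ consists of the first $\tilde{L}_i$ non-zero singular vectors and $\tilde{\mathbf{V}}_i^{(0)} \in \mathbb{C}^{N_T \times (N_T - \tilde{L}_i)}$ holds the last $(N_T - \tilde{L}_i)$ zero singular vectors. Thus, $\tilde{\mathbf{V}}_i^{(0)}$ forms an orthogonal basis for the null space of $\tilde{\mathbf{H}}_i$. The solution for the BD constraint (9) is given by
$$\mathbf{P}_i^{a,\mathrm{BD}} = \tilde{\mathbf{V}}_i^{(0)}.$$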
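As shown in , the solution for the RBD constraint can be obtained as
$$\mathbf{P}_i^{a,\mathrm{RBD}} = \tilde{\mathbf{V}}_i \left( \tilde{\boldsymbol{\Sigma}}_i^T \tilde{\boldsymbol{\Sigma}}_i + \alpha \mathbf{I}_{N_T} \right)^{-1/2}, \qquad (13)$$
where $\alpha = N_R \sigma_n^2 / \xi$ is the regularization factor, $\sigma_n^2$ is the noise variance, and $\xi$ is the total average transmit power.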
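The two first-stage filters reviewed above can be sketched numerically. The following NumPy fragment is a minimal illustration, assuming the standard null-space form for BD and the regularized form $\tilde{\mathbf{V}}_i(\tilde{\boldsymbol{\Sigma}}_i^T\tilde{\boldsymbol{\Sigma}}_i+\alpha\mathbf{I})^{-1/2}$ for RBD; the function name `bd_rbd_first_filters`, the antenna configuration, and the value of `alpha` are hypothetical choices made only for this example.

```python
import numpy as np

def bd_rbd_first_filters(H_list, alpha):
    """First precoding filters of BD (null-space basis) and RBD (regularized)."""
    n_t = H_list[0].shape[1]
    bd, rbd = [], []
    for i in range(len(H_list)):
        # Stack the channels of all users except user i.
        H_tilde = np.vstack([H for j, H in enumerate(H_list) if j != i])
        _, s, Vh = np.linalg.svd(H_tilde)      # full SVD: V is n_t x n_t
        V = Vh.conj().T
        rank = int(np.sum(s > 1e-10 * s.max()))
        bd.append(V[:, rank:])                 # basis of the null space of H_tilde
        s_full = np.zeros(n_t)
        s_full[:s.size] = s
        # V @ diag((s^2 + alpha)^(-1/2)), done by scaling the columns of V.
        rbd.append(V / np.sqrt(s_full**2 + alpha))
    return bd, rbd

# Hypothetical setup: K = 3 users, 2 receive antennas each, 8 transmit antennas.
rng = np.random.default_rng(0)
K, n_i, n_t = 3, 2, 8
H_list = [rng.standard_normal((n_i, n_t)) + 1j * rng.standard_normal((n_i, n_t))
          for _ in range(K)]
bd, rbd = bd_rbd_first_filters(H_list, alpha=0.1)

# Largest interference that any BD filter leaks into another user's channel.
leak_bd = max(np.linalg.norm(H_list[j] @ bd[i])
              for i in range(K) for j in range(K) if j != i)
```

The BD filter removes the interference exactly, whereas the RBD filter only attenuates it: every singular value $\sigma$ of $\tilde{\mathbf{H}}_i$ is mapped to $\sigma/\sqrt{\sigma^2+\alpha} < 1$. The loop also shows why the cost grows with the number of users, since one SVD is computed per user.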
After the first precoding process, the MU-MIMO channel is decoupled into a set of parallel independent SU-MIMO channels by the BD precoding. For the RBD precoding, there is residual interference between these channels due to the regularization process, but this interference tends to zero at high SNRs. Therefore, the effective channel matrix for the $i$th user can be expressed as
$$\mathbf{H}_{\mathrm{eff}_i} = \mathbf{H}_i \mathbf{P}_i^a.$$
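Define $L_{\mathrm{eff}} = \mathrm{rank}(\mathbf{H}_{\mathrm{eff}_i})$ and consider the second SVD operation on the effective channel matrix
$$\mathbf{H}_{\mathrm{eff}_i} = \mathbf{U}_i \boldsymbol{\Sigma}_i \mathbf{V}_i^H = \mathbf{U}_i \boldsymbol{\Sigma}_i [\mathbf{V}_i^{(1)} \ \mathbf{V}_i^{(0)}]^H, \qquad (15)$$
using the unitary matrix $\mathbf{U}_i \in \mathbb{C}^{N_i \times N_i}$, where $\mathbf{V}_i^{(1)}$ contains the first $L_{\mathrm{eff}}$ singular vectors. The second precoding filters for BD and RBD precoding can be respectively obtained as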
where $\boldsymbol{\Lambda}$ is the power loading matrix that depends on the optimization criterion. An example power loading is the water filling (WF) . The $i$th user’s decoding matrix is obtained as
$$\mathbf{G}_i = \mathbf{U}_i^H,$$
which needs to be known by each user’s receiver.
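A small numerical check of the second SVD step may help: with the right singular vectors used as the second precoding filter and the Hermitian transpose of the left singular vectors used as the decoding matrix, the effective channel becomes diagonal. This is a sketch without power loading; the matrix size and the names `H_eff`, `P_b`, and `G` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical effective channel of one user: 2 receive antennas, 4 columns.
H_eff = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))

U, s, Vh = np.linalg.svd(H_eff)
r = int(np.sum(s > 1e-10))          # rank of the effective channel

P_b = Vh.conj().T[:, :r]            # second precoding filter (no power loading)
G = U.conj().T                      # decoding matrix needed at the receiver

# The cascade G * H_eff * P_b should equal diag(s): parallel streams.
D = G @ H_eff @ P_b
```

The receiver needs `G`, which depends on the instantaneous effective channel; this is the control overhead that the text attributes to the BD-type schemes.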
Note that for the conventional BD-type precoding algorithms, there is a dimensionality constraint below to be satisfied
Then, we can get the matrix dimension relationship as . Note that the first SVD operation in (11) needs to be implemented times on with dimension and the second SVD operation in (15) needs to be implemented times on with dimensions for the BD precoding and for the RBD precoding. From the above analysis, most of the computational complexity of the BD-type precoding algorithms comes from the two SVD operations which make the computational complexity of the BD-type precoding algorithms increase with the number of users and the system dimensions.
IV Proposed S-GMI Based Precoding Algorithms
In this section, we describe the proposed LR-S-GMI-type precoding algorithms based on a strategy that employs a channel inversion method, QR decompositions, and lattice reductions. Similar to the BD-type precoding algorithms, the design of the proposed LR-S-GMI-type precoding algorithms is computed in two steps.
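First, we obtain the first precoding filter $\mathbf{P}_i^a$ for the LR-S-GMI-type precoding algorithms by using one channel inversion and $K$ QR decompositions, each implemented for an individual user on a matrix of dimension $N_T \times N_i$. By applying the MMSE inversion to the combined channel matrix $\mathbf{H}$, we have
$$\mathbf{H}_{\mathrm{mse}}^{\dagger} = \left( \mathbf{H}^H \mathbf{H} + \alpha \mathbf{I} \right)^{-1} \mathbf{H}^H = [\bar{\mathbf{H}}_1, \bar{\mathbf{H}}_2, \ldots, \bar{\mathbf{H}}_K],$$
where $\bar{\mathbf{H}}_i \in \mathbb{C}^{N_T \times N_i}$ is the sub-matrix of $\mathbf{H}_{\mathrm{mse}}^{\dagger}$. Considering a high SNR case, it can be shown that the regularization factor $\alpha$ approaches zero and thus we have $\mathbf{H} \mathbf{H}_{\mathrm{mse}}^{\dagger} \approx \mathbf{I}$. This means the off-diagonal block matrices of $\mathbf{H} \mathbf{H}_{\mathrm{mse}}^{\dagger}$ converge to zeros with the increase of SNR. Hence, the matrix $\bar{\mathbf{H}}_i$ is approximately in the null space of $\tilde{\mathbf{H}}_i$ defined in (8), that is,
$$\tilde{\mathbf{H}}_i \bar{\mathbf{H}}_i \approx \mathbf{0}.$$
Considering the QR decomposition of $\bar{\mathbf{H}}_i$, we have
$$\bar{\mathbf{H}}_i = \bar{\mathbf{Q}}_i \bar{\mathbf{R}}_i, \quad i = 1, \ldots, K, \qquad (22)$$
where $\bar{\mathbf{Q}}_i \in \mathbb{C}^{N_T \times N_i}$ is an orthogonal matrix and $\bar{\mathbf{R}}_i \in \mathbb{C}^{N_i \times N_i}$ is an upper triangular matrix. Since $\bar{\mathbf{R}}_i$ is invertible, we have
$$\tilde{\mathbf{H}}_i \bar{\mathbf{Q}}_i \approx \mathbf{0}.$$
Thus, $\bar{\mathbf{Q}}_i$ satisfies the RBD constraint (10) to balance the MUI and the noise term.
We have simplified the design of the first precoding filter here as compared to  where a residual interference suppression filter is applied after the first precoding process . This filter increases the complexity and cannot completely cancel the MUI. Therefore, we omit the residual interference suppression part since it is not necessary for the RBD constraint based precoding. We term the simplified GMI as S-GMI in this work. Then, the first precoding filter for S-GMI can be obtained as
$$\mathbf{P}_i^a = \bar{\mathbf{Q}}_i, \qquad (24)$$
where $\mathbf{P}_i^a \in \mathbb{C}^{N_T \times N_i}$. By implementing the QR decomposition in (22) $K$ times on $\bar{\mathbf{H}}_i$ with dimension $N_T \times N_i$, the first combined precoding matrix for S-GMI is
$$\mathbf{P}^a = [\mathbf{P}_1^a \ \mathbf{P}_2^a \ \cdots \ \mathbf{P}_K^a].$$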
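The first S-GMI filter described above (one regularized inversion shared by all users, then one QR decomposition per user block) can be sketched as follows. This is a minimal NumPy illustration under assumed dimensions; `sgmi_first_filter`, the antenna counts, and the small value of `alpha` (chosen to mimic high SNR) are hypothetical.

```python
import numpy as np

def sgmi_first_filter(H, n_rx, alpha):
    """One MMSE channel inversion for all users, then one QR per user block."""
    n_t = H.shape[1]
    # (H^H H + alpha I)^{-1} H^H, computed with a linear solve.
    H_inv = np.linalg.solve(H.conj().T @ H + alpha * np.eye(n_t), H.conj().T)
    blocks, col = [], 0
    for n_i in n_rx:
        # Reduced QR of the n_t x n_i sub-matrix belonging to this user.
        Q, _ = np.linalg.qr(H_inv[:, col:col + n_i])
        blocks.append(Q)
        col += n_i
    return blocks

# Hypothetical setup: 8 transmit antennas, 3 users with 2 receive antennas each.
rng = np.random.default_rng(2)
n_rx, n_t = [2, 2, 2], 8
H = rng.standard_normal((sum(n_rx), n_t)) + 1j * rng.standard_normal((sum(n_rx), n_t))
P_a = sgmi_first_filter(H, n_rx, alpha=1e-6)

# Residual interference of user 0's filter on the other users' channels.
leak = np.linalg.norm(H[2:, :] @ P_a[0])
```

With a small `alpha` the residual interference `leak` is close to zero, in line with the high-SNR argument in the text; a larger `alpha` trades some leakage for noise robustness. Only one matrix inversion is performed regardless of the number of users.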
Note the QR decompositions of the LC-RBD-LR-type precoding in [19, 20] are implemented on with dimension . The S-GMI algorithm can be completed by applying the SVD operation to the effective channel matrix . Then, the second precoding filter of S-GMI is obtained as . The S-GMI algorithm is summarized in Table I.
Table I: The S-GMI precoding algorithm
|Applying the MMSE channel inversion|
|(2)|for i = 1 : K (loop over all users)|
|Compute the overall precoding and decoding matrix|
|Calculate the scaling factor|
|Get the received signal|
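Similarly, the channel inversion method extends straightforwardly from the RBD constraint based precoding to the BD constraint based precoding by using the ZF inversion
$$\mathbf{H}^{\dagger} = \mathbf{H}^H \left( \mathbf{H} \mathbf{H}^H \right)^{-1} = [\bar{\mathbf{H}}_{1,\mathrm{zf}}, \bar{\mathbf{H}}_{2,\mathrm{zf}}, \ldots, \bar{\mathbf{H}}_{K,\mathrm{zf}}].$$
Moreover, the obtained MUI is strictly zero as $\tilde{\mathbf{H}}_i \bar{\mathbf{H}}_{i,\mathrm{zf}} = \mathbf{0}$. Assuming the QR decomposition of $\bar{\mathbf{H}}_{i,\mathrm{zf}}$ is $\bar{\mathbf{H}}_{i,\mathrm{zf}} = \bar{\mathbf{Q}}_{i,\mathrm{zf}} \bar{\mathbf{R}}_{i,\mathrm{zf}}$, then, we have
$$\tilde{\mathbf{H}}_i \bar{\mathbf{Q}}_{i,\mathrm{zf}} = \mathbf{0}.$$
Thus, $\bar{\mathbf{Q}}_{i,\mathrm{zf}}$ satisfies the BD constraint (9). The first precoding matrix for the BD constraint based precoding can be equivalently obtained as
$$\mathbf{P}_i^a = \bar{\mathbf{Q}}_{i,\mathrm{zf}}. \qquad (28)$$
This equivalent method is termed GZI in .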
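The zero-forcing counterpart can be checked in the same way. The sketch below assumes a full-row-rank channel so that the Moore-Penrose pseudo-inverse equals $\mathbf{H}^H(\mathbf{H}\mathbf{H}^H)^{-1}$; the function name `gzi_first_filter` and the dimensions are hypothetical.

```python
import numpy as np

def gzi_first_filter(H, n_rx):
    """ZF channel inversion followed by one QR decomposition per user block."""
    H_pinv = np.linalg.pinv(H)   # equals H^H (H H^H)^{-1} for full row rank
    blocks, col = [], 0
    for n_i in n_rx:
        Q, _ = np.linalg.qr(H_pinv[:, col:col + n_i])
        blocks.append(Q)
        col += n_i
    return blocks

# Hypothetical setup: 8 transmit antennas, 3 users with 2 receive antennas each.
rng = np.random.default_rng(3)
n_rx, n_t = [2, 2, 2], 8
H = rng.standard_normal((6, n_t)) + 1j * rng.standard_normal((6, n_t))
Q_zf = gzi_first_filter(H, n_rx)

# Interference of user 1's filter on users 0 and 2 (rows 0-1 and 4-5 of H).
H_tilde_1 = np.vstack([H[0:2, :], H[4:6, :]])
leak_zf = np.linalg.norm(H_tilde_1 @ Q_zf[1])
```

Here the leakage is zero up to rounding error, so the orthonormal factor of each block satisfies the BD constraint without a per-user SVD of the interference channel.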
For the proposed LR-S-GMI-type precoding algorithms, we obtain the first precoding filter as in S-GMI in (24), while we employ the LR-aided linear precoding technique instead of the SVD operation in S-GMI to obtain the second precoding filter . The aim of the LR transformation is to find, for a given lattice, a new basis which is nearly orthogonal compared to the original matrix . The most commonly used LR algorithm was first proposed by Lenstra, Lenstra, and Lovász (LLL) in  and has polynomial time complexity. In order to reduce the computational complexity, a complex LLL (CLLL) algorithm was proposed in , which reduces the overall complexity of the LLL algorithm by nearly half without sacrificing any performance. We employ the CLLL algorithm to implement the LR transformation in this work.
After the first precoding, we transform the MU-MIMO channel into parallel or approximately parallel SU-MIMO channels and the effective channel matrix for the th user is
We perform the LR transformation on in the precoding scenario , that is
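where $\mathbf{T}_i$ is a unimodular matrix with $|\det(\mathbf{T}_i)| = 1$ and all elements of $\mathbf{T}_i$ are complex integers, i.e., $t_{l,k} \in \mathbb{Z} + j\mathbb{Z}$. The physical meaning of the constraint $|\det(\mathbf{T}_i)| = 1$ is that the channel energy is unchanged after the LR transformation.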
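To make the LR transformation concrete, the following is a compact, unoptimized sketch of a complex LLL reduction acting on the columns of a matrix. It is not the authors' implementation: the function name `clll`, the parameter `delta = 0.75`, and the choice to recompute a QR decomposition in every iteration are assumptions made for readability rather than efficiency.

```python
import numpy as np

def clll(B, delta=0.75):
    """Complex LLL reduction of the columns of B (full column rank assumed).

    Returns (B_red, T) with B_red = B @ T and T unimodular over the
    Gaussian integers.
    """
    B = np.array(B, dtype=complex)
    n = B.shape[1]
    T = np.eye(n, dtype=complex)
    k = 1
    while k < n:
        _, R = np.linalg.qr(B)
        # Size-reduce column k against columns k-1, ..., 0.
        for j in range(k - 1, -1, -1):
            mu = np.round(R[j, k] / R[j, j])   # rounds real and imaginary parts
            if mu != 0:
                B[:, k] -= mu * B[:, j]
                T[:, k] -= mu * T[:, j]
                R[:, k] -= mu * R[:, j]
        # Lovasz condition on the pair (k-1, k); swap and step back if violated.
        if delta * abs(R[k - 1, k - 1])**2 > abs(R[k, k])**2 + abs(R[k - 1, k])**2:
            B[:, [k - 1, k]] = B[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            k = max(k - 1, 1)
        else:
            k += 1
    return B, T

# Hypothetical 4 x 4 complex basis.
rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
B_red, T = clll(B)
```

Because `T` is built only from column swaps and Gaussian-integer column additions, it has integer entries and a determinant of unit magnitude, so `B_red` spans the same lattice as `B`.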
Following the LR transformation, we employ the linear precoding constraint to get the second precoding filter to parallelize each user’s streams. The ZF precoding constraint is implemented for user as
It is well-known that the performance of MMSE precoding is always better than that of ZF precoding. We can get the second precoding filter by employing an MMSE precoding constraint. The MMSE precoding is actually equivalent to the ZF precoding with respect to an extended system model [29, 32]. The extended channel matrix for the MMSE precoding scheme is defined as
By introducing the regularization factor , a trade-off between the level of MUI and the noise is introduced . Then, the MMSE precoding filter is obtained as
where , and the multiplication by will not result in transmit power amplification since . From the mathematical expression in (33), the rows of determine the effective transmit power amplification of the MMSE precoding. Correspondingly, the LR transformation for the MMSE precoding should be applied to the transpose of the extended channel matrix , and the LR transformed channel matrix is obtained as
where is the unimodular matrix for . Then, the LR-aided MMSE precoding filter is given by
where the matrix . Finally, the combined second precoding matrix for all users is
The overall precoding matrix is . Since the lattice reduced precoding matrix has near orthogonal columns, the required transmit power will be reduced compared to the BD-type precoding algorithms. Thus, a better BER performance than that of the BD-type precoding algorithms can be achieved by the proposed LR-S-GMI-type precoding algorithms.
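The statement that MMSE precoding is equivalent to ZF precoding on an extended system model can be verified numerically. The sketch below assumes the common extension $[\mathbf{H}, \sqrt{\alpha}\,\mathbf{I}]$ and a selection matrix $\mathbf{A} = [\mathbf{I}, \mathbf{0}]$; these forms, the dimensions, and the value of `alpha` are assumptions for illustration and may differ in detail from the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(5)
n_i, n_t, alpha = 2, 4, 0.1        # hypothetical sizes and regularization factor
H = rng.standard_normal((n_i, n_t)) + 1j * rng.standard_normal((n_i, n_t))

# Extended channel [H, sqrt(alpha) I] of size n_i x (n_t + n_i).
H_ext = np.hstack([H, np.sqrt(alpha) * np.eye(n_i)])

# ZF precoder of the extended channel; A keeps the first n_t rows.
A = np.hstack([np.eye(n_t), np.zeros((n_t, n_i))])
P_from_ext = A @ np.linalg.pinv(H_ext)

# Direct MMSE (regularized inversion) precoder for comparison.
P_mmse = H.conj().T @ np.linalg.inv(H @ H.conj().T + alpha * np.eye(n_i))
```

Both expressions give the same filter, and $\mathbf{A}\mathbf{A}^H = \mathbf{I}$, so the row selection does not amplify the transmit power. This is why a lattice reduction applied to the extended channel yields an LR-aided MMSE precoder.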
The received signal is finally obtained as
where . The main processing work left for the receiver is to quantize the received signal to the nearest data vector and the decoding matrix described in the BD-type [11, 13], QR/SVD-type , and GMI-type  precoding algorithms is not needed anymore. The receiver structure is thus simplified, and a significant amount of transmit power can be saved which is very important considering the mobility of the distributed users.
The proposed precoding algorithms are called LR-S-GMI-ZF and LR-S-GMI-MMSE depending on the choice of the second precoding filter as given in (31) and (35), respectively. We will focus on the LR-S-GMI-MMSE precoding since it achieves a better performance. The implementation steps of the LR-S-GMI-MMSE precoding algorithm are summarized in Table II. By replacing steps (8) and (9) in Table II with the formulation in (31), the LR-S-GMI-ZF precoding algorithm can be obtained. Similarly, the first precoding matrix can also be computed according to the GZI method in (28) and combined with the second precoding matrix from (31) or (35). Then, the LR-GZI-ZF or LR-GZI-MMSE precoding algorithms can be obtained, respectively.
Table II: The LR-S-GMI-MMSE precoding algorithm
|Applying the MMSE channel inversion|
|(2)|for i = 1 : K (loop over all users)|
|Compute the overall precoding matrix|
|Calculate the scaling factor|
|Get the received signal|
|Transform back from lattice space|
V Performance Analysis
In this section, we carry out an analysis of the performance of the proposed LR-S-GMI-type precoding algorithms. We consider a performance analysis in terms of BER, sum-rate and computational complexity. In the BER analysis part, we show that the residual interference matrix of the RBD precoding actually converges to an identity matrix, which is a new result in the literature so far. We also mathematically demonstrate that the residual interference of the proposed LR-S-GMI-type precoding algorithms converges to a zero matrix. Finally, we illustrate the quality of the effective channel matrices of the proposed and existing precoding algorithms by measuring their condition numbers. The maximum achievable sum-rate of the proposed LR-S-GMI-type precoding algorithms is given in the sum-rate analysis part. The computational complexity of the proposed and existing precoding algorithms is summarized in tables in the complexity analysis part. The trend of the computational complexity with the increase of the dimensions is also given and an analysis is developed.
V-A BER Performance Analysis
For the BD precoding, the effective SU-MIMO channels are strictly parallel between each other after the first precoding filtering. For the RBD precoding, however, the residual interference is not zero between the users. We use to denote for convenience. From (13), the following formula is obtained
Mathematically, the quantity can be expressed as . Substituting this into (38), the formula can be rewritten as
With the increase of the SNR, approaches zero and then we have
By further manipulating the expression in (40), we obtain
that is, the residual interference matrix of the RBD precoding converges to an identity matrix at high SNRs. In contrast, for the S-GMI precoding algorithm developed in Section IV, as the SNR increases we have
By comparing (41) and (42), we can see that the impact of the residual interference of S-GMI precoding would be smaller than that of the conventional RBD precoding algorithm. Thus, we expect that a better BER performance is achieved by the S-GMI precoding algorithm over the conventional RBD precoding algorithm.
As pointed out in , the BER performance for a MIMO precoding system is actually determined by the energy of the transmitted signal. In order to reduce this energy and improve the BER performance further, we transform the effective channel into the lattice space. By doing this, an improved basis is computed. Note that the LR transformed channel matrix is quasi-orthogonal rather than strictly orthogonal. We can employ the condition number, which is defined as 
$$\mathrm{cond}(\mathbf{H}) = \frac{\sigma_{\max}(\mathbf{H})}{\sigma_{\min}(\mathbf{H})}, \qquad (43)$$
to measure the orthogonality of the channel matrix, where $\sigma_{\max}(\mathbf{H})$ and $\sigma_{\min}(\mathbf{H})$ are the largest and smallest singular values of $\mathbf{H}$. From the definition of the condition number in (43), we get that $\mathrm{cond}(\mathbf{H}) \geq 1$, with equality for an orthogonal basis, while matrices which are nearly singular have large condition numbers. In Fig. 2, the probability density functions (PDFs) of the condition numbers for the effective channel matrices are illustrated. For the effective channel matrix of the proposed LR-S-GMI-MMSE precoding algorithm, both the spread of the condition numbers and their average value are much smaller than for the effective channel matrices achieved by the existing precoding algorithms. Therefore, a significant reduction in the required transmit power is achieved and a better BER performance can be obtained by the proposed LR-S-GMI-MMSE precoding algorithm. Note that for the special case in which each user has a single receive antenna, the proposed LR-S-GMI-type precoding will not converge to GMI or S-GMI because the second precoding filter is designed in the lattice space.
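The condition number used above as an orthogonality measure is simple to compute from the singular values. The sketch below assumes the usual 2-norm definition (largest over smallest singular value); `cond_number` and the test matrices are hypothetical.

```python
import numpy as np

def cond_number(H):
    """Ratio of the largest to the smallest singular value of H."""
    s = np.linalg.svd(H, compute_uv=False)   # singular values, descending order
    return s[0] / s[-1]

rng = np.random.default_rng(6)
H_rand = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q_unitary, _ = np.linalg.qr(H_rand)          # a matrix with orthonormal columns

c_rand = cond_number(H_rand)                 # at least 1 for any full-rank matrix
c_unitary = cond_number(Q_unitary)           # 1 up to rounding error
```

A matrix with orthonormal columns attains the lower bound of one, which is why a smaller condition number of the effective channel indicates a basis that is closer to orthogonal.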
V-B Achievable Sum-Rate Analysis
Recall that at high SNRs, the MU-MIMO channel is approximately decoupled into equivalent SU-MIMO channels by applying the first precoding filtering in (23). Then, we can transform the MU-MIMO sum-rate analysis  to a set of SU-MIMO sum-rate analysis tasks. For the second precoding filter, the LR-aided MMSE precoding is actually equal to the LR-aided ZF precoding under the high SNR scenario. Therefore, the th user’s received signal is
where . By assuming that the average transmit power is , and because of the fact that , we get the normalization factor as
where the quantity is the th singular value of , and denotes the energy of the th stream of .
From (45), the received SNR for the th stream of user is obtained as
Then, the achievable sum-rate for user is given by
Note that the achievable sum-rate is degraded by the normalization factor . The value of approaches its maximum when , thus we have
Finally, the maximum achievable sum-rate of the proposed LR-S-GMI-type precoding algorithms at high SNRs can be expressed as
For the BD precoding, we multiply the decoding matrix at the th user’s receiver and the received signal is given by
Due to the fact that the two precoding matrices and are semi-unitary matrices, we get and . Then, by applying the equivalence , the normalization factor for BD can be expressed as
Since the statistical property of is not changed by the multiplication with the unitary matrix , we get the th received SNR as
For simplicity, we do not consider the power loading between users and streams in the following derivation and term this strategy as no power loading (NPL). Then, the achievable sum-rate for the BD precoding algorithm is given by
By comparing the maximum achievable sum-rate of the proposed LR-S-GMI-type precoding algorithms in (49), we conclude that the sum-rate of the proposed LR-S-GMI-type precoding algorithms will be slightly inferior to that of the BD precoding algorithm at high SNRs. At low SNRs, however, we expect that the achieved sum-rate of the proposed LR-S-GMI-type precoding algorithms will be better than that of the BD precoding since a regularization factor is employed to mitigate the degradation by the noise term.
The sum-rate performance of the BD precoding is actually dependent on the power loading scheme being used. Hence, the BD precoding algorithm can achieve its maximum sum-rate performance by allocating the power between streams according to a WF power loading scheme. As pointed out in , we do not consider the power loading strategy for the RBD or the proposed LR-S-GMI-type precoding algorithms for two reasons. One is that it is not easy to identify the optimal power allocation coefficients because of the existence of residual interference. The second reason is that the MMSE condition (10) is already satisfied. Therefore, an allocation of different powers between streams is not needed.
V-C Computational Complexity Analysis
In this section, we use the total number of floating point operations (FLOPs) to measure the computational complexity of the precoding algorithms discussed above. It is worth noting that the lattice reduction algorithm has variable complexity, and the average complexity of the CLLL algorithm has been given in FLOPs by . A reduced and fixed complexity lattice reduction structure is proposed in ; however, we employ the conventional CLLL algorithm because the lattice reduction algorithm is not the main focus of this work. The numbers of FLOPs for the complex QR decomposition and the real SVD operation are given in . As shown in , the number of FLOPs required by the SVD of a complex matrix is equivalent to that required by the SVD of its extended real matrix. The total number of FLOPs needed by the matrix operations is summarized below:
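Multiplication of $m \times n$ and $n \times p$ complex matrices: $8mnp - 2mp$;
QR decomposition of an $m \times n$ ($m \leq n$) complex matrix: $16(n^2 m - n m^2 + \frac{1}{3} m^3)$;
SVD of an $m \times n$ ($m \leq n$) complex matrix where only $\boldsymbol{\Sigma}$ and $\mathbf{V}$ are obtained: $32(n m^2 + 2 m^3)$;
SVD of an $m \times n$ ($m \leq n$) complex matrix where $\mathbf{U}$, $\boldsymbol{\Sigma}$ and $\mathbf{V}$ are obtained: $8(4 n^2 m + 8 n m^2 + 9 m^3)$;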
Inversion of an $m \times m$ real matrix using Gauss-Jordan elimination: $4m^3/3$.
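For quick what-if comparisons, operation counts of this kind can be wrapped in small helper functions. The formulas in the sketch below are assumptions: they are the counts commonly used for complex matrices treated through their extended real representation, for an $m \times n$ matrix with $m \leq n$, and the function names are hypothetical.

```python
def flops_matmul(m, n, p):
    """Product of an m x n and an n x p complex matrix (assumed count)."""
    return 8 * m * n * p - 2 * m * p

def flops_qr(m, n):
    """QR decomposition of an m x n complex matrix, m <= n (assumed count)."""
    return 16 * (n**2 * m - n * m**2 + m**3 / 3)

def flops_svd_sigma_v(m, n):
    """SVD of an m x n complex matrix returning only Sigma and V (assumed count)."""
    return 32 * (n * m**2 + 2 * m**3)

def flops_svd_full(m, n):
    """SVD of an m x n complex matrix returning U, Sigma and V (assumed count)."""
    return 8 * (4 * n**2 * m + 8 * n * m**2 + 9 * m**3)

# Example: cost of the operations on a hypothetical 2 x 4 complex matrix.
example = {
    "matmul 2x3 by 3x4": flops_matmul(2, 3, 4),
    "svd (Sigma, V) 2x4": flops_svd_sigma_v(2, 4),
    "svd (U, Sigma, V) 2x4": flops_svd_full(2, 4),
}
```

Summing such counts over the per-user loop of each algorithm shows how the total grows with the number of users and the matrix dimensions.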
We illustrate the required FLOPs for the conventional RBD, S-GMI and LR-S-GMI-MMSE precoding algorithms in Table III, Table IV and Table V, respectively. The complexity of the QR/SVD RBD  and LC-RBD-LR-MMSE precoding algorithms is already given in . A system with transmit antennas and users each equipped with receive antennas is considered; this scenario is denoted as the case. For the case, the reduction in the number of FLOPs obtained by the proposed LR-S-GMI-MMSE precoding is , , and as compared to the RBD, BD, QR/SVD RBD and LC-RBD-LR-MMSE precoding algorithms, respectively. Clearly, the proposed LR-S-GMI-MMSE precoding algorithm requires the lowest complexity.