Low-Complexity Design of Generalized Block Diagonalization Precoding Algorithms for Multiuser MIMO Systems

Keke Zu, Rodrigo C. de Lamare, Senior Member, IEEE, and Martin Haardt, Senior Member, IEEE

Part of this work has been submitted to the IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, USA, Oct. 2012 [1].

Abstract

Block diagonalization (BD) based precoding techniques are well-known linear transmit strategies for multiuser MIMO (MU-MIMO) systems. By employing BD-type precoding algorithms at the transmit side, the MU-MIMO broadcast channel is decomposed into multiple independent parallel single-user MIMO (SU-MIMO) channels, and the maximum diversity order is achieved at high data rates. The main computational complexity of BD-type precoding algorithms comes from two singular value decomposition (SVD) operations, which depend on the number of users and the dimensions of each user’s channel matrix. In this work, low-complexity precoding algorithms are proposed to reduce the computational complexity and improve the performance of BD-type precoding algorithms. We devise a strategy based on a common channel inversion technique, QR decompositions, and lattice reductions to decouple the MU-MIMO channel into equivalent SU-MIMO channels. Analytical and simulation results show that the proposed precoding algorithms achieve a sum-rate performance comparable to that of BD-type precoding algorithms, substantial bit error rate (BER) performance gains, and a simplified receiver structure, while requiring a much lower complexity.

Index Terms

Multiuser MIMO (MU-MIMO), block diagonalization (BD), regularized block diagonalization (RBD), low-complexity, lattice reduction (LR).

I Introduction

Multiple-input multiple-output (MIMO) systems have attracted considerable research effort in the past years because they can greatly increase the spectrum efficiency of wireless communications [2, 3]. In order to meet the continuously growing data traffic, a downlink peak spectrum efficiency of 30 bps/Hz and an uplink peak spectrum efficiency of 15 bps/Hz are proposed in LTE-Advanced [4], and a configuration of up to 8 transmit antennas for the downlink is suggested. A new amendment for the WLAN standard, IEEE 802.11ac [5], also recommends up to 8 MIMO spatial streams. Configurations with dozens of antennas are now being considered [6]. High-dimensional MIMO systems, or large MIMO systems, are very promising for the next generation of wireless communication systems due to their potential to improve rate and reliability dramatically [7]. However, it is a challenge to design a precoding algorithm for high-dimensional MIMO systems that offers good overall performance and low computational complexity at the same time.

Unlike the received signal in single user MIMO (SU-MIMO) systems, the received signals of different users in multiuser MIMO (MU-MIMO) systems not only suffer from the noise and the inter-antenna interference but are also affected by the multiuser interference (MUI). Channel inversion based precoding or linear precoding algorithms such as zero forcing (ZF) and minimum mean squared error (MMSE) precoding [8] can still be used to cancel the MUI, but they result in a reduced throughput or require a higher power at the transmitter in the MU-MIMO scenarios. As a generalization of the ZF precoding algorithm, block diagonalization (BD) based precoding algorithms have been proposed in [11, 12] for MU-MIMO systems. However, BD based precoding algorithms only take the MUI into account and thus suffer a performance loss at low signal to noise ratios (SNRs) when the noise is the dominant factor. Therefore, a regularized block diagonalization (RBD) precoding algorithm which introduces a regularization factor to take the noise term into account has been proposed in [13]. We term the BD and RBD based precoding schemes as BD-type precoding algorithms in this work for convenience.

The main steps of the BD-type precoding algorithms are two SVD operations, which need to be implemented for each user. Therefore, the computational complexity of the BD-type precoding algorithms depends on the number of users and the dimensions of each user’s channel matrix. For MU-MIMO systems with a large number of users and multiple receive antennas, this could result in a considerable computational cost. Another distinctive aspect of the BD-type precoding algorithms is that they need a decoding matrix obtained from the second SVD operation to orthogonalize each user’s streams. The requirement of this decoding matrix brings extra control overhead or computational complexity [14].

Recent work on BD-type precoding algorithms has focused on how to equivalently implement the BD-type precoding algorithms with less computational complexity. A low-complexity generalized ZF channel inversion (GZI) method has been proposed in [15] to equivalently implement the first SVD operation of the original BD precoding, and a generalized MMSE channel inversion (GMI) method is also developed in [15] for the original RBD precoding. In [16] the first SVD operation of the RBD precoding is replaced with a less complex QR decomposition [17]. We term the work in [15] as GMI-type precoding and the work in [16] as QR/SVD-type precoding. For the second SVD operation, however, both the GMI-type and QR/SVD-type precoding schemes employ it in a similar way as the conventional BD-type precoding algorithms to parallelize each user’s streams. Therefore, the second SVD operation needs to be implemented multiple times and the decoding matrix for the effective channel still needs to be known or estimated at the receiver of each user for the GMI-type or QR/SVD-type precoding algorithms.

The GMI-type and QR/SVD-type techniques are solely low-complexity equivalent implementations of the BD-type precoding algorithms. As an improvement of the BD-type precoding algorithms, low-complexity lattice reduction-aided RBD (LC-RBD-LR) type precoding algorithms have been proposed in [19, 20] based on the QR decomposition scheme. The LC-RBD-LR-type precoding algorithms achieve not only a much lower complexity but also considerable BER gains. However, the QR decomposition in LC-RBD-LR-type precoding algorithms still needs to be implemented for each user, which could result in a high complexity for large MIMO systems.

A new category of low-complexity high performance precoding algorithms based on the channel inversion scheme is proposed in this work. A simplified GMI (S-GMI) precoding scheme which employs a common channel inversion for all users is developed first. Equivalent parallel SU-MIMO channels are obtained from the S-GMI precoding process. Then, these effective channels are transformed into the lattice space by utilizing the lattice reduction (LR) technique [18], whose complexity is dictated by a QR decomposition. Linear precoding strategies are applied in the lattice space to parallelize each user’s streams. Finally, the proposed lattice reduction-aided simplified GMI (LR-S-GMI) precoding algorithms are obtained. According to the specific linear precoding constraint used, the proposed LR-S-GMI-type precoding algorithms are categorized as LR-S-GMI-ZF and LR-S-GMI-MMSE, respectively.

The algorithm structure of the proposed LR-S-GMI-type precoding differs from that of the LC-RBD-LR-type precoding in that the channel inversion is implemented only once for all users, while the QR decomposition needs to be implemented multiple times in LC-RBD-LR-type precoding. Therefore, the computational complexity can be reduced considerably by the proposed LR-S-GMI-type precoding. A comprehensive mathematical analysis is developed to analyze and predict the performance of the proposed LR-S-GMI-type precoding algorithms. The simulation results verify that the proposed LR-S-GMI-type precoding algorithms have the lowest computational complexity among the BD-type [11, 13], GMI-type [15], QR/SVD-type [16] and LC-RBD-LR-type [19] precoding algorithms, a sum-rate performance comparable to that of BD-type precoding algorithms, and substantial BER performance gains over prior art.

The main contributions of the work are summarized below:

  1. A simplified GMI (S-GMI) precoding is developed in this work as an improvement of the original RBD in [13]. A mathematical analysis is given to show that S-GMI has a better BER performance and a much lower complexity than RBD, in contrast to the GMI in [15], which only provides an equivalent implementation of RBD.

  2. A new category of low-complexity high-performance LR-S-GMI-type precoding algorithms is proposed for MU-MIMO systems based on a channel inversion technique, QR decompositions, and lattice reductions.

  3. The BD-type precoding algorithms are systematically analyzed and summarized. We show that the computational complexity of the BD-type precoding depends on the number of users and the system dimensions.

  4. A comprehensive performance analysis is carried out in terms of BER performance, achievable sum-rate, and computational complexity.

  5. A simulation study of the proposed algorithms under imperfect channel situations is also conducted, which completes this paper.

The proposed and existing precoding techniques are all performed with the help of downlink channel state information (CSI). The assumption that full CSI is available at the transmit side is valid in time-division duplex (TDD) systems because the uplink and downlink share the same frequency band. For frequency-division duplex (FDD) systems, however, the CSI needs to be estimated at the receiver and fed back to the transmitter.

This paper is organized as follows. The system model is given in Section II. A brief review of the BD-type precoding algorithms is presented in Section III. The proposed LR-S-GMI-type precoding algorithms are described in detail in Section IV and the performance analysis is developed in Section V. Simulation results and conclusions are presented in Section VI and Section VII, respectively.

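Notation: Matrices and vectors are denoted by upper and lowercase boldface letters, respectively. The transpose, Hermitian transpose, inverse, and pseudo-inverse of a matrix $\mathbf{A}$ are denoted by $\mathbf{A}^T$, $\mathbf{A}^H$, $\mathbf{A}^{-1}$, and $\mathbf{A}^{\dagger}$, respectively. The trace, determinant, Frobenius norm, and round function are denoted by $\mathrm{Tr}(\cdot)$, $\det(\cdot)$, $\|\cdot\|_F$, and $\lceil\cdot\rfloor$. The operator $\mathrm{diag}\{\cdot\}$ creates a block diagonal matrix with the given matrices on the main diagonal.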

II System Model

We consider an uncoded MU-MIMO downlink channel with $N_t$ transmit antennas at the base station (BS) and $N_i$ receive antennas at the $i$th user equipment (UE). With $K$ users in the system, the total number of receive antennas is $N_r = \sum_{i=1}^{K} N_i$. A block diagram of such a system is illustrated in Fig. 1.

Fig. 1: MU-MIMO System Model

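From the system model, the combined channel matrix $\mathbf{H}$ and the joint precoding matrix $\mathbf{P}$ are given by

$$\mathbf{H} = [\mathbf{H}_1^T \; \mathbf{H}_2^T \; \cdots \; \mathbf{H}_K^T]^T \in \mathbb{C}^{N_r \times N_t}, \tag{1}$$
$$\mathbf{P} = [\mathbf{P}_1 \; \mathbf{P}_2 \; \cdots \; \mathbf{P}_K] \in \mathbb{C}^{N_t \times N_r}, \tag{2}$$

where $\mathbf{H}_i \in \mathbb{C}^{N_i \times N_t}$ is the $i$th user’s channel matrix. The quantity $\mathbf{P}_i \in \mathbb{C}^{N_t \times N_i}$ is the $i$th user’s precoding matrix. We assume a flat fading MIMO channel and the received signal $\mathbf{y}_i \in \mathbb{C}^{N_i}$ at the $i$th user is given by

$$\mathbf{y}_i = \mathbf{H}_i \mathbf{x}_i + \mathbf{H}_i \sum_{j=1, j \neq i}^{K} \mathbf{x}_j + \mathbf{n}_i, \tag{3}$$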

where the quantity is the th user’s transmitted signal, and is the th user’s Gaussian noise with independent and identically distributed (i.i.d.) entries of zero mean and variance . Assuming that the average transmit power for each user is , then, the power constraint is imposed. We construct an unnormalized signal such that

(4)

where with being the transmit data vector and is the average energy of with . The physical meaning of dividing by the scalar is to make sure the average transmit power is still the same after the precoding process. With this normalization, obeys .

The received signal is weighted by the scalar to form the estimate

(5)

Note that it is necessary to cancel out at the receiver to get the correct amplitude of the desired signal part. The average energy is independent from the channel and the data, which means the receivers do not need to know the instantaneous CSI for the precoding techniques to work [21]. As analyzed and illustrated in [22], however, the performance difference between the average and the instantaneous is very small. Therefore, we follow the strategy developed in [21] and [22] to assume the receivers need to know only but use instead of for simulation convenience as is simpler to compute. The simulation results represent the performance of either normalization method. In this case, we can replace (4) and (5) with the instantaneous in the simulations and employ

(6)

III Review of BD-type Precoding Algorithms

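The design of BD-type precoding algorithms is performed in two steps [11, 13]. The first precoding filter is used to completely eliminate the MUI (by BD) or to balance the MUI with the noise (by RBD), so that exact (by BD) or approximate (by RBD) parallel SU-MIMO channels are obtained. The second precoding filter is implemented to parallelize each user’s streams. Correspondingly, the precoding matrix for the $i$th user can be rewritten in two parts as

$$\mathbf{P}_i = \mathbf{P}_i^a \mathbf{P}_i^b, \tag{7}$$

where $\mathbf{P}_i^a \in \mathbb{C}^{N_t \times L_i}$ and $\mathbf{P}_i^b \in \mathbb{C}^{L_i \times N_i}$. The parameter $L_i$ is dependent on the specific choice of the precoding algorithm. We exclude the $i$th user’s channel matrix and define $\tilde{\mathbf{H}}_i$ as

$$\tilde{\mathbf{H}}_i = [\mathbf{H}_1^T \; \cdots \; \mathbf{H}_{i-1}^T \; \mathbf{H}_{i+1}^T \; \cdots \; \mathbf{H}_K^T]^T, \tag{8}$$

where $\tilde{\mathbf{H}}_i \in \mathbb{C}^{(N_r - N_i) \times N_t}$. Then, the interference generated to the other users is determined by $\tilde{\mathbf{H}}_i \mathbf{P}_i^a$.

In order to eliminate all the MUI, we impose the constraint that

$$\tilde{\mathbf{H}}_i \mathbf{P}_i^a = \mathbf{0}, \quad i = 1, \ldots, K. \tag{9}$$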

We term (9) as the BD constraint. Note that the BD constraint is actually an extension of the ZF constraint in [8] for MU-MIMO with multiple receive antennas. In order to take the noise term into account as well, an RBD constraint is developed in [13] and given by

(10)

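Assuming that the rank of $\tilde{\mathbf{H}}_i$ in (8) is $\tilde{L}_i$, define the SVD of $\tilde{\mathbf{H}}_i$ as

$$\tilde{\mathbf{H}}_i = \tilde{\mathbf{U}}_i \tilde{\boldsymbol{\Sigma}}_i \tilde{\mathbf{V}}_i^H = \tilde{\mathbf{U}}_i \tilde{\boldsymbol{\Sigma}}_i [\tilde{\mathbf{V}}_i^{(1)} \; \tilde{\mathbf{V}}_i^{(0)}]^H, \tag{11}$$

where $\tilde{\mathbf{U}}_i$ and $\tilde{\mathbf{V}}_i$ are unitary matrices. The diagonal matrix $\tilde{\boldsymbol{\Sigma}}_i$ contains the singular values of the matrix $\tilde{\mathbf{H}}_i$. Factorizing $\tilde{\mathbf{V}}_i$ into two parts, $\tilde{\mathbf{V}}_i^{(1)}$ consists of the first $\tilde{L}_i$ non-zero singular vectors and $\tilde{\mathbf{V}}_i^{(0)}$ holds the last $N_t - \tilde{L}_i$ zero singular vectors. Thus, $\tilde{\mathbf{V}}_i^{(0)}$ forms an orthogonal basis for the null space of $\tilde{\mathbf{H}}_i$. The solution for the BD constraint (9) is given by

$$\mathbf{P}_i^{a(\mathrm{BD})} = \tilde{\mathbf{V}}_i^{(0)}. \tag{12}$$

As shown in [13], the solution for the RBD constraint can be obtained as

$$\mathbf{P}_i^{a(\mathrm{RBD})} = \tilde{\mathbf{V}}_i \left( \tilde{\boldsymbol{\Sigma}}_i^T \tilde{\boldsymbol{\Sigma}}_i + \alpha \mathbf{I}_{N_t} \right)^{-1/2}, \tag{13}$$

where $\alpha = N_r \sigma_n^2 / \xi$ is the regularization factor, $\sigma_n^2$ is the noise variance, and $\xi$ is the total average transmit power.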
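
The first-filter step can be illustrated with a minimal NumPy sketch. It assumes the standard forms of the two solutions: the BD filter is a basis of the null space of the other users' stacked channel, and the RBD filter weights all right singular vectors by $(\sigma^2 + \alpha)^{-1/2}$. The function name `first_filters`, the antenna configuration, and the value of `alpha` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def first_filters(H_list, i, alpha):
    """Return (H_tilde, P_bd, P_rbd) for user i (0-based). Illustrative only."""
    # Stack every user's channel except user i's.
    H_tilde = np.vstack([H for j, H in enumerate(H_list) if j != i])
    # Full SVD: V is N_t x N_t, s holds min(rows, N_t) singular values.
    _, s, Vh = np.linalg.svd(H_tilde)
    V = Vh.conj().T
    rank = int(np.sum(s > 1e-10 * s[0]))
    # BD: right singular vectors that belong to zero singular values.
    P_bd = V[:, rank:]
    # RBD: all right singular vectors, column k scaled by (sigma_k^2 + alpha)^(-1/2).
    s2 = np.zeros(V.shape[0])
    s2[:s.size] = s ** 2
    P_rbd = V * (s2 + alpha) ** -0.5
    return H_tilde, P_bd, P_rbd

rng = np.random.default_rng(0)
K, N_i, N_t = 3, 2, 6          # assumed sizes: 3 users, 2 antennas each, 6 at the BS
H_list = [(rng.standard_normal((N_i, N_t)) + 1j * rng.standard_normal((N_i, N_t))) / np.sqrt(2)
          for _ in range(K)]
H_tilde, P_bd, P_rbd = first_filters(H_list, i=0, alpha=0.1)
print("BD leakage norm:", np.linalg.norm(H_tilde @ P_bd))
print("RBD leakage singular values:", np.linalg.svd(H_tilde @ P_rbd, compute_uv=False))
```

In this sketch the BD leakage is zero up to rounding, whereas the RBD leakage has singular values $\sigma_k / \sqrt{\sigma_k^2 + \alpha}$, which makes the trade-off controlled by the regularization factor visible.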

After the first precoding process, the MU-MIMO channel is decoupled into a set of parallel independent SU-MIMO channels by the BD precoding. For the RBD precoding, there are residual interferences between these channels due to the regularization process, but, these interferences tend to zero at high SNRs. Therefore, the effective channel matrix for the th user can be expressed as

(14)

Define and consider the second SVD operation on the effective channel matrix

(15)

using the unitary matrix and contains the first singular vectors. The second precoding filters for BD and RBD precoding can be respectively obtained as

(16)
(17)

where is the power loading matrix that depends on the optimization criterion. An example power loading is the water filling (WF) [2]. The th user’s decoding matrix is obtained as

(18)

which needs to be known by each user’s receiver.
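
A short sketch of the second SVD step for the BD case may help to see why the decoding matrix is required. It assumes no power loading (identity loading matrix) and an illustrative antenna configuration; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N_i, N_t = 3, 2, 6          # assumed sizes, not taken from the paper

def rand_channel(rows, cols):
    return (rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

H_list = [rand_channel(N_i, N_t) for _ in range(K)]
i = 0

# First filter (BD): null-space basis of the other users' stacked channel.
H_tilde = np.vstack([H for j, H in enumerate(H_list) if j != i])
_, s_t, Vh_t = np.linalg.svd(H_tilde)
P_a = Vh_t.conj().T[:, s_t.size:]      # assumes H_tilde has full row rank

# Second SVD on the effective channel of user i.
H_eff = H_list[i] @ P_a
U, s, Vh = np.linalg.svd(H_eff)
P_b = Vh.conj().T[:, :s.size]          # leading right singular vectors, no power loading
G = U.conj().T                         # decoding matrix applied at the receiver

D = G @ H_list[i] @ P_a @ P_b          # end-to-end channel seen by user i
leak = H_tilde @ P_a @ P_b             # signal of user i received by the other users
print(np.round(np.abs(D), 3))
print(np.linalg.norm(leak))
```

The end-to-end channel is diagonal only after the multiplication by `G`, which is why the receiver has to know or estimate this matrix in the BD-type schemes.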

Note that for the conventional BD-type precoding algorithms, there is a dimensionality constraint below to be satisfied

(19)

Then, we can get the matrix dimension relationship as . Note that the first SVD operation in (11) needs to be implemented times on with dimension and the second SVD operation in (15) needs to be implemented times on with dimensions for the BD precoding and for the RBD precoding. From the above analysis, most of the computational complexity of the BD-type precoding algorithms comes from the two SVD operations which make the computational complexity of the BD-type precoding algorithms increase with the number of users and the system dimensions.

IV Proposed S-GMI Based Precoding Algorithms

In this section, we describe the proposed LR-S-GMI-type precoding algorithms based on a strategy that employs a channel inversion method, QR decompositions, and lattice reductions. Similar to the BD-type precoding algorithms, the design of the proposed LR-S-GMI-type precoding algorithms is computed in two steps.

First, we obtain the first precoding filter for the LR-S-GMI-type precoding algorithms by using one channel inversion and QR decompositions each implemented on individual users with matrix dimension . By applying the MMSE inversion to the combined channel matrix, we have

(20)

where is the sub-matrix of . Considering a high SNR case, it can be shown that the regularization factor approaches zero and thus we have [8]. This means the off-diagonal block matrices of converge to zeros with the increase of SNR. Hence, the matrix is approximately in the null space of defined in (8), that is,

(21)

Considering the QR decomposition of , we have

(22)

where is an orthogonal matrix and is an upper triangular matrix. Since is invertible, we have

(23)

Thus, satisfies the RBD constraint (10) to balance the MUI and the noise term.

We have simplified the design of the first precoding filter here as compared to [15] where a residual interference suppression filter is applied after the first precoding process . The filter increases the complexity and cannot completely cancel the MUI. Therefore, we omit the residual interference suppression part since it is not necessary for the RBD constraint based precoding. We term the simplified GMI as S-GMI in this work. Then, the first precoding filter for S-GMI can be obtained as

(24)

where . By implementing the QR decomposition in (22) times on with dimension , the first combined precoding matrix for S-GMI is

(25)

Note the QR decompositions of the LC-RBD-LR-type precoding in [19, 20] are implemented on with dimension . The S-GMI algorithm can be completed by applying the SVD operation to the effective channel matrix . Then, the second precoding filter of S-GMI is obtained as . The S-GMI algorithm is summarized in Table I.

Steps Operations
Applying the MMSE Channel Inversion
(1)
(2) for i = 1 : 
(3)      
(4)      
(5)      
(6)      
(7)      
(8) end
Compute the overall precoding and decoding matrix
(9)
(10)
(11)    
Calculate the scaling factor
(12)
Get the received signal
(13)
TABLE I: The S-GMI Precoding Algorithm

Similarly, the extension of the channel inversion method from the RBD constraint based precoding to the BD constraint based precoding is straightforward on

(26)

Moreover, the obtained MUI is strictly zero as . Assuming the QR decomposition of is , then, we have

(27)

Thus, satisfies the BD constraint (9). The first precoding matrix for the BD constraint based precoding can be equivalently obtained as

(28)

This equivalent method is termed as GZI in [15].
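
The channel-inversion route to the first precoding filter can be sketched as follows. The sketch assumes that the regularized inverse has the usual form $\mathbf{H}^H(\mathbf{H}\mathbf{H}^H + \alpha\mathbf{I})^{-1}$ and that each user's filter is the orthonormal factor of a reduced QR decomposition of the corresponding column block; setting `alpha = 0` gives the zero-forcing (GZI-like) variant. The function name `sgmi_first_filters` and the dimensions are illustrative assumptions.

```python
import numpy as np

def sgmi_first_filters(H_list, alpha):
    """One regularized inversion for all users, then one QR per user (illustrative)."""
    H = np.vstack(H_list)
    n_r = H.shape[0]
    # Single channel inversion shared by all users.
    H_inv = H.conj().T @ np.linalg.inv(H @ H.conj().T + alpha * np.eye(n_r))
    filters, col = [], 0
    for H_i in H_list:
        n_i = H_i.shape[0]
        # Reduced QR of the N_t x N_i column block that belongs to this user.
        Q_i, _ = np.linalg.qr(H_inv[:, col:col + n_i])
        filters.append(Q_i)
        col += n_i
    return filters

def leakage(H_list, filters):
    """Largest Frobenius norm of H_j Q_i over all pairs with j != i."""
    return max(np.linalg.norm(H_j @ Q_i)
               for i, Q_i in enumerate(filters)
               for j, H_j in enumerate(H_list) if j != i)

rng = np.random.default_rng(2)
K, N_i, N_t = 3, 2, 8          # assumed sizes, not taken from the paper
H_list = [(rng.standard_normal((N_i, N_t)) + 1j * rng.standard_normal((N_i, N_t))) / np.sqrt(2)
          for _ in range(K)]

for alpha in (0.0, 1e-6, 1e-1):
    print(alpha, leakage(H_list, sgmi_first_filters(H_list, alpha)))
```

With `alpha = 0` the leakage vanishes up to rounding, and for small positive `alpha` it shrinks with the regularization factor, which matches the high-SNR argument given above.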

For the proposed LR-S-GMI-type precoding algorithms, we obtain the first precoding filter as in S-GMI in (24), while we employ the LR-aided linear precoding technique instead of the SVD operation in S-GMI to obtain the second precoding filter. The aim of the LR transformation is to find, for a given lattice, a new basis which is closer to orthogonal than the original matrix. The most commonly used LR algorithm was first proposed by Lenstra, Lenstra, and Lovász (LLL) in [23] and has polynomial time complexity. In order to reduce the computational complexity, a complex LLL (CLLL) algorithm was proposed in [27], which reduces the overall complexity of the LLL algorithm by nearly half without sacrificing any performance. We employ the CLLL algorithm to implement the LR transformation in this work.

After the first precoding, we transform the MU-MIMO channel into parallel or approximately parallel SU-MIMO channels and the effective channel matrix for the th user is

(29)

We perform the LR transformation on in the precoding scenario [28], that is

(30)

where is a unimodular matrix with and all elements of are complex integers, i.e. . The physical meaning of the constraint is that the channel energy is unchanged after the LR transformation.
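
To make the LR transformation concrete, the following is a minimal sketch of a complex LLL reduction in the spirit of [27]. It is not the implementation used in this work: the QR factor is recomputed in every iteration for clarity rather than speed, the parameter `delta = 0.75` is an assumed common choice, and the function name `clll` is illustrative. The input is assumed to have full column rank.

```python
import numpy as np

def clll(H, delta=0.75):
    """Return (B, T) with B = H @ T, T unimodular with Gaussian-integer entries."""
    B = np.array(H, dtype=complex)
    n = B.shape[1]
    T = np.eye(n, dtype=complex)
    k = 1
    while k < n:
        R = np.linalg.qr(B, mode="r")
        # Size reduction of column k against columns k-1, ..., 0.
        for j in range(k - 1, -1, -1):
            mu = np.round(R[j, k] / R[j, j])   # rounds real and imaginary parts
            if mu != 0:
                B[:, k] -= mu * B[:, j]
                T[:, k] -= mu * T[:, j]
                R[:, k] -= mu * R[:, j]
        # Lovasz condition between columns k-1 and k.
        if delta * abs(R[k - 1, k - 1]) ** 2 > abs(R[k, k]) ** 2 + abs(R[k - 1, k]) ** 2:
            B[:, [k - 1, k]] = B[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            k = max(k - 1, 1)
        else:
            k += 1
    return B, T

rng = np.random.default_rng(1)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
B, T = clll(H)
print(np.round(T))
print(abs(np.linalg.det(T)))
```

The returned `T` has Gaussian-integer entries and a determinant of unit magnitude, so `B` and `H` generate the same lattice.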

Following the LR transformation, we employ the linear precoding constraint to get the second precoding filter to parallelize each user’s streams. The ZF precoding constraint is implemented for user as

(31)

It is well known that MMSE precoding generally outperforms ZF precoding, since it takes the noise term into account. We can therefore get the second precoding filter by employing an MMSE precoding constraint. The MMSE precoding is actually equivalent to the ZF precoding with respect to an extended system model [29, 32]. The extended channel matrix for the MMSE precoding scheme is defined as

(32)

By introducing the regularization factor , a trade-off between the level of MUI and the noise is introduced [8]. Then, the MMSE precoding filter is obtained as

(33)

where , and the multiplication by will not result in transmit power amplification since . From the mathematical expression in (33), the rows of determine the effective transmit power amplification of the MMSE precoding. Correspondingly, the LR transformation for the MMSE precoding should be applied to the transpose of the extended channel matrix , and the LR transformed channel matrix is obtained as

(34)

where is the unimodular matrix for . Then, the LR-aided MMSE precoding filter is given by

(35)

where the matrix . Finally, the combined second precoding matrix for all users is

(36)

The overall precoding matrix is . Since the lattice reduced precoding matrix has near orthogonal columns, the required transmit power will be reduced compared to the BD-type precoding algorithms. Thus, a better BER performance than that of the BD-type precoding algorithms can be achieved by the proposed LR-S-GMI-type precoding algorithms.

The received signal is finally obtained as

(37)

where . The main processing work left for the receiver is to quantize the received signal to the nearest data vector and the decoding matrix described in the BD-type [11, 13], QR/SVD-type [16], and GMI-type [15] precoding algorithms is not needed anymore. The receiver structure is thus simplified, and a significant amount of transmit power can be saved which is very important considering the mobility of the distributed users.

The proposed precoding algorithms are called LR-S-GMI-ZF and LR-S-GMI-MMSE depending on the choice of the second precoding filter as given in (31) and (35), respectively. We will focus on the LR-S-GMI-MMSE precoding since a better performance is achieved. The implementing steps of the LR-S-GMI-MMSE precoding algorithm are summarized in Table II. By replacing the steps (8) and (9) in Table II with the formulation in (31), the LR-S-GMI-ZF precoding algorithm can be obtained. Similarly, the first precoding matrix can also be computed according to the GZI method in (28), and combined with (31) or (35) to get the second precoding matrix. Then, the LR-GZI-ZF or LR-GZI-MMSE precoding algorithms can be obtained, respectively.

Steps Operations
Applying the MMSE Channel Inversion
(1)
(2) for i = 1 : 
(3)      
(4)      
(5)      
(6)      
(7)      
(8)      
(9)      
(10) end
Compute the overall precoding matrix
(11)
(12)
(13)
Calculate the scaling factor
(14)
Get the received signal
(15)
Transform back from lattice space
(16)
TABLE II: The LR-S-GMI-MMSE Precoding Algorithm

V Performance Analysis

In this section, we carry out an analysis of the performance of the proposed LR-S-GMI-type precoding algorithms. We consider a performance analysis in terms of BER, sum-rate and computational complexity. In the BER analysis part, we show that the residual interference matrix of the RBD precoding actually converges to an identity matrix, which is a new result in the literature so far. We also mathematically demonstrate that the residual interference of the proposed LR-S-GMI-type precoding algorithms converges to a zero matrix. Finally, we illustrate the quality of the effective channel matrices of the proposed and existing precoding algorithms by measuring their condition numbers. The maximum achievable sum-rate of the proposed LR-S-GMI-type precoding algorithms is given in the sum-rate analysis part. The computational complexity of the proposed and existing precoding algorithms is summarized in tables in the complexity analysis part. The trend of the computational complexity with the increase of the dimensions is also given and an analysis is developed.

V-A BER Performance Analysis

For the BD precoding, the effective SU-MIMO channels are strictly parallel between each other after the first precoding filtering. For the RBD precoding, however, the residual interference is not zero between the users. We use to denote for convenience. From (13), the following formula is obtained

(38)

Mathematically, the quantity can be expressed as . Substituting this into (38), the formula can be rewritten as

(39)

With the increase of the SNR, approaches zero and then we have

(40)

By further manipulating the expression in (40), we obtain

(41)

that is, the residual interference matrix of the RBD precoding converges to an identity matrix at high SNRs. For the S-GMI precoding algorithm developed in Section IV, in contrast, as the SNR increases we have

(42)

By comparing (41) and (42), we can see that the impact of the residual interference of S-GMI precoding would be smaller than that of the conventional RBD precoding algorithm. Thus, we expect that a better BER performance is achieved by the S-GMI precoding algorithm over the conventional RBD precoding algorithm.

As pointed out in [21], the BER performance for a MIMO precoding system is actually determined by the energy of the transmitted signal . In order to reduce and improve the BER performance further, we transform the effective channel into the lattice space. By doing this, an improved basis is computed. Actually, the LR transformed channel matrix is quasi-orthogonal rather than strictly orthogonal. We can employ the condition number which is defined as [17]

(43)

to measure the orthogonality of the channel matrix. From the above definition of the condition number in (43), we get that with equality for an orthogonal basis while matrices which are nearly singular have large condition numbers. In Fig. 2, the probability density functions (PDFs) of the condition numbers for the effective channel matrices are illustrated. For the effective channel matrix of the proposed LR-S-GMI-MMSE precoding algorithm, not only the spread in the condition numbers but also their average value is much smaller compared to the effective channel matrices achieved by the existing precoding algorithms. Therefore, a significant reduction in the required transmit power is achieved and a better BER performance can be obtained by the proposed LR-S-GMI-MMSE precoding algorithm. Note for the special case of each user with a single receive antenna, the proposed LR-S-GMI-type precoding will not converge to GMI or S-GMI because the second precoding filter is designed in the lattice space.
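
The condition number used as an orthogonality measure can be computed as in the sketch below. It assumes that (43) is the 2-norm condition number, that is, the ratio of the largest to the smallest singular value, which is consistent with the properties stated above. The helper name `condition_number` and the test matrices are illustrative.

```python
import numpy as np

def condition_number(A):
    """Ratio of the largest to the smallest singular value of A."""
    s = np.linalg.svd(A, compute_uv=False)   # sorted in descending order
    return s[0] / s[-1]

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(A)                       # matrix with orthonormal columns

print(condition_number(Q))                   # orthogonal basis: equality case
print(np.log(condition_number(A)))           # natural logarithm, the quantity plotted in Fig. 2
```

Collecting `np.log(condition_number(...))` over many channel realizations and forming a histogram would give an empirical counterpart of the PDFs in Fig. 2.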

Fig. 2: PDFs of the natural logarithm of for matrices

V-B Achievable Sum-Rate Analysis

Recall that at high SNRs, the MU-MIMO channel is approximately decoupled into equivalent SU-MIMO channels by applying the first precoding filtering in (23). Then, we can transform the MU-MIMO sum-rate analysis [34] to a set of SU-MIMO sum-rate analysis tasks. For the second precoding filter, the LR-aided MMSE precoding is actually equal to the LR-aided ZF precoding under the high SNR scenario. Therefore, the th user’s received signal is

(44)

where . By assuming that the average transmit power is , and because of the fact that , we get the normalization factor as

(45)

where the quantity is the th singular value of , and denotes the energy of the th stream of .

From (45), the received SNR for the th stream of user is obtained as

(46)

Then, the achievable sum-rate for user is given by

(47)

Note that the achievable sum-rate is degraded by the normalization factor . The value of approaches its maximum when , thus we have

(48)

Finally, the maximum achievable sum-rate of the proposed LR-S-GMI-type precoding algorithms at high SNRs can be expressed as

(49)

For the BD precoding, we multiply the decoding matrix at the th user’s receiver and the received signal is given by

(50)

Due to the fact that the two precoding matrices and are semi-unitary matrices, we get and . Then, by applying the equivalence , the normalization factor for BD can be expressed as

(51)

Since the statistical property of is not changed by the multiplication with the unitary matrix , we get the th received SNR as

(52)

For simplicity, we do not consider the power loading between users and streams in the following derivation and term this strategy as no power loading (NPL). Then, the achievable sum-rate for the BD precoding algorithm is given by

(53)

Comparing (53) with the maximum achievable sum-rate of the proposed LR-S-GMI-type precoding algorithms in (49), we conclude that the sum-rate of the proposed LR-S-GMI-type precoding algorithms will be slightly inferior to that of the BD precoding algorithm at high SNRs. At low SNRs, however, we expect that the achieved sum-rate of the proposed LR-S-GMI-type precoding algorithms will be better than that of the BD precoding since a regularization factor is employed to mitigate the degradation by the noise term.

The sum-rate performance of the BD precoding is actually dependent on the power loading scheme being used. Hence, the BD precoding algorithm can achieve its maximum sum-rate performance by allocating the power between streams according to a WF power loading scheme. As pointed out in [15], we do not consider the power loading strategy for the RBD or the proposed LR-S-GMI-type precoding algorithms for two reasons. One is that it is not easy to identify the optimal power allocation coefficients because of the existence of residual interference. The second reason is that the MMSE condition (10) is already satisfied. Therefore, an allocation of different powers between streams is not needed.
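
For reference, the WF power loading mentioned above can be sketched for a set of parallel streams. The sketch uses the textbook water-filling solution for maximizing $\sum_k \log_2(1 + p_k g_k)$ under a total power constraint, where the stream gains $g_k$ (squared singular value divided by noise variance) are assumed inputs. The function name `water_filling` and the numerical values are illustrative and do not come from the paper.

```python
import numpy as np

def water_filling(gains, total_power):
    """Textbook water-filling over parallel streams with gains g_k > 0."""
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]           # strongest stream first
    g = gains[order]
    # Drop the weakest streams until all remaining powers are positive.
    for m in range(g.size, 0, -1):
        level = (total_power + np.sum(1.0 / g[:m])) / m
        if level - 1.0 / g[m - 1] > 0:
            break
    p_sorted = np.zeros_like(g)
    p_sorted[:m] = level - 1.0 / g[:m]
    p = np.zeros_like(g)
    p[order] = p_sorted                        # back to the original stream order
    return p

def sum_rate(gains, powers):
    return float(np.sum(np.log2(1.0 + np.asarray(powers) * np.asarray(gains))))

gains = [4.0, 1.0, 0.1]
p_wf = water_filling(gains, total_power=1.0)
p_eq = np.full(3, 1.0 / 3.0)                   # no power loading (NPL)
print(p_wf, sum_rate(gains, p_wf), sum_rate(gains, p_eq))
```

In this example the weakest stream receives no power, and the WF sum-rate is higher than that of the equal allocation used in the NPL strategy.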

V-C Computational Complexity Analysis

In this section, we use the total number of floating point operations (FLOPs) to measure the computational complexity of the precoding algorithms discussed above. It is worth noting that the lattice reduction algorithm has variable complexity; the average complexity of the CLLL algorithm in FLOPs is given in [27]. A reduced and fixed complexity lattice reduction structure is proposed in [35]; however, we employ the conventional CLLL algorithm because the lattice reduction algorithm is not the main focus of this work. The numbers of FLOPs for the complex QR decomposition and the real SVD operation are given in [17]. As shown in [19], the number of FLOPs required by a complex SVD operation is equivalent to that required by the SVD of its extended real matrix. The total number of FLOPs needed by the matrix operations is summarized below:

  • Multiplication of and complex matrices: ;

  • QR decomposition of an complex matrix: ;

  • SVD of an complex matrix where only and are obtained: ;

  • SVD of an complex matrix where , and are obtained: ;

  • Inversion of an real matrix using Gauss-Jordan elimination: .

We illustrate the required FLOPs for the conventional RBD, S-GMI and LR-S-GMI-MMSE precoding algorithms in Table III, Table IV and Table V, respectively. The complexity of the QR/SVD RBD [16] and LC-RBD-LR-MMSE precoding algorithms is already given in [19]. A system with transmit antennas and users each equipped with receive antennas is considered; this scenario is denoted as the case. For the case, the reduction in the number of FLOPs obtained by the proposed LR-S-GMI-MMSE precoding is , , and as compared to the RBD, BD, QR/SVD RBD and LC-RBD-LR-MMSE precoding algorithms, respectively. Clearly, the proposed LR-S-GMI-MMSE precoding algorithm requires the lowest complexity.

Steps Operations Flops Case
1 21504
2 336
3 5184
4 552
5 13248
Total 40824
TABLE III: Computational complexity of conventional RBD
Steps Operations Flops Case
1
2736
2 2432
3 552
4 13248