Low-Complexity Design of Generalized Block Diagonalization Precoding Algorithms for Multiuser MIMO Systems
Abstract
Block diagonalization (BD) based precoding techniques are well-known linear transmit strategies for multiuser MIMO (MU-MIMO) systems. By employing BD-type precoding algorithms at the transmit side, the MU-MIMO broadcast channel is decomposed into multiple independent parallel single user MIMO (SU-MIMO) channels and achieves the maximum diversity order at high data rates. The main computational complexity of BD-type precoding algorithms comes from two singular value decomposition (SVD) operations, which depend on the number of users and the dimensions of each user's channel matrix. In this work, low-complexity precoding algorithms are proposed to reduce the computational complexity and improve the performance of BD-type precoding algorithms. We devise a strategy based on a common channel inversion technique, QR decompositions, and lattice reductions to decouple the MU-MIMO channel into equivalent SU-MIMO channels. Analytical and simulation results show that the proposed precoding algorithms can achieve a sum-rate performance comparable to that of BD-type precoding algorithms, substantial bit error rate (BER) performance gains, and a simplified receiver structure, while requiring a much lower complexity.
Keke Zu, Rodrigo C. de Lamare, Senior Member, IEEE and Martin Haardt, Senior Member, IEEE
Part of this work has been submitted to the IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, USA, Oct. 2012 [1].
Index Terms
Multiuser MIMO (MU-MIMO), block diagonalization (BD), regularized block diagonalization (RBD), low-complexity, lattice reduction (LR).
I Introduction
Multiple-input multiple-output (MIMO) systems have drawn considerable research effort in the past years because they can greatly increase the spectrum efficiency of wireless communications [2, 3]. In order to meet the continuously growing data traffic, a downlink peak spectrum efficiency of 30 bps/Hz and an uplink peak spectrum efficiency of 15 bps/Hz are proposed in LTE-Advanced [4], and a configuration of up to 8 transmit antennas for the downlink is suggested. A new amendment for the WLAN standard IEEE 802.11ac [5] also recommends up to 8 MIMO spatial streams. Configurations with dozens of antennas are now being considered [6]. High-dimensional MIMO systems or large MIMO systems are very promising for the next generation of wireless communication systems due to their potential to improve rate and reliability dramatically [7]. However, it is a challenge to design a suitable precoding algorithm with good overall performance and low computational complexity at the same time for high-dimensional MIMO systems.
Unlike the received signal in single user MIMO (SU-MIMO) systems, the received signals of different users in multiuser MIMO (MU-MIMO) systems not only suffer from the noise and the inter-antenna interference but are also affected by the multiuser interference (MUI). Channel inversion based precoding or linear precoding algorithms such as zero forcing (ZF) and minimum mean squared error (MMSE) precoding [8] can still be used to cancel the MUI, but they result in a reduced throughput or require a higher power at the transmitter in the MU-MIMO scenarios. As a generalization of the ZF precoding algorithm, block diagonalization (BD) based precoding algorithms have been proposed in [11, 12] for MU-MIMO systems. However, BD based precoding algorithms only take the MUI into account and thus suffer a performance loss at low signal to noise ratios (SNRs) when the noise is the dominant factor. Therefore, a regularized block diagonalization (RBD) precoding algorithm which introduces a regularization factor to take the noise term into account has been proposed in [13]. We term the BD and RBD based precoding schemes as BD-type precoding algorithms in this work for convenience.
The main steps of the BD-type precoding algorithms are two SVD operations, which need to be implemented for each user. Therefore, the computational complexity of the BD-type precoding algorithms depends on the number of users and the dimensions of each user's channel matrix. For MU-MIMO systems with a large number of users and multiple receive antennas, this could result in a considerable computational cost. Another distinctive aspect of the BD-type precoding algorithms is that they need a decoding matrix obtained from the second SVD operation to orthogonalize each user's streams. The requirement of this decoding matrix brings extra control overhead or computational complexity [14].
Recent work on BD-type precoding algorithms has focused on how to equivalently implement the BD-type precoding algorithms with less computational complexity. A low-complexity generalized ZF channel inversion (GZI) method has been proposed in [15] to equivalently implement the first SVD operation of the original BD precoding, and a generalized MMSE channel inversion (GMI) method is also developed in [15] for the original RBD precoding. In [16] the first SVD operation of the RBD precoding is replaced with a less complex QR decomposition [17]. We term the work in [15] as GMI-type precoding and the work in [16] as QR/SVD-type precoding. For the second SVD operation, however, both the GMI-type and QR/SVD-type precoding schemes employ it in a similar way as the conventional BD-type precoding algorithms to parallelize each user's streams. Therefore, the second SVD operation needs to be implemented multiple times and the decoding matrix for the effective channel still needs to be known or estimated at the receiver of each user for the GMI-type or QR/SVD-type precoding algorithms.
The GMI-type and QR/SVD-type techniques are solely low-complexity equivalent implementations of the BD-type precoding algorithms. As an improvement of the BD-type precoding algorithms, low-complexity lattice reduction-aided RBD (LC-RBD-LR) type precoding algorithms have been proposed in [19, 20] based on the QR decomposition scheme. The LC-RBD-LR-type precoding algorithms achieve not only much lower complexity but also considerable BER gains. However, the QR decomposition in LC-RBD-LR-type precoding algorithms still needs to be implemented for each user, which could result in a high complexity for large MIMO systems.
A new category of low-complexity high-performance precoding algorithms based on the channel inversion scheme is proposed in this work. A simplified GMI (S-GMI) precoding scheme which employs a common channel inversion for all users is developed first. Equivalent parallel SU-MIMO channels are obtained from the S-GMI precoding process. Then, these effective channels are transformed into the lattice space by utilizing the lattice reduction (LR) technique [18], whose complexity is dictated by a QR decomposition. Linear precoding strategies are applied in the lattice space to parallelize each user's streams. Finally, the proposed lattice reduction-aided simplified GMI (LR-S-GMI) precoding algorithms are obtained. According to the specific linear precoding constraint used, the proposed LR-S-GMI-type precoding algorithms are categorized as LR-S-GMI-ZF and LR-S-GMI-MMSE, respectively.
The algorithm structure of the proposed LR-S-GMI-type precoding is different from the LC-RBD-LR-type precoding since the channel inversion is only implemented once for all users, while the QR decomposition needs to be implemented multiple times in LC-RBD-LR-type precoding. Therefore, the computational complexity can be reduced considerably by the proposed LR-S-GMI-type precoding. A comprehensive mathematical analysis is developed to analyze and predict the performance of the proposed LR-S-GMI-type precoding algorithms. The simulation results verify that the proposed LR-S-GMI-type precoding algorithms have the lowest computational complexity compared to BD-type [11, 13], GMI-type [15], QR/SVD-type [16] and LC-RBD-LR-type [19] precoding algorithms, a sum-rate performance comparable to that of BD-type precoding algorithms, and substantial BER performance gains over prior art.
The main contributions of the work are summarized below:

A simplified GMI (S-GMI) precoding is developed in this work as an improvement of the original RBD in [13]. A mathematical analysis is given to show that S-GMI has a better BER performance and much lower complexity than RBD, in contrast to the GMI in [15], which only provides an equivalent implementation of RBD.

A new category of low-complexity high-performance LR-S-GMI-type precoding algorithms is proposed for MU-MIMO systems based on a channel inversion technique, QR decompositions, and lattice reductions.

The BD-type precoding algorithms are systematically analyzed and summarized. We show that the computational complexity of the BD-type precoding depends on the number of users and the system dimensions.

A comprehensive performance analysis is carried out in terms of BER performance, achievable sum-rate, and computational complexity.

A simulation study of the proposed algorithms under imperfect channel situations is also conducted.
The proposed and existing precoding techniques are all performed with the help of downlink channel state information (CSI). The assumption that full CSI is available at the transmit side is valid in time-division duplex (TDD) systems because the uplink and downlink share the same frequency band. For frequency-division duplex (FDD) systems, however, the CSI needs to be estimated at the receiver and fed back to the transmitter.
This paper is organized as follows. The system model is given in Section II. A brief review of the BD-type precoding algorithms is presented in Section III. The proposed LR-S-GMI-type precoding algorithms are described in detail in Section IV and the performance analysis is developed in Section V. Simulation results and conclusions are presented in Section VI and Section VII, respectively.
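Notation: Matrices and vectors are denoted by upper and lowercase boldface letters, and the transpose, Hermitian transpose, inverse, and pseudoinverse of a matrix $\mathbf{B}$ are described by $\mathbf{B}^T$, $\mathbf{B}^H$, $\mathbf{B}^{-1}$, and $\mathbf{B}^{\dagger}$, respectively. The trace, determinant, Frobenius norm, and round function are denoted as $\mathrm{Tr}(\cdot)$, $\det(\cdot)$, $\|\cdot\|_F$, and $\lceil\cdot\rfloor$. The operator $\mathrm{diag}\{\mathbf{B}_1, \ldots, \mathbf{B}_K\}$ creates a block diagonal matrix with the matrices $\mathbf{B}_1, \ldots, \mathbf{B}_K$ on the main diagonal.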
II System Model
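We consider an uncoded MU-MIMO downlink channel, with $N_T$ transmit antennas at the base station (BS) and $N_i$ receive antennas at the $i$th user equipment (UE). With $K$ users in the system, the total number of receive antennas is $N_R = \sum_{i=1}^{K} N_i$. A block diagram of such a system is illustrated in Fig. 1.
From the system model, the combined channel matrix $\mathbf{H}$ and the joint precoding matrix $\mathbf{P}$ are given by
$$\mathbf{H} = [\mathbf{H}_1^T \; \mathbf{H}_2^T \; \ldots \; \mathbf{H}_K^T]^T \in \mathbb{C}^{N_R \times N_T},$$ (1)
$$\mathbf{P} = [\mathbf{P}_1 \; \mathbf{P}_2 \; \ldots \; \mathbf{P}_K] \in \mathbb{C}^{N_T \times N_R},$$ (2)
where $\mathbf{H}_i \in \mathbb{C}^{N_i \times N_T}$ is the $i$th user's channel matrix. The quantity $\mathbf{P}_i \in \mathbb{C}^{N_T \times N_i}$ is the $i$th user's precoding matrix. We assume a flat fading MIMO channel and the received signal $\mathbf{y}_i \in \mathbb{C}^{N_i}$ at the $i$th user is given by
$$\mathbf{y}_i = \mathbf{H}_i \mathbf{x}_i + \mathbf{H}_i \sum_{j=1, j \neq i}^{K} \mathbf{x}_j + \mathbf{n}_i,$$ (3)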
where the quantity $\mathbf{x}_i \in \mathbb{C}^{N_T}$ is the $i$th user's transmitted signal, and $\mathbf{n}_i \in \mathbb{C}^{N_i}$ is the $i$th user's Gaussian noise with independent and identically distributed (i.i.d.) entries of zero mean and variance $\sigma_n^2$. Assuming that the average transmit power for each user is $\xi_i$, the power constraint $E\{\|\mathbf{x}_i\|^2\} = \xi_i$ is imposed. We construct an unnormalized signal $\mathbf{s}_i$ such that
(4) 
where with being the transmit data vector and is the average energy of with . The physical meaning of dividing by the scalar is to make sure the average transmit power is still the same after the precoding process. With this normalization, obeys .
The received signal is weighted by the scalar to form the estimate
(5) 
Note that it is necessary to cancel out at the receiver to get the correct amplitude of the desired signal part. The average energy is independent from the channel and the data, which means the receivers do not need to know the instantaneous CSI for the precoding techniques to work [21]. As analyzed and illustrated in [22], however, the performance difference between the average and the instantaneous is very small. Therefore, we follow the strategy developed in [21] and [22] to assume the receivers need to know only but use instead of for simulation convenience as is simpler to compute. The simulation results represent the performance of either normalization method. In this case, we can replace (4) and (5) with the instantaneous in the simulations and employ
(6) 
III Review of BD-type Precoding Algorithms
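The design of BD-type precoding algorithms is performed in two steps [11, 13]. The first precoding filter is used to completely eliminate (by BD) or balance the MUI with noise (by RBD), then exact (by BD) or approximate (by RBD) parallel SU-MIMO channels are obtained. The second precoding filter is implemented to parallelize each user's streams. Correspondingly, the precoding matrix for the $i$th user can be rewritten in two parts as
$$\mathbf{P}_i = \mathbf{P}_i^a \mathbf{P}_i^b,$$ (7)
where $\mathbf{P}_i^a \in \mathbb{C}^{N_T \times L_i}$ and $\mathbf{P}_i^b \in \mathbb{C}^{L_i \times N_i}$. The parameter $L_i$ is dependent on the specific choice of the precoding algorithm. We exclude the $i$th user's channel matrix and define $\overline{\mathbf{H}}_i$ as
$$\overline{\mathbf{H}}_i = [\mathbf{H}_1^T \; \ldots \; \mathbf{H}_{i-1}^T \; \mathbf{H}_{i+1}^T \; \ldots \; \mathbf{H}_K^T]^T,$$ (8)
where $\overline{\mathbf{H}}_i \in \mathbb{C}^{\overline{N}_i \times N_T}$ with $\overline{N}_i = N_R - N_i$. Then, the interference generated to the other users is determined by $\overline{\mathbf{H}}_i \mathbf{P}_i^a$.
In order to eliminate all the MUI, we impose the constraint that
$$\overline{\mathbf{H}}_i \mathbf{P}_i^a = \mathbf{0}, \quad \forall i \in \{1, \ldots, K\}.$$ (9)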
We term (9) as the BD constraint. Note that the BD constraint is actually an extension of the ZF constraint in [8] for MU-MIMO with multiple receive antennas. In order to take the noise term into account as well, an RBD constraint is developed in [13] and given by
(10) 
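Assuming that the rank of $\overline{\mathbf{H}}_i$ is $\overline{L}_i$, define the SVD of $\overline{\mathbf{H}}_i$
$$\overline{\mathbf{H}}_i = \overline{\mathbf{U}}_i \overline{\boldsymbol{\Sigma}}_i \overline{\mathbf{V}}_i^H = \overline{\mathbf{U}}_i \overline{\boldsymbol{\Sigma}}_i [\overline{\mathbf{V}}_i^{(1)} \; \overline{\mathbf{V}}_i^{(0)}]^H,$$ (11)
where $\overline{\mathbf{U}}_i \in \mathbb{C}^{\overline{N}_i \times \overline{N}_i}$ and $\overline{\mathbf{V}}_i \in \mathbb{C}^{N_T \times N_T}$ are unitary matrices. The diagonal matrix $\overline{\boldsymbol{\Sigma}}_i \in \mathbb{R}^{\overline{N}_i \times N_T}$ contains the singular values of the matrix $\overline{\mathbf{H}}_i$. Factorizing $\overline{\mathbf{V}}_i$ into two parts, $\overline{\mathbf{V}}_i^{(1)} \in \mathbb{C}^{N_T \times \overline{L}_i}$ consists of the first $\overline{L}_i$ nonzero singular vectors and $\overline{\mathbf{V}}_i^{(0)} \in \mathbb{C}^{N_T \times (N_T - \overline{L}_i)}$ holds the last $N_T - \overline{L}_i$ zero singular vectors. Thus, $\overline{\mathbf{V}}_i^{(0)}$ forms an orthogonal basis for the null space of $\overline{\mathbf{H}}_i$. The solution for the BD constraint (9) is given by
$$\mathbf{P}_i^{a(\mathrm{BD})} = \overline{\mathbf{V}}_i^{(0)}.$$ (12)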
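As an illustration of the null-space solution in (12), the following NumPy sketch builds the first BD filter for each user from the SVD of the other users' stacked channel and checks that the multiuser interference vanishes. The antenna configuration, the random Gaussian channels, and the helper name `bd_first_filter` are assumptions made for this example only.

```python
import numpy as np

def bd_first_filter(H_list, i, tol=1e-10):
    """Orthonormal basis of the null space of the other users' stacked channel."""
    # Stack every user's channel except user i (the matrix defined in (8)).
    H_bar = np.vstack([H for k, H in enumerate(H_list) if k != i])
    # Full SVD: right singular vectors beyond the rank span the null space.
    _, s, Vh = np.linalg.svd(H_bar, full_matrices=True)
    rank = int(np.sum(s > tol * s[0]))
    return Vh.conj().T[:, rank:]

rng = np.random.default_rng(0)
n_t, n_i, n_users = 6, 2, 3  # hypothetical: 6 transmit antennas, 3 users with 2 antennas each
H_list = [
    (rng.standard_normal((n_i, n_t)) + 1j * rng.standard_normal((n_i, n_t))) / np.sqrt(2)
    for _ in range(n_users)
]
P_a = [bd_first_filter(H_list, i) for i in range(n_users)]

# Largest interference that any user's filter causes at any other user.
leakage = max(
    np.linalg.norm(H_list[j] @ P_a[i])
    for i in range(n_users)
    for j in range(n_users)
    if j != i
)
print("filter shape:", P_a[0].shape, "max leakage:", leakage)
```

With these sizes each null space has dimension $N_T - \overline{L}_i = 2$, so every user keeps two interference-free streams.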
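As shown in [13], the solution for the RBD constraint can be obtained as
$$\mathbf{P}_i^{a(\mathrm{RBD})} = \overline{\mathbf{V}}_i \left( \overline{\boldsymbol{\Sigma}}_i^T \overline{\boldsymbol{\Sigma}}_i + \alpha \mathbf{I}_{N_T} \right)^{-1/2},$$ (13)
where $\alpha = N_R \sigma_n^2 / \xi$ is the regularization factor and $\xi$ is the whole average transmit power.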
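A short numerical sketch can make the role of the regularization factor concrete. It assumes the RBD solution has the form $\overline{\mathbf{V}}_i (\overline{\boldsymbol{\Sigma}}_i^T \overline{\boldsymbol{\Sigma}}_i + \alpha \mathbf{I})^{-1/2}$ given in [13], and it measures the leakage $\|\overline{\mathbf{H}}_i \mathbf{P}_i^a\|_F$ relative to the filter norm $\|\mathbf{P}_i^a\|_F$ for decreasing values of $\alpha$ (increasing SNR). The matrix sizes, the values of `alpha`, and the function names are hypothetical.

```python
import numpy as np

def rbd_first_filter(H_bar, alpha):
    """RBD-style first filter V (Sigma^T Sigma + alpha I)^(-1/2) for the stacked channel H_bar."""
    n_t = H_bar.shape[1]
    _, s, Vh = np.linalg.svd(H_bar, full_matrices=True)
    # Diagonal of Sigma^T Sigma: squared singular values, zero-padded to length n_t.
    s2 = np.zeros(n_t)
    s2[: s.size] = s ** 2
    # Broadcasting scales column k of V by (s2[k] + alpha)^(-1/2).
    return Vh.conj().T * (s2 + alpha) ** -0.5

rng = np.random.default_rng(1)
# Hypothetical stacked channel of the other users: 4 receive antennas, 6 transmit antennas.
H_bar = (rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))) / np.sqrt(2)

def relative_leakage(alpha):
    P_a = rbd_first_filter(H_bar, alpha)
    return np.linalg.norm(H_bar @ P_a) / np.linalg.norm(P_a)

ratios = [relative_leakage(a) for a in (1.0, 1e-2, 1e-4)]
print(ratios)
```

The absolute leakage stays bounded because each term $\sigma^2/(\sigma^2+\alpha)$ is below one, while the null-space directions are weighted by $\alpha^{-1/2}$, so the relative leakage shrinks as $\alpha$ decreases.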
After the first precoding process, the MU-MIMO channel is decoupled into a set of parallel independent SU-MIMO channels by the BD precoding. For the RBD precoding, there are residual interferences between these channels due to the regularization process, but these interferences tend to zero at high SNRs. Therefore, the effective channel matrix for the $i$th user can be expressed as
$$\mathbf{H}_{\mathrm{eff}_i} = \mathbf{H}_i \mathbf{P}_i^a.$$ (14)
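Define $L_{\mathrm{eff}} = \mathrm{rank}(\mathbf{H}_{\mathrm{eff}_i})$ and consider the second SVD operation on the effective channel matrix
$$\mathbf{H}_{\mathrm{eff}_i} = \mathbf{U}_i \boldsymbol{\Sigma}_i \mathbf{V}_i^H = \mathbf{U}_i \boldsymbol{\Sigma}_i [\mathbf{V}_i^{(1)} \; \mathbf{V}_i^{(0)}]^H,$$ (15)
using the unitary matrix $\mathbf{U}_i$, where $\mathbf{V}_i^{(1)}$ contains the first $L_{\mathrm{eff}}$ singular vectors. The second precoding filters for BD and RBD precoding can be respectively obtained as
$$\mathbf{P}_i^{b(\mathrm{BD})} = \mathbf{V}_i^{(1)} \boldsymbol{\Lambda}^{(\mathrm{BD})},$$ (16)
$$\mathbf{P}_i^{b(\mathrm{RBD})} = \mathbf{V}_i \boldsymbol{\Lambda}^{(\mathrm{RBD})},$$ (17)
where $\boldsymbol{\Lambda}$ is the power loading matrix that depends on the optimization criterion. An example power loading is the water filling (WF) [2]. The $i$th user's decoding matrix is obtained as
$$\mathbf{G}_i = \mathbf{U}_i^H,$$ (18)
which needs to be known by each user's receiver.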
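The two-step BD design reviewed above can be sketched end to end for a single user. The example below assumes identity power loading, random Gaussian channels, and a hypothetical antenna configuration; it is meant only to show that the second SVD together with the decoding matrix turns the user's effective channel into parallel streams.

```python
import numpy as np

rng = np.random.default_rng(2)
n_t, n_i, n_users = 6, 2, 3  # hypothetical sizes
H_list = [
    (rng.standard_normal((n_i, n_t)) + 1j * rng.standard_normal((n_i, n_t))) / np.sqrt(2)
    for _ in range(n_users)
]

# First filter for user 0: null space of the other users' stacked channel.
H_bar = np.vstack(H_list[1:])
_, s_bar, Vh_bar = np.linalg.svd(H_bar, full_matrices=True)
P_a = Vh_bar.conj().T[:, s_bar.size:]  # assumes H_bar has full row rank

# Second SVD on the effective channel of user 0.
H_eff = H_list[0] @ P_a
U, s, Vh = np.linalg.svd(H_eff)
P_b = Vh.conj().T  # second filter with identity power loading (assumption)
G = U.conj().T     # decoding matrix that the receiver of user 0 must know

D = G @ H_list[0] @ P_a @ P_b                 # channel seen by user 0 after decoding
leak = np.linalg.norm(H_list[1] @ P_a @ P_b)  # interference caused at user 1
print(np.round(np.abs(D), 6), leak)
```

The matrix `D` is diagonal with the singular values of the effective channel on its diagonal, which is why the receiver needs `G` in the BD-type schemes.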
Note that for the conventional BDtype precoding algorithms, there is a dimensionality constraint below to be satisfied
(19) 
Then, we can get the matrix dimension relationship as . Note that the first SVD operation in (11) needs to be implemented times on with dimension and the second SVD operation in (15) needs to be implemented times on with dimensions for the BD precoding and for the RBD precoding. From the above analysis, most of the computational complexity of the BDtype precoding algorithms comes from the two SVD operations which make the computational complexity of the BDtype precoding algorithms increase with the number of users and the system dimensions.
IV Proposed S-GMI Based Precoding Algorithms
In this section, we describe the proposed LR-S-GMI-type precoding algorithms based on a strategy that employs a channel inversion method, QR decompositions, and lattice reductions. Similar to the BD-type precoding algorithms, the proposed LR-S-GMI-type precoding algorithms are designed in two steps.
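First, we obtain the first precoding filter $\mathbf{P}_i^a$ for the LR-S-GMI-type precoding algorithms by using one channel inversion and $K$ QR decompositions, each implemented on an individual user with matrix dimension $N_T \times N_i$. By applying the MMSE inversion to the combined channel matrix $\mathbf{H}$, we have
$$\mathbf{H}_{\mathrm{mse}}^{\dagger} = (\mathbf{H}^H \mathbf{H} + \alpha \mathbf{I})^{-1} \mathbf{H}^H = [\mathbf{H}_{1,\mathrm{mse}} \; \mathbf{H}_{2,\mathrm{mse}} \; \ldots \; \mathbf{H}_{K,\mathrm{mse}}],$$ (20)
where $\mathbf{H}_{i,\mathrm{mse}} \in \mathbb{C}^{N_T \times N_i}$ is the submatrix of $\mathbf{H}_{\mathrm{mse}}^{\dagger}$. Considering a high SNR case, it can be shown that the regularization factor $\alpha$ approaches zero and thus we have $\mathbf{H} \mathbf{H}_{\mathrm{mse}}^{\dagger} \approx \mathbf{I}_{N_R}$ [8]. This means the off-diagonal block matrices of $\mathbf{H} \mathbf{H}_{\mathrm{mse}}^{\dagger}$ converge to zeros with the increase of SNR. Hence, the matrix $\mathbf{H}_{i,\mathrm{mse}}$ is approximately in the null space of $\overline{\mathbf{H}}_i$ defined in (8), that is,
$$\overline{\mathbf{H}}_i \mathbf{H}_{i,\mathrm{mse}} \approx \mathbf{0}.$$ (21)
Considering the QR decomposition of $\mathbf{H}_{i,\mathrm{mse}}$, we have
$$\mathbf{H}_{i,\mathrm{mse}} = \mathbf{Q}_{i,\mathrm{mse}} \mathbf{R}_{i,\mathrm{mse}},$$ (22)
where $\mathbf{Q}_{i,\mathrm{mse}} \in \mathbb{C}^{N_T \times N_i}$ is an orthogonal matrix and $\mathbf{R}_{i,\mathrm{mse}} \in \mathbb{C}^{N_i \times N_i}$ is an upper triangular matrix. Since $\mathbf{R}_{i,\mathrm{mse}}$ is invertible, we have
$$\overline{\mathbf{H}}_i \mathbf{Q}_{i,\mathrm{mse}} \approx \mathbf{0}.$$ (23)
Thus, $\mathbf{Q}_{i,\mathrm{mse}}$ satisfies the RBD constraint (10) to balance the MUI and the noise term.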
We have simplified the design of the first precoding filter here as compared to [15], where a residual interference suppression filter is applied after the first precoding process. This filter increases the complexity and cannot completely cancel the MUI. Therefore, we omit the residual interference suppression part since it is not necessary for the RBD constraint based precoding. We term the simplified GMI as S-GMI in this work. Then, the first precoding filter for S-GMI can be obtained as
$$\mathbf{P}_i^a = \mathbf{Q}_{i,\mathrm{mse}},$$ (24)
where $\mathbf{Q}_{i,\mathrm{mse}} \in \mathbb{C}^{N_T \times N_i}$ is the orthogonal factor of the QR decomposition in (22). By implementing the QR decomposition in (22) $K$ times on $\mathbf{H}_{i,\mathrm{mse}}$ with dimension $N_T \times N_i$, the first combined precoding matrix for S-GMI is
$$\mathbf{P}^a = [\mathbf{P}_1^a \; \mathbf{P}_2^a \; \ldots \; \mathbf{P}_K^a].$$ (25)
Note the QR decompositions of the LCRBDLRtype precoding in [19, 20] are implemented on with dimension . The SGMI algorithm can be completed by applying the SVD operation to the effective channel matrix . Then, the second precoding filter of SGMI is obtained as . The SGMI algorithm is summarized in Table I.
Steps  Operations 

Applying the MMSE Channel Inversion  
(1)  
(2)  for i = 1 : 
(3)  
(4)  
(5)  
(6)  
(7)  
(8)  end 
Compute the overall precoding and decoding matrix  
(9)  
(10)  
(11)  
Calculate the scaling factor  
(12)  
Get the received signal  
(13) 
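The first-filter stage of the simplified GMI scheme (one MMSE channel inversion shared by all users, followed by one QR decomposition per user) can be sketched as follows. The regularized inverse is assumed to take the form $(\mathbf{H}^H\mathbf{H} + \alpha\mathbf{I})^{-1}\mathbf{H}^H$; the function names, antenna configuration, and the two values of `alpha` are hypothetical choices for illustration.

```python
import numpy as np

def sgmi_first_filters(H_list, alpha):
    """One regularized channel inversion for all users, then one QR per user."""
    H = np.vstack(H_list)
    n_t = H.shape[1]
    # Regularized inverse (H^H H + alpha I)^(-1) H^H, computed once for all users.
    H_mse = np.linalg.solve(H.conj().T @ H + alpha * np.eye(n_t), H.conj().T)
    filters, col = [], 0
    for H_i in H_list:
        n_i = H_i.shape[0]
        Q, _ = np.linalg.qr(H_mse[:, col:col + n_i])  # reduced QR, Q is n_t x n_i
        filters.append(Q)
        col += n_i
    return filters

def max_leakage(H_list, filters):
    """Largest interference norm caused by any user's filter at any other user."""
    n = len(H_list)
    return max(
        np.linalg.norm(H_list[j] @ filters[i])
        for i in range(n) for j in range(n) if j != i
    )

rng = np.random.default_rng(3)
n_t, n_i, n_users = 6, 2, 3  # hypothetical sizes
H_list = [
    (rng.standard_normal((n_i, n_t)) + 1j * rng.standard_normal((n_i, n_t))) / np.sqrt(2)
    for _ in range(n_users)
]
filters_low = sgmi_first_filters(H_list, alpha=1.0)    # low-SNR regime
filters_high = sgmi_first_filters(H_list, alpha=1e-6)  # high-SNR regime
leak_low_snr = max_leakage(H_list, filters_low)
leak_high_snr = max_leakage(H_list, filters_high)
print(leak_low_snr, leak_high_snr)
```

The residual interference is nonzero for finite `alpha` and shrinks as `alpha` approaches zero, which matches the high-SNR argument used to justify the first filter.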
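Similarly, the extension of the channel inversion method from the RBD constraint based precoding to the BD constraint based precoding is straightforward:
$$\mathbf{H}_{\mathrm{zf}}^{\dagger} = \mathbf{H}^H (\mathbf{H} \mathbf{H}^H)^{-1} = [\mathbf{H}_{1,\mathrm{zf}} \; \mathbf{H}_{2,\mathrm{zf}} \; \ldots \; \mathbf{H}_{K,\mathrm{zf}}].$$ (26)
Moreover, the obtained MUI is strictly zero as $\overline{\mathbf{H}}_i \mathbf{H}_{i,\mathrm{zf}} = \mathbf{0}$. Assuming the QR decomposition of $\mathbf{H}_{i,\mathrm{zf}}$ is $\mathbf{H}_{i,\mathrm{zf}} = \mathbf{Q}_{i,\mathrm{zf}} \mathbf{R}_{i,\mathrm{zf}}$, we have
$$\overline{\mathbf{H}}_i \mathbf{Q}_{i,\mathrm{zf}} = \mathbf{0}.$$ (27)
Thus, $\mathbf{Q}_{i,\mathrm{zf}}$ satisfies the BD constraint (9). The first precoding matrix for the BD constraint based precoding can be equivalently obtained as
$$\mathbf{P}^{a(\mathrm{BD})} = [\mathbf{Q}_{1,\mathrm{zf}} \; \mathbf{Q}_{2,\mathrm{zf}} \; \ldots \; \mathbf{Q}_{K,\mathrm{zf}}].$$ (28)
This equivalent method is termed GZI in [15].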
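The equivalence between the ZF channel inversion route and the SVD null-space route can be checked numerically. The sketch below assumes a square system in which the number of transmit antennas equals the total number of receive antennas, so that both routes span the same subspace; the sizes and random channels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n_t, n_i, n_users = 6, 2, 3  # hypothetical: n_t equals the total number of receive antennas
H_list = [
    (rng.standard_normal((n_i, n_t)) + 1j * rng.standard_normal((n_i, n_t))) / np.sqrt(2)
    for _ in range(n_users)
]
H = np.vstack(H_list)

# ZF channel inversion, computed once; equals H^H (H H^H)^(-1) for full row rank H.
H_zf = np.linalg.pinv(H)

# Channel-inversion route for user 0: QR decomposition of the user's block of columns.
Q, _ = np.linalg.qr(H_zf[:, :n_i])

# SVD route for user 0: null space of the other users' stacked channel.
H_bar = np.vstack(H_list[1:])
_, s, Vh = np.linalg.svd(H_bar, full_matrices=True)
V0 = Vh.conj().T[:, s.size:]

leakage = np.linalg.norm(H_bar @ Q)
same_subspace = np.allclose(Q @ Q.conj().T, V0 @ V0.conj().T)
print(leakage, same_subspace)
```

The two bases generally differ by a unitary rotation, so the comparison is made on the projectors `Q @ Q^H` and `V0 @ V0^H` rather than on the matrices themselves.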
For the proposed LR-S-GMI-type precoding algorithms, we obtain the first precoding filter $\mathbf{P}_i^a$ as in S-GMI (24), while we employ the LR-aided linear precoding technique instead of the SVD operation in S-GMI to obtain the second precoding filter $\mathbf{P}_i^b$. The aim of the LR transformation is to find a new basis $\tilde{\mathbf{H}}$ which is nearly orthogonal compared to the original matrix $\mathbf{H}$ for a given lattice $L(\mathbf{H})$. The most commonly used LR algorithm was first proposed by Lenstra, Lenstra and Lovász (LLL) in [23] with polynomial time complexity. In order to reduce the computational complexity, a complex LLL (CLLL) algorithm was proposed in [27], which reduces the overall complexity of the LLL algorithm by nearly half without sacrificing any performance. We employ the CLLL algorithm to implement the LR transformation in this work.
After the first precoding, we transform the MU-MIMO channel into parallel or approximately parallel SU-MIMO channels and the effective channel matrix for the $i$th user is
$$\mathbf{H}_{\mathrm{eff}_i} = \mathbf{H}_i \mathbf{P}_i^a.$$ (29)
We perform the LR transformation on in the precoding scenario [28], that is
(30) 
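where $\mathbf{T}_i$ is a unimodular matrix with $\det|\mathbf{T}_i| = 1$ and all elements of $\mathbf{T}_i$ are complex integers, i.e., $t_{l,k} \in \mathbb{Z} + j\mathbb{Z}$. The physical meaning of the constraint $\det|\mathbf{T}_i| = 1$ is that the channel energy is unchanged after the LR transformation.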
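To make the unimodular transformation concrete, the following is a minimal textbook-style complex LLL sketch. It is not the efficient CLLL implementation of [27]: it recomputes a QR decomposition at every step for clarity, it reduces the columns of a generic square matrix rather than the specific effective channel used in the paper, and the parameter `delta = 0.75` and the iteration cap `max_iter` are assumed values.

```python
import numpy as np

def clll(B, delta=0.75, max_iter=10_000):
    """Return (B_red, T) with B_red = B @ T and T a unimodular Gaussian-integer matrix."""
    B = np.array(B, dtype=complex)
    n = B.shape[1]
    T = np.eye(n, dtype=complex)
    k = 1
    for _ in range(max_iter):
        if k >= n:
            break
        # Size-reduce column k against columns k-1, ..., 0.
        for j in range(k - 1, -1, -1):
            R = np.linalg.qr(B, mode="r")
            mu = np.round(R[j, k] / R[j, j])  # rounds real and imaginary parts separately
            if mu != 0:
                B[:, k] -= mu * B[:, j]
                T[:, k] -= mu * T[:, j]
        R = np.linalg.qr(B, mode="r")
        # Lovasz condition: swap the two columns if it is violated, otherwise advance.
        if delta * abs(R[k - 1, k - 1]) ** 2 > abs(R[k, k]) ** 2 + abs(R[k - 1, k]) ** 2:
            B[:, [k - 1, k]] = B[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            k = max(k - 1, 1)
        else:
            k += 1
    return B, T

rng = np.random.default_rng(5)
B = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
B_red, T = clll(B)
print("cond before:", np.linalg.cond(B), "cond after:", np.linalg.cond(B_red))
print("|det T| =", abs(np.linalg.det(T)))
```

Only column additions with Gaussian-integer multipliers and column swaps are applied, so `T` has complex integer entries and a determinant of unit magnitude by construction.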
Following the LR transformation, we employ the linear precoding constraint to get the second precoding filter to parallelize each user's streams. The ZF precoding constraint is implemented for user $i$ as
(31) 
It is well-known that the performance of MMSE precoding is always better than that of ZF precoding. We can get the second precoding filter by employing an MMSE precoding constraint. The MMSE precoding is actually equivalent to the ZF precoding with respect to an extended system model [29, 32]. The extended channel matrix for the MMSE precoding scheme is defined as
(32) 
By introducing the regularization factor , a tradeoff between the level of MUI and the noise is introduced [8]. Then, the MMSE precoding filter is obtained as
(33) 
where , and the multiplication by will not result in transmit power amplification since . From the mathematical expression in (33), the rows of determine the effective transmit power amplification of the MMSE precoding. Correspondingly, the LR transformation for the MMSE precoding should be applied to the transpose of the extended channel matrix , and the LR transformed channel matrix is obtained as
(34) 
where is the unimodular matrix for . Then, the LRaided MMSE precoding filter is given by
(35) 
where the matrix . Finally, the combined second precoding matrix for all users is
(36) 
The overall precoding matrix is $\mathbf{P} = \mathbf{P}^a \mathbf{P}^b$. Since the lattice reduced precoding matrix has near orthogonal columns, the required transmit power will be reduced compared to the BD-type precoding algorithms. Thus, a better BER performance than that of the BD-type precoding algorithms can be achieved by the proposed LR-S-GMI-type precoding algorithms.
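The statement that MMSE precoding is equivalent to ZF precoding on an extended system model can be verified with a few lines of NumPy. The sketch assumes the extended channel is formed by appending $\sqrt{\alpha}\,\mathbf{I}$ to the channel matrix and that the selection matrix keeps the first rows of the extended ZF filter; the dimensions and the value of `alpha` are hypothetical, and lattice reduction is left out to isolate the equivalence.

```python
import numpy as np

rng = np.random.default_rng(6)
n_r, n_t, alpha = 2, 4, 0.1  # hypothetical dimensions and regularization factor
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)

# Direct MMSE (regularized inversion) precoder.
P_mmse = H.conj().T @ np.linalg.inv(H @ H.conj().T + alpha * np.eye(n_r))

# ZF precoder on the extended channel [H, sqrt(alpha) I].
H_ext = np.hstack([H, np.sqrt(alpha) * np.eye(n_r)])
A = np.hstack([np.eye(n_t), np.zeros((n_t, n_r))])  # keeps the first n_t rows
P_ext = A @ H_ext.conj().T @ np.linalg.inv(H_ext @ H_ext.conj().T)

print(np.allclose(P_mmse, P_ext))
print(np.allclose(A @ A.conj().T, np.eye(n_t)))  # A does not amplify power
```

Because the extended Gram matrix equals $\mathbf{H}\mathbf{H}^H + \alpha\mathbf{I}$, the first rows of the extended ZF filter reproduce the MMSE filter exactly.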
The received signal is finally obtained as
(37) 
where . The main processing work left for the receiver is to quantize the received signal to the nearest data vector and the decoding matrix described in the BDtype [11, 13], QR/SVDtype [16], and GMItype [15] precoding algorithms is not needed anymore. The receiver structure is thus simplified, and a significant amount of transmit power can be saved which is very important considering the mobility of the distributed users.
The proposed precoding algorithms are called LR-S-GMI-ZF and LR-S-GMI-MMSE depending on the choice of the second precoding filter as given in (31) and (35), respectively. We will focus on the LR-S-GMI-MMSE precoding since a better performance is achieved. The implementation steps of the LR-S-GMI-MMSE precoding algorithm are summarized in Table II. By replacing the steps (8) and (9) in Table II with the formulation in (31), the LR-S-GMI-ZF precoding algorithm can be obtained. Similarly, the first precoding matrix can also be computed according to the GZI method in (28), and combined with (31) or (35) to get the second precoding matrix. Then, the LR-GZI-ZF or LR-GZI-MMSE precoding algorithms can be obtained, respectively.
Steps  Operations 

Applying the MMSE Channel Inversion  
(1)  
(2)  for i = 1 : 
(3)  
(4)  
(5)  
(6)  
(7)  
(8)  
(9)  
(10)  end 
Compute the overall precoding matrix  
(11)  
(12)  
(13)  
Calculate the scaling factor  
(14)  
Get the received signal  
(15)  
Transform back from lattice space  
(16) 
V Performance Analysis
In this section, we analyze the performance of the proposed LR-S-GMI-type precoding algorithms in terms of BER, sum-rate and computational complexity. In the BER analysis part, we show that the residual interference matrix of the RBD precoding actually converges to an identity matrix, which is a new result in the literature so far. We also mathematically demonstrate that the residual interference of the proposed LR-S-GMI-type precoding algorithms converges to a zero matrix. Finally, we illustrate the quality of the effective channel matrices of the proposed and existing precoding algorithms by measuring their condition numbers. The maximum achievable sum-rate of the proposed LR-S-GMI-type precoding algorithms is given in the sum-rate analysis part. The computational complexity of the proposed and existing precoding algorithms is summarized in tables in the complexity analysis part. The trend of the computational complexity with the increase of the dimensions is also given and an analysis is developed.
V-A BER Performance Analysis
For the BD precoding, the effective SUMIMO channels are strictly parallel between each other after the first precoding filtering. For the RBD precoding, however, the residual interference is not zero between the users. We use to denote for convenience. From (13), the following formula is obtained
(38) 
Mathematically, the quantity can be expressed as . Substituting this into (38), the formula can be rewritten as
(39) 
With the increase of the SNR, approaches zero and then we have
(40) 
By further manipulating the expression in (40), we obtain
(41) 
that is, the residual interference matrix of the RBD precoding converges to an identity matrix at high SNRs. While, for the SGMI precoding algorithm developed in Section IV with the SNR increase we have
(42) 
By comparing (41) and (42), we can see that the impact of the residual interference of S-GMI precoding would be smaller than that of the conventional RBD precoding algorithm. Thus, we expect that a better BER performance is achieved by the S-GMI precoding algorithm over the conventional RBD precoding algorithm.
As pointed out in [21], the BER performance for a MIMO precoding system is actually determined by the energy of the transmitted signal . In order to reduce and improve the BER performance further, we transform the effective channel into the lattice space. By doing this, an improved basis is computed. Actually, the LR transformed channel matrix is quasiorthogonal rather than strictly orthogonal. We can employ the condition number which is defined as [17]
$$\kappa(\mathbf{H}) = \frac{\sigma_{\max}}{\sigma_{\min}}$$ (43)
to measure the orthogonality of the channel matrix, where $\sigma_{\max}$ and $\sigma_{\min}$ are the largest and smallest singular values of $\mathbf{H}$. From the above definition of the condition number in (43), we get that $\kappa(\mathbf{H}) \geq 1$ with equality for an orthogonal basis, while matrices which are nearly singular have large condition numbers. In Fig. 2, the probability density functions (PDFs) of the condition numbers for the effective channel matrices are illustrated. For the effective channel matrix of the proposed LR-S-GMI-MMSE precoding algorithm, not only the spread in the condition numbers but also their average value is much smaller compared to the effective channel matrices achieved by the existing precoding algorithms. Therefore, a significant reduction in the required transmit power is achieved and a better BER performance can be obtained by the proposed LR-S-GMI-MMSE precoding algorithm. Note that for the special case of each user with a single receive antenna, the proposed LR-S-GMI-type precoding will not converge to GMI or S-GMI because the second precoding filter is designed in the lattice space.
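The condition number used as an orthogonality measure can be computed directly from the singular values. The sketch below assumes the standard 2-norm definition (ratio of the largest to the smallest singular value) and uses hypothetical random matrices to show the two extremes mentioned in the text: a matrix with orthonormal columns and a generic random channel.

```python
import numpy as np

def condition_number(H):
    """Ratio of the largest to the smallest singular value of H."""
    s = np.linalg.svd(H, compute_uv=False)  # singular values in descending order
    return s[0] / s[-1]

rng = np.random.default_rng(7)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
Q, _ = np.linalg.qr(H)  # unitary factor: perfectly orthogonal basis

kappa_random = condition_number(H)
kappa_orthogonal = condition_number(Q)
print(kappa_random, kappa_orthogonal)
```

Repeating the first computation over many channel realizations and histogramming the results gives an empirical counterpart of the PDFs discussed for Fig. 2.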
V-B Achievable Sum-Rate Analysis
Recall that at high SNRs, the MU-MIMO channel is approximately decoupled into equivalent SU-MIMO channels by applying the first precoding filtering in (23). Then, we can transform the MU-MIMO sum-rate analysis [34] to a set of SU-MIMO sum-rate analysis tasks. For the second precoding filter, the LR-aided MMSE precoding is actually equal to the LR-aided ZF precoding under the high SNR scenario. Therefore, the $i$th user's received signal is
(44) 
where . By assuming that the average transmit power is , and because of the fact that , we get the normalization factor as
(45) 
where the quantity is the th singular value of , and denotes the energy of the th stream of .
From (45), the received SNR for the th stream of user is obtained as
(46) 
Then, the achievable sumrate for user is given by
(47) 
Note that the achievable sumrate is degraded by the normalization factor . The value of approaches its maximum when , thus we have
(48) 
Finally, the maximum achievable sum-rate of the proposed LR-S-GMI-type precoding algorithms at high SNRs can be expressed as
(49) 
For the BD precoding, we multiply the decoding matrix at the th user’s receiver and the received signal is given by
(50) 
Due to the fact that the two precoding matrices and are semiunitary matrices, we get and . Then, by applying the equivalence , the normalization factor for BD can be expressed as
(51) 
Since the statistical property of is not changed by the multiplication with the unitary matrix , we get the th received SNR as
(52) 
For simplicity, we do not consider the power loading between users and streams in the following derivation and term this strategy as no power loading (NPL). Then, the achievable sum-rate for the BD precoding algorithm is given by
(53) 
By comparing (53) with the maximum achievable sum-rate of the proposed LR-S-GMI-type precoding algorithms in (49), we conclude that the sum-rate of the proposed LR-S-GMI-type precoding algorithms will be slightly inferior to that of the BD precoding algorithm at high SNRs. At low SNRs, however, we expect that the achieved sum-rate of the proposed LR-S-GMI-type precoding algorithms will be better than that of the BD precoding since a regularization factor is employed to mitigate the degradation by the noise term.
The sum-rate performance of the BD precoding is actually dependent on the power loading scheme being used. Hence, the BD precoding algorithm can achieve its maximum sum-rate performance by allocating the power between streams according to a WF power loading scheme. As pointed out in [15], we do not consider the power loading strategy for the RBD or the proposed LR-S-GMI-type precoding algorithms for two reasons. One is that it is not easy to identify the optimal power allocation coefficients because of the existence of residual interference. The second reason is that the MMSE condition (10) is already satisfied. Therefore, an allocation of different powers between streams is not needed.
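For reference, the WF power loading mentioned for the BD precoding can be sketched with the standard water-filling rule over parallel streams. The per-stream gains (squared singular values divided by the noise variance), the total power, and the function name `water_filling` are assumptions for this example; the paper itself does not specify an implementation.

```python
import numpy as np

def water_filling(gains, total_power):
    """Standard water-filling: p_k = max(mu - 1/g_k, 0) with sum(p_k) = total_power."""
    gains = np.asarray(gains, dtype=float)  # assumed strictly positive
    order = np.argsort(gains)[::-1]         # strongest stream first
    inv = 1.0 / gains[order]
    powers = np.zeros_like(gains)
    # Try the m strongest streams, dropping the weakest until the water level is valid.
    for m in range(len(gains), 0, -1):
        mu = (total_power + inv[:m].sum()) / m
        if mu > inv[m - 1]:
            powers[order[:m]] = mu - inv[:m]
            break
    return powers

gains = np.array([4.0, 1.0, 0.1])  # hypothetical per-stream gains
total_power = 1.0
p_wf = water_filling(gains, total_power)
p_eq = np.full(gains.size, total_power / gains.size)  # no power loading (equal split)

rate_wf = np.sum(np.log2(1.0 + p_wf * gains))
rate_eq = np.sum(np.log2(1.0 + p_eq * gains))
print(p_wf, rate_wf, rate_eq)
```

For these gains the weakest stream receives no power, and the water-filling rate is at least as large as the rate with an equal split.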
V-C Computational Complexity Analysis
In this section, we use the total number of floating point operations (FLOPs) to measure the computational complexity of the precoding algorithms discussed above. It is worth noting that the lattice reduction algorithm has variable complexity, and the average complexity of the CLLL algorithm has been given in FLOPs by [27]. A reduced and fixed complexity lattice reduction structure is proposed in [35]; however, we employ the conventional CLLL algorithm because the lattice reduction algorithm is not the main focus of this work. The number of FLOPs for the complex QR decomposition and the real SVD operation are given in [17]. As shown in [19], the number of FLOPs required by a complex SVD operation is equivalent to that of its extended real matrix. The total number of FLOPs needed by the matrix operations is summarized below:

Multiplication of and complex matrices: ;

QR decomposition of an complex matrix: ;

SVD of an complex matrix where only and are obtained: ;

SVD of an complex matrix where , and are obtained: ;

Inversion of an real matrix using GaussJordan elimination: .
We illustrate the required FLOPs for the conventional RBD, SGMI and LRSGMIMMSE precoding algorithms in Table III, Table IV and Table V, respectively. The complexity of the QR/SVD RBD [16] and LCRBDLRMMSE precoding algorithms is already given in [19]. A system with transmit antennas and users each equipped with receive antennas is considered; this scenario is denoted as the case. For the case, the reduction in the number of FLOPs obtained by the proposed LRSGMIMMSE precoding is , , and as compared to the RBD, BD, QR/SVD RBD and LCRBDLRMMSE precoding algorithms, respectively. Clearly, the proposed LRSGMIMMSE precoding algorithm requires the lowest complexity.
Steps  Operations  Flops  Case 
1  21504  
2  336  
3  5184  
4  552  
5  13248  
Total 40824 
Steps  Operations  Flops  Case 
1  
2736  
2  2432  
3  552  
4  13248  