# Low-Complexity Recursive Convolutional Precoding for OFDM-based Large-Scale Antenna Systems

## Abstract

Large-scale antenna (LSA) has gained a lot of attention recently since it can significantly improve the performance of wireless systems. Similar to *multiple-input multiple-output* (MIMO) *orthogonal frequency division multiplexing* (OFDM) or MIMO-OFDM, LSA can also be combined with OFDM to deal with frequency selectivity in wireless channels. However, such a combination suffers from substantially increased complexity proportional to the number of antennas in LSA systems. For the conventional implementation of LSA-OFDM, the number of *inverse fast Fourier transform*s (IFFTs) increases with the antenna number since each antenna requires an IFFT for OFDM modulation. Furthermore, *zero-forcing* (ZF) precoding is required in LSA systems to support more users, and the required matrix inversion leads to a huge computational burden. In this paper, we propose a low-complexity recursive convolutional precoding to address the issues above. The traditional ZF precoding can be implemented through the recursive convolutional precoding in the time domain so that only one IFFT is required for each user and the matrix inversion can also be avoided. Simulation results show that the proposed approach can achieve the same performance as that of ZF but with much lower complexity.

## 1 Introduction

By installing hundreds of antennas at the *base station* (BS), *large-scale antenna* (LSA) systems can significantly improve the performance of cellular networks [1]. Although LSA can be regarded as an extension of traditional *multiple-input multiple-output* (MIMO) systems, which have been widely studied during the last couple of decades [3], many special properties arising from the extremely large number of antennas make it a promising technique for future wireless systems, and it has thus gained a lot of attention recently.

When the antenna number is sufficiently large, the performance of an LSA system becomes deterministic [4]. From the power scaling law for LSA [4], the transmit power of each user is inversely proportional to the antenna number or the square root of the antenna number, depending on whether accurate channel state information is available or not. For downlink transmission with multiple users, precoding techniques are required at the BS to achieve the system capacity [5]. When the antenna number is large enough and the channels corresponding to different antennas or users are independent, the channel vectors for different users are asymptotically orthogonal. If the user number is much smaller than the antenna number, which is always true in LSA systems, the *matched filter* (MF) will perform as well as the typical linear precoders, such as *zero-forcing* (ZF) or *minimum mean-square-error* (MMSE). Therefore, the complexity can be greatly reduced since no matrix inversion is required for precoding [6].

Similar to the philosophy of MIMO-*orthogonal frequency division multiplexing* (OFDM) [7] or MIMO-OFDM [8], LSA can be also combined with OFDM to deal with frequency selectivity in wireless channels. Although straightforward, such combination suffers from substantially increased complexity.

First, the precoding is conducted in the frequency domain for traditional MIMO-OFDM [9]. In this case, each antenna at the BS requires an *inverse fast Fourier transform* (IFFT) for OFDM modulation and the number of IFFTs is equal to the antenna number. Therefore, the number of IFFTs will increase substantially as the antenna number rises in LSA systems, leading to a huge computational burden.

Second, *zero-forcing* (ZF) precoding is required to support more users in LSA systems. As indicated in [1], the MF precoding can perform as well as the ZF precoding in LSA systems because the *inter-user-interference* (IUI) can be suppressed asymptotically through the MF precoding if the antenna number is large enough and the channels at different antennas and different users are independent. In practical systems, however, the antenna number is always finite. Moreover, the channels at different antennas will be correlated when placing so many antennas in a small area. In this sense, there will be residual IUI for the MF precoding, and the ZF precoding is thus still required [10]. As a result, the matrix inversion of the ZF precoding will substantially increase the complexity, especially when the user number is large.

To address the issues above, we propose a low-complexity recursive convolutional precoding for LSA-OFDM in this paper.

First, a convolutional precoding filter in the time domain is used to replace the traditional precoding in the frequency domain. In this way, only one IFFT is required for each user no matter how many antennas there are. Meanwhile, by exploiting the frequency-domain correlation of the traditional precoding coefficients, the length of the precoding filter can be much smaller than the FFT size. As a result, the complexity can be greatly reduced, especially when the antenna number is large. Even though the convolutional precoding has been studied in [11] for traditional MIMO-OFDM systems, its advantage is not as significant as in LSA systems. In this paper, we highlight that such advantage becomes remarkable when the antenna number is large and thus it is more suitable to adopt the convolutional precoding rather than the traditional frequency-domain precoding for the transceiver design in LSA-OFDM systems.

Second, based on the order recursion of Taylor expansion, the convolutional precoding filter works recursively in this paper, so that we can not only avoid the direct matrix inverse of traditional ZF precoding but also implement the traditional ZF precoding through the convolutional precoding filter with low complexity. Taylor expansion has already been used for *truncated polynomial expansion* (TPE) in [12], where it approximates the matrix inverse in ZF precoding. The precoding can then be conducted iteratively so that the matrix inverse is avoided. A similar approach is adopted in [13], where the TPE is based on the Cayley-Hamilton theorem and Taylor expansion is used to optimize the polynomial coefficients. The order recursion of Taylor expansion has also been used in [14] for channel estimation and multiuser detection. Different from the existing works, which are based on a matrix-form Taylor expansion in the frequency domain, the recursive ZF precoding in this paper is implemented through a recursive filter in the time domain so that it can be naturally combined with the convolutional precoding. Moreover, the order recursion is converted to a time recursion in this paper so that the proposed approach can track the time variation of channels. Based on the time recursion, the tracking property is further analyzed for the large-scale regime, resulting in new theoretical insights into the behavior of time recursion in LSA systems that have not been revealed before.

The rest of this paper is organized as follows. The system model is introduced in Section II. The proposed approach is derived in Section III, and its performance is analyzed in Section IV. Simulation results are presented in Section V. Finally, conclusions are drawn in Section VI.

## 2 System Model

Consider downlink transmission in an LSA-OFDM system where a BS employs a large number of antennas to serve multiple single-antenna users simultaneously in the same frequency band. As in [1], we assume that the antenna number is much larger than the user number.

Denote with to be the transmit symbol for the -th user at the -th subcarrier of the -th OFDM block. In an LSA-OFDM based on traditional OFDM implementation, the precoding is carried out in the frequency domain, and therefore the transmit signal at the -th sample of the -th OFDM block at the -th antenna for the -th user is

where denotes the subcarrier number for the OFDM modulation and denotes the precoding coefficient for the -th subcarrier of the -th OFDM block at the -th antenna for the -th user. A *cyclic prefix* (CP) will be added in front of the transmit signal to deal with the delay spread of wireless channels.
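The conventional transmitter described above can be sketched as follows. This is a minimal illustration under assumed notation, not the paper's exact formulation: `M`, `K`, `N` and `cp_len` (antennas, users, subcarriers, CP length) are hypothetical toy sizes, the weights are random placeholders rather than a designed precoder, and `np.fft.ifft` applies a `1/N` scaling that may differ from the normalization used in the paper.

```python
import numpy as np


def freq_domain_precode(symbols, weights):
    """Frequency-domain precoding followed by one IFFT per antenna.

    symbols: (K, N) array, one row of subcarrier symbols per user.
    weights: (M, K, N) array of per-antenna, per-user, per-subcarrier weights.
    Returns an (M, N) array of time-domain samples, one row per antenna.
    """
    # Sum the weighted user symbols on every subcarrier of every antenna.
    precoded = np.einsum("mkn,kn->mn", weights, symbols)
    # One N-point IFFT per antenna row, so M IFFTs in total.
    return np.fft.ifft(precoded, axis=1)


def add_cyclic_prefix(samples, cp_len):
    """Prepend the last cp_len samples of each row as a cyclic prefix."""
    return np.concatenate([samples[:, -cp_len:], samples], axis=1)


rng = np.random.default_rng(0)
M, K, N, cp_len = 8, 2, 16, 4  # hypothetical toy sizes
qpsk = (rng.choice([-1.0, 1.0], (K, N))
        + 1j * rng.choice([-1.0, 1.0], (K, N))) / np.sqrt(2)
weights = rng.standard_normal((M, K, N)) + 1j * rng.standard_normal((M, K, N))
tx = add_cyclic_prefix(freq_domain_precode(qpsk, weights), cp_len)
```

The point of the sketch is the cost structure: the IFFT is applied to an `(M, N)` array, so the number of IFFTs grows with the antenna number.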

After removing the CP and OFDM demodulation, the received signal at the -th user can be expressed as

where is the additive white noise with , and is the *channel frequency response* (CFR) corresponding to the -th subcarrier of the -th block at the -th antenna for the -th user, which can be expressed as

where is the *channel impulse response* (CIR) and denotes the channel length which is usually much smaller than the FFT size. The CFR is assumed to be complex Gaussian distributed with zero mean and , where denotes the square of the large-scale fading coefficient for the -th user, denotes the correlation function of the channels at different antennas for the same user, and denotes the Kronecker delta function. It means the CFRs have been assumed to be independent for different users while they depend on the correlation function, , for different antennas. In particular, we have when the CFRs at different antennas are independent.

From (Equation 1), the received signal vector corresponding to the -th subcarrier of the -th OFDM block for all users can be expressed as

where

with being the corresponding precoding vector of the -th user and being the CFR vector for the -th user with correlation matrix where .

## 3 Low-Complexity Recursive Convolutional Precoding

In this section, we will first present recursive updating of precoding matrices, then derive the low-complexity convolutional precoding, and discuss its complexity at the end of this section.

### 3.1 Recursive Updating

The ZF precoding is considered in this paper, although the proposed approach can also be used for other precoders, such as the MMSE precoder. Assuming the downlink channels are known at the BS, the desired precoding matrix can be expressed as

Using Taylor expansion in Appendix A, the matrix inverse in (Equation 3) can be substituted by an order-recursive relation as

where and denotes the corresponding precoding matrix with the -th order expansion and is a step size that affects the convergence, as we will discuss in Section IV. The order-recursive relation in (Equation 4) can be also rewritten in a vector form as

where denotes the -th column of .
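The order recursion can be illustrated numerically with a small sketch. The notation is an assumption made for this example only: `H` is a `K x M` channel matrix on one subcarrier, the ZF precoder is taken as `H^H (H H^H)^{-1}`, large-scale fading is ignored, and the update `W <- W + mu * H^H (I - H W)` is the standard truncated Taylor (Neumann) series recursion, which may differ from the paper's equations in scaling.

```python
import numpy as np


def zf_precoder(H):
    """Direct ZF precoder W = H^H (H H^H)^{-1} for a K x M channel matrix."""
    return H.conj().T @ np.linalg.inv(H @ H.conj().T)


def order_recursive_zf(H, mu, order):
    """Approximate the ZF precoder with `order` recursion steps, no inverse."""
    K = H.shape[0]
    W = mu * H.conj().T  # zeroth-order term: a scaled matched filter
    for _ in range(order):
        residual = np.eye(K) - H @ W  # K x K error of the effective channel
        W = W + mu * H.conj().T @ residual
    return W


rng = np.random.default_rng(1)
K, M = 4, 64  # hypothetical user and antenna numbers
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
W_zf = zf_precoder(H)
mu = 1.0 / M  # assumed step size, suitable when H H^H / M is close to identity
errors = [
    np.linalg.norm(order_recursive_zf(H, mu, p) - W_zf) / np.linalg.norm(W_zf)
    for p in range(6)
]
```

In this sketch the relative error shrinks geometrically with the expansion order, and each step needs only matrix products.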

In (Equation 5), the order-recursive updating is driven by the expansion order, . Mathematically, the expansion order in (Equation 5) can be viewed as a *recursion counter*, which increases as the recursion proceeds. In this sense, the OFDM block index can be also used as that *recursion counter*. In other words, (Equation 5) can be also driven by the OFDM block index if replacing expansion order, , with OFDM block index, , that is

As a result, the order recursion in (Equation 5) is converted to the time recursion in (Equation 6). The conversion is possible simply because the two recursions have the same form, except that one is driven by the expansion order and the other by the OFDM block index. Using the time recursion in (Equation 6), the actual calculation can be conducted in the time domain even though the principle for avoiding the matrix inverse is based on the order recursion in (Equation 5). In this way, we can not only reduce the complexity, since there is no need to repeat the order recursions from the zeroth order for each OFDM block, but also track the time-varying channels as long as the channel changes slowly. Strictly speaking, the above conversion is only valid when the channel is time invariant. In this case, (Equation 5) and (Equation 6) have exactly the same expression except for different *recursion counter*s. In practice, the time recursion in (Equation 6) can still work as long as the channel is slowly time-varying. Our analysis in Section IV shows that the time recursion can track the time variation of the channels when the Doppler frequency is small, but the performance degrades as the Doppler frequency rises.

In fact, the order recursion and the time recursion can be used in a hybrid manner as in Figure 1. The order recursion is used for initialization and the time recursion is used for tracking. Once the expansion order for initialization is large enough to achieve satisfactory performance, the time recursion takes over to update the precoding coefficients in the subsequent OFDM blocks. In this way, we can reduce the complexity since only one recursion is needed to update the coefficients during the tracking stage.
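The hybrid scheme can be sketched as below. Several elements are assumptions for illustration only: the channel evolves as a first-order autoregressive process with coefficient `rho` (a stand-in for a Doppler model), a single subcarrier is considered, large-scale fading is ignored, and the update rule is the Neumann-series recursion `W <- W + mu * H^H (I - H W)`.

```python
import numpy as np


def crandn(rng, *shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def recursion_step(W, H, mu):
    """One recursion step toward the ZF precoder of H."""
    K = H.shape[0]
    return W + mu * H.conj().T @ (np.eye(K) - H @ W)


def hybrid_precoding(channels, mu, init_order):
    """Order recursion on the first block, then one step per later block."""
    W = mu * channels[0].conj().T
    for _ in range(init_order):      # initialization: order recursion
        W = recursion_step(W, channels[0], mu)
    precoders = [W]
    for H in channels[1:]:           # tracking: time recursion
        W = recursion_step(W, H, mu)
        precoders.append(W)
    return precoders


def relative_error(W, H):
    W_zf = H.conj().T @ np.linalg.inv(H @ H.conj().T)
    return np.linalg.norm(W - W_zf) / np.linalg.norm(W_zf)


rng = np.random.default_rng(2)
K, M, blocks, rho = 4, 128, 20, 0.9999  # hypothetical values
channels = [crandn(rng, K, M)]
for _ in range(blocks - 1):
    channels.append(rho * channels[-1] + np.sqrt(1 - rho**2) * crandn(rng, K, M))

precoders = hybrid_precoding(channels, mu=1.0 / M, init_order=10)
tracking_errors = [relative_error(W, H) for W, H in zip(precoders, channels)]
```

During tracking each block costs a single recursion step, which is where the saving over restarting from the zeroth order comes from.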

### 3.2 Convolutional Precoding

Although the matrix inverse is avoided through (Equation 6), the precoding is still conducted in the frequency domain. In this subsection, we will convert it into the time-domain convolutional precoding by exploiting the frequency-domain correlation of the precoding matrices. Denote , which contains the precoding coefficients from all subcarriers of the -th OFDM block at the -th antenna for the -th user. Then, (Equation 6) can be rewritten as

where is the corresponding CFR vector from the -th antenna to the -th user, and is a diagonal matrix with the -th element given by

Denote to be the coefficient for the -th tap of the precoding filter at the -th antenna for the -th user corresponding to the -th OFDM block. Then, we have

where is the corresponding precoding vector, and is the *discrete Fourier transform* (DFT) matrix with the -th element given by

From Appendix B, by taking the inverse DFT of (Equation 7), we can obtain the coefficients for the time-domain convolutional precoding filter as

where is the estimation error given by

The resulting recursive convolutional precoding is shown in Figure 2, where large-scale fading is omitted by setting . The precoding is carried out in the time domain via the precoding filter. In this case, only one IFFT is required for each user no matter how many antennas there are at the BS. Therefore, the number of IFFTs is equal to the number of users, which is much smaller than the antenna number in LSA systems. Because the frequency-domain precoding coefficients are correlated, the coefficients of the precoding filter are sparse and thus can be truncated. For the single user case, the precoding filter is exactly the conjugate of the CIR and thus . In the case of multiple users, we use one more tap, as a rule of thumb, for the positive taps and another taps to include the significant coefficients on the negative taps. As a result, can be truncated within the range (modulo ). Following the order recursion based initialization, the coefficients of the precoding filter can be updated recursively.

Note that the transmit signal after the IFFT should be circularly extended before being sent to the precoding filter so that the signal is circularly convolved with the precoding filter, because multiplication in the frequency domain corresponds to circular convolution in the time domain [16].
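The equivalence between frequency-domain precoding and time-domain convolutional precoding can be checked with a short sketch. All names and sizes here are hypothetical: `taps` holds a few filter coefficients per antenna and user at the delays in `tap_lags` (negative delays are interpreted modulo the FFT size), and `np.roll` plays the role of the circular extension described above.

```python
import numpy as np


def conv_precode(symbols, taps, tap_lags):
    """Time-domain precoding: K IFFTs, then a short circular filter per antenna.

    symbols: (K, N) subcarrier symbols, one row per user.
    taps: (M, K, L) filter coefficients; taps[:, :, j] sits at delay tap_lags[j].
    """
    user_time = np.fft.ifft(symbols, axis=1)  # only K IFFTs
    M, N = taps.shape[0], symbols.shape[1]
    tx = np.zeros((M, N), dtype=complex)
    for j, lag in enumerate(tap_lags):
        delayed = np.roll(user_time, lag, axis=1)  # circular delay by `lag`
        tx += taps[:, :, j] @ delayed              # (M, K) @ (K, N)
    return tx


def freq_precode(symbols, taps, tap_lags):
    """Reference: per-subcarrier weights and one IFFT per antenna."""
    M, K, _ = taps.shape
    N = symbols.shape[1]
    full = np.zeros((M, K, N), dtype=complex)
    for j, lag in enumerate(tap_lags):
        full[:, :, lag % N] = taps[:, :, j]
    weights = np.fft.fft(full, axis=2)  # frequency-domain precoding weights
    return np.fft.ifft(np.einsum("mkn,kn->mn", weights, symbols), axis=1)


rng = np.random.default_rng(6)
M, K, N = 4, 2, 32          # hypothetical toy sizes
tap_lags = [-2, -1, 0, 1, 2, 3]
taps = (rng.standard_normal((M, K, len(tap_lags)))
        + 1j * rng.standard_normal((M, K, len(tap_lags))))
symbols = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))

tx_time = conv_precode(symbols, taps, tap_lags)
tx_freq = freq_precode(symbols, taps, tap_lags)
```

Both paths give the same transmit samples; the time-domain path uses `K` IFFTs and a filter whose length is the number of retained taps.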

### 3.3 Complexity

| | Proposed | Traditional ZF | TPE |
|---|---|---|---|
| IFFT | | | |
| Precoding operation | | | |
| Coefficient calculation | | | |

In the table above, the complexity is evaluated in terms of the number of *complex multiplications* (CMs) required by the IFFT, the actual precoding operation, and the coefficient calculation for the precoding [17]. For comparison, the complexities of the traditional ZF precoding and the TPE precoding in [12] are also included in the table. For the traditional ZF precoding, consecutive subcarriers ( in *long-term evolution* (LTE)) can share the same precoding coefficients by exploiting the frequency-domain correlation of the precoding coefficients. The TPE precoding requires iterations for each OFDM block because the iterations are repeated from the zeroth order for each OFDM block.

We have the following observations from the table. First, the number of IFFTs is equal to the antenna number for the traditional ZF precoding and the TPE precoding, and thus the number of IFFTs for the proposed approach is greatly reduced since the user number is much smaller than the antenna number in LSA systems. Second, the precoding filter length for the convolutional precoding is much smaller than the FFT size, while the precoding operation has to be conducted on each subcarrier individually for the traditional ZF precoding and the TPE precoding. Third, the number of CMs can be reduced for the proposed approach because the coefficient calculation is conducted recursively, while the traditional ZF precoding can also reduce the number of CMs since consecutive subcarriers can have the same precoding coefficients.

As an example, Figure 3 presents the CMs required by the proposed approach, the traditional ZF precoding, and the TPE precoding for the typical MHz bandwidth in LTE where the size of FFT is [18]. For a typical *extended typical urban* (ETU) channel whose maximum delay , a channel length is enough to contain most of the channel power. As expected, the complexity of the convolutional precoding is substantially reduced compared with existing approaches when the antenna number is large. When antenna number is small, however, the complexity reduction is not so significant as that for the case of large antenna number. The traditional ZF or TPE may even require fewer CMs than the proposed approach with larger or smaller , at the cost of performance degradation, as will be shown in Section V. In fact, the advantage of the convolutional precoding can be hardly observed in traditional systems since the antenna number there is small, and it only becomes remarkable when the antenna number is very large. Therefore, it is more suitable to adopt the convolutional precoding rather than the traditional frequency-domain precoding for the transceiver design in LSA-OFDM systems. Note that the convolutional precoding will cause some delay of the signal transmission. However, the complexity reduction is favorable if the delay due to the convolution is tolerable.
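As a rough illustration of the IFFT part of this comparison only, the sketch below counts complex multiplications under the common radix-2 estimate of `(N/2) * log2(N)` per `N`-point IFFT. That estimate and the values of `M`, `K` and `N` are assumptions chosen for the example; the precoding-operation and coefficient-calculation costs are not modeled.

```python
import math


def ifft_cms(fft_size):
    """Complex multiplications of one radix-2 IFFT: (N/2) * log2(N)."""
    return (fft_size // 2) * int(math.log2(fft_size))


def total_ifft_cms(num_ifft, fft_size):
    """IFFT cost per OFDM block when num_ifft transforms are needed."""
    return num_ifft * ifft_cms(fft_size)


M, K, N = 100, 10, 2048  # hypothetical antenna number, user number, FFT size
traditional = total_ifft_cms(M, N)  # one IFFT per antenna
proposed = total_ifft_cms(K, N)     # one IFFT per user
print(traditional, proposed, traditional // proposed)
```

Under these assumptions the IFFT cost drops by the ratio of antenna number to user number, which is why the saving only becomes pronounced when the antenna number is large.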

## 4 Performance Analysis

In this section, we will first analyze the convergence performances of initialization and tracking, respectively, and then discuss the impacts of imperfect channels. Since the time-domain convolutional precoding is equivalent to the frequency-domain precoding, the performance analysis is conducted in the frequency domain for simplicity.

### 4.1 Initialization

We focus on the OFDM block with where the order-recursion is used for initialization. Define to be the normalized precoding matrix error for initialization, where the large-scale fading effect has been taken into account. Then it is shown in Appendix C that

where is the -th eigenvalue of or .

Denote and to be the maximum and the minimum eigenvalues of , respectively. From (Equation 10), the convergence can be achieved as long as , and the optimal step size for the fastest convergence will be [19]. Depending on whether the channels at different antennas are independent or not, we have the following discussions:

If the CFRs corresponding to different antennas are independent, we have for [20]. In this case, fast convergence can be achieved by setting , and the convergence can be almost achieved within only one recursion as we can see from the simulation results in the next section.

If the CFRs corresponding to different antennas are correlated, the maximum and the minimum eigenvalues will rely on . Inspired by , we let and for simplicity, and thus in this case. Obviously, such step size can cover the case where the channels at different antennas are independent because in that situation. Simulation results in Section V show that such a step size works well for the proposed approach.
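The role of the eigenvalue spread in the two cases above can be illustrated numerically. This sketch rests on assumptions that are not taken from the paper: the step size is computed from the exact eigenvalues of `H H^H` using the classical rule `2 / (lambda_max + lambda_min)` for this type of fixed-point iteration, and antenna correlation follows a hypothetical exponential model `R[i, j] = r ** |i - j|` rather than the array model used in the simulations.

```python
import numpy as np


def step_size_and_rate(H):
    """Classical optimal step and per-step contraction factor for H H^H."""
    eig = np.linalg.eigvalsh(H @ H.conj().T)  # real eigenvalues, ascending
    lam_min, lam_max = eig[0], eig[-1]
    mu_opt = 2.0 / (lam_max + lam_min)
    rate = (lam_max - lam_min) / (lam_max + lam_min)
    return mu_opt, rate


rng = np.random.default_rng(3)
K, M, r = 4, 64, 0.9  # hypothetical sizes and correlation coefficient
G = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

idx = np.arange(M)
R = r ** np.abs(idx[:, None] - idx[None, :])  # assumed correlation matrix
L = np.linalg.cholesky(R)

H_ind = G         # independent antennas
H_cor = G @ L.T   # each row now has antenna correlation R

mu_ind, rate_ind = step_size_and_rate(H_ind)
mu_cor, rate_cor = step_size_and_rate(H_cor)
print(f"independent: rate {rate_ind:.2f}, correlated: rate {rate_cor:.2f}")
```

A contraction factor closer to one means more recursions are needed, which matches the slower initialization reported for correlated channels.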

### 4.2 Tracking

When the channel is static, the performance of tracking will be the same as that in (Equation 10) except that the expansion order, , is replaced by the block index, . On the other hand, if the channel is time-varying, the variation of the desired precoding matrix is given, from (Equation 3), by

Exact analysis based on (Equation 11) is difficult. To gain analytical insights, we assume the channels corresponding to different antennas and different users are independent. In that case, can be approximated by

Furthermore, we assume that the expansion order for initialization is sufficiently large so that .

Define to be the normalized precoding matrix error for tracking, where the large-scale fading effect has been taken into account. When the Doppler frequency, , is small, then it is shown in Appendix D that the *mean-square-error* (MSE) can be expressed by

where denotes the OFDM symbol duration. From (Equation 12), we have the following observations:

The MSE of tracking depends only on the ratio of user number and antenna number. As the antenna number is much larger than the user number in an LSA system, we have

The MSE of tracking increases with the OFDM block index. This means that the performance degrades as the time recursion proceeds, which is also confirmed by our simulation results.

The MSE of tracking increases with the Doppler frequency, that is, the performance degrades as the Doppler frequency rises, which also coincides with our intuition.

### 4.3 Impact of Imperfect Channel

In the above, we have assumed that the accurate downlink channel is known at the BS. In practical systems, the downlink channel at the BS can be obtained by estimating the uplink channel due to the reciprocity in time-division duplexing systems [21]. In any case, only an imperfect channel estimate is available at the BS.

To analyze the impacts of channel estimation error, denote the imperfect channel to be

where denotes the channel estimation error with with being the variance of the error when . Assuming the CFRs and the channel errors are independent, we can obtain, when the antenna number is large enough, that,

From (Equation 14), we have where denotes the -th eigenvalue of or . For simplicity, we will only focus on the initialization in the remainder of this subsection, although our results also apply to the tracking stage.

In the presence of the channel estimation error, the order recursion for initialization can be rewritten by

where denotes the precoding coefficients with imperfect channel. Correspondingly, the normalized precoding matrix error is where indicates the desired precoding matrix with imperfect channel. Following the same analysis in Section IV.A, the convergence of (Equation 15) can be achieved by choosing when the channels at different antennas are independent or when they are correlated. We have and thus when .

In addition to changing the step size, the channel estimation error will also cause the performance degradation when the convergence has been achieved. Denote to be the error for the desired precoding matrix due to the channel estimation error. To gain analytical insights, we assume the channels corresponding to different antennas and different users are independent. In that case,

With the assumption that the CFRs and the channel errors are independent, the MSE can be expressed by

From (Equation 17), we have the following observations:

If assuming is very small, we have

which is approximately proportional to the variance of the channel estimation error.

By increasing the antenna number or reducing the user number, the impact of the channel estimation error can be mitigated. In the extreme case where , we have , which means the impact of the channel estimation error vanishes when the antenna number is very large.
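The first observation, that the precoder error scales roughly with the variance of the channel estimation error when that variance is small, can be checked with a small experiment. The setup is an assumption for illustration: the ZF precoder is computed directly from `H` and from `H + sqrt(sigma2) * E0`, the same error realization `E0` is reused at every variance, and the error is normalized by the norm of the exact precoder rather than by the paper's normalization.

```python
import numpy as np


def zf(H):
    """Direct ZF precoder for a K x M channel matrix."""
    return H.conj().T @ np.linalg.inv(H @ H.conj().T)


def precoder_mse(H, E0, sigma2):
    """Normalized squared error of the ZF precoder built from a noisy channel."""
    H_hat = H + np.sqrt(sigma2) * E0
    diff = zf(H_hat) - zf(H)
    return np.linalg.norm(diff) ** 2 / np.linalg.norm(zf(H)) ** 2


rng = np.random.default_rng(4)
K, M = 4, 64  # hypothetical user and antenna numbers
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
E0 = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

variances = (1e-4, 1e-3, 1e-2)
mse = {s2: precoder_mse(H, E0, s2) for s2 in variances}
for s2 in variances:
    print(f"error variance {s2:.0e}: normalized MSE {mse[s2]:.3e}")
```

In this sketch a hundredfold increase in error variance raises the normalized MSE by about two orders of magnitude, consistent with the approximately proportional relation stated above.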

## 5 Simulation Results

In this section, we evaluate the proposed approach using computer simulation. We consider a BS equipped with antennas and users in the system. A *quadrature-phase-shift-keying* (QPSK) modulated OFDM signal is used, where the subcarrier spacing is kHz corresponding to an OFDM symbol duration of about . For a typical MHz channel, the size of FFT is with subcarriers used for data transmission and the others used as guard band as in LTE [18]. Each frame consists of OFDM symbols. A normalized ETU channel model is used, which has taps and the maximum delay . The channels at different antennas can be independent or correlated. For the latter, a *uniform-linear-array* (ULA) is used where the antennas are placed along a straight line [22]. In this case, the correlation of channels at -th antenna and -th antenna is , where is the array size normalized by the wavelength. The channels at different antennas will be more correlated for smaller . Without loss of generality, we assume for all users.

Fig. Section 4.1 shows the MSE and *symbol-error-ratio* (SER) for the initialization of the proposed approach. From Fig. Section 4.1 (a), the MSE decreases as the order recursion proceeds. However, the MSEs for the correlated channels do not decrease as fast as that for the independent channels. This means that more order recursions are required to achieve satisfactory initialization performance when the channels at different antennas are correlated, which coincides with the observation in Fig. Section 4.1 (b). From Fig. Section 4.1 (b), the SER improves as the order recursion proceeds. When the channels at different antennas are independent, the proposed approach achieves the same SER as the ZF precoding within only two recursions. However, more recursions are required when the channels at different antennas are correlated.

Fig. Section 4.2 shows the MSE and SER for the tracking of the proposed approach with different Doppler frequencies. We assume the expansion order for initialization is large enough such that . From Fig. Section 4.2 (a), the channel correlation has a smaller impact on the tracking MSE than on the initialization MSE. From Fig. Section 4.2 (b), the time-varying channels can be efficiently tracked when the Doppler frequency is small, and therefore the SERs over different OFDM blocks are almost the same. On the other hand, it becomes difficult to track the channel time variation as the Doppler frequency increases, and thus the SERs for the OFDM blocks at the end of the frame get worse. This problem can be addressed by re-initialization when the precoding coefficients drift too far from the desired ones.

Fig. ? shows the impact of the channel estimation error. From Fig. ? (a), the MSE is approximately proportional to the variance of the channel estimation error when the latter is small, which coincides with our analysis in Section IV. Fig. ? (b) shows that the channel estimation error has little effect on the SER when dB. Otherwise, the SER performance is seriously degraded as the channel estimation error increases.

Figure 4 shows the SER versus with different Doppler frequencies. For the proposed approach, we also assume the expansion order is large enough for initialization such that . As the Doppler frequency increases, the SER performance degrades because the channels cannot be efficiently tracked when the Doppler frequency is large. For comparison, the MF precoding and the traditional ZF precodings with are also included. Since the ZF and MF precodings are conducted for each OFDM block individually, their SER performance is the same for different Doppler frequencies. When the Doppler frequency is small, the proposed approach can achieve the same SER as the traditional ZF precoding with . As more consecutive subcarriers share the same precoding coefficients, the performance of ZF precoding degrades although the complexity is reduced. Meanwhile, the proposed approach can significantly outperform the MF precoding since the latter cannot completely remove the IUI.

## 6 Conclusions

In this paper, low-complexity convolutional precoding has been proposed for the precoder design in an LSA-OFDM system. The traditional frequency-domain precoding has been converted into a time-domain convolutional precoding so that the number of IFFTs is substantially reduced. On the other hand, based on the order recursion of Taylor expansion, the convolutional precoding filter works recursively in this paper such that we can not only avoid direct matrix inverse of traditional ZF precoding but also provide a way to implement the traditional ZF precoding through the convolutional precoding filter with low complexity. Our results have shown that it is more suitable to adopt the convolutional precoding rather than the traditional frequency-domain precoding for the transceiver design in LSA-OFDM systems.

## 7 Derivation of (Equation 4)

When the antenna number is sufficiently large and the CFRs corresponding to different users and different antennas are independent, the CFR vectors for different users are asymptotically orthogonal and therefore we have

In practical systems, however, the antenna number is always finite, and the channels at different antennas can be correlated when placing so many antennas in a small area. In such case,

where can be viewed as a perturbation matrix. When scaled by a factor , we have

where . Using the Taylor expansion, the inverse of (Equation 19) can be expressed by

Substituting (Equation 20) into (Equation 3), we can obtain

where denotes the precoding matrix with the -th order Taylor expansion. Exploiting the relation between consecutive expansion orders, we have

Substituting (Equation 22) into (Equation 21),
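For readers who want the chain of steps in one place, the derivation can be sketched as follows. The notation is assumed for this sketch and is not taken from the source: $\mathbf{H}$ is the $K \times M$ channel matrix on one subcarrier, $\mathbf{G} = \mathbf{H}\mathbf{H}^{H}$, $\mu$ is the step size, and large-scale fading and any normalization by the antenna number are absorbed into $\mu$.

```latex
% Assumed notation: H is K x M, G = H H^H, mu is the step size.
\begin{align}
  \mathbf{W} &= \mathbf{H}^{H}\mathbf{G}^{-1},
     \qquad \mathbf{G} = \mathbf{H}\mathbf{H}^{H}, \\
  \mathbf{G}^{-1} &= \mu\,(\mu\mathbf{G})^{-1}
     = \mu \sum_{p=0}^{\infty} (\mathbf{I}_K - \mu\mathbf{G})^{p},
     \qquad \text{valid when the spectral radius of }
     \mathbf{I}_K - \mu\mathbf{G} \text{ is below } 1, \\
  \mathbf{W}^{(P)} &= \mu\,\mathbf{H}^{H}
     \sum_{p=0}^{P} (\mathbf{I}_K - \mu\mathbf{G})^{p}, \\
  \mathbf{W}^{(P+1)} &= \mu\,\mathbf{H}^{H}
     + \mathbf{W}^{(P)}(\mathbf{I}_K - \mu\mathbf{G})
     = \mathbf{W}^{(P)}
     + \mu\,\mathbf{H}^{H}\bigl(\mathbf{I}_K - \mathbf{H}\mathbf{W}^{(P)}\bigr).
\end{align}
```

The last equality uses the fact that $\mathbf{W}^{(P)}$ is $\mathbf{H}^{H}$ times a polynomial in $\mathbf{G}$, so $\mathbf{W}^{(P)}\mathbf{H}\mathbf{H}^{H} = \mathbf{H}^{H}\mathbf{H}\mathbf{W}^{(P)}$. The result is an update that needs only matrix products and the current residual $\mathbf{I}_K - \mathbf{H}\mathbf{W}^{(P)}$.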

## 8 Derivation of (Equation 9)

Taking the inverse DFT on both sides of (Equation 7), we have

To proceed, we can derive that

where denotes the circular convolution and denotes a circulant matrix constructed using . Substituting (Equation 24) into (Equation 23), we have

which can be rewritten in a scalar form as

In general, the channel length, , is much smaller than the FFT size. In other words, the power of CIR, , may concentrate only on the taps at the beginning and the others are small enough and thus can be omitted. This is also the case for the precoding coefficients, , due to the correlation of frequency-domain precoding matrices. As a result, the circular convolution in (Equation 26) can be replaced by the linear convolution, leading exactly to (Equation 9).
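The final step, replacing circular by linear convolution when both sequences are short compared with the FFT size, can be verified with a small sketch. The sizes and the tap layout are hypothetical: the CIR occupies delays `0..L-1`, the precoding filter occupies delays `-L..L` (negative delays stored modulo `N`), and all other taps are exactly zero, whereas the argument above only requires them to be negligible.

```python
import numpy as np


def circular_convolve(a, b):
    """N-point circular convolution via the DFT."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))


def wrapped_linear_convolve(h_taps, c_taps, c_first_lag, n):
    """Linear convolution of short sequences, placed on a length-n circle."""
    lin = np.convolve(h_taps, c_taps)
    out = np.zeros(n, dtype=complex)
    for i, value in enumerate(lin):
        out[(c_first_lag + i) % n] += value  # delay of lin[i] is c_first_lag + i
    return out


rng = np.random.default_rng(5)
N, L = 64, 5  # hypothetical FFT size and channel length
h_taps = rng.standard_normal(L) + 1j * rng.standard_normal(L)
c_taps = rng.standard_normal(2 * L + 1) + 1j * rng.standard_normal(2 * L + 1)

# Embed the short sequences into length-N vectors.
h = np.zeros(N, dtype=complex)
h[:L] = h_taps
c = np.zeros(N, dtype=complex)
for j, lag in enumerate(range(-L, L + 1)):
    c[lag % N] = c_taps[j]

circ = circular_convolve(h, c)
lin = wrapped_linear_convolve(h_taps, c_taps, -L, N)
```

Because the combined support (`3 * L` delays here) is shorter than `N`, nothing wraps around onto itself and the two results coincide.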

## 9 Derivation of (Equation 10)

By subtracting on both sides of (Equation 4) and then multiplying , we obtain

Denote

where is a unitary matrix, with being a diagonal matrix, and is a unitary matrix where includes the first columns and includes the last columns. Then, we have

Substituting (Equation 29) into (Equation 27), we can obtain

Using the recursive relation in (Equation 30), we can derive that

Recall that

Therefore,

As a result, we have since . Using this relation, (Equation 31) can be simplified as

Therefore, can be expressed by