Design and Evaluation of a Novel Short Prototype Filter for FBMC/OQAM Modulation
Abstract
Filter-Bank Multi-Carrier with Offset Quadrature Amplitude Modulation (FBMC/OQAM) is considered by recent research projects as one of the key enablers for the future 5G air interface. It exhibits a better spectral shape and improved mobility support compared to Orthogonal Frequency-Division Multiplexing (OFDM), thanks to the use of a time- and frequency-localized prototype filter. The choice of this filter is crucial for FBMC/OQAM, due to its substantial impact on the achieved performance and complexity levels. In the context of 5G, short frame sizes are foreseen in several communication scenarios to reduce system latency, and therefore short filters are preferred. In this context, a novel short filter allowing for near perfect reconstruction and having the same size as one OFDM symbol is proposed. Using a Frequency-Spread (FS) implementation for the FBMC/OQAM receiver, analysis and simulation results show that the proposed filter exhibits better robustness to several types of channel impairments when compared to State-of-The-Art (SoTA) prototype filters and OFDM modulation. In addition, an FS-based hardware architecture of the filtering stage is proposed, showing lower complexity than the classical PolyPhase Network (PPN)-based implementation.
I Introduction
Next generation mobile communication systems are foreseen to provide ubiquitous connectivity and seamless service delivery in all circumstances. The expected large number of devices and the coexistence of human-centric and machine-type applications will lead to a large diversity of communication scenarios and characteristics [1]. In this context, many advanced communication techniques are under investigation. Taken individually, each one of these techniques is suitable for a subset of the foreseen communication scenarios.
Filter-Bank Multi-Carrier with Offset Quadrature Amplitude Modulation (FBMC/OQAM), or FBMC for short, is being studied and considered by recent research projects as a key enabler for the future flexible 5G air interface [2]. It exhibits a better spectrum shape than traditional Orthogonal Frequency-Division Multiplexing (OFDM) and enables better spectrum usage and improved mobility support. This is possible thanks to the use of a Prototype Filter (PF), which improves the time and/or frequency localization properties of the transceiver. Orthogonality is preserved in the real domain (as opposed to the complex domain) with the OQAM scheme. Furthermore, FBMC implementation relies on the Fast Fourier Transform (FFT), similarly to OFDM, with an additional low-complexity PolyPhase Network (PPN) filtering stage.
However, the choice of the PF is crucial for the design of an efficient FBMC modulation. In fact, the time/frequency localization of this filter can impact significantly the different performance levels [3] and the frame structure of the communication system. Furthermore, the length of the PF impacts considerably the transceiver complexity. Thus, the careful design of new PFs is of high interest to improve robustness of FBMC against channel impairments and to support the constraints imposed by various 5G scenarios.
In this context, a novel short PF is proposed. It is obtained by inverting the time and frequency lattice of the Filter-Bank (FB) impulse response of a long PF. Due to its near perfect reconstruction property and its length of one OFDM symbol, it is denoted by Near Perfect Reconstruction 1 (NPR1) in this paper.
An in-depth technical analysis and a comparison with existing short PFs are performed in terms of power spectral density, robustness to timing and frequency offsets, and robustness to multipath channel impairments for different wireless channel models. Both PPN and Frequency Spread (FS) implementations of the FBMC receiver, respectively referred to as PPN-FBMC and FS-FBMC in this paper, are considered. It is shown that:

the NPR1 PF achieves improved robustness against all the considered channel impairments when the FS-FBMC receiver is considered,

the FS-FBMC receiver offers improved robustness against timing offset and multipath impairments when compared to the PPN-FBMC receiver,

by exploiting the different properties of the PFs, a substantial reduction in hardware complexity can be achieved for the FS-FBMC receiver.
In particular, it is shown in this paper that the hardware complexity of the FS-FBMC receiver is lower than that of the PPN-FBMC receiver when the NPR1 PF is considered.
The rest of the paper is organized as follows. Section II provides a technical description of the FBMC modulation with different types of implementation. Section III is dedicated to the presentation of the proposed novel short PF along with existing State-of-The-Art (SoTA) ones. Section IV evaluates and compares the performance of all considered PFs with several channel impairments. Section V presents the proposed hardware architecture of the FS filtering stage and illustrates the complexity reduction with respect to the PPN-FBMC architecture. Finally, Section VI concludes the paper.
II FBMC/OQAM system description
FBMC is a multicarrier transmission scheme that introduces a filterbank to enable efficient pulse shaping for the signal conveyed on each individual subcarrier. This additional element represents an array of bandpass filters that separate the input signal into multiple components or subcarriers, each one carrying a single frequency subband of the original signal. As a promising variant of filtered modulation schemes, FBMC, originally proposed in [4] and also called OFDM/OQAM [5] or staggered modulated multitone (SMT) [6], can potentially achieve a higher spectral efficiency than OFDM since it does not require the insertion of a Cyclic Prefix (CP). Additional advantages include robustness against highly variant fading channel conditions and imperfect synchronization, obtained by selecting the appropriate PF type and coefficients [3]. Such a transceiver structure usually requires a higher implementation complexity, related not only to the filtering steps but also to the modifications applied to the modulator/demodulator architecture. However, the usage of digital polyphase filter bank structures [7][5], together with the rapid growth of digital processing capabilities in recent years, has made FBMC a practically feasible approach.
In the literature, two types of implementation for the FBMC modulation exist, each having different hardware complexity and performance. The first one is the PPN implementation [8], illustrated in Figure 1, which is based on an IFFT and a PPN for the filtering stage, and enables a low complexity implementation of the FBMC transceiver.
The second type of implementation is the FS implementation (Figure 1), proposed in [9][10] for the Martin-Mirabbasi-Bellanger PF with an overlapping factor equal to 4 (MMB4), considered for FBMC during the PHYDYAS project [11]. The original idea was to shift the filtering stage into the frequency domain, in order to enable the use of a low-complexity per-subcarrier equalizer as in OFDM. The hardware complexity is expected to be higher than that of the PPN implementation, at least for long PFs. In fact, it requires one FFT of size $KM$ per FBMC symbol, where $K$ is the overlapping factor of the PF, and $M$ is the total number of available subcarriers. However, in the short PF case ($K = 1$), the size of the FFT is the same as for the PPN implementation.
The rest of the section provides a mathematical background of the PPN-FBMC transceiver and the FS-FBMC receiver.
II-A PolyPhase Network-based implementation
If $M$ is the total number of available subcarriers and $a_n(m)$ the Pulse-Amplitude Modulation (PAM) symbol at subcarrier index $m$ and time slot $n$, then the baseband signal $s(k)$ can be mathematically decomposed as follows:
$$s(k) = \sum_{n=-\infty}^{+\infty} g\left(k - n\frac{M}{2}\right) x_n(k), \qquad (1)$$
$$x_n(k) = \sum_{m=0}^{M-1} (-1)^{nm}\, a_n(m)\, \phi_n(m)\, e^{i 2\pi \frac{km}{M}}, \qquad (2)$$
with $i^2 = -1$. To keep the orthogonality in the real field, $\phi_n(m)$ must be a quadrature phase rotation term. In the literature, it is generally defined as $\phi_n(m) = i^{n+m}$, as in [3]. The impulse response of the PF is $g$, with $g(l) = 0$ when $l \notin [0, L-1]$, where $L$ is the length of the PF. In practice, the PPN-FBMC transmitter is implemented using an IFFT of size $M$ followed by a PolyPhase Network. When a short PF is used (overlapping factor $K = 1$), the latter can be seen as a windowing operation: the outputs of the IFFT are simply multiplied by the PF impulse response $g$. Consequently, the complexity overhead introduced by the PPN is limited. Note that, due to the OQAM scheme, the obtained FBMC symbol overlaps with both the previous and next symbols over half of the symbol length. Therefore, for practical implementation, FBMC symbols may be generated in parallel. It is however possible to avoid the use of two IFFT blocks at the transmitter side through the use of the pruned FFT algorithm. This leads to a reduced-complexity implementation presented in [12] and [13].
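The windowing view of the short-PF transmitter can be illustrated with a minimal sketch. It assumes an overlapping factor of 1, a time index counted from the start of each FBMC symbol, the phase term $i^{n+m}$, and a sine-shaped window used purely as a stand-in PF; the function names `ifft_stage` and `ppn_transmit` are hypothetical and not taken from the paper.

```python
import cmath
import math

PHASES = (1 + 0j, 1j, -1 + 0j, -1j)  # i**0 .. i**3


def ifft_stage(a_n, n):
    """Unnormalised M-point inverse DFT of the phase-rotated PAM symbols."""
    M = len(a_n)
    rotated = [a_n[m] * PHASES[(n + m) % 4] for m in range(M)]
    return [
        sum(rotated[m] * cmath.exp(2j * math.pi * k * m / M) for m in range(M))
        for k in range(M)
    ]


def ppn_transmit(symbols, g):
    """Short-PF transmitter: window each IFFT output by g, overlap-add at M/2."""
    M = len(g)
    s = [0j] * ((len(symbols) + 1) * M // 2)
    for n, a_n in enumerate(symbols):
        x_n = ifft_stage(a_n, n)
        for k in range(M):
            # k is counted from the start of symbol n (assumption of this sketch)
            s[n * M // 2 + k] += g[k] * x_n[k]
    return s


M = 8
g = [math.sin(math.pi * k / M) for k in range(M)]  # stand-in PF (assumption)
symbols = [[1, -1, 1, 1, -1, 1, -1, -1], [-1, -1, 1, -1, 1, 1, -1, 1]]
s = ppn_transmit(symbols, g)
print(len(s))  # (2 + 1) * 8 // 2 = 12 samples
```

Each FBMC symbol occupies $M$ samples but advances the output by only $M/2$, which is the half-symbol overlap mentioned above.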
The receiver side applies the dual of the operations performed by the constituent blocks of the transmitter. The IFFT must be replaced by an FFT, and the order of the operations must be reversed: PPN (windowing if the overlapping factor $K = 1$), FFT, then OQAM demapper, as shown in Figure 1. If $r$ is the received signal and $\hat{a}_n(m)$ are the recovered PAM symbols, then we have:
(3)  
(4)  
(5) 
where represents the complex conjugate operation, and is the Zero Forcing (ZF) equalizer coefficient to compensate the impairments introduced by the channel. Note that, contrary to transmitter side, doubling the FFT processing cannot be avoided using the pruned FFT algorithm. The main reason is that the equalization term introduces complex valued coefficients.
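As an illustration of the dual transmitter and receiver chains, the following sketch runs a noiseless loopback for a short PF. It is not the authors' implementation: it assumes an overlapping factor of 1, a sine-shaped stand-in PF, the phase term $i^{n+m}$, and an optional per-subcarrier equalizer coefficient list `eq` (a hypothetical parameter name).

```python
import cmath
import math

PHASES = (1 + 0j, 1j, -1 + 0j, -1j)  # i**0 .. i**3


def dft(x, sign):
    """Plain O(M^2) DFT (sign=-1) or unnormalised inverse DFT (sign=+1)."""
    M = len(x)
    return [
        sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / M) for k in range(M))
        for m in range(M)
    ]


def transmit(symbols, g):
    """IFFT, windowing by g, then overlap-add with a hop of M/2 samples."""
    M = len(g)
    s = [0j] * ((len(symbols) + 1) * M // 2)
    for n, a_n in enumerate(symbols):
        x_n = dft([a_n[m] * PHASES[(n + m) % 4] for m in range(M)], +1)
        for k in range(M):
            s[n * M // 2 + k] += g[k] * x_n[k]
    return s


def receive(r, g, num_symbols, eq=None):
    """Windowing by g, FFT, equalization, phase de-rotation, real part."""
    M = len(g)
    out = []
    for n in range(num_symbols):
        u_n = [g[k] * r[n * M // 2 + k] for k in range(M)]
        U_n = dft(u_n, -1)
        row = []
        for m in range(M):
            c = 1.0 if eq is None else eq[m]
            y = U_n[m] * c * PHASES[(n + m) % 4].conjugate()
            row.append(y.real * 2 / M)  # 2/M normalises the sine window energy
        out.append(row)
    return out


M = 8
g = [math.sin(math.pi * k / M) for k in range(M)]  # stand-in PF (assumption)
a = [[1, -1, 3, 1, -3, 1, -1, -1],
     [-1, 3, 1, -1, 1, 1, -3, 1],
     [1, 1, -1, -3, 3, -1, 1, 1]]
a_hat = receive(transmit(a, g), g, len(a))
max_err = max(abs(a_hat[n][m] - a[n][m]) for n in range(len(a)) for m in range(M))
print(max_err < 1e-9)
```

The interference from adjacent symbols and subcarriers falls entirely on the imaginary part, so taking the real part recovers the PAM symbols for this particular window.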
II-B FS-FBMC receiver description
The FS-FBMC implementation is generally considered at the receiver side, since it enables a low-complexity and efficient equalization scheme [14][15]. It remains perfectly compatible with the PPN implementation at the transmitter side. The received symbols are expressed as follows:
(6)  
(7)  
(8) 
where $G$ is the frequency response of the PF. The FS-FBMC receiver first applies an FFT of size $KM$ on the part of the signal containing the FBMC symbol to demodulate, in order to obtain the signal in the frequency domain (6). Then, it introduces a filtering stage in the frequency domain, as described in (7): the signal is convolved with the frequency response of the PF, for instance using a Finite Impulse Response (FIR) filter. Finally, the recovered PAM symbols are obtained by extracting the real part of the quadrature phase rotated and downsampled samples (8).
These operations are summarized and illustrated in Figure 1. The FS-FBMC implementation seems highly complex; however, $G$ has a lot of zero coefficients due to its frequency localization. Therefore, it can be truncated down to $N_G$ coefficients. Then, by defining $\Delta = (N_G - 1)/2$ ($N_G$ is considered an odd number), (7) becomes:
(9)  
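The equivalence between time-domain windowing and frequency-domain filtering for a short PF can be checked numerically with the sketch below. It assumes an overlapping factor of 1 (so no downsampling is needed), a sine-shaped stand-in PF, and a hypothetical helper `fs_filter` whose optional argument `n_g` keeps only the centred coefficients of the frequency response.

```python
import cmath
import math


def dft(x):
    """Plain O(M^2) forward DFT."""
    M = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * k * m / M) for k in range(M))
        for m in range(M)
    ]


def fs_filter(R, G, n_g=None):
    """Circularly convolve R with the n_g centred coefficients of G (all if None)."""
    M = len(R)
    delta = M // 2 if n_g is None else (n_g - 1) // 2
    taps = sorted({l % M for l in range(-delta, delta + 1)})
    return [sum(G[l] * R[(m - l) % M] for l in taps) / M for m in range(M)]


M = 8
g = [math.sin(math.pi * k / M) for k in range(M)]  # stand-in PF (assumption)
r = [complex(math.cos(0.7 * k), math.sin(1.3 * k)) for k in range(M)]

windowed = dft([g[k] * r[k] for k in range(M)])   # PPN-style: window, then FFT
spread = fs_filter(dft(r), dft(g))                # FS-style: FFT, then filter
truncated = fs_filter(dft(r), dft(g), n_g=3)      # FS-style with 3 coefficients

print(max(abs(windowed[m] - spread[m]) for m in range(M)) < 1e-9)
```

With all coefficients kept, both paths give the same result up to rounding; keeping only `n_g` coefficients trades accuracy for fewer multiplications per output sample.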
III Proposal of a novel short prototype filter
Current literature often focuses on FBMC using a PF with a duration 4 times larger than an OFDM symbol (overlapping factor $K = 4$), like MMB4 or Isotropic Orthogonal Transform Algorithm 4 (IOTA4) [16]. However, a shorter PF can also be applied, as proposed in [17] with the Time-Frequency Localization 1 (TFL1) PF. Another example is the Quadrature Mirror Filter 1 (QMF1) [18], which was recently applied to FBMC, leading to a variant denoted by Lapped-OFDM modulation and presented in [19]. In the rest of this paper, PFs with a duration larger than one OFDM symbol will be referred to as long PFs (e.g. MMB4, IOTA4), and the ones with a duration of one OFDM symbol as short PFs (e.g. TFL1, QMF1). When compared to long PFs, short PFs provide lower latency, higher robustness against Carrier Frequency Offset (CFO) and higher spectral efficiency due to the shortened transition between two successive transmission frames. In this context, finding a new short PF with good performance and low hardware complexity becomes a challenging task of high interest. In the following, after a short description of the two existing short PFs in the literature, we present a novel short PF design that shows significant advantages in terms of performance and complexity.
III-A TFL1 and QMF1 prototype filters
The TFL1 PF was the first attempt to specifically design a time- and frequency-localized short PF for the FBMC modulation [17]. It is the best-known short PF in the literature. Indeed, it is already integrated into proof-of-concept hardware platforms [20][21]. The analytical expression of the TFL1 PF [22] is given by:
where with , and
being defined in Table I. The second half of the PF coefficients are constructed by symmetry: for .
Regarding the QMF1 PF [18], it was applied to FBMC, leading to a variant denoted by Lapped-OFDM modulation presented in [19]. The analytical expression of the QMF1 PF is given below:
(10) 
III-B Proposed Near Perfect Reconstruction filter
This subsection describes a novel short PF representing one of the major contributions of this paper. The main design procedure of the proposed PF starts by inverting the time and frequency axes of the FilterBank (FB) impulse response of the MMB4 PF. The coefficients of this FB impulse response are given in [23] and presented in Table II. It can be seen that the FB impulse response of the MMB4 PF is highly localized in frequency since interference is limited only to one adjacent subcarrier (indexes and ) in the frequency plane. Inverting the time () and frequency () axes will generate a PF highly localized in time, since the obtained FB impulse response coefficients have values only at the adjacent FBMC symbols ( and ). Therefore, a PF with an overlapping factor of is sufficient to obtain these coefficients. Consequently, the PF coefficients can be deduced from a given FB impulse response.
By definition, the FB impulse response is composed of the values obtained at the output of the receiver from (7) by setting and when . In this case, we have , and in (4) becomes:
(11) 
Furthermore, we have , where is the FB impulse response coefficients of the MMB4 PF presented in Table II. Particularly, for , we have:
(12) 
Thus, can be deduced as follows:
(13) 
Then, the design procedure introduces simplifications to obtain a simpler analytical expression, by taking advantage of the real valued and symmetrical coefficients:
(14)  
We call the resulting short PF, with an overlapping factor equal to 1, the Near Perfect Reconstruction 1 (NPR1) PF due to its nature, similar to that of the MMB4 PF. To analytically calculate the residual interference, the recovered PAM symbols must be expressed by taking into account the transmitted symbols and the effect of the PF at the transmitter. Thus, by setting and by integrating equations (1), (3) and (4) into (5), we have:
Due to the time localization of the PF, we have , being the number of FBMC symbols acting as interference after (or before) the FBMC symbol currently demodulated. Typically, we have for short PFs, since the impulse response of is equal to zero after samples. Then, we have:
(16)  
with
In (16), the term corresponding to the FFT of can be rewritten by a circular convolution operation denoted by , as follows:
with
where and are the results of the application of a FFT to the terms and , respectively expressed as:
(20) 
and,
Finally, the expression of the recovered PAM symbol becomes:
(21)  
with . In fact, is the FB impulse response of , and for the NPR1 PF, we have . If symbols are independent and identically distributed random variables and , then the residual interference of the PF can be evaluated as follows:
(22)  
with SIR being the Signal-to-Interference Ratio and . For , and using the coefficients presented in [10] to design the MMB4 PF and the related FB impulse response, we have and the obtained SIR is dB for the proposed NPR1 PF. This SIR has the same order of magnitude as the SIR of the MMB4 PF ( dB [23][10]), confirming the near perfect reconstruction nature of the proposed short filter.
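The notion of FB impulse response and residual interference can be explored numerically with a short sketch: a single unit PAM symbol is transmitted, every time-frequency position is demodulated, and the SIR is taken as the ratio between the squared centre value and the sum of the squared real-valued leakage terms. This SIR definition, the sine-shaped and Hann-shaped stand-in windows, and the helper names are assumptions of the sketch; none of them is the NPR1 filter or the exact expression used in this paper.

```python
import cmath
import math

PHASES = (1 + 0j, 1j, -1 + 0j, -1j)  # i**0 .. i**3


def fb_impulse_response(g, span=1):
    """Real part of the receiver outputs around a unit impulse at (n0, m0)."""
    M = len(g)
    n0, m0 = span, M // 2
    num = 2 * span + 1
    s = [0j] * ((num + 1) * M // 2)
    for k in range(M):  # transmit a single PAM symbol equal to 1
        carrier = cmath.exp(2j * math.pi * k * m0 / M)
        s[n0 * M // 2 + k] += g[k] * PHASES[(n0 + m0) % 4] * carrier
    table = []
    for n in range(num):  # demodulate every (n, m) position
        row = []
        for m in range(M):
            U = sum(
                g[k] * s[n * M // 2 + k] * cmath.exp(-2j * math.pi * k * m / M)
                for k in range(M)
            )
            row.append((U * PHASES[(n + m) % 4].conjugate()).real)
        table.append(row)
    return table, (n0, m0)


def sir_db(g):
    """SIR in dB: centre power over the power leaked to all other positions."""
    table, (n0, m0) = fb_impulse_response(g)
    signal = table[n0][m0] ** 2
    interference = sum(
        v * v
        for n, row in enumerate(table)
        for m, v in enumerate(row)
        if (n, m) != (n0, m0)
    )
    return 10 * math.log10(signal / max(interference, 1e-300))


M = 8
sine = [math.sin(math.pi * k / M) for k in range(M)]  # orthogonal stand-in PF
hann = [v * v for v in sine]                          # non-orthogonal window
print(sir_db(sine) > 100, sir_db(hann) < 20)
```

The sine window leaves only rounding-level leakage, whereas the Hann window leaks real-valued energy onto subcarriers two positions away, which is why its SIR stays low.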
IV Performance evaluation
This section evaluates and compares the performance of the proposed NPR1 short PF with respect to SoTA ones. It provides a comparison of different FBMC short PFs, including the proposed one, in terms of spectral usage and SIR when applying a truncated FS implementation. Their robustness against several types of channel impairments is also evaluated and compared with OFDM, for both PPN and FS implementations. These impairments include timing synchronization errors, carrier frequency offset and the use of a multipath channel.
IV-A Comparison of out-of-band power leakage
One of the main advantages of FBMC over OFDM resides in its spectrum shape with low Out-of-Band Power Leakage (OOBPL). Consequently, a shorter guard-band can be used to fit the Adjacent Channel Leakage Ratio (ACLR) constraints and to support relaxed synchronization communication services. In general, long PFs have lower OOBPL than short PFs, but lose the other advantages of short PFs listed in Section III.
However, depending on the chosen short PF, the spectral characteristics may vary. The frequency response of the QMF1 PF, depicted in Figure 2a, has the secondary lobes with the highest amplitude, followed by the TFL1 and then the NPR1 PF. This explains the differences in OOBPL depicted in Figure 3. Indeed, this figure shows the power spectral density of OFDM and FBMC with different short PFs, on a MHz bandwidth. Simulation parameters correspond to a 4G/LTE setting where a notch of subcarriers, or 1 Resource Block (RB), was inserted in the spectrum to evaluate the capacity to support fragmented spectrum for asynchronous communication services.
As expected, the OOBPL is extremely low for FBMC: a gap of dB can be observed between OFDM and FBMC at the extreme edges of the bandwidth, independently of the PF used. For the NPR1 case, the OOBPL decreases more quickly than for the other PFs, since it has the lowest secondary lobes (Figure 2a). Inside the notch, a gap of dB can be observed between OFDM and FBMC with the NPR1 PF, and a difference of dB between this PF and QMF1. These results demonstrate that, despite using a short PF, the OOBPL is still very low for FBMC when compared to OFDM, even in a fragmented band. In conclusion, NPR1 represents the most suitable short PF to respect high ACLR constraints.
IV-B Truncation impact on the frequency response of the filter
The frequency response of the PF can be truncated to reduce the complexity of the FS-FBMC receiver. By truncating the frequency response of the PF at the receiver side, interference may appear due to a non-perfect reconstruction, resulting in performance degradation. However, if $N_G$, the number of non-zero coefficients, is too high, the resulting FS implementation will require significant hardware complexity. A compromise between complexity and performance must be devised.
Equation (22) can be adapted to evaluate the residual interference introduced by the truncation. In this case, the PF impulse response is replaced by the truncated one in (IIIB), where is expressed as follows:
(23) 
The values of are obtained by computing an IFFT of size on the nonzero coefficients of . Then, using similar mathematical development as described in Subsection IIIB from (IIIB), the analytic expression of the SIR is:
(24) 
where is the FB impulse response using the PF at the transmitter side and the PF at the receiver side. It is expressed as follows:
(25) 
The obtained numerical values of the SIR are presented in Figure 4 for different PFs and corresponding $N_G$ values. The analytical results of (24) have also been confirmed by simulations. Table III summarizes the number of non-zero coefficients needed for a SIR target ranging between 50 and 70 dB, depending on the PF used.
Table III: Number of non-zero coefficients needed to reach a target SIR, for each PF.
SIR (dB)   TFL1   NPR1   QMF1
50         23     7      29
55         31     7      41
60         45     15     59
65         65     23     83
70         91     35     115
The dB target SIR may be interesting to consider as it corresponds to the nearly perfect reconstruction case of the MMB4 PF. However, this target requires a large number of coefficients ( for NPR1). In practice, a SIR due to truncation of dB may be sufficient since channel impairments already degrade the resulting SIR, as illustrated in the next subsections. For the rest of the paper, is chosen so that each PF has the same SIR of dB, enabling a fair comparison. Therefore, we have:

for TFL1.

for NPR1.

for QMF1.
The TFL1 and QMF1 PFs require more than nonzero coefficients to obtain this SIR target for a FS implementation. Such high number of coefficients may not be acceptable if a low complexity receiver is targeted. For the NPR1 PF, only coefficients are required to achieve a SIR up to dB, making it better suited for the FS implementation. It is worth noting that a convolution operation using coefficients may appear too complex to implement in practice. Therefore, a lowcomplexity hardware architecture is proposed in Section V to address this aspect.
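The truncation step itself can be sketched as follows: the frequency response is computed, only the centred coefficients are kept, and the truncated impulse response is recovered with an inverse transform. The sketch assumes an overlapping factor of 1 and a sine-shaped stand-in PF; the helper names are hypothetical, and the energy of the truncation error is used here only as a simple proxy for the interference analysed above.

```python
import cmath
import math


def truncate_frequency_response(g, n_g):
    """Keep the n_g centred DFT coefficients of g and return the resulting taps."""
    M = len(g)
    G = [
        sum(g[k] * cmath.exp(-2j * math.pi * k * l / M) for k in range(M)) / M
        for l in range(M)
    ]
    delta = (n_g - 1) // 2
    kept = [G[l] if min(l, M - l) <= delta else 0j for l in range(M)]
    # The kept set is symmetric, so the inverse transform is real-valued.
    return [
        sum(kept[l] * cmath.exp(2j * math.pi * k * l / M) for l in range(M)).real
        for k in range(M)
    ]


def truncation_error(g, n_g):
    """Energy of the difference between g and its truncated version."""
    g_t = truncate_frequency_response(g, n_g)
    return sum((g[k] - g_t[k]) ** 2 for k in range(len(g)))


M = 32
g = [math.sin(math.pi * k / M) for k in range(M)]  # stand-in PF (assumption)
errors = [truncation_error(g, n_g) for n_g in (3, 7, 15, 33)]
print(errors[0] > errors[1] > errors[2] > errors[3])
```

The error energy can only shrink as more coefficients are kept, which is the complexity versus performance compromise discussed in this subsection.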
IV-C Robustness to timing offset
Timing offset impairment occurs when the transmitter and receiver baseband samples are not perfectly aligned in time. This is always the case in practice, since the channel introduces a propagation delay. Therefore, timing synchronization algorithms must be employed. In the LTE uplink case, timing synchronization is realized using the timing advance mechanism [24] to compensate for the propagation delay of each User Equipment (UE) located at a different geographical distance from the base station. However, new highly demanding scenarios like massive machine communications are considered in 5G. To reduce energy consumption and to improve spectral usage, the timing advance mechanism should be avoided and relaxed synchronization should be supported, where the propagation delay of each UE is not compensated. Therefore, synchronization errors appear, which cause two types of impairments:

Linear phase rotation for each subcarrier due to the additional delay. This effect can be totally compensated after channel estimation and equalization. Indeed, if is the time offset in number of samples, then the frequency domain compensation term is expressed as .

Inter-Symbol and Inter-Carrier Interference (ISI and ICI) due to PF misalignment between the transmitter and the receiver.
It is considered, in this paper, that the OFDM signal is synchronized () at the middle of its cyclic prefix. If , where is the length of the cyclic prefix, then orthogonality is perfectly restored, since a circular shift in time domain represents a linear phase rotation in frequency domain. In 4G/LTE, for OFDM. Thus, orthogonality is still guaranteed if , where represents the absolute value operator.
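The statement that a circular shift in the time domain is a linear phase rotation in the frequency domain can be verified with a few lines. The sketch assumes a delay of `l_d` samples applied circularly; the sign convention of the compensation term and the helper names are assumptions of this example.

```python
import cmath
import math


def dft(x):
    """Plain O(M^2) forward DFT."""
    M = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * k * m / M) for k in range(M))
        for m in range(M)
    ]


def delay_circular(x, l_d):
    """Delay x by l_d samples with wrap-around."""
    M = len(x)
    return [x[(k - l_d) % M] for k in range(M)]


def compensate_timing(R, l_d):
    """Undo the linear phase caused by a circular delay of l_d samples."""
    M = len(R)
    return [R[m] * cmath.exp(2j * math.pi * m * l_d / M) for m in range(M)]


M, l_d = 16, 3
x = [complex(math.cos(0.9 * k), math.sin(0.4 * k * k)) for k in range(M)]
X = dft(x)
X_comp = compensate_timing(dft(delay_circular(x, l_d)), l_d)
print(max(abs(X[m] - X_comp[m]) for m in range(M)) < 1e-9)
```

For OFDM, the cyclic prefix makes a small timing offset look like such a circular shift, which is why a per-subcarrier phase correction fully restores orthogonality in that case.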
Due to the absence of CP in a FBMC system, timing offset will result in unavoidable performance degradation. However, depending on the use of PPN or FS implementation, results are different due to the application of different timing offset compensation techniques. For the FS implementation, the compensation step lies between the FFT and the FS filtering stage, whereas in the PPN case it is performed after the PPN and the FFT.
For the PPN-FBMC case, the SIR expression in (16) can be adapted to obtain the expression of the recovered PAM symbol when a timing offset of samples is applied, as follows:
and the expression of the SIR becomes:
(26) 
Note that the number of FBMC symbols acting as interference denoted by must be set to . Indeed, when a timing offset is considered, the FBMC symbols at and are now acting as interference. Therefore, the obtained numerical values are similar to that obtained by simulation in Figure 5. Concerning the FSFBMC receiver, it has been evaluated in [25], where the following expression is obtained:
(27) 
where is the power of the residual interference when . In our case, the residual interference comes from the NPR nature of the PF and the truncation applied on the filter coefficients. Therefore, we have:
Figure 5 shows SIR values for each considered short PF using the FS-FBMC receiver. The parameters used for the PFs are those defined in Subsection IV-B. It is clear that, regardless of the PF used, the FS implementation outperforms the PPN implementation, a result that was already demonstrated for the case of the MMB4 PF [15] and the QMF1 PF [19]. When using the PPN implementation, the timing offset error compensation is done in the frequency domain, thus after the filtering stage. This lowers the compensation efficiency, causing ICI and ISI as mentioned above. In the case of the FS implementation, the compensation is more efficient since the filtering stage is performed in the frequency domain, after compensation of the timing offset error. This explains the gap in performance between the PPN and FS implementations.
A gain of more than dB can be observed with NPR1 PF when compared to QMF1 PF for timing offset inferior to 5%, and dB when compared to TFL1 PF. Around dB of difference is visible between TFL1 and QMF1 PFs. From (27), it is clear that the NPR1 achieves a higher SIR than the other PFs. Figure 2b shows the impulse response of each PF. It can be observed that the amplitude of the NPR1 impulse response is lower at its edges. Therefore, the term has the lowest value for NPR1, confirming its higher robustness against timing offset error than the other PFs.
Concerning OFDM, it is clearly outperformed by FSFBMC implementation when . A gap of at least dB can be observed between FBMC with NPR1 PF and OFDM. For lower timing offset impairments, FBMC still exhibits acceptable performance since the SIR remains superior to dB for the NPR1 PF.
These results, validated by simulation, point out that the proposed NPR1 is the most interesting PF to combat timing offset impairment due to imperfect timing synchronization. This is particularly interesting to fulfill the relaxed synchronization requirement foreseen in specific 5G communication scenarios like massive machine communications.
IV-D Robustness to frequency offset
Frequency offset impairment is a common issue in communication systems, and it is the consequence of the transmitter and/or receiver being in a situation of mobility (Doppler Shift/Spread). It also appears when there is a frequency misalignment in local oscillators of the transmitter and the receiver. Mathematically, it corresponds to a linear phase rotation of the received baseband samples. In 4G/LTE downlink case, the CFO is estimated and compensated in time domain by multiplying the received baseband samples by , where is the CFO value relative to the subcarrier spacing expressed as a percentage. However, in 4G/LTE uplink (and related 5G scenarios), it is not possible to compensate it directly in time domain since all baseband signals of all users overlap. In fact, it generates two types of impairments after demodulation:

Common Phase Error impairment (CPE). All the subcarriers in a given symbol experience a phase rotation. The rotation angle is incremented at each received symbol. It can be easily compensated in frequency domain if the CFO is estimated, as the CPE term to compensate is .

ICI due to misalignment of the transmitter and receiver PFs in frequency domain, also resulting in interuser interference (IUI) in related 5G scenarios.
The second described impairment represents a major issue, particularly for OFDM due to its low frequency localization. FBMC is naturally more robust against this type of ICI, especially when using a short PF [3]. Therefore, it is expected that FBMC has higher robustness against CFO than OFDM. This is confirmed in Figure 6, which shows the SIR performance in presence of CFO with all the considered PFs, obtained both by numerical and simulation results. The SIR expression can be obtained by adapting equation (16), as follows:
Assuming that the interference introduced by the truncation is negligible, the expression of the SIR for both FS-FBMC and PPN-FBMC receivers is:
Up to dB of SIR can be observed between OFDM and FBMC with the NPR1 PF. The latter is also the PF having the highest robustness against CFO. When compared to the TFL1 PF, a difference of dB is observed, and almost dB when compared to the QMF1 PF.
For this particular channel impairment, the PPN-FBMC and FS-FBMC receivers have similar performance. This can be explained by the fact that the compensation term only depends on the FBMC symbol index. Therefore, it can be applied either before or after the filtering stage without any mathematical difference. The only difference comes from the interference introduced by the filter truncation in FS-FBMC; however, the impact on the SIR is negligible.
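The two CFO effects discussed in this subsection can be illustrated with a minimal sketch. It models the CFO as a linear phase rotation of the baseband samples, with `eps` the offset expressed as a fraction of the subcarrier spacing (not as a percentage), and assumes FBMC symbols spaced by $M/2$ samples; the function names and sign conventions are assumptions of this example.

```python
import cmath
import math


def apply_cfo(s, eps, M):
    """Rotate sample k by 2*pi*eps*k/M, eps being relative to the subcarrier spacing."""
    return [s[k] * cmath.exp(2j * math.pi * eps * k / M) for k in range(len(s))]


def compensate_cfo(r, eps, M):
    """Time-domain compensation: apply the opposite rotation."""
    return [r[k] * cmath.exp(-2j * math.pi * eps * k / M) for k in range(len(r))]


def common_phase_error(n, eps):
    """Phase accumulated at the start of FBMC symbol n (symbols spaced by M/2)."""
    return cmath.exp(1j * math.pi * n * eps)


M, eps = 16, 0.05
s = [complex(k % 3 - 1, (k * k) % 5 - 2) for k in range(3 * M)]
r = apply_cfo(s, eps, M)
restored = compensate_cfo(r, eps, M)
print(max(abs(restored[k] - s[k]) for k in range(len(s))) < 1e-9)
```

The common phase grows linearly with the symbol index, which is consistent with a compensation term that depends only on that index.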
IV-E Performance comparison over multipath channels
In the context of the 4G/LTE standard, three multipath fading channel models are defined [26]:

Extended Pedestrian A (EPA) model: ns,

Extended Vehicular A (EVA) model: ns,

Extended Typical Urban (ETU) model: ns,
where corresponds to the delay spread of the multipath channel. The delay and power profiles of each channel model are detailed in [26]. In the 4G/LTE standard, an OFDM symbol duration is always equal to without CP, and the subcarrier spacing is always equal to kHz.
This subsection aims at evaluating the effect of these channels on the error rate performance of uncoded FBMC using different short PFs, with PPN and FS implementations. LTE parameters are considered for an IFFT length of and a 16QAM constellation. Thus, for OFDM, and active subcarriers are used, corresponding to RBs. However, the frame structure of LTE is not perfectly respected, since Demodulation Reference Signals (DM RS) [27] are not transmitted, and the Channel State Information (CSI) is considered to be perfectly known. Note that the CSI needs to be estimated in practice by sending, for instance, coded auxiliary pilots [28].
For a fair comparison, the same equalization technique is used for OFDM and FBMC. The equalization step is realized after the computation of the FFT, in the frequency domain. The output samples of the FFT are simply divided by the frequency response of the channel, realizing the classical low-complexity, per-subcarrier Zero Forcing (ZF) equalizer.
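The per-subcarrier ZF equalizer can be sketched in a few lines. The example assumes a noiseless channel acting as a circular convolution (as with a cyclic prefix in OFDM) and perfectly known channel taps; for FBMC without CP the division is the same but the result is only approximate. The three-tap channel and the helper names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Plain O(M^2) DFT; the inverse is normalised by 1/M."""
    M = len(x)
    sign = 1 if inverse else -1
    scale = 1 / M if inverse else 1
    return [
        scale * sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / M) for k in range(M))
        for m in range(M)
    ]


def circular_convolve(x, h):
    """Circular convolution of x with the channel taps h."""
    M = len(x)
    return [sum(h[j] * x[(k - j) % M] for j in range(len(h))) for k in range(M)]


def zf_equalize(Y, H):
    """Divide each FFT output by the channel frequency response."""
    return [Y[m] / H[m] for m in range(len(Y))]


M = 8
h = [1.0, 0.5, 0.25]                           # hypothetical channel taps
X = [complex(2 * (m % 2) - 1, 2 * ((m // 2) % 2) - 1) for m in range(M)]
y = circular_convolve(dft(X, inverse=True), h)  # time-domain received block
H = dft(h + [0.0] * (M - len(h)))               # channel frequency response
X_hat = zf_equalize(dft(y), H)
print(max(abs(X_hat[m] - X[m]) for m in range(M)) < 1e-9)
```

One complex division per subcarrier is all that is required, which is why this equalizer is regarded as low-complexity.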
Static (no Doppler shift/spread) multipath channels with Additive White Gaussian Noise (AWGN) are considered to only evaluate the multipath and fading effect on the performance of OFDM and FBMC demodulators in terms of Bit Error Rate (BER). Figure 7a shows the BER performance when using EPA channel models, considering PPN and FS based FBMC with different PFs. As expected, the FS implementation outperforms the PPN implementation for all the considered PFs, particularly at higher SNR values. A difference of at least one decade of BER can be observed at dB. Furthermore, the FS implementation with TFL1 and QMF1 PFs shows comparable performance to OFDM with CP. FS implementation with NPR1 offers slightly better results than OFDM at moderate Eb/No values ( dB), due to the absence of CP and its robustness against all the different types of timing impairments.
Similar conclusions can be made for a channel with a longer delay spread like EVA, as shown in Figure 7b. However, an exception should be made for the QMF1 PF, since it exhibits a performance level inferior to OFDM. On the other hand, with the NPR1 PF, FBMC remains superior to OFDM even for an EVA channel.
Due to the absence of a CP, FBMC seems to be more sensitive to long delay spread channels as it is the case for the static ETU channel model. Indeed, OFDM with CP outperforms FBMC on this type of channels when dB, as shown in Figure 7c. At low values, the FS implementation is close to OFDM, and offers better results than the PPN implementation. NPR1 is again the most interesting short PF when using a FS implementation.
In fact, when deep fading occurs, the received signal is highly degraded, as shown in Figure 8. This figure represents the SIR per subcarrier with a randomly generated ETU channel, for different PFs and implementations. In the case of flat fading in band, almost no interference occurs with the FS implementation (SIR dB). This is however not the case for the PPN implementation, where a gap of at most dB can be observed when compared to the FS implementation, confirming the superiority of the FS implementation. Finally, since the delay spread of the ETU channel model is approximately two times longer than the delay spread of the EVA channel model, one straightforward solution is to double the duration of the FBMC symbol.
V Complexity evaluation
The FS-FBMC receiver is known to be more complex than the PPN-FBMC receiver. This is mainly due to the required convolution operation with truncated coefficients, compared to the simple windowing operation of the PPN-based implementation when short PFs are used. However, additional complexity reduction is possible for the FS-based receiver thanks to the properties of the PF and the OQAM scheme. After detailing the proposed complexity reduction approach, a hardware architecture is proposed for the FS filter stage, and its complexity is evaluated and compared to that of the PPN filter stage.
VA Complexity reduction of the filter stage
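The relation between the two filter stages can be illustrated with the convolution theorem: windowing a block in the time domain and then applying an FFT gives the same result as applying the FFT first and then circularly convolving with the frequency-domain coefficients of the window. The following is a minimal numerical sketch, not the receiver described in this paper. The sine window, the block size of 16, and the function names are assumptions chosen for illustration only, and equalization is omitted.

```python
import numpy as np

def circular_convolve(a, b):
    """Direct circular convolution: out[m] = sum_k a[k] * b[(m - k) mod N]."""
    n = len(a)
    out = np.zeros(n, dtype=complex)
    for k in range(n):
        out += a[k] * np.roll(b, k)  # np.roll(b, k)[m] == b[(m - k) mod N]
    return out

def truncate_taps(taps, delta):
    """Zero every tap whose circular distance from index 0 exceeds delta."""
    n = len(taps)
    idx = np.arange(n)
    keep = np.minimum(idx, n - idx) <= delta
    return np.where(keep, taps, 0)

def windowing_then_fft(x, window):
    """PPN-like ordering for a short filter: window in time, then FFT."""
    return np.fft.fft(x * window)

def fft_then_convolution(x, window, delta=None):
    """FS-like ordering: FFT first, then circular convolution in frequency."""
    taps = np.fft.fft(window) / len(window)
    if delta is not None:
        taps = truncate_taps(taps, delta)  # truncation makes the result approximate
    return circular_convolve(np.fft.fft(x), taps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    size = 16  # hypothetical block size
    x = rng.standard_normal(size) + 1j * rng.standard_normal(size)
    window = np.sin(np.pi * (np.arange(size) + 0.5) / size)  # hypothetical window
    exact = windowing_then_fft(x, window)
    approx = fft_then_convolution(x, window, delta=2)
    print("relative error with 5 taps:",
          np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```

Without truncation both orderings agree to numerical precision. With truncation, the cost of the frequency-domain ordering scales with the number of retained taps, which is why the tap count of each PF matters in the comparison that follows.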
When a short PF is considered (), the FFT size becomes equal to , and there is no downsampling step after the filter stage. Therefore, the complexity overhead comes from the circular convolution operation, which depends on the number of filter coefficients after truncation. A circular convolution typically requires Complex Multiplications (CMs) and Real Additions (RAs) per sample. However, these resources can be reduced by exploiting the properties of the PF and the OQAM scheme. First, if and , verified for the NPR1 PF (IIIB), then we have:
(28) 
This expression shows that is real valued. Therefore, the filter stage requires now only Real Multipliers (RMs) per sample. Furthermore, can be expressed as follows:
Consequently, the output of the filter stage becomes:
where if and . One RM can be removed by rescaling the PF coefficients by . Then, (VA) becomes:
with . This scaling factor can be integrated into the equalizer coefficients without increasing complexity. Alternatively, it can be taken into account in the decision stage (QAM demapper). The rescaled PF coefficients can be computed at design time and stored in a Look-Up Table (LUT). At this step, the filter stage requires RMs per sample. This number can be further reduced since half of the circular convolution outputs are discarded due to the OQAM scheme. In this case, we have:
(32) 
where being the rescaled signal. Therefore, there is no need to process or depending on the parity of and . Only RMs and RAs per sample are now required. Additionally, only the outputs corresponding to the allocated subcarrier indexes can be considered. If the number of allocated subcarriers is denoted by , this gives RMs required per FBMC symbol.
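The saving obtained by discarding half of the outputs can be checked numerically. The sketch below assumes real-valued taps and a hypothetical selection mask `keep_real` that states, for each output index, whether the real or the imaginary part is retained. The actual parity rule is the one given by (32), and the tap values and offsets used here are placeholders, not the NPR1 coefficients.

```python
import numpy as np

def fs_filter_full(x, taps, offsets):
    """Complex circular convolution with real taps: 2 real multiplications
    per tap and per output (one for Re, one for Im)."""
    y = np.zeros(len(x), dtype=complex)
    for g, p in zip(taps, offsets):
        y += g * np.roll(x, p)  # np.roll(x, p)[m] == x[(m - p) mod N]
    return y

def fs_filter_oqam(x, taps, offsets, keep_real):
    """Compute only the retained component of each output: 1 real
    multiplication per tap and per output."""
    out = np.zeros(len(x))
    for g, p in zip(taps, offsets):
        shifted = np.roll(x, p)
        out += g * np.where(keep_real, shifted.real, shifted.imag)
    return out

def real_multiplications(num_taps, num_outputs, oqam_pruned):
    """Count real multiplications for the two variants above."""
    per_output = num_taps if oqam_pruned else 2 * num_taps
    return per_output * num_outputs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 12
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    offsets = [-1, 0, 1]       # placeholder tap positions
    taps = [0.5, 1.0, 0.5]     # placeholder real taps
    keep_real = (np.arange(n) % 2) == 0  # hypothetical parity rule
    y = fs_filter_full(x, taps, offsets)
    pruned = fs_filter_oqam(x, taps, offsets, keep_real)
    print(np.allclose(pruned, np.where(keep_real, y.real, y.imag)))
    print(real_multiplications(3, n, False), real_multiplications(3, n, True))
```

Because the taps are real, the real part of each output depends only on the real parts of the inputs, and likewise for the imaginary part, so the pruned variant needs half as many real multiplications.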
Regarding the PPN stage, it requires RMs per FBMC symbol for short PFs. Thus, when considering only multiplication operations, the complexity ratio between FS and PPN filter stages is given by , with in 4G/LTE. For QMF1 () and TFL1 () PFs, the complexity (in number of RMs) is multiplied by and respectively. This confirms that these PFs are not suitable for the FS implementation. However, for NPR1 PF (), the FS filter stage is less complex than the PPN stage.
It is worth noting that the PPN stage requires a LUT memory of depth to store the PF coefficients. Furthermore, the FS filter stage still requires additions per FBMC symbol. When targeting hardware implementation, registers must also be considered to store the input signal due to the iterative processing of the circular convolution operation. Finally, this operation uses constant filter coefficients, which do not change during the processing iterations. This property can be exploited to further reduce the complexity of a hardware implementation.
Therefore, the above ratio may not be accurate enough to reflect the comparative hardware complexity since it only considers the number of multipliers. For an accurate comparison, we propose a detailed hardware architecture for the FS filter stage.
VB Proposed hardware architecture for the FS filter stage
The circular convolution operation can be efficiently implemented in hardware using a typical FIR filter architecture. Such an architecture can take one input and generate one output per clock cycle in a pipelined manner. If a Multiple Constant Multiplier (MCM) is used, a multiplierless FIR architecture can be designed for fixed-point precision. If are the PF coefficients quantized on bit to be multiplied by a bit input , then:
where denotes the symbol number of the Canonical Signed Digit (CSD) representation of [29]. Therefore, only adders and registers are required for this architecture. It is advantageous when the same set of coefficients is reused for each processed sample, which is the case for the FS filter stage. As an adder requires fewer hardware resources (logic gates) than a multiplier, a significant reduction in hardware complexity is expected.
As a baseline solution, (VA) can be directly implemented using one FIR filter for the computation of the real part, and another FIR filter for the computation of the imaginary part. This is, however, not an optimal choice if a low-complexity implementation is targeted, since half of the samples generated by both FIR filters are discarded due to the OQAM scheme, as shown in (32). Nevertheless, this equation cannot be directly implemented using a typical FIR filter: the content of the corresponding registers must be switched between the real and imaginary parts of the stored samples. Therefore, we propose a novel architecture adapted to the FS filter stage.
In the following, only even FBMC symbol indexes are considered ( and ). A similar derivation applies to the odd-indexed symbols. Index of (32) can be rewritten as follows:
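The multiplierless principle can be sketched in software. The code below converts an integer coefficient to its CSD digits (each digit is -1, 0, or +1, with no two adjacent non-zero digits) and multiplies by shifts, additions, and subtractions only. It is an illustrative model under the assumption of integer (fixed-point) coefficients. The function names are hypothetical, and it does not reproduce the sharing of partial results across coefficients that an MCM generator performs.

```python
def to_csd(value):
    """Return the CSD digits (-1, 0, +1) of an integer, least significant first."""
    digits = []
    while value != 0:
        if value % 2 == 0:
            digits.append(0)
        else:
            # +1 if value is 1 mod 4, -1 if value is 3 mod 4; this choice
            # guarantees that the next digit is zero.
            digit = 2 - (value % 4)
            digits.append(digit)
            value -= digit
        value //= 2
    return digits

def shift_add_multiply(x, coeff):
    """Multiply integer x by integer coeff using only shifts and add/subtract."""
    acc = 0
    for shift, digit in enumerate(to_csd(coeff)):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc

def nonzero_digits(coeff):
    """Number of shifted terms to combine; the adder count is one less."""
    return sum(1 for digit in to_csd(coeff) if digit != 0)

if __name__ == "__main__":
    print(to_csd(7))                    # 7 = 8 - 1
    print(shift_add_multiply(25, 7))    # same value as 25 * 7
    print(nonzero_digits(7))            # two terms, hence one subtractor
```

For example, multiplying by 7 in plain binary needs three shifted terms (4 + 2 + 1), whereas the CSD form 8 - 1 needs two, which is the kind of saving that makes the adder-only architecture attractive for constant coefficients.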
(34) 
where and . Similarly, index of (32) becomes:
(35) 
The above equations show that the FS filter stage can be separated into two FIR filters, holding the even and odd indexes of the PF coefficients respectively. Indeed, for each received sample, its real part is processed by the even-indexed FIR filter, while its imaginary part is processed by the odd-indexed FIR filter. Conversely, the real part of is processed by the odd-indexed FIR filter, and its imaginary part by the even-indexed FIR filter. Therefore, FIR filters are required, similarly to the baseline solution. However, the number of required coefficients per FIR filter is divided by , reducing the complexity. The obtained architecture using the NPR1 filter is presented in Figure 9.
The Even MCM (EMCM) unit generates the multiplications by the even-indexed filter coefficients, while the Odd MCM (OMCM) unit generates the multiplications by the odd-indexed filter coefficients. The architecture operates in two phases, which are repeated continuously. Each phase takes one clock cycle.
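The even/odd separation can be checked with a small numerical model. The sketch below rests on an assumption made purely for illustration: that the phase terms reduce to a factor j^p on the real tap of index p, so that even-indexed taps act on one component of the input and odd-indexed taps on the other. The tap values and offsets are placeholders, and the exact signs and indexing of the paper are those of (34) and (35).

```python
import numpy as np

# Exact powers of j, indexed by p mod 4 (Python's % is non-negative here).
J_POWERS = (1 + 0j, 1j, -1 + 0j, -1j)

def real_output_direct(x, taps, offsets):
    """Reference: Re( sum_p j**p * g_p * x[(m - p) mod N] )."""
    y = np.zeros(len(x), dtype=complex)
    for g, p in zip(taps, offsets):
        y += J_POWERS[p % 4] * g * np.roll(x, p)
    return y.real

def real_output_split(x, taps, offsets):
    """Same output from two real FIR filters: even-indexed taps see Re(x),
    odd-indexed taps see Im(x)."""
    even = np.zeros(len(x))
    odd = np.zeros(len(x))
    for g, p in zip(taps, offsets):
        jp = J_POWERS[p % 4]
        if p % 2 == 0:
            # j**p is +1 or -1: only the real part of x contributes.
            even += jp.real * g * np.roll(x.real, p)
        else:
            # j**p is +j or -j: Re(+-j * x) = -+Im(x).
            odd += -jp.imag * g * np.roll(x.imag, p)
    return even + odd

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16) + 1j * rng.standard_normal(16)
    offsets = list(range(-3, 4))            # placeholder tap indexes
    taps = rng.standard_normal(len(offsets))  # placeholder real taps
    print(np.allclose(real_output_direct(x, taps, offsets),
                      real_output_split(x, taps, offsets)))
```

Under the same assumption, the imaginary part of the output is obtained by swapping the roles of the two components: even-indexed taps then process the imaginary part of the input and odd-indexed taps its real part, which matches the alternation between data paths described in the text.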
In the first phase, the real part of is sent to the EMCM unit while its imaginary part is sent to the OMCM unit. Meanwhile:

the registers of the Even Real Data Path (ERDP) (Figure 9), belonging to the even-indexed FIR filter, are enabled by the select_DP control signal,

the registers of the Odd Imaginary Data Path (OIDP), belonging to the odd-indexed FIR filter, are also enabled,

the registers of the Even Imaginary Data Path (EIDP) and the Odd Real Data Path (ORDP) are both disabled.
Finally, the outputs of the ERDP and the ORDP are summed together. Furthermore, sign inversion is performed depending on the term value of (32).
In the second phase, the real part of is sent to the OMCM unit while its imaginary part is sent to the EMCM unit. The registers which were disabled (respectively enabled) are now enabled (respectively disabled) by the control signal. The outputs of the EIDP and the OIDP are selected and summed together, followed by a possible sign inversion depending on the term value.
VC Hardware complexity comparison
Both the proposed FS filter stage architecture and the baseline architecture were described in VHDL/Verilog and synthesized targeting the XC7z020 Xilinx Zynq SoC device. All the MCM units were generated using the SPIRAL code generator [30]. The results, summarized in Table IV, cover the PPN unit of [13], the baseline FS filter stage, and the proposed FS filter stage.
The architecture of the PPN unit in [13] is adapted to process FBMC symbols in parallel (OQAM scheme). To enable a fair comparison, we have considered FS filter stages in parallel, in such a way that the same processing speed is achieved. Furthermore, the same quantization chosen in [13] is considered:

the samples at the input and the output of each unit use 16-bit quantization,

all filter coefficients are quantized on 12 bits.
Only the NPR1 PF is considered in this section. The QMF1 and TFL1 PFs are less suited to the FS implementation since they require, at least, times more filter coefficients (see Subsection VA).
The baseline solution of the FS filter stage requires % fewer LUTs than the PPN unit. This confirms that one MCM is less complex than multipliers. On the other hand, the FS filter stage uses times more flip-flops due to the FIR filter registers. If we consider that LUTs and flip-flops have similar complexity, then the baseline FS filter stage requires % more hardware resources than the PPN unit. It also achieves a clock frequency of MHz, % lower than the PPN unit.
Concerning the proposed architecture, it requires % fewer LUTs than the PPN unit and % fewer LUTs than the baseline solution. The number of required flip-flops is almost unchanged when compared to the baseline solution. However, the proposed architecture is less complex than the PPN implementation since it requires % fewer hardware resources in total. It also achieves a clock frequency of MHz, which is % higher than the baseline solution.
Section IV shows that the FS-FBMC receiver offers improved robustness against timing offset and multipath impairments when compared to the PPN-FBMC receiver (assuming a ZF equalizer is used). Furthermore, the hardware complexity evaluation conducted in this section shows that the proposed FS filtering stage (using the proposed NPR1 PF) has a lower complexity than the PPN implementation. We conclude that the FS-FBMC receiver is more advantageous than the PPN-FBMC receiver when using the proposed short NPR1 filter.
Filter stage  LUTs  FlipFlops  Total  Frequency 

PPN unit [13]  MHz  
Baseline FS filter stage  MHz  
Proposed FS filter stage  MHz 
VI Conclusion
In this paper, a novel short PF (NPR1) suitable for several 5G scenarios is proposed. In the presence of timing offset due to imperfect synchronization, the NPR1 PF, combined with the FS implementation, exhibits a gain of more than 8 dB of SIR when compared to SoTA short PFs (TFL1 and QMF1). It also outperforms OFDM, where a gap of dB of SIR can be observed. The NPR1 PF is also the most robust filter against CFO. In the case of 4G/LTE multipath channels, the NPR1 PF even outperforms OFDM for the EPA channel model, due to the absence of CP. In the case of the ETU channel model, the NPR1 PF shows improved performance when compared to other FBMC PFs. Finally, an efficient hardware architecture of the FS filter stage is proposed. Hardware complexity evaluation shows that the proposed FS-based FBMC receiver, using the NPR1 PF, requires % fewer hardware resources than the PPN-based FBMC receiver. Therefore, combining the proposed NPR1 filter and the FS-FBMC receiver architecture provides an original solution that offers both complexity reduction and performance improvement with respect to a typical PPN-FBMC receiver.
References
 [1] M. Maternia et al., “5G PPP use cases and performance evaluation models,” https://5gppp.eu/wpcontent/uploads/2014/02/5GPPPusecasesandperformanceevaluationmodeling_v1.0.pdf, Apr. 2016.
 [2] M. Schellmann et al., “FBMC-based air interface for 5G mobile: Challenges and proposed solutions,” in 2014 9th Int. Conf. on Cognitive Radio Oriented Wireless Networks and Commun. (CROWNCOM), Jun. 2014, pp. 102–107.
 [3] H. Lin, M. Gharba, and P. Siohan, “Impact of time and carrier frequency offsets on the FBMC/OQAM modulation scheme,” Signal Process., vol. 102, pp. 151–162, Sep. 2014.
 [4] B. Saltzberg, “Performance of an Efficient Parallel Data Transmission System,” IEEE Trans. on Commun. Technology, vol. 15, no. 6, pp. 805–811, Dec. 1967.
 [5] P. Siohan, C. Siclet, and N. Lacaille, “Analysis and design of OFDM/OQAM systems based on filterbank theory,” IEEE Trans. on Signal Process., vol. 50, no. 5, pp. 1170–1183, May 2002.
 [6] P. Sabeti, H. Saeedi-Sourck, and M. Omidi, “Low-complexity CFO correction of frequency-spreading SMT in uplink of multicarrier multiple access networks,” in 2015 23rd Iranian Conference on Electrical Engineering (ICEE), May 2015, pp. 410–415.
 [7] M. Bellanger and J. Daguet, “TDM-FDM Transmultiplexer: Digital Polyphase and FFT,” IEEE Trans. on Commun., vol. 22, no. 9, pp. 1199–1205, Sep. 1974.
 [8] B. Hirosaki, “An Orthogonally Multiplexed QAM System Using the Discrete Fourier Transform,” IEEE Trans. on Commun., vol. 29, no. 7, pp. 982–989, Jul. 1981.
 [9] M. Bellanger, “FS-FBMC: An alternative scheme for filter bank based multicarrier transmission,” in 2012 5th Int. Symp. on Commun. Control and Signal Process. (ISCCSP), May 2012, pp. 1–4.
 [10] K. Martin, “Small side-lobe filter design for multitone data-communication applications,” IEEE Trans. on Circuits and Syst. II: Analog and Digital Signal Process., vol. 45, no. 8, pp. 1155–1161, Aug. 1998.
 [11] “PHYDYAS project,” http://www.ictphydyas.org.
 [12] Y. Dandach and P. Siohan, “FBMC/OQAM Modulators with Half Complexity,” in 2011 IEEE Global Telecommun. Conf. (GLOBECOM 2011), Dec. 2011, pp. 1–5.
 [13] J. Nadal, C. Nour, and A. Baghdadi, “Low-complexity pipelined architecture for FBMC/OQAM transmitter,” IEEE Trans. on Circuits and Syst. II: Express Briefs, vol. PP, no. 99, pp. 1–1, 2015.
 [14] M. Bellanger, “FS-FBMC: A flexible robust scheme for efficient multicarrier broadband wireless access,” in 2012 IEEE Globecom Workshops (GC Wkshps), Dec. 2012, pp. 192–196.
 [15] V. Berg, J.-B. Dore, and D. Noguet, “A flexible FS-FBMC receiver for dynamic access in the TVWS,” in 2014 9th Int. Conf. on Cognitive Radio Oriented Wireless Networks and Commun. (CROWNCOM), Jun. 2014, pp. 285–290.
 [16] B. Le Floch, M. Alard, and C. Berrou, “Coded orthogonal frequency division multiplex [TV broadcasting],” Proc. of the IEEE, vol. 83, no. 6, pp. 982–996, Jun. 1995.
 [17] D. Pinchon, P. Siohan, and C. Siclet, “Design techniques for orthogonal Modulated filterbanks based on a compact representation,” IEEE Trans. on Signal Process., vol. 52, no. 6, pp. 1682–1692, Jun. 2004.
 [18] H. Malvar, “Modulated QMF filter banks with perfect reconstruction,” Electron. Lett., vol. 26, no. 13, pp. 906–907, Jun. 1990.
 [19] M. Bellanger, D. Mattera, and M. Tanda, “Lapped-OFDM as an Alternative to CP-OFDM For 5G Asynchronous Access and Cognitive Radio,” in Veh. Technology Conf. (VTC Spring), 2015 IEEE 81st, May 2015, pp. 1–5.
 [20] M. Lanoiselee et al., “Comparative evaluation on real-time hardware platforms of coded OFDM/QAM and OFDM/OQAM systems,” in 2012 Int. Symp. on Wireless Commun. Syst. (ISWCS), Aug. 2012, pp. 186–190.
 [21] J. Nadal et al., “Hardware prototyping of FBMC/OQAM baseband for 5G mobile communication systems,” in IEEE Int. Symp. Rapid Syst. Prototyping (RSP), Delhi, India, Oct. 2014, pp. 135–141.
 [22] D. Pinchon and P. Siohan, “Derivation of analytical expressions for flexible PR low complexity FBMC systems,” in Signal Process. Conf. (EUSIPCO), 2013 proc. of the 21st European, Sep. 2013, pp. 1–5.
 [23] M. Bellanger, “FBMC physical layer: A primer,” PHYDYAS FP7 Project Document, Jan. 2010.
 [24] 3GPP TS 36.133, “Requirements for support of radio resource management,” http://www.3gpp.org/dynareport/36133.htm.
 [25] D. Mattera, M. Tanda, and M. Bellanger, “Performance analysis of some timing offset equalizers for FBMC/OQAM systems,” Signal Processing, vol. 108, pp. 167–182, 2015.
 [26] 3GPP TS 36.104, “Base station (BS) radio transmission and reception,” http://www.3gpp.org/dynareport/36104.htm.
 [27] 3GPP TS 36.211, “Physical channels and modulation,” http://www.3gpp.org/dynareport/36211.htm.
 [28] W. Cui, D. Qu et al., “Coded auxiliary pilots for channel estimation in FBMC-OQAM systems,” IEEE Transactions on Vehicular Technology, vol. 65, no. 5, pp. 2936–2946, May 2016.
 [29] R. M. Hewlitt and E. S. Swartzlander, “Canonical signed digit representation for FIR digital filters,” in IEEE Workshop on Signal Processing Systems, 2000, pp. 416–426.
 [30] M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, “SPIRAL: Code generation for DSP transforms,” Proceedings of the IEEE, special issue on “Program Generation, Optimization, and Adaptation”, vol. 93, no. 2, pp. 232–275, 2005.