Efficient DSP and Circuit Architectures for Massive MIMO: State-of-the-Art and Future Directions

Abstract

Massive MIMO is a compelling wireless access concept that relies on the use of an excess number of base-station antennas, relative to the number of active terminals. This technology is a main component of 5G New Radio (NR) and addresses all important requirements of future wireless standards: a great capacity increase, the support of many simultaneous users, and improvement in energy efficiency.

Massive MIMO requires the simultaneous processing of signals from many antenna chains, and computational operations on large matrices. In the past, the complexity of the digital processing has been viewed as a fundamental obstacle to the feasibility of Massive MIMO. Recent advances in system-algorithm-hardware co-design have led to extremely energy-efficient implementations. These exploit opportunities in deeply-scaled silicon technologies and perform partly distributed processing to cope with the bottlenecks encountered in the interconnection of many signals. For example, prototype ASIC implementations have demonstrated zero-forcing precoding in real time at a 55 mW power consumption (20 MHz bandwidth, 128 antennas, multiplexing of 8 terminals). Coarse and even error-prone digital processing in the antenna paths permits a reduction of consumption by a factor of 2 to 5. This article summarizes the fundamental technical contributions to efficient digital signal processing for Massive MIMO. The opportunities and constraints of operating with low-complexity RF and analog hardware chains are clarified. It is illustrated how terminals can benefit from improved energy efficiency. The status of technology and real-life prototypes is discussed. Open challenges and directions for future research are suggested.

I Introduction

Massive MIMO is an efficient sub-6 GHz physical-layer technology for wireless access, and a key component of the 5G New Radio (NR) interface [1]. The main concept is to use large antenna arrays at base stations to simultaneously serve many autonomous terminals, as illustrated in Figure 1 [2, 3]. Smart processing at the array exploits differences among the propagation signatures of the terminals to perform spatial multiplexing. Massive MIMO offers two main benefits:

  1. Excellent spectral efficiency, achieved by spatial multiplexing of many terminals in the same time-frequency resource [4, 5]. Efficient multiplexing requires channels to different terminals to be sufficiently distinct. Theory as well as experiments have demonstrated that this can be achieved both in line-of-sight and in rich scattering.

  2. Superior energy efficiency, by virtue of the array gain, that permits a reduction of radiated power. Moreover, the ability to achieve excellent performance while operating with low-accuracy signals and linear signal processing further enables considerable savings in the power required for signal processing.

This overview paper focuses on sub-6 GHz Massive MIMO systems implemented with fully digital per-antenna signal processing. Massive MIMO at mmWave frequencies is also possible, and can benefit from the large bandwidth available at these frequencies. Propagation and hardware implementation aspects are different at mmWaves; for example, hybrid analog-digital beamforming approaches are typically considered [6]. However, this is not discussed further here.

The complexity of the signal processing has been considered a potential obstacle to actual deployment of Massive MIMO technology. An obvious concern is how operations on large matrices and the interconnection of the many antenna signals can be efficiently performed in real-time. Moreover, real-life experiments have shown that the channel responses to different terminals can be highly correlated in some propagation environments. Appropriate digital signal processing hence needs to feature interference suppression capabilities, which further increases complexity.

This paper discusses the digital signal processing required to realize the Massive MIMO system concept, and examines in detail the co-design of algorithms, hardware architecture, and circuits (Figure 2). Unconventional, low-complexity digital circuitry implementations in deeply scaled silicon are possible, despite (and thanks to) the excess number of antenna signals. A careful choice of algorithmic and circuit parameters permits considerable reduction of the average energy consumption. Terminals in turn can be implemented at low complexity while benefiting from the channel hardening effect, that offers increased reliability.

Proof-of-concept implementations and demonstrations have revealed constraints that turned out to be harsher than anticipated in initial theoretical assessments. This concerns the interconnection of the signals from all antennas, which poses a bottleneck that partly necessitates distributed processing. Also, relaxing the specifications of the analog and RF chains can result in higher distortion, both in-band and out-of-band, than initially anticipated, as hardware imperfections can in general not be considered uncorrelated.

Fig. 1: Massive MIMO exploits large antenna arrays at the base stations, to spatially multiplex many terminals.
Fig. 2: Massive MIMO opens up new hardware-software co-design opportunities for low-complexity circuitry.

The rest of the paper is organized as follows. First, basic concepts and notation are introduced. Next, we provide a complexity analysis considering computation as well as data transfer. The following section zooms in on the RF and front-end, highlighting the opportunities and constraints of relaxing their specifications in the large-number-of-antennas regime. Subsequently, the central detector and precoder blocks are detailed and major complexity reductions facilitated by algorithm-hardware co-design are demonstrated. Signal processing leveraging error-resilient circuits in the per-antenna functionality is discussed next, and the consequent energy savings are illustrated. Further, we discuss the increased reliability that can be delivered to low-complexity terminals. Finally, in the conclusions we discuss validation performed in real-life test beds, summarize opportunities and constraints in efficient processing for Massive MIMO systems, and suggest future research directions.

II Massive MIMO System Model

This section introduces the notation for MIMO transmission that is used in the paper. Further details can be found in, for example, [3]. We consider the block-fading model where the time-frequency domain is partitioned into coherence intervals within which the channel is static. The number of samples in each coherence interval is equal to the coherence time in seconds multiplied by the coherence bandwidth in Hertz. For the signal processing algorithms discussed in this paper, it does not matter whether there is coding across coherence intervals or not.

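In every coherence interval, a flat fading complex baseband channel model applies. Let $M$ be the number of antennas at the base station, and $K$ the number of terminals served simultaneously. Also, denote by $\mathbf{g}_k$ the $M$-vector of channel responses between the $k$th terminal and the array. Then on uplink, for every sample in the coherence interval,

$$\mathbf{y} = \sum_{k=1}^{K} \mathbf{g}_k x_k + \mathbf{w} \tag{1}$$

where $\mathbf{y}$ is an $M$-vector comprising samples received at the base station array, $x_k$ are symbols sent by the $k$th terminal, and $\mathbf{w}$ is noise. On downlink, assuming linear precoding,

$$y_k = \mathbf{g}_k^T \sum_{k'=1}^{K} \mathbf{a}_{k'} q_{k'} + w_k \tag{2}$$

where $y_k$ is the sample received by the $k$th terminal, $\mathbf{a}_k$ is a precoding vector associated with the $k$th terminal, $q_k$ is the symbol destined to the $k$th terminal, and $w_k$ is receiver noise.

The base station forms a channel estimate, $\hat{\mathbf{g}}_k$, of $\mathbf{g}_k$ for each terminal by measurements on uplink pilots. Channel estimation is discussed extensively in for example [3] (for independent Rayleigh fading) and [7] (for correlated fading models).

On uplink, the data streams from the terminals are detected through linear processing. This entails multiplication of $\mathbf{y}$ with a vector, $\mathbf{v}_k^H$, for each terminal, yielding the scalar $\mathbf{v}_k^H \mathbf{y}$. Common choices of the detection vector include

$$\mathbf{v}_k = \begin{cases} \alpha_k \hat{\mathbf{g}}_k & \text{(maximum ratio)} \\ \alpha_k \big[\hat{\mathbf{G}} (\hat{\mathbf{G}}^H \hat{\mathbf{G}})^{-1}\big]_{:,k} & \text{(zero-forcing)} \\ \alpha_k \big[\hat{\mathbf{G}} (\hat{\mathbf{G}}^H \hat{\mathbf{G}} + \mathbf{I})^{-1}\big]_{:,k} & \text{(MMSE)} \end{cases} \tag{3}$$

where $\alpha_k$ is a normalizing constant (different for the three methods), $[\cdot]_{:,k}$ denotes the $k$th column, and $\hat{\mathbf{G}} = [\hat{\mathbf{g}}_1, \ldots, \hat{\mathbf{g}}_K]$. The result of this linear processing will comprise the desired signal, embedded in additive interference and noise.

On downlink, channel reciprocity is leveraged. Low-complexity front-ends typically introduce non-reciprocity and this non-reciprocity needs to be compensated for; see Section IV. The base station forms the transmitted vector in (2) where the precoding vector $\mathbf{a}_k$ is given by:

$$\mathbf{a}_k = \begin{cases} \alpha_k \hat{\mathbf{g}}_k^{*} & \text{(maximum ratio)} \\ \alpha_k \big[\hat{\mathbf{G}}^{*} (\hat{\mathbf{G}}^T \hat{\mathbf{G}}^{*})^{-1}\big]_{:,k} & \text{(zero-forcing)} \\ \alpha_k \big[\hat{\mathbf{G}}^{*} (\hat{\mathbf{G}}^T \hat{\mathbf{G}}^{*} + \eta \mathbf{I})^{-1}\big]_{:,k} & \text{(regularized zero-forcing)} \end{cases} \tag{4}$$

where, again, $\alpha_k$ are normalizing constants and $\eta$ is a regularization parameter. The signal received at the terminal will contain the symbol of interest, plus additive interference and noise.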

Many variations are possible and detection and precoding that take multi-cell interference into account are also possible [7, 8].
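
To make the uplink detection step concrete, the following minimal Python sketch builds maximum-ratio and zero-forcing detection vectors for a toy case with two terminals and compares the residual inter-terminal interference. The channel values, the restriction to two terminals, and the helper names (`inner`, `zf_vectors`) are illustrative assumptions that do not come from the paper; perfect channel estimates are assumed and the normalizing constants are omitted.

```python
# Toy uplink detection for K = 2 terminals (illustrative values, perfect channel estimates).

def inner(a, b):
    """Hermitian inner product a^H b of two equal-length complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def zf_vectors(g1, g2):
    """Zero-forcing detection vectors: the columns of G (G^H G)^{-1} for G = [g1, g2]."""
    p11, p12 = inner(g1, g1), inner(g1, g2)
    p21, p22 = inner(g2, g1), inner(g2, g2)
    det = p11 * p22 - p12 * p21  # determinant of the 2x2 Gram matrix
    v1 = [(x * p22 - y * p21) / det for x, y in zip(g1, g2)]
    v2 = [(-x * p12 + y * p11) / det for x, y in zip(g1, g2)]
    return v1, v2


# Hypothetical channel vectors for M = 4 antennas; deliberately not orthogonal.
g1 = [1, 1, 1, 1]
g2 = [1, 1j, -1, 1j]

# Maximum ratio uses the channel vector itself, so terminal 2 leaks into stream 1.
mr_leak = abs(inner(g1, g2))

# Zero-forcing nulls the other terminal, at the cost of inverting the Gram matrix.
v1, v2 = zf_vectors(g1, g2)
zf_leak = abs(inner(v1, g2))
zf_gain = abs(inner(v1, g1))

print(f"MR leakage: {mr_leak:.3f}, ZF leakage: {zf_leak:.3e}, ZF gain: {zf_gain:.3f}")
```

The closed-form 2x2 inverse keeps the sketch dependency-free; for realistic numbers of terminals the Gram matrix inversion is the costly step discussed in Section V.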

III Signal Processing and Data Transfer Complexity Assessment

Both Massive MIMO base stations and terminals can be implemented with significantly better energy efficiency than in conventional systems. This is possible owing to a combination of effects. First, the array gain permits a reduction of the radiated power. Second, the large number of constituent signals promotes excellent performance while operating relatively simple algorithms on coarse signals.

In this section we focus on the processing at the base station side. The opportunity to reduce terminal-side complexity is discussed in Section VII. First, a high-level assessment of the signal processing requirements, in terms of number of computations, is presented. The data transfer and interconnection of signals poses a distinct bottleneck. Hence, next a distributed processing approach is presented to balance performance and complexity.

III-A Computational Complexity

We first analyze the computational complexity of a Massive MIMO base station. Figure 3 shows a high-level block diagram of the signal processing for an OFDM-based massive MIMO system. Other modulation options can be used, and single-carrier schemes may be preferred. The overall partition of the processing presented here will still hold.

Fig. 3: Signal processing in an OFDM-based massive MIMO system for $M$ BS antennas and $K$ UEs.

The processing in Massive MIMO systems is logically grouped into three categories:

  1. The outer modem performing symbol (de)mapping, (de)interleaving and channel (de)coding. This processing performed on the transmit/receive bits applies to each User Equipment (UE) individually.

  2. The inner modem comprising channel estimation, and detection and precoding of the uplink and downlink data, respectively. This central processing aggregates/distributes data from/to all the antenna chains.

  3. The per-antenna processing which primarily consists of the analog and digital front-end (mainly re-sampling and filtering) and OFDM processing.

We identify inherent parallelism and observe that the processing complexity scales with the number of BS antennas, $M$, the number of UEs, $K$, or both [9]:

  • Per-antenna processing: Scales with $M$ as each antenna requires OFDM (de)modulation and a digital/analog front-end.

  • Central processing: Scales with $M$ and $K$.

  • Per-user processing: Scales with $K$.

The number of digital signal processing operations performed in the sub-systems provides a high-level estimate of complexity. Table I gives numbers for a sample system with antennas at the base-station and simultaneous terminals. It is acknowledged that these estimates represent an over-simplification, as the nature and precision of the operations will be an important determining factor in the eventual hardware complexity and power consumption.

Subcomponent      Downlink data (DL)   Uplink data (UL)   Training
                  [GOPS]               [GOPS]             [GOPS]
Inner modem       175                  520                290
Outer modem       7                    40                 0
Per-antenna DSP   920                  920                920
TABLE I: Estimated number of DSP operations in GOPS, for and , MHz bandwidth, and bps/Hz (16-QAM, code rate 3/4).

Table I demonstrates that the collective per-antenna digital processing is demanding, and requires a minimal-complexity implementation. Interestingly, the per-antenna processing does not need to be performed with high precision to offer very good performance. An in-depth analysis and efficient implementation options are presented in Section VI.

For the inner modem processing in Massive MIMO, a high degree of reconfigurability is desired in order to adapt to changing operating conditions, such as the number of connected UEs, and their SNRs/path losses. Section V discusses efficient algorithm-hardware co-design solutions for the Massive MIMO precoding and detection.

Furthermore, reciprocity calibration needs to be performed occasionally. Elegant solutions have been proposed and demonstrated, see Section IV.

Channel coding clearly is an essential component of the wireless transmission, yet it is not Massive MIMO-specific and therefore not further treated in this paper.

III-B Signal Interconnection and Data Transfer Complexity

The transfer of data between processing components creates a significant challenge, as the amount of signals and data to be aggregated/distributed from/to all the antennas is very high. The required data shuffling rate between the per-antenna processing and the central processing is [9]

$$R_{\mathrm{shuffle}} = M \times f_s \times W \tag{5}$$

where $M$ is the number of antennas, $f_s$ is the sampling rate after OFDM processing and $W$ is the word-length of one data sample. For a 100-antenna 20 MHz bandwidth system, the sampling rate at each antenna is 30.72 MS/s and thus

$$f_s = 30.72\ \mathrm{MS/s} \times \frac{N_{\mathrm{data}}}{N_{\mathrm{sub}} + N_{\mathrm{CP}}} = 16.8\ \mathrm{MS/s} \tag{6}$$

where $N_{\mathrm{data}}$, $N_{\mathrm{sub}}$, and $N_{\mathrm{CP}}$ are the number of data subcarriers, the total number of subcarriers, and the number of cyclic prefix samples, respectively. Assuming that 24 bits are used for one complex sample, $R_{\mathrm{shuffle}}$ equals 40.32 Gb/s. This requirement is an order of magnitude higher than in a conventional system.
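
The arithmetic behind the 40.32 figure can be retraced with a few lines of Python. The sketch below assumes an LTE-like numerology (1200 data subcarriers and 14 OFDM symbols per millisecond) to obtain the post-OFDM sample rate; these two numbers and the function name `shuffling_rate_bps` are assumptions for illustration, chosen because they are consistent with the 100-antenna, 24-bit example in the text.

```python
def shuffling_rate_bps(num_antennas, sample_rate_hz, bits_per_sample):
    """Aggregate data rate between per-antenna and central processing, in bit/s."""
    return num_antennas * sample_rate_hz * bits_per_sample


# Assumed LTE-like numerology for a 20 MHz carrier (not stated explicitly in the text).
n_data_subcarriers = 1200
ofdm_symbols_per_second = 14_000  # 14 symbols per 1 ms subframe

# Useful samples per second after OFDM processing: 16.8 MS/s per antenna.
f_s = n_data_subcarriers * ofdm_symbols_per_second

rate = shuffling_rate_bps(num_antennas=100, sample_rate_hz=f_s, bits_per_sample=24)
print(f"post-OFDM sample rate: {f_s / 1e6} MS/s")
print(f"shuffling rate: {rate / 1e9} Gb/s")
```

Because the rate is a plain product, halving the word-length or the number of antennas served by one link halves the interconnect load, which is one motivation for the grouping discussed in the next subsection.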

Additionally, the data transfer network must re-organize data among different dimensions. Figure 4 illustrates the uplink data shuffling between the per-antenna and the central processing. First (step 1 in the figure), the data shuffling network aggregates data samples of all subcarriers from all antenna chains. Next (step 2 in the figure), it divides the entire data into bandwidth chunks depending on the number of central processing units in the system, and distributes the data to the corresponding processing unit.

Fig. 4: Illustration of the data shuffling between the per-antenna and central processing.

These high data transfer requirements have motivated the development of decentralized processing architectures, which are introduced next.

III-C Decentralized Processing

Depending on the selected MIMO processing algorithms, both the processing performed in the per-antenna and in the central units, and the communication between these two, will influence the resulting system performance and overall complexity. For instance, the maximum-ratio precoding operation can be performed in each antenna path in a distributed manner, whereas the zero-forcing algorithm requires centralized processing, specifically for the inversion of the Gram matrix $\hat{\mathbf{G}}^H \hat{\mathbf{G}}$ of the estimated channel matrix $\hat{\mathbf{G}}$.

Fig. 5: Decentralized processing architecture, performing group-based operations between the per-antenna processing (PAP) and the central unit.

Decentralized processing enables parallel computing and offers a balanced trade-off between system performance and data transfer requirements [10, 11, 12, 13]. The authors of [12] propose a decentralized architecture for both uplink and downlink, illustrated in Figure 5. Instead of aggregating the full channel state information and transmitted/received data vectors at the centralized processing node, the $M$ antenna nodes are grouped into $C$ equally sized groups, each serving $M/C$ antenna nodes. A middle-level processing node, labeled group processor, is introduced between the per-antenna and central processor to handle the corresponding data dedicated to the group of antenna nodes. As a result, only a limited amount of data is aggregated/distributed to/from the central processor, relaxing the requirements on the data transfer network. For instance, the Gram matrix calculation can be rewritten as

$$\hat{\mathbf{G}}^H \hat{\mathbf{G}} = \sum_{c=1}^{C} \hat{\mathbf{G}}_c^H \hat{\mathbf{G}}_c \tag{7}$$

where $\hat{\mathbf{G}}_c$ is the local channel estimate for each group of antennas. The decentralized processing is performed such that the terms $\hat{\mathbf{G}}_c^H \hat{\mathbf{G}}_c$ are computed at each group processor locally, and the results are aggregated at the central processor for the final summation. The tree-like distributed processing architecture is further elaborated in [14], with special focus on modularity and scalability. In particular, the trade-off between data processing, storage, and shuffling is investigated for maximum-ratio transmission, zero-forcing, and MMSE algorithms.
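
The group-wise Gram matrix computation can be checked with a short, dependency-free Python sketch. It draws a random channel matrix, splits the antenna rows into equally sized groups, computes one partial Gram matrix per group, and verifies that their sum equals the centrally computed result. The dimensions, the random channel, and the helper names (`gram`, `add`) are assumptions for illustration only.

```python
import random


def gram(rows, k):
    """G^H G for a channel block given as a list of antenna rows, each of length k."""
    return [[sum(r[i].conjugate() * r[j] for r in rows) for j in range(k)]
            for i in range(k)]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


random.seed(0)
M, K, GROUPS = 16, 4, 4  # hypothetical sizes: antennas, terminals, antenna groups
G = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(K)]
     for _ in range(M)]

# Centralized reference: every channel coefficient is shipped to one node.
central = gram(G, K)

# Decentralized: each group processor sees only its own block of antenna rows.
size = M // GROUPS
partials = [gram(G[c * size:(c + 1) * size], K) for c in range(GROUPS)]

# The central processor only sums the K x K partial results.
total = partials[0]
for p in partials[1:]:
    total = add(total, p)

max_err = max(abs(total[i][j] - central[i][j]) for i in range(K) for j in range(K))
print(f"largest deviation from the centralized Gram matrix: {max_err:.2e}")
```

The summation is exact up to floating-point rounding, so the grouping changes where the work is done and what is transferred, not the result that the central processor inverts.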

IV Analog and RF Processing: Relax with Caution!

In traditional base stations, the RF electronics and analog front-ends, and the power amplifiers specifically, consume most of the power [15]. In Massive MIMO, thanks to the array gain provided by the closed-loop beamforming, much less radiated power is needed for the data transmission. This facilitates a significant reduction of the RF complexity and power consumption compared to conventional systems.

The hardware in any wireless transceiver will introduce distortion, and the most important sources of distortion are nonlinearities in power amplifiers and quantization noise in A/D-converters. A commonly used model in the literature has been that this distortion is additive and uncorrelated among the antennas [7]. If this were the case, then the effects of hardware imperfections would average out as the number of antennas is increased, in a similar way as the effects of thermal noise average out. In more detail, consider the linear processing in the uplink; see Section II. The essence of the argument is that if the received signal at the array, $\mathbf{y}$, is affected by uncorrelated additive distortion noise, then the effective power of the useful signal after beamforming processing would grow as $M$, the number of base station antennas, whereas the power of the distortion would be constant with respect to $M$ (see [7] for more precise analyses). But unfortunately, this model does not accurately describe the true nature of the hardware distortion.

To understand why, fundamentally, the distortion is correlated among the antennas, consider the downlink in the special case of a single terminal in line-of-sight. Then the signal radiated by the $m$th antenna is simply a phase-shifted version of the signal radiated by the first antenna ($m=1$). The distortion arising from an amplifier nonlinearity at the $m$th antenna is phase-shifted by the same amount as the signal. Hence, if all amplifiers have identical characteristics (a weak assumption in practice), the distortion is beamformed into the same direction as the signal of interest, and receives the same array gain as that signal of interest. That is, the effects of the distortion scale proportionally to the number of antennas $M$ rather than disappearing as $M$ is increased. In this case, the covariance matrix of the distortion, when viewed as an $M$-vector, has rank one. A similar effect exists on the uplink, when the nonlinearities in low-noise amplifiers are considered [16].
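
The difference between the two distortion models can be illustrated numerically. The sketch below assumes a single line-of-sight terminal with unit-modulus channel coefficients and a unit-norm maximum-ratio combining vector, and evaluates the signal-to-distortion ratio after combining for (i) distortion that is uncorrelated across antennas and (ii) distortion with a rank-one covariance aligned with the channel. The per-antenna distortion power `sigma2`, the phase step, and the function name are hypothetical choices for illustration.

```python
import cmath
import math


def sdr_after_combining(m_antennas, sigma2=0.01, phase_step=0.7):
    """Return (SDR with uncorrelated distortion, SDR with rank-one distortion)."""
    # Line-of-sight channel: every antenna sees a phase-shifted copy of the signal.
    g = [cmath.exp(1j * phase_step * m) for m in range(m_antennas)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in g))
    a = [x / norm for x in g]  # unit-norm maximum-ratio combining vector

    # Useful signal power |a^H g|^2, which equals the number of antennas.
    gain = abs(sum(x.conjugate() * y for x, y in zip(a, g))) ** 2

    # Uncorrelated distortion, covariance sigma2 * I: a^H (sigma2 I) a = sigma2.
    p_uncorrelated = sigma2 * sum(abs(x) ** 2 for x in a)

    # Rank-one distortion, covariance sigma2 * g g^H: a^H (sigma2 g g^H) a = sigma2 * gain.
    p_rank_one = sigma2 * gain

    return gain / p_uncorrelated, gain / p_rank_one


for m in (8, 64):
    uncorrelated, rank_one = sdr_after_combining(m)
    print(f"M = {m:3d}: SDR uncorrelated = {uncorrelated:8.1f}, SDR rank-one = {rank_one:6.1f}")
```

In both models each antenna carries the same distortion power; only the correlation across antennas differs. With uncorrelated distortion the ratio improves in proportion to the array size, whereas with rank-one distortion it does not improve at all.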

In the remainder of this section, we discuss the specifics of distortion arising from amplifier nonlinearities and finite-resolution A/D-converters in more detail. We furthermore discuss the impact and calibration of RF front-end non-reciprocity.

IV-A Power Amplifiers Benefit from the Large Array

The required output power of a Massive MIMO base station can be reduced inversely proportionally to the square root of number of BS antennas, or even linearly in operating regimes with good channel estimation quality, thanks to the coherent combination of all antenna signals. This results in significantly reduced output specifications of the Power Amplifiers (PAs). The power amplification stage typically accounts for of the power consumption of base stations in wireless broadband macro-cells [15]. Moreover they necessitate cooling, causing a overhead. The reduced output power in Massive MIMO hence can reduce the total power by a factor of 3 in an exemplary 100-antenna base station, assuming that all other contributions remain equal.

The PA mostly operates at a low efficiency as a consequence of a considerable back-off, required to avoid entering the saturation region. For OFDM-based systems such as 3GPP-LTE, the PA typically operates with a back off of 8–12 dB. Best-in-class solutions need complex techniques that achieve an efficiency of [17]. Entering the saturation region introduces non-linear distortion, which comes with two detrimental effects: distortion of the intended signal within the band of interest, and out-of-band (OOB) emissions that result in adjacent channel leakage.

We consider a polynomial memoryless model [18] for the non-linear behaviour of the PA. The impact on the signal at RF can be expressed as:

$$y(t) = \sum_{n} \beta_n\, x^n(t) \tag{8}$$

where $x(t)$ is the input signal to the PA, $y(t)$ is the output signal, and $\beta_n$ is the non-linear distortion coefficient of the PA for the $n$th harmonic component. The third-order harmonic will have the largest impact both in terms of in-band distortion and adjacent channel leakage. Furthermore, the amplitude will be limited to the saturation amplitude $A_{\mathrm{sat}}$ for input values exceeding the input saturation amplitude $A_{\mathrm{in,sat}}$:

$$|y(t)| = A_{\mathrm{sat}} \quad \text{for } |x(t)| > A_{\mathrm{in,sat}} \tag{9}$$

The non-linear distortion resulting from the PAs in the many antenna paths is hence signal dependent. The input signals to the PAs can be correlated, depending on the specific communication scenario in terms of users, channel responses, and power (im)balance among the users. In [19] we analyzed how the distortion terms can combine by means of a basic dual-tone modulation scheme. The following effects can occur:

  1. The distortions may add up coherently in the channel and generate considerable out-of-band emissions. This will be the case for example in a single-user situation with one strongly dominating propagation direction.

  2. In most multi-user scenarios the precoder will provide significantly different compositions of signals to the antenna paths and hence power amplifiers. In general, this will randomize the harmonic distortion terms.
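
The dual-tone reasoning can be reproduced for a single amplifier with a small Python sketch. It passes two equal-amplitude tones through a cubic memoryless model with hard saturation and measures the third-order intermodulation product that falls next to the tones. The coefficient `beta3`, the saturation level, the tone positions, and the function names are hypothetical values chosen for illustration, not parameters from [19].

```python
import cmath
import math


def pa(x, beta3=-0.1, a_in_sat=2.5):
    """Cubic memoryless PA; the output amplitude is held constant beyond a_in_sat."""
    x = max(-a_in_sat, min(a_in_sat, x))  # clamp the input, which saturates the output
    return x + beta3 * x ** 3


def tone_amplitude(samples, k):
    """Amplitude of the real sinusoid at DFT bin k (single-bin DFT)."""
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2 * abs(acc) / n


N, K1, K2 = 1024, 100, 110  # block length and the two tone bins
x = [math.cos(2 * math.pi * K1 * i / N) + math.cos(2 * math.pi * K2 * i / N)
     for i in range(N)]
y = [pa(v) for v in x]

# The cubic term creates products at 2*K1 - K2 and 2*K2 - K1, adjacent to the tones.
im3_bin = 2 * K1 - K2
im3_in = tone_amplitude(x, im3_bin)
im3_out = tone_amplitude(y, im3_bin)
fundamental_out = tone_amplitude(y, K1)

print(f"IM3 at bin {im3_bin}: input {im3_in:.2e}, output {im3_out:.4f}")
print(f"fundamental at bin {K1}: output {fundamental_out:.4f}")
```

For unit-amplitude tones the intermodulation amplitude equals 0.75 times the magnitude of `beta3`. Whether such products add coherently over the air depends on how the phases of the antenna signals are related, which is the distinction made in the two cases above.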

The constellation diagrams in Figure 6 illustrate the impact of increasing the number of antennas at the base station on the Error Vector Magnitude (EVM), for a case with equal-strength signals for the different users and i.i.d. Rayleigh fading channels. The results were simulated based on a cubic polynomial model for the PA, which operates in saturation ( dB with respect to the dB compression point). With antennas at the base station, the constellation points are seriously dispersed and an EVM of dB is measured. When increasing the number of antennas, in steps of 10 in the graph, the clarity of the constellation diagram greatly improves and for an EVM of dB is observed.

Fig. 6: Increasing number of base station antennas improves the EVM with PAs operating in saturation.

In conclusion, the power amplifiers benefit from the large array owing to the drastically reduced total output power requirement. Moreover in many typical conditions, Massive MIMO systems will not transmit predominantly to one user and in one direction. One could then operate the PAs efficiently in their non-linear region. Hence, a considerable further improvement of the power consumption could be achieved. However, the inconvenient truth is that in general, directive emissions of OOB radiation can arise under some conditions. More detailed mathematical models and results can be found in [20].

IV-B Coarse and Lean Convertors

The impact of low-resolution data converters on system performance has been investigated. We give an overview of these theoretical results and discuss them in perspective of actual design constraints and merits of state-of-the-art data converters. These reveal that minimizing the resolution strictly (e.g., below 6 bits) does not result in a significant power reduction in a conventional base station. One should hence question any penalty in system performance and/or additional DSP complexity when considering very low resolution data converters.

A specific type of hardware distortion arises if low-precision A/D converters are used at the base station. Such converters are highly desirable owing to their low cost and power consumption. In principle, for each bit of reduction in resolution, the A/D converter power is halved, while doubling the sampling frequency doubles the power. This is reflected in the common figure-of-merit (F.o.M.) in terms of energy consumption per conversion step (cs) [21] used to assess the design merit of A/D converters implementing different architectural principles and resolution/bandwidth specifications:

$$\mathrm{F.o.M.} = \frac{P}{2^{\mathrm{ENOB}} \cdot f_s} \tag{10}$$

where $P$ is the power consumption of the converter, ENOB is the Effective Number of Bits resolution as measured and $f_s$ is the sampling frequency.
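
The scaling implied by this figure-of-merit can be tabulated with a few lines of Python. The sketch below solves the energy-per-conversion-step relation for the power and evaluates it for several resolutions. The figure-of-merit of 10 fJ per conversion step and the 30.72 MS/s sampling rate are hypothetical inputs chosen for illustration, and `adc_power_watts` is an assumed helper name.

```python
def adc_power_watts(fom_joule_per_step, enob, f_s_hz):
    """Power implied by a figure-of-merit defined as P / (2^ENOB * f_s)."""
    return fom_joule_per_step * (2 ** enob) * f_s_hz


FOM = 10e-15   # hypothetical: 10 fJ per conversion step
F_S = 30.72e6  # hypothetical sampling rate in samples per second

for bits in (10, 8, 6, 4):
    power = adc_power_watts(FOM, bits, F_S)
    print(f"{bits:2d} bits: {power * 1e6:8.2f} uW per converter")
```

Under this model each removed bit halves the converter power, so the absolute saving per step shrinks quickly at low resolutions, which is why the saving has to be weighed against the loss in effective SINR.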

The resulting quantization noise of A/D conversion is fairly easy to model accurately, and rigorous information-theoretic analyses of its effect are available. In some cases, line-of-sight with a single terminal, the quantization noise may combine constructively. However, in frequency-selective, Rayleigh fading channels with large delay-spreads and multi-user beamforming, the distortion averages out over the antennas to a significant extent. Specifically, with 1-bit quantization, the quantization noise has a power equal to where is the received signal power [22], and the aggregate effect of the quantization is approximately a loss in effective SINR of 4 dB. The 1-bit A/D converter case is of particular interest as it allows operation without automatic gain control (AGC), which simplifies hardware complexity. With -bit quantization, , corresponding results can be found in [23], and when grows eventually the capacity formulas for the un-quantized case [3, Ch. 3] are rediscovered. Other authors have derived similar results subsequently [24] – and earlier, using heuristic arguments, [25, 26]. Importantly, these analyses take into account the fact that both the received pilots and the payload data will be affected by quantization noise.

The loss in effective SINR due to quantization needs to be considered relative to the extra power consumption resulting from adding bits of resolution in the A/D converters. Circuit innovation in data converters has brought great improvements in power efficiency. State-of-the-art designs for A/D converter cores achieve figures-of-merit following (10) in the order of fJ/cs [27, 28]. A 6-bit ADC with a speed of several 100 Mbit/s consumes  mW.

Massive MIMO systems operating with low-resolution Digital-to-Analog (D/A) converters at the base station in the downlink transmission have also been studied. There is some evidence that they are sufficient to attain a good performance in terms of achievable link rate [29, 30]. Also, while these analyses are independent of the actual modulation and coding used in the system, numerical end-to-end link simulations have independently arrived at essentially the same conclusion that the degradation of BER performance due to low-precision () D/A converters is negligible [31]. It is however a misconception that the number of bits resolution affects the D/A converter power consumption in a similar way as it does for A/D converters. The constraint on OOB emission in combination with the swing to be delivered to the analog output signal are the dominant factors in the power and complexity of a D/A converter [21]. A relevant standard figure-of-merit (F.o.M.) for current-steering D/A converters is given by

(11)

where SFDR is the spurious free dynamic range, being the distance between the signal and the largest single unwanted component – the spurious signal, and is the peak-to-peak signal swing which accounts for the power (and design problems) needed for generating the analog signal in a digital-to-analog converter. D/A converters with a resolution  bits are conveniently implemented by current injection or resistive architectures whose power consumption is typically not directly impacted by their resolution. In contrast, the complexity of the reconstruction filter in the D/A converter is mostly determined by the SFDR specification, which will eventually determine the out-of-band (OOB) harmonic distortion. Digital predistortion and analog filtering to reduce OOB emissions have been proposed for coarsely quantized precoding in Massive MIMO [32]. The extra processing complexity in deeply scaled technology will be very reasonable, yet a degradation of the in-band signal-to-interference-noise-and-distortion ratio (SINDR) on the link is introduced. This presents the same trade-off between in-band transmission versus out-of-band rejection encountered in D/A converter design.

The trend in broadband wireless systems to increase spectral efficiency through a combination of higher order modulation constellations and conventional multi-layer MIMO has raised the resolution requirement for data converters  bits. Massive MIMO can operate without noticeable implementation loss with only -bit A/D and D/A converter resolution. This reduces the power consumption of an individual A/D converter specifically with a factor , which more than compensates for the fact that times more converters are needed. It is however neither necessary nor overall beneficial to reduce the resolution of A/D and D/A converters below 6 bits:

  • On uplink, reducing the A/D resolution further will save less than  mW in a 100 antenna basestation.

  • On downlink, a potential implementation loss of  dB or more due to D/A converters with a lower resolution may require more power in the PA stage. More importantly, the constraints on OOB emission will not be met. Dedicated processing will hence be needed to avoid or filter out unacceptable leakage in adjacent bands.

IV-C Reciprocity Calibration in RF Front-Ends

Channel estimates are obtained from uplink pilots; see Section II. In practice, the response observed by the digital baseband processing for each user includes both the propagation channel and the transceiver transfer functions. The full responses for uplink and downlink can be expressed as:

(12)

where and are complex diagonal matrices containing the base station receiver and transmitter responses, and and are the responses of the transmitter and receiver of user terminal . While the responses of the propagation channel are reciprocal, the responses of the front-ends will typically cause non-reciprocity in the full response. In the precoded Massive MIMO downlink reception the following holds:

(13)

When the corresponding estimates of are used to calculate the precoding coefficients, they will introduce Multi-User Interference (MUI) and potentially an SNR loss, depending on the precoding vectors . We include the derivation for the zero-forcing precoder, and refer to [33] for a comprehensive treatment. Under the assumption of negligible channel estimation errors and considering normalized responses to simplify notation,1 the received signals at the terminals are given by

(14)

where and are the -vectors of transmitted symbols and received noise samples, respectively. Writing out the front-end responses gives the following expression:

(15)

where and are diagonal matrices containing the transmitter and receiver responses of terminals and . Equation (15) shows that in general the combined precoding, channel, and transceiver responses will not result in a diagonal matrix. As a result, MUI will occur. Structurally it is the multiplication of the base station’s front-end responses that is responsible for the MUI. The terminal responses appear as scalar multiplications on the received symbols and will be contained in the equalization processing in the terminal. A suitable calibration procedure operating locally at the base station can restore the reciprocity. Calibration data needs to be obtained through measurements of the transceiver front-end responses, for which several approaches have been proposed and validated:

  • Utilization of an auxiliary front-end, which sequentially measures the RF transceiver front-ends. The method works well in conventional MU-MIMO systems [34]. However, it does not scale well to large numbers of antennas.

  • Exploitation of the coupling, essentially radio propagation, between antennas in the array to derive the relative differences among the transceiver responses. This solution has been implemented in real-life testbeds and performs well [35].

Analysis has shown that non-reciprocity requirements are not as severe for Massive MIMO as in conventional systems [33] and depend on the system load and precoding algorithms. The RF transceiver responses may vary in time mainly due to temperature differences. The calibration procedure hence needs to be repeated on a regular basis. In typical conditions the required updating frequency is in the order of hours. It thus introduces only very limited overhead.
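The effect of front-end non-reciprocity on zero-forcing precoding, and its removal by base-station calibration, can be illustrated with a small numerical sketch. The notation below is an assumption and not taken from the equations above: `T_B`/`R_B` and `T_U`/`R_U` denote diagonal transmitter/receiver responses at the base station and the terminals, `H` is the reciprocal propagation channel, and the front-end model (gain near one, arbitrary phase) is purely illustrative. The calibration matrix `C` is assumed to be known exactly, whereas in practice it is estimated, typically up to a common scalar.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 64, 8  # base-station antennas and single-antenna terminals (illustrative sizes)

def random_frontend(n):
    """Diagonal front-end response: gain near 1, arbitrary phase (assumed model)."""
    gain = 1.0 + 0.1 * rng.standard_normal(n)
    phase = rng.uniform(-np.pi, np.pi, n)
    return np.diag(gain * np.exp(1j * phase))

def zf_precoder(G):
    """Zero-forcing precoder (right pseudo-inverse) for a K x M channel estimate G."""
    return G.conj().T @ np.linalg.inv(G @ G.conj().T)

def offdiag_ratio(A):
    """Off-diagonal (interference) energy relative to diagonal (useful) energy."""
    D = np.diag(np.diag(A))
    return np.linalg.norm(A - D) / np.linalg.norm(D)

# Reciprocal propagation channel (K x M), i.i.d. Rayleigh
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
T_B, R_B = random_frontend(M), random_frontend(M)  # base-station TX / RX chains
T_U, R_U = random_frontend(K), random_frontend(K)  # terminal TX / RX chains

H_dl = R_U @ H @ T_B  # full downlink response seen by the terminals
G_ul = T_U @ H @ R_B  # uplink-based estimate (written in K x M orientation)

# Base-station-only calibration: replace R_B by T_B in the uplink estimate
C = np.linalg.inv(R_B) @ T_B

mui_uncalibrated = offdiag_ratio(H_dl @ zf_precoder(G_ul))
mui_calibrated = offdiag_ratio(H_dl @ zf_precoder(G_ul @ C))
print(f"relative MUI without calibration: {mui_uncalibrated:.3f}")
print(f"relative MUI with calibration:    {mui_calibrated:.2e}")
```

In this sketch the calibrated effective channel reduces to the diagonal matrix `R_U @ inv(T_U)`, which is consistent with the observation that the terminal responses only scale the received symbols and can be absorbed by the equalization in the terminal.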

V Algorithm-Hardware Co-Design for Precoding and Detection

The central detector and precoder perform the crucial operations to achieve spatial multiplexing. This section zooms in on the hardware implementation of the precoding and detection algorithms.

Algorithms Per channel realization Per channel use
Neumann Series
Cholesky Decomp.
Modified QR Decomp.
Coordinate Descent -
TABLE II: Computational complexity (number of real multiplications) of different detection techniques.

V-A Implementation Challenges and Design Considerations

Linear processing provides good precoding and detection performance under favorable propagation conditions. However, linear processing in Massive MIMO does not necessarily result in low computational complexity given that the operations need to be performed on large matrices. For instance, the complexity of computing for an matrix is

(16)

This number is in the order of for an , system. In TDD Massive MIMO systems, processing latency is a crucial design consideration, especially for high-mobility scenarios. The analysis in [9] shows that the time budget for operating the precoding is 150 µs to support a moderate mobility of km/h. The high computational complexity and processing speed need to be handled with reasonable hardware cost and power consumption. These implementation challenges necessitate meticulously optimized solutions following a systematic algorithm-hardware co-design methodology.

A central property of Massive MIMO is that the column vectors of the channel matrix are asymptotically orthogonal under favorable propagation conditions. As a result, the Gram matrix, , becomes diagonally dominant, i.e.,

(17)

and for i.i.d. channels,

(18)

The extent of the diagonal dominance varies with the characteristics of the antenna array, the propagation environment, and the number of users served. Exploiting this dominance, approximate matrix inversion can be performed to reduce the computational complexity. Matrix inversion approaches can be categorized into three types: explicit computation, implicit computation, and hybrid methods. We next assess the complexity and suitability of these methods.
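The diagonal dominance of the Gram matrix can be checked numerically. The following is a minimal sketch assuming an i.i.d. Rayleigh channel with unit-variance entries and a Gram matrix normalized by the number of antennas; the function names and the antenna counts are illustrative choices, not taken from the referenced works.

```python
import numpy as np

def normalized_gram(M, K, rng):
    """Gram matrix H^H H / M of an M x K i.i.d. Rayleigh channel (assumed model)."""
    H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
    return H.conj().T @ H / M

def max_offdiag(Z):
    """Largest off-diagonal magnitude of a square matrix."""
    return np.abs(Z - np.diag(np.diag(Z))).max()

rng = np.random.default_rng(1)
K = 8
for M in (8, 32, 128, 512):
    Z = normalized_gram(M, K, rng)
    print(f"M={M:4d}  mean diagonal={np.mean(np.diag(Z).real):.3f}  "
          f"max |off-diagonal|={max_offdiag(Z):.3f}")
```

For this channel model the diagonal entries concentrate around one while the off-diagonal entries shrink roughly as the inverse square root of the number of antennas, which is the property the approximate inversion methods below rely on.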

V-B Explicit Matrix Inversion

Explicit matrix inversion can be performed using approaches such as Gaussian elimination, Neumann series expansion [36], and truncated polynomial expansion [37]. Recently, the Neumann series approximation has been identified as one of the most hardware-friendly algorithms for Massive MIMO systems [38, 39]. If a matrix satisfies

(19)

its inverse can be approximated by a Neumann series with terms as:

(20)

where is a pre-conditioning matrix. The number of terms, , can be used as a tuning parameter to trade off between complexity and accuracy. It is shown in [39] that using the main diagonal of the Gram matrix,

(21)

as the pre-conditioning matrix, the Neumann series approximation can provide close-to-exact-inversion performance with when . However, a significant performance loss is demonstrated when . To improve the accuracy, the following weighted Neumann series approximation was introduced in [40, 41]:

(22)

In [40], the coefficients are selected by solving the equation

(23)

where

(24)

At the price of extra computational complexity, the method in (22) improves the performance significantly, especially in cases with a high user load.
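As an illustration of the unweighted Neumann series approximation with a diagonal pre-conditioner, the sketch below sums the first terms of the series and compares the result to the exact inverse. The matrix names (`A` for the Gram matrix, `D_inv` for the inverse of its diagonal) and the system size are assumptions made for this example; the weighted variant of [40, 41] is not reproduced here.

```python
import numpy as np

def neumann_inverse(A, num_terms):
    """Approximate inv(A) by sum_{n=0}^{num_terms-1} (I - D^-1 A)^n D^-1,
    with D the main diagonal of A (diagonal pre-conditioning)."""
    D_inv = np.diag(1.0 / np.diag(A))
    E = np.eye(A.shape[0]) - D_inv @ A
    term = D_inv.copy()             # n = 0 term
    approx = np.zeros_like(A)
    for _ in range(num_terms):
        approx = approx + term
        term = E @ term             # next power of E applied to D^-1
    return approx

def relative_error(A, num_terms):
    exact = np.linalg.inv(A)
    return np.linalg.norm(neumann_inverse(A, num_terms) - exact) / np.linalg.norm(exact)

rng = np.random.default_rng(2)
M, K = 128, 8  # illustrative antenna and user counts
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
A = H.conj().T @ H  # Gram matrix

for L in (1, 2, 3, 5, 8):
    print(f"terms={L}  relative error={relative_error(A, L):.2e}")
```

Each additional term costs one more matrix multiplication, which is the complexity-accuracy trade-off controlled by the number of terms. Repeating the experiment with a smaller antenna-to-user ratio shows the slower convergence that motivates the weighted variant.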

V-C Implicit Matrix Inversion

Implicit matrix inversion uses linear solvers such as conjugate-gradient [42], coordinate-descent [43], and Gauss-Seidel [44] to perform linear precoding and detection, without explicitly calculating the Gram matrix inverse. In [43], the coordinate-descent method is adopted to realize an MMSE detector. The regularized squared Euclidean distance,

(25)

is minimized sequentially for each variable in in a round-robin fashion. In (25), is the variance of each complex entry in the noise vector . In each iteration, the solution for the th element in is

(26)

This procedure is then repeated for iterations.
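The round-robin update can be sketched as follows. This is a minimal illustration under assumed notation: `H` is the channel matrix, `y` the received vector, `N0` the noise variance, and the cost function is taken to be the squared residual norm plus `N0` times the squared norm of the symbol vector. It is not the hardware-oriented formulation of [43]; it only shows that the sequential per-element minimization converges to the MMSE solution without forming a matrix inverse.

```python
import numpy as np

def cd_mmse_detect(H, y, N0, iters):
    """Coordinate-descent minimization of ||y - H s||^2 + N0 ||s||^2."""
    M, K = H.shape
    s = np.zeros(K, dtype=complex)
    r = y.astype(complex).copy()          # running residual y - H s
    col_energy = np.sum(np.abs(H) ** 2, axis=0)
    for _ in range(iters):
        for k in range(K):                # round-robin over the symbols
            h = H[:, k]
            # Closed-form minimizer for element k with all others fixed
            s_new = (np.vdot(h, r) + col_energy[k] * s[k]) / (col_energy[k] + N0)
            r -= h * (s_new - s[k])       # keep the residual consistent
            s[k] = s_new
    return s

rng = np.random.default_rng(4)
M, K, N0 = 128, 8, 0.1  # illustrative values
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
s_true = (rng.choice([-1, 1], K) + 1j * rng.choice([-1, 1], K)) / np.sqrt(2)
noise = np.sqrt(N0 / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = H @ s_true + noise

# Reference: exact MMSE solution via the regularized normal equations
s_exact = np.linalg.solve(H.conj().T @ H + N0 * np.eye(K), H.conj().T @ y)

def rel_err(iters):
    return np.linalg.norm(cd_mmse_detect(H, y, N0, iters) - s_exact) / np.linalg.norm(s_exact)

for it in (1, 2, 3, 10):
    print(f"iterations={it}  relative deviation from exact MMSE={rel_err(it):.2e}")
```

Tracking the residual instead of recomputing it keeps the cost of each element update linear in the number of antennas, which is one reason this family of solvers maps well onto hardware.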

V-D Hybrid Method

Matrix decomposition algorithms factorize the Gram matrix into intermediate matrices, which are generally triangular. Forward or backward substitution is then performed to accomplish the corresponding precoding and detection operation. The solution in [45] utilizes QR-decomposition. The Gram matrix is decomposed as

(27)

where is unitary and is upper triangular. The linear equation is then rewritten as

(28)

which can be solved using backward substitution. This method avoids the explicit computation of matrix inverses, relaxing (to some extent) the requirements on data representation accuracy. By exploiting the diagonally dominant property of the Gram matrix, modified QR-decomposition can be performed [45]. For instance, the original solutions

(29)

to the Givens rotation operation

(30)

are approximated by

(31)

Equation (31) makes use of the fact that and results in 50% complexity savings by introducing the constant .

Cholesky-decomposition () has also been studied for Massive MIMO precoding and detection implementation [46, 47]. It has lower computational complexity than the Neumann series expansion method (with ) [39] and provides accurate processing independent of and . More importantly, the Cholesky decomposition imposes lower memory requirements, since only the lower triangular matrix needs to be stored.
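The two decomposition-based routes can be sketched in a few lines. The example below uses the standard, unmodified QR and Cholesky factorizations from NumPy together with hand-written substitution routines; the approximate Givens rotations of [45] are not reproduced. The names `Z` (Gram matrix) and `b` (matched-filter output) are assumptions made for this illustration.

```python
import numpy as np

def back_substitute(R, b):
    """Solve R x = b for upper-triangular R."""
    n = R.shape[0]
    x = np.zeros(n, dtype=complex)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

def forward_substitute(L, b):
    """Solve L x = b for lower-triangular L."""
    n = L.shape[0]
    x = np.zeros(n, dtype=complex)
    for i in range(n):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

rng = np.random.default_rng(5)
M, K = 128, 8  # illustrative sizes
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
Z = H.conj().T @ H   # Gram matrix
b = H.conj().T @ y   # matched-filter output

# QR route: Z = Q R, then solve R x = Q^H b by backward substitution
Q, R = np.linalg.qr(Z)
x_qr = back_substitute(R, Q.conj().T @ b)

# Cholesky route: Z = L L^H, then one forward and one backward substitution
L = np.linalg.cholesky(Z)
x_chol = back_substitute(L.conj().T, forward_substitute(L, b))

x_ref = np.linalg.solve(Z, b)
print(np.allclose(x_qr, x_ref), np.allclose(x_chol, x_ref))
```

In both cases the factorization is computed once per channel realization, while the substitutions are repeated for every received vector. Only the triangular factor `L` has to be kept for the Cholesky route.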

V-E Complexity versus Accuracy Trade-Off

Fig. 7: Simulated performance of different detection methods. The subscripts in the legend indicate the fixed-point resolution of the fractional part. Markers at dB performance loss mean that the corresponding detection scheme has a performance loss greater than dB or shows an error floor before reaching a BER of .

Selecting appropriate processing algorithms for Massive MIMO is non-trivial, and an analysis of the trade-off between computational complexity and processing performance is necessary. Reference [48] presents such an analysis for different MMSE detection techniques.

To evaluate the processing accuracy, we simulate the performance of different detection techniques including Neumann series approximation (NSA), Cholesky decomposition (ChD), modified QRD (MQRD), and coordinate descent (CD). The effect of fixed-point arithmetic is also taken into consideration to examine the required data precision. In the simulations, , sweeps from 8 to 32, and an i.i.d. block Rayleigh fading channel with perfect channel estimation and synchronization was considered. A rate- convolutional code with generator polynomial [171, 133] and a constraint length of 7 was used. Figure 7 shows the performance at BER relative to floating-point ZF detection. The number of iterations for the NSA and CD was set to 3. Implicit and hybrid methods are more robust to lower resolutions, while NSA requires a larger number of bits to calculate the matrix inverse explicitly. When is small the Gram matrix becomes less diagonally dominant and approximate matrix inversion methods suffer from a larger performance loss. CD offers better interference cancellation when the user load is relatively high.

Fig. 8: Computational complexity (per instance of the detection problem) of different implementations of ZF detection, for different numbers of base station antennas, numbers of users, and channel coherence duration.

Table II lists the corresponding computational complexity in terms of number of real multiplications. The computation is divided into two parts depending on how frequently it needs to be executed, i.e., per channel realization and per channel use (instance of the detection problem). The Gram matrix calculation, matrix decomposition, and matrix inversion are performed when the channel changes, while matched-filtering and backward/forward substitution are performed for each received vector. Thereby, the computational complexity depends on the channel dynamics, i.e., the number of samples () during which the channel is constant. Figure 8 depicts the results. Different system setups and channel conditions are analyzed. While changing , , and in the three sub-figures, the other two are fixed to , , and , respectively. Several observations can be made. The detection complexity grows linearly with , enabling large savings in transmit power by deploying large numbers of antennas, with a mild increase in the processing power. Moreover, the processing complexity (for explicit and hybrid matrix inversion algorithms) can be dramatically reduced in static environments, in which case the channel matrix-dependent operations are performed very rarely.

In addition to the processing accuracy and computational complexity, parallelism is an important aspect to be considered, and it highly impacts the processing latency. Iterative algorithms such as Neumann series approximation and coordinate descent can suffer from a long processing latency for MUI-dominant channels. On the other hand, matrix decomposition can be performed in a more parallel fashion and was thus selected for the first Massive MIMO precoder-detector chip introduced in the next section. Moreover, the intermediate results , , and can be shared between the uplink and downlink processing, further simplifying the hardware.

V-F 128×8 Massive MIMO Precoder-Detector Chip Achieving 300 Mb/s at 60 pJ/b

Fig. 9: Simple, configurable, and scalable architecture for QRD-based massive MIMO precoder (From [45]).

Integrated hardware implementations will ultimately define both the performance and power consumption of Massive MIMO systems. Hence, algorithms should be selected such that the corresponding operations can be mapped into simple, configurable, and scalable hardware architectures to enable high throughput, low latency, and flexible implementation. The reconfigurability and scalability are essential to enable efficient operation in a wide range of conditions. In this section we present a design [45] demonstrating such an algorithm and hardware architecture co-design, where the QR-decomposition based ZF precoding is mapped onto a systolic array architecture; see Figure 9. The systolic array consists of a homogeneous network of elementary processing nodes, where each node performs the same pre-defined tasks. Due to the homogeneity, the architecture is scalable to support different and . The data flow in a systolic array is straightforward and parallel, leading to a simple and high-speed hardware implementation.

Fig. 10: Microphotographs of massive MIMO precoder and detector chips: (a) From [45] (b) From [49].

The QR-decomposition based precoder, together with a Cholesky decomposition based detector, was fabricated using nm FD-SOI (Fully Depleted Silicon On Insulator) technology. Figure 10(a) shows a photograph of the chip. It occupies only a mm silicon area and consumes  mW power for precoding and detection for a 128×8 Massive MIMO system with a 300 Mb/s throughput. The fabricated chip and the measurement results prove that the Massive MIMO concept works in practice and that system-algorithm-hardware co-optimization enables record energy-efficient signal processing. The cross-level design approach also applies advanced circuit techniques leveraging the flexible FD-SOI body bias feature [50]. Using forward body bias or reverse body bias allows systems to dynamically adjust processing speed and power consumption of the chip towards the most efficient operating point.

The algorithm-hardware co-design method is further exploited in [49] to map an iterative expectation-propagation detection (EPD) onto a condensed systolic array for higher hardware resource utilization. This detector chip (Figure 10(b)) is fabricated using nm FD-SOI technology and provides 1.8 Gb/s throughput with  mW power consumption. It offers a 3 dB processing gain compared to [45], equivalent to a 2× boost in link margin that can be utilized to lower the TX power and relax the front-end requirements.
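The headline figures of this section are related by simple arithmetic, sketched below. The helper names are hypothetical. The first calculation assumes that the quoted energy per bit and throughput refer to the same operating point, and the result is derived from the section title rather than taken from a measurement report.

```python
def power_mw(throughput_bps, energy_j_per_bit):
    """Average power in mW implied by a throughput and an energy per bit."""
    return throughput_bps * energy_j_per_bit * 1e3

def db_to_linear(gain_db):
    """Convert a power ratio in dB to a linear factor."""
    return 10.0 ** (gain_db / 10.0)

# 300 Mb/s at 60 pJ/b (figures from the section title) corresponds to about 18 mW
implied_power = power_mw(300e6, 60e-12)

# A 3 dB processing gain corresponds to roughly a factor of two in power
link_margin_factor = db_to_linear(3.0)

print(f"implied power: {implied_power:.1f} mW")
print(f"3 dB as a linear factor: {link_margin_factor:.2f}")
```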

VI Per-Antenna Chain Processing at the Semiconductor Edge

An obvious concern is how the large number of antennas and the associated signal processing will affect the cost and energy consumption of the base station. The individual antenna signals may have low precision, but regardless of that, the coherent combination yields excellent SNR eventually. We demonstrate below that the resolution of digital signals and operators, such as filtering coefficients, can be scaled back sharply. Furthermore, we advocate processing of the per-antenna functionality without the conventional circuit design margins that are used to cope with uncertainties in the semiconductor technology. This approach has been called “at the semiconductor’s edge” to indicate an operation point where the performance-energy benefit of the technology is maximally exploited at the expense of reliability [51]. Specifically, voltage over-scaling offers significant energy reductions in deeply scaled CMOS, up to more than 50%, at the risk of occasional processing errors. Massive MIMO systems can be designed to meet required performance levels when operating with error-prone digital signal processing circuits. Circuits remain functional even for the worst-case scenario in which the DSP circuitry in some antenna paths fails completely, for example by a broken power supply. We will call the situation where the signal in an antenna branch is fully lost “antenna outage”.

VI-A Per-Antenna Functions: Coarse Processing Provides Excellent Performance

Massive MIMO can operate well with low-resolution signals. A profiling of the per-antenna functionality in terms of generic operations per second shows that for an LTE-like setup, about of the complexity is in the filtering and the remaining is in the (I)FFT operation. The filtering functionality is the most demanding because of the need to over-sample and hence process at high speed. Significant savings in complexity are therefore possible by minimizing the resolution of this processing. An exploration of the word lengths of the data signals, , and of the filtering coefficients, , is reported in [52]. The circuit area complexity of the -tap FIR filtering of I- and Q-signals as a function of the word lengths is calculated using basic formulas for the complexity of adders and multipliers, which depend on the word lengths and of the operands as follows:

(32)

If a smaller number of bits is used to represent the signals and the filter coefficients, the hardware complexity as given in (32) is reduced. However, decreasing the word length will increase the quantization noise. For a desired transmission quality the just-sufficient precision can be determined. Considering that the quantization noise will be independent among the antennas, its combined impact will be smaller for larger numbers of antennas. This effect is illustrated in Figure 11 for the rather demanding 64-QAM case, and an uncoded Bit-Error Rate (BER) of . The curves were generated based on individual BER vs. SNR simulations for different coefficient and signal resolutions, from which the equal performance points were extracted. Dotted lines show equal-complexity (in terms of area) solutions. For a 128×4 Massive MIMO system, 4 and 5 bits are sufficient for the signals and the coefficients, respectively, for the targeted performance. This brings a 62% complexity reduction for the filters compared to the 8×4 case. The outer right points on the curves are clearly always suboptimal and demonstrate that high-precision filter coefficients do not improve performance, while they can cause a significant complexity penalty. A similar observation holds for the upper left points.

Fig. 11: Representation of the relative circuit complexity (area) as function of the signal and filter coefficient word lengths. The markers show possible operating points with a BER of . The dashed lines with numbers show operating points with equal complexity. The graphs demonstrate that low-resolution processing is feasible with large antenna arrays. From [52].

For higher system loads, more bits are needed. At the system level one could trade-off system load for constellation order to satisfy throughput requirements.

This analysis provides evidence that low-complexity, coarse processing in the digital filters of the individual antenna signals can offer the required performance in Massive MIMO. In the downlink the signals will next be passed to the D/A converters. The latter could be low resolution as well. The more demanding design challenge for D/A converters however is to meet out-of-band emission specifications, as introduced in Section IV.

The (I)FFT operations required in Massive MIMO systems with multicarrier modulation can also be designed for Massive MIMO operation specifically and benefit from the complexity reduction brought by the coarse quantization. A thorough optimization is however quite complex and should consider varying quantization at the different butterfly stages.

VI-B Processing at the Semiconductor’s Edge

Applications have benefited over the last decades from Moore’s law, providing ever higher performance at lower power consumption. Integrated Circuits (ICs) have been able to operate at lower dynamic power thanks to the scaling of the supply voltage . For digital circuits, the average dynamic power consumption is

(33)

where is the effective switching capacitance of the module and is the switching frequency. Clearly, scales quadratically with the supply voltage .
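The quadratic dependence on the supply voltage can be made concrete with the standard CMOS dynamic-power relation, power = capacitance × voltage² × frequency, which is assumed here to correspond to (33). The function and parameter names are hypothetical, and the sketch keeps the switching frequency fixed while the voltage is scaled.

```python
def dynamic_power(c_eff, v_dd, f_sw):
    """Average dynamic power of a digital CMOS block (standard relation, assumed)."""
    return c_eff * v_dd ** 2 * f_sw

def relative_saving(v_scaled, v_nominal):
    """Fraction of dynamic power saved by scaling the supply at fixed frequency."""
    return 1.0 - (v_scaled / v_nominal) ** 2

# Example with illustrative numbers: 1 nF effective capacitance, 100 MHz clock
p_nominal = dynamic_power(1e-9, 1.0, 100e6)
p_scaled = dynamic_power(1e-9, 0.7, 100e6)
print(f"nominal: {p_nominal * 1e3:.1f} mW, scaled to 0.7 V: {p_scaled * 1e3:.1f} mW")
print(f"saving: {relative_saving(0.7, 1.0):.0%}")
```

Lowering the supply to 70% of its nominal value already roughly halves the dynamic power, which is in line with the energy reductions of more than 50% mentioned for voltage over-scaling.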

However, with scaling towards deep sub-micron CMOS technologies (65 nm and smaller), designers are facing ever-increasing variability challenges. Process, voltage, and temperature (PVT) variations are considered to be the three main contributors to circuit variability. Conventionally, to cope with this challenge, ICs are designed at the worst PVT corners, to ensure that they always operate correctly. Figure 12 illustrates the different operating regions for ICs suffering from manufacturing variability.

Fig. 12: Different approaches to scaling of the supply voltage to cope with speed variability. Operation at the worst-case corner misses out on potential energy savings. Adaptive Voltage Scaling (AVS) provides the just-needed for the circuit to function error-free. A further reduction of by Voltage Over-Scaling (VOS) would save more power, yet would introduce processing errors.

The conventional design approach for worst-case conditions introduces considerable margins, leading to reduced peak performance and wasted power consumption. The worst-case synthesis assumes that all devices in the circuit operate in the slow-process corner and experience the least favorable voltage and temperature conditions. Temperature variations can yield up to 20% speed differences for a single D flip-flop. For instance, [53] shows that for 28-nm technology, the performance (speed) difference for a representative circuit is as large as times between the typical case and the worst case. Adaptive scaling techniques manage power dissipation and temperature by using a variable supply voltage .

Scaling down the supply voltage is regarded as an error-free power saving method as long as the signal timing constraints are met. However, the critical (minimum) that guarantees timing closure cannot be determined at design time due to PVT variabilities and aging effects.

A third design approach has recently gained interest, namely to scale the below the critical supply voltage, which is called Voltage Over-Scaling (VOS). In the VOS approach, the designer accepts that sporadic errors might occur: for logic components, the signal from the longest propagation paths can be mis-captured; for memory components, it may lead to incorrect write/read data/address or data loss. This methodology of approximate computing enables very energy-efficient processing [54]. Wireless communication systems are designed to cope with distortions and errors occurring on the channel. They are hence inherently good candidates for error-resilient processing solutions. In Massive MIMO, the large number of antennas implies redundancy in the system. It is promising to apply VOS specifically in the per-antenna processing, reaching beyond the reliability margins of the circuits, but still operating at a point where the computations are more often correct than wrong.

VI-C Massive MIMO Resilience to Circuit Errors

Massive MIMO is inherently resilient to some circuit errors in the per-antenna processing. Hardware errors in a number of antenna paths can be absorbed by the system thanks to the averaging induced by the large number of antennas – reminiscent of how the effects of small-scale fading average out in the coherent multi-user MIMO processing [3]. Semiconductor process variability was at first experienced globally, between wafers or circuits separated in space on a silicon wafer, hence die-to-die. Designers have thus realistically assumed transistor parameters to be correlated for nearby circuits on a specific die and chip. However, in deeply scaled technologies, device variability is mostly caused by the inaccuracy of lithography and etch technology. Intra-die (local) variations have consequently become significant, and are even reported dominant over global variations [55]. This apparent design challenge comes with a new opportunity to shave margins in the implementation of Massive MIMO. Indeed, different from the distortion resulting from non-linearities, the digital distortion is independent of the signal and hence uncorrelated over the antennas. The Massive MIMO system will continue functioning even when, sporadically, one or a few individual antenna signals are subject to full failure. As mentioned in Section VI-B, this opens the door to operation of circuits with much lower design margins compared to traditional specifications, and most interestingly at lower supply voltages and hence power consumption.

The digital hardware errors in (I)FFT and filters introduced by silicon unreliability and by ambitious design methodologies result in incorrect bits during signal processing. This can be regarded as digital circuit distortion. We characterize the impact on the purity of the signal in terms of the signal-to-digital distortion ratio (SDDR):

(34)

where and are the powers of the error-free digital antenna signal output, and the noise power of the digital distortion due to circuit unreliability, respectively. First, we consider VOS errors which are temporary and local in nature. The BER-performance is shown in Figure 13 for a severe SDDR distortion, where signals get stuck at a fixed value. Results for different modulation orders and both uncoded and coded performance (rate 3/4 soft decoded LDPC) are shown. The resulting SNR degradation remains limited to  dB for of the antennas being a “victim” of circuit errors in the coded 16-QAM case, and even up to of the antennas in the QPSK case.

Fig. 13: BER performance versus channel SNR. Randomly affected “victim antennas” from significant digital hardware errors for uncoded and coded (3/4 soft LDPC) QPSK, and uncoded and coded (3/4 soft LDPC) 16-QAM. From [51]. The legend denotes: i) error-free (star markers), ii) 3% victim antennas (circle markers), and iii) 10% victim antennas (triangle markers).

When operating deeply scaled circuits without margins, occasionally a full circuit failure may occur. The impact of this effect on the Massive MIMO system performance is called “antenna outage”. The digital outputs are then permanently stuck at a fixed value, which is assumed to be the maximum possible value. The SDDR of the outage antenna is , as the signals from the victim antennas are completely lost. This model is regarded as a worst-case hardware failure. Note that the SDDR does not imply infinite noise to the whole system, as only the victim antennas are affected and their PA power is normalized among all antennas. Therefore, a single antenna outage will not cause the system to fail entirely. The impact on the system performance is shown in Figure 14 for different antenna outage levels and system loads, for the pessimistic case where the errors are not detected.

Fig. 14: Impact of antenna outage on Massive MIMO system performance depends on the system load, for the pessimistic case where the errors are not detected. Disabling antennas will limit the impact of antenna outage on the Massive MIMO system performance. From [51].

As demonstrated, Massive MIMO can operate well with rather severe circuit errors, and thus allows significant VOS. The impact increases with higher system load and higher-order modulation constellations. The may be adapted according to the system parameters to always offer just sufficient performance. In-situ monitoring based on test signals can be applied to perform adequate scaling [51].

In order to further improve the system robustness towards hardware errors, techniques can be applied that first detect hardware errors and next either neglect or, if needed, disable defective hardware. Importantly, the distortion originating from digital circuit errors fundamentally differs from pure random noise. While process variations may feature continuous random distributions, their effect typically results in discrete error events. Dedicated monitoring circuitry can be established [56] for the functional components such as (I)FFTs and filters, that will detect these errors. If the Massive MIMO system receives information from the hardware level on failing circuits, it can adapt its signal processing accordingly. One option is to disable systematically failing antenna paths and no longer consider them in the central processing. The BER results are given in Figure 15 for a case with moderate system load (10×100 in this simulation). It shows that excluding defective circuits limits the degradation level to  dB on uncoded QPSK for up to of the antenna paths failing. This approach is equivalent to operating the Massive MIMO system with a reduced number of BS antennas . For a representative case of QPSK transmission in a 100-antenna, 10-user scenario and with 28 nm standard CMOS technology, up to 40% power savings can be achieved with negligible performance degradation [57].
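The benefit of excluding failing antenna paths from the central processing can be reproduced qualitatively in a small uplink experiment. The sketch below is an illustration under stated assumptions, not the simulation setup of [51]: a single channel use, i.i.d. Rayleigh fading, zero-forcing detection via least squares, 10% of the antennas stuck at an arbitrarily chosen large value, and perfect knowledge of which antennas failed.

```python
import numpy as np

def zf_detect(H, y):
    """Zero-forcing detection as the least-squares solution of H s = y."""
    return np.linalg.lstsq(H, y, rcond=None)[0]

def rms(v):
    return np.sqrt(np.mean(np.abs(v) ** 2))

rng = np.random.default_rng(3)
M, K, N0 = 100, 10, 0.1  # 100 antennas, 10 users; noise variance is illustrative
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
s = (rng.choice([-1, 1], K) + 1j * rng.choice([-1, 1], K)) / np.sqrt(2)  # QPSK
noise = np.sqrt(N0 / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = H @ s + noise

# Antenna outage: outputs of the victim antennas are stuck at a fixed value
failed = rng.choice(M, size=10, replace=False)
y_faulty = y.copy()
y_faulty[failed] = 4.0 + 4.0j   # assumed "maximum" output value

# Pessimistic case: the errors are not detected and all antennas are used
err_undetected = rms(zf_detect(H, y_faulty) - s)

# Errors detected: failing paths are disabled and dropped from the processing
healthy = np.setdiff1d(np.arange(M), failed)
err_disabled = rms(zf_detect(H[healthy], y_faulty[healthy]) - s)

print(f"RMS symbol error, outage undetected: {err_undetected:.3f}")
print(f"RMS symbol error, antennas disabled: {err_disabled:.3f}")
```

With the failing rows removed, the detector simply operates as a system with fewer base-station antennas, which matches the equivalence stated above.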

Fig. 15: BER performance is only slightly degraded for up to of antennas failing. Systematic failure of circuits is detected and corresponding antenna signals are discarded. From [51].

In conclusion, lean per-antenna processing can be performed in Massive MIMO systems. The very large number of operations, due to the large number of antenna paths, can be performed with low precision and with a profoundly scaled supply voltage. Combined, these techniques can reduce the power consumption due to the digital processing on each antenna path by an order of magnitude. For an exemplary system with 100 antennas at the base station, the total is comparable to a conventional MIMO system with an order of magnitude fewer antennas.

VII Terminals: Increased Reliability with Low-Complexity Signal Processing

VII-A Increased Service Levels on Low Complexity Terminals

It has been shown that the Massive MIMO system concept does not require any additional specific functionality at the UE side. Massive MIMO terminals that have a single antenna, or apply simple diversity reception, will only be able to receive a single spatial stream. However, large numbers of terminals can be multiplexed in the same time-frequency slot, and every terminal can be allocated the full bandwidth of the system. This results in a throughput per terminal comparable with that of conventional UEs that receive multiple spatial streams in parallel.

5G terminals are expected to come in large numbers and support a diverse set of service requirements. Next to the continued traffic growth towards terminals allocated to human users, a variety of devices will require Machine Type Communication (MTC). Figure 16 illustrates three main use cases envisioned by industry alliances and the International Telecommunication Union (ITU) [58].

Fig. 16: Envisioned use cases for future international mobile telecommunication. (source: Recommendation ITU-R M.2083-0 “Framework and overall objectives of the future development of IMT for 2020 and beyond” [58])

Figure 16 demonstrates that 5G technologies need to do more than enhance mobile broadband links: new solutions are needed to connect a very large number of (ultra-) low-power devices and machines requiring very reliable and low-latency services. Massive MIMO can simultaneously support many broadband terminals in sub-6 GHz bands in indoor, outdoor, and mobile environments. The technology can also be tailored to optimally serve new MTC-based applications. Especially for narrowband MTC, the high array gain and the high degree of spatial diversity offered by Massive MIMO will help. The spatial diversity specifically gives rise to channel hardening.

The effects of array gain and channel hardening are illustrated for a 128-antenna setup in Figure 17. Consistently boosted signal levels over all terminal positions, thanks to the array gain, are observed. Terminals can potentially transmit data at several tens of dB lower output powers. The latter however requires high-quality CSI to be available, and the power allocated to pilots will limit the savings in practice. The channel hardening effect enhances the reliability of the links and improves the quality of service; most notably:

  1. Increased performance at the cell edges, where terminals may experience limited or, in the worst case, no connectivity in current networks. Massive MIMO addresses this challenge, provided good uplink pilot-based CSI acquisition is ensured.

  2. Power savings and hence longer autonomy for battery-powered devices.

  3. Improved reliability. Fewer packet retransmissions can also reduce the end-to-end latency. The specification put forward for Ultra Reliable Low Latency Communication (URLLC) in 5G is to support a reliability, and an end-to-end latency better than 1 ms.

  4. Sustained good service levels in conditions with many simultaneously active users.
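Array gain and channel hardening can be quantified in a short simulation. The sketch below assumes i.i.d. Rayleigh fading with unit average power per antenna and coherent combining with perfect CSI. It is not a model of the measured channels in Figure 17, and the function name is hypothetical.

```python
import math
import numpy as np

def normalized_gain_spread(M, trials=5000, seed=0):
    """Standard deviation over mean of the combined channel gain ||h||^2 / M."""
    rng = np.random.default_rng(seed)
    h = (rng.standard_normal((trials, M)) + 1j * rng.standard_normal((trials, M))) / np.sqrt(2)
    g = np.sum(np.abs(h) ** 2, axis=1) / M
    return np.std(g) / np.mean(g)

# Array gain of coherent combining over 128 antennas, in dB
array_gain_db = 10 * math.log10(128)

for M in (1, 8, 32, 128):
    print(f"M={M:3d}  relative gain fluctuation: {normalized_gain_spread(M):.3f}")
print(f"array gain for 128 antennas: {array_gain_db:.1f} dB")
```

Under these assumptions the relative fluctuation of the effective channel gain decreases as the inverse square root of the number of antennas, so a 128-antenna array turns a strongly fading link into a nearly deterministic one while adding about 21 dB of array gain.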

Fig. 17: The array gain and channel hardening effect demonstrated experimentally, for a , setup. With permission and ©Ove Edfors, Lund University.

In the next paragraphs, we first zoom in on a typical broadband user equipment and indicate how low-power operation can be achieved while keeping backward compatibility with 4G air interfaces. Next, we discuss how tailored Massive MIMO systems have great potential to address the challenging requirements of MTC terminals.

VII-B Energy Efficient Broadband Terminals

No advanced processing is required at the UE in Massive MIMO systems. In contrast, 4G systems deliver broadband services to UEs through multiplexing of several spatial layers. We compare a typical Massive MIMO terminal with the reference case of a MIMO link. The latter requires MIMO detection at the terminal side in the downlink. Figure 18 shows a functional block scheme of a conventional broadband, multiple-antenna terminal receiver.

Fig. 18: A conventional wideband receiver for multiple spatial layers requires complex MIMO detection.

The complexity breakdown of a typical MIMO-OFDM baseband chain identifies channel estimation and MIMO detection as the main bottlenecks. We take as a reference a MIMO-OFDM case where the multiple-antenna processing can be conveniently performed per subcarrier, resulting in relatively low-complexity implementations [59]. We consider a basic linear MIMO detector, and non-linear detectors implementing (ordered) Successive Interference Cancellation (SIC). The latter are required to achieve acceptable system performance, especially in the low-SNR regime and in high-mobility scenarios. The power consumption of the inner modem receiver of the terminal in a Massive MIMO system is estimated relative to published VLSI implementations of conventional MIMO receivers [60, 61]. A range of algorithms and implementations for MIMO detectors has been reported, differing substantially in complexity. Our analysis is based on typical data for the specific components and on our own design know-how. Table III summarizes the assessment for both single-antenna and dual-antenna diversity-reception terminals, demonstrating an expected reduction in power consumption by a factor of 5 to 50 (see footnote 2). The instantaneous throughput will be higher for conventional MIMO terminals receiving several spatial layers. To compare the energy efficiency (in Joule/bit), the same average throughput needs to be considered.

4 x 4 linear detector | 4 x 4 non-linear detector | Massive MIMO, single antenna | Massive MIMO, 2-antenna diversity
TABLE III: Relative power consumption estimates for UE inner modem receivers.

VII-C Tailored Solutions Fit for Low-Power Connected Devices

MTC for sensors and actuators opens the door for a variety of new IoT applications. Low energy consumption is essential to enable long autonomy of devices powered by batteries or even relying on harvested energy. The physics of radio propagation dictates a strong attenuation on the link with distance, $d$:

$$P_r = P_t \, G_r \, G_t \left(\frac{\lambda}{4\pi d}\right)^2 \qquad (35)$$

where $P_r$ and $P_t$ are the received and transmitted powers, respectively, $G_r$ and $G_t$ are directivity gains at the receiving and transmitting end of the link, and $\lambda$ is the wavelength. The above is especially unfortunate for mostly uplink-dominated MTC. Low Power Wide Area Network (LPWAN) technologies are dedicated to connecting IoT nodes at long ranges. We performed measurements with an IoT node communicating via a LoRa gateway [62]. Inspection of the power consumption of this illustrative node in Table IV provides valuable insights. The transmit power is relatively high since the power amplifier needs to provide sufficient power to cope with large-scale fading losses. The energy consumption, which will ultimately determine the autonomy of the node, is shown in Figure 19.

Operation mode Power consumption (mW)
Transmit (see footnote 3)
Receive
Sense
Sleep
TABLE IV: Power consumption in different modes measured on an LPWAN IoT node.
Fig. 19: The transmit energy will dominate the battery time on an LPWAN IoT node.

This pinpoints the fierce challenge of connecting sensor nodes and other autonomous devices at a longer range. Their traffic is mostly dominated by uplink, hence putting the node in the most energy-consuming transmitting mode. Equation (35) reveals that fundamentally only a few parameters can be influenced to improve the link budget. Antennas at IoT nodes, due to size and cost constraints, can hardly offer any gain and, on the contrary, often introduce losses. Massive MIMO systems exhibit a large array gain and apply an adaptive channel-matched beamforming approach. They offer the opportunity to reduce the transmit power in constrained MTC nodes proportionally to the square root of the number of BS antennas, or even proportionally to the number of BS antennas itself if accurate CSI is acquired. This enables the simultaneous service of a large number of devices. This asset is important to keep up with the predicted evolution towards Massive MTC. A Massive MIMO-based LPWAN could also offer extended coverage and increased reliability, provided that a power-efficient solution for the pilot-based CSI acquisition is implemented. This challenge of developing Massive MIMO technology for MTC services is further discussed in Section VIII.
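To make the link-budget argument concrete, the following Python sketch evaluates a free-space link and the transmit-power reduction that the array gain could enable. It assumes that the propagation relation takes the free-space (Friis) form. The function names, the 868 MHz carrier, the 14 dBm transmit power and the 2 km range are hypothetical values chosen for illustration only; they are not measurements from this paper.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def friis_received_dbm(p_t_dbm, g_t_db, g_r_db, distance_m, freq_hz):
    """Received power in dBm for a free-space link (assumed Friis form)."""
    wavelength = C / freq_hz
    # (lambda / (4 pi d))^2 expressed in dB; negative for any practical range
    path_gain_db = 20.0 * math.log10(wavelength / (4.0 * math.pi * distance_m))
    return p_t_dbm + g_t_db + g_r_db + path_gain_db


def tx_power_saving_db(num_bs_antennas, accurate_csi=True):
    """Possible transmit-power reduction in dB.

    Scales with M when accurate CSI is available, with sqrt(M) otherwise.
    """
    factor = num_bs_antennas if accurate_csi else math.sqrt(num_bs_antennas)
    return 10.0 * math.log10(factor)


# Hypothetical node: 14 dBm, unity-gain antennas, 2 km range, 868 MHz carrier
rx_dbm = friis_received_dbm(14.0, 0.0, 0.0, 2000.0, 868e6)

saving_m = tx_power_saving_db(128, accurate_csi=True)      # about 21.1 dB
saving_sqrt = tx_power_saving_db(128, accurate_csi=False)  # about 10.5 dB

print(f"received power: {rx_dbm:.1f} dBm")
print(f"saving with M scaling: {saving_m:.1f} dB")
print(f"saving with sqrt(M) scaling: {saving_sqrt:.1f} dB")
```

With 128 BS antennas, the sketch yields a potential saving of about 21 dB under linear scaling and about 10.5 dB under square-root scaling. The power spent on pilots, which the text identifies as the limiting factor, is not modeled here.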

VIII Demonstrations, Conclusions and Future Directions

VIII-A Signal Processing at Work in Massive MIMO Demonstrations

Demonstrations that have proven the superior spectral efficiency of Massive MIMO and the adequacy of DSP solutions in real-life testbeds are illustrated below. Furthermore, we summarize the conclusions of this paper and outline future research directions.

Fig. 20: Two different Massive MIMO testbeds: (a) the LuMaMi testbed at Lund University with a collocated antenna array (from [63]) and (b) the KU Leuven testbed with separated antenna arrays.

To prove a new wireless technology, it is very important to build up testbeds to conduct verification and evaluate performance in real-life environments with over-the-air transmission. For Massive MIMO it is especially crucial, since performance is dependent on propagation characteristics, and measurement-based channel models themselves are still under development. Thanks to recent advances in Software-Defined Radio (SDR) technology, several Massive MIMO prototype systems have been built by both industry and academia, including the Argos testbed with 96 antennas [10], Eurecom’s 64-antenna testbed [64], Facebook’s ARIES project [65], the 100-antenna LuMaMi testbed from Lund University (Figure 20a) [63], SEU’s 128-antenna testbed [5], and testbeds exploring distributed arrays from the KU Leuven (Figure 20b) [66] and University of Bristol [67].

World-Record in Spectral Efficiency and Massive MIMO in Mobility

The signal processing techniques discussed in this paper, especially the cross-level optimization methodology, have been exploited in the development of Massive MIMO testbeds to enable real-time processing of wide-band signals for large numbers of antennas. For instance, the LuMaMi testbed adopts the processing distribution scheme in Figure 3, where 50 SDRs with Field-Programmable Gate Arrays (FPGAs) are used to perform per-antenna processing in a parallel fashion. Four centralized FPGAs are responsible for per-subcarrier processing, and the Peripheral Component Interconnect Express (PCIe) with direct memory access (DMA) channels handles the data shuffling. QR-decomposition based ZF processing has been implemented to fully leverage the available parallel processing resources in the FPGAs.
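The principle of QR-decomposition based ZF processing can be sketched in a few lines of Python. The example below is a minimal floating-point illustration for uplink detection on one subcarrier, not the fixed-point FPGA implementation of the testbed. The function names `qr_decompose` and `zf_detect`, and the choice of modified Gram-Schmidt as the QR method, are assumptions made for this sketch.

```python
def qr_decompose(H):
    """Thin QR of an M x K complex matrix (list of rows), modified Gram-Schmidt.

    Returns Q as a list of K orthonormal columns (each of length M)
    and R as an upper-triangular K x K matrix (list of rows).
    """
    M, K = len(H), len(H[0])
    cols = [[H[m][k] for m in range(M)] for k in range(K)]
    Q = []
    R = [[0j] * K for _ in range(K)]
    for k in range(K):
        v = cols[k][:]
        for j in range(k):
            # projection of the running residual onto the j-th basis vector
            R[j][k] = sum(q.conjugate() * x for q, x in zip(Q[j], v))
            v = [x - R[j][k] * q for x, q in zip(v, Q[j])]
        norm = sum(abs(x) ** 2 for x in v) ** 0.5
        R[k][k] = norm
        Q.append([x / norm for x in v])
    return Q, R


def zf_detect(H, y):
    """Zero-forcing estimate of x from y = H x, via R x = Q^H y."""
    Q, R = qr_decompose(H)
    K = len(R)
    z = [sum(q.conjugate() * yi for q, yi in zip(Q[k], y)) for k in range(K)]
    x = [0j] * K
    for k in reversed(range(K)):  # back-substitution on the triangular system
        acc = z[k] - sum(R[k][j] * x[j] for j in range(k + 1, K))
        x[k] = acc / R[k][k]
    return x


# Toy example: 4 antennas, 2 users, noiseless channel
H = [[1 + 1j, 2], [0.5j, 1 - 1j], [2, -1j], [1, 1]]
x_true = [1 + 1j, -1 + 1j]
y = [sum(h * x for h, x in zip(row, x_true)) for row in H]
print(zf_detect(H, y))
```

Back-substitution on the triangular factor avoids an explicit matrix inverse, and each subcarrier can be processed independently, which is one reason this formulation maps well onto parallel hardware.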

Diverse field trials, both indoors and outdoors with static and mobile users, have been conducted using the Massive MIMO testbeds. In a 2016 experiment, a 128-antenna Massive MIMO base station served 22 users, each transmitting with 256-QAM modulation, on the same time-frequency resource [67]. The spectral efficiency benefits from the spatial multiplexing as well as from the high constellation order, enabled by the array gain. In practice, protocol overhead and FEC redundancy determine the actual net spectral efficiency. In the demonstration, a spectral efficiency of 145.6 bits/s/Hz was achieved on a 20 MHz radio channel, a many-fold increase with respect to the current 4G air interface. The performance was achieved in an environment without mobility and multi-cell interference, which would constitute the limiting factors for performance in a practical deployment.
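The relation between the gross and net spectral efficiency in this experiment can be checked with simple arithmetic. The short Python sketch below uses only the figures quoted above (22 users, 256-QAM, 145.6 bits/s/Hz, 20 MHz); the variable and function names are ours, and the aggregate rate assumes the net spectral efficiency applies across the full channel.

```python
import math


def gross_spectral_efficiency(num_users, constellation_size):
    """Raw bits/s/Hz if every user sends one stream of the given QAM order."""
    return num_users * math.log2(constellation_size)


gross = gross_spectral_efficiency(22, 256)  # 22 users x 8 bits = 176 bits/s/Hz
net = 145.6                                 # reported net spectral efficiency
net_fraction = net / gross                  # share left after overhead and FEC
net_rate_bps = net * 20e6                   # aggregate rate on a 20 MHz channel

print(f"gross: {gross:.1f} bits/s/Hz")
print(f"net/gross: {net_fraction:.3f}")
print(f"aggregate rate: {net_rate_bps / 1e9:.3f} Gbit/s")
```

Roughly 83% of the raw constellation capacity is retained as net spectral efficiency, corresponding to an aggregate rate of about 2.9 Gbit/s.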

The same research group also demonstrated Massive MIMO operation in an outdoor scenario with moderate mobility [4]. Figure 21 shows the measurement scenario where the 100-antenna LuMaMi testbed is placed on the rooftop of a building facing a parking lot. Ten single-antenna users are served in real time at 3.7 GHz, including six users moving at pedestrian speed and four terminals on moving vehicles. The spatial multiplexing was fully achieved and the communication quality was on average well maintained for all terminals [68]. Sporadic interruptions could be traced back to temporary loss of synchronization. It should be noted that both the speed of the cars and the number of terminals could be larger in a real deployment. In the proof of concept they were limited by the available test space and equipment. In fact, at 3.7 GHz carrier frequency and with a slot length of 0.5 ms, the maximum permitted mobility (assuming a two-ray model with Nyquist sampling, and a factor-of-two design margin, as in [3]) is over 140 km/h [69].
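The mobility bound quoted above can be reproduced with a short calculation. The Python sketch below assumes that, under the two-ray model with Nyquist sampling, the coherence time equals the wavelength divided by twice the terminal speed, and that the factor-of-two design margin means the slot may last at most half of that coherence time. This reading of the model, and the function name `max_speed_kmh`, are our assumptions.

```python
C = 299_792_458.0  # speed of light in m/s


def max_speed_kmh(carrier_hz, slot_s, margin=2.0):
    """Largest terminal speed for which one slot fits within the coherence time.

    Assumed model: T_c = wavelength / (2 v), and slot_s <= T_c / margin,
    hence v <= wavelength / (2 * margin * slot_s).
    """
    wavelength = C / carrier_hz
    v_mps = wavelength / (2.0 * margin * slot_s)
    return v_mps * 3.6  # m/s to km/h


v_max = max_speed_kmh(3.7e9, 0.5e-3)
print(f"maximum speed: {v_max:.0f} km/h")
```

Under these assumptions the bound evaluates to roughly 146 km/h, consistent with the statement that the permitted mobility exceeds 140 km/h.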

Further Investigation Needed for Synchronization

A critical challenge requiring further investigation is the initial synchronization between the base station and the user terminals. This initial synchronization has to start without any knowledge of the channels, and therefore cannot benefit from an array gain. How to efficiently perform initial time and frequency synchronization without the massive array gain, and how to exploit the (partial) array gain to provide faster and more robust synchronization, are still open questions. Two methods were studied during the LuMaMi testbed experiments. One method is to reserve a dedicated RF chain for the synchronization signal, which is transmitted using an omni-directional antenna. In this case, a higher-power PA (which is not available in LuMaMi) is needed to provide coverage. Another method is to use beam-sweeping for the synchronization signal [70], but this is inefficient, as it is essentially equivalent to repetition coding, and there is a risk of synchronization loss when the users are not hit by a beam. Improved techniques, based on space-time block codes, have been investigated [71, 72, 73]. Iterative search and tracking methods [74] may have potential, especially for mobile users.

Fig. 21: Overview of the testbed demonstration of Massive MIMO in a mobility scenario, at the campus of Lund University, Sweden.

VIII-B Concluding on the Signal Processing

Appropriate co-design of algorithms, hardware architectures, and circuits in Massive MIMO implementations brings significant benefits:

  • Energy efficient implementations of “theoretically optimal” Massive MIMO DSP architectures are nontrivial but possible. We have detailed some of the most important innovations required, and explained their analysis. The power consumption of conventional macro base stations is dominated by the PA stage. They benefit in Massive MIMO from the ability to operate on an order of magnitude less transmit power.

  • The sufficiency of low-precision quantization and processing, predicted by information-theoretic studies, has now also been validated through real signal processing experiments. A reduction in word-length of up to 6 times compared to conventional systems translates into corresponding savings in complexity, power consumption and memory.

  • Dedicated and scalable hardware architectures implementing tailored algorithms for large matrix processing facilitate zero-forcing precoding at the base station in real time, at 30 mW power consumption in relevant scenarios for a 128-antenna, 8-terminal system.

  • Voltage over-scaling, a speculative concept just 5 years ago, has found appropriate application in the Massive MIMO per-antenna processing.

  • Smart control of algorithmic modes and scalable devices, including body bias adaptation, can guarantee suitable performance-power trade-offs over a wide range of communication scenarios and channel propagation conditions.

  • Lean terminals could operate in typical broadband cellular Massive MIMO networks at a fraction of the power consumption of equivalent conventional terminals, both in data transmission and reception.

  • The efficiency of Massive MIMO base stations can be further improved by relaxing the requirements of the RF and analogue hardware. However, caution is needed as (non-linear) distortion may under specific conditions combine coherently.

VIII-C Future Directions

Progressing Massive MIMO Deployment in Actual Networks

Integration of all components into deployment in actual networks represents a vast design and development effort that will include:

  • Overcoming challenges related to connection of the many antenna paths to the central processing units. This involves implementing high-speed interconnects and coping with potential coupling effects in the front-end modules.

  • Devising efficient schedulers for large numbers of users. Achieving the high spatial multiplexing gains offered by Massive MIMO fundamentally requires that many terminals are scheduled for service simultaneously. Tuning or re-design of higher-layer protocols could be beneficial to shape the traffic patterns, such that aggressive spatial multiplexing can be performed.

  • Designing antenna arrays. Massive MIMO arrays do not have to be linear, rectangular or cylindrical. Small antenna elements could be naturally integrated into the environment, onto the surface of existing structures, or faces of buildings, for example, in an aesthetically pleasing manner.

    Insights from electromagnetics may guide the design of new types of arrays. Specifically, for a given volume $V$, consider the smallest possible sphere $S$ that contains $V$. If one covers the surface of $S$ with antennas at a sufficiently high density, then there is no point in installing any additional elements in the interior of $S$ [75]. Sampling the surface on a grid of that density captures all information in the radiated field. In conclusion, what goes into the interior of $S$ is unimportant; only the surface matters.

Industrial recognition of the value of Massive MIMO technology is evidenced by the large number of contributions on the topic in the 3GPP-LTE standardization of New Radio (NR) for 5G systems. Leading operators have already started to perform commercial field trials of the technology [69].

Enhanced Functionalities

Large antenna arrays can also be used to perform accurate positioning and localization. This feature can offer improved context-awareness to services. Also the Massive MIMO communication system itself could exploit this information to perform smart pilot allocation, for example.

Scale Up Capacity and Efficiency

The call for more and higher-quality wireless services is expected to increase for many years, and the quest for wireless systems offering higher spectral and energy efficiency will continue. Higher peak-rates can be offered in Massive MIMO by performing spatial multiplexing of several streams to one terminal. Actual gains may be limited due to insufficient rank of the channel, yet for two streams this will mostly be achievable with co-located antennas exploiting cross-polarization.

Wider bandwidth channels can be allocated especially in mmWave bands. Radio propagation and in particular absorption is considerably different at these frequencies. Arrays with a large number of antennas can be small in size, yet their effective gain may suffer from high losses on the interconnect. Consequently, Massive MIMO systems in these bands call for other architectures and their deployment will best suit particular use cases, for example hotspots.

With larger antenna arrays, both better spatial multiplexing and array gains can be achieved. The new concepts of cell-free Massive MIMO [76] and intelligent surfaces [77] take this trend to the next level. With cell-free Massive MIMO, coherently cooperating antennas are spread out over a larger geographical area, providing improved macro-diversity and improved channel rank for multiple-antenna terminals. The intelligent surface concept envisages distributed nodes that form electromagnetically active walls, floors, and planar objects. New research is urgently needed to bring these new concepts to their full potential.

IX Acknowledgment

The authors thank their colleagues and collaborators in the European FP7-MAMMOET project for the nice cooperation that truly progressed Massive MIMO technology.

Liesbet Van der Perre is Professor at the Department of Electrical Engineering at the KU Leuven in Leuven, Belgium and a guest Professor at the Electrical and Information Technology Department at Lund University, Sweden. Dr. Van der Perre was with the nano-electronics research institute imec in Belgium from 1997 till 2015, where she took up responsibilities as senior researcher, system architect, project leader and program director. She was appointed honorary doctor at Lund University, Sweden, in 2015. She was a part-time Professor at the University of Antwerp, Belgium, from 1998 till 2002. She received her Ph.D. degree from the KU Leuven, Belgium, in 1997.

Her main research interests are in wireless communication, with a focus on physical layer and energy efficiency in transmission, implementation, and operation. Prof. L. Van der Perre was the scientific leader of FP7-MAMMOET, Europe’s prime project on Massive MIMO technology. Dr. Van der Perre has been serving as a scientific and technological advisor, reviewer and jury member for companies, institutes, and funding agencies. She is a member of the Board of Directors of the company Zenitel since 2015. Liesbet Van der Perre is an author and co-author of over 300 scientific publications. She was a system architect for the OFDM ASICs listed in the IEEE International Solid State Circuit Conference (ISSCC’s) Best of 50 Years papers in 2003. She co-authored the paper winning the DAC/ISSCC 2006 design contest.

Liang Liu is an Associate Professor at Electrical and Information Technology Department, Lund University, Sweden. He received his Ph.D. in 2010 from Fudan University in China. In 2010, he was with Electrical, Computer and Systems Engineering Department, Rensselaer Polytechnic Institute, USA as a visiting researcher. He joined Lund University as a Post-doc researcher in 2010 and is now associate professor there. His research interest includes signal processing for wireless communication and digital integrated circuits design. Liang is active in several EU and Swedish national projects, including FP7 MAMMOET, VINNOVA SoS, and SSF HiPEC, DARE. He is a board member of the IEEE Swedish Solid-State Circuits/Circuits and Systems chapter. He is also a member of the technical committees of VLSI systems and applications and CAS for communications of the IEEE Circuit and Systems society.

Erik G. Larsson received the Ph.D. degree from Uppsala University, Uppsala, Sweden, in 2002.

He is currently Professor of Communication Systems at Linköping University (LiU) in Linköping, Sweden. He was with the KTH Royal Institute of Technology in Stockholm, Sweden, the George Washington University, USA, the University of Florida, USA, and Ericsson Research, Sweden. In 2015 he was a Visiting Fellow at Princeton University, USA, for four months. His main professional interests are within the areas of wireless communications and signal processing. He has co-authored some 150 journal papers on these topics, and he is co-author of the two Cambridge University Press textbooks Space-Time Block Coding for Wireless Communications (2003) and Fundamentals of Massive MIMO (2016). He is co-inventor on 18 issued and many pending patents on wireless technology.

He is a member of the IEEE Signal Processing Society Awards Board during 2017–2019. He is an editorial board member of the IEEE Signal Processing Magazine during 2018–2020. From 2015 to 2016 he served as chair of the IEEE Signal Processing Society SPCOM technical committee. From 2014 to 2015 he was chair of the steering committee for the IEEE Wireless Communications Letters. He was the General Chair of the Asilomar Conference on Signals, Systems and Computers in 2015, and its Technical Chair in 2012. He was Associate Editor for, among others, the IEEE Transactions on Communications (2010-2014) and the IEEE Transactions on Signal Processing (2006-2010).

He received the IEEE Signal Processing Magazine Best Column Award twice, in 2012 and 2014, the IEEE ComSoc Stephen O. Rice Prize in Communications Theory in 2015, the IEEE ComSoc Leonard G. Abraham Prize in 2017, and the IEEE ComSoc Best Tutorial Paper Award in 2018. He is a Fellow of the IEEE.

Footnotes

  1. Power control does not impact reciprocity, and it will show up as a scalar multiplication on the individual terminal signals.
  2. A similar reduction in hardware complexity could be achieved for UE radios custom-designed to operate in Massive MIMO networks specifically. Backward compatibility with previous broadband systems may require the presence of MIMO detection hardware in broadband UEs in practice.
  3. The module used to perform the measurements has a current limited to 45 mA.

References

  1. “First 5G NR specs approved, Dec. 2017,” http://www.3gpp.org/news-events/3gpp-news/1929-nsa_nr_5g.
  2. E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Comm. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.
  3. T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo, Fundamentals of Massive MIMO.   Cambridge University Press, 2016.
  4. P. Harris, S. Malkowsky, J. Vieira, E. Bengtsson, F. Tufvesson, W. Hasan, L. Liu, M. Beach, S. Armour, and O. Edfors, “Performance characterization of a real-time massive MIMO system with LOS mobile channels,” IEEE Journal on Sel. Areas in Comm., vol. 35, pp. 1244–1253, Jun. 2017.
  5. X. Yang, W. Lu, N. Wang, K. Nieman, C. K. Wen, C. Zhang, S. Jin, X. Mu, I. Wong, Y. Huang, and X. You, “Design and implementation of a TDD-based 128-antenna massive MIMO prototype system,” China Communications, vol. 14, no. 12, pp. 162–187, Dec. 2017.
  6. A. F. Molisch, V. V. Ratnam, S. Han, Z. Li, S. L. H. Nguyen, L. Li, and K. Haneda, “Hybrid beamforming for massive MIMO: A survey,” IEEE Comm. Mag., vol. 55, no. 9, pp. 134–141, 2017.
  7. E. Björnson, J. Hoydis, L. Sanguinetti et al., “Massive MIMO networks: Spectral, energy, and hardware efficiency,” Foundations and Trends® in Signal Processing, vol. 11, pp. 154–655, 2017.
  8. X. Li, E. Björnson, E. G. Larsson, S. Zhou, and J. Wang, “Massive MIMO with multi-cell MMSE processing: exploiting all pilots for interference suppression,” EURASIP Journal on Wireless Comm. and Netw., vol. 2017, no. 1, p. 117, 2017.
  9. S. Malkowsky, J. Vieira, K. Nieman, N. Kundargi, I. Wong, V. Owall, O. Edfors, F. Tufvesson, and L. Liu, “Implementation of low-latency signal processing and data shuffling for TDD massive MIMO systems,” in Proc. of IEEE International Workshop on Signal Processing Systems (SiPS), pp. 260–265, Dec. 2016.
  10. C. Shepard, H. Yu, N. Anand, L. E. Li, T. L. Marzetta, R. Yang, and L. Zhong, “Argos: Practical many-antenna base stations,” in Proc. of ACM Int. Conf. on Mobile Computing and Networking (MobiCom), Istanbul, Turkey, Aug. 2012.
  11. A. Puglielli, A. Townley, G. LaCaille, V. Milovanovic, P. Lu, K. Trotskovsky, A. Whitcombe, N. Narevsky, G. Wright, T. A. Courtade, E. Alon, B. Nikolic, and A. M. Niknejad, “Design of energy- and cost-efficient massive MIMO arrays,” Proceedings of the IEEE, vol. 104, no. 3, pp. 586–606, 2016.
  12. K. Li, R. R. Sharan, Y. Chen, T. Goldstein, J. R. Cavallaro, and C. Studer, “Decentralized baseband processing for Massive MU-MIMO systems,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 7, no. 4, pp. 491–507, Dec. 2017.
  13. Q. Yang, X. Li, H. Yao, J. Fang, K. Tan, W. Hu, J. Zhang, and Y. Zhang, “Bigstation: enabling scalable real-time signal processing in large MU-MIMO systems,” in Proc. of ACM SIGCOMM, pp. 399–410, Aug. 2013.
  14. E. Bertilsson, O. Gustafsson, and E. G. Larsson, “A scalable architecture for massive MIMO base stations using distributed processing,” in Proc. of Asilomar Conference on Signals, Systems and Computers, pp. 864–868, Nov. 2016.
  15. G. Auer, V. Giannini, C. Desset, I. Godor, P. Skillermark, M. Olsson, M. A. Imran, D. Sabella, M. J. Gonzalez, O. Blume, and A. Fehske, “How much energy is needed to run a wireless network?” IEEE Wireless Comm., vol. 18, no. 5, pp. 40–49, Oct. 2011.
  16. C. Mollén, U. Gustavsson, T. Eriksson, and E. G. Larsson, “Impact of spatial filtering on distortion from low-noise amplifiers in massive MIMO base stations,” IEEE Trans. Comm., 2018. To appear.
  17. Y. Li, J. Lopez, P. H. Wu, W. Hu, R. Wu, and D. Y. C. Lie, “A SiGe envelope-tracking Power Amplifier with an integrated CMOS envelope modulator for mobile WiMAX/3GPP LTE transmitters,” IEEE Trans. on Microwave Theory and Techniques, vol. 59, no. 10, pp. 2525–2536, Oct. 2011.
  18. F. Horlin and A. Bourdoux, Digital Compensation for Analog Front-Ends: A New Approach to Wireless Transceiver Design.   Wiley, 2008.
  19. E. G. Larsson and L. Van der Perre, “Out-of-band radiation from antenna arrays clarified,” IEEE Wireless Comm. Lett., 2018. To appear.
  20. C. Mollén, U. Gustavsson, T. Eriksson, and E. G. Larsson, “Spatial characteristics of distortion radiated from antenna arrays with transceiver nonlinearities,” CoRR, vol. abs/1711.02439, 2017. [Online]. Available: http://arxiv.org/abs/1711.02439
  21. M. Pelgrom, Analog-to-Digital Conversion.   Springer, 2017.
  22. C. Mollén, J. Choi, E. G. Larsson, and R. W. Heath Jr., “Uplink performance of wideband massive MIMO with one-bit ADCs,” IEEE Trans. Wireless Comm., vol. 16, pp. 87–100, Jan. 2017.
  23. C. Mollén, J. Choi, E. G. Larsson, and R. W. Heath Jr., “Achievable uplink rates for massive MIMO with coarse quantization,” in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar. 2017.
  24. Y. Li, C. Tao, G. Seco-Granados, A. Mezghani, A. L. Swindlehurst, and L. Liu, “Channel estimation and performance analysis of one-bit massive MIMO systems,” IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4075–4089, Aug. 2017.
  25. L. Fan, S. Jin, C.-K. Wen, and H. Zhang, “Uplink achievable rate for massive MIMO with low-resolution ADC,” IEEE Commun. Lett., vol. 19, no. 12, pp. 2186–2189, Dec. 2015.
  26. J. Zhang, L. Dai, S. Sun, and Z. Wang, “On the spectral efficiency of massive MIMO systems with low-resolution ADCs,” IEEE Commun. Lett., vol. 20, no. 5, pp. 842–845, May 2016.
  27. G. Van der Plas and B. Verbruggen, “A 150 MS/s 133 µW 7 bit ADC in 90 nm Digital CMOS,” IEEE Journal of Solid-State Circuits, vol. 43, no. 12, pp. 2631–2640, Dec. 2008.
  28. K. Choo, J. Bell, and M. Flynn, “Area-efficient 1GS/s 6b SAR ADC with charge-injection-cell-based DAC,” in Proc. of IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2016.
  29. Y. Li, C. Tao, A. L. Swindlehurst, A. Mezghani, and L. Liu, “Downlink achievable rate analysis in massive MIMO systems with one-bit DACs,” IEEE Commun. Lett., vol. 21, no. 7, pp. 1669–1672, Jul. 2017.
  30. S. Jacobsson, G. Durisi, M. Coldrey, T. Goldstein, and C. Studer, “Quantized precoding for massive MU-MIMO,” IEEE Trans. Comm., vol. 65, no. 11, pp. 4670–4684, Nov. 2017.
  31. C. Desset and L. Van der Perre, “Validation of low-accuracy quantization in massive MIMO and constellation EVM analysis,” in Proc. of European Conference on Networks and Communications (EuCNC), Jun. 2015.
  32. S. Jacobsson, G. Durisi, M. Coldrey, and C. Studer, “On out-of-band emissions of quantized precoding in massive MU-MIMO-OFDM,” in Proc. of Asilomar Conference on Signals, Systems, and Computers, Oct. 2017.
  33. MAMMOET project deliverable D2.4. [Online]. Available: https://mammoet-project.eu/downloads/publications/deliverables/MAMMOET-D2.4-Analysis-non-reciprocity-impact-PU-M20.pdf
  34. A. Bourdoux, B. Come, and N. Khaled, “Non-reciprocal transceivers in OFDM/SDMA systems: impact and mitigation,” in Proc. of Radio and Wireless Conference (RAWCON), pp. 183–186, Aug. 2003.
  35. J. Vieira, F. Rusek, O. Edfors, S. Malkowsky, L. Liu, and F. Tufvesson, “Reciprocity calibration for massive MIMO: Proposal, modeling and validation,” IEEE Trans. on Wireless Comm., vol. 16, no. 5, pp. 3042–3056, Mar. 2017.
  36. D. Zhu, B. Li, and P. Liang, “On the matrix inversion approximation based on Neumann series in massive MIMO systems,” in Proc. of IEEE International Conference on Communications (ICC), pp. 1763–1769, Jun. 2015.
  37. A. Mueller, A. Kammoun, E. Björnson, and M. Debbah, “Linear precoding based on polynomial expansion: Reducing complexity in massive MIMO,” EURASIP Journal on Wireless Comm. and Netw., Dec. 2016.
  38. H. Prabhu, O. Edfors, J. Rodrigues, L. Liu, and F. Rusek, “Hardware efficient approximative matrix inversion for linear pre-coding in massive MIMO,” in Proc. of IEEE International Symposium on Circuits and Systems (ISCAS), pp. 260–265, Jun. 2014.
  39. M. Wu, B. Yin, G. Wang, C. Dick, J. Cavallaro, and C. Studer, “Large-scale MIMO detection for 3GPP LTE: Algorithms and FPGA implementations,” IEEE Journal of Sel. Topics in Sign. Proc., vol. 8, no. 5, pp. 916–929, Oct. 2014.
  40. K. Lee and C. Chen, “An eigen-based approach for enhancing matrix inversion approximation in massive MIMO systems,” IEEE Trans. Veh. Techn., vol. 66, no. 6, pp. 5483–5487, Jun. 2017.
  41. B. Nagy, M. Elsabrouty, and S. Elramly, “Fast converging weighted Neumann series precoding for massive MIMO systems,” IEEE Wireless Comm. Lett., Oct. 2017.
  42. B. Yin, M. Wu, J. R. Cavallaro, and C. Studer, “Conjugate gradient-based soft-output detection and precoding in massive MIMO systems,” in Proc. of IEEE Global Communications Conference (GLOBECOM), pp. 3696–3701, Dec. 2014.
  43. M. Wu, C. Dick, J. Cavallaro, and C. Studer, “High-throughput data detection for massive MU-MIMO-OFDM using coordinate descent,” IEEE Trans. Circ. and Syst. I: Regular Papers, vol. 63, no. 12, pp. 2357–2367, Dec. 2016.
  44. X. Gao, L. Dai, J. Zhang, S. Han, and I. Chih-Lin, “Capacity-approaching linear precoding with low-complexity for large-scale MIMO systems,” in Proc. of IEEE International Conference on Communications (ICC), pp. 1577–1582, Jun. 2015.
  45. H. Prabhu, J. Rodrigues, L. Liu, and O. Edfors, “A 60 pJ/b 300 Mb/s 128 x 8 massive MIMO precoder-detector in 28 nm FD-SOI,” in Proc. of IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2017.
  46. H. A. J. Alshamary and W. Xu, “Efficient optimal joint channel estimation and data detection for massive MIMO systems,” in Proc. of IEEE International Symposium on Information Theory (ISIT), pp. 875–879, Jul. 2016.
  47. R. Gangarajaiah, H. Prabhu, O. Edfors, and L. Liu, “A Cholesky decomposition based massive MIMO uplink detector with adaptive interpolation,” in Proc. of IEEE International Symposium on Circuits and Systems (ISCAS), May 2017.
  48. M. Cirkic and E. Larsson, “On the complexity of very large multi-user MIMO detection,” in Proc. of IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 55–59, Jun. 2014.
  49. W. Tang, H. Prabhu, L. Liu, V. Öwall, and Z. Zhang, “A 1.8Gb/s 70.6pJ/b 128×16 link-adaptive near-optimal massive MIMO detector in 28nm UTBB-FDSOI,” in Proc. of IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2018.
  50. N. Planes, O. Weber, V. Barral, S. Haendler, D. Noblet, D. Croain, M. Bocat, P. Sassoulas, X. Federspiel, A. Cros, and A. Bajolet, “28nm FDSOI technology platform for high-speed low-voltage digital applications,” in Proc. of IEEE Symposium on VLSI Technology (VLSIT), pp. 133–134, Jun. 2012.
  51. Y. Huang, C. Desset, A. Bourdoux, W. Dehaene, and L. Van der Perre, “Massive MIMO processing at the semiconductor edge: Exploiting the system and circuit margins for power savings,” in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3474–3478, Mar. 2017.
  52. S. Gunnarsson, M. Bortas, Y. Huang, C.-M. Chen, L. Van der Perre, and O. Edfors, “Lousy processing increases energy efficiency in massive MIMO systems,” in Proc. of European Conference on Networks and Communications (EuCNC), Jun. 2017.
  53. Y. Huang, M. Li, C. Li, P. Debacker, and L. Van der Perre, “Computation-skip error mitigation scheme for power supply voltage scaling in recursive applications,” Journal of Signal Processing Systems, vol. 84, no. 3, pp. 413–424, Sep. 2016.
  54. J. Han and M. Orshansky, “Approximate computing: An emerging paradigm for energy-efficient design,” in Proc. of European Test Symposium (ETS), May 2013.
  55. S. K. Saha, “Modeling process variability in scaled CMOS technology,” IEEE Design & Test of Computers, vol. 27, no. 2, pp. 8–16, Mar. 2010.
  56. M. Fojtik, D. Fick, Y. Kim, N. Pinckney, D. Harris, D. Blaauw, and D. Sylvester, “Bubble Razor: An architecture-independent approach to timing-error detection and correction,” in Proc. of IEEE International Solid-State Circuits Conference (ISSCC), pp. 488–490, Feb. 2012.
  57. Y. Liu, T. Zhang, and K. K. Parhi, “Computation error analysis in digital signal processing systems with overscaled supply voltage,” IEEE Trans. on Very Large Scale Integr. Syst., vol. 18, pp. 517–526, Apr. 2010.
  58. ITU vision on 5G usage scenarios. [Online]. Available: https://www.itu.int/dms_pubrec/itu-r/rec/m/R-REC-M.2083-0-201509-I!!PDF-E.pdf
  59. P. Vandenameele, L. Van der Perre, M. G. E. Engels, B. Gyselinckx, and H. J. De Man, “A combined OFDM/SDMA approach,” IEEE Journal on Sel. Areas in Comm., vol. 18, no. 11, pp. 2312–2321, Nov. 2000.
  60. S. Yoshizawa and Y. Miyanaga, “VLSI implementation of a 4x4 MIMO-OFDM transceiver with an 80-MHz channel bandwidth,” in Proc. of IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1743–1746, May 2009.
  61. J. Ketonen, M. Juntti, and J. R. Cavallaro, “Performance-complexity comparison of receivers for a LTE MIMO-OFDM system,” IEEE Trans. on Sign. Proc., vol. 58, no. 6, pp. 3360–3372, Jun. 2010.
  62. Connecting sensors with low power wireless technologies. [Online]. Available: https://dramco.be/tutorials/low-power-lora/
  63. S. Malkowsky, J. Vieira, L. Liu, P. Harris, K. Nieman, N. Kundargi, I. Wong, F. Tufvesson, V. Owall, and O. Edfors, “The world’s first real-time testbed for massive MIMO: Design, implementation, and validation,” IEEE Access, vol. 5, pp. 9073–9088, May 2017.
  64. “Openairinterface Massive MIMO testbed : A 5G innovation platform,” http://www.openairinterface.org/.
  65. “Introducing facebook’s new terrestrial connectivity systems – terragraph and project ARIES,” https://code.facebook.com/posts/1072680049445290/introducing-facebook-s-new-terrestrial-connectivity-systems-terragraph-and-project-aries/.
  66. C. M. Chen, V. Volskiy, A. Chiumento, L. Van der Perre, G. A. E. Vandenbosch, and S. Pollin, “Exploration of user separation capabilities by distributed large antenna arrays,” in Proc. of IEEE Global Communications Conference (GLOBECOM) Workshops, Dec. 2016.
  67. P. Harris, W. Hasan, S. Malkowsky, J. Vieira, S. Zhang, M. Beach, L. Liu, E. Mellios, A. Nix, S. Armour, and A. Doufexi, “Serving 22 users in real-time with a 128-antenna massive MIMO testbed,” in Proc. of IEEE International Workshop on Signal Processing Systems (SiPS), pp. 266–272, Oct. 2016.
  68. MAMMOET project deliverable D4.2. [Online]. Available: https://mammoet-project.eu/downloads/publications/deliverables/MAMMOET-D4.2-Testbed-assessment-PU-M33.pdf
  69. “Massive MIMO in mobile environments,” in the Massive MIMO blog, http://massive-mimo.net.
  70. C. Barati, S. Hosseini, S. Rangan, P. Liu, T. Korakis, S. Panwar, and T. Rappaport, “Directional cell discovery in millimeter wave cellular networks,” IEEE Trans. on Wireless Comm., vol. 14, no. 12, pp. 6664–6678, Dec. 2015.
  71. M. Karlsson, E. Björnson, and E. G. Larsson, “Performance of in-band transmission of system information in massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 17, pp. 1700–1712, Mar. 2018.
  72. X. G. Xia and X. Gao, “A space-time code design for omnidirectional transmission in massive MIMO systems,” IEEE Wireless Comm. Lett., vol. 5, no. 5, pp. 512–515, Oct. 2016.
  73. X. Meng, X. Gao, and X. G. Xia, “Omnidirectional precoding based transmission in massive MIMO systems,” IEEE Trans. on Comm., vol. 64, no. 1, pp. 174–186, Jan. 2016.
  74. M. Giordani, M. Mezzavilla, and M. Zorzi, “Initial access in 5G mmwave cellular networks,” IEEE Comm. Mag., vol. 54, no. 11, pp. 40–47, Nov. 2016.
  75. M. Franceschetti, “On Landau’s eigenvalue theorem and information cut-sets,” IEEE Trans. Inf. Theory, vol. 61, no. 9, pp. 5042–5051, Sep. 2015.
  76. H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO versus small cells,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, Mar. 2017.
  77. S. Hu, F. Rusek, and O. Edfors, “Beyond massive MIMO: The potential of data transmission with large intelligent surfaces,” IEEE Trans. on Sign. Proc., vol. 66, no. 10, pp. 2746–2758, May 2018.