Efficient DSP and Circuit Architectures for Massive MIMO: State-of-the-Art and Future Directions
Abstract
Massive MIMO is a compelling wireless access concept that relies on the use of an excess number of base-station antennas relative to the number of active terminals. This technology is a main component of 5G New Radio (NR) and addresses all important requirements of future wireless standards: a great capacity increase, the support of many simultaneous users, and improvement in energy efficiency.
Massive MIMO requires the simultaneous processing of signals from many antenna chains, and computational operations on large matrices. In the past, the complexity of the digital processing has been viewed as a fundamental obstacle to the feasibility of Massive MIMO. Recent advances in system-algorithm-hardware co-design have led to extremely energy-efficient implementations. These exploit opportunities in deeply-scaled silicon technologies and perform partly distributed processing to cope with the bottlenecks encountered in the interconnection of many signals. For example, prototype ASIC implementations have demonstrated zero-forcing precoding in real time at a power consumption of 55 mW (20 MHz bandwidth, 128 antennas, multiplexing of 8 terminals). Coarse and even error-prone digital processing in the antenna paths permits a reduction of consumption by a factor of 2 to 5. This article summarizes the fundamental technical contributions to efficient digital signal processing for Massive MIMO. The opportunities and constraints of operating on low-complexity RF and analog hardware chains are clarified. The article also illustrates how terminals can benefit from improved energy efficiency. The status of the technology and real-life prototypes is discussed. Open challenges and directions for future research are suggested.
I Introduction
Massive MIMO is an efficient sub-6 GHz physical-layer technology for wireless access, and a key component of the 5G New Radio (NR) interface [1]. The main concept is to use large antenna arrays at base stations to simultaneously serve many autonomous terminals, as illustrated in Figure 1 [2, 3]. Smart processing at the array exploits differences among the propagation signatures of the terminals to perform spatial multiplexing. Massive MIMO offers two main benefits:

Excellent spectral efficiency, achieved by spatial multiplexing of many terminals in the same time-frequency resource [4, 5]. Efficient multiplexing requires channels to different terminals to be sufficiently distinct. Theory as well as experiments have demonstrated that this can be achieved both in line-of-sight and in rich scattering.

Superior energy efficiency, by virtue of the array gain, which permits a reduction of radiated power. Moreover, the ability to achieve excellent performance while operating with low-accuracy signals and linear signal processing further enables considerable savings in the power required for signal processing.
This overview paper focuses on sub-6 GHz Massive MIMO systems implemented with fully digital per-antenna signal processing. Massive MIMO at mmWave frequencies is also possible, and can benefit from the large bandwidth available at these frequencies. Propagation and hardware implementation aspects are different at mmWaves; for example, hybrid analog-digital beamforming approaches are typically considered [6]. However, this is not discussed further here.
The complexity of the signal processing has been considered a potential obstacle to actual deployment of Massive MIMO technology. An obvious concern is how operations on large matrices and the interconnection of the many antenna signals can be efficiently performed in real time. Moreover, real-life experiments have shown that the channel responses to different terminals can be highly correlated in some propagation environments. Appropriate digital signal processing hence needs to feature interference suppression capabilities, which further increases complexity.
This paper discusses the digital signal processing required to realize the Massive MIMO system concept, and examines in detail the co-design of algorithms, hardware architecture, and circuits (Figure 2). Unconventional, low-complexity digital circuitry implementations in deeply scaled silicon are possible, despite (and thanks to) the excess number of antenna signals. A careful choice of algorithmic and circuit parameters permits considerable reduction of the average energy consumption. Terminals in turn can be implemented at low complexity while benefiting from the channel hardening effect, which offers increased reliability.
Proof-of-concept implementations and demonstrations have revealed constraints that turned out to be harsher than anticipated in initial theoretical assessments. This concerns the interconnection of the signals from all antennas, which poses a bottleneck that partly necessitates distributed processing. Also, relaxing the specifications of the analog and RF chains can result in higher distortion, both in-band and out-of-band, than initially anticipated, as hardware imperfections can in general not be considered uncorrelated.
The rest of the paper is organized as follows. First, basic concepts and notation are introduced. Next, we provide a complexity analysis considering computation as well as data transfer. The following section zooms in on the RF and front-end, highlighting the opportunities and constraints of relaxing their specifications in the large-number-of-antennas regime. Subsequently, the central detector and precoder blocks are detailed, and major complexity reductions facilitated by algorithm-hardware co-design are demonstrated. Signal processing leveraging error-resilient circuits in the per-antenna functionality is discussed next, and the consequent energy savings are illustrated. Further, we discuss the increased reliability that can be delivered on low-complexity terminals. Finally, in the conclusions we discuss validation performed in real-life test beds, summarize opportunities and constraints in efficient processing for Massive MIMO systems, and suggest future research directions.
II Massive MIMO System Model
This section introduces the notation for MIMO transmission that is used in the paper. Further details can be found in, for example, [3]. We consider the block-fading model where the time-frequency domain is partitioned into coherence intervals within which the channel is static. The number of samples in each coherence interval is equal to the coherence time in seconds multiplied by the coherence bandwidth in Hertz. For the signal processing algorithms discussed in this paper, it does not matter whether there is coding across coherence intervals or not.
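In every coherence interval, a flat-fading complex baseband channel model applies. Let $M$ be the number of antennas at the base station, and $K$ the number of terminals served simultaneously. Also, denote by $\mathbf{g}_k$ the $M$-vector of channel responses between the $k$th terminal and the array. Then on uplink, for every sample in the coherence interval,
$$\mathbf{y} = \sum_{k=1}^{K} \mathbf{g}_k x_k + \mathbf{w}, \tag{1}$$
where $\mathbf{y}$ is an $M$-vector comprising samples received at the base station array, $x_k$ is the symbol sent by the $k$th terminal, and $\mathbf{w}$ is noise. On downlink, assuming linear precoding,
$$y_k = \mathbf{g}_k^T \sum_{k'=1}^{K} \mathbf{a}_{k'} x_{k'} + w_k, \tag{2}$$
where $y_k$ is the sample received by the $k$th terminal, $\mathbf{a}_k$ is a precoding vector associated with the $k$th terminal, $x_k$ is the symbol destined to the $k$th terminal, and $w_k$ is receiver noise.
The base station forms a channel estimate, $\hat{\mathbf{g}}_k$, of $\mathbf{g}_k$ for each terminal by measurements on uplink pilots. Channel estimation is discussed extensively in, for example, [3] (for independent Rayleigh fading) and [7] (for correlated fading models).
On uplink, the data streams from the terminals are detected through linear processing. This entails multiplication of $\mathbf{y}$ with a vector, $\mathbf{a}_k$, for each terminal, yielding the scalar $\mathbf{a}_k^H \mathbf{y}$. Common choices of the detection vector include
$$\mathbf{a}_k = \begin{cases} \alpha\, \hat{\mathbf{g}}_k & \text{maximum ratio (MR)} \\ \alpha\, [\hat{\mathbf{G}} (\hat{\mathbf{G}}^H \hat{\mathbf{G}})^{-1}]_k & \text{zero-forcing (ZF)} \\ \alpha\, [\hat{\mathbf{G}} (\hat{\mathbf{G}}^H \hat{\mathbf{G}} + \mathbf{I})^{-1}]_k & \text{MMSE} \end{cases} \tag{3}$$
where $\alpha$ is a normalizing constant (different for the three methods), $[\cdot]_k$ denotes the $k$th column, and $\hat{\mathbf{G}} = [\hat{\mathbf{g}}_1, \ldots, \hat{\mathbf{g}}_K]$. The result of this linear processing will comprise the desired signal, embedded in additive interference and noise.
On downlink, channel reciprocity is leveraged. Low-complexity front-ends typically introduce non-reciprocity, and this non-reciprocity needs to be compensated for; see Section IV. The base station forms the transmitted vector $\sum_{k'} \mathbf{a}_{k'} x_{k'}$ in (2), where the precoding vector is given by:
$$\mathbf{a}_k = \begin{cases} \alpha_k\, \hat{\mathbf{g}}_k^* & \text{maximum ratio (MR)} \\ \alpha_k\, [\hat{\mathbf{G}}^* (\hat{\mathbf{G}}^T \hat{\mathbf{G}}^*)^{-1}]_k & \text{zero-forcing (ZF)} \\ \alpha_k\, [\hat{\mathbf{G}}^* (\hat{\mathbf{G}}^T \hat{\mathbf{G}}^* + \eta \mathbf{I})^{-1}]_k & \text{regularized zero-forcing (RZF)} \end{cases} \tag{4}$$
where, again, $\alpha_k$ are normalizing constants and $\eta$ is a regularization parameter. The signal received at the terminal will contain the symbol of interest, plus additive interference and noise.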
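As a rough illustration of the linear detectors described in this section, the following NumPy sketch builds the maximum-ratio, zero-forcing, and MMSE detection matrices and applies zero-forcing to a noiseless uplink sample. The array sizes, the BPSK symbols, the unit normalizing constants, perfect channel knowledge, and the function name `detection_vectors` are assumptions made for the example, not values from the paper.

```python
import numpy as np

def detection_vectors(G_hat, method="zf"):
    """Return an M x K matrix whose k-th column is the detection vector of terminal k.

    Normalizing constants are set to 1 (an assumption made for illustration).
    """
    K = G_hat.shape[1]
    gram = G_hat.conj().T @ G_hat               # K x K Gram matrix
    if method == "mr":
        return G_hat
    if method == "zf":
        return G_hat @ np.linalg.inv(gram)
    if method == "mmse":
        return G_hat @ np.linalg.inv(gram + np.eye(K))
    raise ValueError(f"unknown method: {method}")

rng = np.random.default_rng(0)
M, K = 64, 4                                    # illustrative sizes only
G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
x = rng.choice([-1.0, 1.0], size=K) + 0j        # one BPSK symbol per terminal
y = G @ x                                       # noiseless uplink sample vector

A_zf = detection_vectors(G, "zf")               # perfect channel knowledge assumed
x_zf = A_zf.conj().T @ y                        # a_k^H y for every terminal
print(np.round(x_zf.real, 6), x.real)
```

In this idealized setting zero-forcing removes the inter-terminal interference completely, whereas the maximum-ratio output would still contain residual interference that shrinks only as the number of antennas grows.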
III Signal Processing and Data Transfer Complexity Assessment
Both Massive MIMO base stations and terminals can be implemented with significantly better energy efficiency than in conventional systems. This is possible owing to a combination of effects. First, the array gain permits a reduction of the radiated power. Second, the large number of constituent signals promotes excellent performance while operating relatively simple algorithms on coarse signals.
In this section we focus on the processing at the base station side. The opportunity to reduce terminal-side complexity is discussed in Section VII. First, a high-level assessment of the signal processing requirements, in terms of number of computations, is presented. The data transfer and interconnection of signals pose a distinct bottleneck. Hence, a distributed processing approach is presented next to balance performance and complexity.
III-A Computational Complexity
We first analyze the computational complexity of a Massive MIMO base station. Figure 3 shows a high-level block diagram of the signal processing for an OFDM-based Massive MIMO system. Other modulation options can be used, and single-carrier schemes may be preferred. The overall partition of the processing presented here will still hold.
The processing in Massive MIMO systems is logically grouped into three categories:

The outer modem performing symbol (de)mapping, (de)interleaving and channel (de)coding. This processing, performed on the transmit/receive bits, applies to each User Equipment (UE) individually.

The inner modem comprising channel estimation, and detection and precoding of the uplink and downlink data, respectively. This central processing aggregates/distributes data from/to all the antenna chains.

The per-antenna processing, which primarily consists of the analog and digital front-end (mainly resampling and filtering) and OFDM processing.
We identify inherent parallelism and observe that the processing complexity scales with the number of BS antennas, $M$, the number of UEs, $K$, or both [9]:

Per-antenna processing: Scales with $M$, as each antenna requires OFDM (de)modulation and a digital/analog front-end.

Central processing: Scales with $M$ and $K$.

Per-user processing: Scales with $K$.
The number of digital signal processing operations performed in the subsystems provides a high-level estimate of complexity. Table I gives numbers for a sample system with antennas at the base station and simultaneous terminals. It is acknowledged that these estimates represent an oversimplification, as the nature and precision of the operations will be an important determining factor in the eventual hardware complexity and power consumption.
Subcomponent       Downlink data (DL) [GOPS]   Uplink data (UL) [GOPS]   Training [GOPS]
Inner modem        175                         520                       290
Outer modem        7                           40                        0
Per-antenna DSP    920                         920                       920
Table I demonstrates that the collective per-antenna digital processing is demanding, and requires a minimal-complexity implementation. Interestingly, the per-antenna processing does not need to be performed with high precision to offer very good performance. An in-depth analysis and efficient implementation options are presented in Section VI.
For the inner modem processing in Massive MIMO, a high degree of reconfigurability is desired in order to adapt to changing operating conditions, such as the number of connected UEs, and their SNRs/path losses. Section V discusses efficient algorithm-hardware co-design solutions for the Massive MIMO precoding and detection.
Furthermore, reciprocity calibration needs to be performed occasionally. Elegant solutions have been proposed and demonstrated; see Section IV.
Channel coding clearly is an essential component of the wireless transmission, yet it is not Massive MIMO-specific and is therefore not further treated in this paper.
III-B Signal Interconnection and Data Transfer Complexity
The transfer of data between processing components creates a significant challenge, as the number of signals and the amount of data to be aggregated/distributed from/to all the antennas are very high. The required data shuffling rate between the per-antenna processing and the central processing is [9]
$$R_{\text{shuffle}} = M \times F_s \times W, \tag{5}$$
where $F_s$ is the sampling rate after OFDM processing and $W$ is the wordlength of one data sample. For a 100-antenna 20 MHz bandwidth system, the sampling rate at each antenna is 30.72 MS/s and thus
$$F_s = 30.72\ \text{MS/s} \times \frac{N_{\text{data}}}{N_{\text{total}} + N_{\text{CP}}}, \tag{6}$$
where $N_{\text{data}}$, $N_{\text{total}}$, and $N_{\text{CP}}$ are the number of data subcarriers, the total number of subcarriers, and the number of cyclic prefix samples, respectively. Assuming that 24 bits are used for one complex sample, $R_{\text{shuffle}}$ equals 40.32 Gb/s. This requirement is an order of magnitude higher than in a conventional system.
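The arithmetic behind the quoted 40.32 figure can be checked with a few lines of Python. The sketch below assumes an LTE-like 20 MHz numerology (1200 data subcarriers, a 2048-point FFT, and 14 OFDM symbols per 1 ms subframe of 30720 samples); these numerology values and the function name `shuffling_rate_bps` are assumptions of the example, chosen because they reproduce the figure in the text.

```python
def shuffling_rate_bps(num_antennas, sample_rate_hz, wordlength_bits):
    """Aggregate rate between per-antenna and central processing, in bit/s."""
    return num_antennas * sample_rate_hz * wordlength_bits

# Assumed LTE-like 20 MHz numerology: a 1 ms subframe holds 30720 samples and
# 14 OFDM symbols, so the average cyclic prefix is 30720/14 - 2048 samples.
n_data, n_total = 1200, 2048
n_cp = 30720 / 14 - n_total
f_s = 30.72e6 * n_data / (n_total + n_cp)   # sample rate after OFDM processing

rate = shuffling_rate_bps(100, f_s, 24)     # 100 antennas, 24-bit complex samples
print(f"{f_s / 1e6:.2f} MS/s per antenna, {rate / 1e9:.2f} Gb/s in total")
```

Under these assumptions each antenna contributes 16.8 MS/s of useful samples after OFDM processing, and 100 antennas with 24-bit samples give the 40.32 Gb/s aggregate rate.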
Additionally, the data transfer network must reorganize data among different dimensions. Figure 4 illustrates the uplink data shuffling between the per-antenna and the central processing. First, the data shuffling network aggregates data samples of all subcarriers from all antenna chains. Next, it divides the entire data into bandwidth chunks depending on the number of central processing units in the system, and distributes the data to the corresponding processing unit.
These high data transfer requirements have motivated the development of decentralized processing architectures, which are introduced next.
III-C Decentralized Processing
Depending on the selected MIMO processing algorithms, both the processing performed in the per-antenna and in the central units, and the communication between these two, will influence the resulting system performance and overall complexity. For instance, the maximum-ratio precoding operation can be performed in each antenna path in a distributed manner, whereas the zero-forcing algorithm requires centralized processing, specifically for the inversion of the Gram matrix $\hat{\mathbf{G}}^H \hat{\mathbf{G}}$ of the channel estimate $\hat{\mathbf{G}}$.
Decentralized processing enables parallel computing and offers a balanced trade-off between system performance and data transfer requirements [10, 11, 12, 13]. The authors of [12] propose a decentralized architecture for both uplink and downlink, illustrated in Figure 5. Instead of aggregating the full channel state information and transmit/receive data vectors at the centralized processing node, the $M$ antenna nodes are grouped into $C$ equally sized groups, each serving $M/C$ antenna nodes. A middle-level processing node, labeled group processor, is introduced between the per-antenna and central processor to handle the corresponding data dedicated to the group of antenna nodes. As a result, a limited amount of data is then aggregated/distributed to/from the central processor, relaxing the requirements on the data transfer network. For instance, the Gram matrix calculation can be rewritten as
$$\hat{\mathbf{G}}^H \hat{\mathbf{G}} = \sum_{c=1}^{C} \hat{\mathbf{G}}_c^H \hat{\mathbf{G}}_c, \tag{7}$$
where $\hat{\mathbf{G}}_c$ is the local channel estimate for each group of antennas. The decentralized processing is performed such that the terms $\hat{\mathbf{G}}_c^H \hat{\mathbf{G}}_c$ are computed at each group processor locally, and the results are aggregated at the central processor for the final summation. The tree-like distributed processing architecture is further elaborated in [14], with special focus on modularity and scalability. In particular, the trade-off between data processing, storage, and shuffling is investigated for maximum-ratio transmission, zero-forcing, and MMSE algorithms.
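The group-wise Gram computation described above can be sketched in NumPy as follows. The sizes (128 antennas, 8 terminals, 4 groups), the random channel, and the variable names are assumptions for illustration; the point is only that summing the locally computed terms reproduces the centrally computed Gram matrix while far fewer values reach the central node.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, C = 128, 8, 4        # antennas, terminals, antenna groups (illustrative values)
G_hat = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)

# Centralized: the full M x K channel estimate is collected at one node.
gram_central = G_hat.conj().T @ G_hat

# Decentralized: each group processor sees only its M/C rows and forwards
# a single K x K partial Gram matrix to the central processor.
groups = np.split(G_hat, C, axis=0)
local_terms = [G_c.conj().T @ G_c for G_c in groups]
gram_tree = sum(local_terms)

# Complex values arriving at the central node per coherence interval.
values_central = M * K     # raw channel estimates
values_tree = C * K * K    # one K x K term per group
print(values_central, values_tree, np.allclose(gram_tree, gram_central))
```

Because each partial term is Hermitian, a real design could roughly halve the forwarded data again by sending only one triangle of each matrix.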
IV Analog and RF Processing: Relax with Caution!
In traditional base stations, the RF electronics and analog front-ends, and the power amplifiers specifically, consume most of the power [15]. In Massive MIMO, thanks to the array gain provided by the closed-loop beamforming, much less radiated power is needed for the data transmission. This facilitates a significant reduction of the RF complexity and power consumption compared to conventional systems.
The hardware in any wireless transceiver will introduce distortion, and the most important sources of distortion are nonlinearities in power amplifiers and quantization noise in A/D converters. A commonly used model in the literature has been that this distortion is additive and uncorrelated among the antennas [7]. If this were the case, then the effects of hardware imperfections would average out as the number of antennas is increased, in a similar way as the effects of thermal noise average out. In more detail, consider the linear processing in the uplink, $\mathbf{a}_k^H \mathbf{y}$; see Section II. The essence of the argument is that if the received signal at the array, $\mathbf{y}$, is affected by uncorrelated additive distortion noise $\mathbf{d}$, then the effective power of the useful signal after beamforming processing, $\mathbf{a}_k^H \mathbf{g}_k x_k$, would grow as $M$, whereas the power of the distortion, $\mathbf{a}_k^H \mathbf{d}$, would be constant with respect to $M$ (see [7] for more precise analyses). Unfortunately, this model does not accurately describe the true nature of the hardware distortion.
To understand why, fundamentally, the distortion is correlated among the antennas, consider the downlink in the special case of a single terminal in line-of-sight. Then the signal radiated by the $m$th antenna is simply a phase-shifted version of the signal radiated by the first antenna ($m = 1$). The distortion arising from an amplifier nonlinearity at the $m$th antenna is phase-shifted by the same amount as the signal. Hence, if all amplifiers have identical characteristics (a weak assumption in practice), the distortion is beamformed into the same direction as the signal of interest, and receives the same array gain as that signal of interest. That is, the effects of the distortion scale proportionally to $M$ rather than disappearing as $M$ is increased. In this case, the covariance matrix of the distortion, when viewed as an $M$-vector $\mathbf{d}$, has rank one. A similar effect exists on the uplink, when the nonlinearities in low-noise amplifiers are considered [16].
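The line-of-sight argument can be made concrete with a small standard-library sketch. It assumes a half-wavelength uniform linear array, identical amplifiers with a purely third-order baseband distortion term, and hypothetical values for the coefficient `beta3`, the direction `theta`, and the transmitted sample `s`; none of these come from the paper.

```python
import cmath
import math

M = 16                                  # number of antennas (illustrative)
theta = math.radians(25.0)              # assumed direction of the single terminal
beta3 = -0.1                            # hypothetical third-order PA coefficient
s = 0.8 * cmath.exp(1j * 0.3)           # one transmitted baseband sample

# Half-wavelength uniform linear array: antenna m applies the phase -pi*m*sin(theta).
steer = [cmath.exp(-1j * math.pi * m * math.sin(theta)) for m in range(M)]
tx = [a * s for a in steer]                          # phase-shifted copies of s
dist = [beta3 * abs(x) ** 2 * x for x in tx]         # third-order distortion per antenna

# Each distortion sample equals one common scalar times the steering entry, so
# the distortion vector is proportional to the beamforming vector (rank one).
ratios = [d / a for d, a in zip(dist, steer)]

# Amplitude gain at the terminal: the line-of-sight channel undoes the phase shifts.
signal_gain = abs(sum(a.conjugate() * x for a, x in zip(steer, tx))) / abs(s)
dist_gain = abs(sum(a.conjugate() * d for a, d in zip(steer, dist))) / abs(dist[0])
print(signal_gain, dist_gain)
```

Both gains come out equal to the number of antennas, which is the coherent combining that an uncorrelated-distortion model would miss.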
In the remainder of this section, we discuss the specifics of distortion arising from amplifier nonlinearities and finite-resolution A/D converters in more detail. We furthermore discuss the impact and calibration of RF front-end non-reciprocity.
IV-A Power Amplifiers Benefit from the Large Array
The required output power of a Massive MIMO base station can be reduced inversely proportionally to the square root of the number of BS antennas, or even linearly in operating regimes with good channel estimation quality, thanks to the coherent combination of all antenna signals. This results in significantly reduced output specifications of the Power Amplifiers (PAs). The power amplification stage typically accounts for a major share of the power consumption of base stations in wireless broadband macrocells [15]. Moreover, the PAs necessitate cooling, which causes additional overhead. The reduced output power in Massive MIMO can hence reduce the total power by a factor of 3 in an exemplary 100-antenna base station, assuming that all other contributions remain equal.
The PA mostly operates at a low efficiency as a consequence of a considerable back-off, required to avoid entering the saturation region. For OFDM-based systems such as 3GPP-LTE, the PA typically operates with a back-off of 8–12 dB. Best-in-class solutions need complex techniques that achieve an efficiency of [17]. Entering the saturation region introduces nonlinear distortion, which comes with two detrimental effects: distortion of the intended signal within the band of interest, and out-of-band (OOB) emissions that result in adjacent channel leakage.
We consider a polynomial memoryless model [18] for the nonlinear behaviour of the PA. The impact on the signal at RF can be expressed as:
$$y_{RF}(t) = \sum_{n} \beta_n\, x_{RF}^{n}(t), \tag{8}$$
where $x_{RF}(t)$ is the input signal to the PA, $y_{RF}(t)$ is the output signal, and $\beta_n$ is the nonlinear distortion coefficient of the PA for the $n$th harmonic component. The third-order harmonic will have the largest impact both in terms of in-band distortion and adjacent channel leakage. Furthermore, the amplitude will be limited to the saturation amplitude $A_{sat}$ for input values exceeding the input saturation amplitude $A_{in,sat}$:
$$|y_{RF}(t)| = A_{sat} \quad \text{for } |x_{RF}(t)| > A_{in,sat}. \tag{9}$$
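A memoryless polynomial model with a saturation limit, as described above, can be sketched in plain Python. The coefficient values, the choice of placing the saturation point at the peak of the cubic curve, and the helper names `pa_output` and `harmonic_amplitude` are all assumptions made for this example.

```python
import math

BETA = [1.0, 0.0, -0.15]                         # hypothetical beta_1..beta_3
A_IN_SAT = math.sqrt(BETA[0] / (3 * -BETA[2]))   # input level where the cubic curve peaks
A_SAT = A_IN_SAT + BETA[2] * A_IN_SAT ** 3       # output level at that input

def pa_output(x, beta=BETA, a_in_sat=A_IN_SAT, a_sat=A_SAT):
    """Memoryless polynomial PA: sum_n beta[n-1] * x**n, limited to +/- a_sat."""
    if abs(x) > a_in_sat:
        return a_sat if x > 0 else -a_sat
    return sum(b * x ** n for n, b in enumerate(beta, start=1))

def harmonic_amplitude(samples, h):
    """Amplitude of the h-th cosine harmonic, given exactly one period of samples."""
    n = len(samples)
    return 2.0 / n * sum(v * math.cos(2 * math.pi * h * i / n)
                         for i, v in enumerate(samples))

N = 256
tone = [math.cos(2 * math.pi * i / N) for i in range(N)]   # unit-amplitude test tone
out = [pa_output(v) for v in tone]
fundamental = harmonic_amplitude(out, 1)   # compressed: 1 + 0.75 * beta_3
third = harmonic_amplitude(out, 3)         # generated harmonic: 0.25 * beta_3
print(fundamental, third)
```

With a unit-amplitude tone the amplifier stays below saturation, so the cubic term both compresses the fundamental and creates a third harmonic; driving the input beyond `A_IN_SAT` would additionally clip the waveform.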
The nonlinear distortion resulting from the PAs in the many antenna paths is hence signal dependent. The input signals to the PAs can be correlated, depending on the specific communication scenario in terms of users, channel responses, and power (im)balance among the users. In [19] we analyzed how the distortion terms can combine by means of a basic dual-tone modulation scheme. The following effects can occur:

The distortions may add up coherently in the channel and generate considerable out-of-band emissions. This will be the case, for example, in a single-user situation with one strongly dominating propagation direction.

In most multi-user scenarios the precoder will provide significantly different compositions of signals to the antenna paths and hence power amplifiers. In general, this will randomize the harmonic distortion terms.
The constellation diagrams in Figure 6 illustrate the impact of increasing the number of antennas at the base station on the Error Vector Magnitude (EVM), for a case with equal-strength signals for the different users and i.i.d. Rayleigh fading channels. The results were simulated based on a cubic polynomial model for the PA, which operates in saturation ( dB with respect to the 1 dB compression point). With antennas at the base station, the constellation points are seriously dispersed and an EVM of dB is measured. When increasing the number of antennas, in steps of 10 in the graph, the clarity of the constellation diagram greatly improves and for an EVM of dB is observed.
In conclusion, the power amplifiers benefit from the large array owing to the drastically reduced total output power requirement. Moreover in many typical conditions, Massive MIMO systems will not transmit predominantly to one user and in one direction. One could then operate the PAs efficiently in their nonlinear region. Hence, a considerable further improvement of the power consumption could be achieved. However, the inconvenient truth is that in general, directive emissions of OOB radiation can arise under some conditions. More detailed mathematical models and results can be found in [20].
IV-B Coarse and Lean Converters
The impact of low-resolution data converters on system performance has been investigated. We give an overview of these theoretical results and discuss them in perspective of actual design constraints and merits of state-of-the-art data converters. These reveal that minimizing the resolution strictly (e.g., below 6 bits) does not result in a significant power reduction in a conventional base station. One should hence question any penalty in system performance and/or additional DSP complexity when considering very low resolution data converters.
A specific type of hardware distortion arises if low-precision A/D converters are used at the base station. Such converters are highly desirable owing to their low cost and power consumption. In principle, for each bit reduction in resolution, the A/D converter power is halved. Doubling the sampling frequency will double the power. This is reflected in the common figure-of-merit (F.o.M.) in terms of energy consumption per conversion step (cs) [21], used to assess the design merit of A/D converters implementing different architectural principles and resolution/bandwidth specifications:
$$\text{F.o.M.} = \frac{\text{Power}}{2^{\text{ENOB}} \cdot f_s} \quad [\text{J/cs}], \tag{10}$$
where ENOB is the Effective Number of Bits resolution as measured and $f_s$ is the sampling frequency.
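The scaling expressed by this figure-of-merit can be explored with a short Python sketch that solves it for the converter power. The 10 fJ/cs figure-of-merit and the 100 MS/s sampling rate are hypothetical inputs chosen for illustration, and `adc_power_w` is a name introduced here.

```python
def adc_power_w(fom_j_per_cs, enob_bits, f_s_hz):
    """Power implied by the figure-of-merit: P = F.o.M. * 2**ENOB * f_s."""
    return fom_j_per_cs * 2 ** enob_bits * f_s_hz

FOM = 10e-15     # hypothetical 10 fJ per conversion step
F_S = 100e6      # hypothetical 100 MS/s sampling rate

for bits in (4, 6, 8, 12):
    print(f"{bits:2d} bits: {adc_power_w(FOM, bits, F_S) * 1e6:10.2f} uW")

p5 = adc_power_w(FOM, 5, F_S)
p6 = adc_power_w(FOM, 6, F_S)
p6_fast = adc_power_w(FOM, 6, 2 * F_S)
```

Under this model the power doubles with every added bit and with every doubling of the sampling rate, so the absolute saving from removing one more bit shrinks quickly once the resolution is already low.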
The resulting quantization noise of A/D conversion is fairly easy to model accurately, and rigorous information-theoretic analyses of its effect are available. In some cases, such as line-of-sight with a single terminal, the quantization noise may combine constructively. However, in frequency-selective Rayleigh fading channels with large delay spreads and multi-user beamforming, the distortion averages out over the antennas to a significant extent. Specifically, with 1-bit quantization, the quantization noise has a power that scales with the received signal power [22], and the aggregate effect of the quantization is approximately a loss in effective SINR of 4 dB. The 1-bit A/D converter case is of particular interest as it allows operation without automatic gain control (AGC), which simplifies hardware complexity. With $b$-bit quantization, $b > 1$, corresponding results can be found in [23], and when $b$ grows, eventually the capacity formulas for the unquantized case [3, Ch. 3] are rediscovered. Other authors have subsequently derived similar results [24], and earlier work obtained them using heuristic arguments [25, 26]. Importantly, these analyses take into account the fact that both the received pilots and the payload data will be affected by quantization noise.
The loss in effective SINR due to quantization needs to be considered relative to the extra power consumption resulting from adding bits of resolution in the A/D converters. Circuit innovation in data converters has brought great improvements in power efficiency. State-of-the-art designs for A/D converter cores achieve figures-of-merit following (10) in the order of fJ/cs [27, 28]. A 6-bit ADC with a speed of several 100 Mbit/s consumes mW.
Massive MIMO systems operating with low-resolution Digital-to-Analog (D/A) converters at the base station in the downlink transmission have also been studied. There is some evidence that they are sufficient to attain a good performance in terms of achievable link rate [29, 30]. Also, while these analyses are independent of the actual modulation and coding used in the system, numerical end-to-end link simulations have independently arrived at essentially the same conclusion, namely that the degradation of BER performance due to low-precision D/A converters is negligible [31]. It is however a misconception that the number of bits of resolution affects the D/A converter power consumption in a similar way as it does for A/D converters. The constraint on OOB emission, in combination with the swing to be delivered to the analog output signal, is the dominant factor in the power and complexity of a D/A converter [21]. A relevant standard figure-of-merit (F.o.M.) for current-steering D/A converters is given by
(11) 
where SFDR is the spurious-free dynamic range, being the distance between the signal and the largest single unwanted component (the spurious signal), and the peak-to-peak signal swing accounts for the power (and design problems) needed for generating the analog signal in a digital-to-analog converter. D/A converters with a resolution bits are conveniently implemented by current injection or resistive architectures whose power consumption is typically not directly impacted by their resolution. In contrast, the complexity of the reconstruction filter in the D/A converter is mostly determined by the SFDR specification, which will eventually determine the out-of-band (OOB) harmonic distortion. Digital predistortion and analog filtering to reduce OOB emissions have been proposed for coarsely quantized precoding in Massive MIMO [32]. The extra processing complexity in deeply scaled technology will be very reasonable, yet a degradation of the in-band signal-to-interference-noise-and-distortion ratio (SINDR) on the link is introduced. This presents the same trade-off between in-band transmission and out-of-band rejection encountered in D/A converter design.
The trend in broadband wireless systems to increase spectral efficiency through a combination of higher order modulation constellations and conventional multilayer MIMO has raised the resolution requirement for data converters bits. Massive MIMO can operate without noticeable implementation loss with only bit A/D and D/A converter resolution. This reduces the power consumption of an individual A/D converter specifically with a factor , which more than compensates for the fact that times more converters are needed. It is however neither necessary nor overall beneficial to reduce the resolution of A/D and D/A converters below 6 bits:

On uplink, reducing the A/D resolution further will save less than mW in a 100-antenna base station.

On downlink, a potential implementation loss of dB or more due to D/A converters with a lower resolution may require more power in the PA stage. More importantly, the constraints on OOB emission will not be met. Dedicated processing will hence be needed to avoid or filter out unacceptable leakage in adjacent bands.
IV-C Reciprocity Calibration in RF Front-Ends
Channel estimates are obtained from uplink pilots; see Section II. In practice, the response observed by the digital baseband processing for each user includes both the propagation channel and the transceiver transfer functions. The full responses for uplink and downlink can be expressed as:
$$\mathbf{g}_{UL,k} = \mathbf{R}_B\, \mathbf{g}_k\, t_k, \qquad \mathbf{g}_{DL,k}^T = r_k\, \mathbf{g}_k^T\, \mathbf{T}_B, \tag{12}$$
where $\mathbf{R}_B$ and $\mathbf{T}_B$ are complex diagonal matrices containing the base station receiver and transmitter responses, and $t_k$ and $r_k$ are the responses of the transmitter and receiver of user terminal $k$. While the responses of the propagation channel are reciprocal, the responses of the front-ends will typically cause non-reciprocity in the full response. In the precoded Massive MIMO downlink reception the following holds:
$$y_k = r_k\, \mathbf{g}_k^T\, \mathbf{T}_B \sum_{k'=1}^{K} \mathbf{a}_{k'} x_{k'} + w_k. \tag{13}$$
When the corresponding estimates of $\mathbf{g}_{UL,k}$ are used to calculate the precoding coefficients, they will introduce Multi-User Interference (MUI) and potentially an SNR loss, depending on the precoding vectors $\mathbf{a}_k$. We include the derivation for the zero-forcing precoder, and refer to [33] for a comprehensive treatment. Under the assumption of negligible channel estimation errors and considering normalized responses to simplify notation,
$$\mathbf{y} = \mathbf{G}_{DL}^T\, \mathbf{G}_{UL}^* \left( \mathbf{G}_{UL}^T \mathbf{G}_{UL}^* \right)^{-1} \mathbf{x} + \mathbf{w}, \tag{14}$$
where $\mathbf{G}_{UL}$ and $\mathbf{G}_{DL}$ collect the full uplink and downlink responses of all terminals as columns, and $\mathbf{x}$ and $\mathbf{w}$ are the vectors of transmitted symbols and received noise samples, respectively. Writing out the front-end responses gives the following expression:
$$\mathbf{y} = \mathbf{R}_U \mathbf{G}^T \mathbf{T}_B\, \mathbf{R}_B^* \mathbf{G}^* \mathbf{T}_U^* \left( \mathbf{T}_U \mathbf{G}^T \mathbf{R}_B \mathbf{R}_B^* \mathbf{G}^* \mathbf{T}_U^* \right)^{-1} \mathbf{x} + \mathbf{w}, \tag{15}$$
where $\mathbf{G} = [\mathbf{g}_1, \ldots, \mathbf{g}_K]$, and $\mathbf{T}_U$ and $\mathbf{R}_U$ are diagonal matrices containing the transmitter and receiver responses of terminals $1$ to $K$. Equation (15) shows that in general the combined precoding, channel, and transceiver responses will not result in a diagonal matrix. As a result, MUI will occur. Structurally, it is the multiplication of the base station's front-end responses that is responsible for the MUI. The terminal responses appear as scalar multiplications on the received symbols and will be contained in the equalization processing in the terminal. A suitable calibration procedure operating locally at the base station can restore the reciprocity. Calibration data needs to be obtained through measurements of the transceiver front-end responses, for which several approaches have been proposed and validated:

Utilization of an auxiliary front-end, which sequentially measures the RF transceiver front-ends. The method works well in conventional MU-MIMO systems [34]. However, it does not scale well to large numbers of antennas.

Exploitation of the coupling, essentially radio propagation, between antennas in the array to derive the relative differences among the transceiver responses. This solution has been implemented in real-life testbeds and performs well [35].
Analysis has shown that nonreciprocity requirements are not as severe for Massive MIMO as in conventional systems [33] and depend on the system load and precoding algorithms. The RF transceiver responses may vary in time mainly due to temperature differences. The calibration procedure hence needs to be repeated on a regular basis. In typical conditions the required updating frequency is in the order of hours. It thus introduces only very limited overhead.
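The effect of front-end non-reciprocity on zero-forcing precoding, and its removal by base-station calibration, can be checked numerically. The NumPy sketch below is a simplified model under stated assumptions: front-end responses are drawn with hypothetical gain spread and fully random phases (far harsher than real hardware), channel estimates are perfect, and the calibration coefficients are taken as the exact ratios of transmitter to receiver responses. All function and variable names are introduced for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
M, K = 32, 4                                      # illustrative sizes

def crandn(*shape):
    """i.i.d. complex Gaussian entries with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def random_response(n, spread=0.2):
    """Hypothetical front-end responses: gains near 1 with random phases."""
    return (1 + spread * rng.standard_normal(n)) * np.exp(1j * rng.uniform(0, 2 * np.pi, n))

G = crandn(M, K)                                  # reciprocal propagation channel
R_B, T_B = np.diag(random_response(M)), np.diag(random_response(M))
R_U, T_U = np.diag(random_response(K)), np.diag(random_response(K))

G_ul = R_B @ G @ T_U                              # response estimated from uplink pilots
G_dl_T = R_U @ G.T @ T_B                          # response the downlink actually sees

def zf_precoder(H):
    """Zero-forcing precoder built from an M x K uplink response."""
    Hc = H.conj()
    return Hc @ np.linalg.inv(H.T @ Hc)

def mui_leakage(E):
    """Off-diagonal power of the effective K x K channel relative to the diagonal."""
    off = E - np.diag(np.diag(E))
    return np.linalg.norm(off) ** 2 / np.linalg.norm(np.diag(E)) ** 2

E_uncal = G_dl_T @ zf_precoder(G_ul)              # precoding on the raw uplink estimate

# Calibration: multiply antenna m's estimate by t_m / r_m (assumed perfectly known).
calib = T_B @ np.linalg.inv(R_B)
E_cal = G_dl_T @ zf_precoder(calib @ G_ul)
print(mui_leakage(E_uncal), mui_leakage(E_cal))
```

After calibration the effective channel is diagonal, and only the terminal responses remain as per-user scalars, which the terminal's own equalization can absorb.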
V Algorithm-Hardware Co-Design for Precoding and Detection
The central detector and precoder perform the crucial operations to achieve spatial multiplexing. This section zooms in on the hardware implementation of the precoding and detection algorithms.
Algorithms  Per channel realization  Per channel use 

Neumann Series  
Cholesky Decomp.  
Modified QR Decomp.  
Coordinate Descent   
V-A Implementation Challenges and Design Considerations
Linear processing provides good precoding and detection performance under favorable propagation conditions. However, linear processing in Massive MIMO does not necessarily result in low computational complexity given that the operations need to be performed on large matrices. For instance, the complexity of computing for an matrix is
(16) 
This number is in the order of for an , system. In TDD Massive MIMO systems, processing latency is a crucial design consideration, especially for high-mobility scenarios. The analysis in [9] shows that the time budget for operating the precoding is 150 µs to support a moderate mobility of km/h. The high computational complexity and processing speed need to be handled with reasonable hardware cost and power consumption. These implementation challenges necessitate meticulously optimized solutions following a systematic algorithm-hardware co-design methodology.
A central property of Massive MIMO is that the column vectors of the channel matrix are asymptotically orthogonal under favorable propagation conditions. As a result, the Gram matrix, , becomes diagonally dominant, i.e.,
(17) 
and for i.i.d. channels,
(18) 
The extent of the diagonal dominance varies with the characteristics of the antenna array, the propagation environment, and the number of users served. Exploiting this dominance, approximate matrix inversion can be performed to reduce the computational complexity. Matrix inversion approaches can be categorized into three types: explicit computation, implicit computation, and hybrid methods. We next assess the complexity and suitability of these methods.
V-B Explicit Matrix Inversion
Explicit matrix inversion can be performed using approaches such as Gaussian elimination, Neumann series expansion [36], and truncated polynomial expansion [37]. Recently, the Neumann series approximation has been identified as one of the most hardware-friendly algorithms for Massive MIMO systems [38, 39]. If a matrix satisfies
(19) 
its inverse can be approximated by a Neumann series with terms as:
(20) 
where is a preconditioning matrix. The number of terms, , can be used as a tuning parameter to trade off between complexity and accuracy. It is shown in [39] that using the main diagonal of the Gram matrix,
(21) 
as the preconditioning matrix, the Neumann series approximation can provide close-to-exact-inversion performance with when . However, a significant performance loss is demonstrated when . To improve the accuracy, the following weighted Neumann series approximation was introduced in [40, 41]:
(22) 
In [40], the coefficients are selected by solving the equation
(23) 
where
(24) 
At the price of extra computational complexity, the method in (22) improves the performance significantly, especially in cases with a high user load.
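The basic (unweighted) Neumann series approximation with a diagonal preconditioner can be sketched in a few lines. This is a minimal illustration, not the implementation of [38, 39]: the matrix `gram`, the helper names, and the use of real-valued entries are assumptions made for readability. The same code accepts complex Hermitian matrices.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(z, num_terms):
    """Approximate inv(z) by sum_{n < num_terms} (I - D^-1 z)^n D^-1, D = diag(z)."""
    n = len(z)
    d_inv = [[1.0 / z[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    dz = mat_mul(d_inv, z)
    eye = identity(n)
    e = [[eye[i][j] - dz[i][j] for j in range(n)] for i in range(n)]
    acc = [[0.0] * n for _ in range(n)]
    power = identity(n)  # successively holds e^0, e^1, e^2, ...
    for _ in range(num_terms):
        acc = [[acc[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, e)
    return mat_mul(acc, d_inv)


def inversion_error(z, approx):
    """Largest absolute entry of (approx * z - I)."""
    prod = mat_mul(approx, z)
    n = len(z)
    return max(abs(prod[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


# Hypothetical diagonally dominant Gram matrix (3 users).
gram = [[4.0, 1.0, 0.5], [1.0, 5.0, 1.0], [0.5, 1.0, 6.0]]
for terms in (1, 2, 3, 5):
    print(terms, inversion_error(gram, neumann_inverse(gram, terms)))
```

The residual shrinks as more terms are added, and it shrinks faster the more diagonally dominant the matrix is, which is the complexity-versus-accuracy knob described above.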
V-C Implicit Matrix Inversion
Implicit matrix inversion uses linear solvers such as conjugate gradient [42], coordinate descent [43], and Gauss-Seidel [44] to perform linear precoding and detection, without explicitly calculating the Gram matrix inverse. In [43], the coordinate-descent method is adopted to realize an MMSE detector. The regularized squared Euclidean distance,
(25) 
is minimized sequentially for each variable in in a round-robin fashion. In (25), is the variance of each complex entry in the noise vector . In each iteration, the solution for the th element in is
(26) 
This procedure is then repeated for iterations.
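The round-robin update can be sketched as follows. The code assumes the textbook closed-form coordinate update for a regularized least-squares cost, `||y - H s||^2 + n0 * ||s||^2`; the function names, the channel `h_example`, and the received vector `y_example` are hypothetical and are not taken from [43].

```python
def coordinate_descent_detect(h, y, n0, num_iters):
    """Minimize ||y - H s||^2 + n0 * ||s||^2 one coordinate at a time."""
    m, k = len(h), len(h[0])
    s = [0j] * k
    resid = list(y)  # running residual y - H s
    col_energy = [sum(abs(h[i][j]) ** 2 for i in range(m)) for j in range(k)]
    for _ in range(num_iters):
        for j in range(k):
            # Correlate column j with the residual, adding back its own contribution.
            corr = sum(h[i][j].conjugate() * resid[i] for i in range(m))
            corr += col_energy[j] * s[j]
            new_sj = corr / (col_energy[j] + n0)
            delta = new_sj - s[j]
            for i in range(m):
                resid[i] -= h[i][j] * delta
            s[j] = new_sj
    return s


def mmse_gradient(h, y, s, n0):
    """Per-coordinate optimality residual; all zeros at the exact minimizer."""
    m, k = len(h), len(h[0])
    resid = [y[i] - sum(h[i][l] * s[l] for l in range(k)) for i in range(m)]
    return [sum(h[i][j].conjugate() * resid[i] for i in range(m)) - n0 * s[j]
            for j in range(k)]


# Hypothetical 4-antenna, 2-user example with nearly orthogonal columns.
h_example = [[1 + 0j, 0.2j], [0.5j, 1 + 0j], [1 - 0.5j, 0.1 + 0j], [0.2 + 0j, 1 + 1j]]
y_example = [1 + 1j, -1 + 0.5j, 0.5 - 1j, -0.5 - 0.5j]
s_hat = coordinate_descent_detect(h_example, y_example, n0=0.1, num_iters=30)
print(s_hat)
```

No Gram matrix inverse is formed. Each sweep costs one pass over the columns of the channel matrix, and convergence is fast when the columns are close to orthogonal.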
V-D Hybrid Method
Matrix decomposition algorithms factorize the Gram matrix into intermediate matrices, which are generally triangular. Forward or backward substitution is then performed to accomplish the corresponding precoding and detection operation. The solution in [45] utilizes QR decomposition. The Gram matrix is decomposed as
(27) 
where is unitary and is upper triangular. The linear equation is then rewritten as
(28) 
which can be solved using backward substitution. This method avoids the explicit computation of matrix inverses, relaxing (to some extent) the requirements on data representation accuracy. By exploiting the diagonally dominant property of the Gram matrix, modified QR decomposition can be performed [45]. For instance, the original solutions
(29) 
to the Givens rotation operation
(30) 
are approximated by
(31) 
Equation (31) makes use of the fact that and results in 50% complexity savings by introducing the constant .
Cholesky decomposition () has also been studied for Massive MIMO precoding and detection implementation [46, 47]. It has lower computational complexity than the Neumann series expansion method (with ) [39] and provides accurate processing independent of and . More importantly, the Cholesky decomposition imposes lower memory requirements, since only the lower triangular matrix needs to be stored.
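A hybrid solve via Cholesky decomposition followed by forward and backward substitution can be sketched as below. This is an illustrative floating-point version under assumed names (`cholesky`, `solve_cholesky`, `gram`, `rhs`), not the fixed-point architecture of [46, 47].

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L * L^H for a Hermitian positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k].conjugate() for k in range(j))
            if i == j:
                low[i][i] = math.sqrt((a[i][i] - acc).real)
            else:
                low[i][j] = (a[i][j] - acc) / low[j][j]
    return low


def solve_cholesky(low, b):
    """Solve (L L^H) x = b without forming any inverse."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^H x = z
        x[i] = (z[i] - sum(low[k][i].conjugate() * x[k]
                           for k in range(i + 1, n))) / low[i][i]
    return x


# Hypothetical Gram matrix and matched-filter output.
gram = [[4.0, 1.0, 0.5], [1.0, 5.0, 1.0], [0.5, 1.0, 6.0]]
rhs = [1.0, 2.0, 3.0]
x_hat = solve_cholesky(cholesky(gram), rhs)
print(x_hat)
```

The factorization is computed once per channel realization, while the two substitutions run once per received vector. Only the lower triangle `low` has to be stored.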
V-E Complexity versus Accuracy Trade-Off
Selecting appropriate processing algorithms for Massive MIMO is non-trivial, and an analysis of the trade-off between computational complexity and processing performance is necessary. Reference [48] presents such an analysis for different MMSE detection techniques.
To evaluate the processing accuracy, we simulate the performance of different detection techniques including Neumann series approximation (NSA), Cholesky decomposition (ChD), modified QRD (MQRD), and coordinate descent (CD). The effects of fixed-point arithmetic are also taken into consideration to examine the required data precision. In the simulations, , sweeps from 8 to 32, and an i.i.d. block Rayleigh fading channel with perfect channel estimation and synchronization was considered. A rate convolutional code with generator polynomial [171, 133] and a constraint length of 7 was used. Figure 7 shows the performance at BER relative to floating-point ZF detection. The number of iterations for the NSA and CD was set to 3. Implicit and hybrid methods are more robust to lower resolutions, while NSA requires a larger number of bits to calculate the matrix inverse explicitly. When is small the Gram matrix becomes less diagonally dominant and approximate matrix inversion methods suffer from a larger performance loss. CD offers better interference cancellation when the user load is relatively high.
Table II lists the corresponding computational complexity in terms of the number of real multiplications. The computation is divided into two parts depending on how frequently it needs to be executed, i.e., per channel realization and per channel use (instance of the detection problem). The Gram matrix calculation, matrix decomposition, and matrix inversion are performed when the channel changes, while matched filtering and backward/forward substitution are performed for each received vector. Thereby, the computational complexity depends on the channel dynamics, i.e., the number of samples () during which the channel is constant. Figure 8 depicts the results. Different system setups and channel conditions are analyzed. While changing , , and in the three subfigures, the other two are fixed to , , and , respectively. Several observations can be made. The detection complexity grows linearly with , enabling large savings in transmit power by deploying large numbers of antennas, with a mild increase in the processing power. Moreover, the processing complexity (for explicit and hybrid matrix inversion algorithms) can be dramatically reduced in static environments, in which case the channel matrix-dependent operations are performed very rarely.
In addition to the processing accuracy and computational complexity, parallelism is an important aspect to be considered, and it highly impacts the processing latency. Iterative algorithms such as Neumann series approximation and coordinate descent can suffer from a long processing latency for MUI-dominant channels. On the other hand, matrix decomposition can be performed in a more parallel fashion and was thus selected for the first Massive MIMO precoder-detector chip introduced in the next section. Moreover, the intermediate results , , and can be shared between the uplink and downlink processing, further simplifying the hardware.
V-F 128×8 Massive MIMO Precoder-Detector Chip Achieving 300 Mb/s at 60 pJ/b
Integrated hardware implementations will ultimately define both the performance and power consumption of Massive MIMO systems. Hence, algorithms should be selected such that the corresponding operations can be mapped into simple, configurable, and scalable hardware architectures to enable high throughput, low latency, and flexible implementation. The reconfigurability and scalability are essential to enable efficient operation in a wide range of conditions. In this section we present a design [45] demonstrating such an algorithm and hardware architecture co-design, where the QR-decomposition-based ZF precoding is mapped onto a systolic array architecture; see Figure 9. The systolic array consists of a homogeneous network of elementary processing nodes, where each node performs the same predefined tasks. Due to the homogeneity, the architecture is scalable to support different and . The data flow in a systolic array is straightforward and parallel, leading to a simple and high-speed hardware implementation.
The QR-decomposition-based precoder, together with a Cholesky-decomposition-based detector, was fabricated using nm FD-SOI (Fully Depleted Silicon On Insulator) technology. Figure 10(a) shows a photograph of the chip. It occupies only a mm silicon area and consumes mW power for precoding and detection for a 128×8 Massive MIMO system with a 300 Mb/s throughput. The fabricated chip and the measurement results prove that the Massive MIMO concept works in practice and that system-algorithm-hardware co-optimization enables record energy-efficient signal processing. The cross-level design approach also applies advanced circuit techniques leveraging the flexible FD-SOI body bias feature [50]. Using forward body bias or reverse body bias allows systems to dynamically adjust processing speed and power consumption of the chip towards the most efficient operating point.
The algorithm-hardware co-design method is further exploited in [49] to map an iterative expectation-propagation detection (EPD) onto a condensed systolic array for higher hardware resource utilization. This detector chip (Figure 10) is fabricated using nm FD-SOI technology and provides 1.8 Gb/s throughput with mW power consumption. It offers 3 dB processing gain compared to [45], equivalent to a 2× boost in link margin that can be utilized to lower the TX power and relax the frontend requirements.
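As a quick sanity check on the headline figures of this section, throughput and energy per bit together imply an average processing power, and a gain in dB maps to a linear factor. The helper names below are assumptions introduced for illustration. The 300 Mb/s and 60 pJ/b inputs are the values quoted in the section title.

```python
def power_mw(throughput_mbps, energy_pj_per_bit):
    """Average power in mW implied by a throughput (Mb/s) and an energy per bit (pJ/b)."""
    bits_per_second = throughput_mbps * 1e6
    joules_per_bit = energy_pj_per_bit * 1e-12
    return bits_per_second * joules_per_bit * 1e3


def db_to_linear(gain_db):
    """Convert a power gain in dB to a linear ratio."""
    return 10 ** (gain_db / 10)


print(power_mw(300, 60))   # about 18 mW
print(db_to_linear(3.0))   # close to 2, i.e. a 2x link-margin boost
```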
VI Per-Antenna Chain Processing at the Semiconductor Edge
An obvious concern is how the large number of antennas and the associated signal processing will affect the cost and energy consumption of the base station. The individual antenna signals may have low precision, but regardless of that, the coherent combination eventually yields excellent SNR. We demonstrate below that the resolution of digital signals and operators, such as filtering coefficients, can be scaled back sharply. Furthermore, we advocate processing of the per-antenna functionality without the conventional circuit design margins that are used to cope with uncertainties in the semiconductor technology. This approach has been called “at the semiconductor’s edge” to indicate an operation point where the performance-energy benefit of the technology is maximally exploited at the expense of reliability [51]. Specifically, voltage over-scaling offers significant energy reductions in deeply scaled CMOS, up to more than 50%, at the risk of occasional processing errors. Massive MIMO systems can be designed to meet required performance levels when operating with error-prone digital signal processing circuits. Systems remain functional even in the worst-case scenario in which the DSP circuitry in some antenna paths fails completely, for example because of a broken power supply. We will call the situation where the signal in an antenna branch is fully lost “antenna outage”.
VI-A Per-Antenna Functions: Coarse Processing Provides Excellent Performance
Massive MIMO can operate well with low-resolution signals. A profiling of the per-antenna functionality in terms of generic operations per second shows that for an LTE-like setup, about of the complexity is in the filtering and the remaining is in the (I)FFT operation. The filtering functionality is the most demanding because of the need to oversample and hence process at high speed. Significant savings in complexity are therefore possible by minimizing the resolution of this processing. An exploration of the word lengths of the data signals, , and of the filtering coefficients, , is reported in [52]. The circuit area complexity of the tap FIR filtering of I and Q signals as a function of the word lengths is calculated using basic formulas for the complexity of adders and multipliers, which are dependent on the word lengths and of the operands as follows:
(32) 
If a smaller number of bits is used to represent the signals and the filter coefficients, the hardware complexity as given in (32) is reduced. However, decreasing the word length will increase the quantization noise. For a desired transmission quality the just-sufficient precision can be determined. Considering that the quantization noise will be independent among the antennas, its combined impact will be smaller for larger numbers of antennas. This effect is illustrated in Figure 11 for the rather demanding 64-QAM case, and an uncoded Bit-Error Rate (BER) of . The curves were generated based on individual BER vs. SNR simulations for different coefficient and signal resolutions, from which the equal performance points were extracted. Dotted lines show equal-complexity (in terms of area) solutions. For a 128×4 Massive MIMO system, 4 and 5 bits are sufficient for the signals and the coefficients, respectively, for the targeted performance. This brings a 62% complexity reduction for the filters compared to the 8×4 case. The outer right points on the curves are clearly always suboptimal and demonstrate that high-precision filter coefficients do not improve performance, while they can cause a significant complexity penalty. A similar observation holds for the upper left points.
For higher system loads, more bits are needed. At the system level one could trade off system load for constellation order to satisfy throughput requirements.
This analysis provides evidence that low-complexity, coarse processing in the digital filters of the individual antenna signals can offer the required performance in Massive MIMO. In the downlink the signals will next be passed to the D/A converters. The latter could be low resolution as well. The more demanding design challenge for D/A converters, however, is to meet out-of-band emission specifications, as introduced in Section IV.
The (I)FFT operations required in Massive MIMO systems with multicarrier modulation can also be designed for Massive MIMO operation specifically and benefit from the complexity reduction brought by the coarse quantization. A thorough optimization is however quite complex and should consider varying quantization at the different butterfly stages.
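The averaging of independent quantization noise over antennas, which underlies the word-length reduction discussed in this subsection, can be illustrated with a small Monte Carlo sketch. All modelling choices are assumptions made for illustration: unit channel gains, a real-valued signal, a uniform mid-rise quantizer, and independent thermal noise per antenna (which acts as dither and decorrelates the quantization errors). It does not reproduce the simulation setup of [52].

```python
import math
import random


def quantize(x, bits, full_scale):
    """Uniform mid-rise quantizer with 2**bits levels on [-full_scale, full_scale)."""
    step = 2.0 * full_scale / (2 ** bits)
    clipped = max(-full_scale, min(full_scale - 1e-12, x))
    return step * (math.floor(clipped / step) + 0.5)


def combined_quantization_mse(num_antennas, bits, num_samples=2000,
                              noise_std=0.2, full_scale=3.0, seed=1):
    """Mean squared quantization error left after averaging over the antennas."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        s = rng.uniform(-1.0, 1.0)              # transmitted sample (assumed)
        err_sum = 0.0
        for _ in range(num_antennas):
            x = s + rng.gauss(0.0, noise_std)   # per-antenna received sample
            err_sum += quantize(x, bits, full_scale) - x
        total += (err_sum / num_antennas) ** 2
    return total / num_samples


for m in (1, 4, 16, 64):
    print(m, combined_quantization_mse(m, bits=5))
```

Under these assumptions the residual quantization error power falls roughly in proportion to the number of antennas, so a word length that is too coarse for a single chain can be sufficient for a large array.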
VI-B Processing at the Semiconductor’s Edge
Applications have benefited over the last decades from Moore’s law, providing ever higher performance at lower power consumption. Integrated Circuits (ICs) have been able to operate at lower dynamic power thanks to the scaling of the supply voltage $V_{DD}$. For digital circuits, the average dynamic power consumption is
$P_{\mathrm{dyn}} = C_{\mathrm{eff}} \, V_{DD}^{2} \, f$ (33)
where $C_{\mathrm{eff}}$ is the effective switching capacitance of the module and $f$ is the switching frequency. Clearly, $P_{\mathrm{dyn}}$ scales quadratically with the supply voltage $V_{DD}$.
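The quadratic dependence on supply voltage can be made concrete with a short calculation. The sketch assumes the standard CMOS dynamic-power relation (effective switching capacitance times supply voltage squared times switching frequency). The function names and the example voltages are hypothetical.

```python
def dynamic_power(c_eff_farad, v_supply, f_switch_hz):
    """Average dynamic power (W) of a digital block: C_eff * V^2 * f."""
    return c_eff_farad * v_supply ** 2 * f_switch_hz


def relative_saving(v_nominal, v_scaled):
    """Fraction of dynamic power saved by lowering the supply at fixed C_eff and f."""
    return 1.0 - (v_scaled / v_nominal) ** 2


# Hypothetical example: scaling the supply from 1.0 V to 0.7 V.
print(relative_saving(1.0, 0.7))  # about 0.51, i.e. roughly half the dynamic power
```

A 30% reduction in supply voltage already halves the dynamic power, which is why operating below the conventional voltage margin is attractive.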
However, with scaling towards deep sub-micron CMOS technologies (65 nm and smaller), designers are facing ever-increasing variability challenges. Process, voltage, and temperature (PVT) variabilities are considered to be the three main contributors to circuit variability. Conventionally, to cope with this challenge, ICs are designed at the worst PVT corners, to ensure that they always operate correctly. Figure 12 illustrates the different operating regions for ICs suffering from manufacturing variability.
The conventional design approach for worst-case conditions introduces considerable margins, leading to reduced peak performance and wasted power consumption. The worst-case synthesis assumes that all devices in the circuit operate in the slow-process corner and experience the least favorable voltage and temperature conditions. Temperature variations can yield up to 20% speed differences for a single D flip-flop. For instance, [53] shows that for 28 nm technology, the performance (speed) difference for a representative circuit is as large as times between the typical case and the worst case. Adaptive scaling techniques manage power dissipation and temperature by using a variable supply voltage.
Scaling down the supply voltage is regarded as an error-free power saving method as long as the signal timing constraints are met. However, the critical (minimum) supply voltage that guarantees timing closure cannot be determined at design time due to PVT variabilities and aging effects.
A third design approach has recently gained interest, namely to scale the supply voltage below the critical supply voltage, which is called Voltage Over-Scaling (VOS). In the VOS approach, the designer accepts that sporadic errors might occur: for logic components, the signal from the longest propagation paths can be miscaptured; for memory components, it may lead to incorrect write/read data/address or data loss. This methodology of approximate computing enables very energy-efficient processing [54]. Wireless communication systems are designed to cope with distortions and errors occurring on the channel. They are hence inherently good candidates for error-resilient processing solutions. In Massive MIMO, the large number of antennas implies redundancy in the system. It is promising to apply VOS specifically in the per-antenna processing, reaching beyond the reliability margins of the circuits, but still operating at a point where the computations are more often correct than wrong.
VI-C Massive MIMO Resilience to Circuit Errors
Massive MIMO is inherently resilient to some circuit errors in the per-antenna processing. Hardware errors in a number of antenna paths can be absorbed by the system thanks to the averaging induced by the large number of antennas, reminiscent of how the effects of small-scale fading average out in the coherent multiuser MIMO processing [3]. Semiconductor process variability was at first experienced globally, between wafers or circuits separated in space on a silicon wafer, hence die-to-die. Designers have thus realistically assumed transistor parameters to be correlated for nearby circuits on a specific die and chip. However, in deeply scaled technologies, device variability is mostly caused by the inaccuracy of lithography and etch technology. Intra-die (local) variations have consequently become significant, and are even reported dominant over global variations [55]. This apparent design challenge comes with a new opportunity to shave margins in the implementation of Massive MIMO. Indeed, different from the distortion resulting from nonlinearities, the digital distortion is independent of the signal and hence uncorrelated over the antennas. The Massive MIMO system will continue functioning even when, sporadically, one or a few individual antenna signals are subject to full failure. As mentioned in Section VI-B, this opens the door to operation of circuits with much lower design margins compared to traditional specifications, and most interestingly at lower supply voltages and hence power consumption.
The digital hardware errors in (I)FFT and filters introduced by silicon unreliability and by ambitious design methodologies result in incorrect bits during signal processing. This can be regarded as digital circuit distortion. We characterize the impact on the purity of the signal in terms of the signal-to-digital-distortion ratio (SDDR):
(34) 
where and are the powers of the error-free digital antenna signal output, and the noise power of the digital distortion due to circuit unreliability, respectively. First, we consider VOS errors which are temporary and local in nature. The BER performance is shown in Figure 13 for a severe SDDR distortion, where signals get stuck at a fixed value. Results for different modulation orders and both uncoded and coded performance (rate 3/4 soft decoded LDPC) are shown. The resulting SNR degradation remains limited to dB for of the antennas being a “victim” of circuit errors in the coded 16-QAM case, and even up to of the antennas in the QPSK case.
When operating deeply scaled circuits without margins, occasionally a full circuit failure may occur. The impact of this effect on the Massive MIMO system performance is called “antenna outage”. The digital outputs are then permanently stuck at a fixed value, which is assumed to be the maximum possible value. The SDDR of the outage antenna is , as the signals from the victim antennas are completely lost. This model is regarded as a worst-case hardware failure. Note that the SDDR does not imply infinite noise to the whole system, as only the victim antennas are affected and their PA power is normalized among all antennas. Therefore, a single antenna outage will not cause the system to fail entirely. The impact on the system performance is shown in Figure 14 for different antenna outage levels and system loads, for the pessimistic case where the errors are not detected.
As demonstrated, Massive MIMO can operate well with rather severe circuit errors, and thus allows significant VOS. The impact increases with higher system load and modulation constellations. The supply voltage may be adapted according to the system parameters to always offer just sufficient performance. In-situ monitoring based on test signals can be applied to perform adequate scaling [51].
In order to further improve the system robustness towards hardware errors, techniques to first detect hardware errors, and next either neglect, or if needed disable, defective hardware can be applied. Importantly, the distortion originating from digital circuit errors fundamentally differs from pure random noise. While process variations may feature continuous random distributions, their effect typically results in discrete error events. Dedicated monitoring circuitry that detects these errors can be established [56] for the functional components such as (I)FFTs and filters. If the Massive MIMO system receives information from the hardware level on failing circuits, it can adapt its signal processing accordingly. One option is to disable systematically failing antenna paths and no longer consider them in the central processing. The BER results are given in Figure 15 for a case with moderate system load (10×100 in this simulation). It shows that excluding defective circuits limits the degradation level to dB on uncoded QPSK for up to of the antenna paths failing. This approach is equivalent to operating the Massive MIMO system with a reduced number of BS antennas . For a representative case of QPSK transmission in a 100-antenna, 10-user scenario and with 28 nm standard CMOS technology, up to 40% power savings can be achieved with negligible performance degradation [57].
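Since disabling failing paths is equivalent to running the system with fewer base station antennas, a first-order estimate of the cost is the loss in coherent array gain. The sketch below assumes that the array gain scales linearly with the number of active antennas and ignores any loss in interference suppression, so it is a rough lower bound on the degradation rather than a reproduction of Figure 15. The function name is hypothetical.

```python
import math


def array_gain_loss_db(num_antennas, num_disabled):
    """Array gain lost (dB) when num_disabled of num_antennas paths are switched off."""
    active = num_antennas - num_disabled
    if active <= 0:
        raise ValueError("at least one antenna path must remain active")
    return 10.0 * math.log10(num_antennas / active)


# Hypothetical example: 10 of 100 antenna paths disabled.
print(round(array_gain_loss_db(100, 10), 3))  # 0.458 dB
```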
In conclusion, lean per-antenna processing can be performed in Massive MIMO systems. The very large number of operations, due to the large number of antenna paths, can be performed with low precision and with a profoundly scaled supply voltage. Combined, these techniques can reduce the power consumption due to the digital processing on each antenna path by an order of magnitude. For an exemplary system with 100 antennas at the base station, the total is comparable to a conventional MIMO system with an order of magnitude fewer antennas.
VII Terminals: Increased Reliability with Low-Complexity Signal Processing
VII-A Increased Service Levels on Low-Complexity Terminals
It has been shown that the Massive MIMO system concept does not require any additional specific functionality at the UE side. Massive MIMO terminals that have a single antenna, or apply simple diversity reception, will only be able to receive a single spatial stream. However, large numbers of terminals can be multiplexed in the same time-frequency slot, and every terminal can be allocated the full bandwidth of the system. This results in a throughput per terminal comparable with that of conventional UEs that receive multiple spatial streams in parallel.
5G terminals are expected to come in large numbers and support a diverse set of service requirements. Next to the continued traffic growth towards terminals allocated to human users, a variety of devices will require Machine Type Communication (MTC). Figure 16 illustrates three main use cases envisioned by industry alliances and the International Telecommunication Union (ITU) [58].
Figure 16 demonstrates that 5G technologies need to do more than enhance mobile broadband links. New solutions are needed to connect a very large number of (ultra) low-power devices and machines requiring very reliable and low-latency services. Massive MIMO can simultaneously support many broadband terminals in sub-6 GHz bands in indoor, outdoor, and mobile environments. The technology can also be tailored to optimally serve new MTC-based applications. Especially for narrowband MTC, the high array gain and the high degree of spatial diversity offered by Massive MIMO will help. The spatial diversity specifically gives rise to channel hardening.
The effects of array gain and channel hardening are illustrated for a 128-antenna setup in Figure 17. Consistently boosted signal levels over all terminal positions, thanks to the array gain, are observed. Terminals can potentially transmit data at several tens of dB lower output powers. The latter however requires high-quality CSI to be available, and the power allocated to pilots will limit the savings in practice. The channel hardening effect enhances the reliability of the links and improves the quality of service; most notably:

Increased performance at the cell edges, where terminals may experience limited or, in the worst case, no connectivity in current networks. Massive MIMO addresses this challenge, provided good uplink pilot-based CSI acquisition is ensured.

Power savings and hence longer autonomy for battery-powered devices.

Improved reliability. Fewer packet retransmissions can also reduce the end-to-end latency. The specifications put forward for Ultra Reliable Low Latency Communication (URLLC) in 5G are to support a reliability, and an end-to-end latency better than 1 ms.

Sustained good service levels in conditions with many simultaneously active users.
In the next paragraphs, we first zoom in on a typical broadband user equipment and indicate how low-power operation can be achieved while keeping backward compatibility with 4G air interfaces. Next, we discuss how tailored Massive MIMO systems have great potential to address the challenging requirements of MTC terminals.
VII-B Energy-Efficient Broadband Terminals
No advanced processing is required at the UE in Massive MIMO systems. In contrast, 4G systems deliver broadband services to UEs through multiplexing of several spatial layers. We compare a typical Massive MIMO terminal with the reference case of a 4×4 MIMO link. The latter requires MIMO detection at the terminal side in the downlink. Figure 18 shows a functional block scheme of a conventional broadband, multiple-antenna terminal receiver.
The complexity breakdown of a typical MIMO-OFDM baseband chain identifies channel estimation and MIMO detection as the main bottlenecks. We take as a reference a MIMO-OFDM case in which the multiple-antenna processing can be conveniently performed per subcarrier, resulting in relatively low-complexity implementations [59]. We consider a basic linear MIMO detector, and nonlinear detectors implementing (ordered) Successive Interference Cancellation (SIC). The latter are required to achieve acceptable system performance, especially in the low-SNR regime and in high-mobility scenarios. The power consumption of the inner modem receiver of the terminal in a Massive MIMO system is estimated relative to published VLSI implementations for conventional MIMO receivers [60, 61]. A range of algorithms and implementations for MIMO detectors has been reported, differing substantially in complexity. Our analysis is based on typical data for the specific components, and our own design know-how. Table III summarizes the assessment for both single-antenna and dual-antenna diversity-reception terminals, demonstrating an expected reduction in power consumption by a factor of 5 to 50.
4×4 linear detector  4×4 nonlinear detector  Massive MIMO single antenna  Massive MIMO 2-antenna diversity
VII-C Tailored Solutions Fit for Low-Power Connected Devices
MTC for sensors and actuators opens the door for a variety of new IoT applications. Low energy consumption is essential to enable long autonomy of devices powered by batteries or even relying on harvested energy. The physics of radio propagation dictates a strong attenuation on the link with distance, :
(35) 
where and are the received and transmitted powers, respectively, and and are directivity gains at the receiving and transmitting end of the link. The above is especially unfortunate for mostly uplink-dominated MTC. Low Power Wide Area Network (LPWAN) technologies are dedicated to connect IoT nodes at long ranges. We performed measurements with an IoT node communicating via a LoRa gateway [62]. Inspection of the power consumption of this illustrative node in Table IV provides valuable insights. The transmit power is relatively high since the power amplifier needs to provide sufficient power to cope with large-scale fading losses. The energy consumption, which will ultimately determine the autonomy of the node, is shown in Figure 19.
Operation mode  Power consumption (mW) 

Transmit 

Receive  
Sense  
Sleep 
This pinpoints the fierce challenge of connecting sensor nodes and other autonomous devices at a longer range. Their traffic is mostly dominated by uplink, hence putting the node in the most energy-consuming transmitting mode. Equation (35) reveals that fundamentally only a few parameters can be influenced to improve the link budget. Antennas at IoT nodes, due to size and cost constraints, can hardly offer any gain and, on the contrary, often introduce losses. Massive MIMO systems exhibit a large-antenna array gain and apply an adaptive channel-matched beamforming approach. They offer the opportunity to reduce the transmit power in constrained MTC nodes proportionally to the square root of the number of BS antennas, or even proportionally to the number of BS antennas itself if accurate CSI is acquired. This enables the simultaneous service of a large number of devices. This asset is important to keep up with the predicted evolution towards Massive MTC. A Massive MIMO-based LPWAN could also offer extended coverage and increased reliability, provided that a power-efficient solution for the pilot-based CSI acquisition is implemented. This challenge of developing Massive MIMO technology for MTC services is further discussed in Section VIII.
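The two scaling regimes for the transmit power of an MTC node can be expressed in dB with a short sketch. It assumes the power reduction is exactly proportional to the number of BS antennas (accurate CSI) or to its square root (otherwise), as stated above, and ignores pilot overhead. The function name and the 128-antenna example are illustrative assumptions.

```python
import math


def tx_power_reduction_db(num_bs_antennas, accurate_csi):
    """Potential terminal transmit power reduction (dB) from the BS array gain."""
    exponent = 1.0 if accurate_csi else 0.5
    return 10.0 * exponent * math.log10(num_bs_antennas)


# Hypothetical 128-antenna base station.
print(round(tx_power_reduction_db(128, accurate_csi=False), 2))  # 10.54 dB
print(round(tx_power_reduction_db(128, accurate_csi=True), 2))   # 21.07 dB
```

Because the transmit mode dominates the node's energy budget, even the more conservative square-root regime translates into a substantial gain in autonomy.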
VIII Demonstrations, Conclusions and Future Directions
VIII-A Signal Processing at Work in Massive MIMO Demonstrations
Demonstrations that have proven the superior spectral efficiency of Massive MIMO and the adequacy of DSP solutions in real-life testbeds are presented below. Furthermore, we summarize the conclusions of this paper and outline future research directions.
To prove a new wireless technology, it is very important to build testbeds to conduct verification and evaluate performance in real-life environments with over-the-air transmission. For Massive MIMO it is especially crucial, since performance is dependent on propagation characteristics, and measurement-based channel models themselves are still under development. Thanks to recent advances in Software-Defined Radio (SDR) technology, several Massive MIMO prototype systems have been built by both industry and academia, including the Argos testbed with 96 antennas [10], Eurecom’s 64-antenna testbed [64], Facebook’s ARIES project [65], the 100-antenna LuMaMi testbed from Lund University (Figure 20a) [63], SEU’s 128-antenna testbed [5], and testbeds exploring distributed arrays from KU Leuven (Figure 20b) [66] and the University of Bristol [67].
WorldRecord in Spectral Efficiency and Massive MIMO in Mobility
The signal processing techniques discussed in this paper, especially the crosslevel optimization methodology, have been exploited in the development of Massive MIMO testbeds to enable realtime processing of wideband signals for large numbers of antennas. For instance, the LuMaMi tested adopts the processing distribution scheme in Figure 3, where 50 SDRs with FieldProgrammable GateArrays (FPGAs) are used to perform perantenna processing in a parallel fashion. Four centralized FPGAs are responsible for persubcarrier processing, and the Peripheral Component Interconnect Express (PCIe) with direct memory access (DMA) channels handles the data shuffling. QRdecomposition based ZF processing has been implemented to fully leverage the available parallel processing resources in the FPGAs.
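To illustrate what QR-decomposition-based ZF processing computes on one subcarrier, the following floating-point sketch factors the channel matrix as H = QR and recovers the user symbols by back-substitution. It is a minimal sketch under stated assumptions, not the testbed's FPGA implementation: the function names, the use of modified Gram-Schmidt, and the small noiseless 4 × 2 example channel are all chosen here for illustration.

```python
def qr_gram_schmidt(H):
    """Thin QR of an M x K complex matrix (list of rows), modified Gram-Schmidt.

    Returns Q as a list of K orthonormal columns and R as a K x K
    upper-triangular matrix such that H = Q R.
    """
    M, K = len(H), len(H[0])
    cols = [[H[m][k] for m in range(M)] for k in range(K)]
    Q = []
    R = [[0j] * K for _ in range(K)]
    for k in range(K):
        v = cols[k][:]
        for i in range(k):
            # Projection coefficient of the residual onto earlier column i.
            R[i][k] = sum(q.conjugate() * x for q, x in zip(Q[i], v))
            v = [x - R[i][k] * q for x, q in zip(v, Q[i])]
        norm = sum(abs(x) ** 2 for x in v) ** 0.5
        R[k][k] = norm
        Q.append([x / norm for x in v])
    return Q, R

def zf_detect(H, y):
    """Zero-forcing estimate: solve R x = Q^H y by back-substitution."""
    Q, R = qr_gram_schmidt(H)
    K = len(R)
    z = [sum(q.conjugate() * s for q, s in zip(Q[k], y)) for k in range(K)]
    x = [0j] * K
    for k in reversed(range(K)):
        acc = z[k] - sum(R[k][j] * x[j] for j in range(k + 1, K))
        x[k] = acc / R[k][k]
    return x

# Hypothetical 4-antenna, 2-user channel and noiseless received vector.
H = [[1 + 1j, 0.5 - 0.2j],
     [0.3 - 1j, 1 + 0.4j],
     [-0.7 + 0.2j, 0.1 + 1j],
     [0.2 + 0.6j, -1 - 0.3j]]
x_true = [1 + 1j, -1 + 1j]
y = [sum(h * x for h, x in zip(row, x_true)) for row in H]
print(zf_detect(H, y))
```

The factorization depends only on the channel, so in a system of this kind it can be reused for all symbols within a coherence interval, while the back-substitution is repeated per received vector.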
Diverse field trials, both indoors and outdoors with static and mobile users, have been conducted using the Massive MIMO testbeds. In a 2016 experiment, a 128-antenna Massive MIMO base station served 22 users, each transmitting with 256-QAM modulation, on the same time-frequency resource [67]. The spectral efficiency benefits from the spatial multiplexing as well as from the high constellation order, enabled by the array gain. In practice, protocol overhead and FEC redundancy determine the actual net spectral efficiency. In this demonstration, a spectral efficiency of 145.6 bits/s/Hz was achieved on a 20 MHz radio channel, representing an increase by a large factor with respect to the current 4G air interface. The performance was achieved in an environment without mobility and multi-cell interference, which would constitute the limiting factors for performance in a practical deployment.
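The relation between the reported figure and the raw multiplexing numbers can be checked with simple arithmetic. In the sketch below, the "gross" value is an idealization assumed here (one constellation symbol per user per second per hertz, before any overhead); only the 22 users, 256-QAM, 145.6 bits/s/Hz and 20 MHz are taken from the text.

```python
import math

def gross_spectral_efficiency(num_users: int, constellation_size: int) -> float:
    """Idealized bits/s/Hz: every user sends one symbol per second per hertz."""
    return num_users * math.log2(constellation_size)

gross = gross_spectral_efficiency(22, 256)   # 22 users x 8 bits per symbol
net = 145.6                                  # reported net value, bits/s/Hz
net_fraction = net / gross                   # share left after overhead
throughput_bps = net * 20e6                  # aggregate rate on a 20 MHz channel

print(f"gross: {gross:.1f} bits/s/Hz")
print(f"net/gross: {net_fraction:.3f}")
print(f"aggregate rate: {throughput_bps / 1e9:.3f} Gbit/s")
```

Under this idealization, roughly 83% of the gross value remains as net spectral efficiency, and the aggregate rate over the 20 MHz channel is close to 2.9 Gbit/s.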
The same research group also demonstrated Massive MIMO operation in an outdoor scenario with moderate mobility [4]. Figure 21 shows the measurement scenario, where the 100-antenna LuMaMi testbed is placed on the rooftop of a building facing a parking lot. Ten single-antenna users are served in real time at 3.7 GHz, including six users moving at pedestrian speed and four terminals mounted on moving vehicles. The spatial multiplexing was fully achieved and the communication quality was on average well maintained for all terminals [68]. Sporadic interruptions could be traced back to temporary loss of synchronization. It should be noted that both the speed of the cars and the number of terminals could be larger in a real deployment; in the proof of concept they were limited by the available test space and equipment. In fact, at 3.7 GHz carrier frequency and with a slot length of 0.5 ms, the maximum permitted mobility (assuming a two-ray model with Nyquist sampling, and a factor-of-two design margin, as in [3]) is over 140 km/h [69].
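The mobility bound quoted above can be reproduced with a short calculation. The sketch below assumes the following reading of the two-ray model with Nyquist sampling: the channel has to be re-estimated before the terminal has moved half a wavelength, and the design margin divides the allowed speed. This interpretation and the function name are assumptions made here; they are consistent with the stated figure of over 140 km/h.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def max_speed_kmh(carrier_hz: float, slot_s: float, margin: float = 2.0) -> float:
    """Maximum terminal speed for which one slot fits in a coherence interval."""
    wavelength = SPEED_OF_LIGHT / carrier_hz
    # Assumed two-ray/Nyquist criterion: at most half a wavelength per slot.
    coherence_distance = wavelength / 2.0
    speed_ms = coherence_distance / (margin * slot_s)
    return speed_ms * 3.6  # m/s to km/h

print(f"{max_speed_kmh(3.7e9, 0.5e-3):.1f} km/h")
```

With a 3.7 GHz carrier and a 0.5 ms slot, this evaluates to roughly 146 km/h.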
Further Investigation Needed for Synchronization
A critical challenge requiring further investigation is the initial synchronization between the base station and the user terminals. This initial synchronization has to start without any knowledge of the channels, and therefore cannot benefit from an array gain. How to efficiently perform initial time and frequency synchronization without the massive array gain, and how to exploit the (partial) array gain to provide faster and more robust synchronization, are still open questions. Two methods were studied during the LuMaMi testbed experiments. One method is to reserve a dedicated RF chain for the synchronization signal, which is transmitted using an omnidirectional antenna. In this case, a higher-power PA (which is not available in LuMaMi) is needed to provide coverage. Another method is to use beam sweeping for the synchronization signal [70], but this is inefficient, as it is essentially equivalent to repetition coding, and there is also a risk of synchronization loss when the users are not hit by a beam. Improved techniques, based on space-time block codes, have been investigated [71, 72, 73]. Iterative search and tracking methods [74] may have potential, especially for mobile users.
VIII-B Concluding on the Signal Processing
Appropriate co-design of algorithms, hardware architectures, and circuits in Massive MIMO implementations brings significant benefits:

Energy-efficient implementations of “theoretically optimal” Massive MIMO DSP architectures are non-trivial but possible. We have detailed some of the most important innovations required, and explained their analysis. The power consumption of conventional macro base stations is dominated by the PA stage. In Massive MIMO, they benefit from the ability to operate with an order of magnitude less transmit power.

The sufficiency of low-precision quantization and processing, predicted by information-theoretic studies, has now also been validated through real signal processing experiments. A reduction in word length of up to 6 times compared to conventional systems translates into corresponding savings in complexity, power consumption and memory.

Dedicated and scalable hardware architectures implementing tailored algorithms for large matrix processing facilitate zero-forcing precoding at the base station in real time, at 30 mW power consumption in relevant scenarios for a 128 × 8 system.

Voltage over-scaling, a speculative concept just 5 years ago, has found appropriate application in the Massive MIMO per-antenna processing.

Smart control of algorithmic modes and scalable devices, including body bias adaptation, can guarantee suitable performance-power trade-offs over a wide range of communication scenarios and channel propagation conditions.

Lean terminals could operate in typical broadband cellular Massive MIMO networks at a fraction of the power consumption of equivalent conventional terminals, both in data transmission and reception.

The efficiency of Massive MIMO base stations can be further improved by relaxing the requirements of the RF and analogue hardware. However, caution is needed as (nonlinear) distortion may under specific conditions combine coherently.
VIII-C Future Directions
Progressing Massive MIMO Deployment in Actual Networks
Integrating all components for deployment in actual networks represents a vast design and development effort that will include:

Overcoming challenges related to the connection of the many antenna paths to the central processing units. This involves implementing high-speed interconnects and coping with potential coupling effects in the front-end modules.

Devising efficient schedulers for large numbers of users. Achieving the high spatial multiplexing gains offered by Massive MIMO fundamentally requires that many terminals are scheduled for service simultaneously. Tuning or redesign of higher-layer protocols could be beneficial to shape the traffic patterns, such that aggressive spatial multiplexing can be performed.

Designing antenna arrays. Massive MIMO arrays do not have to be linear, rectangular or cylindrical. Small antenna elements could be naturally integrated into the environment, onto the surface of existing structures, or faces of buildings, for example, in an aesthetically pleasing manner.
Insights from electromagnetics may guide the design of new types of arrays. Specifically, for a given volume, consider the smallest possible sphere that contains it. If one covers the surface of this sphere with antennas at a sufficient density, then there is no point in installing any additional elements in the interior of the sphere [75]. Sampling the surface on a sufficiently fine grid captures all information in the radiated field. In conclusion, what goes into the interior of the sphere is unimportant; only the surface matters.
Industrial recognition of the value of Massive MIMO technology is evidenced by the large number of contributions on the topic in the 3GPP-LTE standardization of New Radio (NR) for 5G systems. Leading operators have already started to perform commercial field trials of the technology [69].
Enhanced Functionalities
Large antenna arrays can also be used to perform accurate positioning and localization. This feature can offer improved context-awareness to services. The Massive MIMO communication system itself could also exploit this information, for example to perform smart pilot allocation.
Scale Up Capacity and Efficiency
The call for more and higher-quality wireless services is expected to increase for many years, and the quest for wireless systems offering higher spectral and energy efficiency will continue. Higher peak rates can be offered in Massive MIMO by performing spatial multiplexing of several streams to one terminal. Actual gains may be limited due to insufficient rank of the channel, yet for two streams this will mostly be achievable with co-located antennas exploiting cross-polarization.
Wider bandwidth channels can be allocated especially in mmWave bands. Radio propagation and in particular absorption is considerably different at these frequencies. Arrays with a large number of antennas can be small in size, yet their effective gain may suffer from high losses on the interconnect. Consequently, Massive MIMO systems in these bands call for other architectures and their deployment will best suit particular use cases, for example hotspots.
With larger antenna arrays, both better spatial multiplexing and array gains can be achieved. The new concepts of cell-free Massive MIMO [76] and intelligent surfaces [77] take this trend to the next level. With cell-free Massive MIMO, coherently cooperating antennas are spread out over a larger geographical area, providing improved macro-diversity and improved channel rank for multiple-antenna terminals. The intelligent surface concept envisages distributed nodes that form electromagnetically active walls, floors, and planar objects. New research is urgently needed to bring these new concepts to their full potential.
IX Acknowledgment
The authors thank their colleagues and collaborators in the European FP7-MAMMOET project for the nice cooperation that truly progressed Massive MIMO technology.
Liesbet Van der Perre is Professor at the Department of Electrical Engineering at the KU Leuven in Leuven, Belgium and a guest Professor at the Electrical and Information Technology Department at Lund University, Sweden. Dr. Van der Perre was with the nano-electronics research institute imec in Belgium from 1997 till 2015, where she took up responsibilities as senior researcher, system architect, project leader and program director. She was appointed honorary doctor at Lund University, Sweden, in 2015. She was a part-time Professor at the University of Antwerp, Belgium, from 1998 till 2002. She received her Ph.D. degree from the KU Leuven, Belgium, in 1997.
Her main research interests are in wireless communication, with a focus on physical layer and energy efficiency in transmission, implementation, and operation. Prof. L. Van der Perre was the scientific leader of FP7-MAMMOET, Europe’s prime project on Massive MIMO technology. Dr. Van der Perre has been serving as a scientific and technological advisor, reviewer and jury member for companies, institutes, and funding agencies. She has been a member of the Board of Directors of the company Zenitel since 2015. Liesbet Van der Perre is an author and co-author of over 300 scientific publications. She was a system architect for the OFDM ASICs listed in the IEEE International Solid-State Circuits Conference (ISSCC) Best of 50 Years papers in 2003. She co-authored the paper winning the DAC/ISSCC 2006 design contest.
Liang Liu is an Associate Professor at Electrical and Information Technology Department, Lund University, Sweden. He received his Ph.D. in 2010 from Fudan University in China. In 2010, he was with Electrical, Computer and Systems Engineering Department, Rensselaer Polytechnic Institute, USA as a visiting researcher. He joined Lund University as a Postdoc researcher in 2010 and is now associate professor there. His research interest includes signal processing for wireless communication and digital integrated circuits design. Liang is active in several EU and Swedish national projects, including FP7 MAMMOET, VINNOVA SoS, and SSF HiPEC, DARE. He is a board member of the IEEE Swedish SolidState Circuits/Circuits and Systems chapter. He is also a member of the technical committees of VLSI systems and applications and CAS for communications of the IEEE Circuit and Systems society.
Erik G. Larsson received the Ph.D. degree from Uppsala University, Uppsala, Sweden, in 2002.
He is currently Professor of Communication Systems at Linköping University (LiU) in Linköping, Sweden. He was with the KTH Royal Institute of Technology in Stockholm, Sweden, the George Washington University, USA, the University of Florida, USA, and Ericsson Research, Sweden. In 2015 he was a Visiting Fellow at Princeton University, USA, for four months. His main professional interests are within the areas of wireless communications and signal processing. He has co-authored some 150 journal papers on these topics, and he is co-author of the two Cambridge University Press textbooks Space-Time Block Coding for Wireless Communications (2003) and Fundamentals of Massive MIMO (2016). He is co-inventor on 18 issued and many pending patents on wireless technology.
He is a member of the IEEE Signal Processing Society Awards Board during 2017–2019. He is an editorial board member of the IEEE Signal Processing Magazine during 2018–2020. From 2015 to 2016 he served as chair of the IEEE Signal Processing Society SPCOM technical committee. From 2014 to 2015 he was chair of the steering committee for the IEEE Wireless Communications Letters. He was the General Chair of the Asilomar Conference on Signals, Systems and Computers in 2015, and its Technical Chair in 2012. He was Associate Editor for, among others, the IEEE Transactions on Communications (2010–2014) and the IEEE Transactions on Signal Processing (2006–2010).
He received the IEEE Signal Processing Magazine Best Column Award twice, in 2012 and 2014, the IEEE ComSoc Stephen O. Rice Prize in Communications Theory in 2015, the IEEE ComSoc Leonard G. Abraham Prize in 2017, and the IEEE ComSoc Best Tutorial Paper Award in 2018. He is a Fellow of the IEEE.
Footnotes
 Power control does not impact reciprocity, and it will show up as a scalar multiplication on the individual terminal signals.
 A similar reduction in hardware complexity could be achieved for UE radios custom-designed to operate in Massive MIMO networks specifically. Backward compatibility with previous broadband systems may require the presence of MIMO detection hardware in broadband UEs in practice.
 The module used to perform the measurements has a current limited to 45 mA.
References
 “First 5G NR specs approved, Dec. 2017,” http://www.3gpp.org/newsevents/3gppnews/1929nsa_nr_5g.
 E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Comm. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.
 T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo, Fundamentals of Massive MIMO. Cambridge University Press, 2016.
 P. Harris, S. Malkowsky, J. Vieira, E. Bengtsson, F. Tufvesson, W. Hasan, L. Liu, M. Beach, S. Armour, and O. Edfors, “Performance characterization of a real-time massive MIMO system with LOS mobile channels,” IEEE Journal on Sel. Areas in Comm., vol. 35, pp. 1244–1253, Jun. 2017.
 X. Yang, W. Lu, N. Wang, K. Nieman, C. K. Wen, C. Zhang, S. Jin, X. Mu, I. Wong, Y. Huang, and X. You, “Design and implementation of a TDD-based 128-antenna massive MIMO prototype system,” China Communications, vol. 14, no. 12, pp. 162–187, Dec. 2017.
 A. F. Molisch, V. V. Ratnam, S. Han, Z. Li, S. L. H. Nguyen, L. Li, and K. Haneda, “Hybrid beamforming for massive MIMO: A survey,” IEEE Comm. Mag., vol. 55, no. 9, pp. 134–141, 2017.
 E. Björnson, J. Hoydis, L. Sanguinetti et al., “Massive MIMO networks: Spectral, energy, and hardware efficiency,” Foundations and Trends® in Signal Processing, vol. 11, pp. 154–655, 2017.
 X. Li, E. Björnson, E. G. Larsson, S. Zhou, and J. Wang, “Massive MIMO with multicell MMSE processing: exploiting all pilots for interference suppression,” EURASIP Journal on Wireless Comm. and Netw., vol. 2017, no. 1, p. 117, 2017.
 S. Malkowsky, J. Vieira, K. Nieman, N. Kundargi, I. Wong, V. Owall, O. Edfors, F. Tufvesson, and L. Liu, “Implementation of low-latency signal processing and data shuffling for TDD massive MIMO systems,” in Proc. of IEEE International Workshop on Signal Processing Systems (SiPS), pp. 260–265, Dec. 2016.
 C. Shepard, H. Yu, N. Anand, L. E. Li, T. L. Marzetta, R. Yang, and L. Zhong, “Argos: Practical many-antenna base stations,” in Proc. of ACM Int. Conf. on Mobile Computing and Networking (MobiCom), Istanbul, Turkey, Aug. 2012.
 A. Puglielli, A. Townley, G. LaCaille, V. Milovanovic, P. Lu, K. Trotskovsky, A. Whitcombe, N. Narevsky, G. Wright, T. A. Courtade, E. Alon, B. Nikolic, and A. M. Niknejad, “Design of energy and costefficient massive MIMO arrays,” Proceedings of the IEEE, vol. 104, no. 3, pp. 586–606, 2016.
 K. Li, R. R. Sharan, Y. Chen, T. Goldstein, J. R. Cavallaro, and C. Studer, “Decentralized baseband processing for Massive MU-MIMO systems,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 7, no. 4, pp. 491–507, Dec. 2017.
 Q. Yang, X. Li, H. Yao, J. Fang, K. Tan, W. Hu, J. Zhang, and Y. Zhang, “Bigstation: enabling scalable real-time signal processing in large MU-MIMO systems,” in Proc. of ACM SIGCOMM, pp. 399–410, Aug. 2013.
 E. Bertilsson, O. Gustafsson, and E. G. Larsson, “A scalable architecture for massive MIMO base stations using distributed processing,” in Proc. of Asilomar Conference on Signals, Systems and Computers, pp. 864–868, Nov. 2016.
 G. Auer, V. Giannini, C. Desset, I. Godor, P. Skillermark, M. Olsson, M. A. Imran, D. Sabella, M. J. Gonzalez, O. Blume, and A. Fehske, “How much energy is needed to run a wireless network?” IEEE Wireless Comm., vol. 18, no. 5, pp. 40–49, Oct. 2011.
 C. Mollén, U. Gustavsson, T. Eriksson, and E. G. Larsson, “Impact of spatial filtering on distortion from lownoise amplifiers in massive MIMO base stations,” IEEE Trans. Comm., 2018. To appear.
 Y. Li, J. Lopez, P. H. Wu, W. Hu, R. Wu, and D. Y. C. Lie, “A SiGe envelopetracking Power Amplifier with an integrated CMOS envelope modulator for mobile WiMAX/3GPP LTE transmitters,” IEEE Trans. on Microwave Theory and Techniques, vol. 59, no. 10, pp. 2525–2536, Oct. 2011.
 P. Horlin and A. Bourdoux, Digital Compensation for Analog FrontEnds: A New Approach to Wireless Transceiver Design. Wiley, 2008.
 E. G. Larsson and L. Van der Perre, “Outofband radiation from antenna arrays clarified,” IEEE Wireless Comm. Lett., 2018. To appear.
 C. Mollén, U. Gustavsson, T. Eriksson, and E. G. Larsson, “Spatial characteristics of distortion radiated from antenna arrays with transceiver nonlinearities,” CoRR, vol. abs/1711.02439, 2017. [Online]. Available: http://arxiv.org/abs/1711.02439
 M. Pelgrom, AnalogtoDigital Conversion. Springer, 2017.
 C. Mollén, J. Choi, E. G. Larsson, and R. W. Heath Jr., “Uplink performance of wideband massive MIMO with one-bit ADCs,” IEEE Trans. Wireless Comm., vol. 16, pp. 87–100, Jan. 2017.
 C. Mollén, J. Choi, E. G. Larsson, and R. W. Heath Jr., “Achievable uplink rates for massive MIMO with coarse quantization,” in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar. 2017.
 Y. Li, C. Tao, G. SecoGranados, A. Mezghani, A. L. Swindlehurst, and L. Liu, “Channel estimation and performance analysis of onebit massive MIMO systems,” IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4075–4089, Aug. 2017.
 L. Fan, S. Jin, C.K. Wen, and H. Zhang, “Uplink achievable rate for massive MIMO with lowresolution ADC,” IEEE Commun. Lett., vol. 19, no. 12, pp. 2186–2189, Dec. 2015.
 J. Zhang, L. Dai, S. Sun, and Z. Wang, “On the spectral efficiency of massive MIMO systems with lowresolution ADCs,” IEEE Commun. Lett., vol. 20, no. 5, pp. 842–845, May 2016.
 G. Van der Plas and B. Verbruggen, “A 150 MS/s 133 μW 7 bit ADC in 90 nm Digital CMOS,” IEEE Journal of Solid-State Circuits, vol. 43, no. 12, pp. 2631–2640, Dec. 2008.
 K. Choo, J. Bell, and M. Flynn, “Areaefficient 1GS/s 6b SAR ADC with chargeinjectioncellbased DAC,” in Proc. of IEEE International SolidState Circuits Conference (ISSCC), Feb. 2016.
 Y. Li, C. Tao, A. L. Swindlehurst, A. Mezghani, and L. Liu, “Downlink achievable rate analysis in massive MIMO systems with onebit DACs,” IEEE Commun. Lett., vol. 21, no. 7, pp. 1669–1672, Jul. 2017.
 S. Jacobsson, G. Durisi, M. Coldrey, T. Goldstein, and C. Studer, “Quantized precoding for massive MUMIMO,” IEEE Trans. Comm., vol. 65, no. 11, pp. 4670–4684, Nov. 2017.
 C. Desset and L. Van der Perre, “Validation of lowaccuracy quantization in massive MIMO and constellation EVM analysis,” in Proc. of European Conference on Networks and Communications (EuCNC), Jun. 2015.
 S. Jacobsson, G. Durisi, M. Coldrey, and C. Studer, “On outofband emissions of quantized precoding in massive MUMIMOOFDM,” in Proc. of Asilomar Conference on Signals, Systems, and Computers, Oct. 2017.
 MAMMOET project deliverable D2.4. [Online]. Available: https://mammoetproject.eu/downloads/publications/deliverables/MAMMOETD2.4AnalysisnonreciprocityimpactPUM20.pdf
 A. Bourdoux, B. Come, and N. Khaled, “Nonreciprocal transceivers in OFDM/SDMA systems: impact and mitigation,” in Proc. of Radio and Wireless Conference (RAWCON), pp. 183–186, Aug. 2003.
 J. Vieira, F. Rusek, O. Edfors, S. Malkowsky, L. Liu, and F. Tufvesson, “Reciprocity calibration for massive MIMO: Proposal, modeling and validation,” IEEE Trans. on Wireless Comm., vol. 16, no. 5, pp. 3042–3056, Mar. 2017.
 D. Zhu, B. Li, and P. Liang, “On the matrix inversion approximation based on Neumann series in massive MIMO systems,” in Proc. of IEEE International Conference on Communications (ICC), pp. 1763–1769, Jun. 2015.
 A. Mueller, A. Kammoun, E. Björnson, and M. Debbah, “Linear precoding based on polynomial expansion: Reducing complexity in massive MIMO,” EURASIP Journal on Wireless Comm. and Netw., Dec. 2016.
 H. Prabhu, O. Edfors, J. Rodrigues, L. Liu, and F. Rusek, “Hardware efficient approximative matrix inversion for linear precoding in massive MIMO,” in Proc. of IEEE International Symposium on Circuits and Systems (ISCAS), pp. 260–265, Jun. 2014.
 M. Wu, B. Yin, G. Wang, C. Dick, J. Cavallaro, and C. Studer, “Largescale MIMO detection for 3GPP LTE: Algorithms and FPGA implementations,” IEEE Journal of Sel. Topics in Sign. Proc., vol. 8, no. 5, pp. 916–929, Oct. 2014.
 K. Lee and C. Chen, “An eigenbased approach for enhancing matrix inversion approximation in massive MIMO systems,” IEEE Trans. Veh. Techn., vol. 66, no. 6, pp. 5483–5487, Jun. 2017.
 B. Nagy, M. Elsabrouty, and S. Elramly, “Fast converging weighted Neumann series precoding for massive MIMO systems,” IEEE Wireless Comm. Lett., Oct. 2017.
 B. Yin, M. Wu, J. R. Cavallaro, and C. Studer, “Conjugate gradientbased softoutput detection and precoding in massive MIMO systems,” in Proc. of IEEE Global Communications Conference (GLOBECOM), pp. 3696–3701, Dec. 2014.
 M. Wu, C. Dick, J. Cavallaro, and C. Studer, “Highthroughput data detection for massive MUMIMOOFDM using coordinate descent,” IEEE Trans. Circ. and Syst. I: Regular Papers, vol. 63, no. 12, pp. 2357–2367, Dec. 2016.
 X. Gao, L. Dai, J. Zhang, S. Han, and I. ChihLin, “Capacityapproaching linear precoding with lowcomplexity for largescale MIMO systems,” in Proc. of IEEE International Conference on Communications (ICC), pp. 1577–1582, Jun. 2015.
 H. Prabhu, J. Rodrigues, L. Liu, and O. Edfors, “A 60 pJ/b 300 Mb/s 128 × 8 massive MIMO precoder-detector in 28 nm FD-SOI,” in Proc. of IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2017.
 H. A. J. Alshamary and W. Xu, “Efficient optimal joint channel estimation and data detection for massive MIMO systems,” in Proc. of IEEE International Symposium on Information Theory (ISIT), pp. 875–879, Jul. 2016.
 R. Gangarajaiah, H. Prabhu, O. Edfors, and L. Liu, “A Cholesky decomposition based massive MIMO uplink detector with adaptive interpolation,” in Proc. of IEEE International Symposium on Circuits and Systems (ISCAS), May 2017.
 M. Cirkic and E. Larsson, “On the complexity of very large multiuser MIMO detection,” in Proc. of IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 55–59, Jun. 2014.
 T. Wei, H. Prabhu, L. Liu, Ö. Viktor, and Z. Zhengya, “A 1.8Gb/s 70.6pJ/b 128×16 link-adaptive near-optimal massive MIMO detector in 28nm UTBB-FDSOI,” in Proc. of IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2018.
 N. Planes, O. Weber, V. Barral, S. Haendler, D. Noblet, D. Croain, M. Bocat, P. Sassoulas, X. Federspiel, A. Cros, and A. Bajolet, “28nm FDSOI technology platform for highspeed lowvoltage digital applications,” in Proc. of IEEE Symposium on VLSI Technology (VLSIT), pp. 133–134, Jun. 2012.
 Y. Huang, C. Desset, A. Bourdoux, W. Dehaene, and L. Van der Perre, “Massive MIMO processing at the semiconductor edge: Exploiting the system and circuit margins for power savings,” in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3474–3478, Mar. 2017.
 S. Gunnarsson, M. Bortas, Y. Huang, C.M. Chen, L. Van der Perre, and O. Edfors, “Lousy processing increases energy efficiency in massive MIMO systems,” in Proc. of European Conference on Networks and Communications (EuCNC), Jun. 2017.
 Y. Huang, M. Li, C. Li, P. Debacker, and L. Van der Perre, “Computationskip error mitigation scheme for power supply voltage scaling in recursive applications,” Journal of Signal Processing Systems, vol. 84, no. 3, pp. 413–424, Sep. 2016.
 J. Han and M. Orshansky, “Approximate computing: An emerging paradigm for energyefficient design,” in Proc. of European Test Symposium (ETS), May 2013.
 S. K. Saha, “Modeling process variability in scaled CMOS technology,” IEEE Design Test of Computers, vol. 27, no. 2, pp. 8–16, Mar. 2010.
 M. Fojtik, D. Fick, Y. Kim, N. Pinckney, D. Harris, D. Blaauw, and D. Sylvester, “Bubble Razor: An architectureindependent approach to timingerror detection and correction,” in Proc. of IEEE International SolidState Circuits Conference (ISSCC), pp. 488–490, Feb. 2012.
 Y. Liu, T. Zhang, and K. K. Parhi, “Computation error analysis in digital signal processing systems with overscaled supply voltage,” IEEE Trans. on Very Large Scale Integr. Syst., vol. 18, pp. 517–526, Apr. 2010.
 ITU vision on 5G usage scenarios. [Online]. Available: https://www.itu.int/dms_pubrec/itur/rec/m/RRECM.20830201509I!!PDFE.pdf
 P. Vandenameele, L. Van der Perre, M. G. E. Engels, B. Gyselinckx, and H. J. D. Man, “A combined OFDM/SDMA approach,” IEEE Journal on Sel. Areas in Comm., vol. 18, no. 11, pp. 2312–2321, Nov. 2000.
 Y. Shingo and Y. Miyanaga, “VLSI implementation of a 4x4 MIMOOFDM transceiver with an 80MHz channel bandwidth,” in Proc. of IEEE International Symposium on Circuits and Systems, pp. 1743–1746, May 2009.
 J. Ketonen, M. Juntti, and J. R. Cavallaro, “Performancecomplexity comparison of receivers for a LTE MIMOOFDM system,” IEEE Trans. on Sign. Proc., vol. 58, no. 6, pp. 3360–3372, Jun. 2010.
 Connecting sensors with low power wireless technologies. [Online]. Available: https://dramco.be/tutorials/lowpowerlora/
 S. Malkowsky, J. Vieira, L. Liu, P. Harris, K. Nieman, N. Kundargi, I. Wong, F. Tufvesson, V. Owall, and O. Edfors, “The world’s first real-time testbed for massive MIMO: Design, implementation, and validation,” IEEE Access, vol. 5, pp. 9073–9088, May 2017.
 “Openairinterface Massive MIMO testbed : A 5G innovation platform,” http://www.openairinterface.org/.
 “Introducing facebook’s new terrestrial connectivity systems – terragraph and project ARIES,” https://code.facebook.com/posts/1072680049445290/introducingfacebooksnewterrestrialconnectivitysystemsterragraphandprojectaries/.
 C. M. Chen, V. Volskiy, A. Chiumento, L. Van der Perre, G. A. E. Vandenbosch, and S. Pollin, “Exploration of user separation capabilities by distributed large antenna arrays,” in Proc. of IEEE Global Communications Conference (GLOBECOM) Workshops, Dec. 2016.
 P. Harris, W. Hasan, S. Malkowsky, J. Vieira, S. Zhang, M. Beach, L. Liu, E. Mellios, A. Nix, S. Armour, and A. Doufexi, “Serving 22 users in real-time with a 128-antenna massive MIMO testbed,” in Proc. of IEEE International Workshop on Signal Processing Systems (SiPS), pp. 266–272, Oct. 2016.
 MAMMOET project deliverable D4.2. [Online]. Available: https://mammoetproject.eu/downloads/publications/deliverables/MAMMOETD4.2TestbedassessmentPUM33.pdf
 “Massive MIMO in mobile environments,” in the Massive MIMO blog, http://massivemimo.net.
 C. Barati, S. Hosseini, S. Rangan, P. Liu, T. Korakis, S. Panwar, and T. Rappaport, “Directional cell discovery in millimeter wave cellular networks,” IEEE Trans. on Wireless Comm., vol. 14, no. 2, pp. 6664–6678, Dec. 2015.
 M. Karlsson, E. Björnson, and E. G. Larsson, “Performance of inband transmission of system information in massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 17, pp. 1700–1712, Mar. 2018.
 X. G. Xia and X. Gao, “A spacetime code design for omnidirectional transmission in massive MIMO systems,” IEEE Wireless Comm. Lett., vol. 5, no. 5, pp. 512–515, Oct. 2016.
 X. Meng, X. Gao, and X. G. Xia, “Omnidirectional precoding based transmission in massive MIMO systems,” IEEE Trans. on Comm., vol. 64, no. 1, pp. 174–186, Jan. 2016.
 G. Marco, M. Mezzavilla, and M. Zorzi, “Initial access in 5G mmwave cellular networks,” IEEE Comm. Mag., vol. 54, no. 11, pp. 40–47, Nov. 2016.
 M. Franceschetti, “On Landau’s eigenvalue theorem and information cutsets,” IEEE Trans. Inf. Theory, vol. 61, no. 9, pp. 5042–5051, Sep. 2015.
 H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO versus small cells,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, Mar. 2017.
 S. Hu, F. Rusek, and O. Edfors, “Beyond massive MIMO: The potential of data transmission with large intelligent surfaces,” IEEE Trans. on Sign. Proc., vol. 66, no. 10, pp. 2746–2758, May 2018.