Massive MIMO Systems: Signal Processing Challenges and Research Trends
Abstract
This article presents a tutorial on multiuser multiple-antenna wireless systems with a very large number of antennas, known as massive multi-input multi-output (MIMO) systems. Signal processing challenges and future trends in the area of massive MIMO systems are presented and key application scenarios are detailed. A linear algebra approach is considered for the description of the system and data models of massive MIMO architectures. The operational requirements of massive MIMO systems are discussed along with their operation in time-division duplexing mode, resource allocation and calibration requirements. In particular, transmit and receive processing algorithms are examined in light of the specific needs of massive MIMO systems. Simulation results illustrate the performance of transmit and receive processing algorithms under scenarios of interest. Key problems are discussed and future trends in the area of massive MIMO systems are pointed out.
I Introduction
Wireless networks are experiencing a very substantial increase in the amount of data delivered due to a number of emerging applications that include machine-to-machine communications and video streaming [1, 2, 3]. This growth in data exchange is expected to continue over the next decade or so, presenting a very significant challenge to designers of wireless communications systems. This constitutes a major problem, not only in terms of exploitation of available spectrum resources, but also regarding the energy efficiency in the transmission and processing of each information unit (bit), which has to improve substantially. The Wireless Internet of the Future (WIoF) will therefore have to rely on technologies that can offer a substantial increase in transmission capacity as measured in bits/Hz but do not require increased spectrum bandwidth or energy consumption.
Multiple-antenna or multi-input multi-output (MIMO) wireless communication devices that employ antenna arrays with a very large number of antenna elements, known as massive MIMO systems, have the potential to overcome those challenges and deliver the required data rates, representing a key enabling technology for the WIoF [4]-[6]. Among the devices of massive MIMO networks are user terminals, tablets, and base stations, which could be equipped with a number of antenna elements that is orders of magnitude higher than in current devices. Massive MIMO networks will be structured by the following key elements: antennas, electronic components, network architectures, protocols and signal processing.
The first important ingredient of massive MIMO networks is antenna technology, which allows designers to assemble large antenna arrays with various requirements in terms of spacing of elements and geometries, reducing the number of required radio frequency (RF) chains at the transmit and the receive ends and their implementation costs [7, 8, 9]. In certain scenarios and deployments, the use of compact antennas with closely-spaced elements will be of great importance to equip devices with a large number of antennas, but this will require techniques to mitigate the coupling effects, especially at the user terminals [10]. The second key area for innovation is that of electronic components and RF chains, where the use of low-cost amplifiers with output power in the milliwatt range will play an important role. Architectures such as the direct-conversion radio (DCR) [11] are very attractive due to their flexibility and ability to operate with several different air interfaces, frequency bands and waveforms. Existing peripherals such as large coaxial cables and power-hungry circuits will have to be replaced with low-energy solutions.
Another key element of massive MIMO networks is the network architecture, which will evolve from homogeneous cellular layouts to heterogeneous architectures that include small cells and the use of coordination between cells [12]. Since massive MIMO technology is likely to be incorporated into cellular and local area networks in the future, the network architecture will require special attention to how the resulting interference is managed [13], and measurement campaigns will be of fundamental importance [14]-[16]. The coordination of adjacent cells will be necessary due to the current trend towards aggressive reuse factors for capacity reasons, which inevitably leads to increased levels of inter-cell interference and signalling. The need to accommodate multiple users while keeping the interference at an acceptable level will require significant work in scheduling and medium-access protocols.
The last ingredient of massive MIMO networks and the main focus of this article is signal processing. In particular, MIMO signal processing will play a crucial role in dealing with the impairments of the physical medium and in providing cost-effective tools for processing information. The current state of the art in MIMO signal processing requires a computational cost for transmit and receive processing that grows as a cubic or super-cubic function of the number of antennas, which is clearly not scalable to a large number of antenna elements. We advocate the need for simpler solutions for both transmit and receive processing tasks, which will require significant research effort in the coming years. Novel signal processing strategies will have to be developed to deal with the problems associated with massive MIMO networks, such as computational complexity and its scalability, pilot contamination effects, RF impairments, coupling effects, delay and calibration issues. Another key point for future massive MIMO technology is the application scenarios, which will become the main object of investigation in the coming years. Amongst the most important scenarios are multibeam satellite networks, cellular systems beyond LTE-A [2] and local area networks.
This article is structured as follows. Section II reviews the system model including both uplink and downlink and discusses the application scenarios. Section III is dedicated to transmit processing techniques, whereas Section IV concentrates on receive processing. Section V discusses the results of some simulations and Section VI presents some open problems and suggestions for further work. The conclusions of this article are given in Section VII.
II Application Scenarios and Signal Models
In this section, we discuss several application scenarios for multiuser massive MIMO systems which include multibeam satellite systems, cellular and local area networks. Signal models based on elementary linear algebra are then presented to describe the information processing in both uplink and downlink transmissions. These models are based on the assumption of a narrowband signal transmission over flat fading channels which can be easily generalized to broadband signal transmission with the use of multicarrier systems.
II-A Application Scenarios
Amongst the most promising application scenarios of multiuser massive MIMO techniques are multibeam satellite [17], cellular and local area networks. Multibeam satellite systems are perhaps the most natural scenario for massive MIMO because the number of antenna elements is above one hundred. The major benefit of satellite communications is that all users can be served within the coverage region at the same cost. In this context, the next generation of broadband satellite networks will employ multibeam techniques in which the coverage region is served by multiple spot beams intended for the users that are shaped by the antenna feeds forming part of the payload [17], as depicted in Fig. 1. A fundamental problem with the multibeam approach is the interference caused by multiple adjacent spot beams that share the same frequency band. This interference between spot beams must be mitigated by suitable signal processing algorithms. Specifically, multiuser interference mitigation schemes such as precoding or multiuser detection can be jointly designed with the beamforming process at the gateway station. The interference mitigation must be applied to all the radiating signals instead of the user beams directly. In the downlink (also known as the forward link in the satellite communications literature), the interference mitigation problem corresponds to designing transmit processing or precoding strategies that require the channel state information (CSI). For the uplink (also known as the reverse link), the interference mitigation problem can be addressed by the design of multiuser detectors.
The second highly relevant scenario is that of mobile cellular networks beyond LTE-A [2], which is illustrated in Fig. 2. In such networks, massive MIMO would play a key role with the deployment of hundreds of antenna elements at the base station, coordination between cells and a more modest number of antenna elements at the user terminals. At the base station, very large antenna arrays could be deployed on the roof or on the façade of buildings. With further development in the area of compact antennas and techniques to mitigate mutual coupling effects, it is likely that the number of antenna elements at the user terminals (mobile phones, tablets and other gadgets) might also be significantly increased in future devices relative to current terminals. In these networks, it is preferable to employ time-division duplexing (TDD) mode to perform uplink channel estimation and obtain downlink CSI by reciprocity for signal processing at the transmit side. This operation mode will require cost-effective calibration algorithms. Another critical requirement is the uplink channel estimation, which employs non-orthogonal pilots and, due to the existence of adjacent cells and the limited coherence time of the channel, needs to reuse the pilots [18]. Pilot contamination occurs when CSI at the base station in one cell is affected by users from other cells. In particular, the uplink (or multiple-access channel) will need CSI obtained by uplink channel estimation, efficient multiuser detection and decoding algorithms. The downlink (also known as the broadcast channel) will require CSI obtained by reciprocity for transmit processing and the development of cost-effective scheduling and precoding algorithms.
The third and last highly relevant scenario is represented by wireless local area networks (WLANs) [3], which are shown in Fig. 3. The deployment of WLANs has increased tremendously in the last few years with the proliferation of hot spots and home users. These systems have adopted orthogonal frequency-division multiplexing (OFDM) for their air interface and are equipped with multiple antennas at both the access point and the user terminals [3]. Massive MIMO could play an important role in the incorporation of a substantial number of antenna elements at the access point using compact antennas and planar array geometries to keep the size of the access point at reasonable physical dimensions. The user terminals (laptops, tablets and smart phones) could also rely on compact antennas to accommodate a substantial number of radiating elements. In the future, it is possible that the number of antenna elements will be significantly increased at both the access points and the user terminals.
A key challenge in all three scenarios is how to deal with a very large number of antenna elements and develop cost-effective algorithms that deliver excellent performance in terms of the metrics of interest, namely, bit error rate (BER), sum rate and throughput. In what follows, signal models that can describe the processing and transmission will be detailed.
II-B Downlink Model
In our description, we consider a multiuser massive MIMO system with a number of antenna elements equal to $N_A$ at the transmitter, which could be a satellite gateway, a base station of a cellular network or an access point of a WLAN. The transmitter communicates with $K$ users in the system, where each user is equipped with $N_U$ antenna elements and $N_A \geq K N_U$. It should be noted that in massive MIMO systems, it is desirable to have an excess of degrees of freedom [4], which means $N_A$ should exceed $K N_U$ by a significant margin in order to leverage the array gain. At each time instant $i$, the transmitter applies a precoder to the $K N_U \times 1$ data vector $\mathbf{s}[i]$ intended for the $K$ users. The data vector $\mathbf{s}[i] = [\mathbf{s}_1^T[i] \ldots \mathbf{s}_K^T[i]]^T$ consists of the stacking of the $N_U \times 1$ vectors $\mathbf{s}_k[i]$ of the $K$ users, where each entry is a data symbol taken from a modulation constellation $\mathcal{A}$ with zero mean and variance $\sigma_s^2$, and $(\cdot)^T$ denotes transpose. The $N_A \times 1$ precoded data vector for user $k$ is given by $\mathbf{x}_k[i] = \mathcal{P}(\mathbf{s}_k[i])$, where $\mathcal{P}(\cdot)$ is the mathematical mapping applied by the precoder, and is then transmitted over flat fading channels.
The received signal at each user after demodulation, matched filtering and sampling is collected in an $N_U \times 1$ vector $\mathbf{r}_k[i]$ with sufficient statistics for processing and given by
$$\mathbf{r}_k[i] = \sum_{j=1}^{K} \mathbf{H}_k \mathbf{x}_j[i] + \mathbf{n}_k[i] \tag{1}$$
where the $N_U \times 1$ vector $\mathbf{n}_k[i]$ is a zero mean complex circular symmetric Gaussian noise with covariance matrix $E[\mathbf{n}_k[i]\mathbf{n}_k^H[i]] = \sigma_n^2 \mathbf{I}$, where $E[\cdot]$ stands for expected value, $(\cdot)^H$ denotes the Hermitian operator, $\sigma_n^2$ is the noise variance and $\mathbf{I}$ is the identity matrix. The precoded data vectors $\mathbf{x}_k[i]$ have covariance matrices $E[\mathbf{x}_k[i]\mathbf{x}_k^H[i]] = \sigma_{x_k}^2 \mathbf{I}$, where $\sigma_{x_k}^2$ is the signal power. The elements $h_{n_U,n_A}$ of the $N_U \times N_A$ channel matrices $\mathbf{H}_k$ are the complex channel gains from the $n_A$th transmit antenna to the $n_U$th receive antenna.
II-C Uplink Model
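Let us now consider the uplink of a multiuser massive MIMO system with $K$ users that are equipped with $N_U$ antenna elements and communicate with a receiver with $N_A$ antenna elements, where $N_A \geq K N_U$. At each time instant, the $K$ users transmit $N_U$ symbols each, which are organized into an $N_U \times 1$ vector $\mathbf{s}_k[i]$ with entries taken from a modulation constellation $\mathcal{A}$. The data vectors $\mathbf{s}_k[i]$ are then transmitted over flat fading channels. The received signal after demodulation, matched filtering and sampling is collected in an $N_A \times 1$ vector $\mathbf{r}[i]$ with sufficient statistics for processing as described by
$$\mathbf{r}[i] = \sum_{k=1}^{K} \mathbf{H}_k \mathbf{s}_k[i] + \mathbf{n}[i] \tag{2}$$
where the $N_A \times 1$ vector $\mathbf{n}[i]$ is a zero mean complex circular symmetric Gaussian noise with covariance matrix $E[\mathbf{n}[i]\mathbf{n}^H[i]] = \sigma_n^2 \mathbf{I}$. The data vectors $\mathbf{s}_k[i]$ have zero mean and covariance matrices $E[\mathbf{s}_k[i]\mathbf{s}_k^H[i]] = \sigma_{s_k}^2 \mathbf{I}$, where $\sigma_{s_k}^2$ is the signal power. The elements $h_{n_A,n_U}$ of the $N_A \times N_U$ channel matrices $\mathbf{H}_k$ are the complex channel gains from the $n_U$th transmit antenna to the $n_A$th receive antenna.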
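The uplink model in (2) can be simulated in a few lines. The following is a minimal sketch, assuming i.i.d. unit-variance complex Gaussian channel gains and unit-energy QPSK symbols; the array sizes and the helper names `complex_gaussian` and `uplink_received_signal` are illustrative choices and are not taken from the article.

```python
import numpy as np

def complex_gaussian(rng, *shape):
    """Zero-mean, unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def uplink_received_signal(H_list, s_list, noise_var, rng):
    """Return r = sum_k H_k s_k + n for one time instant."""
    n_a = H_list[0].shape[0]
    noise = np.sqrt(noise_var) * complex_gaussian(rng, n_a)
    return sum(H @ s for H, s in zip(H_list, s_list)) + noise

rng = np.random.default_rng(0)
N_A, K, N_U = 64, 8, 2  # illustrative sizes only (assumption)
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

# One N_A x N_U channel matrix and one N_U x 1 symbol vector per user.
H_list = [complex_gaussian(rng, N_A, N_U) for _ in range(K)]
s_list = [rng.choice(QPSK, size=N_U) for _ in range(K)]
r = uplink_received_signal(H_list, s_list, noise_var=0.1, rng=rng)
```

Stacking the per-user matrices side by side gives an equivalent single-matrix form of the same model, which is convenient when a detector operates on all users jointly.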
III Transmit Processing
In this section, we discuss several aspects related to transmit processing in massive MIMO systems. Fundamental results in information theory have shown that the optimum transmit strategy for the multiuser massive MIMO downlink channel involves a theoretical dirty paper coding (DPC) technique that performs interference cancellation combined with an implicit user scheduling and power loading algorithm [41]. However, this optimal approach is extremely costly and unlikely to be used in any practical deployment. In what follows, we consider several aspects of transmit processing in massive MIMO systems which include TDD operation, pilot contamination, resource allocation and precoding, and related signal processing tasks.
III-A TDD operation
One of the key problems in modern wireless systems is the acquisition of CSI in a timely way. In time-varying channels, TDD offers the most suitable alternative to obtain CSI because the training requirements in a TDD system are independent of the number of antennas at the base station (or access point) [18] and there is no need for CSI feedback. In particular, TDD systems rely on reciprocity, by which the uplink channel is used as an estimate of the downlink channel. An issue in this operation mode is the difference in the transfer characteristics of the amplifiers and the filters in the two directions. This can be addressed through measurements and appropriate calibration [5]. In contrast, in a frequency-division duplexing (FDD) system the training requirements are proportional to the number of antennas and CSI feedback is essential. For this reason, massive MIMO systems will most likely operate in TDD mode and will require further investigation of calibration methods.
III-B Pilot contamination
The adoption of TDD mode and uplink training in massive MIMO systems with multiple cells results in a phenomenon called pilot contamination. In multicell scenarios, it is difficult to employ orthogonal pilot sequences because the duration of the pilot sequences depends on the number of cells and this duration is severely limited by the channel coherence time due to mobility. Therefore, nonorthogonal pilot sequences must be employed and this affects the CSI employed at the transmitter. Specifically, the channel estimate is contaminated by a linear combination of channels of other users that share the same pilot [18]. Consequently, the precoders and resource allocation algorithms will be highly affected by the contaminated CSI. Strategies to control or mitigate pilot contamination and its effects are very important for massive MIMO networks. Possible approaches include work on optimization of waveforms, blind channel estimation techniques, implicit training approaches and precoding and resource allocation techniques that take into account pilot contamination to mitigate its effects.
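The contamination mechanism can be seen in a small numerical sketch. It assumes two single-antenna users in different cells that send the same pilot sequence, a least-squares estimator at the home base station, and an arbitrarily chosen weaker gain for the interfering user; all names and values are illustrative assumptions rather than settings from the article.

```python
import numpy as np

rng = np.random.default_rng(1)
N_A, TAU = 64, 8  # base-station antennas and pilot length (illustrative)

def complex_gaussian(*shape):
    """Zero-mean, unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

pilot = np.exp(2j * np.pi * rng.random(TAU))    # unit-modulus pilot reused in both cells
h_home = complex_gaussian(N_A)                  # channel of the user in the home cell
h_other = np.sqrt(0.3) * complex_gaussian(N_A)  # neighbouring-cell user (weaker gain, assumed)

def ls_estimate(Y, pilot):
    """Least-squares channel estimate from the received pilot block Y (N_A x TAU)."""
    return Y @ pilot.conj() / np.vdot(pilot, pilot).real

# Both users transmit the same pilot, so their channels add up in the pilot block.
Y_clean = np.outer(h_home, pilot) + np.outer(h_other, pilot)
Y = Y_clean + 0.1 * complex_gaussian(N_A, TAU)

h_est = ls_estimate(Y, pilot)
contamination = np.linalg.norm(h_est - h_home) / np.linalg.norm(h_home)
```

Without noise the estimate equals `h_home + h_other` exactly, so the error does not vanish as the number of antennas grows; this is the linear combination of channels described above.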
III-C Resource allocation
Prior work on multiuser MIMO [36, 37, 38] has shown that resource allocation techniques are fundamental to obtain further capacity gains. In massive MIMO this will be equally important and will have the extra benefit of more accurate CSI. From a multiuser information theoretic perspective, the capacity region boundary is achieved by serving all active users simultaneously. The resources (antennas, users and power) that should be allocated to each user depend on the instantaneous CSI, which may vary amongst users. Since the total number of users that could be served is often much higher than the number of transmit antennas $N_A$, the system needs a resource allocation algorithm to select the best set of users according to a chosen criterion such as the sum rate or a user target rate. The resource allocation task is then to choose a set of users and their respective powers in order to satisfy a given performance metric. In massive MIMO systems, the spatial signatures of the users to be scheduled might play a fundamental role thanks to the very large number of antennas and an excess of degrees of freedom [4, 5]. The multiuser diversity [36] along with high array gains might be exploited by resource allocation algorithms along with timely CSI. In particular, the problem of user selection, i.e., scheduling, is a combinatorial problem equivalent to choosing a subset of users to be served out of all the users in the system. Hence, it is clear that the exhaustive search over all possible combinations is computationally prohibitive when the number of users in the system is reasonably large, and thus cost-effective user selection algorithms will be required. Strategies based on greedy, low-cost and discrete optimization methods [37, 38, 40] are very promising for massive MIMO networks because they could reduce the cost of resource allocation algorithms.
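To illustrate the gap between exhaustive and greedy user selection, the sketch below counts the subsets an exhaustive search would visit and runs a simple greedy selection. It assumes single-antenna users, equal power per user and a log-determinant metric as a stand-in for the chosen criterion; it is not the specific algorithm of [37, 38, 40], and all names and sizes are hypothetical.

```python
import math
import numpy as np

def sum_rate(H_sel, snr):
    """log2 det(I + snr * H H^H) for the channel rows of the selected users."""
    gram = H_sel @ H_sel.conj().T
    _, logdet = np.linalg.slogdet(np.eye(gram.shape[0]) + snr * gram)
    return logdet / np.log(2)

def greedy_user_selection(H, n_select, snr):
    """Add, one at a time, the user that most increases the metric."""
    selected, remaining = [], list(range(H.shape[0]))
    for _ in range(n_select):
        best = max(remaining, key=lambda k: sum_rate(H[selected + [k]], snr))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(2)
N_USERS, N_A, N_SELECT = 40, 16, 8  # illustrative sizes (assumption)
H = (rng.standard_normal((N_USERS, N_A))
     + 1j * rng.standard_normal((N_USERS, N_A))) / np.sqrt(2)

chosen = greedy_user_selection(H, N_SELECT, snr=10.0)
n_exhaustive = math.comb(N_USERS, N_SELECT)  # subsets an exhaustive search would test
```

With these sizes the greedy search evaluates the metric a few hundred times, whereas the exhaustive search would need to evaluate tens of millions of subsets.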
III-D Precoding and Related Techniques
Strategies for mitigating the multiuser interference at the transmit side include transmit beamforming [5] and precoding based on linear minimum mean square error (MMSE) [42] or zero-forcing (ZF) [43] techniques and nonlinear approaches such as DPC, Tomlinson-Harashima precoding (THP) [63] and vector perturbation [48]. Transmit matched filtering (TMF) is the simplest method for processing data at the transmit side and has been recently advocated by several works for massive MIMO systems [4, 5]. The basic idea is to apply the conjugate transpose of the channel matrix to the data symbol vector prior to transmission as described by
$$\mathbf{x}[i] = \mathbf{H}^H \mathbf{s}[i] \tag{3}$$
where the $K N_U \times N_A$ matrix $\mathbf{H}$ contains the parameters of all the channels and the $N_A \times 1$ vector $\mathbf{x}[i]$ represents the data processed by TMF.
Linear precoding techniques such as ZF and MMSE precoding are based on channel inversion operations and are attractive due to their relative simplicity for MIMO systems with a small to moderate number of antennas. However, channel inversion based precoding requires a higher average transmit power than other precoding algorithms, especially for ill-conditioned channel matrices, which could result in poor performance. A linear precoder applies a linear transformation to the data symbol vector prior to transmission as described by
$$\mathbf{x}[i] = \mathbf{W} \mathbf{s}[i] \tag{4}$$
where the $N_A \times K N_U$ matrix $\mathbf{W}$ is computed from the parameters of the channels and the $N_A \times 1$ vector $\mathbf{x}[i]$ represents the data processed by the linear precoder. With $\mathbf{H}$ denoting the $K N_U \times N_A$ composite channel matrix, the linear MMSE precoder is described by $\mathbf{W}_{\rm MMSE} = \beta\, \mathbf{H}^H \big(\mathbf{H}\mathbf{H}^H + \tfrac{\sigma_n^2}{\sigma_s^2}\mathbf{I}\big)^{-1}$, where $\beta$ is a gain factor, and the linear ZF precoder is expressed by $\mathbf{W}_{\rm ZF} = \mathbf{H}^H (\mathbf{H}\mathbf{H}^H)^{-1}$.
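The TMF, ZF and MMSE precoders can be sketched as follows. This is a minimal illustration, assuming a composite channel matrix with one row per receive antenna, a regularization equal to the noise-to-symbol variance ratio, and a gain factor chosen to meet a total power constraint; the function names are hypothetical.

```python
import numpy as np

def tmf_precoder(H):
    """Transmit matched filter: conjugate transpose of the channel."""
    return H.conj().T

def zf_precoder(H):
    """Zero-forcing precoder (right pseudo-inverse of H)."""
    Hh = H.conj().T
    return Hh @ np.linalg.inv(H @ Hh)

def mmse_precoder(H, noise_var, symbol_var):
    """Regularized channel inversion; reduces to ZF when noise_var is zero."""
    Hh = H.conj().T
    reg = (noise_var / symbol_var) * np.eye(H.shape[0])
    return Hh @ np.linalg.inv(H @ Hh + reg)

def scale_to_power(W, total_power):
    """Apply a gain factor so that trace(W W^H) equals total_power."""
    return W * np.sqrt(total_power / np.trace(W @ W.conj().T).real)

rng = np.random.default_rng(3)
N_RX, N_A = 8, 64  # total receive antennas and transmit antennas (illustrative)
H = (rng.standard_normal((N_RX, N_A)) + 1j * rng.standard_normal((N_RX, N_A))) / np.sqrt(2)

W_tmf = tmf_precoder(H)
W_zf = zf_precoder(H)
W_mmse = scale_to_power(mmse_precoder(H, noise_var=0.1, symbol_var=1.0), total_power=N_RX)
```

The ZF and MMSE precoders only invert a matrix whose size is the total number of receive antennas, which is the smaller dimension when the transmitter has an excess of degrees of freedom.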
Block diagonalization (BD) type precoding algorithms have been proposed in [43, 44, 46] for MU-MIMO systems. The main advantage of BD type algorithms is a sum-rate performance that is not far from that obtained by DPC techniques and the relative simplicity of implementation in systems with a modest number of antennas. However, existing BD solutions are unlikely to be used in massive MIMO systems due to the cost associated with their implementation in antenna arrays with hundreds of elements. This suggests that there is a need for cost-effective BD type strategies for very large antenna arrays. THP [63] is a nonlinear precoding technique that employs feedforward and feedback matrices along with a modulo operation to cancel the multiuser interference in a more effective way than a standard linear precoder. With THP, the precoded data vector is given by
$$\mathbf{x}[i] = \mathbf{F}\, \tilde{\mathbf{x}}[i] \tag{5}$$
where $\mathbf{F}$ is the feedforward precoding matrix, which can be obtained by an LQ decomposition of the channel matrix $\mathbf{H}$, and the input data $\tilde{\mathbf{x}}[i]$ is computed element-by-element by
$$\tilde{x}_l[i] = \mathrm{mod}\Big( s_l[i] - \sum_{q=1}^{l-1} b_{l,q}\, \tilde{x}_q[i] \Big) \tag{6}$$
where $b_{l,q}$ are the elements of the lower triangular matrix $\mathbf{B}$ that can also be obtained by an LQ decomposition. Amongst the appealing features of THP are its excellent BER and sum-rate performances, which are not far from those of DPC, and its flexibility to incorporate channel coding. Future work on THP for massive MIMO networks should concentrate on the reduction of the computational cost to compute the feedforward and feedback matrices, since existing factorization algorithms would be too costly for systems with hundreds of antenna elements.
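A compact sketch of THP based on an LQ decomposition is given below. It assumes single-antenna users, QPSK symbols in {±1 ± j} with modulo base 4, a noise-free channel, and receivers that scale by the inverse diagonal of the triangular factor before their own modulo operation; the LQ factors are obtained from a QR decomposition of the conjugate transpose of the channel. The function names are illustrative.

```python
import numpy as np

TAU = 4.0  # modulo base for QPSK symbols in {+-1 +-1j} (assumed scaling)

def modulo(z, tau=TAU):
    """Fold real and imaginary parts into [-tau/2, tau/2)."""
    return (z - tau * np.floor(z.real / tau + 0.5)
              - 1j * tau * np.floor(z.imag / tau + 0.5))

def thp_matrices(H):
    """LQ decomposition H = L Q, computed from the QR factors of H^H."""
    Q_t, R = np.linalg.qr(H.conj().T)  # H^H = Q_t R, hence H = R^H Q_t^H
    L = R.conj().T                     # lower triangular factor
    F = Q_t                            # feedforward matrix, so that H F = L
    g = 1.0 / np.diag(L)               # per-user receive scaling
    B = g[:, None] * L                 # unit-diagonal lower triangular feedback matrix
    return F, B, g

def thp_precode(s, F, B):
    """Successive interference pre-subtraction with modulo, then feedforward filtering."""
    x_tilde = np.zeros_like(s, dtype=complex)
    for l in range(len(s)):
        x_tilde[l] = modulo(s[l] - B[l, :l] @ x_tilde[:l])
    return F @ x_tilde

rng = np.random.default_rng(4)
K, N_A = 8, 64  # illustrative sizes
H = (rng.standard_normal((K, N_A)) + 1j * rng.standard_normal((K, N_A))) / np.sqrt(2)
s = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]), size=K)

F, B, g = thp_matrices(H)
x = thp_precode(s, F, B)
s_hat = modulo(g * (H @ x))  # noise-free reception, scaling and modulo at the users
```

The QR factorization dominates the cost of this sketch, which is the step the text singles out as too expensive for arrays with hundreds of elements.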
Vector perturbation employs a modulo operation at the transmitter to perturb the transmitted signal vector and to avoid the transmit power enhancement incurred by ZF or MMSE methods [48]. The task of finding the optimal perturbation involves solving a minimum distance type problem that can be implemented using sphere encoding or full search-based algorithms. Let $\mathbf{H}$ denote a multiuser composite channel. The idea of perturbation is to find a perturbing vector $\mathbf{p}[i]$ from an extended constellation to minimize the transmit power. The perturbation $\mathbf{p}[i]$ is obtained by solving
$$\mathbf{p}[i] = \arg\min_{\mathbf{p}' \in A(\mathbb{Z}^K + j\mathbb{Z}^K)} \big\| \mathbf{G}\big(\mathbf{s}[i] + \mathbf{p}'\big) \big\|^2 \tag{7}$$
where $\mathbf{G}$ is some linear transformation or precoder for the channel $\mathbf{H}$ that satisfies the transmit power constraint, the scalar $A$ is chosen depending on the constellation size, and $\mathbb{Z}^K + j\mathbb{Z}^K$ is the $K$-dimensional complex lattice. The transmit matched filter, linear ZF or MMSE precoders can be used for $\mathbf{G}$. After predistortion using a linear precoder, the resulting constellation region also becomes distorted and thus a modulo operation is employed. This problem can be regarded as a $K$-dimensional integer-lattice least squares problem, which can be solved by search-based algorithms [48].
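The perturbation search can be sketched with a full search over a small set of lattice offsets. The sketch assumes a ZF precoder, QPSK symbols in {±1 ± j} with lattice scaling 4, and offsets restricted to {-1, 0, 1} per real dimension to keep the search finite; a practical system would use sphere encoding instead. All names and sizes are hypothetical.

```python
import itertools
import numpy as np

TAU = 4.0  # lattice scaling for QPSK symbols in {+-1 +-1j} (assumed)

def modulo(z, tau=TAU):
    """Fold real and imaginary parts into [-tau/2, tau/2)."""
    return (z - tau * np.floor(z.real / tau + 0.5)
              - 1j * tau * np.floor(z.imag / tau + 0.5))

def vector_perturbation(G, s, tau=TAU, span=1):
    """Full search for the offset vector that minimizes the transmit power."""
    offsets = [tau * (a + 1j * b)
               for a in range(-span, span + 1) for b in range(-span, span + 1)]
    best_p, best_power = None, np.inf
    for cand in itertools.product(offsets, repeat=len(s)):
        p = np.array(cand)
        power = np.linalg.norm(G @ (s + p)) ** 2
        if power < best_power:
            best_p, best_power = p, power
    return best_p, best_power

rng = np.random.default_rng(5)
K, N_A = 3, 4  # kept small so that the full search stays cheap
H = (rng.standard_normal((K, N_A)) + 1j * rng.standard_normal((K, N_A))) / np.sqrt(2)
G = H.conj().T @ np.linalg.inv(H @ H.conj().T)  # ZF precoder used as G
s = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]), size=K)

p, power_vp = vector_perturbation(G, s)
power_zf = np.linalg.norm(G @ s) ** 2
s_hat = modulo(H @ G @ (s + p))  # the users remove the perturbation with a modulo
```

Because the zero offset is one of the candidates, the perturbed transmit power can never exceed the plain ZF power, and the number of candidates grows exponentially with the number of users.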
IV Receive Processing
In this section, we discuss receive processing in massive MIMO systems. In particular, we examine parameter estimation and detection algorithms, iterative detection and decoding techniques, mitigation of RF impairments and related signal processing tasks.
IV-A Parameter Estimation and Detection Algorithms
Amongst the key problems in the uplink of multiuser massive MIMO systems are the estimation of parameters such as channel gains and receive filter coefficients, and the detection of the transmitted symbols of each user as described by the signal model in (2). The parameter estimation task usually relies on pilot (or training) sequences and signal processing algorithms. In multiuser massive MIMO networks, non-orthogonal training sequences are likely to be used in most application scenarios and the estimation algorithms must be able to provide the most accurate estimates and to track the variations due to mobility. Standard MIMO linear MMSE and least-squares (LS) channel estimation algorithms [49] can be used for obtaining CSI. However, the cost associated with these algorithms is often cubic in the number of antenna elements at the receiver, i.e., $O(N_A^3)$ in the uplink. Moreover, in scenarios with mobility the receiver will need to employ adaptive algorithms [78] which can track the channel variations. Interestingly, massive MIMO systems have an excess of degrees of freedom that translates into a reduced-rank structure to perform parameter estimation. This is an excellent opportunity that massive MIMO offers to apply reduced-rank algorithms [28]-[35] and further develop these techniques.
In order to separate the data streams transmitted by the different users in a multiuser massive MIMO network, a designer must resort to detection techniques, which are similar to multiuser detection methods [50]. The optimal maximum likelihood (ML) detector is described by
$$\hat{\mathbf{s}}_{\rm ML}[i] = \arg\min_{\mathbf{s}[i] \in \mathcal{A}^{K N_U}} \big\| \mathbf{r}[i] - \mathbf{H}\mathbf{s}[i] \big\|^2 \tag{8}$$
where the $K N_U \times 1$ data vector $\mathbf{s}[i]$ contains the symbols of all users and $\mathbf{H} = [\mathbf{H}_1 \ldots \mathbf{H}_K]$ is the $N_A \times K N_U$ composite uplink channel matrix. The ML detector has a cost that is exponential in the number of data streams and the modulation order, which is too complex to be implemented in systems with a large number of antennas. Even though the ML solution can be alternatively computed using sphere decoder (SD) algorithms [51]-[55] that are very efficient for MIMO systems with a small number of antennas, the cost of SD algorithms depends on the noise variance, the number of data streams to be detected and the signal constellation, resulting in high computational costs for low signal-to-noise ratios (SNR), high-order constellations and a large number of data streams.
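The exponential cost of ML detection is easy to see in a brute-force sketch. The example assumes unit-energy QPSK symbols and a small number of data streams so that every candidate vector can be enumerated; the function name `ml_detect` and the sizes are illustrative.

```python
import itertools
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def ml_detect(r, H, constellation=QPSK):
    """Exhaustive ML search: minimize ||r - H s||^2 over all candidate vectors."""
    best_s, best_metric = None, np.inf
    for cand in itertools.product(constellation, repeat=H.shape[1]):
        s = np.array(cand)
        metric = np.linalg.norm(r - H @ s) ** 2
        if metric < best_metric:
            best_s, best_metric = s, metric
    return best_s

rng = np.random.default_rng(6)
N_A, N_STREAMS = 16, 4  # small on purpose: 4**4 = 256 candidates
H = (rng.standard_normal((N_A, N_STREAMS))
     + 1j * rng.standard_normal((N_A, N_STREAMS))) / np.sqrt(2)
s = rng.choice(QPSK, size=N_STREAMS)
noise = 0.1 * (rng.standard_normal(N_A) + 1j * rng.standard_normal(N_A))

s_ml = ml_detect(H @ s + noise, H)
n_candidates = len(QPSK) ** N_STREAMS
```

Each additional QPSK stream multiplies the number of candidates by four, which is why the search is impractical for the number of streams expected in massive MIMO.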
The high computational complexity of the ML detector and the SD algorithms in the scenarios described above has motivated the development of numerous alternative strategies for MIMO detection, which often rely on signal processing with receive filters. The key advantage of these approaches with receive filters is that the cost is typically not dependent on the modulation and the receiver can compute the receive filter only once per data packet and perform detection. Algorithms that can compute the parameters of receive filters with low cost are of central importance to massive MIMO systems. In what follows, we will briefly review some relevant suboptimal detectors, which include linear and decision-driven strategies.
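Linear detectors [56] include approaches based on the receive matched filter (RMF), ZF and MMSE designs and are described by
$$\hat{\mathbf{s}}[i] = Q\big(\mathbf{W}^H \mathbf{r}[i]\big) \tag{9}$$
where the receive filters are $\mathbf{W}_{\rm RMF} = \mathbf{H}$ for the RMF, $\mathbf{W}_{\rm MMSE} = \big(\mathbf{H}\mathbf{H}^H + \tfrac{\sigma_n^2}{\sigma_s^2}\mathbf{I}\big)^{-1}\mathbf{H}$ for the MMSE and $\mathbf{W}_{\rm ZF} = \mathbf{H}(\mathbf{H}^H\mathbf{H})^{-1}$ for the ZF design, $\mathbf{H}$ is the $N_A \times K N_U$ composite uplink channel matrix, and $Q(\cdot)$ represents the slicer used for detection.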
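The three linear detectors can be sketched as below. The example assumes unit-energy QPSK symbols, a sign-based slicer and a composite uplink channel with one column per data stream; the filter expressions follow the standard RMF, ZF and MMSE forms, and the function names are hypothetical.

```python
import numpy as np

def qpsk_slicer(z):
    """Map soft estimates to the nearest unit-energy QPSK symbol."""
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)

def rmf_filter(H):
    return H

def zf_filter(H):
    return H @ np.linalg.inv(H.conj().T @ H)

def mmse_filter(H, noise_var, symbol_var):
    n_a = H.shape[0]
    return np.linalg.inv(H @ H.conj().T + (noise_var / symbol_var) * np.eye(n_a)) @ H

def linear_detect(r, W):
    """Apply the receive filter W^H and slice."""
    return qpsk_slicer(W.conj().T @ r)

rng = np.random.default_rng(7)
N_A, N_STREAMS = 64, 16  # illustrative sizes
H = (rng.standard_normal((N_A, N_STREAMS))
     + 1j * rng.standard_normal((N_A, N_STREAMS))) / np.sqrt(2)
s = qpsk_slicer(rng.standard_normal(N_STREAMS) + 1j * rng.standard_normal(N_STREAMS))
r = H @ s  # noise-free observation, enough to check the filters

s_rmf = linear_detect(r, rmf_filter(H))
s_zf = linear_detect(r, zf_filter(H))
s_mmse = linear_detect(r, mmse_filter(H, noise_var=0.1, symbol_var=1.0))
```

The RMF needs no matrix inversion but leaves the multiuser interference in place, whereas the ZF and MMSE filters remove or balance it at the price of an inversion.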
Decision-driven detection algorithms such as successive interference cancellation (SIC) approaches used in the Vertical-Bell Laboratories Layered Space-Time (V-BLAST) systems [57]-[61] and decision feedback (DF) [62] detectors are techniques that can offer attractive trade-offs between performance and complexity. Prior work on SIC and DF schemes has been reported with DF detectors with SIC (S-DF) [62, 68] and DF receivers with parallel interference cancellation (PIC) (P-DF) [71, 72], combinations of these schemes [71, 24, 75] and mechanisms to mitigate error propagation [76, 77]. DF detectors [62, 68, 71] employ feedforward and feedback matrices that can be based on the receive matched filter (RMF), ZF and MMSE designs as described by
$$\hat{\mathbf{s}}[i] = Q\big(\mathbf{W}^H \mathbf{r}[i] - \mathbf{F}^H \hat{\mathbf{s}}_o[i]\big) \tag{10}$$
where $\hat{\mathbf{s}}_o[i]$ corresponds to the initial decision vector that is usually obtained by the linear section of the DF receiver (e.g., $\hat{\mathbf{s}}_o[i] = Q(\mathbf{W}^H\mathbf{r}[i])$) prior to the application of the feedback section. The feedforward and feedback receive filters $\mathbf{W}$ and $\mathbf{F}$ can be computed using design criteria and optimization algorithms.
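A successive interference cancellation detector with MMSE filtering can be sketched as follows. This is a simplified illustration that assumes unit-energy QPSK symbols and a fixed detection order based on channel column norms; it cancels one detected stream at a time rather than using the matrix form in (10), and the function names are hypothetical.

```python
import numpy as np

def qpsk_slicer(z):
    """Map soft estimates to the nearest unit-energy QPSK symbol."""
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)

def mmse_sic_detect(r, H, noise_var, symbol_var):
    """Detect the streams one by one, cancelling each decision from the residual."""
    n_a, n_streams = H.shape
    s_hat = np.zeros(n_streams, dtype=complex)
    residual = r.astype(complex)
    active = list(range(n_streams))
    # Simple ordering (assumption): strongest channel column first.
    order = sorted(active, key=lambda k: -np.linalg.norm(H[:, k]))
    for k in order:
        H_act = H[:, active]
        A = H_act @ H_act.conj().T + (noise_var / symbol_var) * np.eye(n_a)
        w = np.linalg.solve(A, H[:, k])             # MMSE filter for stream k
        s_hat[k] = qpsk_slicer(np.vdot(w, residual))
        residual = residual - H[:, k] * s_hat[k]    # cancel the detected stream
        active.remove(k)
    return s_hat

rng = np.random.default_rng(8)
N_A, N_STREAMS = 32, 4  # illustrative sizes
H = (rng.standard_normal((N_A, N_STREAMS))
     + 1j * rng.standard_normal((N_A, N_STREAMS))) / np.sqrt(2)
s = qpsk_slicer(rng.standard_normal(N_STREAMS) + 1j * rng.standard_normal(N_STREAMS))

s_sic = mmse_sic_detect(H @ s, H, noise_var=0.01, symbol_var=1.0)
```

Each stage solves a linear system of the size of the receive array, so the cost of the sketch grows quickly with the number of antennas and streams; a wrong early decision also propagates to later stages.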
An often criticized aspect of these suboptimal schemes is that they typically do not achieve the full receive-diversity order of the ML algorithm. This led to the investigation of detection strategies such as lattice-reduction (LR) schemes [63], [64], QR decomposition with M-algorithm (QRD-M) detectors [65], probabilistic data association (PDA) [66, 67] and multi-branch [24, 26] detectors, which can approach the ML performance at an acceptable cost for small to moderate systems. The development of cost-effective detection algorithms for massive MIMO systems is a formidable task that calls for new approaches and ideas in this exciting area.
IV-B Iterative Detection and Decoding Techniques
Iterative detection and decoding (IDD) schemes have received considerable attention in recent years following the discovery of Turbo codes [79] and the use of the Turbo principle for mitigation of several sources of interference [80]-[88]. More recently, work on IDD schemes has been extended to low-density parity-check (LDPC) codes [84, 86] and their variants, which rival Turbo codes in terms of performance. The basic idea of an IDD system is to combine an efficient soft-input soft-output (SISO) detection algorithm and a SISO decoding technique. In particular, the detector produces log-likelihood ratios (LLRs) associated with the encoded bits and these LLRs serve as input to the decoder. Then, in the second phase of the detection/decoding iteration, the decoder generates a posteriori probabilities (APPs) after a number of (inner) decoding iterations for the encoded bits of each data stream. These APPs are fed to the detector to help in the next iterations between the detector and the decoder, which are called outer iterations. The joint process of detection/decoding is then repeated in an iterative manner until the maximum number of (inner and outer) iterations is reached. In massive MIMO systems, it is likely that either Turbo or LDPC codes will be adopted in IDD schemes for mitigation of multiuser, multipath, inter-cell and other sources of interference. LDPC codes exhibit some advantages over Turbo codes that include simpler decoding and implementation. However, LDPC codes often require a higher number of decoding iterations, which translates into delays or increased complexity. The development of IDD schemes and decoding algorithms that perform message passing with reduced delays [89]-[91] is of paramount importance in future wireless systems.
IV-C Mitigation of RF Impairments
The large antenna arrays used in massive MIMO systems will pose several issues to system designers such as coupling effects, in-phase/quadrature (I/Q) imbalances [92], and failures of antenna elements, which will need to be addressed. The first potential major impairment in massive MIMO systems is due to reduced spacing between antenna elements, which results in coupling effects. In fact, for compact antenna arrays a reduction of the physical size of the array inevitably leads to reduced spacing between antenna elements, which can severely reduce the multiplexing gain. In order to address these coupling effects, receive processing approaches will have to work with transmit processing techniques to undo the coupling induced by the relatively close spacing of radiating elements in the array. Another major impairment in massive MIMO systems is I/Q imbalance in the RF chains of the large arrays. This problem can be addressed by receive or transmit processing techniques and requires modelling of the impairments for subsequent mitigation. When working with large antenna arrays, a problem that might also occur is the failure of some antenna elements. Such sensor failures are responsible for a reduction in the degrees of freedom of the array and must be dealt with by signal processing algorithms.
V Simulation Results
In this section, we illustrate some of the techniques outlined in this article using massive MIMO configurations, namely, a very large antenna array, an excess of degrees of freedom provided by the array and a large number of users with multiple antennas. We consider QPSK modulation and channels that are fixed during a data packet and that are modeled by complex Gaussian random variables with zero mean and variance equal to unity. The signaltonoise ratio (SNR) in dB is defined as , where is the variance of the symbols, is the noise variance, and we consider data packets of QPSK symbols.
In the first example, we compare the BER performance against the SNR of several detection algorithms, namely, the RMF with users and with a single user, the linear MMSE detector [56] and the DF MMSE detector using successive interference cancellation [62, 71, 24]. In particular, a scenario with antenna elements at the receiver, users and antenna elements at the user devices is considered, which corresponds to an excess of degrees of freedom equal to . The results shown in Fig. 4 indicate that the RMF with a single user has the best performance, followed by the DF MMSE, the linear MMSE and the RMF detectors. In contrast to previous works [5] that advocate the use of the RMF, these results make it clear that the BER performance loss experienced by the RMF should be avoided and that more advanced receivers should be considered. However, the cost of linear and DF receivers is dictated by the inversion of matrices, and this cost must be reduced for large systems.
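The three receivers compared in this example can be sketched in a few lines. The code below is an illustration only: it assumes single-antenna users, perfect channel knowledge, uncoded QPSK, a simple norm-based detection ordering for the DF receiver, and dimensions and a noise level chosen for speed, none of which are the settings used to produce Fig. 4. All function names are hypothetical.

```python
import numpy as np

def qpsk_decide(z):
    """Hard decisions onto the unit-energy QPSK constellation."""
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)

def rmf_detect(y, H, noise_var):
    """Receive matched filter: correlate with each user's channel."""
    return qpsk_decide(H.conj().T @ y)

def mmse_detect(y, H, noise_var):
    """Linear MMSE detector (one matrix inversion per channel realisation)."""
    K = H.shape[1]
    W = np.linalg.solve(H.conj().T @ H + noise_var * np.eye(K), H.conj().T)
    return qpsk_decide(W @ y)

def mmse_sic_detect(y, H, noise_var):
    """DF MMSE detector with successive interference cancellation."""
    n_rx, K = H.shape
    order = np.argsort(-np.linalg.norm(H, axis=0))   # strongest stream first
    remaining = list(order)
    s_hat = np.zeros(K, dtype=complex)
    r = y.copy()
    for k in order:
        Hr = H[:, remaining]                         # streams not yet cancelled
        cov = Hr @ Hr.conj().T + noise_var * np.eye(n_rx)
        w = np.linalg.solve(cov, H[:, k])            # MMSE filter for stream k
        s_hat[k] = qpsk_decide(np.vdot(w, r))
        r = r - H[:, k] * s_hat[k]                   # cancel the detected stream
        remaining.remove(k)
    return s_hat

def simulate_ber(detector, n_rx=32, n_users=16, noise_var=1.0, n_trials=200, seed=0):
    """Monte Carlo BER over i.i.d. complex Gaussian channels with unit variance."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(n_trials):
        H = (rng.standard_normal((n_rx, n_users))
             + 1j * rng.standard_normal((n_rx, n_users))) / np.sqrt(2)
        s = qpsk_decide(rng.standard_normal(n_users) + 1j * rng.standard_normal(n_users))
        n = np.sqrt(noise_var / 2) * (rng.standard_normal(n_rx)
                                      + 1j * rng.standard_normal(n_rx))
        s_hat = detector(H @ s + n, H, noise_var)
        errors += np.sum(np.sign(s_hat.real) != np.sign(s.real))
        errors += np.sum(np.sign(s_hat.imag) != np.sign(s.imag))
    return errors / (2 * n_users * n_trials)

if __name__ == "__main__":
    for name, det in [("RMF", rmf_detect), ("MMSE", mmse_detect), ("MMSE-SIC", mmse_sic_detect)]:
        print(name, simulate_ber(det))
```

The sketch also exposes the cost issue raised above: the linear MMSE detector solves one system per channel realisation, whereas this naive DF implementation solves one per detected stream.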
In the second example, we compare the sum-rate performance against the SNR of several precoding algorithms, namely, the TMF with a varying number of users and with a single user, the linear MMSE precoder and the THP MMSE precoder. The sum-rate is calculated using [95]:
(11) 
We consider a similar scenario to the previous one in which the transmitter is equipped with antenna elements, and there are users with antenna elements. The results in Fig. 5 show that the TMF with a single user has the best sum-rate performance, followed by the THP MMSE, the regularized BD (RBD), the linear MMSE and the TMF precoding algorithms. From the curves in Fig. 5, we can see that the performance of the TMF is much worse than that of THP and of RBD. This suggests that more sophisticated precoding techniques with lower complexity should be developed to maximize the capacity of massive MIMO systems.
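The gap between the TMF and a regularized linear precoder can be reproduced qualitatively with a short experiment. The sketch below is a simplified stand-in and makes several assumptions of its own: single-antenna users, perfect channel knowledge, a total transmit power constraint, and a sum-rate computed from the per-user SINR as the sum of log2(1 + SINR), which is not necessarily the expression (11) used for Fig. 5. The THP and RBD precoders are not implemented, and all function names are hypothetical.

```python
import numpy as np

def sum_rate(H, F, noise_var, tx_power=1.0):
    """Sum of log2(1 + SINR_k) for a linear precoder F under a total power constraint."""
    P = F * np.sqrt(tx_power / np.trace(F @ F.conj().T).real)  # power normalisation
    G = np.abs(H @ P) ** 2            # G[k, j]: power of stream j received by user k
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return np.sum(np.log2(1.0 + signal / (interference + noise_var)))

def tmf_precoder(H, noise_var, tx_power=1.0):
    """Transmit matched filter: conjugate transpose of the downlink channel."""
    return H.conj().T

def mmse_precoder(H, noise_var, tx_power=1.0):
    """Regularized channel inversion (linear MMSE-type precoder)."""
    K = H.shape[0]
    reg = (K * noise_var / tx_power) * np.eye(K)
    return H.conj().T @ np.linalg.inv(H @ H.conj().T + reg)

def average_sum_rate(precoder, n_tx=32, n_users=16, noise_var=0.01, n_trials=50, seed=0):
    """Average the sum-rate over i.i.d. complex Gaussian channel realisations."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_trials):
        H = (rng.standard_normal((n_users, n_tx))
             + 1j * rng.standard_normal((n_users, n_tx))) / np.sqrt(2)
        total += sum_rate(H, precoder(H, noise_var), noise_var)
    return total / n_trials

if __name__ == "__main__":
    print("TMF :", average_sum_rate(tmf_precoder))
    print("MMSE:", average_sum_rate(mmse_precoder))
```

With these assumed settings the TMF is limited by multiuser interference, whereas the regularized inversion suppresses most of it at the price of one matrix inversion whose size equals the number of users.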
VI Future Trends and Emerging Topics
In this section, we discuss some future signal processing trends in the area of massive MIMO systems and point out some emerging topics that might attract the interest of researchers. The topics are structured as:

Transmit processing:
Cost-effective scheduling algorithms: The development of methods that have low cost and are scalable, such as greedy algorithms [37] and discrete optimization techniques [40], will play a crucial role in massive MIMO networks.
Calibration procedures: The transfer characteristics of the filters and amplifiers used for TDD operation will require designers to devise algorithms that can efficiently calibrate the links.
Precoders with scalability in terms of complexity: The use of divide-and-conquer approaches, methods based on sensor array signal processing and sectorization will play an important role in reducing the dimensionality of the transmit processing problem. Moreover, the investigation and development of TMF strategies with nonlinear cancellation strategies and low-cost decompositions for linear and nonlinear precoders will be important to obtain efficient transmit methods.

Receive processing:
Cost-effective detection algorithms: Techniques to perform dimensionality reduction [28]-[35] for detection problems will play an important role in massive MIMO devices. By reducing the number of effective processing elements, existing detection algorithms could be applied at a manageable cost. In addition, the development of schemes based on the RMF with nonlinear interference cancellation capabilities might be a promising option that can close the gap between the RMF and more costly detectors.
Decoding strategies with low delay: The development of decoding strategies with reduced delay will play a key role in applications such as audio and video streaming because of their delay sensitivity. Therefore, we argue that novel message passing algorithms with smarter strategies to exchange information should be investigated along with their application to IDD schemes.
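As an illustration of the low-cost scheduling methods mentioned under transmit processing, the sketch below implements a generic greedy user selection. It is not the specific algorithm of [37]: the selection metric (a log-determinant capacity proxy with equal power per selected user), the single-antenna users and the function name `greedy_user_selection` are assumptions made for this example.

```python
import numpy as np

def greedy_user_selection(H, n_select, snr):
    """Greedily pick n_select rows (users) of H that maximise a log-det rate proxy.

    H has one row per candidate user and one column per transmit antenna.
    """
    selected = []
    candidates = list(range(H.shape[0]))
    while len(selected) < n_select:
        best_user, best_metric = None, -np.inf
        for u in candidates:
            Hs = H[selected + [u], :]                 # channels of the trial user set
            gram = Hs @ Hs.conj().T
            scaled = np.eye(len(Hs)) + (snr / len(Hs)) * gram
            _, logdet = np.linalg.slogdet(scaled)     # log det of a Hermitian PD matrix
            if logdet > best_metric:
                best_user, best_metric = u, logdet
        selected.append(best_user)
        candidates.remove(best_user)
    return selected

# Usage: choose 8 out of 40 candidate users for a 32-antenna transmitter.
rng = np.random.default_rng(2)
H = (rng.standard_normal((40, 32)) + 1j * rng.standard_normal((40, 32))) / np.sqrt(2)
chosen = greedy_user_selection(H, n_select=8, snr=10.0)
```

The greedy search evaluates on the order of the number of candidates times the number of selected users, instead of every possible subset, which is what makes this family of methods attractive when the number of users is large.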
VII Concluding Remarks
This article has presented a tutorial on massive MIMO systems and discussed signal processing challenges and future trends in this exciting research topic. Key application scenarios, which include multibeam satellite, cellular and local area networks, have been examined along with several operational requirements of massive MIMO networks. Transmit and receive processing tasks have been discussed and fundamental signal processing needs for future massive MIMO networks have been identified. Numerical results have illustrated some of the discussions on transmit and receive processing functions, and future trends have been highlighted. Massive MIMO technology is likely to be incorporated into the applications detailed in this article on a gradual basis, driven by the increase in the number of antenna elements and by the need for more sophisticated signal processing tools to transmit and process a large amount of information.
References
[1] Cisco and/or its affiliates, “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2012–2017,” Tech. Rep., Cisco Systems, Inc., Jan. 2013.
[2] Requirements for Further Advancements for E-UTRA (LTE-Advanced), 3GPP TR 36.913 Standard, 2011.
[3] Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Enhancements for Very High Throughput for Operation in Bands Below 6 GHz, IEEE P802.11ac/D1.0 Standard, Jan. 2011.
[4] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, Nov. 2010.
[5] F. Rusek, D. Persson, B. Lau, E. Larsson, T. Marzetta, O. Edfors and F. Tufvesson, “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Processing Mag., vol. 30, no. 1, pp. 40–60, Jan. 2013.
[6] J. Nam, J.-Y. Ahn, A. Adhikary, and G. Caire, “Joint spatial division and multiplexing: Realizing massive MIMO gains with limited channel state information,” in 46th Annual Conference on Information Sciences and Systems (CISS), 2012.
[7] C. Waldschmidt and W. Wiesbeck, “Compact wide-band multimode antennas for MIMO and diversity,” IEEE Trans. Antennas Propag., vol. 52, no. 8, pp. 1963–1969, Aug. 2004.
[8] A. Grau, J. Romeu, S. Blanch, L. Jofre, and F. D. Flaviis, “Optimization of linear multielement antennas for selection combining by means of a Butler matrix in different MIMO environments,” IEEE Trans. Antennas Propag., vol. 54, no. 11, pp. 3251–3264, Nov. 2006.
[9] C. Chiu, J. Yan and R. Murch, “24-Port and 36-Port antenna cubes suitable for MIMO wireless communications,” IEEE Trans. Antennas Propag., vol. 56, no. 4, pp. 1170–1176, Apr. 2008.
[10] J. W. Wallace and M. A. Jensen, “Termination-dependent diversity performance of coupled antennas: Network theory analysis,” IEEE Trans. Antennas Propag., vol. 52, no. 1, pp. 98–105, Jan. 2004.
[11] T. Schenk, RF Imperfections in High-rate Wireless Systems: Impact and Digital Compensation, 1st ed. Springer, Feb. 2008.
[12] R. Combes, Z. Altman and E. Altman, “Interference coordination in wireless networks: A flow-level perspective,” in Proc. IEEE INFOCOM 2013, pp. 2841–2849, 14–19 April 2013.
[13] R. Aggarwal, C. E. Koksal, and P. Schniter, “On the design of large scale wireless systems,” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 215–225, Feb. 2013.
[14] C. Shepard, H. Yu, N. Anand, L. E. Li, T. L. Marzetta, R. Yang, and L. Zhong, “Argos: Practical many-antenna base stations,” in ACM Int. Conf. Mobile Computing and Networking (MobiCom), Istanbul, Turkey, Aug. 2012.
[15] J. Hoydis, C. Hoek, T. Wild, and S. ten Brink, “Channel measurements for large antenna arrays,” in IEEE International Symposium on Wireless Communication Systems (ISWCS), Paris, France, Aug. 2012.
[16] X. Gao, F. Tufvesson, O. Edfors, and F. Rusek, “Measured propagation characteristics for very-large MIMO at 2.6 GHz,” in Proc. of the 46th Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, California, USA, Nov. 2012.
[17] J. Arnau, B. Devillers, C. Mosquera, A. Pérez-Neira, “Performance study of multiuser interference mitigation schemes for hybrid broadband multibeam satellite architectures,” EURASIP Journal on Wireless Communications and Networking, 2012:132 (5 April 2012).
[18] J. Jose, A. Ashikhmin, T. L. Marzetta, S. Vishwanath, “Pilot Contamination and Precoding in Multi-Cell TDD Systems,” IEEE Transactions on Wireless Communications, vol. 10, no. 8, pp. 2640–2651, August 2011.
[19] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Trans. Commun., vol. 61, no. 4, pp. 1436–1449, Apr. 2013.
[20] H. Yang and T. L. Marzetta, “Performance of conjugate and zero-forcing beamforming in large-scale antenna systems,” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 172–179, Feb. 2013.
[21] A. Ashikhmin and T. L. Marzetta, “Pilot contamination precoding in multi-cell large scale antenna systems,” in IEEE International Symposium on Information Theory (ISIT), Cambridge, MA, Jul. 2012.
[22] J. Zhang, X. Yuan, and L. Ping, “Hermitian precoding for distributed MIMO systems with individual channel state information,” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 241–250, Feb. 2013.
[23] K. Vardhan, S. Mohammed, A. Chockalingam and B. Rajan, “A low-complexity detector for large MIMO systems and multicarrier CDMA systems,” IEEE J. Sel. Areas Commun., vol. 26, no. 3, pp. 473–485, Apr. 2008.
[24] R. C. de Lamare and R. Sampaio-Neto, “Minimum mean-squared error iterative successive parallel arbitrated decision feedback detectors for DS-CDMA systems,” IEEE Trans. Commun., vol. 56, no. 5, pp. 778–789, May 2008.
[25] P. Li and R. D. Murch, “Multiple Output Selection-LAS Algorithm in Large MIMO Systems,” IEEE Commun. Lett., vol. 14, no. 5, pp. 399–401, May 2010.
[26] R. C. de Lamare, “Adaptive and Iterative Multi-Branch MMSE Decision Feedback Detection Algorithms for Multi-Antenna Systems,” IEEE Transactions on Wireless Communications, vol. 14, no. 2, February 2013.
[27] Q. Zhou and X. Ma, “Element-Based Lattice Reduction Algorithms for Large MIMO Detection,” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 274–286, Feb. 2013.
[28] H. Qian and S. N. Batalama, “Data-record-based criteria for the selection of an auxiliary vector estimator of the MMSE/MVDR filter,” IEEE Trans. Commun., vol. 51, no. 10, pp. 1700–1708, Oct. 2003.
[29] R. C. de Lamare and R. Sampaio-Neto, “Adaptive Reduced-Rank MMSE Filtering with Interpolated FIR Filters and Adaptive Interpolators,” IEEE Sig. Proc. Letters, vol. 12, no. 3, March 2005.
[30] Y. Sun, V. Tripathi, and M. L. Honig, “Adaptive, iterative, reduced-rank (turbo) equalization,” IEEE Trans. Wireless Commun., vol. 4, no. 6, pp. 2789–2800, Nov. 2005.
[31] R. C. de Lamare and R. Sampaio-Neto, “Reduced-Rank Adaptive Filtering Based on Joint Iterative Optimization of Adaptive Filters,” IEEE Signal Processing Letters, vol. 14, no. 12, December 2007.
[32] R. C. de Lamare, M. Haardt, and R. Sampaio-Neto, “Blind Adaptive Constrained Reduced-Rank Parameter Estimation based on Constant Modulus Design for CDMA Interference Suppression,” IEEE Transactions on Signal Processing, June 2008.
[33] R. C. de Lamare and R. Sampaio-Neto, “Adaptive reduced-rank processing based on joint and iterative interpolation, decimation, and filtering,” IEEE Trans. Signal Process., vol. 57, no. 7, pp. 2503–2514, July 2009.
[34] R. C. de Lamare and R. Sampaio-Neto, “Reduced-Rank Space-Time Adaptive Interference Suppression With Joint Iterative Least Squares Algorithms for Spread-Spectrum Systems,” IEEE Transactions on Vehicular Technology, vol. 59, no. 3, pp. 1217–1228, March 2010.
[35] R. C. de Lamare and R. Sampaio-Neto, “Adaptive reduced-rank equalization algorithms based on alternating optimization design techniques for MIMO systems,” IEEE Trans. Veh. Technol., vol. 60, no. 6, pp. 2482–2494, July 2011.
[36] Z. Tu and R. S. Blum, “Multiuser diversity for a dirty paper approach,” IEEE Commun. Lett., vol. 7, no. 8, pp. 370–372, Aug. 2003.
[37] G. Dimic and N. D. Sidiropoulos, “On downlink beamforming with greedy user selection: Performance analysis and a simple new algorithm,” IEEE Trans. Signal Processing, vol. 53, no. 10, pp. 3857–3868, Oct. 2005.
[38] Z. Shen, J. G. Andrews, R. W. Heath Jr., and B. L. Evans, “Low complexity user selection algorithms for multiuser MIMO systems with block diagonalization,” IEEE Trans. Signal Processing, vol. 54, no. 9, pp. 3658–3663, Sept. 2006.
[39] N. Dao and Y. Sun, “User-selection algorithms for multiuser precoding,” IEEE Trans. Veh. Technol., vol. 59, no. 7, Sep. 2010.
[40] P. Clarke and R. C. de Lamare, “Transmit Diversity and Relay Selection Algorithms for Multi-relay Cooperative MIMO Systems,” IEEE Transactions on Vehicular Technology, vol. 61, no. 3, pp. 1084–1098, March 2012.
[41] G. Caire and S. Shamai (Shitz), “On the achievable throughput of a multiantenna Gaussian broadcast channel,” IEEE Trans. Inform. Theory, vol. 49, no. 7, pp. 1691–1706, July 2003.
[42] M. Joham, W. Utschick, and J. Nossek, “Linear transmit processing in MIMO communications systems,” IEEE Trans. Signal Processing, vol. 53, no. 8, pp. 2700–2712, Aug. 2005.
[43] Q. Spencer, A. L. Swindlehurst, and M. Haardt, “Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels,” IEEE Trans. Signal Processing, vol. 52, no. 2, pp. 462–471, Feb. 2004.
[44] V. Stankovic and M. Haardt, “Generalized design of multiuser MIMO precoding matrices,” IEEE Trans. Wireless Commun., vol. 7, no. 3, pp. 953–961, Mar. 2008.
[45] K. Zu and R. C. de Lamare, “Low-complexity lattice reduction-aided regularized block diagonalization for MU-MIMO systems,” IEEE Communications Letters, vol. 16, no. 6, Jun. 2012.
[46] K. Zu, R. C. de Lamare and M. Haardt, “Generalized design of low complexity block diagonalization type precoding algorithms for multiuser MIMO systems,” IEEE Trans. Communications, 2013.
[47] C. Windpassinger, R. Fischer, T. Vencel and J. Huber, “Precoding in multiantenna and multiuser communications,” IEEE Trans. Wireless Commun., vol. 3, no. 4, Jul. 2004.
[48] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst, “A vector-perturbation technique for near-capacity multiantenna multiuser communication - part I: channel inversion and regularization,” IEEE Trans. Commun., vol. 53, no. 1, pp. 195–202, Jan. 2005.
[49] M. Biguesh and A. B. Gershman, “Training-based MIMO channel estimation: a study of estimator tradeoffs and optimal training signals,” IEEE Transactions on Signal Processing, vol. 54, no. 3, March 2006.
[50] S. Verdu, Multiuser Detection, Cambridge, 1998.
[51] E. Viterbo and J. Boutros, “A universal lattice code decoder for fading channels,” IEEE Trans. on Inf. Theory, vol. 45, no. 5, pp. 1639–1642, July 1999.
[52] M. O. Damen, H. E. Gamal, and G. Caire, “On maximum likelihood detection and the search for the closest lattice point,” IEEE Trans. Inform. Theory, vol. 49, pp. 2389–2402, Oct. 2003.
[53] Z. Guo and P. Nilsson, “Algorithm and Implementation of the K-Best Sphere Decoding for MIMO Detection,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 3, pp. 491–503, March 2006.
[54] C. Studer, A. Burg, and H. Bolcskei, “Soft-output sphere decoding: algorithms and VLSI implementation,” IEEE J. Sel. Areas Commun., vol. 26, pp. 290–300, Feb. 2008.
[55] B. Shim and I. Kang, “On further reduction of complexity in tree pruning based sphere search,” IEEE Trans. Commun., vol. 58, no. 2, pp. 417–422, Feb. 2010.
[56] A. Duel-Hallen, “Equalizers for Multiple Input Multiple Output Channels and PAM Systems with Cyclostationary Input Sequences,” IEEE J. Select. Areas Commun., vol. 10, pp. 630–639, April 1992.
[57] G. D. Golden, C. J. Foschini, R. A. Valenzuela and P. W. Wolniansky, “Detection algorithm and initial laboratory results using V-BLAST space-time communication architecture,” Electronics Letters, vol. 35, no. 1, January 1999.
[58] J. Benesty, Y. Huang, and J. Chen, “A fast recursive algorithm for optimum sequential signal detection in a BLAST system,” IEEE Trans. Signal Processing, vol. 51, pp. 1722–1730, July 2003.
[59] A. Rontogiannis, V. Kekatos, and K. Berberidis, “A Square-Root Adaptive V-BLAST Algorithm for Fast Time-Varying MIMO Channels,” IEEE Signal Processing Letters, vol. 13, no. 5, pp. 265–268, May 2006.
[60] R. Fa and R. C. de Lamare, “Multi-Branch Successive Interference Cancellation for MIMO Spatial Multiplexing Systems,” IET Communications, vol. 5, no. 4, pp. 484–494, March 2011.
[61] P. Li, R. C. de Lamare and R. Fa, “Multiple Feedback Successive Interference Cancellation Detection for Multiuser MIMO Systems,” IEEE Transactions on Wireless Communications, vol. 10, no. 8, pp. 2434–2439, August 2011.
[62] J. H. Choi, H. Y. Yu, Y. H. Lee, “Adaptive MIMO decision feedback equalization for receivers with time-varying channels,” IEEE Trans. Signal Proc., vol. 53, no. 11, pp. 4295–4303, 2005.
[63] C. Windpassinger, L. Lampe, R. F. H. Fischer, T. A. Hehn, “A performance study of MIMO detectors,” IEEE Transactions on Wireless Communications, vol. 5, no. 8, pp. 2004–2008, August 2006.
[64] Y. H. Gan, C. Ling, and W. H. Mow, “Complex lattice reduction algorithm for low-complexity full-diversity MIMO detection,” IEEE Trans. Signal Processing, vol. 56, no. 7, July 2009.
[65] K. J. Kim, J. Yue, R. A. Iltis, and J. D. Gibson, “A QRD-M/Kalman filter-based detection and channel estimation algorithm for MIMO-OFDM systems,” IEEE Trans. Wireless Communications, vol. 4, pp. 710–721, March 2005.
[66] Y. Jia, C. M. Vithanage, C. Andrieu, and R. J. Piechocki, “Probabilistic data association for symbol detection in MIMO systems,” Electron. Lett., vol. 42, no. 1, pp. 38–40, Jan. 2006.
[67] S. Yang, T. Lv, R. Maunder, and L. Hanzo, “Unified Bit-Based Probabilistic Data Association Aided MIMO Detection for High-Order QAM Constellations,” IEEE Transactions on Vehicular Technology, vol. 60, no. 3, pp. 981–991, 2011.
[68] M. K. Varanasi, “Decision feedback multiuser detection: A systematic approach,” IEEE Trans. on Inf. Theory, vol. 45, pp. 219–240, January 1999.
[69] J. F. Rößler and J. B. Huber, “Iterative soft decision interference cancellation receivers for DS-CDMA downlink employing 4-QAM and 16-QAM,” in Proc. 36th Asilomar Conf. Signal, Systems and Computers, Pacific Grove, CA, Nov. 2002.
[70] J. Luo, K. R. Pattipati, P. K. Willet and F. Hasegawa, “Optimal User Ordering and Time Labeling for Ideal Decision Feedback Detection in Asynchronous CDMA,” IEEE Trans. on Communications, vol. 51, no. 11, November 2003.
[71] G. Woodward, R. Ratasuk, M. L. Honig and P. Rapajic, “Minimum Mean-Squared Error Multiuser Decision-Feedback Detectors for DS-CDMA,” IEEE Trans. on Communications, vol. 50, no. 12, December 2002.
[72] R. C. de Lamare and R. Sampaio-Neto, “Adaptive MBER decision feedback multiuser receivers in frequency selective fading channels,” IEEE Communications Letters, vol. 7, no. 2, pp. 73–75, Feb. 2003.
[73] F. Cao, J. Li, and J. Yang, “On the relation between PDA and MMSE-ISDIC,” IEEE Signal Processing Letters, vol. 14, no. 9, Sep. 2007.
[74] Y. Cai and R. C. de Lamare, “Adaptive Space-Time Decision Feedback Detectors with Multiple Feedback Cancellation,” IEEE Transactions on Vehicular Technology, vol. 58, no. 8, pp. 4129–4140, October 2009.
[75] P. Li and R. C. de Lamare, “Adaptive Decision-Feedback Detection With Constellation Constraints for MIMO Systems,” IEEE Transactions on Vehicular Technology, vol. 61, no. 2, pp. 853–859, 2012.
[76] M. Reuter, J. C. Allen, J. R. Zeidler, R. C. North, “Mitigating error propagation effects in a decision feedback equalizer,” IEEE Transactions on Communications, vol. 49, no. 11, pp. 2028–2041, November 2001.
[77] R. C. de Lamare, R. Sampaio-Neto, A. Hjorungnes, “Joint iterative interference cancellation and parameter estimation for CDMA systems,” IEEE Communications Letters, vol. 11, no. 12, pp. 916–918, December 2007.
[78] S. Haykin, Adaptive Filter Theory, 4th ed. Englewood Cliffs, NJ: Prentice Hall, 2002.
[79] C. Berrou and A. Glavieux, “Near optimum error-correcting coding and decoding: Turbo codes,” IEEE Trans. Commun., vol. 44, Oct. 1996.
[80] C. Douillard et al., “Iterative correction of intersymbol interference: Turbo equalization,” European Trans. Telecommun., vol. 6, no. 5, pp. 507–511, Sept.–Oct. 1995.
[81] X. Wang and H. V. Poor, “Iterative (turbo) soft interference cancellation and decoding for coded CDMA,” IEEE Trans. Commun., vol. 47, pp. 1046–1061, July 1999.
[82] M. Tuchler, A. Singer, and R. Koetter, “Minimum mean square error equalization using a priori information,” IEEE Trans. Signal Processing, vol. 50, pp. 673–683, Mar. 2002.
[83] B. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Trans. Commun., vol. 51, pp. 389–399, Mar. 2003.
[84] J. Hou, P. H. Siegel, L. B. Milstein, “Design of multi-input multi-output systems based on low-density parity-check codes,” IEEE Transactions on Communications, vol. 53, no. 4, pp. 601–611, April 2005.
[85] H. Lee, B. Lee, and I. Lee, “Iterative detection and decoding with an improved V-BLAST for MIMO-OFDM Systems,” IEEE J. Sel. Areas Commun., vol. 24, pp. 504–513, Mar. 2006.
[86] J. Wu and H.-N. Lee, “Performance Analysis for LDPC-Coded Modulation in MIMO Multiple-Access Systems,” IEEE Transactions on Communications, vol. 55, no. 7, pp. 1417–1426, July 2007.
[87] X. Yuan, Q. Guo, X. Wang, and Li Ping, “Evolution analysis of low-cost iterative equalization in coded linear systems with cyclic prefixes,” IEEE J. Select. Areas Commun. (JSAC), vol. 26, no. 2, pp. 301–310, Feb. 2008.
[88] J. W. Choi, A. C. Singer, J. Lee, N. I. Cho, “Improved linear soft-input soft-output detection via soft feedback successive interference cancellation,” IEEE Trans. Commun., vol. 58, no. 3, pp. 986–996, March 2010.
[89] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, “A new class of upper bounds on the log partition function,” IEEE Trans. Information Theory, vol. 51, no. 7, pp. 2313–2335, July 2005.
[90] H. Wymeersch, F. Penna and V. Savic, “Uniformly Reweighted Belief Propagation for Estimation and Detection in Wireless Networks,” IEEE Trans. Wireless Communications, vol. PP, no. 99, pp. 1–9, Feb. 2012.
[91] J. Liu and R. C. de Lamare, “Low-Latency Reweighted Belief Propagation Decoding for LDPC Codes,” IEEE Communications Letters, vol. 16, no. 10, pp. 1660–1663, October 2012.
[92] A. Hakkarainen, J. Werner, K. R. Dandekar and M. Valkama, “Widely-linear beamforming and RF impairment suppression in massive antenna arrays,” Journal of Communications and Networks, vol. 15, no. 4, pp. 383–397, Aug. 2013.
[93] T. Adali, P. J. Schreier and L. L. Scharf, “Complex-valued signal processing: the proper way to deal with impropriety,” IEEE Trans. Signal Processing, vol. 59, no. 11, pp. 5101–5125, November 2011.
[94] N. Song, R. C. de Lamare, M. Haardt, and M. Wolf, “Adaptive Widely Linear Reduced-Rank Interference Suppression based on the Multi-Stage Wiener Filter,” IEEE Transactions on Signal Processing, vol. 60, no. 8, 2012.
[95] S. Vishwanath, N. Jindal and A. J. Goldsmith, “On the capacity of multiple input multiple output broadcast channels,” in Proc. IEEE International Conf. Commun. (ICC), New York, USA, Apr. 2002, pp. 1444–1450.