# Frequency-domain Compressive Channel Estimation for Frequency-selective Hybrid mmWave MIMO Systems

## Abstract

Channel estimation is useful in millimeter wave (mmWave) MIMO communication systems. Channel state information allows optimized designs of precoders and combiners under different metrics such as mutual information or signal-to-interference-noise (SINR) ratio. At mmWave, MIMO precoders and combiners are usually hybrid, since this architecture provides a means to trade-off power consumption and achievable rate. Channel estimation is challenging when using these architectures, however, since there is no direct access to the outputs of the different antenna elements in the array. The MIMO channel can only be observed through the analog combining network, which acts as a compression stage of the received signal. Most of prior work on channel estimation for hybrid architectures assumes a frequency-flat mmWave channel model. In this paper, we consider a frequency-selective mmWave channel and propose compressed-sensing-based strategies to estimate the channel in the frequency domain. We evaluate different algorithms and compute their complexity to expose trade-offs in complexity-overhead-performance as compared to those of previous approaches.

## 1Introduction

MIMO architectures with large arrays are a key ingredient of mmWave communication systems, providing gigabit-per-second data rates [1]. Hybrid MIMO structures have been proposed to operate at mmWave because the cost and power consumption of an all-digital achitecture is prohibitive at these frequencies [2]. Optimally configuring the digital and analog precoders and combiners requires channel knowledge when the design goal is maximizing metrics such as the achievable rate or the SINR. Acquiring the mmWave channel is challenging with a hybrid architecture, however, because the channel is seen through the analog combining network, the SNR is low before beamforming, and the size of the channel matrices is large [3].

A significant number of papers have proposed solutions to the problem of channel estimation at mmWave with hybrid architectures, but under a frequency-flat channel model [4]. These strategies exploit the spatially sparse structure in the mmWave MIMO channel, formulating the estimation of the channel as a sparse recovery problem. The support of the estimated sparse vector identifies the pairs of direction-of-arrival/direction-of-departure (DoA/DoD) for each path in the mmWave channel, while the amplitudes of the non-zero coefficients provide the channel gains for each path. Compressive estimations lead to a reduction in the channel training length when compared to conventional approaches such as those based on least squares (LS) estimation [12].

Recently, some approaches for channel estimation in frequency-selective mmWave channels have been proposed. In a recent paper [13], we designed a time-domain approach to estimate the wideband mmWave channel assuming a hybrid MIMO architecture. This algorithm exploits the sparsity of the wideband millimeter wave channel in both the angular and delay domains. The sparse formulation of the problem in [13] includes the effect of non-integer sampling of the transmit pulse shaping filter, with the subsequent leakage effect and increase of sparsity level in the channel matrix. The main limitation of [13] is that the algorithm can be applied only to single carrier systems. A frequency-domain strategy to estimate frequency-selective mmWave channels was also proposed in [14]. A sparse reconstruction problem was formulated there to estimate the channel independently for every subcarrier, without exploiting spatial congruence between subbands. Another approach in the frequency domain was designed in [15], but only exploting the information from a reduced number of subcarriers. A different algorithm operating in the frequency domain for a MIMO-OFDM system was proposed in [16]. Exploiting the fact that spatial propagation characteristics do not change significantly within the system bandwidth, [16] assumed spatially common sparsity between the channels corresponding to the different subcarriers. A structured sparse recovery algorithm was then considered to reconstruct the channels in the frequency domain. Thus, [16] is an interesting initial solution to the problem, but has several limitations when applied to a mmWave communication system:

The effect of sampling the pulse shaping filter delayed by a non integer factor was not considered in the channel model for a given delay tap. As shown in [18], not accounting for this effect leads to virtual MIMO matrices with an artificially enhanced sparsity.

The algorithm was evaluated only for medium and high SNR regimes (larger than 10 dBs), very unlikely at mmWave, where the expected SNR is below 0 dB.

The reconstruction algorithm provides accurate results when Gaussian measurement matrices are employed; to generate the Gaussian matrices, unquantized phases were considered in the training precoders, which is unrealistic in a practical implementation of a mmWave system based on a hybrid architecture.

The algorithm is based on the strong assumption that the SNR is perfectly known at the receiver before explicit channel estimation.

Another algorithm exploiting common sparsity in the frequency domain at mmWave was proposed in [17]. Unlike the algorithm proposed in [16], the algorithm in [17] is proposed to estimate mmWave wideband MU-MIMO channels. Besides the limitations 1)-3) described above, which also hold in this case, this algorithm exhibits another problem that makes it less feasible to be applied in a real mmWave communication system. A Line-of-Sight (LoS) Rician channel model with dB is considered, which is only applicable when there is a strong LoS path. Owing to this artifact of the channel model, the algorithm in [17] estimates a single path for each user, such that the task of channel estimation can not be successfully accomplished.

This paper proposes two novel frequency-domain approaches for the estimation of frequency-selective mmWave MIMO channels. These approaches overcome the limitations of prior work and provide different trade-offs between complexity and achievable rate for a fixed training length. As in recent work on hybrid architectures for frequency-selective mmWave channels [16], we also consider a MIMO-OFDM communications system. Similar to [14], we introduce zero-padding (ZP) as a means to avoid loss and/or distortion of training data during reconfiguration of RF circuitry. A geometric channel is considered to model the different scattering clusters as in [13],[17],[18], including the band-limiting property in the overall channel response.

Our two proposed approaches exploit the spatially common sparsity within the system bandwidth. The first algorithm aims at exploiting the information on the support coming from every subcarrier in the MIMO-OFDM system and provides the best performance. In contrast, the second algorithm uses less information to estimate the different frequency-domain subchannels, thereby managing to significantly reduce the computational complexity. We show that both strategies are asymptotically efficient, since they both attain the Cramér-Rao Lower Bound (CRLB). Further, we show that asymptotic efficiency can be achieved without using frequency-selective baseband precoders and combiners during the training stage, thereby reducing computational complexity.

Simulation results in the low SNR regime show that the two proposed algorithms significantly outperform the approach in the frequency domain developed in [14]. Comparisons with the algorithms proposed in[16] and [17] are also provided to show their performance in terms of estimation error in the SNR regime that mmWave systems are expected to work. The two proposed channel estimation algorithms provide a good trade-off between performance and overhead. Results show that using a reasonably small training length, approximately in the range of frames, leads to low estimation errors. The computational complexity of the proposed algorithms and previous strategies is also analyzed to compare the trade-offs between performance-complexity provided by the different algorithms. Finally, we also show that it is not necessary to exploit the information on the support coming from every OFDM subcarrier to estimate the different mmWave subchannels. Yet, a reduced number of subcarriers is enough to asymptotically attain the CRLB.

The paper is organized as follows: Section 2 introduces the system model that is used throughout this work. Section 3 proposes two frequency-domain compressive channel estimation approaches, including the computation of the CRLB and suitable estimators derived from it in order to obtain the different frequency-domain channel matrices for each subcarrier. Thereafter, Section 4 provides the main simulation results for the two proposed algorithms, the OMP-based compressive approach proposed in [13] and the SSAMP and DGMP algorithms proposed in [16] and [17], respectively. Finally, Section 5 collects the main conclusions derived from the simulation results and also describes future work to be conducted.

Notation

: We use the following notation throughout this paper: bold uppercase is used to denote matrices, bold lowercase denotes a column vector and non-bold lowercase denotes a scalar value. We use to denote a set. Further, is the Frobenius norm and , , and denote the conjugate transpose, conjugate, transpose and Moore-Penrose pseudoinverse of a matrix , respectively. The -th entry of a matrix is denoted using . Similarly, the -th entry of a column vector is denoted as . The identity matrix of order is denoted as . If and are two matrices, is the Khatri-Rao product of and and is their Kronecker product. We use to denote a circularly symmetric complex Gaussian random vector with mean and covariance matrix . We use to denote expectation. Discrete-time signals are represented as , while frequency-domain signals are denoted using .

## 2System model

We consider an OFDM based mmWave MIMO link employing subcarriers to send data streams using a transmitter with antennas and a receiver with antennas. The system is based on a hybrid MIMO architecture as shown in Figure 1, with and RF chains at the transmitter and receiver sides. For a general exposition, a frequency-selective hybrid precoder is used, with , , where is the analog precoder and the digital one. Note that the analog precoder is frequency-flat, while the digital precoder is different for every subcarrier. The RF precoder and combiner are implemented using a fully connected network of phase shifters, as described in [12]. The symbol blocks are transformed into the time domain using parallel -point IFFTs. As in [14], we consider Zero-Padding (ZP) to both suppress Inter Symbol Interference (ISI) and account for the RF circuitry reconfiguration time. The discrete-time complex baseband signal at subcarrier can be written as

where the transmitted symbol sequence at subcarrier of size is denoted as .

The MIMO channel between the transmitter and the receiver is assumed to be frequency-selective, having a delay tap length in the time domain. The -th delay tap of the channel is represented by a matrix denoted as , which, assuming a geometric channel model [18], can be written as

where denotes the path loss between the transmitter and the receiver, denotes the number of paths, is a filter that includes the effects of pulse-shaping and other lowpass filtering evaluated at , is the complex gain of the th path, is the delay of the th path, and are the angles-of-arrival and departure (AoA/AoD), of the th path, and and are the array steering vectors for the receive and transmit antennas. Each one of these matrices can be written in a more compact way as

where is diagonal with non-zero complex entries, and and contain the receive and transmit array steering vectors and , respectively. The channel can be approximated using the extended virtual channel model defined in [2] as

where is a sparse matrix which contains the path gains of the quantized spatial frequencies in the non-zero elements. The dictionary matrices and contain the transmitter and receiver array response vectors evaluated on a grid of size for the AoA and a grid of size for the AoD. Due to the few scattering clusters in mmWave channels, the sparse assumption for is commonly accepted.

Finally, the channel at subcarrier can be written in terms of the different delay taps as

It is also useful to write this matrix in terms of the sparse matrices and the dictionaries

to help expose the sparse structure later.

Assuming that the receiver applies a hybrid combiner , the received signal at subcarrier can be written as

where is the circularly symmetric complex Gaussian distributed additive noise vector. The receive signal model in corresponds to the data transmission phase. As we will see in Section 3, during the channel acquisition phase, we will consider analog-only training precoders and combiners to reduce complexity.

## 3Compressive Channel Estimation in the Frequency Domain

In this section, we formulate a compressed sensing problem to estimate the vectorized sparse channel vector in the frequency domain. We also propose two algorithms to solve this problem that leverage the common support between the channel matrices for every subcarrier, providing different trade-offs complexity-performance. The first algorithm leverages the common support between the different subchannels providing a very good performance, while the second one only exploits information from a reduced number of subcarriers, thereby keeping computational complexity at a lower level.

### 3.1Problem formulation

We assume that RF chains are used at the transmitter. During the training phase, for the -th frame we use a training precoder and a training combining matrix . This means that during the training phase, only analog precoders and combiners are considered to keep the complexity of the sparse recovery algorithm low. We assume that the transmitted symbols satisfy , with the total transmitted power and . Furthermore, each entry in , is normalized to have squared-modulus and , respectively. Then, the received samples in the frequency domain for the -th training frame can be written as

where is the frequency-domain MIMO channel response at the -th subcarrier and , , is the frequency-domain combined noise vector received at the -th subcarrier. The average received SNR per transmitted symbol is given by . We assume that the channel coherence time is larger than the frame duration and that the same channel can be considered for several consecutive frames.

Using the result , the vectorized received signal is

Taking into account the expression in , the vectorized channel matrix can be written as . Therefore, if we define the measurement matrix as

and the dictionary as

(Equation 7) can be rewritten as

where is the sparse vector containing the complex channel gains. To have enough measurements and accurately reconstruct the sparse vector , it is necessary to use several training frames, especially in the very-low SNR regime. If the transmitter and receiver communicate during training steps using different pseudorandomly built precoders and combiners, (Equation 8) can be extended to

Finally, the vector can be found by solving the sparse reconstruction problem

where is a tunable parameter defining the maximum error between the measurement and the received signal assuming the reconstructed channel between the transmitter and the receiver. Since the sparsity (number of channel paths) is usually unknown, this parameter can be set to the noise variance [13].

There are a great variety of algorithms to solve . We could use, for example, Orthogonal Matching Pursuit (OMP) to find the sparsest approximation of the vectors containing the channel gains. Since the vector depends on the frequency bin, it would be necessary to run the algorithm as many times as the number of subcarriers at which the MIMO channel response is to be estimated. In the next subsections we consider an additional assumption to solve this problem, which avoids the need to run OMP algorithms in parallel as proposed in [14].

The formulation in - is provided as a general framework to leverage the sparse structure of the frequency-selective mmWave channel. This formulation, however, leads to subcarrier-dependent measurement matrices, resulting in high computational complexity for channel estimation. Using more than a single RF chain at the transmitter offers more degrees of freedom to design the measurement matrices and, consequently, obtain better estimation performance. Nonetheless, exploiting these degrees of freedom results in additional computational complexity with regard to the case of using a single RF chain at the transmitter. Furthermore, the difference in performance between using and was observed to be less than dB when solving the problem with the algorithm proposed in this section, which comes from the fact that a reasonable number of RF chains must be used at a given transceiver to keep power consumption low. For this reason, our focus is on designing algorithms exhibiting reasonable performance and low computational complexity.

For this reason, we consider the special case of to present our algorithms with reduced complexity. With this choice, the frequency-selective transmitted symbols are scalar and its effect can be easily inverted at the receiver (i.e., by multiplying by or multiplying by if the transmitted symbols belong to a QPSK constellation with energy-normalized symbols). As shown in Section 4, the use of a single frequency-flat measurement matrix is enough to asymptotically attain the CRLB with on-grid channel parameters. Therefore, a single RF chain and analog-only precoders are used during training, thereby managing a reasonable trade-off between performance and computational complexity. Nonetheless, the algorithms we propose in the next subsections can be easily extended to work with subcarrier-dependent measurement matrices.

By compensating for , for a given training step , the measurement matrix reduces to . At this point, it is important to highlight that: 1) Analog-only training precoders and combiners are used to keep computational complexity low, and 2) A single data stream is transmitted to guarantee that the same measurement matrix is enough to estimate the different frequency-domain subchannels. In fact, these choices enable both online and offline complexity reduction when the noise statistics are used to estimate the MIMO channel at the different subcarriers.

The matrices exhibit an interesting property that can be exploited when solving the compressed channel estimation problems defined in . Let us define the vectorized virtual channel matrix for a given delay tap as

Let us denote the supports of the virtual channel matrices as , . Then, since , with , , it is clear that

where the union of the supports of the time-domain virtual channel matrices comes from the additive nature of the Fourier transform. The sparse assumption on the vectorized channel matrix for a given delay tap is commonly accepted, since in mmWave channels . The vectorized channel matrix will have, in the worst case, non-zero coefficients. Typical values for in mmWave channels are usually lower than symbols (for example IEEE 802.11ad has been designed to work robustly for a maximum of delay taps in the channel), while the number of measured paths usually satisfies for outdoor and indoor scenarios [21]. From these values, using dictionaries of size , allows us to assume a spatially sparse structure for as well. Furthermore, this sparse structure is the same for all , since from , and do not depend on , which means that the AoAs and AoDs do not change with frequency in the transmission bandwidth.

### 3.2Simultaneous weighted estimation exploiting common support between subcarriers (SW-OMP)

To develop a channel estimation algorithm that leverages the sparse nature and the common support property for all , we propose to modify the S-OMP algorithm proposed in [22]. For a given iteration, this algorithm aims at finding a new index of the support such that its likelihood is larger than if the signals for the different subbands were individually processed. In this way, the different signals contribute to decide which is the most likely index belonging to the support in a cooperative fashion. The S-OMP algorithm in [22] computes the gains of the sparse vector using a LS approach once the support is obtained, assuming that the noise covariance matrix is the identity matrix . In this paper, we propose two modifications of the S-OMP for the estimation of the support and the channel gains. The first one accounts for the correlated nature of the noise at the output of the RF combiner, and leads to a more precise estimation of the support in this particular application. The second one consists of a different approach to compute the channel gains. In particular, we develop the minimum variance unbiased estimator for the channel gains, which is shown to be a weighted LS estimator. We show that this estimator attains the CRLB.

#### Support computation with correlated noise

Before explicit estimation of the channel gains, it is necessary to compute the atom, i.e., vector in the measurement matrix, which yields the largest sum-projection onto the received signals, provided that the support of the different sparse vectors is the same. The S-OMP algorithm is based on the assumption that the perturbation (noise) covariance matrix is diagonal, such that no correlation between the different noise components is present. The projection vector is defined as

in which , is the measurement matrix and is the received signal for a given , . If there is correlation between noise components, the atom estimated from the projection in (Equation 11) is not likely to be the actual one. To introduce the appropriate correction in the projection, the specific form the noise covariance matrix takes needs to be used. It is important to highlight that depending on the design of the combiners for each of the training steps, the noise statistics will be different. Specifically, the noise covariance matrix contains the (complex) inner product between the different hybrid combiners, for each training step , . Let us consider two arbitrary (hybrid) combiners , for two arbitrary training steps , and a given subcarrier . Therefore, if the combined noise at a given training step and subcarrier is denoted as , with , the noise cross-covariance matrix is given by

such that it is the same for all subcarriers during training. It can be written as a block diagonal matrix whose -th diagonal block is the Gram matrix of the (hybrid) combiner , that is, , where

Now, we can resort to the Cholesky decomposition of

with an upper triangular matrix. The subscript in and indicates that these matrices only depend on the combiners used during consecutive training frames. Therefore, the projection step is performed as

where is the whitened measurement matrix given by . In this way, the resulting projection simultaneously whitens the spatial noise components and estimates the projection of the received signal onto the subspace spanned by the columns in the measurement matrix.

#### Computation of the channel gains

If the vectorized channel matrix for a given subcarrier , , is given by , with the vector of channel gains, then the received signal is distributed according to . Once the different AoAs/AoDs are found, the signal model for the -th subcarrier can be written as

where is the residual noise in our linear model after estimating the channel support. If the estimation of the support is accurate, will be close to the post-combining noise vector . The vector is the vector of channel gains to be estimated after sparse recovery, where is the estimate of the sparsity level. The matrix is defined as

It is important to remark that the support estimated by the proposed algorithm may be different from the actual channel support. Provided that, in general, can be different from the actual support, the vector containing the complex channel gains can be also different from the vector containing the true channel gains. Since the model in is linear on the parameter vector , there is a Minimum Variance Unbiased (MVU) Estimator that happens to be the Best Linear Unbiased Estimator (BLUE) as well [23].

The mathematical equation in (Equation 12) is usually referred to as the *General Linear Model* (GLM), for which the solution of for real parameters is provided in [23]. The extension for a complex vector of parameters is straightforward and given by

Therefore, is the MVU estimator for our parameter vector , . Hence, it is unbiased and attains the CRLB is the support is estimated correctly. It is interesting to note that this corresponds to a Weighted Least-Squares (WLS) estimator, with the corresponding weights given by the inverse covariance matrix of the noise. We are interested in finding the CRLB for the estimation of the channel matrices at each subcarrier assuming perfect sparse reconstruction. To that end, taking into account only the non-zero entries in , the Fisher Information Matrix (FIM) is derived from the GLM in as

Note that the previous expression gives the FIM for the vector , which contains the actual channel gains. Accordingly, the overall variance of the estimator for the vector of channel gains is given by the sum of the individual CRLBs for each of the complex gains. Therefore, if we denote the overall variance of the estimator for as , , then

and

The last equation only takes into account the channel gains for each path in the extended virtual channel representation. Our interest, however, is to compare the channel estimation performance at the antenna level. To that end, the frequency-domain channel matrix can be vectorized as (subcarrier-wise)

where , are the array response matrices evaluated on the AoAs/AoD. Notice that the decomposition in (Equation 13) is expressed with equality, which follows from the assumption that the estimation of the support is perfect and only the channel gains are to be estimated. The assumption on the perfect estimation of the support comes from the fact that the channel vectors are assumed to be sparse and the AoA/AoD are estimated perfectly. Since the Fisher Information requires that the mean of the received signal is continuous and differentiable on the parameters to estimate, it cannot be applied to the estimation of the angular parameters. Owing to our lacking space, the solution to the problem of finding the CRLB in a more general case accounting for continuous AoA and AoD is left for future work.

The overall minimum variance for an unbiased estimator of the entries in is given by the sum of the variances for each element , , . Mathematically, following the same notation as with , we will denote the overall variance of the estimator for as , for a given subcarrier . Then, is derived as

An important feature of this estimator is that the difference in performance given by the LS and the WLS estimators is more accentuated as the number of RF chains grows (if and only if the hybrid combiner is not built from orthonormal vectors). Furthermore, this estimator works even though the noise variance is unknown. This is because the noise covariance matrix can be written as , where models the coupling between the different combining vectors. Since the combining vectors are known at the receiver, this matrix is known as well. Therefore, we can write our estimator for the vector containing the channel gains, as

When the noise variance is unknown, it follows from the Slepian-Bangs formula that the CRLB for the vector does not change, since the noise mean does not depend on the variance . Thus, the elements in the FIM modeling the coupling between the noise variance and the channel gains are zero. Therefore, the estimator of is efficient. In fact, it is also the ML estimator for this complex parameter vector, since it can be seen to maximize the Log-Likelihood Function (LLF) of the received signal. Nonetheless, there is no efficient estimator for both and , since the MVUE of depends on the true value for . Despite this, the MLE for can still be found by setting the partial derivative of the LLF to zero. If we use the identities

and

it is possible to find that

Finally, we can consider the information coming from all the subcarriers to enhance the estimation of the noise variance. For the different subcarriers, we have the model

where models the estimation error for the variance in the -th subcarrier. We known that, in high SNR regime, the ML estimator is Gaussian distributed. Therefore, we have a set of linear models for the estimation of the variance. Since the model is linear on the parameter , we find that the ML of the variance is also the BLUE estimator, given by

where is defined as and is the column vector containing a in each position.

The derived estimator for the channel gains can be used in the S-OMP algorithm. In fact, the larger the number of subcarriers, the smaller the estimation variance the ML estimator can achieve. Thereby, if the number of averaging subcarriers is large enough, the lack of knowledge of the sparsity level is not so critical because of two reasons: 1) the computation of the support is more precise due to noise averaging during the projection process, and 2) if the support is estimated correctly, a particular estimate of will be very close to the true noise variance, such that the chosen halting criterion is optimal from the Maximum Likelihood perspective. It should be clear that the higher the covariance between adjacent noise components, the larger the performance gap between the S-OMP and the SW-OMP algorithm will be, which actually depends on the ratio between () and ().

The modification of the S-OMP algorithm to include the MVU estimator for the channel gains, as well as the whitening matrix to estimate the support is provided in Algorithm ?. Notice that the proposed algorithm performs both noise whitening and channel estimation, such that interferences due to noise correlation are mitigated.

### 3.3Subcarrier Selection Simultaneous Weighted-OMP + Thresholding

Despite the use of a single, subcarrier-independent measurement matrix to estimate the frequency-domain channels, the algorithm presented in the previous section exhibits high computational complexity. The SW-OMP algorithm considers the distributed projection coming from every subcarrier; however, a trade-off between performance and computational complexity can be achieved if a small number of subcarriers is used, instead. The problem amounts as to how to choose those subcarriers, since no quality measure is available beforehand. The ideal situation would require knowledge of the Signal-to-Noise Ratio (SNR), which is unknown so far. Nonetheless, we have access to different frequency-domain received vectors , . Therefore, the quality measure to be used could be the -norm of the different vectors. Thereby, the subcarriers having largest -norm can be used to derive an estimate of the support of the already defined sparse channel vectors , .

The main problem concerning Matching Pursuit (MP) algorithms comes from the lack of knowledge of the channel sparsity . For this reason, there is usually an iteration at which paths have been detected but the estimate of the average residual energy is a little larger than the noise variance itself. This makes the algorithm find additional paths which are not actually contained in the MIMO channel. These paths usually have low power, and a pruning procedure is needed to filter out these undesired components.

A reasonable way to prune the undesired paths is to remove those components whose power falls below a given threshold, which can be related to the average power of the component in the estimated sparse vectors having maximum average power. Let us denote this power by . Then, the threshold can be defined as , . The value is taken as . To keep the common sparsity property we must ensure that the channel support after thresholding remains invariant. For this purpose, we define a signal whose -th component is given by , such that measures the average power along the different subbands of each spatial component in the quantized angle grid. The final support after thresholding is defined as . Therefore, the components in indexed by are the final channel gains estimates for each subcarrier. The modification of the proposed SW-OMP algorithm to reduce computational complexity and implement this pruning procedure is provided in Algorithm ?.

## 4Results

This section includes the main empirical results obtained with the two proposed algorithms, SW-OMP and SS-SW-OMP + Thresholding, and comparisons with other the frequency-domain channel estimation algorithms including SSAMP [16] and DGMP [17]. To obtain these results, we perform Monte Carlo simulations averaged over many trials to evaluate the normalized mean squared error (NMSE) and the ergodic rate as a function of SNR and number of training frames . We also provide calculations of the computational complexity for the proposed algorithms in Table ? and prior work in Table ?.

The typical parameters for our system configuration are summarized as follows. Both the transmitter and the receiver are assumed to use Uniform Linear Arrays (ULAs) with half-wavelength separation. Such a ULA has steering vectors obeying the expressions and . We take and . The phase-shifters used in both the transmitter and the receiver are assumed to have quantization bits, so that the entries of the training vectors , , are drawn from a set . The number of RF chains is set to at the transmitter and at the receiver. The number of OFDM subcarriers is set to .

We generate channels according to with the following parameters:

The channel paths are assumed to be independent and identically distributed, with delay chosen uniformly from , with s, as in the IEEE 802.11ad wireless standard.

The AoAs/AoDs are assumed to be uniformly distributed in .

The gains of each path are zero-mean complex Gaussian distributed such that .

The band-limiting filter is assumed to be a raised-cosine filter with roll-off factor of .

The number of delay taps of the channel is set to symbols.

The simulations we perform consider channel realizations in which the AoA/AoD are off-grid, i.e. do not correspond to the angles used to build the dictionary, and also the on-grid case, to analyze the loss due to the model mismatch.

### 4.1NMSE Comparison

One performance metric is the Normalized Mean Squared Error (NMSE) of a channel estimate for a given realization, defined as

The NMSE will be our baseline metric to compute the performance of the different estimators, and will be averaged over many channel realizations. The normalized CRLB (NCRLB) is also provided to compare the average performance of each algorithm with the lowest achievable NMSE, and will also be averaged over many channel realizations

We compare the average NMSE versus SNR obtained for the different channel estimation algorithms in Fig. ? for a practical SNR range of dB to dB, on-grid AoA/AoDs, and two different values for the number of training frames. SW-OMP performs the best, achieving NMSE values very close to the NCRLB. SS-SW-OMP+Th performs similarly to SW-OMP, although there is some performance loss due to the fact that SS-SW-OMP+Th does not employ every subcarrier to estimate the common support of the sparse channel vectors. In SS-SW-OMP+Th, the number of selected subcarriers for the estimation of the support was set to for illustration and the parameter was chosen as , which is a reasonably small value to filter out undesired components in the sparse channel estimate. From the curves shown in Fig. ?, OMP performs poorly for all the SNR range due to the fact that it is not designed to process several vectors which are sparse in a common vector basis. Exploiting common sparsity provides an NMSE reduction of approximately dB, although there are slight variations depending on the SNR. This improvement comes at the cost of a higher offline computational complexity in the proposed algorithms in comparison with OMP, as we show in Section 4.3.

We also observe that the DGMP algorithm from [17] performs the worst, due to the fact that it has been designed to estimate near-LoS mmWave MIMO channels. Since it only estimates a single path, the estimation error for NLOS channels is large. The algorithm SSAMP is also shown for comparison. We can see that, at low SNR regime, the information on the common support is enough to outperform the OMP algorithm, but not at the high SNR regime. This comes from the fact that the algorithm is estimating more than a single path per iteration. Provided that the dictionary matrices are not square in this setting, the redundancy between columns in the transmit and receive array matrices makes it difficult to properly estimate more than a support index per iteration.

(a) | (b) |

The previous simulations showed the performance of the algorithms when the channel fits the on-grid model, but it is also important to analyze the performance in a practical scenario, when the AoA/AoD do not fall within the quantized spatial grid. Fig. ? shows the performance of the different algorithms for off-grid angular parameters. It can be observed that when using there is a considerable loss in performance due to the use of a fixed dictionary to estimate the AoA/AoD. However, the estimation error is below dB for values of SNR in the order of and beyond. On the other hand, provided that the SNR expected in mmWave communication systems is in the order of dB up to dB, the large gap between the proposed algorithms and the NCRLB needs to be reduced. Increasing the size of the dictionary is one of the possible solutions, as shown by the curves corresponding to and .

(a) | (b) |

We show in Figure 2 the performance of the different frequency-domain algorithms when increasing the number of subcarriers. The parameters for the simulation scenario are the same as in Fig. ?, however, the number of subcarriers is set to in this case. is set to subcarriers and . Interestingly, both SW-OMP and SS-SW-OMP+Th are asymptotically efficient since they are both unbiased and attain the NCRLB. A magnified plot for dB is also shown to clearly see the performance gap between the different algorithms and also the NCRLB.

Fig. ? shows the average NMSE vs number of training frames . The number of training frames is increased from to . The remaining parameters in the simulation scenario are the same as in Fig. ?. Results are shown for channel realizations in which the angular parameters fall within the quantized angle grid and when they do not. The average performance of OMP algorithm is poor for all the considered cases, which comes from its inability to exploit the common support property shared by the different subchannels. SW-OMP can be observed to provide the best performance for all the values of and SNR. Furthermore, its performance lies closer to the NCRLB at higher SNR. The larger the number of training frames and the higher the SNR, the estimation of the support is more robust and gets closer to the actual one. The NCRLB provided assumes perfect sparse recovery, which is the reason why the estimation error does not lie within the bound. Nonetheless, in the on-grid case, if the number of training frames is large enough and the SNR is not low, the performance gap between SW-OMP and the NCRLB is lower than dB. The difference in performance between SS-SW-OMP+Th and SW-OMP reduces when either or SNR is increased. As in Fig. ?, there is a big difference in performance between OMP and SW-OMP, depending on both the SNR and the number of frames. We also observe a high loss in performance when the angular parameters do not fall within the grid.

(a) | (b) | (c) |

(d) | (e) | (f) |

### 4.2Spectral efficiency comparison

Another performance metric is the spectral efficiency, computed assuming fully-digital precoding and combining using the dominant left and right singular vectors of the channel estimate. This gives parallel channels each with gain , leading to average spectral efficiency

In Fig. ?(a), we show the achievable spectral efficiency as a function of the SNR for the different channel estimation algorithms when off-grid AoA/AoDs are considered. The simulation parameters are the same as in Fig. ?. The difference in performance between OMP and either of our two proposed algorithms is approximately constant as the SNR increases, which comes from the fact that OMP does not force the channel estimates to share the same support. The two proposed algorithms perform very similarly for all the range of SNR, which is an indicator that subcarriers are enough to obtain a reliable channel estimate. Therefore, SS-SW-OMP+Th can be claimed to be a good trade-off between performance and computational complexity.

(a) | (b) |

Finally, Fig. ?(b) shows the spectral efficiency as a function of the number of training frames for the different channel estimation algorithms. Comparisons are provided for dB. We observe that SW-OMP is the algorithm providing best performance, followed closely by SS-SW-OMP+Th, whilst OMP performs the worst. For low values of SNR, there is a noticeable gap between SW-OMP and the perfect CSI case, which becomes smaller as increases. For small values of , the performance of the sparse recovery algorithms is very poor. Conversely, using frames does not result in a significant improvement in performance. Simulations also show that near-optimal achievable rates can be achieved by using a reasonable number of frames, i.e., .

### 4.3Computational complexity

The computational complexity for each step in the different algorithms is also provided in in Table ?. Since some steps can be performed before running the channel estimation algorithms, we will distinguish between on-line and off-line operations. Values are provided for a single iteration. In the case of the SSAMP algorithm in [16] and the DGMP algorithm in [17], we take the notation used in the corresponding papers for frequency-domain vectors and measurement matrices.

Operation | Complexity |

Maximum of | |

Overall | |

Operation | Complexity |

Find the strongest subcarriers | |

Maximum of | |

Thresholding | |

Overall | |

Operation | Complexity |

Maximum of | |