Millimeter Wave MIMO Channel Estimation using Overlapped Beam Patterns and Rate Adaptation
Abstract
This paper is concerned with the channel estimation problem in millimeter wave (mmWave) wireless systems with large antenna arrays. By exploiting the inherent sparse nature of the mmWave channel, we first propose a fast channel estimation (FCE) algorithm based on a novel overlapped beam pattern design, which can increase the amount of information carried by each channel measurement and thus reduce the required channel estimation time compared to the existing non-overlapped designs. We develop a maximum likelihood (ML) estimator to optimally extract the path information from the channel measurements. Then, we propose a novel rate-adaptive channel estimation (RACE) algorithm, which can dynamically adjust the number of channel measurements based on the expected probability of estimation error (PEE). The performance of both proposed algorithms is analyzed. For the FCE algorithm, an approximate closed-form expression for the PEE is derived. For the RACE algorithm, a lower bound for the minimum signal energy-to-noise ratio required for a given number of channel measurements is developed based on the Shannon-Hartley theorem. Simulation results show that the FCE algorithm significantly reduces the number of channel estimation measurements compared to the existing algorithms using non-overlapped beam patterns. By adopting the RACE algorithm, we can achieve up to a 6 dB gain in signal energy-to-noise ratio for the same PEE compared to the existing algorithms.
I Introduction
Millimeter wave (mmWave) communication has been shown to be a promising technique for next-generation wireless systems due to a large expanse of available spectrum [3, 4, 5, 6]. This spectrum, ranging from 30 GHz to 300 GHz, offers a potential 200 times the bandwidth currently allocated in today’s mobile systems [7]. However, a critical challenge in exploiting the mmWave frequency band is its severe signal propagation loss compared to that over conventional microwave frequencies [5, 8, 9]. To compensate for such a loss, large antenna arrays can be employed to achieve a high power gain. Fortunately, owing to the small wavelength of mmWave signals, these arrays can be packed into small areas at the transmitter and receiver [10, 11]. For such mmWave systems, channel state information (CSI) is essential for effective communication and precoder design. However, the use of large antenna arrays results in a large multiple-input multiple-output (MIMO) channel matrix. This makes the channel estimation of mmWave systems very challenging due to the large number of channel parameters to be estimated. Moreover, owing to the high frequency, it is often not feasible to obtain digital samples from each antenna [3]. To resolve this high frequency sampling problem, analog beamforming techniques have been proposed and widely adopted in the open literature (see [12, 13, 14, 10] and references therein). The main idea of analog beamforming is to control the phase of the signal transmitted or received by each antenna via a network of analog phase shifters.
Using analog beamforming techniques, the most straightforward channel estimation method is to exhaustively search in all possible angular directions. Specifically, consider a system with transmit antennas and receive antennas. If we aim at achieving a minimum angular resolution of , an exhaustive search-based channel estimation would then require a set of transmit beamforming vectors at the transmitter designed to span all possible beam directions and likewise with receive beamforming vectors at the receiver. By searching all possible combinations, an matrix can be formed whose entries represent the channel gains between the transmit and the receive beams. This matrix is commonly referred to as the virtual channel matrix [15, 16, 17, 18, 19]. Despite the large number of entries expected for the mmWave MIMO channel matrix, it has been shown in recent measurements [20, 21] that the mmWave channel exhibits sparse propagation characteristics in the angular domain. That is, there are only a few dominant propagation paths in mmWave channels. This sparsity can be seen in the virtual channel matrix, as only a limited number of transmit and receive direction combinations have a nonzero gain [15]. Therefore, the key objective of mmWave channel estimation is to identify these paths so that the transceiver can align the transmit and receive beams along these paths.
Recently, some compressed-sensing-based channel estimation algorithms have been proposed to exploit the channel sparsity in mmWave systems, e.g., [1, 5, 10, 22]. The fundamental idea in some of these approaches is to search multiple transmit/receive directions in each measurement, by creating initial beam patterns that span a wider angular range than those used by the exhaustive search. Similar adaptive beamforming algorithms and multistage codebooks were proposed in [23, 14, 24, 25]. More recent work [26] has also shown that such hierarchical codebooks can be achieved with a single RF chain by exploiting subarray and deactivation (turning-off) antenna processing techniques. By initially using wider beam patterns, multistage approaches are able to reduce the number of measurements required for channel estimation. However, this introduces a loss of directionality gain, leading to a lower signal-to-noise ratio (SNR) at the receiver and a higher probability of estimation error (PEE). In this sense, there exists a challenging trade-off between estimation time and accuracy for mmWave channel estimation.
As one of the seminal works on multistage codebook approaches, [5] developed a “divide and conquer” search-type algorithm to estimate sparse mmWave channels. As shown in Fig. 1, in each stage of this algorithm, the possible ranges of angles of departure (AODs) and angles of arrival (AOAs) are both divided into non-overlapped angular subranges. Correspondingly, non-overlapped beam patterns are designed at both the transmitter and receiver such that each transmit (receive) beam pattern exactly covers one AOD (AOA) subrange. The channel estimation carried out in each stage consists of time slots. In each time slot, the pilot signal is transmitted using one of the beam patterns at the transmitter, and then collected by one of the beam patterns at the receiver. The corresponding channel output for each pair of transmit-receive beam patterns can then be obtained. These time slots span all the combinations of transmit-receive beam patterns. By comparing the magnitudes of the corresponding channel outputs, the transmit/receive subranges that the AOD/AOA most likely belong to are determined. The receiver can then feed back the AOD information for use at the transmitter. Afterwards, the algorithm will limit the estimation to the angular subrange identified at each link end in the previous stage and further divide it into subranges for the channel estimation in the next stage. An example of multistage angular refinement, using our proposed approach, can be seen in Fig. 3. This process continues until the smallest beam width resolution is reached. It is shown in [5] that the algorithm requires estimation time proportional to per path. Despite the significant improvement when compared to the exhaustive search approach, such a channel estimation algorithm still might not be quick enough to track the fast channel variations, especially for mmWave mobile channels with rapidly changing parameters.
Furthermore, at high SNR it may not be necessary to perform so many measurements, which would result in an unnecessary time delay. Adaptive training approaches are also investigated in [27, 28, 29, 30]. While these approaches are shown to significantly improve system performance as the number of measurement iterations is increased, there is generally no adaptation of the number of measurements with respect to the associated probability of success or channel conditions.
Motivated by this, in this paper we develop a fast mmWave MIMO channel estimation framework by designing a set of novel overlapped beam patterns that can significantly reduce the number of channel estimation measurements. In order to improve estimation accuracy, we then introduce a novel rate-adaptive channel estimation approach, where the average number of channel measurements is adapted to channel conditions. The main contributions of this paper are summarized as follows:

Relying on a novel overlapped beam pattern design, we first present a fast channel estimation (FCE) algorithm for mmWave systems. In this algorithm, we develop a maximum likelihood (ML) detector to optimally extract the channel AOD/AOA information from the measurements. We also design a linear minimum mean squared error (LMMSE) channel estimator to estimate the channel coefficients by optimally combining the selected measurements in all stages.

We then develop a rate-adaptive channel estimation (RACE) algorithm, in which additional measurements are permitted when the current measurements are found to have an inadequate probability of success. In this way, the number of measurements can adapt to the channel conditions and the channel estimation accuracy can be significantly improved with minimal measurements.

We analyze the probability of channel estimation error for the FCE algorithm. In particular, we derive a closed-form approximation, lower bound and upper bound for the PEE. Based on the Shannon-Hartley theorem, we also provide some theoretical analysis for the minimum energy required to estimate the channel using the RACE algorithm.

Finally, we compare the performance of the proposed algorithms to that of the algorithm in [5] with non-overlapped beam patterns. Simulation results show that both of the proposed algorithms can significantly reduce the number of channel measurements compared to [5]. We show that the FCE algorithm achieves a guaranteed reduction of channel measurements, at the expense of estimation accuracy. On the other hand, the RACE algorithm can achieve the same average reduction of channel measurements as the FCE algorithm, but using up to 6 dB less signal energy compared to the algorithm in [5].
Notation: is a matrix, is a vector, is a scalar, and is a set. is the 2-norm of , is the determinant of and represents the cardinality of . , and are the transpose, conjugate transpose and conjugate of , respectively. For a square matrix , represents its inverse. We use to denote a diagonal matrix with entries of on its diagonal. is the identity matrix, is an all-ones column vector and denotes the ceiling function. is a complex Gaussian random vector with mean and covariance matrix . and denote the expected value and covariance of , respectively. denotes the Kronecker product of and whereas denotes the row-wise Kronecker product of and .
II System Model
Consider a mmWave MIMO system composed of transmit antennas and receive antennas. We consider that both the transmitter and receiver are equipped with a limited number of radio frequency (RF) chains. Following [5], we further assume that these RF chains, at one end, can only be combined to form a single beam pattern, indicating that only one pilot signal can be transmitted and received at one time. In this paper, for simplicity, we consider unconstrained beamforming vectors by ignoring some practical constraints imposed by hardware, such as constant amplitude and quantized phase shifters. However, in practice, our unconstrained beamforming vectors could be realized by using a network of constrained beamformers with quantized phase shifters and constant amplitude, as in the hybrid beamforming approach adopted in [5] and depicted by Figure 2 therein. To estimate the channel matrix, the transmitter sends a pilot signal , with unit energy (), to the receiver. Denote by and , respectively, the beamforming vector at the transmitter and beamforming vector at the receiver. The corresponding channel output can be represented as
(1) 
where denotes the MIMO channel matrix, is the transmit power and is an complex additive white Gaussian noise (AWGN) vector following distribution .
In this paper we follow [31] and adopt a two-dimensional (2D) sparse geometric-based channel model. Specifically, we consider an path channel between the transceiver, with the th path having steering AOD, , and AOA, where . Then the corresponding channel matrix can be expressed in terms of the physical propagation path parameters as
(2) 
where is the fading coefficient of the th propagation path, and and respectively denote the transmit and receive spatial signatures of the th path. To simplify the analysis, we assume that the transmitter and receiver have the same number of antennas (i.e., ). However, it is worth pointing out that the developed schemes can be easily extended to a general asymmetric system. If uniform linear antenna arrays (ULA) are employed at both the transmitter and receiver, we can define and , respectively, where
(3) 
Here, the steering angle, , is related to the physical angle by with denoting the signal wavelength (footnote 1: the use of a ULA results in no distinguishable difference between AODs and or between AOAs and ; hence, only AODs and AOAs in the range need to be considered). A similar expression can be written for at the receiver. With half-wavelength spacing, the distance between antenna elements becomes .
From (2), we can see that the overall channel state information of each path includes only three parameters, i.e., the AOD , the AOA , and the fading coefficient . We assume that the fading coefficient of each path follows a complex Gaussian distribution with zero mean and variance and that both and can only take some discrete values from the set . Here, for the sake of ensuing mathematical problem formulation, we only consider the discrete AOA and AOD. It is noteworthy that they can be continuous in practice. However, the extension to the case with continuous AOD/AOA may require the consideration of other more practical issues such as the number of RF chains to realize the beam patterns and the hardware constraints (e.g., quantized phase shifters) imposed on the RF beamforming vectors, which may constitute a new paper. We thus have left this extension as our future work.
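The geometric model above can be illustrated with a short numerical sketch. The ULA response below uses the common convention of a unit-norm vector with a linear phase progression in the steering angle; since (3) is not legible in this copy, that form, the grid of discrete angles, and the names `ula_response` and `geometric_channel` are all assumptions made for illustration only.

```python
import numpy as np

def ula_response(n_antennas, steering_angle):
    """Unit-norm ULA response for a steering angle in radians (assumed convention)."""
    n = np.arange(n_antennas)
    return np.exp(1j * n * steering_angle) / np.sqrt(n_antennas)

def geometric_channel(n_antennas, aods, aoas, alphas):
    """Sum over paths of alpha * a_r(AOA) a_t(AOD)^H, one rank-one term per path."""
    H = np.zeros((n_antennas, n_antennas), dtype=complex)
    for aod, aoa, alpha in zip(aods, aoas, alphas):
        a_t = ula_response(n_antennas, aod)
        a_r = ula_response(n_antennas, aoa)
        H += alpha * np.outer(a_r, a_t.conj())
    return H

N = 16                                   # illustrative array size
grid = 2 * np.pi * np.arange(N) / N      # hypothetical discrete steering angles
alpha = 0.8 - 0.6j                       # illustrative fading coefficient
H = geometric_channel(N, aods=[grid[3]], aoas=[grid[10]], alphas=[alpha])

# Beams aligned with the path recover the fading coefficient exactly.
a_t = ula_response(N, grid[3])
a_r = ula_response(N, grid[10])
aligned_gain = a_r.conj() @ H @ a_t
```

With a single path the matrix has rank one, which is the sparsity that the estimation algorithms in the following sections rely on.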
We aim to find an efficient way to estimate the three parameters for each path. The key challenge here is how to design a sequence of and in such a way that the channel parameters can be quickly and accurately estimated. We consider pairs of beam patterns that are designed to span all possible transmit-receive combinations. Denote by and , respectively, the transmit and receive beamforming vectors adopted in the th channel measurement time slot such that . Similar to [5], we assume the same pilot symbol is transmitted during the time slots. Then, after time slots, we can obtain a sequence of measurements represented as
(4) 
where describes the channel input-output relationship for a given set of transmit and receive beamforming vectors defined by
(5) 
and
(6) 
is an vector of the corresponding noise terms. Note that since , , the vector follows the same distribution as that of , i.e., .
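As a rough illustration of the measurement model described above, the sketch below collects one scalar output per time slot from a pair of unit-norm beamforming vectors. The exact expressions (1) and (4)-(6) are not legible in this copy, so the form used here (square root of the transmit power times the beamformed channel times the pilot, plus beamformed noise) is an assumption based on the surrounding description, and `measure` and its arguments are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)

def measure(H, F, W, power, noise_var, pilot=1.0 + 0.0j):
    """Return y_m = sqrt(P) w_m^H H f_m x + w_m^H n_m for each slot m.

    F and W hold the transmit and receive beamforming vectors as columns.
    """
    n_rx, n_slots = W.shape
    noise = np.sqrt(noise_var / 2) * (
        rng.standard_normal((n_rx, n_slots))
        + 1j * rng.standard_normal((n_rx, n_slots))
    )
    # einsum computes w_m^H H f_m for every slot m in one pass.
    signal = np.sqrt(power) * np.einsum("nm,nk,km->m", W.conj(), H, F) * pilot
    return signal + np.einsum("nm,nm->m", W.conj(), noise)

N, M = 8, 4                              # illustrative sizes
H = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
F = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
W = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
F /= np.linalg.norm(F, axis=0)           # unit-norm transmit beamformers
W /= np.linalg.norm(W, axis=0)           # unit-norm receive beamformers

y = measure(H, F, W, power=1.0, noise_var=0.1)
```

Because each receive vector has unit norm, the beamformed noise term keeps the per-sample variance of the antenna noise, which is the property noted at the end of the paragraph above.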
Motivated by the geometric sparsity of the mmWave channel, in the following two sections, we propose to use a set of overlapped beam patterns that are able to estimate the AOD/AOA information very quickly. We then extend the algorithm to use a rate-adaptive estimation approach. The adaptive nature of the algorithm permits additional measurements to be performed under poor channel conditions. This allows the fast channel estimation to be carried out with significant accuracy and energy efficiency.
III Fast Channel Estimation with Overlapped Beam Patterns
In this section, we develop a fast channel estimation framework for mmWave systems using overlapped beam patterns, as illustrated in Fig. 3. Specifically, we design a set of beam patterns that are adopted in different measurement time intervals and are overlapped with one another in the angular domain. A maximum-likelihood-based estimation algorithm is then proposed to accurately retrieve the channel state information from the set of measurements. The proposed channel estimation algorithm also works in a multistage manner similar to that in [5], where each stage reduces the possible subranges in which the AOD/AOA are expected to be found.
III-A An Example of Overlapped Beam Pattern Design
We will first explain the design principle of overlapped beam patterns using a simple example. Following Fig. 1, we divide the AOD/AOA angular spaces into subranges in the first stage, denoted by , and , respectively. However, instead of using 3 beam patterns to cover them at each transceiver end as in Fig. 1(a), we propose to use only 2 overlapped beam patterns to achieve this. Fig. 2(a) illustrates our designed beam patterns in the first stage. We can see that the first and second beam patterns cover , and , , respectively, and are overlapped in the whole range of . Intuitively, if a path is observed in two measurements using adjacent beam patterns, the AOD or AOA must belong to the overlapped subrange of these two beam patterns. It is also seen that each beam pattern can have different amplitudes in different subranges. We represent the amplitudes of each beam pattern in different subranges by a vector. For beam patterns 1 and 2 in Fig. 2(a), these vectors are respectively defined as
(7) 
where corresponds to the th beam pattern with denoting the amplitude of the th beam pattern in subrange . By using measurement time slots, we can then span all beam pattern combinations between the transceiver. We denote the sequential set of beam patterns respectively adopted at the transmitter and receiver by
(8)  
(9) 
We refer to these as the beam pattern design matrices, with their th row denoting the beam pattern adopted in the th measurement time slot, where . The efficient design of these beam patterns can lead to many solutions. However, one desirable property is that the same quantity of signal energy is transmitted/received via each subrange over all measurements, i.e., each column of (8)-(9) should have the same Euclidean norm. This provides the same accuracy for each possible subrange combination. Another desirable property is that the transmit/receive beamforming gains of all measurement patterns are equal, i.e., each row of (8)-(9) should have the same energy.
One possible way to make the beam pattern subrange amplitudes in (7) follow the aforementioned properties is to normalize and to have unit energy in each row and equal energy in each column. For the beam patterns in Fig. 2, we have , as beam patterns and do not cover, respectively, the third and first subranges. Due to the symmetry between the two beam patterns, we further have and . This leads to and , and the matrices in (8)-(9) become
In order to observe the resultant amplitude gains (i.e., the transceiver gain) over each of the subrange combinations, we introduce another matrix referred to as the generator matrix. We define this as the row-wise Kronecker product between the transmit beam pattern design matrix and the receive beam pattern design matrix such that
(11)  
recalling that and represent the row-wise Kronecker and Kronecker product operations, respectively. By denoting and as the entry on the th and th column on the th row in and , respectively, we can express the th column of as
(13) 
where the relationship is a result of the Kronecker product operation.
The columns of the generator matrix describe the received measurement gains over each of the subrange combinations. For example, if a path were present between the transmit subrange and the receive subrange , the measurement vector would be expected to be a scalar multiple of column of , expressed as . It is important to note that the sets of beam patterns used in this paper are not unique. We later show that their performance, in terms of PEE, depends only on the Euclidean distance between each pair of columns of , which directly determines the probability that one column corresponding to a certain subrange is mistaken for another. In the example set shown above, each column has the same minimum Euclidean distance to the other columns, although some have more spatial neighbours at this minimum distance than others.
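The three-subrange example can be checked numerically. In the sketch below the two amplitudes are not copied from the source (they are not legible in this copy) but are derived from the two stated properties: unit energy per row and equal energy per column, together with the symmetry between the two beam patterns. The slot ordering of the rows and the names `B_T`, `B_R`, and `rowwise_kron` are assumptions for illustration.

```python
import numpy as np

# Beam pattern 1 covers subranges 1-2, beam pattern 2 covers subranges 2-3.
# Unit row energy: a^2 + b^2 = 1. Equal column energy over the 4 slots:
# 2 a^2 = 4 b^2. Together these give a^2 = 2/3 and b^2 = 1/3.
a, b = np.sqrt(2 / 3), np.sqrt(1 / 3)
b1 = np.array([a, b, 0.0])
b2 = np.array([0.0, b, a])

# Four slots spanning all transmit/receive pattern pairs (assumed ordering).
B_T = np.vstack([b1, b1, b2, b2])
B_R = np.vstack([b1, b2, b1, b2])

def rowwise_kron(A, B):
    """Row m of the result is kron(A[m], B[m])."""
    return np.einsum("mi,mj->mij", A, B).reshape(A.shape[0], -1)

G = rowwise_kron(B_T, B_R)               # 4 measurements x 9 subrange pairs

# Pairwise Euclidean distances between the columns of G.
dist = np.linalg.norm(G[:, :, None] - G[:, None, :], axis=0)
np.fill_diagonal(dist, np.inf)
min_dist = dist.min(axis=0)
neighbours = np.isclose(dist, min_dist[0]).sum(axis=0)
```

Under these assumptions every column has the same minimum distance to the others, while the number of columns at that distance varies between two and four, in line with the remark about spatial neighbours.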
III-B Beamforming Vector Design
To generate the beam patterns illustrated in Fig. 2(a) and described in Section III-A, the transmit and receive beamforming vectors should be designed as follows. Denote by and , respectively, the transmit beamforming vector and receive beamforming vector corresponding to the th pair of beam patterns in and . We then design the product of the transmit array response and transmit beamforming vector to have
(14) 
and the product of the receiver array response and receive beamforming vector to have
(15) 
where has been defined in (3) and is a scalar constant that ensures . Physically, corresponds to the average directivity gain of each beam pattern and is the same for all due to the normalization of the rows of the beam pattern design matrices in Section III-A. Eqs. (14) and (15) can be expressed in matrix form as
(16) 
and
(17) 
where denotes the cardinality of set and is a matrix whose columns describe the antenna array response at each angle. Therefore and can be designed as
(18) 
and
(19) 
where is the pseudo-inverse of .
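The pseudo-inverse design step can be sketched as follows. The sketch assumes a hypothetical grid of steering directions on which the array responses are orthonormal, so the desired pattern is reproduced exactly; the grid, the array size, and the name `design_beamformer` are illustrative assumptions rather than the paper's exact construction in (14)-(19).

```python
import numpy as np

N = 12                                   # antennas and grid points (illustrative)
u = -1 + 2 * np.arange(N) / N            # hypothetical grid of normalized directions
# Columns of A are unit-norm array responses at the grid directions.
A = np.exp(1j * np.pi * np.outer(np.arange(N), u)) / np.sqrt(N)

def design_beamformer(A, subrange_amplitudes):
    """Solve A^H f = C z in the least-squares sense, with C fixing ||f|| = 1.

    z repeats each subrange amplitude over the grid angles in that subrange.
    """
    k = len(subrange_amplitudes)
    z = np.repeat(np.asarray(subrange_amplitudes, dtype=complex), A.shape[1] // k)
    f = np.linalg.pinv(A.conj().T) @ z
    return f / np.linalg.norm(f), z

# Beam pattern covering the first two of three subranges.
f, z = design_beamformer(A, [np.sqrt(2 / 3), np.sqrt(1 / 3), 0.0])
pattern = np.abs(A.conj().T @ f)         # realized amplitude at each grid angle
```

On this grid the realized pattern is proportional to the desired amplitudes, and the proportionality constant plays the role of the scalar that enforces a unit-norm beamforming vector.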
III-C Channel Measurements
We now perform channel estimation in the first stage using the previously designed transmit and receive beamforming vectors and . In each time slot, the beamforming vectors and are adopted to transmit/receive the pilot signal . If we substitute the channel in (2) into (5), we get
(20) 
Without loss of generality, let us consider the case with AOD, and AOA, , where and are, respectively, the transmit and receive subrange indices of the th propagation path. By recalling (14)-(15) we write
(21) 
which leads to
(22) 
We can see from (13) that the vector term in (22) is the weighted sum of the columns of i.e., the weighted sum of columns , in . Therefore we can express by the generator matrix as
(23) 
where is a sparse row vector that describes the channel gain at each of the subrange combinations by
(24) 
and zero otherwise. For example, with , a single path (i.e., ) with coefficient on the first transmit subrange (i.e., ) and second receive subrange (i.e., ) leads to and
(25) 
Finally by using (23), we can rewrite the received channel output vector defined in (1), after measurements, as
(26) 
III-D Maximum Likelihood Detection of AOD/AOA Information
We now require an efficient means of detecting given that a generator matrix has been used to obtain the channel outputs in (26). This subsection describes how to implement maximum likelihood detection [32], chosen for its optimal detection properties, to extract the AOD/AOA information from the received measurements. We begin by considering the distribution of . From (26), this can be expressed as
(27)  
(28) 
Recall that , where is the identity matrix. Also recall that the pilot signal has unit energy, i.e., . For the signal component, as each of the path coefficients has zero mean, we can write and
(29)  
(30) 
By defining a binary version of , denoted by , with elements defined by
(31) 
we can separate the AOD/AOA information in from each of the path coefficient variances, . As each path coefficient has variance , this then gives . We can then rewrite the distribution of as
(32) 
where
(33) 
It can now be seen that follows a zero mean, circularly symmetric complex Gaussian (CSCG) distribution with corresponding probability density function (PDF) defined as [33]
(34) 
Now let us find the conditional probability of , given the receive measurement vector and knowledge of , denoted by . Define as the set of all possible binary channel realizations such that . We also define to represent the cardinality of this set. Following the principle of maximum likelihood detection and based on Bayes' rule [34], we can express the probability of for all possible as
(35) 
where the term
(36) 
is independent of a particular channel realization. We assume that each channel realization is equiprobable, therefore
(37) 
We then denote the probability that the th element of has a path by . We can express this probability as the sum of all in (35) in which by
(38) 
Following the maximum likelihood approach we then find the most likely subrange combination by
(39) 
Finally by finding the most likely transmitter and receiver subranges through
(40) 
we can reduce the ranges of possible AOD and AOA to, respectively, the th transmit and th receive angular subranges. Each of these two subranges will be further divided into another subranges for the channel estimation in the next stage.
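For the single-path case, the detection step can be sketched in a few lines. The covariance used below (transmit power times path variance times the outer product of a generator column, plus white noise) is an assumption consistent with the description around (32)-(33), which are not legible in this copy. The function name `ml_detect`, the generator matrix ordering, and the numerical values are hypothetical.

```python
import numpy as np

def rowwise_kron(A, B):
    return np.einsum("mi,mj->mij", A, B).reshape(A.shape[0], -1)

a, b = np.sqrt(2 / 3), np.sqrt(1 / 3)
b1, b2 = np.array([a, b, 0.0]), np.array([0.0, b, a])
G = rowwise_kron(np.vstack([b1, b1, b2, b2]), np.vstack([b1, b2, b1, b2]))

def ml_detect(y, G, power, path_var, noise_var):
    """Posterior over single-path subrange pairs under an equiprobable prior."""
    m, n_cols = G.shape
    log_lik = np.empty(n_cols)
    for k in range(n_cols):
        g = G[:, k]
        # Zero-mean circularly symmetric Gaussian with rank-one signal covariance.
        cov = power * path_var * np.outer(g, g.conj()) + noise_var * np.eye(m)
        _, logdet = np.linalg.slogdet(cov)
        quad = np.real(y.conj() @ np.linalg.solve(cov, y))
        log_lik[k] = -quad - logdet - m * np.log(np.pi)
    post = np.exp(log_lik - log_lik.max())   # subtract the max for stability
    post /= post.sum()
    return int(np.argmax(post)), post

K = 3
y = G[:, 1].astype(complex)              # noise-free path on column index 1
k_hat, post = ml_detect(y, G, power=1.0, path_var=1.0, noise_var=0.01)
k_t, k_r = divmod(k_hat, K)              # zero-based transmit/receive subranges
```

The index mapping in the last line assumes the transmit-major column ordering produced by `rowwise_kron`; a different ordering of the generator matrix columns would need a different mapping.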
III-E Multistage Generalization
In general, the proposed channel estimation algorithm works in a multistage manner similar to that in [5], requiring stages. We show a high-level overview of this process in Fig. 3. In the th stage, we initially divide the possible AOA angular space into non-overlapped subranges and divide the possible AOD angular space into . Then only overlapped beam pattern pairs will be designed at the transmitter and receiver to cover these subranges. The designed beam patterns are characterized by the beam pattern design matrices and . These should be generated to maximize the minimum Euclidean distance between the columns of the corresponding generator matrix .
Given and , we can then generate both the transmit beamforming vectors and receive beamforming vectors respectively in the same way as in (18)-(19). For example, to generate , the corresponding vector in (16), which is redefined as for rigor, should be designed such that its th entry, denoted by , satisfies
(41) 
where is a scalar constant for the th stage to guarantee that satisfies . Physically, describes the desired beam pattern amplitude at angle when is used. Each receive beamforming vector can be designed in the same way.
The channel output in the th estimation stage can then be obtained after time slots by
(42) 
where
(43) 
and denotes the transmit power of the pilot signal in the th stage. Similar to that in [5], we prefer that all the stages have an equal probability of failure, indicating that we should allocate power among stages inversely proportional to the beamforming gains of these beam patterns, i.e.,
(44) 
where is a constant. Similar to (39) we then find the most likely subrange combination of the th stage by
(45) 
with the corresponding most likely transmitter and receiver subranges given by
(46) 
The selected subranges, and are then used for the channel estimation in the next stage. This process continues until the minimum angle resolution is reached requiring stages. It is worth pointing out that although the proposed algorithms are elaborated based on the estimation process of a single path, their implementation in a multipath scenario is actually feasible by following the same procedure as in [5, Algorithm 2]. More specifically, multiple paths are estimated sequentially, with the first path being estimated using the multistage algorithms described above. Subsequent paths can then be found by returning to the first stage and repeating the estimation. Moreover, in each stage’s measurements, the expected contributions from all previously estimated paths can be subtracted to reveal new paths.
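The stage-by-stage narrowing of the angular range can be sketched with a toy refinement loop. To keep the example short, the per-stage decision is replaced by an oracle that always picks the correct subrange, so the code illustrates only the bookkeeping of the subranges, not the ML detection itself. The grid size, the function name `refine`, and the assumption that the grid size is an exact power of the number of subranges are illustrative.

```python
def refine(true_index, n_grid, k):
    """Narrow a range of grid indices [lo, hi) by a factor k per stage."""
    lo, hi = 0, n_grid
    choices = []
    while hi - lo > 1:
        width = (hi - lo) // k
        sub = (true_index - lo) // width   # oracle stand-in for the stage decision
        choices.append(sub)
        lo, hi = lo + sub * width, lo + (sub + 1) * width
    return lo, choices

# 81 grid angles with 3 subranges per stage are resolved in 4 stages.
estimate, choices = refine(true_index=50, n_grid=81, k=3)
```

Each entry of `choices` is the zero-based subrange selected in one stage, and the loop ends once the remaining range holds a single grid angle.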
III-F Estimation of the Fading Coefficient
Once all estimation stages described in the previous subsection have been performed, we estimate the identified path fading coefficient . In [5], the value of was estimated based on the measurement of the final stage only. To improve the estimation accuracy, we estimate by using all measurements in all stages of the algorithm. Denote by and , respectively, the vector of all received measurements and the vector of their estimates such that
(47) 
where denotes the th column of . Provided that the AOD/AOA estimation is correct in each stage, we can write
(48) 
where is the vector of corresponding noise terms. In the case where the AOD/AOA estimation is incorrect, the estimation of the fading coefficient is not important as there will be a beam misalignment between the transmitter and receiver. Following the LMMSE principle [35], we can then estimate the fading coefficient as
(49) 
where is an identity matrix. Now we can formally describe the proposed fast channel estimation algorithm using overlapped beam patterns in Algorithm 1.
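The LMMSE step can be illustrated with the textbook estimator for a zero-mean scalar observed through a known vector in white noise. Since (49) is not legible in this copy, the expression below is the standard form and is assumed, not confirmed, to match it. The stacked gain vector `g`, the function name `lmmse_alpha`, and the numerical values are hypothetical.

```python
import numpy as np

def lmmse_alpha(y, g, path_var, noise_var):
    """LMMSE estimate of alpha from y = alpha * g + n.

    alpha has zero mean and variance path_var; n is white with variance noise_var.
    """
    cov_y = path_var * np.outer(g, g.conj()) + noise_var * np.eye(len(g))
    return path_var * (g.conj() @ np.linalg.solve(cov_y, y))

rng = np.random.default_rng(2)
# Hypothetical stacked gains: the selected generator matrix column of each
# stage, already scaled by the square root of that stage's transmit power.
g = np.array([0.67, 0.0, 0.47, 0.0, 0.33, 0.33, 0.33, 0.33])
alpha = 0.5 + 0.5j
noise_var = 0.01
noise = np.sqrt(noise_var / 2) * (
    rng.standard_normal(g.size) + 1j * rng.standard_normal(g.size)
)
y = alpha * g + noise

alpha_hat = lmmse_alpha(y, g, path_var=1.0, noise_var=noise_var)
# Equivalent scalar form obtained with the matrix inversion lemma.
alpha_closed = (g.conj() @ y) / (np.vdot(g, g).real + noise_var)
```

The scalar form makes the benefit of combining all stages visible: every additional measurement adds to the energy term in the denominator and reduces the relative weight of the noise.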
Remark 1. It can be seen that, compared with the channel estimation algorithm in [5] with the same value of , our proposed algorithm also requires stages, but the number of measurement time slots required in each stage reduces to , instead of . In general, this yields a reduction in measurement time slots. For the example of discussed earlier with , a increase in estimation rate can be achieved.
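The saving in Remark 1 can be made concrete with simple counting for the three-subrange example, where the non-overlapped design uses three beam patterns per link end and the overlapped design uses two. The grid size of 81 angles and the helper `num_stages` are hypothetical choices for illustration.

```python
def num_stages(n_grid, k):
    """Smallest number of stages S such that k**S >= n_grid."""
    stages, size = 0, 1
    while size < n_grid:
        size *= k
        stages += 1
    return stages

N, K = 81, 3                             # illustrative grid size and subranges
S = num_stages(N, K)

slots_non_overlapped = S * K ** 2        # 3 x 3 pattern pairs per stage
slots_overlapped = S * 2 ** 2            # 2 x 2 pattern pairs per stage
speedup = slots_non_overlapped / slots_overlapped
```

For these values the total drops from 36 to 16 time slots per path, i.e. the overlapped design completes the estimation 2.25 times as fast.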
IV Rate Adaptive Channel Estimation Algorithm
The proposed channel estimation scheme explained in the previous section uses a fixed where the detector is forced to make a decision after measurements, irrespective of what the computed probability may be. Leveraging the detection method developed in the previous section, we now propose a novel rate-adaptive channel estimation (RACE) algorithm.
We first introduce a target maximum probability of estimation error (PEE), denoted by . The basic principle of the RACE algorithm is that after the initial measurements are completed in any given stage, if the most likely subrange combination probability does not satisfy , then additional measurements will be performed. To this end, the receiver will feed back the current most likely transmit subrange, , and also the information indicating whether more measurements are required or not.
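The stopping rule can be sketched for a single stage and a single path as follows. The text up to this point does not say which beam patterns the additional measurements use, so the sketch simply repeats the same set of patterns in every round; that choice, the stopping threshold written as one minus the target PEE, the cap `max_rounds`, and all function names are assumptions made for illustration.

```python
import numpy as np

def rowwise_kron(A, B):
    return np.einsum("mi,mj->mij", A, B).reshape(A.shape[0], -1)

def posteriors(y, G, power, path_var, noise_var):
    """Single-path posteriors over the columns of G (equiprobable prior)."""
    m = len(y)
    log_lik = np.empty(G.shape[1])
    for k in range(G.shape[1]):
        g = G[:, k]
        cov = power * path_var * np.outer(g, g.conj()) + noise_var * np.eye(m)
        _, logdet = np.linalg.slogdet(cov)
        log_lik[k] = -np.real(y.conj() @ np.linalg.solve(cov, y)) - logdet
    post = np.exp(log_lik - log_lik.max())
    return post / post.sum()

def race_stage(G, true_col, alpha, power, path_var, noise_var, gamma, max_rounds, rng):
    """Keep measuring until the best posterior reaches 1 - gamma or rounds run out."""
    y = np.empty(0, dtype=complex)
    G_used = np.empty((0, G.shape[1]))
    for rounds in range(1, max_rounds + 1):
        noise = np.sqrt(noise_var / 2) * (
            rng.standard_normal(G.shape[0]) + 1j * rng.standard_normal(G.shape[0])
        )
        # Assumption: each extra round reuses the same set of beam patterns.
        y = np.concatenate([y, np.sqrt(power) * alpha * G[:, true_col] + noise])
        G_used = np.vstack([G_used, G])
        post = posteriors(y, G_used, power, path_var, noise_var)
        if post.max() >= 1 - gamma:
            break
    return int(post.argmax()), float(post.max()), rounds

a, b = np.sqrt(2 / 3), np.sqrt(1 / 3)
b1, b2 = np.array([a, b, 0.0]), np.array([0.0, b, a])
G = rowwise_kron(np.vstack([b1, b1, b2, b2]), np.vstack([b1, b2, b1, b2]))

rng = np.random.default_rng(1)
k_hat, p_max, rounds = race_stage(
    G, true_col=4, alpha=1.0 + 0.0j, power=1.0, path_var=1.0,
    noise_var=0.05, gamma=0.01, max_rounds=10, rng=rng,
)
```

The number of rounds returned varies with the noise realization and the fading coefficient, which is how the measurement count adapts to channel conditions in this simplified setting.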