Efficient Beam Training and Channel Estimation for Millimeter Wave Communications Under Mobility
Abstract
In this paper, we propose an efficient beam training technique for millimeter-wave (mmWave) communications. When users are highly mobile, conventional beam training must be performed more frequently so that the users can acquire channel state information (CSI) accurately. Since this incurs high resource overhead, we introduce a dedicated beam training protocol that sends training beams separately to a specific high-mobility user (called a target user) without changing the periodicity of the conventional beam training. The dedicated beam training does not require much resource since only a small number of training beams are sent to the target user. In order to achieve good system performance with low training overhead, we design an optimal beam selection strategy that finds the beamforming vectors yielding the lowest channel estimation error based on the target user’s probabilistic channel information. This dedicated beam training is combined with a new greedy channel estimator that estimates the mmWave channel while accounting for the sparsity and dynamics of the target user’s channel. Our numerical evaluation demonstrates that the proposed beam training scheme can maintain good channel estimation performance with significantly less training overhead than the conventional beam training protocols.
I Introduction
In recent years, wireless communications using the millimeter-wave (mmWave) frequency band have received a great deal of attention as a means to meet the ever-increasing throughput demand of next-generation communication systems [1, 2, 3]. The millimeter-wave band covers frequencies ranging from 30 GHz to 300 GHz, which are much higher than those used in current cellular systems. As shown in Fig. 1, there is a vast amount of bandwidth that has not been explored by current communication systems. Notwithstanding the great promise and potential benefit, there are some drawbacks and obstacles that need to be addressed for the commercialization of mmWave-based communication systems. One major obstacle is the significant path loss of mmWave channels [4, 5, 6]. Compared to conventional communication systems using microwave radio waves, the mmWave band experiences high atmospheric attenuation because the transmit signal is absorbed by gas and humidity. Additionally, there is a significant path loss when the signal is blocked by obstacles such as buildings, foliage, and the user’s body [2]. The key enabler to overcome this drawback is the beamforming technique, in which two communication entities transmit and receive signals with appropriately adjusted phase and amplitude using an array of antennas [3, 7, 8, 9]. Since the wavelength in mmWave communication systems is in the range of one to ten millimeters, a large number of antenna elements can be integrated into a small form factor, enabling highly directional beamforming that compensates for the large path loss of mmWave channels.
Over the years, various beamforming strategies, such as hybrid beamforming, switched beamforming, and multistage beamforming, have been introduced [10, 11, 12, 13].
In a nutshell, the conventional beamforming protocol consists of two main steps. The first step is beam training [3, 9]. In this step, the basestation periodically transmits known training symbols to the mobile users. Using the received beams, a user acquires the channel state information (CSI), which corresponds to the channel gains for all pairs of transmit and receive antennas. In the widely used beam-cycling scheme, for example, the basestation sequentially transmits beams steered in equally spaced directions. Each user estimates its own CSI and then feeds back the estimated CSI to the basestation. Using the obtained CSI, the basestation performs the precoding of data symbols, which corresponds to the second step of beamforming. In this step, the basestation transmits the precoded symbols using the precoding matrix designed to maximize the system throughput [14, 15].
While highly directional beamforming is an effective means to improve the system performance, the system becomes inefficient when the users’ locations change over time or the position and heading angle of the devices vary relative to the basestation. This is because beam training must be performed more frequently to track the CSI of the mobile users. Since the beams are shared by all co-scheduled users, more frequent beam training must be performed even when only a fraction of the users are mobile, thus increasing the training overhead. Recently, various attempts have been made to enhance the beam training efficiency. These include adaptive beam training [9], codebook-based beam switching [16], simplex optimization-based beam training [17], the beam-coding approach [18], and multi-level beam training [19]. Also, various beam tracking algorithms to estimate the time-varying mmWave channel have been proposed. These include probabilistic beam tracking [23], the adaptive beamformer and combiner [20], Kalman-based beam tracking [21], and beamspace-based beam tracking [22].
The primary purpose of this paper is to propose a new beam training strategy to support mobility scenarios in mmWave communications. The key idea of the proposed scheme is to use dedicated beam training targeted at high-mobility users without changing the periodicity of the conventional beam training. While all users in the cell are supported by the conventional beam, referred to as the common beam, the proposed dedicated beam is designed to support the users under high mobility that cannot acquire the CSI accurately using only the common beam training. Since the dedicated beam training is intended for the moving user, we can improve the quality of beam training by exploiting the information on the moving user’s channel dynamics. As a result, even with a small number of training beams, good system performance for the target user can be ensured. From this perspective, the proposed dedicated beam transmission is analogous to the user-specific pilot transmission scheme in the 4G long-term evolution (LTE) standard [24].
Our contributions in this paper are as follows:

We propose a new type of beam for mmWave systems called the dedicated beam. The proposed dedicated beam training protocol achieves a significant reduction in training overhead by transmitting only a small number of training beams over the radio resource allocated for high-mobility users. Note that the dedicated beam training process is activated only when there are high-mobility users in a cell.

We propose a sparse channel estimation technique suited for the dedicated beam training. The proposed technique is based on a modified greedy sparse recovery algorithm that successively estimates each pair of the angle of departure (AoD) and the angle of arrival (AoA) until all pairs corresponding to the multipaths are found. Based on the received training beams, the proposed channel estimator produces the probability distributions of the AoA and AoD, which are used to perform the beam selection and channel estimation for the next beam training period.

Using the statistical channel model that captures the dynamic channel behavior, we derive the optimal beam selection strategy, which searches the beam codebook for the beam indices minimizing the channel estimation error. Specifically, we derive a computationally efficient beam search strategy for dual-beam transmission. We demonstrate from numerical experiments that the combination of our channel estimation algorithm and the dedicated beam training yields performance comparable to that of conventional beam cycling with much smaller training overhead.
It is also worth mentioning that our approach is distinct from the existing beam tracking techniques in [21, 30]. These methods consider the transmission of a single beam towards the direction of the AoD, which is estimated using an AoD tracking algorithm such as the extended Kalman filter. We observe that this single-beam transmission strategy does not perform well in high-mobility scenarios since an incorrect AoD estimate degrades the received SNR and hence the AoD estimation quality. Such a vicious circle could lead to gradual degradation of the channel estimation until the beam cycling (involving multiple beam transmissions) is performed. We also find that the sparse channel estimation based on the quantized AoD model [3] does not work with the single-beam transmission due to poor conditioning of the system matrix. In contrast, our beam training substantially mitigates this problem by using optimally placed multiple beams (at least two) for beam transmission and adopting the enhanced sparse channel estimation accounting for the probabilistic distribution of the dynamic channel.
The rest of this paper is organized as follows. In Section II, we introduce the system and channel models for mmWave communications. In Section III, we describe the proposed beam tracking method. In Section IV, we present the simulation results, and we conclude the paper in Section V.
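The notations used in the rest of the paper are as follows. The operations $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^*$, and $(\cdot)^{-1}$ denote the transpose, Hermitian, conjugate, and inverse operations, respectively. $A \otimes B$ is the Kronecker product of two matrices $A$ and $B$. $E[X]$ and $\mathrm{Var}(X)$ denote the expectation and variance of the random variable $X$. $[A]_{i,j}$ denotes the $(i,j)$th element of the matrix $A$ and $[a]_i$ is the $i$th element of the vector $a$. $A_\Omega$ is the submatrix of $A$ that contains the columns specified in the set $\Omega$, and $a_\Omega$ is the vector constructed by picking the elements from the vector $a$ specified in the set $\Omega$. $A \succeq 0$ implies that $A$ is positive semidefinite. $\mathrm{vec}(A)$ is the vectorization operation of the matrix $A$. $\mathcal{CN}(\mu, \Sigma)$ denotes the complex multivariate Gaussian distribution
$$f(x) = \frac{1}{\det(\pi \Sigma)} \exp\left(-(x-\mu)^H \Sigma^{-1} (x-\mu)\right). \qquad (1)$$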
II mmWave Channel Model and Conventional Beam Training
In this section, we briefly describe the system model for mmWave communications. We first present the mmWave channel model and then discuss the basic procedure of the conventional beam training and channel estimation.
II-A System Description
Consider the mmWave downlink where a basestation equipped with antennas is serving users with antennas. In each beam training period, the basestation transmits beams sequentially at equally distributed directions. Each user estimates its own channel based on the received training signals and then feeds back the channel information to the basestation. After generating the precoding matrix using the users’ channel state, the basestation transmits the precoded data symbols. This beam training procedure is performed periodically to keep track of channels for all users. The frame structure for the conventional common beam training is depicted in Fig. 2.
II-B mmWave Channel Model
In general, the mmWave channel from the basestation to the th user can be expressed as the channel matrix where the subscripts and represent the th beam training period and the th beam transmission, respectively (see Fig. 2). The th element of represents the channel gain from the th antenna of the basestation to the th antenna of the user. The angular domain representation of the channel matrix is expressed as [9]
(2)  
(3) 
where
(4)  
(5)  
(6) 
Note that is the total number of the paths, is the channel gain for the th path, and and are the AoD and AoA for the th path where
where , are the angles in radian for AoD and AoA, respectively. Notice that and are the steering vectors for the basestation and user, respectively. That is,
where and are the distances between the adjacent antennas for the basestation and user, respectively, and is the signal wavelength. It is worth mentioning that although the AoD and AoA have real values in the range , they can be approximated to the discrete values by quantizing them on the uniform grid of and bins. That is,
(7) 
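As an illustration of the steering vector and the angular quantization described above, the following is a minimal sketch rather than the paper's exact definition. It assumes a uniform linear array with half-wavelength spacing, a unit-norm steering vector, and a grid that is uniform in angle over [-pi/2, pi/2); the function names `steering_vector`, `angle_grid`, and `quantize_angle` are hypothetical.

```python
import cmath
import math


def steering_vector(theta, num_antennas, spacing_over_wavelength=0.5):
    """Unit-norm steering vector of a uniform linear array (assumed model)."""
    phase_step = 2.0 * math.pi * spacing_over_wavelength * math.sin(theta)
    scale = 1.0 / math.sqrt(num_antennas)
    return [scale * cmath.exp(1j * phase_step * n) for n in range(num_antennas)]


def angle_grid(num_bins):
    """Bin centres of a uniform grid over [-pi/2, pi/2) (assumed range)."""
    width = math.pi / num_bins
    return [-math.pi / 2 + (k + 0.5) * width for k in range(num_bins)]


def quantize_angle(theta, grid):
    """Index of the grid point closest to the continuous angle theta."""
    return min(range(len(grid)), key=lambda k: abs(grid[k] - theta))


grid = angle_grid(16)
idx = quantize_angle(0.3, grid)                 # continuous AoD of 0.3 rad
a = steering_vector(grid[idx], num_antennas=8)  # steering vector at the bin centre
norm_sq = sum(abs(x) ** 2 for x in a)
```

The quantization error is at most half a bin width, so a finer grid trades a larger dictionary for a smaller angular mismatch.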
Adopting the quantized channel representation, we obtain the so-called virtual channel representation [9]
(8) 
where
The th element of is the channel gain corresponding to the th angular bin for the AoA and the th angular bin for the AoD. If there exist multipaths, only entries of have dominant values and the remaining entries are close to zero. Since the number of multipaths is in general much smaller than the number of elements in , as illustrated in Fig. 3, the channel matrix can be readily modeled as a sparse matrix in the angular domain.
II-C Conventional Beam Training and Channel Estimation
As mentioned, during the beam training period, the basestation transmits the known symbols over the consecutive time steps. In the th beam transmission, for example, the basestation transmits the known symbol through the beamforming vector . The corresponding received vector for the th user is given by
(9) 
where is the beamforming vector and is the Gaussian noise vector . Since is the known symbol, we let in the sequel for simplicity. To generate the beamforming vectors, the beam-cycling scheme, in which the basestation transmits beams in equally spaced directions, is widely used (see Fig. 2)
(10)  
(11)  
(12) 
Clearly, the larger the number of beams , the better the quality of the channel estimation. However, a large leads to higher resource overhead and consequently lower data throughput. The th user applies the combining vectors to , i.e.,
(13)  
(14) 
where is called the combining matrix. A well-known example of the combining matrix is the DFT matrix given by
(15) 
Since the combining matrix is applied to a single received vector, increasing the number of combining vectors (i.e., the column vectors) does not increase the resource overhead, in contrast to the beamforming vectors.
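The beam-cycling transmission and receive combining described above can be sketched as follows. This is an illustrative toy example, not the paper's simulation setup: it assumes half-wavelength uniform linear arrays, a noiseless single-path channel, a unitary DFT combining matrix, and a codebook of steering vectors at equally spaced angles; all names and numerical values are hypothetical.

```python
import cmath
import math


def steering_vector(theta, n):
    """Unit-norm half-wavelength ULA steering vector (assumed model)."""
    return [cmath.exp(1j * math.pi * k * math.sin(theta)) / math.sqrt(n)
            for k in range(n)]


def beam_cycling_codebook(num_beams, num_tx):
    """Beamforming vectors steered at equally spaced directions."""
    width = math.pi / num_beams
    directions = [-math.pi / 2 + (i + 0.5) * width for i in range(num_beams)]
    return directions, [steering_vector(d, num_tx) for d in directions]


def dft_matrix(n):
    """Unitary n x n DFT matrix used as the combining matrix (assumed form)."""
    return [[cmath.exp(-2j * math.pi * r * c / n) / math.sqrt(n)
             for c in range(n)] for r in range(n)]


def matvec(matrix, vec):
    return [sum(row[c] * vec[c] for c in range(len(vec))) for row in matrix]


def hermitian(matrix):
    rows, cols = len(matrix), len(matrix[0])
    return [[matrix[r][c].conjugate() for r in range(rows)] for c in range(cols)]


num_tx, num_rx, num_beams = 8, 4, 8
directions, codebook = beam_cycling_codebook(num_beams, num_tx)

# Toy single-path channel H = alpha * a_r(aoa) * a_t(aod)^H with an on-grid AoD.
alpha, target = 0.8 - 0.3j, 5
a_r = steering_vector(0.2, num_rx)
a_t = steering_vector(directions[target], num_tx)
H = [[alpha * a_r[r] * a_t[c].conjugate() for c in range(num_tx)]
     for r in range(num_rx)]

# One received vector per beam; combining reuses the same received vector.
W_h = hermitian(dft_matrix(num_rx))
energies = []
for f in codebook:
    y = matvec(H, f)      # noiseless received vector for this beam
    z = matvec(W_h, y)    # combined observation W^H y
    energies.append(sum(abs(v) ** 2 for v in z))
best = max(range(num_beams), key=energies.__getitem__)
```

Because the assumed combining matrix is unitary, it preserves the received energy, and the strongest beam is the one aligned with the AoD of the path.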
Next, we describe the basic channel estimation algorithm used in the conventional beam training. Using the vectors collected from the th beam training period, the th user estimates the channel matrix . We assume that the channel matrix is fixed during the single beam training period, i.e., . Let , then
(16)  
(17) 
where and . Since the channel matrix can be represented by a small number of parameters in the angular domain, we use the angular channel representation in (3), and as a result
(18)  
(19) 
Note that the subscript is dropped from the channel parameters , , and since we assume that the CSI does not change during the beam training period. From (19), one can see that the channel estimation problem can be readily expressed as the joint estimation of the parameters , , and . In estimating these parameters, one can think of maximum likelihood estimation
(20) 
where , and . Due to the nonlinearity of AoD and AoA with respect to the received vector, joint parameter estimation tends to be computationally infeasible. Even if the optimization is performed over the discretized parameter space, it requires huge computational complexity for . Alternatively, one can use the virtual channel representation in (17) and construct the observation vector by stacking the columns of as [9]
(21) 
where , , and .
Algorithm 1: Orthogonal matching pursuit (OMP)
Output: Support set and channel gain
Note that the channel estimation is equivalent to the estimation of the unknown vector from the received vector in (21). In practical scenarios, we cannot use as many training beams as desired due to the limitation of resources, which in turn means that the number of rows in is smaller than the number of columns. While obtaining an accurate estimate of in an underdetermined system is in general very difficult, we can accurately recover by exploiting the sparsity of the channel vector [26]. To estimate the channel , we can use essentially any sparse recovery algorithm, including the orthogonal matching pursuit (OMP) [27] described in Algorithm 1.
III Proposed Dedicated Beam Training For Mobility Scenario
In this section, we present the new beam training technique to support users under mobility. Since the common beam training must serve all users in a cell, a large number of beams covering a wide range of directions is needed. As mentioned, when there are moving users or the positions of devices change, the period of the common beam training must be shortened, thereby increasing the training overhead. Instead of using beam cycling, we use dedicated beams targeted at the users under high mobility. Using the information on the target user’s channel, we can enhance the efficiency of the beam training by employing only a small number of beams optimized for the specific user. In the next subsections, we present the overall beam training protocol, dynamic channel characteristics, optimal beam design, and new channel estimation method.
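For readers unfamiliar with OMP, a textbook version can be sketched as follows. This is a generic illustration and not a transcription of Algorithm 1: it assumes the number of paths (the sparsity) is known, represents the sensing matrix as a list of columns, and solves the least-squares step through the normal equations. The helper names and the toy dictionary are hypothetical.

```python
import math


def inner(u, v):
    """Complex inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def solve_linear(matrix, rhs):
    """Solve a small square complex system by Gaussian elimination."""
    k = len(rhs)
    aug = [list(matrix[r]) + [rhs[r]] for r in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, k):
            factor = aug[r][c] / aug[c][c]
            for t in range(c, k + 1):
                aug[r][t] -= factor * aug[c][t]
    x = [0j] * k
    for c in range(k - 1, -1, -1):
        tail = sum(aug[c][t] * x[t] for t in range(c + 1, k))
        x[c] = (aug[c][k] - tail) / aug[c][c]
    return x


def omp(columns, y, sparsity):
    """Greedy recovery of a sparse vector: returns (support, gains)."""
    residual = list(y)
    support, gains = [], []
    for _ in range(sparsity):
        # 1) Pick the unused column most correlated with the residual.
        candidates = [j for j in range(len(columns)) if j not in support]
        best = max(candidates,
                   key=lambda j: abs(inner(columns[j], residual))
                   / math.sqrt(inner(columns[j], columns[j]).real))
        support.append(best)
        # 2) Least-squares fit of y on the selected columns (normal equations).
        atoms = [columns[j] for j in support]
        gram = [[inner(u, v) for v in atoms] for u in atoms]
        gains = solve_linear(gram, [inner(u, y) for u in atoms])
        # 3) Update the residual.
        residual = [y[i] - sum(g * atom[i] for g, atom in zip(gains, atoms))
                    for i in range(len(y))]
    return support, gains


# Toy underdetermined problem: 3 measurements, 4 candidate columns, 2 active.
s = 1.0 / math.sqrt(3.0)
columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, s]]
y = [3.0, 0.0, 1j]  # equals 3 * column 0 + 1j * column 2
support, gains = omp(columns, y, sparsity=2)
```

In the channel estimation setting, each selected column index would map to one (AoA, AoD) bin pair and the corresponding gain to the path gain.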
III-A Description of Overall Beam Training Protocol
Fig. 4 depicts the proposed beam training protocol. For the users under mobility (e.g., users 3, 4, and 5 in Fig. 4), the basestation transmits the dedicated beams along with the common beams. As shown in Fig. 4, the basestation sends training beams for the dedicated beam training where the number of dedicated beams is much smaller than the number of common beams . In order to serve high-mobility users, the dedicated beams are transmitted more frequently than the common beams, i.e., . As shown in Fig. 4, the periodicity of the dedicated beam can be adjusted depending on the extent of mobility. The proposed dedicated beam training involves interaction between the basestation and the target user. Based on the received beams, the target user estimates the CSI and picks the optimal beams to be used by the basestation in the next beam training period. The selected beam indices are fed back to the basestation and the same beam training procedure repeats until the frame ends. Since the beamforming vectors are optimally selected based on the CSI of the user, good channel estimation quality and beamforming accuracy can be maintained even with a small number of beam transmissions.
III-B Dynamic mmWave Channel Characteristics
In this subsection, we describe the statistical characteristics of the mmWave channel under the mobility scenario. Under the condition that the channel is correlated in time and varies slowly, we can readily model this slowly-varying channel as a Markov process. Using the discrete representation of the AoA and AoD in (7), we model the AoD and AoA as discrete-state Markov processes [30, 31]. The discrete-state Markov process is described by the transition probability given by
(22) 
In fact, one simple example of the transition probability is
(23) 
where is the normalization constant and indicates the mobility of the user. This transition probability decays exponentially with the difference between the present and past AoD. Note that one can estimate the parameter from the user’s mobility level (e.g., moving speed).
III-C Optimal Beam Design
During the dedicated beam training period, the basestation needs to adapt the beamforming vectors based on the user’s CSI. In our work, since the dedicated beam training is performed in the angular space, we use the channel information represented in terms of the AoD and AoA. First, the basestation selects beamforming vectors () minimizing the channel estimation error of the target user from the beam codebook . We assume that the beam codebook contains the precalculated beamforming vectors that correspond to the steering vectors for different directions and beam widths (the beam width can be increased by turning off a subset of the antennas). Then, the optimal beam selection problem can be formulated as
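A discrete-state transition model of this kind can be sketched as follows. The exact exponent of (23) is not reproduced here; the sketch assumes the probability is proportional to exp(-|i - j| / mobility) over bin indices, with each row normalized to sum to one. The names `transition_matrix`, `predict`, and `mobility` are hypothetical.

```python
import math


def transition_matrix(num_bins, mobility):
    """Row i holds Pr(next AoD bin = j | current AoD bin = i).

    Assumed form: proportional to exp(-|i - j| / mobility), so a larger
    `mobility` spreads probability over more distant bins.
    """
    matrix = []
    for i in range(num_bins):
        weights = [math.exp(-abs(i - j) / mobility) for j in range(num_bins)]
        total = sum(weights)  # per-row normalization constant
        matrix.append([w / total for w in weights])
    return matrix


def predict(distribution, matrix):
    """One-step prediction of the AoD distribution for the next period."""
    n = len(distribution)
    return [sum(distribution[i] * matrix[i][j] for i in range(n))
            for j in range(n)]


P = transition_matrix(16, mobility=1.5)
current = [0.0] * 16
current[8] = 1.0                 # AoD currently known to lie in bin 8
predicted = predict(current, P)  # spread-out belief for the next period
```

The predicted distribution is what a beam selection rule would weight its cost by: a fast-moving user produces a wider prediction and therefore calls for more widely separated training beams.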
(24) 
where is the estimate of . Since the channel estimation performance is determined by both the combining vector and the beamforming vector , they should be jointly determined. In order to simplify the optimization, we decouple the joint optimization by using the ideal combining matrix in the system model and optimizing the cost function only with respect to the beamforming vectors . After the support in (corresponding to the active AoD bins) is estimated, the channel gain can be estimated by the linear projection of onto the subspace spanned by . Since the channel gain is calculated as a function of the AoD, the channel estimation performance is determined by the AoD estimation accuracy. Thus, the beam selection problem is reformulated as
(25) 
where is the estimate of . While the cost function in (25) is a reasonable choice, since mmWave channel estimation is closely tied to the AoD estimation, it depends on the channel estimation algorithm. To avoid this dependency, we use an analytic bound on the AoD variance as the performance metric. Specifically, we derive the lower bound of using the information inequality [28]. Let , then the Cramér-Rao lower bound (CRLB) for the parameter is given by
(26)  
(27) 
where is the Fisher information matrix and is the CRLB of for . In our setup, the CRLB is a function of the deterministic parameters, so we cannot determine the CRLB without knowledge of them. As a heuristic surrogate, we use the CRLB weighted with respect to the probability distribution as a cost function, i.e.,
(28) 
where the AoDs for different paths are assumed to be statistically independent in (28). The distribution of the AoD is obtained by the proposed channel estimator, which will be described later. Specifically, based on the training beams received during the previous beam training periods, the proposed channel estimator predicts the distribution of the AoD for the next beam training period. By doing so, we can account for the uncertainty on the AoD estimate in our beam selection strategy. Since the cost function is expressed as a function of beamforming vectors , the beam selection rule can be readily expressed as
(29) 
III-D Beam Selection with Single Path Scenario
First, we consider the beam selection rule for the single path scenario (i.e., ) where the energy of a single path is predominant. In this scenario, the beam selection rule in (25) can be simplified to
(30) 
Also, since , the received signal vector in (19) is simplified to
(31) 
Using the ideal combining matrix , we have
(32) 
Since is a row vector, we can rewrite (32) as
(33) 
and one can show that the CRLB of is given by (see Appendix A)
(34) 
where
By taking a step similar to (28), the average cost function can be obtained by weighting the lower bound with the distribution of , i.e.,
(35) 
Then, we search for the beam indices from the beam codebook that minimize the cost function . Although this process is combinatorial in nature and thus computationally burdensome, the computational complexity can be reduced significantly by considering only a small number of beams. In our work, we consider the dual-beam scenario (), which offers small search complexity as well as low training overhead. In the simulation section, we will demonstrate that the performance of the proposed method using two training beams is comparable to that of the common beam training using beams. In addition, we will also show that using a single beam does not provide satisfactory performance. When , we search for two beam indices from as
(36) 
To find , we need to evaluate combinations of all beam indices. Fig. 5 shows the beam patterns obtained by the proposed selection criterion. When the AoD distribution is symmetric, the directions of the optimal beam pair are also symmetric with respect to the center of the distribution. Exploiting this symmetry, we only have to determine the angle formed by the two beamforming vectors. Then, the two-dimensional grid search can be reduced to a one-dimensional search as
(37) 
where is the beam index corresponding to the center of the AoD distribution. Note that the beam index parameter corresponds to the optimal angle between the two selected training beams (see Fig. 6).
III-E Beam Selection With Multipath Scenario
When there are multiple paths, we need to select beams based on the cost metric defined over the set of parameters . As the number of parameters to be considered increases, the derivation of the CRLB becomes cumbersome and the optimization requires huge computational complexity. To avoid this, we use a simple beam selection method that is in essence an extension of the beam selection strategy for . The basic idea is to use the dual beams optimized for each path. In this case, the basestation sends beams in total. Since the number of paths is small in mmWave environments, the training resources for the dedicated beam training are much smaller than those for the common beam training (i.e., ). Since our channel estimator is designed to provide the distribution of the AoD for each path (i.e., ), the optimal beam spacing is obtained by running the search algorithm for each path. Though the proposed per-path beam selection is heuristic, our numerical evaluation shows that it offers good system performance with reasonably small training overhead.
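The one-dimensional dual-beam search can be illustrated with the sketch below. It does not reproduce the CRLB expression in (34); instead it uses the standard CRLB for the assumed single-path model z_i = alpha f_i^H a(theta) + n_i with unknown complex gain alpha, for which the Fisher information on theta is (2|alpha|^2 / sigma^2)(||g'||^2 - |g^H g'|^2 / ||g||^2) with g_i = f_i^H a(theta). The half-wavelength array, the steering-vector codebook on a uniform angle grid, the exponential-shaped AoD distribution, and all names and numerical values are hypothetical.

```python
import cmath
import math


def steering_vector(theta, n):
    """Unit-norm half-wavelength ULA steering vector (assumed model)."""
    return [cmath.exp(1j * math.pi * k * math.sin(theta)) / math.sqrt(n)
            for k in range(n)]


def steering_derivative(theta, n):
    """Derivative of the steering vector with respect to theta."""
    return [1j * math.pi * k * math.cos(theta) * a
            for k, a in enumerate(steering_vector(theta, n))]


def inner(u, v):
    return sum(a.conjugate() * b for a, b in zip(u, v))


def fisher_information(theta, beams, gain_power, noise_var):
    """Fisher information on the AoD with the complex gain as a nuisance."""
    n = len(beams[0])
    a, da = steering_vector(theta, n), steering_derivative(theta, n)
    g = [inner(f, a) for f in beams]    # beamformed response per beam
    dg = [inner(f, da) for f in beams]  # its derivative in theta
    g_energy = inner(g, g).real
    if g_energy == 0.0:
        return 0.0
    orth = inner(dg, dg).real - abs(inner(g, dg)) ** 2 / g_energy
    return 2.0 * gain_power / noise_var * max(orth, 0.0)


def expected_crlb(beam_angles, grid, probs, n, gain_power=1.0, noise_var=0.1):
    """CRLB averaged over the AoD distribution (small floor avoids 1/0)."""
    beams = [steering_vector(phi, n) for phi in beam_angles]
    return sum(p / max(fisher_information(t, beams, gain_power, noise_var), 1e-12)
               for t, p in zip(grid, probs) if p > 0.0)


def best_symmetric_offset(grid, probs, center, n, max_offset):
    """1-D search over the index offset of two beams placed around `center`."""
    costs = {d: expected_crlb([grid[center - d], grid[center + d]], grid, probs, n)
             for d in range(1, max_offset + 1)}
    return min(costs, key=costs.get), costs


num_bins, n_tx, center = 32, 8, 16
grid = [-math.pi / 2 + (k + 0.5) * math.pi / num_bins for k in range(num_bins)]
weights = [math.exp(-abs(k - center) / 2.0) for k in range(num_bins)]
probs = [w / sum(weights) for w in weights]

offset, costs = best_symmetric_offset(grid, probs, center, n_tx, max_offset=6)
single_beam_info = fisher_information(
    grid[center], [steering_vector(grid[center], n_tx)], 1.0, 0.1)
```

Under this assumed model a single beam carries no Fisher information about the AoD when the gain is unknown, since the term in parentheses vanishes for a scalar observation. This is consistent with the observation that at least two beams are needed.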
III-F New Channel Estimation for Dedicated Beam Training
In order to estimate the channel in mmWave communication systems, we first need to estimate the channel parameters , , and from the received vector . Under the quantized channel model, the AoD and AoA parameters correspond to the support of the channel vector (i.e., the set of indices of nonzero elements in ) and its amplitude. In this subsection, we propose a greedy sparse channel estimation algorithm that jointly estimates the support and the amplitude of from . Note that the proposed channel estimator can also generate the distribution of the current AoD parameter using the measurement vectors acquired until the th beam training period.
Let be the support of the channel vector , where the support index corresponds to the pair of the AoD and the AoA . Then, the received vector can be expressed as
(38) 
where is the vector containing nonzero gains in . Recall that is the submatrix of that contains columns indexed by . We assume that is the Gaussian random vector . Since the channel vector is completely determined by two parameters, viz., the support and the magnitude of the gain , the channel estimate can be obtained by finding the joint estimate of and the amplitude . The joint maximum a posteriori (MAP) estimate of and is given by
(39) 
Using and denoting , the MAP estimate of the support can be obtained from
(40)  
(41) 
Note that the channel amplitude vector is Gaussian distributed when the support is given, i.e., where