# No Downlink Pilots are Needed in TDD Massive MIMO

## Abstract

We consider the Massive Multiple-Input Multiple-Output downlink with maximum-ratio and zero-forcing processing and time-division duplex operation. To decode, the users must know their instantaneous effective channel gain. Conventionally, it is assumed that by virtue of channel hardening, this instantaneous gain is close to its average and hence that users can rely on knowledge of that average (also known as statistical channel information). However, in some propagation environments, such as keyhole channels, channel hardening does not hold.

We propose a blind algorithm for estimating the effective channel gain at each user that does not require any downlink pilots. We derive a capacity lower bound for each user under the proposed scheme, applicable to any propagation channel. Our blind algorithm performs significantly better than both the case of no downlink pilots (relying on channel hardening) and training-based estimation using downlink pilots. The difference is especially pronounced in environments that do not offer channel hardening.

Index Terms: Blind channel estimation, downlink, keyhole channels, Massive MIMO, maximum-ratio processing, time-division duplexing, zero-forcing processing.

## 1 Introduction

In Massive Multiple-Input Multiple-Output (MIMO), the base station (BS) is equipped with a large antenna array (with hundreds of antennas) that simultaneously serves many (tens or more) users. It is a key, scalable technology for next generations of wireless networks, owing to its promise of very high energy efficiency and spectral efficiency [2, 3, 4, 5, 6, 7]. In Massive MIMO, time-division duplex (TDD) operation is preferable, because the amount of pilot resources required does not depend on the number of BS antennas. With TDD, the BS obtains the channel state information (CSI) through uplink training. This CSI is used to detect the signals transmitted from the users in the uplink. In the downlink, owing to the reciprocity of propagation, the CSI acquired at the BS is used for precoding. Each user receives an effective (scalar) channel gain multiplied by the desired symbol, plus interference and noise. To coherently detect the desired symbol, each user should know its effective channel gain.

Conventionally, each user is assumed to approximate its instantaneous channel gain by its mean [8, 9, 10]. This works well in Rayleigh fading: Rayleigh fading channels harden when the number of BS antennas is large (the effective channel gains become nearly deterministic), so the effective channel gain is close to its mean and using the mean for signal detection works very well. In this way, downlink pilots are avoided and the users only need to know the channel statistics. However, for small or moderate numbers of antennas, the gain may still deviate significantly from its mean. Also, in propagation environments where the channel does not harden, using the mean of the effective gain as a substitute for its true value may result in poor performance even with large numbers of antennas.

The users may estimate their effective channel gain by using downlink pilots; see [2] for single-cell systems and [11] for multi-cell systems. These downlink pilots are orthogonal between the users and are beamformed along with the downlink data. The users may use, for example, linear minimum mean-square error (MMSE) techniques to estimate this gain. The downlink rates of multi-cell systems for maximum-ratio (MR) and zero-forcing (ZF) precoders with and without downlink pilots were analyzed in [12]. The effect of using outdated gain estimates at the users was investigated in [13]. Compared with the case where the users rely on statistical channel knowledge, the downlink-pilot-based schemes improve the system performance in low-mobility environments (where the coherence interval is long). However, in high-mobility environments they do not work well, owing to the large amount of downlink training resources required; this overhead is proportional to the number of multiplexed users. A better way of estimating the effective channel gain, which requires fewer resources than the transmission of downlink pilots does, would be desirable.

Inspired by the above discussion, in this paper we consider the Massive MIMO downlink with TDD operation. The BS acquires channel state information in the conventional manner, through the reception of uplink pilot signals transmitted by the users. When transmitting data to the users, it applies MR or ZF processing with slow time-scale power control. For this system, we propose a simple blind method for estimating the effective gain, which each user performs independently and which does not require any downlink pilots. Our proposed method exploits the asymptotic properties of the received data in each coherence interval. Our specific contributions are:

• We give a formal definition of channel hardening, and an associated criterion that can be used to test if channel hardening holds. Then we examine two important propagation scenarios: independent Rayleigh fading, and keyhole channels. We show that Rayleigh fading channels harden, but keyhole channels do not.

• We propose a blind channel estimation scheme, that each user applies in the downlink. This scheme exploits the asymptotic properties of the sample average power of the received signal per coherence interval. We presented a preliminary version of this algorithm in [1].

• We derive a rigorous capacity lower bound for Massive MIMO with estimated downlink channel gains. This bound can be applied to any type of channel and can be used to analyze the performance of any downlink channel estimation method.

• Via numerical results we show that, in hardening propagation environments, the performance of our proposed blind scheme is comparable to the use of only statistical channel information (approximating the gain by its mean). In contrast, in non-hardening propagation environments, our proposed scheme performs much better than the use of statistical channel information only. The results also show that our blind method uniformly outperforms schemes based on downlink pilots [2, 11].

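Notation: We use boldface upper- and lower-case letters to denote matrices and column vectors, respectively. Specific notation and symbols used in this paper are listed as follows:

• $(\cdot)^{*}$, $(\cdot)^{T}$, and $(\cdot)^{H}$: conjugate, transpose, and conjugate transpose

• $\det(\cdot)$ and $\mathrm{Tr}(\cdot)$: determinant and trace of a matrix

• $\mathcal{CN}(\mathbf{0}, \boldsymbol{\Sigma})$: circularly symmetric complex Gaussian vector with zero mean and covariance matrix $\boldsymbol{\Sigma}$

• $|\cdot|$, $\|\cdot\|$: absolute value, Euclidean norm

• $\mathsf{E}\{\cdot\}$, $\mathrm{Var}\{\cdot\}$: expectation, variance operators

• $\xrightarrow{P}$: convergence in probability

• $\mathbf{I}_n$: $n \times n$ identity matrix

• $[\mathbf{A}]_k$: the $k$th column of $\mathbf{A}$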

## 2 System Model

We consider a single-cell Massive MIMO system with an $M$-antenna BS and $K$ single-antenna users, where $M > K$. The channel between the BS and the $k$th user is an $M \times 1$ channel vector, denoted by $\mathbf{g}_k$, and is modelled as:

$$
\mathbf{g}_k = \sqrt{\beta_k}\,\mathbf{h}_k, \tag{1}
$$

where $\beta_k$ represents large-scale fading, which is constant over many coherence intervals, and $\mathbf{h}_k$ is an $M \times 1$ small-scale fading channel vector. We assume that the elements of $\mathbf{h}_k$ are uncorrelated, zero-mean and unit-variance random variables (RVs) which are not necessarily Gaussian distributed. Furthermore, $\mathbf{h}_k$ and $\mathbf{h}_{k'}$ are assumed to be independent, for $k \neq k'$. The $m$th elements of $\mathbf{g}_k$ and $\mathbf{h}_k$ are denoted by $g_k^m$ and $h_k^m$, respectively.

Here, we focus on the downlink data transmission with TDD operation. The BS uses the channel estimates obtained in the uplink training phase, and applies MR or ZF processing to transmit data to all users in the same time-frequency resource.

Let $T$ be the length of the coherence interval (in symbols). For each coherence interval, let $\tau_{u,p}$ be the length of the uplink training duration (in symbols). All users simultaneously send pilot sequences of length $\tau_{u,p}$ symbols each to the BS. We assume that these pilot sequences are pairwise orthogonal, which requires $\tau_{u,p} \geq K$. The linear MMSE estimate of $\mathbf{g}_k$ is given by [14]

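$$
\hat{\mathbf{g}}_k = \frac{\tau_{u,p}\rho_u\beta_k}{\tau_{u,p}\rho_u\beta_k+1}\,\mathbf{g}_k + \frac{\sqrt{\tau_{u,p}\rho_u}\,\beta_k}{\tau_{u,p}\rho_u\beta_k+1}\,\mathbf{w}_{p,k}, \tag{2}
$$

where $\mathbf{w}_{p,k} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_M)$ is independent of $\mathbf{g}_k$, and $\rho_u$ is the transmit signal-to-noise ratio (SNR) of each pilot symbol.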

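The variance of the $m$th element of $\hat{\mathbf{g}}_k$ is given by

$$
\mathrm{Var}\{\hat{g}_k^m\} = \mathsf{E}\{|\hat{g}_k^m|^2\} = \frac{\tau_{u,p}\rho_u\beta_k^2}{\tau_{u,p}\rho_u\beta_k+1} \triangleq \gamma_k. \tag{3}
$$

Let $\tilde{\mathbf{g}}_k = \mathbf{g}_k - \hat{\mathbf{g}}_k$ be the channel estimation error, and $\tilde{g}_k^m$ be the $m$th element of $\tilde{\mathbf{g}}_k$. Then from the properties of linear MMSE estimation, $\tilde{g}_k^m$ and $\hat{g}_k^m$ are uncorrelated, and

$$
\mathrm{Var}\{\tilde{g}_k^m\} = \mathsf{E}\{|\tilde{g}_k^m|^2\} = \beta_k - \gamma_k. \tag{4}
$$

In the special case where $\mathbf{g}_k$ is Gaussian distributed (corresponding to Rayleigh fading channels), the linear MMSE estimator becomes the MMSE estimator and $\tilde{g}_k^m$ is independent of $\hat{g}_k^m$.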
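As a numerical sanity check on (2), (3) and (4), the following NumPy sketch draws one Rayleigh-fading channel vector, forms the estimate in (2), and compares the sample variances of the estimate and of the estimation error with $\gamma_k$ and $\beta_k - \gamma_k$. The parameter values and the helper name `lmmse_estimate` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lmmse_estimate(g, beta, tau_up, rho_u, rng):
    """Form the estimate in (2) from g, with pilot noise w_p ~ CN(0, I)."""
    w = (rng.standard_normal(g.shape) + 1j * rng.standard_normal(g.shape)) / np.sqrt(2)
    c = tau_up * rho_u * beta
    return (c / (c + 1)) * g + (np.sqrt(tau_up * rho_u) * beta / (c + 1)) * w

rng = np.random.default_rng(0)
# Illustrative values only; a very large M is used so that averaging over the
# elements of one vector approximates the per-element variance.
M, beta, tau_up, rho_u = 100_000, 0.5, 10, 1.0

h = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
g = np.sqrt(beta) * h                                   # channel model (1)
g_hat = lmmse_estimate(g, beta, tau_up, rho_u, rng)

gamma = tau_up * rho_u * beta**2 / (tau_up * rho_u * beta + 1)   # (3)
var_hat = np.mean(np.abs(g_hat) ** 2)       # expected to be close to gamma
var_err = np.mean(np.abs(g - g_hat) ** 2)   # expected to be close to beta - gamma
```

With these values, $\gamma_k = 2.5/6 \approx 0.417$ and $\beta_k - \gamma_k \approx 0.083$; the two sample variances should agree with these up to Monte Carlo error.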

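Let $s_k(n)$ be the $n$th symbol intended for the $k$th user. We assume that $\mathsf{E}\{\mathbf{s}(n)\mathbf{s}(n)^H\} = \mathbf{I}_K$, where $\mathbf{s}(n) \triangleq [s_1(n), \ldots, s_K(n)]^T$. With linear processing, the $M \times 1$ precoded signal vector is

$$
\mathbf{x}(n) = \sqrt{\rho_d}\sum_{k=1}^{K}\sqrt{\eta_k}\,\mathbf{a}_k s_k(n), \tag{5}
$$

where $\mathbf{a}_k$, $k = 1, \ldots, K$, are the precoding vectors, which are functions of the channel estimate $\hat{\mathbf{G}} \triangleq [\hat{\mathbf{g}}_1 \cdots \hat{\mathbf{g}}_K]$, $\rho_d$ is the (normalized) average transmit power, $\{\eta_k\}$ are the power coefficients, and $\mathbf{D}_\eta$ is a diagonal matrix with $\{\eta_k\}$ on its diagonal. For a given $\{\beta_k\}$, the power control coefficients $\{\eta_k\}$ are chosen to satisfy an average power constraint at the BS:

$$
\mathsf{E}\{\|\mathbf{x}(n)\|^2\} \leq \rho_d. \tag{6}
$$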

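The signal received at the $k$th user is

$$
\begin{aligned}
y_k(n) &= \mathbf{g}_k^H \mathbf{x}(n) + w_k(n) \\
&= \sqrt{\rho_d\eta_k}\,\alpha_{kk}s_k(n) + \sum_{k'\neq k}^{K}\sqrt{\rho_d\eta_{k'}}\,\alpha_{kk'}s_{k'}(n) + w_k(n),
\end{aligned} \tag{7}
$$

where $w_k(n) \sim \mathcal{CN}(0, 1)$ is additive Gaussian noise, and

$$
\alpha_{kk'} \triangleq \mathbf{g}_k^H \mathbf{a}_{k'}.
$$

Then, the desired signal $s_k(n)$ is decoded.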

We consider two linear precoders: MR and ZF processing.

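• MR processing: here the precoding vectors are

$$
\mathbf{a}_k = \frac{\hat{\mathbf{g}}_k}{\|\hat{\mathbf{g}}_k\|}, \quad k = 1, \ldots, K. \tag{8}
$$

• ZF processing: here the precoding vectors are

$$
\mathbf{a}_k = \frac{1}{\left\|\left[\hat{\mathbf{G}}\big(\hat{\mathbf{G}}^H\hat{\mathbf{G}}\big)^{-1}\right]_k\right\|}\left[\hat{\mathbf{G}}\big(\hat{\mathbf{G}}^H\hat{\mathbf{G}}\big)^{-1}\right]_k, \tag{9}
$$

for $k = 1, \ldots, K$.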

The precoding vectors given in (8) and (9) have unit norm, so with uncorrelated unit-power symbols the power constraint (6) becomes

$$
\sum_{k=1}^{K}\eta_k \leq 1. \tag{10}
$$
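The two precoders in (8) and (9) can be sketched in a few lines of NumPy. The function names `mr_precoder` and `zf_precoder`, the dimensions, and the simplifying assumption of perfect CSI ($\hat{\mathbf{G}} = \mathbf{G}$) are choices made here for illustration only.

```python
import numpy as np

def mr_precoder(G_hat):
    """Columns a_k = g_hat_k / ||g_hat_k||, as in (8)."""
    return G_hat / np.linalg.norm(G_hat, axis=0, keepdims=True)

def zf_precoder(G_hat):
    """Columns of G_hat (G_hat^H G_hat)^{-1}, each scaled to unit norm, as in (9)."""
    B = G_hat @ np.linalg.inv(G_hat.conj().T @ G_hat)
    return B / np.linalg.norm(B, axis=0, keepdims=True)

rng = np.random.default_rng(1)
M, K = 64, 4                                   # illustrative sizes
G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)

# Assumption for this sketch: perfect CSI, so the estimate equals the channel.
A_mr, A_zf = mr_precoder(G), zf_precoder(G)

# Effective gains alpha[k, k'] = g_k^H a_k'.
alpha_mr = G.conj().T @ A_mr
alpha_zf = G.conj().T @ A_zf
```

Under this perfect-CSI assumption, `alpha_zf` is diagonal with positive real entries (no inter-user interference), while `alpha_mr` has diagonal entries $\|\mathbf{g}_k\|$ and non-zero off-diagonal terms.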

## 3 Preliminaries of Channel Hardening

One motivation of this work is that Massive MIMO channels may not always harden. In this section we discuss the channel hardening phenomenon. We specifically study channel hardening for independent Rayleigh fading and for keyhole channels.

Channel hardening is a phenomenon where the norms of the channel vectors $\{\mathbf{g}_k\}$, $k = 1, \ldots, K$, fluctuate only slightly. We say that the propagation offers channel hardening if

$$
\frac{\|\mathbf{g}_k\|^2}{\mathsf{E}\{\|\mathbf{g}_k\|^2\}} \xrightarrow{P} 1, \quad \text{as } M \to \infty, \quad k = 1, \ldots, K. \tag{11}
$$

### 3.1 Advantages of Channel Hardening

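If the BS and the users know the channel perfectly, the channel is deterministic and its sum-capacity is given by [15]

$$
C = \max_{\eta_k \geq 0,\ \sum_{k=1}^{K}\eta_k \leq 1} \log_2\det\left(\mathbf{I}_M + \rho_d\,\mathbf{G}\mathbf{D}_\eta\mathbf{G}^H\right), \tag{12}
$$

where $\mathbf{G} \triangleq [\mathbf{g}_1 \cdots \mathbf{g}_K]$ and $\mathbf{D}_\eta$ is the diagonal matrix whose $k$th diagonal element is the power control coefficient $\eta_k$.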

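In Massive MIMO, for most propagation environments, we have asymptotically favorable propagation [16], i.e., $\mathbf{g}_k^H\mathbf{g}_{k'}/M \to 0$, as $M \to \infty$, for $k \neq k'$. In addition, if the channel hardens, i.e., $\|\mathbf{g}_k\|^2/M \to \beta_k$, as $M \to \infty$, then we have, for fixed $K$,

$$
\begin{aligned}
&C - \max_{\eta_k \geq 0,\ \sum_{k=1}^{K}\eta_k \leq 1} \sum_{k=1}^{K}\log_2\left(1 + \rho_d\eta_k\beta_k M\right) \\
&\quad = C - \max_{\eta_k \geq 0,\ \sum_{k=1}^{K}\eta_k \leq 1} \log_2\det\left(\mathbf{I}_K + \rho_d\,\mathbf{D}_\eta M \begin{bmatrix} \beta_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \beta_K \end{bmatrix}\right) \\
&\quad = \max_{\eta_k \geq 0,\ \sum_{k=1}^{K}\eta_k \leq 1} \log_2\det\left(\begin{bmatrix} \dfrac{1 + \rho_d\eta_1\|\mathbf{g}_1\|^2}{1 + \rho_d\eta_1\beta_1 M} & \cdots & \dfrac{\rho_d\eta_1\mathbf{g}_1^H\mathbf{g}_K}{1 + \rho_d\eta_K\beta_K M} \\ \vdots & \ddots & \vdots \\ \dfrac{\rho_d\eta_K\mathbf{g}_K^H\mathbf{g}_1}{1 + \rho_d\eta_1\beta_1 M} & \cdots & \dfrac{1 + \rho_d\eta_K\|\mathbf{g}_K\|^2}{1 + \rho_d\eta_K\beta_K M} \end{bmatrix}\right) \\
&\quad \to 0, \quad \text{as } M \to \infty.
\end{aligned} \tag{13}
$$

In (13) we have used the facts that

$$
\frac{1 + \rho_d\eta_k\|\mathbf{g}_k\|^2}{1 + \rho_d\eta_k\beta_k M} = \frac{\frac{1}{M} + \rho_d\eta_k\frac{\|\mathbf{g}_k\|^2}{M}}{\frac{1}{M} + \rho_d\eta_k\beta_k} \to 1, \quad \text{as } M \to \infty,
$$

and for $k \neq k'$,

$$
\frac{\rho_d\eta_k\mathbf{g}_k^H\mathbf{g}_{k'}}{1 + \rho_d\eta_{k'}\beta_{k'} M} = \frac{\rho_d\eta_k\mathbf{g}_k^H\mathbf{g}_{k'}/M}{1/M + \rho_d\eta_{k'}\beta_{k'}} \to 0, \quad \text{as } M \to \infty.
$$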

The limit in (13) implies that if the channel hardens, the sum-capacity (12) can be approximated for $M \gg K$ as:

$$
C \approx \max_{\eta_k \geq 0,\ \sum_{k=1}^{K}\eta_k \leq 1} \sum_{k=1}^{K}\log_2\left(1 + \rho_d\eta_k\beta_k M\right), \tag{14}
$$

which does not depend on the small-scale fading. As a consequence, system scheduling, power allocation, and interference management can be done on the large-scale fading time scale instead of the small-scale fading time scale. Therefore, the overhead for these system designs is significantly reduced.
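The approximation behind (14) can be illustrated numerically. The sketch below does not perform the maximization over $\{\eta_k\}$; as a simplifying assumption it fixes an equal power split and compares the log-determinant objective of (12) with the hardened objective of (14) for one draw of an independent Rayleigh fading channel. All numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
M, K, rho_d = 10_000, 4, 1.0               # illustrative values, not from the paper
beta = np.array([1.0, 0.5, 0.8, 0.3])      # hypothetical large-scale fading coefficients
eta = np.full(K, 1.0 / K)                  # fixed equal power split (no maximization)

H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
G = np.sqrt(beta) * H                      # columns g_k = sqrt(beta_k) h_k, as in (1)

# det(I_M + rho_d G D_eta G^H) = det(I_K + rho_d D_eta G^H G) (Sylvester's identity),
# so only a K x K determinant is needed.
gram = G.conj().T @ G
sign, logdet = np.linalg.slogdet(np.eye(K) + rho_d * np.diag(eta) @ gram)
rate_exact = logdet / np.log(2)

# Objective of (14) evaluated at the same eta; it depends only on beta.
rate_hardened = np.sum(np.log2(1 + rho_d * eta * beta * M))
gap = rate_exact - rate_hardened           # expected to be small for M >> K
```

For this hardening channel the gap is a small fraction of a bit, while both rates are on the order of tens of bits per channel use.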

Another important advantage is that if the channel hardens, we do not need instantaneous CSI at the receiver to detect the transmitted signals. The receiver needs only statistical knowledge of the channel gains. This reduces the resources (power and training duration) required for channel estimation. More precisely, consider the signal received at the $k$th user given in (7). The $k$th user wants to detect $s_k(n)$ from $y_k(n)$. For this purpose, it needs to know the effective channel gain $\alpha_{kk}$. If the channel hardens, then $\alpha_{kk} \approx \mathsf{E}\{\alpha_{kk}\}$. Therefore, we can use the statistical properties of the channel, i.e., $\mathsf{E}\{\alpha_{kk}\}$ is a good estimate of $\alpha_{kk}$ when detecting $s_k(n)$. This assumption is widely made in the Massive MIMO literature [8, 9, 10] and circumvents the need for downlink channel estimation.

### 3.2 Measure of Channel Hardening

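We next state a simple criterion, based on the Chebyshev inequality, to check whether the channel hardens or not. A similar method was discussed in [17]. From Chebyshev's inequality, we have, for any $\epsilon > 0$,

$$
\begin{aligned}
\Pr\left\{\left|\frac{\|\mathbf{g}_k\|^2}{\mathsf{E}\{\|\mathbf{g}_k\|^2\}} - 1\right|^2 \leq \epsilon\right\}
&= 1 - \Pr\left\{\left|\frac{\|\mathbf{g}_k\|^2}{\mathsf{E}\{\|\mathbf{g}_k\|^2\}} - 1\right|^2 > \epsilon\right\} \\
&\geq 1 - \frac{1}{\epsilon}\cdot\frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left(\mathsf{E}\{\|\mathbf{g}_k\|^2\}\right)^2}.
\end{aligned} \tag{15}
$$

Clearly, if

$$
\frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left(\mathsf{E}\{\|\mathbf{g}_k\|^2\}\right)^2} \to 0, \quad \text{as } M \to \infty, \tag{16}
$$

we have channel hardening. Conversely, (11) implies

$$
\frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left(\mathsf{E}\{\|\mathbf{g}_k\|^2\}\right)^2} \to 0, \quad \text{as } M \to \infty,
$$

so if (16) does not hold, then the channel does not harden. Therefore, we can use $\mathrm{Var}\{\|\mathbf{g}_k\|^2\}/\left(\mathsf{E}\{\|\mathbf{g}_k\|^2\}\right)^2$ to determine whether channel hardening holds for a particular propagation environment.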

### 3.3 Independent Rayleigh Fading and Keyhole Channels

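In this section, we study the channel hardening property of two particular channel models: Rayleigh fading and keyhole channels.

#### Independent Rayleigh Fading

Consider the channel model (1) where $\{h_k^m\}$ (the elements of $\mathbf{h}_k$) are i.i.d. $\mathcal{CN}(0, 1)$ RVs. Independent Rayleigh fading channels occur in a dense, isotropic scattering environment [18]. By using the identity $\mathsf{E}\{\|\mathbf{g}_k\|^4\} = \beta_k^2 M(M+1)$ [19], we obtain

$$
\frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left(\mathsf{E}\{\|\mathbf{g}_k\|^2\}\right)^2} = \frac{1}{\beta_k^2 M^2}\,\mathsf{E}\{\|\mathbf{g}_k\|^4\} - 1 = \frac{1}{M} \to 0, \quad M \to \infty. \tag{17}
$$

Therefore, we have channel hardening.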

#### Keyhole Channels

A keyhole channel (or double scattering channel) appears in scenarios with rich scattering around the transmitter and receiver, and where there is a low-rank connection between the two scattering environments. The keyhole effect can occur when the radio wave goes through tunnels, corridors, or when the distance between the transmitter and receiver is large. Figure 1 shows some examples where the keyhole effect occurs in practice. This channel model has been validated both in theory and by practical experiments [21, 22, 23, 24]. Under keyhole effects, the channel vector in (1) is modelled as [22]:

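$$
\mathbf{g}_k = \sqrt{\beta_k}\sum_{j=1}^{n_k} c_j^{(k)} a_j^{(k)}\,\mathbf{b}_j^{(k)}, \tag{18}
$$

where $n_k$ is the number of effective keyholes, $a_j^{(k)}$ is the random channel gain from the $k$th user to the $j$th keyhole, $\mathbf{b}_j^{(k)}$ is the random channel vector between the $j$th keyhole associated with the $k$th user and the BS, and $c_j^{(k)}$ represents the deterministic complex gain of the $j$th keyhole associated with the $k$th user. The elements of $\mathbf{b}_j^{(k)}$ and $a_j^{(k)}$ are i.i.d. $\mathcal{CN}(0, 1)$ RVs. Furthermore, the gains $\{c_j^{(k)}\}$ are normalized such that $\mathsf{E}\{|g_k^m|^2\} = \beta_k$. Therefore,

$$
\sum_{i=1}^{n_k}\left|c_i^{(k)}\right|^2 = 1. \tag{19}
$$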

When $n_k = 1$, we have a degenerate keyhole (single-keyhole) channel. Conversely, when $n_k \to \infty$, under the additional assumptions that $c_i^{(k)} \neq 0$ for finite $n_k$ and $c_i^{(k)} \to 0$ as $n_k \to \infty$, we obtain an i.i.d. Rayleigh fading channel.

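We assume that different users have different sets of keyholes. This assumption is reasonable if the users are located at random in a large area, as illustrated in Figure 1. Then from the derivations in Appendix .1, we obtain

$$
\frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left(\mathsf{E}\{\|\mathbf{g}_k\|^2\}\right)^2} = \left(1 + \frac{1}{M}\right)\sum_{i=1}^{n_k}\left|c_i^{(k)}\right|^4 + \frac{1}{M} \to \sum_{i=1}^{n_k}\left|c_i^{(k)}\right|^4 \neq 0, \quad M \to \infty. \tag{20}
$$

Consequently, the keyhole channels do not harden. In addition, since $\left|c_i^{(k)}\right|^2 \leq 1$, we have

$$
\frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left(\mathsf{E}\{\|\mathbf{g}_k\|^2\}\right)^2} \leq \left(1 + \frac{1}{M}\right)\sum_{i=1}^{n_k}\left|c_i^{(k)}\right|^2 + \frac{1}{M}. \tag{21}
$$

Using (19), (21) becomes

$$
\frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left(\mathsf{E}\{\|\mathbf{g}_k\|^2\}\right)^2} \leq 1 + \frac{2}{M}, \tag{22}
$$

where the right-hand side corresponds to the case of single-keyhole channels ($n_k = 1$). This implies that a single-keyhole channel represents the worst case, in the sense that the channel gain then fluctuates the most.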
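The hardening measure of Section 3.2 can also be estimated by Monte Carlo simulation and compared with the closed forms (17) and (20). The sketch below is a minimal illustration; the helper names (`cn`, `hardening_measure`, `keyhole_channels`), the number of trials, and the choice of two equal-gain keyholes are assumptions made here, not values from the paper.

```python
import numpy as np

def cn(rng, *shape):
    """i.i.d. CN(0, 1) samples of the given shape."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def hardening_measure(g):
    """g: (trials, M) channel draws. Sample Var{||g||^2} / (E{||g||^2})^2."""
    x = np.sum(np.abs(g) ** 2, axis=1)
    return x.var() / x.mean() ** 2

def keyhole_channels(rng, trials, M, c, beta=1.0):
    """Draws of (18) for deterministic keyhole gains c with sum |c_j|^2 = 1."""
    c = np.asarray(c, dtype=complex)
    a = cn(rng, trials, c.size)        # user-to-keyhole gains a_j
    b = cn(rng, trials, c.size, M)     # keyhole-to-BS vectors b_j
    return np.sqrt(beta) * np.einsum("j,tj,tjm->tm", c, a, b)

rng = np.random.default_rng(2)
trials, M = 20_000, 50                 # illustrative values

rayleigh = hardening_measure(cn(rng, trials, M))            # (17) predicts 1/M
c = np.sqrt([0.5, 0.5])                                     # two equal-gain keyholes
keyhole = hardening_measure(keyhole_channels(rng, trials, M, c))
keyhole_theory = (1 + 1 / M) * np.sum(np.abs(c) ** 4) + 1 / M   # (20)
```

For these settings (17) gives $1/M = 0.02$ and (20) gives $0.53$, so the keyhole measure stays well away from zero while the Rayleigh measure is already small at $M = 50$.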

## 4 Proposed Downlink Blind Channel Estimation Technique

The $k$th user should know the effective channel gain $\alpha_{kk}$ to coherently detect the transmitted signal $s_k(n)$ from $y_k(n)$ in (7). Most previous works on Massive MIMO assume that $\mathsf{E}\{\alpha_{kk}\}$ is used in lieu of the true $\alpha_{kk}$ when detecting $s_k(n)$. The reason is that if the channel is subject to independent Rayleigh fading (the scenario considered in most previous Massive MIMO works), it hardens when the number of BS antennas is large, and hence $\alpha_{kk} \approx \mathsf{E}\{\alpha_{kk}\}$; $\mathsf{E}\{\alpha_{kk}\}$ is then a good estimate of $\alpha_{kk}$. However, as seen in Section 3, under other propagation models the channel may not harden when $M \to \infty$, and then using $\mathsf{E}\{\alpha_{kk}\}$ as the true effective channel $\alpha_{kk}$ to detect $s_k(n)$ may result in poor performance.

For the reasons explained, it is desirable that the users estimate their effective channels. One way to do this is to have the BS transmit beamformed downlink pilots [2]. Then at least $K$ downlink pilot symbols are required. This can significantly reduce the spectral efficiency. For example, suppose antennas serve users, in a coherence interval of length symbols. If half of the coherence interval is used for the downlink, then with the downlink beamforming training of [2], we need to spend at least symbols for sending pilots. As a result, less than of the downlink symbols are used for payload in each coherence interval, and the insertion of the downlink pilots reduces the overall (uplink + downlink) spectral efficiency by a factor of .

In what follows, we propose a blind channel estimation method which does not require any downlink pilots.

### 4.1 Downlink Blind Channel Estimation Algorithm

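We next describe our downlink blind channel estimation algorithm, a refined version of the scheme in [1]. Consider the sample average power of the received signal at the $k$th user per coherence interval:

$$
\xi_k \triangleq \frac{|y_k(1)|^2 + |y_k(2)|^2 + \ldots + |y_k(\tau_d)|^2}{\tau_d}, \tag{23}
$$

where $y_k(n)$ is the $n$th sample received at the $k$th user and $\tau_d$ is the number of symbols per coherence interval spent on downlink transmission. From (7), and by using the law of large numbers, we have, as $\tau_d \to \infty$,

$$
\xi_k - \left(\rho_d\eta_k|\alpha_{kk}|^2 + \sum_{k'\neq k}^{K}\rho_d\eta_{k'}|\alpha_{kk'}|^2 + 1\right) \xrightarrow{P} 0. \tag{24}
$$

Since $\sum_{k'\neq k}^{K}\rho_d\eta_{k'}|\alpha_{kk'}|^2$ is a sum of many terms, it can be approximated by its mean (this follows from the law of large numbers). As a consequence, when $K$ and $\tau_d$ are large, $\xi_k$ in (23) can be approximated as follows:

$$
\xi_k \approx \rho_d\eta_k|\alpha_{kk}|^2 + \rho_d\,\mathsf{E}\left\{\sum_{k'\neq k}^{K}\eta_{k'}|\alpha_{kk'}|^2\right\} + 1. \tag{25}
$$

Furthermore, the approximation (25) is still good even if $K$ is small. The reason is that when $K$ is small, with high probability the term $\sum_{k'\neq k}^{K}\rho_d\eta_{k'}|\alpha_{kk'}|^2$ is much smaller than $\rho_d\eta_k|\alpha_{kk}|^2$, since with high probability $|\alpha_{kk'}|^2 \ll |\alpha_{kk}|^2$. As a result, $\sum_{k'\neq k}^{K}\rho_d\eta_{k'}|\alpha_{kk'}|^2$ can be approximated by its mean even for small $K$. (In fact, in the special case of $K = 1$, this sum is zero.)

Equation (25) enables us to estimate the amplitude of the effective channel gain $\alpha_{kk}$ using the received samples via $\xi_k$ as follows:

$$
\widehat{|\alpha_{kk}|} = \sqrt{\frac{\xi_k - 1 - \rho_d\,\mathsf{E}\left\{\sum_{k'\neq k}^{K}\eta_{k'}|\alpha_{kk'}|^2\right\}}{\rho_d\eta_k}}. \tag{26}
$$

In case the argument of the square root is non-positive, we set the estimate $\widehat{|\alpha_{kk}|}$ equal to $\mathsf{E}\{|\alpha_{kk}|\}$.

For completeness, the $k$th user also needs to estimate the phase of $\alpha_{kk}$. When $M$ is large, with high probability, the real part of $\alpha_{kk}$ is much larger than the imaginary part of $\alpha_{kk}$. Thus, the phase of $\alpha_{kk}$ is very small and can be set to zero. Based on that observation, we propose to treat the estimate of $|\alpha_{kk}|$ as the estimate of the true $\alpha_{kk}$:

$$
\hat{\alpha}_{kk} = \begin{cases} \sqrt{\dfrac{\xi_k - 1 - \rho_d\,\mathsf{E}\left\{\sum_{k'\neq k}^{K}\eta_{k'}|\alpha_{kk'}|^2\right\}}{\rho_d\eta_k}}, & \text{if } \xi_k > 1 + \rho_d\,\mathsf{E}\left\{\sum_{k'\neq k}^{K}\eta_{k'}|\alpha_{kk'}|^2\right\}, \\[2ex] \mathsf{E}\{|\alpha_{kk}|\}, & \text{otherwise.} \end{cases} \tag{27}
$$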

The algorithm for estimating the downlink effective channel gain $\alpha_{kk}$ is summarized as follows:

###### Algorithm 1
(Blind downlink channel estimation method)

1. For each coherence interval, using a data block of $\tau_d$ samples $y_k(n)$, compute $\xi_k$ according to (23).
2. The $k$th user acquires $\eta_k$ and $\mathsf{E}\left\{\sum_{k'\neq k}^{K}\eta_{k'}|\alpha_{kk'}|^2\right\}$. See Remark 1 for a detailed discussion on how to acquire these values.
3. The estimate of the effective channel gain $\alpha_{kk}$ is given by (27).

###### Remark 1

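To implement Algorithm 1, the $k$th user has to know $\eta_k$ and $\mathsf{E}\left\{\sum_{k'\neq k}^{K}\eta_{k'}|\alpha_{kk'}|^2\right\}$. We assume that the $k$th user knows these values. This assumption is reasonable since these values depend only on the large-scale fading coefficients, which stay constant over many coherence intervals. The BS can compute these values and inform the $k$th user about them. In addition, $\mathsf{E}\left\{\sum_{k'\neq k}^{K}\eta_{k'}|\alpha_{kk'}|^2\right\}$ can be expressed in closed form (except in the case of ZF processing with keyhole channels) as follows:

$$
\mathsf{E}\left\{\sum_{k'\neq k}^{K}\eta_{k'}|\alpha_{kk'}|^2\right\} = \begin{cases} \displaystyle\sum_{k'\neq k}^{K}\eta_{k'}\beta_k, & \text{for MR (Rayleigh/keyhole channels)}, \\[2ex] \displaystyle\sum_{k'\neq k}^{K}\eta_{k'}(\beta_k - \gamma_k), & \text{for ZF (Rayleigh channels)}. \end{cases} \tag{28}
$$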

Detailed derivations of (28) are presented in Appendix .2.
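The steps of Algorithm 1 can be sketched as follows for MR processing over Rayleigh fading. This is a minimal illustration under stated assumptions, not the authors' implementation: perfect CSI at the BS ($\hat{\mathbf{G}} = \mathbf{G}$), equal power coefficients, $\mathcal{CN}(0,1)$ data symbols, and the fallback value $\mathsf{E}\{|\alpha_{kk}|\}$ approximated by $\sqrt{M\beta_k}$. The function name `blind_gain_estimate` and all numerical values are hypothetical.

```python
import numpy as np

def cn(rng, *shape):
    """i.i.d. CN(0, 1) samples of the given shape."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def blind_gain_estimate(y, rho_d, eta_k, mean_interf, fallback):
    """Algorithm 1 for one user and one coherence interval.

    y           : received samples y_k(1), ..., y_k(tau_d)
    mean_interf : E{sum_{k' != k} eta_k' |alpha_kk'|^2}, e.g. from (28)
    fallback    : value used when the square-root argument is non-positive
    """
    xi = np.mean(np.abs(y) ** 2)                            # (23)
    arg = (xi - 1.0 - rho_d * mean_interf) / (rho_d * eta_k)
    return np.sqrt(arg) if arg > 0 else fallback            # (26), (27)

rng = np.random.default_rng(3)
M, K, tau_d, rho_d = 100, 5, 2000, 10.0                     # illustrative values
beta = np.ones(K)
eta = np.full(K, 1.0 / K)                                   # equal power split

G = np.sqrt(beta) * cn(rng, M, K)                           # Rayleigh fading, (1)
A = G / np.linalg.norm(G, axis=0, keepdims=True)            # MR precoding with G_hat = G
alpha = G.conj().T @ A                                      # alpha[k, k'] = g_k^H a_k'
S = cn(rng, K, tau_d)                                       # data symbols
Y = np.sqrt(rho_d) * alpha @ (np.sqrt(eta)[:, None] * S) + cn(rng, K, tau_d)   # (7)

k = 0
mean_interf = np.sum(np.delete(eta, k)) * beta[k]           # MR case of (28)
alpha_hat = blind_gain_estimate(Y[k], rho_d, eta[k], mean_interf,
                                fallback=np.sqrt(M * beta[k]))  # approximate E{|alpha_kk|}
alpha_true = alpha[k, k].real                               # equals ||g_k|| in this setup
```

In this setup the user never sees a downlink pilot: `alpha_hat` is computed from the data samples `Y[k]` and from quantities that depend only on large-scale fading, and it should land within a few percent of `alpha_true`.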

### 4.2 Asymptotic Performance Analysis

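In this section, we analyze the accuracy of our proposed downlink blind channel estimation scheme when $\tau_d$ and $M$ go to infinity for two specific propagation channels: Rayleigh fading and keyhole channels. We use the model (18) for keyhole channels. When $\tau_d \to \infty$, $\xi_k$ in (23) is equal to its asymptotic value:

$$
\xi_k - \left(\rho_d\eta_k|\alpha_{kk}|^2 + \sum_{k'\neq k}^{K}\rho_d\eta_{k'}|\alpha_{kk'}|^2 + 1\right) \to 0, \tag{29}
$$

and hence, the channel estimate in (27) becomes

$$
\hat{\alpha}_{kk} = \begin{cases} \sqrt{|\alpha_{kk}|^2 + \displaystyle\sum_{k'\neq k}^{K}\frac{\eta_{k'}}{\eta_k}\left(|\alpha_{kk'}|^2 - \mathsf{E}\{|\alpha_{kk'}|^2\}\right)}, & \text{if } \xi_k > 1 + \rho_d\,\mathsf{E}\left\{\sum_{k'\neq k}^{K}\eta_{k'}|\alpha_{kk'}|^2\right\}, \\[2ex] \mathsf{E}\{|\alpha_{kk}|\}, & \text{otherwise.} \end{cases} \tag{30}
$$

Since $\tau_d \to \infty$, it is reasonable to assume that the BS can perfectly estimate the channels in the uplink training phase, i.e., we have $\hat{\mathbf{G}} = \mathbf{G}$. (This can be achieved by using a very long uplink training duration.) With this assumption, $\alpha_{kk}$ is a positive real value. Thus, (30) can be rewritten as

$$
\frac{\hat{\alpha}_{kk}}{\alpha_{kk}} = \begin{cases} \sqrt{1 + \displaystyle\sum_{k'\neq k}^{K}\frac{\eta_{k'}}{\eta_k}\,\frac{|\alpha_{kk'}|^2 - \mathsf{E}\{|\alpha_{kk'}|^2\}}{\alpha_{kk}^2}}, & \text{if } \xi_k > 1 + \rho_d\,\mathsf{E}\left\{\sum_{k'\neq k}^{K}\eta_{k'}|\alpha_{kk'}|^2\right\}, \\[2ex] \dfrac{\mathsf{E}\{\alpha_{kk}\}}{\alpha_{kk}}, & \text{otherwise.} \end{cases} \tag{31}
$$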

#### Maximum-Ratio Processing

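With MR processing, from (28) and (31), we have

$$
\frac{\hat{\alpha}_{kk}}{\alpha_{kk}} = \begin{cases} \sqrt{1 + \displaystyle\sum_{k'\neq k}^{K}\frac{\eta_{k'}}{\eta_k}\,\frac{\left|\frac{\mathbf{g}_k^H\mathbf{g}_{k'}}{\|\mathbf{g}_{k'}\|}\right|^2 - \beta_k}{\|\mathbf{g}_k\|^2}}, & \text{if } \xi_k > 1 + \rho_d\displaystyle\sum_{k'\neq k}^{K}\eta_{k'}\beta_k, \\[2ex] \dfrac{\mathsf{E}\{\|\mathbf{g}_k\|\}}{\|\mathbf{g}_k\|}, & \text{otherwise.} \end{cases} \tag{32}
$$

• Rayleigh fading: In this case, the first condition in (32) holds with probability approaching one:

$$
\begin{aligned}
\Pr\left\{\xi_k > 1 + \rho_d\sum_{k'\neq k}^{K}\eta_{k'}\beta_k\right\}
&= \Pr\left\{1 + \sum_{k'=1}^{K}\rho_d\eta_{k'}|\alpha_{kk'}|^2 > 1 + \rho_d\sum_{k'\neq k}^{K}\eta_{k'}\beta_k\right\} \\
&\geq \Pr\left\{\rho_d\eta_k|\alpha_{kk}|^2 > \rho_d\sum_{k'\neq k}^{K}\eta_{k'}\beta_k\right\} \\
&= \Pr\left\{\frac{1}{M}\|\mathbf{g}_k\|^2 > \frac{1}{M}\sum_{k'\neq k}^{K}\frac{\eta_{k'}}{\eta_k}\beta_k\right\} \\
&\to 1, \quad \text{as } M \to \infty,
\end{aligned} \tag{33}
$$

where the convergence follows from the facts that $\frac{1}{M}\|\mathbf{g}_k\|^2 \to \beta_k$ and $\frac{1}{M}\sum_{k'\neq k}^{K}\frac{\eta_{k'}}{\eta_k}\beta_k \to 0$, as $M \to \infty$.

In addition, by the law of large numbers,

$$
\frac{\left|\frac{\mathbf{g}_k^H\mathbf{g}_{k'}}{\|\mathbf{g}_{k'}\|}\right|^2 - \beta_k}{\|\mathbf{g}_k\|^2} = \left(\left|\frac{\mathbf{g}_k^H\mathbf{g}_{k'}}{M}\right|^2\frac{M}{\|\mathbf{g}_{k'}\|^2} - \frac{\beta_k}{M}\right)\frac{M}{\|\mathbf{g}_k\|^2} \to 0, \quad \text{as } M \to \infty. \tag{34}
$$

From (32), (33), and (34), we obtain

$$
\frac{\hat{\alpha}_{kk}}{\alpha_{kk}} \to 1, \quad \text{as } M \to \infty. \tag{35}
$$

Our proposed scheme is therefore expected to work very well at large $\tau_d$ and $M$.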

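• Keyhole channels: Following a methodology similar to that used in the case of Rayleigh fading, and using the identity

$$
\frac{\mathbf{g}_k^H\mathbf{g}_{k'}}{\|\mathbf{g}_{k'}\|} = \sqrt{\beta_k}\sum_{j=1}^{n_k} c_j^{(k)} a_j^{(k)}\nu_j^{(k)}, \tag{36}
$$

where $\nu_j^{(k)}$ is $\mathcal{CN}(0, 1)$ distributed, we arrive at the same result as (35). The random variable $\nu_j^{(k)}$ is Gaussian because, conditioned on $\mathbf{g}_{k'}$, $\nu_j^{(k)}$ is a Gaussian RV with zero mean and unit variance, which is independent of $\mathbf{g}_{k'}$.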

#### Zero-forcing Processing

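With ZF processing, when $\tau_d \to \infty$,

$$
\frac{\hat{\alpha}_{kk}}{\alpha_{kk}} \to 1, \quad \text{as } M \to \infty. \tag{37}
$$

This follows from (29) and the fact that $\alpha_{kk'} \to 0$, for $k \neq k'$.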

## 5 Capacity Lower Bound

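Next, we give a new capacity lower bound for Massive MIMO with downlink channel gain estimation. It can be applied, in particular, to our proposed blind channel estimation scheme. Denote $\mathbf{y}_k \triangleq [y_k(1) \ldots y_k(\tau_d)]^T$, $\mathbf{s}_k \triangleq [s_k(1) \ldots s_k(\tau_d)]^T$, and $\mathbf{w}_k \triangleq [w_k(1) \ldots w_k(\tau_d)]^T$. Then from (7), we have

$$
\mathbf{y}_k = \sqrt{\rho_d\eta_k}\,\alpha_{kk}\mathbf{s}_k + \sum_{k'\neq k}^{K}\sqrt{\rho_d\eta_{k'}}\,\alpha_{kk'}\mathbf{s}_{k'} + \mathbf{w}_k. \tag{38}
$$

The capacity of (38) is lower bounded by the mutual information between the unknown transmitted signal $\mathbf{s}_k$ and the observed/known values $\mathbf{y}_k$, $\hat{\alpha}_{kk}$. More precisely, for any distribution of $\mathbf{s}_k$, we obtain the following capacity bound for the $k$th user:

$$
\begin{aligned}
C_k &\geq \frac{1}{\tau_d} I(\mathbf{y}_k, \hat{\alpha}_{kk}; \mathbf{s}_k) \\
&= \frac{1}{\tau_d}\left[h(\mathbf{s}_k) - h(\mathbf{s}_k \,|\, \mathbf{y}_k, \hat{\alpha}_{kk})\right] \\
&\overset{(a)}{=} \frac{1}{\tau_d}h(\mathbf{s}_k) - \frac{1}{\tau_d}\Big[h(s_k(1) \,|\, \mathbf{y}_k, \hat{\alpha}_{kk}) + h(s_k(2) \,|\, s_k(1), \mathbf{y}_k, \hat{\alpha}_{kk}) \\
&\qquad + \ldots + h(s_k(\tau_d) \,|\, s_k(1), \ldots, s_k(\tau_d - 1), \mathbf{y}_k, \hat{\alpha}_{kk})\Big] \\
&\overset{(b)}{\geq} \frac{1}{\tau_d}h(\mathbf{s}_k) - \frac{1}{\tau_d}\Big[h(s_k(1) \,|\, \mathbf{y}_k, \hat{\alpha}_{kk}) + h(s_k(2) \,|\, \mathbf{y}_k, \hat{\alpha}_{kk}) \\
&\qquad + \ldots + h(s_k(\tau_d) \,|\, \mathbf{y}_k, \hat{\alpha}_{kk})\Big],
\end{aligned} \tag{39}
$$

where in $(a)$ we have used the chain rule [25], and in $(b)$ we have used the fact that conditioning reduces entropy.

It is difficult to compute $h(s_k(n) \,|\, \mathbf{y}_k, \hat{\alpha}_{kk})$ in (39) since $\hat{\alpha}_{kk}$ and $s_k(n)$ are correlated. To render the problem more tractable, we introduce new variables $\{\hat{\hat{\alpha}}_{kk}(n)\}$, $n = 1, \ldots, \tau_d$, which can be considered as the channel estimates of $\alpha_{kk}$ using Algorithm 1, but where $\xi_k$ is now computed as

$$
\frac{|y_k(1)|^2 + \ldots + |y_k(n-1)|^2 + |y_k(n+1)|^2 + \ldots + |y_k(\tau_d)|^2}{\tau_d - 1}.
$$

Clearly, $\hat{\hat{\alpha}}_{kk}(n)$ is very close to $\hat{\alpha}_{kk}$. More importantly, $\hat{\hat{\alpha}}_{kk}(n)$ is independent of $s_k(n)$, $n = 1, \ldots, \tau_d$. This fact will be used in the subsequent derivation of the capacity lower bound.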
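The leave-one-out averages above can be computed for all $n$ at once from a single pass over the received block, as in the following sketch. The function name `leave_one_out_power` is introduced here for illustration only.

```python
import numpy as np

def leave_one_out_power(y):
    """For every n, return (sum over m != n of |y(m)|^2) / (tau_d - 1)."""
    p = np.abs(np.asarray(y)) ** 2
    return (p.sum() - p) / (p.size - 1)

# Toy block of tau_d = 3 samples with powers 1, 4, 9.
loo = leave_one_out_power([1, 2j, 3])   # -> [6.5, 5.0, 2.5]
```

Feeding the $n$th entry of this vector into (27) in place of $\xi_k$ gives the $n$th modified estimate, which by construction does not use the sample $y_k(n)$.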

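Since $\hat{\hat{\alpha}}_{kk}(n)$ is a deterministic function of $\mathbf{y}_k$, we have $h(s_k(n) \,|\, \mathbf{y}_k, \hat{\alpha}_{kk}) = h(s_k(n) \,|\, \mathbf{y}_k, \hat{\alpha}_{kk}, \hat{\hat{\alpha}}_{kk}(n))$, and hence, (39) becomes

$$
\begin{aligned}
C_k &\geq \frac{1}{\tau_d}h(\mathbf{s}_k) - \frac{1}{\tau_d}\Big[h(s_k(1) \,|\, \mathbf{y}_k, \hat{\alpha}_{kk}, \hat{\hat{\alpha}}_{kk}(1)) + \ldots + h(s_k(\tau_d) \,|\, \mathbf{y}_k, \hat{\alpha}_{kk}, \hat{\hat{\alpha}}_{kk}(\tau_d))\Big] \\
&\geq \frac{1}{\tau_d}h(\mathbf{s}_k) - \frac{1}{\tau_d}\Big[h(s_k(1) \,|\, y_k(1), \hat{\hat{\alpha}}_{kk}(1)) + \ldots + h(s_k(\tau_d) \,|\, y_k(\tau_d), \hat{\hat{\alpha}}_{kk}(\tau_d))\Big],
\end{aligned} \tag{40}
$$

where in the last inequality we have again used the fact that conditioning reduces entropy. The bound (40) holds irrespective of the distribution of $\mathbf{s}_k$. By taking $s_k(1), \ldots, s_k(\tau_d)$ to be i.i.d. $\mathcal{CN}(0, 1)$, we obtain

$$
C_k \geq \log_2(\pi e) - h(s_k(1) \,|\, y_k(1), \hat{\hat{\alpha}}_{kk}(1)). \tag{41}
$$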

The right hand side of (41) is the mutual information between