
# Low-Complexity Message Passing Based Massive MIMO Channel Estimation by Exploiting Unknown Sparse Common Support with Dirichlet Process

## Abstract

This paper investigates the problem of estimating sparse channels in massive MIMO systems. Most wireless channels are sparse with large delay spread, and some channels can be observed to have sparse common support (SCS) within a certain area of the antenna array, i.e., the antenna array can be grouped into several clusters according to the sparse supports of the channels. The SCS property is attractive for the estimation of the large number of channels in massive MIMO systems. Using the SCS of channels, one expects better performance, but the number of clusters and the elements of each cluster are always unknown at the receiver. In this paper, the Dirichlet process is exploited to model such sparse channels, where the channels in each cluster have SCS. We propose a low-complexity message passing based sparse Bayesian learning algorithm to perform channel estimation in massive MIMO systems by combining BP with MF on a factor graph. Simulation results demonstrate that the proposed massive MIMO sparse channel estimation outperforms the state-of-the-art algorithms. In particular, it even shows better performance than the variational Bayesian method applied to massive MIMO channel estimation.

**Index Terms**: Channel estimation, massive MIMO, message passing, Dirichlet process, sparse Bayesian learning, sparse common support.

## 1 Introduction

Deploying multiple antennas in wireless communication systems often yields significant performance advantages in power gain, channel robustness, diversity and spatial multiplexing [1], [2]. Therefore, multiple-input multiple-output (MIMO) technology has attracted widespread attention from researchers [1, 2, 3, 4, 5, 6, 7]. However, accurate channel estimation is needed to realize the full potential of MIMO systems [3], [4]. As the number of transmit antennas increases, the receiver has to estimate proportionally more channel coefficients, which in turn increases the pilot overhead and tends to reduce the overall MIMO throughput gains [5]. Hence, exploring efficient channel estimation techniques for massive MIMO systems that require less computational complexity and fewer pilots is a challenge, which has been thoroughly addressed in [4, 5, 6, 7].

To reduce the overhead, some works studied and exploited the sparse common support (SCS) that approximately exists among the sparse channels of MIMO systems [8, 7, 9]. It is reasonable to assume that closely spaced antennas observe almost the same echoes from different reflectors or scatterers, and therefore the corresponding sparse channels exhibit common support. [8] expounds that two channel taps are resolvable if their time difference of arrival is larger than $1/B$, where $B$ is the signal bandwidth. In other words, the channels corresponding to two antennas have SCS when their distance is less than $c/B$, with $c$ standing for the speed of light. The channel estimation algorithms [8, 9] exploiting the SCS property of all the channels perform well with less pilot overhead when even the farthest antennas of the array are close enough. Masood et al. studied the SCS property for different antenna arrays under several typical communication standards [7], and illustrated that the full SCS may not hold for large antenna arrays and wide bandwidths. This means the applicability of the algorithms in [8, 9] is limited. In this work, we focus on channel estimation for massive MIMO systems with large antenna arrays, where the full SCS often does not exist, but the channels in each cluster have SCS.

Pertaining to the aforementioned scenario, a message passing based channel estimation algorithm is proposed in this paper, which exploits the unknown SCS information by leveraging the clustering property of the Dirichlet process (DP). By assuming that channels with the SCS property share a precision vector, the unknown SCS information can be learned automatically through the Dirichlet process. Generally speaking, the proposed algorithm is based on the following two techniques.

(1) Dirichlet process mixture. In the context of Bayesian non-parametric methods, DP mixture models [10, 11] have been studied for more than three decades, and have been used in multi-target tracking [12], image segmentation [13], direction of arrival (DoA) estimation [14], and many other scenarios. As in [15], we also impose a DP prior over the sparse Bayesian learning (SBL) model [16], denoted as DP-SBL, and apply such a model to channel estimation in massive MIMO systems.

(2) Factor graph and message passing. Due to its remarkable performance, factor graph and message passing inference technology [17] has been widely used in the design of wireless communication systems [6, 18, 19, 20]. Since each of the message passing rules, e.g., belief propagation (BP) [17], mean field (MF) [21] and expectation propagation (EP) [22], has its own strengths, methods that combine BP with MF or EP in a unified framework on the same factor graph have been proposed [23], [20], which keep their virtues while avoiding their respective drawbacks. In this paper, the DP-SBL model is built on a factor graph with combined BP-MF message passing, while some messages are handled by the recently developed generalized approximate message passing (GAMP) to reduce the complexity [24], [25]. Compared to the variational Bayesian (VB) method in the literature [26], [15], the proposed algorithm reduces the complexity significantly.

In summary, the proposed channel estimation algorithm has the following distinctive features. It utilizes the sparsity of the channel impulse response (CIR), and the fact that the antennas in a massive MIMO system can be grouped into clusters according to their SCS property. By adopting the DP-SBL model, the SCS information can be learned automatically, so channels with SCS can be estimated jointly. The DP mixture is modeled and derived on the factor graph with combined message passing, which significantly reduces the complexity. Simulation results show that the proposed SCS-exploiting channel estimation algorithm achieves significant performance gain and robustness over other methods in the literature.

The remainder of this paper is organized as follows. The transmission model and channel model of the MIMO-OFDM system are described in Section II. In Section III, we present the DP mixture and the probabilistic model. The message computations, schedule and complexity comparison of the proposed message passing based algorithm are detailed in Section IV. Numerical results are provided in Section V.

*Notation*: Boldface lower-case and upper-case letters denote vectors and matrices, respectively. Superscripts $(\cdot)^*$ and $(\cdot)^T$ represent conjugation and transposition, respectively. The expectation operator with respect to a density $q$ is expressed by $\langle\cdot\rangle_q$. The pdf of a complex Gaussian distribution with mean $\hat{x}$ and variance $\nu$ is represented by $\mathcal{CN}(x;\hat{x},\nu)$. The pdf of the Gamma distribution with shape parameter $a$ and rate parameter $b$ is denoted by $\mathrm{Ga}(x;a,b)$, and the beta distribution with shape parameters $a,b$ is denoted by $\mathrm{Be}(x;a,b)$. The gamma and digamma functions are represented by $\Gamma(\cdot)$ and $\Psi(\cdot)$, respectively. The relation $f(x)=cg(x)$ for some positive constant $c$ is written as $f(x)\propto g(x)$. We use $\mathrm{diag}(\mathbf{a})$ to transform the vector $\mathbf{a}$ into a diagonal matrix with the entries of $\mathbf{a}$ spread along the diagonal.

## 2 System Model

### 2.1 MIMO-OFDM Transmission Model

Consider the uplink of a multiuser massive MIMO-OFDM system consisting of $U$ users, each equipped with one antenna, and a receiver equipped with $M$ antennas. To combat inter-symbol interference, the users are modulated by OFDM. The symbols transmitted by the $u$th user in the frequency domain are denoted by $\mathbf{x}^{(u)}$. Among the subcarriers, $N$ uniformly spaced ones are selected for each user to transmit pilot signals, with $\mathcal{P}^{(u)}$ representing the index set of pilot subcarriers of user $u$. As in [20], we assume that $\mathcal{P}^{(u)}\cap\mathcal{P}^{(u')}=\emptyset$ for $u\neq u'$, i.e., when a pilot subcarrier is employed by one user, the remaining users do not use it to transmit any signal. The received signal at the $m$th receive antenna from the $u$th user can be written as

$$\mathbf{y}^{(m,u)} = \mathbf{X}^{(u)}\mathbf{h}^{(m,u)} + \mathbf{n}^{(m)}, \tag{1}$$

where $\mathbf{X}^{(u)}$ stands for the diagonalized pilot symbols of the $u$th user, $\mathbf{h}^{(m,u)}$ represents the vector of frequency-domain channel weights between the $u$th user and the $m$th receive antenna, and $\mathbf{n}^{(m)}$ represents the additive white Gaussian noise (AWGN) with zero mean and variance $\lambda^{-1}$. Since the users are independent of each other and their pilot subcarriers do not overlap, we consider only one user without loss of generality. In the rest of this paper, we drop the superscript $u$ for convenience, so the receive model in (1) takes the simplified form

$$\mathbf{y}^{(m)} = \mathbf{X}\mathbf{h}^{(m)} + \mathbf{n}^{(m)}. \tag{2}$$

### 2.2 Spatial Channel Model

It is known that most wireless channels can be modeled as multi-path channels with large delay spread and very few significant paths, since scatterers are sparsely distributed in space. This makes the CIR sparse [27, 28]. Thus, for each transmit-receive link, we only need to estimate a few significant multi-path channel gains, which has the potential to reduce the pilot overhead substantially. Following [7], [29], we also build the frequency-domain channel weights on a tapped delay line model

$$\mathbf{h}^{(m,u)} = \mathbf{F}^{(u)}\boldsymbol{\alpha}^{(m,u)}, \tag{3}$$

where $\mathbf{F}^{(u)}$ represents the truncated Fourier matrix formed by selecting the rows indexed by $\mathcal{P}^{(u)}$ and the first $L$ columns from the discrete Fourier transform (DFT) matrix, and $\boldsymbol{\alpha}^{(m,u)}$ denotes the $L$-tap sparse channel between the $u$th user and the $m$th receive antenna. As with (2), equation (3) can also be simplified as

$$\mathbf{h}^{(m)} = \mathbf{F}\boldsymbol{\alpha}^{(m)}. \tag{4}$$
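
As an illustration, the observation model (2) together with the tapped-delay-line model (4) can be sketched numerically. The dimensions, pilot placement, noise level and unit-modulus pilot symbols below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sub, N, L = 256, 32, 64                     # total subcarriers, N pilot subcarriers, L taps
pilot_idx = np.arange(0, n_sub, n_sub // N)   # uniformly spaced pilot subcarriers

# Truncated Fourier matrix F: pilot rows and first L columns of the DFT matrix
dft = np.exp(-2j * np.pi * np.outer(np.arange(n_sub), np.arange(n_sub)) / n_sub)
F = dft[pilot_idx, :L] / np.sqrt(n_sub)

# Sparse L-tap channel alpha with a few significant paths (4 paths, illustrative)
alpha = np.zeros(L, dtype=complex)
support = rng.choice(L, size=4, replace=False)
alpha[support] = (rng.standard_normal(4) + 1j * rng.standard_normal(4)) / np.sqrt(2)

h = F @ alpha                                 # frequency-domain channel weights, eq. (4)
x = np.ones(N)                                # unit-modulus pilot symbols (assumption)
noise = 0.01 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = x * h + noise                             # received pilot observations, eq. (2)
```

Because only $N \ll n_{\text{sub}}$ pilot observations are taken per antenna while the channel has just a few active taps, recovering $\boldsymbol{\alpha}$ is exactly the sparse-recovery problem that SBL addresses.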

Due to the physical properties of outdoor electromagnetic propagation, the CIRs measured at different antennas of MIMO systems share a common support, i.e., the times of arrival (ToA) at different antennas are similar while the path amplitudes and phases are distinct [8]. An example of the SCS channel model for a section of an antenna array is shown in [7, Fig.3] and [8, Fig.1]. Since the degrees of freedom to be estimated are reduced under such an SCS assumption, the pilot overhead of channel estimation can be reduced.

It is important to note that the SCS assumption only holds with respect to the channel bandwidth and the signal-to-noise ratio (SNR) of the channel. One can assume that antennas with distance less than $d_{\max}$ share a common support [8, 7]. The authors of [7] illustrate the relationship between the maximum resolvable distance $d_{\max}$ and the dimensions of the arrays for three different communication standards, where the distance between two adjacent antennas is assumed to be $\lambda_c/2$, with $\lambda_c$ the signal wavelength. It can be seen that such SCS support may not hold with large bandwidth and large arrays. A schematic diagram of an antenna array without the full SCS property is shown in Fig. 1 [7].

To the authors’ knowledge, there is a lack of conclusive models for the support pattern in massive MIMO. In this paper we model the channels of massive MIMO under a simple assumption: an antenna may have common support with its neighbors with probability $p$. With such an assumption, the channels can be grouped into several clusters, and the antennas in each cluster have the property of common support. This model is more general than [8], i.e., when $p=1$ the proposed channel model is equivalent to [8]. The construction of the aforementioned model is detailed in Section 5.
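
The neighbor-sharing assumption above can be sketched as follows; `group_antennas` is a hypothetical helper, and the sequential left-to-right construction is one plausible reading of the assumption, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

def group_antennas(M, p):
    """Assign each of M antennas to a cluster: antenna m joins the cluster of
    its neighbor m-1 with probability p, otherwise it starts a new cluster."""
    labels = [0]
    for m in range(1, M):
        if rng.random() < p:
            labels.append(labels[-1])       # share support with the neighbor
        else:
            labels.append(labels[-1] + 1)   # open a new cluster
    return np.array(labels)

labels = group_antennas(M=64, p=0.9)
```

With `p=1.0` all antennas fall into a single cluster, recovering the full-SCS model of [8]; with `p=0.0` every antenna has its own support.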

## 3 Unknown Sparse Common Support Using Sparse Bayesian Learning with Dirichlet Process

To acquire the SCS information in the MIMO-OFDM system, we resort to sparse Bayesian learning (SBL) with a Dirichlet process (DP) prior, i.e., DP-SBL, as in [15] and [14]. In this section we first introduce the DP and SBL models briefly, and then present the DP-SBL model using a probabilistic model and factor graph.

### 3.1 SBL with Dirichlet Process Prior

#### Sparse Bayesian Learning Model

Since equation (2) is a typical SBL problem, here we employ a two-layer (2-L) hierarchical structure [16] that assumes a conditional prior pdf as

$$p(\boldsymbol{\alpha}^{(m)}|\boldsymbol{\gamma}) = \prod_l \mathcal{CN}\big(\alpha_l^{(m)};0,\gamma_l^{-1}\big), \quad \forall m, \qquad p(\gamma_l) = \mathrm{Ga}(\gamma_l;c,d).$$

The above equations imply that all the sparse channel taps $\alpha_l^{(m)}$, $\forall m$, share the common hyper-prior $\gamma_l$, which is equivalent to the assumption in [9]. However, in this paper we address the problem that the channels may be grouped into several clusters, and the common hyper-prior may only be appropriate within each cluster. Through a DP employed as the prior over $\boldsymbol{\gamma}$, we can simultaneously perform the clustering and SBL.

#### Dirichlet Process

The Dirichlet process, denoted as $\mathrm{DP}(\eta,G_0)$, is a measure on measures, and is parameterized by a positive scaling parameter $\eta$ and a base distribution $G_0$. Assume each $\boldsymbol{\gamma}^{(k)}$, $k=1{:}K$, is drawn identically from $G$, where $G$ itself is a random measure drawn from a Dirichlet process,

$$\boldsymbol{\gamma}^{(k)} \sim G, \quad k=1{:}K, \qquad G \sim \mathrm{DP}(\eta,G_0),$$

where $x\sim p$ denotes that $x$ follows the distribution $p$. Since the explicit formulation of $G$ is unattainable, a definition of $G$ in terms of a stick-breaking construction was provided in [11], as

$$G = \sum_{k=1}^{\infty}\omega_k\,\delta\big(\tilde{\boldsymbol{\gamma}}^{(k)}\big) \tag{5}$$

with

$$\omega_k = \pi_k\prod_{i=1}^{k-1}(1-\pi_i), \tag{6}$$
$$p\big(\tilde{\boldsymbol{\gamma}}^{(k)}\big) = G_0, \tag{7}$$

where $\delta(\cdot)$ is the Dirac delta function, and the parameter $\pi_k$ has the prior distribution $\mathrm{Be}(\pi_k;1,\eta)$. It is known from (11) that the base distribution $G_0$ is selected as a Gamma distribution. The infinite number of components in (5) would inevitably result in intractable complexity. In practice, the number of components is truncated to a relatively large number $K$. In this paper, $K$ is set to be the number of antennas $M$ without loss of generality [14].
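
A minimal sketch of the truncated stick-breaking weights in (5)-(6). Setting the last stick to one, so that the truncated weights sum to unity, is a common convention in variational treatments of DP mixtures; it is an assumption here, not a detail taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def stick_breaking_weights(eta, K):
    """Truncated stick-breaking: draw pi_k ~ Beta(1, eta) and form
    omega_k = pi_k * prod_{i<k} (1 - pi_i), as in eqs. (5)-(6)."""
    pi = rng.beta(1.0, eta, size=K)
    pi[-1] = 1.0  # last stick takes all remaining mass (truncation convention)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - pi[:-1])))
    return pi * remaining

w = stick_breaking_weights(eta=1.0, K=16)
```

Small $\eta$ concentrates the mass on a few components (few clusters of antennas); large $\eta$ spreads it over many.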

### 3.2 Probabilistic Model and Factor Graph Representation

Following the stick-breaking construction of the DP mixture in [26], we introduce the assignment variables $\mathbf{z}^{(m)}$, which can be defined by the indicator function

$$\mathbbm{1}\big[z_k^{(m)}\big] = \begin{cases}1, & \text{if } \boldsymbol{\gamma}^{(m)} = \tilde{\boldsymbol{\gamma}}^{(k)},\\[2pt] 0, & \text{otherwise},\end{cases}$$

which indicates the mixture component $\tilde{\boldsymbol{\gamma}}^{(k)}$ with which $\boldsymbol{\alpha}^{(m)}$ is associated. Then the assignment vector $\mathbf{z}^{(m)}$ has a multinomial distribution with parameter set $\{\omega_k\}_{k=1:K}$, i.e.,

$$p\big(\mathbf{z}^{(m)}|\{\omega_k\}_{k=1:K}\big) = \mathrm{Mult}\big(\{\omega_k\}_{k=1:K}\big).$$

Using the deterministic relationship between $\omega_k$ and $\pi_k$ in (6), we can define the conditional distribution as

$$p\big(\mathbf{z}^{(m)}|\boldsymbol{\pi}\big) = \prod_{k=1}^{K}\Big(\pi_k\prod_{i=1}^{k-1}(1-\pi_i)\Big)^{\mathbbm{1}[z_k^{(m)}]} \triangleq \prod_{k=1}^{K} f_{z_k}^{(m)}\big(z_k^{(m)},\boldsymbol{\pi}\big) \tag{8}$$
$$= \prod_{k=1}^{K}\pi_k^{\mathbbm{1}[z_k^{(m)}]}\prod_{i=k+1}^{K}(1-\pi_k)^{\mathbbm{1}[z_i^{(m)}]} \triangleq f_z^{(m)}\big(\mathbf{z}^{(m)},\boldsymbol{\pi}\big), \tag{9}$$

with vectors $\mathbf{z}^{(m)} = [z_1^{(m)},\ldots,z_K^{(m)}]^T$ and $\boldsymbol{\pi} = [\pi_1,\ldots,\pi_K]^T$.

The distribution of $\boldsymbol{\alpha}^{(m)}$ conditioned on $\mathbf{z}^{(m)}$ and $\tilde{\boldsymbol{\gamma}}^{(k)}$, $\forall k$, can be expressed as

$$p\big(\boldsymbol{\alpha}^{(m)}|\mathbf{z}^{(m)},\tilde{\boldsymbol{\gamma}}^{(k)},\forall k\big) = \prod_l\prod_k\mathcal{CN}\big(\alpha_l^{(m)};0,1/\tilde{\gamma}_l^{(k)}\big)^{\mathbbm{1}[z_k^{(m)}]} \triangleq \prod_l\prod_k f_{D_{k,l}}^{(m)}\big(\alpha_l^{(m)},z_k^{(m)},\tilde{\gamma}_l^{(k)}\big) \triangleq f_D^{(m)}\big(\boldsymbol{\alpha}^{(m)},\mathbf{z}^{(m)},\tilde{\boldsymbol{\gamma}}\big). \tag{10}$$

Following [15], we can also define conditional and prior distributions,

$$p(\boldsymbol{\pi}|\eta) = \prod_k p(\pi_k|\eta) = \prod_k\mathrm{Be}(\pi_k;1,\eta) \triangleq \prod_k f_{\pi_k}(\pi_k,\eta) \triangleq f_\pi(\boldsymbol{\pi},\eta),$$
$$p(\eta) = \mathrm{Ga}(\eta;e,g) \triangleq f_\eta(\eta),$$

and the mixture components have the prior distribution

$$p\big(\tilde{\boldsymbol{\gamma}}^{(k)}\big) = \prod_l\mathrm{Ga}\big(\tilde{\gamma}_l^{(k)};c,d\big) \triangleq \prod_l f_{\gamma_l}^{(k)}\big(\tilde{\gamma}_l^{(k)}\big) \triangleq f_\gamma^{(k)}\big(\tilde{\boldsymbol{\gamma}}^{(k)}\big). \tag{11}$$

From the receive model presented in (2), the likelihood function of the observation vector $\mathbf{y}^{(m)}$ can be written as

$$p\big(\mathbf{y}^{(m)}|\mathbf{h}^{(m)},\lambda\big) = \prod_n\mathcal{CN}\big(y_n^{(m)};x_n h_n^{(m)},\lambda^{-1}\big) \triangleq f_y^{(m)}\big(\mathbf{h}^{(m)},\lambda\big) \triangleq \prod_n f_{y_n}^{(m)}\big(h_n^{(m)},\lambda\big).$$

The deterministic constraint between $\mathbf{h}^{(m)}$ and $\boldsymbol{\alpha}^{(m)}$, as shown in (4), can be expressed as

$$p\big(\mathbf{h}^{(m)}|\boldsymbol{\alpha}^{(m)}\big) = \delta\big(\mathbf{h}^{(m)}-\mathbf{F}\boldsymbol{\alpha}^{(m)}\big) \triangleq f_\delta^{(m)}\big(\mathbf{h}^{(m)},\boldsymbol{\alpha}^{(m)}\big) \triangleq \prod_n f_{\delta_n}^{(m)}\big(h_n^{(m)},\boldsymbol{\alpha}^{(m)}\big).$$

As in [25], we also define the prior of noise precision

$$p(\lambda) = \mathrm{Ga}(\lambda;a,b) \triangleq f_\lambda(\lambda).$$

From the receive model presented in (2) and the SBL with DP prior model listed above, the joint pdf of the collection of observed and unknown variables can be factorized as

$$p\big(\mathbf{y},\mathbf{h},\boldsymbol{\alpha},\mathbf{z},\tilde{\boldsymbol{\gamma}},\boldsymbol{\pi},\eta,\lambda\big) = \prod_m f_y^{(m)}\big(\mathbf{h}^{(m)},\lambda\big)f_\delta^{(m)}\big(\mathbf{h}^{(m)},\boldsymbol{\alpha}^{(m)}\big)f_z^{(m)}\big(\mathbf{z}^{(m)},\boldsymbol{\pi}\big)f_D^{(m)}\big(\boldsymbol{\alpha}^{(m)},\mathbf{z}^{(m)},\tilde{\boldsymbol{\gamma}}\big) \times \prod_k f_\gamma^{(k)}\big(\tilde{\boldsymbol{\gamma}}^{(k)}\big)\times p(\boldsymbol{\pi}|\eta)\,p(\eta)\,f_\lambda(\lambda). \tag{12}$$

The aforementioned factorization can be represented by the factor graph depicted in Fig. 2. For clarity, we group the factor graph into three functional blocks, marked with dashed boxes: the first block represents the DP prior estimation, the second denotes the estimation of the hyper-prior, and the third represents the estimation of the sparse channel taps and noise precision.

## 4 Low Complexity Combined Message Passing Approach

The message computations based on combined belief propagation (BP) and mean field (MF), the message passing schedule and a complexity comparison are presented in this section.

### 4.1 Message Computation

In this subsection, we detail the message computations in accordance with the three functional blocks labeled in Fig. 2. Note that, if a forward message computation requires backward messages, we use the messages from the previous iteration by default.

#### Messages Updating in DP Prior Estimation

Assume the beliefs $b(\alpha_l^{(m)})$ and $b(\tilde{\gamma}_l^{(k)})$ are given from the last iteration. We can compute the message $m_{f_{D_{k,l}}^{(m)}\to z_k^{(m)}}\big(z_k^{(m)}\big)$ using the MF rule,

$$m_{f_{D_{k,l}}^{(m)}\to z_k^{(m)}}\big(z_k^{(m)}\big) = \exp\Big\{\big\langle\log f_{D_{k,l}}^{(m)}\big\rangle_{b(\alpha_l^{(m)})\prod_k b(\tilde{\gamma}_l^{(k)})}\Big\} \propto \exp\Big\{\mathbbm{1}\big[z_k^{(m)}\big]\Big(\big\langle\log\tilde{\gamma}_l^{(k)}\big\rangle_{b(\tilde{\gamma}_l^{(k)})} - \big\langle\tilde{\gamma}_l^{(k)}\big\rangle_{b(\tilde{\gamma}_l^{(k)})}\big(|\hat{\alpha}_l^{(m)}|^2+\nu_{\alpha_l}^{(m)}\big)\Big)\Big\},$$

where $\langle\tilde{\gamma}_l^{(k)}\rangle$ and $\langle\log\tilde{\gamma}_l^{(k)}\rangle$ denote the expectations of $\tilde{\gamma}_l^{(k)}$ and $\log\tilde{\gamma}_l^{(k)}$ with respect to $b(\tilde{\gamma}_l^{(k)})$, and their values are updated in (19) and (20).

With the factor node $f_{z_k}^{(m)}$ defined in (8) and the belief of $\boldsymbol{\pi}$, later defined in (14), message $m_{f_{z_k}^{(m)}\to z_k^{(m)}}\big(z_k^{(m)}\big)$ can be updated by

$$m_{f_{z_k}^{(m)}\to z_k^{(m)}}\big(z_k^{(m)}\big) = \exp\Big\{\big\langle\log f_{z_k}^{(m)}\big\rangle_{b(\boldsymbol{\pi})}\Big\} = \exp\Big\{\mathbbm{1}\big[z_k^{(m)}\big]\Big(\big\langle\log\pi_k\big\rangle_{b(\boldsymbol{\pi})} + \sum_{i=1}^{k-1}\big\langle\log(1-\pi_i)\big\rangle_{b(\boldsymbol{\pi})}\Big)\Big\},$$

where $\langle\log\pi_k\rangle$ and $\langle\log(1-\pi_k)\rangle$ represent the expectations of $\log\pi_k$ and $\log(1-\pi_k)$ with respect to the belief of $\boldsymbol{\pi}$, and are updated in (15) and (16), respectively. Then the belief of $z_k^{(m)}$ can be obtained by

$$b\big(z_k^{(m)}\big) = m_{f_{z_k}^{(m)}\to z_k^{(m)}}\big(z_k^{(m)}\big)\times\prod_l m_{f_{D_{k,l}}^{(m)}\to z_k^{(m)}}\big(z_k^{(m)}\big) = \exp\big\{\mathbbm{1}\big[z_k^{(m)}\big]\hat{E}_{m,k}\big\},$$

where $\hat{E}_{m,k}$ collects the exponent terms of the two messages above. After normalization, the expectation of $\mathbbm{1}[z_k^{(m)}]$ can be updated as

$$\big\langle\mathbbm{1}\big[z_k^{(m)}\big]\big\rangle_{b(z_k^{(m)})} = \frac{\exp\{\hat{E}_{m,k}\}}{\sum_{k'}\exp\{\hat{E}_{m,k'}\}} \triangleq \hat{\phi}_{mk}. \tag{13}$$
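
The normalization in (13) is a softmax over the exponents $\hat{E}_{m,k}$. A small sketch follows; the max-subtraction is a numerical-stability detail of the implementation, not part of the derivation.

```python
import numpy as np

def assignment_probs(E):
    """Normalize the exponents E[m, k] of b(z_k^(m)) into responsibilities
    phi[m, k], eq. (13); each row sums to one."""
    E = E - E.max(axis=1, keepdims=True)  # subtract row max for stability
    phi = np.exp(E)
    return phi / phi.sum(axis=1, keepdims=True)

phi = assignment_probs(np.array([[0.0, np.log(3.0)]]))
# phi -> [[0.25, 0.75]]: the second component is three times as likely
```

Each row of `phi` plays the role of $\hat{\phi}_{mk}$, softly assigning antenna $m$ to the mixture components.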

With the definition of factor node $f_z^{(m)}$ in (9), message $m_{f_z^{(m)}\to\boldsymbol{\pi}}(\boldsymbol{\pi})$ can be updated by the MF rule,

$$m_{f_z^{(m)}\to\boldsymbol{\pi}}(\boldsymbol{\pi}) = \exp\Big\{\big\langle\log f_z^{(m)}\big\rangle_{b(\mathbf{z}^{(m)})}\Big\} = \exp\Big\{\sum_k\Big(\hat{\phi}_{mk}\log\pi_k + \sum_{i=k+1}^{K}\hat{\phi}_{mi}\log(1-\pi_k)\Big)\Big\}.$$

By the factor node $f_\pi$, message $m_{f_\pi\to\boldsymbol{\pi}}(\boldsymbol{\pi})$ can be obtained by the MF rule, i.e.,

$$m_{f_\pi\to\boldsymbol{\pi}}(\boldsymbol{\pi}) = \exp\big\{\langle\log f_\pi\rangle_{b(\eta)}\big\} = \exp\Big\{(\hat{\eta}-1)\sum_k\log(1-\pi_k)\Big\},$$

where $\hat{\eta}$ denotes the expectation of $\eta$ with respect to $b(\eta)$, and is updated in (17). Then the belief of $\boldsymbol{\pi}$ can be obtained by

$$b(\boldsymbol{\pi}) = m_{f_\pi\to\boldsymbol{\pi}}(\boldsymbol{\pi})\times\prod_m m_{f_z^{(m)}\to\boldsymbol{\pi}}(\boldsymbol{\pi}) = \prod_k\exp\Big\{\sum_m\hat{\phi}_{mk}\log\pi_k + \Big[\sum_m\sum_{i=k+1}^{K}\hat{\phi}_{mi}+\hat{\eta}-1\Big]\log(1-\pi_k)\Big\} = \prod_k\pi_k^{\tau_{1k}-1}(1-\pi_k)^{\tau_{2k}-1}, \tag{14}$$

where $\tau_{1k} = \sum_m\hat{\phi}_{mk}+1$ and $\tau_{2k} = \sum_m\sum_{i=k+1}^{K}\hat{\phi}_{mi}+\hat{\eta}$. So the expectations of $\log\pi_k$ and $\log(1-\pi_k)$ with respect to the belief of $\boldsymbol{\pi}$ can be obtained by [15],

$$\langle\log\pi_k\rangle_{b(\boldsymbol{\pi})} = \Psi(\tau_{1k})-\Psi(\tau_{1k}+\tau_{2k}), \tag{15}$$
$$\langle\log(1-\pi_k)\rangle_{b(\boldsymbol{\pi})} = \Psi(\tau_{2k})-\Psi(\tau_{1k}+\tau_{2k}), \tag{16}$$

where $\Psi(\cdot)$ denotes the digamma function, defined as $\Psi(x)=\frac{\mathrm{d}}{\mathrm{d}x}\log\Gamma(x)$.

Then the message from factor node $f_\pi$ to variable node $\eta$ is updated by the MF rule, which reads

$$m_{f_\pi\to\eta}(\eta) = \exp\big\{\langle\log f_\pi\rangle_{b(\boldsymbol{\pi})}\big\} = \eta^K\exp\Big\{(\eta-1)\sum_k\big\langle\log(1-\pi_k)\big\rangle_{b(\boldsymbol{\pi})}\Big\} \propto \eta^K\exp\Big\{\eta\sum_k\big\langle\log(1-\pi_k)\big\rangle_{b(\boldsymbol{\pi})}\Big\}.$$

With its prior $f_\eta(\eta)=\mathrm{Ga}(\eta;e,g)$, here we calculate the belief of $\eta$ as

$$b(\eta) \propto m_{f_\pi\to\eta}(\eta)\times f_\eta(\eta) \propto \eta^{K+e-1}\exp\Big\{-\eta\Big(g-\sum_k\big\langle\log(1-\pi_k)\big\rangle_{b(\boldsymbol{\pi})}\Big)\Big\},$$

and the expectation of $\eta$ can be updated as

$$\hat{\eta} = \langle\eta\rangle_{b(\eta)} = \frac{K+e}{g-\sum_k\big\langle\log(1-\pi_k)\big\rangle_{b(\boldsymbol{\pi})}}. \tag{17}$$

#### Messages Updating in Hyper Prior Estimation

With the updated beliefs $b(z_k^{(m)})$ and $b(\alpha_l^{(m)})$, the message from factor node $f_{D_{k,l}}^{(m)}$ to variable node $\tilde{\gamma}_l^{(k)}$ can be obtained using the MF rule,

$$m_{f_{D_{k,l}}^{(m)}\to\tilde{\gamma}_l^{(k)}}\big(\tilde{\gamma}_l^{(k)}\big) = \exp\Big\{\big\langle\log f_{D_{k,l}}^{(m)}\big\rangle_{b(\alpha_l^{(m)})b(z_k^{(m)})}\Big\} = \exp\Big\{\hat{\phi}_{mk}\Big(\log\tilde{\gamma}_l^{(k)}-\tilde{\gamma}_l^{(k)}\big(|\hat{\alpha}_l^{(m)}|^2+\nu_{\alpha_l}^{(m)}\big)\Big)\Big\} = \big(\tilde{\gamma}_l^{(k)}\big)^{\hat{\phi}_{mk}}\exp\Big\{-\hat{\phi}_{mk}\big(|\hat{\alpha}_l^{(m)}|^2+\nu_{\alpha_l}^{(m)}\big)\tilde{\gamma}_l^{(k)}\Big\}.$$

Then, with the prior of $\tilde{\gamma}_l^{(k)}$, the belief of $\tilde{\gamma}_l^{(k)}$ can be updated as

$$b\big(\tilde{\gamma}_l^{(k)}\big) \propto f_{\gamma_l}^{(k)}\big(\tilde{\gamma}_l^{(k)}\big)\times\prod_m m_{f_{D_{k,l}}^{(m)}\to\tilde{\gamma}_l^{(k)}}\big(\tilde{\gamma}_l^{(k)}\big) = \big(\tilde{\gamma}_l^{(k)}\big)^{\hat{c}_{kl}-1}\exp\big\{-\tilde{\gamma}_l^{(k)}\hat{d}_{kl}\big\}, \tag{18}$$

where $\hat{c}_{kl}=c+\sum_m\hat{\phi}_{mk}$ and $\hat{d}_{kl}=d+\sum_m\hat{\phi}_{mk}\big(|\hat{\alpha}_l^{(m)}|^2+\nu_{\alpha_l}^{(m)}\big)$. Thus the expectations of $\tilde{\gamma}_l^{(k)}$ and $\log\tilde{\gamma}_l^{(k)}$ can be updated as

$$\big\langle\tilde{\gamma}_l^{(k)}\big\rangle_{b(\tilde{\gamma}_l^{(k)})} = \frac{\hat{c}_{kl}}{\hat{d}_{kl}}, \tag{19}$$
$$\big\langle\log\tilde{\gamma}_l^{(k)}\big\rangle_{b(\tilde{\gamma}_l^{(k)})} = \Psi(\hat{c}_{kl})-\log(\hat{d}_{kl}). \tag{20}$$
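
The updates (19)-(20) are the standard mean and log-mean of a Gamma density. A sketch under the shape/rate expressions above; the finite-difference `digamma` is a stdlib stand-in for `scipy.special.digamma`, used here only to keep the sketch self-contained.

```python
import math
import numpy as np

def digamma(x, h=1e-5):
    """Numerical digamma via a central difference of log-Gamma
    (illustrative; use scipy.special.digamma in practice)."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def gamma_belief_expectations(c, d, phi_k, alpha_hat, nu_alpha):
    """Expectations (19)-(20) of the Gamma belief b(gamma_l^(k)), with
    shape c_hat = c + sum_m phi[m] and
    rate  d_hat = d + sum_m phi[m] * (|alpha_hat[m]|^2 + nu_alpha[m])."""
    c_hat = c + phi_k.sum()
    d_hat = d + np.sum(phi_k * (np.abs(alpha_hat) ** 2 + nu_alpha))
    return c_hat / d_hat, digamma(c_hat) - np.log(d_hat)  # eqs. (19), (20)
```

When all responsibilities `phi_k` are zero, the belief falls back to the prior, so the mean reduces to $c/d$.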

#### Messages Updating in Sparse Channel and Noise Precision Estimation

Assume that the message $m_{f_{\delta_n}^{(m)}\to h_n^{(m)}}\big(h_n^{(m)}\big)$ from factor node $f_{\delta_n}^{(m)}$ to variable node $h_n^{(m)}$ was obtained previously, as defined in (27). Then the product of messages $q\big(\alpha_l^{(m)}\big)$ can be obtained by

$$q\big(\alpha_l^{(m)}\big) = \mathcal{CN}\big(\alpha_l^{(m)};\hat{q}_l^{(m)},\nu_{q_l}^{(m)}\big), \tag{21}$$

where

$$\nu_{q_l}^{(m)} = \Big(\sum_n\frac{|F_{nl}^{(m)}|^2}{\nu_{\theta_n}^{(m)}+\nu_{p_n}^{(m)}}\Big)^{-1}, \tag{22}$$
$$\hat{q}_l^{(m)} = \nu_{q_l}^{(m)}\sum_n\hat{s}_n^{(m)}\big(F_{nl}^{(m)}\big)^*+\hat{\alpha}_l^{(m)}, \tag{23}$$

where $\hat{s}_n^{(m)}$ denotes an intermediate variable, defined as

$$\hat{s}_n^{(m)} \triangleq \frac{\hat{\theta}_n^{(m)}-\hat{p}_n^{(m)}}{\nu_{\theta_n}^{(m)}+\nu_{p_n}^{(m)}}, \tag{24}$$

and the variables $\hat{\theta}_n^{(m)}$ and $\nu_{\theta_n}^{(m)}$ represent the mean and variance of message $m_{f_{y_n}^{(m)}\to h_n^{(m)}}\big(h_n^{(m)}\big)$, which can be found in (30). Note that the derivations of equations (27), (21) and (24) can be found in our prior work [25, Eqs. (29)-(33)], and will not be detailed here.

With the beliefs of $z_k^{(m)}$ and $\tilde{\gamma}_l^{(k)}$ defined in (13) and (18), message $m_{f_{D_l}^{(m)}\to\alpha_l^{(m)}}\big(\alpha_l^{(m)}\big)$ is computed by the MF rule, which yields

$$m_{f_{D_l}^{(m)}\to\alpha_l^{(m)}}\big(\alpha_l^{(m)}\big) = \exp\Big\{\sum_k\hat{\phi}_{mk}\big\langle\log\tilde{\gamma}_l^{(k)}-\tilde{\gamma}_l^{(k)}|\alpha_l^{(m)}|^2\big\rangle_{\prod_k b(\tilde{\boldsymbol{\gamma}}^{(k)})}\Big\} \propto \mathcal{CN}\Big(\alpha_l^{(m)};0,\Big(\sum_k\hat{\phi}_{mk}\big\langle\tilde{\gamma}_l^{(k)}\big\rangle_{b(\tilde{\gamma}_l^{(k)})}\Big)^{-1}\Big).$$

Then the belief of $\alpha_l^{(m)}$ is updated as

$$b\big(\alpha_l^{(m)}\big) \propto m_{f_{D_l}^{(m)}\to\alpha_l^{(m)}}\big(\alpha_l^{(m)}\big)\times q\big(\alpha_l^{(m)}\big) \triangleq \mathcal{CN}\big(\alpha_l^{(m)};\hat{\alpha}_l^{(m)},\nu_{\alpha_l}^{(m)}\big),$$

with

$$\nu_{\alpha_l}^{(m)} = \Big(\sum_k\hat{\phi}_{mk}\big\langle\tilde{\gamma}_l^{(k)}\big\rangle_{b(\tilde{\gamma}_l^{(k)})}+\big(\nu_{q_l}^{(m)}\big)^{-1}\Big)^{-1}, \tag{25}$$
$$\hat{\alpha}_l^{(m)} = \nu_{\alpha_l}^{(m)}\hat{q}_l^{(m)}/\nu_{q_l}^{(m)}. \tag{26}$$

With the GAMP method proposed in [24], message $m_{f_{\delta_n}^{(m)}\to h_n^{(m)}}\big(h_n^{(m)}\big)$ can be updated by

$$m_{f_{\delta_n}^{(m)}\to h_n^{(m)}}\big(h_n^{(m)}\big) = \mathcal{CN}\big(h_n^{(m)};\hat{p}_n^{(m)},\nu_{p_n}^{(m)}\big), \tag{27}$$

where

$$\nu_{p_n}^{(m)} = \sum_l|F_{nl}^{(m)}|^2\nu_{\alpha_l}^{(m)}, \tag{28}$$
$$\hat{p}_n^{(m)} = \sum_l F_{nl}^{(m)}\hat{\alpha}_l^{(m)}-\hat{s}_n^{(m)}\nu_{p_n}^{(m)}. \tag{29}$$
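
Equations (28)-(29), together with the residual (24), form the GAMP "output" step: a plug-in prediction corrected by the previous scaled residual (the Onsager-style term $-\hat{s}_n\nu_{p_n}$). A vectorized sketch with hypothetical variable names:

```python
import numpy as np

def gamp_output_step(F, alpha_hat, nu_alpha, s_hat):
    """GAMP output update, eqs. (28)-(29): per-observation variance nu_p and
    corrected mean p_hat = F @ alpha_hat - s_hat * nu_p."""
    nu_p = (np.abs(F) ** 2) @ nu_alpha      # eq. (28)
    p_hat = F @ alpha_hat - s_hat * nu_p    # eq. (29)
    return p_hat, nu_p

def gamp_residual(theta_hat, p_hat, nu_theta, nu_p):
    """Scaled residual s_hat of eq. (24)."""
    return (theta_hat - p_hat) / (nu_theta + nu_p)
```

Only matrix-vector products with `F` appear, which is the source of the complexity reduction over matrix-inversion-based VB updates.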

Then message $m_{f_{y_n}^{(m)}\to h_n^{(m)}}\big(h_n^{(m)}\big)$, from the observation node to variable node $h_n^{(m)}$, can be updated as

$$m_{f_{y_n}^{(m)}\to h_n^{(m)}}\big(h_n^{(m)}\big) = \exp\Big\{\big\langle\log f_{y_n}^{(m)}\big\rangle_{b(\lambda)}\Big\} \triangleq \mathcal{CN}\big(h_n^{(m)};\hat{\theta}_n^{(m)},\nu_{\theta_n}^{(m)}\big), \tag{30}$$

where $\hat{\theta}_n^{(m)}=y_n^{(m)}/x_n$ and $\nu_{\theta_n}^{(m)}=\big(\hat{\lambda}|x_n|^2\big)^{-1}$. Thus we can calculate the belief of $h_n^{(m)}$ as

$$b\big(h_n^{(m)}\big) \propto m_{f_{\delta_n}^{(m)}\to h_n^{(m)}}\big(h_n^{(m)}\big)\times m_{f_{y_n}^{(m)}\to h_n^{(m)}}\big(h_n^{(m)}\big) \triangleq \mathcal{CN}\big(h_n^{(m)};\hat{h}_n^{(m)},\nu_{h_n}^{(m)}\big),$$

where

$$\nu_{h_n}^{(m)} = \Big(\big(\nu_{p_n}^{(m)}\big)^{-1}+\big(\nu_{\theta_n}^{(m)}\big)^{-1}\Big)^{-1}, \tag{31}$$
$$\hat{h}_n^{(m)} = \nu_{h_n}^{(m)}\Big(\hat{p}_n^{(m)}/\nu_{p_n}^{(m)}+\hat{\theta}_n^{(m)}/\nu_{\theta_n}^{(m)}\Big). \tag{32}$$
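
Equations (31)-(32) are the usual precision-weighted product of two Gaussian messages; a scalar sketch:

```python
def combine_gaussians(p_hat, nu_p, theta_hat, nu_theta):
    """Product of two Gaussian messages on h_n, eqs. (31)-(32):
    precisions add, and the mean is the precision-weighted average."""
    nu_h = 1.0 / (1.0 / nu_p + 1.0 / nu_theta)            # eq. (31)
    h_hat = nu_h * (p_hat / nu_p + theta_hat / nu_theta)  # eq. (32)
    return h_hat, nu_h

h_hat, nu_h = combine_gaussians(1.0, 1.0, 3.0, 1.0)
# equal variances -> the combined mean is the plain average: h_hat = 2.0, nu_h = 0.5
```

The message with the smaller variance dominates the posterior mean, so the channel estimate leans toward whichever of the prior-side or observation-side message is more confident.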

Then the expectation of the noise precision can be updated by

$$\hat{\lambda} = \frac{NM}{\sum_{n,m}\big\langle|y_n^{(m)}-x_nh_n^{(m)}|^2\big\rangle_{b(h_n^{(m)})}}. \tag{33}$$

Detailed derivation of (33) can be found in [25, 20].

### 4.2 Message Passing Schedule

The factors in Fig.