
# Convergence and Density Evolution of a Low-Complexity MIMO Detector based on Forward-Backward Recursion over a Ring

Seokhyun Yoon,
Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2012R1A1A2038807). S. Yoon is with the Department of Electronics Engineering, Dankook University, Korea (e-mail: syoon@dku.edu)
###### Abstract

Convergence and density evolution of a low-complexity iterative MIMO detector based on belief propagation (BP) over a ring-type pair-wise graph are presented in this paper. The detection algorithm under consideration is effectively a forward-backward recursion and was originally proposed in [13], where the link-level performance and the convergence for Gaussian input were analyzed. Presented here are a convergence proof for discrete alphabets and a density evolution framework for binary input, which gives the asymptotic performance in terms of average SINR and bit error rate (BER) without channel coding. The BER curve obtained via density evolution shows a good match with simulation results, verifying the effectiveness of the density evolution analysis and the performance of the detection algorithm.

{keywords}

MIMO detection, Belief propagation, Density evolution, Pair-wise graphs, Forward-backward recursion.

## I Introduction

During the last decade, there has been a large body of work on belief-propagation-based MIMO detection, whether for detection in multi-antenna spatial multiplexing or for multiuser detection in code-division multiplexing [1, 2, 3, 4, 5, 6]. In these approaches, the MIMO channel is modeled as a fully-connected factor graph, which consists of multiple factor nodes representing the received signals, multiple variable nodes representing the hidden data, and the edges connecting the factor nodes with the variable nodes. The resulting graph has maximal edge degree, i.e., every factor node is connected to every variable node.

In terms of performance, [5] and [6] showed that BP asymptotically performs the same as the maximum a posteriori (MAP) detector if the graphical model is sparse enough. In particular, [5] showed that BP performs the same as MAP even if the graph is dense, provided the system load (which, in our context, is the multiplexing order normalized to the number of transmit antennas) is less than a certain limit.

In terms of complexity, however, BP-based detection over the fully connected factor graph is as costly as the MAP detector, due mainly to the marginalization operation required for the message update at the factor nodes. To reduce the computational complexity, model simplification approaches have been studied. In particular, [7] suggested pruning some edges in the fully connected factor graph based on the strength of the channel coefficients, i.e., pruning the edges corresponding to those variable-factor node pairs with small channel gain. By keeping only a reduced number of edges per factor node, the complexity is reduced accordingly relative to that of the MAP detector. The problem with this scheme is that the number of retained edges must not be too small if reasonable performance is to be ensured.

Other interesting graph-based approaches are those in [8, 9, 10, 11, 12, 13] based on the pair-wise Markov random field (MRF). In MRF, we have nodes representing the hidden data and the edges reflecting the local dependency among them. The local dependency is represented by potential functions and, specifically, in pair-wise MRF they are functions of one or two variables. In fact, as noticed in [14, 15, 16, 17], a multivariate Gaussian function can be decomposed into a product of functions of one or two variables resulting in a fully connected pair-wise MRF. Unfortunately, however, BP over the pair-wise MRF based on this potential function does not work well for higher-order modulation, such as 16QAM.

To overcome this problem, [12] and [13] considered potential functions obtained by a linear transformation, e.g., by a conditional MMSE estimator. Leaving only two variables, one can construct a pair-wise graphical model, resulting in a low-complexity detection algorithm. As a matter of fact, the edge pruning in [8], [9] and [12, 13] is a special case of the channel truncation approach in [18], either in the zero-forcing sense or in the MMSE sense. Similar to sphere decoding, these are two-stage detectors, where the channel is first truncated (pruning edges) to simplify the graph and, then, joint detection is performed as BP over the simplified graphical model.

Gaussian BP, as in [16, 17, 19], can also be considered for low-complexity MIMO detection. In Gaussian BP, the input data and messages are all assumed to be Gaussian so that the messages and the posterior probability (belief) can be represented by a pair of mean and variance, resulting in a very simple message update rule. As shown in [16] and [19] (and also in [13]), however, the algorithm converges only to the linear minimum mean squared error (LMMSE) solution, which is inferior to the MAP detector for non-Gaussian input.

One lesson from [13] is that the BP-based MIMO detector over the ring-type pair-wise graph in [13] is always convergent regardless of the alphabet size. Note that BP over the fully-connected pair-wise models in [10], [11], and [13] does not converge, especially when the alphabet size is finite and larger than 2. The guaranteed convergence of BP over the ring-type model in [13] may come from avoiding short loops. This result is consistent with those in [20] and [21], where it was shown that BP over a graphical model with a single cycle always converges. According to [5] and [6], as well as more recent simulation results in [9] and [13], however, limiting the edge degree and keeping the graphical model sparse seem to be a must for successful convergence of BP-based detection, especially with higher-order modulation.

In this paper, we extend the convergence proof of the iterative MIMO detector based on the ring-type pair-wise graph to discrete alphabets. We also develop a density evolution framework to characterize the stationary SINR distribution after convergence, which will give us a deep insight into the mechanism and the performance of the algorithm.

The paper is organized as follows. In the next section, the previous works are briefly introduced as a basis for the analysis in the subsequent sections. In Section III, the convergence proof is provided for discrete alphabets and, in Section IV, the density evolution approach is developed for SINR analysis of the MIMO detection algorithm under consideration. Some numerical results are given in Section V to validate the density evolution approach for SINR analysis and, finally, concluding remarks are given in Section VI.

## II System model and previous works

### II-A System Model

A Gaussian MIMO system with an $N \times M$ channel matrix $\mathbf{H}$ is modeled as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} = \sum_{k=1}^{M} \mathbf{h}_k x_k + \mathbf{n} \qquad (1)$$

where $\mathbf{x}$ is the $M \times 1$ transmitted data symbol vector, $\mathbf{n}$ is the $N \times 1$ noise vector, $\mathbf{y}$ is the $N \times 1$ received signal vector and $\mathbf{h}_k$ is the $k$th column of $\mathbf{H}$. The noise vector is assumed to be complex Gaussian with mean $\mathbf{0}$ and covariance $\sigma^2\mathbf{I}$, and the transmitted data symbol vector is assumed to have mean $\mathbf{0}$ and covariance matrix $\mathbf{I}$, where $E[\cdot]$ denotes expectation. In practice, each element of $\mathbf{x}$ is drawn from a finite alphabet set of size $2^Q$, such as QPSK and 16-QAM, for which $Q$ = 2 and 4, respectively.
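For readers who want to experiment, the model in (1) can be instantiated directly. The sketch below is a minimal simulation with assumed values (the dimensions, noise variance, and QPSK bit mapping are illustrative choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 4                       # assumed rx/tx dimensions (illustrative)
sigma2 = 0.1                      # assumed noise variance

# i.i.d. Rayleigh channel: the columns of H are the h_k in (1)
H = (rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M))) / np.sqrt(2)

# unit-energy QPSK symbols (alphabet of size 2^Q with Q = 2)
bits = rng.integers(0, 2, size=(M, 2))
x = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)

# circularly symmetric complex Gaussian noise with covariance sigma2 * I
n = np.sqrt(sigma2 / 2) * (rng.normal(size=N) + 1j * rng.normal(size=N))

y = H @ x + n                     # the observation model in (1)
```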

### II-B Low complexity detection based on BP over ring-type pair-wise graph [13]

Our starting point is the algorithm in [13], specifically the BP over the ring-type pair-wise graph. The graphical model is shown in Fig. 1, over which the BP algorithm is effectively a forward-backward recursion and can be summarized as follows.

BP over ring-type pair-wise graph

Given the messages from the previous iteration, $\pi_{(j\mp1)_M \to j}(x_j)$, they are recursively updated for all $j$ as

$$\pi_{j\to(j\pm1)_M}\big(x_{(j\pm1)_M}=s\big) = \sum_{s'\in\Xi} \gamma_{(j\pm1)_M|j}(s|s') \cdot \pi_{(j\mp1)_M\to j}(x_j=s') \qquad (2)$$

with $\gamma_{(j\pm1)_M|j}$ given by (9).
After a pre-defined number of iterations, the belief is finally obtained by

$$b(x_j) = \pi_{(j+1)_M\to j}(x_j)\cdot\pi_{(j-1)_M\to j}(x_j). \qquad (3)$$

In (2) and (3), $(\cdot)_M$ is the one-based modulo-$M$ operation. Since carrying $(\cdot)_M$ is cumbersome, it will later be omitted.

In this algorithm, we use only factor-to-variable node messages, $\pi_{i\to j}$, since there are only two factor nodes connected to each variable node, so that variable nodes simply pass the incoming message to the opposite side.
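As a sketch, (2) and (3) amount to matrix-vector products around the ring followed by normalization. In the snippet below the translation matrices are random positive placeholders standing in for $\gamma_{(j\pm1)_M|j}$ (which the paper derives in (9)); only the message-passing schedule is illustrated:

```python
import numpy as np

rng = np.random.default_rng(1)
M, Q = 4, 4                                     # ring length, alphabet size (assumed)

# positive placeholder translation matrices: A_fwd[j] maps the message
# entering node j into the message sent to node (j+1) mod M, cf. (2)
A_fwd = [rng.random((Q, Q)) + 0.1 for _ in range(M)]
A_bwd = [rng.random((Q, Q)) + 0.1 for _ in range(M)]

fwd = [np.full(Q, 1.0 / Q) for _ in range(M)]   # pi_{(j-1) -> j}, uniform init
bwd = [np.full(Q, 1.0 / Q) for _ in range(M)]   # pi_{(j+1) -> j}

for _ in range(20):                             # turns around the ring
    for j in range(M):
        msg = A_fwd[j] @ fwd[j]                 # forward update of (2)
        fwd[(j + 1) % M] = msg / msg.sum()      # normalization
        msg = A_bwd[j] @ bwd[j]                 # backward update of (2)
        bwd[(j - 1) % M] = msg / msg.sum()

# belief of (3): product of the two incoming messages, renormalized
beliefs = [fwd[j] * bwd[j] / np.sum(fwd[j] * bwd[j]) for j in range(M)]
```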

In (2), the translation function, $\gamma_{j|i}(x_j|x_i)$ (also known as the branch metric in the context of forward-backward recursion along a trellis), is based on the conditional MMSE estimate of $x_j$ given $x_i$. By defining the conditional estimator of $x_j$ given $x_i$ as

$$\mathbf{w}_{j|i} = \mathbf{K}_{\{j,i\}}^{-1}\mathbf{h}_j \qquad (4)$$

and applying it to the received signal vector $\mathbf{y}$, we have

$$y_{j|i} = \mathbf{w}_{j|i}^H\mathbf{y} = a_{j|i,j}x_j + a_{j|i,i}x_i + n_{j|i} \qquad (5)$$

where

$$\mathbf{K}_\Phi = \sigma^2\mathbf{I} + \sum_{k\notin\Phi}\mathbf{h}_k\mathbf{h}_k^H \qquad (6)$$

$$a_{j|i,k} = \mathbf{w}_{j|i}^H\mathbf{h}_k = \mathbf{h}_j^H\mathbf{K}_{\{j,i\}}^{-1}\mathbf{h}_k \quad \text{for } k=i \text{ or } j \qquad (7)$$

$$E|n_{j|i}|^2 = \mathbf{w}_{j|i}^H\mathbf{K}_{\{j,i\}}\mathbf{w}_{j|i} = \mathbf{h}_j^H\mathbf{K}_{\{j,i\}}^{-1}\mathbf{h}_j \equiv \sigma_{j|i}^2. \qquad (8)$$

Note that $\mathbf{w}_{j|i}$ and the parameters in (6) to (8) are computed from the channel parameters, $\mathbf{H}$ and $\sigma^2$, and the received signal $\mathbf{y}$. In the truncated signal model in (5), we assume the noise + interference, $n_{j|i}$, to be Gaussian, from which the translation function is given by

$$\gamma_{j|i}(x_j|x_i) \propto \mathcal{CN}\big(y_{j|i};\, a_{j|i,j}x_j + a_{j|i,i}x_i,\, \sigma_{j|i}^2\big) \qquad (9)$$

where $\mathcal{CN}(y;\mu,\sigma^2)$ is the complex Gaussian density function with mean $\mu$ and variance $\sigma^2$, given by

$$\mathcal{CN}(y;\mu,\sigma^2) \equiv \frac{1}{\pi\sigma^2}\exp\left(-\frac{|y-\mu|^2}{\sigma^2}\right)$$

The translation function given by the Gaussian density in (9) assumes that the data symbols other than $x_j$ and $x_i$ are all Gaussian. This assumption, however, is only for pruning edges and simplifying the graphical model as shown in Fig. 1, while the messages themselves in (2) are not necessarily Gaussian.

On the other hand, when the input is indeed Gaussian, the forward-backward algorithm in (2) reduces to an update rule for the mean and variance [13] as follows.

Gaussian BP over ring-type pair-wise graph

Given the messages from the previous iteration, $(\mu_{\pi,j\mp1\to j}, \sigma^2_{\pi,j\mp1\to j})$, they are recursively updated by

$$\sigma^2_{\pi,j\to j\pm1} = \frac{1}{1+\sigma^2_{j\pm1|j}} + \frac{|a_{j\pm1|j,j}|^2}{(1+\sigma^2_{j\pm1|j})^2}\cdot\sigma^2_{\pi,j\mp1\to j} \qquad (10)$$

$$\mu_{\pi,j\to j\pm1} = \frac{y_{j\pm1|j}}{1+\sigma^2_{j\pm1|j}} - \frac{a_{j\pm1|j,j}}{1+\sigma^2_{j\pm1|j}}\cdot\mu_{\pi,j\mp1\to j} \qquad (11)$$

In [13], it is proved that the mean and variance in the Gaussian forward-backward recursion converge respectively to the MMSE estimate and the corresponding MMSE, i.e.,

$$\mu_{\pi,j\mp1\to j} \to \hat{x}_j = \mathbf{h}_j^H\mathbf{K}^{-1}\mathbf{y} \qquad (12)$$

$$\sigma^2_{\pi,j\mp1\to j} \to \mathrm{MMSE}_j = 1 - \mathbf{h}_j^H\mathbf{K}^{-1}\mathbf{h}_j. \qquad (13)$$

as the number of iterations goes to infinity. Since the MAP detector reduces to the linear MMSE estimator for Gaussian input, this shows the optimality of the scheme for Gaussian input. For non-Gaussian input, however, the MMSE estimator is far inferior to the MAP detector.
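To make (4)-(11) concrete, the sketch below implements one-directional (forward) Gaussian message passing on a small random channel; the dimensions and noise level are assumptions for illustration, and only the existence of a fixed point is checked numerically. Per (12)-(13), the paper identifies that fixed point with the LMMSE solution, which is computed at the end for comparison:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 4, 4                                    # assumed dimensions
sigma2 = 0.1                                   # assumed noise variance
H = (rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M))) / np.sqrt(2)
x = np.sign(rng.normal(size=M)) + 0j           # BPSK, just to generate y
y = H @ x + np.sqrt(sigma2 / 2) * (rng.normal(size=N) + 1j * rng.normal(size=N))

def trunc(j, i):
    """Truncation parameters of (4)-(8) for the pair (x_j, x_i)."""
    K = sigma2 * np.eye(N, dtype=complex)
    for k in range(M):
        if k != j and k != i:                  # suppress all but x_j, x_i
            K += np.outer(H[:, k], H[:, k].conj())
    w = np.linalg.solve(K, H[:, j])            # w_{j|i} = K^{-1}_{{j,i}} h_j
    s = (w.conj() @ H[:, j]).real              # a_{j|i,j} = sigma^2_{j|i}
    a = w.conj() @ H[:, i]                     # a_{j|i,i}
    return w.conj() @ y, s, a                  # y_{j|i}, sigma^2_{j|i}, a_{j|i,i}

params = [trunc((j + 1) % M, j) for j in range(M)]   # fixed per channel
mu = np.zeros(M, dtype=complex)                # mu_{pi, j-1 -> j}
var = np.ones(M)                               # sigma^2_{pi, j-1 -> j}
for _ in range(1000):                          # forward sweeps of (10)-(11)
    for j in range(M):
        nxt = (j + 1) % M
        y_t, s, a = params[j]
        var[nxt] = 1 / (1 + s) + abs(a) ** 2 / (1 + s) ** 2 * var[j]
        mu[nxt] = y_t / (1 + s) - a / (1 + s) * mu[j]

# (12)-(13): the paper identifies the fixed point with this LMMSE estimate
K_full = sigma2 * np.eye(N, dtype=complex) + H @ H.conj().T
x_lmmse = H.conj().T @ np.linalg.solve(K_full, y)
```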

In the rest of this paper, we deal with the convergence of the forward-backward algorithm in (2) for arbitrary discrete alphabet and its density evolution characteristic for binary input, which will give us a deeper insight into its convergence and performance.

## III Convergence for Discrete Alphabet

In this section, we provide two convergence proofs for the BP over the ring-type pair-wise graph, one for an arbitrary discrete alphabet and the other for a binary one. The signal model for binary input provides a framework not only for the convergence proof but also for the density evolution analysis to be discussed in the next section.

### III-A Convergence proof for arbitrary discrete input

The convergence proof in this subsection is based on the Perron-Frobenius theorem. Although it has already been discussed in [21], we provide it in our context here for the reader's convenience.

Suppose that the data symbols are drawn from a discrete alphabet set $\Xi$. The forward-backward recursion in (2) can be concisely expressed as

$$\boldsymbol{\pi}_{j\to j\pm1} = \frac{1}{\alpha_{j\pm1}}\mathbf{A}_{j\pm1|j}\boldsymbol{\pi}_{j\mp1\to j} \qquad (14)$$

where $\boldsymbol{\pi}_{j\mp1\to j}$ is the message vector whose $s$th element is given by $\pi_{j\mp1\to j}(x_j=s)$, $\mathbf{A}_{j\pm1|j}$ is the translation matrix whose $(s,s')$th element is $\gamma_{j\pm1|j}(s|s')$, and $\alpha_{j\pm1}$ is the normalization constant such that $\|\boldsymbol{\pi}_{j\to j\pm1}\|_1 = 1$, i.e., $\alpha_{j\pm1} = \|\mathbf{A}_{j\pm1|j}\boldsymbol{\pi}_{j\mp1\to j}\|_1$, where $\|\cdot\|_1$ denotes the $\ell_1$-norm of a vector. Note that all elements of $\mathbf{A}_{j\pm1|j}$ are positive real.

Define $\mathbf{F}_j$ and $\mathbf{B}_j$ as the translation matrices for one complete turn of the forward and backward recursion, respectively, i.e.,

$$\mathbf{F}_j = \mathbf{A}_{j|j-1}\mathbf{A}_{j-1|j-2}\cdots\mathbf{A}_{2|1}\mathbf{A}_{1|M}\cdots\mathbf{A}_{j+1|j} \qquad (15)$$

$$\mathbf{B}_j = \mathbf{A}_{j|j+1}\mathbf{A}_{j+1|j+2}\cdots\mathbf{A}_{M-1|M}\mathbf{A}_{M|1}\cdots\mathbf{A}_{j-1|j} \qquad (16)$$

Then, the message vectors at the $k$th turn can be expressed as

$$\boldsymbol{\pi}^{(k)}_{j-1\to j} \propto \mathbf{F}_j\boldsymbol{\pi}^{(k-1)}_{j-1\to j} = \mathbf{F}_j^k\boldsymbol{\pi}_0 \qquad (17)$$

$$\boldsymbol{\pi}^{(k)}_{j+1\to j} \propto \mathbf{B}_j\boldsymbol{\pi}^{(k-1)}_{j+1\to j} = \mathbf{B}_j^k\boldsymbol{\pi}_0 \qquad (18)$$

where $\boldsymbol{\pi}^{(k)}_{j\mp1\to j}$ is the message at the $k$th turn and $\boldsymbol{\pi}_0$ is the initial message, which is typically set to the uniform distribution. Now, one can prove the following theorem.

###### Theorem 1

With any initial message $\boldsymbol{\pi}_0$, the message $\boldsymbol{\pi}^{(k)}_{j-1\to j}$ in (17) converges (up to a normalization constant) to the eigenvector of $\mathbf{F}_j$ corresponding to its largest eigenvalue. Similarly, $\boldsymbol{\pi}^{(k)}_{j+1\to j}$ in (18) converges to the eigenvector of $\mathbf{B}_j$ corresponding to its largest eigenvalue.

{proof}

First, we decompose the transition matrix for one complete turn, $\mathbf{F}_j$, into

$$\mathbf{F}_j = \mathbf{E}_j\mathbf{D}_j\mathbf{E}_j^{-1}$$

where $\mathbf{D}_j$ is a diagonal matrix whose $l$th diagonal element is the $l$th eigenvalue, $\lambda_l$, and $\mathbf{E}_j$ is the eigenbasis, whose $l$th column, $\mathbf{e}_l$, is the eigenvector for $\lambda_l$. Then, we have

$$\mathbf{F}_j^k = \mathbf{E}_j\mathbf{D}_j^k\mathbf{E}_j^{-1}$$

where $i_{ln}$ denotes the $(l,n)$th element of $\mathbf{E}_j^{-1}$. Note that, with an initial message $\boldsymbol{\pi}_0$, we have

$$\big[\boldsymbol{\pi}^{(k)}_{j-1\to j}\big]_m = \big[\mathbf{F}_j^k\boldsymbol{\pi}_0\big]_m = \sum_{n=1}^{|\Xi|}\sum_{l=1}^{|\Xi|} e_{ml}\,\lambda_l^k\, i_{ln}\,\pi_{0,n} \qquad (19)$$

Since all the entries of $\mathbf{F}_j$ are positive, we see, from the Perron-Frobenius theorem, that the matrix $\mathbf{F}_j$ has a single largest real eigenvalue (a.k.a. the Perron-Frobenius eigenvalue) whose corresponding eigenvector has (or can be chosen to have) all positive entries. Let $\lambda_{l^*}$ be the largest real eigenvalue and $\mathbf{e}_{l^*}$ the corresponding eigenvector. Then, by taking the limit of the normalized message $\boldsymbol{\pi}^{(k)}_{j-1\to j}/\|\boldsymbol{\pi}^{(k)}_{j-1\to j}\|_1$, we have

$$\lim_{k\to\infty}\frac{\big[\boldsymbol{\pi}^{(k)}_{j-1\to j}\big]_m}{\big\|\boldsymbol{\pi}^{(k)}_{j-1\to j}\big\|_1} \to \frac{\sum_{n} e_{ml^*}\lambda_{l^*}^k i_{l^*n}\pi_{0,n}}{\sum_{m'}\sum_{n} e_{m'l^*}\lambda_{l^*}^k i_{l^*n}\pi_{0,n}} = \frac{e_{ml^*}\lambda_{l^*}^k\sum_{n} i_{l^*n}\pi_{0,n}}{\sum_{m'} e_{m'l^*}\lambda_{l^*}^k\sum_{n} i_{l^*n}\pi_{0,n}} = \frac{e_{ml^*}}{\sum_{m'} e_{m'l^*}} = \frac{[\mathbf{e}_{l^*}]_m}{\|\mathbf{e}_{l^*}\|_1} \qquad (20)$$

where, on the r.h.s. of the first line, we keep only the term with the largest eigenvalue since the other terms become negligible as $k\to\infty$.

The convergence for the backward recursion can be proved similarly.
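The proof is essentially the power iteration. The sketch below illustrates it on a random positive matrix standing in for $\mathbf{F}_j$ (the product of translation matrices in (15), here just an assumed placeholder); repeated application plus $\ell_1$ normalization reproduces the Perron-Frobenius eigenvector, as Theorem 1 states:

```python
import numpy as np

rng = np.random.default_rng(3)
Q = 4                                  # alphabet size (assumed)
F = rng.random((Q, Q)) + 0.05          # positive stand-in for F_j in (15)

pi = np.full(Q, 1.0 / Q)               # uniform initial message pi_0
for _ in range(200):                   # repeated turns, cf. (17)
    pi = F @ pi
    pi /= pi.sum()                     # l1 normalization (entries stay positive)

# principal eigenvector of F, l1-normalized, per Theorem 1
w, V = np.linalg.eig(F)
v = V[:, np.argmax(w.real)].real
v = np.abs(v) / np.abs(v).sum()
```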

Although the convergence proof above is applicable to an arbitrary alphabet, it is not suitable for further SINR and BER analysis. So, we provide another signal model, which deals directly with the log-likelihood ratio (LLR) by restricting the data to binary. As will be shown later, it gives us a more convenient way to perform SINR and BER analysis via density evolution.

### III-B Message passing for binary input

For binary input, the message can be summarized by a scalar, i.e., the log-likelihood ratio (LLR). Define the message and a priori LLRs as

$$l_{i\to j} = \log\frac{\pi_{i\to j}(x_j=+1)}{\pi_{i\to j}(x_j=-1)} \qquad (21)$$

$$l_{a,i} = \log\frac{p(x_i=+1)}{p(x_i=-1)} \qquad (22)$$

such that

$$p(x_i=\pm1) = \frac{e^{\pm l_{a,i}/2}}{e^{+l_{a,i}/2}+e^{-l_{a,i}/2}}$$

Then, the forward recursion in (2), together with (21), can be expressed as (23) shown on top of the next page, where we assume uniform priors.

Removing common terms in the numerator and denominator of (23) and defining a function of $x$ with a parameter $c$ as

$$\zeta(x;c) = -\log\left(\frac{e^{x/2-c}+e^{-x/2+c}}{e^{x/2+c}+e^{-x/2-c}}\right) \qquad (24)$$

(23) can be concisely rewritten as

$$l_{j-1\to j} = l_{a,j} + 4y^{(R)}_{j|j-1} - \zeta\big(l_{j-2\to j-1}+2d_{j|j-1};\, c_{j|j-1}\big) \qquad (25)$$

where

$$y^{(R)}_{j|i} = \Re[y_{j|i}] \qquad (26)$$

$$c_{j|i} = \frac{2}{\sigma^2_{j|i}}\Re[a^*_{j|i,j}a_{j|i,i}] = 2\Re[a_{j|i,i}] = 2a^{(R)}_{j|i,i} \qquad (27)$$

$$\begin{aligned} d_{j|i} &= \frac{2}{\sigma^2_{j|i}}\Re\big[a^*_{j|i,i}\,y_{j|i}\big] \\ &= 2\Re[a^*_{j|i,i}]x_j + \frac{2|a_{j|i,i}|^2}{\sigma^2_{j|i}}x_i + \frac{2}{\sigma^2_{j|i}}\Re\big[a^*_{j|i,i}\,n_{j|i}\big] \\ &= 2a^{(R)}_{j|i,i}x_j + \frac{2|a_{j|i,i}|^2}{\sigma^2_{j|i}}x_i + \frac{2}{\sigma^2_{j|i}}\big(a^{(R)}_{j|i,i}n^{(R)}_{j|i} + a^{(I)}_{j|i,i}n^{(I)}_{j|i}\big) \qquad (28)\end{aligned}$$

where we denote the real and imaginary parts of a complex variable by the superscripts $(R)$ and $(I)$, respectively, for notational simplicity.

The non-linear function $\zeta(x;c)$ in (24) is a monotonic function of $x$, either increasing if $c>0$ or decreasing if $c<0$, and has the following properties.

$$\text{(a)}\quad \zeta'(x;c) = \frac{d}{dx}\zeta(x;c) = \frac{1}{2}\tanh\left(\frac{x}{2}+c\right) - \frac{1}{2}\tanh\left(\frac{x}{2}-c\right) \qquad (29)$$

$$\text{(b)}\quad |\zeta'(x;c)| < 1 \quad \forall x \qquad (30)$$

$$\text{(c)}\quad \lim_{x\to\pm\infty}\zeta(x;c) \to \pm2c \quad (\text{saturation}). \qquad (31)$$

$$\text{(d)}\quad \lim_{c\to0}\frac{\zeta(x;c)}{2c} \to \tanh\left(\frac{x}{2}\right). \qquad (32)$$

Note that

$$\frac{\zeta\big(l_{j-2\to j-1}+2d_{j|j-1};\, c_{j|j-1}\big)}{2c_{j|j-1}} \approx \tanh\left(\frac{l_{j-2\to j-1}}{2}+d_{j|j-1}\right)$$

can be regarded as an estimate of $x_{j-1}$ based on the information provided by the current observation, $y_{j|j-1}$, and the message from the previous node, $l_{j-2\to j-1}$, through the forward-backward recursion.
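The properties (29)-(32) are easy to check numerically. Below is a small sketch: `zeta` implements (24) via `logaddexp` for numerical stability, and the quantities computed mirror properties (a)-(d) (the grid and the value of $c$ are arbitrary illustrative choices):

```python
import numpy as np

def zeta(x, c):
    """zeta(x; c) of (24), computed via log-sum-exp for numerical stability."""
    return -(np.logaddexp(x / 2 - c, -x / 2 + c)
             - np.logaddexp(x / 2 + c, -x / 2 - c))

x = np.linspace(-30.0, 30.0, 2001)
c = 0.7

# (a)/(b): the slope of (29) stays strictly below 1 in magnitude
slope = 0.5 * (np.tanh(x / 2 + c) - np.tanh(x / 2 - c))

# (c): saturation at +/- 2c for large |x|
sat_hi, sat_lo = zeta(30.0, c), zeta(-30.0, c)

# (d): for small c, zeta(x; c) / (2c) approaches tanh(x / 2)
cs = 1e-4
small_c_ratio = zeta(x, cs) / (2 * cs)
```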

### III-C Convergence proof for binary input

To prove the convergence of the forward-backward recursion in (25) for binary input, we first prove the following lemma.

###### Lemma 2

Let $f(x)$ be a function with the following two properties:

1. $f(x)$ is a monotonic (either increasing or decreasing) function defined on $(-\infty,\infty)$.

2. $|f'(x)| < 1$ for all $x$.

Then, the following properties hold

3. The equation $x = f(x)$ has a unique solution.

4. Let $x_s$ be the solution of $x = f(x)$, and let $x_k$ for $k=1,2,\dots$ be the sequence obtained by successively applying $f(\cdot)$ starting from an initial value $x_0$, i.e., $x_k = f(x_{k-1})$. Then, for any $x_0$, $x_k$ approaches $x_s$ as $k\to\infty$.

5. Let $g(x) = b - f(x + a)$ for some real values $a$ and $b$. The properties 1) to 4) also hold for $g(x)$.

{proof}

Property 3) is obvious from 1) and 2). The proof of 4) is as follows. Let $x_k = f(x_{k-1})$ and $\Delta_k = x_k - x_s$. Suppose that $\Delta_{k-1} > 0$. Then, we have

$$|\Delta_k| = \left|\int_{x_s}^{x_s+\Delta_{k-1}} f'(x)\,dx\right| \overset{(a)}{=} \int_{x_s}^{x_s+\Delta_{k-1}} \left|f'(x)\right|dx \overset{(b)}{<} \int_{x_s}^{x_s+\Delta_{k-1}} dx = \Delta_{k-1} = |\Delta_{k-1}|$$

where (a) is due to the monotonicity of $f$, by which $f'(x)$ always has the same sign, and (b) is due to $|f'(x)| < 1$. Similarly, one can show the same result, $|\Delta_k| < |\Delta_{k-1}|$, for $\Delta_{k-1} < 0$, which ensures that $x_k \to x_s$.

In 5), it is obvious that $g(x)$ also has the properties 1) and 2) since the shift does not alter the slope and, with $g'(x) = -f'(x+a)$, $|g'(x)| < 1$. Hence, 3) and 4) also hold for $g(x)$.
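Lemma 2 can be seen in action with $f = \zeta(\cdot\,;c)$: since $|\zeta'| < 1$ by (30), a shifted map of the form $g(x) = b - \zeta(x - a;\,c)$ contracts toward a single fixed point from any starting value. The constants below are arbitrary illustrative values, not quantities from the paper:

```python
import numpy as np

def zeta(x, c):
    """zeta(x; c) of (24) via log-sum-exp."""
    return -(np.logaddexp(x / 2 - c, -x / 2 + c)
             - np.logaddexp(x / 2 + c, -x / 2 - c))

a, b, c = 1.3, 0.4, 0.8            # arbitrary shifts and parameter (assumed)
g = lambda x: b - zeta(x - a, c)   # a map of the form used in Theorem 3

finals = []
for x0 in (-10.0, 0.0, 10.0):      # very different initial values
    x = x0
    for _ in range(100):           # successive application (Lemma 2)
        x = g(x)
    finals.append(x)
# all three trajectories end at the same unique fixed point
```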

Fig.3(a) and (b) show two examples of the convergence in lemma 2 (Property 4)). From lemma 2, one can prove the convergence of the forward-backward recursion for binary input as follows.

###### Theorem 3

The forward and backward recursions in (25) for binary input converge to a unique fixed point as the number of iterations goes to infinity.

{proof}

Note that the forward recursion at the $j$th node is of the form

$$l_{j-1\to j} = g_j(l_{j-2\to j-1}) \equiv b_j - \zeta\big(l_{j-2\to j-1}-a_j;\, c_j\big)$$

for some real values $a_j$, $b_j$ and $c_j$, where $\zeta$ satisfies the properties 1) and 2) in Lemma 2 and, hence, so does $g_j$. Define one iteration as one complete turn of the recursion along the ring, such that the LLR of the $j$th data bit at the $k$th iteration, $l^{(k)}_{j-1\to j}$, is represented as

$$l^{(k)}_{j-1\to j} = g_{T,j}\big(l^{(k-1)}_{j-1\to j}\big) = g_j\circ g_{j-1}\circ\cdots\circ g_1\circ g_M\circ\cdots\circ g_{j+1}\big(l^{(k-1)}_{j-1\to j}\big)$$

where, from the chain rule, we have

$$\frac{dg_{T,j}}{dl} = g_j'\cdot g_{j-1}'\cdots g_1'\cdot g_M'\cdots g_{j+1}'$$

Since all the $g_i$'s satisfy the properties 1) and 2) in Lemma 2, so does $g_{T,j}$. Hence, from 3) to 5), $l^{(k)}_{j-1\to j}$ converges to a unique fixed point as $k\to\infty$.

The convergence of the backward recursion can be proved similarly. Note that the two fixed points of the forward and backward recursions are not necessarily equal.

## IV SINR Analysis via Density Evolution

In this section, we use the model in Section III-B to determine the stationary distribution of the LLR, $l_{j-1\to j}$. Assuming the channel matrix and the noise power are fixed, we develop the density evolution of messages between neighboring nodes. In the channel coding context, density evolution in an iterative decoder assumes the all-zero sequence is sent, and the LLR mean is tracked over the iterations, assuming the LLR is Gaussian with its variance equal to its mean. The density evolution used in [5] assumes the same, even though the approach differs.

In this section, we will also assume the LLRs are Gaussian and will track their mean and variance. The differences from iterative channel decoding are that 1) both mean and variance have to be tracked along the ring, where the message of each node has different statistics, and, hence, 2) we cannot assume all-zero input since the statistics of the current message, $l_{j-1\to j}$, depend not only on the background noise but also on the other data. Fortunately, symmetry holds for binary data and the message depends largely on the previous data only, so that one can proceed as follows. Under symmetry, we denote the mean and variance of $l_{j-1\to j}$ as $m_{j|j-1}x_j$ and $v_{j|j-1}$, where both $m_{j|j-1}$ and $v_{j|j-1}$ are non-negative and the mean has the same sign as that of $x_j$. In this definition, $m_{j|j-1}$ can be interpreted as a reliability of $x_j$. Then, supposing that $x_j = +1$, we evaluate $(m_{j|j-1}, v_{j|j-1})$ for given $(m_{j-1|j-2}, v_{j-1|j-2})$.

Assume uniform priors, i.e., $l_{a,j} = 0$, and suppose that $x_j = +1$. Using (5), the forward message passing in (25) becomes

$$\begin{aligned} l_{j-1\to j} &= 4y^{(R)}_{j|j-1} - \zeta\big(l_{j-2\to j-1}+2d_{j|j-1};\, 2a^{(R)}_{j|j-1,j-1}\big) \\ &= 4a^{(R)}_{j|j-1,j} + 4n^{(R)}_{j|j-1} + 4a^{(R)}_{j|j-1,j-1}x_{j-1} - \zeta\big(l_{j-2\to j-1}+2d_{j|j-1};\, 2a^{(R)}_{j|j-1,j-1}\big) \\ &= 4a^{(R)}_{j|j-1,j} + 4n^{(R)}_{j|j-1} + 4a^{(R)}_{j|j-1,j-1}\,e_{j-1|j-2}(x_{j-1}) \\ &= 4z_{j|j-1} + 4a^{(R)}_{j|j-1,j-1}\,e_{j-1|j-2}(x_{j-1}) \qquad (33)\end{aligned}$$

where, from (26) to (28),

$$z_{j|j-1} = a^{(R)}_{j|j-1,j} + n^{(R)}_{j|j-1} \qquad (34)$$

$$y^{(R)}_{j|j-1} = a^{(R)}_{j|j-1,j} + n^{(R)}_{j|j-1} + a^{(R)}_{j|j-1,j-1}x_{j-1} = z_{j|j-1} + a^{(R)}_{j|j-1,j-1}x_{j-1} \qquad (35)$$

$$e_{j-1|j-2}(x_{j-1}) = x_{j-1} - \frac{\zeta\big(l_{j-2\to j-1}+2d_{j|j-1};\, 2a^{(R)}_{j|j-1,j-1}\big)}{4a^{(R)}_{j|j-1,j-1}} \qquad (36)$$
$$\begin{aligned} d_{j|j-1} &= \frac{2a^{(R)}_{j|j-1,j-1}}{\sigma^2_{j|j-1}}\big(a^{(R)}_{j|j-1,j}+n^{(R)}_{j|j-1}\big) + \frac{2|a_{j|j-1,j-1}|^2}{\sigma^2_{j|j-1}}x_{j-1} + \frac{2a^{(I)}_{j|j-1,j-1}}{\sigma^2_{j|j-1}}n^{(I)}_{j|j-1} \\ &= \frac{2a^{(R)}_{j|j-1,j-1}}{\sigma^2_{j|j-1}}z_{j|j-1} + \frac{2|a_{j|j-1,j-1}|^2}{\sigma^2_{j|j-1}}x_{j-1} + \frac{2a^{(I)}_{j|j-1,j-1}}{\sigma^2_{j|j-1}}n^{(I)}_{j|j-1} \qquad (37)\end{aligned}$$

Note that $e_{j-1|j-2}(x_{j-1})$ is the estimation error on $x_{j-1}$ based on the information provided by the current observation, $y_{j|j-1}$, and the previous node message, $l_{j-2\to j-1}$.

### IV-A Density evolution of the forward-backward recursion

Here, we determine $(m_{j|j-1}, v_{j|j-1})$ for a given pair $(m_{j-1|j-2}, v_{j-1|j-2})$ of the previous message, $l_{j-2\to j-1}$. If the two terms in the last line of (33) were uncorrelated, the problem would be simple. That is, we would simply assume the estimation error, $e_{j-1|j-2}(x_{j-1})$, is Gaussian, evaluate its mean and variance, and add them to the mean and variance of $4z_{j|j-1}$. By iterating this procedure many times, one could obtain the mean and variance of $l_{j-1\to j}$ (density evolution), even though numerical evaluation of integrals would be needed. Unfortunately, the two terms are correlated due to 1) the inclusion of $z_{j|j-1}$ in the argument of $\zeta$ through $d_{j|j-1}$ and 2) the correlation between the noise + interference terms in $z_{j|j-1}$ and in $l_{j-2\to j-1}$. (Here, the dependency on $x_{j-1}$ is shown explicitly for $e_{j-1|j-2}$, while it is not for $l_{j-2\to j-1}$ since we are conditioning on $x_{j-1}$.) One thing that helps make the analysis possible is that both are well modeled as Gaussian, so that the density evolution is numerically tractable by making a few simplifying assumptions.
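Setting the correlation issue aside for a moment, a single density-evolution step of the simple kind described above, mapping the incoming pair $(m, v)$ through (33)-(37) to the outgoing mean and variance, can be approximated by Monte Carlo sampling instead of numerical integration. In the sketch below the constants `a_j` and `a_i` are illustrative assumptions, the identity $\sigma^2_{j|j-1} = a^{(R)}_{j|j-1,j}$ comes from (8), and the correlation discussed next is deliberately ignored:

```python
import numpy as np

rng = np.random.default_rng(4)

def zeta(x, c):
    """zeta(x; c) of (24) via log-sum-exp."""
    return -(np.logaddexp(x / 2 - c, -x / 2 + c)
             - np.logaddexp(x / 2 + c, -x / 2 - c))

def de_step(m_in, v_in, a_j, a_i, n_samp=200_000):
    """One forward step: (m, v) of l_{j-2->j-1} to that of l_{j-1->j} via (33),
    with x_j = x_{j-1} = +1 fixed and the l-z correlation ignored."""
    s = a_j                                   # sigma^2_{j|j-1} = a_{j|j-1,j}, cf. (8)
    l_in = m_in + np.sqrt(v_in) * rng.normal(size=n_samp)     # (41), x_{j-1} = +1
    nR = np.sqrt(s / 2) * rng.normal(size=n_samp)             # (39)
    nI = np.sqrt(s / 2) * rng.normal(size=n_samp)
    z = a_j + nR                                              # (34)
    d = (2 * a_i.real / s) * z + (2 * abs(a_i) ** 2 / s) \
        + (2 * a_i.imag / s) * nI                             # (37)
    l_out = 4 * z + 4 * a_i.real - zeta(l_in + 2 * d, 2 * a_i.real)  # (33)
    return l_out.mean(), l_out.var()

# illustrative incoming reliability/variance and channel constants
m_out, v_out = de_step(2.0, 4.0, a_j=0.5, a_i=0.2 + 0.1j)
```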

To this end, we fix $x_{j-1}$ and explore the correlations between the variables involved, whose randomness comes solely from the noise and the interference other than $x_j$ and $x_{j-1}$. Consider first $z_{j|j-1}$ in the last line of (33). We have, from (6) and (7),

$$z_{j|j-1} \sim \mathcal{N}\Big(a^{(R)}_{j|j-1,j},\; E\big(n^{(R)}_{j|j-1}\big)^2\Big) \qquad (38)$$

where, if we assume the suppressed noise + interference, $n_{j|j-1}$, is circularly symmetric, the variance is given, from (7) and (8), by

$$E\big(n^{(R)}_{j|j-1}\big)^2 = E\big(n^{(I)}_{j|j-1}\big)^2 = \frac{a^{(R)}_{j|j-1,j}}{2} \qquad (39)$$

This is valid when circularly symmetric constellations, such as QPSK, are used, while it generally does not hold for non-circularly-symmetric real constellations, such as BPSK. Although this has little impact, especially when the number of antennas is large, it is certainly more accurate to use the exact variances, which are given in Appendix A.

Now, let us look at the argument of $\zeta$, which can be rewritten as

$$l_{j-2\to j-1} + 2d_{j|j-1} = l_{j-2\to j-1} + \frac{4a^{(R)}_{j|j-1,j-1}}{\sigma^2_{j|j-1}}z_{j|j-1} + \frac{4|a_{j|j-1,j-1}|^2}{\sigma^2_{j|j-1}}x_{j-1} + \frac{4a^{(I)}_{j|j-1,j-1}}{\sigma^2_{j|j-1}}n^{(I)}_{j|j-1} \qquad (40)$$

where $l_{j-2\to j-1}$ is assumed to be Gaussian, whose mean and variance are provided by the previous node as the pair $(m_{j-1|j-2}, v_{j-1|j-2})$, i.e.,

$$l_{j-2\to j-1} \sim \mathcal{N}\big(m_{j-1|j-2}x_{j-1},\; v_{j-1|j-2}\big) \qquad (41)$$

The remaining three terms, except for the one containing $x_{j-1}$, are assumed to be uncorrelated with each other. Although $n^{(I)}_{j|j-1}$ has a weak correlation with $l_{j-2\to j-1}$ as shown in Appendix A, we will ignore it for analytical simplicity. Unfortunately, $l_{j-2\to j-1}$ and $z_{j|j-1}$ have a non-negligible covariance. Recalling the definitions of $l_{j-2\to j-1}$ and $z_{j|j-1}$, it stems from the covariance between $n^{(R)}_{j-1|j-2}$ in $l_{j-2\to j-1}$ and $n^{(R)}_{j|j-1}$ in $z_{j|j-1}$, which is given by

$$\begin{aligned} \sigma_{lz,j} &\equiv E\big[l_{j-2\to j-1}\cdot z_{j|j-1}\,\big|\,x_{j-1}\big] - E\big[l_{j-2\to j-1}\,\big|\,x_{j-1}\big]\cdot E\big[z_{j|j-1}\big] \\ &= E\big[n^{(R)}_{j|j-1}n^{(R)}_{j-1|j-2}\big] + 2a^{(R)}_{j-1|j-2,j-2}\cdot E\big[n^{(R)}_{j|j-1}e_{j-1|j-2}\big] \\ &\approx E\big[n^{(R)}_{j|j-1}n^{(R)}_{j-1|j-2}\big] \qquad (42)\end{aligned}$$

where we ignored the correlation between $n^{(R)}_{j|j-1}$ and $e_{j-1|j-2}$.

Noting that

$$E\big[n^{(R)}_{j|j-1}n^{(R)}_{j-1|j-2}\big] + E\big[n^{(I)}_{j|j-1}n^{(I)}_{j-1|j-2}\big] = \Re\big[E\big[n_{j|j-1}n^*_{j-1|j-2}\big]\big] = \Re\big[\mathbf{h}_j^H\mathbf{K}^{-1}_{\{j,j-1\}}\mathbf{K}_{\{j,j-1,j-2\}}\mathbf{K}^{-1}_{\{j-1,j-2\}}\mathbf{h}_{j-1}\big] \qquad (43)$$

and resorting to the circular symmetry, we have

$$E\big[n^{(R)}_{j|j-1}n^{(R)}_{j-1|j-2}\big] = E\big[n^{(I)}_{j|j-1}n^{(I)}_{j-1|j-2}\big],$$

resulting in

$$\sigma_{lz,j} \approx \frac{1}{2}\Re\big[\mathbf{h}_j^H\mathbf{K}^{-1}_{\{j,j-1\}}\mathbf{K}_{\{j,j-1,j-2\}}\mathbf{K}^{-1}_{\{j-1,j-2\}}\mathbf{h}_{j-1}\big] \qquad (44)$$

Without circular symmetry, a similar derivation can be used, as shown in Appendix A.
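Since (44) depends only on the channel and the noise power, the covariance can be precomputed once per index $j$. A sketch, with an assumed random channel and noise level:

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 4, 4                                    # assumed dimensions
sigma2 = 0.1                                   # assumed noise variance
H = (rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M))) / np.sqrt(2)

def K(excluded):
    """K_Phi of (6): noise + suppressed-interference covariance."""
    Kp = sigma2 * np.eye(N, dtype=complex)
    for k in range(M):
        if k not in excluded:
            Kp += np.outer(H[:, k], H[:, k].conj())
    return Kp

j = 2                                          # any index with valid j-1, j-2
# sigma_{lz,j} of (44), assuming circular symmetry; K is Hermitian, so
# left.conj() @ v equals h_j^H K^{-1}_{{j,j-1}} v
left = np.linalg.solve(K({j, j - 1}), H[:, j])            # K^{-1}_{{j,j-1}} h_j
right = np.linalg.solve(K({j - 1, j - 2}), H[:, j - 1])   # K^{-1}_{{j-1,j-2}} h_{j-1}
sigma_lz = 0.5 * np.real(left.conj() @ (K({j, j - 1, j - 2}) @ right))
```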

Following the argument above, one can rewrite (40) using two correlated random variables, say