Easily Computed Lower Bounds on the Information Rate of Intersymbol Interference Channels

Seongwook Jeong and Jaekyun Moon

This work was supported in part by the NSF under Theoretical Foundation grant no. 0728676 and the National Research Foundation of Korea under grant no. 2010-0029205. S. Jeong is with the Dept. of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: jeong030@umn.edu). J. Moon is with the Dept. of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, 305-701, Republic of Korea (e-mail: jmoon@kaist.edu).

Abstract

Provable lower bounds are presented for the information rate $I(X; X+S+N)$, where $X$ is the symbol drawn independently and uniformly from a finite-size alphabet, $S$ is a discrete-valued random variable (RV), and $N$ is a Gaussian RV. It is well known that with $S$ representing the precursor intersymbol interference (ISI) at the decision feedback equalizer (DFE) output, $I(X; X+S+N)$ serves as a tight lower bound for the symmetric information rate (SIR) as well as capacity of the ISI channel corrupted by Gaussian noise. When evaluated on a number of well-known finite-ISI channels, these new bounds provide a very similar level of tightness against the SIR to the conjectured lower bound by Shamai and Laroia at all signal-to-noise ratio (SNR) ranges, while being actually tighter when viewed up close at high SNRs. The new lower bounds are obtained in two steps: First, a “mismatched” mutual information function is introduced which can be proved to be a lower bound to $I(X; X+S+N)$. Second, this function is further bounded from below by an expression that can be computed easily via a few single-dimensional integrations with a small computational load.

Index Terms

Channel capacity, decision feedback equalizer, information rate, intersymbol interference, lower bounds, mutual information.

I Introduction

The computation of the symmetric information rate (SIR) of the classical discrete-time intersymbol interference (ISI) channel is of great interest in digital communication. The SIR represents the mutual information between the channel input and output while the input is constrained to be independently and uniformly distributed (i.u.d.) over the given alphabet. In this sense, the SIR is also known as capacity with uniform, independent input distribution and itself represents a reasonably tight lower bound to unconstrained channel capacity, especially at high coding rates. During recent years, a number of researchers have worked on estimating or bounding the information rate via simulation of the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm [1]. The information rate with a given input distribution can be closely estimated for finite ISI channels with moderate input alphabet size and channel impulse response length, by running the forward-recursion portion of the BCJR algorithm on long (pseudo) randomly generated input and noise samples [2], [3], [4]. The simulation-based method has been further generalized, and lower and upper bounds based on auxiliary finite-state channels with reduced states were introduced for long ISI channels, as well as some non-finite state ISI channels in [5]. The tightness of these bounds is highly related to the optimality of the auxiliary channels, but no general rule for finding the optimal or near-optimal auxiliary channel is provided in [5]. The work of [5] has been recently extended in [6] to further tighten the lower and upper bounds by using an iterative expectation-maximization type algorithm to optimize the parameters of the auxiliary finite-state channels. It is noted, however, that the global optimality of the bounds in [6] is not guaranteed, nor is the lower bound proven to converge to a stationary point as the iteration progresses.
Another approach based on auxiliary channels is also proposed to obtain a lower bound utilizing a mismatched Ungerboeck-type channel response to achieve improved tightness for a given level of computational complexity [7]. In the context of [7], the Ungerboeck-type response is the channel’s response observed at the output of the matched filter front-end. As such, the trellis search detection algorithms driven by the channel observations of the Ungerboeck model must be designed so that they can handle correlated noise samples [8].

An entirely different direction in estimating or bounding the information rate is based on finding an analytical expression that can easily be evaluated or numerically computed (in contrast to the methods based on Monte-Carlo simulation that rely on generating pseudo-random signal and noise samples). An early work in this direction is the lower bound on the SIR by Hirt [9] based on carving a fixed block out of the channel input/output sequences and performing a single multi-dimensional integration (or running Monte-Carlo simulation for estimating the integral) with the dimensionality equal to the block size. However, this method is also computationally intense unless the size of the block gets small. Unfortunately the lower bound of [9] is not tight unless the block size is very large compared to the channel ISI length.

A number of more computationally efficient and analytically evaluated lower bounds for the SIR have been discussed in [10], [11]. Unfortunately, however, the only bound presented in [11] that is reasonably tight throughout the entire signal-to-noise ratio (SNR) region (i.e., both low and high code rate regimes) is the one that could not be proved as a lower bound. This particular bound is now widely known as the Shamai-Laroia conjecture (SLC) and, although unproven, is a popular tool for quickly estimating the SIR of ISI channels. At high code rates, the SIR is generally very close to capacity, so an easily computed tight SIR lower bound is also useful for quickly estimating channel capacity for high code rate applications, such as data storage channels and optical fiber channels.

Consider the random variable (RV) $Y = X + S + N$, where $X$ is a symbol drawn independently and uniformly from a fixed, finite-size alphabet set symmetrically positioned around the origin, $S$ a zero-mean discrete-valued RV, and $N$ a zero-mean Gaussian RV. The SLC is concerned with the special case where $S$ is a linear sum of symbols drawn independently and uniformly from the same symbol set where $X$ was taken. As the number of symbols forming $S$ grows, finding an analytical expression for the probability density function of $S+N$ (and thus one for $I(X;Y)$) is a long-standing problem [13], [14], as pointed out in [11]. The SLC of [11] can be stated as $I(X;Y) \ge I(X;X+G)$, where $G$ is a Gaussian RV with variance matching that of $S+N$. The information rate $I(X;X+G)$ is easily obtained by numerically calculating a single one-dimensional integral, and is generally observed to be reasonably tight to $I(X;Y)$ in most cases. Unfortunately, $I(X;X+G)$ remains a conjectured bound with no proof available to date. One difficulty of proving the SLC stems from the fact that for the channels driven by the inputs from a finite alphabet, Gaussian noise is not the worst-case noise in terms of the achievable information rate [11], [12]. Another difficulty is that the power contribution of a single individual weight involved in constructing $S$ could remain a significant portion of the total power associated with all weights, even if the number of weights approaches infinity. This is to say that the Lindeberg condition for the central limit theorem does not hold for this problem, and the Gaussian approximation of $S$ cannot be justified [11].

In this paper, we are also interested in easily computable analytical expressions for lower bounds to $I(X;Y)$ with $Y = X+S+N$. Note that, in the context of the unbiased minimum mean-squared-error decision feedback equalizer (MMSE-DFE) application, $S$ represents the collection of residual precursor ISI contributions and in this case $I(X;Y)$ itself is a well-known lower bound to the SIR [11]. The bounds we develop here are fairly tight, with their tightness generally enhanced with increasing computational load (which in the end still remains small). Our approach is to first define a “mismatched” mutual information (MI) function based on the “mismatched” entropy that takes the $\log$ operation not on the actual underlying probability density but on the Gaussian density with the same variance. We then prove that this “mismatched” MI is always less than or equal to $I(X;Y)$. We further bound this function from below so that the final bound can be evaluated using numerical integration. The bound is basically evaluated by computing a few single-dimensional integrals. This is in contrast to the Hirt bound, which computes a single multi-dimensional integral of very high dimension. Our bound computation also requires the evaluation of the sum of the absolute values of the linear coefficients that form $S$ as well as the identification of dominant coefficient values, if they exist. With the application of the MMSE-DFE, these linear coefficients correspond to the weights on the interfering symbols after ideal postcursor ISI cancellation and can easily be obtained with a small amount of computation. At a reasonable overall computational load, our bounds are shown to be for all practical purposes as tight as the Shamai-Laroia conjecture for many practical ISI channels.

Section II presents the provable bound to $I(X;Y)$ and numerically compares it with the SLC for some example distributions for the linear coefficients that form $S$. Section III develops upper and lower bounds on the provable bound itself, based on identifying clusters in the distribution of $S+N$. Finding clusters in the distribution is the same as identifying dominant coefficient values from the linear coefficient set that is used to construct $S$. Section IV generates and discusses numerical results. In all finite-ISI channels examined, our bound provides the same level of tightness as the SLC against the SIR (while being actually tighter than the SLC at high SNRs when viewed up close) with a very reasonable computational load. In particular, our lower bound is presented on the same channel employed in [6]. This provides an indirect means to compare the computational loads of our method and that of [6]. As expected, our analytical method is considerably better at quickly producing a reasonably tight bound than the simulation-based method of [6] in terms of complexity/accuracy tradeoffs. Note that the method of [6] represents the latest development in simulation-based SIR bounds. Section V concludes the paper.

II A Provable Lower Bound to the Symmetric Information Rate

We first present a provable lower bound to $I(X;Y)$ where $Y = X + \sum_{k=1}^{L} d_{-k}X_k + N$. The symbols $X$ and $X_k$ are all independently and uniformly drawn. The linear coefficients $d_{-k}$'s are related to the channel impulse response and will be specified in Section IV. Let $S = \sum_{k=1}^{L} d_{-k}X_k$ and $V = S + N$ so we can write $Y = X + V$. Note that $V$ is a Gaussian mixture. Also let $Z = X + G$ where $G$ is a zero-mean Gaussian with variance matching that of $V$, i.e., $\sigma_G^2 = \sigma_V^2$.

Definition 1 (“Mismatched” MI (MMI) Function)

Define

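$$
I'(X;Y) \triangleq H'(Y) - H'(V) \tag{1}
$$

where

$$
H'(Y) \triangleq -\int_{-\infty}^{\infty} f_Y(t)\log f_Z(t)\,dt, \qquad
H'(V) \triangleq -\int_{-\infty}^{\infty} f_V(t)\log f_G(t)\,dt
$$

and $f_Y(t)$, $f_Z(t)$, $f_V(t)$, and $f_G(t)$ are the probability density functions (pdfs) of the RVs $Y$, $Z$, $V$, and $G$, respectively. Note that the “mismatched” entropy functions $H'(Y)$ and $H'(V)$ are defined based on the $\log$ operation applied not to the actual underlying pdf but rather to the “mismatched” pdf built from the Gaussian $G$.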

Lemma 1

Given the MMI function defined as above, we have

$$
I'(X;Y) \le I(X;Y). \tag{2}
$$

Proof: See Appendix A.
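
As a numerical sanity check of Lemma 1 (not a substitute for the proof), the sketch below evaluates both $I(X;Y)$ and the MMI function $I'(X;Y)$ by brute-force integration on a grid for a short, binary-input example. The function names, the coefficient values, the grid width, and the step size are illustrative assumptions and are not taken from the paper; $P_X = 1$ by default and all quantities are in nats.

```python
import itertools
import numpy as np

def log_mixture(t, centers, sigma):
    """Log-pdf on grid t of an equal-weight Gaussian mixture with common std sigma."""
    centers = np.asarray(centers, dtype=float)
    z = (t[:, None] - centers[None, :]) / sigma
    log_terms = -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi * sigma**2)
    return np.logaddexp.reduce(log_terms, axis=1) - np.log(centers.size)

def true_and_mismatched_mi(d, sigma_n, p_x=1.0):
    """Return (I, I') in nats for Y = X + sum_k d[k] X_k + N with binary inputs."""
    d = np.asarray(d, dtype=float)
    a = np.sqrt(p_x)                                   # binary symbols are +/- a
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=d.size)))
    m = a * (signs @ d)                                # the 2^L possible values of S
    sigma_v = np.sqrt(sigma_n**2 + p_x * np.sum(d**2)) # std of V = S + N (and of G)

    # Illustrative grid choice: wide enough that the truncated tails are negligible.
    half_width = a + a * np.sum(np.abs(d)) + 10.0 * sigma_v
    dt = sigma_n / 200.0
    t = np.arange(-half_width, half_width, dt)

    log_f_v = log_mixture(t, m, sigma_n)                               # V = S + N
    log_f_y = log_mixture(t, np.concatenate([m + a, m - a]), sigma_n)  # Y = X + V
    log_f_g = log_mixture(t, [0.0], sigma_v)                           # G
    log_f_z = log_mixture(t, [a, -a], sigma_v)                         # Z = X + G
    f_v, f_y = np.exp(log_f_v), np.exp(log_f_y)

    def integrate(g):
        return float(np.sum(g) * dt)

    i_true = integrate(-f_y * log_f_y) - integrate(-f_v * log_f_v)  # h(Y) - h(V)
    i_mm = integrate(-f_y * log_f_z) - integrate(-f_v * log_f_g)    # H'(Y) - H'(V)
    return i_true, i_mm
```

For example, `true_and_mismatched_mi([0.5, 0.3], 0.5)` returns a pair whose second entry should not exceed the first, in line with (2). With a zero coefficient the mixture $V$ is itself Gaussian and the two values coincide.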

Let us now take a close look at this MMI function and develop some insights into its behavior. Let the variances of $X$, $N$, and $S$ be $P_X$, $\sigma_N^2$, and $\sigma_S^2$ respectively. Further assume that the RVs $X$, $S$, $N$, and $G$ are all real-valued. We will also assume a binary input alphabet. These assumptions are not necessary for our development but make the presentation clearer as well as less cluttered. We will simply state the results in Section III-C for a non-binary/complex-valued example. We also denote the possible values of $S$ by $m_i$ for $i = 1, 2, \ldots, 2^L$ since $\{X_k\}_{k=1}^{L}$ can have $2^L$ different sequences. Naturally, the pdfs of RVs $V$ and $G$ can be written as

$$
f_V(t) = 2^{-L}\sum_{i=1}^{2^L}\frac{1}{\sqrt{2\pi\sigma_N^2}}\exp\left(-\frac{(t-m_i)^2}{2\sigma_N^2}\right), \qquad
f_G(t) = \frac{1}{\sqrt{2\pi\sigma_V^2}}\exp\left(-\frac{t^2}{2\sigma_V^2}\right).
$$

Proposition 1

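Denoting $\rho_i \triangleq m_i/\sqrt{P_X}$, letting $\rho_k^+$'s denote the positive-half subset of $\rho_i$'s, and defining $R \triangleq P_X/\sigma_V^2$ and $\phi \triangleq \sigma_N/\sigma_V$, the MMI function can be rewritten as $I'(X;Y) = \log 2 - F$ with the new definition

$$
\begin{aligned}
F &\triangleq 2^{-L}\sum_{i=1}^{2^L} E_\tau\left[\log\left\{1+e^{-2R\rho_i}e^{-2\phi\sqrt{R}\tau-2R}\right\}\right] \\
&= E_{\rho,\tau}\left[\log\left\{1+e^{-2R\rho}e^{-2\phi\sqrt{R}\tau-2R}\right\}\right] \\
&= 2^{-(L-1)}\sum_{k=1}^{2^{L-1}} E_\tau\left[\tfrac{1}{2}\log\left\{1+2\cosh(2R\rho_k^+)e^{-2\phi\sqrt{R}\tau-2R}+e^{-4\phi\sqrt{R}\tau-4R}\right\}\right] \\
&= E_{\rho^+,\tau}\left[\tfrac{1}{2}\log\left\{1+2\cosh(2R\rho^+)e^{-2\phi\sqrt{R}\tau-2R}+e^{-4\phi\sqrt{R}\tau-4R}\right\}\right].
\end{aligned}
\tag{4}
$$

A detailed derivation is given in Appendix B. The position of the $i$th Gaussian pdf of the mixture is expressed as a dimensionless quantity: $\rho_i = m_i/\sqrt{P_X}$, with the normalization by the square root of the input power. Because of the symmetric nature of $f_V(t)$, $\rho_i$ occurs in equal-magnitude, opposite-polarity pairs. The expectation is initially over $\tau$, which is considered a zero-mean unit-variance Gaussian random variable when contained inside the argument of the expectation operator. The expectation operator in this case can simply be viewed as a short-hand notation as in

$$
E_\tau[p(\tau)] = \int_{-\infty}^{\infty}\frac{e^{-\tau^2/2}}{\sqrt{2\pi}}\,p(\tau)\,d\tau.
$$

In the second and fourth lines of (4), however, $\rho$ (or $\rho^+$) is also treated as a RV and the expectation is over both $\rho$ and $\tau$ (or $\rho^+$ and $\tau$) as the double subscripts indicate. Given the pdfs of $\rho$, $\rho^+$ and $\tau$, the computation of the expectation now involves numerical evaluation of a double integral. Note that $\rho$ in (4) is a discrete-valued random variable distributed according to $f_\rho(t)$, which denotes the probability distribution of $S/\sqrt{P_X}$, and $\rho^+$ is a discrete-valued random variable distributed according to $2f_\rho(t)u(t)$ where $u(t)$ is a step function. Also, notice that $\sigma_\rho^2 = \sigma_S^2/P_X$ and $\phi^2 = 1 - R\sigma_\rho^2$. Since it is not easy to find $f_\rho(t)$ when $L$ is large, evaluating the second or fourth line of (4) is difficult in general.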
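
For small $L$, the function $F$ can be evaluated by direct enumeration of all $2^L$ sign patterns. The sketch below does this in two ways, once with the unpaired form (first line of (4)) and once with the paired $\cosh$ form (third line of (4)), using Gauss-Hermite quadrature for $E_\tau[\cdot]$. The helper names, the quadrature order, and the normalization $P_X = 1$ are assumptions made for illustration; the paired version additionally assumes that no sign pattern sums exactly to zero.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

TAU, W = hermegauss(64)          # nodes/weights for the weight exp(-tau^2 / 2)
W = W / np.sqrt(2.0 * np.pi)     # normalise so that sum(W) = 1, i.e. E_tau[1] = 1

def channel_params(d, sigma_n):
    """Return (d, R, phi) for unit input power P_X = 1."""
    d = np.asarray(d, dtype=float)
    sigma_v2 = sigma_n**2 + np.sum(d**2)      # sigma_V^2 = sigma_N^2 + sigma_S^2
    return d, 1.0 / sigma_v2, sigma_n / np.sqrt(sigma_v2)

def all_rho(d):
    """All 2^L values rho_i = sum_k (+/-) d[k]."""
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=d.size)))
    return signs @ d

def f_enumerated(d, sigma_n):
    """F from the unpaired form: average over every rho_i."""
    d, r, phi = channel_params(d, sigma_n)
    rho = all_rho(d)
    expo = -2.0 * r * rho[:, None] - 2.0 * phi * np.sqrt(r) * TAU[None, :] - 2.0 * r
    # logaddexp(0, e) = log(1 + exp(e)), evaluated without overflow
    return float(np.mean(np.logaddexp(0.0, expo) @ W))

def f_paired(d, sigma_n):
    """F from the paired cosh form: average over the positive half rho_k^+."""
    d, r, phi = channel_params(d, sigma_n)
    rho = all_rho(d)
    rho_plus = rho[rho > 0]                   # assumes no rho_i is exactly zero
    x = np.exp(-2.0 * phi * np.sqrt(r) * TAU - 2.0 * r)
    inner = 0.5 * np.log1p(2.0 * np.cosh(2.0 * r * rho_plus[:, None]) * x[None, :]
                           + x[None, :]**2)
    return float(np.mean(inner @ W))
```

The two functions should agree to within rounding, and `np.log(2) - f_enumerated(d, sigma_n)` then gives $I'(X;Y)$ in nats. The cost grows as $2^L$, which is the practical difficulty noted above.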

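It is insightful to compare $F$ with

$$
\begin{aligned}
F_{SLC} &\triangleq \log 2 - C_{SLC}(R) \\
&= \int_{-\infty}^{\infty}\frac{e^{-\tau^2/2}}{\sqrt{2\pi}}\log\left\{1+e^{-2\sqrt{R}\tau-2R}\right\}d\tau \\
&= E_\tau\left[\log\left\{1+e^{-2\sqrt{R}\tau-2R}\right\}\right] \\
&= E_\tau\left[\tfrac{1}{2}\log\left\{1+2e^{-2\sqrt{R}\tau-2R}+e^{-4\sqrt{R}\tau-4R}\right\}\right]
\end{aligned}
$$

where $C_{SLC}(R)$ is the SIR of the binary-input Gaussian channel with SNR given by $R = P_X/\sigma_V^2$ and is the well-known SLC. The function $F_{SLC}$ quantifies the gap between the SLC and the maximum attainable capacity for any binary channel with infinite SNR, namely, 1 bit/channel use. Comparing the last expression for $F$ in (4) with the last expression for $F_{SLC}$ above, we see that if $\rho = 0$ so that $\phi = 1$, then $F = F_{SLC}$, and $I'(X;Y)$ and the SLC both become equal to $I(X;Y)$. Also, if the discrete RV $\rho$ converges to a Gaussian random variable (in cumulative distribution), then again we get $F = F_{SLC}$ and $I'(X;Y) = I(X;Y)$.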
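
The one-dimensional integral defining $F_{SLC}$ is straightforward to evaluate numerically. A minimal sketch using Gauss-Hermite quadrature follows; the function names and the quadrature order are illustrative assumptions, and $R$ denotes the SNR $P_X/\sigma_V^2$ on a linear (not dB) scale.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(64)             # quadrature for the weight exp(-tau^2 / 2)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)    # normalise so the weights sum to 1

def f_slc(r):
    """F_SLC(R) = E_tau[log(1 + exp(-2 sqrt(R) tau - 2R))] in nats."""
    expo = -2.0 * np.sqrt(r) * NODES - 2.0 * r
    # logaddexp(0, e) evaluates log(1 + exp(e)) without overflow
    return float(np.sum(WEIGHTS * np.logaddexp(0.0, expo)))

def c_slc_bits(r):
    """C_SLC(R) = log 2 - F_SLC(R), converted to bits per channel use."""
    return 1.0 - f_slc(r) / np.log(2.0)
```

At $R = 0$ the integrand is the constant $\log 2$, so `f_slc(0.0)` returns $\log 2$ and `c_slc_bits(0.0)` returns zero; as $R$ grows, `c_slc_bits(r)` approaches 1 bit/channel use.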

Furthermore, the fact that $\cosh(2R\rho^+) \ge 1$ in the last line of (4) makes $F$ larger, while the factor $\phi$ being less than 1 has the effect of decreasing $F$. If $I'(X;Y)$ is to be a tight lower bound to $I(X;Y)$, then $F$ needs to be small. The important question is: how does $F$ overall compare with $F_{SLC}$ over the entire SNR range of interest? Since it is already proved that $I'(X;Y) = \log 2 - F \le I(X;Y)$, if $F \le F_{SLC}$ for some $R$ values, then clearly $C_{SLC} \le I(X;Y)$ at those SNRs, i.e., the SLC holds true at least at these SNRs.

While exact computation of (4) requires in general obtaining all possible positive-side values of $\rho$ and thus can be computationally intense for large $L$, in the cases where we know the functional form of the distribution for $\rho$, evaluation of the second or fourth line of (4) is easy; the behavior of $F$ under different distributions offers useful insights.

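First try a uniform distribution for $\rho$. For a uniformly distributed discrete random variable $\rho$ from $-|\rho|_{\max}$ to $|\rho|_{\max}$ with a gap $\Delta$ between delta functions in the pdf (so that $|\rho|_{\max} = K\Delta$), we have

$$
\sigma_S^2 = \frac{2P_X\Delta^2}{2K+1}\sum_{i=1}^{K} i^2 = \frac{P_X\Delta^2 K(K+1)}{3} = \frac{P_X|\rho|_{\max}\left(|\rho|_{\max}+\Delta\right)}{3}
$$

which makes

$$
\phi^2 = \frac{\sigma_N^2}{\sigma_N^2+\sigma_S^2} = 1-\frac{\sigma_S^2}{\sigma_V^2} = 1-\frac{R\Delta^2 K(K+1)}{3} = 1-\frac{R|\rho|_{\max}\left(|\rho|_{\max}+\Delta\right)}{3}.
$$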
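
The closed-form variance of the uniform distribution can be checked against a direct computation over the $2K+1$ equally likely levels. The sketch below works in the normalized domain ($P_X = 1$, so $\sigma_S^2 = \sigma_\rho^2$); the function name and the sample values of $K$, $\Delta$, and $R$ are hypothetical.

```python
import numpy as np

def uniform_rho_stats(k, delta, r):
    """Variance of a uniform discrete rho on {-K*delta, ..., K*delta} and the resulting phi^2."""
    levels = delta * np.arange(-k, k + 1)            # 2K+1 equally likely values
    var_direct = float(np.mean(levels**2))           # direct second moment (mean is zero)
    var_closed = delta**2 * k * (k + 1) / 3.0        # closed form Delta^2 K (K+1) / 3
    phi_sq = 1.0 - r * var_closed                    # phi^2 = 1 - R * sigma_rho^2
    return var_direct, var_closed, phi_sq
```

Since $\phi^2$ must be positive, a given pair $(K, \Delta)$ is only meaningful for $R < 3/(\Delta^2 K(K+1))$.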

Fig. 1 shows and plotted with as functions of for various values of . We also consider a simple case involving only a single coefficient , in which case takes only two possible values, e.g., . The plots of and for this case are shown against for different values of in Fig. 2. Figs. 1 and 2 point to similar behaviors of versus . Namely, becomes smaller than as decreases for a range of values. At these values, the provable lower bound is apparently tighter than the SLC, with respect to the SIR.

III Bounding F

Exact computation of $F$ in general is not easy, especially when $L$ goes to infinity. We thus resort to bounding $F$ with expressions that can easily be computed. An upper bound on $F$ will provide a lower bound on $I'(X;Y)$ and thus on $I(X;Y)$. Lower bounds on $F$ are also derived to see if they can get smaller than $F_{SLC}$. If so, this would mean $I'(X;Y)$ is larger than $C_{SLC}$, i.e., our bound is tighter than the SLC.

III-A Simple Bounds

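Since $\tfrac{1}{2}\log\{1+2\cosh(2R\rho^+)e^{-2\phi\sqrt{R}\tau-2R}+e^{-4\phi\sqrt{R}\tau-4R}\}$ is convex in $\rho^+$, its integral function with respect to $\tau$, $E_\tau[\tfrac{1}{2}\log\{1+2\cosh(2R\rho^+)e^{-2\phi\sqrt{R}\tau-2R}+e^{-4\phi\sqrt{R}\tau-4R}\}]$, is also convex in $\rho^+$. Moreover, this function increases as $\rho^+$ increases. Accordingly, we can develop bounds on $F$. The first simple upper bound is

$$
F_{u1} \triangleq T(|\rho|_{\max},\theta)\Big|_{\theta=\sigma_\rho} \tag{7}
$$

where, for a given $R$, the function $T(|\rho|_{\max},\theta)$ represents a straight line in $\theta$ passing through two points of the function $E_\tau[\tfrac{1}{2}\log\{1+2\cosh(2R\theta)e^{-2\phi\sqrt{R}\tau-2R}+e^{-4\phi\sqrt{R}\tau-4R}\}]$ at $\theta = 0$ and at $\theta = |\rho|_{\max}$. Note that $|\rho|_{\max} = \sum_{k=1}^{L}|d_{-k}|$ and $\sigma_\rho$ is the standard deviation of RV $\rho$.

Similarly, the logarithm in (4) is a concave and increasing function of the coefficient that multiplies $e^{-2\phi\sqrt{R}\tau-2R}$, i.e., of $\cosh(2R\rho^+)$. Based on this property, we can develop another upper bound.

$$
F_{u2} \triangleq E_\tau\left[\tfrac{1}{2}\log\left\{1+2(s\sigma_\rho+1)e^{-2\phi\sqrt{R}\tau-2R}+e^{-4\phi\sqrt{R}\tau-4R}\right\}\right] \tag{8}
$$

where $s = \left(\cosh(2R|\rho|_{\max})-1\right)/|\rho|_{\max}$, the slope of a straight line connecting two points $(0, 1)$ and $(|\rho|_{\max}, \cosh(2R|\rho|_{\max}))$.

A lower bound on $F$ can also be obtained that can help shed light on how tight the upper bounds on $F$ are. Using the convexity of $\log\{1+e^{-2R\rho}e^{-2\phi\sqrt{R}\tau-2R}\}$ in $\rho$, the simple lower bound of $F$ is

$$
F_l \triangleq E_\tau\left[\tfrac{1}{2}\log\left\{1+2e^{-2\phi\sqrt{R}\tau-2R}+e^{-4\phi\sqrt{R}\tau-4R}\right\}\right]. \tag{9}
$$

Detailed derivations of (7), (8), and (9) are given in Appendix C.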
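
The three simple bounds only need $|\rho|_{\max}$, $\sigma_\rho$, $R$, and $\phi$, so they are cheap to compute. The sketch below evaluates (7), (8), and (9) for a short, binary-input example and also computes $F$ by enumeration so the ordering can be inspected. It assumes $P_X = 1$, so that $|\rho|_{\max}$ is the sum of the absolute coefficient values and $\sigma_\rho^2$ is the sum of their squares; the function name, quadrature order, and example coefficients are illustrative choices.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

TAU, W = hermegauss(64)          # Gauss-Hermite rule for the weight exp(-tau^2 / 2)
W = W / np.sqrt(2.0 * np.pi)     # normalise so that E_tau[1] = 1

def simple_bounds(d, sigma_n):
    """Return (F_l, F, F_u1, F_u2) in nats for binary inputs with P_X = 1."""
    d = np.asarray(d, dtype=float)
    sigma_v2 = sigma_n**2 + np.sum(d**2)
    r, phi = 1.0 / sigma_v2, sigma_n / np.sqrt(sigma_v2)
    x = np.exp(-2.0 * phi * np.sqrt(r) * TAU - 2.0 * r)
    rho_max = np.sum(np.abs(d))              # must be > 0 for the chord below
    sigma_rho = np.sqrt(np.sum(d**2))

    def h(theta):
        # E_tau[ 0.5 * log{1 + 2 cosh(2 R theta) x + x^2} ], convex and increasing in theta >= 0
        return float(W @ (0.5 * np.log1p(2.0 * np.cosh(2.0 * r * theta) * x + x**2)))

    f_l = h(0.0)                                                    # eq. (9)
    f_u1 = h(0.0) + (h(rho_max) - h(0.0)) * sigma_rho / rho_max     # chord at sigma_rho, eq. (7)
    s = (np.cosh(2.0 * r * rho_max) - 1.0) / rho_max                # slope of the cosh chord
    f_u2 = float(W @ (0.5 * np.log1p(2.0 * (s * sigma_rho + 1.0) * x + x**2)))  # eq. (8)

    # Exact F by enumerating all 2^L sign patterns (only feasible for small L).
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=d.size)))
    rho = signs @ d
    expo = -2.0 * r * rho[:, None] - 2.0 * phi * np.sqrt(r) * TAU[None, :] - 2.0 * r
    f_exact = float(np.mean(np.logaddexp(0.0, expo) @ W))
    return f_l, f_exact, f_u1, f_u2
```

For instance, `simple_bounds([0.5, 0.3], 0.5)` returns four values that should satisfy $F_l \le F \le F_{u1}$ and $F \le F_{u2}$.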

III-B Tightened Bounds Based on Cluster Identification

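The above bounds can be tightened up by identifying clusters in the Gaussian mixture $f_V(t)$. In practical ISI channels, $f_V(t)$ often consists of clusters. This is due to the fact that the coefficient set $d_{-k}$'s typically contains a few dominating coefficients plus many small terms. Assuming there are $M$ dominating coefficients among $d_{-k}$'s, we can let $\rho = \lambda + \mu$, where $\lambda$ collects the normalized contributions of the $M$ dominating coefficients and $\mu$ collects those of the remaining $L-M$ coefficients. Since $X_k$ is an i.u.d. RV, $\lambda$ and $\mu$ are independent so that $\sigma_\rho^2 = \sigma_\lambda^2 + \sigma_\mu^2$ where $\sigma_\lambda^2$ and $\sigma_\mu^2$ denote the variance of RVs $\lambda$ and $\mu$, respectively. Notice that $\lambda$ can be viewed as the position of a specific cluster while $\mu$ points to a specific Gaussian pdf out of $2^{L-M}$ Gaussian pdf’s symmetrically positioned around $\lambda$.

Therefore, assuming there are $2^M$ clusters of Gaussian pdfs, the upper bound $F_{u1}$ can be tightened as

$$
F_{u1}^{M} \triangleq 2^{-M}\sum_{n=1}^{2^M} T_n(|\mu|_{\max},\theta)\Big|_{\theta=\sigma_\mu} \tag{10}
$$

where, for a given $\lambda_n$, the function $T_n(|\mu|_{\max},\theta)$ is a straight line in $\theta$ that passes through the two points of the convex function $E_\tau[\tfrac{1}{2}\log\{1+2\cosh(2R\theta)e^{-2R\lambda_n}e^{-2\phi\sqrt{R}\tau-2R}+e^{-4R\lambda_n}e^{-4\phi\sqrt{R}\tau-4R}\}]$ at $\theta = 0$ and $\theta = |\mu|_{\max}$, $\sigma_\mu$ is the standard deviation of RV $\mu$, and $|\mu|_{\max}$ is the largest value of $|\mu|$.

Another form of tightened upper bound based on $F_{u2}$ is obtained as

$$
F_{u2}^{M} \triangleq 2^{-M}\sum_{n=1}^{2^M} E_\tau\left[\tfrac{1}{2}\log\left\{1+2(s_M\sigma_\mu+1)e^{-2R\lambda_n}e^{-2\phi\sqrt{R}\tau-2R}+e^{-4R\lambda_n}e^{-4\phi\sqrt{R}\tau-4R}\right\}\right] \tag{11}
$$

where $s_M = \left(\cosh(2R|\mu|_{\max})-1\right)/|\mu|_{\max}$.

The lower bound can also be tightened similarly based on the cluster identification:

$$
F_{l}^{M} \triangleq 2^{-(M-1)}\sum_{k=1}^{2^{M-1}} E_\tau\left[\tfrac{1}{2}\log\left\{1+2\cosh(2R\lambda_k^+)e^{-2\phi\sqrt{R}\tau-2R}+e^{-4\phi\sqrt{R}\tau-4R}\right\}\right] \tag{12}
$$

where $\lambda_k^+$’s form the positive-half subset of $\lambda_n$’s. Detailed derivations of (10), (11), and (12) can be found in Appendix D.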
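
To illustrate the cluster idea, the sketch below treats the $M$ largest-magnitude coefficients as dominant, enumerates only the $2^M$ cluster positions $\lambda_n$, and summarizes the remaining coefficients by $|\mu|_{\max}$ and $\sigma_\mu$. It evaluates the tightened upper bound of (11) and the tightened lower bound of (12), the latter in the equivalent unpaired form obtained by averaging $E_\tau[\log\{1+e^{-2R\lambda_n}e^{-2\phi\sqrt{R}\tau-2R}\}]$ over all $\lambda_n$. Binary inputs with $P_X = 1$ are assumed, and the function names, the rule for picking dominant taps, and the example values are hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

TAU, W = hermegauss(64)          # Gauss-Hermite rule for the weight exp(-tau^2 / 2)
W = W / np.sqrt(2.0 * np.pi)     # normalise so that E_tau[1] = 1

def all_sums(c):
    """All 2^n signed sums of the entries of c."""
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=c.size)))
    return signs @ c

def cluster_bounds(d, sigma_n, m_dominant):
    """Return (F_l^M, F, F_u2^M) in nats; needs at least one non-dominant, nonzero tap."""
    d = np.asarray(d, dtype=float)
    order = np.argsort(-np.abs(d))                     # largest magnitude first
    dom, rest = d[order[:m_dominant]], d[order[m_dominant:]]
    sigma_v2 = sigma_n**2 + np.sum(d**2)
    r, phi = 1.0 / sigma_v2, sigma_n / np.sqrt(sigma_v2)
    base = -2.0 * phi * np.sqrt(r) * TAU - 2.0 * r     # exponent without any rho term
    x = np.exp(base)

    lam = all_sums(dom)                                # 2^M cluster positions
    mu_max = np.sum(np.abs(rest))
    sigma_mu = np.sqrt(np.sum(rest**2))
    s_m = (np.cosh(2.0 * r * mu_max) - 1.0) / mu_max   # slope of the cosh chord on [0, mu_max]

    shift = np.exp(-2.0 * r * lam)[:, None]
    inner = 0.5 * np.log1p(2.0 * (s_m * sigma_mu + 1.0) * shift * x[None, :]
                           + shift**2 * x[None, :]**2)
    f_u2m = float(np.mean(inner @ W))                  # eq. (11)

    # Eq. (12) in unpaired form: drop mu and average over all cluster positions.
    f_lm = float(np.mean(np.logaddexp(0.0, -2.0 * r * lam[:, None] + base[None, :]) @ W))

    # Exact F for reference (2^L terms, small L only).
    rho = all_sums(d)
    f_exact = float(np.mean(np.logaddexp(0.0, -2.0 * r * rho[:, None] + base[None, :]) @ W))
    return f_lm, f_exact, f_u2m
```

With, say, `cluster_bounds([0.5, 0.1, 0.05, 0.02], 0.5, 1)`, only two cluster positions are enumerated instead of sixteen mixture components, and the returned values should bracket $F$ from below and above.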

III-C Bounds for Complex Channels with the Quaternary Alphabet Inputs

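In the previous subsections, ISI coefficients and noise samples are assumed to be real-valued with the channel inputs being the binary phase shift keying (BPSK) signal. In this subsection, we provide a complex-valued example along with the channel inputs taken from a quadrature phase shift keying (QPSK) quaternary alphabet, i.e., $X \in \sqrt{P_X/2}\,\{\pm 1 \pm j\}$. The extension to larger alphabets should be straightforward.

Denoting the real and imaginary parts of a complex number $a$ by $a^{(r)}$ and $a^{(i)}$ respectively, i.e., $a = a^{(r)} + ja^{(i)}$, and denoting the possible values of $S$ by $m_i$ for $i = 1, 2, \ldots, 4^L$, the pdf’s of complex random variables $V$ and $G$ are given as

$$
\begin{aligned}
f_V(t) &= 4^{-L}\sum_{i=1}^{4^L}\frac{1}{\pi\sigma_N^2}\exp\left(-\frac{|t-m_i|^2}{\sigma_N^2}\right) \\
&= 4^{-L}\sum_{i=1}^{4^L}\left\{\frac{1}{\sqrt{\pi\sigma_N^2}}\exp\left(-\frac{(t^{(r)}-m_i^{(r)})^2}{\sigma_N^2}\right)\frac{1}{\sqrt{\pi\sigma_N^2}}\exp\left(-\frac{(t^{(i)}-m_i^{(i)})^2}{\sigma_N^2}\right)\right\} \\
f_G(t) &= \frac{1}{\pi\sigma_V^2}\exp\left(-\frac{|t|^2}{\sigma_V^2}\right)
= \frac{1}{\sqrt{\pi\sigma_V^2}}\exp\left(-\frac{(t^{(r)})^2}{\sigma_V^2}\right)\frac{1}{\sqrt{\pi\sigma_V^2}}\exp\left(-\frac{(t^{(i)})^2}{\sigma_V^2}\right).
\end{aligned}
$$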

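Then, for the SLC, we write

$$
\begin{aligned}
F_{SLC} &\triangleq \log 4 - C_{SLC}(R) \\
&= 2\int_{-\infty}^{\infty}\frac{e^{-\tau^2}}{\sqrt{\pi}}\log\left\{1+e^{-2\sqrt{2R}\tau-2R}\right\}d\tau \\
&= 2E_\tau\left[\log\left\{1+e^{-2\sqrt{2R}\tau-2R}\right\}\right] \\
&= 2E_\tau\left[\tfrac{1}{2}\log\left\{1+2e^{-2\sqrt{2R}\tau-2R}+e^{-4\sqrt{2R}\tau-4R}\right\}\right]
\end{aligned}
$$

where $\tau$ is now a zero-mean Gaussian RV with variance $1/2$, as the integrand indicates.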

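The function $F$ is given as

$$
\begin{aligned}
F &\triangleq 4^{-L}\sum_{i=1}^{4^L}\left(E_\tau\left[\log\left\{1+e^{-2\sqrt{2}R\rho_i^{(r)}}e^{-2\phi\sqrt{2R}\tau-2R}\right\}\right]
+E_\tau\left[\log\left\{1+e^{-2\sqrt{2}R\rho_i^{(i)}}e^{-2\phi\sqrt{2R}\tau-2R}\right\}\right]\right) \\
&= 2E_{\rho^{(r)},\tau}\left[\log\left\{1+e^{-2\sqrt{2}R\rho^{(r)}}e^{-2\phi\sqrt{2R}\tau-2R}\right\}\right] \qquad (16) \\
&= 4^{-(L-1)}\sum_{k=1}^{4^{L-1}} 2E_\tau\left[\tfrac{1}{2}\log\left\{1+2\cosh\left(2\sqrt{2}R\rho_k^{(r)+}\right)e^{-2\phi\sqrt{2R}\tau-2R}+e^{-4\phi\sqrt{2R}\tau-4R}\right\}\right] \\
&= 2E_{\rho^{(r)+},\tau}\left[\tfrac{1}{2}\log\left\{1+2\cosh\left(2\sqrt{2}R\rho^{(r)+}\right)e^{-2\phi\sqrt{2R}\tau-2R}+e^{-4\phi\sqrt{2R}\tau-4R}\right\}\right]
\end{aligned}
$$

where $\rho_i \triangleq m_i/\sqrt{P_X}$, $\phi \triangleq \sigma_N/\sigma_V$, and $\rho_k^{(r)+}$’s and $\rho_k^{(i)+}$’s denote the positive-half subset of $\rho_i^{(r)}$’s and $\rho_i^{(i)}$’s respectively. The equality (16) holds because the pdf of $\rho^{(r)}$ is identical to the pdf of $\rho^{(i)}$.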

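Then, the upper bound based on $F_{u1}$ can be derived in a similar way as

$$
F_{u1}^{M} \triangleq 4^{-M}\sum_{n=1}^{4^M} 2T_n^{(r)}\left(|\mu^{(r)}|_{\max},\theta\right)\Big|_{\theta=\sigma_\mu/\sqrt{2}} \tag{17}
$$

where, for a given $\lambda_n^{(r)}$, $T_n^{(r)}(|\mu^{(r)}|_{\max},\theta)$ denotes a straight line in $\theta$ that passes through the two points of the function $E_\tau[\tfrac{1}{2}\log\{1+2\cosh(2\sqrt{2}R\theta)e^{-2\sqrt{2}R\lambda_n^{(r)}}e^{-2\phi\sqrt{2R}\tau-2R}+e^{-4\sqrt{2}R\lambda_n^{(r)}}e^{-4\phi\sqrt{2R}\tau-4R}\}]$ at $\theta = 0$ and at $\theta = |\mu^{(r)}|_{\max}$. Note that $|\mu^{(r)}|_{\max}$ is the largest value of $|\mu^{(r)}|$ and the variance of $\mu^{(r)}$ is equal to $\sigma_\mu^2/2$ since the pdfs of $\lambda^{(r)}$ and $\mu^{(r)}$ are identical to the pdfs of $\lambda^{(i)}$ and $\mu^{(i)}$, respectively.

A second upper bound on $F$ is given as

$$
F_{u2}^{M} \triangleq 4^{-M}\sum_{n=1}^{4^M} 2E_\tau\left[\tfrac{1}{2}\log\left\{1+2\left(s_M^{(r)}\frac{\sigma_\mu}{\sqrt{2}}+1\right)e^{-2\sqrt{2}R\lambda_n^{(r)}}e^{-2\phi\sqrt{2R}\tau-2R}+e^{-4\sqrt{2}R\lambda_n^{(r)}}e^{-4\phi\sqrt{2R}\tau-4R}\right\}\right] \tag{18}
$$

where $s_M^{(r)} = \left(\cosh(2\sqrt{2}R|\mu^{(r)}|_{\max})-1\right)/|\mu^{(r)}|_{\max}$.

Finally, a lower bound to $F$ can be shown to be

$$
F_{l}^{M} \triangleq 4^{-M}\sum_{k=1}^{4^M/2} E_\tau\left[\tfrac{1}{2}\log\left\{1+2\cosh\left(2\sqrt{2}R\lambda_k^{(r)+}\right)e^{-2\phi\sqrt{2R}\tau-2R}+e^{-4\phi\sqrt{2R}\tau-4R}\right\}\right] \tag{19}
$$

where $\lambda_k^{(r)+}$’s form the positive-half subset of $\lambda_n^{(r)}$’s.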

IV Application to ISI Channels and Numerical Examples

IV-A The ISI Channel and MMSE-DFE

Fig. 3 shows the discrete-time equivalent system model of the finite-ISI channel with the infinite-length feedforward filter of the unbiased MMSE-DFE preceded by the matched filter (MF) for the channel. The discrete-time MF output of Fig. 3 is identical to the baud-rate sampled output of the continuous-time MF applied to the continuous-time channel, under the assumption that the channel is strictly limited to the Nyquist band.

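We also assume that the receiver knows the $D$-transform of the finite-ISI channel response, $h(D)$, that $\{x_k\}$ is an i.u.d. input sequence, and that the additive white Gaussian noise (AWGN) has variance $N_0$. Furthermore, $\{r_k\}$ is the channel output sequence, $\{z_k\}$ is the output sequence of the infinite-length MMSE-DFE feedforward filter, and $\{y_k\}$ is the unbiased MMSE-DFE output after ideal postcursor ISI cancellation.

Denoting the current symbol by $X$, the interfering future symbols by $X_k$, and the corresponding equalizer output sample by $Y$, the output of the unbiased MMSE-DFE with ideal feedback [15] is

$$
Y = X + \sum_{k=1}^{\infty} d_{-k}X_k + N = X + S + N = X + V
$$

where $N$ is the Gaussian noise sample observed at the DFE forward filter output and $d_{-k}$ is the precursor ISI sequence. Note we are assuming stationary random processes. It is well-known that the $D$-transform of the precursor ISI taps is given by [15]

$$
d(D) = \frac{N_0}{P_0-N_0}\left(1-\frac{1}{g^*(D^{-*})}\right) \tag{20}
$$

where $P_0$ is such that $\log P_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\left(P_X R_{hh}(e^{-j\theta})+N_0\right)d\theta$ and $g^*(D^{-*})$ is obtained from spectral factorization: $P_X R_{hh}(D)+N_0 = P_0\,g(D)\,g^*(D^{-*})$ with $R_{hh}(D) = h(D)h^*(D^{-*})$. Notice that a convenient numerical spectral factorization algorithm exists for recursively computing the coefficients of $g^*(D^{-*})$ [16], [17].

Accordingly, the variances of $V$, $N$, and $S$ are given as

$$
\begin{aligned}
\sigma_V^2 &= \frac{P_X N_0}{P_0-N_0} \\
\sigma_N^2 &= \frac{P_X P_0 N_0}{2\pi(P_0-N_0)^2}\int_{-\pi}^{\pi}\frac{R_{hh}(e^{-j\theta})}{R_{hh}(e^{-j\theta})+N_0/P_X}\,d\theta \\
\sigma_S^2 &= \sigma_V^2-\sigma_N^2.
\end{aligned}
$$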

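We can obtain $|\rho|_{\max}$ by the absolute summation of the inverse $D$-transform of $d(D)$ if the feedforward filter of the MMSE-DFE is stable, i.e., $\sum_{k=1}^{\infty}|d_{-k}| < \infty$. Let us first consider the case where $d(D)$ has multiple first-order poles, $p_i$ for $i = 1, 2, \ldots, P$. Then, $d_{-k}$ can be obtained by the partial fraction method since $d(D)$ is a rational function. In other words, the inverse $D$-transform of individual fraction terms can be found and then added together to form $d_{-k}$. Denoting $d_{-k} = \frac{N_0}{P_0-N_0}a_{-k}$, the sequence $a_{-k}$ is given as $a_{-k} = \sum_{i=1}^{P} c_i p_i^k$. Therefore,

$$
\begin{aligned}
|\rho|_{\max} &= \sum_{k=1}^{\infty}|d_{-k}| \\
&= \frac{N_0}{P_0-N_0}\left(\sum_{k=1}^{\infty}|a_{-k}|\right)
= \frac{N_0}{P_0-N_0}\left(\sum_{k=1}^{\infty}\left|\sum_{i=1}^{P}c_i p_i^k\right|\right) \\
&\le \frac{N_0}{P_0-N_0}\left(\sum_{i=1}^{P}\sum_{k=1}^{\infty}\left|c_i p_i^k\right|\right)
= \frac{N_0}{P_0-N_0}\left(\sum_{i=1}^{P}\frac{|c_i p_i|}{1-|p_i|}\right).
\end{aligned}
\tag{21}
$$

The upper bound of $|\rho|_{\max}$ can also be tightened by identifying the first $K$ dominant taps:

$$
\begin{aligned}
|\rho|_{\max} &= \frac{N_0}{P_0-N_0}\left(\sum_{k=1}^{\infty}\left|\sum_{i=1}^{P}c_i p_i^k\right|\right) \\
&= \frac{N_0}{P_0-N_0}\left(\sum_{k=1}^{K}\left|\sum_{i=1}^{P}c_i p_i^k\right|+\sum_{k=K+1}^{\infty}\left|\sum_{i=1}^{P}c_i p_i^k\right|\right) \\
&\le \frac{N_0}{P_0-N_0}\left(\sum_{k=1}^{K}\left|\sum_{i=1}^{P}c_i p_i^k\right|+\sum_{i=1}^{P}\sum_{k=K+1}^{\infty}\left|c_i p_i^k\right|\right) \\
&= \frac{N_0}{P_0-N_0}\left(\sum_{k=1}^{K}\left|\sum_{i=1}^{P}c_i p_i^k\right|+\sum_{i=1}^{P}\frac{\left|c_i p_i^{K+1}\right|}{1-|p_i|}\right).
\end{aligned}
\tag{22}
$$

For the case of multiple-order poles, the upper bound of $|\rho|_{\max}$ can also be obtained in a similar way using the triangle inequality $|a+b| \le |a|+|b|$.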
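
The geometric-series bounds on $|\rho|_{\max}$ are easy to check numerically. The sketch below compares a long truncated direct sum of $\left|\sum_i c_i p_i^k\right|$ with the loose bound (all terms handled by the triangle inequality) and with the tightened bound (first $K$ terms kept exact, tail bounded). The residues `c`, poles `p`, and the factor `scale` (standing in for $N_0/(P_0-N_0)$) are made-up inputs for illustration, and all poles are assumed to satisfy $|p_i| < 1$.

```python
import numpy as np

def rho_max_bounds(c, p, scale, k_exact, n_terms=2000):
    """Return (direct, tight, loose) estimates of scale * sum_k |sum_i c_i p_i^k|."""
    c = np.asarray(c, dtype=complex)
    p = np.asarray(p, dtype=complex)

    def head(n):
        # sum_{k=1}^{n} | sum_i c_i p_i^k |, evaluated term by term
        k = np.arange(1, n + 1)
        taps = (c[None, :] * p[None, :] ** k[:, None]).sum(axis=1)
        return float(np.sum(np.abs(taps)))

    # Loose bound: triangle inequality on every term, then a geometric series per pole.
    loose = scale * float(np.sum(np.abs(c * p) / (1.0 - np.abs(p))))
    # Tight bound: first k_exact taps exact, geometric-series bound on the tail only.
    tail = float(np.sum(np.abs(c) * np.abs(p) ** (k_exact + 1) / (1.0 - np.abs(p))))
    tight = scale * (head(k_exact) + tail)
    # Reference value: long truncated direct sum (the neglected tail is negligible here).
    direct = scale * head(n_terms)
    return direct, tight, loose
```

For example, with `c = [1.0, -0.5]`, `p = [0.5, -0.3]`, and `k_exact = 5`, the three returned values should satisfy direct $\le$ tight $\le$ loose, and setting `k_exact = 0` makes the tight bound coincide with the loose one.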

The SIR or the i.u.d. capacity (bits/channel use) for any finite-ISI channel corrupted by Gaussian noise is given [18] as

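$$
\begin{aligned}
\text{SIR} &\triangleq \lim_{N\to\infty}\frac{1}{2N+1}I\left(\{x_k\}_{-N}^{N};\{r_k\}_{-N}^{N}\right) \\
&\ge \lim_{N\to\infty}\frac{1}{2N+1}I\left(\{x_k\}_{-N}^{N};\{z_k\}_{-N}^{N}\right) && (23) \\
&\ge I\left(x_0;z_0\,\middle|\,\{x_k\}_{-\infty}^{-1}\right) && (24) \\
&= I(X;Y) && (25)
\end{aligned}
$$

where $Y = X + V$ is the unbiased MMSE-DFE output defined above. The inequality in (23) holds due to the data processing theorem (equality holds if the MMSE-DFE feedforward filter is invertible). The inequality of (24) can be obtained by applying the chain rule of mutual information and assuming stationarity [11]. The equality (25) is valid because known post-cursor ISI can simply be subtracted out without affecting capacity.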

IV-B Numerical Results

Now, let us examine the particular ISI channels, , and , which are well-known and previously investigated in [2], [10], [11], and , which was considered in [6]. The first 20 precursor ISI tap values are computed and shown in Fig. 4 for these example channels. In addition, we consider a complex-valued partial response channel: . The channel inputs are binary, except the complex-valued channel for which the inputs are assumed quaternary.