Coherent multiple-antenna block-fading channels at finite blocklength

Austin Collins and Yury Polyanskiy

The authors are with the Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139 USA (e-mail: {austinc,yp}). This material is based upon work supported by the National Science Foundation CAREER award under grant agreement CCF-12-53205 and by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-09-39370. This paper was presented in part at [1, 2].

In this paper we consider a channel model that is often used to describe the mobile wireless scenario: multiple-antenna additive white Gaussian noise channels subject to random (fading) gain with full channel state information at the receiver. Dynamics of the fading process are approximated by a piecewise-constant process (frequency non-selective isotropic block fading). This work addresses the finite blocklength fundamental limits of this channel model. Specifically, we give a formula for the channel dispersion – a quantity governing the delay required to achieve capacity. Multiplicative nature of the fading disturbance leads to a number of interesting technical difficulties that required us to enhance traditional methods for finding channel dispersion. Alas, one difficulty remains: the converse (impossibility) part of our result holds under an extra constraint on the growth of the peak-power with blocklength.

Our results demonstrate, for example, that while capacities of $n_t \times n_r$ and $n_r \times n_t$ antenna configurations coincide (under fixed received power), the coding delay can be quite sensitive to this switch. For example, at the received SNR of dB the system achieves capacity with codes of length (delay) which is only of the length required for the system. Another interesting implication is that for the MISO channel, the dispersion-optimal coding schemes require employing orthogonal designs such as Alamouti’s scheme – a surprising observation considering the fact that Alamouti’s scheme was designed for reducing demodulation errors, not improving coding rate.

I Introduction

Given a noisy communication channel, the maximal cardinality of a codebook of blocklength $n$ which can be decoded with block error probability no greater than $\epsilon$ is denoted by $M^*(n,\epsilon)$. Evaluation of this function – the fundamental performance limit of block coding – is computationally infeasible for most channels of interest. To resolve this difficulty, [3] proposed a closed-form normal approximation, based on the asymptotic expansion:

$$\log M^*(n,\epsilon) = nC - \sqrt{nV}\,Q^{-1}(\epsilon) + o(\sqrt{n}), \qquad (1)$$

where capacity $C$ and dispersion $V$ are two intrinsic characteristics of the channel and $Q^{-1}$ is the inverse of the $Q$-function. [Footnote 1: As usual, $Q(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$.] One immediate consequence of the normal approximation is an estimate for the minimal blocklength (delay) required to achieve a given fraction $\eta$ of channel capacity:

$$n \gtrsim \left(\frac{Q^{-1}(\epsilon)}{1-\eta}\right)^2 \frac{V}{C^2}. \qquad (2)$$

Asymptotic expansions such as (1) are rooted in the central-limit theorem and have been known classically for discrete memoryless channels [4, 5] and later extended in a wide variety of directions; see surveys in [6, 7].
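
As a numerical illustration of the normal approximation and the resulting delay estimate, the following Python sketch evaluates $nC - \sqrt{nV}\,Q^{-1}(\epsilon)$ and $(Q^{-1}(\epsilon)/(1-\eta))^2\, V/C^2$, where $\eta$ denotes the target fraction of capacity. The function names and the sample values of $C$, $V$, $\epsilon$ and $\eta$ are placeholders chosen for illustration only; they are not values computed in this paper.

```python
import math
from statistics import NormalDist


def q_inv(eps):
    """Inverse of the Gaussian tail function Q(x) = P[N(0,1) > x]."""
    return NormalDist().inv_cdf(1.0 - eps)


def normal_approx_log_m(n, capacity, dispersion, eps):
    """Normal approximation n*C - sqrt(n*V)*Q^{-1}(eps); the o(sqrt(n)) term is dropped."""
    return n * capacity - math.sqrt(n * dispersion) * q_inv(eps)


def min_blocklength(capacity, dispersion, eps, eta):
    """Approximate blocklength at which the normal approximation reaches eta * capacity."""
    return (q_inv(eps) / (1.0 - eta)) ** 2 * dispersion / capacity ** 2


if __name__ == "__main__":
    # Placeholder channel parameters (illustrative, not taken from the paper).
    C, V, eps, eta = 1.0, 1.0, 1e-3, 0.9
    n = min_blocklength(C, V, eps, eta)
    # By construction, the approximate rate at this n equals eta * C.
    print(n, normal_approx_log_m(n, C, V, eps) / (n * C))
```

The estimate scales linearly in $V/C^2$, which is why two channels with equal capacity can still differ substantially in the delay needed to approach it.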

The fading channel is the centerpiece of the theory and practice of wireless communication, and hence there are many slightly different variations of the model: differing assumptions on the dynamics and distribution of the fading process, antenna configurations, and channel state knowledge. The capacity of the fading channel was found independently by Telatar [8] and Foschini and Gans [9] for the case of Rayleigh fading and channel state information available at the receiver only (CSIR) and at both the transmitter and receiver (CSIRT). Motivated by the linear gains promised by capacity results, space-time codes were introduced to exploit multiple antennas; most notable amongst them is Alamouti’s ingenious orthogonal scheme [10], along with its generalization by Tarokh, Jafarkhani and Calderbank [11]. Motivated by a recent surge of orthogonal frequency division multiplexing (OFDM) technology, this paper focuses on an isotropic channel gain distribution, which is piecewise independent (“block-fading”), and assumes full channel state information available at the receiver (CSIR). This work describes finite blocklength effects incurred by the fading on the fundamental communication limits.

Some of the prior work on similar questions is as follows. Single antenna channel dispersion was computed in [12] for a more general stationary channel gain process with memory. In [13] finite-blocklength effects are explored for the non-coherent block fading setup. Quasi-static fading channels in the general MIMO setting have been thoroughly investigated in [14], showing that the expansion (1) changes dramatically (in particular the channel dispersion term becomes zero); see also [15] for evaluation of the bounds. The coherent quasi-static channel has been studied in the limit of infinitely many antennas in [16], appealing to concentration properties of random matrices. Dispersion for lattices (infinite constellations) in fading channels has been investigated in a sequence of works, see [17] and references therein. Note also that there are some very fine differences between stationary and block-fading channel models, cf. [18, Section 4]. The minimum energy to send bits over a MIMO channel for both the coherent and non-coherent case was studied in [19], showing the latter requires orders of magnitude larger latencies. The problem of power control with an average power constraint on the codebook in the quasi-static fading channel with perfect CSIRT is investigated in [20]. A novel achievability bound was found and evaluated for the fading channel with CSIR in [21]. Parts of this work have previously appeared in [1, 2].

The paper is organized as follows. In Section II we describe the channel model and state all our main results formally. Section III characterizes capacity achieving input/output distributions (caid/caod, resp.) and evaluates moments of the information density. Then in Sections IV and V we prove the achievability and converse parts of our (non rank-1) results, respectively. Section VI focuses on the special case when the matrix of channel gains has rank 1. Finally, Section VII contains a discussion of numerical results and the behavior of channel dispersion with the number of antennas.

The numerical software used to compute the achievability bounds, dispersion and normal approximation in this work can be found online under the Spectre project [22].

II Main Results

The channel model considered in this paper is the frequency-nonselective coherent real block fading discrete-time channel with multiple transmit and receive antennas (MIMO); see [23, Section II] for background on this model. We will simply refer to it as the MIMO-BF channel, which we formally define as follows. Let $n_t \ge 1$ be the number of transmit antennas, $n_r \ge 1$ be the number of receive antennas, and $T \ge 1$ be the coherence time of the channel. The input-output relation at block $j$ (spanning time instants $(j-1)T+1$ to $jT$) with $j = 1, \ldots, n$ is given by

$$Y_j = H_j X_j + Z_j, \qquad (3)$$

where $\{H_j\}$ is an $n_r \times n_t$ matrix-valued random fading process, $X_j$ is an $n_t \times T$ matrix channel input, $Z_j$ is an $n_r \times T$ Gaussian random real-valued matrix with independent entries of zero mean and unit variance, and $Y_j$ is the $n_r \times T$ matrix-valued channel output. The process $H_j$ is assumed to be i.i.d. with isotropic distribution $P_H$, i.e. for any orthogonal matrices $U \in \mathbb{R}^{n_r \times n_r}$ and $V \in \mathbb{R}^{n_t \times n_t}$, both $UH$ and $HV$ are equal in distribution to $H$. We also assume that

$$\mathbb{P}[H \neq 0] > 0 \qquad (4)$$

to avoid trivialities. Note that due to merging channel inputs at time instants $1, \ldots, T$ into one matrix-input, the block-fading channel becomes memoryless. We assume coherent demodulation so that the channel state information (CSI) is fully known to the receiver (CSIR).

An code of blocklength , probability of error and power-constraint is a pair of maps: the encoder and the decoder satisfying the probability of error constraint


on the probability space

where the message is uniformly distributed on , , is as described in (3), and . In addition the input sequences are required to satisfy the power constraint:

where is the Frobenius norm of the matrix .
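
To make the model concrete, here is a minimal simulation sketch of one coherence block and of the power check on a codeword. It assumes the block relation reads $Y = HX + Z$ with $H$ of size $n_r \times n_t$ and $X$ of size $n_t \times T$, and that the power constraint bounds the total squared Frobenius norm of the $n$ input blocks by $nTP$; both readings, as well as all function names, are assumptions of this sketch. The i.i.d. Gaussian fading matrix is just one example of an isotropic distribution.

```python
import random


def gaussian_matrix(rows, cols, std=1.0, rng=random):
    """Matrix (list of rows) with i.i.d. N(0, std^2) entries."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def channel_block(h, x, rng=random):
    """One coherence block: Y = H X + Z, with Z having i.i.d. N(0, 1) entries."""
    return [[v + rng.gauss(0.0, 1.0) for v in row] for row in matmul(h, x)]


def frobenius_sq(m):
    """Squared Frobenius norm."""
    return sum(v * v for row in m for v in row)


def meets_power_constraint(blocks, T, P):
    """Assumed form of the constraint: sum_j ||X_j||_F^2 <= n * T * P."""
    return sum(frobenius_sq(x) for x in blocks) <= len(blocks) * T * P


if __name__ == "__main__":
    rng = random.Random(0)
    n_t, n_r, T, P = 2, 3, 4, 1.0
    h = gaussian_matrix(n_r, n_t, rng=rng)                   # example isotropic fading
    x = gaussian_matrix(n_t, T, std=(P / n_t) ** 0.5, rng=rng)
    y = channel_block(h, x, rng=rng)                         # n_r x T output
    print(len(y), len(y[0]), meets_power_constraint([x], T, P))
```

Note that a single Gaussian input block satisfies the constraint only with some probability; codebooks for this channel must meet it for every codeword.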

Under the isotropy assumption on , the capacity appearing in (1) of this channel is given by [8]


where is the capacity of the additive white Gaussian noise (AWGN) channel with SNR , is the minimum of the transmit and receive antennas, and are eigenvalues of . Note that it is common to think that as the expression (7) scales as , but this is only true if .
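
The capacity expression can be evaluated by Monte Carlo once samples of the eigenvalues of $HH^T$ are available. The sketch below assumes the formula has Telatar's form, i.e. the expectation of $\sum_i C_{AWGN}(P\Lambda_i^2/n_t)$ with $C_{AWGN}(s) = \frac{1}{2}\log(1+s)$ for the real channel; the helper names and the MISO Gaussian example are illustrative choices, not part of the paper.

```python
import math
import random


def c_awgn(snr):
    """Real AWGN capacity (1/2) log(1 + snr), in nats per channel use."""
    return 0.5 * math.log1p(snr)


def capacity_estimate(P, n_t, eig_samples):
    """Average of sum_i C_AWGN(P * lam_i / n_t) over eigenvalue samples.

    eig_samples: list of lists, each holding the eigenvalues of H H^T
    for one fading realization.
    """
    total = 0.0
    for eigs in eig_samples:
        total += sum(c_awgn(P / n_t * lam) for lam in eigs)
    return total / len(eig_samples)


def miso_gaussian_eigs(n_t, num_samples, rng):
    """For a 1 x n_t row H with i.i.d. N(0,1) entries, H H^T = ||H||^2 (one eigenvalue)."""
    return [[sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_t))]
            for _ in range(num_samples)]


if __name__ == "__main__":
    rng = random.Random(0)
    P, n_t = 10.0, 2
    est = capacity_estimate(P, n_t, miso_gaussian_eigs(n_t, 2000, rng))
    # By Jensen's inequality the fading capacity stays below C_AWGN(P) here.
    print(est, c_awgn(P))
```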

The goal of this line of work is to characterize dispersion of the present channel. Since the channel is memoryless it is natural to expect, given the results in [3, 12], that dispersion (for ) is given by


where we denoted (single -block) information density by


and is the capacity achieving output distribution (caod). Justification of (8) as the actual (operational) dispersion, appearing in the expansion of is by no means trivial and is the subject of this work.

Here we formally state the main results, then go into more detail in the following sections. Our first result is an achievability and partial converse bound for the MIMO-BF channel for fixed parameters $n_t$, $n_r$, $T$.

Theorem 1.

For the MIMO-BF channel, there exists an maximal probability of error code with satisfying


Furthermore, for any there exists so that every code with extra constraint that , must satisfy


where capacity is given by (6) and dispersion by (8). [Footnote 2: For the explicit expression for see (III-C) below.]


Proof. This follows from Theorem 16 and Theorem 19 below. ∎

The dispersion (8) is expressed as minimization over input distributions on that achieve capacity of the channel. As will be shown soon, there may be many different caids, but in typical MIMO scenarios, it turns out they all achieve the same dispersion, which can be in turn calculated for the simplest Telatar caid (i.i.d. Gaussian matrix ). The following theorem gives full details.

Theorem 2.

Assume that , then , where


where are eigenvalues of , , and


Proof. This is proved in Proposition 11 below. ∎

Remark 1.

Each of the three terms in (12) is non-negative, see Remark 4 below.

In the rank-1 case (e.g. for MISO systems), there is a multitude of caids, and the minimization problem in (8) is non-trivial. Quite surprisingly, for some values of , we show that the (essentially unique) minimizer is a full-rate orthogonal design. The latter were introduced into communication by Alamouti [10] and Tarokh et al [11]. This shows a somewhat unexpected connection between schemes optimal from modulation-theoretic and information-theoretic points of view. The precise results are as follows.

Theorem 3.

When , we have


where is the non-zero eigenvalue of , and


Proof. This is the content of Proposition 12 below. ∎

Unfortunately, the quantity is generally unknown since the minimization in (18) is over the manifold of matrix-valued random variables. However, for many dimensions, the minimum can be found by invoking the Radon-Hurwitz theorem. We state this below to introduce the notation, and expand on it in Section VI.

Theorem 4 (Radon-Hurwitz).

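There exists a family of $n \times n$ real matrices $V_1, \ldots, V_k$ satisfying

$$V_i^T V_i = I_n, \qquad V_i^T V_j + V_j^T V_i = 0 \quad (i \neq j),$$

if and only if $k \le \rho(n)$, where

$$\rho(2^a b) = 8\left\lfloor \frac{a}{4} \right\rfloor + 2^{a \bmod 4}, \qquad a \ge 0 \text{ an integer},\ b \text{ odd}.$$

In particular, $\rho(n) \le n$ and $\rho(n) = n$ only for $n = 1, 2, 4, 8$.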
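
The Radon-Hurwitz number is easy to compute, and the defining relations of the matrix family are easy to verify numerically. The sketch below assumes the standard form $\rho(2^a b) = 8\lfloor a/4 \rfloor + 2^{a \bmod 4}$ for odd $b$ and the relations $V_i^T V_i = I$, $V_i^T V_j + V_j^T V_i = 0$ for $i \neq j$; the function names and the $2 \times 2$ example family are illustrative.

```python
def radon_hurwitz(n):
    """Radon-Hurwitz number: for n = 2^a * b with b odd, 8*(a // 4) + 2**(a % 4)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    a = 0
    while n % 2 == 0:
        n //= 2
        a += 1
    return 8 * (a // 4) + 2 ** (a % 4)


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_hurwitz_family(mats, tol=1e-12):
    """Check V_i^T V_i = I and V_i^T V_j + V_j^T V_i = 0 for all i != j."""
    size = len(mats[0])
    for i, vi in enumerate(mats):
        for j, vj in enumerate(mats):
            s1 = matmul(transpose(vi), vj)
            s2 = matmul(transpose(vj), vi)
            for r in range(size):
                for c in range(size):
                    # For i == j the sum is 2 * V_i^T V_i, which should equal 2 * I.
                    target = 2.0 if (i == j and r == c) else 0.0
                    if abs(s1[r][c] + s2[r][c] - target) > tol:
                        return False
    return True


if __name__ == "__main__":
    print([radon_hurwitz(n) for n in (1, 2, 3, 4, 8, 16)])  # [1, 2, 1, 4, 8, 9]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    rotation = [[0.0, 1.0], [-1.0, 0.0]]
    print(is_hurwitz_family([identity, rotation]))           # True
```

The $2 \times 2$ family above is the one underlying the real version of Alamouti's design.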

The following theorem summarizes our current knowledge of .

Theorem 5.

For any pair of positive integers we have


If or then a full-rate orthogonal design is dispersion-optimal and


If instead and then for a jointly-Gaussian capacity-achieving input we have [Footnote 3: So that in these cases the bound (22) is either non-tight, or is achieved by a non-jointly-Gaussian caid.]


Finally, if and (23) holds, then for any (and similarly with the roles of and switched).

Note that $\rho(n)$ is increasing along powers of two (and equals 1 for odd $n$), so that for any $n_t$ there is a large enough $T$ such that $n_t \le \rho(T)$, in which case an $n_t \times T$ full-rate orthogonal design achieves the minimum in (18).
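
For a given number of transmit antennas, the smallest coherence time that admits a full-rate orthogonal design can be found by direct search. The sketch below assumes the existence condition is $n_t \le \rho(T)$ with $\rho$ the Radon-Hurwitz number; the function names and the search cap `t_max` are hypothetical conveniences of this example.

```python
def radon_hurwitz(n):
    """Radon-Hurwitz number: for n = 2^a * b with b odd, 8*(a // 4) + 2**(a % 4)."""
    a = 0
    while n % 2 == 0:
        n //= 2
        a += 1
    return 8 * (a // 4) + 2 ** (a % 4)


def smallest_coherence_time(n_t, t_max=1 << 20):
    """Smallest T <= t_max with n_t <= rho(T), found by brute force."""
    for t in range(1, t_max + 1):
        if radon_hurwitz(t) >= n_t:
            return t
    raise ValueError("no admissible T found below t_max")


if __name__ == "__main__":
    for n_t in (1, 2, 3, 4, 5, 8, 9):
        print(n_t, smallest_coherence_time(n_t))
```

Since $\rho$ depends only on the power of two dividing its argument, the minimizing $T$ is always a power of two, and it grows quickly with $n_t$ beyond $n_t = 8$.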

III Preliminary results

III-A Known results: capacity and capacity achieving output distribution

We review known results on the MIMO-BF channel. Since the channel is memoryless, the capacity is given by


It was shown by Telatar [8] that whenever distribution of is isotropic, the input given by


is a maximizer, resulting in the capacity formula (6). The distribution induced by a caid at the channel output is called capacity achieving output distribution (caod). A classical fact is that while there can be many caids, the caod is unique, e.g. [24, Section 4.4]. Thus, from [8] we infer that the caod is given by


where is decomposed into columns .
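
As a brief worked sketch of why the i.i.d. Gaussian input leads to a product-Gaussian output law and to the capacity formula, consider one column of the block at a time. The notation $x_j, y_j, z_j$ for the $j$-th columns of $X, Y, Z$ and $\Lambda_i^2$ for the eigenvalues of $hh^T$ is assumed here for illustration.

```latex
% Column j, conditioned on H = h, with x_j ~ N(0, (P/n_t) I_{n_t}) independent of z_j ~ N(0, I_{n_r}):
y_j = h x_j + z_j
\;\Longrightarrow\;
y_j \mid \{H = h\} \sim \mathcal{N}\!\Big(0,\; I_{n_r} + \tfrac{P}{n_t}\, h h^{T}\Big),
\qquad j = 1, \dots, T \ \ \text{(independent across } j\text{)}.

% Mutual information of one block given H = h (difference of Gaussian entropies):
I(X; Y \mid H = h)
  = \tfrac{T}{2} \log\det\!\Big(I_{n_r} + \tfrac{P}{n_t}\, h h^{T}\Big)
  = T \sum_{i=1}^{n_{\min}} \tfrac{1}{2} \log\!\Big(1 + \tfrac{P}{n_t}\, \Lambda_i^{2}\Big).
```

Dividing by $T$ and averaging over the fading gives a per-channel-use expression of the same form as the capacity formula.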

III-B Capacity achieving input distributions

When is of low rank there are many possible caids, and somewhat surprisingly for the case of rank-1 (e.g. for MISO) there are even jointly Gaussian caids. The next proposition characterizes all caids. Later, this characterization will be used to show that dispersion-minimizing caid in Theorem 1 is given by orthogonal designs (such as Alamouti’s coding), for dimensions when those exist.

Theorem 6.
  1. Every caid satisfies


    If then condition (30) is also sufficient for to be caid.

  2. Let be decomposed into rows . If is caid, then each (i.i.d. Gaussian) and


    If is jointly zero-mean Gaussian and , then (31)-(32) are sufficient for to be caid.

  3. Let be decomposed into columns . If is caid, then each (i.i.d. Gaussian) and


    If is jointly zero-mean Gaussian and , then (33)-(34) are sufficient for to be caid.

  4. When , any caid has pairwise independent rows:


    and in particular


    Therefore, among jointly Gaussian the i.i.d. is the unique caid.

  5. There exist non-Gaussian caids if and only if .

Remark 2.

(Special case of rank-1 ) In the case when and (or more generally, a.s.), there is not only a multitude of caids, but in fact those can have non-trivial correlations between different entries of (and this is ruled out by (36) for all other cases). As an example, for case, the set of jointly Gaussian caids is given by


where iid. In particular, there are caids for which not all entries of are pairwise independent.

Remark 3.

Another way to state conditions (31)-(32) is: all elements in a row (resp. column) are pairwise independent and each minor has antipodal correlation for the two diagonals. In particular, if is caid, then and any submatrix of are caids too (for different and ).


We will rely repeatedly on the following observations:

  1. if are two random vectors in then and for any we have


    This is easy to show by computing characteristic functions.

  2. If are two random vectors in independent of , then


    This follows from the fact that characteristic function of is nowhere zero.

  3. For two matrices we have


    This follows from the fact that quadratic form that is zero everywhere on must have all coefficients equal to zero.

Part 1 (necessity). As we discussed the caod is unique and given by (27). Thus is caid iff for -almost every we have


where is an matrix with iid entries (for sufficiency, just write with denoting differential entropy). We will argue next that this implies (under isotropy assumption on ) that


From (38), (42) is equivalent to .

Let be an almost sure set of those for which (41) holds. Let denote the group of orthogonal matrices, with the topology inherited from . Let and for be countable dense subsets of and , respectively. (These exist since is a second-countable topological space). By isotropy of we have and therefore


is also almost sure: . By assumption (4) must contain a non-zero element and also for all . From (39) we conclude that we have

Arguing by continuity and using density of and , this implies also


In particular, for any there must exist a choice of such that has the top row equal to . Choosing these in (44) and comparing distributions of top rows, we conclude (42) (after scaling by ).

Part 1 (sufficiency). Suppose . Then our goal is to show that (42) implies that is caid. To that end, it is sufficient to show that for all rank-1 . In the special case

claim follows directly from (42). Every other rank-1 can be decomposed as for some matrix , and thus again we get , concluding the proof.

Parts 2 and 3 (necessity). From part 1 we have that for every we must have . Computing expected square we get


Thus, expressing the left-hand side in terms of rows as we get

and thus by (40) we conclude that for all :

Each entry of the matrices above is a quadratic form in and thus again by (40) we conclude (31)-(32). Part 3 is argued similarly with roles of and interchanged.

Parts 2 and 3 (sufficiency). When is (at most) rank-1, we have from part 1 that it is sufficient to show that . When is jointly zero-mean Gaussian, we have is zero-mean Gaussian and so we only need to check its second moment satisfies (45). But as we just argued (45) is equivalent to either (31)-(32) or (33)-(34).
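
The second-moment check used in this argument can be carried out explicitly for an orthogonal design. The sketch below assumes that the $i$-th row of $X$ is $\xi^T V_i$, where $\xi$ has i.i.d. $N(0, \sigma^2)$ entries with $\sigma^2 = P/n_t$ and the $V_i$ satisfy the Radon-Hurwitz relations, and that the required second moment of $a^T X$ is $\sigma^2 \|a\|^2 I_T$; the construction, the specific $4 \times 4$ matrices and the function names are illustrative assumptions. In that case $a^T X = \xi^T M$ with $M = \sum_i a_i V_i$, so the covariance is $\sigma^2 M^T M$.

```python
def combine(mats, a):
    """Return M = sum_i a[i] * V_i for square matrices given as lists of rows."""
    size = len(mats[0])
    return [[sum(c * v[r][s] for c, v in zip(a, mats)) for s in range(size)]
            for r in range(size)]


def gram(m):
    """Return M^T M."""
    size = len(m)
    return [[sum(m[k][r] * m[k][s] for k in range(size)) for s in range(size)]
            for r in range(size)]


def projection_covariance(mats, a, sigma2):
    """Covariance of a^T X when row i of X is xi^T V_i and xi ~ N(0, sigma2 * I)."""
    return [[sigma2 * v for v in row] for row in gram(combine(mats, a))]


# A 4 x 4 family (identity plus three anticommuting skew-symmetric matrices).
V1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
V2 = [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]]
V3 = [[0, 0, 1, 0], [0, 0, 0, 1], [-1, 0, 0, 0], [0, -1, 0, 0]]
V4 = [[0, 0, 0, 1], [0, 0, -1, 0], [0, 1, 0, 0], [-1, 0, 0, 0]]

if __name__ == "__main__":
    # n_t = 3 rows, T = 4: covariance should be sigma2 * ||a||^2 * I_4.
    cov = projection_covariance([V1, V2, V3], [1.0, 2.0, 2.0], sigma2=0.5)
    for row in cov:
        print(row)
```

Because the cross terms $V_i^T V_j + V_j^T V_i$ cancel, the covariance depends on $a$ only through $\|a\|^2$, which is exactly what a rank-1 channel can observe.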

Part 4. As in Part 1, there must exist such that (44) holds and . Thus, by choosing we can diagonalize and thus we conclude any pair of rows must be independent.

Part 5. This part is never used in subsequent parts of the paper, so we only sketch the argument and move the most technical part of the proof to Appendix A. Let . Then arguing as for (44) we conclude that is caid if and only if for any with we have

In other words, we have


If , then rank condition on is not active and hence, we conclude by (38) that . So assume . Note that (46) is equivalent to the condition on characteristic function of as follows:


It is easy to find a polynomial (in ) that vanishes on all matrices of rank (e.g. take the product of all minors). Then Proposition 24 in Appendix A constructs non-Gaussian satisfying (47) and hence (46). ∎

III-C Information density and its moments

It will be convenient to assume that the matrix $H$ is represented as

$$H = U \Lambda V^T, \qquad (48)$$

where $U, V$ are uniformly distributed on $O(n_r)$ and $O(n_t)$, respectively, [Footnote 4: Recall that $O(m)$ is the space of all $m \times m$ orthogonal matrices. This space is compact in a natural topology and admits a Haar probability measure.] and $\Lambda$ is the $n_r \times n_t$ diagonal matrix with diagonal entries $\Lambda_1, \ldots, \Lambda_{n_{\min}}$. The joint distribution of the $\Lambda_i$ depends on the fading model. It does not matter for our analysis whether the $\Lambda_i$’s are sorted in some way, or permutation-invariant.
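
One way to realize such a representation numerically is sketched below. It assumes the decomposition $H = U \Lambda V^T$ with Haar-distributed orthogonal factors, and samples those factors by Gram-Schmidt orthonormalization of Gaussian vectors (a standard way to obtain the Haar measure); the function names and the fixed singular values in the example are illustrative assumptions.

```python
import math
import random


def haar_orthogonal(n, rng):
    """Sample an n x n Haar-distributed orthogonal matrix (rows orthonormal)
    by Gram-Schmidt orthonormalization of i.i.d. Gaussian vectors."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in rows:
            dot = sum(a * b for a, b in zip(u, v))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # resample in the (probability-zero) degenerate case
            rows.append([a / norm for a in v])
    return rows


def compose_fading(u, lambdas, v):
    """Return H = U * diag(lambdas) * V^T as an n_r x n_t list of rows."""
    n_r, n_t = len(u), len(v)
    return [[sum(u[i][k] * lambdas[k] * v[j][k] for k in range(len(lambdas)))
             for j in range(n_t)] for i in range(n_r)]


if __name__ == "__main__":
    rng = random.Random(1)
    n_r, n_t = 3, 2
    u, v = haar_orthogonal(n_r, rng), haar_orthogonal(n_t, rng)
    h = compose_fading(u, [2.0, 1.0], v)  # singular values fixed for illustration
    # The squared Frobenius norm equals the sum of squared singular values.
    print(sum(x * x for row in h for x in row))
```

In a fading model the diagonal entries would themselves be random; the orthogonal factors only spread them isotropically.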

For the MIMO-BF channel, let denote the caod (27). Then the information density with respect to (for a single -block of symbols) as defined in (9) becomes


where we assumed as in (48), so that is the -th column of , and have set , with representing the -th row of . The definition naturally extends to blocks of length additively:


We compute the (conditional) mean of information density to get


where we used the following simple fact:

Lemma 7.

Let be uniformly distributed on the unit sphere, and be a fixed matrix, then


Note that by additivity of across columns, it is sufficient to consider the case , for which the statement is clear from symmetry. ∎
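
The symmetry argument can be checked numerically. The sketch below assumes the lemma states that, for $U$ uniform on the unit sphere in $\mathbb{R}^n$ and a fixed $n \times T$ matrix $x$, the mean of $\|x^T U\|^2$ equals $\|x\|_F^2 / n$; this follows from $E[UU^T] = I_n/n$. The exact statement, the function names and the test matrix are assumptions of this example.

```python
import math
import random


def uniform_on_sphere(n, rng):
    """Uniform point on the unit sphere in R^n (normalized Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def mean_projected_energy(x, num_samples, rng):
    """Monte Carlo estimate of E||x^T U||^2 for an n x T matrix x (list of rows)."""
    n, T = len(x), len(x[0])
    total = 0.0
    for _ in range(num_samples):
        u = uniform_on_sphere(n, rng)
        total += sum(sum(u[i] * x[i][t] for i in range(n)) ** 2 for t in range(T))
    return total / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    x = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
    frob_sq = sum(v * v for row in x for v in row)
    # The two printed numbers should agree up to Monte Carlo error.
    print(mean_projected_energy(x, 20000, rng), frob_sq / len(x))
```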

Proposition 8.

Let , then we have


where the function defined as is given by


where was defined in (13) and

Remark 4.

Every term in the definition of (except the one with ) is non-negative (for -term, see (88)). The -term will not be important because for inputs satisfying power-constraint with equality it vanishes. Note also that the first term in the definition of in (63) can alternatively be given as


From (III-C), we have the form of the information density. First note that the information density over channel uses decomposes into a sum of independent terms,


As such, the variance conditioned on also decomposes as


from which (54) follows. Because the variance decomposes as a sum in (65), we focus on only computing for a single coherent block. Define


so that in notation from (III-C). With this, the quantity of interest is