Coherent multiple-antenna block-fading channels at finite blocklength
Abstract
In this paper we consider a channel model that is often used to describe the mobile wireless scenario: multiple-antenna additive white Gaussian noise channels subject to random (fading) gain with full channel state information at the receiver. The dynamics of the fading process are approximated by a piecewise-constant process (frequency non-selective isotropic block fading). This work addresses the finite blocklength fundamental limits of this channel model. Specifically, we give a formula for the channel dispersion, a quantity governing the delay required to achieve capacity. The multiplicative nature of the fading disturbance leads to a number of interesting technical difficulties that required us to enhance traditional methods for finding channel dispersion. Alas, one difficulty remains: the converse (impossibility) part of our result holds only under an extra constraint on the growth of the peak power with blocklength.
Our results demonstrate, for example, that while capacities of and antenna configurations coincide (under fixed received power), the coding delay can be quite sensitive to this switch. For example, at the received SNR of dB the system achieves capacity with codes of length (delay) which is only of the length required for the system. Another interesting implication is that for the MISO channel, the dispersion-optimal coding schemes require employing orthogonal designs such as Alamouti’s scheme, a surprising observation considering that Alamouti’s scheme was designed for reducing demodulation errors, not improving coding rate.
I Introduction
Given a noisy communication channel, the maximal cardinality of a codebook of blocklength $n$ which can be decoded with block error probability no greater than $\epsilon$ is denoted as $M^*(n,\epsilon)$. Evaluation of this function, the fundamental performance limit of block coding, is alas computationally impossible for most channels of interest. To resolve this difficulty, [3] proposed a closed-form normal approximation, based on the asymptotic expansion:
(1) $\log M^*(n,\epsilon) = nC - \sqrt{nV}\,Q^{-1}(\epsilon) + O(\log n)$
where capacity $C$ and dispersion $V$ are two intrinsic characteristics of the channel and $Q^{-1}$ is the inverse of the $Q$-function (as usual, $Q(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$). One immediate consequence of the normal approximation is an estimate for the minimal blocklength (delay) required to achieve a given fraction $\eta$ of channel capacity:
(2) $n \gtrsim \left(\frac{Q^{-1}(\epsilon)}{1-\eta}\right)^2 \frac{V}{C^2}$
Asymptotic expansions such as (1) are rooted in the central limit theorem and have been known classically for discrete memoryless channels [4, 5] and later extended in a wide variety of directions; see surveys in [6, 7].
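To make the normal approximation concrete, the following minimal sketch evaluates the standard two-term approximation $nC - \sqrt{nV}\,Q^{-1}(\epsilon)$ (remainder term dropped) and the resulting blocklength estimate $(Q^{-1}(\epsilon)/(1-\eta))^2\,V/C^2$. The function names and the numerical values of $C$ and $V$ are illustrative assumptions, not quantities computed in this paper.

```python
from math import sqrt
from statistics import NormalDist


def q_inv(eps: float) -> float:
    """Inverse of the Gaussian tail function Q(x) = P[N(0,1) > x]."""
    return NormalDist().inv_cdf(1.0 - eps)


def normal_approx_log_m(n: float, capacity: float, dispersion: float, eps: float) -> float:
    """Two-term normal approximation n*C - sqrt(n*V)*Q^{-1}(eps); remainder dropped."""
    return n * capacity - sqrt(n * dispersion) * q_inv(eps)


def min_blocklength(eta: float, capacity: float, dispersion: float, eps: float) -> float:
    """Blocklength at which the approximation above reaches a fraction eta of capacity."""
    return (q_inv(eps) / (1.0 - eta)) ** 2 * dispersion / capacity ** 2


if __name__ == "__main__":
    # Hypothetical channel parameters (nats per channel use), for illustration only.
    C, V, eps = 1.0, 1.0, 1e-3
    n = min_blocklength(0.9, C, V, eps)
    print(f"estimated blocklength for 90% of capacity: {n:.1f}")
    print(f"approximate rate at that blocklength: {normal_approx_log_m(n, C, V, eps) / n:.4f}")
```

The estimate is only as good as the two-term approximation itself; for short blocklengths the dropped remainder term can matter.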
The fading channel is the centerpiece of the theory and practice of wireless communication, and hence there are many slightly different variations of the model: differing assumptions on the dynamics and distribution of the fading process, antenna configurations, and channel state knowledge. The capacity of the fading channel was found independently by Telatar [8] and Foschini and Gans [9] for the case of Rayleigh fading and channel state information available at the receiver only (CSIR) and at both the transmitter and receiver (CSIRT). Motivated by the linear gains promised by capacity results, space-time codes were introduced to exploit multiple antennas; most notable among them is Alamouti’s ingenious orthogonal scheme [10], along with a generalization by Tarokh, Jafarkhani and Calderbank [11]. Motivated by a recent surge of orthogonal frequency division multiplexing (OFDM) technology, this paper focuses on an isotropic channel gain distribution that is piecewise independent (“block-fading”) and assumes full channel state information available at the receiver (CSIR). This work describes finite blocklength effects incurred by the fading on the fundamental communication limits.
Some of the prior work on similar questions is as follows. Single-antenna channel dispersion was computed in [12] for a more general stationary channel gain process with memory. In [13] finite-blocklength effects are explored for the non-coherent block-fading setup. Quasi-static fading channels in the general MIMO setting have been thoroughly investigated in [14], showing that the expansion (1) changes dramatically (in particular the channel dispersion term becomes zero); see also [15] for evaluation of the bounds. The coherent quasi-static channel has been studied in the limit of infinitely many antennas in [16], appealing to concentration properties of random matrices. Dispersion for lattices (infinite constellations) in fading channels has been investigated in a sequence of works, see [17] and references therein. Note also that there are some very fine differences between stationary and block-fading channel models, cf. [18, Section 4]. The minimum energy to send bits over a MIMO channel for both the coherent and non-coherent case was studied in [19], showing the latter requires orders of magnitude larger latencies. The problem of power control with an average power constraint on the codebook in the quasi-static fading channel with perfect CSIRT is investigated in [20]. A novel achievability bound was found and evaluated for the fading channel with CSIR in [21]. Parts of this work have previously appeared in [1, 2].
The paper is organized as follows. In Section II we describe the channel model and state all our main results formally. Section III characterizes capacity-achieving input/output distributions (caid/caod, resp.) and evaluates moments of the information density. Then in Sections IV and V we prove the achievability and converse parts of our (non-rank-1) results, respectively. Section VI focuses on the special case when the matrix of channel gains has rank 1. Finally, Section VII contains a discussion of numerical results and the behavior of channel dispersion with the number of antennas.
The numerical software used to compute the achievability bounds, dispersion and normal approximation in this work can be found online under the Spectre project [22].
II Main Results
The channel model considered in this paper is the frequency-nonselective coherent real block-fading discrete-time channel with multiple transmit and receive antennas (MIMO) (see [23, Section II] for background on this model). We will simply refer to it as the MIMO-BF channel, which we formally define as follows. Let $n_t$ be the number of transmit antennas, $n_r$ be the number of receive antennas, and $T$ be the coherence time of the channel. The input-output relation at block $j$ (spanning time instants $(j-1)T+1$ to $jT$) with $j = 1, \ldots, n$ is given by
(3) $Y_j = H_j X_j + Z_j$
where $\{H_j\}$ is an $n_r \times n_t$ matrix-valued random fading process, $X_j$ is an $n_t \times T$ matrix channel input, $Z_j$ is an $n_r \times T$ Gaussian random real-valued matrix with independent entries of zero mean and unit variance, and $Y_j$ is the $n_r \times T$ matrix-valued channel output. The process $H_j$ is assumed to be i.i.d. with isotropic distribution $P_H$, i.e. for any orthogonal matrices $U \in \mathbb{R}^{n_r \times n_r}$ and $V \in \mathbb{R}^{n_t \times n_t}$, both $UH$ and $HV$ are equal in distribution to $H$. We also assume that
(4) $\mathbb{P}[H \neq 0] > 0$
to avoid trivialities. Note that due to merging channel inputs at time instants $1, \ldots, T$ into one matrix-input, the block-fading channel becomes memoryless. We assume coherent demodulation so that the channel state information (CSI) is fully known to the receiver (CSIR).
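As an illustration of the block structure, the sketch below simulates the relation "output block = fading matrix times input block plus unit-variance Gaussian noise", drawing a fresh fading matrix for every coherence block. The choice of i.i.d. standard Gaussian fading is an assumption made for the example (it is one instance of an isotropic distribution); the helper names are hypothetical.

```python
import random


def gaussian_matrix(rows, cols, rng):
    """rows x cols matrix with i.i.d. N(0, 1) entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain matrix product of nested-list matrices a (p x q) and b (q x r)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def channel_block(h, x, rng):
    """One coherence block: returns h @ x plus i.i.d. N(0, 1) noise."""
    hx = matmul(h, x)
    return [[v + rng.gauss(0.0, 1.0) for v in row] for row in hx]


def transmit(blocks, n_r, rng):
    """Send a list of n_t x T input blocks; fading is redrawn independently per block.

    Returns a list of (output block, fading matrix) pairs, since the receiver
    is assumed to know the fading realization (CSIR).
    """
    outputs = []
    for x in blocks:
        h = gaussian_matrix(n_r, len(x), rng)  # assumed fading law: i.i.d. Gaussian
        outputs.append((channel_block(h, x, rng), h))
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    x = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # n_t = 2 antennas, T = 3 time instants
    y, h = transmit([x, x], n_r=4, rng=rng)[0]
    print(len(y), "x", len(y[0]), "output block")
```

Because each block uses an independent fading draw and independent noise, treating a whole block as one matrix-valued channel use gives a memoryless channel, as noted above.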
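An $(nT, M, \epsilon, P)_{\mathrm{CSIR}}$ code of blocklength $nT$, probability of error $\epsilon$ and power constraint $P$ is a pair of maps: the encoder $f: [M] \to (\mathbb{R}^{n_t \times T})^n$ and the decoder $g: (\mathbb{R}^{n_r \times T})^n \times (\mathbb{R}^{n_r \times n_t})^n \to [M]$ satisfying the probability of error constraint
(5) $\mathbb{P}[W \neq \hat{W}] \le \epsilon$
on the probability space
$W \to X^n \to (Y^n, H^n) \to \hat{W}$
where the message $W$ is uniformly distributed on $[M]$, $X^n = f(W)$, $X^n \to (Y^n, H^n)$ is as described in (3), and $\hat{W} = g(Y^n, H^n)$. In addition the input sequences are required to satisfy the power constraint:
$\sum_{j=1}^{n} \|X_j\|_F^2 \le nTP$ almost surely,
where $\|X\|_F$ is the Frobenius norm of the matrix $X$.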
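Under the isotropy assumption on $P_H$, the capacity $C$ appearing in (1) of this channel is given by [8]
(6) $C(P) = \mathbb{E}\left[\frac{1}{2}\log\det\left(I_{n_r} + \frac{P}{n_t} H H^T\right)\right]$
(7) $\phantom{C(P)} = \sum_{i=1}^{n_{\min}} \mathbb{E}\left[C_{AWGN}\left(\frac{P}{n_t}\Lambda_i^2\right)\right]$
where $C_{AWGN}(P) = \frac{1}{2}\log(1+P)$ is the capacity of the additive white Gaussian noise (AWGN) channel with SNR $P$, $n_{\min} = \min(n_t, n_r)$ is the minimum of the transmit and receive antennas, and $\{\Lambda_i^2\}$ are eigenvalues of $HH^T$. Note that it is common to think that as $P \to \infty$ the expression (7) scales as $\frac{n_{\min}}{2}\log P$, but this is only true if $\mathbb{P}[\mathrm{rank}\, H = n_{\min}] = 1$.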
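The capacity expression can be estimated numerically by Monte Carlo. The sketch below assumes the real-valued Telatar form $\mathbb{E}[\frac{1}{2}\log\det(I + \frac{P}{n_t} H H^T)]$ (in nats) and, purely for illustration, i.i.d. standard Gaussian fading; the log-determinant is computed with a small hand-written Cholesky factorization so that the example needs only the standard library. All function names are hypothetical.

```python
import random
from math import log, sqrt


def log_det_spd(a):
    """log det of a symmetric positive definite matrix via Cholesky factorization."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = sqrt(s)
                total += 2.0 * log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return total


def mimo_info(h, power):
    """(1/2) log det(I + (power / n_t) * h h^T) in nats for an n_r x n_t matrix h."""
    n_r, n_t = len(h), len(h[0])
    gram = [[(1.0 if i == j else 0.0)
             + power / n_t * sum(h[i][k] * h[j][k] for k in range(n_t))
             for j in range(n_r)] for i in range(n_r)]
    return 0.5 * log_det_spd(gram)


def capacity_mc(n_t, n_r, power, trials, rng):
    """Monte Carlo average of mimo_info over i.i.d. N(0, 1) fading (assumed fading law)."""
    acc = 0.0
    for _ in range(trials):
        h = [[rng.gauss(0.0, 1.0) for _ in range(n_t)] for _ in range(n_r)]
        acc += mimo_info(h, power)
    return acc / trials


if __name__ == "__main__":
    rng = random.Random(0)
    print("2x2, P = 10:", capacity_mc(2, 2, 10.0, 2000, rng), "nats per block column")
```

The value is per time instant within a block; other isotropic fading laws can be substituted by changing how `h` is sampled.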
The goal of this line of work is to characterize dispersion of the present channel. Since the channel is memoryless it is natural to expect, given the results in [3, 12], that dispersion (for ) is given by
(8) 
where we denoted (single block) information density by
(9) 
and is the capacity achieving output distribution (caod). Justification of (8) as the actual (operational) dispersion, appearing in the expansion of is by no means trivial and is the subject of this work.
Here we formally state the main results, then go into more detail in the following sections. Our first result is an achievability and partial converse bound for the MIMO-BF channel for fixed parameters .
Theorem 1.
For the MIMO-BF channel, there exists an maximal probability of error code with satisfying
(10) 
Furthermore, for any there exists so that every code with extra constraint that , must satisfy
(11) 
where capacity is given by (6) and dispersion by (8) (for the explicit expression for see (III-C) below).
The dispersion (8) is expressed as minimization over input distributions on that achieve capacity of the channel. As will be shown soon, there may be many different caids, but in typical MIMO scenarios, it turns out they all achieve the same dispersion, which can be in turn calculated for the simplest Telatar caid (i.i.d. Gaussian matrix ). The following theorem gives full details.
Theorem 2.
Assume that , then , where
(12) 
where are eigenvalues of , , and
(13)  
(14)  
(15) 
Proof.
This is proved in Proposition 11 below. ∎
In the rank-1 case (e.g. for MISO systems), there is a multitude of caids, and the minimization problem in (8) is non-trivial. Quite surprisingly, for some values of , we show that the (essentially unique) minimizer is a full-rate orthogonal design. The latter were introduced into communication by Alamouti [10] and Tarokh et al. [11]. This shows a somewhat unexpected connection between schemes optimal from modulation-theoretic and information-theoretic points of view. The precise results are as follows.
Theorem 3.
When , we have
(16)  
(17) 
where is the nonzero eigenvalue of , and
(18) 
Proof.
This is the content of Proposition 12 below.∎
Unfortunately, the quantity is generally unknown since the minimization in (18) is over the manifold of matrix-valued random variables. However, for many dimensions, the minimum can be found by invoking the Radon-Hurwitz theorem. We state this below to introduce the notation, and expand on it in Section VI.
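Theorem 4 (Radon-Hurwitz).
There exists a family of $n \times n$ real matrices $V_1, \ldots, V_k$ satisfying
(19) $V_i^T V_i = I_n, \quad i = 1, \ldots, k$
(20) $V_i^T V_j + V_j^T V_i = 0, \quad i \neq j$
if and only if $k \le \rho(n)$, where
(21) $\rho(2^a b) = 8\lfloor a/4 \rfloor + 2^{a \bmod 4}, \quad a, b \in \mathbb{Z}, \ b \text{ odd}$
In particular, $\rho(n) \le n$, and $\rho(n) = n$ only for $n = 1, 2, 4, 8$.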
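The following sketch computes the classical Radon-Hurwitz number, which for $n = 2^a b$ with $b$ odd equals $8\lfloor a/4 \rfloor + 2^{a \bmod 4}$, and checks the two defining conditions (each matrix orthogonal, distinct pairs anti-commuting in the sense $V_i^T V_j + V_j^T V_i = 0$) on a candidate family. The $2 \times 2$ family used in the demonstration, the identity together with a rotation by 90 degrees, is the one underlying Alamouti-type designs; the function names are hypothetical.

```python
def radon_hurwitz(n: int) -> int:
    """Radon-Hurwitz number rho(n) = 8*floor(a/4) + 2**(a mod 4) for n = 2**a * b, b odd."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    a = 0
    while n % 2 == 0:
        n //= 2
        a += 1
    d, c = divmod(a, 4)
    return 8 * d + 2 ** c


def _transpose(m):
    return [list(col) for col in zip(*m)]


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_hurwitz_radon_family(mats, tol=1e-12):
    """Check V_i^T V_i = I for all i and V_i^T V_j + V_j^T V_i = 0 for all i != j."""
    n = len(mats[0])
    for i, vi in enumerate(mats):
        for j, vj in enumerate(mats):
            left = _matmul(_transpose(vi), vj)
            right = _matmul(_transpose(vj), vi)
            for r in range(n):
                for c in range(n):
                    if i == j:
                        target = 1.0 if r == c else 0.0
                        if abs(left[r][c] - target) > tol:
                            return False
                    elif abs(left[r][c] + right[r][c]) > tol:
                        return False
    return True


if __name__ == "__main__":
    print([radon_hurwitz(n) for n in range(1, 17)])
    identity = [[1.0, 0.0], [0.0, 1.0]]
    rotation = [[0.0, -1.0], [1.0, 0.0]]
    print(is_hurwitz_radon_family([identity, rotation]))
```

Since the number grows only logarithmically in $n$, full-size families ($\rho(n) = n$) are confined to very small dimensions.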
The following theorem summarizes our current knowledge of .
Theorem 5.
For any pair of positive integers we have
(22) 
If or then a full-rate orthogonal design is dispersion-optimal and
(23) 
If instead and then for a jointly Gaussian capacity-achieving input we have the following (so that in these cases the bound (22) is either not tight, or is achieved by a caid that is not jointly Gaussian):
(24) 
Finally, if and (23) holds, then for any (and similarly with the roles of and switched).
Note that the function is monotonic in even values of (and is for odd), so that for any , there is a large enough such that , in which case a full-rate orthogonal design achieves the optimal .
III Preliminary results
III-A Known results: capacity and capacity achieving output distribution
We review known results on the MIMO-BF channel. Since the channel is memoryless, the capacity is given by
(25) 
It was shown by Telatar [8] that whenever distribution of is isotropic, the input given by
(26) 
is a maximizer, resulting in the capacity formula (6). The distribution induced by a caid at the channel output is called the capacity achieving output distribution (caod). A classical fact is that while there can be many caids, the caod is unique; see e.g. [24, Section 4.4]. Thus, from [8] we infer that the caod is given by
(27)  
(28)  
(29) 
where is decomposed into columns .
III-B Capacity achieving input distributions
When is of low rank there are many possible caids, and somewhat surprisingly for the case of rank-1 (e.g. for MISO) there are even jointly Gaussian caids. The next proposition characterizes all caids. Later, this characterization will be used to show that the dispersion-minimizing caid in Theorem 1 is given by orthogonal designs (such as Alamouti’s coding), for dimensions when those exist.
Theorem 6.

When , any caid has pairwise independent rows:
(35) and in particular
(36) Therefore, among jointly Gaussian the i.i.d. is the unique caid.

There exist non-Gaussian caids if and only if .
Remark 2.
(Special case of rank-1 ) In the case when and (or more generally, a.s.), there is not only a multitude of caids, but in fact those can have non-trivial correlations between different entries of (and this is ruled out by (36) for all other cases). As an example, for case, the set of jointly Gaussian caids is given by
(37) 
where iid. In particular, there are caids for which not all entries of are pairwise independent.
Remark 3.
Proof.
We will rely repeatedly on the following observations:

if are two random vectors in then and for any we have
(38) This is easy to show by computing characteristic functions.

If are two random vectors in independent of , then
(39) This follows from the fact that characteristic function of is nowhere zero.

For two matrices we have
(40) This follows from the fact that quadratic form that is zero everywhere on must have all coefficients equal to zero.
Part 1 (necessity). As we discussed the caod is unique and given by (27). Thus is caid iff for almost every we have
(41) 
where is an matrix with iid entries (for sufficiency, just write with denoting differential entropy). We will argue next that this implies (under isotropy assumption on ) that
(42) 
Let be an almost sure set of those for which (41) holds. Let denote the group of orthogonal matrices, with the topology inherited from . Let and for be countable dense subsets of and , respectively. (These exist since is a second-countable topological space.) By isotropy of we have and therefore
(43) 
is also almost sure: . By assumption (4) must contain a nonzero element and also for all . From (39) we conclude that we have
Arguing by continuity and using density of and , this implies also
(44) 
In particular, for any there must exist a choice of such that has the top row equal to . Choosing these in (44) and comparing distributions of top rows, we conclude (42) (after scaling by ).
Part 1 (sufficiency). Suppose . Then our goal is to show that (42) implies that is caid. To that end, it is sufficient to show that for all rank-1 . In the special case
the claim follows directly from (42). Every other rank-1 can be decomposed as for some matrix , and thus again we get , concluding the proof.
Parts 2 and 3 (necessity). From part 1 we have that for every we must have . Computing expected square we get
(45) 
Thus, expressing the left-hand side in terms of rows as we get
and thus by (40) we conclude that for all :
Each entry of the matrices above is a quadratic form in and thus again by (40) we conclude (31)-(32). Part 3 is argued similarly with the roles of and interchanged.
Parts 2 and 3 (sufficiency). When is (at most) rank-1, we have from part 1 that it is sufficient to show that . When is jointly zero-mean Gaussian, we have is zero-mean Gaussian and so we only need to check that its second moment satisfies (45). But as we just argued, (45) is equivalent to either (31)-(32) or (33)-(34).
Part 4. As in Part 1, there must exist such that (44) holds and . Thus, by choosing we can diagonalize and thus we conclude any pair of rows must be independent.
Part 5. This part is never used in subsequent parts of the paper, so we only sketch the argument and move the most technical part of the proof to Appendix A. Let . Then arguing as for (44) we conclude that is caid if and only if for any with we have
In other words, we have
(46) 
If , then rank condition on is not active and hence, we conclude by (38) that . So assume . Note that (46) is equivalent to the condition on characteristic function of as follows:
(47) 
It is easy to find a polynomial (in ) that vanishes on all matrices of rank (e.g. take the product of all minors). Then Proposition 24 in Appendix A constructs a non-Gaussian satisfying (47) and hence (46). ∎
III-C Information density and its moments
It will be convenient to assume that the matrix is represented as
(48) 
where are uniformly distributed on and , respectively (recall that is the space of all orthogonal matrices; this space is compact in a natural topology and admits a Haar probability measure), and is the diagonal matrix with diagonal entries . The joint distribution of depends on the fading model. It does not matter for our analysis whether ’s are sorted in some way, or permutation-invariant.
For the MIMO-BF channel, let denote the caod (27). Then the information density with respect to (for a single block of symbols) as defined in (9) becomes
(49) 
where we assumed as in (48), so that is the th column of , and have set , with representing the th row of . The definition naturally extends to blocks of length additively:
(50) 
We compute the (conditional) mean of information density to get
(51)  
(52) 
where we used the following simple fact:
Lemma 7.
Let be uniformly distributed on the unit sphere, and be a fixed matrix, then
(53) 
Proof.
Note that by additivity of across columns, it is sufficient to consider the case , for which the statement is clear from symmetry. ∎
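The symmetry argument can be checked numerically. The displayed identity is not reproduced here, so the sketch below tests a standard fact of this type, stated as an assumption: for $U$ uniform on the unit sphere in $\mathbb{R}^n$ and a fixed $n \times d$ matrix $x$, $\mathbb{E}\|x^T U\|^2 = \|x\|_F^2 / n$ (which follows from $\mathbb{E}[UU^T] = I_n/n$). The function names and the test matrix are hypothetical.

```python
import random
from math import sqrt


def uniform_on_sphere(n, rng):
    """Uniform point on the unit sphere in R^n, obtained by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def projected_energy(x, u):
    """||x^T u||^2 for an n x d matrix x (nested lists) and a length-n vector u."""
    n, d = len(x), len(x[0])
    return sum(sum(x[i][j] * u[i] for i in range(n)) ** 2 for j in range(d))


def frobenius_sq(x):
    """Squared Frobenius norm of x."""
    return sum(v * v for row in x for v in row)


def mc_mean_energy(x, trials, rng):
    """Monte Carlo estimate of E||x^T U||^2 with U uniform on the unit sphere."""
    n = len(x)
    return sum(projected_energy(x, uniform_on_sphere(n, rng)) for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(1)
    x = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]  # n = 3, d = 2
    print("Monte Carlo:", mc_mean_energy(x, 20000, rng))
    print("||x||_F^2 / n:", frobenius_sq(x) / len(x))
```

The column-by-column additivity used in the proof corresponds to the outer sum over `j` in `projected_energy`.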
Proposition 8.
Let , then we have
(54) 
where the function defined as is given by
(55)  
(56)  
(57)  
(58)  
(59) 
where was defined in (13) and
(60)  
(61)  
(62)  
(63) 
Remark 4.
Proof.
From (III-C), we have the form of the information density. First note that the information density over channel uses decomposes into a sum of independent terms,
(64) 
As such, the variance conditioned on also decomposes as
(65) 
from which (54) follows. Because the variance decomposes as a sum in (65), we focus on only computing for a single coherent block. Define
(66)  
(67) 
so that in notation from (III-C). With this, the quantity of interest is
(68)  