Coherent multiple-antenna block-fading channels at finite blocklength
In this paper we consider a channel model that is often used to describe the mobile wireless scenario: multiple-antenna additive white Gaussian noise channels subject to random (fading) gain with full channel state information at the receiver. Dynamics of the fading process are approximated by a piecewise-constant process (frequency non-selective isotropic block fading). This work addresses the finite blocklength fundamental limits of this channel model. Specifically, we give a formula for the channel dispersion – a quantity governing the delay required to achieve capacity. Multiplicative nature of the fading disturbance leads to a number of interesting technical difficulties that required us to enhance traditional methods for finding channel dispersion. Alas, one difficulty remains: the converse (impossibility) part of our result holds under an extra constraint on the growth of the peak-power with blocklength.
Our results demonstrate, for example, that while capacities of and antenna configurations coincide (under fixed received power), the coding delay can be quite sensitive to this switch. For example, at the received SNR of dB the system achieves capacity with codes of length (delay) which is only of the length required for the system. Another interesting implication is that for the MISO channel, the dispersion-optimal coding schemes require employing orthogonal designs such as Alamouti’s scheme – a surprising observation considering the fact that Alamouti’s scheme was designed for reducing demodulation errors, not improving coding rate.
Given a noisy communication channel, the maximal cardinality of a codebook of blocklength which can be decoded with block error probability no greater than is denoted as . Evaluation of this function – the fundamental performance limit of block coding – is alas computationally impossible for most channels of interest. As a resolution of this difficulty  proposed a closed-form normal approximation, based on the asymptotic expansion:
where capacity $C$ and dispersion $V$ are two intrinsic characteristics of the channel and $Q^{-1}$ is the inverse of the $Q$-function (as usual, $Q(x)=\int_x^\infty \frac{1}{\sqrt{2\pi}}e^{-t^2/2}\,dt$). One immediate consequence of the normal approximation is an estimate for the minimal blocklength (delay) required to achieve a given fraction of channel capacity:
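As an illustration of how the normal approximation turns into a delay estimate, the following is a minimal sketch, assuming the standard form log M ≈ nC − sqrt(nV) Q^{-1}(ε) with the remainder term dropped. The function names and the symbol `eta` (the target fraction of capacity) are hypothetical choices for this example, not notation taken from the paper.

```python
from math import ceil, sqrt
from statistics import NormalDist


def q_inv(eps: float) -> float:
    """Inverse of the Gaussian tail function Q(x) = P[N(0,1) > x]."""
    return NormalDist().inv_cdf(1.0 - eps)


def normal_approx_log_m(n: int, capacity: float, dispersion: float, eps: float) -> float:
    """Normal approximation n*C - sqrt(n*V)*Q^{-1}(eps); remainder term omitted (assumption)."""
    return n * capacity - sqrt(n * dispersion) * q_inv(eps)


def min_blocklength(capacity: float, dispersion: float, eps: float, eta: float) -> int:
    """Smallest n for which the approximation reaches a fraction eta of capacity.

    Solving n*C - sqrt(n*V)*Q^{-1}(eps) >= eta*n*C for n gives
    n >= (Q^{-1}(eps) / (1 - eta))^2 * V / C^2   (meaningful for eps < 1/2, eta < 1).
    """
    return ceil((q_inv(eps) / (1.0 - eta)) ** 2 * dispersion / capacity ** 2)
```

For instance, `min_blocklength(1.0, 1.0, 1e-3, 0.9)` estimates the delay needed to reach 90% of capacity at block error probability 10^-3 for a channel with C = V = 1 (illustrative values only).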
Asymptotic expansions such as (1) are rooted in the central-limit theorem and have been known classically for discrete memoryless channels [4, 5] and later extended in a wide variety of directions; see surveys in [6, 7].
The fading channel is the centerpiece of the theory and practice of wireless communication, and hence there are many slightly different variations of the model: differing assumptions on the dynamics and distribution of the fading process, antenna configurations, and channel state knowledge. The capacity of the fading channel was found independently by Telatar  and Foschini and Gans  for the case of Rayleigh fading and channel state information available at the receiver only (CSIR) and at both the transmitter and receiver (CSIRT). Motivated by the linear gains promised by capacity results, space-time codes were introduced to exploit multiple antennas, most notable among them Alamouti’s ingenious orthogonal scheme  along with a generalization by Tarokh, Jafarkhani and Calderbank . Motivated by a recent surge of orthogonal frequency division multiplexing (OFDM) technology, this paper focuses on an isotropic channel gain distribution that is piecewise independent (“block-fading”) and assumes full channel state information available at the receiver (CSIR). This work describes finite blocklength effects incurred by the fading on the fundamental communication limits.
Prior work on similar questions includes the following. Single antenna channel dispersion was computed in  for a more general stationary channel gain process with memory. In  finite-blocklength effects are explored for the non-coherent block fading setup. Quasi-static fading channels in the general MIMO setting have been thoroughly investigated in , showing that the expansion (1) changes dramatically (in particular the channel dispersion term becomes zero); see also  for evaluation of the bounds. The coherent quasi-static channel has been studied in the limit of infinitely many antennas in  by appealing to concentration properties of random matrices. Dispersion for lattices (infinite constellations) in fading channels has been investigated in a sequence of works, see  and references. Note also that there are some very fine differences between stationary and block-fading channel models, cf. [18, Section 4]. The minimum energy to send bits over a MIMO channel for both the coherent and non-coherent case was studied in , showing the latter requires orders of magnitude larger latencies.  investigates the problem of power control with an average power constraint on the codebook in the quasi-static fading channel with perfect CSIRT. A novel achievability bound was found and evaluated for the fading channel with CSIR in . Parts of this work have previously appeared in [1, 2].
The paper is organized as follows. In Section II we describe the channel model and state all our main results formally. Section III characterizes capacity achieving input/output distributions (caid/caod, resp.) and evaluates moments of the information density. Then in Sections IV and V we prove the achievability and converse parts of our (non rank-1) results, respectively. Section VI focuses on the special case of when the matrix of channel gains has rank 1. Finally, Section VII contains discussion of numerical results and behavior of channel dispersion with the number of antennas.
The numerical software used to compute the achievability bounds, dispersion and normal approximation in this work can be found online under the Spectre project .
II Main Results
The channel model considered in this paper is the frequency-nonselective coherent real block fading discrete-time channel with multiple transmit and receive antennas (MIMO); see [23, Section II] for background on this model. We will simply refer to it as the MIMO-BF channel, which we formally define as follows. Given parameters as follows: let be the number of transmit antennas, be the number of receive antennas, and be the coherence time of the channel. The input-output relation at block (spanning time instants to ) with is given by
where is a matrix-valued random fading process, is a matrix channel input, is a Gaussian random real-valued matrix with independent entries of zero mean and unit variance, and is the matrix-valued channel output. The process is assumed to be i.i.d. with isotropic distribution , i.e. for any orthogonal matrices and , both and are equal in distribution to . We also assume that
to avoid trivialities. Note that due to merging channel inputs at time instants into one matrix-input, the block-fading channel becomes memoryless. We assume coherent demodulation so that the channel state information (CSI) is fully known to the receiver (CSIR).
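The input-output relation of one coherence block can be sketched as follows. This is a minimal illustration that assumes the relation has the form Y = H X + Z, with H of size n_r × n_t, X of size n_t × T and Z having i.i.d. N(0, 1) entries. The i.i.d. Gaussian fading used here is only one example of an isotropic distribution, and all function names are hypothetical.

```python
import random


def sample_gaussian_fading(n_r: int, n_t: int, rng: random.Random) -> list:
    """Draw an n_r x n_t fading matrix with i.i.d. N(0,1) entries.

    This law is invariant under orthogonal rotations on either side,
    so it is one example of an isotropic fading distribution.
    """
    return [[rng.gauss(0.0, 1.0) for _ in range(n_t)] for _ in range(n_r)]


def simulate_block(h: list, x: list, rng: random.Random) -> list:
    """Return Y = H X + Z for one coherence block (assumed form of the channel law).

    h is n_r x n_t, x is n_t x T; Z has i.i.d. zero-mean unit-variance Gaussian entries.
    """
    n_r, n_t, t = len(h), len(x), len(x[0])
    return [
        [sum(h[i][k] * x[k][j] for k in range(n_t)) + rng.gauss(0.0, 1.0)
         for j in range(t)]
        for i in range(n_r)
    ]
```

Because the fading matrix is redrawn independently for every block, calling `simulate_block` once per block with a fresh `sample_gaussian_fading` draw reproduces the memoryless structure described above.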
An code of blocklength , probability of error and power-constraint is a pair of maps: the encoder and the decoder satisfying the probability of error constraint
on the probability space
where the message is uniformly distributed on , , is as described in (3), and . In addition the input sequences are required to satisfy the power constraint:
where is the Frobenius norm of the matrix .
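A codeword-level check of the power constraint can be sketched as below. The sketch assumes the constraint reads: the sum of squared Frobenius norms of the n input blocks is at most n·T·P. The helper names are hypothetical.

```python
def frobenius_sq(x: list) -> float:
    """Squared Frobenius norm: the sum of squares of all matrix entries."""
    return sum(v * v for row in x for v in row)


def satisfies_power_constraint(blocks: list, t: int, p: float) -> bool:
    """Check sum_j ||X_j||_F^2 <= n*T*P for a codeword of n blocks (assumed form)."""
    n = len(blocks)
    return sum(frobenius_sq(x) for x in blocks) <= n * t * p
```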
where is the capacity of the additive white Gaussian noise (AWGN) channel with SNR , is the minimum of the transmit and receive antennas, and are eigenvalues of . Note that it is common to think that as the expression (7) scales as , but this is only true if .
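The capacity expression can be evaluated numerically by Monte Carlo. The sketch below covers only the rank-1 case with a single receive antenna, where the matrix of gains has one non-zero eigenvalue equal to the squared norm of the fading vector. It assumes the capacity has the form E[C_AWGN(P/n_t · λ)] with C_AWGN(P) = ½ log(1 + P) for the real channel, and i.i.d. N(0, 1) fading. These are assumptions made for illustration, and the function names are hypothetical.

```python
import random
from math import log2


def awgn_capacity(snr: float) -> float:
    """Capacity of the real AWGN channel in bits per channel use."""
    return 0.5 * log2(1.0 + snr)


def miso_capacity_mc(n_t: int, p: float, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[C_AWGN(P/n_t * ||h||^2)] for 1 x n_t Gaussian fading.

    With one receive antenna, ||h||^2 is the single non-zero eigenvalue of H H^T.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        gain = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_t))
        total += awgn_capacity(p / n_t * gain)
    return total / trials
```

Since the normalized gain has unit mean, Jensen's inequality implies the estimate stays below `awgn_capacity(p)` up to simulation error, and it approaches that value as `n_t` grows and the gain concentrates.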
The goal of this line of work is to characterize dispersion of the present channel. Since the channel is memoryless it is natural to expect, given the results in [3, 12], that dispersion (for ) is given by
where we denoted (single -block) information density by
and is the capacity achieving output distribution (caod). Justification of (8) as the actual (operational) dispersion, appearing in the expansion of is by no means trivial and is the subject of this work.
Here we formally state the main results, then go into more detail in the following sections. Our first result is an achievability and partial converse bound for the MIMO-BF channel for fixed parameters .
For the MIMO-BF channel, there exists an maximal probability of error code with satisfying
Furthermore, for any there exists so that every code with extra constraint that , must satisfy
The dispersion (8) is expressed as minimization over input distributions on that achieve capacity of the channel. As will be shown soon, there may be many different caids, but in typical MIMO scenarios, it turns out they all achieve the same dispersion, which can be in turn calculated for the simplest Telatar caid (i.i.d. Gaussian matrix ). The following theorem gives full details.
Assume that , then , where
where are eigenvalues of , , and
This is proved in Proposition 11 below. ∎
In the rank-1 case (e.g. for MISO systems), there is a multitude of caids, and the minimization problem in (8) is non-trivial. Quite surprisingly, for some values of , we show that the (essentially unique) minimizer is a full-rate orthogonal design. The latter were introduced into communication by Alamouti  and Tarokh et al . This shows a somewhat unexpected connection between schemes optimal from modulation-theoretic and information-theoretic points of view. The precise results are as follows.
When , we have
where is the non-zero eigenvalue of , and
This is the content of Proposition 12 below.∎
Unfortunately, the quantity is generally unknown since the minimization in (18) is over the manifold of matrix-valued random variables. However, for many dimensions, the minimum can be found by invoking the Radon-Hurwitz theorem. We state this below to introduce the notation, and expand on it in Section VI.
Theorem 4 (Radon-Hurwitz).
There exists a family of real matrices satisfying
if and only if , where
In particular, and only for .
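The Radon-Hurwitz number appearing in this theorem is easy to compute. The sketch below uses the classical formula: writing n = 2^a · b with b odd and a = 4c + d with 0 ≤ d ≤ 3, the number equals 8c + 2^d. The function name `radon_hurwitz` is a hypothetical choice for this example.

```python
def radon_hurwitz(n: int) -> int:
    """Radon-Hurwitz number of n: for n = 2^a * b (b odd), a = 4c + d, return 8c + 2^d."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    a = 0
    while n % 2 == 0:  # extract the exponent of 2 in n
        n //= 2
        a += 1
    c, d = divmod(a, 4)
    return 8 * c + 2 ** d
```

Scanning a range of dimensions, for example `[n for n in range(1, 200) if radon_hurwitz(n) == n]`, returns only 1, 2, 4 and 8, the dimensions in which a square real orthogonal design exists.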
The following theorem summarizes our current knowledge of .
For any pair of positive integers we have
If or then a full-rate orthogonal design is dispersion-optimal and
If instead and then for a jointly-Gaussian capacity-achieving input we have [footnote: so that in these cases the bound (22) is either non-tight, or is achieved by a non-jointly-Gaussian caid]
Finally, if and (23) holds, then for any (and similarly with the roles of and switched).
Note that the function is monotonic in even values of (and is for odd), so that for any , there is a large enough such that , in which case a full-rate orthogonal design achieves the optimal .
III Preliminary results
III-A Known results: capacity and capacity achieving output distribution
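To make the notion of an orthogonal design concrete, the sketch below checks the defining property of the classical square real designs of sizes 2 and 4: for any real symbols, X Xᵀ equals the sum of squared symbols times the identity. The 2 × 2 matrix is the real analogue of Alamouti's scheme. The specific sign patterns and all function names are choices made for this illustration and are not taken from the paper.

```python
def design_2x2(a: float, b: float) -> list:
    """Real 2 x 2 orthogonal design (real analogue of Alamouti's scheme)."""
    return [[a, b],
            [-b, a]]


def design_4x4(a: float, b: float, c: float, d: float) -> list:
    """A classical real 4 x 4 orthogonal design."""
    return [[a, b, c, d],
            [-b, a, -d, c],
            [-c, d, a, -b],
            [-d, -c, b, a]]


def gram(x: list) -> list:
    """Return X X^T."""
    return [[sum(u * v for u, v in zip(r1, r2)) for r2 in x] for r1 in x]


def is_orthogonal_design(x: list, symbols: list, tol: float = 1e-9) -> bool:
    """Check X X^T = (sum of squared symbols) * I, up to a numerical tolerance."""
    energy = sum(s * s for s in symbols)
    g = gram(x)
    return all(
        abs(g[i][j] - (energy if i == j else 0.0)) <= tol
        for i in range(len(x)) for j in range(len(x))
    )
```

Both designs carry as many symbols as they occupy time slots, which is the sense in which such designs are called full-rate.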
We review known results on the MIMO-BF channel. Since the channel is memoryless, the capacity is given by
It was shown by Telatar  that whenever distribution of is isotropic, the input given by
is a maximizer, resulting in the capacity formula (6). The distribution induced by a caid at the channel output is called capacity achieving output distribution (caod). A classical fact is that while there can be many caids, the caod is unique, e.g. [24, Section 4.4]. Thus, from  we infer that the caod is given by
where is decomposed into columns .
III-B Capacity achieving input distributions
When is of low rank there are many possible caids, and somewhat surprisingly for the case of rank-1 (e.g. for MISO) there are even jointly Gaussian caids. The next proposition characterizes all caids. Later, this characterization will be used to show that dispersion-minimizing caid in Theorem 1 is given by orthogonal designs (such as Alamouti’s coding), for dimensions when those exist.
Every caid satisfies
If then condition (30) is also sufficient for to be caid.
When , any caid has pairwise independent rows:
and in particular
Therefore, among jointly Gaussian the i.i.d. is the unique caid.
There exist non-Gaussian caids if and only if .
(Special case of rank-1 ) In the case when and (or more generally, a.s.), there is not only a multitude of caids, but in fact those can have non-trivial correlations between different entries of (and this is ruled out by (36) for all other cases). As an example, for case, the set of jointly Gaussian caids is given by
where iid. In particular, there are caids for which not all entries of are pairwise independent.
We will rely repeatedly on the following observations:
if are two random vectors in then and for any we have
This is easy to show by computing characteristic functions.
If are two random vectors in independent of , then
This follows from the fact that characteristic function of is nowhere zero.
For two matrices we have
This follows from the fact that quadratic form that is zero everywhere on must have all coefficients equal to zero.
Part 1 (necessity). As we discussed the caod is unique and given by (27). Thus is caid iff for -almost every we have
where is an matrix with iid entries (for sufficiency, just write with denoting differential entropy). We will argue next that this implies (under isotropy assumption on ) that
Let be an almost sure set of those for which (41) holds. Let denote the group of orthogonal matrices, with the topology inherited from . Let and for be countable dense subsets of and , respectively. (These exist since is a second-countable topological space). By isotropy of we have and therefore
Arguing by continuity and using density of and , this implies also
Part 1 (sufficiency). Suppose . Then our goal is to show that (42) implies that is caid. To that end, it is sufficient to show that for all rank-1 . In the special case
claim follows directly from (42). Every other rank-1 can be decomposed as for some matrix , and thus again we get , concluding the proof.
Parts 2 and 3 (necessity). From part 1 we have that for every we must have . Computing expected square we get
Thus, expressing the left-hand side in terms of rows as we get
and thus by (40) we conclude that for all :
Parts 2 and 3 (sufficiency). When is (at most) rank-1, we have from part 1 that it is sufficient to show that . When is jointly zero-mean Gaussian, we have is zero-mean Gaussian and so we only need to check its second moment satisfies (45). But as we just argued (45) is equivalent to either (31)-(32) or (33)-(34).
Part 4. As in Part 1, there must exist such that (44) holds and . Thus, by choosing we can diagonalize and thus we conclude any pair of rows must be independent.
Part 5. This part is never used in subsequent parts of the paper, so we only sketch the argument and move the most technical part of the proof to Appendix A. Let . Then arguing as for (44) we conclude that is caid if and only if for any with we have
In other words, we have
It is easy to find a polynomial (in ) that vanishes on all matrices of rank (e.g. take the product of all minors). Then Proposition 24 in Appendix A constructs non-Gaussian satisfying (47) and hence (46). ∎
III-C Information density and its moments
It will be convenient to assume that the matrix is represented as
where are uniformly distributed on and , respectively (recall that is the space of all orthogonal matrices; this space is compact in a natural topology and admits a Haar probability measure), and is the diagonal matrix with diagonal entries . Joint distribution of depends on the fading model. It does not matter for our analysis whether ’s are sorted in some way, or permutation-invariant.
where we assumed as in (48), so that is the -th column of , and have set , with representing the -th row of . The definition naturally extends to blocks of length additively:
We compute the (conditional) mean of information density to get
where we used the following simple fact:
Let be uniformly distributed on the unit sphere, and be a fixed matrix, then
Note that by additivity of across columns, it is sufficient to consider the case , for which the statement is clear from symmetry. ∎
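The symmetry argument can be checked numerically. The sketch below assumes the lemma takes the form E‖A u‖² = ‖A‖_F² / n for u uniform on the unit sphere in Rⁿ and A a fixed m × n matrix. This is an assumption about the exact statement, although the identity itself is standard. The function names are hypothetical.

```python
import random
from math import sqrt


def random_unit_vector(n: int, rng: random.Random) -> list:
    """Uniform point on the unit sphere in R^n, obtained by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def mean_sq_norm(a: list, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of E||A u||^2 for u uniform on the unit sphere."""
    rng = random.Random(seed)
    n = len(a[0])
    total = 0.0
    for _ in range(trials):
        u = random_unit_vector(n, rng)
        total += sum(sum(row[k] * u[k] for k in range(n)) ** 2 for row in a)
    return total / trials
```

For `a = [[1, 2, 3], [4, 5, 6]]` the squared Frobenius norm is 91, so the estimate should be close to 91/3, matching the reduction to a single row followed by symmetry.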
Let , then we have
where the function defined as is given by
where was defined in (13) and
From (III-C), we have the form of the information density. First note that the information density over channel uses decomposes into a sum of independent terms,
As such, the variance conditioned on also decomposes as
so that in notation from (III-C). With this, the quantity of interest is