Coherent multiple-antenna block-fading channels at finite blocklength
In this paper we consider a channel model that is often used to describe the mobile wireless scenario: multiple-antenna additive white Gaussian noise channels subject to random (fading) gain with full channel state information at the receiver. Dynamics of the fading process are approximated by a piecewise-constant process (frequency non-selective isotropic block fading). This work addresses the finite blocklength fundamental limits of this channel model. Specifically, we give a formula for the channel dispersion – a quantity governing the delay required to achieve capacity. Multiplicative nature of the fading disturbance leads to a number of interesting technical difficulties that required us to enhance traditional methods for finding channel dispersion. Alas, one difficulty remains: the converse (impossibility) part of our result holds under an extra constraint on the growth of the peak-power with blocklength.
Our results demonstrate, for example, that while capacities of and antenna configurations coincide (under fixed received power), the coding delay can be quite sensitive to this switch. For example, at the received SNR of dB the system achieves capacity with codes of length (delay) which is only of the length required for the system. Another interesting implication is that for the MISO channel, the dispersion-optimal coding schemes require employing orthogonal designs such as Alamouti’s scheme – a surprising observation considering the fact that Alamouti’s scheme was designed for reducing demodulation errors, not improving coding rate.
Given a noisy communication channel, the maximal cardinality of a codebook of blocklength which can be decoded with block error probability no greater than is denoted as . Evaluation of this function – the fundamental performance limit of block coding – is alas computationally impossible for most channels of interest. As a resolution of this difficulty  proposed a closed-form normal approximation, based on the asymptotic expansion:
where capacity and dispersion are two intrinsic characteristics of the channel and is the inverse of the -function
Asymptotic expansions such as (Equation 1) are rooted in the central-limit theorem and have been known classically for discrete memoryless channels  and later extended in a wide variety of directions; see surveys in .
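The normal approximation can be evaluated numerically with a few lines of code. The sketch below assumes that (Equation 1), which is not reproduced in this text, has the standard form from the finite-blocklength literature, log M*(n, eps) = nC - sqrt(nV) Q^{-1}(eps) + o(sqrt(n)), and simply drops the remainder term. The function names and the numerical values of C, V and eps are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def q_inv(eps: float) -> float:
    """Inverse of the Gaussian Q-function, where Q(x) = P[N(0,1) > x]."""
    return NormalDist().inv_cdf(1.0 - eps)


def normal_approx_log_m(n: int, capacity: float, dispersion: float, eps: float) -> float:
    """Leading terms n*C - sqrt(n*V) * Q^{-1}(eps); the o(sqrt(n)) remainder is dropped."""
    return n * capacity - sqrt(n * dispersion) * q_inv(eps)


# Hypothetical capacity and dispersion (nats), NOT values from the paper.
C, V, eps = 0.5, 0.4, 1e-3
for n in (100, 1000, 10000):
    rate = normal_approx_log_m(n, C, V, eps) / n
    print(f"n={n:6d}  approx. rate={rate:.4f}  fraction of capacity={rate / C:.3f}")
```

Under this approximation the backoff from capacity decays like sqrt(V/n), which is the sense in which dispersion governs the delay needed to approach capacity.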
The fading channel is the centerpiece of the theory and practice of wireless communication, and hence there are many slightly different variations of the model: differing assumptions on the dynamics and distribution of the fading process, antenna configurations, and channel state knowledge. The capacity of the fading channel was found independently by Telatar  and Foschini and Gans  for the case of Rayleigh fading and channel state information available at the receiver only (CSIR) and at both the transmitter and receiver (CSIRT). Motivated by the linear gains promised by capacity results, space-time codes were introduced to exploit multiple antennas; most notable among them is Alamouti’s ingenious orthogonal scheme  along with a generalization by Tarokh, Jafarkhani and Calderbank . Motivated by a recent surge of orthogonal frequency division multiplexing (OFDM) technology, this paper focuses on an isotropic channel gain distribution that is piecewise independent (“block-fading”) and assumes full channel state information available at the receiver (CSIR). This work describes finite blocklength effects incurred by the fading on the fundamental communication limits.
Some of the prior work on similar questions is as follows. Single antenna channel dispersion was computed in  for a more general stationary channel gain process with memory. In  finite-blocklength effects are explored for the non-coherent block fading setup. Quasi-static fading channels in the general MIMO setting have been thoroughly investigated in , showing that the expansion (Equation 1) changes dramatically (in particular the channel dispersion term becomes zero); see also  for evaluation of the bounds. The coherent quasi-static channel has been studied in the limit of infinitely many antennas in , appealing to concentration properties of random matrices. Dispersion for lattices (infinite constellations) in fading channels has been investigated in a sequence of works, see  and references. Note also that there are some very fine differences between stationary and block-fading channel models, cf. . The minimum energy to send bits over the MIMO channel for both the coherent and non-coherent case was studied in , showing that the latter requires orders of magnitude larger latencies.  investigates the problem of power control with an average power constraint on the codebook in the quasi-static fading channel with perfect CSIRT. A novel achievability bound was found and evaluated for the fading channel with CSIR in . Parts of this work have previously appeared in .
The paper is organized as follows. In Section 2 we describe the channel model and state all our main results formally. Section 3 characterizes capacity achieving input/output distributions (caid/caod, resp.) and evaluates moments of the information density. Then in Section 4 and Section 5 we prove the achievability and converse parts of our (non rank-1) results, respectively. Section 6 focuses on the special case when the matrix of channel gains has rank 1. Finally, Section 7 contains a discussion of numerical results and of the behavior of channel dispersion with the number of antennas.
The numerical software used to compute the achievability bounds, dispersion and normal approximation in this work can be found online under the Spectre project .
The channel model considered in this paper is the frequency-nonselective coherent real block fading discrete-time channel with multiple transmit and receive antennas (MIMO); see  for background on this model. We refer to it simply as the MIMO-BF channel and define it formally as follows. Let be the number of transmit antennas, be the number of receive antennas, and be the coherence time of the channel. The input-output relation at block (spanning time instants to ) with is given by
where is a matrix-valued random fading process, is a matrix channel input, is a Gaussian random real-valued matrix with independent entries of zero mean and unit variance, and is the matrix-valued channel output. The process is assumed to be i.i.d. with isotropic distribution , i.e. for any orthogonal matrices and , both and are equal in distribution to . We also assume that
to avoid trivialities. Note that due to merging channel inputs at time instants into one matrix-input, the block-fading channel becomes memoryless. We assume coherent demodulation so that the channel state information (CSI) is fully known to the receiver (CSIR).
An code of blocklength , probability of error and power-constraint is a pair of maps: the encoder and the decoder satisfying the probability of error constraint
on the probability space
where the message is uniformly distributed on , , is as described in (Equation 3), and . In addition the input sequences are required to satisfy the power constraint:
where is the Frobenius norm of the matrix .
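The channel model and the power constraint can be illustrated with a short simulation. The sketch below assumes that the input-output relation has the usual form Y_j = H_j X_j + Z_j, with H_j of size n_r x n_t and X_j of size n_t x T, and that the power constraint reads sum_j ||X_j||_F^2 <= n T P; neither formula is reproduced in this text. The names n_t, n_r, T, P, n and all helper functions are hypothetical, and i.i.d. standard Gaussian fading is used only as one example of an isotropic distribution.

```python
import random


def gaussian_matrix(rows, cols, rng):
    """Matrix with independent zero-mean, unit-variance Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def frobenius_sq(m):
    """Squared Frobenius norm: sum of squared entries."""
    return sum(x * x for row in m for x in row)


def channel_block(x, n_r, rng):
    """One coherence block (assumed model): draw H, add noise, return (H, Y = H X + Z)."""
    n_t, t = len(x), len(x[0])
    h = gaussian_matrix(n_r, n_t, rng)
    z = gaussian_matrix(n_r, t, rng)
    hx = matmul(h, x)
    y = [[hx[i][j] + z[i][j] for j in range(t)] for i in range(n_r)]
    return h, y  # the receiver sees both H and Y (CSIR)


def satisfies_power_constraint(blocks, t, p):
    """Assumed constraint: sum over blocks of ||X_j||_F^2 <= n * T * P."""
    return sum(frobenius_sq(x) for x in blocks) <= len(blocks) * t * p


rng = random.Random(0)
n_t, n_r, T, P, n = 2, 3, 4, 8.0, 5
amp = (P / n_t) ** 0.5  # every entry has magnitude sqrt(P / n_t) = 2.0
blocks = [[[rng.choice((-amp, amp)) for _ in range(T)] for _ in range(n_t)]
          for _ in range(n)]
outputs = [channel_block(x, n_r, rng) for x in blocks]
print(satisfies_power_constraint(blocks, T, P))
```

Because each block is processed with an independent draw of H, the sequence of matrix inputs and outputs forms a memoryless channel, as noted above.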
where is the capacity of the additive white Gaussian noise (AWGN) channel with SNR , is the minimum of the transmit and receive antennas, and are eigenvalues of . Note that it is common to think that as the expression ( ?) scales as , but this is only true if .
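A Monte Carlo estimate makes the capacity expression concrete. The sketch below assumes that the capacity formula has the form C(P) = E[ sum_i C_AWGN((P / n_t) * lambda_i) ], with lambda_i the eigenvalues of H H^T and C_AWGN(P) = (1/2) log(1 + P) for the real AWGN channel; the formula itself is not reproduced in this text. The sum over eigenvalues is computed through the equivalent log-determinant (1/2) log det(I + (P / n_t) H H^T). All function names are hypothetical, and i.i.d. standard Gaussian fading is an illustrative choice only.

```python
import math
import random


def awgn_capacity(snr):
    """Capacity of the real AWGN channel, in nats per channel use."""
    return 0.5 * math.log1p(snr)


def logdet_spd(a):
    """log det of a symmetric positive definite matrix via Cholesky factorization."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
                total += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return total


def rate_given_h(h, p):
    """0.5 * log det(I + (P/n_t) H H^T), i.e. the sum of AWGN capacities over eigenmodes."""
    n_r, n_t = len(h), len(h[0])
    g = [[(1.0 if i == j else 0.0)
          + (p / n_t) * sum(h[i][k] * h[j][k] for k in range(n_t))
          for j in range(n_r)] for i in range(n_r)]
    return 0.5 * logdet_spd(g)


def capacity_monte_carlo(n_t, n_r, p, trials=20000, seed=0):
    """Average rate_given_h over i.i.d. N(0,1) fading matrices (an assumed fading law)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        h = [[rng.gauss(0.0, 1.0) for _ in range(n_t)] for _ in range(n_r)]
        acc += rate_given_h(h, p)
    return acc / trials


print(capacity_monte_carlo(n_t=2, n_r=2, p=10.0, trials=5000))
```

The log-determinant form avoids an explicit eigenvalue decomposition, which keeps the example free of external dependencies.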
The goal of this line of work is to characterize dispersion of the present channel. Since the channel is memoryless it is natural to expect, given the results in , that dispersion (for ) is given by
where we denoted (single -block) information density by
and is the capacity achieving output distribution (caod). Justification of (Equation 7) as the actual (operational) dispersion, appearing in the expansion of is by no means trivial and is the subject of this work.
Here we formally state the main results, then go into more detail in the following sections. Our first result is an achievability and partial converse bound for the MIMO-BF channel for fixed parameters .
This follows from Theorem ? and Theorem ? below.
The dispersion (Equation 7) is expressed as minimization over input distributions on that achieve capacity of the channel. As will be shown soon, there may be many different caids, but in typical MIMO scenarios, it turns out they all achieve the same dispersion, which can be in turn calculated for the simplest Telatar caid (i.i.d. Gaussian matrix ). The following theorem gives full details.
This is proved in Proposition ? below.
In the rank-1 case (e.g. for MISO systems), there is a multitude of caids, and the minimization problem in (Equation 7) is non-trivial. Quite surprisingly, for some values of , we show that the (essentially unique) minimizer is a full-rate orthogonal design. The latter were introduced into communication by Alamouti  and Tarokh et al . This shows a somewhat unexpected connection between schemes optimal from modulation-theoretic and information-theoretic points of view. The precise results are as follows.
This is the content of Proposition ? below.
Unfortunately, the quantity is generally unknown since the minimization in ( ?) is over the manifold of matrix-valued random variables. However, for many dimensions, the minimum can be found by invoking the Radon-Hurwitz theorem. We state this below to introduce the notation, and expand on it in Section 6.
The following theorem summarizes our current knowledge of .
Note that the function is monotonic in even values of (and is for odd), so that for any , there is a large enough such that , in which case an full rate orthogonal design achieves the optimal .
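To make the notion of a full-rate orthogonal design concrete, the following sketch builds the classical 2 x 2 and 4 x 4 real orthogonal designs and checks the defining property X X^T = (x_1^2 + ... + x_k^2) I. It assumes the usual definition of a real orthogonal design (entries equal to plus or minus the symbols, rows mutually orthogonal with equal norm), since the formal definition is not reproduced in this text. The 2 x 2 matrix is the real-valued analogue of Alamouti's scheme; the function names are hypothetical.

```python
def design_2x2(x1, x2):
    """Real 2 x 2 orthogonal design (real-valued analogue of Alamouti's scheme)."""
    return [[x1, x2],
            [-x2, x1]]


def design_4x4(x1, x2, x3, x4):
    """Classical real 4 x 4 full-rate orthogonal design."""
    return [[x1, x2, x3, x4],
            [-x2, x1, -x4, x3],
            [-x3, x4, x1, -x2],
            [-x4, -x3, x2, x1]]


def gram(m):
    """Gram matrix X X^T of the rows of m."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


def is_orthogonal_design(m, symbols, tol=1e-9):
    """Check X X^T = (sum of squared symbols) * I up to a tolerance."""
    energy = sum(s * s for s in symbols)
    g = gram(m)
    return all(abs(g[i][j] - (energy if i == j else 0.0)) <= tol
               for i in range(len(m)) for j in range(len(m)))


print(is_orthogonal_design(design_2x2(3, 4), (3, 4)))
print(is_orthogonal_design(design_4x4(1, 2, 3, 4), (1, 2, 3, 4)))
```

Both designs carry as many symbols as they occupy time slots, which is what "full rate" refers to here.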
3.1 Known results: capacity and capacity achieving output distribution
We review known results on the MIMO-BF channel. Since the channel is memoryless, the capacity is given by
It was shown by Telatar  that whenever distribution of is isotropic, the input given by
is a maximizer, resulting in the capacity formula (Equation 6). The distribution induced by a caid at the channel output is called the capacity achieving output distribution (caod). A classical fact is that while there can be many caids, the caod is unique, e.g. . Thus, from  we infer that the caod is given by
where is decomposed into columns .
3.2 Capacity achieving input distributions
When is of low rank there are many possible caids, and somewhat surprisingly for the case of rank-1 (e.g. for MISO) there are even jointly Gaussian caids. The next proposition characterizes all caids. Later, this characterization will be used to show that the dispersion-minimizing caid in Theorem ? is given by orthogonal designs (such as Alamouti’s coding), for dimensions where those exist.
We will rely repeatedly on the following observations:
if are two random vectors in then and for any we have
This is easy to show by computing characteristic functions.
If are two random vectors in independent of , then
This follows from the fact that the characteristic function of is nowhere zero.
For two matrices we have
This follows from the fact that a quadratic form that is zero everywhere on must have all coefficients equal to zero.
Part 1 (necessity). As discussed above, the caod is unique and given by (Equation 11). Thus is caid iff for -almost every we have
where is an matrix with iid entries (for sufficiency, just write with denoting differential entropy). We will argue next that this implies (under isotropy assumption on ) that
Let be an almost sure set of those for which (Equation 15) holds. Let denote the group of orthogonal matrices, with the topology inherited from . Let and for be countable dense subsets of and , respectively. (These exist since is a second-countable topological space). By isotropy of we have and therefore
Arguing by continuity and using density of and , this implies also
In particular, for any there must exist a choice of such that has the top row equal to . Choosing these in (Equation 17) and comparing distributions of top rows, we conclude (Equation 16) (after scaling by ).
Part 1 (sufficiency). Suppose . Then our goal is to show that (Equation 16) implies that is caid. To that end, it is sufficient to show that for all rank-1 . In the special case
the claim follows directly from (Equation 16). Every other rank-1 can be decomposed as for some matrix , and thus again we get , concluding the proof.
Parts 2 and 3 (necessity). From part 1 we have that for every we must have . Computing the expected square, we get
Thus, expressing the left-hand side in terms of rows as we get
and thus by (Equation 14) we conclude that for all :
Each entry of the matrices above is a quadratic form in and thus again by (Equation 14) we conclude ( ?)- ( ?). Part 3 is argued similarly with roles of and interchanged.
Parts 2 and 3 (sufficiency). When is (at most) rank-1, we have from part 1 that it is sufficient to show that . When is jointly zero-mean Gaussian, we have is zero-mean Gaussian and so we only need to check its second moment satisfies (Equation 18). But as we just argued (Equation 18) is equivalent to either ( ?)- ( ?) or ( ?)- ( ?).
Part 4. As in Part 1, there must exist such that (Equation 17) holds and . Thus, by choosing we can diagonalize and thus we conclude any pair of rows must be independent.
Part 5. This part is never used in subsequent parts of the paper, so we only sketch the argument and move the most technical part of the proof to the appendix (Section 8). Let . Then arguing as for (Equation 17) we conclude that is caid if and only if for any with we have
In other words, we have
It is easy to find a polynomial (in ) that vanishes on all matrices of rank (e.g. take the product of all minors). Then Proposition ? in the appendix (Section 8) constructs non-Gaussian satisfying (Equation 20) and hence (Equation 19).
3.3 Information density and its moments
It will be convenient to assume that the matrix is represented as
where are uniformly distributed on and , respectively,
where we assumed as in (Equation 21), so that is the -th column of , and have set , with representing the -th row of . The definition naturally extends to blocks of length additively:
We compute the (conditional) mean of information density to get
where we used the following simple fact:
Note that by additivity of across columns, it is sufficient to consider the case , for which the statement is clear from symmetry.
From (Equation 22), we have the form of the information density. First note that the information density over channel uses decomposes into a sum of independent terms,
As such, the variance conditioned on also decomposes as
from which ( ?) follows. Because the variance decomposes as a sum in (Equation 25), we focus on only computing for a single coherent block. Define
so that in notation from (Equation 22). With this, the quantity of interest is
where ( ?) follows from the identity
Below we show that and corresponds to ( ?), corresponds to ( ?), corresponds to ( ?), and corresponds to ( ?) and ( ?). We evaluate each term separately.
where ( ?) follows from noting that
Now, since is independent of by the rotational invariance assumption, we have that is independent of , since only depends on through its eigenvalues, see ( ?). Hence the outer expectation in (Equation 28) distributes over the term, and we obtain
which gives ( ?).
Next, in ( ?) becomes
For in ( ?), we obtain
(Equation 29) follows from taking the variance over .
( ?) follows from Lemma ? applied to , and adding and subtracting the term
Continuing with from ( ?),
(Equation 30) follows from taking the expectation over ,
( ?) follows from applying the variance identity (Equation 27) with respect to and , as well as recalling ( ?).
We are left to show that the term ( ?) equals ( ?). To that end, define
We will finish the proof by showing
To that end, we first compute moments of drawn from the Haar measure on the orthogonal group.
The proof of the lemma is given below.
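Low-order moments of a Haar-distributed orthogonal matrix are easy to check by simulation. The sketch below uses the fact that each row (or column) of such a matrix is marginally uniform on the unit sphere, so it samples a single row by normalizing a Gaussian vector. It compares empirical moments against the standard closed forms E[u_1^2] = 1/n, E[u_1^4] = 3/(n(n+2)) and E[u_1^2 u_2^2] = 1/(n(n+2)) for entries u_1, u_2 of the same row. The statement of the lemma is not reproduced in this text, so whether it lists exactly these moments is an assumption; the function names are hypothetical.

```python
import math
import random


def uniform_on_sphere(n, rng):
    """Sample a point uniformly on the unit sphere in R^n by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def sphere_moments(n, trials=20000, seed=0):
    """Empirical E[u1^2], E[u1^4], E[u1^2 u2^2] for u uniform on the sphere (requires n >= 2)."""
    rng = random.Random(seed)
    m2 = m4 = m22 = 0.0
    for _ in range(trials):
        u = uniform_on_sphere(n, rng)
        m2 += u[0] ** 2
        m4 += u[0] ** 4
        m22 += u[0] ** 2 * u[1] ** 2
    return m2 / trials, m4 / trials, m22 / trials


def exact_moments(n):
    """Closed-form values of the same three moments."""
    return 1.0 / n, 3.0 / (n * (n + 2)), 1.0 / (n * (n + 2))


n = 4
print("empirical:", sphere_moments(n))
print("exact:    ", exact_moments(n))
```

A quick consistency check on the closed forms: since the squared entries of a row sum to one, n E[u_1^4] + n(n-1) E[u_1^2 u_2^2] must equal 1, which the expressions above satisfy.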
First, note that the variance does not depend on , since the marginal distribution of each is uniform on the unit sphere. Hence below we only consider . We obtain
where denotes the -th row of . Now it is a matter of counting similar terms.
(Equation 33) follows from collecting like terms from the summation in ( ?).
( ?) uses Lemma ? to compute each expectation.
( ?) follows from realizing that
Plugging this back into (Equation 32) yields the variance term,
Now we compute the covariance term from (Equation 31) in a similar way. By symmetry of the columns of , we can consider only the covariance between and , i.e.
Expanding the expectation, we get
With this, we obtain from (Equation 35),
where the steps follow just as in the variance computation (Equation 33)- ( ?).
Plugging this into ( ?) finishes the proof.
We first note that all entries of have identical distribution, since permutations of rows and columns leave the distribution invariant. Because of this, we can WLOG only consider