# Coherent multiple-antenna block-fading channels at finite blocklength

## Abstract

In this paper we consider a channel model that is often used to describe the mobile wireless scenario: multiple-antenna additive white Gaussian noise channels subject to random (fading) gain with full channel state information at the receiver. The dynamics of the fading process are approximated by a piecewise-constant process (frequency non-selective isotropic block fading). This work addresses the finite blocklength fundamental limits of this channel model. Specifically, we give a formula for the channel dispersion, a quantity governing the delay required to achieve capacity. The multiplicative nature of the fading disturbance leads to a number of interesting technical difficulties that required us to enhance traditional methods for finding channel dispersion. Alas, one difficulty remains: the converse (impossibility) part of our result holds under an extra constraint on the growth of the peak power with blocklength.

Our results demonstrate, for example, that while the capacities of the $n_t \times n_r$ and $n_r \times n_t$ antenna configurations coincide (under fixed received power), the coding delay can be quite sensitive to this switch. For instance, at the received SNR of dB the system achieves capacity with codes of length (delay) which is only of the length required for the system. Another interesting implication is that for the MISO channel, the dispersion-optimal coding schemes require employing orthogonal designs such as Alamouti's scheme. This is a surprising observation, given that Alamouti's scheme was designed for reducing demodulation errors, not for improving the coding rate.

## 1 Introduction

Given a noisy communication channel, the maximal cardinality of a codebook of blocklength $n$ which can be decoded with block error probability no greater than $\epsilon$ is denoted as $M^*(n,\epsilon)$. Evaluation of this function, the fundamental performance limit of block coding, is alas computationally impossible for most channels of interest. To address this difficulty, [3] proposed a closed-form normal approximation based on the asymptotic expansion

$$\log M^*(n,\epsilon) = nC - \sqrt{nV}\,Q^{-1}(\epsilon) + O(\log n), \tag{1}$$

where the capacity $C$ and the dispersion $V$ are two intrinsic characteristics of the channel and $Q^{-1}(\epsilon)$ is the inverse of the $Q$-function $Q(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$. One immediate consequence of the normal approximation is an estimate for the minimal blocklength (delay) required to achieve a given fraction $\eta$ of channel capacity:

$$n \gtrsim \left(\frac{Q^{-1}(\epsilon)}{1-\eta}\right)^2 \frac{V}{C^2}. \tag{2}$$
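As a quick numerical illustration of (Equation 1) and (Equation 2), the following Python sketch evaluates the normal approximation (dropping the $O(\log n)$ remainder) and the resulting blocklength estimate. The function names are hypothetical, and the values of $C$ and $V$ are placeholders supplied by the caller, not results from this paper. The logarithm base used for $C$ must match the units of $V$ (e.g., both in nats).

```python
import math
from statistics import NormalDist


def q_inv(eps: float) -> float:
    """Inverse of the Gaussian Q-function Q(x) = P[N(0,1) > x]."""
    return NormalDist().inv_cdf(1.0 - eps)


def normal_approx_log_m(n: float, capacity: float, dispersion: float, eps: float) -> float:
    """n*C - sqrt(n*V)*Q^{-1}(eps), i.e. (1) without the O(log n) remainder."""
    return n * capacity - math.sqrt(n * dispersion) * q_inv(eps)


def min_blocklength(capacity: float, dispersion: float, eps: float, eta: float) -> float:
    """Blocklength estimate (2) for reaching a fraction eta of capacity at error eps."""
    return (q_inv(eps) / (1.0 - eta)) ** 2 * dispersion / capacity ** 2


if __name__ == "__main__":
    C, V = 1.0, 1.0  # placeholder values, not taken from the paper
    n = min_blocklength(C, V, eps=1e-3, eta=0.9)
    print(f"n ~ {n:.0f}, rate at n: {normal_approx_log_m(n, C, V, 1e-3) / n:.3f}")
```

With $C = V = 1$, $\epsilon = 10^{-3}$ and $\eta = 0.9$, this gives $n \approx 955$. By construction, the normal-approximation rate at that blocklength is exactly $\eta C$.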
Asymptotic expansions such as (Equation 1) are rooted in the central limit theorem. They were classically known for discrete memoryless channels [4] and have since been extended in a wide variety of directions (see the surveys in [6]).

The fading channel is the centerpiece of the theory and practice of wireless communication, and hence there are many slightly different variations of the model: differing assumptions on the dynamics and distribution of the fading process, antenna configurations, and channel state knowledge. The capacity of the fading channel was found independently by Telatar [8] and by Foschini and Gans [9] for the case of Rayleigh fading with channel state information available at the receiver only (CSIR) and at both the transmitter and receiver (CSIRT). Motivated by the linear gains promised by these capacity results, space-time codes were introduced to exploit multiple antennas. Most notable among them are Alamouti's ingenious orthogonal scheme [10] and its generalization by Tarokh, Jafarkhani and Calderbank [11]. Motivated by the recent surge of orthogonal frequency-division multiplexing (OFDM) technology, this paper focuses on an isotropic channel gain distribution that is piecewise independent ("block fading"), and assumes full channel state information at the receiver (CSIR). This work describes the finite blocklength effects that fading imposes on the fundamental communication limits.

Prior work on related questions includes the following. Single-antenna channel dispersion was computed in [12] for a more general stationary channel gain process with memory. In [13], finite-blocklength effects are explored for the non-coherent block-fading setup. Quasi-static fading channels in the general MIMO setting have been thoroughly investigated in [14], showing that the expansion (Equation 1) changes dramatically (in particular, the channel dispersion term becomes zero); see also [15] for evaluation of the bounds. The coherent quasi-static channel has been studied in the limit of infinitely many antennas in [16], appealing to concentration properties of random matrices. Dispersion for lattices (infinite constellations) in fading channels has been investigated in a sequence of works; see [17] and references therein. Note also that there are some very fine differences between stationary and block-fading channel models, cf. [18]. The minimum energy required to send a given number of bits over the MIMO channel was studied in [19] for both the coherent and non-coherent cases, showing that the latter requires latencies that are orders of magnitude larger. In [20], the problem of power control with an average power constraint on the codebook is investigated for the quasi-static fading channel with perfect CSIRT. A novel achievability bound was found and evaluated for the fading channel with CSIR in [21]. Parts of this work have previously appeared in [1].

The paper is organized as follows. In Section 2 we describe the channel model and formally state all our main results. Section 3 characterizes the capacity-achieving input and output distributions (caid and caod, respectively) and evaluates moments of the information density. Sections 4 and 5 then prove the achievability and converse parts of our (non-rank-1) results, respectively. Section 6 focuses on the special case in which the matrix of channel gains has rank 1. Finally, Section 7 discusses numerical results and the behavior of channel dispersion with the number of antennas.

The numerical software used to compute the achievability bounds, dispersion and normal approximation in this work can be found online under the Spectre project [22].

## 2 Main Results

The channel model considered in this paper is the frequency-nonselective coherent real block-fading discrete-time channel with multiple transmit and receive antennas (MIMO) (see [23] for background on this model). We will simply refer to it as the MIMO-BF channel, which we formally define as follows. Let $n_t$ be the number of transmit antennas, $n_r$ the number of receive antennas, and $T$ the coherence time of the channel. The input-output relation at block $j$ (spanning time instants $(j-1)T+1$ to $jT$), with $j = 1, \ldots, n$, is given by

$$Y_j = H_j X_j + Z_j, \tag{3}$$

where $\{H_j\}$ is an $n_r \times n_t$ matrix-valued random fading process, $X_j$ is an $n_t \times T$ matrix channel input, $Z_j$ is an $n_r \times T$ Gaussian random real-valued matrix with independent entries of zero mean and unit variance, and $Y_j$ is the $n_r \times T$ matrix-valued channel output. The process $\{H_j\}$ is assumed to be i.i.d. with isotropic distribution $P_H$, i.e., for any orthogonal matrices $A \in \mathbb{R}^{n_r \times n_r}$ and $B \in \mathbb{R}^{n_t \times n_t}$, both $AH$ and $HB$ are equal in distribution to $H$. We also assume that

$$\mathbb{P}[H \neq 0] > 0 \tag{4}$$

to avoid trivialities. Note that by merging the channel inputs within each coherence block into one matrix input, the block-fading channel becomes memoryless. We assume coherent demodulation, so that the channel state information (CSI) is fully known to the receiver (CSIR).
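The input-output relation (Equation 3) and the memorylessness across blocks can be made concrete with a short NumPy simulation. This is a sketch under stated assumptions. The function name is hypothetical. Rayleigh fading (i.i.d. $\mathcal{N}(0,1)$ entries of $H_j$, which is one isotropic choice) and an i.i.d. Gaussian input are assumed only for illustration, and any isotropic $P_H$ could be substituted.

```python
import numpy as np


def mimo_bf_channel(x_blocks: np.ndarray, n_r: int, rng: np.random.Generator):
    """Simulate n coherence blocks of the real MIMO-BF channel Y_j = H_j X_j + Z_j.

    x_blocks has shape (n, n_t, T): one n_t x T input matrix per block.
    Returns (y_blocks, h_blocks); H_j is returned because the receiver knows it (CSIR).
    Assumption (illustration only): H_j has i.i.d. N(0, 1) entries (Rayleigh fading).
    """
    n, n_t, T = x_blocks.shape
    h = rng.standard_normal((n, n_r, n_t))  # constant within a block, i.i.d. across blocks
    z = rng.standard_normal((n, n_r, T))    # i.i.d. unit-variance real Gaussian noise
    y = h @ x_blocks + z                    # batched product: one H_j X_j per block
    return y, h


rng = np.random.default_rng(0)
n, n_t, n_r, T, P = 1000, 2, 3, 4, 10.0
x = rng.normal(0.0, np.sqrt(P / n_t), size=(n, n_t, T))  # assumed i.i.d. Gaussian input
y, h = mimo_bf_channel(x, n_r, rng)
```

Each block draws a fresh $H_j$ and fresh noise, so distinct blocks are independent given the inputs, while all $T$ columns inside a block share the same $H_j$.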

An code of blocklength , probability of error and power-constraint is a pair of maps: the encoder and the decoder satisfying the probability of error constraint

on the probability space

where the message is uniformly distributed on , , is as described in (Equation 3), and . In addition the input sequences are required to satisfy the power constraint:

where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.

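Under the isotropy assumption on $P_H$, the capacity of this channel appearing in (Equation 1) is given by [8]

$$C(P) = \sum_{i=1}^{n_{\min}} \mathbb{E}\left[C_{AWGN}\!\left(\frac{P}{n_t}\Lambda_i^2\right)\right], \tag{6}$$

where $C_{AWGN}(P) = \frac{1}{2}\log(1+P)$ is the capacity of the additive white Gaussian noise (AWGN) channel with SNR $P$, $n_{\min} = \min(n_t, n_r)$ is the minimum of the numbers of transmit and receive antennas, and $\Lambda_1^2, \ldots, \Lambda_{n_{\min}}^2$ are the $n_{\min}$ largest eigenvalues of $H^T H$ (the squared singular values of $H$). Note that it is common to think that, as $P \to \infty$, the expression (Equation 6) scales as $\frac{n_{\min}}{2}\log P$, but this is only true if $\mathrm{rank}\, H = n_{\min}$ almost surely.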
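Once a fading law is fixed, the capacity formula (Equation 6) is straightforward to evaluate by Monte Carlo. The sketch below assumes Rayleigh fading (i.i.d. $\mathcal{N}(0,1)$ entries of $H$, which is isotropic) and reports capacity in nats per channel use. The function name and sample size are illustrative choices. Summing $\frac{1}{2}\log(1 + \frac{P}{n_t}\lambda)$ over all $n_t$ eigenvalues $\lambda$ of $H^T H$ is harmless, because the zero eigenvalues contribute nothing.

```python
import numpy as np


def capacity_rayleigh_mc(P: float, n_t: int, n_r: int, num_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of C(P) in (6), in nats per channel use, for i.i.d. N(0,1) fading."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((num_samples, n_r, n_t))
    # Eigenvalues of H^T H (an n_t x n_t matrix); clip tiny negative round-off values.
    lam2 = np.clip(np.linalg.eigvalsh(np.swapaxes(h, -1, -2) @ h), 0.0, None)
    # C_AWGN(s) = 0.5 * log(1 + s), applied to s = (P / n_t) * Lambda_i^2 and summed over i.
    return float(np.mean(np.sum(0.5 * np.log1p(P / n_t * lam2), axis=-1)))


print(capacity_rayleigh_mc(P=100.0, n_t=2, n_r=2))
```

An equivalent route is $\frac{1}{2}\mathbb{E}\log\det(I_{n_r} + \frac{P}{n_t}HH^T)$, which agrees sample by sample by Sylvester's determinant identity.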

The goal of this line of work is to characterize the dispersion of the present channel. Since the channel is memoryless, it is natural to expect, given the results in [3], that the dispersion (for $\epsilon < 1/2$) is given by

where we denoted the (single $T$-block) information density by



taken with respect to the capacity-achieving output distribution (caod). Justifying (Equation 7) as the actual (operational) dispersion, i.e., the one appearing in the expansion (Equation 1), is by no means trivial and is the subject of this work.

Here we formally state the main results, and then go into more detail in the following sections. Our first result is an achievability and partial converse bound for the MIMO-BF channel with the system parameters held fixed.

This follows from Theorem ? and Theorem ? below.

The dispersion (Equation 7) is expressed as a minimization over input distributions on $\mathbb{R}^{n_t \times T}$ that achieve the capacity of the channel. As will be shown shortly, there may be many different caids, but in typical MIMO scenarios they all turn out to achieve the same dispersion, which can in turn be calculated for the simplest Telatar caid (i.i.d. Gaussian matrix $X$). The following theorem gives full details.

This is proved in Proposition ? below.

In the rank-1 case (e.g., for MISO systems), there is a multitude of caids, and the minimization problem in (Equation 7) is non-trivial. Quite surprisingly, for some values of the system dimensions, we show that the (essentially unique) minimizer is a full-rate orthogonal design. Such designs were introduced into communication by Alamouti [10] and Tarokh et al. [11]. This reveals a somewhat unexpected connection between schemes that are optimal from the modulation-theoretic and from the information-theoretic points of view. The precise results are as follows.
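Because the channel here is real, the relevant objects are real orthogonal designs. The 2 × 2 real analogue of Alamouti's scheme (the original [10] is complex and uses conjugation) sends two symbols over two time slots, i.e., at full rate. The sketch below uses a hypothetical helper name, with rows indexing transmit antennas and columns indexing time, to match the $n_t \times T$ input convention. It only checks the defining orthogonality property $XX^T = (x_1^2 + x_2^2) I_2$; dispersion optimality is the subject of the results stated in this section, not something this snippet demonstrates.

```python
import numpy as np


def real_alamouti(x1: float, x2: float) -> np.ndarray:
    """2 x 2 real orthogonal design: row = transmit antenna, column = time slot."""
    return np.array([[x1, -x2],
                     [x2,  x1]])


rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(2)
X = real_alamouti(x1, x2)
# Rows are orthogonal with equal energy: X X^T = (x1^2 + x2^2) I_2.
print(np.allclose(X @ X.T, (x1**2 + x2**2) * np.eye(2)))
```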

This is the content of Proposition ? below.

Unfortunately, the quantity is generally unknown since the minimization in ( ?) is over the manifold of matrix-valued random variables. However, for many dimensions, the minimum can be found by invoking the Radon-Hurwitz theorem. We state this below to introduce the notation, and expand on it in Section 6.

The following theorem summarizes our current knowledge of .

Note that the function is monotonic in even values of (and is for odd), so that for any , there is a large enough such that , in which case an full rate orthogonal design achieves the optimal .

## 3 Preliminary results

### 3.1 Known results: capacity and capacity achieving output distribution

We review known results on the MIMO-BF channel. Since the channel is memoryless, the capacity is given by

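It was shown by Telatar [8] that whenever the distribution of $H$ is isotropic, the input given by

$$X_{ab} \overset{\text{i.i.d.}}{\sim} \mathcal{N}\!\left(0, \frac{P}{n_t}\right), \qquad 1 \le a \le n_t,\ 1 \le b \le T, \tag{10}$$

is a maximizer, resulting in the capacity formula (Equation 6). The distribution induced by a caid at the channel output is called the capacity-achieving output distribution (caod). A classical fact is that while there can be many caids, the caod is unique; see, e.g., [24]. Thus, from [8] we infer that the caod is given by

$$P^*_{Y|H=h} = \prod_{t=1}^{T} \mathcal{N}\!\left(0,\, I_{n_r} + \frac{P}{n_t} h h^T\right), \tag{11}$$

where $Y = [Y_1, \ldots, Y_T]$ is decomposed into columns $Y_t \in \mathbb{R}^{n_r}$, and the $t$-th factor is the law of $Y_t$ given $H = h$.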
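The form (Equation 11) of the caod can be sanity-checked numerically. Under the i.i.d. Gaussian input with per-entry variance $P/n_t$ (the normalization assumed in this sketch), each output column given $H = h$ should be zero-mean Gaussian with covariance $I_{n_r} + \frac{P}{n_t}hh^T$. The helper below is hypothetical. It compares an empirical covariance, computed from many independent columns for one fixed $h$, with that formula.

```python
import numpy as np


def empirical_output_cov(h: np.ndarray, P: float, num_cols: int, rng: np.random.Generator) -> np.ndarray:
    """Empirical covariance of Y_t = h X_t + Z_t with X_t ~ N(0, (P/n_t) I), Z_t ~ N(0, I)."""
    n_r, n_t = h.shape
    x = rng.normal(0.0, np.sqrt(P / n_t), size=(n_t, num_cols))  # Telatar-type input columns
    y = h @ x + rng.standard_normal((n_r, num_cols))             # output columns for fixed h
    return (y @ y.T) / num_cols                                  # zero-mean, so no centering


rng = np.random.default_rng(1)
n_t, n_r, P = 3, 2, 4.0
h = rng.standard_normal((n_r, n_t))          # one fixed fading realization
emp = empirical_output_cov(h, P, 200_000, rng)
theory = np.eye(n_r) + (P / n_t) * h @ h.T   # covariance predicted by (11)
print(np.round(emp, 2), np.round(theory, 2), sep="\n")
```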

### 3.2 Capacity achieving input distributions

When $H$ is of low rank there are many possible caids, and, somewhat surprisingly, in the rank-1 case (e.g., for MISO) there are even jointly Gaussian caids other than the i.i.d. one. The next proposition characterizes all caids. Later, this characterization will be used to show that the dispersion-minimizing caid in Theorem ? is given by orthogonal designs (such as Alamouti's scheme) for dimensions in which those exist.

We will rely repeatedly on the following observations:

1. If are two random vectors in then and for any we have

This is easy to show by computing characteristic functions.

2. If are two random vectors in independent of , then

This follows from the fact that the characteristic function of is nowhere zero.

3. For two matrices we have

This follows from the fact that a quadratic form that vanishes everywhere on must have all its coefficients equal to zero.
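Observation 3 rests on the fact that a quadratic form determines its symmetric coefficient matrix. The exact statement of the observation is not reproduced here, so the following sketch (a hypothetical helper, in Python with NumPy) only illustrates that underlying fact. It recovers $S$ in $q(v) = v^T S v$ from the values of $q$ at $e_i$ and $e_i + e_j$. In particular, a form vanishing everywhere has $S = 0$, and two matrices defining the same quadratic form have the same symmetric part.

```python
import numpy as np


def quad_form_coeffs(q, d: int) -> np.ndarray:
    """Recover the symmetric matrix S with q(v) = v^T S v from evaluations of q."""
    e = np.eye(d)
    s = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            # q(e_i + e_j) = S_ii + S_jj + 2 S_ij, hence the formula below (also valid for i == j).
            s[i, j] = (q(e[i] + e[j]) - q(e[i]) - q(e[j])) / 2.0
    return s


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
S = quad_form_coeffs(lambda v: v @ A @ v, 4)
print(np.allclose(S, (A + A.T) / 2))  # only the symmetric part of A is visible to the form
```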

Part 1 (necessity). As discussed, the caod is unique and given by (Equation 11). Thus $P_X$ is a caid iff for $P_H$-almost every $h$ we have

where is an matrix with iid entries (for sufficiency, just write with denoting differential entropy). We will argue next that this implies (under isotropy assumption on ) that

From (Equation 12), (Equation 16) is equivalent to .

Let be an almost sure set of those for which (Equation 15) holds. Let denote the group of orthogonal matrices, with the topology inherited from . Let and for be countable dense subsets of and , respectively. (These exist since is a second-countable topological space). By isotropy of we have and therefore

is also almost sure: . By assumption (Equation 4), must contain a non-zero element, and also for all . From (Equation 13) we conclude that

Arguing by continuity and using density of and , this implies also

In particular, for any there must exist a choice of such that has the top row equal to . Choosing these in (Equation 17) and comparing distributions of top rows, we conclude (Equation 16) (after scaling by ).

Part 1 (sufficiency). Suppose . Then our goal is to show that (Equation 16) implies that is caid. To that end, it is sufficient to show that for all rank-1 . In the special case

the claim follows directly from (Equation 16). Every other rank-1 can be decomposed as for some matrix , and thus again we get , concluding the proof.

Parts 2 and 3 (necessity). From part 1 we have that for every we must have . Computing expected square we get

Thus, expressing the left-hand side in terms of rows as we get

and thus by (Equation 14) we conclude that for all :

Each entry of the matrices above is a quadratic form in and thus again by (Equation 14) we conclude ( ?)- ( ?). Part 3 is argued similarly with roles of and interchanged.

Parts 2 and 3 (sufficiency). When is (at most) rank-1, we have from part 1 that it is sufficient to show that . When is jointly zero-mean Gaussian, we have is zero-mean Gaussian and so we only need to check its second moment satisfies (Equation 18). But as we just argued (Equation 18) is equivalent to either ( ?)- ( ?) or ( ?)- ( ?).

Part 4. As in Part 1, there must exist such that (Equation 17) holds and . Thus, by choosing we can diagonalize and thus we conclude any pair of rows must be independent.

Part 5. This part is never used in subsequent parts of the paper, so we only sketch the argument and move the most technical part of the proof to Appendix Section 8. Let . Then arguing as for (Equation 17) we conclude that is caid if and only if for any with we have

In other words, we have

If , then the rank condition on is not active, and hence we conclude by (Equation 12) that . So assume . Note that (Equation 19) is equivalent to the following condition on the characteristic function of :

It is easy to find a polynomial (in ) that vanishes on all matrices of rank (e.g., take the product of all minors). Then Proposition ? in Appendix Section 8 constructs a non-Gaussian satisfying (Equation 20) and hence (Equation 19).

### 3.3 Information density and its moments

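It will be convenient to assume that the matrix $H$ is represented as

$$H = U \Lambda V^T, \tag{21}$$

where $U$ and $V$ are uniformly (Haar) distributed on $O(n_r)$ and $O(n_t)$, respectively, and $\Lambda$ is the $n_r \times n_t$ diagonal matrix with diagonal entries $\Lambda_i$, $i = 1, \ldots, \min(n_t, n_r)$. The joint distribution of the $\Lambda_i$ depends on the fading model. It does not matter for our analysis whether the $\Lambda_i$ are sorted in some way or permutation-invariant.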
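One way to generate an isotropic fading matrix in the form (Equation 21) is to draw $U$ and $V$ from the Haar measure. A standard recipe, assumed here, is to take the QR decomposition of a Gaussian matrix and absorb the signs of $R$'s diagonal into $Q$, so that the result is exactly Haar distributed. The singular values are supplied by the caller, since their joint law depends on the fading model. The function names are hypothetical.

```python
import numpy as np


def haar_orthogonal(n: int, rng: np.random.Generator) -> np.ndarray:
    """Sample an n x n orthogonal matrix from the Haar measure on O(n)."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # flip column j by sign(R_jj) to remove QR's sign bias


def isotropic_fading(singular_values, n_r: int, n_t: int, rng: np.random.Generator) -> np.ndarray:
    """Return H = U Lambda V^T with U, V Haar on O(n_r), O(n_t) and given singular values."""
    lam = np.zeros((n_r, n_t))
    k = len(singular_values)  # at most min(n_r, n_t)
    lam[np.arange(k), np.arange(k)] = singular_values
    return haar_orthogonal(n_r, rng) @ lam @ haar_orthogonal(n_t, rng).T


rng = np.random.default_rng(0)
H = isotropic_fading([2.0, 0.5], n_r=2, n_t=3, rng=rng)
print(np.linalg.eigvalsh(H.T @ H))  # approx. [0, 0.25, 4]: the rotations leave the spectrum intact
```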

For the MIMO-BF channel, let the caod be as in (Equation 11). Then the information density with respect to the caod (for a single $T$-block of symbols), as defined in (Equation 8), becomes

where we assumed the representation (Equation 21) of $H$, so that is the -th column of , and have set , with representing the -th row of . The definition extends additively to a sequence of blocks:

We compute the (conditional) mean of the information density to get

where we used the following simple fact:

Note that by additivity of across columns, it is sufficient to consider the case , for which the statement is clear from symmetry.

Recall the form (Equation 22) of the information density. First note that the information density over all channel uses decomposes into a sum of independent terms,

As such, the variance conditioned on also decomposes as

from which ( ?) follows. Because the variance decomposes as a sum in (Equation 25), it suffices to compute the variance for a single coherence block. Define

so that in notation from (Equation 22). With this, the quantity of interest is

where ( ?) follows from the identity

Below we show that and corresponds to ( ?), corresponds to ( ?), corresponds to ( ?), and corresponds to ( ?) and ( ?). We evaluate each term separately.

where ( ?) follows from noting that

Now, since is independent of by the rotational invariance assumption, we have that is independent of , since depends on only through its eigenvalues; see ( ?). Hence the outer expectation in (Equation 28) distributes over the term, and we obtain

which gives ( ?).

Next, in ( ?) becomes

For in ( ?), we obtain

where

• (Equation 29) follows from taking the variance over .

• ( ?) follows from Lemma ? applied to , and adding and subtracting the term

Continuing with from ( ?),

where

• (Equation 30) follows from taking the expectation over ,

• ( ?) follows from applying the variance identity (Equation 27) with respect to and , as well as recalling ( ?).

It remains to show that the term ( ?) equals ( ?). To that end, define

We will finish the proof by showing

To that end, we first compute moments of drawn from the Haar measure on the orthogonal group.

Proof of Lemma is given below.
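The lemma's statement on Haar moments did not survive extraction here. As an independent sanity check, the sketch below estimates by Monte Carlo a few fourth-order moments of a Haar-distributed $U \in O(n)$ and compares them with their standard closed forms. These closed forms are quoted from the general theory of Haar integrals, not from the lemma: $\mathbb{E}[U_{11}^4] = \frac{3}{n(n+2)}$, $\mathbb{E}[U_{11}^2U_{12}^2] = \frac{1}{n(n+2)}$, $\mathbb{E}[U_{11}^2U_{22}^2] = \frac{n+1}{(n-1)n(n+2)}$ and $\mathbb{E}[U_{11}U_{12}U_{21}U_{22}] = -\frac{1}{(n-1)n(n+2)}$. Indices are 0-based in the code, and the function names are hypothetical.

```python
import numpy as np


def haar_orthogonal(n: int, rng: np.random.Generator) -> np.ndarray:
    """Haar-distributed n x n orthogonal matrix (QR of a Gaussian matrix with sign fix)."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))


def haar_fourth_moments(n: int, num_samples: int = 20_000, seed: int = 0) -> dict:
    """Monte Carlo estimates of selected fourth-order moments of Haar U on O(n), n >= 2."""
    rng = np.random.default_rng(seed)
    u = np.stack([haar_orthogonal(n, rng) for _ in range(num_samples)])
    return {
        "u11^4": np.mean(u[:, 0, 0] ** 4),
        "u11^2 u12^2": np.mean(u[:, 0, 0] ** 2 * u[:, 0, 1] ** 2),
        "u11^2 u22^2": np.mean(u[:, 0, 0] ** 2 * u[:, 1, 1] ** 2),
        "u11 u12 u21 u22": np.mean(u[:, 0, 0] * u[:, 0, 1] * u[:, 1, 0] * u[:, 1, 1]),
    }


def haar_fourth_moments_exact(n: int) -> dict:
    """Standard closed forms of the same moments."""
    d = n * (n + 2)
    return {
        "u11^4": 3 / d,
        "u11^2 u12^2": 1 / d,
        "u11^2 u22^2": (n + 1) / ((n - 1) * d),
        "u11 u12 u21 u22": -1 / ((n - 1) * d),
    }


n = 3
est, exact = haar_fourth_moments(n), haar_fourth_moments_exact(n)
for key in exact:
    print(f"{key:>16}: MC {est[key]:+.4f}   exact {exact[key]:+.4f}")
```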

First, note that the variance does not depend on , since the marginal distribution of each is uniform on the unit sphere. Hence below we only consider . We obtain

where denotes the -th row of . Now it is a matter of counting similar terms.

where

• (Equation 33) follows from collecting like terms from the summation in ( ?).

• ( ?) uses Lemma ? to compute each expectation.

• ( ?) follows from realizing that

Plugging this back into (Equation 32) yields the variance term,

Now we compute the covariance term from (Equation 31) in a similar way. By symmetry of the columns of , we can consider only the covariance between and , i.e.

Expanding the expectation, we get

With this, we obtain from (Equation 35),

where the steps follow just as in the variance computation (Equation 33)- ( ?).

Finally, returning to (Equation 31), using the variance (Equation 34) and covariance (Equation 37), we obtain

Plugging this into ( ?) finishes the proof.

We first note that all entries of have identical distribution, since permutations of rows and columns leave the distribution invariant. Because of this, we can WLOG only consider