
# On the Capacity of Vector Gaussian Channels With Bounded Inputs

Borzoo Rassouli and Bruno Clerckx

Borzoo Rassouli is with the Intelligent Systems and Networks group, Department of Electrical and Electronic Engineering, Imperial College London, United Kingdom (email: b.rassouli12@imperial.ac.uk). Bruno Clerckx is with the Communication and Signal Processing group, Department of Electrical and Electronic Engineering, Imperial College London, and the School of Electrical Engineering, Korea University, Korea (email: b.clerckx@imperial.ac.uk). This paper was presented in part at the IEEE International Conference on Communications (ICC) 2015, London, UK. This work was partially supported by the Seventh Framework Programme for Research of the European Commission under grant number HARP-318489.
###### Abstract

The capacity of a deterministic multiple-input multiple-output (MIMO) channel under peak and average power constraints is investigated. For the identity channel matrix, the approach of Shamai et al. is generalized to higher dimension settings to derive the necessary and sufficient conditions for the optimal input probability density function. This approach avoids invoking the identity theorem for holomorphic functions of several complex variables, which seems to fail in multi-dimensional scenarios. It is proved that the support of the capacity-achieving distribution is a finite set of hyper-spheres with mutually independent phases and amplitude in the spherical domain. Subsequently, it is shown that when the average power constraint is relaxed, if the number of antennas is large enough, the capacity has a closed-form solution and constant amplitude signaling at the peak power achieves it. Moreover, it is observed that in a discrete-time memoryless Gaussian channel, the average-power-constrained capacity, which results from a Gaussian input distribution, can be closely approached by an input whose magnitude is supported on a discrete finite set. Finally, we investigate some upper and lower bounds for the capacity of the non-identity channel matrix and evaluate their performance as a function of the condition number of the channel.

Vector Gaussian channel, peak power constraint, discrete magnitude, spherical symmetry

## I Introduction

The capacity of a point-to-point communication system subject to peak and average power constraints was investigated in [1] for the scalar Gaussian channel, where it was shown that the capacity-achieving distribution is unique and has a probability mass function with a finite number of mass points. In [2], Shamai and Bar-David gave a full account of the capacity of a quadrature Gaussian channel under the aforementioned constraints and proved that the optimal input distribution has a discrete amplitude and a uniform independent phase. This discreteness in the optimal input distribution was surprisingly shown in [3] to be true even without a peak power constraint for the Rayleigh-fading channel when no channel state information (CSI) is assumed either at the receiver or the transmitter. Following this work, the authors in [4] and [5] investigated the capacity of noncoherent AWGN and Rician-fading channels, respectively. In [6], a point-to-point real scalar channel is considered in which sufficient conditions for the additive noise are provided such that the support of the optimal bounded input has a finite number of mass points. These sufficient conditions are also useful in multi-user settings, as shown in [7] for the MAC channel under bounded inputs.

The analysis of the MIMO channel under peak power constraints per antenna is a straightforward problem after changing the vector channel into parallel AWGN channels and applying the results of [1] or [2]. Recently, the vector Gaussian channel under peak and average power constraints has become more practical with the new scheme proposed in [8]. More specifically, this scheme enables multiple antenna transmission using only one RF chain, and the peak power constraint (i.e., a peak constraint on the norm of the input vector rather than on each antenna separately) is the very result of this single RF chain. The capacity of the vector Gaussian channel under peak and average power constraints has been explored in [9] and [10]. However, according to [11], it seems that the results in the higher dimension settings are not rigorous, due to the usage of the identity theorem for holomorphic functions of several complex variables without fulfilling its conditions. As shown by an example in section IV of [11], a holomorphic function of several complex variables can be zero on $\mathbb{R}^n$ without being identically zero on $\mathbb{C}^n$. Since $\mathbb{R}^n$ is not an open subset of $\mathbb{C}^n$, the identity theorem cannot be applied. To address this problem, the contributions of this paper are as follows.

• For the identity channel matrix, the approach of [2] is generalized to the vector Gaussian channel, in which the complex extension is done only on a single variable, namely the amplitude of the input in spherical coordinates. The necessary and sufficient conditions for the optimality of the input distribution are derived, and it is proved that the magnitude of the capacity-achieving distribution has a probability mass function over a finite number of mass points, which determines a finite number of hyper-spheres in the spherical coordinates. Further, the magnitude and the phases of the capacity-achieving distribution are mutually independent, and the phases are distributed in a way that the points are uniformly distributed on each of the hyper-spheres.

• It is shown that if the average power constraint is relaxed and the ratio of the peak power to the number of dimensions remains below a certain threshold, constant amplitude signaling at the peak power achieves the capacity.

• It is also shown that for a fixed SNR, the gap between the Shannon capacity and the rate of constant amplitude signaling decreases as $\frac{1}{n}$ for large values of $n$, where $n$ denotes the number of dimensions.

• Finally, the case of the non-identity channel matrix is considered, where we start from the MISO channel and show that the support of the optimal input does not necessarily have discrete amplitude. Afterwards, several upper bounds and lower bounds are provided for the general $n$ by $n$ MIMO channel capacity. The performance of these bounds is evaluated numerically as a function of the condition number of the channel.

The paper is organized as follows. The system model and some preliminaries are provided in section II. The main result of the paper is given in section III for the identity channel. The general case of the non-identity channel matrix is briefly investigated in section IV. Numerical results and the conclusion are given in sections V and VI, respectively. Some of the calculations are provided in the appendices at the end of the paper.

## II System Model and Preliminaries

In a discrete-time memoryless vector Gaussian channel, the input-output relationship for the identity channel is given by

 Y(t)=X(t)+W(t), (1)

where $\mathbf{X}(t)$, $\mathbf{Y}(t)$ ($\in\mathbb{R}^n$) denote the input and output of the channel, respectively, $t$ denotes the channel use, and $\mathbf{W}(t)$ is an i.i.d. noise vector process with $\mathbf{W}(t)\sim\mathcal{N}(\mathbf{0},\mathbf{I}_n)$, which is independent of $\mathbf{X}(t)$ for every transmission $t$.¹

¹ It is obvious that the $\frac{n}{2}$-dimensional complex AWGN channel can be mapped to the channel in (1).

The capacity of the channel in (1) under the peak and the average power constraints is

$$C(u_p,u_a)=\sup_{F_{\mathbf{X}}(\mathbf{x}):\,\|\mathbf{X}\|^2\le u_p,\ E(\|\mathbf{X}\|^2)\le u_a} I(\mathbf{X};\mathbf{Y}), \tag{2}$$

where $F_{\mathbf{X}}(\mathbf{x})$ denotes the cumulative distribution function (CDF) of the input vector, and $u_p$, $u_a$ are the upper bounds on the peak and the average power, respectively. Throughout the paper, any operator that involves a random variable reads with the term almost surely (e.g., $\|\mathbf{X}\|^2\le u_p$ almost surely).²

² More precisely, let $\Omega$ be the sample space of the probability model over which the random vector $\mathbf{X}$ is defined. $\|\mathbf{X}\|^2\le u_p$ is equivalent to $\Pr\{\omega\in\Omega:\|\mathbf{X}(\omega)\|^2\le u_p\}=1$.

It is obvious that

$$\sup_{F_{\mathbf{X}}(\mathbf{x}):\,\|\mathbf{X}\|^2\le u_p,\ E(\|\mathbf{X}\|^2)\le u_a} I(\mathbf{X};\mathbf{Y})\ \le \sup_{F_{\mathbf{X}}(\mathbf{x}):\,E(\|\mathbf{X}\|^2)\le \min(u_p,u_a)} I(\mathbf{X};\mathbf{Y}).$$

Therefore, a trivial upper bound for the capacity is given by

$$C(u_p,u_a)\le C_G=\frac{n}{2}\ln\Big(1+\frac{\min(u_p,u_a)}{n}\Big), \tag{3}$$

where $C_G$ is achieved by a Gaussian input vector distributed as $\mathcal{N}\big(\mathbf{0},\frac{\min(u_p,u_a)}{n}\mathbf{I}_n\big)$.
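As a quick numerical illustration (a sketch, not taken from the paper; the parameter values below are arbitrary), the bound in (3) is straightforward to evaluate:

```python
import math

def gaussian_upper_bound(u_p: float, u_a: float, n: int) -> float:
    """Trivial capacity upper bound C_G = (n/2) ln(1 + min(u_p, u_a)/n), in nats."""
    return 0.5 * n * math.log(1.0 + min(u_p, u_a) / n)

# Example: peak power u_p = 10, average power u_a = 4, n = 4 dimensions;
# the bound is governed by min(u_p, u_a) = 4, giving (4/2) ln(2) nats.
c_g = gaussian_upper_bound(10.0, 4.0, 4)
```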

We formulate the optimization problem in the spherical domain. The rationale behind this change of coordinates is the spherical symmetry of the white Gaussian noise and of the constraints, which, as will become clear, enables us to perform the optimization only on the magnitude of the input. By writing the mutual information in terms of the differential entropies, we have

$$I(\mathbf{X};\mathbf{Y})=h(\mathbf{Y})-h(\mathbf{Y}|\mathbf{X})=h(\mathbf{Y})-\frac{n}{2}\ln 2\pi e,$$

where the entropies are in nats. Motivated by the spherical symmetry of the white Gaussian noise and the constraints, $\mathbf{Y}$ and $\mathbf{X}$ can be written in spherical coordinates as

$$\mathbf{Y}=R\,\mathbf{a}(\boldsymbol{\Psi})\ ,\ \mathbf{X}=P\,\mathbf{a}(\boldsymbol{\Theta}),$$

where $R$ and $P$ denote the magnitude of the output and the input, respectively. $\boldsymbol{\Psi}$ and $\boldsymbol{\Theta}$ are, respectively, the phase vectors of the output and the input, in which $\Psi_i,\Theta_i\in[0,\pi]$ for $i\in[1:n-2]$, and $\Psi_{n-1},\Theta_{n-1}\in[0,2\pi)$. $\mathbf{a}(\boldsymbol{\phi})$ is a unit vector in which

$$a_k(\boldsymbol{\phi})=\begin{cases}\cos\phi_k\prod_{i=1}^{k-1}\sin\phi_i & k\in[1:n-1]\\[1mm] \prod_{i=1}^{k-1}\sin\phi_i & k=n.\end{cases} \tag{4}$$
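A small self-contained sketch (not from the paper) of the map in (4); whatever angles are supplied, the resulting vector has unit norm, which is what makes it a valid direction vector:

```python
import math

def unit_vector(phi):
    """a(phi) from (4): phi holds the n-1 spherical angles of an n-dim unit vector."""
    n = len(phi) + 1
    a, prod = [], 1.0
    for k in range(n - 1):
        a.append(math.cos(phi[k]) * prod)  # a_k = cos(phi_k) * prod_{i<k} sin(phi_i)
        prod *= math.sin(phi[k])
    a.append(prod)                          # a_n = prod_{i=1}^{n-1} sin(phi_i)
    return a

a = unit_vector([0.7, 1.9, 0.3])           # n = 4
norm = math.sqrt(sum(x * x for x in a))    # always 1 (telescoping cos^2 + sin^2)
```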

As will become clear later, this change of coordinates avoids the need for the identity theorem for holomorphic functions of several complex variables. The optimization problem in (2) is equivalent to

$$C(u_p,u_a)=\sup_{F_{P,\Theta}(\rho,\theta):\,P^2\le u_p,\ E(P^2)\le u_a} h(\mathbf{Y})-\frac{n}{2}\ln 2\pi e. \tag{5}$$

The differential entropy of the output is given by

$$\begin{aligned}
h(\mathbf{Y})&=-\int_{\mathbb{R}^n} f_{\mathbf{Y}}(\mathbf{y})\ln f_{\mathbf{Y}}(\mathbf{y})\,d\mathbf{y}\\
&=-\int_0^\infty\underbrace{\int_0^\pi\!\dots\!\int_0^\pi}_{n-2\ \text{times}}\int_0^{2\pi} f_{\mathbf{Y}}(\mathbf{y}(r,\boldsymbol{\psi}))\ln f_{\mathbf{Y}}(\mathbf{y}(r,\boldsymbol{\psi}))\,\Big|\frac{\partial\mathbf{y}}{\partial(r,\boldsymbol{\psi})}\Big|\,d\boldsymbol{\psi}\,dr\\
&=-\int_0^\infty\underbrace{\int_0^\pi\!\dots\!\int_0^\pi}_{n-2\ \text{times}}\int_0^{2\pi} f_{R,\boldsymbol{\Psi}}(r,\boldsymbol{\psi})\ln\frac{f_{R,\boldsymbol{\Psi}}(r,\boldsymbol{\psi})}{\big|\frac{\partial\mathbf{y}}{\partial(r,\boldsymbol{\psi})}\big|}\,d\boldsymbol{\psi}\,dr\\
&=h(R,\boldsymbol{\Psi})+\int_0^\infty f_R(r)\ln r^{n-1}\,dr+\sum_{i=1}^{n-2}\int_0^\pi f_{\Psi_i}(\psi_i)\ln\sin^{n-i-1}\psi_i\,d\psi_i, \tag{6}
\end{aligned}$$

where $\big|\frac{\partial\mathbf{y}}{\partial(r,\boldsymbol{\psi})}\big|=r^{n-1}\prod_{i=1}^{n-2}\sin^{n-i-1}\psi_i$ is the Jacobian of the transform. The conditional pdf of $(R,\boldsymbol{\Psi})$ conditioned on $(P,\boldsymbol{\Theta})$ is given by

$$f_{R,\boldsymbol{\Psi}|P,\boldsymbol{\Theta}}(r,\boldsymbol{\psi}|\rho,\boldsymbol{\theta})=\frac{1}{(\sqrt{2\pi})^n}\,e^{-\frac{r^2+\rho^2-2r\rho\,\mathbf{a}^T(\boldsymbol{\theta})\mathbf{a}(\boldsymbol{\psi})}{2}}\,r^{n-1}\prod_{i=1}^{n-2}\sin^{n-i-1}\psi_i. \tag{7}$$

From (7), the joint pdf of the magnitude and phases of the output is

$$f_{R,\boldsymbol{\Psi}}(r,\boldsymbol{\psi})=\int_0^\infty\underbrace{\int_0^\pi\!\dots\!\int_0^\pi}_{n-2\ \text{times}}\int_0^{2\pi} f_{R,\boldsymbol{\Psi}|P,\boldsymbol{\Theta}}(r,\boldsymbol{\psi}|\rho,\boldsymbol{\theta})\,d^nF_{P,\boldsymbol{\Theta}}(\rho,\boldsymbol{\theta}), \tag{8}$$

in which $F_{P,\boldsymbol{\Theta}}(\rho,\boldsymbol{\theta})$ denotes the joint CDF of $(P,\boldsymbol{\Theta})$. By integrating (8) over the phase vector $\boldsymbol{\psi}$, we have

$$f_R(r)=\int_0^\infty L(r,\rho)\,f_P(\rho)\,d\rho, \tag{9}$$

where³

³ The reason that $L(r,\rho)$ is not a function of the phase vector $\boldsymbol{\theta}$ is the spherically symmetric distribution of the white Gaussian noise. In other words, $L(r,\rho)$ is proportional to the integral of the Gaussian pdf over the surface of an $n$-sphere with radius $r$, which is invariant to the position of the center $\mathbf{x}$ as long as $\|\mathbf{x}\|=\rho$, i.e., it is constant on the sphere $\|\mathbf{x}\|=\rho$. (9) implies that in the AWGN channel in (1), $f_R$ is induced only by $F_P$ and not by $F_{\boldsymbol{\Theta}}$.

$$L(r,\rho)=\underbrace{\int_0^\pi\!\dots\!\int_0^\pi}_{n-2\ \text{times}}\int_0^{2\pi} f_{R,\boldsymbol{\Psi}|P,\boldsymbol{\Theta}}(r,\boldsymbol{\psi}|\rho,\boldsymbol{\theta})\,d\psi_{n-1}\dots d\psi_1.$$

It is obvious that

$$h(R,\boldsymbol{\Psi})\le h(R)+\sum_{i=1}^{n-1}h(\Psi_i)\le h(R)+\sum_{i=1}^{n-2}h(\Psi_i)+\ln 2\pi, \tag{10}$$

where the first inequality is tight iff the elements of $(R,\boldsymbol{\Psi})$ are mutually independent, and the second inequality becomes tight iff $\Psi_{n-1}$ is uniformly distributed over $[0,2\pi)$. From (6) and (10),

$$h(\mathbf{Y})\le h(R)+\sum_{i=1}^{n-2}h(\Psi_i)+\int_0^\infty f_R(r)\ln r^{n-1}\,dr+\sum_{i=1}^{n-2}\int_0^\pi f_{\Psi_i}(\psi_i)\ln\sin^{n-i-1}\psi_i\,d\psi_i+\ln 2\pi. \tag{11}$$

For the sake of readability, the following change of variables is helpful

$$V=\frac{R^n}{n}\ \ ,\ \ U_i=\int_0^{\Psi_i}\sin^{n-i-1}\delta\,d\delta\ \ ,\ \ i\in[1:n-2]. \tag{12}$$

Since $\frac{dV}{dR}=R^{n-1}>0$ and $\frac{dU_i}{d\Psi_i}=\sin^{n-i-1}\Psi_i>0$ on $(0,\pi)$, it is easy to show that the two mappings $R\to V$ and $\Psi_i\to U_i$ (defined in (12)) are invertible. Also, the support set of $U_i$ is $[0,\alpha_i]$, where $\alpha_i=\int_0^\pi\sin^{n-i-1}\delta\,d\delta=\frac{\sqrt{\pi}\,\Gamma(\frac{n-i}{2})}{\Gamma(\frac{n-i+1}{2})}$ (the Gamma function is defined as $\Gamma(x)=\int_0^\infty t^{x-1}e^{-t}\,dt$.) From (9), the pdf of $V$ is⁴

⁴ The existence of $f_V$ is guaranteed by the Gaussian distribution of the additive noise.

$$f_V(v)=f_V(v;F_P)=\int_0^\infty K_n(v,\rho)\,dF_P(\rho), \tag{13}$$

where the notation in $f_V(v;F_P)$ is to emphasize that $f_V$ has been induced by $F_P$. Note that the integral transform in (13) is invertible, as shown in Appendix D. The kernel $K_n(v,\rho)$ is given by

$$\begin{aligned}
K_n(v,\rho)&=\frac{L(\sqrt[n]{nv},\rho)}{(\sqrt[n]{nv})^{n-1}}\\
&=\underbrace{\int_0^\pi\!\dots\!\int_0^\pi}_{n-2\ \text{times}}\int_0^{2\pi}\frac{1}{(\sqrt{2\pi})^n}\,e^{-\frac{(\sqrt[n]{nv})^2+\rho^2-2\sqrt[n]{nv}\,\rho\,\mathbf{a}^T(\boldsymbol{\theta})\mathbf{a}(\boldsymbol{\psi})}{2}}\prod_{i=1}^{n-2}\sin^{n-i-1}\psi_i\,d\psi_{n-1}\dots d\psi_1 \qquad (14)\\
&=e^{-\frac{(\sqrt[n]{nv})^2+\rho^2}{2}}\begin{cases}\dfrac{I_{\frac{n}{2}-1}(\rho\sqrt[n]{nv})}{(\rho\sqrt[n]{nv})^{\frac{n}{2}-1}} & \rho v\neq 0\\[2mm] \dfrac{1}{\Gamma(\frac{n}{2})\,2^{\frac{n}{2}-1}} & \rho v=0\end{cases}\ ,\quad \forall n\ge 2, \qquad (15)
\end{aligned}$$

where $I_{\frac{n}{2}-1}(\cdot)$ is the modified Bessel function of the first kind and order $\frac{n}{2}-1$. The calculations are provided in Appendix A. Note that $K_n(v,\rho)$ is continuous on its domain. The differential entropy of $V$ is

$$h(V)=h(V;F_P)=-\int_0^\infty f_V(v;F_P)\ln f_V(v;F_P)\,dv=-\int_0^\infty f_R(r)\ln\frac{f_R(r)}{r^{n-1}}\,dr. \tag{16}$$

The differential entropy of $U_i$ is given by

$$h(U_i)=-\int_{S_{U_i}} f_{U_i}(u)\ln f_{U_i}(u)\,du=-\int_0^\pi f_{\Psi_i}(\psi_i)\ln\frac{f_{\Psi_i}(\psi_i)}{\sin^{n-i-1}\psi_i}\,d\psi_i\ \ ,\ \ i\in[1:n-2]. \tag{17}$$
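The kernel (15) is a conditional pdf in $v$, so for any fixed $\rho$ it must integrate to one. The following sketch (illustrative, not from the paper) checks this with a hand-rolled power series for the modified Bessel function, so that nothing beyond the standard library is assumed:

```python
import math

def besseli(nu, x, terms=60):
    """Modified Bessel I_nu(x) via its power series (adequate for moderate x)."""
    return sum((x / 2.0) ** (2 * k + nu) / (math.factorial(k) * math.gamma(k + nu + 1))
               for k in range(terms))

def kernel(n, v, rho):
    """K_n(v, rho) from (15): conditional pdf of V = R^n/n given P = rho."""
    r = (n * v) ** (1.0 / n)
    if rho * v == 0.0:
        return math.exp(-(r * r + rho * rho) / 2.0) / (math.gamma(n / 2.0) * 2 ** (n / 2.0 - 1))
    z = rho * r
    return math.exp(-(r * r + rho * rho) / 2.0) * besseli(n / 2.0 - 1, z) / z ** (n / 2.0 - 1)

# Midpoint-rule integral of K_n(., rho) over v: should come out close to 1.
n, rho = 3, 1.5
v_max = (rho + 8.0) ** n / n        # r up to rho + 8 captures essentially all the mass
steps = 4000
h = v_max / steps
total = sum(kernel(n, (j + 0.5) * h, rho) for j in range(steps)) * h
```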

Rewriting (5), we have

$$\begin{aligned}
C(u_p,u_a)&=\sup_{F_{P,\Theta}(\rho,\theta):\,P^2\le u_p,\ E[P^2]\le u_a} h(\mathbf{Y})-\frac{n}{2}\ln 2\pi e\\
&\le\sup_{F_{P,\Theta}(\rho,\theta):\,P^2\le u_p,\ E[P^2]\le u_a} h(V;F_P)+\sum_{i=1}^{n-2}h(U_i)+\Big(1-\frac{n}{2}\Big)\ln 2\pi-\frac{n}{2} \qquad (18)\\
&\le\sup_{F_P(\rho):\,P^2\le u_p,\ E[P^2]\le u_a} h(V;F_P)+\sum_{i=1}^{n-2}\ln\alpha_i+\Big(1-\frac{n}{2}\Big)\ln 2\pi-\frac{n}{2}, \qquad (19)
\end{aligned}$$

where (18) results from (11), (16) and (17). (19) is due to the fact that since $[0,\alpha_i]$ (the support of $U_i$) is bounded, $h(U_i)$ is maximized when $U_i$ is uniformly distributed on it. It is easy to verify that if the magnitude and phases of the input are mutually independent, with the phases having the distributions

$$\Theta_{n-1}\sim U[0,2\pi)\ \ ,\ \ f_{\Theta_i}(\theta_i)=\alpha_i^{-1}\sin^{n-i-1}\theta_i\ \ ,\ \ i\in[1:n-2], \tag{20}$$

the magnitude and phases of the output become mutually independent with the phases having the distributions as

$$\Psi_{n-1}\sim U[0,2\pi)\ \ ,\ \ f_{\Psi_i}(\psi_i)=\alpha_i^{-1}\sin^{n-i-1}\psi_i\ \ ,\ \ i\in[1:n-2], \tag{21}$$

where $\alpha_i=\int_0^\pi\sin^{n-i-1}\delta\,d\delta$. In other words, having the input distribution

$$F_{P,\Theta}(\rho,\boldsymbol{\theta})=\frac{\theta_{n-1}}{2\pi}\,F_P(\rho)\prod_{i=1}^{n-2}\int_0^{\theta_i}\alpha_i^{-1}\sin^{n-i-1}\theta\,d\theta \tag{22}$$

results in

$$F_{R,\Psi}(r,\boldsymbol{\psi})=\frac{\psi_{n-1}}{2\pi}\,F_R(r)\prod_{i=1}^{n-2}\int_0^{\psi_i}\alpha_i^{-1}\sin^{n-i-1}\psi\,d\psi. \tag{23}$$

The above result can be easily checked either by solving for $f_{R,\boldsymbol{\Psi}}$ in (8) or by the fact that the sum of two independent spherically symmetric random vectors is still spherically symmetric.⁵ Also, note that having $\boldsymbol{\Psi}$ distributed as in (21) implies that $U_i$ is uniform on $[0,\alpha_i]$ for $i\in[1:n-2]$. It can be observed that the input pdf in (22) makes the inequalities in (18) and (19) tight. Since the constraint is only on the magnitude of the input and $f_V$ is induced only by $F_P$, it is concluded that the optimal input distribution must have mutually independent phases and magnitude, with the phases being distributed as in (20). Therefore,

⁵ The magnitude and the unit vector of a spherically symmetric random vector are independent, and the unit vector is uniformly distributed on the unit sphere. It can be verified that this property is equivalent to the vector having the distribution of (23) in spherical coordinates.

$$C(u_p,u_a)=\sup_{F_P(\rho):\,P^2\le u_p,\ E[P^2]\le u_a} h(V;F_P)+\sum_{i=1}^{n-2}\ln\alpha_i+\Big(1-\frac{n}{2}\Big)\ln 2\pi-\frac{n}{2}. \tag{24}$$

Before proceeding further, it is interesting to check whether the problem in (24) boils down to the classical results when the peak power constraint is relaxed (i.e., $u_p=\infty$). From the definition of $V$,

$$E\big[V^{\frac{2}{n}}\big]=\frac{1}{\sqrt[n]{n^2}}\,E\big[n+P^2\big].$$

This can be verified by a change of variable and using the derivative of (112) (in Appendix D) with respect to $\rho$. Therefore, when $u_p=\infty$, the problem in (24) becomes the maximization of the differential entropy over all distributions having a bounded moment of order $\frac{2}{n}$, which is addressed in Appendix B for an arbitrary moment. Applying (92), the optimal distribution for $V$ is obtained, and from (13), the corresponding $P$ has the generalized Rayleigh distribution

$$f_{P^*}(\rho)=\frac{n^{\frac{n}{2}}\,\rho^{n-1}\,e^{-\frac{n\rho^2}{2u_a}}}{2^{\frac{n-2}{2}}\,u_a^{\frac{n}{2}}\,\Gamma(\frac{n}{2})},$$

which is the only solution, since (13) is an invertible transform (see Appendix D). Furthermore, it can be verified that the maximum is

$$C(\infty,u_a)=\frac{n}{2}\ln\Big(1+\frac{u_a}{n}\Big), \tag{25}$$

which coincides with the classical results for the identity channel matrix [12].
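For intuition: the generalized Rayleigh law above is precisely the distribution of the magnitude of an i.i.d. $\mathcal{N}(0,\frac{u_a}{n})$ vector, so it meets the average power constraint with equality, $E[P^2]=u_a$. A Monte Carlo sanity check (illustrative, not from the paper; sample size and seed are arbitrary):

```python
import math
import random

random.seed(0)
n, u_a = 4, 2.0
trials = 200000

# P = |X| for X ~ N(0, (u_a/n) I_n); its pdf is the generalized Rayleigh
# distribution above, and E[P^2] = n * (u_a/n) = u_a.
sigma = math.sqrt(u_a / n)
mean_p2 = sum(
    sum(random.gauss(0.0, sigma) ** 2 for _ in range(n)) for _ in range(trials)
) / trials
```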

Similar to [1] and [2], we define the marginal entropy density of $V$ as

$$\tilde{h}_V(x;F_P)=-\int_0^\infty K_n(v,x)\ln f_V(v;F_P)\,dv, \tag{26}$$

which satisfies

$$h(V;F_P)=\int_0^\infty \tilde{h}_V(\rho;F_P)\,dF_P(\rho).$$

(26) is shown to be an invertible transform in Appendix D and this property will become useful later on.

## III Main Results

Let $\epsilon_{P^*}$ denote the set of points of increase⁶ of $F^*_P$ in the interval $[0,\sqrt{u_p}]$. The main result of the paper is given in the following theorem.

⁶ A point $z$ is said to be a point of increase of a distribution $F$ if for any open set $\mathcal{O}$ containing $z$, we have $\Pr\{\mathcal{O}\}>0$.

Theorem. The supremization in (24), which is for the identity channel matrix, has a unique solution, and the optimal input achieving the supremum (and therefore the maximum) has the following distribution in spherical coordinates,

$$F^*_{P,\Theta}(\rho,\boldsymbol{\theta})=\frac{\theta_{n-1}}{2\pi}\,F^*_P(\rho)\prod_{i=1}^{n-2}\int_0^{\theta_i}\alpha_i^{-1}\sin^{n-i-1}\theta\,d\theta, \tag{27}$$

where $F^*_P(\rho)$ has a finite number of points of increase (i.e., $\epsilon_{P^*}$ has a finite cardinality). Further, the necessary and sufficient condition for $F^*_P$ to be optimal is the existence of a $\lambda\,(\ge 0)$ for which

$$\begin{aligned}
\tilde{h}_V(\rho;F^*_P)&\le h(V;F^*_P)+\lambda(\rho^2-u_a)\ ,\ \forall\rho\in[0,\sqrt{u_p}] \qquad (28)\\
\tilde{h}_V(\rho;F^*_P)&=h(V;F^*_P)+\lambda(\rho^2-u_a)\ ,\ \forall\rho\in\epsilon_{P^*}. \qquad (29)
\end{aligned}$$

Note that when the average power constraint is relaxed (i.e., $u_a\ge u_p$), $\lambda=0$.

###### Proof.

The phases of the optimal input distribution have already been shown to be mutually independent and have the distribution in (20) being independent of the magnitude. Therefore, it is sufficient to show the optimal distribution of the input magnitude. This is proved by reductio ad absurdum. In other words, it is shown that having an infinite number of points of increase results in a contradiction. The detailed proof is given in Appendix C. ∎

Remark 1. When the average power constraint is relaxed (i.e., $u_a\ge u_p$), the following input distribution is asymptotically ($\frac{u_p}{n}\to 0$) optimal

$$F^{**}_{P,\Theta}(\rho,\boldsymbol{\theta})=\frac{\theta_{n-1}}{2\pi}\,u(\rho-\sqrt{u_p})\prod_{i=1}^{n-2}\int_0^{\theta_i}\alpha_i^{-1}\sin^{n-i-1}\theta\,d\theta, \tag{30}$$

where $u(\cdot)$ is the unit step function. Further, the resulting capacity is given by

$$C(u_p,u_p)\approx\frac{u_p}{2}\ \ \text{when}\ \ \frac{u_p}{n}\ll 1.$$

Later, in the numerical results section, we observe that the density in (30) remains optimal for a non-vanishing ratio $\frac{u_p}{n}$ when it is below a certain threshold.

###### Proof.

Since the density in (30) has spherical symmetry, it is sufficient to show that $F^{**}_P(\rho)=u(\rho-\sqrt{u_p})$ is optimal when $\frac{u_p}{n}\to 0$. From (3), we have

$$\lim_{\frac{u_p}{n}\to 0} C(u_p,u_a)\le\frac{u_p}{2}. \tag{31}$$

The CDF $F^{**}_P$ induces the following output pdf

$$f_V(v;F^{**}_P)=K_n(v,\sqrt{u_p})=e^{-\frac{(\sqrt[n]{nv})^2+u_p}{2}}\,\frac{I_{\frac{n}{2}-1}(\sqrt{u_p}\,\sqrt[n]{nv})}{(\sqrt{u_p}\,\sqrt[n]{nv})^{\frac{n}{2}-1}}. \tag{32}$$

When $\frac{u_p}{n}$ is small, the entropy of $V$ is given by (35) on top of the next page. In (33), we have approximated the modified Bessel function with the first two terms of its power series expansion as follows

$$I_n(x)\approx\frac{x^n}{\Gamma(n+1)\,2^n}\Big(1+\frac{x^2}{4(n+1)}\Big)\ \ ,\ \ x\to 0.$$
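The quality of this two-term truncation is easy to quantify numerically (an illustrative aside, not part of the proof); the omitted terms are $O(x^{n+4})$, so for small $x$ the relative error is tiny:

```python
import math

def besseli_series(n, x, terms=40):
    """Reference value of I_n(x) from its full power series."""
    return sum((x / 2.0) ** (2 * k + n) / (math.factorial(k) * math.gamma(k + n + 1))
               for k in range(terms))

def besseli_two_term(n, x):
    """Two-term small-argument approximation used in (33)."""
    return x ** n / (math.gamma(n + 1) * 2 ** n) * (1.0 + x * x / (4.0 * (n + 1)))

n, x = 2, 0.1
rel_err = abs(besseli_two_term(n, x) - besseli_series(n, x)) / besseli_series(n, x)
# the first neglected term is ~ (x/2)^4 / (2 (n+1)(n+2)) relative to the leading one
```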

In (34), we use the corresponding first-order approximation of the logarithm, and in (35), the higher order term is neglected. Given the input distribution (30), the achievable rate at a small ratio $\frac{u_p}{n}$ is given by (see (24))

$$\lim_{\frac{u_p}{n}\to 0}\ h(V;F^{**}_P)+\sum_{i=1}^{n-2}\ln\alpha_i+\Big(1-\frac{n}{2}\Big)\ln 2\pi-\frac{n}{2}=\frac{u_p}{2}, \tag{36}$$

where we have used the fact that

$$\sum_{i=1}^{n-2}\ln\alpha_i=-\ln\Gamma\Big(\frac{n}{2}\Big)+\frac{n-2}{2}\ln\pi.$$
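This identity (together with $\alpha_i=\int_0^\pi\sin^{n-i-1}\delta\,d\delta$) is easy to confirm numerically. The sketch below (illustrative only) integrates each $\alpha_i$ by the midpoint rule and compares the sum of their logs to the right-hand side:

```python
import math

def alpha(m, steps=100000):
    """Midpoint-rule value of the integral of sin^m(x) over [0, pi]."""
    h = math.pi / steps
    return sum(math.sin((j + 0.5) * h) ** m for j in range(steps)) * h

n = 5
lhs = sum(math.log(alpha(n - i - 1)) for i in range(1, n - 1))  # sum_{i=1}^{n-2} ln(alpha_i)
rhs = -math.log(math.gamma(n / 2.0)) + (n - 2) / 2.0 * math.log(math.pi)
```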

From (36) and (31), it is concluded that the pdf in (30) is asymptotically optimal when $\frac{u_p}{n}\to 0$. Note that the distribution in (30) is not the only asymptotically optimal distribution. There are many possible alternatives, one of which, for example, is the binary PAM in each dimension with the points $\pm\sqrt{\frac{u_p}{n}}$, which can be verified to have an achievable rate of $\frac{u_p}{2}$ when $\frac{u_p}{n}\to 0$. Specifically, in the low peak power regime ($\frac{u_p}{n}\to 0$), a sufficient condition for the input distribution to be asymptotically optimal is as follows. First, it has a constant magnitude at $\sqrt{u_p}$. Second, its $\Theta_1$ is independent of the remaining spherical variables and has a zero first Fourier coefficient, i.e.,

$$\int_0^\pi e^{j\theta}f_{\Theta_1}(\theta)\,d\theta=0. \tag{37}$$

The claim is justified by noting that fulfilling the second condition results in the spherically symmetric output distribution of (23), as follows. Using the approximation $e^x\approx 1+x$ at small values of $x$, (7) can be approximated as

$$f_{R,\boldsymbol{\Psi}|P,\boldsymbol{\Theta}}(r,\boldsymbol{\psi}|\rho,\boldsymbol{\theta})\approx\frac{1}{(\sqrt{2\pi})^n}\,e^{-\frac{r^2+\rho^2}{2}}\big(1+r\rho\,\mathbf{a}^T(\boldsymbol{\theta})\mathbf{a}(\boldsymbol{\psi})\big)\,r^{n-1}\prod_{i=1}^{n-2}\sin^{n-i-1}\psi_i. \tag{38}$$

If $\Theta_1$ is independent of $(P,\Theta_2,\dots,\Theta_{n-1})$, substituting (38) in (8) results in

$$f_{R,\boldsymbol{\Psi}}(r,\boldsymbol{\psi})\approx\int_0^\infty\underbrace{\int_0^\pi\!\dots\!\int_0^\pi}_{n-3\ \text{times}}\int_0^{2\pi}\int_0^\pi\frac{1}{(\sqrt{2\pi})^n}\,e^{-\frac{r^2+\rho^2}{2}}\,r^{n-1}\big(1+r\rho\,\mathbf{a}^T(\boldsymbol{\theta})\mathbf{a}(\boldsymbol{\psi})\big)\prod_{i=1}^{n-2}\sin^{n-i-1}\psi_i\ dF_{\Theta_1}(\theta_1)\,d^{n-1}F_{P,\Theta_2^{n-1}}(\rho,\theta_2^{n-1}), \tag{39}$$

where $\Theta_2^{n-1}=(\Theta_2,\dots,\Theta_{n-1})$. If $\Theta_1$ has a zero first Fourier coefficient, due to the structure of $\mathbf{a}(\cdot)$ (see (4)), we have

$$\int_0^\pi \mathbf{a}^T(\boldsymbol{\theta})\mathbf{a}(\boldsymbol{\psi})\,dF_{\Theta_1}(\theta_1)=0.$$

Therefore, (39) simplifies as

$$f_{R,\boldsymbol{\Psi}}(r,\boldsymbol{\psi})\approx\int_0^\infty\frac{1}{(\sqrt{2\pi})^n}\,e^{-\frac{r^2+\rho^2}{2}}\,r^{n-1}\prod_{i=1}^{n-2}\sin^{n-i-1}\psi_i\,dF_P(\rho),$$

which implies that when $\frac{u_p}{n}\to 0$, having $\Theta_1$ independent of all other spherical variables with a zero first Fourier coefficient results in the output distribution in (23), which makes the inequalities (18) and (19) tight. Finally, fulfilling the first condition (i.e., having a constant magnitude at $\sqrt{u_p}$) validates the previous reasoning starting from (32).

The asymptotic optimality of the constant-magnitude signaling in (30) can alternatively be proved by inspecting the behavior of the marginal entropy density when $\frac{u_p}{n}$ is sufficiently small. From (13),

$$f_V(v;F_P)\to\frac{e^{-\frac{(\sqrt[n]{nv})^2}{2}}}{\Gamma(\frac{n}{2})\,2^{\frac{n}{2}-1}}\underbrace{\int_0^\infty e^{-\frac{\rho^2}{2}}\,dF_P(\rho)}_{\text{constant}\ =\ C}\ \ \text{when}\ \ \frac{u_p}{n}\to 0.$$

Therefore,

$$\begin{aligned}
\tilde{h}_V(\rho;F_P)&=-\int_0^\infty e^{-\frac{(\sqrt[n]{nv})^2+\rho^2}{2}}\,\frac{I_{\frac{n}{2}-1}(\rho\sqrt[n]{nv})}{(\rho\sqrt[n]{nv})^{\frac{n}{2}-1}}\ln f_V(v;F_P)\,dv\\
&\to\int_0^\infty e^{-\frac{(\sqrt[n]{nv})^2+\rho^2}{2}}\,\frac{I_{\frac{n}{2}-1}(\rho\sqrt[n]{nv})}{(\rho\sqrt[n]{nv})^{\frac{n}{2}-1}}\bigg[\frac{(\sqrt[n]{nv})^2}{2}+\ln\bigg(\frac{\Gamma(\frac{n}{2})\,2^{\frac{n}{2}-1}}{C}\bigg)\bigg]dv\\
&=\frac{\rho^2+n}{2}+\ln\bigg(\frac{\Gamma(\frac{n}{2})\,2^{\frac{n}{2}-1}}{C}\bigg). \tag{40}
\end{aligned}$$

It is obvious that (40) is a (strictly) convex, (strictly) increasing function of $\rho$. Hence, the necessary and sufficient conditions in (28) and (29) are satisfied if and only if the input has only one point of increase at $\sqrt{u_p}$, which proves the asymptotic optimality of (30) for $u_a\ge u_p$ and $\frac{u_p}{n}\to 0$. ∎

Remark 2. For a fixed SNR, the gap between the Shannon capacity and the rate of constant amplitude signaling decreases as $\frac{1}{n}$ for large values of $n$.

###### Proof.

By writing the first two terms of the Taylor series expansion of the logarithm (i.e., $\ln(1+x)\approx x-\frac{x^2}{2}$), we have

$$\text{when}\ n\to\infty\ \ ,\ \ \frac{n}{2}\ln\Big(1+\frac{u_p}{n}\Big)\approx\frac{u_p}{2}-\frac{u_p^2}{4n}.$$

From (34), the achievable rate obtained by the constant envelope signaling is

$$\text{when}\ n\to\infty\ \ ,\ \ I(\mathbf{X};\mathbf{Y})\approx\frac{u_p}{2}-\frac{u_p^2}{2n}.$$

This shows that the gap between the achievable rate and the Shannon capacity decreases as $\frac{1}{n}$ (more precisely, as $\frac{u_p^2}{4n}$) when $n$ goes to infinity. ∎
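A quick numeric check of the two expansions (illustrative; the values of $u_p$ and $n$ are arbitrary): for $u_p=1$ and $n=100$, the difference between $\frac{n}{2}\ln(1+\frac{u_p}{n})$ and $\frac{u_p}{2}-\frac{u_p^2}{2n}$ should already be close to the predicted $\frac{u_p^2}{4n}$:

```python
import math

u_p, n = 1.0, 100
shannon = 0.5 * n * math.log(1.0 + u_p / n)   # (n/2) ln(1 + u_p/n)
const_env = u_p / 2 - u_p ** 2 / (2 * n)      # constant-envelope rate for large n
gap = shannon - const_env                     # predicted to be ~ u_p^2 / (4 n)
```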

While Remark 2 shows the asymptotic behavior of the gap, the following remark provides an analytical lower bound for any value of $n$.

Remark 3. The following lower bound holds for the capacity of constant amplitude signaling.

 supFX(x):∥