Capacity Pre-Log of Noncoherent SIMO Channels via Hironaka’s Theorem
Abstract
We find the capacity pre-log of a temporally correlated Rayleigh block-fading single-input multiple-output (SIMO) channel in the noncoherent setting. It is well known that for blocklength and rank of the channel covariance matrix equal to , the capacity pre-log in the single-input single-output (SISO) case is given by . Here, can be interpreted as the pre-log penalty incurred by channel uncertainty. Our main result reveals that, by adding only one receive antenna, this penalty can be reduced to and can, hence, be made to vanish for the blocklength , even if remains constant as . Intuitively, even though the SISO channels between the transmit antenna and the two receive antennas are statistically independent, the transmit signal induces enough statistical dependence between the corresponding receive signals for the second receive antenna to be able to resolve the uncertainty associated with the first receive antenna’s channel and thereby make the overall system appear coherent. The proof of our main theorem is based on a deep result from algebraic geometry known as Hironaka’s Theorem on the Resolution of Singularities.
I Introduction
It is well known that the capacity pre-log, i.e., the asymptotic ratio between capacity and the logarithm of signal-to-noise ratio (SNR), as SNR goes to infinity, of a single-input multiple-output (SIMO) fading channel in the coherent setting (i.e., when the receiver has perfect channel state information (CSI)) is equal to and is, hence, the same as that of a single-input single-output (SISO) fading channel [4]. This result holds under very general assumptions on the channel statistics. Multiple antennas at the receiver only, hence, do not result in an increase of the capacity pre-log in the coherent setting [4]. In the noncoherent setting, where neither transmitter nor receiver has CSI, but both know the channel statistics, the effect of multiple antennas on the capacity pre-log is understood only for a specific simple channel model, namely, the Rayleigh constant block-fading model. (Footnote 1: In the remainder of the paper, we consider the noncoherent setting only. Consequently, we will refer to capacity in the noncoherent setting simply as capacity.) In this model the channel is assumed to remain constant over a block (of symbols) and to change in an independent fashion from block to block [5]. The corresponding SIMO capacity pre-log is again equal to the SISO capacity pre-log, but, differently from the coherent setting, is given by [6, 7].
An alternative approach to capturing channel variations in time is to assume that the fading process is stationary. In this case, the capacity pre-log is known only in the SISO [8] and the multiple-input single-output (MISO) [9, Thm. 4.15] cases. The capacity bounds for the SIMO stationary-fading channel available in the literature [9, Thm. 4.13] do not allow one to determine whether the capacity pre-log in the SIMO case equals that in the SISO case. Resolving this question for stationary fading seems elusive at this point.
A widely used channel model that can be seen as lying in between the stationary-fading model considered in [8, 9], and the simpler constant block-fading model analyzed in [5, 7] is the correlated block-fading model, which assumes that the fading process is temporally correlated within blocks of length and independent across blocks. The channel covariance matrix of rank is taken to be the same for each block. This channel model is relevant as it captures channel variations in time in an accurate yet simple fashion: the rank of the covariance matrix corresponds to the minimum number of channel coefficients per block that need to be known at the receiver to perfectly reconstruct all channel coefficients within the same block. Therefore, larger corresponds to faster channel variations.
The SISO capacity pre-log for correlated block-fading channels is given by [10]. In the SIMO and the multiple-input multiple-output (MIMO) cases the capacity pre-log is unknown. The main contribution of this paper is a full characterization of the capacity pre-log for SIMO correlated block-fading channels. Specifically, we prove that under a mild technical condition on the channel covariance matrix, the SIMO capacity pre-log, , of a channel with receive antennas and independent identically distributed (i.i.d.) SISO subchannels is given by
(1) 
This shows that even with receive antennas a capacity pre-log of can be obtained in the SIMO case (provided that ). This capacity pre-log is strictly larger than the capacity pre-log of the corresponding SISO channel (i.e., the capacity pre-log of one of the component channels), given by . Here can be interpreted as the pre-log penalty due to channel uncertainty. Our result reveals that, by adding at least one receive antenna, this penalty can be made to vanish in the large blocklength limit, , even if the amount of channel uncertainty scales linearly in the blocklength.
A conjecture for the correlated block-fading channel model stated in [10] for the MIMO case, when particularized to the SIMO case, implies that the capacity pre-log in the SIMO case would be the same as that in the SISO case. As a consequence of (1) this conjecture is disproved.
In terms of the technical aspects of our main result, we sandwich capacity between an upper and a lower bound that turn out to be asymptotically (in SNR) tight (in the sense of delivering the same capacity pre-log). The upper bound is established by proving that the capacity pre-log of a correlated block-fading channel with receive antennas can be upper-bounded by the capacity pre-log of a constant block-fading channel with receive antennas and the same SNR. The derivation of the capacity pre-log lower bound poses serious technical challenges. Specifically, after a change of variables argument applied to the integral expression for the differential entropy of the channel output signal, the main technical difficulty lies in showing that the expected logarithm of the Jacobian determinant corresponding to this change of variables is finite. As the Jacobian determinant takes on a very involved form, a per pedes approach appears infeasible. The problem is resolved by first distilling structural properties of the determinant through a suitable factorization and then introducing a powerful tool from algebraic geometry, namely [11, Th. 2.3], which is a consequence of Hironaka’s Theorem on the Resolution of Singularities [12, 13]. Roughly speaking, this result allows one to rewrite every real analytic function [14, Def. 1.1.5, Def. 2.2.1] locally as a product of a monomial and a nonvanishing real analytic function. This factorization is then used to show that the integral of the logarithm of the absolute value of a real analytic function over a compact set is finite, provided that the real analytic function is not identically zero. This method is quite general and may be of independent interest when one tries to show that integrals of certain functions with singularities are finite, in particular, functions involving logarithms. In information theory such integrals often occur when analyzing differential entropy.
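A one-dimensional toy case may help to see why such integrals stay finite despite the singularity of the logarithm. The function $f(x) = x^k g(x)$ used below, with $k$ a nonnegative integer and $g$ real analytic and nonvanishing on $[0,1]$, is a hypothetical example chosen for illustration only; it is not taken from the proof, and the natural logarithm is used for simplicity.

```latex
\int_0^1 \ln|x|\,dx = \bigl[x\ln x - x\bigr]_0^1 = -1,
\qquad
\int_0^1 \ln\bigl|x^k g(x)\bigr|\,dx
  = k\int_0^1 \ln|x|\,dx + \int_0^1 \ln|g(x)|\,dx
  = -k + \int_0^1 \ln|g(x)|\,dx \;>\; -\infty .
```

The remaining integral is finite because $\ln|g|$ is continuous on the compact interval $[0,1]$ when $g$ is continuous and nonvanishing there. The monomial factor is the only source of the singularity, and its logarithm is integrable.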
Notation
Sets are denoted by calligraphic letters Roman letters and designate deterministic matrices and vectors, respectively. Boldface letters and denote random matrices and random vectors, respectively. We let be the vector (of appropriate dimension) that has the th entry equal to one and all other entries equal to zero, and denote the identity matrix as . The element in the th row and th column of a deterministic matrix is (italic letters), and the th component of the deterministic vector is (italic letters); the element in the th row and th column of a random matrix is (sans serif letters), and the th component of the random vector is (sans serif letters). For a vector , stands for the diagonal matrix that has the entries of on its main diagonal. The linear subspace spanned by the vectors is denoted by . The superscripts and stand for transposition and Hermitian transposition, respectively. For two matrices and , we designate their Kronecker product as ; to simplify notation, we use the convention that the ordinary matrix product precedes the Kronecker product, i.e., . For a finite subset of the set of natural numbers, , we write for the cardinality of . For an matrix , and a set of indices , we use to denote the submatrix of containing the rows of with indices in . For two matrices and of arbitrary size, is the block-diagonal matrix that has in the upper left corner and in the lower right corner. For matrices , we let . The ordered eigenvalues of the matrix are denoted by . For two functions and , the notation means that is bounded. For a function , we say that is not identically zero and write if there exists at least one element in the domain of such that . We say that a function is nonvanishing on a subset of its domain, if for all , . For two functions and , denotes the composition . For , . We use to designate the set of natural numbers . Let be a vector-valued function; then denotes the Jacobian matrix [15, Def. 3.8] of the function , i.e., the matrix that contains the partial derivative in its th row and th column. The logarithm to the base 2 is written as . For sets , we define . If , then . With , we denote by the open cube in with side length centered at . The set of natural numbers, including zero, is . For and , we let . If is a subset of the image of a map then denotes the inverse image of . The expectation operator is designated by . For random matrices and , we write to indicate that and have the same distribution. Finally, stands for the distribution of a jointly proper Gaussian (JPG) random vector with mean and covariance matrix .
II System Model
We consider a SIMO channel with receive antennas. The fading in each SISO component channel follows the correlated block-fading model described in the previous section. The input-output (IO) relation within any block of length for the th SISO component channel can be written as
(2) 
where is the signal vector transmitted in the given block, and the vectors are the corresponding received signal and additive noise, respectively, at the th receive antenna. Finally, contains the channel coefficients between the transmit antenna and the th receive antenna. We assume that , for all , where (which is the same for all blocks and all component channels) has rank . The entries of the vectors are taken to be of unit variance, which implies that the main diagonal entries of are equal to 1 and the average received power is constant across time slots. It will turn out convenient to write the channel coefficient vector in whitened form as , where . Further, we assume that . As the noise vector has unit variance components, in (2) can be interpreted as the SNR. Finally, we assume that and are mutually independent, independent across , and change in an independent fashion from block to block. Note that for the correlated block-fading model reduces to the constant block-fading model as used in [6, 7].
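The model just described can be sketched in a few lines of simulation code. Everything symbolic here is an assumption of the sketch rather than the paper's notation: `p` stands for an N x Q factor of the channel covariance matrix with unit-norm rows, `s` for the whitened channel coefficients, `snr` for the SNR, and the per-antenna relation is read as y_r = sqrt(snr) * diag(p s_r) x + w_r. The DFT-based factor is a hypothetical choice made only so that the rows have unit norm.

```python
import cmath
import random


def cn(rng, var=1.0):
    """Draw one circularly symmetric complex Gaussian sample with variance var."""
    sigma = (var / 2) ** 0.5
    return complex(rng.gauss(0, sigma), rng.gauss(0, sigma))


def dft_factor(n, q):
    """Hypothetical n x q factor: first q DFT columns, scaled so each row has unit norm."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) / q ** 0.5 for c in range(q)]
            for r in range(n)]


def simulate_block(p, x, n_rx, snr, rng):
    """Simulate one block of the assumed model for n_rx receive antennas."""
    n, q = len(p), len(p[0])
    outputs = []
    for _ in range(n_rx):
        s = [cn(rng) for _ in range(q)]                     # whitened coefficients
        h = [sum(p[i][k] * s[k] for k in range(q)) for i in range(n)]  # correlated fading
        w = [cn(rng) for _ in range(n)]                     # unit-variance noise
        outputs.append([snr ** 0.5 * h[i] * x[i] + w[i] for i in range(n)])
    return outputs
```

With a rank-one factor (q = 1) all fading coefficients within a block coincide up to a fixed unit-modulus factor, which mirrors the remark that the model then reduces to constant block fading.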
With , , , and , we can write the IO relation (2) in the following—more compact—form
(3) 
The capacity of the channel (3) is defined as
(4) 
where the supremum is taken over all input distributions that satisfy the average-power constraint
(5) 
The capacity pre-log, the central quantity of interest in this paper, is defined as
III Intuitive Analysis
We start with a simple “back-of-the-envelope” calculation that allows us to develop some intuition on the main result in this paper, summarized in (1). The different steps in the intuitive analysis below will be seen to have rigorous counterparts in the formal proof of the capacity pre-log lower bound detailed in Section VI.
The capacity pre-log characterizes the channel capacity behavior in the regime where additive noise can “effectively” be ignored. To guess the capacity pre-log, it therefore appears prudent to consider the problem of identifying the transmit symbols from the noise-free (and rescaled) observation
(6) 
Specifically, we shall ask the question: “How many symbols can be identified uniquely from given that the vector of channel coefficients is unknown but the statistics of the channel, i.e., the matrix , are known?” The claim we make is that the capacity pre-log is given by the number of identifiable symbols divided by the block length .
We start by noting that the unknown variables in (6) are and , which means that we have a quadratic system of equations. It turns out, however, that the simple change of variables
(7) 
(we make the technical assumption , in the remainder of this section) transforms (6) into a system of equations that is linear in and . Since the transformation is invertible for , uniqueness of the solution of the linear system of equations in and is equivalent to uniqueness of the solution of the quadratic system of equations in and .
For concreteness and simplicity of exposition, we first consider the case and and assume that satisfies the technical condition specified in Theorem 1, stated in Section IV. A direct computation reveals that upon change of variables according to (7), the quadratic system (6) can be rewritten as the following linear system of equations:
(8) 
The solution of (8) cannot be unique, as we have 6 equations in 7 unknowns. The can, therefore, not be determined uniquely from . We can, however, make the solution of (8) unique if we devote one of the data symbols to transmitting a pilot symbol (known to the receiver). Take, for concreteness, . Then (8) reduces to the following inhomogeneous system of 6 equations in 6 unknowns
(9) 
This system of equations has a unique solution if . We prove in Appendix C that under the technical condition on specified in Theorem 1, stated in Section IV, we, indeed, have that for almost all . (Footnote 2: Except for a set of measure zero.) It, therefore, follows that for almost all , the linear system of equations (9) has a unique solution. As explained above, this implies uniqueness of the solution of the original quadratic system of equations (6). We can therefore recover and , and, hence, and from . Summarizing our findings, we expect that the capacity pre-log of the channel (3), for the special case and , is equal to , which is larger than the capacity pre-log of the corresponding SISO channel (i.e., one of the SISO component channels), given by [10]. This answer, obtained through the back-of-the-envelope calculation above, coincides with the rigorous result in Theorem 1.
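The pilot-based recovery in this example can be tried numerically. The sketch below is an illustration under stated assumptions, not the paper's construction: it takes blocklength N = 3, covariance rank Q = 2 and R = 2 receive antennas (the only sizes consistent with 6 equations in 7 unknowns), reads the noise-free relation as y_r[n] = x[n] * (row n of P) · s_r, uses the substitution z[n] = 1/x[n], fixes the first symbol as the pilot x[0] = 1, and picks a hypothetical DFT-based P. All names are invented for the example.

```python
import cmath

N, Q, R = 3, 2, 2  # assumed example sizes: blocklength, covariance rank, receive antennas
W = cmath.exp(-2j * cmath.pi / N)
# Hypothetical N x Q factor: first Q columns of the N-point DFT, rows scaled to unit norm.
P = [[W ** (n * q) / Q ** 0.5 for q in range(Q)] for n in range(N)]


def solve(a, b):
    """Solve a square complex linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-12:
            raise ValueError("system is (numerically) singular")
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    u = [0j] * n
    for r in reversed(range(n)):
        u[r] = (m[r][n] - sum(m[r][k] * u[k] for k in range(r + 1, n))) / m[r][r]
    return u


def noiseless_output(x, s):
    """Noise-free observation y[r][n] = x[n] * sum_q P[n][q] * s[r][q]."""
    return [[x[n] * sum(P[n][q] * s[r][q] for q in range(Q)) for n in range(N)]
            for r in range(R)]


def recover(y_hat):
    """Recover x and s from y_hat, assuming the pilot x[0] = 1 is known."""
    n_unknowns = (N - 1) + R * Q          # z[1..N-1] followed by s[0], ..., s[R-1]
    a, b = [], []
    for r in range(R):
        for n in range(N):
            row = [0j] * n_unknowns
            for q in range(Q):
                row[(N - 1) + r * Q + q] = -P[n][q]
            if n == 0:
                rhs = -y_hat[r][0]         # pilot: z[0] = 1 moves to the right-hand side
            else:
                row[n - 1] = y_hat[r][n]   # y[r][n] * z[n] - P[n] . s[r] = 0
                rhs = 0j
            a.append(row)
            b.append(rhs)
    u = solve(a, b)
    x_hat = [1 + 0j] + [1 / z for z in u[:N - 1]]
    s_hat = [u[(N - 1) + r * Q:(N - 1) + (r + 1) * Q] for r in range(R)]
    return x_hat, s_hat
```

Calling `recover(noiseless_output(x, s))` with `x[0] = 1`, nonzero remaining symbols and two non-parallel coefficient vectors in `s` should return `x` and `s` up to rounding error. For this particular P the system becomes singular exactly when the two coefficient vectors are parallel, which is a measure-zero event.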
We next generalize what we learned in the example above to and arbitrary, and start by noting that if is a solution of for fixed , then with is also a solution of this system of equations. It is therefore immediately clear that at least one pilot symbol is needed to make this system of equations uniquely solvable.
To guess the capacity pre-log for general parameters and we first note that the homogeneous linear system of equations corresponding to that in (8), has equations for unknowns. As the example above indicates, we need to seek conditions under which this homogeneous linear system of equations can be converted into a linear system of equations that has a unique solution. Provided that satisfies the technical condition specified in Theorem 1 below, this entails meeting the following two requirements: (i) at least one symbol is used as a pilot symbol to resolve the scaling ambiguity described in the previous paragraph; (ii) the number of unknowns in the system of equations corresponding to that in (8) must be smaller than or equal to the number of equations. To maximize the capacity pre-log we want to use the minimum number of pilot symbols that guarantees (i) and (ii). In order to identify this minimum, we have to distinguish two cases:

When [in this case ] we will need at least pilot symbols to satisfy requirement (ii). Since , choosing exactly pilot symbols will satisfy both requirements. The number of symbols left for communication will, therefore, be . Hence, we expect the capacity pre-log to be given by , which agrees with the result stated in (1).

When [in this case ], we will need at least one pilot symbol to satisfy requirement (i). Since requirement (ii) is satisfied as a consequence of , it suffices to choose exactly one pilot symbol. The number of symbols left for communication will, therefore, be and we hence expect the capacity pre-log to equal , which again agrees with the result stated in (1). Note that the resulting inhomogeneous linear system of equations has equations in unknowns. As there are more equations than unknowns, equations are redundant and can be eliminated.
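The counting argument of the two cases above can be written out as a short sketch. The parameter names `n` (blocklength), `q` (covariance rank) and `r` (number of receive antennas) are assumptions of this sketch, as is the reading that the homogeneous system has `r * n` equations in `n + r * q` unknowns; this reading reproduces the 6 equations in 7 unknowns of the example in this section.

```python
from fractions import Fraction


def count_system(n, q, r):
    """Equations and unknowns of the homogeneous linear system after the change of variables."""
    equations = r * n        # one scalar equation per receive antenna and time slot
    unknowns = n + r * q     # n transformed symbols plus q whitened coefficients per antenna
    return equations, unknowns


def min_pilots(n, q, r):
    """Smallest pilot count meeting requirement (i) (at least one) and (ii) (unknowns <= equations)."""
    equations, unknowns = count_system(n, q, r)
    return max(1, unknowns - equations)


def heuristic_prelog(n, q, r):
    """Fraction of symbols per block left for communication after removing the pilots."""
    return Fraction(n - min_pilots(n, q, r), n)
```

For the example sizes, `count_system(3, 2, 2)` gives 6 equations and 7 unknowns and a single pilot suffices, whereas with one receive antenna (`r = 1`) the same blocklength and rank require two pilots.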
The proof of our main result, stated in the next section, will provide rigorous justification for the casual arguments put forward in this section.
IV The Capacity Pre-Log
The main result of this paper is the following theorem.
Theorem 1.
Suppose that satisfies the following
Property (A): Every rows of are linearly independent.
Then, the capacity pre-log of the SIMO channel (3) is given by
(10) 
Remark 1.
We will prove Theorem 1 by showing, in Section V, that the capacity pre-log of the SIMO channel (3) can be upper-bounded as
(11) 
and by establishing, in Section VI, the lower bound
(12) 
While the upper bound (11) can be shown to hold even if does not satisfy Property (A), this property is crucial to establish the lower bound (12).
Remark 2.
The lower bound (12) continues to hold if Property (A) is replaced by the following milder condition on .
Property (A’): There exists a subset of indices with cardinality
such that every rows of are linearly independent.
We decided, however, to state our main result under the stronger Property (A) as both Property (A) and Property (A’) are very mild and the proof of the lower bound (12) under Property (A’) is significantly more cumbersome and does not contain any new conceptual aspects. A sketch of the proof of the stronger result (i.e., under Property (A’)) can be found in [2].
We proceed to discussing the significance of Theorem 1.
IV-A Eliminating the prediction penalty
According to (10) the capacity pre-log of the SIMO channel (3) with receive antennas is given by , provided that Property (A) holds, and . Comparing to the capacity pre-log in the SISO case [10] (this result also follows from (10) with ), we see that, under a mild condition on the channel covariance matrix , adding only one receive antenna yields a reduction of the channel uncertainty-induced pre-log penalty from to . (Footnote 3: Note that the results in [10] are stated for general channel covariance matrix .) How significant is this reduction? Recall that is the number of uncertain channel parameters within each given block of length . Hence, the ratio between the rank of the covariance matrix and the blocklength, , is a measure that can be seen as quantifying the amount of channel uncertainty relative to the number of degrees of freedom for communication. It often makes sense to consider with the amount of channel uncertainty held constant. For concreteness, consider with so that . The capacity pre-log penalty due to channel uncertainty in the SISO case is then given by . Theorem 1 reveals that, by adding a second receive antenna, this penalty can be reduced to and, hence, be made to vanish in the limit . Intuitively, even though the SISO channels between the transmit antenna and the two receive antennas are statistically independent, the transmit signal induces enough statistical dependence between the corresponding receive signals for the second receive antenna to be able to resolve the channel uncertainty associated with the first receive antenna’s channel and thereby make the overall system appear coherent.
IV-B Number of receive antennas
Note that for , we can rewrite (10) as
(13) 
As illustrated in Fig. 1, it follows from (13) that for fixed and with the capacity pre-log of the SIMO channel (3) grows linearly with as long as is smaller than the critical value . Once reaches this critical value, further increasing the number of receive antennas does not increase the capacity pre-log.
IV-C Property (A) is mild
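The linear growth followed by saturation can be made concrete with a small sketch. The closed form used here, min(1 - 1/n, r * (1 - q/n)) with `n` the blocklength, `q` the covariance rank and `r` the number of receive antennas, is an assumption of this sketch: it combines the two upper bounds described in Section V (the constant block-fading pre-log and `r` times the SISO pre-log), and the symbol names are chosen here for illustration. The critical value is simply the point at which the two arguments of the minimum coincide.

```python
from fractions import Fraction


def simo_prelog(n, q, r):
    """Assumed closed form: minimum of the constant block-fading pre-log and r times the SISO pre-log."""
    return min(Fraction(n - 1, n), r * Fraction(n - q, n))


def critical_antennas(n, q):
    """Value of r at which r * (1 - q/n) meets 1 - 1/n (requires q < n); need not be an integer."""
    return Fraction(n - 1, n - q)


def prelog_profile(n, q, max_r):
    """Pre-log for r = 1, ..., max_r receive antennas."""
    return [simo_prelog(n, q, r) for r in range(1, max_r + 1)]
```

For instance, `prelog_profile(12, 10, 7)` increases in steps of 1/6 for the first five antennas and then stays at 11/12, in line with `critical_antennas(12, 10)` being 11/2.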
Property (A) is not very restrictive and is satisfied by many practically relevant channel covariance matrices . For example, removing an arbitrary set of columns from an discrete Fourier transform (DFT) matrix results in a matrix that satisfies Property (A) when is prime [16]. (Weighted) DFT covariance matrices arise naturally in so-called basis-expansion models for time-selective channels [10].
Property (A) can furthermore be shown to be satisfied by “generic” matrices . Specifically, if the entries of are chosen randomly and independently from a continuous distribution [17, Sec. 23, Def. (2)] (i.e., a distribution with a well-defined probability density function (PDF)), then the resulting matrix will satisfy Property (A) with probability one. The proof of this statement follows from a union bound argument together with the fact that independent dimensional vectors drawn independently from a continuous distribution are linearly independent with probability one.
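For small matrices, a property of this kind can be checked by brute force. The sketch below assumes that Property (A) is to be read as "every q rows of the n x q matrix are linearly independent", where q is the number of columns; that reading, the function names and the numerical tolerance are assumptions of the sketch. Only the standard library is used, and the subset enumeration is exponential in general, so this is meant for toy sizes.

```python
import cmath
from itertools import combinations


def det(m):
    """Determinant of a square complex matrix via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1 + 0j
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[piv][c]) == 0:
            return 0j
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def satisfies_property_a(mat, tol=1e-9):
    """True if every q x q row-submatrix of the n x q matrix has a determinant above tol."""
    n_rows, q = len(mat), len(mat[0])
    return all(abs(det([mat[i] for i in rows])) > tol
               for rows in combinations(range(n_rows), q))


def dft_columns(n, cols):
    """n-point DFT matrix restricted to the given column indices."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in cols] for r in range(n)]
```

As a usage example, `satisfies_property_a(dft_columns(5, [0, 1, 3]))` holds for the prime size 5, while `satisfies_property_a(dft_columns(4, [0, 2]))` fails because rows 0 and 2 of that submatrix coincide. The same check can be applied to matrices with randomly drawn entries.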
V Proof of the Upper Bound (11)
The proof of (11) consists of two parts. First, in Section V-A, we prove that . This will be accomplished by generalizing, to the SIMO case, the approach developed in [10, Prop. 4] for establishing an upper bound on the SISO capacity pre-log. Second, in Section V-B, we prove that by showing that the capacity of a SIMO channel with receive antennas and channel covariance matrix of rank can be upper-bounded by the capacity of a SIMO channel with receive antennas, the same SNR, and a rank-1 covariance matrix. The desired result, , then follows by application of [7, Eq. (27)], [18, Eq. (7)] as detailed below.
V-A First part:
Recall that has rank . Without loss of generality, we assume, in what follows, that the first rows of are linearly independent. This can always be ensured by reordering the scalar IO relations in (2). With and we can write
(15) 
where (a) and (b) follow by the chain rule for mutual information and in (c) we used that and are independent conditional on . Next, we upper-bound each term in (15) separately.
From [19, Thm. 4.2] we can conclude that the assumption of the first rows of being linearly independent implies that the first term on the RHS of (15) grows at most double-logarithmically with SNR and hence does not contribute to the capacity pre-log. For the reader’s convenience, we repeat the corresponding brief calculation from [19, Thm. 4.2] in Appendix A and show that:
(16) 
Here and in what follows, refers to the limit .
For the second term in (15) we can write
(17) 
where in (a) we used the fact that conditioning reduces entropy; (b) follows from the chain rule for differential entropy and the fact that conditioning reduces entropy; (c) follows because Gaussian random variables are differential-entropy maximizers for fixed variance and because and are independent; (d) is a consequence of the power constraint (5); and (e) follows because .
It follows from (18) that for , the capacity pre-log is zero and can grow no faster than double-logarithmically in .
Recall that is the capacity pre-log of the correlated block-fading SISO channel [10]. As the proof of the upper bound reveals, the capacity pre-log of the SIMO channel (3) cannot be larger than times the capacity pre-log of the corresponding SISO channel (i.e., the capacity pre-log of one of the SISO component channels). The upper bound may seem crude, but, surprisingly, it matches the lower bound for .
V-B Second part:
The proof of will be accomplished in two steps. In the first step, we show that the capacity of a SIMO channel with receive antennas and rank channel covariance matrix is upper-bounded by the capacity of a SIMO channel with receive antennas, the same SNR, and rank covariance matrix. In the second step, we exploit the fact that the channel (14) with rank covariance matrix (under the assumption that the rows of have unit norm) is a constant block-fading channel for which the capacity pre-log was shown in [7] to equal . We now implement the proof program just outlined.
Let denote the columns of the matrix so that . Let denote the transposed rows of the matrix so that . We can rewrite the IO relation (14) in the following form that is more convenient for the ensuing analysis:
Let be independent random matrices of dimension , each with i.i.d. entries. As, by assumption, the rows of have unit norm, we have that
Hence, we can rewrite as
(19) 
where
(20) 
Note now that each is the output of a SIMO channel with receive antennas, rank channel covariance matrix, and SNR . Realizing that, by (19) and (20), forms a Markov chain, we conclude, by the data-processing inequality [20, Sec. 2.8], that
The claim now follows by noting that the matrix obtained by stacking the matrices next to each other can be interpreted as the output of a SIMO channel with receive antennas, rank covariance matrix, independent fading across receive antennas, and SNR . The proof is completed by upper-bounding the capacity of this channel by means of the following lemma.
Lemma 2.
The capacity of the SIMO channel (14) with receive antennas, , and can be upper-bounded according to
VI Proof of the Lower Bound (12)
To help the reader navigate through the proof of the lower bound (12), we start by explaining the architecture of the proof.
VI-A Architecture of the proof
The proof consists of the following steps, each of which corresponds to a subsection in this section:
1) Choose an input distribution; we will see that i.i.d. input symbols allow us to establish the capacity pre-log lower bound (12).
2) Decompose the mutual information between the input and the output of the channel according to .
3) Using standard information-theoretic bounds, show that is upper-bounded by.
4) Split into three terms: a term that depends on SNR, a differential entropy term that depends on the noiseless channel output only, and a differential entropy term that depends on the noise vector only. Conclude that the last of these three terms is a finite constant. (Footnote 4: Here, and in what follows, whenever we say “finite constant”, we mean SNR-independent and finite.)
5) To show that the dependent differential entropy obtained in Step 4 can be lower-bounded by a finite constant, apply the change of variables to rewrite the differential entropy as a sum of the differential entropy of and the expected (w.r.t. and ) logarithm of the Jacobian determinant corresponding to the transformation . Conclude that the differential entropy of is a finite constant. It remains to show that the expected logarithm of the Jacobian determinant is lower-bounded by a finite constant as well.
6) Factor out the dependent terms from the expected logarithm of the Jacobian determinant and conclude that these terms are finite constants. It remains to show that the expected logarithm of the dependent factor in the Jacobian determinant is lower-bounded by a finite constant as well. This poses the greatest technical difficulties in the proof of the lower bound (12) and is addressed in the remaining steps.
7) Based on a deep result from algebraic geometry, known as Hironaka’s Theorem on the Resolution of Singularities, conclude that the expected logarithm of the dependent factor in the Jacobian determinant is lower-bounded by a finite constant, provided that this factor is nonzero for at least one element in its domain.
8) Prove by explicit construction that there exists at least one , for which the dependent factor in the Jacobian determinant is nonzero.
We next implement the proof program outlined above.
VI-B Step 1: Choice of input distribution
First note that for the lower bound in (12) is reduced to and is hence trivially satisfied. In the remainder of the paper we shall therefore assume that .
We shall furthermore work under the assumption
(21) 
which trivially leads to a capacity pre-log lower bound as capacity is a nondecreasing function of (one can always switch off receive antennas).
A capacity lower bound is trivially obtained by evaluating the mutual information in (4) for an appropriate input distribution. Specifically, we take i.i.d. , . This implies that and, hence [19, Lem. 6.7],
(22) 
We point out that every input vector with i.i.d., zero mean, unit variance entries that satisfy would allow us to prove (12). The choice is made for concreteness and convenience.
VI-C Step 2: Mutual information decomposition
Decompose
(23) 
and separately bound the two differential entropy terms for the input distribution chosen in Step 1.
VI-D Step 3: Analysis of
As conditioned on is JPG, the conditional differential entropy can be upper-bounded in a straightforward manner as follows:
(24) 
Here, (a) follows from Jensen’s inequality, and (b) holds because has rank and, therefore, for all .
VI-E Step 4: Splitting into three terms
Finding an asymptotically (in SNR) tight lower bound on is the main technical challenge of the proof of Theorem 1. The back-of-the-envelope calculation presented in Section III suggests that the problem can be approached by splitting into a term that depends on the noiseless channel output only and a term that depends on noise only. This can be realized as follows.
Consider a set of indices (we shall later discuss how to choose ) and define the following projection matrices
We can lower-bound according to
(25) 
Here, (a) follows by the chain rule for differential entropy; (b) follows from (3), (6), and because conditioning reduces entropy; (c) follows because differential entropy is invariant under translations and because and are independent; (d) follows because and are independent; and in (e) we used the fact that is a dimensional vector and , where here and in what follows denotes a constant that is independent of and can take a different value at each appearance.
Through this chain of inequalities, we disposed of noise and isolated SNR dependence into a separate term. This corresponds to considering the noise-free IO relation (6) in the back-of-the-envelope calculation. Note further that we also rid ourselves of the components of indexed by ; this corresponds to eliminating unnecessary equations in the back-of-the-envelope calculation. The specific choice of the set is crucial and will be discussed next.
VI-F Step 5: Analysis of the SNR-dependent term in (25)
If , we can substitute (25) and (24) into (23) which then yields a capacity lower bound of the form
(26) 
This bound needs to be tightened by choosing the set such that is as large as possible while guaranteeing . Comparing the lower bound (26) to the upper bound (11) we see that the bounds match if
(27) 
Condition (27) dictates that for we must set , which yields . When the set must be a proper subset of . Specifically, we shall choose as follows. Set
(28) 
let
and define .
VI-G Step 6: Analysis of through change of variables
It is difficult to analyze directly since depends on the pair of variables in a nonlinear fashion. We have seen, in Section III, that (6) has a unique solution in , provided that the appropriate number of pilot symbols is used. This suggests that there must be a one-to-one correspondence between and the pair . The existence of such a one-to-one correspondence allows us to locally linearize the equation and to relate to . This idea is key to bringing into a form that eventually allows us to conclude that .
Formally, it is possible to relate the differential entropies of two random vectors of the same dimension that are related by a deterministic one-to-one function (in the sense of [21, p.7]) according to the following lemma.
Lemma 3 (Transformation of differential entropy).
Assume that is a continuous vector-valued function that is one-to-one and differentiable almost everywhere (a.e.) on . Let be a continuous [17, Sec. 23, Def. (2)] random vector (i.e., it has a well-defined PDF) and let . Then
where is the Jacobian of the function .
The proof follows from the change-of-variables theorem for integrals [21, Thm. 7.26] and is given in Appendix B for completeness since the version of the theorem for complex-valued functions does not seem to be well documented in the literature.
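A scalar sanity check may help to see what kind of correction term such a transformation rule produces in the complex case. The example below is an illustration only and is not the statement of Lemma 3: it takes a standard complex Gaussian $u$ and the linear map $v = a u$ with a fixed complex $a \neq 0$, both chosen here for convenience.

```latex
u \sim \mathcal{CN}(0,1): \quad h(u) = \log(\pi e),
\qquad
v = a u \sim \mathcal{CN}(0,|a|^2): \quad
h(v) = \log\bigl(\pi e\,|a|^2\bigr) = h(u) + \log|a|^2 .
```

Viewed as a map on $\mathbb{R}^2$, multiplication by $a$ has real Jacobian determinant $|a|^2$, which is why the squared magnitude, rather than $|a|$ itself, appears in the entropy shift.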
Note that with given in (27) and . Since (see (27)), the vectors and are of different dimensions and Lemma 3 can therefore not be applied directly to relate to . This problem can be resolved by conditioning on a subset (specified below) of components of according to
(29) 
The components correspond to the pilot symbols in the back-of-the-envelope calculation. The set is chosen such that (i) the set of remaining components in , , is of appropriate size ensuring that and are of the same dimension, and (ii) and are related by a deterministic bijection so that Lemma 3 can be applied to relate to . Specifically, set
(30) 
let , which implies . Observe that (conditioned on ) depends only on , and due to our choice of (it is actually the choice of that is important here), the vectors and are of the same dimension. Furthermore, these two vectors are related through a deterministic bijection: Consider the vector-valued function
(31) 
Here, and whenever we refer to the function in the following, we use the convention that the parameter vector and the variable vector are stacked into the vector and we set .
Lemma 4.
If has nonzero components only, i.e., for all , then the function is one-to-one a.e. on .
The proof of Lemma 4 is given in Appendix C and is based on the results obtained later in this section. We therefore invite the reader to first study the remainder of Section VI and to return to Appendix C afterwards.
Recall that and hence . Therefore, it follows from Lemma 4 that as long as is fixed and satisfies , for all , and are related through the bijection as claimed.
Comments
A few comments on Lemma 4 are in order. For and as in the simple example in Section III, we see from (27) that so that and . Further, for this example, it follows from (30) that and hence and . Therefore, Lemma 4 simply says that (6) has a unique solution for fixed . As already mentioned, conditioning w.r.t. in (29) in order to make the relation between and be one-to-one corresponds to transmitting a pilot symbol, as was done in the back-of-the-envelope calculation by setting .
We can now use Lemma 3 to relate