Noncoherent Capacity of Underspread Fading Channels
We derive bounds on the noncoherent capacity of wide-sense stationary uncorrelated scattering (WSSUS) channels that are selective both in time and frequency, and are underspread, i.e., the product of the channel’s delay spread and Doppler spread is small. For input signals that are peak constrained in time and frequency, we obtain upper and lower bounds on capacity that are explicit in the channel’s scattering function, are accurate for a large range of bandwidth, and allow us to coarsely identify the capacity-optimal bandwidth as a function of the peak power and the channel’s scattering function. We also obtain a closed-form expression for the first-order Taylor series expansion of capacity in the limit of large bandwidth, and show that our bounds are tight in the wideband regime. For input signals that are peak constrained in time only (and, hence, allowed to be peaky in frequency), we provide upper and lower bounds on the infinite-bandwidth capacity and identify cases in which the bounds coincide, so that the infinite-bandwidth capacity is characterized exactly. Our lower bound is closely related to a result by Viterbi (1967).
The analysis in this paper is based on a discrete-time discrete-frequency approximation of WSSUS time- and frequency-selective channels. This discretization explicitly takes into account the underspread property, which is satisfied by virtually all wireless communication channels.
I Introduction and Outline
I-1 Models for fading channels
Channel capacity is a benchmark for the design of any communication system. The techniques used to compute, or at least to bound, channel capacity often provide guidelines for the design of practical systems, e.g., how best to use the available bandwidth and power, and how to design efficient modulation and coding schemes [1, Sec. III.3]. Our goal in this paper is to analyze the capacity of wireless communication channels that are of direct practical importance. We believe that an accurate stochastic model for such channels should take the following aspects into account:
The channel is selective in time and frequency, i.e., it exhibits memory in frequency and in time, respectively.
Neither the transmitter nor the receiver knows the instantaneous realization of the channel.
The peak power of the input signal is limited.
These aspects are important because they arise from practical limitations of real-world communication systems: temporal variations of the environment and multipath propagation are responsible for channel selectivity in time and frequency, respectively [2, 3]; perfect channel knowledge at the receiver is impossible to obtain because channel state information needs to be extracted from the received signal; finally, realizable transmitters are always limited in their peak output power. The above aspects are also fundamental as they significantly impact the behavior of channel capacity: for example, the capacity of a block-fading channel behaves differently from the capacity of a channel that is stationary in time; channel capacity with perfect channel knowledge at the receiver is never smaller than the capacity without channel knowledge, and the signaling schemes necessary to achieve capacity are also very different in the two cases; finally, a peak constraint on the transmit signal can lead to vanishing capacity in the large-bandwidth limit [7, 8, 9], while without a peak constraint the infinite-bandwidth AWGN capacity can be attained asymptotically [7, 10, 11, 12, 13, 14, 15].
Small-scale fading of wireless channels can be sensibly modeled as a stochastic Gaussian linear time-varying (LTV) system; in particular, we base our developments on the widely used wide-sense stationary uncorrelated scattering (WSSUS) model for random LTV channels [16, 12]. Like most models for real-world channels, the WSSUS model is time continuous; however, almost all tools for information-theoretic analysis of noisy channels require a discretized representation of the channel’s input-output relation. Several approaches to discretize random LTV channels are proposed in the literature, e.g., sampling [8, 16, 17] or basis expansion [18, 19]; all these discretized models incur an approximation error with respect to the continuous-time WSSUS model that is often difficult to quantify. As virtually all wireless channels of practical interest are underspread, i.e., the product of maximum delay and maximum Doppler shift is small, we build our information-theoretic analysis upon a discretization of LTV channels, proposed by Kozek, that explicitly takes into account the underspread property to minimize the approximation error in the mean-square sense.
I-2 Capacity of noncoherent WSSUS channels
Throughout the paper, we assume that both the transmitter and the receiver know the channel law¹ but are ignorant of the channel realization, a setting often called noncoherent. In the following, we refer to channel capacity in the noncoherent setting simply as “capacity”. In contrast, in the coherent setting the receiver is also assumed to know the channel realization perfectly; the corresponding capacity is termed coherent capacity.

¹This implies that the codebook and the decoding strategy can be optimized accordingly.
A general closed-form expression for the capacity of Rayleigh-fading channels is not known, even if the channel is memoryless. However, several asymptotic results are available. If only a constraint on the average transmitted power is imposed, the AWGN capacity can be achieved in the infinite-bandwidth limit also in the presence of fading. This result is quite robust, as it holds for a wide variety of channel models [7, 10, 11, 12, 13, 14, 15]. Verdú showed that flash signaling, which implies unbounded peak power of the input signal, is necessary and sufficient to achieve the infinite-bandwidth AWGN capacity on block-memoryless fading channels; a form of flash signaling is also infinite-bandwidth optimal for the more general time- and frequency-selective channel model used in the present paper. In contrast, if the peakiness of the input signal is restricted, the infinite-bandwidth capacity behavior of most fading channels changes drastically, and the limit depends on the type of peak constraint imposed [7, 8, 9, 13, 23]. In this paper, we shall distinguish between a peak constraint in time and a peak constraint in time and frequency.
Peak constraint in time
No closed-form capacity expression, not even in the infinite-bandwidth limit, seems to exist to date for time- and frequency-selective WSSUS channels. Viterbi’s analysis provides a result that can be interpreted as a lower bound on the infinite-bandwidth capacity of time- and frequency-selective channels. This lower bound is in the form of the infinite-bandwidth AWGN capacity minus a penalty term that depends on the channel’s power-Doppler profile. For channels that are time selective but frequency flat, structurally similar expressions were found for the infinite-bandwidth capacity [24, 25] and for the capacity per unit energy.
Peak constraint in time and frequency
Although a closed-form capacity expression valid for all bandwidths is not available, it is known that the infinite-bandwidth capacity is zero for various channel models [7, 8, 9]. This asymptotic capacity behavior implies that signaling schemes that spread the transmit energy uniformly across time and frequency perform poorly in the large-bandwidth regime. Even more useful for performance assessment would be capacity bounds for finite bandwidth. For frequency-flat time-selective channels, such bounds can be found in [27, 28], while for the more general time- and frequency-selective case treated in the present paper, upper bounds seem to exist only on the rates achievable with particular signaling schemes, namely for orthogonal frequency-division multiplexing (OFDM) with constant-modulus symbols, and for multiple-input multiple-output (MIMO) OFDM with unitary space-frequency codes over frequency-selective block-fading channels.
We use the discrete-time discrete-frequency approximation of continuous-time underspread WSSUS channels proposed by Kozek to obtain the following results:
We derive upper and lower bounds on capacity under a constraint on the average power and under a peak constraint in both time and frequency. These bounds are valid for any bandwidth, are explicit in the channel’s scattering function, and generalize earlier results on achievable rates. In particular, our bounds allow us to coarsely identify the capacity-optimal bandwidth for a given peak constraint and a given scattering function.
Under the same peak constraint in time and frequency, we find the first-order Taylor series expansion of channel capacity in the limit of infinite bandwidth. This result extends the known asymptotic capacity analysis for frequency-flat time-selective channels to channels that are selective in both time and frequency.
In the infinite-bandwidth limit and for transmit signals that are peak-constrained in time only, we recover Viterbi’s capacity lower bound. In addition, we derive an upper bound that is shown to coincide with the lower bound for a specific class of channels; hence, the infinite-bandwidth capacity for this class of channels is established.
The results in this paper rely on several flavors of Szegö’s theorem on the asymptotic eigenvalue distribution of Toeplitz matrices [31, 32]; in particular, we use various extensions of Szegö’s theorem to two-level Toeplitz matrices, i.e., block-Toeplitz matrices that have Toeplitz blocks [33, 34]. Another key ingredient for several of our proofs is the relation between mutual information and minimum mean-square error (MMSE) discovered recently by Guo et al. Furthermore, we use a property of the information divergence of orthogonal signaling schemes derived by Butman and Klass.
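As a minimal numerical illustration of the classical (one-level) Szegö theorem, and not of the two-level extensions used later, the sketch below compares the normalized log-determinant $\frac{1}{K}\sum_i \log(1+\rho\lambda_i)$ of a $K\times K$ Toeplitz correlation matrix with $\int_{-1/2}^{1/2}\log(1+\rho\,c(\theta))\,d\theta$, where $c(\theta)$ is the spectrum of the correlation sequence. The exponential correlation $a^{|k|}$, the value $a = 0.9$, and the weight $\rho = 10$ (called `snr` in the code) are hypothetical choices made only for this sketch.

```python
import numpy as np

def symmetric_toeplitz(r):
    """K x K real symmetric Toeplitz matrix with entries r[|i - j|]."""
    K = len(r)
    lags = np.abs(np.subtract.outer(np.arange(K), np.arange(K)))
    return r[lags]

def szego_check(K, a=0.9, snr=10.0, grid=20000):
    """Return ((1/K) log det(I + snr R_K), Szego limit) for r[k] = a^|k| (hypothetical model)."""
    R = symmetric_toeplitz(a ** np.arange(K))
    lam = np.linalg.eigvalsh(R)                    # eigenvalues of the Toeplitz matrix
    finite_K = np.mean(np.log1p(snr * lam))        # (1/K) sum_i log(1 + snr * lambda_i)

    # Spectrum of r[k] = a^|k|: c(theta) = (1 - a^2) / |1 - a exp(-j 2 pi theta)|^2
    theta = (np.arange(grid) + 0.5) / grid - 0.5   # midpoints of a uniform grid on [-1/2, 1/2)
    c = (1 - a**2) / np.abs(1 - a * np.exp(-2j * np.pi * theta)) ** 2
    limit = np.mean(np.log1p(snr * c))             # Riemann sum for the integral over [-1/2, 1/2)
    return finite_K, limit

for K in (8, 64, 512):
    finite_K, limit = szego_check(K)
    print(K, round(finite_K, 4), round(limit, 4))
```

The gap between the two numbers shrinks roughly like $1/K$, which is the type of asymptotic statement the capacity analysis relies on when $K\to\infty$.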
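Uppercase boldface letters denote matrices and lowercase boldface letters designate vectors. The superscripts $^T$, $^*$, and $^H$ stand for transposition, element-wise conjugation, and Hermitian transposition, respectively. For two matrices $\mathbf{A}$ and $\mathbf{B}$ of appropriate dimensions, the Hadamard product is denoted as $\mathbf{A}\odot\mathbf{B}$. We designate the identity matrix of dimension $N\times N$ as $\mathbf{I}_N$ and the all-zero vector of appropriate dimension as $\mathbf{0}$. We let $\operatorname{diag}\{\mathbf{x}\}$ denote a diagonal square matrix whose main diagonal contains the elements of the vector $\mathbf{x}$. The determinant, trace, and rank of the matrix $\mathbf{A}$ are denoted as $\det(\mathbf{A})$, $\operatorname{tr}(\mathbf{A})$, and $\operatorname{rank}(\mathbf{A})$, respectively, and $\lambda_i(\mathbf{A})$ is the $i$th eigenvalue of a square matrix $\mathbf{A}$. The function $\delta(t)$ is the Dirac distribution, and $\delta[n]$ is defined as $\delta[0]=1$ and $\delta[n]=0$ for all $n\neq 0$. All logarithms are to the base $e$. The real part of the complex number $z$ is denoted $\operatorname{Re}\{z\}$. We write $\mathcal{A}\setminus\mathcal{B}$ for the set difference between the sets $\mathcal{A}$ and $\mathcal{B}$. For two functions $f(x)$ and $g(x)$, the notation $f(x)=o(g(x))$ for $x\to\infty$ means that $\lim_{x\to\infty} f(x)/g(x) = 0$. With $\lfloor x\rfloor$ we denote the largest integer smaller than or equal to $x$. A signal is an element of the Hilbert space $\mathcal{L}^2(\mathbb{R})$ of square-integrable functions. The inner product between two signals $x(t)$ and $y(t)$ is denoted as $\langle x,y\rangle = \int x(t)\,y^*(t)\,dt$. For a random variable (RV) $x$ with distribution $P$, we write $x\sim P$. We denote expectation by $\mathbb{E}[\cdot]$, and use the notation $\mathbb{E}_x[\cdot]$ to stress that the expectation is taken with respect to the RV $x$. We write $D(P\,\|\,Q)$ for the Kullback-Leibler (KL) divergence between the two distributions $P$ and $Q$. Finally, $\mathcal{CN}(\mathbf{m},\mathbf{R})$ stands for the distribution of a jointly proper Gaussian (JPG) random vector with mean $\mathbf{m}$ and covariance matrix $\mathbf{R}$.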
II Channel and System Model
A channel model needs to strike a balance between generality, accuracy, engineering relevance, and mathematical tractability. In the following, we start from the classical WSSUS model for LTV channels [16, 12] because it is a fairly general, yet accurate and mathematically tractable model that is widely used. This model has a continuous-time input-output relation, which is difficult to use as a basis for information-theoretic studies. However, if the channel is underspread it is possible to closely approximate the original WSSUS input-output relation by a discretized input-output relation that is especially suited for the derivation of capacity bounds. In particular, the bounds we derive in this paper can be directly related to the underlying continuous-time WSSUS channel as they are explicit in its scattering function.
II-A Time- and Frequency-Selective Underspread Fading Channels
II-A1 The channel operator
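A wireless channel can be described as a linear operator $\mathbb{H}$ that maps an input signal $x(t)\in\mathcal{L}^2(\mathbb{R})$ into an output signal $y(t)\in\mathcal{R}_{\mathbb{H}}$, where $\mathcal{R}_{\mathbb{H}}$ denotes the range space of $\mathbb{H}$. The corresponding noise-free input-output relation is then $y(t) = (\mathbb{H}x)(t)$.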
It is sensible to model wireless channels as random, first, because a deterministic description of the physical propagation environment is too complex in most cases of practical interest, and second, because a stochastic description is much more robust, in the sense that systems designed on the basis of a stochastic channel model can be expected to work in a variety of different propagation environments. Consequently, we assume that $\mathbb{H}$ is a random operator.
II-A2 System functions
Because communication takes place over a finite bandwidth and a finite time duration, we can assume that each realization of $\mathbb{H}$ is a Hilbert-Schmidt operator [38, 39]. Hence, the noise-free input-output relation of the LTV channel can be written as² [38, p. 1083]

$$ y(t) = \int k_{\mathbb{H}}(t,t')\,x(t')\,dt' \tag{1} $$

²All integrals are from $-\infty$ to $\infty$ unless stated otherwise.
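where the kernel $k_{\mathbb{H}}(t,t')$ can be interpreted as the channel response at time $t$ to a Dirac impulse at time $t'$. Instead of two variables that denote absolute time, it is common in the engineering literature to use absolute time $t$ and delay $\tau = t - t'$. This leads to the time-varying impulse response $h_{\mathbb{H}}(t,\tau) \triangleq k_{\mathbb{H}}(t,t-\tau)$ and the corresponding noise-free input-output relation

$$ y(t) = \int h_{\mathbb{H}}(t,\tau)\,x(t-\tau)\,d\tau. \tag{2} $$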
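Two more system functions that will be important in the following developments are the time-varying transfer function³

$$ L_{\mathbb{H}}(t,f) \triangleq \int h_{\mathbb{H}}(t,\tau)\,e^{-j2\pi f\tau}\,d\tau \tag{3} $$

and the spreading function

$$ S_{\mathbb{H}}(\nu,\tau) \triangleq \int h_{\mathbb{H}}(t,\tau)\,e^{-j2\pi\nu t}\,dt. \tag{4} $$

³As $\mathbb{H}$ is of Hilbert-Schmidt type, the time-varying impulse response is square integrable, and the Fourier transforms in (3) and (4) are well defined.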
In particular, if we rewrite the input-output relation (2) in terms of the spreading function as

$$ y(t) = \iint S_{\mathbb{H}}(\nu,\tau)\,x(t-\tau)\,e^{j2\pi\nu t}\,d\tau\,d\nu, $$

we obtain an intuitive physical interpretation: the output signal is a weighted superposition of copies of the input signal that are shifted in time by the delay $\tau$ and in frequency by the Doppler shift $\nu$.
II-A3 Stochastic characterization and WSSUS assumption
For mathematical tractability, we need to make additional assumptions on the system functions. First, we assume that $h_{\mathbb{H}}(t,\tau)$ is a zero-mean JPG random process in $t$ and $\tau$. Indeed, the Gaussian distribution is empirically supported for narrowband channels, and even ultrawideband (UWB) channels with bandwidth up to several gigahertz can be modeled as Gaussian distributed. By virtue of the Gaussian assumption, $h_{\mathbb{H}}(t,\tau)$ is completely characterized by its correlation function. Yet, this correlation function is four-dimensional in general and thus difficult to work with. A further simplification is possible if we assume that the channel process is wide-sense stationary in time $t$ and uncorrelated in delay $\tau$, the so-called WSSUS assumption. As a consequence, $L_{\mathbb{H}}(t,f)$ is wide-sense stationary both in time and in frequency, or, equivalently, $S_{\mathbb{H}}(\nu,\tau)$ is uncorrelated in Doppler $\nu$ and delay $\tau$:

$$ \mathbb{E}\big[L_{\mathbb{H}}(t,f)\,L_{\mathbb{H}}^*(t',f')\big] = R_{\mathbb{H}}(t-t',f-f') $$
$$ \mathbb{E}\big[S_{\mathbb{H}}(\nu,\tau)\,S_{\mathbb{H}}^*(\nu',\tau')\big] = C_{\mathbb{H}}(\nu,\tau)\,\delta(\nu-\nu')\,\delta(\tau-\tau'). $$
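The function $R_{\mathbb{H}}(\Delta t,\Delta f)$ is called the channel’s (time-frequency) correlation function, and $C_{\mathbb{H}}(\nu,\tau)$ is called the scattering function of the channel. The two functions are related by a two-dimensional Fourier transform,

$$ R_{\mathbb{H}}(\Delta t,\Delta f) = \iint C_{\mathbb{H}}(\nu,\tau)\,e^{j2\pi(\nu\Delta t - \tau\Delta f)}\,d\tau\,d\nu. $$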
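As $L_{\mathbb{H}}(t,f)$ is stationary in $t$ and $f$, $C_{\mathbb{H}}(\nu,\tau)$ is nonnegative and real-valued for all $\nu$ and $\tau$, and can be interpreted as the spectrum of the channel process. The power-delay profile of $\mathbb{H}$ is defined as

$$ p(\tau) \triangleq \int C_{\mathbb{H}}(\nu,\tau)\,d\nu $$

and the power-Doppler profile as

$$ q(\nu) \triangleq \int C_{\mathbb{H}}(\nu,\tau)\,d\tau. $$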
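Both profiles are marginals of the scattering function: integrate over Doppler to get the power-delay profile and over delay to get the power-Doppler profile. A minimal numerical sketch, assuming a hypothetical separable scattering function sampled at cell midpoints of the rectangle $[-\nu_0,\nu_0]\times[-\tau_0,\tau_0]$ and normalized to unit total power (all numeric values are illustrative), approximates these integrals by Riemann sums:

```python
import numpy as np

def delay_doppler_profiles(C, dnu, dtau):
    """Riemann-sum marginals of a sampled scattering function C[i, j] = C(nu_i, tau_j).

    Returns p[j] ~ int C(nu, tau_j) dnu (power-delay profile) and
            q[i] ~ int C(nu_i, tau) dtau (power-Doppler profile).
    """
    p = C.sum(axis=0) * dnu    # integrate out the Doppler axis (axis 0)
    q = C.sum(axis=1) * dtau   # integrate out the delay axis (axis 1)
    return p, q

# Hypothetical scattering function on [-nu0, nu0] x [-tau0, tau0], sampled at cell midpoints.
nu0, tau0, M = 100.0, 2e-6, 400
dnu, dtau = 2 * nu0 / M, 2 * tau0 / M
nu = -nu0 + (np.arange(M) + 0.5) * dnu
tau = -tau0 + (np.arange(M) + 0.5) * dtau
C = np.outer(1 - (nu / nu0) ** 2, np.exp(-np.abs(tau) / (0.5 * tau0)))
C /= C.sum() * dnu * dtau      # normalize the total power to 1

p, q = delay_doppler_profiles(C, dnu, dtau)
print(p.sum() * dtau, q.sum() * dnu)   # both approximately 1 (the total power)
```

Integrating either profile once more recovers the total power, which is a quick consistency check on measured scattering functions as well.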
The WSSUS assumption is widely used in wireless channel modeling [16, 12, 2, 1, 41, 42]. It is in good agreement with measurements of tropospheric scattering channels, and provides a reasonable model for many types of mobile radio channels [43, 44, 45], at least over a limited time duration and bandwidth. Furthermore, the scattering function can be directly estimated from measured data [46, 47], so that capacity expressions and bounds that explicitly depend on the channel’s scattering function can be evaluated for many channels of practical interest.
Formally, the WSSUS assumption is mathematically incompatible with the requirement that $\mathbb{H}$ is of Hilbert-Schmidt type, or, equivalently, that the system functions are square integrable, because stationarity in time and frequency of $L_{\mathbb{H}}(t,f)$ implies that $L_{\mathbb{H}}(t,f)$ cannot decay to zero for $|t|\to\infty$ and $|f|\to\infty$. Similarly to the engineering model of white noise, this incompatibility is a mathematical artifact and not a problem of real-world wireless channels: in fact, every communication system transmits over a finite time duration and over a finite bandwidth.⁴ We believe that the simplification the WSSUS assumption entails justifies this mathematical inconsistency.

⁴A more detailed account of solutions to overcome the mathematical incompatibility between stationary and finite-energy models can be found in [48, Sec. 7.5].
II-B The Underspread Assumption and its Consequences
Because the velocity of the transmitter, of the receiver, and of the objects in the propagation environment is limited, so is the maximum Doppler shift $\nu_0$ experienced by the transmitted signal. We also assume that the maximum delay is strictly smaller than $\tau_0$. For simplicity and without loss of generality, throughout this paper, we consider scattering functions that are centered at $\nu = 0$ and $\tau = 0$, i.e., we remove any overall fixed delay and Doppler shift. The assumptions of limited Doppler shift and delay then imply that the scattering function is supported on a rectangle of spread $\Delta_{\mathbb{H}} \triangleq 4\nu_0\tau_0$,

$$ C_{\mathbb{H}}(\nu,\tau) = 0 \quad \text{for } (\nu,\tau) \notin [-\nu_0,\nu_0]\times[-\tau_0,\tau_0]. \tag{7} $$
Condition (7) in turn implies that the spreading function $S_{\mathbb{H}}(\nu,\tau)$ is also supported on the same rectangle with probability 1 (w.p.1). If $\Delta_{\mathbb{H}} < 1$, the channel is said to be underspread [16, 12, 20]. Virtually all channels in wireless communication are highly underspread, with $\Delta_{\mathbb{H}} \ll 1$ for typical land-mobile channels and even smaller values for some indoor channels with restricted mobility of the terminals [49, 50, 51]. The underspread property of typical wireless channels is very important, first because only (deterministic) underspread channels can be completely identified from measurements [52, 53], and second because underspread channels have a well-structured set of approximate eigenfunctions that can be used to discretize the channel operator, as described next.
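For a rough feel of the numbers, the sketch below evaluates the area $4\nu_0\tau_0$ of a support rectangle $[-\nu_0,\nu_0]\times[-\tau_0,\tau_0]$, using the textbook maximum Doppler shift $\nu_0 = v f_c/c$ of a terminal moving at speed $v$ with carrier frequency $f_c$. The speed, carrier frequency, and delay half-width are hypothetical values chosen for illustration, not measurement data from the references above.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def max_doppler(speed_mps, carrier_hz):
    """Maximum Doppler shift nu_0 = v * f_c / c, in Hz."""
    return speed_mps * carrier_hz / SPEED_OF_LIGHT

def spread(nu0, tau0):
    """Area of the support rectangle [-nu0, nu0] x [-tau0, tau0]."""
    return 4.0 * nu0 * tau0

# Hypothetical vehicular scenario: 30 m/s at a 2 GHz carrier, delay half-width 5 microseconds.
nu0 = max_doppler(30.0, 2e9)     # about 200 Hz
tau0 = 5e-6
delta = spread(nu0, tau0)
print(f"nu0 = {nu0:.1f} Hz, spread = {delta:.2e}, underspread: {delta < 1}")
```

Even with fairly generous assumptions on speed and delay, the product stays orders of magnitude below 1, which is what makes the discretization developed next possible.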
II-B1 Approximate diagonalization of underspread channels
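As $\mathbb{H}$ is a Hilbert-Schmidt operator, its kernel $k_{\mathbb{H}}(t,t')$ can be expressed in terms of its positive singular values $\{\sigma_i\}$, its left singular functions $\{u_i(t)\}$, and its right singular functions $\{v_i(t)\}$ [37, Th. 6.14.1], according to

$$ k_{\mathbb{H}}(t,t') = \sum_{i} \sigma_i\,u_i(t)\,v_i^*(t'). \tag{8} $$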
We denote by $\mathcal{N}_{\mathbb{H}}$ the null space of $\mathbb{H}$, i.e., the space of input signals that the channel maps onto the zero signal. The set $\{v_i(t)\}$ is an orthonormal basis for the orthogonal complement $\mathcal{N}_{\mathbb{H}}^{\perp}$ of $\mathcal{N}_{\mathbb{H}}$ in $\mathcal{L}^2(\mathbb{R})$, and $\{u_i(t)\}$ is an orthonormal basis for the range space $\mathcal{R}_{\mathbb{H}}$. Any input signal in $\mathcal{N}_{\mathbb{H}}$ is of no utility for communication purposes; the input signals in $\mathcal{N}_{\mathbb{H}}^{\perp}$, which we refer to in the remainder of the paper as the input space, can be completely characterized by their projections onto the set $\{v_i(t)\}$. Similarly, the output signal is completely described by its projections onto the set $\{u_i(t)\}$. These projections together with the kernel decomposition (8) yield a countable set of scalar input-output relations, which we refer to as the diagonalization of $\mathbb{H}$.
Because the right and left singular functions depend on the realization of $\mathbb{H}$, diagonalization requires perfect channel knowledge. But this knowledge is not available in the noncoherent setting. In contrast, if the singular functions of the random channel did not depend on its particular realization, we could diagonalize $\mathbb{H}$ without knowledge of the channel realization. This is the case, for example, for random linear time-invariant (LTI) channels, where complex sinusoids are always eigenfunctions, independently of the realization of the channel’s impulse response. Fortunately, the singular functions of underspread random LTV channels can be well approximated by deterministic functions. More precisely, an underspread channel has the following properties:
All realizations of the underspread channel are approximately normal operators, so that the singular value decomposition (8) can be replaced by an eigenvalue decomposition.
Any deterministic unit-energy signal $g(t)$ that is well localized in time and frequency (we measure the joint time-frequency localization of a signal by the product between its effective duration and its effective bandwidth, defined in (64)) is an approximate eigenfunction of $\mathbb{H}$ in the mean-square sense, i.e., the mean-square error $\mathbb{E}\big[\|\mathbb{H}g - \langle\mathbb{H}g, g\rangle g\|^2\big]$ is small if $\mathbb{H}$ is underspread. This error can be further reduced by an appropriate choice of $g(t)$, where the choice depends on the scattering function $C_{\mathbb{H}}(\nu,\tau)$.
If $g(t)$ is an approximate eigenfunction as defined in the previous point, then so is $g(t-t_0)\,e^{j2\pi f_0 t}$ for any time shift $t_0$ and any frequency shift $f_0$.
For any $t_0$ and $f_0$, the time-varying transfer function $L_{\mathbb{H}}(t_0,f_0)$ is an approximate eigenvalue of $\mathbb{H}$ corresponding to the approximate eigenfunction $g_{t_0,f_0}(t) \triangleq g(t-t_0)\,e^{j2\pi f_0 t}$, in the sense that the mean-square error $\mathbb{E}\big[\|\mathbb{H}g_{t_0,f_0} - L_{\mathbb{H}}(t_0,f_0)\,g_{t_0,f_0}\|^2\big]$ is small.
We use these properties of underspread operators to construct an approximation $\tilde{\mathbb{H}}$ of the random channel $\mathbb{H}$ that has a well-structured set of deterministic eigenfunctions. The errors incurred by this approximation are discussed in detail in the appendix. We then diagonalize this approximating operator and exclusively consider the corresponding discretized input-output relation in the remainder of the paper. Property 1, the approximate normality of $\mathbb{H}$, together with Property 2 implies that the kernel of the approximating operator can be synthesized as $\sum_i \lambda_i\,\tilde{u}_i(t)\,\tilde{u}_i^*(t')$, where, differently from (8), the $\lambda_i$ are now random eigenvalues instead of random singular values, and the $\{\tilde{u}_i(t)\}$ constitute a set of deterministic orthonormal eigenfunctions instead of random singular functions. Property 2 means that we are at liberty to choose the approximate eigenfunctions among all signals that are well localized in time and frequency. In particular, we would like the resulting approximating kernel to be convenient to work with and the approximate eigenfunctions to be easy to implement, as discussed in Section II-B3; therefore, we choose the set of approximate eigenfunctions to be highly structured. By Property 3, it is possible to use time- and frequency-shifted versions of a single well-localized prototype function $g(t)$ as eigenfunctions. Furthermore, because the support of $S_{\mathbb{H}}(\nu,\tau)$ is strictly limited in Doppler $\nu$ and delay $\tau$, it follows from the sampling theorem and the Fourier transform relations (3) and (4) that the samples $L_{\mathbb{H}}(kT,nF)$, taken on a rectangular grid with $T\le 1/(2\nu_0)$ and $F\le 1/(2\tau_0)$, are sufficient to characterize $L_{\mathbb{H}}(t,f)$ exactly. Hence, we take as our set of approximate eigenfunctions the so-called Weyl-Heisenberg set $\{g_{k,n}(t)\}$, where $g_{k,n}(t) \triangleq g(t-kT)\,e^{j2\pi nFt}$ are orthonormal signals. The requirement that the $g_{k,n}(t)$ are orthonormal and at the same time well localized in time and frequency implies $TF > 1$, as a consequence of the Balian-Low theorem [55, Ch. 8].
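To make the structure of a Weyl-Heisenberg set concrete, the sketch below samples $g_{k,n}(t) = g(t-kT)\,e^{j2\pi nFt}$ for a hypothetical rectangular prototype of duration $1/F$ and checks numerically that the Gram matrix of the set is the identity when $T \ge 1/F$. The rectangular pulse is only an illustrative assumption: it is orthonormal on this grid but poorly localized in frequency, so it is not the kind of well-localized prototype the approximation above calls for.

```python
import numpy as np

def weyl_heisenberg_atoms(K, N, T, F, dt):
    """Sampled atoms g_{k,n}(t) = g(t - kT) exp(j 2 pi n F t) for k < K, n < N.

    Hypothetical prototype: g(t) = sqrt(F) for 0 <= t < 1/F, else 0 (unit energy).
    Assumes T/dt and 1/(F dt) are integers and T >= 1/F, so that TF >= 1.
    """
    pulse_len = int(round(1.0 / (F * dt)))
    shift = int(round(T / dt))
    total = (K - 1) * shift + pulse_len
    t = np.arange(total) * dt
    atoms = []
    for k in range(K):
        g = np.zeros(total)
        g[k * shift : k * shift + pulse_len] = np.sqrt(F)   # time-shifted prototype
        for n in range(N):
            atoms.append(g * np.exp(2j * np.pi * n * F * t))  # frequency shift by nF
    return np.array(atoms)   # shape (K*N, total)

dt = 1e-3
atoms = weyl_heisenberg_atoms(K=3, N=4, T=1.25, F=1.0, dt=dt)
gram = atoms.conj() @ atoms.T * dt     # inner products approximated by Riemann sums
print(np.max(np.abs(gram - np.eye(len(atoms)))))   # close to 0
```

Here $TF = 1.25$; atoms with different $k$ do not overlap in time, and atoms with the same $k$ are orthogonal because their frequency offsets are integer multiples of $F$ over a window of length $1/F$.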
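Large values of the product $TF$ allow for better time-frequency localization of $g(t)$, but result in a loss of dimensions in signal space compared with the critically sampled case $TF = 1$. The Nyquist conditions $T\le 1/(2\nu_0)$ and $F\le 1/(2\tau_0)$ can be readily satisfied together with $TF>1$ for all underspread channels: multiplying the two conditions gives $TF \le 1/(4\nu_0\tau_0) = 1/\Delta_{\mathbb{H}}$, which exceeds $1$ whenever $\Delta_{\mathbb{H}} < 1$.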
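The compatibility of the Nyquist conditions with $TF > 1$ reduces to simple arithmetic: multiplying $T \le 1/(2\nu_0)$ and $F \le 1/(2\tau_0)$ gives $TF \le 1/(4\nu_0\tau_0)$, so a feasible grid exists exactly when $4\nu_0\tau_0 < 1$. A minimal check, with hypothetical channel and grid parameters:

```python
def max_tf(nu0, tau0):
    """Largest TF allowed by T <= 1/(2 nu0) and F <= 1/(2 tau0), i.e., 1/(4 nu0 tau0)."""
    return 1.0 / (4.0 * nu0 * tau0)

def grid_is_feasible(T, F, nu0, tau0):
    """True if (T, F) satisfies both Nyquist conditions and TF > 1."""
    return T <= 1.0 / (2.0 * nu0) and F <= 1.0 / (2.0 * tau0) and T * F > 1.0

nu0, tau0 = 200.0, 5e-6                             # hypothetical max Doppler [Hz], delay half-width [s]
print(max_tf(nu0, tau0))                            # about 250, far above TF = 1
print(grid_is_feasible(1e-4, 12.5e3, nu0, tau0))    # True (TF = 1.25)
print(grid_is_feasible(1e-4, 5e3, nu0, tau0))       # False (TF = 0.5)
```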
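The samples $L_{\mathbb{H}}(kT,nF)$ are approximate eigenvalues of $\mathbb{H}$ by Property 4; hence, our choice of approximate eigenfunctions results in the following approximating eigenvalue decomposition for $k_{\mathbb{H}}(t,t')$:

$$ \tilde{k}_{\mathbb{H}}(t,t') \triangleq \sum_{k}\sum_{n} L_{\mathbb{H}}(kT,nF)\,g_{k,n}(t)\,g_{k,n}^*(t') \tag{9} $$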
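where $\tilde{k}_{\mathbb{H}}(t,t')$ denotes the kernel of the approximating operator $\tilde{\mathbb{H}}$. For $TF > 1$, the Weyl-Heisenberg set $\{g_{k,n}(t)\}$ is not complete in $\mathcal{L}^2(\mathbb{R})$ [54, Th. 8.3.1]. Therefore, the null space of $\tilde{\mathbb{H}}$ contains nonzero signals. As $\tilde{\mathbb{H}}$ is only an approximation of $\mathbb{H}$, this null space might differ from $\mathcal{N}_{\mathbb{H}}$. Similarly, the range space of $\tilde{\mathbb{H}}$ might differ from $\mathcal{R}_{\mathbb{H}}$. The characterization of the difference between these spaces is an important open problem.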
II-B2 Canonical characterization of signaling schemes
The approximating random channel operator $\tilde{\mathbb{H}}$ has a highly structured set of deterministic orthonormal eigenfunctions. We can, therefore, diagonalize the input-output relation of the approximating channel without the need for channel knowledge at the transmitter or the receiver. Any input signal $x(t)$ that lies in the input space of the approximating operator is uniquely characterized by its projections onto the set $\{g_{k,n}(t)\}$. All physically realizable transmit signals are effectively band limited. As the prototype function $g(t)$ is well concentrated in frequency by construction, we can model the effective band limitation of $x(t)$ by using only a finite number $N$ of slots in frequency. The resulting transmitted signal

$$ x(t) = \sum_{k=-\infty}^{\infty}\sum_{n=0}^{N-1} x[k,n]\,g_{k,n}(t) \tag{10} $$
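then has effective bandwidth $W = NF$. We call the coefficient $x[k,n]$ the transmit symbol in the time-frequency slot $(k,n)$. The received signal can be expanded in the same basis. To compute the resulting projections, we substitute $\tilde{k}_{\mathbb{H}}(t,t')$ and the canonical input signal (10) into the integral input-output relation (1), add white Gaussian noise $w(t)$, and project the resulting noisy received signal onto the functions $g_{k,n}(t)$, i.e.,

$$ y[k,n] \triangleq \langle \tilde{\mathbb{H}}x + w,\, g_{k,n}\rangle = \sum_{k'}\sum_{n'} x[k',n']\,L_{\mathbb{H}}(k'T,n'F)\,\langle g_{k',n'}, g_{k,n}\rangle + \langle w, g_{k,n}\rangle = h[k,n]\,x[k,n] + w[k,n] \tag{11} $$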
for all time-frequency slots $(k,n)$, where $h[k,n] \triangleq L_{\mathbb{H}}(kT,nF)$ and $w[k,n] \triangleq \langle w, g_{k,n}\rangle$. The last step in (11) follows from the orthonormality of the set $\{g_{k,n}(t)\}$. Orthonormality also implies that the discretized noise $w[k,n]$ is JPG and independent and identically distributed (i.i.d.) over time and frequency; for convenience, we normalize the noise variance so that $\mathbb{E}\big[|w[k,n]|^2\big] = 1$ for all $k$ and $n$. The diagonalized input-output relation (11) is completely generic, i.e., it is not limited to a specific signaling scheme.
II-B3 OFDM interpretation of the approximating channel model
The canonical signaling scheme (10) and the corresponding discretized input-output relation (11) are not just tools to analyze channel capacity, but also lead to a practical transmission system. The decomposition of the channel input signal (10) can be interpreted as pulse-shaped (PS) OFDM, where discrete data symbols $x[k,n]$ are modulated onto a set of orthogonal signals $g_{k,n}(t)$, indexed by $k$ and $n$. In addition, this perspective leads to an operational interpretation of the error incurred when approximating $k_{\mathbb{H}}(t,t')$ as in (9). The time- and frequency-dispersive nature of LTV channels leads to intersymbol interference (ISI) and intercarrier interference (ICI) in the received PS-OFDM signal. This is apparent if we project $\mathbb{H}x$ onto the function $g_{k,n}(t)$:

$$ \langle \mathbb{H}x, g_{k,n}\rangle = x[k,n]\,\langle \mathbb{H}g_{k,n}, g_{k,n}\rangle + \sum_{(k',n')\neq(k,n)} x[k',n']\,\langle \mathbb{H}g_{k',n'}, g_{k,n}\rangle. $$
The second term on the right-hand side (RHS) of this projection corresponds to ISI and ICI, while the first term is the desired signal; by Property 4, we can approximate the first term as $x[k,n]\,\langle\mathbb{H}g_{k,n}, g_{k,n}\rangle \approx L_{\mathbb{H}}(kT,nF)\,x[k,n]$. Comparison of (11) with this projection then shows that the input-output relation (11), which results from the approximation (9), can be interpreted as PS-OFDM transmission over the original channel if all ISI and ICI terms are neglected.
With proper design of the prototype signal $g(t)$ and choice of the grid parameters $T$ and $F$, both ISI and ICI can be reduced [56, 57, 58]. The larger the product $TF$, the more effective the reduction in ISI and ICI, as discussed in the appendix. Heuristically, a good compromise between loss of dimensions in signal space and reduction of the interference terms seems to result for a suitably chosen product $TF > 1$ [56, 58]. The cyclic prefix (CP) in a conventional CP-OFDM system incurs a similar dimension loss.
In (72), we provide an upper bound on the mean-square energy of the interference term in the projection of $\mathbb{H}x$ onto $g_{k,n}(t)$, and show how this upper bound can be minimized by a careful choice of the signal $g(t)$ and of the grid parameters $T$ and $F$ [20, 17, 58]. For general scattering functions, the optimization of the triple $(g(t), T, F)$ needs to be performed numerically; a general guideline (see the appendix) is to choose $T$ and $F$ such that

$$ \frac{T}{F} = \frac{\tau_0}{\nu_0}. $$
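Assuming the grid-matching heuristic $T/F = \tau_0/\nu_0$ and a chosen target product $TF$ (both are assumptions of this sketch, and the numeric values are hypothetical), the grid parameters follow in closed form as $T = \sqrt{TF\,\tau_0/\nu_0}$ and $F = \sqrt{TF\,\nu_0/\tau_0}$:

```python
import math

def matched_grid(TF, nu0, tau0):
    """Return (T, F) with T * F = TF and T / F = tau0 / nu0 (grid-matching heuristic)."""
    T = math.sqrt(TF * tau0 / nu0)
    F = math.sqrt(TF * nu0 / tau0)
    return T, F

nu0, tau0 = 200.0, 5e-6            # hypothetical max Doppler [Hz] and delay half-width [s]
T, F = matched_grid(1.25, nu0, tau0)
print(f"T = {T:.3e} s, F = {F:.1f} Hz, TF = {T * F:.3f}")
print("Nyquist margins:", 2 * nu0 * T, 2 * tau0 * F)   # equal under the matching rule
```

One way to read the rule: $T/F = \tau_0/\nu_0$ is equivalent to $2\nu_0T = 2\tau_0F$, i.e., it uses the same fraction of the Nyquist limit in time and in frequency.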
To summarize, in this section we constructed an approximation $\tilde{\mathbb{H}}$ of the random linear operator $\mathbb{H}$ on the basis of the underspread property. The kernel of the approximating operator is synthesized from the Weyl-Heisenberg set $\{g_{k,n}(t)\}$ as in (9), so that $\{g_{k,n}(t)\}$ is an orthonormal basis for the input space and the range space of $\tilde{\mathbb{H}}$. The decomposition of the input signal (10) can be interpreted as PS-OFDM: this interpretation sheds light on one of the errors resulting from the approximation (9). Finally, an important open problem is the characterization of the difference between the input spaces of $\mathbb{H}$ and $\tilde{\mathbb{H}}$, and between the range spaces of $\mathbb{H}$ and $\tilde{\mathbb{H}}$.
II-C Linear Time-Invariant and Linear Frequency-Invariant Channels
The properties of LTV underspread channels we listed in Section II-B are similar to the properties of LTI and linear frequency-invariant (LFI) channels: both LTI and LFI channel operators are normal and have a well-structured set of deterministic eigenfunctions (sinusoids parametrized by frequency for LTI channels, and Dirac functions parametrized by time for LFI channels), with corresponding eigenvalues equal to the samples of a channel system function (e.g., the transfer function in the LTI case). Intuitively, LTI and LFI channels are limiting cases within the class of LTV channels analyzed in this section; in fact, an LTV channel reduces to an LTI channel when $\nu_0 = 0$, and to an LFI channel when $\tau_0 = 0$. Both LTI and LFI channels are then underspread, according to our definition. Yet, since LTI and LFI channel operators are not of Hilbert-Schmidt type [59, App. A], the kernel diagonalization presented in Section II-B does not apply to these two classes of channels; consequently, the capacity bounds we derive in Sections III and IV do not reduce to capacity bounds for the LTI or the LFI case when $\nu_0 = 0$ or $\tau_0 = 0$, respectively.⁶

⁶For deterministic LTI channels, a channel discretization that is useful for information-theoretic analysis is discussed in [13, Sec. 8.5].
Quasi-LTI channels, i.e., channels that are slowly time varying ($\nu_0$ small but positive), and quasi-LFI channels, i.e., channels that are slowly frequency varying ($\tau_0$ small but positive), can instead be approximately diagonalized as described in Section II-B, as long as they are underspread.
II-D Discrete-Time Discrete-Frequency Input-Output Relation
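The discrete-time discrete-frequency channel coefficients $h[k,n] = L_{\mathbb{H}}(kT,nF)$ constitute a two-dimensional discrete-parameter stationary random process that is JPG with zero mean and correlation function

$$ R_h[k,n] \triangleq \mathbb{E}\big[h[k'+k,n'+n]\,h^*[k',n']\big] = R_{\mathbb{H}}(kT,nF). $$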
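The two-dimensional power spectral density of $h[k,n]$ is defined as

$$ c(\theta,\varphi) \triangleq \sum_{k=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} R_h[k,n]\,e^{-j2\pi(k\theta - n\varphi)}, \qquad |\theta|,|\varphi|\le 1/2. $$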
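We shall often need the following expression for $c(\theta,\varphi)$ in terms of the scattering function $C_{\mathbb{H}}(\nu,\tau)$:

$$ c(\theta,\varphi) \overset{(a)}{=} \iint C_{\mathbb{H}}(\nu,\tau) \sum_{k}e^{j2\pi k(\nu T-\theta)}\sum_{n}e^{-j2\pi n(\tau F-\varphi)}\,d\tau\,d\nu \overset{(b)}{=} \frac{1}{TF}\sum_{m=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} C_{\mathbb{H}}\Big(\frac{\theta-m}{T},\frac{\varphi-l}{F}\Big) $$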
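where (a) follows from the Fourier transform relation between $R_{\mathbb{H}}(\Delta t,\Delta f)$ and $C_{\mathbb{H}}(\nu,\tau)$, and (b) results from Poisson’s summation formula. The variance of each channel coefficient is given by

$$ \mathbb{E}\big[|h[k,n]|^2\big] = \int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} c(\theta,\varphi)\,d\theta\,d\varphi \overset{(a)}{=} \frac{1}{TF}\int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2}\sum_{m}\sum_{l} C_{\mathbb{H}}\Big(\frac{\theta-m}{T},\frac{\varphi-l}{F}\Big)\,d\theta\,d\varphi \overset{(b)}{=} \frac{1}{TF}\int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} C_{\mathbb{H}}\Big(\frac{\theta}{T},\frac{\varphi}{F}\Big)\,d\theta\,d\varphi \overset{(c)}{=} \iint C_{\mathbb{H}}(\nu,\tau)\,d\tau\,d\nu $$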
where (a) follows from the expression for $c(\theta,\varphi)$ in terms of the scattering function, and (b) results because we chose the grid parameters to satisfy the Nyquist conditions $T\le 1/(2\nu_0)$ and $F\le 1/(2\tau_0)$, so that periodic repetitions of the compactly supported scattering function lie outside the integration region. Finally, (c) follows from the change of variables $\nu = \theta/T$ and $\tau = \varphi/F$. For ease of notation, we normalize $\iint C_{\mathbb{H}}(\nu,\tau)\,d\tau\,d\nu = 1$ throughout the paper.
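The derivation can be checked numerically. The sketch below builds $c(\theta,\varphi)$ by periodizing a sampled scattering function as in the Poisson-summation expression, integrates it over $[-1/2,1/2]^2$ with a midpoint rule, and confirms that, under the Nyquist conditions, only the $(m,l)=(0,0)$ term contributes inside the integration region. The scattering function (a separable parabolic shape with unit total power), the grid values, and the truncation of the periodization to $|m|,|l|\le 2$ are assumptions made for illustration.

```python
import numpy as np

NU0, TAU0 = 200.0, 5e-6      # hypothetical support half-widths [Hz], [s]
T, F = 1e-3, 2e4             # hypothetical grid with T <= 1/(2 NU0) and F <= 1/(2 TAU0)

def scattering(nu, tau):
    """Hypothetical scattering function with unit total power on the support rectangle."""
    inside = (np.abs(nu) <= NU0) & (np.abs(tau) <= TAU0)
    shape = (1 - (nu / NU0) ** 2) * (1 - (tau / TAU0) ** 2)
    return np.where(inside, 9.0 / (16.0 * NU0 * TAU0) * shape, 0.0)

def spectral_density(theta, phi, reps=2):
    """c(theta, phi) = (1/TF) sum_{m,l} C((theta - m)/T, (phi - l)/F), truncated to |m|, |l| <= reps."""
    total = np.zeros(np.broadcast(theta, phi).shape)
    for m in range(-reps, reps + 1):
        for l in range(-reps, reps + 1):
            total += scattering((theta - m) / T, (phi - l) / F)
    return total / (T * F)

G = 400
grid = (np.arange(G) + 0.5) / G - 0.5               # midpoints on [-1/2, 1/2)
theta, phi = np.meshgrid(grid, grid, indexing="ij")
c = spectral_density(theta, phi)
c0 = scattering(theta / T, phi / F) / (T * F)       # only the (m, l) = (0, 0) term

variance = c.mean()                                 # midpoint rule for the integral over [-1/2, 1/2]^2
print(variance, np.max(np.abs(c - c0)))             # close to 1, and exactly 0 under Nyquist
```

If the grid violated the Nyquist conditions, neighboring repetitions would overlap the central one inside $[-1/2,1/2]^2$ and step (b) would no longer drop the sum.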
For each time slot $k$, we arrange the discretized input signal $x[k,n]$, the discretized output signal $y[k,n]$, the channel coefficients $h[k,n]$, and the noise samples $w[k,n]$ in corresponding vectors. For example, the $N$-dimensional vector that contains the input symbols in the $k$th time slot is defined as

$$ \mathbf{x}[k] \triangleq \big[x[k,0]\;\; x[k,1]\;\; \cdots\;\; x[k,N-1]\big]^T. $$
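The output vector $\mathbf{y}[k]$, the channel vector $\mathbf{h}[k]$, and the noise vector $\mathbf{w}[k]$ are defined analogously. This notation allows us to rewrite the input-output relation (11) as

$$ \mathbf{y}[k] = \mathbf{h}[k]\odot\mathbf{x}[k] + \mathbf{w}[k] $$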
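for all $k$. In this formulation, the channel is a multivariate stationary process $\{\mathbf{h}[k]\}$ with matrix-valued correlation function

$$ \mathbf{R}_{\mathbf{h}}[k] \triangleq \mathbb{E}\big[\mathbf{h}[k'+k]\,\mathbf{h}^H[k']\big]. $$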
In most of the following analyses, we initially consider a finite number $K$ of time slots and then take the limit $K\to\infty$. To obtain a compact notation, we stack $K$ contiguous elements of the multivariate input, channel, and output processes just defined. For the channel input, this results in the $KN$-dimensional vector

$$ \mathbf{x} \triangleq \big[\mathbf{x}^T[0]\;\; \mathbf{x}^T[1]\;\; \cdots\;\; \mathbf{x}^T[K-1]\big]^T. $$
Again, the stacked vectors $\mathbf{y}$, $\mathbf{h}$, and $\mathbf{w}$ are defined analogously. With these definitions, we can now compactly express the input-output relation (11) as

$$ \mathbf{y} = \mathbf{h}\odot\mathbf{x} + \mathbf{w}. $$
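The stacked relation is easy to simulate. A minimal sketch, assuming a hypothetical separable correlation model (exponential correlation $a^{|i-j|}$ in time and in frequency, combined with a Kronecker product, which yields a two-level Toeplitz matrix with unit diagonal), a placeholder all-ones input, and unit noise variance: the channel vector is drawn as a zero-mean proper complex Gaussian vector through a Cholesky factor of its correlation matrix, and the output is formed with an element-wise (Hadamard) product.

```python
import numpy as np

rng = np.random.default_rng(0)

def proper_gaussian(cov, rng):
    """One zero-mean proper (circularly symmetric) complex Gaussian vector with covariance cov."""
    L = np.linalg.cholesky(cov)
    n = cov.shape[0]
    z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)  # E[z z^H] = I
    return L @ z

def exp_corr(n, a):
    """Toeplitz matrix with entries a^|i - j| (hypothetical correlation model)."""
    idx = np.arange(n)
    return a ** np.abs(np.subtract.outer(idx, idx))

K, N = 4, 8                                           # time slots, frequency slots
R_h = np.kron(exp_corr(K, 0.95), exp_corr(N, 0.7))    # index of (k, n) is k*N + n
x = np.ones(K * N, dtype=complex)                     # placeholder constant-modulus input
h = proper_gaussian(R_h, rng)
w = proper_gaussian(np.eye(K * N), rng)               # unit-variance i.i.d. noise
y = h * x + w                                         # stacked relation y = h (Hadamard) x + w
```

The Kronecker ordering matches the stacking of $\mathbf{x}[0],\dots,\mathbf{x}[K-1]$, so the frequency index varies fastest within the stacked vector.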
Because the channel process is stationary in time and in frequency, the correlation matrix of the stacked channel vector is a two-level Hermitian Toeplitz matrix, i.e., a Hermitian block-Toeplitz matrix whose blocks are themselves Toeplitz.
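The two-level Toeplitz structure can be verified numerically. The sketch below assumes, for illustration only, a brick-shaped scattering function with 2ν0T = 2τ0F = 1/2, for which the correlation function of the channel coefficients factors as sinc(m/2)·sinc(k/2) (NumPy's normalized sinc); the stacked vector is ordered time slot by time slot. The helper names are hypothetical.

```python
import numpy as np

def corr_brick(dm, dk, a=0.5, b=0.5):
    """Hypothetical channel correlation R_h[dm, dk] for a brick-shaped
    scattering function; a = 2*nu0*T and b = 2*tau0*F (assumed values)."""
    return np.sinc(a * dm) * np.sinc(b * dk)

def two_level_toeplitz(corr, M, K):
    """Correlation matrix of the stacked channel vector (index m*K + k)."""
    m_idx, k_idx = np.divmod(np.arange(M * K), K)
    dm = m_idx[:, None] - m_idx[None, :]  # time-index differences
    dk = k_idx[:, None] - k_idx[None, :]  # frequency-index differences
    return corr(dm, dk)

M, K = 4, 6
R = two_level_toeplitz(corr_brick, M, K)

def block(i, j):
    """K x K block of R that couples time slots i and j."""
    return R[i * K:(i + 1) * K, j * K:(j + 1) * K]
```

Each block depends only on the difference of the time-slot indices, and within each block the entries depend only on the difference of the frequency-slot indices.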
II-E Power Constraints
Throughout the paper, we assume that the average power of the transmitted signal is upper-bounded by a given value. In addition, we limit the peak power to be no larger than a fixed multiple of the average power; this multiple is the nominal peak-to-average-power ratio (PAPR).
The multivariate input-output relation \frefeq:vec-io allows us to constrain the peak power in several different ways. We analyze the following two cases:
Peak constraint in time: The power of the transmitted signal in each time slot is limited to the peak value set by the PAPR.
This constraint models the fact that physically realizable power amplifiers can only provide limited output power.
Peak constraint in time and frequency: Regulatory bodies sometimes limit the peak power in certain frequency bands, e.g., for UWB systems. We model this type of constraint by imposing a limit on the squared magnitude of the transmitted symbol in each time-frequency slot.
This type of constraint is more stringent than the peak constraint in time given in \frefeq:peak-per-tslot.
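Why the constraint in time and frequency is the more stringent one can be illustrated with a short check. The sketch assumes, hypothetically, a per-time-slot energy limit A and a per-symbol limit A/K for K frequency slots; the exact normalizations of \frefeq:peak-per-tslot and \frefeq:peak-per-tfslot may differ. Under this assumption, an input that meets the per-symbol limit also meets the per-time-slot limit, whereas an input that concentrates the energy of a time slot in a single frequency slot meets only the latter.

```python
import numpy as np

TOL = 1e-12  # relative slack for floating-point rounding

def meets_peak_in_time(x, A):
    """Hypothetical per-time-slot constraint: energy of each row of x <= A."""
    return bool(np.all(np.sum(np.abs(x) ** 2, axis=1) <= A * (1 + TOL)))

def meets_peak_in_time_frequency(x, A):
    """Hypothetical per-symbol constraint: |x[m, k]|^2 <= A / K."""
    K = x.shape[1]
    return bool(np.all(np.abs(x) ** 2 <= (A / K) * (1 + TOL)))

A, M, K = 1.0, 5, 8
rng = np.random.default_rng(1)

# Constant-modulus symbols at the per-symbol limit: both constraints hold
x_flat = np.sqrt(A / K) * np.exp(2j * np.pi * rng.random((M, K)))

# All energy of each time slot in one frequency slot: peaky in frequency
x_peaky = np.zeros((M, K), dtype=complex)
x_peaky[:, 0] = np.sqrt(A)

print(meets_peak_in_time(x_flat, A), meets_peak_in_time_frequency(x_flat, A))  # True True
print(meets_peak_in_time(x_peaky, A), meets_peak_in_time_frequency(x_peaky, A))  # True False
```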
Both peak constraints above are imposed on the input symbols, i.e., in the eigenspace of the approximating channel operator. This limitation is mathematically convenient; however, the peak value of the corresponding transmitted continuous-time signal in \frefeq:canonical-input also depends on the prototype signal, so that a limit on the input symbols does not, in general, imply that the continuous-time transmitted signal is peak limited.
III Capacity Bounds under a Peak Constraint in Time and Frequency
In this section, we analyze the capacity of the discretized channel in \frefeq:scalar-io subject to the peak constraint in time and frequency specified by \frefeq:peak-per-tfslot. The link between the discretized channel \frefeq:scalar-io and the continuous-time channel model established in \frefsec:model then allows us to express the resulting bounds in terms of the scattering function of the underspread WSSUS channel.
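The capacity for a given bandwidth is defined as the supremum of the mutual information between the stacked input and output vectors, normalized by the duration of the transmission, in the limit of infinitely many time slots. Here, the supremum is taken over the set of all input distributions that satisfy the peak constraint \frefeq:peak-per-tfslot and the average-power constraint.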
The capacity of fading channels with finite bandwidth has so far resisted all attempts at closed-form solutions [62, 22, 63], even for the memoryless case; thus, we resort to bounds to characterize the capacity \frefeq:capacityPeakTF. In particular, we present the following bounds:
An upper bound, which we refer to as the coherent upper bound, based on the assumption that the receiver has perfect knowledge of the channel realizations. This bound is standard; it turns out to be useful for small bandwidth.
An upper bound that is useful for medium to large bandwidth. This bound is explicit in the channel’s scattering function and extends the upper bound [28, Prop. 2.2] on the capacity of frequency-flat time-selective channels to general underspread channels that are selective in time and frequency.
A lower bound that extends the lower bound [27, Prop. 2.2] to general underspread channels that are selective in time and frequency. This bound is explicit in the channel’s scattering function only for large bandwidth.
III-A Coherent Upper Bound
The assumption that the receiver perfectly knows the instantaneous channel realizations furnishes a capacity upper bound, which we derive in four steps.
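First, the coherent mutual information, i.e., the mutual information between channel input and output conditioned on the channel realization, is an upper bound on the corresponding mutual information in the noncoherent setting. Second, we drop the peak constraint and thus enlarge the set of admissible input distributions; the supremum of the coherent mutual information over the resulting relaxed input constraint is achieved by a zero-mean JPG input vector whose covariance matrix satisfies the average-power constraint. Third, we use that, conditioned on the channel, the output vector is JPG, and its covariance matrix equals the noise covariance matrix plus the Hadamard product of the input covariance matrix with the outer product of the channel vector and its conjugate transpose.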
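This expression follows from an elementary relation between Hadamard products and outer products: multiplying a matrix from the left by a diagonal matrix and from the right by the Hermitian transpose of another diagonal matrix is equivalent to taking the Hadamard product of the matrix with the outer product of the two diagonals.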
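In matrix form, the relation reads diag(a) B diag(b)^H = B ⊙ (a b^H) for vectors a, b and a square matrix B; this explicit formula is our rendering of the relation described above. The snippet below simply verifies it for random complex data. With a = b equal to the channel vector and B equal to the input covariance matrix, it turns the conditional covariance of the signal part of the output into a Hadamard product.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

def crandn(*shape):
    """Random complex matrix with i.i.d. Gaussian real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

a, b = crandn(n), crandn(n)
B = crandn(n, n)

lhs = np.diag(a) @ B @ np.diag(b).conj().T  # diag(a) B diag(b)^H
rhs = B * np.outer(a, b.conj())             # B ⊙ (a b^H): elementwise product
```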
Finally, we apply Hadamard's inequality, note that, by Jensen's inequality, the supremum is achieved by a scaled identity input covariance matrix, and use that all channel coefficients have the same distribution. As the resulting upper bound \frefeq:coh-ub-deriv does not depend on the number of time slots, we obtain an upper bound on the capacity \frefeq:capacityPeakTF as a function of bandwidth by relating the number of frequency slots to the bandwidth.
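To see how the coherent upper bound behaves as the bandwidth grows, the sketch below evaluates it under normalizations that are assumptions and are not taken from \frefeq:coh-ub-deriv: Rayleigh fading with unit-variance coefficients, unit noise variance per time-frequency slot, D = W/(TF) slots per second sharing the power P equally, and rates in nats per second, so that the bound reads D·E[ln(1 + (P/D)|h|²)] with |h|² exponentially distributed. The expectation is computed with Gauss-Laguerre quadrature (numpy.polynomial.laguerre.laggauss). Under these assumptions the bound increases monotonically with D and approaches P from below.

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss

# Gauss-Laguerre rule: integral_0^inf f(x) e^{-x} dx ~ sum(weights * f(nodes)),
# i.e., an expectation over |h|^2 ~ Exp(1) (Rayleigh fading, unit variance).
nodes, weights = laggauss(80)

def coherent_bound(D, P):
    """Hypothetical normalization: D * E[ln(1 + (P/D)|h|^2)] in nats/s."""
    return D * np.sum(weights * np.log1p((P / D) * nodes))

P = 1.0
dofs = np.array([1.0, 10.0, 100.0, 1e3, 1e4])  # D = W/(TF), slots per second
values = np.array([coherent_bound(D, P) for D in dofs])
print(values)  # increasing in D, bounded by P
```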
For a discretization of the WSSUS channel different from the one in \frefsec:underspread, Médard and Gallager showed that the corresponding capacity vanishes with increasing bandwidth if the peakiness of the input signal is constrained in a way that includes our peak constraint \frefeq:peak-per-tfslot. As the coherent upper bound increases monotonically with bandwidth, it is sensible to conclude that it does not accurately reflect the capacity behavior for large bandwidth. However, we demonstrate in \frefsec:num-eval by means of a numerical example that the coherent upper bound can be quite useful for small and medium bandwidth.
III-B An Upper Bound for Large but Finite Bandwidth
To better understand the capacity behavior at large bandwidth, we derive an upper bound that captures the diminishing capacity in the large-bandwidth regime. The upper bound is explicit in the channel's scattering function.
III-B1 The upper bound
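Consider an underspread Rayleigh-fading channel with a given scattering function, and assume that the channel input satisfies the average-power constraint and, w.p.1, the peak constraint in time and frequency. Then the capacity of this channel is upper-bounded by a function of bandwidth that depends on the channel only through its scattering function.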
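To bound the capacity, we first use the chain rule for mutual information: the mutual information between input and output equals the mutual information between the output and the pair consisting of input and channel, minus the mutual information between output and channel conditioned on the input. Next, we split the supremum over the input distributions into two parts, similarly to the proof of [28, Prop. 2.2]: an inner supremum over a restricted set of input distributions that satisfy the peak constraint (24) and have a prescribed average power, determined by a fixed parameter, and an outer supremum over this parameter. Together, these two steps yield an upper bound given by the supremum, over the parameter, of the difference between the largest value of the first term and the smallest value of the second term over the restricted set.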
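Written out under assumed notation, the two steps take the following form. Here x, h, and y denote the stacked input, channel, and output vectors, and 𝒫_α is a hypothetical label for the restricted set of input distributions with average-power parameter α; the normalization by the transmission duration and the limit over the number of time slots are omitted for brevity.

```latex
% Assumed notation: x, h, y stacked input, channel, output vectors;
% \mathcal{P}_\alpha: input distributions that satisfy the peak constraint (24)
% and have the average power prescribed by the parameter \alpha.
\begin{align}
I(\mathbf{y};\mathbf{x})
  &= I(\mathbf{y};\mathbf{x},\mathbf{h}) - I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x})
     \qquad \text{(chain rule)} \\
\sup_{P_{\mathbf{x}}} I(\mathbf{y};\mathbf{x})
  &= \sup_{\alpha}\, \sup_{P_{\mathbf{x}}\in\mathcal{P}_\alpha}
     \big[ I(\mathbf{y};\mathbf{x},\mathbf{h}) - I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x}) \big] \\
  &\le \sup_{\alpha} \Big\{ \sup_{P_{\mathbf{x}}\in\mathcal{P}_\alpha} I(\mathbf{y};\mathbf{x},\mathbf{h})
     - \inf_{P_{\mathbf{x}}\in\mathcal{P}_\alpha} I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x}) \Big\}
\end{align}
```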
Next, we bound the two terms individually. Standard steps suffice for the first term, whereas the second term requires more effort; we relegate some of the more technical steps to \frefapp:mmse-mi.
Upper bound on the first term
The output vector depends on the input vector and the channel only through their elementwise product, so that the first term equals the mutual information between the output vector and this product. To upper-bound this mutual information, we take the product as JPG with zero mean and the same covariance matrix; an upper bound on the first term in \frefeq:ubTFpeak-step1 then results if we also drop the peak constraint on the input. The resulting log-determinant expression is bounded first by Hadamard's inequality and then by Jensen's inequality.
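Both inequalities can be checked numerically on random data that is not tied to the channel model. The snippet verifies Hadamard's inequality, log det(A) ≤ Σ log(A_ii) for a Hermitian positive definite matrix A, and Jensen's inequality for the concave logarithm in the form mean(ln(1 + a_i)) ≤ ln(1 + mean(a_i)). These are the two steps that bound a log-determinant first by the logarithms of its diagonal entries and then by a single equal-power term.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = np.eye(n) + G @ G.conj().T  # Hermitian positive definite, like I + covariance

# Hadamard's inequality: log det(A) <= sum of log(A_ii)
sign, logdet = np.linalg.slogdet(A)
log_diag = np.sum(np.log(np.real(np.diag(A))))

# Jensen's inequality for the concave log: mean of log(1 + a_i) <= log(1 + mean a_i)
a = rng.exponential(size=n)
jensen_lhs = np.mean(np.log1p(a))
jensen_rhs = np.log1p(np.mean(a))
```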
Lower bound on the second term
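Because the channel is JPG, the second term equals the expectation, over the input distribution, of a log-determinant that involves the channel correlation matrix and the input vector. Next, we write this expectation as an integral over the input distribution whose integrand is the product of two factors.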
The integration domain is the set of input vectors that satisfy the peak constraint (24), because the input distribution satisfies this constraint. Both factors under the integral are nonnegative; hence, we obtain a lower bound on the expectation if we replace the first factor by its infimum over this set.
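The structure of this infimum can be explored numerically. Assuming, for illustration, that the first factor is the ratio of log det(I + diag(x) R diag(x)^H) to the squared norm of x (unit noise variance), it depends on x only through the powers p_i = |x_i|² and equals log det(I + R diag(p)) / Σ p_i, the ratio of a concave function to a linear one. The sketch uses a small hypothetical correlation matrix R with unit diagonal and a hypothetical peak value A, and checks that no randomly drawn admissible power vector attains a smaller ratio than the best vertex of the box [0, A]^n, which is consistent with the infimum being attained on the boundary of the admissible set.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n, A = 4, 2.0  # hypothetical dimension and peak value
G = rng.standard_normal((n, n))
R = G @ G.T
d = np.sqrt(np.diag(R))
R = R / np.outer(d, d)  # correlation matrix with unit diagonal

def ratio(p):
    """log det(I + R diag(p)) / sum(p); depends on x only via p_i = |x_i|^2."""
    _, logdet = np.linalg.slogdet(np.eye(n) + R @ np.diag(p))
    return logdet / np.sum(p)

# All nonzero vertices of the box [0, A]^n
vertices = [A * np.array(v, dtype=float)
            for v in itertools.product([0, 1], repeat=n) if any(v)]
best_vertex = min(ratio(v) for v in vertices)

# Smallest ratio over random admissible power vectors inside the box
interior = min(ratio(A * rng.random(n)) for _ in range(2000))
```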
As the channel correlation matrix is positive semidefinite, the above infimum is achieved on the boundary of the admissible set [26, Sec. VI.A]. We use this fact and the relation between mutual information and minimum mean-square error (MMSE), recently discovered by Guo et al., to further lower-bound the infimum on the right-hand side of \frefeq:ubTFpeak-term2-inf. The corresponding derivation is detailed in \frefapp:mmse-mi; it results in a lower bound expressed in terms of the two-dimensional power spectral density of the channel process.
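For a Gaussian channel, the relation of Guo et al. states that the derivative of the mutual information (in nats) with respect to the SNR equals the MMSE. For a JPG vector s with covariance matrix R observed as y = √snr·s + n in unit-variance JPG noise, I(snr) = log det(I + snr R), and the MMSE of estimating s is tr(R (I + snr R)^{-1}). The sketch below checks the relation with a central finite difference; the covariance matrix and the SNR value are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = G @ G.conj().T / n  # covariance matrix of the JPG signal (arbitrary)

def mutual_info(snr):
    """I(snr) = log det(I + snr R) in nats for y = sqrt(snr) s + n."""
    _, logdet = np.linalg.slogdet(np.eye(n) + snr * R)
    return logdet

def mmse(snr):
    """Trace of the error covariance (R^-1 + snr I)^-1 = R (I + snr R)^-1."""
    return np.real(np.trace(R @ np.linalg.inv(np.eye(n) + snr * R)))

snr, eps = 0.7, 1e-6
derivative = (mutual_info(snr + eps) - mutual_info(snr - eps)) / (2 * eps)
```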
The power spectral density in this bound is the one defined in \frefeq:chspecfun. Finally, we use the bound \frefeq:ubTFpeak-term2-immse-lb in \frefeq:ubTFpeak-term2-inf and relate the power spectral density to the scattering function by means of \frefeq:specfun-scafun, which yields a lower bound on the second term that is explicit in the scattering function.
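A standard mechanism by which a power spectral density arises from log-determinants of Toeplitz correlation matrices is Szegő's theorem: for a stationary sequence with correlation r[k] and spectral density c(θ), (1/n) log det(I + A R_n) tends to the integral of log(1 + A c(θ)) over one period. The sketch below illustrates the one-dimensional version for the hypothetical correlation r[k] = sinc(k/2), whose spectral density equals 2 for |θ| < 1/4 and 0 otherwise, so that the limit is (1/2) ln(1 + 2A); the two-level Toeplitz case is analogous. This is an illustration under these assumptions, not a reproduction of the derivation in \frefapp:mmse-mi.

```python
import numpy as np

def per_dimension_logdet(n, A):
    """(1/n) log det(I + A R_n) for the Toeplitz matrix R_n[i, j] = sinc((i - j)/2)."""
    idx = np.arange(n)
    R = np.sinc(np.subtract.outer(idx, idx) / 2.0)
    _, logdet = np.linalg.slogdet(np.eye(n) + A * R)
    return logdet / n

A = 1.0
limit = 0.5 * np.log1p(2.0 * A)  # integral of log(1 + A c(theta)) over one period
approx = [per_dimension_logdet(n, A) for n in (100, 400, 1600)]
print(limit, approx)  # approx decreases toward limit as n grows
```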