Noncoherent Capacity of Underspread Fading Channels
Abstract
We derive bounds on the noncoherent capacity of wide-sense stationary uncorrelated scattering (WSSUS) channels that are selective both in time and frequency, and are underspread, i.e., the product of the channel’s delay spread and Doppler spread is small. For input signals that are peak constrained in time and frequency, we obtain upper and lower bounds on capacity that are explicit in the channel’s scattering function, are accurate for a large range of bandwidth, and allow us to coarsely identify the capacity-optimal bandwidth as a function of the peak power and the channel’s scattering function. We also obtain a closed-form expression for the first-order Taylor series expansion of capacity in the limit of large bandwidth, and show that our bounds are tight in the wideband regime. For input signals that are peak constrained in time only (and, hence, allowed to be peaky in frequency), we provide upper and lower bounds on the infinite-bandwidth capacity and find cases when the bounds coincide and the infinite-bandwidth capacity is characterized exactly. Our lower bound is closely related to a result by Viterbi (1967).
The analysis in this paper is based on a discrete-time discrete-frequency approximation of WSSUS time- and frequency-selective channels. This discretization explicitly takes into account the underspread property, which is satisfied by virtually all wireless communication channels.
I Introduction and Outline
I-1 Models for fading channels
Channel capacity is a benchmark for the design of any communication system. The techniques used to compute, or at least to bound, channel capacity often provide guidelines for the design of practical systems, e.g., how to best utilize the resources bandwidth and power, and how to design efficient modulation and coding schemes [1, Sec. III.3]. Our goal in this paper is to analyze the capacity of wireless communication channels that are of direct practical importance. We believe that an accurate stochastic model for such channels should take the following aspects into account:

The channel is selective in time and frequency, i.e., it exhibits memory in frequency and in time, respectively.

Neither the transmitter nor the receiver knows the instantaneous realization of the channel.

The peak power of the input signal is limited.
These aspects are important because they arise from practical limitations of real-world communication systems: temporal variations of the environment and multipath propagation are responsible for channel selectivity in time and frequency, respectively [2, 3]; perfect channel knowledge at the receiver is impossible to obtain because channel state information needs to be extracted from the received signal; finally, realizable transmitters are always limited in their peak output power [4]. The above aspects are also fundamental as they significantly impact the behavior of channel capacity: for example, the capacity of a block-fading channel behaves differently from the capacity of a channel that is stationary in time [5]; channel capacity with perfect channel knowledge at the receiver is always larger than the capacity without channel knowledge [6], and the signaling schemes necessary to achieve capacity are also very different in the two cases [1]; finally, a peak constraint on the transmit signal can lead to vanishing capacity in the large-bandwidth limit [7, 8, 9], while without a peak constraint the infinite-bandwidth AWGN capacity can be attained asymptotically [7, 10, 11, 12, 13, 14, 15].
Small-scale fading of wireless channels can be sensibly modeled as a stochastic Gaussian linear time-varying (LTV) system [2]; in particular, we base our developments on the widely used wide-sense stationary uncorrelated scattering (WSSUS) model for random LTV channels [16, 12]. Like most models for real-world channels, the WSSUS model is time continuous; however, almost all tools for information-theoretic analysis of noisy channels require a discretized representation of the channel’s input-output relation. Several approaches to discretize random LTV channels are proposed in the literature, e.g., sampling [8, 16, 17] or basis expansion [18, 19]; all these discretized models incur an approximation error with respect to the continuous-time WSSUS model that is often difficult to quantify. As virtually all wireless channels of practical interest are underspread, i.e., the product of maximum delay and maximum Doppler shift is small, we build our information-theoretic analysis upon a discretization of LTV channels, proposed by Kozek [20], that explicitly takes into account the underspread property to minimize the approximation error in the mean-square sense.
I-2 Capacity of noncoherent WSSUS channels
Throughout the paper, we assume that both the transmitter and the receiver know the channel law but are ignorant of the channel realization, a setting often called noncoherent. (Footnote 1: Knowledge of the channel law implies that the codebook and the decoding strategy can be optimized accordingly [21].) In the following, we refer to channel capacity in the noncoherent setting simply as “capacity”. In contrast, in the coherent setting the receiver is also assumed to know the channel realization perfectly; the corresponding capacity is termed coherent capacity.
A general closed-form expression for the capacity of Rayleigh-fading channels is not known, even if the channel is memoryless [22]. However, several asymptotic results are available. If only a constraint on the average transmitted power is imposed, the AWGN capacity can be achieved in the infinite-bandwidth limit also in the presence of fading. This result is quite robust, as it holds for a wide variety of channel models [7, 10, 11, 12, 13, 14, 15]. Verdú showed that flash signaling, which implies unbounded peak power of the input signal, is necessary and sufficient to achieve the infinite-bandwidth AWGN capacity on block-memoryless fading channels [14]; a form of flash signaling is also infinite-bandwidth optimal for the more general time- and frequency-selective channel model used in the present paper [15]. In contrast, if the peakiness of the input signal is restricted, the infinite-bandwidth capacity behavior of most fading channels changes drastically, and the limit depends on the type of peak constraint imposed [7, 8, 9, 13, 23]. In this paper, we shall distinguish between a peak constraint in time and a peak constraint in time and frequency.
Peak constraint in time
No closed-form capacity expression, not even in the infinite-bandwidth limit, seems to exist to date for time- and frequency-selective WSSUS channels. Viterbi’s analysis [23] provides a result that can be interpreted as a lower bound on the infinite-bandwidth capacity of time- and frequency-selective channels. This lower bound is in the form of the infinite-bandwidth AWGN capacity minus a penalty term that depends on the channel’s power-Doppler profile [16]. For channels that are time selective but frequency flat, structurally similar expressions were found for the infinite-bandwidth capacity [24, 25] and for the capacity per unit energy [26].
Peak constraint in time and frequency
Although a closed-form capacity expression valid for all bandwidths is not available, it is known that the infinite-bandwidth capacity is zero for various channel models [7, 8, 9]. This asymptotic capacity behavior implies that signaling schemes that spread the transmit energy uniformly across time and frequency perform poorly in the large-bandwidth regime. Even more useful for performance assessment would be capacity bounds for finite bandwidth. For frequency-flat time-selective channels, such bounds can be found in [27, 28], while for the more general time- and frequency-selective case treated in the present paper, upper bounds seem to exist only on the rates achievable with particular signaling schemes, namely for orthogonal frequency-division multiplexing (OFDM) with constant-modulus symbols [29], and for multiple-input multiple-output (MIMO) OFDM with unitary space-frequency codes over frequency-selective block-fading channels [30].
I-3 Contributions
We use the discrete-time discrete-frequency approximation of continuous-time underspread WSSUS channels proposed in [20] to obtain the following results:

We derive upper and lower bounds on capacity under a constraint on the average power and under a peak constraint in both time and frequency. These bounds are valid for any bandwidth, are explicit in the channel’s scattering function, and generalize the results on achievable rates in [29]. In particular, our bounds allow us to coarsely identify the capacity-optimal bandwidth for a given peak constraint and a given scattering function.

Under the same peak constraint in time and frequency, we find the first-order Taylor series expansion of channel capacity in the limit of infinite bandwidth. This result extends the asymptotic capacity analysis for frequency-flat time-selective channels in [28] to channels that are selective in both time and frequency.

In the infinite-bandwidth limit and for transmit signals that are peak constrained in time only, we recover Viterbi’s capacity lower bound [23]. In addition, we derive an upper bound that is shown to coincide with the lower bound for a specific class of channels; hence, the infinite-bandwidth capacity for this class of channels is established.
The results in this paper rely on several flavors of Szegö’s theorem on the asymptotic eigenvalue distribution of Toeplitz matrices [31, 32]; in particular, we use various extensions of Szegö’s theorem to two-level Toeplitz matrices, i.e., block-Toeplitz matrices that have Toeplitz blocks [33, 34]. Another key ingredient for several of our proofs is the relation between mutual information and minimum mean-square error (MMSE) discovered recently by Guo et al. [35]. Furthermore, we use a property of the information divergence of orthogonal signaling schemes derived by Butman and Klass [36].
I-4 Notation
Uppercase boldface letters denote matrices and lowercase boldface letters designate vectors. The superscripts , , and stand for transposition, element-wise conjugation, and Hermitian transposition, respectively. For two matrices and of appropriate dimensions, the Hadamard product is denoted as . We designate the identity matrix of dimension as and the all-zero vector of appropriate dimension as . We let denote a diagonal square matrix whose main diagonal contains the elements of the vector . The determinant, trace, and rank of the matrix are denoted as , , and , respectively, and is the th eigenvalue of a square matrix . The function is the Dirac distribution, and is defined as and for all . All logarithms are to the base . The real part of the complex number is denoted . We write for the set difference between the sets and . For two functions and , the notation for means that . With we denote the largest integer smaller than or equal to . A signal is an element of the Hilbert space of square integrable functions. The inner product between two signals and is denoted as . For a random variable (RV) with distribution , we write . We denote expectation by , and use the notation to stress that the expectation is taken with respect to the RV . We write for the Kullback-Leibler (KL) divergence between the two distributions and . Finally, stands for the distribution of a jointly proper Gaussian (JPG) random vector with mean and covariance matrix .
II Channel and System Model
A channel model needs to strike a balance between generality, accuracy, engineering relevance, and mathematical tractability. In the following, we start from the classical WSSUS model for LTV channels [16, 12] because it is a fairly general, yet accurate and mathematically tractable model that is widely used. This model has a continuous-time input-output relation, which is difficult to use as a basis for information-theoretic studies. However, if the channel is underspread it is possible to closely approximate the original WSSUS input-output relation by a discretized input-output relation that is especially suited for the derivation of capacity bounds. In particular, the bounds we derive in this paper can be directly related to the underlying continuous-time WSSUS channel as they are explicit in its scattering function.
II-A Time- and Frequency-Selective Underspread Fading Channels
II-A1 The channel operator
A wireless channel can be described as a linear operator that maps an input signal into an output signal , where denotes the range space of [37]. The corresponding noise-free input-output relation is then .
It is sensible to model wireless channels as random, first because a deterministic description of the physical propagation environment is too complex in most cases of practical interest, and second because a stochastic description is much more robust, in the sense that systems designed on the basis of a stochastic channel model can be expected to work in a variety of different propagation environments [3]. Consequently, we assume that is a random operator.
II-A2 System functions
Because communication takes place over a finite bandwidth and a finite time duration, we can assume that each realization of is a Hilbert-Schmidt operator [38, 39]. Hence, the noise-free input-output relation of the LTV channel can be written as [38, p. 1083] (Footnote 2: All integrals are from −∞ to ∞ unless stated otherwise.)
(1) 
where the kernel can be interpreted as the channel response at time to a Dirac impulse at time . Instead of two variables that denote absolute time, it is common in the engineering literature to use absolute time and delay . This leads to the time-varying impulse response and the corresponding noise-free input-output relation [16]
(2) 
Two more system functions that will be important in the following developments are the time-varying transfer function (Footnote 3: As is of Hilbert-Schmidt type, the time-varying impulse response is square integrable, and the Fourier transforms in (3) and (4) are well defined.)
(3) 
and the spreading function
(4) 
In particular, if we rewrite the input-output relation (2) in terms of the spreading function as
(5) 
we obtain an intuitive physical interpretation: the output signal is a weighted superposition of copies of the input signal that are shifted in time by the delay and in frequency by the Doppler shift .
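As a rough illustration of this superposition, the following Python sketch applies a handful of discrete delay-Doppler taps to a sampled input. The function name `apply_spreading`, the tap format, and the sampled-time discretization (delays in samples, Doppler shifts in cycles per sample) are assumptions made for illustration only; they are not the discretization used in the rest of this paper.

```python
import cmath

def apply_spreading(x, taps):
    """Noise-free output of a toy discrete delay-Doppler channel.

    x    : list of complex input samples
    taps : list of (delay, doppler, gain) triples; `delay` is a
           non-negative integer number of samples, `doppler` is in
           cycles per sample, `gain` is a complex weight
           (all hypothetical, illustrative quantities)
    """
    n_out = len(x) + max(delay for delay, _, _ in taps)
    y = [0j] * n_out
    for delay, doppler, gain in taps:
        for t in range(len(x)):
            m = t + delay  # output index of the delayed input sample
            # weighted copy of the input, shifted in time and in frequency
            y[m] += gain * x[t] * cmath.exp(2j * cmath.pi * doppler * m)
    return y

# Example: a direct path plus one delayed, Doppler-shifted echo.
x = [1 + 0j, 2 + 0j, 3 + 0j]
y = apply_spreading(x, [(0, 0.0, 1.0), (2, 0.05, 0.3)])
```

Each tap contributes one delayed and frequency-shifted copy of the input, and the copies add up at the output, which is the discrete counterpart of the interpretation given above.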
II-A3 Stochastic characterization and WSSUS assumption
For mathematical tractability, we need to make additional assumptions on the system functions. First, we assume that is a zero-mean JPG random process in and . Indeed, the Gaussian distribution is empirically supported for narrowband channels [2], and even ultrawideband (UWB) channels with bandwidth up to several gigahertz can be modeled as Gaussian distributed [40]. By virtue of the Gaussian assumption, is completely characterized by its correlation function. Yet, this correlation function is four-dimensional in general and thus difficult to work with. A further simplification is possible if we assume that the channel process is wide-sense stationary in time and uncorrelated in delay , the so-called WSSUS assumption [16]. As a consequence, is wide-sense stationary both in time and frequency , or, equivalently, is uncorrelated in Doppler and delay [16]:
The function is called the channel’s (time-frequency) correlation function, and is called the scattering function of the channel . The two functions are related by a two-dimensional Fourier transform,
(6) 
As is stationary in and , is nonnegative and real-valued for all and , and can be interpreted as the spectrum of the channel process. The power-delay profile of is defined as
and the power-Doppler profile as
The WSSUS assumption is widely used in wireless channel modeling [16, 12, 2, 1, 41, 42]. It is in good agreement with measurements of tropospheric scattering channels [12], and provides a reasonable model for many types of mobile radio channels [43, 44, 45], at least over a limited time duration and bandwidth [16]. Furthermore, the scattering function can be directly estimated from measured data [46, 47], so that capacity expressions and bounds that explicitly depend on the channel’s scattering function can be evaluated for many channels of practical interest.
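Because the scattering function can be estimated from measurements, quantities derived from it are easy to evaluate numerically. The sketch below computes the two profiles from a sampled scattering function by Riemann sums, using the standard definitions (power-delay profile: integral of the scattering function over Doppler; power-Doppler profile: integral over delay). The function name `profiles`, the uniform grid, and the array layout are assumptions for illustration.

```python
def profiles(scattering, d_nu, d_tau):
    """Marginals of a sampled scattering function.

    scattering[i][j] : sample at Doppler index i and delay index j
    d_nu, d_tau      : grid spacings in Doppler and delay
    Returns (power_delay, power_doppler, total_power).
    """
    n_nu = len(scattering)
    n_tau = len(scattering[0])
    # Power-delay profile: integrate over Doppler for each delay.
    power_delay = [
        sum(scattering[i][j] for i in range(n_nu)) * d_nu
        for j in range(n_tau)
    ]
    # Power-Doppler profile: integrate over delay for each Doppler.
    power_doppler = [sum(row) * d_tau for row in scattering]
    # Total channel power: integrate either profile over its variable.
    total_power = sum(power_delay) * d_tau
    return power_delay, power_doppler, total_power

# Example: a flat ("brick-shaped") scattering function on a 2 x 3 grid.
pdp, pdop, total = profiles([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], 0.5, 2.0)
```

Both profiles integrate to the same total power, which gives a quick consistency check on measured data.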
Formally, the WSSUS assumption is mathematically incompatible with the requirement that is of Hilbert-Schmidt type, or, equivalently, that the system functions are square integrable, because stationarity in time and frequency of implies that cannot decay to zero for and . Similarly to the engineering model of white noise, this incompatibility is a mathematical artifact and not a problem of real-world wireless channels: in fact, every communication system transmits over a finite time duration and over a finite bandwidth. (Footnote 4: A more detailed account of solutions to overcome the mathematical incompatibility between stationary and finite-energy models can be found in [48, Sec. 7.5].) We believe that the simplification the WSSUS assumption entails justifies this mathematical inconsistency.
II-B The Underspread Assumption and its Consequences
Because the velocity of the transmitter, of the receiver, and of the objects in the propagation environment is limited, so is the maximum Doppler shift experienced by the transmitted signal. We also assume that the maximum delay is strictly smaller than . For simplicity and without loss of generality, throughout this paper, we consider scattering functions that are centered at and , i.e., we remove any overall fixed delay and Doppler shift. The assumptions of limited Doppler shift and delay then imply that the scattering function is supported on a rectangle of spread ,
(7) 
Condition (7) in turn implies that the spreading function is also supported on the same rectangle with probability 1 (w.p.1). If , the channel is said to be underspread [16, 12, 20]. Virtually all channels in wireless communication are highly underspread, with for typical land-mobile channels and as low as for some indoor channels with restricted mobility of the terminals [49, 50, 51]. The underspread property of typical wireless channels is very important, first because only (deterministic) underspread channels can be completely identified from measurements [52, 53], and second because underspread channels have a well-structured set of approximate eigenfunctions that can be used to discretize the channel operator, as described next.
II-B1 Approximate diagonalization of underspread channels
As is a Hilbert-Schmidt operator, its kernel can be expressed in terms of its positive singular values , its left singular functions , and its right singular functions [37, Th. 6.14.1], according to
(8) 
We denote by the null space of , i.e., the space of input signals that the channel maps onto . The set is an orthonormal basis for the linear span of , and is an orthonormal basis for the range space . Any input signal in is of no utility for communication purposes; the remaining input signals in the linear span of , which we denote in the remainder of the paper as input space, can be completely characterized by their projections onto the set . Similarly, the output signal is completely described by its projections onto the set . These projections together with the kernel decomposition (8) yield a countable set of scalar input-output relations, which we refer to as the diagonalization of .
Because the right and left singular functions depend on the realization of , diagonalization requires perfect channel knowledge. But this knowledge is not available in the noncoherent setting. In contrast, if the singular functions of the random channel did not depend on its particular realization, we could diagonalize without knowledge of the channel realization. This is the case, for example, for random linear time-invariant (LTI) channels, where complex sinusoids are always eigenfunctions, independently of the realization of the channel’s impulse response. Fortunately, the singular functions of underspread random LTV channels can be well approximated by deterministic functions. More precisely, an underspread channel has the following properties [20]:

All realizations of the underspread channel are approximately normal, so that the singular value decomposition (8) can be replaced by an eigenvalue decomposition.

Any deterministic unit-energy signal that is well localized in time and frequency is an approximate eigenfunction of in the mean-square sense, i.e., the mean-square error is small if is underspread. (Footnote 5: We measure the joint time-frequency localization of a signal by the product between its effective duration and its effective bandwidth, defined in (64).) This error can be further reduced by an appropriate choice of , where the choice depends on the scattering function .

If is an approximate eigenfunction as defined in the previous point, then so is for any time shift and any frequency shift .

For any , the time-varying transfer function is an approximate eigenvalue of corresponding to the approximate eigenfunction , in the sense that the mean-square error is small.
We use these properties of underspread operators to construct an approximation of the random channel that has a well-structured set of deterministic eigenfunctions. The errors incurred by this approximation are discussed in detail in the appendix on the approximation error. We then diagonalize this approximating operator and exclusively consider the corresponding discretized input-output relation in the remainder of the paper. Property 1, the approximate normality of , together with Property 2 implies that the kernel of the approximating operator can be synthesized as where, differently from (8), the are now random eigenvalues instead of random singular values, and the constitute a set of deterministic orthonormal eigenfunctions instead of random singular functions. Property 2 means that we are at liberty to choose the approximate eigenfunctions among all signals that are well localized in time and frequency. In particular, we would like the resulting approximating kernel to be convenient to work with and the approximate eigenfunctions easy to implement, as discussed in Section II-B3; therefore, we choose the set of approximate eigenfunctions to be highly structured. By Property 3, it is possible to use time- and frequency-shifted versions of a single well-localized prototype function as eigenfunctions. Furthermore, because the support of is strictly limited in Doppler and delay , it follows from the sampling theorem and the Fourier transform relation (4) that the samples , taken on a rectangular grid with and , are sufficient to characterize exactly. Hence, we take as our set of approximate eigenfunctions the so-called Weyl-Heisenberg set , where are orthonormal signals. The requirement that the are orthonormal and at the same time well localized in time and frequency implies [54], as a consequence of the Balian-Low theorem [55, Ch. 8].
Large values of the product allow for better time-frequency localization of , but result in a loss of dimensions in signal space compared with the critically sampled case . The Nyquist condition and can be readily satisfied for all underspread channels.
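To make the structure of a Weyl-Heisenberg set concrete, the sketch below builds a sampled set from time and frequency shifts of a single prototype and checks orthonormality numerically. It is only a toy construction under stated assumptions: the prototype is a rectangular pulse (orthonormal, but poorly localized in frequency, so it is not the kind of prototype advocated above), time is sampled, and the names `wh_element`, `inner`, `shift`, and `width` are hypothetical. Choosing `shift` larger than `width` mimics a grid product above the critically sampled case, at the cost of unused signal-space dimensions.

```python
import cmath

def wh_element(k, n, shift, width, length):
    """Sampled Weyl-Heisenberg element: rectangular prototype of `width`
    samples, shifted in time by k*shift samples and in frequency by
    n/width cycles per sample; returned as a list of `length` samples."""
    amp = width ** -0.5  # unit-energy normalization of the prototype
    start = k * shift
    return [
        amp * cmath.exp(2j * cmath.pi * n * t / width)
        if start <= t < start + width else 0j
        for t in range(length)
    ]

def inner(a, b):
    """Inner product <a, b> = sum_t a[t] * conj(b[t])."""
    return sum(u * v.conjugate() for u, v in zip(a, b))

# Example: 2 time slots, 8 frequency slots, time shift 10 > width 8.
K, N, shift, width = 2, 8, 10, 8
elements = [wh_element(k, n, shift, width, K * shift)
            for k in range(K) for n in range(N)]
max_offdiag = max(abs(inner(a, b))
                  for i, a in enumerate(elements)
                  for j, b in enumerate(elements) if i != j)
```

In this example `max_offdiag` is zero up to rounding error, so the set is orthonormal; the 16 elements occupy 20 samples, which illustrates the dimension loss mentioned above.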
The samples are approximate eigenvalues of by Property 4; hence, our choice of approximate eigenfunctions results in the following approximating eigenvalue decomposition for
(9) 
where denotes the kernel of the approximating operator . For , the Weyl-Heisenberg set is not complete in [54, Th. 8.3.1]. Therefore, the null space of is nonempty. As is only an approximation of , this null space might differ from . Similarly, the range space of might differ from . The characterization of the difference between these spaces is an important open problem.
II-B2 Canonical characterization of signaling schemes
The approximating random channel operator has a highly structured set of deterministic orthonormal eigenfunctions. We can, therefore, diagonalize the input-output relation of the approximating channel without the need for channel knowledge at both transmitter and receiver. Any input signal that lies in the input space of the approximating operator is uniquely characterized by its projections onto the set . All physically realizable transmit signals are effectively band limited. As the prototype function is well concentrated in frequency by construction, we can model the effective band limitation of by using only a finite number of slots in frequency. The resulting transmitted signal
(10) 
then has effective bandwidth . We call the coefficient the transmit symbol in the time-frequency slot . The received signal can be expanded in the same basis. To compute the resulting projections, we substitute and the canonical input signal (10) into the integral input-output relation (1), add white Gaussian noise , and project the resulting noisy received signal onto the functions , i.e.,
(11) 
for all time-frequency slots . The last step in (11) follows from the orthonormality of the set . Orthonormality also implies that the discretized noise signal is JPG, independent and identically distributed (i.i.d.) over time and frequency ; for convenience, we normalize the noise variance so that for all and . The diagonalized input-output relation (11) is completely generic, i.e., it is not limited to a specific signaling scheme.
II-B3 OFDM interpretation of the approximating channel model
The canonical signaling scheme (10) and the corresponding discretized input-output relation (11) are not just tools to analyze channel capacity, but also lead to a practical transmission system. The decomposition of the channel input signal (10) can be interpreted as pulse-shaped (PS) OFDM [56], where discrete data symbols are modulated onto a set of orthogonal signals, indexed by and . In addition, this perspective leads to an operational interpretation of the error incurred when approximating as in (9). The time- and frequency-dispersive nature of LTV channels leads to intersymbol interference (ISI) and intercarrier interference (ICI) in the received PS-OFDM signal. This is apparent if we project onto the function :
(12) 
The second term on the right-hand side (RHS) of (12) corresponds to ISI and ICI, while the first term is the desired signal; we can approximate the first term as by Property 4. Comparison of (11) and (12) then shows that the input-output relation (11), which results from the approximation (9), can be interpreted as PS-OFDM transmission over the original channel if all ISI and ICI terms are neglected.
With proper design of the prototype signal and choice of the grid parameters and , both ISI and ICI can be reduced [56, 57, 58]. The larger the product , the more effective the reduction in ISI and ICI, as discussed in the appendix on the approximation error. Heuristically, a good compromise between loss of dimensions in signal space and reduction of the interference terms seems to result for [56, 58]. The cyclic prefix (CP) in a conventional CP-OFDM system incurs a similar dimension loss.
In (72), we provide an upper bound on the mean-square energy of the interference term in (12), and show how this upper bound can be minimized by a careful choice of the signal and of the grid parameters and [20, 17, 58]. For general scattering functions, the optimization of the triple needs to be performed numerically; a general guideline is to choose and such that (see the appendix on the approximation error)
(13) 
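The following sketch shows how grid parameters might be picked in practice. It rests on assumptions that are not recoverable from the text above and should be checked against the original equations: the scattering function is taken to be supported on [-nu0, nu0] x [-tau0, tau0], the Nyquist conditions are taken to be T <= 1/(2 nu0) and F <= 1/(2 tau0), and the guideline is taken to be the grid-matching rule T/F = tau0/nu0. The function names are hypothetical.

```python
import math

def choose_grid(tf_product, nu0, tau0):
    """Return (T, F) with T*F = tf_product and T/F = tau0/nu0.

    The grid-matching rule T/F = tau0/nu0 is an assumed guideline;
    nu0 is the maximum Doppler shift in Hz, tau0 the maximum delay in s.
    """
    T = math.sqrt(tf_product * tau0 / nu0)
    F = math.sqrt(tf_product * nu0 / tau0)
    return T, F

def satisfies_nyquist(T, F, nu0, tau0):
    """Assumed Nyquist conditions for a scattering function supported
    on [-nu0, nu0] x [-tau0, tau0]."""
    return T <= 1.0 / (2.0 * nu0) and F <= 1.0 / (2.0 * tau0)

# Example with illustrative (not measured) values:
# maximum Doppler shift 100 Hz, maximum delay 5 microseconds.
T, F = choose_grid(1.25, 100.0, 5e-6)
ok = satisfies_nyquist(T, F, 100.0, 5e-6)
```

Under these assumptions, both Nyquist conditions reduce to the single requirement that the grid product not exceed 1/(4 nu0 tau0), which leaves ample room for highly underspread channels.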
To summarize, in this section we constructed an approximation of the random linear operator on the basis of the underspread property. The kernel of the approximating operator is synthesized from the Weyl-Heisenberg set as in (9), so that is an orthonormal basis for the input space and the range space of . The decomposition of the input signal (10) can be interpreted as PS-OFDM: this interpretation sheds light on one of the errors resulting from the approximation (9). Finally, an important open problem is the characterization of the difference between the input spaces of and , and between the range spaces of and .
II-C Linear Time-Invariant and Linear Frequency-Invariant Channels
The properties of LTV underspread channels we listed in Section II-B are similar to the properties of LTI and linear frequency-invariant (LFI) channels: both LTI and LFI channel operators are normal and have a well-structured set of deterministic eigenfunctions (sinusoids parametrized by frequency for LTI channels, and Dirac functions parametrized by time for LFI channels), with corresponding eigenvalues equal to the samples of a channel system function (e.g., the transfer function in the LTI case). Intuitively, LTI and LFI channels are limiting cases within the class of LTV channels analyzed in this section; in fact, an LTV channel reduces to an LTI channel when , and to an LFI channel when . Both LTI and LFI channels are then underspread, according to our definition. Yet, since LTI and LFI channel operators are not of Hilbert-Schmidt type [59, App. A], the kernel diagonalization presented in Section II-B does not apply to these two classes of channels; consequently, the capacity bounds we derive in Sections III and IV do not reduce to capacity bounds for the LTI or the LFI case when or , respectively. (Footnote 6: For deterministic LTI channels, a channel discretization that is useful for information-theoretic analysis is discussed in [13, Sec. 8.5].)
Quasi-LTI channels, i.e., channels that are slowly time varying ( small but positive), and quasi-LFI channels, i.e., channels that are slowly frequency varying ( small but positive), can instead be approximately diagonalized as described in Section II-B, as long as they are underspread.
II-D Discrete-Time Discrete-Frequency Input-Output Relation
The discrete-time discrete-frequency channel coefficients constitute a two-dimensional discrete-parameter stationary random process that is JPG with zero mean and correlation function
(14) 
The two-dimensional power spectral density of is defined as
(15) 
We shall often need the following expression for in terms of the scattering function :
(16) 
where (a) follows from the Fourier transform relation (6), and (b) results from Poisson’s summation formula. The variance of each channel coefficient is given by
(17) 
where (a) follows from (16), and (b) results because we chose the grid parameters to satisfy the Nyquist conditions and , so that periodic repetitions of the compactly supported scattering function lie outside the integration region. Finally, (c) follows from the change of variables and . For ease of notation, we normalize throughout the paper.
For each time slot , we arrange the discretized input signal , the discretized output signal , the channel coefficients , and the noise samples in corresponding vectors. For example, the dimensional vector that contains the input symbols in the th time slot is defined as
The output vector , the channel vector , and the noise vector are defined analogously. This notation allows us to rewrite the input-output relation (11) as
(18) 
for all . In this formulation, the channel is a multivariate stationary process with matrix-valued correlation function
(19) 
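In most of the following analyses, we initially consider a finite number $K$ of time slots and then take the limit $K\to\infty$. To obtain a compact notation, we stack $K$ contiguous elements of the multivariate input, channel, and output processes just defined. For the channel input, this results in the $KN$-dimensional vector
$$\mathbf{s} \triangleq \begin{bmatrix}\mathbf{s}^{T}[0] & \mathbf{s}^{T}[1] & \cdots & \mathbf{s}^{T}[K-1]\end{bmatrix}^{T}. \tag{20}$$
Again, the stacked vectors $\mathbf{r}$, $\mathbf{h}$, and $\mathbf{w}$ are defined analogously. With these definitions, we can now compactly express the input-output relation \fref{eq:scalario} as
$$\mathbf{r} = \mathbf{h}\odot\mathbf{s} + \mathbf{w}. \tag{21}$$
We denote the correlation matrix of the stacked channel vector $\mathbf{h}$ by $\mathbf{R}_h \triangleq \mathbb{E}\{\mathbf{h}\mathbf{h}^{H}\}$. Because the channel process is stationary in time and in frequency, $\mathbf{R}_h$ is a two-level Hermitian Toeplitz matrix, given by
$$\mathbf{R}_h =
\begin{bmatrix}
\mathbf{R}_h[0] & \mathbf{R}_h^{H}[1] & \cdots & \mathbf{R}_h^{H}[K-1]\\
\mathbf{R}_h[1] & \mathbf{R}_h[0] & \cdots & \mathbf{R}_h^{H}[K-2]\\
\vdots & \vdots & \ddots & \vdots\\
\mathbf{R}_h[K-1] & \mathbf{R}_h[K-2] & \cdots & \mathbf{R}_h[0]
\end{bmatrix}. \tag{22}$$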
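The two-level Toeplitz structure can be made concrete with a small numerical sketch. The code below assumes a brick-shaped scattering function with unit total power, for which the correlation function is a product of two sinc functions; the normalized spreads `nu0_T` and `tau0_F` and the sizes `K` and `N` are hypothetical. Entry $(kN+n,\,k'N+n')$ of the stacked matrix is set to the correlation at lag $(k-k',\,n-n')$.

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def corr(dk, dn, nu0_T=0.25, tau0_F=0.125):
    """Correlation at lag (dk, dn) for a brick-shaped scattering function with
    unit total power (illustrative assumption); real and even in both lags."""
    return sinc(2.0 * nu0_T * dk) * sinc(2.0 * tau0_F * dn)

def stacked_correlation(K, N):
    """KN x KN matrix whose entry (k*N + n, k2*N + n2) is corr(k - k2, n - n2)."""
    size = K * N
    R = [[0.0] * size for _ in range(size)]
    for k in range(K):
        for n in range(N):
            for k2 in range(K):
                for n2 in range(N):
                    R[k * N + n][k2 * N + n2] = corr(k - k2, n - n2)
    return R

K, N = 3, 4
R = stacked_correlation(K, N)
for row in R[:N]:
    print(" ".join(f"{x:6.3f}" for x in row[:N]))  # the N x N block at time lag 0
```

Shifting both block indices by one, or both indices inside a block by one, leaves every entry unchanged, which is exactly the two-level Toeplitz property; the unit diagonal reflects the normalization of the channel variance.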
II-E Power Constraints
Throughout the paper, we assume that the average power of the transmitted signal is constrained as $(1/T)\,\mathbb{E}\{\|\mathbf{s}[k]\|^2\}\le P$. In addition, we limit the peak power to be no larger than $\beta$ times the average power, where $\beta\ge 1$ is the nominal peak-to-average-power ratio (PAPR).
The multivariate input-output relation \fref{eq:vecio} allows us to constrain the peak power in several different ways. We analyze the following two cases:

Peak constraint in time: The power of the transmitted signal in each time slot $k$ is limited as
$$\frac{1}{T}\|\mathbf{s}[k]\|^2 \le \beta P \quad \text{w.p.1}. \tag{23}$$
This constraint models the fact that physically realizable power amplifiers can only provide limited output power [4].

Peak constraint in time and frequency: Regulatory bodies sometimes limit the peak power in certain frequency bands, e.g., for UWB systems. We model this type of constraint by imposing a limit on the squared amplitude of the transmitted symbols $s[k,n]$ in each time-frequency slot $(k,n)$ according to
$$\frac{1}{T}|s[k,n]|^2 \le \frac{\beta P}{N} \quad \text{w.p.1}. \tag{24}$$
This type of constraint is more stringent than the peak constraint in time given in (23).
Both peak constraints above are imposed on the input symbols $s[k,n]$, i.e., in the eigenspace of the approximating channel operator. This limitation is mathematically convenient; however, the peak value of the corresponding transmitted continuous-time signal $s(t)$ in \fref{eq:canonicalinput} also depends on the prototype signal $g(t)$, so that a limit on $s[k,n]$ does not generally imply that $s(t)$ is peak limited.
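The difference between the two peak constraints of this subsection can be illustrated with a short sketch. It assumes that the constraints are normalized by the slot duration $T$, i.e., $\|\mathbf{s}[k]\|^2/T \le \beta P$ for the constraint in time and $|s[k,n]|^2/T \le \beta P/N$ for the constraint in time and frequency; the function names, the numerical tolerance, and the example symbol grids are hypothetical.

```python
def slot_power(slot, T):
    """Power ||s[k]||^2 / T of one time slot (slot is a list of complex symbols)."""
    return sum(abs(x) ** 2 for x in slot) / T

def peak_time_ok(slots, T, P, beta, tol=1e-12):
    """Peak constraint in time: slot power at most beta * P in every slot."""
    return all(slot_power(slot, T) <= beta * P + tol for slot in slots)

def peak_time_freq_ok(slots, T, P, beta, tol=1e-12):
    """Peak constraint in time and frequency: |s[k,n]|^2 / T <= beta * P / N."""
    return all(abs(x) ** 2 / T <= beta * P / len(slot) + tol
               for slot in slots for x in slot)

T, P, beta = 1.0, 1.0, 2.0
flat = [[0.5 + 0.5j] * 4, [0.7] * 4]      # energy spread evenly over N = 4 tones
peaky = [[2.0 ** 0.5, 0.0, 0.0, 0.0]]     # all slot energy placed on a single tone

print(peak_time_ok(flat, T, P, beta), peak_time_freq_ok(flat, T, P, beta))
print(peak_time_ok(peaky, T, P, beta), peak_time_freq_ok(peaky, T, P, beta))
```

The second signal meets the constraint in time but violates the constraint in time and frequency, which is the sense in which a signal that is peak constrained in time only may still be peaky in frequency.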
III Capacity Bounds under a Peak Constraint in Time and Frequency
In the present section, we analyze the capacity of the discretized channel in \fref{eq:scalario} subject to the peak constraint in time and frequency specified by (24). The link between the discretized channel \fref{eq:scalario} and the continuous-time channel model established in \fref{sec:model} then allows us to express the resulting bounds in terms of the scattering function $C_H(\nu,\tau)$ of the underspread WSSUS channel.
As we assumed that the channel process $\{h[k,n]\}$ has a spectral density $c(\theta,\varphi)$ [given in (16)], the vector process $\{\mathbf{h}[k]\}$ is ergodic [60] and the capacity of the discretized underspread channel (21) is given by [61, Ch. 12]
$$C(W) = \lim_{K\to\infty}\frac{1}{KT}\sup_{\mathcal{P}} I(\mathbf{r};\mathbf{s}) \tag{25}$$
for a given bandwidth $W=NF$. Here, the supremum is taken over the set $\mathcal{P}$ of all input distributions that satisfy the peak constraint (24) and the average-power constraint $\mathbb{E}\{\|\mathbf{s}\|^2\}\le KPT$.
The capacity of fading channels with finite bandwidth has so far resisted all attempts at closed-form solutions [62, 22, 63], even for the memoryless case; thus, we resort to bounds to characterize the capacity (25). In particular, we present the following bounds:

An upper bound $U_c(W)$, which we refer to as the coherent upper bound, that is based on the assumption that the receiver has perfect knowledge of the channel realizations. This bound is standard; it turns out to be useful for small bandwidth.

An upper bound $U_1(W)$ that is useful for medium to large bandwidth. This bound is explicit in the channel's scattering function and extends the upper bound [28, Prop. 2.2] on the capacity of frequency-flat time-selective channels to general underspread channels that are selective in time and frequency.

A lower bound $L_1(W)$ that extends the lower bound [27, Prop. 2.2] to general underspread channels that are selective in time and frequency. This bound is explicit in the channel's scattering function only for large bandwidth.
III-A Coherent Upper Bound
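The assumption that the receiver perfectly knows the instantaneous channel realizations furnishes the following capacity upper bound:
$$\begin{aligned}
C(W) &= \lim_{K\to\infty}\frac{1}{KT}\sup_{\mathcal{P}} I(\mathbf{r};\mathbf{s})
\overset{(a)}{\le} \lim_{K\to\infty}\frac{1}{KT}\sup_{\mathcal{P}} I(\mathbf{r};\mathbf{s}\,|\,\mathbf{h})\\
&\overset{(b)}{\le} \lim_{K\to\infty}\frac{1}{KT}\sup_{\mathbb{E}\{\|\mathbf{s}\|^2\}\le KPT} I(\mathbf{r};\mathbf{s}\,|\,\mathbf{h})\\
&\overset{(c)}{=} \lim_{K\to\infty}\frac{1}{KT}\sup_{\operatorname{tr}(\mathbf{R}_s)\le KPT} \mathbb{E}_{\mathbf{h}}\!\left\{\log\det\!\left(\mathbf{I}_{KN} + (\mathbf{h}\mathbf{h}^{H})\odot\mathbf{R}_s\right)\right\}\\
&\overset{(d)}{\le} \frac{N}{T}\,\mathbb{E}_{h}\!\left\{\log\!\left(1+\frac{PT}{N}|h|^2\right)\right\}.
\end{aligned}\tag{26}$$
Here, (a) holds because the coherent mutual information, $I(\mathbf{r};\mathbf{s}\,|\,\mathbf{h})$, is an upper bound on the corresponding mutual information $I(\mathbf{r};\mathbf{s})$ in the noncoherent setting. Inequality (b) follows as we drop the peak constraint and thus enlarge the set of admissible input distributions. The supremum of $I(\mathbf{r};\mathbf{s}\,|\,\mathbf{h})$ over the resulting relaxed input constraint is achieved by a zero-mean JPG input vector $\mathbf{s}$ with covariance matrix $\mathbf{R}_s\triangleq\mathbb{E}\{\mathbf{s}\mathbf{s}^{H}\}$ that satisfies $\operatorname{tr}(\mathbf{R}_s)\le KPT$ [3]. To obtain (c), we use that, conditioned on $\mathbf{h}$, the output vector $\mathbf{r}$ is JPG and its covariance matrix can be expressed as
$$\mathbb{E}\{\mathbf{r}\mathbf{r}^{H}\,|\,\mathbf{h}\} = \mathbf{I}_{KN} + \mathbb{E}_{\mathbf{s}}\{(\mathbf{h}\odot\mathbf{s})(\mathbf{h}\odot\mathbf{s})^{H}\} = \mathbf{I}_{KN} + (\mathbf{h}\mathbf{h}^{H})\odot\mathbf{R}_s$$
where the last equality results from the following elementary relation between Hadamard products and outer products:
$$(\mathbf{u}\odot\mathbf{v})(\mathbf{u}\odot\mathbf{v})^{H} = (\mathbf{u}\mathbf{u}^{H})\odot(\mathbf{v}\mathbf{v}^{H}). \tag{27}$$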
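The relation between Hadamard products and outer products used in this step, $(\mathbf{u}\odot\mathbf{v})(\mathbf{u}\odot\mathbf{v})^{H} = (\mathbf{u}\mathbf{u}^{H})\odot(\mathbf{v}\mathbf{v}^{H})$, is easy to confirm numerically. The sketch below uses plain Python lists of complex numbers; the example vectors are arbitrary.

```python
def outer(u, v):
    """Outer product u v^H for lists of complex numbers."""
    return [[a * b.conjugate() for b in v] for a in u]

def hadamard(A, B):
    """Entrywise (Hadamard) product of two equally sized matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

h = [1 + 2j, -0.5j, 3 + 0j]
s = [0.5 - 1j, 2 + 0j, 1j]
hs = [a * b for a, b in zip(h, s)]            # entrywise product of h and s

lhs = outer(hs, hs)                           # outer product of the entrywise product
rhs = hadamard(outer(h, h), outer(s, s))      # entrywise product of the outer products
max_gap = max(abs(x - y) for rl, rr in zip(lhs, rhs) for x, y in zip(rl, rr))
print(max_gap)
```

Both sides have entry $(i,j)$ equal to $u_i v_i u_j^* v_j^*$, so the largest entrywise difference is zero up to floating-point rounding.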
Finally, (d) follows from Hadamard's inequality, from the fact that by Jensen's inequality the supremum is achieved by $\mathbf{R}_s=(PT/N)\,\mathbf{I}_{KN}$, and because the channel coefficients $h[k,n]$ all have the same distribution $h\sim\mathcal{CN}(0,1)$. As the upper bound (26) does not depend on $K$, we obtain an upper bound on capacity (25) as a function of bandwidth if we set $N=W/F$:
$$U_c(W) \triangleq \frac{W}{TF}\,\mathbb{E}_{h}\!\left\{\log\!\left(1+\frac{PTF}{W}|h|^2\right)\right\}. \tag{28}$$
For a discretization of the WSSUS channel different from the one in \fref{sec:underspread}, Médard and Gallager [8] showed that the corresponding capacity vanishes with increasing bandwidth if the peakiness of the input signal is constrained in a way that includes our peak constraint (24). As the upper bound $U_c(W)$ monotonically increases in $W$, it is sensible to conclude that $U_c(W)$ does not accurately reflect the capacity behavior for large bandwidth. However, we demonstrate in \fref{sec:numeval} by means of a numerical example that $U_c(W)$ can be quite useful for small and medium bandwidth.
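The monotonic behavior of the coherent upper bound can be reproduced with a minimal sketch. It assumes unit noise variance and Rayleigh fading with unit-variance coefficients, so that the bound takes the form $(W/(TF))\,\mathbb{E}\{\log(1+(PTF/W)|h|^2)\}$ with $|h|^2$ exponentially distributed; the expectation is computed with a composite Simpson rule, and the values of $P$, $TF$, and the bandwidth sweep are hypothetical.

```python
import math

def expected_log1p_exp(rho, upper=60.0, steps=60000):
    """E[log(1 + rho * X)] for X ~ Exp(1), i.e., X = |h|^2 with h ~ CN(0, 1).
    Composite Simpson rule on [0, upper]; the neglected tail is of order exp(-upper)."""
    width = upper / steps
    def f(x):
        return math.log1p(rho * x) * math.exp(-x)
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * width)
    return total * width / 3.0

def coherent_upper_bound(W, P, TF):
    """(W / TF) * E[log(1 + (P * TF / W) * |h|^2)] in nats per second,
    assuming unit noise variance."""
    return (W / TF) * expected_log1p_exp(P * TF / W)

P, TF = 1.0e3, 1.25                      # hypothetical power and grid product T * F
bandwidths = (1e2, 1e3, 1e4, 1e5)
bounds = [coherent_upper_bound(W, P, TF) for W in bandwidths]
for W, b in zip(bandwidths, bounds):
    print(f"W = {W:9.0f}   bound = {b:8.2f}")
```

In this sketch the bound grows with bandwidth and stays below $P$, its infinite-bandwidth limit in nats per second, which is why it cannot capture the capacity loss caused by channel uncertainty at large bandwidth.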
III-B An Upper Bound for Large but Finite Bandwidth
To better understand the capacity behavior at large bandwidth, we derive an upper bound $U_1(W)$ that captures the effect of diminishing capacity in the large-bandwidth regime. The upper bound is explicit in the channel's scattering function $C_H(\nu,\tau)$.
III-B1 The upper bound
Theorem 1
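Consider an underspread Rayleigh-fading channel with scattering function $C_H(\nu,\tau)$; assume that the channel input $s[k,n]$ satisfies the average-power constraint $(1/T)\,\mathbb{E}\{\|\mathbf{s}[k]\|^2\}\le P$ and the peak constraint $(1/T)|s[k,n]|^2\le\beta P/N$ w.p.1. The capacity of this channel is upper-bounded as $C(W)\le U_1(W)$, where
$$U_1(W) \triangleq \frac{W}{TF}\log\!\left(1+\alpha_{\min}\frac{PTF}{W}\right) - \alpha_{\min}\,G(W) \tag{29a}$$
with
$$\alpha_{\min} \triangleq \min\!\left\{1,\ \frac{W}{TF}\left(\frac{1}{G(W)}-\frac{1}{P}\right)\right\} \tag{29b}$$
and
$$G(W) \triangleq \frac{W}{\beta}\iint \log\!\left(1+\frac{\beta P}{W}\,C_H(\nu,\tau)\right) d\tau\,d\nu. \tag{29c}$$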
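To see how a bound of this form behaves as a function of bandwidth, the following sketch evaluates it for a brick-shaped scattering function, for which the double integral has a closed form. The sketch assumes the expressions $U_1(W) = (W/(TF))\log(1+\alpha_{\min}PTF/W)-\alpha_{\min}G(W)$, $\alpha_{\min}=\min\{1,(W/(TF))(1/G(W)-1/P)\}$, and $G(W)=(W/\beta)\iint\log(1+(\beta P/W)C_H(\nu,\tau))\,d\tau\,d\nu$, with unit noise variance; the values of $P$, $\beta$, the spread $4\nu_0\tau_0$, and $TF$ are hypothetical.

```python
import math

def g_term(W, P, beta, spread):
    """G(W) for a brick-shaped scattering function that equals 1 / spread on a
    support of area spread = 4 * nu0 * tau0 (illustrative assumption)."""
    return (W / beta) * spread * math.log1p(beta * P / (W * spread))

def upper_bound_u1(W, P, beta, spread, TF):
    """(W/TF) * log(1 + alpha * P * TF / W) - alpha * G(W), evaluated at alpha_min."""
    G = g_term(W, P, beta, spread)
    alpha = min(1.0, (W / TF) * (1.0 / G - 1.0 / P))
    return (W / TF) * math.log1p(alpha * P * TF / W) - alpha * G

P, beta, spread, TF = 1.0e3, 1.0, 1.0e-3, 1.25   # all values hypothetical
bandwidths = (1e3, 1e5, 1e7, 1e9)
sweep = [upper_bound_u1(W, P, beta, spread, TF) for W in bandwidths]
for W, u in zip(bandwidths, sweep):
    print(f"W = {W:12.0f}   U1 = {u:8.2f}")
```

For these parameters the bound first increases with bandwidth and then decays toward zero, which is the qualitative behavior that allows a coarse identification of the capacity-optimal bandwidth.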
Proof:
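To bound $C(W)$, we first use the chain rule for mutual information, $I(\mathbf{r};\mathbf{s}) = I(\mathbf{r};\mathbf{s},\mathbf{h}) - I(\mathbf{r};\mathbf{h}\,|\,\mathbf{s})$. Next, we split the supremum over $\mathcal{P}$ into two parts, similarly to the proof of [28, Prop. 2.2]: one supremum over a restricted set $\mathcal{P}|_{\alpha}$ of input distributions that satisfy the peak constraint (24) and have a prescribed average power, i.e., $\mathbb{E}\{\|\mathbf{s}\|^2\}=\alpha KPT$ for some fixed parameter $\alpha\in[0,1]$, and another supremum over the parameter $\alpha$. Both steps together yield the upper bound
$$C(W) \le \sup_{0\le\alpha\le 1}\,\lim_{K\to\infty}\frac{1}{KT}\left\{\sup_{\mathcal{P}|_{\alpha}} I(\mathbf{r};\mathbf{s},\mathbf{h}) - \inf_{\mathcal{P}|_{\alpha}} I(\mathbf{r};\mathbf{h}\,|\,\mathbf{s})\right\}. \tag{30}$$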
Next, we bound the two terms inside the braces individually. While standard steps suffice for the bound on the first term, the second term requires some more effort; we relegate some of the more technical steps to \fref{app:mmsemi}.
Upper bound on the first term
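The output vector $\mathbf{r}$ depends on the input vector $\mathbf{s}$ only through $\mathbf{s}\odot\mathbf{h}$, so that $I(\mathbf{r};\mathbf{s},\mathbf{h}) = I(\mathbf{r};\mathbf{s}\odot\mathbf{h})$. To upper-bound the mutual information $I(\mathbf{r};\mathbf{s}\odot\mathbf{h})$, we take $\mathbf{s}\odot\mathbf{h}$ as JPG with zero mean and covariance matrix $\mathbb{E}\{(\mathbf{s}\odot\mathbf{h})(\mathbf{s}\odot\mathbf{h})^{H}\} = \mathbb{E}\{\mathbf{s}\mathbf{s}^{H}\}\odot\mathbf{R}_h$. An upper bound on the first term inside the braces in (30) now results if we drop the peak constraint on $\mathbf{s}$. Then,
$$\begin{aligned}
\sup_{\mathcal{P}|_{\alpha}} I(\mathbf{r};\mathbf{s}\odot\mathbf{h})
&\le \sup_{\mathbb{E}\{\|\mathbf{s}\|^2\}=\alpha KPT} \log\det\!\left(\mathbf{I}_{KN} + \mathbb{E}\{\mathbf{s}\mathbf{s}^{H}\}\odot\mathbf{R}_h\right)\\
&\overset{(a)}{\le} \sup_{\mathbb{E}\{\|\mathbf{s}\|^2\}=\alpha KPT} \sum_{k=0}^{K-1}\sum_{n=0}^{N-1}\log\!\left(1+\mathbb{E}\{|s[k,n]|^2\}\right)
\overset{(b)}{\le} KN\log\!\left(1+\frac{\alpha PT}{N}\right)
\end{aligned}\tag{31}$$
where (a) follows from Hadamard's inequality and (b) from Jensen's inequality.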
Lower bound on the second term
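We use the fact that the channel $\mathbf{h}$ is JPG, so that $I(\mathbf{r};\mathbf{h}\,|\,\mathbf{s}) = \mathbb{E}_{\mathbf{s}}\{\log\det(\mathbf{I}_{KN}+\mathbf{s}\mathbf{s}^{H}\odot\mathbf{R}_h)\}$. Next, we expand the expectation operator as follows:
$$\mathbb{E}_{\mathbf{s}}\!\left\{\log\det\!\left(\mathbf{I}_{KN}+\mathbf{s}\mathbf{s}^{H}\odot\mathbf{R}_h\right)\right\} = \int_{\mathcal{S}} \frac{\log\det\!\left(\mathbf{I}_{KN}+\mathbf{s}\mathbf{s}^{H}\odot\mathbf{R}_h\right)}{\|\mathbf{s}\|^2}\,\|\mathbf{s}\|^2\,dP_{\mathbf{s}} \tag{32}$$
where $\mathcal{S}\triangleq\{\mathbf{s}: |s[k,n]|^2\le\beta PT/N \text{ for all } k,n\}$ is the integration domain because the input distribution satisfies the peak constraint (24). Both factors under the integral are nonnegative; hence, we obtain a lower bound on the expectation if we replace the first factor by its infimum over $\mathcal{S}$:
$$\mathbb{E}_{\mathbf{s}}\!\left\{\log\det\!\left(\mathbf{I}_{KN}+\mathbf{s}\mathbf{s}^{H}\odot\mathbf{R}_h\right)\right\} \ge \inf_{\mathbf{s}\in\mathcal{S}}\frac{\log\det\!\left(\mathbf{I}_{KN}+\mathbf{s}\mathbf{s}^{H}\odot\mathbf{R}_h\right)}{\|\mathbf{s}\|^2}\int_{\mathcal{S}}\|\mathbf{s}\|^2\,dP_{\mathbf{s}} = \alpha KPT\,\inf_{\mathbf{s}\in\mathcal{S}}\frac{\log\det\!\left(\mathbf{I}_{KN}+\mathbf{s}\mathbf{s}^{H}\odot\mathbf{R}_h\right)}{\|\mathbf{s}\|^2}. \tag{33}$$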
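The infimum of the ratio $\log\det(\mathbf{I}+\mathbf{s}\mathbf{s}^{H}\odot\mathbf{R}_h)/\|\mathbf{s}\|^2$ over a peak-limited set can be explored on a toy example. The sketch below assumes a hypothetical $2\times 2$ correlation matrix with off-diagonal entry `r` and a hypothetical peak value `PEAK` on the squared symbol amplitudes, and searches a grid of squared amplitudes for the smallest ratio.

```python
import math

def ratio(a, b, r):
    """log det(I + (s s^H) o R) / ||s||^2 for R = [[1, r], [r, 1]], where
    a = |s_1|^2 and b = |s_2|^2; the symbol phases do not affect the determinant."""
    det = (1.0 + a) * (1.0 + b) - (r * r) * a * b
    return math.log(det) / (a + b)

PEAK, r, m = 2.0, 0.8, 50                       # hypothetical peak, correlation, grid size
grid = [PEAK * (i + 1) / m for i in range(m)]   # squared amplitudes in (0, PEAK]
best = min((ratio(a, b, r), a, b) for a in grid for b in grid)
print(best)
```

For this example the grid minimum is attained when both symbols use the full peak amplitude, i.e., at a corner of the admissible set; this is only a toy illustration and not a substitute for the general argument.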
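As the matrix $\mathbf{R}_h$ is positive semidefinite, the above infimum is achieved on the boundary of the admissible set [26, Sec. VI.A], i.e., by a vector $\mathbf{s}$ whose entries satisfy $|s[k,n]|^2=\beta PT/N$. We use this fact and the relation between mutual information and MMSE, recently discovered by Guo et al. [35], to further lower-bound the infimum on the RHS in (33). The corresponding derivation is detailed in \fref{app:mmsemi}; it results in
$$\inf_{\mathbf{s}\in\mathcal{S}}\frac{\log\det\!\left(\mathbf{I}_{KN}+\mathbf{s}\mathbf{s}^{H}\odot\mathbf{R}_h\right)}{\|\mathbf{s}\|^2} \ge \frac{N}{\beta PT}\int_{-1/2}^{1/2}\int_{-1/2}^{1/2}\log\!\left(1+\frac{\beta PT}{N}\,c(\theta,\varphi)\right) d\theta\,d\varphi \tag{34}$$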
where $c(\theta,\varphi)$, defined in (15), is the two-dimensional power spectral density of the channel process $\{h[k,n]\}$. Finally, we use the bound (34) in (33), relate $c(\theta,\varphi)$ to the scattering function $C_H(\nu,\tau)$ by means of (16) and get