# Noncoherent Capacity of Underspread Fading Channels

Giuseppe Durisi,  Ulrich G. Schuster,  Helmut Bölcskei,  Shlomo Shamai (Shitz)

This work was supported in part by the Swiss Kommission für Technologie und Innovation (KTI) under grant 6715.2 ENS-ES, and by the European Commission as part of the Integrated Project Pulsers Phase II under contract FP6-027142 and as part of the FP6 Network of Excellence NEWCOM. G. Durisi and H. Bölcskei are with the Communication Technology Laboratory, ETH Zurich, 8092 Zurich, Switzerland (e-mail: {gdurisi, boelcskei}@nari.ee.ethz.ch). U. G. Schuster was with the Communication Technology Laboratory, ETH Zurich, and is now with Celestrius AG, Zurich, Switzerland. S. Shamai (Shitz) is with the Technion, Israel Institute of Technology, 32000 Haifa, Israel (e-mail: sshlomo@ee.technion.ac.il). This paper was presented in part at the IEEE International Symposium on Information Theory, Seattle, WA, U.S.A., July 2006, and at the IEEE International Symposium on Information Theory, Nice, France, June 2007.
###### Abstract

We derive bounds on the noncoherent capacity of wide-sense stationary uncorrelated scattering (WSSUS) channels that are selective both in time and frequency, and are underspread, i.e., the product of the channel’s delay spread and Doppler spread is small. For input signals that are peak constrained in time and frequency, we obtain upper and lower bounds on capacity that are explicit in the channel’s scattering function, are accurate over a large range of bandwidths, and allow us to coarsely identify the capacity-optimal bandwidth as a function of the peak power and the channel’s scattering function. We also obtain a closed-form expression for the first-order Taylor series expansion of capacity in the limit of large bandwidth, and show that our bounds are tight in the wideband regime. For input signals that are peak constrained in time only (and, hence, allowed to be peaky in frequency), we provide upper and lower bounds on the infinite-bandwidth capacity and find cases when the bounds coincide and the infinite-bandwidth capacity is characterized exactly. Our lower bound is closely related to a result by Viterbi (1967).

The analysis in this paper is based on a discrete-time discrete-frequency approximation of WSSUS time- and frequency-selective channels. This discretization explicitly takes into account the underspread property, which is satisfied by virtually all wireless communication channels.


## I Introduction and Outline

#### I-1 Models for fading channels

Channel capacity is a benchmark for the design of any communication system. The techniques used to compute, or at least to bound, channel capacity often provide guidelines for the design of practical systems, e.g., how to best utilize the resources bandwidth and power, and how to design efficient modulation and coding schemes [1, Sec. III.3]. Our goal in this paper is to analyze the capacity of wireless communication channels that are of direct practical importance. We believe that an accurate stochastic model for such channels should take the following aspects into account:

• The channel is selective in time and frequency, i.e., it exhibits memory in frequency and in time, respectively.

• Neither the transmitter nor the receiver knows the instantaneous realization of the channel.

• The peak power of the input signal is limited.

These aspects are important because they arise from practical limitations of real-world communication systems: temporal variations of the environment and multipath propagation are responsible for channel selectivity in time and frequency, respectively [2, 3]; perfect channel knowledge at the receiver is impossible to obtain because channel state information needs to be extracted from the received signal; finally, realizable transmitters are always limited in their peak output power [4]. The above aspects are also fundamental as they significantly impact the behavior of channel capacity: for example, the capacity of a block-fading channel behaves differently from the capacity of a channel that is stationary in time [5]; channel capacity with perfect channel knowledge at the receiver is always larger than the capacity without channel knowledge [6], and the signaling schemes necessary to achieve capacity are also very different in the two cases [1]; finally, a peak constraint on the transmit signal can lead to vanishing capacity in the large-bandwidth limit [7, 8, 9], while without a peak constraint the infinite-bandwidth AWGN capacity can be attained asymptotically [7, 10, 11, 12, 13, 14, 15].

Small scale fading of wireless channels can be sensibly modeled as a stochastic Gaussian linear time-varying (LTV) system [2]; in particular, we base our developments on the widely used wide-sense stationary uncorrelated scattering (WSSUS) model for random LTV channels [16, 12]. Like most models for real-world channels, the WSSUS model is time continuous; however, almost all tools for information-theoretic analysis of noisy channels require a discretized representation of the channel’s input-output relation. Several approaches to discretize random LTV channels are proposed in the literature, e.g., sampling [8, 16, 17] or basis expansion [18, 19]; all these discretized models incur an approximation error with respect to the continuous-time WSSUS model that is often difficult to quantify. As virtually all wireless channels of practical interest are underspread, i.e., the product of maximum delay and maximum Doppler shift is small, we build our information-theoretic analysis upon a discretization of LTV channels, proposed by Kozek [20], that explicitly takes into account the underspread property to minimize the approximation error in the mean-square sense.

#### I-2 Capacity of noncoherent WSSUS channels

Throughout the paper, we assume that both the transmitter and receiver know the channel law (this implies that the codebook and the decoding strategy can be optimized accordingly [21]), but both are ignorant of the channel realization, a setting often called noncoherent. In the following, we refer to channel capacity in the noncoherent setting simply as “capacity”. In contrast, in the coherent setting the receiver is also assumed to know the channel realization perfectly; the corresponding capacity is termed coherent capacity.

A general closed-form expression for the capacity of Rayleigh-fading channels is not known, even if the channel is memoryless [22]. However, several asymptotic results are available. If only a constraint on the average transmitted power is imposed, the AWGN capacity can be achieved in the infinite-bandwidth limit also in the presence of fading. This result is quite robust, as it holds for a wide variety of channel models [7, 10, 11, 12, 13, 14, 15]. Verdú showed that flash signaling, which implies unbounded peak power of the input signal, is necessary and sufficient to achieve the infinite-bandwidth AWGN capacity on block-memoryless fading channels [14]; a form of flash signaling is also infinite-bandwidth optimal for the more general time- and frequency-selective channel model used in the present paper [15]. In contrast, if the peakiness of the input signal is restricted, the infinite-bandwidth capacity behavior of most fading channels changes drastically, and the limit depends on the type of peak constraint imposed [7, 8, 9, 13, 23]. In this paper, we shall distinguish between a peak constraint in time and a peak constraint in time and frequency.

##### Peak constraint in time

No closed-form capacity expression, not even in the infinite-bandwidth limit, seems to exist to date for time- and frequency-selective WSSUS channels. Viterbi’s analysis [23] provides a result that can be interpreted as a lower bound on the infinite-bandwidth capacity of time- and frequency-selective channels. This lower bound is in the form of the infinite-bandwidth AWGN capacity minus a penalty term that depends on the channel’s power-Doppler profile [16]. For channels that are time selective but frequency flat, structurally similar expressions were found for the infinite-bandwidth capacity [24, 25] and for the capacity per unit energy [26].

##### Peak constraint in time and frequency

Although a closed-form capacity expression valid for all bandwidths is not available, it is known that the infinite-bandwidth capacity is zero for various channel models [7, 8, 9]. This asymptotic capacity behavior implies that signaling schemes that spread the transmit energy uniformly across time and frequency perform poorly in the large-bandwidth regime. Even more useful for performance assessment would be capacity bounds for finite bandwidth. For frequency-flat time-selective channels, such bounds can be found in [27, 28], while for the more general time- and frequency-selective case treated in the present paper, upper bounds seem to exist only on the rates achievable with particular signaling schemes, namely for orthogonal frequency-division multiplexing (OFDM) with constant-modulus symbols [29], and for multiple-input multiple-output (MIMO) OFDM with unitary space-frequency codes over frequency-selective block-fading channels [30].

#### I-3 Contributions

We use the discrete-time discrete-frequency approximation of continuous-time underspread WSSUS channels proposed in [20], to obtain the following results:

• We derive upper and lower bounds on capacity under a constraint on the average power and under a peak constraint in both time and frequency. These bounds are valid for any bandwidth, are explicit in the channel’s scattering function, and generalize the results on achievable rates in [29]. In particular, our bounds allow us to coarsely identify the capacity-optimal bandwidth for a given peak constraint and a given scattering function.

• Under the same peak constraint in time and frequency, we find the first-order Taylor series expansion of channel capacity in the limit of infinite bandwidth. This result extends the asymptotic capacity analysis for frequency-flat time-selective channels in [28] to channels that are selective in both time and frequency.

• In the infinite-bandwidth limit and for transmit signals that are peak-constrained in time only, we recover Viterbi’s capacity lower bound [23]. In addition, we derive an upper bound that is shown to coincide with the lower bound for a specific class of channels; hence, the infinite-bandwidth capacity for this class of channels is established.

The results in this paper rely on several flavors of Szegö’s theorem on the asymptotic eigenvalue distribution of Toeplitz matrices [31, 32]; in particular, we use various extensions of Szegö’s theorem to two-level Toeplitz matrices, i.e., block-Toeplitz matrices that have Toeplitz blocks [33, 34]. Another key ingredient for several of our proofs is the relation between mutual information and minimum mean-square error (MMSE) discovered recently by Guo et al. [35]. Furthermore, we use a property of the information divergence of orthogonal signaling schemes derived by Butman and Klass [36].

#### I-4 Notation

Uppercase boldface letters denote matrices and lowercase boldface letters designate vectors. The superscripts T, *, and H stand for transposition, element-wise conjugation, and Hermitian transposition, respectively. For two matrices A and B of appropriate dimensions, the Hadamard product is denoted as A ⊙ B. We designate the identity matrix of dimension N × N as I_N and the all-zero vector of appropriate dimension as 0. We let diag(x) denote a diagonal square matrix whose main diagonal contains the elements of the vector x. The determinant, trace, and rank of the matrix A are denoted as det(A), tr(A), and rank(A), respectively, and λ_i(A) is the ith eigenvalue of a square matrix A. The function δ(·) is the Dirac distribution, and the Kronecker delta δ[k] is defined as δ[0] = 1 and δ[k] = 0 for all k ≠ 0. All logarithms are to the base e. The real part of the complex number z is denoted ℜ{z}. We write A∖B for the set difference between the sets A and B. For two functions f(x) and g(x), the notation f(x) = O(g(x)) for x → ∞ means that lim sup_{x→∞} |f(x)/g(x)| < ∞. With ⌊x⌋ we denote the largest integer smaller than or equal to x. A signal is an element of the Hilbert space L²(ℝ) of square-integrable functions. The inner product between two signals x(t) and y(t) is denoted as ⟨x, y⟩. For a random variable (RV) x with distribution P, we write x ∼ P. We denote expectation by E[·], and use the notation E_x[·] to stress that the expectation is taken with respect to the RV x. We write D(P‖Q) for the Kullback-Leibler (KL) divergence between the two distributions P and Q. Finally, CN(m, R) stands for the distribution of a jointly proper Gaussian (JPG) random vector with mean m and covariance matrix R.

## II Channel and System Model

A channel model needs to strike a balance between generality, accuracy, engineering relevance, and mathematical tractability. In the following, we start from the classical WSSUS model for LTV channels [16, 12] because it is a fairly general, yet accurate and mathematically tractable model that is widely used. This model has a continuous-time input-output relation, which is difficult to use as a basis for information-theoretic studies. However, if the channel is underspread it is possible to closely approximate the original WSSUS input-output relation by a discretized input-output relation that is especially suited for the derivation of capacity bounds. In particular, the bounds we derive in this paper can be directly related to the underlying continuous-time WSSUS channel as they are explicit in its scattering function.

### II-A Time- and Frequency-Selective Underspread Fading Channels

#### II-A1 The channel operator

A wireless channel can be described as a linear operator H that maps an input signal x(t) into an output signal r(t) ∈ R(H), where R(H) denotes the range space of H [37]. The corresponding noise-free input-output relation is then r(t) = (Hx)(t).

It is sensible to model wireless channels as random, first because a deterministic description of the physical propagation environment is too complex in most cases of practical interest, and second because a stochastic description is much more robust, in the sense that systems designed on the basis of a stochastic channel model can be expected to work in a variety of different propagation environments [3]. Consequently, we assume that H is a random operator.

#### II-A2 System functions

Because communication takes place over a finite bandwidth and a finite time duration, we can assume that each realization of H is a Hilbert-Schmidt operator [38, 39]. Hence, the noise-free input-output relation of the LTV channel can be written as [38, p. 1083] (all integrals are from −∞ to ∞ unless stated otherwise)

 r(t) = (Hx)(t) = ∫ k_H(t, t′) x(t′) dt′   (1)

where the kernel k_H(t, t′) can be interpreted as the channel response at time t to a Dirac impulse at time t′. Instead of two variables that denote absolute time, it is common in the engineering literature to use absolute time t and delay τ = t − t′. This leads to the time-varying impulse response h_H(t, τ) = k_H(t, t − τ) and the corresponding noise-free input-output relation [16]

 r(t) = ∫ h_H(t, τ) x(t − τ) dτ.   (2)

Two more system functions that will be important in the following developments are the time-varying transfer function (because H is of Hilbert-Schmidt type, the time-varying impulse response h_H(t, τ) is square integrable, so that the Fourier transforms in (3) and (4) are well defined)

 L_H(t, f) = ∫ h_H(t, τ) e^{−j2πfτ} dτ   (3)

and the spreading function

 S_H(ν, τ) = ∫ h_H(t, τ) e^{−j2πνt} dt = ∬ L_H(t, f) e^{−j2π(νt − τf)} dt df.   (4)

In particular, if we rewrite the input-output relation (2) in terms of the spreading function S_H(ν, τ) as

 r(t) = ∬ S_H(ν, τ) x(t − τ) e^{j2πνt} dτ dν   (5)

we obtain an intuitive physical interpretation: the output signal r(t) is a weighted superposition of copies of the input signal x(t) that are shifted in time by the delay τ and in frequency by the Doppler shift ν.
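This superposition interpretation of (5) is easy to reproduce numerically. The sketch below is a toy model: it approximates the spreading function by a few discrete scatterers, so the output is a sum of delayed, Doppler-shifted copies of the input. The sample rate, path gains, delays, and Doppler shifts are all illustrative assumptions, not values from the paper.

```python
import numpy as np

def ltv_output(x, fs, paths):
    """Noise-free LTV channel output as a superposition of delayed and
    Doppler-shifted copies of the input, cf. the interpretation of (5).
    `paths` is a list of (gain, delay_s, doppler_hz) triples, i.e., a
    specular (discrete) approximation of the spreading function."""
    t = np.arange(len(x)) / fs
    r = np.zeros(len(x), dtype=complex)
    for gain, tau, nu in paths:
        d = int(round(tau * fs))                     # delay in samples
        x_del = np.concatenate([np.zeros(d, dtype=complex), x[:len(x) - d]])
        r += gain * x_del * np.exp(2j * np.pi * nu * t)   # Doppler shift
    return r

fs = 1000.0                                           # sample rate [Hz], assumed
x = np.exp(2j * np.pi * 50.0 * np.arange(1000) / fs)  # 50 Hz complex tone
# two paths: a direct path, and an echo with 10 ms delay and 5 Hz Doppler
r = ltv_output(x, fs, [(1.0, 0.0, 0.0), (0.5, 0.01, 5.0)])
```

A single path with zero delay and zero Doppler reproduces the input exactly, which gives a quick consistency check of the discretization.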

#### II-A3 Stochastic characterization and WSSUS assumption

For mathematical tractability, we need to make additional assumptions on the system functions. First, we assume that L_H(t, f) is a zero-mean JPG random process in t and f. Indeed, the Gaussian distribution is empirically supported for narrowband channels [2], and even ultrawideband (UWB) channels with bandwidth up to several gigahertz can be modeled as Gaussian distributed [40]. By virtue of the Gaussian assumption, L_H(t, f) is completely characterized by its correlation function. Yet, this correlation function is four-dimensional in general and thus difficult to work with. A further simplification is possible if we assume that the channel process is wide-sense stationary in time t and uncorrelated in delay τ, the so-called WSSUS assumption [16]. As a consequence, L_H(t, f) is wide-sense stationary both in time t and frequency f, or, equivalently, S_H(ν, τ) is uncorrelated in Doppler ν and delay τ [16]:

 E[L_H(t, f) L_H*(t′, f′)] = R_H(t − t′, f − f′)
 E[S_H(ν, τ) S_H*(ν′, τ′)] = C_H(ν, τ) δ(ν − ν′) δ(τ − τ′).

The function R_H(t, f) is called the channel’s (time-frequency) correlation function, and C_H(ν, τ) is called the scattering function of the channel H. The two functions are related by a two-dimensional Fourier transform,

 C_H(ν, τ) = ∬ R_H(t, f) e^{−j2π(νt − τf)} dt df.   (6)

As L_H(t, f) is stationary in t and f, the scattering function C_H(ν, τ) is nonnegative and real-valued for all ν and τ, and can be interpreted as the spectrum of the channel process. The power-delay profile of H is defined as

 p_H(τ) = ∫ C_H(ν, τ) dν

and the power-Doppler profile as

 q_H(ν) = ∫ C_H(ν, τ) dτ.
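For concreteness, the two profiles can be computed numerically for a hypothetical brick-shaped scattering function, i.e., one that is constant over its support rectangle and normalized to unit volume. The brick shape and all parameter values below are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

# Hypothetical brick-shaped scattering function: constant on the support
# rectangle [-nu0, nu0] x [-tau0, tau0] and normalized to unit volume, so
# that the channel variance equals 1.
nu0, tau0 = 10.0, 1e-3           # example maximum Doppler [Hz] and delay [s]

def C_H(nu, tau):
    return ((np.abs(nu) <= nu0) & (np.abs(tau) <= tau0)) / (4 * nu0 * tau0)

# Riemann-sum integration over a grid covering twice the support
nu = np.linspace(-2 * nu0, 2 * nu0, 2000, endpoint=False)
tau = np.linspace(-2 * tau0, 2 * tau0, 2000, endpoint=False)
dnu, dtau = nu[1] - nu[0], tau[1] - tau[0]
NU, TAU = np.meshgrid(nu, tau, indexing="ij")
C = C_H(NU, TAU)

p_H = C.sum(axis=0) * dnu        # power-delay profile  p_H(tau)
q_H = C.sum(axis=1) * dtau       # power-Doppler profile q_H(nu)
sigma2 = p_H.sum() * dtau        # total variance = volume of C_H
```

Both profiles integrate to the same total channel variance, here 1 by the unit-volume normalization.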

The WSSUS assumption is widely used in wireless channel modeling [16, 12, 2, 1, 41, 42]. It is in good agreement with measurements of tropospheric scattering channels [12], and provides a reasonable model for many types of mobile radio channels [43, 44, 45], at least over a limited time duration and bandwidth [16]. Furthermore, the scattering function can be directly estimated from measured data [46, 47], so that capacity expressions and bounds that explicitly depend on the channel’s scattering function can be evaluated for many channels of practical interest.

Formally, the WSSUS assumption is mathematically incompatible with the requirement that H is of Hilbert-Schmidt type, or, equivalently, that the system functions are square integrable, because stationarity of L_H(t, f) in time t and frequency f implies that L_H(t, f) cannot decay to zero for |t| → ∞ and |f| → ∞. Similarly to the engineering model of white noise, this incompatibility is a mathematical artifact and not a problem of real-world wireless channels: in fact, every communication system transmits over a finite time duration and over a finite bandwidth (a more detailed account of solutions to overcome the mathematical incompatibility between stationary and finite-energy models can be found in [48, Sec. 7.5]). We believe that the simplification the WSSUS assumption entails justifies this mathematical inconsistency.

### II-B The Underspread Assumption and its Consequences

Because the velocity of the transmitter, of the receiver, and of the objects in the propagation environment is limited, so is the maximum Doppler shift ν₀ experienced by the transmitted signal. We also assume that the maximum delay is strictly smaller than τ₀. For simplicity and without loss of generality, throughout this paper we consider scattering functions that are centered at ν = 0 and τ = 0, i.e., we remove any overall fixed delay and Doppler shift. The assumptions of limited Doppler shift and delay then imply that the scattering function is supported on a rectangle of spread Δ_H = 4ν₀τ₀,

 C_H(ν, τ) = 0   for (ν, τ) ∉ [−ν₀, ν₀] × [−τ₀, τ₀].   (7)

Condition (7) in turn implies that the spreading function S_H(ν, τ) is also supported on the same rectangle with probability 1 (w.p.1). If Δ_H ≪ 1, the channel is said to be underspread [16, 12, 20]. Virtually all channels in wireless communication are highly underspread, with a spread that is several orders of magnitude smaller than 1 for typical land-mobile channels, and even smaller for some indoor channels with restricted mobility of the terminals [49, 50, 51]. The underspread property of typical wireless channels is very important, first because only (deterministic) underspread channels can be completely identified from measurements [52, 53], and second because underspread channels have a well-structured set of approximate eigenfunctions that can be used to discretize the channel operator, as described next.
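A quick numerical check of the underspread condition, assuming the spread is measured by the area 4ν₀τ₀ of the support rectangle in (7); the channel parameters below are illustrative.

```python
def spread(nu0, tau0):
    """Spread of the support rectangle in (7), measured as its area 4*nu0*tau0."""
    return 4.0 * nu0 * tau0

# a land-mobile-like example: ~100 Hz maximum Doppler, ~5 us maximum delay
delta = spread(100.0, 5e-6)
underspread = delta < 1.0      # True: the channel is (highly) underspread
```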

#### II-B1 Approximate diagonalization of underspread channels

As H is a Hilbert-Schmidt operator, its kernel can be expressed in terms of its positive singular values σ_i, its left singular functions u_i(t), and its right singular functions v_i(t) [37, Th. 6.14.1], according to

 k_H(t, t′) = ∑_{i=−∞}^{∞} σ_i u_i(t) v_i*(t′).   (8)

We denote by N(H) the null space of H, i.e., the space of input signals that the channel maps onto the all-zero output signal. The set {v_i(t)} is an orthonormal basis for the orthogonal complement of N(H), and {u_i(t)} is an orthonormal basis for the range space R(H). Any input signal in N(H) is of no utility for communication purposes; the remaining input signals in the linear span of {v_i(t)}, which we denote in the remainder of the paper as the input space, can be completely characterized by their projections onto the set {v_i(t)}. Similarly, the output signal r(t) is completely described by its projections onto the set {u_i(t)}. These projections together with the kernel decomposition (8) yield a countable set of scalar input-output relations, which we refer to as the diagonalization of H.

Because the right and left singular functions depend on the realization of H, diagonalization requires perfect channel knowledge. But this knowledge is not available in the noncoherent setting. In contrast, if the singular functions of the random channel H did not depend on its particular realization, we could diagonalize H without knowledge of the channel realization. This is the case, for example, for random linear time-invariant (LTI) channels, where complex sinusoids are always eigenfunctions, independently of the realization of the channel’s impulse response. Fortunately, the singular functions of underspread random LTV channels can be well approximated by deterministic functions. More precisely, an underspread channel H has the following properties [20]:

1. All realizations of the underspread channel H are approximately normal, so that the singular value decomposition (8) can be replaced by an eigenvalue decomposition.

2. Any deterministic unit-energy signal g(t) that is well localized in time and frequency (we measure the joint time-frequency localization of a signal by the product of its effective duration and its effective bandwidth, defined in (64)) is an approximate eigenfunction of H in the mean-square sense, i.e., the mean-square approximation error is small if H is underspread. This error can be further reduced by an appropriate choice of g(t), where the choice depends on the scattering function C_H(ν, τ).

3. If g(t) is an approximate eigenfunction as defined in the previous point, then so is g(t − t₀)e^{j2πf₀t} for any time shift t₀ and any frequency shift f₀.

4. For any t₀ and f₀, the time-varying transfer function L_H(t₀, f₀) is an approximate eigenvalue of H corresponding to the approximate eigenfunction g(t − t₀)e^{j2πf₀t}, in the sense that the corresponding mean-square error is small.

We use these properties of underspread operators to construct an approximation H̃ of the random channel H that has a well-structured set of deterministic eigenfunctions. The errors incurred by this approximation are discussed in detail in the appendix on the channel approximation error. We then diagonalize this approximating operator and exclusively consider the corresponding discretized input-output relation in the remainder of the paper. Property 1, the approximate normality of H, together with Property 2 implies that the kernel of the approximating operator H̃ can be synthesized from random eigenvalues and deterministic eigenfunctions, where, differently from (8), random eigenvalues take the place of random singular values, and a set of deterministic orthonormal eigenfunctions takes the place of the random singular functions. Property 2 means that we are at liberty to choose the approximate eigenfunctions among all signals that are well localized in time and frequency. In particular, we would like the resulting approximating kernel to be convenient to work with and the approximate eigenfunctions easy to implement, as discussed in Section II-B3; therefore, we choose the set of approximate eigenfunctions to be highly structured. By Property 3, it is possible to use time- and frequency-shifted versions of a single well-localized prototype function g(t) as eigenfunctions. Furthermore, because the support of S_H(ν, τ) is strictly limited in Doppler ν and delay τ, it follows from the sampling theorem and the Fourier transform relation (4) that the samples L_H(kT, nF), taken on a rectangular grid with T ≤ 1/(2ν₀) and F ≤ 1/(2τ₀), are sufficient to characterize L_H(t, f) exactly. Hence, we take as our set of approximate eigenfunctions the so-called Weyl-Heisenberg set {g_{k,n}(t) = g(t − kT)e^{j2πnFt}}, where the g_{k,n}(t) are orthonormal signals. The requirement that the g_{k,n}(t) are orthonormal and at the same time well localized in time and frequency implies TF > 1 [54], as a consequence of the Balian-Low theorem [55, Ch. 8].
Large values of the product TF allow for better time-frequency localization of g(t), but result in a loss of dimensions in signal space compared with the critically sampled case TF = 1. The Nyquist conditions T ≤ 1/(2ν₀) and F ≤ 1/(2τ₀) can be readily satisfied for all underspread channels, because Δ_H = 4ν₀τ₀ ≪ 1 leaves room to choose TF > 1 while still obeying TF ≤ 1/(4ν₀τ₀).

The samples L_H(kT, nF) are approximate eigenvalues of H by Property 4; hence, our choice of approximate eigenfunctions results in the following approximating eigenvalue decomposition for H:

 k_H(t, t′) ≈ k_{H̃}(t, t′) = ∑_{k=−∞}^{∞} ∑_{n=−∞}^{∞} L_H(kT, nF) g_{k,n}(t) g_{k,n}*(t′)   (9)

where k_{H̃}(t, t′) denotes the kernel of the approximating operator H̃. For TF > 1, the Weyl-Heisenberg set {g_{k,n}(t)} is not complete in L²(ℝ) [54, Th. 8.3.1]. Therefore, the null space of H̃ is nonempty. As H̃ is only an approximation of H, this null space might differ from the null space of H. Similarly, the range space of H̃ might differ from the range space of H. The characterization of the difference between these spaces is an important open problem.

#### II-B2 Canonical characterization of signaling schemes

The approximating random channel operator H̃ has a highly structured set of deterministic orthonormal eigenfunctions. We can, therefore, diagonalize the input-output relation of the approximating channel without the need for channel knowledge at either the transmitter or the receiver. Any input signal x(t) that lies in the input space of the approximating operator is uniquely characterized by its projections onto the set {g_{k,n}(t)}. All physically realizable transmit signals are effectively band limited. As the prototype function g(t) is well concentrated in frequency by construction, we can model the effective band limitation of x(t) by using only a finite number N of slots in frequency. The resulting transmitted signal

 x(t) = ∑_{k=−∞}^{∞} ∑_{n=0}^{N−1} x[k, n] g_{k,n}(t),   where x[k, n] = ⟨x, g_{k,n}⟩,   (10)

then has effective bandwidth NF. We call the coefficient x[k, n] the transmit symbol in the time-frequency slot (k, n). The received signal can be expanded in the same basis. To compute the resulting projections, we substitute the approximating kernel (9) and the canonical input signal (10) into the integral input-output relation (1), add white Gaussian noise w(t), and project the resulting noisy received signal y(t) onto the functions g_{k,n}(t), i.e.,

 y[k, n] = ⟨y, g_{k,n}⟩ = ⟨H̃x, g_{k,n}⟩ + ⟨w, g_{k,n}⟩
         = ∑_{k′,n′} x[k′, n′] ⟨H̃g_{k′,n′}, g_{k,n}⟩ + w[k, n]
         = h[k, n] x[k, n] + w[k, n],   with h[k, n] = L_H(kT, nF) and w[k, n] = ⟨w, g_{k,n}⟩,   (11)

for all time-frequency slots (k, n). The last step in (11) follows from the orthonormality of the set {g_{k,n}(t)}. Orthonormality also implies that the discretized noise signal w[k, n] is JPG, independent and identically distributed (i.i.d.) over time k and frequency n; for convenience, we normalize the noise variance so that E[|w[k, n]|²] = 1 for all k and n. The diagonalized input-output relation (11) is completely generic, i.e., it is not limited to a specific signaling scheme.
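The diagonalized input-output relation can be simulated directly. The sketch below draws i.i.d. JPG channel coefficients, deliberately ignoring the correlation across k and n that a real scattering function induces, and checks the output power implied by unit channel variance and unit noise variance; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def fading_io(x, h):
    """Discretized input-output relation (11): y[k,n] = h[k,n] x[k,n] + w[k,n],
    with i.i.d. unit-variance JPG noise w[k,n].  The channel matrix h is
    supplied by the caller; its correlation structure is not modeled here."""
    K, N = x.shape
    w = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
    return h * x + w

K, N = 1000, 64                        # number of time and frequency slots
x = np.ones((K, N), dtype=complex)     # unit-power transmit symbols
h = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
y = fading_io(x, h)
# with unit channel variance and unit noise variance, E[|y[k,n]|^2] = 1 + 1 = 2
```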

#### II-B3 OFDM interpretation of the approximating channel model

The canonical signaling scheme (10) and the corresponding discretized input-output relation (11) are not just tools to analyze channel capacity, but also lead to a practical transmission system. The decomposition of the channel input signal (10) can be interpreted as pulse-shaped (PS) OFDM [56], where discrete data symbols x[k, n] are modulated onto a set of orthogonal signals, indexed by the symbol time k and the subcarrier n. In addition, this perspective leads to an operational interpretation of the error incurred when approximating H as in (9). The time- and frequency-dispersive nature of LTV channels leads to intersymbol interference (ISI) and intercarrier interference (ICI) in the received PS-OFDM signal. This is apparent if we project r(t) = (Hx)(t) onto the function g_{k,n}(t):

 ⟨r, g_{k,n}⟩ = ⟨Hx, g_{k,n}⟩ = ∑_{k′=−∞}^{∞} ∑_{n′=0}^{N−1} x[k′, n′] ⟨Hg_{k′,n′}, g_{k,n}⟩
              = ⟨Hg_{k,n}, g_{k,n}⟩ x[k, n] + ∑_{k′=−∞}^{∞} ∑_{n′=0, (k′,n′) ≠ (k,n)}^{N−1} x[k′, n′] ⟨Hg_{k′,n′}, g_{k,n}⟩.   (12)

The second term on the right-hand side (RHS) of (12) corresponds to ISI and ICI, while the first term is the desired signal; we can approximate the first term as L_H(kT, nF) x[k, n] by Property 4. Comparison of (11) and (12) then shows that the input-output relation (11), which results from the approximation (9), can be interpreted as PS-OFDM transmission over the original channel H if all ISI and ICI terms are neglected.

With proper design of the prototype signal g(t) and choice of the grid parameters T and F, both ISI and ICI can be reduced [56, 57, 58]. The larger the product TF, the more effective the reduction in ISI and ICI, as discussed in the appendix on the channel approximation error. Heuristically, a good compromise between loss of dimensions in signal space and reduction of the interference terms seems to result for values of TF slightly larger than 1 [56, 58]. The cyclic prefix (CP) in a conventional CP-OFDM system incurs a similar dimension loss.

In (72), we provide an upper bound on the mean-square energy of the interference term in (12), and show how this upper bound can be minimized by a careful choice of the prototype signal g(t) and of the grid parameters T and F [20, 17, 58]. For general scattering functions, the optimization of the triple (g(t), T, F) needs to be performed numerically; a general guideline is to choose T and F such that (see the appendix on the channel approximation error)

 T/F = τ₀/ν₀.   (13)
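The grid-matching guideline (13), combined with a chosen value of the product TF, determines the grid parameters uniquely. The sketch below solves for T and F under these two constraints; the value TF = 1.25 is an assumed design choice, and the channel parameters ν₀, τ₀ are illustrative.

```python
import math

def grid_parameters(nu0, tau0, TF):
    """Solve T/F = tau0/nu0 together with a chosen product T*F = TF > 1:
    T = sqrt(TF * tau0 / nu0),  F = sqrt(TF * nu0 / tau0)."""
    T = math.sqrt(TF * tau0 / nu0)
    F = math.sqrt(TF * nu0 / tau0)
    return T, F

nu0, tau0 = 50.0, 2e-6        # illustrative maximum Doppler [Hz] and delay [s]
T, F = grid_parameters(nu0, tau0, 1.25)
# the Nyquist conditions remain satisfiable because the channel is underspread
nyquist_ok = (T <= 1 / (2 * nu0)) and (F <= 1 / (2 * tau0))
```

Because the channel is underspread (4ν₀τ₀ ≪ 1), any TF up to 1/(4ν₀τ₀) is compatible with the Nyquist conditions, so choosing TF slightly above 1 poses no difficulty.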

To summarize, in this section we constructed an approximation H̃ of the random linear operator H on the basis of the underspread property. The kernel of the approximating operator is synthesized from the Weyl-Heisenberg set {g_{k,n}(t)} as in (9), so that {g_{k,n}(t)} is an orthonormal basis for the input space and the range space of H̃. The decomposition of the input signal (10) can be interpreted as PS-OFDM; this interpretation sheds light on one of the errors resulting from the approximation (9). Finally, an important open problem is the characterization of the difference between the input spaces of H and H̃, and between the range spaces of H and H̃.

### II-C Linear Time-Invariant and Linear Frequency-Invariant Channels

The properties of LTV underspread channels we listed in Section II-B are similar to the properties of LTI and linear frequency-invariant (LFI) channels: both LTI and LFI channel operators are normal and have a well-structured set of deterministic eigenfunctions (sinusoids parametrized by frequency for LTI channels, and Dirac functions parametrized by time for LFI channels), with corresponding eigenvalues equal to the samples of a channel system function (e.g., the transfer function in the LTI case). Intuitively, LTI and LFI channels are limiting cases within the class of LTV channels analyzed in this section; in fact, an LTV channel reduces to an LTI channel when ν₀ = 0, and to an LFI channel when τ₀ = 0. Both LTI and LFI channels are then underspread, according to our definition. Yet, since LTI and LFI channel operators are not of Hilbert-Schmidt type [59, App. A], the kernel diagonalization presented in Section II-B does not apply to these two classes of channels; consequently, the capacity bounds we derive in Sections III and IV do not reduce to capacity bounds for the LTI or the LFI case when ν₀ → 0 or τ₀ → 0, respectively (for deterministic LTI channels, a channel discretization that is useful for information-theoretic analysis is discussed in [13, Sec. 8.5]).

Quasi-LTI channels, i.e., channels that are slowly time varying ($\nu_{0}$ small but positive), and quasi-LFI channels, i.e., channels that are slowly frequency varying ($\tau_{0}$ small but positive), can instead be approximately diagonalized as described in \frefsec:underspread, as long as they are underspread.

### II-D Discrete-Time Discrete-Frequency Input-Output Relation

The discrete-time discrete-frequency channel coefficients $h[k,n]$ constitute a two-dimensional discrete-parameter stationary random process that is JPG with zero mean and correlation function

 R_{H}[k,n]=\mathbb{E}\bigl[h[k'+k,\,n'+n]\,h^{*}[k',n']\bigr]=\mathbb{E}\bigl[L_{H}\bigl((k'+k)T,\,(n'+n)F\bigr)\,L_{H}^{*}(k'T,\,n'F)\bigr]. (14)

The two-dimensional power spectral density of $\{h[k,n]\}$ is defined as

 c(\theta,\varphi)=\sum_{k=-\infty}^{\infty}\sum_{n=-\infty}^{\infty}R_{H}[k,n]\,e^{-j2\pi(k\theta-n\varphi)},\qquad |\theta|\le 1/2,\ |\varphi|\le 1/2. (15)

We shall often need the following expression for $c(\theta,\varphi)$ in terms of the scattering function $C_{H}(\nu,\tau)$:

 c(\theta,\varphi)\overset{(a)}{=}\sum_{k=-\infty}^{\infty}\sum_{n=-\infty}^{\infty}e^{-j2\pi(k\theta-n\varphi)}\iint_{\nu,\tau}C_{H}(\nu,\tau)\,e^{j2\pi(kT\nu-nF\tau)}\,d\tau\,d\nu
 =\iint_{\nu,\tau}C_{H}(\nu,\tau)\sum_{k=-\infty}^{\infty}e^{j2\pi kT(\nu-\theta/T)}\sum_{n=-\infty}^{\infty}e^{-j2\pi nF(\tau-\varphi/F)}\,d\tau\,d\nu
 \overset{(b)}{=}\frac{1}{TF}\iint_{\nu,\tau}C_{H}(\nu,\tau)\sum_{k=-\infty}^{\infty}\delta\Bigl(\nu-\frac{\theta-k}{T}\Bigr)\sum_{n=-\infty}^{\infty}\delta\Bigl(\tau-\frac{\varphi-n}{F}\Bigr)\,d\tau\,d\nu
 =\frac{1}{TF}\sum_{k=-\infty}^{\infty}\sum_{n=-\infty}^{\infty}C_{H}\Bigl(\frac{\theta-k}{T},\frac{\varphi-n}{F}\Bigr) (16)

where (a) follows from the Fourier transform relation \frefeq:scafun-chcorr, and (b) results from Poisson’s summation formula. The variance of each channel coefficient is given by

 \sigma_{H}^{2}=\int_{-1/2}^{1/2}\int_{-1/2}^{1/2}c(\theta,\varphi)\,d\theta\,d\varphi\overset{(a)}{=}\frac{1}{TF}\sum_{k=-\infty}^{\infty}\sum_{n=-\infty}^{\infty}\int_{-1/2}^{1/2}\int_{-1/2}^{1/2}C_{H}\Bigl(\frac{\theta-k}{T},\frac{\varphi-n}{F}\Bigr)d\theta\,d\varphi\overset{(b)}{=}\frac{1}{TF}\int_{-1/2}^{1/2}\int_{-1/2}^{1/2}C_{H}\Bigl(\frac{\theta}{T},\frac{\varphi}{F}\Bigr)d\theta\,d\varphi\overset{(c)}{=}\iint_{\nu,\tau}C_{H}(\nu,\tau)\,d\tau\,d\nu (17)

where (a) follows from \frefeq:specfun-scafun, and (b) results because we chose the grid parameters to satisfy the Nyquist conditions $T\le 1/(2\nu_{0})$ and $F\le 1/(2\tau_{0})$, so that the periodic repetitions of the compactly supported scattering function lie outside the integration region. Finally, (c) follows from the change of variables $\nu=\theta/T$ and $\tau=\varphi/F$. For ease of notation, we normalize $\sigma_{H}^{2}=1$ throughout the paper.
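To make step (b) concrete, the following sketch integrates the periodized spectrum over the fundamental domain for a hypothetical brick-shaped scattering function (an assumed example, not from the paper) on a Nyquist grid, and recovers $\sigma_{H}^{2}=\iint C_{H}\,d\tau\,d\nu=1$:

```python
import numpy as np

# Hypothetical brick-shaped scattering function with unit variance:
# C_H(nu, tau) = 1/(4*nu0*tau0) on |nu| <= nu0, |tau| <= tau0, else 0.
nu0, tau0 = 50.0, 5e-6

def C_H(nu, tau):
    inside = (np.abs(nu) <= nu0) & (np.abs(tau) <= tau0)
    return inside / (4.0 * nu0 * tau0)

# Nyquist grid parameters T <= 1/(2*nu0) and F <= 1/(2*tau0)
T, F = 1.0 / (2.0 * nu0), 1.0 / (2.0 * tau0)

# Step (b) of (17): only the k = n = 0 copy of C_H intersects the fundamental
# domain, so sigma_H^2 = (1/(TF)) * integral over [-1/2,1/2]^2 of C_H(theta/T, phi/F).
M = 400
theta = -0.5 + (np.arange(M) + 0.5) / M   # midpoint rule on [-1/2, 1/2]
phi = -0.5 + (np.arange(M) + 0.5) / M
TH, PH = np.meshgrid(theta, phi)
sigma2 = C_H(TH / T, PH / F).sum() / (M * M) / (T * F)
```

With the Nyquist grid, the brick exactly fills the fundamental domain, so the midpoint rule returns the variance essentially without discretization error.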

For each time slot $k$, we arrange the discretized input signal $x[k,n]$, the discretized output signal $y[k,n]$, the channel coefficients $h[k,n]$, and the noise samples $w[k,n]$ in corresponding vectors. For example, the $N$-dimensional vector that contains the input symbols in the $k$th time slot is defined as $\mathbf{x}[k]=\bigl[x[k,0]\;x[k,1]\;\cdots\;x[k,N-1]\bigr]^{T}$.

The output vector $\mathbf{y}[k]$, the channel vector $\mathbf{h}[k]$, and the noise vector $\mathbf{w}[k]$ are defined analogously. This notation allows us to rewrite the input-output relation \frefeq:scalar-io as

 \mathbf{y}[k]=\mathbf{h}[k]\odot\mathbf{x}[k]+\mathbf{w}[k] (18)

for all $k$. In this formulation, the channel is a multivariate stationary process $\{\mathbf{h}[k]\}$ with matrix-valued correlation function

 \mathbf{R}_{h}[k]=\mathbb{E}\bigl[\mathbf{h}[k'+k]\,\mathbf{h}^{H}[k']\bigr]=\begin{bmatrix}R_{H}[k,0]&R_{H}^{*}[k,1]&\dots&R_{H}^{*}[k,N-1]\\ R_{H}[k,1]&R_{H}[k,0]&\dots&R_{H}^{*}[k,N-2]\\ \vdots&\vdots&\ddots&\vdots\\ R_{H}[k,N-1]&R_{H}[k,N-2]&\dots&R_{H}[k,0]\end{bmatrix}. (19)

In most of the following analyses, we initially consider a finite number $K$ of time slots and then take the limit $K\to\infty$. To obtain a compact notation, we stack $K$ contiguous elements of the multivariate input, channel, and output processes just defined. For the channel input, this results in the $KN$-dimensional vector

 \mathbf{x}=\bigl[\mathbf{x}^{T}[0]\;\mathbf{x}^{T}[1]\;\cdots\;\mathbf{x}^{T}[K-1]\bigr]^{T}. (20)

Again, the stacked vectors $\mathbf{y}$, $\mathbf{h}$, and $\mathbf{w}$ are defined analogously. With these definitions, we can now compactly express the input-output relation \frefeq:scalar-io as

 \mathbf{y}=\mathbf{x}\odot\mathbf{h}+\mathbf{w}. (21)

We denote the correlation matrix of the stacked channel vector $\mathbf{h}$ by $\mathbf{R}_{h}$. Because the channel process $\{h[k,n]\}$ is stationary in time and in frequency, $\mathbf{R}_{h}$ is a two-level Hermitian Toeplitz matrix, given by

 \mathbf{R}_{h}=\begin{bmatrix}\mathbf{R}_{h}[0]&\mathbf{R}_{h}^{H}[1]&\dots&\mathbf{R}_{h}^{H}[K-1]\\ \mathbf{R}_{h}[1]&\mathbf{R}_{h}[0]&\dots&\mathbf{R}_{h}^{H}[K-2]\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf{R}_{h}[K-1]&\mathbf{R}_{h}[K-2]&\dots&\mathbf{R}_{h}[0]\end{bmatrix}. (22)
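The two-level structure in (19) and (22) can be assembled directly from $R_{H}[k,n]$. Below is a minimal sketch, assuming a separable correlation $R_{H}[k,n]=\operatorname{sinc}(2kT\nu_{0})\operatorname{sinc}(2nF\tau_{0})$ — the transform of a hypothetical brick scattering function with unit variance; the helper names and parameter values are illustrative only:

```python
import numpy as np

# Assumed separable correlation from a brick-shaped scattering function
# with unit variance: R_H[k, n] = sinc(2*k*T*nu0) * sinc(2*n*F*tau0).
nu0, tau0 = 50.0, 5e-6
T, F = 1.0 / (2.5 * nu0), 1.0 / (2.5 * tau0)   # slightly tighter than Nyquist

def RH(k, n):
    return np.sinc(2.0 * k * T * nu0) * np.sinc(2.0 * n * F * tau0)

def Rh_block(k, N):
    """N x N block R_h[k] of (19); entry (i, j) equals R_H[k, i - j]."""
    return np.array([[RH(k, i - j) for j in range(N)] for i in range(N)])

def Rh_stacked(K, N):
    """KN x KN two-level Toeplitz matrix R_h of (22); block (a, b) is R_h[a - b]."""
    R = np.zeros((K * N, K * N))
    for a in range(K):
        for b in range(K):
            R[a * N:(a + 1) * N, b * N:(b + 1) * N] = Rh_block(a - b, N)
    return R

R = Rh_stacked(3, 4)   # example: K = 3 slots, N = 4 tones
```

Because the assumed correlation is real and even, the resulting matrix is symmetric with unit diagonal ($\sigma_{H}^{2}=1$) and positive semidefinite, as a correlation matrix must be.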

### II-E Power Constraints

Throughout the paper, we assume that the average power of the transmitted signal is constrained as $(1/(KT))\,\mathbb{E}\bigl[\|\mathbf{x}\|^{2}\bigr]\le P$. In addition, we limit the peak power to be no larger than $\beta$ times the average power, where $\beta$ is the nominal peak-to-average-power ratio (PAPR).

The multivariate input-output relation \frefeq:vec-io allows us to constrain the peak power in several different ways. We analyze the following two cases:

1. Peak constraint in time: The power of the transmitted signal in each time slot $k$ is limited as

 \frac{1}{T}\sum_{n=0}^{N-1}|x[k,n]|^{2}\le\beta P\qquad\text{w.p.1.} (23)

This constraint models the fact that physically realizable power amplifiers can only provide limited output power [4].

2. Peak constraint in time and frequency: Regulatory bodies sometimes limit the peak power in certain frequency bands, e.g., for UWB systems. We model this type of constraint by imposing a limit on the squared amplitude of the transmitted symbols $x[k,n]$ in each time-frequency slot $(k,n)$ according to

 \frac{1}{T}|x[k,n]|^{2}\le\frac{\beta P}{N}\qquad\text{w.p.1.} (24)

This type of constraint is more stringent than the peak constraint in time given in \frefeq:peak-per-tslot.

Both peak constraints above are imposed on the input symbols $x[k,n]$, i.e., in the eigenspace of the approximating channel operator. This limitation is mathematically convenient; however, the peak value of the corresponding transmitted continuous-time signal $x(t)$ in \frefeq:canonical-input also depends on the prototype signal $g(t)$, so that a limit on $|x[k,n]|$ does not generally imply that $x(t)$ is peak limited.
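To make the two constraints concrete, here is a small sketch (hypothetical helper names and parameter values) that checks (23) and (24) for a $K\times N$ block of symbols; summing (24) over the $N$ tones of a slot yields (23), confirming that the per-slot constraint is implied:

```python
import numpy as np

def satisfies_peak_in_time(x, T, beta, P):
    """Check the per-time-slot peak constraint (23) for a K x N symbol array x."""
    return bool(np.all(np.sum(np.abs(x) ** 2, axis=1) / T <= beta * P + 1e-12))

def satisfies_peak_in_time_freq(x, T, beta, P):
    """Check the per-time-frequency-slot peak constraint (24)."""
    N = x.shape[1]
    return bool(np.all(np.abs(x) ** 2 / T <= beta * P / N + 1e-12))

# Hypothetical parameters; constant-modulus symbols transmitted at the peak level
rng = np.random.default_rng(0)
T, beta, P, K, N = 1e-3, 2.0, 1.0, 8, 16
x = np.sqrt(beta * P * T / N) * np.exp(2j * np.pi * rng.random((K, N)))
```

Constant-modulus (PSK-like) symbols at amplitude $\sqrt{\beta PT/N}$ meet (24) with equality and therefore also satisfy the weaker constraint (23).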

## III Capacity Bounds under a Peak Constraint in Time and Frequency

In the present section, we analyze the capacity of the discretized channel in \frefeq:scalar-io subject to the peak constraint in time and frequency specified by \frefeq:peak-per-tfslot. The link between the discretized channel \frefeq:scalar-io and the continuous-time channel model established in \frefsec:model then allows us to express the resulting bounds in terms of the scattering function $C_{H}(\nu,\tau)$ of the underspread WSSUS channel $\mathbb{H}$.

As we assumed that the channel process $\{h[k,n]\}$ has a spectral density [given in (16)], the vector process $\{\mathbf{h}[k]\}$ is ergodic [60], and the capacity of the discretized underspread channel (21) is given by [61, Ch. 12]

 C(W)=\lim_{K\to\infty}\frac{1}{KT}\sup_{\mathcal{Q}}I(\mathbf{y};\mathbf{x})\qquad[\text{nat/s}] (25)

for a given bandwidth $W$. Here, the supremum is taken over the set $\mathcal{Q}$ of all input distributions that satisfy the peak constraint \frefeq:peak-per-tfslot and the average-power constraint $(1/(KT))\,\mathbb{E}\bigl[\|\mathbf{x}\|^{2}\bigr]\le P$.

The capacity of fading channels with finite bandwidth has so far resisted all attempts at closed-form solutions [62, 22, 63], even for the memoryless case; thus, we resort to bounds to characterize the capacity \frefeq:capacityPeakTF. In particular, we present the following bounds:

• An upper bound $U_{\mathrm{coh}}(W)$, which we refer to as the coherent upper bound, that is based on the assumption that the receiver has perfect knowledge of the channel realizations. This bound is standard; it turns out to be useful for small bandwidth.

• An upper bound $U_{1}(W)$ that is useful for medium to large bandwidth. This bound is explicit in the channel's scattering function and extends the upper bound [28, Prop. 2.2] on the capacity of frequency-flat time-selective channels to general underspread channels that are selective in time and frequency.

• A lower bound $L_{1}(W)$ that extends the lower bound [27, Prop. 2.2] to general underspread channels that are selective in time and frequency. This bound is explicit in the channel's scattering function only for large bandwidth.

### III-A Coherent Upper Bound

The assumption that the receiver perfectly knows the instantaneous channel realizations furnishes the following capacity upper bound:

 \frac{1}{KT}\sup_{\mathcal{Q}}I(\mathbf{y};\mathbf{x})\overset{(a)}{\le}\frac{1}{KT}\sup_{\mathcal{Q}}I(\mathbf{y};\mathbf{x}\,|\,\mathbf{h})\overset{(b)}{\le}\frac{1}{KT}\sup_{\mathbb{E}[\|\mathbf{x}\|^{2}]\le KPT}I(\mathbf{y};\mathbf{x}\,|\,\mathbf{h})\overset{(c)}{=}\frac{1}{KT}\sup_{\mathbf{R}_{x}}\mathbb{E}_{\mathbf{h}}\bigl[\log\det\bigl(\mathbf{I}_{KN}+(\mathbf{h}\mathbf{h}^{H})\odot\mathbf{R}_{x}\bigr)\bigr]\overset{(d)}{\le}\frac{N}{T}\,\mathbb{E}_{h}\Bigl[\log\Bigl(1+\frac{PT}{N}|h|^{2}\Bigr)\Bigr]. (26)

Here, (a) holds because the coherent mutual information $I(\mathbf{y};\mathbf{x}\,|\,\mathbf{h})$ is an upper bound on the corresponding mutual information in the noncoherent setting. Inequality (b) follows as we drop the peak constraint and thus enlarge the set of admissible input distributions. The supremum of $I(\mathbf{y};\mathbf{x}\,|\,\mathbf{h})$ over the resulting relaxed input constraint is achieved by a zero-mean JPG input vector $\mathbf{x}$ with covariance matrix $\mathbf{R}_{x}=\mathbb{E}[\mathbf{x}\mathbf{x}^{H}]$ that satisfies $\operatorname{tr}(\mathbf{R}_{x})\le KPT$ [3]. To obtain (c), we use that, conditioned on $\mathbf{h}$, the output vector $\mathbf{y}$ is JPG and its covariance matrix can be expressed as

 \mathbb{E}\bigl[\mathbf{y}\mathbf{y}^{H}\,|\,\mathbf{h}\bigr]=\mathbf{I}_{KN}+\mathbb{E}_{\mathbf{x}}\bigl[(\mathbf{x}\odot\mathbf{h})(\mathbf{x}\odot\mathbf{h})^{H}\bigr]=\mathbf{I}_{KN}+(\mathbf{h}\mathbf{h}^{H})\odot\mathbf{R}_{x}

where the last equality results from the following elementary relation between Hadamard products and outer products:

 (\mathbf{x}\odot\mathbf{h})(\mathbf{x}\odot\mathbf{h})^{H}=\mathbf{x}\mathbf{x}^{H}\odot\mathbf{h}\mathbf{h}^{H}. (27)
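As a quick numerical check of the identity (27), the outer product of the elementwise product equals the Hadamard product of the two outer products:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
h = rng.standard_normal(n) + 1j * rng.standard_normal(n)

xh = x * h                                            # Hadamard product of the vectors
lhs = np.outer(xh, xh.conj())                         # (x . h)(x . h)^H
rhs = np.outer(x, x.conj()) * np.outer(h, h.conj())   # xx^H (Hadamard) hh^H
```

Entrywise, both sides equal $x_i h_i \overline{x_j h_j}$, which is why the identity holds for any pair of vectors.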

Finally, (d) follows from Hadamard's inequality, from the fact that by Jensen's inequality the supremum is achieved by $\mathbf{R}_{x}=(PT/N)\,\mathbf{I}_{KN}$, and because the channel coefficients all have the same distribution $\mathcal{CN}(0,1)$. As the upper bound \frefeq:coh-ub-deriv does not depend on $K$, we obtain an upper bound $U_{\mathrm{coh}}(W)$ on capacity \frefeq:capacityPeakTF as a function of bandwidth $W$ if we set $N=W/F$:

 U_{\mathrm{coh}}(W)=\frac{W}{TF}\,\mathbb{E}_{h}\Bigl[\log\Bigl(1+\frac{PTF}{W}|h|^{2}\Bigr)\Bigr]. (28)

For a discretization of the WSSUS channel $\mathbb{H}$ different from the one in \frefsec:underspread, Médard and Gallager [8] showed that the corresponding capacity vanishes with increasing bandwidth if the peakiness of the input signal is constrained in a way that includes our peak constraint \frefeq:peak-per-tfslot. As the upper bound $U_{\mathrm{coh}}(W)$ increases monotonically in $W$, it is sensible to conclude that $U_{\mathrm{coh}}(W)$ does not accurately reflect the capacity behavior for large bandwidth. However, we demonstrate in \frefsec:num-eval by means of a numerical example that $U_{\mathrm{coh}}(W)$ can be quite useful for small and medium bandwidth.
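The monotonic growth of the coherent bound is easy to see numerically. The sketch below is a Monte Carlo estimate of our reading of (28), namely $(W/(TF))\,\mathbb{E}[\log(1+(PTF/W)|h|^{2})]$ with $h\sim\mathcal{CN}(0,1)$; the parameter values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
TF = 1.25   # assumed grid-parameter product
P = 1e4     # assumed average-power constraint (normalized units)
h = (rng.standard_normal(200_000) + 1j * rng.standard_normal(200_000)) / np.sqrt(2)

def U_coh(W):
    """Monte Carlo estimate of (W/(TF)) E[log(1 + (P*TF/W)|h|^2)], h ~ CN(0,1)."""
    return (W / TF) * np.mean(np.log1p((P * TF / W) * np.abs(h) ** 2))

bounds = [U_coh(W) for W in (1e4, 1e5, 1e6)]
```

Since $W\log(1+a/W)$ is increasing in $W$ for every realization, the estimate grows with bandwidth and saturates at $P$ nat/s, never decaying — consistent with the text's observation that this bound misses the large-bandwidth behavior.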

### III-B An Upper Bound for Large but Finite Bandwidth

To better understand the capacity behavior at large bandwidth, we derive an upper bound $U_{1}(W)$ that captures the effect of diminishing capacity in the large-bandwidth regime. The upper bound is explicit in the channel's scattering function $C_{H}(\nu,\tau)$.

#### III-B1 The upper bound

###### Theorem 1

Consider an underspread Rayleigh-fading channel with scattering function $C_{H}(\nu,\tau)$; assume that the channel input $x[k,n]$ satisfies the average-power constraint $(1/(KT))\,\mathbb{E}[\|\mathbf{x}\|^{2}]\le P$ and the peak constraint $(1/T)|x[k,n]|^{2}\le\beta P/N$ w.p.1. The capacity of this channel is upper-bounded as $C(W)\le U_{1}(W)$, where

 U_{1}(W)=\frac{W}{TF}\log\Bigl(1+\alpha(W)\frac{PTF}{W}\Bigr)-\alpha(W)A(W) (29a)
with
 \alpha(W)=\min\Bigl\{1,\ \frac{W}{TF}\Bigl(\frac{1}{A(W)}-\frac{1}{P}\Bigr)\Bigr\} (29b)
and
 A(W)=\frac{W}{\beta}\iint_{\nu,\tau}\log\Bigl(1+\frac{\beta P}{W}C_{H}(\nu,\tau)\Bigr)d\tau\,d\nu. (29c)
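For a brick-shaped scattering function, constant at $1/(4\nu_{0}\tau_{0})$ on its support (a hypothetical unit-variance example), the integral in (29c) has a closed form, and the bound can be evaluated directly. The sketch below implements our reading of (29) with assumed parameter values; it exhibits the expected decay of $U_{1}(W)$ at large bandwidth:

```python
import numpy as np

def U1(W, P, beta, TF, spread):
    """Evaluate (29a)-(29c) for a brick scattering function of unit variance.

    `spread` is the channel spread 4*nu0*tau0; for constant C_H = 1/spread
    on its support, (29c) reduces to
    A(W) = (W*spread/beta) * log(1 + beta*P/(W*spread)).
    """
    A = (W * spread / beta) * np.log1p(beta * P / (W * spread))
    alpha = min(1.0, (W / TF) * (1.0 / A - 1.0 / P))  # (29b); A <= P ensures alpha >= 0
    return (W / TF) * np.log1p(alpha * P * TF / W) - alpha * A  # (29a)

# Assumed values: P = 1e4, beta = 1, TF = 1.25, spread = 1e-3
bounds = [U1(W, 1e4, 1.0, 1.25, 1e-3) for W in (1e5, 1e6, 1e7)]
```

Over this range of bandwidths the bound decreases toward zero, in contrast to the coherent bound (28): the penalty term $\alpha(W)A(W)$, which accounts for the cost of implicit channel estimation, eventually cancels the log-det gain.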
###### Proof:

To bound $C(W)$, we first use the chain rule for mutual information, $I(\mathbf{y};\mathbf{x})=I(\mathbf{y};\mathbf{x},\mathbf{h})-I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x})$. Next, we split the supremum over $\mathcal{Q}$ into two parts, similarly as in the proof of [28, Prop. 2.2]: one supremum over a restricted set of input distributions $\mathcal{Q}_{|\alpha}$ that satisfy the peak constraint (24) and have a prescribed average power, i.e., $(1/(KT))\,\mathbb{E}[\|\mathbf{x}\|^{2}]=\alpha P$ for some fixed parameter $0\le\alpha\le 1$, and another supremum over the parameter $\alpha$. Both steps together yield the upper bound

 \sup_{\mathcal{Q}}I(\mathbf{y};\mathbf{x})=\sup_{\mathcal{Q}}\bigl\{I(\mathbf{y};\mathbf{x},\mathbf{h})-I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x})\bigr\}=\sup_{0\le\alpha\le 1}\sup_{\mathcal{Q}_{|\alpha}}\bigl\{I(\mathbf{y};\mathbf{x},\mathbf{h})-I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x})\bigr\}\le\sup_{0\le\alpha\le 1}\Bigl\{\sup_{\mathcal{Q}_{|\alpha}}I(\mathbf{y};\mathbf{x},\mathbf{h})-\inf_{\mathcal{Q}_{|\alpha}}I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x})\Bigr\}. (30)

Next, we bound the two terms inside the braces individually. While standard steps suffice for the bound on the first term, the second term requires some more effort; we relegate some of the more technical steps to \frefapp:mmse-mi.

##### Upper bound on the first term

The output vector $\mathbf{y}$ depends on the input vector $\mathbf{x}$ and on the channel vector $\mathbf{h}$ only through the product $\mathbf{x}\odot\mathbf{h}$, so that $I(\mathbf{y};\mathbf{x},\mathbf{h})=I(\mathbf{y};\mathbf{x}\odot\mathbf{h})$. To upper-bound the mutual information $I(\mathbf{y};\mathbf{x}\odot\mathbf{h})$, we take $\mathbf{x}\odot\mathbf{h}$ as JPG with zero mean and covariance matrix $\mathbb{E}[\mathbf{x}\mathbf{x}^{H}]\odot\mathbf{R}_{h}$. An upper bound on the first term inside the braces in \frefeq:ubTFpeak-step1 now results if we drop the peak constraint on $\mathbf{x}$. Then,

 \sup_{\mathcal{Q}_{|\alpha}}I(\mathbf{y};\mathbf{x},\mathbf{h})\le\sup_{\mathbb{E}[\|\mathbf{x}\|^{2}]=\alpha KPT}\log\det\bigl(\mathbf{I}_{KN}+\mathbb{E}[\mathbf{x}\mathbf{x}^{H}]\odot\mathbf{R}_{h}\bigr)\overset{(a)}{\le}\sup_{\mathbb{E}[\|\mathbf{x}\|^{2}]=\alpha KPT}\sum_{k=0}^{K-1}\sum_{n=0}^{N-1}\log\bigl(1+\mathbb{E}\bigl[|x[k,n]|^{2}\bigr]\bigr)\overset{(b)}{\le}KN\log\Bigl(1+\frac{\alpha PT}{N}\Bigr) (31)

where (a) follows from Hadamard’s inequality and (b) from Jensen’s inequality.
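Step (a) can be checked numerically: Hadamard's inequality bounds the log-determinant by the sum of the logs of the diagonal entries, which here are $1+\mathbb{E}[|x[k,n]|^{2}]$ because the channel coefficients have unit variance. A small sketch with random matrices (hypothetical sizes and helper names):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8

def random_correlation(rng, n):
    """Random Hermitian PSD matrix with unit diagonal (a stand-in for R_h)."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    R = A @ A.conj().T
    d = np.sqrt(np.real(np.diag(R)))
    return R / np.outer(d, d)

Rh = random_correlation(rng, n)
p = rng.random(n) * 4.0                                      # per-symbol powers E[|x[k,n]|^2]
Rx = np.sqrt(np.outer(p, p)) * random_correlation(rng, n)    # PSD with diag(Rx) = p
M = np.eye(n) + Rx * Rh                                      # I + E[xx^H] (Hadamard) R_h
logdet = np.linalg.slogdet(M)[1]
hadamard_bound = np.sum(np.log1p(p))
```

By the Schur product theorem the Hadamard product of two PSD matrices is PSD, so $\mathbf{M}$ is Hermitian positive definite and Hadamard's inequality applies; step (b) then follows by concavity of the logarithm.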

##### Lower bound on the second term

We use the fact that the channel vector $\mathbf{h}$ is JPG, so that $I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x})=\mathbb{E}_{\mathbf{x}}\bigl[\log\det\bigl(\mathbf{I}_{KN}+(\mathbf{x}\mathbf{x}^{H})\odot\mathbf{R}_{h}\bigr)\bigr]$. Next, we expand the expectation operator as follows:

 \inf_{\mathcal{Q}_{|\alpha}}I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x})=\inf_{\mathcal{Q}_{|\alpha}}\mathbb{E}_{\mathbf{x}}\bigl[\log\det\bigl(\mathbf{I}_{KN}+(\mathbf{x}\mathbf{x}^{H})\odot\mathbf{R}_{h}\bigr)\bigr]=\inf_{Q\in\mathcal{Q}_{|\alpha}}\int_{\mathbf{x}\in\mathcal{X}}\Bigl(\frac{\log\det\bigl(\mathbf{I}_{KN}+(\mathbf{x}\mathbf{x}^{H})\odot\mathbf{R}_{h}\bigr)}{\|\mathbf{x}\|^{2}}\Bigr)\|\mathbf{x}\|^{2}\,dQ (32)

where $\mathcal{X}$ is the integration domain that results because the input distribution $Q$ satisfies the peak constraint (24). Both factors under the integral are nonnegative; hence, we obtain a lower bound on the expectation if we replace the first factor by its infimum over $\mathcal{X}$:

 \inf_{\mathcal{Q}_{|\alpha}}I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x})\ge\inf_{Q\in\mathcal{Q}_{|\alpha}}\int_{\tilde{\mathbf{x}}\in\mathcal{X}}\Bigl(\inf_{\mathbf{x}\in\mathcal{X}}\frac{\log\det\bigl(\mathbf{I}_{KN}+(\mathbf{x}\mathbf{x}^{H})\odot\mathbf{R}_{h}\bigr)}{\|\mathbf{x}\|^{2}}\Bigr)\|\tilde{\mathbf{x}}\|^{2}\,dQ=\Bigl(\inf_{\mathbf{x}\in\mathcal{X}}\frac{\log\det\bigl(\mathbf{I}_{KN}+(\mathbf{x}\mathbf{x}^{H})\odot\mathbf{R}_{h}\bigr)}{\|\mathbf{x}\|^{2}}\Bigr)\underbrace{\Bigl(\inf_{Q\in\mathcal{Q}_{|\alpha}}\int\|\mathbf{x}\|^{2}\,dQ\Bigr)}_{\inf_{\mathcal{Q}_{|\alpha}}\mathbb{E}[\|\mathbf{x}\|^{2}]=\alpha KPT}=\alpha KPT\inf_{\mathbf{x}\in\mathcal{X}}\frac{\log\det\bigl(\mathbf{I}_{KN}+(\mathbf{x}\mathbf{x}^{H})\odot\mathbf{R}_{h}\bigr)}{\|\mathbf{x}\|^{2}}. (33)

As the matrix $\mathbf{R}_{h}$ is positive semidefinite, the above infimum is achieved on the boundary of the admissible set [26, Sec. VI.A], i.e., by a vector $\mathbf{x}$ whose entries satisfy $|x[k,n]|^{2}=\beta PT/N$ for all $k$ and $n$. We use this fact and the relation between mutual information and MMSE, recently discovered by Guo et al. [35], to further lower-bound the infimum on the RHS in \frefeq:ubTFpeak-term2-inf. The corresponding derivation is detailed in \frefapp:mmse-mi; it results in

 \inf_{\mathbf{x}\in\mathcal{X}}\frac{\log\det\bigl(\mathbf{I}_{KN}+(\mathbf{x}\mathbf{x}^{H})\odot\mathbf{R}_{h}\bigr)}{\|\mathbf{x}\|^{2}}\ge\frac{N}{\beta PT}\int_{-1/2}^{1/2}\int_{-1/2}^{1/2}\log\Bigl(1+\frac{\beta PT}{N}c(\theta,\varphi)\Bigr)d\theta\,d\varphi (34)

where $c(\theta,\varphi)$, defined in \frefeq:chspecfun, is the two-dimensional power spectral density of the channel process $\{h[k,n]\}$. Finally, we use the bound \frefeq:ubTFpeak-term2-immse-lb in \frefeq:ubTFpeak-term2-inf, relate $c(\theta,\varphi)$ to the scattering function $C_{H}(\nu,\tau)$ by means of \frefeq:specfun-scafun, and get

 \inf_{\mathcal{Q}_{|\alpha}}I(\mathbf{y};\mathbf{h}\,|\,\mathbf{x})\ge\frac{\alpha KN}{\beta}\int_{-1/2}^{1/2}\int_{-1/2}^{1/2}\log\Bigl(1+\frac{\beta PT}{N}c(\theta,\varphi)\Bigr)d\theta\,d\varphi=\alpha KT\,A(W)