Spatial Wireless Channel Prediction
under Location Uncertainty
Abstract
Spatial wireless channel prediction is important for future wireless networks, and in particular for proactive resource allocation at different layers of the protocol stack. Various sources of uncertainty must be accounted for during modeling in order to provide robust predictions. We investigate two channel prediction frameworks, classical Gaussian processes (cGP) and uncertain Gaussian processes (uGP), and analyze the impact of location uncertainty during learning/training and prediction/testing, for scenarios where measurements are dominated by large-scale fading. We observe that cGP generally fails both in terms of learning the channel parameters and in predicting the channel in the presence of location uncertainties. In contrast, uGP explicitly considers the location uncertainty. Using simulated data, we show that uGP is able to learn and predict the wireless channel.
I Introduction
Location-based resource allocation schemes are expected to become an essential element of emerging 5G networks, as 5G devices will have the capability to accurately self-localize and predict relevant channel quality metrics (CQM) [1, 2, 3] based on crowdsourced databases. The geotagged CQM (including, e.g., received signal strength, delay spread, and interference levels) from users enables the construction of a dynamic database, which in turn allows the prediction of CQM at arbitrary locations and future times. Current standards are already moving in this direction through the so-called minimization of drive tests (MDT) feature in 3GPP Release 10 [4]. In MDT, users collect radio measurements and associated location information in order to assess network performance. In terms of applications, prediction of spatial wireless channels (e.g., through radio environment maps) and its utilization in resource allocation can reduce overheads and delays due to the ability to predict channel quality beyond traditional time scales [2]. Exploitation of location-aware CQM is relevant for interference management in two-tier cellular networks [5], coverage hole detection and prediction [6], cooperative spectrum sensing in cognitive radios [7], anticipatory networks for predictive resource allocation [3], and proactive caching [8].
In order to predict location-dependent radio propagation channels, we rely on mathematical models, in which the physical environment, including the locations of transmitter and receiver, plays an important role. The received signal power in a wireless channel is mainly affected by three major dynamics, which occur at different length scales: pathloss, shadowing, and small-scale fading [9]. Small-scale fading decorrelates within tens of centimeters (depending on the carrier frequency), making it infeasible to predict based on location information. On the other hand, shadowing is correlated up to tens of meters, depending on the propagation environment (e.g., 50–100 m for outdoor [9] and 1–2 m for indoor environments [10]). Finally, pathloss captures the decay of power with distance and is a deterministic function of the distance to the transmitter. In rich scattering environments, the measurements average small-scale fading in either frequency or space, provided that the bandwidth or the number of antennas is sufficient [10]. Thus, provided that measurements are dominated by large-scale fading, location-dependent models for pathloss and shadowing can be developed based on the physical properties of the wireless channel. With the help of spatial regression tools, these large-scale channel components can be predicted at other locations and used for resource allocation [1]. However, since localization is subject to various error sources (e.g., the global positioning system (GPS) gives an accuracy of around 10 m [11] in outdoor scenarios, while ultra-wideband (UWB) systems can give sub-meter accuracy), there is a fundamental need to account for location uncertainties when developing spatial regression tools.
Spatial regression tools generally comprise a training/learning phase, in which the underlying channel parameters are estimated based on the available training database, and a testing/prediction phase, in which predictions are made at test locations, given learned parameters and the training database. Among such tools, Gaussian processes (GP) is a powerful and commonly used regression framework, since it is generally considered to be the most flexible and provides prediction uncertainty information [12]. Two important limitations of GP are its computational complexity [13, 14, 15, 16] and its sensitivity to uncertain inputs [17, 18, 19, 14, 20, 21]. To alleviate the computational complexity, various sparse GP techniques have been proposed in [14, 13, 15], while online and distributed GP were treated in [22, 23, 16] and [24, 25, 26], respectively. The impact of input uncertainty was studied in [17, 18], which showed that GP was adversely affected, both in training and testing, by input uncertainties. The input uncertainty in our case corresponds to location uncertainty.
No framework has yet been developed to mathematically characterize and understand the spatial predictability of wireless channels with location uncertainty. In this paper, we build on and adapt the framework from [17, 18] to CQM prediction in wireless networks. Our main contributions are as follows:

We show that not considering location uncertainty leads to poor learning of the channel parameters and poor prediction of CQM values at other locations, especially when location uncertainties are heterogeneous;

We relate and unify existing GP methods that account for uncertainty during both learning and prediction, by operating directly on an input set of distributions, rather than an input set of locations;

We describe and delimit proper choices for mean functions and covariance functions in this unified framework, so as to incorporate location uncertainty in both learning and prediction; and

We demonstrate the use of the proposed framework for simulated data and apply it to a spatial resource allocation application.
The remainder of the paper is structured as follows. Section III presents the channel model and details the problem description for location-dependent channel prediction with location uncertainty. In Section IV, we review channel learning and prediction in the classical GP (cGP) setup with no localization errors. Section V details learning and prediction procedures using the proposed GP framework that accounts for uncertainty on training and test locations, termed uncertain GP (uGP). Finally, numerical results are given in Section VI in addition to a resource allocation example, followed by our conclusions in Section VII.
Notation
Vectors and matrices are written in bold (e.g., a vector and a matrix ); denotes transpose of ; denotes determinant of ; denotes entry of ; denotes identity matrix of appropriate size; and are vectors of ones and zeros, respectively, of appropriate size; denotes norm unless otherwise stated; denotes the expectation operator; denotes covariance operator (i.e., ); denotes a Gaussian distribution evaluated in with mean vector and covariance matrix and denotes that is drawn from a Gaussian distribution with mean vector and covariance matrix . Important symbols used in the paper are: is an exact, true location; , is a vector that describes (e.g., in the form of moments) the location distribution . For example in the case of Gaussian distributed localization error, , then a possible choice is , where stacks all the elements of in a vector. Finally, is a location estimate extracted from through a function (e.g., the mean or mode).
II Related Work
First, we give an overview of the literature on GP with uncertain inputs. One way to deal with the input noise is through linearizing the output around the mean of the input [21, 19]. In [21], the input noise was viewed as extra output noise by linearization at each point, with a variance proportional to the squared gradient of the GP posterior mean. However, the proposed method works under the condition of constant-variance input noise. In [19], a Delta method was used for linearization under the assumption of Gaussian distributed inputs, and a corrected covariance function that accounts for the input noise variance was proposed. For Gaussian distributed test inputs and known training inputs, the exact and approximate moments of the GP posterior were examined for various forms of covariance functions [18]. Training on Gaussian distributed input points by calculating the expected covariance matrix was studied in [17, 18]. Two approximations were evaluated in [27]: first, a joint maximization of the joint posterior on uncertain inputs and hyperparameters (leading to overfitting), and second, a stochastic expectation–maximization algorithm (at a high computational cost).
We now review previous works on GP for channel prediction, which include spatial correlation of shadowing in cellular [28] and ad-hoc networks [29], as well as tracking of transmit powers of primary users in a cognitive network [23]. In [28], GP was shown to model spatially correlated shadowing to predict shadowing and pathloss at any arbitrary location. A multi-hop network scenario was considered in [29], and shadowing was modeled using a spatial loss field, integrated along a line between transmitter and receiver. In [23], a cognitive network setting was evaluated, in which the transmit powers of the primary users were tracked with cooperation among the secondary users. For this purpose a distributed radio channel tracking framework using Kriged Kalman filter was developed with location information. A study on the impact of underlying channel parameters on the spatial channel prediction variance using GP was presented in [30]. A common assumption in [28, 29, 23, 30] was the presence of perfect location information. This assumption was partially removed in [31], which extends [30] to include the effect of localization errors on spatial channel prediction. It was found that channel prediction performance was degraded when location errors were present, in particular when either the shadowing standard deviation or the shadowing correlation were large. However, [31] did not tackle combined learning and prediction under location uncertainty. The only work that explicitly accounts for location uncertainty was [20], in which the Laplace approximation was used to obtain a closed-form analytical solution for the posterior predictive distribution. However, [20] did not consider learning of parameters in the presence of location uncertainty.
III System Model
III.A Channel Model
Consider a geographical region , where a source node is located at the origin and transmits a signal with power to a receiver located at through a wireless propagation channel. The received radio signal is affected mainly by distance-dependent pathloss, shadowing due to obstacles in the propagation medium, and small-scale fading due to multipath effects. The received power can be expressed as [32, Chap. 2]
(1) 
where is a constant that captures antenna and other propagation gains, is the pathloss exponent, is the location-dependent shadowing and is the small-scale fading. We assume measurements average small-scale fading, either in time (measurements taken over a time window), frequency (measurements represent average power over a large frequency band), or space (measurements taken over multiple antennas) [33, 10]. (Footnote 1: If measurements cannot average over small-scale fading, the framework proposed in this paper cannot be applied.) Therefore, the resulting received signal power from the source node to a receiver node can be expressed in dB scale as
(2) 
where with and . A common choice for modeling shadowing in wireless systems is through a log-normal distribution, i.e., , where is the shadowing variance. Shadowing is spatially correlated, with well-established correlation models [34], among which the Gudmundson model is widely used [35]. Let be the scalar observation of the received power at node , which is written as where is a zero mean additive white Gaussian noise with variance . (Footnote 2: Vector measurements are also possible (e.g., from multiple base stations), but are not considered here for the sake of clarity.) For the sake of notational simplicity, we do not consider a three-dimensional layout, the impact of non-uniform antenna gain patterns, or distance-dependent pathloss exponents.
III.B Location Error Model
In practice, nodes may not have access to their true location , but only to a distribution . (Footnote 3: A shorthand notation is used for this distribution for notational simplicity.) The distribution is obtained from the positioning algorithm in the devices, and depends on the specific positioning technology (e.g., for GPS the distribution can be modeled as a Gaussian). We will assume that all distributions come from a given family of distributions (e.g., all bivariate Gaussian distributions). These distributions can be described by a finite set of parameters, , , e.g., a mean and a covariance matrix for Gaussian distributions. The set of descriptions of all distributions from the given family is denoted by . Within this set, the set of all Dirac delta distributions over locations is denoted by . Note that is equivalent to the set of possible locations. Finally, we introduce a function that extracts a position estimate from the distribution (in our case chosen as the mean), and denote . We will generally make no distinction between a distribution and its representation .
III.C Problem Statement
We assume a central coordinator, which collects a set of received power measurements with respect to a common source from nodes, along with their corresponding location distributions . Our goals are to perform

Learning: construct a spatial model (through estimating model parameters , to be defined later) of the received power based on the measurements;

Prediction: determine the predictive distribution of the power in test locations and the distribution of the expected received power, , for test location distributions . (Footnote 4: Here, should be interpreted as the expected received power, , where is described by .)
We will consider two methods for learning and prediction: classical GP (Section IV), which ignores location uncertainty and only considers , and uncertain GP (Section V), which is a method that explicitly accounts for location uncertainty. We introduce and as the collection of true and estimated locations respectively. A high level comparison of cGP and uGP is shown in Fig. 1, where cGP operates on and , while uGP operates on and .
IV Channel Prediction with Classical GP
We first present cGP under the assumption that all locations during learning and prediction are known exactly, based on [12, 36]. Later in this section, we will discuss the impact of location uncertainties on cGP in learning/training and prediction/testing.
IV.A cGP without Location Uncertainty
We designate as the input variable, and as the output variable. We model as a GP with mean function and a positive semidefinite covariance function , and we write
(3) 
where stands for a Gaussian process. The mean function is defined as , due to (2). (Footnote 5: Other ways of including the mean function in the model are possible, such as including it in the covariance structure and transforming the prior model to a zero-mean GP prior [12].) The covariance function is defined as . We will consider a class of covariance functions of the form:
(4) 
where for and zero otherwise, , is the correlation distance of the shadowing, and captures any noise variance term that is not due to measurement noise (more on this later). Setting in (4), gives the exponential covariance function that is commonly used to describe the covariance properties of shadowing [35], and , gives the squared exponential covariance function that will turn out to be useful in Section V.C. Note that the mean and covariance depend on
(5) 
which may not be known a priori.
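As a concrete illustration of the covariance class in (4), the following minimal sketch builds a covariance matrix for a set of locations. It assumes the common form of a shadowing variance multiplied by exp(-(distance / correlation distance)^p), plus a diagonal process-noise term; the function name `shadowing_covariance` and its argument names are hypothetical and chosen here only for illustration.

```python
import numpy as np

def shadowing_covariance(X, sigma_psi, d_c, p=1, sigma_proc=0.0):
    """Covariance matrix for locations X (N x 2), under the ASSUMED form
    sigma_psi^2 * exp(-(dist / d_c)^p) + delta_ij * sigma_proc^2.
    p=1 gives the exponential model, p=2 the squared exponential model."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]        # pairwise location differences
    dist = np.linalg.norm(diff, axis=-1)        # pairwise Euclidean distances
    C = sigma_psi**2 * np.exp(-(dist / d_c) ** p)
    return C + sigma_proc**2 * np.eye(len(X))   # process noise on the diagonal only
```

With p=1, two points separated by exactly one correlation distance have their covariance reduced by a factor exp(-1) relative to the shadowing variance.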
IV.A.1 Learning
The objective during learning is to infer the model parameters from observations of the received power at known locations . The resulting training database is thus . Due to the GP model, the joint distribution of the training observations exhibits a Gaussian distribution
(6) 
where is the mean vector and is the covariance matrix of the measured received powers, with entries . The model parameters can be learned through maximum likelihood estimation, given the training database , by minimizing the negative log-likelihood function with respect to :
(7) 
The negative log-likelihood function is usually not convex and may contain multiple local optima. Additional details on the learning process are provided later. Once is determined from , the training process is complete.
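The learning objective can be sketched numerically as follows. This is a minimal example, assuming a dB-scale pathloss mean of the form L0 - 10 η log10(distance) and a covariance matrix that already includes the measurement-noise variance; the names `pathloss_mean` and `gp_neg_log_likelihood` are hypothetical. The full Gaussian negative log-likelihood is returned, including the constant term that does not affect the minimizer.

```python
import numpy as np

def pathloss_mean(X, L0, eta):
    """ASSUMED dB-scale mean: L0 - 10 * eta * log10(distance to a source at the origin)."""
    return L0 - 10.0 * eta * np.log10(np.linalg.norm(X, axis=1))

def gp_neg_log_likelihood(y, mean, K):
    """Negative log-likelihood of y ~ N(mean, K), where K is the covariance of the
    noisy observations (kernel matrix plus measurement-noise variance)."""
    L = np.linalg.cholesky(K)                   # K = L L^T
    alpha = np.linalg.solve(L, y - mean)        # whitened residual
    log_det = 2.0 * np.log(np.diag(L)).sum()    # log|K| from the Cholesky factor
    return 0.5 * (alpha @ alpha + log_det + len(y) * np.log(2.0 * np.pi))
```

In practice this function would be passed to a numerical optimizer over the model parameters, with several random restarts because of the local optima mentioned above.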
IV.A.2 Prediction
After learning, we can determine the predictive distribution of at a new and arbitrary test location , given the training database and . We first form the joint distribution
(8) 
where is the vector of cross-covariances between the received power at and at the training locations , and is the prior variance (i.e., the variance in the absence of measurements), given by . Conditioning on the observations , we obtain the Gaussian posterior distribution for the test location . The mean () and variance () of this distribution turn out to be [12]
(9)  
(10)  
where . In (9), corresponds to the deterministic pathloss component at , which is corrected by a term involving the database and the correlation between the measurements at the training locations and the test location. In (10), we see that the prior variance is reduced by a term that accounts for the correlation of nearby measurements.
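The structure described for (9) and (10) is the standard GP conditioning rule, which can be sketched as below. The function name `gp_predict` and its arguments are hypothetical; the sketch assumes the caller has already evaluated the mean function and the covariance terms with the learned parameters.

```python
import numpy as np

def gp_predict(k_star, k_ss, K, y, mu_train, mu_star):
    """Posterior mean and variance at one test location.
    k_star:   cross-covariances between the test point and the N training points
    k_ss:     prior variance at the test point
    K:        covariance of the noisy training observations (N x N)
    mu_train: mean function at the training points; mu_star: mean at the test point"""
    w = np.linalg.solve(K, k_star)              # K^{-1} k_star
    mean = mu_star + w @ (y - mu_train)         # prior mean corrected by the data
    var = k_ss - w @ k_star                     # prior variance reduced by correlation
    return mean, var
```

For a single training point with unit signal variance, unit noise variance, and perfect prior correlation with the test point, the posterior mean is half of the observed residual and the variance drops from 1 to 0.5.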
IV.B cGP with Location Uncertainty
Now let us consider the case when the nodes do not have access to their true location , but only to a distribution , which is described by . Fig. 2 illustrates the impact of location uncertainties assuming Gaussian location errors for a one-dimensional example. The figure shows (in red) the true received power as a function of as well as the measured power as a function of for a discrete number of values of , shown as markers. To clearly illustrate the impact of different amounts of uncertainty on the position, we have artificially created three regions: high location uncertainty close to the transmitter, medium location uncertainty far away, and low location uncertainty for intermediate distances. When there is no location uncertainty (70 m until 140 m from the transmitter), , so , and hence the black dots coincide with the red curve. For medium and high uncertainty, can differ significantly from , so the data point with coordinates can lie far away from the red curve, especially for high location uncertainty (distances below 70 m). From Fig. 2 it is clear that the input uncertainty manifests itself as output noise, with a variance that grows with increasing location uncertainty. (Footnote 6: In fact, the output noise induced by location uncertainty will also depend on the slope of around , since a locally flat function will lead to less output noise than a steep function, under the same location uncertainty.) This output noise must be accounted for in the model during learning and prediction. When these uncertainties are ignored, both learning and prediction will be of poor quality, as described below.
IV.B.1 Learning from uncertain training locations
In this case, the training database comprises locations and power measurements at the true (but unknown) locations . The measurements will be of the form shown in Fig. 2. The estimated model parameters can take two forms: (i) assign very short correlation distances , large , and small , as some seemingly nearby events will appear uncorrelated; or (ii) assign larger correlation distances , smaller , and explain the measurements by assigning a higher value to [21]. In the first case, correlations between measurements cannot be exploited, so that during prediction, the posterior mean will be close to the prior mean and the posterior variance will be close to the prior variance. In the second case, predictions will be better, as correlations can be exploited to reduce the posterior variance. However, the model must explain different levels of input uncertainty with a single covariance function, which can make no distinctions between locations with low, medium, or high uncertainty. This will lead to poor performance when location error statistics differ from node to node.
IV.B.2 Prediction at an uncertain test location
In the case where training locations are exactly known (i.e., , ), we may want to predict the power at an uncertain test location , made available to cGP in the form , while the true test location is not known. This scenario can occur when a mobile user relies on a low-quality localization system and reports an erroneous location estimate to the base station. The wrong location has an impact on the predicted posterior distribution since the predicted mean will differ from the correct mean . In addition, will contain erroneous entries: the th entry will be too small when and too large when . This will affect both the posterior mean (9) and variance (10). In the case where training locations are also unknown, i.e., , and , these effects are further exacerbated by the improper learning of .
V Channel Prediction with Uncertain GP
In the previous section, we have argued that cGP is unable to learn and predict properly when training or test locations are not known exactly, especially when location error statistics are heterogeneous. In this section, we explore several possibilities to explicitly incorporate location uncertainty. We recall that denotes the set of all distributions over the locations in the environment , while represents the Dirac delta distributions over the positions and has a one-to-one mapping to .
We will describe three approaches. First, a Bayesian approach where the uncertain input (i.e., the uncertain location) is marginalized, leading to a non-Gaussian output (i.e., the received power) distribution. Second, we derive a Gaussian approximation of the output distribution through moment matching and detail the corresponding learning and prediction expressions. From these expressions, the concepts of expected mean function and expected covariance function naturally appear. Finally, we discuss uncertain GP, which is a Gaussian process with input from input set and output . We will relate these three approaches in a unified view. For each approach, we detail the quality of the solution and the computational complexity. We note that other approaches exist, e.g., through linearizing the output around the mean of the input [21, 19], but they are limited to mildly nonlinear scenarios.
V.A Bayesian Approach
In a Bayesian context, we learn and predict by integrating the respective distributions over the uncertainty of the training and test locations. As this method will involve Monte Carlo integration, we will refer to it as Monte Carlo GP (MCGP).
V.A.1 Learning
Given the training database , the likelihood function with uncertain training locations is obtained by integrating over the random training locations (Footnote 7: For the sake of notation, all integrals in this section are written as indefinite integrals; however, they should be understood as definite integrals over appropriate sets.):
(11) 
where . As there is generally no closed-form expression for the integral (11), we resort to a Monte Carlo approach by drawing i.i.d. samples , so that
(12) 
where and . Finally, an estimate of can be found by minimizing the negative log-likelihood function
(13) 
which has to be solved numerically.
Remark 1.
This optimization involves high computational complexity and possibly numerical instability (due to the sum of exponentials). More importantly, a good estimate of can only be found if a sample is generated that is close to the true locations . Due to the high dimensionality [37, Section 29.2], this is unlikely, even for large . Hence, (13) will lead to poor estimates of .
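The numerical instability caused by the sum of exponentials can be reduced with the log-sum-exp trick, as in the minimal sketch below. It evaluates the Monte Carlo negative log-likelihood for a given list of sampled training-location sets; the names `gaussian_logpdf`, `mc_neg_log_likelihood`, `mean_fn`, and `cov_fn` are hypothetical placeholders for the mean and covariance models. Note that this only addresses the numerical issue: the sampling problem in high dimensions described above remains.

```python
import numpy as np

def gaussian_logpdf(y, mean, K):
    """Log-density of y under N(mean, K), computed via a Cholesky factorization."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y - mean)
    return -0.5 * alpha @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2.0 * np.pi)

def mc_neg_log_likelihood(y, location_samples, mean_fn, cov_fn):
    """Monte Carlo estimate of -log p(y | location distributions, parameters).
    location_samples: list of (N x 2) arrays, each one joint draw of all training locations.
    mean_fn, cov_fn:  hypothetical callables returning the mean vector and the
                      noisy-observation covariance matrix for one draw."""
    log_terms = np.array([gaussian_logpdf(y, mean_fn(X), cov_fn(X)) for X in location_samples])
    m = log_terms.max()                          # shift so the largest exponent is zero
    return -(m + np.log(np.mean(np.exp(log_terms - m))))
```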
V.A.2 Prediction
Given the training database and , we wish to determine for an uncertain test location with associated distribution , described by . The posterior predictive distribution is obtained by integrating with respect to and :
(14) 
This integral is again analytically intractable. The Laplace approximation was utilized in [20] to solve (14), while here we again resort to a Monte Carlo method by drawing i.i.d. samples and , so that
(15) 
As increases, the approximate distribution will tend to the true distribution. We refer to (13) and (15) as Monte Carlo GP (MCGP). From (15), we can compute the mean () and the variance () [38, Eq. (14.10) and Eq. (14.11)] as
(16)  
(17) 
Remark 2.
Prediction is numerically straightforward, though it involves the inversion of an matrix for each of the samples . In the case training locations are known, we can utilize cGP to obtain a good estimate of and efficiently and accurately compute and . When both training and test locations are known, the above procedure reverts to cGP.
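The mean and variance of an equally weighted Gaussian mixture, as obtained from the Monte Carlo predictive distribution, follow from the laws of total expectation and total variance. A minimal sketch, assuming the per-sample cGP posterior means and variances have already been computed (the function name `mixture_moments` is hypothetical):

```python
import numpy as np

def mixture_moments(means, variances):
    """Mean and variance of an equally weighted mixture of Gaussians.
    means[m], variances[m]: posterior mean and variance obtained for sample m."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mean = means.mean()
    # E[P^2] minus (E[P])^2, with E[P^2] averaged over the mixture components
    var = (variances + means**2).mean() - mean**2
    return mean, var
```

For example, two components with means 1 and 3 and unit variances give a mixture mean of 2 and a mixture variance of 2: the spread between component means adds to the average component variance.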
V.B Gaussian Approximation
We have seen that while MCGP can account for location uncertainty during prediction, it will fail to deliver adequate estimates of during learning (see Remark 1). To address this, we can modify from (11) using a Gaussian approximation through moment matching. In addition, we can also form a Gaussian approximation of for prediction. We will term this approach Gaussian approximation GP (GAGP). The expressions that are obtained in the learning of GAGP, namely the expectation of mean and covariance functions will be used later in the design of uncertain GP (described in Section V.C).
V.B.1 Learning
Given the training database , the mean of is given by
(18) 
where and . The covariance matrix of can be expressed as
(19) 
where in which
(20) 
and is a diagonal matrix with entries
(21) 
We will refer to and as the expected mean and expected covariance function. We can now express the likelihood function as so that can be estimated by minimizing the negative log-likelihood function
(22) 
Remark 3.
Learning in GAGP involves computation of the expected mean in (18) and (21), as well as the expected covariance function in (20). These integrals are again generally intractable, but there are cases where closed-form expressions exist [17, 18]. These will be discussed in detail in Section V.C. GAGP avoids the numerical problems present in MCGP and will hence generally be able to provide a good estimate of .
V.B.2 Prediction
Given the training database and , we approximate the predictive distribution by a Gaussian with mean and variance . These are given by
(23) 
Note that is itself a function of all ’s and . Similarly is calculated as
(24)  
(25) 
Remark 4.
V.C Uncertain GP
While GAGP avoids the learning problems inherent to MCGP, prediction is generally intractable. Hence, GAGP is not a fully coherent approach to deal with location uncertainty. To address this, we consider a new type of GP (uGP), which operates directly on the location distributions, rather than on the locations. uGP involves a mean function and a positive semidefinite covariance function , which considers as inputs and as outputs . In other words,
(28) 
The mean function is given by , already introduced as the expected mean function in (18). However, for the mean function to be useful in a GP context, it should be available in closed form. As in cGP, we have significant freedom in our choice of covariance function. Apart from all technical conditions on the covariance function as described in [12], it is desirable to have a covariance function that (i) is available in closed form; (ii) leads to decreasing correlation with increasing input uncertainty (even when both inputs have the same mean); (iii) can account for varying amounts of input uncertainty; (iv) reverts to a covariance function of the form (4) when ; and (v) does not depend on the mean function . We will now describe the mean function and covariance function in detail.
The mean function
According to the law of iterated expectations, the mean function is expressed as
(29) 
While there is no closed-form expression available for (29), we can form a polynomial approximation , where the coefficients are found by least squares minimization. For a given range of , this approximation can be made arbitrarily close by increasing the order . When is approximately Gaussian (which may be the case for ), can be evaluated in closed form, since all Gaussian moments are known. See Appendix A for details on the approximation.
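The idea of combining a least-squares polynomial fit with known Gaussian moments can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Appendix A: it assumes the quantity being approximated is log10 of the distance to the source, and that this distance is modeled as a scalar Gaussian with mean `m` and standard deviation `s`. All function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def fit_log10_poly(r_min, r_max, order=5, n_grid=200):
    """Least-squares polynomial fit of log10(r) on [r_min, r_max].
    Returns coefficients a_0..a_J, lowest degree first."""
    r = np.linspace(r_min, r_max, n_grid)
    return P.polyfit(r, np.log10(r), order)

def gaussian_raw_moments(m, s, order):
    """Raw moments E[r^k], k = 0..order, of r ~ N(m, s^2), using the recursion
    E[r^k] = m * E[r^(k-1)] + (k - 1) * s^2 * E[r^(k-2)]."""
    mom = np.empty(order + 1)
    mom[0] = 1.0
    if order >= 1:
        mom[1] = m
    for k in range(2, order + 1):
        mom[k] = m * mom[k - 1] + (k - 1) * s**2 * mom[k - 2]
    return mom

def expected_log10_distance(coeffs, m, s):
    """Closed-form E[sum_j a_j r^j] for r ~ N(m, s^2), approximating E[log10(r)]."""
    return coeffs @ gaussian_raw_moments(m, s, len(coeffs) - 1)
```

Because log10 is concave, the expected value under uncertainty is slightly below the value at the mean distance; the expected pathloss-based mean function shifts accordingly. The fit is only meaningful when nearly all of the Gaussian probability mass lies inside the fitted range.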
The covariance function
While any covariance function meeting the criteria (i)–(v) listed above can be chosen, a natural choice is (see Section IV.A)
(30) 
Unfortunately, as we can see from (19), this choice does not satisfy criterion (v). An alternative choice is the expected covariance function from (20). This choice clearly satisfies criteria (ii), (iii), (iv), and (v). To satisfy (i), we can select appropriate covariance functions, tailored to the distributions , or appropriate distributions for a given covariance function. Examples include:

Covariance functions of the form (4) with , , for Laplacian .

Covariance functions of the form (4) with , , for Gaussian (i.e., ). The expected covariance function is then given by [17, 18]
(31) Note that the factor ensures that inputs with the same mean (i.e., ) exhibit lower correlation with increasing uncertainty. The factor ensures that the measurements taken at locations with low uncertainty (smaller than ) can be explained by a large value of , while for measurements taken at locations with high uncertainty, will be small and decreasing with increasing uncertainty.
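For the Gaussian case, the expected squared exponential covariance between two distinct, independent uncertain inputs can be evaluated in closed form by a standard Gaussian integral. The sketch below derives it for the kernel convention shadowing variance times exp(-||x_i - x_j||^2 / d_c^2); this convention is an assumption, and the constants may therefore differ from those in (31) if the correlation distance is parameterized differently. The function name is hypothetical, and the process-noise term for identical inputs is omitted.

```python
import numpy as np

def expected_se_covariance(z_i, S_i, z_j, S_j, sigma_psi, d_c):
    """E[ sigma_psi^2 * exp(-||x_i - x_j||^2 / d_c^2) ] for independent
    x_i ~ N(z_i, S_i) and x_j ~ N(z_j, S_j), with i != j (ASSUMED kernel convention)."""
    m = np.asarray(z_i, dtype=float) - np.asarray(z_j, dtype=float)
    S = np.asarray(S_i, dtype=float) + np.asarray(S_j, dtype=float)
    D = len(m)
    # Determinant factor: shrinks the covariance as the total uncertainty grows
    scale = np.linalg.det(np.eye(D) + 2.0 * S / d_c**2) ** -0.5
    # Quadratic form: distance between means, measured with an inflated length scale
    quad = m @ np.linalg.solve(d_c**2 * np.eye(D) + 2.0 * S, m)
    return sigma_psi**2 * scale * np.exp(-quad)
```

With zero covariance matrices the expression reduces to the ordinary squared exponential covariance, and for inputs with the same mean it decreases as the uncertainty increases, in line with criteria (ii) and (iv).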
V.C.1 Learning
Given the training database and choosing and , the model parameters are found by minimizing the negative log-likelihood function
(32) 
Note that in contrast to GAGP, we have constructed uGP so that and are available in closed form, making numerical minimization tractable.
V.C.2 Prediction
Let be the mean and be the variance of the posterior predictive distribution of uGP with uncertain training and test locations, then . The expressions for and are now in standard GP form:
(33)  
(34) 
where is the vector of cross-covariances between the received power at the test distribution and at the training distribution , and is the a priori variance .
Remark 7.
In case the training locations are known, i.e., , the mean and the variance can be obtained from the expressions (33) and (34), respectively, by setting . Furthermore, the resulting mean is exactly the same as (26), obtained in GAGP. However, due to a different choice of covariance function, the predicted variance is different from (27).
V.D Unified View
We are now ready to recap the main differences between cGP and uGP, and to provide a unified view of the four methods (cGP, MCGP, GAGP, and uGP). Fig. 3 describes the main processes in uGP and cGP, along with the inputs and outputs during the learning and prediction processes. The four methods are depicted in Fig. 4: all four methods revert to cGP when training and predictions occur in , i.e., when there is no uncertainty about the locations. MCGP is able to consider general input distributions in , but leads to non-Gaussian output distributions. Through a Gaussian approximation of these output distributions, GAGP can consider general inputs and directly determine a Gaussian output distribution. Both of these approaches (MCGP and GAGP) have in common that they treat the process with input as a GP. In contrast, uGP treats the process with input as a GP. This allows for a direct mapping from inputs in to Gaussian output distributions. In terms of tractability for learning and prediction, the four methods are compared in Table I. We see that among all four methods, uGP combines tractability with good performance.
Table I
Method | Learning                  | Prediction
cGP    | tractable, poor quality   | closed-form, poor quality
MCGP   | complex, poor quality     | tractable
GAGP   | tractable in some cases   | intractable
uGP    | tractable by design       | closed-form
VI Numerical Results and Discussion
In this section, we show learning and prediction results of cGP, uGP, and MCGP with uncertainty in training or test locations. In Section VI.D, we describe a resource allocation problem, where communication rates are predicted at future locations using cGP and uGP, in the presence of location uncertainty during training. The numerical analysis carried out in this section is based on simulated channel measurements according to the model outlined in Section III.
Parameter  Value 

2.5  