# Modelling spatial heterogeneity and discontinuities using Voronoi tessellations

## Abstract

Many methods for modelling spatial processes assume global smoothness properties; such assumptions are often violated in practice. We introduce a method for modelling spatial processes that display heterogeneity or contain discontinuities. The problem of non-stationarity is dealt with by using a combination of Voronoi tessellation to partition the input space, and a separate Gaussian process to model the data on each region of the partitioned space. Our method is highly flexible because we allow the Voronoi cells to form relationships with each other, which can enable non-convex and disconnected regions to be considered. In such problems, identifying the borders between regions is often of great importance and we propose an adaptive sampling method to gain extra information along such borders. The method is illustrated with simulation studies and application to real data.

Keywords: boundary sampler, emulation, Gaussian process, spatial process.

## 1 Introduction

Often when attempting to model a spatial process, smoothness is assumed. By smoothness, we are referring to the assumption that minor perturbations in the parameters lead to only minor changes in the observations. In many applications, this assumption is violated and we see non-smooth characteristics within spatial fields. Examples such as climate models are well known for small changes in the input resulting in large changes in the output in certain areas of input space (Alley et al., 2003). Naively modelling these processes using methods that rely on the assumption of smoothness can lead to poor results when the models are used for analysis such as prediction (Paciorek and Schervish, 2006). Non-stationary methods such as mixtures of thin plate splines (Wood et al., 2002), local linear regression (Cleveland et al., 1992) and wavelet-based imputation (Heaton and Silverman, 2008) have been used to build approximations for the underlying functions of these processes. By making so few assumptions about the data, methods such as these have the drawback that a large number of observations are needed to build an accurate model of the underlying process. The method in this paper is applicable to any situation in which we suspect that a process displays heterogeneity or we are modelling a function that contains discontinuities.

One well-established method for spatial interpolation is Gaussian process regression or kriging (Cressie, 1993). By using a Gaussian process to model the underlying function (or random field), we are making an assumption of smoothness in the underlying function over the entire input space. As mentioned previously, this assumption is rarely justified. To deal with this, adaptations to the stationary Gaussian process methodology must be made to accommodate non-stationarity. Two of the main methods that have been focused on in the literature are changes to the covariance function, such as spatial deformations (Schmidt and O’Hagan, 2003), and the use of piecewise Gaussian processes, such as treed Gaussian process (TGP) (Gramacy and Lee, 2008) or Voronoi-tessellation Gaussian process (Kim et al., 2005). This paper’s focus is the latter of the two categories, and readers interested in adaptations to the covariance function are directed to Risser (2016) for a review. A third, more recent method that could be implemented is a two stage approach in which a classification technique is used to partition the data and then regression techniques are applied to each of the partitions.

In order to fit a piecewise Gaussian process model, we must specify a technique for partitioning the input space. Both of the methods mentioned previously partition the input space to give regions with straight boundaries: treed partitioning does so using non-overlapping lines parallel to the input axes and Voronoi tessellation uses the Euclidean distance from a set of centres to create Voronoi cells. In this paper, we shall focus on partitioning the input space using Voronoi tessellation due to its flexibility compared to treed partitioning. We allow Voronoi cells to join together to create larger, more flexible, joint regions. Once we have specified the partition of the input space, we can fit independent Gaussian processes to each region. The models which are built using Voronoi tessellations in Kim et al. (2005) are a special case of this where there is no dependence between cells and severe constraints are applied to the locations of the centres. The joining of cells in our tessellation allows more complex regions, such as when one region is surrounded by another, or non-convex shaped regions, without the loss of information that is intrinsic to building regions that can only be a single independent cell. Very importantly, we also look to allow a greater range of models than the standard Voronoi model of Kim et al. (2005) by changing the prior distribution of the centres that defines the cells of the tessellation. Further to this, we also tackle the problem of better defining the regions via further sampling. Traditional sampling methods are not geared towards this objective and are shown by examples to perform worse than our method in the presence of different regions.

We give a brief overview of the stationary Gaussian process model in Section 2. In Section 3, we describe our partitioning technique and the MCMC methods needed to remove dependence on the partition structure for inference. Section 4 introduces an adaptive sampling method used to better define the location of a discontinuity. Section 5 demonstrates the technique on simulated examples, and Section 6 shows the method applied to real examples. We finish with a discussion of the method and possible future extensions in Section 7 .

## 2 Modelling Using a Gaussian Process

In spatial processes, we may have a measured attribute corresponding to inputs . In the spatial setting, typically corresponds to the spatial location; however, it is not restricted to this, and more general inputs can be used for other applications. We represent the relationship between the input and output by a function: . The output is not necessarily a scalar, though we only consider the case of a scalar output in this paper. Examples of multivariate outputs are given by Rougier (2008), Higdon et al. (2008) and Conti and O’Hagan (2010), and our approach could be extended for multivariate outputs. Spatial process data are typically measured with natural variation or error, so repeated observations of identical results in different outputs . The approach we propose is also applicable to cases in which the output is deterministic, where multiple applications of the same will result in the same output , which is often found when considering computer model output, such as those seen in Sacks et al. (1989) and Currin et al. (1991). In this paper, our regression model of choice is the Gaussian process; however, this is just one possible choice and other functions could be used if appropriate.

The conditional mean of given a vector of coefficients is given by

The vector consists of known regression functions of , incorporating any beliefs that we might have about the form of . In this paper, we use a constant function which incorporates weak prior belief. The covariance between and is given by

conditionally on , and , where is a correlation function that depends on parameters given in , is a scaling term for the covariance, is an error or nugget term and is a Kronecker delta which is one if and only if and zero otherwise. The function must ensure that the covariance matrix of any set of inputs is positive semidefinite. A common choice for this function is the Gaussian correlation function

where is a diagonal matrix of roughness parameters. In this paper, we shall simply assume values of and using optimisation techniques on their likelihood.

The output of is observed at locations, , to obtain data . We denote the collection of observed inputs and outputs as training points . The likelihood of the parameters given the data can be seen in Rasmussen (1996) to be

where

(1) | ||||

If we have data in which the output is deterministic, we set , so We use a weak prior for the other hyperparameters as given in O’Hagan (1992). Alternatively, we can add more informative priors if we have stronger beliefs about our parameters, as in Oakley (2002).

## 3 Piecewise Gaussian Process Prior

### 3.1 Voronoi tessellation with joint centres

We allow for discontinuities and changes in the behaviour of over the input space by partitioning the input space into disjoint regions where and . The partition structure that we employ is a Voronoi tessellation structure in which we allow the centres to have relationships with each other to form non-standard shaped regions. By standard shaped regions, we are referring to the property in Voronoi tessellations that, if we have a finite number of unique disjoint centres in finite-dimensional Euclidean space, all of the Voronoi cells are convex polytopes (Gallier, 2008). A standard Voronoi tessellation with cells is defined by a set of centres, . An arbitrary point is contained in the cell of the th centre if

where is the Euclidean distance between and . We define to be the indices of all cell centres in . A simple example of the tessellations described can be seen in Figure 1.

We do not require the Voronoi cells to share a vertex to be in the same region, and we do not restrict the centres to be the training points . However, we note that there are potential identifiability issues here due to the fact that a region in one model which consists of multiple cells joined together can be equivalent to a region in another model consisting of a single cell. This could be easily addressed by putting a proper prior distribution on .

Our tessellation differs from that of Kim et al. (2005) in a number of important ways. In the standard Voronoi method of Kim et al. (2005), the centres are restricted to be at datapoints. Moreover, all cells are assumed to be independent with a seperate Gaussian process being built in each cell. For example, if we have an input space that is made up of two separate functions, we would ideally want to model this using two disjoint Gaussian processes. This can only be modelled accurately using the standard method if the two functions lie in parts of the input space that can be modelled by just two centres.

We could still attempt to model a setup such as this in the standard method using a larger number of independent centres; however, there are two clear drawbacks to doing so. First, by splitting up a true region into multiple independent cells, we are not using all of the points from the true region to estimate the parameters of the Gaussian process, namely and . By using a single region, as we allow for in our approach, all points from the region can be utilized simultaneously to gain better estimates of the parameters of interest , assuming of course that we are using the correct model. Secondly, using a weak prior distribution for the GP parameters has the constraint that we need at least four points to build a Gaussian process with a defined variance, which could make accurately modelling a function with a discontinuity impossible. We may, for example, only have five points sampled in a given region and may not be able to model this region with one centre, making it inadvisable to split this into multiple regions.

For our method, we define the collection of tessellation parameters and assign the priors

where is the number of all possible ordered partitions, the th Bell number (Aigner, 1999), is a Poisson process with suitable intensity parameter (it is suggested that a few different values for are simulated from until a suitable prior is found) and is a discrete uniform distribution on . It should be noted that is the only hyper-parameter that needs choosing (we could even place a prior distribution on it). If we assign each of the possible relationships a number between 1 and , then determines which of the relationships is selected to be (the collection of s) in the model. There are many adjustments that could be made to incorporate prior beliefs about the underlying model. For example, one adjustment that could be made if appropriate is to replace the Poisson process by one that includes a repulsion term, such as a Gibbs process (Illian et al., 2008). Using a repulsion term would have the benefit of additional centres having a localised effect on the model tessellation.

The likelihood of the model is

where is the multivariate Gaussian distribution for outputs corresponding to inputs which lie in the th region. We can analytically integrate over and to give us our posterior distribution

where is the number of datapoints in the th region, is the gamma function and , and are as defined in (1), with the subscript showing that these terms are evaluated using the points that lie in the th region.

The posterior distribution for is analytically intractable, and, to deal with this, we select the parameter using optimisation techniques on the likelihood of , as we similarly do with . We could also take into account our uncertainty in by placing a prior distribution on and including it within our MCMC method.

### 3.2 MCMC implementation

We use reversible-jump MCMC (RJMCMC) to sample our model parameters (Green, 1995). We have two model elements to update that we need to account for in our move types: the set of centres for the tessellation and the relationship between the centres. To update the set of centres, we add, take away or move a centre: these moves are called birth, death and move respectively. To update the relationship between the centres, we change a single centre to be in a different region (possibly a new region with no other centre); this move is called change. This gives us four possible general moves.

These four types of proposal are taken to be equally likely during the proposal step. We use an acceptance ratio similar to that described in Green (1995), which has the form

Due to the setup of the moves, we find that the acceptance ratio simplifies to the ratio of the likelihood of the proposed model to that of the existing model. As we cannot have a death when we have one centre and we also cannot change the relationship of the centre, we only propose birth and move steps in that situation. To maintain reversibility here, when a birth step is proposed, we multiply the acceptance ratio by , and, conversely, we multiply the acceptance ratio by 2 when we have two centres and we propose a death step. These multiplications are called , and we set the to 1 in all other cases. Pseudo-code detailing the RJMCMC steps is given in Algorithm 1.

After the RJMCMC update of the tessellation, we fit independent Gaussian processes to each region. We can then use the Gaussian process model on each region to make predictions at points in the same region.

## 4 Adaptive Sampling to Identify Discontinuities

In some applications, it may be possible to gather additional data at new training points . In many cases, this is costly and/or time consuming, so these values of must be chosen with care. In particular, we may wish to sample points such that we estimate any discontinuities or borders between regions more accurately. Having more information around the discontinuity will not only help us predict outcome values at unobserved locations with more accuracy, but will also supplement the understanding we have about where the discontinuities are occurring, which is often of practical interest. Existing sampling methods such as space filling algorithms and largest uncertainty samplers (Santner et al., 2003) are not tailored to this objective.

We propose the following sampling method to help estimate these boundaries. The (approximate) MAP model is found by looking at which tessellation in our posterior sample has the largest likelihood value. The method looks at our MAP model and samples points on the boundary of the region of interest (the region we want to sample from) in this model as this is a good estimate of the boundary of the discontinuity. There are an infinite number of positions that we could sample on the boundary, and so we attempt to maximise the information we get from each sample. We iteratively choose points on the boundary that are furthest from all existing design points to try to attain some of the properties that are established for space filling designs. The algorithm for this sampling method is given in Algorithm 2.

We may note that it is trivial to extend this sampling method to sample any generic region or multiple regions. A change could be made to the algorithm if we are able to double the number of points that we can sample. Instead of sampling at a point , we could look at the line that interpolates and the centre of its corresponding cell , then sample two points on this line at distances from the centre. This adaptation should in theory sample just inside and just outside of the discontinuity if is chosen suitably.

## 5 Simulation Studies

To initially test our modelling approach, we apply it to two deterministic test functions, which are shown in Figures 2a and 3(a). The first (Figure 2) has a discontinuity defined by straight lines, but these are not parallel to the parameter axes. The second (Figure 3(a)) has a curved discontinuity defining a non-convex region. These examples were used to provide a challenging test-bed to assess the effectiveness of our approach.

### 5.1 A diamond-shaped discontinuity

The first test function has a diamond-shaped discontinuity and is defined by

where

We run the function at 80 different inputs for our design points, the location of which are chosen using a Latin hyper-square design with a maximin criterion to get a good even coverage of the input space (Johnson et al., 1990). One thing of interest here is to look at the estimated surface of our model and to see how this compares to the true surface. Due to the nature of the posterior samples, any mean surface (or kriging surface) that we attain from a single sample would be conditional on the tessellation for that sample. We can numerically integrate over via Monte Carlo methods using the posterior sample and, hence, have an integrated mean surface that is not conditional on the tessellation parameter; that is . This will be referred to as the integrated surface whilst the surface of a single sample will be referred to as a mean surface. To create the integrated surface for our analysis, we find the value of the mean surface for each of the posterior tessellation samples at 10,000 points (using an equispaced grid of points) and find the mean of these points over the samples.

We compare different analysis methods using the mean squared error (MSE) of the integrated surface for each method. We find that the MSE of our method (MSE) is smaller than that of both the Treed GP (MSE=1.98) and the standard GP (MSE=2.04). The MSE of our method compared to the others suggests that our approach is more representative of the true surface. The integrated surfaces of all of the methods can be seen in Figure 2.

We also consider the performance of the adaptive sampler for this example. Firstly, we obtained the MAP model from our posterior sample, which is shown in Figure 3. The MAP model has 14 cells divided into two regions, with one region containing 12 of these cells and the other region containing just two. The region with two cells, which is the region that contains all of the points from the discontinuity, is the region whose boundary we will sample on. To do this, we implement the sampler from Algorithm 2, using 2000 candidate points on the boundary and selecting 5 of these points to evaluate and include in our training data.

1 | 2 | 3 | 4 | 5 | 6 | 7 | |
---|---|---|---|---|---|---|---|

Original points | 0 | 0.294 | 0.505 | 0.118 | 0.047 | 0.023 | 0.014 |

After sampler | 0 | 0.595 | 0.280 | 0.060 | 0.021 | 0.031 | 0.006 |

We can see in Figure 3 that two of the points we have chosen to sample lie very close to the true discontinuity, and, around those areas, we should have a much better understanding about the location of the boundary. We will also reduce the uncertainty about the mean function around the other three points that have been sampled although these points do not lie as close to the boundary as the two previously mentioned. We compare our sampling method to two existing methods of selecting new design points: using a Sobol sequence (Giunta et al., 2003) and selecting the points in that have the largest posterior variance. We find that our sampling algorithm produces the smallest MSE (MSE=1.352) using the method from section 4 when an additional five points are sampled compared to using Sobel (MSE=1.511) and the largest posterior uncertainty (MSE=1.392). In Table 1, we can see that the most probable number of regions is not equal to the true number of regions when only the original points are considered. However, the true number of regions becomes the most probable when the additional new points are added. In fact, as we further sample more points, we become more confident that the number of regions is two ( when we sample an additional 10 points using the sampler).

### 5.2 A discontinuity with curved boundaries

The second test function is shown in Figure 3(a). This example is a particularly difficult one as to truly represent a circular boundary using Voronoi tessellation we would need an infinite number of centres. This test function is defined as

where

We evaluate the function at 70 different design points chosen using a Latin hyper-square design with a maximin criterion to again get a good even coverage of the input space (Johnson et al., 1990).

The integrated surface we obtain for this case on application of our method can be seen in Figure 3(b). We can see that the method has performed as well as can be expected when considering the data we have used to train it. We again measure the performance of the method by looking at the mean squared error of the averaged surface in the same way as the first example. Again, the results show that our method (MSE=4.498) has a smaller MSE than Treed GP (MSE=6.886) and the standard GP (MSE=6.473). This reduction in MSE again suggests that our approach is more representative of the true surface.

In the Treed Gaussian process method, the input space is partitioned using non-overlapping straight lines parallel to the parameter axis and independent Gaussian processes are built for each of the regions. We can see from the shape of in Figure 3(a) that we would need a large number of these partitions to be able to get a good approximation for the the true shape of the discontinuity. A similar argument follows when we consider the shape of in Figure 2. The extremely large MSE values for the standard GP is due to the fact that a standard GP is inappropriate for both of these functions as the smoothness assumption is clearly being violated. As a result the mean function must over-smooth to ensure that the function intersects the training points exactly leading to poor estimates around the discontinuity.

## 6 Applications

### 6.1 Cloud modelling

We illustrate our method on simulation data from a complex numerical cloud-resolving model, the System for Atmospheric Modelling (SAM) (Khairoutdinov and Randall, 2003). For this example, the model is used to simulate the development of shallow nocturnal marine stratocumulus clouds for 12 hours over a domain of size 40 km x 40 km x 1.5 km height, given changes in the initial conditions described through the perturbation of six key parameters (which we refer to as ). These simulations are an updated version (with longer run-time and updated radiation scheme) of the nocturnal marine stratocumulus simulations (set 2) described in detail in Feingold et al. (2016), and we focus here on the average predicted cloud coverage fraction over the domain in the final hour of the simulations, .

The SAM model is highly computationally expensive to run, and so the model/cloud behaviour over the multi-dimensional parameter space cannot be fully explored using it directly. Hence, our modelling approach to generate a statistical representation of the model behaviour provides a means by which the cloud behaviour can be more rigorously examined. Shallow clouds are very important to the climate system as they reflect solar energy to space and hence cool the planet, offsetting some of the greenhouse gas warming. These clouds are particularly sensitive to aerosol concentrations in the atmosphere and meteorological conditions, where small changes in temperature and humidity profiles can impact strongly on whether clouds form or not, and how thick/reflective they are. It is essential to understand how changes in aerosol and meteorological conditions can affect shallow clouds in order to improve their representation in climate models. Currently, large-scale climate model representations of shallow clouds are poor as they form and develop on smaller scales than the large grids used, yet how they are represented can have a strong influence on predictions of climate sensitivity (i.e., the magnitude of warming for a prescribed increase in CO2).

Initial investigations of these SAM model simulations and expert opinion has suggested that the model potentially produces two different forms of cloud behaviour (open and closed cell behaviour) over the six-dimensional parameter space. Hence, the underlying model function that we want to reproduce with our modelling approach is potentially made up of a single function with two regimes, and will likely contain a discontinuity in as the model behaviour moves between these regimes. As such, there is also interest in knowing about the location of any discontinuity/change in regime in order to explore where and why this phenomenon occurs.

We have 105 training points available from the simulations to build our model of the cloud coverage fraction, , where the input combinations were chosen to cover the 6-d parameter space using a space-filling maximin Latin hypercube design. Scatter plots of the outputs against the individual inputs can be seen in Figure 5, in which we can see no obvious way of splitting the data. We see from Figure 5 that most areas of the parameter space output consistent values of at around 0.9, however some other areas have much smaller values of , from 0.2 to 0.4. The plot of the output against the aerosol concentration () input suggests that high values of this variable are very likely to yield large cloud coverage values; however, there is no clear way to differentiate low values.

One interesting aspect that we have discovered from our posterior sample is that the MAP model obtained is one that contains two regions. The posterior distribution for the number of regions is shown in Table 2. The belief that there are two underlying regimes (cloud behaviours) is further strengthened by these posterior probabilities, in which we see that two regions () is the most probable, despite the fact that no prior knowledge of this was incorporated.

Number of regions | 1 | 2 | 3 |
---|---|---|---|

Probability | 0.102 | 0.667 | 0.231 |

To measure how well our method is performing compared to other methods, 35 further simulations (with different input configurations to the training simulations) were run through the computer simulator, and these were used for validation, following the advice of Bastos and O’Hagan (2009). Our method performs better at predicting these validation points (MSE=0.016) than both the Treed GP (MSE=0.032) and the standard GP (MSE=0.025). Following this, we refitted our model using all 140 simulations as training points, and then the sampler method from Section 4 was implemented. An additional 25 parameter combinations were selected by the sampler, chosen using a candidate set of 170,000 points sampled on the boundary of the smaller (low cloud fraction) region, and these simulations were run through the simulator.

As with our initial model, our final posterior sample (after incorporating the extra information from these new simulations in our training data set) revealed that the MAP model is one which has two regions, and that two regions is most probable (). An interesting inference we aim to draw from our posterior sample is to visualise the shape of the two regions and the discontinuity between them. A challenge when dealing with data of dimension is the visualisation of results. Our MAP model has 18 Voronoi cells corresponding to one region and 87 corresponding to the other region. The region with 18 cells corresponds to low cloud fraction output and will be referred to as the smaller region. The region with 87 cells corresponds to high cloud fraction output and will be referred to as the larger region.

In Figure 6, we attempt to visualise the shape of the boundary of these regions of cloud behaviour. We used ten equispaced points in each input dimension to create a grid of 1,000,000 equispaced points over the six dimensions, and noted which points lie in each region of the MAP model. To aid with visualisation, we perform a dimension reduction technique. The technique we apply here is a 2-d averaging scheme, as follows: There are 15 possible pairwise combinations of our input variables , which are each assigned 100 equally spaced points in 2-d. For each of these points, we have 10,000 possible combinations that the other inputs can take (due to our grid) and so we compute the proportion of these 10,000 points that lie within the smaller region. In Figure 6, we use a grey scale to represent this proportion, with a white (black) block meaning that all of the points lie within the larger (smaller) of the two regions, and so correspond to areas of high (low) cloud fraction output.

We can see from Figure 6 that the smaller (low cloud fraction) region does indeed appear to have a complex shape in the parameter space as we initially suspected. In particular, an interesting aspect of this region can be seen in , the aerosol concentration. It appears that smaller values of aerosol concentration are much more likely to be attributed to the smaller region corresponding to low cloud fraction. This observation is supported by the MAP model that was seen when a TGP was attempted with the MAP model splitting the range of the aerosol concentration input variable at 117.7 cm. Figure 6 also indicates a reason why the TGP performed poorly compared to our method; in the and projections, we see that the region appears to have a curved boundary, which the TGP will find near impossible to model with straight lines.

Our results here show that by using our described modelling approach, we are able to more clearly and accurately capture and represent the discontinuity that corresponds to the sharp change in cloud behaviour over the six-dimensional parameter space of the cloud model initial conditions. This is a significant step of importance to the cloud modelling community, as this enables the identification of the key initial conditions under which these changes in behaviour may occur. We are also able to determine the sensitivity of the cloud fraction output to the co-varying initial conditions. Full exploration of this may ultimately lead to improvements in the way the shallow cloud coverage is represented in climate models.

### 6.2 USA ammonia levels data

We also apply our method to data on recorded ammonia () levels at locations across the USA, obtained from obtained from the National Atmospheric Deposition Program (National Atmospheric Deposition Program, 2007), which can be seen in Figure 7. The was measured at 250 locations in the USA, with the two points in the bottom right corresponding to the United States Virgin Islands and Puerto Rico. On plotting the data in Figure 7, we have found that there is a drastic change in the output for certain areas of the USA. In other locations, however, the output does not change as drastically, suggesting that we may have heterogeneity. As this is real observed data which is observed with error as opposed to a deterministic computer output as in Section 6.1, the error term is included in our model.

The integrated surface that we obtain for this example, via application of our modelling approach, is shown in Figure 7. This surface suggests that the north-central region of the USA has higher levels of ammonia compared to the rest of the country. We also notice that the north western region of the USA has much lower levels than the rest of the country. Figure 8 shows our posterior distribution for the number of regions of different behaviour in over the USA. We see that we have a bell shaped distribution that peaks at eight regions with an elongated tail towards the larger values, showing that there are most likely 8 different regimes over the spatial area. As with the previous examples, we test our method against the TGP and the standard Gaussian process modelling approaches. To do this here we use cross validation in which we randomly omit 50 training points (20% of the total data), and then use these as validation points on a model trained using only the remaining training points. Here, we again found that our method has a lower MSE (MSE ) than both the standard Gaussian process (MSE ) and the treed Gaussian process (MSE ).

## 7 Discussion

In this paper, we developed a method that can be used to model functions that we believe contain discontinuities or display heterogeneous characteristics. The use of Voronoi tessellations as a tool to partition the input space has been shown to be advantageous over similar methods such as treed GP in the simulations and data we explored. The idea of joining Voronoi tiles is trivial to extend to other methods; for example, we could join partitions of the treed GP to create larger and non-convex regions.

There are computational benefits to our approach over fitting a standard Gaussian process model: instead of fitting a Gaussian process model to function outputs, which requires the inversion of a matrix, we fit Gaussian process models to sets of outputs that are smaller in size. For instance, if we divide into regions, we have that the largest region could have at most data points (for our setup).

Standard Voronoi tessellations were used to partition the input space due to their flexibility. However, using standard Voronoi tessellations still partitions the input space with straight lines and more flexibility over the shape of these partitions would be preferable. One extension to our approach is the use of weighted Voronoi tessellations. The use of weights on Voronoi tessellations allows for a greater range of partition shapes. For example, we could use multiplicatively weighted Voronoi tessellations to create round partitions or additively weighted Voronoi tessellations to create hyperbolic curves for our partitions (Okabe et al., 2000). Another generalization is the additively weighted power diagram or sectional Voronoi tessellation (Okabe et al., 2000), formed by the intersection of the input space with a Voronoi tessellation in a higher-dimensional space. The cells of the sectional tessellation are again convex polytopes, but the configurations of cells that can occur differ from those in a standard Voronoi tessellation (with probability 1, if the higher-dimensional tessellation is Poisson-Voronoi); see Chiu et al. (1996) for a more precise statement and proof. The use of these weights however will add a new set of parameters to the model that need to be estimated and therefore increase the model complexity. Exploration is needed as to whether the increased flexibility justifies the additional parameter cost. Alternatively, we can consider perturbing the individual vertices of the Voronoi cells. Again, this is computationally expensive, but it adds flexibility by allowing polygonal but non-convex cells. Blackwell and Møller (2003) carry out MCMC inference for this spatial structure in combination with forms of likelihood quite different from ours.

Another obvious area that needs to be explored is the sampling method that we use to explore our posterior distribution. Currently, our MCMC sampling is initialised by using a large number of chains for the burn-in, with some chains only used for the burn-in. This is done in an attempt to avoid local maxima and ensure that we are converging to the true distribution.

### References

- Aigner, M. (1999). A characterization of the Bell numbers. Discrete Mathematics 205(1), 207–210.
- Alley, R. B., J. Marotzke, W. D. Nordhaus, J. T. Overpeck, D. M. Peteet, R. A. Pielke, R. Pierrehumbert, P. Rhines, T. Stocker, L. Talley, et al. (2003). Abrupt climate change. Science 299(5615), 2005–2010.
- Bastos, L. S. and A. O’Hagan (2009). Diagnostics for Gaussian process emulators. Technometrics 51(4), 425–438.
- Blackwell, P. G. and J. Møller (2003). Bayesian analysis of deformed tessellation models. Advances in Applied Probability 35(1), 4–26.
- Chiu, S. N., R. VandeWeygaert, and D. Stoyan (1996). The sectional Poisson Voronoi tessellation is not a Voronoi tessellation. Advances in Applied Probability 28(2), 356–376.
- Cleveland, W. S., E. Grosse, and W. M. Shyu (1992). Local regression models. Statistical Models in S 2, 309–376.
- Conti, S. and A. O’Hagan (2010). Bayesian emulation of complex multi-output and dynamic computer models. Journal of Statistical Planning and Inference 140(3), 640–651.
- Cressie, N. (1993). Statistics for Spatial Data (rev.ed.). New York: Wiley.
- Currin, C., T. Mitchell, M. Morris, and D. Ylvisaker (1991). Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. Journal of the American Statistical Association 86(416), 953–963.
- Feingold, G., A. McComiskey, T. Yamaguchi, J. S. Johnson, K. S. Carslaw, and K. S. Schmidt (2016). New approaches to quantifying aerosol influence on the cloud radiative effect. Proceedings of the National Academy of Sciences 113(21), 5812–5819.
- Gallier, J. (2008). Notes on convex sets, polytopes, polyhedra, combinatorial topology, Voronoi diagrams and Delaunay triangulations. arXiv preprint arXiv:0805.0292.
- Giunta, A., S. Wojtkiewicz, and M. Eldred (2003). Overview of modern design of experiments methods for computational simulations. In 41st Aerospace Sciences Meeting and Exhibit, pp. 649.
- Gramacy, R. B. and H. K. H. Lee (2008). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association 103(483), 1119–1130.
- Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 711–732.
- Heaton, T. and B. Silverman (2008). A wavelet- or lifting-scheme-based imputation method. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70(3), 567–587.
- Higdon, D., J. Gattiker, B. Williams, and M. Rightley (2008). Computer model calibration using high-dimensional output. Journal of the American Statistical Association 103(482), 570–583.
- Illian, J., A. Penttinen, H. Stoyan, and D. Stoyan (2008). Statistical Analysis and Modelling of Spatial Point Patterns. Chichester: John Wiley & Sons.
- Johnson, M., L. Moore, and D. Ylvisaker (1990). Minimax and maximin distance designs. Journal of Statistical Planning and Inference 26(2), 131 – 148.
- Khairoutdinov, M. F. and D. A. Randall (2003). Cloud resolving modeling of the arm summer 1997 iop: Model formulation, results, uncertainties, and sensitivities. Journal of the Atmospheric Sciences 60(4), 607–625.
- Kim, H., B. Mallick, and C. Holmes (2005). Analyzing nonstationary spatial data using piecewise Gaussian processes. Journal of the American Statistical Association 100, 653–668.
- National Atmospheric Deposition Program (2007). Ammonia monitering network. http://nadp.sws.uiuc.edu/data/AMoN/.
- Oakley, J. (2002). Eliciting Gaussian process priors for complex computer codes. The Statistician 51, 81–97.
- O’Hagan, A. (1992). Some Bayesian numerical analysis. In Bayesian Statistics 4 (eds. Bernado, J.M. et al.), pp. 345–363. Oxford: Oxford University Press.
- Okabe, A., B. Boots, K. Sugihara, and S. Chui (2000). Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. New York: Wiley.
- Paciorek, C. J. and M. J. Schervish (2006). Spatial modelling using a new class of nonstationary covariance functions. Environmetrics 17(5), 483–506.
- Rasmussen, C. (1996). Evaluation of Gaussian processes and other methods for non-linear regression. Ph. D. thesis, Graduate Department of Computer Science, University of Toronto.
- Risser, M. D. (2016). Review: Nonstationary spatial modeling, with emphasis on process convolution and covariate-driven approaches. arXiv preprint arXiv:1610.02447.
- Rougier, J. (2008). Efficient emulators for multivariate deterministic functions. Journal of Computational and Graphical Statistics 17(4), 827–843.
- Sacks, J., W. Welch, T. Mitchell, and H. Wynn (1989). Design and analysis of computer experiments. Statistical Science 4, 409–423.
- Santner, T., B. Williams, and W. Notz (2003). The Design and Analysis of Computer Experiments. New York: Springer.
- Schmidt, A. M. and A. O’Hagan (2003). Bayesian inference for non-stationary spatial covariance structure via spatial deformations. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65(3), 743–758.
- Wood, S. A., W. Jiang, and M. Tanner (2002). Bayesian mixture of splines for spatially adaptive nonparametric regression. Biometrika, 513–528.