Modeling Daily Seasonality of Mexico City Ozone using Nonseparable Covariance Models on Circles Cross Time
Abstract
Mexico City tracks groundlevel ozone levels to assess compliance with national ambient air quality standards and to prevent environmental health emergencies. Ozone levels show distinct daily patterns, within the city, and over the course of the year. To model these data, we use covariance models over space, circular time, and linear time. We review existing models and develop new classes of nonseparable covariance models of this type, models appropriate for quasiperiodic data collected at many locations. With these covariance models, we use nearestneighbor Gaussian processes to predict hourly ozone levels at unobserved locations in April and May, the peak ozone season, to infer compliance to Mexican air quality standards and to estimate respiratory health risk associated with ozone. Predicted compliance with air quality standards and estimated respiratory health risk vary greatly over space and time. In some regions, we predict exceedance of national standards for more than a third of the hours in April and May. On many days, we predict that nearly all of Mexico City exceeds nationally legislated ozone thresholds at least once. In peak regions, we estimate respiratory risk for ozone to be 55% higher on average than the annual average risk and as much at 170% higher on some days.
Keywords: Bayesian inference, circle, environmental health, nonseparable covariance function, Mexico City, pollution monitoring
1.7 \newfloatcommandcapbtabboxtable[][\FBwidth]
1 Introduction
Groundlevel ozone is linked with short and longterm health risks in general (see, e.g., Lippmann, 1989; Salam et al., 2005; Bell et al., 2006; Weschler, 2006) and in Mexico City specifically (see Mage et al., 1996; Romieu et al., 1996; HernándezGarduño et al., 1997; Loomis et al., 1999; BravoAlvarez and TorresJardón, 2002; BarrazaVillarreal et al., 2008; RiojasRodríguez et al., 2014). In statistics, many have studied daily ozone levels (see, e.g., Sahu et al., 2007; Berrocal et al., 2010; Huang et al., 2018). However, because shortterm spikes in ozone levels are associated with increased respiratory health risk, some argue that using finer time scales, normally hourly, and treating time as continuous, is essential to quantify shortterm health risks (Chiogna and Pauli, 2011; Arisido, 2016).
The Mexico City ozone monitoring data presented here consist of hourly measurements from 24 stations from April and May, Mexico City’s peak ozone season. Of primary interest in this analysis are predicted ozone levels at unmonitored locations and associated compliance to national ambient air quality standards and respiratory health risk. For compliance, we discuss the probability of exceeding nationally legislated limits, 95 parts per billion (ppb) for hourly ozone and 70 ppb for eighthour average ozone (Diario Oficial de la Federación, 2014). For health outcomes, we use methods in Chiogna and Pauli (2011) and compare respiratory health risks during the peak ozone season to the average ozone risk over 2017. Our model must enable these goals. First, our model must allow spacetime predictions, accounting for daily seasonality or periodicity. With these predictions, we make probabilistic assessments of compliance and respiratory health risk.
Here, we provide three primary contributions. To account for the daily pattern of ozone in the Mexico City data, we envision time as a quantity that lies on a circle and the real line (i.e. a 24hour clock and linear time) (see Figure 1 for illustration). To develop appropriate models for this, we extend the current literature on covariance functions for circles cross time and demonstrate that examples from these new classes have better predictive performance for the Mexico City data than current alternatives. Then, we discuss modeling details for Vecchia models that rely on neighbor selections for model fitting and prediction (Vecchia, 1988). This problem has not been addressed in the literature for periodic data and is not as simple as covariance models that decay monotonically over space and time. Lastly, we use this modeling framework to assess compliance with air quality standards and respiratory health risks.
Periodic or seasonal data, as we define it, rise and fall with some fixed period or alternatively has covariance that follows a periodic pattern. With this loose definition of seasonality, we include covariance functions that dampen as a function of time while exhibiting seasonality, for which we borrow the term quasiperiodic from Solin and Särkkä (2014). The Mexico City ozone data exhibits this type of pattern, as we discuss in Section 2.
Treating seasonal data in discrete time is common (for reference, see West and Harrison, 1997; Prado and West, 2010; Shumway and Stoffer, 2017). These types of models have even been applied to Mexico City ozone levels (Huerta et al., 2004; White et al., 2018) but were used with different goals in mind. Because prediction at unobserved locations and potentially unobserved times is essential for our modeling goals, we argue for continuous space and continuous time models.
Although the discussion on positivedefinite functions on spheres dates to Schoenberg (1942), we take the work of Gneiting (2013) as our starting point. Extensions of this work to spheres cross time include Porcu et al. (2016) and White and Porcu (2018), while Shirota et al. (2017) developed similar covariance models for circles cross time to account for timeofday effects for modeling crime event data. We further this discussion by proposing new classes of nonseparable covariance functions specific for circles cross time and by extending current covariance classes to this setting. For discussion on tests for correlation functions on circles, see Gneiting (1998).
For scalable Gaussian models, we use methods that induce sparsity in the inverse covariance matrix. First proposed by Vecchia (1988) and then extended by Stein et al. (2004), Stein (2005), Bevilacqua et al. (2012), Gramacy and Apley (2015), Datta et al. (2016a), and Katzfuss and Guinness (2017), these methods reduce computational expense by assuming conditional independence given a neighborhood (conditioning) set. We refer to these approaches as Vecchia models and focus specifically on the nearestneighbor Gaussian process (Datta et al., 2016a).
Discussion about periodic data for neighborbased approaches is lacking in the literature. Stein (2005) considers daily data where periodicity is not present and conditions on most recent times. Datta et al. (2016b) work on the same time scale and argue using dynamic neighborhood sets that select the most correlated neighbors; however, their approaches do not easily generalize to our setting. In this article, we open this discussion for quasiperiodic covariance models.
We continue by discussing the Mexico City pollution monitoring data in Section 2. Then, we discuss appropriate covariance classes for these data and present new covariance classes for circles cross time in Section 3. Using these and other covariance classes, we discuss modeling details using Bayesian nearestneighbor Gaussian processes models in Section 4. We motivate this discussion in the Online Supplement using a single timeseries. We present the results of our analysis in Section 5, addressing compliance to Mexican ambient air quality standards and respiratory health risks associated with groundlevel ozone. Lastly, we give concluding remarks and discuss future extensions in Section 6.
2 Mexico City Ozone Monitoring Data
In this dataset, we have hourly ozone measurements for April and May of 2017 at monitoring stations across Mexico City, Mexico. At each station, we have measurements at times, giving observations in total^{1}^{1}1Although ozone levels are given hourly, these hourly quantities are derived is an average of the 60 minutebyminute measurements. Missing measurements were imputed prior to receiving the data using the measurements at the nearest station within the same region. If no simultaneous measurements in the same region were available, then the missing measurement was filled in using the nearest station in a different region.. Relative humidity (RH) and temperature (TMP) are also measured hourly at the same 24 stations, and these are used as explanatory variables.
Ozone is formed by volatile nitrous oxides exposed to heat and sunlight (Sillman, 1999); thus, ozone levels peak during the warmest hours of the day. Ozone levels are held in check by rain that clears out pollutants. To use heat and rain as explanatory variables, we include hourly temperature and relative humidity as covariates for models in Sections 5 and the Online Supplement.
First, we look at variability in ozone levels across the city. To do this, we plot station locations with their mean ozone levels in Figure 2 using the ggmap package in R (Kahle and Wickham, 2013). In general, ozone decreases as we move northward; however, this trend is not uniform, as central Mexico City has the lowest ozone values. Peak ozone levels in the south are largely explained by a wind corridor that flows northeast to southwest, moving ozone and ozone precursors produced along this wind path to southern parts of Mexico City. Additionally, ozone is often trapped in the south by mountains on Mexico City’s southwest boundary.
We explore how the data vary over time, Figure 3 displays the mean and maximum ozone concentration for each day, showing that ozone concentrations peak in May. Figure 3 plots ozone averages as a function of hour of the day and demonstrates a clear peak in ozone levels around 2:00 or 3:00 p.m.
To examine whether the daily pattern in ozone can be explained by temperature and humidity, we fit a linear model to ozone using relative humidity and temperature as covariates. Then, we examine the autocorrelation of the model residuals for each of the 24 locations (See Figure 3). For each site, the temporal autocorrelation pattern peaks every 24 hours but decays overall as a function of time. Thus, a purely periodic function or purely decaying covariance model would be insufficient for these data (we demonstrate this empirically in both our temporal analysis in the Online Supplement and Section 5). These plots motivate our theoretical discussion in Section 3.
3 Covariance Models
3.1 Covariance Modeling Approach
We start with the notation and background needed for a neat exposition of our results. We denote as the unit circle and use the mapping as the angle (great circle distance) between any two points on . We use the space to capture the circular nature of time exhibited in daily patterns in ozone levels. We use as the spatial domain, a projection of a small portion of the sphere onto a plane. Throughout, we denote as a weakly stationary spacetime Gaussian random field.
The assumption of Gaussianity implies that finite dimensional subsets of are uniquely determined by its mean and the covariance function , defined as where and are the spatial and the temporal lag, respectively. Our novel contribution is to use the temporal lag to account for differences in raw time ( itself) and timeofday, which we represent as the angle on a circle. Thus, we assume
(1) 
for some mapping . Constructing covariance functions of the type (1) is challenging, and we propose covariance functions that are partially nonseparable. As an abuse of notation, we write for and for and consider the following nonseparabilities:
 (A) nonseparability.

Here, ;
 (B) nonseparability,

obtained when ;
 (C) nonseparability,

where .
Apparently, , are covariance functions on their respective subspaces. For the Mexico City data, we also consider complete separability, , to motivate the need of nonseparable models. We detail some approaches for constructions (A), (B) and (C).
For ease of illustration, we first define Gneiting functions, , according to Gneiting (2002), as follows:
(2) 
Following Porcu et al. (2016), we define the modified Gneiting class, , as
(3) 
For both classes, the function is completely monotonic; that is, is infinitely differentiable on , satisfying , . The function is strictly positive and has a completely monotonic derivative. Here, denotes the restriction of to the interval . We provide selections for and in Tables 4 and 5 of the Online Supplement. The class cannot not capture seasonality or periodicity but is appropriate for strategy (B), i.e. (see Section B.2 for examples). We limit our selections of this type to , although many other examples exist in the literature. For strategy (A), arguments in Theorem 2 in White and Porcu (2018) ensure to provide a valid covariance function, and Theorem 1 in Porcu et al. (2016) shows that is a valid covariance function. These classes can also be used for in approach . Examples for and are given in Appendix B.1.
The marginal covariance is taken from the Matérn class , defined as
(4) 
where is the MacDonald function (Gradshteyn and Ryzhik, 2007). Well known arguments show that is a valid choice (Stein, 1999) for any . The exponential model on , , induces conditional independence between random effects given the time immediately previous, regardless of spacing (see the Online Supplement for some discussion). A severe restriction on the parameter is required for to be a valid covariance (see Gneiting, 2013): . The Matérn function can also be used to generate valid covariance on any dimensional Euclidean space (Stein, 1999). Choices for the functions for construction (C) are the core of our theoretical contributions.
3.2 Covariance Functions for Circles Cross Time
Literature on periodic covariance functions is sparse in general, with some exceptions. For fixed period , MacKay (1998) embeds the temporal mapping, , in the squared exponential correlation function to obtain a periodic covariance model. Rasmussen and Williams (2006) propose quasiperiodic covariance functions by taking the product of periodic and decaying covariance functions. Solin and Särkkä (2014) show an explicit link between some periodic and separable quasiperiodic Gaussian process specifications and statespace models. Most discussion about periodic covariance models is, however, limited to separable models like those discussed. To provide more general nonseparable covariance models for periodic data, we turn to models on circles cross time.
For hourly data, we map hourofday onto a 24hour clock or circle. As discussed, there are rich covariance classes on spheres and spheres cross time (e.g., Gneiting, 2013; Porcu et al., 2016; White and Porcu, 2018). Shirota et al. (2017) discuss nonseparable covariance classes on (circles cross space) for crime event data using logGaussian Cox processes. However, for our data, purely circular classes from Gneiting (2013) and Shirota et al. (2017) are too limited because they do not dampen over time.
The result we present is motivated by autocorrelation patterns in Figure 3 in Section 2, exhibiting both periodicity and decay. Specifically, we propose two classes on nonseparable covariance functions on . For these classes, we define the variogram of an intrinsically stationary process, , as , . We also recall that, for any covariance function on , the ratio is a correlation function. Lastly, we define as the hyperbolic sine function.
Theorem 1.
Let be a variogram and be a correlation function.
 (I)

Let be defined as
(5) Then, is a correlation function.
 (II)

Let be defined as
(6) Then, is a correlation function.
Proof of Theorem 1 is deferred to Appendix A. Examples of variograms are reported in Tables 3 and 5 in the Online Supplement. We give Equations (21), (22), and (23) in Appendix B.1 as examples from Theorem 1. These models do not decay to zero at gets large. If eventual decay to zero is preferred, then we could construct a model , where decays to zero as a function of time, but this construction did not improve predictive performance for these data.
4 Methods and Models
4.1 NearestNeighbor Gaussian Process
We envision a model for the Mexico City data as
(7)  
where is squareroot ozone measured at the locationtime pair , are relative humidity and temperature measurements, are regression coefficients, are spacetime random effects, and is Gaussian noise. We found that using squareroot ozone reduced correlation between the variance of model residuals and predicted mean. Additionally, modeling on the squareroot scale led to better predictive performance than modeling on the original scale (see also Sahu et al., 2007; Berrocal et al., 2010).
Spacetime random effects for pointreferenced data are often specified through a functional prior using a Gaussian process (GP) (see, e.g., Rasmussen and Williams, 2006; Banerjee et al., 2014). Likelihood computations for hierarchical Gaussian process models require inverting a square matrix that is the same size as the data, making Gaussian process models intractable with even moderate amounts of data (for example, ).
Many have addressed this computational bottleneck using either lowrank or sparse matrix methods (see Heaton et al., 2018, for a review and comparison of these methods). Lowrank methods project the original process onto representative points or knots (see, e.g., Higdon, 2002; Banerjee et al., 2008; Cressie and Johannesson, 2008; Stein, 2008); however, these approaches often perform poorly for prediction as they often oversmooth (see Stein, 2014; Heaton et al., 2018).
Alternatively, sparse methods either induce zeros in the covariance matrix using compactly supported covariance functions (see, e.g., Furrer et al., 2006; Kaufman et al., 2008) or in the precision matrix by assuming conditional independence (Vecchia, 1988; Stein et al., 2004). We ultimately favor conditional independence approaches because predictive performance is generally better, and the class of valid covariance models is more expansive.
Sparse precision methods date to Vecchia (1988) and were furthered by Stein et al. (2004) and Bevilacqua et al. (2012) to approximate the likelihood of the full GP model using conditional likelihoods. Gramacy and Apley (2015) and Datta et al. (2016a) extend this work to process modeling. The nearestneighbor Gaussian process is itself a Gaussian process (Datta et al., 2016a) and has good predictive performance relative to other fast FP methods (See Heaton et al., 2018).
If we take a parent Gaussian process over , the nearestneighbor Gaussian process induces sparsity in the precision matrix of the parent process by assuming conditional independence given neighborhood sets (Datta et al., 2016a). Let of distinct locationtime pairs denote the reference set, where time gives natural ordering and spatial ordering induced for observations at identical times. In our case, we order space by latitude, south to north, and take observed data as the reference set. For model validation, we include heldout locationtime pairs in the reference set, making prediction part of model fitting.
We define neighborhood sets on the reference set, where consists of nearestneighbors of chosen from . Evidently, if , . Together, and define a Gaussian directed acyclic graph (DAG) with joint distribution
(8) 
where , , and is the subset of corresponding to neighbors (Datta et al., 2016a). Like other GP models, the NNGP can be used for hierarchically spatial random effects.
4.2 Neighbor Selection
Optimal neighbor selection is not simple, and “best” subsets vary depending on covariance function and associated parameters (see, e.g., Vecchia, 1988; Datta et al., 2016b). Defining nearestneighbors only in terms of distance can lead to models that are uniformly suboptimal compared to other neighbor selections (Stein et al., 2004), but nearestneighbor approaches are often used for simplicity. Gramacy and Apley (2015) provide a thorough discussion on how to improve upon nearestneighbor selections.
For daily monitoring data, Stein (2005) argues using mostrecent neighbors. In the same setting, Datta et al. (2016b) discuss two approaches for selecting neighbors: simple neighbor selection and dynamic neighbor selection. For neighbors, simple selection involves choosing nearest spatial neighbors over the most recent times, including current times. For dynamic neighbor selection, neighbors change within model fitting to select the most correlated neighbors. In some nonseparable spacetime settings, possible neighbors can be enumerated simply, limiting the computation burden of dynamic neighbor selection (Datta et al., 2016b). Enumeration of possible neighbors for periodic time and quasiperiodic time covariance models is intractable because the covariance function has many local maxima over time.
Even for fixed periods, the pattern of covariance seasonality varies greatly depending on the model and its parameters. Thus, neighborhood selection may be specific to the model and data. Our argument is simple. Including periodic peaks, as well as locations near periodic peaks improves predictive performance. We provide an empirical discussion to this point in the Online Supplement.
4.3 Priors, Prediction, and Model Comparison
We begin here by discussing the prior distributions for our model parameters. We use inversegamma prior distributions for and with 2.1 and 10 for the shape and rate parameters. For squareroot ozone concentrations, these selections are not overly informative. For compactly supported covariance parameters, we use uniform prior distributions. For parameters that are strictly positive, we use gamma prior distributions with shape and rate of 0.01. We assume that regression coefficients a priori follow independent normal distributions with mean zero and variance . For this model formulation, model fitting details can follow either Datta et al. (2016a) or the collapsed sampler in Finley et al. (2018).
Predictions at unobserved times and locations can depend on any subset of the reference set . Because this is a retrospective analysis, we use past and future lags for prediction, where we again consider neighbor selection carefully to account for seasonality. In our case, we use the five nearest spatial neighbors with 1, 2, 23, 24, 25, and 168hour lags back and forward, as well as simultaneous spatial neighbors. Random effects at unobserved locationtime pairs have a conditional normal distribution given by
(9) 
Then, the posterior prediction done using posterior samples of , , and via composition sample (see, e.g., Gelman et al., 2014). To obtain estimates of at unmonitored locations, we weight the 24 simultaneously observed covariates proportional to inverse squared distance. Predictions at heldout locationtime pairs are used to compare competing models.
We fit models on the squareroot scale but compare models using predictive criteria on the original scale. To validate our spacetime model, we hold out all observations (i.e. all stations) at 20% of the observed hours and compare models using root mean squared prediction error (RMSPE), mean absolute prediction error (MAPE), % prediction interval coverage (CVG), and continuous ranked probability scores (CRPS) (Gneiting and Raftery, 2007), where
(10) 
We use posterior predictive samples for a Monte Carlo approximation (see, e.g., Krüger et al., 2016),
We then average over heldout data. Because CRPS considers how well the entire predictive distribution matches the observed data rather than only the predictive mean (MAPE and RMSPE) or quantiles (CVG). Furthermore, CRPS is a proper scoring rule (Gneiting and Raftery, 2007); thus, we prefer it as a selection criterion.
To assess predictions for all observations at a heldout hour jointly, we consider the energy score (ES), a multivariate generalization of CRPS. For a set of multivariate predictions Y, ES is
(11) 
where y is an observation, , and is a probability measure (Gneiting and Raftery, 2007). It is common to fix (see, e.g., Gneiting et al., 2008; Jordan et al., 2017). For a set of MCMC predictions for a heldout observation y, the empirical ES reduces to
as was done in Gneiting et al. (2008). Like CRPS, ES is a proper scoring rule (Gneiting and Raftery, 2007). To obtain a single value, average ES over heldout data.
4.4 Ozone Exposure Respiratory Risk Assessment
We use the daily respiratory risk model presented in Chiogna and Pauli (2011) to assess risk attributable to groundlevel ozone. In their paper, Chiogna and Pauli (2011) model the shortterm effects of summer ozone levels on all hospital admissions and respiratory hospital admissions. For this goal, they compare 115 models using crossvalidation. Although the risk model of Chiogna and Pauli (2011) does not include space, it can be applied locationwise.
Chiogna and Pauli (2011) found that the best model included three measures associated with an ozone threshold of ppb: number of the hours exceeding on day (we call this ), the difference between the maximum daily ozone level and (max  threshold) , and the lagged average nighttime ozone from the last three days (nighttime is defined to be 9 PM and 8 AM). We used fewer days to calculate if data from the last three nights were not available (e.g. April 2 only has one night to estimate risk). Then, the relative risk of respiratory hospital admissions on day at location s is
(12) 
where and are regression coefficients for and , respectively. The scaling factor makes the average risk at the 24 ozone monitoring stations over the year equal to one. The scale of the coefficients differs from those presented in Chiogna and Pauli (2011) because we use ozone levels in ppb instead of .
By adopting this model, we make many assumptions that we outline here. First, we assume that the effects of ozone in Milano, Italy are like those in Mexico City, Mexico. While the marginal effect may be the same, it is likely that the effects of ozone are influenced by other factors like other pollutants and weather. Second, we adopt their choice of threshold where ozone begins to be harmful (they consider three thresholds), although several question using thresholds altogether (See, e.g., Kim et al., 2004). Additionally, we consider the difference between the maximum daily level compared to a threshold instead of integrated ozone as was used in Arisido (2016). Noting these limitations, this model enables us to estimate respiratory risk on a daily level using predicted hourly ozone levels.
5 Results and Discussions
To select a covariance model, we hold out all observations at 20% of hours in April and May, as discussed in Section 4. This holdout approach allows us to score models using the energy score, as well as univariate criteria like CRPS, RMSPE, MAPE, and 90% prediction interval coverage. While RMSPE and prediction interval coverage are the most common criteria, we rely upon CRPS and ES most because they are proper scoring rules (Gneiting and Raftery, 2007) and compare the whole predictive distribution to heldout values. Outofsample predictions are made using 25,000 posterior samples after a burnin of 5,000 iterations.
We present results for a variety of covariance models on or a subset of this space. These models are given in Table 1. We first consider a completely separable covariance model on , Model 1, which is used to motivate the use of nonseparable models. We then consider two examples from Gneiting (2002). The first, given as Model 2, ignores periodicity. We also consider this same covariance model multiplied by an exponential covariance function that takes as an argument (Model 3), an example of construction (B) in Section 3. Then, we consider two examples based on the class of nonseparable covariance models presented by Shirota et al. (2017) (Models 4 and 5), where Model 4 ignores temporal decay and Model 5 is an example of construction (A). Then, we give two examples of nonseparable circular cross linear time covariance models, one from White and Porcu (2018) and the other from Theorem 1, where both models are multiplied by an exponential spatial covariance function (Models 6 and 7). Both Models 6 and 7 are examples of construction (C). Other models from these classes were considered; however, these represent the models with the best predictive performance.
All models except Model 4 use the five nearest spatial neighbors (including the location itself) with 1, 2, 23, 24, 25, and 168hour lags. We use neighbors at the same time as the observation but exclude the observation itself. Thus, we condition on at most 34 neighbors. For Model 4, we exclude lags 24 and 168 at the location itself to maintain positivedefinite covariance matrices; instead, we include lags 3, 167, and 169 to compensate. The results of the model comparison are given in Table 1.
The results highlight the importance of modeling the entire space () and utilizing nonseparable models. Models that excluded one of these subspaces (Models 2 and 4) had the worst predictive performance. The completely separable model (Model 1) was next worse in prediction, motivating the need for nonseparable models. Each of the constructive approaches, denoted (A), (B), and (C), proposed in Section 3 performed well in prediction. Ultimately, Model 7, derived from Theorem 1 and using construction (C), performed best in terms of ES, CRPS, MAPE, and RMSPE for these data. Thus, we use this model for prediction at unmonitored locations.
Explicitly, the final covariance model is
(13) 
where , and . The parameter governs the temporal range, is the spatial range parameter, and is a smoothness parameter. The parameter is most accurately understood as the spatial range for simultaneous observations, and is the temporal range at a fixed location. One benefit of this model is that it has few parameters compared with many of its competitors.
Posterior summaries for regression coefficients and covariance parameters are given in Table 2. The regression parameters agree with our hypotheses that relative humidity is negatively related to ozone levels and the temperature is positively associated with ozone. The mean of the spatial range parameter is 22.32 km. The posterior mean of the temporal range parameter is 86.90 hours, meaning that correlation for this model persists over several days. However, these range parameters have limited interpretability, as discussed above.
We limit our predictions to the convex hull of the monitoring network to not spatially extrapolate. Using these posterior predictions in April and May, we can assess probabilities of exceeding Mexican ambient air quality standards and examine respiratory risk rates during Mexico City’s peak groundlevel ozone season.
To summarize exceedance results temporally, we plot posterior means and 95% credible intervals for the estimated proportion of the city that exceeds national ozone standards at least once on that day in Figure 4. In April, there are several peaks in exceedance probability; however, from May 6 to May 28, the proportion of the city with at least one daily ozone exceedance is near one.
For each predictive location, we plot the estimated proportion of hours when Mexican groundlevel ozone standards (either onehour or eighthour ozone) are exceeded in Figure 5. This follows the pattern in Figure 2, but the degree of variation between locations is striking. We estimate that locations in southern Mexico City exceed national ozone regulations nearly three times as often as north, west, and central Mexico City.
In Figures 5, we give the mean relative risk averaged over all days in April and May, and we plot the estimated maximum relative risk over the same time period in Figure 5. Although the spatial patterns of the means and maxima are similar, the scale of these plots differs significantly. In terms of mean risk, the most extreme regions have respiratory risks 50% higher than the least extreme regions. In contrast, regions of the highest maximum risk in the peak season are nearly twice that of regions with the lowest maximum risk. These plots demonstrate the degree of increased health risk determined solely by where one lives or works.
The estimated mean and maximum respiratory health risk over Mexico City (with 95% credible intervals) are given as a function of the day in Figure 6. Ozone risk peaks May 17 to May 23 in terms of both the mean and maximum, but the changes in the maximum risk are much more drastic than those in mean risk. These maxima correspond to extreme ozone levels in south Mexico City where the estimated risk is 2.7 times the annual average and nearly two times the mean risk on the same day.
These results highlight how extreme the spacetime variability in both exceedance probability and respiratory health risk is within Mexico City during its peak ozone season. According to the risk model from Chiogna and Pauli (2011), those living in regions with extreme ozone levels are about 50% more likely to be admitted to the hospital due to ozone exposure compared to those in less polluted areas. Although this is not a component of the risk model, increased respiratory risk impacts atrisk populations (e.g. the elderly) at a much higher rate than healthier subpopulations (see, e.g., Bell et al., 2004).
6 Discussion and Conclusions
We have discussed the monitoring network for ozone and analyzed these data in April and May of 2017. For these data, we develop new classes of covariance models that account for daily seasonality. We apply these covariance models to the Mexico City ozone data using nearestneighbor Gaussian processes and select a model using predictive criteria on a holdout dataset. This model is then used for prediction to assess compliance with Mexican ambient air quality standards and respiratory risk due to ozone exposure. In this analysis, we identify regions and times where exceedance probabilities and ozone risk factors are particularly high.
An interesting followup study would account for people’s movement over the day, giving individualized risk assessments. In addition to this, we could combine our modeling approaches with hospital admission data in Mexico City, allowing us to establish more accurate respiratory risk measures for Mexico City, rather of relying on other methods.
Further theoretical work could address how the constraints on the functions and in Shirota et al. (2017) can be relaxed while preserving positivedefiniteness. Here, we have considered models that are partially nonseparable over but not over all three spaces. Thus, further theoretical work to establish such classes is justified.
Appendix A Proof of Theorem 1
We start by noting that arguments in Berg and Porcu (2017) show that the functions defined at points (I) and (II) are positive definite if and only if they can both be written as
(14) 
where uniquely determined sequence of positive definite functions has . To show (I), we resort to in Gradshteyn and Ryzhik (2007):
The function in Equation (5) admits an expansion of the type (14) with coefficients for . It can be namely verified that is finite. Thus, the proof is completed. As for Assertion (II), Equation (6) comes straight by considering the expansion in [1.449.2] in Gradshteyn and Ryzhik (2007):
which is absolutely convergent provided . We now replace with the correlation function to obtain (6). The proof is completed.
Appendix B Covariance Examples
b.1 Temporal Modeling
We begin with two purely circular covariance examples from Gneiting (2013) in (15) and (16). For and , we define
(15)  
(16) 
In addition to the Matérn model presented in Section 3, we consider a pair of covariance models on from what we call the inverted Gneiting class in (Porcu et al., 2016). Let , and . Also, let and be positive parameters. Then
(17) 
where , and where and belong to the interval . The second we consider uses the generalized Cauchy covariance function (Gneiting and Schlather, 2004, see Tables 4 and 5 in the Online Supplement) for the temporal margin,
(18) 
Examples from White and Porcu (2018) include
(19) 
with , and . For the second, we propose again a generalized Cauchy covariance function for the spatial margin, obtaining
(20) 
where , , and .
From Theorems 1, we give three examples. Let and . Then, we propose
(21)  
(22)  
(23) 
b.2 SpaceTime Models
To create effective spacetime models, we suggest combining models from Appendix B.1 with valid spatial models. Additionally, we provide examples of covariance models on and which can be combined with valid models on and given in Appendix B.1.
First, we look at nonseparable covariance models on from the socalled Gneiting class from Gneiting (2002) given in (2). From this class, we give two examples. For the first, we propose again a generalized Cauchy covariance function for the spatial margin, obtaining
(24) 
for , and . To incorporate periodicity, giving a model on , one take the product of one of these models with, e.g., (4) or (15).
Shirota et al. (2017) extends these models to , replacing the temporal lag with the angular difference on a circle. We give one example in Equation (25). For the second, we propose again a generalized Cauchy covariance function for the spatial margin, obtaining
(25) 
for , and . To incorporate a decaying covariance over linear time, one simply takes the product of one of these models with a covariance model on .
References
 Arisido (2016) Arisido, M. W. (2016). Functional Measure of Ozone Exposure to Model Shortterm Health Effects. Environmetrics, 27(5):306–317.
 Banerjee et al. (2014) Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2014). Hierarchical Modeling and Analysis for Spatial Data. CRC Press.
 Banerjee et al. (2008) Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H. (2008). Gaussian Predictive Process Models for Large Spatial Data Sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4):825–848.
 BarrazaVillarreal et al. (2008) BarrazaVillarreal, A., Sunyer, J., HernandezCadena, L., EscamillaNuñez, M. C., SienraMonge, J. J., RamírezAguilar, M., CortezLugo, M., Holguin, F., DiazSánchez, D., and Olin, A. C. (2008). Air Pollution, Airway Inflammation, and Lung Function in a Cohort Study of Mexico City Schoolchildren. Environmental Health Perspectives, 116(6):832.
 Bell et al. (2004) Bell, M. L., McDermott, A., Zeger, S. L., Samet, J. M., and Dominici, F. (2004). Ozone and Shortterm Mortality in 95 US Urban Communities, 19872000. Journal of the American Medical Association, 292(19):2372–2378.
 Bell et al. (2006) Bell, M. L., Peng, R. D., and Dominici, F. (2006). The Exposure–Response Curve for Ozone and Risk of Mortality and the Adequacy of Current Ozone Regulations. Environmental Health Perspectives, 114(4):532.
 Berg and Porcu (2017) Berg, C. and Porcu, E. (2017). From Schoenberg Coefficients to Schoenberg Functions. Constructive Approximation, 45(2):217–241.
 Berrocal et al. (2010) Berrocal, V. J., Gelfand, A. E., and Holland, D. M. (2010). A SpatioTemporal Downscaler for Output from Numerical Models. Journal of Agricultural, Biological, and Environmental Statistics, 15(2):176–197.
 Bevilacqua et al. (2012) Bevilacqua, M., Gaetan, C., Mateu, J., and Porcu, E. (2012). Estimating Space and SpaceTime Covariance Functions for Large Data Sets: A Weighted Composite Likelihood Approach. Journal of the American Statistical Association, 107(497):268–280.
 BravoAlvarez and TorresJardón (2002) BravoAlvarez, H. and TorresJardón, R. (2002). Air Pollution Levels and Trends in the Mexico City Metropolitan Area. In Urban Air Pollution and Forests, pages 121–159. Springer.
 Chiogna and Pauli (2011) Chiogna, M. and Pauli, F. (2011). Modelling Shortterm Effects of Ozone on Morbidity: An Application to the Cty of Milano, Italy, 1995–2003. Environmental and Ecological Statistics, 18(1):169–184.
 Cressie and Johannesson (2008) Cressie, N. and Johannesson, G. (2008). Fixed Rank Kriging for Very Large Spatial Data Sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1):209–226.
 Datta et al. (2016a) Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E. (2016a). Hierarchical NearestNeighbor Gaussian Process Models for Large Geostatistical Datasets. Journal of the American Statistical Association, 111(514):800–812.
 Datta et al. (2016b) Datta, A., Banerjee, S., Finley, A. O., Hamm, N. A., and Schaap, M. (2016b). Nonseparable Dynamic Nearest Neighbor Gaussian Process Models for Large SpatioTemporal Data with an Application to Particulate Matter Analysis. The Annals of Applied Statistics, 10(3):1286–1316.
 Diario Oficial de la Federación (2014) Diario Oficial de la Federación (2014). Norma Oficial Mexicana NOM025SSA12014.
 Finley et al. (2018) Finley, A. O., Datta, A., Cook, B. C., Morton, D. C., Andersen, H. E., and Banerjee, S. (2018). Efficient Algorithms for Bayesian Nearest Neighbor Gaussian Processes. arXiv preprint arXiv:1702.00434.
 Furrer et al. (2006) Furrer, R., Genton, M. G., and Nychka, D. (2006). Covariance Tapering for Interpolation of Large Spatial Datasets. Journal of Computational and Graphical Statistics, 15(3):502–523.
 Gelman et al. (2014) Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014). Bayesian Data Analysis, volume 2. CRC press.
 Gneiting (1998) Gneiting, T. (1998). Simple Tests for the Validity of Correlation Function Models on the Circle. Statistics & Probability Letters, 39(2):119–122.
 Gneiting (2002) Gneiting, T. (2002). Nonseparable, Stationary Covariance Functions for Space–Time Data. Journal of the American Statistical Association, 97(458):590–600.
 Gneiting (2013) Gneiting, T. (2013). Strictly and NonStrictly Positive Definite Functions on Spheres. Bernoulli, 19(4):1327–1349.
 Gneiting and Raftery (2007) Gneiting, T. and Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477):359–378.
 Gneiting and Schlather (2004) Gneiting, T. and Schlather, M. (2004). Stochastic Models That Separate Fractal Dimension and the Hurst Effect. SIAM Rev., 46:269–282.
 Gneiting et al. (2008) Gneiting, T., Stanberry, L. I., Grimit, E. P., Held, L., and Johnson, N. A. (2008). Assessing Probabilistic Forecasts of Multivariate Quantities, with an Application to Ensemble Predictions of Surface Winds. Test, 17(2):211.
 Gradshteyn and Ryzhik (2007) Gradshteyn, I. S. and Ryzhik, I. M. (2007). Tables of Integrals, Series, and Products. Academic Press, Amsterdam, seventh edition.
 Gramacy and Apley (2015) Gramacy, R. B. and Apley, D. W. (2015). Local Gaussian Process Approximation for Large Computer Experiments. Journal of Computational and Graphical Statistics, 24(2):561–578.
 Heaton et al. (2018) Heaton, M. J., Datta, A., Finley, A., Furrer, R., Guhaniyogi, R., Gerber, F., Gramacy, R. B., Hammerling, D., Katzfuss, M., and Lindgren, F. (2018). Methods for Analyzing Large Spatial Data: A Review and Comparison. arXiv preprint arXiv:1710.05013.
 HernándezGarduño et al. (1997) HernándezGarduño, E., PérezNeria, J., Paccagnella, A. M., PiñaGarcía, M. A., MunguíaCastro, M., CatalánVázquez, M., and RojasRamos, M. (1997). Air Pollution and Respiratory Health in Mexico City. Journal of Occupational and Environmental Medicine, 39(4):299–307.
 Higdon (2002) Higdon, D. (2002). Space and SpaceTime Modeling Using Process Convolutions. In Quantitative Methods for Current Environmental Issues, pages 37–56. Springer.
 Huang et al. (2018) Huang, G., Lee, D., and Scott, E. M. (2018). Multivariate SpaceTime Modelling of Multiple Air Pollutants and Their Health Effects Accounting for Exposure Uncertainty. Statistics in Medicine, 37(7):1134–1148.
 Huerta et al. (2004) Huerta, G., Sansó, B., and Stroud, J. R. (2004). A spatiotemporal model for mexico city ozone levels. Journal of the Royal Statistical Society: Series C (Applied Statistics), 53(2):231–248.
 Jordan et al. (2017) Jordan, A., Krüger, F., and Lerch, S. (2017). Evaluating Probabilistic Forecasts with the R Package scoringRules. arXiv preprint arXiv:1709.04743.
 Kahle and Wickham (2013) Kahle, D. and Wickham, H. (2013). ggmap: Spatial Viualization with ggplot2. The R Journal, 5(1):144–161.
 Katzfuss and Guinness (2017) Katzfuss, M. and Guinness, J. (2017). A General Framework for Vecchia Approximations of Gaussian Processes. arXiv preprint arXiv:1708.06302.
 Kaufman et al. (2008) Kaufman, C. G., Schervish, M. J., and Nychka, D. W. (2008). Covariance Tapering for LikelihoodBased Estimation in Large Spatial Data Sets. Journal of the American Statistical Association, 103(484):1545–1555.
 Kim et al. (2004) Kim, S.Y., Lee, J.T., Hong, Y.C., Ahn, K.J., and Kim, H. (2004). Determining the Threshold Effect of Ozone on Daily Mortality: An Analysis of Ozone and Mortality in Seoul, Korea, 1995–1999. Environmental Research, 94(2):113–119.
 Krüger et al. (2016) Krüger, F., Lerch, S., Thorarinsdottir, T. L., and Gneiting, T. (2016). Probabilistic Forecasting and Comparative Model Assessment Based on Markov Chain Monte Carlo Output. arXiv preprint arXiv:1608.06802.
 Lippmann (1989) Lippmann, M. (1989). Health Effects of Ozone a Critical Review. Japca, 39(5):672–695.
 Loomis et al. (1999) Loomis, D., Castillejos, M., Gold, D. R., McDonnell, W., and BorjaAburto, V. H. (1999). Air Pollution and Infant Mortality in Mexico City. Epidemiology, pages 118–123.
 MacKay (1998) MacKay, D. J. (1998). Introduction to Gaussian Processes. NATO ASI Series F Computer and Systems Sciences, 168:133–166.
 Mage et al. (1996) Mage, D., Ozolins, G., Peterson, P., Webster, A., Orthofer, R., Vandeweerd, V., and Gwynne, M. (1996). Urban Air Pollution in Megacities of the World. Atmospheric Environment, 30(5):681–686.
 Porcu et al. (2016) Porcu, E., Bevilacqua, M., and Genton, M. G. (2016). SpatioTemporal Covariance and CrossCovariance Functions of the Great Circle Distance on a Sphere. Journal of the American Statistical Association, 111(514):888–898.
 Prado and West (2010) Prado, R. and West, M. (2010). Time Series: Modeling, Computation, and Inference. CRC Press.
 Rasmussen and Williams (2006) Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. The MIT Press, Cambridge, MA.
 RiojasRodríguez et al. (2014) RiojasRodríguez, H., ÁlamoHernández, U., TexcalacSangrador, J. L., and Romieu, I. (2014). Health Impact Assessment of Decreases in PM10 and Ozone Concentrations in the Mexico City Metropolitan Area: A Basis for a New Air Quality Management Program. Salud Pública de México, 56:579–591.
 Romieu et al. (1996) Romieu, I., Meneses, F., Ruiz, S., Sienra, J. J., Huerta, J., White, M. C., and Etzel, R. A. (1996). Effects of Air Pollution on the Respiratory Health of Asthmatic Children lLving in Mexico City. American Journal of Respiratory and Critical Care Medicine, 154(2):300–307.
 Sahu et al. (2007) Sahu, S. K., Gelfand, A. E., and Holland, D. M. (2007). HighResolution Space–Time Ozone Modeling for Assessing Trends. Journal of the American Statistical Association, 102(480):1221–1234.
 Salam et al. (2005) Salam, M. T., Millstein, J., Li, Y.F., Lurmann, F. W., Margolis, H. G., and Gilliland, F. D. (2005). Birth Outcomes and Prenatal Exposure to Ozone, Carbon Monoxide, and Particulate Matter: Results from the Childrenâs Health Study. Environmental Health Perspectives, 113(11):1638.
 Schoenberg (1942) Schoenberg, I. J. (1942). Positive Definite Functions on Spheres. Duke Math. Journal, 9:96–108.
 Shirota et al. (2017) Shirota, S., Gelfand, A. E., et al. (2017). Space and Circular Time Log Gaussian Cox Processes with Application to Crime Event Data. The Annals of Applied Statistics, 11(2):481–503.
 Shumway and Stoffer (2017) Shumway, R. H. and Stoffer, D. S. (2017). Time Series Analysis and Its Applications. Springer.
 Sillman (1999) Sillman, S. (1999). The Relation Between Ozone, NOx and Hydrocarbons in Urban and Polluted Rural Environments. Atmospheric Environment, 33(12):1821–1845.
 Solin and Särkkä (2014) Solin, A. and Särkkä, S. (2014). Explicit Link between Periodic Covariance Functions and State Space Models. In Artificial Intelligence and Statistics, pages 904–912.
 Stein (1999) Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer Science & Business Media.
 Stein (2005) Stein, M. L. (2005). Statistical Methods for Regular Monitoring Data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(5):667–687.
 Stein (2008) Stein, M. L. (2008). A Modeling Approach for Large Spatial Datasets. Journal of the Korean Statistical Society, 37(1):3–10.
 Stein (2014) Stein, M. L. (2014). Limitations on Low Rank Approximations for Covariance Matrices of Spatial Data. Spatial Statistics, 8:1–19.
 Stein et al. (2004) Stein, M. L., Chi, Z., and Welty, L. J. (2004). Approximating Likelihoods for Large Spatial Data Sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(2):275–296.
 Vecchia (1988) Vecchia, A. V. (1988). Estimation and Model Identification for Continuous Spatial Processes. Journal of the Royal Statistical Society. Series B (Methodological), pages 297–312.
 Weschler (2006) Weschler, C. J. (2006). Ozoneâs Impact on Public Health: Contributions from Indoor Exposures to Ozone and Products of OzoneInitiated Chemistry. Environmental Health Perspectives, 114(10):1489.
 West and Harrison (1997) West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models. SpringerVerlag New York, Inc., New York, NY, USA.
 White et al. (2018) White, P. A., Gelfand, A. E., Rodrigues, E. R., and Tzintzun, G. (2018). Pollution State Modeling for Mexico City. arXiv preprint arXiv:1807.03935.
 White and Porcu (2018) White, P. A. and Porcu, E. (2018). Towards a Complete Picture of Covariance Functions on Spheres Cross Time. arXiv preprint arXiv:1807.04272.
Model  Equation  Space  ES  CRPS  MAPE  RMSPE  90% CVG 

1  21.518  3.530  4.752  6.611  0.914  
2  (24)  24.889  4.139  5.645  7.554  0.932  
3  (24)  19.941  3.238  4.531  6.085  0.903  
4  (25)  24.726  4.295  5.851  7.890  0.914  
5  (25)  19.773  3.249  4.309  6.050  0.908  
6  (20)  21.045  3.452  4.635  6.514  0.911  
7  (23)  19.652  3.214  4.273  6.025  0.919 
Mean  Standard Deviation  2.5%  97.5%  

7.3943  0.5356  6.0396  8.1924  
0.0207  0.0012  0.0230  0.0180  
0.1129  0.0068  0.0994  0.1263  
86.9006  4.9505  77.6969  97.4642  
22.3190  0.7020  20.8030  23.6254  
0.6740  0.0095  0.6558  0.6940  
0.0947  0.0019  0.0909  0.0985  
2.0981  0.0502  2.0015  2.1986  
0.9568  0.0013  0.9542  0.9592 