Locally stationary spatiotemporal interpolation of
Argo profiling float data
Abstract
Argo floats measure sea water temperature and salinity in the upper 2,000 m of the global ocean. The statistical analysis of the resulting spatiotemporal dataset is challenging due to its nonstationary structure and large size. We propose mapping these data using locally stationary Gaussian process regression where covariance parameter estimation and spatiotemporal prediction are carried out in a movingwindow fashion. This yields computationally tractable nonstationary anomaly fields without the need to explicitly model the nonstationary covariance structure. We also investigate Student distributed microscale variation as a means to account for nonGaussian heavy tails in Argo data. We use crossvalidation to study the point prediction and uncertainty quantification performance of the proposed approach. We demonstrate clear improvements in the point predictions and show that accounting for the nonstationarity and nonGaussianity is crucial for obtaining wellcalibrated uncertainties. The approach also provides datadriven local estimates of the spatial and temporal dependence scales which are of scientific interest in their own right.
Keywords: Movingwindow Gaussian process regression, local kriging, nonstationarity, nonGaussianity, climatology, physical oceanography
1 Introduction
Subsurface open ocean has traditionally been one of the least studied places on Earth, due to a lack of observational data at fine enough spatial and temporal resolutions. That changed dramatically roughly a decade ago with the introduction of the Argo array of profiling floats. Argo is a collection of nearly 4,000 autonomous floats that measure temperature and salinity in the upper 2,000 m of the ocean. The array’s nearly uniform days sampling of the global ocean has enabled oceanographers to study the subsurface ocean at unprecedented accuracy and scale. Argo data have been used, for example, to quantify global changes in the ocean heat content [1], to study ocean circulation [2], mesoscale eddies [3], internal waves [4] and tropical cyclones [5] and to improve climate model predictions [6]. Argo has now become the primary source of subsurface temperature and salinity data for these and hundreds of other studies of ocean climate and dynamics.
A significant portion of scientific results from Argo rely on spatially and temporally interpolated temperature and salinity maps, such as those in [7, 8, 9]. These data products transform the irregularly located Argo observations into a fine regular grid which facilitates further scientific analysis. In a sense, the goal is to “fill in the gaps” between the insitu Argo observations, such as those shown in Figure 1, and to turn them into a continuously interpolated field in both space and time. To achieve this, a number of statistical modeling assumptions need to be made and the resulting interpolated maps, along with their uncertainties, may be sensitive to these choices. Given the unique nature of Argo data and the scientists’ reliance on the gridded maps, it is of utmost importance to produce the interpolations using the most appropriate statistical techniques and to rigorously understand their performance and limitations.
From the statistical perspective, Argo observations constitute a fascinatingly rich geostatistical dataset. There is a huge volume of data (more than 1.6 million profiles, each having between 50–1,000 vertical observations, have been collected to date), the data are nonstationary in both their mean and covariance structure and exhibit heavy tails and other nonGaussian features. Argo is also a rare example of a truly fourdimensional (3 space + time) insitu observational dataset. Due to their cubic computational scaling and typical assumptions of stationarity and Gaussianity, classical geostatistical interpolation techniques, as reviewed for example in [10], cannot directly handle datasets of this size and complexity.
In this paper, we propose a statistical framework for interpolating Argo data that is aimed at addressing both the computational issues caused by the size of the dataset and the modeling challenges caused by the nonstationarity of these data. The approach is based on the basic idea that if a prediction is desired at the spatiotemporal location , where is the spatial location in degrees of latitude and longitude and is time, then the observations that are close to in the space should be the most informative for making the prediction. Indeed, a data point in, say, Northern Atlantic in February is of little value in predicting temperatures in Southern Pacific in August. Furthermore, while it is clear from a simple exploratory inspection of Argo data that a global random field model for the ocean would need to be nonstationary, it is reasonable to assume that the data can be locally modeled as a stationary random field. (Assuming that the mean has been successfully removed, a spatiotemporal random field is stationary if the covariance is a function of only, .) We combine these two ideas by considering data only within a small spatiotemporal neighborhood of and by assuming that this subset of data can be modeled as a stationary random field. We use the data within the neighborhood to first estimate the unknown parameters of the random field model using maximum likelihood and then to perform the interpolation. As first proposed by Haas in [11, 12], we use the neighborhoods in a movingwindow fashion: when moving to the next grid point, the window is recentered and the parameters reestimated. This leads to locally stationary gridded predictions of the globally nonstationary random field and datadriven local estimates of the globally varying spatiotemporal dependence structure. The approach is computationally efficient since it considers only a subset of the full dataset at a time and since the computations across the grid points can be fully parallelized. In a recent line of work described in [13, 14, 15, 16], a closely related movingwindow method has been developed and successfully applied to mapping of satellite observations.
The Roemmich–Gilson climatology (along with the associated anomalies) [7] is one of the more popular gridded Argo data products. Roemmich and Gilson first estimate the mean field using a weighted local spatiotemporal regression fit to several years of Argo data and then perform kriging [17, 18] (also known as optimal interpolation [19] or objective analysis/mapping [20] and closely related to Gaussian process regression [21]) on the meansubtracted monthly residuals to obtain the interpolated anomaly fields. In this work, we use the Roemmich–Gilson mean field, but improve the modeling of the anomalies in three important ways: first, we include time in the interpolation; second, we use datadriven local estimates of the nonstationary covariance structure as described above; and third, we consider Student distributed microscale variation (the socalled nugget effect) in order to account for nonGaussian heavy tails in Argo data. We investigate the point prediction and uncertainty quantification performance of the proposed approach using crossvalidation studies. We demonstrate that adding the temporal component to the mapping leads to major performance improvements, while the locally estimated model parameters and the Student nugget are crucial for obtaining reasonable uncertainties. The uncertainty quantification part is particularly important as the Roemmich–Gilson data product does not currently provide any uncertainty information, presumably due to challenges in modeling the nonstationary and nonGaussian features of the data. Furthermore, our local estimates of the model parameters contain information about ocean dynamics and are of scientific interest in their own right. In fact, even the basic idea of using the observed data to estimate the covariance parameters needed in the mapping appears to be largely unexplored in the oceanographic context. Indeed, most of the existing Argo data products (see Sections 2(b) and 2(c) for a literature review) do not use formal datadriven criteria, such as maximum likelihood, to estimate the model parameters, even though such techniques are standard in the relevant statistics literature [18, 21, 17, 10]. A notable exception is the work described in [2, 22] where ocean velocity fields are mapped based on variogram fits to Argo data.
In the present paper we focus primarily on interpolating Argo temperature anomalies, but similar techniques can be developed for the salinity fields. The rest of this paper is structured as follows: Section 2 provides an overview of Argo and reviews the existing gridded Argo data products with an emphasis on the Roemmich–Gilson climatology. Section 3 explains the proposed locally stationary spatiotemporal interpolation methodology. Section 4 applies the new approach to Argo data and studies its performance in terms of point predictions, uncertainty quantification and the estimated model parameters. Section 5 concludes and discusses directions for future work.
2 Overview of Argo and existing data products
(a) Argo floats and data
Argo [23, 24] is a global array of profiling floats for observing the subsurface ocean. The floats follow a regular cycle to measure temperature (), salinity () and pressure () in the upper 2,000 meters of the water column; see Figure 2(a). The cycle starts when a float at the surface changes its buoyancy and descends to the parking depth of 1,000 meters. The float stays at that depth for approximately 9 days. During that time, it does not collect data and simply drifts with the current. After this, the float descends further down to 2,000 meters and turns on its measurement devices. The float then ascends back to the surface recording insitu temperature, salinity and pressure triplets along the water column. The end result is a temperature and salinity profile for the upper 2,000 meters of the ocean. Once the float reaches the surface, it obtains a satellite position fix, sends its data via satellite to groundbased data centers and starts a new cycle. Each cycle takes approximately 10 days to complete.
There are currently almost 4,000 Argo floats operating in the world’s oceans. This results in nearly uniform global coverage at roughly spatial resolution and 10 days temporal resolution. Figure 1 illustrates a monthly aggregate of Argo data at 300 db. (Pressure is measured in desibars, db. A pressure change of 1 db is approximately equivalent to a depth change of 1 m.) The first Argo floats were deployed in 1999, sparse global coverage was achieved in 2004 and the design resolution in 2007. While satellites provide plenty of data of the ocean surface, there are few other sources of data besides Argo for the subsurface ocean. Indeed, the preArgo shipbased subsurface temperature and salinity observations are sparse in both space and time with heavy sampling biases towards the Northern Hemisphere, coastal areas and summer months, while Argo provides much larger quantities of data and much more uniform sampling [24]. This improvement is particularly dramatic in the Southern Ocean during winter months, where preArgo subsurface data are extremely scarce [7, Figure 1.1].
Argo data are transmitted using either the Argos or Iridium satellite systems. Newer floats (Figure 2(b) shows a SOLOII float at the Scripps Institution of Oceanography) primarily use Iridium, while roughly a half of the current array uses the older Argos system. The transmission system has some important implications on the quality of Argo data: the limited bandwidth of the Argos system allows these floats to transmit only 50–100 data points per profile, while the Iridium floats can transmit up to a 1,000 observations at a time, leading to a finer vertical resolution. To ensure successful data transmission, the Argos floats need to remain on the surface for 6–12 hours, while Iridium data transmission takes only 15 minutes. This improves the lifetime and decreases the undesired surface drift of the Iridium floats. The Argos floats also rely on the same satellite system for location information, while the Iridium floats use GPS, improving the accuracy of the position data.
The nominal accuracy of Argo temperature, salinity and pressure data are 0.005 C, 0.01 psu and 2.5 db, respectively [24]. (Salinity is measured in practical salinity units, psu.) The temperature data are considered very reliable, but pressure and salinity measurements may need to be adjusted for sensor offsets and drift. Argo data are distributed in two ways: realtime data is made available within 24 hours and has passed through various automatic checks and adjustments, while delayedmode data has been manually checked by an oceanographic expert and may contain finer corrections for sensor offsets and drift [25]. Depending on the float model and configuration, the reported temperature and salinity values may be spot samples or pressurebin averages. The vertical sampling is also often nonuniform, with finer sampling closer to the surface. Some floats also sample at finer temporal resolution than the nominal 10 days, in particular in the Kuroshio region near Japan.
Two extensions of the basic Argo program are currently under development [24]: The Deep Argo program aims to extend the vertical coverage of the Argo array to 6,000 meters, while biogeochemical Argo floats are to be equipped with extra sensors for measuring biogeochemical variables, such as dissolved oxygen, nitrate, chlorophyll and pH. Pilot arrays of both programs have been deployed in certain areas of the ocean.
(b) Roemmich–Gilson climatology
The Roemmich–Gilson (RG) climatology and its anomalies [7, 26] are constructed by first estimating a seasonally varying mean field and them performing kriging on the meansubtracted monthly residuals. The vertical dimension is handled by binning the profiles into 58 pressure bins, whose size increases with depth. The mean field is estimated using 3 adjacent pressure levels, but the kriging for the anomalies is carried out using data from only one pressure level at a time.
The RG mean field is a weighted local leastsquares spatiotemporal regression fit that is carried out separately for each latitudelongitude grid point and each depth level. The regression function is (John Gilson, personal communication, 2016)
(1) 
where is latitude, is longitude, is pressure and is time in days within a year. This function is fitted to nearest neighbors, where the factors refer to the 3 depth levels and the 12 calendar months. The nearest neighbors are found across the entire Argo data set and are given weights according to their horizontal distance from the grid point. The horizontal distance metric is
(2) 
where , is the meridional distance in kilometers, is the zonal distance in kilometers (converted from degrees to kilometers using the midpoint latitude) and is a penalty term for crossing ocean depth contours (see [7] for details). Once fitted, the regression function is evaluated at the grid point for the midpoint of each month to produce the monthly mean field estimates for that latitude, longitude and depth.
The anomalies are computed one month at a time. Here the crucial modeling choice concerns the covariance structure of those fields. The Roemmich–Gilson covariance function is
(3) 
where is otherwise the same distance metric as in Equation (2), but with replaced by , where [7, 26]
(4) 
Using instead of has the effect of elongating the zonal dependence in the tropics. This results in a nonstationary covariance function that varies in the zonal direction depending on the latitude. However, the meridional range is kept constant and the same covariance function is used at all longitudes, all depths, all seasons and for both the temperature and the salinity anomalies. Also the noisetosignal variance ratio (the ratio of the nugget variance and the Gaussian process variance in the terminology and notation of Section 3) is set to be the constant 0.15 throughout the global ocean.
The functional form and the parameter values in (3) are motivated by the observed empirical correlation of the steric height anomaly (essentially a vertical integral of the density anomaly; see Section 7.6.2 in [27]) in Argo and satellite altimetry data (see Figure 2.2 in [7]). The parameter values are chosen by a graphical comparison of the correlation functions and not through a formal estimation procedure, such as maximum likelihood or weighted least squares. Since the RG covariance function is chosen based on a vertically integrated quantity, it is likely to be suboptimal for mapping quantities in the mixed layer, which contributes only a little to the vertical integral, but where atmospheric interaction is expected to cause the dependence structure to differ from the deeper ocean.
An important limitation of the Roemmich–Gilson climatology is that it does not provide uncertainty estimates. In principle the formal kriging variance could be used to provide Gaussian prediction intervals, but this would require defining the proportionality constant in Equation (3) (this constant cancels out in the kriging point predictions). Even if this proportionality constant was provided, the uncertainties are unlikely to be reliable given that the RG covariance does not vary with depth and uses a fixed noisetosignal variance ratio. Another limitation is that the RG covariance is only a function of the spatial locations and does not take the temporal dimension into account.
Gasparin et al. [28] improve the Roemmich–Gilson model by including time in the covariance and adding nonstationarity in the meridional range parameter. They also briefly investigate the prediction uncertainties. But their analysis is limited to the Equatorial Pacific only, their covariance remains the same for all depths and all longitudes and their covariance parameters are not based on formal statistical estimates. In this work, we go further and develop datadriven covariances for Argo temperature data that can vary as a function of latitude, longitude, depth and season. Covariance estimates, anomaly maps and uncertainties are investigated in the global ocean at three exemplary depths. The covariances include time and nonstationarity is allowed in all covariance parameters, including the zonal, meridional and temporal range parameters.
(c) Other Argo data products
Besides the Roemmich–Gilson climatology, several other gridded Argo data products have been produced. We give here a brief overview of some of these products. Our treatment is by no means exhaustive and a full list of Argo data products can be found at http://www.argo.ucsd.edu/Gridded_fields.html.
The defining feature of the MIMOC product [8] is that it is mapped on isopycnals (surfaces of constant density) instead of pressure surfaces. This approach may have distinct advantages in handling the vertical movement of isopycnals and in avoiding density inversions. The EN4 data product [9] provides uncertainty estimates, but its uncertainty quantification procedure seems statistically ad hoc, is only validated in the rootmeansquare sense and shows signs of miscalibration below approximately 400 m. ISAS [29] and MOAA GPV [30] are further examples of krigingbased Argo data products. For most products, there is some effort to use physical data to justify the chosen covariance parameters but no formal statistical estimators of these parameters are used. An exception is the work of Gray and Riser [2, 22] that uses a weighted leastsquares variogram fit and an iteratively estimated mean field to map ocean velocity fields based on Argo data. Out of the data products mentioned here, [7] and [2] use Argo data only, while the other products also incorporate data from other sources.
3 Locally stationary interpolation of Argo data
This section describes the statistical methodology we propose for interpolating Argo temperature data. The approach is based on a locally stationary spatiotemporal Gaussian process (GP) regression model. We start by linearly interpolating the Argo temperature profiles to a given pressure level. We then subtract the Roemmich–Gilson mean field (see Sections 2(b) and 4(a)) and work with the residuals, which are assumed to have zero mean. Our goal is to use these residuals to produce gridded maps of temperature anomalies. Similar to the Roemmich–Gilson climatology, our analysis is carried out separately for each pressure level.
Let with be a spacetime grid point for which a prediction is desired. We assume that within a small spatiotemporal neighborhood around the following model holds:
(5) 
where refers to years and to observations within at the desired pressure level in the th year, is the th temperature measurement (with mean removed), and are the location (in degrees of latitude and longitude) and time (in days within a year) of and denotes a zeromean Gaussian process with a stationary spacetime covariance function depending on parameters . We use the nonseparable anisotropic exponential spacetime covariance function , where the GP variance ,
(6) 
and , and are positive range parameters. The nugget effect is independent of and assumed to follow either , where is the zeromean Gaussian distribution with variance , or , where is the scaled Student distribution with degrees of freedom and scale parameter (that is, if and only if , where is the Student distribution with degrees of freedom). The Student nugget provides a way to model nonGaussian heavy tails in Argo data. Notice that under this model we obtain independent realizations of the random field, one for each year, which facilitates estimating the model parameters.
We employ model (5) in a movingwindow fashion: for each grid point , we use data within the local neighborhood to estimate the model parameters and to predict . When we move to the next grid point, we recenter the window around the new location and reestimate the model parameters. The overlap of the nearby windows results in smoothly varying local estimates of the model parameters. Such movingwindow approach to Gaussian process regression was first proposed by Haas in [11, 12]. Figure 3 illustrates the method.
This approach facilitates both the modeling and the computational challenges in Argo data analysis. Instead of having to specify a global nonstationary covariance model, the movingwindow approach enables us to handle the nonstationarity in Argo data using an ensemble of locally fitted stationary GP models. In terms of the computations, the approach essentially replaces the inversion of one large covariance matrix by many inversions of smaller covariance matrices. This results in significant computational gains, especially since the computations across the grid points are embarrassingly parallel. More concretely, let be the number of years and the typical number of global Argo observations for each year. Then, to leading order, the computing time of a global GP regression fit would be , where is a constant. When each moving window contains fraction of data, there are grid points and the computations are parallelized to threads, the computing time of the movingwindow approach is . Rough values for the analysis carried out in this paper are , and , leading to a speedup factor .
In model (5), we understand the nugget to primarily reflect ocean variability at spatiotemporal scales smaller than the days Argo sampling resolution. The fitted nugget may also capture sensor noise, but we consider this component to be negligibly small in comparison to the microscale ocean variability. This interpretation of the nugget leads us to make predictions for instead of , which widens the predictions intervals by an amount corresponding to the microscale component . This is a sensible approach for temperature data, since the microscale ocean temperature variability is orders of magnitude larger than the noiselevel of Argo temperature sensors. For salinity, especially at large depth, it would be appropriate to model the measurement process more carefully, but this is outside the scope of the present work.
To demonstrate the importance of proper modeling of temporal effects, we also consider a spatial version of model (5). In that case, the moving window remains the same as before, but we ignore the temporal separation of the observations within the window by dropping the time covariate in Equation (5). The distance metric for the covariance is given by Equation (6) without the time term. The next two sections explain in detail the parameter estimation, point prediction and uncertainty quantification steps under model (5).
(a) Gaussian nugget
With the Gaussian nugget , the unknown model parameters are the covariance parameters and the nugget variance . We use maximum likelihood to estimate these parameters from Argo data. For each year , let be the matrix with elements and let be the column vector with elements . The loglikelihood of the parameters is
(7)  
(8) 
and the maximum likelihood estimator (MLE) of is
(9) 
The MLE needs to be obtained using numerical optimization; we use the BFGS quasiNewton algorithm, as implemented in the Matlab Optimization Toolbox R2016a [31], on logtransformed parameters.
We base the predictions on the conditional distribution , where we have plugged in the MLE for the model parameters. Standard manipulations show that the predictive distribution is
(10) 
where is a column vector with elements . We make point predictions using the conditional mean . This is the wellknown kriging predictor (see, e.g., [17, 18]) based on data within the moving window and with plugin values for the model parameters. It is mean square error optimal assuming that model (5) is correct and ignoring the uncertainty of the model parameters. To quantify the uncertainty of , we use the predictive intervals based on ,
(11) 
where is the standard normal quantile and is the kriging variance.
(b) Student nugget
With the Student nugget , the likelihood and the predictive distribution are not available in closed form. To achieve computationally tractable inferences, we employ the Laplace approximation as in [32]; see also Section 3.4 in [21]. We give here only an overview of the method and refer to [32, 21] for details.
Denoting , the th year likelihood can be written as
(12) 
where . Denote the mode of by . Since
(13)  
(14) 
is also the mode of the conditional distribution . Employing a secondorder Taylor expansion of around ,
(15) 
Equation (12) becomes a Gaussian integral that can be evaluated analytically. The resulting approximate loglikelihood is [32, 21]
(16)  
where . The year approximate loglikelihood is then the sum of the oneyear loglikelihoods, , and this is maximized numerically with respect to the model parameters .
Denote . The predictive distribution for is
(17) 
where we have plugged in the Laplace approximated MLEs for the model parameters. Here one can again employ the Laplace approximation by taking a secondorder Taylor expansion of around its mode. This results in the Gaussian approximation . Equation (17) then becomes a Gaussian integral, which can be evaluated to give
(18) 
The Laplace approximated predictive distribution for is therefore
(19) 
where denotes the distribution of , with following the Gaussian distribution in Equation (18), and and are independent. The point predictions are made using the approximate conditional mean and the predictive intervals are , where is the quantile of the distribution of , where and are as in Equation (19). Since is not available in closed from, we compute it by Monte Carlo sampling.
Software implementations of Gaussian processes with a Student nugget differ in terms of how is computed and how the approximate loglikelihood is maximized. In this work, we compute the Student fits using the GPML toolbox [33, 34], which we have adapted to include the multiyear likelihood and the Student predictive intervals. GPML uses conjugate gradient descent for finding the model parameters and Newton’s method for maximizing with respect to , with special care for the nonconcavity of the latter problem [33]. When finding the approximate MLEs, we initialize to the MLEs from the Gaussian nugget fit. This initialization was found to significantly improve the convergence and stability of the conjugate gradient descent.
4 Results
In this section, we study the performance of the statistical methodology described in Section 3 in interpolating Argo temperature data. We use the Roemmich–Gilson approach described in Section 2(b) as a baseline and investigate how datadriven local estimates of the covariance structure improve treatment of the anomalies. Section 4(a) describes the models and datasets that we consider. We then study the fitted models from three perspectives: 1) crossvalidated point prediction performance in Section 4(b), 2) crossvalidated uncertainty quantification performance in Section 4(c) and 3) the estimated spatiotemporal dependence structure in Section 4(d).
(a) Experiment setup
Our experiments are performed by fitting models to Argo temperature data collected between 2007 and 2016. We investigate three pressure levels, 10 db, 300 db and 1500 db, with the model parameters estimated separately for each pressure level. We focus on 1 and 3month temporal windows centered around February (i.e., February 2007, February 2008,, February 2016 or January–March 2007, January–March 2008,, January–March 2016 are considered independent realizations from the statistical model in Equation (5)). To enable comparison of models with different time windows, all the crossvalidation studies are done for February data only.
Table 1 describes the different models considered in this study. Model 1, which we regard as the baseline reference model, is our reimplementation of the procedure developed by Roemmich and Gilson [7]; see Section 2(b). Apart from a few technical details (see below), this model is the same as the one used to produce the Roemmich–Gilson Argo climatology and its associated anomalies. Models 2–6 are different variants of the locally stationary mapping procedure described in Section 3. In each case, the mapping is carried out on a grid and the covariance parameters are estimated using maximum likelihood within a moving window on this grid. Model 2 is a 1month fit with a spatial covariance and a Gaussian nugget. Model 3 is otherwise the same but with a Student nugget. Model 4 is a 3month version of model 2. Models 5 and 6 are 3month fits with a spatiotemporal covariance and either a Gaussian or a Student nugget. The spatiotemporal fits are done with 3 months of data to make sure that there are enough profiles from each float to estimate the temporal covariance structure (there are usually 9 profiles from each float within a 3month window).
Model  Time window  Mean  Covariance  Nugget 

1  February  RG (spatial)  RGlike  Gaussian 
2  February  RG (spatial)  Local (spatial)  Gaussian 
3  February  RG (spatial)  Local (spatial)  Student 
4  January–March  RG (spatialtemporal)  Local (spatial)  Gaussian 
5  January–March  RG (spatialtemporal)  Local (spatialtemporal)  Gaussian 
6  January–March  RG (spatialtemporal)  Local (spatialtemporal)  Student 
For each model, the mean field is the Roemmich–Gilson local regression fit (see Section 2(b)). The publicly available version of the RG climatology [26] includes only the annual mean field, but John Gilson kindly provided us with the midmonth evaluations of the local mean functions (1). We use either these midmonth evaluations treating the mean as a constant over the temporal period (spatial mean) or alternatively a temporally varying reconstruction of the original mean (spatiotemporal mean) as indicated in Table 1. The temporally varying mean is reconstructed from the 12 midmonth evaluations of (1) by handling the one remaining degree of freedom by finding the minimumnorm solution in the space of the regression coefficients.
The reference model (model 1) captures the key aspects of the Roemmich–Gilson approach. There are however two technical differences between our implementation and theirs. First, we do not include the depth penalty term when we compute the point estimates and the uncertainties. And, second, we do not implement specialized treatment of distances across islands or continental land. These differences are unlikely to markedly affect the conclusions drawn below. Finally, we note that when computing the prediction intervals for the reference model, we need to provide the proportionality constant in Equation (3) (i.e., the variance of the Gaussian process part of the spatial model). Since Roemmich and Gilson do not provide uncertainty estimates, they also do not have estimates of this constant as it cancels out in the point predictions. To simulate what could have potentially been done to produce uncertainties under the Roemmich–Gilson model, we estimate the proportionality constant using a movingwindow empirical variance. That is, the RGinspired prediction intervals at the grid point are formed using the covariance model (3) with the following estimate of the GP variance :
(20) 
where the denominator originates from the RG noisetosignal variance ratio 0.15 via the relation . We emphasize that the resulting prediction intervals are not part of the original Roemmich–Gilson climatology and by no means advocated by them for uncertainty quantification.
We fit models 1–6 to the global Argo data set as of May 8, 2017 [35]. The quality control criteria we use for filtering out profiles with technical issues are given in the supplement [36]. There were a total of 1,417,813 Argo profiles in 2007–2016, out of which 994,709 passed our selection criteria. We also perform a further temporal filtering to focus on the desired time windows and a spatial filtering based on the Roemmich–Gilson land mask [26], which filters out profiles located in marginal seas, such as the Mediterranean Sea or the Gulf of Mexico. The final data set has 70,227 profiles in February and 223,797 profiles in January to March.
The analysis was carried out using Matlab R2016a. As described in Section 3(b), we use GPML [33, 34] to fit models 3 and 6, while the other models are our own implementations. The computations were carried out on the Midway2 cluster at the University of Chicago Research Computing Center. The possibility to parallelize the movingwindow computations to the 28 threads of the Midway2 compute nodes was crucial for making the analysis computationally feasible. The Matlab code used to produce the results is available on Github [37].
(b) Point predictions
We first investigate the performance of the different models in making point predictions of the temperature field. Figure 4 shows the February 2012 temperature anomalies at 10 db, 300 db and 1500 db for the locally stationary spatiotemporal model with a Gaussian nugget (model 5) and for the reference model (model 1). The maps are on a grid and the spacetime field is evaluated at noon February 15, 2012. The overall patterns in both fields are similar: one can recognize the largescale anomalies near the surface, the elongated patterns in subsurface Equatorial regions and the meanders and eddies associated with the western boundary currents and the Antarctic Circumpolar Current. There are, however, a number of clear differences between the maps. Especially at 10 db, the reference map shows small speckles that are absent from the locally fitted map. This is related to the Roemmich–Gilson covariance parameters which are not optimized for mapping anomalies near the surface. Also, while present in both maps, the zonal elongation of the anomalies in the Equatorial regions is much more pronounced in the reference maps.
In order to study the difference between the reference method and the locally stationary maps in more quantitative terms, we perform a crossvalidation study comparing the predictive performance of the different methods. We use two crossvalidation procedures: in leaveoneobservationout (LOOO) crossvalidation we leave out one temperature observation at a time, while in leaveonefloatout (LOFO) crossvalidation we leave out an entire float. LOOO predictions are easier to make since there will almost always be nearby observations from the same float to constrain the temperature value at the prediction location. LOFO crossvalidation on the other hand creates a “gap” in the Argo array and has a higher amount of irreducible error. In the actual mapping problem, the typical distance from a grid point to nearby floats will be somewhere in between these two extremes.
We investigate the crossvalidation performance in terms of the rootmeansquare error (RMSE), the median absolute error (MdAE) and the third quartile of the absolute error (QAE). Let be the prediction at with either removed from the dataset (LOOO) or all observations with the same float ID as removed (LOFO). Then
(21) 
and MdAE and QAE are the sample median and sample third quartile of . When crossvalidating , the model parameters are taken from the moving window centered at the grid point closest to . The parameter estimates and the mean fields are kept fixed during the crossvalidation.
Table 2 shows the LOOO crossvalidation performance for the models with a Gaussian nugget (models 1, 2, 4 and 5). Also included is the performance when the predictions are made using only the Roemmich–Gilson spatial mean without any modeling of the anomalies (the performance of the spatiotemporal mean is only slightly better and omitted for clarity purposes). For all three pressure levels and for all three performance metrics, the relative performance of the models is in the following order: the reference model (model 1) outperforms the spatial mean, the locally stationary 1month spatial model (model 2) outperforms the reference model (model 1), the locally stationary 3month spatial model (model 4) outperforms the 1month spatial model (model 2) and the locally stationary 3month spatiotemporal model (model 5) outperforms the 3month spatial model (model 4). Combining datadriven local covariance parameters with a longer temporal window and a covariance structure that includes the time leads to fairly substantial 10% – 30% performance improvements over the reference model. The improvement is particularly large near the surface at 10 db. The distribution of the squared prediction errors (not shown) has a much fatter right tail than it would be if the errors followed a common normal distribution, so we also consider MdAE and QAE as summaries of prediction performance to make sure that our conclusions are not largely due to a small fraction of poor predictions. Table 2 shows that all three performance metrics are consistent in their relative rankings of the various methods.
Pressure  Perf.  Spatial  Reference  Space  Space  Spacetime 

level  metric  mean  model  (1 month)  (3 months)  (3 months) 
10 db  RMSE  0.8889  0.6135  0.5876 (4.2 %)  0.5667 (7.6 %)  0.5072 (17.3 %) 
QAE  0.8670  0.5026  0.4824 (4.0 %)  0.4568 (9.1 %)  0.3735 (25.7 %)  
MdAE  0.4750  0.2556  0.2490 (2.6 %)  0.2293 (10.3 %)  0.1801 (29.5 %)  
300 db  RMSE  0.8149  0.5782  0.5692 (1.6 %)  0.5675 (1.9 %)  0.5124 (11.4 %) 
QAE  0.6320  0.4213  0.4150 (1.5 %)  0.4005 (4.9 %)  0.3684 (12.6 %)  
MdAE  0.3062  0.1991  0.1957 (1.7 %)  0.1873 (5.9 %)  0.1740 (12.6 %)  
1500 db  RMSE  0.1337  0.1014  0.0997 (1.7 %)  0.0935 (7.8 %)  0.0883 (12.9 %) 
QAE  0.1043  0.0736  0.0725 (1.5 %)  0.0678 (7.9 %)  0.0641 (12.8 %)  
MdAE  0.0530  0.0356  0.0355 (0.3 %)  0.0328 (8.0 %)  0.0311 (12.7 %) 
Pressure  Perf.  Spatial  Reference  Space  Space  Spacetime 

level  metric  mean  model  (1 month)  (3 months)  (3 months) 
10 db  RMSE  0.8889  0.7177  0.6823 (4.9 %)  0.6954 (3.1 %)  0.6489 (9.6 %) 
QAE  0.8670  0.6107  0.5776 (5.4 %)  0.5987 (2.0 %)  0.5222 (14.5 %)  
MdAE  0.4750  0.3165  0.2981 (5.8 %)  0.3062 (3.2 %)  0.2552 (19.3 %)  
300 db  RMSE  0.8149  0.7686  0.7486 (2.6 %)  0.7483 (2.6 %)  0.7388 (3.9 %) 
QAE  0.6320  0.5942  0.5733 (3.5 %)  0.5666 (4.6 %)  0.5556 (6.5 %)  
MdAE  0.3062  0.2856  0.2753 (3.6 %)  0.2732 (4.3 %)  0.2664 (6.7 %)  
1500 db  RMSE  0.1337  0.1373  0.1308 (4.8 %)  0.1313 (4.4 %)  0.1307 (4.8 %) 
QAE  0.1043  0.1015  0.0976 (3.9 %)  0.0973 (4.2 %)  0.0959 (5.6 %)  
MdAE  0.0530  0.0511  0.0499 (2.3 %)  0.0491 (3.7 %)  0.0484 (5.3 %) 
Table 3 shows the same results for LOFO crossvalidation. The prediction errors for the spatial mean remain the same, but the rest of the errors are larger than with LOOO crossvalidation, reflecting the more challenging nature of the LOFO prediction task. Even in this case, there are still distinct advantages to be gained by appropriate modeling of the anomalies. The ranking of the models and the general conclusions are otherwise the same as above, expect for two differences: First, here the 3month spatial model (model 4) does not significantly improve upon the 1month model (model 2) and can in fact even perform worse. This happens because the model confuses spatial and temporal variation and highlights the importance of using a spatiotemporal covariance model. Second, at 1500 db, the RMSE of the reference model is slightly larger than the RMSE of the spatial mean. As discussed above, this may happen because of a few values in the right tail of the squared prediction error distribution, but may also indicate that the Roemmich–Gilson covariance parameters are not wellsuited for this depth. In comparison, all the datadriven models perform better than the spatial mean, as expected.
The crossvalidation results for the Student nugget are given in the supplement [36]. The Student models tend to perform worse than the comparable Gaussian models. This happens because they smooth out nonGaussian high frequency features (eddies in particular) by including them in the nugget term. Nevertheless, the same conclusion that the spatiotemporal model outperforms the purely spatial model remains true. Even though the Student models have inferior point prediction performance, they offer significant advantages in uncertainty quantification (see Section 4(c)).
To summarize, these results highlight the importance of including time in the mapping and allowing the covariance parameters to change with location and depth. Indeed, the largest improvements are observed at 10 db, where one would intuitively expect the datadriven covariances to differ the most from the Roemmich–Gilson model (see Section 2(b)). To put these results into context, it is important to keep in mind that statistical procedures typically converge at sublinear rates as the amount of data increases. For example, assuming a rate of convergence, a improvement in the predictive performance translates into more data. This would correspond to deploying roughly 2,000 additional floats at a cost of 30 million USD (one Argo float costs approximately 15,000 USD [38]). Similarly, even a performance improvement would correspond to approximately more data at a cost of some 12 million USD. These costs are even higher if one takes into account the need to replenish the array and the cost of data handling.
(c) Uncertainty quantification
We next investigate the uncertainty quantification performance of the different models. Figure 5 shows , i.e., the ratio of the postdata and predata variances, for the locally stationary 3month spatiotemporal model with a Gaussian nugget (model 5) at 10 db, 300 db and 1500 db in February 2012. Also shown is the same ratio for the reference model (model 1) at 10 db. Since the uncertainties of Gaussian process models depend only on the covariance parameters and the observation locations, the uncertainty of the reference model at 300 db and 1500 db (not shown) looks essentially the same the uncertainty at 10 db as the Roemmich–Gilson covariance model is the same at all depths (there are minor differences due to some profiles not extending all the way from 10 db to 1500 db). By contrast, the locally stationary uncertainties are vastly different at different depths. This happens because the estimated covariance parameters vary significantly as a function of depth (see Section 4(d)). Based on the anomalies shown in Figure 4, it makes intuitive sense that the uncertainties near the surface should be quite different from the uncertainties at greater depths. Note also that in the Roemmich–Gilson model with its fixed noisetosignal variance ratio , the postdatatopredata variance ratio shown in Figure 5(d) does not depend on how the GP variance is chosen. By contrast, the rest of the results in this section require an estimate of (see Section 4(a)).
In order to study the uncertainty quantification performance in a more quantitative way, we crossvalidate the entire predictive distribution for the different interpolation methods. With the Gaussian nugget, the crossvalidated predictive distribution for is , where is with either removed (LOOO) or all observations with the same float ID as removed (LOFO). The predictive mean and variance are as in Equation (10), but with the appropriate data points removed. The covariance parameters and are taken from the MLEs at the grid point closest to . Conditional on , it then follows that so we can compare the sample quantiles of to the corresponding quantiles of the standard Gaussian distribution.
With the Student nugget, the crossvalidated Laplace approximated predictive distribution is , where denotes the distribution of , with , where and are as in Equation (18), but with the appropriate data points removed when computing the Laplace approximation, and and are independent. Let denote the cumulative distribution function of . Then, conditional on , we have that , where is the uniform distribution on . Letting denote the standard Gaussian cumulative distribution function, it follows that and we can again compare the sample quantiles of to the quantiles of the standard Gaussian distribution. Since is not available in closed form, we compute its values by Monte Carlo sampling. The model parameters are taken to be the Laplace approximated MLEs at the grid point closest to .
We compare the calibration of the reference model (model 1), the locally stationary 1month spatial model with a Gaussian and a Student nugget (models 2 and 3) and the locally stationary 3month spatiotemporal model with a Gaussian and a Student nugget (models 5 and 6). For each method, let denote the crossvalidated sample quantile on the scale and the corresponding theoretical quantile. Figure 6 shows the quantile difference plotted against the theoretical quantile at 300 db for LOOO crossvalidation. This can be understood as the usual QQ plot with the identity line subtracted—plotting the quantiles this way helps visualize differences at the core of the distribution. The reference model is poorly calibrated with both tails of the data distribution much wider than those of the predictive distribution. The locally stationary models with the Gaussian nugget improve the calibration, but the data distribution still has heavier tails and a pointier core than the predictive distribution. We understand this as evidence of nonGaussian heavy tails in Argo temperature data. The Student nugget provides a way to account for these heavy tails and indeed the calibration of the Student models is much better than that of the fully Gaussian models. Even though some miscalibration still remains, the overall improvement of the locally stationary Student models over the reference model is quite substantial. We also note that the spatial and spatiotemporal models are essentially equally wellcalibrated in the present setting.
Confidence  Method  Empirical  Mean  Median 

level  coverage  length  length  
68%  Reference  0.6607  0.7511  0.6435 
Space, Gaussian nugget  0.7745  0.9749  0.8728  
Spacetime, Gaussian nugget  0.7800  0.8722  0.7816  
Space, Student nugget  0.7261  0.8427  0.7621  
Spacetime, Student nugget  0.7389  0.7697  0.7049  
95%  Reference  0.8755  1.4721  1.2612 
Space, Gaussian nugget  0.9482  1.9108  1.7107  
Spacetime, Gaussian nugget  0.9490  1.7095  1.5320  
Space, Student nugget  0.9432  1.9918  1.7525  
Spacetime, Student nugget  0.9452  1.8543  1.6192  
99%  Reference  0.9329  1.9347  1.6575 
Space, Gaussian nugget  0.9777  2.5112  2.2483  
Spacetime, Gaussian nugget  0.9793  2.2466  2.0134  
Space, Student nugget  0.9835  3.9385  2.6417  
Spacetime, Student nugget  0.9844  3.8086  2.3976 
The quantile plots in Figure 6 translate directly into the coverage probability of the predictive intervals. Table 4 shows the LOOO crossvalidated empirical coverages and interval lengths for 68%, 95% and 99% predictive intervals at 300 db for the same models as in Figure 6. As expected based on Figure 6, the reference model undercovers at all confidence levels. The locally stationary models overcover at 68% level, are wellcalibrated at 95% level and undercover at 99% level. At 68% and 99% levels, the coverage of the Student intervals is much closer to the nominal level than the coverage of the Gaussian intervals. The spacetime intervals are always shorter than the corresponding spatial intervals, which is consistent with the performance improvements observed in Section 4(b).
The quantile plots, empirical coverages and interval lengths for the other depths and for LOFO crossvalidation are given in the supplement [36]. The LOOO conclusions at 10 db are otherwise similar to 300 db, except that the spatiotemporal models appear to have worse calibration than the spatial models. This might be an indication that the spatiotemporal dependence structure near the surface is more complicated than what our exponential covariance function with geometric anisotropy can capture. For LOOO at 1500 db, the spatial and spatiotemporal models are equally wellcalibrated, but there also appears to be less nonGaussianity and the Student nugget provides only limited improvement over the Gaussian nugget. The basic conclusions that locally stationary modeling improves the calibration over the reference model and that the Student nugget improves over the Gaussian nugget still largely hold true for LOFO crossvalidation, but here the spatiotemporal models are slightly worse calibrated than the spatial models at all depths. The spatiotemporal models nonetheless provide distinct advantages in terms of the length of the intervals. In these cases, the optimal choice of intervals depends on the relative importance given to accurate calibration and short interval length.
(d) Local estimates of dependence structure
In this section, we investigate the locally estimated model parameters and demonstrate that they exhibit physically meaningful structure. We study in detail the fitted 3month spatiotemporal model with a Gaussian nugget (model 5). Analogous plots for the other models are given in the supplement [36].
Figure 7 shows the estimated total variance at 10 db, 300 db and 1500 db. At 10 db and 300 db, we can clearly see the large variability in the western sides of the ocean basins caused by the strong western boundary currents. For example, the Kuroshio Current off the coast of Japan, the Gulf Stream in the NorthWest Atlantic, the Brazil Current in the SouthWest Atlantic and the Agulhas retroflection and leakage areas around the southern tip of Africa (see Section 11.4.2 in [27]) can be easily identified at both depths. The East Australian Current is also visible at 300 db. By contrast, the eastern sides of the ocean basins have significantly less variability, as expected. A notable exception is NorthEastern Atlantic at 1500 db which has much more variability than other regions at this depth. We suspect that this variability can be attributed to eddies caused by outflow from the Mediterranean Sea [39]. The bands of high variability at roughly 10N and 10S at 10 db in the Pacific Ocean are likely to be related to Rossby waves at those latitudes.
We next study the estimated ranges of zonal, meridional and temporal dependence in Argo temperature data. We do this by plotting maps of the correlation implied by the locally estimated covariance parameters at given zonal, meridional and temporal lags. We plot the maps in the correlation space instead of the range parameter space since there is some degree of ambiguity with regards to what fraction of the variability is attributed to the nugget effect in the MLE fits (at intermediate lags, a large nugget variance , a small GP variance and a large range parameter , or can imply almost the same fitted correlation as a small nugget , a large GP variance and a small range parameter , or ).
Figure 8 shows the fitted correlation at the zonal lag km (with , and taking the conversion latitute into account when converting degrees into kilometers). The range of the zonal dependence varies considerably as a function of depth, with much longer ranges near the surface than at greater depths. The longrange dependence near the surface is likely to be caused by interaction with the atmosphere. There is also a general tendency to have zonally elongated ranges in the Equatorial regions. The estimated ranges are generally longer in the Pacific Ocean than in the Indian or Atlantic Oceans. At 10 db in the Pacific Ocean, there is again evidence of patterns related to Equatorial Rossby and Kelvin waves.
Figure 9 shows the fitted correlation at the meridional lag km (with , ). The general conclusions are similar to Figure 8: the ranges are longer close to the surface and in the Equatorial regions. The distinct patch of longrange dependence in the Equatorial West Pacific is likely to be related to the Pacific thermocline which is tilted westward along the Equator. The meridional ranges also show interesting patterns in areas where the Amazon River and the Congo River flow into the Atlantic Ocean. Note that all the plots in Figures 8 and 9 are on the same (logarithmic) color scale and can thus be compared directly. This comparison shows that the dependence structure is clearly anisotropic with the zonal ranges longer than the meridional ones.
Figure 10 shows the fitted correlation at the temporal lag days (with and ). There are differences in the correlation patterns between the three depths, but interestingly the overall magnitude of the temporal correlation changes relatively little with depth. There does seem to be a slight tendency for the temporal correlation to increase with depth in the Southern Ocean and to decrease with depth in most other areas, but these changes are much less pronounced than in the case of the zonal and meridional ranges (Figures 8 and 9).
Ninove et al. [40] have previously estimated dependence scales using Argo data. They estimate the range parameters (which are often called “decorrelation scales” by oceanographers) by dividing the global ocean into large disjoint boxes and then use a weighted leastsquares variogram fit to data within each box. They only consider spatial dependence and do not provide estimates of temporal ranges. In comparison, our movingwindow approach includes the temporal dimension and provides information about the dependence structure at much finer horizontal resolution. It is also wellestablished in the spatial statistics literature that MLE fits should be preferred over variogram fits [18]. Nevertheless, many of our conclusions qualitatively agree with those of Ninove et al. They also find longer ranges near the surface, elongation in the tropics, zonal anisotropy and shorter ranges in the Indian and Atlantic Oceans. However, we do not find evidence for the strong increase in spatial ranges below 700 m that Ninove et al. observe. Instead, our spatial ranges are almost always shorter at 1500 db than at 300 db. We suspect that the effect seen by Ninove et al. is an artifact caused by the preArgo mean field they use. This mean field is likely to be poorly constrained at depth, where little data was available before Argo, and any residual mean effect would show up as increased spatial dependence in the empirical variograms.
5 Conclusions and outlook
We have demonstrated that the spatiotemporal dependence structure of ocean temperatures can be estimated from Argo data using local movingwindow maximum likelihood estimates. The resulting fully datadriven nonstationary anomaly fields and their uncertainties yield substantial improvements over existing stateoftheart methods. The improvements in point prediction accuracy are comparable to deploying hundreds of new Argo floats. There is also evidence of nonGaussian heavy tails in Argo temperature data and taking this into account is crucial for obtaining wellcalibrated uncertainties. The estimated covariance models exhibit physically sensible patterns that can be analyzed further to test theories of largescale ocean circulation.
Statistically, this work demonstrates how wellsuited movingwindow Gaussian process regression is for handling modern massive nonstationary spatiotemporal data sets. This approach helps address in a straightforward manner challenges related to both nonstationarity and computational complexity, issues that affect analysis of almost any largescale environmental data set. In the present work, we developed a version of the movingwindow approach that uses a Student distributed nugget effect to address heavy tails in Argo data. To the best of our knowledge, Student nugget has not been previously used in conjunction with movingwindow Gaussian process regression. While spatial and spatiotemporal movingwindow techniques have been around since the seminar work of Haas [11, 12], we feel that there is much room for further application and methodological development on this front.
To produce our interpolated temperature anomalies, we used the fairly simple exponential covariance function with geometric anisotropy along the zonal, meridional and temporal axes (see Equation (6)). We have compared the model fit to empirical covariances in various regions and generally find the two to be in reasonably good agreement. We have also investigated more complex spacetime covariance models including the Matérn model [18], the Gneiting model [41] and geometric anisotropy that is not constrained to be aligned with the latitudelongitude coordinate axes. We have experimented with these models in selected regions of the Pacific Ocean at 300 db. In each case, we found only minor improvements in the likelihood values and point predictions in comparison to the simple exponential covariance function. At the same time, the estimated covariance parameters became very challenging to interpret and validate (since several combinations of the model parameters can represent almost the same covariance structure) and the likelihood computations were much too slow to be practical on a global scale. These extensions could still be useful at other depths or regions, but the interpretability and computations remain problematic. Another potentially useful extension, which we have not yet explored, is to add a velocity term in the spatiotemporal covariance, as is for example done in mapping of satellite altimetry data [42].
While the Student prediction intervals clearly yield improved uncertainty quantification, the present implementation of the Student nugget suffers from algorithmic instabilities. This can be easily seen in the fitted model parameters shown in the supplementary material [36]. These instabilities are likely to be related to the nonconcave optimization problem that needs to be solved as part of the Laplace approximation [32]. Alternatively, it might be that the Laplace approximation itself is not appropriate in some parts of the ocean. We experimented with an alternative implementation of the Laplace approximation [43] but observed similar instabilities. Further research is needed to understand the precise cause of these instabilities before the Student fits can be recommended for use in an actual data product.
This work has only scratched the surface in terms of the statistical research that can be done with the Argo data set. We sketch below some potential extensions and directions for future work:

In this work, we have focused on Argo temperature data only. In principle, similar locally stationary interpolation is also applicable to salinity data, but the challenge is that especially at larger depths the variability of the salinity field is so small that instrumental errors can no longer be ignored. This means that more careful modeling of the nugget effect to include measurement error is needed. We also feel that when mapping both temperature and salinity, one should take into account their correlation which requires modeling the spatiotemporal crosscovariance between the two fields. This crosscovariance function can in fact be of scientific interest in its own right.

Maps of the locally estimated covariance models (see Figures 7–10) tend to sometimes have zonal or meridional stripes that are related to the sharp boundary of the moving window. These stripes could be removed by replacing the moving window by a smooth kernel function as in [44]. Since it is often possible to represent almost the same covariance structure by various combinations of the model parameters, we also sometimes see abrupt transitions from one parameter configuration to another. It should be possible to fix this issue by finding a way to borrow strength across the nearby grid points in the parameter estimates. One could, for instance, envision using a Bayesian hierarchical model, where the maps of the local model parameters are themselves modeled as spatially dependent stochastic processes, as in [45], for example. However, it seems challenging to find a way to carry out the computations at the scale of the global ocean under such model.

The only type of nonGaussianity we considered in this work is spatially and temporally uncorrelated Student distributed heavy tails. However, further exploratory analysis of the meansubtracted residuals reveals also the presence of skewness (especially near the surface) and regions where the distribution of the anomalies appears to be multimodal. It is also likely that the heavytailed features, which at least partially correspond to eddies, are spatially and temporally correlated (this would also help explain why the point prediction performance of the Student nugget models is inferior to the performance of the Gaussian nugget models). It would hence be interesting to consider spacetime models with more versatile nonGaussian structure. Stochastic partial differential equations, as in [46], may provide a way to construct such models.

In the present work, we have not exploited dependence across depth levels. A natural extension would be to carry out full 4D mapping by including the vertical dimension in the covariance structure. In principle, the movingwindow approach can easily be extended to this situation by considering observations at nearby depths. However, modeling the vertical covariance structure seems nontrivial since the vertical direction contains phenomena that are fundamentally different from the other dimensions. For example, ocean stratification can cause abrupt vertical changes, while eddies exhibit coherent vertical structure. One would ideally also like to find a way to incorporate the constraint that density must a monotonically increasing function of depth.

In this work, we have only considered pointwise uncertainties. However, many scientifically important quantities are functionals of the temperature and salinity fields (or some field derived from these two quantities). For example, the ocean heat content is essentially an integral of the temperature field. To provide uncertainties for functionals requires access to covariances or conditional simulations. Neither of these quantities can be directly obtained from the movingwindow approach considered here. It should, however, be possible to combine the local covariance models into a valid global nonstationary model using the approach developed in [45]. This global model could then be used to compute the predictive covariance matrix or to produce conditional simulations, although it is not immediately clear if all the necessary computations are feasible on the scale of the Argo data set.
Access to data, code and supplementary material
The results presented in this paper are based on the May 8, 2017 snapshot of the Argo GDAC (http://doi.org/10.17882/42182#50059). The Matlab code is available at https://github.com/mkuusela/ArgoMappingPaper and supplementary material can be found at https://github.com/mkuusela/ArgoMappingPaper/raw/master/Doc/supplement.pdf.
Acknowledgments
We are grateful to Fred Bingham, Chen Chen, Bruce Cornuelle, Donata Giglio, John Gilson, Sarah Gille, Alison Gray, Malte Jansen, Alice Marzocchi, Matt Mazloff, Breck Owens, Dean Roemmich, Megan Scanderbeg and Nathalie Zilberman for many discussions about Argo data, oceanography and previous mapping algorithms as well as for helpful feedback on preliminary results throughout this project.
Most of this work was carried out while MK was at the University of Chicago, Department of Statistics. This work was supported by the STATMOS Research Network (NSF awards 1106862, 1106974 and 1107046), the US Department of Energy grant no. DESC0002557 and the National Science Foundation Grant DMS1638521 to the Statistical and Applied Mathematical Sciences Institute. This work was completed in part using computational resources provided by the University of Chicago Research Computing Center.
Argo data were collected and made freely available by the International Argo Program and the national programs that contribute to it (http://www.argo.ucsd.edu, http://argo.jcommops.org). The Argo Program is part of the Global Ocean Observing System.
Disclaimer
Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the US Department of Energy.
References
 [1] Dean Roemmich, John Church, John Gilson, Didier Monselesan, Philip Sutton, and Susan Wijffels. Unabated planetary warming and its ocean structure since 2006. Nature Climate Change, 5:240–245, 2015.
 [2] Alison R. Gray and Stephen C. Riser. A global analysis of Sverdrup balance using absolute geostrophic velocities from Argo. Journal of Physical Oceanography, 44(4):1213–1229, 2014.
 [3] Zhengguang Zhang, Wei Wang, and Bo Qiu. Oceanic mass transport by mesoscale eddies. Science, 345(6194):322–324, 2014.
 [4] Tyler D. Hennon, Stephen C. Riser, and Matthew H. Alford. Observations of internal gravity waves by Argo floats. Journal of Physical Oceanography, 44(9):2370–2386, 2014.
 [5] L. Cheng, J. Zhu, and R. L. Sriver. Global representation of tropical cycloneinduced shortterm ocean thermal changes using Argo data. Ocean Science, 11(5):719–741, 2015.
 [6] YouSoon Chang, Shaoqing Zhang, Anthony Rosati, Thomas L. Delworth, and William F. Stern. An assessment of oceanic variability for 1960–2010 from the GFDL ensemble coupled data assimilation. Climate Dynamics, 40:775–803, 2013.
 [7] Dean Roemmich and John Gilson. The 2004–2008 mean and annual cycle of temperature, salinity, and steric height in the global ocean from the Argo Program. Progress in Oceanography, 82:81–100, 2009.
 [8] Sunke Schmidtko, Gregory C. Johnson, and John M. Lyman. MIMOC: A glob