Spatially Extended Tests of a Neural Network Parametrization Trained by Coarse-Graining
Abstract
General circulation models (GCMs) typically have a grid size of 25–200 km. Parametrizations are used to represent diabatic processes such as radiative transfer and cloud microphysics and to account for subgrid-scale motions and variability. Unlike traditional approaches, neural networks (NNs) can readily exploit recent observational datasets and global cloud-system resolving model (CRM) simulations to learn subgrid variability. This article describes an NN parametrization trained by coarse-graining a near-global CRM simulation with a 4 km horizontal grid spacing. The NN predicts the residual heating and moistening averaged over grid boxes as a function of the coarse-resolution fields within the same atmospheric column. This NN is coupled to the dynamical core of a GCM with the same resolution. A recent study described how to train such an NN to be numerically stable when coupled to specified time-evolving advective forcings in a single column model, but feedbacks between the NN and GCM components cause spatially extended simulations to crash within a few days. Analyzing the linearized response of such an NN reveals that it learns to exploit a strong synchrony between precipitation and the upper-atmospheric state. Removing the upper-atmospheric variables from the NN's inputs stabilizes the coupled simulations, which predict the future state more accurately than a coarse-resolution simulation without any parametrization of subgrid-scale variability, although the mean state slowly drifts.
Journal of Advances in Modeling Earth Systems (JAMES)
Department of Atmospheric Sciences, University of Washington
Noah Brenowitz (nbren12@uw.edu)
A neural network parametrization is coupled to a GCM
The neural network is trained by coarse-graining a near-global cloud-system resolving simulation
The coupled simulation accurately forecasts the weather of the training data
1 Introduction
Current global climate and weather models cannot explicitly resolve many important physical processes because of computational expense. Global weather and short-range climate forecast models support grid sizes of order 10 km (ECMWF, 2018; NOAA, 2018), while current climate models typically have 25–200 km grids. Unresolved processes, including cumulus convection, turbulence, and subgrid cloud variability, are approximated by subgrid-scale parametrizations (Palmer, 2001). These are usually designed by human physical intuition, informed by process modeling and observations.
Cumulus convection in the tropics is one of the most dynamically important processes in the atmosphere, yet it is extremely difficult to parameterize because of the multiscale nature of moist flows (Nakazawa, 1988; Majda, 2007). There is a long-standing debate about whether convective parametrizations should be based on moisture-convergence or quasi-equilibrium closures (Arakawa, 2004). Moreover, important mean-state biases in climate modeling, such as the double-ITCZ bias, are sensitive to the convective parametrization (Zhang and Wang, 2006; Woelfle et al., 2018). Climate models also struggle to simulate observed aspects of tropical variability such as the diurnal cycle of continental precipitation (Stratton and Stirling, 2012) and the Madden-Julian Oscillation (MJO) (Jiang et al., 2015; Jiang, 2017). Thus, an improved convective parametrization could help improve weather and climate simulations.
With sufficient computational resources, global cloud-system resolving models (GCRMs) with grid spacings of a few kilometers can be run in global domains without deep convective parametrizations (Satoh et al., 2008; Bretherton and Khairoutdinov, 2015). Recent studies have used such simulations to study the moist static energy budget (Bretherton and Khairoutdinov, 2015) and cloud feedbacks (Narenpitak et al., 2017). The DYAMOND project recently compared 40-day hindcasts by 10 independently developed GCRMs, showing reassuring similarity in their simulated patterns of precipitation and high cloud (Stevens et al., Submitted). Centennial-scale GCRM climate simulations are not yet feasible, so improving parametrizations in coarse-resolution models remains an important goal.
It remains challenging for human experts to translate the vast volume of information in new high-resolution datasets and models into better parametrizations of moist physical processes incorporating subgrid variability. On the other hand, parametrizations can also be built using powerful regression tools developed in the field of machine learning (ML) (Rasp et al., 2018; O'Gorman and Dwyer, 2018; Brenowitz and Bretherton, 2018). In particular, Brenowitz and Bretherton (2018) trained a neural network (NN) (Goodfellow et al., 2016) parametrization for the combined diabatic heating and moistening in a coarse-grid model by coarse-graining a GCRM simulation. They showed that coupling this parametrization to specified dynamical tendencies in a single column model gives stable and highly accurate simulations if the NN is trained to optimize model performance over multiple time steps.
We now extend our single column model results by coupling an NN parametrization to the dynamical core of a GCM. As in Brenowitz and Bretherton (2018), this unified physics scheme predicts the sources of water and heat due to latent heating, turbulence, radiation, and any other physical process beyond resolved advection. Our main goal is to produce accurate multi-day forecasts with the coupled GCM-NN model.
After a brief survey of recent work on ML parametrization in Section 2, we describe our training data and GCM configuration in Section 3. Section 4 describes our coarse-graining and machine learning strategies. Particular focus is placed on the difficult task of ensuring numerical stability in spatially extended simulations (Section 4.3). Section 5 shows the results of the coupled simulations, and we conclude in Section 6.
2 Review of Machine Learning Parametrization
One attractive use of machine learning (ML) is to automatically tune existing GCM parametrizations, which build in physical insights and constraints that may help them apply across a range of climates. Proposed techniques include data assimilation (Schneider et al., 2017; Lyu et al.) and genetic algorithms (Langenbrunner and Neelin, 2017). These techniques can tune a few free parameters, but may not scale to larger numbers of parameters. Moreover, existing parametrizations may not be flexible enough to be realistic, in part because they have so few parameters. For instance, adjusting the entrainment rates in a convection scheme can improve the mean-state bias but harm the variability (Kim et al., 2011; Mapes and Neale, 2011).
A more ambitious approach replaces an existing GCM parametrization with a machine learning (ML) model, which has enough parameters to capture arbitrarily complex relationships (Cybenko, 1989; Hanin, 2017) present within a training dataset. ML models are typically trained by minimizing a loss function, such as the mean-squared error (MSE) relative to reference outputs from the training data; the choice of loss function is subjective and a key to good performance. Early studies (Chevallier et al., 1998; Krasnopolsky et al., 2005) trained NNs (Goodfellow et al., 2016) to emulate the outputs of radiative transfer codes in order to decrease computational expense. Subsequent studies also trained NNs to emulate existing convective parametrizations (O'Gorman and Dwyer, 2018) and super-parametrizations (SP) (Rasp et al., 2018). In each of these instances, the ML model is trained with a nearly ideal dataset, where the inputs are the state of the atmosphere and the output is a tendency that has actually driven a GCM.
Training a parametrization by coarse-graining a GCRM is a more difficult problem because it lacks this hierarchical structure. On the other hand, GCRMs are the most realistic models available because they do not impose an arbitrary scale separation as SP does. In this setting, how does one define the target tendency? In the first study of its kind, Krasnopolsky et al. (2010, 2013) defined the output as the residual heating and moistening (Yanai et al., 1973) in limited-area simulations. They diagnosed heating rates and cloud fractions with high accuracy but did not present any prognostic simulations. Brenowitz and Bretherton (2018) extended this work in two ways. First, their training dataset was a near-global CRM simulation with rich multiscale convective organization. Second, they found that single column model (SCM) simulations with NNs trained in this way diverged to infinity after just a few time steps. They ensured long-term stability by minimizing the loss accumulated over several predicted time steps.
In summary, in coarse-graining there is no clear target to emulate. While Brenowitz and Bretherton (2018) developed an approach for ensuring numerical stability in a single-column model with prescribed dynamics, they did not test it within a full three-dimensional GCM where the dynamics interact with the NN parametrization.
3 Training Data and Atmospheric Model Configuration
3.1 Training Data
We use the same training dataset as Brenowitz and Bretherton (2018): a near-global aquaplanet simulation (NGAqua) using the System for Atmospheric Modeling (SAM) version 6.10 (Khairoutdinov and Randall, 2003). SAM is run in a cloud-system resolving configuration with a horizontal grid spacing of 4 km in a near-global tropical channel domain. The simulation has 34 vertical levels whose spacing increases with height from the surface through the troposphere and stratosphere. The atmospheric circulation is driven by a zonally symmetric sea surface temperature (SST) profile which peaks at the equator and decreases toward the poleward boundaries. Because SAM was originally designed for small-scale modeling, the grid is Cartesian and no spherical effects are included, except for a meridionally varying Coriolis parameter. Despite the simplifications described above, NGAqua features realistic multiscale organization of tropical and midlatitude systems.
The simulation uses the radiation scheme from version 3 of the Community Atmosphere Model (CAM) with a zonally symmetric diurnal cycle and a bulk microphysics scheme (Khairoutdinov and Randall, 2003). It was initialized with small random perturbations to the temperature, spun up on a 20 km grid, then interpolated to 4 km and run for 100 days, storing the full 3D outputs every 3 hours. To allow the circulation to statistically equilibrate at the 4 km resolution, all analyses described below are performed on the final 80 days. More details about the model configuration are described by Brenowitz and Bretherton (2018) and Narenpitak et al. (2017).
3.2 Coarseresolution model configuration
We test the NN schemes within an atmospheric model with a grid resolution of 160 km, which is within the range of modern GCMs. This scale is coarse enough that the grid-mean precipitation has significant autocorrelation over the 3-hour sampling interval of the training data, which should therefore sufficiently resolve the time-evolving grid-mean moist convective dynamics.
We use SAM rather than a more traditional GCM as our coarse-grid model to ensure that it has the same geometry and dry dynamics as the training data. This model, coarse-SAM (cSAM), is run with the same vertical grid as the NGAqua simulation. Its default microphysics scheme has the same three prognostic thermodynamic variables as SAM: the total nonprecipitating water mixing ratio $q_T$; the precipitating water mixing ratio $q_p$, a fast variable that we will neglect in the parametrizations below; and the liquid-ice static energy $s_L$. See (Khairoutdinov and Randall, 2003, Appendix A) for more details.
SAM is typically used for cloud-resolving simulations, so running it efficiently and stably at coarse resolution required several modifications. Most importantly, horizontal hyperdiffusion of $s_L$, $q_T$, and the three wind components $u$, $v$, and $w$ was added to suppress grid-scale oscillations and model blowup (divergence of the solutions to machine infinity in a finite amount of time). For the 160 km grid, this was sufficient to damp grid-scale oscillations.
For NN simulations, we use a simplified momentum damping rather than SAM's default turbulence closure, which severely limits the time step on the 160 km grid. This forcing damps the velocity toward zero at a rate that decreases linearly from its maximum at the surface to zero at a fixed level aloft (Held and Suarez, 1994). Because this differs from NGAqua, it leads to an inevitable drift of cSAM simulations away from NGAqua over the course of a few days. In the future, we hope to use learned coarse-grained momentum tendencies from NGAqua in place of this approach.
We compare simulations with cSAM coupled to our NN parametrization against a base version of cSAM with the microphysics and radiation tendencies calculated from grid-scale mean thermodynamics. Unlike in the NN simulations, the surface fluxes are computed interactively and coupled to SAM's default turbulence scheme. The base cSAM will only precipitate when a coarse-resolution grid cell achieves saturation, so we expect it to produce noisy simulations. Ideally, we would have preferred to compare the NN parametrizations against a traditional GCM parametrization suite, as did Rasp et al. (2018) and Brenowitz and Bretherton (2018), but no such scheme has been implemented in SAM.
In this article, we assess the accuracy of 10-day weather simulations. They are initialized with the coarse-grained outputs from a particular time, day 100.625, of the NGAqua simulation. Our goal is to obtain cSAM simulations that best match the actual evolution of NGAqua over the following 10 days. All of the simulations presented in this paper use the same time step. The thermodynamic variables and vertical velocity from the 4 km NGAqua data are averaged over 40 x 40 grid-cell blocks, but the horizontal velocities must be averaged along the interfaces of these blocks to respect the anelastic mass conservation equation.
4 Machine Learning Parametrization
4.1 Coarsegraining Problem
An NN will parameterize the unknown sources of the thermodynamic variables $s_L$ and $q_T$. Their coarse-resolution budgets are given by

(1)  $\partial \bar{s}_L / \partial t = g_{s_L} + Q_1$

(2)  $\partial \bar{q}_T / \partial t = g_{q_T} + Q_2$

where the overbar denotes the horizontal average over the coarse-grid boxes. The first terms on the right-hand sides of these budgets, $g_{s_L}$ and $g_{q_T}$, are the tendencies due to the GCM, which include advection and any other explicitly treated process. The apparent heating $Q_1$ and moistening $Q_2$ are defined as budget residuals from these GCM tendencies (Yanai et al., 1973). Because they are residuals, they contain the effects of all unresolved and untreated physics including latent heating, turbulent mixing, and radiation.
The goal of any parametrization is to approximate $Q_1$ and $Q_2$ as potentially stochastic and nonlocal functions of the coarse-resolution variables alone (Palmer, 2001). Because radiative heating and moist physical processes couple the atmosphere more strongly in the vertical than the horizontal direction, we follow the assumption of horizontal locality typically made in moist physics parametrizations; this reduces the dimensionality of the training problem, allowing robust results to be obtained from our NGAqua training dataset. For simplicity, we also treat $Q_1$ and $Q_2$ as deterministic functions of the inputs.
The inputs are the vertical profiles of the prognostic variables $s_L$ and $q_T$, as well as the sea surface temperature (SST) and downwelling insolation (SOLIN) at the top of the atmosphere, which are external parameters. Unlike Brenowitz and Bretherton (2018), we do not use the latent heat flux because it depends on the surface winds, which tend to be inaccurate because they are forced by the simplified damping scheme described in Section 3 rather than a coarse-grained source from NGAqua.
Let the vector $x_i$ concatenate the observed prognostic variables $\bar{s}_L$ and $\bar{q}_T$ over all coarse-grained levels in grid column $i$, and let $y_i$ contain the auxiliary variables SST and SOLIN at this location. With these assumptions, (1) and (2) can be combined:

(3)  $\dot{x}_i^{\mathrm{obs}} = g_i + f(x_i^{\mathrm{obs}}, y_i; \theta) + \epsilon_i$

where the subscript $i$ is the index of the given horizontal grid cell and the superscript obs indicates that these are observed quantities. The function $f$ and its parameters $\theta$ are invariant across locations. Here, $g_i$ is the GCM tendency from coarse-grid advection (which involves other grid columns and hence is nonlocal); $f$ is the portion of the apparent sources $Q_1$ and $Q_2$ that can be modeled deterministically, which the NN will be used to parameterize; and $\epsilon_i$ includes stochastic and structural error. As in Brenowitz and Bretherton (2018), $f$ will be parameterized as an NN whose parameters $\theta$ include the weights and biases of the network's layers.
4.2 Loss Functions for Numerically Stable Parametrizations
How do we find the parameters $\theta$ from the NGAqua time series $x^{\mathrm{obs}}$? Brenowitz and Bretherton (2018) trained a numerically stable NN for use in single column model (SCM) simulations by minimizing the multiple-step prediction error (MSPE) over a multi-day prediction window $T$. SCM dynamics decouple the dynamics of an atmospheric column from its surroundings and are described mathematically by

(4)  $\dot{x}_i = g_i(t) + f(x_i, y_i(t); \theta)$

where $x_i(t; t_0, \theta)$ is the single column time series for a given location $i$, starting prediction time $t_0$, and set of parameters $\theta$. Compared to (3), the forcing $g_i$ and the auxiliary variables $y_i$ are prescribed functions of time, which are decoupled from the local dynamics.
Brenowitz and Bretherton (2018) define the MSPE as

(5)  $J_{\mathrm{MSPE}}(\theta) = \sum_{i, t_0} \sum_{t=t_0}^{t_0+T} \| x_i(t; t_0, \theta) - x_i^{\mathrm{obs}}(t) \|^2$

where the sum ranges over all possible prediction start times $t_0$ and spatial locations $i$. The inner norm is given by

(6)  $\| x \|^2 = \frac{1}{M} \int_0^H \rho_0 \left( w_s\, s_L^2 + w_q\, q_T^2 \right) dz$

where $\rho_0$ is SAM's reference density profile, $M = \int_0^H \rho_0\, dz$ is the mass of an atmospheric column, and $w_s$ and $w_q$ are weight constants for the temperature and humidity, respectively. In this study, we find that with an appropriate choice of $w_s$ and $w_q$, each variable contributes comparably to the overall loss. This resulted in a highly accurate single column time-stepping scheme which closely replicated the $s_L$ and $q_T$ time series of the NGAqua data.
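The mass-weighted inner norm above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation; the function name `column_norm_sq` and the discretization of the vertical integral as a level-by-level sum are assumptions.

```python
import numpy as np

def column_norm_sq(s, q, rho, dz, w_s=1.0, w_q=1.0):
    """Mass-weighted squared norm of a column anomaly (sketch of Eq. 6).

    s, q : per-level anomalies of liquid-ice static energy and total water
    rho  : reference density profile; dz : layer thicknesses
    w_s, w_q : weights chosen so both variables contribute comparably
    """
    mass = np.sum(rho * dz)                               # column mass M
    return np.sum(rho * dz * (w_s * s**2 + w_q * q**2)) / mass
```

A constant unit anomaly in one variable (with unit weight) yields a norm of exactly 1, regardless of the density profile, which is a convenient sanity check when tuning the weights.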
Unfortunately, MSPE training produces NN schemes which implicitly depend on the temporal discretization and time step used to simulate (4). The main symptom of this is a "spin-up" error, where the SCM only produces accurate tendencies after a few time steps of prediction (see Figure 1): the spun-up predictions (panels c, d) more closely resemble the truth (panel a) than the instantaneous prediction (panel b) does. Because this spin-up process occurs within the first 3 hours, it did not cause large errors in the single column simulations of Brenowitz and Bretherton (2018). However, for spatially extended simulations, the coupling between the GCM dynamics and the NN occurs at every time step, so the NN must produce accurate predictions from the start. Thus, MSPE-trained NNs are not suitable for use within cSAM because they spin up on a timescale longer than the cSAM time step.
Our goal in this article is to develop a loss function which produces an NN that gives spatially extended coarse-grid simulations that are stable, unbiased, and insensitive to the model time step. This loss function is inspired by the Lax equivalence theorem in numerical analysis, which asserts that a finite difference method is convergent if and only if it is consistent and stable (Quarteroni et al., 2007). The loss is given by

(7)  $J(\theta) = J_{\mathrm{inst}}(\theta) + \lambda\, J_{\mathrm{stab}}(\theta)$
Consistency will mean that $f$ approximates $Q_1$ and $Q_2$ instantaneously, and is accomplished by minimizing the loss given by

(8)  $J_{\mathrm{inst}}(\theta) = \sum_{i,t} \| f(x_i^{\mathrm{obs}}(t), y_i(t); \theta) - Q_i(t) \|^2$

(9)  $Q_i(t) = \dot{x}_i^{\mathrm{obs}}(t) - g_i(t)$

In practice, the storage terms $\dot{x}_i^{\mathrm{obs}}$ are estimated from the 3-hourly sampled data using finite differences. Thus, minimizing $J_{\mathrm{inst}}$ demands that $f$ approximates the apparent heating and moistening.
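The finite-difference estimate of the apparent source can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `apparent_source` is hypothetical, a forward difference is used for the storage term, and the GCM tendency is taken at the left endpoint of each sampling interval.

```python
import numpy as np

def apparent_source(x, g, dt):
    """Estimate Q = dx/dt - g from regularly sampled snapshots.

    x  : array (times, levels) of a prognostic variable
    g  : array (times, levels) of the GCM (advective) tendency
    dt : sampling interval (e.g. 3 hours, in consistent units)
    """
    storage = (x[1:] - x[:-1]) / dt   # forward difference of the storage term
    return storage - g[:-1]           # residual = apparent source over interval
```

For linearly evolving data the residual is exact: a state growing at 2 units per interval with a constant GCM tendency of 0.5 yields an apparent source of 1.5.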
On the other hand, stability requires that small perturbations to the inputs do not grow under the dynamics given by (4). In practice, we enforce this using a penalty given by

(10)  $J_{\mathrm{stab}}(\theta) = \sum_i \| \tilde{x}_i(T; \theta) - \langle x_i^{\mathrm{obs}} \rangle \|^2$

where $\langle \cdot \rangle$ is the time-average operator. $\tilde{x}_i$ is the single column simulation under the action of the time-mean forcing, whose dynamics are given by

(11)  $\dot{\tilde{x}}_i = \langle g_i \rangle + f(\tilde{x}_i, \langle y_i \rangle; \theta)$

Its initial condition is $\tilde{x}_i(0) = \langle x_i^{\mathrm{obs}} \rangle$. This penalty demands that the SCM simulation does not diverge too quickly over a prediction of length $T$. Because $J_{\mathrm{stab}}$ uses constant forcing, it is less sensitive than the MSPE to the temporal discretization of (4). In practice, NNs trained to minimize (7) avoid spin-up errors and accurately predict $Q_1$ and $Q_2$ at the initial time. Like MSPE-trained NNs, they produce stable, albeit less accurate, SCM simulations. Since our main goal is to produce accurate coupled GCM-NN simulations, this deteriorated SCM performance is acceptable.
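The constant-forcing rollout behind the stability penalty can be sketched as below. This is a minimal illustration, assuming a forward-Euler discretization and a hypothetical function name `stability_penalty`; the paper's actual time-stepping scheme and penalty normalization may differ.

```python
import numpy as np

def stability_penalty(f, x0, g_mean, dt, n_steps):
    """Roll out the single-column model under constant (time-mean) forcing
    and penalize drift from the initial (time-mean) state.

    f      : parameterized source, maps a state vector to its tendency
    x0     : time-mean initial state
    g_mean : time-mean advective forcing
    """
    x = x0.copy()
    for _ in range(n_steps):
        x = x + dt * (g_mean + f(x))   # forward-Euler step of the SCM
    return np.sum((x - x0) ** 2)       # squared drift over the window
```

A damping source such as f(x) = -x keeps the rollout bounded, so the penalty stays small, whereas an amplifying source would grow it rapidly with `n_steps`.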
4.3 Coupled Numerical Instability
The methods discussed above still do not prevent model blowup when coupled to cSAM, as we will show in Section 5. This occurs because the single column training strategy does not account for feedbacks between cSAM and the NN, so the corresponding numerical instabilities are not penalized.
A similar feedback causes grid-scale storms when a GCM is coupled to a moisture-convergence closure (Arakawa, 2004). Emanuel et al. (1994) argue that this closure fundamentally confuses cause and effect, and that precipitation drives moisture convergence rather than vice versa. In fact, such misleading correlations are common in nonlinear dynamical systems such as the atmosphere, a phenomenon known as synchronization (Pecora et al., 1997), so data-driven models must be taught to ignore such noncausal correlations. Thus, we must determine whether the NN is mistaking cause for effect.
4.3.1 Interpreting the trained networks using linear response functions
The linearized response of the NN to small perturbations can reveal if it is confusing cause and effect. The linearized response function (LRF) is given by

$\mathrm{LRF} = \partial f / \partial x$

where the derivative is taken with respect to the phase space variables $x$ and evaluated at a reference state. For ease of discussion, we defer the model architecture and training details to Section 4.4. We first train an NN using the stability-penalized loss in (7) and then compute its LRF for the mean equatorial sounding (Figure 2) using automatic differentiation (Paszke et al., 2017). We compare against LRFs that Kuang (2018) computed by perturbing the steady forcing of a CRM, which presumably represent causal relationships between the inputs and outputs.
Are the LRFs of our NNs similar to those of Kuang (2018, Figure 1)? The LRFs of Kuang (2018) have an upward dependence structure (a mainly upper-triangular structure in the panels of Figure 2), where the heating and, to a lesser extent, the moistening at a given vertical level depend only on data from below. This is physically realistic, since cumulus convection initiates at lower elevations and deepens into cumulonimbus; evaporation of precipitation leads to some lower-triangular structure as well. On the other hand, our NN predicts that heating and moistening at all heights are very sensitive to the humidity near the tropopause (200 mb), a physically nonsensical result. An increase of humidity at this level leads to substantial drying at all lower levels. Moreover, the NN does not accurately predict $Q_1$ and $Q_2$ at these levels (cf. Figure 5), perhaps because the fluctuations of humidity and temperature there occur on faster timescales than the 3-hour sampling rate of the data. Thus, the NN depends most sensitively on the inputs for which it performs worst.
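The LRF is a Jacobian of the parameterized source about a reference sounding; while the paper computes it with automatic differentiation, an equivalent finite-difference sketch for any vector-valued source is shown below (the function name `linear_response_function` is an assumption).

```python
import numpy as np

def linear_response_function(f, x_mean, eps=1e-6):
    """Approximate the Jacobian (LRF) of a source f about the state x_mean
    by one-sided finite differences.  Column j gives the response of every
    output to a small perturbation of input j."""
    f0 = np.asarray(f(x_mean))
    J = np.empty((f0.size, x_mean.size))
    for j in range(x_mean.size):
        x = x_mean.copy()
        x[j] += eps                       # perturb one input level
        J[:, j] = (np.asarray(f(x)) - f0) / eps
    return J
```

For a smooth source this agrees with automatic differentiation up to truncation error; for a genuine NN, passing the network's forward function as `f` yields the same matrix plotted in Figure 2.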
The NN is sensitive to upper-level humidity because the latter is strongly correlated with precipitation. Figure 3a shows that the humidity and apparent moistening are highly anticorrelated, as measured by the effective damping rate given by

(12)  $\tau^{-1}(z) = -\frac{\langle Q_2'\, q_T' \rangle}{\langle q_T'^2 \rangle}$

where $\langle \cdot \rangle$ is the zonal and time averaging operator and primes denote anomalies from that average. Near the tropopause, the moisture has a damping timescale of around 3 hours, which is the sampling rate of the NGAqua data. This strong damping results in a near-perfect correlation between the advective moistening and $Q_2$ at this level, as shown in Figure 3b.
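The effective damping rate above is a per-level regression of the moistening anomaly onto the humidity anomaly, which can be sketched as follows (the function name `effective_damping_rate` and the sample layout are assumptions).

```python
import numpy as np

def effective_damping_rate(q_anom, Q2_anom):
    """Effective damping rate at each level: minus the regression coefficient
    of the apparent-moistening anomaly onto the humidity anomaly.

    q_anom, Q2_anom : arrays (samples, levels) of anomalies from the
    zonal/time mean, where samples index zonal position and time.
    """
    return -np.sum(Q2_anom * q_anom, axis=0) / np.sum(q_anom ** 2, axis=0)
```

If the moistening were exactly a linear damping of humidity, Q2' = -r q', the function would recover r at every level; the reciprocal of the result is the damping timescale discussed in the text.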
The advective moistening is primarily driven by gridscale mean vertical motions due to cumulus convection. The numerical instability probably appears in the tropics first because the weak temperature gradient there ensures that latent heating drives gridscale average vertical motions (Sobel and Bretherton, 2000). Thus, the amount of water cSAM advects into the upper atmosphere strongly correlates with precipitation.
The sensitivity to humidity can drive numerical instability as follows. A positive moisture anomaly in the tropical upper troposphere will increase the predicted precipitation and heating. This will then increase the upward velocity, which closes the feedback loop by increasing the supply of moisture to the upper atmosphere. Ultimately, this feedback can lead to gridscale storms (see Figure 7c below), similarly to moisture convergence closures.
This instability might be less problematic had we been able to customize the 4 km NGAqua simulations for the training process so as to better infer causality, e.g., by using higher output time resolution and calculating instantaneous heating and moistening rates. However, we were forced to work around it, as described in the next section.
4.3.2 Eliminating noncausal relationships
We enforce a more plausible causal structure on the NN by not using the humidity and temperature above certain heights as predictors. To obtain a numerically stable simulation without grid-scale storms, it suffices to ignore $q_T$ and $s_L$ at or above levels 15 and 19, respectively. We also assume that the NN predicts vanishing tendencies at or above the same levels; however, a debiasing procedure discussed below will allow predictions of nonzero tendencies there. To justify this assumption, we note that the apparent sources are small at these levels and very noisy above (cf. Figure 6). We have run cSAM simulations with this configuration for up to 100 days without numerical blowup or grid-point storms.
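Discarding the upper-atmospheric predictors amounts to truncating the profiles before concatenating the NN input vector, as in the sketch below. The cutoff constants mirror the level indices in the text, but zero-based indexing and the function name `truncate_inputs` are assumptions of this illustration.

```python
import numpy as np

# Level cutoffs from the text (0-based indexing assumed): humidity is
# ignored at or above level 15, temperature at or above level 19.
Q_TOP, T_TOP = 15, 19

def truncate_inputs(q_profile, t_profile, sst, solin):
    """Build the NN input vector from lower-atmospheric predictors only,
    plus the auxiliary scalars SST and SOLIN."""
    return np.concatenate([q_profile[:Q_TOP],   # lower-level humidity
                           t_profile[:T_TOP],   # lower-level temperature
                           [sst, solin]])       # auxiliary variables
```

With 34-level profiles this produces a 36-element input (15 + 19 + 2), versus 70 when all levels are retained.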
Figure 4 shows the LRF of the modified NN. Most strikingly, the linearized response of the drying/moistening to the input variables is an order of magnitude smaller than for the NN with all input variables. For the most part, this LRF also has a smoother vertical structure that is more compatible with the CRM-derived LRFs (Kuang, 2018). For instance, both our LRFs and those of Kuang (2018) show that destabilizing the atmosphere by decreasing the lower-tropospheric temperature leads to heating/drying throughout the column (panels b and d). Likewise, lower-tropospheric moisture induces convective heating aloft (panel c). Unfortunately, there are still large oscillations in the response of the heating to the humidity near the highest retained levels. Nonetheless, ignoring the upper-atmospheric variables markedly improves the linearized response of the NN.
4.4 Network architecture and training
Unlike Rasp et al. (2018), we did not find that changing the network architecture by adding more layers or parameters prevents blowup. However, since we are training with global data, we use a higher-capacity network than Brenowitz and Bretherton (2018) used for the tropics alone. Thus, each model trained in this paper has three densely connected hidden layers with 256 nodes each and rectified linear unit (ReLU) activations. Altogether, this architecture has over 100,000 free parameters.
Prior to input, each variable is normalized by subtracting the global mean and dividing by the standard deviation for each level independently, except where the latter vanishes. This level-by-level normalization is a departure from past studies (Rasp et al., 2018; Brenowitz and Bretherton, 2018), but had little impact on the results. In particular, NNs trained with inputs normalized as in Brenowitz and Bretherton (2018) also learn to depend strongly on the upper-atmospheric humidity and suffer from the coupled numerical instabilities discussed above.
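The level-by-level normalization, including the guard for levels with vanishing standard deviation, can be sketched as follows (the function name `normalize_by_level` is an assumption of this illustration).

```python
import numpy as np

def normalize_by_level(x, mean, std):
    """Normalize each vertical level independently.  Where the standard
    deviation vanishes (a level with no variability), only remove the mean
    instead of dividing by zero."""
    safe_std = np.where(std > 0, std, 1.0)
    return (x - mean) / safe_std
```

Applied with per-level statistics computed over the training samples, variable levels end up with zero mean and unit variance while constant levels map to exactly zero.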
The network's weights and biases are optimized by stochastic gradient descent (SGD) with the popular Adam optimizer (Kingma and Ba, 2014). If a sample is defined as an individual atmospheric column for a given horizontal location and time step, then there is one sample for every coarse-grid column and 3-hourly snapshot in the training dataset. These samples are randomly shuffled in horizontal space and formed into batches of 64 time series, each containing the full 80 days of data. In practice, SGD trains an NN that captures the time-space variability in diabatic processes but makes large errors in the zonal and time mean. This occurs because the optimization is highly nonconvex and the variability of cloud-related processes is larger than its mean. Therefore, after SGD converges, we use linear regression to debias the predicted heating and moistening for each height and latitude separately (see Text S1). To avoid reintroducing the causality and numerical stability issues solved in Section 4.3, these regressions depend only on the predictions of $Q_1$ and $Q_2$ as well as $s_L$ and $q_T$ at the given height, and do not introduce new dependencies between vertical levels.
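The per-height, per-latitude debiasing step is an ordinary least-squares fit; a numpy sketch is shown below. The paper uses scikit-learn's LinearRegression for this; the function names and the exact set of regressors (the NN prediction plus the local state at that height) are assumptions of this illustration.

```python
import numpy as np

def fit_debias(pred, state, truth):
    """Fit truth ~ a * pred + b * state + c at one height and latitude band
    by least squares, returning the coefficients (a, b, c)."""
    A = np.column_stack([pred, state, np.ones_like(pred)])
    coef, *_ = np.linalg.lstsq(A, truth, rcond=None)
    return coef

def apply_debias(coef, pred, state):
    """Apply a fitted linear correction to new NN predictions."""
    return coef[0] * pred + coef[1] * state + coef[2]
```

Because each fit uses only quantities at a single height, the correction cannot reintroduce the cross-level dependencies removed in Section 4.3.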
The NN training and linear regression are implemented in Python using PyTorch (Paszke et al., 2017) and scikit-learn (Pedregosa et al., 2011), respectively. The debiased NN models are then coupled to cSAM using a Python interface. The predicted tendencies are disabled within 3 grid cells (480 km) of the meridional boundaries because the flow in these regions is greatly influenced by the no-normal-flow boundary conditions, so the NN is inaccurate there (cf. Fig. 5).
5 Results
We now present our spatially extended simulations. As described in Section 3, the simulations are initialized with the NGAqua data at day 100.625, and our discussion focuses on the prediction accuracy in the first 10 forecast days. Two NN parametrizations are compared with the training data and with the base simulation with resolved-scale microphysics and radiation. The first NN simulation, labeled "NN-All", uses the NN which includes all levels as input. The discussion in Section 4.3 indicates that this setup leads to grid-scale storms and model blowup. The NN in the second simulation, labeled "NN-Lower", only uses humidities below level 16 (375 mb) and temperatures below level 19 (226 mb), as described in Section 4.3.2. When evaluated directly on the NGAqua data, NN-Lower has an $R^2$ value of around 60% for $Q_1$ and $Q_2$ in the lower troposphere (Figure 5). Interestingly, the predicted heating and moistening are less noisy than the "true" values estimated as budget residuals (Figure 6). Thus, the NN-Lower parametrization performs well in an instantaneous sense.
Figure 7b demonstrates that the NN-Lower configuration is indeed numerically stable and produces a reasonable prediction of the precipitable water (PW) after 5 days, albeit with a visible loss of large-scale moisture variability in the tropics. On the other hand, NN-All shows extremely moist grid-scale storms near the tropics, which coincide with intense precipitation and ascent (not shown). Likewise, the base simulation suffers from grid-point storms in the tropics and frontal regions in the extratropics, and it ultimately blows up after 9 days of simulation.
5.1 Perfect Model Weather Prediction
Figure 8 shows the root-mean-squared error (RMSE) of $s_L$, $q_T$, and the vertical component of vorticity for each configuration. The RMSE values are vertically integrated with mass weighting and averaged over three latitude bands: the tropics (nearest the equator), the subtropics, and the extratropics (the poleward portion of the domain). For all regions and variables, NN-Lower has the lowest RMSE. This improvement is most striking for the thermodynamic variables $s_L$ and $q_T$, which have parameterized source terms. Compared to the base simulation, NN-Lower has 50% lower errors in both variables. The vorticity error improves less, which is not surprising since we do not attempt to represent the source of momentum. However, in the extratropics, both NN-Lower and NN-All outperform the base simulation. Overall, NN-Lower produces the most accurate predictions.
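The vertically integrated, mass-weighted RMSE used in Figure 8 can be sketched as below (the function name `mass_weighted_rmse` and the array layout are assumptions of this illustration).

```python
import numpy as np

def mass_weighted_rmse(x, x_obs, rho, dz):
    """Mass-weighted, vertically integrated RMSE between a forecast and the
    coarse-grained reference, averaged over horizontal points in a band.

    x, x_obs : arrays (levels, points); rho, dz : (levels,)."""
    w = (rho * dz) / np.sum(rho * dz)              # mass weights, sum to 1
    err2 = np.sum(w[:, None] * (x - x_obs) ** 2, axis=0)
    return float(np.sqrt(err2.mean()))
```

Because the weights sum to one, a uniform error of 2 units at every level and point yields an RMSE of exactly 2, so the metric stays in the variable's native units.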
The NN-Lower simulation also predicts the net precipitation rate more accurately than the base simulation does. The net precipitation rate is given by

$P - E$

where $P$ is precipitation and $E$ is evaporation (Yanai et al., 1973). Net precipitation is the closest analog to precipitation that the NNs can predict because they predict the combined effect of precipitation and evaporation.
The net precipitation is much less noisy in NN-Lower than in the base run. Figure 9 shows maps of net precipitation after two days of prediction (day 102.625) for the NGAqua training data and the NN-Lower and base simulations. Overall, the NN-Lower simulation produces the correct extratropical pattern of net precipitation but fails to produce enough variability in the tropics. In NGAqua at day 102.625, the tropical precipitation occurs in small-scale clusters and in a larger-scale Kelvin wave; elsewhere there is column moistening due to evaporation. On the other hand, the base simulation produces far too strong and noisy precipitation, as expected for a scheme which only rains when a coarse grid cell becomes saturated.
Does the NN predict overly smooth net precipitation at the initial time, or does this smoothness only appear as the coupled simulation evolves? The NN's "semi-prognostic" net precipitation (i.e., predicted from the true coarse-grained NGAqua state at that time) is also shown in Figure 9b. Even in this instantaneous sense, NN-Lower prefers to rain lightly throughout the tropics rather than moistening in some regions and strongly drying in others.
Figure 10 shows the pattern correlation with NGAqua of the net precipitation for the NN-Lower and base simulations. NN-Lower has higher pattern correlations than the base simulation at all times and retains a correlation of 0.5 up to day 101.5 in the tropics and beyond day 102.5 in the extratropics and subtropics. By this measure, the base simulation has little predictive skill in the tropics.
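The pattern correlation used here is the standard centered (anomaly) correlation between two spatial fields; a minimal stdlib sketch (the function name is ours):

```python
import math

def pattern_correlation(a, b):
    """Centered pattern (anomaly) correlation between two fields,
    flattened to 1-D sequences: the Pearson correlation of their
    spatial anomalies about each field's own mean."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    cov = sum(x * y for x, y in zip(da, db))
    norm = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return cov / norm

truth = [0.0, 1.0, 2.0, 3.0]
forecast = [1.0, 3.0, 5.0, 7.0]  # perfectly linearly related to truth
print(pattern_correlation(truth, forecast))  # → 1.0
```

Because the fields are centered first, a forecast with the right spatial pattern but a uniform bias still scores a correlation of 1.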
5.2 Mean-state bias
Short-term predictions with NN-Lower are reasonably accurate, but the mean state drifts away from the true climate. We first focus on the time evolution of the zonal-mean precipitable water (PW) and net precipitation, as shown in Figure 11. The fields are shown for days 100–120 for NGAqua and the NN-Lower simulation; the base simulation crashed on day 110. Compared to NGAqua, NN-Lower maintains the correct meridional distribution of PW, but its net precipitation steadily decreases in the tropics. All else equal, this would tend to increase the PW in the tropics, but the Hadley circulation and its transport of vapor towards the equator also weaken. The net precipitation in the base simulation, on the other hand, becomes too concentrated at the equator, and the atmosphere dries out significantly. Overall, NN-Lower outperforms the base case and maintains the correct zonal-mean PW distribution, but it does not produce enough precipitation in the tropics at later times.
Figure 12 shows the zonal-mean bias vs. NGAqua of the zonal winds, meridional streamfunction, humidity, and temperature in the NN-Lower simulation, time-averaged over forecast days 5–10. The humidity bias is relatively small in amplitude. The temperature bias is also small and peaks near the meridional boundaries of the domain, where the NN is not used. On the other hand, the meridional circulation substantially weakens, with a much-reduced peak mass transport. This is associated with the decreasing tropical net precipitation seen in Figure 11. The zonal-mean zonal winds also differ substantially. The surface westerlies and eddy-driven jets become weaker, likely due to the Held–Suarez momentum damping. At the same time, the jets shift southward and super-rotation develops, likely as a transient response to the weakening Hadley circulation. Overall, the small biases in the thermodynamic variables belie growing biases in the circulation.
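A meridional overturning streamfunction of the kind plotted in Figure 12 can be sketched as the cumulative pressure integral of the zonal-mean meridional wind (the function name, the planar-channel geometry, and the zonal width `lx` are our assumptions):

```python
import numpy as np

def streamfunction(v, dp, lx=4.0e7, g=9.81):
    """Meridional mass streamfunction (kg/s) for a planar, zonally
    periodic channel of width lx (m):

        psi(p, y) = (lx / g) * cumsum_p(v * dp)

    v  : (nz, ny) zonal-mean meridional wind (m/s), top level first
    dp : (nz,) layer pressure thickness (Pa)
    """
    return lx / g * np.cumsum(v * dp[:, None], axis=0)

# toy example: uniform 1 m/s northward flow through three 100 hPa layers
v = np.ones((3, 4))
dp = np.full(3, 100e2)
psi = streamfunction(v, dp)
print(psi.shape)  # → (3, 4)
```

The deepest row of `psi` is then the total column-integrated mass flux, and its extrema measure the strength of the Hadley cells.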
The loss of zonal variability in net precipitation (Fig. 9) and precipitable water (Fig. 7) might explain these mean-state drifts. Precipitation depends exponentially on precipitable water (Rushley et al., 2018; Bretherton et al., 2004), so a reduction in the zonal variance of PW (fewer extremely moist or dry coarse grid cells) will tend to decrease the zonal-mean precipitation for a given zonal-mean PW. Figure 13 confirms that this occurs in the NN-Lower simulation, whose mean tropical net precipitation is 40% weaker than NGAqua's despite a similar mean PW. Indeed, Figure 13 shows that the joint distribution of PW and net precipitation has much less variance in NN-Lower than in NGAqua.
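This variance argument is just Jensen's inequality applied to a convex precipitation curve. A toy illustration (the exponential coefficients are illustrative, not the fitted values of Rushley et al., 2018):

```python
import math
import random

# hypothetical exponential fit P = a * exp(b * PW); a, b illustrative only
a, b = 1e-4, 0.2

def precip(pw):
    return a * math.exp(b * pw)

def mean_precip(pw_spread, mean_pw=45.0, n=100_000, seed=0):
    """Mean precipitation over a Gaussian PW distribution with a fixed
    mean (mm) but a varying standard deviation pw_spread (mm)."""
    rng = random.Random(seed)
    return sum(precip(rng.gauss(mean_pw, pw_spread)) for _ in range(n)) / n

high_var = mean_precip(10.0)  # NGAqua-like: wide PW distribution
low_var = mean_precip(2.0)    # NN-Lower-like: narrow PW distribution
# exp is convex, so shrinking the PW variance at a fixed mean PW lowers
# the mean precipitation (Jensen's inequality)
print(high_var > low_var)  # → True
```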
The predicted net precipitation has too little variance even when evaluated on the true state (Fig. 13c). In other words, the distribution of net precipitation has too little spread for a given value of PW; that is, the NN relies too strongly on PW. When integrated forward in time, this lack of variance likely reduces the PW variance and, therefore, the mean precipitation. Thus, enhancing the variance of the conditional distribution of net precipitation could ameliorate the circulation biases in these simulations.
Plausible physical mechanisms for the loss of moisture variability depend on the spatial scale. On smaller scales, both the explicit hyperdiffusion and implicit numerical diffusion could reduce the variance of moisture. On larger scales, the moisture probably varies less because NN-Lower prefers to rain lightly throughout the tropics rather than delineating precipitating and evaporating regions. Future work could improve this by developing a stochastic or deterministic trigger for convection, or by separating precipitating and evaporating processes.
6 Conclusions
We use a neural network to build a unified physics parametrization of diabatic heating and moistening, which includes the effects of unresolved subgrid processes. The neural network is trained by coarse-graining a near-global aquaplanet simulation with a 4 km grid size to a much coarser resolution. It learns the coarse-grid heating and moistening profiles, calculated as residuals from the coarse-grid dynamics, from the temperature and moisture profiles plus auxiliary parameters. A key challenge of the coarse-graining approach is that it lacks the natural hierarchical structure of other training datasets used in the literature, such as super-parametrization (SP) (Rasp et al., 2018). In the SP application, for instance, the target model and the training model share the same software interface for the physical-process tendencies. These tendencies are predicted by a high-resolution sub-model in the SP training dataset and must be learned by the neural network parametrization for application to the target model. That application is thus a clear target for emulation.
We extend the single-column model results of Brenowitz and Bretherton (2018) to a spatially extended simulation. The coarse-grid dynamical core is the same anelastic atmospheric model that generated the 4 km training dataset, but it is run at the coarse resolution of the training data. This model needed additional damping, in the form of horizontal hyperdiffusion and Newtonian relaxation near the surface, to run stably either in a base configuration with its default resolved-scale microphysics and radiation schemes or with our neural network unified-physics parametrization. We developed a deeper version of our neural network capable of simulating the diabatic heating and moistening over the entire 46°S–46°N domain of our training simulations, rather than just the tropical subdomain used by Brenowitz and Bretherton (2018). We include a zonal-mean bias correction to minimize zonal-mean temperature and moisture drifts.
An attempt to couple a preliminary version of this neural network to this GCM caused the model to blow up. Analyzing the linearized response of that neural network showed that it inadvertently learns to exploit a strong correlation between upper-atmospheric humidity and precipitation. This correlation is due to the short lifetime of water vapor at those heights rather than to any causal mechanism. Thus, ML parametrization is not immune to the issues with causality that have long inspired debates about closure assumptions.
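A linearized-response analysis of this kind amounts to computing the Jacobian of the network's output with respect to its column inputs. The sketch below finite-differences a randomly initialized toy stand-in (the layer sizes, names, and the finite-difference approach are our assumptions; the paper's analysis may use exact derivatives):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy stand-in for the NN parametrization: 8 coarse-grid column inputs
# mapped to a 4-level tendency profile (sizes arbitrary)
W1, b1 = rng.standard_normal((16, 8)), rng.standard_normal(16)
W2, b2 = rng.standard_normal((4, 16)), rng.standard_normal(4)

def nn(x):
    return W2 @ np.tanh(W1 @ x + b1) + b2

def linearized_response(f, x, eps=1e-6):
    """Finite-difference Jacobian J[i, j] = d f_i / d x_j about x: the
    'linearized response' used to diagnose which inputs the network
    leans on."""
    y0 = f(x)
    J = np.empty((y0.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (f(xp) - y0) / eps
    return J

x0 = rng.standard_normal(8)
J = linearized_response(nn, x0)
# columns of J with large norms flag inputs (e.g. upper-level humidity)
# that dominate the output; pruning such inputs is the intervention
# described in the text
print(J.shape)  # → (4, 8)
```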
We enforced a plausible causal structure by removing the upper-atmospheric humidity and temperature from the neural network inputs. Spatially extended simulations with this modified network run stably indefinitely. Stabilizing the parametrization thus required a rather crude human intervention. Future studies will need to explore automatic ways to discover true causal relationships and forestall model blowup in a dynamically coupled setting.
The resulting neural network predicted the weather of NGAqua with much higher forecast skill and lower bias than a base coarse-grid simulation with only resolved microphysics and radiation parametrizations, which is the only reference case readily available to us. For a fairer comparison, we will need to train and implement the ML parametrization in a global model whose coarse-grid version is typically run with a traditional suite of physical parametrizations.
More work is needed to reduce the long-term biases in these simulations. While the temperature and humidity biases are small in the first 10 days, the Hadley circulation is dramatically weakened because there is not enough heating in the tropics to sustain vertical motions. Thus, more work is needed to keep the climate generated by neural network parametrizations trained by coarse-graining from drifting. We argue that our scheme could benefit from a mechanism for enhancing the tropical variance of moisture and of the predicted precipitation, such as stochasticity in the parametrized tendencies.
Acknowledgements.
N.B. is supported as a postdoctoral fellow by the Washington Research Foundation, and by a Data Science Environments project award from the Gordon and Betty Moore Foundation (Award #2013-10-29) and the Alfred P. Sloan Foundation (Award #3835) to the University of Washington eScience Institute. C.B. is supported by U.S. Department of Energy grants DE-SC0012451 and DE-SC0016433. The coarse-grained NGAqua data and computer codes are available at zenodo.org (Brenowitz, 2018, 2019a) and GitHub (Brenowitz, 2019b), respectively.
References
 Arakawa (2004) Arakawa, A. (2004), The cumulus parameterization problem: Past, present, and future, J. Clim., 17(13), 2493–2525, doi:10.1175/1520-0442(2004)017<2493:RATCPP>2.0.CO;2.
 Brenowitz (2018) Brenowitz, N. D. (2018), Coarsegrained outputs from nearglobal aquaplanet control run with QOBS SST, doi:10.5281/zenodo.1226370.
 Brenowitz (2019a) Brenowitz, N. D. (2019a), Coarsegrained nearglobal aquaplanet simulation with computed dynamical tendencies, doi:10.5281/zenodo.2621638.

 Brenowitz (2019b) Brenowitz, N. D. (2019b), uwnet v0.7, https://github.com/nbren12/uwnet, doi:10.5281/zenodo.2621717.
 Brenowitz and Bretherton (2018) Brenowitz, N. D., and C. S. Bretherton (2018), Prognostic validation of a neural network unified physics parameterization, Geophys. Res. Lett., doi:10.1029/2018GL078510.
 Bretherton and Khairoutdinov (2015) Bretherton, C. S., and M. F. Khairoutdinov (2015), Convective self-aggregation feedbacks in near-global cloud-resolving simulations of an aquaplanet, Journal of Advances in Modeling Earth Systems, 7(4), 1765–1787.
 Bretherton et al. (2004) Bretherton, C. S., M. E. Peters, and L. E. Back (2004), Relationships between water vapor path and precipitation over the tropical oceans, J. Clim., 17(7), 1517–1528, doi:10.1175/1520-0442(2004)017<1517:RBWVPA>2.0.CO;2.
 Chevallier et al. (1998) Chevallier, F., F. Chéruy, N. A. Scott, and A. Chédin (1998), A neural network approach for a fast and accurate computation of a longwave radiative budget, J. Appl. Meteorol., 37(11), 1385–1397, doi:10.1175/1520-0450(1998)037<1385:ANNAFA>2.0.CO;2.
 Cybenko (1989) Cybenko, G. (1989), Approximation by superpositions of a sigmoidal function, Math. Control Signals Systems, 2(4), 303–314, doi:10.1007/BF02551274.
 ECMWF (2018) ECMWF (Ed.) (2018), Part III : Dynamics and numerical procedures, no. 3 in IFS Documentation, ECMWF.
 Emanuel et al. (1994) Emanuel, K. A., J. David Neelin, and C. S. Bretherton (1994), On largescale circulations in convecting atmospheres, Q.J.R. Meteorol. Soc., 120(519), 1111–1143, doi:10.1002/qj.49712051902.
 Goodfellow et al. (2016) Goodfellow, I., Y. Bengio, and A. Courville (2016), Deep learning, Adaptive computation and machine learning, The MIT Press, Cambridge, Massachusetts.
 Hanin (2017) Hanin, B. (2017), Universal function approximation by deep neural nets with bounded width and ReLU activations, arXiv eprints.
 Held and Suarez (1994) Held, I. M., and M. J. Suarez (1994), A proposal for the intercomparison of the dynamical cores of atmospheric general circulation models, Bull. Am. Meteorol. Soc., 75(10), 1825–1830, doi:10.1175/1520-0477(1994)075<1825:APFTIO>2.0.CO;2.
 Jiang (2017) Jiang, X. (2017), Key processes for the eastward propagation of the Madden–Julian oscillation based on multimodel simulations, J. Geophys. Res. D: Atmos., 122(2), 2016JD025,955, doi:10.1002/2016JD025955.
 Jiang et al. (2015) Jiang, X., D. E. Waliser, P. K. Xavier, J. Petch, N. P. Klingaman, S. J. Woolnough, B. Guan, G. Bellon, T. Crueger, C. DeMott, C. Hannay, H. Lin, W. Hu, D. Kim, C.-L. Lappen, M.-M. Lu, H.-Y. Ma, T. Miyakawa, J. A. Ridout, S. D. Schubert, J. Scinocca, K.-H. Seo, E. Shindo, X. Song, C. Stan, W.-L. Tseng, W. Wang, T. Wu, X. Wu, K. Wyser, G. J. Zhang, and H. Zhu (2015), Vertical structure and physical processes of the Madden–Julian oscillation: Exploring key model physics in climate simulations, J. Geophys. Res. D: Atmos., 120(10), 2014JD022,375, doi:10.1002/2014JD022375.
 Khairoutdinov and Randall (2003) Khairoutdinov, M. F., and D. A. Randall (2003), Cloud resolving modeling of the ARM summer 1997 IOP: Model formulation, results, uncertainties, and sensitivities, J. Atmos. Sci., 60(4), 607–625, doi:10.1175/1520-0469(2003)060<0607:CRMOTA>2.0.CO;2.
 Kim et al. (2011) Kim, D., Y.-S. Jang, D.-H. Kim, Y.-H. Kim, M. Watanabe, F.-F. Jin, and J.-S. Kug (2011), El Niño–Southern Oscillation sensitivity to cumulus entrainment in a coupled general circulation model, J. Geophys. Res., 116(D22), doi:10.1029/2011JD016526.
 Kingma and Ba (2014) Kingma, D. P., and J. Ba (2014), Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980.
 Krasnopolsky et al. (2005) Krasnopolsky, V. M., M. S. FoxRabinovitz, and D. V. Chalikov (2005), New approach to calculation of atmospheric model physics: Accurate and fast neural network emulation of longwave radiation in a climate model, Mon. Weather Rev., 133(5), 1370–1383, doi:10.1175/MWR2923.1.
 Krasnopolsky et al. (2010) Krasnopolsky, V. M., M. S. FoxRabinovitz, and A. A. Belochitski (2010), Development of neural network convection parameterizations for numerical climate and weather prediction models using cloud resolving model simulations, in The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, doi:10.1109/IJCNN.2010.5596766.
 Krasnopolsky et al. (2013) Krasnopolsky, V. M., M. S. FoxRabinovitz, and A. A. Belochitski (2013), Using ensemble of neural networks to learn stochastic convection parameterizations for climate and numerical weather prediction models from data simulated by a cloud resolving model, Advances in Artificial Neural Systems, 2013, e485,913, doi:10.1155/2013/485913.
 Kuang (2018) Kuang, Z. (2018), Linear stability of moist convecting atmospheres. Part I: From linear response functions to a simple model and applications to convectively coupled waves, J. Atmos. Sci., doi:10.1175/JAS-D-18-0092.1.
 Langenbrunner and Neelin (2017) Langenbrunner, B., and J. D. Neelin (2017), Pareto-optimal estimates of California precipitation change, Geophys. Res. Lett., 2017GL075226, doi:10.1002/2017GL075226.
 Lyu et al. (2018) Lyu, G., A. Köhl, I. Matei, and D. Stammer (2018), Adjoint-based climate model tuning: Application to the Planet Simulator, J. Adv. Model. Earth Syst., doi:10.1002/2017MS001194.
 Majda (2007) Majda, A. J. (2007), Multiscale models with moisture and systematic strategies for superparameterization, J. Atmos. Sci., 64(7), 2726–2734, doi:10.1175/JAS3976.1.
 Mapes and Neale (2011) Mapes, B., and R. Neale (2011), Parameterizing convective organization to escape the entrainment dilemma, J. Adv. Model. Earth Syst., 3(2), M06,004, doi:10.1029/2011MS000042.
 Nakazawa (1988) Nakazawa, T. (1988), Tropical super clusters within intraseasonal variations over the western Pacific, Journal of the Meteorological Society of Japan. Ser. II, 66(6), 823–839.
 Narenpitak et al. (2017) Narenpitak, P., C. S. Bretherton, and M. F. Khairoutdinov (2017), Cloud and circulation feedbacks in a near-global aquaplanet cloud-resolving model, J. Adv. Model. Earth Syst., 9(2), 1069–1090, doi:10.1002/2016MS000872.
 NOAA (2018) NOAA (2018), Strategic implementation plan for evolution of NGGPS to a national unified modeling system, Tech. rep.
 O’Gorman and Dwyer (2018) O’Gorman, P. A., and J. G. Dwyer (2018), Using machine learning to parameterize moist convection: Potential for modeling of climate, climate change, and extreme events, J. Adv. Model. Earth Syst., 10(10), 2548–2563, doi:10.1029/2018MS001351.
 Palmer (2001) Palmer, T. N. (2001), A nonlinear dynamical perspective on model error: A proposal for nonlocal stochasticdynamic parametrization in weather and climate prediction models, Quart. J. Roy. Meteor. Soc., 127(572), 279–304, doi:10.1002/qj.49712757202.
 Paszke et al. (2017) Paszke, A., S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017), Automatic differentiation in PyTorch.
 Pecora et al. (1997) Pecora, L. M., T. L. Carroll, G. A. Johnson, D. J. Mar, and J. F. Heagy (1997), Fundamentals of synchronization in chaotic systems, concepts, and applications, Chaos, 7(4), 520–543, doi:10.1063/1.166278.
 Pedregosa et al. (2011) Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay (2011), Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, 12, 2825–2830.
 Quarteroni et al. (2007) Quarteroni, A., R. Sacco, and F. Saleri (2007), Numerical Mathematics, Texts in Applied Mathematics, 2nd ed., Springer-Verlag Berlin Heidelberg, doi:10.1007/b98885.
 Rasp et al. (2018) Rasp, S., M. S. Pritchard, and P. Gentine (2018), Deep learning to represent subgrid processes in climate models, Proc. Natl. Acad. Sci. U. S. A., 115(39), 9684–9689, doi:10.1073/pnas.1810286115.
 Rushley et al. (2018) Rushley, S. S., D. Kim, C. S. Bretherton, and M.-S. Ahn (2018), Reexamining the nonlinear moisture–precipitation relationship over the tropical oceans, Geophys. Res. Lett., 45(2), 2017GL076,296, doi:10.1002/2017GL076296.
 Satoh et al. (2008) Satoh, M., T. Matsuno, H. Tomita, H. Miura, T. Nasuno, and S. Iga (2008), Nonhydrostatic icosahedral atmospheric model (NICAM) for global cloud resolving simulations, J. Comput. Phys., 227(7), 3486–3514, doi:10.1016/j.jcp.2007.02.006.
 Schneider et al. (2017) Schneider, T., S. Lan, A. Stuart, and J. Teixeira (2017), Earth system modeling 2.0: A blueprint for models that learn from observations and targeted high-resolution simulations, Geophys. Res. Lett., 44(24), 12,396–12,417, doi:10.1002/2017GL076101.
 Sobel and Bretherton (2000) Sobel, A. H., and C. S. Bretherton (2000), Modeling tropical precipitation in a single column, J. Clim., 13(24), 4378–4392, doi:10.1175/1520-0442(2000)013<4378:MTPIAS>2.0.CO;2.
 Stevens et al. (Submitted) Stevens, B., M. Satoh, L. Auger, J. Biercamp, C. Bretherton, X. Chen, P. Dueben, F. Judt, M. Khairoutdinov, D. Klocke, C. Kodama, L. Kornblueh, S.J. Lin, W. Putman, R. Shibuya, P. Neumann, N. Roeber, B. Vanniere, P.L. Vidale, N. Wedi, and L. Zhou (Submitted), DYAMOND: The DYnamics of the Atmospheric general circulation Modeled On Nonhydrostatic Domains., Prog. Earth Planet. Sci.
 Stratton and Stirling (2012) Stratton, R. A., and A. J. Stirling (2012), Improving the diurnal cycle of convection in GCMs, Q.J.R. Meteorol. Soc., 138(666), 1121–1134, doi:10.1002/qj.991.
 Woelfle et al. (2018) Woelfle, M. D., S. Yu, C. S. Bretherton, and M. S. Pritchard (2018), Sensitivity of coupled tropical Pacific model biases to convective parameterization in CESM1, J. Adv. Model. Earth Syst., 10(1), 126–144, doi:10.1002/2017MS001176.
 Yanai et al. (1973) Yanai, M., S. Esbensen, and J.-H. Chu (1973), Determination of bulk properties of tropical cloud clusters from large-scale heat and moisture budgets, J. Atmos. Sci., 30(4), 611–627, doi:10.1175/1520-0469(1973)030<0611:DOBPOT>2.0.CO;2.
 Zhang and Wang (2006) Zhang, G. J., and H. Wang (2006), Toward mitigating the double ITCZ problem in NCAR CCSM3, Geophys. Res. Lett., 33(6), 4641, doi:10.1029/2005GL025229.