Multi-Rate Time Integration on Overset Meshes
Abstract
Overset meshes are an effective tool for the computational fluid dynamic simulation of problems with complex geometries or multiscale spatiotemporal features. When the maximum allowable timestep on one or more meshes is significantly smaller than on the remaining meshes, standard explicit time integrators impose inefficiencies for time-accurate calculations by requiring that all meshes advance with the smallest timestep. With the targeted use of multirate time integrators, separate meshes can be time-marched at independent rates to avoid wasteful computation while maintaining accuracy and stability. This work applies time-explicit multirate integrators to the simulation of the compressible Navier-Stokes equations discretized on overset meshes using summation-by-parts (SBP) operators and simultaneous approximation term (SAT) boundary conditions. We introduce a novel class of multirate Adams-Bashforth (MRAB) schemes that offer significant stability improvements and computational efficiencies for SBP-SAT methods. We present numerical results that confirm the numerical efficacy of MRAB integrators, outline a number of outstanding implementation challenges, and demonstrate a reduction in computational cost enabled by MRAB. We also investigate the use of our method in the setting of a large-scale distributed-memory parallel implementation, where we discuss concerns involving load balancing and communication efficiency.
keywords:
Multi-Rate Time Integration, Overset Meshes, Chimera, Summation-By-Parts, Simultaneous-Approximation-Term, Adams-Bashforth.
1 Introduction
In the time-explicit direct numerical simulation (DNS) of computational fluid dynamic problems, the maximum timestep allowable for stable integration of the governing equations is often limited by the well-known Courant-Friedrichs-Lewy (CFL) condition,

$$\Delta t \leq \mathrm{CFL}\,\frac{\Delta x}{c},$$

where $\Delta x$ is the minimum grid spacing and $c$ is the maximum characteristic speed of the physical phenomena being simulated, or its viscous analog,

$$\Delta t \leq \mathrm{CFL}\,\frac{\Delta x^2}{\nu},$$

where $\nu$ is the characteristic diffusivity. The timestep taken when integrating over the entire computational domain using a standard explicit single-rate integrator can be severely limited by dynamics that occur on a small portion of the domain, be it due to shorter time scales of certain physical phenomena, such as high-speed flow, or locally high grid resolution.
With multirate integration, groups of degrees of freedom, or even individual right-hand side contributions corresponding to evolution on small timescales, can be integrated with independent time steps, allowing computational work to be avoided on the “slow” components while the “fast” components remain stable and well-resolved in time. This improves the efficiency of the application. The present study focuses on an application of multirate integrators to problems with overset meshes.
The method of overset meshes (also known as the Chimera method (Steger, 1991)) is an approach that discretizes a domain using multiple independent, overlapping structured meshes, each of which is associated with a smooth but potentially geometrically complex mapping. The earliest known appearance of a composite mesh method was to solve elliptic partial differential equations (Volkov, 1968), but similar methods were soon introduced for inviscid transonic flow (Magnus and Yoshihara, 1970) and the Euler equations of gas dynamics (Benek et al., 1983). In the years since, high-order overset-grid (HOOG) approaches have been developed (Sherer and Scott, 2005) and applied to numerous problems (Lee and Baeder, 2002; Sherer and Visbal, 2004; Chou and Ekaterinaris, 2003). The individual meshes are structured, so that the entire discretization can be considered locally structured and globally unstructured. A concise summary of recent developments in the usage of overset meshes to solve problems involving compressible viscous fluids, along with a discussion of stable and accurate interpolation, is given in (Bodony et al., 2011). Additionally, a discussion of various applications of overset meshes can be found in (Noack and Slotnick, 2005).
Some of the earliest work on multirate multistep methods was done by Gear (Gear, 1974). Gear and Wells (Gear and Wells, 1984) later worked with these methods to automate the choice of step size and the partition of fast and slow components. The primary conclusion therein was that the problem of automating these integrator characteristics dynamically was mostly a function of software organization. Engstler and Lubich (Engstler and Lubich, 1997) used Richardson extrapolation along with simple Euler schemes to create a dynamically partitioned process of multirate integration.
Osher and Sanders (Osher and Sanders, 1983) introduced numerical approximations to conservation laws that changed the global CFL restrictions to local ones. Dawson and Kirby (Dawson and Kirby, 2001) performed local time stepping based on the formulations of Osher and Sanders, attaining only first-order accuracy in time. Tang and Warnecke (Tang and Warnecke, 2006) expanded on this work to produce second-order accuracy via more refined projections of solution increments at each local timestep.
Günther, Kværnø, and Rentrop (Günther et al., 2001) introduced multirate partitioned Runge-Kutta (MPRK) schemes, starting from a discussion of Rosenbrock-Wanner methods, and based on strategies introduced by Günther and Rentrop (Günther and Rentrop, 1993). This method focuses on coupling the fast and slow solution components primarily via interpolation and extrapolation of state variables, echoing the earlier work in (Gear, 1974) and (Gear and Wells, 1984).
Savcenco et al. (Savcenco et al., 2007) developed a self-adjusting multirate time-stepping strategy primarily using implicit Rosenbrock methods on stiff ODEs. Constantinescu and Sandu (Constantinescu and Sandu, 2007) developed multirate time-stepping methods for hyperbolic conservation laws based on a method of lines (MOL) approach with partitioned Runge-Kutta schemes. More recently, Sandu and Constantinescu (Sandu and Constantinescu, 2009) developed explicit multirate Adams-Bashforth methods similar to the ones discussed in this work; while the resulting methods are exactly conservative, their application is limited to one-dimensional hyperbolic conservation laws, and their accuracy is limited to second order by an interface region.
Recently, Seny and Lambrechts extended the explicit multirate Runge-Kutta schemes of Constantinescu and Sandu to apply to discontinuous Galerkin computations for large-scale geophysical flows, introducing the method (Seny et al., 2010) and subsequently demonstrating its efficacy in both serial and parallel implementations. Their latest work (Seny et al., 2014) focuses on efficient parallelization of the method using a multi-constraint partitioning library.
While all of these works characterize the stability of the multirate integrators numerically, only a few approach the topic of stability theoretically. An analytical discussion of the stability of a multirate method (namely, for numerical integration of a system of first-order ODEs that can be readily separated into subsystems) is given by Andrus in (Andrus, 1993). This study is based on a method that combines a fourth-order Runge-Kutta scheme (for the fast component) with a similar third-order scheme (for the slow component), introduced by the same author in (Andrus, 1979). Kværnø also derived analytical expressions to find stability regions for her multirate Runge-Kutta schemes in (Kværnø, 2000).
In this study, we use a multirate Adams-Bashforth integrator in a parallel fluid solver, taking advantage of the solver’s overset mesh capabilities to segregate solution components with differing timescales. We do so with particular focus on the resulting improvement in performance, the reduction in work through fewer right-hand-side evaluations, and the accompanying changes in stability regions and accuracy. What results from this effort is a number of conclusions about the viability and extensibility of these integrators to other problems and applications. By taking advantage of a new method of interpolation between overset meshes (Sharan et al., 2018), we attain a higher order of accuracy at interface regions than that of previous work. We also introduce extended-history Adams-Bashforth schemes as a basis for a new class of multirate integrators that demonstrate improved stability compared to standard multirate Adams-Bashforth (MRAB) schemes. For further reference on the multirate Adams-Bashforth schemes implemented here, thorough analyses (especially empirical analyses of stability and the effect of various method design choices) are given in Klöckner (Klockner, 2010). The method is also discussed in a particle-in-cell context by Stock (Stock, 2009), as well as more generally in (Gear, 1974), (Gear and Wells, 1984), and (Sandu and Constantinescu, 2009).
2 Background
2.1 The Compressible Navier-Stokes Equations on Overset Meshes
We apply multirate integrators to the compressible Navier-Stokes equations on overset meshes, making use of an SBP-SAT discretization with a SAT-based method that applies the effect of an interpolation as a penalty term in the right-hand side. Details on the Navier-Stokes formulation, including the governing equations, non-dimensionalization, and coordinate mappings, can be found in (Sharan, 2016). Details on the SBP operators can be found in (Strand, 1994; Carpenter et al., 1993; Mattsson et al., 2004). For the results given later, we use the third-order SBP operator given in (Mattsson et al., 2004). Details on the accompanying SAT boundary conditions (which provide energy stability) can be found in (Svärd et al., 2007; Svärd and Nordström, 2008; Bodony, 2010).
2.2 SAT-Based Interpolation
The method used for interpolation between overset meshes is critical to the use of multirate integrators, as it forms the coupling between fast and slow terms. Our interpolation relies on a Chimera framework that makes use of the PEGASUS (Suhs et al., 2002) and BELLERO (Sherer et al., 2006) interpolation schemes and tooling. In general, the process of communication between grids can be broken down into a number of phases:

Establish communication between grids. Compute the bounding box for a given grid, and use it to determine collisions with any other grids.

Hole cutting/fringe determination. Identify points on each grid as “fringe points” which will donate and receive data from other grids, and identify points on coarser background grids which are well within the boundaries of the finer feature grids, and can thus be deemed inactive.

Donor-receiver pair search — see Figure 1. Fringe points on the receiver grid are paired with donor cells on the donor grid.

Interpolation. State data from the points in the donor cell is transferred to the receiver point via Lagrange interpolation, with the corresponding weights determined from Lagrange shape functions.
See (Bodony et al., 2011) for further discussion of this implementation, including algorithms for hole-cutting and the donor-receiver pair search. Once the donor data is sent (interpolated) to the receiver grids, it must be applied to the receiver state. The conventional method of interpolation between overset meshes uses an “injection” procedure to apply the result of overset interpolation, in which the state values on the receiver grid are directly overwritten by the interpolated values from the donor grid. The interpolation itself is performed at each right-hand side evaluation.
An alternative interface treatment, given in (Sharan et al., 2018), follows a methodology similar to the weak boundary treatment using SAT (Svärd et al., 2007; Svärd and Nordström, 2008; Bodony, 2010) and applies the interpolated values as a penalization term in the right-hand side via a target vector. As in (Sharan, 2016), we consider a single grid point on an overlapping interface. Using the same notation as (Sharan et al., 2018), we describe the interface as a boundary along a face of the computational domain, with the normal direction to the face the grid point lies on, and a superscript indicating inflow (+) or outflow (−). If $q$ denotes the solution at this grid point, with the interpolated value from the donor grid given as $\hat{q}$, we can express the discretized equation at this point as
(2.1) 
where the first term denotes the derivatives of the fluxes, the penalty scaling involves the (1,1) element of the positive-definite matrix associated with the SBP operator and an identity matrix, and the transformation and diagonal matrices are the same as those mentioned in (Svärd et al., 2007; Svärd and Nordström, 2008; Bodony, 2010). The final term involves the viscous flux at the interface point, and all “hatted” terms indicate interpolated values. Note also that if the grid point lies on an edge or a corner in three dimensions, the interface terms for each normal direction must be added. The penalty parameters used in (2.1) are chosen following
for an inflow (+) or outflow (−) interface point. Unlike the injection-based interpolation method, this scheme is provably stable.
Injection-based methods are incompatible with the strictly ODE-based time advancement required for the schemes we will describe: multistep schemes are explicitly reliant on maintaining accurate right-hand side histories, whereas injection methods rely on in-place modification of state variables. The SAT interface treatment instead applies the effects of inter-grid communication as a right-hand side penalty, and is the method we will use.
3 Time Integration
3.1 Adams-Bashforth Integration
To fix notation for the new schemes developed later, we here give a brief derivation of a standard Adams-Bashforth (AB) integrator, as described in Bashforth and Adams (1883). We start with a model IVP given by

$$y'(t) = f(t, y), \qquad y(t_0) = y_0.$$

This is the form that results from a method of lines (MOL) approach to solving time-dependent partial differential equations like those considered in this study. We can find the solution at the next timestep by evaluating

$$y(t_{n+1}) = y(t_n) + \int_{t_n}^{t_{n+1}} f(\tau, y(\tau))\, d\tau.$$
We approximate the time dependency of the right-hand side function with a polynomial with coefficients $\alpha$ (formed by interpolating past values of $f$), extrapolate with that polynomial approximation, and integrate the extrapolant. We use a Vandermonde matrix to construct a linear system to obtain the coefficients to be used in extrapolation from history values:

$$V^{\top} \alpha = b, \qquad (3.1)$$

where $b$ is a vector evaluating the integral of the interpolation polynomial, with entries $b_j = \int_{t_n}^{t_{n+1}} \tau^{j}\, d\tau$, and $V^{\top}$ is the transpose of the Vandermonde matrix with monomial basis and nodes $t_{n-k+1}, \ldots, t_n$, corresponding to past time values, given by

$$V^{\top} = \begin{bmatrix} 1 & \cdots & 1 \\ t_{n-k+1} & \cdots & t_{n} \\ \vdots & & \vdots \\ t_{n-k+1}^{k-1} & \cdots & t_{n}^{k-1} \end{bmatrix}. \qquad (3.2)$$

In (3.1) and (3.2), $k$ is equal to the order of the integrator, and $t_{n-k+1}, \ldots, t_n$ are the time history values, with $f_i = f(t_i, y_i)$ the corresponding right-hand side history values. The coefficients are used to extrapolate to the next state via

$$y_{n+1} = y_n + \sum_{i=n-k+1}^{n} \alpha_i f_i. \qquad (3.3)$$
Clearly, the length of the past history needed to calculate a step (and, thus, the memory required) influences the order of accuracy attained.
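The coefficient construction of (3.1)-(3.3) can be sketched in a few lines. The following is a minimal illustration (not the solver's implementation) that recovers the classical AB3 weights on a uniform grid:

```python
import numpy as np

def ab_coefficients(nodes, t_start, t_end):
    """Solve the system (3.1): weights that integrate the polynomial
    interpolating f at `nodes` exactly over [t_start, t_end]."""
    k = len(nodes)
    # Transposed Vandermonde (3.2): row j holds t**j for each node t.
    VT = np.array([[t**j for t in nodes] for j in range(k)], dtype=float)
    # Exact integrals of the monomial basis over the step.
    b = np.array([(t_end**(j + 1) - t_start**(j + 1)) / (j + 1) for j in range(k)])
    return np.linalg.solve(VT, b)

# AB3 on a uniform grid with h = 1: history at t = -2, -1, 0, step to t = 1.
alpha = ab_coefficients([-2.0, -1.0, 0.0], 0.0, 1.0)
print(alpha)  # [5/12, -16/12, 23/12], the classical AB3 weights
```

With a unit step the resulting weights are exactly the textbook AB3 coefficients; for a nonuniform history (as arises in the multirate schemes below) the same solve applies unchanged.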
An alternative time integration method is required for the first few time steps (the exact number of which depends on the number of history values needed) in order to establish the right-hand side history and “bootstrap” the method. We use a third-order Runge-Kutta (RK3) integrator Heun (1900) to bootstrap the third-order AB methods, whereas a fourth-order Runge-Kutta (RK4) integrator Kutta (1901) is used to bootstrap the fourth-order AB methods. A historical survey of these methods, along with Runge-Kutta methods in general, can be found in Butcher (1996).
3.2 Extended-History Adams-Bashforth Schemes
We will see in Section 4.2 that the ODE systems resulting from our SBP-SAT discretization on overset meshes yield eigenvalue spectra that extend far along the negative real axis. To improve the stability of the AB schemes for ODE systems containing these eigenvalues, we develop new extended-history Adams-Bashforth schemes. As shown in (3.1) and (3.2), the standard $k$th-order AB scheme constructs a square ($k \times k$) Vandermonde matrix to produce a linear system that, when solved, gives the vector of coefficients to be used in extrapolation from history values. With the extended-history scheme, we modify a family of AB schemes to include $\ell$ history values with $\ell > k$, resulting in a non-square ($k \times \ell$) Vandermonde matrix with a number of additional columns:
$$V^{\top} = \begin{bmatrix} 1 & \cdots & 1 \\ t_{n-\ell+1} & \cdots & t_{n} \\ \vdots & & \vdots \\ t_{n-\ell+1}^{k-1} & \cdots & t_{n}^{k-1} \end{bmatrix}. \qquad (3.4)$$
This gives an underdetermined system without a unique solution for the coefficients (we are solving $k$ equations for $\ell$ unknowns). We select for further study the solution that minimizes $\|\alpha\|_2$:

$$\alpha = \underset{V^{\top}\alpha = b}{\arg\min}\, \|\alpha\|_2. \qquad (3.5)$$
We expect that minimizing the magnitude of the coefficients used in extrapolation will minimize the possibility of overshoot and produce schemes with improved stability characteristics. To further motivate this choice of $\alpha$, we use a model ODE:

$$y'(t) = \lambda y(t), \qquad \lambda \in \mathbb{R}.$$

Referencing (3.3), we can derive a step matrix $S$ for a third-order Adams-Bashforth integrator applied to this case such that

$$w_{n+1} = S\, w_n,$$

where the vector $w_n$ is given by $w_n = (y_n, f_{n-1}, f_{n-2})^{\top}$, $f_i$ is given by $f_i = \lambda y_i$, and the step matrix is given by

$$S = \begin{bmatrix} 1 + \alpha_1 \lambda & \alpha_2 & \alpha_3 \\ \lambda & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix},$$

where $(\alpha_1, \alpha_2, \alpha_3)$ is the vector of Adams-Bashforth coefficients $\alpha$ of (3.1) (which absorb the timestep $\Delta t$), with $\alpha_1$ weighting the most recent right-hand side value. Based on the definition of $S$, it follows that the Adams-Bashforth integration of this case will remain stable if $|\mu_i| \leq 1$, where $\mu_i$ are the eigenvalues of $S$. The eigenvalues of $S$ are given by the roots of

$$\mu^3 - (1 + \alpha_1 \lambda)\,\mu^2 - \alpha_2 \lambda\, \mu - \alpha_3 \lambda = 0.$$

Therefore, as $\|\alpha\|_2 \to 0$, the eigenvalues approach $\{1, 0, 0\}$ for all real values of $\lambda$. This indicates for this case that a choice of $\alpha$ that minimizes $\|\alpha\|_2$ is likely to produce superior stability characteristics, as it will lead to lower values of $|\mu_i|$.
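The minimum-norm solve of (3.4)-(3.5) can be illustrated with an SVD-based least-squares solve, which returns the minimum-2-norm solution for underdetermined systems. This is a sketch, not the solver's implementation:

```python
import numpy as np

def extended_ab_coefficients(nodes, order, t_start, t_end):
    """Minimum-norm weights for an extended-history AB scheme, per (3.4)-(3.5):
    `order` moment conditions imposed on len(nodes) > order unknowns."""
    VT = np.array([[t**j for t in nodes] for j in range(order)], dtype=float)
    b = np.array([(t_end**(j + 1) - t_start**(j + 1)) / (j + 1) for j in range(order)])
    # For underdetermined systems, np.linalg.lstsq returns the minimum-2-norm solution.
    alpha, *_ = np.linalg.lstsq(VT, b, rcond=None)
    return alpha

# AB34: third-order conditions on four history values (uniform spacing, h = 1).
alpha = extended_ab_coefficients([-3.0, -2.0, -1.0, 0.0], 3, 0.0, 1.0)
print(alpha, np.linalg.norm(alpha))
```

The returned weights satisfy the same moment conditions as AB3 while having a 2-norm no larger than that of the AB3 weights padded with a zero, since the padded AB3 weights are themselves a feasible solution of the underdetermined system.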
This method can be extended to any order. For third and fourth order, the method produces what we term AB34 (third-order) and AB45 (fourth-order) schemes. We can plot stability regions (using the method we describe in Appendix B) for the new AB34 scheme for comparison against a fourth-order Runge-Kutta (RK4) integrator and standard AB3 integrators and, similarly, plot a comparison of the new AB45 scheme to RK4 and fourth-order Adams-Bashforth (AB4) integrators (Figures 1(a) and 2(a)). Motivated by the amount of computational work per timestep, we normalize the approximate stability regions by the number of right-hand side evaluations per timestep a given integrator requires — that is, while RK4 requires four right-hand side evaluations per timestep, Adams-Bashforth integrators only require one.
When compared with the approximate stability regions for standard Adams-Bashforth schemes, the approximate stability regions for the AB34 and AB45 schemes extend further along the negative real axis, while slightly shrinking along the imaginary axis. In the upcoming section, we will see that the SBP-SAT discretization we use motivates the use of this scheme to handle potentially problematic eigenvalues associated with interpolation; in the results section, we will therefore consider both the standard AB schemes (AB3, AB4) and the new AB34 and AB45 schemes in our analysis of convergence, stability, and performance considerations. While we will limit our continued discussion of extended-history schemes to AB34 and AB45 integrators, the history used to produce the coefficients can be extended even further, creating AB35 and AB46 schemes, whose normalized approximate stability regions are shown in Figures 1(b) and 2(b).
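The real-axis behavior discussed above can be probed with a simple root-condition scan. The sketch below (assuming the scalar model $y' = \lambda y$; it is not the method of Appendix B) bisects for the negative-real-axis stability limit of an AB scheme via the spectral radius of its companion matrix:

```python
import numpy as np

# AB3 coefficients, newest history value first.
ab3 = np.array([23.0, -16.0, 5.0]) / 12.0

def amplification(z, alpha):
    """Spectral radius of the companion (step) matrix of an AB scheme applied
    to y' = lambda*y, with z = lambda*dt."""
    k = len(alpha)
    S = np.zeros((k, k), dtype=complex)
    S[0, 0] = 1.0 + z * alpha[0]   # newest value enters the update directly
    S[0, 1:] = z * alpha[1:]       # older history values
    S[1:, :-1] = np.eye(k - 1)     # shift the history window
    return max(abs(np.linalg.eigvals(S)))

def real_axis_limit(alpha, tol=1e-6):
    """Bisect for the largest |z| on the negative real axis that remains stable."""
    lo, hi = 0.0, 4.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if amplification(-mid, alpha) <= 1.0 + 1e-12 else (lo, mid)
    return lo

print(real_axis_limit(ab3))  # about 6/11 ~ 0.545, the AB3 real-axis limit
```

Applying the same scan to the minimum-norm AB34/AB45 weights gives a quick numerical check of the extension along the negative real axis claimed above.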
3.3 Multirate Adams-Bashforth Integration
We now describe a multirate generalization of the scheme, making use of the algorithm introduced in Gear and Wells (1984). We consider the following model system with “fast” and “slow” solution components:

$$\frac{d y_f}{dt} = f_f(t, y_f, y_s), \qquad \frac{d y_s}{dt} = f_s(t, y_f, y_s). \qquad (3.6)$$
With this in mind, we can set a slow (larger) time step $\Delta t_s$ for $y_s$ such that we maintain stability in the integration of the slow component. We also set a fast time step $\Delta t_f$ for $y_f$ such that $\Delta t_s$ is an integer multiple of $\Delta t_f$, and define the ratio between the two, $\mathrm{SR} = \Delta t_s / \Delta t_f$, as the step ratio of the multirate Adams-Bashforth scheme (henceforth referred to as MRAB). While the results presented here make use of only two separate state components, each with its own right-hand side function and independent rate, the theory is readily extensible to any number of rates.
In the two-component overset formulation with which we are concerned, we define the fast and slow components of our Navier-Stokes solution as the conserved variables on each grid (using a two-grid case as an example), with the subscripts of the state vectors indicating the global grid number; in this instance we assume Grid 1 to be the grid with the fast-moving component of the solution, be it due to physical behavior or finer mesh spacing. Each right-hand side function $f_f$ and $f_s$ is a function of both the slow and fast states — this coupling between the right-hand side functions is, in the case of our application of this theory to overset meshes, realized by the SAT penalty interpolation discussed in Section 2.2.
Within this two-component scheme, a few design choices are available:

The order in which we evaluate and advance the solution components. Namely, two primary options are advancing the fast-evolving solution component through all of its micro-timesteps and waiting to perform the single macro-timestep required for the slow component until the end (a “fastest-first” scheme, per the nomenclature of Gear and Wells (1984)), or pursuing an algorithm in which the slow component is instead advanced first.

For slowest-first evaluation schemes, the choice of whether or not to re-extrapolate the slow state after additional state and right-hand side information is gathered at the micro-timestep level.
Empirical observations on the effects of these choices, among others, are made in (Klockner, 2010). It may be useful to step through a brief example of a multirate Adams-Bashforth integrator, using an example system with a “fast” component requiring twice as many timesteps as the “slow” component to remain well-resolved ($\mathrm{SR} = 2$). We lay out the steps of a third-order fastest-first MRAB scheme with no re-extrapolation, assuming that $y_s$ evolves at the slow rate (macro-timestep $\Delta t_s$) and $y_f$ evolves at the fast rate (micro-timestep $\Delta t_f = \Delta t_s/2$). We write $\hat{f}_s$ and $\hat{f}_f$ for the extrapolants of the right-hand side functions: polynomials in time formed from the sets of right-hand side history values. These polynomials approximating the evolution of $f_s$ and $f_f$ in time are what we will integrate to march $y_s$ and $y_f$, and they will be updated to replace older history values with new right-hand side evaluations during the course of integration through a macro-timestep $\Delta t_s$. We assume availability of right-hand side histories to start the AB method.

Step 1: Form the polynomial extrapolants we will integrate, per the AB methods described in Section 3.1. The right-hand side history values of $f_s$ (used to form $\hat{f}_s$) have been obtained at time points $t - 2\Delta t_s$, $t - \Delta t_s$, and current time $t$, whereas the right-hand side history values of $f_f$ (used to form $\hat{f}_f$) have been obtained at time points $t - \Delta t_s$, $t - \Delta t_s/2$, and $t$.

Step 2: March both $y_s$ and $y_f$ to time $t + \Delta t_s/2$ by integrating the polynomial extrapolants $\hat{f}_s$ and $\hat{f}_f$ formed in Step 1. This results in a set of intermediate values $\tilde{y}_s$ and $\tilde{y}_f$.

Step 3: Evaluate the fast right-hand side $f_f$ at the intermediate values, update the set of right-hand side history values for $f_f$ to include this new value, and construct a new extrapolant $\hat{f}_f$.

Step 4: March $y_s$ to time $t + \Delta t_s$ by integrating the extrapolant $\hat{f}_s$ formed in Step 1.

Step 5: March $y_f$ to time $t + \Delta t_s$ by integrating the extrapolant $\hat{f}_f$ formed in Step 3.

Step 6: Evaluate $f_s$ and $f_f$ at time $t + \Delta t_s$ and update the extrapolants.

The scheme evaluates the fast-evolving right-hand side twice per macro-timestep $\Delta t_s$, whereas the slowly-evolving right-hand side is only evaluated once. For the results shown later, this is the scheme we will use, albeit generalized to different step ratios $\mathrm{SR}$.
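The steps above can be sketched for a toy problem as follows; the coupled linear system, its rates, and all helper names are hypothetical stand-ins for the overset Navier-Stokes setting, not the solver's implementation:

```python
import numpy as np

def weights(nodes, a, b):
    """AB quadrature weights (Section 3.1): integrate the polynomial
    interpolating f at `nodes` exactly over [a, b]."""
    n = len(nodes)
    VT = np.array([[t**j for t in nodes] for j in range(n)])
    rhs = np.array([(b**(j + 1) - a**(j + 1)) / (j + 1) for j in range(n)])
    return np.linalg.solve(VT, rhs)

# Hypothetical coupled linear system standing in for the two-grid problem:
# a slow component (lam_s), a fast component (lam_f), coupled through c.
lam_s, lam_f, c = -1.0, -40.0, 0.5
f_s = lambda ys, yf: lam_s * ys + c * yf
f_f = lambda ys, yf: lam_f * yf + c * ys

def mrab3_step(t, ys, yf, hist_s, hist_f, H, SR=2):
    """One macro-timestep H of a fastest-first, no-re-extrapolation MRAB3 step.
    hist_s, hist_f: length-3 lists of (time, rhs value) pairs, newest last."""
    h = H / SR
    ts, vs = zip(*hist_s)
    ts = np.array(ts) - t                    # slow history nodes relative to t
    for m in range(SR):
        tf, vf = zip(*hist_f)
        wf = weights(np.array(tf) - t, m * h, (m + 1) * h)
        yf = yf + np.dot(wf, vf)             # micro-step the fast component
        # intermediate slow value at the micro-time, from the Step-1 extrapolant
        ys_sub = ys + np.dot(weights(ts, 0.0, (m + 1) * h), vs)
        hist_f = hist_f[1:] + [(t + (m + 1) * h, f_f(ys_sub, yf))]
    ys = ys_sub                              # final micro-time coincides with t + H
    hist_s = hist_s[1:] + [(t + H, f_s(ys, yf))]
    return t + H, ys, yf, hist_s, hist_f
```

Note that the fast right-hand side is evaluated SR times per macro-step while the slow one is evaluated once, mirroring the work savings described above; the histories must be bootstrapped (e.g., by a Runge-Kutta method) before the first macro-step.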
4 Time Integration in SBP-SAT Discretizations
One of the critical questions to be answered is at what timestep sizes $\Delta t_s$ and $\Delta t_f$ the integrators developed in Section 3 remain stable. This section establishes the procedures and tools for characterizing the stability of these integrators in the context of the SBP-SAT discretization used.
4.1 Timestep Calculation
For the discretization of the Navier-Stokes equations described in Section 2, we will incorporate the metrics and multi-dimensionality into the calculation of the timestep used by a given integrator. The maximum stable timestep for a given Navier-Stokes simulation using the SBP-SAT discretization is approximated via the following steps. First, we calculate an inviscid timestep:
(4.1) 
and a viscous timestep:
(4.2) 
where the velocity vector contains the Cartesian velocities, $c$ is the speed of sound, $J$ is the Jacobian, and the remaining terms are the metric terms described in MacCormack (2014). To determine the timestep, we calculate $\Delta t_{\mathrm{inviscid}}$ and $\Delta t_{\mathrm{viscous}}$ at each mesh point, then take

$$\Delta t = \min\left(\Delta t_{\mathrm{inviscid}},\ \Delta t_{\mathrm{viscous}}\right) \qquad (4.3)$$
and take an additional minimum over all mesh points. This timestep calculation is described in more detail in MacCormack (2014).
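As a simplified stand-in for (4.1)-(4.3), omitting the Jacobian and curvilinear metric terms and using hypothetical CFL constants, the pointwise minimum over inviscid and viscous timestep estimates on a uniform Cartesian grid might look like:

```python
import numpy as np

def max_stable_dt(u, v, c, nu, dx, dy, cfl_i=1.0, cfl_v=1.0):
    """Pointwise inviscid/viscous timestep estimate on a uniform Cartesian grid,
    combined as in (4.3). The curvilinear metric terms and Jacobian of
    (4.1)-(4.2) are omitted in this simplified sketch."""
    dt_inviscid = cfl_i / ((np.abs(u) + c) / dx + (np.abs(v) + c) / dy)
    dt_viscous = cfl_v / (nu * (1.0 / dx**2 + 1.0 / dy**2))
    # Pointwise minimum of the two estimates, then a minimum over all points.
    return float(np.min(np.minimum(dt_inviscid, dt_viscous)))
```

On an overset configuration, evaluating this on each grid separately exposes the per-grid timestep disparity that motivates the step ratio SR used later.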
4.2 SAT Interpolation Stability
We now undertake a brief parameter study to show that one of the stiffest right-hand side components limiting the timestep when solving the Navier-Stokes equations with the overset discretization involves the SAT penalty-based interpolation (discussed in Section 2.2). Specifically, we will show that penalty terms applied at fringe points with more than one nonzero characteristic direction (i.e., corner points) produce eigenvalues lying far along the negative real axis. To illustrate this phenomenon, we can linearize a computationally small Navier-Stokes problem (using the procedure for approximation of the Jacobian described in Osusky et al. (2010) to form the approximate linear operator of the system) employing SAT interpolation on overset meshes and analyze the eigenspectrum of the resulting global operator.
We express the time evolution of a fringe corner point as

$$\frac{dq}{dt} = R(q) + P(q), \qquad (4.4)$$

with $R$ denoting the Navier-Stokes right-hand side, and $P$ denoting the penalty terms applied by the SAT interpolation procedure described in Section 2.2 (the latter two terms in (2.1)). We introduce a parameter $\sigma$ that we can vary to apply a fractional weight to, and thus change the magnitude of, the terms associated with the interpolation penalty applied at the corner fringe points, and determine the right-hand side modes associated with these terms.
As for the case to be analyzed, we select an inviscid application of overset grids to the Euler equations with periodic boundary conditions. The initial condition is a momentum gradient, with the remaining momentum components set to 0 everywhere. The initial condition and grid layout for this case are shown in Figure 4. Grid 1 ($31^2 = 961$ points) and Grid 2 ($41^2 = 1681$ points) overlap within the periodic domain.
We form the approximate linearized operators for this case for two values of the corner penalty weight parameter $\sigma$: $\sigma = 0.5$ and $\sigma = 1$. By finding the eigenvalues of the two operators and comparing them, we can identify the eigenvalues associated with the fringe corners. Based on the highest-magnitude corner-associated eigenvalue determined from comparing these spectra, it is clear that the approximate stability region of the RK4 scheme contains the corner-associated eigenvalue, which extends outward along the negative real axis as we increase the penalty weight from 0.5 to 1, whereas the approximate stability region for the third-order AB scheme does not. This motivated the development of the extended-history schemes discussed in Section 3.2, which improve the stability of AB schemes along the negative real axis. Figure 5 shows that the approximate stability region for the third-order extended-history AB scheme indeed contains the corner-associated eigenvalue.
4.3 Procedure for Determining Discrete Stability
We now develop a procedure for characterizing the discrete temporal stability of a given integrator within the SBP-SAT discretization. Given that the Adams-Bashforth integrators discussed in Section 3 use a linear combination of state and right-hand side history to march in time (see (3.3)), we define an “integrator state vector” at time $t_n$ as

$$v^n = \left[ y^n,\ f(y^{n-1}, t_{n-1}),\ \ldots,\ f(y^{n-k+1}, t_{n-k+1}) \right]^{\top},$$

where $y^n$ is the vector of ODE state being integrated, and $f$ is a function (linear or nonlinear) denoting the right-hand side for that state at a given time. For a fourth-order Runge-Kutta (RK4) integrator, which we will use as a baseline for comparison and to verify this procedure, the integrator state vector instead simply contains the vector of ODE state, as RK4 integration is not dependent on right-hand side history:

$$v^n = y^n.$$
We can then express integration forward one timestep as a matrix operation on the integrator state vector $v^n$:

$$v^{n+1} = S\, v^n, \qquad (4.5)$$

where $S$ denotes the “step matrix” for a given integrator. These step matrices are functions of the timestep $\Delta t$ used in the simulation, and are formed using a linearization procedure that approximates the full Jacobian matrix of the system using finite differences. In doing so, analyzing the eigenvalues of this step matrix is analogous to analyzing the amplification factors of various modes in the global error of the scheme.
To construct the step matrix $S$, we first use an initial state $v^0$ as input to an integrator with timestep $\Delta t$ to march one step forward in time and form the base result $v^1$. Next, we perturb a single element of the initial state vector using a small value $\epsilon$. Using this perturbed state as an input to the integrator returns a state that — when compared to the base result — demonstrates the effect of the perturbation of that single element on the entire domain. If we define $e_i$ as a column vector with all zeroes except for a 1 in the $i$th row, we can write

$$S\,(v^0 + \epsilon e_i) = S v^0 + \epsilon\, S e_i.$$

Thus we calculate the $i$th column of the step matrix by perturbing the $i$th initial state vector element, giving this modified initial state to the integrator to obtain $v^1_{(i)}$, and taking

$$S e_i = \frac{v^1_{(i)} - v^1}{\epsilon}. \qquad (4.6)$$

Note also that in the case of Adams-Bashforth integration, the state vector element being perturbed may also be a right-hand side history element. For the perturbation we use a small fixed value $\epsilon$. This procedure is similar to the Navier-Stokes linearization for SBP-SAT discretizations described in Osusky et al. (2010).
Once the matrix $S$ is formed, we consider the system discretely stable if the spectral radius fulfills $\rho(S) \leq 1$. Therefore, creating step matrices for an integrator at a number of different values of $\Delta t$ allows us to characterize that integrator’s stability. We use SLEPc Hernandez et al. (2005) for the computation of the eigenvalues, using the default Krylov-Schur solver with a tight convergence tolerance. We will use this procedure to determine the maximum stable timesteps of our integrators reported in Section 5.2.
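The finite-difference step-matrix construction of (4.5)-(4.6) can be sketched as follows; the linear test system and RK4 stepper are hypothetical stand-ins for the full solver:

```python
import numpy as np

def step_matrix(step_fn, y0, dt, eps=1e-6):
    """Approximate the step matrix S of (4.5) column by column via (4.6)."""
    n = len(y0)
    base = step_fn(y0, dt)
    S = np.empty((n, n))
    for i in range(n):
        pert = y0.copy()
        pert[i] += eps                      # perturb the i-th state element
        S[:, i] = (step_fn(pert, dt) - base) / eps
    return S

# Linear test system y' = A y, a hypothetical stand-in for the full solver.
A = np.diag([-1.0, -10.0, -100.0])
rhs = lambda y: A @ y

def rk4_step(y, dt):
    k1 = rhs(y); k2 = rhs(y + dt / 2 * k1)
    k3 = rhs(y + dt / 2 * k2); k4 = rhs(y + dt * k3)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

spectral_radius = lambda dt: max(abs(np.linalg.eigvals(step_matrix(rk4_step, np.ones(3), dt))))
# The stiffest mode (-100) crosses the RK4 real-axis stability limit between these:
print(spectral_radius(0.02), spectral_radius(0.03))
```

For a linear system the finite-difference columns are exact up to roundoff, so scanning $\Delta t$ and checking $\rho(S) \leq 1$ reproduces the expected RK4 stability boundary; the full solver replaces `rk4_step` with the nonlinear integrator applied to the discretized equations.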
4.4 Runge-Kutta Stability
We validate the procedure of the previous section by using it to determine the maximum stable timestep for an explicit fourth-order Runge-Kutta (RK4) integrator applied to 1D advection with a wave speed of 1. With a spatial domain with 61 mesh points, the step matrix analysis described in Section 4.3 brackets the maximum stable timestep for this case between two closely spaced values. Using the method described in Appendix B to approximate a given integrator’s stability region in the complex plane, we see that the imaginary-axis bound for an RK4 integrator is about $2\sqrt{2}$. When using a fourth-order SBP operator as described in (Strand, 1994; Carpenter et al., 1993; Mattsson et al., 2004) for spatial discretization of the semi-discrete problem, and with periodic boundaries, the eigenvalues can be found analytically, and all lie on the imaginary axis, with a maximum eigenvalue magnitude proportional to the inverse of the grid spacing (a result given by Lele Lele (1992)). We can thus estimate the maximum stable timestep for this problem analytically, a result which is consistent with that of our step matrix procedure.
5 Numerical Results
5.1 Guiding Experiment: Viscous Flow Past a Cylinder
We consider two-dimensional viscous flow over a stationary cylinder. The freestream Mach number is 0.2, the Reynolds number is 200, and the Prandtl number is 0.72. We start our simulations from a steady-state solution obtained after 100 non-dimensional time units (using the non-dimensionalization of Park et al. (1998)) with an RK4 integration scheme. Note that the timesteps we report are also non-dimensional. The physical problem is modeled using two overset meshes: a coarser, base Cartesian grid, and a finer teardrop-shaped curvilinear grid surrounding the cylinder.
It is important to characterize the disparity in allowable timestep sizes so that we can estimate the maximum allowable step ratio SR for our integrators. Using the method described in Section 4.1 ((4.1), (4.2), and (4.3)), we find the ratio between the minimum timesteps on each grid to be about 12.
The boundary conditions employed in our simulation are as follows. The inner fine grid (Grid 2) is periodic in the azimuthal direction. We model the cylinder surface as an SAT isothermal wall per Svärd and Nordström (2008). On Grid 1 (the Cartesian base grid), all outer boundaries are modeled as SAT far-field boundaries following the procedure of Svärd et al. (2007). In addition, sponge boundaries with a cell depth of 6 are used, as described in Bodony (2006).
5.2 Multirate Adams-Bashforth Stability
We will now determine the stability characteristics of a given MRAB integrator by defining two ratios: α = Δt_max/Δt_RK4 and ε = Δt_max/Δt_SRAB, where Δt_max is the maximum stable timestep of the integrator in question, and Δt_SRAB and Δt_RK4 are the maximum stable timesteps for single-rate Adams-Bashforth (SRAB) and RK4 integrators, respectively, determined using the method of Section 4.3. Recall that the step ratio SR of a given MRAB integrator, the number of microsteps taken on the fast grid per macrostep, is a user-specified integer input. An MRAB integrator with a step ratio of 1 is equivalent to an SRAB integrator.
For the experiments to determine stability, we set the value of the macro-timestep H, build the step matrix of (4.5) (starting from a steady-state solution), and check for eigenvalues outside the unit disc, repeating the procedure until we find the maximum critical value of H (to the tabulated precision) for an MRAB integrator with a given step ratio. The α values measure the maximum stable timestep of these integrators relative to an RK4 integrator. The ε values reported for an MRAB integrator with a given order and history length use the maximum stable timestep of an SRAB integrator of the same order and history length. The percentages reported document the efficiency ε/SR of a given integrator: that is, if an MRAB integrator with a step ratio of SR obtains an ε value of SR, it is deemed 100% efficient.
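The search can be organized as a bisection on the macro-timestep H against the spectral radius of the step matrix. The driver below assumes a user-supplied `build_step_matrix(H)` standing in for the construction of (4.5), which is not reproduced here; it is checked on a trivial forward-Euler example.

```python
import numpy as np

def max_stable_H(build_step_matrix, H_hi, tol=1e-4):
    """Bisect for the largest macro-timestep H whose one-macrostep update
    matrix S(H) has spectral radius <= 1 (all eigenvalues in the unit disc).
    `build_step_matrix` is a hypothetical stand-in for the step matrix
    construction of (4.5), supplied by the caller."""
    def stable(H):
        S = build_step_matrix(H)
        return np.max(np.abs(np.linalg.eigvals(S))) <= 1 + 1e-12

    lo, hi = 0.0, H_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if stable(mid) else (lo, mid)
    return lo

# Toy check: forward Euler on y' = -y has step matrix S(H) = [[1 - H]],
# stable for H <= 2.
print(max_stable_H(lambda H: np.array([[1.0 - H]]), H_hi=4.0))
```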
Table 1: Maximum stable timesteps Δt, ratios α = Δt/Δt_RK4 and ε = Δt/Δt_SRAB, and efficiency ε/SR (%) for the third-order integrators.

                  Standard (AB3)                Extended-history (AB34)
Int.   SR    Δt       α      ε     %       Δt       α      ε     %
RK4    -     0.0217   1.000  5.13  -       0.0217   1.000  3.86  -
SRAB   1     0.0042   0.195  1.00  100     0.0056   0.259  1.00  100
MRAB   2     0.0084   0.388  1.99  99.5    0.0113   0.518  2.00  100
MRAB   3     0.0127   0.583  2.99  99.7    0.0169   0.775  2.99  99.7
MRAB   4     0.0169   0.777  3.99  99.8    0.0225   1.035  3.99  99.8
MRAB   5     0.0171   0.788  4.04  80.8    0.0281   1.294  4.99  99.8
MRAB   6     0.0171   0.788  4.04  67.3    0.0282   1.296  5.00  83.3
MRAB   7     0.0171   0.788  4.04  57.7    0.0282   1.296  5.00  71.4
MRAB   8     0.0171   0.788  4.04  50.5    0.0282   1.296  5.00  62.5
MRAB   9     0.0171   0.788  4.04  44.9    0.0282   1.296  5.00  55.6
MRAB   10    0.0171   0.788  4.04  40.4    0.0282   1.296  5.00  50.0
MRAB   20    0.0171   0.788  4.04  20.2    0.0282   1.296  5.00  25.0
Table 2: Maximum stable timesteps Δt, ratios α = Δt/Δt_RK4 and ε = Δt/Δt_SRAB, and efficiency ε/SR (%) for the fourth-order integrators.

                  Standard (AB4)                Extended-history (AB45)
Int.   SR    Δt       α      ε     %       Δt       α      ε     %
RK4    -     0.0217   1.000  9.42  -       0.0217   1.000  5.63  -
SRAB   1     0.0023   0.106  1.00  100     0.0039   0.178  1.00  100
MRAB   2     0.0046   0.212  2.00  100     0.0078   0.356  2.00  100
MRAB   3     0.0070   0.321  3.02  100     0.0116   0.533  3.00  100
MRAB   4     0.0093   0.427  4.02  100     0.0155   0.711  4.00  100
MRAB   5     0.0093   0.430  4.05  81.0    0.0180   0.827  4.65  93.0
MRAB   6     0.0093   0.430  4.05  67.5    0.0180   0.827  4.65  77.5
MRAB   7     0.0093   0.430  4.05  57.9    0.0180   0.827  4.65  66.4
MRAB   8     0.0093   0.430  4.05  50.6    0.0180   0.827  4.65  58.1
MRAB   9     0.0093   0.430  4.05  45.0    0.0180   0.827  4.65  51.7
MRAB   10    0.0093   0.430  4.05  40.5    0.0180   0.827  4.65  46.5
MRAB   20    0.0093   0.430  4.05  20.3    0.0180   0.827  4.65  23.3
The results of Tables 1 and 2 show that the extended-history schemes have higher maximum stable timesteps than their standard counterparts of the same order. All schemes attain near-perfect efficiency (ε ≈ SR) for SR values up to 4, about a third of the speed ratio of 12 for this case, with the extended-history schemes maintaining this efficiency for SR = 5 as well, a result of the improved stability of these schemes along the negative real axis. Above this step ratio, we expect performance to degrade, since we are simply performing extra right-hand side evaluations (more micro-timesteps on the fast grid) with no commensurate macro-timestep gain. The increase from third to fourth order necessitates lower timesteps for all step ratios, but the same qualitative ceiling as in the third-order case is observed. The third-order MRAB scheme with extended history is the only scheme able to attain α > 1 for any step ratio, indicating maximum stable timesteps larger than that of the RK4 integrator.
5.3 Multirate Adams-Bashforth Accuracy and Convergence
We now examine the accuracy of the MRAB schemes and confirm that they are convergent. In presenting accuracy results for our schemes, we examine integrators with SR values ranging from 2 to 6; as seen in the previous section, above this step ratio the maximum stable timestep remains unchanged for all AB integrators at our speed ratio of about 12. For each integrator, we calculate an estimated order of convergence (EOC) using 3 data points, each consisting of the macro-timestep used and the maximum error obtained in the density after 2.5 NDTU. The initial condition for these 2.5-NDTU integrations is a uniform subsonic free-stream flow (before the boundary layer has developed on the cylinder), and the reference solution used to form the error is obtained with an RK4 integrator using a small fixed timestep. Based on the stability results of Section 5.2, we limit our convergence study to the third-order schemes, which have the highest potential to improve performance. We see in Table 4 that an estimated order of convergence of about 3 is obtained in all cases, as expected.
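The EOC computation itself reduces to a least-squares slope in log-log space. In the sketch below, the three macro-timesteps are illustrative placeholders (the paper's values are not reproduced in this excerpt), while the errors are taken from the SR = 2 row of the first convergence table.

```python
import numpy as np

def estimated_order(dts, errs):
    """Least-squares slope of log(err) vs. log(dt), i.e. the estimated
    order of convergence from a set of (timestep, error) pairs."""
    slope, _ = np.polyfit(np.log(dts), np.log(errs), 1)
    return slope

# Placeholder macro-timesteps (assumed, not the paper's); errors from the
# SR = 2 row of the third-order convergence results.
dts = [4e-3, 8e-4, 4e-4]
errs = [1.93e-6, 1.79e-8, 2.44e-9]
print(round(estimated_order(dts, errs), 3))
```

With these placeholder timesteps the fitted slope comes out near third order, matching the qualitative conclusion of the convergence study.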
SR   err(Δt_1)   err(Δt_2)   err(Δt_3)   EOC
2    1.93E-06    1.79E-08    2.44E-09    2.900
3    1.73E-06    1.48E-08    1.87E-09    2.965
4    1.68E-06    1.40E-08    1.77E-09    2.977
5    1.67E-06    1.38E-08    1.73E-09    2.984
6    1.66E-06    1.37E-08    1.71E-09    2.986
SR   err(Δt_1)   err(Δt_2)   err(Δt_3)   EOC
2    3.56E-06    3.42E-08    4.53E-09    2.894
3    3.29E-06    2.86E-08    3.63E-09    2.956
4    3.23E-06    2.73E-08    3.44E-09    2.971
5    3.22E-06    2.68E-08    3.37E-09    2.979
6    3.21E-06    2.66E-08    3.34E-09    2.982
6 MRAB Performance
6.1 Performance Model
Our next task is to characterize the performance benefit of these multirate integrators: how much should we expect them to reduce the computational work required, compared to single-rate schemes?
To answer this question, we develop a performance model for how an MRAB integrator with a given step ratio should perform in comparison to the RK4 integrator used as a baseline for the accuracy and stability tests. In doing so, we make two assumptions: that right-hand side evaluations make up the bulk of the computational cost of the Navier-Stokes simulation, and that all simulations are performed at or near the maximum stable timestep of the given integrator.
We note in forming this model that in a given timestep of an RK4 integrator, four right-hand side evaluations must be performed, whereas the AB schemes require only one evaluation per step. Thus, to reach time T, we can write expressions for the total number of right-hand side evaluations each integrator must perform:
(6.1)  N_RK4 = 4 (N_f + N_s) T / Δt_RK4,
(6.2)  N_AB = (N_s + SR · N_f) T / (α Δt_RK4).
In (6.1) and (6.2), N_f is the number of points on the fast grid, and N_s is the number of points on the slow grid (assuming a two-grid configuration). Furthermore, SR is the step ratio of the AB integrator (the number of microsteps taken on the fast grid per macrostep on the slow grid; for an SRAB integrator, SR = 1), Δt_RK4 is the timestep taken by the RK4 scheme, and T is the end time to be reached. Recall that the factor α is defined in Section 5.2 as the ratio of the maximum stable timestep of a given AB integrator to that of RK4.
This factor captures the timestep restriction of AB integrators relative to RK4. This restriction, as seen in the analyses of Section 5.2, is imposed by stability constraints: the maximum stable timestep of the RK4 integrator applied to a given case is larger than that of an AB integrator, and therefore more timesteps must be taken with an AB integrator to reach time T.
Given the assumptions of our performance model, the percent reduction in right-hand side evaluations therefore provides a theoretical estimate of the computational work saved when using a given AB integrator. We define the resulting speedup as
(6.3)  SU = N_RK4 / N_AB.
Using this definition, any value of SU over 1 indicates a profitable integrator, whereas values of 1 or less indicate equal or detrimental performance. As an example of this performance model in execution, we evaluate it for the cylinder-in-crossflow case discussed in Section 5.1, for a number of step ratios, and for both third and fourth order. We use the grid data presented in Section 5.1 and the α values given in Section 5.2, and for simplicity we use a common end time T. For these results, and for all results that follow, we use the more potentially profitable extended-history (AB34 and AB45) integrators.
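The model of (6.1)-(6.3) reduces to a single expression for the speedup. In the sketch below, the α values are taken from the third-order extended-history (AB34) column of Table 1, while the grid point counts are placeholders (the paper's mesh sizes are not reproduced in this excerpt).

```python
def speedup(alpha, SR, Nf, Ns):
    """Speedup model SU = N_RK4 / N_AB: RK4 spends 4 RHS evaluations per
    step on every point, while an AB scheme spends one evaluation per
    macrostep on the slow grid and SR per macrostep on the fast grid, with
    a macrostep alpha times the RK4 timestep. The end time T cancels."""
    return 4.0 * alpha * (Nf + Ns) / (Ns + SR * Nf)

# AB34 alpha values from Table 1; Nf, Ns are placeholder point counts.
for SR, alpha in [(1, 0.259), (2, 0.518), (4, 1.035)]:
    print(SR, round(speedup(alpha, SR, Nf=4000, Ns=3000), 2))
```

Note that for SR = 1 the grid sizes cancel entirely and SU = 4α, which is why the SRAB rows of the performance tables depend only on the stability ratio.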
Table 5: Predicted performance of the third-order extended-history (AB34) integrators.

Integrator       Total RHS Evals   % Red. from RK4   SU
RK4              34244             0.00              1.00
SRAB (SR = 1)    33023             3.57              1.04
MRAB (SR = 2)    25846             24.52             1.32
MRAB (SR = 3)    23529             31.29             1.46
MRAB (SR = 4)    22310             34.85             1.53
MRAB (SR = 5)    21581             36.98             1.59
MRAB (SR = 6)    25274             26.19             1.35
Table 6: Predicted performance of the fourth-order extended-history (AB45) integrators (negative reductions indicate an increase over RK4).

Integrator       Total RHS Evals   % Red. from RK4   SU
RK4              34244             0.00              1.00
SRAB (SR = 1)    48154             -40.62            0.71
MRAB (SR = 2)    37694             -10.08            0.91
MRAB (SR = 3)    34204             0.12              1.00
MRAB (SR = 4)    32459             5.21              1.05
MRAB (SR = 5)    33756             1.42              1.01
MRAB (SR = 6)    39608             -15.66            0.86
Table 5 shows that we can expect speedup for all third-order MRAB integrators using the extended-history scheme for this specific case. Based on Table 6, however, we expect fourth-order MRAB integrators with extended histories to largely fail to be profitable here: the fourth-order schemes reach a break-even point at SR = 3 and attain only a very minimal performance benefit at SR = 4 (around a 5% reduction in right-hand side evaluations) before declining again due to the stability ceiling observed in Section 5.2.
6.2 Sequential Performance
Having developed a performance model in the previous subsection and thus obtained expectations for how the integrators should perform, we now develop tests to verify this model. We gauge performance by using a given integrator to march the viscous flow over a cylinder case for 10 NDTU (again, starting from a steady-state solution) at the maximum stable timestep of the integrator, measuring end-to-end wall clock time of the Navier-Stokes solver, time spent calculating right-hand sides, time spent performing the necessary interpolations between grids, and time spent performing spatial-operator-related tasks (a subset of the right-hand side calculation). We limit our performance testing here to third-order MRAB integrators with the extended-history scheme only, as Section 6.1 shows these to be the most promising for this case. The percentages reported for each step ratio document the percent reduction in time compared to the RK4 integrator, and the predicted percentage is that of the performance model (based purely on right-hand side evaluations as given by (6.1) and (6.2)).
Table 7: Sequential timings for right-hand side (RHS), operator (Op.), interpolation (Interp.), and end-to-end (E2E) tasks. Percentages are reductions relative to RK4; negative values indicate an increase.

Int.            RHS [s]  %     Op. [s]  %     Interp. [s]  %     E2E [s]  %     % Pred.
RK4             29.9     -     18.4     -     0.623        -     38.7     -     -
SRAB (SR = 1)   28.29    5.6   17.44    5.2   0.273        56.2  41.73    -8.0  3.6
MRAB (SR = 2)   22.76    23.9  14.07    23.5  0.214        65.7  30.14    22.0  24.5
MRAB (SR = 3)   21.60    27.7  13.33    27.5  0.201        67.7  28.90    25.2  31.3
MRAB (SR = 4)   20.91    30.0  12.91    29.8  0.194        68.9  27.50    28.9  34.9
MRAB (SR = 5)   18.13    39.3  11.17    39.2  0.163        73.8  23.60    38.9  37.0
MRAB (SR = 6)   20.27    32.2  12.51    32.0  0.182        70.8  24.13    37.6  26.2
We see in Table 7 that the measured times for our new SRAB and MRAB integrators are all lower than those of RK4, with one exception (the SRAB integrator has a slightly larger end-to-end time). Furthermore, in the sequential case the observed interpolation times are low, as expected: when not running the solver in parallel, interpolation amounts to copying data between buffers. We will see that the time spent interpolating becomes more significant as we transition to parallel runs. Regarding the percentages, while we see good agreement with the overall trend of the model of Section 6.1, a number of the observed percentage reductions are slightly lower than predicted.
6.3 Parallel Performance
In this section, we perform tests similar to those of Section 6.2, with a specific focus on the distributed-memory parallelization of the new integrators, in order to answer the question of how communication and interpolation between grids affect performance. The procedure for these tests is the same as in Section 6.2, with one important distinction: we again document end-to-end wall clock time, but the times reported for right-hand side calculations, operator tasks, and interpolation are accumulated inclusive times (the total time spent by all processors performing these specific tasks). In the tables that follow, we report times for third-order MRAB integrators with extended history at various step ratios.
Table 8: End-to-end wall clock time [s] and signed % change from RK4 (+ indicates an increase). The first result column is the SRAB (SR = 1) scheme; the remaining columns are MRAB schemes at three step ratios.

Proc   RK4     SRAB    %      MRAB    %      MRAB    %      MRAB    %
2      23.52   25.21   +7.2   33.82   +43.8  25.70   +9.3   24.28   +3.2
4      17.12   19.49   +13.8  10.04   -41.4  8.80    -48.6  10.05   -41.3
8      8.37    9.36    +11.9  6.41    -23.4  6.64    -20.6  7.04    -15.8
16     4.98    5.50    +10.4  3.93    -21.1  3.91    -21.5  4.50    -9.6
Table 9: Accumulated right-hand side time [s] and % reduction from RK4.

Proc   RK4     SRAB    %     MRAB    %     MRAB    %     MRAB    %
2      31.17   29.82   4.3   23.67   24.1  20.91   32.9  20.66   33.7
4      35.26   33.40   5.3   27.07   23.2  24.12   31.6  26.62   24.5
8      44.28   41.97   5.2   32.97   25.5  28.06   36.6  31.74   28.3
16     53.36   48.82   8.5   39.81   25.4  34.30   35.7  38.97   27.0
Table 10: Accumulated spatial operator time [s] and % reduction from RK4.

Proc   RK4     SRAB    %     MRAB    %     MRAB    %     MRAB    %
2      19.17   18.46   3.7   14.61   23.8  12.86   32.9  12.65   34.0
4      20.97   20.12   4.1   16.37   21.9  14.63   30.2  15.78   24.7
8      24.97   23.83   4.6   18.88   24.4  16.29   34.8  18.60   25.5
16     28.52   27.00   5.3   21.71   23.9  18.29   35.9  21.42   24.9
Table 11: Accumulated interpolation time [s] and signed % change from RK4 (+ indicates an increase).

Proc   RK4     SRAB    %      MRAB    %       MRAB    %       MRAB    %
2      5.55    3.86    -30.5  29.61   +433.5  21.92   +295.0  22.04   +297.1
4      24.54   21.02   -14.3  3.73    -84.8   2.28    -90.7   5.03    -79.5
8      13.27   10.05   -24.3  1.46    -89.0   6.60    -50.3   11.31   -14.8
16     15.45   12.12   -21.6  1.93    -87.5   8.53    -44.8   14.54   -5.9
Generally, what we see here is, first and foremost, that the time spent on right-hand side and operator-related tasks (Table 9 and Table 10, respectively) is always lowered by the use of multirate methods, in certain cases by over 30%, in line with the model of Section 6.1. Furthermore, the end-to-end wall clock times (Table 8) largely show that the reduction in right-hand side evaluations indeed leads to end-to-end speedup for a number of step ratios and processor counts. Regarding the SRAB scheme (SR = 1), we see that the times spent in the interpolation, operator, and right-hand side tasks are all lower than those of RK4, yet the end-to-end times are very slightly higher for all core counts. The reason is additional overhead from elsewhere in the program, incurred because more timesteps must be run to reach the same end time.
For tests using 2 processors, we see a large amount of time spent in interpolation subroutines for all multirate integrators. This occurs because multirate integration in the overset setting naturally induces load imbalance: within a given macro-timestep, more right-hand sides are evaluated on certain grids than on others. This causes long processor wait times (accounted for, in our case, as interpolation time) due to suboptimal work distribution, and motivates a change to the existing decomposition of the problem, which simply assigns processors to grids based on the ratio of each grid's number of points to the total number of points in the simulation. Rescaling this decomposition based on the multirate step ratio in use (a direct indication of how many right-hand side evaluations per macro-timestep a given processor is responsible for) produces improved performance at higher processor counts by reducing the wait times spent in grid-to-grid communication. With only two cores, however, we are unable to rescale the decomposition to ameliorate the load imbalance produced by multirate integration. Further effects of inter-grid communication are shown in the next section, which focuses on large-scale results.
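The rescaled decomposition described above can be sketched as a proportional allocation weighted by right-hand side evaluations per macro-timestep. The function below is illustrative only, not the solver's actual decomposition code.

```python
def allocate_processors(points, step_ratios, n_procs):
    """Distribute n_procs across grids in proportion to work per
    macro-timestep: (RHS evaluations per macrostep) x (points on grid).
    A fast grid with step ratio SR performs SR evaluations per macrostep.
    Illustrative sketch of the rescaled decomposition, not solver code."""
    work = [n * sr for n, sr in zip(points, step_ratios)]
    total = sum(work)
    # Give each grid at least one processor, rounding proportionally.
    alloc = [max(1, round(n_procs * w / total)) for w in work]
    # Fix up rounding so that allocations sum exactly to n_procs.
    while sum(alloc) > n_procs:
        alloc[alloc.index(max(alloc))] -= 1
    while sum(alloc) < n_procs:
        alloc[alloc.index(min(alloc))] += 1
    return alloc

# Hypothetical two-grid case: fast grid does SR = 4 evaluations per macrostep.
print(allocate_processors(points=[4000, 3000], step_ratios=[4, 1], n_procs=16))
# -> [13, 3]
```

Without the step-ratio weighting, the same grids would split the processors roughly 9 to 7, leaving the fast-grid processors as the bottleneck.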
6.4 Large-Scale Performance
We now move to a different example to investigate the performance of the integrators at larger scale. The grids for this new case, modeling ignition via laser-induced breakdown (LIB) of an underexpanded jet, are shown in Figure 7 and described numerically in Table 12. Note the percentage of total points in Grid 1: this will be our fast grid, while the remaining three grids will be the slow grids (see Figure 8).
Table 12: Grids for the laser-induced breakdown case.

Grid   Grid Type     No. of Points   % of Total
1      Cartesian     12,582,912      47.7
2      Cylindrical   339,521         1.3
3      Cylindrical   11,419,614      43.3
4      Cylindrical   2,060,250      7.8
Using the performance model outlined in (6.1), (6.2), and (6.3), and assuming α values for this case similar to those of the viscous flow over a cylinder case, we can calculate the expected performance. We test step ratios from 1 to 5, since Section 5.2 shows this to be the range of step ratios over which near-perfect efficiency (ε ≈ SR) is maintained.
Table 13: Predicted performance for the laser-induced breakdown case.

Integrator       Total RHS Evals   % Red. from RK4   SU
RK4              105,609,188       0.00              1.00
SRAB (SR = 1)    101,939,371       3.50              1.04
MRAB (SR = 2)    75,261,021        26.5              1.36
MRAB (SR = 3)    66,539,511        37.0              1.59
MRAB (SR = 4)    62,011,632        41.3              1.70
MRAB (SR = 5)    59,299,803        43.8              1.78
The results in Table 13 suggest that we should expect moderate speedup when using a third-order multirate scheme with SR = 2, with the speedup increasing as the step ratio grows to 5. Performing the integration with these schemes allows us to measure the same times of interest as in the previous section and compare them with this model. The results we present are obtained from a 6,144-core simulation on Stampede2, an NSF-supported supercomputer at the Texas Advanced Computing Center (TACC) of the University of Texas, and report scaled time spent in various portions of the Navier-Stokes solver relative to the end-to-end wall clock time of a Runge-Kutta-driven simulation.
These results show measured times that correlate well with the expected reduction in operator and right-hand side costs. Furthermore, we see large reductions in the time spent in interpolation-related subroutines for all step ratios, a direct result of implementing multirate-specific overset interpolation schemes featuring selective communication and separate send-receive phases (interleaved with right-hand side calculation). In the end, the main time-marching loop sees moderate speedup for all step ratios above 1, but remains below the prediction of the performance model. Our results show roughly constant time spent in all other components of the Navier-Stokes solver not explicitly timed, and higher multirate times in two specific portions of the solver, both of which can be readily explained:

Timestep calculation. While we scale the frequency of these calculations so that they occur the same number of times in RK4 as in our integrators, the aforementioned new interpolation scheme, which reduces wait times in inter-grid communication, shifts load imbalance to these calculations, which involve communication among all processors to determine the minimum timestep.

Ghost cell exchanges. These operations involve communication between processors working on the same grid (to exchange ghost cell data used to fill boundary stencils). Rescaling the decomposition the solver uses to allocate more processing power to the fast grid and account for the inherent load imbalance (as mentioned at the end of Section 6.3) results in more processor boundaries that require this operation. The question of balancing inter-grid with intra-grid communication when decomposing the problem to avoid load imbalance remains open; we leave it for future work.
In short, the results for this large-scale case show that while we see a performance benefit from the decrease in right-hand side evaluations and from the more efficient interpolation routines, the improvement falls below the estimates of the performance model due to increased communication times induced by load imbalance.
7 Conclusions and Future Work
In this paper, we have developed multirate Adams-Bashforth integrators that take advantage of an overset discretization of the Navier-Stokes equations using SBP spatial operators, SAT boundary conditions, and an SAT penalty-based interpolation method. We improve the temporal stability of these integrators by introducing extended-history Adams-Bashforth schemes with higher maximum stable timesteps for ODE systems with eigenvalues along the negative real axis. Using a number of numerical tools and examples, these new schemes are shown to be stable and accurate, and to reduce the computational work required to march a given system in time compared to both single-rate Adams-Bashforth integrators and a fourth-order Runge-Kutta integrator. In demonstrating the performance improvement of these new multirate schemes, we also identify load balance as a critical consideration for further reductions in computational time; this is left for future work.
Acknowledgments
This material is based in part upon work supported by the Department of Energy, National Nuclear Security Administration, under Award Number DE-NA0002374. Computational support has been provided, in part, by National Science Foundation XSEDE resources under grant TG-CTS090004.
References
 Andrus (1993) J. Andrus. Stability of a multirate method for numerical integration of ODEs. Computers & Mathematics with Applications, 25(2):3–14, 1993.
 Andrus (1979) J. F. Andrus. Numerical solution of systems of ordinary differential equations separated into subsystems. SIAM Journal on Numerical Analysis, 16(4):605–611, 1979.
 Bashforth and Adams (1883) F. Bashforth and J. C. Adams. An attempt to test the theories of capillary action. University Press, 1883.
 Benek et al. (1983) J. Benek, J. Steger, and F. C. Dougherty. A flexible grid embedding technique with application to the Euler equations. In 6th Computational Fluid Dynamics Conference Danvers, page 1944, 1983.
 Bodony (2006) D. J. Bodony. Analysis of sponge zones for computational fluid mechanics. Journal of Computational Physics, 212(2):681–702, 2006.
 Bodony (2010) D. J. Bodony. Accuracy of the simultaneousapproximationterm boundary condition for timedependent problems. Journal of Scientific Computing, 43(1):118–133, 2010.
 Bodony et al. (2011) D. J. Bodony, G. Zagaris, A. Reichert, and Q. Zhang. Provably stable overset grid methods for computational aeroacoustics. Journal of Sound and Vibration, 330(17):4161–4179, 2011.
 Butcher (1996) J. C. Butcher. A history of Runge-Kutta methods. Applied Numerical Mathematics, 20(3):247–260, 1996.
 Carpenter et al. (1993) M. H. Carpenter, D. Gottlieb, and S. Abarbanel. Time-stable boundary conditions for finite-difference schemes solving hyperbolic systems: methodology and application to high-order compact schemes. Technical report, NASA, 1993.
 Chou and Ekaterinaris (2003) H. Chou and J. Ekaterinaris. A compact high-order CFD package for the flow solver OVERFLOW. In 41st Aerospace Sciences Meeting and Exhibit, page 1234, 2003.
 Constantinescu and Sandu (2007) E. M. Constantinescu and A. Sandu. Multirate timestepping methods for hyperbolic conservation laws. Journal of Scientific Computing, 33(3):239–278, 2007.
 Dawson and Kirby (2001) C. Dawson and R. Kirby. High resolution schemes for conservation laws with locally varying time steps. SIAM Journal on Scientific Computing, 22(6):2256–2281, 2001.
 Engstler and Lubich (1997) C. Engstler and C. Lubich. Multirate extrapolation methods for differential equations with different time scales. Computing, 58(2):173–185, 1997.
 Gear (1974) C. Gear. Multirate methods for ordinary differential equations. Technical report, Illinois Univ., Urbana (USA). Dept. of Computer Science, 1974.
 Gear and Wells (1984) C. W. Gear and D. Wells. Multirate linear multistep methods. BIT Numerical Mathematics, 24(4):484–502, 1984.
 Günther and Rentrop (1993) M. Günther and P. Rentrop. Multirate ROW methods and latency of electric circuits. Applied Numerical Mathematics, 13(1):83–102, 1993.
 Günther et al. (2001) M. Günther, A. Kvaernø, and P. Rentrop. Multirate partitioned Runge-Kutta methods. BIT Numerical Mathematics, 41(3):504–514, 2001.
 Hernandez et al. (2005) V. Hernandez, J. E. Roman, and V. Vidal. SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems. ACM Trans. Math. Software, 31(3):351–362, 2005.
 Heun (1900) K. Heun. Neue methoden zur approximativen integration der differentialgleichungen einer unabhängigen veränderlichen. Z. Math. Phys, 45:23–38, 1900.
 Klöckner (2010) A. Klöckner. High-performance high-order simulation of wave and plasma phenomena. PhD thesis, Brown University, 2010.
 Klöckner and Wala (2018a) A. Klöckner and M. Wala. Dagrt. https://gitlab.tiker.net/inducer/dagrt, 2018a.
 Klöckner and Wala (2018b) A. Klöckner and M. Wala. Leap. https://gitlab.tiker.net/inducer/leap, 2018b.
 Kutta (1901) W. Kutta. Beitrag zur näherungweisen integration totaler differentialgleichungen. 1901.
 Kværnø (2000) A. Kværnø. Stability of multirate Runge-Kutta schemes. 2000.
 Lee and Baeder (2002) Y. Lee and J. Baeder. High-order overset method for blade vortex interaction. In 40th AIAA Aerospace Sciences Meeting & Exhibit, page 559, 2002.
 Lele (1992) S. K. Lele. Compact finite difference schemes with spectral-like resolution. Journal of Computational Physics, 103(1):16–42, 1992.
 MacCormack (2014) R. W. MacCormack. Numerical computation of compressible and viscous flow. American Institute of Aeronautics and Astronautics, 2014.
 Magnus and Yoshihara (1970) R. Magnus and H. Yoshihara. Inviscid transonic flow over airfoils. AIAA Journal, 8(12):2157–2162, 1970.
 Mattsson et al. (2004) K. Mattsson, M. Svärd, and J. Nordström. Stable and accurate artificial dissipation. Journal of Scientific Computing, 21(1):57–79, 2004.
 Noack and Slotnick (2005) R. Noack and J. Slotnick. A summary of the 2004 overset symposium on composite grids and solution technology. In 43rd AIAA Aerospace Sciences Meeting and Exhibit, page 921, 2005.
 Osher and Sanders (1983) S. Osher and R. Sanders. Numerical approximations to nonlinear conservation laws with locally varying time and space grids. Mathematics of Computation, 41(164):321–336, 1983.
 Osusky et al. (2010) M. Osusky, J. Hicken, and D. Zingg. A parallel Newton-Krylov-Schur flow solver for the Navier-Stokes equations using the SBP-SAT approach. In 48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, page 116, 2010.
 Park et al. (1998) J. Park, K. Kwon, and H. Choi. Numerical solutions of flow past a circular cylinder at Reynolds numbers up to 160. KSME International Journal, 12(6):1200–1205, 1998.
 Sandu and Constantinescu (2009) A. Sandu and E. M. Constantinescu. Multirate explicit Adams methods for time integration of conservation laws. Journal of Scientific Computing, 38(2):229–249, 2009.
 Sanz-Serna (1980) J. Sanz-Serna. Some aspects of the boundary locus method. BIT Numerical Mathematics, 20(1):97–101, 1980.
 Savcenco et al. (2007) V. Savcenco, W. Hundsdorfer, and J. Verwer. A multirate time stepping strategy for stiff ordinary differential equations. BIT Numerical Mathematics, 47(1):137–155, 2007.
 Seny et al. (2010) B. Seny, J. Lambrechts, R. Comblen, V. Legat, J.F. Remacle, et al. Multirate time stepping methods for accelerating explicit discontinuous Galerkin computations. In 9th International workshop on Multiscale (Un)structured mesh numerical Modeling for coastal, shelf, and global ocean dynamics, 2010.
 Seny et al. (2014) B. Seny, J. Lambrechts, T. Toulorge, V. Legat, and J.F. Remacle. An efficient parallel implementation of explicit multirate Runge–Kutta schemes for discontinuous Galerkin computations. Journal of Computational Physics, 256:135–160, 2014.
 Sharan (2016) N. Sharan. Time-stable high-order finite difference methods for overset grids. PhD thesis, University of Illinois at Urbana-Champaign, 2016.
 Sharan et al. (2018) N. Sharan, C. Pantano, and D. J. Bodony. Time-stable overset grid method for hyperbolic problems using summation-by-parts operators. Journal of Computational Physics, 361:199–230, 2018.
 Sherer and Visbal (2004) S. Sherer and M. Visbal. Implicit large eddy simulations using a high-order overset grid solver. In 34th AIAA Fluid Dynamics Conference and Exhibit, page 2530, 2004.
 Sherer et al. (2006) S. Sherer, M. Visbal, and M. Galbraith. Automated preprocessing tools for use with a high-order overset-grid algorithm. In 44th AIAA Aerospace Sciences Meeting and Exhibit, page 1147, 2006.
 Sherer and Scott (2005) S. E. Sherer and J. N. Scott. High-order compact finite-difference methods on general overset grids. Journal of Computational Physics, 210(2):459–496, 2005.
 Steger (1991) J. Steger. The Chimera method of flow simulation. In Workshop on applied CFD, Univ of Tennessee Space Institute, volume 188, 1991.
 Stock (2009) A. Stock. Development and application of a multirate multistep AB method to a discontinuous Galerkin method based particle in cell scheme, 2009.
 Strand (1994) B. Strand. Summation by parts for finite difference approximations for d/dx. Journal of Computational Physics, 110(1):47–67, 1994.
 Suhs et al. (2002) N. Suhs, S. Rogers, and W. Dietz. PEGASUS 5: an automated preprocessor for oversetgrid CFD. In 32nd AIAA Fluid Dynamics Conference and Exhibit, page 3186, 2002.
 Svärd and Nordström (2008) M. Svärd and J. Nordström. A stable high-order finite difference scheme for the compressible Navier–Stokes equations: no-slip wall boundary conditions. Journal of Computational Physics, 227(10):4805–4824, 2008.
 Svärd et al. (2007) M. Svärd, M. H. Carpenter, and J. Nordström. A stable high-order finite difference scheme for the compressible Navier–Stokes equations, far-field boundary conditions. Journal of Computational Physics, 225(1):1020–1038, 2007.
 Tang and Warnecke (2006) H.-Z. Tang and G. Warnecke. High resolution schemes for conservation laws and convection-diffusion equations with varying time and space grids. Journal of Computational Mathematics, pages 121–140, 2006.
 Volkov (1968) E. A. Volkov. The method of composite meshes for finite and infinite regions with piecewise smooth boundary. Trudy Matematicheskogo Instituta imeni VA Steklova, 96:117–148, 1968.
Appendix A Software and Reproducibility
Here, we give brief synopses of the software tools used to implement the methods described in this paper.

PlasComCM is a Fortran 90 code that solves the compressible Navier-Stokes equations on overset meshes. PlasComCM is currently used in the University of Illinois' NNSA- and DOE-funded PSAAP-II center, the Center for Exascale Simulation of Plasma-Coupled Combustion (XPACC). For more on XPACC and its work, see https://xpacc.illinois.edu. For more on PlasComCM, see https://bitbucket.org/xpacc/plascomcm.

Leap is a Python package that describes time integration methods (including multirate integrators) with flexible algorithms via a virtual machine. It can describe both implicit and explicit time steppers in the form of instructions that are then passed to Dagrt (see below) to generate Fortran or Python code. Our results were generated using Git revision 9382dd35 at https://github.com/inducer/leap Klöckner and Wala [2018b].

Dagrt, a second Python package, is a DAG-based runtime system that generates Fortran or Python code implementing the integrators described by Leap for a given right-hand side. To use this tool and Leap with a host application, the user describes, in a short Python driver, the data types to be operated on along with the right-hand sides the host application uses. Our results were generated using Git revision 3ccb3479 at https://github.com/inducer/dagrt Klöckner and Wala [2018a].
Appendix B Plotting Approximate Stability Regions
In order to better pose the question of what this maximum stable timestep is, we must first characterize and plot the approximate stability regions of the integrator. These stability regions are plotted in the complex plane and enclose a region of values λΔt (where λ ranges over the eigenvalues of the right-hand side being integrated, and Δt is the timestep) within which an integrator is approximated to be stable.
We approximate the stability region in the complex plane for a given integrator by marching the simple scalar ODE
y′(t) = λ y(t)
in time, with initial condition y = 1 at t = 0. By setting
λΔt = r e^{iθ}
with r ≥ 0, we can approximate an integrator's stability region by varying r for a given value of θ and marching for 100 timesteps. For a given (r, θ) combination, if |y| exceeds 2 before 100 steps are taken, we deem the method unstable and decrease r; otherwise, r is increased for that θ until the method becomes unstable. For all approximate stability regions given, we define the stability boundary using a tolerance on r, and we use 500 equally spaced values of θ. This procedure is essentially an application of the boundary locus method described in Sanz-Serna [1980] for ordinary differential equations, which we find to be a reasonable heuristic for discussing stability in the context of both single-rate and multirate time integration of PDEs using an SBP-SAT discretization.
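The procedure above can be sketched for the classical RK4 integrator. The amplification factor, bracketing radius, and bisection tolerance below are our own choices for illustration, not the paper's settings.

```python
import cmath

def rk4_amplification(z):
    """Amplification factor of classical RK4 applied to y' = lam*y, z = lam*dt."""
    return 1 + z + z**2 / 2 + z**3 / 6 + z**4 / 24

def boundary_radius(theta, r_hi=4.0, tol=1e-3, n_steps=100):
    """Bisect for the largest r such that marching y(0) = 1 with
    lam*dt = r*exp(i*theta) keeps |y| <= 2 over n_steps steps."""
    def stable(r):
        R = rk4_amplification(r * cmath.exp(1j * theta))
        y = 1.0 + 0j
        for _ in range(n_steps):
            y *= R                 # one step of the scalar recurrence
            if abs(y) > 2.0:
                return False       # deemed unstable before 100 steps
        return True

    lo, hi = 0.0, r_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if stable(mid) else (lo, mid)
    return lo

# Probe the negative real axis (theta = pi).
print(round(boundary_radius(cmath.pi), 2))
```

At θ = π the bisected radius lands near 2.79, close to RK4's true negative-real-axis limit of about 2.785; the small overshoot is an artifact of the |y| > 2 heuristic, which tolerates amplification factors marginally above 1.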