Multi-Rate Time Integration on Overset Meshes

Multi-Rate Time Integration on Overset Meshes

Cory Mikida cmikida2@illinois.edu Andreas Klöckner andreask@illinois.edu Daniel Bodony bodony@illinois.edu Department of Aerospace Engineering, University of Illinois at Urbana-Champaign, 104 South Wright St, Urbana, IL 61801 Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Ave, Urbana, IL 61801
Abstract

Overset meshes are an effective tool for the computational fluid dynamic simulation of problems with complex geometries or multiscale spatio-temporal features. When the maximum allowable timestep on one or more meshes is significantly smaller than on the remaining meshes, standard explicit time integrators impose inefficiencies for time-accurate calculations by requiring that all meshes advance with the smallest timestep. With the targeted use of multi-rate time integrators, separate meshes can be time-marched at independent rates to avoid wasteful computation while maintaining accuracy and stability. This work applies time-explicit multi-rate integrators to the simulation of the compressible Navier-Stokes equations discretized on overset meshes using summation-by-parts (SBP) operators and simultaneous approximation term (SAT) boundary conditions. We introduce a novel class of multi-rate Adams-Bashforth (MRAB) schemes that offer significant stability improvements and computational efficiencies for SBP-SAT methods. We present numerical results that confirm the numerical efficacy of MRAB integrators, outline a number of outstanding implementation challenges, and demonstrate a reduction in computational cost enabled by MRAB. We also investigate the use of our method in the setting of a large-scale distributed-memory parallel implementation where we discuss concerns involving load balancing and communication efficiency.

keywords:
Multi-Rate Time Integration, Overset Meshes, Chimera, Summation-By-Parts, Simultaneous-Approximation-Term, Adams-Bashforth.
\newfloatcommand

capbtabboxtable[][\FBwidth]

1 Introduction

In the time-explicit direct numerical simulation (DNS) of computational fluid dynamic problems, the maximum timestep allowable for stable integration of the governing equations is often limited by the well-known Courant-Freidrichs-Lewy number (CFL),

where is the minimum grid spacing, and is the maximum characteristic speed of the physical phenomena being simulated, or its viscous analog,

where is the characteristic diffusivity. The timestep taken when integrating over the entire computational domain using a standard explicit single-rate integrator can be severely limited by dynamics that occur on a small portion of the domain, be it due to shorter time scales of certain physical phenomena, such as high-speed flow, or locally-high grid resolution.

With multi-rate integration, groups of degrees of freedom, or even individual right-hand side contributions corresponding to evolution on small timescales, can be integrated with independent time steps, allowing computational work to be avoided on the “slow” components while the “fast” components remain stable and well-resolved in time. This improves the efficiency of the application. The present study focuses on an application of multi-rate integrators to problems with overset meshes.

The method of overset meshes (also known as the Chimera method (Steger, 1991)) is an approach that discretizes a domain using multiple independent, overlapping structured meshes, each of which is associated with a smooth but potentially geometrically complex mapping. The earliest known appearance of a composite mesh method was to solve elliptic partial differential equations (Volkov, 1968), but similar methods were soon introduced for inviscid transonic flow (Magnus and Yoshihara, 1970) and the Euler equations of gas dynamics (Benek et al., 1983). In the years since, high-order overset-grid (HO-OG) approaches have been developed (Sherer and Scott, 2005) and applied to numerous problems (Lee and Baeder, 2002; Sherer and Visbal, 2004; Chou and Ekaterinaris, 2003). The individual meshes are structured, so that the entire discretization can be considered locally structured and globally unstructured. A concise summary of recent developments in the usage of overset meshes to solve problems involving compressible viscous fluids, along with a discussion of stable and accurate interpolation, is given in (Bodony et al., 2011). Additionally, a discussion of various applications of overset meshes can be found in (Noack and Slotnick, 2005).

Some of the earliest work on multi-rate multistep methods was done by Gear (Gear, 1974). Gear and Wells (Gear and Wells, 1984) later worked with these methods to automate the choice of step size and the partition of fast and slow components. The primary conclusion therein was that the problem of automating these integrator characteristics dynamically was mostly a function of software organization. Engstler and Lubich (Engstler and Lubich, 1997) used Richardson extrapolation along with simple Euler schemes to create a dynamically partitioned process of multi-rate integration.

Osher and Sanders (Osher and Sanders, 1983) introduced numerical approximations to conservation laws that changed the global CFL restrictions to local ones. Dawson and Kirby (Dawson and Kirby, 2001) performed local time stepping based on the formulations of Osher and Sanders, attaining only first-order accuracy in time. Tang and Warnecke (Tang and Warnecke, 2006) expanded on this work to produce second-order accuracy via more refined projections of solution increments at each local timestep.

Gunther, Kvaerno, and Rentrop (Günther et al., 2001) introduced multi-rate partitioned Runge-Kutta (MPRK) schemes, starting from a discussion of Rosenbrock-Wanner methods, and based on strategies introduced by Gunther and Rentrop (Günther and Rentrop, 1993). This method focuses on coupling the fast and slow solution components primarily via interpolation and extrapolation of state variables, echoing the earlier work in (Gear, 1974) and (Gear and Wells, 1984).

Savcenco et. al (Savcenco et al., 2007) developed a self-adjusting multi-rate timestepping strategy primarily using implicit Rosenbrock methods on stiff ODEs. Constantinescu and Sandu (Constantinescu and Sandu, 2007) developed multi-rate timestepping methods for hyperbolic conservation laws based on a method of lines (MOL) approach with partitioned Runge-Kutta schemes. More recently, Sandu and Constantinescu (Sandu and Constantinescu, 2009) developed explicit multi-rate Adams-Bashforth methods similar to the ones discussed in this work, and while the resulting methods are exactly conservative, their application is limited to one-dimensional hyperbolic conservation laws, and their accuracy is limited to second order by an interface region.

Recently, Seny and Lambrechts extended the explicit multi-rate Runge-Kutta schemes of Constantinescu and Sandu to apply to discontinuous Galerkin computations for large-scale geophysical flows, introducing the method (Seny et al., 2010) and subsequently demonstrating its efficacy in both serial and parallel implementations. Their latest work (Seny et al., 2014) focuses on efficient parallelization of the method using a multi-constraint partitioning library.

While all of these works characterize the stability of the multi-rate integrators numerically, only a few works approach the topic of stability theoretically. An analytical discussion of the stability of a multi-rate method (namely, for numerical integration of a system of first-order ODEs that can be readily separated into subsystems) is given by Andrus in (Andrus, 1993). This study is based on a method that combines a fourth-order Runge-Kutta scheme (for the fast component) with a similar third-order scheme (for the slow component), introduced by the same author in (Andrus, 1979). Kvaerno also derived analytical expressions to find stability regions for her multi-rate Runge-Kutta schemes in (Kværnø, 2000).

In this study, we use a multi-rate Adams-Bashforth integrator in a parallel fluid solver, taking advantage of the solver’s overset mesh capabilities to segregate solution components with differing timescales. We do so with particular focus on the resulting improvement in performance, reduction in work through fewer right-hand-side evaluations, and accompanying changes in stability regions and accuracy. What results from this effort is a number of conclusions about the viability and extensibility of these integrators to other problems and applications. By taking advantage of a new method of interpolation between overset meshes (Sharan et al., 2018), we attain a higher order of accuracy at interface regions than that of previous work. We also introduce extended-history Adams-Bashforth schemes as a basis for a new class of multi-rate integrators that demonstrate improved stability compared to standard multi-rate Adams-Bashforth (MRAB) schemes. For further reference on the multi-rate Adams-Bashforth schemes implemented here, thorough analyses (especially empirical analyses of stability and the effect of various method design choices) are given in Klöckner (Klockner, 2010). The method is also discussed in a particle-in-cell context by Stock (Stock, 2009), as well as more generally in (Gear, 1974), (Gear and Wells, 1984), and (Sandu and Constantinescu, 2009).

2 Background

2.1 The Compressible Navier-Stokes Equations on Overset Meshes

We apply multi-rate integrators to the compressible Navier-Stokes equations on overset meshes, making use of an SBP-SAT discretization with a SAT-based method that applies the effect of an interpolation as a penalty term in the right-hand side. Details on the Navier-Stokes formulation, including the governing equations, nondimensionalization, and coordinate mappings can be found in (Sharan, 2016). Details on the SBP operators can be found in (Strand, 1994; Carpenter et al., 1993; Mattsson et al., 2004). For the results given later, we use the third-order SBP operator given in (Mattsson et al., 2004). Details on the accompanying SAT boundary conditions (to provide energy stability) can be found in (Svärd et al., 2007; Svärd and Nordström, 2008; Bodony, 2010).

2.2 SAT-Based Interpolation

The method used for interpolation between overset meshes is critical to the use of multi-rate integrators, as it forms the coupling between fast and slow terms. Our interpolation relies on a Chimera framework that makes use of PEGASUS (Suhs et al., 2002) and BELLERO (Sherer et al., 2006) interpolation scheme and tooling. In general, the process of communication between grids can be broken down into a number of phases:

  • Establish communication between grids. Compute the bounding box for a given grid, and use it to determine collisions with any other grids.

  • Hole cutting/fringe determination. Identify points on each grid as ”fringe points” which will donate and receive data from other grids, and identify points on coarser background grids which are well within the boundaries of the finer feature grids, and can thus be deemed inactive.

  • Donor-receiver pair search — see Figure 1. Fringe points on the receiver grid are paired with donor cells on the donor grid.

  • Interpolation. State data from the points in the donor cell is transferred to the receiver point via Lagrangian interpolation, with corresponding weights determined as a function of Lagrange shape functions.

Figure 1: Sample grids for explanation of overset interpolation. The cell outlined in green and the blue point represent an example donor cell receiver point pair.

See (Bodony et al., 2011) for further discussion of this implementation, including algorithms for hole-cutting and the donor-receiver pair search. Once the donor data is sent (interpolated) to the receiver grids, it must be applied to the receiver state. The conventional method of interpolation between overset meshes uses an “injection” procedure to apply the result of overset interpolation, in which the state values on the receiver grid are directly overwritten by the interpolated values from the donor grid. The interpolation itself is performed at each right-hand side evaluation.

An alternative interface treatment given in (Sharan et al., 2018) follows a methodology similar to the weak boundary treatment using SAT (Svärd et al., 2007; Svärd and Nordström, 2008; Bodony, 2010) and applies the interpolated values as a penalization term in the right-hand side via a target vector. As in (Sharan, 2016), we consider a single grid point on an overlapping interface. Using the same notation as (Sharan et al., 2018), we describe the interface as a boundary where , , or . is the normal direction to the face the grid point lies on, and the superscript indicates inflow (+) or outflow (-). If denotes the solution at this grid point, with the interpolated value from the donor grid given as , we can express the discretized equation at this point as

(2.1)

where denotes the derivatives of the fluxes, , is the (1,1) element of the positive-definite matrix associated with the SBP operator, is an identity matrix of size , and , where and are the same transformation and diagonal matrices mentioned in (Svärd et al., 2007; Svärd and Nordström, 2008; Bodony, 2010). denotes the viscous flux at the interface point, and all ”hatted” terms indicate interpolated values. Note also that if the grid point lies on an edge or a corner in 3 dimensions, the interface terms for each normal direction must be added. The penalty parameters used in (2.1) are chosen following

for an inflow () or outflow () interface point. Unlike the injection-based interpolation method, this scheme is provably stable.

Injection-based methods are incompatible with the strictly ODE-based time advancement required for the schemes we will describe. Multi-step schemes are explicitly reliant on maintaining accurate right-hand side histories, whereas injection methods rely on in-place modification of state variables. The SAT interface treatment instead applies the effects of inter-grid communication as a right-hand side penalty, and is the method we will use.

3 Time Integration

3.1 Adams-Bashforth Integration

To fix notation for new schemes developed later, we here give a brief derivation of a standard Adams-Bashforth (AB) integrator, as described in Bashforth and Adams (1883). We start with a model IVP given by

This is the form that results from a method of lines (MOL) approach to solving time-dependent partial differential equations like those considered in this study. We can find the solution at the next timestep by evaluating

We approximate the time dependency of the right-hand side function with a polynomial with coefficients (formed by interpolating past values of ), extrapolate with that polynomial approximation, and integrate the extrapolant. We use a Vandermonde matrix to construct a linear system to obtain the coefficients to be used in extrapolation from history values:

(3.1)

where is a vector evaluating the integral of the interpolation polynomial, and is the transpose of the Vandermonde matrix with monomial basis and nodes , corresponding to past time values, given by

(3.2)

In (3.1) and (3.2), is equal to the order of the integrator, and are the time history values, with . The coefficients are used to extrapolate to the next state via

(3.3)

Clearly, the length of the past history needed to calculate a step (and, thus, the memory required) influences the order of accuracy attained.

An alternative time integration method is required for the first few time steps (the exact number of which is dependent on the number of history values needed) in order to establish right-hand side history and ”bootstrap” the method. We use a third-order Runge-Kutta (RK3) integrator Heun (1900) to bootstrap the third-order AB methods, whereas a fourth-order Runge-Kutta (RK4) integrator Kutta (1901) is used to bootstrap the fourth-order AB methods. A historical survey of these methods, along with Runge-Kutta methods in general, can be found in Butcher (1996).

3.2 Extended-History Adams-Bashforth Schemes

We will see in Section 4.2 that the ODE systems resulting from our SBP-SAT discretization on overset meshes yield eigenvalue spectra that extend far along the negative real axis. To improve the stability of the AB schemes for ODE systems containing these eigenvalues, we develop new extended-history Adams-Bashforth schemes. As shown in (3.1) and (3.2), the standard -order AB scheme constructs a square () Vandermonde matrix to produce a linear system that, when solved, gives the vector of coefficients to be used in extrapolation from history values. With the extended-history scheme, we modify a family of AB schemes to include history values with , resulting in a non-square () Vandermonde matrix with a number of additional rows:

(3.4)

This gives an underdetermined system without a unique solution for the coefficients (we are solving equations for unknowns). We select a solution that minimizes for further study:

(3.5)

We expect that minimization of the coefficients to be used in extrapolation will minimize the possibility of overshoot and produce schemes with improved stability characteristics. To further motivate this choice of , we use a model ODE:

Referencing (3.3), we can derive a step matrix for a third-order Adams-Bashforth integrator applied to this case such that

where the vector is given by , is given by , and the step matrix is given by

where is the vector of Adams-Bashforth coefficients . Based on the definition of , it follows that the Adams-Bashforth integration of this case will remain stable if , where are the eigenvalues of . The eigenvalues of are given by

where

Therefore, as , for all real values of . This indicates for this case that a choice of that minimizes is likely to produce superior stability characteristics, as it will lead to lower values of .

This method can be extended to any order. For third and fourth-order, the method produces what we term AB34 (third-order) and AB45 (fourth-order) schemes. We can plot stability regions (using the method we describe in B) for the new AB34 scheme for comparison against a fourth-order Runge-Kutta (RK4) integrator and standard AB3 integrators and, similarly, plot a comparison of the new AB45 scheme to RK4 and fourth-order Adams-Bashforth (AB4) integrators (Figures 1(a) and 2(a)). Motivated by the amount of computational work per timestep, we normalize the approximate stability regions based on the number of right-hand side evaluations per timestep a given integrator requires — that is, while RK4 requires four right-hand side evaluations per timestep, Adams-Bashforth integrators only require one.

(a) Approximate stability regions for AB34 integrators compared to AB3 and RK4.
(b) Approximate stability regions for AB35 integrators compared to AB3 and RK4.
Figure 2: Approximate stability regions obtained via the method of B for AB3 integrators with extended history, normalized by the number of RHS evaluations.
(a) Approximate stability regions for AB45 integrators compared to AB4 and RK4.
(b) Approximate stability regions for AB46 integrators compared to AB4 and RK4.
Figure 3: Approximate stability regions obtained via the method of B for AB4 integrators with extended history, normalized by the number of RHS evaluations.

When compared with the plots of approximate stability regions for standard Adams-Bashforth schemes, the plots of the approximate stability regions for the AB34 and AB45 schemes extend further along the negative real axis, while slightly shrinking along the imaginary axis. In the upcoming section, we will see that the SBP-SAT discretization we use motivates the use of this scheme to handle potentially problematic eigenvalues associated with interpolation and, in the results section, we will therefore consider both the standard AB schemes (AB3, AB4) and the new AB34 and AB45 schemes in our analysis of convergence, stability, and performance considerations. While we will limit our continued discussion of extended-history schemes to AB34 and AB45 integrators, we can also extend the history used to produce the coefficients even further, creating AB35 and AB46 schemes, and again viewing normalized approximate stability regions for these schemes (Figures 1(b) and 2(b)).

3.3 Multi-rate Adams-Bashforth Integration

We now describe a multi-rate generalization of the scheme, making use of the algorithm introduced in Gear and Wells (1984). We consider the following model system with “fast” and “slow” solution components:

(3.6)

With this in mind, we can set a slow (larger) time step for such that we maintain stability in the integration of the slow component. We also set a fast time step for such that is an integer multiple of , and define the ratio between the two, , as the step ratio of the multi-rate Adams-Bashforth scheme (henceforth referred to as MRAB). While the results presented here make use of only two separate state components, each with its own right-hand side function and independent rate, the theory is readily extensible to any number of rates.

In the two-component overset formulation with which we are concerned, we define the fast and slow components of our Navier-Stokes solution as the conserved variables on each grid, that is (using a two-grid case as an example): , , where the subscripts of the vectors indicate global grid number, and in this instance we assume Grid 1 to be the grid with the fast-moving component of the solution, be it due to physical behavior or finer mesh spacing. Each right-hand side function and is a function of both the slow and fast states and — this coupling between the right-hand side functions is, in the case of our application of this theory to overset meshes, realized by the SAT penalty interpolation discussed in Section 2.2.

Within this two-component scheme, a few design choices are available:

  • The order in which we evaluate and advance the solution components. Namely, two primary options are advancing the fast-evolving solution component through all of its micro-timesteps and waiting to perform the single macro-timestep required for the slow component until the end (a “fastest-first” scheme, per the nomenclature of Gear and Wells (1984)), or pursuing an algorithm in which the slow component is instead advanced first.

  • For slowest-first evaluation schemes, the choice of whether or not to re-extrapolate the slow state after additional state and right-hand side information is gathered at the micro-timestep level.

Empirical observations on the effects of these choices, among others, are made in (Klockner, 2010). It may be useful to step through a brief example of a multi-rate Adams-Bashforth integrator, using an example system with a “fast” component requiring twice as many timesteps as the “slow” component to remain well-resolved (). We lay out the steps of a third order fastest-first MRAB scheme with no re-extrapolation, assuming that evolves at the slow rate (macro-timestep ) and evolves at the fast rate (micro-timestep ). denotes extrapolants of the right-hand side functions as polynomial functions of both time and the set of history values : . These polynomials approximating the evolution of and in time are what we will integrate to march and , and will be updated to replace older history values with new right-hand side evaluations during the course of integration through a macro-timestep . We assume availability of right-hand side histories to start the AB method.

  • Form the polynomial extrapolants we will integrate, per the AB methods described in Section 3.1. The right-hand side history values of (used to form ) have been obtained at time points , , and current time , whereas the right-hand side history values of (used to form ) have been obtained at time points , , and .

  • March both and to time by integrating the polynomial extrapolants and formed in Step 1. This results in a set of intermediate values and .

  • Evaluate the fast right-hand side .

  • Update the set of right-hand side history values for to include these new values, and construct a new extrapolant .

  • March to time by integrating the extrapolant formed in Step 1.

  • March to time by integrating the extrapolant formed in Step 3.

  • Evaluate and and update extrapolants.

The scheme evaluates the fast-evolving right-hand side twice per macro-timestep , whereas the slowly-evolving right-hand side is only evaluated once. For the results shown later, this is the scheme we will use, albeit generalized to different step ratios .

4 Time Integration in SBP-SAT Discretizations

One of the critical questions to be answered is at what timestep sizes and the integrators developed in Section 3 remain stable. This section establishes the procedures and tools for characterizing the stability of the integrators developed in Section 3 in the context of the SBP-SAT discretization used.

4.1 Timestep Calculation

For the discretization of the Navier-Stokes equations described in Section 2, we will incorporate the metrics and multidimensionality into calculation of the timestep used by a given integrator. The maximum stable timestep for a given Navier-Stokes simulation using the SBP-SAT discretization is approximated via the following steps. First, we calculate an inviscid timestep:

(4.1)

and a viscous timestep:

(4.2)

where the vector contains the Cartesian velocities, is the speed of sound, , is the Jacobian, and the terms are the metric terms described in MacCormack (2014). To determine the timestep, we calculate and at each mesh point, then take

(4.3)

and take an additional minimum over all mesh points. This timestep calculation is described in more detail in MacCormack (2014).

4.2 SAT Interpolation Stability

We now undertake a brief parameter study to show that one of the stiffest right-hand side components limiting the timestep when solving the Navier-Stokes equations with the overset discretization involves the SAT penalty-based interpolation (discussed in Section 2.2). Specifically, we will show that penalty terms applied at fringe points with more than one nonzero characteristic direction (i.e. corner points) produce eigenvalues lying far along the negative real axis. To illustrate this phenomenon, we can linearize a computationally small Navier-Stokes problem (using the procedure for approximation of the Jacobian described in Osusky et al. (2010) to form the approximate linear operator of the system) employing SAT interpolation on overset meshes and analyze the eigenspectrum for the resulting global operator.

We express the time evolution of a fringe corner point as

(4.4)

with denoting the Navier-Stokes right-hand side, and denoting the penalty terms applied by the SAT interpolation procedure described in Section 2.2 (the latter two terms in (2.1)). We introduce a parameter that we can vary to apply a fractional weight to, and thus change the magnitude of, terms associated with the interpolation penalty applied at the corner fringe points, and determine the right-hand side modes associated with these terms.

As for the case to be analyzed, we select an inviscid application of overset grids to the Euler equations with periodic boundary conditions. The initial condition is an -momentum gradient of the form

with the -momentum set to 0 everywhere. The initial condition and grid layout for this case are shown in Figure 4. Grid 1 (31 = 961 points) spans , , while Grid 2 ( = 1681 points) spans , .

(a) Penalty parameter study - 1D Euler test case initial conditions.
(b) Penalty parameter study - 1D Euler test case meshes.
Figure 4: Description of the (effectively) 1D Euler test case for evaluating corner interpolation.

We form the approximate linearized operators for this case for two values of the corner penalty weight parameter : and . By finding the eigenvalues of the two operators and comparing them, we can identify the eigenvalues associated with the fringe corners. Based on the highest-magnitude corner-associated eigenvalue that can be determined from comparing these spectra, it is clear that the approximate stability region of the RK4 scheme contains the corner-associated eigenvalue, which extends outwards along the negative real axis as we increase the penalty weight from 0.5 to 1, whereas the approximate stability region for the third-order AB scheme does not. This motivated the development of the extended-history schemes discussed in Section 3.2, which are shown to improve the stability of AB schemes along the negative real axis. Figure 5 shows that the approximate stability region for the third-order extended-history AB scheme indeed contains the corner-associated eigenvalue.

(a) Spectrum for linearized operator with , including AB34 scheme.
(b) Spectrum for linearized operator with , including AB34 scheme.
Figure 5: Spectra of box-in-box case for the linearized operator with , including AB34 scheme. Stability regions are plotted using the method described in B.

4.3 Procedure for Determining Discrete Stability

We now develop a procedure for characterizing the discrete temporal stability of a given integrator within the SBP-SAT discretization. Given that the Adams-Bashforth integrators discussed in Section 3 use a linear combination of state and right-hand side history to march in time (see (3.3)), we define an “integrator state vector” at time as

where is the vector of ODE state being integrated, and is a function (linear or nonlinear) denoting the right-hand side for that state at a given time. For a fourth-order Runge-Kutta (RK4) integrator, which we will use as a baseline for comparison and to verify this procedure, the integrator state vector instead simply contains the vector of ODE state, as RK4 integration is not dependent on right-hand side history:

We can then express integration forward one timestep as a matrix operation on the integrator state vector :

(4.5)

where denotes the “step matrix” for a given integrator. These step matrices are functions of the timestep used in the simulation, and are formed using a linearization procedure that approximates the full Jacobian matrix of the system using finite differences. In doing so, analyzing the eigenvalues of this step matrix is analogous to analyzing the amplification factors of various modes in the global error of the scheme.

To construct the step matrix , we first use an initial state as input to an integrator with timestep to march one step forward in time and form the base result. Next, we perturb a single element of the initial state vector using a small value . Using this perturbed state as an input to the integrator returns a state that — when compared to the base result — demonstrates the effect of the perturbation of that single element on the entire domain. If we define as a column vector with all zeroes except for a 1 in the th row, we can write

Thus we calculate the th column of the step matrix by perturbing the th initial state vector element, giving this modified initial state to the integrator to obtain , and taking

(4.6)

Note also that in the case of Adams-Bashforth integration, the stability vector element being perturbed may also be a right-hand side history element. For our perturbation , we use a value of . This procedure is similar to the Navier-Stokes linearization for SBP-SAT discretizations described in Osusky et al. (2010).

Once the matrix is formed, we consider the system discretely stable if the spectral radius fulfills the property . Therefore, creating step matrices for an integrator at a number of different values of will allow us to characterize that integrator’s stability. We use SLEPC Hernandez et al. (2005) for the computation of the eigenvalues, using the default Krylov-Schur solver with a tolerance of . We will use this procedure to determine the maximum stable timesteps of our integrators reported in Section 5.2.

4.4 Runge-Kutta Stability

We validate the procedure of the previous section by using it to determine the maximum stable timestep for an explicit fourth-order Runge-Kutta (RK4) integrator applied to 1D advection with a wave speed of 1. With a spatial domain with 61 mesh points, the step matrix analysis described in Section 4.3 predicts a maximum stable timestep for this case between and . Using the method described in B to approximate a given integrator’s stability region in the complex plane, we see that the imaginary axis bound for an RK4 integrator is about . When using a fourth-order SBP operator as described in (Strand, 1994; Carpenter et al., 1993; Mattsson et al., 2004) for spatial discretization of the semidiscrete problem, and with periodic boundaries, the eigenvalues can be found analytically, and all lie on the imaginary axis, with a maximum eigenvalue of (a result given by Lele Lele (1992)). We can thus estimate the maximum stable timestep for this problem, noting a grid spacing of , to be , a result which is consistent with that of our step matrix procedure.

5 Numerical Results

5.1 Guiding Experiment: Viscous Flow Past a Cylinder

We consider two-dimensional viscous flow over a cylinder with diameter in the spatial domain , . The cylinder center is located at , . The free-stream Mach number is 0.2, the Reynolds number is 200 and the Prandtl number is 0.72. We start our simulations from a steady state solution obtained after 100 non-dimensional time units (, using the nondimensionalization of Park et al. (1998)) using an RK4 integration scheme. Note that the timesteps we report are also non-dimensional. The physical problem is modeled using two overset meshes: a coarser, base Cartesian grid (), and a finer teardrop-shaped curvilinear grid () surrounding the stationary cylinder.

(a) Physical case: flow past a cylinder.
(b) Overset grids: flow past a cylinder.
Figure 6: Physical and computational description of the flow past cylinder test case used to obtain numerical results.

It is important to characterize the disparity in allowable timestep sizes such that we can estimate the maximum allowable step ratio SR we are able to use in our integrators. Calculated using the method described in Section 4.1 ((4.1), (4.2), and (4.3)), we find the ratio between the minimum timesteps on each grid to be about 12.

The boundary conditions employed in our simulation are as follows. The inner fine grid (Grid 2) is periodic in the azimuthal direction. We model the cylinder surface as an SAT isothermal wall per Svärd and Nordström (2008). On Grid 1 (Cartesian base grid), all outer boundaries are modeled as SAT far-field following the procedure of Svärd et al. (2007). In addition, sponge boundaries with a cell depth of 6 are used as described in Bodony (2006).

5.2 Multi-rate Adams-Bashforth Stability

We will now determine the stability characteristics of a given MRAB integrator by defining two ratios: and , where is the maximum stable timestep of the integrator in question, and and are the maximum stable timesteps for single-rate Adams-Bashforth (SRAB) and RK4 integrators, respectively, determined using the method of Section 4.3. Recall that the step ratio of a given MRAB integrator is given as , and is an integer value. An MRAB integrator with a step ratio of 1 is equivalent to an SRAB integrator. SR is a user-specified input.

For the experiments to determine stability, we set the value of the macro-timestep , build the step matrix of (4.5) (starting from a steady-state solution), and check for eigenvalues outside the unit disc, repeating the procedure until we find the maximum critical value of (to the nearest ) for an MRAB integrator with a given step ratio. The values are a measure of the maximum stable timesteps of these integrators relative to an RK4 integrator. The values reported for an MRAB integrator with a given order and history length use the maximum stable timestep for an SRAB integrator of the same order and history length. The percentages reported document percent efficiency of a given integrator, : that is, if an MRAB integrator with a step ratio of SR obtains an value of SR, it is deemed 100% efficient.

Standard (AB3) Extended-History (AB34)
Int. SR % %
RK4 0.0217 1.000 5.13 0.0217 1.000 3.86
SRAB 1 0.0042 0.195 1.00 100 0.0056 0.259 1.00 100
MRAB 2 0.0084 0.388 1.99 99.5 0.0113 0.518 2.00 100
MRAB 3 0.0127 0.583 2.99 99.7 0.0169 0.775 2.99 99.7
MRAB 4 0.0169 0.777 3.99 99.8 0.0225 1.035 3.99 99.8
MRAB 5 0.0171 0.788 4.04 80.8 0.0281 1.294 4.99 99.8
MRAB 6 0.0171 0.788 4.04 67.3 0.0282 1.296 5.00 83.3
MRAB 7 0.0171 0.788 4.04 57.7 0.0282 1.296 5.00 71.4
MRAB 8 0.0171 0.788 4.04 50.5 0.0282 1.296 5.00 62.5
MRAB 9 0.0171 0.788 4.04 44.9 0.0282 1.296 5.00 55.6
MRAB 10 0.0171 0.788 4.04 40.4 0.0282 1.296 5.00 50.0
MRAB 20 0.0171 0.788 4.04 20.2 0.0282 1.296 5.00 25.0
Table 1: Stability results for third order AB integrators applied to flow past a cylinder (speed ratio of about 12). , , and .
Standard (AB4) Extended-History (AB45)
Int. SR % %
RK4 0.0217 1.000 9.42 0.0217 1.000 5.63
SRAB 1 0.0023 0.106 1.00 100 0.0039 0.178 1.00 100
MRAB 2 0.0046 0.212 2.00 100 0.0078 0.356 2.00 100
MRAB 3 0.0070 0.321 3.02 100 0.0116 0.533 3.00 100
MRAB 4 0.0093 0.427 4.02 100 0.0155 0.711 4.00 100
MRAB 5 0.0093 0.430 4.05 81.0 0.0180 0.827 4.65 93.0
MRAB 6 0.0093 0.430 4.05 67.5 0.0180 0.827 4.65 77.5
MRAB 7 0.0093 0.430 4.05 57.9 0.0180 0.827 4.65 66.4
MRAB 8 0.0093 0.430 4.05 50.6 0.0180 0.827 4.65 58.1
MRAB 9 0.0093 0.430 4.05 45.0 0.0180 0.827 4.65 51.7
MRAB 10 0.0093 0.430 4.05 40.5 0.0180 0.827 4.65 46.5
MRAB 20 0.0093 0.430 4.05 20.3 0.0180 0.827 4.65 23.3
Table 2: Stability results for fourth-order AB integrators applied to flow past a cylinder (speed ratio of about 12). , , and .

The results of Tables 1 and 2 show that the extended-history schemes have higher maximum stable timesteps than their standard counterparts of same order. All schemes attain near-perfect efficiency () for SR-values up to about a third of the speed ratio of 12 for this case (), with the extended-history schemes maintaining this efficiency for the cases as well, a result due to the improved stability of these schemes along the negative real axis. Above this step ratio, we expect our performance to degrade, given that we are simply performing extra right-hand side evaluations (more micro-timesteps on the fast grid) with no commensurate macro-timestep gain. The increase from third to fourth order necessitates the use of lower timesteps for all step ratios, but the same qualitative ceiling as in the third-order case is observed. The third-order MRAB scheme with extended history is the only scheme able to attain for any step ratio, indicating maximum stable timesteps larger than that of the RK4 integrator.

5.3 Multi-Rate Adams-Bashforth Accuracy and Convergence

We now examine the accuracy of the MRAB schemes and confirm that they are convergent. In presenting some accuracy results for our schemes, we will examine integrators for SR-values ranging from 2 to 6. As we have seen in the previous section, above this step ratio, the maximum stable timestep remains unchanged for all AB integrators for our speed ratio of about 12. For each integrator, we calculate an estimated order of convergence (EOC) using 3 data points consisting of the macro-timestep used and the maximum error obtained in the density after 2.5 NDTU. The initial condition for these 2.5-NDTU integrations is a uniform subsonic flow with Mach number in the -direction (before the boundary layer has developed on the cylinder), and the true solution to which the solution obtained with our integrators is compared to form the error is the solution obtained using an RK4 integrator with a timestep of . Based on the stability results of Section 5.2, we limit our convergence study to the third-order schemes, which have the highest potential to improve performance. We see in Table 4 that an estimated order of convergence of about 3 is obtained in all cases, as expected.

SR err, err, err, EOC
2 1.93E-06 1.79E-08 2.44E-09 2.900
3 1.73E-06 1.48E-08 1.87E-09 2.965
4 1.68E-06 1.40E-08 1.77E-09 2.977
5 1.67E-06 1.38E-08 1.73E-09 2.984
6 1.66E-06 1.37E-08 1.71E-09 2.986
Table 3: Convergence results for third-order MRAB integrators.
SR err, err, err, EOC
2 3.56E-06 3.42E-08 4.53E-09 2.894
3 3.29E-06 2.86E-08 3.63E-09 2.956
4 3.23E-06 2.73E-08 3.44E-09 2.971
5 3.22E-06 2.68E-08 3.37E-09 2.979
6 3.21E-06 2.66E-08 3.34E-09 2.982
Table 4: Convergence results for third-order MRAB integrators with new extended-history scheme.

6 MRAB Performance

6.1 Performance Model

Our next task is characterizing the benefit in performance when using these multi-rate integrators. What expectations should we have in terms of the reduction in computational work required for these new schemes compared to single-rate schemes?

To answer this question, we will develop a performance model for how we would expect an MRAB integrator of a certain step ratio to perform in comparison to the same RK4 integrator used as a baseline for the accuracy and stability tests. In doing so, we make a few assumptions: we assume that right-hand side evaluations make up the bulk of the computational cost of the Navier-Stokes simulation, and we also assume that all simulations will be performed at or near the maximum stable timestep of a given integrator.

We note in forming this model that in a given timestep for an RK4 integrator, four right-hand side evaluations must be performed, whereas in the AB schemes, only one evaluation is needed. Thus, to reach time , we can write expressions for the total number of right-hand side evaluations each integrator will need to perform:

(6.1)
(6.2)

In (6.1) and (6.2), is the number of points on the fast grid, and is the number of points on the slow grid (assuming a two-grid configuration). Furthermore, SR is the step ratio of the AB integrator (the number of micro-steps taken on the fast grid per macro-step on the slow grid — for an SRAB integrator, ), is the timestep taken by the RK4 scheme, and is the end time to be reached. Recall that the factor is defined in Section 5.2 as

This factor relates to the timestep restrictions of AB integrators relative to RK4. These restrictions, as we have seen in the analyses of Section 5.2, are imposed by stability constraints. Namely, the maximum stable timestep of the RK4 integrator applied to a certain case will be larger than that of an AB integrator, and therefore more timesteps must be taken with an AB integrator to reach time than the RK4 integrator takes to reach time .

Given the assumptions of our performance model, calculating the percent reduction in right-hand side evaluations therefore provides a theoretical estimate of computational work saved when using a given AB integrator. We define the resulting speedup as

(6.3)

Using this definition, any values of SU over 1 indicate a profitable integrator, whereas values of 1 or less indicate equal or detrimental performance. As an example of this performance model in execution, we can compose a model for the case discussed in Section 5.1 modeling the cylinder in crossflow, for a number of step ratios, and for both third and fourth order. We use the grid data presented in Section 5.1 and the values given in Section 5.2, and for simplicity we take . For these results and for all of the results that follow, we will use the more potentially profitable extended-history (AB34 and AB45) integrators.

Integrator Total RHS Evals % Red. from RK4 SU
RK4 34244 0.00 1.00
SRAB () 33023 3.57 1.04
MRAB () 25846 24.52 1.32
MRAB () 23529 31.29 1.46
MRAB () 22310 34.85 1.53
MRAB () 21581 36.98 1.59
MRAB () 25274 26.19 1.35
Table 5: Evaluating RHS costs and theoretical performance for third-order MRAB integrators (using extended history) compared to RK4.
Integrator Total RHS Evals % Red. from RK4 SU
RK4 34244 0.00 1.00
SRAB () 48154 -40.62 0.71
MRAB () 37694 -10.08 0.91
MRAB () 34204 0.12 1.00
MRAB () 32459 5.21 1.05
MRAB () 33756 1.42 1.01
MRAB () 39608 -15.66 0.86
Table 6: Evaluating RHS costs and theoretical performance for fourth-order MRAB integrators (using extended history) compared to RK4.

Table 5 shows that we can expect speedup for all third-order MRAB integrators using the extended history scheme for this specific case — however, based on Table 6, we speculate that fourth-order MRAB integrators using the extended histories will largely fail to be profitable for this specific case, though the fourth-order schemes do reach a break-even point at , attaining very minimal performance benefit at (around 5% reduction in right-hand side evaluations), before decreasing again due to the stability ceiling observed in Section 5.2.

6.2 Sequential Performance

Having developed a performance model in the previous subsection and thus obtained expectations for how the integrators should perform, we now develop tests to verify this model. We gauge performance by using a given integrator to march the viscous flow over a cylinder case 10 NDTU (again, starting from a steady state solution) at the maximum stable timestep of the integrator, measuring end-to-end wall clock time of the Navier-Stokes solver, time spent calculating right-hand sides, time spent performing the necessary interpolations between grids and time spent performing spatial operator-related tasks, a subset of the right-hand side calculation. We limit our performance testing here to third-order MRAB integrators with the extended history scheme only, as Section 6.1 shows these to be the most promising for this case. The percentages reported for each step ratio document the percent reduction in time compared to that of the RK4 integrator, and the predicted percentage is that which results from the performance model (based purely on right-hand side evaluations as given by (6.1) and (6.2)).

Int. R [s] % O [s] % I [s] % E [s] % % Pred.
RK4 29.9 18.4 0.623 38.7
() 28.29 5.6 17.44 5.2 0.273 56.2 41.73 -8.0 3.6
() 22.76 23.9 14.07 23.5 0.214 65.7 30.14 22.0 24.5
() 21.60 27.7 13.33 27.5 0.201 67.7 28.90 25.2 31.3
() 20.91 30.0 12.91 29.8 0.194 68.9 27.50 28.9 34.9
() 18.13 39.3 11.17 39.2 0.163 73.8 23.60 38.9 37.0
() 20.27 32.2 12.51 32.0 0.182 70.8 24.13 37.6 26.2
Table 7: Sequential times for third-order MRAB integrators (with extended history). R = Right-hand Side Evaluation Time, O = Operator Time, I = Interpolation Time, and E = End-to-End Time.

We see in Table 7 that the measured times given for our new SRAB and MRAB integrators are all lower than those of RK4, with one exception (the SRAB integrator has a slightly larger end-to-end time). Furthermore, we note in the case of sequential performance profiling that the observed interpolation times are low as we would expect (when not running the solver in parallel, this amounts to copying data between buffers) — we will see that the time spent interpolating becomes more significant as we transition to a discussion of parallel runs. Regarding the percentages, while we see good agreement with the overall trend of the model of Section 6.1, we see that a number of the percentage reductions observed are slightly lower than those predicted by the model.

6.3 Parallel Performance

In this section, we perform tests similar to those of Section 6.2, with a specific focus on the distributed-memory parallelization of the new integrators, in order to answer the question of how communication and interpolation between grids affects performance. The procedure with these tests is the same as in Section 6.2, but an important distinction to note is that we here document end-to-end wall clock time, but the times reported for right-hand side calculations, operator tasks, and interpolation are accumulated inclusive times (the total time spent by all processors performing these specific tasks). In the tables that follow, we report times for third-order MRAB integrators with extended history for various step ratios. Bold values indicate lower time spent with a certain task than that of the RK4 integrator.

{adjustwidth}

-.5in-.5in

Proc RK4 % % % %
2 23.52 25.21 -7.2 33.82 -43.8 25.70 -9.3 24.28 -3.2
4 17.12 19.49 -13.8 10.04 41.4 8.80 48.6 10.05 41.3
8 8.37 9.36 -11.9 6.41 23.4 6.64 20.6 7.04 15.8
16 4.98 5.50 -10.4 3.93 21.1 3.91 21.5 4.50 9.6
Table 8: End-to-end timings and percentage reductions for various processor counts. Bold values indicate lower times than those of the RK4 integrator.
{adjustwidth}

-.5in-.5in

Proc RK4 % % % %
2 31.17 29.82 4.3 23.67 24.1 20.91 32.9 20.66 33.7
4 35.26 33.40 5.3 27.07 23.2 24.12 31.6 26.62 24.5
8 44.28 41.97 5.2 32.97 25.5 28.06 36.6 31.74 28.3
16 53.36 48.82 8.5 39.81 25.4 34.30 35.7 38.97 27.0
Table 9: RHS timings and percentage reductions for various processor counts. Bold values indicate lower times than those of the RK4 integrator.
{adjustwidth}

-.5in-.5in

Proc RK4 % % % %
2 19.17 18.46 3.7 14.61 23.8 12.86 32.9 12.65 34.0
4 20.97 20.12 4.1 16.37 21.9 14.63 30.2 15.78 24.7
8 24.97 23.83 4.6 18.88 24.4 16.29 34.8 18.60 25.5
16 28.52 27.00 5.3 21.71 23.9 18.29 35.9 21.42 24.9
Table 10: Operator timings and percentage reductions for various processor counts. Bold values indicate lower times than those of the RK4 integrator.
{adjustwidth}

-.5in-.5in

Proc RK4 % % % %
2 5.55 3.86 30.5 29.61 -433.5 21.92 -295.0 22.04 -297.1
4 24.54 21.02 14.3 3.73 84.8 2.28 90.7 5.03 79.5
8 13.27 10.05 24.3 1.46 89.0 6.60 50.3 11.31 14.8
16 15.45 12.12 21.6 1.93 87.5 8.53 44.8 14.54 5.9
Table 11: Interpolation timings and percentage reductions for various processor counts. Bold values indicate lower times than those of the RK4 integrator.

Generally, what we see here is first and foremost that the time spent on right-hand side and operator-related tasks (Table 9 and Table 10, respectively) is always lowered by the use of multi-rate methods, in certain cases by over 30%, in line with the model of Section 6.1). Furthermore, the end-to-end wall clock times (Table 8) largely show that the reduction in right-hand side evaluations indeed leads to end-to-end speedup for a number of step ratios and processor counts. Regarding the SRAB scheme (), we see that the times spent in the interpolation, operator, and right-hand side tasks are all lower than those of RK4, yet the end-to-end times are very slightly higher for all core counts. The reason for this is the introduction of higher overhead from elsewhere in the program due to the requirement that more timesteps be run to reach the same end time.

For tests using 2 processors, we see a large amount of time spent in interpolation subroutines for all multi-rate integrators. This is due to the fact that the usage of multi-rate integration in the overset sense naturally induces load imbalance within the application, given that in a given macro-timestep, we will be evaluating more right-hand sides on certain grids than on others. This causes long processor wait times (accounted for, in our case, as interpolation time) due to suboptimal work distribution and motivates a change to the existing decomposition of the problem, which simply distributes processors to grids based on the ratio of that grid’s number of points to the total number of points in the simulation. Rescaling this decomposition based on the multi-rate step ratio being used (a direct indication of how many right-hand side evaluations per macro-timestep a given processor is responsible for) allows us to produce improved performance results at higher processor counts, reducing the wait times spent in grid-to-grid communication. With only two cores, however, we are unable to rescale the decomposition to ameliorate the load imbalance produced by multi-rate. Further effects of inter-grid communications are shown in the next section, focusing on large-scale results.

6.4 Large-Scale Performance

We now move to a different example to investigate the performance of the integrators at larger scale. The grids for this new case, modeling ignition via laser-induced breakdown (LIB) of an underexpanded jet, are shown in Figure 7, and are numerically described in Table 12. Note the percentage of total points in Grid 1 — this will be our fast grid, while the remaining 3 grids will be the slow grids (see Figure 8).

Grid Grid Type No. of Points % of Total
1 Cartesian 12,582,912 47.7
2 Cylindrical 339,521 1.3
3 Cylindrical 11,419,614 43.3
4 Cylindrical 2,060,250 7.8
Table 12: Description of grids for the large-scale case.
Figure 7: Grids for large-scale case: LIB ignition in underexpanded jet.
Figure 8: Grids for large-scale case: assigning rates. Grid 1 (upper left) is assigned the ”fast” rate, while Grid 2 (lower left), Grid 3 (upper right), and Grid 4 (lower right) are assigned the ”slow” rate.

Using the performance model outlined in (6.1), (6.2), and (6.3), and assuming values for this case similar to the viscous flow over a cylinder case, we can calculate expected performance. We test step ratios from 1 to 5, as we have seen in Section 5.2 that this is the range of step ratios for which we maintain perfect efficiency ().

Integrator Total RHS Evals % Red. from RK4 SU
RK4 105,609,188 0.00 1.00
SRAB 101,939,371 3.50 1.04
MRAB () 75,261,021 26.5 1.36
MRAB () 66,539,511 37.0 1.59
MRAB () 62,011,632 41.3 1.70
MRAB () 59,299,803 43.8 1.78
Table 13: Evaluating RHS costs for third-order MRAB integrators (using extended-history scheme) compared to RK4 for the new large-scale case.

The results in Table 13 suggest that we should expect moderate speedup when using a third-order multi-rate scheme with , with the speedup increasing as we increase the step ratio to 5. Performing the integration using these schemes allows us to measure the same times of interest as in the previous section and compare them with this model. The results we present are obtained from a 6,144-core simulation on Stampede2, an NSF-supported supercomputer at the University of Texas and the Texas Advanced Computing Center (TACC), and report scaled time spent in various portions of the Navier-Stokes solver relative to the end-to-end wallclock time of a Runge-Kutta-driven simulation.

Figure 9: Scaled time (wall time / RK4 total wall time) spent in various components of the flow solver for the large-scale case, comparing RK4 integration and AB34 integration with step ratios of 1-5. Thicker columns indicate actual performance at a given step ratio, whereas the thinner columns report model predictions. ”GC” denotes the time spent in ghost-cell exchanges, and ”TS” denotes the time spent in timestep calculation. We see actual end-to-end timings above the model prediction for all step ratios, though actual right-hand-side and operator times are seen to match the model.

These results show measured times that correlate well with what we would expect for reduction in operator and right-hand side costs, and furthermore, we see large reductions in the time spent in interpolation-related subroutines for all step ratios — this is a direct result of implementation of multi-rate-specific overset interpolation schemes featuring selective communications and separate send-receive schemes (interleaved with right-hand side calculation). In the end, the main time marching loop sees moderate speedup for all step ratios above 1, but is below the prediction of the performance model. Our results show roughly constant time spent in all other components of the Navier-Stokes solver not explicitly timed, and higher multi-rate times in two specific portions of the solver, both of which can be readily explained:

  • Timestep calculation. While we do scale the frequency of these calculations such that they occur the same number of times in RK4 as in our integrators, the aforementioned new interpolation scheme — which reduces the amount of wait times in inter-grid communication — shifts load imbalance to these calculations, which feature communication between all processors to determine the minimum timestep.

  • Ghost cell exchanges. These operations involve communication between processors working on the same grid (to exchange ghost cell data used to fill boundary stencils). Rescaling the decomposition the solver uses to allocate more processing power to the fast grid and account for the inherent load imbalance (as mentioned at the end of Section 6.3) results in more processor boundaries that require this operation to be performed. The question of balancing inter-grid communication with intra-grid communication when decomposing the problem to avoid load imbalance remains. We leave this for future work.

In short, the results for this large-scale case show that while we see performance benefit due to decrease in right-hand side evaluations and implementation of more efficient interpolation routines, the improvement is below the estimates of the performance model due to increased communication times induced by load imbalance.

7 Conclusions and Future Work

In this paper, we have developed multi-rate Adams-Bashforth integrators to take advantage of an overset discretization of the Navier-Stokes equations using SBP spatial operators, SAT boundary conditions, and an SAT penalty-based interpolation method. We improve the temporal stability of these integrators by introducing extended-history Adams-Bashforth schemes with higher maximum stable timesteps for ODE systems with eigenvalues along the negative real axis. These new schemes are shown, using a number of numerical tools and examples, to be stable and accurate, and are also shown to improve performance and reduce the computational work required to march a given system in time when compared to both single-rate Adams-Bashforth integrators and a fourth-order Runge-Kutta integrator. In our demonstration of performance improvement using these new multi-rate schemes, we also identify load balance as a critical consideration for further reduction in computational time — this is left for future work.

Acknowledgments

This material is based in part upon work supported by the Department of Energy, National Nuclear Security Administration, under Award Number DE-NA0002374. Computational support has been provided, in part, by the National Science Foundation XSEDE resources under grant TG-CTS090004.

References

References

  • Andrus (1993) J. Andrus. Stability of a multi-rate method for numerical integration of ode’s. Computers & Mathematics with applications, 25(2):3–14, 1993.
  • Andrus (1979) J. F. Andrus. Numerical solution of systems of ordinary differential equations separated into subsystems. SIAM Journal on Numerical Analysis, 16(4):605–611, 1979.
  • Bashforth and Adams (1883) F. Bashforth and J. C. Adams. An attempt to test the theories of capillary action. University Press, 1883.
  • Benek et al. (1983) J. Benek, J. Steger, and F. C. Dougherty. A flexible grid embedding technique with application to the Euler equations. In 6th Computational Fluid Dynamics Conference Danvers, page 1944, 1983.
  • Bodony (2006) D. J. Bodony. Analysis of sponge zones for computational fluid mechanics. Journal of Computational Physics, 212(2):681–702, 2006.
  • Bodony (2010) D. J. Bodony. Accuracy of the simultaneous-approximation-term boundary condition for time-dependent problems. Journal of Scientific Computing, 43(1):118–133, 2010.
  • Bodony et al. (2011) D. J. Bodony, G. Zagaris, A. Reichert, and Q. Zhang. Provably stable overset grid methods for computational aeroacoustics. Journal of Sound and Vibration, 330(17):4161–4179, 2011.
  • Butcher (1996) J. C. Butcher. A history of Runge-Kutta methods. Applied numerical mathematics, 20(3):247–260, 1996.
  • Carpenter et al. (1993) M. H. Carpenter, D. Gottlieb, and S. Abarbanel. Time-stable boundary conditions for finite-difference schemes solving hyperbolic systems: methodology and application to high-order compact schemes. Technical report, NASA, 1993.
  • Chou and Ekaterinaris (2003) H. Chou and J. Ekaterinaris. A compact high-order CFD package for the flow solver OVERFLOW. In 41st Aerospace Sciences Meeting and Exhibit, page 1234, 2003.
  • Constantinescu and Sandu (2007) E. M. Constantinescu and A. Sandu. Multirate timestepping methods for hyperbolic conservation laws. Journal of Scientific Computing, 33(3):239–278, 2007.
  • Dawson and Kirby (2001) C. Dawson and R. Kirby. High resolution schemes for conservation laws with locally varying time steps. SIAM Journal on Scientific Computing, 22(6):2256–2281, 2001.
  • Engstler and Lubich (1997) C. Engstler and C. Lubich. Multirate extrapolation methods for differential equations with different time scales. Computing, 58(2):173–185, 1997.
  • Gear (1974) C. Gear. Multirate methods for ordinary differential equations. Technical report, Illinois Univ., Urbana (USA). Dept. of Computer Science, 1974.
  • Gear and Wells (1984) C. W. Gear and D. Wells. Multirate linear multistep methods. BIT Numerical Mathematics, 24(4):484–502, 1984.
  • Günther and Rentrop (1993) M. Günther and P. Rentrop. Multirate ROW methods and latency of electric circuits. Applied Numerical Mathematics, 13(1):83–102, 1993.
  • Günther et al. (2001) M. Günther, A. Kvaernø, and P. Rentrop. Multirate partitioned Runge-Kutta methods. BIT Numerical Mathematics, 41(3):504–514, 2001.
  • Hernandez et al. (2005) V. Hernandez, J. E. Roman, and V. Vidal. SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems. ACM Trans. Math. Software, 31(3):351–362, 2005.
  • Heun (1900) K. Heun. Neue methoden zur approximativen integration der differentialgleichungen einer unabhängigen veränderlichen. Z. Math. Phys, 45:23–38, 1900.
  • Klockner (2010) A. Klockner. High-performance high-order simulation of wave and plasma phenomena. PhD thesis, Brown University, 2010.
  • Klöckner and Wala (2018a) A. Klöckner and M. Wala. Dagrt. https://gitlab.tiker.net/inducer/dagrt, 2018a.
  • Klöckner and Wala (2018b) A. Klöckner and M. Wala. Leap. https://gitlab.tiker.net/inducer/leap, 2018b.
  • Kutta (1901) W. Kutta. Beitrag zur näherungweisen integration totaler differentialgleichungen. 1901.
  • Kværnø (2000) A. Kværnø. Stability of multirate Runge-Kutta schemes. 2000.
  • Lee and Baeder (2002) Y. Lee and J. Baeder. High-order overset method for blade vortex interaction. In 40th AIAA Aerospace Sciences Meeting & Exhibit, page 559, 2002.
  • Lele (1992) S. K. Lele. Compact finite difference schemes with spectral-like resolution. Journal of computational physics, 103(1):16–42, 1992.
  • MacCormack (2014) R. W. MacCormack. Numerical computation of compressible and viscous flow. American Institute of Aeronautics and Astronautics, 2014.
  • Magnus and Yoshihara (1970) R. Magnus and H. Yoshihara. Inviscid transonic flow over airfoils. AIAA Journal, 8(12):2157–2162, 1970.
  • Mattsson et al. (2004) K. Mattsson, M. Svärd, and J. Nordström. Stable and accurate artificial dissipation. Journal of Scientific Computing, 21(1):57–79, 2004.
  • Noack and Slotnick (2005) R. Noack and J. Slotnick. A summary of the 2004 overset symposium on composite grids and solution technology. In 43rd AIAA Aerospace Sciences Meeting and Exhibit, page 921, 2005.
  • Osher and Sanders (1983) S. Osher and R. Sanders. Numerical approximations to nonlinear conservation laws with locally varying time and space grids. Mathematics of computation, 41(164):321–336, 1983.
  • Osusky et al. (2010) M. Osusky, J. Hicken, and D. Zingg. A parallel Newton-Krylov-Schur flow solver for the Navier-Stokes equations using the SBP-SAT approach. In 48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, page 116, 2010.
  • Park et al. (1998) J. Park, K. Kwon, and H. Choi. Numerical solutions of flow past a circular cylinder at Reynolds numbers up to 160. KSME international Journal, 12(6):1200–1205, 1998.
  • Sandu and Constantinescu (2009) A. Sandu and E. M. Constantinescu. Multirate explicit Adams methods for time integration of conservation laws. Journal of Scientific Computing, 38(2):229–249, 2009.
  • Sanz-Serna (1980) J. Sanz-Serna. Some aspects of the boundary locus method. BIT Numerical Mathematics, 20(1):97–101, 1980.
  • Savcenco et al. (2007) V. Savcenco, W. Hundsdorfer, and J. Verwer. A multirate time stepping strategy for stiff ordinary differential equations. BIT Numerical Mathematics, 47(1):137–155, 2007.
  • Seny et al. (2010) B. Seny, J. Lambrechts, R. Comblen, V. Legat, J.-F. Remacle, et al. Multirate time stepping methods for accelerating explicit discontinuous Galerkin computations. In 9th International workshop on Multiscale (Un)-structured mesh numerical Modeling for coastal, shelf, and global ocean dynamics, 2010.
  • Seny et al. (2014) B. Seny, J. Lambrechts, T. Toulorge, V. Legat, and J.-F. Remacle. An efficient parallel implementation of explicit multirate Runge–Kutta schemes for discontinuous Galerkin computations. Journal of Computational Physics, 256:135–160, 2014.
  • Sharan (2016) N. Sharan. Time-stable high-order finite difference methods for overset grids. PhD thesis, University of Illinois at Urbana-Champaign, 2016.
  • Sharan et al. (2018) N. Sharan, C. Pantano, and D. J. Bodony. Time-stable overset grid method for hyperbolic problems using summation-by-parts operators. Journal of Computational Physics, 361:199–230, 2018.
  • Sherer and Visbal (2004) S. Sherer and M. Visbal. Implicit large eddy simulations using a high-order overset grid solver. In 34th AIAA Fluid Dynamics Conference and Exhibit, page 2530, 2004.
  • Sherer et al. (2006) S. Sherer, M. Visbal, and M. Galbraith. Automated preprocessing tools for use with a high-order overset-grid algorithm. In 44th AIAA Aerospace Sciences Meeting and Exhibit, page 1147, 2006.
  • Sherer and Scott (2005) S. E. Sherer and J. N. Scott. High-order compact finite-difference methods on general overset grids. Journal of Computational Physics, 210(2):459–496, 2005.
  • Steger (1991) J. Steger. The Chimera method of flow simulation. In Workshop on applied CFD, Univ of Tennessee Space Institute, volume 188, 1991.
  • Stock (2009) A. Stock. Development and application of a multirate multistep AB method to a discontinuous Galerkin method based particle in cell scheme, 2009.
  • Strand (1994) B. Strand. Summation by parts for finite difference approximations for d/dx. Journal of Computational Physics, 110(1):47–67, 1994.
  • Suhs et al. (2002) N. Suhs, S. Rogers, and W. Dietz. PEGASUS 5: an automated pre-processor for overset-grid CFD. In 32nd AIAA Fluid Dynamics Conference and Exhibit, page 3186, 2002.
  • Svärd and Nordström (2008) M. Svärd and J. Nordström. A stable high-order finite difference scheme for the compressible Navier–Stokes equations: no-slip wall boundary conditions. Journal of Computational Physics, 227(10):4805–4824, 2008.
  • Svärd et al. (2007) M. Svärd, M. H. Carpenter, and J. Nordström. A stable high-order finite difference scheme for the compressible Navier–Stokes equations, far-field boundary conditions. Journal of Computational Physics, 225(1):1020–1038, 2007.
  • Tang and Warnecke (2006) H.-z. Tang and G. Warnecke. High resolution schemes for conservation laws and convection-diffusion equations with varying time and space grids. Journal of computational mathematics, pages 121–140, 2006.
  • Volkov (1968) E. A. Volkov. The method of composite meshes for finite and infinite regions with piecewise smooth boundary. Trudy Matematicheskogo Instituta imeni VA Steklova, 96:117–148, 1968.

Appendix A Software and Reproducibility

Here, we give brief synopses of the software tools used to implement the methods described in this paper.

  • PlasComCM is a Fortran 90 code written to solve the compressible Navier-Stokes equations on overset meshes. PlasComCM is currently being used in the University of Illinois’ NNSA and DOE-funded PSAAPII center, the Center for Exascale Simulation of Plasma-Coupled Combustion (XPACC). For more on XPACC and its work, see https://xpacc.illinois.edu. For more on PlasComCM, see https://bitbucket.org/xpacc/plascomcm.

  • Leap is a Python package used to describe integration methods (including multi-rate integrators) with flexible algorithms via a virtual machine, and is capable of describing both implicit and explicit time steppers in the form of instructions that can then be passed to Dagrt (see below) to generate Fortran or Python code. Our results have been generated using Git revision 9382dd35 at https://github.com/inducer/leap Klöckner and Wala [2018b].

  • Dagrt, a second Python package, is a DAG-based runtime system which can generate Fortran or Python code implementing the integrators described by Leap for a given right-hand-side. In using this tool, and Leap, with a host application, the user needs to describe the data types to be operated on, along with the right-hand-sides that the host application uses, in a short Python driver. Our results have been generated using Git revision 3ccb3479 at https://github.com/inducer/dagrt Klöckner and Wala [2018a].

Appendix B Plotting Approximate Stability Regions

In order to better pose the question of what this maximum stable timestep is, we must first characterize and plot the approximate stability regions of the integrator. These stability regions are plotted in the complex plane, and enclose a space (where are the eigenvalues of the right-hand side being integrated, and is the timestep) within which an integrator is approximated to be stable.

We approximate the stability region in the complex plane for a given integrator by marching the simple scalar ODE

in time, with a timestep of , and with initial condition at . By setting

with , we can approximate an integrator’s stability region by varying for a given value of and marching for 100 timesteps. For a given combination, if before 100 steps are taken the value of exceeds 2, we deem the method unstable, and is decreased. Otherwise, is increased for that value until the method becomes unstable. For all approximate stability regions given, we define the stability boundary using a tolerance on of , and we use 500 equally spaced values in the range . This procedure is essentially an application of the boundary locus method described in Sanz-Serna [1980] for ordinary differential equations, which we find to be a reasonable heuristic for discussion of stability in the context of both single-rate and multi-rate time integration of PDEs using an SBP-SAT discretization.

Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
""
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
   
Add comment
Cancel
Loading ...
202073
This is a comment super asjknd jkasnjk adsnkj
Upvote
Downvote
""
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters
Submit
Cancel

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test
Test description