# CASTRO: A New Compressible Astrophysical Solver. II. Gray Radiation Hydrodynamics

W. Zhang11affiliation: Center for Computational Sciences and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 , L. Howell22affiliation: Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA 94550 , A. Almgren11affiliation: Center for Computational Sciences and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 , A. Burrows33affiliation: Dept. of Astrophysical Sciences, Princeton University, Princeton, NJ 08544 , J. Bell11affiliation: Center for Computational Sciences and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
###### Abstract

We describe the development of a flux-limited gray radiation solver for the compressible astrophysics code, CASTRO. CASTRO uses an Eulerian grid with block-structured adaptive mesh refinement based on a nested hierarchy of logically-rectangular variable-sized grids with simultaneous refinement in both space and time. The gray radiation solver is based on a mixed-frame formulation of radiation hydrodynamics. In our approach, the system is split into two parts, one part that couples the radiation and fluid in a hyperbolic subsystem, and another parabolic part that evolves radiation diffusion and source-sink terms. The hyperbolic subsystem is solved explicitly with a high-order Godunov scheme, whereas the parabolic part is solved implicitly with a first-order backward Euler method.

diffusion – hydrodynamics – methods: numerical – radiative transfer

## 1. Introduction

In this paper, we present the development of a gray radiation solver in our compressible astrophysics code, CASTRO. CASTRO uses an Eulerian grid with block-structured adaptive mesh refinement (AMR). Our approach to AMR is based on a nested hierarchy of logically-rectangular variable-sized grids with simultaneous refinement in both space and time. In our previous paper (Almgren et al., 2010, henceforth Paper I), we describe our treatment of hydrodynamics, including gravity and nuclear reactions. Here, we describe an algorithm for flux-limited gray radiation hydrodynamics based on a mixed-frame formulation.

Many astrophysical phenomena involve radiative processes, which often dominate the energy transport and dynamical behavior of the system. Some examples include star formation, stellar structure and evolution, accretion onto compact objects, and supernovae. Radiation hydrodynamics simulations are playing an increasingly important role in modeling these astrophysical systems.

The fundamental equation of radiation transfer is a six-dimensional integro-differential equation (Pomraning, 1973), which is unfortunately very difficult to solve. Numerical codes typically solve one or two angular moment equations of the transfer equation. One common approach is to solve a two-moment system including radiation energy density and radiation flux (e.g., Hayes & Norman, 2003; Hubeny & Burrows, 2007; González et al., 2007; Sekora & Stone, 2010). The system is closed by an approximate expression for the radiation pressure. Another popular approach is to solve the radiation energy equation only (e.g., Turner & Stone, 2001; Hayes et al., 2006; Krumholz et al., 2007; Swesty & Myra, 2009; Commerçon et al., 2011; van der Holst et al., 2011). In this approach, the so-called flux-limited diffusion (FLD) approximation is used for closure (Alme & Wilson, 1973). The two-moment approach is more accurate when the radiation is highly anisotropic and optically-thin, but it is computationally more expensive than the FLD approach. However, FLD is a very good approximation for optically-thick flows. Furthermore, the FLD approach can be numerically more robust than the two-moment approach for systems in the hyperbolic limit. We have adopted the FLD approach in the radiation solver of CASTRO.

CASTRO solves the equations of nonrelativistic radiation hydrodynamics. Thus, gas quantities, such as pressure, temperature and density, are treated as frame-independent because the corrections are of order , where is the gas velocity and is the speed of light. However, radiation quantities, such as radiation energy density, radiation flux, and radiation pressure, in the comoving frame differ from those in the laboratory frame by order (Mihalas & Mihalas, 1999). Neglecting the terms potentially leads to erroneous results, especially in the dynamic diffusion limit where transport of radiation is dominated by motion of the fluid (Castor, 1972; Mihalas & Klein, 1982; Castor, 2004). For numerical codes, some authors chose the comoving frame approach in which the radiation quantities are measured in the comoving frame (e.g., Turner & Stone, 2001; Hayes & Norman, 2003; González et al., 2007; Swesty & Myra, 2009; Commerçon et al., 2011), whereas others chose the mixed-frame approach in which the radiation quantities are computed in the laboratory frame while the opacities are measured in the comoving frame (e.g., Mihalas & Klein, 1982; Hubeny & Burrows, 2007; Krumholz et al., 2007; Sekora & Stone, 2010). A primary weakness of the comoving frame approach is that it does not conserve energy, whereas the mixed-frame approach is not suitable for systems in which line transport is important. In CASTRO, we have chosen the mixed-frame approach because it conserves the total energy and is well suited for AMR.

This paper is organized as follows. In § 2 we present the governing equations of the mixed-frame gray radiation hydrodynamics and the mathematical characteristics of the system. In § 3 we describe the single-level integration scheme. In § 4 we describe how the integration algorithm is extended for AMR. In § 5 we show the scaling behavior of CASTRO with radiation. In § 6 we present results from a series of test problems. Finally, we summarize the results of the paper in § 7.

### 2.1. Equations of Gray Radiation Hydrodynamics

Assuming local thermodynamic equilibrium, the mixed-frame frequency-integrated radiation hydrodynamics equations, correct to , can be written as (see e.g., Mihalas & Klein, 1982; Lowrie et al., 1999):

 ∂ρ∂t+∇⋅(ρ\boldmathu)= 0, (1) ∂(ρ\boldmathu)∂t+∇⋅(ρ\boldmathu\boldmathu)+∇p= 1cχF\boldmathF(0)r −κP(\boldmathuc)(aT4−E(0)r), (2) ∂(ρE)∂t+∇⋅(ρE% \boldmathu+p\boldmathu)= −cκP(aT4−E(0)r) +χF(\boldmathuc)⋅% \boldmathF(0)r, (3) ∂Er∂t+∇⋅\boldmathFr= cκP(aT4−E(0)r) −χF(\boldmathuc)⋅% \boldmathF(0)r, (4) 1c2∂\boldmathFr∂t+∇⋅Pr= −1cχF\boldmathF(0)r +κP(\boldmathuc)(aT4−E0r). (5)

Here , , , , and are the mass density, velocity, pressure, temperature, and total energy per unit mass (internal energy, , plus kinetic energy, ), respectively. , , and are radiation energy density, radiation flux, and radiation pressure tensor, respectively. Note that here the subscript denotes radiation. The speed of light and radiation constant are denoted by and , respectively, where and is the Stefan-Boltzmann constant. and are the Planck mean and flux mean interaction coefficients, both in units of inverse length. The superscript denotes the comoving frame. Radiation quantities (, and ) without the superscript are measured in the lab frame. Radiation quantities measured in the comoving and lab frames are related by the Lorentz transformation (Mihalas & Klein, 1982). It should be noted that absorption and scattering coefficients are always computed in the comoving frame in the mixed-frame approach. Also note that scattering can be included in the flux mean interaction coefficient. The whole system is closed by an equation of state for the fluid and a relation between and ,

 P(0)r=f(0)E(0)r, (6)

where is the Eddington tensor in the comoving frame.

In the FLD approximation (Alme & Wilson, 1973), the comoving radiation flux is written in the form of Fick’s law of diffusion,

 \boldmathF(0)r=−D∇E(0)r, (7)

where the diffusion coefficient is given by

 D=cλχR, (8)

where is the Rosseland mean of the sum of the absorption and scattering coefficients, and is the flux limiter. We adopt the flux limiter approximation given in Levermore & Pomraning (1981) as

 λ= 2+R6+3R+R2, (9) R= |∇E(0)r|χRE(0)r. (10)

The corresponding radiation pressure tensor is (Levermore, 1984)

 P(0)r=12[(1−f)I+(3f−1)^\boldmathn% ^\boldmathn]E(0)r, (11)

where is the identity tensor of rank 2, , and the Eddington factor is given by

 f=λ+λ2R2. (12)

We note that in the optically-thick limit both the flux limiter and the Eddington factor approach , whereas in the optically-thin limit the flux limiter and the Eddington factor approach and , respectively.

Furthermore we assume that , a common approximation accurate in optically-thick regions (Mihalas & Mihalas, 1999). Following Krumholz et al. (2007), we keep terms up to , and we drop all terms that are insignificant in all following regimes: streaming, static diffusion, and dynamic diffusion limits. Our radiation hydrodynamics equations now become,

 ∂ρ∂t+∇⋅(ρ\boldmathu)= 0, (13) ∂(ρ\boldmathu)∂t+∇⋅(ρ\boldmathu\boldmathu)+∇p+λ∇Er= 0, (14) ∂(ρE)∂t+∇⋅(ρE% \boldmathu+p\boldmathu)+λ\boldmathu⋅∇Er= (15) −cκP(aT4 −E(0)r), ∂Er∂t+∇⋅(3−f2Er\boldmathu)−λ\boldmathu⋅∇Er= (16) cκP(aT4−E(0)r)+∇⋅ (cλχR∇Er).

The absorption terms on the right hand side of these equations still include the radiation energy density in the comoving frame, because this is the the frame in which emission and absorption balance as the material becomes opaque. The comoving and lab frame quantities are related by

 E(0)r= Er−2c2\boldmathu⋅\boldmathF(0)r+O(v2/c2) (17) = Er+2λχR\boldmathu% c⋅∇Er+O(v2/c2) (18)

in the framework of the FLD approximation.

### 2.2. Mathematical Characteristics of the Hyperbolic Subsystem

Radiation hydrodynamics under the assumption of FLD is a mixed hyperbolic-parabolic system with stiff source terms. The equations of the hyperbolic subsystem are

 ∂ρ∂t+∇⋅(ρ\boldmathu)= 0, (19) ∂(ρ\boldmathu)∂t+∇⋅(ρ\boldmathu\boldmathu)+∇p+λ∇Er= 0, (20) ∂(ρE)∂t+∇⋅(ρE% \boldmathu+p\boldmathu)+λ\boldmathu⋅∇Er= 0, (21) ∂Er∂t+∇⋅(3−f2Er\boldmathu)−λ\boldmathu⋅∇Er= 0, (22)

which are obtained by neglecting the terms on the right-hand-side of Eqs. 1316. In the limit of strong equilibrium (i.e., and ), these right-hand-side terms are negligible and the full system becomes hyperbolic, governed by Eqs. 1922. In the more general case the hyperbolic subsystem will be solved first as part of a time-split discretization.

In CASTRO, we solve the hyperbolic subsystem with a Godunov method, which utilizes a characteristic-based Riemann solver. The Godunov method requires that we analyze the mathematical characteristics of the hyperbolic subsystem. For simplicity, let us consider the system in one dimension, which can be written in terms of primitive variables as,

 ∂Q∂t+A∂Q∂x=0, (23)

where the primitive variables are

 Q=⎛⎜ ⎜ ⎜⎝ρupEr⎞⎟ ⎟ ⎟⎠, (24)

and the Jacobian matrix is

 A=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝uρ000u1ρλρ0γpu003−f2Er0(3−f2−λ)u⎞⎟ ⎟ ⎟ ⎟ ⎟⎠. (25)

The system is hyperbolic because the Jacobian matrix is diagonalizable with four real eigenvalues. In general cases, the eigenvalues and eigenvectors are unfortunately very complicated. However, when the following relation holds,

 3−f2=λ+1, (26)

the four eigenvalues are,

 u−cs, u, u, u+cs, (27)

where

 cs=√γpρ+(λ+1)λErρ (28)

is the radiation modified sound speed. The corresponding right eigenvectors are,

 ⎛⎜ ⎜ ⎜ ⎜⎝1−cs/ργp/ρ(λ+1)Er/ρ⎞⎟ ⎟ ⎟ ⎟⎠, ⎛⎜ ⎜ ⎜⎝00−λ1⎞⎟ ⎟ ⎟⎠, ⎛⎜ ⎜ ⎜⎝1000⎞⎟ ⎟ ⎟⎠, ⎛⎜ ⎜ ⎜ ⎜⎝1cs/ργp/ρ(λ+1)Er/ρ⎞⎟ ⎟ ⎟ ⎟⎠, (29)

and the corresponding left eigenvectors are,

 (0,−ρ/2cs,1/2cs2,λ/2cs2),(0,0,−(λ+1)Er/ρcs2,γp/ρcs2),(1,0,−1/cs2,−λ/cs2),(0,ρ/2cs,1/2cs2,λ/2cs2). (30)

These four eigenvectors define the characteristic fields for the one-dimensional system. By computing the product of the right eigenvectors and the gradients of their corresponding eigenvalues (LeVeque, 2002), we find that the first and fourth fields are genuinely nonlinear corresponding to either a shock wave or a rarefaction wave. The second and third fields are linearly degenerate corresponding to a contact discontinuity in either gas pressure or density. Note that there is also a jump in radiation energy density accompanying the jump in gas pressure such that the total pressure, , is constant across the contact discontinuity. Obviously, in three dimensions, there are two additional linear discontinuities for transverse velocities just like the case of pure hydrodynamics.

It should be noted that Eq. 26 is satisfied in both optically-thick and thin limits. Although this condition is not always satisfied, it is a fairly good approximation. The Levermore & Pomraning (1981) flux limiter that we use (Eqs. 9 & 10) satisfies

 0.978<3−f2−λ≤1.05. (31)

Thus, for the purpose of an approximate Riemann solver, the approximate eigenvalues and eigenvectors are used.

### 2.3. Radiation Diffusion and Source-Sink Terms

The parabolic part of the system consists of the radiation diffusion and source-sink terms, which were omitted from the discussion of the hyperbolic subsystem (Eqs. 1922).

 ∂(ρe)∂t= −cκP(aT4−E(0)r), (32) = (33) ∂Er∂t= cκP(aT4−E(0)r)+∇⋅(cλχR∇Er) (34) = (35)

where is the specific internal energy. The term represents the energy exchange in the comoving frame between the material and radiation through absorption and emission of radiation. The term is due to the Lorentz transformation of radiation energy density. We do not directly solve Eqs. 32 & 34 because of our mixed-frame approach. Instead, we solve Eqs. 33 & 35. An implicit treatment is usually necessary in order to solve these equations because of their stiffness. However, the Lorentz transformation term can be treated explicitly in many situations because it is of similar order to the term in the hyperbolic subsystem (Eqs. 21 & 22), unless the Planck mean is much larger than the Rosseland mean. Without scattering, the Planck mean is usually larger than the Rosseland mean because the latter gives more weight to the lower opacity part of the radiation spectrum. However, in many astrophysical phenomena (e.g., shock breakout in core-collapse supernovae), electron scattering, which contributes to the Rosseland mean only, is the dominant source of opacity, and therefore the Lorentz transformation term can be neglected in those cases.

## 3. Single-level Integration Algorithm

For each step at a single level of refinement, the state is first evolved using an explicit Godunov method for the hyperbolic subsystem (§ 3.1). Then an implicit update for radiation diffusion and source-sink terms is performed (§ 3.2).

It is customary in time-split schemes to denote intermediate quantities with a fractional time index such as , so that, for example, the explicit hyperbolic update would advance radiation energy density from to and the implicit update would then advance it from to . We are not using this notational convention here mainly to avoid confusion in the following section with time-centered quantities constructed at the actual intermediate time . In § 3.2, where we write out the implicit update in detail, we will refer to the post-hyperbolic intermediate quantities as , , etc.

### 3.1. Explicit Solver for Hyperbolic subsystem

The hyperbolic subsystem is treated explicitly. This explicit part of our numerical integration algorithm for radiation hydrodynamics is very similar to the hydrodynamics algorithm presented in Paper I of this series. We refer the reader to Paper I for detailed description of the integration scheme, which supports a general equation of state, self-gravity, and nuclear reactions. Here we will only present the parts specific to radiation hydrodynamics.

The advection part of the time evolution can be written in the form

 ∂\boldmathU∂t=−∇⋅\boldmathF, (36)

where with the superscript denoting the transpose operation are the conserved variables, and is their flux. The conserved variables are defined at cell centers. We predict the primitive variables, including , , , , , from cell centers at time to edges at time and use an approximate Riemann solver to construct fluxes, , on cell faces. This algorithm is formally second-order in both space and time. The time step is computed using the standard CFL condition for explicit methods, with additional constraints if additional physics (such as burning) is included. The sound speed used in the computation is now the radiation modified sound speed (Eq. 28).

#### 3.1.1 Construction of Fluxes

CASTRO solves the hyperbolic subsystem of radiation hydrodynamics with an unsplit piecewise parabolic method (PPM) with characteristic tracing and full corner coupling (Miller & Colella, 2002). The four major steps in the construction of the face-centered fluxes, , in the case of hydrodynamics have been described in details in Paper I. The extension to radiation hydrodynamics is straightforward given the characteristic analysis presented in § 2.2. Thus, we will not repeat the details here. We will also omit the procedures for passively advected quantities, auxiliary variables, gravity, and reaction, because they do not change.

First, we compute the primitive variables defined as . Note that several variables are redundant for various reasons (i.e., efficiency and safety), which will be explained later.

Second, we reconstruct parabolic profiles of the primitive variables within each cell. The total pressure, , rather than the gas pressure, , is used in computing a flattening coefficient (Miller & Colella, 2002).

In the third step, we perform characteristic extrapolations of the primitive variables and obtain the edge values of at using the eigenvectors of the system (Eq. 29 & 30) and the parabolic profiles of the primitive variables. Flattening is applied in this procedure.

Finally, the fluxes are computed for the edge values obtained in the last step using an approximate Riemann solver, which is based on the Riemann solver of Bell et al. (1989) and Colella et al. (1997). The computational procedure is essentially the same as that in Paper I except that the gas internal energy density, , and gas pressure, are now replaced by the total internal energy density, , and total pressure, . The Riemann solver computes the Godunov state at the interface, which is then used to compute the fluxes, . Recall that there are redundant variables in the primitive variables, , for efficiency and safety. With the total internal energy density, , a call to the equation of state can be avoided. Negative radiation energy density and gas pressure can also be avoided in the Godunov state. The term, , in Eqs. 21 & 22 is computed as follows,

 (λ \boldmathu⋅∇Er)i,j,k=λi,j,k (37) ×[ (ux,i−1/2,j,k+ux,i+1/2,j,k2)(Er,i+1/2,j,k−Er,i−1/2,j,kΔx) + (uy,i,j−1/2,k+uy,i,j+1/2,k2)(Er,i,j+1/2,k−Er,i,j−1/2,kΔy) + (uz,i,j,k−1/2+uz,i,j,k+1/22)(Er,i,j,k+1/2−Er,i,j,k−1/2Δz)],

where the variables with half-integer index are the Godunov states. The term, , in Eq. 20 is computed in a similar way.

Depending upon a switch set by the user, the Lorentz transformation term, , may be included in the explicit update.

### 3.2. Implicit Solver for Radiation Diffusion and Source-Sink Terms

The implicit solver evolves the radiation and gas according to Eqs. 33 & 35. The algorithm uses a first-order backward Euler discretization. We note that the Lorentz term, , may or may not be included in the implicit solver. The advantage of including this term here is that it effectively balances emission against absorption in the comoving frame as the material becomes optically-thick and this coupling becomes stiff. Implicit treatment also helps avoid a time step restriction when is significantly greater than 1. The disadvantage is that it makes the resulting linear systems nonsymmetric, but this not a major concern in practice.

The radiation update algorithm is based on that of Howell & Greenough (2003). The update from the post-hyperbolic state to time for Eqs. 33 & 35 has the form

 ρen+1−ρe−Δt=− cκn+1P[a(Tn+1)4−En+1r] (38) + qn+1\boldmathu⋅∇En+1r, En+1r−E−rΔt=+ cκn+1P[a(Tn+1)4−En+1r] (39) − qn+1\boldmathu⋅∇En+1r+∇⋅(dn+1∇En+1r),

where and . Here, we use the superscript to denote the state following the explicit update, and the superscript for the state at . Since the velocity does not change in the implicit radiation update, we have dropped the superscript for . We solve Eqs. 38 & 39 iteratively via Newton’s method. We define

 Fe=ρen+1−ρe−−Δt {−cκn+1P[a(Tn+1)4−En+1r] +qn+1\boldmathu⋅∇En+1r}, (40) Fr=En+1r−E−r−Δt {cκn+1P[a(Tn+1)4−En+1r] −qn+1\boldmathu⋅∇En+1r +∇⋅(dn+1∇En+1r)}. (41)

Here, we have dropped the superscript for because the implicit update does not change the mass density. The desired solution is for and to both be zero, and the Newton update to approach this state is the solution to the linear system

 [(∂Fe/∂T)(k)(∂Fe/∂Er)(k)(∂Fr/∂T)(k)(∂Fr/∂Er)(k)][δT(k+1)δE(k+1)r]=[−F(k)e−F(k)r]. (42)

Here and , where the superscript denotes the stage of the Newton iteration. To reduce clutter we drop the superscript without loss of clarity, so .

To solve this system we eliminate the dependency on by forming the Schur complement, leaving a modified diffusion equation for the radiation update . For simplicity we drop the temperature derivatives of and , keeping only the temperature dependence of the emission and absorption terms. This does not affect the converged solution and in practice does not appear to significantly degrade the convergence rate.

After some mathematical manipulation we obtain the following diffusion equation, which must be solved for each Newton iteration:

 [(1−η)cκP+1Δt]E(k+1)r− ∇⋅(d∇E(k+1)r) (43) + (1−η)q\boldmathu⋅∇E(k+1)r =(1−η)cκPa(T(k))4+1Δt [E−r−η(ρe(k)−ρe−)],

where

 (44)

and is the specific heat capacity of the matter.

The iterations are stopped when the maximum of on the computational domain falls below a preset tolerance (e.g., ). Note that Eq. 43 is rewritten to be in terms of rather than . This is done for computational efficiency. After one or two Newton iterations, the solution at the previous iteration, , is getting very close to the final converged solution . Since we use as the starting point for the call to the iterative linear solver these calls get cheaper for each additional Newton iteration. If we solved for instead, we would have to change the linear solver tolerance to avoid an unnecessarily accurate and expensive solve for what may be a very small correction.

Each time we solve the diffusion equation for a new iterate , we update the gas internal energy density as follows,

 ρe(k+1) =ηρe(k)+(1−η)ρe− (45) −Δt(1−η) [cκP(a(T(k))4−E(k+1)r)−q\boldmathu⋅∇E(k+1)r].

The new temperature then derives from and via a call to the equation of state. Other quantities may or may not be updated: the coefficients , , , , and are also temperature-dependent and could have been written with a superscript. The limiter can be recomputed based on the new . It is a tradeoff between efficiency and accuracy whether to recompute some or all of these at every iteration. Our default choice is to recompute all of these at each iteration for better accuracy. However, updating coefficients can make the implicit update iteration less stable if the coefficients are not smooth functions of their inputs, in addition to the extra computational costs. But since each stage of the Newton iteration (Eq. 43 combined with Eq. 45) is itself a conservative solution, the implicit algorithm will conserve total energy to the tolerance of the linear solver (which should not be confused with the tolerance of the Newton iterations) regardless of the number of iterations taken or which coefficients are updated.

In CASTRO, the linear system Eq. 43 is solved using the hypre library (Falgout & Yang, 2002; hypre Code Project, 2011). We have developed drivers that work with systems in the canonical form

 AEn+1r− ∑i∂∂xi(Bi∂En+1r∂xi)+∑i∂∂xi(CiEn+1r) + ∑iDi∂En+1r∂xi=rhs, (46)

where are cell-centered coefficients and , , and are centered at cell faces. For Eq. 43 the coefficients are not used. There is some subtlety in the appropriate averaging of coefficients from cells to faces; see Howell & Greenough (2003) for further discussion. The same canonical form works for Cartesian, cylindrical, and spherical coordinates so long as appropriate metric factors are included in the coefficients and the rhs.

With hypre we have a choice between two parallel multigrid solvers: Schaffer multigrid (SMG) and PFMG. SMG is more robust for difficult problems with strongly-varying coefficients, but PFMG is typically more efficient and scalable. These solvers work for systems at a uniform grid resolution (that is, systems associated with a single level of adaptive mesh refinement). For systems coupling together more than one refinement level we could use the hypre algebraic multigrid (AMG) solver, or an FAC-type scheme (McCormick, 1989) using structured solvers on the separate levels. In earlier versions of the AMR algorithm we required multilevel solvers for conservative coupling between refinement levels. We have now developed a scheme, though, that eliminates the need for a multilevel linear solver while still conserving total energy. We discuss this in detail in the following sections. Note also that use of the coefficients deriving from the Lorentz term make the diffusion equation nonsymmetric. The multigrid solvers mentioned above are designed for use with symmetric systems, but good convergence behavior can still be obtained by using these solvers as preconditioners for a Krylov method such as GMRES.

## 4. Amr

CASTRO uses a nested hierarchy of logically-rectangular, variable-sized grids with simultaneous refinement in both space and time, as illustrated in Figures 1 and 2. One major design objective of the AMR algorithm is to preserve the conservation properties of the uniform-grid discretization. AMR for hyperbolic equations in CASTRO was described in detail in Paper I.

The explicit update step for radiation hydrodynamics (§ 3.1) follows the same pattern as other hyperbolic equations and so does not increase the complexity of the AMR algorithm. We note that the hyperbolic subsystem (Eqs. 1922) is only partially in conservation law form. It will not conserve total momentum because it does not include an equation analogous to Eq. 5 tracking the radiation momentum. It will conserve total energy, though, so long as the divergence terms are differenced in a conservative manner. These divergence terms therefore require AMR reflux operations, as described in Berger & Colella (1989). The term appears in both the gas energy and radiation energy equations with opposite signs, so any consistent discretization of these terms will conserve total energy.

The AMR version of the implicit radiation diffusion update is based on Howell & Greenough (2003), but with several important differences: The present algorithm is fully-implicit, not time-centered. The optional multilevel linear solve at the beginning of each coarse time step is no longer included—this feature was introduced to improve accuracy but we now consider it unnecessary in most cases. Finally, the multilevel linear solve for flux synchronization between coarse and fine levels is replaced by a new algorithm we call the “deferred sync.” These changes entirely eliminate the need to compute linear system solutions coupling different levels of the AMR hierarchy, while not compromising conservation of total energy. Performance is significantly improved because multilevel linear solvers tend to be more complex and expensive than those for single-level systems.

### 4.1. AMR Time Step Outline with Deferred Sync

The AMR time step is defined recursively in terms of operations on a level and its interactions with coarser and finer levels. We consider advancing level  from time index to , corresponding to time values and , respectively. (Even though levels other than have executed different numbers of time steps, we will use the superscript to refer to values at time on all levels involved in the calculation.) The region covered by level  is denoted , its border is , and the border of the next finer level, projected onto level , is .

The notation to describe all of this is unavoidably complex due to the quantities at different times, levels, and stages of the update process. In the following outline, we specify the update for the level and its synchronization with finer levels. We include the hyperbolic update and the refluxing step associated with it in order to show the proper sequence of operations and to contrast the explicit reflux with the implicit deferred sync. As in § 3.1 the hyperbolic conserved state vector is denoted by , but we denote the hyperbolic flux by to distinguish it from the radiation flux. For the radiation flux we are concerned here only with the diffusion term in the implicit update, and we denote the associated flux by to distinguish it from the complete radiation flux introduced in § 2.1.

Note that while the flux divergence is needed everywhere, the fluxes and themselves are stored only on the borders between levels. Our code has data structures called flux registers designed for this purpose. The notation indicates an average of level data in space over the fine cells (or cell faces) making up each corresponding coarse cell (or face), while denotes an average of level data in both space and time over the coarse (level ) time step. Thus, at the only point in the algorithm where this notation is used,

 ⟨⟨\boldmathFℓ+1⟩⟩=1rℓ+1rℓ+1(n+1)−1∑m=rℓ+1n⟨\boldmathFℓ+1,m+1⟩, (47)

where is the number of level  time steps making up the level  time step from to , and is the level flux during the level time step .

The expression involving that appears as a deferred source term in the diffusion equation and explicitly for the hyperbolic reflux is called the reflux divergence, because it takes the form of the divergence of a flux difference stored in the flux registers. These terms are evaluated only in the coarse cells bordering an interface with a finer level. They do not affect fine cells at the interface because flux calculations for those cells are already the more accurate ones; the terms represent the corrected fluxes from these fine cells being imposed onto the coarse grid.

Another way to understand the terms is to consider and to be energy that has been “misplaced” at the coarse-fine interfaces during the level time step, due to the differing flux calculations on the different levels. If the solution were not corrected, this energy would be lost and the system would not be conservative. Instead, the terms re-introduce the missing energy into the system. For the explicit hyperbolic flux this is done explicitly; for the radiation flux it contributes to the right hand side for the implicit update, so as not to impose a new stability constraint on the size of the time step.

The hyperbolic state vector includes the radiation and fluid energies updated in the implicit update step, but there is no ambiguity because the operations and their associated fluxes are completely distinct. The refluxing update to is written as an update, with appearing on both sides of the equation, because otherwise we would need additional notation to indicate the pre- and post-reflux states. Averaging down from fine to coarse levels is also written as an update. The meaning of the rest of the pseudocode below should be reasonably clear in context:

 If (ℓ<ℓmax) and (regrid requested from base level ℓ) then For ℓ′∈{ℓmax−1,…,ℓ} do ∙ Determine new grid layout for level ℓ′+1. ∙ Interpolate data to new grids from level ℓ′. ∙ Copy data on intersection with old level ℓ′+1. Enddo Endif Level Time Step, level ℓ: Explicit Hyperbolic Update for \boldmathUℓ,n+1 (see Paper I) ∙ Set \boldmathFℓ,n+1H for hyperbolic fluxes on (P(∂Λℓ+1), ℓ<ℓmax) and on (∂Λℓ, ℓ>0) Implicit Diffusion Update ⋆ En+1r−E−rΔtn+1= +cκn+1P[a(Tn+1)4−En+1r] −qn+1\boldmathu⋅∇En+1r +∇⋅(dn+1∇En+1r) −(Δtn/Δtn+1)∇Reflux⋅(δ\boldmathFℓ+1,nR) on Λℓ ⋆ ρen+1−ρe−Δtn+1=−cκn+1P[a(Tn+1)4−En+1r] +qn+1\boldmathu⋅∇En+1r on Λℓ End Implicit Diffusion Update ∙ \boldmathFℓ,n+1R=−dn+1∇En+1r on (P(∂Λℓ+1), ℓ<ℓmax) and on (∂Λℓ, ℓ>0) ∙ Advance levels ℓ+1,…,ℓmax recursively. ∙ on P(∂Λℓ+1), ℓ<ℓmax ∙ on P(∂Λℓ+1), ℓ<ℓmax End Level Time Step If (ℓ<ℓmax) then (synchronization/refluxing between levels ℓ and ℓ+1) ∙ \boldmathUℓ,n+1:=\boldmathUℓ,n+1−(Δtn+1)∇Reflux⋅(δ\boldmathFℓ+1,n+1H) on Λℓ−P(Λℓ+1) ∙ \boldmathUℓ,n+1:=⟨\boldmathUℓ+1,n+1⟩ on P(Λℓ+1) Endif (end synchronization/refluxing).

The advantages of the deferred sync algorithm are that it eliminates the need for a separate linear solve for synchronization and eliminates the need for solving multilevel linear systems entirely. It does, however, add complications of its own to the AMR implementation. One is that there is no longer any point within the time step cycle when the field variables are fully synchronized. At the end of a coarse time step all levels have reached the same point in time, but if we want to actually compute the energy budget and confirm conservation we have to include the contributions from deferred fluxes stored in flux registers. These will not be re-introduced into the state until the next time step. The end of a coarse time step is also the natural time for checkpoint/restart operations, so to reproduce the saved state of the system we now have to include the deferred fluxes in checkpoints as well.

Regridding becomes an issue as well. The adaptive algorithm periodically re-evaluates the refinement criteria for each level and may change the layout of refined grids. For grids at level the criteria are evaluated at level , and this happens between level time steps. The interpolation operations between coarse and fine field variables are conservative. With the deferred sync, though, there will also be fluxes stored around the edges of level at the time that level may be changed. There is no straightforward way to transform these stored fluxes so that they coincide with the new mesh layout. Instead, what we have is an old set of flux registers that may now overlap with level as well as with level , and if the grid layout changes enough may even overlap with other levels both finer and coarser. The deferred sync idea still applies, but the implementation becomes more complicated than the pseudocode above suggests. Portions of the stored flux at the interface between levels and may be reintroduced into the level advance or the level advance, and so on, not just into level .

## 5. Parallel Performance

CASTRO is implemented within the BoxLib framework for parallel structured-grid AMR applications (Paper I and references therein). In BoxLib, parallelization is based upon either hybrid OpenMP-MPI or pure MPI. Because the hypre library does not fully support OpenMP yet, we use the pure MPI approach for the radiation hydrodynamics solver. For more information on software design and parallelization, we refer the reader to Paper I. Here we show the scaling behavior of the radiation hydrodynamics solver in CASTRO.

A weak scaling study has been carried out on Hopper111The Hopper supercomputer was named after Grace Hopper, a pioneer in computer science and the developer of the first compiler., a petascale Cray XE6 supercomputer at the National Energy Research Scientific Computing Center. We have performed a series of three-dimensional simulations with 1, 2 and 3 total levels on various numbers of cores. For the convenience of comparison, each run has one grid of cells at each level on each core. A refinement factor of 2 is used in the multi-level simulations. Thus, the level 1 and 2 grids occupy 12.5% and 1.5625% of the whole volume, respectively. A point explosion like the one in § 6.9 is replicated on each core. The fine grids are placed at the center of the local domain of each core. Figure 3 shows that CASTRO has very good scaling behavior up to 32768 cores. For the single-level simulations, the average wall clock time per coarse time step increases by only 17% from 8 cores to 32768 cores. Because of subcycling in time for simulations with multiple levels, one coarse time step consists of one step on the coarse level and two steps on the next fine level, and the two fine steps might also consist of even finer steps, recursively. Thus, the number of cells that are advanced in one coarse time step increases by a factor of three or seven, for the two- and three-level simulations, respectively (Fig. 2). Our results also show that the overhead introduced by AMR is modest. For example, on 4096 cores, the average wall clock times per coarse time step for the two- and three-level simulations are 3.6 and 8.6 times more than that of the single-level simulation. This corresponds to an overhead of 19% and 22%, respectively. On 32768 cores, the average wall clock times per coarse time step for the two- and three-level simulations are 3.8 and 9.5 times more than that of the single-level simulation. This corresponds to an overhead of 25% and 36%, respectively. It should be noted that single-level simulations with equivalent uniform grids would cost and times more time than the corresponding two- and three-level simulations we have run even if the single-level simulations are assumed to scale perfectly. We also note that in this series of simulations about half of the time is spent on the implicit evolution of the parabolic part of the system.

## 6. Test Problems

A CFL number of 0.8 is used for these tests unless stated otherwise or a fixed time step is used. The refinement factor is 2 for all AMR runs. The relative tolerance for the Newton iterations in the implicit update is for all runs. The Lorentz transformation term is handled explicitly except in the second radiative blast wave test (§ 6.10).

### 6.1. Approach to Thermal Equilibrium

This test introduced by Turner & Stone (2001) has often been used to test the ability of a code to handle the source-sink term, . The test consists of a static uniform field of gas and radiation. The gas has a density of , a Planck mean absorption coefficient of , a mean molecular weight of , and an adiabatic index of . The initial radiation energy density is corresponding to a temperature of . Two cases with an initial internal energy density of and , respectively, have been studied. The gas temperatures are and for the two cases, respectively. The time step is chosen as . The evolution of the system will bring the gas and radiation into a thermal equilibrium. The radiation energy density will hardly change because the energy exchange between the gas and radiation is only a small fraction of the radiation energy. Therefore an analytic solution can be calculated by solving the following ordinary differential equation

 d(ρe)dt=−cκP(aT4−Er), (48)

where is assumed to be constant. The results of this test of the approach to thermal equilibrium are shown in Figure 4. The agreement with the analytic solution is good, especially in the first case. For the second case, the cooling in the numerical calculation is initially slower than that in the analytic solution for the first few steps in agreement with the results of Turner & Stone (2001). The slow cooling is a result of the backward Euler method and a relatively large time step at the beginning of the decay of the gas temperature.

### 6.2. Nonequilibrium Marshak Wave

In this test, we simulate the nonequilibrium Marshak wave problem in one dimension. Initially half of the space, , consists of a static uniform zero-temperature gas and no radiation. A constant radiation flux is incident on the surface at . The gas is not allowed to move in this idealized test. Thus the gas and radiation are coupled only through the source-sink term and the system is governed by Eqs. 33 & 35, with . Su & Olson (1996) obtained analytic solutions for the problem under special assumptions. The matter is assumed to be gray with , and its volumetric heat capacity at constant volume is assumed to be , where is the gas temperature and is a parameter.

We have run this test with and no flux limiter (i.e., ). Figure 5 shows the numerical results of the dimensionless radiation energy density and gas energy density defined by Pomraning (1979)

 x≡ √3κz, (49) τ≡ (4acκα)t, (50) u(x,τ)≡ (c4)(Er(z,t)Finc), (51) v(x,τ)≡ (c4)(aT4(z,t)Finc). (52)

The computational domain of is covered by 128 uniform cells. The dimensionless time step is chosen to be . The numerical results are in good agreement with the analytic results of Su & Olson (1996).

### 6.3. Thermal Wave

In this test, we simulate a thermal wave (Zel’Dovich & Raizer, 1967) in three dimensions and use this test to assess the accuracy of the AMR algorithms for radiation diffusion. Suppose that a large amount of energy is deposited into a small volume as the internal energy of matter. The matter then starts to radiate and transfer most of its energy to radiation. We assume that the Planck mean absorption coefficient is large enough so that the matter and the radiation are in thermal equilibrium. The heat is transported out of the initial hot spot because of the nonlinear radiation heat conduction. As a result, a thermal wave develops. As the matter cools down, the matter gains most of the energy again. Initially the thermal wave speed is much higher than the sound speed. Assuming that there is no fluid motion and the matter contains most of the energy, there is an analytic solution for this problem (Zel’Dovich & Raizer, 1967).

This test is adapted from Howell & Greenough (2003). The computational region in this test is three-dimensional with a domain of in each dimension. Initially, the spherical hot spot has an energy of within a radius of , whereas the ambient medium has a very low temperature of . The volumetric heat capacity is . The Planck mean absorption coefficient is , whereas the Rosseland mean is . In this test, the hydrodynamics is turned off, and there is no flux limiter (i.e., ). We have performed three simulations, a non-AMR run with uniform cells, an AMR run using the deferred sync algorithm (§ 4.1), and an AMR run using the multilevel algorithm of Howell & Greenough (2003). The two AMR runs use 2 total levels with an effective resolution of cells on the finer level. A cell is tagged for refinement if its temperature satisfies both and , where is the size of the cell. For the non-AMR run, we use a time step for step , whereas for the two AMR runs, we use for step on the coarser level. We have chosen the time steps for the following reasons. First, as it expands, the thermal wave is rapidly decelerated. Thus, a fixed time step would not be optimal. Instead, we let the time step grow over time. Second, the numerical error depends on both time step and cell size . In this test, we want to assess the accuracy of the AMR runs in comparison with a non-AMR run. Therefore, we make the time step on the finer level of the AMR runs to be roughly the same as that of the non-AMR run. Figure 6 shows a 2D slice at of the deferred sync run at for temperature and the difference between the numerical and analytic results. It is shown that the largest errors () occur near the thermal wave front where the slope of the temperature profile is extremely steep. At the center, the absolute error is only , whereas the relative error is 0.7%. It takes 451 coarse time steps for the AMR runs to reach . The finer grids in the AMR runs occupy 0.20, 2.3, and 32% of the volume at steps 1, 226, and 451, respectively. Thus, the benefit of AMR is obvious. To quantitatively measure the accuracy of the results, we compute the normalized -norm error for temperature, , where and are the numerical and analytic results for cell , respectively, and is the volume of cell . At , the -norm errors are , , and , for the non-AMR, deferred sync, multilevel sync runs, respectively. The results show that our AMR algorithms have only increased the error by .

We have also performed a series of AMR simulations using the deferred sync to check the convergence behavior of the code. Besides the AMR run using the deferred sync that has been presented (Fig. 6), two lower resolution runs and one higher resolution run have been performed (Table 1). In all four runs, there are two total AMR levels. In the convergence study, when the cell size changes by a factor of 2 from one run to another, we change the time step by a factor of 4. Table 1 shows the -norm errors and convergence rate at . In this study, the order of accuracy with respect to is . Because our implicit scheme is first-order in time and second-order in the spatial discretization of the diffusion term, the expected convergence rate for a smooth flow is second-order with respect to when is fixed. However, the temperature profile of the thermal wave problem has a very steep slope near the front and its second derivative is discontinuous there. Hence, it is not surprising that the achieved order of accuracy is lower than 2.

### 6.4. Optically-Thin Streaming of Radiation Front

In this problem, we test our code in the optically-thin streaming limit. This is the opposite limit from diffusion. The computational domain of this test is a one-dimensional region of , covered by 128 uniform cells. Initially, the region is filled with radiation, , whereas the region of has . The left and right boundaries are held at and , respectively, during the calculation. The hyperbolic update (§ 3.1) is switched off in this test. The Planck mean absorption coefficient is set to zero. We have studied two cases with Rosseland means of and , respectively. Thus the optical depths of the whole domain are and , respectively. For each case, we have performed two calculations, one with the time step set to and one with . Physically, the radiation front is expected to propagate at the speed of light, but the diffusion approximation does not naturally support a propagating front at any speed. Only the flux limiter permits the code to approximate the correct solution. In the numerical results the radiation front moves at approximately the speed of light with spreading due to diffusion (Figure 7). Better results are obtained for smaller time steps.

### 6.5. Shock Tube Problem In Strong Equilibrium Regime

In this test, the one-dimensional numerical region () initially consists of two constant states: , ,