On Architecture and Performance of Adaptive Mesh Refinement in an Electrostatics Particle-In-Cell Code
This paper presents the architecture and a performance study of the newly developed adaptive mesh refinement (AMR) solver in the Object-Oriented Particle Accelerator Library (OPAL). The framework AMReX, formerly known as BoxLib, is used and interfaced to the existing particle infrastructure in OPAL. The multigrid Poisson solver is based on second-generation Trilinos packages that allow the development of hardware-independent algorithms. The solver is verified by comparison with the built-in multigrid solver. Further benchmark problems, covering the scale of the anticipated physics, are presented, demonstrating accuracy and parallel scalability.
keywords:Poisson equation, Adaptive mesh refinement (AMR), Multigrid, AMReX, BoxLib, Trilinos
In today's state-of-the-art beam dynamics codes the well-known Particle-In-Cell (PIC) technique Hockney and Eastwood (1988)
has become indispensable. The computational complexity of $\mathcal{O}(N^2)$ for the naive direct summation over
all $N$ macro particles is reduced to $\mathcal{O}(M^3 \log M)$ in the case of a 3D FFT-based Poisson solver with $M$ grid
points per dimension. Furthermore, the efficient parallelization of the space-charge solver with MPI (or nowadays with other
accelerators such as GPU and MIC Adelmann et al. (2016)) enabled large-scale simulations that are more realistic. Nevertheless,
multi-bunch simulations of high intensity accelerators such as cyclotrons require fine meshes in order to resolve the non-linear
effects in the evolution of the beams due to space-charge. A remedy to increase the resolution, reduce the computational effort
and possibly also memory consumption is adaptive mesh refinement (AMR) Berger and Oliger (1984); Berger and Colella (1989). In the context of
Vlasov-Poisson problems, AMR was applied by
Hittinger and Banks (2013) using the Eulerian description for the coordinate and velocity space. Examples of a Lagrangian
formulation are the Unified Flow Solver (UFS) framework Kolobov and Arslanbekov (2016) and Warp-X Vay et al. (2018).
The diversity of today's computer architectures and the rapid emergence of new HPC technologies have shown that it is increasingly infeasible to design scientific software for one specific hardware architecture only. It is therefore obvious that recent source code developments reveal a trend towards architecture-independent programming where the backend kernels contain the hardware-specific implementation. An example is the set of second-generation Trilinos packages that are built on top of the Kokkos library Edwards et al. (2014, 2012).
In this paper the new AMR capability of the particle accelerator library OPAL Adelmann et al. (2017) using AMReX AMR (2017) is presented, as well as the built-in adaptive multigrid solver based on the algorithm in Martin (1998) and the second-generation Trilinos packages Tpetra Baker and Heroux (2012), Amesos2 and Belos Bavier et al. (2012), MueLu Prokopenko et al. (2014); Hu et al. (2014) and Ifpack2 Prokopenko et al. (2016). The new implementation was benchmarked against the existing Fortran-based multigrid solver of BoxLib and the analytical example of a uniformly charged sphere.
The new AMR feature of OPAL will enable more detailed studies of neighbouring-bunch effects as they occur in high intensity cyclotrons due to the small turn separation. Previous investigations such as Yang et al. (2010) for the PSI ring cyclotron have already shown their existence, but the PIC model was limited in resolution due to the high memory needs. It is hoped that the usage of AMR will reduce the memory consumption for the mesh by decreasing the resolution in regions of void while maintaining or even increasing the grid point density at locations of interest, in order to resolve the neighbouring-bunch interactions more precisely. In Yang et al. (2010) it was shown that the interaction of neighbouring bunches leads to an increase in the tails of the particle distribution (i.e. an increase of the number of halo particles) that usually causes particle losses and therefore activation of the machine. Thus, it is essential to quantify this effect more precisely in order to make predictions for further machine developments with higher beam current.
Besides a short introduction to OPAL in section 2 and AMReX in section 3, section 4.4 depicts the implementation of the adaptive multigrid, with the aforementioned benchmarks in section 5. The last section presents conclusions and an outlook.
The Object Oriented Parallel Accelerator Library (OPAL) is an electrostatic PIC (ES-PIC) beam dynamics code for large-scale particle accelerator simulations. Due to its general design, its application ranges from high intensity cyclotrons to low intensity proton therapy beamlines Rizzoglio et al. (2017) with negligible space charge. Besides the default FFT Poisson solver for periodic and open boundary problems, the built-in SAAMG (Smoothed Aggregation Algebraic Multigrid) solver enables the simulation of accelerators with arbitrary geometries Adelmann et al. (2010). The time integration relies on the second order Leapfrog or the fourth order Runge-Kutta (RK-4) method.
In beam dynamics the evolution in time of the density function $f(\vec{x}, \vec{v}, t)$ of the charged particle distribution in phase space due to electromagnetic fields $\vec{E}$ and $\vec{B}$ is described by the Vlasov (or collisionless Boltzmann) equation
$$\frac{\partial f}{\partial t} + \vec{v}\cdot\nabla_{\vec{x}} f + \frac{q}{m_0}\left(\vec{E} + \vec{v}\times\vec{B}\right)\cdot\nabla_{\vec{v}} f = 0, \qquad (1)$$
with particle charge $q$ and rest mass $m_0$. Instead of the velocity $\vec{v}$, OPAL uses the relativistic momentum $\vec{p} = \gamma m_0 \vec{v}$ with Lorentz factor $\gamma$ together with the coordinate $\vec{x}$ to specify the state of a particle in the 6D phase space. Both the electric and the magnetic field in Eq. (1) are a sum of an external and an internal, i.e. space-charge, contribution
$$\vec{E} = \vec{E}_{\mathrm{ext}} + \vec{E}_{\mathrm{self}}, \qquad \vec{B} = \vec{B}_{\mathrm{ext}} + \vec{B}_{\mathrm{self}}.$$
The external fields are given by RF cavities and by the magnetic field of the machine, respectively. In order to evaluate the electric self-field, the beam is Lorentz transformed into its rest frame, where the magnetic field induced by the motion of the particles is negligible. Thus, the electric self-field is fully described by the electrostatic potential $\phi$, i.e.
$$\vec{E}_{\mathrm{self}} = -\nabla\phi,$$
that is computed by Poisson's equation
$$\nabla^2\phi = -\frac{\rho}{\varepsilon_0},$$
with charge density $\rho$ and vacuum permittivity $\varepsilon_0$. The magnetic self-field is afterwards restored by the inverse Lorentz transformation. This quasi-static approximation is known as the Vlasov-Poisson equation.
2.1 AMR Interface
The new feature in OPAL is implemented in a lightweight fashion where the AMR library is used as a black box. Thus, it is in principle possible to have multiple AMR dependencies. The AMR functionality is provided to OPAL by an abstract base class that each library has to extend. The additional particle attributes, i.e. the level and the grid a particle lives on, are provided by a new layout. Besides the regrid function, each AMR module implements the charge deposition and the particle-to-core (re-)distribution. The AMR flavour in OPAL is currently equipped with six refinement methods. The most obvious criteria are the charge density per cell, the potential strength or the electric field strength per cell. There is also the possibility to constrain the minimum, respectively maximum, number of particles within a cell before it is refined. A last tagging option is the particle momentum. In OPAL the spurious self-forces for particles close to a coarse-fine grid interface are corrected by using buffer cells as described in Vay et al. (2012). Another solution, depicted in Colella and Norgaard (2010), applies a modification of the charge deposition algorithm using a convolution of the Green's function for particles near a refinement boundary.
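As an illustration of the charge-per-cell tagging criterion, the following sketch marks cells for refinement when the deposited charge magnitude exceeds a user-defined threshold. This is not OPAL's actual implementation; the function name, array layout and threshold value are assumptions for illustration only.

```python
import numpy as np

def tag_cells_by_charge(rho, threshold):
    """Return a boolean mask of cells to refine on the current level.

    rho:       3D array of charge per cell
    threshold: user-defined charge magnitude above which a cell is tagged
    """
    return np.abs(rho) > threshold

# Hypothetical usage: a single heavily charged cell (e.g. a bunch core)
# is the only one tagged for refinement.
rho = np.zeros((4, 4, 4))
rho[2, 2, 2] = -1.5e-9           # C
tags = tag_cells_by_charge(rho, 1.0e-10)
```

The other five refinement methods listed above would follow the same pattern, with the potential, field strength, particle count or momentum per cell replacing the charge density.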
AMReX is a descendant of the parallel block-structured adaptive mesh refinement (SAMR) code named BoxLib. It is C++ based
with an optional Fortran90 interface. Each level is distributed independently among MPI-processes in order to ensure load
balancing. The owned data is located either at nodes, faces, edges or centers of cells where the latter description is used in
the OPAL-AMR implementation.
In order to generate a level, each cell of the underlying coarser level has to be marked for refinement or not according to a user-defined criterion. In electrostatic problems natural choices are, for example, the amount of charge per cell, the potential strength or the electric field. Subsequent levels satisfy the relation
$$h_{l+1}^{(d)} = \frac{h_{l}^{(d)}}{r},$$
where $r$ is called the refinement ratio and $h_{l}^{(d)}$ specifies the mesh spacing of level $l$ in direction of $d$. An example of a refined mesh is given in Fig. 1. By definition, the coarsest level covers the full domain, whereas a fine level is defined by patches that may overlap several coarser grids. In general, for a level $l$ with grids $\Omega_{l,k}$ the following holds:
$$\Omega_l = \bigcup_{k} \Omega_{l,k} \subseteq \Omega_{l-1}.$$
Although neighbour grids are not allowed to overlap, they exchange data at interfaces via ghost cells.
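The mesh spacing relation between subsequent levels can be sketched as follows (a minimal illustration; a constant refinement ratio per direction is assumed):

```python
def mesh_spacing(h0, r, level):
    """Mesh spacing of a given level for coarsest spacing h0 (per
    dimension) and constant refinement ratio r, i.e. h_{l+1} = h_l / r."""
    return [h / r**level for h in h0]

# With a unit coarsest spacing and r = 2, level 2 is refined by a
# factor of four in every direction.
h2 = mesh_spacing([1.0, 1.0, 1.0], 2, 2)
```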
4 Adaptive Geometric Multigrid
This section describes the algorithm and implementation of the adaptive geometric multigrid (AGMG) solver according to Martin and Cartwright (1996); Martin (1998). A cell-centered implementation is also presented in Almgren et al. (1998).
4.1 Coarse-Fine Interface
AGMG is a special variant of GMG since not all levels cover the full domain . At interfaces between subsequent levels the elliptic matching condition (i.e. Neumann and Dirichlet boundary condition) must be satisfied in order to ensure continuity of the solution. This condition is met by flux differencing
with the fluxes evaluated at the interface, i.e. the average flux across the boundary. In case of a cell without adjacent finer cells, the flux differencing reduces to the usual second order Laplacian discretization
$$(\Delta\phi)_{i,j} = \frac{\phi_{i+1,j} - 2\phi_{i,j} + \phi_{i-1,j}}{h_x^2} + \frac{\phi_{i,j+1} - 2\phi_{i,j} + \phi_{i,j-1}}{h_y^2}.$$
An illustration of the stencil of Eq. (9) with fluxes computed either by Eq. (10) or Eq. (11), respectively, is shown in Fig. 2. In order to simplify the representation the example is in 2D with only one coarse-fine interface on the left side. Hence, the corresponding finite difference stencil is given by
In order to express ghost cells in terms of valid coarse and fine cells, a two-step second order Lagrange interpolation in 2D
is performed. First, the intermediate points, symbolized as red crosses in Fig. 1(b), are computed where only non-covered coarse cells parallel to the interface are taken. Second, the fine cells normal to the boundary are used together with the intermediate locations to obtain the ghost cells. In total, nine different configurations can be distinguished; they are shown in Fig. 3. The selection of the interpolation pattern follows the ordering of Fig. 3, i.e. from left to right and top to bottom. A fast evaluation of the right interpolation scheme is accomplished by assigning an integer value to each configuration, where a natural choice is its representation as a bit pattern (see Tab. 1). Hence, a simple switch-statement allows selection of the appropriate configuration. In case none of the nine patterns is applicable, one of the four first order Lagrange interpolation configurations of Fig. 4 is taken instead. The implementation follows exactly the same scheme, with the conversion shown in Tab. 2.
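The bit-pattern lookup can be sketched as follows. The encoding and the rule names are illustrative, not the actual contents of Tab. 1: each usable (non-covered) coarse cell parallel to the interface sets one bit, and a switch on the resulting integer selects the interpolation stencil, with a first order fallback when no second order pattern applies.

```python
def encode_pattern(left, middle, right):
    """Encode the availability of three coarse cells parallel to the
    interface as a 3-bit integer (one bit per cell)."""
    return (left << 2) | (middle << 1) | right

def select_interpolation(pattern):
    """Map the bit pattern to an (illustrative) interpolation rule."""
    rules = {
        0b111: "centered second order",      # all three cells usable
        0b110: "left-shifted second order",
        0b011: "right-shifted second order",
        0b010: "first order fallback",       # only the middle cell usable
    }
    # Any unlisted pattern drops to the first order configurations.
    return rules.get(pattern, "first order fallback")

p = encode_pattern(True, True, True)
```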
4.2 Boundary Conditions
Assuming the beam is in vacuum and neglecting any beam pipes, the electrostatic potential converges to zero at infinity. In order to resemble this behaviour in a finite difference discretization, a common approximation is the Asymptotic Boundary Condition (ABC) presented in Alvin and Eli (1980); Bayliss et al. (1982), also denoted as radiative or open boundary condition (BC). The first order approximation ABC-1 is given by
$$\left(\frac{\partial}{\partial r} + \frac{1}{r}\right)\phi = 0.$$
Instead of spherical coordinates, a formulation in Cartesian coordinates is applied, for example, in Khebir et al. (1990); Gordon and Fook (1993); Biswas et al. (2015). In spherical coordinates the $n$-th order approximation (ABC-$n$) is easily evaluated by
where the product is computed in decreasing order. Robin boundary conditions are another method to approximate open boundaries. The formula looks similar to Eq. (17) except that the radial derivative is replaced by a normal derivative with respect to the mesh boundary, i.e. Adelmann et al. (2010)
$$\frac{\partial \phi}{\partial n} + \frac{\phi}{d} = 0,$$
where $d$ is an artificial distance. The condition is discretized using central differences. In addition to open BCs using Eq. (19), the solver presented here allows one to impose homogeneous Dirichlet and periodic BCs at the mesh (or physical) boundaries.
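A ghost-cell discretization of the Robin condition with a central difference across the boundary face can be sketched as follows (a minimal 1D illustration; the artificial distance d and mesh spacing h are free parameters). From $(\phi_g - \phi_i)/h + (\phi_g + \phi_i)/(2d) = 0$, where the boundary value is approximated by the average of the interior and ghost cell, the ghost value follows as $\phi_g = \phi_i\,(2d - h)/(2d + h)$:

```python
def robin_ghost(phi_interior, h, d):
    """Ghost-cell value for the Robin BC  dphi/dn + phi/d = 0,
    discretized with a central difference across the boundary face:
        (phi_g - phi_i)/h + (phi_g + phi_i)/(2d) = 0
    """
    return phi_interior * (2.0 * d - h) / (2.0 * d + h)

# For d >> h the ghost value approaches the interior value
# (nearly homogeneous Neumann); for d -> h/2 it approaches zero.
g = robin_ghost(1.0, h=0.1, d=1.0)
```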
4.3 Domain Transform
In order to prevent particles from leaving the domain of the mesh where the AMR hierarchy is built, they are mapped into the computation space $[-1, 1]^3$ for the evaluation of Poisson's equation. Therefore, the geometry can be kept fixed. This cube is a natural choice since in the local frame the bunch is located around the design trajectory with the reference particle at the origin. The following subsections explain this linear transformation in detail. After solving Poisson's equation, the electrostatic potential and the electric field have to be rescaled properly. As depicted in Fig. 5, instead of rescaling the fields at the location of the particles, it is done directly on the grid.
4.3.1 Particle Coordinate
Let $\vec{x}_i$ be the coordinate of some particle $i$ in the particle space $\Omega_p$, and let the maximum norm be defined by
$$\|\vec{x}\|_\infty = \max_{1 \le j \le 3} |x_j|.$$
Then, the transformation of an individual particle at position $\vec{x}_i$ into the computation space $\Omega_c = [-1, 1]^3$ is given by
$$\vec{x}_i' = \frac{\vec{x}_i}{\max_{1 \le k \le N} \|\vec{x}_k\|_\infty},$$
where $N$ is the number of particles.
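This mapping can be sketched as follows (a minimal illustration, assuming all coordinates are divided by the largest maximum norm over the particles so that the bunch fits into the cube):

```python
import numpy as np

def to_computation_space(x):
    """Map particle coordinates x (shape (N, 3)) into [-1, 1]^3 by
    dividing by the largest maximum norm over all particles.
    Returns the mapped coordinates and the scaling factor s."""
    s = np.max(np.abs(x))        # max over particles and dimensions
    return x / s, s

x = np.array([[2.0, -1.0, 0.5],
              [0.5,  4.0, -3.0]])
xc, s = to_computation_space(x)
```

The scaling factor s must be kept, since the potential and field computed in $\Omega_c$ are rescaled with it afterwards.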
4.3.2 Electrostatic Potential
Let $\phi$ be the electrostatic potential in particle space $\Omega_p$ and $\phi'$ the corresponding potential value in computation space $\Omega_c$, then they relate as
$$\phi'(\vec{x}') = s^{\,d-2}\,\phi(\vec{x}),$$
with scaling factor $s = \max_{1 \le k \le N} \|\vec{x}_k\|_\infty$ and dimension $d$.
Let the discrete charge density of $N$ particles be described by (Jackson, 1999, eq. 1.6)
$$\rho(\vec{x}) = \sum_{i=1}^{N} q_i\,\delta(\vec{x} - \vec{x}_i)$$
in $d$ dimensions. With the coordinates transformed as denoted above, i.e. $\vec{x}' = \vec{x}/s$, the delta distribution scales as
$$\delta(\vec{x} - \vec{x}_i) = s^{-d}\,\delta(\vec{x}' - \vec{x}_i'),$$
hence $\rho(\vec{x}) = s^{-d}\rho'(\vec{x}')$, where $\rho'(\vec{x}') = \sum_i q_i\,\delta(\vec{x}' - \vec{x}_i')$. Thus, with $\nabla^2 = s^{-2}\nabla'^2$, Poisson's equation transforms into
$$\nabla'^2\bigl(s^{\,d-2}\phi\bigr) = -\frac{\rho'}{\varepsilon_0}.$$
Therefore, the potential transforms in $d$ dimensions as denoted in Eq. (20). In 2 dimensions the electrostatic potential remains unchanged. ∎
4.3.3 Electric Field
Let $\vec{E}$ be the electric field in particle space $\Omega_p$ and $\vec{E}'$ the corresponding electric field vector in computation space $\Omega_c$, then they relate as
$$\vec{E}'(\vec{x}') = s^{\,d-1}\,\vec{E}(\vec{x}).$$
The electric field is the negative gradient of the electrostatic potential, $\vec{E} = -\nabla\phi$. Thus, an additional factor $s$ from the chain rule $\nabla = s^{-1}\nabla'$ contributes to the transformation. Therefore,
$$\vec{E}'(\vec{x}') = -\nabla'\phi'(\vec{x}') = -s^{\,d-2}\,\nabla'\phi(\vec{x}) = s^{\,d-1}\,\vec{E}(\vec{x}),$$
that coincides with (21) in $d$ dimensions. ∎
4.4 Algorithm and Implementation Details
where the projection from level to level satisfies . Due to the properties of the refinement Poisson’s equation is described by
with a composite Laplacian operator that considers only non-refined regions of each level. The full algorithm is illustrated in matrix notation in Alg. 2 and Alg. 3. It performs a V-cycle in the residual-correction formulation with pre- and post-smoothing of the error. The iterative procedure stops when the norm of the residual on all levels is smaller than the corresponding right-hand side norm. Since AMReX assigns the grids to the cores independently of the underlying level, the implementation provides special matrices to handle the coarse-fine interfaces. Thus, each AMR level stores up to eight matrices and four vectors represented by Tpetra objects. The matrices are the composite Laplacian matrix, the Laplacian matrix assuming no finer grids, the coarse and fine boundary matrices, the restriction and interpolation matrices, the gradient matrices, and the matrix selecting all uncovered cells. The vectors per level are the charge density, electrostatic potential, residual and error. Whereas the vectors span the whole level domain, some matrices only cover a subdomain or carry additional information for the coarse-fine interfaces, as shown in Fig. 6. The coarse and fine boundary matrices encompass one side of the Lagrange interpolation stencil that is completed by the Laplacian matrices. In the case of the finest level the composite and no-fine Laplacian matrices coincide.
The pre- and post-relaxation steps on line 8 and, respectively, 16 of Alg. 3 use the algorithms provided by Ifpack2 (e.g. Gauss-Seidel, Jacobi, etc.). The linear system of equations on the coarsest level (Alg. 3, line 20) is either solved by direct solvers available via Amesos2 or iterative solvers of Belos. Furthermore, an interface to MueLu allows Smoothed Aggregation Algebraic Multigrid (SAAMG) as bottom solver.
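The residual-correction V-cycle of Alg. 2 and Alg. 3 can be sketched for two levels of a 1D model problem as follows. This is an illustrative toy, not the Tpetra-based implementation: weighted Jacobi smoothing, full-weighting restriction, linear interpolation and a direct coarse ("bottom") solve are assumed, and both levels cover the full domain.

```python
import numpy as np

def laplacian_1d(n, h):
    """Second order 1D Laplacian (interior points, homogeneous Dirichlet)."""
    return (np.diag(-2.0 * np.ones(n))
            + np.diag(np.ones(n - 1), 1)
            + np.diag(np.ones(n - 1), -1)) / h**2

def jacobi(A, x, b, iters, omega=2.0 / 3.0):
    """Weighted Jacobi smoother."""
    D = np.diag(A)
    for _ in range(iters):
        x = x + omega * (b - A @ x) / D
    return x

def two_level_vcycle(x, b, h):
    """One V-cycle in residual-correction form (n odd interior points)."""
    n = b.size
    A = laplacian_1d(n, h)
    x = jacobi(A, x, b, 3)                                   # pre-smoothing
    r = b - A @ x                                            # residual
    rc = 0.25 * r[0:-2:2] + 0.5 * r[1::2] + 0.25 * r[2::2]   # full weighting
    ec = np.linalg.solve(laplacian_1d(rc.size, 2 * h), rc)   # bottom solve
    e = np.zeros(n)
    e[1::2] = ec                                             # coincident points
    e[2:-1:2] = 0.5 * (ec[:-1] + ec[1:])                     # linear interp.
    e[0] = 0.5 * ec[0]
    e[-1] = 0.5 * ec[-1]
    x = x + e                                                # correction
    return jacobi(A, x, b, 3)                                # post-smoothing
```

Repeated V-cycles reduce the residual norm by a roughly constant factor per cycle, which is the behaviour the stopping criterion above relies on.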
5 AGMG Solver Benchmark
In a first example the solver is verified against the built-in Fortran90 based multigrid solver of BoxLib, where 10
Gaussian-shaped bunches are placed in a chain using Dirichlet boundary conditions on the computation domain. This represents
a more or less realistic example of multi-bunch simulations in high intensity cyclotrons as studied in
Yang et al. (2010). The second mini-app shows a comparison with the analytical solution of a uniformly charged sphere in free space. Although AMR
does not really make sense for a single bunch simulation in beam dynamics, it is still a perfect mini-app to check for any
discontinuities at the coarse-fine interfaces of the meshes among levels.
Lastly, the correctness is shown by a mini-app of a perturbed plasma undergoing Landau damping. There, the implementation follows Myers et al. (2017, 2016) but omits the particle remapping. The particles are integrated in time with time step $\Delta t$ using the symplectic second order Leapfrog in the kick-drift-kick formulation (velocity-Verlet) Swope et al. (1982), i.e.
$$\vec{v}^{\,n+1/2} = \vec{v}^{\,n} + \frac{q}{2m}\,\vec{E}(\vec{x}^{\,n})\,\Delta t,$$
$$\vec{x}^{\,n+1} = \vec{x}^{\,n} + \vec{v}^{\,n+1/2}\,\Delta t,$$
$$\vec{v}^{\,n+1} = \vec{v}^{\,n+1/2} + \frac{q}{2m}\,\vec{E}(\vec{x}^{\,n+1})\,\Delta t,$$
where the BoxLib solver is again used as a reference for the evaluation of the electric field. All line plots, projection plots and slice plots are generated using our own extension of yt Turk et al. (2011).
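The kick-drift-kick update can be sketched as follows (a minimal illustration for a single non-relativistic particle; the acceleration function is a placeholder for the mesh-based field evaluation):

```python
def leapfrog_kdk(x, v, dt, accel):
    """One kick-drift-kick (velocity-Verlet) step.

    accel(x) plays the role of q/m * E(x) evaluated on the mesh.
    """
    v = v + 0.5 * dt * accel(x)      # half kick
    x = x + dt * v                   # drift
    v = v + 0.5 * dt * accel(x)      # half kick
    return x, v

# Hypothetical usage: a harmonic force, which the symplectic scheme
# integrates with bounded energy error.
x, v = 1.0, 0.0
for _ in range(1000):
    x, v = leapfrog_kdk(x, v, 0.01, lambda x: -x)
```

Being symplectic, the scheme conserves a shadow Hamiltonian, so the energy error stays bounded at O(Δt²) over long integration times.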
5.1 10 Gaussian-Shaped Bunches
In this mini-app the newly implemented solver is compared to the built-in Fortran90 based solver of BoxLib. Each bunch is initialized with macro particles of equal charge, picked using a one-dimensional Gaussian distribution per dimension. In horizontal direction the bunch centers carry a mean shift relative to the neighbouring bunches. The problem is solved on a base grid with 2 levels of refinement. At the mesh boundaries the Dirichlet boundary condition is imposed. A projection of the charge density is shown in Fig. 8. As indicated by the line plots of Fig. 9, both solutions agree.
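The initialization of such a bunch chain can be sketched as follows. The standard deviation, shift and particle numbers here are placeholders, not the benchmark's actual parameters:

```python
import numpy as np

def gaussian_bunch_chain(n_bunches, n_per_bunch, sigma, shift, seed=0):
    """Sample a chain of Gaussian bunches whose centers are shifted
    along the horizontal (x) axis by `shift` between neighbours."""
    rng = np.random.default_rng(seed)
    bunches = []
    for b in range(n_bunches):
        x = rng.normal(0.0, sigma, size=(n_per_bunch, 3))
        x[:, 0] += b * shift         # horizontal mean shift per bunch
        bunches.append(x)
    return np.vstack(bunches)

pts = gaussian_bunch_chain(10, 1000, sigma=0.01, shift=0.1)
```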
5.2 Uniformly Charged Sphere in Free Space
In this mini-app particles are randomly picked within a sphere of radius $R$ centered at the origin. In order to simplify the comparison to the analytical solution, each particle carries an equal charge, giving a total charge $Q$. Thus, the peak value of the electric field, located at the sphere surface, is
$$E_{\max} = \frac{Q}{4\pi\varepsilon_0 R^2}.$$
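The analytical field against which the solver is compared can be evaluated as follows; the field grows linearly inside the sphere, decays like a point charge outside, and peaks at the surface. The numerical values of Q and R below are illustrative, not the benchmark's parameters.

```python
import math

EPS0 = 8.8541878128e-12          # vacuum permittivity [F/m]

def e_field_sphere(r, Q, R):
    """Radial electric field of a uniformly charged sphere of total
    charge Q and radius R: linear inside, Coulomb-like outside."""
    if r <= R:
        return Q * r / (4.0 * math.pi * EPS0 * R**3)
    return Q / (4.0 * math.pi * EPS0 * r**2)

# Hypothetical parameters: Q = 1e-8 C, R = 5 mm; the peak sits at r = R.
E_peak = e_field_sphere(0.005, 1.0e-8, 0.005)
```

Checking the numerical solution against this profile along a radial line exposes any discontinuity at the coarse-fine interfaces directly.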