On Architecture and Performance of Adaptive Mesh Refinement in an Electrostatics Particle-In-Cell Code


Matthias Frey matthias.frey@psi.ch Andreas Adelmann andreas.adelmann@psi.ch Uldis Locans Paul Scherrer Institute, CH-5232 Villigen, Switzerland
Abstract

This paper presents the architecture and a performance study of the newly developed adaptive mesh refinement (AMR) solver in the Object-Oriented Particle Accelerator Library (OPAL). The framework AMReX, formerly known as BoxLib, is used and interfaced to the existing particle infrastructure in OPAL. The multigrid Poisson solver is based on second-generation Trilinos packages that allow the development of hardware-independent algorithms. The solver is verified by a comparison to the built-in multigrid solver. Further benchmark problems, covering the scale of the anticipated physics, are presented, demonstrating accuracy and parallel scalability.

keywords:
Poisson equation, Adaptive mesh refinement (AMR), Multigrid, AMReX, BoxLib, Trilinos

1 Introduction

In today's state-of-the-art beam dynamics codes the well-known Particle-In-Cell (PIC) technique Hockney and Eastwood (1988) has become indispensable. The computational complexity of $\mathcal{O}(N^2)$ for the naive direct summation over all $N$ macro particles is reduced in the case of a 3D FFT-based Poisson solver with $N_g$ grid points per dimension to $\mathcal{O}(N_g^3 \log N_g)$. Furthermore, the efficient parallelization of the space-charge solver with MPI (or nowadays with other accelerators such as GPU and MIC Adelmann et al. (2016)) enabled large-scale simulations that are more realistic. Nevertheless, multi-bunch simulations of high intensity accelerators such as cyclotrons require fine meshes in order to resolve the non-linear effects in the evolution of the beams due to space charge. A remedy that increases the resolution while reducing the computational effort and possibly also the memory consumption is adaptive mesh refinement (AMR) Berger and Oliger (1984); Berger and Colella (1989). In the context of Vlasov-Poisson problems, AMR was applied by Hittinger and Banks (2013) using the Eulerian description of the coordinate and velocity space. Examples of a Lagrangian formulation are the Unified Flow Solver (UFS) framework Kolobov and Arslanbekov (2016) and Warp-X Vay et al. (2018).
The diversity of today's computer architectures and the rapid emergence of new HPC technologies have shown that it is becoming more and more infeasible to design scientific software for one specific hardware architecture only. It is therefore not surprising that recent source code developments reveal a trend towards architecture-independent programming, where the backend kernels provide the hardware-specific implementations. An example are the second-generation Trilinos packages that are built on top of the Kokkos library Edwards et al. (2014, 2012).
In this paper the new AMR capability of the particle accelerator library OPAL Adelmann et al. (2017) using AMReX AMR (2017) is presented, as well as the built-in adaptive multigrid solver based on the algorithm in Martin (1998) and the second-generation Trilinos packages Tpetra Baker and Heroux (2012), Amesos2 and Belos Bavier et al. (2012), MueLu Prokopenko et al. (2014); Hu et al. (2014) and Ifpack2 Prokopenko et al. (2016). The new implementation was benchmarked against the existing Fortran-based multigrid solver of BoxLib and the analytical example of a uniformly charged sphere.
The new AMR feature of OPAL will enable more detailed studies of neighbouring bunch effects as they occur in high intensity cyclotrons due to the low turn separation. Previous investigations such as Yang et al. (2010) for the PSI Ring cyclotron have already shown their existence, but the PIC model was limited in resolution due to the high memory requirements. It is hoped that the usage of AMR will reduce the memory consumption of the mesh by decreasing the resolution in regions of void while maintaining or even increasing the grid point density at locations of interest, in order to resolve the neighbouring bunch interactions more precisely. In Yang et al. (2010) it was shown that the interaction of neighbouring bunches leads to an increase at the tails of the particle distribution (i.e. an increase of the number of halo particles) that usually causes particle losses and therefore an activation of the machine. Thus, it is essential to quantify this effect more precisely in order to make predictions for further machine developments with higher beam current.
Besides a short introduction to OPAL in section 2 and AMReX in section 3, section 4.4 describes the implementation of the adaptive multigrid solver, with the aforementioned benchmarks in section 5. The last section contains conclusions and an outlook.

2 Opal

The Object Oriented Parallel Accelerator Library (OPAL) is an electrostatic PIC (ES-PIC) beam dynamics code for large-scale particle accelerator simulations. Due to its general design, its applications range from high intensity cyclotrons to low intensity proton therapy beamlines Rizzoglio et al. (2017) with negligible space charge. Besides the default FFT Poisson solver for periodic and open boundary problems, the built-in SAAMG (Smoothed Aggregation Algebraic Multigrid) solver enables the simulation of accelerators with arbitrary geometries Adelmann et al. (2010). The time integration relies on the second order Leapfrog or the fourth order Runge-Kutta (RK-4) method.

In beam dynamics the evolution in time of the density function $f(\mathbf{x}, \mathbf{v}, t)$ of the charged particle distribution in phase space due to electromagnetic fields $\mathbf{E}$ and $\mathbf{B}$ is described by the Vlasov (or collisionless Boltzmann) equation

$\dfrac{\mathrm{d}f}{\mathrm{d}t} = \dfrac{\partial f}{\partial t} + \mathbf{v}\cdot\nabla_{\mathbf{x}} f + \dfrac{q}{m_0}\left(\mathbf{E} + \mathbf{v}\times\mathbf{B}\right)\cdot\nabla_{\mathbf{v}} f = 0 \qquad (1)$

with particle charge $q$ and rest mass $m_0$. Instead of the velocity $\mathbf{v}$, OPAL uses the relativistic momentum $\mathbf{p} = \gamma m_0 \mathbf{v}$ with Lorentz factor $\gamma$, together with the coordinate $\mathbf{x}$, to specify the state of a particle in the 6D phase space. Both the electric and the magnetic field in Eq. (1) are a sum of an external and an internal, i.e. space charge, contribution:

$\mathbf{E} = \mathbf{E}_{\mathrm{ext}} + \mathbf{E}_{\mathrm{self}} \qquad (2)$
$\mathbf{B} = \mathbf{B}_{\mathrm{ext}} + \mathbf{B}_{\mathrm{self}} \qquad (3)$

The external fields are given by RF-cavities and by the magnetic field of the machine, respectively. In order to evaluate the electric self-field, the beam is Lorentz-transformed into its rest frame, where the magnetic field induced by the motion of the particles is negligible. Thus, the electric self-field is fully described by the electrostatic potential $\phi$, i.e.

$\mathbf{E}_{\mathrm{self}} = -\nabla\phi \qquad (4)$

which is computed from Poisson's equation

$\Delta\phi = -\dfrac{\rho}{\varepsilon_0} \qquad (5)$

with charge density $\rho$ and vacuum permittivity $\varepsilon_0$. The magnetic self-field is afterwards restored by the inverse Lorentz transformation. This quasi-static approximation is known as the Vlasov-Poisson equation.

2.1 AMR Interface

The new feature in OPAL is implemented in a lightweight fashion where the AMR library is used as a black box. Thus, it is in principle possible to support multiple AMR dependencies. The AMR functionality is provided to OPAL by an abstract base class that each library has to extend. The additional particle attributes, i.e. the level and the grid a particle lives on, are provided by a new layout. Besides the regrid function, each AMR module implements the charge deposition and the particle-to-core (re-)distribution. The AMR flavour in OPAL is currently equipped with six refinement methods. The most obvious criteria are the charge density per cell, the potential strength or the electric field strength per cell. There is also the possibility to constrain the minimum, respectively maximum, number of particles within a cell before it is refined. A last tagging option is the particle momentum. In OPAL the spurious self-forces on particles close to a coarse-fine grid interface are corrected by using buffer cells as described in Vay et al. (2012). Another solution, as depicted in Colella and Norgaard (2010), applies a modification of the charge deposition algorithm using a convolution of the Green's function for particles near a refinement boundary. A sketch of such an abstract interface is shown below.
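The following is a minimal sketch of how such an abstract AMR interface could look; the class and method names are illustrative assumptions and do not reproduce the actual OPAL API.

```cpp
// Hypothetical sketch of an abstract AMR interface as described above;
// each AMR library (e.g. AMReX) would provide a concrete subclass.
class AmrBunch; // assumed particle container type

class AmrInterface {
public:
    virtual ~AmrInterface() = default;

    // Rebuild the grid hierarchy based on the active tagging criterion
    // (charge density, potential, field strength, particle count, momentum).
    virtual void regrid(AmrBunch& bunch) = 0;

    // Deposit the charge of all macro particles onto the level meshes.
    virtual void depositCharge(const AmrBunch& bunch) = 0;

    // Redistribute particles among cores, updating the per-particle
    // level and grid attributes provided by the new layout.
    virtual void redistribute(AmrBunch& bunch) = 0;
};
```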

3 AMReX

AMReX is a descendant of the parallel block-structured adaptive mesh refinement (SAMR) code BoxLib. It is written in C++ with an optional Fortran90 interface. Each level is distributed independently among MPI processes in order to ensure load balancing. The data is located either at nodes, faces, edges or centers of cells; the latter, cell-centered description is used in the OPAL-AMR implementation.
In order to generate a new level, each cell of the underlying coarser level has to be marked for refinement or not according to a user-defined criterion. In electrostatic problems natural choices are, for example, the amount of charge per cell, the potential strength or the electric field. Subsequent levels satisfy the relation

$\Delta x_{l+1}^{(d)} = \dfrac{\Delta x_{l}^{(d)}}{r} \qquad (6)$

where $r$ is called the refinement ratio and $\Delta x_{l}^{(d)}$ specifies the mesh spacing of level $l$ in direction of $d$. An example of a refined mesh is given in Fig. 1. By definition, the coarsest level covers the full domain $\Omega$, whereas a fine level is defined by patches that may overlap several coarser grids. In general, for a level $l \geq 1$ with $k$ grids the following holds:

$\Omega_0 = \Omega \qquad (7)$
$\Omega_l = \bigcup_{i=1}^{k}\Omega_{l,i} \subseteq \Omega_{l-1} \qquad (8)$

Although neighbour grids are not allowed to overlap, they exchange data at interfaces via ghost cells.

Figure 1: Sketch of a block-structured mesh refinement of a Cartesian grid in 2D with AMReX. Fine levels may span multiple coarser grids as indicated. At interfaces among grids of the same level ghost cells allow exchanging data.

4 Adaptive Geometric Multigrid

This section describes the algorithm and implementation of the adaptive geometric multigrid (AGMG) solver according to Martin and Cartwright (1996); Martin (1998). A cell-centered implementation is also presented in Almgren et al. (1998).

4.1 Coarse-Fine Interface

AGMG is a special variant of GMG since not all levels cover the full domain $\Omega$. At interfaces between subsequent levels the elliptic matching conditions (i.e. Neumann and Dirichlet boundary conditions) must be satisfied in order to ensure continuity of the solution. This condition is met by flux differencing

$(L\phi)_{i,j,k} = \dfrac{F^{x}_{i+1/2,j,k} - F^{x}_{i-1/2,j,k}}{\Delta x} + \dfrac{F^{y}_{i,j+1/2,k} - F^{y}_{i,j-1/2,k}}{\Delta y} + \dfrac{F^{z}_{i,j,k+1/2} - F^{z}_{i,j,k-1/2}}{\Delta z} \qquad (9)$

where either

$F^{x}_{i+1/2,j,k} = \dfrac{\phi_{i+1,j,k} - \phi_{i,j,k}}{\Delta x} \qquad (10)$

on faces without adjacent finer cells, or

$F^{x}_{i+1/2,j,k} = \left\langle F^{x,\mathrm{fine}}_{i+1/2,j,k} \right\rangle \qquad (11)$

with the fine-level fluxes computed analogously to Eq. (10), i.e. the average flux across the boundary is taken at the interface. In case of a cell without adjacent finer cells the flux differencing reduces to the usual second order Laplacian discretization

$(\Delta_h\phi)_{i,j,k} = \dfrac{\phi_{i+1,j,k} - 2\phi_{i,j,k} + \phi_{i-1,j,k}}{\Delta x^2} + \dfrac{\phi_{i,j+1,k} - 2\phi_{i,j,k} + \phi_{i,j-1,k}}{\Delta y^2} + \dfrac{\phi_{i,j,k+1} - 2\phi_{i,j,k} + \phi_{i,j,k-1}}{\Delta z^2} \qquad (12)$

An illustration of the stencil of Eq. (9) with fluxes computed either by Eq. (10) or Eq. (11), respectively, is shown in Fig. 2. In order to simplify the representation the example is in 2D with only one coarse-fine interface on the left side. Hence, the corresponding finite difference stencil is given by

(13)

where

(14)

(a) The red nodes indicate ghost cells that need to be interpolated.

(b) The red crosses specify the intermediate interpolation points using coarse cells.
Figure 2: Illustration of flux differencing in 2D at a coarse-fine interface on the left side.

In order to express ghost cells in terms of valid coarse and fine cells, a two-step second order Lagrange interpolation in 2D

$\phi(x) = \sum_{k=0}^{2} L_k(x)\,\phi_k \qquad (15)$

with the Lagrange basis polynomials

$L_k(x) = \prod_{j=0,\, j\neq k}^{2} \dfrac{x - x_j}{x_k - x_j} \qquad (16)$

is performed. First, the intermediate points symbolized as red crosses in Fig. 2(b) are computed, where only non-covered coarse cells parallel to the interface are taken. Second, the fine cells normal to the boundary are used together with the intermediate locations to obtain the ghost cells. In total, nine different configurations can be distinguished; they are shown in Fig. 3. The selection of the interpolation pattern follows the ordering of Fig. 3, i.e. from left to right and top to bottom. A fast evaluation of the right interpolation scheme is accomplished by assigning an integer value to each configuration, where a natural choice is its representation as a bit pattern (see Tab. 1). Hence, a simple switch-statement allows the selection of the appropriate configuration, as sketched below. In case none of the nine patterns is applicable, one of the four first order Lagrange interpolation configurations of Fig. 4 is taken instead. The implementation follows exactly the same scheme, with the conversion shown in Tab. 2.

(a) case 0
(b) case 1
(c) case 2
(d) case 3
(e) case 4
(f) case 5
(g) case 6
(h) case 7
(i) case 8
Figure 3: All possible configurations for 2D quadratic Lagrange interpolation, where the red cells are used for the interpolation. The coarse-fine interface should be thought of as lying one layer behind. The black dot specifies the cell for which the Laplacian operator is evaluated.

Table 1: Bit pattern (stored as unsigned long) of each Lagrange interpolation scheme and the corresponding case number (0-24) used in the switch-statement. The ordering follows Fig. 3.

(a) case 0
(b) case 1
(c) case 2
(d) case 3
Figure 4: All possible configurations for 2D linear Lagrange interpolation, in which the red cells are used to build the Lagrange coefficients. The black dot indicates the cell at the current coarse-fine interface. The interface should be thought of as lying one layer behind or in front.

Table 2: Bit strings of each pattern shown in Fig. 4. The right column contains the corresponding case number used in the switch-statement.

4.2 Boundary Conditions

Assuming the beam is in vacuum and neglecting any beam pipes, the electrostatic potential converges to zero at infinity. In order to resemble this behaviour on a finite grid, a common approximation is the Asymptotic Boundary Condition (ABC) presented in Alvin and Eli (1980); Bayliss et al. (1982), also denoted as radiative or open boundary condition (BC). The first order approximation ABC-1 is given by

$\left(\dfrac{\partial}{\partial r} + \dfrac{1}{r}\right)\phi = 0 \qquad (17)$

Instead of spherical coordinates, a formulation in Cartesian coordinates is applied, for example, in Khebir et al. (1990); Gordon and Fook (1993); Biswas et al. (2015). In spherical coordinates the $m$-th order approximation (ABC-$m$) is easily evaluated by

$\prod_{k=m}^{1}\left(\dfrac{\partial}{\partial r} + \dfrac{2k-1}{r}\right)\phi = 0 \qquad (18)$

where the product is computed in decreasing order of $k$ and, of course, Eq. (17) is recovered for $m = 1$. Robin boundary conditions are another method to approximate open boundaries. The formula looks similar to Eq. (17) except that the radial derivative is replaced by a normal derivative w.r.t. the mesh boundary, i.e. Adelmann et al. (2010)

$\dfrac{\partial\phi}{\partial n} + \dfrac{\phi}{d} = 0 \qquad (19)$

where $d$ is an artificial distance. The condition is discretized using central differences. In addition to open BCs using Eq. (19), the solver presented here allows one to impose homogeneous Dirichlet and periodic BCs at the mesh (or physical) boundaries.
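As an illustration of the central-difference discretization of Eq. (19), the ghost value $\phi_g$ outside a boundary face can be eliminated in favour of the interior value $\phi_i$; this is a sketch assuming both the normal derivative and the boundary value are evaluated at the face, with mesh spacing $h$:

```latex
% Central difference of Eq. (19) across a boundary face:
\frac{\phi_g - \phi_i}{h} + \frac{1}{d}\,\frac{\phi_g + \phi_i}{2} = 0
\quad\Longrightarrow\quad
\phi_g = \frac{2d - h}{2d + h}\,\phi_i
```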

4.3 Domain Transform

In order to prevent particles from leaving the domain of the mesh where the AMR hierarchy is built, they are mapped into the computation space $\Omega_c = [-1, 1]^3$ for the evaluation of Poisson's equation. Therefore, the geometry can be kept fixed. This cube is a natural choice since in the local frame the bunch is located around the design trajectory with the reference particle at the origin. The following subsections explain this linear transformation in detail. After solving Poisson's equation, the electrostatic potential and the electric field have to be rescaled properly. As depicted in Fig. 5, instead of rescaling the fields at the location of the particles, this is done directly on the grid.

Figure 5: Workflow of the space charge calculation: transform → deposit → solve → back transform → interpolate → apply. Poisson's equation is solved in the computation domain and rescaled afterwards. Steps in particle space and in computation space are marked in blue and green, respectively.

4.3.1 Particle Coordinate

Let $\mathbf{x}_i$ be the coordinate of particle $i$ in the particle space $\Omega_p$, and let the maximum norm be defined by

$\|\mathbf{x}_i\|_\infty = \max_{1 \leq d \leq 3} |x_i^{(d)}|$

Then, the transformation of an individual particle at position $\mathbf{x}_i$ into the computation space $\Omega_c$ is given by

$\hat{\mathbf{x}}_i = \dfrac{\mathbf{x}_i}{s}, \qquad s = \max_{1 \leq j \leq N} \|\mathbf{x}_j\|_\infty$

where $N$ is the number of particles.
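A minimal sketch of this mapping (names and the bookkeeping of the scaling factor are illustrative assumptions):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { double x[3]; };

// Map all particles into the cube [-1, 1]^3 and return the scaling
// factor s needed later to rescale potential and field (Eqs. (20), (21)):
// in 3D, phi^p = phi^c / s and E^p = E^c / s^2.
double toComputationSpace(std::vector<Vec3>& particles) {
    double s = 0.0;
    for (const auto& p : particles)              // s = max_j ||x_j||_inf
        for (int d = 0; d < 3; ++d)
            s = std::max(s, std::abs(p.x[d]));
    for (auto& p : particles)
        for (int d = 0; d < 3; ++d)
            p.x[d] /= s;                         // x_hat = x / s
    return s;
}
```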

4.3.2 Electrostatic Potential

Let $\phi^p$ be the electrostatic potential in particle space $\Omega_p$ and $\phi^c$ the corresponding potential value in computation space $\Omega_c$; then they relate as

$\phi^p = s^{2-d}\,\phi^c \qquad (20)$

Proof.

Let the discrete charge density of $N$ particles be described by (Jackson, 1999, eq. 1.6)

$\rho^p(\mathbf{x}) = \sum_{i=1}^{N} q_i\,\delta(\mathbf{x} - \mathbf{x}_i)$

in $d$ dimensions. With the coordinates transformed as denoted above, i.e. $\mathbf{x} = s\,\hat{\mathbf{x}}$, it follows that

$\rho^c(\hat{\mathbf{x}}) = \sum_{i=1}^{N} q_i\,\delta(\hat{\mathbf{x}} - \hat{\mathbf{x}}_i) = s^d \sum_{i=1}^{N} q_i\,\delta(\mathbf{x} - \mathbf{x}_i) = s^d\,\rho^p(\mathbf{x})$

with $\delta(\hat{\mathbf{x}} - \hat{\mathbf{x}}_i) = s^d\,\delta(\mathbf{x} - \mathbf{x}_i)$ and

$\nabla_{\hat{\mathbf{x}}} = s\,\nabla_{\mathbf{x}}$

where $\hat{\cdot}$ denotes quantities in $\Omega_c$. Thus,

$\Delta_{\hat{\mathbf{x}}}\,\phi^c = -\dfrac{\rho^c}{\varepsilon_0} \;\Longleftrightarrow\; s^2\,\Delta_{\mathbf{x}}\,\phi^c = -s^d\,\dfrac{\rho^p}{\varepsilon_0} \;\Longleftrightarrow\; \Delta_{\mathbf{x}}\left(s^{2-d}\,\phi^c\right) = -\dfrac{\rho^p}{\varepsilon_0}$

Therefore, the potential transforms in $d$ dimensions as denoted in Eq. (20). In 2 dimensions the electrostatic potential remains unchanged. ∎

4.3.3 Electric Field

Let $\mathbf{E}^p$ be the electric field in particle space $\Omega_p$ and $\mathbf{E}^c$ the corresponding electric field vector in computation space $\Omega_c$; then they relate as

$\mathbf{E}^p = s^{1-d}\,\mathbf{E}^c \qquad (21)$

Proof.

The electric field is the negative gradient of the electrostatic potential, cf. Eq. (4). Thus, an additional factor $s^{-1}$ from the chain rule contributes to the transformation. Therefore,

$\mathbf{E}^p = -\nabla_{\mathbf{x}}\,\phi^p = -s^{2-d}\,\nabla_{\mathbf{x}}\,\phi^c = -s^{2-d}\,s^{-1}\,\nabla_{\hat{\mathbf{x}}}\,\phi^c = s^{1-d}\,\mathbf{E}^c$

which coincides with Eq. (21) in $d$ dimensions. ∎

4.4 Algorithm and Implementation Details

Following the notation of Martin and Cartwright (1996); Martin (1998), the full domain is given by

$\Omega = \bigcup_{l=0}^{l_{\max}} \Omega_l \setminus P(\Omega_{l+1}) \qquad (22)$

where the projection $P$ from level $l$ to level $l-1$ satisfies $P(\Omega_l) \subseteq \Omega_{l-1}$ and $\Omega_{l_{\max}+1} = \emptyset$. Due to the properties of the refinement, Poisson's equation is described by

$L^{\mathrm{comp}}\,\phi = -\dfrac{\rho}{\varepsilon_0} \qquad (23)$

with the composite Laplacian operator $L^{\mathrm{comp}}$ that considers only non-refined regions of each level. The full algorithm is illustrated in matrix notation in Alg. 2 to Alg. 3. It performs a V-cycle in the residual correction formulation with pre- and post-smoothing of the error. The iterative procedure stops when, on all levels, the $L_2$-norm of the residual is sufficiently small compared to the corresponding right-hand side norm. Since AMReX assigns the grids to the cores independently of the underlying level, the implementation provides special boundary matrices to handle the coarse-fine interfaces. Thus, each AMR level stores up to eight matrices and four vectors represented by Tpetra objects. These are the composite Laplacian matrix, the Laplacian matrix assuming no finer grids, the coarse and the fine boundary matrix, the restriction and the interpolation matrix, the gradient matrices and the matrix selecting all uncovered cells. The vectors per level are the charge density, the electrostatic potential, the residual and the error. Whereas the vectors span the whole level domain, some matrices only cover a subdomain or carry additional information for the coarse-fine interfaces, as shown in Fig. 6. The coarse and fine boundary matrices encompass one side of the Lagrange interpolation stencil, which is completed by the Laplacian matrices. In case of the finest level the composite and the no-fine Laplacian matrices coincide.
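A possible per-level data layout with Tpetra objects is sketched below; the member names are illustrative assumptions and do not reproduce the actual OPAL-AMR symbols.

```cpp
#include <Teuchos_RCP.hpp>
#include <Tpetra_CrsMatrix.hpp>
#include <Tpetra_Vector.hpp>

// Hypothetical sketch of the per-level storage described above.
struct AmrLevel {
    using matrix_t = Tpetra::CrsMatrix<double>;
    using vector_t = Tpetra::Vector<double>;

    Teuchos::RCP<matrix_t> Acomp;  // composite Laplacian
    Teuchos::RCP<matrix_t> Anf;    // Laplacian assuming no finer grids
    Teuchos::RCP<matrix_t> Bcrse;  // coarse part of the interface stencil
    Teuchos::RCP<matrix_t> Bfine;  // fine part of the interface stencil
    Teuchos::RCP<matrix_t> R;      // restriction to the next coarser level
    Teuchos::RCP<matrix_t> Interp; // interpolation from the next coarser level
    Teuchos::RCP<matrix_t> G;      // gradient (one such matrix per dimension)
    Teuchos::RCP<matrix_t> U;      // selects all uncovered cells

    Teuchos::RCP<vector_t> rho;    // charge density
    Teuchos::RCP<vector_t> phi;    // electrostatic potential
    Teuchos::RCP<vector_t> res;    // residual
    Teuchos::RCP<vector_t> err;    // error (residual correction scheme)
};
```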

The pre- and post-relaxation steps on lines 8 and 16 of Alg. 3, respectively, use the algorithms provided by Ifpack2 (e.g. Gauss-Seidel, Jacobi, etc.). The linear system of equations on the coarsest level (Alg. 3, line 20) is solved either by direct solvers available via Amesos2 or by iterative solvers of Belos. Furthermore, an interface to MueLu allows the use of Smoothed Aggregation Algebraic Multigrid (SAAMG) as bottom solver.

1:Level
2:Updated residual on the composite domain
3:function Residual()
4:     if  then
5:         
6:     else
7:         
8:     end if
9:end function
Algorithm 1 Residual evaluation on the composite domain
1:Charge density , electrostatic potential , electric field and finest level
2:Electrostatic potential and electric field
3:function Solve()
4:     for  to  do
5:         Residual() // Initialize residual
6:     end for
7:     
8:     while  do //
9:         Relax() // Start of V-cycle
10:         for  to  do
11:              Residual() // Update residual
12:         end for
13:         
14:     end while
15:     for  to  do
16:          // Average down
17:     end for
18:     for  to  do
19:         for  to  do
20:               // Evaluate electric field
21:         end for
22:     end for
23:end function
Algorithm 2 Main loop of AGMG
1:Level
2:Electrostatic potential
3:function Relax()
4:     if  then
5:         
6:     end if
7:     if  then
8:         
9:         
10:         Smooth() // Pre-smooth: Gauss-Seidel, Jacobi, …
11:         
12:          // Restrict on covered domain
13:          // Residual update on uncovered domain
14:         Relax()
15:          // Prolongation / Interpolation
16:         
17:         
18:         Smooth() // Post-smooth: Gauss-Seidel, Jacobi, …
19:         
20:         
21:     else
22:          // Solve linear system of equations
23:         
24:     end if
25:end function
Algorithm 3 Residual correction V-Cycle
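The recursion of Alg. 3 can be summarized by the following sketch; all helper names are assumptions wrapping the per-level Tpetra operations described above.

```cpp
// Assumed helper routines wrapping the per-level Tpetra operations.
void smooth(int level, int sweeps);     // Ifpack2 relaxation (Gauss-Seidel, Jacobi, ...)
void restrictResidual(int level);       // restrict residual to the next coarser level
void prolongateCorrection(int level);   // add interpolated coarse-grid correction
void solveBottom(int level);            // bottom solver: Amesos2 / Belos / MueLu

// Residual-correction V-cycle following the structure of Alg. 3 (sketch).
void relax(int level) {
    if (level == 0) {
        solveBottom(0);                 // solve the coarsest system
        return;
    }
    smooth(level, 2);                   // pre-smoothing
    restrictResidual(level);            // move the problem to the coarser level
    relax(level - 1);                   // recurse towards the coarsest level
    prolongateCorrection(level);        // prolongation / interpolation
    smooth(level, 2);                   // post-smoothing
}
```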

(a) Cells covered by the Laplacian matrix assuming no finer level.

(b) Coarse portion of the Lagrange interpolation, covered by the coarse boundary matrix.

(c) Restriction on the covered domain by the restriction matrix.

(d) Cells covered by the update of the residual (line 11 of Alg. 3).

(e) Composite Laplacian matrix.

(f) Fine contribution of the Lagrange interpolation, covered by the fine boundary matrix.

(g) Prolongation matrix from the coarser to the finer level.
Figure 6: Cell domain occupied by matrices. Red: Usual cell domain; Green: Physical / mesh boundary; Blue: Fine contribution of Lagrange interpolation; Violet: Coarse contribution of Lagrange interpolation.

5 AGMG Solver Benchmark

In a first example the solver is verified against the built-in Fortran90-based multigrid solver of BoxLib, where 10 Gaussian-shaped bunches are placed in a chain using Dirichlet boundary conditions on the computation domain. This represents a more or less realistic configuration of multi-bunch simulations in high intensity cyclotrons as studied in Yang et al. (2010). The second mini-app shows a comparison with the analytical solution of a uniformly charged sphere in free space. Although AMR does not really make sense for a single bunch simulation in beam dynamics, it is still a perfect mini-app to check for discontinuities at the coarse-fine interfaces of the meshes among levels.
Lastly, the correctness is shown by a mini-app of a perturbed plasma undergoing Landau damping. There, the implementation follows Myers et al. (2017, 2016) but omits the particle remapping. The particles are integrated in time with time step $\Delta t$ using the symplectic second order Leapfrog in the kick-drift-kick formulation (velocity-Verlet) Swope et al. (1982), i.e.

$\mathbf{v}^{n+1/2} = \mathbf{v}^{n} + \dfrac{q\,\Delta t}{2m}\,\mathbf{E}(\mathbf{x}^{n})$
$\mathbf{x}^{n+1} = \mathbf{x}^{n} + \Delta t\,\mathbf{v}^{n+1/2}$
$\mathbf{v}^{n+1} = \mathbf{v}^{n+1/2} + \dfrac{q\,\Delta t}{2m}\,\mathbf{E}(\mathbf{x}^{n+1})$

where the BoxLib solver is again used as a reference for the evaluation of the electric field. All lineplots, projection plots and sliceplots are generated using a custom extension of yt Turk et al. (2011).
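A minimal sketch of one kick-drift-kick step is given below; the field evaluation callback and all names are illustrative assumptions.

```cpp
#include <array>
#include <vector>

using Vec3 = std::array<double, 3>;

// One kick-drift-kick (velocity-Verlet) step for all particles.
// evalField is assumed to deposit charge, solve Poisson's equation and
// gather the electric field back to the particle positions.
template <class FieldEval>
void leapfrogStep(std::vector<Vec3>& x, std::vector<Vec3>& v,
                  double qm,           // charge-to-mass ratio q/m
                  double dt, FieldEval evalField) {
    std::vector<Vec3> E = evalField(x);            // E(x^n)
    for (std::size_t i = 0; i < x.size(); ++i)
        for (int d = 0; d < 3; ++d)
            v[i][d] += 0.5 * qm * dt * E[i][d];    // half kick
    for (std::size_t i = 0; i < x.size(); ++i)
        for (int d = 0; d < 3; ++d)
            x[i][d] += dt * v[i][d];               // drift
    E = evalField(x);                              // E(x^{n+1})
    for (std::size_t i = 0; i < x.size(); ++i)
        for (int d = 0; d < 3; ++d)
            v[i][d] += 0.5 * qm * dt * E[i][d];    // half kick
}
```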

5.1 10 Gaussian-Shaped Bunches

In this mini-app the newly implemented solver is compared to the built-in Fortran90-based solver of BoxLib. Each bunch is initialized with macro particles of equal charge. The particles per bunch are sampled from a one-dimensional Gaussian distribution in each dimension; in the horizontal direction the bunch means are shifted relative to the neighbouring bunches. The problem is solved on a base grid with 2 levels of refinement. At the mesh boundaries the Dirichlet boundary condition is imposed. A projection of the charge density is shown in Fig. 8. As indicated by the lineplots of Fig. 9, both solutions agree.

(a) AGMG.
(b) BoxLib FMG.
Figure 7: Projection onto -plane of the electrostatic potential.
Figure 8: Projection plot of the charge density with adaptive mesh refinement.
(a) Electrostatic potential.
(b) Electric field in .
Figure 9: Lineplots in -direction.

5.2 Uniformly Charged Sphere in Free Space

In this mini-app particles are randomly picked within a sphere of radius $R$ centered at the origin. In order to simplify the comparison to the analytical solution

$E(r) = \dfrac{Q}{4\pi\varepsilon_0}\cdot\begin{cases} r / R^3 & r \leq R \\ 1 / r^2 & r > R \end{cases}$

each particle carries the same charge, such that the total charge of the sphere is $Q$. Thus, the peak value of the electric field, attained at $r = R$, is

$E_{\max} = \dfrac{Q}{4\pi\varepsilon_0 R^2}$
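For reference, a small helper evaluating this analytical field could look as follows (a minimal sketch; names are assumptions):

```cpp
// Analytical radial electric field of a uniformly charged sphere of
// total charge Q [C] and radius R [m], evaluated at distance r [m].
double sphereEField(double r, double Q, double R) {
    constexpr double eps0 = 8.8541878128e-12;       // vacuum permittivity [F/m]
    constexpr double pi   = 3.141592653589793;
    if (r <= R)
        return Q * r / (4.0 * pi * eps0 * R * R * R); // linear growth inside
    return Q / (4.0 * pi * eps0 * r * r);             // Coulomb field outside
}
```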