# A “metric” semi-Lagrangian Vlasov-Poisson solver

###### Abstract

We propose a new semi-Lagrangian Vlasov-Poisson solver. It employs elements of metric to follow locally the flow and its deformation, allowing one to find quickly and accurately the initial phase-space position of any test particle, by expanding at second order the geometry of the motion in the vicinity of the closest element. It is thus possible to reconstruct accurately the phase-space distribution function at any time and position by proper interpolation of initial conditions, following Liouville theorem. When the distortion of the elements of metric becomes too large, it is necessary to create new initial conditions along with isotropic elements and repeat the procedure until the next resampling. To speed up the process, interpolation of the phase-space distribution is performed at second order during the transport phase, while third-order splines are used at the moments of remapping. We also show how to compute accurately the region of influence of each element of metric with a proper percolation scheme. The algorithm is tested here in the framework of one-dimensional gravitational dynamics, but it is implemented in such a way that it can be extended easily to four- or six-dimensional phase-space. It can also be trivially generalised to plasmas.

## 1 Introduction

The dynamics of dark matter in the Universe and of stars in galaxies is governed by the Vlasov-Poisson equations,

$$\frac{\partial f}{\partial t} + v\,\frac{\partial f}{\partial x} - \frac{\partial \phi}{\partial x}\,\frac{\partial f}{\partial v} = 0,$$

$$\Delta_x \phi = 4\pi G \rho, \qquad \rho(x,t) \equiv \int f(x,v,t)\,{\rm d}v,$$

where $f(x,v,t)$ is the phase-space distribution function at position $x$, velocity $v$ and time $t$, $\phi$ the gravitational potential, $G$ the gravitational constant and $\rho$ the projected density. Usually, these equations are solved numerically with an $N$-body approach, where the phase-space distribution function is represented by an ensemble of Dirac functions interacting gravitationally with a softened potential (see, e.g., Hockney and Eastwood 1988; Bertschinger 1998; Colombi 2001; Dolag et al. 2008; Dehnen and Read 2011, and references therein). It is known that such a rough representation of the phase-space fluid is not free of biases or systematic effects (see, e.g., Aarseth et al. 1988; Kandrup and Smith 1991; Boily et al. 2002; Binney 2004; Melott 2007; Joyce et al. 2009; Colombi et al. 2015). With the computational power available today, an alternative approach is becoming possible, consisting in solving Vlasov dynamics directly in phase-space (see, e.g., Yoshikawa et al. 2013). In fact, this point of view is already commonly adopted in plasma physics (see, e.g., Grandgirard et al. 2016, and references therein), but seldom used in the gravitational case.

In this paper, we consider the case where the phase-space distribution function $f$ is a smooth function of the position-velocity vector

$$z \equiv (x, v).$$

This means that initial conditions have non-zero velocity dispersion, which is relevant to the analysis of e.g. galaxies or relaxed dark matter haloes. On the other hand, if the aim were to follow accurately the dynamics of dark matter in the Cold Dark Matter paradigm, the initial velocity dispersion would be null. In that case the problem is different, since one has to describe the foldings of a $D$-dimensional hypersurface evolving in $2D$-dimensional phase-space (see, e.g., Hahn and Angulo 2016; Sousbie and Colombi 2016).

In the warm case considered here, it is possible to represent $f$ on an Eulerian grid. Most of the numerical methods trying to resolve directly the dynamics of the function $f$ have been developed in plasma physics and are of semi-Lagrangian nature. They usually exploit directly Liouville theorem, namely that $f$ is conserved along characteristics. In practice, a test particle is associated to each grid site where the value of the phase-space distribution function has to be evaluated. This test particle is followed backwards in time to find its position at the previous time step. Then, the function $f$ from the previous time step is interpolated at the root of the characteristic to obtain the searched value of $f$, following Liouville theorem. This approach was pioneered by Cheng and Knorr (1976) in a split fashion and first applied to astrophysical systems by Fujiwara (1981), Nishida et al. (1981) and Watanabe et al. (1981). In the classical implementation, interpolation of the phase-space distribution function is performed using a third-order spline.
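As a concrete illustration of this backtracing step, here is a minimal Python sketch (with hypothetical names) of the free-streaming half of a split semi-Lagrangian update; linear interpolation stands in for the third-order spline of the classical implementation:

```python
import numpy as np

def advect_x(f, v, dx, dt):
    """Split semi-Lagrangian x-advection: f(x, v, t + dt) = f(x - v dt, v, t).

    f[i, j] samples f(x_i, v_j) on a grid periodic in x; following Liouville
    theorem, the value at each grid site is fetched at the root of its
    characteristic.  Linear interpolation is used for brevity; the classical
    Cheng & Knorr scheme uses a cubic spline here instead.
    """
    nx = f.shape[0]
    out = np.empty_like(f)
    for j, vj in enumerate(v):
        s = (np.arange(nx) - vj * dt / dx) % nx  # root of the characteristic
        i0 = np.floor(s).astype(int)
        w = s - i0
        out[:, j] = (1.0 - w) * f[i0, j] + w * f[(i0 + 1) % nx, j]
    return out
```

The second half of a split step acts in the same way on the velocity axis, with the acceleration replacing the velocity.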

There exist many subsequent improvements over the splitting algorithm of Cheng & Knorr (see, e.g., Shoucri and Gagne 1978; Sonnendrücker et al. 1999; Filbet et al. 2001; Besse and Sonnendrücker 2003; Crouseilles et al. 2010; Rossmanith and Seal 2011; Qiu and Shu 2011; Güçlü et al. 2014, but this list is far from comprehensive). The main problem common to all these algorithms is the unavoidable diffusion due to repeated resampling of the phase-space distribution function. Indeed, because of the finite grid resolution, repeated interpolation, however sophisticated it may be, induces information loss and an increase of entropy. One way to fix this problem is to employ adaptive mesh refinement (see, e.g., Gutnic et al. 2004; Besse et al. 2008, 2017), i.e. to augment local resolution when the level of detail calls for it. However, diffusion can also be significantly dampened if remapping of the phase-space distribution function is performed as seldom as possible. One way to achieve this is to represent $f$ with an ensemble of elements of which the shape is allowed to change with time according to the Lagrangian equations of motion (see, e.g., Alard and Colombi 2005; Campos Pinto et al. 2014; Larson and Young 2015). In the “cloudy” method that we proposed in Alard and Colombi (2005), the phase-space distribution is sampled on an ensemble of overlapping round Gaussian functions of which the contours transform into ellipsoids following the phase-space flow. When the ellipsoids become too elongated, the phase-space distribution function is resampled with new round clouds. Because resampling is performed seldom, say at fractions of dynamical times, diffusion effects are much less prominent, even though coarse graining is still performed at the resolution scale given by the inter-cloud centre spacing.
The problem with this remapping approach is its computational cost: to have a smooth and accurate representation of the phase-space distribution function, the clouds need to overlap significantly, and the extra cost due to this overlap becomes prohibitive in a high number of dimensions.

However, one realizes that it is in fact not necessary to represent the phase-space distribution function on a set of clouds: the only important piece of information is the metric intrinsically associated to each cloud of the previous approach, i.e. the information on how the function $f$ is deformed by the motion. If one is indeed able to follow the properties of the motion up to some order in the neighbourhood of test particles, which we call metric elements, reconstructing the initial position of any point nearby is straightforward, and so is the calculation of $f$ from interpolation of initial conditions following Liouville theorem.

Our algorithm, Vlamet, described in detail in § 2, can thus be summarised as follows. It consists in sampling at all times the phase-space distribution function on a grid of fixed resolution and following $f$ at second order in space and time using a predictor-corrector time integration scheme. The calculation of $f$ exploits directly Liouville theorem, namely that $f$ is conserved along the characteristics, i.e. is equal to its initial value at the Lagrangian position. To follow these characteristics accurately, we define a set of elements of metric that sparsely sample the grid. These elements of metric follow the Lagrangian equations of motion and carry along time local information about the deformation of phase-space up to second order, to account for the curvature of phase-space structures building up during the course of the dynamics. To resample $f$ at time $t$ and at a given point $z$ of phase-space, we consider the closest elements of metric in Lagrangian space and reconstruct the initial position of a test particle coinciding with $z$ using a second-order expansion of the geometry of the flow around the metric elements. Once the initial position $q$ is found, we just need to interpolate the initial condition at position $q$ and attribute its value to $f(z,t)$.

At some point in time, the distortion of phase-space becomes too strong to be simply represented at second order: the current state of $f$ is then taken as a new initial condition and new isotropic elements of metric are created. Between the initial time and this resampling time, we use a cheap, second-order interpolation to reconstruct $f$, while resampling itself is more accurate, using third-order splines. Indeed, although our algorithm is meant to be globally accurate only at second order, a third-order description of the phase-space distribution function is essential at the moment of resampling in order to conserve quantities such as total energy and total mass in the long term. Furthermore, to preserve the smoothness of the resampling procedure, we perform some interpolation in Lagrangian space between the initial positions proposed by each element of metric close to the point of interest; indeed, different neighbouring metric elements can provide different answers. The procedure is illustrated by Fig. 1.

In § 3, we thoroughly test the performance of our algorithm, by checking how a stationary solution is preserved and by simulating systems with a Gaussian initial condition as well as random sets of haloes. To demonstrate the potential improvements brought by our code Vlamet compared to traditional semi-Lagrangian approaches, we compare our algorithm to the traditional splitting method of Cheng and Knorr (1976) with third-order spline interpolation. Results are also tested against simulations performed with the entropy-conserving waterbag code of Colombi and Touma (2014). The waterbag scheme, extremely accurate but very costly, is meant to provide a supposedly “exact” solution. It is mainly used to test the convergence of Vlamet and of the splitting algorithm for the Gaussian initial condition. On the purely mathematical side, which is not really approached in this article, it is worth mentioning the recent work of Campos Pinto and Charles (2016), who discuss some convergence properties of a simplified version of our algorithm.

## 2 Algorithm

Now we describe the main steps of the algorithm, namely, (i) our second-order representation of the metric (§ 2.1), (ii) our interpolation schemes to reconstruct $f$ (§ 2.2), (iii) our percolation algorithm to find the nearest element of metric to an Eulerian pixel, together with the reconstruction of its Lagrangian counterpart using an interpolation between the alternatives proposed by the neighbouring elements of metric (§ 2.3), (iv) the equations of motion as well as their numerical implementation (§ 2.4) and, finally, (v) the calculation of the force and of its first and second derivatives, required to follow the second-order representation of the local metric during motion (§ 2.5).

While the actual implementation of the algorithm is performed in 2D phase-space, generalisation to a higher number of dimensions will be discussed in detail. Indeed, our algorithm is meant to improve on other standard semi-Lagrangian solvers in terms of diffusion while being of comparable cost, and is designed in such a way that it can be easily generalised to 4- and 6-dimensional phase-space.

### 2.1 Second order representation of the metric

The Lagrangian position of each fluid element in phase-space is defined by

$$q \equiv (x_0, v_0),$$

where $x_0$ and $v_0$ are its initial position and velocity, respectively. Here we consider 2-dimensional phase-space, so these quantities are scalars, but in general they are vectors. In what follows, unless specified otherwise, all the calculations apply to the general case: one just has to replace $x_0$ and $v_0$ with the vectors $\bm{x}_0$ and $\bm{v}_0$. The Eulerian position at time $t$ of an element of fluid in phase-space is given by

$$P(t) \equiv [x(t), v(t)],$$

where $x(t)$ and $v(t)$ are its position and velocity at time $t$. Initially,

$$P(t=0) = q.$$

In our second-order approach, we need the deformation tensor,

$$T_{ij}(t) \equiv \frac{\partial P_i}{\partial q_j},$$

as well as the Hessian of the characteristics,

$$H_{ijk}(t) \equiv \frac{\partial^2 P_i}{\partial q_j\,\partial q_k}.$$

With the knowledge at all times of the position $P_{\rm m}(t)$, the deformation tensor $T$ and the Hessian $H$ of each element of metric, we can predict, at second order, the position of an element of fluid initially sufficiently close to an element of metric of Lagrangian position $q_{\rm m}$:

$$P_i(t) \simeq P_{{\rm m},i}(t) + T_{ij}\,(q - q_{\rm m})_j + \frac{1}{2}\,H_{ijk}\,(q - q_{\rm m})_j\,(q - q_{\rm m})_k.$$

Of course the series expansion could be developed further, but we restrict ourselves here to an algorithm at second order in space and time. Indeed, going beyond second order would be very costly in a higher number of dimensions, particularly 6-dimensional phase-space. The interesting bit, as we shall see later, is the inverse relation between the Eulerian position $P$ and the Lagrangian position $q$, i.e., at second order,

$$q_i \simeq q_{{\rm m},i} + T^{-1}_{ij}\,\Delta P_j - \frac{1}{2}\,T^{-1}_{ij}\,H_{jkl}\,\left(T^{-1}\Delta P\right)_k \left(T^{-1}\Delta P\right)_l, \qquad \Delta P \equiv P - P_{\rm m}.$$
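The forward expansion and its second-order inverse can be sketched as follows (a Python illustration under the notations above, with `einsum` performing the contraction of the Hessian with the displacements; names are hypothetical):

```python
import numpy as np

def forward_map(q, qm, Pm, T, H):
    """Second-order prediction of the Eulerian position P of a fluid element
    of Lagrangian position q, near a metric element (qm, Pm, T, H).
    T : (2, 2) deformation tensor dP_i/dq_j ; H : (2, 2, 2) Hessian."""
    dq = q - qm
    return Pm + T @ dq + 0.5 * np.einsum('ijk,j,k->i', H, dq, dq)

def inverse_map(P, qm, Pm, T, H):
    """Second-order inverse: recover the Lagrangian position q from P.
    The first-order guess u = T^-1 (P - Pm) is corrected by the Hessian term,
    leaving a residual of third order in the displacement."""
    Tinv = np.linalg.inv(T)
    u = Tinv @ (P - Pm)
    return qm + u - 0.5 * Tinv @ np.einsum('ijk,j,k->i', H, u, u)
```

Composing the two maps for a small displacement leaves an error that scales as the cube of the distance to the metric element, consistent with the second-order truncation.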

In the two-dimensional phase-space case considered here, the deformation tensor is a 4-element matrix with no special symmetry, except that, according to the Hamiltonian nature of the system, its determinant should be conserved with time and equal to its initial value:

$$\det T = 1$$

at all times. Indeed,

$$T(t=0) = I,$$

where $I$ is the identity matrix. Note that this property is general, as volume is conserved in phase-space; it is not specific to the one-dimensional dynamical case. We have,

$$\det T = T_{11}\,T_{22} - T_{12}\,T_{21} = 1,$$

hence $T$ has only 3 independent elements. In practice we shall not impose this constraint, but checking to which extent it is verified can be used as a test of the good numerical behaviour of the code or, equivalently, as a test of its symplecticity.

In a higher number of dimensions, phase-space volume conservation is not the only constraint. In fact it is a consequence of the symplectic nature of the system, which hence conserves Poincaré invariants, in particular symplectic forms of the kind

$$\omega = \Delta P_1^{\rm T}\, J\, \Delta P_2,$$

where

$$J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$$

is the symplectic matrix, while $\Delta P_1$ and $\Delta P_2$ correspond to 2 sides of an elementary triangle composed of three particles following the equations of motion in phase-space. In particular,

$$T^{\rm T} J\, T = J.$$

So the generalisation of the determinant constraint to a higher number of dimensions provides the following conditions. Writing the deformation tensor in block form, $T = \left(\begin{smallmatrix} A & B \\ C & D \end{smallmatrix}\right)$ with $A = \partial \bm{x}/\partial \bm{x}_0$, $B = \partial \bm{x}/\partial \bm{v}_0$, $C = \partial \bm{v}/\partial \bm{x}_0$ and $D = \partial \bm{v}/\partial \bm{v}_0$, the condition $T^{\rm T} J\, T = J$ reads

$$A^{\rm T} C - C^{\rm T} A = 0,$$

$$B^{\rm T} D - D^{\rm T} B = 0,$$

$$A^{\rm T} D - C^{\rm T} B = I,$$

which can again be used to double check, a posteriori, the good behaviour of the code in the general case.
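Such an a posteriori check can be sketched as a small diagnostic (a Python illustration with a hypothetical helper name, valid for any even matrix dimension):

```python
import numpy as np

def symplecticity_error(T):
    """Largest entry of |T^t J T - J|: zero (to round-off) for the exact
    tangent map of any Hamiltonian flow.  In 2-D phase-space this reduces
    to the determinant condition |det T - 1|."""
    n = T.shape[0] // 2
    J = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-np.eye(n), np.zeros((n, n))]])
    return np.abs(T.T @ J @ T - J).max()
```

Monitoring this residual along the run quantifies how well the integration preserves the symplectic structure.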

In the 2-dimensional phase-space case considered here, the Hessian is a $2\times2\times2$ tensor in which only 6 terms are independent if one just takes into account the symmetry in the second derivatives, $H_{ijk} = H_{ikj}$. In 4- and 6-dimensional phase-space, the same symmetry implies that, among their 64 and 216 elements, the Hessians include 40 and 126 independent terms, respectively.

While exploiting the Hamiltonian nature of the system would allow us to carry less information, we chose for each element of metric to store the phase-space coordinates, the deformation tensor and the Hessian. The calculations above show that this ends up in carrying 2+4+6=12, 4+16+40=60 and 6+36+126=168 numbers for each element of metric in 2-, 4- and 6-dimensional phase-space, respectively. This might look like a large number, particularly in 4- and 6-dimensional phase-space, but one has to keep in mind that the elements of metric are much sparser than the grid used to compute the phase-space distribution function. For instance, a factor of 3 in each direction between the spacing of the elements of metric and the grid cell size implies that there is one element of metric per $3^D$ grid sites, i.e. per 9, 81 and 729 grid sites in 2, 4 and 6 dimensions, respectively. Another issue is the cost of applying the inverse mapping, but it can be reduced considerably by precomputing for each element of metric the matrices $T^{-1}$ and $T^{-1}H$ prior to any interpolation of the phase-space distribution function using the inverse mapping.
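The bookkeeping above can be summarised by a small helper (a sketch; `d` denotes the configuration-space dimension):

```python
def metric_element_storage(d):
    """Scalars stored per metric element in 2d-dimensional phase-space:
    position (n numbers), deformation tensor (n^2 numbers) and Hessian
    with symmetric last two indices (n * n(n+1)/2 numbers), with n = 2d."""
    n = 2 * d
    return n + n**2 + n * n * (n + 1) // 2
```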

### 2.2 Interpolation schemes

We use two interpolation schemes. Between two coarse steps, a period of time during which the phase-space distribution function is transported along the motion using the same elements of metric at each time step, we use a simple interpolation of $f$ based on local fitting of a quadratic hypersurface. Then, during remapping, i.e. when new elements of metric are set, $f$ is interpolated with a third-order spline in order to have a more accurate reconstruction of phase-space. The advantage of using second-order interpolation during the transport phase is that generalisation to 6-dimensional phase-space is relatively cheap, since only 28 terms contribute in this case to such an interpolation. The loss of accuracy due to the relative roughness of this reconstruction, which is not even continuous, has little consequence on the dynamics because most of the information is recovered during the resampling phase. This will be tested and demonstrated in § 3. On the other hand, because our method does not allow for splitting, $4^6 = 4096$ terms contribute to each grid site when resampling in 6 dimensions with the third-order spline interpolation, if one assumes that the latter takes about 4 operations per grid site in one dimension. This rather costly step is thus roughly $4^6/(6 \times 4) \sim 170$ times more expensive than what is expected when performing a full time step in the standard splitting algorithm. (This estimate assumes that third-order spline interpolation is used in the splitting algorithm. In a split fashion, this interpolation is performed individually on each axis, which corresponds, in 6 dimensions, to $6 \times 4 = 24$ operations per grid site for a full time step, or slightly more if synchronisation between velocities and positions is performed.)
It is, however, performed rarely, hence making the algorithm of potentially comparable cost to the standard semi-Lagrangian approach, with the advantage of being, in principle, less diffusive.

Now we provide the details for the second order interpolation we use, followed by the standard third order spline interpolation.

#### 2.2.1 Second order interpolation

The general procedure for second-order interpolation starts from the observation that a second-order hypersurface requires respectively 6, 15 and 28 points in 2D, 4D and 6D phase-space. What we have access to are the values of the phase-space distribution function on a grid, e.g., in one dimension, $f_{i,j} = f(x_i, v_j)$, where $i$ and $j$ are integers, corresponding to the following Lagrangian positions in phase-space,

$$x_i = x_{\rm c} + i\,\Delta x, \qquad v_j = v_{\rm c} + j\,\Delta v,$$

where $\Delta x$ and $\Delta v$ are the steps chosen for the grid (Eulerian or Lagrangian), which remain fixed during runtime in the current version of the code, and $(x_{\rm c}, v_{\rm c})$ corresponds to the middle of the Lagrangian computational domain. The bounds of the computational domain are then related to the maximum values of $|i|$ and $|j|$.

This is the way we proceed in 2D phase-space: given a point $z=(x,v)$ in phase-space, we first find the nearest grid point $z_{i,j}$, then consider the cross composed of $z_{i,j}$, $z_{i\pm1,j}$ and $z_{i,j\pm1}$, on which the interpolation must coincide with $f$. The last point needed is taken to be the one for which the square composed of 3 points of the cross and itself contains $z$ (see Fig. 2). With this choice of grid points for the interpolation, we have, at second order,

with

(2.0) | |||||

(2.0) | |||||

(2.0) | |||||

(2.0) |

where $[\,\cdot\,]$ and $\lfloor\,\cdot\,\rfloor$ are respectively the nearest-integer and the integer-part functions. Note that, when defining the edges of the computing domain, one obviously makes sure that the interpolation is defined everywhere. In the present implementation, $f$ is supposed to be different from zero on a compact domain in phase-space. To avoid artificial enlargement of the computing domain during runtime due to diffusion and aliasing, the computing domain is restricted to a rectangular region containing all the points where

(2.0) |

with the threshold a small number. Note the absolute value in this criterion: because our algorithm does not enforce positivity of $f$, better total mass conservation is obtained by keeping the regions where $f < 0$.
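A possible implementation of this 6-point quadratic fit is sketched below (Python, hypothetical names, grid units for simplicity; the coefficients are obtained by a direct linear solve rather than the closed-form expressions given above):

```python
import numpy as np

def quadratic_interp(fgrid, x, v):
    """Second-order interpolation of f at (x, v), in grid units: the stencil
    is the cross around the nearest grid point plus one diagonal point on
    the side of (x, v), through which a quadratic surface is fitted."""
    i, j = int(round(x)), int(round(v))
    si = 1 if x - i >= 0 else -1   # side hosting the diagonal point
    sj = 1 if v - j >= 0 else -1
    pts = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1), (si, sj)]
    A = np.array([[1, p, q, p * p, q * q, p * q] for (p, q) in pts])
    b = np.array([fgrid[i + p, j + q] for (p, q) in pts])
    c = np.linalg.solve(A, b)       # coefficients of the quadratic surface
    p, q = x - i, v - j
    return c @ np.array([1, p, q, p * p, q * q, p * q])
```

By construction the interpolation is exact for any quadratic function of the phase-space coordinates, which is the defining property of the scheme.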

The generalisation of this second-order interpolation to a higher number of dimensions is straightforward: the most difficult part is to choose the grid points through which the hypersurface must pass. To achieve this, similarly to the 2-dimensional case, we first find the nearest grid point to $z$. Then we take the collection of all segments centred on it and composed of 3 points of the grid (the cross defined above in 2 dimensions): this provides in total 9 and 13 points respectively in 4 and 6 dimensions. Then, we project $z$ on the planes defined by each pair of such segments and find the grid site in each plane such that the square composed of this grid site plus 3 points contained in the 2 segments contains the projection of $z$, and we add it to our list of points defining the second-order hypersurface. There are 6 and 15 such points in 4 and 6 dimensions, respectively, and we thus end up with respectively 15 and 28 points, as needed to interpolate our hypersurface, with straightforward generalisation of the interpolation formula above.

#### 2.2.2 Standard third order spline

The third-order interpolation we use employs standard cubic B-splines, as very clearly described in e.g. Crouseilles et al. (2009), so we do not provide the details here, except for the edge conditions, which are

(2.0) | |||||

(2.0) |

and this can be trivially generalised to 4- or 6-dimensional phase-space.
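For completeness, here is a sketch of uniform cubic B-spline interpolation in one dimension (Python; the clamped ghost-coefficient edge handling below is a simplifying assumption of this sketch, not the edge conditions of the text):

```python
import numpy as np

def bspline3_coeffs(f):
    """Solve the tridiagonal system (c[k-1] + 4 c[k] + c[k+1]) / 6 = f[k]
    for the cubic B-spline coefficients; ghost coefficients are clamped
    (c[-1] = c[0], c[n] = c[n-1]), a simplifying edge assumption."""
    n = len(f)
    A = (np.diag(np.full(n, 4.0)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1))
    A[0, 0] += 1.0    # fold the clamped ghost coefficients into the system
    A[-1, -1] += 1.0
    return np.linalg.solve(A / 6.0, f)

def bspline3_eval(c, x):
    """Evaluate the spline at x (grid units) from four nearby coefficients."""
    i = int(np.floor(x))
    t = x - i
    # the four cubic B-spline weights for fractional offset t in [0, 1)
    w = np.array([(1 - t)**3, 4 - 6*t**2 + 3*t**3,
                  1 + 3*t + 3*t**2 - 3*t**3, t**3]) / 6.0
    idx = np.clip(np.arange(i - 1, i + 3), 0, len(c) - 1)
    return w @ c[idx]
```

In higher dimensions, the same one-dimensional solve and evaluation are applied along each axis in turn, the coefficient array being separable on a regular grid.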

### 2.3 Tracing back the characteristics in Lagrangian space: percolation algorithm

To determine the value of $f(z,t)$, we have to trace the trajectory of a test particle coinciding with $z$ back to its Lagrangian, initial position (in general, the position at the time of the last remapping). To do this, we need to find the closest element of metric to this test particle and use the second-order inverse mapping to compute the Lagrangian position $q$. However, the closest element of metric, from the dynamical point of view, has to be found in Lagrangian space, which complicates the procedure. Furthermore, two close test particles might end up associated to different elements of metric that propose slightly different mappings, implying that the reconstructed function $f$ is discontinuous if no supplementary treatment is performed. While this is not a big deal during the transport phase, it can have dramatic consequences in the long term during the remapping step, which requires a high level of precision when interpolating the phase-space distribution function with third-order splines. Hence, each time a full remapping of $f$ is performed, one needs to find not one but several elements of metric in the neighbourhood of a test particle, so as to be able to interpolate between the Lagrangian positions proposed by each of these metric elements and keep the function $f$ smooth. This is the most complex part of the algorithm, because enforcing full continuity of the interpolation is not trivial: it has to be performed in Lagrangian space. The percolation algorithm, however, allows us to define in an unambiguous way the region of influence of each element of metric, which in turn will allow us to make the function $f$ smooth.

We assume that elements of metric are initially set on a Lagrangian grid covering the computing domain, , corresponding to Lagrangian positions

(2.0) |

with

(2.0) |

The actual computing domain changes subsequently with time. To estimate it at each time step, we use the positions of the metric elements and define a rectangle whose lower-left and upper-right corners are given respectively by the minima and maxima of the metric-element coordinates. This means that the coverage by elements of metric should be wide enough to avoid missing regions where $f$ should be non-zero; we therefore allow for an extra layer of metric elements on all sides of the Lagrangian computing domain. Note that this calculation of the computing domain is thus far from optimal, which can have dramatic consequences for the cost of the algorithm in 6 dimensions. In order to fix this problem, one would need to define a more complex, shape-dependent computational domain, e.g. by constructing a k-d tree, i.e. a spatial decomposition of phase-space with nested cubes. Such a k-d tree could also constitute the basis for an adaptive mesh refinement algorithm following in an optimal way not only the computational domain, but also all the details of the phase-space distribution function when needed. However, such sophistication is not needed in our 2-dimensional implementation, so we stick to the rectangular computing domain for now.

#### 2.3.1 Percolation

To compute the region of influence of each element of metric, we percolate around it on the Eulerian computational grid, starting from its nearest grid point

(2.0) |

with the middle of the Eulerian computational domain being time-dependent. The initial Lagrangian position of this point is estimated with the second-order inverse mapping, using the element of metric we started from as a reference. Note that another neighbouring element of metric would propose a slightly different value for the initial position of the test particle associated to this Eulerian grid site, hence the dependence of the reconstructed Lagrangian position on the element of metric considered. One then defines the function

(2.0) | |||||

(2.0) |

where $q_{\rm m}$ is the Lagrangian position of the element of metric, to find out which neighbours of the element of metric are susceptible to be closer, in Lagrangian space, to the true initial position. Because the calculation of the Lagrangian position is not exact, we need to introduce the concept of uncertainty. Considering all the neighbours of the element of metric under consideration,

(2.0) |

we are able to choose the one which proposes the smallest value of

(2.0) |

where the dependence on the Eulerian grid site has been made implicit. If a candidate element of metric is closer to the Eulerian grid element than the current one in this Lagrangian distance, the percolation process is stopped. Otherwise, one continues (bond) percolation, i.e. proceeds with the neighbouring grid sites $(i\pm1, j)$ and $(i, j\pm1)$, unless they have already been processed.

Note that the percolation procedure just defined is not always free of aliasing: it can be successful only if the separation between elements of metric is sufficiently large compared to the grid element size and if their zones of influence are not too distorted. Testing the validity of the percolation process is relatively simple: one checks whether some points inside the computing domain have been left unexplored. If this is the case, percolation is performed again locally until all the unexplored points are processed, which induces a small additional cost in the algorithm.
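The flood-fill logic of this procedure can be sketched as follows (a Python illustration with hypothetical names; `dist(m, i, j)` stands in for the Lagrangian distance between grid site $(i,j)$, as remapped by element $m$, and the Lagrangian position of element $m$):

```python
from collections import deque
import numpy as np

def regions_of_influence(shape, seeds, dist):
    """Bond percolation assigning each grid site to its 'closest' seed:
    each metric element floods outward from its nearest grid point, and a
    site is (re)claimed only when a strictly smaller Lagrangian distance
    is proposed; the flood stops where a neighbour is already closer."""
    owner = -np.ones(shape, dtype=int)
    best = np.full(shape, np.inf)
    fronts = [deque([(int(round(x)), int(round(y)))]) for (x, y) in seeds]
    progress = True
    while progress:
        progress = False
        for m, front in enumerate(fronts):
            for _ in range(len(front)):
                i, j = front.popleft()
                if not (0 <= i < shape[0] and 0 <= j < shape[1]):
                    continue
                if dist(m, i, j) >= best[i, j]:
                    continue      # another element is closer: stop here
                best[i, j] = dist(m, i, j)
                owner[i, j] = m
                front.extend([(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)])
                progress = True
    return owner
```

With a well-behaved distance, each element claims a connected region around its seed, which is the region of influence used in the text.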

To speed up the process, which is not really relevant here in 2D phase-space but crucial in higher number of dimensions, we reduce the range of investigations of values of to where

(2.0) | |||||

(2.0) | |||||

(2.0) |

with the dependence on the grid site still implicit and with a margin parameter close to unity. Similar conditions are set to define an interval in the velocity direction. This margin parameter is introduced to take into account the uncertainty on the remapping. This is illustrated by Fig. 3.

However, even though we can actually estimate this uncertainty, as described below in § 2.3.3, we do not use it to calculate the margin parameter, which is fixed from the beginning. Indeed, our implementation favours performance over perfect accuracy, to make a future extension to 6-dimensional phase-space possible: while it is better to find in Lagrangian space the actual nearest element of metric to a test particle, this is not a must, and the algorithm can still work without it, at the cost of some loss of accuracy in the long run.

#### 2.3.2 Interpolation of the Lagrangian position of a test particle

We now explain how the Lagrangian position of a test particle can be calculated in a smooth way by proper interpolation, in Lagrangian space, between the predictions given by each neighbouring element of metric. We define the following mapping function from Eulerian to Lagrangian space,

(2.0) |

where the Lagrangian-position function is the obvious extension to the continuum of its discrete counterpart defined previously, and $W$ is a smooth, isotropic, compactly supported kernel function. Clearly, the mapping as defined above is at least continuous. Our choice for $W$ is the cubic spline of Monaghan (1992),

$$W(u) \propto \begin{cases} 1 - \dfrac{3}{2}u^2 + \dfrac{3}{4}u^3, & 0 \leq u \leq 1, \\[4pt] \dfrac{1}{4}(2-u)^3, & 1 \leq u \leq 2, \\[4pt] 0, & u > 2, \end{cases}$$
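A sketch of this kernel (up to normalisation, which cancels out in the weighted average above):

```python
def monaghan_w(u):
    """Cubic spline kernel of Monaghan (1992), unnormalised: C^2, compactly
    supported on u < 2, used to blend the Lagrangian positions proposed by
    neighbouring metric elements."""
    if u < 1.0:
        return 1.0 - 1.5 * u**2 + 0.75 * u**3
    if u < 2.0:
        return 0.25 * (2.0 - u)**3
    return 0.0
```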

which preserves the second-order nature of the Taylor expansion around each metric element position. Our choice of $W$ allows us to restrict the range of investigation to the neighbourhood of the closest element of metric to the point under consideration, as computed in the previous paragraph, § 2.3.1, for the sites of the grid. (Obviously, we assume here that the errors in the remapping are smaller than the inter-metric-element separation.) Here, we indeed only need to consider points associated to such an Eulerian grid. To reduce further the number of elements of metric investigated to perform the interpolation, while still taking into account small variations related to the previously discussed uncertainty on the calculation of the Lagrangian position, we restrict the search to intervals defined as follows, with the same notations as above:

(2.0) | |||||

(2.0) | |||||

(2.0) |

where a small parameter accounts for the error on the remapping, and similarly in the velocity direction, with additional constraints imposed on the corners to gain even more time.

Again, as stated at the end of § 2.3.1, the above algorithm is not guaranteed to provide a smooth interpolation if the error on the remapping associated with the metric elements becomes larger than some threshold. However, this should have little consequence on the accuracy of the algorithm, and the gain in speed brought by this procedure in high-dimensional phase-space is considerable. Indeed, if the remapping error becomes too large, the loss of accuracy on the characteristics themselves is much more dramatic than the small discontinuities introduced in their interpolation.

#### 2.3.3 Estimates of errors on the Lagrangian position and choice of the margin parameters

In the two subsections above, we introduced two margin parameters but did not provide any numerical values. In principle these parameters could be user-defined, but we chose to fix them:

(2.0) | |||||

(2.0) |

The idea behind this choice is the following. Firstly, according to Figure 5, since we do not want to consider any diagonal element of metric in the percolation step, we must have, for full consistency,

(2.0) |

Because there is some error on the reconstructed Lagrangian position, we take half this value, assuming that

(2.0) |

Hence the first margin parameter is fixed so as to take into account this upper bound on the errors. With this setting, the algorithm is fully consistent only if the uncertainty on the remapped trajectories remains small enough; but even if the errors exceed this bound, the algorithm still works, the function $f$ is simply not guaranteed to be completely smooth anymore.

Note that in our algorithm, a global error is estimated during the percolation process, by comparing the Lagrangian position proposed by a neighbouring element of metric to the one proposed by the element of metric under consideration, using the Lagrangian distance defined above. Such an error can in principle be used to force a global remapping when it exceeds some threshold, for instance to preserve the full smoothness of the function $f$. While this is an option in our code, we do not use it for the tests performed in § 3. Indeed, we found it difficult to find a reliable criterion on this error, and preferred to keep the number of time steps between full resamplings fixed and see how the results depend on this choice, being aware of the fact that this is potentially suboptimal. Note also that the errors could be estimated as the residuals from a third-order description of the local metric elements, but this is very costly in high-dimensional phase-space.

### 2.4 Equations of motion

In $2D$-dimensional phase-space, the equations characterising the evolution of an element of metric with time are given by the following expressions, with some obvious new notations and implicit summation over repeated indices:

$$\frac{{\rm d} x_i}{{\rm d} t} = v_i,$$

$$\frac{{\rm d} v_i}{{\rm d} t} = a_i(x),$$

$$\frac{{\rm d} T^{(x)}_{ij}}{{\rm d} t} = T^{(v)}_{ij},$$

$$\frac{{\rm d} T^{(v)}_{ij}}{{\rm d} t} = \frac{\partial a_i}{\partial x_l}\, T^{(x)}_{lj},$$

$$\frac{{\rm d} H^{(x)}_{ijk}}{{\rm d} t} = H^{(v)}_{ijk},$$

$$\frac{{\rm d} H^{(v)}_{ijk}}{{\rm d} t} = \frac{\partial a_i}{\partial x_l}\, H^{(x)}_{ljk} + \frac{\partial^2 a_i}{\partial x_l\,\partial x_n}\, T^{(x)}_{lj}\, T^{(x)}_{nk},$$

where $a(x,t)$ is the acceleration calculated by solving the Poisson equation, $T^{(x)}_{ij} \equiv \partial x_i/\partial q_j$ and $T^{(v)}_{ij} \equiv \partial v_i/\partial q_j$ are the position and velocity parts of the deformation tensor, and similarly for the Hessian. Note thus that we need to have access at all times to the gradient and the Hessian of the gravitational force. In these equations we have dropped the “m” subscript for simplicity.

The actual time-step implementation uses a splitting algorithm with a drift phase during half a time step, where only spatial positions are modified, followed by a kick phase, where only velocities are modified during a full time step using the acceleration computed after the drift, followed again by a drift during half a time step, where spatial positions are modified using the new velocities.

The first drift step, or predictor, is thus written

(2.0) $x^{n+1/2} = x^n + \frac{\Delta t}{2}\, v^n,$

(2.0) $\left(\frac{\partial x}{\partial q_i}\right)^{n+1/2} = \left(\frac{\partial x}{\partial q_i}\right)^{n} + \frac{\Delta t}{2} \left(\frac{\partial v}{\partial q_i}\right)^{n},$

and likewise for the second-order derivatives, after which the Poisson equation is solved to compute the acceleration, its gradient and its Hessian. To compute the acceleration, we first need to estimate the phase-space distribution function on the Eulerian grid. To map back to initial conditions (or to the previous stage after a full remapping), we use the elements of metric as transformed by the predictor step.

The kick phase reads

(2.0) $v^{n+1} = v^n + \Delta t\, a\!\left(x^{n+1/2}\right),$

(2.0) $\left(\frac{\partial v}{\partial q_i}\right)^{n+1} = \left(\frac{\partial v}{\partial q_i}\right)^{n} + \Delta t \sum_k \frac{\partial a}{\partial x_k} \left(\frac{\partial x_k}{\partial q_i}\right)^{n+1/2},$

(2.0) $\left(\frac{\partial^2 v}{\partial q_i\,\partial q_j}\right)^{n+1} = \left(\frac{\partial^2 v}{\partial q_i\,\partial q_j}\right)^{n} + \Delta t \left[ \sum_k \frac{\partial a}{\partial x_k} \left(\frac{\partial^2 x_k}{\partial q_i\,\partial q_j}\right)^{n+1/2} + \sum_{k,l} \frac{\partial^2 a}{\partial x_k\,\partial x_l} \left(\frac{\partial x_k}{\partial q_i}\right)^{n+1/2} \left(\frac{\partial x_l}{\partial q_j}\right)^{n+1/2} \right].$

Finally, the second drift phase, or corrector, is expressed as follows:

(2.0) $x^{n+1} = x^{n+1/2} + \frac{\Delta t}{2}\, v^{n+1},$

(2.0) $\left(\frac{\partial x}{\partial q_i}\right)^{n+1} = \left(\frac{\partial x}{\partial q_i}\right)^{n+1/2} + \frac{\Delta t}{2} \left(\frac{\partial v}{\partial q_i}\right)^{n+1},$

(2.0) $\left(\frac{\partial^2 x}{\partial q_i\,\partial q_j}\right)^{n+1} = \left(\frac{\partial^2 x}{\partial q_i\,\partial q_j}\right)^{n+1/2} + \frac{\Delta t}{2} \left(\frac{\partial^2 v}{\partial q_i\,\partial q_j}\right)^{n+1}.$
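To make the drift-kick-drift cycle described above concrete, here is a minimal sketch in Python for one phase-space dimension, evolving an element's centre together with its first-order deformation (the tangent map). The function names `accel` and `grad_accel` are hypothetical placeholders for the Poisson-based force evaluation, and the second-order derivatives of the map are omitted for brevity; this is an illustration of the scheme, not the paper's code.

```python
import numpy as np

def leapfrog_step(x, v, dxdq, dvdq, accel, grad_accel, dt):
    """One drift-kick-drift step for a metric element's centre (x, v) and
    its first-order deformation (dxdq, dvdq), where q = (x0, v0) are the
    Lagrangian phase-space coordinates.  `accel` and `grad_accel` are
    user-supplied callables (hypothetical names) returning a(x) and da/dx."""
    # Predictor: drift positions (and their q-derivatives) by dt/2.
    x_half = x + 0.5 * dt * v
    dxdq_half = dxdq + 0.5 * dt * dvdq
    # Kick: update velocities over a full step with the acceleration at the
    # drifted position; the tangent map is kicked with the force gradient.
    a = accel(x_half)
    da = grad_accel(x_half)
    v_new = v + dt * a
    dvdq_new = dvdq + dt * da * dxdq_half
    # Corrector: drift positions by dt/2 using the new velocities.
    x_new = x_half + 0.5 * dt * v_new
    dxdq_new = dxdq_half + 0.5 * dt * dvdq_new
    return x_new, v_new, dxdq_new, dvdq_new
```

For a harmonic force a(x) = -x, both the trajectory and the deformation matrix should rotate in phase-space, which provides a quick sanity check of the implementation.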

With our predictor-corrector scheme, it is possible to adopt a slowly variable time step. In this case, we use the following dynamical constraint (see, e.g., Alard and Colombi 2005; Colombi and Touma 2014),

(2.0) $\Delta t \leq \frac{C_{\rm dyn}}{\sqrt{\rho_{\rm max}}},$

where $\rho_{\rm max}$ is the maximum value of the projected density, $\rho_{\rm max} = \max_x \rho(x,t)$. Note that for the tests performed in § 3 we used a constant time step, but this should not have critical effects on the numerical results.
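A minimal sketch of such a density-based time-step criterion is given below. The parameter names (`c_dyn`, `max_growth`) and their default values are illustrative assumptions, not the paper's settings; the optional growth cap enforces the "slowly variable" requirement of the predictor-corrector scheme.

```python
import numpy as np

def dynamical_time_step(rho, c_dyn=0.1, dt_prev=None, max_growth=1.1):
    """Time step bounded by the dynamical time, dt <= c_dyn / sqrt(rho_max),
    where rho is the projected density sampled on the grid.  `c_dyn` and
    `max_growth` are hypothetical tuning parameters.  The optional cap keeps
    the step slowly varying between successive iterations."""
    dt = c_dyn / np.sqrt(np.max(rho))
    if dt_prev is not None:
        dt = min(dt, max_growth * dt_prev)
    return dt
```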

### 2.5 Calculation of the force and its derivatives

In practice, the force is calculated on a grid and its gradient and Hessian estimated by simple finite difference methods.

In our 2D phase-space implementation, the acceleration is calculated at discrete points with coordinates $x_i$ corresponding to the projection of the Eulerian phase-space grid. To estimate it, we first calculate the projected mass $m_i$ inside each pixel by summing up the phase-space density calculated on each point of the Eulerian phase-space grid. In the present work we consider isolated systems in empty space, which means that the acceleration at a given point is simply proportional to the total mass to the right of that point minus the total mass to its left. This implies, within some trivial factor depending on code units,

(2.0) $a_i = M_i^{\rm R} - M_i^{\rm L},$

with

(2.0) $M_i^{\rm L} = \sum_{j<i} m_j, \qquad M_i^{\rm R} = M - M_i^{\rm L} - m_i,$

and where $M$ is the total mass of the system. Note thus that the mass in pixel $i$ does not contribute to $a_i$, as it is supposed to be distributed symmetrically inside it (a potentially better procedure could consist in using a staggered mesh for computing the acceleration, in which the nodes would correspond to the boundaries of each pixel of the grid). In higher numbers of dimensions, the Poisson equation is more complex to solve, but this can be done on a grid of fixed resolution with FFT methods, whether the system is spatially periodic or isolated as assumed here.

Our estimates of the first and second derivatives of the acceleration at the grid points are written

(2.0) $a'_i = \frac{a_{i+1} - a_{i-1}}{2\,\Delta x},$

(2.0) $a''_i = \frac{a_{i+1} - 2 a_i + a_{i-1}}{\Delta x^2},$

which is equivalent to fitting the acceleration with a second-order polynomial on the three successive points $x_{i-1}$, $x_i$ and $x_{i+1}$.
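These central-difference stencils can be written compactly as array operations. The sketch below leaves boundary values at zero, since the treatment of the grid edges is not specified here:

```python
import numpy as np

def force_derivatives(a, dx):
    """Central finite differences for the gradient and second derivative of
    the gridded acceleration, equivalent to fitting a parabola through three
    successive points.  Boundary points are left at zero in this sketch."""
    a = np.asarray(a, dtype=float)
    grad = np.zeros_like(a)
    hess = np.zeros_like(a)
    grad[1:-1] = (a[2:] - a[:-2]) / (2.0 * dx)
    hess[1:-1] = (a[2:] - 2.0 * a[1:-1] + a[:-2]) / dx**2
    return grad, hess
```

As expected from the parabola-fitting interpretation, the stencils are exact for any quadratic acceleration profile.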

To compute the force at an arbitrary point $x$ of the computational domain, we perform dual TSC (triangular-shaped cloud, see, e.g., Hockney and Eastwood 1988) interpolation from the grid, i.e.,

(2.0) $a(x) = w_{-}\, a_{i-1} + w_{0}\, a_i + w_{+}\, a_{i+1},$

(2.0) $w_{0} = \frac{3}{4} - d^2, \qquad w_{\pm} = \frac{1}{2}\left(\frac{1}{2} \pm d\right)^2,$

(2.0) $d = \frac{x - x_i}{\Delta x},$

where $x_i$ is the grid point closest to $x$, and likewise for its gradient and its second derivative. Note that we also tested linear interpolation, which in practice leads to nearly the same results as TSC for all the numerical tests discussed in the next section.
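The three-point TSC kernel is straightforward to implement; a minimal sketch (grid boundaries not handled, grid point $i$ assumed at $x_i = i\,\Delta x$) reads:

```python
import numpy as np

def tsc_interpolate(field, x, dx):
    """TSC (triangular-shaped-cloud) interpolation of a gridded field at
    position x, combining the three grid points around the nearest node.
    Boundary cells are not treated in this sketch."""
    i = int(np.rint(x / dx))       # index of the nearest grid point
    d = x / dx - i                 # offset in cell units, |d| <= 1/2
    w_m = 0.5 * (0.5 - d) ** 2     # weight of point i - 1
    w_0 = 0.75 - d ** 2            # weight of point i
    w_p = 0.5 * (0.5 + d) ** 2     # weight of point i + 1
    return w_m * field[i - 1] + w_0 * field[i] + w_p * field[i + 1]
```

The weights sum to unity and have vanishing first moment, so the scheme reproduces linear fields exactly, which makes its behaviour close to linear interpolation in smooth regions, consistent with the remark above.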

Clearly, the way we estimate the force and its successive derivatives is probably too naive, but it works well in practice for all the experiments we performed, so we did not attempt to improve this part of our algorithm. This should nevertheless be investigated in future work.

## 3 Numerical experiments

| Designation | Grid spacing | Element spacing | Steps between resamplings | CPU cost (run I = 1) |
| --- | --- | --- | --- | --- |
| I | 0.002 | 0.02 | 25 | 1.0 |
| II | 0.002 | 0.04 | 25 | 1.0 |
| III | 0.002 | 0.01 | 25 | 1.2 |
| IV | 0.002 | 0.005 | 25 | 1.4 |
| V | 0.002 | 0.02 | 100 | 1.1 |
| VI | 0.002 | 0.02 | 50 | 1.1 |
| VII | 0.002 | 0.02 | 10 | 1.2 |
| VIII | 0.002 | 0.02 | 5 | 1.5 |
| IX | 0.002 | 0.02 | 1 | 3.2 |
| X | 0.001 | 0.01 | 25 | 4.0 |
| XI | 0.004 | 0.04 | 25 | 0.4 |
| XII | 0.0005 | – | – | 8.8 |
| XIII | 0.00707 | – | – | 4.5 |
| XIV | 0.001 | – | – | 2.3 |
| XV | 0.00141 | – | – | 1.2 |
| XVI | 0.002 | – | – | 0.7 |

Dashes indicate parameters that do not apply to the corresponding runs.

To test the method, we performed an ensemble of simulations whose parameters are listed in Table 1. In § 3.2, we compare the results obtained with Vlamet to the standard splitting algorithm of Cheng and Knorr (1976) with cubic B-spline interpolation, as well as to the extremely accurate entropy-conserving waterbag simulations of Colombi and Touma (2014, hereafter CT14). Before going into the details of these comparisons, we first provide in § 3.1 additional technical information on the performances of Vlamet and of the splitting algorithm; in particular, we discuss parallelisation issues as well as the extension to higher numbers of dimensions. (Comparisons to the waterbag code in terms of performances are irrelevant here, since the future goal is to generalise Vlamet to a higher number of dimensions in phase-space. The waterbag method, extremely accurate but already costly in 2-dimensional phase-space, becomes prohibitive in higher dimensions: in 4-dimensional phase-space, for instance, it consists in following, in an adaptive fashion, ensembles of hypersurfaces sampled with tetrahedra; see, e.g., Sousbie and Colombi 2016.)

### 3.1 Performances: parallel scaling and computing cost

Both Vlamet and the splitting algorithm are parallelised for shared-memory architectures using the OpenMP library, but no attempt has been made to optimise cache usage, so there is still room for improvement. The parallel scaling is displayed in Fig. 6, which shows that the two codes present a reasonable speedup, greater than 10 on 32 cores. The scaling is slightly better for the metric code than for the splitting algorithm, but this comparison is not meaningful without further optimisation. The main point here is to show that OpenMP parallelisation is rather straightforward for Vlamet, while it is already well known that both distributed- and shared-memory parallelisations are not a problem for the splitting algorithm (see, e.g., Crouseilles et al. 2009).

As an additional note, although we leave this for future work, there is in principle no particular difficulty in parallelising Vlamet on a distributed-memory architecture with the MPI library. Except for the solution of the Poisson equation and for cubic B-spline interpolation, which we discuss just below, all the operations performed in the algorithm, including the percolation process, are of a local nature and hence trivially compatible with domain decomposition. In 4- and 6-dimensional phase-space, introducing a k-d tree structure to focus on important regions of phase-space is also quite standard in parallel programming. Solving the Poisson equation in parallel is a well-understood problem: for instance, parallel Fast Fourier Transform libraries such as FFTW can compute the gravitational potential on a grid of fixed resolution. As for spline interpolation, a domain-decomposition implementation is possible using the method proposed by Crouseilles et al. (2009), in which interpolation is localised to each domain by using Hermite boundary conditions between the domains with an ad hoc reconstruction of the derivatives. In such an implementation, each domain would for instance correspond to a leaf of the k-d tree mentioned above. Of course, since we have not actually implemented the algorithm on a distributed-memory architecture, this discussion remains speculative.

In terms of total computational time, interesting conclusions can be drawn from the fifth column of Table 1, which indicates, for each run, the approximate total CPU cost in units of simulation I. Neglecting the Poisson equation resolution step, both Vlamet and the splitting algorithm are expected in general to scale roughly like the total number of cells of the grid representing phase-space, with the dimension of space here equal to unity. This is approximately the case in Table 1 when the grid spacing is small enough. The cost also increases for Vlamet when resamplings of the phase-space distribution function are more frequent or when the density of elements of metric increases.

When it comes to a genuine comparison of Vlamet to the standard splitting algorithm, we focus on single-thread runs, assuming that full parallelisation is equally optimal for both algorithms. We noticed that, at the same spatial resolution, Vlamet is slower than the splitting algorithm by some factor, the precise value depending on the resampling and element-density parameters. How these estimates generalise to 4 and 6 dimensions remains an open question, but we can try to infer some rough figures. From now on we restrict ourselves to 6D phase-space. In this case, the critical step in Vlamet is the cubic spline interpolation during the resampling phase, which is expected to be about 171 times more costly than a step of the standard splitting algorithm, as discussed in the introduction of § 2.2. The rest of the algorithm, on the other hand, should perform comparably to the splitting scheme, i.e. should be slower by a factor of order unity. Hence we expect, in six dimensions, a factor