An efficient numerical method for solving the Boltzmann equation in multidimensions

An efficient numerical method for solving the Boltzmann equation in multidimensions

Giacomo Dimarco Department of Mathematics and Computer Science, University of Ferrara, Via Machiavelli 30, 44122 Ferrara, Italy. Raphaël Loubère CNRS and Institut de Mathématiques de Toulouse (IMT) Université Paul-Sabatier, Toulouse, France. Jacek Narski CNRS and Institut de Mathématiques de Toulouse (IMT) Université Paul-Sabatier, Toulouse, France. Thomas Rey Laboratoire Paul Painlevé, Université Lille 1, Lille, France.

In this paper we deal with the extension of the Fast Kinetic Scheme (FKS) [J. Comput. Phys., Vol. 255, 2013, pp 680-698] originally constructed for solving the BGK equation, to the more challenging case of the Boltzmann equation. The scheme combines a robust and fast method for treating the transport part based on an innovative Lagrangian technique supplemented with fast spectral schemes to treat the collisional operator by means of an operator splitting approach. This approach along with several implementation features related to the parallelization of the algorithm permits to construct an efficient simulation tool which is numerically tested against exact and reference solutions on classical problems arising in rarefied gas dynamic. We present results up to the DD case for unsteady flows for the Variable Hard Sphere model which may serve as benchmark for future comparisons between different numerical methods for solving the multidimensional Boltzmann equation. For this reason, we also provide for each problem studied details on the computational cost and memory consumption as well as comparisons with the BGK model or the limit model of compressible Euler equations.

Boltzmann equation, kinetic equations, semi-Lagrangian schemes, spectral schemes, 3D/3D, GPU, CUDA, OpenMP, MPI.
(82B40, 76P05, 65M70, 65M08, 65Y05, 65Y20
journal: Journal of Computational Physics

1 Introduction

Kinetic equations consider a representation of a gas as particles undergoing instantaneous collisions interspersed with ballistic motion cercignani; ACTA. Nowadays, these models appear in a variety of sciences and applications such as astrophysics, aerospace and nuclear engineering, semiconductors, plasmas related to fusion processes as well as biology, medicine or social sciences. The common structure of such equations consists in a combination of a linear transport term with one or more interaction terms which furnishes the time evolution of the distribution of particles in the phase space. For its nature, the unknown distribution lives in a seven dimensional space, three dimensions for the physical space and three dimensions for the velocity space, plus the time. This makes the problem a real challenge from the numerical point of view, since the computational cost becomes immediately prohibitive for realistic multidimensional problems ACTA. Aside from the curse of dimensionality problem, there are many other difficulties which are specific to kinetic equations. We recall two among the most important ones. The computational cost related to the evaluation of the collision operator involving multidimensional integrals which should be solved in each point of the physical space FiMoPa:2006; PR2. The second challenge is represented by the presence of multiple scales which requires the development of adapted numerical schemes to avoid the resolution of the stiff dynamics Dimarco_stiff1; Dimarco_stiff2; Jin_review; Jin; Jin2; BLM; degondrev typically arising when dealing with problems with multiple regimes.

Historically, there exists two different approaches which are generally used to tackle kinetic equations from a numerical point of view: deterministic numerical methods such as finite volume, semi-Lagrangian and spectral schemes ACTA, and, probabilistic numerical methods such as Direct Simulation Monte Carlo (DSMC) schemes bird; Cf. Both methodologies have strengths and weaknesses. While the first could normally reach high order of accuracy, the second are often faster, especially for solving steady problems but, typically, exhibit lower convergence rate and difficulties in describing non stationary and slow motion flows.

In this work, we deal with deterministic techniques. In particular, we focus on semi-Lagrangian approaches CrSon; CrSon1; Gu; Shoucri; FilbetRusso; Filbet2 for the transport part coupled with spectral methods canuto:88 for the interaction part. Our main goal is to tackle the challenges related to the high dimensionality of the equations and with the difficulties related to the approximation of the collision integral. More in details, we deal with the extension of our recent works FKS; FKS_HO; FKS_DD; FKS_GPU which were based on an innovative semi-Lagrangian technique for discretizing the transport part of a kinetic model (FKS method). This technique has been applied solely to the solution of a simple kinetic equation with a relaxation type collision operator, i.e. the BGK (Bhatnagar-Gross-Krook) operator Gross. Here we extend it to the case of the full Boltzmann operator bird; cercignani and we test his performances up to the six dimensional case. The FKS is based on the classical discrete velocity models (DVM) approach bobylev; Pal; Pal1; Mieussens. Successively, in order to overcome the problem of the excessive computational cost, we propose to use a Lagrangian technique which exactly solves the transport step on the entire domain, without reprojecting the solution on the grid at each time step. The FKS approach was shown to be an efficient way to solve linear transport equations, and, it has permitted the simulation of full six dimensions problems on a single processor machine FKS. Unfortunately the solutions computed with this method are limited to a first order in space and time precision. Extension of the method to high order reconstruction is under consideration RFKS. Concerning the discretization of the collision operator we rely on Fourier techniques. For the resolution of the Boltzmann integral, these techniques have been first introduced independently by L. Pareschi and B. Perthame in pareschi:1996 and by A. Bobylev and S. Rjasanow in bobylev_rjasanow97. Since then, this approach has been investigated by a many authors FiMoPa:2006; PR2; gamba:2010; gamba; gambaL; wu2013deterministic; bobylev_rjasanow97; Rjasanow; Villani; filbet:2011Conv; wu2015influence; Aleks; PaRu:stab:00; wu2015fast. In this work we will make use in particular of the fast method described in MoPa:2006; FiMoPa:2006 which has a complexity of the order of where is the number of point in which the velocity space is discretized in one direction and the dimension of the velocity space. The method preserves mass, and approximates with spectral accuracy momentum and energy.

Combining opportunely the FKS method with the fast spectral approach we have developed a method for solving the Boltzmann equation up to the six dimensional case for unsteady flows. In order to obtain such result we have constructed a parallel version of our algorithm taking advantages of Graphical-Processor-Unit (GPU) under CUDA language. The results presented in this work show that we are nowadays ready and able to use kinetic equations to simulate realistic multidimensional flows. Up to our knowledge this is one of the first examples in literature of solution of the full multidimensional Boltzmann equation by means of deterministic schemes. Our main limitation to run extensive simulations remains at the present moment the lack of memory capacity due to the use of shared memory machines. However, in the present paper, we also report some preliminary results about the extension of our method to deal with distributed-memory computers but we postpone for a future research the detailed analysis of the Message Passive Interface (MPI) version of the algorithm as well as the analysis of its performances.

The article is organized as follows. In Section 2 we recall the Boltzmann equation. In Section 3 we present the Fast Kinetic Scheme and the fast spectral scheme. In Section 4 we detail the aspects related to the implementation of the resulting algorithm necessary to realize an efficient parallel numerical tool. Several tests starting from the 0D 2D up to the 3D3D case are studied in details in Section 5. These tests assess the validity of our approach as well as detail all the computational aspects. Conclusions and future works are finally exposed in Section 6.

2 The Boltzmann equation

In this section we briefly recall the Boltzmann equation and its main properties, we also recall some related models, i.e. the BGK model and the compressible Euler equations, which will be used for numerical comparisons. We refer the reader to ACTA; cercignani for an exhaustive description.

In the kinetic theory of rarefied gases, the non-negative function characterizes the state of the system and it defines the density of particles having velocity in position at time , where is the physical dimension and the dimension of the velocity space. The time evolution of the particle system is obtained through the equation


The operator , on the right hand side in equation (1), describes the effects of particle interactions and its form depends on the details of the microscopic dynamic. Independently on the type of microscopic interactions considered, typically the operator is characterized by some conservation properties of the physical system. They are written as


where are commonly called the collision invariants. We denote by

the first three moments of the distribution function , namely, , where is the density, the momentum and the energy. Integrating (1) against yields a system of macroscopic conservation laws


The above moment system is not closed since the second term involves higher order moments of the distribution function . However, using the additional property of the operator that the functions belonging to its kernel satisfy


where the Maxwellian distributions can be expressed in terms of the set of moments by


with the temperature, one can get a closed system by replacing with in (3). This corresponds to the set of compressible Euler equations which can be written as


with . The simplest operator satisfying (2) and (4) is the relaxation operator Gross


where defines the so-called collision frequency. Its values will be specified in the numerical test Section in order for the model to be as close as possible to the Boltzmann model described next. The classical Boltzmann operator reads


where is a vector of the unitary sphere . The post-collisional velocities are given by the relations


where is the relative velocity. The kernel characterizes the details of the binary interactions, it has the form


where the scattering cross-section , in the case of inverse -th power forces between particles, can be written as


with . The special situation gives the so-called Maxwell pseudo-molecules model with


For the Maxwell case the collision kernel is independent of the relative velocity. For numerical purposes, a widely used model is the variable hard sphere (VHS) model introduced by Bird bird. The model corresponds to , where is a positive constant, and hence


In the numerical test Section we will consider the Maxwell molecules case when dealing with a velocity space of dimension and with the more realistic case of VHS molecules when dealing with the three dimensional case in velocity space: . For comparison purposes we will also consider the simplified BGK model (7) and the compressible Euler system case (6).

3 The Fast Kinetic Scheme and the Fast Spectral Scheme

In this section we detail the numerical scheme. In the first part we discuss the Fast Kinetic Scheme (FKS) and in the second part the Fast Spectral method for the Boltzmann equation. The two solvers are connected by using splitting in time approaches as the ones described in Des.

3.1 The Fast Kinetic Scheme (FKS)

The Fast Kinetic Scheme FKS; FKS_HO belongs to the family of so-called semi-Lagrangian schemes CrSon; CrSon1; Filbet which are typically applied to a Discrete Velocity Model (DVM) bobylev; Mieussens approximation of the original kinetic equation.

In order to introduce the scheme, let us truncate the velocity space by fixing some given bounds and set a cubic grid in velocity space of points with the grid step which is taken equal in each direction. The continuous distribution function is then replaced by a vector whose components are assumed to be approximations of the distribution function at locations :


The discrete velocity kinetic model consists then of a set of evolution equations in the velocity space for , , of the form


where is a suitable approximation of the collision operator at location discussed in Section (3.3). Observe that, due to the truncation of the velocity space and to the finite number of points with which is discretized, the moments of the discrete distribution function are such that


with the discrete collision invariants, i.e. they are no longer those given by the continuous distribution . This problem concerns all numerical methods based on the discrete velocity models and different strategies can be adopted to restore the correct macroscopic physical quantities Mieussens; Pal; Pal1; gamba. At the same time, the spectral approach does not preserve the energy and the momentum of the system even if it approximates these quantities with spectral accuracy PR2. In order to solve this problem of loss of conservation we adopt an projection technique for the discretized distribution and the discretized collision operator which permits to project the discretized and in the space of the distributions for which the moments are exactly the continuous macroscopic quantities we aim to preserve. We detail this approach in the following paragraph. For lightening the notations we omit the tilde in the rest of the paper, we only use it in the next paragraph in order to introduce the projection procedure. We will then suppose from now on that all the operations preserves the macroscopic moments if not differently stated.

Figure 1: Illustration of the transport scheme for the FKS scheme. Left panel before transport step, right panel after transport step. Each discrete velocity (index ) drives its own transport equation with velocity . The representation of is made by means of a piecewise constant function. The shape of the entire function has not changed during the transport but the cell-centered values (bullets) may have.

Let us now introduce a Cartesian uniform grid in the three dimensional physical space made of points with a scalar which represents the grid step (the same in each direction) in the physical space. Further we define a time discretization starting at , where is the time step defined by an opportune CFL condition discussed next. The time index varies between and so that the final time is .

Each equation of system (15) is solved by a time splitting procedure. We recall here a first order splitting approach: first a transport step exactly solves the left-hand side, whereas a collision stage solves the right-hand side using the solution from the transport step as initial data:

Transport stage (17)
Collisions stage (18)
Transport step

Let be the point-wise values at time of the distribution , . The idea behind the fast kinetic scheme is to solve the transport stage (17) continuously in space, see Fig. 1 for a sketch in the one dimensional setting. To this aim we define at the initial time the function as a piecewise continuous function for all , where and . Hence starting from data at time index , the exact solution of (17) is simply


In other words, the entire function is advected with velocity during unit of time and the superscript indicates that only the transport step has been solved so far. The extension of this procedure to the generic time step gives


where now, the key observation is that the discontinuities of the piecewise function do not lie on the interfaces of two different cells. Instead, the positions of the discontinuities depend entirely on the previous advection step and thus they may be located anywhere in the physical space. This means that if only the linear transport equation has to be solved, this approach gives the exact solution to the equation if the initial data is truly a piecewise constant function initially centered on the spacial mesh.

Collision step

The effect of the collisional step is to change the amplitude of . The idea is to solve the collision operator locally on the grid points, and, successively, extend these computed values to the full domain . Thus we need to solve the following ordinary differential equation


where , for all velocities of the lattice and grid points . The initial data for solving this equation is furnished by the result of the transport step obtained by (20) at points of the mesh at time , i.e. , for all , and . Then, the solution of (21), locally on the grid points, reads if, for simplicity, a forward Euler scheme is used as


where . Many different time integrators can be employed to solve this equation. In particular special care is needed in the case in which the equation becomes stiff, refer to Dimarco_stiff1; Dimarco_stiff2 for alternative strategies. Since the time integration of the collision term is not the issue considered in this paper, we considered the simplest possible scheme, but the FKS technique remains the same when other time integrators are employed. Equation (22) furnishes a new value for the distribution at time only in the cell centers of the spacial cells for each velocity . However, one needs also the value of the distribution in all points of the domain in order to perform the transport step at the next time step. Therefore, we define a new piecewise constant function for each velocity of the lattice as


Said differently we make the fundamental assumption that the shape of in space coincides with the one of . Thanks to the above choice one can rewrite the collision step in term of spatially reconstructed functions as


This ends one time step of the FKS scheme.

Concerning the transport part of the scheme, the time step is constrained by a CFL like condition of type


The time step constraint for the collision step depends on the choice of the operator . Since in the numerical test section we used both a BGK and a Boltzmann operator, the time step constraint for the interaction part has been chosen as the minimum time step which gives stability in the solution of the ODE (22) independently on the type of collision operator employed. As observed in FKS the transport scheme is stable for every choice of the time step, being the solution for a given fixed reconstruction performed exactly. Nonetheless the full scheme being based on a time splitting technique, the error is of the order of in the case of first order splitting or of order for a splitting of order . This suggests to take the usual CFL condition for the transport part in order to maintain the time error small enough.

To conclude this Section let us observe that time accuracy can be increased by high order time splitting methods, while spacial accuracy can be increased close to the fluid limit to a nominally second-order accurate scheme by the use of piecewise linear reconstructions of state variables, see the details in FKS_HO. Spacial accuracy for all regime can also be increased by using high order polynomial reconstruction for the distribution . This work is in progress and results are discussed in RFKS.

3.2 Conservation of macroscopic quantities

In order to preserve mass, momentum and energy in the scheme, we employ the strategy proposed in gamba. We consider one space cell, the same renormalization of should be considered for all spatial cells. This step is performed at the beginning, i.e. . for the distribution and after each collision step (22) which causes loss of momentum and energy due to the spectral discretization. Let be the distribution function vector at at the center of the cell and let be the unknown corrected distribution vector which fulfills the conservation of moments. Let

be a matrix of coefficients depending on the discretization parameters and be the vector of conserved quantities, namely density, momentum and energy. Conservation can be imposed solving a constrained optimization formulation:


The solution of this minimization problem can be analytically obtained by employing the Lagrange multiplier method. Let be the Lagrange multiplier vector. The corresponding scalar objective function to be minimized is given by


Then, by nullifying the derivative of with respect to we get


while the Lagrange multipliers are obtained by solving


In particular, the above expression says that the value of is uniquely determined by . Back substituting into (28) finally provides


Note that matrix is pre-computed and stored as being constant for each simulation. If a discretization of the Maxwellian distribution is needed, as for example when using the BGK model, the same procedure should be applied to the function in order to assure conservation of the first three moments at each instant of time at which the Maxwellian function is invoked. In this case by defining with the approximated Maxwellian and by the pointwise value of the equilibrium distribution, we get mimicking (30)


This ends the description of the procedure which permits to conserve macroscopic quantities.

3.3 Fast Spectral Scheme (FSS) to discretize the Boltzmann collision operator

The fast spectral discretization of the Boltzmann operator employed in this work is described in this section. To this aim, we focus again on a given cell at a given instant of time . The same computation is repeated for all cells and times . Moreover, since the collision operator acts only on the velocity variable, to lighten the notation in this paragraph, only the dependency on the velocity variable is considered for the distribution function , i.e. .

The first step to construct for our spectral discretization is to truncate the integration domain of the Boltzmann integral (8) as done for the distribution . As a consequence, we suppose the distribution function to have compact support on the ball of radius centered in the origin. Then, since one can prove that , in order to write a spectral approximation which avoid aliasing, it is sufficient that the distribution function is restricted on the cube with . Successively, one should assume on and extend to a periodic function on the set . Let observe that the lower bound for can be improved. For instance, the choice guarantees the absence of intersection between periods where is different from zero. However, since in practice the support of increases with time, we can just minimize the errors due to aliasing canuto:88 with spectral accuracy.

To further simplify the notation, let us take and hence with in the following. We denote by the Boltzmann operator with cut-off. Hereafter, using one index to denote the -dimensional sums, we have that the approximate function can be represented as the truncated Fourier series by


We then obtain a spectral quadrature of our collision operator by projecting (8) on the space of trigonometric polynomials of degree less or equal to , i.e.


Finally, by substituting expression (32) in (34) one gets after some computations


where are given by




Let us notice that the naive evaluation of (35) requires operations, where . This causes the spectral method to be computationally very expensive, especially in dimension three. In order to reduce the number of operations needed to evaluate the collision integral, the main idea is to use another representation of (8), the so-called Carleman representation Carl:EB:32 which is obtained by using the following identity

This gives in our context for the Boltzmann integral




This transformation yields the following new spectral quadrature formula


where are now given by


Now, in order to reduce the number of operation needed to evaluate (40), we look for a convolution structure. The aim is to approximate each by a sum

where represents the number of finite possible directions of collisions. This finally gives a sum of discrete convolutions and, consequently, the algorithm can be computed in operations by means of standard FFT technique canuto:88.

In order to get this convolution form, we make the decoupling assumption


This assumption is satisfied if is constant. This is the case of Maxwellian molecules in dimension two, and hard spheres in dimension three, the two cases treated in this paper. Indeed, using kernel (10) in (39), one has

so that is constant if , and , .

We start by dealing with dimension and , i.e. Maxwellian molecules. Here we write and in spherical coordinates and to get

Then, denoting for , we have the explicit formula

where . This explicit formula is further plugged in the expression of and using its parity property, this yields

Finally, a regular discretization of equally spaced points, which is spectrally accurate because of the periodicity of the function, gives




where .

Now let us deal with dimension with satisfying the decoupling assumption (42). First we switch to the spherical coordinates for :

Then, integrating first on the intersection of the unit sphere with the plane gives


This leads to the following decoupling formula

where denotes the half sphere and

Now, in the particular case where , i.e. the hard sphere model, we can explicitly compute the functions . These are

Taking a spherical parametrization of and uniform grids of respective size and for and (again spectrally accurate because of the periodicity of the function) leads to the following quadrature formula for


and for all and

4 Numerical implementation

We discuss in this part some aspects relative to the numerical implementation of the method.

4.1 Algorithm for the transport part

The method described in the previous sections can be resumed into two main actions: transport and interaction. In our implementation we have adopted a particle like interpretation of the FKS scheme which helps in reducing the computational effort and the memory requirement. In this interpretation, each point of the quadrature of the phase space is represented by a particle with a given mass, velocity and position. The transport phase causes the particle to move in the physical space, while the interaction phase causes the mass of each particle to change. In this setting, the distribution function can be expressed as


where represents the particle position, the particle velocity, the particle mass. Moreover, the particle velocity corresponds to the quadrature point chosen to discretize the velocity space. The effect of the transport is a simple shift of particles to their new positions according to


The collision step acts only locally and changes the velocity distribution. The particle interpretation of this part of the scheme consists in changing the mass of each particle through the formula (22) which reads


where corresponds to the approximation of the collision integral in the center of the cell evaluated at location by means of the fast spectral method described before. Let observe that thanks to the uniform grid in velocity and physical space, the number of particles is the same in each cell and remains constant in time. This permits to consider the motion of only a fixed subset of particles which belongs to only one chosen cell. The relative motion of all other particles in their cells being the same FKS. This gives great computational advantage since the number of particle to track in time is greatly reduced. The only information which should be tracked remains the mass of each individual particle which is different for every velocity and position . This reinterpretation of the scheme permits to strongly reduce the computational cost related to the transport part as shown in the numerical test section.

4.2 Profiling and parallelization strategy

The Fast Kinetic scheme for the BGK equation was shown to be very efficient and extremely parallelizable (with a shown acceleration very close to the ideal scaling) in FKS_GPU on mild parallel architecture, namely using OpenMP on maximum threads and on two Graphical Processing Units (GPU) using CUDA framework. These light parallel infrastructures are unfortunately not sufficient when Boltzmann operator is to be simulated. To this aim, let us briefly discuss the profiling of the collision algorithm. As we have already seen in the description of the method in the previous section, the resolution of Boltzmann operator requires several passages from the physical space to the Fourier space, which implies several calls to (inverse) Fourier transforms within nested loops which are due to the convolutive form of the integral. This interaction step covers about of the total cost of a 2D2D simulation on a serial machine, see table 1. The situation is alike in the case of 3D3D simulations, and, thus, results are not reported. From table 1 we deduce that the collision step is the part of the code which demands to be dealt with great attention if any gain in performance is expected.

Main routines Cost CPU % Sub-routines Sub-cost CPU %
Transport 2 % 2 %
Update moments 0 % 0 %
Collision 98 % Compute 5 %
-Loop: compute coeff. 5 %
-Loop: compute 87 %
Update 3 %
Total 100 % 100%
Table 1: Schematic table of the average cost for each routine of the code on a cylindrical Sod like problem in 2D2D. Simulation is run on a serial machine. The collision part demands almost all the CPU resources and, within this step, the calls to the (inverse) Fourier transform routine covers more than of the total cost of one single step.

More precisely, if we give a closer look to the computational cost related to the computation of the collision term in one fixed spacial cell, the major cost (beyond ) is spent by the Fourier transform routines. In our implementation, the collision is performed without any communication with the local neighborhood. In other words, once the values of the distribution functions are stored then is computed locally. Because is not too large, typically the number of points chosen to discretize the velocity space goes from to , then the memory requirement is not extremely demanding. Finally, as seen in the previous Section the computation of is based on a loop over the number of angles with in dimension two, and, a double loop over the angles and in dimension three with and with a typical number of chosen angles of eight. After a first Fourier transform of the distribution , for each iterate of these loops, the coefficients of the quadrature formula are computed and then three calls to the inverse fast Fourier transform are done. The results of the FFT are gathered into temporary complex variables later arranged to form the solution of . The costs related to the four main routines needed for the evaluation of the operator namely: Fourier transform, coefficients computation, inverse Fourier transform and sum of the obtained values to form , are detailed in table  1.

We discuss now some aspects related to the parallelization strategy. The framework described is extremely well suited for parallelization. The strategy adopted is to divide the space domain in several subdomains because the interaction step, the most expensive one, does not demand any communication between spatial cells. Each cell can be therefore treated independently from the others. Then, the idea employed has been to use the OpenMP on shared memory systems and make each computational core responsible for a subset of a space mesh. This approach requires very little modification of a sequential code and gives strong scaling close to ideal as shown in FKS_GPU for the BGK case. A different possibility is to compute the collision kernel, and all the related Fast Fourier Transforms on a GPU. Even if, this involves substantial amount of slow communications between CPU and GPU, the computational complexity of this part is so large that the communication time does not play a fundamental role. This approach requires more programming effort, compared to the OpenMP parallelization strategy as several routines have to be completely rewritten. In particular, the code has to be adapted to the Single Instruction, Multiple Threads (SIMT) parallel programming mode imposed by the GPU architecture. That is to say, parts of the code that can be run in parallel on large number of threads have to be identified and rewritten as so called CUDA kernels - functions that are executed multiple times in parallel by different CUDA threads. Moreover, the GPU-CPU memory data transfer has to be taken into account. The payoff is however higher and, at least for computations which does not require too large memory storage, this is probably the best choice.

Unfortunately, when the number of points needed to perform a simulation grows in the case the Boltzmann operator is used, the above implementation strategies on shared memory systems are not sufficient any more. This is true particularly in the full three dimensional case. Luckily, the method described in this work can be easily adapted for its application in distributed memory systems. The idea adopted in this case is to distribute spatial degrees of freedom over computational nodes, keeping on every node a complete set of velocity space points. The computational domain is then decomposed into slices along the direction (see Fig.2). The slices are successively distributed over different MPI processes. Every node will therefore compute the collision term for a part of computational domain. This can be done on each node using the OpenMP or GPU parallelization strategy, depending on the architecture at disposal. As the update of the primitive variables requires an exchange with neighboring spatial cells, some of the particle masses, which contain all the information related to the distribution function in our implementation, need to be broadcast to other computational nodes. This data exchange, which is typically a bottleneck in the MPI computations, and, usually demands large efforts in order to be minimized the internodal communications, in the case of the Boltzmann equation, takes only a small part of the total runtime, even if, in our first MPI implementation, the domain decomposition is far from being optimal. We postpone to a future work the discussion related to a more efficient MPI parallelization strategy and all details related to this type of implementation.

z axis

ghost cells

ghost cells

Figure 2: Domain decomposition for the MPI parallelization. The domain slices are distributed over computational nodes. After the collision step the new information brought by the particles on the cells located on the subdomain boundary are communicated to the ghost cells of the neighboring nodes.

5 Numerics

In this section, we validate the proposed SFKS (Spectral-FKS) method which couples the fast semi-Lagrangian and the fast spectral scheme. The testing methodology is divided into four parts.

Part 1. Sanity checks. We only consider the space homogeneous Boltzmann equation in two and three dimensions in velocity space. Two problems from PR2 for which the analytical solution is known are simulated. The results show that our implementation of the fast spectral discretization of the Boltzmann operator is correct.

Part 2. Using the SFKS method in one dimension of physical space and two or three dimensions in velocity space, we show on Riemann like problems the differences between the BGK and the Boltzmann models. The reported results justify the use of the more complex and costly collisional Boltzmann operator especially in situations far from the thermodynamical equilibrium. We also report the details of the computational costs as well as the numerical convergence of the method for an increasing number of points in the physical or velocity space.

Part 3. Using the SFKS method in the two dimensional case in space and velocity, we show that using the Boltzmann or the BGK operator leads to notably different results. We consider a regular vortex like problem and a reentry like problem with evolving angle of attack. Performances and profiling study of the SFKS approach supplemented with BGK or Boltzmann collisional operator are provided for refined grids in space and time. Details about the computational costs of the different part of the scheme, scalability with respect to the number of cells are also furnished in order to characterize the behavior of the method as precisely as possible.

Part 4. We test the SFKS method in the full three dimensional case in space and velocity for an unsteady test problem. The interaction of a flying object with the surrounding ambient gas is simulated. We compare the results obtained with the Boltzmann collisional operator with those given by a simpler BGK model. Moreover, we measure the cost of such a 3D3D simulation in terms of CPU, cost per degree of freedom and scalability with respect to the number of cells. Some code profiling is also provided to measure the cost of the main components of the scheme.

Finally, the ability of the whole numerical code to run on several parallel environments is tested: CPU (OpenMP, MPI) and GPU (CUDA) types of parallelism frameworks are considered.

5.1 Part 1. Numerical results for the space homogeneous case

In this part, we validate our spectral discretization implementation of the Boltzmann collisional operator. To this aim, we re-employ the test cases considered in PR2 and FilbetRusso for the homogeneous two dimensional and three dimensional Boltzmann equation. For both situations, exact solutions are available and briefly recalled in the following.

5.1.1 Test 1.1. Convergence to equilibrium for the Maxwell molecules in dimension two.

We consider two dimensional in velocity Maxwellian molecules and the following space homogeneous initial condition


The analytical solution (the so-called BKW one) is given for all times by Bobylev:75; KrookWu:1977:


with . The final time is set to and the time step is equal to . The test is performed for points and for eight discrete angles . In order to keep the aliasing error smaller than the spectral error, the velocity domain is set to for , for and to for points. This choice helps fighting back the aliasing error behavior since it increases with the number of points. Therefore, to have comparable aliasing errors for the three meshes, we enlarge the size of the velocity space when the number of points increases. We report the discrete and norms of the error for the distribution function in Table 2. Moreover, in Figure 3, we report the distribution function at different times (, and ) when . The solution is plotted versus the exact solution. The results clearly show the convergence of the method towards the exact solution. Last in Figure 3 right panel, we present the time evolution of the error for the three configurations. Consistently the errors decrease by about one order of magnitude. The last recorded results, at corresponds to the Figures reported in table 2.

of points error error
Table 2: Test 1.1. Sanity check — and relative errors for the fast spectral method for different number of points at . Maxwellian molecules in dimension two.
Figure 3: Test 1.1. Sanity check — Left: (symbols and straight line) as a function of velocity and time for , and versus the exact solution (dashed line). Right: error as a function of time for , and points (or modes) in each direction. Maxwellian molecules in dimension two.

5.1.2 Test 1.2. Space homogeneous comparison between the BGK and the Boltzmann model. Maxwellian molecules.

In Figure 4, we compare the convergence to equilibrium for the BGK and Boltzmann models on the test case described in the previous paragraph using points on a domain . The norm of the difference between the two distribution functions and as a function of time is shown. As expected the differences are increasing at early stages of the relaxation towards the equilibrium to reach a maximum difference around time . Then, the two models slowly converge towards the same equilibrium solution.

Figure 4: Test 1.2. Differences between the distribution functions obtained with the BGK and the Boltzmann models. Maxwellian molecules in two dimensions. The distribution function values for the red marked iterations are depicted in Figure 5.

The red marked iterations in Figure 4 are further depicted in Figure 5, where we show the details of the difference in time between the two distribution functions. The same azimuthal scale is employed to ease the comparison.

Figure 5: Test 1.2. Differences in the distribution functions obtained by using the BGK and the Boltzmann models at , , and from top-left to bottom-right. Maxwellian molecules in two dimensions.

5.1.3 Test 1.3. Convergence to equilibrium in dimension three. Hard sphere molecules.

The following initial condition is considered FilbetRusso


where and is . The final time is set to , the time step is , while the velocity domain is discretized with points.

Figure 6: Test 1.3. Sanity check — Initial distribution function for middle, left and right. Hard sphere molecules.

In Figure 6, we report the initial data while in Figure 7, we report the relaxation to the equilibrium for the hard sphere model on a fixed plane passing through the points of coordinates and of normal 111This plane passes through the center of the two spheres defined by (50). In Figure 8, we report as before the comparison between the Boltzmann and the BGK model measuring the difference between the two distribution functions in time. Finally, in Figure 9 are presented the details of such differences between the distribution functions for the red marked iterations of figure 8 at . As for the Maxwellian molecules in two dimensions, at the beginning of the simulation we observe the larger deviations between the two models while at the end they both converge to the same limit solution as expected.
The results of these homogeneous tests (in two and in three dimensions in velocity) are meant to verify and validate our implementation of the spectral discretization of the Boltzmann collisional operator. Moreover, the differences observed between the two models justify the necessity of the Boltzmann operator for space non homogeneous situations. We can now investigate the coupling of the Boltzmann operator with the Fast semi-Lagrangian method for several space non homogeneous cases.

Figure 7: Test 1.3. Sanity check for the hard sphere case at time and for points in each direction. The Figure shows the relaxation to the Maxwellian state for the distribution function on a plane which passes from the points and of normal .
Figure 8: Test 1.3. Differences between the distribution functions obtained with the BGK and the Boltzmann models. Hard sphere molecules in three dimensions. The distribution function values for the red marked iterations are depicted in Figure 9.
Figure 9: Test 1.3. Differences in the distribution functions obtained by using the BGK and the Boltzmann models at , , and from top-left to bottom-right for . Hard sphere molecules in three dimensions.

5.2 Part 2. Numerical results for the one dimensional case in space.

In this part, we focus on solving the one dimensional in space Boltzmann and BGK equations considering a two or three dimensional dimensional velocity space setting. The purposes are twofold. First, we numerically demonstrate that the FKS and the spectral accurate Boltzmann kernel solver may be appropriately coupled and that they provide a valid kinetic solver for Boltzmann equations. Second, we show that BGK and Boltzmann models provide different results, justifying the use of a more sophisticated model. For these tests, a classical Riemann problem with Sod like initial data is considered

with . Initial local thermodynamics equilibrium is considered for all tests: . The velocity space is set to for all cases. Dirichlet boundary conditions are set on the left/right boundaries of . The two kinetic models are solved by employing a time rescaling factor in order to put in evidence the role of the collisions in the solutions. The rescaled equations reads


where is the rescaled parameter (the frequency of relaxation) which plays the role of the non dimensional Knudsen number. Smaller is the relaxation frequency, faster is the relaxation of the distribution function towards the equilibrium state. However, the exact rate of convergence is dependent on the type of collision considered either BGK or Boltzmann (Maxwellian molecules or hard spheres).

5.2.1 Test 2.1. Numerical convergence of the Boltzmann equation. The two dimensional in velocity Maxwellian molecules case.

The two dimensional velocity space is first considered leading to a space/velocity mesh of the form with and a varying number of space cells . In Figure 10, we present the space convergence results for the density, the velocity and the temperature when the Boltzmann operator is solved. Successively refined (doubled) spatial meshes are employed, from to up to final time . From these data we can observe that the simulation results seem to converge towards a given numerical solution in both cases (left panels) and (right panels). We can also observe that the increase in mesh resolution is profitable especially for smaller . This is the same behavior observed in FKS. In fact, the scheme precision decreases as the equilibrium state is approached, being virtually exact in non collisional or almost non collisional regimes. The loss of precision observed in fluid dynamic regimes can be recovered with a similar technique as the one proposed in FKS_HO. Here, however we do not consider this possibility. The CFL condition employed is the following


where the first term on the right hand side of the above equation comes from the will of keeping the error small enough in the splitting scheme, while the second term is due to the stability restriction in the solution of the space homogeneous problem when Maxwellian molecules are employed. In fact, in this case the loss part of the collision integral can be estimated, giving .

Figure 10: Test 2.1. One dimension in space and two dimension in velocity Boltzmann model with Maxwellian molecules for a Sod like test case at . Mesh convergence results for (left) and (right). Density (top), velocity (middle) and temperature (bottom) are shown for , and cells and velocity cells.

5.2.2 Test 2.2. Comparisons between the BGK model and the Boltzmann model. The two dimensional in velocity Maxwellian molecules case.

Here, the BGK and Boltzmann models are simulated for the same Sod-like problem. In order to have fairest as possible comparisons between the two models, we choose for the BGK model. This choice permits to have the same loss part for the two models since for Maxwellian molecules the loss part is close to as stated in the previous paragraph. We fix spatial cells and cells in velocity space. This permits to consider almost converged results. In Figure 11, we present the results when (left) and