GPUPEGAS: a new GPU-accelerated hydrodynamical code for numerical simulation of interacting galaxies
Abstract
In this paper a new scalable hydrodynamic code, GPUPEGAS (GPU-accelerated PErformance Gas Astrophysic Simulation), for the simulation of interacting galaxies is proposed. The code is based on a combination of the Godunov method with an original implementation of the FlIC method, specially adapted for GPUs. The Fast Fourier Transform is used for the Poisson equation solution in GPUPEGAS. The software implementation of these methods was tested on classical gas dynamics problems, the new Aksenov test and classical gravitational gas dynamics problems. A collisionless hydrodynamic approach was used for the modelling of stars and dark matter. The scalability of GPUPEGAS on computational accelerators is shown.
1 Introduction
The movement of galaxies in dense clusters turns collisions of galaxies into an important evolutionary factor (Tutukov & Fedorova, 2006; Tutukov et al., 2011). Numerical simulation plays a major role in the study of these processes. Due to the extremely rapid growth of supercomputer performance, new astrophysical models and codes need to be developed for the detailed simulation of the various physical effects in astrophysics.
During the last 15 years, two main approaches from the wide range of hydrodynamical methods have been used for the solution of non-stationary astrophysical problems: the Lagrangian SPH (Smoothed Particle Hydrodynamics) method (Gingold & Monaghan, 1977; Lucy, 1977) and Eulerian methods on adaptive meshes, or AMR (Adaptive Mesh Refinement) (O'Shea et al., 2005). During the last 5 years a number of combined codes (using both Lagrangian and Eulerian approaches) have appeared.
The main problems of SPH codes are the search for the neighbours of a particle and the computation of the gravitational interaction between particles. A number of algorithms were developed to solve these problems: the particle-particle/particle-mesh (P3M) method (Hockney & Eastwood, 1981), the adaptive P3M (AP3M) method using a hierarchy of computational meshes (Couchman, 1991), the tree code (Barnes & Hut, 1986), and the combination of the tree code with the particle-mesh method, the TreePM method (Dubinski et al., 2004). Several methods are used for solving the Poisson equation: the conjugate gradient method, the fast Fourier transform method, the method of successive over-relaxation and the Fedorenko (multigrid) method (Fedorenko, 1961).
For the numerical solution of gas dynamics problems the Godunov method (Godunov, 1959) is widely used; its main structural element is the Riemann problem. Different algorithms for the solution of the Riemann problem have produced a wide class of mathematical methods (Kulikovsky et al., 2001; Toro, 1999). The main methods are the Courant-Isaacson-Rees method (Courant et al., 1952) and the Roe method (Roe, 1997). These methods are based on linearized hyperbolic systems of equations (Engquist & Osher, 1981) in which the solution of the Riemann problem is constructed only of Riemann waves. The main approach to wave-speed estimation is the two-wave Harten-Lax-van Leer (HLL) scheme (Harten et al., 1983) and its modifications, the HLLE method (Einfeld, 1988) and HLLC (Batten et al., 1997). Several schemes based on the Godunov method were developed, such as the upwind second-order MUSCL (Van Leer, 1979) and TVD schemes (Jin & Xin, 1995) and the third-order piecewise parabolic method, PPM (Colella & Woodward, 1984). Still, it is not clear how to determine the order of accuracy of a scheme in the case of a discontinuous solution, as stated in (Godunov et al., 2011).
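To make the two-wave construction concrete, an HLL flux for the 1D Euler equations can be sketched as below. This is a generic textbook version with simple wave-speed bounds; the function name and conserved-variable layout are illustrative, not taken from GPUPEGAS.

```python
import numpy as np

def hll_flux(UL, UR, gamma=5.0 / 3.0):
    """Two-wave HLL flux for the 1D Euler equations.

    UL, UR are conserved states (rho, rho*u, E) left and right of the
    interface. Wave speeds use the simple bounds S_L = min(u - c),
    S_R = max(u + c); a sketch, not the scheme of the paper.
    """
    def primitives(U):
        rho, mom, E = U
        u = mom / rho
        p = (gamma - 1.0) * (E - 0.5 * rho * u * u)
        return rho, u, p

    def flux(U):
        rho, u, p = primitives(U)
        return np.array([rho * u, rho * u * u + p, u * (U[2] + p)])

    rhoL, uL, pL = primitives(UL)
    rhoR, uR, pR = primitives(UR)
    cL = np.sqrt(gamma * pL / rhoL)
    cR = np.sqrt(gamma * pR / rhoR)
    SL = min(uL - cL, uR - cR)   # leftmost wave-speed estimate
    SR = max(uL + cL, uR + cR)   # rightmost wave-speed estimate
    FL, FR = flux(UL), flux(UR)
    if SL >= 0.0:                # all waves move right: take left flux
        return FL
    if SR <= 0.0:                # all waves move left: take right flux
        return FR
    return (SR * FL - SL * FR + SL * SR * (UR - UL)) / (SR - SL)
```

With equal left and right states the formula collapses to the exact physical flux, which makes a convenient sanity check.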
The most well-known SPH codes are Hydra (Pearcea & Couchman, 1997), Gasoline (Wadsley et al., 2004), GrapeSPH (Matthias, 1996) and GADGET (Springel, 2005). The most well-known mesh codes (in some cases with adaptive mesh refinement) are NIRVANA (Ziegler, 2005), FLASH (Mignone et al., 2005), ZEUS-MP (Norman et al., 2006), ENZO (O'Shea et al., 2005), RAMSES (Teyssier, 2002), ART (Kravtsov et al., 2002), Athena (Stone et al., 2008), Pencil Code (Brandenburg & Dobler, 2002), Heracles (Gonzalez et al., 2007), Orion (Krumholz et al., 2007), Pluto (Mignone et al., 2007), CASTRO (Almgren et al., 2010) and GAMER (Schive et al., 2010). The BETHE-Hydro (Murphy & Burrows, 2008), AREPO (Springel, 2009), CHIMERA (Bruenn et al., 2009) and PEGAS (Vshivkov et al., 2011,1) (designed by the author of the present paper) codes are based on mixed Lagrangian and Eulerian approaches. The large number of existing astrophysical codes means that there is no perfect code suitable for all cases. Thus, the development of new codes as well as the modification of existing codes is still necessary. In spite of the development of petaflops-scale astrophysics codes such as PetaGADGET (Feng et al., 2011), Enzo-P and PetaART, it is necessary to note some scalability restrictions of both AMR and SPH methods (Ferrari, 2010; Straalen, 2009). The main properties of the codes are given in Table 1.
The present paper contains the description of a new collisionless component of galaxies on the basis of the collisionless hydrodynamics model. This model was first used for astrophysical problems in (Mitchell et al., 2012). GPUPEGAS is the first code that contains the full Boltzmann moment equations with the complete symmetric velocity dispersion tensor. The purpose of this paper is to describe the details of the numerical method as well as the peculiarities of its implementation for hybrid supercomputers equipped with GPU accelerators.
2 Numerical Method Description
We use a two-phase approach for the modelling of interacting galaxies: a hydrodynamic component (Vshivkov et al., 2011,1) and a collisionless component (Mitchell et al., 2012) for the description of stars and dark matter.
For the hydrodynamic component we use the 3D model of self-gravitating gas dynamics in a Cartesian coordinate system:
\[
\frac{\partial\rho}{\partial t}+\nabla\cdot(\rho\mathbf{v})=0,
\]
\[
\frac{\partial\rho\mathbf{v}}{\partial t}+\nabla\cdot(\rho\mathbf{v}\mathbf{v})=-\nabla p-\rho\,\nabla(\Phi+\Phi_{dm}),
\]
\[
\frac{\partial E}{\partial t}+\nabla\cdot(E\mathbf{v})=-\nabla\cdot(p\mathbf{v})-\rho\,\mathbf{v}\cdot\nabla(\Phi+\Phi_{dm})-\Lambda,
\]
\[
\Delta\Phi=4\pi G\rho,\qquad E=\rho\varepsilon+\tfrac{1}{2}\rho v^{2},\qquad p=(\gamma-1)\rho\varepsilon,
\]
where $p$ is the pressure, $\rho$ is the density, $\mathbf{v}$ is the velocity vector, $E$ is the density of total energy, $\Phi$ is the gravitational potential of the gas itself, $\Phi_{dm}$ is the contribution of the dark matter and stars to the gravitational potential, $\varepsilon$ is the inner energy, and $\Lambda$ is the cooling function (Sutherland & Dopita, 1993).
The dynamics of the collisionless component are described by the collisionless Boltzmann equation for the distribution function $f(\mathbf{x},\mathbf{u},t)$ of particles in the 6D position ($\mathbf{x}$) – velocity ($\mathbf{u}$) phase space:
\[
\frac{\partial f}{\partial t}+\mathbf{u}\cdot\nabla_{x}f-\nabla(\Phi+\Phi_{dm})\cdot\nabla_{u}f=0.
\]
The first moments of the collisionless Boltzmann equation are:
\[
\rho=m\int f\,d\mathbf{u},\qquad
\rho\mathbf{v}=m\int \mathbf{u}\,f\,d\mathbf{u},\qquad
\rho\,\Pi_{ij}=m\int (u_i-v_i)(u_j-v_j)\,f\,d\mathbf{u},
\]
\[
\frac{\partial\rho}{\partial t}+\nabla\cdot(\rho\mathbf{v})=0,\qquad
\frac{\partial\rho\mathbf{v}}{\partial t}+\nabla\cdot(\rho\mathbf{v}\mathbf{v}+\rho\Pi)=-\rho\,\nabla(\Phi+\Phi_{dm}),
\]
where $\Pi_{ij}$ is the symmetric velocity dispersion tensor, $\rho$ is the density, $\mathbf{v}$ is the velocity vector, $E$ is the density of total energy, $\Phi$ is the gravitational potential of the gas itself, $\Phi_{dm}$ is the contribution of the dark matter and stars to the gravitational potential, and $m$ is the particle mass.
For the collisionless component we use the 3D model of the Boltzmann moment equations in a Cartesian coordinate system.
The main characteristic parameters are the characteristic length (in parsecs), the characteristic density, the gravitational constant and the cooling rate. Let us introduce a uniform grid in the 3D computational domain, where $h_x$, $h_y$, $h_z$ are the mesh steps and $I$, $J$, $K$ are the numbers of mesh cells along the directions $x$, $y$, $z$. The method for the solution of the gas dynamics equations is based on the Fluids-in-Cells and Godunov methods (Vshivkov et al., 2011,1), which showed good performance on astrophysical problems (Tutukov et al., 2011).
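The uniform grid described above can be set up, for example, as follows (a hypothetical helper; `length` is the domain size and `cells` the number of cells along one direction):

```python
import numpy as np

def make_grid(length, cells):
    """Uniform cell-centred grid on [0, length] with `cells` cells.

    Returns the cell-centre coordinates x_i = (i + 1/2) h and the mesh
    step h = length / cells. Names are illustrative, not from the code.
    """
    h = length / cells
    x = (np.arange(cells) + 0.5) * h
    return x, h
```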
2.1 The gas dynamics equation solution technique
The solution of the gas dynamics equation system is performed in two stages. During the first (Eulerian) stage, the equation system describes the change of the gas quantities as a result of pressure, gravity and cooling. The operator approach is used for the elimination of the mesh effect (Vshivkov et al., 2006). The pressure and velocity values at all cell boundaries are the exact solution of the linearized Eulerian-stage equation system without the potential and cooling terms.
Let us consider the 1D gas dynamics equations in a Cartesian coordinate system.
We can drop the advective terms and consider the 1D gas dynamics equations at the Eulerian stage:
\[
\frac{\partial\rho}{\partial t}=0,\qquad
\frac{\partial v}{\partial t}+\frac{1}{\rho}\frac{\partial p}{\partial x}=0,\qquad
\frac{\partial p}{\partial t}+\rho c^{2}\frac{\partial v}{\partial x}=0.
\]
The eigenvalues of this system are $\lambda=0,\pm c$, where $c$ is the speed of sound. We drop the first column and first row (the density equation) and consider the equations:
\[
\frac{\partial q}{\partial t}+A\frac{\partial q}{\partial x}=0,\qquad
q=\begin{pmatrix}v\\ p\end{pmatrix},\qquad
A=\begin{pmatrix}0&1/\rho\\ \rho c^{2}&0\end{pmatrix},
\]
where $A=R\Lambda L$, $R$ is the matrix of right eigenvectors, $L$ is the matrix of left eigenvectors, $\Lambda$ is the diagonal matrix of eigenvalues, and $LR=I$. We make the change of variables $w=Lq$ and consider the independent equations:
\[
\frac{\partial w_k}{\partial t}+\lambda_k\frac{\partial w_k}{\partial x}=0.
\]
This system of equations has an exact solution at each cell boundary, depending on the sign of the eigenvalues. We make the inverse change $q=Rw$ and obtain the exact solution of the equations at the Eulerian stage.
The eigenvalues and eigenvectors of the 1D gas dynamics equations at the Eulerian stage are:
\[
\lambda_{1,2}=\pm c,\qquad
R=\begin{pmatrix}1&1\\ \rho c&-\rho c\end{pmatrix},\qquad
L=R^{-1}=\frac{1}{2}\begin{pmatrix}1&1/(\rho c)\\ 1&-1/(\rho c)\end{pmatrix}.
\]
This system is linearly hyperbolic and has the following analytic solution:
\[
w_k=\begin{cases}w_k^{L}, & \lambda_k>0,\\[2pt] w_k^{R}, & \lambda_k<0,\end{cases}
\]
where $w^{L}$ and $w^{R}$ correspond to the values of the function to the left and right of the cell boundary. These values are used in the Eulerian-stage scheme.
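The upwind selection by the sign of the eigenvalues can be sketched as follows (a generic illustration; `R`, `Lmat` and `lam` stand for the eigenvector matrices and eigenvalues defined above):

```python
import numpy as np

def interface_state(qL, qR, R, Lmat, lam):
    """Upwind characteristic solution at a cell boundary.

    Projects the left and right states onto characteristic variables
    w = L q, picks for every wave the upwind value according to the sign
    of its eigenvalue, and transforms back with q = R w. The matrices are
    assumed to satisfy A = R diag(lam) L for the frozen Jacobian A.
    """
    wL = Lmat @ qL
    wR = Lmat @ qR
    w = np.where(lam > 0.0, wL, wR)  # right-moving waves carry left data
    return R @ w
```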
During the second (Lagrangian) stage, the equation system contains divergent equations of the following type:
\[
\frac{\partial f}{\partial t}+\nabla\cdot(f\mathbf{v})=0.
\]
The Lagrangian stage describes the advective transport of all the gas quantities $f\in\{\rho,\rho\mathbf{v},E\}$. The initial version of the numerical method involved the computation of the contributions of the gas quantities to adjacent cells (Vshivkov et al., 2007). The computation was based on the scheme velocity at the cell boundaries. But this approach is not suitable for computation accelerators (Nvidia Tesla, Intel Xeon Phi, etc.). To show this, let us consider the solution of the above equation in 1D form:
\[
\frac{\partial f}{\partial t}+\frac{\partial (f\,v)}{\partial x}=0,
\]
where the scheme velocity at the cell boundary is defined as demonstrated in Fig. 1.
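The donor-cell flux computation of the Lagrangian stage can be illustrated in 1D as below (a first-order periodic sketch, not the production scheme; names are illustrative):

```python
import numpy as np

def advect_step(f, v_face, h, dt):
    """One first-order donor-cell step of f_t + (f v)_x = 0, periodic grid.

    `f` lives at cell centres; `v_face[i]` is the velocity at the boundary
    between cells i and i+1. The donor cell is selected by the sign of the
    boundary velocity, mirroring the upwind flux of the Lagrangian stage.
    """
    f_donor = np.where(v_face > 0.0, f, np.roll(f, -1))  # upwind value
    F = v_face * f_donor                                  # boundary flux
    return f - dt / h * (F - np.roll(F, 1))
```

Because each cell update is the difference of two boundary fluxes, the total of `f` is conserved by construction on a periodic grid.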
2.2 The Boltzmann moment equations solution technique
We can drop the advective terms in the 1D Boltzmann moment equations and consider a system of six equations (instead of 10) at the Eulerian stage:
We repeat the approach described in the previous subsection. The eigenvalues and eigenvectors of the 1D Boltzmann moment equations at the Eulerian stage are:
This system is linearly hyperbolic and has the following analytic solution:
where $w^{L}$ and $w^{R}$ correspond to the values of the function to the left and right of the cell boundary. The parameters in square brackets should be averaged. These values are used in the Eulerian-stage scheme for the Boltzmann moment equations.
2.3 The Poisson equation solution
The Poisson equation for the gravitational potential,
\[
\Delta\Phi=4\pi G\rho,
\]
is solved on a 27-point stencil. The Fourier transform method is used to solve the Poisson equation. The Poisson equation solution scheme in Fourier space is:
The Fast Fourier Transform procedure is used for the direct and inverse transform.
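The Fourier-space division at the heart of such a solver can be sketched compactly as below. This is an illustration only: it uses the exact spectral Laplacian on a periodic cube instead of the 27-point stencil symbol used in the code.

```python
import numpy as np

def solve_poisson_fft(rho, h, G=1.0):
    """Solve the periodic Poisson problem  laplacian(phi) = 4 pi G rho.

    Transform the density, divide by -|k|^2 (zeroing the k = 0 mean
    mode), and transform back. `rho` is a cubic array, `h` the mesh step.
    """
    n = rho.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=h)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                  # avoid division by zero at k = 0
    phi_hat = -4.0 * np.pi * G * np.fft.fftn(rho) / k2
    phi_hat[0, 0, 0] = 0.0             # fix the arbitrary mean of phi
    return np.real(np.fft.ifftn(phi_hat))
```

For a single Fourier mode of density the spectral solution is exact, which gives a direct check of the implementation.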
2.4 Correctness checking
At each time step the following correctness check was used:
The energy balance procedure performed at each time step is described in Vshivkov et al. (2011,2).
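The kind of per-step balance check described here can be sketched as follows (an illustrative fragment; the actual procedure follows the cited reference). Without source terms the cell-integrated totals must stay constant up to round-off:

```python
import numpy as np

def conserved_totals(rho, mom, E, h):
    """Cell-integrated mass, momentum and total energy on a cubic mesh.

    Multiplying the grid sums by the cell volume gives the conserved
    totals whose drift is monitored every time step.
    """
    vol = h ** 3
    return vol * np.array([rho.sum(), mom.sum(), E.sum()])
```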
3 Parallel implementation
The necessity of three-dimensional simulation and the unsteady nature of the problem impose hard requirements on the methods of solution. The rapid development of computer technology in recent years has made it possible to perform intensive computations and obtain physically correct results by means of three-dimensional codes. The use of supercomputers makes it possible to handle large data volumes, improve the computational accuracy and accelerate the computations. The main problem in astrophysical code development is the efficient solution of the gas dynamics equations and the Boltzmann moment equations, since it takes up to 90% of the computation time (fig. 2).
The basis of the parallel implementation of the hydrodynamical solver is a three-dimensional domain decomposition: MPI decomposition along one coordinate, and the CUDA technology along the other two coordinates (fig. 3). The three-dimensional parallel Fast Fourier Transform is performed by a subroutine from the free FFTW library. In the future, this Poisson solver will be ported to GPU accelerators.
The parallel implementation is tailored to the topology and architecture of the hybrid supercomputer NKS-G6 of the Siberian Supercomputer Center ICMMG SB RAS. The numerical method for solving the hydrodynamic equations is modified so that at every stage the fluxes through each cell are computed independently. A one-layer overlapping of the boundary points of the neighbouring subdomains is performed. In the future, this modification of the method will be ported to Intel Xeon Phi accelerators.
In the case of a hybrid implementation it is necessary to define two concepts of scalability:

Strong scalability: the reduction of the computation time of one step of the same problem when more devices are used.

Weak scalability: keeping the computation time of one step constant while the amount of work per device is fixed and the number of devices increases.
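These two notions can be quantified with the usual speedup and efficiency ratios (textbook definitions, given only for illustration):

```python
def strong_scaling(t1, tp, p):
    """Speedup and parallel efficiency for a fixed-size problem on p devices.

    t1 is the one-device time per step, tp the time per step on p devices.
    """
    speedup = t1 / tp
    return speedup, speedup / p

def weak_scaling(t1, tp):
    """Weak-scaling efficiency: work per device fixed as devices grow."""
    return t1 / tp
```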
The results of the efficiency of the parallel implementation are shown in figure 4.
4 Testing of the implementation
The GPUPEGAS code was verified on: 4 shock tube problems (3 Godunov tests for the gas dynamics equations and one test for the Boltzmann moment equations); the new Aksenov test; the Kelvin-Helmholtz and Rayleigh-Taylor instability tests; the author's cloud collision test; and the collapse of a rotating molecular cloud test.
4.1 The Godunov tests
4.2 The Shock tube test for the Boltzmann moment equations
The initial configurations of the shock tube test for the Boltzmann moment equations are
The results of the simulation are given in figure 6.
4.3 Aksenov test
Consider the equations of one-dimensional gas dynamics in dimensional form:
where $p$ is the pressure, $\rho$ the density, $u$ the velocity and $\gamma$ the adiabatic index.
As the characteristic variables we take the characteristic length, the characteristic density and the characteristic pressure; the characteristic velocity and the characteristic time follow from them. With the dimensional quantities selected, let us choose the initial data of (Aksenov, 2001). The analytical solution is then known in closed form. The results of the simulation at the chosen time are given in figure 7.
The velocity is approximated sufficiently well by the numerical solution. The density has a jump at the centre. This jump is of the same nature as the temperature jump in the third Godunov test. In fact, this test produces something like an entropy trace, formed as a result of the "runoff" of gas from the region with zero velocity. This feature is concentrated on a finite number of points and is reduced by mesh refinement.
4.4 KelvinHelmholtz and RayleighTaylor Instability
Gravitational instability underlies the physical formulation of the problem, which makes it mathematically incorrect. The numerical method must not suppress the physical instability. To check the correct reproduction of instabilities, the code has been verified on the Rayleigh-Taylor and Kelvin-Helmholtz instabilities. The Rayleigh-Taylor instability verifies the reproduction of the gravitational term; the Kelvin-Helmholtz instability verifies the reproduction of nonlinear hydrodynamic turbulence.
The initial data for the Rayleigh-Taylor instability are: the domain, the adiabatic index, the hydrostatic equilibrium pressure, the acceleration of free fall, and the density, where
The initial data for the Kelvin-Helmholtz instability are: the domain, the adiabatic index, the pressure, and the density, where
The results of the Kelvin-Helmholtz and Rayleigh-Taylor instability simulations are given in figure 8.
4.5 Author’s cloud collision test
Let us take a steady-state gas sphere in hydrostatic equilibrium as the initial state for the gas dynamics equations. The density distribution is obtained from the gas dynamics equation system together with the Poisson equation written in spherical coordinates:
The density distribution is:
Then the pressure and the gravitational potential are:
The initial distance between the two gas clouds and the collision velocity are specified. The results of the simulation are given in figure 9.
4.6 Collapse of a rotating cloud
To investigate the possibility of modelling the collapse of rotating molecular clouds, we simulate a gas cloud bounded by a sphere of given radius (pc), with given cloud mass, density, temperature (K), angular velocity, adiabatic index and speed of sound (km/sec). The main characteristic parameters are the characteristic length, density and velocity; the corresponding nondimensional density, pressure, angular velocity, adiabatic index and computational domain follow from them. In this study the behaviour of the energies agreed quantitatively, to within 10%, with the results of other authors (Petrov & Berczik, 2005).
5 Numerical simulation of a galaxies collision
The main purpose of the GPUPEGAS code is the modelling of collisions of galaxies of different types and at different angles. As a model problem, we consider the collision of disk galaxies at an angle. The first cloud is given as a spherical domain filled uniformly with gas of given mass (kg). The second cloud is given as an ellipsoid with the ratio of axes 1:2:1, inclined at 45 degrees to the axis of the collision. The clouds move in opposite directions with given velocities (km/sec). Figure 11 shows the evolution of the collision and a "slim" splash in the interacting galaxies. The calculation was made using 96 GPU accelerators of the NKS-G6 cluster of the Siberian Supercomputer Center ICMMG SB RAS.
5.1 The passage scenario of a central collision of two galaxies
We now show the possibility of the passage scenario of galaxies in the two-phase model (Vshivkov et al., 2011,1). Two self-gravitating gas clouds are placed in the 3D computational domain at the initial moment. Both clouds have the same distributions of gas parameters. Each cloud is a spherical domain filled uniformly by gas of given mass (kg) and by stars and dark matter of given mass (kg). The clouds move in opposite directions with given velocities (km/sec). We reproduce the passage scenario of a central collision of two galaxies in the two-phase model; figure 12 shows its evolution.
6 Conclusions and future work
A new hydrodynamical code, GPUPEGAS, for the simulation of interacting galaxies on hybrid supercomputers equipped with GPUs has been described. The code is based on a combination of the Godunov method with an original implementation of the FlIC method, specially adapted for GPUs. The Fast Fourier Transform is used for the Poisson equation solution in GPUPEGAS. The software implementation of these methods was tested on classical gas dynamics problems, the new Aksenov test and classical gravitational gas dynamics problems. The Boltzmann moment equations approach was used for modelling stars and dark matter. The possibility of the passage scenario of galaxies in the two-phase model is shown. The scalability of GPUPEGAS on computational accelerators is shown. A maximum speedup factor of 55 (as compared with 12.19 for the GAMER code (Schive et al., 2010)) is demonstrated using one GPU. A maximum efficiency of 96% (as compared with 65% for the GAMER code on 16 GPUs (Schive et al., 2010)) is demonstrated using 60 GPUs of the NKS-G6 cluster of the Siberian Supercomputer Center ICMMG SB RAS.
References
 Tutukov & Fedorova (2006) Tutukov, A., Fedorova, A. 2006, Astron. Rep., 50, 785
 Tutukov et al. (2011) Tutukov, A., Lazareva, G., Kulikov, I. 2011, Astron. Rep., 55, 770
 Gingold & Monaghan (1977) Gingold, R.A., Monaghan, J.J. 1977, MNRAS, 181, 375
 Lucy (1977) Lucy, L.B. 1977, AJ, 82, 1013
 Colella & Woodward (1984) Colella, P., Woodward, P.R. 1984, J. Comp. Phys., 54, 174
 O'Shea et al. (2005) O'Shea, B., Bryan, G., Bordner, J., Norman, M., Abel, T., Harkness, R., & Kritsuk, A. 2005, Lect. Notes Comput. Sci. Eng., 41, 341
 Hockney & Eastwood (1981) Hockney, R.W., & Eastwood, J.W. 1981, Computer Simulation Using Particles (New York: McGrawHill)
 Couchman (1991) Couchman, H. M. P. 1991, ApJ, 368, L23
 Barnes & Hut (1986) Barnes, J., & Hut, P. 1986, Nature, 324, 446
 Dubinski et al. (2004) Dubinski, J., Kim, J., Park, C., & Humble, R. 2004, New Astron., 9, 111
 Fedorenko (1961) Fedorenko, R. 1961, USSR Comput. Math. & Math. Phys., 1, 1092
 Godunov (1959) Godunov, S. K. 1959, Math. Sb., 47, 271
 Kulikovsky et al. (2001) Kulikovskii, A. G., Pogorelov, N. V., & Semenov, A. Yu. 2001, Mathematical Aspects of Numerical Solution of Hyperbolic Systems (in Russian;Moscow:Fizmatlit)
 Toro (1999) Toro, E.F. 1999, Riemann Solvers and Numerical Methods for Fluid Dynamics (Heidelberg:SpringerVerlag)
 Courant et al. (1952) Courant, R., Isaacson, E., Rees, M. 1952, Communications on Pure and Applied Mathematics, 5, 243
 Roe (1997) Roe, P. 1997, J. Comp. Phys., 135, 250
 Engquist & Osher (1981) Engquist, B., Osher, S.J. 1981, Math. Comp., 36, 321
 Harten et al. (1983) Harten, A., Lax, P.D., Van Leer, B. 1983, Soc. Indust. & Appl. Math. Rev., 25, 35
 Einfeld (1988) Einfeld, B. 1988, Soc. Indust. & Appl. Math. J. Num. Analys., 25, 294
 Batten et al. (1997) Batten, P., Clarke, N., Lambert, C., & Causon, D.M. 1997, Soc. Indust. & Appl. Math. J. Comp., 18, 1553
 Van Leer (1979) Van Leer, B. 1979, J. Comput. Phys., 32, 101
 Pearcea & Couchman (1997) Pearcea, F.R., Couchman, H.M.P. 1997, New Astronomy, 2, 411
 Wadsley et al. (2004) Wadsley, J.W., Stadel, J., Quinn, T. 2004, New Astronomy, 9, 137
 Matthias (1996) Matthias, S. 1996, MNRAS, 278, 1005
 Springel (2005) Springel, V. 2005, MNRAS, 364, 1105
 Jin & Xin (1995) Jin, S., Xin, Z. 1995, Commun. Pure Appl. Math., 48, 235
 Godunov et al. (2011) Godunov, S.K., Manuzina, Yu.D., Nazareva, M.A. 2011, Comp. Math. Comp. Phys, 51, 88
 Ziegler (2005) Ziegler, U. 2005, A&A, 435, 385
 Mignone et al. (2005) Mignone, A., Plewa, T., Bodo, G. 2005, ApJ, 160, 99
 Norman et al. (2006) Hayes, J., Norman, M., Fiedler, R. et al. 2006, ApJS, 165, 188
 Teyssier (2002) Teyssier, R. 2002, A&A, 385, 337
 Kravtsov et al. (2002) Kravtsov, A., Klypin, A., Hoffman, Y. 2002, ApJ, 571, 563
 Stone et al. (2008) Stone, J. et al. 2008, ApJS, 178, 137
 Brandenburg & Dobler (2002) Brandenburg, A., Dobler, W. 2002, Comp. Phys. Comm., 147, 471
 Schive et al. (2010) Schive, H., Tsai, Y., Chiueh, T. 2010, ApJS, 186, 457
 Murphy & Burrows (2008) Murphy, J., Burrows, A. 2008, ApJS, 179, 209
 Springel (2009) Springel, V. 2010, MNRAS, 401, 791
 Bruenn et al. (2009) Bruenn, S. et al. 2009, J. Phys., 180, 393
 Vshivkov et al. (2011,1) Vshivkov, V., Lazareva, G., Snytnikov, A., Kulikov, I., & Tutukov, A. 2011, ApJS, 194, 47
 Gonzalez et al. (2007) Gonzalez, M., Audit, E., Huynh P. 2007, A&A, 464, 429
 Krumholz et al. (2007) Krumholz, M.R., Klein, R.I., McKee, C.F., & Bolstad, J. 2007, ApJ, 667, 626
 Mignone et al. (2007) Mignone, A. et al. 2007, ApJS, 170, 228
 Almgren et al. (2010) Almgren, A. et al. 2010, ApJ, 715, 1221
 Feng et al. (2011) Feng, Y. et al., 2011, ApJS, 197, 18
 Ferrari (2010) Ferrari, A. 2010, HPC on vector systems 2009, 4, 179
 Straalen (2009) Van Straalen, B. 2009, Proc. IPDPS 09, 1
 Mitchell et al. (2012) Mitchell, N., Vorobyov, E., Hensler, G. 2012, MNRAS, 428, 2674
 Sutherland & Dopita (1993) Sutherland, R. S., & Dopita, M. A. 1993, ApJS, 88, 253
 Vshivkov et al. (2006) Vshivkov, V. A., Lazareva, G. G., & Kulikov, I. M. 2006, Comp. Tech., 11, 27 (in Russian)
 Vshivkov et al. (2007) Vshivkov, V., Lazareva, G., & Kulikov, I. 2007, Opt. Instrum. Data Proc., 43, 530
 Vshivkov et al. (2011,2) Vshivkov, V., Lazareva, G., Snytnikov, A., Kulikov, I., & Tutukov, A. 2011, J. Inv. Illpos. Prob., 19, 1, 151
 Aksenov (2001) Aksenov, A.V. 2001, Doklady Akademii Nauk, 381, 2, 176 (in Russian)
 Petrov & Berczik (2005) Petrov, M.I., Berczik, P.P. 2005, Astronomische Nachrichten, 326, 505
Table 1. The main properties of the codes.

| Code | Hydrodynamical method | Poisson solver | HPC technologies | Correctness checking |
|------|-----------------------|----------------|------------------|----------------------|
| Hydra | SPH | Adaptive P3M | High Performance Fortran | + |
| Gasoline | SPH | Tree code + Multipole Method | MPI | – |
| GrapeSPH | SPH | Direct Summation | GRAPE | – |
| GADGET-2 | SPH | TreePM + FFT | MPI | – |
| NIRVANA | AMR + HLL | Multigrid | MPI | – |
| FLASH | AMR + PPM | Multigrid | MPI | + |
| ZEUS-MP | Finite difference method | FFT + Multigrid | MPI | – |
| ENZO | AMR + PPM | FFT + Multigrid | MPI | – |
| RAMSES | AMR + HLLC | Multigrid + CG | OpenMP + MPI | + |
| ART | AMR + MUSCL | FFT | MPI | – |
| Athena | Roe's solver | FFT | MPI | – |
| Pencil Code | Finite difference method | FFT | HPF + MPI | + |
| Heracles | MUSCL | CG | MPI | – |
| Orion | AMR + MUSCL | Multigrid | – | + |
| Pluto | AMR + HLLC | Analytical | MPI | – |
| CASTRO | AMR + PPM | Multigrid | MPI + OpenMP | – |
| GAMER | AMR + TVD | FFT + SOR | CUDA + MPI | – |
| BETHE-Hydro | Arbitrary Lagrangian-Eulerian | Matrix Inverse | – | – |
| AREPO | Moving mesh + MUSCL | TreePM + FFT | MPI | – |
| CHIMERA | Moving mesh + PPM | Analytical | – | – |
| PEGAS | FlIC + Godunov | FFT | MPI | + |
Initial configurations of the Godunov tests (left/right states, discontinuity position and final time).

| Parameter | Test 1 | Test 2 | Test 3 |
|-----------|--------|--------|--------|
| ρ_L | 2 | 1 | 1 |
| u_L | 0 | -2 | 0 |
| p_L | 2 | 0.4 | 1000 |
| ρ_R | 1 | 1 | 1 |
| u_R | 0 | 2 | 0 |
| p_R | 1 | 0.4 | 0.01 |
| x_0 | 0.5 | 0.5 | 0.5 |
| t | 0.2 | 0.15 | 0.012 |