REBOUND: An open-source multi-purpose N-body code for collisional dynamics
Key words: Methods: numerical – Planets and satellites: rings – Protoplanetary disks
REBOUND is a new multi-purpose N-body code which is freely available under an open-source license. It was designed for collisional dynamics such as planetary rings but can also solve the classical N-body problem. It is highly modular and can be customized easily to work on a wide variety of different problems in astrophysics and beyond.
REBOUND comes with three symplectic integrators: leapfrog, the symplectic epicycle integrator (SEI) and a Wisdom-Holman mapping (WH). It supports open, periodic and shearing-sheet boundary conditions. REBOUND can use a Barnes-Hut tree to calculate both self-gravity and collisions. These modules are fully parallelized with MPI as well as OpenMP. The former makes use of a static domain decomposition and a distributed essential tree. Two new collision detection modules based on a plane-sweep algorithm are also implemented. The performance of the plane-sweep algorithm is superior to a tree code for simulations in which one dimension is much longer than the other two and in simulations which are quasi-two-dimensional with fewer than one million particles.
In this work, we discuss the different algorithms implemented in REBOUND, the philosophy behind the code’s structure as well as implementation specific details of the different modules. We present results of accuracy and scaling tests which show that the code can run efficiently on both desktop machines and large computing clusters.
1 Introduction
REBOUND is a new open-source collisional N-body code. This code, and precursors of it, have already been used in a wide variety of publications (Rein & Papaloizou, 2010; Crida et al., 2010; Rein et al., 2010, Rein & Liu in preparation; Rein & Latter in preparation). We believe that REBOUND can be of great use for many different problems and have a wide reach in astrophysics and other disciplines. To our knowledge, there is currently no publicly available code for collisional dynamics capable of solving the problems described in this paper. This is why we decided to make it freely available under the open-source license GPLv3 (the full license is distributed together with REBOUND; it can also be downloaded from http://www.gnu.org/licenses/gpl.html).
Collisional N-body simulations are extensively used in astrophysics. A classical application is a planetary ring (see e.g. Wisdom & Tremaine, 1988; Salo, 1991; Richardson, 1994; Lewis & Stewart, 2009; Rein & Papaloizou, 2010; Michikoshi & Kokubo, 2011, and references therein), which often has a collision timescale that is much shorter than, or at least comparable to, an orbital timescale. Self-gravity plays an important role, especially in the dense parts of Saturn’s rings (Schmidt et al., 2009). These simulations are usually done in the shearing sheet approximation (Hill, 1878).
Collisions are also important during planetesimal formation (Johansen et al., 2007; Rein et al., 2010, Johansen et al. in preparation). Collisions provide the dissipative mechanism to form a planetesimal out of a gravitationally bound swarm of boulders.
REBOUND can also be used with little modification in situations where only a statistical measure of the collision frequency is required such as in transitional and debris discs. In such systems, individual collisions between particles are not modeled exactly, but approximated by the use of superparticles (Stark & Kuchner, 2009; Lithwick & Chiang, 2007).
Furthermore, REBOUND can be used to simulate classical N-body problems involving entirely collisionless systems. A symplectic and mixed variable integrator can be used to follow the trajectories of both test particles and massive particles.
We describe the general structure of the code, how to obtain, compile and run it in Sect. 2. The timestepping scheme and our implementation of symplectic integrators are described in Sect. 3. The modules for gravity are described in Sect. 4. The algorithms for collision detection are discussed in Sect. 5. In Sect. 6, we present results of accuracy tests for different modules. We discuss the efficiency of the parallelization with the help of scaling tests in Sect. 7. We finally summarize in Sect. 8.
2 Overview of the code structure
REBOUND is written entirely in C and conforms to the ISO C99 standard. It compiles and runs on any modern computer platform which supports the POSIX standard such as Linux, Unix and Mac OSX. In its simplest form, REBOUND requires no external libraries to compile.
Users are encouraged to install the OpenGL and GLUT libraries which enable real-time and interactive 3D visualizations. LIBPNG is required to automatically save screenshots. The code uses OpenMP for parallelization on shared memory systems. Support for OpenMP is built into modern compilers (for example gcc) and requires no libraries. An MPI library must be installed for parallelization on distributed memory systems. REBOUND also supports hybrid parallelization using both OpenMP and MPI simultaneously.
2.1 Downloading and compiling the code
The source code is hosted on the github platform and can be downloaded at http://github.com/hannorein/rebound/. A snapshot of the current repository is provided as tar and zip files. However, users are encouraged to clone the entire repository with the revision control system git. The latter allows one to keep up to date with future updates. Contributions from users to the public repository are welcome. Once downloaded and extracted, one finds five main directories.
The entire source code can be found in the src/ directory. In the vast majority of cases, nothing in this directory needs to be modified.
Many examples are provided in the examples/ directory. Each example comes with a problem file, named problem.c, and a makefile named Makefile. To compile one of the examples, one has to run the make command in the corresponding directory. The code compilation is then performed in the following steps:

The makefile sets up environment variables which control various options such as the choice of compiler, code optimization, real time visualization and parallelization.

It sets symbolic links, specifying the modules chosen for this problem (see below).

It calls the makefile in the src/ directory which compiles and links all source files.

The binary file is copied to the problem directory, from where it can be executed.
Documentation of the source code can be generated in the doc/ directory with doxygen. There is no static documentation available because the code structure depends crucially on the modules currently selected. To update the documentation with the current module selection, one can simply run make doc in any directory with a makefile.
In the directory tests/ one finds tests for accuracy and scaling as well as simple unit tests. The source code of the tests presented in Sects. 6 and 7 is included as well.
The problem/ directory is the place to create new problems. It contains a template for that. Any of the examples can also be used as a starting point for new problems.
2.2 Modules
REBOUND is extremely modular. The user has the choice between different gravity, collision, boundary and integration modules. It is also possible to implement completely new modules with minimal effort.
Modules are chosen by setting symbolic links. Thus, there is no need to execute a configuration script prior to compiling the code. For example, there is one link gravity.c which points to one of the gravity modules gravity_*.c. The symbolic links are set in each problem makefile. Only this makefile has to be changed when a different module is used. Precompiler macros are set automatically for situations in which different modules need to know about each other.
This setup allows the user to work on multiple projects at the same time using different modules. When switching to another problem, nothing has to be set up and the problem can be compiled by simply typing make in the corresponding directory.
To implement a new module, one can just copy an existing module to the problem directory, modify it and change the link in the makefile accordingly. Because no file in the src/ directory needs to be changed, one can easily keep REBOUND in sync with new versions (on how to do that, see for example http://gitref.org/ for an introduction to git).
2.3 Computational domain and boundary conditions
In REBOUND, the computational domain consists of a collection of cubic boxes. Any integer number of boxes can be used in each direction. This allows elongated boxes to be constructed out of cubic boxes. The cubic root boxes are also used for static domain decomposition when MPI is enabled. In that case, the number of root boxes has to be an integer multiple of the number of MPI nodes. When a tree is used for either gravity or collision detection, there is one tree structure per root box (see Sect. 4.2).
REBOUND comes with three different boundary conditions. Open boundaries (boundaries_open.c) remove every particle from the simulation that leaves the computational domain. Periodic boundary conditions (boundaries_periodic.c) are implemented with ghost boxes. Any number of ghost boxes can be used in each direction. Shear-periodic boundary conditions (boundaries_shear.c) can be used to simulate a shearing sheet.
3 Integrators
Several different integrators have been implemented in REBOUND. Although all of these integrators are second-order accurate and symplectic, their symplectic nature is formally lost as soon as self-gravity or collisions are approximated or when velocity-dependent forces are included.
All integrators follow the commonly used Drift-Kick-Drift (DKD) scheme (we could have also chosen a KDK scheme but found that a DKD scheme performs slightly better) but implement the three substeps differently. We describe the particles’ evolution in terms of a Hamiltonian $H$ which can often be written as the sum of two Hamiltonians, $H = H_1 + H_2$. How the Hamiltonian is split into two parts depends on the integrator. Usually, one identifies $H_1(p)$ as the kinetic part and $H_2(q)$ as the potential part, where $p$ and $q$ are the canonical momenta and coordinates. During the first drift substep, the particles evolve under the Hamiltonian $H_1$ for half a timestep, $dt/2$. Then, during the kick substep, the particles evolve under the Hamiltonian $H_2$ for a full timestep, $dt$. Finally, the particles evolve again for half a timestep under $H_1$. Note that the positions and velocities are synchronized in time only at the end of the DKD timesteps. We refer the reader to Saha & Tremaine (1992) and references therein for a detailed discussion on symplectic integrators.
REBOUND uses the same timestep for all particles. By default, the timestep does not change during the simulation because in all the examples that come with REBOUND, the timestep can be naturally defined as a small fraction of the dynamical time of the system. However, it is straightforward to implement a variable timestep. This implementation depends strongly on the problem studied. Note that in general variable timesteps also break the symplectic nature of an integrator.
REBOUND does not choose the timestep automatically. It is up to the user to ensure that the timestep is small enough to not affect the results. This is especially important for highly collisional systems in which multiple collisions per timestep might occur and in situations where the curvature of particle trajectories is large. The easiest way to ensure numerical convergence is to run the same simulation with different timesteps. We encourage users to do that whenever a new parameter regime is studied.
3.1 Leapfrog
Leapfrog is a second-order accurate and symplectic integrator for non-rotating frames. Here, the Hamiltonian is split into the kinetic part $H_1 = p^2/2$ and the potential part $H_2 = \Phi(q)$. Both the drift and kick substeps are simple Euler steps. First the positions of all particles are advanced for half a timestep while keeping the velocities fixed. Then the velocities are advanced for one timestep while keeping the positions fixed. In the last substep the positions are again advanced for half a timestep. Leapfrog is implemented in the module integrator_leapfrog.c.
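The drift-kick-drift sequence can be sketched in a few lines of C. This is a minimal illustration and not the code in integrator_leapfrog.c: the particle layout, the function names and the toy force `harmonic_accel` are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical particle layout; REBOUND's own structure may differ. */
struct particle {
    double x, y, z;    /* position */
    double vx, vy, vz; /* velocity */
    double ax, ay, az; /* acceleration, filled in by a force routine */
};

/* Callback that fills in ax, ay, az for all n particles. */
typedef void (*accel_fn)(struct particle *p, size_t n);

/* Drift: advance positions by h while keeping velocities fixed. */
static void drift(struct particle *p, size_t n, double h) {
    for (size_t i = 0; i < n; i++) {
        p[i].x += h * p[i].vx;
        p[i].y += h * p[i].vy;
        p[i].z += h * p[i].vz;
    }
}

/* Kick: advance velocities by h while keeping positions fixed. */
static void kick(struct particle *p, size_t n, double h) {
    for (size_t i = 0; i < n; i++) {
        p[i].vx += h * p[i].ax;
        p[i].vy += h * p[i].ay;
        p[i].vz += h * p[i].az;
    }
}

/* One DKD step: half drift, full kick, half drift. */
void leapfrog_dkd_step(struct particle *p, size_t n, double dt, accel_fn accel) {
    drift(p, n, 0.5 * dt);
    accel(p, n); /* forces are evaluated at the mid-step positions */
    kick(p, n, dt);
    drift(p, n, 0.5 * dt);
}

/* Toy force for testing only: a harmonic oscillator, a = -x. */
void harmonic_accel(struct particle *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        p[i].ax = -p[i].x;
        p[i].ay = -p[i].y;
        p[i].az = -p[i].z;
    }
}
```

Passing the force routine as a callback mirrors the modular idea of the code, where the integrator does not need to know which gravity module is in use.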
3.2 Wisdom-Holman Mapping
A symplectic Wisdom-Holman mapping (WH, Wisdom & Holman, 1991) is implemented as a module in integrator_wh.c. The implementation follows closely that by the SWIFT code (http://www.boulder.swri.edu/~hal/swift.html). The WH mapping is a mixed variable integrator that calculates the Keplerian motion of two bodies orbiting each other exactly up to machine precision during the drift substep. Thus, it is very accurate for problems in which the particle motion is dominated by a central potential and perturbations added in the kick substep are small. However, the WH integrator is substantially slower than the leapfrog integrator because Kepler’s equation is solved iteratively every timestep for every particle.
The integrator assumes that the central object has the index 0 in the particle array, that it is located at the origin and that it does not move. The coordinates of all particles are assumed to be in the heliocentric frame. During the sub-timesteps the coordinates are converted to Jacobi coordinates (and back) according to their index. The particle with index 1 has the first Jacobi index, and so on. This works best if the particles are sorted according to their semi-major axis. Note that this is not done automatically.
3.3 Symplectic Epicycle Integrator
The symplectic epicycle integrator (SEI, Rein & Tremaine, 2011) for Hill’s approximation (Hill, 1878) is implemented in integrator_sei.c. When shear-periodic boundary conditions (boundaries_shear.c) are used, the Hill approximation is known as a shearing sheet.
SEI has similar properties to the Wisdom-Holman mapping in the case of the Kepler potential but works in a rotating frame and is as fast as a standard non-symplectic integrator. The error after one timestep scales as the third power of the timestep times the ratio of the gravitational force over the Coriolis force (see Rein & Tremaine, 2011, for more details on the performance of SEI).
The epicyclic frequency and the vertical epicyclic frequency can be specified individually. This can be used to enhance the particle density in the midplane of a planetary ring and thus simulate the effect of self-gravity (see e.g. Schmidt et al., 2001).
4 Gravity
REBOUND is currently equipped with two (self-)gravity modules. A gravity module calculates exactly or approximately the acceleration onto each particle. For a particle with index $i$ this is given by
$$\mathbf{a}_i = \sum_{j=0,\, j \neq i}^{N_{\rm active}-1} \frac{G m_j}{\left(r_{ji}^2 + b^2\right)^{3/2}} \, \mathbf{r}_{ji}, \qquad (1)$$
where $G$ is the gravitational constant, $m_j$ the mass of particle $j$, $\mathbf{r}_{ji}$ the position of particle $j$ relative to particle $i$ and $r_{ji} = |\mathbf{r}_{ji}|$ the relative distance between particles $j$ and $i$. The gravitational softening parameter $b$ defaults to zero but can be set to a finite value in simulations where physical collisions between particles are not resolved and close encounters might lead to large unphysical accelerations. The variable $N_{\rm active}$ specifies the number of massive particles in the simulation. Particles with an index equal to or larger than $N_{\rm active}$ are treated as test particles. By default, all particles are assumed to have mass and contribute to the sum in Eq. 1.
4.1 Direct summation
The direct summation module is implemented in gravity_direct.c and computes Eq. 1 directly. If there are $N_{\rm active}$ massive particles and $N$ particles in total, the performance scales as $O(N \cdot N_{\rm active})$. Direct summation is only efficient when the number of active particles is small.
4.2 Octree
Barnes & Hut (1986, BH hereafter) proposed an algorithm to approximate Eq. 1, which can reduce the computation time drastically from $O(N^2)$ to $O(N \log N)$. The idea is straightforward: distant particles contribute to the gravitational force less than those nearby. By grouping particles hierarchically, one can separate particles into those far from and those near to any one particle. The total mass and the center of mass of a group of particles which are far away can then be used as an approximation when calculating the long-range gravitational force. Contributions from individual particles are only considered when they are nearby.
We implement the BH algorithm in the module gravity_tree.c. The hierarchical structure is realized using a three-dimensional tree, called an octree. Each node represents a cubic cell which might have up to eight sub-cells with half the size. The root node of the tree contains all the particles, while the leaf nodes contain exactly one particle. The BH tree is initially constructed by adding particles one at a time to the root box, going down the tree recursively to smaller boxes until one reaches an empty leaf node to which the particle is then added. If the leaf node already contains a particle it is divided into eight sub-cells.
Every time the particles move, the tree needs to be updated using a tree reconstruction algorithm. We therefore keep track of any particle crossing the boundaries of the cell it is currently in. If it has moved outside, then the corresponding leaf node is destroyed and the particle is re-added to the tree as described above. After initialization and reconstruction, we walk through the tree to update the total mass and the center of mass for each cell from the bottom up.
To calculate the gravitational forces on a given particle, one starts at the root node and descends into sub-cells as long as the cells are considered to be close to the particle. Let us define the opening angle as $\theta = w/R$, where $w$ is the width of the cell and $R$ is the distance from the cell’s center of mass to the particle. If the opening angle is smaller than a critical angle, $\theta < \theta_{\rm crit}$, the total mass and center of mass of the cell are used to calculate the contribution to the gravitational force. Otherwise, the sub-cells are opened until the criterion is met. One has to choose $\theta_{\rm crit}$ appropriately to achieve a balance between accuracy and speed.
REBOUND can also include the quadrupole tensor of each cell in the gravity calculation by setting the pre-compiler flag QUADRUPOLE. The quadrupole expansion (Hernquist, 1987) is more accurate but also more time consuming for a fixed critical opening angle $\theta_{\rm crit}$. We discuss how the critical opening angle and the quadrupole expansion affect the accuracy in Sect. 6.1.
With REBOUND, a static domain decomposition is used for parallelizing the tree algorithm on distributed memory systems. Each MPI node contains one or more root boxes (see also Sect. 2.3) and all particles within these boxes belong to that node. The number of root boxes, $N_{\rm RB}$, has to be a multiple of the number of MPI nodes, $N_{\rm MPI}$. For example, the setup illustrated in Fig. 1 uses 9 root boxes allowing 1, 3 or 9 MPI nodes. By default, the domain decomposition is done one coordinate direction at a time. If one uses 3 MPI nodes in the above example, each node holds three of the nine root boxes. When a particle moves across a root box boundary during the simulation, it is sent to the corresponding node and removed from the local tree.
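The divisibility constraint and one possible box-to-node assignment can be written down in a few lines. The mapping below, which gives each node a contiguous block of root box indices, is an assumption made for illustration; the function names are hypothetical and the actual assignment in REBOUND may differ.

```c
/* Returns 1 if the root boxes can be split evenly over the MPI nodes. */
int decomposition_is_valid(int n_root_boxes, int n_mpi) {
    return n_root_boxes > 0 && n_mpi > 0 && n_root_boxes % n_mpi == 0;
}

/* Assumed mapping: contiguous blocks of root box indices per node.
 * Only meaningful if decomposition_is_valid() returned 1. */
int node_for_root_box(int box, int n_root_boxes, int n_mpi) {
    int boxes_per_node = n_root_boxes / n_mpi;
    return box / boxes_per_node;
}
```

With 9 root boxes, the check accepts 1, 3 or 9 nodes and rejects, for example, 2 or 4.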
Because of the long-range nature of gravity, every node needs information from any other node during the force calculation. We distribute this information before the force calculation using an essential tree (Salmon et al., 1990) and an all-to-all communication pattern. The essential tree contains only those cells of the local tree that might be accessed by the remote node during the force calculation. Each node prepares one essential tree for every root box that resides on another node. The cells that constitute the essential tree are copied into a buffer array and the daughter cell references therein are updated accordingly. The center of mass and quadrupole tensors (if enabled) are stored in the cell structure and automatically copied when a cell is copied. For that reason only the tree structure needs to be distributed, not individual particles. The buffer array is then sent to the other nodes using non-blocking MPI calls.
For example, suppose 9 MPI nodes are used, each node using exactly one tree in its domain. For that scenario the essential trees prepared for one root box are illustrated in Fig. 1. The essential trees include all cells which are close enough to the boundary of that root box so that they might be opened during the force calculation of a particle in that root box according to the opening angle criterion.
In Sect. 7 we show that this parallelization is very efficient when the particle distribution is homogeneous and there are more than a few thousand particles on every node. When the number of particles per node is small, communication between nodes dominates the total runtime.
5 Collisions
REBOUND supports several different modules for collision detection which are described in detail below. All of these methods search for collisions only approximately, might miss some of the collisions or detect a collision where there is no collision. This is because either curved particle trajectories are approximated by straight lines (collisions_sweep.c and collisions_sweepphi.c) or particles have to be overlapping to collide (collisions_direct.c and collisions_tree.c). This is also illustrated in Fig. 2.
In all modules, the order of the collisions is randomized. This ensures that there is no preferred ordering which might lead to spurious correlations when one particle collides with multiple particles during one timestep. Note that REBOUND uses a fixed timestep for all particles. Therefore one has to ensure that the timestep is chosen small enough so that one particle collides with no more than one other particle during one timestep, at least on average. See also the discussion in Sect. 3.
A free-slip, hard-sphere collision model is used. Individual collisions are resolved using momentum and energy conservation. A constant or an arbitrary velocity-dependent normal coefficient of restitution $\epsilon$ can be specified to model inelastic collisions. The relative velocity after one collision is then given by
$$v_n' = -\epsilon \, v_n, \qquad v_t' = v_t, \qquad (2)$$
where $v_n$ and $v_t$ are the relative normal and tangential velocities before the collision. Particle spin is currently not supported.
5.1 Direct nearest neighbor search
A direct nearest neighbor collision search is the simplest collision module in REBOUND. It is implemented in collisions_direct.c.
In this module, a collision is detected whenever two particles are overlapping at the end of the DKD timestep, i.e. the middle of the drift substep, where positions and velocities are synchronized in time (see Sect. 3). This is illustrated in the right panel of Fig. 2. Then, the collision is added to a collision queue. When all collisions have been detected, the collision queue is shuffled randomly. Each individual collision is then resolved after checking that the particles are approaching each other.
Every pair of particles is checked once per timestep, making the method scale as $O(N^2)$. Similar to the direct summation method for gravity, this is only useful for a small number of particles. For most cases, the nearest neighbor search using a tree is much faster (see next section).
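The pairwise overlap test and the shuffled collision queue can be sketched as follows. This is an illustration of the idea and not the code in collisions_direct.c; the structures `ball` and `pair`, the function names and the use of rand() for shuffling are assumptions made for this example.

```c
#include <stddef.h>
#include <stdlib.h>

struct ball { double x, y, z, r; }; /* hypothetical particle layout */
struct pair { size_t i, j; };       /* one entry of the collision queue */

/* Check every pair once and store overlapping pairs in queue.
 * Returns the number of overlaps found; at most max_pairs are stored. */
size_t find_overlaps(const struct ball *p, size_t n,
                     struct pair *queue, size_t max_pairs) {
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            double dx = p[j].x - p[i].x;
            double dy = p[j].y - p[i].y;
            double dz = p[j].z - p[i].z;
            double R = p[i].r + p[j].r;
            if (dx * dx + dy * dy + dz * dz < R * R) {
                if (found < max_pairs) {
                    queue[found].i = i;
                    queue[found].j = j;
                }
                found++;
            }
        }
    }
    return found;
}

/* Fisher-Yates shuffle so that collisions are resolved in random order. */
void shuffle_pairs(struct pair *queue, size_t n) {
    for (size_t i = n; i > 1; i--) {
        size_t k = (size_t)rand() % i;
        struct pair tmp = queue[i - 1];
        queue[i - 1] = queue[k];
        queue[k] = tmp;
    }
}
```

The double loop visits $N(N-1)/2$ pairs, which is the origin of the quadratic scaling.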
5.2 Octree
The octree described in Sect. 4.2 can also be used to search for nearest neighbors. The module collisions_tree.c implements such a nearest neighbor search. It is parallelized with both OpenMP and MPI. It can be used in conjunction with any gravity module, but when both tree modules gravity_tree.c and collisions_tree.c are used simultaneously, only one tree structure is needed. When collisions_tree.c is the only tree module, center of mass and quadrupole tensors are not calculated in tree cells.
To find overlapping particles for particle $i$, one starts at the root of the tree and descends into daughter cells as long as the distance $r_{ic}$ of the particle to the cell center is smaller than a critical value:
$$r_{ic} < R_i + R_{\rm max} + \frac{\sqrt{3}}{2} w_c, \qquad (3)$$
where $R_i$ is the size of the particle, $R_{\rm max}$ is the maximum size of a particle in the simulation and $w_c$ is the width of the current cell. When two particles are found to be overlapping, a collision is added to the collision queue and resolved later in the same way as above.
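The cell opening test for the neighbor search can be expressed as a one-line predicate. The sketch assumes that the critical distance is the sum of the particle radius, the largest particle radius in the simulation and half the diagonal of the cubic cell, $\sqrt{3}\,w_c/2$, which is the largest distance from the cell center at which a neighbor inside the cell can sit. The function name is hypothetical.

```c
#include <math.h>

/* Returns 1 if a cell of width w_c whose center is a distance r_ic away
 * may contain a particle overlapping a particle of radius R_i.
 * R_max is the largest particle radius in the simulation. */
int cell_needs_opening(double r_ic, double R_i, double R_max, double w_c) {
    double half_diagonal = 0.5 * sqrt(3.0) * w_c;
    return r_ic < R_i + R_max + half_diagonal;
}
```

The test is conservative: it never rejects a cell that contains an overlapping neighbor, at the price of opening some cells that turn out to contain none.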
If MPI is used, each node prepares the tree and particle structures that are close to the domain boundaries as these might be needed by other nodes (see Fig. 1). This essential tree is sent to other nodes and temporarily added to the local tree structure. The nearest neighbor search can then be performed in the same way as in the serial version. The essential tree and particles are never modified on a remote node.
This essential tree is different from the essential tree used for the gravity calculation in two ways. First, this tree is needed at the end of the timestep, whereas the gravity tree is needed at the beginning of the kick substep. Second, the criterion for cell opening, Eq. 3, is different.
A nearest neighbor search using the octree takes on average $O(\log N)$ operations for one particle and therefore $O(N \log N)$ operations for all $N$ particles.
5.3 Plane-sweep Algorithm
We further implement two collision detection modules based on a plane-sweep algorithm in collisions_sweep.c and collisions_sweepphi.c. The plane-sweep algorithm makes use of a conceptual plane that is moved along one dimension.
The original algorithm described by Bentley & Ottmann (1979) maintains a binary search tree in the orthogonal dimensions and keeps track of line crossings. In our implementation, we assume the dimension normal to the plane is much longer than the other dimensions. This allows us to simplify the Bentley-Ottmann algorithm and get rid of the binary search tree which further speeds up the calculation.
In REBOUND the sweep is either performed along the $x$ direction or along the azimuthal angle $\phi$ (measured in the $xy$ plane from the origin). The sweep in the $x$ direction can also be used in the shearing sheet. The sweep in the $\phi$ direction is useful for (narrow) rings in global simulations. Here, we only discuss the plane-sweep algorithm in the Cartesian case (along the $x$ direction) in detail. The $\phi$ sweep implementation is almost identical except for the difference in periodicity and the need to calculate the angle and angular frequency for every particle at the beginning of the collision search.
Our plane-sweep algorithm can be described as follows (see also Fig. 3):

If a tree is not used to calculate self-gravity, the particles are sorted according to their $x$ coordinate (each tree cell keeps a reference to the particle it contains; this reference would have to be updated every time a particle is moved in the particle array, which would lead to a larger overhead). During the first timestep, quicksort is used as the particles are most likely not pre-sorted. In subsequent timesteps, the adaptive sort algorithm insertion sort is used. It can make use of the pre-sorted array from the previous timestep and has an average performance of $O(N)$ as long as particles do not completely randomize their positions in one timestep.

The $x$ coordinate of every particle before and after the drift step is inserted into an array SWEEPX. The trajectory is approximated by a line (see left panel of Fig. 2). In general, the real particle trajectories will be curved. In that case the positions are only approximately the start and end points of the particle trajectory. The particle radius is subtracted/added to the minimum/maximum $x$ coordinate. The array contains $2N$ elements when all particles have been added.

If a tree is not used, the array SWEEPX is sorted with the $x$ position as a key using the insertion sort algorithm. Because the particle array is pre-sorted, insertion sort runs in approximately $O(N)$ operations. If a tree is used, the array is sorted with quicksort.

A conceptual plane with its normal vector in the $x$ direction is inserted at the left side of the box. While going through the array SWEEPX, we move the plane towards the right one step at a time according to the $x$ coordinate of the current element in the array. We thus move the plane to the other side of the box in a total of $2N$ stops.

The plane is intersecting particle trajectories. We keep track of these intersections using a separate array SWEEPL. Whenever a particle appears for the first time in the array SWEEPX the particle is added to the SWEEPL array. The particle is removed from the array SWEEPL when it appears in the array SWEEPX for the second time. In Fig. 3, the plane is between stop 10 and 11, intersecting the trajectories of particles 5 and 7.

When a new particle is inserted into the array SWEEPL, we check for collisions of this particle with any other particle in SWEEPL during the current timestep. The collision is recorded and resolved later. In Fig. 3 the array SWEEPL has two entries, particles 5 and 7. Those will be checked for collisions.
The time needed to search for a collision at each stop of the plane is $O(N_{\rm SWEEPL})$, where $N_{\rm SWEEPL}$ is the number of elements in the array SWEEPL. This could be reduced with a binary search tree to $O(\log N_{\rm SWEEPL})$ as in the original algorithm by Bentley & Ottmann (1979). However, tests have shown that there is little to no performance gain for the problems studied with REBOUND because a more complicated data structure has to be maintained. One entire timestep with the plane-sweep algorithm is completed in $O(N \cdot N_{\rm SWEEPL})$. It is then easy to see that this method can only be efficient when $N_{\rm SWEEPL} \lesssim \log N$, as a tree code is more efficient otherwise.
Indeed, experiments have shown (see Sect. 7.4) that the plane-sweep algorithm is more efficient than a nearest neighbor search with an octree by many orders of magnitude for low dimensional systems in which the number of elements in SWEEPL is small.
6 Test problems
We present several tests in this section which verify the implementation of all modules. First, we measure the accuracy of the tree code. Then we check for energy and momentum conservation. We use a long term integration of the outer solar system as a test of the symplectic WH integrator. Finally, we measure the viscosity in a planetary ring which is a comprehensive test of both self-gravity and collisions.
6.1 Force accuracy
We measure the accuracy of the tree module gravity_tree.c by comparing the force onto each particle to the exact result obtained by direct summation (Eq. 1). We set up 1000 randomly distributed particles with different masses in a box. We do not use any ghost boxes and particles do not evolve.
We sum up the absolute value of the acceleration error for each particle and normalize it with respect to the total acceleration (see Hernquist, 1987, for more details).
This quantity is plotted as a function of the critical opening angle $\theta_{\rm crit}$ in Fig. 4(a). One can see that the force quickly converges towards the correct value for small $\theta_{\rm crit}$. The quadrupole expansion performs between one and two orders of magnitude better than the monopole expansion, depending on $\theta_{\rm crit}$.
In Fig. 4(b) we plot the errors of the same simulations as a function of the computation time. The quadrupole expansion requires more CPU time than the monopole expansion for fixed $\theta_{\rm crit}$. However, for a fixed accuracy the quadrupole expansion can be the faster of the two. Note that including the quadrupole tensor also increases communication costs between MPI nodes.
6.2 Energy and momentum conservation in collisions
In a non-rotating simulation box with periodic boundaries and non-gravitating collisional particles, we test both momentum and energy conservation. Using a coefficient of restitution of unity (perfectly elastic collisions), the total momentum and energy are conserved up to machine precision for all collision detection algorithms.
6.3 Long term integration of Solar System
To test the long-term behavior of our implementation of the Wisdom-Holman mapping, we integrate the outer Solar System for 200 million years. We use the initial conditions given by Applegate et al. (1986) with 4 massive planets and Pluto as a test particle. The direct summation module has been used to calculate self-gravity. With a 40 day timestep and an integration time of 200 Myr, the total runtime on one CPU was less than 2 hours.
6.4 Viscosity in planetary rings
Daisaka et al. (2001) calculate the viscosity in a planetary ring using numerical simulations. We repeat their analysis as this is an excellent code test as the results depend on both selfgravity and collisions. The quasiequilibrium state is dominated by either selfgravity or collisions, depending on the ratio of the Hill radius over the physical particle radius, .
In this simulation we use the octree implementation for gravity and the plane-sweep algorithm for collisions. We use a fixed geometric optical depth and a constant coefficient of restitution. The separate parts of the viscosity are calculated directly as defined by Daisaka et al. (2001) for various ratios of the Hill radius over the particle radius and plotted in dimensionless units in Fig. 6.
The results are in good agreement with previous results. At large ratios of the Hill radius over the particle radius, the collisional part of the viscosity is slightly higher in our simulations when permanent particle clumps form. This is most likely due to the different treatment of collisions and some ambiguity in defining the collisional viscosity when particles are constantly touching each other (Daisaka, private communication).
7 Scaling
Using the shearing sheet configuration with the tree modules gravity_tree.c and collisions_tree.c, we measure the scaling of REBOUND and the efficiency of the parallelization. The simulation parameters have been chosen to resemble those of a typical simulation of Saturn’s A ring with an optical depth of order unity and a collision timescale much shorter than one orbit. A single, fixed opening angle is used throughout. The problem.c files for this and all other tests can be found in the test/ directory.
All scaling tests have been performed on the IAS aurora cluster. Each node has dual quad-core 64-bit AMD Opteron Barcelona processors and 16 GB RAM. The nodes are interconnected with 4x DDR InfiniBand.
7.1 Strong scaling
In the strong scaling test, the average time to compute one timestep is measured as a function of the number of processors for a fixed total problem size (i.e. a fixed total number of particles). We use only the MPI parallelization option.
The results for simulations with different total particle numbers are plotted in Fig. 7(a). One can see that for a small number of processors the scaling is linear for all problem sizes. When the number of particles per processor falls below a critical value, the performance drops. Below the critical value, a large fraction of the tree constitutes the essential tree, which needs to be copied to neighboring nodes every timestep. This leads to an increase in communication.
The results show that we can completely utilize 64 processor cores with one million particles.
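The strong scaling behavior can be quantified from measured timings with the usual definitions of speedup and parallel efficiency, sketched below. The function names are hypothetical and the definitions are the standard textbook ones, not something specific to REBOUND.

```c
#include <assert.h>

/* Speedup of a run relative to a reference run of the same problem.
 * t_ref, t_p: average wall-clock time per timestep of the two runs. */
double strong_scaling_speedup(double t_ref, double t_p)
{
    assert(t_ref > 0.0 && t_p > 0.0);
    return t_ref / t_p;
}

/* Parallel efficiency for a fixed total problem size.
 * p_ref, p: number of processors used in the reference run and in the
 * run being evaluated. A value of 1 corresponds to linear scaling. */
double strong_scaling_efficiency(double t_ref, int p_ref, double t_p, int p)
{
    assert(t_ref > 0.0 && t_p > 0.0 && p_ref > 0 && p > 0);
    return (t_ref * (double)p_ref) / (t_p * (double)p);
}
```

In the linear regime the efficiency stays close to one. Once the number of particles per processor falls below the critical value, communication of the essential tree makes it drop.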
7.2 Weak scaling
In the weak scaling test we measure the average time to compute one timestep as a function of the number of processors for a fixed number of particles per processor. Again, we only use the MPI parallelization option.
The results for simulations with different numbers of particles per processor are plotted in Fig. 7(b). One can easily confirm that the runtime increases only logarithmically with the number of processors, as expected. By increasing the problem size, the communication per processor does not increase for the collision detection algorithm, as only direct neighbors need to be evaluated on each node. The runtime and communication for the gravity calculation increase logarithmically with the total number of particles (which is proportional to the number of processors in this case).
These results show that REBOUND’s scaling is as good as it can possibly get with a tree code. The problem size is only limited by the total number of available processors.
7.3 OpenMP/MPI tradeoff
The previous results use only MPI for parallelization. REBOUND also supports parallelization with OpenMP for shared memory systems.
OpenMP has the advantage over MPI that no communication is needed. On one node, different threads share the same memory and work on the same tree and particle structures. However, the tree building and reconstruction routines are not parallelized. These routines can only be parallelized efficiently when a domain decomposition is used (as used for MPI, see above).
Results of hybrid simulations using both OpenMP and MPI at the same time are shown in Fig. 8. We plot the average time to compute one timestep as a function of the number of OpenMP threads per MPI node. The total number of particles and processors (64) is kept fixed.
One can see that OpenMP does indeed perform better than MPI when the particle number per node is small and the runtime is dominated by communication (see also Sect. 7.1). For large particle numbers, the difference between OpenMP and MPI is smaller, as the sequential tree reconstruction outweighs the gains. Eventually, for very large simulations, the parallelization with MPI is faster.
Thus, in practice OpenMP can be used to accelerate MPI runs which are bound by communication. It is also an easy way to accelerate simulations on desktop computers that have multiple CPU cores.
7.4 Comparison of collision detection algorithms
The collision modules described in Sect. 5 have very different scaling behaviors and are optimized for different situations. Here, we illustrate their scalings using two shearing sheet configurations with no self-gravity. We plot the average number of timesteps per second as a function of the problem size in Fig. 9 for the plane-sweep algorithm and both the octree and direct nearest neighbor collision search.
In the simulations used in Fig. 9(a), we vary both the azimuthal size and the radial size of the computational domain. The aspect ratio of the simulation box is kept constant. For the plane-sweep algorithm, the number of particle trajectories intersecting the plane scales as $\sqrt{N}$, where $N$ is the number of particles. (Note that a disk is effectively a two-dimensional system; in three dimensions this number scales as $N^{2/3}$.) Thus, the overall scaling of the plane-sweep method is $\mathcal{O}(N^{1.5})$, which can be verified in Fig. 9(a). The tree and direct detection methods scale, unsurprisingly, as $\mathcal{O}(N \log N)$ and $\mathcal{O}(N^2)$, respectively.
For the simulations used in Fig. 9(b), we vary the radial size of the computational domain and keep the azimuthal size fixed at 20 particle radii. Thus, the aspect ratio changes and the box becomes very elongated for large particle numbers. If a tree is used in REBOUND, an elongated box is implemented as many independent trees, each being a cubic root box (see Sect. 2.3). Because each tree needs to be accessed at least once during the collision search, this makes the tree code scale as $\mathcal{O}(N^2)$ for large $N$, effectively becoming a direct nearest neighbor search. The plane-sweep algorithm, on the other hand, scales as $\mathcal{O}(N)$, as the number of particle trajectories intersecting the plane is constant. Again, the direct nearest neighbor search scales, unsurprisingly, as $\mathcal{O}(N^2)$.
From these test cases, it is obvious that the choice of collision detection algorithm strongly depends on the problem. Also note that if the gravity module is using a tree, the collision search using the same tree comes at only a small additional cost.
The plane-sweep module can be faster for non-self-gravitating simulations by many orders of magnitude, especially if the problem size is varied only in one dimension.
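The scaling argument for the plane-sweep method can be made concrete with a deliberately simplified sketch. It detects overlaps between static, equal-sized disks in a non-periodic two-dimensional box, whereas the modules discussed in this section work on particle trajectories during a timestep. The struct `disk` and the function `count_overlaps_sweep` are hypothetical names and do not correspond to REBOUND's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical 2D disk centre used only for this sketch. */
struct disk { double x, y; };

/* qsort comparator: ascending x coordinate. */
static int compare_x(const void *a, const void *b)
{
    double xa = ((const struct disk *)a)->x;
    double xb = ((const struct disk *)b)->x;
    return (xa > xb) - (xa < xb);
}

/* Count overlapping pairs among n disks of equal radius r by sweeping a
 * line along x. The array is sorted in place. Each disk is tested only
 * against the disks currently inside the strip of width 2r behind the
 * sweep line, so the cost is roughly n times the strip occupancy. */
size_t count_overlaps_sweep(struct disk *d, size_t n, double r)
{
    assert(d != NULL && r > 0.0);
    qsort(d, n, sizeof *d, compare_x);

    size_t pairs = 0;
    size_t first = 0; /* first index still inside the sweep strip */
    for (size_t i = 0; i < n; i++) {
        /* Drop disks that are too far behind the sweep line to overlap. */
        while (d[i].x - d[first].x > 2.0 * r) first++;
        for (size_t j = first; j < i; j++) {
            double dx = d[i].x - d[j].x;
            double dy = d[i].y - d[j].y;
            if (dx * dx + dy * dy < 4.0 * r * r) pairs++;
        }
    }
    return pairs;
}
```

The inner loop runs over the strip occupancy only. In a box that grows in the sweep direction at fixed width this occupancy stays constant, giving linear cost after the initial sort (which is itself $\mathcal{O}(N \log N)$ in this sketch), while at fixed aspect ratio it grows as $\sqrt{N}$.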
8 Summary
In this paper, we presented REBOUND, a new open-source multi-purpose N-body code for collisional dynamics. REBOUND is available for download at http://github.com/hannorein/rebound and can be redistributed freely under the GPLv3 license.
The code is written in a modular way, allowing users to choose between different numerical integrators, boundary conditions, self-gravity solvers and collision detection algorithms. With minimal effort, one can also implement completely new modules.
The octree self-gravity and collision detection modules are fully parallelized with MPI and OpenMP. We showed that both run efficiently on multi-core desktop machines as well as on large clusters. Results from a weak scaling test show that the maximum number of particles that REBOUND can handle efficiently is limited in practice only by the number of available CPUs. We will use this in future work to conduct extremely elongated simulations that can span the entire circumference of Saturn’s rings.
Two new collision detection methods based on a plane-sweep algorithm are implemented in REBOUND. We showed that the plane-sweep algorithm scales linearly with the number of particles for effectively low-dimensional systems and is therefore superior to a nearest neighbor search with a tree. Examples of effectively low-dimensional systems include very elongated simulation domains and narrow rings. Furthermore, the simpler data structure of the plane-sweep algorithm also makes it superior for quasi-two-dimensional simulations with fewer than about one million particles.
Three different integrators have been implemented, for rotating and non-rotating frames. All of these integrators are symplectic. Exact long-term orbit integrations can be performed with a Wisdom-Holman mapping.
Given the already implemented features as well as the open and modular nature of REBOUND, we expect that this code will find many applications both in the astrophysics community and beyond. For example, molecular dynamics and granular flows are subject areas where the methods implemented in REBOUND can be readily applied. We strongly encourage users to contribute new algorithms and modules to REBOUND.
Acknowledgements.
We would like to thank the referee John Chambers for helpful comments and suggestions. We would also like to thank Scott Tremaine, Hiroshi Daisaka and Douglas Lin for their feedback during various stages of this project. Hanno Rein was supported by the Institute for Advanced Study and the NSF grant AST0807444. ShangFei Liu acknowledges the support of the NSFC grant 11073002. Hanno Rein and ShangFei Liu would further like to thank the organizers of ISIMA 2011 and the Kavli Institute for Astronomy and Astrophysics in Beijing for their hospitality.
References
 Applegate et al. (1986) Applegate, J. H., Douglas, M. R., Gursel, Y., Sussman, G. J., & Wisdom, J. 1986, AJ, 92, 176
 Barnes & Hut (1986) Barnes, J. & Hut, P. 1986, Nature, 324, 446
 Bentley & Ottmann (1979) Bentley, J. & Ottmann, T. 1979, IEEE Transactions on Computers, C-28, 643
 Crida et al. (2010) Crida, A., Papaloizou, J., Rein, H., Charnoz, S., & Salmon, J. 2010, AJ, submitted
 Daisaka et al. (2001) Daisaka, H., Tanaka, H., & Ida, S. 2001, Icarus, 154, 296
 Hernquist (1987) Hernquist, L. 1987, ApJS, 64, 715
 Hill (1878) Hill, G. W. 1878, Astronomische Nachrichten, 91, 251
 Johansen et al. (2007) Johansen, A., Oishi, J. S., Mac Low, M.-M., et al. 2007, Nature, 448, 1022
 Lewis & Stewart (2009) Lewis, M. C. & Stewart, G. R. 2009, Icarus, 199, 387
 Lithwick & Chiang (2007) Lithwick, Y. & Chiang, E. 2007, ApJ, 656, 524
 Michikoshi & Kokubo (2011) Michikoshi, S. & Kokubo, E. 2011, ApJ, 732, L23+
 Rein et al. (2010) Rein, H., Lesur, G., & Leinhardt, Z. M. 2010, A&A, 511, A69+
 Rein & Papaloizou (2010) Rein, H. & Papaloizou, J. C. B. 2010, A&A, 524, A22+
 Rein & Tremaine (2011) Rein, H. & Tremaine, S. 2011, MNRAS, 845
 Richardson (1994) Richardson, D. C. 1994, MNRAS, 269, 493
 Saha & Tremaine (1992) Saha, P. & Tremaine, S. 1992, AJ, 104, 1633
 Salmon et al. (1990) Salmon, J., Quinn, P. J., & Warren, M. 1990, Using parallel computers for very large N-body simulations (Dynamics and Interactions of Galaxies), 216–218
 Salo (1991) Salo, H. 1991, Icarus, 90, 254
 Schmidt et al. (2009) Schmidt, J., Ohtsuki, K., Rappaport, N., Salo, H., & Spahn, F. 2009, Dynamics of Saturn’s Dense Rings (Springer), 413–458
 Schmidt et al. (2001) Schmidt, J., Salo, H., Spahn, F., & Petzschmann, O. 2001, Icarus, 153, 316
 Stark & Kuchner (2009) Stark, C. C. & Kuchner, M. J. 2009, ApJ, 707, 543
 Wisdom & Holman (1991) Wisdom, J. & Holman, M. 1991, AJ, 102, 1528
 Wisdom & Tremaine (1988) Wisdom, J. & Tremaine, S. 1988, AJ, 95, 925