RayRamses: a code for ray tracing on the fly in Nbody simulations
Abstract
We present a ray tracing code to compute integrated cosmological observables on the fly in AMR Nbody simulations. Unlike conventional ray tracing techniques, our code takes full advantage of the time and spatial resolution attained by the Nbody simulation by computing the integrals along the line of sight on a cellbycell basis through the AMR simulation grid. Moroever, since it runs on the fly in the Nbody run, our code can produce maps of the desired observables without storing large (or any) amounts of data for postprocessing. We implemented our routines in the RAMSES Nbody code and tested the implementation using an example of weak lensing simulation. We analyse basic statistics of lensing convergence maps and find good agreement with semianalytical methods. The ray tracing methodology presented here can be used in several cosmological analysis such as SunyaevZel’dovich and integrated SachsWolfe effect studies as well as modified gravity. Our code can also be used in crosschecks of the more conventional methods, which can be important in tests of theory systematics in preparation for upcoming large scale structure surveys.
I Introduction
Observations of large scale structure in the Universe have been playing a crucial role in getting ever tighter constraints on competing theoretical cosmological models. Perhaps the most classical type of such observations consists in mapping the threedimensional distribution of galaxies (which trace with some bias the total matter distribution) with spectroscopic surveys 2000AJ….120.1579Y (); 2003astro.ph..6581C (); 2013AJ….145…10D (). These surveys measure the baryon acoustic oscillations (BAO) signal 2005MNRAS.362..505C (); 2005ApJ…633..560E () and clustering anisotropies due to redshift space distortions (RSD) 2009MNRAS.393..297P (); 2008Natur.451..541G (); 2014MNRAS.440.2692S (), which allow to put constraints on the rate at which the Universe expands and the rate at which structure grows in it. A complementary approach to traditional galaxy surveys (and which is the focus of this paper) is to infer the large scale structure of matter by measuring its integrated effect on light that travels from background sources towards us. These include shifts in the temperature of cosmic microwave background (CMB) photons caused by inverse Compton scattering with high energy electrons inside galaxy clusters – the socalled SunyaevZel’dovich (SZ) effect 1972CoASP…4..173S (); 1980MNRAS.190..413S (); carlstrom (); 2012PhRvL.109d1101H (), shifts in the temperature of CMB photons as they cross timeevolving gravitational potentials – the socalled integrated SachsWolfe (ISW) effect 1967ApJ…147…73S (); 2002PhRvD..65j3510C (); 2008PhRvD..78d3519H (); 2008ApJ…683L..99G (), and magnification and distortions of background light sources as their emitted light bends due to strong and weak gravitational lensing effects 2001PhR…340..291B (); 2003ARA&A..41..645R (); 2010CQGra..27w3001B (); 2015RPPh…78h6901K ().
As observational surveys of large scale structure keep attaining higher precision, it is important that our theoretical understanding of the relevant physical processes keeps evolving as well. This helps in the interpretation of current data, as well as in the planning of future missions. In computing these theoretical predictions, theorists typically recourse to Nbody simulation methods since these are currently the most accurate way to predict the clustering of matter on intermediate and small scales, where the density fluctuations have become nonlinear. Nbody simulations allow also to include recipes to model the effects of baryonic physics and can be used in the generation of mock data sets to calibrate observational pipelines. Using Nbody simulations to predict integrated effects along lines of sight that cover a redshift range is, in general, not as straightforward as getting predictions at fixed redshift values. For the latter, it often suffices to analyse the simulation output at a given snapshot of the particle distribution, whereas for the former it is required that the simulation results are analysed in a continuous range of redshift values. As a result of that, Nbody methods for integrated observables are often subject to a number of approximations that are more or less valid depending on the exact observable studied. For instance, one of the most popular methods for cosmological weak lensing simulations consists of projecting the continuous matter distribution along the line of sight into a series of lens planes 2000ApJ…530..547J (); 2003ApJ…592..699V (); 2008ApJ…682….1D (); 2008MNRAS.391..435F (); 2009A&A…497..335T (); 2008MNRAS.388.1618C (); Hilbert:2008kb (); 2009ApJ…701..945S (); 2012MNRAS.420..155K (); 2013MNRAS.435..115B (); 2014MNRAS.445.1942M (); 2014MNRAS.445.1954P (); 2015PhRvD..91f3507L (); 2015arXiv151108211G (). This naturally erases the details of the time evolution of the fields along the line of sight. Furthermore, these projections assume that the superposition principle holds for the lensing effects of gravity, which is not necessarily true in theories beyond General Relativity that have nonlinear equations. Simulations of the ISW effect also make certain simplifying assumptions in the calculation of the time derivative of the gravitational potentials (see e.g. 2009MNRAS.396..772C ()).
A great deal of effort is normally put into assessing the validity of the approximations made in these numerical methods, and in general, they seem to be robust enough. However, given the ever higher precision observations that lie ahead, it is desirable that the same observables can be computed with different methods, especially those which are subject to fewer approximations. This can allow for important checks of any residual theory systematics that could still be present. Moreover, current Nbody methods to probe the clustering of matter along the line of sight sometimes require substantial amounts of data to be stored before it is postprocessed to compute the desired signal. This provides extra motivation to develop new numerical techniques that are lighter in data storage, especially in light of the upcoming generation of surveys, which will require large simulations and mock data sets for calibration purposes.
This paper is precisely about a numerical method for integrated cosmological observables that goes beyond existing techniques in the number of approximations made and data storage concerns. Our method, which is based on the original idea of Refs. whitehu2000 (); li2001 (), is designed to trace rays from some source redshift to the observer, on the fly in the Nbody simulation. Our algorithm is implemented in the adaptive mesh refinement (AMR) RAMSES Nbody code 2002A&A…385..337T () and performs the integrations along the line of sight on a cellbycell basis, fully exploiting the spatial and time resolution provided by the Nbody code. Moreover, since it runs on the fly in the Nbody simulation, it can produce the desired maps (lensing, ISW, etc) without having to output the particle snapshots for postprocessing analysis. The goal of this paper is to introduce the code implementation and illustrate its application in cosmological weak lensing simulations.
The outline of this paper is as follows. We start in Sec. II by describing in more detail the reasons that motivated us to develop the ray tracing code presented in this paper. Section III explains the main aspects of the implementation of the algorithm in the RAMSES code. In Sec. IV, we explain the formalism to perform weak lensing studies with our code and test our implementation in Sec. V for a fixed Gaussian potential. Section VI is devoted to weak lensing cosmological simulations. We analyse our code results for one and twopoint statistics of the lensing convergence, where we assess the impact of Nbody resolution and different integration methods. We also investigate the lensing signal around dark matter haloes in our simulations. Finally, we summarise and analyse the prospects for future developments and work in Sec. VII.
Ii Motivating a new ray tracing code
In general, numerical studies of integrated observables employ the following three step strategy. First, one runs a Nbody simulation for a given cosmology and stores the particle data at a specified number of redshift values. Second, the output from the simulations is used to generate a mock lightcone from some observer to a given source redshift^{1}^{1}1There are however ways to compute the lightcone on the fly in the simulation (e.g. Refs. 2008ApJ…682….1D (); 2009A&A…497..335T ()).. Here, one often needs to employ some interpolation scheme to construct a continuous matter distribution from the simulation results that are available only at a finite number of redshift values. Finally, rays are traced across the lightcone to probe the distribution of matter along the line of sight. This strategy has been employed most notably in weak lensing studies (see e.g. Refs. 2000ApJ…530..547J (); 2003ApJ…592..699V (); 2008ApJ…682….1D (); 2008MNRAS.391..435F (); 2009A&A…497..335T (); 2008MNRAS.388.1618C (); Hilbert:2008kb (); 2009ApJ…701..945S (); 2012MNRAS.420..155K (); 2013MNRAS.435..115B (); 2014MNRAS.445.1942M (); 2014MNRAS.445.1954P (); 2015PhRvD..91f3507L (); 2015arXiv151108211G () and references therein), but also in ISW 2009MNRAS.396..772C (); 2010MNRAS.407..201C (); 2014MNRAS.438..412W () and SZ 2000MNRAS.317…37D (); 2001MNRAS.326..155D (); 2001ApJ…549..681S (); 2014MNRAS.440.3645M (); 2015arXiv150905134D () related work.
One can identify, however, two less appealing aspects of this strategy. The first one is practical and relates to the large amounts of data that are needed to generate lightcones for postprocessing. The second is related to the loss in resolution along the line of sight that follows from analysing a lightcone that has been constructed from a finite number of snapshots. To give a concrete example, conventional weak lensing studies usually employ the socalled multiple lensplane approximation, in which the observables are calculated only on a series of planes, onto which the density field has been projected^{2}^{2}2See, however, the approach of Ref. 2012MNRAS.420..155K (), in which the lensing quantities are integrated using the threedimensional distribution of the simulations (without projection onto planes), but which is still only available at a finite number of redshifts.. Although one can always perform convergence tests on the number of planes used (e.g. Ref. 2014MNRAS.445.1954P ()), some of the detailed information on the time evolution along the line of sight is in general lost.
Our main motivation to develop the code presented in this paper was to overcome the two abovementioned aspects. Namely, we aimed to implement a numerical method that (i) computes the integrated observables on the fly in the simulation, thereby avoiding the need to store large amounts of data; and (ii) takes full advantage of the spatial and time resolution of the Nbody run to compute the integrals along the line of sight. Our numerical implementation is based on the original idea of Ref. whitehu2000 () for weak lensing simulations, which was later optimized in Ref. li2001 (). In particular, in the latter work, the authors realized that in particlemesh (PM) Nbody simulations, the integrated quantities can be computed analytically on a cellbycell basis as the simulation is running. These authors implemented their method in regular grid PM codes. In this paper, we follow a similar approach, but implement the algorithm in the publicly available RAMSES code 2002A&A…385..337T (), which can achieve a far greater resolution due to its AMR nature.
When designing this code, it was also our goal to make it general enough so that it could be used as a platform to perform studies of other types of integrated observables, and not just lensing. As a result, even though in this code presentation paper we illustrate the code operation for lensing, we stress that the algorithm is more general than that. In short, the code we present in this paper can calculate integrals of the form
(1) 
where is the comoving distance along some ray trajectory, is an integration kernel and is any field that can be determined inside the simulation box at coordinates , and . The calculation of different observables corresponds to different expressions for and . For instance, for thermal and kinetic SZ studies, is related to the densityweighted temperature and bulk velocity of electrons in clusters, respectively; for ISW studies, is given by the time derivative of the lensing gravitational potential; and for lensing studies, would be associated with second transverse derivatives of the lensing potential (cf. Sec. IV). For the case of lensing, rays may also get their trajectories bent (although we anticipate here that this is not the case in this first version of the code). As we commented above, there is already a substantial body of work available in the literature on these topics. The code we present here provides a different method to compute the same quantities which, amongst other things, can be used in important crosschecks of the traditional methods.
We note in passing that the raytracing machinery that we installed in RAMSES may also serve as a starting point to develop a code that could be applied in radiative transfer studies (see e.g. Refs. 2006MNRAS.371.1057I (); 2009MNRAS.400.1283I (); 2011MNRAS.414.3458W (); 2013MNRAS.436.2188R (); 2013MNRAS.434..748A () and references therein). This will, however, require some modifications to the code presented here, which is primarily oriented for integrated observables along lines of sight.
Iii Code description
In this section, we describe the main parts of the ray tracing code. We start by presenting a quick review of the default RAMSES code, which is followed by an overview of the ray tracing algorithm and how it is implemented in RAMSES. We then explain with more detail each of the main parts of the ray tracing code.
iii.1 Notation and basics of the Ramses code
Our ray tracing modules are installed in the publicly available AMR RAMSES Nbody code 2002A&A…385..337T (). RAMSES employs a number of simulation particles which act as discrete tracers of the underlying matter field. The simulation box is covered by a three dimensional mesh, on which the density values are calculated using the cloudincell (CIC) interpolation scheme given the particle distribution at any time step. The code then solves for the gravitational potential field on the mesh, which can be finitedifferenced to find the corresponding gravitational forces. The force at the particle positions is obtained by interpolating back from the mesh using the same CIC scheme to ensure momentum conservation. This is then used to update the particle’s velocity and position at the next time step. The whole process is repeated from some initial time (typically redshift ) to a later time (usually ). The cubic cells of the 3D mesh can get refined if the effective number of particles contained in them exceeds some prespecified threshold, . Conversely, the cells are also derefined if the number of particles drops below that threshold. This AMR nature of the code is useful in cosmological simulations, because it allows for high resolution in regions of high matter density, whilst saving computational resources in regions of lower density where the resolution can be lower.
The term "domain level" refers to the coarsest mesh that regularly covers the whole simulation volume. In RAMSES, the domain level contains cells in each direction^{3}^{3}3In this paper, we always assume three dimensional systems, although some times we shall use twodimensional diagrams to facilitate the illustrations and explanations.. If a cell of the domain level gets refined, then it is called a "father cell" with eight cubic "son cells". If the cell size of the father cell is , then each of the eight sons has cell size . The father cell, together with its son cells, form a socalled "grid" or "oct" of the refined level. If one of these eight sons gets further refined, then it will form a grid of the second refined level, i.e., its son cells will have cell size . This series of grids accross refinement levels is organized in a tree structure. RAMSES stores the grid and cell IDs in separate arrays and in a way that (i) given a son cell’s relative position inside its grid and the ID of that grid, one can find the ID of the son cell, and vice versa; and (ii) given the parent cell’s ID one can find the grid ID. Each level of refinement is labelled by . The domain level is defined by . For instance, if the simulation has , then . The first refinement level is labelled by and so on. Another characteristic of the RAMSES code is that, at refinement boundaries, the coarse and fine sides differ only by one level of refinement. The size of the time steps is determined independently for each refinement level, with higher refinements taking smaller time steps. For example, one of the criteria to determine the size of the time steps is that the particles should move only by a fraction of the cell size they are currently in (cf. Sec. 2.4 of Ref. 2002A&A…385..337T ()).
RAMSES is also efficiently parallelised using MPI. When run in parallel, each grid is "owned" by the same CPU that "owns" the parent cell of the grid. "To own" here means that the CPU knows all necessary information of a cell/grid, and is responsible for calculating all the relevant quantities inside the cell (density, potential, etc.), as well as the son cell position inside the grid. As domain decomposition strategies, RAMSES can employ PeanoHilbert, angular and planar schemes. Of these, the PeanoHilbert space filling curve is the most optimal for standard Nbody runs. However, given the angular geometry of the ray tracing operations that we aim to perform, it is beneficial to consider the angular scheme since it distributes better the rays accross CPUs (specially when the number of rays becomes large).
Our modifications to the RAMSES code are mostly in the form of additional independent numerical modules, which keep the base code unchanged (except only for a few interfaces). We refer the reader to the RAMSES code paper 2002A&A…385..337T () for more details about its operation.
iii.2 Tiled simulation boxes
Highresolution Nbody simulations of boxes that are large enough to contain the distance travelled by photons from () typically require massive computational resources (see e.g. Refs. 2008MNRAS.391..435F (); 2009A&A…497..335T ()). To circumvent this, one can "tile" together a number of simulation boxes in order to fit the whole light bundle 2000ApJ…530..547J (); whitehu2000 (); li2001 (). Figure 1 shows an example of a possible tiling scheme. The observer lies in the box that we refer to as the "last box", as opposed to the "first box", which contains the ray sources. The source redshift is in this example. Each simulation box takes as input the position of the observer w.r.t. its origin, . For example, for the case illustrated in Fig. 1, the observer is located at the center of  face () of the last box^{4}^{4}4We use the same letter, , to denote redshift and one of the cartesian coordinates. The meaning of should be taken by the context., and so we have , for that box. For the first box, on the other hand, these would be , . Given the geometry of the light bundle, the ray positions are more easily described using a spherical coordinate system with the observer at its origin
(2) 
where , are the two angular coordinates on the sky and is the radial coordinate. If the rays follow straight trajectories, then is equal to the comoving distance , with being the Hubble expansion rate, the redshift and the speed of light.
In the tiling scheme, a ray is only traced in a given box in the redshift interval during which the ray position lies within that box. For example, the integration of the rays in the first box would start at and it will last until , which is approximately when the rays "touch" the face of the box. Following the same reasoning, the second box would start the integrations at , which will continue until ; and so on and so forth, until the rays reach the observer at . Naturally, rays located in the outermost regions of the light bundle move from one box to the other before the more central rays. The conditions for the start and end of integration in each of the boxes are explained with more detail in Sec. III.6.3. We note also that for straight ray cases, the boxes in the tile can be run simultaneously, since they all "know" a priori the position of the rays at all times. On the other hand, if rays bend, then boxes located closer to the observer can start tracing the rays after reading their positions from higherredshift boxes.
We note that each simulation box should also start from different statistical realizations of the initial density field. This way, one ensures that the rays do not see the same structures throughout their trajectories (due to the periodic conditions of the simulation box). Finally, although not depicted in Fig. 1, it is also worth mentioning that closer to the observer, where the spatial volume covered by the light bundle is smaller, the boxes in the tile can be made smaller to gain resolution without sacrificing computational efficiency (although not too small to still allow large enough structures to form).
iii.3 Outline of the code
Figure 2 shows a sketch of the flow of calculations in the code. The first operation of the ray tracing calculation consists of the initialization of the ray data structure (cf. Sec. III.4). The goal is to identify the ID of the grid that a given ray belongs to, i.e., determining the physical location of the ray within the grid structure (cf. Sec. III.5). This is performed only when the rays start the integration because, as the rays move through the mesh, it is possible to determine the ID of the next crossed grid, by searching for neighbouring grids.
After the rays have been initialized, they are moved across the mesh on a cellbycell basis (cf. Sec. III.6), integrating a given quantity along the path inside each cell. As we explain in Sec. III.7, the integration can be done analytically by using the values of the desired quantity at each crossed cell centre or at its vertices. The latter have to be obtained by interpolation from the cell centres, which is where RAMSES evaluates all fields (density, potential, etc.) by default (cf. Sec. III.8).
In a given time step, each CPU moves the rays that are currently within its spatial domain until one of the following possibilities happens:

the rays travel the distance that light can travel in that time step;

the rays reach the observer/face of the box;

the rays reach the end of the CPU’s spatial domain.
If (i) is satisfied, then the CPU simply moves on to the next ray. If (ii) happens, then the ray’s integration is marked as finished, and the CPU also proceeds to the next ray. Finally, if (iii) happens, then the ray is marked for communication and the CPU still carries on to its next ray. Once each CPU has dealt with its initial number of rays, it checks whether rays from other CPUs have been marked to enter its domain, and whether its own rays have been marked to leave. If there are rays entering and/or leaving the CPU’s domain, then the relevant CPUs exchange ray data via MPI communication and repeat the above calculations for the incoming rays. This process is repeated until all rays satisfy (i) or (ii).
Our ray integration routines require as input the field values given by RAMSES at a given time step, and hence, they are called after RAMSES computes these quantities. Once the ray tracing calculations for this time step are finished, the code proceeds with the standard Nbody part, until it is time to call the ray tracing routines again at the next time step. Our modifications consisted therefore in the development of independent modules that do not impact in any way the standard Nbody part.
In the remainder of this section, we explain in more detail each of the steps and concepts involved in the propagation and integration of the rays across the mesh. The reader who wishes to skip these details can jump to Sec. IV, from whereon we present tests and results from weak lensing ray tracing simulations.
iii.4 Ray data structure
To implement our ray tracing algorithm in RAMSES we need to establish a data structure that links ray and grid IDs ^{5}^{5}5Once the link between grid and ray IDs is set, then the link between the ray and the relevant cell is made by checking in which octant of the grid the ray is in.. One can think of at least two ways to do so. We call one the "particle approach" and the other the "ray approach". These two approaches are sketched in Fig. 3.
In crude terms, the particle approach determines which rays are in each grid (cf. lefthand side of Fig. 3). The advantage of it is that it enables direct use of the existing RAMSES structure for other types of "particles" (dark matter, stars, sinks, etc.), therefore making the coding easier. In this approach, a linked list data structure is used in RAMSES to find the global IDs of the particles that lie within a grid, given the grid ID. The communication of rays between CPUs would also follow the strategy already set up for other types of particles in RAMSES. However, in the code, one criterion for determining the size of the time step ensures that dark matter particles move only by a fraction of the current grid size. During this particle time step, photons, which travel at the speed of light, can cross many grids. This means that one has to either update the linked lists each time rays change grids, which involves a large numbers of operations and memory allocations/deallocations, or drastically reduce the particle time step so that rays do not cross more than one grid in a time step, which would make the code prohibitively slow.
The above drawbacks motivated us to implement the ray approach, in which one determines which grid a ray is in (cf. righthand side part of Fig. 3). In this case, the data structure consists of two ordered lists of global ray IDs and their corresponding local grid IDs. Ordered here is in the sense that the global ray ID that is in the th entry of the ray list physically resides inside the grid whose local grid ID is that of the th entry of the grid list. Note that, naturally, if a ray is inside a refined grid, then it also physically lies inside its parent grid. In the data structure, rays are always linked to the grid that is at the highest refinement level. With the ray approach, the integrated quantities are associated to each ray ID in the same way as grid IDs. For example, if the ray in the th entry of the ray list crosses one cell to enter another, then all the code needs to do is to (i) compute the integral for the crossed cell and accumulate it in the th entry of the quantity’s list; and (ii) update the th entry of the grid list with the grid ID that contains the new cell (which can be the same grid). The ray approach is more efficient than the particle approach, even though it involved developing new strategies to move rays and communicate between CPUs (when rays move outside a CPU’s spatial domain or when RAMSES does load balancing to reassign domains to CPUs).
iii.5 Ray initialization
The goal of the ray initialization is to set up the data structure described in the previous subsection. Just to guide the discussion in this section, we assume that the central ray of the bundle travels in the negative direction. The light bundle is specified by (i) its opening angle in the and directions, , , respectively; and (ii) the number of rays in each of these two directions, and . The angular positions of the rays in the field of view are assigned as
(3) 
and we define the global ray ID number as^{6}^{6}6There is an abuse of notation with the subscripts and . Here, they denote the two directions perpendicular to the line of sight and should not be confused with the 3D Cartesian coordinates.
(4) 
with and . Once initialized, the global ray ID stays the same throughout the simulation, even when rays change CPUs. At the end of the simulation, given a ray ID, one can reverse the above equations to find the ray’s position in the 2D ray tracing map.
In the RAMSES structure, it is straightforward to retrieve the spatial location of a grid given its ID, but not the inverse operation: to find the grid ID given a certain (the ray’s) spatial location. To achieve this, one can loop over all grids, and for each grid, loop over all rays to identify those that are inside it. We note that such a "brute force" search may not lead to huge overheads to the overall performance of the code since the initialization is performed only once per box. Nevertheless, we have developed a more efficient algorithm that loops over all grids, but for each, only loops over a smaller targeted range of ray IDs. The details of the algorithm are given in Appendix A, but in short, the idea is to compute the solid angle that subtends a sphere that contains the grid, with which one can determine a range of and . This then enables us to significantly narrow down the ray IDs to loop for each grid using Eqs. (3) and (4).
For nonfirst boxes in the tile, the rays are initialized at the box face, and as a result, if the spatial location of the grid (which can be determined straightforwardly in RAMSES) does not lie on the box face, then the loop over rays can be skipped. Note also that since rays are linked always to the finest grid (cf. Sec. III.4), one can also skip the loop over grids that are not at the highest refinement level.
As we commented above already, the initialization needs only to be performed once per box. During the course of the ray integrations, when a ray leaves a grid to enter another, there are other ways to efficiently determine the IDs of the new grid.
In this paper, we limit ourselves to analysing the code results for small opening angles, for which the flatsky approximation holds. For fullsky ray tracing, one may benefit from using techniques such as HEALPix 2005ApJ…622..759G () to describe the ray distribution across the sphere. We leave the generalization to fullsky cases for future work.
iii.6 Moving rays
In this section, we describe how the algorithm determines the path of a ray in a cell and how the rays are moved through the mesh across particle time steps and CPU domains.
iii.6.1 Trajectory inside cells
Given the direction of a ray inside a cell, the calculation of its trajectory and the determination of the face from which the ray leaves the cell (which determines the next crossed cell) is mostly a geometrical exercise. Denoting by and the two spherical coordinate angles that specify the direction of the ray, then the position of the ray can be parametrized by in the equation ^{7}^{7}7For straight rays, and coincide with the angles that specify the spherical coordinates of the ray (cf. Eq. (III.2)). We leave the generalization of our algorithm to follow nonstraight trajectories for future work.
(5) 
where is the comoving distance of the ray to the observer at the beginning of the trajectory in the cell (cf. Fig. 4). Hence,
(6) 
is the position vector of the ray at the beginning of its trajectory. Of all of the six cell faces the ray can cross, three can be ruled out by the signs of , and . For instance, if , then the ray is travelling in the negative direction (). This means that the ray cannot enter the neighbouring cell that lies in the positive direction. Similarly, if (), then the ray cannot enter the neighbouring cell that lies in the positive () direction. More compactly, the faces from which the ray can leave the cell lie on one of three planes, each characterized by
(7) 
where is the cell size and , and are the cell center coordinates. The trajectory of the ray, Eq. (5), intersects each of these three planes, respectively, at
(8) 
where the subscript denotes the end of the ray trajectory inside the cell (cf. Fig. 4). The face from which the ray leaves the cell is that to which the ray needs to travel the least. Therefore, one computes
(9) 
and the value of specifies the target face. For example, if , then the ray leaves from the face (and analogously for and ).
Once the exiting face is found, then one can identify the next crossed cell by searching for neighbours using the default data structure in RAMSES. If the new cell belongs to the same grid, then one needs only to update the cell information. If on the other hand the next crossed cell belongs to a different grid, then one updates the grid information as well. Note that the new grid can lie at a different refinement level and/or in a different CPU domain, and this information is also recorded in our ray data structure.
iii.6.2 From cell to cell across different time steps and CPU domains
Figure 5 shows a twodimensional sketch of rays moving through the mesh. Shown are the trajectories of two rays during three coarse time steps. During the first two there are no refinements (left), and in the last coarse step some cells get refined (right). The different colors indicate different CPU domains. The red dots along the ray trajectories indicate where the rays cross cells or reach the end of a time step.
In the first coarse time step, ray A starts from the middle of a cell and is moved to the face of this cell. During this trajectory, the code integrates the desired quantity using the algorithm that shall be described in Sec. III.7. Ray A then continues its trajectory in the next crossed cell, but stops before reaching the new cell’s face, at the point where the ray has travelled the maximally allowed distance in that time step. In the meantime, Ray B is moved similarly to ray A, but in the domain of CPU 2.
In the second coarse time step, CPU 1 moves ray A to the face of the cell and marks it for communication to CPU 2, which is found to own the next crossed cell (CPU 1 would then continue to move the other rays in its domain, if any). In the meantime, CPU 2 moves ray B until it is marked for communication to CPU 3. After CPU 2 has dealt with ray B (and all the other rays in its domain), it checks whether other CPUs have marked rays to be sent to it and moves these incoming rays as described above. Similarly, CPU 3 also checks for incoming rays. In the case sketched in Fig.5, CPU 2 receives ray A and integrates it until it is marked for communication to CPU 3, and CPU 3 receives ray B and integrates it until the end of the time step. These series of CPU communications occur until all rays have reached the end of the time step ^{8}^{8}8We note in passing that, in addition to the communications that occur when rays leave CPU domains, there are other situations that require communicating the ray information, namely when the code performs load balancing (redistribution of the spatial domains across CPUs).. In our particular example, CPU 3 receives ray A and integrates it until the end of the time step.
The way that rays move in the third coarse step is analogous to that of the other two, except that the distance that the rays travel before RAMSES updates the field values is smaller, because of the smaller particle time step. In the current implementation of the code, all rays travel by the distance determined by the particle time step on the finest level, regardless of which level they belong to. In principle, this is not necessary since, if a ray only crosses unrefined cells, then it can be moved by the (larger) distance allowed in the coarse step, if the field values at the corresponding cell are not updated during the finer steps taken by the code. The implementation of this in RAMSES is, however, slightly more involved and therefore we opted to have all rays moving by the distance determined by the finest cells of the mesh.
iii.6.3 From box to box in the tile
As the bundle approaches the face of nonlast boxes, some of the rays will cross the face earlier than other rays. For instance, the outermost rays in the bundle are the first to reach the box face, whereas the center rays are the last. To exemplify how the code deals with this transition, consider the trajectory of the outermost and center rays as they leave a given box (call it Box 2) to enter another (call it Box 1), and denote by and the redshift values at which the outermost and center rays cross the face of the box. In this case, Box 2 propagates the outermost ray until it reaches its face. At this point, the integration for this ray stops, but the calculation for the center ray is still ongoing. The simulation of Box 2 should therefore only be stopped for . As for Box 1, it initializes the ray data structure at for all rays, including the center one whose position is known beforehand ^{9}^{9}9As mentioned previously, the geometry is fixed for straight ray simulations and therefore Box 1 "knows" a priori the position of all rays at any redshift. In this case Box 1 and Box 2 can be run simultaneously.. However, it starts by integrating only the outermost ray. The center ray remains "initialized" at the box face, and it only starts being integrated when Box 1 reaches .
iii.7 Integration in cells
One of the key parts of our algorithm is the computation of the integral of a given field along the ray trajectories. Explicitly, we wish to evaluate integrals of the form of Eq. (1), which we repeat here for easy reference,
(10) 
where (with the source redshift), is some weighting function or kernel that can be a polynomial of and is any quantity that can be evaluated in the cells. Compared with Eq. (1), Eq. (10) is more specific as it encodes the information that we wish to follow rays from some distant source to an observer at redshift zero. The integral of Eq. (10) can be split into the contribution from each cell crossed by a ray
(11) 
in which
(12) 
with and being, respectively, the radial coordinate of the starting and ending points of the ray trajectory in a cell as the ray travels towards the observer () ^{10}^{10}10The points and can lie anywhere inside the cell and not necessarily at the intersection of the lineofsight with the cell faces. This is, for instance, the general case at the start and end of particle time steps. (see Fig. 4). To perform the integral we need to be able to evaluate the quantity at any given point inside the cells. The simplest way to do so is to take the field to be constant inside the cell and equal to the value at its center , as given by default RAMSES. In this case, can be taken out of the integral in Eq. (12) and the contribution of each cell to the total integral becomes
(13) 
where the superscript NGP stands for nearest grid point. We note that, in this way, the fields being integrated do not vary continuously when crossing cell boundaries.
Another, more involved, way to construct the field at an arbitrary point inside the cell is via trilinear interpolation of the values of at each cell vertex:
(14)  
where the ’s are given by
(15) 
in which denote the values of at the cell vertices , as indicated in Fig. 4. In Eq. (14), , and are given by
(16) 
where is the cell size, and , and are, respectively, the coordinates of the given ray (cf. Eq. (III.2)), the point , and the point w.r.t. point .
Using Eqs. (14), (III.7) and (III.7), it is possible to compactify the expression for as
(17) 
where the coefficients depend on and , but not on (note also that we are now specifying as a function of spherical coordinates). Their expression is given in Appendix B. Since the ’s do not depend on , one can combine Eqs. (12) and (17) to write
(18) 
The integrand in the above equation is a polynomial in , and so the integral can be solved analytically li2001 (). Therefore, the calculation of the integral in the cell reduces to the geometrical exercise of determining the position of points and and the evaluation of the quantity at the cell vertices. In Secs. IV.2 and IV.4 we shall see a concrete example of the use of these formulae when we apply it to lensing.
Equations (13) and (18) provide two possible ways to compute the final result. The latter has the advantage of allowing for the integration of a continuous field when crossing cell boundaries, by appropriately evaluating the fields at the vertices of the cells (see next section). On the other hand, the use of Eq. (13) does not involve evaluating the fields at the cell vertices (which do not exist in default RAMSES), making it therefore more computationally efficient. In Sec. VI.4, we compare results based on Eqs. (13) and (18).
iii.8 Field values at cell vertices
In default RAMSES, the field values are evaluated at cell centers, but the application of Eq. (18) above requires them to be interpolated to cell vertices. When doing so, one can ensure that the fields reconstructed with trilinear interpolation (cf. Eq. (14)) vary continuously from cell to cell and also that the total mass ^{11}^{11}11In this subsection, we use the word "mass" to denote the product of a surface or volume density with an area or volume. is preserved. This is trivial on a regular mesh without AMR, but requires special care at the boundary of refined regions. We discuss these subtle issues in this subsection.
Figure 6 illustrates a refinement boundary in two dimensions (we shall later generalize the discussions to three dimensions). In the figure, the nine coarse level cells shown are labelled from c1 to c9 and a few fine level cells are labelled from f1 to f12. The vertices V1 to V6 represent different types of vertices at which one wishes the evaluate the fields. To be concrete, we shall take the example of the matter density field, , but the interpolation scheme that we describe here is applied to any type of field.
For vertices of the type of vertex V1, which is shared by four cells of the same level, one can simply set , where are the values at the relevant cell centers. This average ensures that varies continuously from one cell to another when computed with Eq. (14) (or its twodimensional version). In regular mesh simulations, this is the only type of vertex. The vertex V2, which lies at the middle of a coarse cell edge and is shared by two fine cells, can be straightforwardly computed after the values in vertices V3 and V4 have been determined. Explicitly, . This ensures, for instance, that two rays that are infinitesimally close to vertex V2, with one crossing cell c4 and another crossing fine cells f5 or f8, experience the same density field (as reconstructed with trilinear interpolation), i.e., there is no sharp discontinuity between the density experienced by one ray and the other.
The cases of vertices V3, V4 and V5 are more involved as they are shared by both coarse and fine cells. Let us consider vertex V3 as an example, which is shared by cells c1, c4, f2 and f5. It is natural to assume that the density at V3 depends on the density in each of these four cells. The simplest way to write this is as
(19) 
where the ’s are some weights to be determined. Let us focus on . The mass associated with cell c1 is , where in this subsection is the coarse cell size. After the interpolation, we want the mass of this cell to be the sum of the masses associated with each of its vertices. Hence, the mass associated with vertex V3 due to cell c1 is . Due to mass conservation, the value of is redistributed by vertex V3 to other cells at the boundary. In this sense, we can colloquially describe vertex V3 as a mass reservoir that is collecting mass from c1 and redistributing it to neighbouring cells. This mass distribution constraint can be written as
In this equation, the first two terms on the RHS represent the mass from V3 due to cell c1 that is redistributed to cells c1 and c4 ( is the density that V3 contributes to c1 and c4 and is the area of cells c1 and c4). The third and forth terms are the same as the first two, but for cells f2 and f5 (note that is the area of the fine cells). The last four terms on the RHS of Eq. (III.8) must be included as vertex V3 also contributes to the masses in cells f1, f2, f5 and f8 via vertices V2 and V6. Since the contribution from V3 to and is only , each of these four terms gets a factor of compared to the third and forth terms. We can solve for in Eq. (III.8) and, to facilitate the discussions below, we write the result as
(21) 
where is the number of coarse cells that share vertex V3, is the number of fine cells that share vertex V3 and is the number of vertices that the fine cell that shares V3 has at the boundary ( f2, f5). Explicitly, for V3 we have , and . The coefficient in Eq. (19) is obtained in the same way: Eq. (III.8) remains the same, just with replaced by (which appears on both sides of the equation and therefore cancels). Hence, .
The determination of the remaining coefficients, and differs from and . To determine , the RHS of Eq. (III.8) remains the same, just with and replaced by and , respectively. From the reasoning that led to Eq. III.8, one could naively think that the LHS would be simply given by the mass that vertex V3 collects from f2, . However, recall that vertex V3 also contributes to the mass in cells f2 and f1 via V6 (this is why two of the last four terms in Eq. (III.8) appear). As a result, the mass that V6 collects from f2, must also be included in the LHS of the mass distribution equation that determines . Explicitly,
The equation for is the same (apart from ) and the equation for both can be written as
(23) 
where f2, f5. The meaning of , and is the same as above and since , then
Equations (21) and (23) were written in terms of , and because this way they also hold for the other vertices. By following the same steps for V4 and V5 we can write:
(24)  
(25) 
where the coefficients associated with coarse and fine cells are obtained as in Eq. (21) and (23), respectively. For V4, , = 1 and , whereas for V5 , , , and . Note that in Eqs. (19), (24) and (25), the summed value of the weights adds up to unity, as it should.
iii.8.1 Generalization to three dimensions
In three dimensions, the above derivation holds with only a few generalizations. When writing the RHS of mass distribution equations, Eqs. (III.8) and (III.8), for each vertex, in addition to considering the contribution from vertices that lie at the edge of coarse cell vertices (which get a factor of ), one must also consider the contribution to vertices that lie at the center of the coarse cell faces, which get a factor of . Moreover, in two dimensions, the ratio of the area of a fine to coarse cell is , whereas in three dimensions the ratio of their volumes is . Bearing these two things in mind, it is possible to show that the weights associated with coarse cells are given by
(26) 
and the weights associated with fine cells by
(27) 
These expressions differ from their twodimensional counterparts by replacing the factors of by . The meaning of , and is the same as in two dimensions. Just to give an example, consider a vertex V that is shared by seven coarse cells and one fine cell. In the scheme described above, the density at this vertex is
(28) 
where is the density at the center of the th coarse cell and is the density at the center of the fine cell. For this case, , and in Eqs. (26) and (27).
Analogously to the two dimensional case, once the density at the vertices that are shared by both coarse and fine cells is determined, then (i) the density at vertices that lie at the middle of a coarse cell edge is given by the average of the densities of the two coarse cell vertices of that edge; and (ii) the densities at vertices that lie at the center of a coarse cell face is given by the average of the densities at the four coarse cell vertices of that face. The density at vertices that are shared by eight cells of the same level (i.e. not in a refinement boundary) is given by the average value of the density at those eight cell centers.
As a test of our interpolation scheme, we have measured the total mass inside simulation boxes by using the values at the cell centers and at the cell vertices. The agreement between the two ways of measuring the total mass was perfect for meshes with and without refinements, which confirms that the design and implementation of our interpolation is correct. We note also that these operations to interpolate the field values from cell centers to cell vertices naturally add some computational costs to the code, and hence, it is desirable to reduce the number of times these operations should be performed. For instance, since a single cell can contain a large number of rays, the interpolation needs to be performed only once to compute the integral for all rays. Morevoer, if the fields at the vertices of a given cell do not change from one time step to the other (e.g., if it is a coarse cell that is not at a refinement boundary and the time step taken was a fine one), then one can also store the interpolated values from the previous time step, thereby saving some computational time.
Before proceeding, we note that the interpolation scheme, as describe in this section, represents in practice a form of adaptive smoothing of the fields used in the ray integration Hilbert:2008kb (). This is in the sense that the field values at a given cell vertices (and hence the interpolated field inside that cell) depend on the field values on neighbouring cells. The reason why we dub this an adaptive smoothing is because the size of the smoothing kernel (roughly the volume occupied by all the cells that are used in the interpolation) depends on the sizes of the given cell and its neighbours, which in turn depends on the local matter density.
Iv Weak Lensing simulations: method
In this section, we explain how our algorithm can be applied to studies of weak gravitational lensing.
iv.1 Lensing basics
We start with a brief recap of the basics of gravitational lensing (see e.g. Refs. 2001PhR…340..291B (); 2003ARA&A..41..645R (); 2005PhRvD..72b3516C (); 2010PhRvD..81h3002B (); 2010CQGra..27w3001B (); 2015RPPh…78h6901K ()). In a perturbed FriedmannRobertsonWalker (FRW) spacetime, the line element in the absence of anisotropic stress can be written as (considering only scalar perturbations)
(29) 
where is the scale factor. Photons travelling from distant sources towards an observer get their trajectories bent due to the intervening gravitational potential, . The (unobserved) angular position of the source on the source plane, , is related to the observed one, , by the lensing deflection angle as
(30)  
where denotes the two perpendicular directions to the lineofsight. The third line in Eq. (30) is obtained from the second one by defining the derivatives w.r.t. the angular coordinate , or equivalently, . The Jacobian matrix, , of this sourcetoobserver mapping is obtained by differentiating the above equation w.r.t. as in
where . Note that the integral is performed along the perturbed path of the photon, as indicated by the dependence of the potential inside the integral. Note also that one of the derivatives of is w.r.t. and another w.r.t. . These two aspects add complication to the ray tracing, but they can be neglected to obtain approximate solutions. To first order, we can write
in which the integral is now peformed along the unperturbed apparent direction of the photons, which is the socalled Born approximation, and the derivatives are now both w.r.t. , which amounts to neglecting the socalled lenslens coupling^{12}^{12}12Lenslens coupling refers to the correlation between the distortions of the sources with the intervening sources that act as lenses, whose images and positions seen by the observer are also distorted.). The lensing results that we present in this paper are obtained under these two approximations, which are generally found to be valid, at least in what concerns determinations of the power spectrum of lensing quantities Hilbert:2008kb (). The generalization of our ray tracing calculations to follow the rays in their perturbed paths, as well as calculations that take lenslens coupling into account is the subject of ongoing work (see e.g. the Appendix of Ref. li2001 () for a discussion).
Equation (IV.1) can be written in matrix form
(33) 
to define the lensing convergence, ,
(34)  
and the two components of the lensing shear ,
(35)  
(36)  
where we have now denoted for compactness of notation.
iv.2 Lensing integration in the code
The integrals of Eqs. (34), (35) and (36) can be found by using the algorithm outlined in Sec. III.7. In the case of lensing, the integration kernel in Eq. (12) is given by (up to factors ) , and the quantity that one needs to evaluate at cell vertices is for , for , and , for . Consequently, the contribution of each crossed cell to the integral according to method NGP (cf. Eq. (13)) is given by
(37)  
Alternatively, according to Eq. (18), the contribution is
(38)  
where .
What is left to specify is the relation between the quantities , and , with the values of () that are actually computed on the simulation mesh (see the subsection below). This relation is
(40)  
(41) 
The above equations are derived by associating the two spherical coordinates and that specify the incoming direction of the rays with and . Then, the expressions follow straightforwardly from applying , with being the Christoffel symbols of the line element^{13}^{13}13We note in passing that by using this line element one takes into account the curvature of the sky. However, Eqs. (IV.2), (40) and (41) remain the same if and are interpreted as being derivatives w.r.t. the coordinates , i.e., by taking the flat sky approximation. This coordinate system is essentially a Cartesian system rotated such that the direction of the incoming ray is perpendicular to the plane. A simple argument for this equivalence is that the sphere is locally flat, which means that the curvature can in practice be neglected when one takes derivatives. , and then writing and in terms of , , according to Eq. (III.2).
iv.3 Calculation of
The values of () can be computed at the center of a given cell by finite differencing the values of on neighbouring cells. If a cell has all its neighbours at the same refinement level, this calculation is straightforward. For instance, if is the gravitational potential on the cell labelled by , then we have
as two representative examples. The other components of are obtained similarly. However, some complications arise at boundary regions of refinements. As an example, consider that we wish to compute on cell f5 in Fig. 6, where and are, respectively, the horizontal and vertical directions on the figure. The fine cell f5 is missing the neighbour that would exist if coarse cell c4 had been refined. One can think of two ways to compute the missing values that are needed for the finite difference. One option is to interpolate the values of obtained from the coarse level to the point in cell c4 where the center of the relevant son cell would be if it existed. This value could then be used in a finelevel finite difference to compute in cell f5. Another option is interpolating directly the coarse values of in cells c1, c2, c4 and c5 to the center of cell f5, without finite differencing.
To test these two approaches, we have set up a grid with more that one refinement level in the code and used the cell centers to define a Gaussian potential on the mesh. We then compared the analytical result of with the result given by the code. We have found that the second approach agrees very well with the analytical result, but the first option showed larger discrepancies at the refinement boundaries. This is because by taking the finite difference using interpolated values, one amplifies the interpolation error in by the factor of , which enters in the finite difference. In the results that follow, we have therefore implemented the second approach, which is also more computationally efficient^{14}^{14}14As a technical point, imagine that there is a CPU domain along the line that contains vertices V6, V2, V3 and V4 in Fig. 6. In RAMSES, there are "communication buffers" at the CPU domain boundaries, i.e., regions in the next CPU’s domain that are available to the present CPU. In order to compute the value of in cell f5, then its CPU needs to access the value of at cell C4, whose calculation involves on cells further left of C4 (not shown in Fig. 6). These latter cells are outside of the communication buffer of standard RAMSES, which means that we had to increase its size. This is one of the few changes made to the main code.. Once is evaluted at the center of the cells, then its interpolation to cell vertices is as described in Sec. III.8.
iv.4 Alternative lensing integration in the code
The lensing methodology described above involves the calculation of () on the mesh, which inevitably adds some computational overheads. However, as described in this section, it is possible to compute and by integrating only quantities that are computed by default in RAMSES.
Equation (34) can be written as
(43)  
where in the second equality we have used the Poisson equation to relate the comoving threedimensional Laplacian to the matter density contrast, , and the term was integrated by parts using , where is the physical time. The integration of the first term on the RHS of Eq. (43) is the same as that of Eqs. (34), (35) and (36), but with as the quantity ^{15}^{15}15In the current implementation of the code, is taken to be constant during the time step integration. This should not lead to big errors in highresolution simulations if the time steps are sufficiently small. There are however ways to go beyond this by, for instance, implementing the relation in the integration.. The second term is obtained analogously, but with (where