# Unmasking the Masked Universe: the 2M++ catalogue through Bayesian eyes

###### Abstract

This work describes a full Bayesian analysis of the Nearby Universe as traced by galaxies of the 2M++ survey. The analysis is run in two sequential steps. The first step self-consistently derives the luminosity-dependent galaxy biases, the power spectrum of matter fluctuations and the matter density field within a Gaussian statistics approximation. The second step makes a detailed analysis of the three-dimensional Large Scale Structures, assuming a fixed bias model and a fixed cosmology. This second step allows for the reconstruction of both the final density field and the initial conditions. From these, we derive fields that self-consistently extrapolate the observed large-scale structures. We give two examples of these extrapolations and of their utility for the detection of structures: the visibility of the Sloan Great Wall, and the detection and characterization of the Local Void using DIVA, a Lagrangian-based technique to classify structures.

###### keywords:

methods: data analysis – methods: statistical – galaxies: statistics – large-scale structure of Universe

pubyear: 2014

## 1 Introduction

Over the last decades, the wealth of galaxy redshift catalogues has increased enormously. Nowadays millions of galaxies with precise positions on the sky and accurate redshifts are available and have to be handled and processed on a routine basis. For example, the Sloan Digital Sky Survey (SDSS, e.g. York et al., 2000; Abazajian et al., 2009; Ahn et al., 2014) provides millions of galaxy redshifts, and the Six Degree Field Galaxy Redshift Survey (6dFGRS, Jones et al., 2009), covering the southern sky, contains nearly 70 000 galaxies with accurate redshift measurements. While the amount of data has steadily increased, progress in the development of modern data analysis techniques has only been made in recent years. These advances are particularly crucial for interpreting ever more complex data sets, where the time evolution of objects (e.g. star formation rate), non-linear dynamics (e.g. galaxy cluster formation), foreground subtraction and systematic selection effects become increasingly important.

Inferring 3d density fields in a formal and rigorous Bayesian framework has several advantages. The first and foremost is that all observational aspects are treated self-consistently, yielding inferred 3d density fields that do not require any post-analysis correction. The second is that the model yields more information on the density field than what is readily usable in catalogues. For example, the tidal field created by visible large-scale structures may trigger collapse in other, unobserved areas of the Universe. This raises the interesting possibility of predicting where structures (such as walls, filaments, clusters and voids) form. The actual presence of such inferred structures can then be tested a posteriori via dedicated observations. Specifically, this work focuses on developing a probabilistic structure predictor. We concentrate on voids in regions that are difficult to observe, such as the Galactic plane. To characterize these voids we make use of the previously presented DIVA framework (Lavaux & Wandelt, 2010).

Particularly successful approaches to solving such ill-posed inverse problems rely on the Bayesian formulation of parameter inference. We define a forward data model that describes how a continuous three-dimensional density field is transformed into a set of predicted observables, which are then directly compared to data. In our case the observable is the number density of galaxies in comoving space. Conversely, given the positions of galaxies, we may infer this density field provided it is decomposed on an adequate finite basis. In this context, the data model should include everything that may happen between the density field and the detection of a galaxy by an observer, for example photon detection and galaxy detection efficiency. The full problem cannot be solved in its entirety, but for sufficiently well-constructed samples only basic selection criteria, such as flux limitation and overall redshift completeness, are important.

Even in this optimistic context, the parameter inference problem is daunting: typical inferences involve millions of highly degenerate parameters, comprising the densities in each volume element and the power-spectrum values. There exists a wealth of literature on the derivation of power spectra and correlation functions from noisy and incomplete data (see e.g. Landy & Szalay, 1993; Tegmark et al., 2004; Percival, 2005). However, these approaches do not fully capture the complexity of the posterior distribution in a blind analysis of power spectra in data. More recent developments, notably stimulated by the requirements of the Cosmic Microwave Background community (see e.g. Eriksen et al., 2004; Jewell et al., 2004; Wandelt et al., 2004), have pushed the limits of density and power-spectrum reconstruction for galaxy redshift catalogues (Jasche et al., 2010; Jasche & Kitaura, 2010; Jasche & Lavaux, 2015).

All the aforementioned techniques still require good knowledge of how tracers have been selected. To have the largest, deepest and cleanest galaxy redshift compilation, we propose to use the 2M++ galaxy compilation (Lavaux & Hudson, 2011). This survey offers near full-sky coverage down to an apparent magnitude $K_{2M++} \le 11.5$, and more than 50% sky coverage for $K_{2M++} \le 12.5$. Evolutionary effects of galaxies were corrected on average, and the selection is done for a consistent population of galaxies. Finally, redshift completeness maps are provided for the two magnitude selections.

The data application presented in this work builds upon our previously developed Bayesian data analysis algorithms ARES (Algorithm for REconstruction and Sampling, Jasche & Wandelt, 2013b) and BORG (Bayesian Origin Reconstruction from Galaxies, Jasche & Wandelt, 2013a). Both algorithms perform a Bayesian analysis of the 3d distribution of galaxies, albeit with different assumptions on the noise and on the dynamics of the tracers. This work is structured as follows. In Section 2, we describe the 2M++ galaxy compilation, which is the data we aim to model. In Section 3, we present the pipeline and recall how the ARES and BORG models and algorithms work. In Section 4, we present the setup and the convergence tests of the Bayesian inference. In Section 5, we analyse the results in the context of cosmography and structure classification. Finally, in Section 6 we conclude.

## 2 The 2M++ Survey

In this work we follow a procedure similar to that described in Jasche et al. (2010) and, more recently, in Jasche et al. (2015), by applying the BORG algorithm to the 2M++ galaxy compilation (Lavaux & Hudson, 2011). The 2M++ is a superset of the 2MASS Redshift Survey (2MRS, Huchra et al., 2012), with greater depth and higher sampling than the IRAS Point Source Catalogue Redshift Survey (PSCz, Saunders et al., 2000). The photometry is based primarily on the Two-Micron All-Sky Survey (2MASS) Extended Source Catalogue (2MASS-XSC, Skrutskie et al., 2006), an all-sky survey in the $J$, $H$ and $K_s$ bands. Redshifts from the $K_s$-band-selected 2MASS Redshift Survey (2MRS) are supplemented by those from the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7, Abazajian et al., 2009) and the Six-Degree-Field Galaxy Redshift Survey Data Release Three (6dFGRS, Jones et al., 2009). Data from SDSS were matched to those of 2MASS-XSC using the NYU-VAGC catalogue (Blanton et al., 2005). As the 2M++ draws from multiple surveys, galaxy magnitudes from all sources were first recomputed by measuring the apparent magnitude in the $K_s$ band within a circular isophote at 20 mag arcsec$^{-2}$. Following a prescription described in Lavaux & Hudson (2011), magnitudes were then corrected for Galactic extinction, cosmological surface brightness dimming and stellar evolution. After these corrections, the sample was limited to $K_{2M++} \le 11.5$ in regions not covered by the 6dFGRS or the SDSS, and to $K_{2M++} \le 12.5$ elsewhere. Other relevant corrections made to this catalogue include accounting for incompleteness due to fibre collisions in 6dF and SDSS, as well as the treatment of the zone of avoidance (ZoA). Incompleteness due to fibre collisions was treated by cloning redshifts of nearby galaxies within each survey region, as described in Lavaux & Hudson (2011).

The ZoA treatment of the 2M++ is ignored in this work, as the Bayesian machinery naturally and self-consistently accounts for incomplete observations. The Galactic plane is thus simply obscured: the objects marked as cloned are removed from the catalogue and the completeness is set to zero in that region. The ZoA is defined in the 2M++ as the region delimited by $|b| < 5^\circ$ for $l > 30^\circ$ and $l < 330^\circ$, and $|b| < 10^\circ$ for $l < 30^\circ$ or $l > 330^\circ$.
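As an illustration of how the ZoA can be turned into a completeness mask, the sketch below flags sky positions inside the obscured region and zeroes the angular completeness there. It is a minimal sketch assuming Galactic coordinates in degrees and the 2M++ thresholds $|b| < 5^\circ$ away from the bulge and $|b| < 10^\circ$ for $l < 30^\circ$ or $l > 330^\circ$; the function names are hypothetical and not part of the 2M++ release.

```python
import numpy as np

def in_zone_of_avoidance(l_deg, b_deg):
    """Return a boolean array, True where (l, b) lies inside the assumed ZoA.

    Assumed definition: |b| < 5 deg for 30 < l < 330 deg, and |b| < 10 deg
    towards the Galactic bulge (l < 30 deg or l > 330 deg).
    """
    l = np.mod(np.asarray(l_deg, dtype=float), 360.0)
    abs_b = np.abs(np.asarray(b_deg, dtype=float))
    near_bulge = (l < 30.0) | (l > 330.0)
    return np.where(near_bulge, abs_b < 10.0, abs_b < 5.0)

def apply_zoa_mask(completeness, l_deg, b_deg):
    """Copy the angular completeness and set it to zero inside the ZoA."""
    masked = np.array(completeness, dtype=float, copy=True)
    masked[in_zone_of_avoidance(l_deg, b_deg)] = 0.0
    return masked
```

Setting the completeness to zero, rather than filling the region with cloned galaxies, simply tells the likelihood that no data were taken there, so the density in the Galactic plane is constrained only by the prior and by neighbouring structures.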

The galaxy distribution on the sky and the corresponding selection at $K_{2M++} \le 11.5$ and $11.5 < K_{2M++} \le 12.5$ are given in Figure 1. The top row shows the data used in our analysis. The bottom row shows the redshift completeness, i.e. the ratio of the number of acquired redshifts to the number of targets, for the two apparent magnitude bins. We note that the Galactic plane clearly stands out and that the completeness is evidently inhomogeneous and strongly structured.

In addition to the target magnitude incompleteness and the angular redshift incompleteness, one may also worry about the dependence of the completeness on redshift. This is not a problem for the brighter sample ($K_{2M++} \le 11.5$), which is essentially 100% complete. We do not expect much effect in the fainter magnitude bin either, as its spectroscopic data come from SDSS and 6dFGRS, which both have homogeneous sampling and fainter magnitude limits than the 2M++.

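We account for radial selection functions using a standard luminosity function proposed by Schechter (1976). Using this function we can deduce the expected number of galaxies in the absolute magnitude range of the sample that are observed within its apparent magnitude range at a given redshift. The $M^*$ and $\alpha$ parameters are taken for the K band from table 2 of Lavaux & Hudson (2011). The target selection completeness of a voxel, indexed by $p$, is then

$$
c_p = \frac{1}{V_p}\int_{\mathbf{x}\in\mathcal{V}_p} \frac{\int_{\max\left(M_{\min},\, m_{\min}-\mu(|\mathbf{x}|)\right)}^{\min\left(M_{\max},\, m_{\max}-\mu(|\mathbf{x}|)\right)} \Phi(M)\,\mathrm{d}M}{\int_{M_{\min}}^{M_{\max}} \Phi(M)\,\mathrm{d}M}\,\mathrm{d}^3\mathbf{x} \tag{1}
$$

where $\mathcal{V}_p$ is the comoving coordinate set spanned by the voxel, $V_p = \int_{\mathcal{V}_p}\mathrm{d}^3\mathbf{x}$ its volume, $\Phi(M)$ the Schechter luminosity function, $[M_{\min}, M_{\max}]$ and $[m_{\min}, m_{\max}]$ the absolute and apparent magnitude ranges of the sample, and $\mu(r)$ the distance modulus at comoving distance $r$. The full completeness $R_p$ of the catalogue is the product of $c_p$ and the map corresponding to the considered apparent magnitude cut, given in the bottom row of Figure 1, after its extrusion in three dimensions.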
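The radial part of the target completeness can be evaluated numerically. The sketch below is a minimal illustration rather than the 2M++ pipeline: it assumes a low-redshift distance modulus $\mu = 5\log_{10} r + 25$ ($r$ in $h^{-1}$ Mpc, no K-correction or evolution term), placeholder Schechter parameters $M^* = -23.3$ and $\alpha = -0.9$, and a hypothetical absolute magnitude range $[-25, -21]$; none of these numbers are the values of Lavaux & Hudson (2011). The voxel average approximates the volume integral by sub-sampling the voxel on a regular grid.

```python
import math

def schechter_mag(M, M_star, alpha):
    """Unnormalised Schechter luminosity function expressed in absolute magnitude."""
    x = 10.0 ** (0.4 * (M_star - M))  # luminosity in units of L_star
    return x ** (alpha + 1.0) * math.exp(-x)

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def radial_completeness(r, m_max, m_min=-math.inf, M_min=-25.0, M_max=-21.0,
                        M_star=-23.3, alpha=-0.9):
    """Fraction of galaxies with M_min < M < M_max whose apparent magnitude
    lies in [m_min, m_max] at comoving distance r (in h^-1 Mpc).
    All default parameter values are illustrative placeholders."""
    mu = 5.0 * math.log10(r) + 25.0  # low-z distance modulus, no K-correction
    lo = max(M_min, m_min - mu)
    hi = min(M_max, m_max - mu)
    if lo >= hi:
        return 0.0  # no galaxy of the sample is visible at this distance
    phi = lambda M: schechter_mag(M, M_star, alpha)
    return simpson(phi, lo, hi) / simpson(phi, M_min, M_max)

def voxel_completeness(center, size, m_max, n_sub=4, **kwargs):
    """Average the radial completeness over n_sub^3 sub-points of a cubic voxel."""
    offsets = [((i + 0.5) / n_sub - 0.5) * size for i in range(n_sub)]
    total = 0.0
    for dx in offsets:
        for dy in offsets:
            for dz in offsets:
                r = math.sqrt((center[0] + dx) ** 2 + (center[1] + dy) ** 2
                              + (center[2] + dz) ** 2)
                total += radial_completeness(max(r, 1e-3), m_max, **kwargs)
    return total / n_sub ** 3
```

For a fainter shell one would pass both limits, e.g. `radial_completeness(r, m_max=12.5, m_min=11.5)` under the assumed magnitude cuts; multiplying the voxel value by the angular completeness map then gives the full response $R_p$.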

Finally, we note that our analysis accounts for luminosity-dependent galaxy biases by following the approach described in Jasche et al. (2015). To do so, the galaxy sample is subdivided into 3 equidistant bins in absolute $K$-band magnitude. The galaxy sample is further split into two subsets depending on apparent magnitude: a galaxy belongs to sample one if $K_{2M++} \le 11.5$, and to sample two otherwise. The bias in each of these bins is kept constant, which greatly reduces the computational cost at the price of not fully marginalizing over these parameters. The determination of these values is left to ARES. The mean density of tracers, and thus the Poisson noise amplitude, in each of these bins is sampled.
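A minimal sketch of the splitting into sub-samples is given below, assuming a hypothetical apparent-magnitude split at $K = 11.5$ and a hypothetical absolute-magnitude range $[-25, -21]$ (the actual range is not reproduced here). Each galaxy receives an integer identifier, in the spirit of the identifiers 0 to 5 of the bias table in Section 4, so that each sub-sample can be handled as a separate data set with its own selection, bias and noise level.

```python
import numpy as np

def subsample_identifier(K_app, M_abs, K_split=11.5, M_range=(-25.0, -21.0), n_bins=3):
    """Assign each galaxy an identifier in 0 .. 2*n_bins - 1, or -1 if it lies
    outside the absolute-magnitude range. K_split and M_range are assumptions."""
    K_app = np.asarray(K_app, dtype=float)
    M_abs = np.asarray(M_abs, dtype=float)
    edges = np.linspace(M_range[0], M_range[1], n_bins + 1)
    mag_bin = np.digitize(M_abs, edges) - 1        # bin 0 holds the brightest galaxies
    sample = np.where(K_app <= K_split, 0, 1)      # 0: bright sample, 1: faint sample
    identifier = sample * n_bins + mag_bin
    outside = (mag_bin < 0) | (mag_bin >= n_bins)
    return np.where(outside, -1, identifier)
```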

As will be described in more detail below, splitting the galaxy sample permits us to treat each of these sub-samples as an individual data set, with its respective selection effects, biases and noise levels.

## 3 Methodology

In this section we give a brief introduction to the Bayesian inference frameworks ARES and BORG (Bayesian Origin Reconstruction from Galaxies).

### 3.1 The Ares framework

The ARES framework is a full Bayesian large scale structure inference method targeted at precision recovery of cosmological power-spectra from three dimensional galaxy redshift surveys. Specifically it performs joint inferences of three dimensional density fields, cosmological power spectra as well as luminosity dependent galaxy biases and corresponding noise levels for different galaxy populations in the survey (Jasche et al., 2010; Jasche & Wandelt, 2013b).

The complete problem solved by ARES has many parameters. In the case of a single population, the data model implemented in ARES corresponds to the following:

$$
N_p = \bar{N}\, R_p \left(1 + b\, D_p\, \delta_p\right) + \epsilon_p \tag{2}
$$

with $N_p$ the number of galaxies in voxel $p$, $\bar{N}$ the mean density of the galaxy population, $R_p$ the overall linear response operator of the survey (i.e. the redshift and target completeness), $b$ the population bias, $D_p$ the density growth factor in voxel $p$, $\delta_p$ the linear density contrast at a reference redshift in voxel $p$, and $\epsilon_p$ a random instrumental noise. The noise is assumed to be Poissonian, approximated by a Gaussian distribution whose variance neglects the influence of the density fluctuations themselves. Thus we have

$$
\langle \epsilon_p\, \epsilon_q \rangle = \bar{N}\, R_p\, \delta^{K}_{p,q} \tag{3}
$$

with $\delta^K_{p,q}$ equal to one for $p = q$ and zero otherwise. Finally, we add an isotropic Gaussian prior on $\delta_p$. All the details of the general model and the posterior formulation are given in Jasche & Wandelt (2013b). The linear bias model should generally be adequate to model the largest-scale density fluctuations: in that regime, through Taylor expansion, all bias models are equivalent. This is not the case at the smallest scales considered here, though we do not expect to be strongly biased by this assumption. Effectively, we expect the signal-to-noise ratio of the measurement of density modes to peak at intermediate scales and to decrease sharply at both small scales (because of Poisson sampling) and large scales (because of the survey selection). The measured bias should thus represent the bias at these typical scales. We finally note that the ultimate confirmation that the bias model is not causing problems is the a posteriori check that the recovered power spectrum agrees with the expected one on large scales.
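To make the data model of Eqs. (2) and (3) concrete, the following sketch draws mock counts from it and evaluates the corresponding Gaussian log-likelihood over the observed voxels. It is a toy illustration assuming a flattened voxel array and the hypothetical function names `ares_mock` and `ares_log_likelihood`; ARES itself works on 3d grids and samples these quantities rather than evaluating a single likelihood.

```python
import numpy as np

def ares_mock(delta, R, D, nbar, bias, rng):
    """Draw N_p = nbar R_p (1 + b D_p delta_p) + eps_p with eps_p ~ N(0, nbar R_p)."""
    mean = nbar * R * (1.0 + bias * D * delta)
    noise = rng.normal(size=delta.shape) * np.sqrt(nbar * R)
    return mean + noise

def ares_log_likelihood(N, delta, R, D, nbar, bias):
    """Gaussian log-likelihood of the linear data model, over voxels with R_p > 0."""
    obs = R > 0  # unobserved voxels carry no information
    var = nbar * R[obs]
    resid = N[obs] - nbar * R[obs] * (1.0 + bias * D[obs] * delta[obs])
    return -0.5 * np.sum(resid ** 2 / var + np.log(2.0 * np.pi * var))
```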

To summarize, the posterior from which we want to draw samples is

$$
\mathcal{P}\left(\{\delta_p\}, \{P_k\}, \bar{N}, b \,\middle|\, \{N_p\}\right) \propto \frac{1}{(2\pi \bar{N})^{N_{\rm free}/2}} \prod_{p:\,R_p>0} \frac{1}{\sqrt{R_p}} \exp\left(-\frac{\left[N_p - \bar{N} R_p \left(1 + b D_p \delta_p\right)\right]^2}{2\,\bar{N} R_p}\right) \prod_{k} \frac{1}{\sqrt{2\pi P_k}} \exp\left(-\frac{|\hat{\delta}_k|^2}{2 P_k}\right) \tag{4}
$$

with $N_{\rm free}$ the number of free voxels with non-vanishing selection $R_p$, $\hat{\delta}_k$ the Fourier modes of the density field and $P_k$ the discrete power spectrum of the density field. Such a posterior probability is too complex to analyse directly. To provide full Bayesian uncertainty quantification, the algorithm explores the joint posterior distribution of all these quantities via an efficient implementation of high-dimensional Markov Chain Monte Carlo methods in a block sampling scheme. In particular, the sampling consists of generating, from a Wiener posterior, random realizations of three-dimensional density fields constrained by the data $\{N_p\}$. Following each generation, we produce conditioned random realizations of the power spectrum $P_k$, the galaxy biases $b$ and the noise levels $\bar{N}$ through several sampling steps. Iterating these sampling steps correctly yields random realizations from the joint posterior distribution. In this fashion the ARES algorithm accounts for all joint and correlated uncertainties between all inferred quantities and allows for accurate inferences from galaxy surveys with non-trivial survey geometries. Classes of galaxies with different biases are treated as separate subsamples, even allowing for combined analyses of more than one galaxy survey.
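One of the conditional steps of the block sampler draws the power spectrum given a density realization. A standard way to do this, shown below as a sketch, uses the fact that for Gaussian modes and a Jeffreys prior $p(P_k) \propto 1/P_k$ the conditional is an inverse-gamma distribution, which can be sampled as $\sigma_k / \chi^2_{n_k}$ with $\sigma_k = \sum |\hat\delta_k|^2$ over the $n_k$ modes of a bin. The prior, the treatment of each input as one real Gaussian degree of freedom, and the function name are assumptions here and need not match ARES.

```python
import numpy as np

def sample_power_spectrum(mode_amplitudes, bin_index, n_bins, rng):
    """Draw P_k | delta for each k-bin from InvGamma(n_k / 2, sigma_k / 2).

    mode_amplitudes : real Gaussian mode amplitudes (one degree of freedom each)
    bin_index       : integer k-bin of each amplitude; every bin must be non-empty
    Assumes a Jeffreys prior p(P_k) proportional to 1 / P_k.
    """
    power = np.abs(mode_amplitudes) ** 2
    sigma = np.bincount(bin_index, weights=power, minlength=n_bins)
    n_k = np.bincount(bin_index, minlength=n_bins)
    # If X ~ chi^2 with n_k degrees of freedom, sigma / X ~ InvGamma(n_k / 2, sigma / 2).
    return sigma / rng.chisquare(n_k)
```

Alternating this step with a constrained density draw, and with similar conditional draws for $b$ and $\bar{N}$, is what makes the scheme a Gibbs (block) sampler.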

This methodology has also been demonstrated to correctly treat anti-correlations between bias amplitudes and the power spectrum, which are not taken into account in traditional approaches to power-spectrum estimation and amount to a 20 percent effect across large ranges in Fourier space (Jasche & Wandelt, 2013b). In this work we use an upgraded version of ARES which employs the messenger method discussed in Elsner & Wandelt (2013). This particular implementation of Wiener posterior sampling has been demonstrated to improve upon the statistical efficiency of previous implementations (Jasche & Lavaux, 2015). In this work we use the ARES algorithm to infer and calibrate luminosity-dependent galaxy biases for the 2M++ galaxy survey.
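The idea of the messenger method is to introduce an auxiliary field with covariance $T = \tau\,\mathbb{1}$, so that the signal covariance $S$ only has to be handled in Fourier space (where it is diagonal) and the noise covariance $N$ only in pixel space (where it is diagonal), alternating between the two. The sketch below implements this iteration for the Wiener filter mean, $s = S(S+N)^{-1}d$, on a periodic 1D toy problem. It assumes $\tau = \min(N)$ and omits the random-fluctuation terms required for posterior sampling, so it illustrates the principle rather than the ARES implementation.

```python
import numpy as np

def messenger_wiener_filter(d, noise_var, signal_pk, n_iter=500):
    """Wiener filter s = S (S + N)^-1 d via the messenger-field iteration.

    d         : data vector in pixel space (periodic 1D toy problem)
    noise_var : per-pixel noise variance; np.inf marks unobserved pixels
    signal_pk : signal power of each FFT mode, symmetric in k so s stays real
    """
    tau = np.min(noise_var)                 # messenger covariance T = tau * identity
    weight = tau / noise_var                # zero where the noise is infinite
    filt = signal_pk / (signal_pk + tau)    # (S^-1 + T^-1)^-1 T^-1, diagonal in Fourier space
    s = np.zeros_like(d, dtype=float)
    for _ in range(n_iter):
        t = s + weight * (d - s)                        # pixel-space messenger update
        s = np.real(np.fft.ifft(filt * np.fft.fft(t)))  # Fourier-space signal update
    return s
```

Pixels with infinite noise receive zero weight, so there the messenger field simply follows the current signal estimate; this is how masks are handled without ever inverting $S + N$ directly.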

### 3.2 The Borg algorithm

In addition to ARES, this work also capitalizes on the BORG (Bayesian Origin Reconstruction from Galaxies; Jasche & Wandelt, 2013a) algorithm to perform a chrono-cosmographical analysis of the 2M++ galaxy survey. The BORG algorithm is a fully probabilistic inference machinery aimed at the analysis of linear and mildly non-linear matter density fields in galaxy observations. The algorithm incorporates a physical model for gravitational structure formation, which turns the traditional task of reconstructing the 3d density field into the task of inferring the corresponding initial conditions at an earlier epoch from present cosmological observations. This results in a highly non-trivial Bayesian inverse problem, which requires exploring the very high-dimensional and non-linear space of possible solutions to the initial conditions problem from incomplete observations. These parameter spaces typically comprise millions of parameters, corresponding to the discretized volume elements of the observed domain.

As for ARES, the BORG algorithm assumes a specific data model to interpret the galaxy redshift catalogue and infer the three-dimensional density field. We do not describe here the full problem solved by BORG, as the details are given in Jasche & Wandelt (2013a) and Jasche et al. (2015). We nonetheless recall the basic assumptions here. BORG assumes that the galaxy counts, after binning in volume elements, are Poisson distributed around some expectation. This expectation, $\lambda_p$, of the galaxy distribution in voxel $p$ is modelled as

$$
\lambda_p = \bar{N}\, R_p \left(1 + \delta_p\left[\{\delta^{\rm i}\}\right]\right)^{\beta} \tag{5}
$$

with $\bar{N}$ the mean galaxy density, $R_p$ the linear response operator including the effects of redshift and target completeness in voxel $p$, $\beta$ the bias model parameter and $\delta_p$ the non-linear density field in voxel $p$, which depends functionally on the initial density field $\{\delta^{\rm i}\}$. The power-law bias model behaves like the linear bias model when $\delta_p$ is small compared to one. In this work the relation between $\delta_p$ and $\{\delta^{\rm i}\}$ is given by second-order Lagrangian perturbation theory (2LPT). As indicated above, in addition to the data model, we put a Gaussian prior on the initial conditions, with a cosmological power spectrum. This Gaussian prior does not enforce Gaussianity of the initial conditions: it only states that, in the absence of data, the initial conditions follow Gaussian statistics. Intrinsically non-Gaussian features in the data would not be erased under this assumption.
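A minimal sketch of the Poisson likelihood implied by Eq. (5) is given below, assuming the non-linear density $\delta_p$ is already available on the grid (in BORG it is obtained by evolving the initial conditions with 2LPT, which is not reproduced here). The function names are hypothetical. Expanding the power law for small $\delta_p$ recovers the linear model with $b = \beta$, which is easy to check numerically with `expected_counts`.

```python
import numpy as np
from math import lgamma

def expected_counts(delta, R, nbar, beta):
    """Expected galaxy counts lambda_p = nbar R_p (1 + delta_p)^beta (power-law bias)."""
    return nbar * R * (1.0 + delta) ** beta

def poisson_log_likelihood(N, delta, R, nbar, beta):
    """Poisson log-likelihood of integer counts N, summed over voxels with R_p > 0."""
    obs = R > 0
    lam = expected_counts(delta[obs], R[obs], nbar, beta)
    log_factorial = np.array([lgamma(n + 1.0) for n in N[obs]])
    return float(np.sum(N[obs] * np.log(lam) - lam - log_factorial))
```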

Our algorithm explores the posterior distribution of the Fourier modes of $\{\delta^{\rm i}\}$ and of the meta-parameter $\bar{N}$. As pointed out previously, 2LPT describes the one-, two- and three-point statistics correctly and represents higher-order statistics very well (see e.g. Moutarde et al., 1991; Buchert et al., 1994; Bouchet et al., 1995; Scoccimarro, 2000; Scoccimarro & Sheth, 2002). Consequently, the BORG algorithm naturally accounts for features of the cosmic web, such as filaments, that are typically associated with higher-order statistics induced by non-linear gravitational structure formation processes. Besides higher-order statistics of the density field, this posterior distribution also accounts for survey geometries, selection effects and noise, inherent to any cosmological observation. The BORG algorithm provides full Bayesian uncertainty quantification by exploring this highly non-Gaussian and non-linear posterior distribution via an efficient Hamiltonian Monte Carlo sampling algorithm (see Duane et al., 1987; Jasche & Wandelt, 2013a, for details). As it incorporates an approximate model of large-scale dynamics, it automatically and fully self-consistently infers the dynamical evolution of the large-scale structure from observations. In this fashion the algorithm provides dynamical structure formation histories compatible with both data and model. To account for luminosity-dependent galaxy bias and to make use of automatic noise calibration, we further use the modifications to the original BORG algorithm introduced by Jasche et al. (2015).
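For readers unfamiliar with Hamiltonian Monte Carlo, the sketch below shows the basic scheme: draw Gaussian momenta, integrate Hamilton's equations with a leapfrog integrator driven by the gradient of the log-posterior, and accept or reject the end point with a Metropolis test on the total energy. The unit mass matrix, fixed step size and fixed number of leapfrog steps are simplifying assumptions for illustration and do not reproduce the tuning of BORG's sampler.

```python
import numpy as np

def hmc_sample(log_prob, grad_log_prob, x0, n_samples, step=0.2, n_leapfrog=20, rng=None):
    """Basic Hamiltonian Monte Carlo with a unit mass matrix and leapfrog integration.

    Returns the chain (n_samples x dim) and the acceptance rate.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    samples, accepted = [], 0
    for _ in range(n_samples):
        p = rng.normal(size=x.shape)               # fresh Gaussian momenta
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog: half kick, alternating drifts and full kicks, final half kick.
        p_new += 0.5 * step * grad_log_prob(x_new)
        for i in range(n_leapfrog):
            x_new += step * p_new
            if i < n_leapfrog - 1:
                p_new += step * grad_log_prob(x_new)
        p_new += 0.5 * step * grad_log_prob(x_new)
        # Metropolis test on H = -log_prob(x) + |p|^2 / 2 corrects integration errors.
        h_old = -log_prob(x) + 0.5 * p @ p
        h_new = -log_prob(x_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:
            x, accepted = x_new, accepted + 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_samples
```

For example, with a standard 2d Gaussian target (`log_prob = lambda x: -0.5 * x @ x`, gradient `-x`) the chain mean and variance approach 0 and 1, and the acceptance rate is high because the leapfrog integrator nearly conserves the energy.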

## 4 The Bayesian analysis

The analysis of the 2M++ galaxy sample has been performed on a cubic Cartesian domain with a side length of 600 $h^{-1}$ Mpc consisting of $256^3$ equidistant grid nodes, resulting in $\sim 1.7\times 10^7$ inference parameters for both the ARES and the BORG runs. The inference procedure thus provides data-constrained realizations of the final density field (and of the initial density field in the case of BORG) at a grid resolution of about 2.3 $h^{-1}$ Mpc. To integrate the effect of the growth of large-scale structure and the cosmological Doppler effects, we assume a fixed standard $\Lambda$CDM cosmology with cosmological parameters taken from Planck Collaboration (2014). Additionally, for the BORG runs, cosmological power spectra for the initial density fields were calculated following the prescription of Eisenstein & Hu (1998) and Eisenstein & Hu (1999). For the ARES runs the cosmological power spectrum, the bias values and the mean densities have been left free. To guarantee a sufficient resolution of the final density field, we also oversample the initial density field by a factor of eight, which requires evaluating the 2LPT model with $512^3$ particles. The algorithm correctly accounts for the displacement of matter in the course of structure formation by inferring initial density fields at their Lagrangian coordinates, while final density fields are recovered at the corresponding final Eulerian coordinates. We note that redshift space distortions are not modelled in the BORG algorithm and thus are not accounted for explicitly. In its present formulation the BORG algorithm interprets features associated with redshift distortions as noise and tends to infer isotropic density fields. Isotropy of the density fields is naturally imposed by assuming diagonal covariance matrices for the initial density fields. Adding the treatment of redshift distortions, on both small and large scales, is not trivial.
Large-scale redshift distortions induce a change in the likelihood in which the initial conditions appear twice (in the density field and in the way it is evaluated). An illustration of the expected importance of this effect is given and discussed in Section 5.2. Small-scale distortions, dubbed "fingers of god" (first noted observationally by Jackson, 1972), are even more complicated to model, and cause the mass of haloes to be spread over a large volume. This effect depends not only on scale but also on the density regime under consideration. As demonstrated by Leclercq et al. (2015), cosmic voids reconstructed by the BORG algorithm do not show any sign of redshift space distortions. For reconstructed haloes, tests on $N$-body simulations showed a residual of 15 percent redshift space distortions at the high-mass end. In total we generated 6552 data-constrained realizations of the initial and final density fields. The computational cost of generating a single Markov sample is equivalent to about two hundred 2LPT model evaluations. We measured the typical time to produce a single sample to be about 1500 seconds on a multi-core Intel Xeon E5-4640 machine.
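The numbers quoted above can be cross-checked with a few lines of arithmetic. The sketch assumes a grid of $256^3$ nodes, the grid implied by a 600 $h^{-1}$ Mpc box with a resolution of about 2.3 $h^{-1}$ Mpc, and estimates the wall-clock time of a single sequential chain from the quoted 1500 s per sample.

```python
box_size = 600.0      # h^-1 Mpc, side of the cubic domain
n_grid = 256          # assumed grid nodes per side (600 / 256 is about 2.3 h^-1 Mpc)
oversampling = 8      # particles per initial-condition cell, i.e. a factor 2 per axis

resolution = box_size / n_grid            # h^-1 Mpc per voxel
n_parameters = n_grid ** 3                # one density amplitude per voxel
n_particles = oversampling * n_grid ** 3  # particles in each 2LPT evaluation

n_samples = 6552
seconds_per_sample = 1500.0
total_days = n_samples * seconds_per_sample / 86400.0  # single sequential chain

print(f"resolution   = {resolution:.3f} h^-1 Mpc")
print(f"parameters   = {n_parameters:,}")
print(f"particles    = {n_particles:,}")
print(f"chain length = {total_days:.2f} days")
```

Under these assumptions the 6552 samples correspond to roughly 114 days of computation for one chain, and each 2LPT evaluation moves $512^3$ particles.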

| Sample selection | Identifier | Bias $b$ |
|---|---|---|
| … and … | 0 | 1.74 |
| | 1 | 1.21 |
| | 2 | 1.00 |
| … and … | 3 | 1.70 |
| | 4 | 1.20 |
| | 5 | 1.15 |

## 5 Inference results

This section describes the inference results obtained with our Bayesian analysis of the 2M++ galaxy compilation. As mentioned in Section 2, we do not run a single code for the entire analysis: although this is mathematically possible, the computational cost would be too high to obtain results in a timely fashion. We therefore rely on a split analysis, using an approximate statistical model (ARES) to derive some of the meta-parameters that are then used in the more advanced model (BORG). We first present the relevant results of the ARES analysis in Section 5.1. We then describe the 3d density field obtained by BORG in Section 5.2, along with its convergence properties. In Section 5.3, we present the cosmography of the final density field inferred by BORG. Finally, in Section 5.4, we give a quantitative assessment of the presence of the Local Void behind the Milky Way's Galactic bulge.

### 5.1 Initialization analysis with ARES

As described above, in this work we use the ARES code to calibrate the unknown luminosity-dependent galaxy biases, followed by a detailed analysis with the BORG algorithm. To perform this initial analysis with ARES we follow an approach similar to that described in Jasche & Wandelt (2013b). Specifically, we treat galaxies selected at $K_{2M++} \le 11.5$ (sample 1) and $11.5 < K_{2M++} \le 12.5$ (sample 2) as two independent data sets with their respective survey geometries and selection functions, as detailed in Section 2. In addition, we subdivide each of these galaxy samples into three bins of absolute magnitude to account for the respective luminosity-dependent galaxy biases and noise levels. When applied to the 2M++ data, the ARES code generated 4306 joint posterior realizations of the cosmic power spectrum, the density field, the noise levels and the luminosity-dependent galaxy biases.^{1} To demonstrate that the ARES inference yields physically plausible results, Figure 2 compares the inferred ensemble mean cosmological power spectrum with a fiducial one, calculated according to the prescription of Eisenstein & Hu (1998) and Eisenstein & Hu (1999). ARES has recovered the shape of the cosmological power spectrum within the corresponding 1σ confidence regions, and no sign of bias can be observed in any mode in Fourier space. Erroneous treatment of survey geometries, selection effects and galaxy biases typically yields artefacts of false power in the power spectrum. The absence of such artefacts in Figure 2 therefore indicates that these effects have been accounted for accurately.

^{1} The ARES run was performed on a standard 8-core Intel Core i7-2600 workstation in one week.
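Comparisons such as the one in Figure 2 rely on estimating a shell-averaged power spectrum from a gridded field. The sketch below is a basic estimator for a full periodic box, assuming the normalization $P(k) = V\,|\hat\delta_k|^2 / N_{\rm cells}^2$ with $\hat\delta$ the discrete Fourier transform; it does not correct for a survey window, shot noise or mass-assignment effects. Dividing its output by a fiducial spectrum evaluated at the bin centres gives a ratio to a reference spectrum.

```python
import numpy as np

def measure_power_spectrum(delta, box_size, n_bins=16):
    """Shell-averaged power spectrum of a real field on a periodic cubic grid.

    Returns bin centres k, binned P(k) (NaN for empty bins) and mode counts.
    Convention: P(k) = V |FFT(delta)|^2 / N_cells^2.
    """
    n = delta.shape[0]
    n_cells = delta.size
    power = np.abs(np.fft.fftn(delta)) ** 2 * box_size ** 3 / n_cells ** 2
    k_1d = 2.0 * np.pi / box_size * np.fft.fftfreq(n, d=1.0 / n)  # multiples of k_f
    kx, ky, kz = np.meshgrid(k_1d, k_1d, k_1d, indexing="ij")
    k_mag = np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)
    edges = np.linspace(0.0, k_mag.max() * (1.0 + 1e-9), n_bins + 1)
    keep = k_mag > 0.0                              # drop the k = 0 (mean) mode
    idx = np.digitize(k_mag[keep], edges) - 1
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=power[keep], minlength=n_bins)
    pk = np.full(n_bins, np.nan)
    pk[counts > 0] = sums[counts > 0] / counts[counts > 0]
    return 0.5 * (edges[1:] + edges[:-1]), pk, counts
```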

In Figure 3, we show the values of the bias parameter found in the different subsamples, normalized so that the faintest magnitude bin of sample 2 has a fiducial bias of one. The results are given as red and blue boxes. The width of each box corresponds to the width of the magnitude interval and its height to the 95% confidence interval. In addition, the best fit of Westover (2007) is plotted in black, together with its uncertainties. The best fit of Westover (2007) is given by

(6) |

with $L$ the intrinsic luminosity of the considered galaxy population and $L_*$ the reference luminosity for 2M++. We have adjusted the reference so that a bias of one is obtained for our reference population (sample 2, faintest luminosity bin). We note the excellent agreement between the two measurements. The advantages of our procedure are its full automation and the joint derivation of an unbiased power spectrum and of the matter density field. Also, we have used a limited number of bins, but nothing prevents us from increasing their number, at the cost of a lower signal-to-noise ratio per bin. The most important result of the ARES analysis for this work is the derivation of the luminosity-dependent galaxy biases for the galaxy population selected in 2M++. We use these biases as-is in the following BORG reconstruction. While the two bias models are relatively different, in the regime of small density fluctuations on large scales they can be reconciled through a Taylor expansion: $(1+\delta)^\beta \approx 1 + \beta\delta$, and thus $b = \beta$. This identification is of course not exact and probably leads to some bias in the density field reconstruction. We expect in the future to be able to jointly infer the bias parameter and the density field in BORG at a lower computational cost, which would remove this problem.
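Carrying the expansion of the power-law bias one order further shows how large the mismatch between the two bias models can become:

$$
(1+\delta)^{\beta} = 1 + \beta\,\delta + \frac{\beta(\beta-1)}{2}\,\delta^{2} + \mathcal{O}(\delta^{3})
$$

The linear model $1 + b\,\delta$ matches the first-order term when $b = \beta$, while the second-order term vanishes only for $\beta = 1$. For example, with $\beta = 1.74$ and $\delta = 0.5$ (values chosen only for illustration), the linear term is $\beta\delta = 0.87$ while the quadratic term is $\beta(\beta-1)\delta^2/2 \approx 0.16$, so the identification $b = \beta$ is only accurate where $|\delta| \ll 1$.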

### 5.2 3d density field

Using inferred bias values, as described above, we have run the BORG algorithm on the 2M++ compilation data. The results are presented in Figures 4, 5 and 7.

In Figure 4, we show the sequence of power spectra of the initial density field as the chain settles into the region around the maximum of the posterior. The top panel shows the raw power spectra and the bottom panel the same power spectra divided by the assumed ΛCDM initial linear power spectrum. We note that after converging within 400 samples, the power spectra start oscillating on large scales. This indicates that the chain has extracted all the information available from the observations at these scales. These oscillations also give an indication of the correlation length of the Markov chain. On intermediate scales the power spectrum is strongly constrained and unbiased compared to our reference power spectrum. On the smallest scales the noise increases again, because these scales approach the size of a single voxel, where essentially all information is lost. We note that, contrary to Kitaura (2013), we do not observe any bumps in the power spectra of reconstructed phases at intermediate scales. Finally, the absence of bias in the power spectra indicates that unobserved regions are handled correctly.
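The correlation length of a chain can be quantified from the autocorrelation of any scalar summary, for instance the amplitude of one large-scale power-spectrum bin across samples. The sketch below computes the normalized autocorrelation with FFTs and reports the first lag at which it drops below a threshold; the threshold of 0.1 is an arbitrary illustrative choice, not the criterion used in this work.

```python
import numpy as np

def autocorrelation(trace):
    """Normalised autocorrelation of a 1D Markov-chain trace, computed with FFTs."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    n = x.size
    f = np.fft.rfft(x, n=2 * n)                      # zero-pad to avoid circular wrap-around
    acf = np.fft.irfft(f * np.conj(f), n=2 * n)[:n]
    return acf / acf[0]

def correlation_length(trace, threshold=0.1):
    """First lag at which the autocorrelation falls below `threshold` (None if never)."""
    acf = autocorrelation(trace)
    below = np.nonzero(acf < threshold)[0]
    return int(below[0]) if below.size else None
```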

In Figure 5, we show the mean initial density field (top row), the 2LPT-evolved mean final density field (middle row) and the input data (bottom row) for three orthogonal planes of the Equatorial coordinate system. The edge of the 2M++ survey is clearly visible in the mean final density field. In these panels, we see clearly defined structures in the central region, which is close to the observer and more likely to be fully complete. Towards the boundaries of the cubic domain, structures become increasingly blurry when going out of the observed volume at a distance of 200 $h^{-1}$ Mpc from the centre. In the initial conditions (top row) these edges are far less sharp, which emphasizes that the information stored in the current positions of galaxies comes from extended regions in Lagrangian coordinates, and that information is distributed differently in the initial and final conditions (Jasche et al., 2015). Finally, we see the visual improvement of the final density field derived by BORG compared to the actual distribution of galaxies shown in the bottom row.

In Figure 6, we show the a posteriori impact of the large-scale component of redshift space distortions. For this test we assume that the inferred density fields have been correctly recovered in real space, and add redshift space distortions corresponding to velocities derived through 2LPT dynamics. The left-hand panel of Figure 6 reproduces the real-space density field of Figure 5 (centre column) as determined by BORG. The middle panel shows the effect on the density field of the redshift distortions produced by the peculiar velocities predicted by the 2LPT dynamics. The right-hand panel gives the difference between the middle and left-hand panels, highlighting the regions that have moved due to redshift distortions. On top of the three density fields, we have drawn a red dash-dotted grid with a spacing of 50 $h^{-1}$ Mpc. Large Scale Structures are not moved much by the large-scale component of the peculiar velocities. The most important effect is the smearing of filaments and haloes (middle panel), which already happens when comparing 2LPT dynamics to the full non-linear solution, since 2LPT does not capture shell-crossing effects very well. Inspection of the right-hand panel indicates that structures typically move by a few $h^{-1}$ Mpc, which is of the same order as the grid resolution used here ($\approx 2.3\,h^{-1}$ Mpc). Thus, on scales larger than a single voxel, the inferred density fields are not affected much by this effect. We conclude that, for the purpose of density reconstruction, the fields predicted by BORG are very close to what they would be if redshift space distortions were taken into account.
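The test above amounts to displacing each mass element along the line of sight by its peculiar velocity. A minimal sketch, assuming an observer at the origin, positions in $h^{-1}$ Mpc, velocities in km s$^{-1}$ and the low-redshift mapping $\mathbf{s} = \mathbf{x} + (\mathbf{v}\cdot\hat{\mathbf{x}})\,\hat{\mathbf{x}}/H_0$ with $H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$; the velocities themselves would come from 2LPT, which is not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def to_redshift_space(pos, vel, H0=100.0):
    """Map real-space positions to redshift space for an observer at the origin.

    pos : (n, 3) comoving positions in h^-1 Mpc (must not sit at the origin)
    vel : (n, 3) peculiar velocities in km/s
    Low-redshift approximation: s = x + (v . x_hat) / H0 * x_hat.
    """
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    r = np.linalg.norm(pos, axis=1, keepdims=True)
    x_hat = pos / r                                  # unit line-of-sight vectors
    v_los = np.sum(vel * x_hat, axis=1, keepdims=True)
    return pos + (v_los / H0) * x_hat
```

A line-of-sight velocity of 300 km s$^{-1}$ shifts an object by 3 $h^{-1}$ Mpc, comparable to the displacements of a few $h^{-1}$ Mpc seen in the right-hand panel of Figure 6, while transverse velocities produce no shift at all.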

### 5.3 Cosmography

In Figure 7, we show the supergalactic plane as seen in a thin slice of the final density field computed by BORG (coloured background field) and in a 20 $h^{-1}$ Mpc thick slice extracted directly from the galaxy data (magenta dots). We have represented the data in polar coordinates so that the supergalactic longitude can be read directly from the plot.

Major structures of the Local Universe are clearly visible in both the galaxy distribution and the final density field. The density field in the Galactic plane, where galaxy data are missing, is smoothly extrapolated from neighbouring structures. We see, for example, the Pisces-Cetus supercluster (at 180 Mpc; Tully, 1986), the Coma cluster (Wolf, 1901; Hubble & Humason, 1931), the Shapley concentration (Scaramella et al., 1989; Raychaudhury, 1989) and the Perseus-Pisces supercluster (at 55 Mpc; Jõeveer et al., 1978). We also note a rather prominent circular filament connected to the Shapley concentration and located just behind the Bootes void. As we are not aware of any existing name for this filament, we call it the Virgo-Bootes-Hercules filament.

As a final remark, we note that the Sloan Great Wall is clearly visible in the reconstructed density field shown in the middle right-hand panel of Figure 5, whereas the wall itself is not clearly visible in the galaxy distribution shown in the panel just below. Judging from the amplitude of the mean field, the Sloan Great Wall is less well characterized than other structures, which is expected given the sparsity of galaxies in the catalogue in that part of the volume. This structure is a striking example of the large-scale structure that BORG can reconstruct from noisy data. By representing the galaxies and the reconstructed Sloan Great Wall on the same sky plot, we see that the Hercules-Aries filament inters

### 5.4 Local Void analysis

An interesting feature of the non-linear density fields inferred by BORG is that they can uncover structures in unobserved regions. Figure 8 provides a particular example with the Local Void (also known as Tully’s void; Tully & Fisher, 1987). The two panels show the mean final density field, overplotted with either the 2M++ galaxies (left-hand panel) or the HI Parkes All Sky Survey (HIPASS) galaxies (right-hand panel; Meyer et al., 2004). Some 2M++ galaxies still appear despite the Galactic plane cut and the Galactic bulge because we represent a 40 Mpc-thick slice. The void is clearly visible in both panels and, in the ensemble mean field, it visually seems to extend from 10 Mpc to 60 Mpc.

To illustrate a further application of our reconstruction technique, we assign to each voxel located in the Galactic plane a probability of belonging to a diva void (Lavaux & Wandelt, 2010), using the following procedure.

First, we smooth the initial density field of each sample of the Markov Chain produced by BORG with a Gaussian filter of 5 Mpc. This filter size is motivated by the mass it corresponds to in Lagrangian coordinates: for the adopted cosmology, a tophat filter of 5 Mpc encloses a mass of the order of that of a group of galaxies. Filtering on that scale therefore removes the contribution of groups of galaxies to the classification of the cosmic web.
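The link between filter radius and mass follows from the mean matter density: a tophat sphere of comoving radius $R$ encloses $M = \frac{4\pi}{3} R^3\, \Omega_\mathrm{m}\, \rho_\mathrm{crit}$, with $\rho_\mathrm{crit} \simeq 2.775\times10^{11}\, h^2\,M_\odot\,\mathrm{Mpc}^{-3}$. The sketch below evaluates this and applies a Gaussian filter to a periodic cubic grid in Fourier space. It assumes lengths in $h^{-1}$ Mpc, so masses come out in $h^{-1} M_\odot$; the value $\Omega_\mathrm{m}=0.3$ in the example is purely illustrative and not necessarily the cosmology adopted in this work. The function names are hypothetical, and Fourier-space Gaussian smoothing on a periodic box is one simple choice, not necessarily the one used in the pipeline.

```python
import numpy as np

# Critical density in h^2 M_sun Mpc^-3, i.e. h^-1 M_sun per (h^-1 Mpc)^3.
RHO_CRIT = 2.775e11


def tophat_lagrangian_mass(radius, omega_m):
    """Mass (h^-1 M_sun) inside a tophat sphere of comoving radius `radius` (h^-1 Mpc)."""
    return 4.0 / 3.0 * np.pi * radius**3 * omega_m * RHO_CRIT


def gaussian_smooth(delta, box_size, radius):
    """Smooth a periodic cubic density grid with a Gaussian kernel of width `radius`.

    Each Fourier mode is multiplied by exp(-k^2 R^2 / 2); box_size and radius share units.
    """
    n = delta.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)   # full axes
    kz = 2.0 * np.pi * np.fft.rfftfreq(n, d=box_size / n)   # last axis of the real FFT
    kx, ky, kzz = np.meshgrid(k1d, k1d, kz, indexing="ij")
    k2 = kx**2 + ky**2 + kzz**2
    delta_k = np.fft.rfftn(delta)
    return np.fft.irfftn(delta_k * np.exp(-0.5 * k2 * radius**2), s=delta.shape)


# Illustrative only: Omega_m = 0.3 gives a group-scale mass for R = 5 h^-1 Mpc.
print(f"{tophat_lagrangian_mass(5.0, 0.3):.2e}")  # 4.36e+13
```

Because the $k=0$ mode is left untouched, the smoothing preserves the mean of the field while suppressing fluctuations below the filter scale.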

We then run the truncated watershed transform on this smoothed field, identify the particles belonging to the resulting voids and propagate them forward in time using 2LPT. We set to one each voxel in which a void particle is found, and to zero all other voxels, and we then average these fields over the chain. By construction, the average field is the marginalized probability for each voxel to be in a void:

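$$
P(\vec{x} \in \mathrm{void} \mid d) = \int \mathrm{d}\delta^{\mathrm{i}}\, V(\vec{x} \mid \delta^{\mathrm{i}})\, \mathcal{P}(\delta^{\mathrm{i}} \mid d) \approx \frac{1}{N} \sum_{n=1}^{N} V(\vec{x} \mid \delta^{\mathrm{i}}_{n}), \tag{7}
$$

where $N$ is the length of the Markov Chain, $V(\vec{x} \mid \delta^{\mathrm{i}})$ is set to one if the voxel $\vec{x}$ belongs to a void assuming the initial density fluctuations $\delta^{\mathrm{i}}$ and zero otherwise, $\delta^{\mathrm{i}}_{n}$ is the $n$-th sample of the chain, and $\mathcal{P}(\delta^{\mathrm{i}} \mid d)$ is the conditional marginalized posterior of the reconstructed initial density fluctuations given the data $d$. The mean field is thus equal to the probability that $\vec{x}$ is in a void given the observational data.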
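The Monte Carlo estimator in equation (7) is a per-voxel average of binary indicator fields over the chain. The sketch below assumes that each Markov chain sample has already been reduced, by the watershed step and 2LPT propagation (not shown), to an array of final comoving positions of void particles inside a periodic cubic box. Nearest-grid-point assignment and the function names `void_indicator` and `void_probability` are illustrative choices, not the pipeline's own.

```python
import numpy as np


def void_indicator(void_positions, n_grid, box_size):
    """Binary field: 1 in every voxel containing at least one void particle, 0 elsewhere.

    void_positions: (n, 3) comoving positions in [0, box_size); nearest-grid-point assignment
    with periodic wrapping.
    """
    idx = np.floor(np.asarray(void_positions) / box_size * n_grid).astype(int) % n_grid
    field = np.zeros((n_grid,) * 3)
    field[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return field


def void_probability(samples, n_grid, box_size):
    """Average the indicator fields over the chain: estimate of P(voxel in void | data)."""
    total = np.zeros((n_grid,) * 3)
    for void_positions in samples:
        total += void_indicator(void_positions, n_grid, box_size)
    return total / len(samples)


# Two toy chain samples in a 4^3 grid with unit voxels: voxel (0,0,0) is a void in both,
# voxel (2,2,2) in only one of them.
samples = [
    np.array([[0.5, 0.5, 0.5], [0.6, 0.2, 0.1]]),
    np.array([[0.5, 0.5, 0.5], [2.5, 2.5, 2.5]]),
]
p = void_probability(samples, 4, 4.0)
print(p[0, 0, 0], p[2, 2, 2])
```

Since each indicator is 0 or 1, the average is automatically bounded between 0 and 1 and converges to the marginal probability as the chain length grows.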

We show the result of this procedure in Figure 9, highlighting the regions that are definitely voids (dark blue) or definitely not voids (white). We have over-plotted the galaxies of the HIPASS catalogue that lie within 10 Mpc of the Galactic plane. As expected, regions containing many galaxies are more clearly classified as non-voids. On the other hand, there is a filament of galaxies that is marked as belonging to a void with high probability, i.e. greater than 90%. We note that the void classification probability is fully marginalized over all the other variables. The classification agrees qualitatively with the visual impression from Figure 8, in which the void-like area lies in the most under-dense region. Most of the voxels to the right of are identified as non-void. This classification is of course not the full story: Lavaux & Wandelt (2010) advocated using a full filtering hierarchy to characterize the dynamics of the cosmic web. The present classification is nonetheless a powerful tool to separate galaxies according to their dynamical environment; as a full hierarchical analysis is beyond the scope of this paper, we postpone it to future work. We also note that the diva classification of the Large Scale Structure is not unique, as other prescriptions that rely only on the present gravitational field have been advocated (e.g. Hahn et al., 2007). However, the combination of BORG and diva allows us to use the full dynamical history of Large Scale Structures to make the classification. Unlike techniques based on the present field alone, it accounts for the fact that galaxies may have originally formed in environments different from their present one.
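Separating galaxies by dynamical environment then reduces to reading the void-probability grid at each galaxy position. The sketch below assumes a periodic cubic grid and nearest-grid-point lookup; the 0.9 threshold matches the "greater than 90%" level quoted above, while the symmetric 0.1 threshold for "non-void", the "uncertain" label and the function name `classify_galaxies` are hypothetical choices made for illustration.

```python
import numpy as np


def classify_galaxies(prob_void, positions, box_size, p_void=0.9, p_nonvoid=0.1):
    """Label galaxies by the void probability of the voxel they fall in.

    Assumptions: prob_void is an (n, n, n) periodic grid covering [0, box_size)^3 and
    positions is (m, 3) in the same units; thresholds are illustrative.
    """
    n_grid = prob_void.shape[0]
    idx = np.floor(np.asarray(positions) / box_size * n_grid).astype(int) % n_grid
    p = prob_void[idx[:, 0], idx[:, 1], idx[:, 2]]
    labels = np.where(p > p_void, "void", np.where(p < p_nonvoid, "non-void", "uncertain"))
    return p, labels


grid = np.zeros((4, 4, 4))
grid[0, 0, 0] = 0.95
grid[1, 1, 1] = 0.5
galaxies = np.array([[0.5, 0.5, 0.5], [1.5, 1.5, 1.5], [3.5, 3.5, 3.5]])
p_gal, labels = classify_galaxies(grid, galaxies, 4.0)
print(list(labels))
```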

## 6 Summary and Conclusions

This work presents a fully Bayesian data analysis pipeline to study cosmic structures in galaxy redshift catalogues, derive their statistical properties and infer the corresponding initial conditions as well as plausible dynamic structure formation histories. This pipeline consists of the sequential application of two of our Bayesian inference algorithms.

Specifically, we have applied this methodology to the 2M++ galaxy compilation (Lavaux & Hudson, 2011), which spans the entire sky to a depth of 200 Mpc. In a first step, we employed the ARES algorithm (Jasche & Wandelt, 2013b) to infer the cosmological power-spectrum and calibrate luminosity dependent galaxy biases. As demonstrated in Section 5.1, the ARES algorithm accurately recovers the shape of a fiducial cosmological power-spectrum throughout the entire range of Fourier modes considered in this work. This result demonstrates that systematics arising from survey geometries, selection effects and galaxy biases have been accounted for in our Bayesian inference approach. In particular, we have determined the bias of galaxies in three bins of absolute magnitude. Our results on luminosity dependent galaxy biases are consistent with, and confirm, the previous findings of Westover (2007).

Based upon these results, we performed a highly detailed analysis of the mildly non-linear and non-linear large scale structure in the 2M++ galaxy catalogue with the BORG algorithm (Jasche & Wandelt, 2013a). Specifically, we used the previously inferred galaxy biases as an input to BORG to infer the large scale structure of the Nearby Universe within a co-moving cubic box of side 600 Mpc centred on the observer, discretized on an equidistant grid with a resolution of about 2.3 Mpc. This results in of the order of $10^7$ inference parameters, which can be accurately handled by our Bayesian inference framework. The algorithm jointly infers the present non-linear Large Scale Structures and the initial conditions, at an early cosmic epoch, from which they originate. In Section 5.2 we presented the inferred three-dimensional density fields, which show highly detailed Large Scale Structures both at present and in the initial conditions. Furthermore, we have shown that our Bayesian inference algorithm accurately quantifies the uncertainties inherent to any cosmological observation. We have thus statistically reconstructed the initial conditions of our Local Universe on large scales, together with a detailed treatment of survey geometries, selection effects and tracer biases.

As a particular application of the reconstructed density field and initial conditions to statistical structure detection, we have focused on the problem of identifying the Local Void. The Local Void is largely obscured by the Galaxy and is consequently masked out in the 2M++ galaxy compilation. To demonstrate the power of our Bayesian methodology to recover structures in unobserved regions, we have shown that the Local Void is clearly visible in the reconstructed density field despite the lack of data in that region. To further quantify the statistical significance of this detection, we have used the diva void classification prescription to compute the probability that a given volume element is part of the Local Void. These results indicate a high probability for the existence of the Local Void behind the Galaxy, which is further supported by comparison with data from the HIPASS catalogue.

Future work will build on these results with more detailed studies of the large scale structure in the Nearby Universe, including further improvements to the dynamical model used in the BORG tool.

In summary, this work presents a detailed application of our Bayesian inference framework to data of the 2M++ galaxy catalogue. In contrast to other state-of-the-art approaches, our algorithm accurately recovers structures in noisy and masked regions and also infers the dynamic formation history of individual large scale structures. As a result, this methodology opens new avenues to analyse and understand the Large Scale Structures of our Universe.

## Acknowledgements

Special thanks go to Stéphane Rouberol for his support during the course of this work, in particular for guaranteeing flawless use of all required computational resources. JJ is partially supported by a Feodor Lynen Fellowship from the Alexander von Humboldt Foundation and by Benjamin Wandelt’s Chaire d’Excellence from the Agence Nationale de la Recherche. This research was supported by the DFG cluster of excellence “Origin and Structure of the Universe” (www.universe-cluster.de). This work, carried out within the ILP LABEX (under reference ANR-10-LABX-63), was supported by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02. The Parkes telescope is part of the Australia Telescope, which is funded by the Commonwealth of Australia for operation as a National Facility managed by CSIRO. This work was granted access to the HPC resources of The Institute for Scientific Computing and Simulation financed by Region Île-de-France and the project EquipMeso (reference ANR-10-EQPX-29-01) overseen by the French National Research Agency (ANR) as part of the “Investissements d’Avenir” programme. We acknowledge financial support from the “Programme National de Cosmologie and Galaxies” (PNCG) of CNRS/INSU, France.

## References

- Abazajian et al. (2009) Abazajian K. N., Adelman-McCarthy J. K., Agüeros M. A., Allam S. S., Allende Prieto C., An D., Anderson K. S. J., Anderson S. F., Annis J., Bahcall N. A., et al., 2009, ApJS, 182, 543
- Ahn et al. (2014) Ahn C. P., Alexandroff R., Allende Prieto C., Anders F., Anderson S. F., Anderton T., Andrews B. H., Aubourg É., Bailey S., Bastien F. A., et al., 2014, ApJS, 211, 17
- Blanton et al. (2005) Blanton M. R., Eisenstein D., Hogg D. W., Schlegel D. J., Brinkmann J., 2005, ApJ, 629, 143
- Bouchet et al. (1995) Bouchet F. R., Colombi S., Hivon E., Juszkiewicz R., 1995, A&A, 296, 575
- Buchert et al. (1994) Buchert T., Melott A. L., Weiss A. G., 1994, A&A, 288, 349
- Duane et al. (1987) Duane S., Kennedy A. D., Pendleton B. J., Roweth D., 1987, Physics Letters B, 195, 216
- Eisenstein & Hu (1998) Eisenstein D. J., Hu W., 1998, ApJ, 496, 605
- Eisenstein & Hu (1999) Eisenstein D. J., Hu W., 1999, ApJ, 511, 5
- Elsner & Wandelt (2013) Elsner F., Wandelt B. D., 2013, A&A, 549, A111
- Eriksen et al. (2004) Eriksen H. K., O’Dwyer I. J., Jewell J. B., Wandelt B. D., Larson D. L., Górski K. M., Levin S., Banday A. J., Lilje P. B., 2004, ApJS, 155, 227
- Hahn et al. (2007) Hahn O., Porciani C., Carollo C. M., Dekel A., 2007, MNRAS, 375, 489
- Hubble & Humason (1931) Hubble E., Humason M. L., 1931, ApJ, 74, 43
- Huchra et al. (2012) Huchra J. P., Macri L. M., Masters K. L., Jarrett T. H., Berlind P., Calkins M., Crook A. C., Cutri R., Erdoǧdu P., Falco E., et al., 2012, ApJS, 199, 26
- Jõeveer et al. (1978) Jõeveer M., Einasto J., Tago E., 1978, MNRAS, 185, 357
- Jackson (1972) Jackson J. C., 1972, MNRAS, 156, 1P
- Jasche & Kitaura (2010) Jasche J., Kitaura F. S., 2010, MNRAS, 407, 29
- Jasche et al. (2010) Jasche J., Kitaura F. S., Li C., Enßlin T. A., 2010, MNRAS, 409, 355
- Jasche et al. (2010) Jasche J., Kitaura F. S., Wandelt B. D., Enßlin T. A., 2010, MNRAS, 406, 60
- Jasche & Lavaux (2015) Jasche J., Lavaux G., 2015, MNRAS, 447, 1204
- Jasche et al. (2015) Jasche J., Leclercq F., Wandelt B. D., 2015, JCAP, 1, 36
- Jasche & Wandelt (2013a) Jasche J., Wandelt B. D., 2013a, MNRAS, 432, 894
- Jasche & Wandelt (2013b) Jasche J., Wandelt B. D., 2013b, ApJ, 779, 15
- Jewell et al. (2004) Jewell J., Levin S., Anderson C. H., 2004, ApJ, 609, 1
- Jones et al. (2009) Jones D. H., Read M. A., Saunders W., Colless M., Jarrett T., Parker Q. A., Fairall A. P., Mauch T., Sadler E. M., Watson F. G., Burton D., Campbell L. A., Cass P., Croom S. M., Dawe J., Fiegert K., et al., 2009, MNRAS, 399, 683
- Kitaura (2013) Kitaura F.-S., 2013, MNRAS, 429, L84
- Landy & Szalay (1993) Landy S. D., Szalay A. S., 1993, ApJ, 412, 64
- Lavaux & Hudson (2011) Lavaux G., Hudson M. J., 2011, MNRAS, 416, 2840
- Lavaux & Wandelt (2010) Lavaux G., Wandelt B. D., 2010, MNRAS, 403, 1392
- Leclercq et al. (2015) Leclercq F., Jasche J., Sutter P. M., Hamaus N., Wandelt B., 2015, JCAP, 3, 47
- Meyer et al. (2004) Meyer M. J., Zwaan M. A., Webster R. L., Staveley-Smith L., Ryan-Weber E., Drinkwater M. J., Barnes D. G., Howlett M., Kilborn V. A., Stevens J., Waugh M., Pierce M. J., Bhathal R., de Blok W. J. G., Disney M. J., Ekers R. D., Freeman K. C., et al., 2004, MNRAS, 350, 1195
- Moutarde et al. (1991) Moutarde F., Alimi J., Bouchet F. R., Pellat R., Ramani A., 1991, ApJ, 382, 377
- Percival (2005) Percival W. J., 2005, MNRAS, 356, 1168
- Planck Collaboration (2014) Planck Collaboration, 2014, A&A, 571, A16
- Raychaudhury (1989) Raychaudhury S., 1989, Nature, 342, 251
- Saunders et al. (2000) Saunders W., Sutherland W. J., Maddox S. J., Keeble O., Oliver S. J., Rowan-Robinson M., McMahon R. G., Efstathiou G. P., Tadros H., White S. D. M., Frenk C. S., Carramiñana A., Hawkins M. R. S., 2000, MNRAS, 317, 55
- Scaramella et al. (1989) Scaramella R., Baiesi-Pillastrini G., Chincarini G., Vettolani G., Zamorani G., 1989, Nature, 338, 562
- Schechter (1976) Schechter P., 1976, ApJ, 203, 297
- Scoccimarro (2000) Scoccimarro R., 2000, ApJ, 544, 597
- Scoccimarro & Sheth (2002) Scoccimarro R., Sheth R. K., 2002, MNRAS, 329, 629
- Skrutskie et al. (2006) Skrutskie M. F., Cutri R. M., Stiening R., Weinberg M. D., Schneider S., Carpenter J. M., Beichman C., Capps R., Chester T., Elias J., Huchra J., Liebert J., Lonsdale C., Monet D. G., et al., 2006, AJ, 131, 1163
- Tegmark et al. (2004) Tegmark M., Blanton M. R., Strauss M. A., Hoyle F., Schlegel D., Scoccimarro R., Vogeley M. S., Weinberg D. H., Zehavi I., Berlind A., Budavari T., Connolly A., Eisenstein D. J., Finkbeiner D., et al., 2004, ApJ, 606, 702
- Tully (1986) Tully R. B., 1986, ApJ, 303, 25
- Tully & Fisher (1987) Tully R. B., Fisher J. R., 1987, Nearby Galaxies Atlas. Cambridge University Press
- Wandelt et al. (2004) Wandelt B. D., Larson D. L., Lakshminarayanan A., 2004, Phys. Rev. D, 70, 083511
- Westover (2007) Westover M., 2007, PhD dissertation, Harvard University, Department of Astronomy
- Wolf (1901) Wolf M., 1901, Astronomische Nachrichten, 155, 127
- York et al. (2000) York D. G., Adelman J., Anderson Jr. J. E., Anderson S. F., Annis J., Bahcall N. A., Bakken J. A., Barkhouser R., Bastian S., Berman E., Boroski W. N., Bracker S., Briegel C., Briggs J. W., Brinkmann J., et al., 2000, AJ, 120, 1579