GAMA: Galaxy clustering using photometric redshifts

Galaxy and Mass Assembly (GAMA): Colour and luminosity dependent clustering from calibrated photometric redshifts


We measure the two-point angular correlation function of a sample of 4,289,223 galaxies with mag from the Sloan Digital Sky Survey as a function of photometric redshift, absolute magnitude and colour down to mag. Photometric redshifts are estimated from model magnitudes and two Petrosian radii using the artificial neural network package ANNz, taking advantage of the Galaxy and Mass Assembly (GAMA) spectroscopic sample as our training set. The photometric redshifts are then used to determine absolute magnitudes and colours. For all our samples, we estimate the underlying redshift and absolute magnitude distributions using Monte-Carlo resampling. These redshift distributions are used in Limber’s equation to obtain spatial correlation function parameters from power law fits to the angular correlation function. We confirm an increase in clustering strength for sub-$L^*$ red galaxies compared with $L^*$ red galaxies at small scales in all redshift bins, whereas for the blue population the correlation length is almost independent of luminosity for $L^*$ galaxies and fainter. A linear relation between relative bias and log luminosity is found to hold down to luminosities . We find that the redshift dependence of the bias of the $L^*$ population can be described by the passive evolution model of Tegmark & Peebles (1998). A visual inspection of a random subset of our SDSS galaxy sample reveals that about 10 per cent are spurious, with a higher contamination rate towards very faint absolute magnitudes due to over-deblended nearby galaxies. We correct for this contamination in our clustering analysis.

galaxies: clustering, photometric redshift, faint population

1 Introduction

Measurement of galaxy clustering is an important cosmological tool in understanding the formation and evolution of galaxies at different epochs. The dependence of galaxy clustering on properties such as morphology, colour, luminosity or spectral type has been established over many decades. Elliptical galaxies or galaxies with red colours, which both trace an old stellar population, are known to be more clustered than spiral galaxies (e.g. Davis & Geller 1976; Dressler 1980; Postman & Geller 1984; Loveday et al. 1995; Guzzo et al. 1997; Goto et al. 2003). Recent large galaxy surveys have allowed the investigation of galaxy clustering as a function of both colour and luminosity (Norberg et al. 2002; Budavári et al. 2003; Zehavi et al. 2005; Wang et al. 2007; McCracken et al. 2008; Zehavi et al. 2011). Among the red population, a strong luminosity dependence has been observed whereby luminous galaxies are more clustered, because they reside in denser environments.

The galaxy luminosity function shows an increasing faint-end density to at least as faint as mag (Blanton et al. 2005a; Loveday et al. 2012); thus intrinsically faint galaxies represent the majority of the galaxies in the universe. These galaxies with luminosity have low stellar mass and are mostly dwarf galaxies with ongoing star formation. However, because most wide-field spectroscopic surveys can only probe luminous galaxies over large volumes, this population is often under-represented. Previous clustering analyses have revealed that intrinsically faint galaxies have different properties to luminous ones. A striking difference appears between galaxy colours in this regime: while faint blue galaxies seem to cluster on a scale almost independent of luminosity, the clustering of the faint red population is shown to be very sensitive to luminosity (Norberg et al. 2001, 2002; Zehavi et al. 2002; Hogg et al. 2003; Zehavi et al. 2005; Swanson et al. 2008a; Zehavi et al. 2011; Ross et al. 2011b). As found by Zehavi et al. (2005), this trend is naturally explained by the halo occupation distribution framework. In this picture, the faint red population corresponds to red satellite galaxies, which are located in high mass halos with red central galaxies and are therefore strongly clustered. Recently, Ross et al. (2011b) compiled from the literature bias measurements for red galaxies over a wide range of luminosities for both spectroscopic and photometric data. They showed that the bias measurements of the faint red population are strongly affected by non-linear effects and thus depend on the physical scales over which they are measured. They concluded that red galaxies with mag are similarly or less biased than red galaxies of intermediate luminosity.

In this work, we make use of photometric redshifts to probe the regime of intrinsically faint galaxies. Our sample is composed of SDSS galaxies with $r$-band Petrosian magnitude . As we have an ideal training set for this sample, thanks to the GAMA survey (Driver et al. 2011), we use the artificial neural network package ANNz (Collister & Lahav 2004) to predict photometric redshifts. We then calculate the angular two-point correlation function as a function of absolute magnitude and colour. The correlation length of each sample is computed through the inversion of Limber’s equation, using Monte-Carlo resampling to model the underlying redshift distribution. Recently, Zehavi et al. (2011) presented the clustering properties of the DR7 spectroscopic sample of SDSS. They extracted a sample of 700,000 galaxies with redshifts to mag, covering an area of 8000 deg$^2$. Their study of the luminosity and colour dependence uses power law fits to the projected correlation function. Our study is complementary to theirs, since we are using calibrated photo-$z$s of fainter galaxies from the same SDSS imaging catalogue. We use similar luminosity bins to Zehavi et al., with the addition of a fainter luminosity bin .

Small-scale galaxy clustering provides additional tests of the fundamental problem of how galaxies trace dark matter. Previous studies have used SDSS data and the projected correlation function to study the clustering of galaxies at the smallest scales possible (Masjedi et al. 2006), using extensive modeling to account for the fibre constraint in SDSS spectroscopic data. The interpretation of these results offers unique tests of how galaxies trace dark matter and of the inner structure of dark matter halos (Watson et al. 2011). Motivated by these studies we present measurements of the angular correlation function down to scales of degrees. We work solely with the angular correlation function and we pay particular attention to systematic errors and the quality of the data.

On the other hand, on sufficiently large scales (), it is expected that the galaxy density field evolves linearly following the evolution of the dark matter density field (Tegmark et al. 2006). However, it is less clear if this assumption holds on smaller scales, where complicated physics of galaxy formation and evolution dominate. In the absence of sufficient spectroscopic data to comprehensively study the evolution of clustering, Ross et al. (2010) used SDSS photometric redshifts to extract a volume-limited sample with and . Their analysis revealed significant deviations from the passive evolution model of Tegmark & Peebles (1998). Here we perform a similar analysis, again using photometric redshifts, for the population.

This paper is organised as follows. In Section 2, we introduce the statistical quantities to calculate the clustering of galaxies, with an emphasis on the angular correlation function. In Section 3 we present our data for this study and the method for estimating the clustering errors. In Section 4 we describe the procedure that we followed in order to obtain the photometric redshifts. We then investigate the clustering of our photometric sample, containing a large number of intrinsically faint galaxies, in Section 5. In Section 6 we present bias measurements as functions of colour, luminosity and redshift. Our findings are summarised in Section 7. In Appendix A we show how we extracted our initial catalogue from the SDSS DR7 database and finally in Appendix B we describe in some detail the tests performed to assess systematic errors.

Throughout we assume a standard flat $\Lambda$CDM cosmology, with , and km s$^{-1}$ Mpc$^{-1}$.

2 The two-point angular correlation function

2.1 Definition

The simplest way to measure galaxy clustering on the sky is via the two-point angular correlation function, $w(\theta)$, which gives the excess probability of finding two galaxies at an angular separation $\theta$ compared to a random Poisson distribution (Peebles 1980, § 31):

$$dP = \mathcal{N}^2\,[1 + w(\theta)]\,d\Omega_1\,d\Omega_2, \qquad (1)$$

where $dP$ is the joint probability of finding galaxies in solid angles $d\Omega_1$ and $d\Omega_2$ separated by $\theta$, and $\mathcal{N}$ is the mean number of objects per solid angle. If $w(\theta) = 0$, then the galaxies are unclustered and randomly distributed at this separation. We consider various estimators for $w(\theta)$ in Section 2.3.

2.2 Power law approximation

Over small angular separations, the two-point correlation function can be approximated by a power law:

$$w(\theta) = A_w\,\theta^{1-\gamma}, \qquad (2)$$

where $A_w$ is the amplitude. The amplitude of the correlation function of a galaxy population is reduced as we go to higher redshifts, because equal angular separations trace larger spatial separations for more distant objects. By contrast, the slope, $\gamma$, of the correlation function is observed to vary little from sample to sample, with . It is mostly sensitive to galaxy colours (see Section 5).
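As an illustration of the power law approximation, the sketch below fits the conventional form $w(\theta) = A_w\,\theta^{1-\gamma}$ by unweighted least squares in log-log space. The function name `fit_power_law` and the mock amplitude and slope are hypothetical, and the fit ignores the covariance between angular bins, so this is a sketch of the idea and not the fitting procedure used in this work.

```python
import numpy as np

def fit_power_law(theta, w):
    """Fit w(theta) = A_w * theta**(1 - gamma) by a straight line in log-log space.

    Unweighted fit: bin-to-bin covariance is ignored (illustrative assumption).
    """
    slope, intercept = np.polyfit(np.log10(theta), np.log10(w), 1)
    gamma = 1.0 - slope          # log w = log A_w + (1 - gamma) log theta
    amplitude = 10.0 ** intercept
    return amplitude, gamma

# Mock, noise-free measurement with hypothetical parameters.
theta = np.logspace(-2, 0, 10)            # angular separation in degrees
w_mock = 0.02 * theta ** (1.0 - 1.8)      # A_w = 0.02, gamma = 1.8
A_fit, gamma_fit = fit_power_law(theta, w_mock)
```

With real measurements the fit would normally be weighted by the inverse covariance matrix of the angular bins.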

2.3 Estimator

In practice, the calculation of $w(\theta)$ is done through the normalised counts of galaxy-galaxy pairs from the data, $DD(\theta)$, random-random pairs from an unclustered random catalogue which follows the survey angular selection function, $RR(\theta)$, and galaxy-random pairs, $DR(\theta)$. Various expressions have been used to calculate $w(\theta)$. In this work we adopt the estimator introduced by Landy & Szalay (1993), which is widely used in the literature:

$$w(\theta) = \frac{DD(\theta) - 2\,DR(\theta) + RR(\theta)}{RR(\theta)}. \qquad (3)$$

Landy & Szalay (1993) showed that this estimator has a small variance, close to Poisson, and allows one to measure correlation functions with minimal uncertainty and bias. The counts $DD(\theta)$, $DR(\theta)$ and $RR(\theta)$ have to be normalised to allow for different total numbers of galaxies, $n_g$, and random points, $n_r$:

$$DD(\theta) = \frac{N_{gg}(\theta)}{n_g(n_g-1)/2}, \qquad DR(\theta) = \frac{N_{gr}(\theta)}{n_g n_r}, \qquad RR(\theta) = \frac{N_{rr}(\theta)}{n_r(n_r-1)/2},$$

where $N_{gg}$, $N_{gr}$ and $N_{rr}$ are the raw pair counts at separation $\theta$.

We use approximately ten times as many random points as galaxies in order that the results do not depend on a particular realization of the random distribution. We also tried an alternative estimator proposed by Hamilton (1993), which revealed no significant changes in the correlation function measurements.
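The Landy & Szalay estimator, $w = (DD - 2DR + RR)/RR$ with normalised pair counts, can be sketched with brute-force pair counting. This is a minimal illustration under stated assumptions: positions are treated as flat-sky Cartesian coordinates in a unit square instead of angles on the sphere, the function names are hypothetical, and the all-pairs distance matrix is only practical for small catalogues (the analysis in this paper relies on a pixelisation scheme for speed).

```python
import numpy as np

def pair_counts(a, b, bins, auto):
    """Histogram of pair separations between point sets a and b.

    For auto=True (a is b) each unordered pair is counted once.
    """
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    d = d[np.triu_indices(len(a), k=1)] if auto else d.ravel()
    counts, _ = np.histogram(d, bins=bins)
    return counts.astype(float)

def landy_szalay(data, randoms, bins):
    """Landy & Szalay (1993) estimator from normalised DD, DR and RR counts."""
    n_d, n_r = len(data), len(randoms)
    dd = pair_counts(data, data, bins, auto=True) / (n_d * (n_d - 1) / 2)
    rr = pair_counts(randoms, randoms, bins, auto=True) / (n_r * (n_r - 1) / 2)
    dr = pair_counts(data, randoms, bins, auto=False) / (n_d * n_r)
    return (dd - 2.0 * dr + rr) / rr

# Unclustered mock "galaxies" and a five times larger random catalogue.
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=(200, 2))
randoms = rng.uniform(0.0, 1.0, size=(1000, 2))
bins = np.linspace(0.05, 0.5, 6)
w = landy_szalay(data, randoms, bins)
```

For this unclustered mock the estimate should scatter around zero in every bin; a clustered input would give positive values at small separations.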

Estimates of the angular correlation function are affected by an integral constraint of the form

$$\int\!\!\int w(\theta_{12})\,d\Omega_1\,d\Omega_2 = 0, \qquad (4)$$

where the integral is over all pairs of elements of solid angle $d\Omega_1$, $d\Omega_2$ within the survey area. The constraint requires that $w(\theta)$ goes negative at large separations, to balance the positive clustering signal at smaller separations. However, for wide-field surveys like SDSS the integral constraint has a negligible effect on $w(\theta)$, even on large scales. We find that the additive correction for the integral constraint is at least two orders of magnitude smaller than the value of $w(\theta)$ at degrees. Thus the integral constraint does not bias our clustering measurements.

2.4 Spatial correlation function

We are interested in the spatial clustering and the physical separations at which galaxies are clustered, in order to compare data against theory. To this end, we need to calculate the spatial correlation function from our angular correlation function, which is simply its projection on the sky. The spatial correlation function, $\xi(r)$, can also be expressed as a power law

$$\xi(r) = \left(\frac{r}{r_0}\right)^{-\gamma}, \qquad (5)$$

where $r_0$ is the correlation length. It corresponds to the proper separation at which the probability of finding two galaxies is twice that of a random distribution, $\xi(r_0) = 1$. Limber (1953) demonstrated that the power law approximation for $\xi(r)$ in equation 5 leads to the power law for $w(\theta)$ defined in equation 2, with the index $\gamma$ being the same in both cases. Phillipps et al. (1978) expressed the amplitude of the correlation function, $A_w$, as a function of the proper correlation length, $r_0$, and of the selection function of the survey, whereas later studies propose similar equations where the selection function is implicitly included in the redshift distribution.

Now, writing the angular correlation function as , Limber’s equation becomes (Peebles 1980, § 52, 56):


where is the redshift distribution1, which is zero everywhere outside the limits and and

with the gamma function. The quantity is defined as

where is related to the curvature factor in the Robertson-Walker metric by:

We assume zero curvature, and so .

When using equation 6, we need to determine the redshift distribution of the sample with precision. We address this issue in Section 4.3. Another subtle complication which arises from the use of equation 6 is that galaxy clustering is assumed to be independent of galaxy properties such as colour and luminosity (Peebles 1980, § 51). Therefore it is particularly important to use samples with fixed colour and luminosity, instead of mixed populations for studying galaxy clustering using Limber’s approximation. We address this issue in Section 4.2 where we define the colour and luminosity bins for the clustering analysis.
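To make the role of the redshift distribution concrete, the sketch below evaluates a commonly used form of Limber's equation for a power law $\xi(r) = (r/r_0)^{-\gamma}$ and inverts it to recover $r_0$ from an angular amplitude. It is written under explicit assumptions that are not taken from this paper: a flat cosmology with an illustrative $\Omega_m = 0.3$, comoving separations with clustering fixed in comoving coordinates, and $\theta$ in radians. In that case $A_w = C_\gamma\, r_0^{\gamma} \int p(z)^2\, \chi(z)^{1-\gamma}\, [H(z)/c]\, dz$, with $p(z)$ the normalised redshift distribution, $\chi$ the comoving distance and $C_\gamma = \sqrt{\pi}\,\Gamma[(\gamma-1)/2]/\Gamma(\gamma/2)$. All function names are hypothetical, and the expression may differ in detail from equation 6.

```python
import math
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s
H0 = 100.0           # km/s/Mpc, so distances come out in Mpc/h
OMEGA_M = 0.3        # assumed matter density, for illustration only

def _trapz(y, x):
    """Trapezoidal integral of samples y on grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def hubble(z):
    """H(z) for a flat universe with matter and a cosmological constant."""
    return H0 * np.sqrt(OMEGA_M * (1.0 + z) ** 3 + 1.0 - OMEGA_M)

def comoving_distance(z):
    """Comoving distance to each redshift in z, in Mpc/h."""
    out = np.empty_like(z)
    for i, zi in enumerate(z):
        grid = np.linspace(0.0, zi, 256)
        out[i] = _trapz(C_KM_S / hubble(grid), grid)
    return out

def limber_amplitude(r0, gamma, z, dndz):
    """Amplitude A_w (theta in radians) predicted for a given r0 and dN/dz."""
    p = dndz / _trapz(dndz, z)  # normalise the redshift distribution
    c_gamma = math.sqrt(math.pi) * math.gamma((gamma - 1) / 2) / math.gamma(gamma / 2)
    integrand = p ** 2 * comoving_distance(z) ** (1.0 - gamma) * hubble(z) / C_KM_S
    return c_gamma * r0 ** gamma * _trapz(integrand, z)

def correlation_length(a_w, gamma, z, dndz):
    """Invert the relation above: r0 from a measured amplitude A_w."""
    return (a_w / limber_amplitude(1.0, gamma, z, dndz)) ** (1.0 / gamma)

# Hypothetical Gaussian redshift distribution (grid starts above z = 0).
z = np.linspace(0.05, 0.4, 200)
dndz = np.exp(-0.5 * ((z - 0.2) / 0.05) ** 2)
a_w = limber_amplitude(5.0, 1.8, z, dndz)
r0_back = correlation_length(a_w, 1.8, z, dndz)
```

Because $A_w \propto r_0^{\gamma}$ for a fixed redshift distribution, the inversion is a single power once the integral has been evaluated; the accuracy of $r_0$ is then limited by how well the redshift distribution is known.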

3 Data

To carry out this analysis, we take advantage of the Galaxy and Mass Assembly (GAMA) survey (Driver et al. 2011). This spectroscopic sample, at low to intermediate redshifts, forms an ideal training set for predicting photometric redshifts of faint galaxies. The galaxies considered for the calculation of the correlation functions are drawn from the seventh data release of the Sloan Digital Sky Survey photometric sample (SDSS DR7; Abazajian et al. 2009). We briefly outline the properties of these samples below.

3.1 SDSS DR7 photometric sample

At the time of writing, the Sloan Digital Sky Survey (SDSS) is the largest local galaxy survey ever undertaken. The completed SDSS maps almost one quarter of the sky, with optical photometry in $u$, $g$, $r$, $i$ and $z$ bands and spectra for galaxies. The main goal of the survey is to provide data for large-scale structure studies of the local universe. A series of papers describe the survey: technical information about the data products and the pipeline can be found in York et al. (2000) and in Stoughton et al. (2002). Details about the photometric system can be found in Fukugita et al. (1996).

The SDSS imaging survey is completed with the seventh data release (Abazajian et al. 2009), that we use in this paper. The main program of SDSS is concentrated in the Northern Galactic cap with three stripes in the Southern Galactic cap. SDSS DR7 contains about galaxies with over 7,646 of sky.

The images are obtained with a 2.5-meter telescope, located at Apache Point Observatory, New Mexico. Various flux measures are available for galaxies in the SDSS database (Stoughton et al. 2002), including Petrosian fluxes, model fluxes (corresponding to whichever of a de Vaucouleurs or exponential profile provides a better fit to the observed galaxy profile), and aperture fluxes. In this paper we use model magnitudes to calculate galaxy colours and Petrosian magnitudes to split galaxies in absolute magnitude ranges. After Schlegel et al. (1998), we correct the magnitudes with dust attenuation corrections provided for each object and each filter in the SDSS database.

The star-galaxy classification adopted by the SDSS photometric pipeline is based on the difference between an object’s PSF magnitude (calculated assuming a point spread function profile, as for a stellar source) and its model magnitude. An object is then classified as a galaxy if it satisfies the criterion (Stoughton et al. 2002)


where and magnitudes are obtained from the sum of the fluxes over photometric bands. This cut works at the 95 per cent confidence level for galaxies with . In Section 3.2 we discuss a different star-galaxy classification, following the GAMA survey, which is the one we adopt for this work (see also Appendix A).

A photometric redshift study can be vulnerable to contamination not only from stars misclassified as galaxies, but also from over-deblended sources (Scranton et al. 2002), usually coming from local spiral galaxies. This imposes limits on the angular scale over which we can probe the correlation function. In order to test for this systematic in our sample, in Appendix B.4 we visually inspect random samples of the data and then we model the contamination as a function of angular separation.

3.2 GAMA sample

The Galaxy and Mass Assembly (GAMA) project2 is a combination of several ground and space-based surveys with the aim of improving our understanding of galaxy formation and evolution (Driver et al. 2011). GAMA uses the AAOmega spectrograph of the Anglo-Australian Telescope (AAT) for spectroscopy (Saunders et al. 2004; Sharp et al. 2006). Its targets are selected from the SDSS photometric sample. Target selection is described in detail by Baldry et al. (2010). The main restriction is that the source is detected as an extended object: . As shown in Appendix A, this criterion is also adopted for our sample extraction from SDSS. This criterion is more restrictive, in the sense that fewer stars will be mis-classified as galaxies, than the star-galaxy classification adopted by the SDSS photometric pipeline (previous Section), but similar to that used for the SDSS main galaxy spectroscopic sample (Strauss et al. 2002).

The GAMA survey is almost 99 per cent spectroscopically complete over its 144 deg$^2$ area to mag (Driver et al. 2011). GAMA phase 1 (comprising 3 years of observations) includes 95,592 reliable spectroscopic galaxy redshifts to this magnitude limit, extending to redshift . Of these redshifts, 76,360 have been newly acquired by the GAMA team. The rest come from previous surveys: SDSS (Abazajian et al. 2009), 2dFGRS (Colless et al. 2001; Cole et al. 2005), 6dFGS (Jones et al. 2004), MGC (Driver et al. 2005) and 2SLAQ (Cannon et al. 2006). The overall GAMA redshift distribution is shown in Fig. 13 of Driver et al. (2011).

For a consistent training of ANNz it is necessary to match all the GAMA objects with SDSS DR7 übercal photometry (Padmanabhan et al. 2008) and perform identical colour cuts. Once we apply the colour cuts (Section 3.3) necessary for the optimization of ANNz performance, and low and high redshifts cuts (), 93,584 redshifts remain. They are used to train our photometric redshift neural net algorithm as described in Section 4.

3.3 Colour cuts

Before we build our final sample from ANNz, we remove galaxies with outlier , , , colours both in the SDSS imaging sample and in the training set, because photometric redshift estimates are based primarily on these colours. The complete colour and magnitude cuts are given in Table 1. Less than per cent of the galaxies are affected by the colour cuts. These colour cuts in principle could affect the mask that we use for correlation function calculations. To estimate the extent of this effect we study the distribution on the sky of the colour outliers as well as their angular correlation function. This exercise reveals that colour outliers have a spurious correlation an order of magnitude larger on all angular scales than the correlation function of our final sample. However, since the number of these objects is almost three orders of magnitude less than the total, they would have a negligible effect on measurements if included.

Table 1: Colour and apparent magnitude cuts for the optimization of ANNz. All magnitudes are SDSS model magnitudes.

3.4 Final sample

Our aim is to obtain a galaxy sample with photometric properties as close as possible to our training set. To this end, we have selected galaxies from the SDSS DR7 photometric sample with the query used to select GAMA targets (Appendix A). We select galaxies which have “clean” photometry according to the instructions given on the SDSS website3. Our sample is hence limited by and satisfies the criterion for star-galaxy separation . In our analysis, we choose to calculate the correlation function for galaxies located in the SDSS northern cap, corresponding to 92 per cent of SDSS DR7 galaxies. As such, the geometry of the survey is simplified to a contiguous area. Our final sample, after the colour cuts given in Table 1 comprises 4,890,965 galaxies.

To evaluate the number of data-random and random-random pairs in equation 3, we need to build a mask for our sample. The mask precisely defines the sky coverage of the sample. We use the file lss_combmask.dr72.ply in the NYU Value Added Catalogue4 (Blanton et al. 2005b), mapping SDSS stripes, as our mask. This file contains the coordinates of the fields observed by SDSS expressed in spherical polygons, excluding areas around bright stars because galaxies in these regions can be affected by photometric errors. It is also suitably formatted for use with the mangle software (Hamilton 1993; Hamilton & Tegmark 2004; Swanson et al. 2008b), a tool for manipulating survey masks and obtaining random points with the exact geometry of the mask. Once masking is applied, 4,511,011 galaxies remain in our sample.

The upper panel of Fig. 1 shows the boundaries of the final mask for SDSS DR7 that we use for creating random catalogues. Our random catalogues consist of objects, approximately ten times larger than the number of galaxies in each luminosity and colour bin. Consistency checks have shown that our clustering results are not sensitive to any particular realization of the random catalogue. In Appendix B.1 we check the accuracy of the survey mask, as well as the photometric uniformity of the sample, by studying the angular clustering of our sample as a function of -band apparent magnitude.

3.5 Pixelisation scheme and jackknife resampling

Figure 1: The upper panel shows the jackknife regions used for the error estimation of our correlation function measurements. After modifying the SDSSPix scheme, there are 80 jackknife regions which contain approximately equal numbers of random points. The lower panel reports the normalized area of each pixel, based on a random catalogue. The deviations from uniformity show that differences in the areas of the JK regions are limited to per cent at most.

In order to speed up the computation of the correlation function, we pixelise our data according to the SDSSPix5 scheme. The basic concept consists of assigning galaxies located in a portion of the sky to a pixel. After this step, we only need to take into account galaxies in the same pixel and in the neighbouring pixels to calculate the correlation function up to the scale of a pixel. SDSSPix divides the sky along SDSS and spherical coordinates (as defined in Section 3.2.2 of Stoughton et al. 2002) in equal spherical areas. Different resolutions are available according to the angular scale of interest. We choose the resolution called basic resolution (resolution ). This divides the sky in 468 pixels of size deg. Then, for galaxies in a given pixel, that pixel and its 8 direct neighbouring pixels include all neighbouring galaxies with separations up to 9.4 degrees, the largest angular separation we consider (see Section 5).

We also use this pixelisation scheme to define the Jackknife (JK) regions for the error analysis. In order to minimize the variation in the number of objects in each JK region, some neighbouring pixels that contain the survey boundary are merged so that they contain a more nearly equal number of random points. This modification of the SDSSPix pixelisation yields 80 JK regions, as shown in the upper panel of Fig. 1. The lower panel of Fig. 1 presents the relative variation in area of each region, as measured by the relative number of randoms each one contains. Hereafter, errors on $w(\theta)$ are determined from 80 JK resamplings, by calculating $w(\theta)$ omitting each region in turn. We have checked that our results are not significantly affected by using either 104 or 40 Jackknife regions. The elements of the covariance matrix, $C_{ij}$, are given by:

$$C_{ij} = \frac{N-1}{N}\sum_{k=1}^{N}\left[w_k(\theta_i) - \bar{w}(\theta_i)\right]\left[w_k(\theta_j) - \bar{w}(\theta_j)\right], \qquad (8)$$

where $w_k(\theta_i)$ is the angular correlation function of the $k$th JK resampling on scale $\theta_i$, $\bar{w}(\theta_i)$ the mean angular correlation function and $N$ the total number of JK resamplings. In practice, $\bar{w}(\theta)$ is identical with the angular correlation function measurement from the whole survey area. The factor $N-1$ in the numerator of equation 8 accounts for correlations inherent in the jackknife procedure (Miller 1974).
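The jackknife covariance can be sketched in a few lines, assuming the standard delete-one form $C_{ij} = \frac{N-1}{N}\sum_k [w_k(\theta_i) - \bar{w}(\theta_i)][w_k(\theta_j) - \bar{w}(\theta_j)]$. The function name and the toy input are hypothetical; in practice each row would hold $w(\theta)$ measured with one JK region omitted.

```python
import numpy as np

def jackknife_covariance(w_jk):
    """Delete-one jackknife covariance matrix.

    w_jk has shape (N, n_bins): row k is w(theta) measured with region k omitted.
    """
    n = w_jk.shape[0]
    diff = w_jk - w_jk.mean(axis=0)      # deviation of each resampling from the mean
    return (n - 1) / n * diff.T @ diff   # the (N - 1) factor corrects for overlap

# Toy example: three resamplings of a two-bin correlation function.
w_jk = np.array([[1.0, 2.0],
                 [3.0, 6.0],
                 [5.0, 10.0]])
cov = jackknife_covariance(w_jk)
errors = np.sqrt(np.diag(cov))           # 1-sigma errors on each angular bin
```

The square roots of the diagonal elements give the error bars on each angular bin, while the off-diagonal elements carry the bin-to-bin correlations needed for power law fits.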

Jackknife is a method of calculating uncertainties on a quantity that we measure from the data itself. In wide-field galaxy surveys, more often than not, large superstructures appear to significantly influence clustering measurements. The best known example is the SDSS Great Wall (Gott et al. 2005). The presence of such structures makes it tempting to present the results with and without the JK region that encloses them, as done in the clustering studies of Zehavi et al. (2005, 2011). Better still, Norberg et al. (2011) devise a more objective method to consistently remove outlier JK regions from the distribution of all JK measurements that one has at hand. We follow that method in the present analysis, and find that for all samples considered, the number of JK regions that are outliers, and therefore removed, is mostly two or three and no more than five.

4 Photometric redshifts

Figure 2: Density/scatter plot of redshift error (spectroscopic minus photometric redshift) against predicted photo-$z$ from this work (top panel) and SDSS (middle and bottom panels). The colour coding is such that the densest area (black contour) is 5 times denser than the white contour. Points are drawn whenever the density of points is less than 10 per cent of the maximum (black contour). The red squares and error bars represent the mean redshift errors and their standard deviations in photo-$z$ bins of width . Horizontal red lines show the zero error benchmark. The improvement in photometric redshift estimates in this work, due primarily to use of the representative GAMA training set, is clear.

For the clustering measurements presented in this paper, all distance information comes from photometric redshifts (photo-$z$). Photo-$z$s are the basis for estimating the redshift distributions to be used in equation 6 and in estimating distance moduli to calculate absolute magnitudes and colours. For this study we have a truly representative subset of SDSS galaxies down to and we therefore use the artificial neural network package ANNz developed by Collister & Lahav (2004) to obtain photo-$z$ estimates.

It is important that the training set and the final galaxy sample from SDSS are built using the same selection criteria. The input parameters are the following: übercalibrated, extinction-corrected model magnitudes in $ugriz$ bands, the radii enclosing 50 per cent and 90 per cent of the Petrosian $r$-band flux of the galaxy, and their respective uncertainties. The architecture of the network is 7:11:11:1, with seven input parameters described above, two hidden layers with 11 nodes each and a single output, the photo-$z$. We use a committee of 5 networks to predict the photo-$z$s and their uncertainties (see Section 4.1).

4.1 Photometric redshift errors

Before we proceed with the photo-$z$ derived quantities that we use in this study, we investigate the possible biases and errors that ANNz introduces, using the known redshifts from GAMA. Following standard practice we split our data into three distinct sets: the training set, the validation set and the test set. Half of the objects constitute the test set and the other two quarters the training and validation sets. This investigation is insensitive to the exact numbers in these three sets. The training and validation sets are used for training the network, whereas the test set is treated as unknown. Given predicted photo-$z$s, $z_{\rm phot}$, we can quantify the redshift error for each galaxy in the test set as

$$\delta z \equiv z_{\rm spec} - z_{\rm phot}, \qquad (9)$$

the primary quantity of interest as far as true redshift errors are concerned. It can depend on apparent magnitude, colour, the output $z_{\rm phot}$, the intrinsic scatter of ANNz committees, as well as the position of an object on the sky if the survey suffers from any photometric non-uniformity. We investigate some of these potential sources of error below. The dispersion, $\sigma_z$, of $\delta z$ is given by the equation

$$\sigma_z^2 = \langle (\delta z)^2 \rangle - \langle \delta z \rangle^2, \qquad (10)$$

and is found to be . The standard deviation for the redshift range , within which we choose to work, is .
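The division into training, validation and test sets described above (half of the objects for testing, a quarter each for training and validation) can be sketched as a random partition of object indices. The function name and the random seed are hypothetical; the sample size of 93,584 is the number of GAMA redshifts quoted in Section 3.2, used here only as an example.

```python
import numpy as np

def split_sample(n_objects, seed=0):
    """Randomly partition indices into training, validation and test sets.

    Half of the objects go to the test set; the rest is shared equally
    between the training and validation sets.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_objects)
    n_test = n_objects // 2
    n_train = (n_objects - n_test) // 2
    test = order[:n_test]
    train = order[n_test:n_test + n_train]
    valid = order[n_test + n_train:]
    return train, valid, test

train, valid, test = split_sample(93584)
```

Only the training and validation indices would be passed to the network; the test indices are held back to measure the redshift errors.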

In Fig. 2 we compare our photo-$z$ estimates with the publicly available photo-$z$ from the SDSS website (Oyaizu et al. 2008, tables photoz1 and photoz2). For this comparison we plot the redshift error $\delta z$ as a function of photo-$z$. We then calculate the mean and the standard deviation of $\delta z$ for photo-$z$ bins of width . The number of catastrophic outliers (galaxies with ) for the GAMA calibrated photo-$z$ is 1 per cent or less for all photo-$z$ bins. We work in fixed photo-$z$ bins, because all our derived quantities are based on the photo-$z$ estimates. This way, any biases with estimated photo-$z$ are readily apparent. Our results based on the GAMA training set outperform the SDSS results: for the redshift range , we obtain essentially unbiased redshift estimates, given the observed scatter. The scatter, in turn, increases with redshift. We note, however, that the photoz2 catalogue from SDSS DR7 has been improved with the addition of estimates which are designed to perform much better in recovering the total redshift probability distribution function of all galaxies (Cunha et al. 2009). Since it is still not clear how to directly relate a redshift pdf to absolute magnitude and colour for a given galaxy, our approach for the study of luminosity- and colour-dependent clustering is easier to interpret.

In Appendix B.2, we quantify the photo-$z$ error and possible contamination between redshift bins by cross-correlating photo-$z$ bins which are more than apart. We find, as expected, that the residual cross-correlation of the different photo-$z$ bins is negligible compared to their auto-correlation.

The distribution of photo-$z$ errors is in general non-Gaussian, although the non-Gaussianity is less pronounced in the case of a complete training set. Photo-$z$ errors also propagate asymmetrically in absolute magnitude: for a given redshift error, the error induced in absolute magnitude is larger at low-$z$ and smaller at high-$z$, and thus a photo-$z$ analysis is more tolerant to redshift errors for objects at high-$z$. For that reason, it is common practice to scale the redshift error by the quantity . Taking into account this redshift stretch, can be defined as


giving .
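The error statistics discussed in this section can be sketched as follows. The code computes the redshift error as spectroscopic minus photometric redshift, its dispersion as a standard deviation, and the dispersion of a stretched error. The stretch factor $1 + z_{\rm spec}$ used here is an assumption based on common practice in the photo-$z$ literature and is not taken from the text above; the function name and the toy redshifts are hypothetical.

```python
import numpy as np

def photoz_error_stats(z_spec, z_phot):
    """Redshift error, its dispersion, and the dispersion of the scaled error."""
    dz = z_spec - z_phot                                   # spectroscopic minus photometric
    sigma = np.sqrt(np.mean(dz ** 2) - np.mean(dz) ** 2)   # standard deviation of dz
    dz_scaled = dz / (1.0 + z_spec)                        # assumed redshift stretch
    sigma_scaled = np.sqrt(np.mean(dz_scaled ** 2) - np.mean(dz_scaled) ** 2)
    return dz, sigma, sigma_scaled

# Toy test set with hypothetical redshifts.
z_spec = np.array([0.10, 0.20, 0.30])
z_phot = np.array([0.12, 0.18, 0.30])
dz, sigma, sigma_scaled = photoz_error_stats(z_spec, z_phot)
```

Applied to a real test set, the same statistics can be evaluated in bins of photo-$z$ to expose any trend of the bias or scatter with the estimated redshift.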

We exclude from our analysis galaxies with or . ANNz provides a photo-$z$ error calculated from the photometric errors. Using our test set, we find that this error underestimates the true photo-$z$ error (given by equation 9). We therefore apply a cut on the output parameter of ANNz at . These cuts eliminate per cent of the galaxies. Cross-checks show that the correlation function measurements do not change if we use a less strict cut, but the chosen cut does improve the estimates. The final number of galaxies after this cut is 4,289,223. We summarize the changes in the number of galaxies in our sample in Table 2. We use Petrosian magnitudes to divide galaxies by luminosity and model magnitudes to calculate galaxy colours.

Table 2: The change in the total number of galaxies as a result of the cuts applied in various stages of the analysis.

The photo-$z$ work presented here is similar, but not identical, to that of Parkinson (2012). The latter is appropriate for even fainter SDSS magnitudes as it uses, in its training and validation, all GAMA galaxies with and fainter zCOSMOS galaxies (Lilly et al. 2007) matched to SDSS DR7 imaging. Minor differences in the two photo-$z$ pipelines, such as the inclusion of different light profile measurements, do not significantly affect the estimated photo-$z$s, which present a similar scatter around the underlying spectroscopic distribution. Our photo-$z$s agree with those of Parkinson (2012) within the estimated errors.

4.2 Division by redshift, absolute magnitude and colour

Figure 3: $r$-band absolute magnitude against photo-$z$ for our photometric sample. Solid red lines show the boundaries of our samples in photo-$z$ and absolute magnitude and dashed lines the further split in absolute magnitude bins. Only 1 per cent of the galaxies are shown.
Figure 4: $r$-band absolute magnitude against colour (both $k$-corrected and passively evolved to ) for galaxies split in photo-$z$ bins. Solid red lines show the colour cut for red and blue populations suggested by Loveday et al. (2012) and used in this work, while dashed red lines show the colour cut used by Zehavi et al. (2011).
Figure 5: Redshift error against photo-$z$ for our luminosity and colour-selected GAMA subsamples. The mean redshift error and standard deviation in bins of photo-$z$ are shown by the coloured squares and error bars, while the root mean square standard deviation, , is listed in each panel. The faint red sample has been omitted due to the small number of galaxies that it contains.

Galaxy magnitudes are $k$-corrected to , using kcorrect version 4.1.4 (Blanton & Roweis 2007) and the passive evolution parameter of Blanton et al. (2003). In this simple model, the evolution-corrected absolute magnitude is given by , where is the reference redshift. We note that Loveday et al. (2012), using GAMA, found , which would change evolution-corrected magnitudes by mag at . Approximately equal deviations in absolute magnitude would be induced in our high-$z$ blue galaxy samples if we used a colour-dependent $Q$ (e.g. Loveday et al. 2012). Assuming a global value for $Q$, however, allows for a more direct comparison with the SDSS-based clustering studies of Zehavi et al. (2005, 2011). Galaxy colours, derived from SDSS model magnitudes, are referred to as , while absolute magnitudes are derived using the $r$-band Petrosian magnitude (to match the GAMA redshift survey selection). Fig. 3 shows that the $r$-band absolute magnitude extends to mag with a few galaxies reaching as faint as mag.
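A linear passive evolution correction of this kind can be sketched in a few lines of Python. The sign convention below is an assumption: we take a positive evolution parameter to mean that galaxies were brighter at higher redshift, so that correcting to a lower reference redshift makes them fainter. The function name and the numerical values of `q` and `z0` are placeholders chosen for illustration and are not the values adopted in this work.

```python
def evolution_corrected_magnitude(abs_mag, z, q, z0):
    """Passively evolve a k-corrected absolute magnitude to redshift z0.

    Assumed convention: for q > 0 galaxies are brighter (more negative
    magnitude) at higher z, so correcting to z0 < z makes them fainter.
    """
    return abs_mag + q * (z - z0)

# Illustrative values only; q and z0 are not taken from the text.
m_corr = evolution_corrected_magnitude(abs_mag=-20.0, z=0.3, q=1.5, z0=0.1)
```

In this linear model a change $\Delta Q$ in the evolution parameter shifts the corrected magnitude by $\Delta Q\,(z - z_0)$, which is why the choice of $Q$ matters most for the highest-redshift samples.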

We split our galaxy sample into photo-$z$ as well as luminosity bins. Our samples are shown in Fig. 3. Initially we define four photo-$z$ bins in the redshift range and then further split each photo-$z$-defined sample into six absolute magnitude bins in the range . Thus our photo-$z$ catalogue offers the opportunity for a clustering analysis over the luminosity range , spanning almost three orders of magnitude in .

In Fig. 3, some of these redshift-magnitude bins extend beyond the survey flux limit and are therefore only partially occupied by galaxies in terms of photometric redshifts and photo-$z$-derived absolute magnitudes. The true redshift and absolute magnitude distributions for each bin are recovered by Monte-Carlo resampling, as discussed in Section 4.3.

Fig. 4 shows colour-magnitude diagrams for our sample split in photo-$z$ bins. The colour bimodality is evident at for all photo-$z$ bins. We have adopted the tilted colour cut defined by Loveday et al. (2012),


which is a slightly modified version of the colour cut used by Zehavi et al. (2011), also shown in Fig. 4.
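A tilted colour cut of this form is a straight line in the colour-magnitude plane, and the red/blue split can be sketched as follows. The intercept and slope used in the example are placeholders and should not be read as the coefficients of either cited cut; the function names and toy galaxies are likewise hypothetical.

```python
def is_red(colour, abs_mag, intercept, slope):
    """True if the galaxy lies redward of the tilted cut intercept + slope * M."""
    return colour > intercept + slope * abs_mag

def split_by_colour(galaxies, intercept, slope):
    """Split (colour, absolute magnitude) pairs into red and blue lists."""
    red, blue = [], []
    for colour, abs_mag in galaxies:
        target = red if is_red(colour, abs_mag, intercept, slope) else blue
        target.append((colour, abs_mag))
    return red, blue

# Placeholder coefficients and toy galaxies, for illustration only.
toy = [(0.95, -21.0), (0.45, -21.0), (0.70, -18.0), (0.80, -18.0)]
red, blue = split_by_colour(toy, intercept=0.2, slope=-0.03)
```

With a negative slope the dividing colour becomes redder for more luminous galaxies, so a galaxy of fixed colour can be classified as red at faint magnitudes and blue at bright ones.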

In Fig. 5 we plot the photo-$z$ error against photo-$z$ for galaxies subdivided into subsamples, where we have again used photometric redshifts to estimate galaxy luminosities and colours. There are no obvious systematic biases of for any of the subsamples, although we do note that the most luminous (faintest) bin contains very few blue (red) galaxies.

The relatively good photo-$z$s notwithstanding, our analysis does not completely eliminate the main systematic error of neural-network-derived photo-$z$s, which is the overestimation of low redshifts and the underestimation of high redshifts (see e.g. Fig. 7 of Collister et al. 2007). As a result, a number of faint galaxies have their redshift overestimated and hence appear brighter in our sample. We note that there is a discrepancy in the fraction of faint red objects in the luminosity bin between this work and Zehavi et al. (2011), which is most probably caused by this systematic shift (see Table 3). It is possible to cure this by Monte-Carlo resampling the photo-$z$s with their respective errors and then rederiving the absolute magnitudes and colours, but we do not pursue this here.

4.3 Photometric redshift distribution(s)

Figure 6: Estimates of the underlying redshift distribution for the luminosity samples used in the clustering analysis. Thin solid lines show the photo-$z$ distribution, which is the basis for the selection, dotted lines the true spectroscopic redshift distribution from GAMA and thick solid lines the average distribution inferred from 100 Monte-Carlo resamplings of the photo-$z$ distribution using equation 13.
Figure 7: The $r$-band absolute magnitude distribution for GAMA galaxies with split into photo-$z$ and photo-$z$-derived absolute magnitude slices. Magnitude distributions shown by dashed lines are derived from the raw photo-$z$, by thin lines from the underlying spectroscopic redshifts and by thick lines from the Monte-Carlo derived magnitudes. The latter reproduce the true underlying spec-$z$-inferred magnitude distribution rather well; however, for a few samples there is a discrepancy between the spec-$z$-derived and the Monte-Carlo-derived distributions. All MC absolute magnitude estimates are $k$-corrected and passively evolved following the procedure described in Section 4.2.

Although ANNz gives fairly accurate and unbiased photo-$z$s for calculations in broad absolute magnitude or photo-$z$ bins, in order to translate the two-dimensional clustering signal into the three-dimensional one using equation 6, the underlying true is needed. In this work we loosely follow the approach of Parkinson (2012; see also Driver et al. 2011). The GAMA spectroscopic sample is highly representative and allows us to calculate the true redshift errors as a function of photo-$z$ for all objects in GAMA with . Then, under the assumption of a Gaussian photometric error distribution in each photo-$z$ bin, we perform a Monte-Carlo resampling of the ANNz predictions for the photo-$z$s. This is equivalent to replacing each photo-$z$ derived from ANNz with the quantity drawn from a Gaussian distribution, using a photo-$z$-dependent standard deviation, :


Note that convolving the imprecise photo-$z$ with additional scatter improves the redshift distribution: in other words, the photo-$z$ process effectively deconvolves the and makes it artificially narrow.
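The resampling step can be sketched as follows. Each photo-$z$ is replaced by a draw from a Gaussian centred on it, with a standard deviation that depends on the photo-$z$ itself, and the redshift histograms of many realisations are averaged. The function names, the binning and the form of `sigma_of_z` are assumptions made for illustration; in practice the scatter would be measured from the spectroscopic sample in bins of photo-$z$.

```python
import random
from bisect import bisect_right

def resample_photoz(z_phot, sigma_of_z, rng):
    """One Monte-Carlo realisation: draw each z from N(z_phot, sigma(z_phot))."""
    return [rng.gauss(z, sigma_of_z(z)) for z in z_phot]

def histogram(values, edges):
    """Counts of values in half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts

def mean_resampled_nz(z_phot, sigma_of_z, edges, n_real=100, seed=0):
    """Average redshift histogram over n_real Monte-Carlo resamplings."""
    rng = random.Random(seed)
    total = [0.0] * (len(edges) - 1)
    for _ in range(n_real):
        counts = histogram(resample_photoz(z_phot, sigma_of_z, rng), edges)
        total = [t + c for t, c in zip(total, counts)]
    return [t / n_real for t in total]

# Toy example: a tophat photo-z selection and a hypothetical scatter model.
toy_z_phot = [0.1 + 0.1 * i / 1000 for i in range(1000)]
toy_edges = [0.02 * i for i in range(21)]
toy_nz = mean_resampled_nz(toy_z_phot, lambda z: 0.03 * (1.0 + z), toy_edges)
```

In the toy example the resampled distribution spills beyond the edges of the original tophat selection, which is the broadening described above.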

All our sample selections in Fig. 6 have been made using the photo-$z$-derived absolute magnitude . We then use the accurate spectroscopic information from GAMA to assess how well Monte-Carlo resampling compares to the underlying true . Since the GAMA area is much smaller than the SDSS area, we do not wish to recover the exact spectroscopic redshift distribution, merely to match a smoothed version thereof. Our test shows that MC resampling performs rather well in recovering the true . This method performs even better with a larger number of objects, which indicates that we are still dominated by statistical errors and that there is therefore room for improvement in the future, when larger spectroscopic training sets become available. Nevertheless, as an incorrect redshift distribution can cause a systematic error in , in Appendix B.3 we test the sensitivity of our results to the assumed , and compare results using the Monte-Carlo recovered with those from the weighting method proposed by Cunha et al. (2009).

Fig. 7 shows, for all samples split by photo-$z$ and photo-$z$-derived absolute magnitude, the photo-$z$-derived, the true underlying and the Monte-Carlo inferred absolute magnitude distributions (as dashed, thin and thick solid lines respectively). We note that the photo-$z$-derived absolute magnitude estimates in Fig. 7 are obtained from the resampled redshifts and not by resampling the absolute magnitudes per se. We then $k$-correct every Monte-Carlo absolute magnitude realization using the procedure described in Section 4.2. As expected, the true underlying distribution extends well beyond the photo-$z$-inferred luminosity bins, but is yet again rather well described by the Monte-Carlo inferred distribution.

It is crucial that we have a good understanding of the true underlying absolute magnitude for all our samples. For galaxy clustering studies with spectroscopic redshifts it is desirable to work with volume-limited samples. Using photometric redshifts, however, one can form only approximately volume-limited samples, since photo-$z$ uncertainties will propagate into absolute magnitude estimates. Essentially, any tophat absolute magnitude distribution, as selected using photo-$z$, corresponds to a wider true absolute magnitude distribution, as shown in Fig. 7. This is rather similar to selecting galaxies from a photometric redshift bin and then convolving the initial tophat distribution with the photo-$z$ error distribution in order to obtain the true . However, using the statistic and an accurate for that particular galaxy sample we can extract its respective spatial clustering signal, which would then correspond to the derived absolute magnitude. Direct comparisons with other studies can then be made, modulo the extent of the overlap between the two absolute magnitude distributions.

5 Results for the two-point correlation function

5.1 Luminosity and redshift dependence

Figure 8: Two-point angular correlation functions of our samples split into photo-$z$ bins and six photo-$z$-inferred absolute magnitude bins, as indicated in each panel, with jackknife errors. The solid lines show power law fits estimated using the full covariance matrix for the sample. Dotted lines show the extension of the power law fits on scales and .
Figure 9: Left: Power law slope, , as a function of absolute magnitude and redshift. Right: Real space correlation length, , as a function of absolute magnitude and redshift. Absolute magnitude ranges for which and measurements are valid are given in Table 3.

Table 3: Clustering properties of luminosity-selected samples. Col. 1 lists the photo- based absolute magnitude ranges, col. 2 the median absolute magnitude and the associated 16 and 84 percentiles from the Monte-Carlo resampling (Fig. 7) and col. 3 the number of galaxies in each sample. Cols. 4, 5 and 6 list respectively the slope, , the correlation length, , and the reduced , , of the power law fit as defined in Section 2.4. Cols. 7, 8 and 9 show the same information but for power law fits using only the diagonal elements of the covariance matrix. All power law fits are approximately over the comoving scales  Mpc. Finally col. 10 presents the relative bias at 5  Mpc measured using equation 14.

We first calculate the angular correlation function for our samples selected on absolute magnitude and photometric redshift over angular scales from 0.005 to 9.4 degrees, in 15 equally spaced bins in $\log(\theta)$. In a flux-limited survey like SDSS, intrinsically bright galaxies dominate at high redshifts and intrinsically faint objects dominate at low redshifts (see Fig. 4). For that reason, we calculate the angular correlation function for the 17 well-populated samples given in Table 3. Errors are estimated using the jackknife technique, with the covariance matrix given by equation 8. Although the validity of error estimates based on the data alone is still widely debated, it is commonly accepted that the jackknife method is adequate for angular clustering studies (see e.g. Cabré et al. 2007), while for 3-D clustering measurements, Norberg et al. (2009) have shown that the jackknife method suffers from some limitations, in particular on small scales.
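The binning and the jackknife covariance can be sketched as follows. We assume here that the quoted angular limits are the outer bin edges and that the covariance takes the standard delete-one jackknife form, $C_{ij} = \frac{N-1}{N}\sum_k (w_{k,i}-\bar{w}_i)(w_{k,j}-\bar{w}_j)$; both are assumptions for illustration, as are the function names.

```python
import math

def log_spaced_edges(theta_min, theta_max, n_bins):
    """Edges of n_bins bins equally spaced in log10(theta)."""
    lo, hi = math.log10(theta_min), math.log10(theta_max)
    step = (hi - lo) / n_bins
    return [10.0 ** (lo + i * step) for i in range(n_bins + 1)]

def jackknife_covariance(w_jk):
    """Covariance matrix from delete-one jackknife estimates.

    w_jk[k][i] is the correlation function in angular bin i measured
    with jackknife region k removed (assumed standard estimator).
    """
    n = len(w_jk)
    n_bins = len(w_jk[0])
    mean = [sum(row[i] for row in w_jk) / n for i in range(n_bins)]
    cov = [[0.0] * n_bins for _ in range(n_bins)]
    for row in w_jk:
        for i in range(n_bins):
            for j in range(n_bins):
                cov[i][j] += (row[i] - mean[i]) * (row[j] - mean[j])
    factor = (n - 1) / n
    return [[factor * c for c in line] for line in cov]

# 15 bins in log(theta) between 0.005 and 9.4 degrees.
edges = log_spaced_edges(0.005, 9.4, 15)
```

The factor $(N-1)/N$ accounts for the fact that the delete-one jackknife realisations share most of their data and are therefore strongly correlated with one another.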

Our angular correlation function measurements span a broad range of scales and probe both highly non-linear and quasi-linear regimes. Fig. 8 presents galaxy angular correlation functions for six photo-$z$-selected absolute magnitude bins. We show the angular scale (lower $x$-axis), used for the correlation function estimation, and the corresponding comoving scale estimated at the mean redshift of the sample (upper $x$-axis).

Over the range of angular scales fitted, chosen to correspond to approximately 0.1–20  Mpc comoving separation according to the mean redshift of each sample, the angular correlation function can be reasonably well approximated by a power law, equation 2. We perform power law fits, both with the full covariance matrix and with the diagonal elements only. The power law fits for our sample are shown in Fig. 8. Dotted lines in Fig. 8 show the extension of the power laws beyond the scales over which they were fitted. The resulting correlation lengths, , slopes, , and quality of the fits as given by the reduced , , for all samples are listed in Table 3.
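A power law fit using the full covariance matrix can be sketched as below. We assume the conventional form $w(\theta) = A\,\theta^{1-\gamma}$ and minimise $\chi^2 = \Delta^{T} C^{-1} \Delta$, where $\Delta$ is the vector of residuals; for a fixed slope the best-fitting amplitude is analytic, so a simple grid search over $\gamma$ suffices. The function names and the grid search are illustrative choices, not a description of the fitting code used for this work.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_power_law(theta, w, cov, gammas):
    """Fit w = A * theta**(1 - gamma); returns (A, gamma, chi2) at the minimum."""
    cinv_w = solve(cov, w)
    best = None
    for gamma in gammas:
        t = [th ** (1.0 - gamma) for th in theta]
        # Analytic best-fit amplitude at fixed gamma (cov is symmetric).
        amp = dot(t, cinv_w) / dot(t, solve(cov, t))
        resid = [wi - amp * ti for wi, ti in zip(w, t)]
        chi2 = dot(resid, solve(cov, resid))
        if best is None or chi2 < best[2]:
            best = (amp, gamma, chi2)
    return best
```

The reduced $\chi^2$ follows by dividing the minimum $\chi^2$ by the number of fitted angular bins minus two, the number of free parameters. A fit using only the diagonal elements corresponds to passing a covariance matrix whose off-diagonal entries are set to zero.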

The luminosity dependence of galaxy clustering is present in all photo- shells: the shape and the amplitude of the angular correlation function differ for galaxies with different luminosity. The amplitude of the angular correlation function decreases as we go from bright to faint galaxies for all photo- bins. The slope of the correlation function also decreases with decreasing luminosity, very much in line with the change in the fraction of red and blue galaxies. As observed in Section 5.2, red (blue) galaxies dominate the brightest (faintest) luminosity bins, with red galaxies preferentially having a steeper correlation function slope than blue galaxies.

For each sample, we estimate the correlation length via equation 6 using the Monte-Carlo inferred redshift distribution described in Section 4.3. The redshift distribution is calculated separately for each sample, as shown in Fig. 6. In Appendix B.3 we investigate the effects of the assumed on the recovered correlation length , and show that the adopted recovery method compares favourably with the true underlying , as obtained from the smoothed .

For our luminosity bins in the redshift range , the correlation length is found to decrease as we go to fainter absolute magnitudes, from () to (). This is very much in line with the recent results of Zehavi et al. (2011). Moreover, we do not observe strong evolution with redshift for samples of fixed luminosity. All and measurements are shown in Fig. 9.

There are two main sources of error in the estimates: (a) the correlated uncertainties on the power law parameters and which propagate through equation 6 to ; (b) statistical and systematic uncertainties in the modelling of the underlying redshift distribution. The uncertainties and the induced error on and are obtained using the standard deviation from the distribution of JK resampling estimates (Section 3.5). As in the case of the covariance matrix, these uncertainties are multiplied by a factor of (Norberg et al. 2009). The uncertainties are investigated in great detail in Appendix B.3, where we show that the Monte-Carlo inferred performs best, while still returning a residual systematic uncertainty of on that depends on the sample considered. We find that both sources of uncertainty have a comparable contribution to the errors. In Table 3 we quote the total error on the correlation length after adding the two (independent) errors in quadrature.

5.2 Luminosity, redshift and colour dependence

Figure 10: Two-point angular correlation functions split by absolute magnitude and colour, with red circles (blue squares) showing the red (blue) sample. Colour gradients indicate the transition from bright (darker shade) to faint (lighter shade) luminosities. Lines are as in Fig. 8. The faintest (brightest) sample does not contain enough red (blue) galaxies to robustly estimate .

Table 4: Clustering properties of luminosity-selected red galaxies. Columns are the same as in Table 3.