BOSS LSS targeting and catalogues

SDSS-III Baryon Oscillation Spectroscopic Survey Data Release 12: galaxy target selection and large scale structure catalogues


The Baryon Oscillation Spectroscopic Survey (BOSS), part of the Sloan Digital Sky Survey (SDSS) III project, has provided the largest survey of galaxy redshifts available to date, in terms of both the number of galaxy redshifts measured by a single survey, and the effective cosmological volume covered. Key to analysing the clustering of these data to provide cosmological measurements is understanding the detailed properties of this sample. Potential issues include variations in the target catalogue caused by changes either in the targeting algorithm or properties of the data used, the pattern of spectroscopic observations, the spatial distribution of targets for which redshifts were not obtained, and variations in the target sky density due to observational systematics. We document here the target selection algorithms used to create the galaxy samples that comprise BOSS. We also present the algorithms used to create large scale structure catalogues for the final Data Release (DR12) samples and the associated random catalogues that quantify the survey mask. The algorithms are an evolution of those used by the BOSS team to construct catalogues from earlier data, and have been designed to accurately quantify the galaxy sample. The code used, designated mksample, is released with this paper.

cosmology: observations - (cosmology:) large-scale structure of Universe

1 Introduction

The size of galaxy redshift surveys has grown exponentially over the last decade and will continue to do so into the next, thanks to the continuing development of instrumentation to undertake multi-object spectroscopy (MOS) on dedicated telescopes. The scientific driver for this dramatic increase is that galaxy redshift surveys provide a wealth of cosmological and extra-galactic information. The most easily accessible cosmological information is encoded in 2-point clustering statistics of the over-density field, which contain both the Baryon Acoustic Oscillation (BAO) and Redshift Space Distortion (RSD) signals. The BAO scale is a comoving large-scale enhancement in pairs of galaxies separated by 150 Mpc, which can be used to track cosmological expansion. It arises from the propagation of sound waves in the early Universe (Peebles & Yu, 1970; Sunyaev & Zel’dovich, 1970; Doroshkevich et al., 1978), and is quite insensitive to astrophysical processing that occurs on smaller scales; thus BAO experiments are affected by a low level of systematics (see review by Weinberg et al. 2013 for a comparison of different methods). Redshift-Space Distortions arise from the peculiar velocities of galaxies within a comoving frame, which produce coherent distortions in the measured redshifts compared to those produced by the Hubble expansion (Kaiser, 1987). As these velocities are gravitational in origin, the amplitude depends on the rate of structure growth, and hence RSD allow tests of General Relativity (GR) on large scales.

The BAO signature has now been detected in many different galaxy surveys and analysed using a variety of methods. To show the exponential growth in BAO measurements, Fig. 1 presents the predicted error on the BAO scale expected for different surveys, calculated as if the clustering signal from different directions was optimally combined to provide the best possible single BAO position measurement. We include results from various stages of the 2-degree-Field Galaxy Redshift Survey (2dFGRS; Colless et al. 2001, 2003), Sloan Digital Sky Survey (SDSS; York et al. 2000) and WiggleZ (Drinkwater et al., 2010), and predictions for the continuation of the SDSS project with eBOSS (Dawson et al., 2015). For consistency, all calculations used the code of Seo & Eisenstein (2007), approximating each survey as a single volume, limited in redshift and area, and sampled by a constant density of galaxies, with numbers approximately matching those of the actual surveys. Thus the results themselves are not precise and are designed to simply demonstrate the evolution rather than provide a quantitative comparison between experiments. The best fit line shows the growth in the impact of past surveys, following the development of Multi-Object Spectrographs (MOS) on the Anglo-Australian telescope (Lewis et al., 1998) and the Sloan telescope (Gunn et al., 2006), which continues to the next generation with a new MOS being developed for the Hobby-Eberly telescope (HETDEX; Hill et al. 2008), the Mayall telescope (DESI; Levi et al. 2013), the VISTA telescope (4MOST; de Jong et al. 2014), the William Herschel Telescope (WEAVE; Dalton et al. 2014), the Subaru telescope (PFS; Takada et al. 2014) and the satellite experiments Euclid (Laureijs et al., 2011) and WFIRST (Spergel et al., 2015). For clarity we only plot approximate DESI and Euclid predictions in Fig. 1 to show the general expected trend from these new instruments, as our simplified approach is insufficient to provide a careful differential analysis of these future projects. Also, there is significant uncertainty in the predictions for Euclid, as a consequence of our lack of knowledge about the galaxy population targeted: the prediction here uses the predicted volume and galaxy density of Laureijs et al. (2011). The higher redshift surveys of eBOSS and WiggleZ are inherently more difficult and consequently they lie above the line: they push into new redshift ranges, rather than to larger volumes.

Figure 1: BAO measurement errors predicted for various surveys as a function of the year of publication. In order to calculate these with a consistent methodology we plot “predictions” using the code of Seo & Eisenstein (2007) based on a single number of galaxies, and volume for each survey. The surveys plotted are 2dFGRS early (Percival et al., 2001) and final (Cole et al., 2005); SDSS-II LRGs (Eisenstein et al., 2005); WiggleZ (Blake et al., 2011); BOSS DR9 CMASS (Anderson et al., 2012); BOSS DR11 LOWZ (Tojeiro et al., 2014) and CMASS (Anderson et al., 2014b). In terms of survey volume, BOSS DR12 is very close to DR11 and we do not show it here. We also present approximate predictions for the eBOSS, DESI and Euclid future surveys (see text for details).

In this paper, we present the target selection and catalogue generation of the Data Release 12 (DR12; Alam et al. 2015) samples of galaxies selected from the Baryon Oscillation Spectroscopic Survey (BOSS; Dawson et al. 2012), which is part of SDSS-III (Eisenstein et al., 2011). The spectroscopic sample has two primary catalogues: LOWZ at $z \lesssim 0.4$, and CMASS covering $0.4 \lesssim z \lesssim 0.7$ (see Section 3 for details). An overview of the BOSS observations is provided in Section 2; see Dawson et al. (2012) for a full description of the survey.

The work presented here follows on from the analysis of previous data releases: DR12 is the third public SDSS data release containing BOSS spectroscopic results. The first was DR9 (Ahn et al., 2012), when the survey was approximately one third complete. The creation of the large-scale structure catalogues from these data was outlined in Anderson et al. (2012), alongside the isotropic BAO results, with the anisotropic results following in Anderson et al. (2014a). The development of a method to remove potential systematic errors in the clustering measurements caused by fluctuations in the target catalogue was presented in Ross et al. (2012). The DR9 catalogues were used extensively for science, which also tested the catalogues themselves. The clustering was compared with simulations in Nuza et al. (2013), and with full model fits in Sanchez et al. (2012). RSD were measured by Reid et al. (2012) and enhanced by knowledge of passive galaxy evolution in Tojeiro et al. (2012): the resulting GR tests were presented in Samushia et al. (2013). Primordial non-Gaussianity was constrained by Ross et al. (2013), while Zhao et al. (2013) reported neutrino masses, and Scoccola et al. (2013) examined the time variation of physical constants. This work led to further refinements of the catalogue creation algorithm for the analysis of the second public BOSS data release DR10 (Ahn et al., 2014), which coincided with an internal release (called DR11). In particular, the code was rewritten into a modular version, called mksample, new weights were used to correct for fluctuations in the expected target density, and new masks were used for “bad” areas. These refinements were presented alongside the BAO results for the CMASS sample in Anderson et al. (2014b) and the LOWZ sample in Tojeiro et al. (2014), and were confirmed to be robust to colour (Ross et al., 2014) and against possible systematics in the fit (Vargas-Magana et al., 2014). 
As for DR9, the results were extensively used, further testing the catalogues: RSD measurements have been made in a number of different ways (Beutler et al., 2014a; Samushia et al., 2014; Sanchez et al., 2014; Chuang et al., 2013), the bispectrum calculated and analysed (Gil-Marin et al., 2014a, b), and neutrino mass constraints presented (Beutler et al., 2014b). Saito et al. (2015) account for redshift dependent selection effects and compare clustering and RSD with predictions from abundance matching.

We have now analysed the final BOSS DR12 galaxy sample using an algorithm that builds on the work described above. This paper on the targeting algorithm and catalogue creation method is complemented by a series of papers measuring and analysing clustering, splitting the BOSS galaxies into the LOWZ and CMASS sub-samples defined by the primary targeting algorithms (see Section 3 for details). BAO measurements are presented in configuration-space (Cuesta et al. 2015) and Fourier-space (Gil-Marin et al. 2015a), and RSD measurements made in Fourier-space are presented in Gil-Marin et al. (2015b). Two further support papers are provided in this set: Ross et al. (2015) consider the BOSS selection function in more detail, presenting the observational foot-print, masks for image quality and Galactic extinction, and weights to account for density relationships intrinsic to the imaging and spectroscopic portions of the survey. Vargas-Magana et al. (2015) present systematic tests on the reconstruction algorithm used for anisotropic BAO analyses. A subsequent set of analyses, to be released soon, will consider jointly analysing the full BOSS sample, without splitting by target selection.

Because the key cosmological measurements depend on the density field, galaxy properties (except how they trace this field, commonly quantified by a linear deterministic bias $b$) are unimportant once redshifts have been measured, and cosmological surveys are free to choose which galaxies to observe to optimise survey efficiency and the bias $b$. BOSS targets luminous galaxies for spectroscopic observations as they have a large bias, are relatively easy to target, and have strong spectral features that ease redshift determination. The target selection adopted by BOSS is an extension of the targeting algorithms for the SDSS-II (Eisenstein et al., 2001) and 2SLAQ (Cannon et al., 2006) Luminous Red Galaxies (LRGs), targeting fainter and bluer galaxies in order to achieve the desired number density of $3\times10^{-4}\,h^{3}\,{\rm Mpc}^{-3}$. The majority of the galaxies are old stellar systems whose prominent 4000 Å break makes them relatively easy to target using multi-colour data. The data from which the samples are targeted are described in Section 2, and the LOWZ and CMASS target selection algorithms are discussed in detail in Section 3.

In order to do large-scale structure analyses with the sample of spectroscopically observed galaxies, we have put together catalogues including information on the detailed angular and radial mask of the sample including the redshift completeness, the observing conditions when the imaging and spectroscopic observations were made, and the appropriate weights to give each object, as well as random (i.e., unclustered) catalogues with the same selection function. These collectively make up the large-scale structure catalogues, whose contents are detailed in this paper.

Key to creating these catalogues for the BOSS galaxy surveys is the ability to predict where we could have observed galaxies, as well as where galaxies exist, thus defining the survey or sample mask. This mask is intricately linked with the selection of galaxies: in general, corrections for selection effects can be applied to either the mask or the galaxy sample to produce a match between the two. In order to understand the mask, we need to understand both the target sample and the subsequent spectroscopy and redshift measurement, which we briefly summarise in Section 4. The BOSS galaxy mask is quantified using a “random catalogue”, a Poisson sampling of the volume covered by the selected galaxies, including any variations in density other than the cosmological clustering signal we wish to measure. The “3D mask” does not have to be quantified by a Poisson sampling, but this is a straightforward approach that in effect provides a Monte-Carlo sampling of the volume covered. This weighted random sample and the weighted galaxy sample form the starting point for the key BOSS galaxy clustering analyses. Section 5 presents the method adopted by the BOSS team to prepare catalogues of galaxies and randoms, using routines made publicly available in a code called mksample. This is a further extension of the code used for the early DR9 analyses, which is described in Anderson et al. (2012).
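The idea of quantifying an angular mask by Poisson sampling can be illustrated with a minimal sketch. This is not the mksample implementation: the function names, the rectangular footprint and the toy completeness function are all hypothetical, chosen only to show how unclustered points can be drawn uniformly in solid angle and then thinned by a position-dependent completeness.

```python
import numpy as np

def uniform_sky_points(n, ra_min, ra_max, dec_min, dec_max, rng):
    """Draw n points uniformly in solid angle inside an RA/Dec box (degrees)."""
    ra = rng.uniform(ra_min, ra_max, n)
    # Sampling uniformly in sin(dec) gives a constant density per unit solid angle.
    s_lo, s_hi = np.sin(np.radians([dec_min, dec_max]))
    dec = np.degrees(np.arcsin(rng.uniform(s_lo, s_hi, n)))
    return ra, dec

def thin_by_completeness(ra, dec, completeness, rng):
    """Keep each point with probability completeness(ra, dec), a value in [0, 1]."""
    keep = rng.uniform(size=ra.size) < completeness(ra, dec)
    return ra[keep], dec[keep]

def toy_completeness(ra, dec):
    # Hypothetical mask: fully complete at dec >= 0, 50 per cent complete below.
    return np.where(dec >= 0.0, 1.0, 0.5)

rng = np.random.default_rng(42)
ra, dec = uniform_sky_points(200_000, 0.0, 10.0, -5.0, 5.0, rng)
ra_r, dec_r = thin_by_completeness(ra, dec, toy_completeness, rng)

# Fraction of surviving randoms in the half-complete southern strip.
frac_south = np.mean(dec_r < 0.0)
```

In this toy case the two strips have equal area, so roughly one third of the surviving randoms fall in the half-complete strip. A radial selection would be added separately, for example by assigning each random a redshift drawn from the observed galaxy redshift distribution.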

Although the targeting algorithm adopted for BOSS is isotropic and the catalogue of target objects covers an angular area larger than that of spectroscopic observations, the mask is complicated by various anisotropic effects including variations in imaging depth due to recalibration of the SDSS photometric scale and rereduction of the imaging as the spectroscopic survey progressed, variation of seeing, variation with stellar density caused by occultation by stars, the inability to measure spectra for targets close to another target observed at the same time, and the failure to measure spectra as a function of signal-to-noise ratio in the spectrum. These effects are often corrected by applying a weight to the galaxies (e.g. Ross et al. 2012), but could instead be incorporated into the mask. The quality of the DR12 data is such that we can now observe systematic effects that couple radial and angular fluctuations, and we introduce 3D corrections for these. The manner adopted to deal with these effects for BOSS is described in Section 6.

BOSS includes a number of galaxy catalogues with different selection functions, some of which spatially overlap. The combination of these to optimally quantify the underlying matter overdensity field is non-trivial, and we present the method adopted by the BOSS team in Section 7.

The mksample code will be released upon publication of this paper, and we will also publish the resulting Large-Scale Structure catalogues, with a full datamodel describing each. These will all be linked from the main SDSS web site.

2 Data

2.1 Imaging Data

The Sloan Digital Sky Survey (SDSS-I/II; York et al. 2000) imaged approximately 7,606 deg$^2$ of the Northern Galactic Hemisphere and 600 deg$^2$ of the Southern Galactic Hemisphere in the $ugriz$ bands (Fukugita et al., 1996; Smith et al., 2002; Doi et al., 2010), using a specially designed camera (Gunn et al., 1998) on the 2.5m Sloan telescope (Gunn et al., 2006) at the Apache Point Observatory in New Mexico. The SDSS-III project (Eisenstein et al., 2011) obtained additional imaging to make the region of the Southern Galactic Hemisphere contiguous, covering 3,172 deg$^2$. As part of this effort, the original SDSS-I/II data and the SDSS-III data were reduced with the latest versions of the SDSS image processing and calibration pipelines (Lupton et al., 2001; Pier et al., 2003; Padmanabhan et al., 2008). These data were released as part of Data Release 8 (Aihara et al., 2011), and form the parent imaging catalogue for the BOSS galaxy target selection. There are a number of differences between the processing performed for DR8 (see the DR8 paper Aihara et al. 2011 for a detailed discussion) and earlier reductions; reproducing BOSS galaxy samples derived from the imaging data requires using the appropriate algorithms.

spherical polygon: The base unit of a Mangle mask. Spherical polygons are used to represent the boundaries of the imaging survey from which the targets are drawn, the circular fields defined by spectroscopic tiles, as well as regions to be removed from the survey footprint (e.g. the centerpost of each spectroscopic tile; see Sec. 5.1.1 for a full list).
spectroscopic tile: Output of the tiling algorithm providing a central location on the sky and list of targets to be observed for spectroscopic observations. Each tile has a circular field-of-view of radius 1.49 degrees, and can be observed by multiple plates.
plate: Physical plate with a hole drilled for each target, based on the anticipated airmass of observation. Spectroscopic tiles may be observed using multiple plates.
chunk: Basic unit of sky input to the tiling algorithm. It consists of a set of rectangles in a spherical coordinate system. The SDSS-III BOSS survey is composed of 38 chunks.
sector: The union of spherical polygons defined by a unique intersection of spectroscopic tiles. The survey completeness is treated as uniform within a sector.
Table 1: Basic definitions for the geometric description of SDSS-III BOSS observations and the LSS Mangle masks.

BOSS obtained spectra and redshifts for 1,372,737 galaxies over 9,376 deg$^2$. The targets are assigned to tiles of diameter 3 deg, using a tiling algorithm that is adaptive to the density of targets on the sky (Blanton et al., 2003). Spectra are then obtained using the BOSS spectrographs (Smee et al., 2013). Each observation is performed in a series of 900 sec exposures, integrating until a minimum signal-to-noise ratio is achieved for the faint galaxy targets. Redshifts are then measured using the methods described in Bolton et al. (2012). The spectroscopic observations were split into distinct areas of sky, which we call chunks, targeted separately and sequentially in time, each defined by a subset of the total footprint. Later chunks can overlap earlier chunks and recover unobserved targets. The angular distribution of chunks 2-11 is shown in Fig. 12; these chunks are special in that they reflect early versions of the target selection (see Appendix A), but they also serve to show how the survey is built up from chunks. Basic definitions for the geometrical descriptors used in this paper are provided in Table 1.

The start of spectroscopic observations preceded the finalisation of the DR8 imaging reductions, so the imaging data used by BOSS are based on the photometric measurements available at the time of tiling (see Section 4.2), which may differ from the quantities available for an object in the DR8 catalogue. BOSS targeting was performed using three different versions of the reduction software that resolves the catalogues from overlapping imaging data (RESOLVE; see Aihara et al. 2011). Chunks 1–4 used a version of the RESOLVE software tagged on 14-06-2009, chunks 5–11 used a version tagged on 16-11-2009, and chunks 12 onwards used the same version as that used to produce DR8. In total, 17% of targets were targeted with pre-DR8 RESOLVE versions. Because these different versions of the software selected different imaging data to be designated as “primary” (i.e. either the only or the best observation of this object; see the DR8 documentation for more details), approximately 9% of the imaging data used for targeting CMASS galaxies is now designated as secondary in the DR8 database.

2.2 Parent catalog

The selection of galaxy targets for spectroscopic observation is based on a parent catalogue of photometrically identified objects within the imaging data. The parent catalogue was based on objects chosen from 3172 deg$^2$ in the Southern Galactic Cap (SGC) and 7606 deg$^2$ in the Northern Galactic Cap (NGC), as described in this section. The SDSS imaging pipeline returns a number of different measurements of the photometry of galaxies. Full descriptions may be found on the SDSS website and in Stoughton et al. (2002). For galaxy target selection we use three photometric measurements, which have all been corrected for Galactic extinction using the Schlegel, Finkbeiner & Davis (1998) dust maps.

The colours of galaxies are based on SDSS model magnitudes (denoted by the subscript mod). These are determined by using the best-fit (psf-convolved) deVaucouleurs or exponential profile fit in the $r$ band to determine the fluxes in the other bands (full details are provided in Abazajian et al. 2004). Cuts in apparent magnitude are made with “cmodel” magnitudes (denoted by a subscript cmod). These are a linear combination of the flux from the best fit exponential and deVaucouleurs profile fit in each band separately:

$$f_{\rm cmod} = (1 - P_{\rm deV})\,f_{\rm exp} + P_{\rm deV}\,f_{\rm deV},$$

where $P_{\rm deV}$ is the best-fit coefficient obtained from a fit of the linear combination of the deVaucouleurs and exponential profile fits to the image, which weights the different contributions (reported as FRACDEV by the SDSS pipelines), and $f$ represents the flux (not magnitude) assuming an exponential or deVaucouleurs profile. Star-galaxy separation compares the PSF magnitudes of galaxies (denoted by a subscript PSF) with model or cmodel magnitudes; PSF magnitudes underestimate the flux from extended sources compared with the model fits (see § 3.3.2 for details). Finally, we use “fiber2” (denoted by subscript fib2) magnitudes to estimate the expected flux through the SDSS-III 2″ fibres.
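As a small illustration of the cmodel definition, the sketch below forms the composite flux as a linear combination of the deVaucouleurs and exponential fluxes and converts it to a magnitude. The function names and the `frac_dev` argument are hypothetical, and the conversion assumes fluxes in units for which a zero point of 22.5 gives a conventional Pogson magnitude; the SDSS pipelines may use a different magnitude convention.

```python
import math

def cmodel_flux(flux_dev, flux_exp, frac_dev):
    """Composite flux: weighted sum of the two profile-fit fluxes.

    frac_dev is the weight of the deVaucouleurs component, in [0, 1].
    """
    if not 0.0 <= frac_dev <= 1.0:
        raise ValueError("frac_dev must lie in [0, 1]")
    return frac_dev * flux_dev + (1.0 - frac_dev) * flux_exp

def flux_to_mag(flux, zero_point=22.5):
    """Pogson magnitude for a positive linear flux (zero point is an assumption)."""
    return zero_point - 2.5 * math.log10(flux)

# Example: equal weights for the two profile fits.
example_mag = flux_to_mag(cmodel_flux(100.0, 50.0, 0.5))
```

The combination is done in flux rather than magnitude, which is why the conversion to a magnitude is applied only after the weighted sum.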

The parent sample for the BOSS galaxy target selection is constructed by selecting all detected objects that the photometric pipeline classifies as galaxies, and that are chosen by RESOLVE to be “primary”. The targeting software uses the photometry of the primary objects to select targets for spectroscopic follow-up. The variation in selected targets from different imaging data is consistent with that expected given the photometric uncertainties, and so we treat the regions targeted with pre- and final DR8 imaging as statistically identical. We do not make any cuts on photometricity at this stage; unphotometric data are discarded at the catalogue creation stage (see § 5.1.1). Users constructing their own samples for science analyses are advised to use the CALIB_STATUS flag to cut on photometricity (restricting to photometric observations corresponds to CALIB_STATUS==1). We cull objects with suspect photometry as reported in the flags set by the imaging pipeline. In particular, we require that objects are detected in the $r$ and $i$ bands. In the Image Processing pipeline, this is indicated by having one of the BINNED1, BINNED2 or BINNED4 flags set in both the $r$ and $i$ bands. We also require that the OBJC_FLAG flag, which is a combination of the per-filter flags appropriate for the whole object (the full definition is provided in Stoughton et al. 2002), satisfies the following conditions:

  1. Objects must not be saturated: (NOT SATUR) OR (SATUR AND (NOT SATUR_CENTER)),

  2. Blended objects: (NOT BLENDED) OR (NOT NODEBLEND).
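The flag requirements above can be expressed as bitmask tests. The sketch below is illustrative only: the bit positions in `FLAGS` are placeholders rather than the real SDSS values, the two detection bands are assumed to be r and i, and the function names are hypothetical. The logical conditions are transcribed as written in the list above.

```python
# Placeholder bit positions, for illustration only. The real values must be
# taken from the SDSS flag documentation before use on survey data.
FLAGS = {
    "BINNED1": 1 << 0, "BINNED2": 1 << 1, "BINNED4": 1 << 2,
    "SATUR": 1 << 3, "SATUR_CENTER": 1 << 4,
    "BLENDED": 1 << 5, "NODEBLEND": 1 << 6,
}

def has(flags, name):
    """True if the named bit is set in the integer flag word."""
    return bool(flags & FLAGS[name])

def detected(band_flags):
    """Detection in one band: any of the BINNED1/2/4 bits is set."""
    return any(has(band_flags, n) for n in ("BINNED1", "BINNED2", "BINNED4"))

def passes_objc(objc_flags):
    """Saturation and blending conditions on the whole-object flag word."""
    satur = has(objc_flags, "SATUR")
    not_saturated = (not satur) or (satur and not has(objc_flags, "SATUR_CENTER"))
    blend_ok = (not has(objc_flags, "BLENDED")) or (not has(objc_flags, "NODEBLEND"))
    return not_saturated and blend_ok

def keep_object(r_flags, i_flags, objc_flags):
    """Combine the per-band detection test with the whole-object test."""
    return detected(r_flags) and detected(i_flags) and passes_objc(objc_flags)
```

Keeping the per-band detection test separate from the whole-object test mirrors the distinction in the text between per-filter flags and OBJC_FLAG.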


3 Target selection

We now turn to the specifics of the target selection algorithms used to define the BOSS spectroscopic galaxy samples. We first summarize the criteria that we wish our algorithm to satisfy (§ 3.1), with the aim of defining a uniformly selected sample over a broad redshift range. The galaxy sample is targeted using two different algorithms, which we term “LOWZ” (detailed in § 3.2) and “CMASS” (for “Constant (stellar) Mass”, § 3.3), respectively. Star-galaxy separation is treated differently in the CMASS sample than elsewhere in SDSS, as we describe in § 3.3.2. A variant of the CMASS algorithm was used to explore the colour boundaries of the sample (§ 3.4).

3.1 Requirements and Criteria

The BOSS sample was designed to measure the BAO signature in the two-point galaxy clustering signal, and in particular to meet error requirements on the measurement of the angular diameter distance and Hubble parameter at and . These requirements can be met by a survey covering an area of approximately 10,000 deg$^2$ with a comoving number density of galaxies of $3\times10^{-4}\,h^{3}\,{\rm Mpc}^{-3}$ for . This density is close to optimal for large-scale cosmological studies (e.g., Kaiser 1986). To efficiently undertake such a survey using the Sloan telescope and spectrographs, we need to select a sub-sample of the parent catalogue of photometrically identified objects that fulfil the following criteria simultaneously:

  1. galaxies that lie in the desired redshift range ,

  2. sufficient galaxies to meet the desired density over the full redshift range,

  3. well-defined limits in stellar populations, to isolate a strongly clustered subsample of galaxies,

  4. redshifts that can be measured in a relatively short exposure with our telescope,

  5. few contaminating objects that are not part of the desired sample,

  6. selectable uniformly across the desired area,

  7. selection is not sensitive to systematic errors in the data used.

The challenge of target selection is to provide an algorithm for selecting the subsample of the parent imaging catalogue that optimally meets these goals. Selection based solely on an apparent magnitude cut, as used for the SDSS-I and -II Main Galaxy Sample (Strauss et al., 2002) in general selects too many low redshift and low luminosity galaxies. Rather, in BOSS we follow a similar philosophy to the selection of Luminous Red Galaxies (LRGs) in SDSS-I and -II (Eisenstein et al., 2001) and the 2SLAQ survey (Cannon et al., 2006) using colour-magnitude and colour-colour cuts, selecting luminous galaxies with strong spectral features (item 4 above).

Figure 2: Top panel: Black dots show median $c_{\parallel}$ for LOWZ spectroscopically confirmed galaxies as a function of measured redshift, with the dashed lines showing the interquartile range. The efficiency of using this quantity to track redshift is clear. Bottom panel: Median $d_{\perp}$ as a function of redshift for confirmed CMASS galaxies, with interquartile range (dashed lines). The way in which we can track the high-redshift locus of galaxies using this colour, and select as a function of redshift, is clear.

At redshifts $z \lesssim 0.4$, we can select such a sample by extending to fainter LRGs than observed in SDSS-I and -II. At higher redshifts, we do not restrict ourselves to red galaxies, and instead select an approximately stellar mass-limited sample of objects of all intrinsic colours. As in Eisenstein et al. (2001), two sets of colours are necessary to describe the colour locus: one when the 4000 Å break lies in the SDSS $g$-band, and the other when it redshifts into the $r$-band at $z \approx 0.4$. Selecting these two subsamples requires defining fiducial colours that track the locus of a passively evolving population of galaxies in colour space. Following Eisenstein et al. (2001) and Cannon et al. (2006), we define

$$c_{\parallel} = 0.7\,(g_{\rm mod} - r_{\rm mod}) + 1.2\,(r_{\rm mod} - i_{\rm mod} - 0.18),$$
$$c_{\perp} = (r_{\rm mod} - i_{\rm mod}) - (g_{\rm mod} - r_{\rm mod})/4.0 - 0.18,$$

to describe the low redshift locus and

$$d_{\perp} = (r_{\rm mod} - i_{\rm mod}) - (g_{\rm mod} - r_{\rm mod})/8.0,$$

to describe the high-redshift locus. As discussed above, the colours are defined using SDSS model magnitudes, and are corrected for Milky Way extinction. The efficiency of these selections to select luminous galaxies as a function of redshift is demonstrated in Fig. 2, which shows how $c_{\parallel}$ and $d_{\perp}$ vary with redshift for observed BOSS galaxies.
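The fiducial colours are simple linear combinations of the g-r and r-i model colours. The sketch below evaluates them with the coefficients commonly quoted for these combinations; both the coefficients and the function names should be read as assumptions to be checked against the published definitions, and the input magnitudes are assumed to be extinction-corrected model magnitudes.

```python
def c_parallel(g, r, i):
    """Position along the low-redshift colour locus."""
    return 0.7 * (g - r) + 1.2 * ((r - i) - 0.18)

def c_perp(g, r, i):
    """Offset perpendicular to the low-redshift colour locus."""
    return (r - i) - (g - r) / 4.0 - 0.18

def d_perp(g, r, i):
    """Colour tracking the high-redshift locus."""
    return (r - i) - (g - r) / 8.0

# Example galaxy with g - r = 1.5 and r - i = 0.5.
g, r, i = 20.0, 18.5, 18.0
colours = (c_parallel(g, r, i), c_perp(g, r, i), d_perp(g, r, i))
```

For this example the three values are approximately 1.434, -0.055 and 0.3125.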

Where the targeting algorithms use colour selection, they are built on model magnitudes, which are based on the flux measured through equivalent apertures in all bands and thus provide unbiased colours of galaxies. Brightness limits are instead based on cmodel magnitudes, which provide better estimates of the total light observed.

Figure 3: Density plot of LOWZ galaxies in the ($g-r$, $r-i$) colour plane; red corresponds to higher density and dark blue to lower density, in an arbitrary normalisation and linear scale. Redshift increases rightwards and upwards along the galaxies locus, starting at on the bottom left corner. The knee on the galaxy locus is caused by the 4000 Å break transitioning between the $g$ and $r$-band filters, and happens at $z \approx 0.4$. The colours $c_{\parallel}$ and $c_{\perp}$ are simple rotations of this colour plane, and trace the position of a target in parallel and perpendicular, respectively, to the data locus. The black thick line represents the passively evolving LRG model of Maraston et al. (2009). The green and red dashed lines are the colour and magnitude targeting cuts – see the main text for details. The few targets seen outside of the selection cut are due to differences in the targeting and final photometry, see Section 2.

3.2 The LOWZ sample

The LOWZ sample is designed to extend the SDSS-I/II Cut I LRG sample (Eisenstein et al., 2001) to $z \approx 0.4$ and to fainter luminosities, in order to increase the number density of the sample by roughly a factor of 3. Fig. 3 shows how the colours $c_{\parallel}$ and $c_{\perp}$ describe the evolution of a passively evolving stellar population with redshift. Redshift increases from the bottom left to upper right. The black line shows the passively evolving LRG model of Maraston et al. (2009). The Maraston et al. (2009) ’LRG’ template is a model of a metal-rich population in passive evolution containing a small fraction of a metal-poor coeval population. This model was found to be a better fit to the colours of luminous red galaxies (LRGs) from the 2SLAQ survey (Cannon et al., 2006) as a function of redshift than models containing star formation in various amounts. The same model also better fit the overall luminosity evolution of BOSS galaxies (Montero-Dorta et al., 2015). The knee seen in the galaxy locus corresponds to the transition of the 4000 Å break from the $g$ to the $r$ band. The parameter $c_{\parallel}$ quantifies the position of a galaxy along the main locus, and $c_{\perp}$ characterises the departure of a galaxy from the centre of the locus; $c_{\perp} = 0$ lies approximately at the centre of the galaxy distribution.

We select targets at low redshift ($z \lesssim 0.4$) around the predicted colour locus using

$$|c_{\perp}| < 0.2$$

(red dashed lines in Fig. 3) and we select the brightest and reddest objects at each redshift using a sliding colour-magnitude cut with $c_{\parallel}$ (an effective proxy for a photometric redshift):

$$r_{\rm cmod} < 13.5 + c_{\parallel}/0.3.$$

The dashed green lines in Fig. 3 show the effective cuts in for three different band magnitudes: and mag corresponding to the faint boundary, the median magnitude and the bright boundary of the sample respectively. Thus fainter objects must be redder to pass the cut. This cut is the most important criterion in the selection of LOWZ galaxies - it drives the number density of the sample by effectively setting the magnitude limit as a function of redshift, and aims to produce a constant number density over the desired redshift range. The number of galaxies in the sample is therefore highly sensitive to this cut (see Ross et al. 2012; Tojeiro et al. 2014). The resulting space density of the sample is shown in Fig. 11; the sample is close to volume-limited (constant space density at ) over the redshift range .

We impose brightness limits on the targets, such that

$$16 < r_{\rm cmod} < 19.6.$$

The faint limit ensures a high redshift success rate. The bright limit excludes a significant number of low-redshift blue galaxies that would otherwise pass the colour cut, but also excludes a fraction of brightest cluster galaxies in low-redshift massive clusters (Hoshino et al., 2015). A bright cut was not needed in SDSS-I/II as such galaxies were already targeted by the SDSS-I/II Main Galaxy Sample (Strauss et al., 2002), but a significant fraction of the BOSS footprint lies outside that of SDSS-I and -II.

The star-galaxy separation follows the same procedure introduced in Eisenstein et al. (2001) for the LRGs,

$$r_{\rm psf} - r_{\rm cmod} > 0.3.$$

The cmodel magnitude is a proxy for a “total” magnitude for a galaxy, while the PSF magnitude fits the unresolved component of the object. The difference between the two is therefore a measure of the extendedness of the galaxy.
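The colour, magnitude and star-galaxy cuts described in this section can be combined into a single predicate. The sketch below is a hedged illustration, not the BOSS targeting code: the `Photometry` container and function name are hypothetical, and the numerical thresholds (0.2 in the perpendicular colour, 13.5 and 0.3 in the sliding cut, 16 and 19.6 in magnitude, 0.3 in the star-galaxy separation) are assumed from the commonly quoted form of the LOWZ selection and should be verified against the published equations.

```python
from dataclasses import dataclass

@dataclass
class Photometry:
    """Extinction-corrected magnitudes for one object (hypothetical container)."""
    g_mod: float
    r_mod: float
    i_mod: float
    r_cmod: float
    r_psf: float

def is_lowz_target(p):
    """Apply the four LOWZ cuts; thresholds are assumptions, see the lead-in."""
    g_r = p.g_mod - p.r_mod
    r_i = p.r_mod - p.i_mod
    c_par = 0.7 * g_r + 1.2 * (r_i - 0.18)
    c_perp = r_i - g_r / 4.0 - 0.18
    near_locus = abs(c_perp) < 0.2                 # colour locus cut
    sliding_cut = p.r_cmod < 13.5 + c_par / 0.3    # fainter objects must be redder
    brightness = 16.0 < p.r_cmod < 19.6            # bright and faint limits
    extended = (p.r_psf - p.r_cmod) > 0.3          # star-galaxy separation
    return near_locus and sliding_cut and brightness and extended

galaxy = Photometry(g_mod=20.0, r_mod=18.5, i_mod=18.0, r_cmod=18.0, r_psf=18.6)
star_like = Photometry(g_mod=20.0, r_mod=18.5, i_mod=18.0, r_cmod=18.0, r_psf=18.1)
too_faint = Photometry(g_mod=20.0, r_mod=18.5, i_mod=18.0, r_cmod=19.0, r_psf=19.6)
```

In this example `galaxy` passes all four cuts, `star_like` fails only the star-galaxy separation, and `too_faint` fails the sliding colour-magnitude cut even though it lies within the overall brightness limits.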

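In summary, the LOWZ selection algorithm, as implemented after commissioning, is as follows:

$$
\begin{aligned}
|c_{\perp}| &< 0.2,\\
r_{\rm cmod} &< 13.5 + c_{\parallel}/0.3,\\
16 < r_{\rm cmod} &< 19.6,\\
r_{\rm psf} - r_{\rm cmod} &> 0.3.
\end{aligned}
$$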

The galaxies in the LOWZ sample may be selected from the DR12 database using the following flags, whose definitions can be found on the SDSS website:

  • BOSS_TARGET1 &&   Objects targeted by the LOWZ algorithm.

  • SPECPRIMARY == 1   Objects with spectra, removing duplicate observations.

  • ZWARNING_NOQSO == 0   Objects whose spectroscopic redshifts are cleanly measured.

  • CLASS_NOQSO == ’GALAXY’   Objects whose spectra are those of a galaxy (as opposed to a quasar or star).
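As an illustration of how these four flags combine, the following minimal Python sketch filters a list of catalogue rows held as plain dictionaries keyed by the column names above. The function name `select_targets`, the `target_bit` argument and the toy rows are hypothetical; the bit value used here is an arbitrary placeholder, and the real value for each sample should be taken from the SDSS flag documentation.

```python
def select_targets(rows, target_bit):
    """Return the rows that pass all four cuts listed in the text.

    rows       : iterable of dicts with the catalogue column names
    target_bit : integer bit mask of the desired sample (caller-supplied)
    """
    selected = []
    for row in rows:
        if (row["BOSS_TARGET1"] & target_bit) == 0:
            continue  # not targeted by the requested algorithm
        if row["SPECPRIMARY"] != 1:
            continue  # duplicate observation
        if row["ZWARNING_NOQSO"] != 0:
            continue  # redshift not cleanly measured
        if row["CLASS_NOQSO"].strip() != "GALAXY":
            continue  # spectrum is not that of a galaxy
        selected.append(row)
    return selected


# Arbitrary illustrative bit; look up the real one in the SDSS documentation.
PLACEHOLDER_BIT = 1 << 10

demo_rows = [
    {"BOSS_TARGET1": PLACEHOLDER_BIT, "SPECPRIMARY": 1, "ZWARNING_NOQSO": 0, "CLASS_NOQSO": "GALAXY"},
    {"BOSS_TARGET1": PLACEHOLDER_BIT, "SPECPRIMARY": 1, "ZWARNING_NOQSO": 4, "CLASS_NOQSO": "GALAXY"},
    {"BOSS_TARGET1": PLACEHOLDER_BIT, "SPECPRIMARY": 1, "ZWARNING_NOQSO": 0, "CLASS_NOQSO": "STAR"},
    {"BOSS_TARGET1": 0, "SPECPRIMARY": 1, "ZWARNING_NOQSO": 0, "CLASS_NOQSO": "GALAXY"},
]
clean = select_targets(demo_rows, PLACEHOLDER_BIT)
print(len(clean))  # 1: only the first toy row passes every cut
```

The targeting test is a bitwise AND because BOSS_TARGET1 packs several targeting classes into one integer, so a single object can carry more than one flag.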

The basic properties of the LOWZ sample are presented in Parejko et al. (2013), who fitted the small-scale clustering of the galaxies using halo occupation distribution (HOD) modelling. They demonstrated that these galaxies lie in massive haloes, with a mean halo mass of M, a large-scale bias of and a satellite fraction of %. These galaxies occupy haloes with average masses between those of the CMASS sample and the original SDSS I/II LRG sample.

Exceptions to the LOWZ targeting

During the first nine months of BOSS observations, the incorrect star-galaxy separation criterion was used to identify LOWZ targets, removing a significant fraction of galaxies (see Appendix A). To select a uniformly-targeted sample from all LOWZ redshifts, with the selection criteria described in this section, the simplest procedure is to avoid those data with the use of an additional cut


where TILEID identifies spectroscopic tiles, and this cut corresponds to chunk numbers larger than 6.

Further details on this issue, and other slight changes in the targeting of LOWZ galaxies in early chunks, can be found in Appendix A. Briefly, LOWZ targets in chunk 2, and LOWZ targets in chunks 3-6, were selected with different algorithms from those of subsequent data. For the purposes of a large-scale structure catalogue, in previous data releases we simply removed chunks 2-6 from the LOWZ sample and the corresponding mask. In § 7 we construct separate samples using the chunk 2 (“LOWZE2”) and chunk 3-6 (“LOWZE3”) selections, and combine all three LOWZ catalogues with the CMASS samples to construct a single unified sample appropriate for analyses restricted to large scales, such as BAO fitting. The effects of these changes on the density of galaxies measured as a function of redshift can be seen in Fig. 11.

3.3 The CMASS sample

Figure 4: Both panels show density plots of CMASS galaxies; red corresponds to higher density and dark blue to lower density in an arbitrary normalisation and linear scale. The black thick line shows the passively evolving LRG model of Maraston et al. (2009). Top: redshift increases upwards, starting at at . Bottom: the sliding cut in with band magnitude, designed to select an approximately stellar-mass complete sample. Stellar mass increases with the perpendicular distance to the sliding cut, represented here by the red dashed line - see Maraston et al. (2013) for details. The green dashed line shows the sliding cut adopted for the CMASS SPARSE sample (see Section 3.4). Vertical solid lines show the magnitude limits. On both panels, the small fraction of targets that lie outside of the selection cut are due to differences in the targeting and final photometry, see Section 2. Only chunks greater than 6 are shown.

The CMASS sample uses similar selection cuts to those utilised by the Cut-II LRGs from SDSS-I/II and the LRGs in 2SLAQ, but extends them both bluer and fainter in order to increase the number density of targets in the redshift range and get closer to a mass limited sample.

The quantity (Fig. 4) effectively discards low-redshift galaxies by choosing


We do not apply any further colour cuts, with the exception of a sliding colour-magnitude cut that selects the brightest objects at each redshift, in such a way as to keep an approximately constant stellar mass limit over the redshift range of CMASS according to the passively evolving model of Maraston et al. (2009):


This approach is a significant departure from SDSS-I/II Cut-II and 2SLAQ LRGs - which consisted of essentially a flux-limited sample with a colour cut to isolate the reddest galaxies.

We impose model and magnitude limits as follows:


The faint magnitude limits are set to ensure a high redshift success rate, whereas the bright limit protects against some low-redshift interlopers. In the first 14 tiling chunks, CMASS objects were targeted with , but the redshift success rate at the faint end of this range was poor, so we revised this limit to the final value of .

To exclude outliers with problematic deblending, we further impose the following cuts on colour and (the effective radius in the fit to the deVaucouleurs profile for the -band magnitude, measured in pixels):


These cuts remove a very small fraction of targets. The CMASS star-galaxy separation is described in detail in the next section.

CMASS galaxies can be selected from the DR12 database using the following flags:





The basic clustering properties of the CMASS sample are presented in White et al. (2011), who fitted the small-scale clustering of the galaxies using HOD modelling. They showed that these galaxies lie in massive haloes, with a mean halo mass of M, a large-scale bias of and a satellite fraction of %. These galaxies occupy haloes with lower masses than those of the LOWZ sample, although the bias is similar, a consequence of them being at higher redshift.

CMASS galaxies are massive, with (e.g. Chen et al. 2012; Maraston et al. 2013), and the majority are dominated by old stellar populations with low star-formation rates (e.g. Chen et al. 2012; Thomas et al. 2013; Tojeiro et al. 2012). Maraston et al. (2013) argue that the CMASS sample becomes significantly incomplete at stellar masses and for a Kroupa initial mass function, and is roughly consistent with a volume-limited sample at higher masses and lower redshift. Thomas et al. (2013) presented similar results, showing that stellar velocity dispersions of BOSS galaxies peak at 240 km s$^{-1}$ with a narrow distribution virtually independent of redshift. Most recently, Leauthaud et al. (2015) quantified the stellar mass completeness of CMASS and LOWZ using data from the Stripe 82 region of sky along the celestial equator, a narrow subset of the SDSS imaging survey region that is 2 magnitudes deeper than the single-epoch SDSS imaging (Annis et al., 2014). Using the Stripe 82 Massive Galaxy Catalog (Bundy et al., 2015), they estimate that CMASS is 80% complete at in the redshift range . The stellar mass completeness of CMASS decreases at lower and higher redshifts and the denomination “constant mass” should be considered only as a loose approximation outside of the redshift window . However, the combination of LOWZ and CMASS yields a spectroscopic sample that is 80% complete at at . Compared to Cut-II LRGs, CMASS galaxies have a larger range of properties including morphology (Masters et al., 2011), star-formation rates (Thomas et al., 2013; Chen et al., 2012) and star-formation histories (Tojeiro et al., 2012), partly because no red cut has been imposed on the g-r observed-frame colour. It should be noted however that, for example, galaxies with detectable emission lines (hence hosting very young stellar populations) still represent only 4% of the sample (see Thomas et al. 2013).

Exceptions in CMASS targeting flag

The meaning of BOSS_TARGET1 && (the CMASS targeting flag) evolved during the first 14 chunks of the survey. Therefore BOSS_TARGET1 && will not select CMASS galaxies (as defined by the equations in the previous sections) in these regions, and further subsampling is required based on galaxy colours and magnitudes to recover the final selection in these regions. Alternatively, these chunks can be explicitly excluded. For the first 14 chunks the following exceptions should be noted:

  • Chunks 1 & 2: The data taken in the commissioning phase (chunks 1 & 2) used significantly broader selection criteria (see Section 3.3.3), and therefore must be dealt with carefully.

  • Chunks 3-6: The data taken in chunks 3-6 used a slightly looser cut, selecting instead on .

  • Chunks 1-14: As mentioned above, the cut in changed during the survey. In chunks 1-14 the targeting required .

With the exception of chunk 1, all of these chunks are included in the LSS catalogue after applying the required subsampling based on colours and/or magnitudes.

Star-Galaxy Separation in the CMASS sample

The difference between psf and model magnitudes is a measure of the extendedness of a source, thus making it useful to separate stars from galaxies. For the commissioning phase of the survey we applied a star-galaxy separation criterion identical to that used in the 2SLAQ survey (Cannon et al., 2006), a sloping cut in :


Whilst this cut is effective at removing the bulk of the stars, roughly 6.9% of 7000 CMASS targets from the commissioning runs had stellar spectra, mainly cool M-dwarfs.

Fig. 5 displays the distribution of the spectroscopically classified stars (blue) and galaxies (red) in the vs and vs planes for these commissioning targets. As expected, the stars preferentially occupy lower values of psf-model than the galaxies, and so applying a more restrictive cut would remove more stars but at the expense of removing some galaxies. For a maximum loss of just 1% of galaxies we found the linear cuts that would remove the largest numbers of stars. These cuts, shown as the black lines in Fig. 5, remove 31% and 52% of the stars that remained in the commissioning data for the and band cuts respectively. Since the band cut,


performed significantly better, it was added to the original band cut (Eq. 20) for all data from Chunk 3 onwards (i.e., after the commissioning runs), such that targets have to pass both cuts to be selected. Even though the band cut alone removes the vast majority of the stars excluded by the band cut, we kept the band cut in place to ensure that we could apply a consistent star-galaxy separation throughout the survey. This is achieved simply by retroactively applying the band cut to the commissioning data.
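The search for a linear cut that removes the most stars at a fixed galaxy loss can be sketched as a simple grid search. This is only an illustration under stated assumptions, not the procedure actually used: the function name `best_linear_cut`, the grids of trial slopes and intercepts, and the toy data are all hypothetical, and objects are described by the pair (model magnitude, psf-model).

```python
def best_linear_cut(stars, galaxies, slopes, intercepts, max_galaxy_loss=0.01):
    """Grid-search the line y = intercept + slope * x that rejects the most stars.

    stars, galaxies : lists of (x, y) = (model magnitude, psf - model)
    An object is rejected when y <= intercept + slope * x.
    Returns (slope, intercept, n_stars_rejected), or None if no line qualifies.
    """
    best = None
    for slope in slopes:
        for intercept in intercepts:
            lost = sum(1 for x, y in galaxies if y <= intercept + slope * x)
            if lost > max_galaxy_loss * len(galaxies):
                continue  # this line would discard too many galaxies
            removed = sum(1 for x, y in stars if y <= intercept + slope * x)
            if best is None or removed > best[2]:
                best = (slope, intercept, removed)
    return best


# Toy data: three "stars" and 100 "galaxies", all at the same magnitude.
toy_stars = [(20.0, 0.05), (20.0, 0.10), (20.0, 0.50)]
toy_galaxies = [(20.0, 0.30 + 0.01 * k) for k in range(100)]
cut = best_linear_cut(toy_stars, toy_galaxies, slopes=[0.0], intercepts=[0.0, 0.2, 0.6])
print(cut)  # (0.0, 0.2, 2)
```

In the toy case the intercept of 0.6 is rejected because it would remove far more than 1% of the galaxies, so the search settles on the line that removes two of the three stars at no galaxy loss.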

Figure 5: The distribution of spectroscopically confirmed stars (large blue points) and galaxies (small red points) in the psf-model vs model -band (top) and -band (bottom) planes selected in the CMASS sample of the commissioning data. The black lines are the linear cuts that remove the most spectroscopically confirmed stars whilst removing less than 1% of the galaxies. The band cut was added to the original band cut targeting from chunk 3 onwards.

Since these star-galaxy separation criteria measure the compactness of the objects in the SDSS imaging, their effectiveness will depend on the imaging PSF. Based on the commissioning data, Fig. 6 shows how the fraction of stars and galaxies removed by the new band criteria depends on the band PSF. The fraction of galaxies removed is fairly flat at 1% for PSF FWHM < 1.5 arcsec and then rapidly increases at higher FWHMs. The fraction of stars removed displays the opposite trend. Fig. 6 also presents the numbers of stars and galaxies as a function of band FWHM, demonstrating that the vast majority of the sample is selected from imaging with FWHM < 1.5 arcsec. This slight seeing-dependent star-galaxy separation will result in the imprint of a spatial dependence in the density of galaxies across the survey, which can be corrected using seeing-dependent weights (see § 6.4 for details).

Figure 6: The dependence of the star galaxy separation on the FWHM of the imaging PSF. The top panel shows the fraction, and the bottom panel the number, of spectroscopically classified stars and galaxies in the commissioning data that are excluded by the additional band star-galaxy separation as functions of band FWHM.

Whilst the above analysis addresses the fraction of galaxies lost due to the addition of the band star-galaxy separation criteria, it provides no indication of how many compact galaxies were removed by the original band cut. To investigate this issue we combined the deep coadded SDSS Stripe 82 imaging (Abazajian et al., 2009; Annis et al., 2014) with near-infrared and band imaging from the UKIDSS Large Area Survey Data Release 4 (Lawrence et al., 2007; Casali et al., 2007; Hewett et al., 2006; Hambly et al., 2008) in order to define a robust set of stars and galaxies over an area of 150 deg$^2$. A colour cut provides an excellent separation between stars and galaxies in the colour-magnitude region that the CMASS galaxies occupy. When this information is combined with the higher S/N measurement of from the coadded imaging we can confidently separate stars and galaxies. Using these data we estimate that the final star-galaxy separation cuts remove 2.3% of the full sample of galaxies selected by the CMASS colour cuts.

Summary of CMASS target selection

In summary, the CMASS target selection for the bulk of the survey is as follows:


During commissioning (chunks 1 and 2), we used significantly looser criteria; the CMASS_COMM sample (BOSS_TARGET1 &&), just under 25000 galaxies, was selected as follows:


See other exceptions to these criteria in Section 3.3.1.

3.4 Sparse Sampling Cuts

Motivated by the wish to study objects of slightly lower stellar mass and bluer intrinsic colour, we designed the CMASS_SPARSE sample. It extends the CMASS selection by altering the - sliding colour-magnitude cut to


with the other cuts unchanged (i.e., the area between the red and green dashed lines in the bottom panel of Fig. 4). These galaxies were randomly subsampled down to a number density on the sky of 5 deg$^{-2}$, corresponding to approximately 1 in 10 targets. This sample was selected across the full BOSS footprint.

CMASS_SPARSE galaxies may be selected with





after excluding the commissioning chunks.

Altering the CMASS target selection in this way produces a sample of galaxies at somewhat lower redshift and stellar mass. The median redshift of CMASS_SPARSE is , with a stellar mass distribution that peaks at M (using the stellar masses of Chen et al. 2012), relative to the peak CMASS mass of M.

4 Spectroscopic observations

4.1 Previously known redshifts

Fractions of the LOWZ and CMASS targets have a previous robust object classification and redshift determined from the SDSS-II survey (York et al., 2000; Abazajian et al., 2009). We therefore matched our target sample to a sample of “known objects” with pre-determined secure classifications and redshifts and did not spectroscopically reobserve these galaxies within BOSS. This subsample of targets has a complicated angular distribution on the sky: the majority of the NGC was covered by SDSS-II, but only a few stripes in the SGC were observed. These pre-observed targets account for 43% (9%) of the LOWZ targets in the north (south). A much smaller fraction of CMASS targets were pre-observed: 1.7% (0.7%) in the N (S).

4.2 Target Collation and Spectroscopic Tiling

We start with the list of targets provided by the target selection algorithms detailed above, and remove targets with known redshifts as defined above. The tiling algorithm assigns the remaining targets to spectroscopic tiles. The sky was tiled in a piecemeal fashion as the survey progressed; each of these regions is called a “chunk”; see § 2.1 and Dawson et al. (2012) for further details. DR12 contains observations from 38 chunks. The survey mask and collated target catalogue both indicate the chunk to which a region or specific object was assigned.

The tiling algorithm (Blanton et al., 2003) determines the location of the diameter spectroscopic tiles and allocates the available fibres among the targets, including targets from other programmes within BOSS. Because of the size of the cladding on the fibres, fibres may not lie within 62” of one another on a given spectroscopic tile. The algorithm therefore divides target galaxies into friends-of-friends groups with a linking length of 62”, and then assigns fibres to the groups in a way that maximizes the number of targets with fibres. The choice of which galaxies are assigned fibres is otherwise random. The algorithm adapts to the density of targets on the sky, such that regions with a larger than average number density tend to be covered by more than one tile. For the DR12 sample, 42% (55%) of the area in the north (south) is covered by multiple tiles, and the number density of CMASS targets is larger by 4.7% (3.4%) in those regions. The correlation between tile overlap and target density is less pronounced for the LOWZ sample (1.6% and 2.4% enhancement in north and south, respectively). The LOWZ sample constitutes only 35% of the galaxy targets, and particularly in the north many galaxies in dense regions already have spectra from SDSS-II and thus were not targeted for SDSS-III BOSS spectroscopy (see Sec. 4.1).
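The grouping step alone can be sketched in a few lines of Python. This is a minimal illustration, not the tiling code of Blanton et al. (2003): it uses a brute-force pair search with a union-find structure, covers only the friends-of-friends grouping (not the fibre assignment that follows), and the function names and toy positions are hypothetical.

```python
import math

ARCSEC_PER_RADIAN = 180.0 / math.pi * 3600.0


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation from the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin(0.5 * (dec2 - dec1))
    sin_dra = math.sin(0.5 * (ra2 - ra1))
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return 2.0 * math.asin(math.sqrt(min(1.0, h))) * ARCSEC_PER_RADIAN


def friends_of_friends(positions, linking_length_arcsec=62.0):
    """Return one group label per (ra, dec); linked targets share a label."""
    parent = list(range(len(positions)))

    def find(i):
        # Follow parent pointers to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            sep = angular_separation_arcsec(*positions[i], *positions[j])
            if sep < linking_length_arcsec:
                parent[find(i)] = find(j)  # merge the two groups
    return [find(i) for i in range(len(positions))]


# Toy targets: a chain of three (30 and 50 arcsec apart) plus one isolated object.
toy_targets = [
    (150.0, 2.0),
    (150.0, 2.0 + 30.0 / 3600.0),
    (150.0, 2.0 + 80.0 / 3600.0),
    (151.0, 2.0),
]
groups = friends_of_friends(toy_targets)
```

The first and third toy targets are 80” apart, farther than the linking length, but they end up in the same group because each is linked to the second target. This chaining is what distinguishes a friends-of-friends group from a simple pair count.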

Fibre collisions are partially resolved only in regions covered by multiple tiles, so the resolved collisions may not be representative of the unresolved fibre collisions in regions of lower target density. Fibre-collided galaxies cannot simply be accounted for by reducing the completeness of their sector, since they are a non-random subset of targets (conditioned to have another target within 62”). As discussed further in Sec. 6.1, we provide a set of weights that treat these objects as if they were observed, and assign their weight to the nearest object of the same target class. Finally, since quasar targets are given higher priority by the tiling algorithm, we account for their presence by simply including a 62” veto mask (see Sec. 5.1.1) around each high priority quasar target.

4.3 Spectroscopic Reductions

Figure 7: Normalised distributions of redshift failures (green, dashed) and redshift successes (red, solid) for the CMASS sample. Redshift failures constitute 1.8% of the CMASS targets observed by SDSS-III BOSS. These are contrasted against normalised distributions for the LOWZ sample of redshift failures (pink, dotted) and redshift successes (blue, dashed). Error bars were calculated assuming Poisson statistics. Note that some LOWZ galaxies have , which is why the normalisation for LOWZ curves looks lower than for CMASS.

Each “tile” output from the tiling algorithm specifies a central location on the sky and the list of targets to be observed. Physical plates are drilled at the University of Washington based on the anticipated airmass of observation. Multiple plates can cover the same tile, and plates may be observed on multiple nights until the desired signal-to-noise ratio is reached (Dawson et al., 2012).

The BOSS spectroscopic reduction pipeline is detailed in Bolton et al. (2012), with minor updates given in Alam et al. (2015). The final DR12 catalogues used the v5_7_0 tag of the idlspec2d software package for spectroscopic calibration, extraction, classification, and redshift analysis. We restrict the large-scale structure catalogues to only include data from plates with PLATEQUALITY set to “good.” The criteria for this designation are a minimum of three exposures, fewer than 10% of the spectroscopic pixels flagged as bad, and a minimum signal-to-noise ratio in both the blue and red arms of the spectrograph (Dawson et al., 2012).

The classification and redshift of each object are determined by a Maximum Likelihood fit of the coadded spectra to a linear combination of redshifted “eigenspectra” in combination with a low-order polynomial. The polynomial (quadratic for galaxies, quasars, and cataclysmic variable stars; cubic for all other stars) allows for residual extinction effects or broadband continua not otherwise described by the templates. The templates are derived from a rest-frame principal-component analysis (PCA) of training samples of galaxies, quasars and stars using stellar population templates at the BOSS resolution (from Maraston et al. 2013). The reduced $\chi^2$ versus redshift is measured in redshift steps corresponding to the logarithmic pixel scale of the spectra, where . Galaxy templates are fit from to , quasar templates from to , and star templates from to (km s$^{-1}$). The template fit with the best reduced $\chi^2$ is selected as the classification and redshift, with warning flags set for poor wavelength coverage, broken/dropped and sky-target fibres, and best fits which are within of the next best fit (comparing only to fits with a velocity difference of more than 1000 km s$^{-1}$). This method is a development of that used for the SDSS DR8 (Aihara et al., 2011), and is explained in further detail in Bolton et al. (2012), and in Ahn et al. (2012, 2014).
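The idea of scanning trial redshifts in steps of one logarithmic pixel and keeping the fit with the lowest reduced chi-squared can be illustrated with a heavily simplified sketch. It is not the idlspec2d implementation: it fits a single template with one free amplitude and no polynomial term, and the function name `best_pixel_shift` and the toy spectrum are hypothetical.

```python
def best_pixel_shift(flux, ivar, template, max_shift):
    """Scan integer pixel shifts of one template; return (shift, reduced chi2).

    On a logarithmic wavelength grid a one-pixel shift is a fixed step in
    log(1 + z), so scanning shifts is equivalent to scanning trial redshifts.
    flux, ivar : observed spectrum and its inverse variance (same length, > 1)
    template   : template on the same grid, at least len(flux) + max_shift long
    """
    n = len(flux)
    best = None
    for shift in range(max_shift + 1):
        model = template[shift:shift + n]
        # Closed-form weighted least-squares amplitude for a single template.
        num = sum(w * f * m for w, f, m in zip(ivar, flux, model))
        den = sum(w * m * m for w, m in zip(ivar, model))
        if den == 0.0:
            continue
        amp = num / den
        chi2 = sum(w * (f - amp * m) ** 2 for w, f, m in zip(ivar, flux, model))
        reduced = chi2 / (n - 1)  # n data points, one fitted parameter
        if best is None or reduced < best[1]:
            best = (shift, reduced)
    return best


# Toy template: flat continuum with one absorption feature at pixel 25.
toy_template = [1.0] * 40
toy_template[25] = 0.2
# Toy "observation": the template shifted by 7 pixels and scaled by 2.
toy_flux = [2.0 * t for t in toy_template[7:27]]
toy_ivar = [1.0] * 20
shift, reduced_chi2 = best_pixel_shift(toy_flux, toy_ivar, toy_template, max_shift=20)
print(shift)  # 7
```

A fuller version would replace the single amplitude by a linear least-squares solve over several eigenspectra plus polynomial terms at each trial shift, but the scan over shifts and the comparison of reduced chi-squared values would keep the same shape.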

For galaxy targets, a dominant source of false identifications is due to quasar templates with unphysical fit parameters, e.g., large negative amplitudes causing a quasar template emission feature to fit a galaxy absorption feature. Thus, for galaxy targets, the best classification and redshift are selected only from the fits to galaxy and star templates, and we restrict the sample to fits the pipeline classifies as robust. The results of these fits are tabulated in the “*_NOQSO” versions of various quantities in the LSS catalogues.

Table 2 lists the total number of CMASS and LOWZ targets that were assigned a fibre within the survey footprint () as well as the breakdown for each of the three possible outcomes: the number of CMASS and LOWZ targets robustly classified as stars () or galaxies (), and the number of targets for which the pipeline failed to find a robust classification and redshift (). A total of 2.3% (3.4%) of CMASS targets are stars and 1.6% (2.1%) are redshift failures in the north (south). Only 0.6% of LOWZ targets are stars and 0.5% are redshift failures.

Fig. 7 demonstrates that the pipeline is less likely to obtain a successful redshift for CMASS targets with fainter magnitudes. Section 6.3 discusses how we account for this strong dependence in the redshift failure weights.

5 Large scale structure catalogue creation

The creation of the BOSS large-scale structure catalogues involves a number of steps. We start with a list of targets based on the target selection procedure described above, with the previously known redshifts and outcome of the spectral analysis for each object for which we have a spectrum, matched to this list. Next we construct the survey mask, which specifies the regions of the sky that will be included in the LSS catalogues and the completeness in each included region. Finally, we use the mask and observed redshifts to generate a set of “random” galaxies, Poisson sampling the sky coverage specified by the mask with the same expected density distribution as the galaxies. The random galaxies are assigned redshifts to match the distribution of the target sample. Together, the data and random catalogues can be used for statistical analyses such as $n$-point functions. These steps and some of the subtleties involved are now described in detail.
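The last step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the mksample implementation: candidate points are assumed to have been drawn uniformly on the sky and tagged with a sector identifier beforehand, drawing redshifts directly from the galaxy list is just one simple way of matching the redshift distribution, and all names are hypothetical.

```python
import random


def make_randoms(candidates, sector_completeness, galaxy_redshifts, seed=0):
    """Thin uniform random points by sector completeness and assign redshifts.

    candidates          : list of (ra, dec, sector_id), uniform on the sky
    sector_completeness : dict mapping sector_id to a completeness in [0, 1]
    galaxy_redshifts    : galaxy redshifts, resampled with replacement
    """
    rng = random.Random(seed)
    randoms = []
    for ra, dec, sector in candidates:
        # Points in sectors absent from the mask get completeness 0 and are dropped.
        completeness = sector_completeness.get(sector, 0.0)
        if rng.random() < completeness:
            randoms.append((ra, dec, rng.choice(galaxy_redshifts)))
    return randoms


# Toy inputs: one complete sector, one empty sector, one point outside the mask.
toy_candidates = (
    [(150.0 + 0.01 * k, 2.0, "A") for k in range(5)]
    + [(160.0 + 0.01 * k, 2.0, "B") for k in range(5)]
    + [(170.0, 2.0, "outside")]
)
toy_completeness = {"A": 1.0, "B": 0.0}
toy_redshifts = [0.45, 0.52, 0.57, 0.61]
toy_randoms = make_randoms(toy_candidates, toy_completeness, toy_redshifts)
print(len(toy_randoms))  # 5: every point in sector A, none elsewhere
```

Keeping each candidate with probability equal to the sector completeness makes the expected surface density of randoms proportional to completeness, which is the property the mask is meant to encode.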

5.1 Mask

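| Property | CMASS NGC | CMASS SGC | CMASS total | LOWZ NGC | LOWZ SGC | LOWZ total | LOWZE2 NGC | LOWZE3 NGC |
|---|---|---|---|---|---|---|---|---|
| Galaxies with good BOSS redshifts | 607,357 | 228,990 | 836,347 | 177,336 | 132,191 | 309,527 | 2,985 | 11,195 |
| Galaxies with known (pre-BOSS) redshifts | 11,449 | 1,841 | 13,290 | 140,444 | 13,073 | 153,517 | 2,730 | 6,371 |
| Spectroscopically confirmed stars | 14,556 | 8,262 | 22,818 | 1,043 | 976 | 2,019 | 24 | 61 |
| Redshift failures | 10,188 | 5,157 | 15,345 | 868 | 602 | 1,470 | 21 | 55 |
| No spectrum, fibre-collided with same-class target | 34,151 | 11,163 | 45,314 | 4,459 | 4,422 | 8,881 | 16 | 167 |
| No spectrum, otherwise missed | 7,997 | 3,488 | 11,485 | 10,295 | 3,499 | 13,794 | 114 | 609 |
| Galaxies used in final catalogue | 568,776 | 208,426 | 777,202 | 248,237 | 113,525 | 361,762 | 4,336 | 15,380 |
| Targets observed by BOSS | 632,101 | 242,409 | 874,510 | 179,247 | 133,769 | 313,016 | 3,030 | 11,311 |
| Total targets | 685,698 | 258,901 | 944,599 | 334,445 | 154,763 | 489,208 | 5,890 | 18,458 |
| Total area (deg$^2$) | 7,429 | 2,823 | 10,252 | 6,451 | 2,823 | 9,274 | 144 | 834 |
| Veto area (deg$^2$) | 495 | 263 | 759 | 431 | 264 | 695 | 10 | 55 |
| Used area (deg$^2$) | 6,934 | 2,560 | 9,493 | 6,020 | 2,559 | 8,579 | 134 | 779 |
| Effective area (deg$^2$) | 6,851 | 2,525 | 9,376 | 5,836 | 2,501 | 8,337 | 131 | 755 |
| Targets / deg$^2$ | 98.9 | 101.1 | 99.5 | 55.6 | 60.5 | 57.0 | 43.4 | 23.5 |

Table 2: Basic parameters of the DR12 CMASS, LOWZ, LOWZE2, and LOWZE3 samples. We track these classifications on a sector-by-sector basis in order to compute the BOSS fibre completeness in each sector of the survey. In this table we report , the sum over all sectors retained in the final BOSS mask. Target classification counts and areas for the LOWZE2 and LOWZE3 samples are reported for chunk 2 and chunk 3-6, respectively. To estimate the target density for those samples, we use the full NGC footprint to reduce cosmic variance.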

We use the Mangle software (Swanson et al., 2008) to track the areas covered by the BOSS survey and the angular completeness of each distinct region; our terminology is summarised in Table 1. The mask is constructed of spherical polygons, which form the base unit for the geometrical decomposition of the sky. The angular mask of the survey is formed from the intersection of the imaging boundaries (expressed as a set of polygons) and the spectroscopic tiles. We define each unique intersection of spectroscopic tiles to be a sector (see Blanton et al., 2003; Tegmark et al., 2004; Aihara et al., 2011).

We compute sector completeness based on the distribution of targets across various outcomes of the tiling pipeline and spectroscopic reductions. In each sector (indexed by ) included in the large scale structure catalogue, we distinguish the following outcomes (separately for each target class):

  1. galaxies with redshifts from good BOSS spectra (we denote the number in each sector by ),

  2. galaxies with redshifts from pre-BOSS spectra (),

  3. spectroscopically-confirmed stars (),

  4. objects with BOSS spectra from which stellar classification or redshift determination failed (),

  5. objects with no spectra, in a fibre collision group with at least one object of the same target class (),

  6. objects with no spectra, if in a fibre collision group then with no other objects from the same target class ().

These quantities, summed over all sectors included in the LSS catalogues, are given in Table 2. As each target is classed by one of these descriptors, we have that the total number of targets in sector is


and we define the number of targets observed by BOSS as


Matching our analyses for DR9, DR10 and DR11, the LOWZ catalogue is then cut to , and the CMASS catalogue is cut to to avoid overlap, and to make the samples independent. The number of galaxies used in the final catalogue is the subset of that pass these redshift cuts.
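The bookkeeping of the six outcomes can be checked against the CMASS NGC column of Table 2. The short Python sketch below assumes that the first six count rows of the table follow the order of the six outcomes listed above, and that the targets observed by BOSS are those with a BOSS spectrum (outcomes 1, 3 and 4); the dictionary keys are descriptive labels chosen for this example.

```python
# CMASS NGC counts copied from Table 2, in the assumed order of the six outcomes.
cmass_ngc = {
    "boss_galaxies": 607357,
    "known_redshifts": 11449,
    "stars": 14556,
    "redshift_failures": 10188,
    "collided_no_spectrum": 34151,
    "missed_no_spectrum": 7997,
}


def total_targets(counts):
    """Every target falls in exactly one outcome, so the total is the plain sum."""
    return sum(counts.values())


def observed_by_boss(counts):
    """Targets with a BOSS spectrum: galaxies, stars and redshift failures."""
    return counts["boss_galaxies"] + counts["stars"] + counts["redshift_failures"]


print(total_targets(cmass_ngc))     # 685698, the last count row of Table 2
print(observed_by_boss(cmass_ngc))  # 632101, the second-to-last count row
```

Both sums reproduce the tabulated values, which supports the assumed row order.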

Figure 8: Completeness maps for both the LOWZ and CMASS samples in the north and south Galactic caps. The mean completeness is 98.8% for the CMASS sample shown in the left panels, and 97.2% for the LOWZ sample in the right-hand panels. Gaps correspond to early chunks as shown in Fig. 12. Each patch of different colour corresponds to a plate, with the colour determined by the completeness of that plate. This is surrounded by the higher completeness regions that overlap that plate with other plates. This leaves a pattern that looks like a darker, higher-completeness “mesh”, covering the survey.

From these descriptions, we define a BOSS fibre completeness in sector


This completeness definition excludes the “known” objects observed by SDSS-II. , shown in Fig. 8, is recorded in the mangle mask files released with the LSS catalogues and is used in the random catalogue generation (see Sec. 5.2). By this definition, the area-weighted average completeness is 99% (97%) for the CMASS (LOWZ) samples. We compute the effective mask area in Table 2 by weighting the used area of each sector by its completeness.
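The completeness weighting of the area can be written compactly. The sketch below is illustrative only: the function names are hypothetical, sectors are represented as (area, completeness) pairs, and the final two lines compare the ratio of effective to used area from the total columns of Table 2 with the quoted average completeness.

```python
def effective_area(sectors):
    """Sum of sector areas weighted by completeness.

    sectors : list of (area_deg2, completeness) pairs
    """
    return sum(area * completeness for area, completeness in sectors)


def mean_completeness(sectors):
    """Area-weighted average completeness over the given sectors."""
    total = sum(area for area, _ in sectors)
    return effective_area(sectors) / total


# Toy mask with two sectors.
toy_sectors = [(10.0, 1.0), (30.0, 0.9)]
print(effective_area(toy_sectors))     # close to 37.0
print(mean_completeness(toy_sectors))  # close to 0.925

# Ratio of effective to used area from the total columns of Table 2.
cmass_ratio = 9376 / 9493  # about 0.988
lowz_ratio = 8337 / 8579   # about 0.972
```

The two ratios round to 99% and 97%, in line with the area-weighted average completeness quoted for the CMASS and LOWZ samples.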

The boundaries of the spectroscopic tiles can be seen by eye in Fig. 8 as discontinuities in the value of completeness; the unique intersections of those tiles define individual sectors, in which we treat the BOSS fibre completeness as uniform. On average, the completeness is larger in regions covered by more than one spectroscopic tile. The raw sky area covered by spectroscopic tiles is 10338 deg$^2$, of which 10252 deg$^2$ remain (7429 deg$^2$ in the NGC and 2823 deg$^2$ in the SGC) after restricting the mask to sectors for which every planned tile has been observed with “good” PLATEQUALITY.

Figure 9: The fraction of the total survey area that has a target completeness greater than the value shown, where target completeness is defined as the number of good galaxies spectroscopically observed in BOSS and those with previously known redshifts divided by the number of targets calculated in each sector as , as in Eq. (43). We compare this completeness with those we would have obtained had we not had to include various classes of targets. If there had been no stars in our target list, the completeness would have been (green line). If additionally we had not had to deal with fibre collisions, we would have observed a completeness (blue line), and if additionally there were no redshift failures (black line). From the definition of in Eq. (39) we see that the remaining decrement of the black line from is due to missed galaxies .

We also define a galaxy redshift completeness, assuming that stars are always correctly classified spectroscopically


and define a target completeness


which gives the number of good galaxies spectroscopically observed in BOSS combined with previously known redshifts divided by the number of targets calculated in each sector. Fig. 9 shows the fraction of the total BOSS area that has target completeness greater than a specified value, and how this would change if we could ignore various effects. This shows the relative importance of different categories of targets to the target completeness of BOSS, from the least important, which is redshift failures, to fibre collisions, which is the most important.
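A curve of this kind is a cumulative area fraction, which the following minimal Python sketch computes for a list of sectors. The function name and the toy sectors are hypothetical, and the completeness values are made up for illustration.

```python
def area_fraction_above(sectors, threshold):
    """Fraction of the total area in sectors with completeness above threshold.

    sectors : list of (area_deg2, completeness) pairs
    """
    total = sum(area for area, _ in sectors)
    above = sum(area for area, completeness in sectors if completeness > threshold)
    return above / total


# Toy mask: four square degrees split over three sectors.
toy_mask = [(1.0, 0.99), (2.0, 0.95), (1.0, 0.80)]
print(area_fraction_above(toy_mask, 0.90))  # 0.75
print(area_fraction_above(toy_mask, 0.97))  # 0.25
```

Evaluating the function on a grid of thresholds, once for each alternative definition of completeness, gives one curve per definition.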

Previous LSS catalogues (DR9, DR10, DR11) had to deal with sizeable regions where BOSS spectra were not complete, and we made a number of cuts on sectors to include in the LSS catalogues to minimise the impact of this effect. In particular, sectors meeting any of the following criteria were removed from the LSS mask:

  • (Eqn. 41); removing part-complete sectors on the edges of the survey missing a significant fraction of redshifts.

  • (Eqn. 42) and ; removing regions with bad spectroscopic observations.

  • and there is not another sector within 2  in the right ascension or declination directions; removing isolated regions without galaxies.

These cuts were not applied to the DR12 sample. Had we additionally applied the fibre completeness cut (first criterion above) to DR12, we would have rejected an additional 30 (56) deg$^2$ from the CMASS (LOWZ) mask; had we instead applied the redshift success cut (second criterion above), we would have rejected an additional 1.7 (1.4) deg$^2$ from the CMASS (LOWZ) mask. The differences between the earlier mask selection and the algorithm described above, as applied to DR12, constitute negligible changes to the survey mask. The two algorithms agree to within 0.3% of the total mask area for both the CMASS and LOWZ samples. Finally, the classification of and has slightly changed in DR12 relative to DR9-DR11; see Sec. 6.1.

Veto Masks

While the basic geometry of the survey is encapsulated in the survey mask described in the previous sections, there remain many small regions within it where we could not have observed galaxies. Although they are individually small, they are not randomly distributed across the sky, and sum to a significant area, and so we exclude them from any analysis. We represent those regions by a set of veto masks, and remove “randoms” that fall within these masks. The masks are:

  • Centerpost mask: Each Sloan plate is secured to the focal plane by a central bolt: no targets coinciding with the centerpost of a spectroscopic tile can be observed. This mask reduces the survey area by 0.04%.

  • Collision priority mask: Ly$\alpha$ quasar targets receive higher priority than BOSS galaxy targets in the tiling algorithm; in regions of only a single spectroscopic tile, BOSS galaxy targets are unobservable within a fibre collision radius (62”) of those targets. Treating the high-priority quasar target locations as uncorrelated with the galaxy density field and neglecting any recovered galaxy targets in tile overlap regions, we can simply account for the high-priority quasars by masking a 62” radius around each. This mask reduces the survey area by 1.5%.

  • Bright stars mask: We mask an area around stars in the Tycho catalogue (Høg et al., 2000) with Tycho magnitude within [6,11.5] with magnitude-dependent radius


    This mask reduces the area by 1.9%.

  • Bright objects mask: The standard bright star mask occasionally misses bright stars that affect the SDSS imaging data quality. In addition, a small number of bright local galaxies saturate the imaging, affecting target selection in their outskirts. These objects were identified by visual inspection, which was also used to set the mask radius for each object, ranging from $0.1^\circ$ to $1.5^\circ$. The objects in this mask subtend a total area of 43.8 deg$^2$. The list of objects is described in section 2.1 of Rykoff et al. (2014). This mask covers 0.4% of the BOSS area.

  • Non-photometric conditions mask: We mask regions where the imaging was not photometric in the $g$, $r$, or $i$ bands, the PSF modelling failed, the imaging reduction pipeline timed out (usually due to too many blended objects in a single field, caused by a high stellar density), or the image was identified as having any other critical problem. This mask reduces the area by 3.4%.

  • Seeing cut: We discard regions where the point spread function full width at half maximum (labelled ’PSF_FWHM’ in the catalogues) is greater than $2.3^{\prime\prime}$, $2.1^{\prime\prime}$, and $2.0^{\prime\prime}$ in the $g$, $r$, and $i$ bands, respectively. The rationale for this cut is to decrease the variation of target density and properties with seeing caused by the star-galaxy separation (Eqns. 12, 20, and 21) and $i_{\rm fib2}$ cuts. This cut removes an additional 0.5% (1.7%) of the NGC (SGC) footprint.

  • Extinction cut: for similar reasons, we also discard areas where the extinction (labeled ’EB_MINUS_V’ in the catalogues, from Schlegel, Finkbeiner & Davis 1998) exceeds 0.15. This cut removes an additional 0.06% (2.2%) of the NGC (SGC) footprint.

In the catalogue creation pipeline, the list of targets is immediately passed through these veto masks, so that targets in vetoed regions do not contribute to the sector completeness calculation. All random galaxies within the veto regions must also be removed. Table 2 shows that in total, 6.6% (9.3%) of the area within the north (south) galactic cap footprint was removed by the veto masks.
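As an illustration of how a circular veto mask, such as the collision priority mask, can be applied to a list of targets or randoms, the following is a minimal Python sketch. It is not taken from mksample: the function names, the brute-force loop over mask centres, and the default radius argument are assumptions made here for illustration only.

```python
import numpy as np

ARCSEC = np.pi / (180.0 * 3600.0)  # one arcsecond in radians


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in radians (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin(0.5 * (dec2 - dec1))
    sin_dra = np.sin(0.5 * (ra2 - ra1))
    a = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    return 2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def outside_circular_veto(ra, dec, veto_ra, veto_dec, radius_arcsec=62.0):
    """Return a boolean array that is True for points outside every veto circle.

    ra, dec           : coordinates of targets or randoms, in degrees
    veto_ra, veto_dec : centres of the veto circles, in degrees
    radius_arcsec     : veto radius (hypothetical default of one collision radius)
    """
    ra = np.asarray(ra, dtype=float)
    dec = np.asarray(dec, dtype=float)
    keep = np.ones(ra.shape, dtype=bool)
    # Brute-force loop over mask centres; adequate for a small illustration only.
    for vra, vdec in zip(veto_ra, veto_dec):
        keep &= angular_separation(ra, dec, vra, vdec) > radius_arcsec * ARCSEC
    return keep
```

Applying the same boolean selection to both the target list and the random catalogue keeps the two consistent, which is the property the veto step relies on.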

5.2 Random Catalogue generation

All of our clustering analyses make use of random catalogues with the same angular and redshift selection functions as the data. To produce these catalogues, we first use the Mangle ransack command to generate random catalogues in which the angular density of the random galaxies is proportional to the completeness value in the mask for each sector. As the random catalogue follows the redshift completeness per sector, it automatically corrects for any systematic effects caused by the decrease in fiducial exposure times starting roughly half-way through the BOSS survey. Next we remove random galaxies using the set of veto masks described in Sec. 5.1.1. Only the angular coordinates of the random catalogue are used to fit for angular systematic weights; see Sec. 6.4. Since the true underlying redshift distribution of our targets is unknown and can only be estimated from the empirical redshift distribution, we assign redshifts to the galaxies in the random catalogues by randomly drawing from the measured galaxy redshifts, with the probability of drawing each galaxy proportional to its weight $w_{\rm tot}$, defined in Eq. (50). This procedure ensures that the (weighted) galaxy and random catalogues have exactly the same redshift distribution, apart from (small) stochasticity from the random redshift assignment. Ross et al. (2012) compare this random redshift assignment scheme with approaches that fit a spline of varying knot number to the measured galaxy redshift distribution, and then sample from the resulting spline directly. Based on analysis of mock catalogues, their figure 19 demonstrates that the former method provides the smallest bias in fits to the monopole and quadrupole correlation function.
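The weighted redshift draw described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the mksample implementation; the function name, its arguments, and the fixed seed are assumptions introduced here.

```python
import numpy as np


def assign_random_redshifts(z_gal, w_tot, n_random, seed=0):
    """Draw redshifts for randoms from the galaxy redshifts, weighted by w_tot.

    z_gal    : measured galaxy redshifts
    w_tot    : total weight of each galaxy (same length as z_gal)
    n_random : number of random points that need a redshift
    seed     : seed for the random number generator (hypothetical default)
    """
    z_gal = np.asarray(z_gal, dtype=float)
    w_tot = np.asarray(w_tot, dtype=float)
    rng = np.random.default_rng(seed)
    prob = w_tot / w_tot.sum()  # selection probability proportional to weight
    return rng.choice(z_gal, size=n_random, replace=True, p=prob)
```

Because every random redshift is a copy of a measured galaxy redshift, the weighted galaxy and random redshift distributions agree up to the sampling noise of the draw.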

6 Accounting for observational artefacts in LSS catalogues

In this section we describe in detail how we weight the targeted galaxies when computing LSS statistics, in order to minimize the impact of observational artefacts on our estimate of the true galaxy overdensity field. We identify various effects that affect the completeness of the sample, which we quantify with weights applied per sector. These weights are a development of those presented in Anderson et al. (2012, 2014b). In particular, we discuss treatment of “known” redshifts from SDSS-II that were not re-observed in SDSS-III BOSS, galaxies not observed due to fibre collisions, observed galaxies for which a robust redshift was not obtained, and a weighting scheme to null non-cosmological fluctuations imprinted on the catalogue by the target selection step. The weights described below are available for each galaxy in the LSS catalogues. In this section we also summarise weights we apply to minimize our statistical error on the observed power spectrum.

6.1 Fibre collision corrections

Galaxies that were not assigned a spectroscopic fibre due to fibre collisions are not a random subsample of the full target sample since they are within a fibre collision radius ($62^{\prime\prime}$) of another target. This is potentially a large effect: in the SGC, where the coverage of known targets from SDSS-II is lowest, approximately 20% of galaxy targets are in a collision group containing other CMASS or LOWZ galaxy targets. As a result, 5.8% of CMASS targets and 3.3% of LOWZ targets were not assigned a spectroscopic fibre.

These objects preferentially occupy denser environments and therefore have higher than average large-scale bias. They are also more likely than average to occupy the same dark matter halo as a neighbouring galaxy target. Accurate fibre collision corrections are therefore particularly important for applications relying on the absolute value of galaxy bias (e.g., a comparison of the lensing and clustering amplitudes) or those that use small-scale clustering to deduce halo occupation statistics and satellite fractions.

In the default large-scale structure catalogue, which focuses on obtaining unbiased galaxy density fields on large scales, we account for collided galaxies that were not assigned fibres by upweighting the nearest galaxy from the same target class that was assigned a fibre. This information is tracked by incrementing a weight $w_{\rm cp}$, labelled WEIGHT_CP in the DR12 LSS catalogues. The upweighted nearest neighbour could be classified by the spectroscopic pipeline as a good galaxy redshift, a star, or a redshift failure. Upweighting the neighbour without reference to its classification is appropriate because the missed object could be in any of these classes.

We correct 34151 (11163) CMASS targets and 4459 (4422) LOWZ targets by nearest neighbour upweighting in the NGC (SGC). This amounts to 5.0% (4.3%) of CMASS targets in the NGC (SGC), and 1.3% (2.9%) of the LOWZ targets. The difference between the hemispheres is due both to higher tile density in the SGC (so more fibre collisions fall in overlap regions where they can be partially resolved) and to most of the previously known SDSS-II redshifts falling in the NGC.

The algorithm used to generate the DR12 catalogues differs slightly from the one used for the DR9-DR11 catalogues. The new algorithm uses the output from the tiling algorithm to determine membership in fibre collision groups. Targets with the same ’FINALN’ and ’INGROUP’ field flags output from the tiling code share a collision group. We choose the nearest object of the same target class and collision group to carry the weight of the unobserved target. We also allow “known” galaxies to carry the weight if they are closer than all BOSS-observed targets. In DR9-DR11 catalogues, we did not refer to the fibre collision group indices, but simply identified collision pairs in the same target class if they were separated by less than $62^{\prime\prime}$. Nonetheless, the two algorithms select the same nearest neighbour % of the time.
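A minimal Python sketch of nearest-neighbour upweighting within collision groups is given below. It is an illustration under stated assumptions, not the mksample code: `group_id` is a hypothetical integer label standing in for the collision group membership derived from the tiling output, all inputs are assumed to belong to a single target class, and separations use a small-angle approximation that ignores right ascension wrap-around.

```python
import numpy as np


def collision_weights(ra, dec, group_id, has_fibre):
    """Nearest-neighbour upweighting for targets that did not receive a fibre.

    ra, dec   : target coordinates in degrees (one target class only)
    group_id  : hypothetical collision-group label for each target
    has_fibre : True where the target was assigned a fibre

    Returns w_cp, equal to 1 for an observed target plus 1 for every collided
    group member whose weight it carries. In this sketch, targets without a
    fibre are given w_cp = 0 because they drop out of the catalogue.
    """
    ra = np.asarray(ra, dtype=float)
    dec = np.asarray(dec, dtype=float)
    group_id = np.asarray(group_id)
    has_fibre = np.asarray(has_fibre, dtype=bool)

    w_cp = np.where(has_fibre, 1.0, 0.0)
    for i in np.flatnonzero(~has_fibre):
        candidates = np.flatnonzero(has_fibre & (group_id == group_id[i]))
        if candidates.size == 0:
            # No observed member of this class in the group: left uncorrected.
            continue
        # Small-angle separation, adequate on arcminute scales.
        dra = (ra[candidates] - ra[i]) * np.cos(np.radians(dec[i]))
        ddec = dec[candidates] - dec[i]
        nearest = candidates[np.argmin(dra**2 + ddec**2)]
        w_cp[nearest] += 1.0
    return w_cp
```

In a group where at least one member of the class was observed, the summed weight of the observed members equals the number of targets in the group, so the correction conserves the target count.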

Our adopted fibre collision correction scheme neglects a few subtle cases:

  • No corrections are applied for objects that are the only members of their target class in their fibre collision group, and did not receive a fibre. For CMASS, this class represents 4% of all targets in fibre collision groups, and 0.7% of all CMASS targets overall. Since there are more CMASS targets per unit area, this effect is larger for LOWZ targets: 12% of all collided LOWZ targets and 1.4% of the full sample. Treating such collision pairs as unassociated is still a good approximation. To verify this assumption, we examined all collision groups consisting of a single LOWZ target and single CMASS target, and for which we obtained both redshifts. Only 11% of such pairs had line-of-sight separations smaller than 50 $h^{-1}$Mpc.

  • No corrections are applied when none of the multiple objects of the same target class in a fibre collision group were assigned a fibre. These galaxies are treated as random incompleteness in the survey coverage and comprise 0.14% of the total galaxy sample.

  • Finally, 0.3% of targets did not receive a fibre due to collisions with targets other than CMASS and LOWZ but of the same priority. Again we treat these missing redshifts as random.

Tables 3 and 4 provide statistics on the distribution of CMASS and LOWZ galaxies in fibre collision groups, and on how the probability of assigning fibres to both members of a pair of collided galaxies in the same fibre collision group depends on the size of the collision group. Approximately 75% of collided galaxies are in a group of only two, and group sizes above four are quite rare. Table 4 reports the fraction of galaxies in a collision group that received a spectroscopic fibre, as a function of the number of spectroscopic tiles covering their sector. In the remaining columns we report the fraction of pairs of CMASS+LOWZ targets in the same collision group for which both targets received a fibre, both globally and as a function of collision group size. In regions covered by a single spectroscopic tile, only a small fraction of pairs in groups of two both receive a spectroscopic fibre (4%). Such pairs must be sourced from collision groups containing at least one target of another class, oriented such that the two CMASS/LOWZ targets in the group are separated by more than $62^{\prime\prime}$. As expected, in regions covered by more than one tile, pairs in smaller collision groups are more likely to be resolved, and the majority of fibre collisions are removed.

Finally, to understand the impact of fibre collision corrections on our estimate of the true galaxy density field, we examine the apparent line-of-sight separation for pairs of galaxies in the same fibre collision group for which good redshifts were obtained for both. Fig. 10 shows the distribution near zero separation, although the tails extend to much larger separations. We have converted redshift separations to apparent distance separations using the fiducial cosmological model. The observed distribution (coloured lines) can be fit by a flat background and an exponential distribution centred on zero (black lines). The fraction of resolved fibre collision pairs that are “correlated” (i.e., contribute to the exponential component in the fit to the pairwise separation histogram) is 52% for pairs of CMASS targets and 62% for pairs of LOWZ targets, i.e., nearly half of fibre collision pairs are unassociated projections. Interestingly, the width of the exponential component is consistent with 5.4 $h^{-1}$Mpc for both target classes and is generally consistent with halo modelling expectations.

Since the choice of which galaxies are assigned fibres in a collision group is completely random (apart from maximising the number of targets receiving a fibre), the object not assigned a fibre is statistically equivalent to the one we upweight. Once the upweighting is applied, correlations at transverse separations larger than the fibre collision scale should therefore be unbiased. However, correlations at transverse separations below the collision scale will be biased, since we are removing these small scale pairs. Additionally, these small-scale variations will be anisotropic, and are therefore likely to have a stronger effect on the quadrupole than on the monopole moments of 2-point clustering statistics, for example. We therefore advocate constructing statistics that do not apply these weights in situations where these effects are important; see Reid et al. (2014) for an example configuration space statistic.

N    CMASS     LOWZ      CMASS+LOWZ
2    0.7631    0.8456    0.7566
3    0.1687    0.1182    0.1726
4    0.0440    0.0270    0.0461
5    0.0146    0.0070    0.0150
6    0.0059    0.0010    0.0057
7    0.0023    0.0005    0.0022
8    0.0007    0.0003    0.0008
9    0.0003    0.0006    0.0003
Table 3: Distribution of galaxies across fibre collision group sizes $N$. The largest collision group (not listed) contains 17 galaxy targets. The CMASS column provides the fraction of CMASS targets in groups with $N$ CMASS targets, restricted to groups with at least two CMASS targets. The LOWZ column shows the same calculation for LOWZ targets. The final column lists the fibre collision group size distribution when $N$ includes both CMASS and LOWZ targets. For consistency across the mask these results were computed from the LOWZ sample footprint (chunks $\geq 7$). For reference, the fraction of galaxies that are not in any collision group is 77%.
Tiles   Area fraction   Fibre fraction   Both fibres (all pairs)   Both fibres, by increasing group size
1       0.54            0.561            0.092                     0.042    0.159    0.142
2       0.41            0.945            0.820                     0.971    0.685    0.589
3       0.05            0.992            0.966                     0.992    0.985    0.915
4       0.0005          1.000            1.000                     1.000    1.000    -
Table 4: Fibre collision statistics for targets in regions covered by a given number of spectroscopic tiles (first column). The second column shows the fraction of the total mask area covered by that number of tiles. The third column gives the fraction of all collided galaxies that were assigned a fibre. The remaining columns specify the fraction of pairs of galaxy targets (CMASS + LOWZ) in the same collision group for which both targets received a fibre, both globally and as a function of collision group size. We use the global fraction to remove collided pairs and approximate the fibre collision effect in our mock galaxy catalogues. We track this fraction separately for the NGC and SGC and for CMASS, LOWZ or combined catalogues, but in practice the values are similar in each case to those reported here.
Figure 10: The probability distribution of apparent line-of-sight separations for pairs of galaxies in the same fibre collision group and for which both have good redshifts. The left panel uses pairs of CMASS targets and the right panel uses pairs of LOWZ targets. Both distributions can be fit with the sum of a flat background term and an exponential centred on zero separation. A total of 52% (62%) of the CMASS (LOWZ) pairs contribute to the exponential term. The best fit width of the exponential component is 5.4 $h^{-1}$Mpc for both CMASS and LOWZ targets.

6.2 Treatment of “known” targets

As the pre-observed “known” sample is complete (no failures are kept), it does not match the angular distribution induced by variations in completeness of the galaxies spectroscopically observed by BOSS. Rather than try to model the distribution of known galaxies, we instead subsampled these data to match BOSS completeness in each sector, thus imposing the BOSS mask on the known galaxies. In this way we make the sample indistinguishable from BOSS-observed targets. In earlier data releases (DR9-11) we also applied a fibre collision correction to close pairs involving “known” galaxies; we did not apply this step in our DR12 analysis, and describe the difference in more detail below.

In DR9-DR11 catalogues we additionally marked a fraction of the galaxies in a $62^{\prime\prime}$ close pair containing at least one object from the “known” sample as fibre-collided, and assigned its weight to its nearest neighbour. This fraction was determined by measuring the fraction of BOSS targets in $62^{\prime\prime}$ close pairs that were fibre collision corrected in each sector. In sectors covered only by a single spectroscopic tile all $62^{\prime\prime}$ pairs were collided. The original motivation of this correction was to impose the same fibre collision completeness on the “known” targets as on the BOSS targets. In DR12 we did not apply this correction. The rationale was that on sufficiently large scales the nearest neighbour upweighting scheme restores the correct clustering statistics, and so should be equivalent to using the measured redshifts. However, we expect the effective shot noise to be larger when using the former procedure. Correlation function and power spectrum analyses that marginalize over a shot noise term should be unaffected by this choice; analyses of smaller-scale clustering should examine this issue further. This change is particularly important for clustering of the LOWZ sample because of the large overlap with the “known” galaxy sample.

6.3 Redshift Failures

For 1.8% (0.5%) of CMASS (LOWZ) targets, the spectroscopic pipeline fails to obtain a robust redshift. We do not necessarily expect these to be distributed randomly with respect to e.g., plate centre or redshift, and so we again adopt a nearest neighbour upweighting scheme to account for these objects. Redshift failure galaxies were permitted to be upweighted because of a nearest neighbour fibre collision. We therefore transfer the total weight to the nearest neighbour of the redshift failure, incrementing a weight $w_{\rm noz}$, labelled WEIGHT_NOZ in the DR12 LSS catalogues. The upweighted object must be classified either as a good galaxy or star redshift.

In DR9-DR11 large-scale structure catalogues we removed sectors with redshift success rates below 80% and at least ten good redshifts; in our DR12 catalogue we instead exclude troublesome observations by restricting the mask to regions with PLATEQUALITY of ’good’, and do not remove the handful of sectors that would have been excluded using the DR9-11 criteria. Upon closer examination, we found that sectors failing the DR9-11 cut contained a small number of targets and were therefore subject to small number statistics; we checked that targets in those sectors were drawn from plates with high redshift success rates.

In DR9-11 we searched for redshift failure neighbours to upweight only in the same sector; in the DR12 catalogue we only consider neighbours observed on the same plate (which spans multiple sectors) and same date. This restricts the neighbour search to galaxies observed under approximately the same conditions, and means the weighted number of classified objects in each sector matches the number of targets. In the majority of cases, the closest neighbour in the same sector and the closest neighbour on the same plate and date are the same object. The median angular separation between galaxies without a good redshift and their closest neighbour using the updated algorithm is $3.7^\prime$ ($3.9^\prime$) in the north (south), compared with $2.9^\prime$ using the sector-based algorithm. Total counts of redshift failures for CMASS and LOWZ galaxies are listed in Table 2.
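The same-plate, same-date neighbour search can be sketched as follows. This Python fragment is an illustration only: the argument names (`plate`, `mjd`, `good_z`) are hypothetical stand-ins for the corresponding catalogue quantities, and the small-angle separation ignores right ascension wrap-around.

```python
import numpy as np


def redshift_failure_weights(ra, dec, plate, mjd, good_z, w_cp):
    """Transfer the weight of each redshift failure to its nearest good neighbour.

    ra, dec : coordinates in degrees
    plate   : plate identifier of each spectrum (hypothetical field name)
    mjd     : observation date of each spectrum (hypothetical field name)
    good_z  : True where a good galaxy or star redshift was obtained
    w_cp    : fibre collision weight already carried by each object

    Returns w_noz (default 1). Entries for the failures themselves are left at 1
    and are meant to be dropped together with the failed objects.
    """
    ra = np.asarray(ra, dtype=float)
    dec = np.asarray(dec, dtype=float)
    plate = np.asarray(plate)
    mjd = np.asarray(mjd)
    good_z = np.asarray(good_z, dtype=bool)
    w_cp = np.asarray(w_cp, dtype=float)

    w_noz = np.ones(ra.size)
    for i in np.flatnonzero(~good_z):
        # Only neighbours observed on the same plate and date are eligible.
        candidates = np.flatnonzero(good_z & (plate == plate[i]) & (mjd == mjd[i]))
        if candidates.size == 0:
            continue
        dra = (ra[candidates] - ra[i]) * np.cos(np.radians(dec[i]))
        ddec = dec[candidates] - dec[i]
        nearest = candidates[np.argmin(dra**2 + ddec**2)]
        # The failure's full weight, including any collision weight, is transferred.
        w_noz[nearest] += w_cp[i]
    return w_noz
```

When every failure finds a neighbour, summing $w_{\rm cp} + w_{\rm noz} - 1$ over the objects with good redshifts recovers the summed $w_{\rm cp}$ of all objects, so the transfer conserves the weighted target count.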

In CMASS, redshift failures are more likely to occur on faint targets - see Fig. 7. In the weighting scheme described above the neighbouring, up-weighted, galaxies are drawn from the distribution of observed galaxies, which in turn are brighter on average than the galaxies that failed to yield a good redshift. Given the slight correlation of with redshift, this introduces a small redshift-dependent bias on the LSS catalogues. To ameliorate this effect, we modify the redshift-failure weights such that the weighted distribution of of the corrective weight matches the of the targets with failed redshifts. In practice, acknowledging that an up-weighted galaxy might be a neighbour to more than one redshift failure, we compute , where with and corresponding to the green and red lines of Fig. 7 respectively. To avoid being dominated by Poisson noise in any given bin of , we set for any bin where or are less than ten. The weights are normalised such that . This scheme effectively transfers weight from bright to faint neighbours of redshift-failure weights. We only apply this extra correction to CMASS for two reasons: firstly the LOWZ redshift-failure rate is very small ( and, secondly, we find no significant dependence of redshift failure with for LOWZ targets.

6.4 Angular Systematic Weights

For the DR12 data we follow the same approach as described in Ross et al. (2012) and updated in Anderson et al. (2014b) to remove non-cosmological fluctuations in CMASS target density with stellar density and seeing. The LOWZ targets are brighter and do not show significant variations with these quantities, so LOWZ targets do not require these weights.

In DR12 we update the HEALPix stellar density map to include all stars with $i$-band magnitudes between 17.5 and 19.9; the map used in DR10/DR11 did not impose the 17.5 bright cut. The two maps also differ by a factor of the pixel area, 0.210 deg$^2$. The functional form for the stellar density weight was also updated in DR12 to be the inverse of a linear relation:


while in DR10/DR11 was linearly dependent on ; see Ross et al. (2015) for details. These two differences explain the changes to the values of the and parameters between DR10/DR11 and DR12. The DR12 parameter values for , determined using all galaxies in the CMASS catalog with , are and , computed in 0.3 magnitude width bins centred at [20.45, 20.75, 21.05, 21.35], as in Anderson et al. (2014b). The parameter is determined for each galaxy by first linearly interpolating the and fits to derive a value at each galaxy’s , and then using Eq. 45. The distribution of weight values is similar in the NGC and SGC and, overall, 93% of CMASS galaxies have .
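To make the interpolate-then-invert procedure concrete, the following Python sketch computes a stellar density weight as the inverse of a linear relation whose two coefficients are interpolated between magnitude bins. The coefficient arrays `A_FIT` and `B_FIT` hold made-up placeholder numbers, not the DR12 fit values, and the names of the function and its arguments are assumptions made for illustration.

```python
import numpy as np

# Bin centres quoted in the text; the coefficients below are HYPOTHETICAL
# placeholders chosen only so that the example runs, not the DR12 values.
BIN_CENTRES = np.array([20.45, 20.75, 21.05, 21.35])
A_FIT = np.array([1.02, 1.03, 1.05, 1.08])
B_FIT = np.array([-1.0e-5, -2.0e-5, -3.0e-5, -5.0e-5])


def stellar_density_weight(n_star, magnitude):
    """Inverse of a linear relation in stellar density.

    n_star    : stellar density at the galaxy position (map units)
    magnitude : galaxy magnitude used to select the fit coefficients

    The coefficients are linearly interpolated between bin centres; np.interp
    holds the end values constant outside the tabulated range.
    """
    a = np.interp(magnitude, BIN_CENTRES, A_FIT)
    b = np.interp(magnitude, BIN_CENTRES, B_FIT)
    return 1.0 / (a + b * np.asarray(n_star, dtype=float))
```

With a negative slope, as in the placeholder values, the weight grows with stellar density, compensating for targets lost in crowded regions.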

For DR10/DR11 analyses, a map of the DR8 -band seeing, , was created by taking the mean seeing value within HEALPix pixels with over the primary SDSS galaxies in the DR8 Catalogue Archive Server. For DR12, we instead directly query the imaging data to determine the conditions estimated for each galaxy’s parent imaging field. Per-object and per-field seeing estimates are calculated differently. Empirically, these two methods for determining differ by a factor of . There is also scatter between per-field and per-object estimates of sky flux and airmass. The DR12 galaxy and random catalogues contain fields for ’PSF_FWHM’, ’AIRMASS’, ’SKYFLUX’, ’EB_MINUS_V’, and ’IMAGE_DEPTH’ if users want to further explore systematics relationships. In what follows, the -band seeing . For DR12 we adopted a slightly different parameter convention from that of earlier catalogues:


In addition, we fit the systematic relationship separately for the NGC and SGC, again restricting the fits to objects in the CMASS LSS catalogues with . The DR12 parameter values are 0.5205 (0.5344), 2.844 (2.267), and 1.236 (0.906) for the NGC and SGC, respectively. In DR10/DR11 we also set ; this action is no longer necessary since the DR12 veto masks remove all area with .

Finally, the application of the CMASS -band star/galaxy separation cut in the LOWZE3 sample induced a significant dependence on the sample number density with that varies with the -band model magnitude; see Ross et al. (2015) for details. The systematic weight for this sample is


with parameters and , fit using all objects in the LOWZE3 catalogue with , including objects in chunks in addition to the LOWZE3 targeted region, chunks 3-6.

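The total angular systematic weight is simply the product of the stellar density and seeing weights, $w_{\rm star}$ and $w_{\rm see}$, for each object with index $i$:

$w_{{\rm systot},i} = w_{{\rm star},i}\, w_{{\rm see},i}$.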


6.5 Total Galaxy Weights

Finally, we combine the angular systematics weight $w_{\rm systot}$ with the fibre collision ($w_{\rm cp}$) and redshift failure ($w_{\rm noz}$) nearest neighbour weights to produce a final weight for each object in the final catalogue:

$w_{{\rm tot},i} = w_{{\rm systot},i}\,(w_{{\rm cp},i} + w_{{\rm noz},i} - 1)$.    (50)

Since the default values of both $w_{\rm cp}$ and $w_{\rm noz}$ are 1, the term in parentheses conserves the total number of galaxy targets. This is the galaxy weighting consistent with the construction of the LSS catalogues provided, and must be used to obtain unbiased estimates of the galaxy density field, since this weight is used when assigning the random galaxy redshifts; see Sec. 5.2.
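A short Python sketch shows the number-conserving property of the term in parentheses. It assumes the combination $w_{\rm systot}(w_{\rm cp} + w_{\rm noz} - 1)$, which is how we read the description above; the function name is hypothetical.

```python
import numpy as np


def total_weight(w_systot, w_cp, w_noz):
    """Total galaxy weight: systematics weight times (w_cp + w_noz - 1)."""
    w_systot = np.asarray(w_systot, dtype=float)
    w_cp = np.asarray(w_cp, dtype=float)
    w_noz = np.asarray(w_noz, dtype=float)
    return w_systot * (w_cp + w_noz - 1.0)


# Three galaxies with good redshifts standing in for five targets:
# the second carries one fibre-collided neighbour (w_cp = 2) and the
# third carries one redshift failure (w_noz = 2).
w_example = total_weight(1.0, [1.0, 2.0, 1.0], [1.0, 1.0, 2.0])
```

With unit systematics weights, the example weights sum to five, the number of targets represented by the three galaxies.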

6.6 Angular Density and Redshift Distribution

We estimate the angular density of galaxy targets as the total number of targets within the final LSS mask divided by the total non-vetoed area within the sample LSS mask. The values for each target class are listed in the final line of Table 2. We convert this angular target density into a three-dimensional space density through a properly normalised redshift probability distribution:


where we sum over all objects in the catalogue with good spectroscopic redshifts, and $w_{{\rm tot},i}$ is the total weight assigned to target $i$ to account for various observational artefacts (Eq. 50). The inclusion of $w_{\rm tot}$ in the estimate for the redshift distribution accounts for any impact of the angular systematics on the (normalised) redshift distribution, through e.g., the magnitude dependence of the stellar density weights. However, our estimator for the angular target density does not recover the true target density in the absence of stars and imperfect seeing, but an average target density over the survey footprint. Finally, we use the fiducial cosmology to determine the number of targets per $h^{-3}$Mpc$^3$. The result is shown in Fig. 11 for all four target classes, as well as the sum of the CMASS and LOWZ sample number densities (with duplicate CMASS and LOWZ targets counted only once). The CMASS+LOWZ number density reaches a local minimum in the overlap region between the two samples. As reported in the previous sections, survey incompleteness, fibre collisions, redshift failures, and stars in the target sample all reduce the average angular density of good galaxy redshifts compared to the angular target density; their aggregate impact is a 10% (4.4%) reduction for CMASS (LOWZ). Finally, we compute the effective volume $V_{\rm eff}$, which quantifies the reach of a sample for making cosmological measurements, for the CMASS and LOWZ samples following the same algorithm outlined in Anderson et al. (2014b), summing over 200 redshift shells


where is the volume of the shell at , and we assume that Mpc, which we have changed since DR11, so the numbers are not directly comparable to Anderson et al. (2014b). We find  Gpc for CMASS and  Gpc for LOWZ.
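The shell sum can be sketched in Python as below. The sketch assumes the standard effective-volume form, in which each shell volume is multiplied by $[n P_0 / (1 + n P_0)]^2$ with $n$ the number density in the shell and $P_0$ the assumed power spectrum amplitude; the function and argument names are hypothetical, and the value of $P_0$ is left to the caller.

```python
import numpy as np


def effective_volume(nbar, shell_volume, p0):
    """Effective volume summed over redshift shells.

    nbar         : number density in each shell
    shell_volume : comoving volume of each shell (same length as nbar)
    p0           : assumed power spectrum amplitude, in units inverse to nbar
    """
    x = np.asarray(nbar, dtype=float) * p0
    return np.sum((x / (1.0 + x)) ** 2 * np.asarray(shell_volume, dtype=float))
```

Shells with $n P_0 \gg 1$ contribute almost their full volume, while shot-noise dominated shells with $n P_0 \ll 1$ contribute very little.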

Figure 11: Number density of all four target classes assuming our fiducial cosmology with , along with the sum of the CMASS and LOWZ number densities (black).

6.7 FKP weights

Feldman, Kaiser & Peacock (1994), hereafter FKP, showed that the optimal weighting of galaxies as a function of redshift depends on the number density of galaxy tracers. The optimal weight depends on the amplitude of the power spectrum in the power spectrum bin of interest. In practice, we use the same value  hMpc to estimate both the power spectrum and correlation function on all scales. This value of corresponds to the observed power spectrum at  hMpc. The field ‘WEIGHT_FKP’ in the DR12 galaxy and random catalogues is given by


$w_{\rm FKP}(z) = 1 / [1 + n(z) P_0]$

for an object with redshift $z$, where $P_0$ is the power spectrum amplitude adopted above and $n(z)$ is computed by linear interpolation over redshift bins. The weight is optional in LSS analyses. To use these weights in a large scale structure analysis, one must weight both data and random objects; the final weight of galaxy $i$ is therefore $w_{{\rm FKP},i}\, w_{{\rm tot},i}$ and the final weight of random object $j$ is $w_{{\rm FKP},j}$. If one does not use the FKP weights (e.g., as in Reid et al., 2014), consistent weightings of the galaxy and random catalogues are $w_{\rm tot}$ and 1, respectively.
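A minimal Python sketch of the FKP weight evaluation follows, assuming the usual form $1/[1 + n(z) P_0]$ with $n(z)$ linearly interpolated from a tabulated redshift distribution. The function name, the tabulation arguments, and the choice of $P_0$ are assumptions of this sketch rather than catalogue conventions.

```python
import numpy as np


def fkp_weight(z, z_table, nbar_table, p0):
    """FKP weight 1 / (1 + n(z) * p0) for objects at redshift z.

    z          : redshifts of galaxies or randoms
    z_table    : increasing redshifts at which the number density is tabulated
    nbar_table : number density at each tabulated redshift
    p0         : assumed power spectrum amplitude, in units inverse to nbar
    """
    nbar = np.interp(z, z_table, nbar_table)  # linear interpolation in redshift
    return 1.0 / (1.0 + nbar * p0)
```

For a galaxy the returned value would be multiplied by its total weight, while for a random point it is used on its own, mirroring the data and random weightings described above.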

Earlier data releases adopted a different fiducial cosmology and assumed  hMpc to compute . Percival, Verde, & Peacock (2004) updated the analysis of Feldman, Kaiser & Peacock (1994) to a weighting scheme that accounts for luminosity-dependent clustering; such weights will be presented for the BOSS galaxy samples in a forthcoming BOSS team paper. However, because our target selection algorithm is so efficient at selecting massive galaxies, the gain provided by luminosity-dependent weights is modest for our sample.

7 Combined catalogue creation

For the purpose of providing a maximally contiguous three dimensional density field estimate, in DR12 we provide a new catalogue that combines the CMASS sample with the three lower redshift samples: LOWZE2 (chunk 2), LOWZE3 (chunks 3-6), and LOWZ (chunks $\geq 7$). See Appendix A for details of the LOWZE2 and LOWZE3 samples. A precise geometric description of the sky area covered by each sample is provided in Mangle mask format, constructed such that every sector included in the CMASS mask is included in exactly one of the LOWZE2, LOWZE3, or LOWZ footprints. We also construct two additional masks, one including the LOWZE2 + LOWZ sky coverage and another including the LOWZE3 + LOWZ sky coverage.

Using those masks, we first generate a LOWZE2 catalogue including chunk 2 and chunks $\geq 7$, and a LOWZE3 catalogue including chunks $\geq 3$, using the target selection algorithms detailed in Appendix A. This is possible since all the galaxies passing LOWZE2 and LOWZE3 cuts will also pass the LOWZ cuts. Producing a catalogue across a larger fraction of the sky allows a more accurate estimate of $n(z)$ for the LOWZE2 and LOWZE3 samples (and therefore a better means of assigning redshifts to the random galaxy sample). Without this step, the average density in chunk 2 and chunks 3-6 would be poorly determined and could lead to erroneous reconstruction flows towards or away from those regions in the final combined catalogues. As discussed in Sec. 6.4 and Ross et al. (2015), there is a significant correlation between seeing and LOWZE3 target density, which we remove using a systematic weight given by Eq. (48); LOWZE2 and LOWZ samples require no systematic weight corrections. We follow this same procedure, with some minor but important differences, when combining CMASS and LOWZ catalogues. After full footprint data and random catalogues are produced, we trim each catalogue back to its original targeted region (i.e., LOWZE2 in chunk 2, LOWZE3 in chunks 3-6, and LOWZ in chunks $\geq 7$) using the mutually exclusive masks discussed above.

Our algorithm to generate the combined catalogue from the four different samples (CMASS, LOWZ, LOWZE2, LOWZE3) is as follows:

  • Renormalize the CMASS galaxy systematic weights such that


    This ensures that in the combined catalogue, a CMASS target and a LOWZ target on average have equal weight in each of the three distinct regions. The functional forms chosen for the stellar density and seeing weights do not guarantee this normalisation. Fibre collision and redshift failure weights are left the same as in the original CMASS-only catalogue and the parameters for the systematic weights are identical to the ones in the CMASS-only catalogue (apart from the renormalisation).

  • For each of the LOWZ, LOWZE2, and LOWZE3 samples (“LOWZX”), read in the targets (including those in chunks $\geq 7$), and remove objects already in the CMASS catalogue. Duplicate targets are 2.6%, 2.4%, and 4.4% of the LOWZ, LOWZE2, and LOWZE3 samples, respectively. Fibre collision and redshift failure weights are then recomputed on each duplicate-excluded LOWZX sample. As in the previous catalogues, fibre collision and redshift failure weights are only assigned to other LOWZX targets (not CMASS targets). For the LOWZE3 sample, systematic weights are assigned using the same parameters as the LOWZE3-only sample, but renormalised as in Eq. 54.

  • Concatenate the CMASS and LOWZX samples and compute the completeness of the combined sample in each sector. The rest of the catalogue creation steps, i.e., random catalogue generation and $n(z)$ estimation, are identical to the algorithms used for the CMASS and LOWZ catalogues described previously.
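The three steps above can be sketched in Python as follows. This is an illustration rather than the mksample code: it assumes that the renormalisation sets the mean systematic weight to unity within each region (our reading of the requirement that CMASS and LOWZ targets carry equal weight on average), that targets are matched through a hypothetical unique integer identifier, and that each sample is stored as a dictionary of arrays.

```python
import numpy as np


def renormalise_systematic_weights(w_systot):
    """Rescale weights to unit mean (assumed form of the renormalisation).

    Intended to be applied separately within each of the three regions.
    """
    w = np.asarray(w_systot, dtype=float)
    return w / w.mean()


def not_in_cmass(lowzx_ids, cmass_ids):
    """Boolean mask selecting LOWZX targets that are not already in CMASS."""
    return ~np.isin(lowzx_ids, cmass_ids)


def concatenate_samples(cmass, lowzx):
    """Concatenate per-object arrays held in two dicts with identical keys."""
    return {key: np.concatenate([cmass[key], lowzx[key]]) for key in cmass}
```

Fibre collision and redshift failure weights would be recomputed on the duplicate-excluded LOWZX sample between the second and third steps, before the sector completeness of the combined sample is evaluated.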

When analysing the combined catalogue, as well as allowing for any evolution in the bias across the sample, one also has to consider the differential bias between LOWZ and CMASS samples. Although this is expected to be small due to the relatively benign transition from LOWZ to CMASS (Ross et al., 2015), a full exploration of this issue is left for a forthcoming BOSS team paper.

8 Discussion

The small statistical errors achievable on cosmological measurements from BOSS data require removal of potential systematic issues to an unprecedented level. Spectroscopic target selection and mask creation are key areas where systematic problems can be introduced if care is not taken to fully understand both. In this paper we have presented the target selection for the three primary spectroscopic galaxy catalogues within BOSS: LOWZ, CMASS and Sparse, and for variations on these used for some early data. Each sample has different sky coverage and expected redshift distribution.

We have also presented the methods used to turn the target catalogue and redshift measurement data into galaxy and random catalogues, which enable clustering measurements to be quickly made, as well as methods to mitigate potential systematics. Some analyses may be best done without the corrections provided: for example, it may be cleaner for small-scale clustering analyses not to apply the close-pair weights, but to correct in some other manner.
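As an illustration of how an analysis might opt out of one correction, the sketch below combines per-galaxy weights with an optional switch for the close-pair weight. It assumes the total weight is formed as w_tot = w_systot (w_cp + w_noz - 1); that combination, the function name and the `use_close_pair` flag are assumptions of this example and should be checked against the weight definitions and the released source code.

```python
def total_weight(w_systot, w_cp, w_noz, use_close_pair=True):
    """Combine per-galaxy weights into a single total weight.

    Assumed form: w_tot = w_systot * (w_cp + w_noz - 1).
    With use_close_pair=False the close-pair weight is reset to 1,
    leaving the close-pair correction to be handled in some other way.
    """
    if not use_close_pair:
        w_cp = 1.0
    return w_systot * (w_cp + w_noz - 1.0)


# A galaxy up-weighted for one fibre-collided neighbour (w_cp = 2).
w_with_cp = total_weight(1.0, 2.0, 1.0)
w_without_cp = total_weight(1.0, 2.0, 1.0, use_close_pair=False)
```

An analysis that drops the close-pair weights in this way still has to account for the collided objects by another method, as noted above.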

In addition to a number of improvements over the catalogue creation method used for the DR9, DR10 and DR11 samples, we have described how we have created a single BOSS catalogue, combining the CMASS and LOWZ samples. This allows us to include some extra galaxies, and maximise the effective volume covered by galaxies within BOSS. It also allows us to use a binning scheme in redshift different from those of CMASS and LOWZ, optimising our cosmological measurements.

The resulting galaxy and random catalogues, the largest in the world, are hosted at as well as supplemental catalogue and target information. In this final release, we also provide copies of our source code, mksample, to reproduce the DR10, DR11, and DR12 catalogues. The reader should consult the source code directly to resolve any ambiguities in our description here.

Next generation spectroscopic experiments, such as eBOSS (Dawson et al., 2015), DESI (Levi et al., 2013), HETDEX (Hill et al., 2008), 4MOST (de Jong et al., 2014), WEAVE (Dalton et al., 2014), PFS (Takada et al., 2014), Euclid (Laureijs et al., 2011) and WFIRST (Spergel et al., 2015), are expected to make cosmological measurements with precision either comparable to that of BOSS or higher by up to an order of magnitude, requiring a thorough understanding and extremely careful treatment of potential systematic effects. Although each of these future experiments has a different observing strategy, they will encounter challenges in the process of catalogue creation similar to those of BOSS (e.g. variation in the galaxy surface density due to Galactic extinction is an effect inherent to our observable Universe). The lessons learned from the catalogue creation method applied within BOSS, and described in this paper, will be of strong benefit for these future surveys.

9 Acknowledgements

Funding for SDSS-III has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, and the U.S. Department of Energy Office of Science. The SDSS-III web site is

SDSS-III is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS-III Collaboration including the University of Arizona, the Brazilian Participation Group, Brookhaven National Laboratory, Carnegie Mellon University, University of Florida, the French Participation Group, the German Participation Group, Harvard University, the Instituto de Astrofisica de Canarias, the Michigan State/Notre Dame/JINA Participation Group, Johns Hopkins University, Lawrence Berkeley National Laboratory, Max Planck Institute for Astrophysics, Max Planck Institute for Extraterrestrial Physics, New Mexico State University, New York University, Ohio State University, Pennsylvania State University, University of Portsmouth, Princeton University, the Spanish Participation Group, University of Tokyo, University of Utah, Vanderbilt University, University of Virginia, University of Washington, and Yale University.

We thank Eli Rykoff for providing the Bright Objects Mask described in Section 5.1.1.

The author order reflects the following: BR led the development of the mksample software package. She is followed in the author list by two alphabetical lists of scientists who provided major contributions to galaxy targeting and/or catalogue creation. Out of these, WJP is corresponding author as he took over the creation of this paper when BR left the field of astronomy. This list is followed by an alphabetical list of scientists who provided moderate contributions to these topics, and who contributed to the BOSS project as a whole.


Appendix A LOWZ Early selection algorithms

As the survey progressed, there were slight changes to the targeting pipeline. In some instances the newer algorithm was stricter than the one used in the past, so we simply apply the same cuts to the objects targeted earlier as well. One special case is the LOWZ targets in chunks 2-6. The star-galaxy separation algorithm for CMASS was erroneously applied to those galaxies as well, resulting in a drastic reduction in the target density. There are other differences, and so we define two algorithms: LOWZE2, applied to chunk 2, and LOWZE3, applied to chunks 3-6. In analyses thus far we have simply eliminated these early regions from our LOWZ catalogue, but we are actively pursuing a sufficient description of that population to robustly recover clustering measurements in those regions. Removing this area results in a 10% reduction in the LOWZ survey mask area. The distributions of the early chunks on the sky are shown in Fig. 12. Chunk 2 was commissioning data, and used the LOWZE2 version; chunks 3-6 used LOWZE3; and chunks 7-11 used older photometric reductions and a different version of resolve (see Section 2.1). Chunk 1 was used for very early commissioning runs and is not of sufficient uniformity to be used to create LSS catalogues. Chunk 1 is located at DEC=0 in the SGC footprint (commonly referred to as “Stripe 82”). This area was later reobserved with updated target selection as Chunk 11.

  • Chunk 2: The LOWZE2 sample had slightly different cuts and the CMASS -band star-galaxy separation cut was erroneously applied. The catalogue was later trimmed to as well. This selection yields a target density lower than the nominal LOWZ target sample.

  • Chunks 3-6: The LOWZE3 sample is the same as chunk 2 but with a stricter bound and both star-galaxy separation cuts. This selection yields a target density lower than the nominal LOWZ target sample.
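The correspondence between chunk number and LOWZ selection version described in this appendix can be summarised in a short Python sketch. The function name is hypothetical, and returning `None` for chunk 1 (which is not used for LSS catalogues) is a convention of this example rather than of mksample. Chunks 7 and above are labelled here with the nominal LOWZ selection, although chunks 7-11 differ in the photometric reductions used.

```python
def lowz_selection_for_chunk(chunk):
    """Return the LOWZ selection label for a BOSS chunk number.

    None marks chunk 1, which is too non-uniform for LSS catalogues.
    """
    if chunk < 1:
        raise ValueError("chunk numbers start at 1")
    if chunk == 1:
        return None        # very early commissioning runs, excluded
    if chunk == 2:
        return "LOWZE2"    # commissioning data
    if 3 <= chunk <= 6:
        return "LOWZE3"
    return "LOWZ"          # nominal selection from chunk 7 onwards


# Label the first eleven chunks.
labels = {chunk: lowz_selection_for_chunk(chunk) for chunk in range(1, 12)}
```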