Isochrone ages for $\sim3$ million stars with Gaia

Jason L. Sanders, Payel Das
Institute of Astronomy, University of Cambridge, Madingley Road, Cambridge, CB3 0HA, UK
Rudolf Peierls Centre for Theoretical Physics, University of Oxford, OX1 3NP, UK
E-mail: jls@ast.cam.ac.uk (JLS)

We present a catalogue of distances, masses and ages for $\sim3$ million stars observed by Gaia with spectroscopic parameters available from the large spectroscopic surveys: APOGEE, Gaia-ESO, GALAH, LAMOST, RAVE and SEGUE. We use a Bayesian framework to characterise the probability density functions of distance, mass and age using photometric, spectroscopic and astrometric information, supplemented with spectroscopic masses where available for giant stars. Furthermore, we provide posterior extinction estimates for every star using published extinction maps as a prior input. We provide an appendix with extinction coefficients for Gaia photometry derived from stellar models, which account for variation with intrinsic colour and total extinction. Our pipeline provides output estimates of the spectroscopic parameters, which can be used to inform improved spectroscopic analysis. We complement our catalogues with Galactocentric coordinates and actions with associated uncertainties. As a demonstration of the power of our catalogue, we produce velocity dispersion profiles of the disc separated by age and Galactocentric radius. These profiles suggest that the velocity dispersion flattens with radius in the outer Galaxy and that at all radii the velocity dispersion follows the smooth power law with age observed in the solar neighbourhood.

Galaxy: structure – stellar content – formation – kinematics and dynamics
pubyear: 2018

1 Introduction

The second Gaia data release (Gaia DR2; Gaia Collaboration et al., 2018a) heralds a revolution in the study of the Milky Way. Gaia (Gaia Collaboration et al., 2016) has provided parallaxes and proper motions for more than a billion Milky Way stars (Lindegren et al., 2018) as well as highly accurate multi-band photometry for many of these stars (Riello et al., 2018; Evans et al., 2018). For a subset of stars, radial velocities have been computed from the Gaia Radial Velocity Spectrometer (RVS) spectra (Cropper et al., 2018; Sartoretti et al., 2018). Whilst in its own right this dataset will lead to rapid advances in our knowledge of the structure of the Milky Way (Gaia Collaboration et al., 2018b), when complemented with other photometric surveys and large-scale spectroscopic surveys, Gaia is the perfect tool for Galactic archaeology studies.

The primary goal of Galactic archaeology is to construct a census of the structure of the Milky Way in spatial, kinematic and chemical space (Bland-Hawthorn & Gerhard, 2016). Differences in the chemodynamical structure of populations reflect both differences between the populations at their formation epochs and differences in their subsequent dynamical evolution. The secondary goal of Galactic archaeology is to use this census to infer the properties of the Galaxy at different epochs of star formation and hence the evolutionary history of the Galaxy. This must necessarily be informed by Milky-Way-like simulations and models. To disentangle the impacts of dynamical evolution from the formation environment in this way, we require the rich, detailed and extended picture of our Galaxy provided by the synergy of Gaia with large-scale spectroscopic surveys. One question of interest, for example, is whether the correlation between stellar age and velocity dispersion is a consequence of older stars being born hotter, or of stars heating over time.

Before Gaia, reliable kinematics (from parallaxes) were only available in the solar neighbourhood. Early extended studies of the Milky Way were limited to using large-scale photometric surveys (e.g. 2MASS, SDSS; Skrutskie et al., 2006; Abolfathi et al., 2017). In recent years, large-scale spectroscopic surveys have been the dominant tool in furthering the study of the Milky Way. Many spectroscopic surveys (APOGEE, LAMOST, RAVE, GALAH, Gaia-ESO, SEGUE) have been designed with the intention of complementing the astrometry from Gaia with radial velocity information as well as detailed chemical abundances. With full phase-space coordinates, the dynamical structure of the Galaxy well beyond the solar neighbourhood can now be mapped in detail. Future surveys such as WEAVE, 4MOST, Milky Way Mapper and MOONS will further complement the Gaia data and provide a more complete picture of the populations of the Milky Way.

The combination of spectroscopy, photometry and astrometry allows accurate characterization of stellar properties. Given a set of stellar models, this data combination can give accurate measurements of the distance, mass, age and other spectroscopic properties of the stars. In time, the Gaia data will be used to produce improvements in stellar models. However, a first step is to inspect the properties of the stars given the currently available models. Measuring distances from photometry and spectroscopy via sets of stellar isochrones has been developed over a number of years (Pont & Eyer, 2004; Jørgensen & Lindegren, 2005; Burnett & Binney, 2010; Binney et al., 2014) and has been widely applied to current spectroscopic surveys (e.g. Queiroz et al., 2018; Mints & Hekker, 2018). Utilising additional information from parallaxes (e.g. McMillan et al., 2017) and/or masses (Das & Sanders, 2018) is a natural extension. We highlight two synergies of the methodology. For a typical star with , Gaia DR2 provides a typical parallax uncertainty of giving parallaxes accurate to out to . For more distant giants observed by spectroscopic surveys, spectro-photometric distances can be superior. The combination of spectro-photometry and astrometric information is particularly powerful (McMillan et al., 2017) – for instance, Gaia parallaxes can cleanly distinguish between nearby dwarfs and distant giants, producing improved spectroscopic parameter estimates. These estimates are useful initial guesses for improved analyses of the spectra.

The real advantage, however, of comparing astrometry and spectro-photometry to sets of stellar models is measuring stellar ages. The age of a star is a fundamental parameter in understanding the formation and evolution of the Galaxy, but, except in a few limited cases, can only be inferred via models (Soderblom, 2010). The most reliable age estimates are available for turn-off stars as a star’s initial mass determines the time at which it leaves the main sequence. However, even with parallaxes from Gaia there is a strong degeneracy between metallicity and age that must be broken with the use of spectroscopic metallicities (Howes et al., 2018). Other populations of stars (e.g. lower main sequence dwarfs and giants) are much more difficult to accurately date due to only subtle differences in their observed spectro-photometric properties. However, recent theoretical and empirical work (Masseron & Gilmore, 2015; Martig et al., 2016) has demonstrated that there are clear spectroscopic indicators of a giant star’s mass (carbon and nitrogen atomic and molecular lines) and hence age. When further combined with information on a giant’s luminosity from Gaia parallaxes, it is also possible to accurately date giant stars. Das & Sanders (2018) have recently demonstrated the power of including masses for giant stars in the standard isochrone pipeline.

In this paper, we provide distances, masses, ages, extinctions and spectroscopic parameters (effective temperature, surface gravity and metallicity) for $\sim3$ million stars from spectroscopic surveys combined with the Gaia data. In Section 2 we describe the method employed, focussing on details of the extinction priors and extinction law, the scheme for estimating masses from spectroscopic parameters (‘spectroscopic mass estimates’) and the adopted Milky Way prior. In Section 3, we describe the datasets to which we have applied our algorithm, along with the Gaia data employed. In Section 4, we discuss the results of our procedure, concentrating on the quality of the catalogue as well as indicating the possible chemo-dynamical studies that such a catalogue makes possible. We present our conclusions in Section 5.

2 Method

We lay out the framework used to construct a probability density function for a star’s distance. We assume the stars are single (see Coronado et al., 2018, for handling binary stars). We first present the framework using the isochrones in this subsection, discuss extinction in Section 2.1, discuss complementing our data with spectroscopic mass estimates in Section 2.2, present our choice of Galaxy prior in Section 2.3 and discuss the outputs of our approach in Section 2.4.

We use the Bayesian method presented in Burnett & Binney (2010). We assess the probability of the th star being at a distance (with distance modulus ) given the data , which consists of observed spectroscopic data , photometric data , astrometric data (Galactic coordinates and parallax ) and in some cases mass estimates ( from spectroscopic mass estimators calibrated with asteroseismology) with associated uncertainties. The uncertainties in the spectroscopic parameters are given by the spectroscopic parameter covariance matrix . The uncertainties in the photometry are assumed to be uncorrelated and given by , whilst the parallax and mass have associated uncertainties and respectively. We drop the subscript in the following, for clarity. The resulting pdf is given by


where the integral is performed over a set of isochrones indexed by metallicity and log-age . We work with a set of PARSEC isochrones (v1.2S, which excludes the thermally pulsing AGB phase; Bressan et al., 2012; Chen et al., 2014; Tang et al., 2014; Chen et al., 2015) spaced by in and in (up to a maximum age of ). Each isochrone gives a set of observed properties as a function of the initial mass . The integral over isochrones can therefore be written as


where is the volume occupied by each isochrone point . Given an isochrone point and a distance , the model spectroscopic ), photometric (absolute magnitudes ) properties and current mass () may be computed and compared with the observed properties given the reported errors. We write


and expand the first term as


to marginalize over the unknown -band extinction . The four likelihood terms are then given by


where we have introduced the notation for a Gaussian with mean and standard deviation , and is the extinction in the th photometric band.
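As a rough illustration of how Gaussian likelihood terms of this kind combine for a single isochrone point, distance modulus and extinction, the following minimal sketch sums independent log-likelihoods for the spectroscopic, photometric, parallax and mass constraints. It is not the pipeline code: all function names, dictionary keys and numerical values are hypothetical, and the spectroscopic covariance matrix is taken to be diagonal for brevity.

```python
import math

def log_gauss(x, mean, sigma):
    """Log of a normalised 1D Gaussian evaluated at x."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(obs, model, dist_mod, a_ref, ext_coeffs):
    """Sum Gaussian log-likelihood terms for one isochrone point.

    obs        : observed (value, sigma) pairs (hypothetical layout)
    model      : model values at the isochrone point
    a_ref      : extinction in the reference band
    ext_coeffs : per-band coefficients scaling a_ref to each band
    """
    logl = 0.0
    # Spectroscopic terms (diagonal covariance assumed in this sketch)
    for key in ("teff", "logg", "met"):
        val, sig = obs[key]
        logl += log_gauss(val, model[key], sig)
    # Photometric terms: apparent = absolute + distance modulus + extinction
    for band, (mag, sig) in obs["mags"].items():
        predicted = model["abs_mags"][band] + dist_mod + ext_coeffs[band] * a_ref
        logl += log_gauss(mag, predicted, sig)
    # Parallax term: convert distance modulus to distance in kpc, parallax in mas
    dist_kpc = 10.0 ** (0.2 * dist_mod - 2.0)
    plx, plx_sig = obs["parallax"]
    logl += log_gauss(plx, 1.0 / dist_kpc, plx_sig)
    # Mass term, only when a spectroscopic mass estimate is available
    if "mass" in obs:
        mass, mass_sig = obs["mass"]
        logl += log_gauss(mass, model["mass"], mass_sig)
    return logl
```

In a full calculation this quantity would be exponentiated, multiplied by the priors and summed over isochrone points and extinction.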

2.1 Extinction

When using photometry in our pipeline, we have to correct for the line-of-sight extinction. We choose to characterise this by the parameter , the extinction in the Johnson band. The total extinction is related to the selective extinction by . Therefore, the extinction coefficient for the band, , is dependent on the assumed extinction law. For photometric bands other than the extinction is given by so is simply related to our parameter via . The set of coefficients describe the adopted extinction law whilst scales the total extinction. We first describe our choice of before discussing priors on .

2.1.1 Extinction coefficients

Here we briefly describe the adopted extinction coefficients, . Full details are given in Appendix A. We use the extinction curve (total extinction at wavelength ) from Schlafly et al. (2016) calibrated using APOGEE data. We scale to the units adopted in the extinction maps of Green et al. (2018, is slightly different from ). Therefore, the provided coefficients can be used in combination with the Green et al. (2018) extinction map to find the extinction in band as .

Using the stellar model with and from Castelli & Kurucz (2004), we compute for all photometric bands of interest (SDSS , Pan-STARRS , 2MASS , APASS , WISE , Gaia ) by integrating the photometric band response over the spectrum. For the broad Gaia band we consider variation of with effective temperature by tabulating over the full range of Castelli & Kurucz (2004) models. In Appendix A, we provide expressions for as a function of effective temperature and intrinsic colours in different bands.

There is also variation of with . We evaluate equation (19) at a range of monochromatic extinctions and measure the gradient . The results are provided in Appendix A. We do not consider variation of the extinction law (characterised by the parameter ) despite Schlafly et al. (2016) demonstrating that there is variation of across the APOGEE fields.

2.1.2 Extinction prior

Our prior uses extinction measurements from a combination of three extinction maps (cf. Bovy et al., 2016). We preferentially use the extinction maps from Green et al. (2018, using the dustmaps bayestar interface). For each star that falls in the Pan-STARRS footprint, we draw samples of the extinction (in the bayestar 2017 units) at a set of discrete distances along the line-of-sight. At each distance, we use the coefficients from the previous section to compute the mean and the standard deviation . For any star that falls outside the Pan-STARRS footprint (which only extends to ), we next attempt to use the extinction map from Marshall et al. (2006) which is confined to and and provides mean extinction and its uncertainty. For each star, we use from the previous section to find and on a grid in distance. Finally, where neither of these maps is available we use the 3D extinction map (expressed in ) from Drimmel et al. (2003) using the interface from Bovy et al. (2016). Again we tabulate on a grid in distance and assume uncertainty in .
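The tabulation of the prior along a line of sight can be sketched as follows, assuming that the prior is characterised by the mean and standard deviation of the log-extinction at each distance. The function name, the coefficient and the sample values are invented for illustration, and no call to the dustmaps package is made here.

```python
import math
import statistics

def log_extinction_moments(samples_by_distance, coeff):
    """Return (mean, std) of ln(extinction) at each distance on a grid.

    samples_by_distance : one list of map samples per grid distance
    coeff               : hypothetical factor converting map units to the
                          reference band
    """
    moments = []
    for samples in samples_by_distance:
        # Non-positive samples have no logarithm and are skipped
        ln_a = [math.log(coeff * s) for s in samples if s > 0.0]
        moments.append((statistics.fmean(ln_a), statistics.stdev(ln_a)))
    return moments

# Two grid distances with a handful of invented samples each
grid_moments = log_extinction_moments(
    [[0.10, 0.12, 0.09, 0.11], [0.30, 0.36, 0.28, 0.33]], coeff=2.7)
```

Working in the logarithm keeps the prior positive-definite in the extinction itself, which is one reason a log-normal form is a common choice.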

2.2 Mass estimates

For giant stars, a mass measurement is a near direct measurement of age. Asteroseismology provides an estimate of stellar mass given measurements of the frequency spectrum of a star’s oscillations. However, such observations require high quality photometry over a long time baseline so are limited to small subsets of stars confined to limited regions of the Galaxy (e.g. the Kepler field). Recent work (Masseron & Gilmore, 2015; Martig et al., 2016; Ness et al., 2016) has demonstrated empirically that the [C/N] ratio in giant stars is an indicator of mass. Theoretical expectation (Charbonnel, 1994) is that both the equilibrium position of core CNO burning and strength of dredge-up are functions of stellar mass. Higher mass stars produce more nitrogen than carbon leading to a suppression in [C/N]. We wish to use the carbon and nitrogen abundances as constraints within our framework by converting these spectroscopic parameters into estimates of the stellar mass.

We adopt the procedure in Das & Sanders (2018, DS18) for relating spectroscopic parameters to mass. This involves constructing a Bayesian artificial neural network that maps the input spectroscopic parameters (DS18 use ) and their uncertainties to the mass and its uncertainty . We scale both the input and output parameters to approximate unit Gaussians. The neural network architecture presented here differs slightly from that in DS18. Here the neural network contains two hidden layers (with 24 hidden nodes each) rather than one. Although this is a more complex architecture, marginalizing over the model parameters does not result in significant over-fitting. We again assume each layer (except the output) uses a sigmoid function


The weights are matrices and the biases vectors. To find the posterior distributions on the neural network parameters , we evaluate


The likelihood term is given by


and we choose normal priors on with zero mean and standard deviations given by hyperparameters – one for each of the six parameters . The inclusion of these hyperparameters is a development from DS18 and is important as, although the inputs and outputs have been scaled to approximate normal distributions, the hidden layers can have a considerably larger dynamic range.
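The architecture described above (two sigmoid hidden layers of 24 nodes and a linear output, giving three weight matrices and three bias vectors) can be illustrated with a single forward pass for one draw of the parameters from zero-mean normal priors. This is a plain-Python sketch rather than the PyMC3 model: the number of inputs, the prior widths and the input values are assumptions made for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation=None):
    """One layer: out_j = act(sum_i inputs_i * weights[i][j] + biases[j])."""
    out = []
    for j, b in enumerate(biases):
        z = b + sum(x * w_row[j] for x, w_row in zip(inputs, weights))
        out.append(activation(z) if activation else z)
    return out

def init_layer(n_in, n_out, sigma, rng):
    """Draw a weight matrix and bias vector from a zero-mean normal prior."""
    w = [[rng.gauss(0.0, sigma) for _ in range(n_out)] for _ in range(n_in)]
    b = [rng.gauss(0.0, sigma) for _ in range(n_out)]
    return w, b

def forward(x, params):
    """Two sigmoid hidden layers followed by a linear output (scaled mass)."""
    (w0, b0), (w1, b1), (w2, b2) = params
    h0 = dense(x, w0, b0, sigmoid)
    h1 = dense(h0, w1, b1, sigmoid)
    return dense(h1, w2, b2)[0]

rng = random.Random(1)
n_inputs = 5  # hypothetical number of scaled spectroscopic inputs
params = [init_layer(n_inputs, 24, 1.0, rng),   # prior widths of 1.0 are
          init_layer(24, 24, 1.0, rng),         # placeholders for the
          init_layer(24, 1, 1.0, rng)]          # hyperparameters
scaled_mass = forward([0.1, -0.3, 0.5, 0.0, 1.2], params)
```

In the Bayesian treatment the six parameter arrays are not fixed but carry posterior distributions, so a prediction is an average of such forward passes over posterior draws.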

We implement this model in PyMC3 (Salvatier et al., 2016) and train using automatic differentiation variational inference (ADVI). As our training set we use the overlap of APOGEE DR14 (Abolfathi et al., 2017, described later) with asteroseismic results (Pinsonneault et al., 2014; Vrard et al., 2016). We preferentially work with the Vrard et al. (2016) sample and supplement with Pinsonneault et al. (2014) APOKASC results for those stars not in Vrard et al. (2016). We remove stars with ASPCAPFLAG , those without all elements of measured and duplicates. We further remove those stars that have been identified as rotating by Tayar et al. (2015).

For unseen data, the posterior distribution on the mass is given by


where the first term is a $\delta$-function involving . We generate samples from this pdf by using the trace output from PyMC3 for the model parameters and sampling from a Gaussian for the spectroscopic parameters. We reduce the resulting pdf to the first two moments: mean and standard deviation.
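The sampling step can be sketched as a double Monte Carlo loop over posterior draws of the model parameters and Gaussian draws of the inputs, followed by a reduction to the mean and standard deviation. To keep the example self-contained, a toy linear model stands in for the neural network and the list of parameter draws stands in for the PyMC3 trace; both are assumptions of this sketch.

```python
import random
import statistics

def predict_mass_moments(x, x_err, param_draws, model, rng, n_noise=10):
    """Mean and standard deviation of the mass over parameter and input draws."""
    masses = []
    for theta in param_draws:
        for _ in range(n_noise):
            # Sample the inputs from independent Gaussians
            x_s = [rng.gauss(v, e) for v, e in zip(x, x_err)]
            masses.append(model(x_s, theta))
    return statistics.fmean(masses), statistics.stdev(masses)

def toy_model(x, theta):
    """Stand-in for the network: mass = intercept + slope * first input."""
    slope, intercept = theta
    return intercept + slope * x[0]

rng = random.Random(42)
# Invented posterior draws of (slope, intercept)
draws = [(rng.gauss(-1.0, 0.05), rng.gauss(1.2, 0.02)) for _ in range(200)]
mean_mass, sigma_mass = predict_mass_moments([0.0], [0.1], draws, toy_model, rng)
```

The resulting standard deviation combines the spread from the input uncertainties with the spread from the model parameters, which is the sense in which the mass uncertainty reflects both.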

Here we have used ADVI to characterise the pdf whereas DS18 fully sampled from the pdf using the No-U-Turn sampler (NUTS, Hoffman & Gelman, 2011). The disadvantage of ADVI is that we fail to characterise the (potentially significant) correlations and multi-modality of the pdf (although highly isolated modes will not be well sampled by NUTS). Additionally, DS18 found the behaviour of NUTS to be smoother when varying the architecture. However, ADVI is considerably faster and in practice the results obtained are quite similar.

The advantage of employing this method is that we need not perform quality cuts on our samples (provided we trust the reported uncertainties), and the uncertainty in the mass estimate reflects both the input uncertainty and the model uncertainty. Modelling the training set permits some level of de-noising of the output masses. The output mass uncertainty for unseen data is comparable to (or better than) the mass uncertainty in the training set, significantly extending the power of asteroseismology to vast numbers of stars. Additionally, the flexibility of the model allows us to extrapolate into regions of spectroscopic parameter space that are sparsely populated, where our model becomes more uncertain. A disadvantage is that we are utilising spectroscopic information e.g. to constrain the mass and both mass and are then used in the distance pipeline. We are therefore using some spectroscopic information twice. However, using solely , and produces poor constraints on the mass, and most of the constraint on the mass comes from and which are not further utilised. The use of the other spectroscopic parameters can be thought of as weakly adjusting the relationship between and , and mass. Furthermore, we must assume that the stars within our asteroseismic sample are representative of the stars in our entire spectroscopic sample. Finally, we are required to use input spectroscopic parameters calibrated on the same scale as the training set.

2.3 Galaxy prior

We choose to adopt a non-uniform prior distribution in distance, age and metallicity reflecting the prior knowledge that the considered stars belong primarily to the Galactic disc. For this we introduce the 3D prior distribution with which we must include a Jacobian factor to relate to the prior on distance. Additionally, we use the Kroupa initial mass function (Kroupa et al., 1993) as a prior on the initial mass:


We decompose our Galaxy prior into multiple components


Following Binney et al. (2014), we use a three-component prior (thin disc, thick disc and halo), supplemented by a bulge prior (cf. Queiroz et al., 2018). As we are interested in producing reliable age estimates, we follow Queiroz et al. (2018) and adopt a smooth age prior (as opposed to the truncated age prior from Binney et al. (2014) that e.g. assigns zero probability to young ages for high latitude distant stars). Each component can be written in the separable form


We detail each component in turn. All age distributions are truncated at (the largest isochrone age we consider) and normalized appropriately.

  1. Thin disc: Double exponential profile (exponential in both and ) with scalelength and scaleheight normalized with local density (Bovy, 2017), gaussian distribution in metallicity (mean , standard deviation ), age distribution

  2. Thick disc: Double exponential profile with scalelength and scaleheight normalized with local density (Bland-Hawthorn & Gerhard, 2016; Bovy, 2017), gaussian distribution in metallicity (mean , standard deviation ), truncated gaussian in age (mean , standard deviation )

  3. Halo: spherical power-law with (Binney et al., 2014) normalized with local density (Bland-Hawthorn & Gerhard, 2016; Bovy, 2017), gaussian in metallicity (mean , standard deviation ), truncated gaussian in age (mean , standard deviation ),

  4. Bulge: Besançon density profile (Robin et al., 2012):


    where if . is cylindrical polar radius and primed coordinates are aligned with the bar (at an angle relative to , Simion et al., 2017). We normalize the profile such that the central density is (Robin et al., 2012), and we set and from Sharma et al. (2011) and all other parameters from Simion et al. (2017) S model fits to VVV data, gaussian in metallicity (mean , standard deviation ), truncated gaussian in age (mean , standard deviation ).

Importantly, we place the Sun at radius and height above the plane (Bland-Hawthorn & Gerhard, 2016).
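To make the separable form concrete, the sketch below evaluates a disc-like component as the product of a double-exponential density, a Gaussian in metallicity and an age term. Every numerical value (solar radius, scalelength, scaleheight, local density, metallicity mean and width) is a placeholder chosen for illustration and is not taken from the prior specification above; the function names are likewise hypothetical.

```python
import math

def double_exponential_density(R, z, R_sun, scalelength, scaleheight, local_density):
    """Density exponential in R and |z|, equal to local_density at (R_sun, 0)."""
    return local_density * math.exp(-(R - R_sun) / scalelength - abs(z) / scaleheight)

def gaussian_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def separable_prior(R, z, met, age_pdf_value, disc, met_mean, met_sigma):
    """Spatial density x metallicity Gaussian x age pdf for one component."""
    return (double_exponential_density(R, z, **disc)
            * gaussian_pdf(met, met_mean, met_sigma)
            * age_pdf_value)

# Placeholder parameters (kpc and arbitrary density units), for illustration only
disc = {"R_sun": 8.0, "scalelength": 2.5, "scaleheight": 0.3, "local_density": 1.0}
p_plane = separable_prior(8.0, 0.0, 0.0, 0.1, disc, met_mean=0.0, met_sigma=0.2)
p_above = separable_prior(8.0, 1.0, 0.0, 0.1, disc, met_mean=0.0, met_sigma=0.2)
```

The full prior would sum such components (thin disc, thick disc, halo and bulge), each with its own density profile and metallicity and age distributions.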

2.4 Outputs

With our specification, the full pdf of distance can be constructed using equation (1). In practice, we simply compute the first two moments of the distance distribution e.g.


For this we sum over a range in distance modulus at each isochrone point given by where is the uncertainty in the measured apparent magnitude and is the absolute magnitude of the isochrone point (we use or ). For speed, the extinction prior is only computed at the central value . Additionally, the sum in equation (2) is only performed over the isochrone points within standard deviations of the observed metallicity and then we only consider points within standard deviations of the reported and . We choose but increase this (iteratively by a factor of two) if there is no reported overlap with any isochrone points on a first pass. The range of integration for is found from evaluated at a first estimate of the distance neglecting extinction (using only infra-red photometry).
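The two ingredients of this paragraph, the reduction of a weighted grid to its first two moments and the iterative doubling of the selection window when no isochrone point overlaps the data, can be sketched as below. The dictionary layout, the function names and the cap on the number of doublings are assumptions of this example.

```python
import math

def weighted_moments(values, weights):
    """First two moments (mean, standard deviation) of a weighted sample."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)

def select_points(points, obs, n_sigma, max_doublings=5):
    """Keep isochrone points within n_sigma of every observed quantity,
    doubling n_sigma until at least one point overlaps (or giving up)."""
    for _ in range(max_doublings + 1):
        kept = [p for p in points
                if all(abs(p[k] - val) <= n_sigma * sig
                       for k, (val, sig) in obs.items())]
        if kept:
            return kept, n_sigma
        n_sigma *= 2.0
    return [], n_sigma

# Posterior weights on a small grid of distance moduli (invented numbers)
mean_mu, sigma_mu = weighted_moments([9.8, 10.0, 10.2], [0.2, 0.6, 0.2])
```

A star for which `select_points` returns an empty list corresponds to the failure case in which no isochrone point overlaps the data.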

In a similar fashion to equation (15), we can compute the moments of the log-age distribution for each star, and the covariance between distance modulus and log-age. We also compute the first and second moments of the effective temperature, surface gravity, metallicity, initial mass and the logarithm of the extinction.

The pipeline will fail if meaning there is no overlap of the data with any isochrone point within times the uncertainties. In this case, the star is flagged in the output catalogue (see Section 3.8). For each spectroscopic dataset we thin the isochrone grid in metallicity to the median metallicity uncertainty of the dataset. This results in some failures (see Section 2.4) which we reanalyse using the finest spacing of .

With full 6D data for the stars (on-sky position, distance, line-of-sight velocity and proper motion) we compute the Galactocentric velocity (using the peculiar solar velocity from Schönrich et al., 2010), as well as the action coordinates and guiding-centre radius (using the Stäckel fudge method, Binney, 2012; Sanders & Binney, 2016) in the potential of McMillan et al. (2017). We first draw samples from the multivariate distance modulus, line-of-sight velocity and equatorial proper motion distribution. We account for the covariance in the Gaia parallax and proper motions by first drawing samples using the Gaia measurements and then using rejection sampling to generate samples with acceptance proportional to where and are the mean and uncertainty in the distance modulus from our pipeline and and are the mean and standard deviation of the parallax reported by Gaia. For each sample we compute the derived quantities and report the mean and standard deviation as the best estimate and uncertainty (ignoring unbound samples – if no samples are bound, the resulting actions are undefined). Note that the uncertainty in the actions does not reflect the uncertainty in the solar position or the potential.
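The rejection step can be illustrated in one dimension. The sketch assumes one plausible reading of the acceptance rule, namely the ratio of a Gaussian in distance modulus (from the pipeline) to the Gaussian in parallax (from Gaia) from which the samples were drawn. In the full procedure the parallax would be drawn jointly with the proper motions from the Gaia covariance matrix; only the parallax is drawn here, and all names and numbers are hypothetical.

```python
import math
import random

def gauss_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def resample_parallaxes(plx_obs, plx_err, mu, mu_err, n_draw, rng):
    """Draw parallaxes (mas) from the Gaia Gaussian, then keep each draw with
    probability proportional to N(mu(draw) | mu, mu_err) / N(draw | plx_obs, plx_err)."""
    draws = [rng.gauss(plx_obs, plx_err) for _ in range(n_draw)]
    weights = []
    for p in draws:
        if p <= 0.0:
            weights.append(0.0)  # no finite distance modulus
            continue
        mu_s = 5.0 * math.log10(1.0 / p) + 10.0  # distance in kpc -> modulus
        weights.append(gauss_pdf(mu_s, mu, mu_err) / gauss_pdf(p, plx_obs, plx_err))
    w_max = max(weights)
    if w_max <= 0.0:
        return []
    # Normalise by the largest weight so acceptance probabilities are <= 1
    return [p for p, w in zip(draws, weights) if rng.random() < w / w_max]

rng = random.Random(0)
kept = resample_parallaxes(1.0, 0.2, 10.0, 0.1, 5000, rng)
```

Because the accepted samples retain their association with the other jointly drawn quantities, the correlations between parallax and proper motion carry through to the derived velocities and actions.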

3 Spectroscopic datasets

Figure 1: On-sky Galactic distribution of the stars processed by our pipeline, coloured by survey.

In this section we describe the spectroscopic datasets to which we apply our algorithm. We give a brief overview of each spectroscopic survey along with the specific data we use. In Figure 1 we show the on-sky distributions of the different surveys we use. For APOGEE, LAMOST and GALAH giants, we apply the spectroscopic mass estimator. For LAMOST, we first build a data-driven model of the spectra (the Cannon, Casey et al., 2016) to find spectroscopic parameters calibrated to the APOGEE scale. For GALAH we build a data-driven model of the spectroscopic labels directly using a neural network to put GALAH spectroscopic parameters on the APOGEE scale.

3.1 APOGEE

As part of SDSS IV (Blanton et al., 2017), the Apache Point Observatory Galactic Evolution Experiment (APOGEE) has targeted primarily stars confined to the disc and bulge with infrared spectroscopy. The North survey uses the APOGEE 300-fibre spectrograph (Wilson et al., 2010) on the Sloan 2.5m telescope (Gunn et al., 2006) at Apache Point Observatory. APOGEE spectra are taken in the band () with a resolution of . We use the APOGEE DR14 catalogue (Abolfathi et al., 2017) removing duplicates (retaining the highest signal-to-noise). We use the calibrated , and (García Pérez et al., 2016) along with the reported covariances, and the 2MASS photometry with uncertainties (Skrutskie et al., 2006). Although APOGEE was designed to target giant stars, there are many nearby dwarf stars. These stars do not have provided in DR14 as they differed significantly from stellar models, but and is reported. For all stars with valid and , but no we assign . Note, we do not perform any cuts on quality of spectroscopic parameters (flagged with ASPCAPFLAG) instead leaving this to be done at a later stage of analysis. We use the procedure of Section 2.2 to assign masses and uncertainties given the reported , and .

3.2 LAMOST

LAMOST (Large Sky Area Multi-Object Fiber Spectroscopic Telescope, Cui et al., 2012; Zhao et al., 2012) Experiment for Galactic Understanding and Exploration (LEGUE, Deng et al., 2012) is a low resolution () optical () spectroscopic survey designed to study the Milky Way disc and halo. From the third data release A, F, G, K catalogue (http://dr3.lamost.org/), we use the reported , and (as a proxy for metallicity) and their uncertainties (approximately million stars). We complement the catalogue with 2MASS photometry, filtering on quality (different photometric catalogues were used in the LAMOST target selection). We use photometry provided:

  1. the photometric quality flag ph_qual is A, B, C or D,

  2. the contamination flag cc_flg is .

  3. if the read flag rd_flg is (i.e. the magnitude is from profile fitting), X_psfchi (where X is the magnitude) must be less than .

When 2MASS photometry does not satisfy these requirements, we instead use Pan-STARRS photometry (, and ) along with or just , and when Pan-STARRS is not available.
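The per-band quality requirements listed above can be written as a simple predicate. The specific flag values and the threshold used as defaults below are placeholders, since they are not legible in the list above; the function name is also hypothetical.

```python
def use_2mass_band(ph_qual, cc_flg, rd_flg, psfchi,
                   clean_cc="0", profile_rd="2", max_psfchi=3.0):
    """Return True if a single 2MASS band passes the quality requirements.

    The defaults for clean_cc, profile_rd and max_psfchi are assumed
    placeholder values, not values taken from the text.
    """
    # 1. Photometric quality flag must be A, B, C or D
    if ph_qual not in ("A", "B", "C", "D"):
        return False
    # 2. Contamination flag must indicate a clean source
    if cc_flg != clean_cc:
        return False
    # 3. For profile-fit magnitudes, require a good fit
    if rd_flg == profile_rd and not psfchi < max_psfchi:
        return False
    return True

def choose_photometry(bands):
    """Use 2MASS only if every band passes; otherwise signal a fallback."""
    return "2MASS" if all(use_2mass_band(**b) for b in bands) else "fallback"
```

For example, `choose_photometry` returns `"fallback"` as soon as one band fails, which is where the alternative photometry described above would be substituted.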

3.2.1 Spectroscopic mass estimates

We further complement the LAMOST giants with mass measurements using a two-stage procedure. First, we apply the Cannon method of Ho et al. (2017) to a sample of stars cross-matched between APOGEE and LAMOST (with ASPCAPFLAG, C_M_FLAG, N_M_FLAG ). We repeat the analysis of Ho et al. (2017) as (i) we wish to use our framework of Section 2.2 which has been calibrated using updated APOGEE parameters and (ii) we are using LAMOST DR3 instead of DR2. We build a seven-label model (, , , , , , ) from the DR14 calibrated APOGEE parameters using the code of Casey et al. (2017; https://github.com/andycasey/AnniesLasso). All spectra are normalized using a smoothed version of the spectrum (with FWHM ). Unlike Ho et al. (2017), we do not use any photometry (resulting in poorer estimates of ). As Ho et al. (2017) found, there is a weak correlation between the output and line-of-sight velocity due to sky features in the spectrum. We do not correct for this by omitting regions contaminated by sky features as it does not seem a significant issue for measuring the mass (which is a weak function of ). We employ a leave--out scheme to measure the accuracy of our approach. The formal errors from the Cannon are significantly smaller than the scatter with respect to the test set. In a similar approach to Ho et al. (2017), we measure the scatter in label in bins of signal-to-noise and fit three-parameter functions of the form


We apply our model to all stars with and in the LAMOST catalogue and deem the parameters satisfactory if the reduced chi-squared and the results for , and lie within the training set. We assign uncertainties using the relation (of equation 16). The resulting sets of parameters (, , , , , ) are used in the mass estimator of Section 2.2 to obtain masses and uncertainties.

3.3 RAVE

The RAdial Velocity Experiment (RAVE, Steinmetz et al., 2006) was designed as a predecessor to the Gaia Radial Velocity Spectrometer (RVS). medium resolution () spectra were taken in the spectral range (around the Ca II triplet) on the 1.2 m UK Schmidt Telescope at the Australian Astronomical Observatory (AAO) using a multi-object spectrograph. With a targeting magnitude range of , RAVE has primarily observed local disc stars. Spectroscopic parameters were extracted from the spectra using two different pipelines. The first (denoted RAVE_DR5) utilises the parameters provided in the latest RAVE data release (DR5, Kunder et al., 2017) using the analysis method from Kordopatis et al. (2013), whilst the second (denoted RAVE_Cannon) utilises the Cannon results for the RAVE spectra from Casey et al. (2017). We remove duplicate observations retaining the higher signal-to-noise spectra. For both datasets, we use 2MASS photometry and associated uncertainties. For RAVE_Cannon we compute the metallicity from the reported combined with an inverse-variance-weighted estimate of using the formula from Salaris et al. (1993). We also use covariances between , and for this dataset.
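The conversion from an iron abundance and an alpha enhancement to a total metallicity can be sketched as follows. The coefficients are those commonly quoted for the Salaris et al. (1993) relation and are an assumption here, as they are not stated in the text; the function names and the example abundances are invented.

```python
import math

def salaris_metallicity(fe_h, alpha_fe):
    """[M/H] from [Fe/H] and [alpha/Fe], using the commonly quoted form
    [M/H] = [Fe/H] + log10(0.638 * 10**[alpha/Fe] + 0.362)."""
    return fe_h + math.log10(0.638 * 10.0 ** alpha_fe + 0.362)

def inverse_variance_mean(values, sigmas):
    """Inverse-variance-weighted mean and its uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, total ** -0.5

# Combine several invented alpha-element abundances, then form [M/H]
alpha_fe, alpha_err = inverse_variance_mean([0.10, 0.20, 0.15], [0.05, 0.10, 0.05])
met = salaris_metallicity(-0.5, alpha_fe)
```

For a solar-scaled mixture ([alpha/Fe] = 0) the correction term vanishes and the metallicity equals [Fe/H].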

3.4 GES

The Gaia-ESO survey (GES, Gilmore et al., 2012) is a public spectroscopic survey on the Very Large Telescope (VLT) utilising the FLAMES (Fiber Large Array Multi-Element Spectrograph) spectrograph (both medium resolution GIRAFFE spectra and high resolution UVES ). One of the goals of Gaia-ESO is to study the systematics in spectroscopic parameters by comparison of the results from multiple nodes and through a combination of field and cluster fields. The field sample primarily focusses on thick disc and halo stars. We use the reported , and (as a proxy for metallicity) with the associated errors from the DR3 public data release (https://www.gaia-eso.eu/data-products/public-data-releases/gaia-eso-survey-data-release-3). The target selection for field stars in GES was performed using VISTA photometry (Emerson et al., 2006; Dalton et al., 2006) so we preferentially use VISTA photometry with associated uncertainties and default to 2MASS where unavailable (we transform the 2MASS bands provided from the PARSEC isochrones to VISTA bands using the relations from http://casu.ast.cam.ac.uk/surveys-projects/vista/technical/photometric-properties).

Figure 2: Recovery of nitrogen abundance from the Bayesian neural network for 193 stars in the training sample (grey) and 44 stars in the unseen testing sample (cyan) from the overlap of APOGEE DR14 and GALAH DR2 data.

3.5 GALAH

GALactic Archaeology with HERMES (the High Efficiency and Resolution Multi-Element Spectrograph) (De Silva et al., 2015; Martell et al., 2017) is a medium resolution () optical (four windows) multi-fibre spectroscopic survey. It was designed to provide a rich set of chemical abundances for the purpose of chemical tagging distinct star formation events (Freeman & Bland-Hawthorn, 2002) and has a simple selection function () targeting primarily disc stars. The second GALAH data release (Buder et al., 2018) contains stars. The spectroscopic analysis is performed in two stages with a high-quality test-set analysed in detail with line synthesis which is then used as a training set for a Cannon model (Ness et al., 2016; Casey et al., 2016). As a result, there are stars that fall outside the training set producing unreliable results (flagged by flag_cannon). We ignore these stars resulting in a catalogue of . We use 2MASS magnitudes and compute metallicity from a combination of and using Salaris et al. (1993) (if is available, else we use ).

3.5.1 Spectroscopic mass estimates

We wish to complement the GALAH giant stars with spectroscopic mass estimates. We cannot employ the same procedure as we performed for LAMOST as the GALAH spectra are not publicly available. Instead, we build a model to put the reported spectroscopic parameters from GALAH onto the APOGEE scale. GALAH DR2 has measured from the spectra (Buder et al., 2018), but not . Without , the spectroscopic mass estimator is of limited power as the strongest correlation is between and mass. However, we can estimate the missing measurement from the other spectroscopic measurements. In particular, the conservation of total CNO in the CNO cycle produces a simple relationship between , and .

We take the 237 stars that lie in the overlap between APOGEE and GALAH (ASPCAPFLAG and flag_cannon = 0) and build a Bayesian neural network (with one hidden layer and 20 nodes) to relate the GALAH () to APOGEE (). The procedure is similar to that described in Section 2.2 except we use NUTS to find the posterior distributions of the neural network parameters on a training subsample of 193 stars, and use the remaining 44 test stars to assess the performance of the neural network.

Figure 2 shows how well the Bayesian neural network is able to predict the APOGEE DR14 nitrogen abundance from the GALAH DR2 spectral parameters for the training and testing samples. The model only starts failing for [N/M], beyond which the model underestimates the APOGEE DR14 nitrogen abundance. Nitrogen has a negative correlation with oxygen and carbon; however the five stars with [N/M] are found over the whole range of oxygen and carbon abundances. There may be a genuine physical reason underpinning this, but it’s difficult to look beyond the Poisson noise obviously affecting the estimates for higher nitrogen abundances.

3.6 Segue

The Sloan Extension for Galactic Understanding and Exploration (SEGUE) (Yanny et al., 2009) is a low resolution () optical spectroscopic survey designed to complement the SDSS catalogues with radial velocities. The survey was conducted in two key stages (SEGUE-1 and SEGUE-2) where SEGUE-2 focussed primarily on the outer halo of the Galaxy. Spectroscopic parameters have been computed by the SEGUE Stellar Parameter Pipeline (SSPP) (Lee et al., 2008a, b; Allende Prieto et al., 2008; Smolinski et al., 2011). We adopt the external error estimates from Lee et al. (2008a) of , and in , and respectively which we add in quadrature to the reported uncertainties. We take all SDSS DR12 stars observed in the programmes SEGUE, SEGUE-2 or SEGUE-faint with valid , and , that are science primary, have zwarning or and are flagged as normal (‘nnnnn’), resulting in a catalogue of stars. We use as a proxy for metallicity. We complement with the photometry from SDSS.
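The quadrature sum of reported and external uncertainties mentioned above can be sketched as follows. The function name and the numerical values are purely illustrative placeholders; they are not the external error estimates adopted from Lee et al. (2008a).

```python
import math


def add_in_quadrature(reported_err, external_err):
    """Combine two independent uncertainties: sqrt(a^2 + b^2)."""
    return math.hypot(reported_err, external_err)


# Placeholder numbers, for illustration only.
print(add_in_quadrature(30.0, 40.0))
```

This treats the internal (reported) and external error terms as independent, which is the usual assumption behind a quadrature sum.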

3.7 Complementary Gaia data

For each catalogue, we perform a radius cross-match to the second Gaia data release (Gaia Collaboration et al., 2018a) by utilising the Gaia proper motions and accounting for the respective epochs of the surveys (we assume the epoch of all the spectroscopic catalogue observations is 2000). From the Gaia DR2 source catalogue, we extract the parallax, proper motion and the uncertainty covariance matrix for the astrometry. Despite the reported global zero-point parallax offset of (Lindegren et al., 2018; Gaia Collaboration et al., 2018c), we use the reported parallax and uncertainty as is. For studying large-scale Galactic structure, the zero-point is only important for more distant stars where other uncertainties (e.g. in the choice of prior) are also significant. Additionally, the zero-point is also a strong function of on-sky location, magnitude and colour (Lindegren et al., 2018; Gaia Collaboration et al., 2018c; Riess et al., 2018) so a global offset will only fix some systematic issues. For the Gaia photometry we assume a systematic uncertainty floor in the photometry that reflects systematics in the photometry (Riello et al., 2018) as well as intrinsic uncertainty in the isochrones. We use the parallax and photometry as additional inputs in our pipeline. When the cross-match fails, we still process the stars without any Gaia data.
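A minimal sketch of the epoch correction used in such a cross-match is given below. It assumes a simple linear proper-motion propagation from the Gaia DR2 reference epoch (J2015.5) back to the assumed survey epoch of 2000, and a small-angle separation; the function names are hypothetical and the actual cross-match implementation may differ.

```python
import math

MAS_PER_DEG = 3.6e6  # milliarcseconds in one degree


def propagate_position(ra_deg, dec_deg, pmra_masyr, pmdec_masyr,
                       epoch_from=2015.5, epoch_to=2000.0):
    """Shift a position linearly along its proper motion.

    pmra_masyr is mu_alpha* = mu_alpha * cos(dec), as tabulated by Gaia,
    so it is divided by cos(dec) to obtain the change in right ascension.
    """
    dt = epoch_to - epoch_from  # years (negative when going back in time)
    dec_new = dec_deg + pmdec_masyr * dt / MAS_PER_DEG
    cos_dec = math.cos(math.radians(dec_deg))
    ra_new = ra_deg + pmra_masyr * dt / (MAS_PER_DEG * cos_dec)
    return ra_new % 360.0, dec_new


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle on-sky separation in arcseconds."""
    dra = (ra1 - ra2 + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    x = dra * math.cos(math.radians(0.5 * (dec1 + dec2)))
    return 3600.0 * math.hypot(x, dec1 - dec2)


# A high proper-motion star (illustrative values) moved back to epoch 2000.
ra0, dec0 = propagate_position(10.0, 0.0, 1000.0, -500.0)
print(separation_arcsec(10.0, 0.0, ra0, dec0))
```

For typical proper motions the shift over 15.5 years is well below an arcsecond, but for nearby high proper-motion stars it can exceed a fixed matching radius, which is why the correction matters.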

3.8 Output catalogue description

Figure 3: Spectroscopic HR diagram of stars assigned a young age by our pipeline: black contours show the distribution of stars with and red which are suspected binaries. The inset shows the distribution of shaded by the separation. The blue line shows a solar metallicity isochrone.

We provide a catalogue of the combined results from all surveys complete with the Gaia DR2 source_id for each entry (where available). The full catalogue is available at https://www.ast.cam.ac.uk/~jls/data/gaia_spectro.hdf5. For each survey, we include a unique identifier (APOGEE: APOGEE_ID, LAMOST: obsid, RAVE: raveid, GES: CNAME, GALAH: sobject_id, SEGUE: specobjid) and we provide a field survey with a string detailing which survey the entry comes from. Note that we have not removed duplicate stars that were observed by two separate surveys (e.g. RAVE and RAVE-On). The table is saved as an astropy table (The Astropy Collaboration et al., 2018) in HDF5 format, and can be read with from astropy.table import Table; data = Table.read('file'). A description of the columns and structure of the table is available in data.meta['COMMENT'].

We provide a flag entry in the table. If the isochrone pipeline has failed (e.g. there is no overlap between isochrones and data within times the uncertainties due to a bad cross-match), flag=1. This flag is also non-zero if there is a problem with the input spectroscopy (flag=2), photometry (flag=3), astrometry (flag=4) or mass (flag=5); no entries in the catalogue have flag=4 or flag=5. A small fraction () of the processed stars appeared to have overlap with only a single isochrone (point) leading to zero or undefined second moments (uncertainties). We flag these stars with flag=6 (see Table 1).

We found that a non-negligible fraction of processed red stars had small ages. In Fig. 3, we show the 1D distribution of which exhibits a clear peak at . We also display the - distribution of these stars which reveals that the pipeline has assigned these as pre-main-sequence objects. The assigned and for these stars form a clear sequence running alongside the main sequence, such that it is highly likely that many of these stars are binary. This is also evident in colour-magnitude space. We flag the stars for which and and with flag=7 (see Table 1). For more accurate parameters, these stars require a special pipeline such as that presented in Coronado et al. (2018).

Some stars appear in more than one survey, and some surveys have observed the same star more than once. We identify duplicates using the Gaia source IDs. For duplicates between surveys we keep the entry from the survey that comes first in the order: APOGEE, GALAH, GES, RAVE-ON, RAVE, LAMOST and SEGUE, and for duplicates within surveys we keep the result with the smallest radial velocity uncertainty. We provide a duplicated flag which is for the rejected duplicates and zero otherwise. Finally, we provide a best flag which is if flag=0, duplicated=0 and the star has a Gaia match. We provide statistics for all processed stars in Table 1. Our ‘best’ sample consists of million stars. It should be noted that additional quality cuts may be necessary for certain analyses to remove stars that have been flagged as unusual (for instance, in RAVE stars with c!=‘n’).
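The duplicate handling and the best flag described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline code: rows are plain dictionaries, the keys rv_err and the exact survey strings are hypothetical, and we assume the duplicated and best flags take the value 1 when set.

```python
# Survey priority order as listed in the text (string labels are assumed).
SURVEY_PRIORITY = ["APOGEE", "GALAH", "GES", "RAVE-ON", "RAVE", "LAMOST", "SEGUE"]


def mark_duplicates(rows):
    """Set row['duplicated'] and row['best'] in place and return rows.

    Entries sharing a Gaia source_id are ranked by survey priority, then by
    radial velocity uncertainty; all but the top-ranked entry are flagged.
    """
    rank = {name: i for i, name in enumerate(SURVEY_PRIORITY)}
    groups = {}
    for row in rows:
        row["duplicated"] = 0
        if row["source_id"] is not None:  # only Gaia matches can be grouped
            groups.setdefault(row["source_id"], []).append(row)
    for group in groups.values():
        keep = min(group, key=lambda r: (rank[r["survey"]], r["rv_err"]))
        for row in group:
            if row is not keep:
                row["duplicated"] = 1
    for row in rows:
        has_gaia = row["source_id"] is not None
        row["best"] = int(row["flag"] == 0 and row["duplicated"] == 0 and has_gaia)
    return rows


rows = [
    {"source_id": 1, "survey": "LAMOST", "rv_err": 5.0, "flag": 0},
    {"source_id": 1, "survey": "APOGEE", "rv_err": 0.1, "flag": 0},
    {"source_id": 2, "survey": "RAVE", "rv_err": 2.0, "flag": 0},
    {"source_id": 2, "survey": "RAVE", "rv_err": 1.0, "flag": 0},
    {"source_id": None, "survey": "SEGUE", "rv_err": 3.0, "flag": 0},
]
for row in mark_duplicates(rows):
    print(row["survey"], row["duplicated"], row["best"])
```

Sorting on the tuple (survey rank, velocity uncertainty) handles both the between-survey and within-survey cases with a single rule.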

Total 457555 457555 342682 258475 3177995 25332 187152 4906746 3702273
Gaia matches 456353 456353 342212 256851 3168545 25313 187100 4892727 3702273
Success (0) 415200 376316 260233 203417 2802163 10882 180401 4248612 3277004
Pipeline failed (1) 1358 368 67 2778 90198 105 378 95252 68210
Spec. problem (2) 20602 67439 78455 42553 4792 12513 0 226354 144582
Phot. problem (3) 498 498 0 810 8124 1181 1465 12576 4973
Unreliable errors (6) 3220 1093 46 460 67862 52 72 72805 41054
Low age (7) 16677 11841 3881 8457 204856 599 3062 249373 164815
Success with Gaia 414238 375488 259877 203127 2799338 10881 180401 4243350 3277004
Table 1: Numbers of stars processed by pipeline. Total gives the total number of stars in the survey, Gaia matches gives the number of matches in Gaia DR2, and Success gives the number of stars with flag=0. The middle section shows the number of stars with pipeline failure (along with the allocated flag in brackets). Success with Gaia gives the number of stars with flag=0 with Gaia DR2 matches. In the All no dupl. column we only consider Gaia matches and have removed duplicate Gaia entries.

Caveats: Our provided catalogue has a number of caveats and features that we should highlight. First, we note that when inspecting and analysing this catalogue one should be aware of our choice of Galaxy prior. This choice imposes some level of structure on the results. When attempting to fit models to the provided dataset, this must be accounted for by ‘dividing out’ our choice of prior. When inspecting the catalogue in age and metallicity, the imprint of the isochrone gridding is visible with a weak preference for stars to bunch at the isochrone points. Some of these stars will have very small uncertainty in age and metallicity and so in practice we recommend that a minimum age uncertainty (of order the isochrone spacing) be used. Finally, our choice of prior restricts distant and metal-poor stars to be old and no star can be older than our final isochrone point. This biases the results for large ages and there is a correlation between uncertainty and age for these old stars.

Our catalogue could be further improved with additional data and modelling improvements. For instance, we have not explicitly included abundance which is now regularly provided by large spectroscopic surveys. Furthermore, with access to the stellar spectra (from for instance RAVE and GALAH), we could improve age estimates from the giant stars (using or CN bands in the case of RAVE). Finally, the hard truncation in age and our adopted prior produces a number of undesirable features which could be improved through further refinements.

4 Results

Our catalogue represents the largest, most homogeneous catalogue of distances, ages, masses and spectroscopic parameters available. In this section, we demonstrate the properties of the catalogue, give a number of checks of its quality and highlight its possible power in studies of the dynamical structure of the Galaxy.

The real power of our catalogue is in the age estimates. There are two key subsamples of stars that have accurate ages from our pipeline. First, the combination of parallaxes and spectroscopic metallicities breaks the metallicity-age degeneracy for turn-off stars (e.g. Howes et al., 2018). Secondly, for giant stars accurate ages are possible due to the employed spectroscopic mass estimates combined with Gaia parallaxes which further constrain the luminosity and hence age. We conservatively define these two subsamples as 1. giants: and , and 2. turn-off: and .

Figure 4: Distribution of output uncertainties from our pipeline: each row corresponds to a different quantity labelled in the right plots, and each column to a different sample (defined in the text) labelled in the top panels.

4.1 Output uncertainties

In Figure 4, we show the output uncertainties from our pipeline for all stars, the giant stars and the turn-off stars. We opt not to show the results for SEGUE as the majority of these stars are distant and metal-poor so the uncertainties (particularly in age) strongly reflect the prior. We see that for giant stars we obtain uncertainties in age of for APOGEE (which has the most accurate spectroscopic mass estimates), for GALAH (for which we have inferred from other abundances before computing a spectroscopic mass estimate) for LAMOST (which also has spectroscopic mass estimates) and for RAVE and GES (for which no spectroscopic mass estimates are used). However, there are a number of metal-poor distant giants in these surveys that have mass estimates due partly to the age prior employed. The mass uncertainties for the giant stars are the age uncertainties. For the turn-off stars, all surveys yield ages accurate to with GES and LAMOST producing a slightly fatter tail to large uncertainties due to larger metallicity errors.

The distribution of distance errors peaks very close to zero due to the accuracy of the Gaia parallaxes. For the giant stars (which are at larger distance) the peak moves to higher . The tails of the relative distance uncertainty distributions are a combination of the different survey selection functions and the quality of the input parameters. We see the largest errors arise from LAMOST and GES (which observe the faintest stars).

Our pipeline takes as inputs the spectroscopic parameters, and , but also provides these as outputs. In the lower panels of Figure 4, we show the output uncertainties in these parameters. For all surveys, we find uncertainties in of and for giants and turn-off stars respectively. The effective temperature is strongly constrained with photometry, hence the independence with survey. However, the degree to which the output uncertainties can be trusted is unclear as they are dependent on systematics in the bandpasses and the assumed extinction law. For , the uncertainties are survey dependent – we see the two surveys with spectroscopic mass estimates for giants (APOGEE and LAMOST) peak at smaller uncertainties ( respectively) than the other surveys (). For the turn-off stars, the difference in accuracy is due to the selection functions of the surveys, where LAMOST and GES target more distant stars with less accurate distances from Gaia and hence less accurate .

Figure 5: Input and output spectroscopic parameters ( against coloured by ). The left column shows the spectroscopic parameters used as inputs in the pipeline and the right column the parameters output by our pipeline. Each row corresponds to a different survey labelled in the left plot.

The output and from our method can be compared with the inputs determined entirely spectroscopically. We show a comparison of the approaches in Figure 5. We clearly see the power of enforcing an isochrone prior on the spectroscopic parameters. In particular, the main sequences are very tight due to the quality of the Gaia parallaxes. We also note that our procedure has introduced some discreteness to the diagrams due to our chosen isochrone spacing. However, each parameter estimate also has an associated error that smooths over this discreteness.

4.2 Extinction maps

The combination of photometric, spectroscopic and astrometric data is powerful for mapping the 3D dust extinction throughout the Galaxy. Spectroscopy allows us to identify intrinsically similar stars, photometry gives a measure of the differential reddening between these stars, and astrometry tells us how the reddening varies along the line-of-sight. measurements for all stars are an output of our pipeline. Our measurements are informed by the adopted extinction prior but stellar colours and effective temperatures provide additional information producing stronger constraints on the extinction than provided by the prior. In Fig. 6, we show on-sky for all stars and at two distance slices for the ‘best’ dataset. We see the expected large-scale dust structure and the extinction increasing as a function of distance.

We also compare our measurements with provided in the Gaia DR2 source catalogue, calculated using a combination of the Gaia photometry and parallaxes. We see that above there is a clear linear relationship between our output and with approximate gradient in agreement with the expectation from Table 2. Below , is poorly constrained, possibly due to limitations in the methodology (Andrae et al., 2018).

Figure 6: Extinction maps in galactic coordinates for band extinction, . The left panel shows all stars with flag=0, and middle and right the subsets of these with and .
Figure 7: Comparison of extinction estimates to Gaia DR2 catalogue results (2d histogram is log-scaled). The line has gradient equal to the anticipated value for red stars from Table 2.

4.3 Age maps

In Figure 8 we present all-sky age maps for the giant and turn-off subsamples. The entire giant subsample exhibits the expected gradients with galactic coordinates with younger stars confined primarily to the disc plane and higher latitudes dominated by older stars. As we slice through in distance we see the increasing contribution of old stars at high latitude (due in part to our age prior).

The turn-off subsample exhibits a less clean picture of on-sky age distribution and for the entire sample some of the observed features are produced by the different selection functions of the surveys. In particular, the southern sky surveys (GALAH and RAVE) observe brighter stars than the primary northern sky survey (LAMOST). If we restrict ourselves to observing the nearby bright turn-off stars () we find that there is essentially no on-sky age gradients. Over these distance scales, all age populations are contributing so we end up observing the average age of the solar neighbourhood. For a more distant bin () we observe the anticipated age gradient with latitude and the southern sky surveys smoothly match into the northern sky. For an even more distant bin (), we lose southern sky coverage in the turn-off sample and are dominated by LAMOST which shows very strong age gradients on these large Galactic scales.

Figure 8: All-sky age maps in galactic coordinates: left panels show giant stars and right panels turn-off stars. Each row corresponds to different distance brackets (shown in ) above the plots with the top panels showing all stars. The different survey selection functions are clearly imprinted on the top right image.

4.4 Age-kinematic relations

As a demonstration of the power of our catalogue, we briefly investigate some age-kinematic correlations in the disc. Studies of disc populations subdivided by age (or ) (e.g. Bovy et al., 2016; Mackereth et al., 2017) have tended to be restricted to spatial-chemical correlations due to the limitations of pre-Gaia proper motions. Our catalogue opens up the possibility of inspecting not just spatial-chemical correlations but full chemo-kinematic correlations. We reserve a full analysis to a future project and instead here give a flavour of what is possible.

We compute the velocity dispersion of populations separated into radial and age bins. When computing the velocity dispersion in each bin, we account for the uncertainty in each velocity measurement in the following way. We first perform a clip of the raw velocities to remove outliers and halo contaminants. We then seek to maximise the log-likelihood


which can be found by solving


Here is the simple mean computed without accounting for the uncertainties. We find the value of that solves this equation by Brent’s method. The uncertainty in the resulting estimate can be found from the second derivative of the log-likelihood and is approximately for stars. We consider at least stars per bin resulting in dispersion uncertainties better than .
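The displayed equations for this step did not survive in the text above, so the following is only a sketch of the standard approach it describes, under stated assumptions: each velocity v_i is Gaussian about a mean m with variance sigma^2 + e_i^2 (e_i being the measurement uncertainty), so that ln L = -(1/2) sum_i [ (v_i - m)^2 / (sigma^2 + e_i^2) + ln(sigma^2 + e_i^2) ], and the maximum satisfies sum_i [ (v_i - m)^2 / (sigma^2 + e_i^2)^2 - 1 / (sigma^2 + e_i^2) ] = 0. The symbol names and function names are ours. To keep the example free of dependencies, simple bisection is used as a stand-in for Brent's method.

```python
def score(sigma, v, err, mean):
    """Twice the derivative of ln L with respect to sigma^2 (assumed model)."""
    total = 0.0
    for vi, ei in zip(v, err):
        var = sigma * sigma + ei * ei
        total += (vi - mean) ** 2 / var ** 2 - 1.0 / var
    return total


def intrinsic_dispersion(v, err, tol=1e-8):
    """Maximum-likelihood intrinsic dispersion, found by bisection.

    The mean is the simple (unweighted) mean, as in the text. Returns 0.0
    when the measurement errors alone can account for the observed scatter.
    """
    mean = sum(v) / len(v)
    hi = max(abs(vi - mean) for vi in v)  # score(hi) <= 0 is guaranteed here
    if hi == 0.0:
        return 0.0
    lo = 1e-9 * hi  # tiny positive value avoids division by zero if err = 0
    if score(lo, v, err, mean) <= 0.0:
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, v, err, mean) > 0.0:
            lo = mid  # root lies at larger sigma
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative velocities with equal uncertainties of 1 (arbitrary units).
print(intrinsic_dispersion([-2.0, -1.0, 0.0, 1.0, 2.0], [1.0] * 5))
```

With equal uncertainties the root reduces to the familiar result sigma^2 = <(v - m)^2> - e^2, which gives a convenient check of the solver.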

Figure 9: Velocity dispersions against Galactocentric radius for a series of age bins (for giant and turn-off stars with and ). Each line is coloured by the mean age of the bin. The top panel shows the radial velocity dispersion and bottom panel the vertical dispersion. The dashed black lines are to guide the eye and correspond to exponentials with scale radii of and respectively. The errorbar shows the maximum error for each datapoint (the errorbar is the bin width).
Figure 10: Velocity dispersions against age for a series of radial bins (for giant and turn-off stars with and ). Each line is coloured by the mean Galactocentric radius of the bin. The top panel shows the radial velocity dispersion and bottom panel the vertical dispersion. The thick line corresponds to the bin centred around the solar neighbourhood. The dashed black lines are to guide the eye and correspond to power laws with coefficients and respectively. The errorbar shows the maximum error for each datapoint (the errorbar is the bin width).

In Figure 9 we give the Galactocentric radial dispersion profile of the radial and vertical velocity separated into different age bins. We use all ‘best’ giant and turn-off stars with , and (approximately million stars). Inwards of the solar radius, both velocity dispersions decline approximately exponentially with scale radii of . Near and beyond the solar radius, the velocity dispersion profiles flatten with radius. The radial velocity dispersion continues to decline whilst the vertical dispersion plateaus and begins to weakly increase (possibly due to the selection effect of seeing higher latitude stars at larger Galactocentric radii). At all radii, we clearly see a gradient in age with lower age populations on kinematically colder orbits. The break in the behaviour of the velocity dispersion profiles occurs at decreasing radius with increasing age.

The behaviour of the velocity dispersions with age can be observed more clearly in Figure 10 where we show the dispersion profiles with age split by Galactocentric radius. At all radii, the velocity dispersions grow approximately like a power-law with age, . We find coefficients of for the radial dispersion and for the vertical dispersion although this is not accounting for the (significant) age uncertainties (Aumer et al., 2016b). However, these values are consistent with observations of the solar neighbourhood (Aumer & Binney, 2009) and indicate the velocity dispersions throughout the Galaxy are consistent with heating by a combination of spiral arms and molecular clouds (Aumer et al., 2016a). With our particular set of cuts (in particular, only considering low Galactic height stars), we find no real indication of a break in the dispersion profile at intermediate/large age. Inside the solar radius, there is the suggestion of the profiles flattening with age (although this is possibly due to increased age uncertainties). We note that our observations must all be considered within the context of the selection function of the sample. A future work will account for these effects in a full model of our presented catalogue.
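One simple way to estimate a power-law coefficient of this kind is an unweighted least-squares fit in log-log space, sketched below. This is an assumption about a reasonable procedure, not a description of how the quoted coefficients were obtained; the function name is hypothetical, the synthetic data use an arbitrary exponent, and (as noted above) age uncertainties are ignored.

```python
import math


def fit_power_law(ages, sigmas):
    """Fit sigma = amp * age**beta by linear least squares on the logarithms."""
    x = [math.log(a) for a in ages]
    y = [math.log(s) for s in sigmas]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    beta = sxy / sxx
    amp = math.exp(y_mean - beta * x_mean)
    return amp, beta


# Synthetic, noise-free data with an arbitrary amplitude and exponent.
ages = [1.0, 2.0, 4.0, 8.0]
sigmas = [10.0 * a ** 0.5 for a in ages]
print(fit_power_law(ages, sigmas))
```

A fit that accounts for the age uncertainties would generally be needed before interpreting the exponent physically, since scatter in age tends to flatten the recovered relation.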

Figure 11: Normalized kernel density estimates for the action distributions in six evenly-spaced age bins from to (for giant and turn-off stars with and ) – left panel shows logarithm of the radial action, central panel the -component of the angular momentum and right panel the logarithm of the vertical action.

In our output catalogue, we also provide estimates of the actions (with associated errors). In Figure 11 we show the action distributions (estimated using a kernel density estimate) split into six age bins using all ‘best’ giant and turn-off stars with , and . Mirroring the results from Figure 9, we see the steadily increasing mean radial and vertical action with increasing age. We observe that the oldest age bin in vertical action is significantly skewed, indicating the presence of a thick disc component. This is not mirrored in the radial action distribution. The picture from the radial action distributions alone is of a smooth quiescent evolution of the disc. The -component of angular momentum smoothly declines with age reflecting both the asymmetric drift in the populations and the spatial distribution due to inside-out growth. The angular momentum distribution also has more evidence of the selection function of the catalogue (as -component of angular momentum is a proxy for radius for dynamically cool stars).
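A normalised kernel density estimate of the kind used for such distributions can be sketched as below. The Gaussian kernel, the fixed bandwidth and the sample values are assumptions made for illustration; the kernel and bandwidth actually used for Figure 11 are not stated in the text.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a normalised Gaussian KDE of samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Illustrative action values (arbitrary units); the KDE is built in log10
# of the action, mirroring the logarithmic axes described for the figure.
actions = [5.0, 12.0, 20.0, 35.0, 80.0]
density = gaussian_kde([math.log10(a) for a in actions], bandwidth=0.2)
print(density(math.log10(20.0)))
```

Because each kernel integrates to unity and the sum is divided by the number of samples, the resulting curve integrates to one regardless of how many stars fall in each age bin, which allows bins of different sizes to be compared directly.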

5 Conclusions

We have presented a catalogue (available from https://www.ast.cam.ac.uk/~jls/data) of approximately million ages, masses, distances, extinctions and spectroscopic parameters for stars in common with large spectroscopic surveys and the second Gaia data release. We considered the results from APOGEE, LAMOST, SEGUE, Gaia-ESO, RAVE (using both the RAVE DR5 results and RAVE-On) and GALAH giving approximate all-sky coverage. We have complemented our catalogue with estimates of Galactocentric coordinates and actions along with associated (one-dimensional) uncertainties. As well as presenting details of our procedure, we have focussed on the quality of the output catalogue and presented some preliminary results demonstrating its power. Our conclusions are as follows.

  1. A considerable fraction of our catalogue is assigned pre-main sequence properties from our pipeline. Many of these stars lie along the binary sequence in colour-magnitude space so are suspected binary stars. Our catalogue can be used to assign binarity to the stars observed by the considered spectroscopic surveys.

  2. We have investigated the output uncertainties produced by our catalogue. Two key subsamples for studying the age structure of the Galaxy are giant stars and turn-off stars. For the APOGEE, LAMOST and GALAH subsamples, we have employed techniques to assign spectroscopic mass estimates to the giant stars (via carbon and nitrogen abundances). This results in ages accurate to . For turn-off stars, parallax is insufficient to determine an accurate age due to a degeneracy with metallicity. Employing spectroscopic metallicity measurements results in output uncertainties of . We also have provided output spectroscopic parameters and which typically have uncertainties less than and respectively. These parameters can be used as initial guesses in improved analysis of the spectra.

  3. We have provided output extinction estimates for all stars in our catalogue. These estimates are informed by prior extinction maps but the output uncertainties are typically smaller than the prior input uncertainties. When complemented with our output distances, the extinction values can be used to construct a 3D extinction map (over the volume probed by our spectroscopic samples).

  4. We presented some first results on the correlations between kinematics and age for Milky Way stars using our new catalogue. We demonstrated that the Galactocentric radial profile of radial and vertical velocity dispersions appears to flatten beyond the solar radius. At all radii, we see a smooth power law increase of the (radial and vertical) velocity dispersions with age.

The presentation of our catalogue represents a first step in analysing the chemo-dynamic composition of the Galaxy with the Gaia data. We have provided an indication of the power of the catalogue in analysing the correlations between age and kinematics throughout the Galaxy. Although our catalogue is a combination of multiple surveys, the selection function of the total catalogue can be found, as most of the constituent catalogues have well defined selection functions. A future contribution will model the published catalogue.


JLS acknowledges the support of the Science and Technology Facilities Council (STFC). PD would like to acknowledge support from the STFC (ST/N000919/1). We thank Gyuchul Myeong for checking an early version of the catalogue, and Douglas Boubert and Eugene Vasiliev for useful conversations. We acknowledge the use of Sergey Koposov’s Whole Sky Database (WSDB) and Andy Casey’s Cannon code. We thank Andy Casey for providing the correlations between spectroscopic parameters for the RAVE-On catalogue. This research made use of Astropy, a community-developed core Python package for Astronomy (Astropy Collaboration, 2018).

This project was developed in part at the 2016 NYC Gaia Sprint, hosted by the Center for Computational Astrophysics at the Simons Foundation in New York City.

This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www.cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, https://www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement.

This publication makes use of data products from the Two Micron All Sky Survey, which is a joint project of the University of Massachusetts and the Infrared Processing and Analysis Center/California Institute of Technology, funded by the National Aeronautics and Space Administration and the National Science Foundation.

Funding for the Sloan Digital Sky Survey IV has been provided by the Alfred P. Sloan Foundation, the U.S. Department of Energy Office of Science, and the Participating Institutions. SDSS-IV acknowledges support and resources from the Center for High-Performance Computing at the University of Utah. The SDSS web site is www.sdss.org.

SDSS-IV is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS Collaboration including the Brazilian Participation Group, the Carnegie Institution for Science, Carnegie Mellon University, the Chilean Participation Group, the French Participation Group, Harvard-Smithsonian Center for Astrophysics, Instituto de Astrofísica de Canarias, The Johns Hopkins University, Kavli Institute for the Physics and Mathematics of the Universe (IPMU) / University of Tokyo, Lawrence Berkeley National Laboratory, Leibniz Institut für Astrophysik Potsdam (AIP), Max-Planck-Institut für Astronomie (MPIA Heidelberg), Max-Planck-Institut für Astrophysik (MPA Garching), Max-Planck-Institut für Extraterrestrische Physik (MPE), National Astronomical Observatories of China, New Mexico State University, New York University, University of Notre Dame, Observatário Nacional / MCTI, The Ohio State University, Pennsylvania State University, Shanghai Astronomical Observatory, United Kingdom Participation Group, Universidad Nacional Autónoma de México, University of Arizona, University of Colorado Boulder, University of Oxford, University of Portsmouth, University of Utah, University of Virginia, University of Washington, University of Wisconsin, Vanderbilt University, and Yale University.

Guoshoujing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope LAMOST) is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences.

Funding for RAVE has been provided by: the Australian Astronomical Observatory; the Leibniz-Institut fuer Astrophysik Potsdam (AIP); the Australian National University; the Australian Research Council; the French National Research Agency; the German Research Foundation (SPP 1177 and SFB 881); the European Research Council (ERC-StG 240271 Galactica); the Istituto Nazionale di Astrofisica at Padova; The Johns Hopkins University; the National Science Foundation of the USA (AST-0908326); the W. M. Keck foundation; the Macquarie University; the Netherlands Research School for Astronomy; the Natural Sciences and Engineering Research Council of Canada; the Slovenian Research Agency; the Swiss National Science Foundation; the Science & Technology Facilities Council of the UK; Opticon; Strasbourg Observatory; and the Universities of Groningen, Heidelberg and Sydney. The RAVE web site is at https://www.rave-survey.org.


  • Abolfathi et al. (2017) Abolfathi B., et al., 2017, preprint, (arXiv:1707.09322)
  • Allende Prieto et al. (2008) Allende Prieto C., et al., 2008, AJ, 136, 2070
  • Andrae et al. (2018) Andrae R., et al., 2018, preprint, (arXiv:1804.09374)
  • Aumer & Binney (2009) Aumer M., Binney J. J., 2009, MNRAS, 397, 1286
  • Aumer et al. (2016a) Aumer M., Binney J., Schönrich R., 2016a, MNRAS, 459, 3326
  • Aumer et al. (2016b) Aumer M., Binney J., Schönrich R., 2016b, MNRAS, 462, 1697
  • Binney (2012) Binney J., 2012, MNRAS, 426, 1324
  • Binney et al. (2014) Binney J., et al., 2014, MNRAS, 437, 351
  • Bland-Hawthorn & Gerhard (2016) Bland-Hawthorn J., Gerhard O., 2016, ARA&A, 54, 529
  • Blanton et al. (2017) Blanton M. R., et al., 2017, AJ, 154, 28
  • Bovy (2017) Bovy J., 2017, MNRAS, 470, 1360
  • Bovy et al. (2016) Bovy J., Rix H.-W., Green G. M., Schlafly E. F., Finkbeiner D. P., 2016, ApJ, 818, 130
  • Bressan et al. (2012) Bressan A., Marigo P., Girardi L., Salasnich B., Dal Cero C., Rubele S., Nanni A., 2012, MNRAS, 427, 127
  • Buder et al. (2018) Buder S., et al., 2018, preprint, (arXiv:1804.06041)
  • Burnett & Binney (2010) Burnett B., Binney J., 2010, MNRAS, 407, 339
  • Casey et al. (2016) Casey A. R., et al., 2016, preprint, (arXiv:1609.02914)
  • Casey et al. (2017) Casey A. R., et al., 2017, ApJ, 840, 59
  • Castelli & Kurucz (2004) Castelli F., Kurucz R. L., 2004, ArXiv Astrophysics e-prints
  • Charbonnel (1994) Charbonnel C., 1994, A&A, 282, 811
  • Chen et al. (2014) Chen Y., Girardi L., Bressan A., Marigo P., Barbieri M., Kong X., 2014, MNRAS, 444, 2525
  • Chen et al. (2015) Chen Y., Bressan A., Girardi L., Marigo P., Kong X., Lanza A., 2015, MNRAS, 452, 1068
  • Coronado et al. (2018) Coronado J., Rix H.-W., Trick W. H., 2018, preprint, (arXiv:1804.07760)
  • Cropper et al. (2018) Cropper M., et al., 2018, preprint, (arXiv:1804.09369)
  • Cui et al. (2012) Cui X.-Q., et al., 2012, Research in Astronomy and Astrophysics, 12, 1197
  • Dalton et al. (2006) Dalton G. B., et al., 2006, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series. p. 62690X, doi:10.1117/12.670018
  • Das & Sanders (2018) Das P., Sanders J., 2018, preprint, (arXiv:1804.09596)
  • De Silva et al. (2015) De Silva G. M., et al., 2015, MNRAS, 449, 2604
  • Deng et al. (2012) Deng L.-C., et al., 2012, Research in Astronomy and Astrophysics, 12, 735
  • Drimmel et al. (2003) Drimmel R., Cabrera-Lavers A., López-Corredoira M., 2003, A&A, 409, 205
  • Emerson et al. (2006) Emerson J., McPherson A., Sutherland W., 2006, The Messenger, 126, 41
  • Evans et al. (2018) Evans D. W., et al., 2018, preprint, (arXiv:1804.09368)
  • Fitzpatrick (1999) Fitzpatrick E. L., 1999, PASP, 111, 63
  • Freeman & Bland-Hawthorn (2002) Freeman K., Bland-Hawthorn J., 2002, ARA&A, 40, 487
  • Gaia Collaboration et al. (2016) Gaia Collaboration et al., 2016, A&A, 595, A1
  • Gaia Collaboration et al. (2018c) Gaia Collaboration et al., 2018c, preprint, (arXiv:1804.09381)
  • Gaia Collaboration et al. (2018b) Gaia Collaboration et al., 2018b, preprint, (arXiv:1804.09380)
  • Gaia Collaboration et al. (2018a) Gaia Collaboration Brown A. G. A., Vallenari A., Prusti T., de Bruijne J. H. J., Babusiaux C., Bailer-Jones C. A. L., 2018a, preprint, (arXiv:1804.09365)
  • García Pérez et al. (2016) García Pérez A. E., et al., 2016, AJ, 151, 144
  • Gilmore et al. (2012) Gilmore G., et al., 2012, The Messenger, 147, 25
  • Green et al. (2018) Green G. M., et al., 2018, preprint, (arXiv:1801.03555)
  • Gunn et al. (2006) Gunn J. E., et al., 2006, AJ, 131, 2332
  • Ho et al. (2017) Ho A. Y. Q., et al., 2017, ApJ, 836, 5
  • Hoffman & Gelman (2011) Hoffman M. D., Gelman A., 2011, preprint, (arXiv:1111.4246)
  • Howes et al. (2018) Howes L. M., Lindegren L., Feltzing S., Church R. P., Bensby T., 2018, preprint, (arXiv:1804.08321)
  • Jørgensen & Lindegren (2005) Jørgensen B. R., Lindegren L., 2005, A&A, 436, 127
  • Kordopatis et al. (2013) Kordopatis G., et al., 2013, AJ, 146, 134
  • Kroupa et al. (1993) Kroupa P., Tout C. A., Gilmore G., 1993, MNRAS, 262, 545
  • Kunder et al. (2017) Kunder A., et al., 2017, AJ, 153, 75
  • Lee et al. (2008a) Lee Y. S., et al., 2008a, AJ, 136, 2022
  • Lee et al. (2008b) Lee Y. S., et al., 2008b, AJ, 136, 2050
  • Lindegren et al. (2018) Lindegren L., et al., 2018, preprint, (arXiv:1804.09366)
  • Mackereth et al. (2017) Mackereth J. T., et al., 2017, MNRAS, 471, 3057
  • Maíz Apellániz (2006) Maíz Apellániz J., 2006, AJ, 131, 1184
  • Marshall et al. (2006) Marshall D. J., Robin A. C., Reylé C., Schultheis M., Picaud S., 2006, A&A, 453, 635
  • Martell et al. (2017) Martell S. L., et al., 2017, MNRAS, 465, 3203
  • Martig et al. (2016) Martig M., et al., 2016, MNRAS, 456, 3655
  • Masseron & Gilmore (2015) Masseron T., Gilmore G., 2015, MNRAS, 453, 1855
  • McMillan et al. (2017) McMillan P. J., et al., 2017, preprint, (arXiv:1707.04554)
  • Mints & Hekker (2018) Mints A., Hekker S., 2018, preprint, (arXiv:1804.06578)
  • Ness et al. (2016) Ness M., Hogg D. W., Rix H.-W., Martig M., Pinsonneault M. H., Ho A. Y. Q., 2016, ApJ, 823, 114
  • Pinsonneault et al. (2014) Pinsonneault M. H., et al., 2014, ApJS, 215, 19
  • Pont & Eyer (2004) Pont F., Eyer L., 2004, MNRAS, 351, 487
  • Queiroz et al. (2018) Queiroz A. B. A., et al., 2018, MNRAS, 476, 2556
  • Riello et al. (2018) Riello M., et al., 2018, preprint, (arXiv:1804.09367)
  • Riess et al. (2018) Riess A. G., et al., 2018, preprint, (arXiv:1804.10655)
  • Robin et al. (2012) Robin A. C., Marshall D. J., Schultheis M., Reylé C., 2012, A&A, 538, A106
  • Salaris et al. (1993) Salaris M., Chieffi A., Straniero O., 1993, ApJ, 414, 580
  • Salvatier et al. (2016) Salvatier J., Wiecki T. V., Fonnesbeck C., 2016, PeerJ Computer Science, 2, e55
  • Sanders & Binney (2016) Sanders J., Binney J., 2016, MNRAS, 457, 2107
  • Sartoretti et al. (2018) Sartoretti P., et al., 2018, preprint, (arXiv:1804.09371)
  • Schlafly & Finkbeiner (2011) Schlafly E. F., Finkbeiner D. P., 2011, ApJ, 737, 103
  • Schlafly et al. (2016) Schlafly E. F., et al., 2016, ApJ, 821, 78
  • Schlegel et al. (1998) Schlegel D. J., Finkbeiner D. P., Davis M., 1998, ApJ, 500, 525
  • Schönrich et al. (2010) Schönrich R., Binney J., Dehnen W., 2010, MNRAS, 403, 1829
  • Sharma et al. (2011) Sharma S., Bland-Hawthorn J., Johnston K., Binney J., 2011, ApJ, 730, 3
  • Simion et al. (2017) Simion I. T., Belokurov V., Irwin M., Koposov S. E., Gonzalez-Fernandez C., Robin A. C., Shen J., Li Z.-Y., 2017, MNRAS, 471, 4323
  • Skrutskie et al. (2006) Skrutskie M. F., et al., 2006, AJ, 131, 1163
  • Smolinski et al. (2011) Smolinski J. P., et al., 2011, AJ, 141, 89
  • Soderblom (2010) Soderblom D. R., 2010, ARA&A, 48, 581
  • Steinmetz et al. (2006) Steinmetz M., et al., 2006, AJ, 132, 1645
  • Tang et al. (2014) Tang J., Bressan A., Rosenfield P., Slemer A., Marigo P., Girardi L., Bianchi L., 2014, MNRAS, 445, 4287
  • Tayar et al. (2015) Tayar J., et al., 2015, ApJ, 807, 82
  • The Astropy Collaboration et al. (2018) The Astropy Collaboration et al., 2018, preprint, (arXiv:1801.02634)
  • Vrard et al. (2016) Vrard M., Mosser B., Samadi R., 2016, A&A, 588, A87
  • Wilson et al. (2010) Wilson J. C., et al., 2010, in Ground-based and Airborne Instrumentation for Astronomy III. p. 77351C, doi:10.1117/12.856708
  • Yanny et al. (2009) Yanny B., et al., 2009, AJ, 137, 4377
  • Zhao et al. (2012) Zhao G., Zhao Y., Chu Y., Jing Y., Deng L., 2012, preprint, (arXiv:1206.3569)

Appendix A Extinction coefficients

In our Bayesian distance pipeline, we require an extinction law to deredden any photometry used. This amounts to computing the set of coefficients $R(i)$ for the photometric bands $i$. Given the $V$ band extinction $A_V$, the extinction in band $i$ is $A_i = R(i) A_V / R(V)$. The traditional definition of $R(V)$ is such that $A_V = R(V) E(B-V)$ where $E(B-V)$ is the selective extinction. As we work with the extinction maps of Green et al. (2018) who provide extinction in units of $E(B-V)$, we define $R(i) = A_i / E(B-V)$.
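The bookkeeping between the two conventions can be sketched in a few lines of Python. This is a minimal illustration that assumes the conventional relations $A_i = R(i) E(B-V)$ and $A_V = R(V) E(B-V)$; the function names, band labels and numerical values are hypothetical and are not taken from the tables in this paper.

```python
def extinction_in_band(reddening, coefficient):
    """Extinction (mag) in one band, assuming A_i = R(i) * E."""
    return coefficient * reddening


def coefficient_per_unit_av(coefficient, r_v):
    """Convert a coefficient per unit reddening into one per unit A_V,
    assuming the conventional A_V = R(V) * E."""
    return coefficient / r_v


# Hypothetical numbers for illustration only (not values from this paper).
r_v = 3.1
coefficients = {"band_a": 2.5, "band_b": 0.8}
reddening = 0.2

a_v = r_v * reddening
extinctions = {
    band: extinction_in_band(reddening, r) for band, r in coefficients.items()
}
# The same extinctions recovered from A_V instead of the reddening.
extinctions_from_av = {
    band: coefficient_per_unit_av(r, r_v) * a_v for band, r in coefficients.items()
}
```

Both dictionaries hold the same extinctions, which is the consistency the two definitions are meant to guarantee.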

Here we choose to adopt the extinction curve from Schlafly et al. (2016) that was determined from APOGEE data. The curve was chosen to reproduce the observed extinction coefficients at the iso-extinction wavelengths for the Pan-STARRS and 2MASS bands. Schlafly et al. (2016) parametrize the extinction curve in terms of where corresponds approximately to . Schlafly et al. (2016) find that on average although there is variation of across the APOGEE survey area. In the main body of the paper, we fix and hence choose . The bluest band considered by Schlafly et al. (2016) was the Pan-STARRS $g$ band. Therefore, computing the extinction coefficients for bluer bands (particularly ) is an extrapolation.

Following Green et al. (2018), we set the unknown grey component by insisting the extinction in the WISE band is zero. We set the scaling relative to the extinction provided by Green et al. (2018) (chosen such that one unit of for the Pan-STARRS bands produces one unit of Schlegel et al. ) by matching the coefficients for and provided in Table 1 of Green et al. (2018) at the iso-extinction wavelengths of Schlafly et al. (2016). We compute


where is the response of bandpass and the stellar model flux. From these coefficients and the selective extinction reported by Green et al. (2018), the extinction in band is .

Footnote 5: Gaia DR2 photometric response curves downloaded from www.cosmos.esa.int, Pan-STARRS from http://ipp.ifa.hawaii.edu/ps1.filters/, 2MASS from https://www.ipac.caltech.edu/2mass/releases/second/ (provides ), WISE from http://wise2.ipac.caltech.edu/docs/release/prelim/expsup/, Johnson and from Maíz Apellániz (2006) and SDSS from http://classic.sdss.org/dr3/instruments/imager/filters/.
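A band-averaged coefficient of this kind can be illustrated with a short Python sketch. It assumes a photon-counting synthetic magnitude and defines the coefficient as the magnitude change per unit reddening, which is a common convention and not necessarily the exact form used in this appendix. The Gaussian bandpasses, power-law extinction curve, flat spectrum and all function names are hypothetical stand-ins for the real response curves, the Schlafly et al. (2016) curve and the Castelli & Kurucz (2004) fluxes.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal integral, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def band_coefficient(wavelength, response, flux, r_lambda, reddening):
    """Band-averaged extinction per unit reddening (assumed convention).

    wavelength : wavelength grid
    response   : bandpass response on that grid
    flux       : stellar flux F_lambda on that grid
    r_lambda   : monochromatic extinction per unit reddening
    reddening  : total reddening at which the coefficient is evaluated
    """
    weight = wavelength * response * flux  # photon-counting weight
    dimmed = weight * 10.0 ** (-0.4 * r_lambda * reddening)
    ratio = trapezoid(dimmed, wavelength) / trapezoid(weight, wavelength)
    return -2.5 * np.log10(ratio) / reddening


# Toy inputs (hypothetical): wavelength in micron, power-law extinction curve.
wavelength = np.linspace(0.3, 1.1, 2001)
toy_curve = 3.1 * (wavelength / 0.55) ** -1.3
flat_flux = np.ones_like(wavelength)
narrow_band = np.exp(-0.5 * ((wavelength - 0.55) / 0.01) ** 2)
broad_band = np.exp(-0.5 * ((wavelength - 0.65) / 0.15) ** 2)

r_narrow = band_coefficient(wavelength, narrow_band, flat_flux, toy_curve, 0.01)
r_broad_low = band_coefficient(wavelength, broad_band, flat_flux, toy_curve, 0.01)
r_broad_high = band_coefficient(wavelength, broad_band, flat_flux, toy_curve, 3.0)
```

For the narrow toy band the result stays close to the monochromatic value at the band centre. For the broad toy band the coefficient drops as the reddening increases, because the transmitted flux shifts towards redder, less extinguished wavelengths. This is the behaviour that motivates treating a broad band separately.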

In Table 2 we provide the extinction coefficients evaluated using equation (19) with using for the Castelli & Kurucz (2004) stellar model with solar metallicity, and .

Table 2: Extinction coefficients from Schlafly et al. (2016) extinction curve for use with Green et al. (2018) extinction maps: gives the effective wavelength, the extinction coefficient evaluated at the effective wavelength and the extinction coefficient computed by integrating over a solar-metallicity stellar spectrum of effective temperature and surface gravity . Given extinction in the extinction units reported by Green et al. (2018), the extinction in band is .

A.1 Variation of $R(G)$ with intrinsic colour

As the Gaia $G$ band is broad, the coefficient provided in Table 2 is not appropriate for high extinction and for stars with intrinsic colours significantly different from that of a star. To handle the latter of these issues, we evaluate equation (19) as a function of effective temperature and (we neglect any deviation with metallicity and surface gravity) using the solar metallicity and surface gravity stellar models of Castelli & Kurucz (2004) (for ). To the resulting , we fit polynomials of the form ( and terms were unnecessary)


We use and . Additionally, using a solar metallicity PARSEC isochrone (Bressan et al., 2012; Chen et al., 2014; Tang et al., 2014; Chen et al., 2015) we can rewrite these polynomials with arguments , , , , , and . The coefficients for this polynomial are given in Table 3. Throughout the main body of the paper we use so .
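A polynomial fit of this kind reduces to linear least squares. The Python sketch below shows the general approach on synthetic data. The choice of polynomial terms in a colour-like variable `x` and an extinction `a`, the coefficient values, and the function names are all hypothetical; they do not reproduce the polynomial form or the tabulated coefficients of this appendix.

```python
import numpy as np


def design_matrix(x, a):
    """Hypothetical polynomial terms in a colour-like variable x and extinction a."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    a = np.atleast_1d(np.asarray(a, dtype=float))
    return np.column_stack([np.ones_like(x), x, x**2, x**3, a, a**2, x * a])


def fit_coefficients(x, a, r):
    """Least-squares polynomial coefficients for samples r(x, a)."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(x, a), r, rcond=None)
    return coeffs


def evaluate(coeffs, x, a):
    """Evaluate the fitted polynomial at (x, a)."""
    return design_matrix(x, a) @ coeffs


# Synthetic, noise-free samples generated from made-up coefficients.
rng = np.random.default_rng(0)
x_samples = rng.uniform(-0.5, 2.5, 500)
a_samples = rng.uniform(0.0, 5.0, 500)
true_coeffs = np.array([2.8, -0.6, 0.05, 0.01, -0.1, 0.005, 0.02])
r_samples = design_matrix(x_samples, a_samples) @ true_coeffs

fitted_coeffs = fit_coefficients(x_samples, a_samples, r_samples)
r_at_point = evaluate(fitted_coeffs, 1.0, 2.0)
```

Because the model is linear in its coefficients, no iterative optimiser is needed. In practice the samples would come from evaluating the band-averaged coefficient on a grid of stellar models and extinctions rather than from a known polynomial.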

2.9083   1.0268   0.0278  -1.571    0.2401  -0.0007  -0.247    0.9734
2.9228  -0.7571   0.022   -0.164    0.008    0.0007   0.1633   0.081
2.9083  -0.9529   0.0285  -0.274   -0.0392   0.0007   0.1937   0.3879
2.9176  -0.4347   0.0255  -0.0648  -0.0041   0.0004   0.0893   0.0394
2.8808  -1.0154   0.0302   0.2385  -0.0696   0.0008   0.2245  -0.0014
2.7549  -0.5898   0.0525   0.3843  -0.0425   0.0005   0.1456  -0.2209
2.7433  -0.6583   0.0552   0.4938  -0.056    0.0006   0.165   -0.3346
2.908   -0.7492   0.0198   0.3956  -0.0402   0.0005   0.1621  -0.1553
Table 3: extinction coefficient polynomial coefficients where where for the extinction reported by Green et al. (2018) using Schlafly et al. (2016) extinction curve. Each row corresponds to a different given in the left column and .
2.6241   0.9392  -0.1496  -1.4703   0.1517   0.0525   0.0368  -0.2465   0.9321  -0.0092
2.6343  -0.698   -0.1641  -0.1484  -0.0126   0.0579  -0.0356   0.1954   0.0846  -0.0092
2.6214  -0.881   -0.1601  -0.2573  -0.0424   0.0579  -0.0365   0.2183   0.3736  -0.0092
2.6298  -0.4016  -0.1619  -0.0602  -0.0071   0.0579  -0.0182   0.1038   0.0382  -0.0092
2.5951  -0.9401  -0.1587   0.226   -0.0725   0.058   -0.0401   0.2513   0.014   -0.0092
2.4775  -0.5453  -0.1352   0.3646  -0.0411   0.0549  -0.0254   0.1604  -0.2039  -0.0092
2.4668  -0.6083  -0.1323   0.4688  -0.0541   0.0545  -0.0289   0.1818  -0.3089  -0.0092
2.6197  -0.6955  -0.1708   0.3719  -0.0394   0.0602  -0.0259   0.1774  -0.1416  -0.0092
Table 4: extinction coefficient polynomial coefficients where where for the extinction reported by Schlegel et al. (1998) using Fitzpatrick (1999) extinction curve and the correction of Schlafly & Finkbeiner (2011). Each row corresponds to a different given in the left column and .

The run of with is shown in Fig. 12 coloured by choice of . We also show the polynomial result for in black. We note that the variation of with effective temperature is reasonably large () where for blue stars () , for redder stars of and for very red stars () is as low as .

Figure 12: Extinction coefficient for Gaia -band: the extinction coefficient is plotted as a function of coloured by the choice of Schlafly et al. (2016) extinction law coefficient . The black dots show the polynomial fit for . The two horizontal dashed lines show the extinction coefficients for the narrow bands and . This extinction coefficient is for use with the extinctions reported by Green et al. (2018) as .

As the Gaia band is broad, we also consider variation in with the monochromatic extinction. For two models , we compute equation (