# Kinematic modelling of the Milky Way using the RAVE and GCS stellar surveys

## Abstract

We investigate the kinematic parameters of the Milky Way disc using the Radial Velocity (RAVE) and Geneva-Copenhagen (GCS) stellar surveys. We do this by fitting a kinematic model to the data taking the selection function of the data into account. For stars in the GCS we use all phase-space coordinates, but for RAVE stars we use only . Using Markov Chain Monte Carlo (MCMC) technique, we investigate the full posterior distributions of the parameters given the data. We investigate the ‘age-velocity dispersion’ relation (AVR) for the three kinematic components (), the radial dependence of the velocity dispersions, the Solar peculiar motion (), the circular speed at the Sun and the fall of mean azimuthal motion with height above the mid-plane. We confirm that the Besançon-style Gaussian model accurately fits the GCS data, but fails to match the details of the more spatially extended RAVE survey. In particular, the Shu distribution function (DF) handles non-circular orbits more accurately and provides a better fit to the kinematic data. The Gaussian distribution function not only fits the data poorly but systematically underestimates the fall of velocity dispersion with radius. The radial scale length of the velocity dispersion profile of the thick disc was found to be smaller than that of the thin disc. We find that correlations exist between a number of parameters, which highlights the importance of doing joint fits. The large size of the RAVE survey, allows us to get precise values for most parameters. However, large systematic uncertainties remain, especially in and . We find that, for an extended sample of stars, is underestimated by as much as if the vertical dependence of the mean azimuthal motion is neglected. Using a simple model for vertical dependence of kinematics, we find that it is possible to match the Sgr A* proper motion without any need for being larger than that estimated locally by surveys like GCS.

###### Subject headings:

galaxies:kinematics and dynamics – fundamental parameters – formation – methods: data analysis – numerical – statistical## 1. Introduction

Understanding the origin and evolution of disc galaxies is one of the major goals of modern astronomy. The disc is a prominent feature of late type galaxies like the Milky Way. As compared to distant galaxies, for which one can only measure the gross properties, the Milky Way offers the opportunity to study the disc in great detail. For the Milky Way, we can determine 6-dimensional phase space information, combined with photometric and stellar parameters, for a huge sample of stars. This has led to large observational programs to catalog the stars in the Milky Way in order to compare them with theoretical models.

The Milky Way stellar system is broadly composed of four distinct parts although in reality there is likely to be considerable overlap between them: the thin disc, the thick disc, the stellar halo and the bulge. In this paper, we mainly concentrate on understanding the disc components which are the dominant stellar populations.

In the Milky Way, the thick disc was originally identified as the second exponential required to fit vertical star counts (Gilmore & Reid, 1983; Reid & Majewski, 1993; Jurić et al., 2008). Thick discs are also ubiquitous features of late type galaxies (Yoachim & Dalcanton, 2006). But whether the thick disc is a separate component with a distinct formation mechanism is highly debatable and a difficult question to answer.

Since the Gilmore & Reid (1983) result, various attempts have been made to characterize the thick disc. Some studies suggest that thick disc stars have distinct properties: they are old and metal poor (Chiba & Beers, 2000) and enhanced (Fuhrmann, 1998; Bensby et al., 2005, 2003). Jurić et al. (2008) fit the SDSS star counts using a two-component model and find that the thick disc has a larger scale-length than the thin disc. In contrast, Bovy et al. (2012c) using a much smaller sample of SDSS and SEGUE stars find the opposite when they associate the thick disc with the -enhanced component. Finally, the idea of a separate thick disc has recently been challenged. Schönrich & Binney (2009b, a) argued that chemical evolutionary models with radial migration and mixing can replicate the properties of the thick disc (see also Loebman et al., 2011, who explore radial mixing using N-body simulations). Ivezić et al. (2008) do not find the expected separation between metallicity and kinematics for F, G stars in the SDSS survey, and Bovy et al. (2012a, b) argue that the thick disc is a smooth continuation of the thin disc.

Opinions regarding the formation of a thick disc are equally divided. Various mechanisms have been proposed: accretion of stars from disrupted galaxies (Abadi et al., 2003), heating of discs by minor mergers (Quinn et al., 1993; Kazantzidis et al., 2008, 2009; Villalobos & Helmi, 2008; Di Matteo et al., 2011), radial migration of stars (Schönrich & Binney, 2009b, a; Loebman et al., 2011), a gas-rich merger at high redshift (Brook et al., 2004), and gravitationally unstable disc evolution (Bournaud et al., 2009), inter alia. Recently, Forbes et al. (2012) have suggested that the thick disc can form without secular heating, mainly because stars forming at higher redshift had a higher velocity dispersion. Another possibility, proposed by Roškar et al. (2010), is misaligned angular momentum of in-falling gas. How the angular momentum of halo gas becomes misaligned is described in Sharma et al. (2012). However, Aumer & White (2013) and Sales et al. (2012) suggest that misaligned gas can destroy the discs.

The obvious way to test the different thick disc theories is to compare the kinematic and chemical abundance distributions of the thick disc stars with those of different models. Since, the thin and thick disc stars strongly overlap in both space and kinematics, it is difficult to separate them using just position and velocity. To really isolate and study the thick disc, one needs a tag that stays with a star throughout its life. Age is a possible tag but it is difficult to get reliable age estimates of stars. Chemical composition is another promising tag that can be used, but this requires high resolution spectroscopy of a large number of stars. In the near future, surveys such as GALAH using the HERMES spectrograph (Freeman & Bland-Hawthorn, 2008) and the Gaia–ESO survey using the FLAMES spectrograph (Gilmore et al., 2012) should be able to fill this void. In our first analysis, we restrict ourselves to a differential kinematic study of the disc components. We plan to treat the more difficult problem of chemo-dynamics in future.

The simplest way to describe the kinematics of the Milky Way stars of the Solar neighborhood is by assuming Gaussian velocity distributions with some pre-determined orientation of the principal axes of the velocity ellipsoid. Then if a single component disc is used, only three components of velocity dispersion and the mean azimuthal velocity need be known. If a thick disc is included, one requires five additional parameters, one of them being the fraction of stars in the thick disc. If stars are sampled from an extended volume and not just the Solar neighborhood, then one needs to specify the radial dependence of the dispersions.

The velocity dispersion of a disc stellar population is known to increase with age, so one has to adopt an age velocity-dispersion relation. Discs heat because a cold, thin disc occupies a very small fraction of phase space, and fluctuations in the gravitational field cause stars to diffuse through phase space to regions of lower phase-space density. The fluctuations arise from several sources, including giant molecular clouds, spiral arms, a rotating bar, and halo objects that come close to the disc. One approach to computing the consequences of these processes is N-body simulation, but stellar discs are notoriously tricky to simulate accurately, with the consequence that reliable simulations are computationally costly. In particular, they are too costly for it to be feasible to find a simulation that provides a good fit to a significant body of observational data. Instead we characterize the properties of the Milky Way disc by fitting a suitable analytical formula. The formula summarizes large amounts of data but its usefulness extends beyond this. The formula is traditionally taken to be a power law in age (although see Edvardsson et al., 1993; Quillen & Garnett, 2001; Seabroke & Gilmore, 2007). The exponents , and of these power laws may not be the same for all three components. The ratio and the values of , and are useful for understanding the physical processes responsible for heating the disc (e.g. Binney, 2013; Sellwood, 2013).

The first generation of stellar population models characterized the density distribution of stars using photometric surveys. Bahcall & Soneira (1980a, b); Bahcall & Soneira (1984) assumed an exponential disc with magnitude-dependent scale heights. An evolutionary model using population synthesis techniques was presented by Robin & Creze (1986). Given a star formation rate (SFR) and an initial mass function (IMF), one calculates the resulting stellar populations using theoretical evolutionary tracks. The important step forward was that the properties of the disc, like scale height, density laws and velocity dispersions, were assumed to be a function of age rather than being color-magnitude dependent terms. Bienayme et al. (1987) later introduced dynamical self-consistency to link disc scale and vertical velocity dispersions via the gravitational potential. Haywood et al. (1997a, b) further improved the constraints on SFR and IMF of the disc. The present state of the art is described in Robin et al. (2003) and is known as the Besançon model. Here, the disc is constructed from a set of isothermal populations that are assumed to be in equilibrium. Analytic functions for the density distribution, age/metallicity relation and IMF are provided for each population. A similar scheme is also used by the codes TRILEGAL Girardi et al. (2005) and Galaxia (Sharma et al., 2011).

There is a crucial distinction between kinematic and dynamical models. In a kinematic model, one specifies the stellar motions independently at each spatial location, and the gravitational field in which the stars move plays no role. In a dynamical model, the spatial density distribution of stars and their kinematics are self-consistently linked by the potential, under the assumption that the system is in steady state. If one has expressions for three constants of stellar motion as functions of position and velocity, dynamical models are readily constructed via Jeans’ theorem. Binney (2012b) provides an algorithm for evaluating approximate action integrals, and has used these to fit dynamical models to the GCS data (Binney, 2012b). Binney et al. (2014) have confronted the predictions of the best of these models with RAVE data and shown that the model is remarkably, but not perfectly, successful. Our approach is different in two key respects: we fit kinematic rather than dynamical models, and we avoid adopting distances to, or using proper motions of, RAVE stars.

Large photometric surveys such as DENIS (Epchtein et al., 1999), 2MASS (Skrutskie et al., 2006) and SDSS (Abazajian et al., 2009) provide the underpinning for all Galaxy modelling efforts. The SDSS survey has been used to provide an empirical model of the Milky Way stars (Jurić et al., 2008; Ivezić et al., 2008; Bond et al., 2010). The Besançon model was fitted to the 2MASS star counts, and its photometric parameters have been more thoroughly tested than its kinematic parameters because kinematic data for a large number of stars was not available when the model was constructed,

The Hipparcos satellite (Perryman et al., 1997) and the UCAC2 catalog (Zacharias et al., 2004) provided proper motions and parallaxes for stars in the Solar neighborhood. Dehnen & Binney (1998b) used the Hipparcos data to study stellar kinematics as a function of color. They also determined the Solar motion with respect to the LSR and the axial ratios of the velocity ellipsoid. Binney et al. (2000) also using Hipparcos stars found the velocity dispersion to vary with function of age as . More recently, Aumer & Binney (2009) using data from a new reduction of the Hipparcos mission estimated the Solar motion and the AVR for all three velocity components. The AVR is assumed to be a power law with exponents and for the three velocity components in the galactocentric cylindrical coordinate system. They found . They also investigated the star formation rate (SFR) and found it to be declining from past to present. However, a degeneracy exists between the SFR and the slope of the IMF (Haywood et al., 1997b), and constraining both of them together is challenging.

The GCS survey (Nordström et al., 2004) combined the Hipparcos and Tycho-2 (Høg et al., 2000) proper motions with radial velocity measurements and Strömgren photometry to create a kinematically unbiased sample of 16682 F and G stars in the Solar neighborhood. The data contains full 6D phase space information along with estimates of ages. The temperature, metallicity and ages were further improved by Holmberg et al. (2007) and distances and kinematics were improved by Holmberg et al. (2009) using revised Hipparcos parallaxes. They investigated the AVR and found which are at odds with Aumer & Binney (2009). Casagrande et al. (2011) used the infrared flux method to improve the temperature, metallicity and age estimates for the GCS survey. The uncertainty in estimated ages is an ongoing concern for studies that attempt to derive the AVR directly from the GCS data.

With the advent of large spectroscopic surveys like RAVE (Steinmetz et al., 2006) and SDSS/SEGUE (Yanny et al., 2009), we now have the radial velocity and stellar parameters for a large number of stars to beyond the Solar neighborhood. Bovy et al. (2012b, a, c) used SDSS/SEGUE to fit the spatial distributions of mono-abundance populations by double exponentials. They showed that the vertical velocity dispersion declines exponentially with radius but varies little in . Finally, they argue that the thick disc is a continuation of the thin disc rather than a separate entity.

The RAVE survey has also been used to study the stellar kinematics of the Milky Way disc. Pasetto et al. (2012a, b) study the velocity dispersion and mean motion of the thin and thick disc stars in the plane. They use the technique of singular value decomposition to compute the moments of the velocity distribution. Their analysis clearly shows that velocity dispersions fall as a function of distance from the Galactic Center. Williams et al. (2013) explored the kinematics using red clump stars from RAVE and found complex structures in velocity space. A detailed comparison with the prediction from the code Galaxia was done, taking the selection function of RAVE into account. The trend of dispersions in the plane showed a good match with the model. However, the mean velocities showed significant differences. Boeche et al. (2013) studied the relation between kinematics and the chemical abundances of stars. By computing stellar orbits they deduced the maximum vertical distance and eccentricity of stars. Next they studied the chemical properties of stars by binning them in the plane. They found that stars with kpc and have two populations with distinct chemical properties, which hints at radial migration. Binney et al. (2014) used full six-dimensional information for RAVE stars to fit a Gaussian model to velocities in the plane. They studied how the orientation and shape of the velocity ellipsoid varies with location in the Galaxy, and provided analytic fits to the highly non-Gaussian distributions of . They also compared the observed kinematics of stars in different spatial bins with the predictions of a full dynamical model that had been fitted to the GCS data.

Stellar kinematics allow us to measure the peculiar motion () of the Sun with respect to the local standard of rest (LSR), and also the speed of the LSR (in other words, the circular speed at the location of Sun, ). There have been as many determinations of these as there have been new data, one of the earliest being ( km s by Delhaye (1965). Very precise measurements of these have been extracted from the Hipparcos proper motions and the Geneva Copenhagen survey. Dehnen & Binney (1998b) and Aumer & Binney (2009), using Hipparcos proper motions, got ( km s. A revision of was suggested by Binney (2010) and McMillan & Binney (2010). Later Schönrich et al. (2010) explained why the previous estimates, which used colors as a proxy for age, gave incorrect results. Using a chemo-dynamical model calibrated on GCS data, they found ( km s. Schönrich (2012) described a model-independent method and suggests that could be as high as . As further evidence of an unsettled situation, Bovy et al. (2012d) find from a sample of 3500 APOGEE stars , and km s and also suggest a revision of the LSR reference frame.

component | age | IMF | density law | |
---|---|---|---|---|

Thin disc | Gyr | for | = 5 kpc, = 3 kpc | |

for | ||||

0.15-10 Gyr | = 2.53 kpc, = 1.32 kpc | |||

Thick disc | 11 Gyr | if | = 2.5 kpc, = 0.8 kpc | |

if | kpc |

In this paper, we refine the kinematic parameters of the Milky Way, using first a simple model based on Gaussian velocity distributions, and then a model based on the Shu distribution function (DF). We explore the age-velocity dispersion relation, the radial gradient in dispersions, the Solar motion and the circular speed. A full exploration of this parameter space using Markov chain Monte Carlo (MCMC) techniques has not been done before, even for a sample as small as the GCS.

The RAVE survey contains giants and dwarfs in roughly equal proportions, and it is hard to determine distances to giants. Moreover, many RAVE stars are sufficiently distant for the errors in their available, ground-based, proper motions to give rise to errors in their tangential velocities that far exceed the small () errors in their line-of-sight velocities. Hence we choose not to use either distances or proper motions. Instead we marginalize over these variables in addition to mass, age, and metallicity. When the velocity distribution is Gaussian, the marginalisation over tangential velocity can be done analytically, in general for other models, e.g., Shu DF models, the marginalisation has to be done numerically, and it is computationally expensive.

Bovy et al. (2012d) recently used a similar procedure to fit models to 3500 APOGEE stars, but they did not investigate the AVR, and considered only Gaussian models. In this paper, we fit a kinematic model to 280,000 RAVE stars taking full account of RAVE’s photometric selection function. To handle the large data size, we introduce two new MCMC model-fitting techniques. Our aim is to encapsulate in simple analytical models the main kinematic properties of the Milky Way disc. Our results should be useful for making detailed comparison with simulations.

The paper is organized as follows. In §2, we introduce the analytic framework employed for modelling. In §3, we describe the data that we use and its selection functions. In §4, we describe MCMC model-fitting techniques employed here. In §5, we present our results and discuss their implications in §6. Finally, in §7 we summarize our findings and look forward to the next stages of the project.

## 2. Analytic framework for modelling the galaxy

We first describe the analytic framework used to model the Galaxy (Sharma et al., 2011). The stellar content of the Galaxy is modeled as a set of distinct components: the thin disc, the thick disc, the stellar halo and the bulge. The distribution function, i.e., the number density of stars as a function of position (), velocity (), age (), metallicity (), and mass () for each component, is assumed to be specified a priori as a function

(1) |

where runs over components. The form of that correctly describes all the properties of the Galaxy and is self-consistent is still an open question. However, over the past few decades considerable progress has been made in identifying a working model dependent on a few simple assumptions (Robin & Creze, 1986; Bienayme et al., 1987; Haywood et al., 1997a, b; Girardi et al., 2005; Robin et al., 2003). Our analytical framework brings together these models as we describe below.

For a given Galactic component, let the stars form at a rate with a mass distribution (IMF) that is a parameterized function of age . Let the present day spatial distribution of stars be conditional on age only. Finally, assuming that the velocity distribution to be and the metallicity distribution to be , we have

(2) |

The functions conditional on age can take different forms for different Galactic components. The IMF here is normalized such that and is the mean stellar mass. The metallicity distribution is modeled as a log-normal distribution,

(3) |

the mean and dispersion of which are given by age-dependent functions and . The is widely referred to as the age-metallicity relation (AMR). Functional forms for each of the expressions in Equation (2) are given in Sharma et al. (2011) (see also Robin et al., 2003). For convenience we reproduce in Table 1 a short description of the thin and thick disc components. The axis ratio of the thin disc is given by

(4) |

and this represents the age scale height relation.

### 2.1. Kinematic modelling

Having described the general framework for analytical modelling, we now discuss our strategy for the kinematic modelling of the Milky Way. Simply put, we want to constrain the velocity distribution . In what follows, we assume that everything except for on the right hand side of Equation (2) is known. In the next two subsections we discuss the functional forms of the adopted and describe ways to parameterize them. Technical details related to fitting such a model to observational data are discussed in Section 4.

Although we can supply any functional form for and fit them to data, in reality there is much less freedom. The spatial density distribution and the kinematics are linked to each other via the potential. Hence, specifying independently lacks self consistency. In such a scenario, the accuracy of a pure kinematic model depends upon our ability to supply functional forms of that are a good approximation to the actual velocity distribution of the system. A proper way to handle this problem would be to use dynamically self consistent models, but such models are still under development and we hope to explore them in future. In the meantime, we explore kinematic models that provide a reasonable approximation to the actual velocity distribution and hope to learn from them.

### 2.2. Gaussian velocity ellipsoid model

In this model, the velocity distribution is assumed to be a triaxial Gaussian,

(5) | |||||

where are cylindrical coordinates. The is the asymmetric drift and is given by

(6) | |||||

This follows from Equation 4.227 in Binney & Tremaine (2008) assuming . This is valid for the case where the principal axes of velocity ellipsoid are aligned with the spherical coordinate system. If the velocity ellipsoid is aligned with the cylindrical coordinate system, then . Recent results using the RAVE data suggest that the velocity ellipsoid is aligned with the spherical coordinates (Siebert et al., 2008; Binney et al., 2014). One can parameterize our ignorance by writing the asymmetric drift as follows:

(7) |

This is the form that is used by Bovy et al. (2012d).

The dispersions of the and components of velocity increase as a function age due to secular heating in the disc, and there is a radial dependence such that the dispersion increases towards the Galactic Center. We model these effects after Aumer & Binney (2009) and Binney (2010) using the functional form

(8) | |||||

(9) |

The choice of the radial dependence is motivated by the desire to produce discs in which the scale height is independent of radius. For example, under the epicyclic approximation, if is assumed to be constant, then the scale height is independent of radius for (van der Kruit & Searle, 1982; van der Kruit, 1988; van der Kruit & Freeman, 2011). In reality there is also a dependence of velocity dispersions which we have chosen to ignore in our present analysis. This means that for a given mono age population the asymmetric drift is independent of . However, the velocity dispersion and asymmetric drift of the combined population of stars are functions of . This is because the scale height of stars for a given isothermal population is an increasing function of its vertical velocity dispersion.

For our kinematic analysis we assume with . While this is true for the thick disc adopted by us, for the thin disc this is only approximately true (see Table 1). The thin disc with age between 0.15 and 10 Gyr is exponential at large with a scale length of 2.53 kpc.

### 2.3. Shu distribution function model

The Gaussian velocity ellipsoid model has its limitations. In particular, the distribution of is strongly non-Gaussian, being highly skew to low .

For a two-dimensional disc, a much better approximation to the velocity distribution is provided by the Shu (1969) distribution function. Moreover, the Shu DF, being dynamical in nature, connects the radial and azimuthal components of velocity dispersion to each other and to the mean-streaming velocity, thus lowering the number of free parameters in the model.

Assuming the potential is separable as we can write the distribution function as

(10) |

where is the angular momentum,

(11) |

(12) | |||||

with

(13) |

being the effective potential. Let be the radius of a circular orbit with specific angular momentum . In Schönrich & Binney (2012) (see also Sharma & Bland-Hawthorn, 2013) it was shown that joint distribution of and can be written as

(14) |

where is a function that controls the disc’s surface density and

(15) | |||||

(16) |

We assume to be specified as

(17) | |||||

and to be specified as

(18) |

Now this leaves us to choose . This should be done so as to produce discs that satisfy the observational constraint given by , i.e., an exponential disc (or discs) with scale length . A simple way to do this is to let

(19) |

However, this matches the target surface density only approximately. A better way to do this is to use the empirical formula proposed in Sharma & Bland-Hawthorn (2013) such that

(20) |

where and is a function of the following form

(21) |

with . This is the scheme that we employ in this paper. As in the previous section, we adopt kpc.

Model Parameter | Description |
---|---|

Solar motion with respect to LSR | |

Solar motion with respect to LSR | |

Solar motion with respect to LSR | |

The velocity dispersion at 10 Gyr | |

Normalization of thin disc AVR (Eq 9) | |

The velocity dispersion at 10 Gyr | |

Normalization of thin disc AVR (Eq 9) | |

The velocity dispersion at 10 Gyr | |

Normalization of thin disc AVR (Eq 9) | |

The velocity dispersion of thick disc (Eq 9) | |

The velocity dispersion of thick disc (Eq 9) | |

The velocity dispersion of thick disc (Eq 9) | |

The exponent of thin disc AVR (Eq 9) | |

The exponent of thin disc AVR (Eq 9) | |

The exponent of thin disc AVR (Eq 9) | |

The scale length of the | |

velocity dispersion profile for thin disc (Eq 9) | |

The scale length of the | |

velocity dispersion profile for thick disc (Eq 9) | |

Distance of Sun from the Galactic Center | |

The circular speed at Sun | |

Vertical fall of circular velocity (Eq 22) | |

Radial gradient of circular speed (Eq 22) |

### 2.4. Model for the potential

So far we have described kinematic models in which the potential is separable in and . In such cases, the energy associated with the vertical motion can be assumed to be the third integral of motion. In reality, the potential generated by a double exponential disc is not separable in and . For example, the hypothetical circular speed defined as can have both a radial and a vertical dependence. We model it as

(22) | |||||

The parameters and control the radial and vertical dependencies, respectively. The motivation for the vertical term comes from the fact that the above formula with provides a good fit to the profile of Milky Way potential by Dehnen & Binney (1998a) as well as that of Law & Majewski (2010) (see Figure 1). Both of them have bulge, halo and disc components. The former has two double exponential discs while the later has a Miyamoto-Nagai disc.

To accurately model a system, in which the potential is not separable in and , requires a distribution function that incorporates the third integral of motion in addition to energy and angular momentum , e.g., distribution functions based on action integrals and (Binney, 2012b, 2010). Converting phase space coordinates to actions integrals is not easy and techniques to make this possible are under development. One way to compute the actions is by using the adiabatic approximation, i.e., conservation of vertical action (Binney & McMillan, 2011; Schönrich & Binney, 2012). Using an adiabatic approximation, Schönrich & Binney (2012) extend the Shu DF to three dimensions and model the kinematics as a function of distance from the plane. Recently, it has been shown by Binney (2012a) that the adiabatic approximation is accurate only close to the midplane and that much better results are obtained by assuming the potential to be similar to a Stackel potential.

In this paper, to model systems where the potential is not separable in and , we follow a much simpler approach. The approach is motivated by the fact that, for realistic galactic potentials, we expect the of a single age population to fall with . It has been shown by both Binney & McMillan (2011) and Schönrich & Binney (2012) that when vertical motion is present, in a Milky Way type potential, the effective potential for radial motion (see Equation 13) needs to be modified as the vertical motion also contributes to the centrifugal potential. Neglecting this effect leads to an overestimation of . As one moves away from the plane this effect is expected to become more and more important. Secondly, as shown by Schönrich & Binney (2012), in a given solar annulus, stars with smaller will have larger vertical energy and hence larger scale height. This implies that stars with smaller are more likely to be found at higher , consequently the should also decrease with height.

The fall of with height is also predicted by the Jeans equation for an axisymmetric system

(23) | |||||

The at high will be lower both because is lower and because the term in the third square bracket decreases with , e.g., assuming .

For the Gaussian model we simulate the overall reduction of with by introducing a parameterized form for as given by Equation (22) in Equation (6). Given this prescription we expect , so as to account for effects other than that involving the first term in Equation (23). In reality, the velocity dispersion tensor will have a much more complicated dependence on and than what we have assumed, e.g., we assume that only has an dependence which is given by an exponential form.

For the Shu model we replace in Equation (15) by the form in Equation (22). The idea again is to model the fall of with . However, the prescription breaks the dynamical self-consistency of the model and turns it into a fitting formula. In reality, the may not exactly follow the functional form for the vertical dependence predicted by our model, but is better than completely neglecting it.

### 2.5. Models and parameters explored

We now give a description of the parameters and models that we explore. We investigate up to 18 parameters (see Table 2 for a summary). These are the Solar motion , the logarithmic slopes of age-dispersion relations , the scale lengths of radial dependence of velocity dispersions , the velocity dispersions at of the thin disc and of the thick disc ; for simplicity the subscript is dropped here. The Gaussian models are denoted by GAU whereas models based on the Shu DF are denoted by SHU. For models based on the Shu DF, the azimuthal motion is coupled to the radial motion, hence , and are not required. When is fixed, we assume its value to be . In some cases, we also keep the parameters and fixed. While reporting the results we highlight the fixed parameters using the magenta color.

In our analysis the distance of the Sun from the galactic center, , is assumed to be 8.0 kpc. To gauge the sensitivity of our results to , we also provide results for cases with and 8.5 kpc. The true value of is still debatable ranging from 6.5 to 9 kpc. Recent results from studies of orbit of stars near the Galactic Center give (Gillessen et al., 2009). The classically accepted value of kpc is a weighted average given in a review by Reid (1993). The main reason we keep fixed is as follows. Given that we do not make use of explicit distances, proper motions or external constraints like the proper motion of Sgr A*, it is clear we will not be able to constrain well, specially if is free. For example McMillan & Binney (2010) using parallax, proper motion and line of sight velocity of masers in high star forming regions, show that constraining both and independently is difficult.

## 3. Observational data and selection functions

In this paper we analyze data from two surveys, the Radial Velocity Experiment, RAVE (Steinmetz et al., 2006; Zwitter et al., 2008; Siebert et al., 2011; Kordopatis et al., 2013) and the Geneva Copenhagen Survey, GCS (Nordström et al., 2004; Holmberg et al., 2009). For fitting theoretical models to data from stellar surveys, it is important to take into account the selection biases that were introduced when observing the stars. This is especially important for spectroscopic surveys which observe only a subset of the all possible stars defined within a color-magnitude range. So we also analyze the selection function for the RAVE and GCS surveys.

### 3.1. RAVE survey

The RAVE survey collected spectra of 482430 stars between April 2004 and December 2012 and stellar parameters, radial velocity, abundance and distances have been determined for 425561 stars. In this paper we used the internal release of RAVE from May 2012, which consisted of 458412 observations. The final explored sample after applying various selection criteria consists of 280128 unique stars. These data are available in the DR4 public release (Kordopatis et al., 2013), where an extended discussion of the sample is also presented.

For RAVE we only make use of the and of stars. The and 2MASS colors are used for marginalization over age, metallicity and mass of stars taking into account the photometric selection function of RAVE. We do not use proper motions, or stellar parameters which could in principle provide tighter constraints, but then one has to worry about the systematics introduced by their use. For example, in a recent kinematic analysis of RAVE stars, Williams et al. (2013) found systematic differences between different proper motion catalogs like PPMXL (Röser et al., 2008), SPM4 (Girard et al., 2011) and UCAC3 (Zacharias et al., 2010). As for stellar parameters, although they are reliable, no pipeline can claim to be free of unknown systematics specially when working with low signal to noise data. Hence, as a first step it is instructive to work with data that are least ambiguous and then in the next step check the results by adding more information. As we will show later, for the types of model that we consider, even using only and can provide good constraints on the model parameters.

We now discuss the selection function of RAVE. The RAVE survey was designed to be a magnitude-limited survey in the band. This means that theoretically it has one of the simplest selection functions, but, in practice, for a multitude of reasons, some biases were introduced. First, the DENIS and 2MASS surveys were not fully available when the survey started. Hence, the first input catalog (IC1) had stars from Tycho and SuperCOSMOS. For Tycho stars, magnitudes were estimated from and magnitudes. On the other hand, the SuperCOSMOS stars had magnitudes but an offset was later detected with respect to . Later, as DENIS and 2MASS became available, the second input catalog IC2 was created. With the availability of DENIS, it became possible to have a direct mag measurement which was free from offsets like those observed in SuperCOSMOS. But DENIS itself had its own problems – saturation at the bright end, duplicate entries, missing stripes in the sky, inter alia. To solve the problem of duplicate entries, the DENIS catalog was cross-matched with 2MASS to within a tolerance of 1. This helped clean up the color-color diagram of vs in particular (Seabroke, 2008).

Given this history, the question arises how can we compute the selection function. Since accurate mag photometry is not available for stars that are only in IC1, the first cut we make is to select stars from IC2 only. Then we removed the duplicates– among multiple observations one of them was selected randomly. To weed out stars with large errors in radial velocity, we made some additional cuts:

For brighter magnitudes, , suffers from saturation. One could either get rid of these stars to be more accurate or ignore the saturation. In the present analysis we ignore the saturation. Note, the observed stars in the input catalog are not necessarily randomly sampled from the IC2. Stars were divided into four bins in and stars in each bin were randomly selected to observe at a given time. However it seems later on this division was not strictly maintained (probably due to the observation of calibration stars and some extra stars going to brighter magnitudes). This means the selection function has to be computed as a function of in much finer bins. Assuming the DENIS I magnitudes are correct, and the cross-matching is correct, the only thing that needs to be taken into account is the angular completeness of the DENIS survey (missing stripes). To this end, we grid the observed and IC2 stars in space and compute a probability map. To grid the angular co-ordinates we use the HEALPIX pixelization scheme (Górski et al., 2005). The resolution of HEALPIX is specified by the number and the total number of pixels is given by . For our purpose, we use which gives a pixel size of 13.42 , which is smaller than the RAVE field of view of 28.3 . For magnitudes, we use a bin size of 0.1 mag, which again is much smaller than the magnitude range included in each observation. Given the fine resolution of the probability map, the angular and magnitude dependent selection biases are adequately handled. Note, in the range , a color selection of was used to selectively target giants, and we take this into account in our analysis.

Arce & Goodman (1999) suggest that the Schlegel et al. (1998) maps overestimate reddening by a factor of 1.3-1.5 in regions with smooth extinction , i.e., (see also Cambrésy et al., 2005). In Figure 2 the color and temperature distributions of our RAVE stars (black lines) are compared with predictions from Galaxia given the selection above. At high latitudes (second and fourth panels) the red model curves agree reasonably well with the black data curves, but in the top panel () the red model distribution of colors is clearly displaced to red colours relative to the data. The low-latitude temperature distributions shown in the third panel show no analogous shift of the model curve to lower temperatures, so we have a clear indication that the model colours have been made too red by excessive extinction. To correct this problem, we modify the Schlegel as follows

(24) |

The formula above reduces extinction by 40% for high extinction regions; the transition occurs around and is smoothly controlled by the function. The green curves in the top two panels show that the proposed correction to Schlegel maps. Although not perfect, the correction reduces the discrepancy between the model and data for low latitude stars (top panel) whilst having negligible impact on high-latitude stars.

The fact that the temperature and color distributions in Figure 2 match up so well is encouraging, given that we selected on magnitude alone. This implies that the spatial distribution of stars specified by Galaxia satisfies one of the necessary observational constraints.

### 3.2. GCS survey

We fit the models to all six phase-space coordinates of a subset of the 16682 F and G type main-sequence stars in the GCS (Nordström et al., 2004; Holmberg et al., 2009). A mock GCS sample was extracted from the model as in Sharma et al. (2011). Velocities and temperatures are available for 13382 GCS stars. We found that while Galaxia predicts less than one halo star in the GCS sample for a distance less than pc, when plotted in plane, the GCS has 29 stars with and highly negative values of (as expected for halo stars). Following Schönrich et al. (2010), we identify these as halo stars and exclude them from our analysis.

The GCS catalog is complete for F and G type stars within a volume given by pc and in magnitude; within these limits there are only 1342 stars. But since GCS is a color-magnitude limited survey, there is no need to restrict the analysis to a volume complete sample. In Nordström et al. (2004) magnitude completeness as a function of color is provided and we use this (their §2.2). There is some ambiguity about the coolest dwarfs which were added for declination ; from information gleaned from Nordström et al. (2004), we could not find a suitable way to take this into account.

We also applied some additional restrictions on the sample. For example, we restrict our analysis to stars with distance less than 120 pc, so as to avoid stars with large distance errors. The GCS survey selectively avoids giants. To mimic this we use the following selection function . The predicted temperature distributions show a mismatch with models, in particular, there are too many hot stars. Using Casagrande et al. (2011) temperatures, which are more accurate, we found an upper limit on of 7244 K, which was applied to the models.

After the above mentioned cuts, the final sample consisted of 5201 stars. Note, we do not remove possible binary stars as this will further reduce the number of stars. In future, we think it will be instructive to check if there is any systematic associated with the inclusion or exclusion of binaries. The black histograms in Figure 4 show the distribution of these stars, while the red histograms show the predictions of the model. At the hot end, the temperature distributions of model and data are still discrepant, but the distance distributions agree nicely. The model’s age distribution is qualitatively correct but differences can also be seen. The plotted GCS ages are maximum likelihood Padova ages and there can be systematics associated with this. A more quantitative comparison would require estimating the ages of model stars in the same way as done by GCS and taking into account uncertainties and systematics which we do not do here. The peak in the model at 11 Gyr is due to the thick disc having a fixed age. The peaks in the data at 0 and are most likely due to caps employed while estimating ages. The color distribution in GCS shows a peak at around , which could be due to an unknown selection effect. The bump at , which is also seen in models, is due to turnoff stars. Overall, we think our modelling reproduces to a good degree the selection function of the GCS stars.

## 4. Model Fitting techniques

If are the observed properties of a star, we can describe the observed data by . Also, let be the set of parameters that define the model. Our job is to compute

(25) |

where . We employ an MCMC scheme to estimate and assume a uniform prior on . We now discuss how to compute .

Generally, a model of a galaxy gives the probability density . For RAVE, the observed quantities are and , while for GCS they are and . Since quantities like and are unknown, one has to compute the marginal probability density by integration. For RAVE, the required marginal density is

(26) | |||||

and for GCS it is

(27) | |||||

Here is the selection function specifying how the stars were preselected in the data. The actual selection is on photometric magnitude which in turn is a function of and .

For the kinds of models explored here, the computations are considerably simplified due to the fact that

(28) |

for which is the set of model parameters that govern the spatial distribution of stars and is the set of model parameters that govern the kinematic distribution of stars. The term is invariant in our analysis, and this is the main assumption that we make. In other words we assume star formation rate (SFR), initial mass function (IMF), scale length of disc, age scale-height relation, age metallicity relation and radial metallicity gradient for the disc. All these distributions can be constrained by the stellar photometry. The distribution represents the kinematics, which is what we explore. It should be noted that the model that we use has been shown to satisfy the number count of stars (Robin et al., 2003; Sharma et al., 2011). In a fully self consistent model, the scale height, the vertical stellar velocity dispersion and the potential would all be related to each other and this is something we would like to address in future.

We can now integrate the last term in Equation (28) over and such that

(29) | |||||

where

(30) | |||||

The term is computed numerically using the code Galaxia (Sharma et al., 2011). Galaxia, uses isochrones from the Padova database to compute photometric magnitudes for the model stars (Marigo et al., 2008; Bertelli et al., 1994). We first generate a fiducial set of stars satisfying the color-magnitude range of the survey. Then we apply the selection function and reject stars that do not satisfy the constraints of the survey. The accepted stars are then binned in space. Since, the GCS is local to the Sun, we use the following approximation . The probability distribution in space for RAVE is shown in Figure 3.

For RAVE, we have to integrate over four variables (), but for GCS we integrate over only . The 4D marginalization for RAVE poses a serious computational challenge for data as large as the RAVE survey. For Gaussian distribution functions, the integral over and can be performed analytically to give an analytic expression for , but in general it cannot be done analytically. Hence, we try two new methods. The first method is fast but has inflated uncertainties. The second method is slower to converge but gives correct estimates of uncertainties. Given these strengths and limitations, we use a combined strategy that makes best use of both the methods.

We use the first ‘sampling and projection’ method to get an initial estimate of and also its covariance matrix. These are then used in the second ‘data augmentation’ method. The initial estimate reduces the ‘burn in’ time, while the covariance matrix eliminates the need to tune the widths of the proposal distributions. In general we use an adaptive MCMC scheme, which avoids manual tuning of the widths of the proposal distributions (Andrieu & Thoms, 2008). At regular intervals, we compute the covariance matrix and scale it so as to achieve the desired acceptance ratio for the given number of parameters Gelman et al. (1996). We now discuss the two methods in more detail.

### 4.1. MCMC using sampling and projection

Instead of doing the computationally intensive marginalization, at each step of the Markov chain of model parameters, we generate a sample of stars by Monte-Carlo sampling the current model subject to the selection function. Binning these stars in space then gives an estimate of . Note that, given the stochastic nature of our estimate of , the standard Metropolis-Hastings algorithm had to be altered to avoid the simulation from getting stuck at a stochastic maximum of the likelihood.

### 4.2. MCMC using data augmentation

Instead of marginalizing one can treat the nuisance parameters as unknown parameters and estimate them alongside other parameters. This constitutes what is known as a sampling based approach for computing the marginal densities. The basic form of this scheme was introduced by Tanner & Wong (1987) and later extended by Gelfand & Smith (1990). Let be an extra set of variables that are needed by the model to compute the probability density. Then we can write

(31) |

where , and is a function which is known and relatively easy to compute. For example, for the RAVE data and . Due to the unusually large number of parameters, it is difficult to get satisfactory acceptance rates with the standard Metropolis-Hastings scheme without making the widths of the proposal distributions extremely small. Thus the chains would take an unusually long time to mix. To solve this, one uses the Metropolis scheme with Gibbs sampling (MWG) (Tierney, 1994). The MWG scheme is also useful for solving hierarchical Bayesian models, and its application for 3D extinction mapping is discussed in Sale (2012). In our case, the Gibbs step consists of first sampling from the conditional density and then from the conditional density . The sampling in each Gibbs step is done using the Metropolis-Hastings algorithm.

### 4.3. Goodness of fit

To assess the ability of a model to fit the data, we compute an approximate reduced value. To accomplish this, first we bin the data in the observational space. For RAVE, we bin the data in space with bins of size 859 deg and . Angular binning was done using the HEALPIX scheme. For GCS, we bin the and components of velocity separately with bins of size . Next, an N-body realization of a given model was created satisfying the same constraints as the data, The reduced between the data and the model was then computed as

(32) |

Here, is the number of data points in a bin, the number of model points in the same bin and is the sampling fraction. Choosing, to be very high one can increase the precision of the estimate, but then it increases the computational cost. For RAVE was 1 while for GCS it was 10. To decrease the stochasticity in the estimate, we computed the mean over 30 random estimates .

The reduced as computed above, has its limitations. Firstly, it is not an accurate estimator of the goodness of fit. Secondly, value is sensitive to the choice of bin size and . Hence, it is not advisable to estimate statistical significance using our reduced . However, the reduced should be good enough to qualitatively compare the goodness of fit of two models.

### 4.4. Tests using synthetic data

We now describe tests in which mock data are sampled from the distribution function and then fitted using the MCMC machinery. These tests serve two main purposes. First, they determine if our MCMC scheme works correctly. Secondly, they tell us which parameters can be recovered and with what accuracy. We study two classes of models based on (1) the Gaussian DF and (2) the Shu DF. Additionally, we study two types of mock data, one corresponding to the RAVE survey and the other to the GCS survey. For GCS we also study models where is fixed. Altogether this leads to 6 different types of tests.

The results of these tests are summarized in Tables 4 and 4. The difference of a parameter from input values divided by uncertainty measures the confidence of recovering the parameter. To aid the comparison, we color the values if they differ significantly from the input values: (black), (blue). It can be seen that all parameters are recovered within the 3 range as given by the error bars. Ideally to check the systematics, the fitting should be repeated multiple times and the mean values should be compared with input values. However, the MCMC simulations being computationally very expensive we report results with only one independent data sample for each of the test cases.

It can be seen that GCS type data cannot properly constrain . This is because the GCS sample is very local to the Sun. Keeping free also has the undesirable effect of increasing the uncertainty of and . For Gaussian models, it is easy to see from Equation (6) that the effect of changing can be compensated by a change in and . Given these limitations, when analyzing GCS we keep fixed to 226.87 km s, a value that was used by Sharma et al. (2011) in the Galaxia code.

The Solar motion is constrained well by both surveys, but better by RAVE. RAVE is also clearly better in constraining thick-disc parameters than GCS, mainly because the GCS has very few thick-disc stars (Galaxia estimates it to be 6% of the overall GCS sample). Across all parameters, for Shu models is the only parameter which is constrained better by GCS than by RAVE. This is because RAVE only has radial velocities. This means that only those stars that lie towards the pole can carry meaningful information about the vertical motion, and such stars constitute a much smaller subset of the whole RAVE sample. This suggests that one can use the value from GCS when fitting the RAVE data, as we show below.

Model | GCS GAU | GCS GAU | RAVE GAU | Input |