A hierarchical Bayesian approach for reconstructing the Initial Mass Function of Single Stellar Populations
Recent studies based on the integrated light of distant galaxies suggest that the initial mass function (IMF) might not be universal. Variations of the IMF with galaxy type and/or formation time may have important consequences for our understanding of galaxy evolution. We have developed a new stellar population synthesis (SPS) code specifically designed to reconstruct the IMF. We implement a novel approach combining regularization with hierarchical Bayesian inference. Within this approach we use a parametrized IMF prior to regulate a direct inference of the IMF. This direct inference gives more freedom to the IMF and allows the model to deviate from parametrized models when demanded by the data. We use Markov Chain Monte Carlo sampling techniques to reconstruct the best parameters for the IMF prior, the age, and the metallicity of a single stellar population. We present our code and apply our model to a number of mock single stellar populations with different ages, metallicities, and IMFs. When systematic uncertainties are not significant, we are able to reconstruct the input parameters that were used to create the mock populations. Our results show that if systematic uncertainties do play a role, this may introduce a bias on the results. Therefore, it is important to objectively compare different ingredients of SPS models. Through its Bayesian framework, our model is well-suited for this.
keywords:galaxies: stellar content – galaxies: luminosity function, mass function – methods: statistical
Since its introduction by Salpeter (1955), the initial mass function (IMF) has been a key parameter in the study of stars, stellar populations and galaxy evolution. Salpeter initially parametrized the IMF as a single power law. However, it was recognized later on that when the IMF was extended down to the lowest stellar masses that it did not follow a single power law. Instead, a lognormal distribution (Miller & Scalo, 1979), a multicomponent power law (Kroupa et al., 1993) or a combination of a lognormal distribution for low masses and a power law for higher masses (Chabrier, 2003) were proposed as alternatives. Dabringhausen et al. (2008), among others, have shown that the latter two are in fact very similar.
Measurements of the IMF have long been based on direct star counts and mass estimates of resolved stars. These kinds of measurements are not possible for stars beyond the Local Group. Therefore, for many astrophysical studies the IMF has been assumed to be universal and similar to the one of the Milky Way. However, recent studies (Davé, 2008; van Dokkum, 2008; Treu et al., 2010; Graves & Faber, 2010; Conroy & van Dokkum, 2012; Cappellari et al., 2012; Spiniello et al., 2012, 2014; Ferreras et al., 2013; La Barbera et al., 2013) suggest that the IMF might not be universal on a cosmological scale, indicating that the relative number of low-mass stars in the population changes as a function of galaxy mass or velocity dispersion. This may have important consequences for the many properties of galaxies that are derived on the basis of the IMF, such as their stellar content, chemical enrichment history and even their evolutionary history: see e.g. Tinsley (1972).
Starting with Tinsley (1968), stellar population synthesis (SPS) models have been developed to transform the observable properties of a galaxy into a set of physical properties. Among the physical properties encrypted in the spectrum of a galaxy are its star formation history (SFH), the amount of gas and dust that it contains, its chemical composition and its IMF. However, deriving the low-mass end of the IMF on the basis of the spectrum of a galaxy is not straightforward. Dwarfs with masses M contribute only 1% to the integrated light of an old stellar population (Conroy & van Dokkum, 2012). Nevertheless, they contribute 12 and 42% of the total stellar mass for a standard Kroupa IMF (Kroupa, 2001) and a Salpeter IMF with a low-mass boundary of 0.1 M and a high-mass boundary of 100 M, respectively. For younger stellar populations, the relative contribution of low-mass stars to the spectrum is even less. In old stellar populations, the spectral similarity of low-mass stars and the most luminous stars (the K and M giants) further complicates the situation. However, a number of (gravity-sensitive) spectral features are known to be sensitive to either dwarfs or giants (Faber & French, 1980; Schiavon et al., 1997a; Wing & Ford, 1969; Schiavon et al., 1997b; Gorgas et al., 1993; Worthey et al., 1994; Schiavon, 2007; Cenarro et al., 2003; Spiniello et al., 2012). The challenge for a SPS model is to extract this information from a spectrum.
Most SPS models are built upon three basic ingredients: a stellar evolution model in the form of isochrones as a function of age and metallicity, a stellar library, and an IMF. These ingredients form the basis of what is known as a single stellar population (SSP): a single, coeval population of stars with the same metallicity. The isochrone describes which stars are present in a stellar population, the stellar library provides a set of stellar spectra, and the IMF determines the distribution of stars along the isochrone. All of these ingredients have their own uncertainties. Models of stellar evolution are often one-dimensional codes and the results of these codes depend on the adopted prescriptions for uncertain factors, such as overshooting, rotation, interaction between binary stars, and mass loss. Stellar libraries may be theoretical, empirical or a combination of both. Both empirical and theoretical libraries have their own advantages and disadvantages. The assumption of a universal IMF is another source of uncertainty.
Real galaxies are not SSPs. Combining a set of SSPs with a SFH, a model for chemical evolution and possibly a dust model allows the construction of composite stellar populations (CSPs). To date, many different SPS models have been developed, e.g. Bruzual & Charlot (2003), Le Borgne et al. (2004), Maraston (2005), Conroy & van Dokkum (2012), and Vazdekis et al. (2012). Most of these models allow the user to change the IMF. Once an IMF or a set of different IMFs is defined, this allows the model to create synthesized spectra for a grid of different model parameters (including the IMF). The synthesized spectra are then compared with observed galaxy spectra to obtain values of, e.g., metallicity or IMF slope. Determining the best-fitting parameters is often done through a minimization technique, such as minimization in for example Koleva et al. (2009). However, the ultimate goal of a SPS model would be a direct inference of the physical parameters from the spectrum.
Each SPS model uses its own set of ingredients, and the way in which these ingredients are combined also varies. This requires an objective manner to compare different SPS models with each other. A solution to this problem is provided by Bayesian inference. In this paper we develop a hierarchical Bayesian framework for SPS. Within our model, a parametrization of the IMF is used to construct a (flexible) IMF prior. Given this prior, our model allows for a direct inference of the piecewise IMF from the spectrum of an SSP. The outline of the paper is as follows. In Section 2 we discuss the Bayesian framework of our model. In Section 3 we describe how we construct a representative set of stellar templates as an input for our model. We then test our model by applying it to respectively mock SSPs and SSPs created by other SPS models in Sections 4 and 5.
2 Hierarchical Bayesian inference
Within a hierarchical Bayesian model there are multiple levels of inference. In this paper, we have two levels. The first level of inference assumes that a certain model family can provide a proper description of the truth and tries to obtain the best-fit for the free parameters within that model family (parameter estimation). The second level of inference allows us to compare a set of different model families and tries to infer the most probable model family given the data (model comparison). In analogy to the analysis presented by MacKay (1992), we derive a hierarchical Bayesian framework for modelling spectral energy distributions.
Neglecting the effect of extinction, which we will include in a future publication, the spectral energy distribution of a stellar population may be considered as the sum of the spectra of all the stars that it contains. This allows us to write the spectrum of the stellar population as a linear combination of a certain set of stellar templates. For an SSP, the stellar types that are present in the population are defined by an isochrone. The most important parameters that define an isochrone are its age and metallicity . An isochrone provides us with the stellar parameters (effective temperatures, surface gravities, luminosities, colors, initial masses, and current masses) of all the stellar types present in the corresponding SSP. These parameters are typically combined with a stellar library and an interpolator to create a spectrum s for each of the isochrone stars. This procedure is discussed in more detail in Section 3.
Suppose that S is a matrix with the spectra of all the isochrone stars in its columns, such that corresponds to the i-th flux density bin of the spectrum of isochrone star . Since the isochrone is defined by its age and metallicity, is implicitly also a function of age and metallicity (i.e. the age and metallicity define the isochrone, the isochrone defines a set of stars and their parameters which in turn are used to create a corresponding set of stellar spectra that goes into S). Although here we consider SSPs, S might equally well contain the spectra of the stellar templates of a CSP. In that case S also becomes a function of the SFH of the stellar population.
If w is a vector with the number of stars for each stellar template in S, the spectrum g of the stellar population is given by:
For each star, an isochrone provides in general both the initial mass and the current mass (taking into account a prescription for possible mass loss). Since w, hereafter called weights, represents the number of stars for each stellar template, the initial masses of the isochrone allow us to relate w to the IMF of the stellar population and vice versa. The IMF,
of an SSP is related to w through
where is the number of stars of template in the stellar population and is the initial (rank-ordered) mass associated with template by the isochrone. The boundaries and of the mass bin are defined such that
For the lowest mass template, (low-mass-cut-off of the IMF) and for the highest mass template we take . In this way, corresponds to the number of stars in the mass bin (, ). The way in which and are defined ensures that mass bins never overlap.
In this section we first discuss how to find the most probable distribution of weights by using the combination of regularization and hierarchical Bayesian inference. Subsequently we discuss how Markov Chain Monte Carlo techniques may be used to reconstruct the parameters of a certain IMF prior parametrization and to find the age and metallicity of the SSP. As a last step, we show how different model families may be compared on the basis of their Bayesian evidence. The hierarchical nature of the different steps in the model is illustrated in Fig. 1.
2.1 The first level of inference
At the first level of inference, we assume that model family is the correct model family and we try to infer the model parameters given the data g. A model family is defined on the one hand by the set of stellar templates S (e.g. a set of SPS templates as a function of age and metallicity) and on the other hand by the parametrization of the IMF prior, which defines the space of possible priors on the weights w. In Section 3 we discuss how to construct a representative set of stellar templates whereas the parametrization of the IMF prior, and hence the prior on the weights, is discussed in more detail in Section 2.1.2.
Given a certain model family , we infer the number of stars w for each of the templates in our model given the spectrum of a stellar population g. Note that in this paper we consider SSPs but in principle this can also be done for CSPs if we include the SFH and chemical evolution of the stellar population in our model as well, as we plan to do in the future.
2.1.1 The most likely solution
In reality the spectrum of a stellar population contains noise, such that the observed spectrum of the stellar population becomes
in which n represents the noise in the data. Assuming that the noise is Gaussian distributed, the likelihood of the data given the weights w and the stellar templates S is
where is the covariance matrix. The likelihood is normalized by
in which is the number of data points in the spectrum.
The most likely solution may be found by maximizing the likelihood function defined in equation 6. Maximizing the likelihood implies minimizing , so that we obtain
The solution to this equation is given by
Finding is in general an ill-posed problem. Therefore we use a prior on the weights w to regularize the solution that we obtain and to find the most probable distribution of weights .
2.1.2 The prior
Suppose that within a model family , the IMF prior is parametrized by a set of (non-linear) parameters which we call . For example the IMF prior may be parametrized as a power law which is defined by its slope and the normalization : in that case . If we take one particular combination of the parameters , this completely defines a prior on the IMF. By using the initial masses associated to the templates through the isochrone, the prior on the IMF translates into a prior on the weights which we refer to as . Since , the number of stars that we have for template is given by
where and are defined in equation 4. So within one model family , there is a range of different models that are defined by different priors . The allowed range of priors within the model family is defined by the functional form of the IMF prior and its parameters . Note that that the latter may have their own priors as well.
Once we have transformed the prior on the IMF into a prior on the weights , we define the regularization function as
where is the (constant) Hessian of . Hence the regularization function puts a penalty on w for deviating from the prior distribution of weights . Deviations are only possible if the data require it (i.e. if a deviation increases the likelihood more than it decreases the prior). Note that is part of as it is a model-dependent choice that relates to the form of regularization being used: we might for example use the identity matrix or enforce smoothness. Given the regularization function, the prior probability function may be expressed as
where is the regularization parameter and the prior probability function is normalized by . A larger regularization parameter implies that there is more emphasis on the prior and less on the likelihood. In Section 2.1.5 we show how the value of the regularization parameter may be derived in a Bayesian manner. The regularization parameter is therefore a nuisance parameter that must be marginalized over in the results.
2.1.3 The posterior
We have seen that a model family is defined by the stellar templates S, the parametrization of the IMF and by the choice of the Hessian , such that . Within such a model family there exists a range of models , in which is related to one particular choice of the IMF prior parametrization. In this way, each model is defined by a different prior . Hence the prior is not fixed but should be considered as a flexible entity that is allowed to change within the boundaries of the IMF parametrization . For every model , the likelihood and the prior are combined to find the most probable distribution of weights . Defining as
we apply Bayes’ theorem to combine the likelihood function and the prior probability function into the posterior probability function
where the posterior is normalized by . The last equation shows that the posterior probability distribution for the weights w of the stellar templates is controlled by two functions. On the one hand there is the ‘goodness of fit’ represented by and on the other hand there is the deviation of the weights from the prior represented by the regularization function . The balance between these two functions is set by the regularization parameter111Naively one might think will give the highest posterior, but since the prior is normalized, lowering makes the width of the prior very large, hence lowering the probability density at the position where the likelihood peaks. This lowers the posterior probability. Making larger will increase the latter, but might make the fit to the data more difficult, lowering the likelihood. Balancing these is the Bayesian equivalent to Occam’s razor, finding the simplest model that fits the data..
To find the most probable solution , we have to maximize the posterior probability density function (equation 15). Maximizing implies minimizing , so that we have . Defining as the Hessian of and using the definition of from equation 12, we have for the most probable solution
where is the Hessian of . In practice we solve equation 16 by using non-negative least squares222Lawson & Hanson (1995) (NNLS) to ensure a physically meaningful solution (i.e. the number of stars cannot be negative: ).
The most probable solution depends on the model as well as on the regularization parameter that regulates the balance between the ‘goodness of fit’ and the penalty term resulting from the regularization function. The inversion of the most probable weights is represented by the inner block in Fig. 1. To find , the inner block needs information from the outer levels: a set of stellar templates, a prior on the weights and a regularization parameter. In Section 2.1.5 we show how to find the most probable value of the regularization parameter given the model and the data.
2.1.4 Uncertainties of the most probable weights
Using a second order Taylor expansion for around , we may approximate as
with . This allows us to approximate the posterior as
From this equation we see that the posterior may be approximated locally as a multivariate Gaussian distribution with covariance matrix . The marginalized errors on the individual weights resulting from the linear inversion may be obtained by taking the square root of the diagonal elements in .
2.1.5 The regularization parameter
To find the optimal regularization parameter , we have to find the maximum value for the probability density function . According to Bayes’ theorem is written as
Neglecting the normalization constant , the function to consider for optimizing is the product of the likelihood and the prior . Note that the likelihood term appears as the normalizing constant of equation 15: this term is often referred to as the evidence. Using equations 6, 8, and 13-15 we have
Using the definition of from equation 12, the normalization of the prior becomes
where is the number of stellar templates in model . Using the Taylor expansion from equation 17 we have for
Since we do not know a priori the value of nor its order of magnitude, we choose a flat prior in such that . The optimal regularization parameter is then found by solving , which results in the following non-linear expression for the most probable value of the regularization parameter
where the last term in this equation originates from the prior on . This equation may be solved by using a non-linear solver. The process of finding the most probable regularization parameter is represented by block 2 in Fig. 1. Note that for every step in finding the solution to equation 24, the model has to go to the inner block to find the most probable weights . Instead of solving for the most probable regularization parameter, can in principle also be sampled as a nuisance parameter together with the other non-linear parameters of the model.
2.1.6 Reconstructing the IMF model parameters
To reconstruct the parameters of the IMF parametrization in a model family , we compare the different models in that model family with each other. The posterior probability of a certain model is given by
In the case of a flat prior , models may be compared on the basis of the likelihood term . Taking into account that is actually a marginalization over , we may write it as
where is the evidence derived in equation 23.
If we make the assumption that is a strongly peaked function at the most probable value (MacKay, 1992), we may approximate it by a delta function centred on so that we obtain:
and we may rank the different models on the basis of , i.e. the evidence from equation 23 evaluated for the most probable regularization constant . Note that if we compare two models and on the basis of their evidence, we are actually interested in the ratio of the evidence for model by the evidence for model . This ratio is called the Bayes factor . Values of and may be considered as, respectively, substantial and strong evidence in favour of whereas is in general considered as decisive evidence in favour of (Jeffreys, 1961).
The ability that we now have to quantify the posterior of a model on the basis of the evidence allows us to use Monte Carlo sampling techniques to reconstruct the posterior probability distribution of the non-linear IMF prior model parameters. In this paper we use emcee (Foreman-Mackey et al., 2013) to sample the IMF prior parameters . The reconstruction of the IMF prior model parameters is visualized by block 3 in Fig. 1. For every sample of the IMF prior model parameters , the model constructs a corresponding prior . Then the model finds the most probable regularization parameter at the level below. Finally the model determines the most probable weights and calculates the evidence which may in turn be used to compare different samples of the IMF prior model parameters.
2.1.7 The age and metallicity of the SSP
Before we can actually sample the IMF prior model parameters , we have to define the set of stellar templates that we are going to use. For SSPs, the stellar templates are defined by an isochrone of a certain age and metallicity. If we want to compare a combination of two different ages and metallicities (i.e. vs. ) we may once again use the Bayes factor
In this equation, is defined as
which represents the evidence (or marginal likelihood) for the templates defined by (i.e. in determining the evidence for the age and metallicity combination we marginalize over all parameters in the inner layers, among which the IMF prior model parameters ). To find the most probable age and metallicity, we determine the evidence for each entry in a predefined age-metallicity grid. The age-metallicity combination that results in the highest evidence is then used to refine the sampling of the IMF prior model parameters with emcee. Note that we are only allowed to do this if there is a clear peak for the evidence in our age-metallicity grid. Otherwise, the resulting distributions for the IMF model parameters should be marginalized over all ages and metallicities.
An efficient method for determining the integral in equation 29 is provided by nested sampling (Skilling, 2004). We calculate evidences by using Multinest (Feroz & Hobson, 2008; Feroz et al., 2009; Feroz et al., 2013). To implement the Monte Carlo sampling techniques that we use in this paper, we have developed our code as a pipeline of cosmoSIS (Zuntz et al., 2015). CosmoSIS is a cosmological parameter estimation code that brings together different inference tools, including Multinest and emcee.
The reconstruction of the (most probable) age and metallicity is represented by block 4 in Fig. 1. For every age and metallicity, the model selects an isochrone which is combined with the stellar library and the interpolator to create a set of stellar templates. At the level below, Multinest requires a complete sample of the IMF model parameters which allows it to marginalize over these parameters and calculate the evidence for the corresponding age and metallicity. This step still belongs to the first level of inference as we are trying to determine a set of parameters (i.e. age and metallicity).
2.2 The second level of inference
For the first level of inference we assume a certain model family . This model family is defined by the choice of isochrones, the stellar library, the interpolator, the regularization method and the parametrization of the IMF. The choice we make in this work to restrict ourselves to SSPs is also a model-dependent choice. We define as the set of parameters that defines a model family. The second level of inference allows us to compare different model families with each other. As an example, for a given dataset we might want to compare two different IMF prior parametrizations: e.g. a double power law parametrization versus a lognormal parametrization.
According to Bayes’ theorem, the posterior of a model family given the data g is
Assuming a flat prior over the model families, the posterior of a model family is proportional to the likelihood . This likelihood term is a marginalization over the free parameters of the model family
This integral may be determined by using e.g. Multinest and gives us the evidence for a certain model family.
If we return to the example where we want to compare a double power law parametrization of the IMF with a lognormal parametrization, the relevant model parameters to marginalize over are the parameters of the IMF prior parametrization . This is however only true if we find a sufficiently strong peak in the evidence of the age-metallicity grid that justifies the use of an SSP. If this is not the case, we should in principle also marginalize over all ages and metallicities to obtain the evidence for a model family. In the current paper, since we test only SSP models, there is no strong need to do this, but we plan to further expand the code to sample directly over the space of age and metallicity and then expand the code to enable modelling CSPs as well. The second level of inference in our model is represented by the outer shell in Fig. 1.
Now that we have discussed the general setup of our model, in the next section we discuss the particular set of ingredients that we use to apply our model in this paper.
3 Stellar templates
In this section we describe the basic ingredients that we use to create the stellar templates as an input for our model and that form the columns of the matrix S. Here we consider the example of an SSP as a function of age and metallicity. Therefore the age and metallicity are two free parameters of the model, although they are here solved on a regular grid of values and therefore are not part of the ‘continuous’ set of parameters w, and the IMF prior model parameters . Those continuous parameters can be inferred and the evidence obtained for each chosen age and metallicity via nested sampling as discussed in Section 2.1.7. Since that evidence is the probability density for a chosen age and metallicity, it can be used for model comparison.
Note that although we describe one particular set of ingredients in this section, in principle these ingredients may be substituted by any other set of ingredients. The approach described in Section 2 will still be valid, as long as we are able to construct a representative set of stellar templates.
The stars that are present in an SSP are defined by an isochrone. For a given age and metallicity, an isochrone provides us with the effective temperatures, surface gravities, masses and luminosities of the stars in an SSP corresponding to that particular age and metallicity.
Within our model we use the Padova isochrones described in Marigo et al. (2008). These isochrones may in principle be replaced with other models and different isochrone models may be assessed based on the evidence. The age and metallicity that define an isochrone are in principle continuous parameters. However, for every combination of age and metallicity we need to create an isochrone and for every isochrone star we need to interpolate a corresponding spectrum. Since isochrone determination and spectrum interpolation is time consuming, we create the stellar templates before running the model. Therefore we model age and metallicity as discrete parameters. We define a grid of ages and metallicities such that and with . The spacings of the grid are and , respectively.
3.2 Stellar library
To construct a spectrum for a given set of stellar parameters, the starting point is a stellar library. Currently we use the (empirical) MILES stellar library (Sánchez-Blázquez et al. (2006)), consisting of approximately 1000 stars. Once again, note that this is only one particular choice, and in future work we plan to extend our model to include the X-Shooter Spectral Library (Chen et al., 2014). Although the MILES library covers a broad range in atmospheric parameters, empirical libraries have the disadvantage that they provide a limited coverage of the Hertzsprung-Russel (HR) diagram.
Fig. 2 shows the HR diagram of the MILES stars. The figure also shows four isochrones used in our model. One can see that, although in general the coverage of the isochrones is quite good, there are some regions of parameter space where there is clearly a lack of stars. Especially for the low-mass end and the upper giant and asymptotic giant branches, it is apparent that the stars in the library do not fully cover the parameter space defined by the isochrone stars. As a consequence, there can potentially be significant uncertainties in the stellar spectra that are constructed in these regions.
The limited coverage of the HR diagram by empirical libraries requires a method to attach the stars in the library to the isochrones. We use an interpolator to do this, which for a given set of stellar parameters tries to interpolate between the surrounding spectra to create a representative stellar spectrum.
The idea behind such an interpolator is to create a function that interpolates between the spectra in the stellar library, allowing us to construct stellar spectra at all relevant locations in the HR diagram. Before creating an interpolator, one has to define the parameters that are required to model the spectrum of a star. In addition to the effective temperature and surface gravity, this would in principle require detailed knowledge of all chemical abundances in the star. However, the isochrones only define the overall metallicity , whereas the stars in the MILES library have measured values of available. Therefore we choose to use an interpolator that interpolates the spectra of the stars in the three dimensional space of effective temperature, surface gravity and . In addition, we assume that to make the conversion from the isochrones to the stellar library straightforward. For the parameters of the stars in the MILES library, we use the values derived by Cenarro et al. (2007).
Interpolating between stellar spectra may be done by using either a local approach or a global approach. Within the local approach described in Vazdekis et al. (2003), the spectra in the library that surround the point for which we want to create a spectrum are weighted and combined to create a representative spectrum for that particular point. The global approach described in Prugniel et al. (2011) fits a polynomial to each of the spectral bins individually. This polynomial may then be used to determine the flux in each of the bins for the required set of atmospheric parameters.
In this work, we use a local approach very similar to that described in Vazdekis et al. (2003). Before we build the interpolator, we normalize the stars in the MILES library such that they have the same magnitude in the (Johnson) V-band. Suppose that we want to create a spectrum for the point (where ). Within the three dimensional space of , and , this point is surrounded by eight cubes. The first step of the interpolator consists of finding the nearby stars that are present in each of these eight boxes. The initial size of each of these cubes is , in which corresponds to the typical uncertainty in the parameters . These typical uncertainties are defined on the basis of the local density of stars , such that
In this equation is the maximum density of stars in the grid, which is taken as the 99.7 percentile of all densities in the grid. For the minimum uncertainty and maximum uncertainty we use the same values as Vazdekis et al. (2003): , , , , and . As an additional constraint for , should lie within . If no stars are found in one of the boxes, the size of the box is enlarged in steps of along each of its axes until at least one star is found or the axes reach a size of . Note that the metallicity parameter is only taken into account for stars with 4000 K 9000 K. Outside of this range the uncertainty in the metallicity is relatively large and in addition there is a significant number of stars with unknown metallicity.
As a next step, we create a representative spectrum for each of the boxes that contain stars. To create the spectrum of a box, each of its stars is assigned a weight such that
where are the parameters of the star and is the signal-to-noise ratio (SNR) of the star. The maximum SNR, , is defined to be the 99.7 percentile of the the SN ratios of all the stars. If , is set equal to . Once we have a weight for each of the stars in a box, the spectrum of that box is calculated as
with the weight of star and its spectrum. In the same way, we also determine a corresponding set of of parameters for each of the boxes, such that
Each box that contain stars is assigned a weight based on the distance of the box parameters to the point , so that
and as a final step the spectrum of the point that we want to interpolate is calculated as
where the sum runs over all of the boxes that contain stars. For more details we refer to Vazdekis et al. (2003).
3.3.1 Polynomial correction of the MILES stars
To test the interpolator, we have created an interpolated spectrum for each of the stars in the MILES library, in such a way that the original MILES star was excluded from the data set that we use to build the interpolator. In this way, we calculate for each of the spectra in the MILES library the average residual between the original spectrum and the interpolated spectrum . The residual is weighted by the average of the original spectrum, such that
This allowed us to assess the quality of the interpolator and to identify stars with problems. A large mismatch between the interpolated spectrum and the original spectrum may be the result of a low SNR of the original spectrum, any form of peculiarity in the original spectrum or using incorrect atmospheric parameters for the star in question. Problematic stars with a low signal-to-noise ratio, stars with obvious problems in their spectra after visual inspection and some peculiar stars that showed a large mismatch with their interpolated spectrum were removed from the dataset. Overall we removed 46 stars from the library.
A mismatch between the continuum of the original star and the interpolated spectrum may be caused by uncertainties in both the flux calibration and the correction for extinction. To absorb this effect, we correct each of the stars in the MILES library by a first order polynomial. This polynomial correction is an iterative process. At every step, the star with the largest residual is selected and corrected by a polynomial. Then the residuals are calculated again for the new data set. This process is repeated until each star has been corrected. Each star is corrected only once. After this correction, the average residual between the stars in the MILES library and the interpolated spectra is 2.3%. If we exclude stars with any notion of peculiarity in SIMBAD the average residual becomes 1.8%. Fig. 3 shows two examples of a comparison between a MILES spectrum and an interpolated spectrum with the same atmospheric parameters.
Having the interpolator in place, we create spectra for all the stars defined by the set of isochrones in the age-metallicity grid yr and . As a final step, the spectra resulting from the interpolator are scaled to match the -magnitudes of the stars defined by the isochrone, for which we use the filter response defined in Maíz Apellániz (2006).
4 Results - mock single stellar populations
In this section we apply our model to a number of mock SSPs. We create the spectra for these mock SSPs by combining the stellar templates of one particular age and metallicity with an IMF.
To model the velocity dispersion of real stellar populations, we smooth the stellar templates to a velocity dispersion of . Before applying the model, we smooth the stellar templates to the same velocity dispersion. In Appendix A we show that the results that we obtain for our mock SSPs do not depend on the velocity dispersion.
Note that although here we fix the velocity dispersion of the templates, this could in principle also be a free parameter of the model that we sample together with the other non-linear parameters. We will implement this in a future version of the code.
As a final step, we add Gaussian noise to the mock spectra to mimic an observation with a certain signal-to-noise ratio (SNR).
4.1 The regularization scheme
Before we apply the model to the mock SSPs we have to specify a regularization scheme . For the mock SSPs we choose to use a regularization scheme that penalizes the relative deviation of the weights from the prior and prefers smooth deviations from the prior. This is expressed through the following inverse covariance matrix for the prior
where is a diagonal matrix with
and enforces smoothness of the deviations by using the following form of gradient regularization
4.2 First level of inference
Within the first level of inference, we assume a certain model family and try to reconstruct its underlying parameters. Here we parametrize the IMF as a double power law with a break at . We split the reconstruction of the model parameters into a linear and a non-linear part. The linear part consists of the weights assigned to the individual stellar templates (i.e. the actual best-fitting IMF) whereas the non-linear part consists of determining the most probable regularization parameter, finding the age and metallicity of the SSP and sampling the parameters of the IMF prior parametrization.
4.2.1 Linear parameters
The linear parameters in our model are represented by the weights w which correspond to the number of stars that we have for each of the templates in our model. As described in Section 2, these weights allow us to reconstruct the piecewise IMF of the stellar population.
To demonstrate the reconstruction of the IMF we consider a mock SSP with an age of 8.5 Gyr and solar metallicity. This mock SSP has an average SNR of 70 and the underlying IMF of the stellar population is a double power law Kroupa IMF (i.e. a break at 0.5 , a low-mass slope and a high-mass slope ).
To reconstruct the piecewise IMF, we first need to find the most probable distribution of weights given a set of stellar templates and a prior IMF. If we fix the age and metallicity of the templates to the true values (which we know a priori in this case), the only thing that we can change in the model is the prior IMF.
First we consider what happens when we use the correct prior, a double power law Kroupa IMF (but let the level of regularization be free in the optimization). The upper panel of Fig. 4 shows the spectrum of the mock SSP together with the reconstructed spectrum of the model. The average of the spectrum divided by the standard deviation of the difference between the spectrum and the reconstructed spectrum is 68.6, consistent with an average SNR of 70. The lower panel of Fig. 4 shows the reconstructed IMF compared to the original IMF and the prior IMF. As one might expect, in this case the original IMF, the prior IMF and the reconstructed IMF all lie on top of each other.
As a next step, we provide the model with an (incorrect) Salpeter IMF prior (i.e. a constant slope of ). The regularization parameter is again free and optimized by the model. The reconstructed spectrum and reconstructed IMF that we obtain are shown in the upper and lower panel of Fig. 5. In this case the reconstructed IMF converges between the original IMF and the prior IMF, reflecting the fact that the minimization routine on the one hand wants to provide a good fit to the data and on the other hand wants to stay as close as possible to the prior.
Moreover, the obtained solution tends more and more towards the prior as we go to lower masses. This is probably related to the fact that the lowest mass stars contribute very little light to the spectrum. Therefore, it becomes much harder for the model to derive the abundance of these lower mass stars and hence the solution tends more and more towards the incorrect prior.
There also appears to be a degeneracy between the lower mass templates ( ) and the higher mass templates ( ). This degeneracy is such that the model corrects the over-abundance of the lower mass templates with respect to the original weights (enforced by the prior) by decreasing the abundance of the templates that have slightly higher masses. In Fig. 6 we show the reconstructed spectrum from the most probable weights as compared to the original input spectrum without noise. This figure shows us that the obtained solution provides a very good fit to the data and it appears that this fit is as good as that obtained for the real solution within the uncertainty computed from the covariance matrix. Since this solution lies closer to the prior IMF, it is preferred over the actual solution. Except for the lowest mass bin, the actual solution lies within the one sigma errors of our solution.
One might expect that this degeneracy disappears if we increase the SNR. However, repeating this analysis for a number of (higher) SNRs shows that the degeneracy does not completely disappear. As an example, we show in the lower panel of Fig. 5 the reconstructed IMF for a spectrum with SNR=150. Although the obtained solution is now closer to the true one, apparently the data is not informative enough to break the degeneracy.
The final test that we discuss here is that of an IMF which is not part of the prior space allowed by the IMF parametrization. For this test we consider a 13 Gyr old mock SSP with solar metallicity. We assign weights to the different stellar templates based on a Kroupa IMF () and then add to these weights a sinusoidal bump. This bump is added to the templates in the mass range 0.3 through 0.7 with a maximum of 10% for , so that we have
for templates in the range .
To reconstruct the IMF for this mock SSP we use a Kroupa IMF prior. The reconstructed spectrum and IMF are shown in Fig. 7. This figure shows that, although not by much, the obtained solution is different from the prior in the sense that it deviates slightly towards the true solution. And although this may not allow us to find the true solution, it provides a clear indication that the prior we used does not completely fit the data.
|name||age [Gyr]||[Fe/H]||IMF||reconstructed||reconstructed||evidence||reconstruction IMF prior parameters|
|age [Gyr]||[Fe/H]||2nd best|
4.2.2 Non-linear parameters
The non-linear parameters of our model are represented by the age and metallicity of the stellar templates and by the parameters of the IMF prior model. We split the sampling of these parameters into two parts.
First we determine the evidence for a grid of ages and metallicities. Each point in this grid is associated with a set of stellar templates with corresponding age and metallicity. We select the templates with the highest evidence. Secondly, we use these templates to refine the sampling of the parameters of the IMF prior model. Note that in principle Multinest already provides us with a sample of the parameters of the IMF prior and that the sampling of these parameters with emcee for the stellar templates with the highest evidence is only required to get a more precise sampling. Inside this loop, for every sample of the IMF prior model parameters , the model continuously determines the most probable regularization parameter and solves for the most probable weights .
To demonstrate the reconstruction of the non-linear parameters we consider a set of twelve mock SSPs. The first nine mock SSPs have a Kroupa IMF (, ) and the last three mock SSPs have a bottom-heavy IMF with (the two low-mass slopes limit the range of low-mass slopes that are currently being considered in the literature as reasonable for early-type galaxies). All of the mock SSPs have a SNR of 70. The parameters of the different mock SSPs are summarized in Table 1.
Fig. 8 shows the results for the reconstruction of the non-linear parameters of mock5. The age-metallicity grid in this plot shows us that the stellar templates that we used as an input (i.e., and ) give us the highest evidence. Although the grid shows an age-metallicity degeneracy, the evidence difference with the other grid points is at least more than 8. This implies that there is substantial evidence in favour of the correct stellar templates and that we are able to reconstruct the age and metallicity of the mock SSP. Note that in the future we plan to further expand the sampling to cover the full space of age, metallicity and IMF prior parameters.
In addition to the age-metallicity grid, Fig. 8 also shows the reconstruction of the two IMF slopes. These slopes are part of the parameters that define the assumed double power law IMF prior parametrization. The values presented for and in Fig. 8 correspond to the median values of the marginalized distributions whereas the errors represent the difference between the median and the 16th and 84th percentile. Within these confidence limits, the reconstructed IMF prior parameters for mock5 agree well with the input parameters.
Once we have a set of best fit values333We use the maximum a posteriori values of the sampled IMF prior parameters. for the parameters of the IMF prior model, we use these parameters to construct a prior for the IMF. This prior may then be used to reconstruct the piecewise IMF, similar to what we did in Section 4.2.1. The reconstructed IMF for mock5 is shown in Fig. 9. Also shown in this figure is the reconstructed spectrum obtained by using this IMF together with the most probable stellar templates.
The plots corresponding to the other mock SSPs in Table 1 are shown in Appendix B. For all of the mock SSPs, the reconstructed parameters are summarized in Table 1. We are able to select the true age and metallicity for all of our mock SSPs. As a measure of the robustness of this reconstruction, we present in Table 1 the difference in log evidence with the second best set of stellar templates. These numbers show us that for mock3, mock4, mock5, mock7, mock9, mock10, mock11 and mock12 there is substantial evidence in favour of the true set of stellar templates. For mock1, mock2, mock6 and mock8 there is strong evidence in favour of the true stellar templates. See also Section 2.1.6 for the interpretation of the difference in log evidence.
The reconstructed IMF slopes for the twelve mock SSPs are listed in Table 1. Except for mock7, mock8 and mock11, we are able to reconstruct the input slopes of the IMF within the confidence limits resulting from the sampling procedure. For mock7, mock8 and mock11, the true high-mass slope is just outside the confidence limits. Since the confidence limits correspond to the 16th and 84th percentile of the distribution, one would expect that indeed about one third of the test cases will be outside of these limits.
For the intermediate and older populations with a Kroupa IMF, the errors on the low-mass slope are significantly smaller than those of the younger populations with a Kroupa IMF. Although the absolute signal of the low-mass stars in an old and a young population may be the same, the additional light of the young stars that are still present in the younger population effectively reduces the SNR of the low-mass stars. So the younger an SSP, the lower the SNR of the low-mass stars (for a spectrum with a given SNR) and the more difficult it becomes to constrain the low-mass slope .
Driven by the increasing error on for the mock SSPs with a Kroupa IMF, we also consider the fractional contribution of low-mass stars (i.e. M) to the integrated spectrum across the MILES wavelength range. Table 2 provides an overview of this fraction for the different ages and IMF prior models that we consider in Table 1 (for solar metallicity). We conclude that the signal from the low-mass stars in the youngest populations with a Kroupa IMF becomes comparable to or even lower than the intrinsic noise of the spectrum. Hence, this explains why there is a sudden increase in the error on if we go from the intermediate to the youngest populations and why the effect is much smaller if we go from the oldest populations to the intermediate populations.
By increasing the age of an SSP, one reduces light from the more massive stars in the SSP and effectively increases the SNR of the low-mass stars. A different way to increase the SNR of the low-mass stars is to increase the number of low-mass stars. To realize this, we consider a set of three bottom-heavy SSPs (mock10, mock11 and mock12). Table 2 shows that the relative contribution of low-mass stars to the integrated spectrum is much higher than it is for a Kroupa IMF. This is reflected in the smaller errors on the low-mass slope in Table 1, even for the youngest population where the signal of the low-mass stars is still well above the intrinsic noise of the spectrum.
The high-mass slope is, independently of age, determined by the stars that emit most of the light. Therefore we expect that the model accurately reconstructs for all mock SSPs. This is confirmed by the relatively small errors on in Table 1, which are more or less constant as a function of age.
All of the two-dimensional probability density plots in Fig. 8 and Appendix B show a clear degeneracy between the low-mass slope and high-mass slope . This degeneracy is such that an increasing low-mass slope seems to prefer a decreasing high-mass slope. To interpret this result we show in Fig. 10 two reconstructed IMFs corresponding to the one sigma values around the medians of the sampled values. For the first reconstruction we combine a lower low-mass slope with a higher high-mass slope (i.e. , ) and for the second reconstruction we combine a higher low-mass slope with a lower high-mass slope (i.e. , ). The parameters of these two reconstructions reflect the degeneracy visible in Fig. 8. The normalization of the IMF is lower for the second reconstruction. For both reconstructions the model appears to correct an over-abundance (under-abundance) of the lowest mass templates with respect to the real IMF by an under-abundance (over-abundance) of the templates around 0.5 (luminosity conservation). In fact, the observed degeneracy between the IMF slopes may therefore be closely related to the degeneracy observed in Fig. 5.
4.2.3 Mass fraction of low-mass stars
Stellar templates in the same stellar mass range can have spectra that look very similar. As a consequence, there may be degeneracies between stellar templates with similar masses. Our regularization scheme ensures that these degeneracies do not cause problems when we solve for the most probable weights. However, if such a degeneracy exists it is basically impossible to find the exact contribution of the degenerate templates and hence the balance between these templates is set by the parametrization of the IMF prior. Although in that case we may not be able to completely reconstruct the piecewise IMF, we may still be able to constrain broader IMF-related quantities, for example the dwarf-to-giant ratio.
One of the important questions that we try to answer with our model is what the relative importance of low-mass stars is to the total stellar mass. In that context, transforming the most probable weights determined by our model into a fractional contribution of low-mass stars to the total stellar mass allows for a simple interpretation of the results.
La Barbera et al. (2013) define the fraction of the total initial stellar mass in stars with as
However, the SSPs that we consider here do not have stars more massive than . Everything beyond the high-mass-cut-off (HMCO) of the current mass function (i.e. the highest mass star in an SSP of a given age and metallicity that is still present) is more sensitive to the parametrization of the IMF than to the actual distribution of stellar masses. This is particularly true if we extrapolate our reconstructed IMFs, which can possibly be irregular. Therefore, we define the quantity as the fraction of the total current stellar mass that is in stars with
In Table 3 we summarize the mass fractions for our twelve mock SSPs. As a reference, for every SSP we report : the value of for an SSP with the same age and metallicity and an IMF equal to the input IMF. The results in Table 3 show that for all of our mock SSPs the value of lies within one sigma of the reconstructed and these results are therefore consistent with the input data.
4.2.4 The signal-to-noise ratio
The spread that we find in the reconstructed parameters for the mock SSPs is related to the SNR of the input spectra. To demonstrate this, we compare the reconstructed parameters for four mock SSPs with different SNRs. We will consider a young, metal-rich population with the same parameters as mock3 and an old, metal-poor population with the same parameters as mock7. For both populations we have a spectrum with a SNR=70 and a spectrum with a SNR=150. The spectra are summarized in Table 4.
The reconstructed IMF prior parameters (, and the normalization of the IMF) for the mock SSPs in Table 4 are shown in Fig. 11. This figure shows that increasing the SNR from 70 to 150 decreases the spread in the reconstructed IMF prior parameters and results in a much sharper peak in the marginalized distributions. In this case, the effect of increasing the SNR is very clear. However, for these mock SSPs the only source of uncertainty that we have is the Gaussian noise that we add a priori to the spectra of the SSPs. When we consider real data, there are additional problems and uncertainties that play a role, including flux-calibration issues, telluric residuals, abundance ratios and systematic uncertainties between the interpolated stellar templates and the true stellar templates. These errors are non-negligible and therefore increasing the SNR will not always improve the reconstruction of the IMF model parameters.
4.2.5 Wavelength dependence
The residuals between the original spectrum and the reconstructed spectrum for mock5 in Fig. 9 do not show a strong dependence on wavelength. To investigate a possible wavelength dependence of the residuals in more detail, we smooth the residuals with a Gaussian kernel with pixels. The smoothed version of the residual is also shown in Fig. 9. The smoothed residuals seem to be slightly smaller at the bluest wavelengths, but this is mainly related to the lower flux in that region (the residuals are absolute).
Although we do not see a wavelength dependence in the residuals of our results, this does not tell us which wavelength regions are better suited to constrain the IMF. To investigate the relation between the reconstruction of the IMF prior parameters and the considered wavelength region we repeat our analysis for mock5 using only the blue half of the spectrum and using only the red half of the spectrum. The results of this analysis are presented in Table 5.
Firstly, the results in Table 5 show that ignoring half of the data points significantly increases the scatter in the reconstructed parameters. Secondly, the reconstructed parameters that we obtain using the reddest wavelength region are consistent with the input data whereas the reconstructed parameters derived from the bluest wavelength region are not. Since the overall flux in the blue part of the spectrum is lower, the SNR of the blue part is also slightly lower. Moreover, according to our models in the blue part of the spectrum the low-mass stars contribute only 1.5% to the integrated spectrum whereas in the red part they contribute 2.6%. Taking into account that these numbers have the same order of magnitude as the intrinsic noise in the spectrum this represents an important difference. This difference and the lower SNR in the blue part of the spectrum might explain why we are able to constrain correctly in the red part of the spectrum and not in the blue part of the spectrum. The offset that we find for in the blue part of the spectrum may be related to the model being unable to correctly break the degeneracy between the IMF slopes.
The results in this section show that in order to constrain the contribution of low-mass stars to the spectrum it is important to consider a broad wavelength region. Furthermore, it is essential to consider wavelength regions and features that are sensitive to changes in the IMF. Since low-mass stars emit most of their light in the (infra)red part of the spectrum and most of the IMF sensitive features are also found in this part of the spectrum (Spiniello et al., 2014), (infra)red wavelength regions will be more useful to constrain the low-mass IMF. In future work, we therefore plan to combine our model with the X-Shooter Spectral Library which provides a much broader and redder wavelength coverage.
4.3 Second level of inference
The second level of inference allows us to compare different model families based on the evidence. Here we consider two model families: one in which the IMF is parametrized as a double power law and one in which it is parametrized as a single power law. For simplicity, although there could in principle also be a degeneracy between age/metallicity and IMF model family, for now we fix the age and metallicity in both model families to the true values, such that the stellar templates are the same for the two model families.
To test the second level of inference we consider two mock SSPs with Gyr and . The SNR of the spectra of these SSPs is 70. One of the SSPs has a Kroupa IMF and the other a Salpeter IMF.
We then use Multinest to calculate the evidence for both model families by applying them to the two mock SSPs. The results are presented in Table 6.
|true IMF||IMF model||log evidence|
|Kroupa||single power law|
|Kroupa||double power law|
|Salpeter||single power law|
|Salpeter||double power law|
First consider the Kroupa mock SSP. The difference in log evidence is 2.4 in favour of the double power law model family. Since a broken power law IMF is not part of the model family of single power laws, this is what we expect. However, the difference in log evidence between the two model families is not substantial. Apparently there is a single power law that, in combination with the allowed deviations from the prior IMF, is still able to provide a reasonable fit to the data. If this would not have been the case, the difference in the evidence would have been much more significant.
For the Salpeter mock SSP, the difference in log evidence is 2.4 in favour of of the single power law model family. In this case the single power law input IMF is part of both model families. Therefore we expect the two model families to be able to fit the data equally well. Nevertheless, the model prefers the simpler single power law model. This is the result of Occam’s razor: the model is set up in such a way that it tries to find the simplest model that fits the data.
Considering the errors on the evidences in Table 6, we conclude that for these specific mock SSPs we are able to discriminate between the single- and double-power-law model families. The difference in log evidence between the two model families is however not substantial.
5 Results - model versus model
|reconstructed||evidence||reconstructed IMF prior parameters|
|name||age [Gyr]||[Fe/H]||IMF||age [Gyr]||[Fe/H]||2nd best|