# A comprehensive Bayesian discrimination of the simple stellar population model, star formation history and dust attenuation law in the spectral energy distribution modeling of galaxies

## Abstract

When modeling and interpreting the spectral energy distributions (SEDs) of galaxies, the simple stellar population (SSP) model, star formation history (SFH) and dust attenuation law (DAL) are three of the most important components. However, each of them carries significant uncertainties which have seriously limited our ability to reliably recover the physical properties of galaxies from the analysis of their SEDs. In this paper, we present a Bayesian framework to deal with these uncertain components simultaneously. Based on the Bayesian evidence, a quantitative implement of the principle of Occam’s razor, the method allows a more objective and quantitative discrimination among the different assumptions about these uncertain components. With a Ks-selected sample of 5467 low-redshift (mostly with ) galaxies in the COSMOS/UltraVISTA field and classified into passively evolving galaxies (PEGs) and star-forming galaxies (SFGs) with UVJ diagram, we present a Bayesian discrimination of a set of SSP models from five research groups (BC03 and CB07, M05, GALEV, Yunnan-II, BPASS V2.0), five forms of SFH (Burst, Constant, Exp-dec, Exp-inc, Delayed-), and four kinds of DAL (Calzetti law, MW, LMC, SMC). We show that the results obtained with the method are either obvious or understandable in the context of galaxy physics. We conclude that the Bayesian model comparison method, especially that for a sample of galaxies, is very useful for discriminating the different assumptions in the SED modeling of galaxies. The new version of the BayeSED code, which is used in this work, is publicly available at https://bitbucket.org/hanyk/bayesed/.

## 1 Introduction

Understanding the formation and evolution of galaxies is one of the biggest challenges in modern astrophysics (Mo et al., 2010; De Lucia et al., 2014; Somerville & Davé, 2015; Naab & Ostriker, 2017). Various complex and not well understood baryonic processes, such as the formation and evolution of stars (Kennicutt, 1998; McKee & Ostriker, 2007; Heber, 2009; Kennicutt & Evans, 2012; Duchêne & Kraus, 2013), the accretion and feedback of super-massive black holes (Melia & Falcke, 2001; Merloni, 2004; Kormendy & Ho, 2013; Fabian, 2012) and the chemical enrichment of interstellar medium (ISM) (McKee & Ostriker, 1977; Spitzer, 1978; Li & Greenberg, 1997; Draine, 2003; De Lucia et al., 2004; Scannapieco et al., 2005; Draine, 2010; Nomoto et al., 2013), are involved. What make the problem even more challenging is the fact that all of these complex baryonic processes are also tightly entangled (Hamann & Ferland, 1993; Timmes et al., 1995; Ferrarese & Merritt, 2000; Hopkins et al., 2008b, a; Marulli et al., 2008; Bonoli et al., 2009; Heckman & Best, 2014). It is often not trivial to decouple any one of them from the others to allow a complete independent study. To disentangle these complex and highly related baryonic processes involved in the formation and evolution of galaxies, we need to make use of all available sources of information (Bartos & Kowalski, 2017).

Despite the recent progress in the detection of the cosmic-rays (Murase et al., 2008; Adriani et al., 2009), neutrinos (Ahmad et al., 2002; Becker, 2008), and gravitational-waves (Abbott et al., 2016, 2017), electromagnetic emissions are still the main source of information for our understanding of galaxies. All of those complex baryonic processes involved in the formation and evolution of galaxies leave their imprint on the spectral energy distributions (SEDs) of the electromagnetic emissions from galaxies. In the last decades, large photometric and spectroscopic surveys, such as 2MASS (Skrutskie et al., 2006), SDSS (York et al., 2000), COSMOS (Scoville et al., 2007), UltraVISTA (McCracken et al., 2012; Muzzin et al., 2013), CANDELS (Koekemoer et al., 2011; Grogin et al., 2011), and 3D-HST (Brammer et al., 2012; Skelton et al., 2014), have provided us with rich multi-wavelength observational data for millions of galaxies covering a large range of redshift. These massive data sets present a tremendous opportunity and challenge for us to understand the formation and evolution of galaxies from the analysis of their SEDs.

The process of solving the inverse problem of deriving the physical properties of galaxies from their observational SEDs is known as SED-fitting (Bolzonella et al., 2000; Massarotti et al., 2001; Ilbert et al., 2006; Salim et al., 2007; Walcher et al., 2011). In principle, a SED-fitting method which is capable of effectively extracting all the information encoded in these SEDs of galaxies would allow us to fully understand their physical properties. Traditionally, SED-fitting is considered as an optimization problem, where some minimization techniques are employed to find the best-fit model and corresponding value of parameters (Arnouts et al., 1999; Bolzonella et al., 2000; Cid Fernandes et al., 2005; Kriek et al., 2009; Koleva et al., 2009; Sawicki, 2012; Gomes & Papaderos, 2017). However, due to the large number of often degenerated free parameters, it should be more reasonable to consider the problem of SED-fitting as a Bayesian inference problem (Benítez, 2000; Kauffmann et al., 2003). Recently, it has becoming quite popular to employ the Markov Chain Monte Carlo (MCMC) sampling method to efficiently obtain not only the best-fit results but also the detailed posterior probability distribution of all parameters (Benítez, 2000; Kauffmann et al., 2003; Serra et al., 2011; Acquaviva et al., 2011; Pirzkal et al., 2012; Johnson et al., 2013; Calistro Rivera et al., 2016; Leja et al., 2017).

Despite the popularity of Bayesian parameter estimation method, the Bayesian model comparison/selection method, which is based on the computation of the Bayesian evidences of different models, has not yet been widely used in the field of SED-fitting of galaxies. The Bayesian evidence quantitatively implements the principle of Occam’s razor, according to which a simpler model with compact parameter space should be preferred over a more complicated one with a large fraction of useless parameter space, unless the latter can provide a significantly better explanation to the data (MacKay, 1992, 2003). Based on the Bayesian framework initially introduced by Suyu et al. (2006) for solving the gravitational lensing problem, Dye (2008) presented an approach to determine the star formation history of galaxies from multiband photometry, where the most probable model of star formation history is obtained by the maximization of the Bayesian evidence. In Han & Han (2012), we have presented a Bayesian model comparison for the SED modeling of hyperluminous infrared galaxies (HLIRGs), where the multimodal-nested sampling (MultiNest) techniques (Feroz & Hobson, 2008; Feroz et al., 2009, 2013) has been employed to allow a more efficient calculation of the Bayesian evidence of different SED models. Salmon et al. (2016) presented a Bayesian approach based on Bayesian evidence to check the universality of the dust attenuation law. For a sample of galaxies from CANDELSwith rest-frame UV to near-IR photometric data, they found that some galaxies show strong Bayesian evidence in favor of one particular dust attenuation law over another, and this preference is consistent with their observed distribution on the infrared excess (IRX) and UV slope () plane. Dries et al. (2016, 2018) presented a hierarchical Bayesian approach to reconstructing the initial mass function (IMF) in single and composite stellar populations (SSPs and CSPs), where the Bayesian evidence is employed to compare different choices of the IMF prior parameters, and to determine the number of SSPs required in CSPs by the maximization of the Bayesian evidence.

In Han & Han (2014), with the first publicly available version of our BayeSED code, we have presented a Bayesian model comparison between two of the most widely used stellar population synthesis (SPS) model (Bruzual & Charlot, 2003; Maraston, 2005, hereafter BC03 and M05) for the first time. With the distribution of Bayes factor (the ratio of Bayesian evidence) for a Ks-selected sample of galaxies in the COSMOS/UltraVISTA field (Muzzin et al., 2013), we found that the BC03 model statistically has larger Bayesian evidence than the M05 model. In Han & Han (2014), the reliability of the BayeSED code for physical parameter estimation has also been systematically tested. The internal consistency of the code has been tested with a mock sample of galaxies, while its external consistency has been tested by the comparison with the results of the widely used FAST code (Kriek et al., 2009). However, the work still has many limitations. For example, a fixed exponentially declining SFH and the Calzetti et al. (2000) dust attenuation law have been assumed to be universal for all galaxies. However, from either an observational or a theoretical point of view, the form of star formation history and dust attenuation law of different galaxies are not likely to be the same (Witt & Gordon, 2000; Maraston et al., 2010; Wuyts et al., 2011; Simha et al., 2014). Besides, the numerous uncertainties carried by almost all the components involved in the process of stellar population synthesis (Conroy et al., 2009, 2010; Conroy & Gunn, 2010; Conroy, 2013) have resulted in the diversity of SPS models. Except for the BC03 and M05 model, there are numerous SPS models from many other groups, which have employed different stellar evolution tracks, stellar spectral libraries, IMFs and/or synthesis methods (Buzzoni, 1989; Fioc & Rocca-Volmerange, 1997; Leitherer et al., 1999; Zhang et al., 2005a; Kotulla et al., 2009; Eldridge & Stanway, 2009; Conroy et al., 2009; Vazdekis et al., 2010).

As three of the most important components in modeling and interpreting the SEDs of galaxies, the simple stellar population model, star formation history and dust attenuation law all carry significant uncertainties. The existence of these uncertainties would seriously limit the possibility of reliably recovering the physical properties of galaxies from the analysis of their SEDs. Besides, it is not easy to reasonably quantify the impact of any one of them without mention the other two. So, it is very important to find an unitized method to quantify the propagation of these uncertainties into the estimation of the physical parameters of galaxies, and to quantitatively discriminate their different choices. In this work, we present an unitized Bayesian framework to deal with all of these uncertain components simultaneously.

This paper is structured as follows. In §2, we introduce the new SED modeling module of the BayeSED code, including the composite stellar population (CSP) synthesis method in §2.1, the SED modeling of a simple stellar population (SSP) in §2.2, the form of star formation history (SFH) in §2.3, and the dust attenuation law (DAL) in §2.4. We then briefly review the Bayesian inference methods in §3, including the Bayesian parameter estimation in §3.1 and the Bayesian model comparison in §3.2. In the next two sections, we introduce our new methods for calculating the Bayesian evidence and associated Occam factor for the SED modeling of an individual galaxy (§4) and a sample of galaxies (§5), respectively. In §6, we present the results of applying our new methods to a Ks-selected sample in the COSMOS/UltraVISTA field for discriminating among the different choices of SSP model, SFH and DAL when modeling the SEDs of galaxies. Some discussions about the different SSPs, SFHs and DALs are presented in §7. Finally, a summary of our new methods and results is presented in §8.

## 2 The spectral energy distribution modeling of galaxies in BayeSED

For a detailed Bayesian analysis of the observed multi-wavelength SED of a galaxy, the modeling of its SED is often the most computationally demanding. So, the efficiency of the whole Bayesian analysis process is strongly depends on the efficiency of the SED modeling method. In the previous version of BayeSED (Han & Han, 2012, 2014), some machine learning methods, such as artificial neural network (ANN) and K-nearest neighbor searching (KNN) algorithm, have been employed. After the training with a pre-computed library of model SEDs, the machine learning methods allow a very efficient computation of a massive number of model SEDs during the sampling of an often high-dimensional parameter space of a SED model. By using the machine learning methods, very different SED models can be easily integrated into the BayeSED code with the same procedure. Therefore, the BayeSED code can be easily extended to solve the SED fitting problem in different fields.

Despite these interesting benefits, the machine learning based SED modeling methods are not so convenient during the development a SED model, since any modification to the model components requires a new and often time-consuming machine learning procedure. To explore the effects of assuming different simple stellar population model, star formation history, and dust attenuation law in the SED modeling of galaxies, we have built a SED modeling module into the new version (V2.0) of our BayeSED code (see the flowchart in Figure 1). Currently, we do not intend to build a very sophisticated SED modeling procedure into the BayeSED code. To be consistent with the principle of Occam’s razor, according to which “Entities should not be multiplied unnecessarily”, we prefer to start with a simple but still useful SED modeling procedure, and gradually increase its complexity.

### 2.1 Composite Stellar Population synthesis

The SED of a galaxy as a complex stellar system can be obtained with composite stellar population synthesis as:

(1) | ||||

(2) |

where is the star formation rate at time (SFH: the star formation history), the luminosity emitted per unit wavelength per unit mass by a simple stellar population (SSP) of age and chemical composition , and the transmission function of the ISM. We assume a time-independent metallicity and dust attenuation law for the entire composite population.

### 2.2 The SED modeling of a simple stellar population

According to the most widely used isochrone synthesis approach (Charlot & Bruzual, 1991; Bruzual & Charlot, 1993, 2003), the SED of an SSP is obtained as:

(3) |

where is the stellar mass, the stellar initial mass function (IMF) with lower and upper mass cutoffs and , and the SED of a star with bolometric luminosity , effective temperature , and metallicity . So, different choices for any of the IMF, stellar isochrone and stellar spectral library will result in different SSP models.

Alternatively, the fuel consumption theorem (Renzini & Buzzoni, 1986; Maraston, 1998, 2005) has been used to allow an easier calculation of the luminosity contribution of the short-lived and often less understood post-main sequence stellar evolution stages, such as the thermally-pulsing asymptotic giant branch (TP-AGB) phase. According to the theorem, the luminosity contribution of each stellar evolutionary phase is proportional to the amount of hydrogen and/or helium (the fuel) burned by nuclear fusion within the stars. It also provides analytical relations between the main sequence and post-main sequence stellar evolution, and the SEDs can be obtained using the relations between colors/spectra and bolometric luminosities. There are other approaches to obtain the integrated SED of an SSP, such as the use of empirical spectra of star clusters as templates for SSPs (Bica & Alloin, 1986; Bica, 1988; Cid Fernandes et al., 2001; Kong et al., 2003) and the employment of Monte Carlo technique (Zhang et al., 2005a; Han et al., 2007; da Silva et al., 2012; Cerviño, 2013).

There are many publicly available SSP models (See http://www.sedfitting.org/Models.html). In this work, we have selected a set of different SSP models from five groups, including the BC03 (Bruzual & Charlot, 2003) and CB07 (Bruzual, 2007), M05 (Maraston, 2005), GALEV (Kotulla et al., 2009), Yunnan-II (Zhang et al., 2005a), and BPASS V2.0 (Eldridge & Stanway, 2009) models. Many SSP models from other research groups (e.g. Buzzoni, 1989; Fioc & Rocca-Volmerange, 1997; Leitherer et al., 1999; Conroy et al., 2009; Vazdekis et al., 2010), many of which have been widely used in many works, are not included in our list. It is straightforward for us to add all of these SSP models to the new version of the BayeSED code. However, the main purpose of this paper is to demonstrate the Bayesian model comparison method, and to evaluate its effectiveness. So, we try to randomly select a small set of representative models that are as diverse as possible, although they could be biased to those that are either popular, easier to obtain, or more familiar to us. The physical considerations about the effectiveness of the SSP models for the galaxy sample have not been used as the criterion for the selection of them. Actually, they are considered to be equally likely a priori (i.e. before the comparison with data). A summary of the SSP models used in this paper is presented in Table 1. As shown clearly in the table, the SSP models which differ in any model component (Track/Spectral library/IMF/Binary/Nebular) are treated as different SSP models. In the following of this section, we present a short description of each chosen SSP model, with a focus on their differences.

Short name | Model family | Track/Isochrone | Spectral library | IMF | Binary | Nebular |
---|---|---|---|---|---|---|

bc03_ch | BC03^{7} |
Padova94+Charlot97 | BaSeL 3.1 | Chabrier03 | No | No |

bc03_kr | BC03 | Padova94+Charlot97 | BaSeL 3.1 | Kroupa01 | No | No |

bc03_sa | BC03 | Padova94+Charlot97 | BaSeL 3.1 | Salpeter55 | No | No |

cb07_ch | CB07^{8} |
Padova94+Marigo07 | BaSeL 3.1 | Chabrier03 | No | No |

cb07_kr | CB07 | Padova94+Marigo07 | BaSeL 3.1 | Kroupa01 | No | No |

cb07_sa | CB07 | Padova94+Marigo07 | BaSeL 3.1 | Salpeter55 | No | No |

m05_sa | M05^{9} |
Cassisi et al. (1997a, b, 2000) | BaSeL 3.1 | Salpeter55 | No | No |

m05_kr | M05 | Cassisi et al. (1997a, b, 2000) | BaSeL 3.1 | Kroupa01 | No | No |

galev0_sa | GALEV^{10} |
Padova94 | BaSeL 2.0 | Salpeter55 | No | No |

galev0_kr | GALEV | Padova94 | BaSeL 2.0 | Kroupa01 | No | No |

galev_sa | GALEV | Padova94 | BaSeL 2.0 | Salpeter55 | No | Yes |

galev_kr | GALEV | Padova94 | BaSeL 2.0 | Kroupa01 | No | Yes |

ynII_s | Yunnan-II^{11} |
Pols et al. (1998)^{12} |
BaSeL 2.0 |
Miller & Scalo (1979)^{13} |
No | No |

ynII_b | Yunnan-II | Pols et al. (1998) | BaSeL 2.0 | Miller & Scalo (1979) | Yes | No |

bpass_s | BPASS V2.0^{14} |
Eldridge et al. (2008)^{15} |
BaSeL 3.1 | Broken power law^{16} |
No | No |

bpass_b | BPASS V2.0 | Eldridge et al. (2008) | BaSeL 3.1 | Broken power law | Yes | No |

Note. –

#### BC03 and updated CB07

The BC03 (Bruzual & Charlot, 2003) model is the one most widely used in the literature. It is a good choice for a standard model that will be compared with. Besides, the isochrone synthesis technique first introduced in this model have been employed by many other more recent models. So, the BC03 model is also a good representative of the set of models which have employed similar technique. We have used the version built with the Padova 1994 evolutionary tracks, the BaSeL 3.1 spectral library, and the IMF of Chabrier (2003), Kroupa (2001), and Salpeter (1955), respectively. The model contains the SED of SSPs with and . The CB07 (Bruzual, 2007) model is very similar to the BC03 model, with the former including an updated prescription (Marigo & Girardi, 2007) for the TP-AGB evolution of low- and intermediate-mass stars, which produces much redder near-IR colors for young and intermediate-age stellar populations. However, whether this represents a much better treatment of the TP-AGB phase remains an open issue (Kriek et al., 2010; Zibetti et al., 2013; Capozzi et al., 2016).

#### M05

The M05 (Maraston, 2005) model is also very widely used in many works and often used to be compared with the BC03 model. A main feature of this model lies on its treatment of the post-main sequence stellar evolution stages, such as TP-AGB, based on the fuel consumption theorem. The contribution of TP-AGB stars is expected to be crucial for modelling the SEDs of young and intermediate age () stellar populations, which predominate the redshift range (Maraston, 2005; Maraston et al., 2006; Henriques et al., 2011). Except for the different treatment of TP-AGB stars, M05 model has employed the input stellar evolution tracks/isochrones of Cassisi et al. (1997a, b, 2000), which is different from that used in BC03 and CB07 model. The public version of M05 model contains the SED of SSPs with and . In this work, we have used the version with a red horizontal branch morphology, and the IMF of Kroupa (2001) and Salpeter (1955), respectively.

#### Galev

The GALEV (GALaxy EVolution) evolutionary synthesis model (Kotulla et al., 2009) has many properties that are in common with the BC03 model. What makes the GALEV model special is its consistent treatment of the chemical evolution of the gas and the spectral evolution of the stellar content. However, to be more easily compared with the SSPs from other groups, we prefer to use the version with metallicity fixed to some specific values, instead of that obtained with a chemically consistent treatment. Actually, we just want to select an SSP model that has nebular emission included, while the GALEV model is the only one which we found to be publicly available and much easier to obtain. Although the treatment of nebular emission in GALEV model is relatively simple, it is still useful to test the importance of including nebular emission in the SED model of galaxies. We have used the web interface at http://model.galev.org/model_1.php to generate the SED of SSPs with and . Both the version with and without the contribution of nebular emission have been used in this work.

#### Yunnan-II

The Yunnan models have been built at our binary population synthesis (BPS) team of Yunnan observatory (Zhang et al., 2004, 2005a, 2005b; Han et al., 2007). The main feature of these models is the consideration of various binary interactions, which is implemented with the help of a Monte Carlo technique. The Yunnan models have employed the Cambridge stellar evolutionary tracks in the form given in the rapid stellar evolution code of Hurley et al. (2000, 2002) as a set of analytic evolution functions fitted to the model tracks of Pols et al. (1998). In this work,we have used the Yunnan-II version (Zhang et al., 2005a) with the BaSeL 2.0 spectral library, and the IMF of Miller & Scalo (1979). The model contains the SED of SSPs with and . To test the importance of considering the effects of binary interactions, both the version with and without binary interactions have been used in this work.

#### Bpass

The Binary Population and Spectral Synthesis (BPASS) code is another publicly available population synthesis model which has considered the effects of binary evolution in the SED modelling of stellar populations. Instead of an approximate rapid population synthesis method, detailed stellar evolution models, which are obtained with a custom version of the long-established Cambridge STARS stellar evolution code, have been used in the code. The authors of the model also try to only use theoretical model spectra with as few empirical inputs as possible in the population syntheses to create synthetic models as pure as possible to compare with observations. In this work, we have used the V2.0 fiducial models which have assumed a broken power law IMF with the slope to be from to , and from to . The model contains the SED of SSPs with and .

The BPASS model is undergoing a rather rapid development. During the writing of this paper, the BPASS team have released their V2.1 (Eldridge et al., 2017) and V2.2 (Stanway & Eldridge, 2018) models. The BPASS V2.0 model, which is used in this paper, was released in 2015 and has been widely used in many stellar and extragalactic works. However, it was not formally described in detail until the V2.1 data release paper of Eldridge et al. (2017). There are a few refinements in the V2.1 models, but no major changes to the V2.0 results. In Eldridge et al. (2017), the authors also discussed some key caveats and uncertainties in their current models. Especially, they identified several aspects of the old stellar populations (), such as the binary fraction in lower mass stars, as problematic in their current model set. In Stanway & Eldridge (2018), the authors stated that some of these issues have been partly addressed in their recently released V2.2 models.

Given the limitations of the BPASS V2.0 model and the improved V2.1 and V2.2 of the same model, it may seem nonsensical to still use the older one. However, in addition to those regarding binary evolution, there are still many other uncertainties involved in the SSP model. Given this, the model would be undergoing an intensive development for a long time, during which the older version of the same model will be rapidly replaced by the newer ones. Actually, many of the models from other groups also have their more updated version (e.g., Bruzual, 2011; Maraston & Strömbäck, 2011; Zhang et al., 2013). Here, we need to point out that it is by no means the aim of this paper to find out the most cutting-edge SSP model. In this paper, we aim at evaluating the effectiveness of applying the Bayesian model comparison method to the SSP models. So, we prefer to use the more stable version of those models that have been used for a relatively long time, and the performance of them have been known to some extent. Certainly, in the future, we would like to compare these more updated models using the Bayesian methods developed in this paper.

### 2.3 The form of star formation history

Due to its complex formation and evolution history, the detailed star formation history (SFH) of a real galaxy could be arbitrarily complex. However, to derive the physical parameters, such as star formation rate (SFR) and stellar mass, from the multi-wavelength photometric SED from a very limited number of filter bands, we need to make a priori simple assumptions about its SFH.

The exponentially declining (Exp-dec for short) SFH of the form , the so-called model, is the most widely used assumption. However, some works suggest that it leads to biased estimation of the stellar mass of individual galaxies and the stellar mass functions (Wuyts et al., 2011; Simha et al., 2014). The exponentially increasing (Exp-inc for short) SFH of the form , the so-called inverted- model (Maraston et al., 2010; Pforr et al., 2012), and the delayed- (Delayed for short) model of form (Lee et al., 2010) has been suggested to explain the SEDs of high-redshift star-forming galaxies. Besides, we also considered the simpler single-burst (Burst for short) and constant SFH for reference. So, in total, we have considered five analytical forms of SFHs.

Recently, some authors have suggested some more complicated parameterizations of SFH (Gladders et al., 2013; Abramson et al., 2016; Diemer et al., 2017; Ciesla et al., 2017; Carnall et al., 2018), and physically motivated prescriptions of SFHs drawn from either the hydrodynamic or the semi-analytic models of galaxy formation (Finlator et al., 2007; Pacifici et al., 2012; Iyer & Gawiser, 2017). Besides, it is also possible to directly employ the non-parametric form of SFH, an approach that has been employed by many works (Heavens et al., 2000; Cid Fernandes et al., 2005; Ocvirk et al., 2006; Tojeiro et al., 2007; Koleva et al., 2009; Díaz-García et al., 2015; Magris C. et al., 2015; Leja et al., 2017; Zhang et al., 2017). However, the aim of this paper is to evaluate the effectiveness of the Bayesian model comparison method and build a reference for future study, it is better to start with much simpler forms of SFH. We leave the exploration of these more complicated forms of SFH for future study.

### 2.4 Dust attenuation curve

The existence of interstellar medium (Draine, 2010) can significantly change the SED of the stellar populations. For example, the UV-Optical starlight can be absorbed by the interstellar dust and re-emitted in the infrared. Besides, the UV and ionizing photon flux from the stellar populations can be reduced by the interstellar nebular gas, and re-emitted as a nebular continuum component and strong emission lines in the optical and infrared. In this paper, we only consider the effect of dust attenuation as a uniform dust screen with different dust attenuation laws, while leaving the consideration of dust emission for future study.

The dust attenuation law of different galaxies are likely to be different due to different star-dust geometry and/or composition (Witt & Gordon, 2000; Reddy et al., 2015; Cullen et al., 2017a, b).
In this work, we have selected four widely used attenuation curves, including the Calzetti et al. (2000) dust attenuation law for starburst galaxies (SB for short), the MW, LMC, and SMC attenuation laws^{17}

## 3 Bayesian inference methods

In BayeSED, the Bayesian inference methods are employed to interpret the SEDs of galaxies. The base for all these methods is Bayes’ theorem. It can be used to solve both the parameter estimation problem and model comparison/selection problems.

### 3.1 Bayesian parameter estimation

With the application of Bayes’ theorem in the parameter space, the posterior probability of the parameters of a model given a set of observational data , the model itself, and all the other background assumptions is related to the prior probability and the likelihood function such that:

(4) |

where is a normalization factor called Bayesian evidence, or model likelihood. With the joint posterior parameter probability distribution in Equation 4, the marginalized posterior probability distribution for each parameter can be obtained as:

(5) |

The mean, median, or maximum of the marginalized posterior probability distribution can be used as an estimation of the value of a parameter, while the typical width of the distribution can be used as an estimation of the associated uncertainty.

Assuming a Gaussian form of noise, we define the likelihood function for independent wavelength band as:

(6) |

where and represent the observational flux and associated uncertainty in each band, and represents the value of flux for the -th band predicted by the model which has a set of free parameters (as indicated by the vector ). The uncertainty for the -th band is not just the observational error, which is often an underestimation. It is a common practice to additionally consider the potential systematic uncertainties in the observed fluxes and the systematic uncertainties in the employed model itself. So, should contain three terms such that:

(7) |

where is the purely observational error, represents the systematic uncertainties regarding the observational procedure, and represents the systematic uncertainties regarding the modeling procedure.

In principle, should be considered as a function of the observer-frame wavelength, while should be considered as a function of the rest-frame wavelength. For example, Brammer et al. (2008) have introduced a rest-frame template error function to account for the systematic uncertainties in the SED model. However, the form of the rest-frame template error function, which is likely to be model-dependent, must be determined in advance, instead of during the SED fitting. Besides, the definition of a flexible form of wavelength-dependent and would require too much free parameters, which cannot be well constrained by the limited number of photometric data. So, in BayeSED V2.0, the two additional terms are simply defined as:

(8) |

and

(9) |

where and are two wavelength-independent free parameters.

In the literature (e.g. Dale et al., 2012; Dahlen et al., 2013), only one of the and is usually used and fixed to a pre-determined value (typically, ). So, to start from a simpler assumption and not go beyond too much from the common practice, in this work, only is considered as a free parameter spanning , while is fixed to be zero. Due to the simple definition in Equation 8 and 9, the two free parameters and are likely to be degenerated with each other to some extent. In practice, we found that the reduced tend to be closer to in most cases if only is considered as a free parameter. Besides, We found that the free parameter can be well constrained by the data, and very close to the typical value (See Table 3 and Figures 9, 10). On the other hand, if is left to vary as a free parameter, the model deficiencies would be deposited in this free parameter, and it is potentially possible to use the value of as an indicator of the quality of a certain model. However, if is also considered as a free parameter, the difference between different SED model as shown in the Bayesian evidence, which is the focus of this paper, would likely be diluted. We leave the exploration of the effects of and more complicated form of both and for future study.

### 3.2 Bayesian model comparison

Bayesian model comparison try to compare competing models, which may have similar or different parameters, by calculating the probability of each model as a whole. Similar to Bayesian parameter estimation, Bayesian model comparison can be achieved by the application of Bayes’ theorem in the model space:

(10) |

Here, the a priori probability distribution of models in the model space, , can be used to represent our aesthetic and/or empirical motivated like or dislike of a model, although it is often assumed to be uniform in practice. The Bayesian evidence, or model likelihood of the model , , which is just a normalization factor in Equation 4 and not relevant to parameter estimation, is crucial for Bayesian model comparison. The Bayesian evidence of a model can be obtained by the marginalization (integration) over the entire parameter space:

(11) |

In Equation 10, is a normalization factor, which is not relevant to the Bayesian comparison of different models , but could be crucial for the Bayesian comparison of different background assumptions in an even higher level of Bayesian inference.

Two models, and , can be formally compared with the ratio of their posterior probabilities given the same observational dataset and the background knowledge and assumptions :

(12) |

where is the prior odds ratio of the two models. If none of the two models is more favored a priori, the Bayes factor, which is defined as

(13) |

can be directly used for the Bayesian model comparison. In practice, the empirically calibrated Jeffrey’s scale (Jeffreys, 1961; Trotta, 2008) suggests that and (corresponding to the odds of about 1:1, 3:1, 12:1 and 150:1) can be used to indicate inconclusive, weak, moderate and strong evidence in favor of , respectively (See also Jenkins, 2014). If more than two models need to be compared, it would be convenient to define a standard model and compute the Bayes factors of all models with respect to the standard model. When comparing models with their Bayes factors, it is important to make sure that the data and all of the background knowledge/assumptions used in all models are the same, or the results of comparison would be meaningless.

### 3.3 Occam factor

As the prior-weighted average of likelihood over the entire parameter space, the Bayesian evidence obtained with Equation 11 automatically implements the principle of Occam’s razor. Actually, the Bayesian evidence is determined by the trade-off between the complexity of a model and its goodness-of-fit to the data. The Occam factor (see e.g. MacKay, 2003; Gregory, 2005), which represents a penalty to a model for having to finely tune its free parameters to match the observations, is related to the variety of the predictions that a model makes in the data space. By adopting the suggestion of Gregory (2005), we define the Occam factor of a model as:

(14) |

where is the maximum of the likelihood function at . So, the Occam factor defined here is just the ratio of average-likelihood and maximum-likelihood which is never greater than one. It ensures that:

(15) |

A complex model would require a fine-tuning of its parameters to give a better fit to the observational data. Then, a large fraction of its parameter space would be useless and consequently wasted. In that case, its average-likelihood will be much smaller than its maximum-likelihood, which lead to smaller Occam factor. The Occam factor defined in Equation 14 is not directly related to the algorithmic complexity of a model, and cannot be obtained independently of the observational data. So, it would be interesting to find out whether this simple quantification of the complexity of a model is consistent with our intuition about the complexity of the model. Some examples about this will be presented in §6.

## 4 The Bayesian evidence for the SED modeling of an individual galaxy

When modeling the SED of a galaxy, it is clear from §2 that we need to make assumptions about the SSP model, the form of SFH, and the properties of the interstellar medium given by the DAL. Since our understandings of these physical processes are far from complete, we have many possible choices for each one of them. Apparently, different choices of these components would result in very different SED modelings. In this section, we introduce the methods of compute the Bayesian evidence for the different SED modelings.

In practice, it is meaningful to distinguish between two kinds of SED modelings: the SED modelings with the SSP, SFH and DAL all being fixed and the SED modelings with one of the SSP, SFH and DAL being fixed while the other two being free to vary. The Bayesian model comparison of the first kind of SED modelings can be used to ask the question like: Which specific combination of SSP, SFH and DAL is the best? Differently and more interestingly, the Bayesian model comparison of the second kind of SED modelings can be used to ask the question like: Which SSP/SFH/DAL is the best regardless of the choices of the other two? In §4.1 and 4.2, we will introduce our method to compute the Bayesian evidence for the two different kinds of SED modelings, respectively.

### 4.1 The SED modeling of a galaxy with SSP, SFH and DAL all being fixed

Since we have many possible choices for the SSP, SFH and DAL when modeling the SED of galaxies, it would be interesting to ask: within all the possible choices, which combination of the SSP, SFH and DAL is the best for the interpretation of a given observational data? This question can be answered by the Bayesian model comparison of the like SED model which has assumed a specific SSP model , SFH , and DAL , respectively. The hierarchical (or multilevel) structure of this kind of SED modeling of a galaxy is shown in Figure 2.

As mentioned above, the computation of Bayesian evidence is crucial for the Bayesian model comparison. The Bayesian evidence for a like SED model can be obtained as:

(16) | |||

(17) |

where

(18) |

is the maximum of the likelihood function at , and is the defined Occam factor associated with the free parameters of the like SED model. If we use the shorthand “” to indicate that is the conditioning information common to all displayed probabilities in the equation, then Equation 16 can be significantly shortened as:

(19) |

Similar shorthand will be used throughout this paper.

### 4.2 The SED modeling of a galaxy with one of the SSP, SFH and DAL being fixed while the other two being free to vary

#### The case for a fixed SSP but free SFH and DAL

Given the observational data of a galaxy, it is even more interesting to ask a question like: Which SSP model is the best regardless of the choices of the SFH and DAL? To answer this question, it is useful to define a SED model , where the SSP model is fixed to the specific choice , while the star formation history and the dust attenuation law are free to vary. The hierarchical structure of this kind of SED modeling of a galaxy is shown in Figure 3.

So, and are considered as two free parameters in addition to , while represents a given SSP model.

The Bayesian evidence for the like SED model can be obtained as:

(20) | |||

(21) | |||

(22) |

where

(23) |

is the maximum of the likelihood function at , and , , and is the defined Occam factor associated with the free parameters of this SED model. The additional Occam factors and imply that the like SED model will be further punished for having to freely select the SFH and DAL to match the observations.

Using the product rule of probability, we can obtain the identity equation:

(24) |

So, Equation 20 can be rewritten as:

(25) |

With the assumptions that the SSP , SFH and DAL are independent of each other, and the of SSP , the of SFH , and the of DAL are equally likely a priori, respectively, Equations 25 can be further simplified as:

(26) |

The method of calculating the Bayesian evidence for the like SED modeling presented above can be similarly applied to the and like SED modelings. The hierarchical structure of the latter two kinds of SED modelings of a galaxy are shown in Figures 4 and 5, respectively.

The Bayesian evidence of like SED can be obtained as:

(27) |

It can be used to answer the question: Given the observational data of a galaxy, which SFH model is the best regardless of the choices of SSP and DAL? Similarly, the Bayesian evidence of the like SED modeling can be obtained as:

(28) |

It can be used to answer the question: Given the observational data of a galaxy, which DAL is the best regardless of the choices of SSP and SFH?

## 5 The Bayesian evidence for the SED modeling of a sample of galaxies

When modeling and interpreting the SEDs of a sample of galaxies, we need to make assumptions about the SSP, the form of SFH and DAL for all galaxies in the sample. In many works in the literature, a common SSP, SFH and DAL (e.g. the BC03 SSP with a Chabrier03 IMF, exponentially declining SFH, and Calzetti law) are often assumed for all galaxies in their sample. However, we cannot make sure that the SFH and DAL for different galaxies must be the same. Generally, the different assumptions about the universality of SSP, SFH and DAL result in different SED modelings of a sample of galaxies, and the correctness of them need to be properly justified. This can be achieved by the Bayesian model comparison of the SED modelings of a sample of galaxies with different assumptions about the universality of SSP, SFH and DAL. The foundation for this kind of study is the computation of the Bayesian evidences for the different cases. In this paper, we limit ourselves to two kinds of SED modelings of a sample of galaxies: the one with SSP, SFH and DAL all being assumed to be universal, and the one with one of the SSP, SFH and DAL being assumed to be universal while the other two object-dependent. We introduce our method for computing the Bayesian evidence for them in §5.1, §5.2, respectively.

### 5.1 The SED modeling of a sample of galaxies with SSP, SFH and DAL all being assumed to be universal

As a widely used approach when modeling and interpreting the SEDs of a sample of galaxies, the same SSP, SFH and DAL are often assumed for all galaxies in a sample, especially when the size of the sample is very large. This is a natural choice, since it would be much more computational demanding to use different SSP, SFH and/or DAL for different galaxies when we have a large sample. In this subsection, we introduce the method to compute the Bayesian for this case. Although the SSP, SFH and DAL are all assumed to be universal for all galaxies in a sample, we still have many possible choices for each one of them. This is very similar to the case for an individual galaxy in §4. In §5.1.1, §5.1.2, we introduce our method for computing the Bayesian evidence for the different cases respectively.

#### The case for a fixed SSP, SFH and DAL

As the most widely used approach for the SED modeling of a sample of galaxies, the like SED modeling assumes a particular SSP, SFH, and DAL for all galaxies in a sample. The hierarchical structure of this kind of SED modeling of a sample of galaxies is shown in Figure 6.

The Bayesian evidence of this kind of SED modeling for a sample of galaxies can be obtained as:

(29) | |||

(30) | |||

(31) |

where

(32) |

is the maximum of the likelihood function at , and is the defined Occam factor associated with the free parameters of galaxies, respectively.

As shown in Figure 6, we assume that the observational data of different galaxies are independent of each other, and that the parameters of a galaxy tell nothing about the observational data of any other galaxy. With these assumptions, the Bayesian evidence of a like SED model in Equation 29 can be obtained as:

(33) |

#### The case for a fixed SSP but free SFH and DAL

It is interesting to check the performance of a particular SSP model for a sample of galaxies and independently of the selection of SFH and DAL. This can be achieved by defining a like SED modeling for a sample of galaxies, where a particular SSP model and a free SFH and DAL are assumed for all galaxies in the sample. The hierarchical structure of this kind of SED modeling of a sample of galaxies is similar to Figure 6, but with the nodes for SFH and DAL being empty. With the Bayesian evidence for this case, we can answer the question: Given the observational dataset of a sample of galaxies, which SSP model is the best regardless of the choices of the SFH and DAL? The Bayesian evidence for this case can be obtained as:

(34) | ||||

(35) | ||||

(36) |

where

(37) |

is the maximum of the likelihood function at , and is the defined Occam factor associated with the free parameters of galaxies, respectively. Since the SFH and DAL are assumed to be universal for all galaxies in the sample, we only need to add two free parameters ( and ) to represent the selection of the form of SFH and DAL. The associated two additional Occam factors and imply that the like SED modeling for a sample of galaxies will be further punished for having to freely select the SFH and DAL to match the observations.