Mr-Moose: An advanced SED-fitting tool for heterogeneous multi-wavelength datasets
We present the public release of Mr-Moose, a fitting procedure that is able to perform multi-wavelength and multi-object spectral energy distribution (SED) fitting in a Bayesian framework. This procedure is able to handle a large variety of cases, from an isolated source to blended multi-component sources from an heterogeneous dataset (i.e. a range of observation sensitivities and spectral/spatial resolutions). Furthermore, Mr-Moose handles upper-limits during the fitting process in a continuous way allowing models to be gradually less probable as upper limits are approached. The aim is to propose a simple-to-use, yet highly-versatile fitting tool fro handling increasing source complexity when combining multi-wavelength datasets with fully customisable filter/model databases. The complete control of the user is one advantage, which avoids the traditional problems related to the “black box” effect, where parameter or model tunings are impossible and can lead to overfitting and/or over-interpretation of the results. Also, while a basic knowledge of Python and statistics is required, the code aims to be sufficiently user-friendly for non-experts. We demonstrate the procedure on three cases: two artificially-generated datasets and a previous result from the literature. In particular, the most complex case (inspired by a real source, combining Herschel, ALMA and VLA data) in the context of extragalactic SED fitting, makes Mr-Moose a particularly-attractive SED fitting tool when dealing with partially blended sources, without the need for data deconvolution.
keywords:methods: statistical, radio continuum: general, submillimetre: general, infrared: general, (galaxies:) quasars: general
Confronting observations and models of our Universe is the very core of any fitting procedure. However, the fitting process faces an increasing number of challenges to adapt and compare the wide variety of models and observations. From the modelling side, generations of large libraries of highly non-linear physical processes allow for a more accurate description of the complexity of the physical processes at work in the sources. From the observer’s side, new instruments and facilities continue to push further resolution — both spectrally and spatially — and sensitivity over the electromagnetic spectrum. The principle of fitting appears at the frontiers of these two sides and interfacing these two aspects optimally is crucial. How to obtain maximal information from the data (which can vary a lot on quality and quantity) and compare it to the most optimal models available? By optimal here, we define the model which will represent data with the most fidelity or complexity without going in the regime of "overfitting", where degeneracies between parameters dominate, and therefore, no meaningful constraints can be extracted from the dataset (usually keeping in mind the Ockham razor principle). This optimal regime can be quantified thanks to the Bayes theorem, where likelihood of different models can be compared to find the relative likelihood probability of one model compared to another. The particular case of spectral energy distribution fitting is an old problem and can be traced as far as the 1960s (e.g. Tinsley, 1968). This technique has seen a considerable development starting at the end of the twentieth century and is now widely used in most fields of Astrophysics: from stars in our Galaxy to very distant galaxies during the epoch of reionisation.
As data throughout the electromagnetic spectrum became readily available, more and more complex models were developed, such as PEGASE (Fioc & Rocca-Volmerange, 1997), Maraston (1998), Starburst99 (Leitherer et al., 1999), Bruzual & Charlot (2003), MAPPINGS III (Groves et al., 2004) to predict galaxy emission and Pier & Krolik (1992); Hönig et al. (2006); Fritz et al. (2006); Nenkova et al. (2008); Stalevski et al. (2012); Siebenmorgen et al. (2015) for AGN emission to name the most widely used. Note that these codes do not contain any fitting routines, their main intent being to provide the user with a vast library of templates exploring a large, often non-linear parameter space. As a result, SED fitting techniques remained primitive, mainly using minimising algorithms and regression such as least-square or chi-square minimisation. Since 2000, we have seen a rise of tools implementing more sophisticated statistical frameworks in order to explore degeneracies in the increasing complexity of the SED fitting. It is important here to distinguish the two complementary aspects of SED fitting: the models used to predict the SED given a set of parameters and assumptions (Conroy, 2013, for a recent review) and the algorithm used to estimate the parameters on data (the fitting itself, see Sharma, 2017, for a review on the different available algorithms). During the last decade, these two aspects were often combined to provide users off-the-shelf SED fitting procedures with a varying wavelength range: Magphys (da Cunha et al., 2008) for galaxy evolution in the UV-FIR range, BayesClumpy (Asensio Ramos & Ramos Almeida, 2009) for AGN torus fitting in the UV-FIR range, CIGALE (Noll et al., 2009) for galaxy evolution with AGN implementation in the UV-FIR range, BRATS (Harwood et al., 2013) specialised for multi-radio frequency fitting in resolved sources, SEDfit (Sawicki, 2012) for galaxy evolution in optical/NIR implementing spatially resolved sources, SEDeblend (MacKenzie et al., 2016) for SED fitting in the context of lensed source with FIR observations, AGNFitter (Calistro Rivera et al., 2016) focusing on the AGN component in the UV-FIR range, … to name the most widely used in extragalactic astronomy111see also http://www.sedfitting.org/Fitting.html for a more exhaustive listing..
In an extragalactic context, most of these codes such as Magphys, CIGALE, BayesClumpy or AGNFitter, rely on a relatively strong assumption: all the emission comes from a single unresolved source and cross-identification at different wavelength is trivial. This simplification, while being necessary to enable the interpretation of extended multi-wavelength coverage (typically from UV to FIR) and to make use of the numerous source catalogues available in literature, is also one of their most important shortcomings in the advent of high resolution imaging across the electromagnetic spectrum. Indeed, as the wavelength range increases, resolution and sensitivity change drastically, and the assumption of the single source is no longer valid. Also, the other codes such as BRATS, SEDeblend or SEDfit are specialised in relatively short wavelength range and/or very specific applications: BRATS for spectral index mapping in radio, SEDeblend for lensed sources in the FIR domain and SEDfit in the optical/NIR domain for resolved galaxies. None of the aforementioned codes present the possibility to fit partially resolved sources, from NIR to radio.
To cite a more specific example as presented in Gullberg et al. (2016): fitting simultaneously Spitzer(3-160 m Werner et al., 2004), Herschel(70-500 m Pilbratt et al., 2010), ALMA(100-700 GHz) and VLA data for distant, partially resolved galaxies, illustrates exactly this point. One one hand, the resolution of Herschel does not allow to properly disentangle different spatial components (blending) but provides five data-points in spectral space, particularly covering the thermal dust IR peak of the SED. One the other hand, ALMA is particularly suited to answer to the question of the spatial location, at the cost of smaller spectral coverage. Also the heating sources of the dust emitting at 70 m and 3 mm is different (AGN or young stars) but is linked through the total energy emitted and their location. The same type of reasoning can be applied to the AGN radio emission (100 GHz, from the VLA for instance), often unresolved at low frequency and presenting multi-components at higher frequencies (core, jets, hot-spots and lobes). In the case of radio galaxies, these last two aspects are even co-existing for a single source. Therefore, as these observations are complementary, each providing valuable constraints spectrally and spatially on different overlapping components (some with non-detections), why not combining all information together? This is where Mr-Moose intervenes, providing a framework to simultaneously combine spatial and spectral information into a single fitting routine. The paper is organised as follows: we explain the methodology and structure of the code as well as the description of the input and output files in § 2. § 3 presents three examples of application of Mr-Moose. In § 4, we discuss the limitations and future developments and finally conclude in § 5.
2 Design and structure of Mr-Moose
Mr-Moose stands for M̱ulti-ṟesolution and m̱ulti-o̱bject/o̱rigin s̱pectral e̱nergy distribution fitting procedure. The philosophy when designing Mr-Moose was to provide a tool for SED fitting, sufficiently simple to be used by non-SED fitting experts, yet sufficiently versatile to handle as many cases as possible, particularly in the context of multi-wavelength samples. The main drivers for the design are:
to handle data with a large span of resolutions to combine interferometer/single-dish observations,
to deal with upper limits consistently, allowing a fit with gradually decreasing probability when approaching an upper-limit rather than an arbitrary cut at 3,
to treat the fitting in a bayesian framework to provide asymmetric uncertainties on parameters and to identify easily degeneracies as well as comparing different model combinations,
to allow a wide variety of models as input, analytic models for this first release and/or libraries for non-linear models in future updates.
Mr-Moose is written in python 2.7, developed in PyCharm Community Edition and complies to the PEP8 writing standard. It relies as much as possible on already existing packages (see full list in Appendix) to further increase robustness. The code was designed to be user friendly but not user opaque. A large (and necessary) control of the data, model and fit setting inputs is allowed to test the reliability of the output and therefore avoid the “black-box” effect. However, the code aims to be sufficiently simple to be used with a minimal statistical and computational knowledge. We first present the core of the fitting procedure, the maximum likelihood estimation and review the inputs and outputs files. Table 1 summarises the different file extensions used in Mr-Moose. We also report a simplified flowchart of the code structure in Figure 1 and refer the user to a more detailed flowchart in the GitHub repository222https://github.com/gdrouart/MrMoose.
|.fit||input||file containing the settings for the fit.|
|.dat||input||file containing the data.|
|.mod||input||model file containing the function, parameters and their considered range|
|.fil||input||filter files, containing the transmission curves|
|.sav||output||file containing the information from the fit, saved for future use, and storing the settings for the fit|
|.sed||output||file containing the flux per frequency (in Hz) of the best fit for all components of the SED in unit of erg scmHz|
|output||output files containing the plots|
|.fits||output||modelisation of the source(s) in each filter from the .dat file|
|.py||…||Mr-Moose files containing the procedures.|
2.1 The maximum likelihood estimation in Mr-Moose
The code working in a bayesian framework, likelihood calculation is central in identifying the most representative models to the data and providing marginalised probability density functions for each parameter in order to estimate uncertainties. Following the Bayes theorem written in terms relevant for SED fitting, the probability of a model M given the data D can be written as:
where P(M|D) is the posterior distribution (the probability of M after observing D), P(M) is the prior probability (the probability of M before observing D), P(D|M) is the likelihood (the probability of observing D given M), and P(D) is the model evidence (a measure of how well the model M predicts the data D and can be considered as a normalisation factor to the problem, often non-trivial to estimate). Therefore, is the final parameter distribution (that we are looking for) and is likelihood to be determined and to maximise (or minimise given our estimator, a slightly modified version of the goodness-of-fit). This maximum likelihood (abbreviated P onwards) corresponds to the product of all probabilities of the model given the data and can be written as:
with the detections and the upper limits. We stress that the prior on the model parameters (set to uniform, corresponding to an uninformative prior in Mr-Moose version 1). Each individual probability distribution, respectively for detection and upper limit (following a normal distribution), are defined as follows:
where is the measured flux and , its standard deviation, is the predicted flux, all in filter for a detection and for a non-detection. Note the as being the data offset from the true value of , infinitesimally small in the case of detection, and referring to the integrated flux probability () to the detection threshold (see Appendix 1, their Fig. 6, Sawicki, 2012, for a graphical representation). Taking the logarithm of Eq. 2 and injecting Eq. 3 and 4, we obtain:
The second term of the right-hand side of the equation being constant, maximising the probability, , corresponds to minimising the following estimator:
We apply a change of expression of the third term of Eq. 5 making use of the error function, erf, for computational convenience (see Appendix A of Sawicki (2012) for details). In practice, the error function tends to infinity quickly but gradually after a threshold point (the upper limit) calculated by the user as the local noise level in the original image, in the corresponding filter (the same image at a given filter/frequency can provide a combination of detections and upper limits if the location of components are known). This is only slightly different when including the upper limits into the calculation. We note that in the case when no upper limit is provided, the second term is null, and therefore the equation corresponds to the classical definition. In order to explore the parameter space and minimises the (equivalent to maximising the posterior probability), we make use of the emcee python package (Foreman-Mackey et al., 2013), allowing a choice of different samplers to probe the parameter space.
The emcee package follows the classic Markov chain Monte Carlo (MCMC) approach, consisting of randomly moving walkers exploring the parameter space. The main goal is to sample of the posterior probability density function of the parameters with a sufficiently high number of independent realisations. This is the best compromise between a brute force approach where each solution of the parameter space is calculated for a given grid and a standard least-square minimisation where only the best solution is provided (which can be a local minima). This approach also presents the advantage to estimate uncertainties on the model parameters, even with non-normal distributions thanks to the exploration the parameter space, but without the computing cost (increasing exponentially with the number of parameters). To explore and reach convergence on the best solution, the main idea of a MCMC approach is to set a walker in the parameter space, moving randomly with a given step size. For each new position, the likelihood function is evaluated, and the step is accepted if the probability is higher than the previous position. Hence the walker moves slowly towards the best solution which maximise the likelihood function. Several algorithms are available, the most famous being the Metropolis-Hasting (M-H) algorithm. However, this algorithm can fail to converge quickly when the convergence parameters are not optimally tuned and the parameter space presents strong non-linear behaviour (in particular if the step length is not optimised for the problem). New algorithms were developed to answer these caveats, reducing significantly the converging time. In particular, emcee uses affine-invariant transformations (Foreman-Mackey et al., 2013; Goodman & Weare, 2010) to sample the parameter space, using a stretch move based on the other walker positions. This presents the advantages of being less sensitive to strongly correlated parameters allowing a minimum loss of performance and therefore a quicker convergence. Also, this algorithm enables two valuable simplifications compared to the standard M-H: (i) the performance of the code is basically relying on two parameters: , a gain parameter and , the number of walkers and (ii) the possible parallelisation, splitting the chains into complementary sub-ensembles to predict iteratively the next walker moves, enabling even larger performance gains for specific problems. We refer the reader to Foreman-Mackey et al. (2013) for more details on the algorithm.
Mr-Moose requires four input files: a data-file, a model-file, a setting-file, and the filter database each containing the necessary information to perform the fit (see Table 1).
2.2.1 Data file
The data-file requires the most attention in its design as it will set up the number of arrangements (the various data/model combination). The Mr-Moose procedure doesn’t check for inconsistency in the data, and is the responsibility of the user to provide reliable or relevant photometric points (well calibrated and corrected for any photometric effects). To help with this step, corresponding images will be generated for each reported filter in the data file (see § 2.3). Multi-wavelength SED fitting is particularly challenging in this regard, therefore specific care should be taken in order to have consistent data throughout the fitting range (blending, filtering of scale/flux by interferometers, instruments corrections, aperture effects, absolute calibration uncertainties) by adding extra models/arrangements if necessary. However, we stress that starting with a simple case and gradually complexifying the fitting is usually a rewarding approach. Table 2 summarises the data-file organisation.
|Name of columns||Type||Purposes/Notes|
|Filter||string||filter name. It should match exactly the name of a filter in the database (case sensitive), without the extension (***.fil), necessary to calculate the model flux during the fitting (§ 2.2.4)|
|lambda0||float||central wavelength of the filter|
|RA||string||right ascension of the source in the 00h00m00.0s format - used only in the .fits generation|
|Dec||string||declination of the source in the 00d00m00.0s format - used only in the .fits generation|
|resolution||float||resolution of the instrument in arcsec - used only in the .fits generation|
|det_type||string||indicate if the reported value is a detection or an upper limit|
|flux||float||observed flux in the given filter|
|flux_error||float||uncertainty on the flux for the given filter|
|arrangement||integer||the number of the arrangement|
|component||string||the name of the arrangement (will be used for the spilt plot)|
|component_number||list of integer||the list of models to fit for this data point (separated with a semi-column)|
2.2.2 Model file
The model-file contains the information of the parameters from the combination of the models to fit on the data. Table 3 shows the format of the model file. The function name is referring to the function to be called during the fit and therefore should match exactly the name of the user defined function (case sensitive). The function is to be defined in the models.py file. Ideally, it should be sufficiently well designed to avoid a drop in execution time as each model is executed at each step for each walker by avoiding as much as possible loops, conditions or function calls. The code already provides some functions, developed and tested for specific applications and examples (Gilli et al., 2014; Falkendal et al., 2018; Drouart & et al., prep). We provide here a description of the models provided with the release of Mr-Moose:
BB_law: a black body model directly called from the astropy package (Astropy Collaboration et al., 2013)
where is the flux in erg s cm Hz, the normalisation to the data (relatively complex term containing the scaling to the data and the solid angle covered by the source with the small angular size source simplification, depending on each instrument for unresolved sources see333https://casper.berkeley.edu/astrobaki/index.php/Single_Dish_Basics for more details), the frequency in Hz, the Planck constant, the Boltzmann constant, the temperature in K, the speed of light.
AGN_law: an empirical AGN model designed to fit the IR AGN component
where the cutoff frequency, the spectra index, the observed frequency in Hz
sync_law: a single power law model, common form to represent synchrotron emission in radio and is defined as:
Each model exists in two versions, the standard where an array of frequency, an array with the parameter and the redshift value is provided and a version with redshift as a free parameter by simply adding the _z at the end of the function (see examples in the Mr-Moose repository) and by changing the all_same_redshift and providing the relevant redshift array (one redshift value per component). Note that when the redshift is a free parameter, the inputs are slightly different with only an array of frequency and an array for the parameters. We refer the user to the examples available in the repository. The library of models will increase in future updates depending on the need, we refer to the user manual for an up-to-date library of available models. We remind that the package astropy (Astropy Collaboration et al., 2013) provides a large library of the most commonly used function in astronomy. Also, we suggest to design the model with parameters spanning a large range in log-scale (such as normalisation factors, as done in the examples in this paper), making sure the corresponding function is making the adequate transformation to calculate flux. The initial position of the walkers are set following the emcee documentation: a Gaussian ball of walkers, centred on the median of the parameter interval. This has important consequences and should be taken into account when defining this interval (see § 4).
|function_name||number of parameter ()|
|first parameter||min value||max value|
|second parameter||min value||max value|
|n parameter||min value||max value|
2.2.3 Setting file
The setting file (see full structure in Table 4) contains the parameters for Mr-Moose to perform the fit such as the number of walkers (nwalkers), the number of steps (nsteps), the limit (nsteps_cut) at which the chains should be used to plot the results (the limit from which convergence of the fit is reached). We provide examples with typical numbers. For more information, we invite the user to read the recommendations from the emcee documentation. In a nutshell: the more walkers and steps, the better. However a balance has to be found to keep the code executable on a machine. From experience, one effective strategy consists to run some first blind tests to have a feeling of the likely requirements for the number of walkers and the number of steps. For relatively simple case, we advice to start with 200 walkers and run for 500 steps, with a cut at 400 (nwalkers=200, nsteps=500 and nsteps_cut=400). This first run will probe the parameter space with 20 000 points ((nsteps-nsteps_cut) nwalkers) and gives an indication of how many steps might be required to reach convergence. For a larger the number of parameters, a larger number of walkers and steps is required to explore more thoroughly the parameter space (see example 1 versus example 2).
2.2.4 Filter database
The filter database is essential in the fitting routine as allowing to predict the flux of a given combination of components at a given wavelength. For each step, the average flux of the model is calculated through the transmission curve of the filter given in the data file. This transmission curve represents the response of a telescope for a constant incident flux (in the frequency space) for a given frequency range. The convolution of the incident flux with the telescope response is important as any large variation in the SED within a spectral window can bias the estimated flux, for instance in the case of an emission line or a strong gradient within the averaged frequency range. This effect is particularly strong in optical and NIR where the telescope response varies significantly from filter to filter. During the fitting, we therefore calculate for each filter, the corresponding telescope response convoluted with the modelled source with the following equation:
where the averaged flux for a given filter , the frequency, the flux of the model and the transmission curve. Each of the filter used in the data file must be present in the database, or the code will fail to execute. Addition of new filters is easy, a simple text file with two columns, frequency and corresponding transmission can be added in the filters folder. For optical and NIR, given the large amount of filters used in astronomy, we point the reader to Asiago Database on Photometric Systems444http://ulisse.pd.astro.it/Astro/ADPS/ to add the relevant filters for a specific analysis (Fiorucci & Munari, 2003). For other wavelength, we refer the user to each instrument user manual to provide the relevant transmission curve. In the case of radio telescopes (ALMA, VLA, ATCA, etc), the bandpass calibration corrects for any effect, therefore the telescope response can be approximated as a gate function centred on the observed frequency and covered by the correlators (which can be discontinuous, depending on the configuration required to the science case)555two functions in the mm_utilities.py can be found to generate these gate filters.
|source_file||string||the source file path where the data are to be read|
|all_same_redshift||boolean||keyword to allow for the redshift of the components to be a free parameter|
|redshift||float array||the redshift of each component in the system|
|model_file||string||the model file where the parameters of the models and their range are selected (the parameters to fit)|
|nwalkers||integer||the number of walkers for the fit (should be typical more than 100, emcee documentation advice at least twice the number of free parameters). It play a role on the speed at which the chain converges, so usually the more the better (in the limit of the computer memory).|
|nsteps||integer||the number of steps for each walkers to perform (can vary widely from 200 to several thousand base on the complexity of the problem)|
|nsteps_cut||integer||should be smaller than nsteps, it represents the number of steps to ignore when drawing the probability density function of the parameters (it depends mainly on convergence time and the number of walkers as it require to select a sufficiently large number of points to populate the parameter space after convergence.|
|percentiles||float array||the percentile of the distribution to be returned (used in the plots to show max intervals). Must be odd-dimensioned and all numbers in the range 0percentiles1. The default values are chosen to represent probability density functions which are not normal, excluding the 10% outliers.|
|skip_imaging||boolean||skip the creation of the data file in images|
|skip_fit||boolean||skip the fitting|
|skip_MCChains||boolean||skip the generation of the MC-Chains plot|
|skip_triangle||boolean||skip the generation of the triangle plot|
|skip_SED||boolean||skip the generation of the SED plots|
|unit_obs||string||unit of the wavelength/frequency provided in the .dat file|
|unit_flux||string||unit of the flux provided in the .dat file|
Mr-Moose produces series of outputs, containing different information (see Table 1). The outputs can be divided in two parts: the supporting files (“.fits”, “.pkl”, “.sav”, “.sed”) and the result files (“.pdf”). We focus first on the support files and their contents, allowing to dig further in the interpretation if necessary and allow for an easy reproduction of a given fitting. We designed Mr-Moose to allow users familiar with the emcee package to use their custom tools from the “.pkl”. However, Mr-Moose comes also with plotting functions to display results in order to interpret the result from the fit. There are several graphs “.pdf” file generated by the procedure: chain convergence, triangle and SED plots. These plots works “in unison” to understand the results of the fitting. Any unexpected behaviours in these plots should bring suspicion on the inputs, would it be data, model used, arrangements defined or the settings of the fitting procedure.
2.3.1 Supporting files: chains, SED, fitting and fits files
The “.fits” files are generated following the data provided in the “.dat” file. For each filter, the procedure generates an image in the FITS format (commonly used in astronomy Wells et al., 1981), containing all the components associated with this filter, making use of the RA, Dec and resolution values provided in the “.dat” file. The procedure creates a unresolved source, assuming a Gaussian point spread function of full width at half maximum provided as the resolution parameter, at the given position (RA, Dec). This allow to check that the combination of the data file are correctly assigned by direct comparison with the original images, if available. We note that the pixel size is fixed to a fifth of the resolution parameter.
A “.pkl” file, contains the object from the emcee procedure saved making use of the cPickle function from the pathos package allowing a serialisation, necessary in case of parallelised use. No modifications are implemented from the original emcee sampler, and as such, emcee-familiar users can make use of this output into custom graphical tools. This file contains the complete chain for each walker and other diagnostics provided by the emcee package.
The “.sav” file contains the information from the fitting, including the setting file and the model structures, modified during the fitting and storing the final results in a YAML format. The file contains the final percentiles for each parameter, the best fit value, the name and path of the output files generated by the procedure.
A “.sed” file is created reporting each component of the models from the global best fit. The first column is the frequency in Hz, the following columns reports the flux in erg s cm Hz.
2.3.2 Result files: Convergence, Triangle and SED plots
The convergence plot shows the walkers exploration of the parameter space for each iteration of the procedure for each parameter. This plot helps to identify when convergence is reached, therefore to which value to set the nsteps_cut parameter to be used by the triangle and SED plots. When is convergence reached? The walkers should oscillate at given values in all windows simultaneously, revealing that they converged into the minimum of the parameter space. Note that several minima can co-exist, therefore the walkers can be split between several values, but as long as the chains are stable around these values (for several hundreds steps), convergence can be considered as reached. However, with the increasing complexity of larger parameter spaces, some walkers can be “stuck”: i.e. moving erratically, very slowly or not moving at all. Therefore including these walkers in the subsequent plots would bias any interpretation of the probability density distribution. We report for completeness these “stuck” walkers in the convergence plot (with a different colour coding) as a check of the fitting (see § 3). These “stuck” walkers will be ignored in the triangle, and SED plots. We filter these out extracting only the chains from the sampler with a high acceptance fraction (meaning that walkers are effectively “walking”). The default value for this threshold of selection in Mr-Moose is set to half of the average acceptance fraction value of all chains (AAF value, for instance if AAF = 0.6, the threshold is defined as 0.3 and all chains below are ignored in the following steps). We report both the AAF along with the fraction of walkers filtered out with this cut (SW) in the convergence plot. For specific cases, where the default cut is not optimal, Mr-Moose allows to set a manual value (see README). As a guide, AAF should be in the 0.2-0.5 range and the stuck walkers fraction relatively low. For better details, we report the reader to the emcee documentation. Briefly, in the case of AAF and SW values being extremes, three solutions are to increase the number of walkers (nwalkers), to better define the initial parameter values through the parameter interval (see § 4) or to simply increase the number of steps (nsteps).
The triangle plot shows several pieces of information from the fit making use of the corner package. It basically uses all information contained by the selected walkers in the chain after the convergence value (nsteps_cut) provided in the setting-file. It presents two types of plot: 2D paired-parameter plots and marginalised distribution for each parameter in a form of a triangle. The 2D plots shows the probability density functions by reporting each step of each walker for the projected parameters. Additional contours are also added, representing an alternative version of the dot-representation, with contour values being at the 25%, 50% and 75% of the maximal value of the corresponding marginalised distribution. We stress that these contours do not correspond to the percentiles of the distribution, obtained by cumulative integration of the probability density function. Along the diagonal, the marginalised distributions of each parameter are presented by histograms. The percentile values defined in the setting-file are also reported. We recommend the use of a sightly different version of the seven-figure summary (Bowley, 1920), omitting the two extremes, minimum and maximum values of the distribution as particularly sensitive to outliers and “stuck” walkers, therefore 10%, 25%, 50%, 75% and 90%. They usually present a good summary of different type of distributions (as it is expected to deviate substantially from normal distributions) although should not prevent to carefully check each distribution visually to identify problems. This triangle diagram is particularly helpful to (i) reveal any degeneracies among parameters (ii) reveal if parameters are constrained, and how well, given the dataset.
The SED plots show the fit of each arrangement on the data, the final results of the models and data combination. The final number of generated plot changes, depending on the complexity of the source considered. First, the SED is coming in two versions, a best fit and a “spaghetti” version. The former shows the best overall fit to the data, each colour corresponding to one model (the maximum likelihood reached in the chains). The latter shows all the solutions provided by each walkers at each step after the convergence cut (same than the triangle plot). This is particularly useful to see how good the models are constrained with the data in combination with the triangle and convergence plots. We choose this representation over a shaded area representation to better visualise the constraints from the data and to better highlight the different solution in case of the presence of several minima. For sources with several arrangements, split versions of the SED will also be produced to ease interpretation (see examples in § 3) with colour coding of the models conserved over the windows.
|triangle_plot||probability density functions for each parameter and by pair. This plot is particularly efficient to reveal degeneracies between parameters (banana shape) or unconstrained parameters by the fit (no single peak in the probability density function)|
|MCChains_plot||values of parameters at each step for each walker. This plot is particularly useful to show when convergence is reached, used to calculate all the quantities displayed in other plots|
|SED_fnu_plot||plot in units of frequency and flux of all the data and models on the same graphical window. While relevant for few components or arrangements, this plot becomes rapidly too complex for multi-component case|
|SED_fnu_splitplot||split plot in unit of frequency and flux for each arrangement defined in the data file|
|SED_fnu_spaghettitplot||plot showing all the possible parameter values on the SED after convergence (at each step for each walker). This plot translates the triangle plot on the data, showing how constrained the model is compared to the data.|
|SED_fnu_splitspaghettitplot||equivalent to the SED_fnu_splitplot but for the spaghetti style|
|SED_file||file storing the SED of each component as plotted in the SED_fnu plot|
|sampler_file||file storing the chains of the fitting.|
|save_struct||file storing the details of the fit for future references and reproducibility, save the fit settings dictionnary|
In order to demonstrate Mr-Moose capabilities, we focus here on three specific extragalactic examples which where used to build this procedure (the code is provided with several examples with increasing complexity, we focus here on the artificial ones produced by the file example_1.py and example_6.py) as well as a previous result from literature. For the sake of simplicity, we will refer to them as example #1, #2 and #3, respectively. The first is a simple case of fitting, with only one model onto a dataset, here synchrotron emission from a radio source in the 74 MHz-8.4 GHz range. The second case is much more complex: we here aim to disentangle the contribution of various components, spatially non-collocated and originating from different physical processes. We take the example of a powerful radio galaxy and a star forming galaxy companion with separate emission from the lobes (of synchrotron origin), the dust heated by star formation and the dust heated by AGN, in the 8 m-1.4 GHz wavelength range. The third example cover a real dataset published in Gilli et al. (2014), fitting a modified black body in the 70 m-3 mm range to reproduce dust emission of a distant quasar, demonstrating the useful addition of upper-limits in the fitting process.
3.1 Example #1: single source, single component
This example is the simplest fitting imaginable for Mr-Moose, consisting of a single law on a given dataset. Although of limited interest, it allows us to familiarise with the code. We benchmark Mr-Moose by generating a fake source and trying to recover these original parameters.
3.1.1 Fake source and input files
To generate a fake source data-points, we assume a single, unresolved source at emitting a single synchrotron law represented by the model sync_law with =-1 and log=-22 (to recover a source with flux at Jy-level) where is the slope and the normalisation (in log-scale). We generate five single points passing this SED through the filters from the filter library and assign a given signal-to-noise ratio (here SNR=5 for all points). We simulate further observation uncertainties by adding a random gaussian noise with a standard deviation corresponding to the provided SNR value. In practice, the code adds (or subtracts) flux to each datapoint, not following exactly the original synchrotron law defined previously. By adding this extra noise the chance of recovering the exact parameter values decreases, this perfectly illustrates the real case of observations, where each point is drawn from a distribution. This fake data-file (Listing 9) is, along with the setting-file (Listing 6) and the model-file (Listing 1), generated from the example_1.py procedure (provided with Mr-Moose). These files are therefore fed into Mr-Moose to perform the SED fitting.
3.1.2 Output files and results
|Name||parameter||true value||fit value|
The code generates the output files previously described (see § 2). We focus on the four “.pdf” plots: the convergence plot (Fig. 2), the triangle plot (Fig. 3) and two SED plots (in unit of , Fig. 4).
The MC-Chains plot (Fig. 2) shows here a quick convergence (reached roughly at the #100 step). We define nsteps_cut=180, and apply it to the subsequent plots (triangle and SEDs).
The triangle plot (Fig. 3) is the classical plot to represent multi-parameter space: plotting the distribution of parameters in pairs with the marginalised distribution of each parameter on the hypotenuse of the the triangle. Along this diagonal, the marginalised distribution for each parameter is a histogram. The dashed lines are the percentiles provided in the setting-file, here 10%, 25%, 50%, 75% and 90% (provided by the user in the setting-file), as recommended in § 2. The parameters are well constrained in a small area of the parameter space. It is interesting to note that even if the two marginalised distributions show relatively Gaussian-like distributions, the 2D projection is far from being a symmetric cloud. The contours correspond here to 25%, 50% and 75% from the maximal value of the marginal distributions. Overall, we note that the input parameter are recovered within the uncertainty interval (remembering we added extra noise to the data, see Table 6).
As presented in § 2, the code generates two version of the SED to visualise convergence and the effect of uncertainties: the best fit and the “spaghetti” version allows to visualise the how well the data constrain the parameters of the model. Fig. 4 shows that the power-law parameters are well constrained with only a small scatter of the model in the data-points.
3.2 Example #2: multi-sources, multi-components
For the second example, we aim to demonstrate the application of Mr-Moose to a much more realistic situation where the total emission received is a complex combination of several blended sources at different wavelengths by using all the variations and freedom allowed by Mr-Moose. This fake system is adapted from a real science case, see Gullberg et al. (2016). In the following paragraph and Fig. 5, we summarise this complex source consisting of two galaxies, one powerful radio galaxy with two bright radio lobes (FRII type, Fanaroff & Riley, 1974) and a star-forming companion galaxy.
|Name||parameter||true value||fit value|
3.2.1 Fake source and input files
The following characteristics and assumptions are used to define this system:
The radio galaxy is composed of an AGN, a host galaxy and is star-forming
The AGN is radio-loud presenting a FRII morphology (two bright radio lobes)
The companion is star-forming and without AGN
The SED coverage covers the 8 m-1.4 GHz range with various resolutions and detections as the result of the different facilities used for observations.
Fig. 6 presents the observations, each image generated by Mr-Moose before the fitting. To further precise, the IR SED do not differentiate the two galaxies due to the lack of resolution and is constituted of three components: the cold dust from both the companion and the host and the AGN hosted in the radio galaxy. We also note that the 350 m, 500 m and 870 m are only upper limits and therefore, the sum of these three components cannot largely exceed these values. The synchrotron emission is constituted of two lobes, partially blended with the radio galaxy itself and with the companion or unresolved at the lowest radio frequency. Also, the synchrotron and dust emission are equally contributing at certain frequencies (here 230 GHz).
We have a total of 14 observations in the 8 m-1.4 GHz range, with 17 photometric data-points. The total number of components is therefore five, one AGN dust component, two cold dust component (one the radio galaxy, one of the companion galaxy assuming that the cold dust component originates from young stars) and two synchrotron emissions (originating from each lobe). This corresponds to ten free parameters, summarised in Table 7. Given the combination of resolution, sensitivity, frequency coverage, photometry and components, we can define seven arrangements of the data/models, used to generate the fake data-file and model-file.
Dust/AGN: combination of the dust emission from the companion and the radio galaxy, itself consisting of the AGN and cold dust components Spitzer, Herschel and LABOCA (Siringo et al., 2009):
W sync/comp (western synchrotron, companion): partially resolved at 229 GHz, combination of synchrotron and cold dust emission (ALMA Band 6, 7.5 GHz bandwidth)
E sync/host (eastern synchrotron, host galaxy): partially resolved at 245 GHz, combination of synchrotron and cold dust emission (ALMA Band 6, 7.5 GHz bandwidth)
total sync (total synchrotron): unresolved radio emission at 1.4 GHz, 4.7 GHz 102 GHz (VLA, ATCA and ALMA Band 3 at 50 MHz, 2 GHz and 7.5 GHz bandwidth, respectively)
W sync: only resolved at 4.8 GHz and 8.4 GHz (VLA, C and X bands, with 50 MHz bandwidth)
E sync: only resolved at 4.8 GHz and 8.4 GHz (VLA, C and X bands, with 50 MHz bandwidth)
total emission: combination of all components at 102 GHz (ALMA Band 3, with 7.5 GHz bandwidth)
We generate the fake files using the fake source procedure provided with Mr-Moose, generating, summing and passing the different combination through the adequate filters, in the same fashion presented in § 3.1. Table 7 lists the values of each parameter and Fig. 6 the images corresponding to each filter from the data-file. We choose nwalkers=400 to make sure to have a sufficiently large number of walkers exploring the parameter space and a deliberately (too) large number of steps (nsteps=3500) to ensure convergence is reached.
3.2.2 Output files
The code generates the output files previously described in § 2. Given the higher number of arrangements, two new SED files are generated (compared to example 1), presenting a split version of the total SED. Indeed, when dealing with complex system, presenting emission from different origins, over-plotting everything on the same SED makes interpretation difficult. Therefore, Mr-Moose provides split SEDs, with the same scale and range for each arrangement (seven in this example) with color coding each model component. We describe here the results from the convergence plot (Fig. 7), the triangle plot (Fig. 9) and finally finish by the SEDs (Fig. 10-11).
Fig. 7 shows the chains of walkers, where convergence is reached roughly after 1000 steps. With more attention, one notices each parameter converges at different times, the first ones being the most constrained parameter such as or (compared to ). Also from this plot, it is clear that the dust temperature cannot be further constrained without extra information. We deliberately chose a larger number of steps to ensure we would not miss a long convergence trend. The grey chains corresponds to flagged walkers, below the acceptance fraction threshold discussed in § 2. In this particular example, the default AAF threshold value does not filter out some isolated walkers which did not converge even after this large number of steps. This is probably due to the increasing complexity of the parameter space, where several local minima coexist. We therefore set a manual acceptance fraction threshold at 0.23. We choose this value because it filters out the low-end tail of the distribution of the acceptance fraction values (see Fig. 8) corresponding to the isolated walkers in the parameter space (see the bottom of the chains, at roughly step #1000). To plot the following SEDs and triangle diagram, we set the limit nsteps_cut=3450. A much lower cut, such as 3000, is possible but would clutter the following diagrams and given our 400 walkers, this still represents 20000 points to sample our distributions.
Fig. 9 provides us with information on the parameters. Looking at the histogram on the diagonal, we can see that most parameters are relatively well constrained, within the original values, except for the temperature of the cold dust components, especially (This will be particularly obvious in the SED plot). We also see that the increasing complexity of the system, contours are very useful to highlight “banana” shapes more clearly. This is a perfect demonstration of how Mr-Moose provides reliable results, in the ambiguous case of a lack of data.
Fig. 10 and Fig. 11 respectively show the single and the split SEDs for the best fit and “spaghetti” visualisations. In these case, overplotting all information on a single SED is very confusing (see Fig. 10). While, we keep this figure for its pedagogic value, we prefer to focus on the split version of SED to interpret the fitting (see Fig. 11). On the top panel (best-fit visualisation), the procedure proposes a really good fitting. When incorporating the uncertainties on the parameters, in the bottom panel (spaghetti visualisation), we see that the superposition of the different components explain the degeneracies observed in Fig. 9. The large variation of the temperature allows for a large variation of the normalisation and, in turns, a different setup and relative contribution of each component. This is a perfect illustration that (i) a simple value of minimum fit is limited as not providing a fair view of the fitting, (ii) extra-information on the relative contribution of the different components is necessary to better constrain the fit (iii) even a sophisticated fitting approach does not preclude to high-quality data and a careful examination of the source.
3.3 Example #3: real dataset
We finally run Mr-Moose on a published dataset to ensure the code is able to recover previous results. We take the example from Gilli et al. (2014) as photometry and parameter results are reported in the publication and allow for a direct comparison. Briefly, the data, a combination of ALMA and Herschel observations with upper limits and detections, are used to fit a modified black body component (reported in § 2.2.2), in order to access the dust properties of a =4.75 quasar. We run the code in two configurations and : one with only the detected filters and a second adding the upper limits to the fit. For both configurations we set =2 and =1.5 10 Hz as in the original publication. We list the files in the Appendix and run the procedure twice. We focus only on recovering the temperature parameter as the other parameters are derived either from the best fit temperature or directly from observed fluxes.
For the case of detection only (a), the temperature is recovered within the uncertainties of the quoted value in the original paper (see Table 8, Fig. 12, Fig. 13 and Fig. 14). Note also the asymmetrical uncertainties when having the information from parameter space exploration. The uncertainties provided by Mr-Moose are different, referring to the 25% and 75% of the distribution when the value quoted in Gilli et al. (2014) is likely the standard 1 symmetric uncertainty assuming a normal distribution.
When including upper limits (b), the result differs with a slightly lower temperature and uncertainties. This is probably due to the inclusion of upper limits as continuous data with decreasing probability as defined in Eq. 6 (note also how the model is able to converge above the upper limits, which are represented as 1 in Fig. 14, bottom right rather than the 3 in Fig. 2, Gilli et al. (2014). This is a good demonstration of how constraining upper limits can add valuable information when performing SED fitting. As mentioned in the original paper, we would like to stress that the uncertainties reported here are mainly fit uncertainties: they only represent the combined user knowledge on the uncertainties given in the data file.
For a more general comment, these statistical uncertainties do not incorporate any systematic uncertainties (which can dominate in certain cases). This is a very common problem in SED fitting, the final uncertainties obtained are limited to the systematic uncertainties associated to the assumed model to represent the data or from the data themselves. We add this word of caution for user non-familiar with SED fitting as these effects are often more subtle (but can be larger than the statistical uncertainties!) and usually hidden in the model choice and/or data. In this particular case, only a full comparison between a range of models (e.g. the common problem of IMF in stellar population fitting) and a complete data processing uncertainties (e.g. bad calibration or sky subtraction) assessment can answer this particular question.
|Name||parameter||Gilli2014+||config a||config b|
4 Known limitations and future developments
|Examples||nwalkers||nsteps||# of parameters||Time|
|# 1||200||200||2||2 min|
|# 2||400||3500||10||24 h|
|# 3a||200||400||2||5 min|
|# 3b||200||400||2||10 min|
At the publication of this paper, the code will be available online on the GitHub platform at the following url: https://github.com/gdrouart/MrMoose, under a GPLv3 license (allowing re-use and modifications, see License.txt in Mr-Moose folder). While Mr-Moose is under continuous development and will see periodic upgrades, the code also knows some limitations. We list some of the most important ones and refer the reader to the README for a more exhaustive list of all known issues/limitations.
The execution time is highly dependent on the complexity of the configuration: the most parameters/components/arrangements, the longer the time for the fitting to converge. Typical execution times are from couple of minutes to several hours (see Table 9) for a reasonably recent machine. More particularly, it assumes that the user defined models are as efficient as possible because these models are called at each iteration for each walker and is therefore the bottleneck of the code. Large number of walkers and steps can lead to significantly larger files, slowing the process. Even if Mr-Moose is designed to run in one step, we recommend to split the run in several steps, first generate the images only to check your data-file, hence a second run to perform the fit and a third to generate the “.pdf” files (by changing the keywords values in the setting file).
Choosing initial values close to the “true” values is important, otherwise the code will take a long time to converge (if converging at all). This choice of initial guess is particularly important when including upper limits. By design, when the parameter values produce a model above an upper limit, the likelihood is very quickly set to infinity due to the error function (see § 2). Therefore a slight underestimation of parameters is preferable, particularly on any normalisation parameters (to keep the second term of the in Eq. 6 close to null at the beginning). We remind that the initialisation of the walkers is made following the emcee recommendation: a Gaussian “ball” of walkers centred on the median value of the interval parameters.
The code provides two levels of parallelisation (multi-core processing) for the calculation of the likelihood: the parallelisation on one source and the parallelisation on a sample of sources. The former tends to deteriorate the performance, most likely due to the large overheads when the likelihood is relatively simple (see emcee documentation). The second is only applicable on a sample of sources and therefore is much more efficient in reducing fitting time of an entire sample, using one core per source (Mr-Moose is provided with the wrapper herd.py to parallelise on sample). However, specific care should be taken to avoid running out of memory as the number of positions to store scales with the number of steps and walkers (and the number of sources to fit simultaneously).
We list here some of the future features, already planned:
allowing for a wider range of priors: in the current version, only uniform priors are incorporated. We plan to add a variety of priors in order to increase the fidelity of prior knowledge into the fitting procedure. This will also solve the initial parameter value limitations mentioned previously.
a migration to Python 3 is planned (when upgraded, all further developments will be applied to the Python 3 in priority)
implementing the use of discrete models such as sophisticated AGN models or galaxy evolution models. This step requires the implementation of a methodology where the samplers can explore a (partially) discrete parameter space. Several solutions are considered (SED interpolation, brute force, branch-and-bound methods, nesting, etc). Its implementation is beyond the scope of this paper, it will be the object of a major update of Mr-Moose.
We presented a new fitting tool allowing the user to fit SEDs for a variety of source configuration from relatively simple to complex source configuration in a bayesian framework using the emcee package. The code, highly customisable, allows a wide range of configurations between data and models, with the aim to stay user-friendly. The code consistently handles upper limits along with detections, each data point being calculated through the filter database, also managed and defined by the user. The large flexibility on creating models/data/filters makes this code virtually adaptable to any case, independently of wavelengths, instruments/telescopes combinations or science cases, providing a robust tool to interpret increasingly complex multi-wavelength datasets. In particular, the code appears ideally suited (but not limited) to fit a combination of Spitzer, Herschel, ALMA and JVLA dataset, and/or in the presence of constraining upper limits in the data even in the case of a single unresolved source.
GD would like to thanks Steven Murray for insightful discussions on Bayesian inference and the Curtin PyClub for useful programming tips. We also thank Matt Lehnert, Carlos De Breuck, Joel Vernet and Nick Seymour for advices and help in the conception of this code. GD acknowledges the ESO Scientific Visitor Programme in supporting the development of this code.
- Asensio Ramos & Ramos Almeida (2009) Asensio Ramos A., Ramos Almeida C., 2009, ApJ, 696, 2075
- Astropy Collaboration et al. (2013) Astropy Collaboration et al., 2013, A&A, 558, A33
- Bowley (1920) Bowley A. L. (Arthur Lyon) S., 1920, An elementary manual of statistics, 3rd ed. edn. P. S. King & Son, http://ci.nii.ac.jp/ncid/BB1069398X
- Bruzual & Charlot (2003) Bruzual G., Charlot S., 2003, MNRAS, 344, 1000
- Calistro Rivera et al. (2016) Calistro Rivera G., Lusso E., Hennawi J. F., Hogg D. W., 2016, ApJ, 833, 98
- Conroy (2013) Conroy C., 2013, ARA&A, 51, 393
- Drouart & et al. (prep) Drouart G., et al. in prep, MNRAS
- Falkendal et al. (2018) Falkendal T., et al., sub., 2018, A&A
- Fanaroff & Riley (1974) Fanaroff B. L., Riley J. M., 1974, MNRAS, 167, 31P
- Fioc & Rocca-Volmerange (1997) Fioc M., Rocca-Volmerange B., 1997, A&A, 326, 950
- Fiorucci & Munari (2003) Fiorucci M., Munari U., 2003, A&A, 401, 781
- Foreman-Mackey et al. (2013) Foreman-Mackey D., Hogg D. W., Lang D., Goodman J., 2013, PASP, 125, 306
- Fritz et al. (2006) Fritz J., Franceschini A., Hatziminaoglou E., 2006, MNRAS, 366, 767
- Gilli et al. (2014) Gilli R., et al., 2014, A&A, 562, A67
- Goodman & Weare (2010) Goodman J., Weare J., 2010, Communications in Applied Mathematics and Computational Science, Vol. 5, No. 1, p. 65-80, 2010, 5, 65
- Groves et al. (2004) Groves B. A., Dopita M. A., Sutherland R. S., 2004, ApJS, 153, 9
- Gullberg et al. (2016) Gullberg B., et al., 2016, A&A, 586, A124
- Harwood et al. (2013) Harwood J. J., Hardcastle M. J., Croston J. H., Goodger J. L., 2013, MNRAS, 435, 3353
- Hönig et al. (2006) Hönig S. F., Beckert T., Ohnaka K., Weigelt G., 2006, A&A, 452, 459
- Leitherer et al. (1999) Leitherer C., et al., 1999, ApJS, 123, 3
- MacKenzie et al. (2016) MacKenzie T. P., Scott D., Swinbank M., 2016, MNRAS, 463, 10
- Maraston (1998) Maraston C., 1998, MNRAS, 300, 872
- Nenkova et al. (2008) Nenkova M., Sirocky M. M., Ivezić Ž., Elitzur M., 2008, ApJ, 685, 147
- Noll et al. (2009) Noll S., Burgarella D., Giovannoli E., Buat V., Marcillac D., Muñoz-Mateos J. C., 2009, A&A, 507, 1793
- Pier & Krolik (1992) Pier E. A., Krolik J. H., 1992, ApJ, 401, 99
- Pilbratt et al. (2010) Pilbratt G. L., et al., 2010, A&A, 518, L1
- Sawicki (2012) Sawicki M., 2012, PASP, 124, 1208
- Sharma (2017) Sharma S., 2017, preprint, (arXiv:1706.01629)
- Siebenmorgen et al. (2015) Siebenmorgen R., Heymann F., Efstathiou A., 2015, A&A, 583, A120
- Siringo et al. (2009) Siringo G., et al., 2009, A&A, 497, 945
- Stalevski et al. (2012) Stalevski M., Fritz J., Baes M., Nakos T., Popović L. Č., 2012, MNRAS, 420, 2756
- Tinsley (1968) Tinsley B. M., 1968, ApJ, 151, 547
- Wells et al. (1981) Wells D. C., Greisen E. W., Harten R. H., 1981, A&AS, 44, 363
- Werner et al. (2004) Werner M. W., et al., 2004, ApJS, 154, 1
- da Cunha et al. (2008) da Cunha E., Charlot S., Elbaz D., 2008, MNRAS, 388, 1595
List of packages used in Mr-Moose and references
The list of packages, their version and their respective reference.
corner v2.0.1, https://pypi.python.org/pypi/corner
guppy v0.1.10, https://pypi.python.org/pypi/guppy/
matplotlib v1.5.1, http://matplotlib.org/
numpy v1.9.1, http://www.numpy.org/
pathos v0.2.0, https://pypi.python.org/pypi/pathos/0.2.0
pycallgraph v1.0.1, http://pycallgraph.slowchop.com/en/master/#,
PyYAML v3.12, http://pyyaml.org/
tqdm v4.8.4, https://github.com/tqdm/tqdm
scipy v0.14.0, https://www.scipy.org/