# A Comparison of Approaches in Fitting Continuum SEDs

###### Abstract

We present a detailed comparison of two approaches, the use of a pre-calculated database and simulated annealing (SA), for fitting the continuum spectral energy distribution (SED) of astrophysical objects whose appearance is dominated by surrounding dust. While pre-calculated databases are commonly used to model SED data, only few studies to date employed SA due to its unclear accuracy and convergence time for this specific problem. From a methodological point of view, different approaches lead to different fitting quality, demand on computational resources and calculation time. We compare the fitting quality and computational costs of these two approaches for the task of SED fitting to provide a guide to the practitioner to find a compromise between desired accuracy and available resources. To reduce uncertainties inherent to real datasets, we introduce a reference model resembling a typical circumstellar system with 10 free parameters. We derive the SED of the reference model with our code MC3D at 78 logarithmically distributed wavelengths in the range and use this setup to simulate SEDs for the database and SA. Our result shows directly the applicability of SA in the field of SED modeling, since the algorithm regularly finds better solutions to the optimization problem than a pre-calculated database. As both methods have advantages and shortcomings, a hybrid approach is preferable. While the database provides an approximate fit and overall probability distributions for all parameters deduced using Bayesian analysis, SA can be used to improve upon the results returned by the model grid.

2012 Vol. X No. XX, 000–000

## 1 Introduction

The continuum spectral energy distribution (SED) is an important observable of astrophysical sources embedded in a dusty environment such as young stellar objects (YSO), active galactic nuclei (AGN), and post-AGB stars. It allows one to probe the mass, composition, temperature, and spatial distribution of the dust. The common method of analysis is the comparison of available observations with predictions derived from solving the radiative transfer self-consistently with a model describing the dust properties and its spatial distribution. The task is to find a parameter set that reproduces the observations best for a given model. This minimization of the discrepancy between observation and prediction is an optimization problem and normally called a fitting procedure.

Various fitting algorithms have been proposed and implemented before (e.g., Press et al. 1992). From a methodological point of view, the fitting approaches differ in the quality of the resulting fit and demand on computational resources. The most common method is based on a pre-calculated model database that is established on a huge grid in a high-dimensional parameter space (e.g., Robitaille et al. 2006; Woitke et al. 2010). Once the database is established, the optimum parameter set is readily identified by evaluating the merit function, i.e. for our purpose the -distribution. However, as the number of grid points increases substantially with the dimensionality of the parameter space, the model grid is always a compromise between finite computational resources and resolution. Simulated Annealing (SA) is a versatile optimization technique based on the Metropolis-Hastings algorithm that can be used to search for the optimum of a merit function in arbitrary dimensions (e.g., Kirkpatrick et al. 1983; Madlener et al. 2012). The main idea is to construct a random walk through parameter space thereby improving the agreement between observation and prediction gradually by following the local topology of the merit function. A drawback of this method is that no upper bound for the step count to reach the global optimum can be given. Moreover, the sequential execution of the algorithm and possible slow convergence of the Markov chain can make SA time consuming.

In the context of SED modeling, a pre-calculated database is quite commonly invoked to perform the task because it can not only provide an approximate fit but also enables to evaluate the overall uncertainty of all parameters by Bayesian analysis (Lay et al. 1997). On the contrary, only a small sample of studies to date make use of SA in this area due to its unclear accuracy and typical convergence time. A main limitation is the significant computing time per individual SED model, especially when simulating the SEDs in an optically thick system. With the recent advance in computing performance, the SA approach is now applicable. The motivation behind this study is to evaluate advantages and shortcomings of these two methods when applied to an astrophysical optimization problem. We set up an unbiased benchmark by deriving synthetic observations from a known reference model to exclude any influence due to model uncertainties on the optimization process. We will present fitting quality and computational cost for this idealized optimization task using both approaches to provide a guide for practitioners to find a compromise between desired accuracy and available resources.

## 2 Reference Model and Modeling

Circumstellar disks surrounding YSOs, are considered as an essential step of the star-forming process. A lot of attention have been paid to these interesting objects, since they are most probably the birthplace of planet systems (e.g., Mundy et al. 2000; Meyer et al. 2007). The planet formation mechanism in these disks and the properties of the resulting planetary system depend on the structure of the protoplanetary disks that can be constrained by SED modeling. In this section we introduce a reference model (RM) to mimic a virtual object located in the Taurus star formation region at a distance of and describe our simulation technique. By using a virtual object, all uncertainties in regard to the model itself are eliminated.

### 2.1 Disk Structure

We employ a parametrized flared disk in which dust and gas are well mixed and homogeneous throughout the system.
This model has been successfully used to explain multi-wavelength observations of protoplanetary disks like the
SED and high resolution images (e.g., Wolf et al. 2003; Schegerer et al. 2008; Sauter et al. 2009).
For the dust in the circumstellar disk we assume a density structure with a Gaussian vertical profile:

(1) |

and a power-law distribution for the surface density

(2) |

where is the distance from the central star measured in the disk midplane. The proportionality factor is determined by normalising the total dust mass in the disk. The disk scale height follows the power law

(3) |

with the flaring exponent describing the extent of flaring and the scale height at a distance of from the central star.

Table 1 lists the parameters of the RM. We truncate the disk at , a typical size found for T Tauri disks and fix the value of to (e.g., Andrews & Williams 2007b). We consider a total dust mass of , corresponding to the typical value found in T Tauri disks (e.g., Beckwith et al. 1990; Andrews & Williams 2007a; Andrews et al. 2009). We make a standard assumption for the dust-to-gas mass ratio, i.e., . For the flaring exponent, we adopt the value of 1.25 (e.g., D’Alessio et al. 1999). The exponent of the dust density profile is derived from viscous accretion theory (Shakura & Sunyaev 1973).

parameter | RM | best-fit model |
---|---|---|

4000 | 4262 | |

0.92 | 2.7 | |

[AU] | 2.0 | 2.9 |

[AU] | 300 | 450 |

2.25 | 1.61 | |

1.25 | 1.033 | |

10 | 10.5 | |

2.5 | 25 | |

60 | 75 |

### 2.2 Stellar Heating

There are several heating sources of the circumstellar disks, such as irradiation by the central star, disk accretion and turbulent processes within the disks. To keep our model simple and decrease the number of free parameters, we consider a passive disk in which only stellar irradiation is taken into account (e.g., Chiang & Goldreich 1997). We assume parameters of a typical T Tauri star for the central source: and , corresponding to a bolometric luminosity of (e.g., Gullbring et al. 1998).

### 2.3 Dust Properties

We consider the dust grains to be homogeneous spheres, since the assumption of spherical grain is a valid approximation to describe the scattering behavior compared with a more complex and fractal grain structure. The dust grain ensemble incorporates both silicate and graphite material with relative abundances of 62.5% astronomical silicate and 37.5% graphite. To calculate the optical properties of the dust with the Mie scattering-theory, we use the complex refractive indices of “smoothed astronomical silicate” and graphite published by Weingartner & Draine (2001). For graphite, we adopt the common “” approximation (Draine & Malhotra 1993), which means the extinction efficiency factor is computed by:

(4) |

where and are the components of the graphite dielectric tensor for the electric field parallel and orthogonal to the crystallographic axis, respectively.

We assume a power law grain size distribution with , where represents the grain radius and and are the minimum and the maximum grain radii. The size distribution with and is the well-known MRN distribution found for the ISM (Mathis et al. 1977). For the RM, we keep and increase the maximum grain size to to account for dust growth in circumstellar disks (e.g., Sauter et al. 2009; Ricci et al. 2010).

### 2.4 Radiative Transfer Simulation Code

To derive the observables of the RM, we use the well-tested radiative transfer code MC3D developed by Wolf (2003). Based on the Monte-Carlo method, MC3D solves the radiative transfer problem self-consistently. It implements the immediate temperature correction technique as described by Bjorkman & Wood (2001) and the continuous absorption concept as introduced by Lucy (1999). Multiple and anisotropic scattering is considered in the simulations.

### 2.5 The SED of the RM

The radiative transfer problem is solved at 100 wavelengths, logarithmically distributed in the wavelength range . The red squares in Fig. 1 show the simulated SED of the RM at 78 wavelengths in the range . Since data points within this range can be obtained by current telescopes, we therefore only reproduce fluxes at these wavelengths in the fitting procedure.

## 3 Sed-Fitting With a Pre-Calculated Database

A popular approach to analyze SEDs is to solve the radiation transfer equation based on dust properties and a density distribution. Given a particular model and radiative transfer code, a database of model SEDs can be established on a part of the model’s parameter space (Woitke et al. 2010).

The main advantage of this approach is that it allows to explore how specific data points influence parameters of the fit. As an example, given a particular dust model one (sub)millimeter data point can constrain the total dust mass. An additional data point can help to constrain the maximum dust radius in protoplanetary disks. Moreover, observations of a large sample of objects can be fitted very fast to a pre-calculated database.

In order to examine the fitting accuracy of a pre-calculated database, we adapted the MC3D code to the GPU cluster at Purple Mountain Observatory and built a database of 70,000 model SEDs using the same dust density profile and grain composition as described in Section 2. We used a similar strategy as implemented by Robitaille et al. (2006) to sample the parameter values in wide space to avoid giving any attention to a particular model, especially the RM. Generally, the model SEDs span a reasonable range of parameters constrained by theory and observations of protoplanetary disks. The effective temperature and radius of the central star were derived by interpolating pre-main-sequence evolutionary tracks given by Siess et al. (2000) with the mass and age randomly sampled and logarithmically spaced in the ranges of and respectively. The dust mass was sampled from a wide range of , but values between and were randomly selected more frequently. The model grid contains a majority of samples with disk outer radius between and . Other reasonable values consistent with observations are also considered. For one half of our models, we set the disk inner radius to the dust sublimation radius , where we take to be the dust sublimation temperature (Whitney et al. 2004). The flaring exponent for these models is set to a smaller value than that found for T Tauri disks, i.e., . This was done in order to interpret the observations of homologously depleted disks in which material dissipation is thought to occur throughout the disk simultaneously (e.g., Currie & Kenyon 2009). In the remaining half of models, the inner radius of the disk was increased to account for canonical transition disks that are expected to have large inner holes due to the mechanism of inside-out disk dissipation (e.g., Muzerolle et al. 2010). The flaring exponent for these models are uniformly sampled between 1.10 and 1.30. The exponent of the density distribution is calculated using the relationship derived for a steady-state disk in which the temperature and surface density profile are coupled by , where is the exponent of the temperature profile . The models with scale height parameter between and occupy a large portion, and other reasonable values outside this range are also contained in our database. The entire model grid was established in approximately 30 days using 32 CPUs on the computer cluster.

The RM and the grid in parameter space of the database are set up independently. We fit the SED of the RM to this database and examine whether the database returns a model with parameter values close to the reference. Additionally, the distance is fixed at and no extinction is used to avoid any uncertainties from the employed extinction law. In short, we are only interested in the fitting quality of this approach. Fig. 1 shows the top two SED models in the database and the flux discrepancies between these models and the RM as well. The blue solid line represents the best-fit with a minimum calculated from

(5) |

Here, is the number of data points, is the simulated flux, is the “observed flux”. The photometric uncertainty, , was assumed to be comparable to typical observation errors of real instruments working in different wavelength regimes, and of the flux at shorter and longer wavelengths respectively. This assumption has little impact on our study as along as the observation errors are not changed. Obviously, the best-fit SED is in good agreement with the SED of the RM. The corresponding parameter values of the best model are listed in Table 1. It is particularly noteworthy that the fitting results constrain the inner radius and dust mass quite well. The central star of the best-fit model is much brighter than the RM. However, the fluxes at short wavelengths are close to the RM. This can be explained by higher disk inclinations (e.g., the best fit ) for which the outer disk occults the central star and even the inner disk regions, leading to a significant extinction for short-wavelength emission (Chiang & Goldreich 1999). The slight flux deficit in the far-infrared domain () can be explained by a smaller flaring exponent () of the best-fit model as compared to the RM. Disks with larger flaring exponent intercept a larger portion of the central star’s radiation due to a larger flaring angle and consequently re-emit more infrared flux. The maximum dust grain radius is one order of magnitude larger than the reference value. This shallows the spectral index of the best-fit model at (sub)millimeter wavelengths and weakens the silicate band. However, the database contains only four values for , namely , , , and , hence the difference in between the best-fit model and the RM is only one grid interval.

In order to estimate the confidence range for each parameter, we performed a Bayesian analysis using the reduced defined by

(6) |

where and are the number of data points and degrees of freedom respectively (e.g., Pinte et al. 2007, 2008; Bouy et al. 2008). The relative probability of each parameter set is used as a statistical weight to calculate the relative likelihood for every parameter by summing over all models with a common value and normalising with the total sum. Non-uniform distributions of models in parameter space due to different sampling density are considered as the priori probabilities and have to be taken into account. Quantitative error bars for every parameter can then be deduced from the resulting probability distributions.

Fig. 2 presents the probability distribution for some selected disk parameters. The results show that SED analysis places constraints on the parameters differently, e.g. only high inclinations (edge-on) can be excluded, while all other configurations appear to be equally likely. The scale height and the flaring exponent , characterising the vertical geometry of the disk and hence its capacity to absorb stellar energy, are constrained to some extent. Our Bayesian analysis clearly supports the notion that SEDs mainly contain information on the inner radius and dust mass of circumstellar disks as we get the best constraints on these two parameters. Moreover, we noticed that the best-fit model implies a parameter value (e.g., ) offset from the peak of its probability distribution. This is an indicator of model degeneracy often encountered in pure SED fitting.

## 4 Optimization with Simulated Annealing

Simulated annealing (SA) is a variation of the Metropolis-Hastings algorithm that applies methods of statistical physics to optimization problems like the traveling salesman problem and has been used for many optimization challenges (e.g., Metropolis et al. 1953; Hastings 1970; Kirkpatrick et al. 1983). The physical analogy of quenching a hot melt has given this extremely useful method its name as the main feature of SA is the gradual reduction in temperature during optimization. Basically, the distribution under scrutiny can be interpreted as the inner energy of a system in a thermal bath. By starting at a high temperature and subsequently cooling the bath while allowing the system to evolve through phase space the system trajectory intrinsically leads to regions of lower inner energy and thus to the global optimum of the underlying problem. One of the major advantages of this Monte Carlo method is the independence from the dimensionality. Local extrema are overcome intrinsically, no gradients have to be evaluated and the parameter space can be discrete or continuous. We adapted this technique to search the optimal disk model for the synthetic SED derived from the RM.

parameter | min | max | |
---|---|---|---|

[K] | 2500 | 7000 | |

[] | 0.2 | 8.0 | |

[AU] | 0.1 | 20.0 | |

[AU] | 50 | 1000 | |

0.005 | 1.5 | 3.0 | |

0.005 | 1.0 | 1.3 | |

[AU] | 0.25 | 5.0 | 20.0 |

[] | |||

[] | 0.25 | 1000 | |

[] | 2.0 | 15 | 85 |

0.6, , , , and are the starting points.

parameter | I | II | III | IV | V | VI | VII | VIII |
---|---|---|---|---|---|---|---|---|

[K] | 4372 | 4040 | 6540 | 2984 | 3277 | 3781 | 3052 | 2806 |

[] | 0.93 | 0.71 | 5.48 | 0.29 | 0.38 | 0.57 | 0.32 | 0.25 |

[AU] | 0.906 | 0.613 | 11.64 | 0.17 | 0.24 | 0.45 | 0.19 | 0.14 |

[AU] | 173.8 | 139.4 | 736.5 | 69.0 | 83.9 | 117.3 | 72.2 | 61.3 |

2.124 | 2.013 | 2.846 | 1.661 | 1.759 | 1.927 | 1.684 | 1.602 | |

1.124 | 1.102 | 1.269 | 1.032 | 1.051 | 1.085 | 1.036 | 1.021 | |

[AU] | 11.24 | 10.13 | 18.46 | 6.61 | 7.59 | 9.27 | 6.84 | 6.02 |

[] | 0.32 | 0.88 | 0.00041 | 22.0 | 9.2 | 1.9 | 18 | 39 |

[] | 7.8 | 4.3 | 428.8 | 0.6 | 1.0 | 2.6 | 0.69 | 0.43 |

[] | 44 | 38 | 77 | 22 | 27 | 35 | 24 | 19 |

### 4.1 Algorithm

SA creates a Markov chain of points in parameter space by starting from an arbitrarily chosen set of parameters within the constrained range, see Table 2. The parameter intervals considered here are consistent with those for establishing the pre-calculated database. The Markov chain progresses from the starting point by generating new parameter sets depending on the current position until ideally the global optimum is reached. The parameter values of each step are recorded and can be used for subsequent analysis. We list 8 starting models in Table 3 with parameter values that deviate from the RM in varying degrees. After sampling a uniformly distributed number in the range [0, 1] using the randomu function in IDL programming language, we scale it to the corresponding parameter range to obtain the starting point in every dimension. The argument seed of the randomu function used to initialize the random sequence is chosen to be identical to the chain number. Moreover, the random sampling procedure is implemented in logarithmical space for parameters with large dynamical range, such as the inner radius, dust mass and maximum grain size. We define the starting points in this way to ensure independence from the RM and reproducible results. The applicability of SA to the optimization of SEDs of circumstellar disks can be validated by comparing the disk models generated by the algorithm with the RM. Moreover, we are interested in the number of Markov chains and steps per chain needed to find a good fit. The displacement in parameter space is sampled from a gaussian distribution using varying step sizes :

(7) |

where the index enumerates the parameter dimensions. To ensure randomness of the sampling processes in our study, 4 different seeds were used to initialize the pseudo-random number sequence for each of the 8 starting points, resulting in a total number of different Markov chains.

After sampling a new step , we accept or reject it by evaluating the acceptance probability using the difference in between the last accepted step and the new position:

(8) |

We immediately accept a new step if the at the new position is lower than at the actual position. If we only accept choices with , the Markov chain converges to the next local minimum that is not necessarily identical with the global optimum. Instead, a uniformly distributed number is sampled and compared with . The new proposed position is accepted if and the chain has the chance to escape a local minimum depending on the actual temperature. The crucial role of the temperature is discussed in the next section.

### 4.2 Annealing Schedule

The random walk through parameter space not only moves to better models with smaller but also to worse models with a probability . The system temperature is hence the major control parameter of the random walk and is gradually reduced to steer the Markov chain to the vicinity of the global optimum. It can be shown in the case of logarithmic cooling that the Markov chain will reach the global optimum asymptotically (Geman & Geman 1984). Unfortunately, no upper bound for the time to reach the optimum with this strategy can be given and a variety of cooling schedules have been proposed to achieve faster convergence (Nourani & Andresen 1998). Monotonic cooling, like in exponential and linear schedules, is used in most applications of SA. The chain starts at a high temperature, runs through a melting phase to reduce any bias from the chosen starting point, and is then successively cooled to reduce the probability of moving to worse parameter sets until it freezes at a location in parameter space.

However, monotonic cooling schedules always contain several free parameters that must be adjusted empirically to control the duration of the optimization . For this study we implemented a non-monotonic schedule by setting the chain temperature after each accepted step to

(9) |

Here, is a fixed parameter controling the probability of taking an uphill step (e.g., Locatelli 2000). Without any user interaction during the optimization procedure, our approach enables a chain to escape a local minimum.

### 4.3 Adaptive Step Sizes:

The step size controls the displacement between the two adjacent positions in parameter space. Optimal step sizes cannot be defined at the beginning due to the lack of any information on the distribution under investigation. Fixed step width is not an appropriate solution because this approach can waste computation time if the chosen value are too large compared to the local shape of the merit function, leading to a low efficiency of the algorithm. If the increments are too small, the Markov chain converges slowly and the escape from a local valley becomes unlikely as too many steps are needed. The step sizes for each of the parameters hence have to be adapted along the run. For this purpose, the reasonable initial values that are dependent on the starting points of the Markov chain are listed in Table 2.

We define a local acceptance ratio of the chain using the sequence of rejected and accepted steps by calculating

(10) |

for the constant lookback length , where refers to the step count. In order to maintain the optimal acceptance ratio in the vicinity of (Roberts et al. 1997), we implement a control loop that adjusts the step widths by analysing the last positions of the chain. The algorithm calculates the moduli of relative changes between the proposed parameter and the previously accepted parameter :

(11) |

By separately summing up the relative changes for every parameter in accepted and rejected proposals, two vectors and can be derived to describe the impact of good and bad decisions:

(12) |

if , the step size of the parameter with the smallest component in is increased to encourage riskier proposals. On the contrary, if the step size of the parameter with the largest component in is decreased to induce more conservative proposals. In either case the step size of the selected parameter is adjusted by multiplying with the factor of . Moreover, for parameters spanning several orders of magnitude, like for and , we adapt their step widths using a logarithmical scale.

Additionally, the step sizes are controlled to be and of the current parameter values in order to avoid large fluctuations and extremely slow convergence of the Markov chain respectively.

### 4.4 Chain Abortion

Since no upper bound for the step count to reach the global optimum in SA can be given, we must define a reasonable criterion to stop an optimization run. When the system temperature drops, the Markov chain can get trapped at a location in parameter space if surrounded by sufficiently steep gradients as any deviation from the local minimum will yield large , see Eq. 8.

As the system temperature is coupled to the of the current position, see Eq. 9. Hence, the probability to escape a local minimum does not converge monotonically to zero as the probability to climb uphill remains constant. Hence, it is not straightforward to decide when to abort the chain using a temperature threshold as normally implemented (e.g., Nourani & Andresen 1998). Instead, we analyze the quality of the fit step by step for every accepted model of an individual Markov chain. The variation of the fitting quality is evaluated by calculating the modulus of the difference between two adjacent accepted steps. A particular Markov chain is aborted if the box average remained below a variation threshold for a certain number of steps in a row. We used and steps and averaged with a box of 10 samples, see Fig. 3 (a) for examples of chain and . The value for the threshold is derived from , as and correspond to the number of data points we fit and the degrees of freedom of our model respectively. Here, we emphasize that our criterion of chain abortion, especially the defined threshold , is not universal. One should choose an appropriate solution depending on the -distribution of the problem at hand and the required accuracy.

chain | a | b | c | d |
---|---|---|---|---|

I | 45 | 72 | 56 | 89 |

II | 82 | 39 | 12 | 68 |

III | 504 | 236 | 293 | 693 |

IV | 39 | 127 | 375 | 11 |

V | 34 | 262 | 165 | 31 |

VI | 85 | 52 | 127 | 64 |

VII | 106 | 31 | 527 | 22 |

VIII | 351 | 95 | 71 | 166 |

(1) from SED-Fitting on the basis of database is 179.5. (2) The subscript letters from a to d represent different seeds used to initialize the pseudo-random number sequence.

### 4.5 Fitting Results

Thirty-two CPUs were used to perform the optimization and each Markov chain was propagated on a single CPU. The typical number of steps for a chain before abortion is . Because the calculation time per iteration highly depends on the optical depth of the proposed model, the total run-time of an individual chain can vary significantly. On average, it took about 25 days to complete one chain, comparable to the time to establish the database. The , and some model parameters of the Markov chains / are plotted in Fig. 3 versus the accepted step count. It is clear from this plot that the merit function decreases quickly during the optimization run and then remains in the vicinity of . Both and gradually converge to the value of the RM. This behavior is the base for the criterion described in Sect. 4.4 for chain abortion. Parameters like that are difficult to be constrained by the observations, does not converge in an obvious pattern but instead cover an extended interval.

The minimum for the best fit of each Markov chain are listed in Table 4. By comparing these values to the minimum of obtained from fitting the SED to the database we can evaluate the quality of fit. Most Markov chains found parameter sets with significantly lower as compared to the result from the database, demonstrating the applicability of SA in the field of circumstellar disk SED modeling.

More quantitatively, of the Markov chains found “improved” models compared to the best fit in our pre-computed database. Here, we use quotation marks to caution the reader that better not necessarily translates into disk parameters closer to the RM as the model is degenerate.

parameter | best-fit |
---|---|

[K] | |

[] | |

[AU] | |

[AU] | |

[AU] | |

[] | |

[] | |

[] |

### 4.6 Estimation of Local Error Intervals with SA

Estimation of the uncertainties for the best-fit parameters using SA are quite different from the previously described Bayesian analysis. First, a confidence interval has to be defined from already calculated models. The deduction of this bound relies on the practitioners experience as no analytic solution can be given. Secondly, the vicinity of the best-fit in parameter space is probed by starting one or several Markov chains from this location. After collecting sufficient parameter sets below the confidence threshold, this normally asymmetric domain can be characterized by taking the minimum and maximum value of each degree of freedom.

We started a new Markov chain from the best-fit model of chain to sample the -distribution. The low starting temperature of (see Eq. 9) retained the random walk to the vicinity for sufficient steps to examine the local optimum. We gathered samples with this new chain below the confidence threshold defined by

(13) |

where is the new overall minimum found during the restart run (Robitaille et al. 2007). Again, and denotes the number of data points and dimensionality of the model. Table 5 summarizes the best-fit model of chain , the error bounds are deduced from the minimal and maximal components of all sampled parameter sets.

We like to emphasize that the error bounds deduced with this method have to be interpreted as a local confidence interval as it is only based on a Markov chain probing the direct environment of the best fit while the Bayesian analysis is more appropriate for a global error estimation by taking the whole database into account.

## 5 Discussion

There are generally two explanations for discrepancies between SEDs derived from the best-fit model and the RM. First, the SED is a spatially integrated observable and high-dimensional disk models are often degenerate. Hence, fitting of such models can only provide weak constraints on the disk structure (e.g., Robitaille et al. 2007). Experience from previous modeling campaigns unambigously indicates that additional observations, especially multi-wavelength, high resolution images, are crucial to overcome model degeneracy (e.g., Pinte et al. 2008; Sauter et al. 2009). Secondly, it is impractical to search on a sufficiently fine grid in parameter space as realistic models usually contain too many dimensions. The SED fitting problem is therefore an excellent showcase to assess optimization methods for astrophysical modeling. We will discuss in the following paragraphs the highlights of our study on fitting with a pre-calculated database and simulated annealing.

Database vs. SA: We implemented two different approaches to fit SED data of the RM, the use of a pre-calculated model database and SA, and compared these two methods extensively.

After construction of the database, a single set of observations can be fitted very fast by calculating the of every model in the database. The result may be only an approximation but the database enables us to deduce global error intervals for all parameters with the Bayesian inference method. It evaluates the probability for each parameter by weighing all models in the database. The fitting accuracy highly depends on the grid resolution of the models. Unfortunately, the number of grid points increases exponentially with the dimensionality of the parameter space. It is hence necessary to make a compromise between grid resolution and available computational resources.

Our results show that in most cases SA finds parameter sets with a approximately half the size of the solution from the pre-calculated database, indicating a higher fitting quality on average. However, some Markov chains are unable to reproduce the SED quite as well during the allocated timeframe as their siblings. This is caused by the stochastic nature of Monte Carlo methods and should be taken into account by using sufficient Markov chains in parallel to perform the optimization. The increasing computational speed of multicore CPUs ameliorates this embarrassing/parallel problem but the user will have to find a compromise to retain the calculation time in a tolerable range too. SA is a sequential process and the Markov chain may converge slowly. The fitting of SED data for a sample of objects can therefore be a very time consuming task, even for a medium computer cluster with hundreds of CPUs.

Hence, using SA and a pre-calculated model database have their individual merits for solving astrophysical fitting problems. SA is often better suited for individual cases whereas a database is preferable in a sample study, and the practitioner has to choose by weighing the computational constraints against the required accuracy. One can of course use a hybrid approach by taking the best-fit model from the database as starting point for SA. Moreover, the most probable value for each parameter indicated by the peak of Bayesian probability distribution is also a good starting choice for SA. An illustration of this idea is given in Fig. 4 for fitting SEDs of two “transition disks” in the Ophiuchus molecular cloud identified by Cieza et al. (2010), in their nomenclature TRAN 14 and TRAN 15. The quotation mark here is used to remind that the selection criterion for transition disks used by Cieza et al. (2010) is broad. Six different Markov chains starting from the best-fit, the most probable parameter set in our database or their surroundings were used to optimize the fit. After we performed accepted steps for each Markov chain, the quality of the fit for both objects were highly improved.

Starting point for SA: The influence of the starting point on the probability to encounter the vicinity of the global optimum is obvious. In our study, the starting location III features the largest distance in parameter space to the RM, see Table 3. Consequently, more steps are required to reach the RM. The corresponding chains therefore has more difficulty in finding better solutions than the best fit from the database compared with other optimization runs. As some parameters of astrophysical models can be constrained by observations, we suggest to choose the starting values accordingly.

## 6 Summary

We have compared in detail the fitting of continuum SED using a pre-calculated database and SA, a Markov chain Monte Carlo method that has been successfully employed to solve many optimization problems outside the field of astrophysics. All the simulations are performed with the radiative transfer code MC3D. Our study shows the practical application of SA in the context of circumstellar disk SED modeling and compares fitting quality and efficiency of both approaches. Because the same radiative transfer code, disk structure and dust composition are used to calculate the database and the Markov chain, the comparison of fitting quality reduces to a comparison of resulting . We show that SA typically finds better solutions than the database, directly demonstrating the applicability of this algorithm to the modeling of SED data. However, a database can quickly evaluate the overall uncertainty of all parameters and provide a good starting point for SA. A hybrid approach to combine both methods leads to more accurate solutions.

###### Acknowledgements.

Y.L. acknowledges support by the German Academic Exchange Service. H.W. acknowledges the support by NSFC grants 10733030, 10921063, and 11173060. We acknowledge the anonymous referee for the comments and suggestions that helped to improve the paper.## References

- Andrews & Williams (2007a) Andrews, S. M., & Williams, J. P. 2007a, \apj, 671, 1800
- Andrews & Williams (2007b) Andrews, S. M., & Williams, J. P. 2007b, \apj, 659, 705
- Andrews et al. (2009) Andrews, S. M., Wilner, D. J., Hughes, A. M., Qi, C., & Dullemond, C. P. 2009, \apj, 700, 1502
- Beckwith et al. (1990) Beckwith, S. V. W., Sargent, A. I., Chini, R. S., & Guesten, R. 1990, \aj, 99, 924
- Bjorkman & Wood (2001) Bjorkman, J. E., & Wood, K. 2001, \apj, 554, 615
- Bouy et al. (2008) Bouy, H., Huélamo, N., Pinte, C., et al. 2008, \aap, 486, 877
- Chiang & Goldreich (1997) Chiang, E. I., & Goldreich, P. 1997, \apj, 490, 368
- Chiang & Goldreich (1999) Chiang, E. I., & Goldreich, P. 1999, \apj, 519, 279
- Cieza et al. (2010) Cieza, L. A., Schreiber, M. R., Romero, G. A., et al. 2010, \apj, 712, 925
- Currie & Kenyon (2009) Currie, T., & Kenyon, S. J. 2009, \aj, 138, 703
- D’Alessio et al. (1999) D’Alessio, P., Calvet, N., Hartmann, L., Lizano, S., & Cantó, J. 1999, \apj, 527, 893
- Draine & Malhotra (1993) Draine, B. T., & Malhotra, S. 1993, \apj, 414, 632
- Geman & Geman (1984) Geman, S., & Geman, D. 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721
- Gullbring et al. (1998) Gullbring, E., Hartmann, L., Briceno, C., & Calvet, N. 1998, \apj, 492, 323
- Hastings (1970) Hastings, W. H. 1970, Biometrika, 57, 97
- Kirkpatrick et al. (1983) Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. 1983, Science, 220, 671
- Lay et al. (1997) Lay, O. P., Carlstrom, J. E., & Hills, R. E. 1997, \apj, 489, 917
- Locatelli (2000) Locatelli, M. 2000, Journal of optimization theory and applications, 104, 121
- Lucy (1999) Lucy, L. B. 1999, \aap, 344, 282
- Madlener et al. (2012) Madlener, D., Wolf, S., Dutrey, A., & Guilloteau, S. 2012, \aap, 543, A81
- Mathis et al. (1977) Mathis, J. S., Rumpl, W., & Nordsieck, K. H. 1977, \apj, 217, 425
- Metropolis et al. (1953) Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. 1953, Journal of Chemical Physics, 21, 1087
- Meyer et al. (2007) Meyer, M. R., Backman, D. E., Weinberger, A. J., & Wyatt, M. C. 2007, Protostars and Planets V, 573–588
- Mundy et al. (2000) Mundy, L. G., Looney, L. W., & Welch, W. J. 2000, Protostars and Planets IV, 355–+
- Muzerolle et al. (2010) Muzerolle, J., Allen, L. E., Megeath, S. T., Hernández, J., & Gutermuth, R. A. 2010, \apj, 708, 1107
- Nourani & Andresen (1998) Nourani, Y., & Andresen, B. 1998, J. Phys. A-Math. Gen, 31, 8373
- Pinte et al. (2007) Pinte, C., Fouchet, L., Ménard, F., Gonzalez, J.-F., & Duchêne, G. 2007, \aap, 469, 963
- Pinte et al. (2008) Pinte, C., Padgett, D. L., Ménard, F., et al. 2008, \aap, 489, 633
- Press et al. (1992) Press, W. H., Teukolsky, S. A., & Vetterling, W. T. 1992, Numerical recipes. The art of science computing (Cambridge University Press), 2nd edn.
- Ricci et al. (2010) Ricci, L., Testi, L., Natta, A., & Brooks, K. J. 2010, \aap, 521, A66+
- Roberts et al. (1997) Roberts, G. O., Gelman, A., & Gilks, W. R. 1997, Annals of Applied Probability, 7, 110
- Robitaille et al. (2007) Robitaille, T. P., Whitney, B. A., Indebetouw, R., & Wood, K. 2007, \apjs, 169, 328
- Robitaille et al. (2006) Robitaille, T. P., Whitney, B. A., Indebetouw, R., Wood, K., & Denzmore, P. 2006, \apjs, 167, 256
- Sauter et al. (2009) Sauter, J., Wolf, S., Launhardt, R., et al. 2009, \aap, 505, 1167
- Schegerer et al. (2008) Schegerer, A. A., Wolf, S., Ratzka, T., & Leinert, C. 2008, \aap, 478, 779
- Shakura & Sunyaev (1973) Shakura, N. I., & Sunyaev, R. A. 1973, \aap, 24, 337
- Siess et al. (2000) Siess, L., Dufour, E., & Forestini, M. 2000, \aap, 358, 593
- Weingartner & Draine (2001) Weingartner, J. C., & Draine, B. T. 2001, \apj, 548, 296
- Whitney et al. (2004) Whitney, B. A., Indebetouw, R., Bjorkman, J. E., & Wood, K. 2004, \apj, 617, 1177
- Woitke et al. (2010) Woitke, P., Pinte, C., Tilling, I., et al. 2010, \mnras, 405, L26
- Wolf (2003) Wolf, S. 2003, Computer Physics Communications, 150, 99
- Wolf et al. (2003) Wolf, S., Padgett, D. L., & Stapelfeldt, K. R. 2003, \apj, 588, 373