# Statistical errors and systematic biases in the calibration of the convective core overshooting with eclipsing binaries

###### Key Words.:

Binaries: eclipsing – stars: fundamental parameters – methods: statistical – stars: evolution – stars: interiors^{†}

^{†}offprints: G. Valle, valle@df.unipi.it

###### Abstract

Context:Recently published work has made high-precision fundamental parameters available for the binary system TZ Fornacis, making it an ideal target for the calibration of stellar models.

Aims:Relying on these observations, we attempt to constrain the initial helium abundance, the age and the efficiency of the convective core overshooting. Our main aim is in pointing out the biases in the results due to not accounting for some sources of uncertainty.

Methods: We adopt the SCEPtER pipeline, a maximum likelihood technique based on fine grids of stellar models computed for various values of metallicity, initial helium abundance and overshooting efficiency by means of two independent stellar evolutionary codes, namely FRANEC and MESA.

Results:Beside the degeneracy between the estimated age and overshooting efficiency, we found the existence of multiple independent groups of solutions. The best one suggests a system of age 1.10 0.07 Gyr composed of a primary star in the central helium burning stage and a secondary in the sub-giant branch (SGB). The resulting initial helium abundance is consistent with a helium-to-metal enrichment ratio of ; the core overshooting parameter is for FRANEC and for MESA. The second class of solutions, characterised by a worse goodness-of-fit, still suggest a primary star in the central helium-burning stage but a secondary in the overall contraction phase, at the end of the main sequence (MS). In this case, the FRANEC grid provides an age of Gyr and a core overshooting parameter , while the MESA grid gives Gyr and . We analyse the impact on the results of a larger, but typical, mass uncertainty and of neglecting the uncertainty in the initial helium content of the system. We show that very precise mass determinations with uncertainty of a few thousandths of solar mass are required to obtain reliable determinations of stellar parameters, as mass errors larger than approximately 1% lead to estimates that are not only less precise but also biased. Moreover, we show that a fit obtained with a grid of models computed at a fixed – thus neglecting the current uncertainty in the initial helium content of the system – can provide severely biased age and overshooting estimates. The possibility of independent overshooting efficiencies for the two stars of the system is also explored.

Conclusions:The present analysis confirms that to constrain the core overshooting parameter by means of binary systems is a very difficult task that requires an observational precision still rarely achieved and a robust statistical treatment of the error sources.

## 1 Introduction

Stellar models play a central role in many astronomical research fields as they are routinely used to infer important physical quantities. In spite of their significant improvement over the last decades, stellar models are still affected by non-negligible uncertainties. One of the major and long standing weaknesses is the lack of a rigorous treatment of convective transport (see Viallet2015, for a comprehensive introduction). As a consequence, both the extension of convective regions and the temperature profile inside them is not yet a robust prediction of the current generation of stellar models. Within such a scenario, to compute the dimension of the convective core, one usually determines the classical Schwarzschild border and then allows for an overshooting region whose extension is a function of a free parameter. It is thus clear that the convective core mass (hereafter we refer to the fully mixed central region as the convective core), which has a profound impact on many evolutionary features, can not be firmly predicted. On the other hand, the overshooting extension can be empirically calibrated.

In this regard, the importance of double-lined eclipsing binary systems for stellar evolution models is well recognised in the literature (see, among many, Andersen1991; Torres2010). However, even after the great improvement in observations, which nowadays routinely reach precisions in stellar mass and radius determinations of a few percent, the error in the estimated age of the system is seldom lower than 10-15%, mainly owing to poorly constrained parameters such as the initial helium abundance and the efficiency of the convective core overshooting in the main-sequence (MS) phase (binary). Recent studies, focused on the calibration of the overshooting extension in binary systems (Claret2007; Stancliffe2015), show that a large uncertainty is still present. Moreover, in a recent theoretical investigation, we showed that the overshooting calibration based on eclipsing binaries with both members in the MS phase, with error in mass determination at the percent level, are affected by statistical errors and systematic biases large enough to hamper a statistically meaningful conclusion, thus suggesting great caution must be taken (overshooting). It is apparent that only very precise observations of binary systems could, in principle, significantly improve model calibration.

A favourable opportunity was recently provided by Gallenne2015, who made available estimates of the stellar masses in the double-line eclipsing binary system TZ Fornacis with the unprecedented precision of 0.001 . Gallenne2015 adopted their determination to estimate the age of the system for two assumed values of the initial metallicity.

A binary system such as this, where the secondary star is still in the sub-giant branch (SGB) or earlier while the primary is already in a more advanced stage, offers the opportunity to constrain some other parameters besides the age, such as the initial helium abundance and the convective core overshooting extension, much more efficiently than in binary systems where both components are in MS. On the theoretical side, a twofold heavy computational effort is necessary to provide, firstly, a large and fine grid of stellar models spanning a range of plausible values of the parameters to be constrained and, secondly, a robust statistical procedure for evaluating the statistical error affecting the inferred parameters.

For our purposes, the main interest of this system is that it provides a case study for analysing the difficulty in obtaining a sensible calibration of unconstrained stellar parameters, even in the presence of very precise observational data. In this respect, we aim to show how the different uncertainty sources in stellar models propagate into the final calibration and how multiple detached solutions can exist in the hyperspace of the explored stellar parameters. We are also interested in analysing the biases in the overshooting calibration that can be induced by the adoption of a grid of stellar models that arbitrarily neglects the variability of some model parameters, such as the initial helium abundance. Moreover, the TZ Fornacis data can be exploited to test stellar evolution models. In particular, due to the negligible uncertainty on mass determination, it is realistic to test several assumptions adopted in the binary system modelling. In this work we assume, as is usual, that the two stars in the detached system can be modelled as two independent objects, therefore neglecting the tidal interaction between them. This hypothesis, in addition to the fact that the two stars have almost equal mass (mass ratio 1.05), allows us to suppose that the stars share a common overshooting efficiency. An impossibility to obtain a fit in this scenario will point out some difficulty in the adopted stellar models and/or in the stated assumptions.

In the following discussion we do not analyse the impact of the current uncertainty affecting the input physics required to compute stellar models (i.e. EOS, radiative and conductive opacities, nuclear reaction cross sections, etc.) on overshooting calibration. In fact we kept fixed the input physics to their reference values. Therefore, the reported uncertainties in the calibrated parameters (age, initial helium abundance and overshooting efficiency) come only from the propagation of the observational errors in the observables used to constrain the fitting procedure (, , , [Fe/H]).

The structure of the paper is as follows. In Sect. 2, we discuss the method and the grids used in the estimation process. The best fit for the system is presented in Sect. 3. The effect of varying the observational errors is explored in Sect. 4. The impact of changing the prior constraints in the maximum likelihood procedure is discussed in Sect. 5. The results of the analysis are compared with the literature in Sect. 6. Some concluding remarks can be found in Sect. 7.

## 2 Methods

The estimation process is based on a modified SCEPtER pipeline^{1}^{1}1Publicly available on CRAN: http://CRAN.R-project.org/package=SCEPtER, http://CRAN.R-project.org/package=SCEPtERbinary, a well as a tested grid-based maximum likelihood technique, which has been adopted in the past for single stars (scepter1; eta; bulge) and binary systems (binary). Briefly, the procedure computes a likelihood value for all the grid points and it provides estimates of the parameters of interest (age, initial helium abundance, initial metallicity, core overshooting parameter and extension of the convective core) by averaging the values of all the models with likelihood greater than 95% of the maximum value.

We assume and to be two stars in a detached binary system. The observed quantities adopted as observational constraints are . Let be the uncertainty in the observed quantities. For each point on the estimation grid of stellar models, we define . Let be the single-star likelihood functions defined as

(1) |

where

(2) |

Then, the joint likelihood of the system is computed as the product of the single star likelihood functions. In the computation of the likelihood functions, we consider only the grid points within of all the variables from the observational constraints. The pipeline provides the estimates both for the individual components and for the whole system. In the former case, the fits for the two stars are obtained independently, while in the latter case the algorithm imposes that the two members must have a common age (with a tolerance of 1 Myr), identical initial helium abundance and initial metallicity and a common overshooting efficiency parameter. We discuss this last assumption in more detail in Sect. 5.2.

The estimation process relies on stellar models spanning a wide range of evolutionary phases with very different time scales; therefore the time step between consecutive points is far from uniform. As a consequence, grid based estimates can be biased towards more densely represented phases. To obtain a sensible estimate of the density function of the age of the system and of the best overshooting parameter consistent with the data, we adopt a weighted approach (see e.g. Jorgensen2005; eta). Each point in the grid is weighted with the evolutionary time step around it and this weight is inserted as a multiplicative factor in Eq. (1). Such a procedure intrinsically favours lower helium abundance and higher metallicity models, because their evolution is slower. A different weighting scheme was also tried, allowing for a normalization at track level so that the highest weight for each track is equal to one, with negligible differences in the results.

primary | secondary | |
---|---|---|

() | 2.057 0.001 | 1.958 0.001 |

() | 8.32 0.12 | 3.96 0.09 |

(K) | 4930 30 | 6650 200 |

0.02 0.05 | 0.1 |

As observational constraints, we use the masses, radii, metallicities [Fe/H] and effective temperatures of both stars. The adopted values and their uncertainties, reported in Table 1, are taken from Gallenne2015 except for the radii, for which we adopt the more precise photometric estimate from Andersen1991. Apart from the high accuracy in mass determination, Gallenne2015 also provides a precise estimate of the effective temperature of the primary star, with a nominal uncertainty of 30 K. This is the value we adopted in our analysis. However, in order to take into account possible systematic uncertainties on the effective temperature scale determination, we also performed a more conservative analysis assuming an uncertainty of 100 K, showing only minor differences.

The error on the estimated parameters is obtained by means of Monte Carlo simulations. We generate artificial binary systems, sampling from a multivariate Gaussian distribution with mean and covariance matrix . Since the observationally inferred values of a given physical quantity for the two binary components are correlated, it would be unsafe to adopt a diagonal covariance matrix. Off-diagonal elements computed adopting sensible correlation coefficients should be included in whenever a realistic noise needs to be simulated. We assume a correlation of 0.95 between the primary and secondary effective temperatures, and 0.95 between the metallicities of the two stars. Regarding mass correlation, the high precision of the estimate makes this parameter of no importance, but we set it at 0.8, a typical value for this class of stars (binary). The correlation between the radii is set at . Since the correlation between the observed radii was not provided in the analysis by Andersen1991, we derived it by error propagation assuming normality, as in binary.

For each of these systems we repeat the parameter estimates. The errors on the parameters constrained by the fit (i.e. age, initial helium abundance, metallicity and overshooting) are assessed by their multivariate posterior densities as detailed below. As such, the obtained error accounts only for the propagation of the uncertainty in the observational constraints and neglects the systematic due to the stellar models. All the obtained solutions are validated by means of a test to assess their agreement with the observational data.

### 2.1 Stellar model grids

The grids of models were computed for the masses of the two stars, that is, and from the zero-age MS up to the red-giant branch (RGB) ascension for the less massive star and to the central helium depletion for the more massive component. The error on masses was neglected since a shift of 0.001 has negligible effect on the parameters of interest, but see Sect. 4.2 for a discussion of the effects of mass error of approximately 0.01 or larger. The initial metallicity [Fe/H] was varied from dex to 0.1 dex, with a step of 0.025 dex. The solar heavy-element mixture by AGSS09 was adopted. Several initial helium abundances were considered at fixed metallicity by adopting the commonly used linear relation with the primordial abundance from WMAP (peimbert07a; peimbert07b) and with a helium-to-metal enrichment ratio from 1 to 3 with a step of 0.5 (gennaro10).

To check the robustness of the results and their dependence on the adopted stellar evolutionary code, we computed two grids of models by means of independent codes, namely FRANEC (scilla2008; Tognelli2011) and MESA (MESA2013, release 7184).

The FRANEC code was used in the same
configuration as was adopted to compute the Pisa Stellar
Evolution Data Base^{2}^{2}2http://astro.df.unipi.it/stellar-models/
for low-mass stars (database2012).
The models were computed
by assuming the solar-scaled mixing-length parameter . The calibration is performed repeating the Sun evolution by changing , and . The iteration stops when, at the Sun age, the computed radius, luminosity, effective temperature, and photospheric [Fe/H] match the observed values with relative tolerance .
The extension of the extra-mixing region beyond the Schwarzschild border
was parametrized in terms of the pressure scale height : , with
in the range
[0.00; 0.30] with a step of 0.01. The code adopts an instantaneous mixing in the overshooting treatment.
Semiconvection during the central He-burning phase
(castellani1971) was treated according to the algorithm
described in castellani1985, resulting in no additional free parameters to be tuned. Breathing pulses were suppressed (castellani1985; cassisi2001) as suggested by caputo1989. Atmospheric models by brott05, computed using the PHOENIX code
(hauschildt99; hauschildt03),
available in the range , , and where adopted.
In the range , , and , where models from brott05 are unavailable,
models by castelli03 are used.
Further details on the stellar models are fully
described in eta; binary and references therein.

The MESA computations assumed the same set of opacities and initial chemical compositions as in FRANEC ones. Outer boundary conditions are slightly different as MESA adopts the atmospheric models from (hauschildt99), supplemented with models by castelli03 where the former are unavailable. The difference in the atmospheric models causes a small shift in the theoretical tracks, MESA models being approximately 20 K cooler than FRANEC ones in the RGB phase. In MESA, we obtained a value of the solar calibrated mixing-length , in agreement with that reported by Stancliffe2016. The code uses a different scheme for the convective core overshooting, relying on a diffusive approach with exponential decay of the overshooting diffusion coefficient. In particular, MESA sets the overshoot mixing diffusion coefficient by:

(3) |

where is the diffusion coefficient from mixing-length theory at a location near the Schwarzschild boundary, is the distance in the radiative layer from that location and is a free parameter. The overshooting parameter adopted in the diffusive scheme in MESA and the instantaneous one in FRANEC approximately follow the relation (Noels2010). The different treatment of the overshooting adopted by FRANEC and MESA could shed light on possible systematic differences in the evaluation of the evolutionary time scale and on the convective core mass. We computed a grid that spans an overshooting efficiency in MS in the range [0.00; 0.04] with a step of 0.001. Contrarily to FRANEC code, MESA also adopts a core overshooting scheme during the central helium burning phase. Since our aim is to calibrate the overshooting extension during the MS phase, we computed the models of the MESA grid by keeping the overshooting parameter fixed to = 0.014 during the central helium-burning phase. Although the effects of such a choice are expected to be very minor (see the discussion in Sect. 3), we also built a grid with a crude assumption of no overshooting in the central helium burning phase for a further consistency check. For all the input and configurations not explicitly mentioned above, we adopted the default choices in standard MESA release. An analysis of the differences that can be expected for assuming different codes and input in stellar evolution computation for stars of 1 and 3 can be found in Stancliffe2016.

The estimates of the stellar masses provided by Gallenne2015 are so precise (0.001 ) that their contribution to the statistical error in the stellar age, initial helium abundance and overshooting parameter can be safely neglected. However, much larger errors ( ) are still quite common in modern binary system observations. It is thus worth discussing the effect of mass errors of this order of magnitude on the parameter calibration. To this aim, we computed an additional grid of stellar models with the FRANEC code for 18 values of the mass in the range [1.88; 2.15] , and the same chemical compositions (initial metallicity and helium abundance) and core overshooting parameters previously specified. This multi-mass grid of models was then adopted to recover the stellar parameters by means of the SCEPtER pipeline as described in Sect. 2 but assuming larger uncertainties in the stellar masses. This exercise was performed twice. A first simulation adopted the uncertainties on the masses (1.5% for the secondary and 3% for the primary) from Andersen1991, that is, the most precise values before the analysis by Gallenne2015. In a second simulation, a mass uncertainty of 1% for both stars was adopted, a typical value that is nowadays routinely reached.

A total of approximately 64 000 stellar tracks were computed for this work.

## 3 Parameters best estimates

In this section we try to calibrate the age, core overshooting efficiency and initial helium abundance, taking the observational errors at face value. The effect of changing their extent is discussed in Sect. 4.

Although the two evolutionary codes used to compute stellar models are independent, the results obtained by means of FRANEC and MESA grids are similar. Interestingly, both grids of stellar models provided multiple solutions in spite of the very precise mass estimates of the two stars. This result is partially due to the peculiar position of the secondary star, which can be fitted either in the MS overall contraction phase or in the SGB, as detailed below. Overall, it demonstrates, in practice, the high degree of precision in the observations that is needed to unambiguously constrain the free parameters governing stellar evolution.

Figure 1 shows the two-dimensional density of probability of the estimated core overshooting efficiency and system age. As can easily be seen, in addition to the well-known degeneracy between age and overshooting, both FRANEC and MESA grids provide two detached islands of solutions. It is worth noting that the colour-scale in the two panels is different. In fact, all the densities are normalised to one; since different grids provide different relative height of the peak in the overshooting vs. age plane, the overall maximum is obviously different from one grid to another. Therefore, a comparison of the relative height is only meaningful within the individual panels and not between them.

F-I | F-II | F-I,II | F-III | |||||||||

0.261 | 0.262 | 0.263 | 0.263 | 0.264 | 0.264 | 0.261 | 0.262 | 0.263 | 0.262 | 0.263 | 0.264 | |

0.012 | 0.013 | 0.014 | 0.015 | 0.015 | 0.016 | 0.013 | 0.013 | 0.015 | 0.014 | 0.015 | 0.016 | |

0.140 | 0.151 | 0.160 | 0.160 | 0.170 | 0.170 | 0.141 | 0.155 | 0.163 | 0.240 | 0.250 | 0.255 | |

age (Gyr) | 1.07 | 1.10 | 1.13 | 1.16 | 1.16 | 1.17 | 1.08 | 1.11 | 1.16 | 1.14 | 1.16 | 1.19 |

() | - | - | - | - | - | - | - | - | - | 0.169 | 0.171 | 0.174 |

Fit parameters | ||||||||||||

(K) | 4952 | 4917 | 4947 | 4914 | ||||||||

(K) | 6896 | 6788 | 6875 | 6701 | ||||||||

() | 8.39 | 8.43 | 8.39 | 8.61 | ||||||||

() | 4.03 | 4.04 | 4.03 | 3.71 | ||||||||

2.93 | 3.65 | 2.54 | 14.38 | |||||||||

0.23 | 0.16 | 0.28 | 0.001 |

^{3}

^{3}3 F-I, F-II, F-I,II and F-III identify the different islands of solutions (see text). represents the median of the considered stellar quantities in the different solution islands, while and corresponds to the 16th and 84th quantiles. For some solutions, the distributions are skewed and or coincide with the median . One asterisk marks the quantities, which differ by more than from the corresponding observational constraints; two asterisks indicate a difference larger than . For each solution, the goodness-of-fit and the test value are reported.

M-I | M-II | M-III | |||||||

0.264 | 0.270 | 0.277 | 0.263 | 0.264 | 0.264 | 0.263 | 0.264 | 0.265 | |

0.013 | 0.014 | 0.016 | 0.014 | 0.015 | 0.016 | 0.014 | 0.016 | 0.016 | |

0.012 | 0.013 | 0.014 | 0.023 | 0.023 | 0.024 | 0.027 | 0.027 | 0.028 | |

age (Gyr) | 1.03 | 1.10 | 1.17 | 1.20 | 1.23 | 1.26 | 1.20 | 1.23 | 1.26 |

() | - | - | - | 0.143 | 0.144 | 0.146 | 0.164 | 0.166 | 0.167 |

Fit parameters | |||||||||

(K) | 4938 | 4915 | 4904 | ||||||

(K) | 6824 | 6612 | 6708 | ||||||

() | 8.24 | 8.52 | 8.61 | ||||||

() | 4.02 | 3.79 | 3.69 | ||||||

2.44 | 7.93 | 18.11 | |||||||

0.30 | 0.02 | ¡ 0.001 |

The most probable class of solutions with the highest peak in the probability density at 0.13 – 0.15 for FRANEC (solutions F-I and F-II in Fig. 1) and for MESA (solution M-I) suggests a primary star in the central helium-burning phase and a secondary in the SGB phase. As expected, the inferred age and overshooting parameter values are highly correlated as the larger the overshooting, the larger the convective core, and hence the longer the central hydrogen-burning phase. The elongated shape of the first island of the probability density and its steep inclination suggest that a small variation of the convective core overshooting has a significant impact on the inferred age. The comparison of the top-right end of this island of solutions between the two panels of Fig. 1 shows a minor difference, as the FRANEC grid provides a detached solution around and age 1.16 Gyr (F-II), whereas solutions based on a MESA grid do not show any gap. Tab. 2 and 3 list the results provided by FRANEC and MESA grid respectively, for the estimates of the binary system initial helium abundance, metallicity, age, core overshooting parameter and mass of the convective core of the secondary star when present. According to the multiple-solutions shown in Fig. 1, the tables separately report the results corresponding to the individual islands. The tables show the median values (i.e. ) of the above mentioned quantities for the models selected by the maximum likelihood procedure described in Sect. 2 and separated according to solution islands. The 16th () and 84th () quantiles are reported as representative of the errors. The fitting values of temperatures and radii for the two stars are also shown. Finally, the tables report the goodness-of-fit of the obtained solutions and their values from the distribution. As usual, models with a goodness-of-fit value lower than 0.05 are considered as providing a poor agreement with observational constraints.

In detail, FRANEC grid solutions F-I and F-II have been computed by splitting the results at age 1.15 Gyr for ¡ 0.20, while the MESA grid solution M-I has been obtained for . The two FRANEC solutions differ slightly in terms of metallicity; F-I provides = 0.013 and F-II = 0.015. Tab. 2 also shows the solution F-I,II for which the split in the two regions is neglected as the mentioned gap, rather than being genuine, could be an artefact induced by the Monte Carlo procedure and/or by the metallicity resolution of the grid of models. Although consistent within the errors, the FRANEC FI,II solution suggests both a lower initial helium abundance ( = 0.262) and a lower metallicity ( = 0.013) than the MESA M-I ( = 0.270, = 0.014). However, both grids provide results consistent with a helium-to-metal enrichment ratio of . The age of the system inferred by the two grids turned out to be nearly identical: 1.11 Gyr for F-I,II and 1.10 0.07 for M-I.

In spite of the different numerical treatment of convective core overshooting in FRANEC and MESA codes, TZ Fornacis binary seems able to severely constrain the overshooting efficiency in both cases. The FRANEC parameter is in the range 0.14-0.16 with a best value of 0.155, while the MESA is in the range 0.012-0.014 with median value 0.013. Such a result is in agreement with the qualitative relation between the parameters adopted in a diffusive (as in MESA) and in an instantaneous (as in FRANEC) treatment of overshooting (Noels2010).

The goodness of the proposed fits can be judged relying on the as in Eq. (2) summing over the two stars. Since there are six observational constraints (the masses are fixed to the observational value) and four free parameters (age, overshooting parameter, initial and ) in the fit algorithm, the computed statistics has two degrees of freedom. In Tab. 2 and 3, for both M-I and F-I,II and therefore one cannot reject the hypothesis that the solutions provide a good fit of the data.

The FRANEC and MESA tracks in the plane effective temperature vs. radius for the solutions F-I,II and M-I are displayed in the left panels of Fig. 2. The two evolutionary codes provide very consistent solutions by finding the secondary star in the SGB and the primary in the central helium-burning stage.

As anticipated at the beginning of this section, Fig. 1 shows that the island of solutions described above is not unique and that there is another detached solution located at higher values of the overshooting parameter for both grids. However, the detailed morphology of such a second island of solutions differs in the two cases, since the FRANEC-based grid shows a single sharp peak while the MESA-based grid presents multiple and smoother peaks. For this reason, we analysed the MESA results by splitting the island into two sub-regions based on the value of the overshooting parameter . As in the case of the first island of solutions, the results based on the FRANEC grid are reported in Tab. 2 (case F-III) and those based on the MESA grid in Tab. 3 (cases M-II and M-III). Both codes suggest a slightly higher initial metallicity and an older age than in the first island of solutions. Concerning the metallicity, the agreement is perfect as F-III provides = 0.015 and M-II and M-III = 0.015 and = 0.016, respectively. This is not the case for the age, as FRANEC F-III gives 1.16 Gyr while MESA gives Gyr for both M-II and M-III. The convective core overshooting efficiency is well constrained in both cases. The FRANEC parameter is in the range 0.24-0.255 with median value 0.25, while the MESA is in the range 0.023-0.024 with median 0.023 for M-II and 0.027-0.028 with median 0.027 for M-III. As in the first island of solutions, such results approximately follow the relation . Concerning the evolutionary stage, this second class of solutions corresponds for both codes to a primary star in the central helium-burning phase, as in the first solution, but a secondary star in an earlier stage, that is, the overall contraction phase rather than in the SGB (see right panels in Fig. 2). In this case, both grids of models fail to match the radii of the two stars and the solutions provide unsatisfactory fits, with high . Even for the best of them (M-II), the value of the test is below the traditional acceptance level of 0.05, therefore leading us to reject the hypothesis that the recovered values provide a good fit to the data.

As a final check, we performed an additional reconstruction relying on a MESA grid of stellar models computed by completely neglecting core overshooting in the central helium burning phase. The change is negligible and the only detectable variation is a decrease of 0.01 Gyr in the system age, well below the statistical error. The results are essentially independent of the details of the treatment of convective transport during central helium-burning because the primary star is still in an early stage in this phase.

In summary, the maximum-likelihood procedure provides multiple solutions. According to a goodness-of-fit statistics, the best solution corresponds to a primary star in the central helium-burning phase and a secondary star in the SGB, regardless of the grid of adopted stellar models, namely F-I,II for FRANEC and M-I for MESA.

It is interesting to note that the same difficulty in uniquely identifying a set of best fitting parameters has recently been described by Kirkby-Kent2016 for the AI Phoenicis, a binary system composed of two stars of approximately 1.2 . Relying on very high observational mass and radius accuracies (0.4% and 0.7%, respectively), the evolutionary stage of the primary was firmly identified (the beginning of the RGB), while that of the secondary could not be unambiguously fixed. In fact, it was impossible to firmly place this star in the SGB or in the overall contraction phase, even if the space of the parameters adopted in the fit was much smaller than that used in the present analysis. In fact, the best fit models was established based on models at variable mixing-length values but fixed initial helium abundance and vice-versa. In all the cases the overshooting efficiency was keep fixed.

In the following section, we explore the effect of varying the observational errors on the inferred overshooting parameter and initial helium abundance. The impact of changing the prior constraints in the maximum likelihood procedure, such as keeping the ratio fixed and allowing different overshooting parameters, is discussed in Sect. 5.

## 4 Effect of observational uncertainties on parameter calibration

The previous analysis shows that even in a favourable case, such as TZ Fornacis, the observational errors affecting the physical quantities used to constrain the maximum-likelihood procedure can hamper a definitive conclusion in regards to the age of the system and the core overshooting efficiency. It is therefore interesting to quantify the sensitivity of the recovered results on the observational errors. To do that, we performed further numerical experiments by repeating the maximum-likelihood procedure on the same data as in the previous cases, but this time this was done by artificially varying the observational precision.

### 4.1 Effective temperatures and radii

As a preliminary experiment, we halved the uncertainty of the effective temperature of the secondary star (the effective temperature of the primary is already well constrained, see Tab. 1) and of the radii determinations. In particular, the relative error in the radii by Andersen1991 is of approximately 1.5% for the primary star and 2.3% for the secondary, while recent determinations for other binary systems often provide relative accuracy better than 1%.

Figure 3 shows the density of probability in the overshooting efficiency vs. age plane obtained from the FRANEC grid under the assumptions stated above. A comparison of the left panel of the figure with Fig. 1 shows that the refinement in the secondary star temperature uncertainty plays a minor role. The low overshooting solution, identified by neglecting the fine structure for as in Sect. 3, corresponds to , , and age = Gyr. The of the solution is 3.60, with a -value of 0.16. On the contrary, the increased precision in the radii produces a drastic change (right panel in Fig. 3), since the solution at vanishes, and the probability density shows two peaks around = 0.15 at = 0.013 and 0.16 at = 0.014. Both the two single solutions and the overall one, computed by neglecting the fine structure, provide satisfactory fits of the data, the lowest -value being 0.11 for the case.

In summary, even whilst artificially increasing the data precision, the maximum likelihood algorithm based on a grid of models computed by allowing the initial helium content and overshooting efficiency to vary in reasonable uncertainty ranges is able to find a satisfying solution. However, owing to the continuous refinement of observational instruments and techniques, it can be expected that, in the near future, the same fitting procedure could result in no acceptable fit of the stellar data for a given system. Such a result would allow us to test the general assumptions and limitations of the 1D stellar models.

Another point to discuss is the very high accuracy in the of the primary star in the Gallenne2015 data. Given the difficulty to precisely evaluate the systematic uncertainty, we repeated the analysis adopting a much more conservative error of 100 K for the primary star effective temperature. The results of the analysis are very similar to those of the reference scenario described in Sect. 3, with a slightly larger uncertainty. For the best fit solution: , , and age = Gyr. The goodness of fit test for this solution is satisfactory (, value = 0.27). A similar behaviour occurs for the second and unsatisfactory solution: , , and age = Gyr (, value = 0.001). The agreement of the results is not particularly surprising because, as can be seen in Table 2, the constraint on the effective temperature of the primary star is always satisfied without problems; therefore, relaxing this constraint does not force a significant change in the best fit solutions.

### 4.2 Masses

It is well recognised in the literature that only very precise determination of stellar masses and radii can severely test stellar models (see e.g. Torres2010). In particular, when attempting a calibration of a free parameter, as in the case of overshooting or mixing-length, a very high accuracy on fundamental parameters is mandatory, since uncertainties in the observational constraints will propagate in the final estimate. As discussed in the previous section, the unprecedented precision of mass determination of TZ Fornacis members (at the level of 0.001 ) provided by Gallenne2015 allowed for calibration of several unknown parameters without further bothering of mass uncertainty.

However, overshooting recently showed that for binary systems where both stars are in the MS phase, uncertainties at the level of 1% in mass determinations and of 0.5% in radii, values commonly found in current data, are high enough to hamper a statistically meaningful calibration of the core overshooting parameter. In spite of such evidence, it is unfortunately common to find overshooting calibrations in the literature that fail to properly address the effect of the error in stellar mass estimates, even when uncertainties of the order of 1% or larger are present. Such an assumption has no theoretical justification because the uncertainty on the calibrated parameter should be evaluated by propagating the errors in all the observational constraints. Moreover, the stellar mass is the most important parameter determining the evolution of the star; failing to address the uncertainty in its determination results in meaningless error on the final estimate.

The aim of this section is to demonstrate the importance of very accurate stellar mass measurements for calibration purposes. To do this, we again performed our maximum-likelihood procedure against TZ Fornacis as above, but this time assuming larger values of the mass uncertainty. The comparison between these new results with those previously discussed will asses the impact of mass uncertainty on the calibration of stellar parameters. We performed such a numerical experiment twice and only for FRANEC models. In the first case, we adopted the most precise determination available in the literature before the analysis by Gallenne2015 as error on primary and secondary masses, that is, the values quoted in Andersen1991: (3% relative error) and (1.5% relative error). In the second case, we adopted a 1% relative error on the two masses, a precision that is nowadays commonly achieved.

In order to perform the recovery procedure in presence of mass uncertainty of this order of magnitude, the grid of pre-computed stellar models must be extended to properly account for the larger range of variation, as explained in Sect. 2.1.

Figure 4 shows the 2D density of probability of the estimated core overshooting parameter and age for the two different assumptions about the mass uncertainty. The best-fit stellar parameters and their errors for both scenarios are collected in Tab. 4. The MM-A solution refers to the case of mass uncertainty from Andersen1991, while MM-B refers to the case of 1% uncertainty on stellar masses.

MM-A | MM-B | ||||||||

0.262 | 0.263 | 0.271 | 0.262 | 0.263 | 0.266 | 0.272 | 0.275 | 0.277 | |

0.013 | 0.014 | 0.015 | 0.013 | 0.014 | 0.016 | 0.012 | 0.013 | 0.014 | |

0.270 | 0.300 | 0.300 | 0.270 | 0.297 | 0.300 | 0.120 | 0.130 | 0.130 | |

age (Gyr) | 1.04 | 1.12 | 1.19 | 1.09 | 1.15 | 1.19 | 0.97 | 1.01 | 1.02 |

() | 0.169 | 0.174 | 0.180 | 0.169 | 0.172 | 0.175 | - | - | - |

3.34 | 2.74 | 8.08 | |||||||

0.50 | 0.60 | 0.04 |

The striking difference between the two panels of Fig. 4 and the left panel of Fig. 1 is due solely to the difference in the mass uncertainty used in the maximum-likelihood procedure. A simple increase of the mass uncertainty completely alters the morphology of the best-fit solution. Concerning the calibration of the overshooting, the less precise is the mass and the more skewed toward the largest value present in the grid of models is the estimated parameter, with best fit for both scenarios. In the case of the largest mass error (3%, left panel), the island of solutions at low overshooting values, which was the most probable one in Fig. 1, is completely missing, while it begins to manifest itself, even if at low density of probability, in the case of mass error at 1% level. The values of the goodness of fit tests for these solutions are higher than those reported in Tab. 2, and therefore the solutions are legitimate. Such a behaviour is expected because the fitting algorithm can explore a larger range of mass combinations.

It is thus apparent that an error of a few percent on the mass determination can not only broaden the uncertainty on the estimated parameters, but, as in this case, can also drastically affect the best fit solutions.

The explanation of this behaviour merits a brief analysis. It is clear that the larger the uncertainty affecting the observational constraints, the wider the parameter space that can be explored by the fit algorithm. Indeed, a fit of a system will never be perfect due to both the discrepancies between synthetic and real single stars and the modelling of the evolution of a binary system. Even in an ideal world in which the two stars evolve independently and theoretical models perfectly reproduce the data, a simple random error in the observations will produce a mismatch between fitted and real objects (see overshooting for an example of the possible magnitude of such an effect). Hence, it is reasonable to expect that the result obtained under tighter observational constraints provides a worse fit than that obtained under looser ones. This is particularly true where the mass of the stars is concerned because of its major effect on stellar evolution. The key point we would highlight here is that the possibility of testing the assumptions of stellar evolution in binary systems requires an accuracy in mass determination much higher than 1%, otherwise the propagation of larger mass uncertainty allows the algorithm to explore an overly wide range of mass combinations. This additional variability can mask a possible slight difference between theory and observations, since it is easier for the algorithm to find a track combination compatible with the observations. Although the correlation imposed in the covariance matrix between the masses (see Sect. 2) protects against unphysical inversion of the mass ratio, the algorithm is free to adjust the mass inside the nominal observational errors. To illustrate this effect, Fig. 5 shows the density of probability of reconstructed masses and radii for both primary (left panel) and secondary (right panel) stars, computed from the largest mass error set. From the right panel of the figure, it is clear that the mass of the secondary star is preferentially overestimated with respect to the observational value. This overestimation occurred to reduce the tension between the observational constraints and the grid of theoretical models. In detail, the shift in the fitted mass of the secondary is of approximately 0.04 , a value large enough to heavily influence the theoretical stellar track. It is therefore unsurprising that the calibrated age and overshooting of this system are completely different from those described in Sect. 3. The islands of solutions present in Fig. 1 no longer appear in the left panel of Fig. 5 because their likelihood is dominated by that of models at different masses.

## 5 Effect of different assumptions in the maximum-likelihood fit

### 5.1 Keeping the helium-to-metal enrichment ratio fixed

Stars such as TZ Fornacis members are too cold for spectroscopical constraint of their helium content, resulting in an additional unknown parameter to be calibrated, as the initial helium abundance significantly affects the evolution of stars of given mass and metallicity. Stellar modellers usually assume a linear relationship between the initial metallicity and helium abundance, that is, . However, the ratio is still poorly constrained (e.g. pagel98; jimenez03; gennaro10) and consequently the initial helium abundance to be adopted in stellar models at a given metallicity is subject to some variability. For this reason, we computed a grid of models for five values of in the range 1-3 (see Sect. 2.1) and we treated the helium-to-metal enrichment ratio as a parameter to be estimated by the maximum-likelihood procedure, as is the case for the overshooting and the age.

However, in the literature, it is much more common to find grids of models computed at fixed ratios, as is the case of publicly available databases, where a value of approximately two is usually adopted. Therefore, it is worthwhile analysing the effect of adopting a grid of stellar models computed by keeping the helium-to-metal enrichment ratio fixed on overshooting calibration and age estimate.

Figure 6 shows the density of probability in the overshooting efficiency vs. age plane obtained from the FRANEC grid but only by models with . The impressive difference between this figure and the left panel of Fig. 1 is due solely to the different assumption on the helium-to metal enrichment ratio (fixed vs. variable), as all the other characteristics of the grid of stellar models are exactly the same.

Figure 6 shows two distinct peaks at and 0.13, but, as stated above, this sub-structure could possibly be due to the grid step in the overshooting parameters. The stellar parameters inferred from the fit, neglecting the aforementioned sub-structure, are presented in Tab. 4, labelled as . As can be expected, Fig. 6 shows that forcing the estimates to adopt a higher value of initial helium content than those preferred by the maximum likelihood solution in Sect. 3 (0.275 vs. 0.262) shifts the age and the overshooting parameter estimates towards lower values, the system age being in the range [0.97; 1.02] Gyr. Moreover, the density of probability is highly constrained in a small region of the overshooting efficiency vs. age plane because only a very small subset of models computed adopting is compatible with the observational data. The goodness-of-fit test – with three degree of freedom, since the initial helium content is no longer estimated from the data – for the solution that neglects the peak sub-structure gives a -value of 0.04, and therefore a poor fit of the data. This is an expected behaviour, because the fitting algorithm is forced to explore a reduced space of parameters, away from the best-fitting region found in Sect. 3, for determining the solution of the system.

In conclusion, neglecting the current uncertainty in the initial helium abundance and adopting stellar models computed at a fixed severely affects both the age estimate and the overshooting calibration, confirming what has already been shown by overshooting for eclipsing binaries whose members are in MS.

### 5.2 Allowing different overshooting efficiency for the two stars

As anticipated in Sect. 1, the choice to fit the system adopting a common value of overshooting efficiency is based on the theoretical reference framework of independent evolution of the two stars. Under this assumption, with the two stars having almost the same mass, the possibility of different overshooting parameters in the two binary members can be safely neglected. However, it is interesting to discuss the methods for testing the need for different overshooting efficiency in the two stars and some shortcomings of this approach.

Although the solutions with low overshooting efficiency from FRANEC and MESA grids proved to be satisfactory in the goodness of fit test, a much better solution with independent overshooting efficiency for the two stars could, in principle, exist. To test this hypothesis, we repeated the estimation process relaxing the constraint of a common in the FRANEC grid, therefore losing one degree of freedom in the test.

OV-I | OV-II | |||||

0.261 | 0.263 | 0.269 | 0.262 | 0.263 | 0.264 | |

0.012 | 0.013 | 0.015 | 0.013 | 0.015 | 0.016 | |

0.140 | 0.151 | 0.160 | 0.160 | 0.180 | 0.200 | |

0.030 | 0.058 | 0.080 | 0.260 | 0.270 | 0.270 | |

age (Gyr) | 0.96 | 1.02 | 1.06 | 1.15 | 1.18 | 1.20 |

() | - | - | - | 0.169 | 0.171 | 0.173 |

(K) | 4937 | 4922 | ||||

(K) | 6779 | 6667 | ||||

() | 8.27 | 8.57 | ||||

() | 4.04 | 3.78 | ||||

1.42 | 8.96 | |||||

0.23 | 0.003 |

The results of the analysis are presented in Table 5. For the solution at low overshooting efficiency, the secondary star is fitted with , a much lower value than the primary . At first glance, it seems, therefore, that a different overshooting efficiency would be suggested for the two stars. However, such a conclusion is not statistically significant since it does not rely on a rigorous test.

The proper way to statistically evaluate the importance of additional parameters inserted in a model is based on the comparison of nested models, a ubiquitus statistical method of inference (see e.g. venables2002modern; simar). This approach relies on the comparison of appropriate goodness of fit statistics for two nested models (in this case the fit ). The two models we are interested in comparing are nested because they rely on the same data and grids, but they differ since the model with independent overshooting contains one additional parameter to be estimated. To follow the statistical nomenclature, let us call the model that allows for different overshooting efficiency between the two stars the full model, and the alternative one, the restricted model. Let and be the degrees of freedom of the full and the restricted models, respectively. Then the statistic has a distribution, allowing us to test the statistical significance of additional parameters in the model. In our case, for the comparison of models OV-II and F-I,II, this gives with one degree of freedom. The value of the test is 0.29, therefore the two models do not differ significantly, and offer an equivalent fit of the data. Hence, we cannot conclude that the solution with different overshooting is better than the other with a single value; relying on the principle of maximum parsimony, the model that assumes a common overshooting efficiency is therefore preferable.

In addition to all these statistical considerations, allowing the stars to have independent overshooting efficiencies would lead to an additional free parameter, which can easily mask the possible difficulty of stellar tracks to fit the system. In other words, the overshooting efficiency would reflect not only the additional mixing of the core, but also other effects from any given source, which would remain hidden. Therefore, although a difference in the overshooting parameter can be theoretically required and justified when the two stars have different masses, it is somewhat questionable when they haven’t. Moreover, the recent theoretical works by testW and overshooting show, in detail, the variability in the difference of age and overshooting efficiency that can be recovered by stars in binary systems with both components in MS phase. These works show that, even in the overoptimistic case when stars and models are in perfect agreement, that is, when the artificial stars are sampled from the same theoretical tracks adopted in the recovery, the variability in the recovered parameters is large. As an example, Fig. 4 of overshooting shows that for two stars in MS phase, and assuming an uncertainty of 1% on the masses and 0.5% on the radii, a difference of approximately 0.15 in the recovered overshooting parameter can be expected to occur owing solely to the statistical uncertainty in the observational constraints. Although these results cannot be directly adopted for the present study, they should nonetheless be considered before claiming a significant difference in the individually recovered overshooting efficiencies of the two stars. More thorough theoretical investigations are needed before relying on calibrations that could be the results of random fluctuations in the observables.

## 6 Comparison with the literature

Several age estimates for TZ Fornacis exist in the literature, such as Gyr (Andersen1991); 1.3 Gyr (Claret1995); [1.12; 1.33] Gyr (VandenBerg2006) and [1.08; 1.41] Gyr (Claret2011). Apart from the first determination, which is based on outdated opacity tables, the others agree relatively well within each other and are consistent with our best estimate. The overshooting efficiency for TZ Fornacis was estimated by Stancliffe2015 who, adopting MESA, found , a value much larger than our estimates. The difference can be partially ascribed to differences in the fitting algorithms, but mainly in the explored evolutionary phases; the grid employed in that paper does not include models in the central helium burning stage, our preferred solution, and to the initial helium values adopted in the grids.

However, the most relevant comparison is the one with the results presented by Gallenne2015, who adopt their newly determined mass and uncertainties. They provided a system age of Gyr and placed the secondary stars in the contraction phase by using stellar evolutionary tracks from BASTI (teramo04) and PARSEC (Bressan2012), which adopt = 0.20 and 0.25, respectively. Therefore, the solution identified in that paper is most similar to the one highly disfavoured by our analysis.

It is also interesting to compare the reconstructed best-fit overshooting values with results coming from asteroseismic observations of less massive stars. Recently, by relying on the detections of Kepler (Borucki2010), Deheuvels2016 estimated the extent of the convective core in eight MS stars in the mass range [1.12-1.45] . The resulting overshoot parameters, assuming instantaneous overshooting as in FRANEC code, were between 0.05 and 0.15 (with typical uncertainty of ) when microscopic diffusion is included in the computations. A possible increasing trend of overshoot parameter with stellar mass was also demonstrated, but much more data are required to definitively confirm it (see also the results from Claret2007; Claret2016, from eclipsing binaries). However, it should be noted that the definition of the overshooting extension beyond the classical Schwarzschild border adopted in Deheuvels2016 is somewhat different from those used here, and that the calibrated value of the overshooting parameter depends on the assumed chemical and input physics in the stellar computations. Therefore a much more meaningful comparison would concern the mass extension of the convective core rather than the overshooting parameter.

## 7 Conclusions

Taking advantage of the very precise mass determination of the stars in the eclipsing binary system TZ Fornacis recently provided by Gallenne2015, we tried to constrain the convective core overshooting, the initial helium abundance and the age of the system. We assumed independent evolution of the two stars and thus, having the binary members nearly equal in mass, the same overshooting efficiency for both components. We put particular effort into the statistical treatment of both the parameter estimates and of their errors. To do this, we used the SCEPtER pipeline (scepter1; eta; binary), a maximum likelihood procedure relying on several dense grids of stellar models. To check the possible systematic effects on the results of the adopted stellar evolutionary code, we computed two grids of models by means of independent codes, namely FRANEC and MESA. In addition to the best fit, we also discussed the impact on the calibration results of varying the observational uncertainties, of keeping fixed the ratio, and of allowing different overshooting efficiencies for the two binary members.

Concerning the best fit, overall, the results from within these two grids agree relatively well with each other. Despite the very high quality of the data, with errors in mass lower than 0.1% and in radius of between 1.5 and 2.2%, the quoted stellar parameters cannot be unambiguously constrained. In fact, regardless of the grid of stellar models used, we found at least two distinct classes of solutions.

The primary star is in the central helium-burning phase in both classes of solutions, whereas the secondary star is in the SGB phase in the first class and in the overall-contraction phase at the central hydrogen exhaustion in the second class.

Concerning the first class of solutions, the best fit based on FRANEC models provides an age of Gyr, a core overshooting parameter and an initial helium abundance . These values are in agreement with those based on MESA models, that is, age Gyr, diffusive overshooting and . Both grids of models suggest a low helium-to-metal enrichment ratio .

The second class of solutions yields a more efficient convective core overshooting both for the FRANEC and MESA formalism. In this case, the estimated ages are Gyr and Gyr, respectively. However, the goodness-of-fit of this second class of solutions is much worse than that of the first class.

Besides the determination of the best fit values of age and overshooting parameter for the TZ Fornacis system, in this work we specifically addressed some methodological and statistical problems affecting the calibration of free parameters from binary systems. We investigated how different yet legitimate choices in the computation of the stellar model grid and in the error treatment can produce completely different calibrations.

In particular, we showed the importance for calibration purposes of relying on very accurate stellar mass determination. To this aim, we repeated the estimation process with identical input but assuming larger yet still relatively typical uncertainties in the two stellar masses. As a result, we obtained that even an uncertainty at a level of 1% causes a severe bias in the inferred overshooting parameter, completely altering the morphology of the best fit solution.

We also studied the effect of neglecting the uncertainty in the helium-to-metal enrichment ratio, and thus of the initial helium content, on the stellar parameter estimates. To do this, we repeated the fitting procedure relying solely on models with fixed , a value frequently adopted in the literature. The result shows a notable bias, larger than the statistical error, in the estimates of system age and overshooting efficiency toward lower values.

The possibility of a different overshooting parameter for the two stars was also investigated, even if having two stars of nearly identical mass is theoretically unlikely. However, this choice did not improve the quality of the fit, and can lead to artefacts in the solutions.

In summary, we tried to show that the calibration of parameters from binary systems is a delicate task, affected by many decisions in the fitting stage. One of the main results of the analysis is the quantification of the importance to properly account for various sources of uncertainty that affect both the observational data and the theoretical model computations. Clearly this means a heavy computational effort because it requires the building of very large and fine grids of stellar models. However, it is a mandatory effort since fixing some values in the theoretical computation without observational evidence (i.e. the initial helium content) or neglecting the actual error in the masses of the two stars can severely affect the calibration from observations. On the other hand, this study provides a warning against relying on fit, which allows for too much variability, such as allowing for independent overshooting efficiency for stars of nearly equal mass, because the calibration can be significantly affected by mere random fluctuations.

Although we explicitly address many sources of uncertainty, we did not evaluate the possible systematic effects of fixing the input physics in stellar model computations. The quantification of this uncertainty would require a huge amount of computation and could not be explored in the present work.

More theoretical investigations aimed at quantifying the variability expected by chance of parameters calibrated from binary systems is needed before relying on the results obtained in this way. The continuously growing availability of powerful computation facilities for calculating theoretical stellar models under a wide range of input, and the development of sound statistical techniques for obtaining and comparing the best fit solutions, will offer a unique opportunity to firmly evaluate the actual degree of reliability of these calibrations.