$H_0$ from cosmic chronometers and Type Ia supernovae, with Gaussian Processes and the novel Weighted Polynomial Regression method
Abstract
In this paper we present new constraints on the Hubble parameter $H_0$ using: (i) the available data on $H(z)$ obtained from cosmic chronometers (CCH); (ii) the Hubble rate data points extracted from the supernovae of Type Ia (SnIa) of the Pantheon compilation and the Hubble Space Telescope (HST) CANDELS and CLASH Multi-Cycle Treasury (MCT) programs; and (iii) the local HST measurement of $H_0$ provided by Riess et al. (2018). Various determinations of $H_0$ using the Gaussian processes (GPs) method and the most updated list of CCH data have recently been provided by Yu, Ratra & Wang (2018). Here we extend their analysis to also include the most complete set of SnIa data released to date, which allows us to reduce the uncertainty with respect to the result found by considering only the CCH information. Our determination again favors the lower range of values for $H_0$ and is in clear tension with the local HST measurement. We also refine the GPs determination by taking into account the error propagation of the kernel hyperparameters when the CCH data, with and without the local HST measurement, are used in the analysis. In addition, we present a novel method to reconstruct functions from data, which consists in a weighted sum of polynomial regressions (WPR). We apply it from a cosmographic perspective to reconstruct $H(z)$ and estimate $H_0$ from CCH and SnIa measurements. The result obtained with this method is fully compatible with the GPs ones. Finally, a more conservative GPs+WPR value is also provided, which is still in tension with the local HST measurement.
Adrià Gómez-Valent,^a and Luca Amendola^b
Prepared for submission to JCAP

^a Departament de Física Quàntica i Astrofísica, and Institut de Ciències del Cosmos, Univ. de Barcelona, Av. Diagonal 647, E-08028 Barcelona, Catalonia, Spain

^b Institut für Theoretische Physik, Ruprecht-Karls-Universität Heidelberg, Philosophenweg 16, 69120 Heidelberg, Germany
Keywords: dark energy experiments, supernova type Ia - standard candles
ArXiv ePrint: arXiv:1802.01505
Contents
 1 Introduction
 2 The data sets

3 The Gaussian Processes method
 3.1 Reconstruction of functions with Gaussian Processes
 3.2 Determination of $H_0$ with GPs, cosmic chronometers, the Pantheon+MCT SnIa compilation, and the local HST measurement
 3.3 Error propagation of the kernel hyperparameters
 3.4 Possible systematics affecting the CCH data, and their impact on our results
 4 The Weighted Polynomial Regression method
 5 Discussion and conclusions
 A Model-dependent results for the ΛCDM and XCDM
1 Introduction
The Hubble constant, $H_0$, is a pivotal quantity in cosmology, since it is used to construct the most basic cosmological time and distance scales. It was measured for the very first time by E. Hubble in 1929 [1], although we now know that this first determination was exceedingly large, several times above the true value. Present measurements agree that it must fall in the approximate range 65-75 km/s/Mpc, but the situation regarding the available observational values of $H_0$ is disquieting, since there currently exists a significant tension between different measurements and determinations of this parameter. For instance, there is a big discrepancy between values obtained from local measurements depending on how the distances to the supernovae of Type Ia (SnIa) used in the analysis are calibrated. As an example, we can compare the value inferred from the local Hubble Space Telescope (HST) measurement [2] with the one provided in [3]; the tension between them is not small. The latter prefers the lower range and is more aligned with the investigations carried out by Allan Sandage and collaborators for many years [4], whereas the former lies in the higher range, follows the line of the previous analyses [5, 6, 7] and is also supported by other works, see e.g. [8, 11, 12, 10, 9, 13]. Recent strong lensing [14] and HII galaxy [15] measurements also favor the higher range of values. It is appropriate to mention too the recent determination of [16], obtained (as the one from [3]) by using tip of the red giant branch distances; it loosens the tension with the HST value. On the contrary, the authors of the dedicated study [17] find some indication that the local HST measurement of $H_0$ is an outlier upon performing a consistency test considering the constraints on the Hubble parameter that are obtained from a large variety of sources (including the one from Ref.
[14]), thus favoring a systematics-based explanation of the tension. In addition, we must also comment on the well-known tension between the HST value from [2] and the one from the Planck collaboration [18], derived from the analysis of the TT,TE,EE+SIMlow cosmic microwave background (CMB) data by assuming the ΛCDM cosmological model as the true one. There is a notable tension between these two determinations, and a similar one is found when comparing the HST value with the Planck determination provided in the preceding paper [19]. At this moment the mismatch between Planck's values and some of the local measurements of $H_0$ cannot be explained by any known systematic error in the data; if such an error does exist, it has not been detected yet, see Refs. [20, 21, 22]. It cannot be solved by the "Hubble bubble" hypothesis either, which states that we live in a locally underdense region of the universe, making us measure a higher value of $H_0$ than the global one. Although this idea is interesting, it has proven to be highly improbable [23, 24]. Certainly, it is quite shocking that in this age of precision cosmology, twenty years after the discovery of the speeding-up of the universe [25, 26] and almost ninety years after the discovery of its expansion [1], we still lack a consensus determination of $H_0$, the first cosmological parameter ever measured.
Some attempts at loosening this tension from a theoretical perspective have been made by several authors, assuming new physics scenarios: for instance, by exploring extensions of the vanilla ΛCDM parameter space [27, 28], by modifying the early-time physics through some kind of dark radiation [29] or early dark energy (DE) [30], by allowing DE interactions with dark matter (DM) [31], or with the aid of massive neutrinos [32]. See also the recent Ref. [33]. It is worth mentioning that when the large-scale structure formation data are also considered in the fitting analysis of the concordance model, the predicted value of $H_0$ is dragged away from Planck's best-fit estimate too, and enters in conflict with both the Planck and the HST values [34]. In [34, 35, 36] it is found that some dynamical DE models can loosen the tension in a very effective way while respecting the value preferred by Planck, so in these scenarios the lower range for the Hubble parameter is clearly favored. Previously it was shown that they provide an overall fit to the current cosmological data significantly better than that of the ΛCDM [37, 38, 40, 41, 39].
In view of the crudeness of the tension exposed above, it is very important to obtain alternative estimates of this parameter that are as reliable as possible. Median statistics have been widely applied to derive constraints from historical compilations of $H_0$ measurements [42, 43, 44], such as Huchra's compilation^1 and updated extensions of it. Although this method somehow minimizes the effect of outliers, which is of course very welcome and something that the usual weighted mean does not do, it does not incorporate the information of the individual uncertainties of the data set elements, and therefore loses part of the information. Despite this issue, it proves to be a very robust method. The central values for $H_0$ obtained by using it tend to lie in the low range, but their uncertainties are still quite large, and this fact makes them compatible with the local determinations from [2, 5, 6, 7]; see e.g. the determination obtained by the authors of [43].

^1 https://www.cfa.harvard.edu/~dfabricant/huchra/hubble.plot.dat
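The binomial counting that underlies median statistics is easy to sketch. The snippet below is our own illustration, not the code of [42, 43, 44]; the function name `median_ci` and the default confidence level are ours. It returns the median of a list of measurements together with a confidence interval built only from order statistics, deliberately ignoring the individual error bars, which is precisely the limitation discussed above.

```python
import numpy as np
from math import comb

def median_ci(values, cl=0.6827):
    """Median statistics: median plus an interval built from order statistics alone.

    With N independent measurements, the probability that the true median lies
    between the i-th and (i+1)-th ordered values is binomial: C(N, i) / 2^N.
    Individual uncertainties are deliberately ignored, as in median statistics.
    """
    v = np.sort(np.asarray(values, dtype=float))
    N = len(v)
    # P(exactly i measurements fall below the true median), for i = 0..N
    p = np.array([comb(N, i) for i in range(N + 1)]) / 2.0**N
    cum = np.cumsum(p)
    lo = int(np.searchsorted(cum, (1.0 - cl) / 2.0))  # lower bracketing order statistic
    hi = int(np.searchsorted(cum, (1.0 + cl) / 2.0))  # upper bracketing order statistic
    return np.median(v), v[max(lo - 1, 0)], v[min(hi, N - 1)]
```

Note that the interval shrinks only as the sample grows, not as the individual error bars do, which is why the uncertainties quoted by this method remain comparatively large.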
The birth of gravitational-wave multi-messenger astronomy has also come hand in hand with new measurements of the Hubble parameter. The detection of the gravitational wave and the electromagnetic counterpart produced by the merger of the binary neutron-star system GW170817 [45] has made it possible to use this event as a standard siren and measure $H_0$ [46, 47]. The value reported in [46] lies between the Planck and the HST values, while the one presented in [47] is higher. Although these constraints are still quite loose and unable to discriminate between the two values in dispute, it is worth remarking that they have been obtained without using any form of cosmic distance ladder, and are therefore free of the systematic errors that could affect the more standard astronomical determinations of cosmological distances. Nevertheless, binary neutron-star standard sirens can play an important role in the next decade. In Ref. [48] the authors demonstrate that a sufficiently large sample of these objects can independently arbitrate the tension, and this will be possible in the following years thanks to the LIGO and Virgo experiments. Other measurements lying in between the HST and Planck values are e.g. [49, 50, 51]; they are actually compatible with both Planck and the HST.
In this work we extend the Bayesian "non-parametric" analysis carried out in the recent paper [52] by including the effect of the most complete and updated data on SnIa provided in [53]. We will see how we can convert in a consistent way the Hubble rates that summarize the information of these SnIa into data on the Hubble function at the effective redshifts considered in [53], and how the low uncertainties of the SnIa data allow us to substantially reduce the error of our determination of $H_0$ with respect to the case with no SnIa data, i.e. when only cosmic chronometers (CCH) are considered. We find values that lie mostly in the lower range of $H_0$, supporting the band preferred by Planck [18, 19] and also by other recent studies, e.g. [17, 34, 52, 48]. We also reanalyze the results obtained with the Gaussian Processes (GPs) method for the CCH data sets with and without the local HST measurement, by properly accounting for the error propagation of the hyperparameters entering the kernel that controls the correlations between the points of the reconstructed function $H(z)$. We will show that this results in a slight lowering of the central values of $H_0$ and an increase of the standard deviations with respect to the values inferred without propagating the hyperparameters' errors. Despite this, the corrected values are still compatible with the uncorrected ones within uncertainties, so no important differences are found. We complement our analysis in the context of the GPs with a detailed study of the impact that the choice of the stellar population synthesis model used to extract the CCH data has on our results.
We also present a novel method based on polynomial regression and Occam's razor principle to reconstruct continuous functions from a list of data points. We apply it to the study of $H(z)$ and obtain new determinations of $H_0$ by considering various data set configurations.
The layout of this paper is as follows. In Sect. 2 we briefly describe the data sets that will be used in the current analysis. In Sect. 3 we make a quick review of the GPs method and present its basic formulas. We apply this method to the CCH and CCH+Pantheon+MCT data sets, and also study the impact of the local determination [2] on our results. Later on we apply again the GPs method to the CCH data in a fully consistent way, in order to obtain the corrected GP determination of $H_0$ by accounting for the error propagation of the kernel hyperparameters. We end Sect. 3 by studying the impact on our results of potential systematic errors affecting the CCH data, and also analyze the effect of other changes in our CCH data set. In Sect. 4 we describe in detail the novel weighted polynomial regression (WPR) method, and apply it to different data sets and under two different cosmographic perspectives to reconstruct $H(z)$ and derive $H_0$. In Sect. 5 we finally discuss the results and present our conclusions.
2 The data sets
We devote this section to presenting and describing the data that will be used in the subsequent analyses, which are based on Gaussian processes (Sect. 3) and the weighted polynomial regression approach (Sect. 4). We also provide the corresponding references, together with the model dependencies and assumptions behind these data. This is essential to ensure a correct understanding of the scope of our calculations and the range of validity of our results.
Table 1. The cosmic chronometer data points on $H(z)$ used in our main analyses, together with their uncertainties $\sigma_H$ and the references from which they are taken [55, 56, 57, 58, 59, 60, 61, 62].
2.1 Cosmic chronometers
Table 2. The Pantheon+MCT Hubble rate data points and their correlation matrix (see Sect. 2.2 and [53]).
Spectroscopic dating techniques of passively-evolving galaxies, i.e. galaxies with old stellar populations and low star formation rates, have become a good tool to obtain observational values of the Hubble function at different redshifts (see [54] and also the references in Table 1). These measurements are independent of the Cepheid distance scale and do not rely on any particular cosmological model, although they are of course subject to other sources of systematic uncertainties, such as those associated with the modeling of stellar ages, which is carried out through the so-called stellar population synthesis (SPS) techniques, see Sect. 3.4 for further details. Given a pair of ensembles of passively-evolving galaxies at two different redshifts, it is possible to infer the differential quantity $dz/dt$ from observations under the assumption of a concrete SPS model, and therefore to compute $H(z)=-\frac{1}{1+z}\frac{dz}{dt}$ too, which is the quantity we are interested in. Thus, cosmic chronometers allow us to obtain direct information about the Hubble function at different redshifts, contrary to other probes which do not directly measure $H(z)$, but integrated quantities such as luminosity distances. In Table 1 we list the CCH data points on $H(z)$ used in our main analyses, together with their corresponding uncertainties. We point out that we have used a diagonal covariance matrix for these data.
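The differential-age relation for cosmic chronometers is trivial to evaluate. The sketch below is our own illustration (the function name and the input numbers are ours); it converts a measured redshift drift $dz/dt$, in units of 1/Gyr, into $H(z)$ in km/s/Mpc:

```python
def cch_hubble(z, dz_dt):
    """Cosmic-chronometer relation H(z) = -(1+z)^{-1} dz/dt.

    `dz_dt` is the measured redshift drift in 1/Gyr (negative, since the
    redshift of a given ensemble decreases with cosmic time t); the constant
    converts 1/Gyr into km/s/Mpc: (3.0857e19 km/Mpc) / (3.1557e16 s/Gyr).
    """
    GYR_INV_TO_KM_S_MPC = 977.79
    return -dz_dt * GYR_INV_TO_KM_S_MPC / (1.0 + z)
```

For instance, a galaxy pair at z = 0.5 whose differential ages imply dz/dt of about -0.107/Gyr corresponds to H(0.5) of about 70 km/s/Mpc.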
In the current analysis we do not make use of data on $H(z)$ extracted from the measurement of baryon acoustic oscillations (BAO), in order to avoid dealing with their cosmological model dependence. In this work we will try to reconstruct the continuous curve of $H(z)$ using statistical methods as free as possible from assumptions about a particular physical cosmological model, and determine $H_0$ from the corresponding reconstructed curves. In this sense, we will try to depart as much as possible from the approach adopted in other works, e.g. [63, 64, 65, 66, 67, 68, 69, 70], in which $H_0$ is determined in the framework of concrete cosmological models.^2 The first model assumption that we will take for granted is the homogeneity and isotropy of the universe at large scales. These features are strongly supported by radiation backgrounds such as the CMB, and by counts of sources observed at wavelengths ranging from radio to gamma rays. Notice that the measurements of CCH listed in Table 1 have been obtained from galaxies located at different angles in the sky. Under the coverage of the cosmological principle (CP), the dependence of the CCH data on the angle and location of the measured galaxies is removed, and therefore we are entitled to reconstruct $H(z)$ using all the values of Table 1.

^2 We only dedicate Appendix A to a model-dependent study, more concretely to the fitting analysis of the ΛCDM and XCDM models with the data sets described in Sects. 2.1 and 2.2.
2.2 Pantheon+MCT SnIa compilation
In this work we use the Hubble rate data points, i.e. $E(z)=H(z)/H_0$, provided in [53] for six different redshifts (we omit the highest one because it is not Gaussian-distributed, see the caption of Table 2). They effectively compress the information of the SnIa that form part of the Pantheon compilation [71] (which includes the 740 SnIa of the joint light-curve analysis sample [72]), and of the 15 SnIa of the CANDELS and CLASH Multi-Cycle Treasury (MCT) programs obtained by the HST, 9 of which lie at the highest redshifts of the sample. The authors of [53] converted the raw SnIa measurements into $E^{-1}(z)$ data by parametrizing this quantity at those six redshifts. The integral over $E^{-1}(z)$ that defines the luminosity distance is then obtained by interpolating between the nodes with cubic Hermite polynomials. Finally, the overall constant $H_0$ is marginalized away along with the absolute supernova magnitude, see [53] for further details. The corresponding values of $E^{-1}(z)$ are Gaussian to very good approximation and are shown in Table 6 of [53], together with the corresponding correlation matrix. We present their inverse, $E(z)$, and the associated correlation matrix in Table 2, just for completeness; they are also almost perfectly Gaussian. It is important to remark that these values have been obtained by assuming a flat universe (and, again, the CP), and thus are model-dependent in this sense. We know, though, that these are in fact quite reasonable assumptions if our main aim is to use these data points to reconstruct $H(z)$ around the current time. Note e.g. that the TT+lowP+lensing+BAO analysis carried out by the Planck collaboration in [19] predicts a current spatial curvature density parameter fully compatible with the flat universe scenario; or the analysis from [73], which in this case favors a closed universe, although the central value of the curvature parameter is still low, around 0.006, when the model is confronted with the TT,TE,EE+lowP+lensing+BAO data.
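To make the interpolation step concrete, the sketch below mimics it under our own conventions: $E^{-1}(z)$ is given at a few nodes, a cubic Hermite interpolant joins them, and the (dimensionless, flat-universe) luminosity distance follows by integration. The node values, the derivative inputs and the function name are illustrative, not the actual pipeline of [53]:

```python
import numpy as np
from scipy.integrate import quad
from scipy.interpolate import CubicHermiteSpline

def lum_distance(z, z_nodes, inv_e_nodes, dinv_e_nodes):
    """Dimensionless D_L(z) = (1+z) * int_0^z E^{-1}(z') dz' for a flat universe.

    E^{-1}(z) is interpolated between the nodes with cubic Hermite polynomials;
    `dinv_e_nodes` holds the derivatives d(E^{-1})/dz at the nodes, which here
    are simply supplied as inputs for illustration.
    """
    inv_e = CubicHermiteSpline(z_nodes, inv_e_nodes, dinv_e_nodes)
    integral, _ = quad(lambda x: float(inv_e(x)), 0.0, z)
    return (1.0 + z) * integral
```

With a constant $E^{-1}=1$ (an empty-universe-like toy input) the interpolant is exact and the integral reduces to $(1+z)\,z$, which provides a simple sanity check of the scheme.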
One can easily check that the relative change in $E(z)$ caused by these tiny deviations from flatness is really small in the redshift range covered by the data, much smaller than the relative uncertainties of our data points and also than those of the reconstructed functions (see e.g. our Figs. 2, 8 and 11). Thus, given the sensitivity of the data we are dealing with, the assumption of a flat universe has a negligible impact on our results.
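This claim is easy to verify numerically. Assuming for illustration a ΛCDM background with $\Omega_m = 0.3$ (an illustrative value, not a fit result), the sketch below compares $E(z)$ in a flat universe with a mildly closed one whose curvature parameter has the small magnitude quoted above:

```python
import numpy as np

def E(z, om=0.3, ok=0.0):
    """Normalized Hubble rate E(z) = H(z)/H0 for LCDM with spatial curvature."""
    ol = 1.0 - om - ok  # dark energy density fixed by the sum rule
    return np.sqrt(om * (1 + z)**3 + ok * (1 + z)**2 + ol)

z = np.linspace(0.0, 2.0, 201)
# mildly closed universe, |Omega_k| = 0.006 (illustrative, cf. the value above)
rel_diff = np.abs(E(z, ok=-0.006) / E(z) - 1.0)
```

Over this redshift range the relative deviation stays at the 0.3% level or below, indeed far smaller than the percent-level uncertainties of the data points.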
2.3 Local HST determination of $H_0$
In some of the analyses of this work we also use the local HST measurement of $H_0$ from [2], obtained with the cosmological distance ladder method, using Cepheids and SnIa.
3 The Gaussian Processes method
A Gaussian process is a distribution over functions, i.e. it generalizes the idea of a Gaussian distribution for a finite number of quantities to the continuum. Given a set of Gaussian-distributed data points, one can use Gaussian processes to reconstruct the most probable underlying continuous function describing the data, and also obtain the associated confidence bands, without assuming a concrete parametrization of the aforesaid function. Instead, one assumes a concrete kernel, see below and the book [74]. This method has been used in several cosmological studies to reconstruct e.g. the DE equation of state parameter [75, 76, 77] or the expansion history [78, 79, 80, 81, 52, 82]. In the next subsection we review the main ideas behind the GPs method and the formulas that are needed to implement it.
3.1 Reconstruction of functions with Gaussian Processes
A multivariate normal distribution is defined by a vector of mean values and a covariance matrix. Analogously, a Gaussian process is defined by two entities: its mean function, $\mu(z)$, and its two-point covariance function, $C(z,\tilde{z})$,

$f(z) \sim \mathcal{GP}\left(\mu(z), C(z,\tilde{z})\right)\,.$  (3.1)
By definition, a realization of a Gaussian process is a continuous curve, and it is possible to compute the probability of finding a realization inside any given region. Let us now focus our attention on the covariance function for a while. At those redshifts at which we do not have any data point, but at which we want to obtain the value of the reconstructed function, we will define $C(z,\tilde{z})=K(z,\tilde{z})$. $K(z,\tilde{z})$ is called the kernel function and at this stage it is completely unknown. The only thing that we know about it is that it is a symmetric function that encapsulates the information about the strength of the correlations between the values of the reconstructed function at two different points $z$ and $\tilde{z}$, and also about the amplitude of the deviations from the mean value at a given location. For the points $z_i$ at which we do have data, we know something more. In principle we have access to their observational correlations and uncertainties, and therefore we have their covariance matrix, $C_{ij}$. Of course, the latter can also incorporate the effect of some theoretical uncertainties that might enter the calculation of the final observational output. We define $C(z_i,z_j)=K(z_i,z_j)+C_{ij}$. This is the first step to establish the connection between the data and the underlying function. The correlations between a general point $z$ and a data point $z_i$ are also unknown, so we just define again $C(z,z_i)=K(z,z_i)$. It is obvious that if we want the reconstructed function to vary smoothly, $K$ cannot be diagonal, i.e. it cannot be just proportional to a Dirac delta, since if it was it would only be able to describe noise at those locations with no data points. There are many options for the kernel function. Three of the most popular ones with only two degrees of freedom are the following:

The Gaussian kernel:
$K(z,\tilde{z}) = \sigma_f^2\,\exp\left(-\frac{(z-\tilde{z})^2}{2\ell^2}\right)\,;$  (3.2)
The Cauchy kernel:
$K(z,\tilde{z}) = \sigma_f^2\,\frac{\ell}{(z-\tilde{z})^2+\ell^2}\,;$  (3.3)
The Matérn kernel:
$K(z,\tilde{z}) = \sigma_f^2\,\exp\left(-\frac{3|z-\tilde{z}|}{\ell}\right)\left[1+\frac{3|z-\tilde{z}|}{\ell}+\frac{27(z-\tilde{z})^2}{7\ell^2}+\frac{18|z-\tilde{z}|^3}{7\ell^3}+\frac{27(z-\tilde{z})^4}{35\ell^4}\right]\,.$  (3.4)
$\sigma_f$ and $\ell$ are the so-called hyperparameters of the kernel function and play a similar role in the three cases shown above. The first one controls the size of the uncertainties and the strength of the correlations, whereas the second somehow limits the range of these correlations in $z$, i.e. for distances $|z-\tilde{z}|\gg\ell$ the values of the underlying function at $z$ and $\tilde{z}$ are very weakly correlated. A priori one cannot exclude the possibility that the underlying function prefers a kernel which does not only depend on the distance $|z-\tilde{z}|$, but also on the locations $z$ and $\tilde{z}$ themselves, in such a way that the symmetry under the interchange $z\leftrightarrow\tilde{z}$ is kept intact. This is equivalent to saying that the hyperparameters could be functions of $z$ and $\tilde{z}$ as well. We will stick in this work, though, to the simpler kernels (3.2)-(3.4). Obviously, the reconstructed function will depend on the values of the hyperparameters and, in principle, also on our choice of the kernel. Once we select the latter, how can we properly select $\sigma_f$ and $\ell$? At this point we must make use of our data. In the GPs philosophy, our data set is conceived as part of a subset of realizations of the Gaussian process. The hyperparameters are chosen so as to maximize the probability of the GP producing our data set. If we marginalize the GP (3.1) over the points at which we have no data, we get the following multivariate normal distribution,
$\vec{y} \sim \mathcal{N}\left(\vec{\mu},\, K(Z,Z)+C\right)\,,$  (3.5)
where $Z=(z_1,\dots,z_N)$, with $N$ being the dimension of the vector $\vec{y}$ of data points at our disposal, and $\vec{\mu}$ can be set e.g. to $\vec{0}$, since the result is almost insensitive to this choice. Thus, the hyperparameters will be obtained upon the minimization of
$-2\ln\mathcal{L} = (\vec{y}-\vec{\mu})^T\left[K(Z,Z)+C\right]^{-1}(\vec{y}-\vec{\mu}) + \ln\left|K(Z,Z)+C\right| + N\ln 2\pi\,,$  (3.6)
with $\mathcal{L}$ being the marginal likelihood and $|K(Z,Z)+C|$ the determinant of $K(Z,Z)+C$. This is the procedure described in [77] and used in e.g. [78, 79, 52] to compute the hyperparameters. Having done this, and using (3.1), we can compute the conditional probability of finding a given realization of the Gaussian process at the points of interest $z^*$, given the observed data. The resulting mean and variance functions extracted from such a conditioned GP read, respectively,
$\bar{f}(z^*) = \mu(z^*) + K(z^*,Z)\left[K(Z,Z)+C\right]^{-1}(\vec{y}-\vec{\mu})\,,$  (3.7)
$\sigma^2(z^*) = K(z^*,z^*) - K(z^*,Z)\left[K(Z,Z)+C\right]^{-1}K(Z,z^*)\,.$  (3.8)
These are the main outcomes of the GPs method, and will be used repeatedly throughout this paper. Two comments are in order now:

The GPs method is on some occasions presented as being model-independent (see e.g. [81]), and as non-parametric on others (see e.g. [77]). It is important to qualify these attributes properly. Although it is true that GPs are independent of any cosmological model, they are not model-independent in the technical sense, since one has to choose a specific kernel, which is in charge of controlling the correlations between different points of the reconstructed function. Thus, the shape of the latter can be modified to a greater or lesser extent depending on our particular choice of such kernel, especially at those locations that are poorly constrained by the data. Strictly speaking, the GPs method is also parametric or, to be more precise, hyperparametric. The model dependence and the parametric nature of the GPs are probably more hidden than in the usual parametric approaches, but these features are still there. So GPs are model-dependent in the sense explained before, and also (hyper)parametric.

As recognized in [77], a complete Bayesian analysis would demand marginalizing over the hyperparameters, rather than optimizing Eq. (3.6). The authors of [77] state that in most cases the log marginal likelihood is sharply peaked, and that in such cases the optimization is a very good approximation of the marginalization procedure. The authors of [78, 79, 52] do not marginalize either; they apply the same methodology as in [77]. At this stage we will apply this approximate scheme too, but later on we will explicitly show that the distribution of the hyperparameters is not peaked at all in the problem under study, and that a fully consistent analysis forces us to marginalize over them, rather than just assume that their distribution is Dirac-delta-like. This is equivalent to saying that we will take into account the propagation of the hyperparameters' uncertainties, something that was neglected in the aforementioned previous studies.
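To fix ideas, the whole pipeline of Eqs. (3.2), (3.6), (3.7) and (3.8) fits in a few lines. The sketch below is a minimal illustration of our own with the Gaussian kernel and a zero mean function; it is neither the GaPP package nor our production code, and the function names are ours. It optimizes the hyperparameters by minimizing the negative log marginal likelihood and then evaluates the posterior mean and standard deviation at the requested points:

```python
import numpy as np
from scipy.optimize import minimize

def gauss_kernel(z1, z2, sigma_f, ell):
    # Gaussian kernel, Eq. (3.2)
    d = np.subtract.outer(z1, z2)
    return sigma_f**2 * np.exp(-d**2 / (2.0 * ell**2))

def minus_log_marginal(theta, z, y, C):
    # one half of Eq. (3.6); theta holds (ln sigma_f, ln ell) for positivity
    sigma_f, ell = np.exp(theta)
    K = gauss_kernel(z, z, sigma_f, ell) + C
    sign, logdet = np.linalg.slogdet(K)
    if sign <= 0:
        return 1e10  # guard against non-positive-definite K
    return 0.5 * (y @ np.linalg.solve(K, y) + logdet + len(y) * np.log(2 * np.pi))

def gp_reconstruct(z_star, z, y, C, theta0):
    # optimize the hyperparameters, then apply Eqs. (3.7)-(3.8) with mu = 0
    res = minimize(minus_log_marginal, theta0, args=(z, y, C), method="Nelder-Mead")
    sigma_f, ell = np.exp(res.x)
    K = gauss_kernel(z, z, sigma_f, ell) + C
    Ks = gauss_kernel(z_star, z, sigma_f, ell)
    mean = Ks @ np.linalg.solve(K, y)                                # Eq. (3.7)
    cov = gauss_kernel(z_star, z_star, sigma_f, ell) \
        - Ks @ np.linalg.solve(K, Ks.T)                              # Eq. (3.8)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

Feeding it a set of H(z) points and evaluating at z* = 0 directly yields an H0 estimate with its uncertainty, which is the use made of the method in the next subsection.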
In the following subsection we apply the GPs method to different data sets in order to reconstruct $H(z)$ and derive the corresponding values of $H_0$.
3.2 Determination of $H_0$ with GPs, cosmic chronometers, the Pantheon+MCT SnIa compilation, and the local HST measurement
Kernel           Data set(s)
Gaussian (3.2)   CCH                      1.20    +0.10
                 CCH+Pantheon+MCT         2.71    +0.07
                 CCH+HST                  0.13    +3.68
                 CCH+HST+Pantheon+MCT     1.10    +3.23
Cauchy (3.3)     CCH                      0.71    +0.50
                 CCH+Pantheon+MCT         2.66    +0.10
                 CCH+HST                  0.11    +3.67
                 CCH+HST+Pantheon+MCT     1.09    +3.23
Matérn (3.4)     CCH                      0.71    +0.29
                 CCH+Pantheon+MCT         2.71    -0.42
                 CCH+HST                  0.13    +3.61
                 CCH+HST+Pantheon+MCT     1.00    +3.08

Table 3. Summary of the GPs analyses. For each kernel and data set combination we quote the distance, in units of $\sigma$ (cf. Eq. (3.9)), between the inferred value of $H_0$ and the local HST measurement [2] (third column) and the Planck value (fourth column). "HST" in the data set column denotes the inclusion of the local measurement [2].
We now apply the methodology explained above, with the various GP kernels (3.2)-(3.4), to reconstruct $H(z)$ from different data sources and infer the value of the Hubble parameter in each of these scenarios. The main outcomes of our analysis are shown in Table 3. The results obtained by using only cosmic chronometers coincide with those presented in Ref. [52], see their Table 2 (Sample 3_0). We have used our own numerical programs, whereas the authors of [52] use the open-source Python package GaPP [77]; no difference can be appreciated between our results and theirs, which is of course reassuring. When only CCH are used in the reconstruction of $H(z)$, it might seem difficult to obtain consistent determinations of $H_0$ from the three alternative kernels. Nothing could be further from the truth: although the central values are not coincident and the differences can be even of two units (compare e.g. the results obtained with the Gaussian and the Cauchy kernels), the three predictions are mutually compatible within uncertainties.
This is also manifestly shown in Fig. 1, where we plot the relative differences between the reconstructed functions obtained with the various kernels. Thus, the results obtained by using only CCH are fully consistent with each other, and seem to prefer the lower range of values advocated in e.g. [3, 17, 18, 19], although they are still compatible with the measurement from [2], see the distances collected in the last columns of Table 3. There we show the distance $d$ (in units of $\sigma$) between two determinations $a$ and $b$,
$d(a,b) = \frac{|v_a - v_b|}{\sqrt{\sigma_a^2+\sigma_b^2}}\,,$  (3.9)
where $v_{a,b}$ are the central values and $\sigma_{a,b}$ the corresponding uncertainties. These results are telling us that cosmic chronometers cannot yet pronounce a verdict in favor of or against the local HST measurement; they still lack the required resolution power. This can also be seen by looking at the values of $H_0$ that are obtained by combining CCH and the HST measurement. The CCH data have little impact, and are only able to slightly reduce the original HST determination, both in its central value and its uncertainty. Nevertheless, the CCH data of Table 1 tend to favor central values lying in the lower range of $H_0$, and only the future will tell us whether new CCH measurements will persist in favoring this region, by shrinking the error bars more and more while keeping the central value fixed. It is also important to mention that the choice of the SPS model used to extract the CCH data might have a certain impact on our results. This will be analyzed in detail in Sect. 3.4.
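Eq. (3.9) is simply the difference of two determinations in units of their combined uncertainty. As a minimal sketch (the function name and the input numbers below are illustrative, not values taken from our tables):

```python
import numpy as np

def tension(v1, s1, v2, s2):
    """Distance between two measurements v1 +/- s1 and v2 +/- s2, Eq. (3.9)."""
    return abs(v1 - v2) / np.sqrt(s1**2 + s2**2)

# two hypothetical H0 determinations, in km/s/Mpc
d = tension(68.0, 2.0, 73.5, 1.7)
```

Note that two determinations separated by several units can still agree well if their combined uncertainty is large, which is exactly the situation of the CCH-only results discussed above.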
Let us now describe how we have incorporated the Pantheon+MCT compilation into our analysis. In order to use these data in the reconstruction of the $H(z)$ curve, we are forced to translate the Hubble rate data points of Table 2 into Hubble function data. They are related through $H(z)=H_0 E(z)$. The linking element is $H_0$, but notice that $H_0$ is precisely what we are looking for: in the case of the CCH+Pantheon+MCT data set, for instance, we lack an initial value of the Hubble parameter. In order to solve this problem we have applied an iterative numerical procedure. First of all we make use of the GPs method to derive from (only) the CCH data a value of $H_0$. Then we combine this with the values of the Hubble rate of Table 2 by using a Monte Carlo routine to properly compute the central values and uncertainties of the processed $H(z_i)$. This is how we promote the Hubble rates of the Pantheon+MCT compilation to data points on $H(z)$. Then we can apply again the GPs method and derive the "corrected" value of $H_0$. We can repeat these steps as many times as we want. It turns out that the resulting value of $H_0$ and its uncertainty converge, so we can stop the iteration process when we attain the desired precision level; in our case we stop the routine when the difference between two successive values of $H_0$ falls well below the uncertainties for $H_0$ shown in Table 3. With this iterative routine we find with the combined CCH+Pantheon+MCT data set central values which are even lower than those found using only CCH (see also Fig. 3). Recall that the Pantheon+MCT data set is obtained by compressing the information of more than a thousand SnIa, so it is quite significant that this data set favors the lower range of $H_0$, as the CCH themselves do. The values obtained with the three different kernels are again consistent with each other, and their uncertainties are strongly reduced with respect to the case when only CCH data are considered.
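One step of the promotion described above can be sketched as follows: given the current estimate of $H_0$ and the (correlated) Hubble rate points, a Monte Carlo run yields the central values and uncertainties of $H(z_i)=H_0 E(z_i)$. In the full pipeline this step alternates with the GP reconstruction until $H_0$ converges; the function name and all input numbers below are illustrative:

```python
import numpy as np

def promote_E_to_H(H0, sig_H0, E, cov_E, n=200_000, seed=0):
    """Monte Carlo propagation of H(z_i) = H0 * E(z_i).

    H0 is drawn as an independent Gaussian and the E(z_i) as a correlated
    multivariate Gaussian; returns the central values and standard deviations
    of the resulting H(z_i) samples.
    """
    rng = np.random.default_rng(seed)
    h0 = rng.normal(H0, sig_H0, size=n)
    e = rng.multivariate_normal(E, cov_E, size=n)
    H = h0[:, None] * e  # one H(z_i) sample per (H0, E) draw
    return H.mean(axis=0), H.std(axis=0)
```

Because the same $H_0$ draw multiplies every $E(z_i)$, the promoted points inherit correlations on top of those of the Hubble rates, which the Monte Carlo treatment captures automatically.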
To be concrete, the decrease in the uncertainty is substantial, and this makes our determination completely competitive with the HST measurement. The tension between the CCH+Pantheon+MCT value and the latter is strong in all three cases (cf. Table 3 and Figs. 2-3). This result is aligned with the conclusions of Ref. [17], in which the authors show that the HST measurement is highly probable to be an outlier due to some kind of systematics; this is also pointed out in [34]. Other local measurements, though, such as the one from Ref. [14], slightly favor the HST preferred range. Thus the discussion is still open.
When the CCH data are combined with the local HST measurement, the resulting determination of the Hubble parameter lies very close to the latter, and the uncertainty remains almost unaltered. This is due to the fact that the HST measurement has a low uncertainty and, in addition, is a direct measurement of $H_0$, so its weight in the GP determination of $H_0$ is greater than that of the CCH data points. This also makes the results with the three kernels basically the same. Notice that in the CCH+Pantheon+MCT analysis something similar happens when compared with the scenario with only CCH, since by including the Pantheon+MCT data we add points that are very close to $z=0$, and with very low uncertainties, as is palpable from the two plots at the bottom of Fig. 2. This basically reflects what we could have already expected: when we have very precise data, the values of the reconstructed function at these points and in their vicinity are almost the same, independently of the kernel used in the analysis. This is actually something that must be demanded not only of the GPs but of any reconstruction method, as a matter of consistency.
Finally, if we consider the full CCH+HST+Pantheon+MCT data set we obtain again coincident values using the three kernels (3.2)-(3.4), for the very same reasons explained above. Notice that due to the effect of the Pantheon+MCT data our determination of $H_0$ is now lower than in the CCH+HST case, and a small tension appears with the HST measurement itself, although only a mild one. The tension between our determination from CCH+HST+Pantheon+MCT and Planck's prediction for the ΛCDM model is much larger in this case.
From the study carried out in this subsection we would like to highlight two things that we deem important. First, the results obtained with the three different kernels, although not coincident in some cases, are always consistent with each other. This can be clearly seen by taking a look at Fig. 1 and comparing e.g. the values of the fourth column of Table 3, where we explicitly show the level of discrepancy between our determinations of $H_0$ (obtained with the various data sets) and the HST measurement. The reader can check that for the three kernels used in this work we obtain almost exactly the same predictions once we choose a particular data set. And second, we have shown that the cosmic chronometers used in our analyses, and more conspicuously the Pantheon+MCT data set, prefer the lower range of values for $H_0$ rather than the HST-preferred region, and that the tension between the CCH+Pantheon+MCT result and the HST measurement is quite significant.
3.3 Error propagation of the kernel hyperparameters
The analysis of the previous subsection has been carried out by assuming (as in e.g. [77, 78, 79, 52]) that the distributions of the hyperparameters are very peaked, effectively given by Dirac deltas located at the values at which the marginal log likelihood (3.6) is maximum. Now we study the changes introduced in our results when we apply the exact procedure and marginalize over the hyperparameters, in the case in which only CCH data are used. This is the correct way to account for their error propagation.
First of all, we have drawn the exact distributions of the two kernel hyperparameters from (3.6) by means of a Monte Carlo routine. The result obtained for the Gaussian kernel is presented in Fig. 4. Such distributions are far from being peaked, so in principle one could expect some important corrections to the determinations of $H_0$ shown in Table 3. Similar shapes for the histograms of the hyperparameters are found when the Cauchy or the Matérn kernels are used, so the analysis of this subsection proves to be fully necessary. Our numerical results are presented in Table 4. They have been obtained with a Monte Carlo Metropolis-Hastings algorithm [83, 84] that has allowed us to draw the exact distributions of $H_0$ for the three kernels under use. For each of them we show two different values of $H_0$. The ones on the top correspond to the mean and associated standard deviation extracted from the resulting histograms of $H_0$. The ones on the bottom are the best-fit values and the corresponding uncertainties at the usual confidence levels. There is an obvious mismatch between the former and the latter, which is more pronounced for the uncertainties than for the central values. The reason is simple: the exact distributions are not exactly Gaussian, and this produces these small differences.
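A minimal sketch of such a random-walk Metropolis-Hastings exploration of the hyperparameter space is the following; the data, the starting point and the proposal scales are illustrative placeholders, and the Gaussian-kernel log marginal likelihood is written in its standard form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Mock data standing in for the CCH points (placeholders).
z = np.array([0.1, 0.4, 0.9, 1.3, 1.8])
y = np.array([69.0, 82.0, 105.0, 140.0, 175.0])
s = np.array([5.0, 6.0, 8.0, 10.0, 12.0])

def log_marginal(theta):
    """Standard GP log marginal likelihood for a Gaussian kernel."""
    sf, ell = theta
    if sf <= 0 or ell <= 0:
        return -np.inf
    d = z[:, None] - z[None, :]
    K = sf**2 * np.exp(-d**2 / (2 * ell**2)) + np.diag(s**2)
    sign, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + len(z) * np.log(2 * np.pi))

# Random-walk Metropolis-Hastings over the two hyperparameters.
theta = np.array([100.0, 2.0])          # illustrative starting point
logp = log_marginal(theta)
chain = []
for _ in range(5000):
    prop = theta + rng.normal(scale=[10.0, 0.3], size=2)
    logp_prop = log_marginal(prop)
    if np.log(rng.uniform()) < logp_prop - logp:     # accept/reject step
        theta, logp = prop, logp_prop
    chain.append(theta.copy())
chain = np.array(chain)[1000:]           # discard burn-in

print(chain.mean(axis=0))                # posterior means of the hyperparameters
```

Histograms of the chain columns would correspond to the hyperparameter distributions shown in Fig. 4.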
What can we highlight from these corrected results? They are again compatible with each other, so the outputs obtained with the three kernels are consistent. They are also perfectly compatible with the results of Table 3 obtained when only CCH data are used. The corrections introduced by properly propagating the errors of the hyperparameters are not dramatic at all. It is true that for all three kernels there is a reduction of the central values, which makes our determinations even more consistent with other values found in the literature lying in the lower range of $H_0$, but it is also true that the standard deviations grow as a direct consequence of the propagation of the hyperparameters' errors. This adds an extra uncertainty to the final results, which are still compatible with the previous ones.
We also want to remark that we cannot add the Pantheon+MCT data points and apply the recursive procedure explained in Sect. 3.2: in this case, in which we take into consideration the propagation of the hyperparameters' errors, we obtain a collection of non-Gaussian-distributed data points for the Hubble function and, therefore, it is not consistent to apply the GPs method, which by definition assumes Gaussian-distributed data. This deviation from Gaussianity is produced by that of $H_0$, which we infer in the first step of the iterative process directly from the CCH (see Table 4). The latter breaks the Gaussianity of the original Pantheon+MCT Hubble rate data. Despite this, the present analysis has also served to show that the main results of Sect. 3.2 are not strongly altered. Although the shapes of the kernel hyperparameters' distributions are not Dirac-delta-like to a good approximation, we have explicitly checked that if we marginalize over the hyperparameters instead of optimizing the log likelihood (3.6), the conclusions extracted from Sect. 3.2 are kept intact.
Before ending this section, we would like to study an alternative kernel with three degrees of freedom, one more than in (3.2)-(3.4). It is the so-called rational quadratic kernel,
$C(z,\tilde{z}) = \sigma_f^2\left[1 + \frac{(z-\tilde{z})^2}{2\alpha\ell^2}\right]^{-\alpha}\,.$   (3.10)
[Table 4: determinations of $H_0$ obtained by propagating the errors of the kernel hyperparameters, for the Gaussian (3.2), Cauchy (3.3), Matérn (3.4) and rational quadratic (3.10) kernels; for each kernel we list the mean with its standard deviation, and the best-fit value with the corresponding uncertainties.]
Apart from $\sigma_f$ and $\ell$ we now have an extra parameter, $\alpha$. The latter controls the departure of the kernel (3.10) from the Gaussian one (3.2). Note that for very large values of $\alpha$ we can Taylor-expand (3.10) around $1/\alpha = 0$ by using
$\ln\left(1+\frac{x}{\alpha}\right) = \frac{x}{\alpha} - \frac{x^2}{2\alpha^2} + \mathcal{O}(\alpha^{-3})\,,$   (3.11)
where we have defined $x \equiv (z-\tilde{z})^2/(2\ell^2)$. The result reads,
$C(z,\tilde{z}) = \sigma_f^2\, e^{-x}\left[1 + \frac{x^2}{2\alpha} + \mathcal{O}(\alpha^{-2})\right]\,.$   (3.12)
Using the GPs method with the kernel (3.10) and the cosmic chronometer data we obtain one value of $H_0$ if we just optimize the log marginal likelihood (3.6). If, instead, we propagate the errors of the hyperparameters we find the values presented in Table 4. It is remarkable that only in the latter case do we recover the results obtained with the Gaussian kernel. This is possible because the CCH data prefer very large values of $\alpha$ and, therefore, in this case the limit $\alpha\to\infty$ applies, so the rational quadratic kernel (3.10) just reduces to the Gaussian one (3.2), and the best-fit values of $\sigma_f$ and $\ell$ coincide with the ones found with (3.2).
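Assuming the standard form of the rational quadratic kernel quoted in (3.10), its convergence towards the Gaussian kernel (3.2) for large $\alpha$ can be checked numerically in a few lines; the hyperparameter values are illustrative:

```python
import numpy as np

def k_gauss(r, sf, ell):
    """Gaussian kernel as a function of the separation r = z - z~."""
    return sf**2 * np.exp(-r**2 / (2 * ell**2))

def k_rq(r, sf, ell, alpha):
    """Rational quadratic kernel, standard form."""
    return sf**2 * (1 + r**2 / (2 * alpha * ell**2)) ** (-alpha)

r = np.linspace(0.0, 5.0, 101)
for alpha in (1.0, 10.0, 1000.0):
    gap = np.max(np.abs(k_rq(r, 1.0, 1.0, alpha) - k_gauss(r, 1.0, 1.0)))
    print(alpha, gap)   # the maximal gap shrinks as alpha grows
```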
3.4 Possible systematics affecting the CCH data, and their impact on our results
The most important source of potential systematic errors affecting the CCH data comes from the stellar population synthesis (SPS) model adopted to compute the absolute ages of the passively-evolving galaxies involved in the analyses. For instance, in the works [58, 59] the authors provide 13 values obtained by using two alternative SPS models: the one from Ref. [85], which we call BC03; and the one from Ref. [86], which we call MaStro. By comparing the BC03 and MaStro values reported in Refs. [58, 59] one can see that 10 out of the 13 measurements are higher when the MaStro SPS model is used instead of the BC03 one. The differences are not very large: they remain small in all cases except for two data points, which reach a somewhat higher level of discrepancy. Almost all of them are therefore individually compatible, but it is important to perform a detailed study of their collective impact on our results. This is the main aim of this subsection.
Let us start by describing which SPS models have been used to produce the CCH data points listed in Table 1 and employed in the analyses carried out in the previous subsections. In Refs. [55, 60, 61] the authors only provide the values obtained with the BC03 model; these constitute a substantial fraction of the whole CCH data set presented in Table 1. In Ref. [62] only the combined MaStro/BC03 values are available, whereas in Refs. [56, 57] an alternative SPS model is used, different from the MaStro and BC03 ones; these points constitute another fraction of the CCH data set. In contrast, in Refs. [58, 59] the authors provide both the BC03 and MaStro values (cf. Tables 4 and 3 of [58] and [59], respectively); this group includes the remaining data. We have opted to use the BC03 values of these two references in our main analyses so as to incorporate consistently the data from [55, 60, 61], namely to avoid the use of a mixture of MaStro and BC03 values while maximizing the number of data points entering the calculations. Now we want to determine how robust our results are by studying their response to different changes of our CCH data set.
For instance, how do our determinations of $H_0$ change if we use the measurements obtained with the MaStro model in Refs. [58, 59] instead of the BC03 ones? Let us stick for simplicity to the results obtained with the Gaussian kernel and without propagating the errors of the hyperparameters. If we only use CCH data, the central value increases with respect to the first value of the third column in Table 3. Now it falls in the HST preferred range, and this shows that there exists a systematic uncertainty comparable to the statistical one. Nevertheless, the difference is not significant from the statistical point of view, since the two values are mutually compatible. When the Pantheon+MCT Hubble rates are also included, the result is again pulled towards lower values. The tension with the HST measurement becomes mild, and the result is again compatible with those obtained using both the CCH measurements derived in the context of the BC03 SPS model and the Pantheon+MCT data (cf. Table 3, second number of the third column). The lowering of $H_0$ is mostly driven by two of the Hubble rate data points (cf. Table 2), which apart from having a very low relative uncertainty also take low central values, as can be seen in Fig. 1 of Ref. [53]. If we repeat the test removing these two data points we obtain a much larger value for the Hubble parameter, as expected. The BC03 and MaStro results clearly favor the lower band of $H_0$ values when the SnIa data are also taken into account. This is also aligned with the results presented in the very recent Ref. [48], in which the authors use BAO data from BOSS together with the Pantheon compilation of SnIa to determine $H_0$. They use the sound horizon at radiation drag inferred by Planck as a prior in their analysis, and find a value that is fully consistent with our results and in line with other works in the literature, e.g. [17, 34, 52, 21].
Due to the fact that there is no apparent reason to rely on the MaStro-derived measurements instead of the BC03-derived ones, it is probably better to use the weighted average of the MaStro and BC03 values, and sum the difference between them in quadrature to the total error, analogously to what is done in Ref. [62]. Let us firstly assume that the two SPS models provide completely independent (uncorrelated) results, which of course can only be a first rough approximation. The weighted average of the $i$-th BC03 determination, $H_i^{\rm B}\pm\sigma_i^{\rm B}$, and the MaStro one, $H_i^{\rm M}\pm\sigma_i^{\rm M}$, is given by
$\bar{H}_i = \bar{\sigma}_i^2\left[\frac{H_i^{\rm B}}{(\sigma_i^{\rm B})^2} + \frac{H_i^{\rm M}}{(\sigma_i^{\rm M})^2}\right]\,,$   (3.13)
with
$\bar{\sigma}_i^2 = \left[\frac{1}{(\sigma_i^{\rm B})^2} + \frac{1}{(\sigma_i^{\rm M})^2}\right]^{-1}$   (3.14)
being the associated statistical uncertainty. The total uncertainty is obtained upon summing the systematic error in quadrature to the statistical one. We have recomputed the Hubble parameter by using these processed CCH data points, both with and without the Pantheon+MCT data.
Let us consider now the most unfavorable scenario, in which the BC03 and MaStro values are completely correlated. The generalization of (3.14) that includes the effect of these correlations is
$\bar{\sigma}_i^2 = \frac{(\sigma_i^{\rm B}\sigma_i^{\rm M})^2\,(1-\rho_i^2)}{(\sigma_i^{\rm B})^2 + (\sigma_i^{\rm M})^2 - 2\rho_i\,\sigma_i^{\rm B}\sigma_i^{\rm M}}\,,$   (3.15)
where $\rho_i$ is the correlation coefficient between the $i$-th BC03 and MaStro determinations. Notice that for $\rho_i = 0$ we retrieve (3.14), as desired. We have redone the computation under this assumption, with and without the Pantheon+MCT data. In this case the statistical uncertainty increases, since now we are losing part of the information carried by the two values due to the existing correlations. Nevertheless, the increase is very small and the results are essentially the same as those obtained by considering uncorrelated measurements. Again, no significant deviations from the results presented in Table 3 are found; they prove to be quite robust. When we use only data coming from CCH the results are more sensitive to the choice of SPS model, something that was already noted in Ref. [78]. The determinations of $H_0$ obtained in all these checks, though, are fully compatible, and still seem to favor the lower range of values, especially when the SnIa information is also considered. This can also be seen in Fig. 5, where we show the reconstructed curves obtained when the BC03, MaStro and the averaged BC03/MaStro data are used in the analysis.
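A compact sketch of the combination procedure of Eqs. (3.13)-(3.15) is the following. The input values are illustrative placeholders, the correlated case uses the standard best-linear-unbiased-estimator weights, and the convention adopted for the systematic error (half the difference between the two determinations) is an assumption made here for illustration only:

```python
import numpy as np

def combine(h_b, s_b, h_m, s_m, rho=0.0):
    """Combine a BC03-like and a MaStro-like measurement of the same quantity."""
    if rho == 0.0:
        var = 1.0 / (1.0 / s_b**2 + 1.0 / s_m**2)            # eq. (3.14)
        mean = var * (h_b / s_b**2 + h_m / s_m**2)           # eq. (3.13)
    else:
        den = s_b**2 + s_m**2 - 2 * rho * s_b * s_m
        w_b = (s_m**2 - rho * s_b * s_m) / den               # BLUE weights
        w_m = (s_b**2 - rho * s_b * s_m) / den
        mean = w_b * h_b + w_m * h_m
        var = (s_b * s_m)**2 * (1 - rho**2) / den            # eq. (3.15)
    # Assumed convention (not from the text): half the model difference
    # added in quadrature as a systematic error.
    sys = 0.5 * abs(h_b - h_m)
    return mean, np.sqrt(var + sys**2)

mean, err = combine(95.0, 17.0, 104.0, 23.0, rho=0.0)  # placeholder values
print(mean, err)
```

For moderate positive correlations the statistical variance grows with respect to the uncorrelated case, which is the behavior described above.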
Let us perform yet another check: how do our results change if we only use data points obtained with the BC03 SPS model, namely if we avoid the use of the CCH data points of Refs. [56, 57, 62]? The values obtained with and without the Pantheon+MCT data are again fully compatible with the results presented in Table 3.
We have also performed some additional tests. We have studied, for instance, the impact on our results of removing the most precise data points from the CCH data set of Table 1. The uncertainties for $H_0$ grow, as expected. We have also determined the Hubble parameter by removing all the data points from Refs. [58, 59]. Again, we can conclude that our determinations remain stable and are in any case fully compatible with our previous results. The consistency is kept intact even when we test the contribution of different redshift regions of the CCH data: the value of $H_0$ obtained with the Pantheon+MCT SnIa data together with the low-redshift CCH points is completely compatible with the one obtained when we consider the high-redshift CCH points, or when we just take the full CCH data set.
The authors of Ref. [79] accounted for the systematics introduced by the choice of a particular SPS model by removing from their data set ([57, 61, 58]) those CCH data points which, according to [58], have the greatest dependence on the assumed SPS model. They also added an extra uncertainty to one of the remaining points. Using GPs they obtained a value of the Hubble parameter in the lower range. When the systematics are not taken into account and the complete set of CCH measurements from [57, 61, 58] is used, the analysis reduces to the one of [78]. Both results are compatible and lie in the lower range as well.
Another interesting issue we can address concerns the covariance matrix of the CCH data. There is no correlation matrix available in the literature for these data, so we have used a diagonal one in our analyses, as described in Sect. 2.1. In order to estimate the potential impact of a hypothetical non-diagonal covariance matrix for the CCH data, we have removed the correlations between the Pantheon+MCT data points and have recomputed the resulting value of the Hubble function by using the CCH+Pantheon+MCT data set. We deem this a useful check, since the promoted Hubble function data obtained from the product of the $H_0$ value extracted from CCH and the SnIa Hubble rates have relative uncertainties as low as the most precise CCH data points and, therefore, studying the stability of our results under changes of the covariance matrix in the SnIa sector can give us a good idea of their sensitivity to changes of the covariance matrix in the CCH sector. The value we obtain is well within the uncertainty region of the result obtained when correlations are duly considered (cf. Table 3). Thus, no dramatic alteration of our results is produced, which again speaks for their robustness. Despite this, there is no good reason to think that the covariance matrix for the promoted Pantheon+MCT data is the same as or similar to the one for the CCH data. Therefore, this must be conceived only as a first approximate test.
These tests must be thought of as useful, but not definitive. We have studied the impact of several modifications of our data set on our results and we have seen that, although not exactly the same, they are quite compatible. Of course, the ideal framework would be to have a unique data set with no dispersion coming from the choice of the SPS model. Nevertheless, we have used a quite consistent CCH list (cf. Table 1), in which most of the points have been obtained with the very same BC03 model, and the rest with a combination of BC03 and MaStro or with alternative models. In the future, when the theoretical uncertainties associated with the SPS models are considerably reduced, our results will have to be revised. It has also been recently claimed in [87, 88] that young stellar components in the quiescent galaxies used to extract the CCH data might have a non-negligible impact on the estimation of their differential ages. When this issue is fully understood, a revision of our results will also be needed.
4 The Weighted Polynomial Regression method
Apart from the GPs there exist some alternative methods to reconstruct continuous functions from data. The most common way to do this is to assume a particular theoretical model or parametrization, fit it to the data by using the usual $\chi^2$-minimization procedure to extract the best-fit values of the model parameters and the associated uncertainties, and then propagate them to build the reconstructed function with the corresponding confidence bands. In the case under study, we want to perform such a reconstruction without relying on any particular cosmological model or parametrization. There are in the literature many papers that aim to reconstruct functions from observational data by assuming a concrete parametrization of the quantity of interest, see e.g. [89, 90, 91]. This parametrization can have more or less physical motivation, but normally the choice is quite ad hoc and is just one of the many (if not infinite) possibilities that might exist. Once we choose a particular model or parametrization, the following questions should be answered in a strongly convincing way if we are interested in extracting model-independent information from the reconstructed function: (i) why have we chosen this concrete option, instead of others?; and (ii) how would our results change if an alternative model were selected? It is obvious that, by definition, upon relying on a concrete model in the fitting analysis we cannot derive model-independent conclusions from it. The same is true even if we opt to use a purely phenomenological parametrization, just because all parametrizations implicitly assume some underlying physical behavior or are designed to work better in certain contexts.
Regression splines can also be useful to reconstruct functions between the range extremes of a given data set without assuming a particular cosmological model, see e.g. [92]. Notice, though, that in our study we want to obtain $H_0 = H(z=0)$, which lies outside the cosmic chronometers data range (see Table 1), or precisely at the border if we add the HST data point, so traditional splines cannot help us to estimate the Hubble parameter in either of these two cases. Smoothed splines are able to alleviate some of these problems, although they are expensive from the computational point of view. Moreover (as in the ordinary splines) we have to choose a particular order for the spline itself, something that is again quite subjective. The weighting method that will be explained later on could in principle also be applied to the smoothed splines approach in order to obtain a definite result from the average of the smoothed splines. In this work, though, we will apply the weighting procedure to a less involved case so as to better illustrate the method without extra complications.
Principal component analyses are also used to reconstruct functions from data, see e.g. [93] and references therein. Another promising possibility to reconstruct functions in the cosmological context is the so-called cosmographical approach, see e.g. Refs. [94, 95, 96]. In this framework all the cosmological functions are Taylor-expanded up to a certain order, and the coefficients of the resulting polynomials are directly quantities with important physical meaning (alternative cosmographical expansions in terms of the Padé or Chebyshev polynomials have also been developed, see e.g. [97, 98]). This approach is quite attractive, since it allows one to obtain physical information from data (such as $H_0$ or the deceleration parameter at the current time, $q_0$) in an almost model-independent way. It only assumes the homogeneity and isotropy of the universe, which is well supported by observations. From our point of view, though, in this cosmographical method there is still some remaining degree of subjectivity in the choice of the highest order of the Taylor expansion used in the analysis, so although the method does not rely on any particular cosmological model, one must still make a choice at that point. Why is it better to consider all the terms of the Taylor expansion up to a given order instead of another one? Why not truncate the series at third order? How do the results change if we move from one option to the other? We will see later on that in the problem under study, i.e. the inference of $H_0$ from the reconstructed curve of $H(z)$, both the central value and the uncertainty of $H_0$ are quite sensitive to these issues (most probably because $z=0$ lies at an extreme of our data set), and therefore we must account for this problem, which has been overlooked in past works. This is our main aim in this section.
4.1 Regression with linear functions in parameters
In this subsection we quickly review the technique and main expressions needed for fitting data with functions that are linear in the parameters. Imagine we have the following function,
$f(z) = \sum_{i=1}^{n} c_i\,\phi_i(z)\,,$   (4.1)
with $c_i$ being the fitting parameters and $\phi_i(z)$ some functions of the variable $z$. The latter are the so-called basis functions. If we have a collection of Gaussian-distributed data points $y_\alpha = y(z_\alpha)$ with covariance matrix $C_{\alpha\beta}$, and we want to fit (4.1) to them, we have to maximize the likelihood
$\mathcal{L} \propto \exp\left\{-\frac{1}{2}\left[y_\alpha - f(z_\alpha)\right]C^{-1}_{\alpha\beta}\left[y_\beta - f(z_\beta)\right]\right\}$   (4.2)
with respect to the elements of the vector of parameters $\vec{c}$. Notice that in the last formula we are using the Einstein summation convention, as we will do in all the forthcoming expressions unless stated otherwise. Also note that we use Greek letters for indexes labeling data points, and Latin ones for those labeling the terms of the sum, as in (4.1). Due to the linearity of the latter in the parameters it is possible to rewrite the likelihood (4.2) as a multivariate Gaussian distribution for the parameters too, i.e.
$\mathcal{L} \propto \exp\left[-\frac{1}{2}\left(c_i - \bar{c}_i\right)F_{ij}\left(c_j - \bar{c}_j\right)\right]\,,$   (4.3)
where
$F_{ij} = \phi_i(z_\alpha)\,C^{-1}_{\alpha\beta}\,\phi_j(z_\beta)$   (4.4)
is the (inverse) covariance matrix of the parameters, $\Sigma \equiv F^{-1}$, and
$\bar{c}_i = \Sigma_{ij}\,\phi_j(z_\alpha)\,C^{-1}_{\alpha\beta}\,y_\beta\,.$   (4.5)
According to Bayes’ theorem (see e.g. [93]), if the prior distributions for the parameters are constant and flat we can directly identify (4.3) with the posterior of the parameters, up to a normalization factor independent of $\vec{c}$. Thus, in this (and only this) constant flat prior scenario, (4.5) is the mean (or best-fit) value of the $i$-th parameter. In any case, it is quite straightforward to compute the mean function and covariance matrix once a set of basis functions is chosen, i.e. once we select a particular model. It can be done as follows (here we write the sum symbols explicitly),
[Table 5: for each polynomial degree $d$, the corresponding BIC and AIC values and the associated determinations of $H_0$.]
$\bar{f}(z) = \sum_{i} \bar{c}_i\,\phi_i(z)\,,$   (4.6)
$\mathrm{Cov}\left[f(z), f(z')\right] = \sum_{i,j}\phi_i(z)\,\Sigma_{ij}\,\phi_j(z')\,.$   (4.7)
The variance of the reconstructed function is just $\sigma^2(z) = \mathrm{Cov}[f(z), f(z)]$. It is also easy to compute the mean function and covariance matrix of the derivatives,
$\bar{f}^{\,\prime}(z) = \sum_{i} \bar{c}_i\,\phi_i^{\prime}(z)\,,$   (4.8)
$\mathrm{Cov}\left[f^{\prime}(z), f^{\prime}(z')\right] = \sum_{i,j}\phi_i^{\prime}(z)\,\Sigma_{ij}\,\phi_j^{\prime}(z')\,.$   (4.9)
These tools can be used e.g. to reconstruct $H(z)$ in the cosmographic scenario. We just have to Taylor-expand $H(z)$ around e.g. $z=0$,
$H(z) = \sum_{n=0}^{d}\frac{z^n}{n!}\,\frac{\mathrm{d}^n H}{\mathrm{d}z^n}\bigg|_{z=0}\,.$   (4.10)
In this case $\phi_n(z) = z^n$ and $c_n = \frac{1}{n!}\frac{\mathrm{d}^n H}{\mathrm{d}z^n}\big|_{z=0}$, so in particular $c_0 = H_0$. Applying the formulas (4.4)-(4.9) to the CCH data we obtain the results presented in Figs. 6-7 and Table 5 for the twelve polynomials that result from truncating the Taylor expansion (4.10) at order 1, 2, 3, etc., up to order 12, respectively. Let us take a look at Fig. 6 first. It is evident that the shape of the reconstructed functions and confidence bands changes a lot for the various (cosmographic) polynomials, especially in those regions with only a few data points, but also (although to a lesser extent) in the low-redshift range. This dispersion is of course also passed on to the value of $H_0$ inferred from each of them, see also the fifth column of Table 5. This clearly shows that choosing a particular polynomial in the cosmographic approach is completely insufficient and unjustified, since it leads us to incompatible determinations of $H_0$. And the problem is even more severe for the derivative of the Hubble function, see Fig. 7. Notice that it is much more acute than the one affecting the choice of the kernel in the GPs method (see Sects. 3.2 and 3.3). In the next subsection we propose a way to get rid of this drawback.
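The machinery of Eqs. (4.4)-(4.7) applied to the cosmographic polynomials can be sketched as follows; the data points stand in for the CCH compilation and are illustrative placeholders:

```python
import numpy as np

# Mock data (z, H [km/s/Mpc]) with a diagonal covariance matrix (placeholders).
z = np.array([0.1, 0.4, 0.9, 1.3, 1.8])
H = np.array([69.0, 82.0, 105.0, 140.0, 175.0])
C = np.diag([5.0, 6.0, 8.0, 10.0, 12.0]) ** 2

def fit_polynomial(deg):
    """Linear-in-parameters fit of H(z) = sum_n c_n z^n, extrapolated to z=0."""
    Phi = np.vander(z, deg + 1, increasing=True)   # basis functions phi_n = z^n
    Cinv = np.linalg.inv(C)
    F = Phi.T @ Cinv @ Phi                         # eq. (4.4)
    Sigma = np.linalg.inv(F)                       # parameter covariance
    cbar = Sigma @ Phi.T @ Cinv @ H                # eq. (4.5)
    chi2 = (H - Phi @ cbar) @ Cinv @ (H - Phi @ cbar)
    # Mean and variance of the reconstruction at z = 0, eqs. (4.6)-(4.7):
    phi0 = np.zeros(deg + 1); phi0[0] = 1.0        # phi_n(0) = delta_{n0}
    H0 = phi0 @ cbar                               # = cbar[0]
    sH0 = np.sqrt(phi0 @ Sigma @ phi0)
    return H0, sH0, chi2

for d in (1, 2, 3):
    print(d, fit_polynomial(d))   # H0 and its error spread with the degree
```

Even on this toy data set, the uncertainty of the extrapolated value at $z=0$ grows with the polynomial degree, illustrating the dispersion discussed above.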
4.2 Reconstruction of functions with the WPR method
To overcome the problem exposed above, it proves useful to regard it from a different angle. We should focus our efforts on trying to answer the following questions: why do we have to rely on the results of a particular polynomial? Is there a way of incorporating the information of all the candidate fitting polynomials consistently and, thus, of skipping the problem of choosing just one among them? Yes, there is. Let us call $M_1, M_2, \ldots, M_{12}$ the cosmographic polynomials of order $1, 2, \ldots, 12$, respectively. That is, let us conceive each polynomial as a different model, and compute the probability density associated with a certain shape of the function $f$ as follows,
$P(f) = A\sum_{m} E_m\,P(f|M_m)\,,$   (4.11)
where $E_m$ is the evidence of the model $M_m$ and $A$ is just a normalization constant that must be fixed by imposing
$\int P(f)\,\mathrm{d}f = 1\,.$   (4.12)
Taking into account that
$\int P(f|M_m)\,\mathrm{d}f = 1 \quad \forall m\,,$   (4.13)
and
$E_{\rm tot} \equiv \sum_{m} E_m\,,$   (4.14)
we find $A = E_{\rm tot}^{-1}$ and therefore:
$P(f) = \frac{1}{E_{\rm tot}}\sum_{m} E_m\,P(f|M_m)\,.$   (4.15)
We now denote by $M_b$ the most probable model (later on we will study possible ways to identify it among the whole set $\{M_m\}$) and rewrite the last expression as follows,
$P(f) = \frac{E_b}{E_{\rm tot}}\sum_{m} B_{mb}\,P(f|M_m)\,,$   (4.16)
where $B_{mb} \equiv E_m/E_b$ can be identified with the Bayes ratio [99]. Using (4.14) one finds
$\frac{E_b}{E_{\rm tot}} = \left(\sum_{m} B_{mb}\right)^{-1}\,,$   (4.17)
so (4.16) can be finally written as
$P(f) = \frac{\sum_{m} B_{mb}\,P(f|M_m)}{\sum_{m} B_{mb}}\,.$   (4.18)
This is the central expression of the weighted regression method, where the weights are directly given by the Bayes factors. Notice that from (4.18) we can compute the (weighted) moments and related quantities too. For instance, the weighted mean and variance read,
$\bar{f}(z) = \frac{\sum_{m} B_{mb}\,\bar{f}_m(z)}{\sum_{m} B_{mb}}\,,$   (4.19)
$\sigma^2(z) = \frac{\sum_{m} B_{mb}\left[\sigma_m^2(z) + \bar{f}_m^2(z)\right]}{\sum_{m} B_{mb}} - \bar{f}^2(z)\,,$   (4.20)
where $\bar{f}_m(z)$ and $\sigma_m^2(z)$ can be computed by means of (4.6) and (4.7), respectively. To obtain the Bayes factors we invoke the time-honored Bayesian and Akaike information criteria, BIC and AIC, defined as [100, 101]:
$\mathrm{BIC} = \chi^2_{\rm min} + n\ln N\,, \qquad \mathrm{AIC} = \chi^2_{\rm min} + 2n\,,$   (4.21)
with $\chi^2_{\rm min}$ being the minimum of the $\chi^2$ function in the model $M_m$, i.e. $-2$ times the exponent of (4.2) evaluated at the best-fit values $\bar{c}_i$, and $n$ the number of fitting parameters of this model, i.e. the dimension of $\vec{c}$. Recall that $N$ is the number of data points entering the analysis. In this way the Bayes factor between the model $M_m$ and the most probable model $M_b$ can be approximated by $B_{mb} \approx \exp\left[-\frac{1}{2}\left(\mathrm{IC}_m - \mathrm{IC}_b\right)\right]$, with IC standing for either the BIC or the AIC.
Weight used   Data set(s)
BIC           CCH      0.94    0.18
              CCH+     0.12    +3.52
AIC           CCH      0.61    0.01
              CCH+     0.08    +3.58
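The weighting procedure can be summarized in a few lines of code: information-criterion values are converted into approximate Bayes factors, $\exp(-\Delta\mathrm{IC}/2)$, which then weight the per-polynomial estimates as in Eqs. (4.19)-(4.20). The per-degree estimates and criterion values below are illustrative placeholders:

```python
import numpy as np

def wpr_combine(ic, means, sigmas):
    """Weight per-model (mean, sigma) estimates by approximate Bayes factors."""
    ic = np.asarray(ic, dtype=float)
    B = np.exp(-0.5 * (ic - ic.min()))     # Bayes factors w.r.t. the best model
    w = B / B.sum()
    mean = np.sum(w * np.asarray(means))   # weighted mean, eq. (4.19)
    var = np.sum(w * (np.asarray(sigmas)**2 + np.asarray(means)**2)) - mean**2
    return mean, np.sqrt(var)              # weighted std. deviation, eq. (4.20)

# Placeholder per-degree (H0, sigma) estimates and BIC values:
means = [67.0, 68.5, 71.0]
sigmas = [3.0, 4.5, 9.0]
bic = [10.0, 12.5, 16.0]
print(wpr_combine(bic, means, sigmas))
```

Note that the combined variance includes both the weighted individual variances and the dispersion of the central values among the models, so the final uncertainty honestly reflects the model-choice spread.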