Accelerated expansion from ghost-free bigravity: a statistical analysis with improved generality

# Accelerated expansion from ghost-free bigravity: a statistical analysis with improved generality

Yashar Akrami, Tomi Sebastian Koivisto and Marit Sandstad
Institute of Theoretical Astrophysics, University of Oslo
P.O. Box 1029 Blindern, N-0315 Oslo, Norway
E-mail:
July 4, 2019
July 4, 2019
###### Abstract:

We study the background cosmology of the ghost-free, bimetric theory of gravity. We perform an extensive statistical analysis of the model using both frequentist and Bayesian frameworks and employ the constraints on the expansion history of the Universe from the observations of supernovae, the cosmic microwave background and the large scale structure to estimate the model’s parameters and test the goodness of the fits. We explore the parameter space of the model with nested sampling to find the best-fit chi-square, obtain the Bayesian evidence, and compute the marginalized posteriors and mean likelihoods. We mainly focus on a class of sub-models with no explicit cosmological constant (or vacuum energy) term to assess the ability of the theory to dynamically cause a late-time accelerated expansion. The model behaves as standard gravity without a cosmological constant at early times, with an emergent extra contribution to the energy density that converges to a cosmological constant in the far future. The model can in most cases yield very good fits and is in perfect agreement with the data. This is because many points in the parameter space of the model exist that give rise to time-evolution equations that are effectively very similar to those of the CDM. This similarity makes the model compatible with observations as in the CDM case, at least at the background level. Even though our results indicate a slightly better fit for the CDM concordance model in terms of the -value and evidence, none of the models is statistically preferred to the other. However, the parameters of the bigravity model are in general degenerate. A similar but perturbative analysis of the model as well as more data will be required to break the degeneracies and constrain the parameters, in case the model will still be viable compared to the CDM.

Modified Gravity, Massive Gravity, Bigravity, Dark Energy, Background Cosmology, Statistical Analysis
preprint:

## 1 Introduction

Ever since the first proposal by Fierz and Pauli [1], giving gravity a mass has seemed like a theoretically appealing extension of gravity. However, the fact that these theories contained ghost instabilities leading to the wrong limits in systems such as the solar system [2] rendered them unattractive, and research endeavors into massive gravity theories lay for the most part abandoned for decades.111For a recent review of massive gravity see [3].

Recently this has changed as new nonlinear interactions free of ghosts have been discovered, described and explored by de Rham, Gabadadze, Tolley and others [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]. This formulation of massive gravity used a fixed extra two-tensor with no dynamics of its own [11] and does not have stable homogeneous and isotropic solutions [18, 23, 28].

A generalization to having two free and dynamic metrics in the formulation was then proposed and showed a theoretically viable option by Hassan and Rosen in [29, 30, 31]. Various theoretical aspects of this interesting theory have been explored for instance in [32, 33, 34, 35, 36, 37, 38, 39, 40, 41]. In particular the cosmology has been explored in [42] which emphasized the energy contribution of the extra field, in [28] that described the Friedmann-Lemaître-Robertson-Walker (FLRW) solutions and sorted them into two branches and in [43] that looked at data constraints in some specialized corners of the parameter space. In [44] the cosmology of the theory in the setting where the fundamental metric is homogeneous and isotropic while the other is inhomogeneous was explored. In [45] the cosmological perturbation equations for this theory were given for the first time and these were reformulated and explored further in [46] and [47]. Using an alternative approach, it was recently claimed that the cosmological solutions exhibit an instability for some parameter combinations [48]. Such instabilities were however not observed222This is possibly because the “backward instabilities” reported in [48] correspond to decaying modes. In any case, in our comprehensive background analysis, we found no instabilities, which strongly suggests that at least the homogeneous modes are stable in these cosmologies. in the studies [46, 47] that used standard cosmological perturbation theory.

Although the background cosmology of this theory with two homogeneous and isotropic metrics has been explored in [42, 28, 43], none of them has given an exhaustive parameter estimation and exploration of the full parameter space in comparison with data in order to find whether this theory has new dynamics yielding better or comparable fits to those of the standard model of cosmology ( Cold Dark Matter; CDM; for an introduction, see e.g. [49, 50]). This is what this paper endeavors to do. It is quite interesting to study precisely how viable these models are, as a finite mass of a graviton would provide a well-motivated and theoretically consistent explanation for the enigmatic speed-up of the late-time expansion of the Universe (for a review, see e.g. [51]). In particular, were the data to prefer the higher order interactions of graviton that give it a mass over the constant term, one could also hope to shed new light on the long-standing cosmological constant problem [52]. For a large collection of other possible dynamical models of dark energy and modified gravity that have been proposed in attempts to address the issue of cosmic acceleration, see e.g. the reviews [53, 54, 55].

The paper is organized as follows; In section 2 we give a brief theoretical introduction to bigravity theory and in particular its formulation in two homogeneous and isotropic metrics. In section 3 we describe the observational data that we used to compare with predictions of the model in different parts of parameter space. We also present a handy reformulation that we used for the numerical integration of the equation set and how we performed this integration. In addition we describe the statistical methods we used to perform the parameter estimation. In section 4 we describe the results of our analysis and parameter estimation when a method for numerical integration of the equation set was established. To better understand the dynamics of the solutions we first present the analysis of certain specific parts of the parameter space, and in this way we demonstrate how the system becomes degenerate, how good fits to the data can be obtained, but also how the search for a best-fit model reveals that a theory which mimics the CDM as closely as possible is preferred, and the closeness333Which is in fact arbitrarily good in more than one corner of parameter space. to the CDM that is obtainable in these theories is just what makes them such good fits to the data.

## 2 The model

### 2.1 Bimetric theory of massive gravity

In this section, we briefly review the ghost-free, bimetric theory of gravity without restricting the framework to any particular choices of metrics. We start from the action level and present the modified field equations for the dynamical variables of the theory.

The theory contains two space-time metrics. The first one, which we denote by , is assumed to be the physical metric, namely the metric based on which all usual distances in cosmology are defined. The other metric, denoted by , is a dynamical rank-two tensor that is essential for the theory to be ghost-free; this metric when coupled to the physical metric, gives mass to gravitons444For other recent developments in bimetric and biconnected spacetimes, see [56, 57, 58, 59, 60]..

The action for our ghost-free, massive theory of gravity was first presented in [30]. We will follow the notation of [43] everywhere in the paper, and most of this introductory section (and the next section) will follow that paper quite closely. Let us begin with the action which has the following form:

 S = −M2g2∫d4x√−detgR(g)−M2f2∫d4x√−detfR(f) (1) +m2M2g∫d4x√−detg4∑n=0βnen(√g−1f)+∫d4x√−detgLm(g,Φ).

Here, the matrix is defined such that , and are the Ricci scalars for the metrics and , respectively, and denote the Planck masses corresponding to the two metrics, and shows the Lagrangian for the matter sector. In addition, are elementary symmetric polynomials of the eigenvalues of the matrix and have the following forms:

 e0(X)=1,e1(X)=[X], e2(X)=12([X]2−[X2]), (2) e4(X)=det(X),

where square brackets denote the traces of the matrices . The small in eq. (1) is the graviton mass and the five quantities () are free parameters that need to be determined observationally555It is important to note that there are other ways of parameterizing the model, in particular one that is used relatively widely in the literature on massive gravity in terms of parameters . In this case the action is still written in terms of but as functions of (see e.g. [11] for expressions that relate and ). The potentials corresponding to the -terms () however contain constant terms that contribute to the cosmological constant of the model, meaning that does not capture all the cosmological terms. Since one of the main goals of the present paper is to investigate whether the bigravity theory can explain the late-time acceleration of the Universe in absence of an explicit cosmological constant (i.e. when it is set to zero) we adhere to the -parameterization where at least the vacuum energy contribution to the cosmological constant that corresponds to the physical metric is represented by . This is the quantity we will set to zero in most of the analyses of this paper. Also see the discussion of footnote 8..

The equations of motion for the two metrics (or the equivalents to the Einstein equations) can be derived by varying the action with respect to the metrics; this gives:

 Rμν−12gμνR+m223∑n=0(−1)nβn[gμλYλ(n)ν(√g−1f)+gνλYλ(n)μ(√g−1f)]=1M2gTμν, (3) ¯Rμν−12gμν¯R+m22M2⋆3∑n=0(−1)nβ4−n[fμλYλ(n)ν(√g−1f)+fνλYλ(n)μ(√g−1f)]=0,

where over-bar denotes quantities corresponding to the metric, is the stress-energy-momentum tensor, , and

 Y(0)(X)=1,Y(1)(X)=X−1[X],Y(2)(X)=X2−X[X]+121([X]2−[X2]), Y(3)(X)=X3−X2[X]+12X([X]2−[X2])−161([X]3−3[X][X2]+2[X3]). (5)

As in standard gravity, these equations determine the dynamics of the space-time degrees of freedom, i.e. the geometry and time-evolution of the Universe, if the properties of matter and energy are known (through the tensor ).

As observed in [47], we can perform the constant rescaling and in order to set to unity. This quantity is therefore not a free parameter of the theory; we will drop it in the rest of the paper. We also assume to be the usual (reduced) Plank mass .

In addition to the equations of motion, further constraints are imposed on the dynamics of the two metrics by Bianchi identities and the assumption that the stress-energy-momentum tensor of the matter components is conserved:

 ▽μ3∑n=0(−1)nβn[gμλYλ(n)ν(√g−1f)+gνλYλ(n)μ(√g−1f)] = 0, (6) ¯▽μ3∑n=0(−1)nβ4−n[fμλYλ(n)ν(√g−1f)+fνλYλ(n)μ(√g−1f)] = 0. (7)

As we will see in the next section, this gives us an extra piece of information which dramatically simplifies the evolution equations when applied to the entire Universe.

### 2.2 The case for an isotropic and homogeneous universe

In the previous section, we described the full theory of ghost-free bimetric gravity where we did not assume any particular forms for the metrics. The purpose of the present paper is however to compare the cosmological predictions of the model to real data. A fundamental assumption in the standard concordance model of cosmology, the CDM, is that the Universe on large scales is spatially isotropic and homogeneous. This assumption restricts the metric of the Universe to be of the FLRW form. Based on the same observational and theoretical reasons, we follow [43] and assume both of the two metrics and in our bigravity model exhibit spatial isotropy and homogeneity. As we will see, this assumption leads to a set of generalized Friedmann equations that we will use in comparing the background dynamics of the Universe, predicted by the model, to different types of cosmological measurements.

The spatially isotropic and homogeneous metrics and read

 ds2g = −dt2+a2(dr21−kr2+r2(dθ2+sin2θdϕ2)), ds2f = −X2dt2+Y2(dr21−kr2+r2(dθ2+sin2θdϕ2)), (8)

where and are the time-dependent spatial scale factors corresponding to the metrics and , respectively. is a time-dependent function that must be determined through the evolution equations given by the model. We have additionally assumed the same spatial curvature for the two metrics () [28].

In order to have consistent solutions in this case, where the matter sources are time-dependent, eqs. (6) and (7) require the following relationship between the functions , and [43]:

 X=˙Y˙a=dYda, (9)

where over-dot denotes derivative with respect to time. As we mentioned earlier, this relation is necessary for the conservation of the stress-energy-momentum tensor through the Bianchi identity of .

As in the CDM case, we assume that the Universe is filled with dust, with the energy density , and radiation, with the energy density (other components can be added easily). The equations of motion (3) and (LABEL:eq:EoM4f) then give the following generalized Friedmann equations 666Note that eq. (11) is slightly different from the equivalent equation in [43]; this could be a misprint in that paper which has been corrected here.:

 3(˙aa)2+3ka2−m2[β0+3β1Ya+3β2Y2a2+β3Y3a3]=1M2g(ρm+ργ), (10)
 −2¨aa−(˙aa)2−ka2+m2[β0+β1(2Ya+˙Y˙a)+β2(Y2a2+2Y˙Ya˙a)+β3Y2˙Ya2˙a]=13M2gργ, (11)
 3(˙aY)2+3kY2−m2[β4+3β3aY+3β2a2Y2+β1a3Y3]=0, (12)
 (13)

## 3 Numerical investigation of the parameter space

This section is about our strategy and methods we use to compare the predictions of the bigravity model to a set of cosmological observations, assess how well they match and then constrain the parameters of the model. We first, in section 3.1, introduce the dataset we use and the cosmological quantities that have to be computed theoretically in order to make the comparison with observations. Section 3.2 describes all the simplified dynamical equations for the bigravity theory that we employ in our numerical investigation. It also shows more clearly what dynamical variables play the most important roles in the evolution equations. This in addition helps us to understand the physics of the model in comparison to the standard model. We then continue our description of the model in section 3.3 with a discussion about the initial conditions we need to set for the evolution equations in our numerical implementation of the model. Finally, in section 3.4 we focus on the statistical aspects of our work and review different statistical frameworks we employ, as well as our strategy for exploring the parameter space of the model. This clarifies how we aim to compare the model’s predictions to the real data in practice. Readers who are familiar with or not interested in such statistical and scanning techniques can skip section 3.4 and continue with our results and discussions in section 4.

### 3.1 Constraints from cosmology

Cosmological data that are used in comparing predictions of cosmological models to observations are classified into two main categories: 1) constraints from measuring the geometry and background evolution of the Universe on large scales (expansion history), and 2) constraints from the formation, distribution and evolution of structures in the Universe (growth history). In order to see whether a model is viable observationally, one usually starts with the background cosmology. This is also simpler to study since the background equations are considerably simpler to derive and implement numerically. Studying the structures requires the field equations to be perturbed around the background (FLRW metrics in our case). Here, we only work with the background dynamics and leave the investigation of the model at the perturbation level for future work.

Three main sources of information in cosmology are 1) anisotropies of the Cosmic Microwave Background (CMB) Radiation, 2) Baryon Acoustic Oscillations (BAO), and 3) Type Ia Supernovae (SNe Ia). At the background level, observational constraints on a theoretical model provided by these sources all involve calculations of different types of cosmological distances. In general, such distances depend on the parameters of the model and by comparing them to the measured distances one can constrain the model and determine how successful the model is in describing the Universe. In what follows, we briefly review the distance measures important in CMB, BAO and SNe Ia observations, how they are calculated from a cosmological model and how they are related to the actual cosmological data.

Cosmic Microwave Background: In order to properly extract the information from the tiny fluctuations on the CMB one usually looks at the angular distribution of the fluctuations through the computation of the angular power spectrum. One can on the other hand derive the theoretical power spectrum by solving some Boltzmann codes numerically and fit the model to the data by comparing the two spectra. The latter needs perturbative equations to be solved. There is however one important quantity that can be measured from the observed CMB power spectrum and only depends on the background equations: the position of the first peak on the spectrum (denoted by ). This represents the angular scale of the sound horizon at the recombination epoch. Since we are working only with the background equations in this paper, we adhere to this quantity to place constraints on our model. One can show that

 lA=π(1+zr)DA(zr)rs(zr), (14)

where is the angular-diameter distance to the CMB last-scattering surface, i.e. at the redshift of recombination . is the co-moving sound horizon at . Theoretically, and as functions of redshift can be calculated from the following expressions (we assume a flat universe, i.e. ):

 DA(z) = 1(1+z)∫z0cdz′H(z′), (15) rs(z) = ∫∞zcsdz′H(z′). (16)

Here, is the speed of light, is the speed of sound before recombination and is the Hubble parameter at a given redshift . The latest measurements of the CMB power spectrum by the satellite Wilkinson Microwave Anisotropy Probe (WMAP) [61] has determined the value of to be . In addition, the value we assume for is ; we neglect the uncertainties about this value [43].

Baryon Acoustic Oscillations: As in the CMB case, in order to extract full information from the distribution and evolution of large-scale structures in the Universe, one needs to look at the perturbative equations. This involves fitting the theoretical power spectrum of matter to the observed one. There are however simpler ways to work only at the background level. One such possibility is to consider the ratio of the sound horizon at the so-called drag epoch (with the redshift of ) to another quantity called dilation scale (denoted as ) at some arbitrary redshifts. is the redshift of an epoch at which the baryon acoustic oscillations are frozen in (). The theoretical value of the dilation scale at any redshift can be obtained from the expression

 DV(z)=[cz(1+z)2DA(z)2H(z)]1/3. (17)

In the present analysis, we use the values of this ratio when is measured at redshifts , , , , and . These measurements have been done by the galaxy surveys 2dFGRS, 6dFGS, SDSS and WiggleZ ([62], [63], [64]) and are given in table 1.

In applying the CMB and BAO measurements mentioned above to our model, we follow the strategy used in [43] where the BAO measurements of at different redshifts are multiplied by the CMB measurement of in order to reduce the model dependence of the constraints. Assuming the measured value of for the ratio reported by WMAP collaboration, the combined constraints (from CMB and BAO) used in our analysis are the ones we have given in table 1.

Type Ia Supernovae: Luminosity measurements of SNe Ia have proven to be a powerful source of information about the geometry and evolution of the Universe at late times. After the striking discovery of the accelerated expansion in 1998 [65, 66], they have received much attention as standard candles in observational cosmology. The key quantity in using SNe Ia in cosmological model analysis is the luminosity distance which can be computed theoretically for a model and constrained directly from the SNe observations:

 DL(z) = (1+z)∫z0cdz′H(z′). (18)

The dataset we use in our analysis is the Union2.1 compilation of SNe [67] which contains SNe in total. We include the reported systematic uncertainties for the measurements given in that paper.

Present value of Hubble parameter (): is a quantity that appears in all expressions used in comparing predictions of a theoretical model to the real data. It is therefore important to know what value it has in reality. We use the value provided by seven-year WMAP observations, i.e. km s Mpc. In CDM cosmology, is a free parameter of the model that is to be constrained observationally. As we will see in the next section, in our bigravity model, it is a prediction of the model when other parameters are fixed.

### 3.2 Hubble parameter and evolution equation

We saw in the previous section (eqs. (15), (16), (17) and (18)) that all quantities used in constraining a model through cosmological observations can be calculated theoretically only if the Hubble parameter is known during the evolution of the Universe. For the standard CDM model, this can be calculated in terms of the usual cosmological parameters such as various density parameters. In our bimetric theory, we have assumed that one metric (we chose it to be ) is the physical one and this is the metric that should be used in defining different cosmological quantities including distances. The metric is additionally assumed to be of FLRW form with a time-dependent scale factor () as the only dynamical quantity which determines the evolution of the Universe. This all means that we can follow the standard recipe and use the Hubble parameter defined based on that scale factor in our calculations of observable quantities (i.e. ). Our next task is therefore to find a time evolution equation for the Hubble parameter.

As usual, we define the redshift as , where is the scale factor at present time (which is chosen to be unity). This gives the possibility of calculating as a function of and using it directly in the expressions for cosmological distances. In addition, it turns out that working with the variable (where is the scale factor corresponding to the metric ) significantly simplifies the equations used in numerical calculations. In fact, as we will demonstrate below, all the interesting equations and dynamical variables (including the Hubble parameter) can be written in terms of (which we expect to be a time-dependent quantity). For numerical integration, as will be discussed below, it is additionally useful to reformulate the equations in terms of the e-folding time . This transforms the time derivative into the e-folding time derivative: , which we will denote by .

Before giving the expression for in terms of and parameters of our model, we perform one more redefinition, this time on the parameters. It is obvious already from the action (eg. (1)) that the parameter is redundant since it just multiplies all the s. We can therefore redefine s such that they absorb . However, since is presumably quite small (something of the order of the present value of the Hubble parameter, ), we further normalize the parameters to the observed value of (i.e. km s Mpc). This gives us a set of new parameters that we call s: . Note that the exact value of is not important here and we use it only for simplicity reasons, i.e to work with a set of parameters that are likely to have values of not more than a few orders of magnitude larger or smaller than unity. As we will see later, the present value of the Hubble parameter is not a free parameter of the theory and will be predicted by the model in terms of the other parameters.

After including all the aforementioned modifications into eq. (10), we get an equation for the Hubble parameter as follows 777Here, and in the rest of this section, we include the curvature term in the equations for a general -value. As we will mention later, in the present paper we however analyze the bigravity model only for the case (flat Universe); this value has been chosen for simplicity reasons and is also a justified assumption based on the current constraints on the curvature of the Universe from analyses of the CDM concordance model. Including the curvature term is however a very straightforward procedure and we leave it for future work.:

 H2H20=−kH20exp(2x)+B03+B1y+B2y2+B33y3+Ωm+Ωγ, (19)

where we have defined the density parameters as usual, i.e. ( being the energy density for the component ).

It can be seen from this expression that in order to compute the Hubble parameter at any given time during the cosmic evolution, one needs to know the values of the parameters , as well as the dynamical quantities , and . Matter and radiation follow the standard evolutions with redshift according to their traditional equations of state, i.e. and . We have not included an explicit cosmological constant density parameter in the equation because the parameter , and correspondingly , act as a cosmological constant; including the quantity is therefore redundant888Even though it is that we will call the (explicit) cosmological constant throughout the paper, we should note that in general it may not capture all contributions to the cosmological constant. Strictly speaking, it represents the vacuum energy contribution which is arguably the most interesting type of the cosmological constant. In bimetric theory (the ghost-free massive bigravity that we study in the present paper) the notion of vacuum energy (that receives contributions from the and matter loops) is well defined and is given by and (or correspondingly and ). On the other hand, the notion of an “intrinsic cosmological constant” is not as well defined. In General Relativity, “cosmological constant” and “vacuum energy” are the same, but not in bimetric theory. Vacuum energy always appears in the form in the action and this is the quantity that receives large corrections from quantum loops of heavy particles. In the bimetric theory measures vacuum energy (same of for the sector). One could also call this an “explicit” cosmological constant. But the total observed cosmological constant is to be read off from Einstein’s equations. In the bimetric setup whenever the equations contain a quantity that is cleanly identifiable as a cosmological constant, that will be a combination of the ’s. In the specific bigravity model that we are interested in here, i.e. with and being FLRW (or FLRW-like) metrics, the total observed cosmological constant must be identified from the Einstein’s equations in this case, i.e. the modified Friedmann equation (19). It seems from this equation that the -term (or equivalently -term) is the only term that appears as the cosmological constant for the physical metric and all other terms (that involve other ’s) are functions of time since is not a constant in general. As we will see later, our numerical analysis shows that is not a constant in order for the model to fit the data. We therefore call () the cosmological constant throughout the paper. Even if there is still a hidden cosmological constant lurking within the other -terms, we have at least set the vacuum energy contribution to zero by setting to zero. An interesting example of bigravity models for which a combination of all ’s contribute to the cosmological constant is the case of the backgrounds of the type (which are valid backgrounds only when the parameters satisfy some specific constraint; a property that is not satisfied in our case of cosmologically interesting backgrounds). In this case the parameter () corresponds to the cosmological constant (see footnote 5). The only way of cleanly identifying a cosmological constant is to consider type backgrounds. Then one obtains the most general expression for the cosmological constant in these models without constraining the parameters in any way (as in [39] or [68] where the details are given). This gives not , but what is called in those references, as the cosmological constant. Clearly, even if we set , we find that does not vanish unless we set . One may also think of this in the following way: in massive gravity (not bimetric theory) if one decides on , then becomes an intrinsic cosmological constant that must vanish for consistency. But as soon as is made dynamical, this privilege is lost..

As the next step in solving eq. (19) for , we need to find what the values of are at different times (or redshifts). Using eqs. (10) - (13) we obtain the following equation for (derivative of the -metric’s normalized scale factor with respect to ):

 y′ = y[−2B0+B3y3+3Ωm+6Ωγ+6B2+3B3y]+3B1(1−y2)B0+3Ωm+3Ωγ+6B1y+3B2(3y2−1)+2B3y(2y2−3)−3B4y2−y. (20)

This shows that the dynamical variable has a one-dimensional phase space, meaning that knowing its value at any given time is sufficient for obtaining its values at all other times; as we will see later, we use this differential equation to determine the value of (and accordingly ) at different redshifts. We will discuss in the next section how we can obtain the value of at a particular redshift to set an initial value for eq. (20).

Before continuing with setting the initial conditions, let us take a closer look at the equations of motion (10) - (13) to see whether there is any alternative approach in calculating the Hubble parameter as a function of redshift. The answer “seems” to be “yes”: the equation set yields another equation for the Hubble parameter (if at least one of the parameters , , or is nonzero):

 H2H20=−kexp(−2x)H20+B43y2+B3y+B2+B13y−1. (21)

Eqs. (19) and (21) are enough to set the values of the system at all times. We can see this by solving eq. (21) for and plugging it into eq. (19). This combination of eqs. (19) and (21) yields a quartic equation in 999Differentiating this equation with respect to gives the differential equation (20).:

 B33y4+(B2−B43)y3+(B1−B3)y2+(B03+Ωm+Ωγ−B2)y−B13=0. (22)

With equation (22) we could in principle calculate the values of at any redshifts without integrating the system through the entire history of the Universe. However, since a “quartic” equation can in general have as many as four real solutions, it is not necessarily easy to choose the right root at each given point: “is this root the one that the root from the previous point would have evolved to?” To insure that this is the case, we instead use the differential equation (20) to solve for . This also guarantees real solutions throughout the cosmic history since starting with a real solution, a differential equation with only real coefficients is guaranteed to yield solutions that will remain real at all times.

By studying eq. (22), we can immediately deduce what form the asymptotic solutions will take. At early times the terms containing and will dominate since they evolve according to and , respectively. Deep in the past the equation will then approximately read as

 (Ωm+Ωγ)y=0, (23)

so that , eliminating the contributions from in eq. (19). Hence, at early times, we will have the usual CDM model with . As time progresses the other terms become important, and starts varying with time. At late times, , and the curvature terms become negligible, and will be given in terms of a time-independent quartic equation. This means that the Universe will have transitioned again into a cosmological constant phase, but now with a different value of the constant. An example of this behavior can be seen in fig. 1 where we show as a function of redshift ( corresponds to far in the future). We therefore expect the model to give good fits to the cosmological observations as long as the change in is not dramatically large and the Hubble parameter evolves in a way similar to that of the CDM model.

### 3.3 Setting initial conditions

To perform the numerical integration of eq. (20), we start from the present time and integrate into the past. This is done by solving the quartic equation (22) at and then using eq. (20) to obtain at all interesting redshifts throughout the evolution of the Universe. One could in principle solve eq. (22) at any other arbitrary redshift instead, but choosing is particularly useful in practice for simplicity reasons, as well as to set the initial conditions stably. In addition, it makes it possible to compute co-moving distances belonging to each redshift of interest in the same integration code as the one used for computing the Hubble parameter. This makes the statistical analysis of the model faster and the numerical calculations more rational. Now that the starting values of and (i.e. at ) are determined for a set of model parameters using eqs. (22) and (21), the numerical integration can commence using eq. (20) and the Hubble parameter can be calculated at all redshifts using eq. (21).

Some caveats however need to be considered for this scheme since the quartic equation generically has several solutions: Firstly, all values of should not be accepted. is the scale factor of a metric () normalized to the scale factor of another metric (), and it therefore seems meaningless for to take on complex or negative values. Therefore only real and positive roots of eq. (22) should be considered; this means that many choices of parameters yield no viable solutions. Secondly, we see from eq. (19) and (21) that and not is the quantity that is directly calculated from ; we should therefore ensure that it receives positive values. We will see in more detail in the next section how we implement these conditions numerically. And thirdly, for a single set of parameters of the model, there may be several viable roots for eq. (22). Again, as we will see later, this subtlety is taken care of by treating the initial value of as an additional parameter in our statistical analysis. This parameter can in general take on , , , or different values at each point in the model’s parameter space (depending on the number of allowed solutions for the quartic eq. (22)). Hence, in exploring the parameter space for the best-fit parameter values, we should consider all the allowed initial values of , perform the analysis and then choose the initial that gives the best fit.

### 3.4 Statistical framework, scanning strategy and comparison to observation

So far, we have set up our theoretical model: we know how to solve the dynamical equations of the model, how to compute necessary observable quantities to be used in comparison of the theoretical predictions to various cosmological observables, and which data to use. What we intend to discuss in this section is how to confront the model with observation in practice, i.e. how to explore the parameter space of the model to test how fit it is to the data and what values of the parameters are preferred. Let us recap what we have found and discussed so far:

• The full model has the following six free parameters:

 Θ=(β0,β1,β2,β3,β4,Ω0m). (24)

Here, as we stated earlier, we have assumed a flat Universe (i.e. ). In addition, we have neglected the radiation contribution to the evolution equation and background observables, i.e. we have assumed . This latter assumption can be justified by the fact that in the standard CDM cosmology, the present energy density of radiation is vanishingly small () compared to the matter contribution (). Its value then remains negligible up to the redshift of matter-radiation equality which goes well beyond the redshift range we are using in the present analysis. We expect a similar ratio between and in our bigravity model which enables us to neglect radiation in the entire period of interest (from the present time to the epoch of recombination) as radiation and matter scale with redshift as and , respectively.

• By fixing the values of all parameters , eqs. (22) and (20) tell us how , corresponding to that chosen point in the parameter space, evolves with time (or with redshift).

• Eq. (21) can then be used to calculate the Hubble parameter as a function of .

• Knowing at a given point in the parameter space is then enough for calculating all the quantities we need in order to compare the model (at that point) with the observations we described in section 3.1.

In order to constrain the model and test how fit it is to the real data, we need to perform a statistical analysis. This requires a scanning of the parameter space of the model and this can be done in various ways depending on which statistical framework one chooses to work with. In statistical parameter estimation and model selection, there are in general two different types of statistical inference that are fundamentally very different: “Bayesian” and “frequentist” statistics (for a general introduction to statistical inference, see e.g. [69], and for some discussions about the differences between the two approaches in data analysis, see [70, 71, 72] and references therein). In presence of sufficient observational data where the parameters of the model are strongly constrained, both statistics are proven to yield the same results. Any discrepancy between the inferences therefore shows that either we need more constraining data, or the numerical methods employed in exploring the parameter space are not accurate enough. In what follows we first briefly review the two approaches and their main ingredients used for both parameter estimation and model selection. We then discuss why it is essential to use both approaches in order to ensure that the results of the statistical inference are reliable and robust. We use both approaches in our analysis of the bigravity model in this paper.

Bayesian inference: This approach has now become among the standard tools in cosmological data analysis (for applications of Bayesian statistics in physics, see e.g. [73], and for reviews of its applications in cosmology, see e.g. [74, 75, 76, 77]). Here, one can assign probabilities to the parameters of the model under consideration. Let us first define the following quantities: 1) , which denotes our “prior” probability density function (PDF) and reflects our knowledge or prejudices about the values of the parameters before comparing the model with observations. 2) , which is called the “likelihood”, and is the probability of obtaining the data from the model parameters when it is considered as a function of . 3) , or the “posterior” PDF, which represents the probability of the model parameters when the data are used. Finally, 4) , which is called the Bayesian “evidence” and is the probability of observing the data when we integrate over the entire parameter space. Here, we have assumed to be the model hypothesis under consideration which we have parameterize by . Bayes’ Theorem then relates these quantities in the following way:

 P(Θ|D;H)=P(D|Θ;H)P(Θ;H)P(D;H). (25)

This expression simply shows how our prior degree of belief about different values of model parameters is updated when new observational data are used. Our updated knowledge, given in terms of the multi-dimensional PDF, , is the quantity of interest in Bayesian statistics through which various statistical statements can be made. One can for example construct different (“credible”) intervals and regions in the parameter space corresponding to certain levels of confidence. In this framework, one-, two-, … and -dimensional (where is the total number of free parameters of the model) credible regions for any sub-space of the parameter space can then be easily constructed by integrating (or ‘marginalizing’) the full posterior PDF, , over all other parameters; this procedure then results in joint “marginal” PDFs for the parameters of interest.

In Bayesian parameter estimation, one reports the “posterior means” of the parameters (the expectation values of the parameters according to the marginalized posterior) as their most-favored values. Uncertainties about the favored values (at a given confidence level) are then estimated by “probability mass” surfaces containing corresponding percentages of the total marginalized posterior PDF.

In Bayesian model selection on the other hand, the quantity of interest is the evidence

 Z≡P(D;H)=∫VL(Θ)π(Θ)dΘ, (26)

where is the entire -dimensional parameter space of the model. Assume that we want to select the best-fit model between two hypotheses and . This can be done by looking at the ratio

 P(H1|D)P(H0|D)=P(D|H1)P(H1)P(D|H0)P(H0)=Z1Z0P(H1)P(H0), (27)

where and are the posterior PDFs for and to be the true hypotheses given the observed data, respectively, and and are the prior PDFs for the hypotheses before observing the data. The ratio can be set to unity in case we have no preference for any of the two models a priori. In that case, one observes from eq. (27) that in order to see whether a model is preferred compared to another one one needs to evaluate Bayesian evidences for the two cases and look at their ratio.

This is an interesting recipe for selecting between different theoretical models, but with two caveats: 1) The evidence ratio is useful only if the models at hand are equally motivated theoretically or based on previous observational constraints; this is not always true though. 2) One usually calculates the evidences numerically, meaning that the exact value of the integral (26) cannot be evaluated. In practice, one needs to choose some ranges for the parameters in the parameter space to scan over and this is effectively equivalent to imposing a prior. In most cases (at least when the likelihood has a Gaussian or nearly Gaussian shape), choosing larger ranges will give a smaller evidence (this can be seen also from the observation that the evidence is nothing but the average of the likelihood over the parameter space, at least when flat priors are used). In cases where the alternative hypothesis differs from the null hypothesis in that it is only an extension of the parameter space by adding new parameters (i.e. contains as a sub-space), the evidence comparison can be a very powerful method. The reason is that increasing the dimension of the parameter space yields a smaller evidence (see integral (26)) unless the likelihood values improve in such a way that the increase in the volume of the parameter space is compensated (the evidence recipe therefore automatically implements Occam’s razor). For completely different models however, one needs to know prior probabilities for the models, as well as the parameter ranges within each model.

Frequentist inference: In this approach to statistical data analysis, probabilities are assigned only to the data and cannot be assigned to model parameters. This means that the posterior PDF, , on which Bayesian parameter estimation is based, is completely meaningless to frequentists. In this framework, one instead works only with the likelihood which by definition is the probability of the observed data for a fixed set of model parameters and is well-defined in the frequentist framework.

Knowing the full, -dimensional likelihood of the model, one can infer various statistical properties of the model by constructing “confidence” intervals and regions (as opposed to the credible intervals and regions in Bayesian statistics). Perhaps the most common method for such constructions, that is numerically feasible enough, is the “profile likelihood” procedure [78, and references therein]. Here, instead of marginalizing over unwanted parameters, one maximizes (or profiles) the full likelihood along those parameters and obtains one-, two-, … and -dimensional profile likelihoods. The most-favored values for the parameters are then given by the values that maximize the full likelihood, and uncertainties upon those values (at a given confidence level) are constructed by iso-likelihood contours in the model parameter space around the best-fit point. For Gaussian-like likelihoods (which are always the case if sufficient data are available), the profile likelihood method is a good approximation to the exact frequentist construction of confidence intervals and regions proposed by Neyman [79]. One method that provides exact confidence regions and intervals even for complex (non-Gaussian) parameter spaces (although harder to implement numerically), is the “confidence belt” construction scheme [80].

In frequentist inference, in order to assess how fit a theoretical model is to the observed data, one again works with the full likelihood. Let us assume that the number of data points (measurements) used in the analysis is and the model has free parameters. In addition, we assume that each measured data point follows a Gaussian distribution (an assumption that is approximately true for the cosmological data we use in the present paper). One then expects the () at the best-fit (highest-likelihood) point to follow a chi-squared distribution with degrees of freedom if one repeats the measurements ideally an infinite number of times. Calculating the -value (the probability of obtaining a as large as, or larger than the one actually observed, assuming that the model is true) corresponding to the observed then provides a powerful tool to assess the goodness of fit of the model to the data.

Scanning of the parameter space: Our discussions so far about the statistical frameworks have made it clear that no matter which strategy we use, one quantity that we need to estimate accurately is the likelihood of the model as a function of the free parameters. Mapping of the likelihood correctly is therefore the first essential step in both Bayesian and frequentist statistics. Especially in the latter approach, it is all we need to know: the globally maximum likelihood value provides us with a goodness-of-fit test of the model (through calculating the -value at the best-fit point), as well as the favored values of parameters and uncertainties around them (through the profile likelihood construction and iso-likelihood contours). For the Bayesian framework however, one needs to take one step further and calculate the evidence (for model comparison) and the posterior PDF (for parameter estimation).

Simple grid scans are obviously the most straightforward methods in mapping the posteriors, but they are notoriously slow when implemented numerically for high-dimensional parameter spaces. An alternative approach that has become highly popular in cosmological data analysis is based on the Markov Chain Monte Carlo (MCMC) methods (for an introduction and overview of MCMC methods, see [81], and for their applications in cosmology, see e.g. [82]) which revolve around the idea that the density of the points obtained in the numerical exploration of the parameter space be proportional to the actual probability function that one aims to map. This provides a simple way of marginalizing over any set of parameters (required by Bayesian statistics) just by counting the number of sample points corresponding to a point in the selected sub-set of the full parameter space. MCMCs provide a largely improved scanning efficiency in comparison to grid searches.

More recently, another scanning algorithm based on the framework of nested sampling [83, 84], called MultiNest [85, 86], has attracted considerable attention in both particle physics and cosmology. The primary purpose of this method is to calculate the Bayesian evidence for a model, but it also provides the posterior distribution as a byproduct. It has been claimed that the results of the MultiNest and the MCMC parameter estimations are identical (up to numerical noise), while the former is two orders of magnitude faster than the latter. The technique is clearly optimized for Bayesian inference, but in the absence of a competitive alternative, it can be used also for frequentist analyses if the model parameter space is not far too complex with many spike-like, fine-tuned regions. We do not expect our cosmological model to be of this sort and therefore choose MultiNest as the statistical tool in our present analysis (for an alternative algorithm optimized for frequentist statistics, see e.g. [70]).

As we mentioned earlier, the results of Bayesian and frequentist statistics do agree if the information provided by data is so strong that the likelihood dominates over prior contributions to the posterior distribution. In this case, the credible and confidence regions will coincide providing unique statistical conclusions about the model parameters. In Bayesian statistics, the set of prior assumptions play an important role in the final inference. In cases where such effects are dominant, the statistical conclusions cannot be trusted. This feature of Bayesian statistics is however very interesting because it provides a good measure of the robustness of a fit; one should not consider the fit definitive if it strongly depends upon priors. In this case the data are not constraining enough and/or the model under consideration is so complex that a more detailed analysis of its parameter space is required (for a detailed discussion of the impacts of priors, but in a different context, see [87]). One other way to check whether the results of a fit are independent of priors is to compare them with the results of a frequentist analysis. In cases where the frequentist measures (such as the profile likelihood) do not point to similar preferred values for parameters and uncertainties around them, we can conclude that the model is not constrained properly with the data, and priors have strong impacts on the results.

The other interesting reason why one should consider both Bayesian and frequentist measures, such as marginal posteriors and profile likelihoods, respectively, in any scan, is that they probe different properties of the parameter space. Marginal posteriors are sensitive only to the total posterior probability mass in the sub-space that has been marginalized over, and they consequently do not probe fine-tuned regions with spike-like likelihood surfaces. Frequentist measures, such as profile likelihoods, are on the other hand (by construction) strongly sensitive to such regions. Considering both marginal posteriors and profile likelihoods therefore enables us to acquire a complete picture of the favored parameter regions and statistical properties of the model.

As we discussed above, most of the state-of-the-art scanning algorithms that are widely used in cosmological data analysis are based on either MCMCs or nested sampling techniques (as we use in the analysis of the present paper). All such methods are designed and optimized for Bayesian inference as their main objectives are to calculate either the posterior PDFs or the Bayesian evidence (or both). Since there are no competitive algorithms (in terms of speed, efficiency and accuracy) that are optimized for the frequentist approach, it is very common to use those Bayesian algorithms also to map the likelihood of the model, the quantity that is needed for frequentist inference. This gives another reason why one has to deal with Bayesian inference (though implicitly) even if s/he is interested only in frequentist statistics. In principle, results of frequentist inference are independent of prior assumptions. This is however the case only if an accurate construction of the likelihood function is available, which in turn depends on how powerful the employed scanning algorithm is. Using algorithms that are optimized for Bayesian exploration of the parameter space then implies that any mapping of the likelihood function based on such scans can in principle be strongly impacted by prior effects (for more discussions, see e.g. [70, 71]).

Ideally, one wants to compute both marginal posteriors and profile likelihoods by sampling the parameter space. Our discussion in the previous paragraph illustrates that MCMC-based sampling techniques would give a good estimate of the former, but they may not be sufficiently accurate in providing the latter. One may be able to get a better estimate of the maximum values of the likelihood at each point in the parameter sub-space of interest by generating a much larger number of samples, but with typical numbers of Monte-Carlo samples generated by standard algorithms, the estimated values may be too far from the actual ones. The reason simply is that the algorithms may miss the spike-like maxima as they are not sensitive to the fine-tuned regions. One can however obtain sufficiently accurate values for the “mean” values of the likelihood along the unwanted parameters (as defined in [82]), even from a small number of samples. Since in the present paper we will give our estimates of the parameter values only in terms of marginal posteriors, and frequentist measures are used only to assess the degree of dependence of the Bayesian results on the priors (we do not aim to compute frequentist confidence regions and intervals), we plot the mean likelihoods, instead of the profile likelihoods, on top of the posteriors. We still expect a good agreement between marginal posteriors and mean likelihoods for well constrained parameters [82]. If the posterior PDF is Gaussian or is separable with respect to the sub-space of the parameter space for which the posterior has been marginalized, the mean likelihood will be proportional to the marginal posterior. Mean likelihoods are however not purely frequentist quantities, as the posterior PDF is used in their definition. They can additionally help us acquire more information about the shape of the distribution along the directions over which we marginalize the distribution. The marginalization procedure looses all such information in particular about the skewness with respect to the marginalized dimensions, which is important for understanding possible correlations between the parameters; this information can be to a great extent recovered by plotting the mean likelihoods in addition to the marginal posteriors.

We end this section by a discussion about the construction of the model likelihood that is used in constraining the model. The scanning algorithm we use in our analysis (the MultiNest algorithm) works based on the calculation of the likelihood (or the ) value at each point in the parameter space. This is done by summing over all values for different contributions from cosmological measurements of section 3.1 when Gaussian distributions are assumed for uncertainties around the measured values. In cases where a point in the parameter space, i.e. a set of parameters, gives unacceptable values for a computed quantity (such as negative values for or as discussed in section 3.3), we assume a very large value for the (or a very small value for the likelihood); this automatically disfavors such points. Also, in cases where the parameters give more than one solution for the initial value of (at ), we compare the values obtained for all the solutions and retain the one that gives the lowest value.

## 4 Results and discussion

In the previous sections, we introduced the bimetric theory of gravity and the specific model that we aim to constrain using cosmological data. In addition, we presented the data set, the parameter space of the model, statistical methods and measures, as well as the scanning algorithm that we employ in our analysis. In the following sections, we present our results for particular sub-spaces of the full parameter space, as well as the full theory. We also discuss our findings and some subtleties that must be taken into account when interpreting the results. At the end of this section, we summarize our statistical results for all different models in table 2.

### 4.1 ΛCDM as the reference model

In order to assess how well the bigravity model fits the data, it is useful to compare the results to those of a reference model, with the obvious choice of the standard CDM model. We know that the CDM model is in perfect agreement with all existing cosmological constraints, in particular at the background level. This makes the model phenomenologically very interesting and so far the best way of parameterizing the evolution of the Universe. We therefore expect any alternative models that agree with observations to effectively mimic the CDM at late times; our bigravity model is no exception.

In order to be consistent with the assumptions in the bigravity case, we define our CDM model in terms of two free parameters (present value of the matter density parameter) and (present value of the Hubble parameter). Assuming a flat universe, will be a derived quantity and not a free parameter (). We scan this two-dimensional parameter space using the data and methods we described in the previous sections. The flat (top-hat) prior ranges we choose for the parameters are and km s Mpc, for and , respectively.

The value that we find at the best-fit point is , which corresponds to a -value of . This value shows, as expected, that the observed is less than away from the predicted value, an indication that the model is very well consistent with the data. The value of ( being the Bayesian evidence) we have obtained for the model is . In addition, fig. 2 shows the one-dimensional marginal posterior PDFs (black solid curves) and mean likelihoods (red dots) for and as the result of our analysis. The blue (inner) and green (outer) vertical dashed lines indicate () and () credible intervals, respectively. Both marginal posterior and mean-likelihood curves are normalized to their values at their peaks. Both curves have Gaussian forms and perfectly match, another indication that the model fits the data very well and its parameters are observationally well constrained.

### 4.2 No explicit cosmological constant (B0=0)

Let us now turn to our bigravity model. Looking again at the model’s action, eq. (1), we see that the parameter (and correspondingly ) is nothing but a cosmological constant term for the physical metric (with )101010Strictly speaking, represents the vacuum energy contribution to the cosmological constant; see the discussions in footnotes 5 and 8 for more details.. In fact, by setting all other s to zero, it is obvious that our bigravity model will reduce to the standard CDM model (it can be seen also from eq. (19) for the Hubble parameter). We know from the observational constraints on the standard model that one gets a very good fit to the data if such a term is present. This constant term has been proposed as the standard source of cosmic acceleration at late times. As we stated in section 1, this assumption has been strongly challenged by both cosmology and particle physics considerations, mainly because of the difficulty of explaining its small but not zero value compared to the theoretical predictions without the need for a huge fine-tuning. Finding an alternative to the cosmological constant that explains the late-time acceleration of the Universe has therefore been of great interest. In order to see whether one can get an accelerated universe from our bimetric modification of gravity without an explicit cosmological term (i.e. through a self-acceleration mechanism), we set the parameter to zero. We then try to fit the other parameters of the model to the data examining how well the model can explain the cosmic acceleration.

In addition, before we analyze the full system, it is helpful to study its sub-systems where only one or a couple of the parameters are nonzero and free to vary. This may help us to understand the dynamics of the model better, for instance, which terms are necessary for self-acceleration giving a good fit to the data, which parameters can be constrained observationally, and which ones remain unconstrained. In addition, such a step-by-step analysis will be helpful in understanding possible correlations between the parameters.

#### 4.2.1 Sub-models with two free parameters: (B1,Ω0m), (B2,Ω0m), (B3,Ω0m)

We start with sub-sets of the parameter space (24) where all s (or s) are set to zero except one of them. We have already assumed is fixed to zero (again, it is obvious that a model with only nonzero and will yield good fits to observations, as this is just the CDM model that we studied in section 4.1). Keeping a free parameter, we therefore have four choices for the reduced two-dimensional sub-space, i.e. cases with , , , or being free. Clearly having only non-zero will not yield an accelerated universe and hence a good fit. It is easy to see this by observing that with only , eq. (19) is nothing but the standard Hubble equation containing only matter and radiation. Such a model does not fit the set of cosmological observations. The interesting cases to study are therefore those of free , or (when the matter density parameter is also allowed to vary).

and nonzero: The one-dimensional posterior PDFs and mean likelihoods for this case are depicted in fig. 3, where the left (right) panel corresponds to (). As for the CDM case, the black solid curves (red dots) show the posterior PDFs (mean likelihoods), and the blue (green) vertical dashed lines indicate () credible intervals. The two-dimensional posterior PDF in the - plane is given in fig. 4, where the inner (outer) contours correspond to () credible regions. The Gaussian shapes of the curves in fig. 3, as well as the observation that the posteriors and the mean likelihoods perfectly match indicate that the model is well constrained. Here, we have used flat priors in our scans with prior ranges of and for and , respectively. The impact of increasing the scanning range for on the plots is negligible, which is another indication that the analysis is statistically robust and the constraints upon the model parameters are reliable.

The value at the best-fit point is , with the corresponding -value of . The in this case is slightly bigger than that of the CDM model, but the -value shows that the observed , assuming that the ()-bigravity is the true model, is still less than away from the predicted value. We conclude that the model is perfectly consistent with observations.

With the mentioned chosen parameter ranges for the scans, the value of we get is , which is smaller than that of the CDM. An immediate conclusion might be that the CDM is favored compared to our bigravity model, since the (evidence difference between the two models) is more than . One should however be cautious in this, as first of all the Bayesian model selection procedure we described in section 3.4 is based on not only the evidence ratio (or log-evidence difference), but also on our priors about the models before observing the data (see eq. (27)). This includes all previous observational constraints on the models as well as the theoretical preferences. We do not consider any previous observational prejudices about the models here, but theoretically one may argue that the bigravity model is preferred to the CDM model because the latter (although very well consistent with observations) strongly suffers from purely theoretical problems (the measured value for is not technically natural from a particle physics point of view). In addition, the prior ranges one chooses for the parameters to scan over can change the calculated value of the evidence. We can reduce the range for in our scans and get better evidence values. For example, as we will see below, there are reasons why we expect to possess positive values for the considered sub-set of the full model to agree with observations; this reduces the prior range by a factor of two. Other possible theoretical reasons may reduce the prior range even further, giving rise to an even larger evidence. For these reasons we do not use the Bayesian model selection approach in comparing the bigravity model with the CDM in the present paper. We will however use it later when we investigate how our fits may improve by allowing more parameters of the model to vary. This would tell us whether increasing the dimensionality of the parameter space could help the model to explain the data better or not (Occam’s razor).

In order to better understand why the model gives a good fit to the data in this particular case, let us look at the dynamics of the scale factor ratio at the best-fit point of the model. Fig. 1 that we used as an example in section 3.2 actually corresponds to this particular sub-space of the parameter space. We observe from that plot that starts off from the asymptotic value of zero in the far past (high redshifts) and evolves toward another constant value of at the far future. We obtained this value from our statistical analysis of the model, but it can be understood also from analytical studies of the model, in particular by analyzing eq. (22) at (far in the future). The redshift dependence of and tells us that the energy densities of both matter and radiation will vanish at , and consequently for the ()-model, the quartic equation (22) reduces to the following asymptotic form and analytical solution for :

 B1y2−B13=0⟹y=√13≈0.577. (28)

This value perfectly matches the one we obtained numerically. Moreover, plugging this asymptotic value into eq. (19) gives an equation for the Hubble parameter that resembles that of the CDM cosmology with only a cosmological constant contribution to the energy density of the Universe (i.e. an asymptotically de Sitter universe). The energy density parameter in this case will become . If were constant over the entire evolution of the Universe, this would mean that the model could give a good fit to the observations with the value of , where is the measured value of within the CDM cosmology (). We however know that is not constant and for all redshifts larger than zero (which our measurements actually correspond to) has a value smaller than its asymptotic limit. Therefore, in order for the model to compensate for the smallness of , we expect a good fit to the data with a somewhat larger value for at the best-fit point. This is exactly what we observe as the result of our statistical analysis, i.e. a fit comparable to the CDM with .

Obviously, a more rigorous analysis is needed in order to analytically justify the obtained value for at the best-fit point (i.e. ), and to understand why this actually gives good fits for the entire time of the cosmic evolution. However, we can get a better understanding of this by considering eq. (22) once again for our particular sub-set of the model where , , and are set to zero (and the contributions from radiation and curvature are neglected). The equation in this case becomes quadratic in with the following solution:

 y=−Ωm±√Ωm2+43B122B1. (29)

The model is expected to give a relatively better fit to the data if instead of (far in the future), it matches the CDM model at (the present time). Since the effective in this case is (from eq. (19)), eq. (29) can be written in the following form at :

 2ΩΛ=−Ω0m±√Ω0m2+43B12⟹B1=±√34[(1+ΩΛ)2−Ω0m2]. (30)

First of all, we must chose a positive value for . The reason is the following: As we saw, the effective has the value of with a positive value favored by observations (to yield cosmic acceleration). is the ratio of the two scale factors of the theory and has to be positive. We therefore observe that positive is favored by data. In order to compute the value of , we plug the values of and preferred by the data in the case of the CDM model (i.e. and ) into eq. (30). This gives which is very close to the value we have obtained numerically.

Furthermore, by increasing the redshift (moving into the past), increases and eq. (29) shows that always remains positive and real, and it smoothly transitions into in the extreme past. This means that the energy contributed by to the dynamics of the Universe becomes smaller and smaller compared to the matter contribution at higher redshifts and the Universe becomes matter-dominated. The model therefore has a well-behaved solution and resembles the CDM model at larger redshifts. This is another reason why the model yields good fits to the data.

Finally, let us mention one more interesting implication of our results. We have seen that assuming only one of the s to vary (i.e. ), we get a robust constraint on its value (see the right panel of fig. 3). We have already given the value at the best-fit point, i.e the highest-likelihood value (). Marginalizing the full posterior PDF over the unwanted parameters ( in this case), gives the Bayesian preferred value for and uncertainties upon it: . In the absence of other s (or s), eq. (1) indicates that the graviton mass is purely determined by . Using the definition of (i.e. ) and the obvious choice of (there is in fact no need to define as an independent parameter in this case; it can be absorbed into ), we can conclude that , where km s Mpc. Obviously, this constraint on the graviton mass is true only if our particular construction of the massive gravity theory is correct, no explicit cosmological term exists and all parameters (except ) are zero. These are strong assumptions that need to be tested observationally or justified theoretically.

and nonzero: The results of our statistical analysis of the model in this case shows that the best-fit has the value of , which corresponds to a -value smaller than . This clearly means that the model is strongly excluded by observations (with more than confidence). This is confirmed by the very low value obtained for the Bayesian evidence: (compare this with the CDM or () models). Let us see if we can analytically explain why one does not get a good fit in this case.

As for the () case, we use eq. (22) to study how behaves in the () case. The equation becomes a cubic one with the following form:

 y(B2(y2−1)+Ωm)=0. (31)

The trivial solution is clearly not interesting, and will not yield a good fit, as it does not produce any term in eq. (19) that mimics the cosmological constant (which is essential for giving a good fit). The model in this case is nothing but the CDM model (with ). The acceptable solution for is therefore:

 y=√1−ΩmB2. (32)

We again chose positive in order to have a physically meaningful model. We observe from eq. (19) that at late times we have an effective cosmological term . To have any hope of obtaining good fits this term must be positive. This implies that must be positive.

We immediately see from eq. (32) that as gets bigger than , turns imaginary, which is unphysical. Since scales as with redshift, we expect this to always happen for redshifts bigger than some , where

 z∗=(B2Ω0m)13−1. (33)

In case a value of could be chosen such that becomes larger than the maximum redshift we have considered in our analysis (i.e. ) we would still be able to get a good fit, even though the model would need to be modified at redshifts higher than . We can check this by trying to find an estimate for at the best-fit point. As in the case, this can be done by assuming that today. Eq. (31) then reads

 B2=Ω0m+ΩΛCDMΛ=1. (34)

Therefore, as soon as the model becomes unphysical. Assuming a value of for , this corresponds to , which is well within the range of the cosmological data we have used. We therefore conclude that the ()-model cannot yield good fits.

and nonzero: As for the previous case, our statistical analysis shows that the model is excluded observationally. The reason is that the best-fit in this case is (-value ), which is even larger than that of the ()-model. The Bayesian log-evidence is confirming that the model does not fit observations. Again, in order to understand the reason analytically, we proceed in the same way as in the previous cases, starting from eq. (22). The equation becomes:

 y(B33y3−B3y+Ωm)=0. (35)

Again, it is the nontrivial cubic equation in brackets that is a possible solution. To see how this behaves we consider the discriminant of the equation:

 Δ=43B23(B23−94Ω2m). (36)

When becomes negative, as it will when , only one root of the cubic equation will be real. Hence, it is natural to only consider this solution, which is:

 y=−1B33√32B23[3√Ωm+√Ω2m−49B23+3√Ωm−√Ω2m−49B23]. (37)

The effective cosmological constant in this case is (see eq. (19)) . A good fit to the data in this case means a positive cosmological constant, which given that must be positive, implies a positive value for . Eq. (37) is however not consistent with both and being positive. This means that the model cannot fit the data if becomes negative within the range of the data. The redshift beyond which this happens can be estimated in a similar manner as for the case:

 z∗=(2B33Ω0m)13−1. (38)

At the present time, eq. (35) reads

 B3y=Ω0m+ΩΛCDMΛ=1⟹y=1B3. (39)

Plugging this value for in terms of into the relation , we get

 B3≈(3ΩΛCDMΛ)−12 (40)

as an approximation for at the best-fit point. Assuming a value of for , eqs. (40) and (38) give , which is clearly within the range of the data. This implies that the model cannot give good fits to the data and is therefore excluded.