Revisiting Cosmological parameter estimation

Jayanti Prasad jayanti@iucaa.ernet.in IUCAA, Post Bag 4, Ganeshkhind, Pune 411007, India.
July 12, 2019
Abstract

Constraining theoretical models by estimating their parameters from cosmic microwave background (CMB) anisotropy data is one of the most active areas in cosmology. WMAP, Planck and other recent experiments have shown that the six-parameter standard $\Lambda$CDM cosmological model still fits the data best. Bayesian methods based on Markov-Chain Monte Carlo (MCMC) sampling have been playing a leading role in parameter estimation from CMB data. In a recent study 2012PhRvD..85l3008P () we showed that particle swarm optimization (PSO), a population based search procedure, can also be effectively used to find the cosmological parameters which best fit the WMAP seven year data. In the present work we show that PSO can not only find the best-fit point, it can also sample the parameter space quite effectively, to the extent that we can process the PSO sampled points with the same analysis pipeline that is used to process the points sampled by Markov chains, and get consistent results. We also present implementations of the downhill simplex method of Nelder and Mead and of Powell's method of Bound Optimization BY Quadratic Approximation (BOBYQA) for cosmological parameter estimation, and compare these methods with PSO. Since PSO has the advantage that it needs only the search range, and does not need a covariance matrix, a starting point or any other quantity which depends on the final results, it can be quite useful for a blind search of the best fit parameters. Apart from that, PSO is based on a completely different algorithm, so it can supplement MCMC methods. We use PSO to estimate parameters from the WMAP nine year and Planck data and get consistent results.


I Introduction

Cosmological parameters characterizing the primordial curvature and tensor perturbations which were present at the end of inflation, the background expansion of the universe, and physical events (like reionization) that took place after the decoupling of CMB photons can be estimated from the temperature and polarization anisotropies present in the CMB sky 1994PhRvL..72...13B (); 1995PhRvD..51.2599H (); 1995ApJ...439..503D (); 1996PhRvD..54.1332J (); 1996PhRvL..76.1007J (); 1997ApJ...488....1Z (); 1997MNRAS.291L..33B (); 1998PhRvD..57.2117B (); 2002PhRvD..66f3007K (); 2003ApJ...596..725C (); 2003moco.book.....D (). CMB anisotropies are represented by a stochastic field on a 2-sphere and can be completely characterized by the correlations between different directions. If CMB anisotropies are also statistically isotropic and Gaussian, as most observations indicate, then they can be completely described by the two-point correlations or the angular power spectrum 1987MNRAS.226..655B (); 1995PhRvD..51.2599H (); 1997MNRAS.291L..33B (); 1998PhRvD..57.2117B (); 2001CQGra..18.2677C (); 2004astro.ph..3344C (); 2008PhRvD..77j3013H (); 2009PhRvD..79h3012H (). Since CMB anisotropies are quite small (about one part in one hundred thousand for temperature), their evolution can be studied quite effectively using the theory of linear perturbations 1984ApJ...285L..45B (); 1995PhRvD..51.2599H (); 1995ApJ...455....7M (); 1998PhRvD..57.3290H (). There are numerical codes available which give us the present angular power spectrum of CMB anisotropies for a given model of primordial fluctuations and background cosmology 1996ApJ...469..437S (); 1999ascl.soft09004S (); 2000ApJ...538..473L (), and that can be compared with the observed angular power spectrum we get from CMB experiments like WMAP and Planck.

In general, cosmological parameter estimation from CMB data involves finding the point in the multi-dimensional parameter space at which the likelihood function is maximum, i.e., the best fit point. In the Bayesian framework this is done by sampling the joint posterior probability distribution, from which various statistical quantities are computed. Since grid based sampling is computationally prohibitively expensive for a large number of parameters (at least six in our case), stochastic methods, whose cost scales (at most) linearly with the number of parameters, are generally employed. Stochastic methods based on Markov-Chain Monte Carlo (MCMC) sampling employing the Metropolis-Hastings algorithm 1953JChPh..21.1087M (); Hasting1970 () have been leading cosmological parameter estimation from CMB data. MCMC methods have the interesting property that they ensure that the number density of the sampled points is asymptotically proportional to the joint probability distribution 2001CQGra..18.2677C (); 2002PhRvD..66j3511L (); 2003ApJS..148..195V (); 2010LNP...800..147V (); 2014JCAP...07..018D (). Some modifications of the usual Metropolis-Hastings algorithm have been proposed to achieve better performance 2014JCAP...07..018D (). However, all the methods based on MCMC share the common weakness that the step size has to be chosen very carefully, in particular when the likelihood surface has multiple local maxima.

In a recent study 2012PhRvD..85l3008P () we showed that particle swarm optimization or PSO (we call our code COSMOPSO), which is an artificial intelligence inspired, population based search procedure, can be used to estimate cosmological parameters from the WMAP seven year data. At that time it was not clear to us whether PSO can be used not just as an optimizer but also as a "sampler", like MCMC. By carrying out a large number of simulations, we have found that if the convergence of PSO is delayed and we use multiple realizations of PSO, then PSO can also sample the parameter space quite effectively, to the extent that we can process the PSO sampled points with the same analysis pipeline which is used to process the points sampled in MCMC, i.e., GetDist.

The plan of this paper is as follows. In section (2) we show that PSO particles in COSMOPSO do sample the parameter space effectively and present the results which we get for the WMAP nine year and Planck data. In section (3) we present an implementation of the downhill simplex method of Nelder and Mead neldermean1965 () for cosmological parameter estimation (we call the code COSMODSM) for the WMAP nine year data. In section (4) we present an implementation of Powell’s method of Bound Optimization BY Quadratic Approximation (BOBYQA) for cosmological parameter estimation (we call our code COSMOBOBYQA) and compare COSMOMC, COSMOPSO, COSMODSM with COSMOBOBYQA and present results for WMAP nine year data. In section (5) we present the discussion and conclusions of our work.

II Particle Swarm Optimization

Figure 1: The figure shows the positions of PSO particles along different directions against iterations (x-axis) for three different realizations (shown by different colors). PSO particles not only finally reach the point at which the cost function (likelihood) is maximum, they also make random walks in such a way that the size of the jump at every stage is roughly proportional to their distance from the global maximum. As a result of this type of random walk, the density of sample points increases as we approach the global maximum. In the above figure the distribution of particles along the x-axis is not important, since the number of steps needed for convergence is different for different realizations.

PSO was proposed by Kennedy and Eberhart KennedEberhart1995 (); KennedEberhart2001 () in 1995, and since then many modifications of PSO have been proposed for different types of applications (for recent updates see Blum2008 (); Lazinic2009 ()). PSO is a population based search procedure which carries out local and global search in a multi-dimensional parameter space simultaneously KennedEberhart1995 (); KennedEberhart2001 (); Engelbrech2002 (); Blum2008 (); Lazinic2009 (). PSO has been regularly used in engineering problems since its inception; recently it has become common in astronomy also 2005MNRAS.359..251S (); 2010PhRvD..81f3002W (); 2011ApJ...727...80R (); 2012PhRvD..85l3008P (); 2013PhRvD..87j4021L (). In PSO, a team of particles or computational agents is launched in the multi-dimensional parameter space, and its motion is driven by the exploration of all the individual particles and of the team together. If the position and the velocity of PSO particle $i$ at "time" $t$ are represented by $X_i(t)$ and $V_i(t)$ respectively, then according to the PSO algorithm the velocity at time $t+1$ is computed as

$$V_i(t+1) = w\,V_i(t) + c_1 \xi_1 \left[P_i(t) - X_i(t)\right] + c_2 \xi_2 \left[G(t) - X_i(t)\right], \qquad (1)$$

and the position is updated as:

$$X_i(t+1) = X_i(t) + V_i(t+1). \qquad (2)$$

In Eq. (1), $P_i(t)$ is the point at which particle $i$ has so far found the maximum value of the cost function (in our case the minimum value of $-\ln \mathcal{L}$, where $\mathcal{L}$ is the likelihood function) and is called its Pbest or "personal best", and $G(t)$ is the location of the Pbest which is the highest of all, called the "global best" or Gbest. The coefficients $c_1$ and $c_2$ are called acceleration coefficients, and their values determine the weight we want to give to the exploration of the individual particles and of the team respectively. The coefficient $w$ is called the inertia weight and its value decides the weight which we want to give to the "inertial" motion of the particles, and $\xi_1$ and $\xi_2$ are two uniform random numbers in the range $[0, 1]$.
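To make the update concrete, here is a minimal sketch of one PSO iteration in Python. It illustrates Eqs. (1) and (2) only: the values of $w$, $c_1$ and $c_2$, the choice of one random number per particle per term, and the absence of boundary handling are simplifying assumptions of this sketch, not the exact settings of COSMOPSO.

```python
import numpy as np

rng = np.random.default_rng(42)

def pso_step(X, V, pbest, pbest_cost, gbest, cost_fn, w=0.7, c1=2.0, c2=2.0):
    """One iteration of Eqs. (1)-(2) for the whole swarm.
    X, V     : (n_particles, n_dim) positions and velocities
    pbest    : best position found so far by each particle (Pbest)
    gbest    : best position found by the whole swarm (Gbest)
    cost_fn  : function to minimize, e.g. -ln(likelihood)"""
    n = len(X)
    xi1 = rng.uniform(size=(n, 1))      # uniform random numbers in [0, 1]
    xi2 = rng.uniform(size=(n, 1))
    # Eq. (1): inertia + pull towards Pbest + pull towards Gbest
    V = w * V + c1 * xi1 * (pbest - X) + c2 * xi2 * (gbest - X)
    X = X + V                           # Eq. (2): position update
    # update the personal and global bests
    cost = np.apply_along_axis(cost_fn, 1, X)
    improved = cost < pbest_cost
    pbest[improved], pbest_cost[improved] = X[improved], cost[improved]
    gbest = pbest[np.argmin(pbest_cost)]
    return X, V, pbest, pbest_cost, gbest
```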

The second and the third terms on the right hand side of Eq. (1) resemble simple harmonic motion (Hooke's law) and correspond to a force acting on particle $i$ at time $t$:

$$F_i(t) = -k \left[X_i(t) - X^{*}(t)\right], \qquad (3)$$

where $X^{*}(t)$ is the location of the Pbest or Gbest and $k = c\,\xi$ is the force constant, which is a stochastic variable here. Eq. (3) shows that PSO particles oscillate around the Pbest and Gbest with gradually decreasing random amplitudes. In the beginning of the PSO exploration there are very few particles which have their Pbest close to the Gbest (one particle has its Pbest and Gbest the same); however, when the particles approach the global maximum of the cost function, all the particles have their Pbest close to the Gbest and they all oscillate around points which are very close to each other.

In order to show the behavior of the cost function $f$ close to the best-fit point, we Taylor expand it around the best fit point $\lambda_0$ in the following way:

$$f(\lambda) \approx f(\lambda_0) + \frac{1}{2} (\lambda - \lambda_0)^{T} H (\lambda - \lambda_0), \qquad (4)$$

where $H$ is the Hessian matrix at $\lambda_0$ and the linear term vanishes because the gradient is zero at an extremum. This shows that very close to the best fit point $\lambda_0$ every cost function can be approximated by a quadratic function, so the sampling method we use is not required to have a more complex strategy than what is needed to sample a quadratic function. Since the step size of PSO particles is approximately proportional to their distance from the global maximum (the approximation becomes better as we get close to the global maximum), the density of sample points increases in a natural way. We support the claim that PSO can also sample the parameter space effectively by estimating the cosmological parameters from the WMAP nine year and Planck data. We arrange the points which are sampled by PSO into "chains" and use the COSMOMC analysis pipeline GetDist on them.

In any PSO implementation we not only need to choose the values of the design parameters and the number of PSO particles, we also need to select a method for setting the initial positions and velocities of the particles, the maximum velocity, the boundary condition and a termination criterion. Apart from the termination criterion, we keep the values of the design parameters and other considerations the same as we used in 2012PhRvD..85l3008P (). For termination, we stop the PSO exploration when we find that the change in the cost function is smaller than a user given value for a given number of steps.
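A hedged sketch of such a stall-based termination test follows; the tolerance and patience values here are illustrative, not the actual COSMOPSO settings:

```python
def terminated(best_cost_history, tol=1e-4, patience=50):
    """Stop when the best cost has improved by less than tol over the
    last `patience` iterations (tol and patience are user choices)."""
    if len(best_cost_history) <= patience:
        return False
    return best_cost_history[-patience - 1] - best_cost_history[-1] < tol
```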

Our cosmological parameter estimation code based on particle swarm optimization, named COSMOPSO, has three main components, which are used for computing the theoretical $C_\ell$s, computing the likelihood, and evolving the PSO particles. Since we have already discussed the third component, here we would like to summarize the first and the second components.

For computing the angular power spectra of CMB temperature and polarization anisotropies for a given theoretical model (set of cosmological parameters) we use the publicly available code CAMB camb (), which employs the line of sight integration approach given in 1996ApJ...469..437S (). We use the April 2014 version of CAMB, which is different from what was used in the WMAP nine year and Planck data analyses, so some differences are expected in the values of the cosmological parameters we report here.

In CAMB we vary only six cosmological parameters, namely the physical densities of baryons ($\Omega_b h^2$) and cold dark matter ($\Omega_c h^2$), the Hubble parameter at present ($H_0$), the amplitude ($A_s$) and spectral index ($n_s$) of primordial scalar perturbations at some pivot scale ($k_0$), and the optical depth ($\tau$) of the reionization epoch. Note that we consider $H_0$ as a fitting parameter, and not the angular size ($\theta$) of the last scattering surface as is commonly done (in COSMOMC); $H_0$ has some advantages over $\theta$ for our purpose. We do not consider the dark energy density ($\Omega_\Lambda$) as a fitting parameter and compute it from the flatness condition:

$$\Omega_\Lambda = 1 - \Omega_b - \Omega_c. \qquad (5)$$
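In terms of the quantities actually varied, Eq. (5) amounts to the following (a two-line illustration with our own variable names, using the best fit values of Tab. (2)):

```python
ombh2, omch2, H0 = 0.02260, 0.1140, 69.827   # physical densities and H_0
h = H0 / 100.0                               # dimensionless Hubble parameter
omega_L = 1.0 - (ombh2 + omch2) / h**2       # Eq. (5): flat universe
```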

We use the pivot scale $k_0 = 0.002\,\mathrm{Mpc}^{-1}$ for the scalar and tensor power spectra when using only the WMAP nine year data, and the pivot scale $k_0 = 0.05\,\mathrm{Mpc}^{-1}$ when using the WMAP nine year and Planck data combined. We use l_max_scalar=2600, l_max_tensor=1500, l_eta_max_scalar=4000, l_eta_max_tensor=3000, and set the width of the reionization transition, the CMB temperature, the Helium fraction and the number of massless neutrino species to their standard values. We keep the logical variables AccurateReionization and AccurateBB true, and do not compute tensor perturbations, keeping WantTensors false in CAMB.
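This work calls the April 2014 Fortran CAMB directly; purely as an illustration of the same computation, here is a sketch using the modern camb Python wrapper (a different interface from the one used here; the lensing accuracy setting is an assumption of this sketch):

```python
import camb

# Six-parameter flat LCDM model; values are the PSO best fit of Tab. (2).
pars = camb.CAMBparams()
pars.set_cosmology(H0=69.827, ombh2=0.02260, omch2=0.1140, tau=0.086)
pars.InitPower.set_params(As=2.395e-9, ns=0.9733, pivot_scalar=0.002)
pars.set_for_lmax(2600, lens_potential_accuracy=1)
pars.WantTensors = False                 # tensors switched off, as in the text

results = camb.get_results(pars)
# lensed C_l (TT, EE, BB, TE) in muK^2, indexed by l
powers = results.get_cmb_power_spectra(pars, CMB_unit='muK')
cl_total = powers['total']
```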

The procedure which we follow to estimate the cosmological parameters is as follows. We use eight different realizations (different seeds for the random number generator) of PSO which have converged, for sampling. For every realization we sorted the sample points on the basis of the likelihood value and wrote them in the form of chains, as needed by the GetDist program of COSMOMC, to compute the best-fit values, 68% limits and confidence regions. In order to make the one-dimensional (marginalized) probability distributions and two-dimensional contours (68% and 95% confidence regions) smooth, smoothing parameters are passed to the GetDist parameter file. We found that PSO needs more smoothing than what is needed for MCMC samples.
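A minimal sketch of how PSO samples can be written out in the plain-text chain format that GetDist reads (each row: weight, $-\ln \mathcal{L}$, then the parameters). Giving every PSO sample unit weight is an assumption of this sketch, and the placeholder arrays stand in for the recorded particle positions of one realization:

```python
import numpy as np

def write_chain(filename, samples, neg_log_like):
    """Write samples as a GetDist-readable chain:
    each row is  weight  -ln(L)  param_1 ... param_n."""
    weights = np.ones(len(samples))               # unit weight per sample
    rows = np.column_stack([weights, neg_log_like, samples])
    np.savetxt(filename, rows, fmt='%.8e')

# Dump eight PSO realizations as chains pso_1.txt ... pso_8.txt; a
# pso.paramnames file (one "name label" pair per line) accompanies them.
for i in range(1, 9):
    samples = np.random.rand(1000, 6)             # placeholder positions
    neg_log_like = np.random.rand(1000)           # placeholder -ln(L) values
    write_chain('pso_%d.txt' % i, samples, neg_log_like)
```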

II.1 WMAP nine year data

S. No | Parameter | Prior | Description
1 | $\Omega_b h^2$ | | Baryon density today
2 | $\Omega_c h^2$ | | Cold dark matter density today
3 | $H_0$ | | Hubble parameter at present
4 | $A_s$ | | Amplitude of primordial curvature perturbations
5 | $n_s$ | | Scalar spectrum power-law index
6 | $\tau$ | | Thomson scattering optical depth due to reionization
Table 1: Cosmological parameters and the priors we used in COSMOPSO, COSMODSM and COSMOBOBYQA. Note that the ranges we have considered are different from what is generally used (in COSMOMC), mainly because all three methods we consider demand that we must be able to compute the likelihood function at every point in the prior range.

In order to estimate cosmological parameters with COSMOPSO and demonstrate that PSO can also sample the parameter space quite effectively, apart from finding the best fit values, here we present the results for the WMAP nine year data, which has been made publicly available by the WMAP team wmap (). The WMAP team has also made a FORTRAN code available with the data to compute the likelihoods of the CMB temperature and polarization data for a user given theoretical model, i.e., theoretical $C_\ell$s. There have been a few changes in the way likelihoods are computed for the nine year release 2003ApJS..148..195V (); however, the basic approach has been the same as was outlined in 2003ApJS..148..175S (). The WMAP nine year likelihood code computes the likelihood differently at low and high $\ell$ for the temperature and polarization power spectra 2013ApJS..208...20B (). At high $\ell$ ($\ell > 32$ for TT) the likelihood is no longer computed using MASTER 2002ApJ...567....2H () (as was the case for earlier data releases) but uses an optimally estimated power spectrum and errors based on the quadratic estimator 1997PhRvD..55.5895T (). At low $\ell$ ($\ell \le 32$) the TT likelihood is computed using the Blackwell-Rao estimator, which employs Gibbs sampling and uses the bias-corrected Internal Linear Combination (ILC) map as an input 2004ApJS..155..227E ().

The WMAP nine year likelihood does not have any extra parameters and needs only the theoretical power spectra for computing the likelihood. Some of the parameters which are used in the WMAP likelihood code are l_max=1200 and 800 for TT and TE respectively. We also set use_TT, use_TE, use_lowl_TT and use_lowl_pol true in the WMAP nine year likelihood code.

Figure 2: The diagonal panels in this figure show the one-dimensional marginalized distributions and the other panels show the two-dimensional marginalized constraints (68% and 95% CL) on the standard six cosmological parameters for the WMAP nine year data, which we obtained by combining the sampled points of eight different realizations of PSO and using GetDist (the program which is used for analysis in COSMOMC) on those.

The priors which we use for the cosmological parameters are given in Tab. (1). Note that the ranges which we consider here are different from what is considered in COSMOMC, mainly because all three methods which we consider in the present work demand that we should be able to compute the likelihood function at all the points in the search space.

Parameter | PSO Best Fit | PSO 68% Limit | COSMOMC Best Fit | COSMOMC 68% Limit
$\Omega_b h^2$ | 0.02260 | 0.02252 ± 0.00018 | 0.02259 | 0.02265 ± 0.00048
$\Omega_c h^2$ | 0.1140 | 0.1148 ± 0.0018 | 0.1122 | 0.1137 ± 0.0046
$H_0$ | 69.827 | 69.397 ± 0.9758 | 70.059 | 69.660 ± 2.178
$10^9 A_s$ | 2.395 | 2.416 ± 0.044 | 2.369 | 2.402 ± 0.068
$n_s$ | 0.9733 | 0.971 ± 0.005 | 0.978 | 0.9730 ± 0.013
$\tau$ | 0.086 | 0.086 ± 0.003 | 0.086 | 0.088 ± 0.014
$-\ln \mathcal{L}$ | 3778.7840 | | 3779.0220 |
Table 2: Cosmological parameters estimated using COSMOMC and PSO for the WMAP nine year temperature and polarization data. Note that the latest version of COSMOMC computes $A_s$ at the pivot scale $k_0 = 0.05\,\mathrm{Mpc}^{-1}$, so we convert the best fit and mean values to $k_0 = 0.002\,\mathrm{Mpc}^{-1}$ for comparison.

One of the results which we would like to highlight is the sampling of the parameter space done by the PSO particles. In Fig. (1) we show the distribution of PSO particles along the directions of different cosmological parameters for three different realizations. From the distribution of particles we can see that in a single realization we will most likely not have a symmetric distribution of particles around the global best position. However, when we use multiple realizations of PSO which have converged, we get a more and more symmetric distribution, which is important if we want to sample the parameter space close to the global maximum fairly. The final confirmation of the distribution of particles can be obtained by looking at the one-dimensional marginal distributions of the cosmological parameters in Figs. (2) and (3).

We show the best fit cosmological parameters which we estimate using COSMOMC and COSMOPSO for the WMAP nine year data, with their 68% limits, in Tab. (2). Note that there is a slight difference between the COSMOMC best-fit parameters which we present in Tab. (2) and those published in 2013ApJS..208...19H (), which can be attributed to the different versions of the codes we are using. From the table we can see that PSO is not only able to find the best-fit point, it is also able to find the means and 68% limits of the parameters quite close to what we get from COSMOMC for the same data. We would like to mention that the sampling done in COSMOPSO is not as good as in COSMOMC, and as a result we get slightly non-Gaussian one-dimensional marginal probability distributions as compared to COSMOMC. We also find that the minimum value of $-\ln \mathcal{L}$ which we get in COSMOPSO is very close to what we get in COSMOMC (in fact COSMOPSO gives us a slightly better likelihood).

In Fig. (2) we show the one-dimensional marginal probability distributions for the six cosmological parameters in the diagonal panels and the 68% and 95% confidence regions in the rest of the panels. From these panels we see that the marginal probability distributions and the two-dimensional joint probability distributions shown by the contour plots show the expected behavior. This clearly shows that the random walks PSO particles make in the parameter space can be used for sampling the parameter space quite effectively.

II.2 WMAP nine year + Planck data

Figure 3: The same as in Fig. (2) for the WMAP nine year + Planck data.
Parameter | PSO Best Fit | PSO 68% Limit | COSMOMC Best Fit | COSMOMC 68% Limit
$\Omega_b h^2$ | 0.02237 | 0.02231 ± 0.00001 | 0.02233 | 0.02237 ± 0.00020
$\Omega_c h^2$ | 0.1159 | 0.1167 ± 0.0014 | 0.1166 | 0.1163 ± 0.0020
$H_0$ | 69.508 | 69.138 ± 0.683 | 68.688 | 68.835 ± 0.975
$10^9 A_s$ | 2.1759 | 2.1838 ± 0.023 | 2.1741 | 2.1832 ± 0.037
$n_s$ | 0.9628 | 0.9610 ± 0.003 | 0.9614 | 0.9618 ± 0.005
$\tau$ | 0.087 | 0.088 ± 0.005 | 0.086 | 0.088 ± 0.009
$-\ln \mathcal{L}$ | 8694.4260 | | 8693.6840 |
Table 3: Cosmological parameters estimated using COSMOMC and PSO for the WMAP nine year (temperature and polarization) + Planck (temperature) data.

In this section we present our analysis of the WMAP nine year + Planck data. A detailed description of the Planck project and its data products is given in 2014A&A...571A...1P (); here we would like to mention that Planck has more frequency channels over a wider frequency range, higher angular resolution and better noise sensitivity, which makes it far better equipped to handle different types of systematics (foregrounds etc.) than WMAP. At present Planck has made only the temperature data publicly available, which can be used alone or in combination with other CMB data sets to constrain theoretical models. The Planck collaboration has also made publicly available a code named Plc for computing the likelihood of the temperature, polarization and lensing data. The code can be downloaded from plc (), and a detailed explanation of the methodology of power spectrum and likelihood estimation can be found in 2014A&A...571A..15P ().

In order to compute the likelihood for a given cosmological model, which is represented by a set of six angular power spectra (TT, EE, BB, TE, TB, EB), we use the Planck likelihood software Plc. In the Planck likelihood the high-$\ell$ (up to $\ell = 2500$) TT likelihood is computed using CamSpec, which needs 14 extra (nuisance) parameters; we fix these to their best fit values given in 2014A&A...571A..16P () and vary only the six cosmological parameters which directly affect the $C_\ell$s. The Planck likelihood software computes the low-$\ell$ ($\ell = 2$ to $49$) TT likelihood using Commander, and the low-$\ell$ polarization likelihood is computed using the WMAP nine year data, which needs the TT, EE, BB and TE power spectra at low multipoles. Since the power at $\ell = 0$ and $\ell = 1$ does not make any sense, we set those values to zero in the Planck likelihood. In COSMOPSO we add all the three Planck log-likelihoods (after changing the sign) to the WMAP nine year log-likelihood to obtain our optimization function, or the cost function. In order to cross check that we get consistent results we compare them with the COSMOMC results.
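Schematically, the cost function minimized by COSMOPSO for the joint data looks as follows. The four likelihood wrappers are hypothetical placeholders (not real library calls) standing in for the WMAP nine year likelihood code and the three Planck Plc likelihoods described above:

```python
# Hypothetical placeholder wrappers; the real pipeline calls the Fortran/C
# likelihood codes described in the text.
def wmap9_lnlike(cls):              return 0.0  # WMAP nine year TT/TE + low-l
def commander_lnlike(cls):          return 0.0  # Planck low-l TT (Commander)
def camspec_lnlike(cls, nuisance):  return 0.0  # Planck high-l TT (CamSpec)
def lowlike_lnlike(cls):            return 0.0  # low-l polarization (WMAP9)

def cost(cls, nuisance_best_fit):
    """The quantity minimized by PSO for the joint data: the total -ln(L).
    The 14 CamSpec nuisance parameters stay at their published best fit
    values; only the six cosmological parameters vary."""
    lnL = (wmap9_lnlike(cls) + commander_lnlike(cls)
           + camspec_lnlike(cls, nuisance_best_fit) + lowlike_lnlike(cls))
    return -lnL
```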

The best fit cosmological parameters which we get for the WMAP nine year + Planck data with COSMOMC and COSMOPSO are given in Tab. (3). From the table we can re-confirm that PSO not only can find the best fit cosmological parameters, which give the maximum value of the likelihood function, it can also find consistent estimates of the 68% limits of the cosmological parameters. For the WMAP nine year + Planck data we have also considered eight different realizations of COSMOPSO which have converged, and created chains out of the sample points to process with GetDist. In Fig. (3) we show the one-dimensional marginalized probability distributions and the 68% and 95% confidence regions of the estimated parameters, from which we can see that PSO gives consistent results for the joint data also.

III Downhill Simplex Method (DSM)

The downhill simplex method was introduced by Nelder and Mead neldermean1965 () in 1965 exactly for the purpose we are using it for here, i.e., "maximization of a likelihood function, in which the unknown parameters enter non-linearly". On the basis of Nelder and Mead's algorithm, 1992nrfa.book.....P () presents a quite efficient program named "amoeba", together with a detailed discussion of the algorithm and its implementation. In place of directly using amoeba, we have written our code from scratch on the basis of the algorithm given in the original paper neldermean1965 (). Since there are some minor differences between the algorithm given in neldermean1965 () and the one implemented in 1992nrfa.book.....P (), we would like to give an overview of the algorithm, which is quite close to what is given in fh2010_optimization () as well.

III.1 Basics

A geometric object with $n+1$ vertices in an $n$-dimensional space is called a "simplex". For example, a triangle in a two-dimensional flat surface is a simplex. In one-dimensional optimization (root finding, for example) we can "bracket" the solution between two points and keep making our bracket smaller and smaller following some algorithm, finally reaching close to the solution. In more than one dimension there are no "bracketing" methods available, so the problem is much more challenging.

In the downhill simplex method we begin with an initial simplex, i.e., a set of $n+1$ points in an $n$-dimensional space, and replace the point at which the cost function is maximum (the worst point) with a new point which we obtain by carrying out three types of geometric operations, named reflection, expansion and contraction. We keep repeating the process until the values of the function at all the vertices become close to each other (within a user defined tolerance). In the downhill simplex method the simplex adapts to the local geometry, in the sense that it gets elongated when it encounters a long inclined plane, reflects when it faces a valley at an angle, and contracts close to a minimum.

The most important considerations of the downhill simplex method are (1) the choice of the initial simplex, (2) the tolerance and (3) the values of the reflection, expansion and contraction coefficients. We keep the reflection coefficient $\alpha$, the expansion coefficient $\gamma$ and the contraction coefficient $\beta$ at their standard values $\alpha = 1$, $\gamma = 2$ and $\beta = 0.5$ respectively. Choosing the tolerance is easy (we can choose a sufficiently small number), but choosing an initial simplex is a bit harder, since we want to make sure that the simplex always remains within the allowed range without imposing any boundary condition.

III.2 The Algorithm

S. No | (a, b) | iterations | $\Omega_b h^2$ | $\Omega_c h^2$ | $H_0$ | $10^9 A_s$ | $n_s$ | $\tau$ | $-\ln \mathcal{L}$
1 | (0.20, 0.60) | 428 | 0.022546 | 0.113998 | 69.689714 | 2.400900 | 0.971840 | 0.085522 | 3778.8044
2 | (0.20, 0.40) | 330 | 0.022648 | 0.114334 | 69.782342 | 2.395106 | 0.974042 | 0.086739 | 3778.7934
3 | (0.25, 0.40) | 348 | 0.022603 | 0.114112 | 69.811296 | 2.396340 | 0.973377 | 0.086554 | 3778.7840
4 | (0.21, 0.42) | 356 | 0.022601 | 0.114102 | 69.804714 | 2.396605 | 0.973349 | 0.086532 | 3778.7839
5 | (0.40, 0.20) | 306 | 0.022609 | 0.113243 | 70.223014 | 2.334731 | 0.973016 | 0.074727 | 3779.3774
6 | (0.30, 0.40) | 364 | 0.022601 | 0.114110 | 69.807095 | 2.399997 | 0.973447 | 0.087404 | 3778.7830
7 | (0.35, 0.25) | 374 | 0.022588 | 0.113895 | 69.868690 | 2.402573 | 0.972920 | 0.087475 | 3778.7909
8 | (0.50, 0.20) | 222 | 0.022345 | 0.115229 | 68.930597 | 2.412138 | 0.965558 | 0.077042 | 3779.1554
Table 4: Best fit cosmological parameters for eight different initial conditions of the Downhill Simplex method. Apart from the values of the parameters $a$ and $b$ we have kept the other parameters fixed, i.e., $\alpha = 1$, $\gamma = 2$ and $\beta = 0.5$. How rapidly the algorithm converges depends on the values of the parameters $a$ and $b$, i.e., on the initial simplex. The highest value of the likelihood function is shown in bold.

Since the downhill simplex algorithm strictly demands that we should be able to compute the (cost) function at all the vertices at every step, we must make sure that the values of the parameters always remain in the permitted range. We use the same range of cosmological parameters in COSMODSM which we use for COSMOPSO, given in Tab. (1). Since there is no standard prescription for choosing the starting simplex in the downhill simplex method, we chose an ad hoc prescription, which is as follows.

We represent the $i$th vertex of the simplex at iteration (or "time") $t$ by $X_i(t)$, and the minimum and maximum values allowed for it are given by $X^{\min}$ and $X^{\max}$. Note that here $X_i$ is a vector which has the dimension of the parameter space (in our case it is six, for the standard cosmological model).

The first vertex of the initial simplex is set as:

$$X_1 = X^{\min} + a \left(X^{\max} - X^{\min}\right), \qquad (6)$$

where $a$ is a user defined parameter in the range $(0, 1)$ (we recommend it to be less than 0.5).

The other remaining vertices of the initial simplex are set as:

$$X_{i+1} = X_1 + b_i\, \hat{e}_i, \quad i = 1, \ldots, n, \qquad (7)$$

where the $\hat{e}_i$ are unit vectors along the coordinate directions. The coefficients $b_i$ depend on the search range:

$$b_i = b \left(X_i^{\max} - X_i^{\min}\right), \qquad (8)$$

where $b$ is a user defined parameter in the range $(0, 1)$ (we recommend it to be less than 0.5) and $X_i^{\min}$, $X_i^{\max}$ are the bounds along direction $i$. Note that there is some degeneracy between $a$ and $b$, and in any ideal parametrization this should be avoided. Our parametrization is as good as choosing the initial points by trial and error.
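A sketch of this prescription in code, under our reading of Eqs. (6)-(8) (the exact construction in COSMODSM may differ in detail):

```python
import numpy as np

def initial_simplex(xmin, xmax, a, b):
    """Build the starting simplex of Eqs. (6)-(8): the first vertex sits a
    fraction a into the search range; each remaining vertex offsets one
    coordinate by a fraction b of that coordinate's range."""
    xmin, xmax = np.asarray(xmin, float), np.asarray(xmax, float)
    n = len(xmin)
    simplex = np.empty((n + 1, n))
    simplex[0] = xmin + a * (xmax - xmin)             # Eq. (6)
    for i in range(n):
        simplex[i + 1] = simplex[0]
        simplex[i + 1, i] += b * (xmax[i] - xmin[i])  # Eqs. (7)-(8)
    return simplex
```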

In COSMODSM, at every iteration we find the vertices which have the largest and smallest values of the cost function, and call these $X_h$ and $X_l$ respectively. For better readability we also represent $X_h$ by $X_w$ (the worst point) and $X_l$ by $X_b$ (the best point). The second highest point is also needed, and we represent it by $X_{sh}$. We also compute the centroid of the vertices (keeping $X_w$ aside) and represent it by $X_c$; all the geometric operations, reflection, expansion and contraction, are carried out with respect to $X_c$. We represent the value of the cost function at $X_i$ by $f_i$, and so:

$$f_w = \max_i f_i \qquad (9)$$

and

$$f_b = \min_i f_i. \qquad (10)$$

The centroid is defined as:

$$X_c = \frac{1}{n} \sum_{i \neq w} X_i. \qquad (11)$$

We use the same convergence criterion which is used in 1992nrfa.book.....P (), for which we compute:

$$\Delta = \frac{2\,|f_w - f_b|}{|f_w| + |f_b|}, \qquad (12)$$

and stop the search when $\Delta < \epsilon$, where $\epsilon$ is a user defined number.
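In code, the test of Eq. (12) is a one-liner (the value of epsilon is illustrative):

```python
def dsm_converged(f_vals, eps=1e-8):
    """Fractional convergence test of Eq. (12); the tiny constant guards
    against division by zero, as in Numerical Recipes."""
    fw, fb = max(f_vals), min(f_vals)
    return 2.0 * abs(fw - fb) / (abs(fw) + abs(fb) + 1e-30) < eps
```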

In neldermean1965 () three different equations are used for reflection, expansion and contraction; however, we use a single function, defined below, to carry out all three operations:

$$X(\mu) = (1 + \mu)\, X_c - \mu\, X_w, \qquad (13)$$

where $\mu = \alpha$, $\alpha\gamma$ or $-\beta$ for reflection, expansion and contraction respectively, with $\alpha, \gamma > 0$ and $0 < \beta < 1$. We consider $\alpha = 1$, $\gamma = 2$ and $\beta = 0.5$ in the present work.

The meaning of $\mu$ can be easily understood by writing:

$$\mu = \frac{X(\mu) - X_c}{X_c - X_w}, \qquad (14)$$

which means that $\mu$ is the (signed) ratio of the distances of the new point $X(\mu)$ and the old point $X_w$ from the centroid $X_c$.

The main steps of the Nelder-Mead algorithm can be summarized in the following way (a sketch in code follows the list):

  1. Set up the initial simplex $X_i$, compute the values of the function $f_i$, find the highest (worst) and lowest (best) points $X_w$ and $X_b$, and the centroid $X_c$ of the $n$ best points. We also need to find the second worst (highest) point $X_{sh}$.

  2. If convergence has not been reached, reflect the worst point about the centroid:

    $$X_r = (1 + \alpha)\, X_c - \alpha\, X_w \qquad (15)$$

    and go to the next step.

  3. If we find $f_b \le f_r < f_{sh}$, that means we have found a point which is better than the worst point, so we replace $X_w$ with $X_r$, recompute $X_w$, $X_b$, $X_{sh}$ and $X_c$, and go to step (2).

  4. If we find $f_r < f_b$, that means we are going in the right direction and it is useful to expand along the same direction and get a new point $X_e$:

    $$X_e = (1 + \alpha\gamma)\, X_c - \alpha\gamma\, X_w. \qquad (16)$$

    If we find that $f_e < f_r$ then we replace the worst point with $X_e$, otherwise we replace the worst point with $X_r$, and go to step (2).

  5. If we find $f_r \ge f_{sh}$, that means the simplex is too big and we need contraction. However, before that we test:

    1. If $f_r \ge f_w$, i.e., the reflected point is worse than the worst or as bad as the worst, we contract the worst point:

      $$X_{con} = (1 - \beta)\, X_c + \beta\, X_w. \qquad (17)$$

      If we find that the point $X_{con}$ is still worse than the worst, we shrink the whole simplex around the best point $X_b$:

      $$X_i \rightarrow X_b + \sigma \left(X_i - X_b\right), \qquad (18)$$

      with $\sigma = 0.5$, and go to step (2).

    2. If $f_{sh} \le f_r < f_w$ then we contract the reflected point:

      $$X_{con} = (1 - \beta)\, X_c + \beta\, X_r, \qquad (19)$$

      and if $f_{con} \le f_r$ then we replace $X_w$ with $X_{con}$, otherwise we again shrink the simplex around the best point and go to step (2).
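The following sketch puts one whole iteration together, under our reading of the steps above (it is not the COSMODSM source; the shrink coefficient 0.5 and the merged acceptance test for the two contraction branches are simplifications):

```python
import numpy as np

def nelder_mead_step(simplex, f_vals, func, alpha=1.0, gamma=2.0, beta=0.5):
    """One iteration of the algorithm above, using the single operation of
    Eq. (13): X(mu) = (1 + mu) X_c - mu X_w."""
    order = np.argsort(f_vals)
    b, sh, w = order[0], order[-2], order[-1]     # best, second worst, worst
    xc = simplex[order[:-1]].mean(axis=0)         # centroid of Eq. (11)

    xr = (1.0 + alpha) * xc - alpha * simplex[w]  # reflection, Eq. (15)
    fr = func(xr)
    if fr < f_vals[b]:                            # step 4: try expansion
        xe = (1.0 + alpha * gamma) * xc - alpha * gamma * simplex[w]  # Eq. (16)
        fe = func(xe)
        simplex[w], f_vals[w] = (xe, fe) if fe < fr else (xr, fr)
    elif fr < f_vals[sh]:                         # step 3: accept reflection
        simplex[w], f_vals[w] = xr, fr
    else:                                         # step 5: contraction
        if fr < f_vals[w]:
            xcon = (1.0 - beta) * xc + beta * xr          # Eq. (19)
        else:
            xcon = (1.0 - beta) * xc + beta * simplex[w]  # Eq. (17)
        fcon = func(xcon)
        if fcon < min(fr, f_vals[w]):
            simplex[w], f_vals[w] = xcon, fcon
        else:                                     # shrink around best, Eq. (18)
            simplex[:] = simplex[b] + 0.5 * (simplex - simplex[b])
            f_vals[:] = np.apply_along_axis(func, 1, simplex)
    return simplex, f_vals
```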

Figure 4: Change in the cost (likelihood) function at the seven vertices (shown by different colors) as the Downhill Simplex algorithm iterates (the x-axis shows the iterations) for the WMAP nine year data in COSMODSM. The minimum (best) value of the cost function is also shown, by the straight line.
Figure 5: Change in the vertices (shown by different colors) of the simplex with iterations (x-axis) along different directions, i.e., cosmological parameters. The best fit values of the parameters are also shown, by straight lines.
Figure 6: Change in the positions of a few PSO particles with iterations (x-axis) along different directions. PSO particles sample a much broader area of the parameter space along their journey towards the best fit point, as compared to the vertices of the simplex in the Downhill Simplex method shown in Fig. (5).

In any iterative algorithm like PSO and downhill simplex, it is generally not that important which convergence criterion we choose, since most criteria look for the progress made in each iteration and stop the search when the improvement falls below, or remains below, a tolerance limit. All three methods discussed in the present work have different criteria; however, they all show very good agreement in the final results.

In order to demonstrate that the downhill simplex method of Nelder and Mead can successfully find the best fit cosmological parameters, we considered the WMAP nine year data for our analysis. Apart from the programs for the exploration of the parameter space, we used the same program to compute the cost function (likelihood) which we use in COSMOPSO, so there is no need to elaborate further on that, and we only present the results here.

In Fig. (4) we show the change in the values of the cost function at the vertices of the simplex as the simplex evolves. From the figure we can see that, as in any other iterative method, in DSM also we have a huge improvement in the cost function in the beginning, which then falls off rapidly, meaning that in DSM too (like PSO) convergence is reached only asymptotically.

We show the best fit cosmological parameters which we get for the WMAP nine year data using COSMODSM in Tab. (4) for eight different initial conditions, i.e., different values of the parameters $a$ and $b$. We find that, apart from needing different numbers of iterations for convergence, they all agree on the values of the best fit cosmological parameters within a reasonable limit.

We also tried to see whether DSM can sample the parameter space like PSO, and found that it does a very poor job and also suffers from the drawback of early convergence. To show this, in Figs. (5) and (6) we show the vertices of the simplex in COSMODSM and the trajectories of a few PSO particles in COSMOPSO, respectively. From these figures it is quite clear that PSO does sample a large volume of the parameter space due to its stochastic nature, which is not true for DSM.

IV Powell's Method of Bound Optimization BY Quadratic Approximation (BOBYQA)

Since M. J. D. Powell gave his method of unconstrained optimization by quadratic approximation in 1962 powel1962 (), there have been many changes in his approach to non-linear optimization without derivatives, as reflected by the software packages COBYLA, NEWUOA, BOBYQA and LINCOA bobyqa () released by him from time to time. In the present work we use the BOBYQA FORTRAN 77 package, which is available for download from bobyqa ().

Powell's method of Bound Optimization BY Quadratic Approximation (BOBYQA) is a technique to find the minimum of a function $F(x)$, $x \in \mathbb{R}^n$, with $a \le x \le b$, without computing derivatives of the cost function $F$. The method is based on approximating the function by a quadratic function $Q(x)$ at a set of points which are chosen and adjusted automatically:

$$Q(x) = c + g^{T} x + \frac{1}{2}\, x^{T} H x, \qquad (20)$$

where $c$ is a constant, $g$ is a column vector with $n$ elements and $H$ is an $n \times n$ symmetric matrix; in total we have $(n+1)(n+2)/2$ coefficients to fit the quadratic function, for which we consider a set of $m$ interpolation points $\{y_j\}$ at iteration $k$ at which the approximation holds:

$$Q_k(y_j) = F(y_j), \quad j = 1, \ldots, m. \qquad (21)$$

At every iteration we select a point $x_k$ from $\{y_j\}$ which has the property:

$$F(x_k) = \min_j F(y_j), \qquad (22)$$

which we keep updating, just like we keep updating the worst vertex at every iteration in the Downhill Simplex method.

The quadratic approximation holds true only within a small region characterized by a positive number $\rho$ called the "trust region radius", for which we must give initial and final values $\rho_{\rm beg}$ and $\rho_{\rm end}$. The value of $\rho$ keeps changing with iterations, and once it falls below $\rho_{\rm end}$ we stop the program. Quadratic approximation within a small region is a common technique and is used in many other algorithms also, including the one we published in 2013PhRvD..88b3522G ().

At iteration $k$ we construct a step $d_k$ from $x_k$ such that (1) $a \le x_k + d_k \le b$, (2) $\|d_k\| \le \rho_k$, and (3) $x_k + d_k$ is not one of the interpolation points $y_j$. Once we have a valid $d_k$ we can update

$$x_{k+1} = x_k + d_k \qquad (23)$$

if $F(x_k + d_k) < F(x_k)$, otherwise we set $x_{k+1} = x_k$, and one of the interpolation points is replaced with $x_k + d_k$. After this has been done, we generate $Q_{k+1}$ and $\rho_{k+1}$ for iteration $k+1$ according to the prescription given in bobyqa ().

In order to use BOBYQA for cosmological parameter estimation from the WMAP nine year data we made the following choices. We compute the theoretical CMB angular power spectra $C_\ell$ using the publicly available code CAMB, and use the WMAP nine year likelihood code for computing the likelihood of the temperature and polarization anisotropies, as we do in section (II).

Since BOBYQA is designed in such a way that all the parameters should have values of the same order, we normalize the six cosmological parameters by some default values, so that the scaled parameters are all of order unity at every iteration. This step is important, because otherwise parameters like $H_0$, which have large values, will hardly get updated. We chose the default values so as to avoid giving the exact values of the best fit cosmological parameters known to us.

There is no prescription available for choosing the number of interpolation points $m$, and one must find an appropriate value by trial and error. We have found that the recommended choice $m = 2n + 1$, which is in the allowed range $n + 2 \le m \le (n+1)(n+2)/2$, works fine for our case. For $\rho_{\rm beg}$ and $\rho_{\rm end}$ also there is no standard prescription, so we must set their values by trying out different values. We found choices of $\rho_{\rm beg}$ and $\rho_{\rm end}$ that give us acceptable convergence, i.e., convergence is reached after 340 iterations.
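COSMOBOBYQA links Powell's Fortran 77 package directly; purely as an illustration, the same algorithm is also exposed through the nlopt Python bindings. Mapping $\rho_{\rm beg}$ and $\rho_{\rm end}$ onto nlopt's initial step and tolerance, and the toy quadratic cost, are assumptions of this sketch:

```python
import numpy as np
import nlopt

def cost(x):
    """Placeholder for -ln(L) (a CAMB call + WMAP nine year likelihood)."""
    return float(np.sum((x - 0.5) ** 2))    # toy quadratic stand-in

lb, ub = np.zeros(6), np.ones(6)            # normalized search range

opt = nlopt.opt(nlopt.LN_BOBYQA, 6)         # Powell's BOBYQA, derivative free
opt.set_lower_bounds(lb)
opt.set_upper_bounds(ub)
opt.set_min_objective(lambda x, grad: cost(x))
opt.set_initial_step(0.1)                   # plays the role of rho_beg (roughly)
opt.set_xtol_abs(1e-6)                      # plays the role of rho_end (roughly)

x0 = lb + 0.4 * (ub - lb)                   # order-unity starting point
x_best = opt.optimize(x0)
print(x_best, opt.last_optimum_value())
```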

Here we would like to mention that all versions of COSMOMC cosmomc () from October 2012 onward also use Powell's BOBYQA, with some modifications cosmomc (), to find the best-fit cosmological parameters when only those are needed. Our implementation of Powell's method is completely independent and differs from what is used in COSMOMC in terms of usage and the values of the design parameters $m$, $\rho_{\rm beg}$ and $\rho_{\rm end}$. It is also important to keep in mind that MCMC methods themselves can also find the best-fit point without any external optimizer; however, they can take more time than COSMODSM or COSMOBOBYQA.

Parameter | COSMOMC | COSMOPSO | COSMODSM | COSMOBOBYQA
$\Omega_b h^2$ | 0.022597 | 0.022601 | 0.022601 | 0.022630
$\Omega_c h^2$ | 0.112282 | 0.114044 | 0.114110 | 0.114148
$H_0$ | 70.06592 | 69.82771 | 69.807095 | 69.87489
$10^9 A_s$ | 2.177424 | 2.395936 | 2.399997 | 2.414575
$n_s$ | 0.973842 | 0.973390 | 0.973447 | 0.974537
$\tau$ | 0.859690 | 0.086614 | 0.087404 | 0.091901
$-\ln \mathcal{L}$ | 3779.0220 | 3778.7839 | 3778.7830 | 3778.8495
Table 5: We show the best fit cosmological parameters, which give the maximum likelihood, for COSMOMC, PSO, the Downhill Simplex method and Powell's BOBYQA in the second, third, fourth and fifth columns of the above table respectively.

In Tab. (5) we show the cosmological parameters which we have estimated with the four different methods COSMOMC, COSMOPSO, COSMODSM and COSMOBOBYQA, and find that all the methods are able to find the correct values of the best fit cosmological parameters within a reasonable time.

V Discussion and Conclusion

Building on our earlier work 2012PhRvD..85l3008P (), in the present work we have shown that particle swarm optimization can be used not only to find the best fit cosmological parameters from CMB data sets like WMAP and Planck, it can also be used to sample the parameter space quite effectively. We have found that when using PSO as a sampler we must avoid early convergence and use multiple realizations of PSO, so that the parameter space is covered uniformly. We present the results of our PSO analysis of the WMAP nine year and Planck data and show that these are quite consistent with the standard results. To show that the sampling of the parameter space is effective, we demonstrate that we can process the PSO sampled points with GetDist, the same analysis pipeline which is used to process the MCMC sampled points in COSMOMC, and get consistent results, as is clear from the one-dimensional marginalized probability distributions and 68% and 95% confidence regions we show in Figs. (2) and (3) for the WMAP nine year and WMAP nine year + Planck data respectively.

We ran COSMOMC and COSMOPSO on a Linux cluster with multiple nodes, using the Message Passing Interface (MPI) library and the shared memory parallelization library OpenMP. We have found that COSMODSM and COSMOBOBYQA are faster than COSMOPSO and COSMOMC; however, they need fine tuning of the design parameters and have higher chances of failure. COSMOPSO has the problem of early convergence, but that is not an issue if our aim is just to find the best fit point and not to sample the parameter space. In general, with 30 PSO particles in COSMOPSO we can achieve convergence within 400 steps, or fewer than 12000 computations of the cost function (CAMB calls + likelihood computations). For COSMODSM we need 400 steps with 7 vertices, or around 2800 computations of the cost function. For COSMOBOBYQA we achieve convergence in around 400 steps, or 400 computations of the cost function. It should be noted that the different methods we have discussed here have different convergence criteria, so a direct comparison is not fair; however, what we have mentioned is qualitatively true, i.e., the COSMOBOBYQA method is faster than COSMOPSO, COSMODSM and COSMOMC.

In the present work we use only the basic formalism of PSO; however, PSO has scope for modifications also, some of which have already been made for different types of problems (see Blum2008 (); Lazinic2009 ()). We believe that with some modifications PSO should be able to become a robust method for sampling also in Bayesian data analysis. One of the interesting properties of PSO is that the trajectories of PSO particles can be compared with the trajectories of point particles in classical mechanics, since those also follow trajectories which minimize some function (the action). However, this analogy cannot be stretched too far, since for PSO particles time is not continuous, i.e., the limit $\Delta t \rightarrow 0$ cannot be taken.

We also presented implementations of two other methods, the downhill simplex method of Nelder and Mead and Powell's method of Bound Optimization BY Quadratic Approximation (BOBYQA), in the present work, which are much faster than PSO but less robust, in the sense that PSO is far less sensitive to its design parameters and does not need any starting point (all the starting points are equally good), which is not the case for the other algorithms. Since COSMOPSO is a completely parallel code (it uses distributed memory parallelism via MPI and shared memory parallelism via OpenMP), computational cost is not an issue on a cluster.

We have found that all four methods (as summarized in Tab. (5)) are quite successful in finding the best-fit cosmological parameters in a reasonable time.

Cosmological parameter estimation from CMB data sets is still an active area of research in cosmology, and optimization methods like PSO can be quite helpful not only in cosmological parameter estimation but also at other stages of the CMB data analysis pipeline. Being independent and robust, PSO has the potential of becoming an essential component of the CMB data analysis toolkit.

Acknowledgments

The author acknowledges the use of the codes CAMB, COSMOMC and the Planck likelihood Plc for this work, and also thanks the WMAP and Planck teams for making the data publicly available. The author thanks Scott Dodelson for suggesting the comparison of PSO with the downhill simplex method, and Gaurav Goswami for feedback and comments on the drafts. The author would also like to thank Tarun Souradeep for giving insight into CMB physics from time to time and for introducing the problem of cosmological parameter estimation. All the computational work for this project was done on the IUCAA HPC facility. The author especially thanks the Science and Engineering Research Board (SERB) of the Govt. of India for financially supporting this work via a Start-Up Research Grant (Young Scientists), grant no. SR/FTP/PS-102/2012.

References

  • (1) J. Prasad and T. Souradeep, Phys. Rev. D85, 123008 (2012), 1108.5600.
  • (2) J. R. Bond, R. Crittenden, R. L. Davis, G. Efstathiou, and P. J. Steinhardt, Physical Review Letters 72, 13 (1994), arXiv:astro-ph/9309041.
  • (3) W. Hu and N. Sugiyama, Phys. Rev. D51, 2599 (1995), arXiv:astro-ph/9411008.
  • (4) S. Dodelson and J. M. Jubas, Astrophys. J. 439, 503 (1995), astro-ph/9308019.
  • (5) G. Jungman, M. Kamionkowski, A. Kosowsky, and D. N. Spergel, Phys. Rev. D54, 1332 (1996), arXiv:astro-ph/9512139.
  • (6) G. Jungman, M. Kamionkowski, A. Kosowsky, and D. N. Spergel, Physical Review Letters 76, 1007 (1996), arXiv:astro-ph/9507080.
  • (7) M. Zaldarriaga, D. N. Spergel, and U. Seljak, Astrophys. J. 488, 1 (1997), arXiv:astro-ph/9702157.
  • (8) J. R. Bond, G. Efstathiou, and M. Tegmark, Mon. Not. R. Astron. Soc. 291, L33 (1997), arXiv:astro-ph/9702100.
  • (9) J. R. Bond, A. H. Jaffe, and L. Knox, Phys. Rev. D57, 2117 (1998), arXiv:astro-ph/9708203.
  • (10) A. Kosowsky, M. Milosavljevic, and R. Jimenez, Phys. Rev. D66, 063007 (2002), arXiv:astro-ph/0206014.
  • (11) M. Chu, M. Kaplinghat, and L. Knox, Astrophys. J. 596, 725 (2003), arXiv:astro-ph/0212466.
  • (12) S. Dodelson, Modern cosmology (Academic Press, San Diego, U.S.A., 2003).
  • (13) J. R. Bond and G. Efstathiou, Mon. Not. R. Astron. Soc. 226, 655 (1987).
  • (14) N. Christensen, R. Meyer, L. Knox, and B. Luey, Classical and Quantum Gravity 18, 2677 (2001), arXiv:astro-ph/0103134.
  • (15) A. Challinor, arXiv:astro-ph/0403344 (2004), arXiv:astro-ph/0403344.
  • (16) S. Hamimeche and A. Lewis, Phys. Rev. D77, 103013 (2008), 0801.0554.
  • (17) S. Hamimeche and A. Lewis, Phys. Rev. D79, 083012 (2009), 0902.0674.
  • (18) J. R. Bond and G. Efstathiou, Astrophys. J. Lett. 285, L45 (1984).
  • (19) C.-P. Ma and E. Bertschinger, Astrophys. J. 455, 7 (1995), astro-ph/9506072.
  • (20) W. Hu, U. Seljak, M. White, and M. Zaldarriaga, Phys. Rev. D57, 3290 (1998), astro-ph/9709066.
  • (21) U. Seljak and M. Zaldarriaga, Astrophys. J. 469, 437 (1996), arXiv:astro-ph/9603033.
  • (22) U. Seljak and M. Zaldarriaga, CMBFAST: A microwave anisotropy code, 1999, 9909.004, Astrophysics Source Code Library.
  • (23) A. Lewis, A. Challinor, and A. Lasenby, Astrophys. J. 538, 473 (2000), arXiv:astro-ph/9911177.
  • (24) N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. Phys. 21, 1087 (1953).
  • (25) W. K. Hastings, Biometrika 57, 97 (1970).
  • (26) A. Lewis and S. Bridle, Phys. Rev. D66, 103511 (2002), arXiv:astro-ph/0205436.
  • (27) L. Verde et al., Astrophys. J. Suppl. Ser. 148, 195 (2003), arXiv:astro-ph/0302218.
  • (28) L. Verde, Statistical Methods in Cosmology, in Lecture Notes in Physics, Berlin Springer Verlag, edited by G. Wolschin, volume 800 of Lecture Notes in Physics, Berlin Springer Verlag, pp. 147–177, 2010, 0911.3105.
  • (29) S. Das and T. Souradeep, JCAP 07, 018 (2014), 1403.1271.
  • (30) J. A. Nelder and R. Mead, The Computer Journal 7, 308 (1965).
  • (31) J. Kennedy and R. C. Eberhart, IEEE International Conference on Neural Networks 4, 1992 (1995).
  • (32) J. Kennedy and R. C. Eberhart, Swarm Intelligence (Morgan Kaufmann, 2001).
  • (33) C. Blum and D. Merkle, Swarm Intelligence: Introduction and Applications (Springer, 2008).
  • (34) A. Lazinica, Particle Swarm Optimization (In-Tech., 2009).
  • (35) A. P. Engelbrecht, Computational Intelligence, An Introduction (John Wiley & Sons, 2002).
  • (36) C. Skokos, K. E. Parsopoulos, P. A. Patsis, and M. N. Vrahatis, Mon. Not. R. Astron. Soc. 359, 251 (2005), astro-ph/0502164.
  • (37) Y. Wang and S. D. Mohanty, Phys. Rev. D81, 063002 (2010), 1001.0923.
  • (38) A. Rogers and J. D. Fiege, Astrophys. J. 727, 80 (2011), 1101.5803.
  • (39) L. Lentati et al., Phys. Rev. D87, 104021 (2013), 1210.3578.
  • (40) http://camb.info/.
  • (41) http://map.gsfc.nasa.gov/.
  • (42) D. N. Spergel et al., Astrophys. J. Suppl. Ser. 148, 175 (2003), astro-ph/0302209.
  • (43) C. L. Bennett et al., Astrophys. J. Suppl. Ser. 208, 20 (2013), 1212.5225.
  • (44) E. Hivon et al., Astrophys. J. 567, 2 (2002), astro-ph/0105302.
  • (45) M. Tegmark, Phys. Rev. D55, 5895 (1997), astro-ph/9611174.
  • (46) H. K. Eriksen et al., Astrophys. J. Suppl. Ser. 155, 227 (2004), astro-ph/0407028.
  • (47) G. Hinshaw et al., Astrophys. J. Suppl. Ser. 208, 19 (2013), 1212.5226.
  • (48) Planck Collaboration et al., Astronomy & Astrophysics 571, A1 (2014), 1303.5062.
  • (49) http://wiki.cosmos.esa.int/planckpla/index.php/Main_Page.
  • (50) Planck Collaboration et al., Astronomy & Astrophysics 571, A15 (2014), 1303.5075.
  • (51) Planck Collaboration et al., Astronomy & Astrophysics 571, A16 (2014), 1303.5076.
  • (52) W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical recipes in FORTRAN. The art of scientific computing (Cambridge University Press, 1992).
  • (53) W. Forst and D. Hoffmann, Optimization — Theory and Practice (Springer, New York, U.S.A., 2010).
  • (54) M. J. D. Powell, The Computer Journal 5, 147 (1962).
  • (55) http://mat.uc.pt/~zhang/software.html.
  • (56) G. Goswami and J. Prasad, Phys. Rev. D88, 023522 (2013), 1303.4747.
  • (57) http://cosmologist.info/cosmomc/.