Constrained probability distributions of correlation functions
Key Words.:cosmology - gravitational lensing - large-scale structure of the Universe - galaxies: statistics - Methods: statistical
11email: firstname.lastname@example.org 22institutetext: Max-Planck-Institut für Gravitationsphysik, Callinstraße 38, 30167 Hannover, Germany
Context:Two-point correlation functions are used throughout cosmology as a measure for the statistics of random fields. When used in Bayesian parameter estimation, their likelihood function is usually replaced by a Gaussian approximation. However, this has been shown to be insufficient.
Aims:For the case of Gaussian random fields, we search for an exact probability distribution of correlation functions, which could improve the accuracy of future data analyses.
Methods:We use a fully analytic approach, first expanding the random field in its Fourier modes, and then calculating the characteristic function. Finally, we derive the probability distribution function using integration by residues. We use a numerical implementation of the full analytic formula to discuss the behaviour of this function.
Results:We derive the univariate and bivariate probability distribution function of the correlation functions of a Gaussian random field, and outline how higher joint distributions could be calculated. We give the results in the form of mode expansions, but in one special case we also find a closed-form expression. We calculate the moments of the distribution and, in the univariate case, we discuss the Edgeworth expansion approximation. We also comment on the difficulties in a fast and exact numerical implementation of our results, and on possible future applications.
In several fields of science, there are observations that can be modelled as random processes, either time series or spatial random fields. In cosmology, mostly random fields are of interest, for example as the density perturbation field. Therefore, we use the language of random fields in this article. Still, all results are applicable to time series as well.
An important statistical quantity of random fields is the two-point correlation function, labelled in the following, with being the separation between two points of the field. When the empirical correlation function has been measured, it can be used to estimate the parameters of a theoretical model for the random field. The standard method for this, at least in cosmology, is Bayesian inference, using Bayes’ theorem:
where are the model parameters, is the posterior probability of some parameters given the data , is called the likelihood and is the prior probability of the parameters. The prior can be chosen, more or less freely, by several schemes, and the integral in the denominator is usually not relevant for parameter estimation studies, as only posterior ratios are necessary. But it is crucial to the Bayesian analysis to determine the correct likelihood function for the problem beforehand, a process also known as forward modelling.
However, in many applications, such as those most relevant for cosmology, it is very difficult to obtain the correct likelihood function either empirically or theoretically. Therefore, in those cases where the underlying random field is assumed to be Gaussian, it has been standard practice for some time to simply use a Gaussian approximation for the likelihood, as well. Examples include Fu et al. (2008) in a cosmic shear analysis, Okumura et al. (2008) for luminous red galaxy counts, and Seljak & Bertschinger (1993) for the cosmic microwave background correlation function.
There is no a priori reason to assume that the Gaussian approximation should be exact, or even very accurate. In fact, it was shown by Hartlap et al. (2009) that there is a significant deviation between the real likelihood and a Gaussian for simulated cosmic shear data. In this case, the error bars on cosmological parameters improved by 10-40% when using a non-Gaussian likelihood. This result encouraged a mathematical study on the properties of correlation functions of Gaussian random fields. Schneider & Hartlap (2009) found that these correlation functions cannot take arbitrary values, but are subject to constraints. This can be seen from first studying the power spectrum of the field, which is Fourier conjugate to the correlation function. Power spectra are always non-negative, for all wave vectors .
If the correlation function is measured over a separation vector (or ‘lag parameter’) and its integer multiples , the non-negativity of leads to constraints in the form of inequalities . Here, is the ‘correlation coefficient’ normalised by the correlation at zero separation, and the upper and lower boundaries and are functions of all with . The three lowest order constraints are , and .
Since a Gaussian or multivariate Gaussian is unbounded, the existence of the constraints already demonstrates that the real likelihood function cannot be exactly Gaussian, but at least must have its tails cut off. Still, the constraints do not immediately lead to a full description of the likelihood function. Another study by Sato et al. (2011) has used the copula approach to construct a more realistic likelihood function for large-scale structure data. However, this approach is mostly useful for numerical application and does not yield direct insights into the analytical structure of the likelihood function. Also, a preliminary investigation by Wilking & Schneider (in preparation) has shown that a simple approach using a Gaussian copula cannot correctly reproduce the likelihood under constraints. Therefore, in this article we will focus on a simple type of random fields, but try to find a fully analytical expression for the likelihood function, or probability distribution, of correlation functions, and will demonstrate that it is indeed manifestly non-Gaussian.
For this derivation, we concentrate on Gaussian random fields, since their unique properties allow for a fully analytical calculation. In addition, a Gaussian field is fully specified by its two-point statistics, i.e. or equivalently , which makes the derivation of their probability distributions especially rewarding, since then determines the full set of joint probability distributions , , and so on, with .
This article consists of five main sections, apart from this introduction. In Sect. 2, we derive the univariate probability distribution function. We also calculate its moments and present explicit results for a special power spectrum. Then, we repeat the derivation for bivariate distributions in Sect. 3, also discussing its moments. We go on to discuss possible numerical implementations of these results in Sect. 4. Using numerical evaluation, we can discuss the properties of the distribution functions in more detail in Sect. 5. There, we comment on the general analytical properties of the uni- and bivariate functions. We also use the moments to construct an Edgeworth expansion of the univariate distribution, and we generalise our derivations and results to higher dimensions. We conclude the article in Sect. 6.
2 Univariate distribution
We describe a real Gaussian random field by its Fourier decomposition
The Fourier modes are independently distributed, each with a Gaussian probability distribution:
The mode dispersions are determined by the power spectrum,
The following derivations will be independent of the choice of a power spectrum, as long as it obeys the constraint of non-negativity. So the are arbitrary parameters.
As is usually done in numerical simulations, we consider a finite field, , with periodic boundary conditions. A sufficiently large finite field will be representative of a field on the whole of if the random field has no power on scales larger than . This is also equivalent to the assumption of statistical homogeneity on these scales. Together with isotropy, we know that the correlation function depends on the distance modulus, or lag parameter, only:
We will now concentrate on a one-dimensional field to keep expressions simple. However, in Sect. 5.4 we will see that all results also hold for higher dimensions. We start with an estimator for the correlation function from the finite field, given by
which approaches the true value for .
Going to -space, the Fourier modes that fit inside the interval are discrete. Each wave number has to fulfil the condition , so that
with . For a real-valued random field, the Fourier components fulfil . Using this property, the mode expansion of the field can be split up as
Without loss of generality, we assume that the field has zero mean, since we can always achieve this by a simple transformation. Then, the zero mode cancels out. We can then insert his expansion into the estimator (6). For the spatial integrals, we can use the integral representation of the Kronecker delta symbol:
The correlation function is then given by
Executing the sum over , only half of the terms survive, and the remaining exponentials give a cosine function:
Now that we have a convenient expression for (the estimator of) the correlation function, we need to take one more intermediate step before calculating its probability distribution. This is the characteristic function, which, in general, is defined as the Fourier transform of a probability distribution function. For the given random field, we can calculate the characteristic function by means of an ensemble average:
Since the field is Gaussian, the modes are independently distributed, and the probability distribution factorises. Inserting (11), we get
In the individual factors , we substitute to solve the integral:
With the product over all modes, we obtain the full characteristic function as
where in the last step we introduced the shorthand notation
The characteristic function is an important result in its own right, since we can use it to calculate the moments of the distribution, which we will do in Sect. 2.2. But for now, we go on to calculate the probability density distribution by an inverse Fourier transform:
We will solve this integral using the theorem of residues, since the integrand is analytic except at its poles
All of these lie on the imaginary axis. However, if for some , then the corresponding factor in the characteristic function is unity, and there is no pole and no contribution to the integral from this term. On the other hand, some may be equal, and so there may be poles of higher order (multiple poles). These special cases will be discussed in appendix A, but for now we focus on the standard case of simple poles.
We can choose a contour of integration made up of two parts, a straight section combined with a semicircle of radius , which we parametrise by , with either for the upper or for the lower half-plane. To get the full integral for , we have to take the limit of . The numerator of the integrand is , whereas the denominator, after executing the product, is only a polynomial in . So the numerator dominates the convergence behaviour, requiring . For , this corresponds to , and we can close the contour in the lower half-plane. If instead , the requirement is , and we can close the contour in the upper half-plane. So for a given , only the poles lying in the corresponding half-plane contribute to the sum of residues. We encode this behaviour in the factor
where for the Heaviside step function we use the convention
If all poles are simple, we can calculate the residues by
Inserting the winding numbers for the upper and for the lower contour, the full integral is then
This result holds for most relevant combinations of input power spectra and lag parameters. For other cases, we derive a generalised result of a very similar functional form, that also holds for multiple poles, in appendix A.
We were unable to further simplify the limit of the infinite sum in the probability distribution function (22). However, as long as the power spectrum decreases at least like for large , our numerical implementation of the sum formulae, as described in Sect. 4, showed that the probability distribution function converges as well. In practice, it is therefore possible to truncate the series at some maximum mode number without losing much precision.
Also, it is obvious from Eq. (22) that for large , a single mode will always dominate the sum, so that asymptotically, the distribution is not Gaussian, , but instead exponential, .
We also note at this point that Eq. (22) depends on the field size , separation and power spectrum only through the ratios and , as can be seen from the definition of in Eq. (16). When we present numerical results in the further course of this article, we therefore give these quantities only. Furthermore, in the case of a Gaussian power spectrum,
we can directly state as the relevant quantity.
For such a power spectrum with and a separation , Fig. 1 demonstrates the convergence behaviour of the distribution function. In this case, the calculations with and produce virtually indistinguishable results, so that the former mode number is sufficient for all practical purposes. In general, the steepness of the power spectrum determines which maximum wave number is necessary: Inserting the wave numbers from Eq. (7) into Eq. (23), we see that is sufficient for Gaussian power spectra.
In this section, we calculate the moments of the distribution. Apart from possible use in future applications, this is also useful as a check for the distribution function derived above, since we can derive the moments in two independent ways and compare results. First, we can get the moments from the derivatives of the characteristic function (Kendall & Stuart 1977, p. 63):
The first derivative yields the mean of the distribution, or the expectation value of :
For the variance and other higher-order quantities, we use the central moments, which are the moments of the distribution of . The centralised characteristic function is simply
and it yields the central moments as (Kendall & Stuart 1977, pp. 57, 63)
The first six non-zero central moments are then
From these moments, we can also obtain some conventional statistical quantities: the variance , the standard deviation , the skewness and the kurtosis .
Alternatively, we can also calculate the moments from the probability distribution function by the integrals
An important check for the sanity of the distribution function will be to re-obtain the normalisation as unity by this approach. We can compute it as the moment of order zero:
We can evaluate this sum of products by considering another Fourier transform from back to :
Comparing this expression for the characteristic function to the original from Eq. (15), we get
Evaluation at yields
so that together with Eq. (34) we have shown that .
To calculate the higher moments, we make use of the integral
and of sum formulae of the type
These follow from taking derivatives with respect to in Eq. (36) and setting . We have checked the results for mean, variance, skewness and kurtosis and have reproduced the results of the characteristic function approach, demonstrating the validity of the probability distribution function.
2.3 Cumulative distribution function
From the probability distribution function (22), we can also directly calculate the cumulative distribution function, defined as . For single poles only, it is given by
Again, we use the notation for the Heaviside factor, but this time note the extra term of . Analogous expressions in the presence of higher-order poles could be obtained by integrating the corresponding probability density (95).
2.4 A special case - power-law power spectra
In general, the probability distribution function we found is a sum formula that needs to be evaluated numerically. However, if the power spectrum of the underlying random field is a power law , we can analytically find a more explicit expression for the univariate distribution function. In the case of
with a normalisation constant, and for a separation of , we have and the product factors are
which is a special case of the infinite product family (Prudnikov et al. 1986, p. 754)
The probability density function (at zero separation) is then
where the field size comes in through Eq. (4). We can now express the cumulative distribution function in terms of known functions as
Here, is a special case of the Jacobi elliptic theta function, which is known to satisfy (Whittaker & Watson 1963, p463 ff.)
We can then re-obtain the probability density function by differentiation,
where . It is not clear to us yet whether this connection to elliptical functions is a coincidence for just this special case, or whether it points towards a possible reformulation of the probability distribution for general power spectra.
However, it allows us to analyse the asymptotic behaviour of the distribution function for large . From Eq. (48), we have
for . Then, inserting into Eq. (2.4), we obtain
for . Thus, the distribution function behaves like a single exponential at large , and not like a Gaussian at all. This is in agreement with our general result for large from Sect. 2.1.
For the same power spectrum, we can also explicitly calculate the moments. With , we obtain the -th (central) moment from the sum
Here, is the Riemann zeta function and are the Bernoulli numbers (Prudnikov et al. 1986, p. 776) given by
For example, the mean of the distribution then is simply given by and the standard deviation is .
For power law power spectra with a different exponent, Eq. (45) still allows us to express the product factors explicitly in terms of known (trigonometric and hyperbolic) functions. Regrettably, these do not yield any known functions for the full probability distribution, or for the moments, as far as we are aware. Thus, a really explicit form was found for the special case of only.
3 Bivariate distribution
In this section, we will calculate the bivariate probability distribution function , which we will mostly abbreviate as with , . All preliminaries carry over from the univariate case, and the starting point of this calculation is
The characteristic function is now bivariate as well,
In the last step, we have defined a generalised shorthand for the factors to allow for the two different lag parameters . It is worth noting at this point that for higher multivariate distributions, say , the only change necessary in the characteristic function will be to add additional terms of in this factor, resulting in the generally valid expression
Next, we will obtain the bivariate probability distribution itself from the characteristic function by Fourier inversion, in analogy to the univariate case. But an important difference arises in this step, since the inversion now contains a double integration. Thus, we have to calculate the following:
Since the pairs of variables and each are mutually independent, the result has to be invariant under exchanging the order of integration. We choose to first integrate over and after that over . From the resulting formula, the symmetry will not be immediately apparent. However, we have checked the equivalence of both approaches by also explicitly evaluating the other choice. Also, we will assume simple poles in both integrations, and will only briefly comment on the effects of multiple poles at the end of this section.
The poles for the inner integration are now located at
and, for simple poles, their residues are
Here we have simplified the expression by defining the determinant factor
The arguments as to the choice of contours apply exactly as before, since the imaginary part of the poles remains unchanged from Eq. (18) to Eq. (58). Thus, poles with positive factors lie within the lower contour, whereas those with negative lie within the upper contour. The corresponding Heaviside factors encoding this behaviour are and . The winding numbers are for the upper and for the lower contour. So we obtain the full integral as
The remaining task is to calculate the second integral. For ease of notation, we will substitute and introduce the new variables
Then, the second integral can be reduced to the calculation of the term
This integral has poles at . For simple poles, the residues are
Furthermore, the choice of contours is also very similar to the previous procedure. We will again close the contour with semi-circles in either the upper or the lower half-plane, and the purely imaginary poles lead, by the same convergence argument, to Heaviside factors for the contour in the lower half-plane and for the contour in the upper half-plane. With the usual winding numbers , the integral is
Reinserting this result into the full expression (61), the bivariate probability distribution function is