Constrained correlation functions

Key words: cosmology – gravitational lensing – large-scale structure of the Universe – galaxies: evolution – galaxies: statistics

Measurements of correlation functions and their comparison with theoretical models are often employed in natural sciences, including astrophysics and cosmology, to determine best-fitting model parameters and their confidence regions. Due to a lack of better descriptions, the likelihood function of the correlation function is often assumed to be a multi-variate Gaussian.

Using different methods, we show that correlation functions must satisfy constraint relations, owing to the non-negativity of the power spectrum of the underlying random process. Specifically, for any statistically homogeneous and (for more than one spatial dimension) isotropic random field with correlation function ξ(x), we derive inequalities for the correlation coefficients r_n ≡ ξ(nx)/ξ(0) (for integer n), where the lower and upper bounds on ξ(nx) depend on the ξ(mx) with 0 ≤ m < n, or more explicitly

 Ξ_n^−{ξ(0), ξ(x), ξ(2x), …, ξ([n−1]x)} ≤ ξ(nx) ≤ Ξ_n^+{ξ(0), ξ(x), ξ(2x), …, ξ([n−1]x)}.

Explicit expressions for the bounds are obtained for arbitrary n. We show that these constraint equations very significantly limit the set of possible correlation functions. For one particular example of a fiducial cosmic shear survey, we show that the Gaussian likelihood ellipsoid has a significant spill-over into the region of correlation functions forbidden by the aforementioned constraints, rendering the resulting best-fitting model parameters and their error region questionable, and indicating the need for a better description of the likelihood function.

We conduct some simple numerical experiments which explicitly demonstrate the failure of a Gaussian description for the likelihood of ξ. Instead, the shape of the likelihood function of the correlation coefficients appears to follow approximately the shape of the bounds on the r_n, even if the Gaussian ellipsoid lies well within the allowed region. Therefore, we define a non-linear and coupled transformation of the r_n, based on these bounds. Some numerical experiments then indicate that a Gaussian is a much better description of the likelihood in these transformed variables than in the original correlation coefficients – in particular, the full probability distribution then lies explicitly within the allowed region.

For more than one spatial dimension of the random field, the explicit expressions for the bounds on the r_n are not optimal. We outline a geometrical method by which tighter bounds may be obtained in principle. We illustrate this method for a few simple cases; a more general treatment awaits future work.

1 Introduction

One of the standard ways to obtain constraints on model parameters of a stochastic process is the determination of its two-point correlation function ξ(x) from observational data, where x is the separation vector between pairs of points. This observed correlation function is then compared with the corresponding correlation function ξ(x; p) of a model, where p denotes the model parameter(s). A commonly used method for this comparison is the consideration of the likelihood function L(ξ | p), which yields the probability for observing the correlation function ξ for a given set of parameters p. It is common (see Seljak & Bertschinger 1993 for an application to microwave background anisotropies, Fu et al. 2008 for a cosmic shear analysis, or Okumura et al. 2008 for an application to the spatial correlation function of galaxies) to approximate this likelihood by a Gaussian,

 L({ξ(x_i)} | p) ∝ exp[ −(1/2) Σ_{i,j=1}^N [ξ(x_i) − ξ(x_i; p)] (Cov^{−1})_{ij} [ξ(x_j) − ξ(x_j; p)] ], (1)

where it has been assumed that the random field is homogeneous and isotropic, so that the correlation function depends only on the absolute value x = |x| of the separation vector. Furthermore, it has been assumed that the correlation function is obtained at discrete points x_i; for an actual measurement, one usually has to bin the separation of pairs of points, in which case x_i is the central value of the bin. In (1), Cov_{ij} is the covariance matrix of the correlation function between any pair of separations x_i, x_j.
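As a concrete illustration of how (1) is evaluated in practice, the following sketch computes the Gaussian log-likelihood for a small toy data vector; all numbers, and the exponentially decaying covariance model, are invented for illustration (numpy assumed):

```python
import numpy as np

# Illustrative numbers only (not from the paper): a "measured" correlation
# function xi_obs at N separations, a model prediction xi_mod, and a toy
# covariance matrix with correlated errors.
N = 4
xi_obs = np.array([0.90, 0.50, 0.20, 0.10])
xi_mod = np.array([0.85, 0.45, 0.25, 0.05])
Cov = 0.01 * 0.5 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

def gauss_log_like(xi_obs, xi_mod, cov):
    """Logarithm of the Gaussian likelihood (1), up to an additive constant."""
    d = xi_obs - xi_mod
    return -0.5 * d @ np.linalg.solve(cov, d)

print(gauss_log_like(xi_obs, xi_mod, Cov))
```

Using `np.linalg.solve` rather than an explicit matrix inverse is the numerically preferable way to apply Cov^{−1} to the data vector.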

In a recent paper (Hartlap et al. 2009) we have investigated the likelihood function for the cosmic shear correlation function and found that it deviates significantly from a Gaussian. This study relied on numerical ray-tracing simulations through the density field obtained from N-body simulations of the large-scale structure in the Universe.

In this paper, we will show that the likelihood function of the correlation function cannot be a Gaussian. In particular, we show that any correlation function obeys strict constraints, which can be expressed as

 Ξ_n^−{ξ(0), ξ(x), ξ(2x), …, ξ([n−1]x)} ≤ ξ(nx) ≤ Ξ_n^+{ξ(0), ξ(x), ξ(2x), …, ξ([n−1]x)} (2)

for arbitrary x and integer n; these constraints can be derived by several different methods. With one of these methods, one can derive explicit equations for the upper and lower bounds in (2) for arbitrary values of n. The basic reason for the occurrence of such constraints is the non-negativity of the power spectrum, or equivalently, the fact that covariance matrices of the values of a random field at different positions are positive (semi-)definite.

The outline of the paper is as follows: in Sect. 2, we obtain bounds on the correlation function using the Cauchy–Schwarz inequality, as well as by making use of the positive semi-definiteness of the covariance matrix of random fields. It turns out that the latter method gives tighter constraints on ξ; in fact, these constraints are optimal for one-dimensional random fields. We show in Sect. 3 that these bounds significantly constrain the set of functions which can possibly be correlation functions. Whereas the bounds obtained in Sect. 2 are valid for any dimension of the random field, they are not optimal in more than one dimension; we consider generalizations to higher-dimensional random fields, and to arbitrary combinations of separations, in Sect. 4. In Sect. 5 we introduce a non-linear coupled transformation of the correlation coefficients based on the bounds; for the case of a one-dimensional field, all combinations of values of these transformed quantities correspond to allowed correlation functions (meaning that one can find a non-negative power spectrum yielding the corresponding correlations). Hence, a Gaussian probability distribution for these transformed variables appears to be more realistic than one for the correlation function itself. This expectation is verified with some numerical experiments which are described in Sect. 6. Furthermore, we show that for a fiducial cosmological survey the Gaussian likelihood for the correlation function can significantly overlap with the region forbidden by (2), depending on survey size and the number of separations at which the correlation function is measured. We conclude with a discussion and an outlook on future work and open questions.

2 Upper and lower bounds on correlation functions

Consider an n-dimensional homogeneous and isotropic random field g(x), with vanishing expectation value ⟨g(x)⟩ = 0, and with power spectrum P(k) and correlation function

 ξ(x) = ∫ [d^n k/(2π)^n] P(|k|) exp(−i k·x), (3)

which depends only on the absolute value of x, due to the assumed isotropy. This relation immediately shows that

 −ξ(0)≤ξ(x)≤ξ(0), (4)

owing to P(k) ≥ 0 and |exp(−i k·x)| = 1. However, the lower bound in (4) is not an optimal one for more than one dimension. In two dimensions, the integral over the polar angle can be carried out, yielding

 ξ_{2D}(x) = ∫_0^∞ [dk k/(2π)] P(k) J_0(kx), (5)

where J_0 is the Bessel function of the first kind of zero order. Since J_0(y) has an absolute minimum at y ≈ 3.832 with J_0 ≈ −0.403 (see also Abrahamsen 1997), the non-negativity of P(k) implies that ξ_{2D}(x) ≥ −0.403 ξ(0). Similarly, in three dimensions one has

 ξ_{3D}(x) = ∫_0^∞ [dk k^2/(2π^2)] P(k) j_0(kx), (6)

where j_0(y) = sin(y)/y is the spherical Bessel function of zero order. Since j_0 has an absolute minimum of j_0 ≈ −0.217 at y ≈ 4.493, the non-negativity of P(k) implies that ξ_{3D}(x) ≥ −0.217 ξ(0).
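The two minima quoted above are easy to check numerically. The following sketch (numpy assumed; grid sizes and node counts are arbitrary choices) evaluates J_0 through its integral representation J_0(y) = (1/π)∫_0^π cos(y sin t) dt and j_0(y) = sin(y)/y directly:

```python
import numpy as np

y = np.linspace(0.1, 10.0, 5001)            # grid of arguments
t = (np.arange(1000) + 0.5) * np.pi / 1000  # midpoint nodes on [0, pi]

# J_0 from its integral representation (midpoint rule over t)
J0 = np.cos(np.outer(y, np.sin(t))).mean(axis=1)
j0 = np.sin(y) / y                          # spherical Bessel function j_0

print(y[J0.argmin()], J0.min())   # minimum of J_0: near 3.832, about -0.403
print(y[j0.argmin()], j0.min())   # minimum of j_0: near 4.493, about -0.217
```

These minima are exactly the numbers that enter the lower bounds on ξ_{2D} and ξ_{3D}.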

In the following, we will concentrate mainly on the one-dimensional case and write

 ξ(x) = ∫_0^∞ dk P_0(k) cos(xk). (7)

However, higher dimensions of the random field are included in all that follows, since by specifying x = (x, 0, …, 0) in (3), we find

 ξ(x) = ∫ [d^n k/(2π)^n] P(k) exp(−i k_1 x) = ∫_0^∞ dk_1 cos(k_1 x) [2/(2π)^n] ∫ dk_2 … dk_n P(k_1, k_2, …, k_n), (8)

which thus takes the same form as (7). Thus, the n-dimensional case can be included in the same formalism as the one-dimensional case; note that for this argument, the random field is not restricted to be isotropic. However, as we shall discuss later, the resulting inequalities will not be optimal for isotropic fields of higher dimension.

In the foregoing equations, P_0(k) can correspond either to the power spectrum of the underlying random process, or to the sum of the underlying process and statistical noise. Furthermore, the power spectrum can also be the squared modulus of the Fourier transform of the realization of a random process in a finite sample volume. In all these cases, the non-negativity of P_0(k) applies, and the constraints on the corresponding correlation function derived below must hold.

2.1 Constraints from the Cauchy–Schwarz inequality

Making use of the Cauchy–Schwarz inequality,

 [∫_0^∞ dk f(k) h(k)]^2 ≤ ∫_0^∞ dk f^2(k) ∫_0^∞ dk h^2(k), (9)

we obtain, by setting f(k) = √P_0(k) cos(xk) and h(k) = √P_0(k), that

 ξ^2(x) ≤ ξ(0) ∫_0^∞ dk P_0(k) cos^2(xk) = ξ(0) ∫_0^∞ dk P_0(k) [1 + cos(2xk)]/2 = [ξ(0)/2] [ξ(0) + ξ(2x)], (10)

where we made use of the identity cos^2 a = [1 + cos(2a)]/2. Together with (4) we therefore obtain the constraint equation

 −ξ(0) + 2ξ^2(x)/ξ(0) ≤ ξ(2x) ≤ ξ(0). (11)

The interpretation of this constraint is seen more clearly in terms of the correlation coefficients r_n ≡ ξ(nx)/ξ(0), which are defined for an arbitrary separation x. Then, (11) reads

 −1 + 2r_1^2 ≤ r_2 ≤ 1. (12)

This result can be interpreted as follows: if two points separated by x are strongly correlated, r_1 ≈ 1, then the value of the field at position 2x must be equally strongly correlated with that at x, which implies that the correlation between the point 2x and the origin must also be large. If the field at x is strongly anticorrelated with that at the origin, r_1 ≈ −1, then the field at 2x must be similarly anticorrelated with that at x, again implying a strong correlation between the point 2x and the origin. The smaller |r_1| is, the weaker is the constraint (12).
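The constraint (12) can also be probed empirically. The sketch below (numpy assumed; mode positions, amplitudes and separations are arbitrary random choices) builds correlation coefficients from discrete non-negative power spectra and checks that r_2 never leaves the allowed interval:

```python
import numpy as np

# Empirical check of (12): draw random discrete non-negative power spectra
# and verify that r_2 always lies between -1 + 2 r_1^2 and 1.
rng = np.random.default_rng(1)
ok = True
for _ in range(1000):
    k = rng.uniform(0.0, 10.0, 8)    # mode positions
    p = rng.uniform(0.0, 1.0, 8)     # non-negative amplitudes
    x = rng.uniform(0.1, 5.0)        # separation
    xi = [np.sum(p * np.cos(k * s)) for s in (0.0, x, 2.0 * x)]  # eq. (7)
    r1, r2 = xi[1] / xi[0], xi[2] / xi[0]
    ok = ok and (-1.0 + 2.0 * r1**2 <= r2 + 1e-10) and (r2 <= 1.0 + 1e-10)
print(ok)
```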

Making use of the identity

 [cos a + cos(2a)]^2 = [1 + cos a][1 + cos(3a)],

and applying the Cauchy–Schwarz inequality with f(k) = √P_0(k) [cos(xk) + cos(2xk)]/√(1 + cos(xk)) and h(k) = √(P_0(k)[1 + cos(xk)]), we find that

 [ξ(x) + ξ(2x)]^2 ≤ [ξ(0) + ξ(x)][ξ(0) + ξ(3x)]. (13)

A second inequality is obtained by using in a similar way the identity

 [cos a − cos(2a)]^2 = [1 − cos a][1 − cos(3a)].

Both of these inequalities are summarized in terms of the correlation coefficients r_n as

 −1 + (r_1 + r_2)^2/(1 + r_1) ≤ r_3 ≤ 1 − (r_1 − r_2)^2/(1 − r_1). (14)

Further inequalities involving r_n = ξ(nx)/ξ(0), with n being an integer, can be derived in this way. Making use of the relations

 [cos a + cos((n−1)a)]^2 = [1 + cos(na)][1 + cos((n−2)a)];  [cos a − cos((n−1)a)]^2 = [1 − cos(na)][1 − cos((n−2)a)]

for integer n ≥ 2, and employing the Cauchy–Schwarz inequality in the same way as before, we find

 −1 + (r_1 + r_{n−1})^2/(1 + r_{n−2}) ≤ r_n ≤ 1 − (r_1 − r_{n−1})^2/(1 − r_{n−2}), (15)

where the special cases n = 2 and n = 3 have been derived already. We have thus found a set of inequalities for all correlation coefficients r_n. In the next section, we will obtain bounds on the correlation function using a different method, and will show that those bounds are stricter than the ones in (15).

2.2 Constraints from a covariance matrix approach

We will now proceed in a different way which is more straightforward. Consider the set of points x_i = i x, with integer 0 ≤ i ≤ N and arbitrary separation x. The covariance matrix of the random field at these points has the simple form

 C_{ij} = ⟨g(ix) g(jx)⟩ = ξ(|i−j| x). (16)

As is well known, the covariance matrix must be positive semi-definite, i.e., its eigenvalues must be non-negative. Dividing by ξ(0), we define

 A_{ij} = C_{ij}/ξ(0) = r_{|i−j|}, (17)

and the eigenvalues λ of A must obey λ ≥ 0. For N = 1, the eigenvalues read λ_{1,2} = 1 ± r_1, yielding

 |r1|≤1, (18)

i.e. we reobtain (4). For N = 2, the eigenvalues read

 λ_1 = 1 − r_2,  λ_{2,3} = [2 + r_2 ± √(8r_1^2 + r_2^2)]/2,

and the conditions λ_{2,3} ≥ 0 can be solved for r_2, yielding (12). The four eigenvalues of A in the case N = 3 are

 λ_{1,2} = 1 + (r_1 + r_3)/2 ± (1/2)√(5r_1^2 + 8r_1r_2 + 4r_2^2 − 2r_1r_3 + r_3^2);  λ_{3,4} = 1 − (r_1 + r_3)/2 ± (1/2)√(5r_1^2 − 8r_1r_2 + 4r_2^2 − 2r_1r_3 + r_3^2),

and the conditions λ_i ≥ 0 can, after some algebra, be brought into the form (14). For larger N, the eigenvalues of A have a more complicated form; they are obtained as solutions of higher-order polynomials (see below).
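The eigenvalue argument can be illustrated numerically: for coefficients r_n built from a non-negative spectrum, the matrix A_{ij} = r_{|i−j|} must be positive semi-definite, and for the 3 × 3 case its eigenvalues should match the closed-form expressions quoted above (a sketch with an arbitrary random spectrum, numpy assumed):

```python
import numpy as np

# Build A_ij = r_|i-j| from a random non-negative spectrum, check positive
# semi-definiteness, and compare the 3x3 eigenvalues with 1 - r_2 and
# [2 + r_2 +- sqrt(8 r_1^2 + r_2^2)] / 2.
rng = np.random.default_rng(3)
k = rng.uniform(0.0, 10.0, 8)
p = rng.uniform(0.0, 1.0, 8)
r = np.array([np.sum(p * np.cos(k * n)) for n in range(6)]) / np.sum(p)

A = r[np.abs(np.subtract.outer(np.arange(6), np.arange(6)))]
lam = np.linalg.eigvalsh(A)                       # ascending order

s = np.sqrt(8.0 * r[1]**2 + r[2]**2)
lam3 = np.sort([1.0 - r[2], (2.0 + r[2] + s) / 2.0, (2.0 + r[2] - s) / 2.0])
print(lam.min() >= -1e-10)                        # positive semi-definite
print(np.allclose(np.linalg.eigvalsh(A[:3, :3]), lam3))
```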

However, we do not need explicit expressions for the eigenvalues; we only need to ensure that they are non-negative. This condition can be formulated in a different way. The eigenvalues of a matrix are given by the roots of the characteristic polynomial, which is the determinant of the matrix A − λ 1. For an N × N matrix, this polynomial is of order N in λ and of the form

 λ^N + Σ_{k=0}^{N−1} b_k λ^k. (19)

The coefficients b_k of the polynomial are functions of the r_n, as obtained from calculating the determinant. On the other hand, they can be expressed by the roots of the polynomial; for example, for a 3 × 3 matrix, one finds that

 λ^3 + Σ_{k=0}^{2} b_k λ^k = λ^3 − (λ_1 + λ_2 + λ_3) λ^2 + (λ_1λ_2 + λ_1λ_3 + λ_2λ_3) λ − λ_1λ_2λ_3.

The condition that all λ_i ≥ 0 is then equivalent to the condition that b_2 ≤ 0, b_1 ≥ 0, and b_0 ≤ 0. It is easy to show that these conditions lead to the constraint (12), together with |r_1| ≤ 1.

This procedure can be generalized to arbitrary N: since all eigenvalues of the symmetric matrix A are real, the condition that they are all non-negative is equivalent to requiring that the coefficients of the characteristic polynomial (19) satisfy b_k ≤ 0 if N − k is odd, and b_k ≥ 0 if N − k is even. The explicit calculation of the characteristic polynomial by hand becomes infeasible; hence we use the computer algebra program Mathematica (Wolfram 1996). For successively larger N, we calculate the characteristic polynomial. As we will show below, the characteristic polynomial factorizes into two polynomials; if N is even, it factorizes into two polynomials of order N/2, whereas if N is odd, it factorizes into polynomials of orders (N−1)/2 and (N+1)/2. The condition that all eigenvalues are non-negative is equivalent to the condition that the roots of both polynomials are non-negative; this condition can be translated into inequalities for the coefficients of the polynomials in the same way as described above.

We illustrate this procedure for the case of a 5 × 5 matrix, which yields the constraints on r_4. The characteristic polynomial in this case becomes

 (λ^2 + b_1 λ + b_0)(λ^3 + c_2 λ^2 + c_1 λ + c_0), (20)

with

 b_1 = r_2 + r_4 − 2
 b_0 = (1 − r_2)(1 − r_4) − (r_1 − r_3)^2
 c_2 = −3 − r_2 − r_4
 c_1 = 3(1 − r_1^2) − 2r_1r_3 − r_3^2 + 2r_4 + r_2(2 + r_4) − 2r_2^2
 c_0 = (2r_1^2 − 1 − r_2) r_4 + r_3^2 + 2r_1r_3(1 − 2r_2) + 2r_2^2(1 + r_2) − r_2 + r_1^2(3 − 4r_2) − 1

The conditions b_1 ≤ 0, c_2 ≤ 0 and c_1 ≥ 0 are satisfied irrespective of the value of r_4, owing to |r_n| ≤ 1 for all n, and to the inequalities obtained before for r_2 and r_3. Thus, these three inequalities do not yield additional constraints on r_4. Those come from requiring b_0 ≥ 0 and c_0 ≤ 0. Both coefficients are linear in r_4, which allows us to write the conditions explicitly,

 [−1 − r_2 + r_1^2(3 − 4r_2) + 2r_2^2(1 + r_2) + 2r_1r_3(1 − 2r_2) + r_3^2]/(1 − 2r_1^2 + r_2) ≤ r_4 ≤ 1 − (r_1 − r_3)^2/(1 − r_2). (21)

Alternatively, one can use the function InequalitySolve of Mathematica on the five inequalities for the coefficients, to obtain the same result.
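As a cross-check of (21), one can again draw random non-negative spectra and verify that r_4 always falls between the two bounds. The lower-bound expression coded below is obtained by solving c_0 ≤ 0 for r_4 and should be read as that reconstruction (numpy assumed):

```python
import numpy as np

# Numerical check of the bounds (21) on r_4 with random non-negative spectra.
rng = np.random.default_rng(4)
ok = True
for _ in range(500):
    k = rng.uniform(0.0, 10.0, 8)
    p = rng.uniform(0.0, 1.0, 8)
    r = np.array([np.sum(p * np.cos(k * n)) for n in range(5)]) / np.sum(p)
    r1, r2, r3, r4 = r[1], r[2], r[3], r[4]
    up = 1.0 - (r1 - r3)**2 / (1.0 - r2)
    # lower bound: the root of c_0 (which is linear in r_4)
    lo = (-1.0 - r2 + r1**2 * (3.0 - 4.0 * r2) + 2.0 * r2**2 * (1.0 + r2)
          + 2.0 * r1 * r3 * (1.0 - 2.0 * r2) + r3**2) / (1.0 + r2 - 2.0 * r1**2)
    ok = ok and (lo - 1e-7 <= r4 <= up + 1e-7)
print(ok)
```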

We see that the upper bound in (21) agrees with that obtained from the Cauchy–Schwarz inequality – see (15) – but the lower bounds differ. Hence, for n = 4 the bounds on r_n derived from the two methods are different, and that will be the case for all larger n. This is not surprising, since the Cauchy–Schwarz inequality makes no statement about how ‘sharp’ the bounds are, i.e., whether they can actually be attained. One can argue, using discrete Fourier transforms, that the bounds obtained from the covariance-matrix method are ‘sharp’ (for a one-dimensional field), in the sense that for each allowed combination of the r_1, …, r_{n−1} one can find a non-negative power spectrum corresponding to these correlation coefficients – the bounds can therefore not be narrowed further.

It should be noted that in the cases given above, the lower (upper) bound on r_n is a quadratic function of r_{n−1}, and its quadratic term has a positive (negative) coefficient. This implies that the allowed region in the r_{n−1}–r_n plane is convex; indeed, as we will show below, the allowed n-dimensional region for the r_m is convex.

The procedure illustrated for the 5 × 5 case carries over to larger matrices as well: only the coefficients of λ^0 in the two polynomials yield new constraints on the highest correlation coefficient, and these two coefficients are linear in it, so that the constraints can be obtained explicitly. This then leads us to conclude that the positive semi-definiteness of the matrix A for a given N is equivalent to the positivity of the determinants of all submatrices which are obtained by considering only the first m rows and columns of A. We will make use of this property in the next subsection.

For larger n, the upper and lower bounds are given by quite complicated expressions, so we refrain from writing them down; however, using the output from Mathematica, they can be employed in subsequent numerical calculations. A much more convenient method for calculating the upper and lower bounds explicitly will be derived below.

To summarize, the procedure outlined gives us constraints of the form

 r_{nl} ≤ r_n ≤ r_{nu}, (22)

where the lower and upper bounds are functions of the r_m, 1 ≤ m ≤ n − 1, and satisfy −1 ≤ r_{nl} ≤ r_{nu} ≤ 1. For n ≤ 4, the bounds r_{nl} and r_{nu} have been written down explicitly above.

The existence of these upper and lower bounds on the r_n, and thus on the correlation function, immediately implies that the likelihood function of the correlation function cannot be a Gaussian, since the support of the latter is unbounded. This does not necessarily imply that the Gaussian approximation is bad: if the Gaussian probability ellipsoid described by (1) is ‘small’, in the sense that it is well contained inside the allowed region, the existence of the bounds alone yields no estimate for the accuracy of the Gaussian approximation. We will come back to this issue in Sect. 6.

Whereas the expressions for r_{nl} and r_{nu} become quite long for larger n, we found a remarkable result for the difference Δ_n ≡ r_{nu} − r_{nl}. This is most easily expressed by defining the functions

 F_n(r_1, …, r_n) = (r_{nu} − r_n)(r_n − r_{nl}). (23)

Then, for n odd,

 Δ_n(r_1, …, r_{n−1}) = 2 (∏_{k=1}^{(n−1)/2} F_{2k}) (∏_{k=1}^{(n−1)/2} F_{2k−1})^{−1}, (24)

and for n even,

 Δ_n(r_1, …, r_{n−1}) = 2 (∏_{k=1}^{n/2} F_{2k−1}) (∏_{k=1}^{n/2−1} F_{2k})^{−1}. (25)

This result was obtained by guessing, and subsequently verified with Mathematica for successively larger n, using the explicit expressions for the bounds derived next. We will make use of these properties in Sect. 3.

2.3 Explicit expression for the bounds

We will now derive explicit expressions for the upper and lower bounds on the r_n. In doing so, we first show that the determinant of the matrix A for N points factorizes into two polynomials, both of which are linear in the highest correlation coefficient r_{N−1}. We then make use of the fact that the positive semi-definiteness of A is equivalent to the positivity of the determinants of all submatrices obtained from A by considering only its first m rows and columns; these submatrices of A correspond to the matrix A for lower numbers of points, and the positivity of their determinants is equivalent to the upper and lower bounds on the r_m for m < N − 1. Hence, by increasing N successively, the constraints on r_{N−1} are obtained from requiring det(A^N) ≥ 0.

The determinant of a matrix is unchanged if a multiple of one row (column) is added to another row (column). We make use of this fact for the calculation of det(A), by carrying out the following four steps:

1. We add the first row to the last, obtaining the matrix A^{(1)} with elements

 A^{(1)}_{ij} = A_{ij} + δ_{iN} A_{1j}.
2. We subtract the last column from the first,

 A^{(2)}_{ij} = A^{(1)}_{ij} − δ_{1j} A^{(1)}_{iN}.
3. The second row is added to the (N−1)-st one, the third row is added to the (N−2)-nd one, and so on. For N odd, this reads

 A^{(3)}_{ij} = A^{(2)}_{ij} + Σ_{k=1}^{(N−3)/2} δ_{(N−k)i} A^{(2)}_{(1+k)j},

whereas for N even, the sum extends to (N−2)/2.

4. Finally, the (N−1)-st column is subtracted from the second one, the (N−2)-nd column is subtracted from the third one, and so on, which for N odd reads

 A^{(4)}_{ij} = A^{(3)}_{ij} − Σ_{k=1}^{(N−3)/2} δ_{(1+k)j} A^{(3)}_{i(N−k)},

whereas for N even the sum extends to (N−2)/2.

These steps are illustrated in Fig. 1. For N even [odd], this results in a matrix which contains in the lower left corner an (N/2) × (N/2) [(N+1)/2 × (N−1)/2] submatrix with all elements zero. Therefore, the determinant of A^{(4)} factorizes into the determinants of two (N/2) × (N/2) matrices [of an (N−1)/2 × (N−1)/2 and an (N+1)/2 × (N+1)/2 matrix]. Thus,

 det(A) ≡ det(A^N) = det(A^{(4)}) = det(U^N) det(V^N), (26)

where we explicitly indicate the number of points, and thus the dimensionality of A, with a superscript, and where for even N, U^N and V^N are (N/2) × (N/2) matrices with elements

 U^N_{ij} = r_{|i−j|} − r_{N+1−i−j};  V^N_{ij} = r_{|i−j|} + r_{i+j−1},  1 ≤ i, j ≤ N/2, (27)

whereas for odd N, U^N and V^N are (N−1)/2 × (N−1)/2 and (N+1)/2 × (N+1)/2 matrices, respectively, with elements

 U^N_{ij} = r_{|i−j|} − r_{N+1−i−j},  1 ≤ i, j ≤ (N−1)/2;  V^N_{ij} = r_{|i−j|} + (1 − δ_{1i}) r_{i+j−2},  1 ≤ i, j ≤ (N+1)/2. (28)
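The factorization (26)–(28) can be verified numerically for arbitrary values of the r_m, since it is an algebraic identity that does not depend on whether the r_m come from a valid power spectrum. The sketch below follows the index conventions of (27) and (28) as written here (numpy assumed):

```python
import numpy as np

# Check the factorization (26): det(A^N) = det(U^N) det(V^N).
def toeplitz(r, N):
    return r[np.abs(np.subtract.outer(np.arange(N), np.arange(N)))]

def UV(r, N):
    """Build U^N and V^N from (27)/(28); r[m] = r_m with r[0] = 1."""
    if N % 2 == 0:
        m = N // 2
        i, j = np.indices((m, m)) + 1          # 1-based indices as in (27)
        U = r[np.abs(i - j)] - r[N + 1 - i - j]
        V = r[np.abs(i - j)] + r[i + j - 1]
    else:
        mU, mV = (N - 1) // 2, (N + 1) // 2
        i, j = np.indices((mU, mU)) + 1
        U = r[np.abs(i - j)] - r[N + 1 - i - j]
        i, j = np.indices((mV, mV)) + 1
        V = r[np.abs(i - j)] + np.where(i == 1, 0.0, r[i + j - 2])
    return U, V

rng = np.random.default_rng(5)
k = rng.uniform(0.0, 10.0, 8)
p = rng.uniform(0.0, 1.0, 8)
r = np.array([np.sum(p * np.cos(k * n)) for n in range(10)]) / np.sum(p)

ok = True
for N in range(2, 9):
    U, V = UV(r, N)
    ok = ok and np.isclose(np.linalg.det(toeplitz(r, N)),
                           np.linalg.det(U) * np.linalg.det(V))
print(ok)
```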

Since U^N_{11} = 1 − r_{N−1}, and the submatrix obtained by cancelling the first column and row of U^N is just U^{N−2} (this submatrix is (N/2 − 1) × (N/2 − 1) for N even; for N odd it is [(N−3)/2 × (N−3)/2]), we can write

 det(U^N) = (1 − r_{N−1}) det(U^{N−2}) + det(Ū^N), (29)

where Ū^N is the matrix which is obtained from U^N by setting r_{N−1} = 1. The upper bound on r_{N−1} is found by setting det(U^N) = 0, which yields

 r_{nu} = 1 + det(Ū^{n+1})/det(U^{n−1}), (30)

where we set n = N − 1. Analogously, the final diagonal element of V^N reads 1 + r_{N−1}, where this element is V^N_{ll} with l = N/2 for even N, and l = (N+1)/2 for odd N. Therefore,

 det(V^N) = (1 + r_{N−1}) det(V^{N−2}) + det(V̄^N), (31)

where V̄^N is obtained from V^N by setting r_{N−1} = −1; the lower bound on r_n is then obtained by setting this expression to zero, or

 r_{nl} = −1 − det(V̄^{n+1})/det(V^{n−1}). (32)

Since det(U^{n+1}) is a linear function of r_n, it must be of the form a (r_{nu} − r_n), where the coefficient a is independent of r_n. The value r_n = r_{nu} yields the root of det(U^{n+1}), and is thus the upper bound on r_n. The coefficient a = det(U^{n−1}) follows from (29); therefore,

 det(U^{n+1}) = (r_{nu} − r_n) det(U^{n−1}). (33)

The analogous result holds for the V matrices, i.e.

 det(V^{n+1}) = (r_n − r_{nl}) det(V^{n−1}). (34)

These recursion relations then yield the explicit expressions

 det(U^N) = ∏_{k=1}^{N/2} (r_{(2k−1)u} − r_{2k−1});  det(V^N) = ∏_{k=1}^{N/2} (r_{2k−1} − r_{(2k−1)l}) (35)

for N even, and

 det(U^N) = ∏_{k=1}^{(N−1)/2} (r_{(2k)u} − r_{2k});  det(V^N) = ∏_{k=1}^{(N−1)/2} (r_{2k} − r_{(2k)l}) (36)

for N odd. This yields for the determinant of the matrix A^N the explicit expression

 det(A^N) = ∏_{k=1}^{N/2} F_{2k−1},  det(A^N) = ∏_{k=1}^{(N−1)/2} F_{2k} (37)

for N even and N odd, respectively. Accordingly, the width Δ_n of the allowed range of r_n then becomes

 Δ_n = 2 det(A^n)/det(A^{n−1}), (38)

where we made use of (24) and (25).
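The recursions (33) and (34), combined with (26), imply that det(A^{n+1}), viewed as a function of r_n, is a downward-opening quadratic whose two roots are precisely r_{nl} and r_{nu}. This gives a simple numerical recipe for the bounds, sketched below together with a check of the width relation (38) (numpy assumed; the random spectrum is arbitrary):

```python
import numpy as np

# Bounds from determinants: det(A^{n+1}) as a function of r_n equals
# (r_nu - r_n)(r_n - r_nl) det(A^{n-1}), so its roots are the bounds.
def toeplitz_det(r):
    n = len(r)
    return np.linalg.det(r[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))])

def bounds(r, n):
    """Bounds (r_nl, r_nu) on r_n, given r[0] = 1 and r[1..n-1]."""
    det_at = lambda t: toeplitz_det(np.concatenate([r[:n], [t]]))
    # det_at is exactly quadratic in t, so three samples determine it
    c = np.polyfit([-1.0, 0.0, 1.0],
                   [det_at(-1.0), det_at(0.0), det_at(1.0)], 2)
    return np.sort(np.roots(c).real)

rng = np.random.default_rng(6)
k = rng.uniform(0.0, 10.0, 8)
p = rng.uniform(0.0, 1.0, 8)
r = np.array([np.sum(p * np.cos(k * n)) for n in range(8)]) / np.sum(p)

ok_bounds, ok_width = True, True
for n in range(1, 8):
    lo, up = bounds(r, n)
    ok_bounds = ok_bounds and (lo - 1e-8 <= r[n] <= up + 1e-8)
    delta = 2.0 if n == 1 else 2.0 * toeplitz_det(r[:n]) / toeplitz_det(r[:n - 1])
    ok_width = ok_width and np.isclose(up - lo, delta)   # eq. (38)
print(ok_bounds, ok_width)
```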

3 How strong are the constraints?

We shall now investigate the question of how constraining the bounds on the correlation function or, equivalently, on the correlation coefficients are. In order to quantify this question, we imagine that the correlation function is measured on a regular grid of separations x_n = n x, and from it we define as before the correlation coefficients r_n = ξ(nx)/ξ(0). Due to (4), |r_n| ≤ 1 for all n. We therefore consider the set of functions defined by their values on the grid points, and allow only values in the interval −1 ≤ r_n ≤ 1 for all n. Let M be the largest value of n that we consider; then the functions we consider are defined by a point in the M-dimensional space spanned by the r_n, 1 ≤ n ≤ M. This M-dimensional cube has side length 2 and thus a volume of 2^M. We will now investigate which fraction of this volume corresponds to correlation functions which satisfy the constraints derived in the previous section.

A related question that can be answered is the following: suppose the values of r_m, 1 ≤ m ≤ n, are given; what fraction of the (M−n)-dimensional subspace, spanned by the r_m with n < m ≤ M, corresponds to allowed correlation functions? For example, if r_1 = 1, then all other r_m = 1 as well, and hence the condition r_1 = 1 is very constraining – the volume fraction of the subspace spanned by the r_m with m ≥ 2 is zero in this case.
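Such volume fractions can be estimated directly by Monte Carlo integration, using the equivalence of the constraints with positive semi-definiteness of the Toeplitz matrix A_{ij} = r_{|i−j|} (an illustrative sketch with numpy; M and the sample size are arbitrary choices):

```python
import numpy as np

# Monte Carlo estimate of the allowed volume fraction for M = 4: sample
# points uniformly in the cube [-1, 1]^M and keep those whose Toeplitz
# matrix A_ij = r_|i-j| is positive semi-definite.
rng = np.random.default_rng(7)
M, trials = 4, 20000
r = np.hstack([np.ones((trials, 1)), rng.uniform(-1.0, 1.0, (trials, M))])
idx = np.abs(np.subtract.outer(np.arange(M + 1), np.arange(M + 1)))
lam = np.linalg.eigvalsh(r[:, idx])      # batched eigenvalues, ascending
frac = np.mean(lam[:, 0] >= 0.0)         # fraction with min eigenvalue >= 0
print(frac)
```

For M = 2 the allowed fraction is 2/3 exactly, and each additional coefficient shrinks it further, so the estimate for M = 4 must come out well below that value.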

Mathematically, we define these volume fractions as

 V^M_n = ∫_{r_{(n+1)l}}^{r_{(n+1)u}} dr_{n+1}/2 ∫_{r_{(n+2)l}}^{r_{(n+2)u}} dr_{n+2}/2 … ∫_{r_{Ml}}^{r_{Mu}} dr_M/2 = ∏_{k=n+1}^{M} ∫_{r_{kl}}^{r_{ku}} dr_k/2. (39)

The factor 1/2 in each integration accounts for the side length of the M-dimensional cube, so that the V^M_n are indeed fractional volumes. V^M_n depends on the r_m with m ≤ n; in particular, V^M_0 is the fractional volume which is allowed if all constraints are taken into account. From the definition of the V^M_n it is obvious that the following recursion holds:

 V^M_n = ∫_{r_{(n+1)l}}^{r_{(n+1)u}} dr_{n+1}/2  V^M_{n+1}. (40)

We can therefore calculate the V^M_n iteratively, starting with

 V^M_{M−1} =