Bayesian analysis of cosmic rays

A Bayesian analysis of the 69 highest energy cosmic rays detected by the Pierre Auger Observatory

Alexander Khanin and Daniel J. Mortlock
Astrophysics Group, Imperial College London, Blackett Laboratory, Prince Consort Road, London SW7 2AZ, U.K.
Department of Mathematics, Imperial College London, London SW7 2AZ, U.K.
Accepted 2016 ?????? ??. Received 2016 ?????? ??; in original form 2016 ???????? ??

The origins of ultra-high energy cosmic rays (UHECRs) remain an open question. Several attempts have been made to cross-correlate the arrival directions of the UHECRs with catalogs of potential sources, but no definite conclusion has been reached. We report a Bayesian analysis of the 69 events from the Pierre Auger Observatory (PAO), that aims to determine the fraction of the UHECRs that originate from known AGNs in the Veron-Cety & Veron (VCV) catalog, as well as AGNs detected with the Swift Burst Alert Telescope (Swift-BAT), galaxies from the 2MASS Redshift Survey (2MRS), and an additional volume-limited sample of 17 nearby AGNs. The study makes use of a multi-level Bayesian model of UHECR injection, propagation and detection. We find that for reasonable ranges of prior parameters, the Bayes factors disfavour a purely isotropic model. For fiducial values of the model parameters, we report 68% credible intervals for the fraction of source originating UHECRs of , , , and for the VCV, Swift-BAT and 2MRS catalogs, and the sample of 17 AGNs, respectively.

cosmic rays – methods: statistical
pagerange: A Bayesian analysis of the 69 highest energy cosmic rays detected by the Pierre Auger ObservatoryB.3pubyear: 2016

1 Introduction

Cosmic rays (CRs) are highly accelerated protons and atomic nuclei, some of which enter the Solar system and reach the Earth. They are the most energetic particles observed in nature, with energies in the range to (see e.g. Kotera & Olinto 2011,  Letessier-Selvon & Stanev 2011 for reviews).

A number of open scientific issues remain with respect to CRs, in particular ultra-high energy cosmic rays (UHECRs) with arrival energies . The study of UHECRs is complicated by the fact that they experience an abrupt cutoff in their energy spectrum at , so that only small samples are available. The largest currently available sample is the 69 events with recorded by the Pierre Auger Observatory (PAO) between 2004 January 1 and 2009 December 31 (Abreu et al. 2010).

One open issue in the study of UHECRs is the question of their sources. A number of candidates, such as active galactic nuclei (AGNs) and pulsars have been proposed, but studies have not been conclusive (see e.g. Kalmykov et al. 2013 for a review). The question of UHECR origins can be studied by attempting to associate the arrival directions with their sources. While UHECRs are charged particles and therefore experience magnetic deflection as they propagate, they are sufficiently energetic that the total deflection is expected to be to deg (e.g. Medina Tanco et al. 1998; Sigl et al. 2004; Dolag et al. 2005), so that some information about their points of origin should be retained.

Association of UHECRs with catalogs of potential sources is made possible by the fact that UHECRs with energies of are expected to have come from a limited radius of . This radius is sometimes called the Greisen-Zatsepin-Kuzmin (GZK) horizon, and arises due to the fact that UHECRs at those energies scatter off the cosmic microwave background (CMB) radiation in a process known as the GZK effect (Greisen 1966, Zatsepin & Kuzmin 1966). The mean free path of the GZK effect at high energies is a few and the energy loss in each collision is - The resultant attenuation is very rapid, and is the cause of the cutoff in the UHECR energy spectrum observed by both HiRes (Abbasi et al. 2008) and PAO (Abraham et al. 2008).

A number of attempts have been made to find correlations between UHECR arrival directions and catalogs of possible sources. Cross-correlation studies have been conducted with galaxy catalogs, such as the Two Micron All-Sky Survey (2MASS) Redshift Survey (2MRS) (Abraham et al. 2009; Abbasi et al. 2010), as well as specific types of objects such as active galactic nuclei (AGNs) (Abraham et al. 2007, 2008; George et al. 2008; Pe’Er et al. 2009; Watson et al. 2011) and BL Lacertae objects (BL LAcs) (Tinyakov & Tkachev 2001). Overall, no clear consensus has been reached. Different studies have reported different degrees of correlation, depending on the statistical approach, the UHECR sample, and the population of source candidates that was used. The most significant correlation was reported by the Pierre Auger Collaboration, between arrival directions of UHECRs with energies and the positions of nearby AGNs (Abraham et al. 2007). The result was supported by Yakutsk data (Ivanov 2009), but not by HiRes (Abbasi et al. 2008) or the Telescope Array (Abu-Zayyad et al. 2012). A more recent analysis of a larger PAO sample has shown a weaker correlation than before (Abreu et al. 2010).

The lack of consensus on these issues is partly due to the difficulty of analyzing such small sample sizes. Given the small size of the UHECR data sets, it is important to utilize as much of the available information as possible. This can be achieved by adopting a Bayesian methodology, that involves models of the relevant physical processes. The first steps to such a comprehensive Bayesian work have been made in the recent work of Watson et al. (2011) and Soiaporn et al. (2013).

Watson et al. (2011) analysed the 27 events that were analysed in Abraham et al. (2007), and derived a posterior for the fraction that originated from AGNs in the Veron-Cetty & Veron (VCV) catalog (Véron-Cetty & Véron 2006). To do so, they used a two-component parametric model characterized by a source rate and a background UHECR rate . The model assumed that the UHECR arrival directions are points drawn from a Poisson intensity distribution on the celestial sphere. The intensity distribution was obtained with a computational UHECR model. Watson et al. (2011) report strong evidence of a UHECR signal from the VCV AGNs. They find a low AGN fraction that is consistent with Abreu et al. (2010). For fiducial values of the model parameters, they report a 68% credible interval for the AGN fraction of .

Soiaporn et al. (2013) developed a multi-level Bayesian framework to attempt to associate the 69 UHECRs that were recorded at the PAO in the period 2004-2009 with 17 nearby AGNs catalogued by Goulding et al. (2010) (hereafter G10). They report evidence for a small but nonzero fraction of the UHECRs to have originated at the AGNs from G10, of the order of a few percent to 20%.

We extend the formalism of Watson et al. (2011) with both a greater data set and a refined UHECR model. Following Abreu et al. (2010), we extend the analysis to two further source catalogs: AGNs from the Swift Burst Alert Telescope (Swift-BAT) (Baumgartner et al. 2010) and galaxies from 2MRS (Huchra et al. 2012). We also extend the analysis to the 17 AGNs from the G10 catalog.

After discussing the UHECR and source data sets in Section 2, we explain our UHECR model in Section 3, discuss the statistical formalism of our Bayesian model comparison in Section 4, and the application of the formalism to mock data sets in Section 5. The results of applying the formalism to the PAO data are discussed in Section 6. Some aspects of our computational approach are described in Appendix A, and some subtleties of our model comparison are explored in Appendix B. We use a Hubble constant of where required to convert between redshifts and distances.

2 Data

2.1 UHECR sample

The sample of UHECR events that was used in this analysis were the 69 highest energy events recorded at the PAO between January 2004 and November 2009, as documented in Abreu et al. (2010). These are the events with observed energies above the threshold .

The PAO is a CR observatory located in Argentina, at a longitude of 69.5 W and a latitude 35.2 S. PAO is a hybrid observatory, which means that it uses both surface detection (SD) and fluorescent telescope detection (FD) of UHECRs. The observatory has SD plastic scintillators of a total area of 3000 and 4 FD telescopes.

The PAO’s total exposure of this data-set is and its relative exposure per unit solid angle, , is illustrated in Figure 1. The relative exposure is directly proportional to , the probability that a UHECR will be detected if it arrives from direction , but is normalized so that .

PAO measures UHECR arrival directions with an uncertainty of and arrival energies with a relative uncertainty of (Letessier-Selvon et al. 2014).

Figure 1: Relative PAO exposure in Galactic coordinates. The arrival directions of the 69 UHECRs are shown as black points. The Galactic centre (GC) and south celestial pole (SCP) are indicated.

Figure 2: Positional dependence of the expected number of source originating events, for the VCV, Swift-BAT, 2MRS, and G10 catalogs. A fiducial value of the smearing parameter is assumed. The arrival directions of the 69 UHECRs are shown as black points. Galactic coordinates are used, and the Galactic centre (GC) and south celestial pole (SCP) are indicated.

2.2 Source catalogs

As potential source catalogs, we consider AGNs from the VCV, Swift-BAT and G10 catalogs, and galaxies from the 2MRS catalog. This allows us to compare our analysis for the Swift-BAT and 2MRS sources with the analysis from Abreu et al. (2010), our analysis for the VCV sources with the analyses from both Abreu et al. (2010) and Watson et al. (2011), and our analysis of the G10 sources with Soiaporn et al. (2013).

We use the 12th edition of the VCV catalog, selecting sources with , as AGNs with higher redshift are too far away to be plausible UHECR sources, and can be shown to have a negligible effect on the results. We omit sources for which absolute magnitudes are not stated. The total number of VCV AGNs that meet those requirements is . This is the same sample of sources that was used in Abraham et al. (2007), Abreu et al. (2010) and Watson et al. (2011), and in PAO’s more recent analysis Aab et al. (2015). While the VCV catalog is heterogenous and thus not ideal for statistical studies, it is close to complete for the low-redshift AGNs that are of relevance here.

For the Swift-BAT catalog, we use the 58 month version, that includes a total of sources. In the case of the 2MRS catalog, we used the catalog version 2.4, 2011 Dec 16. We exclude events that are within of the Galactic plane, to avoid biases due to the incompleteness of the catalog in the region of the Galactic plane. This leaves a total of galaxies. These samples of Swift-BAT and 2MRS sources are the same as those used by Abreu et al. (2010).

The G10 catalog is a well-characterized volume-limited sample of AGNs. The 17 AGNs contained in it constitute all infrared-bright AGNs within 15 Mpc. This is the same sample that was used by Soiaporn et al. (2013).

3 UHECR model

A Bayesian UHECR analysis requires a realistic model of UHECR injection, propagation, and detection. This model was used both to compute the likelihoods in our statistical formalism (Section 4), and to create simulated mock catalogs of UHECRs to test our methods (Section 5).

3.1 Injection

We adopt a model in which any given UHECR source emits UHECRs with an emission spectrum given by


where the logarithmic slope is taken to be 3.6 (Abraham et al. 2010). The spectrum is normalized in such a way that the total emission rate of UHECRs with energy greater than is given by


where is the minimum UHECR emission energy and is the rate at which source emits UHECRs with .

3.2 Energy loss during propagation

The energy loss processes experienced by UHECRs can be characterized in terms of the loss length . Given the loss length as a function of energy, it is possible to calculate the total amount of energy that a UHECR loses as it travels to the Earth from a given distance by solving the differential equation


For pure proton composition, obeys the expression


where is the speed of light and , and are terms corresponding to the three main energy loss processes experienced by UHECRs of pure proton composition (e.g. Stanev 2009):

  1. the GZK scattering off the CMB photons at energies above ;

  2. Bethe-Heitler (BH) pair production (also a scattering process off the CMB radiation), which dominates at lower energies (Hillas 1968);

  3. the adiabatic energy loss due to the expansion of the Universe.

A detailed discussion of these terms, including expressions and parametrizations, can be found in De Domenico & Insolia (2013). For the energies that are relevant in this investigation, the dominant term is . The Bethe-Heitler and adiabatic processes dominate the energy loss at lower energies, but play only a minor role at the higher energies in question.

The loss lengths are shown as a function of energy in Figure 3. The contributions to the loss length from the BH and adiabatic losses are combined into a single function that is contrasted with the loss length due to the GZK effect, . The two are combined into the total loss length . The figure shows plots for values of 0.0 and 0.1, which correspond to distances of and , thus covering the GZK horizon. appears very rapidly after an energy of and begins to dominate the energy loss. As we are interested only in UHECRs with energies , the GZK scattering is the most relevant loss process in this investigation.

The energy dependence of is one of the main improvements of this propagation model over the model used in Watson et al. (2011), where was taken to be a constant. The constant value of used by Watson et al. (2011) is also displayed in Figure 3 for comparison.

Figure 3: Loss lengths from the three energy loss processes, compared to the constant constant loss length used by Watson et al. (2011), as described in Section 3.2.

3.3 Effective smearing

We combined the magnetic deflection that a UHECR experiences during propagation and the uncertainty in its detected arrival direction into a single kernel, which was chosen to be a von Mises-Fisher (vMF) distribution, defined as


where is the measured arrival direction of the ray, is the source direction and is the concentration parameter. The vMF distribution resembles a Gaussian on the sphere, with being inversely related to the width of the Gaussian: for large values of the distribution is peaked over an angular scale of ; if tends to the distribution becomes uniform on the sphere.

The magnitude of the deflection that the highest energy UHECRs experience is uncertain, with the estimates of typical deflection angles ranging from to deg (e.g. Medina Tanco et al. 1998; Sigl et al. 2004; Dolag et al. 2005). We assume a fiducial smearing angle of 3 deg (), but also conduct investigations for smearing angles of 6 and 10 deg ( and ).

3.4 Observed UHECR flux

The number of UHECRs from source above a threshold energy observed on Earth per unit area per unit time, , is a quantity that is important in our statistical analysis. This rate is proportional to the rate of UHECRs emitted by the source, , but it also depends on the distance-dependence of the UHECR energy loss, and on the UHECR injection spectrum. We use the UHECR propagation model described in Section 3.2 to determine the injection energy corresponding to the threshold energy and to the source distance . Combining this value with Equation 2 and with the source distance , we obtain


This expression assumes that the observed energy is equivalent to the arrival energy of the UHECR, . Thus, for the purposes of the calculation, the 12% energy uncertainty of the PAO measurements is neglected. The variation in source rates among the sources that we are considering is not negligible. We use the source rate of Centaurus A as the reference value . The source rate of a source is obtained by weighing the flux of that source in a particular band against the flux of Centaurus A in that same band. The wave band of the flux thereby is different depending on the source catalog. For VCV, the flux of the source in the -band is used, for Swift-BAT the X-ray flux, for 2MRS the IR flux, for G10 the -band flux. The fluxes are thus used as weights, so that sources with higher flux contribute more UHECRs. This approach is very similar to the approach used in Abreu et al. (2010), where fluxes were used to weigh the sources from the Swift-BAT and 2MRS catalogs in the same way. Incorporating the fluxes into the formalism, we obtain the expression


where is the distance to Centaurus A.

4 Statistical formalism

Given a sample of UHECRs arrival directions, we would like to determine the fraction of these rays that have come from a set of sources under consideration. To do so, we use a two-component parametric model characterized by two rates: The source rate and the isotropic background rate . As elaborated in Section 3.4, we use the source rate of Centaurus A as the reference value of . We obtain a joint posterior distribution for the two rates:


where is the prior distribution for and R, and is the likelihood (i.e. the probability of obtaining the data set d given values of and R).

4.1 Prior

We adopt a uniform prior over and R, with , . This plausibly encodes our ignorance of the two parameters, and, unlike maximum entropy priors, includes a possible value of 0 for both parameters. The maximum values of and are denoted as and . We have conducted our analysis for flat priors of varying width, using a variable width parameter . The expression for the prior can be written as


and have been chosen in such a way that when , the prior covers the 99.7% credible region implied by the likelihood and an infinitely broad uniform prior. This gives a data driven scaling for the rates. The priors and their dependence on are illustrated in Appendix B.

4.2 The likelihood

To compute the likelihood, we use a ‘counts in cells’ approach, in which the sky is divided into 6,480,000 pixels, that are distributed uniformly in right ascension and declination. Thus, the data set d can be rewritten as a set of counts in each pixel .

The likelihood is then given by a product of the individual Poisson likelihoods in each pixel, and can be written as


where and are the expected counts in pixel p due to sources and background, respectively. The expected number of counts in pixel p that are contributed by the background is


where the integral is over the pixel p, and is the relative exposure (Section 2.1). The expected number of source originating events in pixel p is


where the sum is over the sources, is the vMF distribution (Equation 5), and is the observed UHECR flux discussed in Section 3.4. Inserting Equations 11 and 12 into Equation 10, we arrive at the full likelihood.

The positional dependence of follows the relative exposure of PAO, as shown in Figure 1. The positional dependence of depends both on the PAO exposure and on the distribution of sources in the given catalog. Figure 2 shows the dependence for the four catalogs that are used in this study. The dependence is dominated by the distribution of local AGNs, by far the strongest source being Centaurus A (, ), which previously studies (e.g. Abraham et al. 2007) have suggested as the dominant UHECR source.

The expression for the likelihood can be rearranged to reduce the total number of computations, as described in Appendix A.

4.3 The source fraction

The source fraction111The source fraction is equivalent to the AGN fraction used in Watson et al. (2011) but now generalized to allow for non-AGN progenitors. is defined as the fraction of the UHECRs that are expected to have originated at the sources in whichever catalog is under consideration and is given by


The posterior for can be calculated from the posterior over the rates as


is insensitive to and provided they are sufficiently large.

4.4 Model comparison

We would like to compare model where all the UHECRs are drawn from a uniform distribution with model where the UHECRs are derived from a combination of a background and a source originating component. To do this, we conduct a Bayesian model comparison. For a data set d, and two models and , the ratio of the marginal likelihoods for the two models, termed the Bayes factor, is


In the specific case that is considered here, the models are nested: When , model reduces to model . A general expression of the Bayes factor in this situation is


It can be shown (Dickey 1971) that in the case of such nested models, the expression reduces to


This expression is known as the Savage-Dickey Density Ratio, or SDDR. Qualitatively, this expression means that the nested uniform model is preferred if, within the context of the more complicated model, the data result in an increased probability that .

5 Simulations

In order to investigate the constraining power of a data set of 69 events, we apply the method to simulated data sets. We use two extreme cases:

  1. Uniform arrival directions. These rays were drawn from a probability distribution that followed the PAO exposure.

  2. UHECRs originating at sources from a catalog. We conducted simulations for all four of the catalogs. In each catalog, the sources were weighted by their fluxes and the PAO exposure. Random sources were then selected, and the propagation model of Section 3 was used to propagate rays from the sources to the Earth.

The posteriors for the source and background rates, as well as the posteriors for the source fraction, are summarized in Figure 4. The posteriors for the uniform and source centred cases are completely disjoint, which demonstrates that in extreme senarios where all UHECRs originate either from a uniform background or from a source catalog, a data set of 69 events should be sufficient to distinguish between the two models. Figure 4 also shows the Bayes factors as functions of for the two cases. The Bayes factors that are displayed are the inverses of the SDDR given in Equation 17, and favour the more complex model for Bayes factors .

To assess the results of the Bayes factor simulations, we can derive a rough range of plausible values of from physical models, and then look at the behaviour of the Bayes factors at those physically plausible values. Plausible models of UHECR injection predict that the UHECR luminosity of a source like Centaurus A is of the order of (Fraija et al. 2012). If this is taken as the typical UHECR luminosity of a source, then for a UHECR energy range of , the range of source rates can be calculated by dividing the UHECR luminosity by the limiting values of this range. The result of this calculation is a range of source rates of roughly . The values of corresponding to this range have been marked on Figure 4. (The values are slightly different for each of the simulations. For the sake of clarity, only the values for the uniform simulation are displayed, the others being broadly similar.) For the sourced case, model is strongly favoured for all physically plausible values of , while for the uniform case, the simple uniform model is favoured for the physically plausible values.

Figure 4: Results from simulations: Uniform UHECRs, and UHECRs originating at sources from the 4 catalogs. In all cases, 69 events are used. (A) Posteriors for and . The contours are the 68.3%, 95.4% and 99.7% highest posterior density credible regions. (B) Posteriors for the source fraction. (C) Plot of Bayes factors as a function of the hyperparameter . In (C), the -mark and the vertical line signify the minimum and the maximum values of the physically plausible range of . The minimum and maximum values that are displayed correspond to the uniform simulation.

6 Results

The results of the application of the statistical methods described in Section 4 to the data described in Section 2 are shown in Figures 5 and 6. Figure 5 contrasts the results from our analysis with the equivalent results from Watson et al. (2011), and with the results for an intermediate case. The use of a more refined propagation model leads to a higher posterior probability for lower source rates. The reason for that is that in Watson’s propagation model, the energy loss length is constant and very small (Figure 3). UHECRs experience more drastic energy loss than in the more realistic model, which leads to more distant AGNs being excluded as plausible source candidates. As fewer sources are included, a higher source rate is required to generate the same sample of UHECRs.

The inclusion of 69 events reduces the extent to which the non-uniform model is favoured. This is evident from the posterior of the source fraction, and also from the behavaviour of . This result agrees with the results of Abreu et al. (2010), which reported that the full 69 events yield lower evidence of anisotropy than the earlier study Abraham et al. (2007), which analysed 27 events.

Figures 6 shows results for all four of the source catalogs, and for all values of the smearing parameter. Displayed are the posteriors for the source fraction, as well as plots of against . The constraints on the source fraction for all cases are shown in Table 1. The figures and table show that for greater smearing, the range of plausible values of is increased, and the most probable value of the source fraction is higher than for the fiducial model of . The reason is that for greater magnetic deflection, the UHECR intensity distribution becomes more uniform, so that the uniform and mixed models become more difficult to distinguish, and a greater range of values become viable.

The plots of demonstrate that for all physically plausible prior ranges of the model parameters, the fully isotropic model is disfavoured. The form of the dependence of on is elaborated upon in Appendix B.

These results for the VCV, Swift-BAT, and 2MRS catalogs can be compared with the results of Abreu et al. (2010), who used a correlation-based analysis on the VCV catalog that mirrored the analysis in Abraham et al. (2007). Abreu et al. (2010) reported a correlation of (38)% between UHECRs and sources from the VCV catalog, which was considerably lower than than the (69)% correlation that was reported in Abraham et al. (2007). This reduction in the correlation is consistent with our findings that the source fraction is reduced as we increase the data set from 27 to 69 events. In addition to these correlation based methods, Abreu et al. (2010) conducted a likelihood based study similar to the analysis presented here, where the likelihood was taken as a probability map of arrival directions of UHECRs, parametrized by a magnetic smoothing angle and a fraction of isotropic rays , which is equivalent to . These likelihood-based studies were conducted for the Swift-BAT and 2MRS catalogs. For the 2MRS case, the maximum likelihood values of and are reported as 0.56 and 7.8, respectively. The value lies between our chosen smearing angles 6 and 10. The value for corresponds to a value of of 0.44, which is consistent with our credible intervals for these chosen smearing angles. For the case of Swift-BAT, the maximum likelihood value of is given as 0.64, which corresponds to a source fraction of 0.36. The maximum likelihood estimate of the smearing angle is reported as 1.5, which is lower than our minimum chosen value of 3. Despite the difference between the angles, a value of 0.36 can still be considered broadly consistent with the 68% credible interval for 3, .

Our results for the G10 catalog can be compared with the work of Soiaporn et al. (2013). That analysis involved the full data set of 69 events, and found evidence for small but nonzero values of , of the order of a few percent to 20%, ruling out values of . This is broadly consistent with our results, which suggest that values of are the most probable for all values of the smearing parameter.

Table 1: Maximum a posteriori estimates and 68% credible intervals for .

Figure 5: Results for , and the sources from the VCV catalog. Results for 27 and 69 events, and for constant and variable loss lengths are displayed. (A) Posteriors for the source and background rates. The contours are the 68.3%, 95.4% and 99.7% highest posterior density credible regions. (B) Posterior for the source fraction. (C) Plot of Bayes factors as a function of the hyperparameter . In (C), physically plausible ranges of are shown for the cases of 27 events (blue) and 69 events (black), with a variable loss length. The -marks and the vertical lines signify the minimum and the maximum values of the physically plausible ranges of .

7 Conclusions

We have performed a Bayesian analysis of the 69 UHECRs detected by the PAO with energies to determine the fraction of these UHECRs that originated from catalogs of plausible UHECR sources. The sources considered were AGNs from the VCV, Swift-BAT, and G10 catalogs, and galaxies from the 2MRS catalog.

For the fiducial magnetic smearing parameter of 3 deg, we report 68% credible intervals for the source fraction of , , and for the VCV, Swift-BAT, G10 and 2MRS catalogs, respectively. For all physically plausible values of the model parameters, the fully uniform model is disfavoured. The results of our study are in broad agreement with previous work on this subject, such as Watson et al. (2011), Abreu et al. (2010) and Soiaporn et al. (2013). The credible intervals for the VCV catalog are lower than the analogous credible intervals from Watson et al. (2011), which used a similar method to analyse 27 PAO events. This is consistent with earlier studies: Abreu et al. (2010), which analysed 69 events, reported a lower signal of anisotropy than the earlier study Abraham et al. (2007), which used 27 events.

We will extend this Bayesian framework to include the arrival energies of the UHECRs as well as the arrival directions.

It is expected that future experiments will produce data sets that will be sufficiently large for our Bayesian method (and other statistical approaches; see e.g. Rouillé d’Orfeuil et al. 2014) to detect even the weak clustering expected if the UHECRS have come from nearby sources. PAO is continuing to take data and is expected to produce a sample of UHECRs over its first decade of operations. Looking further ahead, the planned Japanese Experiment Module Extreme Universe Space Observatory (JEM-EUSO, Adams Jr. et al. 2013) on the International Space Station (ISS) is scheduled for launch in 2017 and is expected to detect UHECRs annually over its five year lifetime.


  • Aab et al. (2015) Aab A., et al., 2015, The Astrophysical Journal, 804, 15
  • Abbasi et al. (2010) Abbasi R., et al., 2010, The Astrophysical Journal Letters, 713, L64
  • Abbasi et al. (2008) Abbasi R. U., et al., 2008, Physical Review Letters, 100, 101101
  • Abbasi et al. (2008) Abbasi R. U., et al., 2008, Astroparticle Physics, 30, 175
  • Abraham et al. (2010) Abraham J., Abreu P., Aglietta M., Ahn E. J., Allard D., Allen J., Alvarez-Muñiz J., Ambrosio M., Anchordoqui L., Andringa S., et al. 2010, Physics Letters B, 685, 239
  • Abraham et al. (2007) Abraham J., et al., 2007, Science, 318, 938
  • Abraham et al. (2008) Abraham J., et al., 2008, Astroparticle Physics, 29, 188
  • Abraham et al. (2008) Abraham J., et al., 2008, Physical Review Letters, 101, 061101
  • Abraham et al. (2009) Abraham J., et al., 2009, arxiv:0906.2347
  • Abreu et al. (2010) Abreu P., et al., 2010, Astroparticle Physics, 34, 314
  • Abu-Zayyad et al. (2012) Abu-Zayyad T., et al., 2012, The Astrophysical Journal, 757, 26
  • Adams Jr. et al. (2013) Adams Jr. J. H., et al., 2013, arXiv:1307.7071
  • Baumgartner et al. (2010) Baumgartner W. H., Tueller J., Markwardt C., Skinner G., 2010, in AAS/High Energy Astrophysics Division #11 Vol. 42 of Bulletin of the American Astronomical Society, The Swift-BAT 58 Month Survey. p. 675
  • De Domenico & Insolia (2013) De Domenico M., Insolia A., 2013, Journal of Physics G Nuclear Physics, 40, 015201
  • Dolag et al. (2005) Dolag K., Grasso D., Springel V., Tkachev I., 2005, J. Cosmology & Astro-Part. Phys., 1, 9
  • Fraija et al. (2012) Fraija N., et al., 2012, The Astrophysical Journal, 753, 40
  • George et al. (2008) George M. R., Fabian A. C., Baumgartner W. H., Mushotzky R. F., Tueller J., 2008, MNRAS, 388, L59
  • Goulding et al. (2010) Goulding A. D., Alexander D. M., Lehmer B. D., Mullaney J. R., 2010, MNRAS, 406, 597
  • Gregory (2010) Gregory P. C., 2010, Bayesian Logical Data Analysis for the Physical Sciences. Cambridge University Press, pp 376–388
  • Greisen (1966) Greisen K., 1966, Physical Review Letters, 16, 748
  • Hillas (1968) Hillas A. M., 1968, Canadian Journal of Physics, 46, 623
  • Huchra et al. (2012) Huchra J. P., et al., 2012, The Astrophysical Journal, 199, 26
  • Ivanov (2009) Ivanov A. A., 2009, Nuclear Physics B Proceedings Supplements, 190, 204
  • Kalmykov et al. (2013) Kalmykov N. N., Khrenov B. A., Kulikov G. V., Zotov M. Y., 2013, Journal of Physics Conference Series, 409, 012100
  • Kotera & Olinto (2011) Kotera K., Olinto A. V., 2011, ARA& A, 49, 119
  • Letessier-Selvon et al. (2014) Letessier-Selvon A., et al., 2014, Brazilian Journal of Physics, 44, 560
  • Letessier-Selvon & Stanev (2011) Letessier-Selvon A., Stanev T., 2011, Reviews of Modern Physics, 83, 907
  • Medina Tanco et al. (1998) Medina Tanco G. A., de Gouveia Dal Pino E. M., Horvath J. E., 1998, The Astrophysical Journal, 492, 200
  • Pe’Er et al. (2009) Pe’Er A., Murase K., Mészáros P., 2009, Phys. Rev. D, 80, 123018
  • Rouillé d’Orfeuil et al. (2014) Rouillé d’Orfeuil B., Allard D., Lachaud C., Parizot E., Blaksley C., Nagataki S., 2014, arXiv:1401.1119
  • Sigl et al. (2004) Sigl G., Miniati F., Enßlin T. A., 2004, Physical Review D, 70, 043007
  • Soiaporn et al. (2013) Soiaporn K., Chernoff D., Loredo T., Ruppert D., Wasserman I., 2013, The Annals of Applied Statistics, 7, 1249
  • Stanev (2009) Stanev T., 2009, New Journal of Physics, 11, 13
  • Tinyakov & Tkachev (2001) Tinyakov P. G., Tkachev I. I., 2001, Soviet Journal of Experimental and Theoretical Physics Letters, 74, 445
  • Véron-Cetty & Véron (2006) Véron-Cetty M.-P., Véron P., 2006, Astronomy and Astrophysics, 455, 773
  • Watson et al. (2011) Watson L. J., Mortlock D. J., Jaffe A. H., 2011, MNRAS, 418, 206
  • Zatsepin & Kuzmin (1966) Zatsepin G., Kuzmin V., 1966, JETP Lett., 4

Figure 6: Posteriors of the source fraction, and plots of against the hyperparameter , for the three smearing angles , and , and for the three source catalogs (A) VCV, (B) Swift-BAT, and (C) 2MRS. The plots of show physically plausible ranges of : The -marks and the vertical lines signify the minimum and the maximum values of these ranges.

Appendix A Numerical evaluation of the likelihood

The likelihood, as given in Equation 10, is a product over Poisson likelihoods for the individual pixels,


where the product is over the pixels, is the number of counts in pixel p, and and are the expected numbers of counts from the background and sources in pixel . This expression for the likelihood proved to be inefficient for use, as it required a great number of computations: The total number of pixels was 6,480,000. If a grid of is used, a total of 64,800,000,000 calculations would be required.

The total number of calculations can be greatly reduced by rearranging the expression. For a given data set, we can separate the product of Equation 18 into a product over those pixels that include an event, , and pixels that do not, . Using the fact that for all and for all , we can write


where and , and and are two pixelized maps obeying the equations


Thus, the initial expression has been rearranged in such a way that the vast majority of Poisson calculations is contained within the sums and . These sums can be calculated in advance for the entire grid of and R. This greatly reduces the total number of calculations required for Equation 18, and speeds up the full calculation by a factor of .

Appendix B Model comparison and prior sensitivity

The Bayes factor that was discussed in Section 4.4 is comparing two models: A simple model of uniform UHECRs, and a more complex model that has both uniform and sourced UHECRs. As explained in the section, due to being nested within , the expression for the Bayes factor reduces to


where and are the background and source rates and d are the data. Qualitatively, the expression means that the nested uniform model is preferred if, within the context of the more complex model, the data result in an increased probability that . A uniform prior was used, given by


where is the hyperparameter that determines the width of the prior. and have been chosen in such a way that when , the prior covers the 99.7% credible region implied by the likelihood and an infinitely broad uniform prior. To explain the dependence of the Bayes factor on , three illustrative cases are used: The case of a simple Gaussian likelihood, the case of the Poisson product likelihood of Equation 10, and the likelihood of On/Off measurements.

b.1 Gaussian likelihood

We consider the case of a Gaussian likelihood given by


where and are the coordinates of the likelihood mean, and are the standard deviations on the two parameters.

This likelihood is shown in the upper panel of Figure 7, focusing on three regions . These regions correspond to the regions over which the flat prior is taken for these values of the hyperparameter. The lower panel shows the posteriors for the same values. As the priors are flat, the posteriors are equivalent to the likelihood in the prior region, normalized over the prior region.

These posteriors can be used to illustrate the dependence of the Bayes factor in Equation 23 on the hyperparameter . For , the numerator is constant, as corresponds to the normalized likelihood, and does not vary as is increased beyond . The denominator falls linearly with . Thus, we expect that for , increases linearly with .

For lower values of , the behaviour of is more complicated, as can be seen in the left-hand lower panel of Figure 7. For low values of , the likelihood becomes


This means that the posterior becomes linear and increasingly flat as . As the function becomes increasingly flat, the ratio in Equation 23 becomes a ratio of two normalized flat functions, so that qualitatively, we can expect it to approach unity. This can also be shown more rigorously, as for low values of and , Equation 23 reduces to


Figure 7: Upper panel: Example Gaussian likelihood. The red lines denote prior regions for three different values of the hyperparameter . Lower panel: Posteriors for the same values are displayed.

Figure 8: Upper panel: Example Poisson product likelihood. The red lines denote prior regions for three different values of the hyperparameter . Lower panel: Posteriors for the same values are displayed.

b.2 Poisson product likelihood

We consider the same likelihood that was used in Equation 10. The total likelihood is a product of individual Poisson likelihoods for 6,480,000 pixels, and can be written as


where and are the expected numbers of counts in pixel p due to source and background rates, respectively. Figure 8 shows the likelihood, as well as three prior regions, and the posteriors calculated for these three regions.

For high values of , the posterior looks very much like the Gaussian, so that we expect the same behaviour for the Bayes factor, including the linear behaviour for . A difference arises at small values of . Here, we see that most of the posterior is concentrated at the highest values of and . As the product of Equation 28, for low values of and , reduces to


where is the total number of PAO events, and and are the pixelized maps that were discussed in Appendix A. As , the function becomes extremely steep in and R, as Figure 8 shows. For such a posterior, tends to zero as .

b.3 On/Off likelihood

An additional case that is of interest in this analysis is that of On/Off measurements. In high-energy astrophysics, when a measurement is taken of the number of counts coming from a source of interest, often an auxiliary measurement is made by pointing the detector off-source. These are called the On and Off measurements, respectively. The counts that are detected in the Off measurement are thereby produced solely by the background rate , while the counts in the On measurement are produced by both the background and the source rates and . From these two measurements, the source rate can then be estimated (e.g. Gregory 2010).

The likelihood for these kinds of measurements is the product of the Poisson likelihoods of the On and Off measurements:


where and are the numbers of counts on and off source, and and times the detector spends on and off the source. An example of such a likelihood is displayed in Figure 9. The On/Off likelihood is very similar to the Poisson product likelihood, as the former can be regarded as a special case of the latter. Thus, the dependence of the Bayes factor on can be expected to be similar to the dependence for the Poisson product case.

For the On/Off case, a standard expression for the Bayes factor has been derived (Gregory 2010), and can be written as