Generalized Tests for Selection Effects in GRB High-Energy Correlations
Several correlations among parameters derived from modelling the high-energy properties of GRBs have been reported. We show that well-known examples of these have common features indicative of strong contamination by selection effects. We focus here on the impact of detector threshold truncation on the spectral peak versus isotropic equivalent energy release ($E_{\rm pk}$-$E_{\rm iso}$) relation, extended to a large sample of 218 Swift and 56 HETE-2 GRBs with and without measured redshift. The existence of faint Swift events missing from pre-Swift surveys calls into question inferences based on pre-Swift surveys, which must be subject to complicated incompleteness effects. We demonstrate a generalized method for treating data truncation in correlation analyses and apply this method to Swift and pre-Swift data. Also, we show that the $E_{\rm pk}$-$E_{\gamma}$ (“Ghirlanda”) correlation is effectively independent of the GRB redshifts, which suggests its existence has little to do with intrinsic physics. We suggest that a physically-based correlation, manifest observationally, must show significantly reduced scatter in the rest frame relative to the observer frame and must not persist if the assumed redshifts are scattered. As with the $E_{\rm pk}$-$E_{\gamma}$ correlation, we find that the pre-Swift, bright GRB $E_{\rm pk}$-$E_{\rm iso}$ correlation of Amati et al. (2006) does not rigorously satisfy these conditions.
Subject headings: gamma rays: bursts — methods: statistical — cosmology: observations
Correlations are pervasive in astronomy and generally lead theory in allowing us to discover causal relationships between observed quantities. In the study of Gamma-ray Bursts (GRBs), multiple powerlaw relations among GRB observables have been used to uncover the intrinsic physics of GRBs themselves (e.g., Eichler & Levinson, 2004; Yamazaki et al., 2004; Rees & Mészáros, 2005; Levinson & Eichler, 2005; Lamb, Donaghy, & Graziani, 2005) and to use GRBs as probes to the distant Universe (e.g., Ghirlanda et al., 2005; Schaefer, 2007). Several authors have critically examined the limitations of such relations: understanding how well they potentially constrain physics given their form and scatter (e.g., Friedman & Bloom, 2005; Schaefer & Collazzi, 2007), uncovering possible evolution with cosmic time (e.g., Yonetoku et al., 2004; Li, 2007), realizing the commonality of outliers (Nakar & Piran, 2005; Band & Preece, 2005; Kaneko et al., 2006), and — most fundamentally — determining how the existence and form of the relations vary once spurious correlation imparted by selection effects is treated (e.g., Lloyd, Petrosian, & Mallozzi, 2000).
Recently, Butler et al. (2007, hereafter B07) report evidence from Swift satellite (Gehrels et al., 2004) observations of GRBs that several of the correlations exhibit a wide scatter and shift in normalization toward the Swift detection threshold, suggestive of an origin intimately connected to the detection limits of pre-Swift satellites. Because relations intrinsic to the physical processes underlying GRBs should not be instrument-dependent, a broader investigation into the data from pre-Swift satellites is crucial for determining whether and/or how the relations can be trusted to potentially constrain the physics of GRBs or cosmology.
Here, we extend our critique to include the full B07 catalog of Swift GRBs with and without measured redshifts (for 218 total Swift GRBs) and also to include HETE-2 GRBs with and without measured redshifts (56 GRBs). The uniform B07 catalog is novel for deriving bolometric fluences for all Swift GRBs (between GRBs 041220 and 070509), without requiring tight error bars on the spectroscopic fit parameters. The sample is therefore flux limited by the sensitivity of the Burst Alert Telescope (BAT; Barthelmy et al., 2005); because the X-ray Telescope (XRT; Burrows et al., 2005) localizes nearly all BAT GRB afterglows to few-arcsecond precision, additional flux limits associated with afterglow localization and host galaxy detection (important, for example, when considering the GRBs with measured redshift) are likely not present or are far less important than the stronger flux limits imposed in pre-Swift surveys.
A study similar to this has recently been conducted by Ghirlanda et al. (2008), comparing a smaller sample of bright Swift GRBs to the Beppo-SAX sample (although only the sub-samples with measured redshift). Ghirlanda et al. (2008) find that faint GRBs detected by Swift (and HETE-2; Section 2) are missing from the Beppo-SAX sample, and there is no clear explanation for the missing data in terms of either the SAX trigger threshold or a (likely higher) threshold resulting from a demand that tight error bars be derived in the GRB spectral modelling. As in Ghirlanda et al. (2008), we also focus on the correlation between the isotropic equivalent energy and the peak in the spectrum (Lloyd, Petrosian, & Mallozzi, 2000; Amati et al., 2002).
Selection effects in high-energy GRB observables are well-documented (e.g., Lee & Petrosian, 1996), if rarely treated. For faint GRBs, a departure of the true distributions of flux, duration, etc., from the observed distributions is expected given the steep GRB number density versus peak photon flux relation (, ; e.g., Preece et al., 2000), which places most GRBs in a given sample near the detection limit — or near some other imposed sample cutoff at a higher flux level — with a narrow logarithmic dispersion of dex.
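The pile-up of bursts near a flux cutoff can be illustrated with a short simulation. This is only a sketch under an assumed cumulative power law $N(>P) \propto P^{-\alpha}$ above the cutoff; the index `alpha = 1.5` and the function name are illustrative assumptions, not values taken from the text. For such a law the logarithmic dispersion is $1/(\alpha \ln 10)$ dex, which is about 0.29 dex for $\alpha = 1.5$.

```python
import math
import random
import statistics

def simulate_log_flux_dispersion(alpha, n=20000, seed=1):
    """Standard deviation (in dex) of log10(P / P_cut) for fluxes drawn
    from a cumulative power law N(>P) ~ P**(-alpha) with P >= P_cut."""
    rng = random.Random(seed)
    # Inverse-transform sampling: P / P_cut = (1 - u)**(-1 / alpha).
    log_p = [-math.log10(1.0 - rng.random()) / alpha for _ in range(n)]
    return statistics.pstdev(log_p)

alpha = 1.5  # assumed index, for illustration only
simulated = simulate_log_flux_dispersion(alpha)
analytic = 1.0 / (alpha * math.log(10.0))
print(f"simulated: {simulated:.3f} dex, analytic: {analytic:.3f} dex")
```

A steeper distribution (larger `alpha`) concentrates the sample even more tightly against the cutoff.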
In Section 2 of this paper we study the correlation of with the energy fluence roughly, where is the burst duration, for which it is clearly important to compensate for the expected non-intrinsic correlation of fluence with that arises essentially because GRB detectors are photon counters and not bolometers. Similarly structured correlations are then discussed in Section 3.
Two basic approaches have been attempted to compensate for GRB flux limits: (1) forward folding of a model GRB rate density, evolution, and luminosity function through a model of the detector response (e.g., Graziani, Donaghy, & Lamb, 2005; Lamb, Donaghy, & Graziani, 2005), and (2) applying non-parametric statistical methods to the data and the observed flux limit for each data point. In Section 2, we review and apply numerical methods which treat the observed data truncation directly according to (2), while we plan to approach the B07 data via path (1) in a future paper. In order to potentially better understand the origin of pre-Swift correlations not strongly present in the Swift sample, we explore the significance of these correlations after rejecting GRBs from the Swift sample which would not have been detected by HETE-2 (Section 2).
Extending our critique to the Ghirlanda et al. (2008) and similar pre-Swift samples of bright GRBs with redshifts (e.g., Amati et al., 2002, 2006; Schaefer, 2007) — where the data truncations are poorly known and the GRBs come from multiple instruments of varying sensitivity — we study (Section 3) how the relations transform from the observer to the source frames. We show that redshift dependence in the correlations is generally weak, which is expected for a correlation that arises due to detection selection effects but also continues to have low scatter in the source frame. We present tests to uncover whether the redshift dependence in other correlations is similarly weak, possibly providing circumstantial evidence that these correlations are non-intrinsic. This general approach, despite its limitations, can potentially find application in other population analyses where thresholds are important.
2. Flux Limits and the $E_{\rm pk}$-$E_{\rm iso}$ Correlation
In Figure 1, we reproduce the narrow ( dex) scatter in observed in the recent review by Amati et al. (2006) for data from multiple missions. Following Nakar & Piran (2005) in realizing that reaches a maximum for , we assume this redshift value for all GRBs without measured redshift to place the entire B07 sample on the plot (black points). Most (67%) best-fit Swift values are below the lower red-dotted Amati et al. (2006) line. Nearly half (41%) are below the red dotted line at 90% confidence, indicating a clear preference for a lower flux normalization. We determine 90% confidence error bars directly for , given the observed spectrum, rather than attempting to propagate errors on the covariant quantities and .
If we plot the BAT threshold corresponding to the best-fit points, we can see that agreement (right side of the plot) is partly dictated by low sensitivity, while many sensitive observations are strongly inconsistent. This finding is in excellent agreement with the BATSE studies of faint GRBs (Nakar & Piran, 2005; Band & Preece, 2005). There is no clear separation in Figure 1 between the Amati et al. (2006)-consistent and Amati et al. (2006)-inconsistent points, precluding obvious (non-circular) cuts which could create the semblance of close consistency.
In order to determine the most accurate threshold for each burst, we utilize the observed GRB spectrum and time profile directly instead of employing a model for the threshold (e.g., Band, 2003) which would only utilize results from the spectral fitting (i.e., peak photon flux, , etc.) and possibly the burst duration (e.g., Band, 2006). After determining the energy band — taken to be 15-350 keV for Swift but allowed to vary for HETE-2 (see Figure 2) — and temporal region which maximizes the signal-to-noise ratio , we divide the observed fluence by . This assumes a background dominated light curve (i.e., one where times more counts reach the detector from the diffuse sky background and potentially from non-imaged sources not of interest than from the source of interest), which is appropriate for all but the brightest handful of Swift GRBs, for which our threshold calculation takes into account the observed background counts.
Here and below, we choose a cutoff , because this is where the observed distributions appear to turn over. The turn-over indicates a drop in detection efficiency, because the number of faint bursts is known from BATSE (e.g., Preece et al., 2000) to increase at low flux levels. Because HETE-2 and Swift have intelligent trigger systems which seek to find the light curve region maximizing the (e.g., Band, 2006), the counts that define the also define the fluence threshold, approximately. This threshold estimate is conservative in cases where the trigger system fails to find the optimal region.
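The per-burst threshold estimate described above amounts to a simple rescaling. The following is a minimal sketch assuming a background-dominated light curve, so that signal-to-noise scales linearly with source counts. The function name, argument names, and numbers are hypothetical, and the cutoff value used here is a placeholder rather than the one adopted in the text.

```python
def fluence_threshold(fluence, snr_observed, snr_cutoff):
    """Fluence at which this burst would only just reach the S/N cutoff.

    Assumes a background-dominated light curve, so S/N is proportional
    to the source counts and hence to the fluence.
    """
    if snr_observed <= 0 or snr_cutoff <= 0:
        raise ValueError("S/N values must be positive")
    return fluence * snr_cutoff / snr_observed

# Illustrative numbers only (not taken from the paper).
s_min = fluence_threshold(fluence=2.0e-6, snr_observed=40.0, snr_cutoff=10.0)
print(s_min)
```

For the brightest bursts, where the source counts are not negligible relative to the background, this linear scaling would no longer hold and the background counts would need to enter explicitly.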
Many of the BAT points in Figure 1 are likely lower than plotted, because most known GRBs have . Also, the lower limit error bars in the figure for most of the points are strongly influenced by a BATSE-based prior on , which introduces a bias against the rare but not-unprecedented (e.g. from BATSE) inference that keV (see, B07).
Studies including some bright BAT GRBs (e.g., Amati et al., 2006; Schaefer, 2007) do not report the strong levels of inconsistency because the studies exclude the faint or hard GRBs for which tight constraints cannot be obtained. However, for the simple question of consistency/inconsistency, tight error bars are not required.
It is interesting to note that short-duration GRBs tend to appear in the lower left of Figure 1, populating a region where GRBs of longer duration generally cannot be detected. Only one short-duration GRB is detected at a flux level consistent with the pre-Swift level. As with long-duration GRBs, short-duration GRBs tend to be detected near threshold. Apart from whether correlates with , because of the impulsive nature and resulting low fluence over peak flux ratio for short-duration GRBs, short-duration GRBs are more likely than long-duration GRBs to be outliers to an $E_{\rm pk}$-$E_{\rm iso}$ relation. We caution, therefore, against using low values to classify (as opposed to, e.g., Amati, 2008) intermediate duration events as short-duration (i.e., potential binary merger) events.
2.1. From Swift BAT to HETE-2
One straightforward way to diagnose whether a flux limit is truly the origin of inconsistency between the BAT and pre-Swift samples is to compare the BAT data to data from satellites with very different detection sensitivities. If the corresponding surveys both extend to their respective flux limits, then powerlaw fits for each sub-sample should have appropriately different normalizations. (In pre-Swift datasets considered alone, as in the next section, the relative detection sensitivities generally do not vary widely, and a different technique must be adopted.) In this sub-section, we compare the Swift and pre-Swift samples using a well-known approach which directly treats the flux limits.
The application and refinement of statistical methods in survival analysis was pioneered in BATSE GRB studies by Petrosian (1993), Lee & Petrosian (1996), and others. By restricting comparison to “associated sets” of data – those GRBs detectable above the estimated thresholds of each other GRB in the set – it is possible to study distributions of and correlations among observables independent of the thresholding, without making assumptions on the nature of the GRBs below threshold.
In the early afterglow-era, these non-parametric methods were applied to correlations between GRB source frame quantities (Lloyd & Petrosian, 1999; Lloyd, Petrosian, & Mallozzi, 2000; Lloyd-Ronning & Ramirez-Ruiz, 2002; Kocevski & Liang, 2006). The study of Lloyd, Petrosian, & Mallozzi (2000), in particular, is important for (1) demonstrating that the correlation between isotropic equivalent $\gamma$-ray peak flux and $E_{\rm pk}$ likely arises as a result of the detection process while also (2) discovering a possibly intrinsic correlation between the isotropic equivalent energy release and $E_{\rm pk}$ (also, Amati et al., 2002).
The Kendall’s $\tau$ statistic employed in these studies reports the fraction of concordant (i.e., correlating) data minus the fraction of discordant data (i.e., anti-correlating). Use of the statistic to rule on a correlation’s strength derives from maximum likelihood principles, and the $\tau$-test is non-parametric and has maximum statistical power (Kendall, 1938). A simple, elegant, and rigorous extension of the $\tau$-test accounts for data truncation (see, Efron & Petrosian, 1999): tabulation is restricted to the concordant or discordant data pairs which are detectable above (or below in the case of truncations from above) each other’s limits. In this way, both lower and upper limits on either variable can be treated, without the need to make assumptions regarding the missing data. Assumptions, sometimes difficult to uncover and possibly also hard to justify, regarding the missing data must generally be made with other methods (e.g., Gelman et al., 2004).
A major advantage of the truncated $\tau$-test (which employs the limits directly) is that it is in general much easier to calculate a limit for an individual burst than it is to derive a general limit in valid for all . In fact, the general limit can never be calculated precisely, because the flux limit (e.g., a satellite trigger threshold) is not a function of , but of the observed photon flux in some band pass over some shorter time integration.
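The pairwise restriction described above can be sketched in a few lines. This is a simplified illustration for lower limits on one variable only, not the exact estimator or significance calculation of Efron & Petrosian (1999). The function name and the toy data are hypothetical, and tied pairs are simply skipped.

```python
def truncated_kendall_tau(x, y, y_limit):
    """Kendall-type tau restricted to mutually detectable pairs.

    y_limit[i] is the lower detection limit on y for point i. A pair
    (i, j) is counted only if y[i] >= y_limit[j] and y[j] >= y_limit[i].
    Returns (tau, number of comparable untied pairs).
    """
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] < y_limit[j] or y[j] < y_limit[i]:
                continue  # pair is not comparable under the truncation
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    n_pairs = concordant + discordant
    tau = (concordant - discordant) / n_pairs if n_pairs else 0.0
    return tau, n_pairs

# Toy data: a perfect correlation, first with no effective truncation,
# then with limits that track the data points themselves.
x = [1, 2, 3, 4]
y = [1, 2, 3, 4]
untruncated = truncated_kendall_tau(x, y, [0, 0, 0, 0])
hugging = truncated_kendall_tau(x, y, [0.5, 1.5, 2.5, 3.5])
print(untruncated, hugging)
```

In the second case every point lies just above its own limit and no pair is mutually detectable, so the apparent correlation carries no information once the truncation is respected.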
In the observer frame (Figure 2A), there is a highly significant apparent correlation between and for both HETE-2 (, ) and Swift (, ); see Figure 3. Here, is the energy fluence calculated in the source frame – keV band if is known (e.g., Amati et al., 2002) or in the observer frame – keV band if is unknown. The significance of the correlation becomes modest (, ) when we account for the Swift threshold.
That some significance remains indicates some amount of true correlation between and ; although, this residual correlation is strongly affected by the flux limit. Quantifying the slope, scatter, etc., of this possible residual correlation and its origin in local intrinsic or population-wide evolution effects will be the subject of a future study. Here, we are interested in the expected appearance of the correlation in the Swift survey relative to pre-Swift surveys. The significance of the Swift (, ) or HETE-2 (, ) correlation becomes marginal if we account for the HETE-2 detection threshold (see, Figure 2A). To apply the HETE-2 threshold to Swift data, we fit a second order polynomial to the HETE-2 threshold and evaluate that curve at the Swift points.
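Applying one instrument's threshold to another instrument's bursts, as described above, can be sketched as follows. This assumes the threshold is tabulated as log fluence versus log peak energy, which the text does not state explicitly. The function names and all numbers are made up for illustration, and a plain normal-equations solver stands in for whatever fitting routine was actually used.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 (normal equations)."""
    s = [sum(xi ** k for xi in x) for k in range(5)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 3):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef

def eval_quadratic(coef, x):
    return coef[0] + coef[1] * x + coef[2] * x * x

# Toy threshold samples: (log10 peak energy, log10 threshold fluence).
log_epk_thr = [1.0, 1.5, 2.0, 2.5, 3.0]
log_s_thr = [-7.0 + 0.5 * v + 0.1 * v * v for v in log_epk_thr]
coef = fit_quadratic(log_epk_thr, log_s_thr)

# Toy bursts from the more sensitive survey: keep those at or above the curve.
bursts = [(1.8, -6.2), (2.2, -4.9), (2.6, -5.5)]
kept = [(e, s) for e, s in bursts if s >= eval_quadratic(coef, e)]
print(kept)
```

The retained subset can then be passed to the same correlation test used for the full sample.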
In the source frame and restricting to the 63 normal, long-duration GRBs with in B07, we find () for the - relation. However, considering only associated sets of observations above Swift threshold, the correlation strength drops strongly (, ). The similarly strong decrease in correlation significance for the source and observer frame correlation can be understood by noting that the correlations involve very similar data: observer frame points above and below the HETE-2 threshold transform to source frame points with a very weak dependence on the GRB redshifts. The red dashed curve in Figure 2B — corresponding to the trajectory one event follows as its is varied — illustrates how only weakly affects where points fall on the plot (see also, Section 3). Points in the extreme Bottom-Right of Figure 2A or Top-Left of Figure 2B cannot be made consistent at any (Figure 1; see also, Nakar & Piran, 2005; Band & Preece, 2005).
Randomly sampling redshifts from those observed by Swift, we find that % of Swift events above the HETE-2 threshold are consistent with the pre-Swift relation (solid and dotted lines). Only % of Swift events below the HETE-2 threshold are consistent. Most (60%) of the events under the HETE-2 threshold are inconsistent at any . We find similar results — % consistency above HETE-2 threshold and % consistency below HETE-2 threshold — if we assume a very different (toy) distribution and draw from a unit normal distribution. The best-fit - curve from B07 is a factor higher than the pre-Swift curve (solid and dotted lines in Figure 2B).
3. The Anatomy of a Correlation
We now turn our attention to a set of correlation tests useful for GRB samples with poorly understood flux limits. As mentioned above, pre-Swift afterglow-era satellites do not have typical sensitivities that vary by more than a factor of a few (see also, Band, 2003); we find that diagnosing the reliability of a correlation plotted on a log-log scale for these data cannot effectively be done by looking at variations in the normalizations alone. Moreover, the truncation tests discussed in the preceding section cannot easily be applied because the flux limits for each GRB in most pre-Swift surveys are unpublished. We seek to understand how the regression slope and scatter of the relation fits vary with assumptions on the nature of the relations.
For simplicity, if we assume that all GRB fit parameters have the same error, the errors can be ignored. Consider observer frame quantities and , for example. These translate to source frame quantities and , for example. Here, and , (accurate to 20% for , representing the bulk of known GRB redshifts).
If we assume an origin in the observer frame, which we hope later to rule out, cross terms involving the observables and can be ignored, and the expected source-frame linear regression slope (see, e.g. Bevington & Robinson, 2003) is:
Here, denotes an average after subtracting away the mean in or , and is the slope in the observer frame. Equation 1 gives the slope of the apparent source frame relation given the observed data, whereas determination of the true source frame relation requires detailed knowledge of the GRB rate density and luminosity function to impute the missing data. Although recent progress has been made on measuring these (e.g., Kistler et al., 2008), we do not attempt here to reconstruct . Moreover, for comparison with most GRB studies which simply assume (e.g., Firmani et al., 2005), we only require Equation 1.
Relatively independent of which survey we use, , and . This is the typical expected behavior for GRB correlations where or are typically much more broadly distributed than . There are, however, example correlations (e.g., consider Willingale et al., 2007) where the observables and are arranged such that , and the scatter in relative to that in can be important for defining the chance according to Equation 1.
We define the scatter in about the best-fit regression line to be the root-mean-square (RMS) deviation. For the best-fit regression line in Equation 1, the expected scatter is
Covariance terms between and or between and — which generally act to decrease the scatter in the source frame as compared to the observer frame — are dropped because we assume the correlation is non-intrinsic. To be explicit, rearrangement of an equation like shows that the observables should vary with , and this implies cross terms in Equation 2. The cross terms go to zero if the correlation is due to (-independent) observational effects.
The first term in Equation 2 is bounded from below by ; equality is obtained for . The second term in Equation 2 implies an increase in over unless, coincidentally, . Interestingly, this “coincidence” is satisfied nearly or exactly for many of the known GRB correlations.
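The behavior of the slope and scatter under a non-intrinsic origin can be recovered from a short derivation. What follows is only a sketch under stated assumptions and may differ in form and notation from Equations 1 and 2. The symbols are hypothetical: $x_o$ and $y_o$ are mean-subtracted logarithmic observer-frame quantities, $Z$ is the mean-subtracted $\log(1+z)$, $a_x$ and $a_y$ are the power-law indices with which each source-frame quantity scales with $(1+z)$, and $Z$ is taken to be uncorrelated with $x_o$ and $y_o$.

```latex
\begin{align}
x_s &= x_o + a_x Z, \qquad y_s = y_o + a_y Z, \\
m_s &= \frac{\langle x_s y_s \rangle}{\langle x_s^2 \rangle}
     = \frac{m_o \langle x_o^2 \rangle + a_x a_y \langle Z^2 \rangle}
            {\langle x_o^2 \rangle + a_x^2 \langle Z^2 \rangle},
\qquad m_o = \frac{\langle x_o y_o \rangle}{\langle x_o^2 \rangle}, \\
\sigma_s^2 &= \langle (y_s - m_s x_s)^2 \rangle
     = \langle (y_o - m_s x_o)^2 \rangle
       + (a_y - m_s a_x)^2 \langle Z^2 \rangle .
\end{align}
```

In this form the first term of $\sigma_s^2$ is smallest, and equal to the observer-frame scatter, when $m_s = m_o$, and the second term vanishes when $a_y = m_s a_x$. Both properties agree with the statements made about the two terms above.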
As an example, consider the Ghirlanda et al. (2004) correlation: , where is the energy release corrected for beaming into a jet of angle . For the case of a uniform density medium surrounding the GRB (Sari et al., 1999), , where is the time in the observer frame where the jetting effects are expected to become apparent. Substituting into Equations 1 and 2, it follows that , , and . Hence, , as also demonstrated graphically in Figure 4A. The Ghirlanda et al. (2004) correlation is, therefore, effectively independent of the measured spectroscopic redshifts.
3.1. Example Numerical Tests
For the Ghirlanda et al. (2004) correlation data in Figure 4A, the RMS scatter is dex for the observer frame variables and dex for the variables corrected for cosmological distance and redshift. This is an insignificant decrease according to an F-test. Likewise, we find that the source frame data are correlated strongly with Kendall’s . However, data generated with randomized redshifts (e.g., red points in Figure 4A) exhibit larger in most (76%) of the simulations. Somewhat more encouraging, we find a marginal increase in significance in dex. However, the increase is apparently dominated by 1–2 outlier events in a fraction of the simulations; if we instead employ the median absolute deviation about the median of the fit residuals as a robust measure of scatter, 52% of simulations exhibit lower scatter than for the observed data.
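A randomized-redshift test of this kind can be sketched as below. Several choices here are assumptions made for illustration: redshifts are randomized by shuffling them among the bursts (one possible scheme), the source-frame transform is a toy power law in $(1+z)$ with placeholder exponents `a_x` and `a_y`, and all names and data values are hypothetical.

```python
import math
import random
import statistics

def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def mad_scatter(x, y):
    """Median absolute deviation about the median of the fit residuals."""
    slope, icpt = fit_line(x, y)
    res = [yi - (slope * xi + icpt) for xi, yi in zip(x, y)]
    med = statistics.median(res)
    return statistics.median([abs(r - med) for r in res])

def to_source_frame(x_obs, y_obs, z, a_x=1.0, a_y=2.0):
    """Toy transform of log10 observables; exponents are placeholders."""
    zz = [math.log10(1.0 + zi) for zi in z]
    return ([x + a_x * t for x, t in zip(x_obs, zz)],
            [y + a_y * t for y, t in zip(y_obs, zz)])

def shuffled_redshift_test(x_obs, y_obs, z, n_sim=1000, seed=0):
    """Fraction of redshift-shuffled samples whose source-frame MAD
    scatter is smaller than that obtained with the measured redshifts."""
    rng = random.Random(seed)
    observed = mad_scatter(*to_source_frame(x_obs, y_obs, z))
    z_sim = list(z)
    n_lower = 0
    for _ in range(n_sim):
        rng.shuffle(z_sim)
        if mad_scatter(*to_source_frame(x_obs, y_obs, z_sim)) < observed:
            n_lower += 1
    return n_lower / n_sim

# Made-up log10 observables and redshifts, for illustration only.
x_obs = [1.0, 1.4, 1.9, 2.3, 2.8, 3.1]
y_obs = [0.2, 0.9, 1.1, 1.8, 2.1, 2.9]
z_obs = [0.3, 0.8, 1.2, 2.0, 3.1, 4.5]
fraction = shuffled_redshift_test(x_obs, y_obs, z_obs, n_sim=200)
print(fraction)
```

A fraction near one half means the measured redshifts tighten the relation no better than arbitrary ones; a small fraction would favor a genuine redshift dependence.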
For comparison with the previous section and considering the 47 normal, long-duration bursts in Amati et al. (2006) for the - relation, we find a significant change in RMS scatter and slope only if we include two events at erg. Using the outlier-resistant measures of correlation ( or scatter estimated via the median absolute deviation about the median) we find that 10% of simulations yield a better correlation than the observed correlation, independent of whether the two XRFs are excluded. Hence, there is only very weak () evidence from our tests to favor an intrinsic explanation for the - correlation using the bright Amati et al. (2006) data. Exclusion of a small number of outliers is common practice (e.g., Amati et al., 2006); however, it may be reasonable to criticize us for allowing the outlier events to be flagged during and not before the simulations.
As an interesting side-note which may further suggest a paucity of intrinsic physics in Ghirlanda et al. (2004) type relations, the balancing with is characteristic of similar relations found assuming instead a wind-stratified medium (Nava et al., 2006) or even allowing to vary arbitrarily with and (Liang & Zhang, 2005). The latter relation has the smallest , which leads us to suspect that the only reason that a very large number of correlations do not abound in the literature is that we typically restrict to those that can be interpreted physically, or, short of this, we require the observables to evolve with in a physically plausible fashion (e.g., ).
Without these constraints, it is possible to invent highly-significant but absurd relations (e.g., redshifted Swift trigger number versus redshifted burst duration) which make a mockery of all correlation studies. This effect alone is likely not sufficient to generate the tightness of the Ghirlanda et al. (2004) correlation, however.
The tightness of the Ghirlanda et al. (2004) correlation, if it is not intrinsic, must stem additionally from the effects outlined in the previous sections for the functionally similar - correlation. Excluding GRB 970508 as in Ghirlanda et al. (2004), we find that the - relation has only a slight increase in scatter (0.18 dex) relative to the - relation (0.16 dex), considering the same data (Figure 4). In fact, because the inferred values are very narrowly distributed and is proportional to to a power less than unity, a - relation is potentially always tighter than a - relation, depending on how the scatter is measured.
The B07 Swift BAT and Sakamoto et al. (2005) HETE-2 GRB samples both exhibit a statistically significant correlation in the observer frame between and . What creates these correlations? The correlations follow the trigger threshold limits (Figure 2A) in both cases, and the correlation significances drop precipitously when we account for the flux limits (Section 2). Therefore, flux limits must play a strong role in shaping the observed correlations, perhaps giving them most of their statistical significance.
What does this tell us about the source frame - relation? Because the redshift dependence in the transformation from - to - is weak, the source frame correlation is likely to have the same origin as the observer frame relation (Section 3). Indeed, we observe a strong decrease in the source frame correlation significance when we account for the Swift threshold.
Finally, what can be learned about the - relation as it appears in Swift data relative to the pre-Swift - relation? The normalization appears to be strongly instrument-dependent, and this suggests the pre-Swift normalization is defined largely by pre-Swift satellite flux limits. B07 find that the Swift - relation has a lower flux normalization and more scatter relative to pre-Swift - relations. We show above that we can raise this normalization (and decrease the scatter about the relation) to a level consistent with the pre-Swift - relation by imposing a heightened flux limit corresponding to detection by a satellite of HETE-2-like sensitivity.
Strictly speaking, from these observations we can only rule out the existence of a narrow relation between and , while some physical correlation between the quantities may be present at high flux levels (see, e.g., the paucity of bright events in Figure 2B). In any case, future satellites more sensitive than Swift are expected to further shift and broaden the - relation into an inequality.
More strongly, because we can probably attribute the slope (see, Section 3), scatter, and normalization of the correlation to selection effects related to a photon flux cutoff and the functional correlation between and (Section 1; also, Massaro et al., 2007), an intrinsic explanation for the correlation in the Swift data may be unnecessary. Our results also call into question any intrinsic explanation for the correlation in pre-Swift data or in surveys only including the brightest Swift events (e.g., Sakamoto et al., 2008), because an - relation should be instrument-independent. The existence of faint Swift events (in the observer frame) mostly missing from pre-Swift surveys shows that we have a poor understanding of the observational selection effects which truncate the earlier data (see, also, Ghirlanda et al., 2008).
We could be incorrect in drawing these conclusions if: (a) our spectral fits are systematically incorrect and there are, in fact, no GRBs at flux levels below HETE-2 threshold levels, or (b) we have estimated the HETE-2 threshold incorrectly by a factor . The accuracy of the spectral fits is addressed extensively in B07, where direct consistency is established relative to observations from Konus-WIND or Suzaku of bursts also detected by Swift. We note that Bellm et al. (2008) have independently verified the statistical analysis in B07 for several bursts also observed by RHESSI. Assumptions in B07 regarding the GRB spectra (see, Section 2), useful in compensating for the narrow BAT bandpass when determining bolometric fluences, are conservative in that they err toward large values. It also seems unlikely that the HETE-2 threshold should have any bearing on the censoring of Swift GRBs, considering also that many authors have reported BATSE GRBs detected below HETE-2 fluence limits (Nakar & Piran, 2005; Band & Preece, 2005; Kaneko et al., 2006).
For (b), we have shown that the - correlation significance, after accounting for the HETE-2 threshold, does not increase if we lower our threshold estimates by a factor of two. A larger error than this on our part for many GRBs is extremely unlikely given the straightforward nature of the threshold calculation (Section 2). Moreover, such a substantial increase in HETE-2 sensitivity would make HETE-2 as sensitive as Swift, which is demonstrably incorrect given the dramatic difference in the GRB localization rate of the two missions (20 yr$^{-1}$ for HETE-2 and 90 yr$^{-1}$ for Swift). Given the mean peak photon flux to energy fluence ratio from Sakamoto et al. (2005, ph s erg keV), our HETE-2 threshold estimate is within 50% of that estimated in Band (2003).
Finally, there is strong indication that flux limits associated with spectroscopic redshift determination are important to estimate and consider when restricting to GRBs with measured redshift (e.g., as in Ghirlanda et al., 2008). In Figure 2 we mark the HETE-2 events with measured redshift; these are on average a factor of two brighter relative to threshold than the events without measured redshifts.
The quality of the HETE-2 GRB localization and possibly also the brightness of the optical transient depend on the brightness of the GRB; both effects contribute to whether the afterglow and host galaxy can be detected. Additional complicated selection effects related to determining the GRB redshift are discussed in Bloom (2003). These are expected to be less important for Swift due to arcsecond X-ray localizations.
We stress that a correlation is not de facto intrinsic simply because trigger thresholding can be ruled out as influencing the surveys. Other flux limits may dominate (and probably do dominate when the samples are restricted to GRBs with measured redshift). A crucial step in utilizing pre-Swift data to rule on the nature of correlations is to establish completeness in the surveys.
4.1. New Correlations: Handle with Care
To gauge the importance of systematic effects in this area of research, we have isolated from the literature two paths that likely have led to apparently highly significant correlations: (1) selection effects truncate the data in various ways and the “missing” data are not treated; or (2) partial correlation with a hidden variable or variables is ignored. In this paper, we have studied type (1) errors. However, both type (1) and (2) errors can be unmasked using the tools outlined above.
Type (2) errors potentially arise in studies which employ one correlation (e.g., Lag-Luminosity, Norris et al., 2000) to infer for another correlation with similar variables (e.g., -Luminosity), without controlling for partial correlation with the variable in common (see, e.g., Lloyd-Ronning & Ramirez-Ruiz, 2002; Kocevski & Liang, 2006).
For example, from BATSE satellite observations where the detection threshold is very well characterized in terms of peak luminosity, we know that the -Luminosity correlation is largely formed in the detection process (Lloyd, Petrosian, & Mallozzi, 2000). However, this potential strong bias is ignored when Yonetoku et al. (2004) consider a purely intrinsic -Luminosity to derive a significant evolution in the rest-frame properties of GRBs (e.g., the correlation between Luminosity and ). The potential instrumental origin of the -Luminosity relation is also ignored by Schaefer (2007).
We recommend testing potential new correlations (or known correlations not explicitly mentioned here) by showing that:
(1) the increase in RMS scatter from the rest frame to the observer frame is statistically significant according to an F-test, and
(2) the correlation scatter or significance, determined using outlier-resistant measures, strongly decreases when the $z$'s are randomized and the correlations are recalculated, and
(3) the observables, grouped to one side of the correlation equation, vary with $z$ as predicted.
These tests are meant to establish basic confidence in an intrinsic nature for the correlations and not merely to establish that a correlation can be used to estimate $z$ (e.g., Li, 2006; Schaefer & Collazzi, 2007).
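A minimal sketch of how the scatter comparison and the redshift randomization might be coded is given below. It is illustrative only and is not the analysis pipeline used in this paper. The function names are hypothetical, the cosmology (flat ΛCDM with H0 = 70 km/s/Mpc and Ωm = 0.3) is an assumption, Kendall's tau is computed without a tie correction, and the rest-frame quantities are taken to be E_pk = E_pk,obs (1+z) and E_iso ∝ S D_L² / (1+z) for fluence S, with constant factors dropped.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a (no tie correction), computed pair by pair."""
    n, s = len(x), 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def lum_dist_mpc(z, h0=70.0, omega_m=0.3, steps=200):
    """Luminosity distance in Mpc for an ASSUMED flat LambdaCDM cosmology."""
    c_km_s = 299792.458
    dz = z / steps
    total = 0.0
    for k in range(steps + 1):  # trapezoid rule for the integral of dz / E(z)
        weight = 0.5 if k in (0, steps) else 1.0
        zk = k * dz
        total += weight / math.sqrt(omega_m * (1 + zk) ** 3 + 1 - omega_m)
    return (1 + z) * (c_km_s / h0) * total * dz


def rest_frame(epk_obs, fluence, z):
    """Return (log10 E_iso, log10 E_pk), each up to an additive constant."""
    log_eiso = [math.log10(s * lum_dist_mpc(zi) ** 2 / (1 + zi))
                for s, zi in zip(fluence, z)]
    log_epk = [math.log10(e * (1 + zi)) for e, zi in zip(epk_obs, z)]
    return log_eiso, log_epk


def rms_scatter(x, y):
    """RMS of residuals about the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss / (n - 2))


def scatter_f_ratio(epk_obs, fluence, z):
    """Observer-frame over rest-frame residual variance (test 1).

    The ratio would be compared against an F distribution with
    (n - 2, n - 2) degrees of freedom; this is only approximate here
    because both fits use the same bursts.
    """
    obs = rms_scatter([math.log10(s) for s in fluence],
                      [math.log10(e) for e in epk_obs])
    rest = rms_scatter(*rest_frame(epk_obs, fluence, z))
    return (obs / rest) ** 2


def scramble_test(epk_obs, fluence, z, n_trials=200, seed=0):
    """Rest-frame tau for the true z and for shuffled z values (test 2)."""
    rng = random.Random(seed)
    tau_true = kendall_tau(*rest_frame(epk_obs, fluence, z))
    taus = []
    for _ in range(n_trials):
        z_shuffled = list(z)
        rng.shuffle(z_shuffled)  # break the link between burst and redshift
        taus.append(kendall_tau(*rest_frame(epk_obs, fluence, z_shuffled)))
    return tau_true, taus
```

If the tau values returned for the shuffled redshifts are comparable to the value for the true redshifts, or if the variance ratio is consistent with unity, the correlation would fail the corresponding test. Truncation by the survey flux limit is not handled in this sketch and would still need to be treated as described in Section 2.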
For these tests to be accurate, however, it is also necessary to identify selection effects acting on the data. Flux or fluence values should be present down to the established survey completeness level. Well-established methods can then be applied to compensate for data truncation (Section 2) and to control for partial correlations (e.g., Akritas & Siebert, 1996). Covariance between the measured quantities, if present, must also be treated (e.g., Lee & Petrosian, 1996; Cabrera et al., 2007; B07).
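As a rough illustration of controlling for a variable in common, the sketch below computes the classical partial Kendall's tau for complete (uncensored) data. This is a simplification: the method of Akritas & Siebert (1996) is designed for censored data, which this sketch does not handle, and the function names are hypothetical.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau-a (no tie correction), computed pair by pair."""
    n, s = len(x), 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def partial_kendall_tau(x, y, control):
    """Partial tau between x and y with the `control` variable held fixed.

    Uses the classical formula
        (t_xy - t_xc * t_yc) / sqrt((1 - t_xc**2) * (1 - t_yc**2)),
    which is undefined when x or y is perfectly rank-correlated
    with the control variable.
    """
    t_xy = kendall_tau(x, y)
    t_xc = kendall_tau(x, control)
    t_yc = kendall_tau(y, control)
    denom = math.sqrt((1 - t_xc ** 2) * (1 - t_yc ** 2))
    if denom == 0.0:
        raise ValueError("x or y is perfectly rank-correlated with control")
    return (t_xy - t_xc * t_yc) / denom


if __name__ == "__main__":
    # Toy data: x and y each follow the control variable closely, so
    # their raw correlation is driven largely by the variable in common.
    control = [1, 2, 3, 4, 5, 6, 7, 8]
    x = [2, 1, 4, 3, 6, 5, 8, 7]
    y = [1, 3, 2, 5, 4, 7, 6, 8]
    print("raw tau:    ", kendall_tau(x, y))
    print("partial tau:", partial_kendall_tau(x, y, control))
```

A raw tau that is large while the partial tau is close to zero would suggest that the apparent correlation is carried by the control variable (for example, a shared dependence on redshift or on a flux limit) rather than by a direct relation between the two quantities.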
We show above for GRBs observed with multiple satellites that four example correlations reported in the literature have features (instrument-dependent normalizations, weak $z$-dependence, etc.) indicative of strong contamination by, or even an origin in, selection effects. By contrast, there is no widely accepted, non-a-posteriori explanation for the correlations in the source frame. Also, the a-posteriori theoretical explanations (e.g., Eichler & Levinson, 2004; Schaefer, 2007) fix the correlation normalizations by requiring GRBs to have one intrinsic spectrum or to be standardizable candles (e.g., a narrow distribution; see Bloom, Frail, & Kulkarni, 2003), a hypothesis no longer well supported by the data (see Kocevski & Butler, 2008).
The common $z$-independence of GRB correlations is either very odd or damning. As we discuss, this is one characteristic of a tight, apparent source-frame correlation that arises purely from selection effects. For correlations between luminosity and some other measured quantity that does not depend on luminosity distance, $z$-independence is likely not a consequence of intrinsic physics, because the GRB cannot possibly know how the distance to the observer should vary with $z$. However, $z$-independence might be expected for correlations between luminosities (e.g., the recently reviewed - correlation, Nysewander et al., 2008).
We note that a requirement of $z$-independence trivially explains the relative slopes of the - and - relations (Section 3), something theory can apparently do as well but only with reference to complicated biases stemming from GRB beaming (Levinson & Eichler, 2005).
As an obvious point, we caution that the balancing of $(1+z)$ terms on both sides of the correlation equations (from which the approximate $z$-independence arises) implies the correlations cannot be used to infer $z$. More speculatively, because redshift balancing allows a non-intrinsic observer-frame correlation to appear in the source frame as a low-scatter correlation, this balancing may have played a role in the discovery of potential intrinsic correlations. If the correlations currently known have been selected in place of correlations that do not balance redshift, and if the fitting of GRB properties and the pruning of outliers have been conducted with these correlations in mind, then using these correlations to construct a Hubble diagram and test concordance cosmology (e.g., Ghirlanda et al., 2005; Schaefer, 2007) is circular.
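The size of this balancing can be checked numerically. The sketch below is illustrative only and rests on several assumptions not taken from the text: the standard definitions E_pk = E_pk,obs (1+z) and E_iso ∝ S D_L² / (1+z) for fluence S, a power-law index of 0.5 for the relation between E_pk and E_iso, and a flat ΛCDM cosmology with H0 = 70 km/s/Mpc and Ωm = 0.3. Under these assumptions the redshift term left over in the observer frame is D_L / (1+z)^1.5, and the code compares how strongly it varies with how strongly D_L² varies over the same redshift interval.

```python
import math


def lum_dist_mpc(z, h0=70.0, omega_m=0.3, steps=200):
    """Luminosity distance in Mpc for an ASSUMED flat LambdaCDM cosmology."""
    c_km_s = 299792.458
    dz = z / steps
    total = 0.0
    for k in range(steps + 1):  # trapezoid rule for the integral of dz / E(z)
        weight = 0.5 if k in (0, steps) else 1.0
        zk = k * dz
        total += weight / math.sqrt(omega_m * (1 + zk) ** 3 + 1 - omega_m)
    return (1 + z) * (c_km_s / h0) * total * dz


def balance_factor(z, slope=0.5):
    """Redshift term remaining in the observer frame.

    From E_pk,obs * (1+z) = A * [S * D_L**2 / (1+z)]**slope it follows that
    E_pk,obs is proportional to S**slope times the factor returned here.
    The default slope of 0.5 is an assumption made for illustration.
    """
    d_l = lum_dist_mpc(z)
    return (d_l ** 2 / (1 + z)) ** slope / (1 + z)


def dynamic_range(func, z_lo=1.0, z_hi=5.0, n=50):
    """Ratio of the largest to smallest value of func on [z_lo, z_hi]."""
    values = [func(z_lo + (z_hi - z_lo) * k / (n - 1)) for k in range(n)]
    return max(values) / min(values)


if __name__ == "__main__":
    print("range of balanced term:", dynamic_range(balance_factor))
    print("range of D_L squared:  ",
          dynamic_range(lambda z: lum_dist_mpc(z) ** 2))
```

For these assumed parameters the balanced term should change far less across the interval than D_L² does, which is the sense in which an observer-frame relation can reappear in the source frame with little sensitivity to the redshifts used.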
We stress that these redshift dependency problems, and our interpretation of them, are unique to and potentially only characteristic of high-$z$ objects like GRBs (as opposed to, e.g., SNe) where there is essentially no low-$z$ calibration, and luminosity distance must generally be calculated using $z$ and assuming a cosmological model.
Small sample sizes may also have played an important role in allowing for tight apparent correlations through over-fitting. The correlation involving Luminosity, $E_{\rm pk}$, and duration for 22 GRBs in Firmani et al. (2006), which does not appear to suffer from redshift balancing (see Figure 7 in Firmani et al., 2006), is an interesting case. When looking at larger datasets, however, there appears to be increased scatter and no statistically significant improvement in scatter relative to the $E_{\rm pk}$-Luminosity (Collazzi & Schaefer, 2008) or - (Rossi et al., 2008) correlations. Investigation of the correlations in the largest possible datasets is, therefore, critical.
The GRB community is no longer starved for data. The next critical step toward uncovering intrinsic correlations is to combine all available data by establishing sample completeness in pre-Swift surveys (e.g., Ghirlanda et al., 2008) and treating the dominant flux truncations using methods like those outlined above.
In looking for new relations relevant to the physical processes underlying GRBs, and to avoid an inherent difficulty in deciding an origin for the correlations in the source or observer frames (Section 3), it may be important to choose observables less broadly distributed than the characteristic range in $1+z$ or to abandon observables with strong and complicated truncations (i.e., fluxes or fluences) altogether. Functional correlations in $1+z$ could be minimized by choosing observables (e.g., powerlaw indices) that do not vary explicitly with $z$. Most directly, and circumventing all concerns raised above, we should establish calibration for GRBs at low $z$. It is, therefore, also crucial to understand whether GRB properties evolve with redshift.
- affiliation: Townes Fellow, Space Sciences Laboratory, University of California, Berkeley, CA, 94720-7450, USA
- affiliation: Astronomy Department, University of California, 445 Campbell Hall, Berkeley, CA 94720-3411, USA
- affiliation: GLAST/Einstein Fellow
- affiliation: Astronomy Department, University of California, 445 Campbell Hall, Berkeley, CA 94720-3411, USA
- affiliation: Astronomy Department, University of California, 445 Campbell Hall, Berkeley, CA 94720-3411, USA
- affiliation: Sloan Research Fellow
- Akritas, M. G., & Siebert, J. 1996, MNRAS, 278, 919
- Amati, L., et al. 2002, A&A, 390, 81
- Amati, L. 2006, MNRAS, 372, 233
- Amati, L. 2008, GCN #7612
- Barthelmy, S. D., et al. 2005, Space Science Reviews, 120, 143
- Band, D. L. 2003, ApJ, 588, 945
- Band, D. L. 2006, ApJ, 644, 378
- Band, D. L., & Preece, R. D. 2005, ApJ, 627, 319
- Bellm, E. C., et al. 2008, AIP Conf. Proc. GRB 2007, 1000, 154
- Bevington, P. R., & Robinson, D. K. 2003, Data Reduction and Error Analysis for the Physical Sciences (3rd ed.; McGraw-Hill: New York)
- Bloom, J. S. 2003, AJ, 125, 2865
- Bloom, J. S., Frail, D. A., & Kulkarni, S. R. 2003, ApJ, 594, 674
- Burrows, D. N., et al. 2005, Space Science Reviews, 120, 165
- Butler, N., et al. 2007, ApJ, 671, 656
- Cabrera, J. I., et al. 2007, MNRAS, 382, 342
- Collazzi, A. C., & Schaefer, B. E. 2008, arXiv:0808.2061, ApJ, in press
- Efron, B., & Petrosian, V. 1999, J. of Am. Stat. Assoc., 94, 824
- Eichler, D., & Levinson, A. 2004, ApJ, 614, L13
- Firmani, C., et al. 2004, ApJ, 611, 1033
- Firmani, C., et al. 2006, MNRAS, 370, 185
- Friedman, A. S., & Bloom, J. S. 2005, ApJ, 627, 1
- Gehrels, N., et al. 2004, ApJ, 611, 1005
- Gelman, A., et al. 2004, Bayesian Data Analysis (2nd ed.; Boca Raton:Chapman & Hall/CRC)
- Ghirlanda, G., Ghisellini, G., & Lazzati, D. 2004, ApJ, 616, 331
- Ghirlanda, G., et al. 2005, ApJ, 613, L13
- Ghirlanda, G., et al. 2008, MNRAS, 387, 319
- Graziani, C., Donaghy, T. Q., & Lamb, D. Q. 2005, Il Nuovo Cimento C, 28, 681
- Kaneko, Y., et al. 2006, ApJS, 166, 298
- Kendall, M. G. 1938, Biometrika, 30, 81
- Kistler, M. D., et al. 2008, ApJ, 673, L119
- Kocevski, D., & Liang, E. 2006, ApJ, 642, 371
- Kocevski, D., & Butler, N. 2008, ApJ, 680, 531
- Lee, T. T., & Petrosian, V. 1996, ApJ, 470, 479
- Lamb, D. Q., Donaghy, T. Q., & Graziani, C. 2005, ApJ, 620, 335
- Levinson, A., & Eichler, D. 2005, ApJ, 629, L13
- Li, L. X. 2006, MNRAS, 374, L20
- Li, L. X. 2007, MNRAS, 379, L55
- Liang, E., & Zhang, B. 2005, ApJ, 633, 611
- Lloyd, N. M., & Petrosian, V. 1999, ApJ, 511, 550
- Lloyd, N. M., Petrosian, V., & Mallozzi, R. S. 2000, ApJ, 534, 227
- Lloyd-Ronning, N. M., & Ramirez-Ruiz, E. 2002, ApJ, 576, 101
- Massaro, F., Cutini, S., Conciatore, M. L., & Tramacere, A. 2007, arXiv:0710.2226
- Nakar, E., & Piran, T. 2005, MNRAS, 360, 73
- Nava, L., et al. 2006, A&A, 450, 471
- Norris, J., et al. 2000, ApJ, 534, 248
- Nysewander, M., Fruchter, A. S., & Pe'er, A. 2008, arXiv:0808.2610
- Petrosian, V. 1993, ApJ, 402, L33
- Preece, R. D., et al. 2000, ApJS, 126, 19
- Rees, M. J., & Mészáros, P. 2005, ApJ, 628, 847
- Rossi, F., et al. 2008, MNRAS, 388, 1284
- Schaefer, B. E. 2007, ApJ, 660, 16
- Schaefer, B. E., & Collazzi, A. C. 2007, ApJ, 656, L53
- Sakamoto, T., et al. 2005, ApJ, 629, 311
- Sakamoto, T., et al. 2008, ApJ, 679, 570
- Sari, R., Piran, T., & Halpern, J. P. 1999, ApJ, 524, L43
- Vanderspek, R., et al. 2008, HETE-2 catalog, in prep.
- Willingale, R., et al. 2007, arXiv:0710.3727
- Yamazaki, R., Ioka, K., & Nakamura, T. 2004, ApJ, 606, L33
- Yonetoku, D., et al. 2004, ApJ, 609, 935