Generalized Tests for Selection Effects in GRB High-Energy Correlations
Abstract
Several correlations among parameters derived from modelling the high-energy properties of GRBs have been reported. We show that well-known examples of these have common features indicative of strong contamination by selection effects. We focus here on the impact of detector threshold truncation on the spectral peak versus isotropic equivalent energy release ($E_{\rm pk}$-$E_{\rm iso}$) relation, extended to a large sample of 218 Swift and 56 HETE-2 GRBs with and without measured redshift. The existence of faint Swift events missing from pre-Swift surveys calls into question inferences based on pre-Swift surveys, which must be subject to complicated incompleteness effects. We demonstrate a generalized method for treating data truncation in correlation analyses and apply this method to Swift and pre-Swift data. Also, we show that the $E_{\rm pk}$-$E_{\gamma}$ (“Ghirlanda”) correlation is effectively independent of the GRB redshifts, which suggests its existence has little to do with intrinsic physics. We suggest that a physically based correlation, manifest observationally, must show significantly reduced scatter in the rest frame relative to the observer frame and must not persist if the assumed redshifts are scattered. As with the $E_{\rm pk}$-$E_{\gamma}$ correlation, we find that the pre-Swift, bright GRB $E_{\rm pk}$-$E_{\rm iso}$ correlation of Amati et al. (2006) does not rigorously satisfy these conditions.
Subject headings:
gamma rays: bursts; methods: statistical; cosmology: observations
1. Introduction
Correlations are pervasive in astronomy and generally lead theory in allowing us to discover causal relationships between observed quantities. In the study of Gamma-ray Bursts (GRBs), multiple power-law relations among GRB observables have been used to uncover the intrinsic physics of GRBs themselves (e.g., Eichler & Levinson, 2004; Yamazaki et al., 2004; Rees & Mészáros, 2005; Levinson & Eichler, 2005; Lamb, Donaghy, & Graziani, 2005) and to use GRBs as probes of the distant Universe (e.g., Ghirlanda et al., 2005; Schaefer, 2007). Several authors have critically examined the limitations of such relations: understanding how well they potentially constrain physics given their form and scatter (e.g., Friedman & Bloom, 2005; Schaefer & Collazzi, 2007), uncovering possible evolution with cosmic time (e.g., Yonetoku et al., 2004; Li, 2007), realizing the commonality of outliers (Nakar & Piran, 2005; Band & Preece, 2005; Kaneko et al., 2006), and, most fundamentally, determining how the existence and form of the relations vary once spurious correlation imparted by selection effects is treated (e.g., Lloyd, Petrosian, & Mallozzi, 2000).
Recently, Butler et al. (2007, hereafter B07) report evidence from Swift satellite (Gehrels et al., 2004) observations of GRBs that several of the correlations exhibit a wide scatter and shift in normalization toward the Swift detection threshold, suggestive of an origin intimately connected to the detection limits of pre-Swift satellites. Because relations intrinsic to the physical processes underlying GRBs should not be instrument-dependent, a broader investigation into the data from pre-Swift satellites is crucial for determining whether and how the relations can be trusted to constrain the physics of GRBs or cosmology.
Here, we extend our critique to include the full B07 catalog of Swift GRBs with and without measured redshifts (218 Swift GRBs in total) and also to include HETE-2 GRBs with and without measured redshift (56 GRBs). The uniform B07 catalog is novel for deriving bolometric fluences for all Swift GRBs (between GRBs 041220 and 070509), without requiring tight error bars on the spectroscopic fit parameters. The sample is therefore flux limited by the sensitivity of the Burst Alert Telescope (BAT; Barthelmy et al., 2005). Because the X-ray Telescope (XRT; Burrows et al., 2005) localizes nearly all BAT GRB afterglows to a precision of a few arcseconds, additional flux limits associated with afterglow localization and host galaxy detection (important, for example, when considering the GRBs with measured redshift) are likely not present or are far less important than the stronger flux limits imposed in pre-Swift surveys.
A study similar to this has recently been conducted by Ghirlanda et al. (2008), comparing a smaller sample of bright Swift GRBs to the BeppoSAX sample (although only the subsamples with measured redshift). Ghirlanda et al. (2008) find that faint GRBs detected by Swift (and HETE-2; Section 2) are missing from the BeppoSAX sample, and there is no clear explanation for the missing data in terms of either the SAX trigger threshold or a (likely higher) threshold resulting from a demand that tight error bars be derived in the GRB spectral modelling. As in Ghirlanda et al. (2008), we also focus on the correlation between the isotropic equivalent energy $E_{\rm iso}$ and the peak $E_{\rm pk}$ in the $\nu F_{\nu}$ spectrum (Lloyd, Petrosian, & Mallozzi, 2000; Amati et al., 2002).
Selection effects in high-energy GRB observables are well documented (e.g., Lee & Petrosian, 1996), if rarely treated. For faint GRBs, a departure of the true distributions of flux, duration, etc., from the observed distributions is expected given the steep GRB number density versus peak photon flux relation (, ; e.g., Preece et al., 2000), which places most GRBs in a given sample near the detection limit (or near some other imposed sample cutoff at a higher flux level) with a narrow logarithmic dispersion of dex.
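The clustering near threshold can be illustrated with a short numerical sketch. It assumes a pure power-law cumulative distribution, N(>P) proportional to P^(-alpha), above a sharp cutoff; the function names and the example index `alpha = 1.5` (the Euclidean value) are illustrative assumptions and are not taken from the fits discussed here. For such a distribution the dispersion in log10(P) is 1/(alpha ln 10), independent of where the cutoff lies.

```python
import math
import random
import statistics


def log_flux_dispersion(alpha, n=100_000, seed=0):
    """Standard deviation (in dex) of log10(P/P_min) for fluxes drawn from
    N(>P) ~ P**(-alpha) above a sharp cutoff P_min (toy assumption)."""
    rng = random.Random(seed)
    # Inverse-CDF sampling: P/P_min = (1 - u)**(-1/alpha), with u in [0, 1).
    logs = [-math.log10(1.0 - rng.random()) / alpha for _ in range(n)]
    return statistics.pstdev(logs)


def expected_dispersion(alpha):
    """Analytic dispersion in dex for the same power law."""
    return 1.0 / (alpha * math.log(10.0))


if __name__ == "__main__":
    alpha = 1.5  # hypothetical index, chosen only for illustration
    print(log_flux_dispersion(alpha), expected_dispersion(alpha))
```

A steeper index gives a narrower dispersion, so a steeper number-flux relation concentrates more of the sample within a small factor of whatever cutoff defines it.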
In Section 2 of this paper we study the correlation of with the energy fluence roughly, where is the burst duration, for which it is clearly important to compensate for the expected nonintrinsic correlation of fluence with that arises essentially because GRB detectors are photon counters and not bolometers. Similarly structured correlations are then discussed in Section 3.
Two basic approaches have been attempted to compensate for GRB flux limits: (1) forward folding of a model GRB rate density, evolution, and luminosity function through a model of the detector response (e.g., Graziani, Donaghy, & Lamb, 2005; Lamb, Donaghy, & Graziani, 2005), and (2) applying nonparametric statistical methods to the data and the observed flux limit for each data point. In Section 2, we review and apply numerical methods which treat the observed data truncation directly according to (2), while we plan to approach the B07 data via path (1) in a future paper. To better understand the origin of pre-Swift correlations not strongly present in the Swift sample, we explore the significance of these correlations after rejecting GRBs from the Swift sample which would not have been detected by HETE-2 (Section 2).
Extending our critique to the Ghirlanda et al. (2008) and similar pre-Swift samples of bright GRBs with redshifts (e.g., Amati et al., 2002, 2006; Schaefer, 2007), where the data truncations are poorly known and the GRBs come from multiple instruments of varying sensitivity, we study (Section 3) how the relations transform from the observer to the source frames. We show that redshift dependence in the correlations is generally weak, which is expected for a correlation that arises due to detection selection effects but also continues to have low scatter in the source frame. We present tests to uncover whether the redshift dependence in other correlations is similarly weak, possibly providing circumstantial evidence that these correlations are nonintrinsic. This general approach, despite its limitations, can potentially find application in other population analyses where thresholds are important.
2. Flux Limits and the $E_{\rm pk}$-$E_{\rm iso}$ Correlation
In Figure 1, we reproduce the narrow ( dex) scatter in observed in the recent review by Amati et al. (2006) for data from multiple missions. Following Nakar & Piran (2005) in realizing that reaches a maximum for , we assume this redshift value for all GRBs without measured redshift to place the entire B07 sample on the plot (black points). Most (67%) best-fit Swift values are below the lower red dotted Amati et al. (2006) line. Nearly half (41%) are below the red dotted line at 90% confidence, indicating a clear preference for a lower flux normalization. We determine 90% confidence error bars directly for , given the observed spectrum, rather than attempting to propagate errors on the covariant quantities and .
If we plot the BAT threshold corresponding to the best-fit points, we can see that agreement (right side of the plot) is partly dictated by low sensitivity, while many sensitive observations are strongly inconsistent. This finding is in excellent agreement with the BATSE studies of faint GRBs (Nakar & Piran, 2005; Band & Preece, 2005). There is no clear separation in Figure 1 between the points consistent with Amati et al. (2006) and those inconsistent with it, precluding obvious (noncircular) cuts which could create the semblance of close consistency.
To determine the most accurate threshold for each burst, we utilize the observed GRB spectrum and time profile directly instead of employing a model for the threshold (e.g., Band, 2003) which would only utilize results from the spectral fitting (i.e., peak photon flux, , etc.) and possibly the burst duration (e.g., Band, 2006). After determining the energy band (taken to be 15-350 keV for Swift but allowed to vary for HETE-2; see Figure 2) and temporal region which maximizes the signal-to-noise ratio , we divide the observed fluence by . This assumes a background-dominated light curve (i.e., one where times more counts reach the detector from the diffuse sky background and potentially from non-imaged sources not of interest than from the source of interest), which is appropriate for all but the brightest handful of Swift GRBs, for which our threshold calculation takes into account the observed background counts.
Here and below, we choose a cutoff , because this is where the observed distributions appear to turn over. The turnover indicates a drop in detection efficiency, because the number of faint bursts is known from BATSE (e.g., Preece et al., 2000) to increase at low flux levels. Because HETE-2 and Swift have intelligent trigger systems which seek to find the light curve region maximizing the $S/N$ (e.g., Band, 2006), the counts that define the $S/N$ also define the fluence threshold, approximately. This threshold estimate is conservative in cases where the trigger system fails to find the optimal region.
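The threshold estimate described above can be sketched in a few lines. This is only an illustration under the stated background-dominated assumption, in which the signal-to-noise ratio scales linearly with the source fluence; the function name and the argument `snr_cut` are hypothetical, and the cutoff value is left as an input rather than fixed here.

```python
def fluence_threshold(fluence, snr, snr_cut):
    """Fluence at which a burst would just reach snr_cut.

    Assumes a background-dominated light curve, so that the signal-to-noise
    ratio is proportional to the source fluence (an assumption of this sketch).
    """
    if snr <= 0 or snr_cut <= 0:
        raise ValueError("snr and snr_cut must be positive")
    # Scale the observed fluence down by the factor by which the burst
    # exceeds the signal-to-noise cutoff.
    return fluence * snr_cut / snr


if __name__ == "__main__":
    # Illustrative numbers only (fluence in arbitrary units).
    print(fluence_threshold(1e-6, snr=50.0, snr_cut=10.0))
```

For the brightest bursts, where source counts are not negligible relative to background, the linear scaling no longer holds and the observed background counts would have to enter explicitly.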
Many of the BAT points in Figure 1 are likely lower than plotted, because most known GRBs have . Also, the lower limit error bars in the figure for most of the points are strongly influenced by a BATSE-based prior on , which introduces a bias against the rare but not unprecedented (e.g., from BATSE) inference that keV (see B07).
Studies including some bright BAT GRBs (e.g., Amati et al., 2006; Schaefer, 2007) do not report the strong levels of inconsistency because the studies exclude the faint or hard GRBs for which tight constraints cannot be obtained. However, for the simple question of consistency/inconsistency, tight error bars are not required.
It is interesting to note that short-duration GRBs tend to appear in the lower left of Figure 1, populating a region where GRBs of longer duration generally cannot be detected. Only one short-duration GRB is detected at a flux level consistent with the pre-Swift level. As for long-duration GRBs, short-duration GRBs tend to be detected near threshold. Apart from whether $E_{\rm pk}$ correlates with $E_{\rm iso}$, because of the impulsive nature and resulting low ratio of fluence to peak flux for short-duration GRBs, short-duration GRBs are more likely than long-duration GRBs to be outliers to an $E_{\rm pk}$-$E_{\rm iso}$ relation. We caution, therefore, against using low values to classify (as opposed to, e.g., Amati, 2008) intermediate-duration events as short-duration (i.e., potential binary merger) events.
2.1. From Swift BAT to HETE-2
One straightforward way to diagnose whether a flux limit is truly the origin of inconsistency between the BAT and pre-Swift samples is to compare the BAT data to data from satellites with very different detection sensitivities. If the corresponding surveys both extend to their respective flux limits, then power-law fits for each subsample should have appropriately different normalizations. (In pre-Swift datasets considered alone, as in the next section, the relative detection sensitivities generally do not vary widely, and a different technique must be adopted.) In this subsection, we compare the Swift and pre-Swift samples using a well-known approach which directly treats the flux limits.
The application and refinement of statistical methods in survival analysis was pioneered in BATSE GRB studies by Petrosian (1993), Lee & Petrosian (1996), and others. By restricting comparison to “associated sets” of data – those GRBs detectable above the estimated thresholds of each other GRB in the set – it is possible to study distributions of and correlations among observables independent of the thresholding, without making assumptions on the nature of the GRBs below threshold.
In the early afterglow era, these nonparametric methods were applied to correlations between GRB source frame quantities (Lloyd & Petrosian, 1999; Lloyd, Petrosian, & Mallozzi, 2000; Lloyd-Ronning & Ramirez-Ruiz, 2002; Kocevski & Liang, 2006). The study of Lloyd, Petrosian, & Mallozzi (2000), in particular, is important for (1) demonstrating that the correlation between isotropic equivalent $\gamma$-ray peak flux and $E_{\rm pk}$ likely arises as a result of the detection process while also (2) discovering a possibly intrinsic correlation between the isotropic equivalent energy release $E_{\rm iso}$ and $E_{\rm pk}$ (also, Amati et al., 2002).
The Kendall’s $\tau$ statistic employed in these studies reports the fraction of concordant (i.e., correlating) data pairs minus the fraction of discordant (i.e., anticorrelating) data pairs. Use of the statistic to rule on a correlation’s strength derives from maximum likelihood principles, and the test is nonparametric and has maximum statistical power (Kendall, 1938). A simple, elegant, and rigorous extension of the test accounts for data truncation (see Efron & Petrosian, 1999): tabulation is restricted to the concordant or discordant data pairs which are detectable above (or below, in the case of truncations from above) each other’s limits. In this way, both lower and upper limits on either the $x$ or $y$ variable can be treated, without the need to make assumptions regarding the missing data. Assumptions, sometimes difficult to uncover and possibly also hard to justify, regarding the missing data must generally be made with other methods (e.g., Gelman et al., 2004).
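The restriction to mutually detectable pairs can be sketched as follows. This is a simplified illustration, not the full Efron & Petrosian (1999) procedure: it handles only a lower limit on one variable, counts tied pairs in the denominator without assigning them a sign, and returns no significance estimate. The function name and the argument `y_lim` (the per-burst lower limit on `y`) are assumptions of the sketch.

```python
def truncated_kendall_tau(x, y, y_lim):
    """Kendall's tau restricted to comparable pairs.

    A pair (i, j) is comparable when each point would have been detected
    above the other's lower limit: y[i] >= y_lim[j] and y[j] >= y_lim[i].
    Returns (tau, number_of_comparable_pairs).
    """
    n = len(x)
    concordant = discordant = comparable = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Skip pairs in which either point falls below the other's limit.
            if y[i] < y_lim[j] or y[j] < y_lim[i]:
                continue
            comparable += 1
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    if comparable == 0:
        return float("nan"), 0
    return (concordant - discordant) / comparable, comparable


if __name__ == "__main__":
    # Toy data: the last point has a high limit, so it forms no comparable pairs.
    print(truncated_kendall_tau([1, 2, 3, 4], [1, 2, 3, 4], [0, 0, 0, 3.5]))
```

Because only comparable pairs enter, a correlation that exists solely because faint points are missing at one end of the sample contributes few pairs and loses weight in the statistic.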
A major advantage of the truncated test (which employs the limits directly) is that it is in general much easier to calculate a limit for an individual burst than it is to derive a general limit in valid for all . In fact, the general limit can never be calculated precisely, because the flux limit (e.g., a satellite trigger threshold) is not a function of , but of the observed photon flux in some band pass over some shorter time integration.
In the observer frame (Figure 2A), there is a highly significant apparent correlation between and for both HETE-2 (, ) and Swift (, ); see Figure 3. Here, is the energy fluence calculated in the source frame – keV band if is known (e.g., Amati et al., 2002) or in the observer frame – keV band if is unknown. The significance of the correlation becomes modest (, ) when we account for the Swift threshold.
That some significance remains indicates some amount of true correlation between and , although this residual correlation is strongly affected by the flux limit. Quantifying the slope, scatter, etc., of this possible residual correlation and its origin in local intrinsic or population-wide evolution effects will be the subject of a future study. Here, we are interested in the expected appearance of the correlation in the Swift survey relative to pre-Swift surveys. The significance of the Swift (, ) or HETE-2 (, ) correlation becomes marginal if we account for the HETE-2 detection threshold (see Figure 2A). To apply the HETE-2 threshold to Swift data, we fit a second-order polynomial to the HETE-2 threshold and evaluate that curve at the Swift points.
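Transferring a threshold from one sample to another by polynomial fit might be sketched as below. The choice of variables is an assumption of this sketch: the threshold is treated as a smooth function of a single abscissa (for instance the logarithm of the observed spectral peak), fit with a second-order polynomial, and evaluated at the abscissae of the other sample. The function name and argument names are hypothetical.

```python
import numpy as np


def threshold_at(ref_x, ref_threshold, new_x, degree=2):
    """Fit a polynomial to (ref_x, ref_threshold) and evaluate it at new_x.

    ref_x, ref_threshold: abscissa and threshold for the reference sample.
    new_x: abscissae of the sample to which the threshold is applied.
    """
    coeffs = np.polyfit(np.asarray(ref_x, dtype=float),
                        np.asarray(ref_threshold, dtype=float), degree)
    return np.polyval(coeffs, np.asarray(new_x, dtype=float))


if __name__ == "__main__":
    # Toy reference curve (exactly quadratic) evaluated at two new points.
    x_ref = np.array([0.0, 1.0, 2.0, 3.0])
    thr_ref = 1.0 + 2.0 * x_ref + 3.0 * x_ref**2
    print(threshold_at(x_ref, thr_ref, [0.5, 1.5]))
```

Evaluating the fit well outside the range covered by the reference sample is an extrapolation and should be treated with caution.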
In the source frame and restricting to the 63 normal, long-duration GRBs with measured $z$ in B07, we find () for the $E_{\rm pk}$-$E_{\rm iso}$ relation. However, considering only associated sets of observations above Swift threshold, the correlation strength drops strongly (, ). The similarly strong decrease in correlation significance for the source and observer frame correlation can be understood by noting that the correlations involve very similar data: observer frame points above and below the HETE-2 threshold transform to source frame points with a very weak dependence on the GRB redshifts. The red dashed curve in Figure 2B, corresponding to the trajectory one event follows as its $z$ is varied, illustrates how $z$ only weakly affects where points fall on the plot (see also Section 3). Points in the extreme bottom right of Figure 2A or top left of Figure 2B cannot be made consistent at any $z$ (Figure 1; see also Nakar & Piran, 2005; Band & Preece, 2005).
Randomly sampling redshifts from those observed by Swift, we find that % of Swift events above the HETE-2 threshold are consistent with the pre-Swift relation (solid and dotted lines). Only % of Swift events below the HETE-2 threshold are consistent. Most (60%) of the events under the HETE-2 threshold are inconsistent at any $z$. We find similar results (% consistency above HETE-2 threshold and % consistency below HETE-2 threshold) if we assume a very different (toy) redshift distribution and draw from a unit normal distribution. The best-fit $E_{\rm pk}$-$E_{\rm iso}$ curve from B07 is a factor higher than the pre-Swift curve (solid and dotted lines in Figure 2B).
3. The Anatomy of a Correlation
We now turn our attention to a set of correlation tests useful for GRB samples with poorly understood flux limits. As mentioned above, pre-Swift afterglow-era satellites do not have typical sensitivities that vary by more than a factor of a few (see also Band, 2003); we find that diagnosing the reliability of a correlation plotted on a log-log scale for these data cannot effectively be done by looking at variations in the normalizations alone. Moreover, the truncation tests discussed in the preceding section cannot easily be applied because the flux limits for each GRB in most pre-Swift surveys are unpublished. We seek to understand how the regression slope and scatter of the relation fits vary with assumptions on the nature of the relations.
For simplicity, if we assume that all GRB fit parameters have the same error, the errors can be ignored. Consider observer frame quantities and , for example. These translate to source frame quantities and , for example. Here, and , (accurate to 20% for , representing the bulk of known GRB redshifts).
If we assume an origin in the observer frame, which we hope later to rule out, cross terms involving the observables and can be ignored, and the expected source-frame linear regression slope (see, e.g., Bevington & Robinson, 2003) is:
(1) 
Here, denotes an average after subtracting away the mean in or , and is the slope in the observer frame. Equation 1 gives the slope of the apparent source frame relation given the observed data, whereas determination of the true source frame relation requires detailed knowledge of the GRB rate density and luminosity function to impute the missing data. Although recent progress has been made on measuring these (e.g., Kistler et al., 2008), we do not attempt here to reconstruct . Moreover, for comparison with most GRB studies which simply assume (e.g., Firmani et al., 2005), we only require Equation 1.
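The reasoning behind the apparent source-frame slope can be checked numerically under explicitly stated assumptions. In this sketch the source-frame quantities are written generically as x' = x + a z' and y' = y + b z', where z' stands for a logarithmic redshift term and the coefficients `a` and `b` are hypothetical placeholders, not the values used in this section. If z' is uncorrelated with the observables, the cross terms vanish and the least-squares slope is [cov(x, y) + a b var(z')] / [var(x) + a^2 var(z')].

```python
import numpy as np


def apparent_slope(x, y, zp, a, b):
    """Least-squares slope of y' = y + b*zp regressed on x' = x + a*zp."""
    xs, ys = x + a * zp, y + b * zp
    return np.cov(xs, ys)[0, 1] / np.var(xs, ddof=1)


def predicted_slope(x, y, zp, a, b):
    """Slope expected when zp is uncorrelated with x and y (cross terms dropped)."""
    var_z = np.var(zp, ddof=1)
    return (np.cov(x, y)[0, 1] + a * b * var_z) / (np.var(x, ddof=1) + a * a * var_z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100_000
    x = rng.normal(0.0, 0.5, n)             # toy observer-frame abscissa
    y = 0.5 * x + rng.normal(0.0, 0.1, n)   # toy observer-frame relation, slope 0.5
    zp = rng.normal(0.4, 0.2, n)            # toy redshift term, independent of x and y
    print(apparent_slope(x, y, zp, 1.0, 2.0), predicted_slope(x, y, zp, 1.0, 2.0))
```

When the redshift term is narrowly distributed compared with the observables, both expressions stay close to the observer-frame slope, which is the sense in which the transformation depends only weakly on redshift.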
Relatively independent of which survey we use, , and . This is the typical expected behavior for GRB correlations where or are typically much more broadly distributed than . There are, however, example correlations (e.g., consider Willingale et al., 2007) where the observables and are arranged such that , and the scatter in relative to that in can be important for defining the chance according to Equation 1.
We define the scatter in about the best-fit regression line to be the root-mean-square (RMS) deviation. For the best-fit regression line in Equation 1, the expected scatter is
(2) 
Covariance terms between and or between and — which generally act to decrease the scatter in the source frame as compared to the observer frame — are dropped because we assume the correlation is nonintrinsic. To be explicit, rearrangement of an equation like shows that the observables should vary with , and this implies cross terms in Equation 2. The cross terms go to zero if the correlation is due to (independent) observational effects.
The first term in Equation 2 is bounded from below by ; equality is obtained for . The second term in Equation 2 implies an increase in over unless, coincidentally, . Interestingly, this “coincidence” is satisfied nearly or exactly for many of the known GRB correlations.
As an example, consider the Ghirlanda et al. (2004) correlation: , where is the energy release corrected for beaming into a jet of angle . For the case of a uniform density medium surrounding the GRB (Sari et al., 1999), , where is the time in the observer frame where the jetting effects are expected to become apparent. Substituting into equations 1 and 2, it follows that , , and . Hence, , as also demonstrated graphically in Figure 4A. The Ghirlanda et al. (2004) correlation is, therefore, effectively independent of the measured spectroscopic redshifts.
3.1. Example Numerical Tests
For the Ghirlanda et al. (2004) correlation data in Figure 4A, the RMS scatter is dex for the observer frame variables and dex for the variables corrected for cosmological distance and redshift. This is an insignificant decrease according to an F-test. Likewise, we find that the source frame data are correlated strongly with Kendall’s . However, data generated with randomized redshifts (e.g., red points in Figure 4A) exhibit larger $\tau$ in most (76%) of the simulations. Somewhat more encouraging, we find a marginal increase in significance in dex. However, the increase is apparently dominated by 1–2 outlier events in a fraction of the simulations; if we instead employ the median absolute deviation about the median of the fit residuals as a robust measure of scatter, 52% of simulations exhibit lower scatter than for the observed data.
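A randomized-redshift test of this kind might be sketched as below. Several choices are assumptions of the sketch rather than a description of the analysis above: redshifts are scrambled by permuting them among the bursts, the regression line is an ordinary least-squares fit, and the observer-to-source transformation is supplied by the caller through the hypothetical argument `to_source_frame`. The robust scatter is the median absolute deviation about the median of the fit residuals.

```python
import numpy as np


def mad_scatter(x, y):
    """Median absolute deviation about the median of residuals from a linear fit."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return np.median(np.abs(resid - np.median(resid)))


def fraction_tighter(x_obs, y_obs, z, to_source_frame, n_sim=1000, seed=0):
    """Fraction of redshift-scrambled simulations with lower robust scatter
    than the data transformed with the true redshifts."""
    rng = np.random.default_rng(seed)
    observed = mad_scatter(*to_source_frame(x_obs, y_obs, z))
    tighter = 0
    for _ in range(n_sim):
        z_random = rng.permutation(z)  # reassign redshifts among the bursts
        if mad_scatter(*to_source_frame(x_obs, y_obs, z_random)) < observed:
            tighter += 1
    return tighter / n_sim


if __name__ == "__main__":
    # Toy transformation (hypothetical exponents): shift both log quantities by log10(1+z).
    def toy(x, y, zz):
        return x + np.log10(1 + zz), y + 3.0 * np.log10(1 + zz)

    rng = np.random.default_rng(1)
    z = rng.uniform(0.1, 5.0, 50)
    x_src = rng.normal(2.5, 0.5, 50)
    y_src = 2.0 * x_src + rng.normal(0.0, 0.05, 50)  # intrinsically tight toy relation
    x_obs, y_obs = x_src - np.log10(1 + z), y_src - 3.0 * np.log10(1 + z)
    print(fraction_tighter(x_obs, y_obs, z, toy, n_sim=200))
```

For a relation that is intrinsically tight in the source frame, the returned fraction should be small; a value near one half indicates that the true redshifts carry little information about the scatter.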
For comparison with the previous section and considering the 47 normal, long-duration bursts in Amati et al. (2006) for the $E_{\rm pk}$-$E_{\rm iso}$ relation, we find a significant change in RMS scatter and slope only if we include two events at erg. Using the outlier-resistant measures of correlation ($\tau$ or scatter estimated via the median absolute deviation about the median), we find that 10% of simulations yield a better correlation than the observed correlation, independent of whether the two XRFs are excluded. Hence, there is only very weak () evidence from our tests to favor an intrinsic explanation for the $E_{\rm pk}$-$E_{\rm iso}$ correlation using the bright Amati et al. (2006) data. Exclusion of a small number of outliers is common practice (e.g., Amati et al., 2006); however, it may be reasonable to criticize us for allowing the outlier events to be flagged during and not before the simulations.
As an interesting side note which may further suggest a paucity of intrinsic physics in Ghirlanda et al. (2004) type relations, the balancing with is characteristic of similar relations found assuming instead a wind-stratified medium (Nava et al., 2006) or even allowing to vary arbitrarily with and (Liang & Zhang, 2005). The latter relation has the smallest , which leads us to suspect that the only reason that a very large number of correlations do not abound in the literature is that we typically restrict to those that can be interpreted physically, or, short of this, we require the observables to evolve with $z$ in a physically plausible fashion (e.g., ).
Without these constraints, it is possible to invent highly significant but absurd relations (e.g., redshifted Swift trigger number versus redshifted burst duration) which make a mockery of all correlation studies. This effect alone is likely not sufficient to generate the tightness of the Ghirlanda et al. (2004) correlation, however.
The tightness of the Ghirlanda et al. (2004) correlation, if it is not intrinsic, must stem additionally from the effects outlined in the previous sections for the functionally similar $E_{\rm pk}$-$E_{\rm iso}$ correlation. Excluding GRB 970508 as in Ghirlanda et al. (2004), we find that the $E_{\rm pk}$-$E_{\rm iso}$ relation has only a slight increase in scatter (0.18 dex) relative to the $E_{\rm pk}$-$E_{\gamma}$ relation (0.16 dex), considering the same data (Figure 4). In fact, because the inferred values are very narrowly distributed and is proportional to to a power less than unity, a  relation is potentially always tighter than a  relation, depending on how the scatter is measured.
4. Discussion
The B07 Swift BAT and Sakamoto et al. (2005) HETE-2 GRB samples both exhibit a statistically significant correlation in the observer frame between and . What creates these correlations? The correlations follow the trigger threshold limits (Figure 2A) in both cases, and the correlation significances drop precipitously when we account for the flux limits (Section 2). Therefore, flux limits must play a strong role in shaping the observed correlations, perhaps giving them most of their statistical significance.
What does this tell us about the source frame $E_{\rm pk}$-$E_{\rm iso}$ relation? Because the redshift dependence in the transformation from the observer-frame relation to $E_{\rm pk}$-$E_{\rm iso}$ is weak, the source frame correlation is likely to have the same origin as the observer frame relation (Section 3). Indeed, we observe a strong decrease in the source frame correlation significance when we account for the Swift threshold.
Finally, what can be learned about the $E_{\rm pk}$-$E_{\rm iso}$ relation as it appears in Swift data relative to the pre-Swift $E_{\rm pk}$-$E_{\rm iso}$ relation? The normalization appears to be strongly instrument-dependent, and this suggests the pre-Swift normalization is defined largely by pre-Swift satellite flux limits. B07 find that the Swift $E_{\rm pk}$-$E_{\rm iso}$ relation has a lower flux normalization and more scatter relative to pre-Swift $E_{\rm pk}$-$E_{\rm iso}$ relations. We show above that we can raise this normalization (and decrease the scatter about the relation) to a level consistent with the pre-Swift $E_{\rm pk}$-$E_{\rm iso}$ relation by imposing a heightened flux limit corresponding to detection by a satellite of HETE-2-like sensitivity.
Strictly speaking, from these observations we can only rule out the existence of a narrow relation between $E_{\rm pk}$ and $E_{\rm iso}$, while some physical correlation between the quantities may be present at high flux levels (see, e.g., the paucity of bright events in Figure 2B). In any case, future satellites more sensitive than Swift are expected to further shift and broaden the $E_{\rm pk}$-$E_{\rm iso}$ relation into an inequality.
More strongly, because we can probably attribute the slope (see Section 3), scatter, and normalization of the correlation to selection effects related to a photon flux cutoff and the functional correlation between and (Section 1; also, Massaro et al., 2007), an intrinsic explanation for the correlation in the Swift data may be unnecessary. Our results also call into question any intrinsic explanation for the correlation in pre-Swift data or in surveys only including the brightest Swift events (e.g., Sakamoto et al., 2008), because an intrinsic $E_{\rm pk}$-$E_{\rm iso}$ relation should be instrument-independent. The existence of faint Swift events (in the observer frame) mostly missing from pre-Swift surveys shows that we have a poor understanding of the observational selection effects which truncate the earlier data (see also Ghirlanda et al., 2008).
We could be incorrect in drawing these conclusions if: (a) our spectral fits are systematically incorrect and there are, in fact, no GRBs at flux levels below HETE-2 threshold levels, or (b) we have estimated the HETE-2 threshold incorrectly by a factor . The accuracy of the spectral fits is addressed extensively in B07, where direct consistency is established relative to observations from Konus-WIND or Suzaku of bursts also detected by Swift. We note that Bellm et al. (2008) have independently verified the statistical analysis in B07 for several bursts also observed by RHESSI. Assumptions in B07 regarding the GRB spectra (see Section 2), useful in compensating for the narrow BAT bandpass when determining bolometric fluences, are conservative in that they err toward large values. It also seems unlikely that the HETE-2 threshold should have any bearing on the censoring of Swift GRBs, considering also that many authors have reported BATSE GRBs detected below HETE-2 fluence limits (Nakar & Piran, 2005; Band & Preece, 2005; Kaneko et al., 2006).
For (b), we have shown that the  correlation significance, after accounting for the HETE-2 threshold, does not increase if we lower our threshold estimates by a factor of two. A larger error than this on our part for many GRBs is extremely unlikely given the straightforward nature of the threshold calculation (Section 2). Moreover, such a substantial increase in HETE-2 sensitivity would make HETE-2 as sensitive as Swift, which is demonstrably incorrect given the dramatic difference in the GRB localization rate of the two missions (20 yr$^{-1}$ for HETE-2 and 90 yr$^{-1}$ for Swift). Given the mean peak photon flux to energy fluence ratio from Sakamoto et al. (2005, ph s erg keV), our HETE-2 threshold estimate is within 50% of that estimated in Band (2003).
Finally, there is strong indication that flux limits associated with spectroscopic redshift determination are important to estimate and consider when restricting to GRBs with measured redshift (e.g., as in Ghirlanda et al., 2008). In Figure 2 we mark the HETE-2 events with measured redshift; these are on average a factor of two brighter relative to threshold than the events without measured redshifts.
The quality of the HETE-2 GRB localization and possibly also the brightness of the optical transient depend on the brightness of the GRB; both effects contribute to whether the afterglow and host galaxy can be detected. Additional complicated selection effects related to determining the GRB redshift are discussed in Bloom (2003). These are expected to be less important for Swift due to arcsecond X-ray localizations.
We stress that a correlation is not de facto intrinsic simply because trigger thresholding can be ruled out as influencing the surveys. Other flux limits may dominate (and probably do dominate when the samples are restricted to GRBs with measured redshift). A crucial step in utilizing preSwift data to rule on the nature of correlations is to establish completeness in the surveys.
4.1. New Correlations: Handle with Care
To gauge the importance of systematic effects in this area of research, we have isolated from the literature two paths that likely have led to apparently highly significant correlations: (1) selection effects truncate the data in various ways and the “missing” data are not treated; or (2) partial correlation with a hidden variable or variables is ignored. In this paper, we have studied type (1) errors. However, both type (1) and (2) errors can be unmasked using the tools outlined above.
Type (2) errors potentially arise in studies which employ one correlation (e.g., Lag-Luminosity, Norris et al., 2000) to infer $z$ for another correlation with similar variables (e.g., $E_{\rm pk}$-Luminosity), without controlling for partial correlation with the variable in common (see, e.g., Lloyd-Ronning & Ramirez-Ruiz, 2002; Kocevski & Liang, 2006).
For example, from BATSE satellite observations where the detection threshold is very well characterized in terms of peak luminosity, we know that the $E_{\rm pk}$-Luminosity correlation is largely formed in the detection process (Lloyd, Petrosian, & Mallozzi, 2000). However, this potentially strong bias is ignored when Yonetoku et al. (2004) consider a purely intrinsic $E_{\rm pk}$-Luminosity relation to derive a significant evolution in the rest-frame properties of GRBs (e.g., the correlation between Luminosity and $z$). The potential instrumental origin of the $E_{\rm pk}$-Luminosity relation is also ignored by Schaefer (2007).
We recommend testing potential new correlations (or known correlations not explicitly mentioned here) by proving that:

(i) the increase in RMS scatter from the source frame to the observer frame is statistically significant according to an F-test, and

(ii) the correlation scatter or significance determined using outlier-resistant measures strongly decreases when the redshifts are randomized and the correlations are recalculated, and

(iii) the observables, grouped to one side of the correlation equation, vary with $z$ as predicted.
These tests are meant to establish basic confidence in an intrinsic nature for the correlations, not merely to establish that a correlation can be used to estimate $z$ (e.g., Li, 2006; Schaefer & Collazzi, 2007).
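The redshift-randomization test can be sketched as follows. This is an illustrative outline, not the pipeline used for the results in this paper: it assumes a flat ΛCDM cosmology with placeholder parameters, takes bolometric (k-corrected) fluences as input, and uses Kendall's tau as the outlier-resistant measure. All function and constant names are hypothetical.

```python
import math
import random
from itertools import combinations

C_KM_S = 299792.458        # speed of light [km/s]
H0, OMEGA_M = 71.0, 0.27   # assumed flat LambdaCDM parameters (illustrative)
MPC_CM = 3.0857e24         # centimetres per megaparsec


def lum_dist_cm(z, steps=200):
    """Luminosity distance [cm] via midpoint-rule integration of 1/E(z)."""
    dz = z / steps
    integral = 0.0
    for k in range(steps):
        zk = (k + 0.5) * dz
        integral += dz / math.sqrt(OMEGA_M * (1 + zk) ** 3 + 1 - OMEGA_M)
    return (1 + z) * (C_KM_S / H0) * integral * MPC_CM


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / (number of pairs)."""
    score, n_pairs = 0, 0
    for i, j in combinations(range(len(a)), 2):
        prod = (a[i] - a[j]) * (b[i] - b[j])
        score += (prod > 0) - (prod < 0)
        n_pairs += 1
    return score / n_pairs


def source_frame(epk_obs, fluence, z):
    """Observer-frame peak energy and bolometric fluence -> (E_pk, E_iso)."""
    epk = [e * (1 + zi) for e, zi in zip(epk_obs, z)]
    eiso = [4 * math.pi * lum_dist_cm(zi) ** 2 * s / (1 + zi)
            for s, zi in zip(fluence, z)]
    return epk, eiso


def scramble_test(epk_obs, fluence, z, n_trials=200, seed=0):
    """Compare tau for the true redshifts with tau for shuffled redshifts."""
    rng = random.Random(seed)
    tau_true = kendall_tau(*source_frame(epk_obs, fluence, z))
    z_shuffled = list(z)
    taus = []
    for _ in range(n_trials):
        rng.shuffle(z_shuffled)  # reassign redshifts at random among the bursts
        taus.append(kendall_tau(*source_frame(epk_obs, fluence, z_shuffled)))
    # Fraction of shuffles that correlate at least as strongly as the real data
    frac = sum(t >= tau_true for t in taus) / n_trials
    return tau_true, taus, frac
```

If the returned fraction is not small, the source-frame correlation is about as strong with randomly reassigned redshifts as with the measured ones, and the redshifts are contributing little to it.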
For these tests to be accurate, however, it is also necessary to identify selection effects acting on the data. Flux or fluence values should be present down to the established survey completeness level. Well-established methods can then be applied to compensate for data truncation (Section 2) and to control for partial correlations (e.g., Akritas & Siebert, 1996). Covariance between the measured quantities, if present, must also be treated (e.g., Lee & Petrosian, 1996; Cabrera et al., 2007, B07).
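To illustrate how a truncation limit enters a rank correlation, the sketch below restricts Kendall's tau to "comparable" pairs, that is, pairs in which each object would still have been detected at the other's threshold. It is written in the spirit of the truncated-data methods of Efron & Petrosian (1999) for the simple case of a one-sided limit, and is not necessarily the exact statistic of Section 2; the function name and arguments are hypothetical.

```python
from itertools import combinations


def truncated_kendall_tau(x, y, y_lim):
    """Kendall-type statistic over pairs that survive each other's threshold.

    y_lim[i] is the smallest y that object i could have had and still
    been detected (one-sided truncation, y[i] >= y_lim[i]).
    Returns (tau, number_of_comparable_pairs).
    """
    score = 0
    n_pairs = 0
    for i, j in combinations(range(len(x)), 2):
        # Skip pairs where one object would have been missed
        # had it been subject to the other's detection limit.
        if y[i] >= y_lim[j] and y[j] >= y_lim[i]:
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
            n_pairs += 1
    tau = score / n_pairs if n_pairs else 0.0
    return tau, n_pairs
```

When every object sits just above a threshold that rises with `x`, no pairs are comparable and the statistic carries no evidence for a correlation, however tight the raw `x`-`y` relation appears.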
5. Conclusions
We show above for GRBs observed with multiple satellites that four example correlations reported in the literature have features (instrument-dependent normalizations, weak $z$ dependence, etc.) indicative of strong contamination by, or even an origin in, selection effects. By contrast, there is no widely accepted explanation for the correlations in the source frame that is not a posteriori. Also, the a posteriori theoretical explanations (e.g., Eichler & Levinson, 2004; Schaefer, 2007) fix the correlation normalizations by requiring GRBs to have one intrinsic spectrum or to be standardizable candles (e.g., a narrow distribution; see Bloom, Frail, & Kulkarni, 2003), a hypothesis no longer well supported by the data (see Kocevski & Butler, 2008).
The common $z$ independence of GRB correlations is either very odd or damning. As we discuss, this is one characteristic of a tight, apparent source-frame correlation which arises purely due to selection effects. For correlations between luminosity and some other measured quantity that does not depend on luminosity distance, $z$ independence is likely not a consequence of intrinsic physics, because the GRB cannot possibly know how the distance to the observer should vary with $z$. However, $z$ independence might be expected for correlations between luminosities (e.g., the recently reviewed  correlation, Nysewander et al., 2008).
We note that a requirement of $z$ independence trivially explains the relative slopes of the  and  relations (Section 3), something theory can apparently do as well but only with reference to complicated biases stemming from GRB beaming (Levinson & Eichler, 2005).
As an obvious point, we caution that the balancing of $1+z$ terms on both sides of the correlation equations (from which the approximate $z$ independence arises) implies the correlations cannot be used to infer $z$. More speculatively, because redshift balancing allows a non-intrinsic observer-frame correlation to appear in the source frame as a low-scatter correlation, this balancing may have played a role in the discovery of potential intrinsic correlations. If the correlations currently known have been selected in place of correlations that do not balance redshift, and if the fitting of GRB properties and the pruning of outliers have been conducted with these correlations in mind, then using these correlations to construct a Hubble diagram and test concordance cosmology (e.g., Ghirlanda et al., 2005; Schaefer, 2007) is circular.
We stress that these redshift dependency problems, and our interpretation of them, are specific to high-$z$ objects like GRBs (as opposed to, e.g., SNe), where there is essentially no low-$z$ calibration and luminosity distance must generally be calculated using $z$ and assuming a cosmological model.
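The redshift balancing discussed above can be made explicit with a short worked example. This is a sketch under stated assumptions: $S$ denotes the bolometric fluence, $D_L(z)$ the luminosity distance, and the observer-frame power law with normalization $A$ and index $\eta$ is hypothetical, introduced only for illustration.

```latex
\begin{align}
E_{\rm pk}  &= (1+z)\,E_{\rm pk,obs}, \\
E_{\rm iso} &= \frac{4\pi D_L(z)^2\,S}{1+z}, \\
\intertext{so that an observer-frame relation $E_{\rm pk,obs} = A\,S^{\eta}$ becomes}
E_{\rm pk}  &= A\,(1+z)\left[\frac{(1+z)\,E_{\rm iso}}{4\pi D_L(z)^2}\right]^{\eta}
             = A\,g(z)\,E_{\rm iso}^{\eta},
\qquad
g(z) \equiv \frac{(1+z)^{1+\eta}}{\left[4\pi D_L(z)^2\right]^{\eta}} .
\end{align}
```

The source-frame relation keeps the observer-frame index $\eta$, and the redshift enters only through $g(z)$. Wherever $g(z)$ varies little across the sample compared with the spread in $E_{\rm iso}$, an observer-frame correlation reappears in the source frame with comparable scatter, whatever its origin.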
Small sample sizes may also have played an important role in allowing for tight apparent correlations through overfitting. The correlation involving Luminosity, $E_{\rm pk}$, and duration for 22 GRBs in Firmani et al. (2006), which does not appear to suffer from redshift balancing (see Figure 7 in Firmani et al., 2006), is an interesting case. When looking at larger datasets, however, there appears to be increased scatter and no statistically significant improvement in scatter relative to the $E_{\rm pk}$-Luminosity (Collazzi & Schaefer, 2008) or $E_{\rm pk}$-$E_{\rm iso}$ (Rossi et al., 2008) correlations. Investigation of the correlations in the largest possible datasets is, therefore, critical.
The GRB community is no longer starved for data. The next critical step toward uncovering intrinsic correlations is to combine all available data by establishing sample completeness in pre-Swift surveys (e.g., Ghirlanda et al., 2008) and treating the dominant flux truncations using methods like those outlined above.
In looking for new relations relevant to the physical processes underlying GRBs, and to avoid an inherent difficulty in deciding an origin for the correlations in the source or observer frames (Section 3), it may be important to choose observables less broadly distributed than the characteristic range in $1+z$ or to abandon observables with strong and complicated truncations (i.e., fluxes or fluences) altogether. Functional correlations in $1+z$ could be minimized by choosing observables (e.g., power-law indices) which do not vary explicitly with $z$. Most directly, and circumventing all concerns raised above, we should establish calibration for GRBs at low $z$. It is, therefore, also crucial to understand whether GRB properties evolve with redshift.
Footnotes
 affiliation: Townes Fellow, Space Sciences Laboratory, University of California, Berkeley, CA, 94720-7450, USA
 affiliation: Astronomy Department, University of California, 445 Campbell Hall, Berkeley, CA 94720-3411, USA
 affiliation: GLAST/Einstein Fellow
 affiliation: Astronomy Department, University of California, 445 Campbell Hall, Berkeley, CA 94720-3411, USA
 affiliation: Astronomy Department, University of California, 445 Campbell Hall, Berkeley, CA 94720-3411, USA
 affiliation: Sloan Research Fellow
References
 Akritas, M. G., & Siebert, J. 1996, MNRAS, 278, 919
 Amati, L., et al. 2002, A&A, 390, 81
 Amati, L. 2006, MNRAS, 372, 233
 Amati, L. 2008, GCN #7612
 Barthelmy, S. D., et al. 2005, Space Science Reviews, 120, 143
 Band, D. L. 2003, ApJ, 588, 945
 Band, D. L. 2006, ApJ, 644, 378
 Band, D. L., & Preece, R. D. 2005, ApJ, 627, 319
 Bellm, E. C., et al. 2008, AIP Conf. Proc. GRB 2007, 1000, 154
 Bevington, P. R., & Robinson, D. K. 2003, Data Reduction and Error Analysis for the Physical Sciences (3rd ed.; McGraw-Hill: New York)
 Bloom, J. S. 2003, AJ, 125, 2865
 Bloom, J. S., Frail, D. A., & Kulkarni, S. R. 2003, ApJ, 594, 674
 Burrows, D. N., et al. 2005, Space Science Reviews, 120, 165
 Butler, N., et al. 2007, ApJ, 671, 656
 Cabrera, J. I., et al. 2007, MNRAS, 382, 342
 Collazzi, A. C., & Schaefer, B. E. 2008, arXiv:0808.2061, ApJ, in press
 Efron, B., & Petrosian, V. 1999, J. of Am. Stat. Assoc., 94, 824
 Eichler, D., & Levinson, A. 2004, ApJ, 614, L13
 Firmani, C., et al. 2004, ApJ, 611, 1033
 Firmani, C., et al. 2006, MNRAS, 370, 185
 Friedman, A. S., & Bloom, J. S. 2005, ApJ, 627, 1
 Gehrels, N., et al. 2004, ApJ, 611, 1005
 Gelman, A., et al. 2004, Bayesian Data Analysis (2nd ed.; Boca Raton:Chapman & Hall/CRC)
 Ghirlanda, G., Ghisellini, G., & Lazzati, D. 2004, ApJ, 616, 331
 Ghirlanda, G., et al. 2005, ApJ, 613, L13
 Ghirlanda, G., et al. 2008, MNRAS, 387, 319
 Graziani, C., Donaghy, T. Q., & Lamb, D. Q. 2005, Il Nuovo Cimento C, 28, 681
 Kaneko, Y., et al. 2006, ApJS, 166, 298
 Kendall, M. G. 1938, Biometrika, 30, 81
 Kistler, M. D., et al. 2008, ApJ, 673, L119
 Kocevski, D., & Liang, E. 2006, ApJ, 642, 371
 Kocevski, D., & Butler, N. 2008, ApJ, 680, 531
 Lee, T. T., & Petrosian, V. 1996, ApJ, 470, 479
 Lamb, D. Q., Donaghy, T. Q., & Graziani, C. 2005, ApJ, 620, 335
 Levinson, A., & Eichler, D. 2005, ApJ, 629, L13
 Li, L. X. 2006, MNRAS, 374, L20
 Li, L. X. 2007, MNRAS, 379, L55
 Liang, E., & Zhang, B. 2005, ApJ, 633, 611
 Lloyd, N. M., & Petrosian, V. 1999, ApJ, 511, 550
 Lloyd, N. M., Petrosian, V., & Mallozzi, R. S. 2000, ApJ, 534, 227
 Lloyd-Ronning, N. M., & Ramirez-Ruiz, E. 2002, ApJ, 576, 101
 Massaro, F., Cutini, S., Conciatore, M. L., & Tramacere, A. 2007, arXiv:0710.2226
 Nakar, E., & Piran, T. 2005, MNRAS, 360, 73
 Nava, L., et al. 2006, A&A, 450, 471
 Norris, J., et al. 2000, ApJ, 534, 248
 Nysewander, M., Fruchter, A. S., & Pe'er, A. 2008, arXiv:0808.2610
 Petrosian, V. 1993, ApJ, 402, L33
 Preece, R. D., et al. 2000, ApJS, 126, 19
 Rees, M. J., & Mészáros, P. 2005, ApJ, 628, 847
 Rossi, F., et al. 2008, MNRAS, 388, 1284
 Schaefer, B. E. 2007, ApJ, 660, 16
 Schaefer, B. E., & Collazzi, A. C. 2007, ApJ, 656, L53
 Sakamoto, T., et al. 2005, ApJ, 629, 311
 Sakamoto, T., et al. 2008, ApJ, 679, 570
 Sari, R., Piran, T., & Halpern, J. P. 1999, ApJ, 524, L43
 Vanderspek, R., et al. 2008, HETE-2 catalog, in prep.
 Willingale, R., et al. 2007, arXiv:0710.3727
 Yamazaki, R., Ioka, K., & Nakamura, T. 2004, ApJ, 606, L33
 Yonetoku, D., et al. 2004, ApJ, 609, 935