# Optimal Time-Series Selection of Quasars

###### Abstract

We present a novel method for the optimal selection of quasars using time-series observations in a single photometric bandpass. Utilizing the damped random walk model of Kelly et al. (2009), we parameterize the ensemble quasar structure function in Sloan Stripe 82 as a function of observed brightness. The ensemble model fit can then be evaluated rigorously for and calibrated with individual light curves with no parameter fitting. This yields a classification in two statistics — one describing the fit confidence and one describing the probability of a false alarm — which can be tuned, a priori, to achieve high quasar detection fractions (99% completeness with default cuts), given an acceptable rate of false alarms. We establish the typical rate of false alarms due to known variable stars as % (high purity). Applying the classification, we increase the sample of potential quasars relative to those known in Stripe 82 by as much as 29%, and by nearly a factor of two in the redshift range , where selection by color is extremely inefficient. This represents 1875 new quasars in a 290 deg field. The observed rates of both quasars and stars agree well with the model predictions, with % of quasars exhibiting the expected variability profile. We discuss the utility of the method at high redshift and in the regime of noisy and sparse data. Our time-series selection complements well independent selection based on quasar colors and has strong potential for identifying high-redshift quasars for BAO and other cosmology studies in the LSST era.

^{†}

^{†}slugcomment: Accepted to AJ

## 1 Introduction

Active Galactic Nucleii (AGN) — and quasars (QSOs) in particular — continue to play a central role in modern astrophysics. AGN emission is the hallmark of supermassive black hole growth. Powerful AGN outflows of photons and matter affect the Universe on small (accretion disk) and large (galactic, extragalactic) size scales, and there is now substantial evidence both from observations and simulations (both semi-analytical and hydrodynamical) that a “quasar mode” is a key part of massive galaxy formation and evolution. On an ensemble basis, the observed number densities (e.g., Croom et al., 2009), large scale-structure (e.g., Shen et al., 2007, 2009; Ross et al., 2009), and use as a cosmological probes via the Lyman- forest and Baryon Acoustic Oscillations (BAO; e.g., McDonald & Eisenstein, 2007), demonstrate that luminous AGN — despite their rare nature — provide elucidating constraints on some of the grandest questions of our time.

To fully exploit the potential of active galaxies, astronomers must identify and characterize the physics of large, representative samples. Recent surveys have identified quasars using spectroscopy (e.g., Schneider et al., 2010), color-selection (e.g., Richards et al., 2009), X-ray detections (e.g., Bauer et al., 2004), and optical variability (e.g., Schmidt et al., 2010), among other methods. The quasar catalogs generated by these surveys exhibit a range of different completeness and efficiency characteristics. This is partly due to differences in telescope sensitivities for the different surveys, but also to intrinsic differences in the physics that each survey samples. Deep X-ray surveys and optical variability surveys currently show the most promise for identifying large numbers of AGN per square degree of sky (Brandt & Hasinger, 2005).

Optical surveys most often utilize quasar photometric colors to separate quasars from field stars. Typical “completeness” fractions — the fraction of retained spectroscopically-confirmed quasars — for the selection of quasars based on color are % (e.g., Richards et al., 2001), although this can decrease dramatically, to %, in the redshift range () where the quasar UV color excess (- color) is similar to that of stars (Richards et al., 2006) or for highly extinguished quasars. The “purity” of selection, or the fraction of false alarms due to spectroscopically-confirmed stars, can be similarly low. Optical variability provides an independent selection criteria and is becoming an increasingly important survey technique. Quasar selection based on optical variability is a key component of current and upcoming missions, such as the Panoramic Survey Telescope & Rapid Response System (PanSTARRS; Kaiser et al., 2002) and the Large Synoptic Survey Telescope (LSST; Ivezić et al., 2008; Abell et al., 2009). Indeed, with the recent endorsement by the Astro2010 Decadal Survey of both ground and space-based wide-field synoptic surveys, exploring discovery in the time domain is particularly timely.

Quasar fluxes observed in optical passbands (e.g., Matthews & Sandage, 1963) meander in
time, non-periodically, with flux differences that tend to be larger on larger timescales
(e.g., Hook et al., 1994).
This has historically been characterized in studies of auto-correlation using the
so-called “structure
function” (e.g., Simonetti, Cordes & Heeschen, 1985; Hughes, Aller & Aller, 1992), evaluating the average square
variations versus timescale for an ensemble of quasars. It is generally assumed
that the combination of observations from individual sources,
each perhaps observed only a pair or a few times, will accurately describe the
intrinsic variability of a given quasar.
Data from the Sloan Digital Sky
Survey (SDSS; e.g., Abazajian et al., 2009),
and high-time-cadence observations for Stripe 82^{1}^{1}1The equatorial Stripe 82 region
(20h 24m R.A. 04h 08m, 1.27 deg
Dec 1.27 deg, 290 deg)
was repeatedly observed — 58 imaging runs from 1998
September to 2004 December — with 1–2
observations per week, each Fall.
in particular, have allowed major advances
in quantifying and understanding the nature of this variability
(e.g., Vanden Berk, 2004; Ivezić et al., 2004), particularly for individual objects (e.g., MacLeod et al., 2010; Sesar et al., 2010).
We explore in detail below
the connection between the ensemble variability and the variability of individual,
well-sampled quasars.

The correlated variability of quasars is unique compared to most other variable objects (e.g., stars) which tend on long-timescales (e.g., long relative to a periodicity timescale) to exhibit non-correlated variability. Quasars tend to vary much less on monthly and shorter timescales as compared to yearly timescales, unlike most stars (excluding, e.g., long-period variables). This distinction motivates the possibility of using quasar time series modelling to classify quasar and differentiate them from other objects. Such a classification, particularly if it can be done efficiently, with few data, could have tremendous benefit for selecting quasars for spectroscopic followup and use in cosmological studies (e.g., of BAO; McDonald & Eisenstein, 2007). Progress towards these ends is now becoming possible thanks to the generative “damped random walk” quasar light curve model — a model capable of stochastically producing a quasar like light curve, in this case with only 2 input parameters — uncovered by Kelly et al. (2009). Kozlowski & Kochanek (2010) have shown that this model accurately describes the light curves of 100 well-sampled quasars, and it can be used to separate these from stars. MacLeod et al. (2010) have shown that the model accurately describes individual quasars in Stripe 82. We show that it can be used to accurately describe the ensemble variability as well as the individual variability.

Below, we discuss a novel method to fit the structure function for Stripe 82 using the damped random walk model (Section 2). We then show (Section 3) how the fit of this average quasar model can be rigorously evaluated for individual light curves to separate quasars from stars with no parameter fitting. We show that nearly all known quasars (%; Section 4) in Stripe 82 show the telltale variability signature, with a very high completeness (%) that can be estimated a-priori. This offers a substantial improvement over ad-hoc methods (e.g., 90% completeness in Schmidt et al., 2010) for applying structure function fits to individual light curves. The fraction of stars which could be confused as quasars is, likewise, very small (%) and is in reasonable agreement with a-priori estimates.

The method we outline below is based on maximum likelihood principles; it is therefore theoretically optimal and applicable to any field or survey. The quasar probabilities returned by the method robustly take into account model uncertainties and uncertainties (both statistical and systematic) in the data. The method can be applied in the limit of very few data points (2 or more), because there are no free parameters to fit and can provide key additional leverage to aid color-based selection schemes for future surveys. The methodology is easily adaptable to include observations in multiple photometric filters and to avoid contamination from spurious data.

## 2 Data Selection and Ensemble Quasar Variability

The majority of work presented herein makes use of the photometry from the Sesar et al. (2007) variable source catalog. That catalog contains 67,507 objects in Sloan Stripe 82. These are selected based on the presence of statistically significant temporal variability ( mag RMS and in and ). Additional details regarding the survey selection criteria can be found in Sesar et al. (2007). The variability selection retains % of known (spectroscopic) quasars in the field. Those spectroscopically identified quasars were almost all entirely targeted as a result of color selection from the main SDSS survey (and not because of time-domain characteristics).

Our goal is to differentiate quasars from other field sources using variability metrics alone, without regard to color or cross-correlation with surveys at other wavebands. We begin by evaluating the ensemble variability of 6304 spectroscopically-confirmed quasars in Sesar et al. (2007) to quantify the observed range of magnitude change, , as a function of timescale between measurements at times and .

Figure 1 shows a histogram of all magnitude differences in -band for measurements separated on timescales (in days) between . The histogram is apparently symmetric, peaked at zero, and has broad, exponential wings. The exponential distribution here is commonly found (e.g., Ivezić et al., 2004). We find that the histogram can be well modelled as a sum of zero-mean Gaussians whose widths scale logarithmically with the quasar magnitude. Residual non-Gaussianity (i.e., an excess of large fluctuations as compared to the predictions from a Gaussian) is due to a small fraction (2%) of sources with excess variability (see also, Section 5.2). Figure 2 plots the Gaussian width — after removing the measurement uncertainty — as a function of magnitude and . This is commonly known as the first order quasar structure function (e.g., Simonetti, Cordes & Heeschen, 1985; Hughes, Aller & Aller, 1992).

To move from the ensemble variability to the variability of a given object, it is necessary to treat the covariances between neighboring points — which are estimated from the same data — in the structure function. Given the approximate Gaussianity of the ensemble, it is natural to fit a Gaussian random process to individual objects. Consistent with previous works (Kelly et al., 2009; Kozlowski & Kochanek, 2010; MacLeod et al., 2010), we find that the quasar variability as a function of time difference is well-modelled using a covariance matrix of the form:

(1) |

where is the measurement uncertainty for the i’th observation, is an exponential damping
timescale (units of days), is the intrinsic variance between
observations on short timescales day,
and is the Kronecker delta function (1 for , 0 otherwise). This model
predicts , which rises on
short timescales (). This
model is plotted over the data in Figure 2. To
obtain an acceptable fit, we allow and
to vary logarithmically with the median quasar magnitude.
The best fit scalings for the bands are reported in Table 1.
Python software to calculate the fits and the quality statistics discussed below can be downloaded from the
project webpage^{2}^{2}2http://astro.berkeley.edu/nat/qso_selection.

Filter | ||||
---|---|---|---|---|

3.90 | 0.12 | 2.73 | 0.02 | |

4.10 | 0.14 | 2.92 | 0.07 | |

4.34 | 0.20 | 3.12 | 0.15 | |

4.23 | 0.05 | 2.83 | 0.07 | |

4.44 | 0.13 | 3.06 | 0.07 |

Notes: , with , . Statistcal uncertainties in the above parameters are of order and are negligble. Magnitudes are in the AB system, uncorrected for Galactic extinction.

We can also fit the model directly by maximizing the posterior probability of the model given the data (see, also, Kozlowski & Kochanek, 2010). The best-fit and values for each quasar are plotted in Figure 3A for band. Note that consistent scalings with the band magnitude (Table 1) are found.

The origin of the scaling of variability with source brightness appears to stem from the well-known anti-correlation between intrinsic brightness and variability (e.g., Ivezić et al., 2004). Figure 3B displays this trend for the absolute magnitude in -band, . We note that no longer correlates with -band magnitude or redshift after subtracting away this trend. High-redshift quasars tend to appear toward the left of the plot (Figure 3B) — weak short-timescale variability — due to the flux limit. The effects of a declining luminosity function (and potentially source evolution) also work to keep low-redshift objects toward the right of the plot. The normalization of the fit in Figure 3B can be seen to vary slightly with redshift also as a result of this flux limit, and we can expect the apparent magnitude scalings (Table 1) to also have some dependence on redshift and the survey selection (see, Section 5.3). These shifts are small compared to the (apparently intrinsic) 1 dex scatter in the scalings (e.g., Figure 3B).

## 3 Quasar Variability Selection Formalism

We can regard the parameterization of the using the damped random walk model as a rigorous mathematical approach to evaluating the quasar likelihood for a given light curve, which takes into account all correlations in the data. Employing the mathematical prescription adapted from Rybicki & Press (1994) by Kozlowski & Kochanek (2010), we write for the probability of the data given the quasar variance model :

(2) |

We can marginalize analytically over and replace the exponent in Equation 2 with

(3) |

where . The inverse of is tridiagonal (Rybicki & Press, 1994), which allows for rapid computation. Given the parameterization above (Table 1) for and in terms of apparent magnitude mag, we can then directly evaluate for all objects of interest with no additional fitting (i.e., no fitting beyond the fitting of ).

For a quasar with the mean ensemble variability, should be distributed with degrees of freedom, where equals the number of data points minus one. The most likely value is . The expected distribution of for a temporally-uncorrelated source can be evaluated by Monte Carlo or estimated quickly as we now discuss.

### 3.1 Significance Estimates

The expected value for for a source that is not a quasar but varies in a time-independent fashion with Gaussian scatter is

(4) |

where is the expectation value of and is the matrix trace operation. In Equation 4, we assume that all observations have approximately the same uncertainty, and we ignore the light curve mean. Because off-diagonal terms in do not contribute, in Equation 4 is effectively a sum of the squares of Gaussian random variables — each with zero mean and standard deviation , which includes the measurement uncertainty — multiplied by a constant, . Therefore, the quantity will be distributed, where the number of degrees of freedom . (The missing degree of freedom represents the light curve mean which we have neglected to write down).

The true value of must be determined from the data. In the case again of equal uncertainty on all data points,

(5) |

where . Multiplying the probability density for by and integrating over , we find a Beta distribution for the Null-hypothesis distribution of given the data:

(6) |

with . This reference distribution approximates well the observed frequencies for stars (Figure 6) and can be used as the reference distribution to calculate the significance of a given value. It is useful also to define

(7) |

The quantity will be small and of order for a time-independent, non-quasar variable source (i.e., a potential false alarm).

### 3.2 Confidence Estimates

As we discuss above, the calculated for a hypothetical mean quasar — varying according to Equation 1 with the best-fit parameters in Table 1 defining and — will be of order and will follow a distribution. However, the variability of a given quasar may depart from the mean sample variability. We can allow for this on a source-by-source basis by replacing in Equation 2. The scale factor is a fudge factor that can be marginalized over to allow the mean quasar model to acceptably fit each quasar light curve.

The distribution of given the data for a given quasar is given by Equation 5, replacing with and with the most likely value for of . In an argument parallel to that above used to derive significance, the probability density describing can be convolved with to find a reference Beta distribution to evaluate quasar confidence:

(8) |

with . Likewise for for stars, this reference distribution approximates well the observed frequencies for quasars (Figure 6).

Note that because the reference distribution for evaluating significance has the same form as that used to evaluate confidence, a curve of constant represents a curve of equal odds in favor of the hypothesis quasar versus not-quasar, given equal prior information.

## 4 Quasar Variability Selection Application

In addition to 6573 quasars from Schneider et al. (2010), 3020 labelled stars (i.e., spectroscopically-confirmed) in the DR7 are plotted in Figure 3A. We note that the total number of objects now known to be quasars and used in these plots (and those below) is larger than the quasar sample of 6307 from Sesar et al. (2007) used above to establish . Known stars tend to have days (i.e., approximately temporally-uncorrelated variability) and also increased variability on short timescales relative to quasars (see, also, MacLeod et al., 2010). In principle, Figure 3A can be used for classification of unlabeled sources as well (e.g., Kozlowski & Kochanek, 2010); however, we and others find the uncertainties in these parameters to be large (often a factor of ten even for well-observed sources). Also, there is significant overlap between the populations of labelled sources. A more optimal separation could be developed by seeking to fit no free parameters.

Figure 4 shows and for all 17,588 sources in Sesar et al. (2007) with 50 or more observations in -band. Roughly 70% of all sources in Stripe 82 — excluding 290 R.A. 340 degrees where the Galactic stellar contribution dominates the source counts and blending between sources becomes common — have 50 or more observations. There is a clear separation between the majority of stars and quasars, and the values of and take on their expected values for labelled sources. For the population cut shown in Figure 4, the completeness fraction for the fraction of known quasars retained is 99%. One minus the fraction of known stars retained, the purity, is 97%.

Of all sources, we predict that 40% are quasars, potentially increasing the overall known quasar sample size by 20%. The candidate quasars have a distribution in -band magnitude consistent with that of the known quasars. We discuss the newly discovered quasars in more detail below in Section 5.

We also show explicitly in the plots the location of low redshift () and high redshift () quasars, 97.8% and 94.3% of which, respectively, survive the nominal quasar selection. These tend to have systematically high and low values, respectively, due to the trend toward low variability at high luminosity discussed above.

The selection can be optimized to pursue quasars by placing a more stringent cut on (low quasar-like variability) to reduce the number of selected low-redshift quasars and also relaxing the cut on overall variability to allow in more false alarms, . The resulting completeness (purity) for selecting quasars is 97.6% (95.7%), considering only contamination by stars (95% of quasars are retained, but most could be rejected using color information; Figure 8). Two of the three high-redshift quasars that do not survive the refined cut can be retained using the outlier rejection scheme outlined below. (These quasars both have a single deviant photometric point.)

The classification defined by the cut in Figure 4 essentially assumes equal prior probability that a source is or is not a quasar (Section 4). We can see (Figure 5) that the classification remains quite stable over a broad range of source densities which include the high stellar-density region of 290 R.A. 340 (degrees) and also now include sources with few measurement epochs (the minimum in the Sesar et al. (2007) sample is 9). Because we are now classifying sources with few observation epochs, we select based on the significance level of a given rather than on simply . The completeness (purity) we derive for Figure 5 is 99.1% (97.1%), relatively little changed. However, we caution that this neglects the fact that few sources are spectroscopically identified at high source densities deg (R.A. 300 degrees). It is clear that the dramatic rise in the overall source counts — which must be dominated by stars — would lead to marked completeness and purity decreases. It is, therefore, advisable in such high source density regions to apply a more strict cut on . In principle, prior information on the expected source density in a given direction for a given survey should be utilized to dial in an appropriate threshold value.

### 4.1 Significance/Confidence Validation: Tail Populations

Figure 6 shows the observed distributions of and . Over-plotted are the expected Beta distributions (Section 4) for the median number of degrees of freedom . These describe the observed frequencies well, apart from a tail of 10% of variable stars which have on average and a tail of 10% of highly-variable quasars which have on average. These tail contributions can be used to obtain a precise significance estimate and rates which agree well with predictions, although this improvement has little impact on the logarithmic quasar/star separation in Figure 4. We note that 50% of the objects are at low redshift (), where our approximation that variability scales as apparent (instead of absolute) magnitude is expected to break down.

Taking into account the excess number of events in the tails of the observed and distributions (Figures 6), a cut on either parameter at a value of 2.5 (corresponding to without the tail estimates) would need to be increased to to account for the tails at the same significance level. This corresponds to a cut on the initial distributions, ignoring the tails. Therefore, a conservative prescription to either reject false positives or to reject quasars, respectively, is to cut on or , respectively, at the significance level. At this significance level, there are few outliers: 26 stars (% of the sample) masquerading with as quasars and 15 (% of the sample) of overly-variable quasars, after excluding quasars at . We discuss the nature of these strong outliers in Section 5.2 below.

## 5 Discussion

The light curve fitting above has the potential to provide vital information for quasar selection, particularly for those objects which are challenging to select based on their photometric colors. Figure 7 shows the location of all Sesar et al. (2007) objects in the , color plane. Low redshift quasars tend to lie in a tight locus to the left of the plot. Stars run along a branch upward and to the left. High redshift (and also potentially highly extinguished) quasars fan out to the right of the quasar locus, through the stellar region. Color-based selection is clearly challenging for these objects (see also, Richards et al., 2006). We show in Figure 8 that our method is capable of identifying a substantial number — 1875 in this case — of highly statistically significant quasar candidates. Very few of these (%) lie in the color-color space typically dominated by stars (Region ‘V’ in Figure 7), and we suspect some of these 26 candidates are extinguished quasars.

The selection for Figure 8 synthesizes recommendations from above. For the high-Galactic latitude portion of Stripe 82 (excluding 290 R.A. 340 degrees), we apply a cut on to eliminate the tail of variable stars (Section 5.2; Figure 6, bottom). A higher significance cut () is utilized for the low-Galactic latitude region to account for the higher stellar density (Figure 5). Finally, we ignore sources brighter than (corresponding to 3% of the known quasar sample but likely few of the candidate quasar sample). This cut also helps to eliminate bright, red sources (i.e., primarily late-type stars at , in Figure 7), which are almost certainly not quasars.

Color Region | # Known | New (% Known) | New | New |
---|---|---|---|---|

A () | 6086 | 1498, 297 (25%, 5%) | 93, 44 (6%, 3%) | 1405, 253 (30%, 6%) |

B () | 325 | 288, 230 (89%, 71%) | 43, 41 (58%, 55%) | 245, 189 (98%, 75%) |

C () | 140 | 65, 48 (46%, 33%) | 13, 11 (43%,37%) | 52, 37 (47%, 34%) |

Notes: The color regions A, B, and C are defined in the text and in Figure 7. Columns 3–5 report the number of new quasars relative to spectroscopically-confirmed quasars, followed after a comma by the number (italicized) of new quasars not already color-selected in Richards et al. (2009) or spectroscopically-confirmed.

Quasars redshifts correlate strongly with their colors; three regions (A, B, and C) which contain % of quasars in the redshift ranges , , and , respectively, are plotted in Figure 8. These regions are defined so as to maintain the total number of quasars per redshift bin associated with a given color-color bin. Table 2 displays the number of known, spectroscopic quasars in each color bin as well as the number of new candidate quasars discovered here. In the last two columns of the table, we make this comparison separately for the bright ( mag) and faint ( mag) samples. In parentheses in these columns, we quote the implied fractional increase in known quasars.

We potentially increase the overall known quasar sample by 29% (18756573), with a substantial contribution (%) in the color-color region (B) where color selection is most difficult. The number of known quasars would increase by 36–46%, depending upon whether we include 15 candidate quasars located in the stellar region V in Figure 7. These gains are primarily due the selection here of the faint ( mag) quasars, although the fractional increases for quasars are relatively independent of source brightness. It is important to note when making these comparisons that these gains are relative to spectroscopically-confirmed quasars, whereas a large number of our candidates (1280) are also (un-observed) candidates based on color (Richards et al., 2006). The fraction of new quasars relative to color-selected quasars are quoted as the second percentage in the final two columns of Table 2. These numbers suggest that color-based selection is complete at the % level below and rather incomplete at higher redshift.

As most of the new quasar candidates come from the faint end of the detected source population, where the color selections are more incomplete (owing in part to the need for high signal-to-noise imaging in multiple filters), it is tempting to restrict to the brighter sources in making a direct comparison of time-domain and color selection completeness and purity; Table 2 provides this. However, viewing the entirety of Stripe 82 as a fixed volume of data, the fact that high-confidence quasars can be obtained fainter than the color-selection limit might be considered legitimate advantage of our technique.

Color Region | # Known | New (% Known) | New | New |
---|---|---|---|---|

I (white dwarfs) | 5 | 2, 1 (40%, 20%) | 0, 0 | 2, 1 (no previous) |

II (low- QSOs) | 5866 | 1387, 239 (24%, 4%) | 79, 34 (6%, 2%) | 1308, 205 (29%, 5%) |

III (dm/WD) | 134 | 77, 47 (57%, 35%) | 25, 19 (50%, 38%) | 52, 28 (62%, 33%) |

IV (RR Lyrae) | 158 | 98, 62 (62%, 39%) | 6, 5 (25%, 21%) | 92, 57 (69%, 43%) |

V (stars) | 252 | 257, 210 (102%, 83%) | 49, 48 (80%, 79%) | 208, 162 (109%, 85%) |

VI (high- QSOs) | 158 | 54, 36 (34%, 23%) | 6, 4 (16%, 11%) | 48, 32 (40%, 27%) |

Notes: The color regions I, II, III, IV, V, and VI correspond to the typical location of objects in parentheses in the first column (see Figure 7). Columns 3–5 report the number of new quasars relative to spectroscopically-confirmed quasars, followed after a comma by the number (italicized) of new quasars not already color-selected in Richards et al. (2009) or spectroscopically-confirmed.

Table 3 gives the number of known and candidate sources corresponding to the 6 color-color regions discussed in Sesar et al. (2007). These regions define the typical color-color locations of various astrophysical transients (see, Figure 7). The relative frequencies of the candidate quasars falling within a given region are roughly consistent with those of the known quasars. We note that there is a marked, relative increase in the number of candidates possibly associated with stellar locus stars (region V) and RR Lyrae stars (region IV). RR Lyrae stars tend to be strongly rejected as quasar candidates (Figure 4). In future work we will further explore what fraction of candidate quasars in the stellar locus are stars or highly extinguished quasars.

### 5.1 Weakly Variable or Non-Variable Quasars?

Sesar et al. (2007) determine that 93% of spectroscopically-confirmed quasars brighter than in Stripe 82 are variable at the mag level. They report a conservative fraction (%), which is limited by the measurement uncertainty. Fitting a detailed model for the expected variability, we can make a more general statement regarding the fraction of quasars that vary. As reported above, % of quasars with 50 or more data points yield a highly significant . The quasar population dwindles strongly below (a cut on variability), unlike the stellar population. The majority of stars have which can occur either because the variability is not qso-like or because the variability simply is not statistically significant beyond the measurement error.

The fact that nearly all quasars brighter than have (99.1% of quasars including those with as few as 9 measurement epochs; Section 4) indicates an intrinsic variability at least 60% larger than the typical measurement error of 0.02 mag. Quasar with weaker variability can be measured, and they are missing from Figure 4. The variability cut in Sesar et al. (2007) is agnostic as to whether a source is truly a quasar; therefore the presence of weakly variable stars where no weakly variable quasars are found (in Figure 4) indicates that approximately all quasars are intrinsically variable.

Using the spectroscopic sample from Schneider et al. (2010), in addition to the 6537 quasars considered above which have
light curves as part of the Sesar et al. (2007) project, there are 505 which are not contained in the Sesar et al. (2007) sample.
We downloaded^{3}^{3}3Using an SQL query on the Stripe 82 database hosted by SDSS at http://cas.sdss.org/stripe82/en/tools/search/x_sql.asp
the light curves for these quasars.
The extra fraction of potential quasars reflects uncertainty
in the overall sample size (due, e.g., to cuts on deblending flags, etc.).
We find that 35 of 285 sources
with more than 8 measurement epochs yield . Only 9 of these
have sufficiently weak variability to not be rejected as stable by a classical
test at the level.
This confirms that the
fraction of non-variable quasars is very small (%) and
not impacted strongly by the Sesar et al. (2007) variability selection.

### 5.2 Outliers

We have visually inspected the light curves and spectra for the strong outliers (26 stars and 15 quasars) identified above. Two-thirds of the quasar outliers appear to be the result of a mild amount of errant photometry: we note the when calculated in band for these objects and also in -band provided we remove 1–3 outlying (by ) flux measurements. This outlier identification for individual points in a quasar light curve is determined using the prediction for data point given all the other data points (see, e.g., Kozlowski & Kochanek, 2010, or our software). Of the remaining five sources, two have poor photometry apparently as the result of deblending from a nearby bright source and two have only marginally large after accounting for outliers. We retain one objects (SDSS J001130.40005751.7), a ROSAT source which appears to truly exhibit excessive, particularly short-timescale, variability (, days, in this case) as compared to the remaining sample of .

Vanden Berk (2004) find evidence for a significant increase in short-timescale variability for ROSAT-associated quasars. Consistently, we find a mild increase in on average for 94 ROSAT-associated quasars in Schneider et al. (2007). Of these, 80% have , and the median is . All but a few of these are at , which suggests the poor quasar fit quality is due to the standard luminosity dependent variability (Figure 3B) and not an anomalously high intrinsic variability. MacLeod et al. (2010) have also looked at this issue and find no statistically significant difference in variability for Stripe 82 quasars in the context of the damped random walk model.

An effective scheme for rejecting outliers from an individual quasar light curve, which retains the separation between quasars and stars in Figure 4, can be implemented as follows. First and foremost, apply no outlier rejection in calculating . Next, choose a maximum allowed number of outliers (say 3) and a significance level for the residual (say ). Calculate and the model prediction for the light curve to evaluate outliers iteratively, each time removing only the strongest outlier. Once this is complete, evaluate the standard deviation of the residuals and repeat the process with a significance cutoff that is the larger of 5 or 5 times the standard deviation. This two-step process makes it so that non-quasars receive little outlier rejection.

We note that the calculation of with this outlier rejection scheme substantially dampens by a factor the tail of 10% of mild outlying quasars, leaving primarily only the low- quasars/AGN. The peak in also shifts down by about % for all quasars. We note that we do not utilize this outlier rejection in the paper because the tails are adequately small in our view. Some form of outlier rejection may be quite important when using less pristine photometry.

The outlier rejection method summarized above is tailored for quasars and has no effect on the tail of 26 strongly outlying stars in or the 10% tail of modestly outlying stars. Of these objects, only 4 are late-type stars (which could, in principle be rejected based on color as discussed above). Seven of the objects have quasar-like colors (Figure 7), have targeting flags in SDSS associated with quasars, and have low-confidence (% at best) redshift determinations. We regard these as potentially mis-labeled stars. Fifteen high-confidence stars remain, and nine (about 0.3% of the full sample) exhibit quasar-like () variability. We note that one of these objects was targeted as a white dwarf (SDSS J234601.89004255.5), and in a future study we will probe in more detail the nature of the other outliers and the potential that the underlying physics (e.g., the presence of an accretion disk) may be similar to that of quasars.

The data from multiple filters can also be combined to obtain more robust values for and . The average of these for and bands leads to a small, but significant, improvement in the separation in Figure 4. (The addition of other bands helps little.) To eliminate outliers in the case of approximately simultaneous data, it may also effective to throw out observation epochs exhibiting strong color variations, for example times the standard deviation away from the median color . We observe that the quasar colors are relatively stable in neighboring photometric bands as a function of time, although there is a mild dependence on redshift. There are % color variations that occur — presumably due the difference in continuum and line variability — for redshifts that place the Mg line (also, Ivezić et al., 2004) in one of the passbands.

We have explored also simultaneously fitting Equation 2 to the data from 2 or more passbands to potentially improve statistics. One way to do this is to use the covariance matrix (Equation 1) for the data from one reference passband (e.g., -band) and then to allow the other bands to only vary linearly with the -band magnitude. We find that the and band data can be fit-well in this fashion, which further indicates the approximate constancy of quasar colors. However, the statistics (as measured by the scatter and source separation in Figure 4) do not improve.

### 5.3 Redshift Estimation

In Figures 3B and 4 above, we show a tendency for increased variability with decreasing luminosity which maps to a trend of decreasing variability with redshift. The survey flux limit plays a role in this trend, and evolutionary effects are also likely at work. Our classification appears to perform best above as a result. In principle, the classification can be further optimized to identify bursts at higher redshift (see, e.g., Figure 4) by performing a strict cut on (see, Section 4).

It is also possible to estimate the redshift of a quasar using either or . The scatter in estimated this way relative to the true is large, a factor of 2. There are also likely strong selection effects at play. The flux limit prevents high redshift bursts from exhibiting strong variability. That is, the Malmquist (1922) bias induces a time-domain bias. The survey color-based selection may also play a role. We caution the reader that selection effects that define a given survey may also strongly impact the utility of this redshift estimation. It is not clear at present that useful redshift constraints can be obtained from our variability measures.

## 6 Conclusions

We have explored a parameterization of the ensemble quasar variability structure function using the damped random walk model (Kelly et al., 2009). This enables a statistically rigorous evaluation of the fit of an individual quasar to the expected sample average variability profile. The latter step provides, essentially, a classification between objects undergoing quasar-like variability and objects exhibiting temporally uncorrelated variability. Unlike previous work, the classification requires no parameter fitting, is essentially free from survey-specific peculiarities, and appears to be very robust in separating known variable stars from quasars. Nearly all (%) known quasars show the expected variability profile and can be cleanly separated from stars, with 3% contamination by rare variable stars.

Our variability-selected quasar candidates span a range of redshifts, including a factor of nearly two increase in rates for intermediate redshifts (), where color-based selection performs poorly (e.g., Richards et al., 2006). The classification performs well also at high-redshift, and can be tuned to yield % completeness for quasars. We potentially increase the sample of quasars in Stripe 82 by 10–25%, depending upon whether we compare to color-selected (but not spectroscopically-confirmed) quasars or to spectroscopically-confirmed quasars only. Most of the new quasars are faint ( mag). Relatively independent of these considerations, we increase the quasar fraction by 33% or more for . Software to perform the classification as well as the list of candidate quasars can be downloaded from the project webpage.

This work has been conducted under the auspices of a broader project (The Time-domain Classification Project; Bloom et al., 2008), a goal of which is to characterize the allowed range of variability versus timescale for the full diversity of astrophysical objects. We have shown above how the temporal profile of quasars stands out against the signature of variable stars, predominantly on long timescales but with strong power to separate strong variables from quasars on short timescale. Future work will apply the methods outlined above to select highly extinguished and high-redshift quasars for spectroscopic followup and also to embed the methods discussed above withing the broader TCP framework of variable object classification. Of particular interest will be additional tests of schemes (outlined above) to reject photometric outliers and to apply the classification in the limit of very few data points for wide-field, and in particular real-time surveys, for example using data from the Palomar Transients Factory (Rau et al., 2009).

The “one-versus-many” classification framework shown here should prove to be an important pillar of any survey that relies on time-domain photometric observations for target selection (even if the targets of interest are not quasars). It appears that with the appropriate selection of cadences, the time-domain identification of quasars is now more robust, complete and pure compared with color selections. This has important implications for survey strategies of LSST, the Synoptic All-Sky Infra-Red survey (SASIR; Bloom et al., 2009), and other time-domain projects (e.g., WFIRST): when faced with a choice between more colors and a wider field in a given filter, the latter option is preferred. With deep imaging in one blue band ( or ), coupled with synoptic imaging in a redder band, we believe that high-redshift quasars may be most effectively discovered.

## References

- Abazajian et al. (2009) Abazajian, K. N., et al. 2009, ApJS, 182, 543
- Abell et al. (2009) Abell, P. A., et al. 2009, arXiv:0912.0201
- Bauer et al. (2004) Bauer, F. E., et al. 2004, AdSpR, 34, 2555
- Bloom et al. (2008) Bloom, J. S., et al. 2008, Astr. Nach., 329, 284
- Bloom et al. (2009) Bloom, J. S., et al. 2010, arXiv:0905.1965
- Brandt & Hasinger (2005) Brandt, W. N., & Hasinger, G. 2005, ARA&A, 43, 827
- Croom et al. (2009) Croom, S. M., et al. 2009, MNRAS, 399, 1755
- Hook et al. (1994) Hook, I. M., et al. 1994, MNRAS, 268, 305
- Hughes, Aller & Aller (1992) Hughes, P. A., Aller, H. D., Aller, M. F. 1992, ApJ, 396, 469H
- Ivezić et al. (2004) Ivezić, Z̆., et al. 2004, IAU Symposium 222, T. Storchi-Bergmann, L.C. Ho, & H. R. Schmitt eds.
- Ivezić et al. (2008) Ivezić, Z̆., et al. 2008, arXiv:0805.2366
- Kaiser et al. (2002) Kaiser, N., et al. 2002, SPIE 4836, 154
- Kelly et al. (2009) Kelly, B. C., et al. 2009, ApJ, 698, 895
- Kozlowski & Kochanek (2010) Kozlowski, S., & Kochanek, C. S. 2010, ApJ, 708, 927
- MacLeod et al. (2010) MacLeod, C. L., et al. 2010, arXiv:1004.0276
- Malmquist (1922) Malmquist, K. G. 1922, Lund Medd. Ser. I, 100, 1
- Matthews & Sandage (1963) Matthews, T. A., & Sandage, A. R. 1963, ApJ, 138, 30
- McDonald & Eisenstein (2007) McDonald, P., & Eisenstein, D. J. 2007, PhRvD, 76, 063009
- Rau et al. (2009) Rau, A., et al. 2009, PASP, 121, 1334
- Richards et al. (2001) Richards, G. T., et al. 2001, ApJ, 122, 1151
- Richards et al. (2006) Richards, G. T., et al. 2006, ApJS, 166, 470
- Richards et al. (2009) Richards, G. T., et al. 2009, ApJS, 180, 67
- Rybicki & Press (1994) Rybicki, G. B., & Press, W. H 1994, arXiv:comp-gas/9405004
- Ross et al. (2009) Ross, N. P., et al. 2009, ApJ, 697, 1634
- Schmidt et al. (2010) Schmidt, K., et al. 2010, arXiv:1002.2642
- Sesar et al. (2007) Sesar, B., et al., 2007, AJ, 134, 2236
- Sesar et al. (2010) Sesar, B., et al. 2010, ApJ, 708, 717
- Schneider et al. (2007) Schneider, D. P. et al. 2010, AJ, 134, 102
- Schneider et al. (2010) Schneider, D. P. et al. 2010, arXiv:1004.1167, AJ Accepted
- Simonetti, Cordes & Heeschen (1985) Simonetti, J. H., Cordes, J. M., Heeschen, D. S. 1985, ApJ, 296,
- Shen et al. (2007) Shen, Y., et al. 2007, AJ, 133, 2222
- Shen et al. (2009) Shen, Y., et al. 2009, Apj, 697, 1656
- Vanden Berk (2004) Vanden Berk, D. E., et al. 2004, ApJ, 601, 692