Time-lag Determination by Measures of Randomness

Methods of Reverberation mapping: I. Time-lag Determination by Measures of Randomness

Doron Chelouche11affiliation: Department of Physics, Faculty of Natural Sciences, University of Haifa, Haifa 3498838, Israel; doron@sci.haifa.ac.il, francisco.pozon@gmail.com 22affiliation: Haifa Research Center for Theoretical Physics and Astrophysics, University of Haifa, Haifa 3498838, Israel , Francisco Pozo Nuñez11affiliation: Department of Physics, Faculty of Natural Sciences, University of Haifa, Haifa 3498838, Israel; doron@sci.haifa.ac.il, francisco.pozon@gmail.com 22affiliation: Haifa Research Center for Theoretical Physics and Astrophysics, University of Haifa, Haifa 3498838, Israel , and Shay Zucker33affiliation: Department of Geosciences, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 6997801, Israel; shayz@post.tau.ac.il

A class of methods for measuring time delays between astronomical time series is introduced in the context of quasar reverberation mapping, which is based on measures of randomness or complexity of the data. Several distinct statistical estimators are considered that do not rely on polynomial interpolations of the light curves nor on their stochastic modeling, and do not require binning in correlation space. Methods based on von Neumann’s mean-square successive-difference estimator are found to be superior to those using other estimators. An optimized von Neumann scheme is formulated, which better handles sparsely sampled data and outperforms current implementations of discrete correlation function methods. This scheme is applied to existing reverberation data of varying quality, and consistency with previously reported time delays is found. In particular, the size-luminosity relation of the broad-line region in quasars is recovered with a scatter comparable to that obtained by other works, yet with fewer assumptions made concerning the process underlying the variability. The proposed method for time-lag determination is particularly relevant for irregularly sampled time series, and in cases where the process underlying the variability cannot be adequately modeled.

Subject headings:
galaxies: active — methods: data analysis — methods: statistical — quasars: general

1. Introduction

Reverberation mapping (RM) is a widely used technique for probing spatially unresolved regions in variable astronomical sources. Perhaps its best known application is for the study of the broad-line region (BLR) in quasars, where line emission from photoionized gas lags behind continuum variations, whence its spatial extent may be deduced (Bahcall et al., 1972; Cherepashchuk & Lyutyi, 1973). In this context, RM is at the core of studies to estimate the mass of supermassive black holes (e.g., Kaspi et al., 2000; Peterson & Wandel, 2000; Peterson et al., 2004) with implications for the formation and coevolution of black holes and galaxies (e.g., Shields et al., 2003; Volonteri et al., 2003; Schawinski et al., 2006; Netzer et al., 2007; Trakhtenbrot & Netzer, 2012).

In its simplest form, RM seeks to determine a time delay between time series, which reflects on the size of the source that emits the lagging signal (be it in quasars or in X-ray binaries; Uttley et al. 2014). Nevertheless, RM is not restricted to time-delay measurements, and additional information may be obtained on the geometry of the reverberating region by studying higher moments of the transfer function, provided the data are good enough (e.g., Blandford & McKee, 1982; Krolik et al., 1991; Pancoast et al., 2012; Grier et al., 2013; Pozo Nuñez et al., 2013). In what follows we restrict our discussion to time-delay measurement, which is a commonly encountered problem in astronomy.

Astronomical techniques for RM are inherently different from radar techniques, for example, because the driving signal cannot be controlled and the sampling is rarely regular, which undermines many common algorithms for signal processing. Various approaches have been put forward to deal with irregular sampling. For example, linear interpolation of the data is commonly applied by various algorithms for time-delay determination that are based on cross-correlation techniques (generally referred to in the literature as interpolated cross-correlation functions (ICCF), see, e.g., Gaskell & Peterson, 1987; Welsh, 1999; Chelouche & Daniel, 2012; Chelouche & Zucker, 2013). Other approaches that rely on more elaborate models include joint spline fitting of the quasar light curves with relative scaling and time-shifting, where the lag is obtained using a best-fit criterion (Tewes et al., 2013; Liao et al., 2015, and references therein). Although these approaches to RM are quite effective, which has to do with the fact that quasar variability is characterized by soft power density spectra (White & Peterson, 1994), they are not strictly justified because quasar light curves are stochastic in nature (Kelly et al., 2009).

With the advance in computing power, statistical models for quasar light curves (e.g., Scargle, 1981; Rybicki & Press, 1992) have been introduced to RM (e.g., Reynolds, 2000; Suganuma et al., 2006; Zu et al., 2011). These rely on the notion of quasar ergodicity, and often assume that the underlying process responsible for quasar variability is of the damped random-walk type. Whether this assumption is justified for all sources, at all times, and over the full range of variability timescales, is unclear (Mushotzky et al., 2011; Hernitschek et al., 2015, but see also Fausnaugh et al. 2016).

The aforementioned RM techniques, although widely used, require educated guessing as to how astronomical sources behave when not observed, e.g. due to seasonal gaps, which might result in erroneous lags if not properly applied (Hernitschek et al., 2015). An alternative approach to RM relies on (largely) model-independent time-delay measurement techniques, which fall into two main categories: discrete cross-correlation function (DCF) schemes (Edelson & Krolik, 1988, with its -transformed variant, ZDCF, by Alexander 1997), and serial-dependence schemes (Pelt et al., 1994, 1996, note that some of those have evolved beyond the notion of serial-dependence schemes and may rely on tunable parameters, e.g., Gil-Merino et al. 2002 and references therein). While the use of discrete correlation schemes is widespread in reverberation studies (e.g., Collier et al., 1998; Kaspi et al., 2000; Pozo Nuñez et al., 2012), they are often less stable when sparse sampling is concerned (White & Peterson, 1994; Kaspi et al., 2000), and their results are sensitive to binning in correlation space (Kovačević et al., 2014). Serial-dependence schemes, on the other hand, have largely been restricted to lensing studies (Pelt et al., 1996; Gil-Merino et al., 2002; Tewes et al., 2013; Liao et al., 2015, and references therein), and their relevance to RM problems with nontrivial transfer function effects has not been adequately demonstrated, nor has their performance been benchmarked with respect to other commonly used RM schemes.

With the above limitations of current RM techniques in mind, searching for additional approaches to time-lag measurement is of the essence. Specifically, more reliable model-independent means for RM would help to refine current BLR-size–luminosity relations (e.g., by decreasing their scatter, Haas et al. 2011), fully exploit upcoming survey data (e.g., Shen et al., 2015), and assess the viability of quasars as standard cosmological candles (Elvis & Karovska, 2002; Cackett et al., 2007; Watson et al., 2011; Czerny et al., 2013; Hönig et al., 2017).

In this work we introduce a new class of techniques to quasar RM, which is associated with measures of randomness and includes measures of data-complexity and serial-dependence. Such techniques are extensively used in various non-RM contexts in other fields of science (e.g., in cryptography and econometrics) but are not as widespread in astronomy. Common to all methods employed here is the minimal set of assumptions made when extracting time delays. In particular, the methods outlined here do not rely on data interpolation, nor on binning in correlation space, and do not invoke arguments regarding quasar ergodicity.

The paper is organized as follows: the method is described in §2, where various statistical estimators are defined. Section 3 benchmarks several variants of the method using an extensive suite of numerical simulations. The methods are applied to RM datasets of varying quality in §4, and their performance is benchmarked against other methods of RM. The summary follows in §5.

2. Methods

Figure 1.— The rationale behind an approach to reverberation mapping based on measures of randomness and serial-dependence. The inset shows two simulated light curves corresponding to the continuum signal (blue) and a line signal (green) that has been delayed and smeared by convolving with a wide kernel. Combining the two light curves into a single data stream, , a higher level of regularity is obtained when a relative shift of  days is employed compared to that obtained by using other values of (see text). Statistical estimators that measure the level of regularity of can therefore be used to recover the time delay.

Let us represent two discrete light curves, and , as two finite sets of ordered pairs:


where are the fluxes measured at times . is the number of data points in . We assign the indices of the ordered pairs so that they are in an increasing order of time stamps: when for , and similarly for . We further define the time-shifted version of ,


where is a parameter.

Let us assume that and are normalized to zero mean and unit standard deviation (the degree to which this assumption is effective is discussed in detail in §3.1), and that all time stamps are unique,111In the event that two data points have the same time stamp (this is generally unlikely, especially for ), we introduce arbitrarily small shifts, which do not affect the end result, so that time stamps are unique. i.e., . We then define the combined light curve as


where . Here also we assign the indices in an increasing order of the time stamps, so that is an interlaced combination of and .

Clearly, is a discrete function of time, which depends, by construction, on . In particular, if is a version of that lags by some time interval – as would be the case if is the continuum/broad-line light curve of quasars – then will appear more regular222By the term ”regular” we mean that longer trends in become more pronounced and short term fluctuations are suppressed. For example, when and are, time-wise, better aligned, the power spectrum of becomes more reminiscent of red noise than of white noise. when constructed from equation 3 using with (Fig. 1). The problem of estimating the lag between and therefore translates to finding a reliable estimator for the regularity, or conversely the randomness of , and defining a suitable extremum condition for it, from which the recovered time delay, , may be deduced. Note that many of the statistical properties of , such as its value-distribution moments, are independent of hence algorithms that are sensitive to the ordering of the data are required.

2.1. Definitions of Estimators

There is vast literature on quantifying the serial-dependence, level-of-randomness, and complexity of datasets, which relates to coding theory (e.g., cryptography and data compression problems), and some algorithms work better than others, depending on the type of data and the purpose for which they are employed333A useful compendium is by Bassham et al. (2010), which can be found at http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf.. Below, we outline a few representative algorithms/statistics that operate on but we emphasize that additional relevant estimators exist, which are not included here and may be worth exploring.

2.1.1 Von Neumann

The mean-square successive difference after von Neumann (1941) is


and recall the dependence of on , namely Equation 3, whose explicit dependence is henceforth dropped for brevity. In particular, if were to be drawn from a normal distribution then would converge to twice the variance for large . If, however, positive correlations exist between successive points on the light curve (e.g., the long trends seen in quasar light curves) then its value would be smaller than twice the variance of the light curve, which is insensitive to the ordering of points. For the case in question, constructing with , which corresponds to the lag between and , is expected to result in pronounced trends in and hence to a minimum in (Fig. 2). The recovered lag, , is defined here by the location of the minimum in over the -range probed. A version of this test, which operates on the ranked light curve, , and leads to similar results, is due to Bartels (1982). Similar results are also obtained when considering the sum of absolute differences rather than their squares (not shown).

A version of the von Neumann (VN) estimator due to Pelt et al. (1994, sometimes referred to as a ”dispersion spectrum” or ”Pelt statistics”), which is reminiscent of the definition, multiplies each of the terms in the above sum by a weighting factor, , where


( is the flux measurement uncertainty), and leads to a weighted version of the VN estimator. Like Pelt et al. (1994), we introduce a flag , which multiplies each of the terms in the VN sum, yet our definition expands on theirs, and reads {widetext}


where if the ’th data point originates from and otherwise (see also Appendix A). This definition of serves for a dual purpose: it allows us to include in the sum only those cross terms of the light curve, where each point in the pair originates in a different light curve, thereby ignoring redundant information for time-delay measurements (Pelt et al., 1994). Our definition of further allows us to exclude pairs of points whose observed difference in time-stamps is smaller than some timescale over which correlated errors may be important and could bias the results. For example, the effect of correlated errors between continuum and line flux measurements that originate from the same spectrum can be mitigated by using the -flag with , as considered, for simplicity, henceforth (see Alexander 1997, 2013). The refined VN expression then takes the form


2.1.2 Kendall

This is, essentially, a non-parametric correlation estimator (Kendall, 1938) that involves the counting of the number of concordant and discordant pairs444Concordant pairs are those that are characterized by similar trends, i.e., the first point in both pairs having a smaller (larger) rank than the second point. Discordant pairs are those whose trends are opposite; see also Zucker (2015). ( respectively) in or in its ranked version . It is defined as


and is larger for light curves that are more regular. In particular, when combining the light curves with the correct time shift (of order the lag), the estimator reaches a maximum at (Fig. 2). In addition to the standard implementation, we have also implemented a version of the test with the -flag, as defined by equation 6 with the appropriate normalization555The normalization term in the denominator of Eq. 8 then reads ..

2.1.3 Wald–Wolfowitz

Each point in the light curve is assigned with a property, , which is unity if its value is above the median and zero otherwise. The following sum is then calculated (Wald & Wolfowitz, 1940):


Light curves that show longer trends over many visits, and are hence less random, will result in smaller values of . Conversely, more random light curves, or those that are non-random but rather exhibit frequent – with respect to the cadence – sawtooth-like behavior (not a characteristic of quasar light curves), will attain higher values. We therefore identify the lag with that minimizes (Fig. 2). In addition to the standard implementation, we have also implemented a version of the test with each term in the sum multiplied by the -flag (Equation 6) with proper normalization maintained. Weighting by measurement uncertainties (Eq. 5) may also be formally included with proper normalization as in §2.1.1.

Figure 2.— Examples of measures of randomness of as a function of (in all cases is comprised of the two specific light curves shown in the inset of Fig. 1). Clearly, all measures show significant dependence on and their extrema occur around the input lag of 100 days (marked by the vertical dashed red line) hence . ZDCF results for the most likely lag, for the same particular set of light curves shown in the inset of Figure 1, are shown by the dashed blue line, with asymmetric uncertainty intervals around it, based on a maximum likelihood approach (Alexander, 2013), drawn as blue shaded regions.

2.1.4 Cumulative sums

This test assigns each point in with a property so that if and otherwise. If resembles a random walk then by summing over segments, one expects to observe relatively small deviations from zero (of the -type). On the other hand, larger deviations from a random-walk behavior are expected to occur in light curves that include correlations and trends extending over many successive points or ”steps” (Bassham et al., 2010, for additional information). In particular, we calculate the maximum of partial sums,


A maximal is expected to arise when is comparable to the time delay between and (Fig. 2). As there is no clear dependence of the above sums on (quasar light curves are not of a random-walk type), we do not consider -weights/-flag implementations here.

2.1.5 Blum-Kiefer-Rosenblatt

This test has evolved from the work of Hoeffding (1948), and has been shown to have superior qualities in the astronomical context (Zucker, 2016). The measure is defined using the ranked light curve as in Blum et al. (1961):


where is the ”bivariate rank” (Zucker, 2016), which is defined as the number of pairs of points for which and . For light curves with longer trends (i.e., a lower level of randomness), the first term on the right hand side of the above expression would be larger. We therefore seek that maximizes and identify it with the lag (Fig. 2). A -flagged version has also been formally implemented with the appropriate normalization ().

2.1.6 Lempel-Ziv

Lempel & Ziv (1976) introduced a complexity measure of a data stream, which is at the heart of a well known lossless data compression algorithm. A lower complexity measure means that the data are showing more significant trends and hence are easier to compress666For example, imagine points in the plane that trace a linear trend. These points may be characterized by values or instead by values (their -values augmented by the linear trend parameterization). Choosing the latter representation leads to a compression of the data volume without any information loss.. We therefore expect to have minimal complexity at around (Fig. 2). To limit ourselves to a finite eigenvocabulary (Lempel & Ziv, 1976) we consider two binary-sequence options: either we operate on , as defined in §2.1.3, or instead define if and otherwise ( is a sequence of -points using this definition). It turns out that the outcome, as far as determination of time lag is concerned, depends little on the particular implementation used. Due to the binary nature of this estimator, and the uncertainties associated with its normalization, we do not consider error-weighting or -flag implementations here.

2.2. Definitions of Optimized Estimators

Consider each of the aforementioned estimators, which operate on a time-series defined in equation 3. Clearly, an improper relative normalization of and (or ) could lead to erroneous time-delays (see more below). Nevertheless, due to finite sampling, the true relative normalization of and is unknown. In this work we improve upon the method described in Pelt et al. (1994), and consider the following scheme when assembling an optimal version of . Specifically, in analogy to equation 3 let us define


where are parameters with (unphysical normalizations of the light curves would be obtained for whereby is either nulled or its flux inverted). We note that for light curves typical of RM and from symmetry considerations we may instead choose to normalize instead of so that an expression analogous to Equations 3 and 12 would read


Clearly, the evaluation of should not depend on whether or is used as input, hence we define the optimized versions of the estimators as


where are calculated for respectively. An extremum can then be sought either numerically or analytically in the three-dimensional phase space spanned by and (see Appendix A for the VN scheme). The aforementioned symmetrization is robust to situations where little overlap exists between the shifted light curves, e.g., due to seasonal gaps. In particular, an asymmetric implementation of the normalization scheme may lead to either (in the case of a minimum) or (in the case of a maximum) solutions, and thence to erroneous time-delay measurements.

3. Simulations

Figure 3.— Recovered distributions of time-delay () for the various measures of randomness considered in this work (see legend). Evidently, the VN estimator is sharply peaked at the input lag and is characterized by the smallest dispersion around this value. Other estimators, especially Lempel–Ziv, cumulative sums, and Wald–Wolfowitz (non-solid lines) are less competitive for time-lag determination and may even be biased (see text).

To test the efficiency of measures of randomness for time-lag determination, and to identify potential biases with respect to the true lag, we resort to simulations. We model the driving continuum light curve, , using the method of Timmer & Koenig (1995) with the Fourier transform , where , which is typical of quasars over days to years timescales (Giveon et al., 1999; Mushotzky et al., 2011; Caplar et al., 2016)777Note that some works define the powerlaw index of the power spectral density instead..

The light echo is obtained from the convolution of the driving light curve with a transfer function , so that , where the input lag, , and denotes convolution. In this work we consider rectangular transfer functions over the time range where thus preserving causality while allowing us to control the level of BLR isotropy around the ionizing source by the ratio (e.g., Pozo Nuñez et al., 2013, and references therein).

To simulate light curves that are reminiscent of astronomical data, and are independently and randomly sampled with visits. For comparison, we also experimented with simultaneous uniform sampling of both time series and found the relative benchmarks of the various estimators to be qualitatively similar, hence they are not shown. Lastly, we add uncorrelated (Gaussian) noise to the sampled light curves to mimic observational uncertainties.

Figure 4.— Comparison of the VN and ZDCF schemes for time-delay measurement using mock light curves. Left: the distributions of based on the VN estimator (gray histogram), the optimized VN estimator (black histogram, see §2.2), and the ZDCF estimator (blue histogram); note the logarithmic scaling. Both versions of the VN estimator outperform the ZDCF with higher peak values and narrower distribution cores. The optimized VN scheme results in significantly lower rms deviations from (marked by the dashed red line). The inset shows the full range of obtained for the ZDCF and for the optimized VN (note the small-number statistics in the lower part of the diagram). The large suppression of spurious lags with using the VN scheme compared to the ZDCF method is noticeable (see text). Right: The optimized VN vs. the ZDCF time delays in units of the input lag, for sets of mock light curves. Note the lack of correlation between the bulk of the results of the two methods (see text), and the narrower distribution obtained for the VN scheme.

Having quasar RM campaigns in mind, our basic set of light-curve realizations, around which parameterized excursions are later explored (§3.1), is defined by (Kelly et al., 2009),  days (Kaspi et al., 2000), and , which is the ratio of the total span of the time series, , to the lag (see Fig. 8 in Welsh 1999 for motivation). In addition, each light curve is randomly sampled with 100 visits (), which corresponds to an average sampling timescale for each light curve of (Pozo Nuñez et al., 2012; Barth et al., 2015; Du et al., 2016). We further take , which is consistent with a relatively face-on disk geometry (Pancoast et al., 2012; Pozo Nuñez et al., 2013). A signal-to-noise ratio S/N is assumed, which corresponds to measurement uncertainties at the level (see Kaspi et al., 2000), and may be compared to the assumed fractional rms amplitude of variability of the quasar, (Rodríguez-Pascual et al., 1997; Kaspi et al., 2000).

The time-delays between pairs of mock light curves are deduced in a fully automated way by searching for the extrema associated with each of the aforementioned measures of randomness. Unless otherwise specified, we do not use an error-weighting scheme, nor -flag implementations.

The recovered distributions of time-lag, , are based on light-curve realizations, and are shown for the non-optimized estimator versions in Figure 3. We find that the Bartels and VN estimators lead to similar results, and only the latter, which leads to somewhat improved lag statistics, is shown. Further, the VN estimator is the best performer in our simulations: it exhibits the narrowest distribution of , and is relatively unbiased with . Next in terms of estimator performance come the Kendall and Blum–Kiefer–Rosenblatt estimators, which exhibit larger dispersions around the mean. The Wald–Wolfowitz, the Lempel–Ziv – the particular implementation of which appears to be largely immaterial – and the cumulative sums estimators are, in order of appearance, the worst performers: they lead to the broadest distributions around the central value, and their means, like their median and centroid values, and are biased to shorter lags – by up to 30% for the cumulative sums estimator – for our set of simulations.

3.1. Benchmarking the VN Estimator

With the above results in mind, we focus in this section on benchmarking and improving the von Neumann scheme for RM purposes, and compare its deduced lags to those obtained by the ZDCF method of Alexander (1997, 2013),888See the complete code package at: http://wwo.weizmann.ac.il/weizsites/tal/research/software/. which is a widely used discrete correlation scheme in the field of RM. Unless otherwise specified, the ZDCF algorithm is used here in its default optimal-binning mode and with zero-lag points omitted (Alexander, 2013).

A detailed comparison of the VN and ZDCF deduced lags for mock light curves defined by our basic parameterization is shown in Figure 4. We find that the von Neumann estimator provides a distribution of laga with a narrower core than the ZDCF – the ratio of their full widths at half-maximum is for the simulated case; see Fig. 4 – but with a more pronounced long-tail wing extending to short and even negative time delays (the latter are obtained in of the realizations for this set of simulations). Detrending of the light curves appears to be counter-productive for the VN approach and hence not is considered in the remainder of the paper. We attribute the emergence of the wing toward short time delays to relative inconsistencies in normalization in a fraction of the light-curve realizations, and demonstrate the improved statistics obtained for the optimized scheme of §2.2 in Figure 4. In particular, the distribution of obtained by the optimized scheme becomes more symmetric and narrow; it is significantly (%) narrower than the ZDCF, with the former having 68% of the values within  days from the peak for our basic set of simulations.

The optimized scheme appears to be considerably better than the ZDCF in suppressing spurious time delays over the range (see the inset of Fig. 4). Nevertheless, in what follows we restrict our time-delay search window to the range , within which more than half of the light-curve features covered by the time series are in the time-wise overlapping region between and (this statement is true in the statistical sense but may not hold for individual light curve realizations).

We further note the overall poor correspondence between the time-delay measurements of the VN and the ZDCF methods, leading to a Pearson correlation coefficient (righthand plot of Fig. 4). Recovered time delays that deviate from the central value by more than in either direction, for both methods, are somewhat better correlated (note the somewhat tilted outer contours in Fig. 4 characterized by ). Interestingly, the poor correlation between the time delays obtained by the two methods implies that they may be viewed as independent measurements of the lag, and under certain circumstances might even be used in tandem to improve time-delay estimates; a proper treatment of this possibility is beyond the scope of the present work.

Figure 5.— The relative performance of the optimized VN and the ZDCF methods ( implies an advantage to the VN method; see text). Each panel shows an excursion from the basic model, whose results are highlighted in red, by varying one parameter at a time (left panels relate to observational parameters and right panels to quasar parameters). For clarity, triangles mark upper limits, with the actual values being . Generally, the performance of the optimized VN estimator exceeds that of the ZDCF for the parameter range explored here.

We next benchmark the performance of the optimized VN method against the ZDCF for various parameterized excursions from our basic set of light-curve realizations. To this end we define the rms deviation,999The root mean square deviation is defined here as where is the number of realizations. To avoid having our statistics influenced by -outliers (see Fig. 4), includes the sum of all but the extreme 2.5% measurements on either side of the -distribution, for either of the estimators., , of the measured lag from the true lag per estimator, and calculate the ratio ( implies an advantage to the VN approach). The results are summarized in Figure 5, where each point is based on the analysis of mock light curves.

We find the performance of the optimized VN scheme to be consistently better than that of the ZDCF method over much of the parameter space explored here. Specifically, it performs best for slopes, , of the order of the observed values (but underperforms for , not shown) and for narrow transfer functions, such as those that characterize relatively face-on disks or rings (Pancoast et al., 2012). A further advantage of the optimized VN scheme is attained for more extended campaigns (with a fixed number of points), mainly because of the increased scatter in the ZDCF results with poorer sampling. Higher cadence and S/N further improve the relative performance of the optimized VN scheme, as may be expected from a robust statistical estimator given the rising quality of the data. Interestingly, considering a model with S/N, , and , and repeating the simulations, results in the distribution of of the optimized VN scheme being times narrower than the ZDCF (not shown).

As data quality deteriorates (e.g., sparser light curves), the deduced distributions of broaden and may blend with the spurious lag population at which is shown in the inset of Figure 4. Nevertheless, our calculations indicate a consistent advantage of the VN scheme over the ZDCF approach also in this limit. This is most notable for sparse sampling with and when the total number of points per light curve is low ( for this set of simulations; enforcing a different ZDCF binning scheme than the default does not appreciably change the results and hence is not shown). A significant relative decline in ZDCF’s performance is seen also for the short simulated campaigns with , which rarely characterizes RM campaigns because of well known biases (Welsh, 1999). In the example considered here, the ZDCF time-lag distributions develop a non-negligible wing extending to short/negative delays while the optimized VN is not substantially affected and hence is less biased. A more in-depth exploration of the relative performance of the two estimators under challenging observing conditions, which are not conducive to reliable RM (Welsh, 1999), is beyond the scope of this paper.

Finally, we note that a similar behavior characterizes also the Bartels estimator, whose results are not shown here. Specifically, its values are generally smaller than unity, and hence its performance is superior to the ZDCF, but they are larger than VN’s over the parameter range considered here.

3.1.1 The Weighted VN Scheme

Here we consider the effect of spikes in the light curve (due to larger measurement uncertainties in a fraction of the data points) and correlated noise between the light curves on the deduced time delays and the means to mitigate them using the weighted VN scheme of §2.1.1. To this end, more realistic sets of light curves have been simulated, which mimic typical light curves used for (a) BLR RM and (b) continuum time-delay measurements (perhaps associated with the accretion disk; Collier et al. 1998).

To better mimic BLR RM data we assume our basic set of simulations but include seasonal gaps 150 days wide, and assume that 10% of the data points suffer from an enhanced noise level, with their measurement uncertainties being larger by a factor of 3; an example is shown in Figure 6. Results for the recovered time delays from mock light curve simulations using the weighted and unweighted VN schemes are shown in Figure 6, and demonstrate the advantage of using the former. We find marginal advantage in using the -flag scheme.

To study the effect of correlated noise we turn to the case of continuum RM, where relevant light curves are often characterized by high, nearly uniform cadence, and we rely also on spectroscopic data for measuring fluxes at different wavelengths, hence leading to correlated errors. In our set of simulations we assume a time delay  day with . Sampling cadence is fixed at 1 day, hence the transfer function is barely resolved. We further consider correlated noise at the 5% level, which exceeds the 1% formal measurement uncertainties assumed here, and depict one such realization in Figure 6. Results for the weighted VN give a distribution of centered101010The fact that time-delays on subsampling timescales are obtained should not come as a surprise as those reflect on the average numbers obtained by the FR/RSS scheme (see §4). around 0.5 days, which does not extend to values as large as the input delay because of correlated errors. The bias is alleviated using our -flag implementation, and the average delay is consistent with the input lag.

Figure 6.— Application of the weighted optimized VN scheme to randomly sampled quasar light curves with gaps and spikes (upper panels) and to uniformly sampled light curves with correlated errors and uniform measurement uncertainties (lower panels). Recovered time-lag distributions for realizations (with FR/RSS implementation; see §4) are shown in the right panels while the left panels show typical examples for continuum (blue) and emission line (red) mock light curves. The different distributions shown pertain to different implementations of the algorithm (see legend and text).

4. Data Applications

In this section we apply the schemes to measure randomness (§2) to quasar data of varying quality, and benchmark their performance. We aim to provide lag measurements for individual sources and epochs, so time-lag uncertainties need to be quantified. Here we resort to the standard techniques in RM for estimating uncertainties in lags (see Pelt et al., 1994, for a different implementation), which involves the FR/RSS algorithm of Peterson et al. (1998, 2004), and quote upper/lower uncertainties that correspond to 16/84 percentiles as obtained using the lag statistics for realizations for each set of light curves (see Appendix B). Unless otherwise stated, optimized and weighted versions of the estimators (§2.2), including nontrivial -factor implementations, are considered where applicable.

Figure 7.— BLR-size–luminosity relation for multi-epoch data for NGC 5548. Data were taken from Peterson et al. (2002) with host-corrected luminosity estimates from Bentz et al. (2013). The blue line is the best-fit relation of Bentz et al. (2013). Measurement uncertainties were obtained using the FR/RSS scheme of Peterson et al. (2004). The Pearson correlation coefficient, , is denoted in each panel, and implies an advantage for the VN measure over the other methods described here.
object (days) (days) Ref.

Mrk 335
0.026 1

0.142 2

0.155 2

Fairall 9
0.047 3

Mrk 590
0.026 4

3C 120
0.033 1

Ark 120
0.033 4

Mrk 79
0.022 4

0.100 2

Mrk 110
0.035 4

0.234 2

NGC 3227
0.004 5

Mrk 142
0.045 6

NGC 3516
0.009 5

SBS 1116
0.028 6

Arp 151
0.021 6

NGC 3783
0.010 7

Mrk 1310
0.020 6

NGC 4051
0.002 8

NGC 4151
0.003 9

Mrk 202
0.021 6

NGC 4253
0.013 6

0.158 2

0.063 2

NGC 4593
0.009 10

NGC 4748
0.015 6

0.155 2

Mrk 279
0.031 11

0.090 2

NGC 5548
0.017 5


0.087 2

Mrk 817
0.032 8

Mrk 290
0.030 8

0.129 2

0.112 2

0.292 2

3C 390.3
0.056 13

NGC 6814
0.005 6

Mrk 509
0.034 4

0.063 1

NGC 7469
0.016 14

Values are in the rest frame of the source.

Data references: (1) Grier et al. 2012; (2) Kaspi et al. 2000; (3) Santos-Lleó et al. 1997; (4) Peterson et al. 1998; (5) Denney et al. 2009; (6) Bentz et al. 2009; (7) Stirpe et al. 1994; (8) Denney et al. 2009b (9) Bentz et al. 2006; (10) Denney et al. 2006; (11) Santos-Lleó et al. 2001; (12) Peterson et al. 2002; (13) Dietrich et al. 1998; (14) Peterson et al. 2014.

Table 1Von Neumann Analysis of the Sample of Active Galactic Nuclei of Bentz et al. (2013)

4.1. Multi-epoch Data for NGC 5548

NGC 5548 has one of the best sets of light curves suitable for RM, covering more than a decade of high-cadence observations (Peterson et al., 2002). Based on these data several works have shown that NGC 5548 follows an intrinsic BLR-size-luminosity relation (Peterson et al., 2002; Bentz et al., 2007, 2013), which is consistent with the extrinsic (i.e., multi-object) one. Here we analyze 13 consecutive sets of continuum and H light curves of NGC 5548 spanning a year each (yearly definitions follow those of Peterson et al. 2002; Bentz et al. 2013), and benchmark the performance of the optimized estimators defined in §2 by quantifying the scatter around the BLR-size–luminosity relation. Host-subtracted luminosity estimates for each epoch were taken from Bentz et al. (2013, and see §4.3).

Figure 7 shows versions of the intrinsic BLR-size–luminosity relations as obtained by each of the estimators defined in §2. The relative performance of the estimators is insensitive to whether weighted or unweighted versions are used. We find that the VN estimator outperforms the other measures of randomness, and leads to a tighter correlation between time-delay and luminosity with (time delays are quoted in Table 1). Conversely, the cumulative sums estimator is the worst performer (leading in some cases to negative lags and to ). These findings confirm our simulations (Fig. 3).

Given the advantages of the VN scheme over the other estimators explored here, we henceforth focus on its performance, and consider its weighted version with nontrivial implementations of - and -implementations unless otherwise specified.

4.2. The Palomar-Green Quasar Sample of Kaspi et al. (2000)

Here we benchmark the performance of the VN schemes using the spectrophotometric data of Kaspi et al. (2000) for 17 Palomar-Green (PG) quasars, and compare it to the default implementation of the ZDCF (see also Kaspi et al. 2000). Time delays reported for the latter are obtained using the maximum likelihood approach of Alexander (2013) and the corresponding correlation functions are identical to those reported in Kaspi et al. (2000, see their Fig. 4) and hence are not shown.

Time-delay measurements for the H line are reported here for two implementations of the VN algorithm: including nontrivial weights and flags (leading to the column in Table 2), and for an unweighted version operating over the full extent of (i.e., setting and leading to the column in Table 2). Below we comment on the statistical properties of the lags obtained with further details provided in Appendix C. That said, the reader is advised not to over-interpret the statistics given the sample size and data quality.

We find the results of the weighted and unweighted VN schemes to be similar with , and that their time delays are both comparably correlated with the quasar luminosity and have a similar -value to that of the ICCF scheme (Kaspi et al., 2000, and our Table 2). In contrast, the ZDCF time delays lead to a significantly weaker correlation with luminosity. Further, while the VN scheme leads to positive delays in all cases, the ZDCF reports negative delays in % of the cases. This hints at the superior stability of the VN scheme, as also indicated by our simulations, and perhaps reflects also on the overestimated lag uncertainties obtained by the FR/RSS scheme; see Appendix B. We also find that the VN results are very well correlated with the ICCF time delays (with ; see Table 2 and Kaspi et al., 2000). Taken together, our findings suggest that the VN scheme has two promising features: (1) it has the major advantage of being largely model- and binning-independent while (2) producing more stable results than default implementations of discrete cross-correlation function schemes.

object (days) (days) (days) (days)
PG0026+129 0.142
PG0052+251 0.155
PG0804+761 0.100
PG0844+349 0.064
PG0953+414 0.239
PG1211+143 0.085
PG1226+023 0.158
PG1229+204 0.064
PG1307+085 0.155
PG1351+640 0.087
PG1411+442 0.089
PG1426+015 0.086
PG1613+658 0.129
PG1617+175 0.114
PG1700+518 0.292
PG1704+608 0.371
PG2130+099 0.061
Pearson’s correlation coefficients: 0.96

Values are in the observed frame.

H lag measurements were considered as in Kaspi et al. (2000).

After correcting for cosmological time dilation and using logarithmic values (negative time delays were ignored).

Correlation coefficients should not be overinterpreted because of the limited sample size and the uncertainties associated with individual measurements. Uncertainties on the correlation coefficients are not provided for similar reasons.

Table 2Von Neumann Analysis of the Sample of Palomar-Green Quasars of Kaspi et al. (2000)

4.3. The Sample of Active Galactic Nuclei of Bentz et al. (2013)

Figure 8.— Application of the optimized and weighted VN scheme for the sample of Bentz et al. (2013). Left: Comparison of the time delays obtained here to those reported in Bentz et al. (2013, see references therein), which are largely based on the ICCF scheme (see Table 1). The agreement between the two sets of measurements is good with . The error bars reported here appear to be overestimated by % (see text). Right: The BLR-size–luminosity relation obtained using the VN scheme. We find consistency with the results of Bentz et al. (2013) for the targets analyzed in Table 1 (red points) including multi-epoch time-lag measurements of NGC 5548 (Fig. 7). The scatter in the points around the best-fit relation of Bentz et al. (2013) (blue line) is  dex, and hence is comparable to that reported by Bentz et al. (2013) and Zu et al. (2011). The unweighted power-law fit to our measurements (red line) is consistent with the fit of Bentz et al. (2013) (see text).

Here we revisit the sample of active galactic nuclei (AGN) of Bentz et al. (2013), for which the nuclear luminosity has been estimated by subtracting off the host’s contribution to the aperture flux, resulting in a tighter BLR size–luminosity relation. In particular, we reanalyze the continuum and H data for all sources listed in Table 1 using the optimized and weighted VN scheme with nontrivial -factor implementations. More information about individual targets may be found in Appendix C.

Time-lag measurements are in most cases robust, with the delays being highly significant, as expected given the overall data quality of the sample. That said, the values reported at the short-lag end of the distribution appear to be somewhat less significant than the values reported by Bentz et al. (2013), which may relate to the overall overestimation of the FR/RSS uncertainties when applied to the VN scheme (see below and Appendix B). Overall, a statistical comparison of the time delays obtained here to those of Bentz et al. (2013, and references therein) shows good agreement with (Fig. 8).

The expected correspondence between our results and the lags reported in Bentz et al. (2013) could be used to estimate the degree to which the quoted lag uncertainties are reasonable. To this end, we take a simplified approach and symmetrize the lower and upper measurement uncertainties reported in Table 1, and consider the ratio and its uncertainty with errors added in quadrature. With the expectation of a ratio of unity, we calculate the reduced and find its value to be . Reducing the lag uncertainties by % leads to a reduced , and in agreement with our calculations in Appendix B (note that this is merely a global estimate and may not reflect on the reliability of the measurement uncertainties for individual sources, nor on the reliability of the quoted uncertainties for the ICCF scheme).

Our version of the BLR-size-luminosity is shown in Figure 8 together with multi-epoch results for NGC 5548 of §4.1. The results are consistent with the best-fit relation of Bentz et al. (2013) of the form , and an unweighted fit to the von Neumann results gives a power-law slope and an intercept , which are consistent with the values reported in Bentz et al. (2013) of and .

We find the dispersion in our lag measurements relative to the (theoretical) BLR-size-luminosity relation111111After excluding Mrk 142 and NGC 3227 for reasons given in Bentz et al. (2013). Quoted uncertainties on were obtained by applying a random subset selection (RSS) scheme – akin to that discussed in Peterson et al. (1998) – to the data shown in the right panel of Figure 8 and quoted in Table 1. These do not include lag uncertainties contribution to the scatter due to reasons discussed in Appendix B. to be  dex, which is comparable to the values found in Zu et al. (2011) and Bentz et al. (2013); note, however, the differences in the samples (objects and epochs) as well as estimates of source luminosity considered in each work.

5. Summary

In this paper we seek to expand the range of methods available for time-lag determination in the field of quasar reverberation-mapping. To this end we introduce a class of statistical estimators, which was originally used to measure the degree of randomness or complexity of a dataset (e.g., in cryptographic applications), and adapt it to RM.

Six different randomness/complexity measures are considered, among which the von Neumann and Lempel–Ziv estimators (the latter is at the core of many lossless data compression algorithms) and appropriate extrema criteria for time-lag measurements are defined for each. At least two distinct implementations are considered per estimator (e.g., error-weighted and unweighted schemes) with the feature common to all being the minimal set of assumptions used to analyze the data. Specifically, none of the estimators considered here requires polynomial interpolation schemes to be defined, nor do they employ binning in correlation space. Further, they do not rely on ergodicity arguments and (tunable) stochastic models for quasar variability, hence time-delay measurements are restricted to the information embedded in the light curves being processed.

Among the various measures considered in this work, we find that an optimized implementation of the VN estimator (see Pelt et al., 1994, for a similar but not identical approach) performs best with the added benefit of being mathematically simple and computationally efficient to evaluate. We benchmark its performance using a large set of simulated light curves, and study the effects of data sampling rate, time-series duration, S/N, light-curve spikes, and correlated errors on the deduced lags. The effects of the properties of the quasar – the power density spectrum of its light curve and the BLR transfer function – on the deduced time-delay statistics are also investigated. We find that the optimized VN outperforms current implementations of the discrete correlation scheme for much of the parameter space relevant to RM, and leads to better agreement with the true lags.

Applying the optimized VN estimator to existing data of varying quality, we find its performance to be on par with that of other (more model-dependent) time-lag measurement schemes. In particular, the parameterization of the BLR-size–luminosity relation of Bentz et al. (2013) is recovered, and the scatter around this theoretical relation is found to be comparable to that reported in other studies.

It is found that the optimized VN estimator can, under favorable conditions, be used in conjunction with existing methods, such as the discrete correlation function scheme, to obtain refined lags and to identify biases or the breakdown of the assumptions underlying other methods of RM. Such refinement and verification of time delays is crucial for alleviating systematics and for reducing the scatter in the size-luminosity relation, as well as for identifying departures from it (e.g., Du et al., 2016), with implications for quasar physics and the use of quasars as standard cosmological candles.

To conclude, we find that measures of randomness significantly add to our arsenal of tools for RM, and that further exploration of the vast literature in the field is warranted. In particular, some of the schemes outlined here may be further refined, and additional measures of randomness/complexity not explored here may achieve superior performance as far as quasar RM is concerned.

This research has been supported in part by grants 950/15 and 848/16 from the Israeli Science Foundation (ISF), and by a Deutsche Forschungsgemeinschaft (DFG) grant HA3555/14-1. Computations were carried out on the Hive high-performance computer cluster at the University of Haifa, which is partly funded by ISF grant 2155/15. D. C. thanks Shai Kaspi and Uri Keshet for fruitful discussions that motivated this work, Boris Chorny and Keith Horne for wise advice, and Edi Barkai for invaluable and persistent support. We thank the referee for illuminating feedback.


  • Alexander (1997) Alexander, T. 1997, Astronomical Time Series, 218, 163
  • Alexander (2013) Alexander, T. 2013, arXiv:1302.1508
  • Bahcall et al. (1972) Bahcall, J. N., Kozlovsky, B.-Z., & Salpeter, E. E. 1972, ApJ, 171, 467
  • Bartels (1982) Bartels R. 1982, J. Am. Stat. Assoc., 77, 40
  • Barth et al. (2015) Barth, A. J., Bennert, V. N., Canalizo, G., et al. 2015, ApJS, 217, 26
  • Bassham et al. (2010) Bassham, L. E., Rukhin A., et al. 2010, National Institute of Standards and Technology, Special Publication (NIST SP) - 800-22
  • Bentz et al. (2006) Bentz, M. C., Denney, K. D., Cackett, E. M., et al. 2006, ApJ, 651, 775
  • Bentz et al. (2007) Bentz, M. C., Denney, K. D., Cackett, E. M., et al. 2007, ApJ, 662, 205
  • Bentz et al. (2013) Bentz, M. C., Denney, K. D., Grier, C. J., et al. 2013, ApJ, 767, 149
  • Bentz et al. (2009) Bentz, M. C., Walsh, J. L., Barth, A. J., et al. 2009, ApJ, 705, 199
  • Blandford & McKee (1982) Blandford, R. D., & McKee, C. F. 1982, ApJ, 255, 419
  • Blum et al. (1961) Blum J. R., Kiefer J., & Rosenblatt M. 1961, Ann. Math. Stat., 32, 485
  • Cackett et al. (2007) Cackett, E. M., Horne, K., & Winkler, H. 2007, MNRAS, 380, 669
  • Caplar et al. (2016) Caplar, N., Lilly, S. J., & Trakhtenbrot, B. 2016, arXiv:1611.03082
  • Chelouche & Daniel (2012) Chelouche, D., & Daniel E. 2012, ApJ, 747, 62
  • Chelouche & Zucker (2013) Chelouche, D., & Zucker S., 2013, ApJ, 769, 124
  • Cherepashchuk & Lyutyi (1973) Cherepashchuk, A. M., & Lyutyi, V. M. 1973, Astrophys. Lett., 13, 165
  • Collier et al. (1998) Collier, S. J., Horne, K., Kaspi, S., et al. 1998, ApJ, 500, 162
  • Czerny et al. (2013) Czerny, B., Hryniewicz, K., Maity, I., et al. 2013, A&A, 556, A97
  • Denney et al. (2006) Denney, K. D., Bentz, M. C., Peterson, B. M., et al. 2006, ApJ, 653, 152
  • Denney et al. (2009) Denney, K. D., Peterson, B. M., Pogge, R. W., et al. 2009, ApJ, 704, L80
  • Denney et al. (2009b) Denney, K. D., Watson, L. C., Peterson, B. M., et al. 2009, ApJ, 702, 1353
  • Dietrich et al. (1998) Dietrich, M., Peterson, B. M., Albrecht, P., et al. 1998, ApJS, 115, 185
  • Du et al. (2016) Du, P., Lu, K.-X., Zhang, Z.-X., et al. 2016, ApJ, 825, 126
  • Edelson & Krolik (1988) Edelson, R. A., & Krolik, J. H. 1988, ApJ, 333, 646
  • Elvis & Karovska (2002) Elvis, M., & Karovska, M. 2002, ApJ, 581, L67
  • Fausnaugh et al. (2016) Fausnaugh, M. M., Denney, K. D., Barth, A. J., et al. 2016, ApJ, 821, 56
  • Gaskell & Peterson (1987) Gaskell, C. M., & Peterson, B. M. 1987, ApJS, 65, 1
  • Gil-Merino et al. (2002) Gil-Merino, R., Wisotzki, L., & Wambsganss, J. 2002, A&A, 381, 428
  • Giveon et al. (1999) Giveon, U., Maoz, D., Kaspi, S., Netzer, H., & Smith, P. S. 1999, MNRAS, 306, 637
  • Grier et al. (2008) Grier, C. J., Peterson, B. M., Bentz, M. C., et al. 2008, ApJ, 688, 837-843
  • Grier et al. (2012) Grier, C. J., Peterson, B. M., Pogge, R. W., et al. 2012, ApJ, 755, 60
  • Grier et al. (2013) Grier, C. J., Peterson, B. M., Horne, K., et al. 2013, ApJ, 764, 47
  • Haas et al. (2011) Haas, M., Chini, R., Ramolla, M., et al. 2011, A&A, 535, A73
  • Hernitschek et al. (2015) Hernitschek, N., Rix, H.-W., Bovy, J., & Morganson, E. 2015, ApJ, 801, 45
  • Hoeffding (1948) Hoeffding W. 1948, Ann. Math. Stat., 19, 293
  • Hönig et al. (2017) Hönig, S. F., Watson, D., Kishimoto, M., et al. 2017, MNRAS, 464, 1693
  • Kaspi et al. (2000) Kaspi, S., Smith, P. S., Netzer, H., et al. 2000, ApJ, 533, 631
  • Kelly et al. (2009) Kelly, B. C., Bechtold, J., & Siemiginowska, A. 2009, ApJ, 698, 895-910
  • Kendall (1938) Kendall M. G., 1938, Biometrika, 30, 81
  • Kovačević et al. (2014) Kovačević, A., Popović, L. Č., Shapovalova, A. I., et al. 2014, Advances in Space Research, 54, 1414
  • Krolik et al. (1991) Krolik, J. H., Horne, K., Kallman, T. R., et al. 1991, ApJ, 371, 541
  • Lempel & Ziv (1976) Lempel A., & Ziv J. (1976), IEEE Trans. Inf. Theory, IT- 22, 75Ð81.
  • Liao et al. (2015) Liao, K., Treu, T., Marshall, P., et al. 2015, ApJ, 800, 11
  • Mushotzky et al. (2011) Mushotzky, R. F., Edelson, R., Baumgartner, W., & Gandhi, P. 2011, ApJ, 743, L12
  • Netzer et al. (2007) Netzer, H., Lira, P., Trakhtenbrot, B., Shemmer, O., & Cury, I. 2007, ApJ, 671, 1256
  • Pancoast et al. (2012) Pancoast, A., Brewer, B. J., Treu, T., et al. 2012, ApJ, 754, 49
  • Pelt et al. (1994) Pelt, J., Hoff, W., Kayser, R., Refsdal, S., & Schramm, T. 1994, A&A, 286, 775
  • Pelt et al. (1996) Pelt, J., Kayser, R., Refsdal, S., & Schramm, T. 1996, A&A, 305, 97
  • Peterson et al. (2002) Peterson, B. M., Berlind, P., Bertram, R., et al. 2002, ApJ, 581, 197
  • Peterson et al. (2004) Peterson, B. M., Ferrarese, L., Gilbert, K. M., et al. 2004, ApJ, 613, 682
  • Peterson & Wandel (2000) Peterson, B. M., & Wandel, A. 2000, ApJ, 540, L13
  • Peterson et al. (1998) Peterson, B. M., Wanders, I., Horne, K., et al. 1998, PASP, 110, 660
  • Peterson et al. (2014) Peterson, B. M., Grier, C. J., Horne, K., et al. 2014, ApJ, 795, 149
  • Pozo Nuñez et al. (2012) Pozo Nuñez, F., Ramolla, M., Westhues, C., et al. 2012, A&A, 545, A84
  • Pozo Nuñez et al. (2013) Pozo Nuñez, F., Westhues, C., Ramolla, M., et al. 2013, A&A, 552, A1
  • Reynolds (2000) Reynolds, C. S. 2000, ApJ, 533, 811
  • Rybicki & Press (1992) Rybicki, G. B., & Press, W. H. 1992, ApJ, 398, 169
  • Rodríguez-Pascual et al. (1997) Rodríguez-Pascual, P. M., Alloin, D., Clavel, J., et al. 1997, ApJS, 110, 9
  • Santos-Lleó et al. (1997) Santos-Lleó, M., Chatzichristou, E., de Oliveira, C. M., et al. 1997, ApJS, 112, 271
  • Santos-Lleó et al. (2001) Santos-Lleó, M., Clavel, J., Schulz, B., et al. 2001, A&A, 369, 57
  • Scargle (1981) Scargle, J. D. 1981, ApJS, 45, 1
  • Schawinski et al. (2006) Schawinski, K., Khochfar, S., Kaviraj, S., et al. 2006, Nature, 442, 888
  • Shen et al. (2015) Shen, Y., Brandt, W. N., Dawson, K. S., et al. 2015, ApJS, 216, 4
  • Shields et al. (2003) Shields, G. A., Gebhardt, K., Salviander, S., et al. 2003, ApJ, 583, 124
  • Stirpe et al. (1994) Stirpe, G. M., Winge, C., Altieri, B., et al. 1994, ApJ, 425, 609
  • Suganuma et al. (2006) Suganuma, M., Yoshii, Y., Kobayashi, Y., et al. 2006, ApJ, 639, 46
  • Tewes et al. (2013) Tewes, M., Courbin, F., & Meylan, G. 2013, A&A, 553, A120
  • Timmer & Koenig (1995) Timmer, J., & Koenig, M. 1995, A&A, 300, 707
  • Trakhtenbrot & Netzer (2012) Trakhtenbrot, B., & Netzer, H. 2012, MNRAS, 427, 3081
  • Uttley et al. (2014) Uttley, P., Cackett, E. M., Fabian, A. C., Kara, E., & Wilkins, D. R. 2014, A&A Rev., 22, 72
  • von Neumann (1941) von Neumann, J. 1941, Ann. Math. Statist. 12, 367
  • Volonteri et al. (2003) Volonteri, M., Haardt, F., & Madau, P. 2003, ApJ, 582, 559
  • Wald & Wolfowitz (1940) Wald A., Wolfowitz J. 1940, Ann. Math. Stat., 11, 147
  • Watson et al. (2011) Watson, D., Denney, K. D., Vestergaard, M., & Davis, T. M. 2011, ApJ, 740, L49
  • Welsh (1999) Welsh, W. F. 1999, PASP, 111, 1347
  • White & Peterson (1994) White, R. J., & Peterson, B. M. 1994, PASP, 106, 879
  • Zu et al. (2011) Zu, Y., Kochanek, C. S., & Peterson, B. M. 2011, ApJ, 735, 80
  • Zucker (2015) Zucker, S. 2015, MNRAS, 449, 2723
  • Zucker (2016) Zucker, S. 2016, MNRAS, 457, L118

Appendix A A scheme for computing the optimized von Neumann estimator

Here we outline an efficient computing scheme121212See http://chelouche.haifa.ac.il/codes/VonNeumann.m for a Matlab® version of the full code, and http://www.pozonunez.de/astro_codes/python/vnrm.py or http://www.pozonunez.de/astro_codes/idl/vnrm.pro for partial Python/IDL® implementations. for determining the global minimum of the VN estimator given the renormalization of and as defined in equations 12 and 13. We first note that we can write


where if the th data point originates from , and otherwise. Using these, we can write equation 14 as


Requiring a minimum for , at any given , we obtain the following set of two linear equations, , where




The solutions for and , inserted back into equation A2, yield the minimal (see, e.g., Figs. 9, 10), from which may be deduced at a modest computational cost. A similar, but not identical approach may reduce the computational load of the optimization scheme also for the other estimators discussed in this paper but is not further explored here.

The inclusion of weights, , and flags, (see §3.1.1) is straightforward and amounts to multiplying Eq. A2 by while maintaining proper normalization (Eq. 7). The derivation of the optimized schemes follows, as outlined above.

Appendix B The FR/RSS scheme and the von Neumann estimator

The FR/RSS method of Peterson et al. (2004, which includes refinements of the scheme of due to ) is used here to estimate the uncertainties on the time delay between the light curves of individual sources. While this method has been shown to provide reasonable, albeit somewhat conservative, lag uncertainties for ICCF schemes, it is yet to be shown to provide reasonable values also for the optimized VN scheme. To this end, we simulated the light curves for sources (§3), and carried out calculations of uncertainty in lag for each using the FR/RSS scheme with realizations. We find that, when averaged over the entire mock population, the input lag falls outside the 16%-84% range obtained via the FR/RSS scheme in of the cases (for the model parameterization explored here, and assuming random cadence), suggesting that the uncertainties are conservative. Requiring that the input delay falls within the designated uncertainty interval in % of the cases (within one standard deviation around the peak for normal distributions) requires that the quoted FR/RSS uncertainties be reduced by on average. A comparative analysis of the results of Bentz et al. (2013) and the time delays obtained here supports this conclusion (§4.3). Nevertheless, these results may not hold for individual targets, or for all object parameterizations and and light-curve cadences, and we therefore quote the standard FR/RSS uncertainties in Table 1. A more complete treatment of the estimation of the time-lag uncertainty for the VN RM scheme is beyond the scope of the present paper.

Appendix C Notes on Individual Sources

Results for the optimized (weighted) VN estimator are provided for each of the sources in Tables 1 and 2 with the estimator values and uncertainty contours shown in Figures 9 and 10. Note that the superior quality of the light curves that characterize objects in the compilation of Bentz et al. (2013, and references therein) results in more regular VN estimators with well defined troughs compared to the sample of Kaspi et al. (2000). This also reflects on the width of the error ”snakes” around the mean VN values (dotted black curves in Figs. 9 ,10), and results in better defined time delays. Note that the values of the VN estimator at different values of are correlated, as are their uncertainties. Also shown in Figures 9 and 10 in red are the numbers of pairs of points with (Eq. 6) that determine . The effect of discarding zero-lag pairs of points is evident in all sources, as are the seasonal gaps that might affect some time-delay measurements for the PG quasars (see also Zu et al., 2011) but are less relevant for most AGNs in the compilation of Bentz et al. (2013).

Below we comment on anomalies encountered while analyzing the sources and epochs considered here.

Figure 9.— Results for the H emission line based on the optimized VN estimator for the PG sample of quasars of Kaspi et al. (2000). Curves show the optimized VN mean (solid black line) and the 16%, 84% envelopes around it (dotted black lines) as obtained by the FR/RSS scheme. The time shift, , at which a -minimum occurs corresponds to the time delay (values for each source are provided in Table 1). This figure may be compared to Figure 4 in Kaspi et al. (2000) showing ZDCF and ICCF results for each of the targets, with qualitative agreement implied. Results for the H line are shown for PG 1351 for reasons discussed in Kaspi et al. (2000). Red curves show the number of pairs of points in the VN sum with (see Eq. 6).
Figure 10.— Same as Fig. 9 for the remaining objects in the complilation of AGN of Bentz et al. (2013). Values for the time delays for each of the sources are provided in Table 2.
  • Mrk 79: A second minimum at 200 days is ignored and the search for time delays is restricted to the range  days. Using the default search window of we get  days.

  • Mrk 110: The lag quoted in Table 1 was obtained by analyzing the full light curve for this source, while the values quoted from Bentz et al. (2013) are based on the mean values reported there and the uncertainty on their standard deviation (i.e., quoted lag uncertainties from Bentz et al. (2013) do not include the FR/RSS contribution).

  • PG 0953: The VN time delay is stable so long as the search window is  days .

  • NGC 3227: The VN time delay is stable so long as the search window is narrower than  days .

  • NGC 4151: The time-delay search interval was constrained to the interval  days since there exists a second VN minimum at 20 days whose inclusion leads to a solution of  days.

  • Mrk 202: Time delay search interval was restricted to the interval  days .

  • PG 1307: We take an approach akin to Kaspi et al. (2000), and find a stable solution within a search interval with a lower bound of  days (the upper bound matters little to the end result).

  • PG 1229: We restrict the search window to the range  days, because a second VN minimum exists at  days, which is reminiscent of the negative peak seen in the ZDCF for this source (see Fig. 4 in Kaspi et al. 2000).

  • Mrk 279: We search for lags within the range  days range since there exists another VN minimum at  days resulting in  days.

  • NGC 5548: The results shown in Figure 10 were obtained for the light curves of Denney et al. (2009). The analysis of individual epochs based on 13 years of data from Peterson et al. (2002) is not shown.

  • PG 1411: The time-delay search window is restricted to  days since a second VN minimum exists outside this range.

  • PG 1426: The time-delay search window is restricted to  days since a second VN minimum exists outside this range.

  • PG 1613: The time-delay search window is restricted to  days since a second VN minimum exists outside this range.

  • 3C 390.3: We find that different implementations of the VN algorithm (e.g., weighted vs. unweighted) change the structure of the VN trough in a non-negligible way, which can exceed the formal uncertainties provided by the FR/RSS scheme. Nevertheless, the delay is positive and  days for all implementations. Better data are required to better constrain the lag in this source.

  • NGC 6814: A local VN minimum exists also at negative delays where the number of pairs contributing to the signal is reduced (see Fig. 10). We therefore restrict the search window to  days .

  • PG 2130: Two very different delays are reported for this source: Table 2 and Figure 9 quote delays for the data set of Kaspi et al. (2000) while Table 1 and Figure 10 quote lags for the Grier et al. (2012) dataset. For more information about potential sampling issues for this source see Grier et al. (2008, 2012).

Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters