An improved quasar detection method in EROS2 and MACHO LMC datasets
Abstract
We present a new classification method for quasar identification in the EROS2 and MACHO datasets based on a boosted version of the Random Forest classifier. We use a set of variability features including parameters of a continuous auto regressive model. We show that continuous auto regressive parameters are very important discriminators in the classification process. We create two training sets (one for EROS2 and one for MACHO) using known quasars found in the LMC. On both the EROS2 and MACHO training sets our model reaches about 90% precision and 86% recall, improving on the accuracy of state-of-the-art models for quasar detection. We apply the model to the complete EROS2 and MACHO LMC datasets, including 28 million objects, finding 1160 and 2551 candidates respectively. To further validate our list of candidates, we crossmatched it with a previous list of 663 known strong candidates, obtaining 74% of matches for MACHO and 40% for EROS2. The difference in matching level arises mainly because EROS2 is a slightly shallower survey, which translates into lightcurves with a significantly lower signal-to-noise ratio.
keywords:
Magellanic Clouds – methods: data analysis – quasars: general
1 Introduction
Given the immense amount of data being produced by current deep-sky surveys such as Pan-STARRS (Kaiser et al., 2002), and future surveys such as LSST (Matter, 2007) and SkyMapper (Keller et al., 2007), astronomy faces new challenges in how to analyze big data and, in turn, how to search for or predict events and patterns of interest.
The size of the data has already exceeded the capability of manual examination or the capability of standard data analysis tools. LSST will produce 15 terabytes of data per night, which is even beyond the capacity of typical data storage today.
Thus, in order to analyze such huge amounts of data and detect interesting events or patterns with a minimum of false positives, novel data analysis methods are crucial for the success of such surveys.
In our previous works (Kim et al., 2011a, 2012) we developed classification models for the selection of quasars from large photometric databases using variability characteristics as the main discriminators. In particular, we used a supervised classification model trained on a set of variability features calculated from MACHO lightcurves (Alcock, 2000). We applied the trained model to the entire MACHO database, consisting of 40 million lightcurves, and selected a few thousand quasar candidates. In this paper, we present an improved classification model used to detect quasars in the MACHO (Alcock, 2000) and EROS2 (Tisserand et al., 2007) datasets. The new model, which works over an extended set of variability features, substantially decreases the false positive rate and increases efficiency.
The model improvement results from improvements in both the machine learning classification model and the lightcurve features we use. Machine learning classification methods have been popular for many decades. These methods are data analysis models that learn to predict a categorical variable from a set of other variables (of any type). The best-known classification models are decision trees (Quinlan, 1993), naive Bayes (Duda & Hart, 1973), Neural Networks (Rumelhart et al., 1986), Support Vector Machines (Cortes & Vapnik, 1995) and Random Forest (Breiman, 2001). There are also meta-models that improve classification results, such as Boosting methods (Freund & Schapire, 1997) and Mixtures of Experts (Jordan, 1994), among others. In general, more recent classifiers result from research focused on building models able to search for patterns within high-dimensional datasets, where the combinatorial number of possible projections of the data is large.
Many machine learning classifiers have been applied to the analysis of astronomical data, in particular to classify transients and variable stars from time series data (Bloom et al., 2011; Richards et al., 2011; Bloom & Richards, 2011; Debosscher et al., 2007; Wachman et al., 2009; Wang et al., 2010; Kim et al., 2011a, b). Wang et al. (2010) proposed an algorithm to fit phase-shifted periodic time series using a mixture of Gaussian processes. Debosscher et al. (2007) used many machine learning classifiers to learn a model that classifies variable stars in a sample from the Hipparcos and OGLE databases. Richards et al. (2011) used a Random Forest classifier to distinguish between pulsational variables and eclipsing systems used in Milky Way tomography. Bloom et al. (2011) used machine learning algorithms to classify transients and variable stars from the Palomar Transient Factory (PTF) survey (Rau et al., 2009). Wachman et al. (2009) used cross correlation as a phase-invariant feature that serves as a similarity indicator in a kernel function.
In this work we used a Random Forest classifier (Breiman, 2001) boosted with the AdaBoost algorithm (Freund & Schapire, 1997). The Random Forest classifier builds on the well-known decision tree model (Quinlan, 1993) and Bagging techniques (Breiman, 1996), where the model randomly explores several subsets of features while analyzing samples of training data. This model performs very well in many machine learning domains (Breiman, 2001). The AdaBoost algorithm (Freund & Schapire, 1997) is a boosting technique which fits a sequence of classification models (in this case a sequence of many Random Forests) to different subsets of data objects (in our case lightcurves), generating a mixture of classifiers, each one specialized in a smaller area of the feature space. We call these classifiers “weak classifiers” or “simpler classifiers”. This is a useful property for quasar classification, given that there are only a few known training quasars compared with the number of non-quasar lightcurves. Having some weak classifiers that take care of areas with no training quasars helps to filter out many non-quasars, while other specialized classifiers perform well near the quasar areas in the feature space.
Besides improving the classification model, we added new features as descriptors of lightcurves. These features correspond to the parameters of the continuous auto regressive (CAR(1)) model (Belcher et al., 1994) fitted to the lightcurves. Previous work shows that describing quasars using CAR(1) fitting parameters gives suitable results to differentiate them from other classes of lightcurves (Kelly et al., 2009). That work did not use machine learning classifiers to automatically detect quasars; instead, it used the CAR(1) model to fit 100 quasar lightcurves in order to find correlations between CAR(1) parameters and luminosity characteristics.
In our work we show that by adding CAR(1) features to our previous set of features (used in Kim et al. (2011a)), we can learn more accurate models for quasar detection. Given that our model is built to find quasars among tens of millions of stars, we need to be very efficient in the estimation of the optimal parameters in order to make the process feasible within a reasonable amount of time. Unfortunately, methods such as Metropolis Hastings or Gibbs Sampling are not suitable for our purposes, given the computational cost they involve.
To gain efficiency, we reduce the problem by approximating one of the parameters (the mean value of the lightcurve) and optimizing the remaining parameters (the amplitude and time scale of the variability) using a multidimensional unconstrained nonlinear minimization (Nelder & Mead, 1965). Once we get the optimal parameters, we use them as features of the object corresponding to the lightcurve. Besides the CAR(1) features, we also used time series features as in our previous work (Kim et al., 2011b). Section 4 gives details about all the features we extracted.
To check the fitting accuracy of our model, we first calculate the training accuracy of our classifier using 10-fold cross validation over a training set, which consists of about six thousand known lightcurves corresponding to different kinds of variable stars, non-variable stars and confirmed quasars; one set corresponds to the MACHO database and another to the EROS2 database. In the MACHO case we substantially improve our training accuracy compared with our previous work (Kim et al., 2011b), with increases of 14.3% in precision and 3.6% in recall. On the EROS2 training set we get about the same training efficiency as in the MACHO case, but we cannot compare with our previous work because this is the first time we attempt to classify the EROS2 database. As an extra test for our candidates, we crossmatch them with the previous set of strong candidates found in Kim et al. (2011b); details are presented in section 5.
Using parallel computing, we decrease the processing time so that we can select quasar candidates from the entire database within three days. Note that the data analysis schema used in this work can be applied to any of the ongoing and future synoptic sky surveys such as Pan-STARRS, LSST, and SkyMapper, among others.
Footnote 1: Our main computer resource is the Odyssey cluster supported by the FAS Research Computing Group at Harvard.
If confirmed, the selected quasars from the MACHO database will provide critical information for galaxy evolution, black hole growth, large scale structure, etc. (Heckman et al. 2004; Bower et al. 2006; Trichas et al. 2009, 2010). Moreover, the resulting quasar lightcurves will be a valuable dataset for quasar time variability studies (e.g. time scale, black hole mass, type I and II variability), since MACHO and EROS lightcurves are well-sampled over 7.4 years (Alcock, 2000).
The paper is organized as follows. In section 2 we present details about the EROS2 database. In section 3 we describe in detail the classification model we use, including the Random Forest model and AdaBoost. In section 4 we describe the features we use to characterize the lightcurves. In section 5 we present the experimental results for the MACHO and EROS2 datasets.
2 EROS2 Dataset
The EROS2 collaboration made use of the MARLY telescope, a one-meter diameter Ritchey-Chrétien (f/5.14) instrument dedicated to the survey. It was operated between July 1996 and March 2003 at La Silla Observatory (ESO, Chile). It was equipped with two wide-angle CCD cameras located behind a dichroic beam-splitter. Each camera is a mosaic of 8 CCDs, 2 along right ascension and 4 along declination. Each CCD has pixels of m individual size, corresponding to a arcsec pixel surface on the sky. The size of the field of view is along right ascension and along declination. The dichroic beam-splitter allowed simultaneous imaging in two broad non-standard passbands, in the range 4200-7200 Å (the so-called “blue” channel), and in the range 6200-9200 Å (the so-called “red” channel). The blue filter is intermediate between the standard and standard passbands, while the red filter is analogous to . The normalized transmission curve of these filters, compared to standard ones, is given by Hamadache (2004) in Fig. 3.3 (available at http://tel.archivesouvertes.fr). Tisserand et al. (2007) give in Eq. (4) the equations to transform EROS2 magnitudes into and ones within an accuracy of 0.1 magnitude.
3 Methodology
To train a model that learns to detect quasars, we propose to use a combination of classifiers. The combination of multiple classifiers was first proposed by Xu et al. (1992), who showed that combining multiple classifiers overcomes many of the limitations of individual classifiers. In many pattern recognition problems, such as character recognition, handwritten text recognition and face recognition (Zhao et al., 2003; Plamondon & Srihari, 2000), a combination of multiple classifiers obtains much better classification performance. One effective way to combine classifiers is the AdaBoost algorithm, proposed by Freund & Schapire (1997).
The AdaBoost algorithm consists of a set of base classifiers that are trained sequentially, such that each classifier is trained on the instances where the previous classifier performed badly (each classifier learns what its predecessors could not learn). Freund & Schapire (1997) show that if the training set used for each classifier depends on the goodness of fit of the previous classifier, then the performance of the whole system improves. To make the base classifiers focus on different subsets of the training set, we assign weights to the training data instances. The lower the weight of an instance, the less the classifier focuses on it (see section 3.1 for further details).
One of the advantages of boosting methods is that, after the model fitting phase is completed, each of the base classifiers becomes an expert in some subset of the data objects. This is one of the main reasons that motivate us to use a boosting step. Given that we have a very small number of known quasars in our training set compared with the number of non-quasars, training a set of base classifiers that just learn how to filter out some of the non-quasars is very helpful for the next base classifier in the sequential process. We now present a detailed description of the boosting method we use in this work, the AdaBoost algorithm (Freund & Schapire, 1997).
3.1 AdaBoost Algorithm
AdaBoost, short for adaptive boosting, is a machine learning algorithm proposed by Freund & Schapire (1997). It is a meta-algorithm because it combines many learning algorithms to perform classification. AdaBoost is adaptive in the sense that subsequent classifiers are tweaked in favor of those instances misclassified by previous classifiers. Although AdaBoost is sensitive to noisy data and outliers, it is less susceptible to overfitting (Dietterich, 1995) than most learning algorithms.
In the context of lightcurve classification, suppose we have a training (labeled) set of $N$ lightcurves and $F$ features describing each lightcurve. Each lightcurve in the training set has a known label (e.g. quasar or non-quasar). Let $X = \{x_1, \dots, x_N\}$ be a set of descriptors where each $x_i$ is a vector associated with lightcurve $i$, whose descriptor (feature) values are $\{x_i^1, \dots, x_i^F\}$, where $F$ is the number of features. Let $Y = \{y_1, \dots, y_N\}$ be the labels such that $y_i = 1$ if lightcurve $i$ is a quasar and $y_i = -1$ otherwise.
Let $C$ be the set of classifiers $\{c_1, \dots, c_K\}$, where $c_k : X \rightarrow \{-1, 1\}$, and let $D_t$ be the distribution of weights at iteration $t$. Define $K$ to be the number of classifiers and a constant $T$ to be the number of times to iterate in the AdaBoost algorithm.
Initialization:
Algorithm:
Notes:

$\delta$ is the Kronecker delta.

$Z_t$ is a normalization factor.
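For reference, the standard discrete AdaBoost of Freund & Schapire (1997) can be sketched as follows. This is a sketch in common textbook notation, not necessarily the exact form used in this work. It assumes $N$ training lightcurves with descriptors $x_i$ and labels $y_i \in \{-1, +1\}$, candidate classifiers $c_k$, instance weights $D_t(i)$ and $T$ iterations; the symbols $\epsilon_t$ (weighted error), $\alpha_t$ (classifier weight) and $H$ (final classifier) are assumptions of this sketch.

```latex
\begin{align}
D_1(i) &= \frac{1}{N}, \qquad i = 1, \dots, N \\
c_t &= \arg\min_{c_k} \sum_{i=1}^{N} D_t(i)\,\bigl[1 - \delta_{c_k(x_i),\,y_i}\bigr],
\qquad
\epsilon_t = \sum_{i=1}^{N} D_t(i)\,\bigl[1 - \delta_{c_t(x_i),\,y_i}\bigr] \\
\alpha_t &= \frac{1}{2}\,\ln\frac{1 - \epsilon_t}{\epsilon_t} \\
D_{t+1}(i) &= \frac{D_t(i)\,\exp\bigl(-\alpha_t\,y_i\,c_t(x_i)\bigr)}{Z_t},
\qquad
Z_t = \sum_{i=1}^{N} D_t(i)\,\exp\bigl(-\alpha_t\,y_i\,c_t(x_i)\bigr) \\
H(x) &= \operatorname{sign}\Bigl(\sum_{t=1}^{T} \alpha_t\,c_t(x)\Bigr)
\end{align}
```

Here the Kronecker delta equals one when the prediction matches the label, so $\epsilon_t$ is the total weight of the misclassified lightcurves, and the iteration stops early if $\epsilon_t \geq 0.5$.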
The equation to update the weight distribution $D_t$ is constructed so that the exponent $-\alpha_t y_i c_t(x_i)$ is negative when $y_i = c_t(x_i)$ and positive when $y_i \neq c_t(x_i)$, where $\alpha_t > 0$ is the weight assigned to the classifier $c_t$ selected at iteration $t$. Thus, after selecting an optimal classifier $c_t$ for the distribution $D_t$, the objects that $c_t$ classified correctly are given less weight and those that it classified incorrectly are given more weight. Hence, when the algorithm proceeds to test the classifiers on $D_{t+1}$, it is more likely to select a classifier that better classifies the objects that $c_t$ missed. AdaBoost minimizes the training error (exponentially fast) if each weak classifier performs better than random guessing (weighted error $\epsilon_t < 0.5$).
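The reweighting loop described above can be illustrated with a minimal sketch. It assumes labels in {-1, +1} and a fixed pool of simple candidate classifiers; in this work the base classifiers are Random Forests, so the small threshold rules used in the usage note are stand-ins, and the function names `adaboost` and `predict` are hypothetical.

```python
import math

def adaboost(X, y, classifiers, T):
    """Discrete AdaBoost over a fixed pool of candidate classifiers.

    X: list of feature vectors; y: labels in {-1, +1};
    classifiers: callables mapping a feature vector to -1 or +1.
    Returns a list of (alpha_t, classifier_t) pairs.
    """
    n = len(X)
    D = [1.0 / n] * n                      # uniform initial weights
    ensemble = []
    for _ in range(T):
        # Weighted error of every candidate under the current weights.
        errors = [sum(D[i] for i in range(n) if c(X[i]) != y[i])
                  for c in classifiers]
        k = min(range(len(classifiers)), key=errors.__getitem__)
        eps = errors[k]
        if eps >= 0.5:                     # no better than random guessing
            break
        eps = max(eps, 1e-12)              # guard against division by zero
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        c = classifiers[k]
        # Misclassified objects gain weight, correctly classified ones lose it.
        D = [D[i] * math.exp(-alpha * y[i] * c(X[i])) for i in range(n)]
        Z = sum(D)                         # normalization factor
        D = [d / Z for d in D]
        ensemble.append((alpha, c))
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of the selected classifiers."""
    score = sum(alpha * c(x) for alpha, c in ensemble)
    return 1 if score >= 0 else -1
```

As a usage example, three points on a line labeled +1, -1, +1 cannot be separated by any single threshold rule, but three boosting rounds over a pool of two threshold rules and a constant rule classify all of them correctly.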
The base classifier we used in this work is the Random Forest classifier (Breiman, 2001), a very strong classifier that has shown very good results in many different domains. The following section shows details about the Random Forest classifier.
3.2 Random Forest Classifier
Random Forest (RF) is a popular and very efficient algorithm for classification problems, based on decision tree models (Quinlan, 1993) and Bagging (Breiman, 1996, 2001). It belongs to the family of ensemble methods, which appeared in the machine learning literature at the end of the nineties (Dietterich, 2000), and it has recently been used in astronomical journals (Carliles et al., 2010; Richards et al., 2011). The process of training or building a Random Forest given training data is as follows:

Let $B$ be the number of trees in the forest and $f$ the number of features used in each tree; both values are model parameters.

Build $B$ sets of samples taken with replacement from the training set; this is called bagging. Note that each of the $B$ bags has the same number of elements as the training set but fewer distinct examples, given that the samples are taken with replacement.

For each of the $B$ sets, train a decision tree using a random sample of $f$ features from the set of possible features.
The Random Forest classifier creates many linear separators inside many feature subsets until it gets suitable separations between objects from different classes. The linear separations come from each decision tree, and each of the feature subsets comes from the random feature selection process on each tree. The bagging procedure is very useful to estimate the error of the classifier during the training process. This error can be estimated using the out-of-bag procedure, which means evaluating the performance of each tree using the objects not selected in the bag that belongs to that tree (see Breiman (2001) for further details).
After training the Random Forest, to classify a new unknown lightcurve descriptor, one uses each of the trained decision trees to classify the new instance, and the final decision is the most voted class among the set of decision trees (see Breiman (2001) for more details). Breiman (2001) shows that as the number of trees tends to infinity, the classification error of the RF becomes bounded and the classifier does not overfit the data.
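The training and voting steps above can be sketched in a few lines. To keep the sketch short, each "tree" is reduced to a single-threshold split (a decision stump), which is a simplification and not the full decision trees used in a Random Forest; the names `fit_stump`, `fit_forest` and `predict_forest` are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Best single-threshold split over the given feature indices."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= threshold]
            right = [y[i] for i, row in enumerate(X) if row[f] > threshold]
            if not left or not right:
                continue
            l_label = Counter(left).most_common(1)[0][0]
            r_label = Counter(right).most_common(1)[0][0]
            errors = (sum(v != l_label for v in left)
                      + sum(v != r_label for v in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_label, r_label)
    if best is None:                       # bag holds a single distinct value
        label = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), label, label)
    return best[1:]

def fit_forest(X, y, n_trees, n_features, seed=0):
    """Train n_trees stumps, each on its own bag and random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bag = [rng.randrange(len(X)) for _ in X]          # sample with replacement
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in bag], [y[i] for i in bag], feats))
    return forest

def predict_forest(forest, x):
    """Most voted class among the trees."""
    votes = [l if x[f] <= t else r for f, t, l, r in forest]
    return Counter(votes).most_common(1)[0][0]
```

The objects that never appear in a given bag are the ones an out-of-bag error estimate would use to evaluate the corresponding tree.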
4 Feature Extraction
We extracted 14 features per band for each lightcurve. Those features correspond to 11 time series features used in our previous work (Kim et al., 2011b) and 3 features corresponding to the CAR(1) process.
4.1 Time Series features
Here we very briefly summarize the 11 time series features used in our previous work (Kim et al., 2011b).

$N_{above}$, $N_{below}$: the number of points above/below the upper/lower bound line calculated as points that are over the average of the autocorrelation functions.

Stetson $K_{AC}$: the variability index derived from the autocorrelation function of each lightcurve (Stetson, 1996).

$R_{cs}$: the range of the cumulative sums (starting from 1 to the number of observations) of each lightcurve (Ellaway, 1978).

$\sigma/\bar{m}$: the ratio of the standard deviation, $\sigma$, to the mean magnitude, $\bar{m}$.

Stetson $L$: a variability index (Stetson, 1996) that describes the synchronous variability of different bands.

$\eta$: the ratio of the mean of the square of successive differences to the variance of the data points.

$B-R$: the average color of each lightcurve.

Con: the number of three consecutive data points that are brighter or fainter than $2\sigma$, normalized by $N-2$.
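Several of the features listed above reduce to short computations on the magnitude series. The sketch below is one possible reading of those definitions, not the code used in this work: it assumes population (not sample) statistics, assumes the cumulative sums are normalized by the number of points times the standard deviation, and counts a three-point window only when all three points lie beyond two standard deviations on the same side of the mean. All function names are hypothetical.

```python
import statistics

def sigma_over_mean(mags):
    """Ratio of the standard deviation to the mean magnitude."""
    return statistics.pstdev(mags) / statistics.fmean(mags)

def eta(mags):
    """Mean squared successive difference divided by the variance."""
    diffs = [(b - a) ** 2 for a, b in zip(mags, mags[1:])]
    return statistics.fmean(diffs) / statistics.pvariance(mags)

def range_cumsum(mags):
    """Range of the normalized cumulative sums of mean-subtracted magnitudes."""
    n = len(mags)
    mean = statistics.fmean(mags)
    sd = statistics.pstdev(mags)
    total, sums = 0.0, []
    for m in mags:
        total += m - mean
        sums.append(total / (n * sd))
    return max(sums) - min(sums)

def con(mags, k=2.0):
    """Fraction of 3-point windows lying entirely beyond k sigma on one side."""
    mean = statistics.fmean(mags)
    sd = statistics.pstdev(mags)
    high = [m > mean + k * sd for m in mags]
    low = [m < mean - k * sd for m in mags]
    count = sum(all(high[i:i + 3]) or all(low[i:i + 3])
                for i in range(len(mags) - 2))
    return count / (len(mags) - 2)
```

For a series that alternates between two values, such as 10, 12, 10, 12, the successive differences are large compared with the variance, which is the behavior the successive-difference ratio is meant to capture for uncorrelated variability.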
4.2 Continuous Auto Regressive Process Features
We use a continuous time auto regressive model (CAR(1)) to model the irregularly sampled time series in MACHO and EROS2 lightcurves. The CAR(1) process has three parameters and provides a natural and consistent way of estimating a characteristic time scale and variance of lightcurves. The CAR(1) process is described by the following stochastic differential equation (Brockwell & Davis, 2002):
$dX(t) = -\frac{1}{\tau} X(t)\,dt + \sigma \sqrt{dt}\,\epsilon(t) + b\,dt, \quad \tau, \sigma, t \geq 0$ (1)
where the mean value of the lightcurve $X(t)$ is $b\tau$ and the variance is $\tau\sigma^2/2$. $\tau$ is the relaxation time of the process $X(t)$; it can be interpreted as describing the variability amplitude of the time series. $\sigma$ can be interpreted as describing the variability of the time series on time scales shorter than $\tau$. $\epsilon(t)$ is a white noise process with zero mean and variance equal to one.
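The likelihood function of a CAR(1) model with parameters $b$, $\sigma$ and $\tau$ for a lightcurve with observations $x = \{x_1, \dots, x_n\}$ observed at times $\{t_1, \dots, t_n\}$ with measurement error variances $\{\sigma_1^2, \dots, \sigma_n^2\}$ is (Kelly et al., 2009):
$p(x \mid b, \sigma, \tau) = \prod_{i=1}^{n} \frac{1}{[2\pi(\Omega_i + \sigma_i^2)]^{1/2}} \exp\left\{ -\frac{1}{2} \frac{(\hat{x}_i - x_i^*)^2}{\Omega_i + \sigma_i^2} \right\}$ (2)
$x_i^* = x_i - b\tau$ (3)
$\hat{x}_0 = 0$ (4)
$\Omega_0 = \frac{\tau\sigma^2}{2}$ (5)
$\hat{x}_i = a_i \hat{x}_{i-1} + \frac{a_i \Omega_{i-1}}{\Omega_{i-1} + \sigma_{i-1}^2} \left( x_{i-1}^* - \hat{x}_{i-1} \right)$ (6)
$\Omega_i = \Omega_0 \left( 1 - a_i^2 \right) + a_i^2\, \Omega_{i-1} \left( 1 - \frac{\Omega_{i-1}}{\Omega_{i-1} + \sigma_{i-1}^2} \right)$ (7)
$a_i = e^{-(t_i - t_{i-1})/\tau}$ (8)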
To find the optimal parameters we maximize the likelihood with respect to $b$, $\sigma$ and $\tau$. Given that the maximization does not have an analytical solution, we can solve it with a statistical sampling method such as Metropolis Hastings (Metropolis et al., 1953). Given that we extract features for all the lightcurves in the EROS2 and MACHO datasets (about 28 and 40 million stars respectively), performing a statistical sampling process to determine the optimal parameters would be feasible only if stable solutions were found in a reasonable amount of time. We consider less than 3 seconds per lightcurve reasonable given our hardware resources. Unfortunately, we could not get stable solutions under that restriction. To overcome this situation we simplify the optimization problem by reducing the number of parameters to be estimated. Instead of estimating $b$, $\sigma$ and $\tau$, we estimate only $\sigma$ and $\tau$, and then calculate $b$ as the mean magnitude of the lightcurve divided by $\tau$. To check that this estimation works well, we use a sample of 250 lightcurves and compare the reduced chi-square error using two- and three-parameter optimization, getting differences smaller than 2.5% on average.
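The two-parameter objective described above can be sketched as a negative log-likelihood in which $b$ is fixed to the mean magnitude divided by $\tau$. This is a minimal sketch assuming the CAR(1) likelihood recursion of Kelly et al. (2009); treating the first observation as the starting state (predicted value 0, variance $\tau\sigma^2/2$) is an indexing assumption of the sketch, and the function name is hypothetical.

```python
import math

def car1_neg_log_likelihood(sigma, tau, times, mags, err_vars):
    """-log p(x | b, sigma, tau) with b fixed to mean(mags) / tau.

    times: observation times; mags: magnitudes;
    err_vars: measurement error variances (same length as mags).
    """
    b = (sum(mags) / len(mags)) / tau
    x_star = [m - b * tau for m in mags]       # mean-subtracted lightcurve
    omega0 = tau * sigma ** 2 / 2.0            # stationary variance of the process
    x_hat, omega = 0.0, omega0                 # starting state
    nll = 0.0
    for i in range(len(mags)):
        if i > 0:
            a = math.exp(-(times[i] - times[i - 1]) / tau)
            gain = omega / (omega + err_vars[i - 1])
            x_hat = a * x_hat + a * gain * (x_star[i - 1] - x_hat)
            omega = omega0 * (1.0 - a * a) + a * a * omega * (1.0 - gain)
        var = omega + err_vars[i]
        nll += 0.5 * math.log(2.0 * math.pi * var)
        nll += 0.5 * (x_hat - x_star[i]) ** 2 / var
    return nll
```

Minimizing this function over $\sigma$ and $\tau$ is equivalent to maximizing the likelihood with $b$ held at its approximated value. Because both parameters must stay positive, an unconstrained optimizer would in practice be applied to a reparametrization such as their logarithms, which is an assumption of this note.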
This approximation allows us to perform a two-dimensional optimization, which can be solved with a regular numerical method in less than one second per lightcurve. We used the Nelder-Mead multidimensional unconstrained nonlinear optimization (Nelder & Mead, 1965) to find the optimal parameters. Figure 1 shows the fitting of three quasar lightcurves with the resulting CAR(1) coefficients using the Nelder-Mead algorithm. Note that instead of using $b$ directly as a feature, we use the mean magnitude of the lightcurve, in order to have a cleaner feature ($b$ is calculated from $\tau$, which is already used as a feature).
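A derivative-free simplex search of the Nelder-Mead type can be sketched compactly. The version below is a simplified variant (reflection, expansion, inside contraction and shrink with the usual coefficients 1, 2, 0.5 and 0.5) meant only to illustrate the idea; it is not the implementation used in this work, and the function name, initial step and stopping rule are assumptions.

```python
def nelder_mead(f, x0, step=0.5, tol=1e-8, max_iter=1000):
    """Minimize f starting from x0 with a simplified Nelder-Mead simplex search."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):                     # initial simplex: x0 plus axis steps
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        size = max(abs(a - b) for p in simplex[1:] for a, b in zip(p, best))
        if size < tol:                     # simplex has collapsed onto a point
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:                          # shrink every vertex toward the best one
                simplex = [best] + [[b + 0.5 * (v - b) for b, v in zip(best, p)]
                                    for p in simplex[1:]]
    return min(simplex, key=f)
```

For the CAR(1) fit, `f` would be the negative log-likelihood as a function of the two free parameters. In practice a library routine such as `scipy.optimize.minimize` with `method="Nelder-Mead"` offers the same kind of search with more careful convergence handling.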
5 QSO candidates on EROS2 and MACHO datasets
5.1 EROS2 dataset
To train a model able to find quasars in EROS2 we create a training set composed of 65 known quasars, 67 Be stars, 330 long-period variables, 5829 non-variable stars, 1727 RR Lyrae, 406 Cepheids, and 488 EB stars. We get these stars by crossmatching the EROS2 dataset with known MACHO stars using positional matching with an accuracy of 3 arcsec. We extracted features in bands R and B. Figures 2 and 3 show projections of the training set on different sets of features containing CAR(1) features. In many cases it is easy to get a natural separation between quasars and the variable stars, but quasars usually overlap with many of the non-variable stars (e.g. with B-R, with , with ). Fortunately, there are many projections where quasars and non-variable stars are mostly separated (e.g. with , with , with Stetson , with , with Stetson ).
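The positional matching used to build the training set can be sketched as a brute-force search on angular separation. This is an illustration only: it assumes catalogs given as (identifier, right ascension, declination) tuples in degrees and treats the 3 arcsec accuracy as a matching radius; the function names and the catalog layout are hypothetical, and a production cross-match over millions of objects would use a spatial index instead of a double loop.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def crossmatch(catalog_a, catalog_b, radius_arcsec=3.0):
    """Pair each object of catalog_a with its nearest neighbor within the radius."""
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        best = None
        for id_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_arcsec(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_arcsec and (best is None or sep < best[0]):
                best = (sep, id_b)
        if best is not None:
            matches.append((id_a, best[1]))
    return matches
```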
To compare the distribution of the objects predicted as quasars with the training quasars and other variable stars, we plot our EROS2 training data plus the predicted quasars projected on many different pairs of features (Figs. 4 and 5). We can see that in most cases the predicted quasars and the training quasars have very similar distributions, despite the small number of training quasars we use. The main differences between both distributions are in general due to the large difference in size between the training and testing data, which results in a set of predicted quasars 20 times bigger than the set of training quasars.
To get an indicator of the accuracy on the EROS2 training set, we run a 10-fold cross validation. This validation method consists of partitioning the dataset into 10 folds (subsets) of the same size and iterating 10 times: on iteration $i$ we train the classifier with all the folds except fold $i$, and then we test the performance on fold $i$ (the one the model did not see during training). The process returns the model prediction for the entire dataset (the union of the 10 testing folds is equal to the data).
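The cross-validation procedure above can be sketched generically. The sketch assumes the caller supplies two functions, `train(X, y)` returning a model and `predict(model, x)` returning a label; both names, as well as `k_fold_predictions`, are hypothetical, and folds differ in size by at most one object when the dataset size is not a multiple of the number of folds.

```python
import random

def k_fold_predictions(X, y, train, predict, k=10, seed=0):
    """Return one held-out prediction per object using k-fold cross validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)           # random assignment to folds
    folds = [idx[i::k] for i in range(k)]
    predictions = [None] * len(X)
    for i in range(k):
        held_out = set(folds[i])
        train_idx = [j for j in idx if j not in held_out]
        model = train([X[j] for j in train_idx], [y[j] for j in train_idx])
        for j in folds[i]:                     # test only on the unseen fold
            predictions[j] = predict(model, X[j])
    return predictions
```

Because every object belongs to exactly one testing fold, the returned list covers the entire dataset and can be compared directly with the true labels.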
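We measure the accuracy using the F-score indicator. This indicator is calculated as the harmonic mean of precision and recall:
$F\text{-score} = 2 \times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$
where precision and recall are defined as:
$\mathrm{precision} = \frac{t_p}{t_p + f_p}, \qquad \mathrm{recall} = \frac{t_p}{t_p + f_n}$
where $t_p$, $f_p$ and $f_n$ are the number of true positives, false positives and false negatives respectively.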
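As a small worked example of these definitions, the following sketch computes precision, recall and F-score from the counts of true positives, false positives and false negatives. The function names are hypothetical, and returning zero when a denominator vanishes is a convention chosen for the sketch.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

With the roughly 90% precision and 86% recall quoted in the abstract, `f_score(0.90, 0.86)` gives about 0.88.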
Table 1 shows the results for the boosted version of Random Forest, regular Random Forest and SVM (the classifier used in our previous work (Kim et al., 2011b)), with and without CAR features.
Model | SVM | SVM | RF | RF | AB+RF | AB+RF
CAR features | No | Yes | No | Yes | No | Yes
F-score | 0.74 | 0.855 | 0.787 | 0.813 | 0.81 | 0.868
We find 1160 candidates in the EROS2 dataset. To validate our candidates, we crossmatch them with the list of 663 MACHO strong candidates in Kim et al. (2011b). From that list, only 332 objects exist in the EROS2 dataset; we find 131 matches between our EROS2 candidates and those 332 objects. Figure 6 shows some of the lightcurves of the quasar candidates for the EROS2 dataset.
Regarding the efficiency of the extraction of the CAR(1) features and the time series features, we implemented parallel processing in order to perform the feature extraction and classification in a reasonable amount of time. The EROS2 and MACHO databases are stored as a set of thousands of folders, where each folder contains thousands of lightcurves of a given field. The feature extraction process runs as a set of parallel threads that work on different compressed files at the same time, extracting them and processing the lightcurves to get the features. Once the features are calculated, they are written into a common file related to a particular folder, so each compressed file has a corresponding data file that stores the feature values of all the lightcurves within the folder. After the feature extraction process, we run a classification process that runs in parallel over the thousands of feature data files calculated in the previous step.
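The per-folder parallelism described above can be sketched with a thread pool. The sketch keeps everything in memory: folders are represented as a dictionary mapping a field name to its lightcurves, and `extract_features` is a placeholder that returns only the mean and standard deviation rather than the real feature set. All names are hypothetical, and reading or writing the compressed files is left out.

```python
from concurrent.futures import ThreadPoolExecutor
import statistics

def extract_features(lightcurve):
    """Placeholder feature extractor: mean and standard deviation."""
    return (statistics.fmean(lightcurve), statistics.pstdev(lightcurve))

def process_folder(item):
    """Compute features for every lightcurve in one folder (field)."""
    folder, lightcurves = item
    return folder, {name: extract_features(lc) for name, lc in lightcurves.items()}

def process_all(folders, workers=4):
    """Process folders concurrently; one feature table per folder."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_folder, folders.items()))
```

For CPU-bound feature extraction in CPython, `concurrent.futures.ProcessPoolExecutor` is a drop-in alternative that avoids contention on the interpreter lock; threads are used here only to mirror the description in the text.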
5.2 MACHO dataset
MACHO was a survey which observed the sky from July 1992 to 1999 to detect microlensing events produced by Milky Way halo objects. Several tens of millions of stars were observed in the Large Magellanic Cloud (LMC), Small Magellanic Cloud (SMC) and Galactic bulge (Alcock, 2000).
For the MACHO dataset we built a training set composed of 3969 non-variable stars, 127 Be stars, 78 Cepheids, 193 eclipsing binaries, 288 RR Lyrae, 574 microlensing events, 359 long-period variables, and 58 quasars. We get the variable stars from the list of known MACHO variable sources extracted from SIMBAD’s MACHO variable catalog (Alcock, 2001; http://vizier.ustrasbg.fr/vizbin/VizieR?source=II/247) and also from several other literature sources (Alcock, 1997a, b; Wood, 2000; Keller et al., 2002; Thomas, 2005). To get the non-variable stars, we randomly chose a subset of MACHO lightcurves from a few MACHO LMC fields and removed all the known MACHO variables from the subset.
Each lightcurve is described as a feature vector which contains 28 features, 14 features for band B and 14 features for band R as described in section 4.
Figures 7 and 8 show the training set projected on a two-variable feature space. We can see that and features show separations between two groups of classes: i) non-variables, Cepheids and Eclipsing Binaries, and ii) quasars, Microlensing events, LPVs and Be stars. Combining and we can see a cluster of quasars, which overlaps with some of the Be stars, non-variables, Microlensing events and long-period variables, but separates quasars very well from Cepheids, Eclipsing Binaries and most of the non-variables. Projecting on and we can see that quasars separate from LPVs, Cepheids, Eclipsing Binaries, most of the Be stars, most of the Microlensing events and most of the non-variables. The biggest overlap is with Microlensing events.
By examining these projections we can see that quasars are clustered in high values of , with higher values compared to Eclipsing Binaries, Cepheids and RR Lyrae. is very good to separate quasars from non-variables, and also from Cepheids, RR Lyrae and Eclipsing Binaries. is not a good feature to separate quasars from Microlensing events, Be stars and LPVs, but combining with B-R we get a strong separation between them.
Table 2 shows comparative results among different classification models. We included a Support Vector Machine, Random Forest and Random Forest boosted with AdaBoost. In each case the classifier is tuned with the optimal set of parameters.
Model | SVM | SVM | RF | RF | AB+RF | AB+RF
CAR features | No | Yes | No | Yes | No | Yes
F-score | 0.787 | 0.824 | 0.826 | 0.841 | 0.844 | 0.877
After we select the model and fit it to the training set, we run it on the whole MACHO dataset (about 40 million lightcurves), from which we get 2551 quasar candidates. We crossmatch our candidates with the 2566 candidates and the 663 strong candidates of our previous work (Kim et al., 2011b), getting 1148 and 494 matches respectively.
Figure 9 shows some of the new candidates we find that are not in the previous list of MACHO candidates in Kim et al. (2011b).
There are some cases where the model confuses a periodic star with a quasar. Figure 10 shows one example of this case.
To analyze the distribution of predicted quasars in the feature space we show some projections of the training data plus the predicted quasars. Figures 11 and 12 show the distribution of predicted quasars, training quasars and all the other classes of stars. As in the EROS2 case, we can see that in many cases the predicted quasars show distributions similar to those of the training quasars. There are some cases where a large portion of the predicted quasars spreads out of the concentrated cluster of training quasars, for example, combining and B-R.
6 Summary
In this work we present a new list of candidate quasars from the MACHO and EROS2 datasets. This new list is obtained using a new model that uses continuous auto regressive features plus time series features to feed a boosted version of the Random Forest classifier (Breiman, 2001). With this model we obtain a list of 1160 candidates for the EROS2 dataset and 2551 candidates for the MACHO dataset. We crossmatch our MACHO candidates with the old list of candidates from Kim et al. (2011b) and get 1148 matches. We crossmatch our EROS2 candidates with the list of 663 MACHO strong candidates in Kim et al. (2011b). From that list, only 332 objects exist in the EROS2 dataset, and we find 131 matches between our EROS2 candidates and those 332 objects (see table 3). We show that using boosted Random Forest with CAR(1) features improves the fitting of the model to the training set in both the EROS2 and MACHO datasets.
We show that quasars are well separated from many other kinds of variable stars using CAR(1) features combined with time series features. We also show that, when CAR(1) features are added, SVM, Random Forest and Boosted Random Forest all improve their training accuracy. There are some challenges to overcome in future work, such as the confusion of some periodic stars with quasars. We notice that about 25% of false positives correspond to periodic stars. We believe that adding a dedicated module to filter periodic stars can improve the results.
Previous Candidates MACHO (A) | Previous Strong Candidates MACHO (B) | New list of MACHO Candidates (C) | List of EROS2 Candidates (D)
2566 | 663 | 2551 | 1160
Matches between (A) and (C) | Matches between (B) and (C) | Objects from (B) Catalogued in EROS2 (E) | Matches between (E) and (D)
1148 | 491 | 332 | 131
Acknowledgments
This paper utilizes public domain data obtained by the MACHO Project,
jointly funded by the US Department of Energy through the University
of California, Lawrence Livermore National Laboratory under contract
No. W-7405-Eng-48, by the National Science Foundation through the
Center for Particle Astrophysics of the University of California under
cooperative agreement AST-8809616, and by the Mount Stromlo and Siding
Spring Observatory, part of the Australian National University.
The analysis in this paper has been done using the Odyssey cluster
supported by the FAS Research Computing Group at Harvard.
This research has made use of the SIMBAD
database, operated at CDS, Strasbourg, France.
We thank everyone from the EROS2 collaboration for the
access granted to the database. The EROS2 project was
funded by the CEA and the CNRS through the IN2P3 and INSU institutes.
References
 Alcock (1997a) Alcock, C., et al. 1997a, ApJ, 491, L11+
 Alcock (1997b) —. 1997b, ApJ, 479, 119
 Alcock (2000) —. 2000, ApJ, 542, 281
 Alcock (2001) —. 2001, Variable Stars in the Large Magellanic Clouds, VizieR Online Data Catalog (http://vizier.ustrasbg.fr/vizbin/VizieR?source=II/247)
 Ansari (1996) Ansari, R. 1996, Vistas in Astronomy, 40, 519
 Belcher et al. (1994) Belcher, J., Hampton, J. S., & Wilson, G. T. 1994, Journal of the Royal Statistical Society. Series B (Methodological), 56, 141
 Bloom & Richards (2011) Bloom, J. S., & Richards, J. W. 2011, ArXiv eprints
 Bloom et al. (2011) Bloom, J. S., Richards, J. W., Nugent, P. E., Quimby, R. M., Kasliwal, M. M., Starr, D. L., Poznanski, D., Ofek, E. O., Cenko, S. B., Butler, N. R., Kulkarni, S. R., GalYam, A., & Law, N. 2011, ArXiv eprints
 Bower et al. (2006) Bower, R. G., Benson, A. J., Malbon, R., Helly, J. C., Frenk, C. S., Baugh, C. M., Cole, S., & Lacey, C. G. 2006, MNRAS, 370, 645
 Breiman (1996) Breiman, L. 1996, in Machine Learning, 123–140
 Breiman (2001) Breiman, L. 2001, in Machine Learning, 5–32
 Brockwell & Davis (2002) Brockwell, P., & Davis, R. 2002, Introduction to Time Series and Forecasting (Springer New York)
 Carliles et al. (2010) Carliles, S., Budavri, T., Heinis, S., Priebe, C., & Szalay, A. 2010, The Astrophysical Journal, 712, 511
 Cortes & Vapnik (1995) Cortes, C., & Vapnik, V. 1995, Machine Learning, 20, 273
 Debosscher et al. (2007) Debosscher, J., Sarro, L., Aerts, C., Cuypers, J., Vandenbussche, B., Garrido, R., & Solano, E. 2007, Astronomy and Astrophysics, 475, 1159
 Derue et al. (2002) Derue, F., Marquette, J.B., Lupone, S., Afonso, C., Alard, C., Albert, J.N., Amadon, A., Andersen, J., Ansari, R., Aubourg, É., Bareyre, P., Bauer, F., Beaulieu, J.P., Blanc, G., Bouquet, A., Char, S., Charlot, X., Couchot, F., Coutures, C., Ferlet, R., Fouqué, P., Glicenstein, J.F., Goldman, B., Gould, A., Graff, D., Gros, M., Haıssinski, J., Hamilton, J.C., Hardin, D., de Kat, J., Kim, A., Lasserre, T., Le Guillou, L., Lesquoy, É., Loup, C., Magneville, C., Mansoux, B., Maurice, É., Milsztajn, A., Moniez, M., PalanqueDelabrouille, N., Perdereau, O., Prévot, L., Regnault, N., Rich, J., Spiro, M., VidalMadjar, A., Vigroux, L., Zylberajch, S., & EROS Collaboration. 2002, Astronomy and Astrophysics, 389, 149
 Dietterich (1995) Dietterich, T. 1995, ACM Computing Surveys, 27, 326
 Dietterich (2000) Dietterich, T. 2000, in Proceedings of the First International Workshop on Multiple Classifier Systems (Springer Verlag), 1–15
 Duda & Hart (1973) Duda, R., & Hart, P. 1973, Pattern Classification and Scene Analysis (John Willey & Sons)
 Ellaway (1978) Ellaway, P. 1978, Electroencephalography and Clinical Neurophysiology, 45, 302
 Freund & Schapire (1997) Freund, Y., & Schapire, R. 1997, Journal of Computer and System Sciences
 Hamadache (2004) Hamadache, C. 2004, PhD thesis, Université Louis Pasteur  Strasbourg I
 Heckman et al. (2004) Heckman, T. M., Kauffmann, G., Brinchmann, J., Charlot, S., Tremonti, C., & White, S. D. M. 2004, ApJ, 613, 109
 Jordan (1994) Jordan, M. I. 1994, Neural Computation, 6, 181
 Kaiser et al. (2002) Kaiser, N., Aussel, H., Burke, B. E., Boesgaard, H., Chambers, K., Chun, M. R., Heasley, J. N., Hodapp, K.W., Hunt, B., Jedicke, R., Jewitt, D., Kudritzki, R., Luppino, G. A., Maberry, M., Magnier, E., Monet, D. G., Onaka, P. M., Pickles, A. J., Rhoads, P. H. H., Simon, T., Szalay, A., Szapudi, I., Tholen, D. J., Tonry, J. L., Waterson, M., & Wick, J. 2002, in Society of PhotoOptical Instrumentation Engineers (SPIE) Conference Series, Vol. 4836, Society of PhotoOptical Instrumentation Engineers (SPIE) Conference Series, ed. J. A. Tyson & S. Wolff, 154–164
 Keller et al. (2002) Keller, S. C., Bessell, M. S., Cook, K. H., Geha, M., & Syphers, D. 2002, AJ, 124, 2039
 Keller et al. (2007) Keller, S. C., Schmidt, B. P., Bessell, M. S., Conroy, P. G., Francis, P., Granlund, A., Kowald, E., Oates, A. P., MartinJones, T., Preston, T., Tisserand, P., Vaccarella, A., & Waterson, M. F. 2007, Publications of the Astronomical Society of Australia, 24
 Kelly et al. (2009) Kelly, B. C., Bechtold, J., & Siemiginowska, A. 2009, 698, 895
 Kim et al. (2011a) Kim, D.W., Protopapas, P., Byun, Y.I., Alcock, C., Khardon, R., & Trichas, M. 2011a, 735
 Kim et al. (2011b) —. 2011b, ApJ, 735, 68
 Kim et al. (2012) Kim, D.W., Protopapas, P., Trichas, M., RowanRobinson, M., Khardon, R., Alcock, C., & Byun, Y.I. 2012, 747
 Lomb (1976) Lomb, N. R. 1976, Ap&SS, 39, 447
 Matter (2007) Matter, D. 2007, Science, 1
 Metropolis et al. (1953) Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., & Teller, E. 1953, Journal of Chemical Physics, 21, 1087
 Nelder & Mead (1965) Nelder, J., & Mead, R. 1965, Computer Journal, 7, 308
 Plamondon & Srihari (2000) Plamondon, R., & Srihari, S. 2000, Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22, 63
 Quinlan (1993) Quinlan, J. 1993, C4.5: programs for machine learning (Morgan Kaufmann Publishers Inc.)
 Rau et al. (2009) Rau, A., Kulkarni, S. R., Law, N. M., Bloom, J. S., Ciardi, D., Djorgovski, G. S., Fox, D. B., GalYam, A., Grillmair, C. C., Kasliwal, M. M., Nugent, P. E., Ofek, E. O., Quimby, R. M., Reach, W. T., Shara, M., Bildsten, L., Cenko, S. B., Drake, A. J., Filippenko, A. V., Helfand, D. J., Helou, G., Howell, D. A., Poznanski, D., & Sullivan, M. 2009, Publications of the Astronomical Society of the Pacific, 121, 1334
 Richards et al. (2011) Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., CrellinQuick, A., Higgins, J., Kennedy, R., & Rischard, M. 2011, The Astrophysical Journal, 733
 Rumelhart et al. (1986) Rumelhart, D., Hinton, G., & Williams, R. 1986, Learning internal representations by error propagation (MIT Press), 318–362
 Scargle (1982) Scargle, J. D. 1982, ApJ, 263, 835
 Stetson (1996) Stetson, P. B. 1996, PASP, 108, 851
 Thomas (2005) Thomas, C. L., e. a. 2005, ApJ, 631, 906
 Tisserand et al. (2007) Tisserand, P., Le Guillou, L., Afonso, C., Albert, J. N., Andersen, J., Ansari, R., Aubourg, É., Bareyre, P., Beaulieu, J. P., Charlot, X., Coutures, C., Ferlet, R., Fouqué, P., Glicenstein, J. F., Goldman, B., Gould, A., Graff, D., Gros, M., Haissinski, J., Hamadache, C., de Kat, J., Lasserre, T., Lesquoy, É., Loup, C., Magneville, C., Marquette, J. B., Maurice, É., Maury, A., Milsztajn, A., Moniez, M., PalanqueDelabrouille, N., Perdereau, O., Rahal, Y. R., Rich, J., Spiro, M., VidalMadjar, A., Vigroux, L., Zylberajch, S., & EROS2 Collaboration. 2007, Astronomy and Astrophysics, 469, 387
 Trichas et al. (2009) Trichas, M., Georgakakis, A., RowanRobinson, M., Nandra, K., Clements, D., & Vaccari, M. 2009, MNRAS, 399, 663
 Trichas et al. (2010) Trichas, M., RowanRobinson, M., Georgakakis, A., Valtchanov, I., Nandra, K., Farrah, D., Morrison, G., Clements, D., & Waddington, I. 2010, MNRAS, 405, 2243
 Wachman et al. (2009) Wachman, G., Khardon, R., Protopapas, P., & Alcock, C. 2009, in Lecture Notes in Computer Science, Vol. 5782, Machine Learning and Knowledge Discovery in Databases, ed. W. Buntine, M. Grobelnik, D. Mladenic, & J. ShaweTaylor (Springer Berlin / Heidelberg), 489–505
 Wang et al. (2010) Wang, Y., Khardon, R., & Protopapas, P. 2010, in Lecture Notes in Computer Science, Vol. 6323, Machine Learning and Knowledge Discovery in Databases (Springer Berlin / Heidelberg), 418–434
 Wood (2000) Wood, P. R. 2000, Publications of the Astronomical Society of Australia, 17, 18
 Xu et al. (1992) Xu, L., Krzyzak, A., & Suen, C. 1992, Systems, Man and Cybernetics, IEEE Transactions, 22, 418
 Zhao et al. (2003) Zhao, W., Chellappa, R., Phillips, P. J., & Rosenfeld, A. 2003, ACM Comput. Surv., 35, 399