Machine learning for transient discovery in Pan-STARRS1 difference imaging.
Abstract
Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images that are produced by most major ground-based time domain surveys with large format CCD cameras. This dependence on humans to reject ‘bogus’ detections is unsustainable for next generation all-sky surveys, and significant effort is now being invested to solve the problem computationally. In this paper we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of 32000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20×20 pixel stamp around the centre of each candidate. This differs from previous work in that it works directly on the pixels rather than relying on catalogued parameters or domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performances are tested on a held-out subset of 25% of the training data. We find the best results from the random forest classifier and demonstrate that by accepting a false positive rate of 1%, the classifier initially suggests a missed detection rate of around 10%. However we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6%.
keywords:
methods: data analysis, methods: statistical, techniques: image processing, surveys, supernovae: general
1 Introduction
Current transient surveys such
as Pan-STARRS1 (PS1) (Kaiser et al., 2010), PTF (Rau et al., 2009), LSQ (Baltay et al., 2013),
SkyMapper (Keller et al., 2007) and CRTS (Drake et al., 2009) are efficient
discoverers of astrophysical transients. To make these surveys possible
it has become necessary to automate every step in the data pipeline
including data collection, archiving and reduction. A major goal for
time-domain astrophysics is early detection and rapid follow-up to
enable complete data sets for transients. Artefact rejection has become
the bottle-neck between fast transient detection and our ability to feed
these targets to follow-up surveys such as PESSTO (Smartt et al., 2013) and
PTF for early classification. Current artefact rejection
typically involves deriving some set of parameters from the image data of
individual detections and thresholding each parameter, only promoting
those detections that pass the thresholds to humans for
verification.
The numbers of detections that must be scanned by humans is still on
the order of hundreds of objects each night with a high false positive
rate. The processing artefacts produced are a
result of many factors such as saturated sources, convolution issues and
detector defects amongst others, and to a large extent are common across
all surveys. For the next generation of surveys we cannot expect humans
to remain involved in this process of artefact rejection to the same
extent, where for example we expect on the order of 10^6 transient
detections per night from LSST (http://www.lsst.org/lsst/).
Significant effort has been devoted to this problem in
anticipation of these next generation surveys, and to enable rapid turn
around from detection to classification for current surveys. Machine
learning techniques have been used to take advantage of the large
amounts of data gathered by these surveys to train a classifier that can
distinguish real astrophysical transients from artefacts or ‘bogus’
detections. Examples include Donalek et al. (2008) for the Palomar-Quest
survey, Romano, Aragon, &
Ding (2006) for SNFactory, and
Bailey et al. (2007) and du Buisson et al. (2014) for SDSS. PTF have
demonstrated the ability to efficiently characterise detections and
initiate rapid follow-up, see Gal-Yam et al. (2014) for example, where the
problem of real-bogus classification has been addressed by the work of
Bloom et al. (2012) and Brink et al. (2013). While these studies do achieve high
levels of performance, the parameters chosen to represent the images are
often dependent on the specific implementation and strategy of the
individual surveys.
In this paper we investigate a simple
representation of the images by using the pixel intensities in a region
around a detection in a single difference image. This choice of
parameterisation is independent of other aspects of the survey, and is
therefore applicable to any survey performing difference imaging while
also lending itself to implementation much earlier in the data
processing pipeline (potentially at the source extraction stage). We
begin by outlining the real-bogus problem in the context of PS1 in
Section 2, followed by a description of our training set
and image parameterisation in Section 3. In
Section 4 we discuss the various machine learning
algorithms we investigate, outline how we select the optimum classifier,
and report its performance compared with previous work. We continue in
Section 5 with some further analysis to help
understand how we expect the classifier to perform on a live data stream.
Finally we summarise our results and conclude in
Section 6.
2 PS1 and the Problem of Real-Bogus Classification
The Pan-STARRS1 system comprises a 1.8 m primary
mirror (Hodapp et al., 2004) and a field-of-view 3.3 deg in diameter imaged by 60
4800×4800 pixel detectors, constructed from 10 μm pixels
subtending 0.258 arcsec (for more details, see Magnier et al. (2013)). The
PS1 filter system consists of 5 filters, g, r, i and
z, similar to SDSS griz (York et al., 2000), with the addition of
y, which extends redward of z. The system is described in
detail by Tonry et al. (2012b). The PS1 Science Consortium (PS1SC) operates
the PS1 telescope performing 2 major surveys. The Medium Deep Survey
(MDS) (Tonry et al., 2012a) is allocated 25% of observing time for high
cadence observations of 10 fields, each the size of the PS1
field-of-view. The wide-field 3π survey, with 56% of observing time,
aims to observe the entire sky north of −30 deg declination with a
total of 20 exposures per year in all five filters for each
pointing.
In this paper we use images from the MDS. Each night 3-5 of the
MDS fields are observed. Each epoch is composed of eight dithered
exposures of 8 × 113 s in g and r, or 8 × 240
s in i, z and y, producing nightly stacked images of
904 and 1632 s duration (Tonry et al., 2012a). Each stack achieves
5σ depths of around 23.3 mag in g, r, i,
z and 21.7 mag in y. Images from the PS1 system are processed by the Image Processing
Pipeline (IPP; Magnier (2006)), on a computer cluster at the Maui
High Performance Computer Center (MHPCC). The images are passed through
a series of processing stages including device ‘detrending’, masking
and artefact location. Detrending includes bias correction and
flat-fielding using white light flat-field images from a dome screen, in
combination with an illumination correction obtained by rastering
sources across the field-of-view. After deriving an initial astrometric
solution, the flat-fielded images are then warped onto the tangent
plane of the sky using a flux-conserving algorithm. The plate scale for
the warped images was originally set at 0.200 arcsec per pixel, but
has since been changed to 0.25 arcsec per pixel in what is known
internally as the V3 tessellation for the MDS fields. Bad pixels are
masked on the individual images and carried through the stacking stage
to give the ‘nightly stacks’.
Difference imaging is performed on a daily basis by two
independent pipelines. IPP takes the nightly stacks and creates
difference images by subtracting a high-quality reference image from the
new data. Point spread function (PSF) photometry is then performed on
the difference images to produce catalogues of variables and transient
candidates (Gezari et al., 2012; McCrum et al., 2014). The Transient Science
Server (TSS) developed by the PS1SC ingests catalogues of detections of
residual flux in the difference images and presents potential transients
for human eyeballing.
In parallel, an independent set of
difference images are produced at the Centre for Astrophysics at Harvard
from the nightly stack images using the PHOTPIPE (Rest et al., 2014, 2005)
software. A custom-built reference stack is produced and subtracted from
the IPP nightly stack to produce an independent difference image. This
process is described in Gezari et al. (2010), Gezari et al. (2012),
Chomiuk et al. (2011), Berger et al. (2012), Chornock et al. (2013) and
Lunnan et al. (2013), and potential transients are visually inspected for
promotion to the status of transient alert. A cross-match between the
TSS and the PHOTPIPE transient streams is performed and agreement
between the detection and photometry is now excellent, particularly
after the application of uniform photometric calibration based on the
‘ubercal’ process (Schlafly et al., 2012; Magnier et al., 2013).
2.1 Artefacts in Difference Imaging
In this work we only use detections from IPP difference imaging and not the
independent PHOTPIPE detections. In Fig. 1 we show a modular
diagram of the IPP difference imaging process and the sources of the main types of
artefact.
The first source of bogus detections is chip defects, which
take various forms. After detrending the chip data are resampled and
geometrically warped to fit a unit area of sky that the data are
projected onto, known as a sky cell. Occasionally a transient will lie
on a region of the detector that when projected onto the sky falls on
overlapping sky cells. This results in duplicate warp images of the
same chip data, with the object lying close to one of the skycell edges. After warping,
sky cell edges, chip defects and saturated sources are masked. Masked
pixels in individual exposures are propagated through the stacking stage.
A kernel is derived to degrade a high quality template image
to match the nightly stack. The template is convolved with this
kernel and subtracted from the nightly stack. This series of steps
leads to a class of artefacts which we refer to as convolution
issues. In general these arise from the derived kernel not being
able to accurately match all sources in the template to those in the
nightly stack. This causes problems with bright sources where the
kernel is unable to fit the entire PSF of the detection in the nightly
stack image. These artefacts appear as high signal-to-noise (S/N) PSFs but with darker rings in the
wings; an example is shown in the bottom panel of
Fig. 4. We call these unclean subtractions. The flux in
these detections is probably due to a bright stellar variable. Identifying variable stars (and AGNs) is quite a
different problem to detecting transients and we have chosen not to
try to tailor our algorithms to do both. The efforts in this paper are
focused on finding transients, although inevitably stellar variables from
very faint host stars are detected. Hence we discard these bright stellar
variables that appear in the difference images as they are straightforward
to identify. We find these detections make up around 10% of the bogus
detections.
The same convolution issues can lead
to poor host galaxy subtraction, where an inadequately convolved host
can be over or under subtracted leaving a pattern of positive
and negative flux. This makes it difficult to disentangle any potential real
detections. The third convolution issue we highlight in
Fig. 1 arises when point-like sources in the template image are
broader than that of the nightly stack resulting in an over subtraction in the wings of the source in
the difference image. This happens when observing conditions
have been particularly good and the nightly stack is of higher quality than
the template image (this is not a frequent occurrence). The final artefacts from the convolution and
subtraction stage are convolution
problems in the cores of faint galaxies, manifesting themselves as
faint nuclear transients and appearing as positive flux in the
difference image. Here the convolution step matches the morphology of
the faint galaxy in the template and nightly stacks well, however the
peak flux of the convolved template is lower than that in the nightly
stack. This results in the nucleus of the faint galaxy being
under subtracted leaving residual flux in the difference image. These
artefacts are the most difficult to identify by eye but are
distinguished by a narrower PSF than expected. It is not always clear if the
flux is due to real variability or an artefact of convolution, in any case these
targets could not be confidently selected as real transients for
follow-up. This highlights one of the major uncertainties in training
the algorithms — secure labelling of real and bogus objects, which we return to in
Sections 5.3
and 5.5.
Another source of artefacts arises during the source
extraction phase. Flux in the nightly stack
from diffraction spikes, for example, that
has no equivalent in the template image gets flagged as potential
transients. We refer to these as spurious detections in Fig. 1.
Our approach to date for removing these contaminants has been to
attempt to derive a set of filters based on image statistics derived for
each potential transient detection by IPP. These filters normally take
the form of threshold values for some parameters (see
Section 2.2). However, the parameter space is typically
large and the work required to manually develop the optimal set of
filters is impractical. Despite this our current hand-engineered checks
allow only a small fraction of the bogus images through. This
still produces on the order of a few hundred bogus objects each night
passing the cuts and being presented to human scanners for verification.
This is approaching the limit of what can comfortably be processed by
humans on a daily basis and clearly a solution needs to be found for the
next generation of survey.
Over the course of the last 3
years of the PS1 survey we have accumulated a large amount of data
associated with a few tens of thousands of astronomical sources that have
either been classified as real objects or artefacts using a combination
of the cuts detailed in Section 2.2 and human scanning.
This readily available data lends itself to data-mining where we hope to
use the historic data to improve on the current method of
real-bogus classification. In Section 2.3 we outline how
supervised learning can be applied to this archive of PS1 data in order
to construct a real-bogus classifier that can be applied to the nightly
stream of new data gathered from PS1 and future surveys. First we
describe the cuts we perform.

2.2 Cuts
Prior to ingesting detections from IPP difference imaging into a MySQL
database at Queen’s University Belfast (QUB) we perform pre-ingest cuts based on the detection of saturated, masked or suspected
defective pixels within the PSF area. Taking as a typical night 3rd
September 2013 (Modified Julian Date, MJD 56548), the 7 nightly stacks
produced 366267 detections (around 52000 detections per stack), and the
pre-ingest cuts rejected 94.88% of these detections.
The 18750
detections passing the pre-ingest cuts are associated with transient
candidates if there are two or more quality detections within the last
seven observations of the field, including detections in more than one
filter, and an rms scatter in the positions of less than 0.5 arcsec. Each
quality detection must be of more than 3σ significance
and have a Gaussian morphology (XY moments < 1.2). These post-ingest
cuts also include checks for convolution issues, proximity to bright
objects and ‘NaN’ values close to the centre of bright PSFs. 63% of the
detections that passed the pre-ingest cuts were rejected during the
post-ingest cuts. The remaining detections were promoted for human
screening, where 37% of the detections were deemed to be real. These
real transient candidates are cross-matched with catalogues of
astronomical sources in the MDS fields. We use our own MDS catalogue
and also extensive external catalogues (e.g. SDSS, GSC, 2MASS, NED,
Milliquas (http://quasars.org/milliquas.htm), Veron AGN, X-ray
catalogues) to make a contextual classification of supernova, variable
star, active galactic nuclei or nuclear transient. We also cross-match
with the Minor Planet Centre to reject asteroids, though most are
removed during the construction of the nightly stacks.
2.3 Supervised Learning for Classification
In general supervised learning entails learning a
model from a training set of data for which we provide the desired
output for each training example. For the purposes of designating a
detection as a real transient or a processing artefact, the desired output
for each image is discrete. In such cases the problem is a supervised
classification task for which there are a vast array of machine learning
algorithms. In Section 4 we discuss the algorithms
we try; however all such algorithms are trying to learn a model from the
training data that will allow them to map the input parameterisation of
each training example (see Section 3.2) to the desired
output or label, while at the same time ensuring the model
performs well on data not seen during the training phase. For building
a real-bogus classifier this is an obvious avenue to pursue as we have a
large sample of historical data for which we have labels provided by our
current cuts and also through human eyeballing.
In Fig. 2 we show a sample of both real and bogus
examples drawn at random from the training data. Often bogus
detections show a combination of the factors we describe in
Section 2.1 and typically this affects the centroiding
during the source detection stage.
3 Training Set and Feature Representation
As discussed in the previous section we must provide a labelled training set from which the classifier can learn to recognise the characteristics that can identify detections as being members of one of the classes: real or bogus. In order to learn a model that will generalise well to detections in new observations, it is important that detections in the training set are representative of all detections we expect to see. In practice this is easiest to achieve by providing the learning algorithm with the largest possible training set, indeed Brink et al. (2013) attribute much of their improvement in performance over Bloom et al. (2012) to using a training set with two orders of magnitude more training examples. In the remainder of this section we describe the compilation of the training set, starting with a description of our training example selection process and labelling.
3.1 Training Set
Over the past 3 years 1 million potential transients have been
catalogued in the MDS by the TSS. Approximately 8000 of these objects have been selected by humans as real transients and
promoted as potential targets for spectroscopic follow-up. As of the
end of the survey in May 2014, 515 transients had spectroscopic classifications.
The aggregate catalogue information for all objects extracted by IPP and
which pass the pre-ingest cuts described in
Section 2.2 are stored in a database at QUB. Individual detections are associated with an
object if they are spatially coincident within 0.5 arcsec. This
information is presented to humans in the form of
webpages (similar webpages are made public for the PS1 3π
Survey at http://star.pst.qub.ac.uk/ps1threepi/psdb/public/). The
webpages show all the photometric points produced by IPP in a
multi-colour lightcurve. The number of photometric detections
typically ranges from a few to a few dozen depending on the magnitude
and timescale of the transient objects (see Rest et al. (2014),
McCrum et al. (2014), Chomiuk et al. (2011) for examples of lightcurves). These webpages also present a subset of the image postage-stamps of the detections associated with an object (target image, reference image and difference image).
This subset contains the first detections of the object of which there
are always at least 2 (see Section 2.2) and up to 5
subsequent detections. Each object is then eyeballed by a human, those that appear to be real
transients are promoted as potential targets
for scientific follow-up, while artefacts are discarded.
Our training examples are drawn from the subset
of detections we choose to present on the human digestible webpages for each object, as detailed above (typically 2-3
but less than 7). The majority of real examples were taken from
detections of promoted objects with no spectroscopic classification. There is no
guarantee that all detections of a promoted object are necessarily
a result of good image subtractions. This prohibits simply assigning a
label of real to all individual detections associated with a promoted target. In order to
ensure that we have a secure, reliable and clean set of real detections for training, we inspected and individually labelled 4352 detections
(from 1919 different transients) as real,
discarding any artefacts from the training set. We augmented this
sample of real detections with data from 53 spectroscopically
confirmed supernovae (from Dec. 2012 to Jan. 2014) for which we used the complete set of
detections (31 detections per object on average). These were again manually checked to remove bogus
detections. We held out the first detections of all 53 SNe, which we
use for testing in Section 5.7 and all detections
of PS1-13avb, which we use in Section 5.6. This leaves an additional 1603 real training examples
bringing the total to 5955 real detections.
Over the course of the survey approximately 800000 objects
have been discarded as artefacts, providing on the order of 10^6
examples of bogus detections. We randomly sample
from the available bogus examples and aim for 4 times as many bogus
examples as real; this is similar to the proportions used by
Brink et al. (2013). Initial tests with classifiers showed that a
significant proportion of the false positives appeared to be clean
subtractions. We improved the purity of the bogus sample by examining
the randomly selected bogus detections and added any detections that looked like
real transient subtractions to the list of real examples (the effect of label
contamination is further discussed in Section 5.3).
This produced an extra 464 examples for the set of real detections resulting
in a final total of 6419. We then selected 4 times as many bogus images from
the remainder of the bogus examples we inspected, producing a sample of
25676 bogus detections.
The final training set contains 32095
training examples. We divide the training examples into 2 sets,
distributed as follows: 75% for training and cross validation, and 25%
for testing. The training examples are randomly shuffled prior to
splitting with the caveat that all detections on the same night of
a given object are included in the same set. This is to avoid detections
with almost identical statistics being in multiple sets and giving a
false impression of a classifier's performance. The construction of the
data set is summarised in Table 1. The label for each
training example is a 1 or 0, with 1 representing a label of real and 0
bogus.
The training set we have constructed is representative,
containing examples of detections from different chips, seeing conditions,
and filters, with various levels of S/N and examples
of all types of processing artefact.
Set | Real | Bogus | Total |
---|---|---|---|
Training | 4800 | 19271 | 24071 |
Test | 1619 | 6405 | 8024 |
Total | 6419 | 25676 | 32095 |
3.2 Feature Representation
Machine learning algorithms require a 1-dimensional (1-D) vector representation
of each training example, where each element of the vector corresponds
to some numeric data or feature that may be
useful to the algorithm for discerning examples belonging to each class.
Previous work in the area of real-bogus classification has focused on
using parameters contained in catalogues generated by the processing
pipeline and more complex features derived from that information to
represent the detections, see Table 1 from Brink et al. (2013) and Table 1 from
Romano, Aragon & Ding (2006).
The catalogue features available to
individual surveys depend on the implementation of their image
processing pipeline. When applying machine learning for real-bogus
classification to a new survey it may not be possible to calculate these
features based on the information available in the catalogues. There is
also potential to spend a lot of time deriving and testing ways to
combine the catalogue information that is available into features that
we hope capture the differences between real and bogus detections. Bogus
detections are the result of many factors and establishing a set of features
that can encapsulate them all is difficult. In contrast simply
representing the detections by their pixel intensity values requires no time
spent developing or tuning feature extractors. Previous work that
relies solely on the pixel data has proven effective for simple visual
classification tasks, such as handwritten digits (LeCun et al., 1998). For
more complex tasks or to boost performance much of this work has been
performed by learning a hierarchy of unsupervised features from the pixel
data (Coates, Lee, & Ng (2011); LeCun et al. (1998)). Establishing a firm benchmark
on the pixel intensity representation allows us to assess the potential
gains from applying these more complex methods and is the main focus of
this paper. Using this representation we expect the learning algorithm
to identify salient relationships between pixels for the classification
task. In the next section we discuss our choice of features and
continue in the following section by describing the preprocessing steps
we apply before training.
3.2.1 Feature Vector Construction
To
represent our training examples, we use the pixel data itself. For a
given training example, we construct its feature vector by selecting a
20×20 pixel area (corresponding to about 5 times the average
seeing of PS1) around the centre of what IPP considers a transient,
which we refer to as a substamp. The
1-D vector is constructed by shifting off each column of the substamp and
concatenating those columns together to produce a 400 element vector of
pixel intensity values.
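For concreteness, a minimal sketch of this construction (assuming the difference image is held as a 2-D NumPy array and the IPP centroid is available as integer pixel coordinates; the function name is hypothetical):

```python
import numpy as np

def make_feature_vector(diff_image, x_centre, y_centre, half_width=10):
    """Cut a 20x20 pixel substamp centred on the detection and flatten it
    column by column into a 400-element feature vector."""
    substamp = diff_image[y_centre - half_width:y_centre + half_width,
                          x_centre - half_width:x_centre + half_width]
    # order='F' concatenates the columns of the substamp, as described above.
    return substamp.flatten(order='F')
```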
In Fig. 3 we show
visualisations of these feature vectors along with the substamp from which
they were constructed for examples of real detections and for various levels
of S/N. In Fig. 4 we show detections labelled as bogus
with examples of different types of artefact. A learning algorithm will
learn to identify patterns in the feature vectors that are characteristic
of examples belonging to the two classes.
The choice of
feature representation is independent of the implementation of the rest
of the image processing pipeline and survey, with the assumption that
the pixel level data is easily accessible.
3.2.2 Feature Preprocessing
Aside from the image processing steps carried out by the pipeline, we carry out 2 additional transformations of the data. We first replace any ‘NaN’ pixel values with 0s. ‘NaN’ pixel values typically arise from masking or floating point overflows during image processing. We choose to replace these pixel values with 0 so as not to influence the next step in the preprocessing phase. As a second step we apply a feature normalisation function which allows classifiers to focus on relative pixel intensities and limits the effect of absolute brightness on the classifiers. We apply the following normalisation:
(1)
where x is a feature vector and σ(x) is the standard deviation of the pixel intensity values for that feature vector. This is the same normalisation function used by EyE (http://www.astromatic.net/software/eye; Bertin, 2001) and similar to that of Romano, Aragon & Ding (2006).
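A minimal sketch of these two preprocessing steps, assuming the normalisation amounts to scaling each feature vector by its own standard deviation σ(x); the published normalisation function may apply a further dynamic-range compression, so this is illustrative only:

```python
import numpy as np

def preprocess(feature_vector):
    """Replace 'NaN' pixel values with 0, then normalise the substamp so
    that classifiers see relative rather than absolute pixel intensities."""
    x = np.asarray(feature_vector, dtype=float)
    x = np.where(np.isnan(x), 0.0, x)      # NaNs from masking or overflows
    sigma = x.std()                        # sigma(x) of this substamp
    return x / sigma if sigma > 0 else x   # assumed form of the normalisation
```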
4 Optimisation of the Classification System
In order to achieve the best performance from
the machine learning algorithms discussed in the following sections it
is necessary to optimise the hyperparameters of each. This is done by a
process known as cross validation, which here amounts to a brute-force grid search of the
hyperparameter space, where a model is trained with the hyperparameters
selected at predetermined intervals within the space. The best
combination is selected by measuring the performance on a held-out
sample of the 24071 training examples.
Below we give a brief
introduction to each of the classifiers. We also point out the free
parameters that must be selected by cross validation and discuss this
process in depth in Section 4.4.1. To end this section
on optimisation we show the performance of each classifier on the
out of sample data in the test set.
4.1 Artificial Neural Networks
Artificial Neural
Networks (ANN) comprise a number of interconnected nodes arranged into a
series of layers. In this study we limit ourselves to a 3-layer ANN
(consisting of an input layer, a hidden layer and an output layer) as
those with more than one hidden layer need more careful training and
require more computational power (Hinton, Osindero, &
Teh, 2006). For our purposes we
train feed-forward ANNs with back-propagation and randomly initialised
weights, where the activation of each node is calculated with the
logistic (sigmoid) function.
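Purely as an illustrative stand-in for the custom implementation described below, a comparable single-hidden-layer network with logistic activations and L2 regularisation can be configured in scikit-learn (the values shown are placeholders to be set by cross validation):

```python
from sklearn.neural_network import MLPClassifier

# One hidden layer of 200 logistic (sigmoid) units; alpha plays the role of
# the regularisation parameter. This is not the implementation used in the
# paper, only an analogous off-the-shelf configuration.
ann = MLPClassifier(hidden_layer_sizes=(200,), activation='logistic',
                    alpha=1e-3, solver='lbfgs', max_iter=1000)
```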
By limiting many of the choices
for the structure of the ANNs we remove the need to select these
hyperparameters during the cross validation phase in
Section 4.4.1 which significantly reduces the complexity
of the space we have to search. This economy of computation comes at
the cost of not testing regions of the parameter space (e.g. other
activation functions) and restricting the representational power of the
ANNs by requiring a single hidden layer. We are however left with only
2 hyperparameters to choose, namely the number of nodes that make up the
hidden layer, n_h, and the regularisation parameter, λ, through
which we attempt to prevent overfitting. There is some suggestion
(Murtagh, 1991; Geva & Sitte, 1992) that the optimal number of nodes in the hidden
layer is n_h = 2n + 1, where n is the number of input features. In
our case n is fixed at 400 input features, suggesting we should train
ANNs with around 800 hidden nodes; however training such large networks is
beyond the scope of this work and we instead choose to test values of
n_h in the range 25-200.
We use our own vectorised
implementation of ANNs written in
python (https://www.python.org). The code relies on
numpy (http://www.numpy.org) for efficient array manipulations
and scipy (http://docs.scipy.org/doc/scipy/reference/index.html)
for optimisation of the objective function.
4.2 Random Forests
Random Forests (RFs) aim to classify examples by building many decision trees from bootstrapped (sampled with replacement) versions of the training data (Breiman, 2001). Classifications are then assigned based on the average of the ensemble of decision trees. Each individual tree is grown by randomly sampling a subset of the input features at each split and selecting the feature that best separates real examples from bogus, as measured by the Gini impurity. We use scikit-learn's (http://scikit-learn.org/stable/index.html) implementation of RFs, where we select hyperparameters by assigning values to the variables n_estimators, max_features and min_samples_leaf: the total number of trees in the ensemble, the number of features considered at each split, and the minimum number of examples that define a leaf, below which no further splitting is allowed. RFs provide the ability to estimate the importance of each feature, which we use in Section 5.2.
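A minimal sketch of configuring and training such a forest with scikit-learn on synthetic stand-in data (the hyperparameter values are examples, not the final choices):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 400))     # stand-in for preprocessed pixel vectors
y = rng.integers(0, 2, size=1000)    # stand-in for real (1) / bogus (0) labels

rf = RandomForestClassifier(n_estimators=100, max_features=25,
                            min_samples_leaf=1)
rf.fit(X, y)
# The 'hypothesis' used throughout is the predicted probability of the real
# class; in practice it is evaluated on a held-out fold, not the training data.
hypotheses = rf.predict_proba(X)[:, 1]
```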
4.3 Support Vector Machines
Support Vector Machines (SVMs) (Cortes & Vapnik, 1995) aim to find the hyperplane in the input feature space that optimally classifies training examples for linearly separable patterns, while simultaneously maximising the margin, the distance between the hyperplane and the training examples which lie closest to it, known as the support vectors. SVMs can be extended to non-linear patterns with the inclusion of a kernel, which transforms the original input data into a new parameter space. We again use scikit-learn's implementation of SVMs, where we choose the free parameters, namely the penalty parameter C (which plays a similar role to λ for the ANNs) and the kernel parameter gamma, which controls the local influence that support vectors have on the decision boundary. We only try SVMs with a Radial Basis Function (RBF) kernel, this being the most common choice, which again reduces the parameter space that must be searched.
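An equivalent scikit-learn configuration of the RBF-kernel SVM (C and gamma values taken from Table 2; probability=True is required to obtain hypothesis values comparable to the other classifiers):

```python
from sklearn.svm import SVC

# C penalises margin violations; gamma sets how local the influence of each
# support vector is on the decision boundary.
svm = SVC(kernel='rbf', C=3.0, gamma=0.01, probability=True)
```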
4.4 Model Selection
For each algorithm discussed above we need a method to choose the optimal combination of hyperparameters that will achieve the best performance for the classification task. In order to compare the relative performance of the different models we need some Figure of Merit (FoM). We use the FoM of Brink et al. (2013), which captures the essence of the problem we are trying to solve. The FoM is defined as the minimum Missed Detection Rate (MDR) (False Negative Rate) that gives a False Positive Rate (FPR) of 1%. That is, assuming we are willing to accept that 1% of the images deemed real by the classifier and promoted to human scanners will turn out to be bogus, what fraction of the real images would be discarded? With this we can select the model that would discard the fewest real images while 1% of images classified as real can be expected to be bogus.
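This FoM can be computed directly from the classifier hypotheses and the true labels; a minimal sketch (assuming labels of 1 for real and 0 for bogus, as defined in Section 3.1):

```python
import numpy as np

def figure_of_merit(y_true, hypotheses, fpr=0.01):
    """Return (MDR, decision boundary) at a fixed false positive rate.

    The boundary is the hypothesis value above which only a fraction `fpr`
    of the bogus examples lie; the FoM is the fraction of real examples
    that fall below that boundary (the missed detection rate)."""
    y_true = np.asarray(y_true)
    hypotheses = np.asarray(hypotheses)
    boundary = np.quantile(hypotheses[y_true == 0], 1.0 - fpr)
    mdr = np.mean(hypotheses[y_true == 1] < boundary)
    return mdr, boundary
```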
4.4.1 Cross Validation
When calculating
the FoM to compare the relative performance of models, it is important
that the measurement is made on data that the model has not inspected
during the training phase, otherwise we risk measuring the performance
on data that the model has overfit and report an FoM that we cannot
expect to achieve on out of sample data. To mitigate this effect we
split the data we designated for training in
Section 3.1 into 5 subsets or folds with
equal numbers of training examples. We then train each model on 4 of
these folds and use the fifth as a validation set to measure the
performance. The model is then retrained on 4 folds but a different
fold is held out. In total the model is trained 5 times with each fold
being held out once. We then average the results for the 5-folds and
choose the model that results in the best average FoM. A second
advantage is that for relatively small data sets where the composition
of the validation set may not be representative of the entire
population, by evaluating the performance on each fold in turn and then
averaging, we achieve a better estimate of the actual performance on the
entire data set.
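A sketch of this 5-fold procedure, using the figure_of_merit helper above with scikit-learn's StratifiedKFold; for brevity it omits the constraint, described in Section 3.1, that same-night detections of an object stay in the same fold:

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold

def cross_validated_fom(model, X, y, n_splits=5, fpr=0.01):
    """Average FoM over the folds; X and y are NumPy arrays of features and
    1/0 labels, and `figure_of_merit` is the helper sketched above."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    foms = []
    for train_idx, val_idx in folds.split(X, y):
        clf = clone(model).fit(X[train_idx], y[train_idx])
        scores = clf.predict_proba(X[val_idx])[:, 1]
        mdr, _ = figure_of_merit(y[val_idx], scores, fpr=fpr)
        foms.append(mdr)
    return float(np.mean(foms))
```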
In our case all 3 classifiers output a
prediction or hypothesis for each example. These hypotheses
can be thought of as the probability a given example
has of belonging to the class of real images, taking on values in the
range 0-1. A classifier regards detections with hypotheses close to 1 as
highly likely real transients, and those close to 0 as likely bogus. In
Fig. 5 we plot the distribution of hypothesis values for
a RF with n_estimators=100, max_features=25 and
min_samples_leaf=1 trained on 4 folds of the training set. The distribution
plotted shows the hypotheses for the held-out fifth fold. To assign a
label of real or bogus we must define a decision boundary; a hypothesis
value above which the classifier labels detections as real, otherwise detections
are labelled bogus. If the classifier has learnt a useful model it
should output detections labelled as bogus with a hypothesis below the
decision boundary and those labelled as real above the decision boundary
for the prediction to be correct. Bogus detections with predictions above
the decision boundary are False Positives and real detections with
hypotheses below the decision boundary are Missed Detections. For our
FoM the decision boundary is selected as the hypothesis value above
which only 1% of the bogus detections lie (dashed line in
Fig. 5). The FoM is the fraction of the detections labelled
as real that lie below this choice of decision boundary. During 5-fold
cross validation a hypothesis distribution is generated by predicting
hypotheses for the detections in each of the held-out folds.
In Fig. 6 we show an example of the 5-fold cross
validation process for an RF with max_features=25 and
min_samples_leaf=1. In
this example we vary the number of decision trees, n_estimators and plot a
Receiver Operator Characteristic (ROC) curve for each model. ROC curves
are produced by varying the decision boundary at which we assign a
prediction to a label of real or bogus and calculate the FPR and MDR
that decision boundary produces for the validation set. From the
example in Fig. 6 we see that selecting a value of
100 for n_estimators produces the best FoM of 0.167; this means that
an FPR of 1% produces an MDR of 16.7%. We also include 5% and 10% FPR
levels for reference. We repeated this process for the remaining RF
hyperparameters (max_features and min_samples_leaf). We also show an example of measuring the FoM on a data
set containing a significant proportion of the training data, labelled
as overfit in Fig. 6.
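Curves of this kind can be traced with scikit-learn's roc_curve, noting that the MDR is simply 1 − TPR (a minimal sketch):

```python
from sklearn.metrics import roc_curve

def mdr_fpr_curve(y_true, hypotheses):
    """Return the (FPR, MDR) pairs obtained by sweeping the decision
    boundary over all hypothesis values."""
    fpr, tpr, _ = roc_curve(y_true, hypotheses)
    return fpr, 1.0 - tpr
```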
By replicating this
process for both ANNs and SVMs we were able
to select the optimal set of hyperparameters for each algorithm. In the second column of
Table 2 we show the optimal hyperparameters selected
for each algorithm by cross validation. By using the validation sets to
select the hyperparameters, there is a danger that the hyperparameters
will in effect have been fit to these sets. As a result, the FoM we
measure on the validation sets is not an unbiased measurement of the
performance we would expect to achieve on data not included in the
training folds. We deal with this in the next section.


4.4.2 Testing
Having selected the optimal
model for each of the algorithms we retrain these models with the entire
training set. This allows the models to learn from more examples. To
measure how well we expect the models selected by cross validation in
the last section to perform on unseen data we measure the FoM on the
test set, the 25% of the data we held back from both training and
validation. This provides an unbiased estimate of the performance. In
Table 2 we show the FoM measured on the test set.
Fig. 7a shows the ROC curve for each model in
Table 2. We find that the RF is the best
classifier with an FoM of 0.106.
Fig. 7b shows a
close-up of the measured FoM for the RF classifier, where the measured
FoM is shown along with the performance we would expect to achieve if we
were to allow 5% or 10% of the bogus detections through to human scanners.
For example, allowing the FPR to slip to 5% increases the completeness
to 97.6%. We also plot the hypothesis distribution for the detections in
the test set in Fig. 8.
The FoM shown in
Fig. 7b is the single best classifier we find in our
analysis. Using this classifier on a data stream of nightly
observations from PS1, we would expect that 99% of the detections promoted
to humans would be of real astrophysical transients while 10.6% of the
real detections would be rejected by the classifier. Brink et al. (2013) report
an MDR of 7.7% for their system. As a next step it is useful to
investigate the detections for which the classifier produces incorrect
predictions to see if there are systematic errors that the classifier
makes or if it is making correct predictions for detections that have been
labelled incorrectly during the construction of the training set.
Classifier | Model Parameters | Threshold | FoM |
---|---|---|---|
Artificial Neural Network | n_h = 200 | 0.547 | 0.233 |
Support Vector Machine (RBF) | C=3, gamma=0.01 | 0.788 | 0.196 |
Random Forest | n_estimators=1000, max_features=25, min_samples_leaf=1 | 0.539 | 0.106 |

5 Further Analysis
In this section we attempt to get a better sense of how we expect the classifier to perform in practice by characterising its performance under various conditions. We aim to identify trends in the kinds of detections for which the classifier makes incorrect predictions and investigate the effect that providing the classifier with incorrectly labelled training and test sets has on the measured FoM. However, we begin this section by looking at methods to boost performance by combining classifiers.
5.1 Combining Classifiers
As a last step
toward boosting performance we investigated a selection of methods to
combine the RF, SVM and ANN from
Table 2. The predictions of the 3 methods are
correlated; a candidate highly ranked by the RF is likely to also be
highly ranked by the other 2 classifiers, but there are still detections of
real transients that are discarded by only one of the classifiers.
From Fig. 9 there are 24 detections labelled as real that only
the RF wrongly rejects; it is these examples that we hope to recover by
combining classifiers.
We tried only a few of the simplest
combination strategies. First we simply classified a detection based on
the majority vote of the 3 classifiers. Second we assigned each detection a
hypothesis that was the mean of the hypothesis values output by each
classifier. This produced a new distribution of mean hypotheses, where
we again selected the decision boundary to produce the FoM. Finally we
trained a SVM using the 3 hypotheses for each detection as the features
representing that detection. In the end none of these methods outperformed
the RF classifier, though the performance was comparable (see
Table 3).
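A sketch of the two simplest combinations, given per-detection hypotheses from the three classifiers as arrays (the majority vote is assumed to use each classifier's own decision boundary from Table 2):

```python
import numpy as np

def mean_hypothesis(h_rf, h_svm, h_ann):
    """Average the three hypothesis values for each detection."""
    return np.mean(np.vstack([h_rf, h_svm, h_ann]), axis=0)

def majority_vote(h_rf, h_svm, h_ann, boundaries=(0.539, 0.788, 0.547)):
    """Each classifier votes real/bogus at its own threshold; majority wins."""
    votes = np.vstack([np.asarray(h_rf) >= boundaries[0],
                       np.asarray(h_svm) >= boundaries[1],
                       np.asarray(h_ann) >= boundaries[2]])
    return votes.sum(axis=0) >= 2
```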
This result is unsurprising given
that the classifiers are highly correlated and there is no guarantee
that these methods will outperform the best individual classifier
(Fumera & Roli, 2005). The RF is in itself an ensemble of classifiers (the
individual decision trees) and may already incorporate much of the gain
in performance we can expect from these simple methods.

Method | FPR | MDR |
---|---|---|
Majority Vote | 0.02 | 0.06 |
Mean Hypotheses | 0.01 | 0.12 |
Hypotheses as Features | 0.01 | 0.12 |
5.2 Relative Feature Importance
Random
Forests provide a built-in method to estimate the relative importance of
each feature to the classification (Breiman, 2001). By inspecting the
‘depth’ at which each feature is used as a decision node we can estimate
the relative importance of that feature, as those features used closer
to the top of the tree will contribute to the prediction of a larger
fraction of the training examples. The fraction of samples for which we
expect a feature to contribute to the classification can be used to
gauge its relative importance.
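Because the 400 features are simply the flattened 20×20 substamp, the importances can be folded back into image form for inspection (a minimal sketch):

```python
import numpy as np

def importance_map(fitted_forest, shape=(20, 20)):
    """Fold the per-feature importances of a fitted RandomForestClassifier
    back into substamp form; the importances sum to 1 over the 400 features."""
    # Use the same column-major order as the feature-vector construction.
    return np.asarray(fitted_forest.feature_importances_).reshape(shape, order='F')
```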
Fig. 10
shows the relative importance of each pixel determined from the training
set. The relative importance metric is normalised such that it sums to
1. The most important features have the highest values and as would be
expected are located in the centre of the image. The pixels on the
edges of the images are thought to be important for identifying many of
the bogus examples, where the object is not centred in the substamp and
often lies at the edge. For reference, if all features were equally
important they would each have a relative importance of 1/400 = 0.0025.
Fig. 10 may suggest some
redundancy in the features bounding the central pixels. It is expected
that omitting these features would have little effect on the performance
of our classifier as RFs are thought to be unaffected by the inclusion
of noise variables in the feature vector (Biau, 2010). In contrast
Brink et al. (2013) find that the MDR for their RF classifier improves by
4% by omitting noisy features using a backward feature selection
method. The effect of feature selection is an interesting area for
future work and attempts at optimisation.

5.3 Label Contamination
We took care
to eliminate label contamination in Section 3.1, by
visually checking and manually labelling each training example.
Nonetheless we expect that there remain some examples with incorrect
labels. In this section we employ similar methods to those in
Brink et al. (2013) to investigate the effect that label contamination has
on our ability to train and test the optimal RF model.
First
we investigate the effect of adding label contamination to the training
set. We add contamination by randomly selecting a subset of the detections
from the training set and flipping their labels. Those labelled as real
are now labelled as bogus and vice versa. In
Fig. 11 we plot the effect of randomly flipping
labels in the training set while leaving the original labels in the test
set untouched. The measured MDR appears fairly unaffected up to around
6% contamination. The approach of Brink et al. (2013) is robust to around
10% suggesting our method may be more susceptible to incorrectly
labelled training data.
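For reference, a sketch of how such contamination can be injected, flipping a randomly chosen fraction of the 1/0 labels:

```python
import numpy as np

def flip_labels(y, fraction, seed=0):
    """Return a copy of the label array with a random fraction flipped."""
    rng = np.random.default_rng(seed)
    y_flipped = np.asarray(y).copy()
    n_flip = int(fraction * len(y_flipped))
    idx = rng.choice(len(y_flipped), size=n_flip, replace=False)
    y_flipped[idx] = 1 - y_flipped[idx]
    return y_flipped
```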
Next we flip labels in the test set,
while using the original training set labels as they are. Given that
the RF has been trained with correctly labelled data, for the most part
we expect it to provide the correct labels for the images in the test
set. However, the flipped labels affect our ability to accurately
measure the FoM. Although the classifier makes sensible predictions,
when we compare these predictions to the flipped labels the otherwise
correct predictions are now evaluated as False Positives or Missed
Detections. Fig. 11 shows how the FoM is affected
as we increase the fraction of flipped labels; we see that, even at low
proportions, labelling noise in the test set can have a significant
effect.

5.4 Classification as a Function of Signal-to-Noise
To investigate the classifier performance as a function of Signal-to-Noise (S/N), we also follow a similar analysis to Brink et al. (2013). We plot the distribution of magnitudes for each example in the test set labelled as real in Fig. 12. We divide the examples into 11 bins, each spanning 1 magnitude in the range 13 to 24 mag. We then use the classifier to make a prediction for the examples in each bin and calculate the fraction of examples classified as bogus, which we take as an estimate of the classifier performance for objects at that level of S/N. For objects with magnitudes fainter than 20 there is a roughly 6% chance of missing real detections. Counterintuitively, the detection performance deteriorates for higher S/N objects. The number of examples of these cases is low, as typically such bright objects result in artefacts from saturation and subsequent masking or unclean subtractions. However, this can also be understood as an effect of our feature representation, where we are learning classifications based on the relative intensity of pixels across the substamp. The tendency to misclassify such detections could stem from a combination of the large relative intensity differences between pixels in these substamps, which often characterise artefacts, and the low numbers of high S/N images of real transients. This explanation is further supported by both the ANN and SVM, which also misclassify these objects, suggesting the issue is with the data and not a consequence of the realisation of the RF. In the next section we try to identify any relationships in the missed detections.
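A sketch of this binned estimate, given the reported magnitudes and the classifier's real/bogus decisions for the test-set examples labelled real (variable names are hypothetical):

```python
import numpy as np

def mdr_by_magnitude(mags, predicted_real, bin_edges=np.arange(13, 25, 1.0)):
    """Fraction of real examples classified as bogus in each magnitude bin."""
    mags = np.asarray(mags)
    predicted_real = np.asarray(predicted_real, dtype=bool)
    mdrs = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (mags >= lo) & (mags < hi)
        mdrs.append(np.mean(~predicted_real[in_bin]) if in_bin.any() else np.nan)
    return bin_edges, np.array(mdrs)
```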

5.5 Missed Detections
We inspected
the 172 missed detections (see Fig. 13) looking
for similarities that may explain why they were rejected. We found that these
missed detections are associated with 112 individual transients. Although we
took care to limit label contamination during the construction of the
training set, we identified some examples of obvious bogus detections mislabelled as
real that account for a small fraction (1%) of the missed
detections.
We also find about 29% of the missed detections appear to be a result of
faint galaxy convolution problems (see Section 2.1). These artefacts are difficult
to identify by eye and as a result have been incorrectly labelled as
real detections significantly contributing to the label contamination of the test set.

In Section
5.4 we discussed the high MDRs for bright sources. In
Fig. 14 we plot the hypothesis values for all detections
included in the histogram of Fig. 12 (i.e. all test set
detections that have been visually classified as real) against their
magnitude reported by IPP. A feature of the plot that stands out is the
cluster of sources with magnitudes brighter than 16 and hypotheses less
than 0.2. Magnier et al. (2013) report that for the PS1 3π survey,
saturation occurs at 13.5 for g, r, i,
13.0 for z and 12.0 in y. We were concerned
that these sources could be saturated, however to conclusively determine
this the individual images that are combined to make a nightly stack
would need to be examined. Instead we scaled the magnitudes reported by
Magnier et al. (2013) for PS1 3π exposures by the exposure times for the
individual images that make up a nightly stack and set a magnitude limit
of 16 mag. Objects brighter than this limit may have saturated cores in some
exposures and cannot safely be labelled as real. Some of these sources
on close inspection also show signs of the unclean subtractions we
highlighted in Section 2.1.
The detections brighter than 16 mag in Fig. 14 with
hypotheses above 0.2 are all associated with a single confirmed
supernova (SN), SN 2014bc (PS1-14xz)(Smartt et al., 2014). SN 2014bc is a
nearby (7.6 Mpc) Type-IIP located in the bright host galaxy NGC4258
(Messier 106). The transient lies close to the core of the host and as
a consequence the host has been poorly subtracted in the same location
in all the substamps. Detections of this object appear in both the
training and test set and although we ensured detections from the same night
must appear in the same set, the slowly evolving plateau has resulted in
detections with similar S/N and the same pattern of poor subtraction
appearing in both. We therefore suspect that test set detections
associated with this SN would have been rejected along with the other
sources brighter than 16 mags had similar detections not been included in
the training set. This raises the issue of potentially missing the
brightest transients which are often of interest and the cheapest to
classify spectroscopically, we return to this in
Section 6.
The high MDRs in the
magnitude range 16-20 still remain unexplained. To address this in
Fig. 15 we plot the number of examples of real transients
in each of the magnitude bins used in Section 5.4 for both
the training and test sets. The plot clearly shows the deficit in
training examples at magnitudes brighter than 20 and leads us to conclude
that we lack enough training examples of high S/N transients to allow
the classifier to learn a model that generalises well in this regime.
In Fig. 15 we overlay the relative size of the test set
compared with the training set in each bin. We selected the test set by
randomly sampling 25% of the data available for training. The small
fractions of test examples available between 16 and 18 mags combined
with the low numbers in the range 16-20 mags severely impact our ability
to accurately measure the MDR in this range.
Aside from the issues associated with high S/N, there are a few other SNe with detections that
show similar host galaxy subtraction problems to SN 2014bc. Some of these are true bogus detections
which we show in Fig. 13.
Approximately 9% of the missed detections are bogus detections
around poor host subtractions. We include detections of SN 2014bc
with this group in Fig. 13, though these
detections around 15th magnitude could
equally have been included with the bright sources.

Among the missed detections we also found substamps
where entire rows or columns along an edge of a substamp had been masked.
In the second panel of Fig. 3 we show an example
where the bottom 2 rows of pixels have been masked. These are
examples of the sky cell duplicates we describe in
Section 2.1. We were concerned the classifier was rejecting
these detections based on the masking. To see if this was
the case we identified all the examples of sky cell overlap among the
real test set detections, and found 20. As we ensured that detections from the
same night must be in the same set (training or test set), the
equivalent full 20×20 pixel substamps were also in the test set. We
compared the performance on the full pixel substamp with that of the
partially masked substamp and found that there is only 1 case where the
masked substamp was rejected while the full pixel substamp was kept. In this
instance a significant proportion of the substamp was masked (7 columns)
with the edge lying close to the PSF. The majority of the remaining
substamp pairs were both assigned the same classification. There are
however 6 pairs where the masked substamp was correctly classified as real,
but the full pixel substamp was rejected, showing the classifier does not
tend to reject detections with sky cell masking simply due to the masked
regions.
The reason for rejecting one detection from the pair over
the other is unclear as both substamps are constructed from the same data.
The 6 pairs for which the full pixel substamp was labelled bogus, but the
masked substamp was labelled real are all associated with a single
transient and may not apply to other sky cell pairs. For
these substamp pairs we found that the centroids always differed by 1 pixel
and were offset in the same direction. We tried shifting the centre of
the stamps to the same pixel, but found that this had little impact on
the hypothesis. In all cases the flux-conserving warping results in
equivalent pixels containing different counts, though the difference is
typically small (around 10% or less). Given the small number of cases where
the detections of a sky cell pair are assigned to different classes (7 in
total) and that these detections are associated with only two transients (6
associated with a single transient where the full pixel substamp is
rejected and 1 associated with a different transient where the masked
substamp is rejected), it is difficult to explain this behaviour, though
one explanation may be the small differences in pixel intensity values
perhaps combined with the different centroids.
The nuances of difference imaging make it difficult to
determine the ground truth label for each detection. Humans
often require additional information beyond that contained in the single difference image e.g. position relative to the host, or the number of bad/good pixels visible in the input image. The investigations above suggest that the
classifier is identifying subtle relationships and correctly identifying that many of the ‘missed detections’ are dubiously labelled
as real. We estimate that 45% (5% bright sources; 29% convolution problems; 9%
poor host subtractions; 1% obvious mislabelled artefacts) or about 77 of the missed detections
are not of high enough quality to be confidently labelled as real
detections. Therefore the RF classifier is not strictly getting them
wrong. The high proportions of these cases among the
missed detections do not hold true for the entire sample of real
detections in the test set, where for example faint galaxy convolution problems
are crudely estimated to account for no more than 7%. Removing such detections from our test set results
in an MDR around 6.2%. The MDR of our classifier is therefore in the
range 6.2 - 10.6% for a FPR of 1% but most likely toward the lower
end of this range. The remaining 95 detections are true missed detections and
appear to be misclassified by the classifier due to the high S/N issues
discussed above, poor seeing conditions, or very low S/N near the
detection limit.

5.6 Medium Deep Confirmed Supernovae
In order to demonstrate how we might expect the classifier to perform on a live data stream, we first use the classifier to make predictions for the supernova PS1-13avb, for which we held out all associated detections from both the test and training sets. This object has been spectroscopically classified as a Type Ib SN and has a well sampled lightcurve from about 18 days before maximum to around 106 days after maximum, including exposures in all 5 filters ranging in magnitude from around 23 to 20 mag (see Fig. 16, top panel). We selected this object for its high quality lightcurve and magnitude range, which are representative of the majority of objects discovered in the PS1 MDS. In the bottom panel of Fig. 16 we show the hypothesis for each epoch of this target. The plot shows that the hypothesis is consistently above the decision boundary of 0.539 (selected in Section 4.4.2), with the exception of the detection from MJD 56480.406, which shows the transient in the g band approaching the detection limit in this filter. The detection is displayed as an inset in Fig. 16 with its hypothesis of 0.506, showing the low S/N and deviation from a PSF-like morphology.

5.7 Early Detection
One of the major aims of recent supernova searches has been to detect the transient as soon after explosion as possible in order to trigger rapid follow-up and spectroscopically study phases of the transient's evolution that remain relatively unexplored (Gal-Yam et al. (2014); Cao et al. (2013)). To this end we carry out a simple test by using the classifier to make predictions for the first detections of all 53 classified SNe in our database. Again, we held these detections out from the training and test sets. In Table 4 we list the 53 SNe and the details of the first detections along with the hypothesis for each detection. The classifier correctly predicts all detections as real and, had it been running on a live data stream, would have promoted all objects to humans for follow-up.
6 Summary of Results and Conclusions
In this work we have constructed a data set of detections from the Pan-STARRS1
Medium Deep Survey. We used this data set to train a Random Forest
classifier to reject bogus detections of transients before they are
presented to humans as potential targets for follow-up. As the feature
representation of these detections we used the pixel intensity values of a
20×20 pixel substamp centred on the detection. This choice is
independent of the observing strategy and removes the need for careful
feature design and selection that requires specific domain knowledge.
The choice of features also makes this method applicable to any survey
performing difference imaging and requires no information from either
the template image or nightly stack. Using the Figure of Merit as defined in
Brink et al. (2013) we selected the decision boundary such that objects
classified as real should be 99% pure, which resulted in a best
estimate of the Missed Detection Rate of 6.2% (i.e. 93.8% complete), a result
competitive with previous work in this area. We further tested the classifier by
applying it to the lightcurve of a Type-Ib supernova and found only one
missed detection out of 74. The missed detection had low
signal-to-noise. In addition, to assess the
classifier's performance for early detection, we used the classifier to
make predictions for the first detections of 53 spectroscopically
confirmed SNe in our database and found none would have been rejected.
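For concreteness, a decision boundary pinned to a 1% false positive rate can be read off from validation-set scores roughly as follows. This is a minimal sketch in the spirit of the Brink et al. (2013) Figure of Merit rather than a reproduction of our exact selection procedure, and `hypothesis` and `is_real` are assumed validation-set arrays.
```python
import numpy as np

def choose_boundary(hypothesis, is_real, target_fpr=0.01):
    """Pick the decision boundary whose false positive rate on the bogus
    detections of a validation set is at most `target_fpr` (here 1%), and
    report the corresponding missed detection rate on the real detections.
    """
    bogus_scores = np.sort(hypothesis[~is_real])
    # Index above which at most `target_fpr` of the bogus detections remain
    cut = int(np.ceil((1.0 - target_fpr) * bogus_scores.size))
    boundary = bogus_scores[min(cut, bogus_scores.size - 1)]

    mdr = np.mean(hypothesis[is_real] < boundary)   # fraction of real detections missed
    return boundary, mdr

# Example usage (assumed arrays):
# boundary, mdr = choose_boundary(val_scores, val_labels.astype(bool))
```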
We discovered our classifier struggles to provide accurate classifications for the brightest sources (brighter than around 19 mag). Many of these
are associated with bright variable stars and have ringing patterns due to the kernel size definition, which leads to labelling difficulties.
Some are also close to the saturation limit which may cause the algorithms to misidentify real sources as bogus.
The mathematical problem of detecting bright variable stars in difference images is clearly quite distinct from finding low- and
moderate-flux transients in, or near, extended galaxies. Furthermore, the scientific goal of characterising the variability
of stellar sources is typically based on total flux measurements whereas finding explosive transients requires the resolved
and unresolved galaxies to be subtracted. Our methods are tailored toward the latter, and can certainly not be blindly
applied to uncover complete populations of variable stars or variable AGNs. With a goal of discovering extragalactic transients,
one is content to ignore stellar variables in a data stream, although we show here that the algorithms can sometimes
misclassify bright and high signal-to-noise explosive transients.
We also found that the MDR is
consistently higher for sources brighter than 20 mag, which we attribute
to the lack of training data in this range. We would expect that
providing more training examples that are representative of these
objects would reduce the MDR for brighter sources. In this paper we
have only used a sample of the data from the PS1 MDS,
but we have access to the full database of MDS transients, which could
be used to provide more training data. In addition, we also have data from PS1 3π difference imaging, which
could also be used to boost training numbers and build a classifier that
could perform real-bogus classification for both surveys. In our analysis
we have not considered the
case of asteroids as these are typically removed during the construction
of the nightly stacks in the MDS. Including the
PS1 3π data, where differencing is performed on individual
exposures, would allow us to test the performance of our method on asteroids.
It may also be more beneficial to apply this approach at the source extraction
stage. By working directly on the pixel data the classifier could
potentially learn which sources to extract and which to discard from a
difference image before any further processing of a potential detection
is performed.
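A sketch of what this could look like is shown below: a candidate position is scored directly from the pixels of the difference image by cutting a 20×20 substamp and passing its flattened intensities to a trained classifier. The classifier `clf` and the omitted edge handling are assumptions for illustration.
```python
import numpy as np

STAMP = 20   # substamp side length in pixels

def substamp_hypothesis(diff_image, x, y, clf):
    """Return the real-bogus hypothesis for a candidate at pixel (x, y),
    working directly on the difference-image pixels.

    Sketch only: `clf` is assumed to be a trained classifier whose features
    are the flattened substamp intensities; edge handling is omitted.
    """
    half = STAMP // 2
    stamp = diff_image[y - half:y + half, x - half:x + half]
    features = stamp.reshape(1, -1).astype(float)    # 400-element feature vector
    return clf.predict_proba(features)[0, 1]
```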
The
dependence of any machine learning approach to real-bogus classification
on large amounts of training data presents a serious problem for any new
survey. While many sources of processing artefacts are common across
surveys, differing pixel scales and seeing conditions prevent a classifier trained on one survey from being directly applied to another.
A solution would be to build a training set based on hand
labelled commissioning data and periodically retrain the classifier as
new data become available. Alternatively an initial classifier
trained on the limited data available early in a survey could be
improved on
by employing online learning, where the classifier is automatically updated as new labelled
data are gathered (Shalev-Shwartz, 2011; Saffari et al., 2009).
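As a rough illustration of the online-learning alternative, the sketch below uses scikit-learn's SGDClassifier.partial_fit as a generic stand-in for an incremental learner, since online random forests (Saffari et al., 2009) are not part of standard libraries; the function names and data arrays are assumptions.
```python
from sklearn.linear_model import SGDClassifier

# Stand-in incremental learner; the loss "log_loss" was named "log" in
# older scikit-learn releases.
clf = SGDClassifier(loss="log_loss", random_state=0)
CLASSES = [0, 1]   # 0 = bogus, 1 = real

def initial_fit(X_commissioning, y_commissioning):
    # First pass over the hand-labelled commissioning data
    clf.partial_fit(X_commissioning, y_commissioning, classes=CLASSES)

def nightly_update(X_new, y_new):
    # Fold in newly labelled detections as they become available
    clf.partial_fit(X_new, y_new)
```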
Future work will focus on combining
the remaining PS1 data available into a single training set that will
hopefully address the S/N issue. Other areas of research could include
the use of semi-supervised feature learning
(Raina et al., 2007) and deep learning (Coates et al., 2013) that retain all the
advantages of our current approach at the expense of being more
computationally demanding. However, the added representational power of
larger ANNs and the possibility of applying the unsupervised features
learnt from one survey to a variety of other surveys could mean this is
a promising domain to explore.
An efficient real-bogus
classifier is only one step toward rapid discovery and classification of
transients. With next generation surveys the stream of transients will
need to be prioritised based on scientific goals. Providing a
contextual classification (Djorgovski et al., 2012; Bloom et al., 2012) of the
transients detected would allow researchers to select the most promising
candidates for their research goals and will also be the focus of future
work.
Acknowledgments
The Pan-STARRS1 Survey has been made possible through contributions of the Institute for Astronomy, the University of Hawaii, the Pan-STARRS Project Office, the Max-Planck Society and its participating institutes, the Max Planck Institute for Astronomy, Heidelberg and the Max Planck Institute for Extraterrestrial Physics, Garching, The Johns Hopkins University, Durham University, the University of Edinburgh, Queen’s University Belfast, the Harvard-Smithsonian Center for Astrophysics, and the Las Cumbres Observatory Global Telescope Network, Incorporated, the National Central University of Taiwan, and the National Aeronautics and Space Administration under Grant No. NNX08AR22G issued through the Planetary Science Division of the NASA Science Mission Directorate. The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC Grant agreement no. [291222] (PI: S. J. Smartt) and the RCUK STFC grants ST/I001123/1 and ST/L000709/1. DEW acknowledges support from DEL in the form of a postgraduate studentship.
References
- Bailey et al. (2007) Bailey S., Aragon C., Romano R., Thomas R. C., Weaver B. A., Wong D., 2007, ApJ, 665, 1246
- Baltay et al. (2013) Baltay C. et al., 2013, PASP, 125, 683
- Berger et al. (2012) Berger E. et al., 2012, ApJL, 755, L29
- Bertin (2001) Bertin E., 2001, in Mining the Sky, Banday A. J., Zaroubi S., Bartelmann M., eds., p. 353
- Biau (2010) Biau G., 2010, ArXiv e-prints (1005.0208)
- Bloom et al. (2012) Bloom J. S. et al., 2012, PASP, 124, 1175
- Breiman (2001) Breiman L., 2001, Machine learning, 45, 5
- Brink et al. (2013) Brink H., Richards J. W., Poznanski D., Bloom J. S., Rice J., Negahban S., Wainwright M., 2013, MNRAS, 435, 1047
- Cao et al. (2013) Cao Y. et al., 2013, ApJL, 775, L7
- Chomiuk et al. (2011) Chomiuk L. et al., 2011, ApJ, 743, 114
- Chornock et al. (2013) Chornock R. et al., 2013, ApJ, 767, 162
- Coates et al. (2013) Coates A., Huval B., Wang T., Wu D., Catanzaro B., Ng A., 2013, in Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 1337–1345
- Coates, Lee & Ng (2011) Coates A., Lee H., Ng A. Y., 2011, in AISTATS 2011, Vol. 1001
- Cortes & Vapnik (1995) Cortes C., Vapnik V., 1995, Machine learning, 20, 273
- Djorgovski et al. (2012) Djorgovski S. G., Mahabal A. A., Donalek C., Graham M. J., Drake A. J., Moghaddam B., Turmon M., 2012, ArXiv e-prints (1209.1681)
- Donalek et al. (2008) Donalek C., Mahabal A., Djorgovski S. G., Marney S., Drake A., Glikman E., Graham M. J., Williams R., 2008, in American Institute of Physics Conference Series, Vol. 1082, American Institute of Physics Conference Series, Bailer-Jones C. A. L., ed., pp. 252–256
- Drake et al. (2009) Drake A. J. et al., 2009, ApJ, 696, 870
- du Buisson et al. (2014) du Buisson L., Sivanandam N., Bassett B. A., Smith M., 2014, ArXiv e-prints (1407.4118)
- Fumera & Roli (2005) Fumera G., Roli F., 2005, Pattern Analysis and Machine Intelligence, IEEE Transactions on, 27, 942
- Gal-Yam et al. (2014) Gal-Yam A. et al., 2014, Nature, 509, 471
- Geva & Sitte (1992) Geva S., Sitte J., 1992, Neural Networks, IEEE Transactions on, 3, 621
- Gezari et al. (2012) Gezari S. et al., 2012, Nature, 485, 217
- Gezari et al. (2010) Gezari S. et al., 2010, ApJL, 720, L77
- Hinton, Osindero & Teh (2006) Hinton G., Osindero S., Teh Y. W., 2006, Neural computation, 18, 1527
- Hodapp et al. (2004) Hodapp K. W., Siegmund W. A., Kaiser N., Chambers K. C., Laux U., Morgan J., Mannery E., 2004, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 5489, Ground-based Telescopes, Oschmann Jr. J. M., ed., pp. 667–678
- Kaiser et al. (2010) Kaiser N. et al., 2010, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 7733, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series
- Keller et al. (2007) Keller S. C. et al., 2007, PASA, 24, 1
- LeCun et al. (1998) LeCun Y., Bottou L., Bengio Y., Haffner P., 1998, Proceedings of the IEEE, 86, 2278
- Lunnan et al. (2013) Lunnan R. et al., 2013, ApJ, 771, 97
- Magnier (2006) Magnier E., 2006, in The Advanced Maui Optical and Space Surveillance Technologies Conference
- Magnier et al. (2013) Magnier E. A. et al., 2013, ApJS, 205, 20
- McCrum et al. (2014) McCrum M. et al., 2014, MNRAS, 437, 656
- Murtagh (1991) Murtagh F., 1991, Neurocomputing, 2, 183
- Raina et al. (2007) Raina R., Battle A., Lee H., Packer B., Ng A. Y., 2007, in Proceedings of the 24th international conference on Machine learning, ACM, pp. 759–766
- Rau et al. (2009) Rau A. et al., 2009, PASP, 121, 1334
- Rest et al. (2014) Rest A. et al., 2014, ApJ, 795, 44
- Rest et al. (2005) Rest A. et al., 2005, ApJ, 634, 1103
- Romano, Aragon & Ding (2006) Romano R. A., Aragon C. R., Ding C., 2006, in Machine Learning and Applications, 2006. ICMLA’06. 5th International Conference on, IEEE, pp. 77–82
- Saffari et al. (2009) Saffari A., Leistner C., Santner J., Godec M., Bischof H., 2009, in Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on, pp. 1393–1400
- Schlafly et al. (2012) Schlafly E. F. et al., 2012, ApJ, 756, 158
- Shalev-Shwartz (2011) Shalev-Shwartz S., 2011, Foundations and Trends in Machine Learning, 4, 107
- Smartt et al. (2014) Smartt S. J. et al., 2014, The Astronomer’s Telegram, 6156, 1
- Smartt et al. (2013) Smartt S. J. et al., 2013, The Messenger, 154, 50
- Tonry et al. (2012a) Tonry J. L. et al., 2012a, ApJ, 745, 42
- Tonry et al. (2012b) Tonry J. L. et al., 2012b, ApJ, 750, 99
- York et al. (2000) York D. G. et al., 2000, AJ, 120, 1579
Table 4. First detections of the spectroscopically confirmed SNe held out from the training and test sets, with the magnitude, filter and classifier hypothesis for each detection.
Name | Classification | First Detection (MJD) | Magnitude | Filter | Hypothesis |
---|---|---|---|---|---|
PS1-13duq | Ia | 56588.381 | 21.64 | i | 0.989 |
PS1-13bzp | Ia | 56478.276 | 21.33 | g | 0.98 |
PS1-13bzp | Ia | 56478.276 | 21.30 | g | 0.975 |
PS1-13abg | II-P | 56383.336 | 21.47 | z | 0.971 |
PS1-13bqg | Ia | 56443.287 | 20.90 | g | 0.968 |
PS1-13abg | II-P | 56383.336 | 21.55 | z | 0.963 |
PS1-14il | IIn | 56676.554 | 21.36 | z | 0.959 |
PS1-13vc | Ia | 56351.476 | 20.67 | z | 0.958 |
PS1-13abw | Ic | 56383.438 | 21.23 | z | 0.958 |
PS1-14ky | II | 56681.499 | 21.60 | z | 0.957 |
PS1-13ur | Ia | 56351.525 | 20.32 | z | 0.955 |
PS1-13eae | II | 56604.598 | 19.82 | y | 0.954 |
PS1-13alz | II-P | 56399.282 | 20.43 | i | 0.952 |
PS1-12cnr | Ia | 56283.340 | 20.33 | z | 0.948 |
PS1-13can | Ia | 56477.567 | 22.04 | z | 0.946 |
PS1-13cws | Ia | 56549.460 | 21.48 | z | 0.943 |
PS1-13ge | Ia | 56328.612 | 21.49 | g | 0.94 |
PS1-12cho | Ia | 56262.469 | 21.07 | z | 0.933 |
PS1-12cey | II | 56268.294 | 22.06 | g | 0.931 |
PS1-13bok | I | 56424.560 | 22.35 | r | 0.926 |
PS1-13djz | Ic | 56554.585 | 20.74 | z | 0.923 |
PS1-13a | Ia | 56289.280 | 21.20 | z | 0.919 |
PS1-13bit | Ia | 56420.548 | 22.69 | i | 0.918 |
PS1-13bqb | Ia | 56443.287 | 22.32 | g | 0.918 |
PS1-13djj | Ia | 56563.576 | 20.72 | g | 0.916 |
PS1-12bza | II-P | 56262.469 | 21.12 | z | 0.914 |
PS1-13brf | Ia | 56443.324 | 22.68 | r | 0.91 |
PS1-13hp | II-P | 56325.545 | 21.40 | g | 0.907 |
PS1-13adg | Ia | 56384.515 | 21.71 | r | 0.902 |
PS1-13awf | I | 56417.315 | 22.47 | i | 0.899 |
PS1-13atm | II-P | 56410.298 | 22.25 | z | 0.898 |
PS1-13cjb | II | 56501.436 | 22.63 | g | 0.896 |
PS1-12cho | Ia | 56262.469 | 21.08 | z | 0.884 |
PS1-13cai | Ia | 56477.567 | 21.99 | z | 0.882 |
PS1-13bni | II-P | 56420.548 | 23.07 | i | 0.873 |
PS1-13bog | Ia | 56417.341 | 22.57 | i | 0.867 |
PS1-12chw | Ia | 56262.313 | 21.21 | y | 0.857 |
PS1-13bqv | Ia | 56442.486 | 21.64 | z | 0.849 |
PS1-13djs | Ia | 56562.587 | 21.44 | z | 0.84 |
PS1-13aai/SN 2013au | Ia | 56370.421 | 19.29 | z | 0.833 |
PS1-13cuc/SN 2013go | Ia | 56536.587 | 19.03 | z | 0.814 |
PS1-13hs | I | 56328.515 | 21.97 | g | 0.801 |
PS1-13baf | II-P | 56414.521 | 22.54 | i | 0.801 |
PS1-13avb | Ib | 56414.521 | 21.75 | i | 0.796 |
PS1-13ayn | Ia | 56416.449 | 22.01 | r | 0.77 |
PS1-13aai/SN 2013au | Ia | 56370.421 | 19.36 | z | 0.735 |
PS1-13fo/SN 2013X | Ia | 56314.625 | 18.04 | y | 0.713 |
PS1-13bus | Ia | 56462.440 | 22.91 | i | 0.708 |
PS1-13brw | II-P | 56436.325 | 20.71 | y | 0.707 |
PS1-13hi | IIn | 56324.604 | 18.50 | z | 0.704 |
PS1-13bvc | Ia | 56469.349 | 21.90 | z | 0.698 |
PS1-13bzk | Ia | 56468.571 | 21.38 | y | 0.662 |
PS1-13abf | Ia | 56380.368 | 20.21 | y | 0.642 |
PS1-13arv | Ia | 56409.239 | 20.98 | y | 0.638 |
PS1-13wr | II-P | 56349.602 | 20.39 | y | 0.637 |
PS1-14xz/SN 2014bc | II-P | 56399.380 | 18.25 | i | 0.623 |
PS1-13wr | II-P | 56349.602 | 20.40 | y | 0.61 |
PS1-13buf | Ia | 56461.299 | 22.60 | z | 0.577 |