Subset Scanning Over Neural Network Activations

This work views neural networks as data generating systems and applies anomalous pattern detection techniques to that data in order to detect when a network is processing an anomalous input. Detecting anomalies is a critical component of multiple machine learning problems, including detecting adversarial noise. More broadly, this work is a step towards giving neural networks the ability to recognize an out-of-distribution sample.

This is the first work to introduce “Subset Scanning” methods from the anomalous pattern detection domain to the task of detecting anomalous input of neural networks. Subset scanning treats the detection problem as a search for the “most anomalous” subset of node activations (i.e. the highest scoring subset according to non-parametric scan statistics). Mathematical properties of these scoring functions allow the search to be completed in log-linear rather than exponential time while still guaranteeing the most anomalous subset of nodes in the network is identified for a given input. Quantitative results for detecting and characterizing adversarial noise are provided for CIFAR-10 images on a simple convolutional neural network. We observe an “interference” pattern where anomalous activations in shallow layers suppress the activation structure of the original image in deeper layers.

1 Introduction

“Awareness of ignorance is the beginning of wisdom.”

– Socrates

We wish to give neural networks the ability to know when they do not know. In addition, we want networks to be able to explain to a human “why” they do not know. Our approach to this ambitious task is to view neural networks as data generating systems and detect anomalous patterns in the “activation space” of their hidden layers.

The three goals of applying anomalous pattern detection techniques to data generated by neural networks are to: Quantify the anomalousness of activations within a neural network; Detect when anomalous patterns are present for a given input; and Characterize the anomaly by identifying the nodes participating in the anomalous pattern.

Furthermore, we approach these goals without specialized (re)training techniques or novel network architectures. These methods can be applied to any off-the-shelf, pre-trained model. We also emphasize that the adversarial noise detection task is conducted in an unsupervised form, without labeled examples of the noised images.

The primary contribution of this work is to demonstrate that non-parametric scan statistics, efficiently optimized over activations in a neural network, are able to quantify the anomalousness of a high-dimensional input into a real-valued “score”. This definition of anomalousness is with respect to a given network model and a set of “background” inputs that are assumed to generate normal or expected patterns in the activation space of the network. Our novel method measures the deviance between the activations of a given input under evaluation and the activations generated by the background inputs. A higher measured deviance results in a higher anomalousness score for the evaluation input.

The challenging aspect of measuring deviances in the activation space of neural networks is dealing with high-dimensional data on the order of the number of nodes in a network. Our baseline example in this work is a convolutional neural network, trained on CIFAR-10 images, with seven hidden layers containing 96,800 nodes. Therefore, the measure of anomalousness must be effective in capturing (potentially subtle) deviances in a high-dimensional space and be computationally tractable. Subset scanning meets both of these requirements (see Section 2).

The reward for addressing this difficult problem is an unsupervised, anomalous-input detector that can be applied to any input and to any type of neural network architecture. This is because neural networks rely on their activation space to encode the features of their inputs and therefore quantifying deviations from expected behavior in the activation space has universal appeal and potential.

We are not analyzing the inputs directly (i.e. the pixel-space) nor performing dimensionality reduction to make the problem more tractable. We are identifying anomalous patterns at the node-level of networks by scanning over subsets of activations and quantifying their anomalousness.

The next contributions of this work focus on detection and characterization of adversarial noise added to inputs in order to change the labels [\citeauthoryearSzegedy et al.2013, \citeauthoryearGoodfellow, Shlens, and Szegedy2014, \citeauthoryearPapernot and McDaniel2016].

We do not claim state of the art results and do not compare against the numerous and varied approaches in the expanding literature. Rather, these results demonstrate that the “subset score” of anomalous activations within a neural network is able to detect the presence of subtle patterns in high dimensional space. Also note that a proper “adversarial noise defense” is outside the scope of this paper.

In addition to quantifying the anomalousness of a given input to a network, subset scanning identifies the subset of nodes that contributed to that score. This data can then be used for characterizing the anomalous pattern. These approaches have broad implications for explainable A.I. by aiding the human interpretability of network models.

For characterizing patterns in this work, we analyze the distribution of nodes (identified as anomalous under the presence of adversarial noise) across the layers of the network. We identify an “interference” pattern in deeper layers of the network that suggests the structure of activations normally present in clean images has been suppressed, presumably by the anomalous activations from shallower layers. These types of insights are only possible through the subset scanning approach to anomalous pattern detection.

The final contribution of this work is laying out a line of research that extends subset scanning further into the deep learning domain. This current paper introduces how to efficiently identify the most anomalous unconstrained subset of nodes in a neural network for a single input. The subset scanning literature has shown that the unconstrained subset has weak detection power compared to constrained searches where the constraints reflect domain-specific knowledge on the type of anomalous patterns to be detected [\citeauthoryearNeill2012, \citeauthoryearSpeakman et al.2016].

The rest of this paper is organized in the following sections. Section 2 reviews subset scanning and highlights the Linear Time Subset Scanning property originally introduced in [\citeauthoryearNeill2012]. The section goes on to introduce our novel method that combines subset scanning techniques and non-parametric scan statistics in order to detect anomalous patterns in neural network activations. Detection experiments are covered in Section 3. We provide quantitative detection and characterization results on adversarial noise applied to CIFAR-10 images. Future methodological extensions and new domains of application are covered in Section 4 and finally, Section 5 provides a summary of the contributions and insights of this work.

2 Subset Scanning

Subset scanning treats pattern detection as a search for the “most anomalous” subset of observations in the data, where anomalousness is quantified by a scoring function F(S) (typically a log-likelihood ratio). Therefore, we wish to efficiently identify S* = arg max_S F(S) over all subsets S of the data. The particular scoring functions used in this work are covered in the next sub-section.

Subset scanning has been shown to succeed where other heuristic approaches may fail [\citeauthoryearNeill2012]. “Top-down” methods look for globally interesting patterns and then identify sub-partitions to find smaller anomalous groups of records. These approaches may fail when the true anomaly is not evident from global aggregates.

Similarly, “Bottom-up” methods look for individually anomalous data points and attempt to aggregate them into clusters. These methods may fail when the pattern is only evident by evaluating a group of data points collectively.

Treating the detection problem as a subset scan has desirable statistical properties for maximizing detection power but the exhaustive search is infeasible for even moderately sized data sets. However, a large class of scoring functions satisfy the “Linear Time Subset Scanning” (LTSS) property which allows for exact, efficient maximization over all subsets of data without requiring an exhaustive search [\citeauthoryearNeill2012]. The following sub-sections highlight a class of functions that satisfy LTSS and describe how the efficient maximization process works for scanning over activations from nodes in a neural network.

Nonparametric Scan Statistics

Subset scanning, and scan statistics more broadly, typically consider scoring functions that are members of the exponential family and make explicit parametric assumptions on the data generating process. To avoid these assumptions, this work uses nonparametric scan statistics (NPSS) that have been used in other pattern detection methods [\citeauthoryearNeill and Lingwall2007, \citeauthoryearMcFowland III, Speakman, and Neill2013, \citeauthoryearMcFowland, Somanchi, and Neill2018, \citeauthoryearChen and Neill2014].

NPSS require baseline or background data to inform their data distribution under the null hypothesis H0 of no anomaly present. Empirical p-values for an evaluation input (distinct from the background inputs) are computed by comparing its activations to this empirical baseline distribution. NPSS then searches for subsets of data in the evaluation input that contain the most evidence for not having been generated under H0. This evidence is quantified by an unexpectedly large number of low empirical p-values generated by the evaluation input.

In our specific context, the baseline data is the node activations of 9,000 clean CIFAR-10 images from the validation set. Each background image i generates an activation A_ij at each network node j; likewise, an evaluation image (which is potentially contaminated with adversarial noise) produces an activation a_j at each node.

For a given evaluation image, a collection of M background images, and a network with J nodes, we can obtain an empirical p-value for each node j. This is the proportion of activations from the background inputs, A_ij, that are larger than the activation a_j from the evaluation input at node j. We extend this notion to p-value ranges, which possess improved statistical properties [\citeauthoryearMcFowland III, Speakman, and Neill2013].

We use the following two terms to form a p-value range at node j:

pmin(j) = |{i : A_ij > a_j}| / (M + 1),
pmax(j) = (|{i : A_ij ≥ a_j}| + 1) / (M + 1).

The range is then defined as p(j) = [pmin(j), pmax(j)].

The empirical p-value for node j may then be viewed as a random variable uniformly distributed between pmin(j) and pmax(j) under H0 [\citeauthoryearMcFowland, Somanchi, and Neill2018].

As an example, consider a node j with background activations {0.1, 0.2, ..., 0.9} from M = 9 background images. An evaluation image may create an activation of 0.95 at node j and would be given the p-value range [0/10, 1/10] = [0, 0.1]. A different evaluation image may activate node j at 0.5 (tying one background activation) and would be given a p-value range of [4/10, 6/10] = [0.4, 0.6]. Finally, a third evaluation image may produce an activation of 0.05 at node j and would be assigned the range [9/10, 10/10] = [0.9, 1.0].
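This construction can be sketched in a few lines of Python (our own illustrative code, not the paper's implementation; tie handling follows the pmin/pmax definitions above):

```python
def p_value_range(background, a_eval):
    """Empirical p-value range for one node, given M background
    activations and one evaluation activation."""
    m = len(background)
    n_beat = sum(1 for a in background if a > a_eval)   # strictly larger
    n_tie = sum(1 for a in background if a == a_eval)   # ties
    p_min = n_beat / (m + 1)
    p_max = (n_beat + n_tie + 1) / (m + 1)
    return p_min, p_max

background = [i / 10 for i in range(1, 10)]   # 9 background activations 0.1..0.9
print(p_value_range(background, 0.95))  # (0.0, 0.1): anomalously high
print(p_value_range(background, 0.50))  # (0.4, 0.6): typical
print(p_value_range(background, 0.05))  # (0.9, 1.0): anomalously low
```
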

Intuitively, if an evaluation image is “normal” (its activations are drawn from the same distribution as the baseline images) then few p-value ranges will be extreme. The key assumption for subset scanning approaches is that, under the alternative hypothesis of an anomaly present in the data, at least some subset of the activations will appear extreme.

The p-value ranges from an evaluation input are processed by a nonparametric scan statistic in order to identify the subset of node activations that maximizes the scoring function F(S), as this is the subset with the most statistical evidence for having been affected by an anomalous pattern.

The general form of the NPSS score function is

F(S) = max_α φ(α, N_α(S), N(S)),

where N(S) represents the number of empirical p-value ranges contained in subset S and N_α(S) is the total probability mass less than the significance level α in these ranges.

The significance level α defines a threshold against which p-value ranges can be compared. Specifically, we calculate the portion of the range that falls below the threshold. This may be viewed as the probability that a p-value drawn from that range would be significant at level α, and is defined as

n_α(j) = (α − pmin(j)) / (pmax(j) − pmin(j)),

bounded between 0 and 1.

This generalizes to a subset of nodes S intuitively: N_α(S) = Σ_{j ∈ S} n_α(j) and N(S) = |S|.

Moreover, it has been shown that for a subset S consisting of N(S) empirical p-value ranges, E[N_α(S)] = α N(S) under H0 [\citeauthoryearMcFowland III, Speakman, and Neill2013]. Therefore, we assume an anomalous process will result in some subset S where the observed significance is higher than expected, N_α(S) > α N(S), for some α.

There are well-known goodness-of-fit statistics that can be utilized in NPSS [\citeauthoryearMcFowland, Somanchi, and Neill2018]; the most popular is the Kolmogorov-Smirnov test [\citeauthoryearKolmogorov1933]. Another option is Higher Criticism [\citeauthoryearDonoho and Jin2004].

In this work we use the Berk-Jones test statistic [\citeauthoryearBerk and Jones1979]: φ_BJ(α, N_α, N) = N · KL(N_α/N, α), where KL(x, y) = x log(x/y) + (1 − x) log((1 − x)/(1 − y)) is the Kullback-Leibler divergence between the observed and expected proportions of significant p-values. Berk-Jones can be interpreted as the log-likelihood ratio for testing whether the p-values are uniformly distributed on [0, 1] as compared to following a piecewise-constant alternative distribution, and it has been shown to fulfill several optimality properties.
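To make the scoring concrete, a minimal Python sketch of the Berk-Jones statistic over p-value ranges follows. The helper names and the small fixed α grid are our own simplifications for illustration, not the paper's implementation (the exact method would consider all distinct range endpoints as candidate thresholds):

```python
from math import log

def n_alpha(p_min, p_max, alpha):
    """Portion of a p-value range [p_min, p_max] below threshold alpha."""
    return min(1.0, max(0.0, (alpha - p_min) / (p_max - p_min)))

def kl(x, y):
    """KL divergence between Bernoulli(x) and Bernoulli(y); 0*log(0) = 0."""
    s = x * log(x / y) if x > 0 else 0.0
    s += (1 - x) * log((1 - x) / (1 - y)) if x < 1 else 0.0
    return s

def berk_jones(ranges, alpha):
    """N(S) * KL(N_alpha(S)/N(S), alpha); 0 when not enriched in low p-values."""
    n = len(ranges)
    obs = sum(n_alpha(lo, hi, alpha) for lo, hi in ranges) / n
    return n * kl(obs, alpha) if obs > alpha else 0.0

def score(ranges, alphas=(0.01, 0.05, 0.1, 0.2, 0.3, 0.5)):
    """Maximize the statistic over a small grid of alpha thresholds."""
    return max(berk_jones(ranges, a) for a in alphas)

anomalous = [(0.0, 0.1)] * 5          # five nodes with extreme activations
typical = [(0.3, 0.5), (0.6, 0.8)]    # two unremarkable nodes
print(score(anomalous) > score(anomalous + typical))  # diluting lowers the score
```
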

Efficient Maximization of NPSS

Although NPSS provides a means to evaluate the anomalousness of a subset of activations for a given input, discovering which of the 2^J possible subsets provides the most evidence of an anomalous pattern is computationally infeasible for large J. However, NPSS has been shown to satisfy the linear-time subset scanning (LTSS) property [\citeauthoryearNeill2012], which allows for efficient and exact maximization of F(S). For a pair of functions F(S) and G(j), representing the score of a given subset S and the “priority” of data record j respectively, we have a guarantee that the subset maximizing the score will be one consisting only of the top-k highest-priority records, for some k between 1 and J.

For NPSS, the priority of a node activation is G(j) = n_α(j), the proportion of its p-value range that is less than α, as introduced above.

Figure 1: A sample problem with 4 nodes. The nodes’ p-value ranges are shown along with two different thresholds. The tables show the priority of each node for each threshold and list the subsets that must be scored in order to guarantee the highest scoring subset is identified.

Figure 1 shows a sample problem of maximizing an NPSS scoring function over 4 example nodes and two different thresholds. The leftmost graphic shows the p-value ranges for the 4 nodes. We highlight one node whose range straddles the smaller threshold: for α = 0.2 its priority is 0.4, as 40% of its p-value range is below the threshold. A larger proportion of another node’s p-value range falls below 0.2, and that node therefore has a higher priority.

We emphasize that a node’s priority (and therefore, also the priority ordering) is induced by the threshold value. We demonstrate this by considering the larger threshold in the example as well: the priority of the highlighted node increases. Furthermore, the priority ordering of the nodes changes between the two α values; a node with lower priority than another node under the first threshold can have a higher priority under the second.

The next take-away from the example in Figure 1 is how the priority ordering over nodes creates at most 4 subsets (linearly many) that must be scored for each threshold in order to identify the highest-scoring subset overall. Recall the general form of NPSS scoring functions, F(S) = max_α φ(α, N_α(S), N(S)), where N(S) represents the number of empirical p-value ranges contained in subset S and N_α(S) is the total probability mass less than α in these ranges. When scoring a two-node subset with priorities 0.7 and 0.4 under α = 0.2, we evaluate φ(0.2, 1.1, 2), where 1.1 is the sum of 0.7 and 0.4 and 2 is the size of the subset. The scoring function is then quantifying how “anomalous” it is to observe 1.1 significant p-values when the expectation is α N(S) = 0.4.

We conclude the toy example by providing intuition behind the efficient maximization of scoring functions that satisfy LTSS. Notice that, for a given threshold, we never consider a subset that excludes a higher-priority node while including a lower-priority one. This is because we can guarantee a higher score by either including or removing the higher-priority node. The priority ordering over the nodes guides this inclusion sequence, which results in only linearly many subsets needing to be scored.
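The whole procedure can be sketched compactly (our own illustrative code; the example ranges are ours, not the ones in Figure 1, and a fixed α grid stands in for scanning all distinct range endpoints):

```python
from math import log

def kl(x, y):
    """KL divergence between Bernoulli(x) and Bernoulli(y); 0*log(0) = 0."""
    s = x * log(x / y) if x > 0 else 0.0
    s += (1 - x) * log((1 - x) / (1 - y)) if x < 1 else 0.0
    return s

def ltss_scan(ranges, alphas=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Maximize the Berk-Jones NPSS over subsets of p-value ranges.
    For each alpha, sort nodes by priority; LTSS guarantees the best
    subset is a prefix of that ordering, so only len(ranges) subsets
    are scored per threshold (assumes non-degenerate ranges)."""
    best_score, best_subset = 0.0, []
    for alpha in alphas:
        # priority of each node at this threshold
        prio = [(min(1.0, max(0.0, (alpha - lo) / (hi - lo))), j)
                for j, (lo, hi) in enumerate(ranges)]
        prio.sort(reverse=True)
        n_a = 0.0
        for k, (p, _) in enumerate(prio, start=1):
            n_a += p
            obs = n_a / k
            s = k * kl(obs, alpha) if obs > alpha else 0.0
            if s > best_score:
                best_score = s
                best_subset = sorted(j for _, j in prio[:k])
    return best_score, best_subset

# Four toy nodes; the first two have anomalously low p-value ranges.
score, subset = ltss_scan([(0.0, 0.05), (0.02, 0.1), (0.4, 0.6), (0.7, 0.9)])
print(subset)  # [0, 1] -> the two anomalous nodes
```
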

Network Details

Layer     Details   Number of Nodes in Layer
Conv 1    32, 3x3   32,768
Conv 2    32, 3x3   28,800
Pool 1    2x2       7,200
Conv 3    64, 3x3   14,400
Conv 4    64, 3x3   10,816
Pool 2    2x2       2,304
Flat      512       512
Table 1: Convolutional Neural Network Architecture

We briefly describe the training process and network architecture before discussing adversarial attacks. We trained a standard convolutional neural network on 50,000 CIFAR-10 training images. The architecture consists of seven hidden layers, summarized in Table 1. The first two layers are each composed of 32 3x3 convolution filters. The third layer is a 2x2 max pooling followed by dropout. The next three layers repeat this pattern but with 64 filters in each of the two convolution layers. Finally, there is a flattened layer of 512 nodes with dropout before the output layer. The model reached a top-1 classification accuracy of 74%, which is within expectation for a simple network.
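Assuming the 3x3 convolutions alternate between 'same' and 'valid' padding (our inference from the node counts, not stated explicitly in the text), the per-layer node counts in Table 1 can be reproduced from the 32x32x3 CIFAR-10 input shape:

```python
def conv_out(n, k, same):
    """Spatial size after a k x k convolution with stride 1."""
    return n if same else n - k + 1

size, counts = 32, []
size = conv_out(size, 3, same=True);  counts.append(size * size * 32)  # Conv 1
size = conv_out(size, 3, same=False); counts.append(size * size * 32)  # Conv 2
size //= 2;                           counts.append(size * size * 32)  # Pool 1
size = conv_out(size, 3, same=True);  counts.append(size * size * 64)  # Conv 3
size = conv_out(size, 3, same=False); counts.append(size * size * 64)  # Conv 4
size //= 2;                           counts.append(size * size * 64)  # Pool 2
counts.append(512)                                                     # Flat
print(counts, sum(counts))  # [32768, 28800, 7200, 14400, 10816, 2304, 512] 96800
```

The counts sum to the 96,800 nodes cited throughout the paper.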

ReLU activation functions did achieve slightly higher accuracy for the same architecture; however, an accurate model is not the focus of this paper, and with ReLU it can be difficult to identify an “extreme” activation because many of the activation values are 0 for a given input. This was evident even when the pre-activation value was anomalously high relative to the background pre-activations but still negative, and therefore 0 after the ReLU. It is possible to perform subset scanning with ReLU functions under additional constraints, for example, only allowing positive activations to be considered as part of the most anomalous subset. These constraints clouded the story, and we proceeded with a non-ReLU activation instead.

3 Detecting Adversarial Noise with Subset Scanning

Machine learning models are susceptible to adversarial perturbations of their input data that can cause the input to be misclassified [\citeauthoryearSzegedy et al.2013, \citeauthoryearGoodfellow, Shlens, and Szegedy2014, \citeauthoryearKurakin, Goodfellow, and Bengio2016b, \citeauthoryearDalvi et al.2004]. There are a variety of methods to make neural networks more robust to adversarial noise. Some require retraining with altered loss functions so that adversarial images must have a higher perturbation in order to be successful [\citeauthoryearPapernot et al.2015, \citeauthoryearPapernot and McDaniel2016]. Other detection methods rely on a supervised approach and treat the problem as classification rather than anomaly detection by training on noised examples [\citeauthoryearGrosse et al.2017, \citeauthoryearGong, Wang, and Ku2017, \citeauthoryearHuang et al.2015]. Another supervised approach uses activations from hidden layers as features for the detector [\citeauthoryearMetzen et al.2017].

In contrast, our work treats the problem as anomalous pattern detection and operates in an unsupervised manner without a priori knowledge of the attack or labeled examples. We also do not rely on training data augmentation or specialized training techniques. These constraints make it a more difficult problem, but also a more realistic one in the adversarial noise domain, as new attacks are constantly being created.

A defense proposed in [\citeauthoryearFeinman et al.2017] is more similar to our work. They build a kernel density estimate over background activations from the nodes in only the last hidden layer and report when an image falls in a low-density region of the estimate. This works well on MNIST but performs poorly on CIFAR-10 [\citeauthoryearCarlini and Wagner2017a]. Our novel subset scanning approach looks for anomalousness at the node level and throughout the whole network.

Attack              Subset Scan   All Nodes
FGSM  ε = 0.10      0.9997        0.9990
FGSM  ε = 0.05      0.9420        0.8246
FGSM  ε = 0.01      0.5201        0.4980
BIM   ε = 0.10      0.9913        0.9682
BIM   ε = 0.05      0.8755        0.6961
BIM   ε = 0.01      0.5177        0.4969
CW    (low conf.)   0.5005        0.5035
CW    (med. conf.)  0.5182        0.5020
CW    (high conf.)  0.5970        0.5230
Table 2: AUC Detection Results for Subset Scanning over Nodes vs. Scoring All Nodes

Training and Experiment Setup

For our adversarial experiments, we trained the network described in Section 2 on 50,000 CIFAR-10 images. We then took 9,000 of the 10,000 validation images and used them to generate the background activation distribution (under H0) at each of the 96,800 nodes in the network. The remaining 1,000 images were used to form two groups: “Clean” (C) and “Adversarial” (A), with Adversarial being a noised version of the Clean images for various attack types. Group C did not change for each attack type. We then score the 2,000 images contained in A and C. We emphasize that their measure of anomalousness is not between the noised images and their clean counterparts, but rather against the background distribution formed by the 9,000 validation images.

We do not calculate a score threshold above which any input is classified as noised. Rather, we report the area under the ROC curve, which measures how well the score separates the classes A and C. A value of 1.0 means the score perfectly separates the classes, and a value of 0.5 is equivalent to random guessing.
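This AUC is equivalent to the probability that a randomly chosen image from A receives a higher subset score than a randomly chosen image from C (ties counted as 1/2). A brute-force sketch of that computation (ours, not the paper's evaluation code):

```python
def auc(clean_scores, adv_scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    fraction of (adversarial, clean) pairs where the adversarial
    image scores higher, counting ties as one half."""
    pairs = 0.0
    for a in adv_scores:
        for c in clean_scores:
            if a > c:
                pairs += 1.0
            elif a == c:
                pairs += 0.5
    return pairs / (len(adv_scores) * len(clean_scores))

print(auc([1, 2, 3], [4, 5, 6]))  # 1.0 -> perfect separation
print(auc([1, 2, 3], [1, 2, 3]))  # 0.5 -> random guessing
```
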

Results and Discussion

Table 2 provides detection power results (as measured by area under the ROC curve) for a variety of attacks and their parameters. Two of the three attack types are the Fast Gradient Sign Method (FGSM) [\citeauthoryearGoodfellow, Shlens, and Szegedy2014] and its iterative extension (BIM) [\citeauthoryearKurakin, Goodfellow, and Bengio2016a]. These two attacks have an ε parameter which bounds the maximum amount any pixel in the original image may be changed. Note this value is in the [0,1]-scaled pixel space rather than [0,255].

The third attack was proposed by Carlini and Wagner (CW) [\citeauthoryearCarlini and Wagner2017b] and has a confidence parameter that can create “high confidence” attacks. Confidence is measured by the difference between the highest and second-highest logit values in the output layer (pre-softmax). All attacks were generated with the CleverHans package [\citeauthoryearPapernot et al.2016, \citeauthoryearPapernot et al.2018].

All detection results are for the Berk-Jones NPSS scoring function introduced in Section 2.

The first numeric column shows the detection power when subset scanning is used to identify the most anomalous subset of activations for the input under evaluation. The last column shows the detection power when all nodes are considered together rather than the highest scoring subset. Detection power is higher when scanning over subsets of nodes, which demonstrates early promise for expanding subset scanning methods in future work.

Although overall detection power is still low, we point out that our method has a higher probability of detecting the “higher confidence” CW attacks than the less confident versions. This is because the higher-confidence attacks require larger deviations in the activation space than their lower-confidence versions.

Some attack types did not have a 100% success rate. BIM failed to change the predicted label on 0.6% of images for ε = 0.01. FGSM failed to change the predicted label on 6.1%, 9.8%, and 18.8% of images for ε = 0.10, 0.05, and 0.01, respectively. CW failed to generate “high confidence” attacks for 17.3% of images. In all of these cases the failed attacks were removed before calculating detection power.

Figure 2: Area under the ROC curve results for scanning over seven individual layers of the network for multiple attacks. The number in parentheses under each layer name is the number of nodes in that particular layer.

In addition to subset scanning over the entire network we also performed separate searches over individual layers of the network. This may be thought of as a rudimentary constraint put on the search process, requiring subsets of nodes to be contained in a single layer. More sophisticated constraints are proposed in detail in Section 4.

Figure 2 shows the detection power of the Berk-Jones scoring function when scanning over individual layers of the network. We make two observations on these results, in increasing order of importance. The first is the increase in detection power when scanning over just the first pooling layer compared to scanning over subsets of nodes in the entire network. Changes to the pixel space are best captured by the pooling layer that condenses the first two convolution layers.

Second, we note the unexpected behavior at layer Conv 4 and, partly, Pool 2. The AUC values less than 0.5 are due to the score of the most anomalous subset of nodes in Conv 4 from noised images being less than the score of the most anomalous subset of nodes in Conv 4 for clean images. In other words, normal activity is anomalously absent in those layers for noised images. We hypothesize that adversarial noise may more easily confuse neural networks by de-constructing the signal of the original image rather than overpowering it with a rogue signal. This “interference” approach results in large amounts of uninteresting activations in the presence of noise compared to the structure of activations for clean images. Further work is needed on different network architectures to explore this phenomenon.

We conclude the adversarial noise results by locating where (i.e., in which layer) in the network the most anomalous activations are triggering. For this analysis, we return to subset scanning over the entire network and define a representation metric for each identified subset S and layer l of the network. Representation of a subset in a layer has a value of 1 if the proportion of anomalous nodes from the subset that fall in the layer equals the relative size of the layer within the network. This metric allows measuring the relative “size” of the subset within a single layer despite layers varying in the number of nodes.
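One natural formalization of this metric (our own notation; the paper's exact normalization may differ slightly):

```python
def representation(subset, layer_of, layer_sizes):
    """Share of the anomalous subset falling in each layer, divided by
    that layer's share of the whole network. 1.0 means proportional;
    >1 over-represented; <1 under-represented."""
    total_nodes = sum(layer_sizes.values())
    rep = {}
    for layer, size in layer_sizes.items():
        in_layer = sum(1 for node in subset if layer_of[node] == layer)
        rep[layer] = (in_layer / len(subset)) / (size / total_nodes)
    return rep

# Toy network: two layers of two nodes; the subset sits entirely in one layer.
layer_of = {0: 'Pool 1', 1: 'Pool 1', 2: 'Conv 4', 3: 'Conv 4'}
print(representation([0, 1], layer_of, {'Pool 1': 2, 'Conv 4': 2}))
# {'Pool 1': 2.0, 'Conv 4': 0.0} -> over- and under-represented layers
```
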

Figure 3 plots the representation for each subset of the 1,000 noised (BIM) and 1,000 clean images.

We again make two observations of increasing importance. First, we see that anomalous activity (as identified by subset scanning) of clean images is equally represented across all layers with most subsets having representation centered over 1.0.

Second, and more consequential, adversarial images have anomalous activity over-represented in Pool 1 and under-represented in Conv 4 and Pool 2. This characterization of anomalous activity, as identified by our method, also suggests the “interference” theory of adversarial noise: Anomalous activations in the shallower layers of the network suppress the activation structure of the original image in deeper layers.

Figure 3: Representation measures the size of the subset in a given layer proportional to the size of the layer in the entire network. Values are shown for the BIM attack.

4 Extensions

To preserve the clarity and applicability of the current work, many extensions of the method have been left for future work. These extensions may increase detection power, improve characterization, or both.

Simple Extensions

We note that two-tailed testing is a reasonable approach for tanh and sigmoid activation functions; the definition of “extreme” activations carries over intuitively to activations either larger or smaller than expected. It is also possible to calculate density-based p-values, where anomalousness is measured by activations falling in a low-density area of the background activations. This is particularly relevant in deeper nodes where bimodal distributions are likely. This extension requires learning a univariate kernel density estimate at each node, but this can be done offline on background data only.

Additionally, we note that it may be worth calculating conditional p-values, where every label has its own set of background activations. Then, at evaluation time, only the predicted class’s background is used for calculating p-value ranges. This may be particularly powerful in the adversarial noise setting, but it reduces the size of the background set by a factor of the number of classes.
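The conditional variant amounts to swapping in a per-class background pool before computing the range; a sketch under that assumption (function and variable names are illustrative, not from the paper):

```python
def conditional_p_range(backgrounds_by_class, predicted, a_eval):
    """p-value range for one node computed only against the background
    activations of the predicted class."""
    background = backgrounds_by_class[predicted]
    m = len(background)
    n_beat = sum(1 for a in background if a > a_eval)  # strictly larger
    n_tie = sum(1 for a in background if a == a_eval)  # ties
    return n_beat / (m + 1), (n_beat + n_tie + 1) / (m + 1)

bg = {'cat': [0.1, 0.2, 0.3], 'dog': [0.7, 0.8, 0.9]}
# The same activation looks typical for one class and anomalous for another.
print(conditional_p_range(bg, 'cat', 0.25))  # (0.25, 0.5)
print(conditional_p_range(bg, 'dog', 0.25))  # (0.75, 1.0)
```
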

Finally, the NPSS scoring functions have an additional tuning parameter, αmax, that has been left at 1.0 for this work. This means we were able to identify very large subsets of activations that were all only slightly anomalous. Smaller values of αmax limit the search space to a smaller number of more anomalous activations within the network, and have been shown to increase detection power when the prior belief is that only a small fraction of the data records participate in the pattern [\citeauthoryearMcFowland III, Speakman, and Neill2013].

Enforcing Hard Constraints

Constraints on the search space are essential parts of subset scanning. Without constraints, it is likely that inputs drawn from the null distribution will look anomalous by chance by “matching to the noise”. This hurts detection power despite there being a clear anomalous pattern in the alternative. In short, scanning over all subsets may be computationally tractable for scoring functions satisfying LTSS, but it is likely too broad of a search space to capture statistical significance.

This work briefly demonstrated one hard constraint on the search space by performing scans on individual layers of the network, as shown in Figure 2. This simple extension increased detection power when scanning over the first pooling layer compared to scanning over all subsets of the network. Furthermore, in results not shown in this paper, we were able to increase the detection of CW from 0.5978 to 0.6553 by scanning over the first 3 layers combined.

Hard connectivity constraints on subset scanning have been used to identify an anomalous subset of data records that are connected in an underlying graph structure that is either known [\citeauthoryearSpeakman, McFowland III, and Neill2015, \citeauthoryearSpeakman, Zhang, and Neill2013] or unknown [\citeauthoryearSomanchi and Neill2017]. Unfortunately, identifying the highest-scoring connected subset is exponential in the number of nodes and a heuristic alternative could be used to identify a high-scoring connected subset [\citeauthoryearChen and Neill2014].

Enforcing Soft Constraints

In addition to changing the search space, we may also alter the definition of anomalousness by making changes to the scoring function itself. For example, we may wish to increase the score of a subset that contains nodes in Pool 1 Layer while decreasing the score of a subset that contains nodes in the Conv 4 Layer. These additional terms can be interpreted as the prior log-odds that a given node will be included in the most anomalous subset [\citeauthoryearSpeakman et al.2016].

Adversarial Noise and Additional Domains

We emphasize that this work is not proposing a proper defense to adversarial noise. However, detection is a critical component of a strong defense. Additional work is needed to turn detection into robustness by leveraging the most anomalous subset activations.

Continuing in the unsupervised fashion, we could mask activations from nodes in certain layers that were deemed anomalous to prevent them from propagating the anomalous pattern. In a supervised setting, the information contained in the most anomalous subset could be used as features for training a separate classifier. For example, systematically counting the number of nodes in the most anomalous subset in Pool 1 and Conv 4 could be powerful features.

We note the potential for using the subset score to formulate an attack, rather than as a tool for detection. Incorporating the subset score into the loss function of an iterative attack would minimize both the perturbation in the pixel space and the deviations in the activation space [\citeauthoryearCarlini and Wagner2017a].

Continuing in the security space, we can also apply subset scanning to data poisoning [\citeauthoryearBiggio, Nelson, and Laskov2012, \citeauthoryearBiggio et al.2013]. This current work has considered each image individually, but it is possible to expand it so that the method identifies a group of images that are all anomalous for the same reasons in the activation space. This is the original intention of the Fast Generalized Subset Scan [\citeauthoryearMcFowland III, Speakman, and Neill2013].

Leaving the security domain, anomalous pattern detection on neural network activations can be extended to the more general setting of detecting out-of-distribution samples. This view has implications for detecting bias in classifiers, detecting distribution shift in temporal data, and identifying when new class labels appear in life-long learning domains.

Finally, we acknowledge that subset scanning over the activations of neural networks may also be useful for capturing patterns in normal, non-anomalous data. Identifying which subsets of nodes activate higher than expected while a network processes normal inputs has implications for explainable A.I. (Olah et al. 2018).

5 Conclusion

This work uses the Adversarial Noise domain as an effective narrative device to demonstrate that anomalous patterns in the activation space of neural networks can be Quantified, Detected, and Characterized.

The primary contribution of this work to the deep learning literature is a novel, unsupervised anomaly detector that can be applied to any pre-trained, off-the-shelf neural network model. The method is based on subset scanning, which treats the detection problem as a search for the highest scoring (most anomalous) subset of node activations as measured by non-parametric scan statistics. These scoring functions satisfy the Linear Time Subset Scanning property, which allows for exact, efficient maximization over all possible subsets of nodes in the network.
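A minimal sketch of how this maximization might look follows. The Berk-Jones statistic and the threshold grid are one common choice for non-parametric scan statistics, assumed here for illustration; the key point is that for a fixed significance threshold, the optimal subset contains exactly the nodes whose p-values fall below it, so only a linear number of subsets is ever scored:

```python
import numpy as np

def berk_jones(n_sig, n, alpha):
    """Berk-Jones statistic: n times the KL divergence between the
    observed fraction of significant p-values and its expectation alpha."""
    obs = n_sig / n
    if obs <= alpha:
        return 0.0
    if obs >= 1.0:
        return n * np.log(1.0 / alpha)
    return n * (obs * np.log(obs / alpha)
                + (1 - obs) * np.log((1 - obs) / (1 - alpha)))

def most_anomalous_subset(pvalues, alphas):
    """For each threshold alpha, the LTSS property implies the optimal
    subset is exactly the nodes with p-value <= alpha, so len(alphas)
    subsets are scored instead of 2^N."""
    best_score, best_subset = 0.0, np.array([], dtype=int)
    for alpha in alphas:
        subset = np.flatnonzero(pvalues <= alpha)
        if subset.size == 0:
            continue
        score = berk_jones(subset.size, subset.size, alpha)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_score, best_subset
```

Here a node's p-value measures how extreme its activation is relative to a background distribution of activations from clean inputs.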

Our method reduces activation data on the order of 100,000 dimensions to a single real-valued anomalousness “score”. We then used this score to detect images that had been perturbed by an adversary in order to change the network’s class label for the input. Finally, we used the identified subset of anomalous nodes in the network to characterize the adversarial noise pattern. This analysis highlighted a possible “interference” mode of adversarial noise that uses anomalous activations in the shallow layers to suppress the true activation pattern of the original image.

We concluded the work by highlighting multiple extensions of subset scanning into the deep learning space. Many of these extensions attempt to overcome the relatively weak detection power of the unconstrained subset scanning introduced in this work, by enforcing constraints on the search space, altering the scoring functions, or both.

Additional domains outside of adversarial noise and security will also benefit from identifying anomalous activity within neural networks. Life-long learning models need to recognize when a new class of inputs becomes available, and production-level systems must always guard against distribution shift over time.


  1. Berk, R. H., and Jones, D. H. 1979. Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 47:47–59.
  2. Biggio, B.; Didaci, L.; Fumera, G.; and Roli, F. 2013. Poisoning attacks to compromise face templates. In Biometrics (ICB), 2013 Int. Conference on, 1–7. IEEE.
  3. Biggio, B.; Nelson, B.; and Laskov, P. 2012. Poisoning attacks against support vector machines. In International Conference on Machine Learning (ICML). Omnipress (arXiv preprint arXiv:1206.6389).
  4. Carlini, N., and Wagner, D. 2017a. Adversarial examples are not easily detected: Bypassing ten detection methods. CoRR abs/1705.07263.
  5. Carlini, N., and Wagner, D. 2017b. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy.
  6. Chen, F., and Neill, D. B. 2014. Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, 1166–1175.
  7. Dalvi, N.; Domingos, P.; Mausam; Sanghai, S.; and Verma, D. 2004. Adversarial classification. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), KDD ’04, 99–108. New York, NY, USA: ACM.
  8. Donoho, D., and Jin, J. 2004. Higher criticism for detecting sparse heterogeneous mixtures. Annals of Statistics 32(3):962–994.
  9. Feinman, R.; Curtin, R. R.; Shintre, S.; and Gardner, A. B. 2017. Detecting adversarial samples from artifacts. CoRR 1703.00410.
  10. Gong, Z.; Wang, W.; and Ku, W. 2017. Adversarial and clean data are not twins. CoRR abs/1704.04960.
  11. Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. CoRR abs/1412.6572.
  12. Grosse, K.; Manoharan, P.; Papernot, N.; Backes, M.; and McDaniel, P. D. 2017. On the (statistical) detection of adversarial examples. CoRR abs/1702.06280.
  13. Huang, R.; Xu, B.; Schuurmans, D.; and Szepesvári, C. 2015. Learning with a strong adversary. CoRR abs/1511.03034.
  14. Kolmogorov, A. N. 1933. Sulla determinazione empirica di una legge di distribuzione. na.
  15. Kurakin, A.; Goodfellow, I. J.; and Bengio, S. 2016a. Adversarial examples in the physical world. CoRR abs/1607.02533.
  16. Kurakin, A.; Goodfellow, I. J.; and Bengio, S. 2016b. Adversarial machine learning at scale. CoRR abs/1611.01236.
  17. McFowland III, E.; Speakman, S. D.; and Neill, D. B. 2013. Fast generalized subset scan for anomalous pattern detection. The Journal of Machine Learning Research 14(1):1533–1561.
  18. McFowland, III, E.; Somanchi, S.; and Neill, D. B. 2018. Efficient Discovery of Heterogeneous Treatment Effects in Randomized Experiments via Anomalous Pattern Detection. ArXiv e-prints.
  19. Metzen, J. H.; Genewein, T.; Fischer, V.; and Bischoff, B. 2017. On detecting adversarial perturbations. CoRR abs/1702.04267.
  20. Neill, D. B., and Lingwall, J. 2007. A nonparametric scan statistic for multivariate disease surveillance. Advances in Disease Surveillance 4:106.
  21. Neill, D. B. 2012. Fast subset scan for spatial pattern detection. Journal of the Royal Statistical Society (Series B: Statistical Methodology) 74(2):337–360.
  22. Olah, C.; Satyanarayan, A.; Johnson, I.; Carter, S.; Schubert, L.; Ye, K.; and Mordvintsev, A. 2018. The building blocks of interpretability. Distill.
  23. Papernot, N., and McDaniel, P. D. 2016. On the effectiveness of defensive distillation. CoRR abs/1607.05113.
  24. Papernot, N.; McDaniel, P. D.; Wu, X.; Jha, S.; and Swami, A. 2015. Distillation as a defense to adversarial perturbations against deep neural networks. CoRR abs/1511.04508.
  25. Papernot, N.; Goodfellow, I.; Sheatsley, R.; Feinman, R.; and McDaniel, P. 2016. cleverhans v1.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768.
  26. Papernot, N.; Faghri, F.; Carlini, N.; Goodfellow, I.; Feinman, R.; Kurakin, A.; Xie, C.; Sharma, Y.; Brown, T.; Roy, A.; Matyasko, A.; Behzadan, V.; Hambardzumyan, K.; Zhang, Z.; Juang, Y.-L.; Li, Z.; Sheatsley, R.; Garg, A.; Uesato, J.; Gierke, W.; Dong, Y.; Berthelot, D.; Hendricks, P.; Rauber, J.; and Long, R. 2018. Technical report on the cleverhans v2.1.0 adversarial examples library. arXiv preprint arXiv:1610.00768.
  27. Somanchi, S., and Neill, D. B. 2017. Graph structure learning from unlabeled data for early outbreak detection. IEEE Intelligent Systems 32(2):80–84.
  28. Speakman, S.; Somanchi, S.; III, E. M.; and Neill, D. B. 2016. Penalized fast subset scanning. Journal of Computational and Graphical Statistics 25(2):382–404.
  29. Speakman, S. D.; McFowland III, E.; and Neill, D. B. 2015. Scalable Detection of Anomalous Patterns With Connectivity Constraints. Journal of Computational and Graphical Statistics 24(4):1014–1033.
  30. Speakman, S.; Zhang, Y.; and Neill, D. B. 2013. Dynamic pattern detection with temporal consistency and connectivity constraints. In 2013 IEEE 13th International Conference on Data Mining, 697–706.
  31. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I. J.; and Fergus, R. 2013. Intriguing properties of neural networks. CoRR abs/1312.6199.