Neuron Shapley: Discovering the Responsible Neurons

We develop Neuron Shapley as a new framework to quantify the contribution of individual neurons to the prediction and performance of a deep network. By accounting for interactions across neurons, Neuron Shapley is more effective in identifying important filters compared to common approaches based on activation patterns. Interestingly, removing just 30 filters with the highest Shapley scores effectively destroys the prediction accuracy of Inception-v3 on ImageNet. Visualization of these few critical filters provides insights into how the network functions. Neuron Shapley is a flexible framework and can be applied to identify responsible neurons in many tasks. We illustrate additional applications of identifying filters that are responsible for biased prediction in facial recognition and filters that are vulnerable to adversarial attacks. Removing these filters is a quick way to repair models. Enabling all these applications is a new multi-arm bandit algorithm that we developed to efficiently estimate Neuron Shapley values.


1 Introduction

Understanding and interpreting the behavior of a trained neural network has gained increasing attention in the machine learning (ML) community. A popular approach is to interpret and visualize the behavior of specific neurons in a trained network Olah et al. (2017); in practice, the neurons to visualize are often selected at random from different layers. There are several ways of doing this. For example, one could show which training data lead to the most positive or negative output from a neuron, or one could perform a deep-dream style visualization to see how to modify the input to greedily activate the neuron Olah et al. (2017). Such approaches are widely used and can give interesting insights into the function of the network. However, analysis of individual neurons is often ad hoc: a network contains a large number of neurons with many distinct behaviors, and it is typically not clear which ones to investigate. Moreover, it is not clear how relevant the behavior of specific neurons is to the function of the entire network.

In this paper, we propose a new framework to address these limitations by systematically and efficiently identifying the neurons that are the most important contributors to the network’s function, combined with methods to interpret these neurons. The basis of this framework is a new algorithm we propose, Neuron Shapley, for quantifying the importance of each neuron in a trained network while accounting for complex interactions between neurons. Interestingly, for several standard image recognition tasks and architectures, fewer than 30 neurons (filters) are critical for the model to achieve good prediction accuracy. Interpretation of these critical neurons provides more systematic insights into how the network functions.

Neuron Shapley is a very flexible framework that can be applied to identify responsible neurons in different tasks. When applied to a facial recognition network that makes biased predictions for black women, Neuron Shapley identifies the (few) neurons that are responsible for this disparity. It also identifies neurons that are the most responsible for vulnerabilities to adversarial attacks. In addition to facilitating interpretation, this opens up an interesting opportunity to use Neuron Shapley for fast model repair without needing to retrain. For example, our experiments show that simply zeroing out the few “culprit” neurons reduces disparity without much degradation to the overall accuracy.

Our contributions

We summarize the contributions of this work here. Conceptual: we develop the Neuron Shapley framework to quantify the contribution of each neuron to the network’s performance. Algorithmic: we introduce a new multi-arm bandit based algorithm that efficiently estimates Neuron Shapley values in large networks. Empirical: our systematic experiments discover several interesting findings, including the phenomenon that a small number of neurons are critical to the performance of the convnet in common image classifications. This facilitates both model interpretation and repair.

Related literature

Tracing a trained deep neural network’s behavior to its individual neurons (filters when the network is convolutional) is an important area of research in the interpretability literature. One approach is visualization by identifying human-understandable model inputs, which could be synthetic, that activate a specific set of neurons Simonyan et al. (2013); Szegedy et al. (2013, 2015); Erhan et al. (2009); Mordvintsev et al. (2015); Mahendran and Vedaldi (2015); Nguyen et al. (2016); Olah et al. (2017). A related approach is to search through a database (typically of images) and find the representative examples that fire a chosen neuron Bau et al. (2017); Kim et al. (2017); Ghorbani et al. (2019b). It’s also possible to track the propagation of the signal in the network for a specific input example and quantify a given neuron’s role, though this is typically used to identify important input features  Montavon et al. (2017); Binder et al. (2016); Bach et al. (2015); Shrikumar et al. (2017).

The Shapley value Shapley (1953) has been studied in cooperative game theory and economics Shapley and Roth (1988). It has also been explored for pruning relatively small models Stier et al. (2018). Recent works have studied applications of the Shapley value for ML interpretation. In those works, the Shapley value is used to quantify the contribution of individual data points to the model’s training Ghorbani and Zou (2019); Jia et al. (2019), or to quantify the features in the input data that are salient to the network’s prediction. These data- and feature-centric applications of the Shapley value are very different from our approach of measuring the importance of neurons (instead of data), and they provide interpretations orthogonal to a neuron-based interpretation. Moreover, we introduce an adaptive multi-arm bandit algorithm for efficient estimation of the Shapley value, which is novel to the best of our knowledge. This algorithm enables us, for the first time, to compute Neuron Shapley values in large-scale state-of-the-art convolutional networks. Model repair has recently been studied in the specific case of fairness applications, but these approaches typically involve some retraining of the network Kim et al. (2019). Repair through Neuron Shapley has the benefit of not requiring retraining, which is faster and is especially useful when the end-user of the network does not have access to a large amount of training data.

2 Shapley Value for Neurons


We work with a trained ML model $\mathcal{M}$ which is composed of a set of $n$ individual elements: $\{f_i\}_{i=1}^{n}$. For example, $\mathcal{M}$ could be a fully convolutional neural network with $L$ layers, each with $n_l$ filters. The elements are then its filters, where $n = \sum_{l=1}^{L} n_l$. We focus on convolutional networks in this paper because most interpretation work has been done for images. The Neuron Shapley approach also applies to other network architectures as well as to other ML models such as Random Forests. The trained model is evaluated on a specific performance metric $V$, which can be accuracy, loss, disparity on different racial groups, etc. The performance of the full model is denoted as $V(\mathcal{M})$. Our goal is to assign responsibility to each neuron $f_i$. Mathematically, we formulate this task as one of partitioning the overall performance metric $V(\mathcal{M})$ among its elements: $\phi_i(V, \mathcal{M}) \in \mathbb{R}$ is element $f_i$’s contribution towards $V(\mathcal{M})$ such that $\sum_{i=1}^{n} \phi_i(V, \mathcal{M}) = V(\mathcal{M})$. For simplicity, we will write $\phi_i$ to denote $\phi_i(V, \mathcal{M})$.

In order to evaluate the contribution of specific elements or neurons in the model, we would like to “zero out” these elements. In a convnet, this is done by fixing the output of the filter to be its mean output over a set of validation images. This will kill the flow of information through that filter while keeping the mean statistics of the propagated signal from that layer intact (which is not the case if the output is replaced by all zeros). We are interested in subsets $S \subseteq \mathcal{M}$ (e.g. a sub network), and we write $V(S)$ to denote the performance of the model when the elements in $\mathcal{M} \setminus S$ are zeroed out. Note that we do not retrain the model after certain elements are zeroed out, and all of the weights of the network are fixed. We simply take the modified network and evaluate its test performance $V(S)$. The reason for doing this is that even fine-tuning the network for each $S$ would be prohibitively expensive computationally.
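The mean-replacement ablation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each filter's response has been reduced to a single number per image, and the helper names `filter_means` and `ablate` are hypothetical.

```python
from typing import List, Set


def filter_means(val_acts: List[List[float]]) -> List[float]:
    """Mean response of each filter over a set of validation images.

    val_acts[img][j] is the (assumed scalar) response of filter j on image img.
    """
    n_images = len(val_acts)
    n_filters = len(val_acts[0])
    return [sum(img[j] for img in val_acts) / n_images for j in range(n_filters)]


def ablate(acts: List[float], removed: Set[int], means: List[float]) -> List[float]:
    """Replace the responses of the removed filters by their validation mean."""
    return [means[j] if j in removed else a for j, a in enumerate(acts)]


# Two validation images, three filters (made-up numbers).
val_acts = [[1.0, 4.0, 0.0], [3.0, 0.0, 2.0]]
means = filter_means(val_acts)

# "Zero out" filters 0 and 2 on a new input; filter 1 is left untouched.
ablated = ablate([5.0, -1.0, 7.0], {0, 2}, means)
```

Replacing by the mean rather than by zero keeps the average signal passed to the next layer unchanged, which is the motivation given in the text.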

Desirable properties for neuron valuation

For a given model and performance metric, there are many ways to partition $V(\mathcal{M})$ among the elements in $\mathcal{M}$. This task is further complicated by the fact that different neurons in a network can have complex interactions. We take an axiomatic approach and use the Shapley value to compute $\phi_i$ because the Shapley value is the unique partition of $V(\mathcal{M})$ that satisfies several desirable properties. We list these properties below:

  • Zero contribution One decision to make is how to handle neurons that have no contribution. We say that a neuron $f_i$ has no contribution if $\forall S \subseteq \mathcal{M} \setminus \{f_i\}$: $V(S \cup \{f_i\}) = V(S)$. In words, this means that it does not change the performance when added to any subset of other neurons in the network. For such null neurons, the valuation should be $\phi_i = 0$. One simple example is a neuron with all-zero parameters.

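  • Symmetric elements Two neurons should have equal contributions assigned if they are exchangeable under every possible setting: $\phi_i = \phi_j$ if $\forall S \subseteq \mathcal{M} \setminus \{f_i, f_j\}$: $V(S \cup \{f_i\}) = V(S \cup \{f_j\})$. Intuitively, if adding $f_i$ or $f_j$ to any subnetwork produces the same performance, then they should have the same value.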

  • Additivity in Performance Metric In many practical settings, there are two or more performance metrics for a given model. For example, $V_1$ measures its accuracy on one test point and $V_2$ is its accuracy on a second test point. A natural way to measure the overall performance of the model is a linear combination of such metrics, e.g. $V = V_1 + V_2$. We would like each neuron’s overall contribution to follow the same linear relationship, i.e. $\phi_i(V, \mathcal{M}) = \phi_i(V_1, \mathcal{M}) + \phi_i(V_2, \mathcal{M})$. A real-world example of additivity’s importance is a setting where computing the overall performance is not an option due to privacy concerns. Suppose we are providing a healthcare ML model to several hospitals. The hospitals are not allowed to share their test data with us or each other and can only compute each neuron’s contribution to their own local task and report the results back to us. Additivity allows us to gather the local contributions and aggregate them.

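The following contribution formula uniquely satisfies all these properties (while satisfying $\sum_{i=1}^{n} \phi_i = V(\mathcal{M})$):

$$\phi_i = \frac{1}{n} \sum_{S \subseteq \mathcal{M} \setminus \{f_i\}} \frac{V(S \cup \{f_i\}) - V(S)}{\binom{n-1}{|S|}} \tag{1}$$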

The formula says that a neuron’s contribution can be interpreted as its marginal contribution to the performance of every subnetwork $S$ of the original model (normalized by the number of subnetworks with the same cardinality $|S|$). This formula takes into account the interactions between neurons. As a simple example, suppose there are two neurons that improve performance only if they are both present or both absent and harm performance if only one is present. Eqn. 1 considers all these possible settings to compute the contribution of each neuron. This, to our knowledge, is one of the few methods that take such interactions into account.
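To make the formula concrete, the sketch below computes the value exactly by brute force on a tiny example: for each element, it averages the marginal contribution over all subsets of the other elements, normalizing by the number of subsets of each size. The metric `toy_metric` is a made-up stand-in for a network's performance (elements 0 and 1 only help jointly, element 2 helps on its own, element 3 does nothing); it is an assumption for illustration and not taken from the paper.

```python
from itertools import combinations
from math import comb
from typing import Callable, FrozenSet, List


def exact_shapley(n: int, V: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley values by enumerating every subset of the other elements."""
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # subset sizes 0 .. n-1
            for subset in combinations(others, size):
                S = frozenset(subset)
                # Marginal contribution of i to S, normalized by the number
                # of subsets of the other elements that have the same size.
                total += (V(S | {i}) - V(S)) / comb(n - 1, size)
        values.append(total / n)
    return values


def toy_metric(S: FrozenSet[int]) -> float:
    # Hypothetical metric: 0 and 1 interact, 2 is independent, 3 is a null element.
    return 0.8 * (0 in S and 1 in S) + 0.1 * (2 in S)


phi = exact_shapley(4, toy_metric)
```

On this example the values come out at approximately 0.4, 0.4, 0.1 and 0.0: the two interacting elements split their joint gain equally (symmetry), the null element gets zero, and the values sum to the performance of the full set. The enumeration visits all subsets, so it is only feasible for a handful of elements.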

Eqn. 1 is equivalent to the Shapley value Shapley (1953); Shapley and Roth (1988) originally defined for cooperative games. In a cooperative game, the $n$ players in a set $N$ are related to each other through a score function $v: 2^{N} \to \mathbb{R}$, where $v(S)$ is the reward if the players in $N \setminus S$ opt out. The Shapley value was introduced as an equitable way of sharing the group reward among the players, where equitable means satisfying the aforementioned properties. In our context, we will refer to $\phi_i$ in Eqn. 1 as Neuron Shapley. It’s possible to make a direct mapping between our setting and a cooperative game, which proves the uniqueness of Neuron Shapley. The proof is discussed in appendix A.

3 Estimating Neuron Shapley

While Neuron Shapley has good properties, it is expensive to compute. Computing the Shapley value in Eqn. 1 exactly requires an exponential number of operations, since there is an exponential number of sets $S$. In what follows, we discuss several techniques for efficiently approximating the Neuron Shapley values. We first rephrase the computational problem of Shapley values as a statistical one. We are then able to introduce approximation methods that result in orders-of-magnitude speed-ups.

Monte-Carlo estimation

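For a model with $n$ elements, the Shapley value of the $i$’th component can be written as Ghorbani and Zou (2019):

$$\phi_i = \mathbb{E}_{\pi \sim \Pi}\left[ V(S_\pi^i \cup \{f_i\}) - V(S_\pi^i) \right] \tag{2}$$

where $\Pi$ is the uniform distribution over the $n!$ permutations of the model elements and $S_\pi^i$ is the set of elements that appear before the $i$’th one in a given permutation $\pi$ (the empty set if $f_i$ is the first element in $\pi$).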

Following Eqn. 2, approximating the Shapley value is equivalent to estimating the mean of a random variable. Therefore, for an arbitrary error bound, Monte-Carlo estimation can be used to give an unbiased approximation of $\phi_i$. Error analysis of this approximation method has been studied in the literature Mann and Shapley (1962); Castro et al. (2009); Maleki et al. (2013).
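A minimal sketch of this permutation-sampling estimator is given below. It draws random orderings, adds the elements one at a time, and keeps a running mean of each element's marginal contribution. The function name `mc_shapley`, the fixed seed and the made-up `toy_metric` are assumptions for illustration only.

```python
import random
from typing import Callable, FrozenSet, List


def mc_shapley(n: int, V: Callable[[FrozenSet[int]], float],
               num_perms: int, seed: int = 0) -> List[float]:
    """Monte-Carlo estimate: average marginal contribution over random permutations."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for t in range(1, num_perms + 1):
        perm = list(range(n))
        rng.shuffle(perm)
        present: set = set()
        prev = V(frozenset(present))
        for i in perm:
            present.add(i)  # add element i after the elements preceding it
            cur = V(frozenset(present))
            phi[i] += ((cur - prev) - phi[i]) / t  # running mean over permutations
            prev = cur
    return phi


def toy_metric(S: FrozenSet[int]) -> float:
    # Hypothetical metric: 0 and 1 only help jointly, 2 helps alone, 3 is a null element.
    return 0.8 * (0 in S and 1 in S) + 0.1 * (2 in S)


estimates = mc_shapley(4, toy_metric, num_perms=2000)
```

Within every sampled permutation the marginal contributions telescope, so the estimates always sum to the performance of the full set minus that of the empty set, whatever the number of samples; only the split between elements is noisy.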

Early Truncation

For a neural network with $n$ elements (i.e. filters), $V(S)$ is the model’s performance on a finite test set of data points. As the number of remaining elements gets smaller ($|S|$ small), the network’s performance is expected to drop. For a small enough $|S|$, the model’s performance degrades to zero or negligible performance as more and more connections in the network are removed (see appendix B for more examples). By utilizing this fact, in a sampled permutation $\pi$, we can abstain from computing the marginal contribution of elements appearing early on: $V(S_\pi^i \cup \{f_i\}) - V(S_\pi^i) \approx 0$ if $|S_\pi^i|$ is small. In this work, we define a “performance threshold” below which the model is considered dead. This truncation can lead to substantial computational savings (close to one order of magnitude).
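One way to realize this truncation is sketched below, under the assumption that a permutation is processed from its end: start from the full model, remove elements one by one, and stop evaluating once the performance falls below the threshold, so that all elements earlier in the permutation keep a marginal contribution of zero. The function `truncated_marginals` and the metric `toy_metric` are hypothetical and only serve the illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple


def truncated_marginals(perm: List[int], V: Callable[[FrozenSet[int]], float],
                        threshold: float) -> Tuple[Dict[int, float], int]:
    """Marginal contributions along one permutation, with early truncation.

    Returns the marginals and the number of metric evaluations performed.
    """
    marginals = {i: 0.0 for i in perm}
    v_with = V(frozenset(perm))  # performance of the full model
    evaluations = 1
    for j in range(len(perm) - 1, -1, -1):
        if v_with < threshold:
            break  # model considered dead: earlier elements keep marginal 0
        v_without = V(frozenset(perm[:j]))  # only the elements before perm[j] remain
        evaluations += 1
        marginals[perm[j]] = v_with - v_without
        v_with = v_without
    return marginals, evaluations


def toy_metric(S: FrozenSet[int]) -> float:
    # Hypothetical metric: performance collapses once fewer than 7 of 10 elements remain.
    return max(0.0, (len(S) - 6) / 4)


marginals, evaluations = truncated_marginals(list(range(10)), toy_metric, threshold=0.05)
```

In this toy run only 5 metric evaluations are needed instead of 11, because the walk stops as soon as the remaining subnetwork is dead.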

Adaptive Sampling

In most of our applications, we are more interested in accurately identifying the important contributing neurons in the model rather than measuring the exact Shapley values of every neuron. This is particularly relevant since, as we will see, there is typically only a small number of influential neurons and most of the values are close to zero. Algorithmically, our problem is simplified to finding the subset of $k$ bounded random variables with the largest expected values from a set of $n$ bounded random variables. This can be formulated as a multi-armed-bandit (MAB) problem, which has been successfully used in other settings to speed up computation Bagaria et al. (2018); Jamieson and Talwalkar (2016); Li et al. (2017); Zhang et al. (2019).

The MAB component of the algorithm is described in Alg. 1 and we explain the intuition here. For each neuron in the model, we keep track of a lower and an upper confidence bound (CB) on its value $\phi_i$, which come from standard estimation bounds. The goal is to confidently detect the top-$k$ neurons. Therefore, at each iteration, instead of sampling the marginal contribution of all neurons (as is the case for standard Monte Carlo approximation), we only sample for a subset of neurons, i.e. the neurons for which the $k$’th largest value at that iteration lies between their lower and upper bounds. If there are no neurons satisfying the sampling condition, it means that the top-$k$ neurons are confidently separated (up to an error tolerance) from the rest. We show in appendix E that adaptive sampling results in a nearly one order of magnitude speedup. Although the algorithm adaptively samples to find the top-$k$ neurons, the estimated value for other neurons is in practice very close to the non-adaptive case (Spearman’s rank correlation ).

Combining our three approximation methods, we introduce a novel algorithm that we refer to as “Truncated Multi Armed Bandit Shapley” (TMAB-Shapley). Details are in Alg. 1.

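  Input: A trained network $\mathcal{M}$ with elements $\{f_1, \dots, f_n\}$; a black-box metric $V(\cdot)$ that evaluates the performance of each subnetwork $S \subseteq \mathcal{M}$; failure probability $\delta$, tolerance $\epsilon$, number of important elements $k$.
  Output: Shapley value of elements: $\{\phi_i\}_{i=1}^{n}$
  Initializations: Shapley values $\{\phi_i\}$, variance of marginal contributions $\{\sigma_i^2\}$, confidence bounds $\{\phi_i^{lb}\}$, $\{\phi_i^{ub}\}$, set $U$ of neurons that are still being sampled
  while $U \neq \emptyset$ do
     $\pi$: Random permutation of elements
     for each element $f_i$, in the order given by $\pi$, do
        if $f_i \in U$ then
           if the current performance is above the truncation threshold then
              sample the marginal contribution of $f_i$ and update $\phi_i$, $\sigma_i^2$, $\phi_i^{lb}$, $\phi_i^{ub}$
           end if
        end if
        Thresh $\leftarrow$ $k$’th largest value of $\{\phi_i\}$
        $U \leftarrow$ neurons whose confidence interval contains Thresh (up to the tolerance $\epsilon$)
     end for
  end while
Algorithm 1 Truncated Multi Armed Bandit Shapley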
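The following Python sketch combines the three ingredients (permutation sampling, early truncation, adaptive sampling) in one loop. It is an illustration under stated assumptions rather than the authors' implementation: the confidence radius uses one common form of the empirical Bernstein bound, the active set is refreshed once per permutation, the tolerance is applied by shrinking each interval by `eps`, and all names (`tmab_shapley`, `value_range`, `max_iters`, `toy_metric`) are hypothetical.

```python
import math
import random
from typing import Callable, FrozenSet, List, Tuple


def tmab_shapley(n: int, V: Callable[[FrozenSet[int]], float], k: int,
                 delta: float = 0.05, eps: float = 0.05, v_t: float = 0.0,
                 value_range: float = 1.0, max_iters: int = 5000,
                 seed: int = 0) -> Tuple[List[float], int]:
    """Sketch of truncated, adaptively sampled Shapley estimation (needs eps > 0)."""
    rng = random.Random(seed)
    phi = [0.0] * n            # running mean of marginal contributions
    m2 = [0.0] * n             # running sum of squared deviations (Welford)
    count = [0] * n            # number of samples per element
    lower = [-math.inf] * n
    upper = [math.inf] * n
    active = set(range(n))     # elements that still need sampling
    log_term = math.log(3.0 / delta)
    iters = 0
    while active and iters < max_iters:
        iters += 1
        perm = list(range(n))
        rng.shuffle(perm)
        present = set(perm)
        v_with = V(frozenset(present))   # performance of the full model
        dead = v_with < v_t
        for i in reversed(perm):         # remove elements from the end of the permutation
            if i not in active:
                present.discard(i)
                v_with = None            # performance of the current set is now unknown
                continue
            if not dead and v_with is None:
                v_with = V(frozenset(present))
                dead = v_with < v_t
            present.discard(i)
            if dead:
                marginal = 0.0           # early truncation
            else:
                v_without = V(frozenset(present))
                marginal = v_with - v_without
                v_with = v_without
                dead = v_with < v_t
            # Update running mean and variance, then the confidence bounds.
            count[i] += 1
            d = marginal - phi[i]
            phi[i] += d / count[i]
            m2[i] += d * (marginal - phi[i])
            var = m2[i] / count[i]
            radius = (math.sqrt(2.0 * var * log_term / count[i])
                      + 3.0 * value_range * log_term / count[i])
            lower[i], upper[i] = phi[i] - radius, phi[i] + radius
        thresh = sorted(phi, reverse=True)[k - 1]   # k'th largest estimate
        active = {i for i in range(n) if lower[i] + eps < thresh < upper[i] - eps}
    return phi, iters


def toy_metric(S: FrozenSet[int]) -> float:
    # Hypothetical metric: 0 and 1 only help jointly, 2 helps alone, 3 is a null element.
    return 0.8 * (0 in S and 1 in S) + 0.1 * (2 in S)


phi, iters = tmab_shapley(4, toy_metric, k=2, v_t=0.05)
top2 = set(sorted(range(4), key=lambda i: -phi[i])[:2])
```

In this toy run the two low-value elements stop being sampled early, while the two elements competing for the top-2 positions keep being sampled until their intervals are narrower than the tolerance. `value_range` must cover the spread of the marginal contributions; for an accuracy metric, where marginals can lie anywhere in [-1, 1], a value of 2 would be the conservative choice.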
Figure 1: Visualizing filters critical for overall accuracy. On the top, we visualize the highest Shapley value filters for a select few Inception-v3 blocks (more results in the Appendix). For each filter, we show 5 examples of images that lead to the most positive activation and the most negative activation of that filter. Additionally, we visualize each filter by optimizing a random input to highly activate (positively or negatively) the selected filter. These filters can have meaningful interpretations, which we write on the left. Earlier layer filters extract simple features like color or pattern. As we go deeper, filters capture sophisticated features like crowdedness or how much color is in the image. On the bottom, we show how many of the top-100 contributing filters appear in each layer.

4 Experiments & Applications

Implementation details:

We apply Neuron Shapley to two widely-used deep convolutional neural network architectures. The first is the Inception-v3 Szegedy et al. (2016) architecture trained on the ILSVRC2012 (a.k.a. ImageNet) Russakovsky et al. (2015) dataset (reported test accuracy ). We use Alg. 1 to compute the Neuron Shapley value for each of the 17216 filters preceding the logit layer in this network. We divide the released ImageNet validation set into two parts (25000 images each) to serve as validation and test sets. The second model is the SqueezeNet Iandola et al. (2016) architecture that we trained on the CelebA Liu et al. (2018) dataset to detect gender from face images ( test accuracy). This model has a total of 2976 filters. In all the experiments, we fix $k$ to detect the top-$k$ important filters. The results are robust to the choice of $k$. We use the empirical Bernstein bound Mnih et al. (2008); Maurer and Pontil (2009) to compute the confidence bounds.

Neuron Shapley identifies a small number of critical neurons

We apply Alg. 1 to compute the Neuron Shapley value for all of the Inception-v3 filters. Here, we use the overall multi-class prediction accuracy of the network (on a randomly sampled batch of images) as the performance metric $V$ to evaluate the Shapley values. Interestingly, Neuron Shapley values are very sparse. We can evaluate the impact of the neurons with the largest Shapley values by zeroing them out in Inception-v3. Removing just the top 10 filters drops the overall test accuracy of Inception-v3 from 74% to 38%; removing the top 20 drops the accuracy to 8%. It is interesting that a handful of neurons have such a strong effect on the network’s performance. In contrast, removing 20 random neurons in the network or the 20 neurons with the largest average activation does not significantly reduce the accuracy of Inception-v3. A related line of work has shown that many connections/weights in the network can be removed without damaging performance Frankle and Carbin (2018). The difference is that these works on pruning and sparsity have primarily focused on removing connections, while our experiment here removes neurons.

The sparse set of critical filters identified by high Shapley values is a natural set of neurons to visualize and interpret. In Fig. 1, we visualize the filter with the highest Shapley value in 7 of the layers of Inception-v3. We provide two types of visualizations: 1) Deep Dream images (first column of each block); and 2) the five images in the validation set that result in the most positive or most negative activation of the filter. The critical neurons in the earlier layers capture color (white vs. black) and texture (vertical stripes vs. smooth). The critical neurons in later layers capture more complex concepts like colorfulness or crowdedness of the image, which is consistent with previous findings using different approaches Kim et al. (2017); Bau et al. (2017); Ghorbani et al. (2019b). The final component of Fig. 1 shows how the top 100 filters with the highest Shapley values are distributed in different layers of Inception-v3. Overall, more of these filters tend to be in the early layers, consistent with the notion that initial layers learn general concepts and the deeper layers are more class-specific. More results are discussed in Appendix C. We report similar experimental results for the SqueezeNet model in Appendix D.

Figure 2: Class-specific critical neurons (a) Removing filters with the highest class-specific Shapley values (blue dash) reduces the class prediction accuracy more effectively than removing filters identified by other approaches. We select four representative classes to show (more in the Appendix). (b) We visualize two critical filters for each class by showing the top 5 most positively activating images along with the deep dream visualization of the filter. (c) Class-specific filters are more common in the deeper layers.

Class specific critical neurons

We can dive more deeply to investigate which neurons are the most responsible for class-specific predictions. For a given class (e.g. zebra, dumbbell), we use the class recall as the performance metric and apply Alg. 1 to detect the most important neurons for detecting that class. We then inspect filters with the largest contribution.

As before, Neuron Shapley discovers that a small number of filters are critical for class-specific predictions. We provide results for four representative classes (carousel, zebra, police van and dumbbell) in Fig. 2. In each of these classes, removing the top filters with the highest class-specific Shapley values leads to a dramatic decline in the network’s test accuracy for that class (Fig. 2a). For comparison, we also applied 4 popular alternative approaches for identifying important neurons: filter size (norm of the weights), norm of the filter’s response, leave-one-out impact (LOO; the change in the performance metric if just that filter is zeroed out). We also remove the top neurons identified by each of these alternative approaches. Overall, Neuron Shapley is more effective in finding critical neurons. We further report the network’s overall accuracy across all the classes; removing class-specific critical neurons does not affect the overall performance.
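One reason a leave-one-out score can miss critical neurons is redundancy: if two filters can substitute for each other, removing either one alone changes nothing. The toy comparison below illustrates this with a made-up metric (`redundant_metric`, an assumption for illustration) in which elements 0 and 1 are interchangeable and element 2 has a small independent effect. The Shapley values are computed exactly by averaging over all orderings, which is only feasible for a few elements.

```python
from itertools import permutations
from typing import Callable, FrozenSet, List


def loo_scores(n: int, V: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Leave-one-out: drop in the metric when a single element is removed."""
    full = frozenset(range(n))
    return [V(full) - V(full - {i}) for i in range(n)]


def shapley_by_permutations(n: int, V: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley values as the average marginal contribution over all n! orderings."""
    totals = [0.0] * n
    num_orderings = 0
    for perm in permutations(range(n)):
        num_orderings += 1
        present: set = set()
        prev = V(frozenset(present))
        for i in perm:
            present.add(i)
            cur = V(frozenset(present))
            totals[i] += cur - prev
            prev = cur
    return [t / num_orderings for t in totals]


def redundant_metric(S: FrozenSet[int]) -> float:
    # Hypothetical metric: either of the redundant elements 0, 1 suffices; 2 adds a little.
    return 0.8 * (0 in S or 1 in S) + 0.1 * (2 in S)


loo = loo_scores(3, redundant_metric)
shap = shapley_by_permutations(3, redundant_metric)
```

Here leave-one-out assigns zero to both redundant elements and ranks the minor element 2 first, whereas the Shapley values give roughly 0.4 to each redundant element and 0.1 to element 2.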

Fig. 2 visualizes two of the most critical neurons for each of the four classes; Deep Dream and the top five highest activating training images are shown for each filter. For the zebra class, a filter capturing diagonal stripes is critical. For dumbbell, one critical neuron clearly captures dumbbell-like shapes directly; the second captures people with arms visible, likely because that’s highly correlated with dumbbells in natural images (as observed in previous literature Kim et al. (2017); Ghorbani et al. (2019b)). It’s also interesting to note that most of the class-specific critical filters are located in the deeper layers (Fig. 2(c)), which is the opposite of the distribution of the generally important neurons.

Figure 3: Model repair using Neuron Shapley (a) We compute each filter’s contribution to the fair performance of the model. Removing filters with the most negative contribution shows that the model improves, especially for black females (BF). The four populations are white female (WF), black female (BF), white male (WM) and black male (BM). (b) We compute each filter’s contribution to the adversary’s success rate. After removing these neurons, the adversary is much less successful (black), and the model becomes able to classify a large portion of adversarially perturbed images as their true class (red).

Neuron Shapley is a flexible framework that can be used to identify neurons that are responsible for many types of network behavior beyond the standard prediction accuracy. We illustrate its usage on two important applications in fairness and adversarial attacks.

Discovering unfair filters

It has been shown that gender detection models have certain biases against minorities Buolamwini and Gebru (2018): for example, they are less accurate on female faces and especially on black female faces. We took SqueezeNet trained on CelebA faces and evaluated its performance on the PPB dataset, which has an equal representation of four subgroups of gender-race Buolamwini and Gebru (2018). Following previous works Kim et al. (2019), we use the average recall on different subgroups as a measure of fairness and use it as the metric $V$ for evaluating Neuron Shapley. Alg. 1 is used to compute the Shapley value of each filter in SqueezeNet. In this case, we are most interested in the filters with the most negative values, as they decrease fairness and contribute to the disparity. Zeroing out these “culprit” filters greatly increased the gender classification accuracy on black female (BF) faces from to (Fig. 3). It also led to a substantial improvement for white females (WF). The average accuracy on PPB increased from to . The performance on the original CelebA data only dropped a little from this modification. This suggests the interesting potential of using Neuron Shapley for rapid model repair. Zeroing out filters can be much faster and easier than retraining the network, and it also does not require an extensive training set.
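As an illustration of the two steps above, the sketch below computes an average-recall-over-subgroups metric from per-image outcomes and then picks the filters with the most negative values as candidates for removal. The record format, the helper names (`average_group_recall`, `culprit_filters`) and all numbers are made up for the example and are not the paper's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def average_group_recall(records: Iterable[Tuple[str, bool]]) -> Tuple[float, Dict[str, float]]:
    """Average over subgroups of the fraction of correctly classified faces.

    Each record is (subgroup label, whether the prediction was correct).
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    recalls = {g: hits[g] / totals[g] for g in totals}
    return sum(recalls.values()) / len(recalls), recalls


def culprit_filters(phi: List[float], m: int) -> List[int]:
    """Indices of (at most) the m filters with the most negative values."""
    order = sorted(range(len(phi)), key=lambda i: phi[i])
    return [i for i in order[:m] if phi[i] < 0]


# Made-up outcomes for the four subgroups.
records = ([("WF", True)] * 3 + [("WF", False)] +
           [("BF", True), ("BF", False)] +
           [("WM", True)] * 2 + [("BM", True)] * 2)
fairness, per_group = average_group_recall(records)

# Made-up per-filter values with respect to the fairness metric.
culprits = culprit_filters([0.2, -0.3, 0.0, -0.1, 0.05], m=2)
```

Because every subgroup carries the same weight in the average, a low recall on a small subgroup lowers the metric as much as a low recall on a large one, which is what makes the metric sensitive to disparity.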

Identifying filters vulnerable to adversaries.

Deep neural networks are vulnerable to attacks. For example, an adversary can arbitrarily change the output of the model by adding imperceptible perturbations to the input image Goodfellow et al. (2014); Ghorbani et al. (2019a). We apply Neuron Shapley to identify filters that are most vulnerable to attacks.

We take the Inception-v3 model trained on ImageNet. We use an adversary whose goal is to perturb each validation image so it’s misclassified as a randomly chosen class by Inception-v3. We use the iterative PGD attack  Kurakin et al. (2016); Madry et al. (2017) which is one of the most common adversarial attack methods. We allow the perturbation norm to be at most (for pixel values between 0 and 255) which is considered as the maximum allowed perturbation size in the literature Tramèr et al. (2017).

The performance metric $V$ is the success rate of the adversary in fooling the network into predicting the randomly chosen labels on the validation data. The Neuron Shapley values are computed for each filter with respect to this $V$. A high value therefore suggests that the filter is more targeted and leveraged by the adversary to produce misclassification. The rank correlation between these adversarial Shapley values and the original prediction accuracy Shapley values is just 0.3. This suggests that the network filters interact differently on the adversarially perturbed images than on the clean images.
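For reference, a rank correlation between two vectors of per-filter values can be computed as in the sketch below, which uses the Spearman formula for data without ties. Whether this is the exact variant used for the reported 0.3 is an assumption, and the two short vectors are made-up numbers.

```python
from typing import List


def ranks(values: List[float]) -> List[int]:
    """Rank of each entry (1 = smallest); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(a: List[float], b: List[float]) -> float:
    """Spearman rank correlation for two equally long vectors without ties."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Made-up values of four filters under two different metrics.
accuracy_values = [0.5, 0.1, 0.3, 0.0]
adversarial_values = [0.2, 0.4, 0.1, 0.0]
rho = spearman(accuracy_values, adversarial_values)
```

A value near 1 would mean that the same filters matter for both metrics; a low value, as reported in the text, indicates that the two rankings largely disagree.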

We zero out the filters with the highest Shapley values (these are the most vulnerable filters). Removing just 16 filters drops the adversary’s attack success rate from nearly 100% to nearly zero (0.1%), while the model’s performance on clean images drops more moderately from to . We note that while the modified network is robust to the original adversary, it is still vulnerable to a new adversary specifically designed to attack the modified network. This requires a white-box adversary who knows exactly which few neurons are zeroed out. We investigated several black-box adversaries, i.e. attacks that are developed on other datasets and which are not used to compute the Neuron Shapley value. The modified network is substantially more robust against these other black-box adversaries: their attack success rate drops by on average. This suggests that Neuron Shapley can potentially offer a fast mechanism to repair models against black-box attacks without needing to retrain.

Figure 4: Filter dropout effect The Neuron Shapley results for two SqueezeNet models trained on the CelebA dataset, one trained with filter dropout and one without. (a) The histogram of values for the two models. (b) Removing the 30 highest value neurons shows that the dropout-trained model is more robust. (c) Removing filters with the lowest values shows that the dropout-trained model is robust to the removal of almost half of the filters. The CelebA test set has a 40-60 class imbalance; therefore we see the sharp drop from accuracy to accuracy.

Dropout effect

Standard convnets, like the ones we use here, are typically trained without dropout regularization for conv layers Szegedy et al. (2016, 2017); Simonyan and Zisserman (2014). We hypothesize that adding dropout could substantially change the Shapley values because it encourages filters to be more independent. To test this, we train a second SqueezeNet on CelebA and use (filter) dropout throughout its training (). This model is accurate on test images, which is slightly lower than the non-dropout model. We compute the Neuron Shapley values in the new model. The values with dropout are more concentrated around zero (Fig. 4(a)). As expected, the dropout SqueezeNet is also more robust to the removal of high-value filters (Fig. 4), presumably because dropout encourages more redundancy. It is interesting, though, that even with dropout, removing just the top 30 filters can completely diminish the network’s performance, suggesting that there is still a small number of critical neurons.

5 Discussion

We introduce Neuron Shapley, a post-training method to quantify each individual neuron’s contribution to the network’s performance. Neuron Shapley is theoretically principled due to its connection to game theory and is well-suited to disentangling the interactions of different neurons. We show that using Neuron Shapley we are able to discover a sparse structure of critical neurons both on the class level and the global level. The model’s behavior is largely dependent on the presence of the critical neurons. We can utilize this sparsity to apply post-training fixes to the model without any access to the training data; e.g. we can make the model more fair towards specific subgroups or less fragile against adversarial attacks just by removing a few responsible neurons. This opens interesting new approaches to model repair that deserve further investigation. A drawback of the Neuron Shapley formulation is its large computational cost. We have introduced a novel multi-arm bandit algorithm to reduce that cost by orders of magnitude. This enables us to efficiently compute Shapley values on widely used deep networks. Throughout this work, we have mainly focused on post-training edits to the model. One interesting future direction is to change the model given the neuron contributions and retrain it in an iterative fashion.

Appendix A Proof

The following proof is a direct map of the original Shapley value proof in the cooperative game theory setting  Shapley (1953).

A.1 Neuron Shapley satisfies the three desired properties

Zero contribution

If we have a neuron that contributes nothing to any subset of the rest of neurons, by definition of Eqn. 1, its value would be zero.

Symmetric Elements

If two neurons contribute exactly the same to any subset of the rest of neurons, again using Eqn. 1, they will have the same values by definition.


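Additivity in Performance Metric

Assume we have $V = V_1 + V_2$. It follows that:

$$\phi_i(V) = \frac{1}{n} \sum_{S \subseteq \mathcal{M} \setminus \{f_i\}} \frac{\left[V_1(S \cup \{f_i\}) + V_2(S \cup \{f_i\})\right] - \left[V_1(S) + V_2(S)\right]}{\binom{n-1}{|S|}} = \phi_i(V_1) + \phi_i(V_2)$$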

A.2 Proof of uniqueness

We show that any contribution scheme that satisfies the three desired properties is identical to Neuron-Shapley.

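Consider a simple binary performance metric $V_S$ where, for a subset of the neurons ($S \subseteq N$), we have: $V_S(T) = 1$ if $S \subseteq T$ and $V_S(T) = 0$ otherwise. The contribution scheme $\phi$ has to divide $V_S(N) = 1$ among players while satisfying the three properties. By definition of the zero contribution property, a player $i$ where $i \notin S$ must have zero contribution. It’s also clear that any two players $i, j \in S$ are exchangeable and therefore should have equal contribution. Therefore, the only $\phi$ that satisfies the conditions is: $\phi_i(V_S) = 1/|S|$ if $i \in S$ and $\phi_i(V_S) = 0$ otherwise.

Lemma A.1.

Given all of the subsets $S \subseteq N$ and the simple performance metrics $V_S$ as defined above, we can write any performance metric $V$ as a linear combination of these simple metrics, i.e. for any subset $T \subseteq N$:

$$V(T) = \sum_{S \subseteq N} c_S V_S(T), \qquad c_S = \sum_{R \subseteq S} (-1)^{|S|-|R|} V(R)$$

For an arbitrary $T$ we have:

$$\sum_{S \subseteq N} c_S V_S(T) = \sum_{S \subseteq T} \sum_{R \subseteq S} (-1)^{|S|-|R|} V(R) = \sum_{R \subseteq T} V(R) \left( \sum_{s=|R|}^{|T|} \binom{|T|-|R|}{s-|R|} (-1)^{s-|R|} \right)$$

The term inside parentheses is the binomial expansion of $(1-1)^{|T|-|R|}$, meaning that it is equal to one if $|R| = |T|$ and zero otherwise. The only case where $|R| = |T|$ while $R \subseteq T$ is when $R = T$. Therefore:

$$\sum_{S \subseteq N} c_S V_S(T) = V(T)$$

Now considering that our $\phi$ should satisfy additivity in the performance metric, for any player $i$ we must have:

$$\phi_i(V) = \sum_{S \subseteq N} c_S \, \phi_i(V_S)$$

Given the result of the previous lemma and the values of $\phi_i(V_S)$ derived above, we must have:

$$\phi_i(V) = \sum_{S \ni i} \frac{c_S}{|S|} = \sum_{S \ni i} \frac{1}{|S|} \sum_{R \subseteq S} (-1)^{|S|-|R|} V(R)$$

and by changing the order of summation we have:

$$\phi_i(V) = \sum_{R \subseteq N} \left( \sum_{S \supseteq R \cup \{i\}} \frac{(-1)^{|S|-|R|}}{|S|} \right) V(R)$$

Let’s define:

$$\gamma_i(R) = \sum_{S \supseteq R \cup \{i\}} \frac{(-1)^{|S|-|R|}}{|S|}$$

For two subsets $R$ and $R'$ that only differ in the $i$’th filter (i.e. $R' = R \cup \{i\}$ with $i \notin R$), we have $\gamma_i(R') = -\gamma_i(R)$ (as all the right-hand-side terms are the same except for $|R'| = |R| + 1$). It follows that:

$$\phi_i(V) = \sum_{R \subseteq N \setminus \{i\}} \gamma_i(R \cup \{i\}) \left[ V(R \cup \{i\}) - V(R) \right]$$

and for each $R'$ that contains $i$, there are $\binom{|N|-|R'|}{s-|R'|}$ sets of filters $S \supseteq R'$ such that $|S| = s$. We have:

$$\gamma_i(R') = \sum_{s=|R'|}^{|N|} \binom{|N|-|R'|}{s-|R'|} \frac{(-1)^{s-|R'|}}{s} = \int_0^1 x^{|R'|-1} (1-x)^{|N|-|R'|} \, dx = \frac{(|R'|-1)! \, (|N|-|R'|)!}{|N|!}$$

and finally we have:

$$\phi_i(V) = \sum_{R \subseteq N \setminus \{i\}} \frac{|R|! \, (|N|-|R|-1)!}{|N|!} \left[ V(R \cup \{i\}) - V(R) \right] = \frac{1}{|N|} \sum_{R \subseteq N \setminus \{i\}} \frac{V(R \cup \{i\}) - V(R)}{\binom{|N|-1}{|R|}}$$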
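The uniqueness argument can be sanity-checked numerically on a small game. The sketch below assumes the usual construction: any metric $V$ is written as a combination of simple metrics with coefficients $c_S = \sum_{R \subseteq S} (-1)^{|S|-|R|} V(R)$, each simple metric pays $1/|S|$ to its members, and the result should coincide with the closed-form Shapley weights. All function names and the random metric are hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import factorial
import random


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def simple_metric_coefficients(players, value):
    """c_S = sum over R subset of S of (-1)^(|S|-|R|) * V(R)."""
    return {
        s: sum((-1) ** (len(s) - len(r)) * value(r) for r in subsets(s))
        for s in subsets(players)
    }


def shapley_from_coefficients(players, coeffs):
    """phi_i = sum over S containing i of c_S / |S|."""
    return {
        i: sum(c / len(s) for s, c in coeffs.items() if i in s)
        for i in players
    }


def shapley_closed_form(players, value):
    """phi_i = sum over R without i of |R|!(n-|R|-1)!/n! * (V(R+i) - V(R))."""
    n = len(players)
    phi = {}
    for i in players:
        rest = [p for p in players if p != i]
        phi[i] = sum(
            factorial(len(r)) * factorial(n - len(r) - 1) / factorial(n)
            * (value(r | {i}) - value(r))
            for r in subsets(rest)
        )
    return phi


players = [0, 1, 2, 3]
rng = random.Random(0)
table = {s: rng.random() for s in subsets(players)}  # arbitrary toy metric
value = table.__getitem__

coeffs = simple_metric_coefficients(players, value)

# Decomposition check: V(T) equals the sum of c_S over all S contained in T.
reconstruction_error = max(
    abs(value(t) - sum(c for s, c in coeffs.items() if s <= t))
    for t in subsets(players)
)

phi_decomposed = shapley_from_coefficients(players, coeffs)
phi_formula = shapley_closed_form(players, value)
formula_gap = max(abs(phi_decomposed[i] - phi_formula[i]) for i in players)

print(reconstruction_error, formula_gap)  # both should be at rounding level
```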

Appendix B Early truncation

As mentioned in the main text, one method of approximation is to assign zero marginal contribution to filters that appear early on in a sampled permutation. In Fig. 5, we sample a random order of players and remove filters one by one in the sampled order (repeated 100 times). As the figures suggest, for any coalition of filters that is not large enough, the performance is completely degraded. For the Inception-v3 model, removing around of its nearly filters is enough to degrade the model. In our experiments, we approximate the marginal effect of filters by zero whenever the network performance falls below . This gives us a speed-up of one order of magnitude, as we will not perform the actual forward pass for more than of the filters in a sampled permutation in Alg. 1. The same happens for the Squeezenet model by removing around one-fifth of the filters (out of nearly filters in the model).
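The truncation idea described above can be sketched as follows. This is a minimal illustration, not the paper's Alg. 1: it uses plain permutation sampling without the bandit component, and the toy performance function, the threshold of 0.05, and the name `truncated_mc_shapley` are all hypothetical choices made for the example.

```python
import random


def truncated_mc_shapley(players, value, threshold, num_permutations, seed=0):
    """Estimate player values by removing players in random order.

    Once the remaining coalition's performance drops below `threshold`,
    the marginal contributions of all players still to be removed in that
    permutation are approximated by zero and `value` is not called again.
    """
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    evaluations = 0
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        remaining = set(players)
        prev = value(frozenset(remaining))
        evaluations += 1
        for p in order:
            if prev < threshold:
                break  # truncation: skip the remaining forward passes
            remaining.remove(p)
            cur = value(frozenset(remaining))
            evaluations += 1
            totals[p] += prev - cur
            prev = cur
    estimates = {p: totals[p] / num_permutations for p in players}
    return estimates, evaluations


# Toy stand-in for network accuracy: 20 "filters", and performance
# collapses to zero once 4 or more of them have been removed.
NUM_FILTERS = 20
filters = list(range(NUM_FILTERS))


def toy_performance(coalition):
    return max(0.0, 5 * len(coalition) / NUM_FILTERS - 4)


estimates, evaluations = truncated_mc_shapley(
    filters, toy_performance, threshold=0.05, num_permutations=200
)
print(sum(estimates.values()), evaluations)
```

In this toy setting each permutation needs 5 evaluations of the performance function instead of the 21 required without truncation, and the estimates still sum to the full model's performance because the skipped marginals are exactly zero.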

Figure 5: Truncation. The figure shows the performance of the two models used throughout this paper as filters are removed randomly (100 removal trajectories). It can be seen that for the Inception-v3 model, removing around of filters will break performance. The same is true for Squeezenet by removing around of filters.

Appendix C Model interpretation through Neuron-Shapley

A more complete set of examples of important filters of the Inception-v3 model is visualized in Fig. 6.

Figure 6: Inception-V3 important filters

Appendix D Squeezenet Interpretation

Similar to Fig. 1, in Fig. 7 we show the most important filter in each layer of the Squeezenet model trained for the gender detection task. There is a range of filters that can be interpreted as background color, skin color, face angle, amount of hair, and so forth.

Figure 7: CelebA important filters

Appendix E Alg. 1 Sample Efficiency

To investigate the speed-up effect of the multi-armed bandit trick for computing Shapley values, we run the original Monte Carlo Shapley algorithm to compute the importance of filters in the Squeezenet model. For both the Monte Carlo Shapley and TMAB-Shapley algorithms, we use empirical Bernstein Mnih et al. (2008); Maurer and Pontil (2009) error bounds, which have the benefit of using the empirical variance of the sampled variable. At iteration of Alg. 1, for the ’th filter that has appeared in for iterations, we have:

with probability at least . is the size of the range of ’th filter’s marginal contributions. Throughout this work, we assume minimum priors over the filters and therefore fix for all filters, i.e. removing a filter will never result in a more than of drop (or increase) in accuracy. We run both algorithms for an error tolerance of . Fig. 8 depicts the results: (a) First we show the number of samples each algorithm requires for each filter (for better visualization, we rank filters based on their value). As can be seen, TMAB-Shapley is considerably more sample efficient. On average, it requires of the samples required for MC-Shapley; in other words, around times smaller number of forward passes on the model. (b) The histogram of the number of filters versus the number of samples shows that the TMAB-Shapley algorithm requires considerably fewer samples for most of the filters while requiring a large number of samples for a small group of filters that have values close to that of ’th filter. (c) Empirically, it seems that although TMAB-Shapley is not targeted towards accurate computation of values for all filters, the computed values are very close to the accurate values computed by the MC-Shapley method (Rank correlation = , ). This shows that the empirical Bernstein error bounds are pessimistic; tightening them could be an interesting direction of research for future work.
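To illustrate why a variance-aware bound helps, the sketch below computes the radius of an empirical Bernstein confidence interval in the form commonly stated for empirical Bernstein stopping, $\hat{\sigma}_t \sqrt{2 \log(3/\delta)/t} + 3 R \log(3/\delta)/t$, and compares it with a Hoeffding radius. The constants follow that standard form and may differ from the bound used in this appendix; the function names, the sample values, the range $R = 1$, and $\delta = 0.05$ are assumptions made only for this example.

```python
import math


def empirical_bernstein_radius(samples, value_range, delta):
    """Confidence radius that shrinks with the empirical variance."""
    t = len(samples)
    mean = sum(samples) / t
    variance = sum((x - mean) ** 2 for x in samples) / t  # biased estimate
    log_term = math.log(3.0 / delta)
    return (math.sqrt(2.0 * variance * log_term / t)
            + 3.0 * value_range * log_term / t)


def hoeffding_radius(num_samples, value_range, delta):
    """Two-sided Hoeffding radius, which ignores the variance."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * num_samples))


# Hypothetical marginal contributions of one filter: small and nearly constant.
samples = [0.010, 0.012, 0.009, 0.011] * 250
eb = empirical_bernstein_radius(samples, value_range=1.0, delta=0.05)
hf = hoeffding_radius(len(samples), value_range=1.0, delta=0.05)
print(eb, hf)
```

For low-variance marginal contributions such as these, the empirical Bernstein radius is noticeably smaller than the Hoeffding radius at the same number of samples, so a filter can be resolved with fewer forward passes.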

Figure 8: TMAB-Shapley sample efficiency


  1. The cardinality of this subset can be specified by the user or chosen adaptively.
  2. Deep Dream uses gradient ascent to directly optimize for activation of a filter’s response while adding small transformations (jittering, blurring, etc.) at each step Olah et al. (2017).


  1. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one 10 (7). Cited by: §1.
  2. Adaptive monte-carlo optimization. arXiv preprint arXiv:1805.08321. Cited by: §3.
  3. Network dissection: quantifying interpretability of deep visual representations. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6541–6549. Cited by: §1, §4.
  4. Layer-wise relevance propagation for neural networks with local renormalization layers. In International Conference on Artificial Neural Networks, pp. 63–71. Cited by: §1.
  5. Gender shades: intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency, pp. 77–91. Cited by: §4.
  6. Polynomial calculation of the shapley value based on sampling. Computers & Operations Research 36 (5), pp. 1726–1730. Cited by: §3.
  7. Visualizing higher-layer features of a deep network. University of Montreal 1341 (3), pp. 1. Cited by: §1.
  8. The lottery ticket hypothesis: finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635. Cited by: §4.
  9. Interpretation of neural networks is fragile. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3681–3688. Cited by: §4.
  10. Towards automatic concept-based explanations. In Advances in Neural Information Processing Systems, pp. 9273–9282. Cited by: §1, §4, §4.
  11. Data shapley: equitable valuation of data for machine learning. In International Conference on Machine Learning, pp. 2242–2251. Cited by: §1, §3.
  12. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §4.
  13. SqueezeNet: alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360. Cited by: §4.
  14. Non-stochastic best arm identification and hyperparameter optimization. In Artificial Intelligence and Statistics, pp. 240–248. Cited by: §3.
  15. Towards efficient data valuation based on the shapley value. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1167–1176. Cited by: §1.
  16. Interpretability beyond feature attribution: quantitative testing with concept activation vectors (tcav). arXiv preprint arXiv:1711.11279. Cited by: §1, §4, §4.
  17. Multiaccuracy: black-box post-processing for fairness in classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp. 247–254. Cited by: §1, §4.
  18. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236. Cited by: §4.
  19. Hyperband: a novel bandit-based approach to hyperparameter optimization. The Journal of Machine Learning Research 18 (1), pp. 6765–6816. Cited by: §3.
  20. Large-scale celebfaces attributes (celeba) dataset. Retrieved August 15, 2018. Cited by: §4.
  21. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Cited by: §4.
  22. Understanding deep image representations by inverting them. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5188–5196. Cited by: §1.
  23. Bounding the estimation error of sampling-based shapley value approximation. arXiv preprint arXiv:1306.4265. Cited by: §3.
  24. Values of large games. 6: evaluating the electoral college exactly. Technical report RAND CORP SANTA MONICA CA. Cited by: §3.
  25. Empirical bernstein bounds and sample variance penalization. arXiv preprint arXiv:0907.3740. Cited by: Appendix E, §4.
  26. Empirical bernstein stopping. In Proceedings of the 25th international conference on Machine learning, pp. 672–679. Cited by: Appendix E, §4.
  27. Explaining nonlinear classification decisions with deep taylor decomposition. Pattern Recognition 65, pp. 211–222. Cited by: §1.
  28. Inceptionism: going deeper into neural networks, 2015. URL https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html. Cited by: §1.
  29. Multifaceted feature visualization: uncovering the different types of features learned by each neuron in deep neural networks. arXiv preprint arXiv:1602.03616. Cited by: §1.
  30. Feature visualization. Distill. Cited by: §1, §1, footnote 2.
  31. Imagenet large scale visual recognition challenge. International Journal of Computer Vision 115 (3), pp. 211–252. Cited by: §4.
  32. The shapley value: essays in honor of lloyd s. shapley. Cambridge University Press. Cited by: §1, §2.
  33. A value for n-person games. Contributions to the Theory of Games 2 (28), pp. 307–317. Cited by: Appendix A, §1, §2.
  34. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 3145–3153. Cited by: §1.
  35. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034. Cited by: §1.
  36. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §4.
  37. Analysing neural network topologies: a game theoretic approach. Procedia Computer Science 126, pp. 234–243. Cited by: §1.
  38. Inception-v4, inception-resnet and the impact of residual connections on learning.. In AAAI, Vol. 4, pp. 12. Cited by: §4.
  39. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9. Cited by: §1.
  40. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826. Cited by: §4, §4.
  41. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §1.
  42. Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204. Cited by: §4.
  43. Adaptive monte carlo multiple testing via multi-armed bandits. arXiv preprint arXiv:1902.00197. Cited by: §3.