Trustworthy Convolutional Neural Networks: A Gradient Penalized-based Approach

Abstract

Convolutional neural networks (CNNs) are commonly used for image classification. Saliency methods are examples of approaches that can be used to interpret CNNs post hoc, identifying the most relevant pixels for a prediction by following the gradient flow. Even though CNNs can correctly classify images, the underlying saliency maps can be erroneous in many cases. This can result in skepticism as to the validity of the model or its interpretation. We propose a novel approach for training trustworthy CNNs by penalizing parameter choices that result in inaccurate saliency maps generated during training. We add a penalty term for inaccurate saliency maps produced when the predicted label is correct, a penalty term for accurate saliency maps produced when the predicted label is incorrect, and a regularization term penalizing overly confident saliency maps. Experiments show increased classification performance, user engagement, and trust.

1 Introduction

Convolutional neural networks (CNNs) are used in computer vision for tasks such as object detection (Redmon and Farhadi, 2018; Redmon et al., 2015; Redmon and Farhadi, 2017), image classification Krizhevsky et al. (2012), and visual question answering (Antol et al., 2015; Cadène et al., 2019). The success of these models has created a need for model transparency. Due to their complex structure, CNNs are often difficult to interpret. Features learned by CNNs can be visualized, but understanding the meaning of these hidden representations can be difficult for non-experts.

There are many approaches to interpreting the output of machine learning models (Ribeiro et al., 2018, 2016; Lundberg and Lee, 2017; Ancona et al., 2018; Montavon et al., 2015; Kim et al., 2018; Hooker et al., 2019; Chen et al., 2019). Saliency methods offer an intuitive way to understand what a CNN has learned. These algorithms provide post hoc interpretations by highlighting a set of pixels or super pixels in the input image that are most relevant for a prediction; small changes to these highlighted pixels result in the largest change in the predicted score. One of the first saliency methods, Gradients (sometimes called Vanilla Gradients), computes the gradient of the class score with respect to the input image Simonyan et al. (2014). Guided Backpropagation Springenberg et al. (2015) imputes the gradient of layers using ReLUs, backpropagating only positive gradients. Class Activation Mapping (CAM) uses a specific CNN architecture to discriminate regions of the input Zhou et al. (2016). Grad-CAM generalizes CAM to any CNN architecture Selvaraju et al. (2016). Many other saliency methods exist (Shrikumar et al., 2017, 2016; Zeiler and Fergus, 2014; Smilkov et al., 2017; Sundararajan et al., 2017; Montavon et al., 2017) to visually interpret CNN predictions. Existing work shows that not all of these methods are robust Adebayo et al. (2018), and that saliency maps can be manipulated Dombrowski et al. (2019). HINT Selvaraju et al. (2019) encourages deep neural networks to be sensitive to the same input regions as humans. The authors of Rieger et al. (2019) add a penalty term for explanation predictions far away from their respective ground truth. For future versions, potential baselines include (Zintgraf et al., 2017; Chang et al., 2019; Kindermans et al., 2018).

On the task of image classification, saliency methods allow the user to visually inspect the highlighted regions of the image. The overlap between the saliency map and the object of interest can be easily and quickly assessed by users. This forms an intuitive way to interpret what a model has learned. Saliency methods, however, can attribute relevant pixels outside the object of interest, producing what we term an inaccurate saliency map. Any of the saliency methods mentioned above can encounter some form of this issue, indicating the model has chosen a poor set of parameters.1 In contrast, an accurate saliency map attributes pixel values over the object of interest.

Convolutional neural networks are most commonly optimized by minimizing the cross entropy between the target distribution and the predicted target distribution. This loss function is unaware that the choice of parameters giving a correct prediction may result in inaccurate saliency maps shown to the user. Conversely, parameters may be learned that give an incorrect prediction but result in a visually accurate saliency map. Figure 1(b) shows an example where the predicted label is incorrect and the saliency map is inaccurate. Figure 1(c) shows an example where the predicted label is correct and the saliency map is accurate.

We propose a loss function that unifies traditional loss functions with post hoc interpretation methods. This function includes a penalty term for inaccurate saliency maps generated when the predicted class is correct, a penalty term for accurate saliency maps generated when the prediction is incorrect, and a penalty term for overly confident saliency maps. Indeed this involves computing predicted saliency maps on the forward pass during training. We demonstrate these penalty terms can be added to the existing loss of a pre-trained model to continue training, or used in a transfer learning framework to improve post hoc saliency maps.

(a) Input Image: Cockroach
(b) VGG-16: Tick
(c) Proposed: Cockroach
Figure 1: VGG-16 incorrectly classifies the input image, and the post hoc saliency map is inaccurate. The trustworthy CNN correctly classifies the image and produces an accurate saliency map.

2 Problem Setting

2.1 Prior Work

Consider a convolutional neural network $f$ that classifies an image into one of $K$ classes. Let $y$ be the true label of an image $x$. Let $\hat{y}$ be $x$'s predicted label from $f$, or equivalently $\hat{y} = \arg\max_{c} f(x)_{c}$. The cross entropy is thus given by

$$\mathcal{L}_{CE}(x, y) = -\sum_{c=1}^{K} y_{c} \log f(x)_{c} \qquad (1)$$

Two saliency methods of particular interest are Grad-CAM and Guided Grad-CAM Selvaraju et al. (2016). Let $A$ be the activation maps of some convolutional layer $l$, with a given activation map denoted $A^{k}$. To produce a saliency map for some class $c$, Grad-CAM computes the gradients of the target class score $y^{c}$ with respect to layer $l$'s activation maps. Global average pooling is performed on the gradients to serve as weights for each activation map. These weights, denoted $\alpha_{k}^{c}$ in equation 2, represent the importance of a given activation map $A^{k}$ for some target class $c$.

$$\alpha_{k}^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A_{ij}^{k}} \qquad (2)$$

After computing a weighted combination of the activation maps, the resulting output is passed through a ReLU. Here a ReLU is used to focus only on the features that have a positive influence on the true class $c$, i.e. pixels whose intensity should be increased in order to increase $y^{c}$ Selvaraju et al. (2016). The saliency map output by Grad-CAM is thus given by

$$L^{c}_{\text{Grad-CAM}} = \text{ReLU}\left(\sum_{k} \alpha_{k}^{c} A^{k}\right) \qquad (3)$$

Guided Grad-CAM Selvaraju et al. (2016) combines the output from Grad-CAM (equation 3) with the output from Guided Backpropagation Springenberg et al. (2015) through element-wise multiplication. This is done for two reasons. First, Guided Backpropagation alone is not class discriminative. Second, Grad-CAM fails to produce high resolution (fine-grained) visualizations. Merging the two saliency methods produces saliency maps that are both high resolution and class discriminative Selvaraju et al. (2016).
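For concreteness, the following is a minimal TensorFlow sketch of the Grad-CAM computation described above, following equations 2 and 3. The layer name block5_conv3, the normalization step, and the commented usage lines are illustrative assumptions rather than details taken from this paper.

```python
import tensorflow as tf

def grad_cam(model, image, class_index, layer_name="block5_conv3"):
    """Minimal Grad-CAM sketch: gradients of the class score w.r.t. a
    convolutional layer's activations are global-average-pooled to weight
    the activation maps (eq. 2), and the weighted sum is passed through a
    ReLU (eq. 3)."""
    # Model returning both the chosen layer's activations and the class scores.
    grad_model = tf.keras.Model(
        inputs=model.inputs,
        outputs=[model.get_layer(layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        activations, predictions = grad_model(image[tf.newaxis, ...])
        class_score = predictions[:, class_index]
    # d(class score) / d(activation maps), shape (1, H, W, K).
    grads = tape.gradient(class_score, activations)
    # Global average pooling over the spatial dimensions: one weight per map.
    weights = tf.reduce_mean(grads, axis=(1, 2))                  # shape (1, K)
    # Weighted combination of activation maps, then ReLU.
    cam = tf.nn.relu(tf.einsum("bijk,bk->bij", activations, weights))[0]
    # Normalize to [0, 1] for visualization.
    return cam / (tf.reduce_max(cam) + 1e-8)

# Hypothetical usage with a pre-trained VGG-16 and a preprocessed 224x224 image:
# model = tf.keras.applications.VGG16(weights="imagenet")
# saliency = grad_cam(model, preprocessed_image, class_index=predicted_class)
```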

2.2 Shortcomings

Despite the state-of-the-art classification performance achieved by convolutional neural networks, loss minimizing parameters may result in saliency maps that do not highlight relevant pixels over the object of interest. The saliency map depends on the learned parameters, yet the parameters are learned without knowing whether the resulting saliency maps are visually accurate. Additionally, showing an inaccurate saliency map to a practitioner does not provide insight on how to change model parameters to correctly highlight pixels over the object of interest.

As an example, take a pre-trained VGG-16 model Simonyan and Zisserman (2015), trained on the ImageNet dataset Russakovsky et al. (2015). We use Grad-CAM on selected images to identify relevant pixels. Figure 2 shows four different cases that can be encountered when using saliency methods for interpretations. Figure 2(b) shows case 1, where the predicted label is correct, and the resulting saliency map is visually accurate. Figure 2(g) shows case 2, where the predicted class is incorrect, and the resulting saliency map is accurate. Figure 2(l) shows case 3, where the predicted class is correct, and the resulting saliency map is inaccurate. Figure 2(q) shows case 4, where the predicted class is incorrect and the resulting saliency map is inaccurate.

Figure 2: Input images shown with post hoc saliency maps from a VGG-16 baseline and our proposed gradient penalized trustworthy CNN model trained with various learning rates. Rows correspond to the true classes Brambling, Vulture, Pajama, and Sports car; columns show the input image, the baseline, and three proposed-model variants. The baseline predicts Brambling (panel b), Kite (panel g), Pajama (panel l), and Racer (panel q), while the proposed models predict the true class in the remaining panels.

Models giving inaccurate predictions (Figures 2(g) and 2(q)) and/or inaccurate saliency maps (Figures 2(l) and 2(q)) will cause users to lose trust in the model. Currently, convolutional neural networks are optimized ignoring how the saliency map will look post hoc. To our knowledge, no method exists to train convolutional neural networks to produce visually accurate saliency maps. We propose a loss function that penalizes inaccurate saliency maps, resulting in model parameters that produce visually accurate saliency maps post hoc and improved classification performance, ensuring better user trust.

3 Trustworthy Convolutional Neural Networks

We define a trustworthy CNN as one that produces accurate predictions and visually accurate post hoc saliency maps, as determined by user evaluation.

3.1 Loss function

To identify parameters that produce both accurate predictions and accurate saliency maps, constraints must be added to the cross entropy loss. Saliency maps produced post hoc can be visually accurate while the model classifies the observation incorrectly. Additionally, visually inaccurate saliency maps can be produced while the model classifies the observation correctly. Lastly, visually inaccurate saliency maps can be observed while the model incorrectly classifies the observation. The loss function must consider the saliency maps produced from the parameter choices at each step taken by the optimizer.

Take a saliency map $M$ generated by a saliency method on the forward pass of training. We average the predicted saliency map across all dimensions. This penalty term, given by equation 4, is used to gauge the confidence of the predicted saliency map.

$$\bar{M} = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} M_{ij} \qquad (4)$$

Adding the constraint on overly confident saliency maps generated during training does not penalize interactions between the saliency maps and predicted labels. Further constraints are needed to account for the predicted saliency map being accurate when the predicted label is incorrect, and the predicted saliency map being inaccurate when the predicted label is correct. Equations 5 and 6 are added to capture the interaction between the predicted class labels and the predicted saliency map. Large gradient saliency maps with corresponding incorrect predicted labels are penalized, along with small gradient saliency maps with corresponding correct predicted labels.

$$\Omega_{1} = \mathbb{1}[\hat{y} \neq y]\,\bar{M} \qquad (5)$$
$$\Omega_{2} = \mathbb{1}[\hat{y} = y]\,(1 - \bar{M}) \qquad (6)$$

The final loss function used for all plots and tables in this work is given by equation 7. We use a scalar $\lambda$ to establish a dependence between the penalty terms and the cross entropy $\mathcal{L}_{CE}$. 2

$$\mathcal{L} = \mathcal{L}_{CE} + \lambda\left(\bar{M} + \Omega_{1} + \Omega_{2}\right) \qquad (7)$$
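As a rough illustration of how these terms could fit together in code, the sketch below implements one plausible reading of equations 4-7: the confidence penalty is taken as the mean of the predicted saliency map, and the interaction penalties gate that mean on whether the predicted label matches the ground truth. The exact functional forms, the variable names, and the hyperparameter `lam` are assumptions for illustration, not the authors' reference implementation.

```python
import tensorflow as tf

def trustworthy_loss(y_true, y_pred, saliency_map, lam=0.1):
    """Sketch of equation 7 under assumed forms for equations 4-6:
    - eq. 4 (assumed): mean of the predicted saliency map (confidence penalty),
    - eq. 5 (assumed): mean saliency when the predicted label is wrong,
    - eq. 6 (assumed): (1 - mean saliency) when the predicted label is right.
    `lam` plays the role of the scalar coupling the penalties to the cross entropy."""
    cross_entropy = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
    # Saliency maps are assumed to have shape (batch, H, W) with values in [0, 1].
    mean_saliency = tf.reduce_mean(saliency_map, axis=(1, 2))
    correct = tf.cast(tf.equal(tf.argmax(y_pred, axis=-1),
                               tf.cast(y_true, tf.int64)), tf.float32)
    penalty_incorrect = (1.0 - correct) * mean_saliency        # assumed eq. 5
    penalty_correct = correct * (1.0 - mean_saliency)          # assumed eq. 6
    return tf.reduce_mean(
        cross_entropy + lam * (mean_saliency + penalty_incorrect + penalty_correct)
    )
```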

The loss function we plan to use in future versions is given by equations 8-10, where the additional term is the pixel-wise cross entropy between the ground truth saliency map and the predicted saliency map.
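A minimal sketch of such a pixel-wise cross entropy term is shown below, assuming both maps are scaled to [0, 1]; the clipping constant and the per-image averaging are illustrative choices rather than details from this paper.

```python
import tensorflow as tf

def pixelwise_saliency_cross_entropy(true_map, pred_map, eps=1e-7):
    """Pixel-wise (binary) cross entropy between a ground truth saliency map
    and a predicted saliency map, both of shape (batch, H, W) in [0, 1]."""
    pred_map = tf.clip_by_value(pred_map, eps, 1.0 - eps)
    bce = -(true_map * tf.math.log(pred_map)
            + (1.0 - true_map) * tf.math.log(1.0 - pred_map))
    return tf.reduce_mean(bce, axis=(1, 2))   # one value per image in the batch
```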

3.2 Training

To optimize the loss proposed in equation 7, we freeze the weights of all layers in the network other than the layer being updated. We use stochastic gradient descent in all our experiments, although any of its variants can be used, as illustrated in the sketch below.
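As a small illustration of this setup, the snippet below freezes every layer except the one being updated and returns a plain SGD optimizer; the layer name and learning rate are hypothetical placeholders.

```python
import tensorflow as tf

def prepare_for_penalized_training(model, trainable_layer_name, learning_rate=1e-3):
    """Freeze all layers except the one whose parameters we want to update,
    then return an SGD optimizer (any SGD variant could be substituted)."""
    for layer in model.layers:
        layer.trainable = (layer.name == trainable_layer_name)
    return tf.keras.optimizers.SGD(learning_rate=learning_rate)

# Hypothetical usage with a pre-trained VGG-16:
# model = tf.keras.applications.VGG16(weights="imagenet")
# optimizer = prepare_for_penalized_training(model, "block5_conv3")
```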

Naturally, this loss function will be most effective in two settings: updating previously learned parameters of a pre-trained model, or learning the parameters of a newly added layer in a transfer learning framework. Consider the following example, where a practitioner identifies a layer in a convolutional network that learns a noticeable systematic error. Our loss function allows practitioners to update the layer weights and eliminate these errors without having to re-train the model from scratch. In the case of transfer learning, a new layer can be added and parameters can be learned that will produce accurate saliency maps post hoc.

There are no restrictions on which saliency method can be used to produce the saliency maps generated during training, provided the generated output, when averaged, is between zero and one. Regarding the choice of saliency method, some choices make more intuitive sense than others. For example, Guided Backpropagation Springenberg et al. (2015) and Deconvolutions Zeiler and Fergus (2014) are not class discriminative, and therefore should not be chosen.

4 Experimental Studies

4.1 Transfer Learning

One interesting application of the proposed loss function is in transfer learning Li et al. (2006), where knowledge from models trained on a specific task is applied to an entirely different domain. Some recent developments (Raghu et al., 2019; Lee et al., 2019; Song et al., 2019; Hanneke and Kpotufe, 2019; Zhuang et al., 2015) and several surveys (Pan and Yang, 2010; Zhuang et al., 2019) provide further detail.

We demonstrate how the proposed loss can be used in the context of transfer learning to improve classification performance. For this experiment, we use MobilenetV2 Sandler et al. (2018) on the cats and dogs dataset Elson et al. (2007), available in TensorFlow Abadi et al. (2016). The task is to classify whether an image contains a cat or a dog, using the knowledge learned from the ImageNet dataset. We place one convolutional layer after the last convolutional layer in the MobilenetV2 network, closest to the softmax layer. This additional convolutional layer consists of filters, using a kernel with a stride of . We then remove all layers after it, and add a softmax layer. All other layer weights are frozen. The baseline model uses the cross entropy loss given by equation 1. We compare this against two trustworthy CNN models trained using equation 7: the first uses Grad-CAM to generate saliency maps during training, the second uses Guided Grad-CAM. We train all models for epochs using a batch size of . We compare post hoc saliency maps relative to the baseline model using the structural similarity index (SSIM), given by equation 11. We fix the learning rate to and set .

$$\text{SSIM}(x, y) = \frac{(2\mu_{x}\mu_{y} + c_{1})(2\sigma_{xy} + c_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + c_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2})} \qquad (11)$$

where $c_{1} = (k_{1}L)^{2}$ and $c_{2} = (k_{2}L)^{2}$, and $L$ is defined by the dynamic range of the pixel values.
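To make the transfer learning setup described above concrete, the sketch below builds a frozen MobileNetV2 feature extractor with a single trainable convolutional layer and a softmax head, and shows how tf.image.ssim could be used to compare post hoc saliency maps. Since the exact filter count, kernel size, stride, input resolution, and pooling choice are not reproduced in this text, the values below are illustrative placeholders only.

```python
import tensorflow as tf

# Frozen MobileNetV2 feature extractor (ImageNet weights, no classification head).
# Input preprocessing (mobilenet_v2.preprocess_input) is omitted for brevity.
base = tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                         input_shape=(160, 160, 3))
base.trainable = False

# Hypothetical additional convolutional layer and softmax head; the filter count,
# kernel size, stride, and pooling below are placeholders, not the paper's values.
inputs = tf.keras.Input(shape=(160, 160, 3))
features = base(inputs, training=False)
x = tf.keras.layers.Conv2D(32, kernel_size=3, strides=1, padding="same",
                           activation="relu", name="trainable_conv")(features)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)  # cat vs. dog
model = tf.keras.Model(inputs, outputs)

# SSIM between two saliency maps scaled to [0, 1] (same spatial size, one channel):
# ssim = tf.image.ssim(map_baseline[..., tf.newaxis],
#                      map_trustworthy[..., tf.newaxis], max_val=1.0)
```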

4.2 VGG-16 on ImageNet

As a second experiment, we apply our loss function to VGG-16 Simonyan and Zisserman (2015), trained on the ImageNet dataset Russakovsky et al. (2015) for an image classification task. Again we train two trustworthy models, one using Grad-CAM to generate saliency maps during training, the other using Guided Grad-CAM. We demonstrate improved post hoc saliency maps as evaluated by users, and improved classification performance. We compare this to a VGG-16 baseline trained using only the cross entropy loss. We use a subset of images, and update the parameters of VGG-16 from the layer. This layer was chosen as it is the closest convolutional layer to the softmax layer. We freeze the weights of all other layers. We compare classification performance of all models using accuracy, precision, and recall. We train different models varying the learning rate and lambda . We perform a grid search for the learning rate and lambda hyperparameters. We consider learning rates , and . Post hoc saliency maps are evaluated with a user experiment, detailed below.

User Experiment

To compare the post hoc saliency maps across models, a scoring metric is needed. Two commonly used metrics are localization error Selvaraju et al. (2016) and the pointing game (Selvaraju et al., 2016; Zhang et al., 2016). Both methods require ground truth object labels. We do not assume the data has any object annotations, hence these metrics cannot be used.

To score the post hoc saliency maps between the trustworthy gradient penalized models and their respective baselines, we conduct a user experiment. We take the best performing set of hyperparameters (learning rate of , and ), and use all models in the experiment to generate saliency maps for user evaluation.

We randomly sample test set images for user evaluation. We show users the input image and a post hoc saliency map from each model. We ask users "Which image best highlights the <true_class> in the original image?" Users were given the options "Can't distinguish" and "They look the same" in case no saliency map is convincing. Users are asked if they have a Bachelor's degree in Computer Science, if they are familiar with the term saliency map as used by the machine learning community, and if they have experience in computer vision. A link to the experiment can be found below. 3

5 Results

5.1 Transfer Learning

The classification performance for all models can be found in Table 1. We observe the trustworthy MobilenetV2 model trained using Guided Grad-CAM outperformed the baseline and trustworthy Grad-CAM model. We find that both trustworthy CNN models outperformed their respective baseline models trained with just the cross entropy term.

The SSIM between saliency maps of the baseline model and the trustworthy CNN trained with Grad-CAM was , and between the baseline and the trustworthy CNN trained with Guided Grad-CAM. When equation 5 is set to zero, the SSIM for the trustworthy Grad-CAM and Guided Grad-CAM models drops to and , respectively. When equation 6 is set to zero, the SSIM for the trustworthy Grad-CAM and Guided Grad-CAM models increases to and , respectively. This metric shows the post hoc saliency maps differ visually from the baseline; however, it fails to identify which saliency maps are more visually correct.

Model Accuracy Precision Recall
MobilenetV2 Baseline 97.6% 97.6% 97.6%
Trustworthy CNN w/ Grad-CAM 98.4% 98.4% 98.4%
Trustworthy CNN w/ Grad-CAM () 98.3% 98.3% 98.3%
Trustworthy CNN w/ Grad-CAM () 98.5% 98.5% 98.5%
Trustworthy CNN w/ Guided Grad-CAM 98.7% 98.7% 98.7%
Trustworthy CNN w/ Guided Grad-CAM () 98.6% 98.6% 98.6%
Trustworthy CNN w/ Guided Grad-CAM () 98.4% 98.4% 98.4%
Classification performance on the transfer learning task. denotes models trained with equation 5 set to zero, denotes models trained with equation 6 set to zero.
Table 1: Classification Performance-MobilenetV2
Accuracy Precision Recall
VGG-16 Baseline 66% (56%,75%) 49.7% 50.8%
Trustworthy CNN w/ Grad-CAM 70% (61%, 78%) 54.6% 55.7%
VGG-16 Baseline 66% (56%,75%) 49.7% 50.8%
Trustworthy CNN w/ Guided Grad-CAM 70% (61%, 78%) 54.6% 55.7%
Classification performance shown for gradient penalized trustworthy models and baselines on test set images shown during user experiment. A confidence interval (lower, upper) is also included.
Table 2: Classification Performance-VGG16

5.2 VGG-16 on ImageNet

Table 2 shows the accuracy, precision, and recall of all models on the subset of test set images shown to users in the experiment. We use the best performing set of hyperparameters for user evaluation. We find that both trustworthy models outperform their respective baselines. The classification performance of the two trustworthy models is equal, likely due to setting .

User Experiment

We find the SSIM between the baseline VGG-16 and trustworthy Grad-CAM model was , and between the baseline VGG-16 and trustworthy Guided Grad-CAM model. According to this metric, the saliency maps generated by the gradient penalized models should be very similar to the base model. Through our user experiment however, we find this not to be the case. This is further discussed in Section 6.

Table 3 further breaks down the user experiment, showing the percentage of images that fall into each case. Users decided both trustworthy models outperform the baseline in all scenarios except case 2. Recall case 2 occurs when the predicted label is incorrect and the saliency map is accurate. Ideally, most images fall into case 1, and fewer images fall into cases 2, 3, and 4.

Case 1 Case 2
VGG-16 Baseline 18% (7%, 28%) 16% (6%, 26%)
Trustworthy CNN w/ Grad-CAM 44% (30%, 58%) 16% (6%, 26%)
VGG-16 Baseline 16% (6%, 26%) 10% (2%, 18%)
Trustworthy CNN w/ Guided Grad-CAM 50% (36%, 64%) 18% (7%, 29%)
Case 3 Case 4
VGG-16 Baseline 44% (30%, 58%) 16% (6%, 26%)
Trustworthy CNN w/ Grad-CAM 22% (10%, 33%) 12% (3%, 21%)
VGG-16 Baseline 48% (34%, 62%) 20% (9%, 31%)
Trustworthy CNN w/ Guided Grad-CAM 18% (7%, 29%) 8% (0%, 20%)
For the Trustworthy CNN trained with Grad-CAM, 44% of images shown to users were found to have accurate predictions and more accurate saliency maps (relative to the baseline). A confidence interval (lower, upper) is included. Recall Case 1: % of observations with predicted labels correct and resulting saliency maps accurate. Case 2: % of observations with predicted labels incorrect and resulting saliency maps accurate. Case 3: % of observations with predicted labels correct and resulting saliency maps inaccurate. Case 4: % of observations with predicted labels incorrect and resulting saliency maps inaccurate. A high percentage is desirable for Case 1; a low percentage is desirable for Cases 2, 3, and 4.
Table 3: User Experiment Breakdown-VGG16

5.3 Discussion

We find the trustworthy models trained with Guided Grad-CAM outperform all other models in terms of predicting correct labels and accurate post hoc saliency maps. In the transfer learning experiment, we find the SSIM decreases on gradient penalized models when equation 5 is set to zero. Additionally, the SSIM increases when equation 6 is set to zero. This shows the inherent trade-off between the two terms.

On the ImageNet dataset, users found the baseline VGG-16 models to produce inaccurate saliency maps. These models are not trustworthy. The most common error from the baseline models was producing accurate predictions with inaccurate saliency maps (case 3 from Table 3). This is not surprising considering the cross entropy loss is saliency map unaware. Users will question the legitimacy of the model when inaccurate saliency maps are produced. Our approach offers improved classification results and more accurate saliency maps, resulting in increased user trust.

6 Limitations

6.1 Loss Function

One noticeable limitation of the proposed loss is that only one convolutional layer can be updated at a time during training. This is partially due to the limitations of some saliency methods. Grad-CAM and Guided Grad-CAM Selvaraju et al. (2016), for example, generate saliency maps using the gradients from a specific layer only. Hence the gradients of one individual layer are used to compute the saliency map. This layer, however, may not fully represent what the entire model has learned.

(a) Input Image
(b) Hypothetical Model 1
(c) Hypothetical Model 2
Figure 3: Two saliency maps with equal attribution values

6.2 Saliency Map Scoring

One difficulty in scoring saliency maps is that two saliency maps can correctly highlight the object of interest, but equal attribution values can be assigned to different parts of the same object. An example of this can be demonstrated in Figure 3 using two hypothetical models.

For some input image, both models output saliency maps with equal total attribution values, but pixels are attributed to different locations on the object. The model in Figure 3(b) attributes the face of the marmot, and the model in Figure 3(c) attributes a portion of the face and body. These two saliency maps have exactly the same total attribution values when averaged. Hence, the SSIM between Figures 3(b) and 3(c) is , yet the two maps look significantly different. It is unclear which saliency map is more visually accurate.
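A small toy example with made-up attribution values illustrates the issue: two maps can have identical average attribution while attributing entirely different regions.

```python
import numpy as np

# Two toy 4x4 saliency maps with the same total/mean attribution,
# but attributing different regions (purely illustrative values).
map_face = np.zeros((4, 4))
map_face[:2, :2] = 1.0          # attributes the top-left block
map_body = np.zeros((4, 4))
map_body[2:, 2:] = 1.0          # attributes the bottom-right block

print(map_face.mean(), map_body.mean())      # identical averages: 0.25 0.25
print(np.abs(map_face - map_body).sum())     # yet 8 pixels disagree
```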

7 Conclusion

In this work, we combine the use of post hoc interpretability methods with traditional loss functions to learn trustworthy model parameters. We propose a loss function that penalizes inaccurate saliency maps during training. Further constraining the loss function used by convolutional neural networks increases classification performance, and users found the post hoc saliency maps to be more accurate. This gives a more dependable model. Future work involves extending this method to other tasks (image captioning, object tracking, etc.) and other deep learning architectures.

Broader Impact

Users receiving an automated decision from a convolutional neural network will benefit from this research; our approach provides a way to increase user trust in models previously treated as black boxes.

Using this approach, the parameters of a pre-trained model can be updated, or the parameters of a new layer can be learned in a transfer learning framework. Errors from an existing model can be identified and fixed. Practitioners wanting to eliminate a race or gender bias from a model will not have to retrain the model from scratch, saving the electricity that retraining would otherwise consume.

We do not believe anyone is put at a disadvantage from this research. A failure of this system would mean the model would no longer be convincing to users, and thus no different than the original black box model.

Footnotes

  1. We recognize it is also possible the saliency method is not robust. For this paper we focus on model induced errors.
  2. We recognize the terms in equations 1 and 4 are not guaranteed to be between zero and one. In our experiments, however, we find that the cross entropy term in equation 1 and the regularization term in equation 4 are between zero and one when each term is divided by the number of classes $K$.
  3. https://forms.gle/DMszuv84sbxB9tzt7

References

  1. (2015) 2015 IEEE international conference on computer vision, ICCV 2015, santiago, chile, december 7-13, 2015. IEEE Computer Society. External Links: Link, ISBN 978-1-4673-8391-2 Cited by: 10.
  2. (2017) 2017 IEEE conference on computer vision and pattern recognition, CVPR 2017, honolulu, hi, usa, july 21-26, 2017. IEEE Computer Society. External Links: Link, ISBN 978-1-5386-0457-1 Cited by: 44.
  3. (2018) 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, salt lake city, ut, usa, june 18-22, 2018. IEEE Computer Society. External Links: Link Cited by: 50.
  4. (2019) 2019 IEEE/CVF international conference on computer vision, ICCV 2019, seoul, korea (south), october 27 - november 2, 2019. IEEE. External Links: Link, ISBN 978-1-7281-4803-8 Cited by: 52.
  5. (2017) 5th international conference on learning representations, ICLR 2017, toulon, france, april 24-26, 2017, conference track proceedings. OpenReview.net. External Links: Link Cited by: 68.
  6. (2019) 7th international conference on learning representations, ICLR 2019, new orleans, la, usa, may 6-9, 2019. OpenReview.net. External Links: Link Cited by: 17.
  7. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. A. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu and X. Zheng (2016) TensorFlow: A system for large-scale machine learning. See 12th USENIX symposium on operating systems design and implementation, OSDI 2016, savannah, ga, usa, november 2-4, 2016, Keeton and Roscoe, pp. 265–283. External Links: Link Cited by: §4.1.
  8. J. Adebayo, J. Gilmer, M. Muelly, I. J. Goodfellow, M. Hardt and B. Kim (2018) Sanity checks for saliency maps. See Advances in neural information processing systems 31: annual conference on neural information processing systems 2018, neurips 2018, 3-8 december 2018, montréal, canada, Bengio et al., pp. 9525–9536. External Links: Link Cited by: §1.
  9. M. Ancona, E. Ceolini, C. Öztireli and M. Gross (2018) Towards better understanding of gradient-based attribution methods for deep neural networks. See ?, External Links: Link Cited by: §1.
  10. S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick and D. Parikh (2015) VQA: visual question answering. See 1, pp. 2425–2433. External Links: Link, Document Cited by: §1.
  11. P. L. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou and K. Q. Weinberger (Eds.) (2012) Advances in neural information processing systems 25: 26th annual conference on neural information processing systems 2012. proceedings of a meeting held december 3-6, 2012, lake tahoe, nevada, united states. External Links: Link Cited by: 31.
  12. S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi and R. Garnett (Eds.) (2018) Advances in neural information processing systems 31: annual conference on neural information processing systems 2018, neurips 2018, 3-8 december 2018, montréal, canada. External Links: Link Cited by: 8.
  13. Y. Bengio and Y. LeCun (Eds.) (2014) 2nd international conference on learning representations, ICLR 2014, banff, ab, canada, april 14-16, 2014, workshop track proceedings. External Links: Link Cited by: 55.
  14. Y. Bengio and Y. LeCun (Eds.) (2015) 3rd international conference on learning representations, ICLR 2015, san diego, ca, usa, may 7-9, 2015, conference track proceedings. External Links: Link Cited by: 56.
  15. Y. Bengio and Y. LeCun (Eds.) (2015) 3rd international conference on learning representations, ICLR 2015, san diego, ca, usa, may 7-9, 2015, workshop track proceedings. External Links: Link Cited by: 59.
  16. R. Cadène, C. Dancette, H. Ben-younes, M. Cord and D. Parikh (2019) RUBi: reducing unimodal biases for visual question answering. See Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, neurips 2019, 8-14 december 2019, vancouver, bc, canada, Wallach et al., pp. 839–850. External Links: Link Cited by: §1.
  17. C. Chang, E. Creager, A. Goldenberg and D. Duvenaud (2019) Explaining image classifiers by counterfactual generation. See 6, External Links: Link Cited by: §1.
  18. C. Chen, O. Li, D. Tao, A. Barnett, C. Rudin and J. Su (2019) This looks like that: deep learning for interpretable image recognition. See Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, neurips 2019, 8-14 december 2019, vancouver, bc, canada, Wallach et al., pp. 8928–8939. External Links: Link Cited by: §1.
  19. (2016) 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, las vegas, nv, usa, june 27-30, 2016. IEEE Computer Society. External Links: Link, ISBN 978-1-4673-8851-1 Cited by: 65.
  20. A. Dombrowski, M. Alber, C. J. Anders, M. Ackermann, K. Müller and P. Kessel (2019) Explanations can be manipulated and geometry is to blame. See Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, neurips 2019, 8-14 december 2019, vancouver, bc, canada, Wallach et al., pp. 13567–13578. External Links: Link Cited by: §1.
  21. J. G. Dy and A. Krause (Eds.) (2018) Proceedings of the 35th international conference on machine learning, ICML 2018, stockholmsmässan, stockholm, sweden, july 10-15, 2018. Proceedings of Machine Learning Research, Vol. 80, PMLR. External Links: Link Cited by: 28.
  22. J. Elson, J. R. Douceur, J. Howell and J. Saul (2007) Asirra: a CAPTCHA that exploits interest-aligned manual image categorization. See Proceedings of the 2007 ACM conference on computer and communications security, CCS 2007, alexandria, virginia, usa, october 28-31, 2007, Ning et al., pp. 366–374. External Links: Link, Document Cited by: §4.1.
  23. D. J. Fleet, T. Pajdla, B. Schiele and T. Tuytelaars (Eds.) (2014) Computer vision - ECCV 2014 - 13th european conference, zurich, switzerland, september 6-12, 2014, proceedings, part I. Lecture Notes in Computer Science, Vol. 8689, Springer. External Links: Link, Document, ISBN 978-3-319-10589-5 Cited by: 63.
  24. I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan and R. Garnett (Eds.) (2017) Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, 4-9 december 2017, long beach, ca, USA. Cited by: 35.
  25. S. Hanneke and S. Kpotufe (2019) On the value of target data in transfer learning. See Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, neurips 2019, 8-14 december 2019, vancouver, bc, canada, Wallach et al., pp. 9867–9877. External Links: Link Cited by: §4.1.
  26. S. Hooker, D. Erhan, P. Kindermans and B. Kim (2019) A benchmark for interpretability methods in deep neural networks. See Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, neurips 2019, 8-14 december 2019, vancouver, bc, canada, Wallach et al., pp. 9734–9745. External Links: Link Cited by: §1.
  27. K. Keeton and T. Roscoe (Eds.) (2016) 12th USENIX symposium on operating systems design and implementation, OSDI 2016, savannah, ga, usa, november 2-4, 2016. USENIX Association. External Links: Link Cited by: 7.
  28. B. Kim, M. Wattenberg, J. Gilmer, C. J. Cai, J. Wexler, F. B. Viégas and R. Sayres (2018) Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). See Proceedings of the 35th international conference on machine learning, ICML 2018, stockholmsmässan, stockholm, sweden, july 10-15, 2018, Dy and Krause, pp. 2673–2682. External Links: Link Cited by: §1.
  29. P. Kindermans, K. T. Schütt, M. Alber, K. Müller, D. Erhan, B. Kim and S. Dähne (2018) Learning how to explain neural networks: patternnet and patternattribution. See ?, External Links: Link Cited by: §1.
  30. B. Krishnapuram, M. Shah, A. J. Smola, C. C. Aggarwal, D. Shen and R. Rastogi (Eds.) (2016) Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, san francisco, ca, usa, august 13-17, 2016. ACM. External Links: Link, Document, ISBN 978-1-4503-4232-2 Cited by: 46.
  31. A. Krizhevsky, I. Sutskever and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. See Advances in neural information processing systems 25: 26th annual conference on neural information processing systems 2012. proceedings of a meeting held december 3-6, 2012, lake tahoe, nevada, united states, Bartlett et al., pp. 1106–1114. External Links: Link Cited by: §1.
  32. J. Lee, P. Sattigeri and G. W. Wornell (2019) Learning new tricks from old dogs: multi-source transfer learning from pre-trained networks. See Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, neurips 2019, 8-14 december 2019, vancouver, bc, canada, Wallach et al., pp. 4372–4382. External Links: Link Cited by: §4.1.
  33. B. Leibe, J. Matas, N. Sebe and M. Welling (Eds.) (2016) Computer vision - ECCV 2016 - 14th european conference, amsterdam, the netherlands, october 11-14, 2016, proceedings, part IV. Lecture Notes in Computer Science, Vol. 9908, Springer. External Links: Link, Document, ISBN 978-3-319-46492-3 Cited by: 64.
  34. F. Li, R. Fergus and P. Perona (2006) One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28 (4), pp. 594–611. External Links: Link, Document Cited by: §4.1.
  35. S. M. Lundberg and S. Lee (2017) A unified approach to interpreting model predictions. See Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, 4-9 december 2017, long beach, ca, USA, Guyon et al., pp. 4765–4774. External Links: Link Cited by: §1.
  36. S. A. McIlraith and K. Q. Weinberger (Eds.) (2018) Proceedings of the thirty-second AAAI conference on artificial intelligence, (aaai-18), the 30th innovative applications of artificial intelligence (iaai-18), and the 8th AAAI symposium on educational advances in artificial intelligence (eaai-18), new orleans, louisiana, usa, february 2-7, 2018. AAAI Press. External Links: Link Cited by: 47.
  37. G. Montavon, S. Bach, A. Binder, W. Samek and K. Müller (2015) Explaining nonlinear classification decisions with deep taylor decomposition. CoRR abs/1512.02479. External Links: Link, 1512.02479 Cited by: §1.
  38. G. Montavon, S. Lapuschkin, A. Binder, W. Samek and K. Müller (2017) Explaining nonlinear classification decisions with deep taylor decomposition. Pattern Recognit. 65, pp. 211–222. External Links: Link, Document Cited by: §1.
  39. P. Ning, S. D. C. di Vimercati and P. F. Syverson (Eds.) (2007) Proceedings of the 2007 ACM conference on computer and communications security, CCS 2007, alexandria, virginia, usa, october 28-31, 2007. ACM. External Links: ISBN 978-1-59593-703-2 Cited by: 22.
  40. S. J. Pan and Q. Yang (2010) A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22 (10), pp. 1345–1359. External Links: Link, Document Cited by: §4.1.
  41. D. Precup and Y. W. Teh (Eds.) (2017) Proceedings of the 34th international conference on machine learning, ICML 2017, sydney, nsw, australia, 6-11 august 2017. Proceedings of Machine Learning Research, Vol. 70, PMLR. External Links: Link Cited by: 53, 60.
  42. M. Raghu, C. Zhang, J. M. Kleinberg and S. Bengio (2019) Transfusion: understanding transfer learning for medical imaging. See Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, neurips 2019, 8-14 december 2019, vancouver, bc, canada, Wallach et al., pp. 3342–3352. External Links: Link Cited by: §4.1.
  43. J. Redmon, S. K. Divvala, R. B. Girshick and A. Farhadi (2015) You only look once: unified, real-time object detection. CoRR abs/1506.02640. External Links: Link, 1506.02640 Cited by: §1.
  44. J. Redmon and A. Farhadi (2017) YOLO9000: better, faster, stronger. See 2, pp. 6517–6525. External Links: Link, Document Cited by: §1.
  45. J. Redmon and A. Farhadi (2018) YOLOv3: an incremental improvement. CoRR abs/1804.02767. External Links: Link, 1804.02767 Cited by: §1.
  46. M. T. Ribeiro, S. Singh and C. Guestrin (2016) ”Why should I trust you?”: explaining the predictions of any classifier. See Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, san francisco, ca, usa, august 13-17, 2016, Krishnapuram et al., pp. 1135–1144. External Links: Link, Document Cited by: §1.
  47. M. T. Ribeiro, S. Singh and C. Guestrin (2018) Anchors: high-precision model-agnostic explanations. See Proceedings of the thirty-second AAAI conference on artificial intelligence, (aaai-18), the 30th innovative applications of artificial intelligence (iaai-18), and the 8th AAAI symposium on educational advances in artificial intelligence (eaai-18), new orleans, louisiana, usa, february 2-7, 2018, McIlraith and Weinberger, pp. 1527–1535. External Links: Link Cited by: §1.
  48. L. Rieger, C. Singh, W. J. Murdoch and B. Yu (2019) Interpretations are useful: penalizing explanations to align neural networks with prior knowledge. CoRR abs/1909.13584. External Links: Link, 1909.13584 Cited by: §1.
  49. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg and F. Li (2015) ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115 (3), pp. 211–252. External Links: Link, Document Cited by: §2.2, §4.2.
  50. M. Sandler, A. G. Howard, M. Zhu, A. Zhmoginov and L. Chen (2018) MobileNetV2: inverted residuals and linear bottlenecks. See 3, pp. 4510–4520. External Links: Link, Document Cited by: §4.1.
  51. R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh and D. Batra (2016) Grad-cam: why did you say that? visual explanations from deep networks via gradient-based localization. CoRR abs/1610.02391. External Links: Link, 1610.02391 Cited by: §1, §2.1, §2.1, §2.1, §4.2.1, §6.1.
  52. R. R. Selvaraju, S. Lee, Y. Shen, H. Jin, S. Ghosh, L. P. Heck, D. Batra and D. Parikh (2019) Taking a HINT: leveraging explanations to make vision and language models more grounded. See 4, pp. 2591–2600. External Links: Link, Document Cited by: §1.
  53. A. Shrikumar, P. Greenside and A. Kundaje (2017) Learning important features through propagating activation differences. See Proceedings of the 34th international conference on machine learning, ICML 2017, sydney, nsw, australia, 6-11 august 2017, Precup and Teh, pp. 3145–3153. External Links: Link Cited by: §1.
  54. A. Shrikumar, P. Greenside, A. Shcherbina and A. Kundaje (2016) Not just a black box: learning important features through propagating activation differences. CoRR abs/1605.01713. External Links: Link, 1605.01713 Cited by: §1.
  55. K. Simonyan, A. Vedaldi and A. Zisserman (2014) Deep inside convolutional networks: visualising image classification models and saliency maps. See 2nd international conference on learning representations, ICLR 2014, banff, ab, canada, april 14-16, 2014, workshop track proceedings, Bengio and LeCun, External Links: Link Cited by: §1.
  56. K. Simonyan and A. Zisserman (2015) Very deep convolutional networks for large-scale image recognition. See 3rd international conference on learning representations, ICLR 2015, san diego, ca, usa, may 7-9, 2015, conference track proceedings, Bengio and LeCun, External Links: Link Cited by: §2.2, §4.2.
  57. D. Smilkov, N. Thorat, B. Kim, F. B. Viégas and M. Wattenberg (2017) SmoothGrad: removing noise by adding noise. CoRR abs/1706.03825. External Links: Link, 1706.03825 Cited by: §1.
  58. J. Song, Y. Chen, X. Wang, C. Shen and M. Song (2019) Deep model transferability from attribution maps. See Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, neurips 2019, 8-14 december 2019, vancouver, bc, canada, Wallach et al., pp. 6179–6189. External Links: Link Cited by: §4.1.
  59. J. T. Springenberg, A. Dosovitskiy, T. Brox and M. A. Riedmiller (2015) Striving for simplicity: the all convolutional net. See 3rd international conference on learning representations, ICLR 2015, san diego, ca, usa, may 7-9, 2015, workshop track proceedings, Bengio and LeCun, External Links: Link Cited by: §1, §2.1, §3.2.
  60. M. Sundararajan, A. Taly and Q. Yan (2017) Axiomatic attribution for deep networks. See Proceedings of the 34th international conference on machine learning, ICML 2017, sydney, nsw, australia, 6-11 august 2017, Precup and Teh, pp. 3319–3328. External Links: Link Cited by: §1.
  61. H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox and R. Garnett (Eds.) (2019) Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, neurips 2019, 8-14 december 2019, vancouver, bc, canada. External Links: Link Cited by: 16, 18, 20, 25, 26, 32, 42, 58.
  62. Q. Yang and M. J. Wooldridge (Eds.) (2015) Proceedings of the twenty-fourth international joint conference on artificial intelligence, IJCAI 2015, buenos aires, argentina, july 25-31, 2015. AAAI Press. External Links: Link, ISBN 978-1-57735-738-4 Cited by: 66.
  63. M. D. Zeiler and R. Fergus (2014) Visualizing and understanding convolutional networks. See Computer vision - ECCV 2014 - 13th european conference, zurich, switzerland, september 6-12, 2014, proceedings, part I, Fleet et al., pp. 818–833. External Links: Link, Document Cited by: §1, §3.2.
  64. J. Zhang, Z. L. Lin, J. Brandt, X. Shen and S. Sclaroff (2016) Top-down neural attention by excitation backprop. See Computer vision - ECCV 2016 - 14th european conference, amsterdam, the netherlands, october 11-14, 2016, proceedings, part IV, Leibe et al., pp. 543–559. External Links: Link, Document Cited by: §4.2.1.
  65. B. Zhou, A. Khosla, À. Lapedriza, A. Oliva and A. Torralba (2016) Learning deep features for discriminative localization. See 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, las vegas, nv, usa, june 27-30, 2016, CPVR, pp. 2921–2929. External Links: Link, Document Cited by: §1.
  66. F. Zhuang, X. Cheng, P. Luo, S. J. Pan and Q. He (2015) Supervised representation learning: transfer learning with deep autoencoders. See Proceedings of the twenty-fourth international joint conference on artificial intelligence, IJCAI 2015, buenos aires, argentina, july 25-31, 2015, Yang and Wooldridge, pp. 4119–4125. External Links: Link Cited by: §4.1.
  67. F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong and Q. He (2019) A comprehensive survey on transfer learning. CoRR abs/1911.02685. External Links: Link, 1911.02685 Cited by: §4.1.
  68. L. M. Zintgraf, T. S. Cohen, T. Adel and M. Welling (2017) Visualizing deep neural network decisions: prediction difference analysis. See 5, External Links: Link Cited by: §1.