A Model-Based Derivative-Free Approach to Black-Box Adversarial Examples: BOBYQA


Abstract

We demonstrate that model-based derivative-free optimisation algorithms can generate targeted adversarial misclassifications of deep networks using fewer network queries than non-model-based methods. Specifically, we consider the black-box setting, and show that the number of network queries is less impacted by making the task more challenging, either through reducing the allowed perturbation energy or through training the network with defences against adversarial misclassification. We illustrate this by contrasting the BOBYQA algorithm Powell (2009) with state-of-the-art model-free adversarial targeted misclassification approaches based on genetic Alzantot et al. (2019), combinatorial Moon et al. (2019), and direct-search Andriushchenko et al. (2019) algorithms. We observe that for high-energy perturbations the aforementioned simpler model-free methods require the fewest queries. In contrast, the proposed BOBYQA based method achieves state-of-the-art results when the perturbation energy decreases, or if the network is trained against adversarial perturbations.


1 Introduction

Deep neural networks (NNs) achieve state-of-the-art performance on a growing number of applications such as acoustic modelling, image classification, and fake news detection Hinton et al. (2012); He et al. (2015); Monti et al. (2019), to name but a few. Alongside their growing application, there is a literature on the robustness of deep NNs which shows that it is often possible to generate subtle perturbations of a network's input, referred to as adversarial examples Szegedy et al. (2014); Goodfellow et al. (2015), that severely degrade its performance; for example, see Dalvi et al. (2004); Kurakin et al. (2017); Sitawarin et al. (2018); Eykholt et al. (2018); Yuan et al. (2019) concerning the use case of self-driving cars.

Methods to generate these adversarial examples are classified according to two main criteria Yuan et al. (2019).

Adversarial Specificity

establishes what the aim of the adversary is. In non-targeted attacks, the method perturbs the image in such a way that it is misclassified into any category other than the original one, while in targeted settings the adversary specifies a category into which the image has to be misclassified.

Adversary’s Knowledge

defines the amount of information available to the adversary. In White-box settings the adversary has complete knowledge of the network architecture and weights, while in the Black-box setting the adversary is only able to obtain the pre-classification output vector for a limited number of inputs. The White-box setting allows the use of gradients of a misclassification objective to efficiently compute the adversarial example Goodfellow et al. (2015); Carlini and Wagner (2017); Chen et al. (2018), while the same optimisation formulation in the Black-box setting requires the use of a derivative-free approach Narodytska and Kasiviswanathan (2017); Chen et al. (2017); Ilyas et al. (2018); Alzantot et al. (2019).

Figure 1: The success rate (SR) of the BOBYQA algorithm in generating a targeted adversarial example compared to the GenAttack Alzantot et al. (2019), COMBI Moon et al. (2019), and SQUARE Andriushchenko et al. (2019) attacks as a function of the perturbation energy $\epsilon_\infty$; specifically for a network trained on the CIFAR10 dataset without defences (Norm) and with the distillation defence (Adv) Papernot et al. (2016). It can be observed that as $\epsilon_\infty$ decreases the BOBYQA based method achieves a higher SR than the other methods. Similarly, the success rate of BOBYQA is less affected by adversarial training. In particular, when the infinity norm of the perturbation is most tightly limited, BOBYQA achieves an SR 1.15 and 1.59 times better than SQUARE when considering Norm and Adv respectively. Here the number of network queries was restricted to 3,000; for further details see Fig. 9.

The generation of black-box targeted adversarial examples for deep NNs has been extensively studied in a setting initially proposed by Chen et al. (2017) where:

  • the adversarial example is found by solving an optimisation problem designed to change the original classification of a specific input to a specific alternative.

  • the perturbation, which causes the network to change the classification, has entries bounded in magnitude by a specified infinity norm (maximum entry magnitude).

  • the number of queries to the NN needed to generate the adversarial example should be as small as possible.

The Zeroth-Order-Optimization (ZOO) attack Chen et al. (2017) introduced DFO methods for computing adversarial examples in the black-box setting, specifically using a coordinate descent optimisation algorithm. At the time this was a substantial departure from methods for the black-box setting which train a proxy NN and then employ gradient based methods for white-box attacks on the proxy network Papernot et al. (2017); Tu et al. (2019); such methods are especially effective when numerous adversarial examples will be computed, but require substantially more network queries than methods designed for misclassifying individual examples. Following the introduction of ZOO, there have been numerous improvements using other model-free DFO based approaches, see Alzantot et al. (2019); Moon et al. (2019); Andriushchenko et al. (2019). Specifically, GenAttack Alzantot et al. (2019) is a genetic algorithm, COMBI Moon et al. (2019) is a direct-search method that explores the vertices of the perturbation energy constraint, and SQUARE Andriushchenko et al. (2019) is a randomised direct-search method.

In this manuscript we consider an alternative model-based DFO method based on BOBYQA Powell (2009), which explicitly builds models that approximate the loss function in the optimisation problem and minimises those models using methods from continuous optimisation. By considering adversarial perturbations to three NNs trained on different datasets (MNIST, CIFAR10, and ImageNet), we show that for the model-free methods Alzantot et al. (2019); Moon et al. (2019); Andriushchenko et al. (2019) the number of evaluations of the NN grows more rapidly as the maximum perturbation energy decreases than it does for the method built upon BOBYQA. As a consequence, GenAttack, COMBI and SQUARE are preferable for large values of the maximum perturbation energy and BOBYQA for smaller values. As an example, Figure 1 illustrates how the BOBYQA based algorithm compares to GenAttack, COMBI, and SQUARE when considering a net either normally or adversarially trained on CIFAR10 with different maximum perturbation energies.

We observe the intuitive principle that direct-search methods are effective for misclassifying NNs with high perturbation energies, while in more challenging settings it is preferable to use more sophisticated model-based methods, like ours. Model-based approaches will further challenge defences to adversarial misclassification Dhillon et al. (2018); Wang et al. (2019), and in so doing will lead to improved defences and more robust networks. Model-based DFO is a well developed area, and we expect further improvements are possible through a more extensive investigation of these approaches.

2 Adversarial Examples Formulated as an Optimisation Problem

Consider a classification operator $F$ from input space $\mathcal{X}$ to output space $\mathcal{C}$ of classes. A targeted adversarial perturbation $\eta$ to an input $X \in \mathcal{X}$ has the property that it changes the classification to a specified target class $t$, i.e. $F(X) = c$ and $F(X + \eta) = t$ with $t \neq c$. Herein we follow the formulation by Alzantot et al. (2019). Given an image $X$, a maximum energy budget $\epsilon_\infty$, and a suitable loss function $\mathcal{L}(X, \eta)$, the task of computing the adversarial perturbation $\eta$ can be cast as an optimisation problem such as

$\min_{\eta}\ \mathcal{L}(X, \eta) \quad \text{s.t.} \quad \|\eta\|_\infty \le \epsilon_\infty, \quad X + \eta \le 1, \quad X + \eta \ge 0,$   (1)

where the final two inequality constraints are due to the input entries being restricted to $[0, 1]$. Denoting the pre-classification output vector by $f(X, \eta)$, so that $F(X + \eta) = \arg\max_j f_j(X, \eta)$, the misclassification of $X$ to target label $t$ is achieved by $\eta$ if $t = \arg\max_j f_j(X, \eta)$. Carlini and Wagner (2017); Chen et al. (2017); Alzantot et al. (2019) determined

$\mathcal{L}(X, \eta) = \log\Big(\sum_{j \neq t} f_j(X, \eta)\Big) - \log\big(f_t(X, \eta)\big)$   (2)

to be the most effective loss function for computing $\eta$ in (1), and we also employ this choice throughout our experiments.
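As a concrete illustration, the following is a minimal NumPy sketch of how loss (2) might be evaluated; it assumes the queried network returns its pre-classification output vector `f` for the perturbed input as an array, and uses a small constant `eps` (an implementation detail of ours, not part of (2)) to guard the logarithms.

```python
import numpy as np

def targeted_loss(f, t, eps=1e-30):
    """Loss (2): log of the total mass the network assigns to the non-target
    classes minus the log of the mass assigned to the target class t.
    `f` is the pre-classification output vector for the perturbed input."""
    f = np.asarray(f, dtype=np.float64)
    mass_other = np.sum(np.delete(f, t))
    return np.log(mass_other + eps) - np.log(f[t] + eps)
```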

3 Derivative Free Optimisation for Adversarial Examples

Derivative Free Optimisation is a well developed field with numerous types of algorithms; see Conn et al. (2009) and Larson et al. (2019) for reviews of DFO principles and algorithms. Examples of classes of such methods include: direct-search methods such as simplex methods, model-based methods, hybrid methods such as finite differences or implicit filtering, as well as randomised variants of the aforementioned and methods specific to convex or noisy objectives. The optimisation formulation in Section 2 is amenable to virtually all DFO methods, making it unclear which of the algorithms to employ. Methods which have been trialled include: the finite-difference based ZOO attack Chen et al. (2017), a combinatorial direct search over the vertices of the perturbation energy constraint (COMBI) Moon et al. (2019), a genetic direct-search method (GenAttack) Alzantot et al. (2019), and most recently a randomised direct-search method (SQUARE) Andriushchenko et al. (2019). Notably missing from the aforementioned list are model-based methods.

Given a set of samples $\mathcal{Y} = \{y^1, \dots, y^q\}$ with $y^i \in \mathbb{R}^n$, model-based DFO methods start by identifying the minimiser of the objective among the samples at iteration $k$, $x^k = \arg\min_{y \in \mathcal{Y}} \mathcal{L}(y)$. Following this, a model for the objective function is constructed, typically centred around the minimiser. In its simplest form one uses a polynomial approximation to the objective, such as a quadratic model centred in $x^k$,

$m_k(x^k + p) = a_k + c_k^\top p + \tfrac{1}{2}\, p^\top M_k\, p,$   (3)

with $a_k \in \mathbb{R}$, $c_k \in \mathbb{R}^n$, and $M_k \in \mathbb{R}^{n \times n}$ being also symmetric. In a white-box setting one would set $c_k = \nabla \mathcal{L}(x^k)$ and $M_k = \nabla^2 \mathcal{L}(x^k)$, but this is not feasible in the black-box setting as we do not have access to the derivatives of the objective function. Thus $c_k$ and $M_k$ are usually defined by imposing the interpolation conditions

$m_k(y^i) = \mathcal{L}(y^i), \quad i = 1, \dots, q,$   (4)

and when $q < (n+1)(n+2)/2$ (i.e. the system of equations is under-determined) other conditions are introduced according to which algorithm is considered. The objective model (3) is considered to be a good estimate of the objective in a neighbourhood referred to as a trust region. Once the model is generated, the update step $p$ is computed by solving the trust region problem

$\min_{p}\ m_k(x^k + p) \quad \text{s.t.} \quad \|p\| \le \Delta,$   (5)

where $\Delta$ is the radius of the region where we believe the model to be accurate; for more details see Nocedal and Wright (2006). The new point $x^k + p$ is added to $\mathcal{Y}$ and a prior point is potentially removed. Herein we consider an exemplary1 model-based method called BOBYQA.
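For concreteness, the following minimal sketch fits the simplest instance of (3), a purely linear model, to the sample set via the interpolation conditions (4) and takes the step of (5) against the model gradient. This is a simplification of the minimum-change quadratic updates that BOBYQA itself uses, intended only to illustrate the model-based iteration; the function names are ours.

```python
import numpy as np

def fit_linear_model(Y, L_vals):
    """Fit m(x_k + p) = a + c^T p, the linear case of (3), to q = n + 1
    samples Y (shape q x n) with objective values L_vals by imposing the
    interpolation conditions (4) in a least-squares sense."""
    k = int(np.argmin(L_vals))                       # current best sample x_k
    xk = Y[k]
    A = np.hstack([np.ones((Y.shape[0], 1)), Y - xk])
    coef, *_ = np.linalg.lstsq(A, np.asarray(L_vals, dtype=float), rcond=None)
    return xk, coef[0], coef[1:]                     # x_k, a, c

def trust_region_step(c, radius):
    """Minimise the linear model over a 2-norm trust region of the given
    radius, as in (5): step against the model gradient to the boundary."""
    norm = np.linalg.norm(c)
    return np.zeros_like(c) if norm == 0 else -radius * c / norm
```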

BOBYQA

The BOBYQA algorithm, introduced in Powell (2009), updates the parameters $c_k$ and $M_k$ of the model in each iteration in such a way as to minimise the change in the quadratic term between iterates while otherwise fitting the sample values:

$\min_{a_{k+1},\, c_{k+1},\, M_{k+1}}\ \|M_{k+1} - M_k\|_F^2$   (6)
$\text{s.t.} \quad m_{k+1}(y^i) = \mathcal{L}(y^i), \quad i = 1, \dots, q,$   (7)

with $M_0$ initialised as the zero matrix. When the number of interpolation samples is $q = n + 1$, the model is considered linear with $M_k$ set to zero. We further allow only a fixed number of queries at each implementation of BOBYQA, since after the model is generated few iterations are needed to find the minimum.

3.1 Computational Scalability and Efficiency

For improved computational scalability and efficiency, we do not solve (1) for $\eta$ directly, but instead use domain sub-sampling and hierarchical liftings: domain sub-sampling iteratively sweeps over batches of variables, see (8), while hierarchical lifting clusters variables and perturbs them simultaneously, see (12).

Domain Sub-Sampling

The simplest version of domain sub-sampling consists of partitioning the input dimension $n$ into smaller disjoint domains; for example, domains $\mathcal{D}_1, \dots, \mathcal{D}_{n/b}$ of size $b$ which are disjoint and which cover all of $\{1, \dots, n\}$. Rather than solving (1) for $\eta$ directly, one sequentially solves, for each $\mathcal{D}_j$, for a perturbation $\eta^{(j)}$ which is only non-zero for entries in $\mathcal{D}_j$. The resulting sub-domain perturbations are then summed to generate the full perturbation $\eta = \sum_j \eta^{(j)}$, see Figure 2 for an example. That is, the optimisation problem (1) is adapted to repeatedly looping over $j = 1, \dots, n/b$:

$\min_{\eta^{(j)}}\ \mathcal{L}\Big(X,\ \sum_{i} \eta^{(i)}\Big) \quad \text{s.t.} \quad \Big\|\sum_{i} \eta^{(i)}\Big\|_\infty \le \epsilon_\infty, \quad 0 \le X + \sum_{i} \eta^{(i)} \le 1, \quad \big[\eta^{(j)}\big]_m = 0 \ \ \forall\, m \notin \mathcal{D}_j,$   (8)

where the $\eta^{(j)}$ may be reinitialised, in particular following each complete loop over the sub-domains $\{\mathcal{D}_j\}$.
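To make the looping in (8) concrete, a minimal sketch of one sweep is given below. The names `order`, `loss_fn`, and `solve_batch` are hypothetical helpers standing in for, respectively, the chosen pixel ordering, a wrapper that queries the network and evaluates loss (2) for the accumulated perturbation, and a call to the DFO solver on the selected batch.

```python
import numpy as np

def sub_sampled_attack(X, loss_fn, order, b, eps, solve_batch):
    """One sweep of the sub-sampling scheme (8).  `order` is a permutation of
    the n flattened pixel indices, split into batches of size b; each batch is
    optimised on its own while the rest of the perturbation is held fixed."""
    x = X.reshape(-1)
    eta = np.zeros_like(x)
    for start in range(0, x.size, b):
        idx = order[start:start + b]                      # sub-domain D_j
        # bounds so that |eta| <= eps and x + eta stays in [0, 1]
        lo = np.maximum(-eps - eta[idx], -x[idx] - eta[idx])
        hi = np.minimum(eps - eta[idx], 1.0 - x[idx] - eta[idx])

        def batch_loss(d, idx=idx):                       # perturb only D_j
            trial = eta.copy()
            trial[idx] += d
            return loss_fn(trial)

        eta[idx] += solve_batch(batch_loss, lo, hi)
    return eta
```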

Figure 2: Example of how the perturbation $\eta$ evolves through the iterations when an image is attacked. In (a) the perturbation is zero and we select a sub-domain of pixels (in red). Once we have found the optimal perturbation in the selected sub-domain, we update the perturbation in (b) and select a new sub-domain of the same dimension. The same is repeated in (c).
Figure 3: Cumulative distribution function of successfully perturbed images as a function of the number of queries to a NN trained on the MNIST and CIFAR10 datasets. In each panel the effectiveness of different sub-sampling methods in generating a successful adversarial example is shown for different values of the perturbation energy $\epsilon_\infty$. See Section 4.2 for details about the experimental setup and NN architectures.

We considered three possible ways of selecting the sub-domains $\mathcal{D}_j$:

  • In Random Sampling we consider at each iteration a different random sub-sampling of the domain.

  • In Ordered Sampling we generate a random disjoint partitioning of the domain. Once each variable has been optimised over once, a new partitioning is generated.

  • In Variance Sampling we select the variables in decreasing order of the local variance of $X$, i.e. the variance in intensity among the 8 neighbouring variables (e.g. pixels) in the same colour channel. We further reinitialise $\eta^{(j)}$ after each loop through the sub-domains.

In Figure 3 we compare how these different sub-sampling techniques perform when generating adversarial examples for the MNIST and CIFAR10 datasets. It can be observed that variance sampling consistently performs better than random and ordered sampling. This suggests that pixels belonging to high-contrast regions are more influential than those in low-contrast regions, and hence variance sampling is the preferable ordering.
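A minimal sketch of a variance-sampling ordering is given below; it computes, for each pixel, the variance of the intensities over a 3x3 window in the same colour channel (a slight simplification of using only the 8 neighbours) and returns the flattened indices in decreasing order of that variance.

```python
import numpy as np

def variance_sampling_order(img):
    """Order the flattened pixel indices of `img` (H x W x C, values in [0, 1])
    by decreasing local variance computed per colour channel."""
    H, W, C = img.shape
    local_var = np.zeros((H, W, C))
    for i in range(H):
        for j in range(W):
            nb = img[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2, :]   # 3x3 window
            local_var[i, j, :] = nb.reshape(-1, C).var(axis=0)
    return np.argsort(-local_var.reshape(-1))
```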

Figure 4: Impact of the hierarchical lifting approach on the loss function (2) as a function of the number of queries to the Inception-v3 net trained on the ImageNet dataset when finding the adversarial example for a single image. The green vertical lines correspond to changes of hierarchical level, which entail an increase in the dimension of the optimisation space.

Hierarchical Lifting

When the domain is very high dimensional, working on single pixels is not efficient, as the method described above would imply modifying only a very small proportion of the image; for instance, we would choose a batch dimension $b \ll n$ even when $n$ is almost three hundred thousand. Thus, to perturb wider portions of the image, we consider a hierarchy of liftings as in the ZOO attack presented in Chen et al. (2017). We seek an adversarial example by optimising over increasingly higher-dimensional spaces at each step, referred to here as levels, lifted to the image space. As an illustration, Figure 4 shows that hierarchical lifting has a significant impact on the minimisation of the loss function.

Figure 5: Example of how the perturbation is generated in a hierarchical lifting method on an image. In (a) the perturbation is zero and we highlight in red the blocks generated via the coarse grid. Once we have found the optimal perturbation for this grid, we update the perturbation in (b) and further divide the image with a finer grid. Once an optimal solution is found for this refined grid, the final solution is shown in (c).

At each level $\ell$ we consider a linear lifting $D_\ell$ and find a level perturbation $\tilde\eta_\ell$ which is added to the full perturbation $\eta$ according to

$\eta = \sum_{k=1}^{\ell} D_k\, \tilde\eta_k,$   (9)

where $\eta$ is initialised as zero and the level perturbations of the previous levels are considered as fixed. Moreover, we impose that at each level the grid has to double in refinement. An example of how this works is illustrated in Figure 5.

When generating our adversarial examples, we considered two kinds of lifting. The first kind is based on interpolation operations: a sorting matrix is applied such that every index of $\tilde\eta_\ell$ is uniquely associated to a node of a coarse grid masked over the original image, after which an interpolation is implemented over the values in the coarse grid. The second kind instead forces the perturbation to be high-frequency, since there is a sizeable literature suggesting that such perturbations are among the most effective Guo et al. (2018); Gopalakrishnan et al. (2018); Sharma et al. (2019). Some preliminary results led us to consider the “Block” lifting, which uses a piecewise-constant interpolation and corresponds to the lifting also used in Moon et al. (2019). Alternative piecewise-linear or randomised orderings were also tried, but were not found to be sufficiently better to justify the added complexity. As we show in the example in Figure 6, this interpolation lifting divides an image into disjoint blocks via a coarse grid and associates to each block the same value of a single parameter in $\tilde\eta_\ell$. We characterise the lifting $D_\ell$ with the following conditions

$(D_\ell)_{i,j} \in \{0, 1\}, \qquad \sum_{j=1}^{n_\ell} (D_\ell)_{i,j} = 1 \quad \text{for every pixel } i,$   (10)
$(D_\ell)_{i,j} = 1 \ \text{ if and only if pixel } i \text{ belongs to block } j \text{ of the coarse grid.}$   (11)
Figure 6: In the “Block” lifting the perturbation is first passed through a sorting matrix S, to which an interpolation L is then applied. Thus each block is associated uniquely with one of the variables in $\tilde\eta_\ell$.
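As an illustration, a minimal sketch of a piecewise-constant block lifting consistent with the description above is given below; it assumes the image side lengths are divisible by the coarse-grid dimensions, and the function name is ours.

```python
import numpy as np

def block_lifting(eta_coarse, grid_hw, out_hw):
    """Piecewise-constant ("Block") lifting: each entry of the coarse
    perturbation is copied onto the whole block it is associated with.
    `eta_coarse` has length grid_h * grid_w * C."""
    gh, gw = grid_hw
    H, W = out_hw
    C = eta_coarse.size // (gh * gw)
    coarse = eta_coarse.reshape(gh, gw, C)
    return np.repeat(np.repeat(coarse, H // gh, axis=0), W // gw, axis=1)
```

For example, lifting a 4x4x3 coarse perturbation to a 32x32x3 image copies each coarse value onto an 8x8 block of pixels in the corresponding colour channel.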

Since the level dimension $n_\ell$ may still be very high, for each level we apply domain sub-sampling and consider batches of dimension $b$. We order the blocks according to the variance of the mean intensity among neighbouring blocks, in contrast to the variance within each block, which was suggested in Chen et al. (2017). Consequently, at each level the adversarial example is found by solving the following iterative problem

$\min_{\tilde\eta^{(j)}_\ell}\ \mathcal{L}\big(X,\ \eta + D_\ell S_j \tilde\eta^{(j)}_\ell\big) \quad \text{s.t.} \quad \|\eta + D_\ell S_j \tilde\eta^{(j)}_\ell\|_\infty \le \epsilon_\infty, \quad 0 \le X + \eta + D_\ell S_j \tilde\eta^{(j)}_\ell \le 1,$   (12)

where $S_j$ is the matrix selecting the $b$ blocks of level $\ell$ that are active in the current batch, and $\eta$ accumulates the perturbations found at the previous levels and batches.

In its simplest formulation, hierarchical lifting struggles with the pixel-wise interval constraint $0 \le X + \eta \le 1$. To address this we allow the entries of the lifted perturbation to exceed the interval and then reproject the pixel-wise entries into the interval.
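A minimal sketch of this reprojection step, assuming pixel values normalised to $[0, 1]$, is:

```python
import numpy as np

def project(X, eta, eps):
    """Reproject the accumulated perturbation so that both the energy
    constraint |eta|_inf <= eps and the pixel range [0, 1] hold."""
    eta = np.clip(eta, -eps, eps)
    return np.clip(X + eta, 0.0, 1.0) - X   # adjusted perturbation
```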

1:  Input: Image $X$, target label $t$, maximum perturbation $\epsilon_\infty$, neural net $F$, initial hierarchical level dimension $n_1$, maximum number of evaluations $q_{\max}$, batch sampling size $b$, and maximum number of queries $q_{\mathrm{batch}}$ that we are allowed to do for each batch.
2:  Initialise the perturbation $\eta = 0$, the level $\ell = 1$, and the query counter $q = 0$.
3:  while $\arg\max_j f_j(X, \eta) \neq t$ and $q < q_{\max}$ do
4:     Compute the number of sub-samplings $\lceil n_\ell / b \rceil$ necessary to cover the whole domain.
5:     Generate the lifting matrix $D_\ell$.
6:     for $j = 1, \dots, \lceil n_\ell / b \rceil$ do
7:        Compute the matrix $S_j$ which selects $b$ dimensions of the $n_\ell$-dimensional domain.
8:        Define the bounds for a perturbation over the selected pixels of $X + \eta$.
9:        Find $\tilde\eta^{(j)}_\ell$ by applying the BOBYQA optimisation to problem (12).
10:        Update the noise $\eta \leftarrow \eta + D_\ell S_j \tilde\eta^{(j)}_\ell$.
11:        Update the query counter, $q \mathrel{+}= q_{\mathrm{batch}}$, and, once the current level is exhausted, move to the next level, $\ell \mathrel{+}= 1$.
12:     end for
13:  end while
14:  if $\arg\max_j f_j(X, \eta) = t$ then
15:     The perturbation is successful.
16:  else if $q \geq q_{\max}$ then
17:     The perturbation was not successful within $q_{\max}$ evaluations.
18:  end if
Algorithm 1 BOBYQA Based Algorithm

3.2 Algorithm pseudo-code

Our BOBYQA based algorithm is summarised in Algorithm 1; note that not using the hierarchical method corresponds to having a single level whose lifting is the identity. A Python implementation of the proposed algorithm, based on the BOBYQA package from Cartis et al. (2019), is available on GitHub2.
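As an illustration of how a single batch sub-problem might be passed to the solver, a minimal sketch using the Py-BOBYQA interface follows; the helper name `solve_batch`, the default query budget, and the exact option values are illustrative assumptions rather than the settings of the released code, and the bounds are assumed to define a non-degenerate box containing the origin.

```python
import numpy as np
import pybobyqa  # Py-BOBYQA, Cartis et al. (2019)

def solve_batch(batch_loss, lo, hi, max_queries=30):
    """Solve one b-dimensional sub-problem of (12) with Py-BOBYQA.
    The initial trust-region radius is a third of the feasible interval and
    a linear model (b + 1 interpolation points) is requested, matching the
    choices discussed in Section 4.1."""
    x0 = np.zeros_like(lo)
    soln = pybobyqa.solve(batch_loss, x0,
                          bounds=(lo, hi),
                          rhobeg=float(np.min(hi - lo)) / 3.0,
                          npt=lo.size + 1,
                          maxfun=max(max_queries, lo.size + 2))
    return soln.x
```

The bounds passed to the solver encode both the energy constraint and the pixel range, as in (12).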

4 Comparison of Derivative Free Methods

We compare the performance of our BOBYQA based algorithm to GenAttack Alzantot et al. (2019), the combinatorial attack COMBI Moon et al. (2019), and SQUARE Andriushchenko et al. (2019). The performance is measured by considering the distribution of queries needed to successfully find adversaries to different networks trained on three standard datasets: MNIST Lecun et al. (1998), CIFAR10 Krizhevsky (2009), and ImageNet Deng et al. (2009).

4.1 Parameter Setup for Algorithms

Our experiments rely, for GenAttack Alzantot et al. (2019), COMBI Moon et al. (2019), and SQUARE Andriushchenko et al. (2019), on publicly available implementations3 with the same hyperparameter settings and hierarchical approaches as suggested by the respective authors.

For the proposed algorithm based on BOBYQA, we tuned three main parameters: the dimension $q$ of the initial sampling set, the batch dimension $b$, and the trust region radius.

(a) CIFAR10
(b) ImageNet
Figure 7: Comparison of the loss function for different batch dimensions and datasets. After the linear model is generated, the optimisation algorithm is always allowed to query the net 5 times per batch for the smaller batch dimensions, or 10 times for the largest. For ImageNet we use the hierarchical lifting approach.
Figure 8: Comparison of how the loss decreases when the initial set dimension is either $b + 1$ (linear model) or $2b + 1$ (quadratic model) in an attack on an image of MNIST. We chose the same batch dimension for both methods and the same maximum number of function evaluations after the model was initialised.

Batch Dimension Figure 7 shows the loss value averaged over 20 images for attacks on NNs trained on the CIFAR10 and ImageNet datasets when different batch dimensions are chosen. The average objective loss as a function of network queries is largely insensitive to the batch size, with modest differences for the larger ImageNet dataset, where the best-performing batch dimension was observed to require modestly fewer queries. For the remainder of the simulations we use a single batch dimension as a good trade-off between faster model generation and good performance.

Initial Set Dimension Once a sub-domain of dimension $b$ is chosen, the model (3) is initialised with a set of samples on which the interpolation conditions (4) are imposed. There are two main choices for the dimension of this set: either $q = b + 1$, thus computing $a$ and $c$ by interpolation and leaving $M$ always null, and hence having a linear model, or $q = 2b + 1$, which allows us to initialise $a$, $c$, and the diagonal of $M$, hence obtaining a quadratic model. The results in Figure 8 show that at each iteration of the domain sub-sampling the quadratic model performs as well as the linear one; however, it requires more queries to initialise the model. Thus we consider the linear model with $q = b + 1$.4

Trust Region Radius Once the model for the optimisation is built, the step of the optimisation is bounded by the trust region radius. We select the initial radius to be one third of the interval in which the perturbation lies. With this choice of radius we usually reach a corner of the boundary within 5 steps, and the further iterates remain effectively stationary.

For the hierarchical lifting approach we consider an initial sub-domain whose dimension is the largest grid that we can optimise over with a batch of dimension $b$. After this first level we make use of a single further refinement and do not consider further levels.

4.2 Dataset and Neural Network Specifications

Experiments on each dataset are performed with one of the best performing NN architectures, as described below.

Figure 9: Cumulative fraction of test set images successfully misclassified with adversarial examples generated by GenAttack, COMBI, SQUARE and our BOBYQA based approach for different perturbation energies and NNs trained on the MNIST, CIFAR10 and ImageNet datasets. In all results the solid and dashed lines denoted by ‘Norm’ and ‘Adv’ correspond to attacks on nets trained without or with a defence strategy respectively. For MNIST and CIFAR10 we consider the distillation defence method from Papernot et al. (2016), while for ImageNet we consider the adversarial training proposed in Kurakin et al. (2016).

MNIST / CIFAR10

MNIST and CIFAR10 are two datasets with images divided between 10 classes and of dimension 28x28x1 and 32x32x3 respectively. To them we apply the net introduced in Chen et al. (2017), which is structured as a succession of two convolutional layers with ReLU activations followed by a max-pooling layer, this block being repeated twice, then two dense layers with ReLU activations, and finally a softmax layer that generates the output vector. For each dataset, we train the same architecture in two different ways, obtaining separate nets: one is obtained by optimising the accuracy of the net on raw unperturbed images, while the other is trained with the application of the distillation defence by Papernot et al. (2016).
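For illustration, a sketch of this architecture in Keras is given below; the filter counts and dense-layer widths are illustrative assumptions and may differ from the exact values used in Chen et al. (2017).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_net(input_shape=(32, 32, 3), n_classes=10):
    """Two blocks of (conv, conv, max-pool) with ReLU activations, two dense
    layers, and a softmax output, as described in the text."""
    return models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(200, activation="relu"),
        layers.Dense(200, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
```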

To generate a comprehensive distribution for the queries at each energy budget, for both trained nets and 10 images per class we attempt to misclassify each image targeting all of the 9 remaining classes; this way we generate a total of 900 perturbations per energy budget. For these two datasets the images are of relatively low dimension and we do not apply the hierarchical approach.

ImageNet

This is a dataset of millions of images of dimension 299x299x3 divided between 1000 classes. For this dataset we consider the Inception-v3 net Szegedy et al. (2016) trained with and without the adversarial defence proposed in Kurakin et al. (2016)5. Due to the large number of target classes in ImageNet, we perform tests on random images and target classes. The numbers of tests conducted for Inception-v3 Szegedy et al. (2016) and its adversarially trained variant Kurakin et al. (2016) are 303 and 120, 155 and 114, and 149 and 116 for the three perturbation energies considered, respectively.

4.3 Experimental Results

In Figure 9 we present the cumulative fraction of images misclassified (abbreviated CDF, for cumulative distribution function) as a function of the number of queries to the NN for different perturbation energies $\epsilon_\infty$. The pixels are normalised to lie in the interval $[0, 1]$; hence, a given $\epsilon_\infty$ implies that any pixel is allowed to change by that fraction of the total intensity range from its initial value. By illustrating the CDFs we easily see which method has been able to misclassify the largest fraction of images in the given test set for a fixed number of queries to the NN. It can be observed that the proposed BOBYQA based approach achieves state-of-the-art results as the perturbation bound $\epsilon_\infty$ decreases. This behaviour is consistent across all of the considered datasets (MNIST, CIFAR10, and ImageNet); however, the energy at which the BOBYQA algorithm performs best varies in each case.

In the experiments we also considered nets trained with defence methods, distillation Papernot et al. (2016) for the MNIST and CIFAR10 datasets and adversarial training Kurakin et al. (2016) for ImageNet; the corresponding results are identified in Figure 9 by the dashed lines. Similar to the previous case, we observe that the proposed BOBYQA based algorithm performs best as the perturbation energy decreases. Moreover, the BOBYQA based algorithm seems to be the least affected in its performance when a defence is used; for example, at $\epsilon_\infty = 0.01$ and 15,000 queries, the defence reduces the CDF of COMBI by 0.078 compared to 0.051 for BOBYQA. This further supports the idea that in more challenging scenarios model-based approaches are preferable to model-free counterparts.

We attribute the counter-intuitive improvement of the CDF for MNIST and ImageNet at high perturbation energies to the distillation and the adversarial training being focused primarily on low-energy perturbations. For ImageNet, the non-model-based algorithms use different hierarchical approaches, which we expect leads in part to the superior performance of COMBI in Fig. 9 panels (g)-(i).

5 Discussion and Conclusion

We have introduced an attack for finding adversarial examples based on BOBYQA, a model-based DFO algorithm, and have conducted experiments to understand how it compares to the existing GenAttack Alzantot et al. (2019), COMBI Moon et al. (2019), and SQUARE Andriushchenko et al. (2019) attacks when targeted black-box adversarial examples are sought with the fewest possible queries to a neural net.

Following the results of the experiments presented above, the method for generating adversarial examples should be chosen according to the setting the adversary is considering. When the perturbation energy is high, one should choose either COMBI if the input is high-dimensional or SQUARE if the input is low-dimensional. On the other hand, a model-based approach like BOBYQA should be considered as soon as the complexity of the setting increases, e.g. when the maximum perturbation energy is reduced or the net is adversarially trained.

With the BOBYQA attack algorithm we have introduced a different approach to the generation of targeted adversarial examples in a black-box setting, with the aim of exploring what advantages are achieved by considering model-based DFO algorithms. We did not focus on presenting an algorithm which is the most efficient in absolute terms, primarily because our algorithm still has several aspects that can be improved. The BOBYQA attack is limited by the implementation of Py-BOBYQA Cartis et al. (2019), since the element-wise constraints do not allow for more sophisticated liftings that leverage compressed sensing, to name one of many possible variations.

In conclusion, the results in this paper support the view that more sophisticated misclassification methods are preferable in challenging settings. As a consequence, variations on our model-based algorithm should be considered in the future as a tool to establish the effectiveness of newly presented adversarial defence techniques.

Acknowledgements

This publication is based on work supported by the EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling (EP/L015803/1) in collaboration with New Rock Capital Management.

Footnotes

  1. BOBYQA was selected among the numerous types of model-based DFO algorithms due to its efficiency observed for other similar problems requiring few model samples as in climate modelling Tett et al. (2013)
  2. https://github.com/giughi/A-Model-Based-Derivative-Free-Approach-to-Black-Box-Adversarial-Examples-BOBYQA
  3. GenAttack: https://github.com/nesl/adversarial_genattack
    COMBI: https://github.com/snu-mllab/parsimonious-blackbox-attack
    SQUARE: https://github.com/max-andr/square-attack
  4. The Constrained Optimisation by Linear Approximation (COBYLA) algorithm, a linear model-based DFO method, was introduced before BOBYQA Powell (2007); however, COBYLA considers different constraints on the norm of the variable. Because of this, and because of the possibility of extending the method to quadratic models, we name our algorithm after BOBYQA.
  5. For the non-adversarially trained net we considered the one available at http://jaina.cs.ucdavis.edu/datasets/adv/imagenet/inception_v3_2016_08_28_frozen.tar.gz, while for the weights of the adversarially trained net we relied on https://github.com/tensorflow/models/tree/master/research/adv_imagenet_models.

References

  1. Alzantot et al. (2019). GenAttack: practical black-box attacks with gradient-free optimization. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1111–1119.
  2. Andriushchenko et al. (2019). Square attack: a query-efficient black-box adversarial attack via random search. arXiv preprint arXiv:1912.00049.
  3. Carlini and Wagner (2017). Towards evaluating the robustness of neural networks. In Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 39–57.
  4. Cartis et al. (2019). Improving the flexibility and robustness of model-based derivative-free optimization solvers. ACM Trans. Math. Softw. 45 (3).
  5. Chen et al. (2018). EAD: elastic-net attacks to deep neural networks via adversarial examples. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 10–17.
  6. Chen et al. (2017). ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the ACM Workshop on Artificial Intelligence and Security (AISec), pp. 15–26.
  7. Conn et al. (2009). Introduction to derivative-free optimization. Vol. 8, SIAM.
  8. Dalvi et al. (2004). Adversarial classification. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 99–108.
  9. Deng et al. (2009). ImageNet: a large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255.
  10. Dhillon et al. (2018). Stochastic activation pruning for robust adversarial defense. In Proceedings of the International Conference on Learning Representations (ICLR).
  11. Eykholt et al. (2018). Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1625–1634.
  12. Goodfellow et al. (2015). Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations (ICLR).
  13. Gopalakrishnan et al. (2018). Toward robust neural networks via sparsification. arXiv preprint arXiv:1810.10625.
  14. Guo et al. (2018). Low frequency adversarial perturbation. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI).
  15. He et al. (2015). Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034.
  16. Hinton et al. (2012). Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Processing Magazine 29 (6), pp. 82–97.
  17. Ilyas et al. (2018). Black-box adversarial attacks with limited queries and information. In Proceedings of the International Conference on Machine Learning (ICML), pp. 2137–2146.
  18. Krizhevsky (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.
  19. Kurakin et al. (2016). Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236.
  20. Kurakin et al. (2017). Adversarial examples in the physical world. In Proceedings of the International Conference on Learning Representations (ICLR), Workshop Track.
  21. Larson et al. (2019). Derivative-free optimization methods. Acta Numerica 28, pp. 287–404.
  22. Lecun et al. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
  23. Monti et al. (2019). Fake news detection on social media using geometric deep learning. arXiv preprint arXiv:1902.06673.
  24. Moon et al. (2019). Parsimonious black-box adversarial attacks via efficient combinatorial optimization. In Proceedings of the International Conference on Machine Learning (ICML), pp. 4636–4645.
  25. Narodytska and Kasiviswanathan (2017). Simple black-box adversarial attacks on deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1310–1318.
  26. Nocedal and Wright (2006). Numerical optimization. Springer-Verlag, New York.
  27. Papernot et al. (2016). Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 582–597.
  28. Papernot et al. (2017). Practical black-box attacks against machine learning. In Proceedings of the ACM on Asia Conference on Computer and Communications Security (ASIA CCS), pp. 506–519.
  29. Powell (2007). A view of algorithms for optimization without derivatives. Mathematics Today, Bulletin of the Institute of Mathematics and its Applications 43 (5), pp. 170–174.
  30. Powell (2009). The BOBYQA algorithm for bound constrained optimization without derivatives. Technical Report DAMTP 2009/NA06, University of Cambridge.
  31. Sharma et al. (2019). On the effectiveness of low frequency perturbations. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 3389–3396.
  32. Sitawarin et al. (2018). DARTS: deceiving autonomous cars with toxic signs. arXiv preprint arXiv:1802.06430.
  33. Szegedy et al. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826.
  34. Szegedy et al. (2014). Intriguing properties of neural networks. In Proceedings of the International Conference on Learning Representations (ICLR).
  35. Tett et al. (2013). Can top-of-atmosphere radiation measurements constrain climate predictions? Part I: tuning. Journal of Climate 26 (23), pp. 9348–9366.
  36. Tu et al. (2019). AutoZOOM: autoencoder-based zeroth order optimization method for attacking black-box neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence.
  37. Wang et al. (2019). Protecting neural networks with hierarchical random switching: towards better robustness-accuracy trade-off for stochastic defenses. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).
  38. Yuan et al. (2019). Adversarial examples: attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems 30 (9), pp. 2805–2824.