Toward Finding the Global Optimum of Adversarial Examples
Abstract
Current machine learning models are vulnerable to adversarial examples (Goodfellow et al., 2014). We notice that current state-of-the-art methods (Kurakin et al., 2016; Cheng et al., 2018) for attacking a well-trained model often get stuck in local optima. We conduct a series of experiments in both white-box and black-box settings and find that, with different initializations, an attack algorithm will converge to very different local optima, suggesting the importance of a careful and thorough search of the attack space. In this paper, we propose a general boosting algorithm that helps current attacks find more globally optimal adversarial examples. Specifically, we search for adversarial examples starting from different points/directions; at certain intervals we adopt successive halving (Jamieson & Talwalkar, 2016) to cut the search directions that are not promising, and use Bayesian Optimization (Pelikan et al., 1999; Bergstra et al., 2011) to resample from the search space based on the knowledge obtained from past searches. We demonstrate that by applying our method to state-of-the-art attack algorithms in both black- and white-box settings, we can further reduce the distortion between the original image and the adversarial example by about 10%-20%. By adopting dynamic successive halving, we can reduce the computation cost 5-10 times without harming the final result. We conduct experiments on models trained on MNIST or ImageNet and also on decision tree models; these experiments suggest that our method is a general way to boost the performance of current adversarial attack methods.
1 Introduction
It has been widely shown that current machine learning models (including deep neural networks) are vulnerable to adversarial examples (Goodfellow et al., 2014; Szegedy et al., 2013; Chen et al., 2017a). Researchers have developed methods (Goodfellow et al., 2014; Moosavi-Dezfooli et al., 2016; Carlini & Wagner, 2017; Chen et al., 2017b) to generate such examples that can mislead even state-of-the-art models. These methods mainly use the gradient of the model's loss function, backpropagated to the input, to perturb the input in the direction that changes the model's output most quickly. Such methods are effective at generating adversarial examples with low perturbation of the original sample. However, a significant shortcoming of these gradient-based attacks is that they are very likely to converge to a local optimum. Recent work (Wang et al., 2018) shows that under different initializations, such as different starting points in the white-box setting or different directions in the black-box setting, the final convergence value can be very different. However, these methods introduce a lot of extra computation cost and their improvement is relatively minor.
In this paper, we argue that gradient-based methods severely lack variance and can significantly benefit from introducing more diversity. Specifically, we mainly try to boost the performance of the Sign-OPT attack (Cheng et al., 2018) in the hard-label black-box setting: since the goal is to find the direction with minimum distortion, the starting direction is very important. We later generalize our method to white-box attacks such as the C&W attack and boost its performance as well. Generally speaking, we initialize the attack with different configurations and apply the current best attack algorithms to them; to reduce computation cost and introduce more guided variance, we continually cut the worse part of the configurations and resample new configurations using Bayesian Optimization. In the white-box attack, we randomly sample points within an ε-ball and perform the PGD (Kurakin et al., 2016) attack on them; in the hard-label black-box attack (Brendel et al., 2017; Cheng et al., 2018), we first sample directions and apply the Sign-OPT attack to them. At certain iterations, we conduct successive halving (Jamieson & Talwalkar, 2016), which abandons the worst half of these configurations to reduce computation cost, and use a Bayesian Optimization method called the Tree Parzen Estimator (TPE) (Bergstra et al., 2011) to resample from the search space; this procedure encourages the algorithm to search more promising areas and introduces guided variance in the middle steps. By adopting these methods, we can enhance the performance of current attacks without adding too much computation.
Our contributions are summarized below:

We conduct thorough experiments to show that current gradient-based attack algorithms often converge to local optima and cannot find a good adversarial example, and thus require further improvement.

We design a general algorithm to boost the performance of current attack algorithms and encourage them to find a more globally optimal value, generating better adversarial examples.

We conduct comprehensive experiments on several datasets and attack algorithms. We show that our method helps current algorithms find better examples with 10%-20% lower distortion. By introducing the cutting mechanism, we can reduce the computation cost 5x-10x compared to a search without cutting.
2 Background and Related work
2.1 White-box attack
The first attack attempts were conducted in the white-box setting, which is relatively easy: the attacker has free access to the model's architecture and parameters. FGSM (Goodfellow et al., 2014) is one of the first algorithms in this setting; it performs a single backpropagation step to obtain the gradient and then uses it to generate an adversarial example. Madry et al. (2017) and Kurakin et al. (2016) further improve the FGSM method by turning it into an iterative method called Projected Gradient Descent (PGD): at each iteration, the attack takes only a small step along the calculated gradient, then recalculates the gradient and steps again. The original white-box attacks calculate the gradient of the softmax output and try to maximize the error on the original label, while the C&W attack (Carlini & Wagner, 2017) tries to minimize the distance between the original image and the adversarial image measured by an ℓp norm, while simultaneously maximizing the error of the model's prediction for the original label.
All of these white-box attack methods depend heavily on the gradient to find a direction that makes the model produce wrong predictions. However, Wang et al. (2018, 2019) show that, starting from different points around the original one, the search will converge to very different local optima. The authors further use an interval attack to find a set of starting points that increases diversity, and apply the PGD attack to them. The results show that, compared to the original PGD attack started from different points, the interval attack finds local optima whose final distortions spread much wider, and by increasing the number of starting points, it can find a better optimum than PGD.
2.2 Black-box attack
In the black-box setting, the attacker does not have direct access to the model's parameters or architecture; that is, the model remains a black box to the attacker, who can only depend on the output of the model to conduct the attack. Depending on how much information the model exposes to the outside world, the problem is split into two parts: soft-label and hard-label. In the soft-label setting, after the model accepts an input, it outputs the probability of each label, while in the hard-label setting, the model only outputs the top-1 label with the greatest probability. For the soft-label attack, Chen et al. (2017b) were the first to use Zeroth Order Optimization (ZOO) to approximately estimate the gradient and perform a PGD-based adversarial attack with it. In Ilyas et al. (2018a), the authors use a natural evolution strategy (NES) to estimate the gradient. As for the hard-label attack, Brendel et al. (2017) first formulated this problem and used the boundary attack to find the adversarial example with minimum distortion. Later, Cheng et al. (2018) (OPT attack) reformulated the hard-label attack in a continuous form and used ZOO to search for the best direction. Cheng et al. (2018) (Sign-OPT) further reduce the number of queries by calculating only the sign of the ZOO update. These methods still use the gradient as the signal to find the adversarial example with minimum distortion, and will of course suffer from local optima. In experiments we find that, starting from different directions, the final convergence results can be very different.
2.3 Bayesian Optimization and Successive Halving
Bayesian Optimization (BO) has been successfully applied to optimize functions that are non-differentiable or black-box, such as finding the hyperparameters of neural networks in AutoML. Its main idea is to sample new points based on past knowledge. Basically, Bayesian optimization finds the optimum of a given function f in an iterative manner: at each iteration t, BO uses a probabilistic model to approximate the unknown function based on the data points already observed in previous iterations. Specifically, it samples a new data point x_t = argmax_x a(x; D_{t-1}), where a is the acquisition function and D_{t-1} = {(x_1, y_1), ..., (x_{t-1}, y_{t-1})} are the samples queried from f so far. One of the most widely used acquisition functions is the expected improvement (EI):
EI_{y^*}(x) = \int_{-\infty}^{\infty} \max(y^* - y, 0) \, p(y \mid x) \, dy    (1)
where y^* is the value of the best sample generated so far and x^* is the location of that sample, i.e. y^* = f(x^*).
The Tree Parzen Estimator (TPE). TPE (Bergstra et al., 2011) is a Bayesian Optimization method proposed for hyperparameter tuning that uses a kernel density estimator (KDE) to approximate the distribution of configurations instead of modeling the objective function directly. Specifically, it models p(x | y) and p(y) instead of p(y | x), and defines p(x | y) using two separate KDEs l(x) and g(x):
p(x \mid y) = \begin{cases} l(x) & \text{if } y < y^* \\ g(x) & \text{if } y \ge y^* \end{cases}    (2)
where y^* is a constant between the lowest and largest value of y observed so far. Bergstra et al. (2011) show that maximizing the ratio l(x)/g(x) is equivalent to optimizing the EI function described in Equation 1 (see Appendix B for more detail). In this setting, the computational cost of generating a new data point by KDE grows linearly with the number of data points already generated, while a traditional Gaussian Process (GP) requires cubic time.
Successive Halving. The idea behind Successive Halving (Jamieson & Talwalkar, 2016) is easily illustrated by its name: first initialize a set of configurations and perform some computation on each of them, then evaluate the performance of all configurations and discard the worst half; this process continues until only one configuration is left. BOHB (Falkner et al., 2018) combines HyperBand (derived from Successive Halving) (Li et al., 2016) and TPE to solve the AutoML problem, with great success.
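As a concrete illustration, the halving loop described above can be sketched in a few lines of Python. The `evaluate` objective here is a hypothetical toy stand-in (a noisy distance to an unknown optimum), not the paper's attack objective.

```python
# A minimal sketch of Successive Halving (Jamieson & Talwalkar, 2016),
# assuming a toy objective where lower scores are better.
import random

def successive_halving(configs, evaluate, budget_per_round=10):
    """Repeatedly evaluate all surviving configs, then discard the worst half."""
    survivors = list(configs)
    while len(survivors) > 1:
        scores = {c: evaluate(c, budget_per_round) for c in survivors}
        survivors.sort(key=lambda c: scores[c])          # lower score = better
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]

# Toy objective: the score of config c is |c - 0.3| plus small noise.
random.seed(0)
best = successive_halving(
    configs=[i / 10 for i in range(10)],
    evaluate=lambda c, n: abs(c - 0.3) + random.gauss(0, 0.01),
)
```

Because the per-round noise is much smaller than the gap between configurations, the survivor is the configuration closest to the optimum.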
However, these methods were originally applied to hyperparameter tuning problems where the number of parameters to search is not too large (approximately 10-20); they suffer from the curse of dimensionality when the number of parameters grows, and the required computation cost becomes unacceptable. There is already some work (Moriconi et al., 2019; Wang et al., 2013) trying to use BO in high dimensions, but we still found in experiments that simply using BO cannot converge as well as gradient-based methods.
3 Methodology
3.1 Basic Intuition
Due to the high dimensionality of image classification problems, current adversarial attack methods primarily adopt gradient-based algorithms to find an adversarial example. These methods are efficient and usually much quicker at finding a successful attack sample than probabilistic methods like Bayesian Optimization. However, a gradient-based algorithm can easily get stuck in a local optimum, and its final result highly depends on the starting direction: at each iteration, it always takes a step along the gradient, which guarantees reaching a better point than before. Such methods lack variance, making it almost impossible to find a global optimum, especially when the search space is high-dimensional and non-convex and may contain many local optima.
To better illustrate this phenomenon, Figure 1 shows the results of an attack on an image from the MNIST dataset using 784 starting directions (equal to the dimension of the MNIST data); the attack is done with Sign-OPT, the current state-of-the-art algorithm for hard-label black-box attacks. To maximize the difference between starting directions, we use the Gram-Schmidt process to force the directions to be orthogonal to each other; the MNIST data have 784 dimensions (28x28), so the diversity should be sufficient. We can see that the best and worst curves converge to quite different values, suggesting that tuning the starting direction and introducing additional variance is necessary in the Sign-OPT method.
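For reference, the mutually orthogonal starting directions can be generated as follows. This is a sketch assuming QR decomposition of a random Gaussian matrix, which performs the same orthogonalization as the Gram-Schmidt process described above.

```python
import numpy as np

def orthogonal_directions(dim, n, seed=0):
    """Return n mutually orthogonal unit directions in R^dim (n <= dim).
    QR decomposition of a random Gaussian matrix carries out the same
    orthogonalization as the Gram-Schmidt process."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, n)))
    return q.T  # each row is one unit-norm starting direction

dirs = orthogonal_directions(dim=784, n=784)  # one direction per MNIST pixel
```

Each row has unit norm and is orthogonal to every other row, so the 784 starting directions jointly span the whole input space.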
3.2 A General Boosting Mechanism
Our general goal is to efficiently find a better adversarial example based on current attack algorithms without introducing too much computation cost. Normally, there are two commonly used approaches to a black-box optimization problem: gradient-based and probabilistic algorithms. Gradient-based methods, which are commonly used in black-box attacks, estimate the gradient of the objective function and follow it iteratively until convergence. Probabilistic algorithms such as Bayesian Optimization (BO) (Pelikan et al., 1999; Snoek et al., 2012) try to approximate the objective function with a probabilistic model: they first sample uniformly across the search space while incorporating prior beliefs about the objective function, then query new points that are most likely to be better based on the observations of past steps. Generally speaking, gradient-based methods converge fast but may get stuck in locally optimal directions. Probabilistic algorithms have a better chance of finding more globally optimal values; however, their computation cost grows exponentially as the dimension increases and quickly becomes unacceptable.
We argue that the performance of gradient-based methods naturally depends on the starting directions. As shown in Figure 2, starting from different points, gradient-based methods will end up in different local minima. The natural way to tackle this problem is to try more starting directions, but that adds a lot of computation cost and is less efficient. So our general goal is to find better optima efficiently, by dropping unpromising configurations and adopting guided search techniques.
Our algorithm for boosting current attack methods is presented in Algorithm 1. In the searching phase, we maintain two sets to record information: the step pool records all iterations performed, including the cut ones, and the available search pool stores all configurations that are still active and need to be searched later. The reason we record all iterations taken is that fitting a high-dimensional KDE requires a lot of data to fill the search space; in addition, the record reflects the changes between iterations and helps the model better understand the search space. We sample directions from a Gaussian or uniform distribution as starting configurations; then, at each interval, we first perform a number of iterations on each configuration and cut the worst fraction of search directions, while in the meantime using TPE resampling to add new candidate directions to the search.
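The control flow just described can be sketched as follows. Here `attack_step` and `tpe_resample` are hypothetical stand-ins for one interval of the base attack (e.g. Sign-OPT) and for the TPE step; both are mocked with toy functions so the skeleton is runnable, and this is not the paper's exact Algorithm 1.

```python
# Minimal sketch of the boosting loop: run each active configuration for an
# interval, cut the worst fraction, and resample new guided candidates.
import random

def boost(init_configs, attack_step, tpe_resample, cut_frac=0.5, n_rounds=3):
    history = []                  # step pool: every (config, distortion) observed
    active = {c: float("inf") for c in init_configs}   # available search pool
    for _ in range(n_rounds):
        for c in list(active):
            active[c] = attack_step(c)        # run the base attack one interval
            history.append((c, active[c]))
        ranked = sorted(active, key=active.get)
        keep = max(1, int(len(ranked) * (1 - cut_frac)))
        active = {c: active[c] for c in ranked[:keep]}  # cut the worst fraction
        for c in tpe_resample(history):                  # guided new directions
            active[c] = float("inf")
    return min(history, key=lambda t: t[1])

# Toy stand-ins: distortion is |c - 0.5|; "TPE" perturbs the best config seen.
random.seed(1)
best_cfg, best_dist = boost(
    init_configs=[random.uniform(0, 1) for _ in range(8)],
    attack_step=lambda c: abs(c - 0.5),
    tpe_resample=lambda h: [min(h, key=lambda t: t[1])[0] + random.gauss(0, 0.05)],
)
```

The step pool keeps cut configurations too, mirroring the design choice above: even discarded runs inform the resampling model.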
The Tree Parzen Estimator resampling procedure encourages the algorithm to find promising directions with lower distortion. The reasons for and benefits of resampling in addition to cutting are:

Depending not only on the starting directions. First, this procedure adds more variance in the middle of the search, instead of relying solely on the starting directions. In experiments we find that, started from the same direction, Sign-OPT often converges to a similar final distortion, even though the gradient estimation procedure also introduces some randomness.

Guided randomness. The starting directions are sampled randomly with no information about the objective function; however, after gaining some knowledge of the boundary from past searches, we can resample new directions guided by that knowledge, which makes the resampling more efficient and more likely to find better directions.

Parameter sharing. Past iterations of Sign-OPT have already led to relatively good local areas with small distortion; by sampling around these areas, new configurations effectively share the progress of past searches rather than starting from the original point, which further reduces computation cost.
As shown in Algorithm 2, we first divide the observed data into two parts by their distortion (lower means better), and fit two separate KDEs to these two subsets. We then try to sample new data with the minimum value of g(x)/l(x), which can be proved to have the maximum relative gain on distortion (see Appendix B for more detail); since we cannot directly find such points, we sample a few times (100 in our experiments) from l(x) and keep the sample with the minimum g(x)/l(x).
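A minimal one-dimensional sketch of this resampling step is given below. The paper operates in the high-dimensional direction space and Algorithm 2's exact estimator may differ; the simple Gaussian KDE and the bandwidth `h` here are assumptions for illustration.

```python
import numpy as np

def kde(points, h):
    """Simple Gaussian KDE: a mixture of N(p, h^2) centred on each point."""
    def density(x):
        diffs = (x[:, None] - points[None, :]) / h
        return np.exp(-0.5 * diffs ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))
    return density

def tpe_sample(xs, ys, gamma=0.25, n_candidates=100, h=0.2, seed=0):
    """Split observations at the gamma-quantile of distortion, fit l(x) on the
    good part and g(x) on the rest, then keep the candidate drawn from l(x)
    that minimizes g(x)/l(x) (equivalently, maximizes l(x)/g(x))."""
    rng = np.random.default_rng(seed)
    xs, ys = np.asarray(xs), np.asarray(ys)
    y_star = np.quantile(ys, gamma)
    good = xs[ys <= y_star]
    l, g = kde(good, h), kde(xs[ys > y_star], h)
    # Sampling from l(x): pick a low-distortion point, perturb by the bandwidth.
    cands = rng.choice(good, n_candidates) + rng.normal(0, h, n_candidates)
    return cands[np.argmin(g(cands) / l(cands))]

rng = np.random.default_rng(1)
xs = rng.uniform(-3, 3, 200)
ys = (xs - 1.0) ** 2          # toy distortion landscape, minimized at x = 1
x_new = tpe_sample(xs, ys)
```

The selected candidate lands where low-distortion observations are dense and high-distortion ones are sparse, i.e. near the toy optimum.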
3.3 Boosting Hard-Label Black-Box Attack
In the black-box attack, we mainly study how to boost performance in the hard-label setting. We choose the Sign-OPT attack (Cheng et al., 2018), the state-of-the-art attack algorithm in this area, as the base algorithm, and enhance its performance with successive halving and TPE resampling. To tackle the problem that the hard-label attack only provides binary information describing each label as true or false, the OPT attack transforms the hard-label attack into a continuous form so that gradient-based algorithms can be applied. Specifically, it defines g(θ) as the minimum distortion along a direction θ and tries to find the θ with the smallest g(θ):
\theta^* = \arg\min_{\theta} g(\theta), \quad \text{where} \quad g(\theta) = \min_{\lambda > 0} \lambda \ \text{ s.t. } \ f\left(x_0 + \lambda \frac{\theta}{\|\theta\|}\right) \neq y_0    (3)
By evaluating g(θ) with binary search, the authors transform the non-differentiable hard-label attack problem into a continuous one, and use Zeroth Order Optimization (Chen et al., 2017b) to estimate the directional derivative of g:
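The binary search for g(θ) can be sketched as follows; `model` is a toy hard-label classifier standing in for the real network, and the search bounds and tolerance are assumptions.

```python
import numpy as np

def g_theta(model, x0, y0, theta, hi=10.0, tol=1e-6):
    """Binary-search the smallest distortion along unit direction theta at
    which the hard-label model's prediction flips away from y0."""
    theta = theta / np.linalg.norm(theta)
    lo = 0.0
    assert model(x0 + hi * theta) != y0, "hi must already be adversarial"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if model(x0 + mid * theta) != y0:
            hi = mid           # still adversarial: shrink the distortion
        else:
            lo = mid           # not yet adversarial: move outward
    return hi

# Toy model: label is 1 iff the first coordinate exceeds 2.0.
model = lambda x: int(x[0] > 2.0)
dist = g_theta(model, x0=np.zeros(3), y0=0, theta=np.array([1.0, 0.0, 0.0]))
```

Along the first axis the true boundary distance is 2.0, which the search recovers to within the tolerance; each halving step costs one model query, which is why g(θ) is cheap enough to evaluate repeatedly.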
\hat{\nabla} g(\theta) \approx \frac{g(\theta + \beta u) - g(\theta)}{\beta} \cdot u    (4)

where u is a random Gaussian direction and β is a small smoothing parameter.
Sign-OPT further improves this method by using a single query to compute only the sign of Equation 4. We argue that these methods still depend heavily on the estimated gradient, can be led to suboptimal local minima, and thus can greatly benefit from adding more variance.
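The single-query sign trick can be sketched like this. It is an illustrative stand-in, not the exact Sign-OPT implementation: for each random probe u, one query checks whether the point at the current distortion along the perturbed direction is still classified correctly, which reveals whether g increased or decreased.

```python
import numpy as np

def sign_grad_estimate(model, x0, y0, theta, g_val, eps=1e-3, n_probes=20, seed=0):
    """Average sign(g(theta + eps*u) - g(theta)) * u over random probes u,
    spending a single model query per probe."""
    rng = np.random.default_rng(seed)
    est = np.zeros_like(theta)
    for _ in range(n_probes):
        u = rng.standard_normal(theta.shape)
        probe = theta + eps * u
        probe = probe / np.linalg.norm(probe)
        # Still correctly classified at distortion g_val along the probe
        # => the boundary is further away that way => g increased => sign +1.
        sign = 1.0 if model(x0 + g_val * probe) == y0 else -1.0
        est += sign * u
    return est / n_probes

# Toy model with a tilted boundary x[0] + x[1] = 2; along theta = (1, 0) the
# boundary distance is 2, and increasing theta[1] brings the boundary closer.
model = lambda x: int(x[0] + x[1] > 2.0)
est = sign_grad_estimate(model, x0=np.zeros(2), y0=0,
                         theta=np.array([1.0, 0.0]), g_val=2.0)
```

For this toy boundary the estimate's second component comes out negative, correctly indicating that tilting θ toward the second axis decreases the distortion g.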
We apply our boosting algorithm to the Sign-OPT attack to enhance its performance. When adapting our boosting algorithm to Sign-OPT, the critic metric is the distance between the original example and the adversarial example: a lower distance indicates a better, more promising configuration.
3.4 Boosting White-Box Attack
Besides the Sign-OPT attack, we demonstrate that our algorithm can also be applied to white-box attacks; specifically, we boost the C&W attack (Carlini & Wagner, 2017) by adding more variance to it. We argue that C&W is also a gradient-based method whose performance is likewise restricted by local minima. We describe the exact setting in Appendix B.
4 Experiments
To evaluate the effectiveness and generality of our algorithm in helping current attack methods find better optima, we conduct experiments on both white-box and hard-label black-box attacks. We use several popular image classification datasets and test them in both white-box and black-box settings. We also try to attack a Gradient Boosting Decision Tree (GBDT); since a GBDT is naturally non-differentiable and current white-box attacks cannot be applied to it, we only conduct black-box attacks on the GBDT.
4.1 Hard-Label Black-Box attack
4.1.1 Attack on images.
We conduct experiments on three standard datasets: MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky et al., 2010) and ImageNet-1000 (Deng et al., 2009), using the state-of-the-art attack algorithm Sign-OPT. The neural network architecture is the same as the one reported in Sign-OPT. In detail, MNIST and CIFAR use the same network structure, containing four convolutional layers, two max-pooling layers and two fully-connected layers. As reported in Carlini & Wagner (2017) and Cheng et al. (2018), we achieve an accuracy of 99.5% on MNIST and 82.5% on CIFAR-10. For the ImageNet dataset, we use the pretrained ResNet-50 (He et al., 2016) network provided by torchvision (Marcel & Rodriguez, 2010), which achieves a top-1 accuracy of 76.15%. We randomly select 100 examples from the test set of each dataset for evaluation.
We adopt the Sign-OPT attack as the base algorithm to be boosted, and also include the following two algorithms for comparison:

Boundary attack (black-box) (Brendel et al., 2017): We use the implementation provided in Foolbox https://github.com/bethgelab/foolbox.

OPT-based black-box attack (Cheng et al., 2018): We use the implementation provided at https://github.com/LeMinhThong/blackboxattack.
                       MNIST            CIFAR-10          ImageNet-1000
                       Avg.    ASR      Avg.    ASR       Avg.    ASR
C&W (white-box)        0.96    60%      0.11    68%       1.53    49%
Boundary attack        1.27    21%      0.15    49%       2.02    19%
OPT-based attack       1.11    39%      0.14    52%       1.67    38%
Sign-OPT attack        1.05    51%      0.12    61%       1.43    59%
TPE-SH attack          0.95    62%      0.09    73%       1.32    81%
We study the effect of successive halving and TPE resampling separately. As shown in Figure 3, in successive halving we keep cutting the worst fraction of search directions at a fixed interval until only one sample is left. For successive halving combined with TPE resampling, we do both cutting and resampling in the first several phases, and only cut afterwards, until all search directions are cut or the search reaches a local optimum and stops moving. We can see from Figure 3(b) that TPE resampling indeed finds directions that are better than the original ones. We observed in experiments that the final best direction mostly comes from TPE resampling rather than from the original starting directions, which demonstrates the importance of resampling in the middle of the optimization. The quantitative influence of successive halving and TPE resampling is shown in Appendix E.
A natural question arises: how many starting points are enough? To find the best number of starting points, we attack an image with different numbers of starting directions; for each number, we also run the attack several times and average the results to reduce variance. Figure 4 shows the attack on an MNIST image using the Sign-OPT method, illustrating the effect the number of starting directions has on the final converged distortion. Increasing the number of starting directions does not help reduce the minimum distortion much beyond about 50 directions. We also find that the standard deviation is smaller and the final distortion lower when TPE resampling is introduced. This is probably because TPE resampling also introduces variance in the middle steps, making the algorithm not depend entirely on the starting directions, which also helps increase the probability of finding a better optimum.
Another question is: what is the best cutting interval? The cutting interval decides how many iterations are applied before the next cutting and resampling is conducted, and it is an important factor to tune. If it is too small, configurations will be evaluated and possibly cut before they have relatively converged, which makes the cutting unreasonable and inaccurate. If it is too large, unpromising configurations will run for more iterations, wasting computation. Li et al. (2016) developed an algorithm called HyperBand to search for the best interval and cutting rate. However, HyperBand was designed for hyperparameter tuning in AutoML, which differs from our problem: in hyperparameter tuning, only one best setting exists for a specific dataset, and different datasets can have very different best intervals and cutting rates, whereas in our experiments we attack thousands of images, and images from the same dataset tend to share the same best interval and cutting rate. As a result, we manually tune the best setting for each dataset during the experiments instead of searching for it as HyperBand does, which avoids a lot of unnecessary computation. The parameters for different datasets are shown in Appendix F.
4.1.2 Attack on Gradient Boosting Decision Tree (GBDT).
We conduct an untargeted attack on a gradient boosting decision tree (GBDT). Since Sign-OPT does not include experiments with GBDTs, we use the OPT-based attack (Cheng et al., 2018) and apply our boosting algorithm to it. In this experiment, we again use the MNIST (LeCun et al., 1998) dataset for multi-class classification; we select the popular GBDT framework LightGBM and use the parameters from https://github.com/Koziev/MNIST_Boosting, which achieve 98.09% accuracy on MNIST. The attack settings are almost the same as described before. Specifically, we start from 30 directions and conduct successive halving every 2,000 queries. The results of the GBDT attack on MNIST are shown in Table 3.
                 MNIST             CIFAR-10
                 Avg.    ASR       Avg.    ASR
C&W attack       0.98    41%       0.12    47%
TPE-SH attack    0.92    66%       0.08    69%
                   HIGGS             MNIST
                   Avg.     ASR      Avg.     ASR
OPT-based attack   0.169    52%      0.952    61%
TPE-SH attack      0.103    81%      0.722    79%
4.2 White-Box attack
In the white-box attack, we adopt the C&W attack method described in Carlini & Wagner (2017) and perform the attack on both the MNIST and CIFAR-10 datasets, selecting 100 examples from each dataset to evaluate performance. To introduce variance and encourage the search to find better, more global optima, we randomly sample 50 points inside an ε-ball around the original image as starting points and apply the C&W attack to each of them. For simplicity, we fix the ε described in Equation 6 to 0.2; the method can easily be applied to different settings of ε. Cutting and resampling are used during the search, and the detailed parameters can be found in Appendix F. The results before and after boosting are shown in Table 3.
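Sampling the starting points can be sketched as below, assuming an ℓ∞ ball of radius ε = 0.2 and pixel values in [0, 1]; the choice of norm here is an assumption on our part.

```python
import numpy as np

def sample_starts(x0, n, eps=0.2, seed=0):
    """Draw n random start points inside the l_inf ball of radius eps around
    x0, clipped back into the valid pixel range."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-eps, eps, size=(n,) + x0.shape)
    return np.clip(x0 + noise, 0.0, 1.0)

# 50 starting points around a (hypothetical) mid-gray 28x28 MNIST image.
starts = sample_starts(np.full((28, 28), 0.5), n=50)
```

Each of the 50 starts is then handed to an independent C&W run, with cutting and resampling applied across the runs as in the black-box case.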
5 Conclusion
In this paper, we propose a general boosting framework that can be applied to both white-box and black-box attacks to find more globally optimal adversarial examples without adding too much computational cost. Our method enjoys the variance provided by Bayesian Optimization and the efficiency of gradient-based optimization, and finds much better optima compared to previous work. We also show experimentally that different starting directions significantly affect the final attack distortion, and study the best number of directions to achieve both efficiency and optimality.
Acknowledgments
References
 Bergstra et al. (2011) James S. Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, pp. 2546–2554, 2011.
 Brendel et al. (2017) Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248, 2017.
 Carlini & Wagner (2017) Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017.
 Chen et al. (2017a) Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, and Cho-Jui Hsieh. Attacking visual language grounding with adversarial examples: A case study on neural image captioning. arXiv preprint arXiv:1712.02051, 2017a.
 Chen et al. (2017b) Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26. ACM, 2017b.
 Cheng et al. (2018) Minhao Cheng, Thong Le, Pin-Yu Chen, Jinfeng Yi, Huan Zhang, and Cho-Jui Hsieh. Query-efficient hard-label black-box attack: An optimization-based approach. arXiv preprint arXiv:1807.04457, 2018.
 Deng et al. (2009) Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE, 2009.
 Falkner et al. (2018) Stefan Falkner, Aaron Klein, and Frank Hutter. BOHB: Robust and efficient hyperparameter optimization at scale. arXiv preprint arXiv:1807.01774, 2018.
 Goodfellow et al. (2014) Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
 He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
 Ilyas et al. (2018a) Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. arXiv preprint arXiv:1804.08598, 2018a.
 Ilyas et al. (2018b) Andrew Ilyas, Logan Engstrom, and Aleksander Madry. Prior convictions: Black-box adversarial attacks with bandits and priors. arXiv preprint arXiv:1807.07978, 2018b.
 Jamieson & Talwalkar (2016) Kevin Jamieson and Ameet Talwalkar. Non-stochastic best arm identification and hyperparameter optimization. In Artificial Intelligence and Statistics, pp. 240–248, 2016.
 Krizhevsky et al. (2010) Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. CIFAR-10 (Canadian Institute for Advanced Research). URL http://www.cs.toronto.edu/kriz/cifar.html, 2010.
 Kurakin et al. (2016) Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
 LeCun et al. (1998) Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
 Li et al. (2016) Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: A novel bandit-based approach to hyperparameter optimization. arXiv preprint arXiv:1603.06560, 2016.
 Madry et al. (2017) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
 Marcel & Rodriguez (2010) Sébastien Marcel and Yann Rodriguez. Torchvision: the machine-vision package of Torch. In Proceedings of the 18th ACM International Conference on Multimedia, pp. 1485–1488. ACM, 2010.
 Moosavi-Dezfooli et al. (2016) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582, 2016.
 Moriconi et al. (2019) Riccardo Moriconi, Marc Peter Deisenroth, and K. S. Sesh Kumar. High-dimensional Bayesian optimization using low-dimensional feature spaces. 2019.
 Pelikan et al. (1999) Martin Pelikan, David E. Goldberg, and Erick Cantú-Paz. BOA: The Bayesian optimization algorithm. In Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation - Volume 1, pp. 525–532. Morgan Kaufmann Publishers Inc., 1999.
 Snoek et al. (2012) Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pp. 2951–2959, 2012.
 Szegedy et al. (2013) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
 Wang et al. (2018) Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. MixTrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625, 2018.
 Wang et al. (2019) Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. Enhancing gradient-based attacks with symbolic intervals. arXiv preprint arXiv:1906.02282, 2019.
 Wang et al. (2013) Ziyu Wang, Masrour Zoghi, Frank Hutter, David Matheson, and Nando de Freitas. Bayesian optimization in high dimensions via random embeddings. In Twenty-Third International Joint Conference on Artificial Intelligence, 2013.
Appendix A Introduction to adversarial attack
In a common attack setting, we try to find the weakness of a well-trained machine learning model by generating examples that a human can classify correctly but the machine cannot. Specifically, assume that we have a well-trained multi-class classification model $f$; various adversarial attack algorithms then try to find an example $x'$ close to the original input $x$ such that:

(5) $\min_{x'} \|x' - x\| \quad \text{s.t.} \quad f(x') \neq f(x)$
There are several key properties of a successful adversarial example worth mentioning:

The new example $x'$ should be near the original $x$.

The output of the machine learning model changes, i.e. $f(x') \neq f(x)$.

A human can still easily classify $x'$ to its correct label, since the change is minor.
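The conditions above can be summarized in a short check. The following is a minimal sketch, not the paper's code; the toy linear "model", the helper name, and the $\epsilon$ threshold are all illustrative assumptions:

```python
import numpy as np

def is_valid_adversarial(predict, x, x_adv, eps):
    """Check the two machine-checkable conditions: the perturbation is
    small (||x' - x|| <= eps) and the model's predicted label changes."""
    close = float(np.linalg.norm(x_adv - x)) <= eps
    label_changed = predict(x_adv) != predict(x)
    return close and label_changed

# Toy linear "model": classify by the sign of a linear score.
predict = lambda x: int(np.dot(x, np.array([1.0, -1.0])) > 0)

x = np.array([0.6, 0.5])        # original point, predicted label 1
x_adv = np.array([0.45, 0.5])   # nudged across the boundary, label 0
print(is_valid_adversarial(predict, x, x_adv, eps=0.2))  # -> True
```

The third condition (human-perceptual similarity) is what the distance constraint approximates; it cannot be checked mechanically.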
A possible explanation for why a well-trained model exhibits this phenomenon is illustrated in Figure 5. Assume we are attacking a classification model such as an image recognizer. The problem is that the model's prediction region for a certain class may not match the true region: the regions with deeper color are the true areas of the classes, and the two regions with labels a and b are actually far apart. However, the model's predicted regions for these two classes can differ from the true regions, and their boundaries lie close to each other, which is what makes the adversarial attack possible.
Appendix B Boosting White-Box Attack
Figure 6 shows a possible boundary distribution and a C&W (Carlini & Wagner, 2017) attack performed on it. The decision boundary of a neural network can be very unsmooth and contain many local optima. Generally speaking, the cost of moving from the original point to the boundary (i.e., achieving a successful attack) depends heavily on the direction. Traditional white-box attack algorithms like FGSM (Goodfellow et al., 2014), PGD (Kurakin et al., 2016), and C&W (Carlini & Wagner, 2017) walk along the gradient to reach the boundary, implicitly assuming that the gradient leads to the optimal value, which may not be the case. We try to improve the attack so that it finds a more global optimal adversarial example by encouraging it to search beyond what the gradient alone suggests. Assume that the original sample is $(x, y)$ and $L$ is the loss function of the neural network; the C&W attack conducts an iterative search as follows:

(6) $x_{t+1} = x_t - \eta \, \nabla_{x_t} \left( \| x_t - x \|_2^2 + c \cdot L(x_t, y) \right)$
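As a hedged illustration of this kind of gradient-guided iterative search, the sketch below runs a C&W-style update on a toy linear score function; the penalty weight `c`, step size `eta`, and the toy model are our own assumptions, not the paper's settings:

```python
import numpy as np

# Toy linear "model": score(x) = w . x, with score > 0 meaning the true class.
w = np.array([1.0, -1.0])
x0 = np.array([0.6, 0.5])        # original sample, score = 0.1 (correct class)

def attack_step(x, x0, w, c=1.0, eta=0.05):
    """One gradient step on ||x - x0||^2 + c * score(x): stay near the
    original sample while pushing the class score down."""
    grad = 2.0 * (x - x0) + c * w            # gradient of the two terms
    return x - eta * grad

x = x0.copy()
for _ in range(20):
    x = attack_step(x, x0, w)
print(float(np.dot(w, x)) < 0)   # -> True: score flipped sign, attack succeeded
```

Because every step follows the local gradient, the end point is entirely determined by the starting point, which is exactly the sensitivity to initialization the paper targets.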
which depends solely on the gradient. Instead of starting from a single direction calculated from the gradient at the original point, we randomly sample points within an $\epsilon$-ball (meaning the distance between a generated point and the original one is less than $\epsilon$; this value may differ slightly across datasets) as a set of possible configurations. We then assign a fixed budget, i.e. a certain number of PGD iterations, to each of these configurations. After the configurations exhaust their budget, we evaluate all of them, cut the worst-performing fraction, and assign them no further budget. Configurations are evaluated by comparing their loss values at the same iteration; the ones with higher loss are presumed better. This procedure continues until only one configuration is left.
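The successive-halving loop described above can be sketched as follows; this is illustrative only, and the scoring function, per-round budget, and cut fraction are placeholders rather than the paper's exact settings:

```python
import numpy as np

def successive_halving(configs, step_fn, score_fn, budget=10, keep=0.5):
    """Run every candidate for a fixed budget, then repeatedly drop the
    worst-scoring fraction until a single configuration survives."""
    while len(configs) > 1:
        # Spend the per-round budget (e.g. some PGD iterations) on each survivor.
        configs = [step_fn(c, budget) for c in configs]
        # Rank by score (e.g. attack loss; higher is better) and keep the best.
        configs.sort(key=score_fn, reverse=True)
        configs = configs[:max(1, int(len(configs) * keep))]
    return configs[0]

# Toy demo: candidates are scalars and the "attack loss" rewards being near 0.
rng = np.random.default_rng(0)
starts = list(rng.uniform(-1, 1, size=8))
best = successive_halving(starts,
                          step_fn=lambda c, b: c,      # no-op update for the demo
                          score_fn=lambda c: -abs(c))  # closer to 0 scores higher
print(abs(best) == min(abs(s) for s in starts))  # -> True
```

With `keep=0.5` and 8 starting points, the pool shrinks 8 → 4 → 2 → 1, so most of the budget is concentrated on the promising directions.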
Besides successive halving, we also resample from the search space guided by past searches. We use the TPE resampling algorithm described in Algorithm 2 to add new candidate configurations and perform the search on them; we describe it more specifically in the next section. Note that in the white-box setting the good configurations are those with larger loss values, while in the black-box setting the configurations with lower distortion are the good ones: a black-box attack performs its search on the boundary, where the attack is already guaranteed to succeed, so the only goal is to find a lower distortion.
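For intuition, here is a minimal one-dimensional TPE-style resampling step, assuming a simple fixed-bandwidth Gaussian KDE. It is a sketch of the idea, not the Algorithm 2 used in the paper; in particular, this toy treats lower scores as better, whereas the white-box setting above prefers higher loss:

```python
import numpy as np

def tpe_propose(xs, ys, gamma=0.25, n_candidates=100, bw=0.1, rng=None):
    """Minimal 1-D TPE step: split past observations into good/bad by the
    gamma-quantile of their scores, model each group with a Gaussian KDE,
    and return the candidate maximizing the density ratio l(x) / g(x)."""
    rng = rng or np.random.default_rng(0)
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    cut = np.quantile(ys, gamma)
    good, bad = xs[ys <= cut], xs[ys > cut]   # here lower score = better

    def kde(points, query):
        # Fixed-bandwidth Gaussian kernel density estimate (unnormalized).
        d = (query[:, None] - points[None, :]) / bw
        return np.exp(-0.5 * d**2).sum(axis=1) + 1e-12

    # Draw candidates near the good points, then rank them by l / g.
    cand = rng.choice(good, n_candidates) + bw * rng.standard_normal(n_candidates)
    return cand[np.argmax(kde(good, cand) / kde(bad, cand))]

# Toy history: past searches of f(x) = (x - 0.3)^2 scattered over [0, 1].
rng = np.random.default_rng(1)
xs = rng.uniform(0, 1, 40)
ys = (xs - 0.3) ** 2
proposal = tpe_propose(xs, ys, rng=rng)
print(proposal)  # a point close to the minimum at 0.3
```

The proposal concentrates where past good configurations cluster, which is how resampling steers new search directions toward promising regions.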
Appendix C Tree Parzen Estimator
Theorem 1. In function 2, maximizing the ratio $\ell(x)/g(x)$ is equivalent to optimizing the Expected Improvement (EI) in function 1.

Proof:

The Expected Improvement can also be written as:

(7) $EI_{y^*}(x) = \int_{-\infty}^{y^*} (y^* - y)\, p(y \mid x)\, dy = \int_{-\infty}^{y^*} (y^* - y)\, \frac{p(x \mid y)\, p(y)}{p(x)}\, dy$

Assume that $\gamma = p(y < y^*)$; then:

(8) $p(x) = \int p(x \mid y)\, p(y)\, dy = \gamma\, \ell(x) + (1 - \gamma)\, g(x)$

Therefore,

(9) $\int_{-\infty}^{y^*} (y^* - y)\, p(x \mid y)\, p(y)\, dy = \ell(x) \int_{-\infty}^{y^*} (y^* - y)\, p(y)\, dy = \gamma\, y^*\, \ell(x) - \ell(x) \int_{-\infty}^{y^*} y\, p(y)\, dy$

So finally,

(10) $EI_{y^*}(x) = \frac{\gamma\, y^*\, \ell(x) - \ell(x) \int_{-\infty}^{y^*} y\, p(y)\, dy}{\gamma\, \ell(x) + (1 - \gamma)\, g(x)} \;\propto\; \left( \gamma + \frac{g(x)}{\ell(x)}\,(1 - \gamma) \right)^{-1}$

which means maximizing $\ell(x)/g(x)$ is equivalent to maximizing the EI function.
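The equivalence in Theorem 1 can also be checked numerically: below we integrate the EI definition for a toy prior $p(y)$ and confirm that EI increases monotonically with the ratio $\ell(x)/g(x)$. The Gaussian prior and the specific ratio values are arbitrary choices for the check:

```python
import numpy as np

# Numeric sanity check of Theorem 1: with p(x|y) = l(x) for y < y* and
# g(x) otherwise, the Expected Improvement EI(x) grows with the ratio l/g.
ys = np.linspace(-5, 5, 200001)
dy = ys[1] - ys[0]
p_y = np.exp(-0.5 * ys**2) / np.sqrt(2 * np.pi)   # toy prior p(y): N(0, 1)
y_star = -0.8416                                   # ~20th percentile of N(0, 1)
mask = ys < y_star
gamma = (p_y[mask] * dy).sum()                     # gamma = p(y < y*) ~ 0.2

def expected_improvement(l_x, g_x):
    # EI(x) = integral over y < y* of (y* - y) * p(x|y) * p(y) / p(x) dy,
    # with p(x) = gamma * l(x) + (1 - gamma) * g(x).
    p_x = gamma * l_x + (1 - gamma) * g_x
    return ((y_star - ys[mask]) * l_x * p_y[mask] / p_x * dy).sum()

ratios = [0.5, 1.0, 2.0, 4.0]                      # candidate values of l(x)/g(x)
eis = [expected_improvement(r, 1.0) for r in ratios]
print(all(a < b for a, b in zip(eis, eis[1:])))    # -> True: EI grows with l/g
```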
Appendix D Experiments on CIFAR-10 and ImageNet
In both the MNIST and CIFAR-10 datasets, we keep the dimension unchanged during the attack. On the ImageNet-1000 dataset, the dimension is so high (224*224) that the probability density function of the KDE is close to zero at every point, which makes the predicted probability underflow and causes numerical errors. To tackle this problem, we follow Ilyas et al. (2018b), who observe that images tend to have spatially local similarity: pixels in a local region tend to resemble each other. This phenomenon also applies to the gradients. More specifically, if two points are close, then the gradients at these two points will be relatively similar; that paper calls this "data-dependent priors". This gives us the opportunity to reduce the dimension of the ImageNet-1000 dataset and avoid the numerical problem in the KDE. In the experiment, we reduce the dimension by applying a mean pooling operation with kernel size $k \times k$, where $k$ is set to 5 to balance precision and query efficiency. Table 1 shows the results of applying successive halving and TPE resampling (starting from 50 random directions) to the Sign-OPT method; the performance gain is significant, and the attack success rate (ASR) reaches a new state of the art in the hard-label black-box attack setting.
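A minimal sketch of the mean-pooling reduction follows. The handling of sizes not divisible by $k$ is our own assumption (224 is not divisible by $k = 5$, so this sketch crops to 220 first; the paper may pad or stride differently):

```python
import numpy as np

def mean_pool(img, k):
    """Downsample a 2-D image by averaging non-overlapping k x k blocks."""
    h, w = img.shape
    h, w = h - h % k, w - w % k              # crop so k divides the size
    return img[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

img = np.arange(16.0).reshape(4, 4)
print(mean_pool(img, 2))
# [[ 2.5  4.5]
#  [10.5 12.5]]

# A 224x224 input with k = 5 is cropped to 220x220, giving a 44x44 output.
print(mean_pool(np.ones((224, 224)), 5).shape)  # -> (44, 44)
```

The KDE then operates in this reduced space, where densities no longer underflow.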
Appendix E Adversarial examples
[Figure: CIFAR-10 curves]
[Figure: ImageNet curves]
Appendix F Ablation Study
We study the influence of introducing cutting and resampling into our method. We conduct a hard-label black-box attack on MNIST using the Sign-OPT attack as the base algorithm. For comparison, we use the Multi-Directional attack as the baseline: it samples 30 starting directions and performs the attack on all of them, without cutting or resampling in the intermediate steps. We can see from Table 4 that introducing cutting in the intermediate steps reduces the computation cost without harming the overall performance, and that introducing resampling finds better adversarial examples without increasing the computation too much.
Method | Avg distortion | Relative Gain | ASR (%) | Queries | Relative Cost
Single-Directional attack | 1.05 | 0% | 51% | 40456 | 0
Multi-Directional attack | 0.98 | 6.6% | 57% | 1216428 | 30x
Successive Halving attack | 0.99 | 5.7% | 55% | 257893 | 6.4x
TPE+SH attack | 0.95 | 9.5% | 62% | 402245 | 9.9x
Appendix G Parameters for different datasets
G.1 Black-Box Attack
Dataset |  |  |  | Resample Times
MNIST | 40000 | 3500 | 1.4 | 3
CIFAR-10 | 20000 | 2000 | 1.3 | 3
ImageNet | 200000 | 6000 | 1.6 | 4
G.2 White-Box Attack
Dataset |  |  |  | Resample Times
MNIST | 200 |
CIFAR-10 |