Toward Finding the Global Optimum of Adversarial Examples

Zhenxin Xiao
Department of Computer Science
Zhejiang University, Hangzhou, China
alanshawzju@gmail.com
Kai-Wei Chang & Cho-Jui Hsieh
Department of Computer Science
University of California, Los Angeles
{kwchang,chohsieh}@cs.ucla.edu
Abstract

Current machine learning models are vulnerable to adversarial examples (Goodfellow et al., 2014). We observe that current state-of-the-art methods (Kurakin et al., 2016; Cheng et al., 2018) for attacking a well-trained model often get stuck at local optima. Through a series of experiments in both white-box and black-box settings, we find that with different initializations the attack algorithm converges to very different local optima, suggesting the importance of a careful and thorough search of the attack space. In this paper, we propose a general boosting algorithm that helps current attacks find adversarial examples closer to the global optimum. Specifically, we search for adversarial examples starting from different points/directions; at certain intervals we adopt successive halving (Jamieson & Talwalkar, 2016) to cut the search directions that are not promising, and use Bayesian Optimization (Pelikan et al., 1999; Bergstra et al., 2011) to resample from the search space based on the knowledge obtained from past searches. We demonstrate that applying our method to state-of-the-art attack algorithms in both black-box and white-box settings further reduces the distortion between the original image and the adversarial example by about 10%-20%. By adopting dynamic successive halving, we reduce the computational cost by 5-10 times without harming the final result. We conduct experiments on models trained on MNIST and ImageNet, as well as on decision tree models; these experiments suggest that our method is a general way to boost the performance of current adversarial attack methods.

1 Introduction

It has been widely shown that current machine learning models (including deep neural networks) are vulnerable to adversarial examples (Goodfellow et al., 2014; Szegedy et al., 2013; Chen et al., 2017a). Researchers have developed methods (Goodfellow et al., 2014; Moosavi-Dezfooli et al., 2016; Carlini & Wagner, 2017; Chen et al., 2017b) to generate such examples that mislead even state-of-the-art models. These methods mainly use the gradient of the model's loss function, back-propagating to the input to find the perturbation that changes the model's output most quickly. They are effective at generating adversarial examples with low perturbation of the original sample. However, a significant shortcoming of these gradient-based attacks is that they are very likely to converge to a local optimum. Recent work (Wang et al., 2018) shows that with different initializations, such as different starting points in the white-box setting or different directions in the black-box setting, the final converged values can be very different. However, these methods introduce a large amount of extra computation and their improvements are relatively minor.

In this paper, we argue that gradient-based methods severely lack variance and can benefit significantly from additional diversity. We mainly focus on boosting the performance of the Sign-OPT attack (Cheng et al., 2018) in the hard-label black-box setting; since its goal is to find the direction with minimum distortion, the starting direction is crucial. We later generalize our method to white-box attacks such as the C&W attack and boost their performance as well. Generally speaking, we initialize the attack with different configurations and apply the current best attack algorithms to them; to reduce computational cost and introduce guided variance, we continually cut the worse configurations and resample new ones using Bayesian Optimization. In the white-box attack, we randomly sample points within an $\epsilon$-ball and run PGD (Kurakin et al., 2016) from each of them; in the hard-label black-box attack (Brendel et al., 2017; Cheng et al., 2018), we first sample directions and apply the Sign-OPT attack to them. At certain intervals, we conduct successive halving (Jamieson & Talwalkar, 2016), which abandons the worst half of these configurations to reduce computational cost, and use a Bayesian Optimization method called the Tree Parzen Estimator (TPE) (Bergstra et al., 2011) to resample from the search space. This procedure encourages the algorithm to search more promising areas and introduces guided variance in the intermediate steps. By adopting these methods, we enhance the performance of current attacks without adding too much computation.

Our contributions are summarized below:

  1. We conduct thorough experiments showing that current gradient-based attack algorithms often converge to local minima or maxima and cannot find a good adversarial example, and thus require further improvement.

  2. We design a general algorithm to boost the performance of current attack algorithms and encourage them to find values closer to the global optimum, which yields better adversarial examples.

  3. We conduct comprehensive experiments on several datasets and attack algorithms. We show that our method helps current algorithms find better examples with 10%-20% lower distortion. With the cutting mechanism, we reduce the computational cost by 5x-10x compared to search without cutting.

2 Background and Related work

2.1 White-box attack

The first attack attempts were conducted in the white-box setting, which is relatively easy: the attacker has full access to the model's architecture and parameters. FGSM (Goodfellow et al., 2014) is one of the first algorithms in this setting; it performs a single back-propagation step to obtain the gradient and then uses it to generate an adversarial example. Madry et al. (2017) and Kurakin et al. (2016) further improve FGSM by turning it into an iterative method called Projected Gradient Descent (PGD): at each iteration the attack takes only a small step along the computed gradient, then re-computes the gradient and steps again. The original white-box attacks compute the gradient of the softmax output and try to maximize the error on the original label, while the C&W attack (Carlini & Wagner, 2017) minimizes the distance between the original image and the adversarial image, measured by an $\ell_p$ norm, and simultaneously maximizes the error of the model's prediction on the original label.

All of these white-box attack methods rely heavily on the gradient to find a direction that makes the model produce wrong predictions. However, Wang et al. (2018, 2019) show that when starting from different points around the original one, the attacks converge to very different local optima. The authors further use an interval attack to find a set of starting points that increases diversity and apply PGD from each of them. The results show that, compared to PGD started from different points, the interval attack finds local optima whose final distortions spread much more widely, and by increasing the number of starting points it can find a better optimum than PGD.

2.2 Black-box attack

In the black-box setting, the attacker has no direct access to the model's parameters or architecture; the model remains a black box to the attacker, who must rely only on the model's output to conduct the attack. Depending on how much information the model reveals, the problem splits into two variants: soft-label and hard-label. In the soft-label setting, given an input the model outputs the probability of each label, while in the hard-label setting the model only outputs the top-1 label with the greatest probability. For soft-label attacks, Chen et al. (2017b) were the first to use Zeroth Order Optimization (ZOO) to approximately estimate the gradient and perform a PGD-based adversarial attack with it. In Ilyas et al. (2018a), the authors use Natural Evolution Strategies (NES) to estimate the gradient. As for hard-label attacks, Brendel et al. (2017) first formulated the problem and used a boundary attack to find the adversarial example with minimum distortion. Later, Cheng et al. (2018) (OPT attack) recast the hard-label attack in a continuous form and used ZOO to search for the best direction. Cheng et al. (2018) (Sign-OPT) further reduce the number of queries by computing only the sign of the ZOO update. These methods still use the gradient as the signal to find adversarial examples with minimum distortion, and therefore suffer from local optima. In our experiments, we find that starting from different directions, the final converged results can be very different.

2.3 Bayesian Optimization and Successive Halving

Bayesian Optimization (BO) has been successfully applied to optimize functions that are non-differentiable or black-box, such as finding the hyper-parameters of neural networks in AutoML. Its main idea is to sample new points based on past knowledge. Basically, Bayesian Optimization finds the optimum of a given function $f$ in an iterative manner: at each iteration $i$, BO uses a probabilistic model to approximate the unknown function based on the data points observed in previous iterations. Specifically, it samples the new data point as $x_{i+1} = \arg\max_{x} a(x \mid \mathcal{D}_i)$, where $a$ is the acquisition function and $\mathcal{D}_i = \{(x_1, y_1), \dots, (x_i, y_i)\}$ are the samples queried from $f$ so far. The most widely used acquisition function is the expected improvement (EI):

$$\mathrm{EI}_{y^*}(x) = \mathbb{E}\big[\max(y^* - f(x),\ 0)\big] \qquad (1)$$

where $y^*$ is the value of the best sample generated so far and $x^*$ is the location of that sample, i.e. $y^* = f(x^*)$.

The Tree Parzen Estimator (TPE). TPE (Bergstra et al., 2011) is a Bayesian Optimization method proposed for hyper-parameter tuning that uses kernel density estimators (KDE) to approximate the distribution of the observed points instead of modeling the objective function directly. Specifically, it models $p(x \mid y)$ and $p(y)$ instead of $p(y \mid x)$, and defines $p(x \mid y)$ using two separate KDEs $l(x)$ and $g(x)$:

$$p(x \mid y) = \begin{cases} l(x) & \text{if } y < y^* \\ g(x) & \text{if } y \geq y^* \end{cases} \qquad (2)$$

where $y^*$ is a constant between the lowest and largest value of $y$ in the observed data $\mathcal{D}$. Bergstra et al. (2011) show that maximizing the ratio $l(x)/g(x)$ is equivalent to optimizing the EI function described in Equation 1 (see Appendix C for more detail). In this setting, the computational cost of proposing a new data point with the KDEs grows linearly with the number of data points already generated, while a traditional Gaussian Process (GP) requires cubic time.
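To make the mechanics concrete, the following is a minimal sketch of TPE-style resampling using scikit-learn's KernelDensity. The function name, the fixed bandwidth, and the gamma-quantile split are illustrative choices of ours, not the exact setup of the paper (which, for example, tunes the bandwidth by grid search).

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def tpe_propose(observed_x, observed_y, gamma=0.25, n_candidates=100, bandwidth=0.5):
    """Propose a new point by maximizing l(x)/g(x) over candidates drawn from l(x).

    observed_x: (n, d) array of evaluated configurations.
    observed_y: (n,)  array of their objective values (lower is better).
    """
    # Split observations into a "good" set and a "bad" set at the gamma-quantile of y.
    y_star = np.quantile(observed_y, gamma)
    good = observed_x[observed_y < y_star]
    bad = observed_x[observed_y >= y_star]

    # Fit one KDE on each subset: l(x) models good points, g(x) models bad points.
    l_kde = KernelDensity(bandwidth=bandwidth).fit(good)
    g_kde = KernelDensity(bandwidth=bandwidth).fit(bad)

    # Sample candidates from l(x) and keep the one with the largest log l(x) - log g(x),
    # which is equivalent to maximizing the ratio l(x)/g(x).
    candidates = l_kde.sample(n_candidates)
    scores = l_kde.score_samples(candidates) - g_kde.score_samples(candidates)
    return candidates[np.argmax(scores)]
```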

Successive Halving. The idea behind Successive Halving (Jamieson & Talwalkar, 2016) is well illustrated by its name: first initialize a set of configurations and spend some computation on each of them, then evaluate all configurations and discard the worst half; this process continues until only one configuration is left. BOHB (Falkner et al., 2018) combines HyperBand (derived from Successive Halving) (Li et al., 2016) with TPE to solve AutoML problems and achieves great success.
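The loop itself is simple; below is a minimal sketch under the assumption of an `advance` callable (ours, not from any library) that runs one configuration for a given budget and reports a score, lower being better.

```python
def successive_halving(configs, advance, budget_per_round, cut_rate=0.5):
    """Repeatedly advance all surviving configurations, then drop the worst fraction.

    configs: list of configuration states.
    advance: callable(config, budget) -> (new_config, score); lower score is better.
    """
    configs = list(configs)
    while len(configs) > 1:
        results = [advance(c, budget_per_round) for c in configs]
        results.sort(key=lambda pair: pair[1])          # sort by score, best first
        keep = max(1, int(len(results) * (1 - cut_rate)))
        configs = [c for c, _ in results[:keep]]        # discard the worst cut_rate fraction
    return configs[0]
```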

However, these methods were originally applied to hyper-parameter tuning problems where the number of parameters to search is small (approximately 10-20); they suffer from the "curse of dimensionality" when the number of parameters grows larger, and the required computation becomes unacceptable. Some work (Moriconi et al., 2019; Wang et al., 2013) has tried to use BO in high dimensions, but we still find in our experiments that BO alone cannot converge as well as gradient-based methods.

3 Methodology

3.1 Basic Intuition

Due to the high dimensionality of image classification problems, current adversarial attack methods primarily adopt gradient-based algorithms to find adversarial examples. These methods are efficient and almost always find a successful attack much faster than probabilistic methods such as Bayesian Optimization. However, a gradient-based algorithm can easily get stuck at a local optimum, and its final result highly depends on the starting direction: at each iteration it takes a step along the gradient, which only guarantees reaching a better point than before. Such methods lack variance, which makes it almost impossible to find the global optimum, especially when the search space is high-dimensional, non-convex, and contains many local optima.

(a) Histogram of the distribution of final local-optimum distortions.
(b) Distortion vs. queries curves.
Figure 1: Attack on an image drawn from the MNIST test set. The figure shows that the final distortion can vary widely with the starting direction.

To better illustrate this phenomenon, Figure 1 shows the results of attacking an image from the MNIST dataset from 784 starting directions (equal to the dimensionality of MNIST); the attack is performed with Sign-OPT, the current state-of-the-art algorithm for hard-label black-box attacks. To maximize the difference between starting directions, we use the Gram-Schmidt process to force the directions to be orthogonal to each other; since MNIST has 784 dimensions (28*28), this provides sufficient diversity. The best and worst curves converge to quite different values, suggesting that tuning the starting direction and introducing additional variance is necessary for the Sign-OPT method.
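As a concrete illustration, a set of mutually orthogonal starting directions could be generated as sketched below; taking the QR decomposition of a random Gaussian matrix is numerically equivalent to Gram-Schmidt orthogonalization (the helper name is ours).

```python
import numpy as np

def orthogonal_directions(n_directions, dim, seed=0):
    """Draw Gaussian directions and orthonormalize them (QR is equivalent to Gram-Schmidt)."""
    rng = np.random.default_rng(seed)
    random_matrix = rng.standard_normal((dim, n_directions))   # requires n_directions <= dim
    q, _ = np.linalg.qr(random_matrix)                          # columns of q are orthonormal
    return q.T                                                  # one unit-norm direction per row

# e.g. 784 mutually orthogonal directions for 28x28 MNIST images
directions = orthogonal_directions(784, 28 * 28)
```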

3.2 A General Boosting Mechanism

The general goal is to efficiently find a better adversarial example based on current attack algorithms without introducing too much computational cost. There are two commonly used families of methods for black-box optimization: gradient-based and probabilistic algorithms. Gradient-based methods, which are commonly used in black-box attacks, estimate the gradient of the objective function and follow it iteratively until convergence. Probabilistic algorithms such as Bayesian Optimization (BO) (Pelikan et al., 1999; Snoek et al., 2012) approximate the objective function with a probabilistic model: they first sample uniformly across the search space and incorporate a prior belief about the objective, then query new points that are most likely to improve based on past observations. Generally speaking, gradient-based methods converge fast but may get stuck in locally optimal directions. Probabilistic algorithms have a better chance of finding values closer to the global optimum, but their computational cost grows exponentially with the dimensionality and quickly becomes unacceptable.

Figure 2: Illustration of gradient-based attacks started from different points converging to different local minima.

We argue that the performance of gradient-based methods naturally depends on the starting directions. As shown in Figure 2, starting from different points, gradient-based methods end up in different local minima. The natural way to tackle this problem is to try more starting directions, but that adds a large computational cost and is inefficient. Our goal is therefore to efficiently find better optima, for example by dropping unpromising configurations and adopting guided search techniques.

Our algorithm for boosting the performance of current attack methods is presented in Algorithm 1. In the search phase, we maintain two sets: the step pool records all iterations performed, including those of cut configurations, and the available search pool stores all configurations that are still active and need further search. We record all iterations because fitting a high-dimensional KDE requires many data points to cover the search space; in addition, the recorded changes between iterations help the model better understand the search space. We sample starting directions from a Gaussian or uniform distribution as initial configurations; then, in each interval, we first perform a number of iterations on each configuration and cut the worst fraction of search directions, meanwhile using TPE resampling to add new promising directions to the search.

1: Input: hard-label model $F$, original example $x_0$, maximum iterations $T$, search pool size $k$, cutting interval $I$, cutting rate $r$, search interval increase rate $\alpha$, critic metric $C$;
2: Initialize the step pool $P_{step}$ and the available search pool $P_{search}$ as empty lists;
3: for each $i = 1, \dots, k$ do
4:     Randomly sample a direction $\theta_i$ from a Gaussian or uniform distribution;
5:     $P_{search} \leftarrow P_{search} \cup \{\theta_i\}$.
6: end for
7: for each round, until the total budget $T$ is exhausted, do
8:     for each $\theta \in P_{search}$ do                          perform attack on all configurations
9:         for each of the $I$ iterations before cutting do
10:              Estimate the gradient of $g(\theta)$ by the Sign-OPT attack.
11:              Update $\theta$ with the estimated gradient.
12:              $P_{step} \leftarrow P_{step} \cup \{(\theta, C(\theta))\}$.                Record all intermediate steps
13:         Add the final step of $\theta$ to $P_{step}$.                     Record final step
14:     Delete the worst $r$ fraction of search directions from $P_{search}$ (according to $C$);
15:     $P_{search} \leftarrow P_{search} \cup$ TPE-resampling($P_{step}$, resample rate)
16:     $I \leftarrow \alpha \cdot I$. Increase the search interval before the next cutting and resampling
Algorithm 1: Framework of the attack boosting algorithm for the Sign-OPT attack.
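For readers who prefer code, the following Python sketch mirrors Algorithm 1 at a high level. All helper callables are assumptions standing in for the Sign-OPT inner loop, the distortion oracle $g(\theta)$, and the TPE resampler, and the default hyper-parameters are placeholders rather than the tuned values reported in Appendix G.

```python
def boosted_sign_opt(x0, model, init_dirs, sign_opt_run, distortion, tpe_resample,
                     max_rounds=10, interval=2000, cut_rate=0.5, interval_growth=1.4):
    """Sketch of Algorithm 1: multi-direction Sign-OPT with cutting and TPE resampling.

    Assumed helpers (not part of any library):
      sign_opt_run(x0, model, theta, n_queries) -> refined direction theta
      distortion(x0, model, theta)              -> g(theta), lower is better
      tpe_resample(history, k)                  -> k new directions (e.g. the KDE sketch above)
    """
    history = []                       # step pool: every direction we have evaluated
    pool = list(init_dirs)             # available search pool
    for _ in range(max_rounds):
        # Spend this round's budget on every surviving direction.
        pool = [sign_opt_run(x0, model, theta, interval) for theta in pool]
        scored = sorted(pool, key=lambda t: distortion(x0, model, t))
        history.extend((t, distortion(x0, model, t)) for t in scored)
        # Cut the worst fraction, then refill with TPE-guided directions.
        keep = max(1, int(len(scored) * (1 - cut_rate)))
        pool = scored[:keep] + list(tpe_resample(history, len(scored) - keep))
        interval = int(interval * interval_growth)   # search longer before the next cut
    return min(pool, key=lambda t: distortion(x0, model, t))
```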

The purpose of Tree Parzen Estimator resampling is to encourage the algorithm to find new directions with lower distortion. The reasons for and benefits of resampling, in addition to cutting, are:

  1. Less dependence on the starting directions. This procedure adds variance in the middle of the search, instead of depending solely on the starting directions. In experiments, we find that when started from the same direction, Sign-OPT often reaches a similar final distortion, even though the gradient estimation procedure itself introduces some randomness.

  2. Guided randomness. The starting directions are sampled randomly with no information about the objective function. After gaining some awareness of the decision boundary from past searches, we can resample new directions guided by them, which makes resampling more efficient and more likely to find better directions.

  3. Parameter sharing. Past Sign-OPT iterations have already led to relatively good local areas with small distortion. By sampling around these areas, the new directions share the progress of past searches instead of starting from scratch, which further reduces the computational cost.

As shown in Algorithm 2, we first divide the observed data into two parts by their distortion (lower is better) and fit two separate KDEs on these two subsets. We then try to sample new data points that minimize $g(x)/l(x)$, which can be shown to maximize the relative gain in distortion (see Appendix C for more detail). Since we cannot directly find such points, we sample a few candidates (100 in our experiments) from $l(x)$ and keep the one with the minimum $g(x)/l(x)$.

1: Input: observed data $P_{step}$, resample rate $m$;
2: Initialize the result set $S$ as an empty list;
3: Divide $P_{step}$ into two subsets $D_l$ (better) and $D_g$ (worse) based on the critic metric $C$;
4: Build two separate KDEs on $D_l$ and $D_g$, denoted $l(x)$ and $g(x)$ respectively;
5: Use grid search to find the best KDE bandwidths for $l(x)$ and $g(x)$;
6: for each $j = 1, \dots, m$ do
7:     initialization: $x_{best} \leftarrow$ None, $s_{best} \leftarrow \infty$;
8:     while fewer than 100 candidates have been sampled do
9:         Sample $x$ from $l(x)$;
10:        if $g(x)/l(x) < s_{best}$ then
11:            $s_{best} \leftarrow g(x)/l(x)$;
12:            $x_{best} \leftarrow x$
13:     $S \leftarrow S \cup \{x_{best}\}$;
14: return $S$;
Algorithm 2: Tree Parzen Estimator resampling.

3.3 Boosting Hard-Label Black-Box Attack

In the black-box attack, we mainly study how to boost performance in the hard-label setting. We choose the Sign-OPT attack (Cheng et al., 2018), the state-of-the-art algorithm in this area, as our base algorithm and enhance it with successive halving and TPE resampling. To handle the fact that a hard-label attack only receives binary information describing each label as true or false, the OPT attack transforms the hard-label attack into a continuous form so that gradient-based algorithms can be applied. Specifically, it defines $g(\theta)$ as the minimum distortion along a direction $\theta$ and tries to find the $\theta^*$ with the smallest $g(\theta)$:

$$g(\theta) = \min_{\lambda > 0} \lambda \ \ \text{s.t.}\ \ F\Big(x_0 + \lambda \frac{\theta}{\|\theta\|}\Big) \neq y_0, \qquad \theta^* = \arg\min_{\theta} g(\theta) \qquad (3)$$

By computing $g(\theta)$ with a binary search, the authors transform the non-differentiable hard-label attack problem into a continuous one, and use Zeroth Order Optimization (Chen et al., 2017b) to estimate the directional derivative of $g(\theta)$:

$$\hat{\nabla} g(\theta) \approx \frac{g(\theta + \epsilon u) - g(\theta)}{\epsilon} \cdot u \qquad (4)$$

where $u$ is a random Gaussian vector and $\epsilon$ is a small smoothing parameter.

Sign-OPT further improves this method by using a single query to compute only the sign of Equation 4. We argue that these methods still depend heavily on the estimated gradient, can end up in sub-optimal local minima, and thus can greatly benefit from adding more variance.
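For concreteness, the sketch below shows how $g(\theta)$ could be evaluated with a binary search and how its gradient sign could be estimated in the Sign-OPT style. The `model` callable is assumed to return only the top-1 label, and the search range, tolerance, and sample counts are illustrative.

```python
import numpy as np

def g_theta(model, x0, y0, theta, lo=0.0, hi=10.0, tol=1e-3):
    """Minimum distortion lambda along direction theta such that the top-1 label changes.

    Assumes hi is already large enough that x0 + hi * theta is misclassified.
    """
    theta = theta / np.linalg.norm(theta)
    while hi - lo > tol:                        # binary search on the decision boundary
        mid = (lo + hi) / 2
        if model(x0 + mid * theta) != y0:       # the model only returns the hard label
            hi = mid
        else:
            lo = mid
    return hi

def sign_grad_estimate(model, x0, y0, theta, g_value, n_samples=100, eps=1e-3):
    """Sign-OPT style estimate of grad g(theta): average sign over random probes."""
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        u = np.random.randn(*theta.shape)
        # A single query decides the sign of the directional derivative along u:
        # if the perturbed direction still flips the label at distance g(theta),
        # then g decreased along u, so the sign is negative.
        probe = theta + eps * u
        label_changed = model(x0 + g_value * probe / np.linalg.norm(probe)) != y0
        grad += (-1.0 if label_changed else 1.0) * u
    return grad / n_samples
```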

We apply our boosting algorithm to the Sign-OPT attack to enhance its performance. When adapting the boosting algorithm to Sign-OPT, the critic metric $C$ is the distortion, i.e. the distance between the original example and the adversarial example; a lower distance means a better and more promising configuration.

3.4 Boosting White-Box Attack

Besides the Sign-OPT attack, we demonstrate that our algorithm can also be applied to white-box attacks. Specifically, we boost the C&W attack (Carlini & Wagner, 2017) by adding more variance to it. We argue that C&W is also a gradient-based method whose performance is likewise restricted by local minima. We describe the exact setting in Appendix B.

4 Experiments

To evaluate the effectiveness and generality of our algorithm in helping current attack methods find better optima, we conduct experiments on both white-box and hard-label black-box attacks. We use several popular image classification datasets and test them in both settings. We also attack Gradient Boosting Decision Trees (GBDT); since a GBDT is inherently non-differentiable and current white-box attacks do not apply, we only conduct black-box attacks on GBDT.

4.1 Hard-Label Black-Box attack

4.1.1 Attack on images.

We conduct experiments on three standard datasets: MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky et al., 2010) and ImageNet-1000 (Deng et al., 2009) using the state-of-the-art attack algorithm Sign-OPT. The neural network architectures are the same as those reported in the Sign-OPT paper. In detail, MNIST and CIFAR-10 use the same network structure with four convolution layers, two max-pooling layers and two fully-connected layers. As reported in Carlini & Wagner (2017) and Cheng et al. (2018), we achieve an accuracy of 99.5% on MNIST and 82.5% on CIFAR-10. For ImageNet, we use the pretrained ResNet-50 (He et al., 2016) network provided by torchvision (Marcel & Rodriguez, 2010), which achieves a Top-1 accuracy of 76.15%. We randomly select 100 examples from the test set of each dataset for evaluation.
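For reference, a network with this layout might look like the minimal PyTorch sketch below. The filter widths (32/64) and the 200-unit hidden layer are our assumptions in the style of the C&W architecture, and the flattened feature size corresponds to 28x28 MNIST inputs; the exact widths of the original model may differ.

```python
import torch.nn as nn

class SmallCNN(nn.Module):
    """Four conv layers, two max-pools, two fully-connected layers (MNIST-sized inputs)."""
    def __init__(self, in_channels=1, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3), nn.ReLU(),
            nn.Conv2d(32, 32, 3), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 200), nn.ReLU(),   # 28x28 input -> 4x4 feature map
            nn.Linear(200, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```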

We adopt the Sign-OPT attack as the base algorithm to be boosted, and also include the following algorithms for comparison:

                     MNIST            CIFAR-10         ImageNet-1000
                     Avg     ASR      Avg     ASR      Avg     ASR
C&W (White-box)      0.96    60%      0.11    68%      1.53    49%
Boundary attack      1.27    21%      0.15    49%      2.02    19%
OPT-based attack     1.11    39%      0.14    52%      1.67    38%
Sign-OPT attack      1.05    51%      0.12    61%      1.43    59%
TPE-SH attack        0.95    62%      0.09    73%      1.32    81%
Table 1: Results of hard-label black-box attacks on MNIST, CIFAR-10 and ImageNet-1000. We compare the performance of several attack algorithms in the untargeted setting; Avg is the average distortion and ASR is the attack success rate.

We study the effect of successive halving and TPE resampling separately. As shown in Figure 3, with successive halving we keep cutting the worst fraction of search directions at a fixed interval until only one remains. When successive halving and TPE resampling are combined, we perform both cutting and resampling in the first several phases, and only cutting afterwards, until all search directions are cut or the search reaches a local optimum and stops moving. Figure 3(b) shows that TPE resampling indeed finds directions better than the original ones. We observe that the final best direction mostly comes from TPE resampling rather than from the original starting directions, which demonstrates the importance of resampling in the middle of the optimization. The quantitative influence of successive halving and TPE resampling is shown in Appendix F.

(a) Successive Halving.
(b) Successive Halving and TPE resampling.
Figure 3: Illustration of the effect of successive halving and TPE resampling. Successive halving reduces the query count by a significant amount without harming the final result, and TPE resampling helps find a lower final distortion. Note that Figure 3(b) only exhibits part of the curve to show the effect of TPE.

A natural question arises: how many starting points are enough? To find the best number of starting points, we attack an image with different numbers of starting directions; for each number, we run the attack several times and average the results to reduce variance. Figure 4 shows the attack on an MNIST image using the Sign-OPT method and the effect the number of starting directions has on the final converged distortion. Increasing the number of starting directions does not reduce the minimum distortion much beyond about 50 directions. We also find that the standard deviation is smaller and the final distortion lower when TPE resampling is introduced. This is probably because TPE resampling also injects variance in the intermediate steps, making the algorithm less dependent on the starting directions and increasing the probability of finding a better optimum.

(a) Successive Halving.
(b) Successive Halving and TPE resampling.
Figure 4: Number of starting directions vs. final distortion. Since we run each setup multiple times to reduce variance, the red line shows the average distortion and the blue area shows the standard deviation.

Another question is: what is the best cutting interval? The cutting interval decides how many iterations are performed before the next cutting and resampling, and it is an important factor to tune. If it is too small, configurations may be evaluated and possibly cut before they have reasonably converged, which makes the cutting unreasonable and inaccurate. If it is too large, unpromising configurations are run for more iterations, wasting computation. Li et al. (2016) develop an algorithm called HyperBand to search for the best interval and cutting rate. However, HyperBand targets hyper-parameter tuning in AutoML, which differs from our problem: in hyper-parameter tuning, a specific dataset has a single best setting, and different datasets can have very different best intervals and cutting rates, whereas in our experiments we attack thousands of images, and images in the same dataset share the same best interval and cutting rate. As a result, we manually find the best setting for each dataset during the experiments instead of searching for them as HyperBand does, which avoids a great deal of unnecessary computation. The parameters for the different datasets are shown in Appendix G.

4.1.2 Attack on Gradient Boosting Decision Tree (GBDT).

We conduct untargeted attacks on gradient boosting decision trees (GBDT). Since the Sign-OPT paper does not include experiments on GBDT, we use the OPT-based attack (Cheng et al., 2018) and apply our boosting algorithm to it. In this experiment, we also use the MNIST (LeCun et al., 1998) dataset for multi-class classification; we select the popular GBDT framework LightGBM and use the parameters from https://github.com/Koziev/MNIST_Boosting, which achieve 98.09% accuracy on MNIST. The attack settings are almost the same as described before. Specifically, we start from 30 directions and conduct successive halving every 2000 queries. The results of the GBDT attack on MNIST are shown in Table 3.
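A minimal LightGBM setup for multi-class MNIST is sketched below; the hyper-parameters here are illustrative defaults of ours and not the exact ones from the linked repository.

```python
import lightgbm as lgb
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Flattened 784-dimensional MNIST pixels as features, digit labels as the 10-class target.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, X_test, y_train, y_test = train_test_split(X / 255.0, y, test_size=10000)

# Illustrative multi-class GBDT; the repository's tuned parameters may differ.
model = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.1, num_leaves=127)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```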

                     MNIST            CIFAR-10
                     Avg     ASR      Avg     ASR
C&W attack           0.98    41%      0.12    47%
TPE-SH attack        0.92    66%      0.08    69%
Table 2: Comparison of results of the untargeted white-box attack before and after boosting.

                     HIGGS            MNIST
                     Avg     ASR      Avg     ASR
OPT-based attack     0.169   52%      0.952   61%
TPE-SH attack        0.103   81%      0.722   79%
Table 3: Comparison of results of the untargeted attack on gradient boosting decision trees.


4.2 White-Box attack

In the white-box attack, we adopt the C&W attack described in (Carlini & Wagner, 2017) and perform the attack on both the MNIST and CIFAR-10 datasets, selecting 100 examples from each dataset for evaluation. To introduce variance and encourage the search to find better optima, we randomly sample 50 points inside an $\epsilon$-ball around the original example as starting points and apply the C&W attack to each of them. For simplicity, we fix the trade-off constant in Equation 6 to 0.2; the method can easily be applied to other settings of this constant. Cutting and resampling are used during the search, and the detailed parameters can be found in Appendix G. The results before and after boosting are shown in Table 2.

5 Conclusion

In this paper, we propose a general boosting framework that can be applied to both white-box and black-box attacks to find adversarial examples closer to the global optimum without adding too much computational cost. Our method enjoys the variance provided by Bayesian Optimization and the efficiency of gradient-based optimization, and finds much better optima than previous work. We also show experimentally that different starting directions significantly affect the final attack distortion, and we study the number of directions that best balances efficiency and optimality.

Acknowledgments

References

Appendix A Introduction to adversarial attack

In a common attack setting, we try to find the weakness of a well-trained machine learning model by generating examples that are correctly classified by humans but not by the machine. Specifically, assume we have a well-trained multi-class classification model $F$ and an original example $x_0$; various adversarial attack algorithms try to find an example $x'$ such that:

$$F(x') \neq F(x_0) \quad \text{while} \quad \|x' - x_0\| \ \text{is small} \qquad (5)$$

There are several key features of a successful adversarial example that are worth mentioning:

  1. The new example $x'$ should be close to the original example $x_0$.

  2. The output of the machine learning model changes, i.e. $F(x') \neq F(x_0)$.

  3. Humans can still easily assign $x'$ to its correct label, since the change is minor.

A possible explanation for why a well-trained model exhibits this phenomenon is illustrated in Figure 5. Suppose we are attacking a classification model such as an image recognizer. The problem is that the model's predicted region for a class may not match the true region: the deeper-colored regions are the true areas of each class, and the two regions with labels a and b are actually far apart. However, the model's predicted regions for these two classes can differ from the true regions, and their boundaries can lie close to each other, which is what makes adversarial attacks possible.

Figure 5: Illustration of a possible region for two classes.

Appendix B Boosting White-Box Attack

Figure 6: Illustration of a possible boundary distribution and the attack steps on it. Starting from different directions, we conduct cutting and resampling during the intermediate steps. Directions that are not promising are cut to save computational cost, and directions that reach lower error values are expanded to encourage exploration. The figure also shows that the boundary can be very non-smooth and contain many locally optimal points on its surface.

Figure 6 shows a possible boundary distribution and the C&W (Carlini & Wagner, 2017) attack performed on it. The decision boundary of a neural network can be very non-smooth and contain many locally optimal points. Generally speaking, the cost of moving from the original point to the boundary (i.e. achieving a successful attack) highly depends on the direction. Traditional white-box attack algorithms such as FGSM (Goodfellow et al., 2014), PGD (Kurakin et al., 2016) and C&W (Carlini & Wagner, 2017) walk along the gradient to reach the boundary, implicitly assuming that the gradient leads to the optimum, which may not be the case. We try to improve the attack so that it finds a more globally optimal adversarial example by encouraging the search to not depend solely on the gradient. Assume the original sample is $(x_0, y_0)$ and $L$ is the loss function of the neural network; the C&W attack conducts an iterative search as follows:

$$x'_{t+1} = x'_t + \eta \,\nabla_{x'} \Big( L\big(F(x'_t),\, y_0\big) - c\, \|x'_t - x_0\|_2^2 \Big) \qquad (6)$$

which depends solely on the gradient. Instead of starting from a single direction computed from the gradient at the original point, we randomly sample points within an $\epsilon$-ball around $x_0$ (i.e. the distance between a generated point and the original one is less than $\epsilon$; the value may differ slightly across datasets) as a set of candidate configurations. We then assign a fixed budget, i.e. a certain number of gradient (PGD-style) iterations, to each configuration. After a configuration exhausts its budget, we evaluate it, cut the worst fraction of configurations, and assign them no further budget. Configurations are compared by their loss $L$ at the same iteration; those with higher loss are presumed better. This procedure continues until only one configuration is left.
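To make this variant concrete, the following is a hedged PyTorch sketch of the multi-start procedure under an assumed $\ell_\infty$ ball; the PGD-style update, step size, and budget values are illustrative stand-ins rather than the exact configuration used in the experiments, and `model`/`loss_fn` are assumed callables (e.g. a classifier and cross-entropy).

```python
import torch

def multi_start_attack(model, x0, y0, loss_fn, eps=0.2, n_starts=50,
                       step_size=0.01, budget_per_round=20, cut_rate=0.5):
    """Sketch: gradient attack from many random starts in the eps-ball, with successive halving.

    In this white-box setting a configuration with a HIGHER loss on the original label
    is considered better (it is closer to crossing the decision boundary).
    """
    # Random starting points inside the L-infinity eps-ball around x0.
    starts = [(x0 + eps * (2 * torch.rand_like(x0) - 1)).clamp(0, 1) for _ in range(n_starts)]
    while len(starts) > 1:
        scored = []
        for x in starts:
            x = x.clone().requires_grad_(True)
            for _ in range(budget_per_round):                  # spend this round's budget
                loss = loss_fn(model(x.unsqueeze(0)), y0.unsqueeze(0))
                grad, = torch.autograd.grad(loss, x)
                with torch.no_grad():
                    x = x + step_size * grad.sign()            # ascend the loss
                    x = (x0 + (x - x0).clamp(-eps, eps)).clamp(0, 1)  # project into the ball
                x.requires_grad_(True)
            with torch.no_grad():
                final_loss = loss_fn(model(x.unsqueeze(0)), y0.unsqueeze(0)).item()
            scored.append((final_loss, x.detach()))
        scored.sort(key=lambda pair: pair[0], reverse=True)    # higher loss first
        keep = max(1, int(len(scored) * (1 - cut_rate)))
        starts = [x for _, x in scored[:keep]]                 # cut the worst configurations
    return starts[0]
```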

Besides successive halving, we also resample from the search space guided by past searches, using the TPE resampling algorithm described in Algorithm 2 to add new candidate configurations and continue searching from them. Note that in the white-box setting, good configurations are those with a larger loss $L$, whereas in the black-box setting, configurations with a lower $g(\theta)$ are the good ones: in the black-box attack we search along the decision boundary, where a successful attack is already guaranteed, so the only goal is to find a lower distortion.

Appendix C Tree Parzen Estimator

Theorem 1. In Equation 2, maximizing the ratio $l(x)/g(x)$ is equivalent to optimizing the Expected Improvement (EI) in Equation 1.

Proof:

The Expected Improvement can also be written as:

$$\mathrm{EI}_{y^*}(x) = \int_{-\infty}^{y^*} (y^* - y)\, p(y \mid x)\, dy = \int_{-\infty}^{y^*} (y^* - y)\, \frac{p(x \mid y)\, p(y)}{p(x)}\, dy \qquad (7)$$

Assume that $\gamma = p(y < y^*)$, then:

$$p(x) = \int p(x \mid y)\, p(y)\, dy = \gamma\, l(x) + (1 - \gamma)\, g(x) \qquad (8)$$

Therefore,

$$\int_{-\infty}^{y^*} (y^* - y)\, p(x \mid y)\, p(y)\, dy = l(x) \int_{-\infty}^{y^*} (y^* - y)\, p(y)\, dy = \gamma\, y^*\, l(x) - l(x) \int_{-\infty}^{y^*} y\, p(y)\, dy \qquad (9)$$

So finally,

$$\mathrm{EI}_{y^*}(x) = \frac{\gamma\, y^*\, l(x) - l(x) \int_{-\infty}^{y^*} y\, p(y)\, dy}{\gamma\, l(x) + (1 - \gamma)\, g(x)} \;\propto\; \Big(\gamma + \frac{g(x)}{l(x)}\, (1 - \gamma)\Big)^{-1} \qquad (10)$$

which means maximizing $l(x)/g(x)$ is equivalent to maximizing the EI function.

Appendix D Experiments on CIFAR-10 and ImageNet

On both the MNIST and CIFAR-10 datasets, we keep the dimensionality unchanged during the attack. On ImageNet-1000, the dimensionality is too high (224*224), and the probability density of the KDE is so close to zero at every point that the predicted probabilities underflow and cause numerical errors. To tackle this problem, we follow Ilyas et al. (2018b), which observes that images tend to have spatially local similarity, i.e. pixels in a local region tend to be similar to each other. This also applies to gradients: if two points are close, their gradients are relatively similar; that paper calls this "data-dependent priors". This gives us the opportunity to reduce the dimensionality of ImageNet-1000 inputs and avoid the numerical problem in the KDE. In the experiment, we reduce the dimensionality by applying a mean-pooling operation with kernel size $k \times k$, where $k$ is set to 5 to balance precision and query efficiency. Table 1 shows the results of applying successive halving and TPE resampling (starting from 50 random directions) to the Sign-OPT method; the performance gain is significant, and the attack success rate (ASR) reaches a new state of the art for hard-label black-box attacks.
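The dimensionality reduction can be sketched as below with a mean-pooling operator; how the reduced directions are mapped back before querying the model is our assumption (nearest-neighbor upsampling here), not necessarily the exact choice made in the experiments.

```python
import torch
import torch.nn.functional as F

def reduce_direction(direction, k=5):
    """Mean-pool a (C, H, W) search direction with a k x k kernel to shrink the KDE dimension."""
    return F.avg_pool2d(direction.unsqueeze(0), kernel_size=k).squeeze(0)

def expand_direction(reduced, size):
    """Upsample a reduced direction back to the original (H, W) before querying the model."""
    return F.interpolate(reduced.unsqueeze(0), size=size, mode="nearest").squeeze(0)

# e.g. a 3 x 224 x 224 direction shrinks to 3 x 44 x 44 before fitting the KDE
theta = torch.randn(3, 224, 224)
theta_small = reduce_direction(theta, k=5)
theta_back = expand_direction(theta_small, size=(224, 224))
```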

Appendix E Adversarial examples

cifar-10 curves

ImageNet curves

Appendix F Ablation Study

We study the influence of introducing cutting and resampling into our method. We conduct a hard-label black-box attack on MNIST using the Sign-OPT attack as the base algorithm. For comparison, we use a Multi-Directional attack as the baseline: it samples 30 starting directions and attacks along each of them without cutting or resampling in the intermediate steps. Table 4 shows that introducing cutting reduces the computational cost without harming overall performance, and that introducing resampling finds better adversarial examples without adding too much computation.

                            Avg     Relative Gain   ASR     Queries    Relative Query Cost
Single-Directional attack   1.05    0%              51%     40456      0
Multi-Directional attack    0.98    6.6%            57%     1216428    30x
Successive Halving attack   0.99    5.7%            55%     257893     6.4x
TPE-SH attack               0.95    9.5%            62%     402245     9.9x
Table 4: Comparison of the effectiveness of successive halving and TPE resampling. Relative Gain and Relative Query Cost are computed with respect to the Single-Directional attack, and Queries is the total number of queries over all directions.

Appendix G Parameters for different datasets

G.1 Black-Box Attack

Dataset      Maximum Queries      Cutting and             Interval          Resample Times
             per Direction        Resampling Interval     Increase Ratio
MNIST        40000                3500                    1.4               3
CIFAR-10     20000                2000                    1.3               3
ImageNet     200000               6000                    1.6               4
Table 5: Hyper-parameters of the boosted hard-label black-box attack for each dataset.

G.2 White-Box Attack

Dataset      Maximum Iterations   Cutting and             Interval          Resample Times
             per Point            Resampling Interval     Increase Ratio
MNIST        200
CIFAR-10
Table 6: Hyper-parameters of the boosted white-box attack for each dataset.