Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural Gradient Descent

Abstract

Despite the great achievements of modern deep neural networks (DNNs), the vulnerability/robustness of state-of-the-art DNNs raises security concerns in many application domains requiring high reliability. Various adversarial attacks have been proposed to sabotage the learning performance of DNN models. Among them, black-box adversarial attack methods have received special attention owing to their practicality and simplicity. Black-box attacks usually prefer fewer queries in order to remain stealthy and keep costs low. However, most current black-box attack methods adopt first-order gradient descent, which may come with certain deficiencies such as relatively slow convergence and high sensitivity to hyper-parameter settings. In this paper, we propose a zeroth-order natural gradient descent (ZO-NGD) method to design adversarial attacks, which incorporates the zeroth-order gradient estimation technique catering to the black-box attack scenario and the second-order natural gradient descent to achieve higher query efficiency. The empirical evaluations on image classification datasets demonstrate that ZO-NGD can obtain significantly lower model query complexities compared with state-of-the-art attack methods.


Introduction

Modern technologies based on machine learning (ML), and specifically deep learning (DL), have achieved significant breakthroughs [23] in various applications. The deep neural network (DNN) serves as a fundamental component in artificial intelligence. However, despite their outstanding performance, many recent studies demonstrate that state-of-the-art DNNs in computer vision [43, 48], speech recognition [1, 24] and deep reinforcement learning [25] are vulnerable to adversarial examples [14], which add carefully designed imperceptible distortions to legitimate inputs aiming to mislead the DNNs at test time. This raises concerns about DNN robustness in many applications with high reliability and dependability requirements.

With the recent exploration of adversarial attacks in image classification and object detection, the vulnerability/robustness of DNNs has attracted ever-increasing attention and effort in the research field known as adversarial machine learning. A large amount of effort has been devoted to: 1) designing adversarial perturbations in various ML applications [14, 7, 8, 52, 44]; 2) security evaluation methodologies to systematically estimate the DNN robustness [4, 47]; and 3) defense mechanisms against adversarial attacks [6, 11, 27, 41, 40, 45, 39]. This work mainly investigates the first category to build the groundwork towards developing potential defensive measures in reliable ML.

However, most preliminary studies on this topic focus on the white-box setting where the target DNN model is completely available to the attacker [14, 7, 50]. More specifically, the adversary can compute the gradients of the output with respect to the input to identify the effect of perturbing certain input pixels, with complete knowledge of the DNN model's internal structure, parameters and configurations. Despite the theoretical interest, it is unrealistic to adopt white-box adversarial methods to attack practical black-box threat models [51], where the internal model states/configurations are not revealed to the attacker (e.g., Google Cloud Vision API). Instead, the adversary can only query the model by submitting inputs and obtain the corresponding model outputs of prediction probabilities when generating adversarial examples.

In the black-box adversarial setting, the fewer queries an attack requires, the more efficient it is. A large number of queries risks exposing the adversary or incurring a high financial cost when the target service charges per query. Notably, to date most white-box [14, 7] and black-box attacks [9, 17, 49] are based on first-order gradient descent methods. Different from the widely utilized first-order optimization, the application of second-order optimization [30] is less explored due to its large computation overhead, although it may achieve a faster convergence rate. The work [33] adopts natural gradient descent (NGD) to train ResNet-50 on ImageNet in 35 epochs, demonstrating its great potential.

In this work, inspired by the superb convergence performance of NGD, we propose zeroth-order natural gradient descent (ZO-NGD), which incorporates the zeroth-order (ZO) method and the second-order NGD, to generate black-box adversarial examples in a query-efficient manner. The contributions of this work are summarized as follows:

+ Design of adversarial attacks with NGD: To the best of our knowledge, we are the first to derive the Fisher information matrix (FIM) and adopt the second-order NGD method for adversarial attacks, which is different from other first-order-based white-box and black-box attack methods.

+ Co-optimization of zeroth-order and second-order methods: In the black-box setting, we incorporate zeroth-order random gradient estimation to estimate the gradients, which are not directly available, and leverage the second-order NGD to achieve high query efficiency.

+ No additional queries to obtain the FIM: During the queries used to estimate the gradients of the loss, with our design the Fisher information is a byproduct that is extracted and evaluated without requiring additional queries.

+ Scalability to high dimensional datasets: In NGD, it is computationally infeasible to compute and invert a FIM with billions of elements on large-scale datasets like ImageNet. To address this problem, we propose a method that avoids explicitly computing and inverting the FIM, so the computational complexity is at most on the order of the input dimension, rather than its square (the size of the FIM).

Related Work

In adversarial ML, the black-box setting is more practical: the attacker can only query the target model by providing input images and receives the probability output for each input.

Black-box Attack

Attack with gradient estimation

In the black-box setting, as the gradients are not directly available, gradient estimation methods via zeroth-order optimization [42, 38, 13] are proposed to estimate the gradients. The ZOO method [9] performs pixel-level gradient estimation first and then performs the white-box C&W attack [7] with the estimated gradients. Despite its high success rate, it suffers from intensive computation and a huge number of queries due to element-wise gradient estimation.

The more practical threat models are investigated in [17]. New attack methods based on Natural Evolutionary Strategies (NES) and Monte Carlo approximation to estimate the gradients are developed to mislead ImageNet classifiers under more restrictive threat models. The work [18] further proposes to use the prior information including the time-dependent priors and data-dependent priors to enhance the query efficiency.

Different from the previous first-order-based methods, the work [46] exploits second-order optimization to improve query efficiency. In general, that work explores Hessian information in the parameter space, while our work explores the Hessian information in the distribution space (i.e., the information matrix). In particular, our method obtains the Fisher information for free during the estimation of the first-order information (gradients), while the mentioned paper needs additional queries for its Hessian-based second-order optimization.

Heuristic black-box attacks

In the transfer attack [34], the attacker first trains a surrogate model with data labeled by the target model. White-box attacks are applied to the surrogate model and the generated examples are transferred to attack the target model. However, it may suffer from a low attack success rate due to the low similarity between the surrogate model and the target model.

The boundary method [5] utilizes a conceptually simple idea to decrease the distortion through random walks and find successful adversarial perturbations while staying on the misclassification boundary. However, it suffers from high computational complexity and lacks algorithmic convergence guarantees.

Second-order optimization

First-order gradient descent methods have been extensively used in various ML tasks. They are easy to implement and suitable for large-scale DL, but they come with well-known deficiencies such as relatively slow convergence and sensitivity to hyper-parameter settings. On the other hand, second-order optimization methods provide an elegant solution by selectively re-scaling the gradient with curvature information [30]. As a second-order method, NGD is provably Fisher efficient and uses the FIM instead of the Hessian matrix [3, 29], but the large overhead to compute, store and invert the FIM may limit its application. To address this, Kronecker Factored Approximate Curvature (K-FAC) was proposed to train DNNs [15].

Problem Formulation

Threat Model: In this paper, we mainly investigate black-box adversarial attacks for image classification with DNNs. Different from the white-box setting, which assumes full access to the DNN model and its internal structures/parameters, the black-box setting constrains the information available to the adversary. The attacker can only query the model by providing an input image and obtain the DNN output scores/probabilities for that input. The black-box setting is more consistent with the scenario of "machine-learning-deployed-as-a-service", like the Google Cloud Vision API.

In the following, we first provide a general problem formulation for adversarial attacks which applies to either the white-box or the black-box setting. Then, an efficient solution is proposed for the black-box setting. We highlight that this method can be easily adapted to the white-box setting by using the exact gradients to achieve even higher query efficiency.

Attack Model: Given a legitimate image $x$ with its correct class label $t_0$, the objective is to design an optimal adversarial perturbation $\delta$ so that the perturbed example $x+\delta$ leads to a misclassification by the DNN model trained on legitimate images, i.e., the model misclassifies $x+\delta$ to another class $t \neq t_0$. The perturbation $\delta$ can be obtained by solving the following problem,

$\min_{\delta} \; f(x+\delta) \quad \text{s.t.} \;\; \|\delta\|_{\infty} \leq \epsilon, \;\; x+\delta \in [0, 1]^{d},$   (1)

where $f(x+\delta)$ denotes an attack loss incurred by misclassifying $x+\delta$ to another class, and $\|\cdot\|_{\infty}$ denotes the $\ell_{\infty}$ norm. In problem (1), the constraints on $\delta$ ensure that the perturbed noise at each pixel (normalized to $[0,1]$) is imperceptible up to a predefined $\epsilon$-tolerant threshold.

Motivated by [7], the loss function is expressed as

$f(x+\delta) = \max\big\{ \log p_{t_0}(x+\delta) - \max_{j \neq t_0} \log p_j(x+\delta), \; -\kappa \big\},$   (2)

where $p_j(x+\delta)$ denotes the model's prediction score or probability of the $j$-th class for the input $x+\delta$, and $\kappa$ is a confidence parameter usually set to zero. Basically, $f(x+\delta)$ achieves its minimum value 0 if $\log p_{t_0}(x+\delta)$ is smaller than $\max_{j \neq t_0} \log p_j(x+\delta)$, indicating there is a label with higher probability than the correct label and thus a misclassification is achieved by adding the perturbation $\delta$ to $x$. In this paper, we mainly investigate the untargeted attack, which does not specify the target misclassified label. The targeted attack can be easily implemented following nearly the same problem formulation and loss function with slight modifications [7, 18]. We focus on the general formulation here and omit the targeted attack formulation.

Note that in Eq. (2), we use the log probability $\log p_j(x+\delta)$ instead of $p_j(x+\delta)$ because the output probability distribution tends to have one dominating class. The log operator reduces the effect of the dominating class while still preserving the probability order of all classes.
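To make the loss concrete, the short sketch below evaluates Eq. (2) from a single query output; it is a minimal illustration (the helper name attack_loss and the use of NumPy are ours, not part of the paper's released code).

```python
import numpy as np

def attack_loss(probs, true_label, kappa=0.0):
    """Untargeted attack loss of Eq. (2), computed from one query output.

    probs      : length-K vector of predicted class probabilities for x + delta
    true_label : index t_0 of the correct class
    kappa      : confidence margin (set to 0 in this paper)
    """
    log_probs = np.log(probs + 1e-12)           # log-probabilities, guarded against log(0)
    others = np.delete(log_probs, true_label)   # log-probabilities of all classes j != t_0
    # The loss bottoms out at -kappa once some other class out-scores the true class.
    return max(log_probs[true_label] - others.max(), -kappa)

# Example: the true class (index 0) still dominates, so the loss is positive.
p = np.array([0.7, 0.2, 0.1])
print(attack_loss(p, true_label=0))   # log(0.7) - log(0.2) > 0
```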

As most white-box attack methods rely on gradient descent, the unavailability of the gradients in black-box settings limits their application. Gradient estimation methods (known as zeroth-order optimization) are applied to perform the usual projected first-order gradient descent process [17, 26] as follows,

$\delta^{(k+1)} = \Pi_{\mathcal{S}}\big( \delta^{(k)} - \eta \, \hat{\nabla}_{\delta} f(x+\delta^{(k)}) \big),$   (3)

where $\eta$ is the learning rate, $\hat{\nabla}_{\delta} f$ is the estimated gradient, and $\Pi_{\mathcal{S}}$ performs the projection onto the feasible set $\mathcal{S} = \{\delta : \|\delta\|_{\infty} \leq \epsilon, \; x+\delta \in [0,1]^{d}\}$.
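Since $\mathcal{S}$ is the intersection of two axis-aligned boxes, the projection $\Pi_{\mathcal{S}}$ reduces to coordinate-wise clipping; a small sketch under that assumption (the function name is ours):

```python
import numpy as np

def project(delta, x, eps):
    """Project delta onto S = {delta : ||delta||_inf <= eps, x + delta in [0, 1]}."""
    delta = np.clip(delta, -eps, eps)          # enforce the l_inf budget
    delta = np.clip(x + delta, 0.0, 1.0) - x   # keep the perturbed image in the valid pixel range
    return delta

# One projected step of Eq. (3): delta = project(delta - lr * grad_est, x, eps)
```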

In the black-box setting, it is often the case that the query budget is limited or high query efficiency is required by the adversary. The zeroth-order method extracts gradient information of the objective function, and the first-order method is then applied to minimize the loss due to its wide application in ML. However, the second-order information contained in the queries is not fully exploited. In this paper, we aim to take advantage of the model's second-order information and propose a novel method named ZO-NGD optimization.

Zeroth-order Natural Gradient Descent

The proposed method is based on NGD [29] and ZO optimization [13]. In applications that optimize probabilistic models, NGD updates the parameters with the natural gradient, obtained by pre-multiplying the gradient by the inverse FIM. NGD is a potentially attractive alternative method as it typically requires fewer total iterations than gradient descent [32, 28, 16].

Motivated by the perspective of information geometry, NGD defines the steepest descent direction in the realizable distribution space instead of the parameter space. The distance in the distribution space is measured with a special Riemannian metric [2], which is different from the standard Euclidean distance metric in the parameter space. This Riemannian metric does not depend on the parameters like the Euclidean metric does, but on the distributions themselves. Thus it is invariant to any smooth and invertible reparameterization of the model. More details are discussed in the Geometric Interpretation section.

Next we introduce the FIM and the implementation details of NGD. Basically, the proposed framework first queries the model to estimate the gradients and the Fisher information. Then, after the damping and inversion steps, the natural gradient is obtained to update the perturbation. Algorithm 1 shows the pseudo code of ZO-NGD.

0:    The legitimate image $x$; the correct label $t_0$; the model to be queried; the learning rate $\eta$; the sampling step size $\mu$;
0:    Adversarial perturbation $\delta$;
1:  initialize $\delta^{(0)}$ with all zeros;
2:  for $k = 0, 1, 2, \dots$ do
3:     Query the model with $x + \delta^{(k)}$ and obtain the probabilities $p_j(x+\delta^{(k)})$;
4:     for $i = 1, \dots, q$ do
5:        Generate a random direction vector $u_i$ drawn from a uniform distribution over the surface of a unit sphere;
6:        Query the model with $x + \delta^{(k)} + \mu u_i$ and obtain $p_j(x+\delta^{(k)} + \mu u_i)$;
7:     end for
8:     Estimate the gradients of the loss function according to Eq. (16);
9:     Estimate the gradients of the log-likelihood function according to Eq. (17);
10:     Compute the FIM according to Eq. (18) and perform the natural gradient update as shown in Eq. (19).
11:  end for
Algorithm 1 Framework of ZO-NGD.

Fisher Information Matrix and Natural Gradient

We introduce and derive the FIM in this section. In general, finding an adversarial example can be formulated as a training problem. In the idealized setting, input vectors $x$ are drawn independently from a distribution with density function $q(x)$, and the corresponding output $y$ is drawn from a conditional target distribution with density function $p(y|x)$. The target joint distribution thus has density $q(x)\,p(y|x)$. By finding an adversarial perturbation $\delta$, we obtain the learned distribution $P_{\delta}(x, y)$, whose density is $p_{\delta}(x, y) = q(x)\, p(y \,|\, x+\delta)$.

In statistics, the score function [10] indicates how sensitive a likelihood function is to its parameters, here the perturbation $\delta$. Explicitly, the score function for $P_{\delta}$ is the gradient of the log-likelihood with respect to $\delta$, as below,

$s(\delta) = \nabla_{\delta} \log p_{\delta}(x, y).$   (4)
Lemma 1

The expected value of the score function with respect to $p_{\delta}$ is zero, i.e., $\mathbb{E}_{p_{\delta}}[s(\delta)] = 0$.

The proof is shown in the appendix. We can define an uncertainty measure around the expected value (i.e., the covariance of the score function) as follows,

$\mathbb{E}_{p_{\delta}}\Big[ \big(s(\delta) - \mathbb{E}[s(\delta)]\big)\big(s(\delta) - \mathbb{E}[s(\delta)]\big)^{\top} \Big].$   (5)

The covariance of the score function above is the definition of the Fisher information. It is in the form of a matrix, and the FIM can be written as

$F = \mathbb{E}_{x \sim q(x)}\, \mathbb{E}_{y \sim p(y|x+\delta)}\big[ \nabla_{\delta} \log p_{\delta}(x, y)\, \nabla_{\delta} \log p_{\delta}(x, y)^{\top} \big].$   (6)

Note that this expression involves all possible values of the class label $y$, not only the actual label $t_0$ for each data sample. As we attack a single input $x$, the training set only contains one data sample. Besides, since $p_{\delta}(x, y) = q(x)\,p(y|x+\delta)$ and $q(x)$ does not depend on $\delta$, we have

$\nabla_{\delta} \log p_{\delta}(x, y) = \nabla_{\delta} \log p(y \,|\, x+\delta).$   (7)

Then the FIM can be transformed to

$F = \mathbb{E}_{y \sim p(y|x+\delta)}\big[ \nabla_{\delta} \log p(y|x+\delta)\, \nabla_{\delta} \log p(y|x+\delta)^{\top} \big].$   (8)

The exact expectation with $K$ categories is expressed as,

$F = \sum_{j=1}^{K} p(j \,|\, x+\delta)\; \nabla_{\delta} \log p(j|x+\delta)\, \nabla_{\delta} \log p(j|x+\delta)^{\top}.$   (9)

The usual definition of the natural gradient is

$\tilde{\nabla}_{\delta} f(x+\delta) = F^{-1}\, \nabla_{\delta} f(x+\delta),$   (10)

and NGD minimizes the loss function through

$\delta^{(k+1)} = \delta^{(k)} - \eta\, F^{-1}\, \nabla_{\delta} f(x+\delta^{(k)}).$   (11)
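As a white-box illustration of Eqs. (9)-(11), the sketch below assembles the exact FIM and takes one NGD step for a toy linear-softmax model (our own construction, chosen only so that the score gradients have a closed form; the paper's black-box estimation of these quantities is described in the following subsections).

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 5, 3                                  # toy input dimension and number of classes
W = rng.normal(size=(K, d))                  # toy "model": p(. | x + delta) = softmax(W (x + delta))
x = rng.uniform(size=d)
delta = np.zeros(d)

def probs(delta):
    z = W @ (x + delta)
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_log_prob(delta, j):
    """Closed-form gradient of log p_j(x + delta) w.r.t. delta for the toy model."""
    return W[j] - probs(delta) @ W

# Exact FIM of Eq. (9): expectation of score outer products under the predicted distribution.
p = probs(delta)
F = sum(p[j] * np.outer(grad_log_prob(delta, j), grad_log_prob(delta, j)) for j in range(K))

# Natural gradient (Eq. (10)) and one NGD step (Eq. (11)); a small ridge keeps F invertible,
# anticipating the damping of Eq. (18).
g = grad_log_prob(delta, 0)                  # stand-in for the gradient of the attack loss
nat_grad = np.linalg.solve(F + 1e-3 * np.eye(d), g)
delta = delta - 0.01 * nat_grad
```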

Outer Product and Monte Carlo Approximation

The FIM involves an expectation over all possible classes drawn from the output probability distribution. With a large number of classes, it is impractical to compute the exact FIM due to the intensive computation. To address this overhead, there are in general two methods to approximate the FIM: the outer product approximation and the Monte Carlo approximation.

Outer Product Approximation

The outer product approximation of the FIM [35, 32] only uses the actual label $t_0$ to avoid the expectation over all possible labels $y$, as below,

$F \approx \nabla_{\delta} \log p_{t_0}(x+\delta)\; \nabla_{\delta} \log p_{t_0}(x+\delta)^{\top}.$   (12)

Thus a rank-one matrix is obtained directly.

Monte Carlo Approximation

Monte Carlo (MC) approximation [32] replaces the expectation over $y$ with $N$ samples,

$F \approx \frac{1}{N} \sum_{i=1}^{N} \nabla_{\delta} \log p_{y_i}(x+\delta)\; \nabla_{\delta} \log p_{y_i}(x+\delta)^{\top},$   (13)

where each $y_i$ is drawn from the distribution $p(y \,|\, x+\delta)$. The MC natural gradient works well in practice even with a small number of samples $N$.

For higher query efficiency, we adopt the outer product approximation, as it does not require any additional queries.
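For reference, both approximations can be written in a few lines; the sketch below is a toy illustration (the function names and the dummy score oracle are ours), assuming a routine that returns the score gradient $\nabla_{\delta}\log p_j(x+\delta)$ for a requested class $j$.

```python
import numpy as np

def fim_outer_product(grad_logp_true):
    """Rank-one outer-product approximation of Eq. (12): only the true label's score is used."""
    return np.outer(grad_logp_true, grad_logp_true)

def fim_monte_carlo(probs, grad_logp_fn, n_samples=1, rng=None):
    """Monte Carlo approximation of Eq. (13): sample labels from the model's output distribution."""
    rng = rng or np.random.default_rng()
    dim = grad_logp_fn(0).size
    F = np.zeros((dim, dim))
    for _ in range(n_samples):
        j = rng.choice(len(probs), p=probs)     # y_i ~ p(y | x + delta)
        s = grad_logp_fn(j)                     # score gradient for the sampled label
        F += np.outer(s, s) / n_samples
    return F

# Tiny usage with a dummy 3-class score oracle on a 4-dimensional input:
G = np.eye(3, 4)                                # pretend row j is grad log p_j
p = np.array([0.8, 0.15, 0.05])
F_op = fim_outer_product(G[0])                  # uses the correct label only -> no extra queries
F_mc = fim_monte_carlo(p, lambda j: G[j], n_samples=5)
```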

Gaussian Smoothing and Gradient Estimation

To compute the FIM and perform NGD, we need the gradients of the loss function, $\nabla_{\delta} f(x+\delta)$, and the gradients of the log-likelihood, $\nabla_{\delta} \log p_{t_0}(x+\delta)$, which are not directly available in the black-box setting.

To address this difficulty, we first introduce a smoothed approximation of $f$ [31],

$f_{\mu}(\delta) = \mathbb{E}_{v \sim U_B}\big[ f(\delta + \mu v) \big],$   (14)

where $U_B$ denotes the uniform distribution over the unit ball measured in the Frobenius norm $\|\cdot\|_F$, and $\mu > 0$ is a smoothing parameter. Its gradient can be written as

$\nabla_{\delta} f_{\mu}(\delta) = \frac{d}{\mu}\, \mathbb{E}_{u \sim U_S}\big[ f(\delta + \mu u)\, u \big],$   (15)

where $f_{\mu}$ is the smoothed version of $f$, $d$ is the dimension of $\delta$, and $u$ is a random vector distributed uniformly over the surface of the unit sphere, i.e., $\|u\|_F = 1$. Thus, based on Eq. (15), we apply zeroth-order random gradient estimation to estimate the gradients by

$\hat{\nabla}_{\delta} f(x+\delta) = \frac{d}{\mu q} \sum_{i=1}^{q} \big[ f(x+\delta+\mu u_i) - f(x+\delta) \big]\, u_i,$   (16)

and

$\hat{\nabla}_{\delta} \log p_{t_0}(x+\delta) = \frac{d}{\mu q} \sum_{i=1}^{q} \big[ \log p_{t_0}(x+\delta+\mu u_i) - \log p_{t_0}(x+\delta) \big]\, u_i,$   (17)

where $q$ is the number of random direction vectors and $u_1, \dots, u_q$ denote independent and identically distributed (i.i.d.) random direction vectors drawn as in Algorithm 1.

We note that in each gradient estimation step, by querying the model $q+1$ times (once at the current point and once along each of the $q$ random directions), we can simultaneously obtain both $\hat{\nabla}_{\delta} f(x+\delta)$ and $\hat{\nabla}_{\delta} \log p_{t_0}(x+\delta)$, as demonstrated in Algorithm 1. Different from zeroth-order gradient descent methods which only estimate the gradients of the loss function (such as ZOO [9] and NES-PGD [17]), ZO-NGD additionally obtains $\hat{\nabla}_{\delta} \log p_{t_0}(x+\delta)$ and computes the FIM from the same query outputs without incurring additional query complexity. This is one major difference between ZO-NGD and other zeroth-order methods. Thus, higher query efficiency can be achieved by leveraging the FIM and second-order optimization.
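A sketch of this shared estimation step is given below. It is our own reading of Eqs. (16)-(17); the helper names, the unit-sphere sampling and the exact scaling constant follow the standard sphere-sampling estimator and may differ in minor ways from the authors' released implementation.

```python
import numpy as np

def unit_sphere_direction(shape, rng):
    """Random direction drawn uniformly from the surface of the unit sphere."""
    u = rng.normal(size=shape)
    return u / np.linalg.norm(u)

def estimate_both_gradients(query_model, x, delta, true_label, mu=0.001, q=40, kappa=0.0, rng=None):
    """Estimate grad f and grad log p_{t0} from the SAME q + 1 queries (Eqs. (16)-(17))."""
    rng = rng or np.random.default_rng()
    d = delta.size

    def loss_and_logp(dlt):
        logp = np.log(query_model(x + dlt) + 1e-12)   # one model query -> log-probabilities
        loss = max(logp[true_label] - np.max(np.delete(logp, true_label)), -kappa)
        return loss, logp[true_label]

    f0, logp0 = loss_and_logp(delta)                   # base query at the current point
    grad_f = np.zeros_like(delta)
    grad_logp = np.zeros_like(delta)
    for _ in range(q):
        u = unit_sphere_direction(delta.shape, rng)
        fi, logpi = loss_and_logp(delta + mu * u)      # one extra query per direction
        grad_f += (d / (mu * q)) * (fi - f0) * u       # difference quotient for the loss
        grad_logp += (d / (mu * q)) * (logpi - logp0) * u   # same queries reused for log-likelihood
    return grad_f, grad_logp
```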

Damping for Fisher Information Matrix

The inverse of the FIM is required to form the natural gradient. However, the eigenvalue distribution of the FIM is known to have an extremely long tail [19], where most of the eigenvalues are close to zero. This in turn causes the eigenvalues of the inverse FIM to be extremely large, leading to unstable training. To mitigate this problem, a damping technique is used, which adds a positive value to the diagonal of the FIM to stabilize the training, as shown below,

$\hat{F} = \hat{\nabla}_{\delta} \log p_{t_0}(x+\delta)\; \hat{\nabla}_{\delta} \log p_{t_0}(x+\delta)^{\top} + \gamma I,$   (18)

where $\gamma > 0$ is a constant. As the damping limits the maximum eigenvalue of the inverse FIM, it restricts the norm of the natural gradient. This prevents ZO-NGD from moving too far in flat directions.

With the obtained FIM, the perturbation update is

$\delta^{(k+1)} = \delta^{(k)} - \eta\, \hat{F}^{-1}\, \hat{\nabla}_{\delta} f(x+\delta^{(k)}).$   (19)

ZO-NGD thus extracts the Fisher information to perform second-order optimization for a faster convergence rate and better query efficiency.
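For low-dimensional inputs, the damped update of Eqs. (18)-(19) can be formed explicitly; a small sketch of one step follows (naming is ours; the matrix-free variant used for high-dimensional images is given in the next subsection).

```python
import numpy as np

def ngd_step(delta, grad_f, grad_logp, lr=0.01, gamma=0.01):
    """One explicit damped natural-gradient step: F_hat = s s^T + gamma*I (Eq. (18)), then Eq. (19)."""
    s = grad_logp.ravel()
    F_hat = np.outer(s, s) + gamma * np.eye(s.size)      # damped outer-product FIM
    nat_grad = np.linalg.solve(F_hat, grad_f.ravel())    # F_hat^{-1} grad_f
    return delta - lr * nat_grad.reshape(delta.shape)
```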

Scalability to High Dimensional Datasets

Note that the FIM has dimension $d \times d$, where $d$ is the dimension of the input image. On the ImageNet dataset, whose images have roughly $270{,}000$ input dimensions ($299 \times 299 \times 3$ for Inception v3), the FIM would have billions of elements, and thus it is quite difficult to compute or store the FIM, not to mention its inverse. In the application of training DNN models, the Kronecker Factored Approximate Curvature (K-FAC) method [28, 33] is adopted to deal with the high dimensionality of the DNN model. However, K-FAC methods may not be suitable in the application of finding adversarial examples, as the assumption of uncorrelated channels is not valid and thus we cannot apply the block diagonalization method to the FIM. Instead, we propose another method to compute the natural gradient in high dimensions as follows. First we have

$\hat{F} = s\, s^{\top} + \gamma I,$   (20)

where $s = \hat{\nabla}_{\delta} \log p_{t_0}(x+\delta)$. The inverse matrix can be represented as,

$\hat{F}^{-1} = \frac{1}{\gamma} \Big( I - \frac{s\, s^{\top}}{\gamma + s^{\top} s} \Big).$   (21)
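This identity (a special case of the Sherman-Morrison formula) can be checked by direct multiplication:

$(s s^{\top} + \gamma I)\cdot \frac{1}{\gamma}\Big(I - \frac{s s^{\top}}{\gamma + s^{\top} s}\Big) = \frac{1}{\gamma}\Big(s s^{\top} + \gamma I - \frac{s\,(s^{\top} s)\, s^{\top} + \gamma\, s s^{\top}}{\gamma + s^{\top} s}\Big) = \frac{1}{\gamma}\big(s s^{\top} + \gamma I - s s^{\top}\big) = I.$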

This can be verified simply by checking the multiplication shown above. Then the gradient update in Eq. (19) uses

$\hat{F}^{-1}\, \hat{\nabla}_{\delta} f = \frac{1}{\gamma} \Big( \hat{\nabla}_{\delta} f - \frac{s^{\top} \hat{\nabla}_{\delta} f}{\gamma + s^{\top} s}\, s \Big).$   (22)

During the computation of Eq. (22), we first compute the scalar $s^{\top} \hat{\nabla}_{\delta} f$, and the result is then simply a weighted sum of two vectors. Although $\hat{F}$ and its inverse might have billions of elements, we avoid computing them directly, and the size of every intermediate quantity is at most of the same order as the image dimension $d$, rather than $d^2$. Thus, the ZO-NGD method can be applied to datasets with high dimensional images.
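Concretely, the whole update direction costs two inner products and one vector combination; a sketch (naming is ours) under the outer-product-plus-damping form above:

```python
import numpy as np

def natural_gradient_direction(grad_f, grad_logp, gamma=0.01):
    """Compute F_hat^{-1} grad_f via Eq. (22) without ever forming the d x d matrix F_hat."""
    s = grad_logp.ravel()
    g = grad_f.ravel()
    coeff = (s @ g) / (gamma + s @ s)                        # scalar s^T g / (gamma + s^T s)
    return ((g - coeff * s) / gamma).reshape(grad_f.shape)   # (1/gamma) (g - coeff * s)

# Memory check: for a 299 x 299 x 3 input (d ~ 2.7e5), F_hat would hold ~7e10 entries,
# while this routine only ever touches length-d vectors.
```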

Geometric Interpretation

We provide a geometric interpretation for the natural gradient here. The negative gradient $-\nabla_{\delta} f$ can be interpreted as the steepest descent direction in the sense that it yields the most reduction in $f$ per unit of change of $\delta$, where the change is measured by the standard Euclidean norm [29], as shown below,

$\frac{-\nabla_{\delta} f}{\|\nabla_{\delta} f\|} = \lim_{\epsilon \to 0} \frac{1}{\epsilon} \operatorname*{arg\,min}_{d:\, \|d\| \leq \epsilon} f(\delta + d).$   (23)

By following this direction, we obtain the change of $\delta$ within a certain $\epsilon$-neighbourhood that minimizes the loss function.

Lemma 2

The negative natural gradient is the steepest descent direction in the distribution space.

We provide the proof of Lemma 2 in the appendix. In the parameter space, the negative gradient is the steepest descent direction to minimize the loss function. By contrast, in the distribution space, where the distance is measured by the KL divergence, the steepest descent direction is the negative natural gradient. Thus, the direction in distribution space defined by the natural gradient is invariant to the choice of parameterization [36], i.e., it is not affected by how the model is parameterized, but only depends on the distribution induced by the parameters.

Experimental Results

In this section, we present the experimental results of the ZO-NGD method. We compare ZO-NGD with various attack methods on three image classification datasets, MNIST [22], CIFAR-10 [20] and ImageNet [12].

We train two networks for MNIST and CIFAR-10 datasets, respectively. The model for MNIST achieves 99.6% accuracy with four convolutional layers, two max pooling layers, two fully connected layers and a softmax layer. For CIFAR-10, we adopt the same model architecture as MNIST, achieving 80% accuracy. For ImageNet, a pre-trained Inception v3 network [37] is utilized instead of training our own model, attaining 96% top-5 accuracy. All experiments are performed on machines with NVIDIA GTX 1080 TI GPUs.

Evaluation of White-box Attack

We first consider the white-box setting, where we compare the proposed NGD with the PGD attack used in adversarial training. PGD is a typical first-order method, while NGD utilizes the second-order FIM. A query here is defined as one forward pass and one subsequent backpropagation, since the gradients are obtained through backpropagation. We report the average number of queries over 500 images for successful adversaries on each dataset. On MNIST, NGD requires 2.12 queries while PGD needs 4.88 queries. On CIFAR-10, NGD requires 2.06 queries while PGD needs 4.21 queries. On ImageNet, NGD requires 2.20 queries while PGD needs 5.62 queries. We can see that NGD achieves higher query efficiency by incorporating the FIM.

Evaluation on MNIST and CIFAR-10

In the evaluation on MNIST and CIFAR-10, we select 2000 correctly classified images from the MNIST and CIFAR-10 test sets, respectively, and perform black-box attacks on these images. We compare the ZO-NGD method with the transfer attack [34], the ZOO black-box attack [9], and the natural-evolution-strategy-based projected gradient descent method (NES-PGD) [17]. For the transfer attack [34], we apply the C&W attack [7] to the surrogate model. The implementations of ZOO and NES-PGD are based on the GitHub code released by the authors (see footnote 2). For all attack methods, the pixel values of all images are normalized to the range $[0, 1]$. In the proposed ZO-NGD method, the sampling number $q$ in the random gradient estimation defined in Eqs. (16) and (17) is set to 40. The perturbation bound $\epsilon$ is set to 0.4 for MNIST and 0.2 for CIFAR-10 and ImageNet. In Eqs. (16) and (17), the same smoothing parameter $\mu$ is used for all three datasets, and the damping constant $\gamma$ is set to 0.01.

Dataset    Attack method     Success rate    Average queries    Reduction rate
MNIST      Transfer attack   82%             -                  -
MNIST      ZOO attack        100%            8,300              0%
MNIST      NES-PGD           98.2%           1,243              85%
MNIST      ZO-NGD            98.7%           523                93.7%
CIFAR-10   Transfer attack   85%             -                  -
CIFAR-10   ZOO attack        99.8%           6,500              0%
CIFAR-10   NES-PGD           98.9%           417                93.6%
CIFAR-10   ZO-NGD            99.2%           131                98%

Table 1: Performance evaluation of black-box adversarial attacks on MNIST and CIFAR-10.

The experimental results are summarized in Table 1. We report the success rate and the average number of queries over successful adversarial examples for the black-box attack methods on MNIST and CIFAR-10. As shown in Table 1, the transfer attack does not achieve a high success rate due to the difference between the surrogate model and the original target model. The ZOO attack method can achieve a high success rate at the cost of excessive query complexity, since it performs gradient estimation for each pixel of the input image. We can observe that the ZO-NGD method requires significantly fewer queries than the NES-PGD method. NES-PGD uses natural evolutionary strategies for gradient estimation and then performs first-order gradient descent to obtain the adversarial perturbations. Compared with NES-PGD, the proposed ZO-NGD not only estimates the first-order gradients of the loss function, but also obtains second-order Fisher information from the same queries without incurring additional query complexity, leading to higher query efficiency. From Table 1, we can observe that the ZO-NGD method attains the smallest number of queries to successfully obtain adversarial examples in the black-box setting. Benchmarked against the ZOO method, the query reduction ratio of ZO-NGD is as high as 93.7% on MNIST and 98% on CIFAR-10.

Evaluation on ImageNet

We perform black-box adversarial attacks on ImageNet, where 1000 correctly classified images are randomly selected. On ImageNet, we compare the proposed ZO-NGD with the ZOO attack, the NES-PGD method, and the bandit attack with time- and data-dependent priors (referred to as Bandits[TD]) [18]. The transfer attack is not performed since it is not easy to train a surrogate model on ImageNet. The Bandits[TD] method makes use of prior information in the gradient estimation, including time-dependent priors, which exploit the heavy correlation between successive gradients, and data-dependent priors, which exploit the spatially local similarity exhibited in images. After gradient estimation with the priors or bandit information, a first-order gradient descent method is applied.

Attack method    Success rate    Average queries    Reduction ratio
ZOO attack       98.9%           16,800             0%
NES-PGD          94.6%           1,325              92.1%
Bandits[TD]      96.1%           791                95.3%
ZO-NGD           97%             582                96.5%

Table 2: Performance evaluation of black-box adversarial attacks on ImageNet.

We present the performance evaluation on ImageNet in Table 2, reporting the success rate and the average number of queries over successful attacks for the various black-box attack methods. Table 2 shows that the ZOO attack can achieve a high success rate, but with high query complexity due to its element-wise gradient estimation. Similar to the previous experiments, the ZO-NGD method requires a much smaller number of queries than the NES-PGD method, thanks to the faster convergence of second-order optimization exploiting the Fisher information. We also find that ZO-NGD outperforms the Bandits[TD] method in terms of query efficiency. The Bandits[TD] method enhances the query efficiency of gradient estimation through the incorporation of prior information, but its attack methodology is still based on first-order gradient descent. As observed from Table 2, the ZO-NGD method achieves the highest query efficiency for successful adversarial attacks in the black-box setting, obtaining a 96.5% query reduction ratio on ImageNet when compared with the ZOO method. In Figure 1, we show some legitimate images from ImageNet and their corresponding adversarial examples obtained by ZO-NGD. We can observe that the adversarial perturbations are imperceptible. More examples on MNIST and CIFAR-10 are shown in the appendix.

Figure 1: The legitimate images and their adversarial examples generated by ZO-NGD.

Ablation study

In this ablation study, we perform sensitivity analysis of the proposed ZO-NGD method with respect to variations in model architectures and different hyper-parameter settings. Below we summarize the conclusions and findings of this ablation study and report the details in the appendix. (1) Tested on VGG16 and ResNet and with varying hyper-parameters, ZO-NGD shows consistently superior performance by leveraging second-order optimization. (2) We inspect the approximation techniques used in ZO-NGD, including damping and the outer product method. The results show that there is a wide range of proper values for which damping works effectively to reduce the loss, and the empirical evidence indicates that the outer product is a reasonable approximation. We also note that the ASR of ZOO is higher than that of ZO-NGD; we provide a discussion of ASR vs. query number in the appendix.

Query Number Distribution

Figure 2 shows the cumulative distribution function (CDF) of the query number over 1000 images on the three datasets, validating ZO-NGD's query efficiency.

Figure 2: CDF of query number on three datasets using ZO-NGD.

Transferability

The transferability of adversarial examples is an interesting and valuable performance metric. To show the transferability, we use 500 targeted adversarial examples generated by ZO-NGD on ImageNet against the Inception model to attack the ResNet and VGG16 models. They achieve 94.4% and 95.6% ASR, respectively, demonstrating the high transferability of our method. The transferred ASR is also higher than that of NES-PGD (92.1% and 92.9% ASR).

Conclusion

In this paper, we propose a novel ZO-NGD method to achieve high query efficiency in black-box adversarial attacks. It incorporates ZO random gradient estimation and the second-order FIM for NGD. The performance evaluation on three image classification datasets demonstrates the effectiveness of the proposed method in terms of fast convergence and improved query efficiency over state-of-the-art methods.

Acknowledgments

This work is partly supported by the National Science Foundation CNS-1932351, and is also based upon work partially supported by the Department of Energy National Energy Technology Laboratory under Award Number DE-OE0000911.

Disclaimer

This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Appendix A Appendix

Appendix B Geometric Interpretation

We provide a geometric interpretation of the natural gradient in this section. The negative gradient $-\nabla_{\delta} f$ can be interpreted as the steepest descent direction in the sense that it yields the most reduction in $f$ per unit of change of $\delta$, where the change is measured by the standard Euclidean norm [29], as shown below,

$\frac{-\nabla_{\delta} f}{\|\nabla_{\delta} f\|} = \lim_{\epsilon \to 0} \frac{1}{\epsilon} \operatorname*{arg\,min}_{d:\, \|d\| \leq \epsilon} f(\delta + d).$   (24)

By following this direction, we obtain the change of $\delta$ within a certain $\epsilon$-neighbourhood that minimizes the loss function.

As the loss function is related to the likelihood, we can explore the steepest direction to minimize the loss in the space of all possible likelihoods (i.e., the distribution space). The KL divergence [21] is a popular measure of the distance between two distributions. For two distributions $P_{\delta}$ and $P_{\delta'}$, their KL divergence is defined as

$D_{\mathrm{KL}}(P_{\delta} \,\|\, P_{\delta'}) = \mathbb{E}_{p_{\delta}}\Big[ \log \frac{p_{\delta}(x, y)}{p_{\delta'}(x, y)} \Big].$   (25)
Lemma 3

The FIM is the Hessian of the KL divergence between the two distributions $P_{\delta}$ and $P_{\delta'}$, taken with respect to $\delta'$ and evaluated at $\delta' = \delta$.

The proof is shown in the appendix. By Lemma 3, the FIM can be regarded as the curvature in the distribution space.

Lemma 4

The second-order Taylor expansion of the KL divergence can be expressed as

$D_{\mathrm{KL}}(P_{\delta} \,\|\, P_{\delta + d}) \approx \frac{1}{2}\, d^{\top} F\, d.$   (26)

We provide a proof in the appendix.

Next we explore the direction that minimizes the loss function in the distribution space, where the distance is measured by the KL divergence. Although the KL divergence is not symmetric in general, it is (approximately) symmetric in a local neighborhood. The problem can be formulated as

$\operatorname*{arg\,min}_{d:\; D_{\mathrm{KL}}(P_{\delta} \| P_{\delta + d}) = c} f(\delta + d),$   (27)

where $c$ is a certain constant. The purpose of fixing the KL divergence to a constant is to move along the distribution space with a constant speed, regardless of the curvature.

Thus, we obtain Lemma 2 and show its proof below. In the parameter space, the negative gradient is the steepest descent direction to minimize the loss function. By contrast, in the distribution space, the steepest descent direction is the negative natural gradient. Thus, the direction in distribution space defined by the natural gradient is invariant to the choice of parameterization [36], i.e., it is not affected by how the model is parameterized, but only depends on the distribution induced by the parameters.

Appendix C Proof of Lemmas

Proof of Lemma 1

Proof of Lemma 1. Using the density $p_{\delta}(x, y) = q(x)\, p(y|x+\delta)$,

$\mathbb{E}_{p_{\delta}}[s(\delta)] = \mathbb{E}_{p_{\delta}}\big[\nabla_{\delta} \log p_{\delta}(x, y)\big] = \int q(x) \sum_{y} p(y|x+\delta)\, \frac{\nabla_{\delta}\, p(y|x+\delta)}{p(y|x+\delta)}\, \mathrm{d}x = \int q(x)\, \nabla_{\delta} \sum_{y} p(y|x+\delta)\, \mathrm{d}x = \int q(x)\, \nabla_{\delta} 1 \,\mathrm{d}x = 0.$

Proof of Lemma 3

Proof of Lemma 3. The gradient of the KL divergence with respect to $\delta'$ can be expressed as

$\nabla_{\delta'} D_{\mathrm{KL}}(P_{\delta} \| P_{\delta'}) = \nabla_{\delta'}\, \mathbb{E}_{p_{\delta}}\big[\log p_{\delta}(x, y)\big] - \nabla_{\delta'}\, \mathbb{E}_{p_{\delta}}\big[\log p_{\delta'}(x, y)\big]$   (28)

$= -\, \mathbb{E}_{p_{\delta}}\big[\nabla_{\delta'} \log p_{\delta'}(x, y)\big].$   (29)

The Hessian of the KL divergence is defined by

$\nabla^2_{\delta'} D_{\mathrm{KL}}(P_{\delta} \| P_{\delta'})\big|_{\delta'=\delta} = -\, \mathbb{E}_{p_{\delta}}\big[\nabla^2_{\delta'} \log p_{\delta'}(x, y)\big]\big|_{\delta'=\delta} = F,$   (30)

where the last equality uses the standard identity that the negative expected Hessian of the log-likelihood equals the FIM.

Proof of Lemma 4

Proof of Lemma 4. Expanding the KL divergence around $\delta' = \delta$,

$D_{\mathrm{KL}}(P_{\delta} \| P_{\delta + d}) \approx D_{\mathrm{KL}}(P_{\delta} \| P_{\delta'})\big|_{\delta'=\delta} + d^{\top} \nabla_{\delta'} D_{\mathrm{KL}}(P_{\delta} \| P_{\delta'})\big|_{\delta'=\delta} + \frac{1}{2}\, d^{\top} \nabla^2_{\delta'} D_{\mathrm{KL}}(P_{\delta} \| P_{\delta'})\big|_{\delta'=\delta}\, d.$   (31)

Notice that the first term is zero as the two distributions are identical, and the second term is zero due to Lemma 1. The remaining Hessian term equals $F$ by Lemma 3, which gives Eq. (26).

Proof of Lemma 2

Proof of Lemma 2. The Lagrangian function of the minimization in Eq. (27) can be formulated as

$L(d, \lambda) = f(\delta + d) + \lambda \big( D_{\mathrm{KL}}(P_{\delta} \| P_{\delta + d}) - c \big) \approx f(\delta) + \nabla_{\delta} f(\delta)^{\top} d + \frac{\lambda}{2}\, d^{\top} F\, d - \lambda c.$   (32)

To solve this minimization, we set its derivative with respect to $d$ to zero:

$\nabla_{d} L(d, \lambda) = \nabla_{\delta} f(\delta) + \lambda F d = 0,$   (33)

$d = -\frac{1}{\lambda}\, F^{-1}\, \nabla_{\delta} f(\delta),$   (34)

$d \;\propto\; -\tilde{\nabla}_{\delta} f(\delta) = -F^{-1}\, \nabla_{\delta} f(\delta).$   (35)

We can see that the negative natural gradient defines the steepest descent direction in the distribution space.

Appendix D Adversarial Examples

Figure A1: The legitimate images and their adversarial examples generated by ZO-NGD on MNIST and CIFAR-10. The misclassified classes (MNIST: 3, 6, 8, 7, 5; CIFAR-10: deer, ship, truck, frog, cat) are shown on top of each image.

Appendix E Ablation study

Various Models

To check the performance of the proposed method on various model architectures, we performed experiments on the three datasets using two additional models (VGG16 and ResNet) and summarize the results in Table A1. The proposed ZO-NGD is more query-efficient than NES-PGD.

           MNIST (VGG)         CIFAR-10 (VGG)
           ASR      Query      ASR      Query
NES-PGD    99.2%    1082       98.3%    381
ZO-NGD     99.5%    548        98.2%    152

           ImageNet (VGG)      ImageNet (ResNet)
           ASR      Query      ASR      Query
NES-PGD    96.8%    1136       97.2%    1281
ZO-NGD     96.5%    594        98.1%    624

Table A1: Attack success rate (ASR) and query count for two additional models.

Parameter Analysis

The sensitivity analysis on two hyper-parameters is demonstrated in Table A2. As observed from Table A2, the ZO-NGD performance is robust to different values of the first hyper-parameter (with the second fixed), and a larger value of the second hyper-parameter leads to fewer queries.

Value of the first hyper-parameter (second fixed)
              1         0.1       0.01
ASR           97.3%     97%       96.6%
Queries       626       582       596

Value of the second hyper-parameter (first fixed)
              0.15      0.2       0.25
ASR           96.1%     97%       98.2%
Queries       619       583       559

Table A2: Attack success rate (ASR) and query count for various hyper-parameters on ImageNet (Inception).

In second-order optimization, damping is a common technique to compensate for errors in the quadratic approximation. The parameter $\gamma$ plays a key role in damping. To show its influence, we report the loss after two iterations for a wide range of $\gamma$ values with the same initialization in Figure A2(a). We observe that 0.01 or 0.001 is an appropriate choice for $\gamma$ to achieve higher query efficiency.

Drift of Outer Product

Although we adopt the outer product (Eq. (12)) to approximate Eq. (9), the empirical evidence below motivates why the term in Eq. (12) dominates Eq. (9) and why the approximation is reasonable. For a well-trained model, the prediction probability of a correctly classified image usually dominates the probability distribution; that is, $p_{t_0}(x+\delta)$ is usually much larger than the other probabilities if the prediction is correct and $\delta$ is small. We plot the average prediction probability distribution of 1000 correctly classified images on CIFAR-10 and ImageNet for their top-10 labels in Figure A2(b). As observed from Figure A2(b), the correct label usually dominates the probability distribution, leading to a reasonable approximation error when going from Eq. (9) to Eq. (12).

Figure A2: (a) Influence of the damping parameter $\gamma$ on MNIST. (b) Prediction probability distribution on CIFAR-10/ImageNet.

Footnotes

  2. The code and appendix are available at https://github.com/LinLabNEU/ZO_NGD_blackbox.

References

  1. M. Alzantot, B. Balaji and M. B. Srivastava (2018) Did you hear that? adversarial examples against automatic speech recognition. CoRR abs/1801.00554. External Links: Link, 1801.00554 Cited by: Introduction.
  2. S. Amari and H. Nagaoka (2007) Methods of information geometry. Vol. 191, American Mathematical Soc.. Cited by: Zeroth-order Nature Gradient Descent.
  3. S. Amari (1998-02) Natural gradient works efficiently in learning. Neural Comput. 10 (2), pp. 251–276. External Links: ISSN 0899-7667, Link, Document Cited by: Second-order optimization.
  4. B. Biggio, G. Fumera and F. Roli (2014) Security evaluation of pattern classifiers under attack. IEEE TKDE 26 (4), pp. 984–996. Cited by: Introduction.
  5. W. Brendel, J. Rauber and M. Bethge (2017) Decision-based adversarial attacks: reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248. Cited by: Heuristic black-box attacks.
  6. S. R. Bulò, B. Biggio and et. al. (2017-11) Randomized prediction games for adversarial machine learning. IEEE Transactions on Neural Networks and Learning Systems 28 (11), pp. 2466–2478. External Links: ISSN 2162-237X Cited by: Introduction.
  7. N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pp. 39–57. Cited by: Introduction, Introduction, Introduction, Attack with gradient estimation, Problem Formulation, Evaluation on MNIST and CIFAR-10.
  8. P. Chen, Y. Sharma, H. Zhang, J. Yi and C. Hsieh (2017) EAD: elastic-net attacks to deep neural networks via adversarial examples. arXiv preprint arXiv:1709.04114. Cited by: Introduction.
  9. P. Chen, H. Zhang, Y. Sharma, J. Yi and C. Hsieh (2017) Zoo: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26. Cited by: Introduction, Attack with gradient estimation, Evaluation on MNIST and CIFAR-10.
  10. D. R. Cox and D. V. Hinkley (1979) Theoretical statistics. Chapman and Hall/CRC. Cited by: Fisher Information Matrix and Natural Gradient.
  11. A. Demontis, M. Melis and et. al. (2018) Yes, machine learning can be more secure! a case study on android malware detection. IEEE TDSC (), pp. 1–1. External Links: Document, ISSN 1545-5971 Cited by: Introduction.
  12. J. Deng, W. Dong, R. Socher, L. Li, K. Li and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 248–255. Cited by: Experimental Results.
  13. J. C. Duchi, M. I. Jordan, M. J. Wainwright and A. Wibisono (2015-05) Optimal rates for zero-order convex optimization: the power of two function evaluations. IEEE Transactions on Information Theory 61 (5), pp. 2788–2806. External Links: Document, ISSN 0018-9448 Cited by: Attack with gradient estimation, Zeroth-order Nature Gradient Descent.
  14. I. Goodfellow, J. Shlens and C. Szegedy (2015) Explaining and harnessing adversarial examples. 2015 ICLR arXiv preprint arXiv:1412.6572. Cited by: Introduction, Introduction, Introduction, Introduction.
  15. R. B. Grosse and J. Martens (2016) A kronecker-factored approximate fisher matrix for convolution layers.. In ICML, Vol. 1, pp. 2. Cited by: Second-order optimization.
  16. R. Grosse and R. Salakhudinov (2015-07–09 Jul) Scaling up natural gradient by sparsely factorizing the inverse fisher matrix. In Proceedings of the 32nd International Conference on Machine Learning, F. Bach and D. Blei (Eds.), Proceedings of Machine Learning Research, Vol. 37, Lille, France, pp. 2304–2313. External Links: Link Cited by: Zeroth-order Nature Gradient Descent.
  17. A. Ilyas, L. Engstrom, A. Athalye and J. Lin (2018-07) Black-box adversarial attacks with limited queries and information. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, External Links: Link Cited by: Introduction, Attack with gradient estimation, Problem Formulation, Evaluation on MNIST and CIFAR-10.
  18. A. Ilyas, L. Engstrom and A. Madry (2018) Prior convictions: black-box adversarial attacks with bandits and priors. ICLR 2019. External Links: Link Cited by: Attack with gradient estimation, Problem Formulation, Evaluation on ImageNet.
  19. R. Karakida, S. Akaho and S. Amari (2018) Universal statistics of fisher information in deep neural networks: mean field approach. arXiv preprint arXiv:1806.01316. Cited by: Damping for Fisher Information Matrix.
  20. A. Krizhevsky and G. Hinton (2009) Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto. Cited by: Experimental Results.
  21. S. Kullback and R. A. Leibler (1951-03) On information and sufficiency. Ann. Math. Statist. 22 (1), pp. 79–86. External Links: Document, Link Cited by: Appendix B.
  22. Y. Lecun, L. Bottou, Y. Bengio and P. Haffner (1998-11) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. External Links: Document, ISSN 0018-9219 Cited by: Experimental Results.
  23. Y. LeCun, Y. Bengio and G. Hinton (2015-05) Deep learning. 521, pp. 436–44. Cited by: Introduction.
  24. Z. Li, C. Ding, S. Wang, W. Wen, Y. Zhuo, C. Liu, Q. Qiu, W. Xu, X. Lin, X. Qian and Y. Wang (2019-02) E-rnn: design optimization for efficient recurrent neural networks in fpgas. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), Vol. , pp. 69–80. External Links: Document, ISSN 1530-0897 Cited by: Introduction.
  25. Y. Lin, Z. Hong and et. al. (2017) Tactics of adversarial attack on deep reinforcement learning agents. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 3756–3762. Cited by: Introduction.
  26. S. Liu, P. Chen, X. Chen and M. Hong (2019) SignSGD via zeroth-order oracle. International Conference on Learning Representations. Cited by: Problem Formulation.
  27. A. Madry, A. Makelov, L. Schmidt, D. Tsipras and A. Vladu (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Cited by: Introduction.
  28. J. Martens and R. Grosse (2015) Optimizing neural networks with kronecker-factored approximate curvature. In International conference on machine learning, pp. 2408–2417. Cited by: Scalability to High Dimensional Datasets, Zeroth-order Nature Gradient Descent.
  29. J. Martens (2014) New perspectives on the natural gradient method. CoRR abs/1412.1193. External Links: Link, 1412.1193 Cited by: Appendix B, Second-order optimization, Geometric Interpretation, Zeroth-order Nature Gradient Descent.
  30. J. Martens (2016) Second-order optimization for neural networks. University of Toronto (Canada). Cited by: Introduction, Second-order optimization.
  31. Y. Nesterov and V. Spokoiny (2017-04-01) Random gradient-free minimization of convex functions. Foundations of Computational Mathematics 17 (2), pp. 527–566. External Links: ISSN 1615-3383, Document, Link Cited by: Gaussian Smoothing and Gradient Estimation.
  32. Y. Ollivier (2015) Riemannian metrics for neural networks i: feedforward networks. Information and Inference: A Journal of the IMA 4 (2), pp. 108–153. Cited by: Outer Product Approximation, Monte Carlo Approximation, Zeroth-order Nature Gradient Descent.
  33. K. Osawa, Y. Tsuji, Y. Ueno, A. Naruse, R. Yokota and S. Matsuoka (2018) Second-order optimization method for large mini-batch: training resnet-50 on imagenet in 35 epochs. CVPR 2019. External Links: Link, 1811.12019 Cited by: Introduction, Scalability to High Dimensional Datasets.
  34. N. Papernot, P. D. McDaniel and I. J. Goodfellow (2016) Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. CoRR abs/1605.07277. External Links: Link, 1605.07277 Cited by: Heuristic black-box attacks, Evaluation on MNIST and CIFAR-10.
  35. R. Pascanu and Y. Bengio (2013) Natural gradient revisited. CoRR abs/1301.3584. External Links: Link, 1301.3584 Cited by: Outer Product Approximation.
  36. R. Pascanu and Y. Bengio (2013) Revisiting natural gradient for deep networks. arXiv preprint arXiv:1301.3584. Cited by: Appendix B, Geometric Interpretation.
  37. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna (2016) Rethinking the inception architecture for computer vision. CVPR 2016, pp. 2818–2826. Cited by: Experimental Results.
  38. C. Tu, P. Ting, P. Chen, S. Liu, H. Zhang, J. Yi, C. Hsieh and S. Cheng (2018) AutoZOOM: autoencoder-based zeroth order optimization method for attacking black-box neural networks. arXiv preprint arXiv:1805.11770. Cited by: Attack with gradient estimation.
  39. S. Wang, X. Wang, S. Ye, P. Zhao and X. Lin (2018-11) Defending dnn adversarial attacks with pruning and logits augmentation. In 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Vol. , pp. 1144–1148. External Links: Document, ISSN null Cited by: Introduction.
  40. S. Wang, X. Wang, P. Zhao, W. Wen, D. Kaeli, P. Chin and X. Lin (2018) Defensive dropout for hardening deep neural networks under adversarial attacks. In ICCAD ’18, External Links: ISBN 978-1-4503-5950-4, Link, Document Cited by: Introduction.
  41. X. Wang, S. Wang, P. Chen, Y. Wang, B. Kulis, X. Lin and S. Chin (2019) Protecting neural networks with hierarchical random switching: towards better robustness-accuracy trade-off for stochastic defenses. In IJCAI 2019, Cited by: Introduction.
  42. Y. Wang, S. Du, S. Balakrishnan and A. Singh (2018-09–11 Apr) Stochastic zeroth-order optimization in high dimensions. In AISTATS 2018, Proceedings of Machine Learning Research, Vol. 84. External Links: Link Cited by: Attack with gradient estimation.
  43. C. Xie, J. Wang and et. al. (2018-Oct.) Adversarial examples for semantic segmentation and object detection. In ICCV 2017, pp. 1378–1387. Cited by: Introduction.
  44. K. Xu, S. Liu, P. Zhao, P.-Y. Chen, H. Zhang, D. Erdogmus, Y. Wang and X. Lin (2018-08) Structured Adversarial Attack: Towards General Implementation and Better Interpretability. ArXiv e-prints. External Links: 1808.01664 Cited by: Introduction.
  45. K. Xu, H. Chen, S. Liu, P. Chen, T. Weng, M. Hong and X. Lin (2019) Topology attack and defense for graph neural networks: an optimization perspective. arXiv preprint arXiv:1906.04214. Cited by: Introduction.
  46. H. Ye, Z. Huang, C. Fang, C. J. Li and T. Zhang (2018) Hessian-aware zeroth-order optimization for black-box adversarial attack. CoRR abs/1812.11377. External Links: Link, 1812.11377 Cited by: Attack with gradient estimation.
  47. H. Zhang, T.-W. Weng and et. al. (2018) Efficient Neural Network Robustness Certification with General Activation Functions. In NIPS 2018, Cited by: Introduction.
  48. A. Zhao, K. Fu, S. Wang, J. Zuo, Y. Zhang, Y. Hu and H. Wang (2017-08) Aircraft recognition based on landmark detection in remote sensing images. IEEE Geoscience and Remote Sensing Letters 14 (8), pp. 1413–1417. External Links: Document, ISSN 1558-0571 Cited by: Introduction.
  49. P. Zhao, S. Liu, P. Chen, N. Hoang, K. Xu, B. Kailkhura and X. Lin (2019-10) On the design of black-box adversarial examples by leveraging gradient-free optimization and operator splitting method. In ICCV 2019, Cited by: Introduction.
  50. P. Zhao, S. Liu, Y. Wang and X. Lin (2018) An admm-based universal framework for adversarial attacks on deep neural networks. In ACM Multimedia 2018, External Links: ISBN 978-1-4503-5665-7, Link, Document Cited by: Introduction.
  51. P. Zhao, S. Wang, C. Gongye, Y. Wang, Y. Fei and X. Lin (2019) Fault sneaking attack: a stealthy framework for misleading deep neural networks. In DAC 2019, External Links: ISBN 978-1-4503-6725-7, Link, Document Cited by: Introduction.
  52. P. Zhao, K. Xu, S. Liu, Y. Wang and X. Lin (2019) ADMM attack: an enhanced adversarial attack for deep neural networks with undetectable distortions. In ASPDAC 2019, External Links: ISBN 978-1-4503-6007-4, Link, Document Cited by: Introduction.