Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural Gradient Descent
Abstract
Despite the great achievements of modern deep neural networks (DNNs), the vulnerability/robustness of state-of-the-art DNNs raises security concerns in many application domains requiring high reliability. Various adversarial attacks have been proposed to sabotage the learning performance of DNN models. Among those, black-box adversarial attack methods have received special attention owing to their practicality and simplicity. Black-box attacks usually prefer fewer queries in order to remain stealthy and keep costs low. However, most current black-box attack methods adopt first-order gradient descent, which may come with certain deficiencies such as relatively slow convergence and high sensitivity to hyperparameter settings. In this paper, we propose a zeroth-order natural gradient descent (ZO-NGD) method to design adversarial attacks, which incorporates the zeroth-order gradient estimation technique catering to the black-box attack scenario and second-order natural gradient descent to achieve higher query efficiency. The empirical evaluations on image classification datasets demonstrate that ZO-NGD can obtain significantly lower model query complexity compared with state-of-the-art attack methods.
Introduction
Modern technologies based on machine learning (ML), and specifically deep learning (DL), have achieved significant breakthroughs [23] in various applications, and deep neural networks (DNNs) serve as fundamental components in artificial intelligence. However, despite the outstanding performance, many recent studies demonstrate that state-of-the-art DNNs in computer vision [43, 48], speech recognition [1, 24] and deep reinforcement learning [25] are vulnerable to adversarial examples [14], which add carefully designed imperceptible distortions to legitimate inputs aiming to mislead the DNNs at test time. This raises concerns about DNN robustness in many applications with high reliability and dependability requirements.
With the recent exploration of adversarial attacks in image classification and object detection, the vulnerability/robustness of DNNs has attracted ever-increasing attention and effort in the research field known as adversarial machine learning. A large amount of effort has been devoted to: 1) designing adversarial perturbations in various ML applications [14, 7, 8, 52, 44]; 2) security evaluation methodologies to systematically estimate DNN robustness [4, 47]; and 3) defense mechanisms against adversarial attacks [6, 11, 27, 41, 40, 45, 39]. This work mainly investigates the first category to build the groundwork towards developing potential defensive measures in reliable ML.
However, most preliminary studies on this topic focus on the white-box setting, where the target DNN model is completely available to the attacker [14, 7, 50]. More specifically, the adversary can compute the gradients of the output with respect to the input to identify the effect of perturbing certain input pixels, with complete knowledge of the DNN model's internal structure, parameters and configurations. Despite the theoretical interest, it is unrealistic to adopt white-box adversarial methods to attack practical black-box threat models [51], where the internal model states/configurations are not revealed to the attacker (e.g., the Google Cloud Vision API). Instead, the adversary can only query the model by submitting inputs and obtain the corresponding output prediction probabilities when generating adversarial examples.
In the black-box adversarial setting, fewer queries generally make an attack more practical: a large number of queries risks exposing the adversary and incurs high financial cost when the service charges per query. Notably, to date most white-box [14, 7] and black-box attacks [9, 17, 49] are based on first-order gradient descent methods. Different from the widely utilized first-order optimization, the application of second-order optimization [30] is less explored due to its large computation overhead, although it may achieve a faster convergence rate. The work [33] adopts natural gradient descent (NGD) to train ResNet-50 on ImageNet in 35 epochs, demonstrating its great potential.
In this work, inspired by the superb convergence performance of NGD, we propose zeroth-order natural gradient descent (ZO-NGD), which incorporates the zeroth-order (ZO) method and second-order NGD, to generate black-box adversarial examples in a query-efficient manner. The contributions of this work are summarized as follows:
+ Design of adversarial attacks with NGD: To the best of our knowledge, we are the first to derive the Fisher information matrix (FIM) and adopt the second-order NGD method for adversarial attacks, which is different from other first-order-based white-box and black-box attack methods.
+ Co-optimization of zeroth-order and second-order methods: In the black-box setting, we incorporate zeroth-order random gradient estimation to estimate the gradients, which are not directly available, and leverage second-order NGD to achieve high query efficiency.
+ No additional queries to obtain the FIM: During the queries used to estimate the gradients of the loss, with our design the Fisher information is a byproduct that is extracted and evaluated without requiring additional query complexity.
+ Scalability to high-dimensional datasets: In NGD, it is computationally infeasible to compute and invert the FIM with billions of elements on large-scale datasets like ImageNet. To address this problem, we propose a method to avoid the computation and inversion of the FIM, so that the computation complexity is at most on the order of the input image dimension, rather than its square (the dimension of the FIM).
Related Work
In adversarial ML, the black-box setting is more practical, where the attacker can only query the target model by providing input images and receives the probability output for each input.
Black-box Attack
Attack with gradient estimation
In the black-box setting, as the gradients are not directly available, gradient estimation methods via zeroth-order optimization [42, 38, 13] are proposed to estimate the gradients. The ZOO method [9] performs pixel-level gradient estimation first and then performs the white-box C&W attack [7] with the estimated gradients. Despite its high success rate, it suffers from intensive computation and a huge number of queries due to element-wise gradient estimation.
More practical threat models are investigated in [17]. New attack methods based on Natural Evolutionary Strategies (NES) and Monte Carlo approximation of the gradients are developed to mislead ImageNet classifiers under more restrictive threat models. The work [18] further proposes to use prior information, including time-dependent priors and data-dependent priors, to enhance the query efficiency.
Different from the previous first-order-based methods, the work [46] exploits second-order optimization to improve query efficiency. In general, they explore Hessian information in the parameter space while our work explores the Hessian information in the distribution space (a.k.a. the information matrix). In particular, our method obtains the Fisher information for free during the first-order information (gradient) estimation, while the mentioned work needs additional queries for Hessian-based second-order optimization.
Heuristic black-box attacks
In the transfer attack [34], the attacker first trains a surrogate model with data labeled by the target model. White-box attacks are applied to the surrogate model and the generated examples are transferred to attack the target model. However, this approach may suffer from a low attack success rate due to low similarity between the surrogate model and the target model.
The boundary method [5] utilizes a conceptually simple idea to decrease the distortion through random walks and find successful adversarial perturbations while staying on the misclassification boundary. However, it suffers from high computational complexity and lacks algorithmic convergence guarantees.
Second-order optimization
First-order gradient descent methods have been extensively used in various ML tasks. They are easy to implement and suitable for large-scale DL, but they come with well-known deficiencies such as relatively slow convergence and sensitivity to hyperparameter settings. On the other hand, second-order optimization methods provide an elegant solution by selectively rescaling the gradient with curvature information [30]. As a second-order method, NGD proves to be Fisher efficient by using the FIM instead of the Hessian matrix [3, 29], but the large overhead to compute, store and invert the FIM may limit its application. To address this, Kronecker-Factored Approximate Curvature (K-FAC) was proposed to train DNNs [15].
Problem Formulation
Threat Model: In this paper, we mainly investigate black-box adversarial attacks on image classification with DNNs. Different from the white-box setting, which assumes full access to the DNN model and its internal structures/parameters, the black-box setting constrains the information available to the adversary. The attacker can only query the model by providing an input image and obtain the DNN's output scores/probabilities for that input. The black-box setting is more consistent with the scenario of "machine-learning-deployed-as-a-service" such as the Google Cloud Vision API.
In the following, we first provide a general problem formulation for adversarial attacks which can be adopted in either the white-box or the black-box setting. Then, an efficient solution is proposed for the black-box setting. We highlight that this method can be easily adopted in the white-box setting by using the exact gradients to achieve even higher query efficiency.
Attack Model: Given a legitimate image x with its correct class label t, the objective is to design an optimal adversarial perturbation δ so that the perturbed example x + δ can lead to a misclassification by the DNN model trained on legitimate images, i.e., the model misclassifies x + δ to another class t′ ≠ t. The perturbation δ can be obtained by solving the following problem,

(1)  min_δ f(x + δ)   s.t.   ‖δ‖_∞ ≤ ε,   x + δ ∈ [0, 1]^d,

where f(x + δ) denotes an attack loss incurred by misclassifying x + δ to another class, and ‖·‖_∞ denotes the ℓ_∞ norm. In problem (1), the constraints on δ ensure that the perturbed noise at each pixel (normalized to [0, 1]) is imperceptible up to a predefined tolerance threshold ε.
Motivated by [7], the loss function is expressed as

(2)  f(x + δ) = max{ log p_t(x + δ) − max_{j ≠ t} log p_j(x + δ), −κ },

where p_j(x) denotes the model's prediction score or probability of the j-th class for the input x, and κ ≥ 0 is a confidence parameter usually set to zero. Basically, f achieves its minimum value 0 (for κ = 0) if log p_t(x + δ) is smaller than max_{j ≠ t} log p_j(x + δ), indicating there is a label with a higher probability than the correct label t, and thus a misclassification is achieved by adding the perturbation δ to x. In this paper, we mainly investigate the untargeted attack, which does not specify the target misclassified label. The targeted attack can be easily implemented following nearly the same problem formulation and loss function with slight modifications [7, 18]. We focus on the general formulation here and omit the targeted attack formulation.

Note that in Eq. (2), we use the log probability log p_j(x + δ) instead of p_j(x + δ) because the output probability distribution tends to have one dominating class. The log operator is used to reduce the effect of the dominating class while still preserving the probability order of all classes.
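The attack loss can be illustrated with a short NumPy sketch; the function name `attack_loss` and its interface are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def attack_loss(log_probs, true_label, kappa=0.0):
    """Untargeted C&W-style attack loss (illustrative sketch).

    log_probs: 1-D array of per-class log probabilities for x + delta.
    The loss reaches its floor (-kappa) once some wrong class outscores
    the true class by at least kappa.
    """
    others = np.delete(log_probs, true_label)
    return max(log_probs[true_label] - others.max(), -kappa)
```

With κ = 0, the loss is positive while the correct class still has the highest log probability and becomes zero as soon as any other class overtakes it, i.e., as soon as the attack succeeds.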
As most white-box attack methods rely on gradient descent, the unavailability of the gradients in black-box settings limits their application. Gradient estimation methods (known as zeroth-order optimization) are applied to perform the standard projected first-order gradient descent process [17, 26] as follows,

(3)  δ_{k+1} = Π_S( δ_k − η ∇̂f(x + δ_k) ),

where η is the learning rate, ∇̂f denotes the estimated gradient of the loss, and Π_S performs the projection onto the feasible set S = {δ : ‖δ‖_∞ ≤ ε, x + δ ∈ [0, 1]^d}.
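One projected step of this form can be sketched as follows, assuming an ℓ∞ ball of radius `eps` and pixel values normalized to [0, 1]; the helper name and signature are hypothetical.

```python
import numpy as np

def projected_step(x, delta, grad_est, lr, eps):
    """One projected (estimated-)gradient step, a sketch of Eq. (3).

    Projects onto the feasible set {delta : ||delta||_inf <= eps
    and x + delta in [0, 1]}.
    """
    delta = delta - lr * grad_est               # gradient step
    delta = np.clip(delta, -eps, eps)           # l_inf-ball constraint
    delta = np.clip(x + delta, 0.0, 1.0) - x    # valid pixel range
    return delta
```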
In the black-box setting, it is often the case that the query number is limited or high query efficiency is required by the adversary. The zeroth-order method extracts gradient information of the objective function, and the first-order method is then applied to minimize the loss due to its wide application in ML. However, the second-order information contained in the queries is not fully exploited. In this paper, we aim to take advantage of the model's second-order information and propose a novel method named ZO-NGD optimization.
Zeroth-order Natural Gradient Descent
The proposed method is based on NGD [29] and ZO optimization [13]. In applications of optimizing probabilistic models, NGD updates the parameters with the natural gradient, obtained by preconditioning the gradient with the inverse FIM. NGD is a potentially attractive alternative method as it typically requires fewer total iterations than gradient descent [32, 28, 16].
Motivated by the perspective of information geometry, NGD defines the steepest descent direction in the realizable distribution space instead of the parameter space. The distance in the distribution space is measured with a special "Riemannian metric" [2], which is different from the standard Euclidean distance metric in the parameter space. This Riemannian metric does not rely on the parameterization like the Euclidean metric does, but depends on the distributions themselves. Thus it is invariant to any smooth and invertible reparameterization of the model. More details are discussed in the Geometric Interpretation section.
Next we introduce the FIM and the implementation details to perform NGD. Basically, the proposed framework first queries the model to estimate the gradients and the Fisher information. Then, after the damping and inversion processes, the natural gradient is obtained to update the perturbation. Algorithm 1 shows the pseudocode of ZO-NGD.
Fisher Information Matrix and Natural Gradient
We introduce and derive the FIM in this section. In general, finding an adversarial example can be formulated as a training problem over the perturbation δ. In the idealized setting, input vectors x are drawn independently from a distribution Q_x with density function q(x), and the corresponding output y is drawn from a conditional target distribution R_{y|x} with density function r(y|x). The target joint distribution is Q_{x,y} with density q(x) r(y|x). By finding an adversarial perturbation δ, we obtain the learned distribution P_{y|x}(δ), whose density is p(y|x, δ).
In statistics, the score function [10] indicates how sensitive a likelihood function is to its parameters. Explicitly, the score function for δ is the gradient of the log-likelihood with respect to δ, as below,

(4)  s(δ) = ∇_δ log p(y | x, δ).
Lemma 1
The expected value of the score function with respect to p(y | x, δ) is zero.
The proof is shown in the appendix. We can define an uncertainty measure around the expected value (i.e., the covariance of the score function) as follows,

(5)  Cov[s(δ)] = E[ (s(δ) − E[s(δ)]) (s(δ) − E[s(δ)])ᵀ ].

The covariance of the score function above is the definition of the Fisher information. It is in the form of a matrix and, using Lemma 1, the FIM can be written as

(6)  F = E_{x∼Q_x} E_{y∼p(y|x,δ)} [ ∇_δ log p(y|x,δ) ∇_δ log p(y|x,δ)ᵀ ].
Note that this expression involves the log-likelihoods of all possible values of the classes y, not only the actual label t of each data sample. As δ corresponds to only a single input x, the training set contains only one data sample. Besides, since p(x, y | δ) = q(x) p(y | x, δ) and q(x) does not depend on δ, we have

(7)  ∇_δ log p(x, y | δ) = ∇_δ log p(y | x, δ).

Then the FIM can be transformed to

(8)  F = E_{y∼p(y|x,δ)} [ ∇_δ log p(y|x,δ) ∇_δ log p(y|x,δ)ᵀ ].
The exact expectation over C categories is expressed as,

(9)  F = Σ_{i=1}^{C} p(i|x,δ) ∇_δ log p(i|x,δ) ∇_δ log p(i|x,δ)ᵀ.
The usual definition of the natural gradient is

(10)  ∇̃f = F⁻¹ ∇_δ f(x + δ),

and NGD minimizes the loss function through

(11)  δ_{k+1} = δ_k − η F⁻¹ ∇_δ f(x + δ_k).
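As a concrete illustration, the exact FIM of Eq. (9) is just a probability-weighted sum of score outer products. In the sketch below, `score_fn` is a caller-supplied stand-in for the per-class score vectors ∇_δ log p(i|x,δ) (which the paper estimates from queries); all names are illustrative.

```python
import numpy as np

def exact_fim(probs, score_fn):
    """Exact FIM of Eq. (9): sum over all C classes of
    p(i|x,delta) * score_i score_i^T, scores supplied by the caller.
    """
    d = score_fn(0).shape[0]
    F = np.zeros((d, d))
    for i, p in enumerate(probs):
        s = score_fn(i)
        F += p * np.outer(s, s)   # probability-weighted outer product
    return F
```

The result is symmetric positive semidefinite by construction, which is what makes it usable as a curvature matrix in Eq. (11).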
Outer Product and Monte Carlo Approximation
The FIM involves an expectation over all possible classes drawn from the output probability distribution. In the case of a large number of classes, it is impractical to compute the exact FIM due to the intensive computation. To address the high computation overhead, there are in general two methods to approximate the FIM: the outer product approximation and the Monte Carlo approximation.
Outer Product Approximation

The outer product approximation keeps only the dominant term in the sum of Eq. (9), i.e., the term associated with the correct label t,

(12)  F ≈ p(t|x,δ) ∇_δ log p(t|x,δ) ∇_δ log p(t|x,δ)ᵀ,

which is reasonable when p(t|x,δ) dominates the output distribution (see the ablation study in the appendix).
Monte Carlo Approximation
Monte Carlo (MC) approximation [32] replaces the expectation over y with M samples,

(13)  F ≈ (1/M) Σ_{m=1}^{M} ∇_δ log p(y_m|x,δ) ∇_δ log p(y_m|x,δ)ᵀ,

where each y_m is drawn from the distribution p(y|x,δ). The MC natural gradient works well in practice even with a small number of samples.
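A minimal sketch of the MC estimate, again with a caller-supplied `score_fn` standing in for the score vectors; names and interface are assumptions for illustration.

```python
import numpy as np

def mc_fim(probs, score_fn, n_samples, rng):
    """Monte Carlo FIM of Eq. (13): sample class labels from the model's
    output distribution and average the score outer products."""
    labels = rng.choice(len(probs), size=n_samples, p=probs)
    d = score_fn(0).shape[0]
    F = np.zeros((d, d))
    for y in labels:
        s = score_fn(y)
        F += np.outer(s, s)
    return F / n_samples
```

As the number of samples grows, the estimate converges to the exact expectation of Eq. (9).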
For higher query efficiency, we adopt the outer product approximation as it does not require additional queries.
Gaussian Smoothing and Gradient Estimation
To compute the FIM and perform NGD, we need to obtain the gradients of the loss function f and the gradients of the log-likelihood log p(y|x,δ), which are not directly available in the black-box setting.
To address this difficulty, we first introduce the Gaussian smoothing approximation of f [31],

(14)  f_μ(δ) = E_{u∼N(0,I)} [ f(x + δ + μu) ],

where μ > 0 is a smoothing parameter and u is a random direction vector drawn from the standard Gaussian distribution. Its gradient can be written as

(15)  ∇_δ f_μ(δ) = E_{u∼N(0,I)} [ ((f(x + δ + μu) − f(x + δ)) / μ) u ],

where f_μ is the Gaussian smoothing function. Thus, based on Eq. (15), we apply the zeroth-order random gradient estimation to estimate the gradients by

(16)  ∇̂f(x + δ) = (1/(μq)) Σ_{i=1}^{q} [ f(x + δ + μu_i) − f(x + δ) ] u_i,

and

(17)  ∇̂ log p(t | x + δ) = (1/(μq)) Σ_{i=1}^{q} [ log p(t | x + δ + μu_i) − log p(t | x + δ) ] u_i,

where q is the number of random direction vectors and u_1, …, u_q denote independent and identically distributed (i.i.d.) random direction vectors following the standard Gaussian distribution N(0, I).
We note that in each gradient estimation step, by querying the model q + 1 times (the q perturbed points plus the base point), we can simultaneously obtain both ∇̂f and ∇̂ log p(t | x + δ) as demonstrated in Algorithm 1. Different from zeroth-order gradient descent methods which only estimate the gradients of the loss function (such as ZOO [9] and NES-PGD [17]), ZO-NGD obtains and computes the FIM from the same query outputs without incurring additional query complexity. This is one major difference between ZO-NGD and other zeroth-order methods. Thus, higher query efficiency can be achieved by leveraging the FIM and second-order optimization.
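The shared estimation step can be sketched as below: the same q + 1 query points feed both the loss-gradient estimate of Eq. (16) and the score estimate of Eq. (17) needed for the FIM. The black-box callables `loss_fn`/`logp_fn`, the function name, and the default parameters are illustrative assumptions.

```python
import numpy as np

def zo_gradients(loss_fn, logp_fn, delta, mu=0.01, q=20, rng=None):
    """Zeroth-order random gradient estimation, a sketch of Eqs. (16)-(17).

    With q + 1 model queries (the base point plus q Gaussian-perturbed
    points), both the loss gradient and the log-likelihood gradient
    (the score used to build the FIM) are estimated simultaneously.
    """
    rng = rng if rng is not None else np.random.default_rng()
    d = delta.shape[0]
    f0, lp0 = loss_fn(delta), logp_fn(delta)
    g_loss, g_logp = np.zeros(d), np.zeros(d)
    for _ in range(q):
        u = rng.standard_normal(d)             # random direction u_i
        z = delta + mu * u
        g_loss += (loss_fn(z) - f0) / mu * u   # Eq. (16) term
        g_logp += (logp_fn(z) - lp0) / mu * u  # Eq. (17) term, same query
    return g_loss / q, g_logp / q
```

For a linear objective the estimator is unbiased, so averaging over many directions recovers the true gradient.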
Damping for Fisher Information Matrix
The inverse of the FIM is required for the natural gradient. However, the eigenvalue distribution of the FIM is known to have an extremely long tail [19], where most of the eigenvalues are close to zero. This in turn causes the eigenvalues of the inverse FIM to be extremely large, leading to unstable training. To mitigate this problem, a damping technique is used to add a positive value to the diagonal of the FIM to stabilize the training, as shown below,

(18)  F̂ = F + λI,

where λ > 0 is a constant. As the damping limits the maximum eigenvalue of the inverse FIM, we can restrict the norm of the natural gradients. This prevents ZO-NGD from moving too far in flat directions.
With the obtained FIM, the perturbation update is

(19)  δ_{k+1} = Π_S( δ_k − η (F + λI)⁻¹ ∇̂f(x + δ_k) ).

ZO-NGD extracts the Fisher information to perform second-order optimization for a faster convergence rate and better query efficiency.
Scalability to High Dimensional Datasets
Note that the FIM has a dimension of d × d, where d is the dimension of the input image. On the ImageNet dataset, whose images typically have hundreds of thousands of input dimensions, the FIM would have billions of elements, and thus it is quite difficult to compute or store the FIM, not to mention its inverse. In the application of training DNN models, the Kronecker-Factored Approximate Curvature (K-FAC) method [28, 33] is adopted to deal with the high dimensionality of the DNN model. However, K-FAC methods may not be suitable in the application of finding adversarial examples, as the assumption of uncorrelated channels is not valid and thus we cannot apply the block diagonalization method to the FIM. Instead, we propose another method to compute the high-dimensional update as follows. First we have

(20)  F + λI = λI + vvᵀ,

where v = √(p(t|x,δ)) ∇̂ log p(t | x + δ) by the outer product approximation in Eq. (12). The inverse matrix can be represented as,

(21)  (λI + vvᵀ)⁻¹ = (1/λ) ( I − vvᵀ / (λ + vᵀv) ).

This can be verified simply by checking their multiplication and we omit the proof here. Then the gradient update in Eq. (19) is

(22)  (F + λI)⁻¹ ∇̂f = (1/λ) ( ∇̂f − (vᵀ∇̂f / (λ + vᵀv)) v ).

During the computation of Eq. (22), we first compute vᵀ∇̂f and obtain a scalar; then the result is simply a weighted sum of two vectors. Although the FIM and its inverse might have billions of elements, we avoid computing them directly, and the dimension of the internal computation is at most on the level of the image dimension d, rather than its square d². Thus, the ZO-NGD method can be applied to datasets with high-dimensional images.
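The update of Eq. (22) can be sketched in a few lines: a rank-one (Sherman-Morrison style) inverse that never materializes the d × d matrix. The function name is hypothetical; in the paper's setting, v would be √(p_t) times the estimated score.

```python
import numpy as np

def natural_grad_update(grad, v, lam):
    """Compute (F + lam*I)^{-1} grad with F ~= v v^T (Eq. (20)),
    using the rank-one inverse of Eq. (21); only O(d) memory is needed.
    """
    scalar = (v @ grad) / (lam + v @ v)   # scalar computed first, Eq. (22)
    return (grad - scalar * v) / lam      # then a weighted sum of vectors
```

The result matches a direct solve with the explicit matrix, at a cost linear in the input dimension.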
Geometric Interpretation
We provide a geometric interpretation for the natural gradient here. The negative gradient −∇f(δ) can be interpreted as the steepest descent direction in the sense that it yields the most reduction in f per unit of change of δ, where the change is measured by the standard Euclidean norm [29], as shown below,

(23)  −∇f(δ) / ‖∇f(δ)‖ = lim_{ε→0} (1/ε) argmin_{d : ‖d‖ ≤ ε} f(δ + d).

By following this direction, we can obtain the change of δ within a certain neighbourhood that minimizes the loss function.
Lemma 2
The negative natural gradient is the steepest descent direction in the distribution space.
We provide the proof of Lemma 2 in the appendix.
Experimental Results
In this section, we present the experimental results of the ZO-NGD method. We compare ZO-NGD with various attack methods on three image classification datasets: MNIST [22], CIFAR-10 [20] and ImageNet [12].

We train two networks for the MNIST and CIFAR-10 datasets, respectively. The model for MNIST achieves 99.6% accuracy with four convolutional layers, two max pooling layers, two fully connected layers and a softmax layer. For CIFAR-10, we adopt the same model architecture as for MNIST, achieving 80% accuracy. For ImageNet, a pretrained Inception v3 network [37] is utilized instead of training our own model, attaining 96% top-5 accuracy. All experiments are performed on machines with NVIDIA GTX 1080 TI GPUs.
Evaluation of White-box Attack
We first check the white-box setting, where we compare the proposed NGD with PGD from adversarial training. PGD is a typical first-order method while NGD utilizes the second-order FIM. A query here is defined as one forward pass and one subsequent backpropagation, as we need to obtain the gradients through backpropagation. We report the average number of queries over 500 images for successful adversaries on each dataset: on MNIST, NGD requires 2.12 queries while PGD needs 4.88; on CIFAR-10, NGD requires 2.06 while PGD needs 4.21; on ImageNet, NGD requires 2.20 while PGD needs 5.62. We can see that NGD achieves higher query efficiency by incorporating the FIM.
Evaluation on MNIST and CIFAR-10
In the evaluation on MNIST and CIFAR-10, we select 2000 correctly classified images from the MNIST and CIFAR-10 test sets, respectively, and perform black-box attacks on these images. We compare the ZO-NGD method with the transfer attack [34], the ZOO black-box attack [9], and the natural-evolution-strategy-based projected gradient descent method (NES-PGD) [17]. For the transfer attack [34], we apply the C&W attack [7] to the surrogate model.
The implementations of ZOO and NES-PGD are based on the GitHub code released by the authors.
Table 1: Attack success rate (ASR), average queries over successful attacks, and query reduction ratio (relative to ZOO) on MNIST and CIFAR-10.

Dataset   | Attack method   | ASR    | Avg. queries | Query reduction
----------|-----------------|--------|--------------|----------------
MNIST     | Transfer attack | 82%    | –            | –
MNIST     | ZOO attack      | 100%   | 8,300        | 0%
MNIST     | NES-PGD         | 98.2%  | 1,243        | 85%
MNIST     | ZO-NGD          | 98.7%  | 523          | 93.7%
CIFAR-10  | Transfer attack | 85%    | –            | –
CIFAR-10  | ZOO attack      | 99.8%  | 6,500        | 0%
CIFAR-10  | NES-PGD         | 98.9%  | 417          | 93.6%
CIFAR-10  | ZO-NGD          | 99.2%  | 131          | 98%
The experimental results are summarized in Table 1. We show the success rate and the average queries over successful adversarial examples for the black-box attack methods on the MNIST and CIFAR-10 datasets. As shown in Table 1, the transfer attack does not achieve a high success rate due to the difference between the surrogate model and the original target model. The ZOO attack method can achieve a high success rate at the cost of excessive query complexity since it performs gradient estimation for each pixel of the input image. We can observe that the ZO-NGD method requires significantly fewer queries than the NES-PGD method. NES-PGD uses natural evolutionary strategies for gradient estimation and then performs first-order gradient descent to obtain the adversarial perturbations. Compared with NES-PGD, the proposed ZO-NGD not only estimates the first-order gradients of the loss function, but also obtains the second-order Fisher information from the same queries without incurring additional query complexity, leading to higher query efficiency. From Table 1, we can observe that the ZO-NGD method attains the smallest number of queries to successfully obtain adversarial examples in the black-box setting. Benchmarked against the ZOO method, the query reduction ratio of ZO-NGD is as high as 93.7% on MNIST and 98% on CIFAR-10.
Evaluation on ImageNet
We perform black-box adversarial attacks on ImageNet where 1000 correctly classified images are randomly selected. On ImageNet, we compare the proposed ZO-NGD with the ZOO attack, the NES-PGD method and the bandit attack with time- and data-dependent priors (denoted Bandits[TD]) [18]. The transfer attack is not performed since it is not easy to train a surrogate model on ImageNet. The Bandits[TD] method makes use of prior information for gradient estimation, including time-dependent priors, which exploit the heavily correlated successive gradients, and data-dependent priors, which exploit the spatially local similarity exhibited in images. After gradient estimation with the priors or bandit information, a first-order gradient descent method is applied.
Table 2: ASR, average queries over successful attacks, and query reduction ratio (relative to ZOO) on ImageNet.

Attack method | ASR   | Avg. queries | Query reduction
--------------|-------|--------------|----------------
ZOO attack    | 98.9% | 16,800       | 0%
NES-PGD       | 94.6% | 1,325        | 92.1%
Bandits[TD]   | 96.1% | 791          | 95.3%
ZO-NGD        | 97%   | 582          | 96.5%
We present the performance evaluation on ImageNet in Table 2. The success rate and the average queries over successful attacks for various black-box attack methods are reported. Table 2 shows the ZOO attack method can achieve a high success rate with high query complexity due to its element-wise gradient estimation. Similar to before, the ZO-NGD method requires a much smaller number of queries than the NES-PGD method due to the faster convergence rate of second-order optimization exploiting the Fisher information. We also find that ZO-NGD outperforms the Bandits[TD] method in terms of query efficiency. The Bandits[TD] method enhances the query efficiency of gradient estimation through the incorporation of prior information about the gradients, but its attack methodology is still based on first-order gradient descent. As observed from Table 2, the ZO-NGD method achieves the highest query efficiency for successful adversarial attacks in the black-box setting: it obtains a 96.5% query reduction ratio on ImageNet when compared with the ZOO method. In Figure 1, we show some legitimate ImageNet images and their corresponding adversarial examples obtained by ZO-NGD. We can observe that the adversarial perturbations are imperceptible. More examples on MNIST and CIFAR-10 are shown in the appendix.
Ablation study
In this ablation study, we perform sensitivity analysis on the proposed ZO-NGD method based on variations in model architectures and different parameter settings. Below we summarize the conclusions and findings from this ablation study and report the details in the appendix. (1) Tested on VGG16 and ResNet and varying the parameters of ZO-NGD, the results demonstrate the consistently superior performance of ZO-NGD by leveraging second-order optimization. (2) We inspect the approximation techniques used in ZO-NGD, including damping and the outer product method. The results show that there is a wide range of proper values for which damping works effectively to reduce the loss, and that the outer product is a reasonable approximation based on the empirical evidence. We also note that the ASR of ZOO is higher than that of ZO-NGD; we provide a discussion of ASR vs. query number in the appendix.
Query Number Distribution
Figure 2 shows the cumulative distribution function (CDF) of the query number for 1000 images on the three datasets, validating ZO-NGD's query efficiency.
Transferability
The transferability of adversarial examples is an interesting and valuable metric to measure attack performance. To show the transferability, we use 500 targeted adversarial examples generated by ZO-NGD on ImageNet with the Inception model to attack the ResNet and VGG16 models. They achieve 94.4% and 95.6% ASR, respectively, demonstrating the high transferability of our method. Our transferred ASR is also higher than that of NES-PGD (92.1% and 92.9% ASR).
Conclusion
In this paper, we propose ZO-NGD, a novel method to achieve high query efficiency in black-box adversarial attacks. It incorporates ZO random gradient estimation and the second-order FIM for NGD. The performance evaluation on three image classification datasets demonstrates the effectiveness of the proposed method in terms of fast convergence and improved query efficiency over state-of-the-art methods.
Acknowledgments
This work is partly supported by the National Science Foundation CNS-1932351, and is also based upon work partially supported by the Department of Energy National Energy Technology Laboratory under Award Number DE-OE0000911.
Disclaimer
This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.
Appendix A Appendix
Appendix B Geometric Interpretation
We provide a geometric interpretation for the natural gradient in this section. The negative gradient −∇f(δ) can be interpreted as the steepest descent direction in the sense that it yields the most reduction in f per unit of change of δ, where the change is measured by the standard Euclidean norm [29], as shown below,

(24)  −∇f(δ) / ‖∇f(δ)‖ = lim_{ε→0} (1/ε) argmin_{d : ‖d‖ ≤ ε} f(δ + d).

By following this direction, we can obtain the change of δ within a certain neighbourhood that minimizes the loss function.
As the loss function is related to the likelihood, we can explore the steepest direction to minimize the loss in the space of all possible likelihoods (i.e., the distribution space). KL divergence [21] is a popular measure of the distance between two distributions. For two distributions P_δ and P_{δ′}, their KL divergence is defined as

(25)  KL(P_δ ‖ P_{δ′}) = Σ_y p(y|x,δ) log ( p(y|x,δ) / p(y|x,δ′) ).
Lemma 3
The FIM is the Hessian of the KL divergence between the two distributions P_δ and P_{δ′}, taken with respect to δ′ and evaluated at δ′ = δ.
The proof is shown in the appendix. By Lemma 3, the FIM can be regarded as the curvature in the distribution space.
Lemma 4
The second-order Taylor expansion of the KL divergence can be expressed as

(26)  KL(P_δ ‖ P_{δ+d}) ≈ (1/2) dᵀ F d.
We provide a proof in the appendix.
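Lemma 3 (and hence the quadratic form of Lemma 4) can be checked numerically on a toy categorical model; the softmax parameterization and the finite-difference Hessian below are illustrative stand-ins for the paper's setting, not part of the method.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence between two categorical distributions."""
    return float(np.sum(p * np.log(p / q)))

# Toy model p(y | theta) = softmax(theta), standing in for p(y | x, delta).
# For a categorical softmax the FIM has the closed form diag(p) - p p^T;
# we compare it against the finite-difference Hessian of
# KL(p_theta || p_theta') evaluated at theta' = theta.
theta = np.array([0.3, -0.5, 0.8])
p = softmax(theta)
F = np.diag(p) - np.outer(p, p)   # FIM of the softmax model

h, d = 1e-4, len(theta)
H = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        def kl_shift(ti, tj):
            t = theta.copy()
            t[i] += ti
            t[j] += tj
            return kl(p, softmax(t))
        # central mixed second difference
        H[i, j] = (kl_shift(h, h) - kl_shift(h, -h)
                   - kl_shift(-h, h) + kl_shift(-h, -h)) / (4 * h * h)
```

Up to finite-difference error, H agrees with F, matching the statement of Lemma 3.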
Next we explore the direction that minimizes the loss function in the distribution space, where the distance is measured by the KL divergence. Although in general the KL divergence is not symmetric, it is (approximately) symmetric in a local neighborhood. The problem can be formulated as

(27)  min_d f(δ + d)   s.t.   KL(P_δ ‖ P_{δ+d}) = c,

where c is a certain constant. The purpose of fixing the KL divergence to a constant is to move along the distribution space with a constant speed, regardless of the curvature.
Thus, we can obtain Lemma 2 and show its proof in the appendix. In the parameter space, the negative gradient is the steepest descent direction to minimize the loss function. By contrast, in the distribution space, the steepest descent direction is the negative natural gradient. Thus, the direction in distribution space defined by the natural gradient will be invariant to the choice of parameterization [36], i.e., it will not be affected by how the model is parametrized, but only depends on the distribution induced by the parameters.
Appendix C Proof of Lemmas
Proof of Lemma 1
Proof of Lemma 1. Using ∇_δ log p = ∇_δ p / p, we have

E_{p(y|x,δ)}[s(δ)] = ∫ p(y|x,δ) ∇_δ log p(y|x,δ) dy = ∫ ∇_δ p(y|x,δ) dy = ∇_δ ∫ p(y|x,δ) dy = ∇_δ 1 = 0.
Proof of Lemma 3
Proof of Lemma 3. The gradient of the KL divergence with respect to δ′ can be expressed as

(28)  ∇_{δ′} KL(P_δ ‖ P_{δ′}) = −Σ_y p(y|x,δ) ∇_{δ′} log p(y|x,δ′),

and differentiating once more gives

(29)  ∇²_{δ′} KL(P_δ ‖ P_{δ′}) = −Σ_y p(y|x,δ) ∇²_{δ′} log p(y|x,δ′).

The Hessian of the KL divergence evaluated at δ′ = δ is therefore

(30)  H = −E_{p(y|x,δ)}[ ∇²_δ log p(y|x,δ) ] = E_{p(y|x,δ)}[ ∇_δ log p(y|x,δ) ∇_δ log p(y|x,δ)ᵀ ] = F,

where the last step uses the standard identity equating the negative expected Hessian of the log-likelihood with the expected outer product of the score.
Proof of Lemma 4

Proof of Lemma 4. Expanding the KL divergence to second order in d around d = 0,

(31)  KL(P_δ ‖ P_{δ+d}) ≈ KL(P_δ ‖ P_δ) + (∇_d KL |_{d=0})ᵀ d + (1/2) dᵀ F d = (1/2) dᵀ F d,

since KL(P_δ ‖ P_δ) = 0, the gradient at d = 0 vanishes because the expected score is zero (Lemma 1), and the Hessian equals F by Lemma 3.
Proof of Lemma 2
Proof of Lemma 2. The Lagrangian function of the minimization in Eq. (27) can be formulated as

(32)  L(d, λ) = f(δ + d) + λ ( KL(P_δ ‖ P_{δ+d}) − c ) ≈ f(δ) + ∇f(δ)ᵀ d + λ ( (1/2) dᵀ F d − c ),

where λ is the Lagrange multiplier and we use the first-order expansion of f together with Lemma 4. To solve this minimization, we set its derivative with respect to d to zero:

(33)  ∇f(δ) + λ F d = 0,

(34)  F d = −(1/λ) ∇f(δ),

(35)  d = −(1/λ) F⁻¹ ∇f(δ).
We can see that the negative natural gradient defines the steepest direction in the distribution space.
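The stationarity condition above can be sanity-checked numerically on a toy instance (arbitrary $F$, gradient, and multiplier chosen by us for illustration):

```python
import numpy as np

g = np.array([0.5, -1.0, 2.0])          # gradient of the loss
F = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.1],
              [0.0, 0.1, 0.5]])         # symmetric positive-definite Fisher
lam = 0.7                               # Lagrange multiplier

delta = -(1.0 / lam) * np.linalg.solve(F, g)  # candidate solution, Eq. (35)

# The derivative of the Lagrangian, g + lam * F @ delta, vanishes at delta.
residual = g + lam * F @ delta
print(np.abs(residual).max())  # ~0 up to floating-point error
```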
Appendix D Adversarial Examples
[Figure: adversarial examples generated by ZO-NGD. Top row: MNIST images with labels 3, 6, 8, 7, 5. Bottom row: CIFAR-10 images with labels deer, ship, truck, frog, cat.]
Appendix E Ablation study
Various Models
To check the performance of the proposed method on various model architectures, we performed experiments on three datasets using two additional models (VGG16 and ResNet) and summarize the results in Table A1. The proposed ZO-NGD is more query-efficient than NES-PGD.
Table A1: Attack success rate (ASR) and average query count for various models.

          MNIST (VGG)       CIFAR-10 (VGG)
          ASR      Query    ASR      Query
NES-PGD   99.2%    1082     98.3%    381
ZO-NGD    99.5%    548      98.2%    152

          ImageNet (VGG)    ImageNet (ResNet)
          ASR      Query    ASR      Query
NES-PGD   96.8%    1136     97.2%    1281
ZO-NGD    96.5%    594      98.1%    624
Parameter Analysis
The sensitivity analysis on the two parameters is demonstrated in Table A2. As observed from Table A2, the ZO-NGD performance is robust to different values of the first parameter (with the second fixed), and a larger value of the second parameter leads to fewer queries.

Table A2: Parameter sensitivity of ASR and average query count.

Value of the first parameter (second fixed)
Value    1        0.1      0.01
ASR      97.3%    97.0%    96.6%
Query    626      582      596

Value of the second parameter (first fixed)
Value    0.15     0.2      0.25
ASR      96.1%    97.0%    98.2%
Query    619      583      559
In second-order optimization, damping is a common technique to compensate for errors in the quadratic approximation, and the damping parameter plays a key role. To show its influence, we demonstrate the loss after 2 ADMM iterations for a wide range of parameter values with the same initialization in Figure A2(a). We observe that 0.01 or 0.001 is an appropriate choice to achieve higher query efficiency.
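A minimal sketch of the damping idea, assuming Tikhonov-style damping $F + \lambda I$ (a standard choice in second-order methods; the paper's exact damping scheme may differ):

```python
import numpy as np

grad = np.array([1.0, 1.0])
# A near-singular Fisher estimate: inverting it directly amplifies the
# step along the almost-null direction.
fisher = np.diag([10.0, 1e-8])

steps = {}
for lam in [0.0, 1e-3, 1e-2]:
    damped = fisher + lam * np.eye(2)        # Tikhonov-style damping
    steps[lam] = np.linalg.solve(damped, grad)
    print(lam, steps[lam])

# Without damping, the second component of the step explodes (~1e8);
# moderate damping keeps the update bounded.
```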
Drift of Outer Product
Although we adopt the outer-product approximation (Equ. (12)) of Equ. (9), the empirical evidence below motivates why the term in Equ. (12) dominates Equ. (9), making the approximation reasonable. For a well-trained model, the prediction probability of a correctly classified image usually dominates the probability distribution; that is, the probability of the correct label is usually much larger than the other probabilities. We plot the average prediction probability distribution of 1000 correctly classified images on CIFAR-10 and ImageNet for their top-10 labels in Figure A2(b). As observed from Figure A2(b), the correct label usually dominates the probability distribution, leading to a small approximation loss from Equ. (9) to Equ. (12).
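This dominance argument can be sketched numerically. The following toy example (ours; the per-class gradients are random unit-norm placeholders, not the paper's data) builds a Fisher-style weighted sum of outer products with a peaked prediction distribution and compares it with the single top-label term:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-class gradients g_y (stand-ins for grad_x log p(y|x)),
# normalized to unit norm, and a peaked prediction distribution, as
# observed for well-trained models.
num_classes, dim = 10, 5
g = rng.standard_normal((num_classes, dim))
g /= np.linalg.norm(g, axis=1, keepdims=True)

p = np.full(num_classes, 0.005)
p[0] = 1.0 - 0.005 * (num_classes - 1)  # top label carries prob 0.955

full_fim = sum(p[y] * np.outer(g[y], g[y]) for y in range(num_classes))
top_term = p[0] * np.outer(g[0], g[0])  # rank-one outer-product term

rel_err = np.linalg.norm(full_fim - top_term) / np.linalg.norm(full_fim)
print(rel_err)  # small: the top-label term dominates the weighted sum
```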
Figure A2: (a) influence of the damping parameter on MNIST; (b) average prediction probability distribution.
Footnotes
 The code and appendix are available at https://github.com/LinLabNEU/ZO_NGD_blackbox.
References
 (2018) Did you hear that? Adversarial examples against automatic speech recognition. CoRR abs/1801.00554.
 (2007) Methods of information geometry. Vol. 191, American Mathematical Soc.
 (1998) Natural gradient works efficiently in learning. Neural Computation 10 (2), pp. 251–276.
 (2014) Security evaluation of pattern classifiers under attack. IEEE TKDE 26 (4), pp. 984–996.
 (2017) Decision-based adversarial attacks: reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248.
 (2017) Randomized prediction games for adversarial machine learning. IEEE Transactions on Neural Networks and Learning Systems 28 (11), pp. 2466–2478.
 (2017) Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pp. 39–57.
 (2017) EAD: elastic-net attacks to deep neural networks via adversarial examples. arXiv preprint arXiv:1709.04114.
 (2017) ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26.
 (1979) Theoretical statistics. Chapman and Hall/CRC.
 (2018) Yes, machine learning can be more secure! A case study on Android malware detection. IEEE TDSC, pp. 1–1.
 (2009) ImageNet: a large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 248–255.
 (2015) Optimal rates for zero-order convex optimization: the power of two function evaluations. IEEE Transactions on Information Theory 61 (5), pp. 2788–2806.
 (2015) Explaining and harnessing adversarial examples. ICLR 2015; arXiv preprint arXiv:1412.6572.
 (2016) A Kronecker-factored approximate Fisher matrix for convolution layers. In ICML.
 (2015) Scaling up natural gradient by sparsely factorizing the inverse Fisher matrix. In Proceedings of the 32nd International Conference on Machine Learning, PMLR Vol. 37, Lille, France, pp. 2304–2313.
 (2018) Black-box adversarial attacks with limited queries and information. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018.
 (2018) Prior convictions: black-box adversarial attacks with bandits and priors. ICLR 2019.
 (2018) Universal statistics of Fisher information in deep neural networks: mean field approach. arXiv preprint arXiv:1806.01316.
 (2009) Learning multiple layers of features from tiny images. Master's thesis, Department of Computer Science, University of Toronto.
 (1951) On information and sufficiency. Ann. Math. Statist. 22 (1), pp. 79–86.
 (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
 (2015) Deep learning. Nature 521, pp. 436–444.
 (2019) E-RNN: design optimization for efficient recurrent neural networks in FPGAs. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 69–80.
 (2017) Tactics of adversarial attack on deep reinforcement learning agents. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 3756–3762.
 (2019) signSGD via zeroth-order oracle. International Conference on Learning Representations.
 (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
 (2015) Optimizing neural networks with Kronecker-factored approximate curvature. In International Conference on Machine Learning, pp. 2408–2417.
 (2014) New perspectives on the natural gradient method. CoRR abs/1412.1193.
 (2016) Second-order optimization for neural networks. University of Toronto (Canada).
 (2017) Random gradient-free minimization of convex functions. Foundations of Computational Mathematics 17 (2), pp. 527–566.
 (2015) Riemannian metrics for neural networks I: feedforward networks. Information and Inference: A Journal of the IMA 4 (2), pp. 108–153.
 (2018) Second-order optimization method for large mini-batch: training ResNet-50 on ImageNet in 35 epochs. CVPR 2019.
 (2016) Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. CoRR abs/1605.07277.
 (2013) Natural gradient revisited. CoRR abs/1301.3584.
 (2013) Revisiting natural gradient for deep networks. arXiv preprint arXiv:1301.3584.
 (2016) Rethinking the Inception architecture for computer vision. CVPR 2016, pp. 2818–2826.
 (2018) AutoZOOM: autoencoder-based zeroth order optimization method for attacking black-box neural networks. arXiv preprint arXiv:1805.11770.
 (2018) Defending DNN adversarial attacks with pruning and logits augmentation. In 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 1144–1148.
 (2018) Defensive dropout for hardening deep neural networks under adversarial attacks. In ICCAD '18.
 (2019) Protecting neural networks with hierarchical random switching: towards better robustness-accuracy trade-off for stochastic defenses. In IJCAI 2019.
 (2018) Stochastic zeroth-order optimization in high dimensions. In AISTATS 2018, PMLR Vol. 84.
 (2017) Adversarial examples for semantic segmentation and object detection. In ICCV 2017, pp. 1378–1387.
 (2018) Structured adversarial attack: towards general implementation and better interpretability. arXiv preprint arXiv:1808.01664.
 (2019) Topology attack and defense for graph neural networks: an optimization perspective. arXiv preprint arXiv:1906.04214.
 (2018) Hessian-aware zeroth-order optimization for black-box adversarial attack. CoRR abs/1812.11377.
 (2018) Efficient neural network robustness certification with general activation functions. In NIPS 2018.
 (2017) Aircraft recognition based on landmark detection in remote sensing images. IEEE Geoscience and Remote Sensing Letters 14 (8), pp. 1413–1417.
 (2019) On the design of black-box adversarial examples by leveraging gradient-free optimization and operator splitting method. In ICCV 2019.
 (2018) An ADMM-based universal framework for adversarial attacks on deep neural networks. In ACM Multimedia 2018.
 (2019) Fault sneaking attack: a stealthy framework for misleading deep neural networks. In DAC 2019.
 (2019) ADMM attack: an enhanced adversarial attack for deep neural networks with undetectable distortions. In ASPDAC 2019.