Evolving Robust Neural Architectures to Defend from Adversarial Attacks
Abstract

Deep neural networks are prone to misclassifying slightly modified input images. Recently, many defences have been proposed, but none have consistently improved the robustness of neural networks. Here, we propose to use adversarial attacks as a function evaluation to automatically search for neural architectures that can resist such attacks. Experiments on neural architecture search algorithms from the literature show that although accurate, they are not able to find robust architectures. A major reason for this lies in their limited search space. By creating a novel neural architecture search with options for dense layers to connect with convolutional layers and vice-versa, as well as the addition of concatenation layers to the search, we were able to evolve an architecture that is inherently accurate on adversarial samples. Interestingly, this inherent robustness of the evolved architecture rivals state-of-the-art defences such as adversarial training while being trained only on non-adversarial samples. Moreover, the evolved architecture makes use of some peculiar traits which might be useful for developing even more robust ones. Thus, the results here demonstrate that more robust architectures exist, and also open up a new range of possibilities for the development and exploration of deep neural networks using automatic architecture search.
Code available at http://bit.ly/RobustArchitectureSearch.

1 Introduction

Automatic Architecture Search (AAS) and adversarial samples have rarely appeared together. Adversarial samples were discovered when DNNs were shown to behave strangely for nearly identical images [50]. Afterwards, a series of vulnerabilities were found [38, 36, 6, 49]. Such adversarial attacks can also be easily applied to real-world scenarios [25, 2], which poses a serious problem for current deep neural network applications. Currently, there is no known learning algorithm or procedure that can consistently defend against adversarial attacks.

Regarding AAS, the automatic design of architectures has been of broad interest for many years. The aim is to develop methods that do not need specialists in order to be applied to different applications, which would confer not only generality but also ease of use. Most algorithms for AAS are based either on reinforcement learning [62, 4, 63, 42] or on evolutionary computation [30, 44, 58, 13, 28, 40, 43, 27, 35, 57]. In reinforcement learning approaches, architectures are created from a sequence of actions, which are afterwards rewarded proportionally to the crafted architecture's accuracy. In evolutionary-computation-based methods, small changes in the architecture (mutations) and recombinations (crossover) are used to create new architectures; all evolved architectures are evaluated based on their accuracy, and some of the best are chosen to continue to the next generation.

Here we propose the use of AAS to tackle the robustness issues exposed by adversarial samples. In other words, architecture search will be employed to find not only accurate neural networks but also robust ones. This is based on the principle that the robustness of neural networks can be evaluated by using adversarial attacks as an evaluation function. We hypothesize that if a robust solution exists in a given architecture search space, the search algorithm will be able to find it. This is not merely a blind search for a cure: the best architectures found should also hint at which structures and procedures provide robustness to neural networks. It should therefore be possible to use the results of the search to further understand how to improve the representation of models, as well as to design yet more robust ones.

2 Adversarial Machine Learning

Adversarial machine learning can be posed as a constrained optimization problem. Let $f(x) \in \mathbb{R}^m$ be the output of a machine learning algorithm for input $x \in \mathbb{R}^k$, in which $k$ and $m$ are respectively the input and output sizes (images with three channels are considered). Adversarial samples $x'$ can then be defined as follows:

$$x' = x + \epsilon_x \quad \text{such that} \quad f(x') \neq f(x), \tag{1}$$

in which $\epsilon_x \in \mathbb{R}^k$ is a small perturbation added to the input. Therefore, adversarial machine learning can be defined as the following optimization problem¹:

$$\min_{\epsilon_x} \; g(x + \epsilon_x)_c \quad \text{subject to} \quad \|\epsilon_x\| \leq th, \tag{2}$$

where $g(x)_c$ and $th$ are respectively the soft-label for the correct class $c$ and a threshold value.

Moreover, attacks can be divided according to the norm constraining the perturbation. In this way, there are $L_0$ (limited number of pixels attacked), $L_1$, $L_2$, and $L_\infty$ (limited amount of variation in each pixel) types of attacks. There are many types of attacks as well as improvements over them. Universal perturbations were shown possible, in which a single perturbation added to most samples is capable of fooling a DNN in most cases [36]. Image patches are also able to make a DNN misclassify [6]. Moreover, extreme attacks such as modifying only one pixel ($L_0 = 1$), called the one-pixel attack, are shown to be surprisingly effective [49]. Most of these attacks can be easily transferred to real scenarios by using printed-out versions of them [25]. Moreover, carefully crafted glasses [46] or even general 3D adversarial objects are also capable of causing misclassification [2]. Regarding understanding the phenomenon, it is argued in [16] that DNNs' linearity is one of the main reasons. Another recent investigation proposes the conflicting saliency added by adversarial samples as the reason for misclassification [56].
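To make the $L_\infty$ constraint concrete, below is a minimal black-box attack sketch in the spirit of Equation (2). This is a naive random search, not the DE/CMA-ES attacks used later in the paper; `predict` is an assumed callable returning the soft-labels $g(x)$.

```python
import numpy as np

def random_linf_attack(predict, x, true_class, th=10, iters=1000, seed=None):
    """Randomly search for a perturbation with ||eps||_inf <= th that
    changes the predicted class, approximately solving Eq. (2)."""
    rng = np.random.default_rng(seed)
    best_eps = np.zeros_like(x)
    best_conf = predict(x)[true_class]
    for _ in range(iters):
        eps = rng.uniform(-th, th, size=x.shape)   # candidate inside the L_inf ball
        x_adv = np.clip(x + eps, 0.0, 255.0)       # keep a valid image
        probs = predict(x_adv)
        if probs[true_class] < best_conf:          # minimise g(x + eps)_c
            best_conf, best_eps = probs[true_class], eps
        if int(np.argmax(probs)) != true_class:    # f(x') != f(x): attack succeeded
            return x_adv
    return np.clip(x + best_eps, 0.0, 255.0)
```

Practical attacks replace the random proposals with a proper optimizer (e.g., DE or CMA-ES), but the objective and the constraint remain the same.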

Many defensive systems have been proposed to mitigate some of these problems; however, current solutions are still far from solving them. Defensive distillation uses a smaller neural network to learn the content of the original one [41]; however, it was shown not to be robust enough [10]. The addition of adversarial samples to the training dataset, called adversarial training, was also proposed [16, 22, 34]. However, adversarial training has a strong bias towards the type of adversarial samples used and is still vulnerable to attacks [51]. Many recent variations of defences have been proposed [15, 19, 12, 17, 47, 59, 33, 7]; these are carefully analyzed, and many of their shortcomings explained, in [1, 52]. In this paper, differently from previous approaches, we aim to tackle the robustness problems of DNNs by automatically searching for inherently robust architectures.

3 Architecture Search

There are three components to a neural architecture search: the search space, the search strategy and the performance estimation. The search space substantially limits which architectures can be represented. A search strategy must be employed to explore architectures within the defined search space. Some widely used search strategies for AAS are: Random Search, Bayesian Optimization [23], Evolutionary Methods [30, 44, 58, 13, 28, 40, 43, 27, 35, 57], Reinforcement Learning [62, 4, 8, 61, 9, 42], and Gradient Based Methods [5, 32, 31]. Finally, a performance estimation (usually the error rate) is required to evaluate the explored architectures.

Currently, most AAS suffer from high computational cost while searching in a relatively small search space [62, 30, 44, 26, 29]. It was shown in [60] that, if fitness approximation is possible in small search spaces, algorithms could be evolved in an ample search space. Moreover, many architecture searches focus primarily on hyper-parameter search while using search spaces built around previously hand-crafted architectures [53, 3, 26, 14, 21] such as DenseNet [9], which are proven to be vulnerable to adversarial attacks [54]. Therefore, for robust architectures to be found, it is crucial to expand the search space beyond that of current AAS.

SMASH [5] uses a neural network to generate the weights of the primary model. The main strength of this approach lies in avoiding the high computational cost incurred by other searches. However, this comes at the cost of not being able to tweak hyper-parameters that affect the weights, such as initialisers and regularisers. DeepArchitect [37] follows a hierarchical approach using various search algorithms such as Monte Carlo Tree Search (MCTS) and Sequential Model-Based Global Optimization (SMBO).

4 Searching for Robust Architectures

A robustness evaluation (defined in Section 4.1) and a search algorithm must be defined in order to search for robust architectures. The search algorithm may be an existing AAS from the literature, provided that some modifications are made (Section 4.2). However, to allow for a more extensive search space, better suited to the problem, we also propose the Robust Architecture Search (Section 5).

4.1 Robustness Evaluation

Adversarial accuracy may seem like a natural evaluation function for assessing neural networks' robustness. However, many types of perturbation are possible, and each results in a different type of robustness assessment and evolution. For example, suppose an evaluation with an $L_\infty$ attack and a fixed threshold $th$ is chosen: networks robust against that specific attack might be developed, while nothing can be said about other thresholds and attack types (different norms $L$). Therefore, $th$ plays a role, but the different norm types $L_0$, $L_1$, $L_2$ and $L_\infty$ completely change the type of robustness, such as wide perturbations ($L_\infty$), punctual perturbations ($L_0$) and a mix of both ($L_1$ and $L_2$). To avoid creating neural networks that are robust against only one type of attack and, at the same time, to allow robustness to build up slowly from any partial robustness, a set of adversarial attacks with varying $L$ and $th$ is necessary.

| Model | Optimizer | L0 Attack th=1 | th=3 | th=5 | th=10 | L∞ Attack th=1 | th=3 | th=5 | th=10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| CapsNet | DE | 18 | 46 | 45 | 47 | 05 | 09 | 12 | 24 | 206 |
| CapsNet | CMA-ES | 14 | 34 | 45 | 62 | 09 | 38 | 74 | 98 | 374 |
| AT | DE | 23 | 59 | 63 | 66 | 00 | 02 | 03 | 06 | 222 |
| AT | CMA-ES | 20 | 50 | 70 | 82 | 03 | 12 | 25 | 57 | 319 |
| ResNet | DE | 23 | 66 | 75 | 77 | 06 | 22 | 46 | 78 | 393 |
| ResNet | CMA-ES | 11 | 49 | 63 | 77 | 28 | 72 | 75 | 83 | 458 |
| FS | DE | 21 | 73 | 78 | 78 | 04 | 21 | 45 | 78 | 398 |
| FS | CMA-ES | 17 | 49 | 69 | 78 | 26 | 63 | 66 | 74 | 442 |
| Total | | 147 | 426 | 508 | 567 | 81 | 239 | 346 | 498 | 2812 |

Table 1: The number of samples used from each type of black-box attack ($L_0$ and $L_\infty$, for thresholds $th \in \{1, 3, 5, 10\}$) to compose the set of adversarial samples. Based on the principle of the transferability of adversarial samples, these samples are used as a fast attack for the robustness evaluation of architectures. Details of the attacks, as well as the motivation for using a model-agnostic (black-box) dual quality ($L_0$ and $L_\infty$) assessment, are explained in detail in [54].

To evaluate the robustness of architectures over varying $L$ and $th$ while keeping the computational cost low, we use a transferable type of attack. In other words, adversarial samples previously found by attacking other models are stored and reused as candidate adversarial samples against the model under evaluation. This avoids the problem that most attacks are too slow to be run inside a search loop, which would make the architecture search prohibitively expensive. Specifically, we use the adversarial samples from two types of attacks ($L_0$ and $L_\infty$) with two optimization algorithms (Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [18] and Differential Evolution (DE) [48]). We attacked traditional architectures such as ResNet [20] and CapsNet [45]. We also attacked state-of-the-art defences such as Adversarial Training (AT) [34] and Feature Squeezing (FS) [59] defending ResNet. We use the CIFAR-10 dataset [24] to generate the adversarial samples. Table 1 shows a summary of the number of images used from each type of attack, totalling 2812 adversarial samples. Attacks were made using the model-agnostic dual quality assessment [54]. The evaluation procedure consists of calculating the number of successful adversarial samples divided by the total number of adversarial samples. This also avoids problems with the different amounts of perturbation needed for attacks to succeed, which could render results incomparable.
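A minimal sketch of this robustness evaluation, assuming a Keras-style `model.predict` and that each stored adversarial sample is paired with its original (correct) label:

```python
import numpy as np

def attack_accuracy(model, adv_images, adv_true_labels):
    """Fraction of stored adversarial samples that transfer to (i.e., fool)
    the model under evaluation; lower means a more robust architecture."""
    preds = np.argmax(model.predict(adv_images), axis=1)
    return float(np.mean(preds != adv_true_labels))
```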

4.2 Robust Search Conversion of AAS

By changing the fitness function (in the case of evolutionary-computation-based AAS) or the reward function (in the case of reinforcement-learning-based AAS), it is possible to create robust-search versions of AAS algorithms. In other words, the search for accuracy can be converted into a search for both robustness and accuracy. Here we use SMASH and DeepArchitect for the tests. The reason for this choice lies in the difference between the two methods and the availability of their code. Both methods have their evaluation function modified to account not only for accuracy but also for robustness (Section 4.1).
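Concretely, the conversion only swaps the score used to rank candidate architectures. A hedged sketch, again assuming Keras-style prediction:

```python
import numpy as np

def robust_evaluation(model, x_test, y_test, adv_images, adv_labels):
    """Replace the accuracy-only score of SMASH/DeepArchitect with the sum of
    clean test accuracy and accuracy on the stored adversarial samples."""
    clean_acc = float(np.mean(np.argmax(model.predict(x_test), axis=1) == y_test))
    adv_acc = float(np.mean(np.argmax(model.predict(adv_images), axis=1) == adv_labels))
    return clean_acc + adv_acc
```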

5 Robust Architecture Search (RAS)

Here, we propose an evolutionary algorithm to search for robust architectures, called Robust Architecture Search (RAS). We focus on a search space that allows unusual layer types and combinations of them, which is far vaster than current traditional search spaces. The motivation for considering this vast search space is that some of the most robust architectures might contain such unusual combinations, which have not yet been found or deeply explored.

RAS Overview

RAS works by creating three initial populations (layer, block and model populations). Every generation, each member of the model population is modified five times by mutation, and the modified members are added to the population as new members. Here we propose a utility evaluation in which the layer and block populations are evaluated by the number of models (architectures) using them. Models are evaluated by their accuracy and attack resilience (accuracy on adversarial samples). All blocks and layers which are not used by any current member of the model population are removed at the end of each generation. Moreover, architectures compete only with similar ones in their subpopulation, such that only the fittest of each subpopulation survives.
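A hedged sketch of one RAS generation as described above; `mutate_model`, `clean_accuracy` and `attack_accuracy` are hypothetical helpers, and the population representation follows the sketch given in Section 5.1.

```python
def ras_generation(model_pop, block_pop, layer_pop):
    # each current member spawns five mutated variants, added as new members
    for model in list(model_pop):
        model_pop.extend(mutate_model(model) for _ in range(5))  # hypothetical helper
    # models are scored by clean accuracy and attack resilience
    for model in model_pop:
        model.fitness = clean_accuracy(model) - attack_accuracy(model)  # hypothetical helpers
    # utility evaluation: blocks and layers survive only while some model uses them
    used_blocks = {id(b) for m in model_pop for b in m.blocks}
    block_pop[:] = [b for b in block_pop if id(b) in used_blocks]
    used_layers = {id(l) for b in block_pop for l in b.layers}
    layer_pop[:] = [l for l in layer_pop if id(l) in used_layers]
```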

5.1 Description of Population

For such a vast search space to be searched efficiently, we propose the use of three subpopulations, allowing for the reuse of blocks and layers. Specifically, the subpopulations consist of:

Layer Population:

Raw layers (convolutional and fully connected) which make up the blocks.

Block Population:

Blocks which are a combination of layers.

Model Population:

A population of architectures which consists of interconnected blocks.

Figure 1: Illustration of the proposed RAS structure with three subpopulations.

Figure 1 illustrates this structure. The initial population consists of random architectures, each containing a uniformly random number of blocks made up of a uniformly random number of layers. The available parameters for the layers are as follows: for convolutional layers, the number of filters, the stride size and the kernel size are each drawn from small predefined sets; for fully connected layers, the unit count is likewise drawn from a predefined set. All layers use the Rectified Linear Unit (ReLU) as activation function and are followed by a batch-normalization layer.
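To make the representation concrete, the sketch below shows one plausible encoding of the three subpopulations; the field names and types are our assumptions, not the paper's implementation. Because layers and blocks are shared by reference, counting how many models use them (the utility evaluation of Section 5) is straightforward.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Layer:                                   # member of the layer population
    kind: str                                  # "conv" or "dense"
    params: Dict[str, int]                     # e.g. {"filters": ..., "kernel": ..., "stride": ...}

@dataclass
class Block:                                   # member of the block population
    layers: List[Layer] = field(default_factory=list)
    connections: List[Tuple[int, int]] = field(default_factory=list)  # edges between layer indices

@dataclass
class Model:                                   # member of the model population
    blocks: List[Block] = field(default_factory=list)
    connections: List[Tuple[int, int]] = field(default_factory=list)  # edges between block indices
    fitness: float = 0.0                       # clean accuracy minus attack accuracy
```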


5.2 Mutation Operators

Regarding the mutation operators used to evolve the architectures, they can be divided into layer, block and model mutations, which can be applied only to individuals of the respective layer, block and model populations. The following paragraphs define the possible mutations.

Layer Mutation

Layer mutations are of the following types:

Change Kernel:

Changes the kernel size of the convolution layer,

Change Filter:

Changes the filter size of the convolution layer,

Change Units:

Changes the unit size of the fully connected layer,

Swap Layer:

The chosen layer is swapped with a random layer from the layer population.

Block Mutation

Block mutations change a single block in the block population. The possibilities are:

Add Layer:

A random layer is added to a chosen random block,

Remove Layer:

A random layer is removed from a chosen random block,

Add Layer Connection:

A random connection between two layers from the chosen random block is added,

Remove Layer Connection:

A random connection between two layers from the chosen random block is removed,

Swap Block:

The chosen block is swapped with a random block from the block population.

Model Mutation

Model mutations modify a given architecture. The possible model mutations are:

Add Block:

A random block is added to the model,

Remove Block:

A random block is removed from the model,

Add Block connection:

A random connection between two blocks of the model is added,

Remove Block connection:

A random connection between two blocks of the model is removed.

All mutations add a new member to the population instead of substituting the previous one. If nothing were done, the populations of layers and blocks would explode, increasing the number of lesser-quality layers and blocks and thereby decreasing the probability of choosing functional ones. To avoid this, once the layer or block population exceeds a fixed number of individuals, the only layer/block mutation available is Swap Layer/Swap Block.
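The sketch below illustrates a layer mutation under this cap, reusing the Layer encoding sketched in Section 5.1; the cap value and the parameter sets are illustrative assumptions, not the paper's exact values.

```python
import copy
import random

PARAM_CHOICES = {"kernel": [1, 3, 5], "filters": [8, 16, 32], "units": [128, 256, 512]}  # illustrative sets
MAX_LAYER_POP = 100   # assumed cap; the paper's exact limit is not reproduced here

def mutate_layer(layer, layer_population):
    if len(layer_population) >= MAX_LAYER_POP:
        return random.choice(layer_population)      # Swap Layer is the only option left
    mutated = copy.deepcopy(layer)                  # mutations add members, never replace
    if mutated.kind == "conv":
        key = random.choice(["kernel", "filters"])  # Change Kernel / Change Filter
    else:
        key = "units"                               # Change Units
    mutated.params[key] = random.choice(PARAM_CHOICES[key])
    layer_population.append(mutated)
    return mutated
```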

5.3 Objective (Fitness) Function

The fitness of an individual of the model population is measured using the final validation accuracy of the model after training for a fixed maximum number of epochs, with early stopping if the training or validation accuracy does not change by more than a small threshold over a span of several epochs. The fitness is calculated as the accuracy of the model plus its robustness, where robustness is the negative of the attack accuracy on the stored adversarial samples (Fitness = Model Accuracy - Attack Accuracy). The robustness of the architecture is calculated as described in Section 4.1.

The accuracy of the architecture is calculated after training either over the whole CIFAR-10 training dataset (50,000 samples), done periodically every several generations, or over a random subset of the training dataset (in all other generations). This allows an efficient evolution in which blocks and layers evolve at a faster rate without interfering with the architecture's accuracy: training on the entire dataset drives the evolution of architectures towards better accuracy, while training on a subset evolves the layers and blocks of the architecture at a faster rate.

5.4 Spectrum-based Niching Scheme

Figure 2: Illustration of the proposed evolutionary strategy. Steps 2, 3, and 4 are repeated for all individuals in the child population. Step 1 is repeated once all individuals in the child population have been evaluated against a cluster individual.

To keep a high amount of diversity while searching a vast search space, we use a novel niching scheme, described below and illustrated in Figure 2. This scheme uses the idea of spectrum-based niching from [55] but explores a different approach to it. First, the whole initial population is converted into a cluster population, such that each individual in the initial population is a cluster representative. Then, two child individuals are created for each cluster representative by randomly applying five mutation operators to it. For each child, we find the closest cluster representative using the spectrum described below. If the fitness of the child is better than that of its closest cluster representative, the child becomes the new cluster representative and the old representative is removed from the population. Repeating this process for all individuals in the cluster population completes one generation of the evolution.

Here, the spectrum is a histogram of the following features: number of blocks, number of total layers, number of block connections, number of total layer connections, number of dense layers, number of convolutional layers, number of dense-to-dense connections, number of dense-to-convolution connections, number of convolution-to-dense connections, and number of convolution-to-convolution connections. By using this spectrum-based niching scheme, we aim to achieve an open-ended evolution, preventing the evolution from converging to a single robust architecture. Preserving diversity in the population keeps the exploration rate relatively high, allowing different architectures to be found even after many evolution steps. For the vast search space of architectures, this property is especially important, as it allows the algorithm to traverse the space efficiently.
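A hedged sketch of the niching step, computing the spectrum over the Model encoding sketched in Section 5.1 (the exact feature extraction is our assumption):

```python
import numpy as np

def spectrum(model):
    """Ten-feature histogram over a Model, following the feature list above."""
    layers = [l for b in model.blocks for l in b.layers]
    dense = sum(l.kind == "dense" for l in layers)
    conv = sum(l.kind == "conv" for l in layers)
    layer_conns = [(b.layers[i].kind, b.layers[j].kind)
                   for b in model.blocks for (i, j) in b.connections]
    count = lambda a, b: sum(src == a and dst == b for src, dst in layer_conns)
    return np.array([len(model.blocks), len(layers), len(model.connections),
                     len(layer_conns), dense, conv,
                     count("dense", "dense"), count("dense", "conv"),
                     count("conv", "dense"), count("conv", "conv")])

def niche_update(child, representatives):
    """The child competes only with its closest representative in spectrum space."""
    dists = [np.linalg.norm(spectrum(child) - spectrum(r)) for r in representatives]
    nearest = representatives[int(np.argmin(dists))]
    if child.fitness > nearest.fitness:
        representatives[representatives.index(nearest)] = child
```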

6 Experiments on RAS and Converted AAS

| Architecture Search | Testing ER | ER on Adversarial Samples |
|---|---|---|
| DeepArchitect* [37] | 25% | 75% |
| SMASH* [5] | 23% | 82% |
| Ours | 18% | 42% |

Table 2: Error Rate (ER) on both the testing dataset and the adversarial samples when the evaluation function includes both the accuracy on the testing data and the accuracy on the adversarial samples. *Both DeepArchitect and SMASH had their evaluation function modified to be the sum of the accuracies on the testing and adversarial samples.

Here, experiments are conducted on both the proposed RAS and the converted versions of DeepArchitect and SMASH. The objective is to achieve the highest robustness possible using different types of architecture-search algorithms and to compare their results and effectiveness. Initially, DeepArchitect and SMASH found architectures with error rates of 11% and 4% respectively when the fitness was based only on the DNN's testing accuracy. However, when the accuracy on adversarial samples is included in the evaluation function, the final error rate rises to 25% and 23% respectively (Table 2). This may also indicate that poisoning the dataset might cause a substantial decrease in accuracy for the architectures found by SMASH and DeepArchitect. In the case of RAS, even with a more extensive search space, an error rate of 18% is achieved.

Regarding the robustness of the architectures found, Table 2 shows that the final architectures found by DeepArchitect and SMASH were very susceptible to attacks, with error rates on adversarial samples of 75% and 82% respectively. Despite the inclusion of the robustness term (measured as accuracy on adversarial samples) in the evaluation function, these searches were still unable to find a robust architecture. This might be a consequence of their relatively small search spaces and more focused initialization procedures.

Moreover, the proposed method (RAS) finds an architecture with an error rate of only 42% on adversarial samples. This rivals the error rate achieved by networks trained with adversarial training, the best training method to date for increasing robustness. Note, however, that in the case of the evolved architecture, this is an inherent property of the architecture itself: it is robust without any kind of specialised training or defence such as adversarial training (i.e., the architecture was trained only on the training dataset). The addition of defences should increase its robustness further.

7 Analyzing RAS

Figure 3: Accuracy improvement over the generations.
Figure 4: The overall distribution of the architectures found in each generation. The connections from the input layer and the softmax layer are always present and, therefore, they are omitted in the calculation.

In this section, we evaluate the proposed method regarding the quality of its evolution and how the subpopulations behave throughout the process. Figure 3 shows how the mean accuracy of the evolved architectures increases over the generations. This pattern of behaviour is typical of evolutionary algorithms, showing that evolution is proceeding as expected.

In Figure 4, the overall characteristics of the evolved architectures throughout the generations are shown. The average number of blocks and the connections between them increase over the generations. However, the average number of layers never reaches the complexity of the initial models: it decreases steeply at first and then slowly increases afterwards. The overall behaviour, therefore, is that blocks become smaller and more numerous. As a consequence, the number of layer connections becomes roughly proportional to the number of block connections and exhibits similar behaviour. The average number of layers per block and the average number of connections per block show little change throughout the evolution.

Notice that, although the average number of layers increases, the average number of layers per block continues to decrease, albeit slowly. Consequently, blocks tend to degenerate into only a few layers, settling at around three layers per block, down from the initial average. Lastly, the average number of connections in a block remains more or less the same, with the mean varying little throughout the evolution. This behaviour might suggest that it is hard to create big reusable blocks, which is supported both by the observed decrease in block complexity and by the increase in the number of blocks.

8 Analyzing the Final Architecture: Searching for the Key to Inherent Robustness

Figure 5: Two fragments of the evolved architecture that exhibit peculiar traits.
Figure 6: A more in-depth look at the two fragments from Figure 5, showing the size of input and output for each of the layers. The top fragment corresponds to the left fragment of Figure 5 and the bottom one corresponds to the right fragment.

RAS found an architecture that possesses inherent robustness capable of rivalling current defences. To investigate the reason behind this robustness, we take a more in-depth look at the evolved architecture. Figures 5 and 6 show some of its peculiarities: multiple bottlenecks, projections into high-dimensional space, and paths with different constraints.

Multiple Bottlenecks and Projections into High-Dimensional Space

The first peculiarity is the use of dense layers in-between convolutional ones. This might seem like a bottleneck similar to the ones used in variational autoencoders. However, it is the opposite of a bottleneck (Figure 6): it is a projection into a high-dimensional space. The evolved architecture mostly uses a low number of filters while, in some parts, high-dimensional projections exist. In the whole architecture, four dense layers in-between convolutional ones are used, and all of them project into a higher-dimensional space. This accords with Cover's theorem, which states that a pattern-classification problem cast nonlinearly into a high-dimensional space is more likely to be linearly separable [11].
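The sketch below reproduces this Conv-Dense-Conv pattern in Keras; the sizes are illustrative, chosen so that the dense layer projects up (here from 8192 to 16384 dimensions), and are not the evolved architecture's exact values.

```python
from tensorflow.keras import layers, models

inp = layers.Input(shape=(32, 32, 3))
x = layers.Conv2D(8, 3, activation="relu", padding="same")(inp)  # few filters: 32x32x8 = 8192 features
x = layers.Flatten()(x)
x = layers.Dense(16384, activation="relu")(x)                    # projection *up* in dimension
x = layers.Reshape((32, 32, 16))(x)                              # back to a feature map for convolutions
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
out = layers.Dense(10, activation="softmax")(layers.Flatten()(x))
model = models.Model(inp, out)
```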

Paths with Different Constraints

The second peculiarity is the use of multiple paths with different numbers of filters and output sizes after the high-dimensional projections. Notice how the number of filters differs in each of the convolutional layers along these paths. This imposes different constraints on learning in each path, which should foster different types of features. The result is a multi-bottleneck structure forcing the learning of a diverse set of features, which are now easily constructed from the preceding high-dimensional projection.
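A hedged Keras sketch of such a multi-path fragment, with illustrative filter counts; each parallel path constrains learning differently before the paths are merged by concatenation.

```python
from tensorflow.keras import layers

def multi_path(x):
    """Three parallel convolutional paths with different constraints."""
    p1 = layers.Conv2D(4, 3, padding="same", activation="relu")(x)   # narrow path: few filters
    p2 = layers.Conv2D(16, 5, padding="same", activation="relu")(x)  # wider path, larger kernel
    p3 = layers.Conv2D(8, 1, activation="relu")(x)                   # pointwise path
    return layers.Concatenate()([p1, p2, p3])                        # merge the learned feature sets
```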

9 Conclusions

Automatic search for robust architectures is proposed as a paradigm for developing and researching robust models. This paradigm is based on using adversarial attacks together with error rate as evaluation functions in AAS. Experiments applying this paradigm to some current AAS methods yielded poor results, which we attribute to the small search spaces used by these methods. Here, we proposed the RAS method, which has a broader search space, including concatenation layers and connections from dense to convolutional layers and vice-versa. Results with RAS showed that inherently robust architectures do indeed exist. In fact, the evolved architecture achieved robustness comparable with state-of-the-art defences while not using any specialised training or defence; in other words, the evolved architecture is inherently robust. Such inherent robustness could increase further if adversarial training, other types of defence, or a combination of them were employed together with it.

Moreover, investigating the reasons behind such robustness has shown that some peculiar traits are present. The evolved architecture has an overall low number of filters and many bottlenecks. Multiple projections into high-dimensional space are also present, possibly facilitating the separation of features (Cover's theorem). It also uses multiple paths with different constraints after the high-dimensional projections, which should cause a diverse set of features to be learned by the network. Thus, in the search space of DNNs, more robust architectures do exist, and more research is required to find and fully document them as well as their features.

Acknowledgments

This work was supported by JST, ACT-I Grant Number JP-50166, Japan. Additionally, we would like to thank Prof. Junichi Murata for the kind support without which it would not be possible to conduct this research.


Appendix A Full Plots of the Evolved Architecture

Figure 7: Full plot of the evolved architecture, showing most of the parameters (continued in Figure 8).
Figure 8: Full plot of the evolved architecture, showing most of the parameters (continuation of Figure 7).
Figure 9: Full plot of the evolved architecture, with the input and output shapes shown (continued in Figure 10).
Figure 10: Full plot of the evolved architecture, with the input and output shapes shown (continuation of Figure 9).

Figures 7 and 8 show the complete structure of the architecture evolved by RAS. Figures 9 and 10 offer another perspective on the same plot, with different details available, such as the input and output size of each layer.

Appendix B All Convolutional RAS Variation

| Algorithm | Test ER (Robustness Eval.) | Adversarial ER (Robustness Eval.) | Test ER (Original Eval.) | Adversarial ER (Original Eval.) |
|---|---|---|---|---|
| DeepArchitect | 25% | 75% | 11% | 79% |
| SMASH | 23% | 82% | 4% | 89% |
| Ours (Dense+Conv) | 18% | 42% | 17% | 47% |
| Ours (Only Conv) | 25% | 45% | n/a | n/a |

Table 3: The original evaluation function is set to the accuracy of the classifier, while the robustness evaluation uses the accuracy summed with the accuracy over the adversarial samples. Test ER stands for error rate on the testing dataset; Adversarial ER stands for error rate on the adversarial samples.

We tried a variation of RAS in which only combinations of convolutional layers were allowed. The results are reported in Table 3. As expected, the results were not as promising (worse error rates and deeper architectures) as those obtained with the wider search space, and therefore we did not pursue further tests with this variation. Table 3 also shows the results of the searches when using only test accuracy as the evaluation function.

Footnotes

  1. Here the definition concerns only untargeted attacks, but a similar optimization problem can be defined for targeted attacks.

References

  1. A. Athalye, N. Carlini and D. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In ICML, Cited by: §2.
  2. A. Athalye and I. Sutskever (2018) Synthesizing robust adversarial examples. In ICML, Cited by: §1, §2.
  3. (2017-11) AutoML for large scale image classification and object detection. External Links: Link Cited by: §3.
  4. B. Baker, O. Gupta, N. Naik and R. Raskar (2016) Designing neural network architectures using reinforcement learning. arXiv preprint arXiv:1611.02167. Cited by: §1, §3.
  5. A. Brock, T. Lim, J. M. Ritchie and N. Weston (2017) SMASH: one-shot model architecture search through hypernetworks. arXiv preprint arXiv:1708.05344. Cited by: §3, §3, Table 2.
  6. T. B. Brown, D. Mané, A. Roy, M. Abadi and J. Gilmer (2017) Adversarial patch. arXiv preprint arXiv:1712.09665. Cited by: §1, §2.
  7. J. Buckman, A. Roy, C. Raffel and I. Goodfellow (2018) Thermometer encoding: one hot way to resist adversarial examples. In ICLR, Cited by: §2.
  8. H. Cai, T. Chen, W. Zhang, Y. Yu and J. Wang (2018) Efficient architecture search by network transformation. In Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §3.
  9. H. Cai, J. Yang, W. Zhang, S. Han and Y. Yu (2018) Path-level network transformation for efficient architecture search. arXiv preprint arXiv:1806.02639. Cited by: §3, §3.
  10. N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. Cited by: §2.
  11. T. M. Cover (1965) Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE transactions on electronic computers (3), pp. 326–334. Cited by: §8.
  12. N. Das, M. Shanbhogue, S. Chen, F. Hohman, L. Chen, M. E. Kounavis and D. H. Chau (2017) Keeping the bad guys out: protecting and vaccinating deep learning with jpeg compression. arXiv preprint arXiv:1705.02900. Cited by: §2.
  13. D. Dohan, D. So and Q. Le (2018) Evolving modular neural sequence architectures with genetic programming. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 37–38. Cited by: §1, §3.
  14. A. Dushatskiy, A. M. Mendrik, T. Alderliesten and P. A. Bosman (2019) Convolutional neural network surrogate-assisted gomea. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 753–761. Cited by: §3.
  15. G. K. Dziugaite, Z. Ghahramani and D. M. Roy (2016) A study of the effect of jpg compression on adversarial images. arXiv preprint arXiv:1608.00853. Cited by: §2.
  16. I. J. Goodfellow, J. Shlens and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §2, §2.
  17. C. Guo, M. Rana, M. Cisse and L. van der Maaten (2018) Countering adversarial images using input transformations. In ICLR, Cited by: §2.
  18. N. Hansen, S. D. Müller and P. Koumoutsakos (2003) Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (cma-es). Evolutionary computation 11 (1), pp. 1–18. Cited by: §4.1.
  19. T. Hazan, G. Papandreou and D. Tarlow (2016) Perturbations, optimization, and statistics. MIT Press. Cited by: §2.
  20. K. He, X. Zhang, S. Ren and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §4.1.
  21. X. He, K. Zhao and X. Chu (2019) AutoML: a survey of the state-of-the-art. arXiv preprint arXiv:1908.00709. Cited by: §3.
  22. R. Huang, B. Xu, D. Schuurmans and C. Szepesvári (2015) Learning with a strong adversary. arXiv preprint arXiv:1511.03034. Cited by: §2.
  23. K. Kandasamy, W. Neiswanger, J. Schneider, B. Poczos and E. P. Xing (2018) Neural architecture search with bayesian optimisation and optimal transport. In Advances in Neural Information Processing Systems, pp. 2016–2025. Cited by: §3.
  24. A. Krizhevsky and G. Hinton (2009) Learning multiple layers of features from tiny images. Citeseer. Cited by: §4.1.
  25. A. Kurakin, I. Goodfellow and S. Bengio (2016) Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533. Cited by: §1, §2.
  26. H. Lee, D. Yu and Y. Kim (2018) On the hardness of parameter optimization of convolution neural networks using genetic algorithm and machine learning. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 51–52. Cited by: §3.
  27. J. Liang, E. Meyerson, B. Hodjat, D. Fink, K. Mutch and R. Miikkulainen (2019) Evolutionary neural automl for deep learning. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 401–409. Cited by: §1, §3.
  28. J. Liang, E. Meyerson and R. Miikkulainen (2018) Evolutionary architecture search for deep multitask networks. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 466–473. Cited by: §1, §3.
  29. R. H. Lima and A. T. Pozo (2019) Evolving convolutional neural networks through grammatical evolution. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 179–180. Cited by: §3.
  30. H. Liu, K. Simonyan, O. Vinyals, C. Fernando and K. Kavukcuoglu (2017) Hierarchical representations for efficient architecture search. Iclr. Cited by: §1, §3, §3.
  31. H. Liu, K. Simonyan and Y. Yang (2018) Darts: differentiable architecture search. arXiv preprint arXiv:1806.09055. Cited by: §3.
  32. R. Luo, F. Tian, T. Qin, E. Chen and T. Liu (2018) Neural architecture optimization. In Advances in neural information processing systems, pp. 7816–7827. Cited by: §3.
  33. X. Ma, B. Li, Y. Wang, S. M. Erfani, S. Wijewickrema, G. Schoenebeck, D. Song, M. E. Houle and J. Bailey (2018) Characterizing adversarial subspaces using local intrinsic dimensionality. arXiv preprint arXiv:1801.02613. Cited by: §2.
  34. A. Madry, A. Makelov, L. Schmidt, D. Tsipras and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In ICLR, Cited by: §2, §4.1.
  35. R. Miikkulainen, J. Liang, E. Meyerson, A. Rawal, D. Fink, O. Francon, B. Raju, H. Shahrzad, A. Navruzyan and N. Duffy (2019) Evolving deep neural networks. In Artificial Intelligence in the Age of Neural Networks and Brain Computing, pp. 293–312. Cited by: §1, §3.
  36. S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi and P. Frossard (2017) Universal adversarial perturbations. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 86–94. Cited by: §1, §2.
  37. R. Negrinho and G. Gordon (2017) Deeparchitect: automatically designing and training deep architectures. arXiv preprint arXiv:1704.08792. Cited by: §3, Table 2.
  38. A. Nguyen, J. Yosinski and J. Clune (2015) Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 427–436. Cited by: §1.
  39. M. Nicolae, M. Sinn, M. N. Tran, B. Buesser, A. Rawat, M. Wistuba, V. Zantedeschi, N. Baracaldo, B. Chen, H. Ludwig, I. Molloy and B. Edwards (2018) Adversarial robustness toolbox v1.1.0. CoRR 1807.01069. External Links: Link Cited by: §4.1.
  40. E. Papavasileiou and B. Jansen (2018) Configuring the parameters of artificial neural networks using neuroevoiution and automatic algorithm configuration. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 61–62. Cited by: §1, §3.
  41. N. Papernot, P. McDaniel, X. Wu, S. Jha and A. Swami (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. Cited by: §2.
  42. H. Pham, M. Guan, B. Zoph, Q. Le and J. Dean (2018) Efficient neural architecture search via parameter sharing. In International Conference on Machine Learning, pp. 4092–4101. Cited by: §1, §3.
  43. E. Real, A. Aggarwal, Y. Huang and Q. V. Le (2018) Regularized evolution for image classifier architecture search. arXiv preprint arXiv:1802.01548. Cited by: §1, §3.
  44. E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. V. Le and A. Kurakin (2017) Large-scale evolution of image classifiers. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2902–2911. Cited by: §1, §3, §3.
  45. S. Sabour, N. Frosst and G. E. Hinton (2017) Dynamic routing between capsules. In Advances in neural information processing systems, pp. 3856–3866. Cited by: §4.1.
  46. M. Sharif, S. Bhagavatula, L. Bauer and M. K. Reiter (2016) Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 1528–1540. Cited by: §2.
  47. Y. Song, T. Kim, S. Nowozin, S. Ermon and N. Kushman (2018) Pixeldefend: leveraging generative models to understand and defend against adversarial examples. In ICLR, Cited by: §2.
  48. R. Storn and K. Price (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of global optimization 11 (4), pp. 341–359. External Links: ISSN 0925-5001 Cited by: §4.1.
  49. J. Su, D. V. Vargas and K. Sakurai (2019) One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation 23 (5), pp. 828–841. Cited by: §1, §2.
  50. C. Szegedy et al. (2014) Intriguing properties of neural networks. In ICLR, Cited by: §1.
  51. F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh and P. McDaniel (2017) Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204. Cited by: §2.
  52. J. Uesato, B. O’Donoghue, P. Kohli and A. Oord (2018) Adversarial risk and the dangers of evaluating against weak attacks. In International Conference on Machine Learning, pp. 5032–5041. Cited by: §2.
  53. (2017-11) Using machine learning to explore neural network architecture. External Links: Link Cited by: §3.
  54. D. V. Vargas and S. Kotyan (2019) Robustness assessment for adversarial machine learning: problems, solutions and a survey of current neural networks and defenses. arXiv preprint arXiv:1906.06026. Cited by: §3, §4.1, Table 1.
  55. D. V. Vargas and J. Murata (2017) Spectrum-diverse neuroevolution with unified neural models. IEEE transactions on neural networks and learning systems 28 (8), pp. 1759–1773. Cited by: §5.4.
  56. D. V. Vargas and J. Su (2019) Understanding the one-pixel attack: propagation maps and locality analysis. arXiv preprint arXiv:1902.02947. Cited by: §2.
  57. B. Wang, Y. Sun, B. Xue and M. Zhang (2019) Evolving deep neural networks by multi-objective particle swarm optimization for image classification. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 490–498. Cited by: §1, §3.
  58. L. Xie and A. Yuille (2017) Genetic cnn. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1379–1388. Cited by: §1, §3.
  59. W. Xu, D. Evans and Y. Qi (2017) Feature squeezing: detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155. Cited by: §2, §4.1.
  60. D. Yu and Y. Kim (2018) Is it worth to approximate fitness by machine learning? investigation on the extensibility according to problem size. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 77–78. Cited by: §3.
  61. Z. Zhong, J. Yan, W. Wu, J. Shao and C. Liu (2018) Practical block-wise neural network architecture generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2423–2432. Cited by: §3.
  62. B. Zoph and Q. V. Le (2016) Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578. Cited by: §1, §3, §3.
  63. B. Zoph, V. Vasudevan, J. Shlens and Q. V. Le (2018) Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8697–8710. Cited by: §1.