Adversarially Robust Generalization Just
Requires More Unlabeled Data
Abstract
Neural network robustness has recently been highlighted by the existence of adversarial examples. Many previous works show that the learned networks do not perform well on perturbed test data, and significantly more labeled data is required to achieve adversarially robust generalization. In this paper, we theoretically and empirically show that with just more unlabeled data, we can learn a model with better adversarially robust generalization. The key insight of our results is based on a risk decomposition theorem, in which the expected robust risk is separated into two parts: the stability part, which measures the prediction stability in the presence of perturbations, and the accuracy part, which evaluates the standard classification accuracy. As the stability part does not depend on any label information, we can optimize this part using unlabeled data. We further prove that for the specific Gaussian mixture problem illustrated by Schmidt et al. (2018), adversarially robust generalization can be almost as easy as standard generalization in supervised learning if a sufficiently large amount of unlabeled data is provided. Inspired by the theoretical findings, we further show that a practical adversarial training algorithm that leverages unlabeled data can improve adversarially robust generalization on MNIST and Cifar10.
1 Introduction
Deep learning (LeCun et al., 2015), especially the deep Convolutional Neural Network (CNN) (LeCun et al., 1998), has led to state-of-the-art results spanning many machine learning fields, such as image classification (Simonyan & Zisserman, 2014; He et al., 2016; Huang et al., 2017; Hu et al., 2017), object detection (Ren et al., 2015; Redmon et al., 2016; Lin et al., 2018), semantic segmentation (Long et al., 2015; Zhao et al., 2017; Chen et al., 2018) and action recognition (Tran et al., 2015; Wang et al., 2016, 2018).
Despite the great success in numerous applications, recent studies show that deep CNNs are vulnerable to well-designed input samples called Adversarial Examples (Szegedy et al., 2013; Biggio et al., 2013). Take image classification as an example: for almost every commonly used, well-performing CNN, attackers are able to construct a small perturbation of an input image. The perturbation is almost imperceptible to humans but can fool the model into making a wrong prediction. The problem is serious as some designed adversarial examples can be transferred among different kinds of CNN architectures (Papernot et al., 2016), which makes it possible to perform black-box attacks: an attacker has no access to the model parameters or even the architecture, but can still easily fool a machine learning system.
There is a rapidly growing body of work on how to obtain a robust neural network model. Most of the successful methods are based on adversarial training (Szegedy et al., 2013; Madry et al., 2017; Goodfellow et al., 2015; Huang et al., 2015). The high-level idea of these works is that during training, we find the strongest perturbation for each sample against the current model and use the perturbed sample together with the correct label for gradient descent optimization. However, the learned model tends to overfit the training data and fails to remain robust on unseen test data. For example, using the state-of-the-art adversarial training method (Madry et al., 2017), the defense success rate of the learned model on the test data is below 60% while that on the training data is almost 100%, which indicates that the robustness fails to generalize. Some theoretical results further show that it is challenging to achieve adversarially robust generalization. Fawzi et al. (2018) proves that adversarial examples exist for any classifier and can be transferred across different models, making it impossible to design network architectures free from adversarial attacks. Schmidt et al. (2018) shows that adversarially robust generalization requires much more labeled data than standard generalization in certain cases. Tsipras et al. (2019) presents an inherent trade-off between accuracy and robust accuracy and argues that the phenomenon comes from the fact that robust classifiers learn different features. Therefore it is hard to reach high robustness with standard training methods.
Given the challenge of the task and previous findings, in this paper, we provide several theoretical and empirical results towards better adversarially robust generalization. In particular, we show that we can learn an adversarially robust model which generalizes well if we have plenty of unlabeled data, and the labeled sample complexity for adversarially robust generalization in Schmidt et al. (2018) can be largely reduced if unlabeled data is used. First, we show that the expected robust risk can be upper bounded by the sum of two terms: a stability term which measures whether the model can output consistent predictions under perturbations, and an accuracy term which evaluates whether the model can make correct predictions on natural samples. Given the stability term does not rely on ground truth labels, unlabeled data can be used to minimize this term and thus improve the generalization ability. Second, we prove that for the Gaussian mixture problem defined in Schmidt et al. (2018), if unlabeled data can be used, adversarially robust generalization will be almost as easy as the standard generalization in supervised learning (i.e. using the same number of labeled samples under similar conditions). Inspired by the theoretical findings, we provide a practical algorithm that can learn from both labeled and unlabeled data for better adversarially robust generalization. Our experiments on MNIST and Cifar10 show that the method achieves better performance, which verifies our theoretical findings.
Our contributions are threefold.

- In Section 3.2.1, we provide a theorem showing that unlabeled data can naturally be used to improve the expected robust risk in the general setting, so that leveraging unlabeled data is a way to improve adversarially robust generalization.

- In Section 3.2.2, we discuss a specific Gaussian mixture problem introduced in Schmidt et al. (2018), where the authors proved that the labeled sample complexity for robust generalization is significantly larger than that for standard generalization. As an extension of this work, we prove that in this case the labeled sample complexity for robust generalization can be the same as that for standard generalization if we have enough unlabeled data.

- Inspired by our theoretical findings, we provide an adversarially robust training algorithm that uses both labeled and unlabeled data. Our experimental results show that the algorithm achieves better performance than baseline algorithms on MNIST and Cifar10, which empirically confirms that unlabeled data can help improve adversarially robust generalization.
2 Related works
Adversarial attacks and defense
Most previous works study how to attack a neural network model using small perturbations under certain norm constraints, such as the ℓ∞ norm or the ℓ2 norm. For the ℓ∞ constraint, the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015) finds a direction along which the perturbation increases the classification loss at an input point to the greatest extent; Projected Gradient Descent (PGD) (Madry et al., 2017) extends FGSM by updating the direction of the attack in an iterative manner and clipping the modifications back into the ℓ∞ ball after each iteration. For the ℓ2 constraint, DeepFool (Moosavi-Dezfooli et al., 2016) iteratively computes a minimal-ℓ2-norm adversarial perturbation by linearizing the classifier around the input in each iteration. The C&W attack (Carlini & Wagner, 2016) is a comprehensive approach that works under both norm constraints. In this work, we focus on learning a robust model to defend against white-box attacks, i.e. the attacker knows the model parameters and thus can use the algorithms above to attack the model.
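To make the iterative scheme concrete, here is a minimal numpy sketch of an ℓ∞ PGD attack on a binary logistic model (a toy stand-in for a network; the function name and parameter values are our own illustrative choices):

```python
import numpy as np

def pgd_attack_linear(w, x, y, eps=0.3, alpha=0.1, steps=7):
    """PGD under an l_inf constraint for a binary logistic model.

    Maximizes log(1 + exp(-y * <w, x'>)) over ||x' - x||_inf <= eps.
    A toy sketch; real attacks back-propagate through a network.
    """
    x_adv = x.copy()
    for _ in range(steps):
        margin = y * np.dot(w, x_adv)
        # gradient of the logistic loss w.r.t. x'; 1/(1+exp(margin)) = sigmoid(-margin)
        grad = -y * w / (1.0 + np.exp(margin))
        x_adv = x_adv + alpha * np.sign(grad)      # ascent step (FGSM direction)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the eps-ball
    return x_adv

w = np.array([1.0, -2.0])
x = np.array([0.5, 0.5])
x_adv = pgd_attack_linear(w, x, y=1, eps=0.3)
# the perturbation stays inside the l_inf ball and lowers the margin
assert np.max(np.abs(x_adv - x)) <= 0.3 + 1e-9
assert np.dot(w, x_adv) < np.dot(w, x)
```

With a single step and step size equal to the budget, the same routine reduces to FGSM.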
There are a large number of papers on defending against adversarial attacks, but the results are far from satisfactory. Remarkably, Athalye et al. (2018) shows that most defense methods rely on so-called "gradient masking" and provides an attack called BPDA to correct the gradients. So far, adversarial training (Madry et al., 2017) has been the most successful white-box defense algorithm. By modeling the learning problem as a minimax game between the attacker and the defender, the robust model can be trained using iterative optimization methods. Some recent papers (Wang et al., 2019; Gao et al., 2019) theoretically prove the convergence of adversarial training. Moreover, Shafahi et al. (2019); Zhang et al. (2019a) propose ways to accelerate adversarial training. Adversarial logit pairing (Kannan et al., 2018) and TRADES (Zhang et al., 2019b) further improve adversarial training by decomposing the prediction error into the sum of classification error and boundary error, and Wang et al. (2019) proposes to improve adversarial training by evaluating the quality of adversarial examples with the FOSC metric.
Semi-supervised learning
Using unlabeled data to help the learning process has proved promising in different applications (Rasmus et al., 2015; Zhang & Shi, 2011; Elworthy, 1994). Many approaches use regularizers called "soft constraints" to make the model "behave" well on unlabeled data. For example, the transductive SVM (Joachims, 1999) uses prediction confidence as a soft constraint, and graph-based SSL (Belkin et al., 2006; Talukdar & Crammer, 2009) requires the model to have similar outputs at the endpoints of an edge. The most closely related work to ours is consistency-based SSL. It uses consistency as a soft constraint, which encourages the model to make consistent predictions on unlabeled data when a small perturbation is added. The consistency target can be computed either from the model's own predictions, as in the Π model (Sajjadi et al., 2016), Temporal Ensembling (Laine & Aila, 2016) and Virtual Adversarial Training (Miyato et al., 2018), or from the predictions of a teacher model, as in the mean teacher model (Tarvainen & Valpola, 2017).
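As a concrete illustration of such a consistency soft constraint (a numpy sketch in the spirit of, but not identical to, the cited methods; the function names are ours), one can penalize the KL divergence between the model's predictive distributions on a clean and a perturbed input, which requires no label:

```python
import numpy as np

def softmax(z):
    z = z - z.max()           # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

def consistency_loss(logits_clean, logits_perturbed):
    """KL(p_clean || p_perturbed): only the model's own predictions are
    needed, which is why unlabeled data suffices for this term."""
    p = softmax(logits_clean)
    q = softmax(logits_perturbed)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# identical predictions give zero loss; disagreement gives a positive penalty
assert consistency_loss(np.array([2.0, 0.0]), np.array([2.0, 0.0])) < 1e-12
assert consistency_loss(np.array([2.0, 0.0]), np.array([0.0, 2.0])) > 0.1
```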
Semi-supervised learning for adversarially robust generalization
There are three other concurrent and independent works (Carmon et al., 2019; Uesato et al., 2019; Najafi et al., 2019) that also explore how to use unlabeled data to help adversarially robust generalization. We describe the three works below and compare them with ours. See also Carmon et al. (2019) and Uesato et al. (2019) for a comparison of all four works from their perspective.
Najafi et al. (2019) investigate robust semi-supervised learning from the distributionally robust optimization perspective. They assign soft labels to the unlabeled data according to an adversarial loss and train on such images together with the labeled ones. Results on a wide range of tasks show that the proposed algorithm improves adversarially robust generalization. Like us, Najafi et al. (2019) conduct semi-supervised experiments by removing labels from the training data.
Uesato et al. (2019) study the Gaussian mixture model of Schmidt et al. (2018) and theoretically show that a self-training algorithm can successfully leverage unlabeled data to improve adversarial robustness. They extend the self-training algorithm to the real image dataset Cifar10, augment it with the unlabeled Tiny Images dataset and improve state-of-the-art adversarial robustness. They show strong improvements in low-label regimes by removing most labels from Cifar10 and SVHN. In our work, we also study the Gaussian mixture model and show that a slightly different algorithm can improve adversarially robust generalization as well. We observe similar improvements using our algorithm on Cifar10 and MNIST.
Carmon et al. (2019) obtain theoretical and empirical results similar to those in Uesato et al. (2019), and offer a more comprehensive analysis of other aspects. They show that by using unlabeled data and robust self-training, the learned models can obtain better certified robustness against all possible attacks. Moreover, they study the impact of different training components on the final model performance, such as the amount of unlabeled data. We also study the influence of different factors in our experiments and have similar observations.
3 Main results
In this section, we illustrate the benefits of using unlabeled data for robust generalization from a theoretical perspective.
3.1 Notations and definitions
We consider a standard classification task with an underlying data distribution D over pairs of examples x ∈ X and corresponding labels y ∈ Y. Usually D is unknown, and we can only access a sample S = {(x₁, y₁), …, (xₙ, yₙ)} in which each (xᵢ, yᵢ) is independently and identically drawn from D. For ease of reference, we denote this empirical distribution as D̂ (i.e. the uniform distribution over the n i.i.d. sampled data points). We also assume that we are given a suitable loss function ℓ(f(x), y), where the classifier f is parameterized by θ. The standard loss function is the zero-one loss, i.e. ℓ(f(x), y) = 1{f(x) ≠ y}. Due to its discontinuous and non-differentiable nature, surrogate loss functions such as the cross-entropy or mean squared loss are commonly used during optimization.
Our goal is to find an f that minimizes the expected classification risk. Without loss of generality, our theory is mainly stated for the binary classification problem, i.e. Y = {−1, +1}. All theorems below can be easily extended to the multi-class classification problem. For a binary classification problem, the expected classification risk is defined as below.
Definition 1.
(Expected classification risk). Let D be a probability distribution over X × Y. The expected classification risk of a classifier f under distribution D and loss function ℓ is defined as R_D(f) := E_{(x,y)∼D} [ℓ(f(x), y)].
We use R_D(f) to denote the classification risk under the underlying distribution and R_{D̂}(f) to denote the classification risk under the empirical distribution; we add the superscript 0-1, as in R_D^{0-1}(f), to denote the risk with the zero-one loss function. The classification risk characterizes whether the model is accurate. However, we also care about whether f is robust. For example, when the input x is an image, we hope a small change (perturbation) to x will not change the prediction f(x). To this end, Schmidt et al. (2018) defines the expected robust classification risk as follows.
Definition 2.
(Expected robust classification risk). Let D be a probability distribution over X × Y and let B(x) ⊆ X be a perturbation set around x. Then the robust classification risk of a classifier f under distribution D and loss function ℓ is defined as R̃_D(f) := E_{(x,y)∼D} [max_{x′∈B(x)} ℓ(f(x′), y)].
Again, we use R̃_D(f) to denote the expected robust classification risk under the underlying distribution, R̃_{D̂}(f) under the empirical distribution, and R̃_D^{0-1}(f) for the robust risk with the zero-one loss function. In real practice, the most commonly used setting is perturbation under a bounded ℓ∞ norm constraint, B(x) = {x′ : ‖x′ − x‖∞ ≤ ε}. For simplicity, we refer to the robustness defined by this perturbation set as ℓ∞ robustness.
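For intuition, the robust zero-one risk of Definition 2 can be computed exactly for a linear classifier, since the inner maximization over an ℓ∞ ball has a closed form (a toy numpy sketch; the function name and data are ours):

```python
import numpy as np

def robust_01_risk_linear(w, X, y, eps):
    """Exact robust 0-1 risk of f(x) = sign(<w, x>) under l_inf perturbations.

    For a linear classifier the inner max is closed-form: the adversary can
    shift the margin by at most eps * ||w||_1, so a point is robustly
    correct iff y * <w, x> > eps * ||w||_1.
    """
    margins = y * (X @ w)
    return float(np.mean(margins <= eps * np.linalg.norm(w, 1)))

w = np.array([1.0, 1.0])
X = np.array([[2.0, 2.0], [0.3, 0.3], [-1.0, -1.0]])
y = np.array([1, 1, -1])
assert robust_01_risk_linear(w, X, y, eps=0.0) == 0.0       # standard risk: all correct
assert abs(robust_01_risk_linear(w, X, y, eps=0.5) - 1/3) < 1e-12  # the small-margin point falls
```

Setting eps = 0 recovers the standard classification risk, which is why robust risk always upper-bounds standard risk.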
3.2 Robust generalization analysis
Our first result (Section 3.2.1) shows that unlabeled data can be used to improve adversarially robust generalization in the general setting. Our second result (Section 3.2.2) shows that for a specific learning problem defined on a Gaussian mixture model, compared to previous work (Schmidt et al., 2018), the labeled sample complexity for robust generalization can be significantly reduced by using unlabeled data. Both results suggest that using unlabeled data is a natural way to improve adversarially robust generalization. All detailed proofs of the theorems and lemmas in this section can be found in the appendix.
3.2.1 General results
In this subsection, we show that the expected robust classification risk can be bounded by the sum of two terms. The first term only depends on the hypothesis space and the unlabeled data, and the second term is a standard PAC bound.
Theorem 1.
Let F be the hypothesis space and let S = {(x₁, y₁), …, (xₙ, yₙ)} be a set of n i.i.d. samples drawn from the underlying distribution D. For any function f ∈ F, with probability at least 1 − δ over the random draw of S, we have
R̃_D^{0-1}(f) ≤ E_{x∼D_X} [max_{x′∈B(x)} 1{f(x′) ≠ f(x)}] + R_{D̂}^{0-1}(f) + 2 Rad_S(F) + 3 √(log(2/δ)/(2n)),   (1)
where the first term on the right-hand side can be optimized with only unlabeled data and the remaining terms form the standard PAC generalization bound (Mohri et al., 2012). Here D_X is the marginal distribution of D over x and Rad_S(F) is the empirical Rademacher complexity of the hypothesis space F.
From Theorem 1, we can see that the expected robust classification risk is bounded by the sum of two parts: the first part only involves the marginal distribution D_X, and the second part is the standard PAC generalization error bound. This shows that expected robust risk minimization can be achieved by jointly optimizing the two parts: we can optimize the first part using unlabeled data sampled from D_X and optimize the second part using labeled data sampled from D, exactly as in standard supervised learning.
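The decomposition behind Theorem 1 rests on a pointwise inequality for the robust zero-one loss; a short derivation in our notation is:

```latex
% If the prediction at a perturbed point x' differs from the label y, then
% either the prediction flipped under the perturbation, or the clean
% prediction was already wrong:
\max_{x' \in B(x)} \mathbf{1}\{f(x') \neq y\}
  \;\le\; \max_{x' \in B(x)} \mathbf{1}\{f(x') \neq f(x)\}
  \;+\; \mathbf{1}\{f(x) \neq y\}.
% Taking expectations over (x, y) \sim \mathcal{D} bounds the robust risk by
% the label-free stability term plus the standard classification risk:
\widetilde{R}^{0\text{-}1}_{\mathcal{D}}(f)
  \;\le\; \mathbb{E}_{x \sim \mathcal{D}_X}
          \Big[\max_{x' \in B(x)} \mathbf{1}\{f(x') \neq f(x)\}\Big]
          \;+\; R^{0\text{-}1}_{\mathcal{D}}(f).
```

Applying the standard Rademacher generalization bound to the second (accuracy) term then yields the labeled-data part of the theorem.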
While Cullina et al. (2018) suggests that in the standard PAC learning scenario (where only labeled data is considered) the generalization gap of the robust risk can sometimes fail to be controlled by the capacity of the hypothesis space, our results show that we can mitigate this problem by introducing unlabeled data. In fact, our following result shows that with enough unlabeled data, learning a robust model can be almost as easy as learning a standard model.
3.2.2 Learning from Gaussian mixture model
The learning problem defined on a Gaussian mixture model is illustrated in Schmidt et al. (2018) as an example showing that adversarially robust generalization requires much more labeled data than standard generalization. In this subsection, we show that for this specific problem, just using more unlabeled data is enough to achieve adversarially robust generalization. For completeness, we first list the results in Schmidt et al. (2018) and then show our theoretical findings.
Definition 3.
(Gaussian mixture model (Schmidt et al., 2018)). Let θ* ∈ ℝ^d be the per-class mean vector and let σ > 0 be the variance parameter. Then the (θ*, σ)-Gaussian mixture model is defined by the following distribution over (x, y) ∈ ℝ^d × {−1, +1}: first, draw a label y ∈ {−1, +1} uniformly at random; then sample the data point x ∼ N(y · θ*, σ²I).
Given samples from the distribution defined above, the learning problem is to find a linear classifier f_w(x) = sign(⟨w, x⟩) that predicts the label y from x. Schmidt et al. (2018) proved the following sample complexity bound for standard generalization.
Theorem 2.
(Theorem 4 in Schmidt et al. (2018)). Let (x, y) be drawn from a (θ*, σ)-Gaussian mixture model with ‖θ*‖₂ = √d and σ ≤ c · d^{1/4}, where c is a universal constant. Let ŵ be the vector y · x. Then with high probability, the expected classification risk of the linear classifier f_ŵ using 0-1 loss is at most 1%.
Theorem 2 suggests that we can learn a linear classifier with low classification risk (e.g., 1%) even if there is only one labeled example. However, the following theorem shows that for adversarially robust generalization under ℓ∞ perturbations, significantly more labeled data is required.
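A quick simulation of Definition 3 and the single-sample classifier of Theorem 2 (a sketch with illustrative values d = 500 and σ = 0.5 · d^{1/4}; all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 500
theta = rng.standard_normal(d)
theta *= np.sqrt(d) / np.linalg.norm(theta)   # per-class mean with ||theta||_2 = sqrt(d)
sigma = 0.5 * d ** 0.25                        # variance parameter within the theorem's regime

def sample(n):
    y = rng.choice([-1, 1], size=n)            # labels drawn uniformly at random
    x = y[:, None] * theta + sigma * rng.standard_normal((n, d))
    return x, y

# a single labeled sample; the classifier is w_hat = y * x as in Theorem 2
x, y = sample(1)
w_hat = y[0] * x[0]
X_test, y_test = sample(2000)
acc = np.mean(np.sign(X_test @ w_hat) == y_test)
assert acc > 0.95   # one labeled point already yields a highly accurate classifier
```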
Theorem 3.
(Theorem 6 in Schmidt et al. (2018)). Let gₙ be any learning algorithm, i.e. a function from n samples to a binary classifier fₙ. Moreover, let σ = c₁ · d^{1/4}, let ε ≥ 0, and let θ* ∈ ℝ^d be drawn from N(0, I). We also draw n samples from the (θ*, σ)-Gaussian mixture model. Then the expected ℓ∞-robust classification risk (with budget ε) of fₙ using 0-1 loss is at least (1 − 1/d) · 1/2 if the number of labeled data n ≤ c₂ ε²√d / log d.
As we can see from the theorem above, the sample complexity for robust generalization is larger than that for standard generalization by a factor of √d. This shows that for high-dimensional problems, adversarial robustness can provably require a significantly larger number of samples. We provide a new result which shows that the learned model can be robust if there is only one labeled example and sufficiently many unlabeled examples. Our theorem is stated as follows:
Theorem 4.
Let (x₀, y₀) be a labeled point drawn from the (θ*, σ)-Gaussian mixture model with ‖θ*‖₂ = √d and σ = c₁ · d^{1/4}. Let x₁, …, xₙ be n unlabeled points drawn from the same model with labels removed. Let v̂ be a unit maximal eigenvector of the sample covariance matrix (1/n) Σᵢ xᵢxᵢᵀ, and let ŵ = sign(y₀⟨x₀, v̂⟩) · v̂. Then there exists a constant c such that for any ε ≤ c, with high probability, the expected ℓ∞-robust classification risk (with budget ε) of f_ŵ using 0-1 loss is at most 1% when the number of unlabeled points n is sufficiently large.
From Theorem 4, we can see that when the number of unlabeled points is sufficiently large, we can learn a highly accurate and robust model using only one labeled point.
Proof sketch
The learning process can be intuitively described in three steps. In the first step, we use the unlabeled data to estimate the direction of θ*, although we do not know which of the two labels each cluster corresponds to. Specifically, we choose the unit direction v that maximizes (1/n) Σᵢ ⟨v, xᵢ⟩², which can be viewed as a measure of the model's confidence at the data points. In the second step, we use the given labeled point to determine the sign of this direction; we note that when the direction is correctly estimated in the first step, the single labeled point is sufficient to give the correct sign with high probability. Finally, we obtain a good estimate of θ* by combining the two steps above and thereby learn a robust classifier. The three key lemmas corresponding to the three steps are listed below (c₁, c₂, … denote constants).
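The three-step procedure can be simulated directly (a sketch; the dimensions, sample sizes, and names are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_unlabeled = 200, 5000
theta = rng.standard_normal(d)
theta *= np.sqrt(d) / np.linalg.norm(theta)    # ||theta||_2 = sqrt(d)
sigma = 0.5 * d ** 0.25

def sample(n):
    y = rng.choice([-1, 1], size=n)
    return y[:, None] * theta + sigma * rng.standard_normal((n, d)), y

# Step 1: top eigenvector of the second-moment matrix of the unlabeled points.
# In population it equals theta*theta^T + sigma^2 I, whose top eigenvector
# is theta/||theta|| (up to sign).
X_u, _ = sample(n_unlabeled)                   # labels are discarded
M = X_u.T @ X_u / n_unlabeled
eigvals, eigvecs = np.linalg.eigh(M)           # eigh returns ascending eigenvalues
v = eigvecs[:, -1]                             # estimated direction, sign unknown

# Step 2: one labeled point resolves the sign ambiguity
x0, y0 = sample(1)
s = np.sign(y0[0] * np.dot(x0[0], v))

# Step 3: combine; w_hat now points (approximately) along theta
w_hat = s * v
X_test, y_test = sample(2000)
acc = np.mean(np.sign(X_test @ w_hat) == y_test)
assert acc > 0.95   # one label plus unlabeled data recovers an accurate classifier
```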
Lemma 1.
Under the same setting as Theorem 4, suppose the number of unlabeled points n is sufficiently large. Then, with high probability, there is a unique unit maximal eigenvector v̂ of the sample covariance matrix (1/n) Σᵢ xᵢxᵢᵀ such that v̂ is well aligned with θ*/‖θ*‖₂ (up to sign).
Lemma 2.
Under the same setting as Theorem 4, suppose v is a unit vector such that ⟨v, θ*/‖θ*‖₂⟩ ≥ c₂ for some constant c₂ > 0. Then with high probability, we have sign(y₀⟨x₀, v⟩) = 1, i.e. the single labeled point recovers the correct sign of the estimated direction.
Lemma 3.
Under the same setting as Theorem 4, suppose ŵ is a unit vector sufficiently well aligned with θ*/‖θ*‖₂. Then the expected ℓ∞-robust classification risk of the linear classifier f_ŵ using 0-1 loss is at most 1%.
Our theoretical findings suggest that we can improve the adversarially robust generalization using unlabeled data. In the next section, we will present a practical algorithm for real applications, which further verifies our main results.
4 Algorithm and experiments
4.1 Practical algorithm
Let S_l be a set of labeled data and S_u be a set of unlabeled data. Motivated by the theory in the previous section, to achieve better adversarially robust generalization, we can optimize the classifier to be accurate on the labeled data and stable under perturbations on all data. This is equivalent to making the classifier accurate and robust on S_l and stable on S_u. Therefore, we design two loss terms on S_l and S_u separately.
For the labeled dataset , we use the standard robust adversarial training objective function:
L_l(θ) = (1/|S_l|) Σ_{(x,y)∈S_l} max_{x′∈B(x)} ℓ(f_θ(x′), y).   (2)
Following the most common setting, during training the classifier outputs a probability distribution over categories and is evaluated by the cross-entropy loss ℓ(f(x), y) = −log p_y(x), where p_y(x) is the output probability for category y.
For unlabeled data , we use an objective function which measures robustness without labels:
L_u(θ) = (1/|S_u|) Σ_{x∈S_u} max_{x′∈B(x)} ℓ(f_θ(x′), f_θ(x)).   (3)
Putting the two objective functions together, our training loss is defined as a combination of L_l and L_u as follows:

L(θ) = L_l(θ) + λ · L_u(θ).   (4)
Here λ is a coefficient that trades off the two loss terms. In practice, we use iterative optimization methods to learn f_θ. In the inner loop, we fix the model and use Projected Gradient Descent (PGD) to find the perturbation x′ for each x. In the outer loop, we use stochastic gradient descent to optimize θ on the perturbed samples. The general training process is shown in Algorithm 1.
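The combined objective can be sketched for a binary linear model, where the inner PGD maximization is replaced by its exact ℓ∞ closed form (this shortcut only holds for linear models; the function names and the use of the model's own prediction as the target in the unlabeled term are our illustrative choices, not the exact algorithm above):

```python
import numpy as np

def logistic(z):
    # log(1 + exp(-z)): logistic surrogate of the 0-1 loss on the margin z
    return np.log1p(np.exp(-z))

def labeled_robust_loss(w, X, y, eps):
    """Eq.-(2)-style term: worst-case l_inf logistic loss on labeled data.
    For a linear model the inner max is closed-form: the adversary reduces
    each margin by eps * ||w||_1."""
    margins = y * (X @ w) - eps * np.linalg.norm(w, 1)
    return np.mean(logistic(margins))

def unlabeled_stability_loss(w, X, eps):
    """Eq.-(3)-style term: penalizes prediction changes under perturbation,
    using the model's own prediction sign(<w, x>) in place of a label."""
    margins = np.abs(X @ w) - eps * np.linalg.norm(w, 1)
    return np.mean(logistic(margins))

def total_loss(w, X_l, y_l, X_u, eps, lam):
    # Eq.-(4)-style combination with trade-off coefficient lam
    return labeled_robust_loss(w, X_l, y_l, eps) + lam * unlabeled_stability_loss(w, X_u, eps)

w = np.array([1.0, -1.0])
X_l = np.array([[1.0, -1.0], [-2.0, 2.0]]); y_l = np.array([1, -1])
X_u = np.array([[0.5, -0.5], [-1.0, 1.0]])
# a nonzero budget strictly increases the loss; lam = 0 recovers the labeled term
assert total_loss(w, X_l, y_l, X_u, eps=0.1, lam=0.5) > total_loss(w, X_l, y_l, X_u, eps=0.0, lam=0.5)
assert total_loss(w, X_l, y_l, X_u, eps=0.1, lam=0.0) == labeled_robust_loss(w, X_l, y_l, 0.1)
```

For a network, the closed form is unavailable and the inner maximization is approximated by the PGD loop of Algorithm 1.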
Remark
We notice that Algorithm 1 is a generalized version of Virtual Adversarial Training (VAT) (Miyato et al., 2018). When the number of PGD steps is set to 1, the algorithm is almost equivalent to the original VAT algorithm, which is particularly useful for improving standard generalization. However, according to our experimental results below, a single PGD step does not help improve adversarially robust generalization; the improvement from unlabeled data appears when a larger number of PGD steps is used.
4.2 Experimental setting
We verify Algorithm 1 on MNIST and Cifar10. Following Madry et al. (2017), we use a ResNet model and widen the network by a factor of 10. This results in a network with five residual units with (16, 160, 320, 640) filters each. During training, we apply data augmentation including random crops and flips, as well as per-image standardization. The initial learning rate is 0.1 and is decayed by a factor of 0.1 twice during training. In the inner loop, we run 7-step PGD for each mini-batch. The perturbation is constrained under an ℓ∞ norm bound.
Following many previous works (Laine & Aila, 2016; Tarvainen & Valpola, 2017; Miyato et al., 2018; Athiwaratkun et al., 2019), we sample 5k/10k labeled data from the training set and use them as labeled data. We mask out the labels of the remaining images in the training set and use them as unlabeled data. By doing this, we obtain two semi-supervised learning tasks, which we call the 5k/10k experiments. In a mini-batch, we sample 25/50 labeled images and 225/200 unlabeled images for the 5k/10k experiments respectively. In both experiments, we run several different values of λ as an ablation study for this hyper-parameter. The learning rate is decayed twice during training. We use the original PGD-based adversarial training (Madry et al., 2017) on the sampled 5k/10k labeled data as the baseline for comparison (referred to as PGD-adv). Our algorithm is referred to as Ours.
4.3 Experimental results
Table 1: Results on MNIST (%).

| n_labeled | Method         | NA (train) | NA (test) | RA (train) | RA (test) | DSR   |
|-----------|----------------|------------|-----------|------------|-----------|-------|
| 5k        | PGD-adv on 5k  | 98.31      | 98.38     | 96.95      | 96.89     | 98.49 |
| 5k        | Ours ( , )     | 98.36      | 98.54     | 97.82      | 97.19     | 98.63 |
| 5k        | Ours ( , )     | 98.43      | 98.55     | 98.18      | 97.28     | 98.71 |
| 5k        | Ours ( , )     | 98.56      | 98.56     | 98.46      | 97.31     | 98.73 |
| 10k       | PGD-adv on 10k | 98.91      | 98.83     | 97.96      | 97.64     | 98.80 |
| 10k       | Ours ( , )     | 98.92      | 98.92     | 98.55      | 97.91     | 98.98 |
| 10k       | Ours ( , )     | 98.90      | 98.89     | 98.76      | 97.93     | 99.03 |
| 10k       | Ours ( , )     | 98.93      | 98.87     | 98.77      | 98.01     | 99.13 |
| 50k       | PGD-adv on 50k | 99.89      | 99.44     | 99.77      | 98.84     | 99.40 |
Table 2: Results on Cifar10 (%).

| n_labeled | Method         | NA (train) | NA (test) | RA (train) | RA (test) | DSR   |
|-----------|----------------|------------|-----------|------------|-----------|-------|
| 5k        | PGD-adv on 5k  | 61.18      | 60.57     | 32.40      | 30.54     | 50.42 |
| 5k        | Ours ( , )     | 63.24      | 60.44     | 32.97      | 30.90     | 51.13 |
| 5k        | Ours ( , )     | 61.73      | 60.71     | 35.20      | 32.96     | 54.29 |
| 5k        | Ours ( , )     | 61.88      | 60.46     | 35.07      | 33.54     | 55.47 |
| 5k        | Ours ( , )     | 68.15      | 67.14     | 0.13       | 0.12      | 0.00  |
| 10k       | PGD-adv on 10k | 78.80      | 73.79     | 45.60      | 37.48     | 50.79 |
| 10k       | Ours ( , )     | 78.24      | 72.92     | 47.96      | 38.86     | 53.29 |
| 10k       | Ours ( , )     | 78.74      | 73.16     | 51.20      | 41.18     | 56.29 |
| 10k       | Ours ( , )     | 78.95      | 73.35     | 52.24      | 42.48     | 57.91 |
| 10k       | Ours ( , )     | 81.43      | 78.64     | 2.22       | 2.27      | 0.03  |
| 50k       | PGD-adv on 50k | 99.91      | 85.40     | 96.71      | 49.99     | 58.54 |
We list all results of the 5k/10k experiments in Tables 1 and 2. We use five criteria to evaluate the performance of the model: the natural training/test accuracy (NA (train) and NA (test)), the robust training/test accuracy under the PGD-7 attack (RA (train) and RA (test)), and the defense success rate (DSR).
First, we can see that in both experiments, the robust test accuracy is improved when we use unlabeled data. For example, on Cifar10 the robust test accuracy of the models trained with the largest λ in the 5k/10k experiments increases by 3.0/5.0 percentage points over the PGD-adv baselines. We also check the defense success rate, which evaluates whether the model is robust given that its prediction is correct. As we can see from the last column of Tables 1 and 2, the defense success rate of models trained using our method is much higher than that of the baselines. In particular, the defense success rate of the model trained in the 10k experiment is competitive with that of the model trained using PGD-adv on the whole dataset. This clearly shows the advantage of our algorithm.
Second, we can also see the influence of the value of λ: models trained with a larger λ achieve higher robust accuracy. For example, in the 10k experiment, the robust test accuracy of the model trained with the largest λ is noticeably better than that with the smallest. However, we observe that training becomes hard to converge if λ is too large.
Third, using a larger number of PGD steps produces more robust models. As we can see from the tables, in the Cifar10 experiments, relatively higher natural training/test accuracy can be achieved with a single PGD step (the vanilla VAT algorithm). However, the robust training/test accuracies are then significantly worse and near zero. This clearly shows that using a stronger attack on both labeled and unlabeled data leads to better adversarially robust generalization, which is consistent with our theory.
5 Conclusion
In this paper, we theoretically and empirically show that with just more unlabeled data, we can learn models with better adversarially robust generalization. We first give an expected robust risk decomposition theorem and then show that for a specific learning problem on the Gaussian mixture model, the adversarially robust generalization can be almost as easy as standard generalization. Based on these theoretical results, we develop an algorithm which leverages unlabeled data during training and empirically show its advantage. As future work, we will study the sample complexity of unlabeled data for broader function classes and solve more challenging real tasks.
References
 Athalye et al. (2018) Anish Athalye, Nicholas Carlini, and David A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. CoRR, abs/1802.00420, 2018.
 Athiwaratkun et al. (2019) Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, and Andrew Gordon Wilson. There are many consistent explanations of unlabeled data: Why you should average. In International Conference on Learning Representations, 2019.
 Belkin et al. (2006) Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of machine learning research, 7(Nov):2399–2434, 2006.
 Biggio et al. (2013) Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European conference on machine learning and knowledge discovery in databases, pp. 387–402. Springer, 2013.
 Carlini & Wagner (2016) Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. CoRR, abs/1608.04644, 2016.
 Carmon et al. (2019) Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, and John C Duchi. Unlabeled data improves adversarial robustness. arXiv preprint arXiv:1905.13736, 2019.
 Chen et al. (2018) Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv preprint arXiv:1802.02611, 2018.
 Cullina et al. (2018) Daniel Cullina, Arjun Nitin Bhagoji, and Prateek Mittal. PAC-learning in the presence of evasion adversaries. arXiv preprint arXiv:1806.01471, 2018.
 Elworthy (1994) David Elworthy. Does Baum-Welch re-estimation help taggers? In Proceedings of the Fourth Conference on Applied Natural Language Processing, ANLC '94, pp. 53–58, Stroudsburg, PA, USA, 1994. Association for Computational Linguistics. doi: 10.3115/974358.974371. URL https://doi.org/10.3115/974358.974371.
 Fawzi et al. (2018) Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier. arXiv preprint arXiv:1802.08686, 2018.
 Gao et al. (2019) Ruiqi Gao, Tianle Cai, Haochuan Li, Liwei Wang, Cho-Jui Hsieh, and Jason D Lee. Convergence of adversarial training in overparametrized networks. arXiv preprint arXiv:1906.07916, 2019.
 Goodfellow et al. (2015) Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
 He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
 Hu et al. (2017) Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. arXiv preprint arXiv:1709.01507, 7, 2017.
 Huang et al. (2017) Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In CVPR, volume 1(2), pp. 3, 2017.
 Huang et al. (2015) Ruitong Huang, Bing Xu, Dale Schuurmans, and Csaba Szepesvári. Learning with a strong adversary. arXiv preprint arXiv:1511.03034, 2015.
 Joachims (1999) Thorsten Joachims. Transductive inference for text classification using support vector machines. In Proceedings of the Sixteenth International Conference on Machine Learning, ICML ’99, pp. 200–209, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc. ISBN 1558606122.
 Kannan et al. (2018) Harini Kannan, Alexey Kurakin, and Ian J. Goodfellow. Adversarial logit pairing. CoRR, abs/1803.06373, 2018.
 Laine & Aila (2016) Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. CoRR, abs/1610.02242, 2016.
 LeCun et al. (1998) Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
 LeCun et al. (2015) Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436, 2015.
 Lin et al. (2018) Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. IEEE transactions on pattern analysis and machine intelligence, 2018.
 Long et al. (2015) Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440, 2015.
 Madry et al. (2017) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
 Miyato et al. (2018) Takeru Miyato, Shin-ichi Maeda, Shin Ishii, and Masanori Koyama. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE transactions on pattern analysis and machine intelligence, 2018.
 Mohri et al. (2012) Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2012. ISBN 9780262018258.
 Moosavi-Dezfooli et al. (2016) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
 Najafi et al. (2019) Amir Najafi, Shin-ichi Maeda, Masanori Koyama, and Takeru Miyato. Robustness to adversarial perturbations in learning from incomplete data. arXiv preprint arXiv:1905.13021, 2019.
 Papernot et al. (2016) Nicolas Papernot, Patrick D. McDaniel, and Ian J. Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. CoRR, abs/1605.07277, 2016.
 Rasmus et al. (2015) Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko. Semi-supervised learning with ladder networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (eds.), Advances in Neural Information Processing Systems 28, pp. 3546–3554. Curran Associates, Inc., 2015.
 Redmon et al. (2016) Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788, 2016.
 Ren et al. (2015) Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pp. 91–99, 2015.
 Sajjadi et al. (2016) Mehdi Sajjadi, Mehran Javanmardi, and Tolga Tasdizen. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. CoRR, abs/1606.04586, 2016.
 Schmidt et al. (2018) Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems, pp. 5019–5031, 2018.
 Shafahi et al. (2019) Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! arXiv preprint arXiv:1904.12843, 2019.
 Simonyan & Zisserman (2014) Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
 Szegedy et al. (2013) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
 Talukdar & Crammer (2009) Partha Pratim Talukdar and Koby Crammer. New regularized algorithms for transductive learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 442–457. Springer, 2009.
 Tarvainen & Valpola (2017) Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems 30, pp. 1195–1204. Curran Associates, Inc., 2017.
 Tran et al. (2015) Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the IEEE international conference on computer vision, pp. 4489–4497, 2015.
 Tsipras et al. (2019) Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. In International Conference on Learning Representations, 2019.
 Uesato et al. (2019) Jonathan Uesato, Jean-Baptiste Alayrac, Po-Sen Huang, Robert Stanforth, Alhussein Fawzi, and Pushmeet Kohli. Are labels required for improving adversarial robustness? CoRR, abs/1905.13725, 2019. URL http://arxiv.org/abs/1905.13725.
 Wainwright (2019) Martin J. Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2019. doi: 10.1017/9781108627771.
 Wang et al. (2016) Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool. Temporal segment networks: Towards good practices for deep action recognition. In European Conference on Computer Vision, pp. 20–36. Springer, 2016.
 Wang et al. (2018) Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1(3), pp. 4, 2018.
 Wang et al. (2019) Yisen Wang, Xingjun Ma, James Bailey, Jinfeng Yi, Bowen Zhou, and Quanquan Gu. On the convergence and robustness of adversarial training. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 6586–6595, Long Beach, California, USA, 09–15 Jun 2019. PMLR.
 Zhang & Shi (2011) Bing Zhang and Mingguang Shi. Semi-supervised learning improves gene expression-based prediction of cancer recurrence. Bioinformatics, 27(21):3017–3023, 09 2011. ISSN 1367-4803. doi: 10.1093/bioinformatics/btr502.
 Zhang et al. (2019a) Dinghuai Zhang, Tianyuan Zhang, Yiping Lu, Zhanxing Zhu, and Bin Dong. You only propagate once: Painless adversarial training using maximal principle. arXiv preprint arXiv:1905.00877, 2019a.
 Zhang et al. (2019b) Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled tradeoff between robustness and accuracy. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 7472–7482, Long Beach, California, USA, 09–15 Jun 2019b. PMLR.
 Zhao et al. (2017) Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2881–2890, 2017.
Appendix A Background on generalization and Rademacher complexity
The Rademacher complexity is a commonly used capacity measure for a hypothesis space.
Definition 4.
Given a set of samples $S := \{z_1, \ldots, z_n\}$, the empirical Rademacher complexity of a function class $\mathcal{F}$ (mapping from $\mathcal{Z}$ to $\mathbb{R}$) is defined as:

$$\mathcal{R}_S(\mathcal{F}) := \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(z_i)\right], \tag{5}$$

where $\sigma = (\sigma_1, \ldots, \sigma_n)$ contains i.i.d. random variables drawn from the Rademacher distribution $\mathrm{unif}(\{1, -1\})$.
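The definition can be illustrated numerically. Below is a minimal Monte Carlo sketch (the helper `empirical_rademacher` and the two toy function classes are our own illustration, not from the paper): for the maximally rich class containing every sign pattern on $n$ points the supremum matches any draw of $\sigma$, so the complexity is one, while a single fixed function cannot fit random signs and has complexity near zero.

```python
import numpy as np

def empirical_rademacher(function_values, num_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity (5).

    function_values: (k, n) array whose row j is (f_j(z_1), ..., f_j(z_n))
    for the j-th function of a finite class F evaluated on the sample S.
    """
    _, n = function_values.shape
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)        # Rademacher variables
        total += np.max(function_values @ sigma) / n   # sup over f in F
    return total / num_draws

n = 8
# Rich class: all 2^n sign patterns on the n sample points.
rich = np.array([[1.0 if (j >> i) & 1 else -1.0 for i in range(n)]
                 for j in range(2 ** n)])
# Trivial class: a single constant function f = 1.
single = np.ones((1, n))

print(empirical_rademacher(rich))    # exactly 1: some f matches every sigma
print(empirical_rademacher(single))  # near 0: cannot correlate with random signs
```

The gap between the two estimates is exactly the capacity notion that the generalization bound below penalizes.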
By using the Rademacher complexity, we can directly provide an upper bound on the generalization error.
Theorem 5.
(Theorem 3.5 in Mohri et al. (2012)). Suppose the loss $\ell(h; x, y)$ takes values in $[0, 1]$. Let $S := \{(x_1, y_1), \ldots, (x_n, y_n)\}$ be the set of i.i.d. samples drawn from the underlying distribution $\mathcal{D}$. Let $\mathcal{H}$ be the hypothesis space; then with probability at least $1 - \delta$ over $S$, for any $h \in \mathcal{H}$:

$$\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(h; x, y)\big] \le \frac{1}{n}\sum_{i=1}^{n}\ell(h; x_i, y_i) + 2\mathcal{R}_S(\ell\circ\mathcal{H}) + 3\sqrt{\frac{\log(2/\delta)}{2n}}, \tag{6}$$

where $\ell\circ\mathcal{H} := \{(x, y) \mapsto \ell(h; x, y) : h \in \mathcal{H}\}$.
Appendix B Proof of Theorem 1
Appendix C Proof of Theorem 4
For convenience, in this section we use $c$ or $c_i$ to denote universal constants, where $i = 1, 2, \ldots$.
In the proof of Theorem 4, we will use a concentration bound for covariance estimation from Wainwright (2019). We first introduce the definition of the spiked covariance ensemble.
Definition 5.
(Spiked covariance ensemble). A sample $x \in \mathbb{R}^d$ from the spiked covariance ensemble takes the form

$$x = \sqrt{\nu}\,\xi\,u + w, \tag{9}$$

where $\xi$ is a zero-mean random variable with unit variance, $\nu > 0$ is a fixed scalar, $u$ is a fixed unit vector, and $w$ is a random vector independent of $\xi$, with zero mean and covariance matrix $I_d$.
To see why the spiked covariance ensemble model is useful, we note that the Gaussian mixture model is a special case of it. Specifically, let $\tilde{x}_1, \ldots, \tilde{x}_m$ be the unlabeled data in Theorem 4. Then each $\tilde{x}_i$ follows the Gaussian mixture distribution $\mathcal{N}(y_i\theta^*, \sigma^2 I_d)$ with $\|\theta^*\|_2 = \sqrt{d}$, and $\tilde{x}_i/\sigma$ is a sample from the spiked covariance ensemble with parameter $\nu = d/\sigma^2$, $\xi = y_i$ uniformly distributed on $\{-1, 1\}$, and $u = \bar\theta := \theta^*/\sqrt{d}$.
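This reduction is easy to verify numerically. The sketch below (with made-up dimension and noise level, not the paper's experiments) draws mixture samples, rescales them by $1/\sigma$, and checks that their second-moment matrix approaches $\nu\,\bar\theta\bar\theta^\top + I_d$ with $\nu = d/\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, sigma = 5, 200_000, 2.0
theta_star = np.ones(d)                  # ||theta_star||_2 = sqrt(d)
theta_bar = theta_star / np.sqrt(d)      # unit spike direction
nu = d / sigma ** 2                      # spike strength after rescaling

# Gaussian mixture samples: x = y * theta_star + sigma * N(0, I).
y = rng.choice([-1.0, 1.0], size=m)
x = y[:, None] * theta_star + sigma * rng.standard_normal((m, d))
z = x / sigma                            # rescaled samples

emp_cov = z.T @ z / m                    # empirical second-moment matrix
target = nu * np.outer(theta_bar, theta_bar) + np.eye(d)
print(np.abs(emp_cov - target).max())    # small: sampling error only
```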
The following theorem from Wainwright (2019) characterizes the concentration property of the spiked covariance ensemble, which we will further use to bound the robust classification error. Intuitively, the theorem says that we can approximately recover $u$ in the spiked covariance ensemble model using the top eigenvector of the sample covariance matrix $\hat\Sigma := \frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top$.
Theorem 6.
(Concentration of covariance estimation, see Corollary 8.7 in Wainwright (2019)). Given $n$ i.i.d. samples $x_1, \ldots, x_n$ from the spiked covariance ensemble with sub-Gaussian tails (which means both $\xi$ and $w$ are sub-Gaussian with parameter at most one), suppose that $n \ge d$ and $\sqrt{d/n} \le c_0 \min\{\sqrt{\nu}, \nu\}$. Then, with probability at least $1 - c_1 e^{-c_2 d}$, there is a unique maximal eigenvector $\hat\theta$ of the sample covariance matrix $\hat\Sigma := \frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top$, with the sign convention $\langle\hat\theta, u\rangle \ge 0$, such that

$$\|\hat\theta - u\|_2 \le c_3\,\frac{1+\nu}{\nu}\sqrt{\frac{d}{n}}. \tag{10}$$
Using the theorem above, we can show that for the Gaussian mixture model, one of the top unit eigenvectors of the sample covariance matrix is approximately $\bar\theta = \theta^*/\sqrt{d}$. In other words, we can approximately recover the parameter $\theta^*$ up to a sign difference: principal component analysis of $\tilde{x}_1, \ldots, \tilde{x}_m$ gives either $\hat\theta$ or $-\hat\theta$, while one of them is close to $\bar\theta$.
Lemma 4.
Under the same setting as Theorem 4, suppose that $\sigma \le c_0 d^{1/4}$ and $m \ge c_1 d$. Then, with probability at least $1 - c_2 e^{-c_3 d}$, there is a unique (up to sign) maximal eigenvector $\hat\theta$ of the sample covariance matrix $\hat\Sigma := \frac{1}{m}\sum_{i=1}^{m}\tilde{x}_i\tilde{x}_i^\top$ with unit norm such that

$$\min\{\|\hat\theta - \bar\theta\|_2, \|\hat\theta + \bar\theta\|_2\} \le c_4\sqrt{\frac{d}{m}}. \tag{11}$$
Proof.
As discussed above, $\tilde{x}_i/\sigma$ is a spiked covariance ensemble with parameter $\nu = d/\sigma^2$. By Theorem 6 we have, with probability at least $1 - c_2 e^{-c_3 d}$, that there is a unique maximal eigenvector $\hat{v}$ of the sample covariance matrix $\frac{1}{m}\sum_{i=1}^{m}(\tilde{x}_i/\sigma)(\tilde{x}_i/\sigma)^\top$, with $\langle\hat{v}, \bar\theta\rangle \ge 0$, such that

$$\|\hat{v} - \bar\theta\|_2 \le c_5\,\frac{1+\nu}{\nu}\sqrt{\frac{d}{m}}. \tag{12}$$

Since rescaling a matrix by $\sigma^2 > 0$ does not change its eigenvectors, the unit maximal eigenvector $\hat\theta$ of $\hat\Sigma$ satisfies $\hat\theta \in \{\hat{v}, -\hat{v}\}$. Below we need to consider two cases, $\langle\hat\theta, \bar\theta\rangle \ge 0$ and $\langle\hat\theta, \bar\theta\rangle < 0$.

Case 1: $\langle\hat\theta, \bar\theta\rangle \ge 0$. Let $v := \hat\theta = \hat{v}$; since both $v$ and $\bar\theta$ are unit vectors, we have

$$\|v \mp \bar\theta\|_2^2 = 2 \mp 2\langle v, \bar\theta\rangle, \tag{13}$$

so the minimum in equation 11 is attained at $\|\hat\theta - \bar\theta\|_2 = \|v - \bar\theta\|_2$. Recall that $\nu = d/\sigma^2$, which is equivalent to

$$\frac{1+\nu}{\nu} = 1 + \frac{\sigma^2}{d}.$$

Rearranging the terms and using the AM-GM inequality $\sigma^2 \le (\sigma^4 + 1)/2$ together with $\sigma \le c_0 d^{1/4}$ (so that $\sigma^4 \le c_0^4 d$) gives

$$\frac{1+\nu}{\nu} \le 1 + \frac{c_0^4 d + 1}{2d} \le \frac{3 + c_0^4}{2}. \tag{14}$$

Therefore, by equation 13, together with equation 12 and equation 14,

$$\min\{\|\hat\theta - \bar\theta\|_2, \|\hat\theta + \bar\theta\|_2\} = \|v - \bar\theta\|_2 \le c_5\,\frac{3 + c_0^4}{2}\sqrt{\frac{d}{m}}.$$

By substituting $c_4 := c_5(3 + c_0^4)/2$, we complete the proof in this case.

Case 2: $\langle\hat\theta, \bar\theta\rangle < 0$. Let $v$ be the one of $\pm\hat\theta$ such that the inner product $\langle v, \bar\theta\rangle$ is nonnegative; then $v = -\hat\theta = \hat{v}$, and

$$\|\hat\theta + \bar\theta\|_2 = \|v - \bar\theta\|_2. \tag{15}$$

Therefore, $\min\{\|\hat\theta - \bar\theta\|_2, \|\hat\theta + \bar\theta\|_2\} \le \|v - \bar\theta\|_2 \le c_4\sqrt{d/m}$ exactly as in Case 1. ∎
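A small simulation makes Lemma 4 concrete (a sketch with assumed parameters $d = 50$, $m = 5000$, $\sigma = d^{1/4}$, not the paper's experiments): the top eigenvector of the unlabeled sample covariance aligns with $\bar\theta$ up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 50, 5000
sigma = d ** 0.25                          # the regime sigma <= c0 * d^(1/4)
theta_bar = np.ones(d) / np.sqrt(d)        # unit spike direction
theta_star = np.sqrt(d) * theta_bar        # ||theta_star||_2 = sqrt(d)

# Unlabeled Gaussian mixture samples (labels y are drawn but never used).
y = rng.choice([-1.0, 1.0], size=m)
x = y[:, None] * theta_star + sigma * rng.standard_normal((m, d))

cov = x.T @ x / m                          # sample covariance (mean is zero)
theta_hat = np.linalg.eigh(cov)[1][:, -1]  # unit eigenvector, max eigenvalue

alignment = abs(theta_hat @ theta_bar)     # |<theta_hat, theta_bar>|, sign-free
print(alignment)                            # close to 1
```

Note that `alignment` deliberately takes an absolute value: PCA alone cannot resolve the sign, which is exactly the ambiguity the next lemma removes with labeled data.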
Now we have proved that by using the top eigenvector of $\hat\Sigma$, we can recover $\bar\theta$ (and hence $\theta^*$) up to a sign difference. Next, we will show that it is possible to determine the sign using the labeled data.
Lemma 5.
Under the same setting as Theorem 4, suppose $v$ is a unit vector such that $\|v - s\bar\theta\|_2 \le \delta$ for some $s \in \{-1, 1\}$, where $\delta \le 1$. Let $(x, y)$ be a labeled example drawn from the Gaussian mixture model, independent of $v$. Then with probability at least $1 - \exp\!\big(-(1 - \delta^2/2)^2 d/(2\sigma^2)\big)$, we have $\mathrm{sign}(y\langle v, x\rangle) = s$.
Proof.
Without loss of generality assume $s = 1$ (otherwise replace $v$ by $-v$). Since $\|v - \bar\theta\|_2 \le \delta$, and both $v$ and $\bar\theta$ are unit vectors, we have $\langle v, \bar\theta\rangle = 1 - \|v - \bar\theta\|_2^2/2 \ge 1 - \delta^2/2$. So the event $\mathrm{sign}(y\langle v, x\rangle) \neq 1$ is equivalent to the event $y\langle v, x\rangle \le 0$, i.e.,

$$\mathbb{P}\big[\mathrm{sign}(y\langle v, x\rangle) \neq 1\big] = \mathbb{P}\big[y\langle v, x\rangle \le 0\big]. \tag{16}$$

Recall that $x$ is sampled from the Gaussian distribution $\mathcal{N}(y\theta^*, \sigma^2 I_d)$, where $y$ is sampled uniformly at random from $\{-1, 1\}$; we have that $y\langle v, x\rangle$ follows the Gaussian distribution $\mathcal{N}(\langle v, \theta^*\rangle, \sigma^2)$. Hence,

$$\mathbb{P}\big[y\langle v, x\rangle \le 0\big] = \mathbb{P}\left[g \ge \frac{\langle v, \theta^*\rangle}{\sigma}\right], \quad g \sim \mathcal{N}(0, 1). \tag{17}$$

Moreover, from $\langle v, \bar\theta\rangle \ge 1 - \delta^2/2$ we can get

$$\langle v, \theta^*\rangle = \sqrt{d}\,\langle v, \bar\theta\rangle \ge \left(1 - \frac{\delta^2}{2}\right)\sqrt{d} \ge 0. \tag{18}$$

So, using the Gaussian tail bound $\mathbb{P}[g \ge t] \le e^{-t^2/2}$ for all $t \ge 0$, and combining with equation 16, equation 17 and equation 18, we have

$$\mathbb{P}\big[\mathrm{sign}(y\langle v, x\rangle) \neq 1\big] \le \exp\left(-\frac{(1 - \delta^2/2)^2\, d}{2\sigma^2}\right), \tag{19}$$

as stated in the lemma. ∎
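The sign-recovery step can also be simulated (a hypothetical setup, not the paper's code): below, $v$ is given the "wrong" sign on purpose ($v = -\bar\theta$, i.e. the $\delta = 0$, $s = -1$ case), and a single fresh labeled example recovers $s$ in almost every trial.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 50
sigma = d ** 0.25
theta_bar = np.ones(d) / np.sqrt(d)       # unit vector
theta_star = np.sqrt(d) * theta_bar       # ||theta_star||_2 = sqrt(d)
v = -theta_bar                            # PCA output with the wrong sign
s_true = np.sign(v @ theta_bar)           # the sign we want to recover (-1)

trials, correct = 2000, 0
for _ in range(trials):
    y = rng.choice([-1.0, 1.0])
    x = y * theta_star + sigma * rng.standard_normal(d)
    s_hat = np.sign(y * (v @ x))          # sign estimate from one labeled pair
    correct += (s_hat == s_true)
print(correct / trials)                   # close to 1
```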
Armed with Lemma 4 and Lemma 5, we now have a precise estimation of $\theta^*$ in the Gaussian mixture model. Next, we show that the high precision of this estimation can be translated into a low robust risk. To achieve this, we need a lemma from Schmidt et al. (2018), which upper bounds the robust classification risk of a linear classifier in terms of its inner product with $\theta^*$.
Lemma 6.
(Schmidt et al. (2018)). Let $\hat{w} \in \mathbb{R}^d$ be a unit vector satisfying $\langle\hat{w}, \theta^*\rangle \ge \varepsilon\|\hat{w}\|_1 + \sigma\sqrt{2\log(1/\beta)}$. Then the linear classifier $f_{\hat{w}}(x) := \mathrm{sign}(\langle\hat{w}, x\rangle)$ has $\ell_\infty^\varepsilon$-robust classification risk at most $\beta$.
Lemma 6 guarantees that if we can estimate $\theta^*$ precisely, we can achieve a small robust classification risk. Combined with Lemma 4 and Lemma 5, which provide such an estimation, we are now ready to prove the robust classification risk bound stated in Theorem 4. We actually prove a slightly more general theorem below with some extra parameters, and obtain Theorem 4 as a corollary.
Theorem 7.
Let $(x, y)$ be a labeled example drawn from the Gaussian mixture model with $\|\theta^*\|_2 = \sqrt{d}$ and $\sigma \le c_0 d^{1/4}$. Let $\tilde{x}_1, \ldots, \tilde{x}_m$ be unlabeled examples drawn from the same model. Let $\hat\Sigma := \frac{1}{m}\sum_{i=1}^{m}\tilde{x}_i\tilde{x}_i^\top$ be as stated in Lemma 4, and let $\hat\theta$ be the normalized eigenvector (i.e., $\|\hat\theta\|_2 = 1$) with respect to the maximal eigenvalue of $\hat\Sigma$, so that $\min\{\|\hat\theta - \bar\theta\|_2, \|\hat\theta + \bar\theta\|_2\} \le c_4\sqrt{d/m}$ holds with probability at least $1 - c_2 e^{-c_3 d}$. Let $\hat{w} := \mathrm{sign}(y\langle\hat\theta, x\rangle)\,\hat\theta$. Then with probability at least $1 - c_2 e^{-c_3 d} - e^{-d/(8\sigma^2)}$, the linear classifier $f_{\hat{w}}(x') := \mathrm{sign}(\langle\hat{w}, x'\rangle)$ has $\ell_\infty^\varepsilon$-robust classification risk at most $\beta$ when

$$m \ge \frac{c_4^2\, d}{2\left(1 - \varepsilon - \sigma\sqrt{2\log(1/\beta)}/\sqrt{d}\right)}. \tag{20}$$
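The whole estimator can be exercised end to end in simulation. The sketch below (assumed synthetic parameters and a hypothetical `sample` helper; this is not the paper's experimental code) runs PCA on unlabeled data, fixes the sign with a single labeled example, and estimates the $\ell_\infty^\varepsilon$-robust error of the resulting linear classifier, using the fact that $f_{\hat{w}}(x) = \mathrm{sign}(\langle\hat{w}, x\rangle)$ is robustly wrong on $(x, y)$ exactly when $y\langle\hat{w}, x\rangle \le \varepsilon\|\hat{w}\|_1$.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m, eps = 50, 5000, 0.1
sigma = d ** 0.25                            # sigma <= c0 * d^(1/4) regime
theta_star = np.ones(d)                      # ||theta_star||_2 = sqrt(d)

def sample(n):
    """Draw n labeled examples from the Gaussian mixture model."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * theta_star + sigma * rng.standard_normal((n, d))
    return x, y

# Unlabeled data -> top eigenvector of the sample covariance (sign-ambiguous).
x_unlab, _ = sample(m)
cov = x_unlab.T @ x_unlab / m
theta_hat = np.linalg.eigh(cov)[1][:, -1]

# One labeled example -> resolve the sign ambiguity.
x1, y1 = sample(1)
w = np.sign(y1[0] * (theta_hat @ x1[0])) * theta_hat

# Empirical l_inf(eps)-robust error on fresh labeled test data.
x_test, y_test = sample(100_000)
robust_err = np.mean(y_test * (x_test @ w) <= eps * np.abs(w).sum())
print(robust_err)                            # small when m is large enough
```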