# Manifold Adversarial Learning

###### Abstract

Recently proposed adversarial training methods are robust to both adversarial and original examples and achieve state-of-the-art results in supervised and semi-supervised learning. All existing adversarial training methods consider only how the worst perturbed examples (i.e., adversarial examples) affect the model output. Despite their success, we argue that this setting may limit generalization, since the output space (or label space) is less informative than the latent space. In this paper, we propose a novel method called Manifold Adversarial Training (MAT). MAT builds an adversarial framework based on how the worst perturbation affects the distributional manifold rather than the output space. In particular, a latent data space modeled with a Gaussian Mixture Model (GMM) is first derived. On the one hand, MAT perturbs the input samples in the way that roughens the distributional manifold the most. On the other hand, the deep learning model is trained to promote manifold smoothness in the latent space, measured by the variation of the Gaussian mixtures under a local perturbation around each data point. Because the latent space is more informative than the output space, MAT can learn a more robust and compact data representation, leading to further performance improvement. MAT can also be considered a superset of a recently proposed discriminative feature learning approach called center loss. We conducted a series of experiments in both supervised and semi-supervised learning on three benchmark data sets, showing that MAT achieves remarkable performance, much better than that of state-of-the-art adversarial approaches. We also present a series of visualizations that may help to understand or explain adversarial examples.

## I Introduction

Adversarial examples are augmented data points generated by imperceptible perturbations of input samples. Although they are difficult to distinguish from real examples, such adversarial examples can change the predictions of many of the best learning models, including state-of-the-art deep learning models [26][8][18]. To alleviate this problem, researchers have proposed adversarial training, which targets both robustness on adversarial examples and generalization on original examples. Adversarial training can be used in both supervised and semi-supervised settings. In supervised adversarial learning, the data labels are needed to derive the worst perturbation against the loss function [21][13]; in semi-supervised adversarial learning, virtual adversarial training (VAT) smooths the output distribution by penalizing the KL-divergence between the outputs of adversarial and original examples. VAT achieves state-of-the-art performance on both image and text classification [16][17][15].

Previous adversarial training methods simply consider how to make the prediction results worse (in the output space) without considering how robustly the data are represented in a latent space. In general, the latent space is much more informative than the output space, so it is meaningful to design adversarial learning in the latent space rather than the output space. In this work, we develop a novel model in the latent space called Manifold Adversarial Training (MAT). We employ an information-based regularization, Maximum Mutual Information (MMI) [1][28], to define a distributional manifold in the latent space. We then apply adversarial training to smooth this manifold by penalizing the KL-divergence between the distributions of the latent features of the adversarial and original examples. The framework is trained in an adversarial way: the adversarial noise is generated to roughen the distributional manifold, while the model is trained to smooth it so that the latent space becomes more representative. This is similar to traditional Laplacian regularization methods with locality-preserving properties [3][2]; however, our approach is based on information geometry and the KL-divergence information metric.

To the best of our knowledge, this is a novel work that adversarially learns a representation in the latent space that is both robust and compact. It also presents a unified framework, in that a simplified MAT recovers a well-known, recently proposed discriminative feature learning model [29][27]. Although developed in the framework of supervised learning, it extends straightforwardly to semi-supervised learning. We develop a feasible and efficient training algorithm that obtains remarkably better performance than existing adversarial training approaches. In particular, we implemented our method on the benchmark datasets MNIST, CIFAR-10, and SVHN. In both supervised and semi-supervised learning, our method achieves state-of-the-art performance, much better than the best of the existing counterpart methods.

## II Related Work

Using perturbed examples to regularize the output has a long history [24]. Bishop et al. proposed adding Gaussian noise to the input samples and showed that this is equivalent to adding a penalty term to the original objective function [4]. Dropout can also be treated as a random perturbation that prevents overfitting [24][6]. The Unified Gradient Regularization Family was proposed to find the worst perturbation that increases the objective function [13]; it approximates the non-convex problem with a Taylor series and applies the Lagrange multiplier method to evaluate the worst perturbation. A similar work by Sinha et al. perturbs the underlying data distribution within a Wasserstein ball [21]. Virtual Adversarial Training, proposed by Miyato et al. [16][15], is perhaps the most closely related to our work. It extends adversarial training to semi-supervised tasks by promoting the local smoothness of the output distribution.

Another family of semi-supervised learning methods is based on generative models. The Ladder network combines a deep network and an autoencoder, with connections between the two networks at each layer, and achieves encouraging results [19]. The Triple Generative Adversarial Network combines a Generative Adversarial Network (GAN) with a classifier; its three players, the generator, the discriminator, and the classifier, play against each other [5]. Some Bayesian methods employ variational methods with deep learning [9].

This work is also related to traditional manifold regularization. Laplacian Eigenmaps were developed to construct a representation for data lying on a low-dimensional manifold embedded in a high-dimensional space [3]. A geometric framework was also proposed to exploit the geometry of the marginal distribution using both labeled and unlabeled examples [2]. Both methods smooth the low-dimensional manifold while preserving local neighborhood information.

## III Main Method

In this section, we first introduce two previous adversarial training methods that are closely related to our approach: adversarial training with a norm constraint and Virtual Adversarial Training (VAT). We then describe our proposed Manifold Adversarial Training (MAT). Specifically, we first show how to regularize the latent space with Maximum Mutual Information (MMI) and represent the latent features with Gaussian mixtures. We then define the low-dimensional submanifold of the latent space and the smoothness of the statistical manifold. After that, we describe the overall framework and present a series of results from which the practical optimization algorithm is derived. Finally, we give a brief computational analysis. We now introduce the notation. The input set is represented as and the corresponding output is . Let and where, is the dimension of input space and denotes the output dimension. The model distribution is denoted by with model parameters . In this paper, labeled dataset and unlabeled dataset are used to train the model .

### III-A Adversarial Training

Adversarial training trains the model with both natural examples and adversarial examples. Goodfellow et al. proposed the Fast Gradient Sign Method (FGSM) to generate adversarial examples [8], and Lyu et al. proposed a method that extends FGSM to a general $L_p$ norm constraint [13]. The optimization problem of adversarial training can then be formulated in terms of distributions:

(1) |

where is the true distribution of the output label given the input and denotes the model distribution. is a perturbation added to the input within a small range, and is a small value. A non-negative function is used to measure the divergence between the true distribution and the model distribution; for example, the KL-divergence or an f-divergence can be used. The objective of adversarial training is to fit the true distribution with the model distribution on both natural and adversarial examples.

To implement adversarial training, the inner maximization problem must first be solved to obtain the worst perturbation. To simplify the optimization, the first-order Taylor expansion of is used as an approximation of the original objective function. Under the norm constraint, the worst perturbation can be approximated as:

(2) |

where is the dual of p, i.e., , and is the first derivative of the loss function with respect to the input . When $p = \infty$, this method reduces to the Fast Gradient Sign Method (FGSM), and the worst perturbation becomes:

(3) |

The worst perturbation is easy to compute with backpropagation and is cheap, since it requires only one additional backward pass. After the adversarial perturbation has been computed, the model distribution is trained to approximate the true distribution on both natural and adversarial examples. This has been shown to achieve better generalization than traditional training of deep neural networks.
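The closed-form perturbation described above can be sketched in a few lines. This is a minimal illustration, assuming the standard first-order solution under an $L_p$ ball (the maximizer of a linear objective given by Hölder's inequality); the function name `worst_perturbation` and the toy gradient are hypothetical and are not taken from the paper.

```python
import math

def worst_perturbation(grad, eps, p):
    """First-order worst-case perturbation inside an L_p ball of radius eps.

    grad: list of floats, gradient of the loss with respect to the input.
    p:    norm order, p > 1 or math.inf (math.inf gives the FGSM sign rule).
    """
    signs = [(g > 0) - (g < 0) for g in grad]
    if math.isinf(p):
        return [eps * s for s in signs]
    if p <= 1:
        raise ValueError("this sketch only covers p > 1")
    p_dual = p / (p - 1.0)  # dual exponent: 1/p + 1/p_dual = 1
    dual_norm = sum(abs(g) ** p_dual for g in grad) ** (1.0 / p_dual)
    if dual_norm == 0.0:
        return [0.0 for _ in grad]  # zero gradient: no preferred direction
    exponent = 1.0 / (p - 1.0)
    return [eps * s * (abs(g) / dual_norm) ** exponent
            for s, g in zip(signs, grad)]

grad = [3.0, -4.0]                                       # toy input gradient
r_l2 = worst_perturbation(grad, eps=0.5, p=2)            # eps * grad / ||grad||_2
r_linf = worst_perturbation(grad, eps=0.5, p=math.inf)   # eps * sign(grad)
r_l3 = worst_perturbation([1.0, 2.0, -2.0], eps=1.0, p=3)
```

In a real network the gradient would come from one backward pass through the loss; here it is supplied directly so that the norm handling is the only thing on display.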

### III-B Virtual Adversarial Training

Another work similar to our method is Virtual Adversarial Training (VAT). In most cases, the full labeled information is not provided and it is important to make use of the information of unlabeled data. Different from the traditional adversarial training methods, Virtual Adversarial Training method can utilize both labeled and unlabeled data. The optimization problem of VAT can be formulated as:

(4) |

where, is either labeled data or unlabeled data. Unlike traditional adversarial training, Virtual Adversarial Training minimizes the divergence between two model distributions rather than the divergence between the true distribution and the model distribution. The objective of VAT is to smooth the output distribution around through adversarial training. Specifically, the inner optimization problem finds the worst perturbation within a small range, which makes the model distributions of and the most different; conversely, the outer problem tries to make the two model distributions the same. Therefore, VAT does not require label information and can be applied to unsupervised and semi-supervised tasks.
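To make the label-free nature of this objective concrete, the sketch below evaluates the divergence between the model's predictions on a clean and a perturbed input. It assumes the KL-divergence as the divergence and uses a toy linear-softmax classifier in place of the deep model; `predict`, `output_smoothness`, and the weights are hypothetical names and values chosen for illustration.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two categorical distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def predict(weights, x):
    """Toy linear-softmax classifier standing in for the deep model."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)

def output_smoothness(weights, x, r):
    """Divergence between predictions at x and at x + r; no label is used."""
    p_clean = predict(weights, x)
    p_perturbed = predict(weights, [xi + ri for xi, ri in zip(x, r)])
    return kl_divergence(p_clean, p_perturbed)

weights = [[1.0, 0.0], [0.0, 1.0]]
x = [0.2, -0.1]
clean_value = output_smoothness(weights, x, [0.0, 0.0])
perturbed_value = output_smoothness(weights, x, [0.3, -0.3])
```

The inner problem would search over `r` for the largest value of `output_smoothness`, and the outer problem would adjust `weights` to shrink it.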

### III-C Manifold Adversarial Training

Both traditional adversarial training and Virtual Adversarial Training seek the perturbation that degrades the model outputs the most. However, neither method considers how the data are represented in the latent space. We therefore design a new adversarial method, Manifold Adversarial Training (MAT), which learns a good geometric structure of the data distribution in order to improve the robustness of the latent representation. In the following subsections, we first show how to regularize the latent space with Maximum Mutual Information (MMI). We then define the smoothness of the distributional manifold. After that, we describe the overall framework and present a series of results from which the practical optimization algorithm is derived. Finally, we give a brief computational analysis.

### III-D Modeling the Latent Space

In probability theory, mutual information measures the mutual dependence between two variables; in other words, it measures how much knowing one variable reduces uncertainty about the other. To learn discriminative features that carry more label information, MMI is proposed to increase the mutual dependence between the latent features and the class label, which can be seen as an information regularization. We define the label set and the latent features , where and denote the number of classes and samples respectively. is the dimensionality of the latent space. The mutual information between the event and the event with respect to is given by:

(5) |

where and are two random variables representing class label and latent feature respectively.
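As a hedged sketch of the quantity described above, and assuming $c_k$ denotes the class event and $z_i$ the latent-feature event (symbol names chosen here for illustration), the pointwise mutual information and its standard decomposition can be written as:

```latex
I(c_k, z_i)
  = \log \frac{p(c_k, z_i)}{p(c_k)\, p(z_i)}
  = \log p(z_i \mid c_k) - \log p(z_i)
```

The second form follows from $p(c_k, z_i) = p(z_i \mid c_k)\, p(c_k)$ and makes explicit that the measure splits into a class-conditional term and a marginal term.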

In this work, we assume the conditional probability , the multivariate Gaussian distribution, as in Eq. (6). We also assume that the latent features are generated by Gaussian mixtures. The number of mixture components is set to the number of classes in this paper.

(6) |

In Eq. (6), is the class center for class and is the corresponding covariance matrix. represents the dimension of the latent space. In this work, we assume the class prior is a constant ( is the class number). Then, the latent feature can be represented by Gaussian mixtures, . Again, the number of components is set to the class number . Each component of the Gaussian mixture denotes the probability of the latent feature being assigned to class . We reformulate the mutual information as follows:

In this case, maximizing the mutual information is equivalent to maximizing the log posterior . The last term of the reformulation above is the log marginal distribution, which can be seen as a normalization term. Therefore, we can just maximize the first term . This is equivalent to making the latent features more compact and discriminative with respect to the class centers. We then define the information regularization as:

(8) |

One of our objectives is to maximize the above information regularization term so as to model a good latent space. Although we assume a GMM over the latent space, this regularization generalizes the recently proposed center loss model [29], as shown in Proposition III.1.

###### Proposition III.1.

The center loss [29] can be viewed as a special case of our proposed information regularization.

###### Proof.

We reformulate the information regularization as:

If we set the covariance matrix to the identity, we have:

where is a constant. It is clear that the last term above reduces to the center loss as defined in [29]. ∎

Remarks. Center loss [29] was also proposed to pull latent features closer to their class centers. However, it implicitly assumes an identity covariance matrix, as shown in Proposition III.1. In comparison, we propose a more general regularization. With it, the latent features can be represented by Gaussian mixtures, to which information geometry and the KL-divergence information metric can readily be applied, as we show in the next subsection. Note that in this work we apply the regularization term only to the last hidden layer.
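The reduction to center loss can be checked numerically. The sketch below assumes a diagonal covariance (a simplification made here for brevity; the text works with a general covariance matrix) and takes the center loss to be half the squared Euclidean distance to the class center; `log_gaussian_diag` and `center_loss` are hypothetical helper names.

```python
import math

def log_gaussian_diag(z, mean, var):
    """Log-density of a Gaussian with diagonal covariance (var holds the diagonal)."""
    return -0.5 * sum((zi - mi) ** 2 / vi + math.log(vi) + math.log(2.0 * math.pi)
                      for zi, mi, vi in zip(z, mean, var))

def center_loss(z, center):
    """Half squared Euclidean distance between a latent feature and its class center."""
    return 0.5 * sum((zi - ci) ** 2 for zi, ci in zip(z, center))

z, center = [1.0, 2.0], [0.5, 1.0]   # toy latent feature and class center
identity = [1.0, 1.0]                # identity covariance, stored as its diagonal
constant = -0.5 * len(z) * math.log(2.0 * math.pi)

# With identity covariance: log N(z; center, I) = constant - center_loss(z, center)
gap = log_gaussian_diag(z, center, identity) - (constant - center_loss(z, center))
```

Because `constant` does not depend on the model, maximizing the Gaussian log-density with identity covariance is the same as minimizing `center_loss`; with a non-identity `var` each squared deviation is rescaled per dimension.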

### III-E Defining Low Dimensional Statistical Manifold

An -dimensional manifold is defined as a set of points such that each point has -dimensional extensions in its neighborhood and such a neighborhood is topologically equivalent to an -dimensional Euclidean space. Intuitively speaking, it can be viewed as a deformed Euclidean space. In this paper, the dimension of the entire manifold of latent space is denoted as and the coordinate system is defined as . In the previous section, the feature points are modeled by Gaussian mixtures. Hence, the set of all the Gaussian mixtures is an -dimensional statistical submanifold , where a point on the manifold denotes a Gaussian mixture function and describes a coordinate system ( is the component of the Gaussian mixture) as illustrated in Figure 1. The degree of separation of two points on the manifold is measured by the KL-divergence between the two Gaussian mixtures. In this paper, our aim is to smooth the manifold .

### III-F Defining Statistical Manifold Smoothing

In this section, we consider how to evaluate the smoothness of statistical manifold. First, we define some notations. The input set is denoted as and the output of model is defined as . The training set is given as follows:

(9) |

We then use to train the model distribution with the regularization term . In the previous subsection, the latent features are represented by the Gaussian mixtures . We can then define the distance between two points on the statistical manifold as the KL-divergence between the mixtures and . To learn a good manifold in the latent space, we try to preserve the local information in the embedding as in [3][2]. Specifically, we add a small adversarial perturbation to the input samples and try to keep the latent representation of the perturbed samples close to that of the original ones. Since the latent features are represented by Gaussian mixtures in this work, the KL-divergence information metric can readily be used to measure the similarity. The smoothness of the statistical manifold can then be defined as the variation of the latent features on the statistical space caused by the adversarial perturbation:

(10) |

where denotes the latent feature of with the model parameters , is defined as the Gaussian mixture representation of the latent features , and represents the adversarial perturbation.

A smaller value of means that the statistical manifold is smoother at even when the perturbation is imposed. We also define the smoothness in the output or label space as in [16][15]:

(11) |

In more detail, describes the KL-divergence between the outputs for the adversarial example and the original example. The overall smoothness of both the output and the latent-space manifold is then given as follows:

(12) |

### III-G Final Optimization Problem of MAT

In the previous subsections, we have defined the information regularization, manifold smoothness, and label space smoothness. We can now obtain the final optimization problem of our proposed framework as follows:

(13) |

where the hyperparameters , and .

In the above objective function, the first term represents the data log-likelihood, the second term defines the overall data smoothness in both the manifold and the output space, and the last term is the information regularization. On the one hand, the MAT framework optimizes the model parameter so as to find the best latent space (by maximizing the information regularization ), enlarge the data log-likelihood (so as to fit the data), and increase the data smoothness (decreasing ); on the other hand, the imperceptible perturbation tries to minimize the above objective function. In other words, the model seeks the best parameter that remains robust even to the worst perturbation given by the optimal . Since this adversarial learning is defined in the latent manifold space, we call the model Manifold Adversarial Training (MAT). In the next section, we use a strategy similar to [13] and [16] and discuss how to solve this optimization problem practically and efficiently.

### III-H Practical Algorithm

Similar to [13], we solve the above optimization problem in an alternating way. We first solve the inner minimization problem with respect to , i.e. the worst perturbation, which we denote as . To calculate this worst perturbation, the KL-divergence between GMMs must first be calculated. Since it is difficult to evaluate directly, we approximate it by matching the Gaussian elements of the two Gaussian mixture densities as described in [7]:

where is a stochastic matrix. To simplify the calculation, we optimize the upper bound of the KL-divergence, assuming an identity matrix and the mixture weight and .
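The matching idea can be illustrated with a small sketch. It assumes that the divergence is the KL-divergence, that covariances are diagonal, that mixture weights are uniform, and that component k of one mixture is paired with component k of the other (the identity matching); under these assumptions the convexity of the KL-divergence makes the average of the component-wise terms an upper bound on the divergence between the mixtures. The helper names are hypothetical.

```python
import math

def kl_gaussian_diag(mean_a, var_a, mean_b, var_b):
    """Closed-form KL(N_a || N_b) for Gaussians with diagonal covariances."""
    return 0.5 * sum(math.log(vb / va) + (va + (ma - mb) ** 2) / vb - 1.0
                     for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b))

def matched_kl_bound(mixture_a, mixture_b):
    """Average of component-wise KLs, pairing component k with component k.

    Each mixture is a list of (mean, var) tuples; weights are assumed uniform.
    """
    if len(mixture_a) != len(mixture_b):
        raise ValueError("identity matching needs equally many components")
    total = sum(kl_gaussian_diag(ma, va, mb, vb)
                for (ma, va), (mb, vb) in zip(mixture_a, mixture_b))
    return total / len(mixture_a)

# Two-component toy mixtures in a 2-D latent space
clean = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 3.0], [1.0, 1.0])]
shifted = [([0.5, 0.0], [1.0, 1.0]), ([3.0, 2.5], [1.0, 1.0])]

same_value = matched_kl_bound(clean, clean)
shifted_value = matched_kl_bound(clean, shifted)
```

Only closed-form Gaussian terms appear, so the bound stays differentiable with respect to the component means and variances, which is what the perturbation search below relies on.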

Since it is difficult to evaluate the worst perturbation in a non-convex problem, we relax it to a convex problem with second-order Taylor expansions as in [16][15]. Since reaches the minimum value when and is differentiable with respect to and , the first derivative . We can finally approximate it as follows:

(14) |

where, is the Hessian matrix calculated by employing the second derivative, .

Before we could obtain the worst perturbation, we first present Lemma III.2 and Lemma III.3 as follows:

###### Lemma III.2.

Let be a real square matrix in , let be the dominant eigenvector of , and let be a vector that is not perpendicular to . Then the iterative calculation of

(15) |

makes converge to . Here, represents the normalization operator.
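The iteration in this lemma is the classical power method. A minimal sketch on a hand-picked 2×2 matrix follows; the matrix, the start vector, and the step count are illustrative assumptions.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def matvec(matrix, v):
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

def power_iteration(matrix, d, steps=50):
    """Repeatedly apply the matrix and renormalize the result."""
    for _ in range(steps):
        d = normalize(matvec(matrix, d))
    return d

H = [[2.0, 0.0], [0.0, 1.0]]   # eigenvalues 2 and 1; dominant eigenvector (1, 0)
d = power_iteration(H, [1.0, 1.0])
```

The start vector `[1.0, 1.0]` has a non-zero component along the dominant eigenvector, which is the non-perpendicularity condition of the lemma; the component along the other eigenvector shrinks by the eigenvalue ratio at every step.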

###### Lemma III.3.

Let be a Hessian matrix function with respect to in and be a vector in . Then we have:

(16) |

###### Proof.

We give the Taylor expansion of the first derivative of function :

(17) |

then we have:

(18) |

∎
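The practical value of this lemma is that a Hessian-vector product can be obtained from two gradient evaluations without ever forming the Hessian. The sketch below illustrates this with a finite difference on a toy function whose gradient is written by hand; `grad_f`, the evaluation point, and the step size `xi` are illustrative assumptions.

```python
def grad_f(x):
    """Gradient of the toy function f(x) = x0^2 * x1 + x1^3."""
    return [2.0 * x[0] * x[1], x[0] ** 2 + 3.0 * x[1] ** 2]

def hessian_vector_product(grad_fn, x, d, xi=1e-6):
    """Approximate H(x) d by (grad(x + xi * d) - grad(x)) / xi."""
    shifted = [a + xi * b for a, b in zip(x, d)]
    g_shift, g_base = grad_fn(shifted), grad_fn(x)
    return [(gs - gb) / xi for gs, gb in zip(g_shift, g_base)]

x, d = [1.0, 2.0], [1.0, 0.0]
# The exact Hessian at x is [[4, 2], [2, 12]], so H d should be close to [4, 2].
hd = hessian_vector_product(grad_f, x, d)
```

When the gradient vanishes at the expansion point, as it does for a divergence evaluated at zero perturbation, the second gradient evaluation drops out and a single backward pass at the shifted point suffices.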

By using Lemma III.2 and Lemma III.3, we can solve the inner minimization problem, i.e., obtain the current worst perturbation. Specifically, Theorem III.4 shows that it can be written as the product of the most sensitive direction and the scale , where the most sensitive direction can be approximated iteratively by the power method.

(19) |

where, represents the normalization operator and . is a vector which is not perpendicular to dominant eigenvector of .

###### Theorem III.4.

The steepest direction in (14) can be approximated by iterative calculation of:

(20) |

where represents the normalization operator.

###### Proof.

Applying the Lagrange multiplier method to (14), we find that the steepest direction coincides with the dominant eigenvector of . We can then approximate the steepest direction by the iterative calculation (see Lemma III.2):

(21) |

where, is a vector which is not perpendicular to the dominant eigenvector of . Then we can calculate using Lemma III.3:

(22) |

where, is a small value. Then we have:

(23) |

∎

After solving the inner minimization problem with respect to , we solve the maximization problem with respect to . We iterate these two steps until the process converges. The detailed pseudocode is shown in Algorithm 1.

### III-I Computational Analysis

We briefly discuss the computational cost. Compared with previous methods such as VAT [16], our method additionally needs to compute the mean and covariance matrix of the Gaussian mixtures. In this work, the parameters of the Gaussian mixtures are estimated from the latent features of the labeled data rather than by stochastic gradient. Computing the inverse of the covariance matrix may be inefficient. However, in a low-dimensional latent space the inverse can be obtained easily and directly, and even in a very high-dimensional space the covariance matrix can be assumed to be diagonal, which makes its inverse easy to obtain. As in VAT, we need only one iteration of the power method to estimate the worst perturbation . The whole procedure for updating the parameters of the deep neural network consists of two forward and two backward propagations. The first forward and backward propagations are used to evaluate the worst perturbation. After the final loss is calculated, a second forward and backward propagation updates the parameters of the neural network and the Gaussian mixtures.
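The diagonal-covariance shortcut mentioned above can be sketched as follows. This is an assumption-laden illustration of estimating per-class means and diagonal variances directly from labeled latent features; `class_gaussian_stats`, the variance floor, and the defaults for empty classes are hypothetical choices, not details taken from the paper.

```python
def class_gaussian_stats(features, labels, num_classes, var_floor=1e-6):
    """Per-class mean, diagonal variance, and its inverse from labeled features."""
    dim = len(features[0])
    counts = [0] * num_classes
    sums = [[0.0] * dim for _ in range(num_classes)]
    for z, c in zip(features, labels):
        counts[c] += 1
        for j in range(dim):
            sums[c][j] += z[j]
    # Classes with no samples fall back to a zero mean and unit variance.
    means = [[s / counts[c] if counts[c] else 0.0 for s in sums[c]]
             for c in range(num_classes)]
    squares = [[0.0] * dim for _ in range(num_classes)]
    for z, c in zip(features, labels):
        for j in range(dim):
            squares[c][j] += (z[j] - means[c][j]) ** 2
    variances = [[max(v / counts[c], var_floor) if counts[c] else 1.0
                  for v in squares[c]]
                 for c in range(num_classes)]
    # Inverting a diagonal covariance is an element-wise reciprocal.
    precisions = [[1.0 / v for v in row] for row in variances]
    return means, variances, precisions

features = [[0.0, 0.0], [2.0, 2.0], [5.0, 5.0]]   # toy 2-D latent features
labels = [0, 0, 1]
means, variances, precisions = class_gaussian_stats(features, labels, num_classes=2)
```

The inverse costs one division per latent dimension, compared with the cubic cost of inverting a full covariance matrix, which is why the diagonal assumption is attractive in high-dimensional latent spaces.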

## IV Experiment

To assess the efficacy of MAT, we evaluated it on several benchmark datasets, including MNIST, CIFAR-10, and SVHN. In principle, adversarial methods can be regarded as robust methods, which typically generalize better than traditional methods. To check whether MAT indeed improves classification performance, we first applied it to the benign data of MNIST and CIFAR-10 in both supervised and semi-supervised tasks. To further check the robustness of our adversarial training framework, we performed various experiments examining how well MAT defends against attacks from various adversarial methods. To illustrate our method, we also produced a series of visualizations, offering some interesting results that might help to explain adversarial examples.

### IV-A Experiments on Benign Data

We first evaluated the performance of our proposed MAT methods against many other comparison methods on the benign data. We first report the performance of various methods in the setting of supervised classification and then perform the comparison in semi-supervised learning.

#### IV-A1 Supervised Learning

We implement the same Lenet++ framework as [29] on the MNIST dataset. Lenet++ has only two dimensions in the last hidden layer, which is convenient for visualization. The base framework for the CIFAR-10 experiment, called Conv-Large, is the same as in [16]. For the MNIST dataset, we train the deep model with labeled training samples and evaluate it with test samples. For CIFAR-10, we use training samples and test samples. To search for good hyperparameters, the training set is divided into a training set and a validation set, and we choose the set of hyperparameters with the best performance on the validation set. For the MNIST dataset, the best hyperparameters are obtained as: , and . For CIFAR-10, the parameters are obtained as: , and .

| Method | Test error rate (%) on MNIST |
|---|---|
| SVM | |
| Dropout [24] | |
| Ladder networks [19] | |
| Adversarial, $L_\infty$ norm constraint [8] | |
| Adversarial, $L_2$ norm constraint [16] | |
| RPT [16] | |
| Baseline | |
| Center loss | |
| VAT | |
| MAT | |

| Method | Test error rate (%) on CIFAR-10 |
|---|---|
| Network in Network [12] | |
| All-CNN [22] | |
| Deeply Supervised Net [11] | |
| Highway Network [25] | |
| RPT [16] | |
| Baseline | |
| VAT | |
| MAT | |

Table I and Table II list the performance of our method and other competitive methods on MNIST and CIFAR-10. Generally, adversarial methods can be regarded as robust methods, which typically generalize better than traditional methods. As observed from both tables, MAT achieves the best performance among all the compared methods on both datasets. The comparison with some methods, e.g., the ladder network, is probably unfair because different base models were used there. Nevertheless, the significantly lower error rate of MAT relative to most of the other competitive adversarial methods is sufficient to show its superiority.

#### IV-A2 Semi-supervised Learning

In the methodology part, we introduced MAT's final objective function, Eq. (13), in which the first term is the ordinary softmax loss function and the last term is the information regularization term. These two terms are evaluated on the labeled training data. The second term, which smooths the manifold and the output distribution, does not need label information. Therefore, our method can readily be extended to semi-supervised learning. Following the same setting as [16], we implement MAT on both CIFAR-10 and SVHN with labeled data and labeled data respectively. We use the same base model as [16], called Conv-Large, with batch normalization and dropout. We use mini-batches of size for both labeled and unlabeled data on CIFAR-10. For SVHN, we use a labeled batch of size and an unlabeled batch of size .

| Method | Test error rate (%) on SVHN |
|---|---|
| SWWAE [30] | |
| Skip Generative Model [14] | |
| GAN with feature matching [20] | |
| $\Pi$ model [10] | |
| RPT [16] | |
| VAT | |
| MAT | |

#### IV-A3 Visualization

To illustrate why MAT performs so well, we take MNIST as an example and visualize various methods. In particular, Figure 2 shows the last embedding space of Lenet++ trained with center loss, softmax, VAT, and MAT on the MNIST test data (the graph has been smoothed by fitting a Gaussian over each point to produce clear visualizations). The latent features of our method are clearly represented more compactly and discriminatively with respect to the class centers. VAT learns a latent space similar to that of softmax, but the features of both are visibly less discriminative than those of MAT. This may illustrate why MAT usually yields better classification performance than VAT and the traditional CNN.

Since MAT seeks a smooth latent space, which we believe benefits classification, we plot in Figure 3 the learning curve (accuracy rate) and the smoothness for three methods on CIFAR-10: MAT (our proposed method), VAT, and the baseline (traditional CNN). (MAT and VAT both extend the same baseline model. For MAT, the best set of hyperparameters, i.e., and was used, while for VAT, the best setting reported in [16] was directly applied.) Compared with the other two methods, MAT clearly increases the smoothness , which is defined by the average of the smoothness of the output space and the latent space:

(24) |

When we calculate the smoothness, the perturbation needs to be normalized to unit vector. Figure 3(b) shows that our proposed method indeed learns the smoother latent and output space.

### IV-B Performance on Defending Adversarial Examples

We now turn to examining how the proposed MAT method could defend the attacks from various adversarial examples generation approaches in comparison with other competitive methods. Visualization is also presented so as to obtain further understandings on adversarial examples.

#### IV-B1 Robustness to Adversarial Attacks

We evaluate different methods under FGSM and 2-norm adversarial attacks on the MNIST, CIFAR-10, and SVHN datasets. In particular, we generate 10,000 adversarial examples from each of the MNIST and CIFAR-10 test sets according to the FGSM and 2-norm attacks [13] respectively. For SVHN, we generate 26,032 adversarial examples. We increase the level of adversarial noise gradually from 0 to 8 on MNIST and SVHN with a step size of 1, and from 0 to 13 on CIFAR-10 with a step size of 1.6. We then test the performance of the various training methods on these adversarial examples. The performance is plotted in Fig. 4. As can be observed, MAT shows better robustness against the two types of adversarial examples. When the adversarial noise is small, all the adversarial training methods show similar results but perform much better than the CNN (which uses no adversarial training); when the adversarial attacks are stronger, the proposed method performs clearly better in almost all cases, verifying its robustness. One exception appears in Fig. 4(c), where the FGSM training method performs best and MAT second best. However, since FGSM training was specifically designed to defend against FGSM adversarial examples, it is not very surprising that it does better against them. Apart from Fig. 4(c), MAT demonstrates the best robustness in all the other cases.

Again taking MNIST as an example, we show the t-SNE embedding of the second-to-last layer given by VAT and MAT under various levels of 2-norm adversarial attack. As Figure 5 shows, when the degree of adversarial noise increases (from 0 to 6 with step size 2), the clusters of different classes become less discriminative. In contrast to VAT, MAT obtains much more discriminative features at the different levels of adversarial perturbation.

#### IV-B2 Visualization of Adversarial Examples

In this subsection, we visualize some adversarial examples generated by both our proposed method and the other comparison adversarial methods. To visualize the adversarial examples, we directly train the deep neural network with original images and then calculate the worst perturbation through back propagation. For better visualization, the image up-sampling was applied.

Specifically, we compare the adversarial examples generated by our method (MAT), VAT, and traditional adversarial learning on MNIST, SVHN, and CIFAR-10. Figures 6-8 show several adversarial examples (generated by the different methods) from the three datasets. It is interesting to note that, under the adversarial training given by MAT, the original images tend to be morphed into other categories. In particular, digit ”” was changed into ””, ”” was changed to ””, and ”” was changed to ”” on MNIST (see Figure 6). Similar cases can be observed in Figure 7, where the worst perturbation generated by MAT tends to change the number ”” to ”” (the 1st image in the 2nd row of Figure 7), the number ”” to ”” (the 4th image in the 2nd row of Figure 7), and the numbers ”66” to ”00” (the 8th image in the 2nd row of Figure 7). The other adversarial methods appear to generate adversarial examples in a similar sense, but the effect is less obvious than with MAT. This may partly explain why adversarial examples are difficult for many traditional deep neural networks to recognize. Furthermore, by generating the most challenging examples (similar to, or even changed into, other categories), MAT leads to the most serious adversarial attacks, which deep neural networks can hardly recognize. Finally, the major changes occur on the number strokes, which can be considered the “manifold” of these digit images. This suggests that MAT generates adversarial attacks on the manifold.

Finally, we show in Figure 8 adversarial examples on CIFAR-10 generated by the various methods. Unlike on MNIST and SVHN, the resulting examples give no obvious clues about how the images are changed. This might be explained by the fact that general objects are considerably more complicated than digits. We leave the generation of explainable adversarial examples to future work.

## V Conclusion

We present Manifold Adversarial Training (MAT), a novel method to smooth the distributional manifold in the latent space. Compared with other adversarial training methods, MAT adversarially learns a robust feature representation in the latent space, making the latent space both informative and discriminative. Specifically, we first represent the latent features with Gaussian mixtures. We then define the smoothness of the distributional manifold based on the KL-divergence between the Gaussian mixtures of the original and adversarial examples. We implemented MAT on MNIST, CIFAR-10, and SVHN in both supervised and semi-supervised tasks. The results show that the proposed model outperforms the state-of-the-art methods.

## References

- [1] (1986) Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP’86., Vol. 11, pp. 49–52. Cited by: §I.
- [2] (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research 7 (Nov), pp. 2399–2434. Cited by: §I, §II, §III-F.
- [3] (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15 (6), pp. 1373–1396. Cited by: §I, §II, §III-F.
- [4] (1995) Training with noise is equivalent to Tikhonov regularization. Neural Computation 7 (1), pp. 108–116. Cited by: §II.
- [5] (2017) Triple generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 4091–4101. Cited by: §II.
- [6] (2016) Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In International Conference on Machine Learning, pp. 1050–1059. Cited by: §II.
- [7] (2003) An efficient image similarity measure based on approximations of KL-divergence between two Gaussian mixtures. pp. 487. Cited by: §III-H.
- [8] (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §I, §III-A, TABLE I.
- [9] (2014) Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems, pp. 3581–3589. Cited by: §II.
- [10] (2016) Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242. Cited by: TABLE III, TABLE IV.
- [11] (2015) Deeply-supervised nets. In Artificial Intelligence and Statistics, pp. 562–570. Cited by: TABLE II.
- [12] (2013) Network in network. arXiv preprint arXiv:1312.4400. Cited by: TABLE II.
- [13] (2015) A unified gradient regularization family for adversarial examples. In Data Mining (ICDM), 2015 IEEE International Conference on, pp. 301–309. Cited by: §I, §II, §III-A, §III-G, §III-H, §IV-B1.
- [14] (2016) Auxiliary deep generative models. arXiv preprint arXiv:1602.05473. Cited by: TABLE III.
- [15] (2016) Adversarial training methods for semi-supervised text classification. arXiv preprint arXiv:1605.07725. Cited by: §I, §II, §III-F, §III-H.
- [16] (2017) Virtual adversarial training: a regularization method for supervised and semi-supervised learning. arXiv preprint arXiv:1704.03976. Cited by: §I, §II, §III-F, §III-G, §III-H, §III-I, §IV-A1, §IV-A2, §IV-A3, TABLE I, TABLE II, TABLE III, TABLE IV.
- [17] (2015) Distributional smoothing with virtual adversarial training. arXiv preprint arXiv:1507.00677. Cited by: §I.
- [18] (2015) Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 427–436. Cited by: §I.
- [19] (2015) Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, pp. 3546–3554. Cited by: §II, TABLE I, TABLE IV.
- [20] (2016) Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pp. 2234–2242. Cited by: TABLE III, TABLE IV.
- [21] (2017) Certifiable distributional robustness with principled adversarial training. arXiv preprint arXiv:1710.10571. Cited by: §I, §II.
- [22] (2014) Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806. Cited by: TABLE II.
- [23] (2015) Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv preprint arXiv:1511.06390. Cited by: TABLE IV.
- [24] (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15 (1), pp. 1929–1958. Cited by: §II, TABLE I.
- [25] (2015) Highway networks. arXiv preprint arXiv:1505.00387. Cited by: TABLE II.
- [26] (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §I.
- [27] (2018) Rethinking feature distribution for loss functions in image classification. Cited by: §I.
- [28] (1996) Multi-modal volume registration by maximization of mutual information. Medical Image Analysis 1 (1), pp. 35–51. Cited by: §I.
- [29] (2016) A discriminative feature learning approach for deep face recognition. In European Conference on Computer Vision, pp. 499–515. Cited by: §I, §III-D, §III-D, §III-D, Proposition III.1, §IV-A1.
- [30] (2015) Stacked what-where auto-encoders. arXiv preprint arXiv:1506.02351. Cited by: TABLE III.