Invariant Representations from Adversarially Censored Autoencoders


Ye Wang, Toshiaki Koike-Akino
Mitsubishi Electric Research Laboratories
Cambridge, MA 02139
{yewang, koike}

Deniz Erdogmus
Dept. of Electrical and Computer Engineering
Northeastern University, Boston, MA 02115

We combine conditional variational autoencoders (VAE) with adversarial censoring in order to learn invariant representations that are disentangled from nuisance/sensitive variations. In this method, an adversarial network attempts to recover the nuisance variable from the representation, which the VAE is trained to prevent. Conditioning the decoder on the nuisance variable enables clean separation of the representation, since the representation and nuisance variable are recombined for model learning and data reconstruction. We show that this natural approach is theoretically well-founded with information-theoretic arguments. Experiments demonstrate that this method achieves invariance while preserving model learning performance, and results in visually improved performance for style transfer and generative sampling tasks.




Preprint. Work in progress.

1 Introduction

We consider the problem of learning data representations that are invariant to nuisance variations and/or sensitive features. Such representations could be useful for fair/robust classification [21, 12, 20], domain adaptation [19, 16], privacy preservation [6, 7], and style transfer [14]. We investigate how this problem can be addressed by extensions of the variational autoencoder (VAE) model introduced by [9], where a generative model is learned as a pair of neural networks: an encoder that produces a representation z from data x, and a decoder that reconstructs the data x from the representation z.

A conditional VAE [17] can be trained while conditioned on the nuisance/sensitive variable s (i.e., the encoder and decoder each have s as an additional input). In principle, this should yield an encoder that extracts representations that are invariant to s, since the corresponding generative model (decoder) implicitly enforces independence between z and s. Intuitively, an efficient encoder should learn to exclude information about s from z, since s is already provided directly to the decoder. However, as we demonstrate in our experiments, invariance is not sufficiently achieved in practice, possibly due to approximations arising from imperfect optimization and parametric models. The adversarial feature learning approach of [4] proposes training an unconditioned autoencoder along with an adversarial network that attempts to recover a binary sensitive variable s from the representation z. However, this approach results in a challenging tradeoff between enforcing invariance and preserving enough information in the representation to allow decoder reconstruction and generative model learning.

Our work proposes and investigates the natural combination of adversarial censoring with a conditional VAE, while also generalizing to allow categorical (non-binary) or continuous s. Although an adversary is used to enforce invariance between z and s, the decoder is given both z and s as inputs, enabling data reconstruction and model learning. This approach disentangles the representation z from the nuisance variations s, while still preserving enough information in z to recover the data when recombined with s. In Section 2.2, we present a theoretical interpretation of adversarial censoring as reinforcement of the representation invariance that is already implied by the generative model of a conditional VAE. Our experiments in Section 3 quantitatively and qualitatively show that adversarial censoring of a conditional VAE can achieve representation invariance while limiting degradation of model learning performance. Further, the performance in style transfer and generative sampling tasks appears visually improved by adversarial censoring (see Figures 3 and 4).

1.1 Further Discussion of Related Work

The variational fair autoencoder of [11] extends the conditional VAE by introducing an invariance-enforcing penalty term based on maximum mean discrepancy (MMD). However, this approach is not readily extensible to non-binary or continuous s.

Generative adversarial networks (GAN) and the broader concept of adversarial training were introduced by [5]. The work of [14] also combines adversarial training with VAEs to disentangle nuisance variations s from the representation z. However, their approach instead attaches the adversary to the output of the decoder, which requires a more complicated training procedure that handles sample triplets and swaps representations, but also incorporates the learned similarity concept of [10]. Our approach is much simpler to train, since the adversary is attached to the encoder, directly enforcing representation invariance.

Addressing the problem of learning fair representations [21], further work on adversarial feature learning [12, 20, 6, 7, 19, 16] has used adversarial training to learn invariant representations tailored to classification tasks (i.e., in comparison to our work, they replace the decoder with a classifier). However, note that in [12], the adversary is instead attached to the output of the classifier. Besides fairness/robustness, domain adaptation [19, 16] and privacy [6, 7] are also addressed. By considering invariance in the context of a VAE, our approach instead aims to produce general-purpose representations and does not require additional class labels.

GANs have also been combined with VAEs in many other ways, although not with the aim of producing invariant representations. However, the following concepts could be combined in parallel with adversarial censoring. As mentioned earlier, in [10], an adversary attached to the decoder learns a similarity metric to enhance VAE training. In [13, 15], an adversary is used to approximate the Kullback–Leibler (KL) divergence in the VAE training objective, allowing for more general encoder architectures and latent representation priors. Both [2] and [3] independently propose a method to train an autoencoder using an adversary that tries to distinguish between pairs of data samples and extracted representations versus synthetic samples and the latent representations from which they were generated.

2 Formulation

In Section 2.1, we review the formulation of conditional VAEs as developed by [9, 17]. Sections 2.2 and 2.3 propose techniques to enforce invariant representations via adversarial censoring and via increased KL-divergence regularization.

Figure 1: Training setup for adversarially censored VAE. The encoder and decoder are trained to maximize the sum of the objective terms (in three dotted boxes), while the adversary is trained to minimize its objective.

2.1 Conditional Variational Autoencoders

The generative model for the data x involves an observed variable s and a latent variable z. The nuisance (or sensitive) variations are modeled by s, while z captures the other remaining information. Since the aim is to extract a latent representation z that is free of the nuisance variations in s, these variables are modeled by the joint distribution p(z, s) = p(z)p(s), where z and s are explicitly made independent. The generative model p_θ(x|z, s) is from a parametric family of distributions that is appropriate for the data. The latent prior p(z) can be chosen to have a convenient form, such as the standard multivariate normal distribution N(0, I). No knowledge or assumptions about the nuisance variable prior p(s) are needed, since it is not directly used in the learning procedure.

The method for learning this model involves maximizing the log-likelihood for a set of training samples {(x_i, s_i)}_{i=1}^n with respect to the conditional distribution p_θ(x|s):

    max_θ Σ_{i=1}^n log p_θ(x_i | s_i).

This objective is analogous to minimizing the KL-divergence between the true conditional distribution of the data and the model, since

    E[KL(p(x|s) ‖ p_θ(x|s))] = −E[log p_θ(x|s)] − H(x|s),

where the expectation is with respect to (x, s) ∼ p(x, s), and E[log p_θ(x|s)] can be approximated by (1/n) Σ_{i=1}^n log p_θ(x_i | s_i).

Using a variational posterior q_φ(z|x, s) to approximate the actual posterior p_θ(z|x, s), the log-likelihood can be lower bounded by

    log p_θ(x|s) ≥ E[log p_θ(x|z, s)] − KL(q_φ(z|x, s) ‖ p(z)) =: L(θ, φ),    (1)

where z ∼ q_φ(z|x, s) in each expectation. The quantity L(θ, φ) given by (1) is known as the variational or evidence lower bound (ELBO). The inequality in (1) follows since

    log p_θ(x|s) − L(θ, φ) = KL(q_φ(z|x, s) ‖ p_θ(z|x, s)) ≥ 0.

Thus, by optimizing both θ and φ to maximize the lower bound L(θ, φ), the model p_θ(x|s) is trained toward the true conditional distribution of the data, while q_φ(z|x, s) is trained toward the corresponding posterior p_θ(z|x, s).

In the VAE architecture, the generative model p_θ(x|z, s) (decoder) and variational posterior q_φ(z|x, s) (encoder) are realized as neural networks that take as input (z, s) and (x, s), respectively, as illustrated in Figure 1, and output the parameters of their respective distributions. This architecture is specifically a conditional VAE, since the encoding and decoding are conditioned on the nuisance variable s.

When the encoder is realized as conditionally Gaussian:

    q_φ(z|x, s) = N(z; μ(x, s), diag(σ²(x, s))),    (2)

where the mean vector μ(x, s) and diagonal covariance matrix diag(σ²(x, s)) are determined as a function of (x, s), and the latent variable distribution is set to the standard Gaussian p(z) = N(0, I), the KL-divergence term in (1) can be analytically derived and differentiated [9]. However, the expectations in (1) must be estimated by sampling.

Hence, the learning procedure maximizes a sampled approximation of the ELBO L(θ, φ), given by

    L̃(θ, φ) := Σ_{i=1}^n L̃_i(θ, φ),    (3)

where, for i = 1, …, n,

    L̃_i(θ, φ) := (1/k) Σ_{j=1}^k log p_θ(x_i | z_{i,j}, s_i) − KL(q_φ(z|x_i, s_i) ‖ p(z)),    (4)

which approximates the expectations in (1) by sampling z_{i,j} ∼ q_φ(z|x_i, s_i).
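As a concrete illustration, the per-sample estimator in (4) can be sketched in a few lines of NumPy (a simplified stand-in, not the paper's Chainer implementation; the function names are ours), assuming the Gaussian encoder of (2) with mean mu and log-variance log_var:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kl(mu, log_var):
    # Analytic KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over
    # dimensions, as used for the KL term in (4).
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def sampled_elbo(mu, log_var, log_lik_fn, k=1):
    # Monte Carlo estimate of (4): average the reconstruction
    # log-likelihood over k reparameterized samples, then subtract
    # the analytic KL term.
    recon = 0.0
    for _ in range(k):
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * log_var) * eps  # reparameterization trick
        recon += log_lik_fn(z)
    return recon / k - gaussian_kl(mu, log_var)
```

The reparameterization z = μ + σ · ε keeps the sampling step differentiable with respect to the encoder outputs, which is what allows gradient-based training of φ.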

2.2 Representation Invariance via Adversarial Censoring

In principle, optimal training with ideal parametric approximations should result in an encoder that accurately approximates the true posterior p_θ(z|x, s), for which z and s are independent by construction. Thus, the theoretically optimal encoder should produce a representation z that is independent of the nuisance variable s. In practice, however, since the encoder is realized as a parametric approximation and globally optimal convergence cannot be guaranteed, we often observe that the representation produced by the trained encoder is significantly correlated with the nuisance variable s. Further, one may wish to train an encoder that does not use s as an input, to allow the representation to be generated from the data x alone. However, this additional restriction on the encoder may increase the challenge of extracting invariant representations.

Invariance could be enforced by minimizing the mutual information I(z; s), where z is the latent representation generated by the encoder. This mutual information penalty, scaled by a parameter λ ≥ 0, can be subtracted from the lower bound of (1), yielding

    log p_θ(x|s) ≥ L(θ, φ) − λ I(z; s),

where equality is still met for q_φ(z|x, s) equal to the true posterior p_θ(z|x, s), for which I(z; s) = 0. Thus, incorporating a mutual information penalty term into the lower bound does not, in principle, change the theoretical maximum. However, since computing mutual information is generally intractable, we apply the approximation technique of [1], which utilizes a variational posterior q_ψ(s|z) and the lower bound

    I(z; s) = H(s) − H(s|z) ≥ H(s) + E[log q_ψ(s|z)],    (5)

where equality is met for q_ψ(s|z) equal to the actual posterior p(s|z), with respect to which the expectation and entropies are defined. Hence, maximizing (5) over the variational posterior q_ψ(s|z), which can also be similarly realized as a neural network, yields an approximation of I(z; s). The entropy H(s), although generally unknown, is constant with respect to the optimization variables. Incorporating this variational approximation of the mutual information penalty into (3), modulo dropping the constant H(s), results in the adversarial training objective

    max_{θ,φ} min_ψ Σ_{i=1}^n [ L̃_i(θ, φ) − λ (1/k) Σ_{j=1}^k log q_ψ(s_i | z_{i,j}) ],    (6)

where z_{i,j} are the same samples used for L̃_i(θ, φ) as given by (4), and λ is a parameter that controls the emphasis on invariance. Note that when s is a categorical variable (e.g., a class label), the additional adversarial network realizing the variational posterior q_ψ(s|z) is essentially just a classifier trained (by minimizing cross-entropy loss) to recover s from the representation z generated by the encoder. In this approach, the VAE is adversarially trained to maximize the cross-entropy loss of this classifier combined with the original objective given by (3). Figure 1 illustrates the overall VAE training framework including adversarial censoring.
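For categorical s, the VAE side of the objective in (6) reduces to the sampled ELBO terms plus λ times the adversary's cross-entropy loss. A minimal NumPy sketch (the helper names are ours; the adversary's softmax outputs are given as probability vectors):

```python
import numpy as np

def adv_cross_entropy(adv_probs, s):
    # The adversary's loss for one sample: -log q(s|z), where adv_probs is
    # the softmax output of the adversarial network over the classes.
    return -np.log(adv_probs[s])

def censored_objective(elbo_terms, adv_probs_list, labels, lam):
    # The VAE's objective per (6): maximize the sampled ELBO terms plus
    # lam times the adversary's cross-entropy loss (equivalently, minus
    # lam * log q(s|z)); the adversary separately minimizes its own loss.
    penalty = np.mean([adv_cross_entropy(p, s)
                       for p, s in zip(adv_probs_list, labels)])
    return np.mean(elbo_terms) + lam * penalty
```

When the adversary is reduced to chance (uniform outputs), the penalty saturates at log 10 for ten balanced classes, so further censoring pressure vanishes.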

2.3 Invariance via KL-divergence Censoring

Another approach to enforce invariance is to introduce a hyperparameter β ≥ 1 to increase the weight of the KL-divergence terms in (4), yielding the alternative objective terms

    L̃_i(θ, φ) := (1/k) Σ_{j=1}^k log p_θ(x_i | z_{i,j}, s_i) − β KL(q_φ(z|x_i, s_i) ‖ p(z)),    (7)

for which (4) is the special case when β = 1. The KL-divergence terms can be interpreted as regularizing the variational posterior toward the latent prior, which encourages the encoder to generate representations that are invariant not only to s but also to the data x. While increasing β further encourages invariant representations, it potentially disrupts model learning, since the overall dependence on the data is affected.
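The modified term (7) is a one-line change to the baseline objective; a minimal sketch (the function name is illustrative):

```python
def kl_censored_term(recon_avg, kl, beta):
    # Objective term (7): average reconstruction log-likelihood minus a
    # beta-weighted KL term; beta = 1 recovers the standard term (4).
    return recon_avg - beta * kl
```

Sweeping beta upward trades reconstruction quality for invariance, which is the tradeoff explored in the experiments below.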

3 Experiments

(a) Adversary Accuracy vs ELBO
(b) Mutual Information vs ELBO
Figure 2: Quantitative performance comparison. Smaller values along the x-axes correspond to better invariance. Larger values along the y-axis (ELBO) correspond to better model learning.

We evaluate the performance of various VAEs for learning invariant representations, under several scenarios for conditioning the encoder and/or decoder on the sensitive/nuisance variable s:

  • Full: Both the encoder and decoder are conditioned on s. In this case, the decoder is the generative model p_θ(x|z, s) and the encoder is the variational posterior q_φ(z|x, s) as described in Section 2.1.

  • Partial: Only the decoder is conditioned on s. This case is similar to the previous one, except that the encoder approximates the variational posterior without s as an input, i.e., q_φ(z|x).

  • Basic (unconditioned): Neither the encoder nor the decoder is conditioned on s. This baseline case is the standard, unconditioned VAE, where s is not used as an input.

In combination with these VAE scenarios, we also examine several approaches for encouraging invariant representations:

  • Adversarial Censoring: This approach, as described in Section 2.2, introduces an additional network that attempts to recover s from the representation z. The VAE and this additional network are adversarially trained according to the objective given by (6).

  • KL Censoring: This approach, as described in Section 2.3, increases the weight on the KL-divergence terms, using the alternative objective terms given by (7).

  • Baseline (none): As a baseline, the VAE is trained according to the original objective given by (3) without any additional modifications to enforce invariance.

(a) Partial – Baseline
(b) Partial – Adversarial censoring
(c) Partial – KL censoring
(d) Full – Baseline
(e) Full – Adversarial censoring
(f) Full – KL censoring
Figure 3: Style transfer with conditional VAEs. The top row within each image shows the original test set examples input to the encoder, while the other rows show the corresponding output of the decoder when conditioned on different digit classes.

3.1 Dataset and Network Details

We use the MNIST dataset, which consists of 70,000 grayscale, 28×28 pixel images of handwritten digits and corresponding labels in {0, 1, …, 9}. We treat the vectorized images in [0, 1]^784 as the data x, while the digit labels serve as the nuisance variable s. Thus, our objective is to train VAE models that learn representations that capture features (i.e., handwriting style) invariant to the digit class s.

We use basic, multilayer perceptron architectures to realize the VAE (similar to the architecture used in [9]) and the adversarial network. This allows us to illustrate how the performance of even very simple VAE architectures can be improved with adversarial censoring. We choose the latent representation z to have 20 dimensions, with its prior set as the standard Gaussian, i.e., p(z) = N(0, I). The encoder, decoder, and adversarial networks each use a single hidden layer of 500 nodes with a nonlinear activation function. In the scenarios where the encoder (or decoder) is conditioned on the nuisance variable, the one-hot encoding of s is concatenated with x (or z, respectively) to form the input. The adversarial network uses a 10-dimensional softmax output layer to produce the variational posterior q_ψ(s|z).

We use the encoder to realize the conditionally Gaussian variational posterior given by (2). The encoder network produces a 40-dimensional vector (with no activation function applied) that represents the mean vector μ concatenated with the log of the diagonal of the covariance matrix, log σ². This allows us to compute the KL-divergence terms in (4) analytically as given by [9].

The output layer of the decoder network has 784 nodes and applies the sigmoid activation function, matching the size and scale of the images. We treat the decoder output, denoted by y ∈ [0, 1]^784, as parameters of a generative model given by

    p_θ(x|z, s) = Π_{i=1}^{784} y_i^{x_i} (1 − y_i)^{1 − x_i},

where x_i and y_i are the components of x and y, respectively. Although not strictly binary, the MNIST images are nearly black and white, allowing this Bernoulli generative model to be a reasonable approximation. We directly display y to generate the example output images.
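Under this Bernoulli model, the reconstruction term log p_θ(x|z, s) is the negative binary cross-entropy between the image and the decoder output. A minimal NumPy sketch (the clipping constant is our addition for numerical stability, not from the paper):

```python
import numpy as np

def bernoulli_log_lik(x, y, eps=1e-7):
    # log p(x|z,s) under the Bernoulli model above: each pixel x_i in [0,1]
    # is scored against the decoder's sigmoid output y_i; this is the
    # negative binary cross-entropy, summed over the pixels.
    y = np.clip(y, eps, 1.0 - eps)
    return np.sum(x * np.log(y) + (1.0 - x) * np.log(1.0 - y))
```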

We implemented these experiments with the Chainer deep learning framework [18]. The networks were trained over the 60,000 image training set for 100 epochs with 100 images per batch, while evaluation and example generation were performed with the 10,000 image test set. The adversarial and VAE networks were each updated alternately, once per batch, with Adam [8]. Relying on stochastic estimation over each batch, we set the sampling parameter k = 1 in (4), (6), and (7).

(a) Partial – Baseline
(b) Partial – Adversarial censoring
(c) Partial – KL censoring
(d) Full – Baseline
(e) Full – Adversarial censoring
(f) Full – KL censoring
Figure 4: Generative sampling with conditional VAEs. Latent representations are sampled from the prior p(z) and input to the decoder to generate synthetic images, with the decoder conditioned on selected digit classes in {0, 1, …, 9}.
(a) MNIST Examples
(b) Basic – Adversarial censoring
(c) Basic – Adversarial censoring
(d) Basic – Baseline
(e) Basic – KL censoring
(f) Basic – KL censoring
Figure 5: Generative sampling with unconditioned (“basic”) VAEs. Attempting to censor an unconditioned VAE results in severely degraded model performance.

3.2 Evaluation Methods

We quantitatively evaluate the trained VAEs for how well they:

  • Learn the data model: We measure this with the ELBO score, estimated by computing L̃(θ, φ) over the test data set (see (3) and (4)).

  • Produce invariant representations: We measure this via the adversarial approach described in Section 2.2. Even when not using adversarial censoring, we still train an adversarial network in parallel (i.e., its loss gradients are not fed back into the main VAE training) that attempts to recover the sensitive variable s from the representation z. The classification accuracy and cross-entropy loss of the adversarial network provide measures of invariance. Since the digit class s is uniformly distributed over {0, 1, …, 9}, the entropy H(s) is equal to log 10 and can be combined with the cross-entropy loss (see (5) and [1]) to yield an estimate of the mutual information I(z; s), which we report instead.
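The reported mutual information estimate is then a simple transformation of the adversary's loss; a minimal sketch assuming uniform class labels:

```python
import numpy as np

def mi_estimate(avg_adv_cross_entropy, num_classes=10):
    # Estimate of I(z; s) per (5): H(s) minus the adversary's average
    # cross-entropy loss, using H(s) = log(num_classes) for uniform s.
    return np.log(num_classes) - avg_adv_cross_entropy
```

A chance-level adversary (cross-entropy of log 10) yields an estimate of zero mutual information, while a perfect adversary yields the full entropy log 10.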

The VAEs are also qualitatively evaluated with the following visual tasks:

  • Style Transfer (Digit Change): An image x from the test set is input to the encoder to produce a representation z by sampling from the variational posterior. Then, the decoder is applied to produce an image, while the digit class input is changed to a new value ŝ.

  • Generative Model Sampling: A synthetic image is generated by first sampling a latent variable z from the prior p(z), and then applying the decoder to produce an image for a selected digit class s.
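Both visual tasks are simple compositions of the trained encoder and decoder. The following NumPy sketch uses dummy stand-in networks (all names and shapes here are illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the trained networks; in practice these would be the
# learned neural networks (assumed shapes: 784-pixel images, 20-dim z,
# 10 digit classes with one-hot conditioning).
def encoder(x, s_onehot):
    return np.zeros(20), np.zeros(20)   # (mu, log_var)

def decoder(z, s_onehot):
    return np.full(784, 0.5)            # pixel means y

def one_hot(s, n=10):
    v = np.zeros(n)
    v[s] = 1.0
    return v

def style_transfer(x, s, s_new):
    # Encode x with its true class s, then decode with a new class s_new,
    # transferring the style captured in z to a different digit.
    mu, log_var = encoder(x, one_hot(s))
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    return decoder(z, one_hot(s_new))

def sample_digit(s_new):
    # Draw z from the prior N(0, I) and decode for a chosen class.
    return decoder(rng.standard_normal(20), one_hot(s_new))
```

The cleaner the separation between z and s, the more faithfully style_transfer preserves handwriting style while changing only the digit identity.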

3.3 Results and Discussion

Figure 2 presents the quantitative performance comparison for the various combinations of VAEs with full, partial, or no conditioning, and with invariance encouraged by adversarial censoring (red curves), KL censoring (blue curves), or nothing (black points). Each pair of red and blue curves represents varying emphasis on enforcing invariance (as the parameters λ and β are respectively changed) and meets at a black point corresponding to the baseline (no censoring) case (where λ = 0 and β = 1).

Unsurprisingly, the baseline, unconditioned VAE produces a representation that readily reveals the digit class, since otherwise image reconstruction by the decoder would be difficult. However, even when partially or fully conditioned on s, the baseline VAEs still significantly reveal s. Both adversarial and KL censoring are effective at enforcing invariance, with adversarial accuracy approaching chance and mutual information approaching zero as the parameters λ and β are respectively increased. However, the adversarial approach has less of an impact on the model learning performance (as measured by the ELBO score). With conditional VAEs, adversarial censoring achieves invariance while having only a small impact on the ELBO score, and appears to visually improve performance (particularly for the partially conditioned case) in the style transfer and sampling tasks, as shown in Figures 3 and 4. The worse model learning performance with KL censoring seems to result in blurrier (although seemingly cleaner) images, as also shown in Figures 3 and 4. Attempting to censor a basic (unconditioned) autoencoder (as proposed by [4]) rapidly degrades model learning performance, which manifests as severely degraded sampling performance, as shown in Figure 5.

The results in Figures 3 and 4 correspond to specific points in Figure 2 as follows: (a) the partial baseline point, (b) the left-most point of the partial adversarial censoring curve, (c) the left-most point of the partial KL censoring curve, (d) the full baseline point, (e) the left-most point of the full adversarial censoring curve, (f) the left-most point of the full KL censoring curve. The Figure 5 results correspond to points in Figure 2 as follows: (a) MNIST test examples, (b–c) the two left-most points of the unconditioned adversarial censoring curve, (d) the unconditioned baseline point, (e–f) the two left-most points of the unconditioned KL censoring curve. Note that larger values of the λ and β parameters were required for the unconditioned VAEs to achieve levels of invariance similar to the conditioned cases.

4 Conclusion

The natural combination of conditional VAEs with adversarial censoring is a theoretically well-founded method to generate invariant representations that are disentangled from nuisance variations. Conditioning the decoder on the nuisance variable s allows the representation z to be cleanly separated and model learning performance to be preserved, since z and s are both used to reconstruct the data x. Training VAEs with adversarial censoring visually improved performance in style transfer and generative sampling tasks.


  • [1] D. Barber and F. Agakov. The IM algorithm: a variational approach to information maximization. In Advances in Neural Information Processing Systems (NIPS), pages 201–208, 2003.
  • [2] J. Donahue, P. Krähenbühl, and T. Darrell. Adversarial feature learning. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • [3] V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb, M. Arjovsky, and A. Courville. Adversarially learned inference. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • [4] H. Edwards and A. Storkey. Censoring representations with an adversary. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • [5] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), pages 2672–2680, 2014.
  • [6] J. Hamm. Minimax filter: Learning to preserve privacy from inference attacks. Journal of Machine Learning Research, 18(129):1–31, 2017.
  • [7] Y. Iwasawa, K. Nakayama, I. E. Yairi, and Y. Matsuo. Privacy issues regarding the application of DNNs to activity-recognition using wearables and its countermeasures by use of adversarial training. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), pages 1930–1936, 2017.
  • [8] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • [9] D. P. Kingma and M. Welling. Auto-encoding variational bayes. In Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • [10] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther. Autoencoding beyond pixels using a learned similarity metric. In Proceedings of the International Conference on Machine Learning (ICML), pages 1558–1566, 2016.
  • [11] C. Louizos, K. Swersky, Y. Li, M. Welling, and R. Zemel. The variational fair autoencoder. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • [12] G. Louppe, M. Kagan, and K. Cranmer. Learning to pivot with adversarial networks. In Advances in Neural Information Processing Systems (NIPS), pages 982–991, 2017.
  • [13] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey. Adversarial autoencoders. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • [14] M. F. Mathieu, J. J. Zhao, J. Zhao, A. Ramesh, P. Sprechmann, and Y. LeCun. Disentangling factors of variation in deep representation using adversarial training. In Advances in Neural Information Processing Systems (NIPS), pages 5040–5048, 2016.
  • [15] L. Mescheder, S. Nowozin, and A. Geiger. Adversarial variational bayes: Unifying variational autoencoders and generative adversarial networks. In Proceedings of the International Conference on Machine Learning (ICML), pages 2391–2400, 2017.
  • [16] J. Shen, Y. Qu, W. Zhang, and Y. Yu. Adversarial representation learning for domain adaptation. arXiv preprint arXiv:1707.01217, 2017.
  • [17] K. Sohn, H. Lee, and X. Yan. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems (NIPS), pages 3483–3491, 2015.
  • [18] S. Tokui, K. Oono, S. Hido, and J. Clayton. Chainer: a next-generation open source framework for deep learning. In Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS), 2015.
  • [19] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. In Computer Vision and Pattern Recognition (CVPR), 2017.
  • [20] Q. Xie, Z. Dai, Y. Du, E. Hovy, and G. Neubig. Controllable invariance through adversarial feature learning. arXiv preprint arXiv:1705.11122, 2017.
  • [21] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In Proceedings of the International Conference on Machine Learning (ICML), pages 325–333, 2013.