Adversarial Variational Domain Adaptation
In this work we address the problem of transferring knowledge obtained from a vast annotated source domain to a sparsely labeled or unlabeled target domain. We propose Adversarial Variational Domain Adaptation (AVDA), a semi-supervised domain adaptation method based on deep variational embedded representations. We use approximate inference and adversarial methods to map samples from source and target domains into an aligned semantic embedding. We show that in a semi-supervised few-shot scenario, accuracy on the target domain improves rapidly as the number of target labels increases.
Deep neural networks have become the state of the art for many machine learning problems. However, these methods usually require a large amount of labeled data in order to avoid overfitting and to generalize. Furthermore, it is assumed that training and test data come from the same distribution and feature space. This becomes a serious problem when labeling is costly or time-consuming. One way to address this challenge is to use a source domain that contains a vast amount of annotated data and to reduce the domain shift between this domain and a different, but similar, target domain in which we have few or even no annotations.
Domain adaptation (DA) methods aim at reducing the domain shift between datasets pan_2010, allowing a model trained on the source to generalize and perform similarly on the target domain by finding a common shared space between them. Deep DA uses deep neural networks to achieve this task. Previous works in deep DA have addressed the problem of domain shift by using statistical measures long_2017; long_2015; tzeng_2014; yan_2017; Zhuang_2015; sun_2016; peng_2017 or by introducing class-based loss functions tzeng_2015; gebru_2017; motiian_2017a; motiian_2017b in order to reduce the distance between domain distributions. Since the appearance of Generative Adversarial Networks goodfellow_2014, new approaches have focused on adversarial domain adaptation (ADA) techniques ganin_2015; ganin_2014. The goal of adversarial domain adaptation tzeng_2017 is to learn, from the source data distribution, a model that predicts on the target distribution by finding a common representation of the data through an adversarial objective with respect to a domain discriminator. This way, a domain-invariant feature space can be used to solve a classification task on both the source and the target.
Although ADA methods are good at aligning distributions even in an unsupervised domain adaptation (UDA) scenario (i.e. with no labels from the target), they struggle with some domain adaptation challenges. First, since most of these methods were designed to tackle UDA problems, they usually fail when there is a significant covariate shift between domains zou_2019. Second, these methods are not able to take advantage of the semi-supervised scenario to produce more accurate models when a small number of labels is available from the target, generating poor decision boundaries near annotated target data saito_2019. This behavior has been studied in different works, which tried to adapt domain-invariant features from different classes independently saito_2018; kang_2019.
In this work, we propose Adversarial Variational Domain Adaptation (AVDA), a domain adaptation model that works in unsupervised and semi-supervised scenarios, exploiting target labels when they are available by combining variational deep embedding (VaDE, jiang_2016) and adversarial methods goodfellow_2014. The idea behind AVDA is to correct the domain shift of each class independently by using an embedded space composed of a mixture of Gaussians, in which each class corresponds to one Gaussian mixture component.
The performance of AVDA was validated on benchmark digit recognition tasks using the MNIST lecunn_1998, USPS usps_1988, and SVHN svhn_2011 datasets, and on a real case consisting of galaxy images, using the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS, grogin011) as source and the Cluster Lensing and Supernova Survey with Hubble (CLASH, Postman_2012) as target. We demonstrate results competitive with state-of-the-art methods on the digits task and then show the potential of our model, which gains accuracy quickly when few target labels are available even in a high domain shift scenario.
2 Related Work
Due to the capability of deep networks to learn transferable bengio_2012; yosinski_2014; oquab_2014 and invariant goodfellow_2009 representations of the data, the idea of transferring knowledge acquired from a vast labeled source to increase the performance on a target domain has become a wide area of research Tan_2018. Domain adaptation methods deal with this challenge by reducing the domain shift between source and target domains pan_2010, aligning a common internal representation for them.
Several statistical metrics have been proposed to align source and target distributions, such as maximum mean discrepancy (MMD) long_2017; long_2015; tzeng_2014; yan_2017, Kullback-Leibler (KL) divergence Zhuang_2015, or correlation alignment (CORAL) sun_2016; peng_2017. Since the appearance of Generative Adversarial Networks goodfellow_2014, significant work has been developed around adversarial domain adaptation (ADA) techniques ganin_2015. The idea of ADA methods is to use a domain classifier that discriminates whether a sample belongs to the source or the target domain, while a generator learns to create indistinguishable representations of the data in order to fool the domain classifier. By doing this, a domain-invariant representation of the data distribution is produced in a latent space.
Although ADA models achieve good results either by matching distributions in a feature representation (i.e. feature-level) ganin_2014; ming_2017; long_2018; shu_2018; russo_2017 or by generating target images that look as if they were part of the source dataset (i.e. pixel-level) isola_2016; zhu_2017; hoffman_2017; hu_2018; hosseini_2019, they have difficulties dealing with large covariate shifts between domains when used in a UDA scenario zou_2019. Furthermore, when a small number of annotated target samples is included, these models often do not improve performance relative to training with the labeled target samples alone saito_2019. To deal with few labels, few-shot domain adaptation methods have been created motiian_2017a; motiian_2017b. These methods are not meant to work with unlabeled data, and they often produce overfitted representations and have problems generalizing on the target domain wang_2018.
Semi-supervised domain adaptation (SSDA) deals with these challenges by using both labeled and unlabeled samples during training gong_2012; gopalan_2011; glorot_2011; santos_2017; belhaj_2018; saito_2018. In SSDA we are usually interested in finding a space in which labeled and unlabeled target samples belonging to the same class have a similar internal representation donahue_2013; yao_2015; zou_2019; saito_2019. A promising approach to deal with labeled and unlabeled samples during training is the semi-supervised variational autoencoder kingma_2014; rasmus_2015; maloe_2016. These models seek to learn a latent space that depends on the labeled data. As the latent space is shared between labeled and unlabeled data, points from the same class will be closer in the latent space. This latent space can be extended to an embedding in which each class is represented by an embedding component. Our proposed AVDA framework uses a variational deep embedding jiang_2016 representation, in which source and target samples that belong to the same class are mapped into the same embedding component, allowing the model to improve quickly as more labels from the target domain are used.
3 Adversarial Variational Domain Adaptation
In this work we propose Adversarial Variational Domain Adaptation (AVDA), a model based on semi-supervised variational deep embedding and adversarial methods. We use a Gaussian mixture model as a prior for the embedded space and align samples from source and target domains that belong to the same class into the same Gaussian component.
3.1 Problem Definition
In a semi-supervised domain adaptation scenario, we are given a source domain with a set of labeled samples and a target domain with its own, possibly very small, set of labeled samples. For the target domain we also have a subset of unlabeled samples. Both domains share the same set of classes. Source and target data are drawn from unknown joint distributions over inputs and labels, and these two distributions differ from each other.
The goal of this work is to build a model that provides an embedding space in which source and target data have the same representation for each of the classes. We propose the use of a semi-supervised variational deep embedding. This model is composed of two inference models, one for the source and one for the target, each parametrized by its own set of network weights. They encode source and target data into a shared latent representation, which we set to be a mixture of Gaussians whose components depend on the labels. In addition, a generative model, with its own parameters, describes the data as if they were generated from a latent variable. A discriminative process is included to enforce the separability between the Gaussian mixture components. The overall model is displayed in Figure 1.
3.2 Adversarial Variational Domain Adaptation Model
For the source domain, we define a generative process as follows:
Here the class label follows a multinomial distribution parametrized by the prior probability of each class; these probabilities are non-negative and sum to one. Each class label has an associated mean and variance, which define the embedded normal distribution from which the latent variable is drawn. The likelihood of the data is a function whose parameters are formed by non-linear transformations of the latent variable, computed by a neural network with its own parameters. In this work, we use deep neural networks as non-linear function approximators.
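As a sketch of the generative process described above, one can write it in the style of VaDE jiang_2016. The symbols used here ($y$ for the class label, $z$ for the latent variable, $x$ for the observation, $K$ for the number of classes, and $\pi$, $\mu_y$, $\sigma_y^2$, $\theta$ for the parameters) are notational assumptions made for illustration and may differ from the notation of the original equations:

```latex
\begin{aligned}
y &\sim \mathrm{Cat}(\pi), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1, \\
z \mid y &\sim \mathcal{N}\!\left(\mu_y,\ \sigma_y^2 I\right), \\
x \mid z &\sim p_\theta(x \mid z), \\
p_\theta(x, z, y) &= p_\theta(x \mid z)\, p(z \mid y)\, p(y).
\end{aligned}
```

Under this reading, each class owns one Gaussian component of the embedding, and the decoder only sees the latent variable $z$.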
For the source and target domains, we define two inference models. We use variational inference to approximate the true posterior distribution with two approximate posterior distributions, one per domain, each parametrized by a deep neural network with its own parameters. We assume that the approximate posterior factorizes into a distribution over the latent variable given the input and a distribution over the label given the latent variable, and we model these with normal and categorical distributions, respectively, as follows:
where the source and target deep neural networks output the mean and variance of the approximate posterior, which are then used to sample from a Gaussian mixture distribution by using the reparametrization trick defined in kingma_2013. The categorical parts of the source and target processes are modeled through independent neural networks. These networks take the latent variables and return the class probabilities used to sample a categorical variable from a Gumbel-Softmax distribution jang_2016. With this estimator, we can generate labels and backpropagate through the sampled categorical variable by using the continuous relaxation defined by Jang et al. jang_2016. This avoids the marginalization over all the categorical labels introduced in kingma_2014; jiang_2016 and significantly reduces computational costs.
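The factorization described above can be sketched as follows for a generic domain $d \in \{s, t\}$. All symbols ($\phi_d$ for the inference parameters, $\tilde{\mu}$, $\tilde{\sigma}^2$ and $\tilde{\pi}$ for the network outputs, $\epsilon$ for the noise) are assumed notation for illustration only:

```latex
\begin{aligned}
q_{\phi_d}(z, y \mid x) &= q_{\phi_d}(z \mid x)\, q_{\phi_d}(y \mid z), \\
q_{\phi_d}(z \mid x) &= \mathcal{N}\!\left(z;\ \tilde{\mu}_d(x),\ \tilde{\sigma}_d^2(x)\, I\right), \\
q_{\phi_d}(y \mid z) &= \mathrm{Cat}\!\left(y;\ \tilde{\pi}_d(z)\right), \\
z &= \tilde{\mu}_d(x) + \tilde{\sigma}_d(x) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).
\end{aligned}
```

The last line is the standard reparametrization of a diagonal Gaussian, which keeps the sample differentiable with respect to the encoder outputs.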
3.3 Variational Objectives
Note that the observed label never appears explicitly in the equation. Instead, the supervision enters when computing the Kullback-Leibler divergence between the approximate posterior over the latent variable and the class-conditional prior: we force the approximate posterior to match the Gaussian component associated with the observed class. At the same time, a predictive function is included in order to enforce the separability between the embedding components. The lower bound for the source domain can be optimized by minimizing the following objective:
where the weighting hyper-parameter controls the relative importance between the generative and discriminative processes of the model.
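Based on the description given in this paper (a KL term that can be computed analytically, a reconstruction term, and a discriminative term weighted by a hyper-parameter), the source objective plausibly has a form like the one below. This is a sketch under assumed notation ($\mathcal{L}_s$, $\phi_s$, $\theta$, $\alpha$), not a transcription of the original equation:

```latex
\mathcal{L}_s(x, y) =
\mathrm{KL}\!\left(q_{\phi_s}(z \mid x)\,\|\,p(z \mid y)\right)
- \mathbb{E}_{q_{\phi_s}(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
- \alpha\, \mathbb{E}_{q_{\phi_s}(z \mid x)}\!\left[\log q_{\phi_s}(y \mid z)\right]
```

The first term pulls each encoded sample towards the Gaussian component of its observed class, the second is the reconstruction error, and the third is a cross-entropy term that separates the components.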
For the target domain, on the other hand, we would like the inference model to map the samples into the same embedding obtained by the source generative model. Taking this into account, the objective can be decomposed into two parts: one for labeled data and another for unlabeled data. For a single labeled target data point, we obtain the supervised objective as follows:
where the prior is the optimized prior distribution obtained for the source, and a hyper-parameter controls the relative importance of the discriminative process in the model. For a single unlabeled target data point, we derive the unsupervised objective as follows:
Minimizing this term helps map each unlabeled target sample into the corresponding embedding component by using a predicted categorical variable.
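The two target objectives can be sketched in the same spirit. The notation is assumed for illustration: $p^{*}(z \mid y)$ denotes the prior learned on the source, $\alpha_t$ the discriminative weight, and $\hat{y}$ a label predicted for an unlabeled sample. The original unsupervised objective may contain additional terms that are not reproduced here:

```latex
\begin{aligned}
\mathcal{L}_t^{\mathrm{sup}}(x, y) &=
\mathrm{KL}\!\left(q_{\phi_t}(z \mid x)\,\|\,p^{*}(z \mid y)\right)
- \alpha_t\, \mathbb{E}_{q_{\phi_t}(z \mid x)}\!\left[\log q_{\phi_t}(y \mid z)\right], \\
\mathcal{L}_t^{\mathrm{unsup}}(x) &\approx
\mathrm{KL}\!\left(q_{\phi_t}(z \mid x)\,\|\,p^{*}(z \mid \hat{y})\right),
\qquad \hat{y} \sim q_{\phi_t}(y \mid z).
\end{aligned}
```

In both cases the source prior is held as the reference, so only the target inference model has to move.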
3.4 Adversarial Objective
We would like the aggregate distribution of the approximate posterior to be the same for the source and target domains. These distributions are embedded spaces that depend on the labels, hence we use the semi-supervised generative adversarial model proposed by Odena odena_2016 to encourage their alignment. In particular, we use a discriminator, with its own parameters, that takes the form of a classifier with one more output than the number of classes: the first outputs correspond to the source classes, and the additional last output represents data generated by the target inference model. By doing this, we encourage the discriminator to learn the underlying distribution of each class independently. This discriminator can be optimized using the following objective:
where the loss is the cross entropy. On the other hand, we try to confuse the discriminator via an adversarial loss, which forces the inference model of the target to learn a mapping from the samples to their corresponding embedding component. We can optimize the parameters of the target inference model using the following objective:
where the labels are the predicted categorical variables sampled from the Gumbel-Softmax distribution for unlabeled target samples and the real classes for labeled target samples.
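The two adversarial losses described above can be illustrated with a minimal NumPy sketch. It assumes a discriminator that outputs logits over one more class than the number of categories; the function names (`discriminator_loss`, `adversarial_target_loss`) and the convention that the extra class has the last index are hypothetical choices made for this example, not the original implementation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis of a 2-D array.
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of integer labels under softmax(logits).
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def discriminator_loss(logits_src, y_src, logits_tgt, num_classes):
    # Source embeddings keep their true class; every target embedding is
    # assigned to the extra class with index `num_classes` (assumed convention).
    extra = np.full(len(logits_tgt), num_classes)
    return cross_entropy(logits_src, y_src) + cross_entropy(logits_tgt, extra)

def adversarial_target_loss(logits_tgt, y_tgt):
    # The target encoder is rewarded when the discriminator assigns its
    # embeddings to the (real or predicted) class instead of the extra class.
    return cross_entropy(logits_tgt, y_tgt)
```

Because the discriminator separates classes rather than just domains, fooling it requires the target encoder to land in the component of the right class, not merely somewhere inside the source embedding.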
3.5 Overall Objectives and Optimization
In this section we describe the overall objective of the model and how this objective can be optimized. The training process is done in three steps: train the source model, train the discriminator, and train the target model.
Source Optimization: The first step consists of optimizing the source model. This yields an embedding space that is later used to map target samples into the same embedding components. The overall objective for the source domain is defined in Equation 5 and is composed of three terms. The first term can be computed analytically by following the proof of jiang_2016 as follows:
where the approximate mean and variance are given by the neural network, the prior mean and variance are those of each embedding component, and the dimensionality is that of the mean and variance vectors. The second term can be optimized by computing the expectation of gradients using the reparametrization trick defined in kingma_2013; rezende_2014 as follows:
where the gradients are taken with respect to the parameters of the source inference and generative models, and the product between the standard deviation and the sampled noise is element-wise. The third, discriminative term can be optimized by minimizing the cross-entropy loss between the real labels and the labels predicted by the predictive function.
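As an illustration of the first two terms, the sketch below computes the closed-form KL divergence between two diagonal Gaussians and draws a reparametrized sample. It is a generic NumPy example under assumed names (`kl_diag_gaussians`, `reparametrize`); an actual implementation would use an automatic differentiation framework so that gradients flow through the sample.

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    # KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ), summed over the
    # latent dimensions (last axis). Standard closed form for Gaussians.
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0,
        axis=-1,
    )

def reparametrize(mu_q, var_q, rng):
    # z = mu + sigma * eps with eps ~ N(0, I); the product is element-wise.
    eps = rng.standard_normal(mu_q.shape)
    return mu_q + np.sqrt(var_q) * eps
```

For example, with unit variances and means that differ by one in a single dimension, `kl_diag_gaussians` returns 0.5, and it returns 0 when both Gaussians coincide.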
Discriminative Step: The discriminative step is done by minimizing Equation 8 with respect to the parameters of the discriminator. The goal of this step is to encourage the discriminator to learn the embedding representation generated from the source domain. Then, we optimize the target inference model so that it fools the discriminator by mapping samples into the same embedding. For this purpose, the discriminative step and the target step introduced next are performed alternately.
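The training schedule described so far (source optimization first, then alternating discriminator and target updates) can be summarized with a small skeleton. The function `run_schedule` and its arguments are hypothetical; each step is passed in as a callable that performs one update and returns its loss.

```python
def run_schedule(source_step, discriminator_step, target_step,
                 source_iters, adapt_iters):
    # Phase 1: fit the source model to obtain the embedding.
    history = []
    for _ in range(source_iters):
        history.append(("source", source_step()))
    # Phase 2: alternate one discriminator update with one target update.
    for _ in range(adapt_iters):
        history.append(("discriminator", discriminator_step()))
        history.append(("target", target_step()))
    return history
```

Whether the alternation is per mini-batch or per epoch is not specified here; the skeleton simply alternates once per iteration.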
Target Step: The overall objective for the target domain can be written as follows:
In this equation, one term can be optimized using Equation 10, while another can be decomposed into two terms as in Equation 7. Following the derivation of belhaj_2018, we can compute the derivatives of the second term using the expectation of gradients and the reparametrization trick defined in jang_2016 to derive a Monte Carlo estimator as follows:
where the predicted categorical variable is sampled using the Gumbel-Softmax relaxation, defined as:
where the class probabilities are parameters output by the neural network, the noise is a random variable sampled from a Gumbel(0, 1) distribution, and the temperature is a hyperparameter that regulates the entropy of the sampling. This reparametrization trick allows us to discretize the sample during the forward pass while using a continuous approximation in the backward pass. The KL divergence of Equation 13 is similar to the one introduced for source optimization, hence we can use the analytical solution introduced in Equation 10. The derivatives of the second term of Equation 12 can be computed as:
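For reference, the Gumbel-Softmax sampling step of jang_2016 can be sketched in NumPy as below. The function name `gumbel_softmax` and its arguments are assumptions for this example. NumPy has no automatic differentiation, so the function returns both the discrete one-hot sample (what the forward pass would use) and the relaxed sample (whose gradient a framework would use in the backward pass).

```python
import numpy as np

def gumbel_softmax(log_pi, tau, rng, hard=True):
    # Gumbel(0, 1) noise via inverse transform sampling: g = -log(-log(u)).
    u = np.clip(rng.random(log_pi.shape), 1e-12, 1.0 - 1e-12)
    g = -np.log(-np.log(u))
    # Relaxed sample: softmax((log_pi + g) / tau), computed stably.
    logits = (log_pi + g) / tau
    logits = logits - logits.max(axis=-1, keepdims=True)
    soft = np.exp(logits)
    soft = soft / soft.sum(axis=-1, keepdims=True)
    if not hard:
        return soft, soft
    # Discretized sample: one-hot vector at the arg max of the relaxed sample.
    one_hot = np.eye(log_pi.shape[-1])[soft.argmax(axis=-1)]
    return one_hot, soft
```

A lower temperature `tau` makes the relaxed sample closer to one-hot, at the cost of higher-variance gradients; the arg max itself follows the categorical distribution given by `log_pi` regardless of `tau`.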
We evaluate our framework on the digits datasets MNIST lecunn_1998, USPS usps_1988, and Street View House Numbers (SVHN) svhn_2011. We then apply it to a real case scenario using galaxy images from the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS, grogin011) as source and the Cluster Lensing and Supernova Survey with Hubble (CLASH, Postman_2012) as target.
Digits: We use three benchmark digits datasets: MNIST (M), USPS (U), and Street View House Numbers (SVHN) (S). These datasets contain images of digits from 0 to 9 in different environments: M and U contain handwritten digits, while S contains natural scene images. Each dataset comes with its own train and test sets. For adaptation purposes we used the evaluation protocol proposed in CyCADA hoffman_2017, i.e. three domain adaptation tasks are performed: from MNIST to USPS (M → U), from USPS to MNIST (U → M), and from SVHN to MNIST (S → M).
Galaxies: We use galaxy images from CANDELS and CLASH and address the problem of classifying them according to their morphologies: smooth, features, irregular, point source, and unclassifiable. The data is publicly available at https://drive.google.com/open?id=1BSc42VfAb2Mw0zlQShTFUnbCQaf11q4q. This is a challenging problem due to the high domain shift between source and target, which is caused by changes in the spectroscopic filters in which the images were captured. Specifically, the CANDELS dataset contains images from the GOODS-S giavalisco_2003 field in the near-infrared F160W spectroscopic filter, with labels created by expert astronomers karteltepe. The CLASH dataset contains images from 25 galaxy clusters in 16 different spectroscopic filters ranging from ultraviolet to near-infrared. Labels for CLASH were also created by experts as described in perez2018.
4.2 Implementation Details
For both tasks, pixel values were rescaled to a fixed range and images were resized to a common resolution. The SVHN dataset contains RGB images, so for S → M the MNIST images were replicated across the three color channels in order to use the same input tensor size for the network. The hyperparameters were empirically selected by measuring the performance on 100 randomly selected samples from the training set for digits and 3 for the galaxies, performing 5 cross-validated experiments, and the selected values were used for all scenarios. Our embedding is created in a 20-dimensional space, where the means and standard deviations of each Gaussian component are learnt via backpropagation. The means are initially set along different axes, and the standard deviations are initialized to the all-ones vector. Our training was performed using the Adam optimizer kingma_2015 with mini-batches of 128 samples. For a fair comparison, we used network architectures similar to the ones proposed in ming_2017; hu_2018.
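The initialization of the embedding and the channel replication described above can be sketched as follows. The helper names (`init_gmm_prior`, `gray_to_rgb`), the `scale` of the initial means, and the channels-last layout are assumptions made for this example.

```python
import numpy as np

def init_gmm_prior(num_classes, latent_dim=20, scale=1.0):
    # Place the mean of class k along axis k (assumed magnitude `scale`) and
    # start every component with an all-ones standard deviation vector.
    if num_classes > latent_dim:
        raise ValueError("need at least one latent axis per class")
    means = np.zeros((num_classes, latent_dim))
    means[np.arange(num_classes), np.arange(num_classes)] = scale
    stds = np.ones((num_classes, latent_dim))
    return means, stds

def gray_to_rgb(images):
    # Repeat a grayscale batch (N, H, W) across three channels: (N, H, W, 3).
    return np.repeat(images[..., np.newaxis], 3, axis=-1)
```

Starting with means on different axes makes the components mutually orthogonal, so the classes begin well separated before the means and standard deviations are updated by backpropagation.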
1) Digits: We compare our results with state-of-the-art methods in UDA and SSDA scenarios. In the UDA scenario, we use a fully labeled source and a fully unlabeled target to perform the adaptation. In Table 1, we show our results and compare them to previous approaches as reported in their papers. We report the mean accuracy and its standard deviation across ten random experiments, as well as the accuracy obtained by our best run. Even though our method is not designed for an unsupervised scenario, it is competitive with other methods on M → U and U → M. Our method performs poorly on S → M, which presents a higher domain shift; there, ACAL performs best by matching features at the pixel level and relaxing the cycle-consistency constraint, replacing it with a task-specific loss.
For the SSDA scenario, we perform experiments using one label per class on the target (denoted as 1-shot) and five labels per class on the target (denoted as 5-shot). The rest of the training samples are used in an unsupervised fashion. We compare our results against other methods that utilize the same number of labels per class. Notice that CCSA motiian_2017b and FADA motiian_2017a do not utilize unlabeled samples during training while we do. The results are computed using the same procedure as before and are displayed in Table 2. Here, we outperform all previous approaches. Our approach has a higher speed of adaptation in the sense that by using a small number of labels in the target domain we are able to obtain competitive results.
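The few-shot protocol described above (a fixed number of labeled target samples per class, with the remaining training samples used without labels) can be sketched as a simple index split. The function name `few_shot_split` and the uniform random selection are assumptions for this example.

```python
import numpy as np

def few_shot_split(labels, shots_per_class, rng):
    # Pick `shots_per_class` random indices for every class; these keep their
    # labels. All remaining indices are treated as unlabeled samples.
    labels = np.asarray(labels)
    labeled = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        labeled.extend(rng.choice(idx, size=shots_per_class, replace=False))
    labeled = np.sort(np.array(labeled, dtype=int))
    unlabeled = np.setdiff1d(np.arange(len(labels)), labeled)
    return labeled, unlabeled
```

Calling it with `shots_per_class=1` or `shots_per_class=5` gives the 1-shot and 5-shot settings; repeating the call with different random generators gives the independent runs over which mean and standard deviation are reported.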
Table 1. Accuracy on the digits tasks in the UDA scenario.
|Method|M → U|U → M|S → M|
|AVDA (ours) best| | | |
|AVDA (ours) random| | | |
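Table 2. Accuracy on the digits tasks in the SSDA scenario (1-shot and 5-shot).
|Method|M → U|U → M|S → M|M → U|U → M|S → M|
|AVDA (ours) best| | | | | | |
|AVDA (ours) random| | | | | | |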
Ablation Study: We examined the performance of AVDA under three different training strategies, in each of which we change a critical component of our framework. First, we investigate the use of fixed priors during training (i.e. they are not updated via backpropagation). We denote this experiment as AVDA-FP. Second, we investigate the model in a classical adversarial domain adaptation scenario (i.e. the discriminator only tries to distinguish whether a sample was generated by the source or the target). We denote this experiment as AVDA-ADA. Third, we investigate the model when a target generative model is included. We denote this experiment as AVDA-GT. The experiments were performed in the most difficult digit scenario, S → M, using five labels per class. In the first experiment, AVDA-FP obtained a lower accuracy than our model. In the second experiment, AVDA-ADA increased the variance of the model performance. In the third experiment, AVDA-GT decreased the discriminative capability of our model. Consequently, the full AVDA configuration obtains the best performance compared to these three slight variants.
Visualization: In order to inspect the alignment between source and target domains, we visualize the embedding space by using t-distributed stochastic neighbor embedding (t-SNE, maaten_2008) for the task S → M in a 5-shot scenario. Figure 2 shows this visualization. On the left, we show each class in a different color, demonstrating the classifying capability of the model. On the right, we show the source and target in different colors, demonstrating the ability of the model to align the data labels with the embedded components for both domains.
2) Galaxies: For the morphology classification task, we trained 6 different models using 0, 1, 5, 10, 25 and 50 labeled target samples per class. As in a classical semi-supervised setting, all unlabeled target images were used for training and evaluation. We show the results in Figure 3. Just a small number of labeled samples is enough to make important corrections to the domain shift, and we observe a rapid gain in performance when labels are included.
In this paper we present Adversarial Variational Domain Adaptation (AVDA), a semi-supervised approach for domain adaptation problems where a vast annotated source domain is available but few or no labels from the target domain exist. Unlike previous methods, which align source and target domains into a single common feature space, we use a variational embedding and align samples that belong to the same class into the same embedding component using adversarial methods. Experiments on digits and galaxy morphology classification problems are used to validate the proposed approach. Our model shows a rapid increase in accuracy as more labeled examples from the target domain are used, with a marked accuracy gain from only one label per class for the morphology classification task. Although our framework is not competitive in an unsupervised scenario, we demonstrate the capability of the model to align embedding spaces even in high domain shift scenarios by outperforming state-of-the-art methods in a semi-supervised scenario.
- (1) Marouan Belhaj, Pavlos Protopapas, and Weiwei Pan. Deep variational transfer: Transfer learning through semi-supervised deep generative models. CoRR, abs/1812.03123, 2018.
- (2) Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation Learning: A Review and New Perspectives. arXiv e-prints, page arXiv:1206.5538, Jun 2012.
- (3) John S. Denker, W. R. Gardner, Hans Peter Graf, Donnie Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, Henry S. Baird, and Isabelle Guyon. Neural network recognizer for hand-written zip code digits. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems 1, pages 323–331. Morgan-Kaufmann, 1989.
- (4) J. Donahue, J. Hoffman, E. Rodner, K. Saenko, and T. Darrell. Semi-supervised domain adaptation with instance constraints. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 668–675, June 2013.
- (5) Yaroslav Ganin and Victor Lempitsky. Unsupervised Domain Adaptation by Backpropagation. arXiv e-prints, page arXiv:1409.7495, Sep 2014.
- (6) Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. J. Mach. Learn. Res., 17(1):2096–2030, January 2016.
- (7) Timnit Gebru, Judy Hoffman, and Li Fei-Fei. Fine-grained recognition in the wild: A multi-task domain adaptation approach. CoRR, abs/1709.02476, 2017.
- (8) M. Giavalisco et al. The Great Observatories Origins Deep Survey: Initial results from optical and near-infrared imaging. Astrophys. J., 600:L93, 2004.
- (9) Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the 28th international conference on machine learning (ICML-11), pages 513–520, 2011.
- (10) B. Gong, Y. Shi, F. Sha, and K. Grauman. Geodesic flow kernel for unsupervised domain adaptation. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2066–2073, June 2012.
- (11) Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, and Andrew Y. Ng. Measuring invariances in deep networks. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 646–654. Curran Associates, Inc., 2009.
- (12) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
- (13) R. Gopalan, Ruonan Li, and R. Chellappa. Domain adaptation for object recognition: An unsupervised approach. In 2011 International Conference on Computer Vision, pages 999–1006, Nov 2011.
- (14) Norman A. Grogin, Dale D. Kocevski, and S. M. Faber. CANDELS: The Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey. The Astrophysical Journal Supplement, 2011.
- (15) Judy Hoffman, Eric Tzeng, Taesung Park, Jun-Yan Zhu, Phillip Isola, Kate Saenko, Alexei A. Efros, and Trevor Darrell. CyCADA: Cycle-consistent adversarial domain adaptation. CoRR, abs/1711.03213, 2017.
- (16) Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong, and Richard Socher. Augmented cyclic adversarial learning for domain adaptation. CoRR, abs/1807.00374, 2018.
- (17) Lanqing Hu, Meina Kan, Shiguang Shan, and Xilin Chen. Duplex generative adversarial network for unsupervised domain adaptation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
- (18) Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. CoRR, abs/1611.07004, 2016.
- (19) Eric Jang, Shixiang Gu, and Ben Poole. Categorical Reparameterization with Gumbel-Softmax. arXiv e-prints, page arXiv:1611.01144, Nov 2016.
- (20) Zhuxi Jiang, Yin Zheng, Huachun Tan, Bangsheng Tang, and Hanning Zhou. Variational deep embedding: A generative approach to clustering. CoRR, abs/1611.05148, 2016.
- (21) Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. arXiv e-prints, page arXiv:1401.4082, Jan 2014.
- (22) Guoliang Kang, Lu Jiang, Yi Yang, and Alexander G. Hauptmann. Contrastive adaptation network for unsupervised domain adaptation. CoRR, abs/1901.00976, 2019.
- (23) J. S. Kartaltepe, M. Mozena, D. Kocevski, D. H. McIntosh, J. Lotz, E. F. Bell, S. Faber, H. Ferguson, D. Koo, R. Bassett, M. Bernyk, K. Blancato, F. Bournaud, P. Cassata, M. Castellano, E. Cheung, C. J. Conselice, D. Croton, T. Dahlen, D. F. de Mello, L. DeGroot, J. Donley, J. Guedes, N. Grogin, N. Hathi, M. Hilton, B. Hollon, A. Koekemoer, N. Liu, R. A. Lucas, M. Martig, E. McGrath, C. McPartland, B. Mobasher, A. Morlock, E. O’Leary, M. Peth, J. Pforr, A. Pillepich, D. Rosario, E. Soto, A. Straughn, O. Telford, B. Sunnquist, J. Trump, B. Weiner, S. Wuyts, H. Inami, S. Kassin, C. Lani, G. B. Poole, and Z. Rizer. CANDELS Visual Classifications: Scheme, Data Release, and First Results. ApJS, 221:11, November 2015.
- (24) Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2015.
- (25) Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, and Max Welling. Semi-Supervised Learning with Deep Generative Models. arXiv e-prints, page arXiv:1406.5298, Jun 2014.
- (26) Diederik P Kingma and Max Welling. Auto-Encoding Variational Bayes. arXiv e-prints, page arXiv:1312.6114, Dec 2013.
- (27) Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, Nov 1998.
- (28) Ming-Yu Liu, Thomas Breuel, and Jan Kautz. Unsupervised image-to-image translation networks. CoRR, abs/1703.00848, 2017.
- (29) Mingsheng Long, Yue Cao, Jianmin Wang, and Michael I. Jordan. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ICML’15, pages 97–105. JMLR.org, 2015.
- (30) Mingsheng Long, Zhangjie Cao, Jianmin Wang, and Michael I. Jordan. Conditional adversarial domain adaptation. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 1640–1650. Curran Associates, Inc., 2018.
- (31) Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I. Jordan. Deep transfer learning with joint adaptation networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, pages 2208–2217. JMLR.org, 2017.
- (32) Lars Maaløe, Casper Kaae Sønderby, Søren Kaae Sønderby, and Ole Winther. Auxiliary Deep Generative Models. arXiv e-prints, page arXiv:1602.05473, Feb 2016.
- (33) Saeid Motiian, Quinn Jones, Seyed Iranmanesh, and Gianfranco Doretto. Few-shot adversarial domain adaptation. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 6670–6680. Curran Associates, Inc., 2017.
- (34) Saeid Motiian, Marco Piccirilli, Donald A. Adjeroh, and Gianfranco Doretto. Unified deep supervised domain adaptation and generalization. CoRR, abs/1709.10190, 2017.
- (35) Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
- (36) Augustus Odena. Semi-Supervised Learning with Generative Adversarial Networks. arXiv e-prints, arXiv:1606.01583, Jun 2016.
- (37) M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 1717–1724, June 2014.
- (38) S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, Oct 2010.
- (39) Xingchao Peng and Kate Saenko. Synthetic to real adaptation with deep generative correlation alignment networks. CoRR, abs/1701.05524, 2017.
- (40) Manuel Pérez-Carrasco, Guillermo Cabrera-Vives, Monserrat Martinez-Marín, Pierluigi Cerulo, Ricardo Demarco, Pavlos Protopapas, Julio Godoy, and Marc Huertas-Company. Multiband galaxy morphologies for CLASH: a convolutional neural network transferred from CANDELS. arXiv preprint arXiv:1810.07857, 2018.
- (41) M. Postman, D. Coe, N. Benítez, L. Bradley, T. Broadhurst, M. Donahue, H. Ford, O. Graur, G. Graves, S. Jouvel, A. Koekemoer, D. Lemze, E. Medezinski, A. Molino, L. Moustakas, S. Ogaz, A. Riess, S. Rodney, P. Rosati, K. Umetsu, W. Zheng, A. Zitrin, M. Bartelmann, R. Bouwens, N. Czakon, S. Golwala, O. Host, L. Infante, S. Jha, Y. Jimenez-Teja, D. Kelson, O. Lahav, R. Lazkoz, D. Maoz, C. McCully, P. Melchior, M. Meneghetti, J. Merten, J. Moustakas, M. Nonino, B. Patel, E. Regös, J. Sayers, S. Seitz, and A. Van der Wel. The Cluster Lensing and Supernova Survey with Hubble: An Overview. The Astrophysical Journal Supplement Series, 199:25, April 2012.
- (42) Antti Rasmus, Harri Valpola, Mikko Honkala, Mathias Berglund, and Tapani Raiko. Semi-supervised learning with ladder networks. CoRR, abs/1507.02672, 2015.
- (43) Paolo Russo, Fabio Maria Carlucci, Tatiana Tommasi, and Barbara Caputo. From source to target and back: symmetric bi-directional adaptive GAN. CoRR, abs/1705.08824, 2017.
- (44) Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, and Kate Saenko. Semi-supervised Domain Adaptation via Minimax Entropy. arXiv e-prints, arXiv:1904.06487, Apr 2019.
- (45) Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, and Tatsuya Harada. Maximum classifier discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3723–3732, 2018.
- (46) Cicero Nogueira dos Santos, Kahini Wadhawan, and Bowen Zhou. Learning loss functions for semi-supervised learning via discriminative adversarial networks. arXiv preprint arXiv:1707.02198, 2017.
- (47) Rui Shu, Hung Bui, Hirokazu Narui, and Stefano Ermon. A DIRT-T approach to unsupervised domain adaptation. In International Conference on Learning Representations, 2018.
- (48) Baochen Sun and Kate Saenko. Deep CORAL: correlation alignment for deep domain adaptation. CoRR, abs/1607.01719, 2016.
- (49) Chuanqi Tan, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang, and Chunfang Liu. A survey on deep transfer learning. CoRR, abs/1808.01974, 2018.
- (50) Eric Tzeng, Judy Hoffman, Trevor Darrell, and Kate Saenko. Simultaneous deep transfer across domains and tasks. CoRR, abs/1510.02192, 2015.
- (51) Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. CoRR, abs/1702.05464, 2017.
- (52) Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell. Deep domain confusion: Maximizing for domain invariance. CoRR, abs/1412.3474, 2014.
- (53) Laurens van der Maaten and Geoffrey E. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.
- (54) Mei Wang and Weihong Deng. Deep visual domain adaptation: A survey. CoRR, abs/1802.03601, 2018.
- (55) Hongliang Yan, Yukang Ding, Peihua Li, Qilong Wang, Yong Xu, and Wangmeng Zuo. Mind the class weight bias: Weighted maximum mean discrepancy for unsupervised domain adaptation. CoRR, abs/1705.00609, 2017.
- (56) T. Yao, Y. Pan, C. Ngo, H. Li, and T. Mei. Semi-supervised domain adaptation with subspace learning for visual recognition. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2142–2150, June 2015.
- (57) Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? CoRR, abs/1411.1792, 2014.
- (58) Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. CoRR, abs/1703.10593, 2017.
- (59) Fuzhen Zhuang, Xiaohu Cheng, Ping Luo, Sinno Jialin Pan, and Qing He. Supervised representation learning: Transfer learning with deep autoencoders. In Proceedings of the 24th International Joint Conference on Artificial Intelligence, IJCAI’15, pages 4119–4125. AAAI Press, 2015.
- (60) Han Zou, Yuxun Zhou, Jianfei Yang, Huihan Liu, Hari Prasanna Das, and Costas Spanos. Consensus adversarial domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019.