CatGAN: Coupled Adversarial Transfer for Domain Generation
This paper introduces the Coupled adversarial transfer GAN (CatGAN), an efficient solution to domain alignment. CatGAN follows a domain generation strategy for adaptation, motivated by the generative adversarial net (GAN) and adversarial discriminative domain adaptation (ADDA). CatGAN is built from shallow multilayer perceptrons (MLPs) for adversarial domain adaptation. It comprises two slim, symmetric sub-networks that together form a coupled adversarial learning framework. Owing to this symmetry, input images from the source/target domain can be fed into the MLP network to generate the target/source domain, supervised by the coupled discriminators. Notably, each generator combines a GAN loss and a domain loss so that the simple network works well. A content fidelity term preserves domain-specific knowledge during generation. Another finding is that the class-wise CatGAN is an effective alternative to the conditional GAN without a label constraint in generative learning. We show experimentally that the proposed model achieves performance competitive with state-of-the-art approaches.
In computer vision, a task-specific classifier usually does not work well on related tasks whose data distribution is mismatched. The reason is that the test data follow a different distribution from the training data (i.e., data bias). In machine learning, this is known as the domain mismatch problem. Cross-domain problems often seem natural to humans, who can easily recognize instances from either the source or the target domain. Machines, however, lack this ability to relate source and target domains naturally, because machine learning rests on a fundamental assumption of distribution consistency. Domain adaptation (DA)[6, 19] techniques, which can ease such domain shift, have therefore received significant attention from the engineering community recently.
DA models and algorithms allow machine learning methods to adapt across multiple knowledge domains; that is, model parameters trained on one data domain can be adapted to another. It is thus of great practical importance to explore DA methods. The assumption underpinning DA is that, although the domains differ, there is sufficient commonality to support such adaptation.
In this paper, we reformulate DA as a conditional image generation problem. The mapping function from one domain to another can be viewed as the modeling process of the generator, which achieves automatic domain shift alignment during data sampling. Adversarial adaptation has become a natural incarnation of the DA approach; it seeks to minimize an approximate domain discrepancy through an adversarial objective function. We therefore propose an adversarial domain adaptation framework with domain generators and domain discriminators, as in GAN. A generative adversarial network (GAN) pits two networks, a generator and a discriminator, against each other. The generator is trained to produce images that confuse the discriminator, which in turn tries to distinguish them from real examples. This adversarial strategy suits the DA problem well, so the confrontation principle is exploited to ensure that the discriminator cannot distinguish the source domain from the generated target domain. GAN-inspired adversarial discriminative domain adaptation (ADDA) with a convolutional neural network (CNN) has achieved surprisingly good performance.
It is worth noting that, in GANs, the realism of the generated images is important. The purpose of DA methods, however, is to reduce the domain discrepancy, and the realism of the generated images matters little. Our focus therefore lies in domain distribution alignment rather than pure image generation. From this viewpoint, the many GAN variants built on deep neural networks are unnecessarily heavy and complicated for solving domain adaptation. In this work, we propose a simple, slim but effective Coupled adversarial transfer generative adversarial network (CatGAN) for domain adaptation. CatGAN is formulated with a slim and symmetric multilayer perceptron (MLP) structure for generative adversarial adaptation.
Specifically, CatGAN comprises two symmetric, coupled sub-networks, each with a generator, a discriminator and a domain knowledge fidelity term, which together form a coupled learning framework. Owing to this symmetry, the source and target domains can be generated from each other through an adversarial mechanism supervised by the coupled discriminators, which guarantees network compatibility for arbitrary domain generation. The domain-specific knowledge of the source and target domains is retained by a domain knowledge fidelity term. To keep the generated domain close to the real domain, a domain loss is designed in the generators.
The pipeline of CatGAN can be described simply: two generators and two discriminators are integrated for adversarial domain adaptation and feature representation. The main contributions and novelty of this work are threefold:
To reduce the domain discrepancy, we propose a simple but effective coupled adversarial transfer net (CatGAN), a slim and symmetric adversarial domain adaptation network built from shallow multilayer perceptrons (MLPs). Through the proposed network, the source and target domains can be generated from each other through an adversarial mechanism supervised by the coupled discriminators.
CatGAN is a generative adversarial domain adaptation network comprising two similarly structured sub-networks, each integrating a generator and a discriminator. The coupled learning framework is two-way, so arbitrary domain generation is guaranteed without constraining the input to be source or target.
In domain generation, a domain knowledge fidelity loss and a domain-specific loss are designed for domain content self-preservation and domain content similarity. In this way, domain distortion during generation is avoided and the domain-adapted feature representation becomes more stable and discriminative.
2 Related Work
2.1 Shallow Domain Adaptation
A number of shallow learning methods have been proposed to tackle DA problems. Generally, these shallow domain adaptation methods fall into three categories.
Classifier based approaches. A generic way is to learn a common classifier on auxiliary domain data by leveraging a few labeled target data. Duan et al. proposed adaptive multiple kernel learning (AMKL) for web-consumer video event recognition. They also proposed domain transfer MKL (DTMKL), which jointly learns an SVM and a kernel function for classifier adaptation. Zhang et al. proposed a robust domain classifier adaptation method (EDA) with manifold regularization for visual recognition.
Feature augmentation/transformation based approaches. Hoffman et al. proposed Max-Margin Domain Transforms (MMDT), in which a category-specific transformation is optimized for domain transfer. Long et al. proposed a Transfer Sparse Coding (TSC) approach to construct robust sparse representations by using the empirical Maximum Mean Discrepancy (MMD) as the distance measure. Long et al. also proposed Transfer Joint Matching (TJM), which learns a non-linear transformation across domains by minimizing the MMD-based distribution discrepancy.
Feature reconstruction based approaches. Unlike the methods above, these approaches achieve domain adaptation by feature reconstruction between domains. Jhuo et al. proposed the RDALR method, in which the source data are reconstructed from the target data using a low-rank model. Similarly, Shao et al. proposed the LTSL method, which pre-learns a subspace using PCA or LDA and then models a low-rank representation across domains. Zhang et al. proposed a Latent Sparse Domain Transfer (LSDT) method that jointly learns a subspace projection and a sparse reconstruction across domains.
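The MMD distance used by TSC and TJM above compares two samples through their kernel mean embeddings. A minimal sketch of a biased empirical estimate of the squared MMD follows; the RBF kernel, the bandwidth `gamma` and the function names are illustrative assumptions and are not taken from those methods.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2_rbf(xs, xt, gamma=1.0):
    # Biased estimate: mean k(xs, xs) + mean k(xt, xt) - 2 * mean k(xs, xt).
    k_ss = rbf_kernel(xs, xs, gamma).mean()
    k_tt = rbf_kernel(xt, xt, gamma).mean()
    k_st = rbf_kernel(xs, xt, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st
```

The estimate is zero when both samples are identical and grows as the two distributions move apart, which is what makes it usable as a domain discrepancy measure.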
2.2 Deep Domain Adaptation
Deep learning, as a category of data-driven domain adaptation methods, has achieved great success [31, 26, 34]. However, for small-sized tasks, deep learning may not work well. Deep domain adaptation methods for small-scale tasks have therefore emerged.
Donahue et al. proposed a deep transfer strategy for small-scale object recognition by training a CNN (AlexNet) on ImageNet. Similarly, Razavian et al. proposed to train a deep network on ImageNet as a high-level domain feature extractor. Tzeng et al. proposed a CNN-based DDC method which achieved successful knowledge transfer between domains and tasks. Long et al. proposed a deep adaptation network (DAN) by imposing an MMD loss on the high-level features across domains. Additionally, Long et al. proposed a residual transfer network (RTN), which learns a residual classifier based on the softmax loss. Oquab et al. proposed a CNN architecture for middle-level feature transfer, which was trained on a large-scale annotated image set. Hu et al. proposed a non-CNN-based deep transfer metric learning (DTML) method to learn a set of hierarchical nonlinear transformations for cross-domain visual recognition.
Recently, GAN-inspired adversarial domain adaptation methods have begun to be studied. Tzeng et al. proposed the ADDA method for adversarial domain adaptation, in which a CNN is used for adversarial discriminative feature learning. A GAN-based model that adapts source-domain images to appear as if drawn from the target domain has also been proposed, with a focus on domain image generation. These two works show the potential of adversarial learning in domain adaptation. Very recently, Hoffman et al. proposed the CyCADA method, which shares the characteristic of cycle generation with ours. It adapts representations at both the pixel level and the feature level, enforcing cycle-consistency while leveraging a task loss.
2.3 Generative Adversarial Networks
The generative adversarial network (GAN) was first proposed by Goodfellow et al. to generate images and has been highly influential in deep learning. A GAN learns two sub-networks: a generator and a discriminator. The discriminator decides whether a sample is fake or real, while the generator produces samples that are as realistic as possible to fool the discriminator. Mirza et al. proposed a conditional generative adversarial net (CGAN) in which both networks G and D receive an additional information vector as input. Salimans et al. achieved state-of-the-art results in semi-supervised classification and improved the visual quality of GAN samples. Kim et al. proposed DiscoGAN for discovering cross-domain relations and transferring style from one domain to another while preserving key attributes such as orientation and face identity. Our CatGAN is also inspired by this style transfer method.
3 The Proposed CatGAN
In this paper, the source and target domains are denoted by the subscripts “S” and “T”. The training sets of the source and target domains are defined as and , respectively. A generator network is denoted : , that maps data from domain to its co-domain . The discriminator network is denoted as and the subscript denotes that it discriminates samples in target domain and co-domain . Note that, and are similarly defined.
3.2 Idea of Adversarial Domain Generation
Direct supervised learning on the target domain is not possible due to label scarcity; therefore, a target co-domain is produced from the source domain by an adversarial domain generator. Our key idea is to learn a “source → target” generative feature representation on which a domain-insensitive classifier can be learnt for recognition. Note that our aim is to minimize the feature divergence between domains rather than to generate a vivid target image. A simple and flexible network is therefore preferred over a very complicated structure.
Additionally, standard GAN and conditional GAN both have some limitations. In standard GAN, explicitly supervised data is seldom available, and the randomly generated samples can become problematic if the corresponding content information is lost, so the trained classifier may not work well. In conditional GAN, although a label constraint is imposed, the cross-domain relation is not guaranteed because the domain mapping is one-directional.
Since the conditional GAN architecture learns only one mapping from one domain to another (one-way), we design a two-way adversarial domain generation method with more freedom. The core of our CatGAN model consists of two symmetric GANs coupled together, which yields a pair of symmetric generative and discriminative functions.
The proposed CatGAN is a two-way symmetric generative adversarial network, which is shown in Fig.1. The two-way generation function is a bijective mapping. The flow of CatGAN in implementation can be described as follows.
First, images or features, rather than noise, are taken as input to the model. Way-1 of CatGAN comprises generator () and discriminator (). Way-2 comprises generator () and discriminator (). For way-1, the source data is fed into the generator, and co-target data is generated. The generated target data and the real target data are then fed into the discriminator network () for adversarial training. Way-2 operates in the same manner. In order to achieve the bijective mapping, we expect that the real source data can be recovered by feeding the generated into the generator for progressively learning supervised by . Similarly, is also fine-tuned by feeding the supervised by to recover the real target training data.
In the proposed model, the generator is a two-layer perceptron and the discriminator is a three-layer perceptron. The sigmoid function is used as the activation function in the hidden layers. The network structures of the generator and discriminator are shown in Fig.2.
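The generator and discriminator described above can be sketched as plain forward passes. This is a minimal illustration under stated assumptions: "two-layer" and "three-layer" are read as the number of weight layers, and the layer widths, the random initialization, the linear generator output and the sigmoid discriminator output are hypothetical choices, since only the depth and the hidden sigmoid activation are specified in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_mlp(sizes, rng):
    # One (weights, bias) pair per consecutive pair of layer sizes.
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def generator_forward(params, x):
    # Two weight layers: sigmoid hidden layer, linear output (assumption).
    (w1, b1), (w2, b2) = params
    hidden = sigmoid(x @ w1 + b1)
    return hidden @ w2 + b2

def discriminator_forward(params, x):
    # Three weight layers with sigmoid activations; the final sigmoid
    # gives the probability that x is a real sample (assumption).
    h = x
    for w, b in params:
        h = sigmoid(h @ w + b)
    return h[:, 0]

rng = np.random.default_rng(0)
g_params = init_mlp([8, 16, 8], rng)      # hypothetical sizes
d_params = init_mlp([8, 16, 16, 1], rng)  # hypothetical sizes
x_src = rng.normal(size=(5, 8))
x_gen = generator_forward(g_params, x_src)
p_real = discriminator_forward(d_params, x_gen)
```

Because the generator maps features to features of the same dimension, the same pair of functions can serve either way of the coupled model by swapping the roles of the two domains.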
The proposed CatGAN model has a symmetric structure comprising two generators and two discriminators, organized as two ways across domains (S → T and T → S). We first describe the model of way-1 (S → T); way-2 (T → S) has the same form.
Way-1: S → T:
A target domain discriminator () classifies whether a data point is drawn from the real target domain or generated. The discriminator loss is formulated as
where . is the generator, which aims to produce data that is realistic with respect to the target domain. The supervised generator loss is therefore formulated as
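One common way to instantiate the two adversarial losses of way-1 is the standard cross-entropy GAN objective with a non-saturating generator term. The sketch below assumes that form; `d_real` and `d_fake` are hypothetical names for the discriminator outputs on real target data and on generated target data, and the exact losses used by CatGAN may differ.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    # Real target samples should score near 1, generated ones near 0.
    return float(-np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps)))

def generator_gan_loss(d_fake, eps=1e-12):
    # Non-saturating form (assumption): the generator is rewarded when
    # the discriminator scores its outputs as real.
    return float(-np.mean(np.log(d_fake + eps)))
```

When the discriminator cannot tell the two apart and outputs 0.5 everywhere, the discriminator loss equals 2 ln 2 and the generator loss equals ln 2.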
Because the focus of CatGAN is to reduce the distribution difference across domains, we propose to minimize the domain loss , which helps the learning of the generator as shown in Fig.2. Specifically, to reduce the distribution mismatch between the generated target data and the original target data , the domain loss is formulated as
where is the center matrix of the target data. Note that, during network training, the sigmoid function is imposed on the domain loss so that the probability output is normalized to . The target domain loss can therefore be written as
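A possible reading of the target domain loss is the average squared distance between the sigmoid-normalized generated samples and the sigmoid-normalized center of the real target data. The sketch below follows that reading; taking the center as the feature-wise mean, and applying the sigmoid to both terms before the distance, are assumptions rather than the confirmed formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def domain_loss(x_gen, x_target):
    # Center of the real target data, taken here as the feature-wise mean
    # (assumption about what "center matrix" denotes).
    center = x_target.mean(axis=0, keepdims=True)
    diff = sigmoid(x_gen) - sigmoid(center)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

x_target = np.array([[0.0, 0.0], [2.0, 2.0]])
at_center = domain_loss(np.array([[1.0, 1.0]]), x_target)
far_away = domain_loss(np.array([[5.0, 5.0]]), x_target)
```

Generated samples that sit on the target center incur no loss, while samples far from it are penalized, which pulls the generated domain toward the real target distribution.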
Further, to preserve the content of the source data, we introduce a content fidelity term in our model. Ideally, the equality should be satisfied, that is, the generation is reversible. However, this hard constraint is difficult to guarantee, and a relaxed soft constraint is more desirable. We therefore minimize the distance with a reconstruction loss function , i.e., the source content loss, formulated as follows
where , and is a generator of way-2 (). Finally, the objective function of the way-1 generator is composed of 3 parts:
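The content fidelity term and the three-part generator objective can be illustrated as follows. This sketch assumes a squared-error reconstruction loss and an unweighted sum of the three parts; the toy linear maps `g_st` and `g_ts` are hypothetical stand-ins for the two generators, chosen so that the generation is exactly reversible.

```python
import numpy as np

def content_loss(x, x_rec):
    # Mean squared reconstruction error between the original samples
    # and their round-trip reconstruction (assumed form).
    return float(np.mean(np.sum((x_rec - x) ** 2, axis=1)))

def generator_objective(gan_loss, domain_loss, content_loss_value):
    # Unweighted sum of the three parts (assumption: no trade-off weights).
    return gan_loss + domain_loss + content_loss_value

# Toy reversible generators: a linear map and its inverse.
a = np.array([[2.0, 0.0], [0.0, 0.5]])

def g_st(x):
    return x @ a

def g_ts(x):
    return x @ np.linalg.inv(a)

x_s = np.array([[1.0, 2.0], [3.0, 4.0]])
round_trip_loss = content_loss(x_s, g_ts(g_st(x_s)))
```

With perfectly inverse generators the content loss vanishes, which corresponds to the hard reversibility constraint; in training, the soft constraint only drives this quantity toward zero.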
Way-2: T → S:
Way-2 is formulated in the same way as way-1, including the source discriminator loss , the source data generator loss , the source domain loss , and the target content loss . Specifically, the loss functions are formulated as follows
where , and is the center matrix of source data. Finally, the objective function of the way-2 generator is also composed of 3 parts, formulated as follows
Complete CatGAN Model:
The proposed CatGAN model is a coupled net in which each way learns the bijective mapping from one domain to the other. The two ways in CatGAN are jointly trained in an alternating manner. The two generators in Fig.1 share the same parameters and so do the two generators . The generated data and are fed into the discriminators and , respectively.
Finally, the complete model of CatGAN including the generator and the discriminator can be formulated as the following two subproblems.
In CatGAN, one key difference from previous GAN models is the two-way architecture, in each way of which a domain loss and a content loss are designed for domain alignment and content fidelity. Note that CatGAN has a structure similar to DiscoGAN but is essentially different. First, the purpose of CatGAN is domain adaptation through domain generation, domain reconstruction and content preservation. Second, CatGAN is built from shallow multilayer perceptrons, and the domain adaptation mode (supervised, semi-supervised, unsupervised) can be freely changed. Third, to minimize the domain discrepancy while keeping content fidelity, a domain loss and a content loss are designed.
Because the proposed CatGAN model is flexible and simple, it can be regarded as a shallow domain adaptation approach. Both shallow features (e.g., pixel-level or low-level) and deep features can therefore be fed into the model.
5.1 Comparison with Shallow Domain Adaptation
In this section, two benchmarks, the COIL-20 object dataset and the MSRC-VOC 2007 datasets, are used for cross-domain object/image recognition. Results on the COIL-20 dataset (Columbia Object Image Library): The COIL-20 dataset contains 20 objects with 1440 gray-scale images (72 multi-pose images per object). Following the established experimental protocol, the dataset is divided into two subsets, C1 and C2, each covering two quadrants. Specifically, C1 contains the directions of quadrants 1 and 3, and C2 contains the directions of quadrants 2 and 4. The two subsets follow different distributions but are semantically related, and therefore pose a DA problem. Taking C1 and C2 alternately as the source and target domains, the cross-domain recognition rates of different methods are shown in Table 1, from which we can see that the proposed CatGAN shows significantly superior performance () over other state-of-the-art shallow DA methods, with an 8% improvement.
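The C1/C2 split can be sketched with index arithmetic. The code assumes that the 72 poses of each object are taken at 5-degree steps starting from 0 degrees and that quadrant k covers the angles from 90(k-1) up to, but not including, 90k degrees; the function name and the (object, pose) indexing are illustrative.

```python
def coil20_split(n_objects=20, n_poses=72, step_deg=5):
    # Returns two lists of (object index, pose index) pairs.
    c1, c2 = [], []
    for obj in range(n_objects):
        for pose in range(n_poses):
            angle = pose * step_deg
            quadrant = angle // 90 + 1
            if quadrant in (1, 3):
                c1.append((obj, pose))  # quadrants 1 and 3
            else:
                c2.append((obj, pose))  # quadrants 2 and 4
    return c1, c2

c1, c2 = coil20_split()
```

Under these assumptions each subset holds 36 poses per object, so the 1440 images are divided evenly between the two domains.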
Results on MSRC
5.2 Comparison with Deep Domain Adaptation
In this section, we conduct experiments on the 4DA benchmark dataset and on handwritten digit datasets for comparison with state-of-the-art deep domain adaptation approaches. Note that three handwritten digit datasets, MNIST, USPS and SVHN, shown in Fig.3, are used.
Four domains such as Amazon (A), DSLR (D), Webcam (W)
|Tasks||Source only||Gradient reversal||Domain confusion||CoGAN||DANN||ADDA|
Handwritten Digits Experiment.
Three handwritten digits datasets including MNIST (M)
For the handwritten digit recognition experiments, we extract deep features of the three datasets using the LeNet architecture provided in the Caffe source code. For adaptation between MNIST and USPS, we follow the established training protocol, randomly sampling 2000 images from MNIST and 1800 images from USPS. For adaptation between SVHN and MNIST, we use the full training sets for comparison. We perform the same domain adaptation tasks as in the prior experiments. The difference between ADDA and our method is that ADDA is an unsupervised method while ours is semi-supervised. An essential problem is ignored there: the generated samples may change randomly (e.g. instead of ). In our CatGAN, this problem can be handled by using the class-wise model. In our setting, 10 samples per class from the target domain are randomly selected for training. Finally, 5 random splits are used, and the average classification accuracies are reported in Table 4. From the results, we observe that our CatGAN model outperforms other state-of-the-art methods, which supports the superiority of the proposed method.
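The semi-supervised sampling described above, 10 labeled target samples per class repeated over 5 random splits, can be sketched as follows. The function names and the fixed seed are assumptions for illustration; the original sampling code is not specified.

```python
import numpy as np

def sample_per_class(labels, n_per_class, rng):
    # Draw n_per_class distinct sample indices for every class label.
    labels = np.asarray(labels)
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        picked.append(rng.choice(idx, size=n_per_class, replace=False))
    return np.concatenate(picked)

def random_splits(labels, n_per_class=10, n_splits=5, seed=0):
    # One index array of labeled target samples per random split.
    rng = np.random.default_rng(seed)
    return [sample_per_class(labels, n_per_class, rng) for _ in range(n_splits)]

# Hypothetical target labels: 10 digit classes with 50 samples each.
target_labels = np.repeat(np.arange(10), 50)
splits = random_splits(target_labels)
```

Each split would be used to train and evaluate once, and the reported accuracy is the average over the splits.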
6.1 Model Variation
To keep the generated samples matched with their labels, CatGAN can be trained class by class, which results in an unsupervised structure. The training samples in the source and target domains are preprocessed independently, so the number of networks to be trained equals the number of classes. The conditional CatGAN, in contrast, can be used as a semi-supervised method: the target-domain training samples and their labels are provided during training as constraint information. In this way, only one network needs to be trained, but at the cost of computational complexity. The structures of the class-wise CatGAN and the conditional CatGAN are shown in Fig.4 (a) and (b), respectively. Fig.4 (a) shows the class-wise CatGAN, which has multiple class-specific networks to be trained, and Fig.4 (b) shows the conditional CatGAN, which has only one. For comparison, their recognition performance is given in Table 5, from which we observe that both variations of CatGAN achieve similar performance.
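For the class-wise variant, the data can be partitioned by label so that one network is trained per class. A minimal sketch of that partitioning is given below; `classwise_pairs` is a hypothetical helper, and it assumes labels are available for the samples of both domains that are used in training.

```python
import numpy as np

def classwise_pairs(xs, ys, xt, yt):
    # Map each class shared by both domains to its
    # (source samples, target samples) pair.
    xs, ys = np.asarray(xs), np.asarray(ys)
    xt, yt = np.asarray(xt), np.asarray(yt)
    pairs = {}
    for c in np.intersect1d(ys, yt):
        pairs[int(c)] = (xs[ys == c], xt[yt == c])
    return pairs

xs = np.arange(12.0).reshape(6, 2)
ys = np.array([0, 0, 1, 1, 2, 2])
xt = np.arange(6.0).reshape(3, 2)
yt = np.array([0, 1, 2])
pairs = classwise_pairs(xs, ys, xt, yt)
```

One coupled model would then be trained on each pair, so the number of models equals the number of shared classes, whereas the conditional variant trains a single model on all pairs with the label supplied as an extra input.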
6.2 Model Visualization
In this section, we visualize the generated digit images and the data distribution in the plane to give better insight into the CatGAN model. We show the visualization of handwritten digits from M to U. The first two in Fig.6 illustrate the generated image and . The last two in Fig.6 show the generated image and . We observe that the generated data carries clear category information.
The distributions of the source data, generated target data and real target data are visualized in Fig.5 using t-SNE embeddings. The generated data shows a better clustering characteristic, and feature discrimination is improved. As a result, cross-domain classification performance is naturally improved.
|Tasks||Class-wise CatGAN||Conditional CatGAN|
In this paper, we propose a new perspective on domain adaptation based on domain generation, and introduce a coupled adversarial transfer GAN (CatGAN) comprising two generators, two discriminators and two content fidelity terms. Each generator combines a GAN loss and a domain loss so that the simple network works well. The symmetric model achieves a bijective mapping that keeps the intrinsic content information from being distorted, and data can be mapped from either domain to the other. The inner relations between the two domains are further retained because the label information is kept constant. In minimizing the content discrepancy between the re-generated samples and the objective samples, the content fidelity term preserves the domain-specific content during generation. Since the network is in essence a simple two-layer perceptron, CatGAN is a shallow transfer learning method that can be compared with deep transfer learning methods. Extensive experiments on benchmark DA datasets demonstrate the superiority of the proposed method over several state-of-the-art DA methods.
- K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. 2016.
- S. F. Chang, D. T. Lee, D. Liu, and I. Jhuo. Robust visual domain adaptation with low-rank reconstruction. In CVPR, pages 2168–2175, 2013.
- J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. 50(1):I–647, 2013.
- J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. In ICML, pages 647–655, 2014.
- L. Duan, I. W. Tsang, and D. Xu. Domain transfer multiple kernel learning. IEEE Trans. PAMI, 34(3):465–479, 2012.
- L. Duan, D. Xu, I. W. Tsang, and J. Luo. Visual event recognition in videos by learning from web data. In CVPR, pages 1959–1966, 2010.
- Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-Adversarial Training of Neural Networks. 2017.
- B. Gong, Y. Shi, F. Sha, and K. Grauman. Geodesic flow kernel for unsupervised domain adaptation. In CVPR, pages 2066–2073, 2012.
- I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In International Conference on Neural Information Processing Systems, pages 2672–2680, 2014.
- J. Hoffman, E. Rodner, J. Donahue, B. Kulis, and K. Saenko. Asymmetric and category invariant feature transformations for domain adaptation. IJCV, 109(1):28–41, 2014.
- J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. A. Efros, and T. Darrell. Cycada: Cycle-consistent adversarial domain adaptation. arXiv preprint arXiv:1711.03213, 2017.
- J. Hoffman, D. Wang, F. Yu, and T. Darrell. Fcns in the wild: Pixel-level adversarial and constraint-based adaptation. 2016.
- J. Hu, J. Lu, and Y.-P. Tan. Deep transfer metric learning. In CVPR, pages 325–333, 2015.
- A. Iyer, J. S. Nath, and S. Sarawagi. Maximum mean discrepancy for class ratio estimation: Convergence bounds and kernel selection. In ICML, pages 530–538, 2014.
- I.-H. Jhuo, D. Liu, D. Lee, and S.-F. Chang. Robust visual domain adaptation with low-rank reconstruction. In CVPR, pages 2168–2175, 2012.
- T. Kanamori, S. Hido, and M. Sugiyama. A least-squares approach to direct importance estimation. JMLR, 10(Jul):1391–1445, 2009.
- T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim. Learning to discover cross-domain relations with generative adversarial networks. 2017.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. NIPS, 25(2):1097–1105, 2012.
- B. Kulis, K. Saenko, and T. Darrell. What you saw is not what you get: Domain adaptation using asymmetric kernel transforms. In CVPR, pages 1785–1792, 2011.
- M. Long, Y. Cao, J. Wang, and M. Jordan. Learning transferable features with deep adaptation networks. In ICML, pages 97–105, 2015.
- M. Long, G. Ding, J. Wang, J. Sun, Y. Guo, and P. S. Yu. Transfer sparse coding for robust image representation. In ICCV, pages 407–414, 2013.
- M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu. Transfer joint matching for unsupervised domain adaptation. In CVPR, pages 1410–1417, 2014.
- M. Long, H. Zhu, J. Wang, and M. I. Jordan. Unsupervised domain adaptation with residual transfer networks. In NIPS, pages 136–144, 2016.
- H. Lu, L. Zhang, Z. Cao, W. Wei, K. Xian, C. Shen, and A. V. D. Hengel. When unsupervised domain adaptation meets tensor representations. 2017.
- M. Mirza and S. Osindero. Conditional generative adversarial nets. Computer Science, pages 2672–2680, 2014.
- M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. In CVPR, pages 1717–1724, 2014.
- C. Rate and C. Retrieval. Columbia object image library (coil-20). Computer, 2011.
- T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training gans. 2016.
- M. Shao, D. Kit, and Y. Fu. Generalized transfer subspace learning through low-rank constraint. IJCV, 109(1-2):74–93, 2014.
- A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. Cnn features off-the-shelf: an astounding baseline for recognition. In CVPR, pages 806–813, 2014.
- E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko. Simultaneous deep transfer across domains and tasks. In ICCV, pages 4068–4076, 2015.
- E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. 2017.
- J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Trans. PAMI, 31(2):210–227, 2009.
- M. Xie, N. Jean, M. Burke, D. Lobell, and S. Ermon. Transfer learning from deep features for remote sensing and poverty mapping. arXiv, 2015.
- Y. Xu, X. Fang, J. Wu, X. Li, and D. Zhang. Discriminative transfer subspace learning via low-rank and sparse representation. IEEE Trans Image Processing, 25(2):850–863, 2015.
- L. Zhang and D. Zhang. Robust visual knowledge transfer via extreme learning machine-based domain adaptation. IEEE Trans. Image Processing, 25(3):4959–4973, 2016.
- L. Zhang, W. Zuo, and D. Zhang. Lsdt: Latent sparse domain transfer learning for visual adaptation. IEEE Trans Image Processing, 25(3):1177–1191, 2016.