Light-weight calibrator: a separable component for unsupervised domain adaptation

Abstract

Existing domain adaptation methods aim to learn features that generalize across domains. These methods commonly require updating the source classifier to adapt to the target domain and do not properly handle the trade-off between source domain and target domain performance. In this work, instead of training a classifier to adapt to the target domain, we use a separable component called a data calibrator to help the fixed source classifier recover its discriminative power in the target domain while preserving its source domain performance. When the difference between the two domains is small, the source classifier’s representation is sufficient to perform well in the target domain, and our method outperforms GAN-based methods on digits. Otherwise, the proposed method can leverage synthetic images generated by GANs to boost performance, achieving state-of-the-art results on digits datasets and driving scene semantic segmentation. Our method empirically reveals that certain intriguing hints, which can be mitigated by benign noise similar to an adversarial attack on domain discriminators, are one source of performance degradation under domain shift.

1 Introduction

Figure 1: Concept Illustration. (a) The source classifier in the labeled source domain. (b) The source classifier in the unlabeled target domain. (c) Existing methods that are developed to learn domain-invariant features. (d) In the real world, the testing set consists of both source domain images and target domain images. (e) The proposed method keeps the representation of the source classifier and calibrates target images to fit the source classifier’s representation.
Figure 2: Performance trade-off between the source and target domains. Some existing methods improve target performance at the expense of source domain performance. In contrast, the proposed method keeps good source domain performance and outperforms these methods in target domain performance.

Deep neural networks have achieved great performance in solving diverse machine learning problems. However, the so-called domain shift problem remains challenging when neural networks must generalize across domains [28, 33, 24]. Extensive efforts have been made on unsupervised domain adaptation [28, 6, 36, 31, 13, 35, 12, 20, 29]. Early domain adaptation methods use different distance metrics or statistics to align a network’s feature distribution on the source domain with its feature distribution on the target domain. Adversarial domain adaptation methods [6, 35] leverage a two-player adversarial game to achieve domain adaptation: a domain classifier is encouraged to learn the difference between the feature distributions of the two domains, while the classification model is encouraged to maximize the classification loss of the domain classifier by learning a domain-invariant representation that is indistinguishable to the domain classifier. In addition to this feature-level adversarial game, another line of work uses Generative Adversarial Networks (GANs) [9] to generate source domain images with target domain styles, playing a pixel-level adversarial game.

However, some issues have rarely been discussed. Consider a neural network that is deployed on a device that needs to move between different domains. It moves from a domain that is close to its training source domain to another domain that has no labeled data. Traditional unsupervised domain adaptation suffices to handle this simple case. However, the device can freely move to other domains, including the source domain. This simple but more realistic scenario raises issues for existing methods. The issues are twofold: (1) Existing methods commonly require finetuning or training a new classifier during domain adaptation, which is inflexible if models are compressed and deployed [11, 46]. (2) Previous methods do not show the trade-off between source domain performance and target domain performance, and some of them have a poor trade-off, as indicated in Figure 2. Therefore, when environments are constantly changing, existing methods are likely to suffer performance degradation and cannot adapt to new environments flexibly.

Some prior works address changing domains [37, 1]. Bobu et al. [1] propose to adapt to continuously changing target domains, and Wulfmeier et al. [37] propose to adapt incrementally to changing domains. However, their methods require finetuning the model, and after the model is deployed, they cannot work properly for unanticipated new domains. We therefore propose two properties that a domain adaptation method should have for changing target domains with deployed models.

(1) Good trade-off between source and target domain. Given the complexity of the real world, it is unrealistic to assume that the one chosen target domain is the ultimate application domain. Existing methods assume that the source domain consists only of synthetic images and do not report source domain performance after domain adaptation, mostly because it is assumed that the source domain will not be encountered again. A counterexample is the case where both the source domain and the target domain consist of real-world images and the source domain will also be encountered. In this case, sacrificing source domain performance is not acceptable.

Figure 3: SVHN to MNIST task. Source classifier LeNet is trained in SVHN. (a) The source classifier’s prediction on SVHN. (b) The source classifier’s prediction on MNIST. (c) The source classifier’s prediction on SVHN, with data calibrator. (d) The source classifier’s prediction on MNIST, with data calibrator.

(2) Flexibility to adapt to arbitrary new domains after being deployed. Deep neural networks are widely deployed in specialized devices [10]. Usually, they are compressed via model compression methods [11, 46, 44, 41, 25] before being deployed, and they are not expected to be updated afterwards. As far as we know, all existing domain adaptation methods require finetuning the model, which conflicts with model compression.

It is natural to expect that collecting more data will make a neural network learn a universal representation, and tremendous investment has been made in collecting bigger datasets [5, 16]. However, datasets are found to contain database bias [33, 24]. Training on large datasets does not guarantee the performance of models under changing environments. Therefore, adapting to unanticipated new environments will be necessary, and a lack of flexibility will be an issue.

In this work, we take the first step to mitigate both limitations and formulate unsupervised domain adaptation in a novel way. Figure 1 illustrates the difference between previous methods and our method at the conceptual level. Previous methods commonly update the source classifier’s weights when domain adaptation is needed, while ours modifies the inputs to achieve domain adaptation.

We refer to existing methods that attempt to learn cross-domain models as the monolithic domain adaptation approach. In contrast, we propose a separable component called a data calibrator to achieve domain adaptation, which can be seen as a distributed domain adaptation approach. In our framework, the source classifier is responsible for learning the representation under supervised training, and the data calibrator is responsible for achieving domain adaptation via unsupervised training.

Our core observation is that the representation learnt from the source domain is not as bad as commonly assumed, as shown in Figure 3. The performance degradation brought by domain shift can be mitigated by slightly modifying the target domain images with an additive perturbation, which we refer to as calibration. After calibration, target domain images fit the source classifier’s learnt representation significantly better. We show that we can train a light-weight data calibrator whose number of parameters is only 0.24% to 5.8% of that of the deployed model, and that we can use it to adapt the deployed model to arbitrary target domains.

We also want to emphasize that our study focuses on the setting in which the source domain and the target domain share a common label space; otherwise the source classifier will not work properly in the target domain.

To summarize our contributions:

  • We propose a data calibrator that calibrates target domain images to better fit the source classifier’s representation while maintaining source domain performance. We improve the previous state-of-the-art average accuracy from 95.1% to 97.6% in digits experiments and the frequency weighted IoU from 72.4% to 75.1% in GTA5 to CityScapes adaptation.

  • The proposed data calibrator is light-weight: in GTA5 to CityScapes adaptation it has less than 1% of the parameters of the deployed model. It is also a separable domain adaptation approach, since it does not need to update the source classifier’s weights, which makes it convenient for deployment.

  • We give new insights into what causes the performance degradation under domain shift and show how to counter it.

2 Related Work

Unsupervised Domain Adaptation Visual domain adaptation can be traced back to [28]. Early domain adaptation methods focus on aligning deep representations between two domains using Maximum Mean Discrepancy (MMD) [23, 36, 19], whereas deep Correlation Alignment (CORAL) [31] uses statistics such as mean and covariance to achieve feature alignment.

Another line of work leverages the idea of domain classifiers. Torralba et al. [33] used “name the dataset” to demonstrate that databases are commonly biased and that it is even possible to train a domain classifier to correctly assign images to the databases they come from. Intuitively, if a domain classifier can learn the difference between the source domain and the target domain from pixels, then it is also possible for a domain classifier to learn the difference between the deep representations of source domain images and target domain images. Several works explore the idea of training a classifier that confuses the domain classifier by maximizing the domain confusion loss [34, 6, 35, 7, 30, 36]. In addition to confusing a domain classifier at the feature level, pixel-level adaptation has also been explored. Hoffman et al. [13] achieve pixel-level adaptation for the segmentation task, but they use the hidden layer outputs of neural networks for this purpose. Our method incorporates both a pixel-level domain classifier and a feature-level domain classifier. The pixel-level classifier we use takes the pixels directly as inputs, which is closer to the spirit of “name the dataset” [33].

Generative Adversarial Networks Another line of work leverages the power of Generative Adversarial Networks (GANs) [9] to generate source images in the style of target images. The first of this kind is CoGAN [18], which jointly learns the source domain representation and the target domain representation by forcing weight sharing between two GANs. Bousmalis et al. [2] used GANs to produce images with styles similar to the target domain and trained the target task classifier on both kinds of images. Hoffman et al. [12] propose a semantic consistency loss and a cycle consistency loss and achieve significantly better domain adaptation performance. In comparison, our method can outperform those methods without requiring the substantial resources needed to train GANs.

Adversarial Attack Neural networks are known to suffer from adversarial attacks [32, 40, 39, 45]. The simplest form of adversarial attack is FGSM [8], which adds a calculated perturbation to the original image, making neural networks misclassify it with high confidence. Interestingly, the proposed data calibrator also uses an additive perturbation on images to achieve domain adaptation. The connection between adversarial attack and domain adaptation is visible in the objective function of our framework. Essentially, our data calibrator learns to generate adversarial examples that maximize the classification loss of the domain classifiers. Recently, Ilyas et al. [14] demonstrated that adversarial attacks might leverage “non-robust features” to control classifiers’ predictions. We believe that “non-robust features” play an important role in the performance degradation brought by domain shift. We provide more analysis of the connection between our method and adversarial attack in Section 5.

3 A Separable Calibrator For Unsupervised Domain Adaptation

Figure 4: Training phase, testing phase and data calibrator architecture. In the training phase, the pixel level discriminator and the feature space discriminator try to classify images into 4 groups, while the data calibrator tries to fool both discriminators into treating calibrated images as source images. In the testing phase, the deployed model takes calibrated images as inputs. The architecture of the data calibrator consists of downsampling layers, upsampling layers and skip connections.

3.1 The overview of the method

In unsupervised domain adaptation, we have access to source domain images $X_s$ and labels $Y_s$ drawn from the source domain distribution $p_s(x, y)$, and target domain images $X_t$ drawn from a target domain distribution $p_t(x)$, for which there are no labels. Let $M$ be the classifier learned on source domain images. The goal of our work is to design a data calibrator $G$ such that $M \circ G$ achieves high accuracy on both source and target domain data. As the classifier $M$ is trained only on the source domain and has no information about the target, the data calibrator $G$ has to satisfy:

(1)

where $x_t$ and $x_s$ are from the target and source domain respectively.

Let $M = C \circ F$, where $F$ is the feature extractor and $C$ is the final classifier. A relaxed condition for achieving (1) is to impose a Lipschitz condition, i.e.

for some constant, which is a stability condition. Therefore, the following two constraints are imposed on the data calibrator:

(2)

It is noted that $G(x)$ denotes the input of $M$ and $F(G(x))$ denotes the feature map, which implies alignment at both the pixel and feature level for source and target domain data. This motivates the following loss function:

(3)

where $H$ denotes the cross entropy. The loss function in (3) encourages the data calibrator to perform domain adaptation while keeping the performance in the source domain. In this work, the data calibrator is set as $G(x) = x + \delta(x)$, i.e. only the perturbation $\delta$ is learned by the calibrator. However, as target labels are unavailable, minimizing (3) is difficult and another method is needed for training the calibrator $G$.
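The additive form of the calibrator, in which only a perturbation is learned and the classifier stays fixed, can be illustrated with a minimal Python sketch. It uses plain lists instead of tensors so that it runs without any framework. The names `make_calibrator`, `deploy`, `frozen_classifier` and `toy_perturbation` are hypothetical, and the constant perturbation merely stands in for a learned network.

```python
from typing import Callable, List

Image = List[float]  # a flattened image with float pixel values


def make_calibrator(perturb: Callable[[Image], Image]) -> Callable[[Image], Image]:
    """Build G with G(x) = x + perturb(x); only `perturb` would be learned."""
    def calibrate(x: Image) -> Image:
        delta = perturb(x)
        if len(delta) != len(x):
            raise ValueError("perturbation must have the same shape as the image")
        return [xi + di for xi, di in zip(x, delta)]
    return calibrate


def deploy(classifier: Callable[[Image], int],
           calibrator: Callable[[Image], Image]) -> Callable[[Image], int]:
    """Compose a frozen classifier with a calibrator: x -> classifier(G(x))."""
    return lambda x: classifier(calibrator(x))


# Hypothetical stand-ins: a threshold "classifier" and a constant shift.
frozen_classifier = lambda x: int(sum(x) / len(x) > 0.5)
toy_perturbation = lambda x: [0.25 for _ in x]

calibrated_model = deploy(frozen_classifier, make_calibrator(toy_perturbation))
```

The classifier object is never modified; adapting to another domain would only mean swapping the perturbation function passed to `make_calibrator`.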

3.2 Adversarial Domain Adaptation with Proposed Calibrator

In this work, we extend traditional adversarial domain adaptation methods [6, 7, 35] and train the proposed calibrator via adversarial learning instead of minimizing (3).

Traditional adversarial domain adaptation methods play an adversarial game between the target classifier and a feature discriminator. Because they update the weight parameters of the classifier to maximize the confusion loss of the domain discriminator, the resulting adapted models lack the flexibility to adjust to new domains after being deployed and risk sacrificing source domain performance.

In contrast, the basic idea of our extended adversarial domain adaptation method is to introduce a pixel level domain discriminator and a feature level domain discriminator, and to let a data calibrator modify images such that the pixel level discriminator can no longer distinguish calibrated images, whether from the source or the target domain, from source images. Meanwhile, the corresponding features of calibrated images are also confusing, such that the feature level discriminator can no longer distinguish them from the features of source images. After the calibrator is trained, inputs are fed to the calibrator before being fed to the model, as shown in the testing phase in Figure 4.

As shown in Figure 4, training the proposed method requires a trained source classifier. Let the source classifier be trained with the following loss function:

(4)

Based on the learned classifier, the pixel level domain discriminator and the feature level domain discriminator are used for training the calibrator such that the pixel and feature level alignment conditions (2) are satisfied. Furthermore, in order to have finer discrimination power among images and features from the source domain and the target domain, we divide the inputs of the domain discriminators into 4 groups, inspired by few-shot domain adaptation [21].

These four groups ($\mathcal{G}_i$, $i=1,2,3,4$) are defined as follows: $\mathcal{G}_1$ represents source domain images and $\mathcal{G}_2$ represents target domain images. Learning to distinguish images and features from $\mathcal{G}_1$ and $\mathcal{G}_2$ therefore encourages the domain discriminators to learn the distributions of the source domain and the target domain. Additionally, calibrated source images are defined to belong to $\mathcal{G}_3$ and calibrated target images are defined to belong to $\mathcal{G}_4$, so as to provide a learning signal for the adversarial game. Each group has its own group label.

Feature Level Discriminator. The feature level discriminator aims to discriminate the feature level distributions of the four groups. Its objective is to minimize the categorical cross entropy loss as follows:

(5)

In our work, the feature level discriminator is a simple neural network with only two fully connected layers. During training, it learns to discriminate the feature distributions of the four groups.
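As a rough illustration of a four-way categorical cross entropy objective for a domain discriminator, the following Python sketch computes the average loss over a batch of (logits, group label) pairs. The group indices and the function names are assumptions made for this example; the discriminator network itself is omitted and only its output logits are used.

```python
import math
from typing import List, Sequence, Tuple

# Assumed group indices: source, target, calibrated source, calibrated target.
SOURCE, TARGET, CAL_SOURCE, CAL_TARGET = 0, 1, 2, 3


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-probability assigned to the true group."""
    return -math.log(softmax(logits)[label])


def discriminator_loss(batch: List[Tuple[Sequence[float], int]]) -> float:
    """Mean four-way cross entropy over (logits, group_label) pairs."""
    return sum(cross_entropy(logits, group) for logits, group in batch) / len(batch)
```

A discriminator that outputs uniform logits incurs a loss of log 4 per sample, which is the chance level for four groups.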

Pixel Level Discriminator. The limitation of using only a feature level discriminator is that it cannot fully capture pixel level information after images are transformed by the pooling layers and strided convolutional layers of the model. Thus, following the original idea of [33], a pixel level discriminator is added to learn the pixel level distributions of the four groups with the following objective function:

(6)

The pixel level discriminator shares the same architecture as the feature level discriminator, i.e. a two-layer fully connected network. The biggest challenge for the pixel level discriminator is its tendency to overfit the training set. In our experiments, the validation accuracy starts to drop when the training loss of the pixel discriminator gets very low. Indeed, if the calibrator is optimized against a pixel level discriminator that overfits, it loses its generalization power. Therefore, we apply the following tricks to the inputs of the pixel level discriminator to prevent it from overfitting: (1) An image patch is randomly taken from the image. (2) The pixels of the patch are randomly shuffled along the spatial axes. Applying these two tricks mitigates the overfitting.
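The two input tricks can be sketched in a few lines of Python. This is a minimal illustration on a single-channel image stored as nested lists; the function names and the patch size are hypothetical, and for multi-channel images one would presumably shuffle pixel positions while keeping each pixel's channels together.

```python
import random
from typing import List

Grid = List[List[float]]  # single-channel image as rows of pixel values


def random_patch(image: Grid, size: int, rng: random.Random) -> Grid:
    """Trick (1): crop a size x size patch at a random location."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("patch is larger than the image")
    top = rng.randint(0, height - size)    # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def shuffle_pixels(patch: Grid, rng: random.Random) -> Grid:
    """Trick (2): randomly permute pixel positions within the patch."""
    width = len(patch[0])
    flat = [pixel for row in patch for pixel in row]
    rng.shuffle(flat)
    return [flat[i:i + width] for i in range(0, len(flat), width)]


def discriminator_input(image: Grid, size: int, rng: random.Random) -> Grid:
    """Apply both tricks before feeding the pixel level discriminator."""
    return shuffle_pixels(random_patch(image, size, rng), rng)
```

Shuffling removes the spatial layout inside the patch, so the discriminator sees pixel statistics rather than memorizable image structure, which is one plausible reason the tricks reduce overfitting.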

Data Calibrator. The data calibrator’s goal is to fool both the pixel level discriminator and the feature level discriminator by minimizing the following loss function:

(7)

from which the learned calibrator is expected to acquire knowledge of the source and target domains and to satisfy (2).
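One plausible reading of this objective is that the calibrator is rewarded when both discriminators assign calibrated images to the source group. The Python sketch below implements that reading only; it is not taken from the paper's equation, and the group index `SOURCE_GROUP` and the function names are assumptions.

```python
import math
from typing import List, Sequence

SOURCE_GROUP = 0  # assumed index of the "source image" group


def log_softmax_at(logits: Sequence[float], index: int) -> float:
    """Log-probability of one class under a numerically stable softmax."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return logits[index] - log_norm


def calibrator_loss(pixel_logits: List[Sequence[float]],
                    feature_logits: List[Sequence[float]]) -> float:
    """Mean cross entropy that pushes both discriminators to predict the
    source group for calibrated images (one logit vector per image)."""
    all_logits = list(pixel_logits) + list(feature_logits)
    terms = [-log_softmax_at(logits, SOURCE_GROUP) for logits in all_logits]
    return sum(terms) / len(terms)
```

In a training loop, the logits would come from the two discriminators evaluated on calibrated source and calibrated target images, and only the calibrator's parameters would receive this gradient.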

| Method | MNIST to USPS | USPS to MNIST | SVHN to MNIST | Average Acc. |
|---|---|---|---|---|
| ADDA [35] | 90.1 | 95.2 | 80.1 | 88.5 |
| CoGAN [18] | 91.2 | 89.1 | - | - |
| SBADA [27] | 97.6 | 95.0 | 76.1 | 89.6 |
| CyCADA [12] | 95.6 | 96.5 | 90.4 | 94.2 |
| CDAN [20] | 95.6 | 98.0 | 89.2 | 94.3 |
| PFA [3] | 95.0 | - | 93.9 | - |
| MSTN [38] | 92.9 | 97.6 | 93.3 | 94.6 |
| MCD [29] | 93.8 | 95.7 | 95.8 | 95.1 |
| Ours | 95.6 | 97.1 | 97.1 | 96.6 |
| CycleGAN+Ours | 97.1 | 98.3 | 97.5 | 97.6 |

Table 1: Results on digits datasets for unsupervised domain adaptation. Our method achieves state-of-the-art performance without using stylized source images. Our method can be further improved by using stylized source images.

The total training loss of our data calibrator can be divided into two parts. When the calibrator tries to fool the domain discriminators into treating calibrated source images as source images, the calibrator tends to approximate the identity mapping. In contrast, when the calibrator tries to fool the domain discriminators into treating calibrated target images as source images, the calibrator learns to calibrate target domain images to mitigate the domain shift.

The ResNet generator [15] is used as the architecture of the calibrator for the digits and GTA5 to CityScapes experiments. It consists of downsampling layers, upsampling layers and skip connections, as shown in Figure 4. Note that performance does not simply improve as the calibrator network gets larger. Instead, reducing the width can improve training, which we believe is because it prevents the data calibrator from overfitting when the training data is insufficient. Additionally, applying a norm constraint to the output of the data calibrator plays an important role in GTA5 to CityScapes adaptation. We give a more detailed discussion of this constraint in Section 5.
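A norm constraint on the calibrator output can be sketched as a simple clipping step. The example below assumes a max-norm bound, following the adversarial attack convention the paper cites, and uses the bound of 0.01 mentioned in Section 5; the function names are hypothetical and the paper's actual mechanism for enforcing the constraint may differ.

```python
from typing import List


def clip_perturbation(delta: List[float], eps: float) -> List[float]:
    """Project each perturbation value into [-eps, eps] (max-norm ball)."""
    if eps < 0:
        raise ValueError("eps must be non-negative")
    return [max(-eps, min(eps, d)) for d in delta]


def calibrate(x: List[float], delta: List[float], eps: float) -> List[float]:
    """Add the clipped perturbation to the image: x + clip(delta)."""
    if len(x) != len(delta):
        raise ValueError("image and perturbation must have the same length")
    return [xi + di for xi, di in zip(x, clip_perturbation(delta, eps))]


# Raw calibrator outputs beyond the bound are truncated to +/- 0.01.
bounded = clip_perturbation([0.5, -0.2, 0.005], eps=0.01)
```

With such a bound, no pixel of the calibrated image can move further than `eps` from its original value, which keeps the change visually negligible.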

4 Evaluation and Results

Figure 5: Semantic Segmentation results for GTA5 to CityScapes. (s-a) Test images from GTA5. (s-b) Predictions from the model trained in GTA5. (s-c) Our prediction. (s-d) Ground truth annotations for test images. (t-a) Test images from CityScapes. (t-b) Predictions from the model trained in GTA5. (t-c) Predictions from our method. (t-d) Ground truth annotations for test images.

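| Method | road | sidewalk | building | wall | fence | pole | traffic light | traffic sign | vegetation | terrain | sky | person | rider | car | truck | bus | train | motorbike | bicycle | mIoU | fwIoU | Pixel acc. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Source only | 42.7 | 26.3 | 51.7 | 5.5 | 6.8 | 13.8 | 23.6 | 6.9 | 75.5 | 11.5 | 36.8 | 49.3 | 0.9 | 46.7 | 3.4 | 5.0 | 0.0 | 5.0 | 1.4 | 21.7 | 47.4 | 62.5 |
| CyCADA | 79.1 | 33.1 | 77.9 | 23.4 | 17.3 | 32.1 | 33.3 | 31.8 | 81.5 | 26.7 | 69.0 | 62.8 | 14.7 | 74.5 | 20.9 | 25.6 | 6.9 | 18.8 | 20.4 | 39.5 | 72.4 | 82.3 |
| Ours | 83.5 | 35.2 | 79.9 | 24.6 | 16.2 | 32.8 | 33.1 | 31.8 | 81.7 | 29.2 | 66.3 | 63.0 | 14.3 | 81.8 | 21.0 | 26.5 | 8.5 | 16.7 | 24.0 | 40.5 | 75.1 | 84.0 |
| Target only | 97.3 | 79.8 | 88.6 | 32.5 | 48.2 | 56.3 | 63.6 | 73.3 | 89.0 | 58.9 | 93.0 | 78.2 | 55.2 | 92.2 | 45.0 | 67.3 | 39.6 | 49.9 | 73.6 | 67.4 | 89.6 | 94.3 |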
Table 2: Adaptation between GTA5 and CityScapes. Source only shows results of DRN-26 [43] trained in GTA5 and tested in CityScapes. Target only shows results of DRN-26 trained in CityScapes and tested in CityScapes. Our method outperforms CyCADA in mean IoU, frequency weighted IoU and pixel accuracy. In particular, our frequency weighted IoU is 2.7 percentage points higher than that of CyCADA.

In this section, we evaluate our method under unsupervised domain adaptation setting on digits and driving scene semantic segmentation tasks.

Digits We evaluate our method on three commonly used digits datasets: MNIST [17], USPS, and SVHN [22]. We use the same data processing and LeNet architecture as Hoffman et al. [12] and perform three unsupervised domain adaptation tasks: USPS to MNIST, MNIST to USPS and SVHN to MNIST. We report results both with unstylized source images and with stylized source images produced by CycleGAN [47].

GTA5 to CityScapes GTA5 [26] is a synthetic driving scene dataset and CityScapes [4] is a real world driving scene dataset. The GTA5 dataset has 24966 densely labeled RGB images of size , which contains 19 common classes with CityScapes, as we included in Table 2. The CityScapes dataset contains 5000 densely labeled RGB images of size from 27 cities. In this work, we use DRN-26 [43] as the source classifier. We use the released DRN-26 model from CyCADA [12] as our source classifier, which is trained in stylized GTA5 images.

All components are implemented using Pytorch. For digits experiments, source classifiers and other components are trained with the Adam optimizer with learning rate 1e-4. We use batches of 128 samples from each domain and the images are zero-centered and rescaled to . For GTA5 to CityScapes experiments, we use Adam optimizer with learning rate 1e-4 with batch size 6. We use same LeNet architecture as CyCADA for all digits experiments and DRN-26 [43] for GTA5 to CityScapes task. Our best results are obtained within 50 epochs for digits and within 10 epochs for GTA5 to CityScapes.

Details about other components, such as the architectures of the data calibrator and the domain discriminators, can be found in the Appendix.

4.1 Digits Experiments

As we show in Figure 3, the learnt representation of the source classifier is not as bad as commonly assumed. To demonstrate this, we show that without training a new classifier or using stylized source images produced by GANs, we can simply keep the source classifier trained in the source domain and train a data calibrator to modify the images to fit the source classifier’s representation. As shown in Table 1, using the data calibrator alone outperforms previous methods in average accuracy. For a difficult task such as SVHN to MNIST, we can further boost performance by using stylized source images [47] as the source domain, resulting in an improvement of about 7 percentage points over CyCADA, another method that leverages stylized source images for unsupervised domain adaptation.

4.2 Performance Trade off Among Domains

As discussed in Section 1, existing methods do not show the trade-off between source domain performance and target domain performance. In this subsection, we show that many existing methods have a poor trade-off. We use the released code of CyCADA [12], ADDA [35] and MCD [29], follow their settings, and train their adapted models to reach target domain performance similar to the reported numbers. We then test the adapted models on the source domain and the target domain, and report the performance before and after domain adaptation. We observe from Figure 2 that, while ADDA is close to our method in target domain performance on USPS to MNIST, its source domain performance is 5% lower than ours. CyCADA has much higher target domain performance than ADDA; however, it sacrifices source domain performance significantly. MCD has a better trade-off than the other two, but it uses a baseline with over-parameterized fully connected layers and does not converge well when we replace its backbone with the LeNet architecture used by the other approaches and ours. While our method can be further improved by using GAN-generated images as the source domain, using the data calibrator alone without stylized images already surpasses these methods in both source domain and target domain performance, as indicated by Figure 2.

4.3 GTA5 to Cityscapes

GTA5 to Cityscapes is an unsupervised domain adaptation task that is closer to a real-world setting. Compared to classification, segmentation is more challenging because finer domain adaptation is required to mitigate the domain shift at the pixel level.

As shown in Table 2, our method has better results in all three commonly used metrics: mIoU, fwIoU, and pixel accuracy. In particular, our fwIoU is 2.7 percentage points higher than that of CyCADA. In Figure 5, we visualize our semantic segmentation results. Comparing (s-b) with the two rows of (t-b), we observe the performance degradation brought by the domain shift. (s-c) and (t-c) show the segmentation results produced by our method. Our method largely mitigates the performance degradation in the target domain while maintaining source domain performance. Because we improve the accuracy on cars by a large margin, the visualizations for cars are quite close to the ground truth annotations.
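For reference, the three metrics can be computed from a class confusion matrix using their standard definitions, which we assume match the evaluation protocol used here. The function name `segmentation_metrics` and the toy matrix are illustrative only.

```python
from typing import Dict, List


def segmentation_metrics(conf: List[List[int]]) -> Dict[str, float]:
    """conf[i][j] counts pixels of true class i that were predicted as class j."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    true_count = [sum(conf[i]) for i in range(n)]
    pred_count = [sum(conf[i][j] for i in range(n)) for j in range(n)]
    correct = [conf[i][i] for i in range(n)]

    ious = []
    fw_iou = 0.0
    for i in range(n):
        union = true_count[i] + pred_count[i] - correct[i]
        if union == 0:
            continue  # class absent from both prediction and ground truth
        iou = correct[i] / union
        ious.append(iou)
        fw_iou += (true_count[i] / total) * iou  # weight by class frequency

    return {
        "pixel_acc": sum(correct) / total,
        "mIoU": sum(ious) / len(ious),
        "fwIoU": fw_iou,
    }


# Toy two-class example: 10 pixels, 8 of them classified correctly.
toy = segmentation_metrics([[3, 1], [1, 5]])
```

Because fwIoU weights each class by its pixel frequency, gains on large classes such as road and car move it more than gains on rare classes, whereas mIoU treats all classes equally.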

Figure 6: Images from SVHN to MNIST adaptation. Images before and after being calibrated and their views in the frequency domain. Unlike with style transfer GANs, the appearance of the images does not change much. In the frequency domain, high frequency information is reduced.

5 Discussion

This section is organized as follows: Section 5.1 analyzes calibrated images in the frequency domain. Section 5.2 discusses the connection between adversarial attack and domain adaptation. Section 5.3 discusses the deployment of the data calibrator.

5.1 Fourier Perspective

We use the Fast Fourier Transform (FFT) to show images before and after adding the calibration. It can be seen in Figure 6 that high frequency information decreases after the output of our data calibrator is added to the images. High frequency information is often related to textures, which vary significantly across domains. Yin et al. [42] demonstrate that naturally trained models are biased towards high frequency information, which makes them vulnerable to high frequency noise. Our method might help remove this high frequency information from images, thus mitigating the domain shift problem.
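One way to quantify such a reduction is the fraction of spectral energy that lies above a frequency cutoff. The sketch below uses a naive 2D discrete Fourier transform in pure Python, which is only practical for tiny images; in practice an FFT library routine would be used. The function names and the cutoff convention are assumptions for this illustration, not the analysis used to produce Figure 6.

```python
import cmath
import math
from typing import List

Grid = List[List[float]]


def dft2(image: Grid) -> List[List[complex]]:
    """Naive 2D discrete Fourier transform (O(N^4); for tiny images only)."""
    h, w = len(image), len(image[0])
    spectrum = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2.0 * math.pi * (u * y / h + v * x / w)
                    acc += image[y][x] * cmath.exp(1j * angle)
            spectrum[u][v] = acc
    return spectrum


def high_freq_fraction(image: Grid, cutoff: int) -> float:
    """Share of spectral energy at frequencies whose wrapped index distance
    from the zero frequency exceeds `cutoff` along either axis."""
    h, w = len(image), len(image[0])
    spectrum = dft2(image)
    total = high = 0.0
    for u in range(h):
        for v in range(w):
            energy = abs(spectrum[u][v]) ** 2
            total += energy
            if max(min(u, h - u), min(v, w - v)) > cutoff:
                high += energy
    return high / total if total > 0 else 0.0


checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
softened = [[0.5 + 0.1 * (p - 0.5) for p in row] for row in checker]
```

Here `softened` is the checkerboard with its contrast reduced, so its high frequency fraction is much smaller than that of `checker`; comparing the fraction before and after calibration would give a single number for the effect visible in the spectra.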

5.2 Connection to Adversarial Attack

Compared to other methods that train classifiers to adapt to target domains, in our domain adaptation framework, once trained in the source domain, the source classifier is not updated and we fully rely on the representation learnt in the source domain to perform tasks in the target domain. Thus the additive calibration produced by our data calibrator needs to figure out how to transform target domain images to a form that better fits the source classifier’s representation.

But what does it mean to modify target domain images to better fit the source classifier’s representation? We hypothesize that there are two candidate explanations of what the data calibrator does: (1) The data calibrator acts as a style transfer GAN that converts the style of target domain images to that of source domain images, thus achieving domain adaptation. (2) The data calibrator learns to manipulate non-robust features that are useful to neural networks but puzzling to humans [14]. Our data calibrator might learn to suppress these non-robust features and thus mitigate the issue brought by the domain shift.

As can be observed from Figure 6, the images modified by our calibrator do not change their appearance in the way a style transfer GAN usually does. We also follow the convention of adversarial attack [8] to limit the $\ell_\infty$ norm of the calibration and provide the plot in the Appendix. Our best result in Table 2 is obtained by limiting the $\ell_\infty$ norm of the calibration to 0.01, so small that a human might not be able to notice it. Essentially, our data calibrator is trained to produce a perturbation that fools the domain discriminators while remaining imperceptible to humans, which is very similar to the behavior of adversarial attacks [32, 8]. This suggests that our data calibrator is not performing style transfer but leveraging intriguing hints to mitigate the domain shift problem. Our method suggests that there is a potential connection between adversarial attack and domain adaptation, and our results should be of interest to both research communities.

5.3 Calibrator for Deployment

As discussed in Section 1, one of the limitations of existing domain adaptation methods is the lack of flexibility. As far as we know, most existing domain adaptation methods require finetuning the deployed model when there is a new target domain. However, the deployed model is usually compressed and stored in specialized hardware, so adapting it to new domains requires a long, costly process and might not be fast enough for time-sensitive applications.

In contrast, our method does not require updating the deployed model and has greater flexibility when adaptation to a new domain is desired. Additionally, the overhead brought by the calibrator is moderate. We measured the number of parameters of the classifier and the data calibrator. For the digits experiments, LeNet has 3.1 million parameters while the data calibrator has 0.18 million, only 5.8% of the model. For the GTA5 to CityScapes experiments, the DRN-26 model has 20.6 million parameters while our data calibrator has only 0.05 million, only 0.24% of the DRN-26 model.
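The quoted percentages follow directly from the parameter counts given above, as this short check shows. The helper name `overhead_percent` is introduced here only for illustration.

```python
def overhead_percent(calibrator_params_m: float, model_params_m: float) -> float:
    """Calibrator size as a percentage of the deployed model's parameters."""
    return 100.0 * calibrator_params_m / model_params_m


# Parameter counts in millions, as stated in the text.
digits_overhead = overhead_percent(0.18, 3.1)    # LeNet: about 5.8 percent
gta5_overhead = overhead_percent(0.05, 20.6)     # DRN-26: about 0.24 percent
```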

We thereby conclude that the proposed data calibrator is light-weight compared to the deployed model and does not bring too much overhead during deployment.

6 Conclusion

In summary, the proposed method not only achieves state-of-the-art performance in unsupervised domain adaptation for digits classification and driving scene semantic segmentation, but is also suitable for adapting deployed models to new domains without updating their weights. This approach provides a feasible solution for online unsupervised domain adaptation. While the community is trying to build a monolithic model that can work across as many domains as possible, the separable approach we propose is also worth investigating.

Figure 7: Network architectures used for digits experiments. We show the source classifier, the proposed calibrator, the pixel-level domain discriminator and the feature-level domain discriminator.
Figure 8: Performance vs. size of the norm ball of the calibration produced by the calibrator. We show that with calibration that is imperceptible to humans, we can achieve state-of-the-art domain adaptation performance. Calibration with a large ball has worse performance, probably due to overfitting or the models' poor robustness to pixel modification in general.
GTA5 to CityScapes    N. of Param. (M)    FLOPs (G)
DRN-26                20.6                200
Data Calibrator       0.05                2.67

Digits                N. of Param. (M)    FLOPs (G)
LeNet                 3.13                0.03
Data Calibrator       0.18                0.02

Table 3: Overhead of data calibrator. We show that our calibrator is light-weight both in terms of number of parameters and FLOPs. Even for a network as tiny as LeNet, the calibrator is small compared to it.

References

  1. A. Bobu, E. Tzeng, J. Hoffman and T. Darrell (2018) Adapting to continuously shifting domains. Cited by: §1.
  2. K. Bousmalis, N. Silberman, D. Dohan, D. Erhan and D. Krishnan (2017) Unsupervised pixel-level domain adaptation with generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3722–3731. Cited by: §2.
  3. C. Chen, W. Xie, W. Huang, Y. Rong, X. Ding, Y. Huang, T. Xu and J. Huang (2019) Progressive feature alignment for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 627–636. Cited by: Table 1.
  4. M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth and B. Schiele (2016) The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3213–3223. Cited by: §4.
  5. J. Deng, W. Dong, R. Socher, L. Li, K. Li and L. Fei-Fei (2009) ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. Cited by: §1.
  6. Y. Ganin and V. Lempitsky (2014) Unsupervised domain adaptation by backpropagation. arXiv preprint arXiv:1409.7495. Cited by: §1, §2, §3.2.
  7. Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand and V. Lempitsky (2016) Domain-adversarial training of neural networks. The Journal of Machine Learning Research 17 (1), pp. 2096–2030. Cited by: §2, §3.2.
  8. I. J. Goodfellow, J. Shlens and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §2, §5.2.
  9. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §1, §2.
  10. S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz and W. J. Dally (2016) EIE: efficient inference engine on compressed deep neural network. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 243–254. Cited by: §1.
  11. S. Han, J. Pool, J. Tran and W. Dally (2015) Learning both weights and connections for efficient neural network. In Advances in neural information processing systems, pp. 1135–1143. Cited by: §1, §1.
  12. J. Hoffman, E. Tzeng, T. Park, J. Zhu, P. Isola, K. Saenko, A. A. Efros and T. Darrell (2017) Cycada: cycle-consistent adversarial domain adaptation. arXiv preprint arXiv:1711.03213. Cited by: §1, §2, Table 1, §4.2, §4, §4.
  13. J. Hoffman, D. Wang, F. Yu and T. Darrell (2016) Fcns in the wild: pixel-level adversarial and constraint-based adaptation. arXiv preprint arXiv:1612.02649. Cited by: §1, §2.
  14. A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran and A. Madry (2019) Adversarial examples are not bugs, they are features. arXiv preprint arXiv:1905.02175. Cited by: §2, §5.2.
  15. J. Johnson, A. Alahi and L. Fei-Fei (2016) Perceptual losses for real-time style transfer and super-resolution. In European conference on computer vision, pp. 694–711. Cited by: §3.2.
  16. A. Kuznetsova, H. Rom, N. Alldrin, J. Uijlings, I. Krasin, J. Pont-Tuset, S. Kamali, S. Popov, M. Malloci and T. Duerig (2018) The open images dataset v4: unified image classification, object detection, and visual relationship detection at scale. arXiv preprint arXiv:1811.00982. Cited by: §1.
  17. Y. LeCun, L. Bottou, Y. Bengio and P. Haffner (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §4.
  18. M. Liu and O. Tuzel (2016) Coupled generative adversarial networks. In Advances in neural information processing systems, pp. 469–477. Cited by: §2, Table 1.
  19. M. Long, Y. Cao, J. Wang and M. I. Jordan (2015) Learning transferable features with deep adaptation networks. arXiv preprint arXiv:1502.02791. Cited by: §2.
  20. M. Long, Z. Cao, J. Wang and M. I. Jordan (2018) Conditional adversarial domain adaptation. In Advances in Neural Information Processing Systems, pp. 1640–1650. Cited by: §1, Table 1.
  21. S. Motiian, Q. Jones, S. Iranmanesh and G. Doretto (2017) Few-shot adversarial domain adaptation. In Advances in Neural Information Processing Systems, pp. 6670–6680. Cited by: §3.2.
  22. Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu and A. Y. Ng (2011) Reading digits in natural images with unsupervised feature learning. Cited by: §4.
  23. J. Quiñonero-Candela, M. Sugiyama, A. Schwaighofer and N. Lawrence (2008) Covariate shift and local learning by distribution matching. MIT Press. Cited by: §2.
  24. B. Recht, R. Roelofs, L. Schmidt and V. Shankar (2019) Do imagenet classifiers generalize to imagenet?. arXiv preprint arXiv:1902.10811. Cited by: §1, §1.
  25. A. Ren, T. Zhang, S. Ye, J. Li, W. Xu, X. Qian, X. Lin and Y. Wang (2019) ADMM-nn: an algorithm-hardware co-design framework of dnns using alternating direction methods of multipliers. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 925–938. Cited by: §1.
  26. S. R. Richter, V. Vineet, S. Roth and V. Koltun (2016) Playing for data: ground truth from computer games. In European conference on computer vision, pp. 102–118. Cited by: §4.
  27. P. Russo, F. M. Carlucci, T. Tommasi and B. Caputo (2018) From source to target and back: symmetric bi-directional adaptive gan. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8099–8108. Cited by: Table 1.
  28. K. Saenko, B. Kulis, M. Fritz and T. Darrell (2010) Adapting visual category models to new domains. In European conference on computer vision, pp. 213–226. Cited by: §1, §2.
  29. K. Saito, K. Watanabe, Y. Ushiku and T. Harada (2018) Maximum classifier discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3723–3732. Cited by: §1, Table 1, §4.2.
  30. B. Sun, J. Feng and K. Saenko (2016) Return of frustratingly easy domain adaptation. In Thirtieth AAAI Conference on Artificial Intelligence, Cited by: §2.
  31. B. Sun and K. Saenko (2016) Deep coral: correlation alignment for deep domain adaptation. In European Conference on Computer Vision, pp. 443–450. Cited by: §1, §2.
  32. C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §2, §5.2.
  33. A. Torralba and A. A. Efros (2011) Unbiased look at dataset bias. In CVPR, Vol. 1, pp. 7. Cited by: §1, §1, §2, §3.2.
  34. E. Tzeng, J. Hoffman, T. Darrell and K. Saenko (2015) Simultaneous deep transfer across domains and tasks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4068–4076. Cited by: §2.
  35. E. Tzeng, J. Hoffman, K. Saenko and T. Darrell (2017) Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7167–7176. Cited by: §1, §2, §3.2, Table 1, §4.2.
  36. E. Tzeng, J. Hoffman, N. Zhang, K. Saenko and T. Darrell (2014) Deep domain confusion: maximizing for domain invariance. arXiv preprint arXiv:1412.3474. Cited by: §1, §2, §2.
  37. M. Wulfmeier, A. Bewley and I. Posner (2018) Incremental adversarial domain adaptation for continually changing environments. In 2018 IEEE International conference on robotics and automation (ICRA), pp. 1–9. Cited by: §1.
  38. S. Xie, Z. Zheng, L. Chen and C. Chen (2018) Learning semantic representations for unsupervised domain adaptation. In International Conference on Machine Learning, pp. 5419–5428. Cited by: Table 1.
  39. K. Xu, H. Chen, S. Liu, P. Chen, T. Weng, M. Hong and X. Lin (2019) Topology attack and defense for graph neural networks: an optimization perspective. In International Joint Conference on Artificial Intelligence (IJCAI), Cited by: §2.
  40. K. Xu, S. Liu, P. Zhao, P. Chen, H. Zhang, Q. Fan, D. Erdogmus, Y. Wang and X. Lin (2019) Structured adversarial attack: towards general implementation and better interpretability. In International Conference on Learning Representations, Cited by: §2.
  41. S. Ye, K. Xu, S. Liu, H. Cheng, J. Lambrechts, H. Zhang, A. Zhou, K. Ma, Y. Wang and X. Lin (2019) Adversarial robustness vs. model compression, or both?. In The IEEE International Conference on Computer Vision (ICCV), Cited by: §1.
  42. D. Yin, R. G. Lopes, J. Shlens, E. D. Cubuk and J. Gilmer (2019) A fourier perspective on model robustness in computer vision. arXiv preprint arXiv:1906.08988. Cited by: §5.1.
  43. F. Yu, V. Koltun and T. Funkhouser (2017) Dilated residual networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 472–480. Cited by: Table 2, §4, §4.
  44. T. Zhang, S. Ye, K. Zhang, J. Tang, W. Wen, M. Fardad and Y. Wang (2018) A systematic dnn weight pruning framework using alternating direction method of multipliers. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 184–199. Cited by: §1.
  45. P. Zhao, S. Liu, P. Chen, N. Hoang, K. Xu, B. Kailkhura and X. Lin (2019) On the design of black-box adversarial examples by leveraging gradient-free optimization and operator splitting method. In Proceedings of the IEEE International Conference on Computer Vision, pp. 121–130. Cited by: §2.
  46. A. Zhou, A. Yao, Y. Guo, L. Xu and Y. Chen (2017) Incremental network quantization: towards lossless cnns with low-precision weights. arXiv preprint arXiv:1702.03044. Cited by: §1, §1.
  47. J. Zhu, T. Park, P. Isola and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 2223–2232. Cited by: §4.1, §4.