Unsupervised Open Domain Recognition by Semantic Discrepancy Minimization
Abstract
We address the unsupervised open domain recognition (UODR) problem, where the categories in the labeled source domain $\mathcal{S}$ are only a subset of those in the unlabeled target domain $\mathcal{T}$. The task is to correctly classify all samples in $\mathcal{T}$, including those of known and unknown categories. UODR is challenging due to the domain discrepancy, which becomes even harder to bridge when a large number of unknown categories exist in $\mathcal{T}$. Moreover, the classification rules propagated by a graph CNN (GCN) may be distracted by unknown categories and lack generalization capability.
To measure the domain discrepancy for the asymmetric label spaces of $\mathcal{S}$ and $\mathcal{T}$, we propose Semantic-Guided Matching Discrepancy (SGMD), which first employs instance matching between $\mathcal{S}$ and $\mathcal{T}$ and then measures the discrepancy by a weighted feature distance between matched instances. We further design a limited balance constraint to achieve a more balanced classification output on known and unknown categories. We develop the Unsupervised Open Domain Transfer Network (UODTN), which learns the backbone classification network and the GCN jointly by reducing the SGMD, enforcing the limited balance constraint and minimizing the classification loss on $\mathcal{S}$. UODTN better preserves the semantic structure and enforces consistency between the learned domain-invariant visual features and the semantic embeddings. Experimental results show the superiority of our method in recognizing images of both known and unknown categories.
1 Introduction
We study the unsupervised open domain recognition (UODR) problem in this paper. In UODR, a labeled source domain $\mathcal{S}$ and an unlabeled target domain $\mathcal{T}$ are given, where the categories in $\mathcal{S}$ are only a subset of those in $\mathcal{T}$. The task is to classify all samples in $\mathcal{T}$, including those of known and unknown categories. Compared with related tasks in Domain Adaptation (DA) [2, 3, 4, 20, 21, 33, 40, 42, 34, 13, 5, 10, 25] and Zero-Shot Learning (ZSL) [24, 11, 28, 16, 32, 15, 14, 29, 41], this task is undoubtedly more challenging but closer to real-world applications.
Table 1: Comparison between UODR and related problems.
Unsupervised DA  ✓  ✓  ✓  
Partial Unsupervised DA  ✓  ✓  ✓  ✓  
Unsupervised open set DA  ✓  ✓  ✓  
Generalized ZSL  ✓  ✓  –  
Transductive Generalized ZSL  ✓  ✓  ✓  –  
UODR  ✓  ✓  ✓  ✓  ✓ 
The major differences between UODR and other related problems are summarized in Table 1. Traditional unsupervised DA [4, 20, 21, 33, 42] makes the overly strict assumption that $\mathcal{S}$ and $\mathcal{T}$ share the same categories. Researchers have begun to explore the more difficult setting in which $\mathcal{S}$ and $\mathcal{T}$ do not share the same categories (asymmetric category space). In partial adversarial DA [3] and partial weighted DA [40], the authors address the problem where the category space of $\mathcal{T}$ is a subset of that of $\mathcal{S}$. However, the category space is still confined to the closed set spanned by the source domain categories. The more difficult setting, where the category space of $\mathcal{S}$ is a subset of that of $\mathcal{T}$, is rarely considered in the DA field. In open set DA [2], there are unknown categories in both $\mathcal{S}$ and $\mathcal{T}$, but the task is to classify only the samples of known categories in the target domain, while the samples of unknown categories are ignored. In contrast, in UODR there are no unknown categories in $\mathcal{S}$, and all samples of both known and unknown categories in the target domain must be classified. UODR also differs from generalized ZSL [28, 16, 11, 32, 15, 14], since in generalized ZSL all data come from the same domain and there is no presumed domain discrepancy between $\mathcal{S}$ (i.e., the training set in ZSL) and $\mathcal{T}$ (i.e., the testing set in ZSL). Therefore, existing solutions cannot be directly applied to UODR due to its unique characteristics.
UODR is challenging due to the semantic discrepancy between $\mathcal{S}$ and $\mathcal{T}$, which can be explained from both the feature distribution and the semantic aspects. First, there is a large divergence in both content and distribution between $\mathcal{S}$ and $\mathcal{T}$, which is referred to as domain discrepancy in existing DA studies [4, 20, 21, 33, 42]. The domain discrepancy is even harder to bridge if a large number of unknown categories are injected into $\mathcal{T}$. In this case, directly applying DA techniques such as MMD [20] and D-CORAL [33] would lead to negative transfer. Second, it is hard to classify instances of unknown categories without labeled training data or auxiliary attribute information [16, 37]. With knowledge of the relationships among known and unknown categories, a graph CNN (GCN) [15] can be used in UODR to propagate the classification rules of known categories to unknown categories [38, 14]. However, generalized ZSL suffers from a mode collapse that forces the predictions for samples of unknown categories into the seen categories. Worse still, the propagated classification rules for unknown categories may lack generalization capability due to the domain discrepancy between $\mathcal{S}$ and $\mathcal{T}$.
The key idea for addressing UODR is to minimize the semantic discrepancy from both the feature distribution and the semantic aspects. Specifically, in the unlabeled domain $\mathcal{T}$, there exist many unknown categories whose image instances are similar to those of a given known category in $\mathcal{S}$. To reduce the distraction brought by unknown categories in $\mathcal{T}$, domain-invariant feature learning is performed by reducing the domain discrepancy measured on data from the shared (known) categories of $\mathcal{S}$ and $\mathcal{T}$. We propose Semantic-Guided Matching Discrepancy (SGMD), which first employs instance matching between $\mathcal{S}$ and $\mathcal{T}$ to produce coarsely matched pairs [3]. The discrepancy is then measured by a weighted feature distance over these pairs, where the weight is the thresholded similarity of their target domain classifier responses. The target domain classification output provides a semantic-level abstraction over a wide range of categories, and an instance pair with the same category label is assumed to have similar classification outputs. Therefore, the weight reflects the degree of semantic consistency of each pair, and the weighted distance further reduces the negative effect of noisy matching.
Similar to [38, 14], a GCN is first used to propagate the classification rules from known to unknown categories, where the category relationships are described by WordNet. The propagated classification rules are then used to initialize the classification layer of the backbone network. Based on the backbone classification network, to deal with the semantic shift from known to unknown categories, we design a limited balance constraint that prevents target domain samples of unknown categories from being classified into known categories and, compared with the balance constraint proposed in [32], better avoids strongly biased classifiers for unknown categories.
Putting the components together, we develop the Unsupervised Open Domain Transfer Network (UODTN), which learns the backbone classification network and the GCN jointly by reducing the SGMD, achieving the limited balance, enforcing semantic structure preservation via the GCN, and minimizing the classification loss on $\mathcal{S}$. Compared with multi-stage learning paradigms [38, 14] that perform GCN-based classification model propagation and visual feature learning step by step, jointly learning the classification network and the GCN better preserves the semantic structure and enforces consistency between the learned domain-invariant visual features and the semantic embeddings. We construct two datasets for evaluating our method on UODR. Experimental results show the effectiveness of our method in recognizing images of both known and unknown categories in $\mathcal{T}$. Our collected data and code are publicly available at https://github.com/junbaoZHUO/UODTN.
2 Related Work
Deep unsupervised domain adaptation. Most deep unsupervised domain adaptation models are trained by combining the classification loss on $\mathcal{S}$ with additional losses such as discrepancy-reducing losses [20, 33, 21, 4, 8], adversarial discriminative losses [7, 34, 36], adversarial generative losses [19, 1, 13] and reconstruction losses [9]. We review only the discrepancy-reducing methods that are closely related to ours. In DDC [35], a single linear kernel is applied to only one fully connected (FC) layer to minimize the Maximum Mean Discrepancy (MMD). Deep Adaptation Network (DAN) [20] considers the sum of MMDs defined on several FC layers, including the last classification layer. In Joint Adaptation Networks [21], the joint distribution discrepancy of multi-layer activations is considered instead of separate adaptations of the marginal and conditional distributions, which often require strong independence and/or smoothness assumptions on the factorized distributions. Instead of MMD, [33, 42] measure the domain discrepancy by the difference between second-order statistics (i.e., covariances). Domain discrepancy on both the convolutional representation and the classification layer is explicitly considered in [42]. PMD [4] approximates the first-order Wasserstein distance between two domains via minimum weight graph matching. These discrepancy-reducing methods can only handle the case where $\mathcal{S}$ and $\mathcal{T}$ share the same label space.
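For reference, two of the classical discrepancy measures mentioned above can be sketched in a few lines of plain Python: the squared MMD with a linear kernel, which reduces to the squared distance between the domain means, and the CORAL loss of [33], the squared Frobenius distance between the two feature covariance matrices scaled by 1/(4d^2). The function names are illustrative, and practical implementations compute these quantities on mini-batches of deep features in a tensor library.

```python
def mean_vector(xs):
    """Column-wise mean of a list of equal-length feature vectors."""
    n, d = len(xs), len(xs[0])
    return [sum(x[k] for x in xs) / n for k in range(d)]


def linear_mmd2(source, target):
    """Biased squared MMD with the linear kernel k(a, b) = <a, b>.

    For a linear kernel this equals the squared Euclidean distance
    between the source mean and the target mean.
    """
    ms, mt = mean_vector(source), mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


def covariance(xs):
    """Unbiased (n - 1) sample covariance matrix of the row vectors xs."""
    n, d = len(xs), len(xs[0])
    m = mean_vector(xs)
    return [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in xs) / (n - 1)
             for j in range(d)] for i in range(d)]


def coral_loss(source, target):
    """CORAL loss: squared Frobenius distance of covariances divided by 4 d^2."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    return sum((cs[i][j] - ct[i][j]) ** 2
               for i in range(d) for j in range(d)) / (4 * d * d)


if __name__ == "__main__":
    src = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    tgt = [[x + 1.0, y] for x, y in src]  # same spread, shifted by (1, 0)
    print(linear_mmd2(src, tgt))  # 1.0: only the means differ
    print(coral_loss(src, tgt))   # 0.0: the covariances are identical
```

Both measures compare whole-domain statistics, which is why they presuppose a shared label space: in UODR, target samples of unknown categories shift the target mean and covariance even when the known categories are already well aligned.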
Generalized ZSL. Generalized ZSL drops the assumption that the target domain contains only unknown categories [23, 17, 11, 32, 16, 28, 31]. As the problem most related to UODR, transductive generalized ZSL [11, 32, 28, 16] is performed in a semi-supervised manner in which both the labeled source data and the unlabeled target data are available, but there is no presumed domain discrepancy between $\mathcal{S}$ and $\mathcal{T}$. In UODR, however, there is a domain discrepancy between $\mathcal{S}$ and $\mathcal{T}$. Propagated Semantic Transfer (PST) [28] exploits the manifold structure of novel classes by incorporating external knowledge, such as linguistic or expert-specified information, to conduct label propagation. Unsupervised Attribute Alignment (UAA) [16] associates cross-domain attributes by regularized sparse coding, which enforces the attributes shared by known and unknown categories to be similar. In [11], a joint learning approach learns a shared model space (SMS) for models so that knowledge can be effectively transferred between classes using attributes. Unbiased ZSL [32] enforces balanced classifier responses between known and unknown categories on unlabeled target data to learn an unbiased embedding space for ZSL.
Object recognition via knowledge graph. Salakhutdinov et al. [30] use WordNet to share representations among different object classifiers so that objects with few training examples can borrow statistical strength from related objects. Deng et al. [6] apply exclusion rules as constraints and add object-attribute relations into the graph to train object classifiers for zero-shot applications. In contrast to these methods, which use the graph as a constraint, [38] constructs a 6-layer deep GCN to directly generate novel object classifiers. In [14], the authors argue that too many GCN layers result in over-smoothed classifiers and propose training a single-layer GCN. Furthermore, [14] utilizes a denser graph structure and fine-tunes the feature space to adapt it to the generated semantic embedding space.
3 Method
3.1 Common Notations
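Some common notations used in this paper are introduced here. Suppose that there are $n_s$ source-domain training examples $\{x_i^s\}_{i=1}^{n_s}$ with labels $\{y_i^s\}_{i=1}^{n_s}$ from the known label set $\mathcal{C}_s$, and $n_t$ unlabeled target-domain examples $\{x_j^t\}_{j=1}^{n_t}$, whose labels from the label set $\mathcal{C}_t$ are not available, with $\mathcal{C}_s \subset \mathcal{C}_t$. That is, there are $|\mathcal{C}_t| - |\mathcal{C}_s|$ unknown categories in the target domain. $x^s$ and $x^t$ are the raw images from the source and target domains, respectively. Let $F$ be the feature extractor, and let $C_s$ and $C_t$ denote the classifier pretrained on $\mathcal{S}$ and the classifier for the target domain $\mathcal{T}$, respectively.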
3.2 Framework
As shown in Figure 1, our Unsupervised Open Domain Transfer Network (UODTN) contains a backbone classification network with a classifier layer for all categories in the target domain and a GCN that maintains the relationships among all categories. We first use the GCN to generate the semantic embeddings of unknown categories in the target domain and then initialize the classifier layer of the backbone classification network with these semantic embeddings. Based on the initialized backbone network, we further reduce the proposed semantic-guided matching discrepancy, enforce the proposed limited balance constraint and integrate the GCN to minimize the semantic discrepancy in the UODR problem. The backbone classification network and the GCN are jointly trained end to end, with the GCN preserving the semantic structure encoded in word vectors and the knowledge graph. The details are described below.
3.3 Generating unknown class semantic embeddings
With the auxiliary information encoded in word vectors and the knowledge graph, we can generate semantic embeddings for unknown classes via a GCN. We first construct a graph with $N$ nodes, where each node is a $d$-dimensional vector representing a distinct concept/class. To propagate the semantic embeddings of known categories to unknown categories, additional nodes are required to form complete paths from known to unknown categories. Each node is initialized with the word vector of its class name. The relationships among the classes in the knowledge graph, e.g., WordNet, are encoded as a symmetric adjacency matrix $A \in \mathbb{R}^{N \times N}$, which also includes self-loops. We propagate these relationships by performing convolution on the graph:
$$Z = \sigma\big(D^{-\frac{1}{2}} A D^{-\frac{1}{2}} X W\big), \qquad (1)$$
where $X \in \mathbb{R}^{N \times d}$ is composed of the word vectors, $W$ denotes the trainable weights, $\sigma(\cdot)$ is a nonlinear activation function, and $D$ is the degree matrix with $D_{ii} = \sum_j A_{ij}$. By training the GCN to predict the classifier weights of known classes, the GCN simultaneously generates the classifier weights of unknown classes while preserving the semantic relationships exhibited in the word vectors and the knowledge graph. The loss is
$$\mathcal{L}_{gcn}^{s} = \frac{1}{M}\sum_{i=1}^{M} \big\lVert z_i - w_i^{s} \big\rVert_2^2, \qquad (2)$$
where $M$ is the number of known classes, $z_i$ is the row of $Z$ corresponding to the $i$-th known class, and $w_i^{s}$ denotes its classifier weights, obtained by extracting the weights of $C_s$, the classifier pretrained on the source domain. We replace the original classifier of the pretrained ResNet-50 with the generated classifiers to form a classification network for the source and target domains.
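The following is a minimal plain-Python sketch of the graph convolution and the known-class regression loss described above. It assumes a single GCN layer with symmetric degree normalization, a leaky ReLU as the activation, and a mean-squared-error loss; the toy three-node graph ("canine" as a shared ancestor of a known "dog" node and an unknown "wolf" node), the stand-in word vectors and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(n, edges):
    """Symmetric adjacency with self-loops, normalized as D^-1/2 A D^-1/2."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:  # undirected knowledge-graph edges
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # D_ii = sum_j A_ij
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(a_norm, x, w, slope=0.2):
    """One graph convolution sigma(A_norm @ X @ W); sigma = leaky ReLU (assumed)."""
    z = matmul(matmul(a_norm, x), w)
    return [[v if v > 0 else slope * v for v in row] for row in z]


def known_class_mse(z, targets):
    """Mean over known classes of the squared L2 gap between GCN output rows
    and pretrained classifier weights; unknown-class rows get no direct target."""
    errs = [sum((zi - ti) ** 2 for zi, ti in zip(z[node], w))
            for node, w in targets.items()]
    return sum(errs) / len(errs)


if __name__ == "__main__":
    # Toy hierarchy: node 0 = "canine" (ancestor), 1 = "dog" (known), 2 = "wolf" (unknown)
    a_norm = normalized_adjacency(3, [(0, 1), (0, 2)])
    x = [[0.5, 0.1], [0.9, 0.0], [0.7, 0.3]]  # stand-in word vectors
    w = [[1.0, 0.0], [0.0, 1.0]]              # identity weights for illustration
    z = gcn_layer(a_norm, x, w)
    print(z[2])  # "wolf" row, produced without any direct supervision
    print(known_class_mse(z, {1: [1.0, 0.0]}))
```

Only the "dog" row is supervised, yet the "wolf" row still receives an output through the shared ancestor node; this is the mechanism by which classifier weights reach unknown categories.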
3.4 Semantic-guided matching discrepancy
In real-world scenarios, there is always a domain discrepancy between manually collected labeled data (source domain) and practical data (target domain). Such discrepancy degrades performance on the target domain and, more severely, makes the GCN propagate biased semantic embeddings to unknown categories. Therefore, it is important to reduce the domain discrepancy. However, measuring the domain discrepancy in the UODR problem is difficult because there are many samples of unknown categories. Existing measures such as MMD [20, 21] and the difference between correlations [33, 42] assume that the source and target domains share the same categories and therefore cannot handle the asymmetric label space of UODR.
We propose the semantic-guided matching discrepancy to estimate the domain discrepancy. We extract the features of all instances from the source and target domains and construct a bipartite graph between the two domains, whose edge weights are the pairwise distances between all cross-domain pairs. Here we use a standard norm-based distance, although other distance metrics can also be used. By solving the minimum weight matching problem with the Hungarian algorithm, we obtain coarse and noisy matched instance pairs (pairs linked by red lines in the left part of Figure 1) between the source and target domains. Directly reducing the discrepancy measured on noisy matched pairs would inevitably lead to negative transfer. Hence, we utilize the semantic consistency of matched pairs to suppress the noisy ones. Precisely, given matched source and target instances $x_i^s$ and $x_i^t$, we extract their features $f_i^s = F(x_i^s)$ and $f_i^t = F(x_i^t)$ with the feature extractor $F$, and calculate their responses $p_i^s = C_t(f_i^s)$ and $p_i^t = C_t(f_i^t)$ with the target domain classifier $C_t$. The semantic-guided matching discrepancy over $n_p$ matched pairs is
$$\mathcal{L}_{sgmd} = \frac{1}{n_p}\sum_{i=1}^{n_p} \mathbb{1}\big(\langle p_i^s, p_i^t \rangle > \tau\big)\, \langle p_i^s, p_i^t \rangle\, d\big(f_i^s, f_i^t\big), \qquad (3)$$
where $d(\cdot,\cdot)$ is a distance metric, e.g., a norm-based distance or the discrepancy metric encoded in a domain discriminator when using adversarial training; $\langle \cdot, \cdot \rangle$ denotes the inner product, $\mathbb{1}(\cdot)$ is the indicator function and $\tau$ is a given threshold. The similarity $\langle p_i^s, p_i^t \rangle$ reveals the degree of semantic consistency of each pair, since samples of the same class are assumed to have similar classification responses.
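A minimal sketch of the semantic-guided matching discrepancy on already-matched pairs follows, contrasted with an unweighted PMD-style average over the same pairs. It assumes that each pair's weight is its response similarity gated by the threshold indicator, that the distance is Euclidean, and that the sum is averaged over all matched pairs; these details and the function names are illustrative assumptions, not the released implementation.

```python
import math


def pair_similarity(p_s, p_t):
    """Inner product of two classifier-response vectors."""
    return sum(a * b for a, b in zip(p_s, p_t))


def pmd(pairs):
    """Unweighted mean feature distance over matched pairs (PMD-style baseline)."""
    return sum(math.dist(f_s, f_t) for f_s, f_t, _, _ in pairs) / len(pairs)


def sgmd(pairs, tau):
    """Semantic-guided matching discrepancy over already-matched pairs.

    Each pair is (f_s, f_t, p_s, p_t): source/target features and the responses
    of the target-domain classifier on them. A pair's feature distance is weighted
    by its response similarity when that similarity exceeds tau and is dropped
    otherwise. Euclidean distance stands in for d(., .) here.
    """
    total = 0.0
    for f_s, f_t, p_s, p_t in pairs:
        sim = pair_similarity(p_s, p_t)
        if sim > tau:  # indicator 1(<p_s, p_t> > tau)
            total += sim * math.dist(f_s, f_t)
    return total / len(pairs)


if __name__ == "__main__":
    pairs = [
        # semantically consistent match: similar responses, distance 5
        ([0.0, 0.0], [3.0, 4.0], [0.75, 0.25, 0.0], [0.5, 0.5, 0.0]),
        # noisy match: responses disagree, distance 10
        ([0.0, 0.0], [6.0, 8.0], [0.75, 0.25, 0.0], [0.0, 0.25, 0.75]),
    ]
    print(pmd(pairs))         # 7.5: the noisy pair dominates
    print(sgmd(pairs, 0.25))  # 1.25: only 0.5 * 5 survives, averaged over 2 pairs
```

The noisy pair (similarity 0.0625, below the threshold) contributes nothing to the semantic-guided version, whereas it accounts for two thirds of the unweighted distance.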
3.5 Limited balance constraint
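To prevent target domain samples of unknown categories from being classified into known categories, a straightforward approach is to impose a balance constraint on the classifier responses of target domain instances. The vanilla balance constraint [32] is calculated as
$$\mathcal{L}_{b} = -\frac{1}{n_t}\sum_{j=1}^{n_t} \ln \sum_{c \in \mathcal{U}} p_{jc}, \qquad (4)$$
where $p_{jc}$ is the softmax response of the target instance $x_j^t$ on category $c$, $n_t$ is the number of target instances and $\mathcal{U}$ is the set of unknown categories.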
However, since target domain labels are unavailable, minimizing this loss can keep driving the response on unknown categories toward unexpectedly large values, which results in biased classifiers for unknown categories. To prevent the classifier responses of unknown categories from growing abnormally, we propose a limited balance constraint $\mathcal{L}_{lb}$, governed by a manually set constant $\beta$ that controls the ratio of the classification response on unknown categories to that on all categories. Rather than pushing this ratio toward its maximum, the constraint enforces it to lie in an appropriate range. Ideally, $\beta$ can be set according to the prior proportion of unknown classes among all categories.
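To illustrate the difference between the two constraints, the sketch below implements the vanilla balance loss in the style of [32] together with one hypothetical form of a limited balance loss: a squared penalty on the gap between the batch-average unknown-response ratio and the prior. The exact functional form used by UODTN is not reproduced here; the squared gap, the function names and the toy responses are assumptions. The prior follows the suggestion above, e.g., 10 / 50 = 0.2 for I2AwA (10 unknown out of 50 classes).

```python
import math


def unknown_ratio(resp, unknown):
    """Share of a sample's (non-negative) classifier response on unknown classes."""
    return sum(resp[c] for c in unknown) / sum(resp)


def vanilla_balance(batch, unknown):
    """Balance loss in the style of [32]: -mean log(unknown-class probability mass).

    It keeps decreasing as the unknown mass approaches 1, so nothing stops
    target samples from being pushed wholesale into unknown classes.
    """
    return -sum(math.log(unknown_ratio(p, unknown)) for p in batch) / len(batch)


def limited_balance(batch, unknown, beta):
    """HYPOTHETICAL limited variant: squared gap between the batch-mean unknown
    ratio and the prior beta, so both too little and too much mass is penalized."""
    mean_ratio = sum(unknown_ratio(p, unknown) for p in batch) / len(batch)
    return (mean_ratio - beta) ** 2


if __name__ == "__main__":
    beta = 10 / 50   # I2AwA prior: 10 unknown out of 50 classes
    unknown = [4]    # toy 5-class problem with one unknown class
    balanced = [[0.2, 0.2, 0.2, 0.2, 0.2]]
    skewed = [[0.05, 0.05, 0.05, 0.05, 0.8]]
    print(vanilla_balance(balanced, unknown), vanilla_balance(skewed, unknown))
    print(limited_balance(balanced, unknown, beta), limited_balance(skewed, unknown, beta))
```

The vanilla loss assigns the skewed batch (80% of the response on the unknown class) a lower value than the balanced one, while the limited variant is minimized at the prior and penalizes the skewed batch.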
3.6 Semantic structure preserving via GCN
The semantic structure among categories exhibited in word vectors and the knowledge graph cannot be well preserved merely by reducing the semantic-guided matching discrepancy and enforcing the limited balance constraint. To preserve these relationships, we integrate the GCN into training, resulting in an end-to-end framework. Different from Section 3.3, the semantic embeddings of all categories in the target domain are considered in the loss term:
$$\mathcal{L}_{gcn} = \frac{1}{K}\sum_{c=1}^{K} \big\lVert z_c - w_c^{t} \big\rVert_2^2, \qquad (6)$$
where $K$ is the number of target domain categories, $z_c$ is the GCN output for category $c$, and $w_c^{t}$ denotes the classifier weights of category $c$, obtained by extracting the weights of $C_t$, the classification layer for all categories in the target domain. Unlike the method proposed in [14], which fixes the classifier learned by the GCN and fine-tunes the features, the classifier in our model can adapt to the data while the semantic relationships among all categories are still maintained via the GCN.
3.7 Joint training
After initializing the classifier layer of UODTN with the GCN trained in Section 3.3, we train UODTN end to end with all the proposed techniques. The total loss is
$$\mathcal{L} = \mathcal{L}_{cls} + \lambda_1 \mathcal{L}_{sgmd} + \lambda_2 \mathcal{L}_{lb} + \lambda_3 \mathcal{L}_{gcn}, \qquad (7)$$
where $\mathcal{L}_{cls}$ is the classification loss on the labeled source domain, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the weights of the semantic-guided matching discrepancy loss, the limited balance constraint and the structure-preserving GCN loss, respectively. Specifically, minimizing the semantic-guided matching discrepancy provides domain-invariant features for the classifiers of both known and unknown categories. The classifiers of known categories receive both the supervision of the classification loss and the regularization of the GCN, whereas the classifiers of unknown categories are trained under the guidance of the limited balance constraint and the GCN. Joint training therefore allows a better trade-off of classification accuracy between known and unknown categories in the target domain. Minimizing the semantic-guided matching discrepancy propagates semantic information from the feature perspective, while the GCN propagates semantic embeddings from the semantic perspective. Since UODR is an ill-conditioned problem, the limited balance constraint prevents UODTN from converging to degenerate solutions during training.
4 Experiment
4.1 Datasets
We evaluate our method on two datasets: a small-scale dataset, I2AwA, and a large-scale dataset, I2WebV. The target domain of I2AwA is AwA2 [39], a replacement for the original AwA dataset for zero-shot learning. It consists of 50 animal classes, with a total of 37,322 images and an average of 746 images per class. We use the split proposed in [39], in which 40 classes are regarded as known categories and the remaining 10 classes as unknown categories. We collect a source domain dataset covering the 40 known categories via the Google image search engine and manually remove noisy images, resulting in 2,970 images in total. There is a domain discrepancy between the source and target domains, as shown in Figure 2. For I2WebV, the source domain is ILSVRC2012, which consists of 1,279,847 images from 1,000 classes. The target domain of I2WebV is the validation set of WebVision [18] with 5,000 classes, composed of 294,009 images. I2WebV is a very challenging dataset, as there is a large domain discrepancy between the two domains and a large number of unknown categories in the target domain, some of which are very different from the 1,000 known categories. The knowledge base we use for both I2AwA and I2WebV is WordNet [22].
4.2 Evaluation metrics
We perform classification over the whole label space of the target domain, as in generalized zero-shot learning, and report the top-1 accuracies on known categories, unknown categories and all categories in the target domain to better understand the knowledge transfer process.
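The evaluation protocol can be sketched as follows. The sketch assumes per-sample accuracy within each group, since the text does not state whether per-class averaging is used, and the function name is hypothetical.

```python
def split_top1(preds, labels, known):
    """Top-1 accuracy on known-class samples, unknown-class samples, and all samples.

    preds are arg-max class ids over the FULL target label space, so confusing an
    unknown-class sample with a known class counts as an error (generalized-ZSL
    style). Accuracy here is per-sample; per-class averaging is another convention.
    """
    def acc(idx):
        idx = list(idx)
        return sum(preds[i] == labels[i] for i in idx) / len(idx) if idx else float("nan")

    known_idx = [i for i, y in enumerate(labels) if y in known]
    unknown_idx = [i for i, y in enumerate(labels) if y not in known]
    return acc(known_idx), acc(unknown_idx), acc(range(len(labels)))


if __name__ == "__main__":
    labels = [0, 0, 1, 2, 2]  # classes 0 and 1 known; class 2 unknown
    preds = [0, 1, 1, 2, 0]   # the last unknown sample is absorbed by known class 0
    print(split_top1(preds, labels, known={0, 1}))  # known 2/3, unknown 0.5, all 0.6
```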
4.3 Baselines
We compare our method with several baselines: zGCN [38], two variants proposed in [14] (dGCN and adGCN), bGCN and pmdbGCN. zGCN is built upon a graph that utilizes both word vectors and the categorical relationships encoded in WordNet to predict the classifiers of unknown categories. Following zGCN, the authors of [14] utilize a denser graph structure (dGCN) and assign different weights to additional edges (adGCN). We also construct bGCN, a GCN with the original balance constraint proposed in the state-of-the-art transductive zero-shot learning method [32]. Furthermore, on the basis of bGCN, we implement another variant, pmdbGCN, which additionally reduces the population matching discrepancy [4], a state-of-the-art domain discrepancy measure that has shown superiority over MMD.
4.4 Implementation details
We construct two distinct graphs based on WordNet [22] for I2AwA and I2WebV, respectively. The graph nodes include all categories of the target domain together with their children and ancestors; the graphs for I2AwA and I2WebV contain 255 and 7,460 nodes, respectively. The word vectors for all categories are extracted with the GloVe text model [27] trained on Wikipedia and serve as the inputs of the GCN. We use ResNet-50 [12] pretrained on ILSVRC2012 as the base model, whose last fully connected layer (i.e., the classification layer) is the target that the GCN learns to predict. For I2WebV, we train the GCN with word vectors as inputs and the classifier of the pretrained ResNet-50 as the target to obtain the initial classifiers of the target domain. For I2AwA, the supervision for training the GCN is the classifier fine-tuned on the source domain of I2AwA. These initial classifiers are then attached to the feature extractor of the pretrained ResNet-50 (with its original classifier layer removed) to form a backbone classification network for the source and target domains. We freeze the first few convolutional layers of ResNet-50 to accelerate training. The global average pooling responses before the classification layer are used as features, from which we construct a bipartite graph whose two parts correspond to the source and target domains. We use the Hungarian algorithm to obtain minimum weight matched pairs for estimating both the population matching discrepancy [4] and our proposed semantic-guided matching discrepancy. Specifically, we use the discrepancy metric encoded in a domain discriminator as the distance metric in Eqn. (3). Computing a minimum weight matching on a bipartite graph built from a large-scale dataset is expensive, so we apply a simple divide-and-conquer strategy. Taking I2AwA as an example, we randomly divide the source and target domains into 5 folds each, construct 5 bipartite graphs, one per fold pair, and use the Hungarian algorithm to obtain the minimum weight matched pairs for each of them. All of our experiments are implemented with PyTorch [26]. More details can be found in our released code.
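A minimal sketch of this divide-and-conquer matching is given below. It assumes round-robin fold assignment after a random shuffle, Euclidean distance as the edge weight, and a plain-Python Hungarian solver (the classic potentials-based variant, which requires at least as many columns as rows); fold_matching and hungarian are hypothetical names. In practice, one could instead call scipy.optimize.linear_sum_assignment on each fold's cost matrix.

```python
import math
import random


def hungarian(cost):
    """Minimum-weight matching for an n x m cost matrix with n <= m.

    Classic O(n^2 m) potentials-based Hungarian algorithm (1-indexed inside).
    Returns match, where row i is assigned to the distinct column match[i].
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "need at least as many columns as rows"
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def fold_matching(src, tgt, k, dist, seed=0):
    """Match source to target instances fold by fold (divide and conquer).

    Both domains are shuffled and split round-robin into k folds; fold f of the
    source is matched only against fold f of the target.
    Returns (source_index, target_index) pairs.
    """
    rng = random.Random(seed)
    s_idx, t_idx = list(range(len(src))), list(range(len(tgt)))
    rng.shuffle(s_idx)
    rng.shuffle(t_idx)
    pairs = []
    for f in range(k):
        s_fold, t_fold = s_idx[f::k], t_idx[f::k]
        if not s_fold or not t_fold:
            continue
        if len(s_fold) <= len(t_fold):  # each source instance gets a target partner
            cost = [[dist(src[i], tgt[j]) for j in t_fold] for i in s_fold]
            pairs += [(s_fold[r], t_fold[c]) for r, c in enumerate(hungarian(cost))]
        else:  # more source than target: each target instance gets a source partner
            cost = [[dist(src[i], tgt[j]) for i in s_fold] for j in t_fold]
            pairs += [(s_fold[c], t_fold[r]) for r, c in enumerate(hungarian(cost))]
    return pairs


if __name__ == "__main__":
    rng = random.Random(1)
    src = [[rng.random(), rng.random()] for _ in range(10)]
    tgt = [[rng.random(), rng.random()] for _ in range(12)]
    pairs = fold_matching(src, tgt, k=5, dist=math.dist)
    print(len(pairs))  # 10: every source instance is matched once
```

Since the Hungarian algorithm costs O(n^3) on an n x n problem, splitting n instances into k folds reduces the total cost to about k * (n/k)^3 = n^3 / k^2, at the price of only matching instances that fall into the same fold.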
4.5 Results and discussion
The classification results on I2AwA and I2WebV are shown in Table 2 and Table 3, respectively. Our method UODTN outperforms all the baselines by considerable margins, achieving 3.7% and 0.9% improvements on unknown classes and all classes on I2AwA. For the more challenging I2WebV dataset, we implement two variants of UODTN with different loss weights, corresponding to different trade-offs between known and unknown categories. Aiming at higher average performance, UODTN (Avg.) achieves 9.9%, 0.2% and 2.2% improvements on known classes, unknown classes and all classes compared with bGCN. In contrast, UODTN (Ukn.), which pays more attention to unknown categories, improves the accuracy on unknown categories by 1.0%, while its overall top-1 accuracy is still higher than that of bGCN. Since WebVision contains 4,000 unknown categories, a 1.0% improvement is substantial progress given that no labels of unknown categories are available. We also make the following observations: (1) zGCN, dGCN and adGCN, which are obtained from the labeled source domain and the knowledge graph, cannot fit the target data well, as there is severe classification confusion between known and unknown categories. UODTN and bGCN improve over zGCN, dGCN and adGCN, indicating that fitting target domain data leads to better generalization. However, comparing bGCN with UODTN, we can see that merely introducing a balance constraint on classifier responses is insufficient, because there is a domain discrepancy between the source and target domains. This discrepancy yields suboptimal classifiers, which in turn cause distorted semantic embeddings when propagated to unknown categories in the target domain. (2) Merely reducing the domain discrepancy estimated by traditional methods leads to negative transfer, as revealed by the comparison between bGCN and pmdbGCN. Note that we assign a very small weight to the population matching discrepancy term to obtain the best results for pmdbGCN. In contrast, by reducing our proposed semantic-guided matching discrepancy, such negative transfer is avoided and more domain-invariant features are learned by UODTN, as illustrated in Section 4.6.
Table 4: Ablation study on I2AwA.
Method  Known  Unknown  All
zGCN [38]  77.2  21.0  65.0 
UODTN (lb)  83.9  32.5  73.0 
UODTN (lb+sgmd)  84.6  31.0  73.3 
UODTN (lb+sgmd+gcn)  84.7  31.7  73.5 
4.6 Ablation Study
To examine the efficacy of the semantic-guided matching discrepancy, the limited balance constraint and joint training of the GCN, we conduct an ablation study on I2AwA by evaluating several models (Table 4): (1) zGCN, without any of the techniques proposed in UODTN; (2) UODTN (lb), which includes only the limited balance constraint; (3) UODTN (lb+sgmd), which further adds the semantic-guided matching discrepancy reducing module; (4) UODTN (lb+sgmd+gcn), the full model with the limited balance constraint, the semantic-guided matching discrepancy reducing module and joint training of the GCN. UODTN (lb) outperforms zGCN [38] by a large margin, since the limited balance constraint prevents the classifier activations on known categories from growing abnormally. Moreover, Tables 2 and 4 show that UODTN (lb) outperforms bGCN, which demonstrates the superiority of the limited balance constraint over the original balance constraint [32]. Further, UODTN (lb+sgmd) improves the overall accuracy by 0.3% compared with UODTN (lb), indicating that reducing the semantic-guided matching discrepancy not only avoids negative transfer but also improves the domain invariance of the learned features. By further integrating the GCN for joint training, UODTN (lb+sgmd+gcn) improves over UODTN (lb+sgmd). This is reasonable, as the relationships among all known and unknown categories are essential for transferring effective semantic embeddings to unlabeled unknown categories. Joint training with the GCN progressively maintains the semantic structure encoded in word vectors and the knowledge graph, which contributes to the gains of UODTN.
4.7 Traditional domain adaptation
We conduct experiments on traditional domain adaptation to validate that the semantic-guided matching discrepancy (SGMD) is capable of handling DA. Here we simply adopt a standard distance, rather than the discriminator-based metric, in Eqn. (3). For I2WebV, the source domain is ImageNet and the target domain is the subset of WebVision that shares 1,000 categories with ImageNet. The first row of Table 5 shows that SGMD is slightly better than both PMD and MMD, demonstrating that the weighting mechanism is helpful for DA. Note that the matching is fixed, so PMD performs worse than MMD; nevertheless, SGMD still outperforms MMD, which validates the effectiveness of the weighting mechanism. Domain adaptation results on I2AwA are shown in the second row of Table 5. The discrepancy between the source and target domains of I2AwA is large and the source domain is small. Moreover, the categories in AwA2 are similar to each other, so domain adaptation on I2AwA is very challenging. With fixed matching, SGMD outperforms MMD and PMD significantly, which validates the superiority of SGMD.
Table 5: Traditional domain adaptation results.
Dataset  ResNet  MMD  PMD  SGMD
I2WebV (1K)  67.7  68.0  67.9  68.1 
I2AwA (40)  84.0  84.2  84.4  85.1 
4.8 Visualization
In Figure 3, we visualize the t-SNE embeddings of target domain images on I2AwA, using features extracted by the best competitor, bGCN, and by our model UODTN. For visual clarity, we show only 15 known categories and 3 unknown categories. The known categories include those related to the 3 unknown categories, to better show the influence between known and unknown categories. In Figure 3 (a), within the black box, the samples of an unknown category are mixed with those of a known category for bGCN. In contrast, in Figure 3 (b), the two categories are well separated by UODTN, which qualitatively verifies the effectiveness of the semantic-guided matching discrepancy, the limited balance constraint and joint training of the GCN in UODTN.
4.9 Illustrative examples
We show some qualitative results of UODTN in Figure 4. UODTN effectively transfers the semantic embeddings of the source domain to unknown categories in the target domain. This property mainly relies on joint training with the GCN, which preserves the semantic relationships between known and unknown categories while improving the discrimination ability of the classifier. Figure 4 provides some correct classification results of UODTN. For all instances, besides the true category, the classifiers of semantically related unknown/known categories are also activated with high confidence. This indicates that UODTN can effectively transfer knowledge from the labeled source domain, word vectors and the knowledge graph. More illustrative examples, including incorrect results, can be found in the supplementary material.
5 Conclusion
We explore the unsupervised open domain recognition problem, in which an unlabeled target domain and a discrepant labeled source domain covering only a subset of the target categories are given, and the goal is to classify all instances of the target domain. UODR is more challenging due to the semantic discrepancy between $\mathcal{S}$ and $\mathcal{T}$, which comprises both a large divergence in content and distribution between $\mathcal{S}$ and $\mathcal{T}$ and a semantic shift from known to unknown categories between the two domains. We develop the Unsupervised Open Domain Transfer Network (UODTN), which learns the backbone classification network and the GCN jointly by reducing the SGMD, achieving the limited balance, enforcing semantic structure preservation via the GCN, and minimizing the classification loss on $\mathcal{S}$. We collect two datasets for the UODR problem, and extensive experiments validate the effectiveness of UODTN. In future work, discriminating between known and unknown categories to alleviate the semantic shift in the UODR problem is also worth studying; this is nontrivial since no explicit function is available to distinguish known from unknown categories.
6 Acknowledgement
This work was supported in part by National Natural Science Foundation of China: 61672497, 61620106009, U1636214 and 61836002, in part by National Basic Research Program of China (973 Program): 2015CB351800, and in part by Key Research Program of Frontier Sciences of CAS: QYZDJ-SSW-SYS013.
References
 [1] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. In CVPR, volume 1, page 7, 2017.
 [2] P. P. Busto and J. Gall. Open set domain adaptation. In ICCV, pages 754–763, 2017.
 [3] Z. Cao, M. Long, J. Wang, and M. I. Jordan. Partial transfer learning with selective adversarial networks. arXiv preprint arXiv:1707.07901, 2017.
 [4] J. Chen, C. Li, Y. Ru, and J. Zhu. Population matching discrepancy and applications in deep learning. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, NIPS, pages 6262–6272. Curran Associates, Inc., 2017.
 [5] N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy. Optimal transport for domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(9):1853–1865, 2017.
 [6] J. Deng, N. Ding, Y. Jia, A. Frome, K. Murphy, S. Bengio, Y. Li, H. Neven, and H. Adam. Large-scale object classification using label relation graphs. In ECCV, pages 48–64. Springer, 2014.
 [7] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
 [8] M. Ghifary, W. Bastiaan Kleijn, M. Zhang, and D. Balduzzi. Domain generalization for object recognition with multi-task autoencoders. In ICCV, pages 2551–2559, 2015.
 [9] M. Ghifary, W. B. Kleijn, M. Zhang, D. Balduzzi, and W. Li. Deep reconstruction-classification networks for unsupervised domain adaptation. In ECCV, pages 597–613. Springer, 2016.
 [10] B. Gong, Y. Shi, F. Sha, and K. Grauman. Geodesic flow kernel for unsupervised domain adaptation. In CVPR, pages 2066–2073. IEEE, 2012.
 [11] Y. Guo, G. Ding, X. Jin, and J. Wang. Transductive zero-shot recognition via shared model space learning. In AAAI, volume 3, page 8, 2016.
 [12] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. pages 770–778, 2015.
 [13] J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. A. Efros, and T. Darrell. CyCADA: Cycle-consistent adversarial domain adaptation. arXiv preprint arXiv:1711.03213, 2017.
 [14] M. Kampffmeyer, Y. Chen, X. Liang, H. Wang, Y. Zhang, and E. P. Xing. Rethinking knowledge graph propagation for zero-shot learning. arXiv preprint arXiv:1805.11724, 2018.
 [15] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
 [16] E. Kodirov, T. Xiang, Z. Fu, and S. Gong. Unsupervised domain adaptation for zero-shot learning. In ICCV, pages 2452–2460, 2015.
 [17] E. Kodirov, T. Xiang, and S. Gong. Semantic autoencoder for zero-shot learning. arXiv preprint arXiv:1704.08345, 2017.
 [18] W. Li, L. Wang, W. Li, E. Agustsson, and L. Van Gool. WebVision database: Visual learning and understanding from web data. arXiv preprint arXiv:1708.02862, 2017.
 [19] M.-Y. Liu and O. Tuzel. Coupled generative adversarial networks. In NIPS, pages 469–477, 2016.
 [20] M. Long, Y. Cao, J. Wang, and M. I. Jordan. Learning transferable features with deep adaptation networks. arXiv preprint arXiv:1502.02791, 2015.
 [21] M. Long, H. Zhu, J. Wang, and M. I. Jordan. Deep transfer learning with joint adaptation networks. arXiv preprint arXiv:1605.06636, 2016.
 [22] G. A. Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
 [23] M. Norouzi, T. Mikolov, S. Bengio, Y. Singer, J. Shlens, A. Frome, G. S. Corrado, and J. Dean. Zero-shot learning by convex combination of semantic embeddings. arXiv preprint arXiv:1312.5650, 2013.
 [24] M. Palatucci, D. Pomerleau, G. E. Hinton, and T. M. Mitchell. Zero-shot learning with semantic output codes. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, NIPS, pages 1410–1418. Curran Associates, Inc., 2009.
 [25] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199–210, 2011.
 [26] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. 2017.
 [27] J. Pennington, R. Socher, and C. Manning. GloVe: Global vectors for word representation. In Conference on Empirical Methods in Natural Language Processing, pages 1532–1543, 2014.
 [28] M. Rohrbach, S. Ebert, and B. Schiele. Transfer learning in a transductive setting. In NIPS, pages 46–54, 2013.
 [29] B. Romera-Paredes and P. Torr. An embarrassingly simple approach to zero-shot learning. In ICML, pages 2152–2161, 2015.
 [30] R. Salakhutdinov, A. Torralba, and J. Tenenbaum. Learning to share visual appearance for multiclass object detection. In CVPR, pages 1481–1488. IEEE, 2011.
 [31] R. Socher, M. Ganjoo, C. D. Manning, and A. Ng. Zero-shot learning through cross-modal transfer. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, NIPS, pages 935–943. Curran Associates, Inc., 2013.
 [32] J. Song, C. Shen, Y. Yang, Y. Liu, and M. Song. Transductive unbiased embedding for zero-shot learning. In CVPR, pages 1024–1033, 2018.
 [33] B. Sun and K. Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In ECCV, pages 443–450. Springer, 2016.
 [34] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. In CVPR, volume 1, page 4, 2017.
 [35] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474, 2014.
 [36] R. Volpi, P. Morerio, S. Savarese, and V. Murino. Adversarial feature augmentation for unsupervised domain adaptation. In CVPR, pages 5495–5504, 2018.
 [37] S. Wang, S. Jiang, Q. Huang, and Q. Tian. Multi-feature metric learning with knowledge transfer among semantics and social tagging. In CVPR, pages 2240–2247. IEEE, 2012.
 [38] X. Wang, Y. Ye, and A. Gupta. Zero-shot recognition via semantic embeddings and knowledge graphs. In CVPR, pages 6857–6866, 2018.
 [39] Y. Xian, B. Schiele, and Z. Akata. Zero-shot learning - the good, the bad and the ugly. arXiv preprint arXiv:1703.04394, 2017.
 [40] J. Zhang, Z. Ding, W. Li, and P. Ogunbona. Importance weighted adversarial nets for partial domain adaptation. In CVPR, pages 8156–8164, 2018.
 [41] L. Zhang, T. Xiang, S. Gong, et al. Learning a deep embedding model for zero-shot learning. 2017.
 [42] J. Zhuo, S. Wang, W. Zhang, and Q. Huang. Deep unsupervised convolutional domain adaptation. In Proceedings of the 2017 ACM on Multimedia Conference, pages 261–269. ACM, 2017.