Exploiting known semantic relationships between fine-grained tasks is critical to the success of recent model-agnostic approaches. These approaches often rely on meta-optimization to make a model robust to systematic task or domain shifts. However, in practice, the performance of these methods can suffer when there are no coherent semantic relationships between the tasks (or domains). We present Invenio, a structured meta-learning algorithm to infer semantic similarities between a given set of tasks and to provide insights into the complexity of transferring knowledge between different tasks. In contrast to existing techniques such as Task2Vec and Taskonomy, which measure similarities between pre-trained models, our approach employs a novel self-supervised learning strategy to discover these relationships in the training loop and, at the same time, utilizes them to update task-specific models in the meta-update step. Using challenging task and domain databases under few-shot learning settings, we show that Invenio can discover intricate dependencies between tasks or domains and can provide significant gains over existing approaches in terms of generalization performance. The learned semantic structure between tasks/domains from Invenio is interpretable and can be used to construct meaningful priors for tasks or domains.
Invenio: Discovering Hidden Relationships Between Tasks/Domains Using Structured Meta Learning

The first two authors contributed equally.
Sameeksha Katoch Kowshik Thopalli Jayaraman J. Thiagarajan
Pavan Turaga Andreas Spanias
Arizona State University, Lawrence Livermore National Labs
The success of deep learning in a wide variety of AI applications can be partly attributed to its ability to be re-purposed for novel tasks or operating environments. This is particularly crucial in data-hungry scenarios, e.g., few-shot learning, where datasets or models from related tasks can be effectively leveraged to solve the task at hand. Though transfer learning methods have been proposed for applications including computer vision, language processing [4, 20] and medical image analysis, even the more sophisticated approaches (based on deep neural networks) are often found to be brittle when applied in scenarios characterized by challenging domain and task shifts. Consequently, it is imperative to quantify the degree of similarity between the training and testing scenarios (domain or task) in order to assess whether a model can be effectively re-purposed. This naturally calls for approaches that can reason about the semantic space of tasks (or domains) and quantify how difficult it is to transfer from one scenario to another. In this spirit, Achille et al. recently proposed an information-theoretic framework for characterizing the complexity of tasks through the information in the parameters of a deep neural network, and designed an asymmetric distance function between tasks that was shown to be strongly correlated with the ease of transfer learning based on model fine-tuning. A similar distance function was utilized by Task2Vec to produce embeddings that describe the semantic space of tasks. More specifically, this distance is computed from the Fisher information of the parameters of networks trained on the different tasks, and hence implicitly assumes access to sufficient training data. As a result, it is not suitable for comparing tasks in few-shot learning scenarios.
In this paper, we represent both tasks and domains by few-shot datasets of input images and discrete output labels, and our goal is to infer hidden relationships between tasks or domains. We present Invenio, a scalable, model-agnostic (meta-learning) approach that can effectively infer the semantic structure of the space of tasks (or domains) and, at the same time, leverage the inferred relationships to perform task-specific model optimization. In other words, instead of explicitly performing model fine-tuning between two similar tasks, Invenio identifies the structure that is maximally beneficial for transfer learning across the entire set of tasks or domains.
Proposed Work. Recent meta-optimization approaches for few-shot domain/task generalization [5, 10] attempt to learn a single model on a set of observed tasks, which is assumed to be only a few gradient descent steps away from good task-specific models. Their success hinges on the assumption that the observed tasks or domains are realizations from a common distribution. However, in practice, the degree of similarity between tasks or domains is unknown a priori, and hence the assumption that a single base learner suffices can be restrictive. In contrast, Invenio makes a more general assumption: there exists an inherent semantic space of tasks/domains, wherein information from each subset of related tasks (or domains) can be used to make a task-specific learner effective. To this end, we develop a structured meta-learning algorithm (Figure 1) that infers semantic similarities between different tasks (or domains), while also obtaining generalizable base learners. More specifically, our approach allows each task (or, equivalently, domain) to use separate model parameters while enabling information sharing between related tasks, and trains them for generalization using gradient-through-gradient style optimization. A crucial outcome of the proposed approach is a structured semantic space of tasks (or domains), which can be utilized to build powerful task/domain priors. To demonstrate the use of Invenio, we design challenging task and domain databases and show that the inferred semantic relationships are highly interpretable. More importantly, Invenio provides significant gains in generalization performance when compared to conventional strategies.
Our contributions can be summarized as follows:
- Unfolding the inherent semantic structure of tasks/domains in few-shot learning settings, which can be used for designing effective priors.
- A structured meta-learning algorithm that leverages the inferred similarities to build highly generalizable models.
- Empirical studies with custom task and domain databases showing the effectiveness of Invenio in identifying meaningful relationships that enable improved generalization.
2 Problem Setup
In this paper, we consider systematic task and domain shifts in image classification, and explore the use of a model agnostic approach for inferring semantic similarities between them using few-shot datasets. We begin by describing the overall setup and assumptions.
We represent each task or domain as a few-shot dataset of input images and an output space of labels. We denote the set of labeled datasets corresponding to a set of $N$ tasks $\{\mathcal{T}_i\}_{i=1}^{N}$ by $\{\mathcal{D}_i\}_{i=1}^{N}$, where $\mathcal{D}_i = \{(x_k^i, y_k^i)\}_{k=1}^{n_i}$. Each domain is defined similarly using a finite dataset. We consider a few-shot setting, wherein each dataset $\mathcal{D}_i$ comprises $n_i$ labeled examples (the $n_i$ are not assumed to be equal), and we assume access to observed data from all tasks (or domains). In existing approaches for task and domain generalization, each task (or domain) is assumed to be a realization from a common unknown distribution $p(\mathcal{T})$ (or $p(\mathcal{D})$), and the tasks are expected to be related to each other for guaranteed success. However, this assumption is highly restrictive, and oftentimes the relationships between tasks or domains are not known a priori. Hence, we develop a structured meta-learning technique that infers the relationships between tasks/domains and simultaneously leverages this information to produce task/domain-specific models with improved performance.
Domain Design. Domain shifts correspond to variations in the data statistics that can render a trained model ineffective, particularly when we do not have access to data that is representative of the testing scenario. In this case, we assume that the datasets correspond to solving the same task, i.e., the same input and output label spaces, but are characterized by differences in the marginal feature distribution $p_i(x)$, with an identical conditional distribution $p(y \mid x)$. Example domains that we consider in our experiments include a variety of image transformations such as scaling, color transformations and rotation.
Task Design. In this case, we assume each dataset corresponds to a binary classification problem of detecting the presence of a specific object, while the negative class is heterogeneous (contains images from multiple classes). Further, we assume that there are no inherent domain shifts, i.e., the marginal feature distributions are identical.
Handling Task/Domain Shifts: Broadly, approaches for dealing with domain shifts can be categorized into domain adaptation [7, 15, 16] and domain generalization techniques. While the former adapts a pre-trained model using unlabeled or sparsely labeled target-domain data, the latter aims to design a model that works well even in unseen target domains. When data from multiple domains are available at train time, one can utilize multi-domain learning methods, which attempt to extract domain-agnostic feature representations with the hope that these common factors persist in unseen test domains. More recently, model-agnostic approaches that rely on meta-optimization (learning to learn) to improve the generalization of a base learner have attracted a surge of interest. On the other hand, combating task shifts requires controlled knowledge transfer from a model pre-trained on a task related to the target task. When compared to conventional multi-task learning methods, model-agnostic meta-learning has been found to be effective in few-shot learning scenarios.
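Model Agnostic Task/Domain Generalization: Following [10, 5], we now derive a generic model-agnostic approach that applies to both task and domain generalization and forms the core of Invenio. Though we develop the formulation for tasks $\{\mathcal{T}_i\}_{i=1}^{N}$, it is directly applicable to domains as well. To enable a unified model with parameters $\Theta$ to generalize to all observed tasks, we first split the set of tasks into meta-train and meta-test tasks, $\mathcal{T}_{tr}$ and $\mathcal{T}_{te}$ respectively.

Given the prediction $\hat{y} = f_{\Theta}(x)$ for a sample $x$ from task $\mathcal{T}_i$, we use the task-specific loss function $\ell_i$ to measure its fidelity; we write $\ell_i(\Theta)$ for this loss aggregated over the data of $\mathcal{T}_i$. A typical meta-optimization strategy consists of two steps, referred to as the meta-train and meta-test steps. In the meta-train step, the model parameters are updated using the aggregated losses from the meta-train tasks:

$$\mathcal{F}(\Theta) = \frac{1}{|\mathcal{T}_{tr}|}\sum_{\mathcal{T}_i \in \mathcal{T}_{tr}} \ell_i(\Theta).$$

This loss function is parameterized by $\Theta$, and hence the gradients are computed with respect to it, $\nabla_{\Theta}\mathcal{F}(\Theta)$, giving the updated parameters

$$\Theta' = \Theta - \alpha\,\nabla_{\Theta}\mathcal{F}(\Theta),$$

where $\alpha$ denotes the step size. In the meta-test step, the updated parameters $\Theta'$ are evaluated on the meta-test tasks to virtually measure generalization performance. Consequently, the aggregated loss on the meta-test tasks obtained with the updated parameters can be written as

$$\mathcal{G}(\Theta') = \frac{1}{|\mathcal{T}_{te}|}\sum_{\mathcal{T}_j \in \mathcal{T}_{te}} \ell_j(\Theta').$$

Our goal is to update the parameters $\Theta$ such that they are effective for both meta-train and meta-test tasks. Hence, the overall objective is

$$\min_{\Theta}\; \mathcal{F}(\Theta) + \beta\,\mathcal{G}\big(\Theta - \alpha\,\nabla_{\Theta}\mathcal{F}(\Theta)\big),$$

where $\beta$ weights the meta-test term. To understand this objective intuitively, we follow the analysis in [10] and perform a first-order Taylor expansion of the second term to obtain

$$\min_{\Theta}\; \mathcal{F}(\Theta) + \beta\,\mathcal{G}(\Theta) - \alpha\beta\,\nabla_{\Theta}\mathcal{F}(\Theta)\cdot\nabla_{\Theta}\mathcal{G}(\Theta),$$

where the expansion is carried out around $\Theta$. Intuitively, the meta-optimization process amounts to minimizing the losses on both the meta-train and meta-test tasks while maximizing the dot product between the gradients from the train and test tasks.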
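To make the Taylor-expansion argument concrete, here is a minimal sketch, assuming toy quadratic task losses $\tfrac{1}{2}\lVert\Theta - c_k\rVert^2$ with hypothetical centers $c_k$ (not the classifiers used in our experiments). It evaluates the exact meta-objective (meta-train loss plus weighted meta-test loss after one inner gradient step) and its first-order expansion; the helper names, and `alpha`/`beta` for the step size and the weight on the meta-test term, are illustrative choices for this sketch.

```python
# Toy check of the meta-train / meta-test objective and its first-order expansion.
# ASSUMPTION (hypothetical): every task loss is l_k(theta) = 0.5 * ||theta - c_k||^2.

def loss(theta, c):
    return 0.5 * sum((t - ci) ** 2 for t, ci in zip(theta, c))

def grad(theta, c):
    # Gradient of 0.5 * ||theta - c||^2 with respect to theta.
    return [t - ci for t, ci in zip(theta, c)]

def mean_loss(theta, centers):
    # Meta-train loss F or meta-test loss G: average loss over a set of tasks.
    return sum(loss(theta, c) for c in centers) / len(centers)

def mean_grad(theta, centers):
    grads = [grad(theta, c) for c in centers]
    return [sum(g[d] for g in grads) / len(grads) for d in range(len(theta))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def exact_objective(theta, train, test, alpha, beta):
    # F(theta) + beta * G(theta - alpha * grad F(theta))
    g_f = mean_grad(theta, train)
    theta_prime = [t - alpha * g for t, g in zip(theta, g_f)]
    return mean_loss(theta, train) + beta * mean_loss(theta_prime, test)

def first_order_objective(theta, train, test, alpha, beta):
    # F(theta) + beta * G(theta) - alpha * beta * <grad F(theta), grad G(theta)>
    return (mean_loss(theta, train) + beta * mean_loss(theta, test)
            - alpha * beta * dot(mean_grad(theta, train), mean_grad(theta, test)))

theta = [0.0, 0.0]
train = [[1.0, 0.0], [0.8, 0.2]]   # hypothetical meta-train task centers
test = [[0.9, 0.1]]                # a related meta-test task
exact = exact_objective(theta, train, test, alpha=0.01, beta=1.0)
approx = first_order_objective(theta, train, test, alpha=0.01, beta=1.0)
```

Because these toy losses have an identity Hessian, the two objectives differ by exactly $\beta\alpha^2/2$ times the squared norm of the meta-train gradient (about $4.1\times10^{-5}$ here), and the positive gradient dot product reflects the meta-train and meta-test tasks pulling the parameters in a similar direction.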
4 Proposed Approach
In this section, we present the proposed approach for inferring the semantic space of tasks or domains using only few-shot datasets. Without loss of generality, we set up the formulation for the task case, though it is applicable to domains as well. In contrast to existing meta-optimization approaches, we express the learner for each task $\mathcal{T}_i$ as a task-specific transformation $f_{\theta_i}$ described by parameters $\theta_i$, which maps input images from that task to its output label space $\mathcal{Y}_i$. Consequently, for a sample $x$ from task $\mathcal{T}_i$, i.e., $x \in \mathcal{D}_i$, the prediction is obtained as $\hat{y} = f_{\theta_i}(x)$. In our setup, we implement all these learners as deep networks (e.g., convolutional networks). As described earlier, our formulation for tasks follows classical multi-task learning, where all datasets are assumed to be drawn from the same input distribution (no covariate shifts) and differ only through task shifts. Further, we assume that each task solves a binary classification problem, e.g., detecting the presence of a certain object in images.
We now describe our approach, which reveals the semantic relationships between tasks and, at the same time, exploits this semantic similarity to produce improved generalization for all observed tasks. Though there are fundamental differences in how tasks and domains are defined in our setup (see Section 2), the meta-learning style optimization of Invenio is applicable to both.
Structured Meta-Learning with Task-Specific Transformations. We follow the notations introduced in Section 3. The parameters $\theta_i$ of each task-specific transformation $f_{\theta_i}$ are learned such that $f_{\theta_i}$ generalizes to semantically related tasks. We propose a novel structured meta-learning formulation to achieve this objective. During the meta-train step, for each meta-train task $\mathcal{T}_i \in \mathcal{T}_{tr}$ with dataset $\mathcal{D}_i$, we compute the loss based on binary cross-entropy (cross-entropy for multi-class classification in the domain case) as follows:

$$\ell_i(\theta_i) = \frac{1}{|\mathcal{D}_i|}\sum_{(x,y)\in\mathcal{D}_i} \mathrm{BCE}\big(f_{\theta_i}(x),\, y\big).$$

Here, the subscript $i$ of the loss indicates that the data samples from task $\mathcal{T}_i$ are used to evaluate the loss function with model parameters $\theta_i$. The parameters are then updated using the gradient step

$$\theta_i' = \theta_i - \alpha\,\nabla_{\theta_i}\ell_i(\theta_i).$$

To ensure that the learned transformation generalizes to related tasks, we first need to quantify the similarity between a meta-train task $\mathcal{T}_i$ and a meta-test task $\mathcal{T}_j$. More specifically, we measure the similarity between two tasks as the dot product between the gradients, with respect to the weights $\theta_i$, of the losses evaluated on the data of $\mathcal{T}_i$ and of $\mathcal{T}_j$ respectively (so $\ell_j(\theta_i)$ denotes the loss on the data of $\mathcal{T}_j$ under the parameters $\theta_i$). Mathematically,

$$s_{ij} = \sum_{w \in \theta_i} \frac{\partial \ell_i(\theta_i)}{\partial w}\,\frac{\partial \ell_j(\theta_i)}{\partial w}. \tag{7}$$

The summation in the above expression is over all parameters $w$ in the set $\theta_i$, and each gradient estimate is obtained by summing over all mini-batches. Intuitively, input images from both tasks $\mathcal{T}_i$ and $\mathcal{T}_j$ are processed by the same transformation $f_{\theta_i}$, and we expect the tasks to be related if the gradients for updating $\theta_i$ point in a similar direction. This intuition is consistent with existing formulations [1, 2], where it was shown that the gradients of the weights of a neural network with respect to a task-specific loss are a meaningful representation of the task itself. Hence, the aggregated meta-test loss for updating the parameters $\theta_i$ can be expressed as

$$\mathcal{G}_i(\theta_i') = \sum_{\mathcal{T}_j \in \mathcal{T}_{te}} \bar{s}_{ij}\,\ell_j(\theta_i').$$

Note that the scores $s_{ij}$ over the set of meta-test tasks are normalized to sum to one to obtain $\bar{s}_{ij}$. As can be seen in the above expression, the similarity scores between tasks determine which meta-test tasks the parameters $\theta_i$ must generalize to, and this naturally induces a semantic structure over the set of input tasks. The overall objective for updating each $\theta_i$ using the structured meta-optimization can thus be written as

$$\min_{\theta_i}\; \ell_i(\theta_i) + \beta\,\mathcal{G}_i\big(\theta_i - \alpha\,\nabla_{\theta_i}\ell_i(\theta_i)\big),$$

where $\beta$ weights the meta-test term.
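The structured meta-update for a single task can be sketched in pure Python. This is a minimal illustration under stated assumptions, not our implementation: task losses are toy quadratics with hypothetical centers, the similarity scores are normalized with a softmax (the text only requires the weights to sum to one), and the normalized weights are treated as constants when differentiating. The sketch computes the gradient dot-product similarity between a meta-train task and each meta-test task, turns the scores into weights, and takes one gradient step on the meta-train loss plus the similarity-weighted meta-test loss evaluated after the inner update (gradient through the inner gradient step).

```python
import math

# ASSUMPTION (toy, hypothetical): task k has loss l_k(theta) = 0.5 * ||theta - c_k||^2,
# so grad l_k(theta) = theta - c_k and the inner update has Jacobian (1 - alpha) * I.

def loss(theta, c):
    return 0.5 * sum((t - ci) ** 2 for t, ci in zip(theta, c))

def grad(theta, c):
    return [t - ci for t, ci in zip(theta, c)]

def dot(u, v):
    # Sum over all parameters w of (dl_i/dw) * (dl_j/dw), as in (7).
    return sum(a * b for a, b in zip(u, v))

def normalize(scores):
    # ASSUMPTION: softmax is one way to make the weights sum to one;
    # the normalization scheme itself is not specified in the text.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def objective(theta_i, c_i, test_centers, weights, alpha, beta):
    # l_i(theta_i) + beta * sum_j s_bar_ij * l_j(theta_i - alpha * grad l_i(theta_i))
    theta_p = [t - alpha * g for t, g in zip(theta_i, grad(theta_i, c_i))]
    return loss(theta_i, c_i) + beta * sum(
        w * loss(theta_p, c_j) for w, c_j in zip(weights, test_centers))

def structured_step(theta_i, c_i, test_centers, alpha, beta, lr):
    """One structured meta-update of theta_i; returns (new_theta_i, weights)."""
    g_i = grad(theta_i, c_i)
    # Both gradients are taken with respect to the same weights theta_i.
    s = [dot(g_i, grad(theta_i, c_j)) for c_j in test_centers]
    weights = normalize(s)            # s_bar_ij, treated as constants here
    theta_p = [t - alpha * g for t, g in zip(theta_i, g_i)]
    total = list(g_i)                 # gradient of the meta-train term
    for w, c_j in zip(weights, test_centers):
        # Gradient through the inner step: (1 - alpha) * grad l_j(theta_p)
        for d, g in enumerate(grad(theta_p, c_j)):
            total[d] += beta * w * (1 - alpha) * g
    return [t - lr * g for t, g in zip(theta_i, total)], weights

theta0, c_i = [0.0, 0.0], [1.0, 1.0]
test_centers = [[1.2, 0.8], [-1.0, -1.0]]   # a related and an unrelated task
theta1, w = structured_step(theta0, c_i, test_centers, alpha=0.1, beta=1.0, lr=0.1)
```

With these values the related task receives a weight of roughly 0.98 and the unrelated one roughly 0.02, so the update pulls $\theta_i$ mostly toward the related task. In a real network, both gradients would come from an autodiff framework, and the scale of the dot products would make a softmax temperature an additional (hypothetical) design choice.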
As mentioned above, the proposed strategy infers a semantic space of tasks or domains. Because this semantic structure emerges as an inherent part of the optimization, it is a useful tool for generalizing to new test scenarios, developing new tasks, investigating semantic similarities and uncovering new relationships. A detailed outline of our approach is given in Algorithm 1.
5 Experiment: Task Shifts
In order to demonstrate Invenio, we construct a custom task database, study the inferred semantic structure, and perform quantitative evaluation in terms of classification performance across all tasks. Our database is a collection of tasks sampled from four different benchmark image classification datasets, with the underlying assumption that there are no inherent covariate shifts. As described earlier, our focus is on the few-shot learning scenarios and we do not assume prior knowledge of the underlying relationships between the constituent tasks.
Task Database Design. Our task database consists of binary classification tasks sampled from four different datasets, namely CUB, DeepFashion, iMaterialist and iNaturalist. While the positive class in each task corresponds to a specific image class from one of the datasets, the negative class contains randomly chosen images from all datasets.
CUB200: The Caltech-UCSD Birds dataset contains a total of images from different bird species. We randomly selected categories and included them as different binary classification tasks.
Deep Fashion: This is a large-scale clothes database with diverse images from categories. We used images from randomly selected categories to construct our tasks.
iMaterialist: We considered categories from this large-scale fashion dataset to form our tasks. Note that this curation was performed such that some of the chosen categories overlap with those selected from Deep Fashion.
iNaturalist: This is a large-scale species detection dataset. We sampled categories from broad taxonomic classes such as Mammalia, Reptilia, Aves, etc. Note that, in the selected set of tasks, tasks correspond to birds from the Aves category, and hence we expect to observe semantic relevance to tasks from the CUB database.
By design, there is partial overlap in tasks between iNaturalist and CUB datasets, and similarly between iMaterialist and DeepFashion, while simultaneously there is a clear disconnect between fashion and species datasets. Such a design enables us to evaluate our approach and reason about the discovered semantic structure between tasks. Each binary classification problem contains a maximum of positive samples (from a specific image class), while another randomly chosen samples from the remaining set of tasks constitute the negative class.
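The task construction described above can be sketched as follows. The class pools, sample identifiers and the `max_pos`/`n_neg` caps are hypothetical placeholders for the quantities described in the text; the function only illustrates drawing a capped set of positives from one class and a heterogeneous negative class sampled from all other classes.

```python
import random

def make_binary_task(pools, positive_class, max_pos, n_neg, seed=0):
    """One binary task: label 1 for `positive_class`, label 0 for a
    heterogeneous negative class drawn at random from all other classes.

    pools: hypothetical mapping {class_name: [sample ids]} (e.g. image paths).
    max_pos / n_neg: caps on positives and negatives (placeholder values).
    """
    rng = random.Random(seed)
    positives = list(pools[positive_class])
    rng.shuffle(positives)
    positives = positives[:max_pos]            # at most max_pos positives
    others = [x for name, items in pools.items()
              if name != positive_class for x in items]
    negatives = rng.sample(others, min(n_neg, len(others)))  # no replacement
    data = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
    rng.shuffle(data)
    return data

pools = {
    "cub/flycatcher": ["f1", "f2", "f3"],
    "deepfashion/blazer": ["b1", "b2"],
    "inat/iguana": ["i1", "i2", "i3"],
}
task = make_binary_task(pools, "cub/flycatcher", max_pos=2, n_neg=3)
```

Because negatives are pooled across datasets, a single task can mix clothing and species images in its negative class, which is what makes the negative class heterogeneous.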
Architecture. We use the same model architecture for each task, but with individual parameter sets: Conv(3,20,3,3), ReLU, MaxPool, Conv(20,50,3,3), ReLU, MaxPool, Linear(2450,500), ReLU, Linear(500,1), ReLU. We resize all images to pixels. The learning rates for the meta-train and meta-test phases were set to e and e respectively.
Table 1: Meta-test batch size.
Semantic Space of Tasks. Invenio jointly infers the inherent semantic structure and optimizes the task-specific model parameters through a structured meta-learning approach. To analyze the inferred semantic space, upon completion of the training process we compute the pairwise similarities between all tasks using (7). Denoting the similarity matrix by $S$, we perform truncated SVD to obtain low-dimensional embeddings for analysis and visualization of the task relationships. Figure 6(a) shows a low-dimensional visualization of the task space, which clearly reveals a separation between the fashion and species datasets. In Figure 2, we show the most similar tasks for each query task (leftmost in each row) in the embedding space. We find that the semantics inferred by Invenio match human knowledge in these examples, and hence can be expected to lead to improved generalization when the tasks are solved jointly. For example, the Formal Dresses task is found to be semantically similar to other types of dresses, Tees, Cardigans, etc., while Eurycea (a salamander) lies in the neighborhood of reptiles, e.g., Iguana and Varanus. However, we also find some unexpected relationships, such as Flycatcher and Fistularia Commersonii (fish), or Euphydryas (butterfly) and Populus (plant), which arise from the occurrence of similar visual patterns even though the tasks are semantically unrelated.
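The embedding step can be sketched without external libraries. This is a minimal illustration, assuming the similarity matrix is used as-is (whether it is symmetrized is not stated); power iteration with deflation on $SS^\top$ stands in for a library truncated SVD, and the toy two-block matrix is hypothetical. Rows of $U_k\Sigma_k$ serve as task embeddings, and nearest neighbors are found by Euclidean distance.

```python
import math
import random

def gram(S):
    # M = S S^T: symmetric PSD; its eigenvectors are the left singular vectors of S.
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in S] for ri in S]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def top_eigenpairs(M, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    n = len(M)
    M = [row[:] for row in M]
    pairs = []
    for _ in range(k):
        v = [rng.uniform(0.1, 1.0) for _ in range(n)]
        nv = math.sqrt(sum(x * x for x in v))
        v = [x / nv for x in v]
        for _ in range(iters):
            w = matvec(M, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(M, v)))
        pairs.append((lam, v))
        # Deflation: remove the found component before searching for the next one.
        M = [[M[r][c] - lam * v[r] * v[c] for c in range(n)] for r in range(n)]
    return pairs

def embed(S, k=2):
    # Row t of U_k * Sigma_k from a rank-k truncated SVD of S (sigma = sqrt(lambda)).
    pairs = top_eigenpairs(gram(S), k)
    return [[v[t] * math.sqrt(max(lam, 0.0)) for lam, v in pairs]
            for t in range(len(S))]

def nearest_tasks(emb, query, m):
    # The m tasks closest to `query` in the embedding space (Euclidean distance).
    others = [t for t in range(len(emb)) if t != query]
    return sorted(others, key=lambda t: math.dist(emb[query], emb[t]))[:m]

# Hypothetical toy similarities: tasks 0-2 form one group, tasks 3-5 another.
S = [[1.0 if (i < 3) == (j < 3) else 0.1 for j in range(6)] for i in range(6)]
emb = embed(S, k=2)
neighbors = nearest_tasks(emb, 0, 2)   # the two other tasks of the first group
```

In practice a library routine such as scikit-learn's `TruncatedSVD` would replace the hand-rolled power iteration; the sketch only shows how block structure in the similarity matrix becomes neighborhood structure in the embedding.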
Performance Evaluation: Our hypothesis is that by incorporating the inferred semantic structure into the learning process, we can improve the quality of the task-specific predictive models. To this end, we evaluate the classification performance of Invenio on a held-out test set for each of the tasks in the database. For comparison, we consider two popular baselines: (i) Transfer Learning: This is the most commonly adopted strategy for task adaptation. We train a model with the same architecture as ours on the complete CIFAR10 dataset and subsequently fine-tune it using labeled data from each of the tasks independently; and (ii) Shared Model: This approach assumes a model shared across all the tasks and employs the model-agnostic meta-learning (MAML) technique to optimize the model parameters such that they generalize to the entire set of tasks. We use accuracy as the performance metric and report results on the held-out test set in all cases. Note that the proposed approach can be viewed as a generalization of the Shared Model baseline, wherein the task relationships are exploited while updating the task-specific model parameters.
Figure 6(b) compares the classification performance of the different approaches on the entire task database. In particular, we show the median test accuracy across all tasks, along with the lower and upper quantiles. As evidenced by the plot, by exploiting the semantic structure of the space of tasks, the proposed approach significantly outperforms the baseline methods. A fine-grained, per-task comparison of Invenio with Transfer Learning can be found in Figure 6(c). A critical parameter of the proposed approach in Algorithm 1 is the meta-test batch size. While utilizing the entire meta-test set would allow us to identify the relationships most effectively, it is not computationally feasible; hence, in practice, we use a smaller batch size. From Table 1, we find that increasing the batch size leads to appreciable improvements up to a point, beyond which we observe no significant improvements.
While the examples shown in Figure 2 reasonably agree with our understanding of task similarities, we also find cases (see Figure 7) where the relationships are not easily justified. Nevertheless, the inferred structure still provides significant performance gains. For example, for the Blazers task, the neighborhood is highly diverse, yet Invenio achieves a higher test accuracy than transfer learning. Similarly, images from the Eumetopias Jubatus task (sea lion) contain complex visual patterns, leading to less interpretable relationships; surprisingly, Invenio still achieves a large performance gain over transfer learning on this task. These results clearly demonstrate the importance of considering similarities between tasks to build highly effective predictive models.
6 Experiment: Domain Shifts
The model agnostic nature of Invenio allows its use in revealing hidden relationships between datasets that involve complex covariate shifts. To this end, we develop a custom domain database and demonstrate the effectiveness of Invenio in improving the fidelity of the resulting domain-specific classifiers.
Domain Database Design: Our domain database is composed of different variants of the CIFAR-10 dataset, obtained using a broad class of image transformations, while solving the same task of multi-class classification (10 classes). The complete list of domain shifts considered is: (i) Rotation: variants were generated by rotating the images, with the degree of rotation varied over a range of angles; (ii) Flip: we generated datasets by applying horizontal and vertical flips to the images, transformations that are closely related to Rotation (applying both flips is equivalent to a 180-degree rotation); (iii) Affine: we constructed domains by applying different affine transformations to the images, varying the settings for scale and shear; (iv) Color: different datasets were created by manipulating parameters pertinent to color transformations, namely brightness, saturation, contrast and hue; and (v) Filter: we used box blurring and Gaussian smoothing to create variants of the base domain. While Gaussian smoothing blurs an image by convolving its pixels with a Gaussian kernel, the Box Blur filter replaces each pixel by the average of its neighboring pixels.
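A few of these domain shifts can be sketched on a toy grayscale image stored as nested lists of intensities in [0, 1]. The function names are hypothetical, rotation is restricted to 90-degree steps (arbitrary angles need interpolation), and the affine, hue/saturation and Gaussian-smoothing variants are omitted; in practice these would typically come from an image-processing library. Labels are left untouched, reflecting that only the input distribution changes across domains.

```python
# Toy domain shifts on a grayscale image stored as a list of rows of floats in [0, 1].
# Function names are hypothetical; they are not the paper's implementation.

def hflip(img):
    return [row[::-1] for row in img]

def vflip(img):
    return [row[:] for row in img[::-1]]

def rot90(img):
    # 90-degree clockwise rotation: out[r][c] = img[H - 1 - c][r]
    return [list(col) for col in zip(*img[::-1])]

def box_blur(img, radius=1):
    # Replace each pixel by the mean of its (2*radius+1)^2 neighborhood,
    # averaging only the neighbors that fall inside the image at the borders.
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - radius), min(h, r + radius + 1))
                    for cc in range(max(0, c - radius), min(w, c + radius + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def adjust_brightness(img, factor):
    # Scale intensities and clip to [0, 1]; large factors saturate detail.
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]

def make_domain(images, labels, transform):
    # Same task and same labels: only the inputs (the marginal p(x)) change.
    return [transform(img) for img in images], list(labels)

img = [[0.0, 0.0, 0.0],
       [0.0, 0.9, 0.0],
       [0.0, 0.0, 0.0]]
blurred = box_blur(img)   # center becomes 0.9 / 9, corners 0.9 / 4
rotated, labels = make_domain([img], [3], rot90)
```

The box-blur variant makes the low-pass character of the filter domains explicit: the bright center pixel is spread over its neighborhood.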
Intuitively, we expect geometric transformations such as Affine, Rotation and Flip to be related among themselves and to benefit from shared feature representations. Similarly, color transformations such as hue, saturation, contrast and brightness are expected to be strongly related to each other. Each domain comprises 300 randomly chosen samples from each class, and performance evaluation is carried out on a held-out test set for all domains.
Architecture: For all the domain-specific base learners, we use the same architecture: Conv(3,20,5,1), ReLU, MaxPool, Conv(20,50,5,1), ReLU, MaxPool, Linear(2450,500), ReLU, Linear(500,10), ReLU, using the same layer notation as in the task experiments. Similar to the previous experiment, the learning rates for the meta-train and meta-test phases were set to e and e respectively.
Semantic Space of Domains. Figure 8(a) provides a low-dimensional visualization of the semantic space obtained by applying truncated SVD to the similarity matrix between the set of domains. As can be observed, the structure largely aligns with our hypothesis, i.e., geometric transforms such as rotation, flip and shear are closely related to each other. An interesting outcome is that the scale transformation does not lie in the same part of the semantic space as the other geometric transformations. We attribute this to the information loss caused by cropping the zoomed image to remain within the original boundaries.
Similar observations can be made about domains constructed by applying color transformations to the original images. It is evident from Figure 8(a) that the datasets generated by manipulating hue, saturation and contrast are closely related to each other. However, brightness changes appear completely unrelated to the other color transformations. As illustrated in Figure 9, this may be partly due to the high degree of brightness change that we applied, which caused shadows and darker regions to mask crucial features such as edges. On the other hand, the Contrast transformation makes the separation between dark and bright regions more prominent. Finally, the two filtering transformations that we considered are found to carry shared knowledge about the images, since both produce low-pass variants of the original images.
Performance Evaluation. Similar to the task-shift case, we evaluate the effect of incorporating the learned semantic relationships into domain-specific model optimization, in terms of performance on held-out test data. In Figure 8(b), we compare the test accuracies for all domains against the baseline approach in which all domains use a shared model. We find that Invenio produces significant performance improvements, particularly for the Rotation, Hue and Shear transformations, which correspond to the densest part of the semantic space. This indicates that Invenio is able to perform a meaningful form of data augmentation, leading to models with higher fidelity.
In this paper, we introduced Invenio, a structured meta-learning approach that infers the inherent semantic structure of a set of tasks (or domains) and provides insights into the complexity of transferring knowledge between them. Unlike existing approaches such as Task2Vec and Taskonomy, which compare tasks by measuring similarities between pre-trained models, Invenio adopts a self-supervised strategy that identifies the semantics inside the training loop. Furthermore, our approach is applicable to few-shot learning settings and can scale effectively to a large number of tasks. Finally, the inferred semantics largely agree with our intuition, and even when they do not, they still help improve classification performance over existing transfer learning strategies.
An important outcome of this work is that the insights from Invenio can be utilized to produce powerful task/domain priors, which can in turn be used to sample new tasks, akin to generative models for data. This work belongs to the class of recent approaches that are aimed at abstracting learning objectives in AI systems [13, 11]. Designing effective strategies for sampling from these task priors, and building algorithmic solutions for data augmentation remain part of future work.
- [1] (2019) Task2Vec: task embedding for meta-learning. arXiv abs/1902.03545.
- [2] (2019) The information complexity of learning tasks, their structure and their distance. arXiv preprint arXiv:1904.03292.
- [3] (2016) Learning to learn by gradient descent by gradient descent. arXiv abs/1606.04474.
- [4] (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT.
- [5] (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In ICML.
- [6] (2019) The iMaterialist fashion attribute dataset. arXiv preprint arXiv:1906.05750.
- [7] (2017) CyCADA: cycle-consistent adversarial domain adaptation. In ICML.
- [8] (2019) Machine learning on biomedical images: interactive learning, transfer learning, class imbalance, and beyond. In 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 85–90.
- [9] (2010) Convolutional deep belief networks on CIFAR-10. Unpublished manuscript 40 (7), pp. 1–9.
- [10] (2017) Learning to generalize: meta-learning for domain generalization. In AAAI.
- [11] (2019) Self-supervised generalisation with meta auxiliary learning. arXiv preprint arXiv:1901.08933.
- [12] (2016) DeepFashion: powering robust clothes recognition and retrieval with rich annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1096–1104.
- [13] (2017) Interpretable and pedagogical examples. arXiv preprint arXiv:1711.00694.
- [14] (2017) An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098.
- [15] (2018) A DIRT-T approach to unsupervised domain adaptation. In International Conference on Learning Representations.
- [16] (2017) Adversarial discriminative domain adaptation. In Computer Vision and Pattern Recognition (CVPR), Vol. 1, pp. 4.
- [17] (2018) The iNaturalist species classification and detection dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8769–8778.
- [18] (2011) The Caltech-UCSD Birds-200-2011 dataset.
- [19] (2017) Unifying multi-domain multitask learning: tensor and neural network perspectives. In Domain Adaptation in Computer Vision Applications, pp. 291–309.
- [20] (2019) XLNet: generalized autoregressive pretraining for language understanding. arXiv abs/1906.08237.
- [21] (2018) Taskonomy: disentangling task transfer learning. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3712–3722.
- [22] (2018) Adversarial multiple source domain adaptation. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi and R. Garnett (Eds.), pp. 8559–8570.